linux.kernel - 26 new messages in 14 topics - digest
linux.kernel
http://groups.google.com/group/linux.kernel?hl=en
linux.kernel@googlegroups.com
Today's topics:
* xen-netback: TX grant mapping with SKBTX_DEV_ZEROCOPY instead of copy - 1
messages, 1 author
http://groups.google.com/group/linux.kernel/t/fa3947f0129612f2?hl=en
* kernel: audit/fix non-modular users of module_init in core code - 2 messages,
1 author
http://groups.google.com/group/linux.kernel/t/63a135f2704e060d?hl=en
* reciprocal_divide: correction/update of the algorithm - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/79ceff511638a29b?hl=en
* sys, seccomp: add PR_SECCOMP_EXT and SECCOMP_EXT_ACT_TSYNC - 5 messages, 2
authors
http://groups.google.com/group/linux.kernel/t/e2f632c51e6112a2?hl=en
* ARM: perf_event: Support percpu irqs for the CPU PMU - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/678bc3f50d334daf?hl=en
* ARM: OMAP4: sleep: byteswap data for big-endian - 3 messages, 2 authors
http://groups.google.com/group/linux.kernel/t/42dfa14ea7e21a21?hl=en
* Makefile: Build with -Werror=date-time if the compiler supports it - 1
messages, 1 author
http://groups.google.com/group/linux.kernel/t/2a0984624aad195f?hl=en
* cross rename v3 - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/2b03f04905ff68d3?hl=en
* hvc: ensure hvc_init is only ever called once in hvc_console.c - 2 messages,
2 authors
http://groups.google.com/group/linux.kernel/t/5282e7efa4f9699f?hl=en
* Should we make the primary interrupt handler configurable for regmap_add_irq_
chip()? - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/8dc782cf0d3940f5?hl=en
* sysfs_rename_link() and its usage - 2 messages, 2 authors
http://groups.google.com/group/linux.kernel/t/f1f1b010c1a48195?hl=en
* perf report: Add --percentage option - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/a2b106be220a1f65?hl=en
* mei: allow multiple retries if the hw reset has failed - 1 messages, 1
author
http://groups.google.com/group/linux.kernel/t/1ed6574c84928b7d?hl=en
* locks: skip deadlock detection on FL_FILE_PVT locks - 4 messages, 3 authors
http://groups.google.com/group/linux.kernel/t/6a7c503eabf52b07?hl=en
==============================================================================
TOPIC: xen-netback: TX grant mapping with SKBTX_DEV_ZEROCOPY instead of copy
http://groups.google.com/group/linux.kernel/t/fa3947f0129612f2?hl=en
==============================================================================
== 1 of 1 ==
Date: Tues, Jan 14 2014 12:50 pm
From: Zoltan Kiss
A long-known problem of the upstream netback implementation is that on the TX
path (from guest to Dom0) it copies the whole packet from guest memory into
Dom0. That simply became a bottleneck with 10Gb NICs, and generally it's a
huge performance penalty. The classic kernel version of netback used grant
mapping, and to get notified when the page can be unmapped, it used page
destructors. Unfortunately that destructor is not an upstreamable solution.
Ian Campbell's skb fragment destructor patch series [1] tried to solve this
problem, however it seems to be very invasive on the network stack's code,
and therefore hasn't progressed very well.
This patch series uses SKBTX_DEV_ZEROCOPY flags to tell the stack it needs to
know when the skb is freed up. That is the way KVM solved the same problem,
and based on my initial tests it can do the same for us. Avoiding the extra
copy boosted TX throughput from 6.8 Gbps to 7.9 Gbps (I used a slower
Interlagos box, both Dom0 and guest on upstream kernel, on the same NUMA node,
running iperf 2.0.5, and the remote end was a bare metal box on the same 10Gb
switch).
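To illustrate the idea of being told when a packet is freed, here is a
minimal user-space sketch in C. It is not the netback code and does not use
the kernel's skb API: struct mapped_page, struct zc_info, struct packet and
all function names are hypothetical stand-ins. It only models the principle
that the mapping stays in place until the last reference to the packet is
dropped, at which point a callback releases it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a guest page grant-mapped into Dom0. */
struct mapped_page {
    bool mapped;                /* true while the mapping is held */
};

/* Hypothetical completion record attached to a packet. */
struct zc_info {
    void (*callback)(struct zc_info *info);
    struct mapped_page *page;
};

/* Hypothetical packet: a reference count plus an optional record. */
struct packet {
    int refcount;
    struct zc_info *zc;         /* non-NULL: notify owner on final free */
};

/* Runs when the last reference is dropped: release the mapping. */
static void unmap_on_free(struct zc_info *info)
{
    info->page->mapped = false;
}

/* Map the page and attach the completion record to a fresh packet. */
static void packet_attach(struct packet *pkt, struct zc_info *info,
                          struct mapped_page *page)
{
    page->mapped = true;
    info->callback = unmap_on_free;
    info->page = page;
    pkt->refcount = 1;
    pkt->zc = info;
}

/* Take an extra reference, e.g. for a second consumer of the data. */
static void packet_get(struct packet *pkt)
{
    pkt->refcount++;
}

/* Drop a reference; returns true if this was the final one. */
static bool packet_put(struct packet *pkt)
{
    assert(pkt->refcount > 0);
    if (--pkt->refcount > 0)
        return false;
    if (pkt->zc)
        pkt->zc->callback(pkt->zc);
    return true;
}
```

In this model the page stays mapped for as long as any holder keeps a
reference, which is why the owner of the mapping needs the notification
rather than a fixed point at which to unmap.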
Based on my investigations the packet only gets copied if it is delivered to
the Dom0 stack, which is due to this patch [2]. That's a bit unfortunate, but
luckily it doesn't cause a major regression for this use case. In the future
we should try to eliminate that copy somehow.
There are a few spinoff tasks which will be addressed in separate patches:
- grant copy the header directly instead of map and memcpy. This should help
us avoid TLB flushing
- use something other than ballooned pages
- fix grant map to use page->index properly
I will run some more extensive tests, but some basic XenRT tests have already
passed with good results.
I've tried to break it down into smaller patches, with mixed results, so I
welcome suggestions on that part as well:
1: Introduce TX grant map definitions
2: Change TX path from grant copy to mapping
3: Remove old TX grant copy definitions and fix indentations
4: Change RX path for mapped SKB fragments
5: Add stat counters for zerocopy
6: Handle guests with too many frags
7: Add stat counters for frag_list skbs
8: Timeout packets in RX path
9: Aggregate TX unmap operations
v2: I've fixed some smaller things, see the individual patches. I've added a
few new stat counters, and handling for the important use case when an older
guest sends lots of slots. Instead of delayed copy we now time out packets on
the RX path, based on the assumption that otherwise packets should not get
stuck anywhere else. Finally, some unmap batching to avoid too many TLB flushes.
v3: Apart from fixing a few things mentioned in responses, the important change
is to use the hypercall directly for grant [un]mapping, therefore we can
avoid m2p override.
v4: Now we are using a new grant mapping API to avoid m2p_override. The RX queue
timeout logic has also changed.
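The unmap batching mentioned in the v2 notes can be sketched as follows.
This is a user-space model in C under stated assumptions, not the netback
implementation: struct unmap_batcher, the function names and BATCH_SIZE are
all hypothetical, and a "flush" here simply stands for one batched unmap
operation with its single TLB flush.

```c
#include <assert.h>
#include <stddef.h>

#define BATCH_SIZE 4            /* hypothetical threshold, for illustration */

/* Hypothetical accumulator for grant handles waiting to be unmapped. */
struct unmap_batcher {
    unsigned int pending[BATCH_SIZE];
    size_t npending;            /* handles queued but not yet unmapped */
    unsigned int flushes;       /* batched unmap operations issued */
    unsigned int unmapped;      /* total handles unmapped so far */
};

/* Unmap everything queued so far in one operation. */
static void batcher_flush(struct unmap_batcher *b)
{
    if (b->npending == 0)
        return;
    b->unmapped += (unsigned int)b->npending;
    b->npending = 0;
    b->flushes++;
}

/* Queue one handle; issue the batched unmap once the batch is full. */
static void batcher_queue(struct unmap_batcher *b, unsigned int handle)
{
    assert(b->npending < BATCH_SIZE);
    b->pending[b->npending++] = handle;
    if (b->npending == BATCH_SIZE)
        batcher_flush(b);
}
```

With this threshold, unmapping ten handles costs three flushes (two full
batches and one final partial batch) instead of ten.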
[1] http://lwn.net/Articles/491522/
[2] https://lkml.org/lkml/2012/7/20/363
Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
==============================================================================
TOPIC: kernel: audit/fix non-modular users of module_init in core code
http://groups.google.com/group/linux.kernel/t/63a135f2704e060d?hl=en
==============================================================================
== 1 of 2 ==
Date: Tues, Jan 14 2014 12:50 pm
From: Paul Gortmaker
Code that is obj-y (always built-in) or dependent on a bool Kconfig
(built-in or absent) can never be modular. So using module_init as
an alias for __initcall can be somewhat misleading.
Fix these up now, so that we can relocate module_init from
init.h into module.h in the future. If we don't do this, we'd
have to add module.h to obviously non-modular code, and that
would be a worse thing.
The audit targets the following module_init users for change:
kernel/user.c obj-y
kernel/kexec.c bool KEXEC (one instance per arch)
kernel/profile.c bool PROFILING
kernel/hung_task.c bool DETECT_HUNG_TASK
kernel/sched/stats.c bool SCHEDSTATS
kernel/user_namespace.c bool USER_NS
Note that direct use of __initcall is discouraged, vs. one
of the priority categorized subgroups. As __initcall gets
mapped onto device_initcall, our use of subsys_initcall (which
makes sense for these files) will thus change this registration
from level 6-device to level 4-subsys (i.e. slightly earlier).
However, no impact of that difference was observed during testing.
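The effect of moving from level 6 to level 4 can be modelled in user space.
The sketch below is an illustration only, not the kernel's initcall
machinery: register_initcall(), run_initcalls() and the two model_*_init
functions are hypothetical names. It shows that an entry at the subsys level
runs before one at the device level regardless of registration order.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*initcall_t)(void);

/* The two levels named in the text; the model walks levels 0..7. */
enum initcall_level { LEVEL_SUBSYS = 4, LEVEL_DEVICE = 6 };

#define MAX_INITCALLS 16
#define NUM_LEVELS    8

struct initcall_entry {
    int level;
    initcall_t fn;
};

static struct initcall_entry initcall_table[MAX_INITCALLS];
static size_t initcall_count;

/* Records the level of each call as it runs, in execution order. */
static int call_log[MAX_INITCALLS];
static size_t call_log_len;

/* Hypothetical equivalent of tagging a function with an initcall macro. */
static void register_initcall(int level, initcall_t fn)
{
    assert(initcall_count < MAX_INITCALLS);
    initcall_table[initcall_count].level = level;
    initcall_table[initcall_count].fn = fn;
    initcall_count++;
}

/* Run every registered call, lowest level first. */
static void run_initcalls(void)
{
    for (int level = 0; level < NUM_LEVELS; level++)
        for (size_t i = 0; i < initcall_count; i++)
            if (initcall_table[i].level == level)
                initcall_table[i].fn();
}

static int model_device_init(void)
{
    call_log[call_log_len++] = LEVEL_DEVICE;
    return 0;
}

static int model_subsys_init(void)
{
    call_log[call_log_len++] = LEVEL_SUBSYS;
    return 0;
}
```

Registering model_device_init first and model_subsys_init second still runs
the subsys entry first, which is the "slightly earlier" behaviour described
above.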
Also, two instances of missing ";" at EOL are fixed in kexec.
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
---
kernel/hung_task.c | 3 +--
kernel/kexec.c | 4 ++--
kernel/profile.c | 2 +-
kernel/sched/stats.c | 2 +-
kernel/user.c | 3 +--
kernel/user_namespace.c | 2 +-
6 files changed, 7 insertions(+), 9 deletions(-)
diff --git a/kernel/hung_task.c b/kernel/hung_task.c
index 9328b80eaf14..7899ee9dd212 100644
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -244,5 +244,4 @@ static int __init hung_task_init(void)
return 0;
}
-
-module_init(hung_task_init);
+subsys_initcall(hung_task_init);
diff --git a/kernel/kexec.c b/kernel/kexec.c
index 9c970167e402..418f069b0314 100644
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -1234,7 +1234,7 @@ static int __init crash_notes_memory_init(void)
}
return 0;
}
-module_init(crash_notes_memory_init)
+subsys_initcall(crash_notes_memory_init);
/*
@@ -1628,7 +1628,7 @@ static int __init crash_save_vmcoreinfo_init(void)
return 0;
}
-module_init(crash_save_vmcoreinfo_init)
+subsys_initcall(crash_save_vmcoreinfo_init);
/*
* Move into place and start executing a preloaded standalone
diff --git a/kernel/profile.c b/kernel/profile.c
index 6631e1ef55ab..b37576b22acc 100644
--- a/kernel/profile.c
+++ b/kernel/profile.c
@@ -604,5 +604,5 @@ int __ref create_proc_profile(void) /* false positive from hotcpu_notifier */
hotcpu_notifier(profile_cpu_callback, 0);
return 0;
}
-module_init(create_proc_profile);
+subsys_initcall(create_proc_profile);