Thursday, November 7, 2013

linux.kernel - 26 new messages in 14 topics - digest

linux.kernel
http://groups.google.com/group/linux.kernel?hl=en

linux.kernel@googlegroups.com

Today's topics:

* platform: add chrome platform directory - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/ca8784c0df1499d1?hl=en
* [ext2] XIP does not work on ext2 - 2 messages, 2 authors
http://groups.google.com/group/linux.kernel/t/7324138990336f3c?hl=en
* MCS Lock: Barrier corrections - 2 messages, 2 authors
http://groups.google.com/group/linux.kernel/t/c949c96528a47270?hl=en
* irq_work: Provide a irq work that can be processed on any cpu - 8 messages,
3 authors
http://groups.google.com/group/linux.kernel/t/7bc00cc28be7d4b5?hl=en
* usb-serial lockdep trace in linus' current tree. - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/10ab65d5a0e4e1ea?hl=en
* wire up CPU features to udev based module loading - 2 messages, 2 authors
http://groups.google.com/group/linux.kernel/t/7bd137213e7534f5?hl=en
* ARM: Introduce CPU_METHOD_OF_DECLARE() for cpu hotplug/smp - 1 messages, 1
author
http://groups.google.com/group/linux.kernel/t/b1f45f3038c24409?hl=en
* Introducing Device Tree Overlays - 2 messages, 1 author
http://groups.google.com/group/linux.kernel/t/7c26bee9afcc62d1?hl=en
* ext4: Fix reading of extended tv_sec (bug 23732) - 2 messages, 2 authors
http://groups.google.com/group/linux.kernel/t/1c061384ec63b78b?hl=en
* LEDS: tca6507 - fix bugs in parsing of device-tree configuration. - 1
messages, 1 author
http://groups.google.com/group/linux.kernel/t/1eb3aff0df8d6fb5?hl=en
* staging: zsmalloc: Ensure handle is never 0 on success - 1 messages, 1
author
http://groups.google.com/group/linux.kernel/t/56e7b28ac067dee2?hl=en
* kmod: Run usermodehelpers only on cpus allowed for kthreadd V2 - 1 messages,
1 author
http://groups.google.com/group/linux.kernel/t/a1700157e52e62ff?hl=en
* dmaengine: add msm bam dma driver - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/eb7ef92f0916cf63?hl=en
* linux-next: reminder - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/6baf83471f093f73?hl=en

==============================================================================
TOPIC: platform: add chrome platform directory
http://groups.google.com/group/linux.kernel/t/ca8784c0df1499d1?hl=en
==============================================================================

== 1 of 1 ==
Date: Thurs, Nov 7 2013 2:30 pm
From: Olof Johansson


It makes sense to split out the Chromebook/Chromebox hardware platform
drivers to a separate subdirectory, since some of it will be shared
between ARM and x86.

This moves over the existing chromeos_laptop driver without making
any other changes, and adds appropriate Kconfig entries for the new
directory. It also adds a MAINTAINERS entry for the new subdir.

Signed-off-by: Olof Johansson <olof@lixom.net>
---
MAINTAINERS | 5 ++++
drivers/platform/Kconfig | 1 +
drivers/platform/Makefile | 1 +
drivers/platform/chrome/Kconfig | 28 ++++++++++++++++++++++
drivers/platform/chrome/Makefile | 2 ++
drivers/platform/{x86 => chrome}/chromeos_laptop.c | 0
drivers/platform/x86/Kconfig | 11 ---------
drivers/platform/x86/Makefile | 1 -
8 files changed, 37 insertions(+), 12 deletions(-)
create mode 100644 drivers/platform/chrome/Kconfig
create mode 100644 drivers/platform/chrome/Makefile
rename drivers/platform/{x86 => chrome}/chromeos_laptop.c (100%)

diff --git a/MAINTAINERS b/MAINTAINERS
index 831b8690cf13..07e312a3377b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2129,6 +2129,11 @@ L: linux-usb@vger.kernel.org
S: Maintained
F: drivers/usb/chipidea/

+CHROME HARDWARE PLATFORM SUPPORT
+M: Olof Johansson <olof@lixom.net>
+S: Maintained
+F: drivers/platform/chrome/
+
CISCO VIC ETHERNET NIC DRIVER
M: Christian Benvenuti <benve@cisco.com>
M: Sujith Sankar <ssujith@cisco.com>
diff --git a/drivers/platform/Kconfig b/drivers/platform/Kconfig
index 69616aeaa966..09fde58b12e0 100644
--- a/drivers/platform/Kconfig
+++ b/drivers/platform/Kconfig
@@ -5,3 +5,4 @@ if GOLDFISH
source "drivers/platform/goldfish/Kconfig"
endif

+source "drivers/platform/chrome/Kconfig"
diff --git a/drivers/platform/Makefile b/drivers/platform/Makefile
index 8a44a4cd6d1e..3656b7b17b99 100644
--- a/drivers/platform/Makefile
+++ b/drivers/platform/Makefile
@@ -5,3 +5,4 @@
obj-$(CONFIG_X86) += x86/
obj-$(CONFIG_OLPC) += olpc/
obj-$(CONFIG_GOLDFISH) += goldfish/
+obj-$(CONFIG_CHROME_PLATFORMS) += chrome/
diff --git a/drivers/platform/chrome/Kconfig b/drivers/platform/chrome/Kconfig
new file mode 100644
index 000000000000..b13303e75a34
--- /dev/null
+++ b/drivers/platform/chrome/Kconfig
@@ -0,0 +1,28 @@
+#
+# Platform support for Chrome OS hardware (Chromebooks and Chromeboxes)
+#
+
+menuconfig CHROME_PLATFORMS
+ bool "Platform support for Chrome hardware"
+ depends on X86
+ ---help---
+ Say Y here to get to see options for platform support for
+ various Chromebooks and Chromeboxes. This option alone does
+ not add any kernel code.
+
+ If you say N, all options in this submenu will be skipped and disabled.
+
+if CHROME_PLATFORMS
+
+config CHROMEOS_LAPTOP
+ tristate "Chrome OS Laptop"
+ depends on I2C
+ depends on DMI
+ ---help---
+ This driver instantiates i2c and smbus devices such as
+ light sensors and touchpads.
+
+ If you have a supported Chromebook, choose Y or M here.
+ The module will be called chromeos_laptop.
+
+endif # CHROME_PLATFORMS
diff --git a/drivers/platform/chrome/Makefile b/drivers/platform/chrome/Makefile
new file mode 100644
index 000000000000..015e9195e226
--- /dev/null
+++ b/drivers/platform/chrome/Makefile
@@ -0,0 +1,2 @@
+
+obj-$(CONFIG_CHROMEOS_LAPTOP) += chromeos_laptop.o
diff --git a/drivers/platform/x86/chromeos_laptop.c b/drivers/platform/chrome/chromeos_laptop.c
similarity index 100%
rename from drivers/platform/x86/chromeos_laptop.c
rename to drivers/platform/chrome/chromeos_laptop.c
diff --git a/drivers/platform/x86/Kconfig b/drivers/platform/x86/Kconfig
index b51a7460cc49..d9dcd37b5a52 100644
--- a/drivers/platform/x86/Kconfig
+++ b/drivers/platform/x86/Kconfig
@@ -79,17 +79,6 @@ config ASUS_LAPTOP

If you have an ACPI-compatible ASUS laptop, say Y or M here.

-config CHROMEOS_LAPTOP
- tristate "Chrome OS Laptop"
- depends on I2C
- depends on DMI
- ---help---
- This driver instantiates i2c and smbus devices such as
- light sensors and touchpads.
-
- If you have a supported Chromebook, choose Y or M here.
- The module will be called chromeos_laptop.
-
config DELL_LAPTOP
tristate "Dell Laptop Extras"
depends on X86
diff --git a/drivers/platform/x86/Makefile b/drivers/platform/x86/Makefile
index 5dbe19324351..f0e6aa407ffb 100644
--- a/drivers/platform/x86/Makefile
+++ b/drivers/platform/x86/Makefile
@@ -50,7 +50,6 @@ obj-$(CONFIG_INTEL_MID_POWER_BUTTON) += intel_mid_powerbtn.o
obj-$(CONFIG_INTEL_OAKTRAIL) += intel_oaktrail.o
obj-$(CONFIG_SAMSUNG_Q10) += samsung-q10.o
obj-$(CONFIG_APPLE_GMUX) += apple-gmux.o
-obj-$(CONFIG_CHROMEOS_LAPTOP) += chromeos_laptop.o
obj-$(CONFIG_INTEL_RST) += intel-rst.o
obj-$(CONFIG_INTEL_SMARTCONNECT) += intel-smartconnect.o

--
1.8.4.1.601.g02b3b1d

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/





==============================================================================
TOPIC: [ext2] XIP does not work on ext2
http://groups.google.com/group/linux.kernel/t/7324138990336f3c?hl=en
==============================================================================

== 1 of 2 ==
Date: Thurs, Nov 7 2013 2:30 pm
From: Jan Kara


On Thu 07-11-13 13:50:09, Andiry Xu wrote:
> On Thu, Nov 7, 2013 at 1:07 PM, Jan Kara <jack@suse.cz> wrote:
> > On Thu 07-11-13 12:14:13, Andiry Xu wrote:
> >> On Wed, Nov 6, 2013 at 1:18 PM, Jan Kara <jack@suse.cz> wrote:
> >> > On Tue 05-11-13 17:28:35, Andiry Xu wrote:
> >> >> >> Do you know the reason why write() outperforms mmap() in some cases? I
> >> >> >> know it's not related the thread but I really appreciate if you can
> >> >> >> answer my question.
> >> >> > Well, I'm not completely sure. mmap()ed memory always works on page-by-page
> >> >> > basis - you first access the page, it gets faulted in and you can further
> >> >> > access it. So for small (sub page size) accesses this is a win because you
> >> >> > don't have an overhead of syscall and fs write path. For accesses larger
> >> >> > than page size the overhead of syscall and some initial checks is well
> >> >> > hidden by other things. I guess write() ends up being more efficient
> >> >> > because write path taken for each page is somewhat lighter than full page
> >> >> > fault. But you'd need to look into perf data to get some hard numbers on
> >> >> > where the time is spent.
> >> >> >
> >> >>
> >> >> Thanks for the reply. However I have filled up the whole RAM disk
> >> >> before doing the test, i.e. asked the brd driver to allocate all the
> >> >> pages initially.
> >> > Well, pages in ramdisk are always present, that's not an issue. But you
> >> > will get a page fault to map a particular physical page in process'
> >> > virtual address space when you first access that virtual address in the
> >> > mapping from the process. The cost of setting up this virtual->physical
> >> > mapping is what I'm talking about.
> >> >
> >>
> >> Yes, you are right, there are page faults observed with perf. I
> >> misunderstood page fault as copying pages between backing store and
> >> physical memory.
> >>
> >> > If you had a process which first mmaps the file and writes to all pages in
> >> > the mapping and *then* measure the cost of another round of writing to the
> >> > mapping, I would expect you should see speeds close to those of memory bus.
> >> >
> >>
> >> I've tried this as well. mmap() performance improves but still not as
> >> good as write().
> >> I used the perf report to compare write() and mmap() applications. For
> >> write() version, top of perf report shows as:
> >> 33.33% __copy_user_nocache
> >> 4.72% ext2_get_blocks
> >> 4.42% mutex_unlock
> >> 3.59% __find_get_block
> >>
> >> which looks reasonable.
> >>
> >> However, for mmap() version, the perf report looks strange:
> >> 94.98% libc-2.15.so [.] 0x000000000014698d
> >> 2.25% page_fault
> >> 0.18% handle_mm_fault
> >>
> >> I don't know what the first item is but it took the majority of cycles.
> > The first item means that it's some userspace code in libc. My guess
> > would be that it's libc's memcpy() function (or whatever you use to write
> > to mmap). How do you access the mmap?
> >
>
> Like this:
>
> fd = open(file_name, O_CREAT | O_RDWR | O_DIRECT, 0755);
> dest = (char *)mmap(NULL, FILE_SIZE, PROT_WRITE, MAP_SHARED, fd, 0);
> for (i = 0; i < count; i++)
> {
> memcpy(dest, src, request_size);
> dest += request_size;
> }
OK, maybe libc memcpy isn't very well optimized for your CPU? Not sure how
to tune that though...
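For reference, here is a self-contained sketch of the kind of mmap-write test
being discussed; the file path, mapping size and request size below are
illustrative assumptions, not values taken from the thread:

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define FILE_SIZE	(64UL * 1024 * 1024)	/* assumed mapping size */
#define REQUEST_SIZE	4096UL			/* assumed bytes per memcpy */

int main(void)
{
	unsigned long i, count = FILE_SIZE / REQUEST_SIZE;
	char *dest, *p, *src;
	int fd;

	fd = open("/mnt/ramdisk/testfile", O_CREAT | O_RDWR, 0644);
	if (fd < 0 || ftruncate(fd, FILE_SIZE) < 0) {
		perror("open/ftruncate");
		return 1;
	}

	src = malloc(REQUEST_SIZE);
	memset(src, 0xab, REQUEST_SIZE);

	p = dest = mmap(NULL, FILE_SIZE, PROT_READ | PROT_WRITE,
			MAP_SHARED, fd, 0);
	if (dest == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	/*
	 * The first pass faults in every page; timing a second, separate
	 * pass shows the copy speed without the page-fault overhead, as
	 * suggested earlier in the thread.
	 */
	for (i = 0; i < count; i++) {
		memcpy(p, src, REQUEST_SIZE);
		p += REQUEST_SIZE;
	}

	munmap(dest, FILE_SIZE);
	close(fd);
	return 0;
}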

Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/




== 2 of 2 ==
Date: Thurs, Nov 7 2013 2:50 pm
From: Andiry Xu


On Thu, Nov 7, 2013 at 2:20 PM, Jan Kara <jack@suse.cz> wrote:
> On Thu 07-11-13 13:50:09, Andiry Xu wrote:
>> On Thu, Nov 7, 2013 at 1:07 PM, Jan Kara <jack@suse.cz> wrote:
>> > On Thu 07-11-13 12:14:13, Andiry Xu wrote:
>> >> On Wed, Nov 6, 2013 at 1:18 PM, Jan Kara <jack@suse.cz> wrote:
>> >> > On Tue 05-11-13 17:28:35, Andiry Xu wrote:
>> >> >> >> Do you know the reason why write() outperforms mmap() in some cases? I
>> >> >> >> know it's not related the thread but I really appreciate if you can
>> >> >> >> answer my question.
>> >> >> > Well, I'm not completely sure. mmap()ed memory always works on page-by-page
>> >> >> > basis - you first access the page, it gets faulted in and you can further
>> >> >> > access it. So for small (sub page size) accesses this is a win because you
>> >> >> > don't have an overhead of syscall and fs write path. For accesses larger
>> >> >> > than page size the overhead of syscall and some initial checks is well
>> >> >> > hidden by other things. I guess write() ends up being more efficient
>> >> >> > because write path taken for each page is somewhat lighter than full page
>> >> >> > fault. But you'd need to look into perf data to get some hard numbers on
>> >> >> > where the time is spent.
>> >> >> >
>> >> >>
>> >> >> Thanks for the reply. However I have filled up the whole RAM disk
>> >> >> before doing the test, i.e. asked the brd driver to allocate all the
>> >> >> pages initially.
>> >> > Well, pages in ramdisk are always present, that's not an issue. But you
>> >> > will get a page fault to map a particular physical page in process'
>> >> > virtual address space when you first access that virtual address in the
>> >> > mapping from the process. The cost of setting up this virtual->physical
>> >> > mapping is what I'm talking about.
>> >> >
>> >>
>> >> Yes, you are right, there are page faults observed with perf. I
>> >> misunderstood page fault as copying pages between backing store and
>> >> physical memory.
>> >>
>> >> > If you had a process which first mmaps the file and writes to all pages in
>> >> > the mapping and *then* measure the cost of another round of writing to the
>> >> > mapping, I would expect you should see speeds close to those of memory bus.
>> >> >
>> >>
>> >> I've tried this as well. mmap() performance improves but still not as
>> >> good as write().
>> >> I used the perf report to compare write() and mmap() applications. For
>> >> write() version, top of perf report shows as:
>> >> 33.33% __copy_user_nocache
>> >> 4.72% ext2_get_blocks
>> >> 4.42% mutex_unlock
>> >> 3.59% __find_get_block
>> >>
>> >> which looks reasonable.
>> >>
>> >> However, for mmap() version, the perf report looks strange:
>> >> 94.98% libc-2.15.so [.] 0x000000000014698d
>> >> 2.25% page_fault
>> >> 0.18% handle_mm_fault
>> >>
>> >> I don't know what the first item is but it took the majority of cycles.
>> > The first item means that it's some userspace code in libc. My guess
>> > would be that it's libc's memcpy() function (or whatever you use to write
>> > to mmap). How do you access the mmap?
>> >
>>
>> Like this:
>>
>> fd = open(file_name, O_CREAT | O_RDWR | O_DIRECT, 0755);
>> dest = (char *)mmap(NULL, FILE_SIZE, PROT_WRITE, MAP_SHARED, fd, 0);
>> for (i = 0; i < count; i++)
>> {
>> memcpy(dest, src, request_size);
>> dest += request_size;
>> }
> OK, maybe libc memcpy isn't very well optimized for your CPU? Not sure how
> to tune that though...
>

Hmm, I will try some different kinds of memcpy to see if there is a
difference. Just want to make sure I do not make some stupid mistakes
before trying that.
Thanks a lot for your help!

Thanks,
Andiry
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/





==============================================================================
TOPIC: MCS Lock: Barrier corrections
http://groups.google.com/group/linux.kernel/t/c949c96528a47270?hl=en
==============================================================================

== 1 of 2 ==
Date: Thurs, Nov 7 2013 2:30 pm
From: Peter Zijlstra


On Thu, Nov 07, 2013 at 01:15:51PM -0800, Tim Chen wrote:
> Michel, are you planning to do an implementation of
> load-acquire/store-release functions of various architectures?

A little something like this:
http://marc.info/?l=linux-arch&m=138386254111507

It so happens we were working on that the past week or so due to another
issue ;-)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/




== 2 of 2 ==
Date: Thurs, Nov 7 2013 2:50 pm
From: Michel Lespinasse


On Thu, Nov 7, 2013 at 2:21 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Thu, Nov 07, 2013 at 01:15:51PM -0800, Tim Chen wrote:
>> Michel, are you planning to do an implementation of
>> load-acquire/store-release functions of various architectures?
>
> A little something like this:
> http://marc.info/?l=linux-arch&m=138386254111507
>
> It so happens we were working on that the past week or so due to another
> issue ;-)

Haha, awesome, I wasn't aware of this effort.

Tim: my approach would be to provide the acquire/release operations in
arch-specific include files, and have a default implementation using
barriers for arches who don't provide these new ops. That way you make
it work on all arches at once (using the default implementation) and
make it fast on any arch that cares.
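As an illustration, a generic barrier-based fallback along the lines Michel
describes could look roughly like the sketch below. This assumes kernel
context (smp_mb() and ACCESS_ONCE() from <linux/compiler.h>); the actual
proposal linked above may use different names and extra type checks:

#ifndef smp_store_release
#define smp_store_release(p, v)						\
do {									\
	smp_mb();	/* order earlier accesses before the store */	\
	ACCESS_ONCE(*(p)) = (v);					\
} while (0)
#endif

#ifndef smp_load_acquire
#define smp_load_acquire(p)						\
({									\
	typeof(*(p)) ___val = ACCESS_ONCE(*(p));			\
	smp_mb();	/* order the load before later accesses */	\
	___val;								\
})
#endif

An architecture with native acquire/release instructions (for example
ldar/stlr on arm64) could then override these with cheaper definitions in
its own asm/barrier.h.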

>> Or is the approach of arch specific memory barrier for MCS
>> an acceptable one before load-acquire and store-release
>> are available? Are there any technical issues remaining with
>> the patchset after including Waiman's arch-specific barrier?

I don't want to stand in the way of Waiman's change, and I had
actually taken the same approach with arch-specific barriers when
proposing some queue spinlocks in the past; however I do feel that
this comes back regularly enough that having acquire/release
primitives available would help, hence my proposal.

That said, earlier in the thread Linus said we should probably get all
our ducks in a row before going forward with this, so...

--
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/





==============================================================================
TOPIC: irq_work: Provide a irq work that can be processed on any cpu
http://groups.google.com/group/linux.kernel/t/7bc00cc28be7d4b5?hl=en
==============================================================================

== 1 of 8 ==
Date: Thurs, Nov 7 2013 2:30 pm
From: Frederic Weisbecker


On Thu, Nov 07, 2013 at 11:19:04PM +0100, Jan Kara wrote:
> On Thu 07-11-13 23:13:39, Frederic Weisbecker wrote:
> > But then, who's going to process that work if every CPU is idle?
> Have a look into irq_work_queue(). There is:
> /*
> * If the work is not "lazy" or the tick is stopped, raise the irq
> * work interrupt (if supported by the arch), otherwise, just wait
> * for the next tick. We do this even for unbound work to make sure
> * *some* CPU will be doing the work.
> */
> if (!(work->flags & IRQ_WORK_LAZY) || tick_nohz_tick_stopped()) {
> if (!this_cpu_cmpxchg(irq_work_raised, 0, 1))
> arch_irq_work_raise();
> }
>
> So we raise an interrupt if there would be no timer ticking (which is
> what I suppose you mean by "CPU is idle"). That is nothing changed by my
> patches...

Ok but we raise that interrupt locally, not to the other CPUs.
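For readers following the quoted snippet, the irq_work API it belongs to is
used roughly as in the sketch below (function and variable names are made up;
the "unbound" variant discussed in Jan's patches is not shown):

#include <linux/irq_work.h>

static void sample_work_func(struct irq_work *work)
{
	/* runs later from the irq work interrupt, or from the next tick
	 * if the work was queued with IRQ_WORK_LAZY set */
}

static struct irq_work sample_work;

static void sample_setup(void)
{
	init_irq_work(&sample_work, sample_work_func);
}

static void sample_poke(void)
{
	/* safe from NMI or hard-irq context; sample_work_func() runs later */
	irq_work_queue(&sample_work);
}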
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/




== 2 of 8 ==
Date: Thurs, Nov 7 2013 2:40 pm
From: Frederic Weisbecker


On Thu, Nov 07, 2013 at 11:19:04PM +0100, Jan Kara wrote:
> On Thu 07-11-13 23:13:39, Frederic Weisbecker wrote:
> > But then, who's going to process that work if every CPU is idle?
> Have a look into irq_work_queue(). There is:
> /*
> * If the work is not "lazy" or the tick is stopped, raise the irq
> * work interrupt (if supported by the arch), otherwise, just wait
> * for the next tick. We do this even for unbound work to make sure
> * *some* CPU will be doing the work.
> */
> if (!(work->flags & IRQ_WORK_LAZY) || tick_nohz_tick_stopped()) {
> if (!this_cpu_cmpxchg(irq_work_raised, 0, 1))
> arch_irq_work_raise();
> }
>
> So we raise an interrupt if there would be no timer ticking (which is
> what I suppose you mean by "CPU is idle"). That is nothing changed by my
> patches...

That said I agree that it would be nice to have smp_call_function_many() support
non-waiting calls, something based on llist, that would be less deadlock-prone
to begin with.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/




== 3 of 8 ==
Date: Thurs, Nov 7 2013 2:50 pm
From: Frederic Weisbecker


2013/11/7 Jan Kara <jack@suse.cz>:
> A CPU can be caught in console_unlock() for a long time (tens of seconds
> are reported by our customers) when other CPUs are using printk heavily
> and serial console makes printing slow. Despite serial console drivers
> are calling touch_nmi_watchdog() this triggers softlockup warnings
> because interrupts are disabled for the whole time console_unlock() runs
> (e.g. vprintk() calls console_unlock() with interrupts disabled). Thus
> IPIs cannot be processed and other CPUs get stuck spinning in calls like
> smp_call_function_many(). Also RCU eventually starts reporting lockups.
>
> In my artificial testing I can also easily trigger a situation where the disk
> disappears from the system, apparently because an interrupt from it wasn't
> served for too long. This is why just silencing watchdogs isn't a
> reliable solution to the problem and we simply have to avoid spending
> too long in console_unlock() with interrupts disabled.
>
> The solution this patch works toward is to postpone printing to a later
> moment / different CPU when we already printed over X characters in
> current console_unlock() invocation. This is a crude heuristic but
> measuring time we spent printing doesn't seem to be really viable - we
> cannot rely on high resolution time being available and with interrupts
> disabled jiffies are not updated. User can tune the value X via
> printk.offload_chars kernel parameter.
>
> Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
> Signed-off-by: Jan Kara <jack@suse.cz>

When a message takes tens of seconds to be printed, it usually means
we are in trouble somehow :)
I wonder what printk source can trigger such a high volume.

Maybe cutting some huge message into smaller chunks could help? That
would re-enable interrupts between each call.

It's hard to tell without the context, but using other CPUs for
rescuing doesn't look like a good solution. What if the issue happens
in UP to begin with?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/




== 4 of 8 ==
Date: Thurs, Nov 7 2013 3:00 pm
From: Jan Kara


On Thu 07-11-13 23:43:52, Frederic Weisbecker wrote:
> 2013/11/7 Jan Kara <jack@suse.cz>:
> > A CPU can be caught in console_unlock() for a long time (tens of seconds
> > are reported by our customers) when other CPUs are using printk heavily
> > and serial console makes printing slow. Despite serial console drivers
> > are calling touch_nmi_watchdog() this triggers softlockup warnings
> > because interrupts are disabled for the whole time console_unlock() runs
> > (e.g. vprintk() calls console_unlock() with interrupts disabled). Thus
> > IPIs cannot be processed and other CPUs get stuck spinning in calls like
> > smp_call_function_many(). Also RCU eventually starts reporting lockups.
> >
> > In my artificial testing I can also easily trigger a situation where the disk
> > disappears from the system, apparently because an interrupt from it wasn't
> > served for too long. This is why just silencing watchdogs isn't a
> > reliable solution to the problem and we simply have to avoid spending
> > too long in console_unlock() with interrupts disabled.
> >
> > The solution this patch works toward is to postpone printing to a later
> > moment / different CPU when we already printed over X characters in
> > current console_unlock() invocation. This is a crude heuristic but
> > measuring time we spent printing doesn't seem to be really viable - we
> > cannot rely on high resolution time being available and with interrupts
> > disabled jiffies are not updated. User can tune the value X via
> > printk.offload_chars kernel parameter.
> >
> > Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
> > Signed-off-by: Jan Kara <jack@suse.cz>
>
> When a message takes tens of seconds to be printed, it usually means
> we are in trouble somehow :)
> I wonder what printk source can trigger such a high volume.
Machines with tens of processors and thousands of scsi devices. When
device discovery happens on boot, all processors are busily reporting new
scsi devices and one poor loser is bound to do the printing forever and
ever until the machine dies...

Or try running sysrq-t on a large machine with serial console connected. The
machine will die because of lockups (although in this case I agree it is more
of a problem of sysrq-t doing lots of printing in interrupt-disabled
context).

> Maybe cutting some huge message into smaller chunks could help? That
> would re-enable interrupts between each call.
>
> It's hard to tell without the context, but using other CPUs for
> rescuing doesn't look like a good solution. What if the issue happens
> in UP to begin with?
The real trouble in practice is that while one cpu is printing, other cpus
are appending to the printk buffer. So the cpu can be printing for a *long*
time. So offloading the work to other cpus which are also appending
messages seems like a fair thing to do.

Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/




== 5 of 8 ==
Date: Thurs, Nov 7 2013 3:00 pm
From: Steven Rostedt


On Thu, 7 Nov 2013 23:43:52 +0100
Frederic Weisbecker <fweisbec@gmail.com> wrote:


> When a message takes tens of seconds to be printed, it usually means
> we are in trouble somehow :)
> I wonder what printk source can trigger such a high volume.

The only ones that I'm aware of are the prints from sysrq, which
include an ftrace_dump as well. These have caused issues in the past,
but usually I do them on boxes that I reboot afterward anyway.

-- Steve
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/




== 6 of 8 ==
Date: Thurs, Nov 7 2013 3:00 pm
From: Frederic Weisbecker


On Thu, Nov 07, 2013 at 11:50:34PM +0100, Jan Kara wrote:
> On Thu 07-11-13 23:23:14, Frederic Weisbecker wrote:
> > On Thu, Nov 07, 2013 at 11:19:04PM +0100, Jan Kara wrote:
> > > On Thu 07-11-13 23:13:39, Frederic Weisbecker wrote:
> > > > But then, who's going to process that work if every CPU is idle?
> > > Have a look into irq_work_queue(). There is:
> > > /*
> > > * If the work is not "lazy" or the tick is stopped, raise the irq
> > > * work interrupt (if supported by the arch), otherwise, just wait
> > > * for the next tick. We do this even for unbound work to make sure
> > > * *some* CPU will be doing the work.
> > > */
> > > if (!(work->flags & IRQ_WORK_LAZY) || tick_nohz_tick_stopped()) {
> > > if (!this_cpu_cmpxchg(irq_work_raised, 0, 1))
> > > arch_irq_work_raise();
> > > }
> > >
> > > So we raise an interrupt if there would be no timer ticking (which is
> > > what I suppose you mean by "CPU is idle"). That is nothing changed by my
> > > patches...
> >
> > Ok but we raise that interrupt locally, not to the other CPUs.
> True, but that doesn't really matter in this case. Any CPU (including the
> local one) can handle the unbound work. So from the definition of the
> unbound work things are OK.

I don't see how that can be OK. You want to offload a work because the local CPU
can't handle it, right? If the local CPU can handle it, you can just use local
irq works.

>
> Regarding my use for printk - if all (other) CPUs are idle then we can
> easily afford making the current cpu busy printing, that's not a problem.
> There's nothing else to do than to print what's remaining in the printk
> buffer...

So if the current CPU can handle it, what is the problem?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/




== 7 of 8 ==
Date: Thurs, Nov 7 2013 3:00 pm
From: Jan Kara


On Thu 07-11-13 23:23:14, Frederic Weisbecker wrote:
> On Thu, Nov 07, 2013 at 11:19:04PM +0100, Jan Kara wrote:
> > On Thu 07-11-13 23:13:39, Frederic Weisbecker wrote:
> > > But then, who's going to process that work if every CPU is idle?
> > Have a look into irq_work_queue(). There is:
> > /*
> > * If the work is not "lazy" or the tick is stopped, raise the irq
> > * work interrupt (if supported by the arch), otherwise, just wait
> > * for the next tick. We do this even for unbound work to make sure
> > * *some* CPU will be doing the work.
> > */
> > if (!(work->flags & IRQ_WORK_LAZY) || tick_nohz_tick_stopped()) {
> > if (!this_cpu_cmpxchg(irq_work_raised, 0, 1))
> > arch_irq_work_raise();
> > }
> >
> > So we raise an interrupt if there would be no timer ticking (which is
> > what I suppose you mean by "CPU is idle"). That is nothing changed by my
> > patches...
>
> Ok but we raise that interrupt locally, not to the other CPUs.
True, but that doesn't really matter in this case. Any CPU (including the
local one) can handle the unbound work. So from the definition of the
unbound work things are OK.

Regarding my use for printk - if all (other) CPUs are idle then we can
easily afford making the current cpu busy printing, that's not a problem.
There's nothing else to do than to print what's remaining in the printk
buffer...

Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/




== 8 of 8 ==
Date: Thurs, Nov 7 2013 3:10 pm
From: Jan Kara


On Thu 07-11-13 23:54:10, Frederic Weisbecker wrote:
> On Thu, Nov 07, 2013 at 11:50:34PM +0100, Jan Kara wrote:
> > On Thu 07-11-13 23:23:14, Frederic Weisbecker wrote:
> > > On Thu, Nov 07, 2013 at 11:19:04PM +0100, Jan Kara wrote:
> > > > On Thu 07-11-13 23:13:39, Frederic Weisbecker wrote:
> > > > > But then, who's going to process that work if every CPU is idle?
> > > > Have a look into irq_work_queue(). There is:
> > > > /*
> > > > * If the work is not "lazy" or the tick is stopped, raise the irq
> > > > * work interrupt (if supported by the arch), otherwise, just wait
> > > > * for the next tick. We do this even for unbound work to make sure
> > > > * *some* CPU will be doing the work.
> > > > */
> > > > if (!(work->flags & IRQ_WORK_LAZY) || tick_nohz_tick_stopped()) {
> > > > if (!this_cpu_cmpxchg(irq_work_raised, 0, 1))
> > > > arch_irq_work_raise();
> > > > }
> > > >
> > > > So we raise an interrupt if there would be no timer ticking (which is
> > > > what I suppose you mean by "CPU is idle"). That is nothing changed by my
> > > > patches...
> > >
> > > Ok but we raise that interrupt locally, not to the other CPUs.
> > True, but that doesn't really matter in this case. Any CPU (including the
> > local one) can handle the unbound work. So from the definition of the
> > unbound work things are OK.
>
> I don't see how that can be OK. You want to offload a work because the
> local CPU can't handle it, right? If the local CPU can handle it, you can
> just use local irq works.
>
> > Regarding my use for printk - if all (other) CPUs are idle then we can
> > easily afford making the current cpu busy printing, that's not a problem.
> > There's nothing else to do than to print what's remaining in the printk
> > buffer...
>
> So if the current CPU can handle it, what is the problem?
I hope this gets cleared up in my other email. But to make sure: if
other CPUs are idle (i.e. not appending to the printk buffer), we can well
handle the printing on the current CPU (with some breaks to allow
interrupts to be served etc.). If other CPUs are also appending to the
printk buffer, that's when we really want to push the work to other CPUs.

Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/





==============================================================================
TOPIC: usb-serial lockdep trace in linus' current tree.
http://groups.google.com/group/linux.kernel/t/10ab65d5a0e4e1ea?hl=en
==============================================================================

== 1 of 1 ==
Date: Thurs, Nov 7 2013 2:40 pm
From: Dave Jones


Seeing this since today's USB merge.

WARNING: CPU: 0 PID: 226 at kernel/lockdep.c:2740 lockdep_trace_alloc+0xc5/0xd0()
DEBUG_LOCKS_WARN_ON(irqs_disabled_flags(flags))
Modules linked in: usb_debug(+) kvm_intel kvm crct10dif_pclmul crc32c_intel ghash_clmulni_intel microcode(+) pcspkr serio_raw
CPU: 0 PID: 226 Comm: systemd-udevd Not tainted 3.12.0+ #112
ffffffff81a22d3d ffff88023cde5670 ffffffff8171a8e8 ffff88023cde56b8
ffff88023cde56a8 ffffffff8105430d 0000000000000046 00000000000080d0
0000000000000010 0000000000000001 ffff880244407a80 ffff88023cde5708
Call Trace:
[<ffffffff8171a8e8>] dump_stack+0x4e/0x82
[<ffffffff8105430d>] warn_slowpath_common+0x7d/0xa0
[<ffffffff8105437c>] warn_slowpath_fmt+0x4c/0x50
[<ffffffff810cbcd5>] lockdep_trace_alloc+0xc5/0xd0
[<ffffffff811a5633>] __kmalloc+0x53/0x350
[<ffffffff8153e417>] ? xhci_urb_enqueue+0xb7/0x610
[<ffffffff81339f5c>] ? debug_dma_mapping_error+0x7c/0x90
[<ffffffff8153e417>] xhci_urb_enqueue+0xb7/0x610
[<ffffffff8150fad6>] usb_hcd_submit_urb+0xa6/0xae0
[<ffffffff8100b2f4>] ? native_sched_clock+0x24/0x80
[<ffffffff810c531f>] ? trace_hardirqs_off_caller+0x1f/0xc0
[<ffffffff8100b2f4>] ? native_sched_clock+0x24/0x80
[<ffffffff810c531f>] ? trace_hardirqs_off_caller+0x1f/0xc0
[<ffffffff810c5439>] ? get_lock_stats+0x19/0x60
[<ffffffff810c5bae>] ? put_lock_stats.isra.28+0xe/0x40
[<ffffffff81511569>] usb_submit_urb+0x1f9/0x470
[<ffffffff81554ca5>] usb_serial_generic_write_start+0xf5/0x210
[<ffffffff81554ef0>] usb_serial_generic_write+0x70/0x90
[<ffffffff81555637>] usb_console_write+0xc7/0x220
[<ffffffff810af585>] call_console_drivers.constprop.23+0xa5/0x1e0
[<ffffffff810afe0c>] console_unlock+0x40c/0x460
[<ffffffff810b10ec>] register_console+0x12c/0x390
[<ffffffff81555b62>] usb_serial_console_init+0x22/0x40
[<ffffffff815539aa>] usb_serial_probe+0xfea/0x10e0
[<ffffffff8100b2f4>] ? native_sched_clock+0x24/0x80
[<ffffffff810c531f>] ? trace_hardirqs_off_caller+0x1f/0xc0
[<ffffffff810c5439>] ? get_lock_stats+0x19/0x60
[<ffffffff8172004d>] ? __mutex_unlock_slowpath+0xed/0x1a0
[<ffffffff810c8af5>] ? trace_hardirqs_on_caller+0x115/0x1e0
[<ffffffff810c8bcd>] ? trace_hardirqs_on+0xd/0x10
[<ffffffff815164ef>] usb_probe_interface+0x1cf/0x300
[<ffffffff814ad607>] driver_probe_device+0x87/0x390
[<ffffffff814ad9e3>] __driver_attach+0x93/0xa0
[<ffffffff814ad950>] ? __device_attach+0x40/0x40
[<ffffffff814ab53b>] bus_for_each_dev+0x6b/0xb0
[<ffffffff814ad02e>] driver_attach+0x1e/0x20
[<ffffffff8155246e>] usb_serial_register_drivers+0x29e/0x580
[<ffffffffa0005000>] ? 0xffffffffa0004fff
[<ffffffffa000501e>] usb_serial_module_init+0x1e/0x1000 [usb_debug]
[<ffffffff810002c2>] do_one_initcall+0xf2/0x1a0
[<ffffffff8103c7b3>] ? set_memory_nx+0x43/0x50
[<ffffffff810d9e42>] load_module+0x1fd2/0x26a0
[<ffffffff810d4f90>] ? store_uevent+0x40/0x40
[<ffffffff810da6a6>] SyS_finit_module+0x86/0xb0
[<ffffffff8172db64>] tracesys+0xdd/0xe2
---[ end trace ee033a3c9fd6263b ]---

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/





==============================================================================
TOPIC: wire up CPU features to udev based module loading
http://groups.google.com/group/linux.kernel/t/7bd137213e7534f5?hl=en
==============================================================================

== 1 of 2 ==
Date: Thurs, Nov 7 2013 2:40 pm
From: Andi Kleen


> - the module aliases host tool has no arch specific dependencies at
> all except having x86cpu as one of the entries: would you mind
> dropping the x86 prefix there? Or rather add dependencies on $ARCH?
> (If we drop it there, we basically end up with 'cpu:' everywhere)

Should be fine.

> - in the vendor/family/model case, it may be preferable to drop these
> fields entirely from certain modules' aliases if they match on 'any'
> (provided that the module tools permit this) rather than add
> architecture, variant, revision, etc fields for all architectures if
> they can only ever match on one

The module tools require everything matching with the same wildcards.

So I don't know how "any" would work.

-Andi
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/




== 2 of 2 ==
Date: Thurs, Nov 7 2013 3:00 pm
From: "H. Peter Anvin"


On 11/07/2013 02:15 PM, Ard Biesheuvel wrote:
>
> That would involve repurposing/generalizing a bit more of the existing
> x86-only code than I did the first time around, but if you (as x86
> maintainers) are happy with that, I'm all for it.
>
> I do have a couple of questions then
> - the module aliases host tool has no arch specific dependencies at
> all except having x86cpu as one of the entries: would you mind
> dropping the x86 prefix there? Or rather add dependencies on $ARCH?
> (If we drop it there, we basically end up with 'cpu:' everywhere)

I think it makes sense to indicate what kind of CPU the string refers
to, as the top-level indicator of what is going on. It might be
possible to macroize the generation of this prefix, though.
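As background, the existing x86-only mechanism being generalized here looks
roughly like this in a driver (module name and feature bit chosen purely for
illustration):

#include <linux/module.h>
#include <asm/cpufeature.h>
#include <asm/cpu_device_id.h>

/* match any vendor/family/model, but require the AES-NI feature bit */
static const struct x86_cpu_id sample_cpu_ids[] = {
	X86_FEATURE_MATCH(X86_FEATURE_AES),
	{}
};
MODULE_DEVICE_TABLE(x86cpu, sample_cpu_ids);

modpost turns this table into an "x86cpu:..." module alias, the kernel emits
a matching MODALIAS uevent for the CPU device, and udev loads the module. The
question in this thread is whether that prefix (and the vendor/family/model
fields) should become a generic "cpu:" form usable by other architectures.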

> - in the vendor/family/model case, it may be preferable to drop these
> fields entirely from certain modules' aliases if they match on 'any'
> (provided that the module tools permit this) rather than add
> architecture, variant, revision, etc fields for all architectures if
> they can only ever match on one

I think that can be CPU dependent.

> - some of the X86_ macros would probably be redefined in terms of the
> generic macros rather than the other way around, which would result in
> some changes under arch/x86 as well, is that acceptable for you?

If you are talking about X86_FEATURE_* then almost certainly no,
although I'm willing to listen to what you have in mind.

-hpa


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/





==============================================================================
TOPIC: ARM: Introduce CPU_METHOD_OF_DECLARE() for cpu hotplug/smp
http://groups.google.com/group/linux.kernel/t/b1f45f3038c24409?hl=en
==============================================================================

== 1 of 1 ==
Date: Thurs, Nov 7 2013 2:40 pm
From: Stephen Boyd


On 11/06/13 17:50, Josh Cartwright wrote:
> On Fri, Nov 01, 2013 at 03:08:53PM -0700, Stephen Boyd wrote:
>> diff --git a/arch/arm/kernel/devtree.c b/arch/arm/kernel/devtree.c
>> index f35906b..71a8592 100644
>> --- a/arch/arm/kernel/devtree.c
>> +++ b/arch/arm/kernel/devtree.c
>> @@ -25,6 +25,7 @@
>> #include <asm/smp_plat.h>
>> #include <asm/mach/arch.h>
>> #include <asm/mach-types.h>
>> +#include <asm/smp.h>
>>
>> void __init early_init_dt_add_memory_arch(u64 base, u64 size)
>> {
>> @@ -63,6 +64,36 @@ void __init arm_dt_memblock_reserve(void)
>> }
>> }
>>
>> +#ifdef CONFIG_SMP
>> +extern struct of_cpu_method __cpu_method_of_table[];
>> +
>> +static const struct of_cpu_method __cpu_method_of_table_sentinel
>> + __used __section(__cpu_method_of_table_end);
> Having a sentinel allocated into the linked image makes a lot of sense
> in other cases (IRQCHIP/CLOCKSOURCE_OF_DECLARE, etc), where it's used to
> terminate an of_device_id table (as is expected by of_match_table and
> friends).
>
> In this case, however, you aren't building a match table, so having a
> sentinel allocated isn't necessary. I'd suggest bookending the table
> with a VMLINUX_SYMBOL(__cpu_method_of_table_end) instead.
>
> A whole 2 pointers worth of savings!
>

Yes, will do. Thanks.
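For illustration, the bookending Josh suggests is the usual linker-table
pattern: walk from a start symbol to an end symbol instead of stopping at a
NULL sentinel. A generic sketch (the symbol and field names are assumptions,
not taken from the patch; struct of_cpu_method comes from the series under
discussion):

#include <linux/string.h>	/* strcmp() */

extern const struct of_cpu_method __cpu_method_of_table[];
extern const struct of_cpu_method __cpu_method_of_table_end[];

static const struct of_cpu_method *find_cpu_method(const char *name)
{
	const struct of_cpu_method *m;

	for (m = __cpu_method_of_table; m < __cpu_method_of_table_end; m++)
		if (!strcmp(m->method, name))	/* ".method" field assumed */
			return m;

	return NULL;
}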

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/





==============================================================================
TOPIC: Introducing Device Tree Overlays
http://groups.google.com/group/linux.kernel/t/7c26bee9afcc62d1?hl=en
==============================================================================

== 1 of 2 ==
Date: Thurs, Nov 7 2013 3:00 pm
From: Guenter Roeck


On Thu, Nov 07, 2013 at 10:06:31PM +0200, Pantelis Antoniou wrote:
> Hi Sebastian,
>
> On Nov 7, 2013, at 9:25 PM, Sebastian Andrzej Siewior wrote:
>
> > On 06.11.13, Guenter Roeck wrote:
> > |…
> > thanks for the explanation.
> >
> >> We use DT overlays to describe the hardware on those boards and, if necessary,
> >> its configuration. For example, if there is a PCIe switch, the overlay would
> >> describe its memory and bus number configuration.
> >
> > So have your "fix" configuration and a few overlays you switch at
> > runtime. The problem you have is that you want to switch a specific part
> > of your configuration at runtime. I assume you run DT on ARM. What

powerpc

> > happens if you switch from ARM to x86 and you "keep" your FPGA
> > configuration requirement? You can't use both, DT and ACPI, right? So
> > what happens then?
> >
>
> FWIW DT has been ported to x86. And is present on arm/powerpc/mips/arc and possibly
> others.
>
> So what are we talking about again? If you care about the non-DT case, why
> don't you make a patch about how you could support Guenter's use case on
> the x86.
>
> His use case is not uncommon, believe it or not, and x86 would benefit from
> something this flexible.
>
Together with the work Thierry has done a couple of years ago, using DT
to augment ACPI data on x86 platforms, I don't really see a major problem
with using DT overlays on x86. Sure, it will require some work, and the
resulting patches may not be accepted for upstream integration,
but the concept is already there, and we plan to make good use of it.

Guenter
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/




== 2 of 2 ==
Date: Thurs, Nov 7 2013 3:10 pm
From: Guenter Roeck


On Thu, Nov 07, 2013 at 09:46:26PM +0100, Sebastian Andrzej Siewior wrote:
> On 07.11.13, Pantelis Antoniou wrote:
> > Hi Sebastian,
> Hi Pantelis,
>
> > FWIW DT has been ported to x86. And is present on arm/powerpc/mips/arc and possibly
> > others.
>
> Yes, I know. I am the one that did the work for CE4100, the first one
> that boots with DT on x86.
>
> > So what are we talking about again? If you care about the non-DT case, why
> > don't you make a patch about how you could support Guenter's use case on
> > the x86.
>
> I am only saying that this "hot-plug a device at a non hot-pluggable bus at
> runtime" is not limited to DT but this solution is. X86 + ACPI is not
> the only limitation. ARM is (forced) going to ACPI as well, as far as I
> know. And this solution is limited to DT. This is what I am pointing
> out.
>
I can't tell about ARM, but I am not entirely sure how ACPI support on ARM
is going to help us on powerpc.

> > His use case is not uncommon, believe it or not, and x86 would benefit from
> > something this flexible.
>
> I *think* a more flexible solution would be something like bus_type which is
> exposed via configfs. It would be attached behind a certain device/bus where
> the "physical" hotplug interface is. The user would then be able to read the
> configuration based on whatever information he has and could then create
> devices he likes at runtime. This wouldn't depend much on the firmware that is
> used but would require a little more work I think.
>
Quite frankly, I am interested in a solution that works and solves our problems.
I am not looking for something that is 100% perfect and may never be delivered.

Fortunately, the Linux kernel was willing to adopt multiple different file
systems, and still accepts new ones on a regular basis. If a new file system
is better, it will start getting used, and old file systems are being phased out
as fewer people use them. I would hope the same should be possible with DT
overlays and possibly other future solutions for the same problem, and that
we won't have to wait for the perfect solution from day 1.

Guenter
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/





==============================================================================
TOPIC: ext4: Fix reading of extended tv_sec (bug 23732)
http://groups.google.com/group/linux.kernel/t/1c061384ec63b78b?hl=en
==============================================================================

== 1 of 2 ==
Date: Thurs, Nov 7 2013 3:00 pm
From: David Turner


On Thu, 2013-11-07 at 17:03 +0100, Jan Kara wrote:
> So I'm somewhat wondering: Previously we decoded tv_nsec regardless of
> tv_sec size. After your patch we do it only if sizeof(time->tv_sec) > 4. Is
> this an intended change? Why is it OK?

This is an error. Here is a corrected version of the patch.


--

In ext4, the bottom two bits of {a,c,m}time_extra are used to extend
the {a,c,m}time fields, deferring the year 2038 problem to the year
2446. The representation (which this patch does not alter) is a bit
hackish, in that the most-significant bit is no longer (alone)
sufficient to indicate the sign. That's because we're representing an
asymmetric range, with seven times as many positive values as
negative.

When decoding these extended fields, for times whose bottom 32 bits
would represent a negative number, sign extension causes the 64-bit
extended timestamp to be negative as well, which is not what's
intended. This patch corrects that issue, so that the only negative
{a,c,m}times are those between 1901 and 1970 (as per 32-bit signed
timestamps).

Signed-off-by: David Turner <novalis@novalis.org>
Reported-by: Mark Harris <mh8928@yahoo.com>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=23732
---
fs/ext4/ext4.h | 13 +++++++++----
1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index af815ea..3c2d0b3 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -722,10 +722,15 @@ static inline __le32 ext4_encode_extra_time(struct timespec *time)

static inline void ext4_decode_extra_time(struct timespec *time, __le32 extra)
{
- if (sizeof(time->tv_sec) > 4)
- time->tv_sec |= (__u64)(le32_to_cpu(extra) & EXT4_EPOCH_MASK)
- << 32;
- time->tv_nsec = (le32_to_cpu(extra) & EXT4_NSEC_MASK) >> EXT4_EPOCH_BITS;
+ if (sizeof(time->tv_sec) > 4) {
+ u64 extra_bits = (__u64)(le32_to_cpu(extra) & EXT4_EPOCH_MASK);
+ if (time->tv_sec > 0 || extra_bits != EXT4_EPOCH_MASK) {
+ time->tv_sec &= 0xFFFFFFFF;
+ time->tv_sec |= extra_bits << 32;
+ }
+ }
+ time->tv_nsec = (le32_to_cpu(extra) & EXT4_NSEC_MASK) >>
+ EXT4_EPOCH_BITS;
}

#define EXT4_INODE_SET_XTIME(xtime, inode, raw_inode) \
--
1.8.1.2



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/




== 2 of 2 ==
Date: Thurs, Nov 7 2013 3:20 pm
From: Jan Kara


On Thu 07-11-13 17:54:24, David Turner wrote:
> On Thu, 2013-11-07 at 17:03 +0100, Jan Kara wrote:
> > So I'm somewhat wondering: Previously we decoded tv_nsec regardless of
> > tv_sec size. After your patch we do it only if sizeof(time->tv_sec) > 4. Is
> > this an intended change? Why is it OK?
>
> This is an error. Here is a corrected version of the patch.
>
>
> --
>
> In ext4, the bottom two bits of {a,c,m}time_extra are used to extend
> the {a,c,m}time fields, deferring the year 2038 problem to the year
> 2446. The representation (which this patch does not alter) is a bit
> hackish, in that the most-significant bit is no longer (alone)
> sufficient to indicate the sign. That's because we're representing an
> asymmetric range, with seven times as many positive values as
> negative.
>
> When decoding these extended fields, for times whose bottom 32 bits
> would represent a negative number, sign extension causes the 64-bit
> extended timestamp to be negative as well, which is not what's
> intended. This patch corrects that issue, so that the only negative
> {a,c,m}times are those between 1901 and 1970 (as per 32-bit signed
> timestamps).
>
> Signed-off-by: David Turner <novalis@novalis.org>
> Reported-by: Mark Harris <mh8928@yahoo.com>
> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=23732
> ---
> fs/ext4/ext4.h | 13 +++++++++----
> 1 file changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index af815ea..3c2d0b3 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -722,10 +722,15 @@ static inline __le32 ext4_encode_extra_time(struct timespec *time)
>
> static inline void ext4_decode_extra_time(struct timespec *time, __le32 extra)
> {
> - if (sizeof(time->tv_sec) > 4)
> - time->tv_sec |= (__u64)(le32_to_cpu(extra) & EXT4_EPOCH_MASK)
> - << 32;
> - time->tv_nsec = (le32_to_cpu(extra) & EXT4_NSEC_MASK) >> EXT4_EPOCH_BITS;
> + if (sizeof(time->tv_sec) > 4) {
> + u64 extra_bits = (__u64)(le32_to_cpu(extra) & EXT4_EPOCH_MASK);
^^^^
Still unnecessary type cast here (but that's a cosmetic issue).

Otherwise the patch looks good. You can add:
Reviewed-by: Jan Kara <jack@suse.cz>

Honza

> + if (time->tv_sec > 0 || extra_bits != EXT4_EPOCH_MASK) {
> + time->tv_sec &= 0xFFFFFFFF;
> + time->tv_sec |= extra_bits << 32;
> + }
> + }
> + time->tv_nsec = (le32_to_cpu(extra) & EXT4_NSEC_MASK) >>
> + EXT4_EPOCH_BITS;
> }
>
> #define EXT4_INODE_SET_XTIME(xtime, inode, raw_inode) \
> --
> 1.8.1.2
>
>
>
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/





==============================================================================
TOPIC: LEDS: tca6507 - fix bugs in parsing of device-tree configuration.
http://groups.google.com/group/linux.kernel/t/1eb3aff0df8d6fb5?hl=en
==============================================================================

== 1 of 1 ==
Date: Thurs, Nov 7 2013 3:00 pm
From: Bryan Wu


On Thu, Oct 31, 2013 at 7:33 PM, NeilBrown <neilb@suse.de> wrote:
>
>
> 1/ The led_info array must be allocated to allow the full number
> > of LEDs even if not all are present. The array may be sparsely
> filled but it is indexed by device address so we must at least
> allocate as many slots as the highest address used. It is easiest
> just to allocate all 7.
>
> 2/ range check the 'reg' value properly.
>
> 3/ led.flags must be initialised to zero, else all leds could
> be treated as GPIOs (depending on what happens to be on the stack).
>
> Signed-off-by: NeilBrown <neilb@suse.de>
>

Thanks, queued into devel branch of my tree for 3.14 kernel.

-Bryan

> diff --git a/drivers/leds/leds-tca6507.c b/drivers/leds/leds-tca6507.c
> index 8cc304f36728..f5063f447463 100644
> --- a/drivers/leds/leds-tca6507.c
> +++ b/drivers/leds/leds-tca6507.c
> @@ -682,7 +682,7 @@ tca6507_led_dt_init(struct i2c_client *client)
> return ERR_PTR(-ENODEV);
>
> tca_leds = devm_kzalloc(&client->dev,
> - sizeof(struct led_info) * count, GFP_KERNEL);
> + sizeof(struct led_info) * NUM_LEDS, GFP_KERNEL);
> if (!tca_leds)
> return ERR_PTR(-ENOMEM);
>
> @@ -695,9 +695,9 @@ tca6507_led_dt_init(struct i2c_client *client)
> of_get_property(child, "label", NULL) ? : child->name;
> led.default_trigger =
> of_get_property(child, "linux,default-trigger", NULL);
> -
> + led.flags = 0;
> ret = of_property_read_u32(child, "reg", &reg);
> - if (ret != 0)
> + if (ret != 0 || reg < 0 || reg >= NUM_LEDS)
> continue;
>
> tca_leds[reg] = led;
> @@ -708,7 +708,7 @@ tca6507_led_dt_init(struct i2c_client *client)
> return ERR_PTR(-ENOMEM);
>
> pdata->leds.leds = tca_leds;
> - pdata->leds.num_leds = count;
> + pdata->leds.num_leds = NUM_LEDS;
>
> return pdata;
> }
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/





==============================================================================
TOPIC: staging: zsmalloc: Ensure handle is never 0 on success
http://groups.google.com/group/linux.kernel/t/56e7b28ac067dee2?hl=en
==============================================================================

== 1 of 1 ==
Date: Thurs, Nov 7 2013 3:00 pm
From: Olav Haugan


On 11/6/2013 7:06 PM, Greg KH wrote:
> On Wed, Nov 06, 2013 at 04:00:02PM -0800, Olav Haugan wrote:
>> On 11/5/2013 5:56 PM, Greg KH wrote:
>>> On Tue, Nov 05, 2013 at 04:54:12PM -0800, Olav Haugan wrote:
>>>> zsmalloc encodes a handle using the page pfn and an object
>>>> index. On some hardware platforms the pfn could be 0 and this
>>>> causes the encoded handle to be 0 which is interpreted as an
>>>> allocation failure.
>>>
>>> What platforms specifically have this issue?
>>
>> Currently some of Qualcomm SoC's have physical memory that starts at
>> address 0x0 which causes this problem.
>
> Then say this, and list the exact SoC's that can have this problem so
> people know how to evaluate the bugfix and see if it is relevant for
> their systems.
>
>> I believe this could be a problem
>> on any platform if memory is configured to start at physical
>> address 0x0 on such platforms.
>
> Have you seen this be a problem? So it's just a theoretical issue at
> this point in time?

Yes, I can consistently reproduce it. It is not just theoretical.
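For context, the problem described in the quoted changelog comes down to
something like the sketch below (simplified; zsmalloc derives the real shift
from the page and class sizes, the constant here is only illustrative):

#define OBJ_INDEX_BITS	12	/* illustrative value, not zsmalloc's */

static unsigned long encode_handle(unsigned long pfn, unsigned long obj_idx)
{
	return (pfn << OBJ_INDEX_BITS) | obj_idx;
}

/*
 * If physical memory starts at address 0x0, the first page has pfn == 0 and
 * its first object has obj_idx == 0, so the encoded handle is 0, which
 * callers of zs_malloc() treat as an allocation failure.
 */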

Thanks,

Olav Haugan

--
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/





==============================================================================
TOPIC: kmod: Run usermodehelpers only on cpus allowed for kthreadd V2
http://groups.google.com/group/linux.kernel/t/a1700157e52e62ff?hl=en
==============================================================================

== 1 of 1 ==
Date: Thurs, Nov 7 2013 3:00 pm
From: Frederic Weisbecker


On Thu, Nov 07, 2013 at 04:43:11PM +0000, Christoph Lameter wrote:
> usermodehelper() threads can currently run on all processors.
> This is an issue for low-latency cores. Spawning a new thread causes
> cpu holdoffs in the range of hundreds of microseconds to a few
> milliseconds. Not good for cores on which processes run that need
> to react as fast as possible.
>
> kthreadd threads can be restricted using taskset to a limited set
> of processors. Then the kernel thread pool will not fork processes
> on those anymore, thereby protecting those processors from additional
> latencies.
>
> Make usermodehelper() threads obey the limitations that kthreadd is
> restricted to. Kthreadd is not the parent of usermodehelper threads,
> so we need to explicitly get the allowed processors for kthreadd.
>
> Before this patch there is no way to limit the cpus that usermodehelper
> threads can run on, since their affinity is set to all processors when
> each thread is spawned.
>
> Signed-off-by: Christoph Lameter <cl@linux.com>
>
> Index: linux/include/linux/kthread.h
> ===================================================================
> --- linux.orig/include/linux/kthread.h 2013-11-07 10:31:46.667807582 -0600
> +++ linux/include/linux/kthread.h 2013-11-07 10:31:46.663807693 -0600
> @@ -51,6 +51,7 @@ void kthread_parkme(void);
> int kthreadd(void *unused);
> extern struct task_struct *kthreadd_task;
> extern int tsk_fork_get_node(struct task_struct *tsk);
> +void set_kthreadd_affinity(void);
>
> /*
> * Simple work processor based on kthread.
> Index: linux/kernel/kmod.c
> ===================================================================
> --- linux.orig/kernel/kmod.c 2013-11-07 10:31:46.667807582 -0600
> +++ linux/kernel/kmod.c 2013-11-07 10:35:28.825645008 -0600
> @@ -40,6 +40,7 @@
> #include <linux/ptrace.h>
> #include <linux/async.h>
> #include <asm/uaccess.h>
> +#include <linux/kthread.h>
>
> #include <trace/events/module.h>
>
> @@ -209,8 +210,13 @@ static int ____call_usermodehelper(void
> flush_signal_handlers(current, 1);
> spin_unlock_irq(&current->sighand->siglock);
>
> - /* We can run anywhere, unlike our parent keventd(). */
> - set_cpus_allowed_ptr(current, cpu_all_mask);
> + /*
> + * Kthreadd can be restricted to a set of processors if the user wants to
> + * protect other processors from OS latencies. If that has happened then
> + * we do not want to disturb the other processors here either so we start
> + * the usermode helper threads only on the processors allowed for kthreadd.
> + */
> + set_kthreadd_affinity();

I'm sorry to press on this again, but this looks too odd.

usermodehelper work items are created via workqueues, right? And workqueues are an issue as
well for those who want CPU isolation.

So this looks like a more general problem than just call_usermodehelper.

Last time I checked, this seemed to be an unbound workqueue. If so, can't we tune
the affinity of that workqueue? If not, perhaps that's something we want to solve; it
could be useful for all the users who don't want their CPUs to be disturbed.

Thanks.
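
For illustration, a minimal sketch of what the set_kthreadd_affinity() helper
declared in the quoted hunk might look like (the kthread.c side of the patch
is not quoted above, so this is an assumption rather than the actual
implementation):

	void set_kthreadd_affinity(void)
	{
		/*
		 * Inherit whatever CPU mask the administrator gave kthreadd
		 * (PID 2), e.g. via "taskset -pc 0-2 2", so that usermode
		 * helper threads stay off the isolated processors.
		 */
		set_cpus_allowed_ptr(current, &kthreadd_task->cpus_allowed);
	}
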
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/





==============================================================================
TOPIC: dmaengine: add msm bam dma driver
http://groups.google.com/group/linux.kernel/t/eb7ef92f0916cf63?hl=en
==============================================================================

== 1 of 1 ==
Date: Thurs, Nov 7 2013 3:10 pm
From: Andy Gross


On Thu, Oct 31, 2013 at 04:46:21PM -0500, Andy Gross wrote:
> On Thu, Oct 31, 2013 at 10:29:48PM +0530, Vinod Koul wrote:
> > On Fri, Oct 25, 2013 at 03:24:02PM -0500, Andy Gross wrote:
> >
> > This should be sent to dmaengine@vger.kernel.org
>
> I'll add the list when I send the second iteration, or should I send it over
> mid-stream?
>
> > > Add the DMA engine driver for the MSM Bus Access Manager (BAM) DMA controller
> > > found in the MSM 8x74 platforms.
> > >
> > > Each BAM DMA device is associated with a specific on-chip peripheral. Each
> > > channel provides a uni-directional data transfer engine that is capable of
> > > transferring data between the peripheral and system memory (System mode) or
> > > between two peripherals (BAM2BAM).
> > >
> > > The initial release of this driver only supports slave transfers between
> > > peripherals and system memory.
> > >
> > > Signed-off-by: Andy Gross <agross@codeaurora.org>
> > > +/*
> > > + * bam_alloc_chan - Allocate channel resources for DMA channel.
> > > + * @chan: specified channel
> > > + *
> > > + * This function allocates the FIFO descriptor memory and resets the channel
> > > + */
> > > +static int bam_alloc_chan(struct dma_chan *chan)
> > > +{
> > > + struct bam_chan *bchan = to_bam_chan(chan);
> > > + struct bam_device *bdev = bchan->device;
> > > + u32 val;
> > > + union bam_pipe_ctrl pctrl;
> > > +
> > > + /* check for channel activity */
> > > + pctrl.value = ioread32(bdev->regs + BAM_P_CTRL(bchan->id));
> > > + if (pctrl.bits.p_en) {
> > > + dev_err(bdev->dev, "channel already active\n");
> > > + return -EINVAL;
> > > + }
> > > +
> > > + /* allocate FIFO descriptor space */
> > > + bchan->fifo_virt = (struct bam_desc_hw *)dma_alloc_coherent(bdev->dev,
> > > + BAM_DESC_FIFO_SIZE, &bchan->fifo_phys,
> > > + GFP_KERNEL);
> > > +
> > > + if (!bchan->fifo_virt) {
> > > + dev_err(bdev->dev, "Failed to allocate descriptor fifo\n");
> > > + return -ENOMEM;
> > > + }
> > > +
> > > + /* reset channel */
> > > + iowrite32(1, bdev->regs + BAM_P_RST(bchan->id));
> > > + iowrite32(0, bdev->regs + BAM_P_RST(bchan->id));
> > > +
> > > + /* configure fifo address/size in bam channel registers */
> > > + iowrite32(bchan->fifo_phys, bdev->regs +
> > > + BAM_P_DESC_FIFO_ADDR(bchan->id));
> > > + iowrite32(BAM_DESC_FIFO_SIZE, bdev->regs +
> > > + BAM_P_FIFO_SIZES(bchan->id));
> > > +
> > > + /* unmask and enable interrupts for defined EE, bam and error irqs */
> > > + iowrite32(BAM_IRQ_MSK, bdev->regs + BAM_IRQ_SRCS_EE(bchan->ee));
> > > +
> > > + /* enable the per pipe interrupts, enable EOT and INT irqs */
> > > + iowrite32(P_DEFAULT_IRQS_EN, bdev->regs + BAM_P_IRQ_EN(bchan->id));
> > > +
> > > + /* unmask the specific pipe and EE combo */
> > > + val = ioread32(bdev->regs + BAM_IRQ_SRCS_MSK_EE(bchan->ee));
> > > + val |= 1 << bchan->id;
> > > + iowrite32(val, bdev->regs + BAM_IRQ_SRCS_MSK_EE(bchan->ee));
> > > +
> > > + /* set fixed direction and mode, then enable channel */
> > I was going to question why you are doing hw-specific stuff in alloc channel, but
> > now why you enable the channel here seems to be the bigger question in my mind.
>
> The fifo_virt is used to store the hardware descriptors that are used directly
> by the dma controller. I have to at least fill in the descriptor FIFO address
> and size and reset the channel to clear the fifo offset inside the hardware.
> After reset the internal fifo offset is 0. And every subsequent transaction
> increments this. That is how it knows which descriptors to work on inside the
> descriptor fifo memory.
>
> I can definitely defer the rest of the h/w interactions until the point that I
> need to actually kick off the dma controller.
>
>
> > > + pctrl.value = 0;
> > > + pctrl.bits.p_direction =
> > > + (bchan->bam_slave.slave.direction == DMA_DEV_TO_MEM) ?
> > > + BAM_PIPE_PRODUCER : BAM_PIPE_CONSUMER;
> > > + pctrl.bits.p_sys_mode = BAM_PIPE_MODE_SYSTEM;
> > > + pctrl.bits.p_en = 1;
> > > +
> > > + iowrite32(pctrl.value, bdev->regs + BAM_P_CTRL(bchan->id));
> > > +
> > > + /* set desc threshold */
> > > + /* do bookkeeping for tracking used EEs, used during IRQ handling */
> > > + set_bit(bchan->ee, &bdev->enabled_ees);
> > > +
> > > + bchan->head = 0;
> > > + bchan->tail = 0;
> > > +
> > > + return 0;
> > You said you are going to allocate descriptors, so the right thing would be to
> > return the number of allocated descriptors here!
>
> OK, I missed that.
>
> > > +}
> > > +
> > > +/*
> > > + * bam_free_chan - Frees dma resources associated with specific channel
> > > + * @chan: specified channel
> > > + *
> > > + * Free the allocated fifo descriptor memory and channel resources
> > > + *
> > > + */
> > > +static void bam_free_chan(struct dma_chan *chan)
> > > +{
> > > + struct bam_chan *bchan = to_bam_chan(chan);
> > > + struct bam_device *bdev = bchan->device;
> > > + u32 val;
> > > +
> > Shouldn't you check if channel is busy?
> >
>
> Yes, I'll add that in. With no return code, how useful is this to the caller?
>
>
> > > + /* reset channel */
> > > + iowrite32(1, bdev->regs + BAM_P_RST(bchan->id));
> > > + iowrite32(0, bdev->regs + BAM_P_RST(bchan->id));
> > > +
> > > + dma_free_coherent(bdev->dev, BAM_DESC_FIFO_SIZE, bchan->fifo_virt,
> > > + bchan->fifo_phys);
> > > +
> > > + /* mask irq for pipe/channel */
> > > + val = ioread32(bdev->regs + BAM_IRQ_SRCS_MSK_EE(bchan->ee));
> > > + val &= ~(1 << bchan->id);
> > > + iowrite32(val, bdev->regs + BAM_IRQ_SRCS_MSK_EE(bchan->ee));
> > > +
> > > + /* disable irq */
> > > + iowrite32(0, bdev->regs + BAM_P_IRQ_EN(bchan->id));
> > > +
> > > + clear_bit(bchan->ee, &bdev->enabled_ees);
> > > +}
> > > +
> > > +/*
> > > + * bam_slave_config - set slave configuration for channel
> > > + * @chan: dma channel
> > > + * @cfg: slave configuration
> > > + *
> > > + * Sets slave configuration for channel
> > > + * Only allow setting direction once. BAM channels are unidirectional
> > > + * and the direction is set in hardware.
> > > + *
> > > + */
> > > +static void bam_slave_config(struct bam_chan *bchan,
> > > + struct bam_dma_slave_config *bcfg)
> >
> > > +{
> > > + struct bam_device *bdev = bchan->device;
> > > +
> > > + bchan->bam_slave.desc_threshold = bcfg->desc_threshold;
> > what does the desc_threshold mean?
>
> The desc threshold determines the minimum number of descriptor bytes that
> causes a write event to be communicated to the peripheral. I default to 4 bytes
> (1 descriptor), but this is configurable through the DMA_SLAVE_CONFIG interface
> using the extended slave_config structure.
>
> > > +
> > > + /* set desc threshold */
> > > + iowrite32(bcfg->desc_threshold, bdev->regs + BAM_DESC_CNT_TRSHLD);
> > > +}
> > > +
> > > +/*
> > > + * bam_start_dma - loads up descriptors and starts dma
> > > + * @chan: dma channel
> > > + *
> > > + * Loads descriptors into descriptor fifo and starts dma controller
> > > + *
> > > + * NOTE: Must hold channel lock
> > > +*/
> > > +static void bam_start_dma(struct bam_chan *bchan)
> > > +{
> > > + struct bam_device *bdev = bchan->device;
> > > + struct bam_async_desc *async_desc, *_adesc;
> > > + u32 curr_len, val;
> > > + u32 num_processed = 0;
> > > +
> > > + if (list_empty(&bchan->pending))
> > > + return;
> > > +
> > > + curr_len = (bchan->head <= bchan->tail) ?
> > > + bchan->tail - bchan->head :
> > > + MAX_DESCRIPTORS - bchan->head + bchan->tail;
> > > +
> > > + list_for_each_entry_safe(async_desc, _adesc, &bchan->pending, node) {
> > > +
> > > + /* bust out if we are out of room */
> > > + if (async_desc->num_desc + curr_len > MAX_DESCRIPTORS)
> > > + break;
> > > +
> > > + /* copy descriptors into fifo */
> > > + if (bchan->tail + async_desc->num_desc > MAX_DESCRIPTORS) {
> > > + u32 partial = MAX_DESCRIPTORS - bchan->tail;
> > > +
> > > + memcpy(&bchan->fifo_virt[bchan->tail], async_desc->desc,
> > > + partial * sizeof(struct bam_desc_hw));
> > > + memcpy(bchan->fifo_virt, &async_desc->desc[partial],
> > > + (async_desc->num_desc - partial) *
> > > + sizeof(struct bam_desc_hw));
> > memcpy for device memory copy?
>
> My initial thought was that I needed to wait until now to fill in the
> descriptors in the fifo that was allocated. The fifo memory is accessed by
> the dma controller. The controller just manages an internal offset that rolls
> over based on the size of the configured fifo memory. I was keeping the
> descriptors on hand until I needed to program them into the fifo memory, hence
> the copy.
>
> > > + } else {
> > > + memcpy(&bchan->fifo_virt[bchan->tail], async_desc->desc,
> > > + async_desc->num_desc *
> > > + sizeof(struct bam_desc_hw));
> > > + }
> > > +
> > > + num_processed++;
> > > + bchan->tail += async_desc->num_desc;
> > > + bchan->tail %= MAX_DESCRIPTORS;
> > > + curr_len += async_desc->num_desc;
> > > +
> > > + list_move_tail(&async_desc->node, &bchan->active);
> > > + }
> > > +
> > > + /* bail if we didn't queue anything to the active queue */
> > > + if (!num_processed)
> > > + return;
> > > +
> > > + async_desc = list_first_entry(&bchan->active, struct bam_async_desc,
> > > + node);
> > > +
> > > + val = ioread32(bdev->regs + BAM_P_SW_OFSTS(bchan->id));
> > > + val &= P_SW_OFSTS_MASK;
> > > +
> > > + /* kick off dma by forcing a write event to the pipe */
> > > + iowrite32((bchan->tail * sizeof(struct bam_desc_hw)),
> > > + bdev->regs + BAM_P_EVNT_REG(bchan->id));
> > > +}
> > > +
> > > +/*
> > > + * bam_tx_submit - Adds transaction to channel pending queue
> > > + * @tx: transaction to queue
> > > + *
> > > + * Adds dma transaction to pending queue for channel
> > > + *
> > > +*/
> > > +static dma_cookie_t bam_tx_submit(struct dma_async_tx_descriptor *tx)
> > > +{
> > > + struct bam_chan *bchan = to_bam_chan(tx->chan);
> > > + struct bam_async_desc *desc = to_bam_async_desc(tx);
> > > + dma_cookie_t cookie;
> > > +
> > > + spin_lock_bh(&bchan->lock);
> > > +
> > > + cookie = dma_cookie_assign(tx);
> > > + list_add_tail(&desc->node, &bchan->pending);
> > > +
> > > + spin_unlock_bh(&bchan->lock);
> > > +
> > > + return cookie;
> > > +}
> > Okay, you are *NOT* using the virt-dma layer; all of this can be removed if you
> > do. This is similar to what vchan_tx_submit() does!
> >
>
> I'll look into reworking and utilizing the virt-dma layer.
>

Is it acceptable to pick and choose the pieces of the virt-dma layer that
benefit me, or do I have to absorb all of it? The smaller helper structs and
functions are fine, but in some cases they force a certain implementation.

The bam_tx_submit and perhaps the cleanup functions are about the only pieces
I'd be able to leverage from the virt-dma, aside from the structures.

The main issue with the rest of the code is that it doesn't fit the behavior of
my dma controller. Because I have a fixed-size circular buffer for my
descriptor FIFO, I have to copy in the new descriptors before I start up the
dma channel.

Using vchan_cookie_complete() forces me to start the next transaction within
the interrupt, or schedule another tasklet to do that work for me. The overhead
of copying what could be a large number of descriptors into the FIFO might
introduce too much latency inside the IRQ handler (especially since this is done
to non-cached memory). Using another tasklet just for restarting the dma
controller seems clunky to me.

I would rather start the next transaction within the tasklet queued from the IRQ
(vchan_cookie_complete), but because it uses its own tasklet, I wouldn't be
able to leverage that.

The vchan_cookie_complete() usage within the IRQ handler also implies that only one
dma transaction is completed per IRQ. That might not be the case for me,
because I can advance the DMA FIFO offset to tell the controller to keep going
to the next transaction. By the time I get to servicing the IRQ, I might have
completed more than one transaction. I suppose you could call
vchan_cookie_complete() multiple times, but that seems wrong to me due to the
multiple calls to tasklet_schedule.
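
To make the more-than-one-completion-per-IRQ point concrete, a rough sketch of
how the handler could account for completed descriptors from the hardware
offset register (register and field names come from the driver above; the
arithmetic is illustrative, not the actual IRQ handler):

	/* How far has the hardware advanced in the descriptor FIFO? */
	offset = ioread32(bdev->regs + BAM_P_SW_OFSTS(bchan->id)) & P_SW_OFSTS_MASK;
	hw_idx = offset / sizeof(struct bam_desc_hw);

	/*
	 * Everything between the old head and hw_idx has completed. That
	 * range can span several queued transactions, so more than one
	 * cookie may need to be completed for a single interrupt.
	 */
	done = (hw_idx >= bchan->head) ? hw_idx - bchan->head :
			MAX_DESCRIPTORS - bchan->head + hw_idx;
	bchan->head = hw_idx;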


> > > +
> > > +/*
> > > + * bam_prep_slave_sg - Prep slave sg transaction
> > > + *
> > > + * @chan: dma channel
> > > + * @sgl: scatter gather list
> > > + * @sg_len: length of sg
> > > + * @direction: DMA transfer direction
> > > + * @flags: DMA flags
> > > + * @context: transfer context (unused)
> > > + */
> > > +static struct dma_async_tx_descriptor *bam_prep_slave_sg(struct dma_chan *chan,
> > > + struct scatterlist *sgl, unsigned int sg_len,
> > > + enum dma_transfer_direction direction, unsigned long flags,
> > > + void *context)
> > > +{
> > > + struct bam_chan *bchan = to_bam_chan(chan);
> > > + struct bam_device *bdev = bchan->device;
> > > + struct bam_async_desc *async_desc = NULL;
> > > + struct scatterlist *sg;
> > > + u32 i;
> > > + struct bam_desc_hw *desc;
> > > +
> > > +
> > > + if (!is_slave_direction(direction)) {
> > > + dev_err(bdev->dev, "invalid dma direction\n");
> > > + goto err_out;
> > > + }
> > > +
> > > + /* direction has to match pipe configuration from the slave config */
> > > + if (direction != bchan->bam_slave.slave.direction) {
> > > + dev_err(bdev->dev,
> > > + "trans does not match channel configuration\n");
> > > + goto err_out;
> > > + }
> > > +
> > > + /* make sure number of descriptors will fit within the fifo */
> > > + if (sg_len > MAX_DESCRIPTORS) {
> > > + dev_err(bdev->dev, "not enough fifo descriptor space\n");
> > > + goto err_out;
> > > + }
> > What prevents you from assigning more virtual descriptors in this case and then
> > submitting those after the HW descriptors are done?
>
> I hadn't considered the virtual descriptors. I'll see what I can do to utilize
> that and rework this. I can see the virtual descriptors simplifying this a good
> bit.
>
> > > +
> > > + /* allocate enough room to accommodate the number of entries */
> > > + async_desc = kzalloc(sizeof(*async_desc) +
> > > + (sg_len * sizeof(struct bam_desc_hw)), GFP_KERNEL);
> > this can be called from a non-sleepy context and the recommendation is to use
> > GFP_NOWAIT for memory allocation
> >
>
> I missed this. I'll change it.
>
> > > +static int bam_control(struct dma_chan *chan, enum dma_ctrl_cmd cmd,
> > > + unsigned long arg)
> > > +{
> > > + struct bam_chan *bchan = to_bam_chan(chan);
> > > + struct bam_device *bdev = bchan->device;
> > > + struct bam_dma_slave_config *bconfig;
> > > + int ret = 0;
> > > +
> > > + switch (cmd) {
> > > + case DMA_PAUSE:
> > > + spin_lock_bh(&bchan->lock);
> > > + iowrite32(1, bdev->regs + BAM_P_HALT(bchan->id));
> > > + spin_unlock_bh(&bchan->lock);
> > > + break;
> > > + case DMA_RESUME:
> > > + spin_lock_bh(&bchan->lock);
> > > + iowrite32(0, bdev->regs + BAM_P_HALT(bchan->id));
> > > + spin_unlock_bh(&bchan->lock);
> > > + break;
> > > + case DMA_TERMINATE_ALL:
> > > + bam_dma_terminate_all(chan);
> > > + break;
> > > + case DMA_SLAVE_CONFIG:
> > > + bconfig = (struct bam_dma_slave_config *)arg;
> > > + bam_slave_config(bchan, bconfig);
> > DMA_SLAVE_CONFIG expects arg to be a struct dma_slave_config, not this. Please
> > don't violate APIs.
>
> So for extended slave_config structures, the caller needs to send in a pointer
> to the dma_slave_config and then I can derive the bam_dma_slave_config from it.
> Will fix.
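
A sketch of how that could look: embed the standard struct in the extended one
and recover the wrapper with container_of(). The helper name below is
hypothetical, and the actual bam_dma_slave_config layout is defined elsewhere
in the patch:

	struct bam_dma_slave_config {
		struct dma_slave_config slave;	/* generic config, embedded */
		u32 desc_threshold;		/* BAM-specific extension */
	};

	static void bam_handle_slave_config(struct bam_chan *bchan,
					    struct dma_slave_config *cfg)
	{
		struct bam_dma_slave_config *bconfig =
			container_of(cfg, struct bam_dma_slave_config, slave);

		bam_slave_config(bchan, bconfig);
	}
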
>
> > > + break;
> > > + default:
> > > + ret = -EIO;
> > I would expect -ENXIO here!
> >
>
> OK.
>
> > > + break;
> > > + }
> > > +
> > > + return ret;
> > > +}
> > > +
> > > +/*
> > > + * bam_tx_status - returns status of transaction
> > > + * @chan: dma channel
> > > + * @cookie: transaction cookie
> > > + * @txstate: DMA transaction state
> > > + *
> > > + * Return status of dma transaction
> > > + */
> > > +static enum dma_status bam_tx_status(struct dma_chan *chan, dma_cookie_t cookie,
> > > + struct dma_tx_state *txstate)
> > > +{
> > > + return dma_cookie_status(chan, cookie, txstate);
> > Hmm, this won't work in many cases for slave usage. For example, if you try to
> > get this working with audio you need to provide the residue values.
> > The results you are providing here will not be useful, so I would recommend
> > fixing it.
> >
>
> Ok so in this case I'd need to read where I am in the descriptor chain and
> calculate the residual. That shouldn't be a problem.
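
As a rough sketch of that residue calculation (the per-descriptor 'size' field
and the 'consumed' count are assumptions for illustration, not code from this
patch):

	static u32 bam_calc_residue(struct bam_async_desc *async_desc, u32 consumed)
	{
		u32 i, residue = 0;

		/* Sum the bytes in descriptors the hardware has not consumed yet. */
		for (i = consumed; i < async_desc->num_desc; i++)
			residue += async_desc->desc[i].size;

		return residue;
	}
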
>
> > > +
> > > +static int bam_dma_probe(struct platform_device *pdev)
> > > +{
> > > + struct bam_device *bdev;
> > > + int err, i;
> > > +
> > > + bdev = kzalloc(sizeof(*bdev), GFP_KERNEL);
> > devm_ pls
> >
>
> will fix.
>
> > > + if (!bdev) {
> > > + dev_err(&pdev->dev, "insufficient memory for private data\n");
> > > + err = -ENOMEM;
> > > + goto err_no_bdev;
> > > + }
> > > +
> > > + bdev->dev = &pdev->dev;
> > > + dev_set_drvdata(bdev->dev, bdev);
> > > +
> > > + bdev->regs = of_iomap(pdev->dev.of_node, 0);
> > > + if (!bdev->regs) {
> > > + dev_err(bdev->dev, "unable to ioremap base\n");
> > > + err = -ENOMEM;
> > > + goto err_free_bamdev;
> > > + }
> > > +
> > > + bdev->irq = irq_of_parse_and_map(pdev->dev.of_node, 0);
> > > + if (bdev->irq == NO_IRQ) {
> > > + dev_err(bdev->dev, "unable to map irq\n");
> > > + err = -EINVAL;
> > > + goto err_unmap_mem;
> > > + }
> > > +
> > > + bdev->bamclk = devm_clk_get(bdev->dev, "bam_clk");
> > > + if (IS_ERR(bdev->bamclk)) {
> > > + err = PTR_ERR(bdev->bamclk);
> > > + goto err_free_irq;
> > > + }
> > > +
> > > + err = clk_prepare_enable(bdev->bamclk);
> > > + if (err) {
> > > + dev_err(bdev->dev, "failed to prepare/enable clock");
> > > + goto err_free_irq;
> > > + }
> > > +
> > > + err = request_irq(bdev->irq, &bam_dma_irq, IRQF_TRIGGER_HIGH, "bam_dma",
> > > + bdev);
> > > + if (err) {
> > > + dev_err(bdev->dev, "error requesting irq\n");
> > > + err = -EINVAL;
> > > + goto err_disable_clk;
> > > + }
> > > +
> > > + if (bam_init(bdev)) {
> > > + dev_err(bdev->dev, "cannot initialize bam device\n");
> > > + err = -EINVAL;
> > > + goto err_disable_clk;
> > > + }
> > > +
> > > + bdev->channels = kzalloc(sizeof(*bdev->channels) * bdev->num_channels,
> > > + GFP_KERNEL);
> > > +
> > > + if (!bdev->channels) {
> > > + dev_err(bdev->dev, "unable to allocate channels\n");
> > > + err = -ENOMEM;
> > > + goto err_disable_clk;
> > > + }
> > > +
> > > + /* allocate and initialize channels */
> > > + INIT_LIST_HEAD(&bdev->common.channels);
> > > +
> > > + for (i = 0; i < bdev->num_channels; i++)
> > > + bam_channel_init(bdev, &bdev->channels[i], i);
> > > +
> > > + /* set max dma segment size */
> > > + bdev->common.dev = bdev->dev;
> > > + bdev->common.dev->dma_parms = &bdev->dma_parms;
> > > + if (dma_set_max_seg_size(bdev->common.dev, BAM_MAX_DATA_SIZE)) {
> > > + dev_err(bdev->dev, "cannot set maximum segment size\n");
> > > + goto err_disable_clk;
> > > + }
> > > +
> > > + /* set capabilities */
> > > + dma_cap_zero(bdev->common.cap_mask);
> > > + dma_cap_set(DMA_SLAVE, bdev->common.cap_mask);
> > > +
> > > + /* initialize dmaengine apis */
> > > + bdev->common.device_alloc_chan_resources = bam_alloc_chan;
> > > + bdev->common.device_free_chan_resources = bam_free_chan;
> > > + bdev->common.device_prep_slave_sg = bam_prep_slave_sg;
> > > + bdev->common.device_control = bam_control;
> > > + bdev->common.device_issue_pending = bam_issue_pending;
> > > + bdev->common.device_tx_status = bam_tx_status;
> > > + bdev->common.dev = bdev->dev;
> > > +
> > > + dma_async_device_register(&bdev->common);
> > > +
> > > + if (pdev->dev.of_node) {
> > > + err = of_dma_controller_register(pdev->dev.of_node,
> > > + bam_dma_xlate, &bdev->common);
> > > +
> > > + if (err) {
> > > + dev_err(bdev->dev, "failed to register of_dma\n");
> > > + goto err_unregister_dma;
> > > + }
> > > + }
> > > +
> > > + return 0;
> > > +
> > > +err_unregister_dma:
> > > + dma_async_device_unregister(&bdev->common);
> > > +err_free_irq:
> > > + free_irq(bdev->irq, bdev);
> > > +err_disable_clk:
> > > + clk_disable_unprepare(bdev->bamclk);
> > > +err_unmap_mem:
> > > + iounmap(bdev->regs);
> > > +err_free_bamdev:
> > > + if (bdev)
> > > + kfree(bdev->channels);
> > > + kfree(bdev);
> > > +err_no_bdev:
> > you need to get yourself introduced to the devm_ friends to ease this part!
> >
> > Overall I think the driver needs a bit more plumbing and also needs to use the
> > virt-dma layer so that a bunch of work already done is not redefined here.
>
> I'll rework for comments and see how I can incorporate the virt-dma.
>
> --
> sent by an employee of the Qualcomm Innovation Center, Inc.
> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
> hosted by The Linux Foundation
>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

--
sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
hosted by The Linux Foundation
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/





==============================================================================
TOPIC: linux-next: reminder
http://groups.google.com/group/linux.kernel/t/6baf83471f093f73?hl=en
==============================================================================

== 1 of 1 ==
Date: Thurs, Nov 7 2013 3:30 pm
From: Stephen Rothwell


Hi all,

Just a reminder to *not* add any v3.14 material to linux-next until after
v3.13-rc1 has been released.

Also, please resist the temptation to rebase your trees before sending
them upstream.

--
Cheers,
Stephen Rothwell sfr@canb.auug.org.au




==============================================================================

You received this message because you are subscribed to the Google Groups "linux.kernel"
group.

To post to this group, visit http://groups.google.com/group/linux.kernel?hl=en

To unsubscribe from this group, send email to linux.kernel+unsubscribe@googlegroups.com

To change the way you get mail from this group, visit:
http://groups.google.com/group/linux.kernel/subscribe?hl=en

To report abuse, send email explaining the problem to abuse@googlegroups.com

==============================================================================
Google Groups: http://groups.google.com/?hl=en
