linux.kernel - 26 new messages in 14 topics - digest
linux.kernel
http://groups.google.com/group/linux.kernel?hl=en
linux.kernel@googlegroups.com
Today's topics:
* BUG: mm, numa: test segfaults, only when NUMA balancing is on - 2 messages,
1 author
http://groups.google.com/group/linux.kernel/t/8b6a8eb2945b92ff?hl=en
* [ext2] XIP does not work on ext2 - 3 messages, 2 authors
http://groups.google.com/group/linux.kernel/t/7324138990336f3c?hl=en
* rbtree: Fix rbtree_postorder_for_each_entry_safe() iterator - 2 messages, 2
authors
http://groups.google.com/group/linux.kernel/t/47bd7300ac52f95d?hl=en
* ARM: pinctrl: Add Broadcom Capri pinctrl driver - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/b0e58ba3e23c2b35?hl=en
* wire up CPU features to udev based module loading - 2 messages, 2 authors
http://groups.google.com/group/linux.kernel/t/7bd137213e7534f5?hl=en
* ARM: add initial support for Marvell Berlin SoCs - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/a3e38d590da25903?hl=en
* of: irq: Fix interrupt-map entry matching - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/2db00bbb83f94fe6?hl=en
* irq_work: Provide a irq work that can be processed on any cpu - 6 messages,
2 authors
http://groups.google.com/group/linux.kernel/t/7bc00cc28be7d4b5?hl=en
* provide estimated available memory in /proc/meminfo - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/3de838d7bfb53c0c?hl=en
* Introducing Device Tree Overlays - 2 messages, 1 author
http://groups.google.com/group/linux.kernel/t/7c26bee9afcc62d1?hl=en
* platform: add chrome platform directory - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/ca8784c0df1499d1?hl=en
* MCS Lock: Barrier corrections - 2 messages, 2 authors
http://groups.google.com/group/linux.kernel/t/c949c96528a47270?hl=en
* usb-serial lockdep trace in linus' current tree. - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/10ab65d5a0e4e1ea?hl=en
* ARM: Introduce CPU_METHOD_OF_DECLARE() for cpu hotplug/smp - 1 messages, 1
author
http://groups.google.com/group/linux.kernel/t/b1f45f3038c24409?hl=en
==============================================================================
TOPIC: BUG: mm, numa: test segfaults, only when NUMA balancing is on
http://groups.google.com/group/linux.kernel/t/8b6a8eb2945b92ff?hl=en
==============================================================================
== 1 of 2 ==
Date: Thurs, Nov 7 2013 1:50 pm
From: Alex Thorlton
On Wed, Nov 06, 2013 at 01:10:48PM +0000, Mel Gorman wrote:
> On Mon, Nov 04, 2013 at 02:03:46PM -0600, Alex Thorlton wrote:
> > On Mon, Nov 04, 2013 at 02:58:28PM +0000, Mel Gorman wrote:
> > > On Wed, Oct 16, 2013 at 10:54:29AM -0500, Alex Thorlton wrote:
> > > > Hi guys,
> > > >
> > > > I ran into a bug a week or so ago, that I believe has something to do
> > > > with NUMA balancing, but I'm having a tough time tracking down exactly
> > > > what is causing it. When running with the following configuration
> > > > options set:
> > > >
> > >
> > > Can you test with patches
> > > cd65718712469ad844467250e8fad20a5838baae..0255d491848032f6c601b6410c3b8ebded3a37b1
> > > applied? They fix some known memory corruption problems, were merged for
> > > 3.12 (so alternatively just test 3.12) and have been tagged for -stable.
> >
> > I just finished testing with 3.12, and I'm still seeing the same issue.
>
> Ok, I plugged your test into mmtests and ran it a few times but was not
> able to reproduce the same issue. It's a much smaller machine which
> might be a factor.
>
> > I'll poke around a bit more on this in the next few days and see if I
> > can come up with any more information. In the meantime, let me know if
> > you have any other suggestions.
> >
>
> Try the following patch on top of 3.12. It's a patch that is expected to
> be merged for 3.13. On its own it'll hurt automatic NUMA balancing in
> -stable but corruption trumps performance and the full series is not
> going to be considered acceptable for -stable
I gave this patch a shot, and it didn't seem to solve the problem.
Actually I'm running into what appear to be *worse* problems on the 3.12
kernel. Here're a couple stack traces of what I get when I run the test
on 3.12, 512 cores:
(These are just two of the CPUs, obviously, but most of the memscale
processes appeared to be in one of these two spots)
Nov 7 13:54:39 uvpsw1 kernel: NMI backtrace for cpu 6
Nov 7 13:54:39 uvpsw1 kernel: CPU: 6 PID: 17759 Comm: thp_memscale Not tainted 3.12.0-rc7-medusa-00006-g0255d49 #381
Nov 7 13:54:39 uvpsw1 kernel: Hardware name: Intel Corp. Stoutland Platform, BIOS 2.20 UEFI2.10 PI1.0 X64 2013-09-20
Nov 7 13:54:39 uvpsw1 kernel: task: ffff8810647e0300 ti: ffff88106413e000 task.ti: ffff88106413e000
Nov 7 13:54:39 uvpsw1 kernel: RIP: 0010:[<ffffffff8151c7d5>] [<ffffffff8151c7d5>] _raw_spin_lock+0x1a/0x25
Nov 7 13:54:39 uvpsw1 kernel: RSP: 0018:ffff88106413fd38 EFLAGS: 00000283
Nov 7 13:54:39 uvpsw1 kernel: RAX: 00000000a1a9a0fe RBX: 0000000000000206 RCX: ffff880000000000
Nov 7 13:54:41 uvpsw1 kernel: RDX: 000000000000a1a9 RSI: 00003ffffffff000 RDI: ffff8907ded35494
Nov 7 13:54:41 uvpsw1 kernel: RBP: ffff88106413fd38 R08: 0000000000000006 R09: 0000000000000002
Nov 7 13:54:41 uvpsw1 kernel: R10: 0000000000000007 R11: ffff88106413ff40 R12: ffff8907ded35494
Nov 7 13:54:42 uvpsw1 kernel: R13: ffff88106413fe1c R14: ffff8810637a05f0 R15: 0000000000000206
Nov 7 13:54:42 uvpsw1 kernel: FS: 00007fffd5def700(0000) GS:ffff88107d980000(0000) knlGS:0000000000000000
Nov 7 13:54:42 uvpsw1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Nov 7 13:54:42 uvpsw1 kernel: CR2: 00007fffd5ded000 CR3: 00000107dfbcf000 CR4: 00000000000007e0
Nov 7 13:54:42 uvpsw1 kernel: Stack:
Nov 7 13:54:42 uvpsw1 kernel: ffff88106413fda8 ffffffff810d670a 0000000000000002 0000000000000006
Nov 7 13:54:42 uvpsw1 kernel: 00007fff57dde000 ffff8810640e1cc0 000002006413fe10 ffff8907ded35440
Nov 7 13:54:45 uvpsw1 kernel: ffff88106413fda8 0000000000000206 0000000000000002 0000000000000000
Nov 7 13:54:45 uvpsw1 kernel: Call Trace:
Nov 7 13:54:45 uvpsw1 kernel: [<ffffffff810d670a>] follow_page_mask+0x123/0x3f1
Nov 7 13:54:45 uvpsw1 kernel: [<ffffffff810d7c4e>] __get_user_pages+0x3e3/0x488
Nov 7 13:54:45 uvpsw1 kernel: [<ffffffff810d7d90>] get_user_pages+0x4d/0x4f
Nov 7 13:54:45 uvpsw1 kernel: [<ffffffff810ec869>] SyS_get_mempolicy+0x1a9/0x3e0
Nov 7 13:54:45 uvpsw1 kernel: [<ffffffff8151d422>] system_call_fastpath+0x16/0x1b
Nov 7 13:54:46 uvpsw1 kernel: Code: b1 17 39 c8 ba 01 00 00 00 74 02 31 d2 89 d0 c9 c3 55 48 89 e5 b8 00 00 01 00 f0 0f c1 07 89 c2 c1 ea 10 66 39 d0 74 0c 66 8b 07 <66> 39 d0 74 04 f3 90 eb f4 c9 c3 55 48 89 e5 9c 59 fa b8 00 00
Nov 7 13:55:59 uvpsw1 kernel: NMI backtrace for cpu 8
Nov 7 13:55:59 uvpsw1 kernel: INFO: NMI handler (arch_trigger_all_cpu_backtrace_handler) took too long to run: 1.099 msecs
Nov 7 13:56:04 uvpsw1 kernel: CPU: 8 PID: 17761 Comm: thp_memscale Not tainted 3.12.0-rc7-medusa-00006-g0255d49 #381
Nov 7 13:56:04 uvpsw1 kernel: Hardware name: Intel Corp. Stoutland Platform, BIOS 2.20 UEFI2.10 PI1.0 X64 2013-09-20
Nov 7 13:56:04 uvpsw1 kernel: task: ffff881063c56380 ti: ffff8810621b8000 task.ti: ffff8810621b8000
Nov 7 13:56:04 uvpsw1 kernel: RIP: 0010:[<ffffffff8151c7d5>] [<ffffffff8151c7d5>] _raw_spin_lock+0x1a/0x25
Nov 7 13:56:04 uvpsw1 kernel: RSP: 0018:ffff8810621b9c98 EFLAGS: 00000283
Nov 7 13:56:04 uvpsw1 kernel: RAX: 00000000a20aa0ff RBX: ffff8810621002b0 RCX: 8000000000000025
Nov 7 13:56:04 uvpsw1 kernel: RDX: 000000000000a20a RSI: ffff8810621002b0 RDI: ffff8907ded35494
Nov 7 13:56:04 uvpsw1 kernel: RBP: ffff8810621b9c98 R08: 0000000000000001 R09: 0000000000000001
Nov 7 13:56:04 uvpsw1 kernel: R10: 000000000000000a R11: 0000000000000246 R12: ffff881062f726b8
Nov 7 13:56:04 uvpsw1 kernel: R13: 0000000000000001 R14: ffff8810621002b0 R15: ffff881062f726b8
Nov 7 13:56:09 uvpsw1 kernel: FS: 00007fff79512700(0000) GS:ffff88107da00000(0000) knlGS:0000000000000000
Nov 7 13:56:09 uvpsw1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Nov 7 13:56:09 uvpsw1 kernel: CR2: 00007fff79510000 CR3: 00000107dfbcf000 CR4: 00000000000007e0
Nov 7 13:56:09 uvpsw1 kernel: Stack:
Nov 7 13:56:09 uvpsw1 kernel: ffff8810621b9cb8 ffffffff810f3e57 8000000000000025 ffff881062f726b8
Nov 7 13:56:09 uvpsw1 kernel: ffff8810621b9ce8 ffffffff810f3edb 80000187dd73e166 00007fe2dae00000
Nov 7 13:56:09 uvpsw1 kernel: ffff881063708ff8 00007fe2db000000 ffff8810621b9dc8 ffffffff810def2c
Nov 7 13:56:09 uvpsw1 kernel: Call Trace:
Nov 7 13:56:09 uvpsw1 kernel: [<ffffffff810f3e57>] __pmd_trans_huge_lock+0x1a/0x7c
Nov 7 13:56:10 uvpsw1 kernel: [<ffffffff810f3edb>] change_huge_pmd+0x22/0xcc
Nov 7 13:56:14 uvpsw1 kernel: [<ffffffff810def2c>] change_protection+0x200/0x591
Nov 7 13:56:14 uvpsw1 kernel: [<ffffffff810ecb07>] change_prot_numa+0x16/0x2c
Nov 7 13:56:14 uvpsw1 kernel: [<ffffffff8106c247>] task_numa_work+0x224/0x29a
Nov 7 13:56:14 uvpsw1 kernel: [<ffffffff810551b1>] task_work_run+0x81/0x99
Nov 7 13:56:14 uvpsw1 kernel: [<ffffffff810025e1>] do_notify_resume+0x539/0x54b
Nov 7 13:56:14 uvpsw1 kernel: [<ffffffff810c3ce9>] ? put_page+0x10/0x24
Nov 7 13:56:14 uvpsw1 kernel: [<ffffffff810ec9fa>] ? SyS_get_mempolicy+0x33a/0x3e0
Nov 7 13:56:14 uvpsw1 kernel: [<ffffffff8151d6aa>] int_signal+0x12/0x17
Nov 7 13:56:14 uvpsw1 kernel: Code: b1 17 39 c8 ba 01 00 00 00 74 02 31 d2 89 d0 c9 c3 55 48 89 e5 b8 00 00 01 00 f0 0f c1 07 89 c2 c1 ea 10 66 39 d0 74 0c 66 8b 07 <66> 39 d0 74 04 f3 90 eb f4 c9 c3 55 48 89 e5 9c 59 fa b8 00 00
I managed to bisect the issue down to this commit:
0255d491848032f6c601b6410c3b8ebded3a37b1 is the first bad commit
commit 0255d491848032f6c601b6410c3b8ebded3a37b1
Author: Mel Gorman <mgorman@suse.de>
Date: Mon Oct 7 11:28:47 2013 +0100
mm: Account for a THP NUMA hinting update as one PTE update
A THP PMD update is accounted for as 512 pages updated in vmstat. This is a
large difference when estimating the cost of automatic NUMA balancing and
can be misleading when comparing results that had collapsed versus split
THP. This patch addresses the accounting issue.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: <stable@kernel.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1381141781-10992-10-git-send-email-mgorman@suse.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
:040000 040000 e5a44a1f0eea2f41d2cccbdf07eafee4e171b1e2 ef030a7c78ef346095ac991c3e3aa139498ed8e7 M mm
I haven't had a chance yet to dig into the code for this commit to see
what might be causing the crashes, but I have confirmed that this is
where the new problem started (checked the commit before this, and we
don't get the crash, just segfaults like we were getting before). So,
in summary, we still have the segfault issue, but this new issue seems
to be a bit more serious, so I'm going to try and chase this one down
first.
Let me know if you'd like any more information from me and I'll be glad
to provide it.
- Alex
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
== 2 of 2 ==
Date: Thurs, Nov 7 2013 2:00 pm
From: Alex Thorlton
On Wed, Oct 16, 2013 at 10:54:29AM -0500, Alex Thorlton wrote:
> Hi guys,
>
> I ran into a bug a week or so ago, that I believe has something to do
> with NUMA balancing, but I'm having a tough time tracking down exactly
> what is causing it. When running with the following configuration
> options set:
>
> CONFIG_ARCH_SUPPORTS_NUMA_BALANCING=y
> CONFIG_NUMA_BALANCING_DEFAULT_ENABLED=y
> CONFIG_NUMA_BALANCING=y
> # CONFIG_HUGETLBFS is not set
> # CONFIG_HUGETLB_PAGE is not set
>
> I get intermittent segfaults when running the memscale test that we've
> been using to test some of the THP changes. Here's a link to the test:
>
> ftp://shell.sgi.com/collect/memscale/
For anyone who's interested, this test has been moved to:
http://oss.sgi.com/projects/memtests/thp_memscale.tar.gz
It should remain there permanently.
>
> I typically run the test with a line similar to this:
>
> ./thp_memscale -C 0 -m 0 -c <cores> -b <memory>
>
> Where <cores> is the number of cores to spawn threads on, and <memory>
> is the amount of memory to reserve from each core. The <memory> field
> can accept values like 512m or 1g, etc. I typically run 256 cores and
> 512m, though I think the problem should be reproducible on anything with
> 128+ cores.
>
> The test never seems to have any problems when running with hugetlbfs
> on and NUMA balancing off, but it segfaults every once in a while with
> the config options above. It seems to occur more frequently, the more
> cores you run on. It segfaults on about 50% of the runs at 256 cores,
> and on almost every run at 512 cores. The fewest number of cores I've
> seen a segfault on has been 128, though it seems to be rare on this many
> cores.
>
> At this point, I'm not familiar enough with NUMA balancing code to know
> what could be causing this, and we don't typically run with NUMA
> balancing on, so I don't see this in my everyday testing, but I felt
> that it was definitely worth bringing up.
>
> If anybody has any ideas of where I could poke around to find a
> solution, please let me know.
>
> - Alex
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
==============================================================================
TOPIC: [ext2] XIP does not work on ext2
http://groups.google.com/group/linux.kernel/t/7324138990336f3c?hl=en
==============================================================================
== 1 of 3 ==
Date: Thurs, Nov 7 2013 2:00 pm
From: Andiry Xu
On Thu, Nov 7, 2013 at 1:07 PM, Jan Kara <jack@suse.cz> wrote:
> On Thu 07-11-13 12:14:13, Andiry Xu wrote:
>> On Wed, Nov 6, 2013 at 1:18 PM, Jan Kara <jack@suse.cz> wrote:
>> > On Tue 05-11-13 17:28:35, Andiry Xu wrote:
>> >> >> Do you know the reason why write() outperforms mmap() in some cases? I
>> >> >> know it's not related the thread but I really appreciate if you can
>> >> >> answer my question.
>> >> > Well, I'm not completely sure. mmap()ed memory always works on page-by-page
>> >> > basis - you first access the page, it gets faulted in and you can further
>> >> > access it. So for small (sub page size) accesses this is a win because you
>> >> > don't have an overhead of syscall and fs write path. For accesses larger
>> >> > than page size the overhead of syscall and some initial checks is well
>> >> > hidden by other things. I guess write() ends up being more efficient
>> >> > because write path taken for each page is somewhat lighter than full page
>> >> > fault. But you'd need to look into perf data to get some hard numbers on
>> >> > where the time is spent.
>> >> >
>> >>
>> >> Thanks for the reply. However I have filled up the whole RAM disk
>> >> before doing the test, i.e. asked the brd driver to allocate all the
>> >> pages initially.
>> > Well, pages in ramdisk are always present, that's not an issue. But you
>> > will get a page fault to map a particular physical page in process'
>> > virtual address space when you first access that virtual address in the
>> > mapping from the process. The cost of setting up this virtual->physical
>> > mapping is what I'm talking about.
>> >
>>
>> Yes, you are right, there are page faults observed with perf. I
>> misunderstood page fault as copying pages between backing store and
>> physical memory.
>>
>> > If you had a process which first mmaps the file and writes to all pages in
>> > the mapping and *then* measure the cost of another round of writing to the
>> > mapping, I would expect you should see speeds close to those of memory bus.
>> >
>>
>> I've tried this as well. mmap() performance improves but still not as
>> good as write().
>> I used the perf report to compare write() and mmap() applications. For
>> write() version, top of perf report shows as:
>> 33.33% __copy_user_nocache
>> 4.72% ext2_get_blocks
>> 4.42% mutex_unlock
>> 3.59% __find_get_block
>>
>> which looks reasonable.
>>
>> However, for mmap() version, the perf report looks strange:
>> 94.98% libc-2.15.so [.] 0x000000000014698d
>> 2.25% page_fault
>> 0.18% handle_mm_fault
>>
>> I don't know what the first item is but it took the majority of cycles.
> The first item means that it's some userspace code in libc. My guess
> would be that it's libc's memcpy() function (or whatever you use to write
> to mmap). How do you access the mmap?
>
Like this:
fd = open(file_name, O_CREAT | O_RDWR | O_DIRECT, 0755);
dest = (char *)mmap(NULL, FILE_SIZE, PROT_WRITE, MAP_SHARED, fd, 0);
for (i = 0; i < count; i++)
{
        memcpy(dest, src, request_size);
        dest += request_size;
}
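(For comparison, a write()-based version of the same loop would look roughly
like the sketch below. This is illustrative only, not the actual benchmark
from this thread; the file name and sizes are made up, and O_DIRECT is
dropped to avoid the buffer-alignment requirements it imposes on write().)

#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define FILE_SIZE  (64UL * 1024 * 1024)   /* assumed test file size */
#define REQUEST_SZ 4096UL                 /* assumed per-call request size */

int main(void)
{
        unsigned long i, count = FILE_SIZE / REQUEST_SZ;
        char *src = malloc(REQUEST_SZ);
        int fd = open("/mnt/ramdisk/testfile", O_CREAT | O_RDWR, 0755);

        if (fd < 0 || !src)
                return 1;
        memset(src, 0xab, REQUEST_SZ);
        /* each iteration goes through the syscall and fs write path,
         * instead of a page fault plus memcpy as in the mmap() version */
        for (i = 0; i < count; i++)
                write(fd, src, REQUEST_SZ);
        close(fd);
        free(src);
        return 0;
}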
Thanks,
Andiry
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
== 2 of 3 ==
Date: Thurs, Nov 7 2013 2:30 pm
From: Jan Kara
On Thu 07-11-13 13:50:09, Andiry Xu wrote:
> On Thu, Nov 7, 2013 at 1:07 PM, Jan Kara <jack@suse.cz> wrote:
> > On Thu 07-11-13 12:14:13, Andiry Xu wrote:
> >> On Wed, Nov 6, 2013 at 1:18 PM, Jan Kara <jack@suse.cz> wrote:
> >> > On Tue 05-11-13 17:28:35, Andiry Xu wrote:
> >> >> >> Do you know the reason why write() outperforms mmap() in some cases? I
> >> >> >> know it's not related the thread but I really appreciate if you can
> >> >> >> answer my question.
> >> >> > Well, I'm not completely sure. mmap()ed memory always works on page-by-page
> >> >> > basis - you first access the page, it gets faulted in and you can further
> >> >> > access it. So for small (sub page size) accesses this is a win because you
> >> >> > don't have an overhead of syscall and fs write path. For accesses larger
> >> >> > than page size the overhead of syscall and some initial checks is well
> >> >> > hidden by other things. I guess write() ends up being more efficient
> >> >> > because write path taken for each page is somewhat lighter than full page
> >> >> > fault. But you'd need to look into perf data to get some hard numbers on
> >> >> > where the time is spent.
> >> >> >
> >> >>
> >> >> Thanks for the reply. However I have filled up the whole RAM disk
> >> >> before doing the test, i.e. asked the brd driver to allocate all the
> >> >> pages initially.
> >> > Well, pages in ramdisk are always present, that's not an issue. But you
> >> > will get a page fault to map a particular physical page in process'
> >> > virtual address space when you first access that virtual address in the
> >> > mapping from the process. The cost of setting up this virtual->physical
> >> > mapping is what I'm talking about.
> >> >
> >>
> >> Yes, you are right, there are page faults observed with perf. I
> >> misunderstood page fault as copying pages between backing store and
> >> physical memory.
> >>
> >> > If you had a process which first mmaps the file and writes to all pages in
> >> > the mapping and *then* measure the cost of another round of writing to the
> >> > mapping, I would expect you should see speeds close to those of memory bus.
> >> >
> >>
> >> I've tried this as well. mmap() performance improves but still not as
> >> good as write().
> >> I used the perf report to compare write() and mmap() applications. For
> >> write() version, top of perf report shows as:
> >> 33.33% __copy_user_nocache
> >> 4.72% ext2_get_blocks
> >> 4.42% mutex_unlock
> >> 3.59% __find_get_block
> >>
> >> which looks reasonable.
> >>
> >> However, for mmap() version, the perf report looks strange:
> >> 94.98% libc-2.15.so [.] 0x000000000014698d
> >> 2.25% page_fault
> >> 0.18% handle_mm_fault
> >>
> >> I don't know what the first item is but it took the majority of cycles.
> > The first item means that it's some userspace code in libc. My guess
> > would be that it's libc's memcpy() function (or whatever you use to write
> > to mmap). How do you access the mmap?
> >
>
> Like this:
>
> fd = open(file_name, O_CREAT | O_RDWR | O_DIRECT, 0755);
> dest = (char *)mmap(NULL, FILE_SIZE, PROT_WRITE, MAP_SHARED, fd, 0);
> for (i = 0; i < count; i++)
> {
> memcpy(dest, src, request_size);
> dest += request_size;
> }
OK, maybe libc memcpy isn't very well optimized for your CPU? Not sure how
to tune that though...
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
== 3 of 3 ==
Date: Thurs, Nov 7 2013 2:50 pm
From: Andiry Xu
On Thu, Nov 7, 2013 at 2:20 PM, Jan Kara <jack@suse.cz> wrote:
> On Thu 07-11-13 13:50:09, Andiry Xu wrote:
>> On Thu, Nov 7, 2013 at 1:07 PM, Jan Kara <jack@suse.cz> wrote:
>> > On Thu 07-11-13 12:14:13, Andiry Xu wrote:
>> >> On Wed, Nov 6, 2013 at 1:18 PM, Jan Kara <jack@suse.cz> wrote:
>> >> > On Tue 05-11-13 17:28:35, Andiry Xu wrote:
>> >> >> >> Do you know the reason why write() outperforms mmap() in some cases? I
>> >> >> >> know it's not related the thread but I really appreciate if you can
>> >> >> >> answer my question.
>> >> >> > Well, I'm not completely sure. mmap()ed memory always works on page-by-page
>> >> >> > basis - you first access the page, it gets faulted in and you can further
>> >> >> > access it. So for small (sub page size) accesses this is a win because you
>> >> >> > don't have an overhead of syscall and fs write path. For accesses larger
>> >> >> > than page size the overhead of syscall and some initial checks is well
>> >> >> > hidden by other things. I guess write() ends up being more efficient
>> >> >> > because write path taken for each page is somewhat lighter than full page
>> >> >> > fault. But you'd need to look into perf data to get some hard numbers on
>> >> >> > where the time is spent.
>> >> >> >
>> >> >>
>> >> >> Thanks for the reply. However I have filled up the whole RAM disk
>> >> >> before doing the test, i.e. asked the brd driver to allocate all the
>> >> >> pages initially.
>> >> > Well, pages in ramdisk are always present, that's not an issue. But you
>> >> > will get a page fault to map a particular physical page in process'
>> >> > virtual address space when you first access that virtual address in the
>> >> > mapping from the process. The cost of setting up this virtual->physical
>> >> > mapping is what I'm talking about.
>> >> >
>> >>
>> >> Yes, you are right, there are page faults observed with perf. I
>> >> misunderstood page fault as copying pages between backing store and
>> >> physical memory.
>> >>
>> >> > If you had a process which first mmaps the file and writes to all pages in
>> >> > the mapping and *then* measure the cost of another round of writing to the
>> >> > mapping, I would expect you should see speeds close to those of memory bus.
>> >> >
>> >>
>> >> I've tried this as well. mmap() performance improves but still not as
>> >> good as write().
>> >> I used the perf report to compare write() and mmap() applications. For
>> >> write() version, top of perf report shows as:
>> >> 33.33% __copy_user_nocache
>> >> 4.72% ext2_get_blocks
>> >> 4.42% mutex_unlock
>> >> 3.59% __find_get_block
>> >>
>> >> which looks reasonable.
>> >>
>> >> However, for mmap() version, the perf report looks strange:
>> >> 94.98% libc-2.15.so [.] 0x000000000014698d
>> >> 2.25% page_fault
>> >> 0.18% handle_mm_fault
>> >>
>> >> I don't know what the first item is but it took the majority of cycles.
>> > The first item means that it's some userspace code in libc. My guess
>> > would be that it's libc's memcpy() function (or whatever you use to write
>> > to mmap). How do you access the mmap?
>> >
>>
>> Like this:
>>
>> fd = open(file_name, O_CREAT | O_RDWR | O_DIRECT, 0755);
>> dest = (char *)mmap(NULL, FILE_SIZE, PROT_WRITE, MAP_SHARED, fd, 0);
>> for (i = 0; i < count; i++)
>> {
>> memcpy(dest, src, request_size);
>> dest += request_size;
>> }
> OK, maybe libc memcpy isn't very well optimized for your CPU? Not sure how
> to tune that though...
>
Hmm, I will try some different kinds of memcpy to see if there is a
difference. Just want to make sure I do not make some stupid mistakes
before trying that.
Thanks a lot for your help!
Thanks,
Andiry
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
==============================================================================
TOPIC: rbtree: Fix rbtree_postorder_for_each_entry_safe() iterator
http://groups.google.com/group/linux.kernel/t/47bd7300ac52f95d?hl=en
==============================================================================
== 1 of 2 ==
Date: Thurs, Nov 7 2013 2:00 pm
From: Cody P Schafer
On 11/07/2013 01:38 PM, Andrew Morton wrote:
> On Wed, 6 Nov 2013 17:42:30 -0800 Cody P Schafer <cody@linux.vnet.ibm.com> wrote:
>
>> The iterator rbtree_postorder_for_each_entry_safe() relies on pointer
>> underflow behavior when testing for loop termination. In particular
>> it expects that
>> &rb_entry(NULL, type, field)->field
>> is NULL. But the result of this expression is not defined by a C standard
>> and some gcc versions (e.g. 4.3.4) assume the above expression can never
>> be equal to NULL. The net result is an oops because the iteration is not
>> properly terminated.
>>
>> Fix the problem by modifying the iterator to avoid pointer underflows.
>
> So the sole caller is in zswap.c. Is that code actually generating oopses?
I can't reproduce the oopses (at all) with my build/gcc version, but Jan
has reported seeing them (not in zswap, however).
>
> IOW, is there any need to fix this in 3.12 or earlier?
>
The zswap usage change showed up in 3.12.
In my opinion, it is probably a good idea to apply the fix to 3.12.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
== 2 of 2 ==
Date: Thurs, Nov 7 2013 2:20 pm
From: Jan Kara
On Thu 07-11-13 13:38:00, Andrew Morton wrote:
> On Wed, 6 Nov 2013 17:42:30 -0800 Cody P Schafer <cody@linux.vnet.ibm.com> wrote:
>
> > The iterator rbtree_postorder_for_each_entry_safe() relies on pointer
> > underflow behavior when testing for loop termination. In particular
> > it expects that
> > &rb_entry(NULL, type, field)->field
> > is NULL. But the result of this expression is not defined by a C standard
> > and some gcc versions (e.g. 4.3.4) assume the above expression can never
> > be equal to NULL. The net result is an oops because the iteration is not
> > properly terminated.
> >
> > Fix the problem by modifying the iterator to avoid pointer underflows.
>
> So the sole caller is in zswap.c. Is that code actually generating oopses?
Oh, I didn't know there was already a user of that iterator in the tree. Let
me check... Umm, looking at the disassembly of
zswap_frontswap_invalidate_area:
0xffffffff8112c9a5 <+37>: mov %r13,%rdi
0xffffffff8112c9a8 <+40>: callq 0xffffffff81227620 <rb_first_postorder>
0xffffffff8112c9ad <+45>: mov %rax,%rdi
0xffffffff8112c9b0 <+48>: mov %rax,%rbx
0xffffffff8112c9b3 <+51>: callq 0xffffffff812275d0 <rb_next_postorder>
0xffffffff8112c9b8 <+56>: mov %rax,%r12
0xffffffff8112c9bb <+59>: nopl 0x0(%rax,%rax,1)
0xffffffff8112c9c0 <+64>: mov 0x28(%rbx),%rsi
0xffffffff8112c9c4 <+68>: mov 0x40(%r13),%rdi
0xffffffff8112c9c8 <+72>: callq 0xffffffff811352b0 <zbud_free>
0xffffffff8112c9cd <+77>: mov 0x1105504(%rip),%rdi
0xffffffff8112c9d4 <+84>: mov %rbx,%rsi
0xffffffff8112c9d7 <+87>: callq 0xffffffff81130b80 <kmem_cache_free>
0xffffffff8112c9dc <+92>: lock decl 0x110539d(%rip)
0xffffffff8112c9e3 <+99>: mov %r12,%rdi
0xffffffff8112c9e6 <+102>: mov %r12,%rbx
0xffffffff8112c9e9 <+105>: callq 0xffffffff812275d0 <rb_next_postorder>
0xffffffff8112c9ee <+110>: mov %rax,%r12
0xffffffff8112c9f1 <+113>: jmp 0xffffffff8112c9c0 <zswap_frontswap_invalidate_area+64>
So my gcc helpfully compiled that iterator into an endless loop as well,
although it is now perfectly valid C code. According to our gcc guys
that was a bug in some gcc versions which is already fixed. But anyway
pushing my patch to 3.12 or anything that actually uses that iterator will
probably save us some bug reports.
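The undefined behaviour discussed above can be illustrated with a small
userspace sketch (stand-in types only, not the kernel's <linux/rbtree.h>;
rb_entry() is just container_of()):

#include <stddef.h>
#include <stdio.h>

struct rb_node { struct rb_node *left, *right; };       /* stand-in */
struct item    { long payload; struct rb_node node; };

/* rb_entry() is container_of(): subtract the member offset from the
 * rb_node pointer to get back to the containing structure */
#define rb_entry(ptr, type, member) \
        ((type *)((char *)(ptr) - offsetof(type, member)))

int main(void)
{
        struct rb_node *n = NULL;    /* what the tree walk returns at the end */

        /* The old iterator computed the expression below and compared it
         * against NULL to terminate the loop.  Pointer arithmetic on a null
         * pointer is undefined, so a compiler is allowed to assume the
         * result is never NULL and turn the loop into an endless one, as in
         * the disassembly above. */
        struct rb_node *back = &rb_entry(n, struct item, node)->node;

        printf("%p\n", (void *)back);
        return 0;
}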
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
==============================================================================
TOPIC: ARM: pinctrl: Add Broadcom Capri pinctrl driver
http://groups.google.com/group/linux.kernel/t/b0e58ba3e23c2b35?hl=en
==============================================================================
== 1 of 1 ==
Date: Thurs, Nov 7 2013 2:10 pm
From: "Sherman Yin"
On 13-11-06 09:00 AM, Stephen Warren wrote:
> You probably don't want to reference the individual xxx1/2/3 nodes in
> the client pinctrl properties, but instead wrap them in a higher-level
> node that represents the whole pinctrl state. Then, the client pinctrl
> properties can reference just that single parent node, instead of many
> small nodes. In other words:
>
> pinctrl@... {
> ...
> sx: state_xxx {
> xxx1 { ... };
> xxx2 { ... };
> xxx3 { ... };
> };
> sy: state_yyy {
> yyy1 { ... };
> yyy2 { ... };
> };
> }
>
> some_client@... {
> ...
> pinctrl-names = "default";
> pinctrl-0 = <&sx>;
> };
>
> other_client@... {
> ...
> pinctrl-names = "default";
> pinctrl-0 = <&sy>;
> };
>
> rather than:
>
> pinctrl@... {
> ...
> sx1: xxx1 { ... };
> sx2: xxx2 { ... };
> sx3: xxx3 { ... };
> sy1: yyy1 { ... };
> sy2: yyy2 { ... };
> }
>
> some_client@... {
> ...
> pinctrl-names = "default";
> pinctrl-0 = <&sx1 &sx2 &sx3>;
> };
>
> other_client@... {
> ...
> pinctrl-names = "default";
> pinctrl-0 = <&sy1 &sy2>;
> };
>
> This is exactly how the Tegra pinctrl bindings work for example.
Ok, right, I mistakenly thought the "xxx1" nodes were pin config nodes.
Actually that's the way my original driver works as well, other than the
fact that I don't have as many "xxx1" type nodes as described in the
"xxx" node below.
>> This works fine. However, I'm just thinking that
>> it would have been easier if we could specify just one node:
>>
>> xxx {
>> pins = <PINA>, <PINB>, <PINC>;
>> function = <...>;
>> pull-up = <1 1 0>;
>> }
>>
>> This "feature" seems a bit more concise to me and is what I did for my
>> original pinctrl driver. The only downside is that with this method,
>> one cannot specify "don't touch this option for this pin" if the same
>> property must provide values for other pins.
>
> The other downside is that if the lists get even slightly long, it gets
> really hard to match up the entries in the different properties.
Agree that it would start to get difficult to read if a subnode has too
many pins. I guess the solution would be to somehow split up the pins
into more subnodes with fewer pins each.
Regards,
Sherman
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
==============================================================================
TOPIC: wire up CPU features to udev based module loading
http://groups.google.com/group/linux.kernel/t/7bd137213e7534f5?hl=en
==============================================================================
== 1 of 2 ==
Date: Thurs, Nov 7 2013 2:20 pm
From: Ard Biesheuvel
On 7 November 2013 22:39, Andi Kleen <ak@linux.intel.com> wrote:
> On Thu, Nov 07, 2013 at 01:09:41PM -0800, H. Peter Anvin wrote:
>> On 11/07/2013 09:17 AM, Ard Biesheuvel wrote:
>> > This series implements automatic module loading based on optional CPU features,
>> > and tries to do so in a generic way. Currently, 32 feature bits are supported,
>> > and how they map to actual CPU features is entirely up to the architecture.
>>
>> NAK.
>>
>> We in the x86 world already left 32 bits way behind; we currently have
>> 320 bit feature masks.
>>
>> If you're aiming at doing this in a generic way, it needs to be able to
>> accommodate the current x86cpu feature stuff as a subset, which this
>> doesn't.
>
> They can just use the exact same code/macros as x86cpu, just need a different
> prefix and use wildcards if they miss something (e.g. family)
>
That would involve repurposing/generalizing a bit more of the existing
x86-only code than I did the first time around, but if you (as x86
maintainers) are happy with that, I'm all for it.
I do have a couple of questions then
- the module aliases host tool has no arch specific dependencies at
all except having x86cpu as one of the entries: would you mind
dropping the x86 prefix there? Or rather add dependencies on $ARCH?
(If we drop it there, we basically end up with 'cpu:' everywhere)
- in the vendor/family/model case, it may be preferable to drop these
fields entirely from certain modules' aliases if they match on 'any'
(provided that the module tools permit this) rather than add
architecture, variant, revision, etc fields for all architectures if
they can only ever match on one
- some of the X86_ macros would probably be redefined in terms of the
generic macros rather than the other way around, which would result in
some changes under arch/x86 as well, is that acceptable for you?
Thanks,
Ard.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
== 2 of 2 ==
Date: Thurs, Nov 7 2013 2:40 pm
From: Andi Kleen
> - the module aliases host tool has no arch specific dependencies at
> all except having x86cpu as one of the entries: would you mind
> dropping the x86 prefix there? Or rather add dependencies on $ARCH?
> (If we drop it there, we basically end up with 'cpu:' everywhere)
Should be fine.
> - in the vendor/family/model case, it may be preferable to drop these
> fields entirely from certain modules' aliases if they match on 'any'
> (provided that the module tools permit this) rather than add
> architecture, variant, revision, etc fields for all architectures if
> they can only ever match on one
The module tools require everything matching with the same wild cards.
So I don't know how "any" would work.
-Andi
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
==============================================================================
TOPIC: ARM: add initial support for Marvell Berlin SoCs
http://groups.google.com/group/linux.kernel/t/a3e38d590da25903?hl=en
==============================================================================
== 1 of 1 ==
Date: Thurs, Nov 7 2013 2:20 pm
From: Arnd Bergmann
On Thursday 07 November 2013, Sebastian Hesselbarth wrote:
> I haven't looked deeper into this, but I guess it will not be hard
> to make ARM_TWD independent of SMP.
Yes, I agree. Just make sure to look at the list archives to see if someone
already did that work.
Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
==============================================================================
TOPIC: of: irq: Fix interrupt-map entry matching
http://groups.google.com/group/linux.kernel/t/2db00bbb83f94fe6?hl=en
==============================================================================
== 1 of 1 ==
Date: Thurs, Nov 7 2013 2:20 pm
From: Tomasz Figa
On Thursday 07 of November 2013 10:40:16 Rob Herring wrote:
> On Thu, Nov 7, 2013 at 5:32 AM, Tomasz Figa <t.figa@samsung.com> wrote:
> > Hi Grant,
> >
> > Could you pick this patch up? It fixes boot-up at least on several
> > Exynos based platforms, which use interrupt-map nodes with
> > #interrupt-cells higher than 1.
> >
> > Also please disregard patch 2/2, as your fix that has been merged
> > seems
> > to be fine.
>
> I've applied the 1st patch.
Thanks Rob.
Best regards,
Tomasz
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
==============================================================================
TOPIC: irq_work: Provide a irq work that can be processed on any cpu
http://groups.google.com/group/linux.kernel/t/7bc00cc28be7d4b5?hl=en
==============================================================================
== 1 of 6 ==
Date: Thurs, Nov 7 2013 2:20 pm
From: Jan Kara
On Thu 07-11-13 23:13:39, Frederic Weisbecker wrote:
> 2013/11/7 Jan Kara <jack@suse.cz>:
> > Provide a new irq work flag - IRQ_WORK_UNBOUND - meaning that it can be
> > processed on any cpu. This flag implies IRQ_WORK_LAZY so that things are
> > simple and we don't have to pick any particular cpu to do the work. We
> > just do the work from a timer tick on whichever cpu it happens first.
> > This is useful as a lightweight and simple code path without locking or
> > other dependencies to offload work to other cpu if possible.
> >
> > We will use this type of irq work to make a guarantee of forward
> > progress of printing to a (serial) console when printing on one cpu
> > would cause interrupts to be disabled for too long.
> >
> > Signed-off-by: Jan Kara <jack@suse.cz>
> > ---
> > include/linux/irq_work.h | 2 ++
> > kernel/irq_work.c | 41 +++++++++++++++++++++++++----------------
> > 2 files changed, 27 insertions(+), 16 deletions(-)
> >
> > diff --git a/include/linux/irq_work.h b/include/linux/irq_work.h
> > index 66017028dcb3..ca07a16355ed 100644
> > --- a/include/linux/irq_work.h
> > +++ b/include/linux/irq_work.h
> > @@ -16,6 +16,8 @@
> > #define IRQ_WORK_BUSY 2UL
> > #define IRQ_WORK_FLAGS 3UL
> > #define IRQ_WORK_LAZY 4UL /* Doesn't want IPI, wait for tick */
> > +#define __IRQ_WORK_UNBOUND 8UL /* Use IRQ_WORK_UNBOUND instead! */
> > +#define IRQ_WORK_UNBOUND (__IRQ_WORK_UNBOUND | IRQ_WORK_LAZY) /* Any cpu can process this work */
> >
> > struct irq_work {
> > unsigned long flags;
> > diff --git a/kernel/irq_work.c b/kernel/irq_work.c
> > index 55fcce6065cf..b06350b63c67 100644
> > --- a/kernel/irq_work.c
> > +++ b/kernel/irq_work.c
> > @@ -22,6 +22,9 @@
> > static DEFINE_PER_CPU(struct llist_head, irq_work_list);
> > static DEFINE_PER_CPU(int, irq_work_raised);
> >
> > +/* List of irq-work any CPU can pick up */
> > +static LLIST_HEAD(unbound_irq_work_list);
> > +
> > /*
> > * Claim the entry so that no one else will poke at it.
> > */
> > @@ -70,12 +73,16 @@ void irq_work_queue(struct irq_work *work)
> > /* Queue the entry and raise the IPI if needed. */
> > preempt_disable();
> >
> > - llist_add(&work->llnode, &__get_cpu_var(irq_work_list));
> > + if (work->flags & __IRQ_WORK_UNBOUND)
> > + llist_add(&work->llnode, &unbound_irq_work_list);
> > + else
> > + llist_add(&work->llnode, &__get_cpu_var(irq_work_list));
> >
> > /*
> > * If the work is not "lazy" or the tick is stopped, raise the irq
> > * work interrupt (if supported by the arch), otherwise, just wait
> > - * for the next tick.
> > + * for the next tick. We do this even for unbound work to make sure
> > + * *some* CPU will be doing the work.
> > */
> > if (!(work->flags & IRQ_WORK_LAZY) || tick_nohz_tick_stopped()) {
> > if (!this_cpu_cmpxchg(irq_work_raised, 0, 1))
> > @@ -100,28 +107,17 @@ bool irq_work_needs_cpu(void)
> > return true;
> > }
> >
> > -static void __irq_work_run(void)
> > +static void process_irq_work_list(struct llist_head *llhead)
> > {
> > unsigned long flags;
> > struct irq_work *work;
> > - struct llist_head *this_list;
> > struct llist_node *llnode;
> >
> > -
> > - /*
> > - * Reset the "raised" state right before we check the list because
> > - * an NMI may enqueue after we find the list empty from the runner.
> > - */
> > - __this_cpu_write(irq_work_raised, 0);
> > - barrier();
> > -
> > - this_list = &__get_cpu_var(irq_work_list);
> > - if (llist_empty(this_list))
> > + if (llist_empty(llhead))
> > return;
> >
> > BUG_ON(!irqs_disabled());
> > -
> > - llnode = llist_del_all(this_list);
> > + llnode = llist_del_all(llhead);
> > while (llnode != NULL) {
> > work = llist_entry(llnode, struct irq_work, llnode);
> >
> > @@ -146,6 +142,19 @@ static void __irq_work_run(void)
> > }
> > }
> >
> > +static void __irq_work_run(void)
> > +{
> > + /*
> > + * Reset the "raised" state right before we check the list because
> > + * an NMI may enqueue after we find the list empty from the runner.
> > + */
> > + __this_cpu_write(irq_work_raised, 0);
> > + barrier();
> > +
> > + process_irq_work_list(&__get_cpu_var(irq_work_list));
> > + process_irq_work_list(&unbound_irq_work_list);
> > +}
> > +
>
> But then, who's going to process that work if every CPU is idle?
Have a look into irq_work_queue(). There is:
/*
* If the work is not "lazy" or the tick is stopped, raise the irq
* work interrupt (if supported by the arch), otherwise, just wait
* for the next tick. We do this even for unbound work to make sure
* *some* CPU will be doing the work.
*/
if (!(work->flags & IRQ_WORK_LAZY) || tick_nohz_tick_stopped()) {
if (!this_cpu_cmpxchg(irq_work_raised, 0, 1))
arch_irq_work_raise();
}
So we raise an interrupt if there would be no timer ticking (which is
what I suppose you mean by "CPU is idle"). That is nothing changed by my
patches...
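For illustration, a user of the proposed flag would set things up roughly
like this (a sketch only; IRQ_WORK_UNBOUND exists only in this patch series
and the function names below are made up):

#include <linux/irq_work.h>

static void offload_func(struct irq_work *work)
{
        /* runs from the timer tick on whichever CPU gets there first, or
         * from the irq-work interrupt if the tick is stopped */
}

static struct irq_work offload_work = {
        .flags = IRQ_WORK_UNBOUND,
        .func  = offload_func,
};

void queue_offload(void)
{
        irq_work_queue(&offload_work);
}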
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
== 2 of 6 ==
Date: Thurs, Nov 7 2013 2:20 pm
From: Frederic Weisbecker
2013/11/7 Jan Kara <jack@suse.cz>:
> Provide a new irq work flag - IRQ_WORK_UNBOUND - meaning that it can be
> processed on any cpu. This flag implies IRQ_WORK_LAZY so that things are
> simple and we don't have to pick any particular cpu to do the work. We
> just do the work from a timer tick on whichever cpu it happens first.
> This is useful as a lightweight and simple code path without locking or
> other dependencies to offload work to other cpu if possible.
>
> We will use this type of irq work to make a guarantee of forward
> progress of printing to a (serial) console when printing on one cpu
> would cause interrupts to be disabled for too long.
>
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
> include/linux/irq_work.h | 2 ++
> kernel/irq_work.c | 41 +++++++++++++++++++++++++----------------
> 2 files changed, 27 insertions(+), 16 deletions(-)
>
> diff --git a/include/linux/irq_work.h b/include/linux/irq_work.h
> index 66017028dcb3..ca07a16355ed 100644
> --- a/include/linux/irq_work.h
> +++ b/include/linux/irq_work.h
> @@ -16,6 +16,8 @@
> #define IRQ_WORK_BUSY 2UL
> #define IRQ_WORK_FLAGS 3UL
> #define IRQ_WORK_LAZY 4UL /* Doesn't want IPI, wait for tick */
> +#define __IRQ_WORK_UNBOUND 8UL /* Use IRQ_WORK_UNBOUND instead! */
> +#define IRQ_WORK_UNBOUND (__IRQ_WORK_UNBOUND | IRQ_WORK_LAZY) /* Any cpu can process this work */
>
> struct irq_work {
> unsigned long flags;
> diff --git a/kernel/irq_work.c b/kernel/irq_work.c
> index 55fcce6065cf..b06350b63c67 100644
> --- a/kernel/irq_work.c
> +++ b/kernel/irq_work.c
> @@ -22,6 +22,9 @@
> static DEFINE_PER_CPU(struct llist_head, irq_work_list);
> static DEFINE_PER_CPU(int, irq_work_raised);
>
> +/* List of irq-work any CPU can pick up */
> +static LLIST_HEAD(unbound_irq_work_list);
> +
> /*
> * Claim the entry so that no one else will poke at it.
> */
> @@ -70,12 +73,16 @@ void irq_work_queue(struct irq_work *work)
> /* Queue the entry and raise the IPI if needed. */
> preempt_disable();
>
> - llist_add(&work->llnode, &__get_cpu_var(irq_work_list));
> + if (work->flags & __IRQ_WORK_UNBOUND)
> + llist_add(&work->llnode, &unbound_irq_work_list);
> + else
> + llist_add(&work->llnode, &__get_cpu_var(irq_work_list));
>
> /*
> * If the work is not "lazy" or the tick is stopped, raise the irq
> * work interrupt (if supported by the arch), otherwise, just wait
> - * for the next tick.
> + * for the next tick. We do this even for unbound work to make sure
> + * *some* CPU will be doing the work.
> */
> if (!(work->flags & IRQ_WORK_LAZY) || tick_nohz_tick_stopped()) {
> if (!this_cpu_cmpxchg(irq_work_raised, 0, 1))
> @@ -100,28 +107,17 @@ bool irq_work_needs_cpu(void)
> return true;
> }
>
> -static void __irq_work_run(void)
> +static void process_irq_work_list(struct llist_head *llhead)
> {
> unsigned long flags;
> struct irq_work *work;
> - struct llist_head *this_list;
> struct llist_node *llnode;
>
> -
> - /*
> - * Reset the "raised" state right before we check the list because
> - * an NMI may enqueue after we find the list empty from the runner.
> - */
> - __this_cpu_write(irq_work_raised, 0);
> - barrier();
> -
> - this_list = &__get_cpu_var(irq_work_list);
> - if (llist_empty(this_list))
> + if (llist_empty(llhead))
> return;
>
> BUG_ON(!irqs_disabled());
> -
> - llnode = llist_del_all(this_list);
> + llnode = llist_del_all(llhead);
> while (llnode != NULL) {
> work = llist_entry(llnode, struct irq_work, llnode);
>
> @@ -146,6 +142,19 @@ static void __irq_work_run(void)
> }
> }
>
> +static void __irq_work_run(void)
> +{
> + /*
> + * Reset the "raised" state right before we check the list because
> + * an NMI may enqueue after we find the list empty from the runner.
> + */
> + __this_cpu_write(irq_work_raised, 0);
> + barrier();
> +
> + process_irq_work_list(&__get_cpu_var(irq_work_list));
> + process_irq_work_list(&unbound_irq_work_list);
> +}
> +
But then, who's going to process that work if every CPU is idle?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
== 3 of 6 ==
Date: Thurs, Nov 7 2013 2:30 pm
From: Frederic Weisbecker
On Thu, Nov 07, 2013 at 11:19:04PM +0100, Jan Kara wrote:
> On Thu 07-11-13 23:13:39, Frederic Weisbecker wrote:
> > But then, who's going to process that work if every CPU is idle?
> Have a look into irq_work_queue(). There is:
> /*
> * If the work is not "lazy" or the tick is stopped, raise the irq
> * work interrupt (if supported by the arch), otherwise, just wait
> * for the next tick. We do this even for unbound work to make sure
> * *some* CPU will be doing the work.
> */
> if (!(work->flags & IRQ_WORK_LAZY) || tick_nohz_tick_stopped()) {
> if (!this_cpu_cmpxchg(irq_work_raised, 0, 1))
> arch_irq_work_raise();
> }
>
> So we raise an interrupt if there would be no timer ticking (which is
> what I suppose you mean by "CPU is idle"). That is nothing changed by my
> patches...
Ok but we raise that interrupt locally, not to the other CPUs.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
== 4 of 6 ==
Date: Thurs, Nov 7 2013 2:40 pm
From: Frederic Weisbecker
On Thu, Nov 07, 2013 at 11:19:04PM +0100, Jan Kara wrote:
> On Thu 07-11-13 23:13:39, Frederic Weisbecker wrote:
> > But then, who's going to process that work if every CPU is idle?
> Have a look into irq_work_queue(). There is:
> /*
> * If the work is not "lazy" or the tick is stopped, raise the irq
> * work interrupt (if supported by the arch), otherwise, just wait
> * for the next tick. We do this even for unbound work to make sure
> * *some* CPU will be doing the work.
> */
> if (!(work->flags & IRQ_WORK_LAZY) || tick_nohz_tick_stopped()) {
> if (!this_cpu_cmpxchg(irq_work_raised, 0, 1))
> arch_irq_work_raise();
> }
>
> So we raise an interrupt if there would be no timer ticking (which is
> what I suppose you mean by "CPU is idle"). That is nothing changed by my
> patches...
That said I agree that it would be nice to have smp_call_function_many() support
non-waiting calls, something based on llist, that would be less deadlock-prone
to begin with.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
== 5 of 6 ==
Date: Thurs, Nov 7 2013 2:50 pm
From: Frederic Weisbecker
2013/11/7 Jan Kara <jack@suse.cz>:
> A CPU can be caught in console_unlock() for a long time (tens of seconds
> are reported by our customers) when other CPUs are using printk heavily
> and serial console makes printing slow. Even though serial console drivers
> call touch_nmi_watchdog(), this triggers softlockup warnings
> because interrupts are disabled for the whole time console_unlock() runs
> (e.g. vprintk() calls console_unlock() with interrupts disabled). Thus
> IPIs cannot be processed and other CPUs get stuck spinning in calls like
> smp_call_function_many(). Also RCU eventually starts reporting lockups.
>
> In my artificial testing I can also easily trigger a situation where a disk
> disappears from the system, apparently because its interrupt went
> unserviced for too long. This is why just silencing watchdogs isn't a
> reliable solution to the problem and we simply have to avoid spending
> too long in console_unlock() with interrupts disabled.
>
> The solution this patch works toward is to postpone printing to a later
> moment / different CPU when we already printed over X characters in
> current console_unlock() invocation. This is a crude heuristic but
> measuring time we spent printing doesn't seem to be really viable - we
> cannot rely on high resolution time being available and with interrupts
> disabled jiffies are not updated. User can tune the value X via
> printk.offload_chars kernel parameter.
>
> Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
> Signed-off-by: Jan Kara <jack@suse.cz>
When a message takes tens of seconds to be printed, it usually means
we are in trouble somehow :)
I wonder what printk source can trigger such a high volume.
Maybe cutting some huge message into smaller chunks could help? That
would re-enable interrupts between each call.
It's hard to tell without the context, but using other CPUs for
rescuing doesn't look like a good solution. What if the issue happens
in UP to begin with?
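For reference, the heuristic described in the quoted changelog amounts to
something like the following sketch (not the actual implementation; the
names and the default value are assumptions, only the printk.offload_chars
parameter name comes from the patch description):

#include <linux/types.h>

static unsigned int offload_chars = 1000;   /* printk.offload_chars, assumed default */

/* true once a single console_unlock() invocation has emitted more than
 * offload_chars characters and the remaining output should be handed off,
 * e.g. via the unbound irq work from earlier in this series */
static bool console_should_offload(unsigned long chars_printed)
{
        return offload_chars && chars_printed > offload_chars;
}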
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
== 6 of 6 ==
Date: Thurs, Nov 7 2013 3:00 pm
From: Jan Kara
On Thu 07-11-13 23:43:52, Frederic Weisbecker wrote:
> 2013/11/7 Jan Kara <jack@suse.cz>:
> > A CPU can be caught in console_unlock() for a long time (tens of seconds
> > are reported by our customers) when other CPUs are using printk heavily
> > and serial console makes printing slow. Even though serial console drivers
> > call touch_nmi_watchdog(), this triggers softlockup warnings
> > because interrupts are disabled for the whole time console_unlock() runs
> > (e.g. vprintk() calls console_unlock() with interrupts disabled). Thus
> > IPIs cannot be processed and other CPUs get stuck spinning in calls like
> > smp_call_function_many(). Also RCU eventually starts reporting lockups.
> >
> > In my artificial testing I can also easily trigger a situation where a disk
> > disappears from the system, apparently because its interrupt went
> > unserviced for too long. This is why just silencing watchdogs isn't a
> > reliable solution to the problem and we simply have to avoid spending
> > too long in console_unlock() with interrupts disabled.
> >
> > The solution this patch works toward is to postpone printing to a later
> > moment / different CPU when we already printed over X characters in
> > current console_unlock() invocation. This is a crude heuristic but
> > measuring time we spent printing doesn't seem to be really viable - we
> > cannot rely on high resolution time being available and with interrupts
> > disabled jiffies are not updated. User can tune the value X via
> > printk.offload_chars kernel parameter.
> >
> > Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
> > Signed-off-by: Jan Kara <jack@suse.cz>
>
> When a message takes tens of seconds to be printed, it usually means
> we are in trouble somehow :)
> I wonder what printk source can trigger such a high volume.
Machines with tens of processors and thousands of scsi devices. When
device discovery happens on boot, all processors are busily reporting new
scsi devices and one poor loser is bound to do the printing forever and
ever until the machine dies...
Or try running sysrq-t on a large machine with serial console connected. The
machine will die because of lockups (although in this case I agree it is more
of a problem of sysrq-t doing lots of printing in interrupt-disabled
context).
> Maybe cutting some huge message into smaller chunks could help? That
> would re-enable interrupts between each call.
>
> It's hard to tell without the context, but using other CPUs for
> rescuing doesn't look like a good solution. What if the issue happens
> in UP to begin with?
The real trouble in practice is that while one cpu is doing printing,
other cpus are appending to the printk buffer. So the cpu can be printing
for a *long* time. So offloading the work to other cpus which are also
appending messages seems like a fair thing to do.
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
==============================================================================
TOPIC: provide estimated available memory in /proc/meminfo
http://groups.google.com/group/linux.kernel/t/3de838d7bfb53c0c?hl=en
==============================================================================
== 1 of 1 ==
Date: Thurs, Nov 7 2013 2:30 pm
From: Andrew Morton
On Thu, 7 Nov 2013 16:21:32 -0500 Johannes Weiner <hannes@cmpxchg.org> wrote:
> > Subject: provide estimated available memory in /proc/meminfo
> >
> > Many load balancing and workload placing programs check /proc/meminfo
> > to estimate how much free memory is available. They generally do this
> > by adding up "free" and "cached", which was fine ten years ago, but
> > is pretty much guaranteed to be wrong today.
> >
> > It is wrong because Cached includes memory that is not freeable as
> > page cache, for example shared memory segments, tmpfs, and ramfs,
> > and it does not include reclaimable slab memory, which can take up
> > a large fraction of system memory on mostly idle systems with lots
> > of files.
> >
> > Currently, the amount of memory that is available for a new workload,
> > without pushing the system into swap, can be estimated from MemFree,
> > Active(file), Inactive(file), and SReclaimable, as well as the "low"
> > watermarks from /proc/zoneinfo.
> >
> > However, this may change in the future, and user space really should
> > not be expected to know kernel internals to come up with an estimate
> > for the amount of free memory.
> >
> > It is more convenient to provide such an estimate in /proc/meminfo.
> > If things change in the future, we only have to change it in one place.
> >
> > Signed-off-by: Rik van Riel <riel@redhat.com>
> > Reported-by: Erik Mouw <erik.mouw_2@nxp.com>
>
> Acked-by: Johannes Weiner <hannes@cmpxchg.org>
>
> I have a suspicion that people will end up relying on this number to
> start new workloads in situations where lots of the page cache is
> actually heavily used. We might not swap, but there will still be IO
> from thrashing cache.
>
> Maybe we'll have to subtract mapped cache pages in the future to
> mitigate this risk somehow...
>
> Anyway, we can defer this to when it's proven to be an actual problem.
Well not really. Once we release this thing with a particular
implementation, we are constrained in making any later changes. If we
change it to produce larger numbers, someone's workload will start
swapping. If we change it to produce smaller numbers, someone's
workload will refuse to start.
It all needs a bit of thought, and even some testing! I labelled this
one for-3.14.
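For readers who want to see what the changelog's estimate amounts to, here is
a back-of-the-envelope userspace version. It is only a sketch, under the
assumption that roughly half of the file pages and of the reclaimable slab can
be dropped without swapping; the heuristic the patch actually uses may differ:

/*
 * Sketch only: approximate the "available" estimate described above
 * from values already exported in /proc/meminfo and /proc/zoneinfo
 * (all in kB).  The halving of page cache and reclaimable slab is an
 * assumption for illustration, not necessarily what the patch does.
 */
static long estimate_available_kb(long memfree, long active_file,
				  long inactive_file, long sreclaimable,
				  long low_watermarks)
{
	long available = memfree - low_watermarks;

	available += (active_file + inactive_file) / 2;
	available += sreclaimable / 2;

	return available > 0 ? available : 0;
}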
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
==============================================================================
TOPIC: Introducing Device Tree Overlays
http://groups.google.com/group/linux.kernel/t/7c26bee9afcc62d1?hl=en
==============================================================================
== 1 of 2 ==
Date: Thurs, Nov 7 2013 2:30 pm
From: Guenter Roeck
On Thu, Nov 07, 2013 at 08:25:58PM +0100, Sebastian Andrzej Siewior wrote:
> On 06.11.13, Guenter Roeck wrote:
> |…
> thanks for the explanation.
>
> > We use DT overlays to describe the hardware on those boards and, if necessary,
> > its configuration. For example, if there is a PCIe switch, the overlay would
> > describe its memory and bus number configuration.
>
> So have your "fixed" configuration and a few overlays you switch at
> runtime. The problem you have is that you want to switch a specific part
> of your configuration at runtime. I assume you run DT on ARM. What
> happens if you switch from ARM to x86 and you "keep" your FPGA
> configuration requirement? You can't use both, DT and ACPI, right? So
> what happens then?
>
We intend to use DT on x86 to augment ACPI data. There are a variety of reasons
why we cannot use ACPI, nor do we want to, as we prefer a single method for
handling OIR on all platforms. FWIW, the non-x86 platform is powerpc, not arm.
Guenter
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
== 2 of 2 ==
Date: Thurs, Nov 7 2013 3:00 pm
From: Guenter Roeck
On Thu, Nov 07, 2013 at 10:06:31PM +0200, Pantelis Antoniou wrote:
> Hi Sebastian,
>
> On Nov 7, 2013, at 9:25 PM, Sebastian Andrzej Siewior wrote:
>
> > On 06.11.13, Guenter Roeck wrote:
> > |…
> > thanks for the explanation.
> >
> >> We use DT overlays to describe the hardware on those boards and, if necessary,
> >> its configuration. For example, if there is a PCIe switch, the overlay would
> >> describe its memory and bus number configuration.
> >
> > So have your "fixed" configuration and a few overlays you switch at
> > runtime. The problem you have is that you want to switch a specific part
> > of your configuration at runtime. I assume you run DT on ARM. What
powerpc
> > happens if you switch from ARM to x86 and you "keep" your FPGA
> > configuration requirement? You can't use both, DT and ACPI, right? So
> > what happens then?
> >
>
> FWIW, DT has been ported to x86, and is present on arm/powerpc/mips/arc and possibly
> others.
>
> So what are we talking about again? If you care about the non-DT case, why
> don't you make a patch showing how you could support Guenter's use case on
> x86?
>
> His use case is not uncommon, believe it or not, and x86 would benefit from
> something this flexible.
>
Together with the work Thierry did a couple of years ago, using DT
to augment ACPI data on x86 platforms, I don't really see a major problem
with using DT overlays on x86. Sure, it will require some work, and the
resulting patches may not be accepted for upstream integration,
but the concept is already there, and we plan to make good use of it.
Guenter
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
==============================================================================
TOPIC: platform: add chrome platform directory
http://groups.google.com/group/linux.kernel/t/ca8784c0df1499d1?hl=en
==============================================================================
== 1 of 1 ==
Date: Thurs, Nov 7 2013 2:30 pm
From: Olof Johansson
It makes sense to split out the Chromebook/Chromebox hardware platform
drivers to a separate subdirectory, since some of it will be shared
between ARM and x86.
This moves over the existing chromeos_laptop driver without making
any other changes, and adds appropriate Kconfig entries for the new
directory. It also adds a MAINTAINERS entry for the new subdir.
Signed-off-by: Olof Johansson <olof@lixom.net>
---
MAINTAINERS | 5 ++++
drivers/platform/Kconfig | 1 +
drivers/platform/Makefile | 1 +
drivers/platform/chrome/Kconfig | 28 ++++++++++++++++++++++
drivers/platform/chrome/Makefile | 2 ++
drivers/platform/{x86 => chrome}/chromeos_laptop.c | 0
drivers/platform/x86/Kconfig | 11 ---------
drivers/platform/x86/Makefile | 1 -
8 files changed, 37 insertions(+), 12 deletions(-)
create mode 100644 drivers/platform/chrome/Kconfig
create mode 100644 drivers/platform/chrome/Makefile
rename drivers/platform/{x86 => chrome}/chromeos_laptop.c (100%)
diff --git a/MAINTAINERS b/MAINTAINERS
index 831b8690cf13..07e312a3377b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2129,6 +2129,11 @@ L: linux-usb@vger.kernel.org
S: Maintained
F: drivers/usb/chipidea/
+CHROME HARDWARE PLATFORM SUPPORT
+M: Olof Johansson <olof@lixom.net>
+S: Maintained
+F: drivers/platform/chrome/
+
CISCO VIC ETHERNET NIC DRIVER
M: Christian Benvenuti <benve@cisco.com>
M: Sujith Sankar <ssujith@cisco.com>
diff --git a/drivers/platform/Kconfig b/drivers/platform/Kconfig
index 69616aeaa966..09fde58b12e0 100644
--- a/drivers/platform/Kconfig
+++ b/drivers/platform/Kconfig
@@ -5,3 +5,4 @@ if GOLDFISH
source "drivers/platform/goldfish/Kconfig"
endif
+source "drivers/platform/chrome/Kconfig"
diff --git a/drivers/platform/Makefile b/drivers/platform/Makefile
index 8a44a4cd6d1e..3656b7b17b99 100644
--- a/drivers/platform/Makefile
+++ b/drivers/platform/Makefile
@@ -5,3 +5,4 @@
obj-$(CONFIG_X86) += x86/
obj-$(CONFIG_OLPC) += olpc/
obj-$(CONFIG_GOLDFISH) += goldfish/
+obj-$(CONFIG_CHROME_PLATFORMS) += chrome/
diff --git a/drivers/platform/chrome/Kconfig b/drivers/platform/chrome/Kconfig
new file mode 100644
index 000000000000..b13303e75a34
--- /dev/null
+++ b/drivers/platform/chrome/Kconfig
@@ -0,0 +1,28 @@
+#
+# Platform support for Chrome OS hardware (Chromebooks and Chromeboxes)
+#
+
+menuconfig CHROME_PLATFORMS
+ bool "Platform support for Chrome hardware"
+ depends on X86
+ ---help---
+ Say Y here to get to see options for platform support for
+ various Chromebooks and Chromeboxes. This option alone does
+ not add any kernel code.
+
+ If you say N, all options in this submenu will be skipped and disabled.
+
+if CHROME_PLATFORMS
+
+config CHROMEOS_LAPTOP
+ tristate "Chrome OS Laptop"
+ depends on I2C
+ depends on DMI
+ ---help---
+ This driver instantiates i2c and smbus devices such as
+ light sensors and touchpads.
+
+ If you have a supported Chromebook, choose Y or M here.
+ The module will be called chromeos_laptop.
+
+endif # CHROME_PLATFORMS
diff --git a/drivers/platform/chrome/Makefile b/drivers/platform/chrome/Makefile
new file mode 100644
index 000000000000..015e9195e226
--- /dev/null
+++ b/drivers/platform/chrome/Makefile
@@ -0,0 +1,2 @@
+
+obj-$(CONFIG_CHROMEOS_LAPTOP) += chromeos_laptop.o
diff --git a/drivers/platform/x86/chromeos_laptop.c b/drivers/platform/chrome/chromeos_laptop.c
similarity index 100%
rename from drivers/platform/x86/chromeos_laptop.c
rename to drivers/platform/chrome/chromeos_laptop.c
diff --git a/drivers/platform/x86/Kconfig b/drivers/platform/x86/Kconfig
index b51a7460cc49..d9dcd37b5a52 100644
--- a/drivers/platform/x86/Kconfig
+++ b/drivers/platform/x86/Kconfig
@@ -79,17 +79,6 @@ config ASUS_LAPTOP
If you have an ACPI-compatible ASUS laptop, say Y or M here.
-config CHROMEOS_LAPTOP
- tristate "Chrome OS Laptop"
- depends on I2C
- depends on DMI
- ---help---
- This driver instantiates i2c and smbus devices such as
- light sensors and touchpads.
-
- If you have a supported Chromebook, choose Y or M here.
- The module will be called chromeos_laptop.
-
config DELL_LAPTOP
tristate "Dell Laptop Extras"
depends on X86
diff --git a/drivers/platform/x86/Makefile b/drivers/platform/x86/Makefile
index 5dbe19324351..f0e6aa407ffb 100644
--- a/drivers/platform/x86/Makefile
+++ b/drivers/platform/x86/Makefile
@@ -50,7 +50,6 @@ obj-$(CONFIG_INTEL_MID_POWER_BUTTON) += intel_mid_powerbtn.o
obj-$(CONFIG_INTEL_OAKTRAIL) += intel_oaktrail.o
obj-$(CONFIG_SAMSUNG_Q10) += samsung-q10.o
obj-$(CONFIG_APPLE_GMUX) += apple-gmux.o
-obj-$(CONFIG_CHROMEOS_LAPTOP) += chromeos_laptop.o
obj-$(CONFIG_INTEL_RST) += intel-rst.o
obj-$(CONFIG_INTEL_SMARTCONNECT) += intel-smartconnect.o
--
1.8.4.1.601.g02b3b1d
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
==============================================================================
TOPIC: MCS Lock: Barrier corrections
http://groups.google.com/group/linux.kernel/t/c949c96528a47270?hl=en
==============================================================================
== 1 of 2 ==
Date: Thurs, Nov 7 2013 2:30 pm
From: Peter Zijlstra
On Thu, Nov 07, 2013 at 01:15:51PM -0800, Tim Chen wrote:
> Michel, are you planning to do an implementation of
> load-acquire/store-release functions of various architectures?
A little something like this:
http://marc.info/?l=linux-arch&m=138386254111507
It so happens we were working on that the past week or so due to another
issue ;-)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
== 2 of 2 ==
Date: Thurs, Nov 7 2013 2:50 pm
From: Michel Lespinasse
On Thu, Nov 7, 2013 at 2:21 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Thu, Nov 07, 2013 at 01:15:51PM -0800, Tim Chen wrote:
>> Michel, are you planning to do an implementation of
>> load-acquire/store-release functions of various architectures?
>
> A little something like this:
> http://marc.info/?l=linux-arch&m=138386254111507
>
> It so happens we were working on that the past week or so due to another
> issue ;-)
Haha, awesome, I wasn't aware of this effort.
Tim: my approach would be to provide the acquire/release operations in
arch-specific include files, and have a default implementation using
barriers for arches that don't provide these new ops. That way you make
it work on all arches at once (using the default implementation) and
make it fast on any arch that cares.
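For the arches that don't provide their own ops, the fallback could look
roughly like this (a sketch only; the proposal Peter linked above may differ
in naming and in exactly which barriers it picks):

#ifndef smp_store_release
#define smp_store_release(p, v)					\
do {									\
	smp_mb();	/* order prior accesses before the store */	\
	ACCESS_ONCE(*(p)) = (v);					\
} while (0)
#endif

#ifndef smp_load_acquire
#define smp_load_acquire(p)						\
({									\
	typeof(*(p)) ___v = ACCESS_ONCE(*(p));				\
	smp_mb();	/* order the load before later accesses */	\
	___v;								\
})
#endif

An arch with real acquire/release instructions (e.g. arm64's ldar/stlr) would
simply define the macros itself and skip the full barrier.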
>> Or is the approach of arch specific memory barrier for MCS
>> an acceptable one before load-acquire and store-release
>> are available? Are there any technical issues remaining with
>> the patchset after including Waiman's arch specific barrier?
I don't want to stand in the way of Waiman's change, and I had
actually taken the same approach with arch-specific barriers when
proposing some queue spinlocks in the past; however I do feel that
this comes back regularly enough that having acquire/release
primitives available would help, hence my proposal.
That said, earlier in the thread Linus said we should probably get all
our ducks in a row before going forward with this, so...
--
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
==============================================================================
TOPIC: usb-serial lockdep trace in linus' current tree.
http://groups.google.com/group/linux.kernel/t/10ab65d5a0e4e1ea?hl=en
==============================================================================
== 1 of 1 ==
Date: Thurs, Nov 7 2013 2:40 pm
From: Dave Jones
Seeing this since today's USB merge.
WARNING: CPU: 0 PID: 226 at kernel/lockdep.c:2740 lockdep_trace_alloc+0xc5/0xd0()
DEBUG_LOCKS_WARN_ON(irqs_disabled_flags(flags))
Modules linked in: usb_debug(+) kvm_intel kvm crct10dif_pclmul crc32c_intel ghash_clmulni_intel microcode(+) pcspkr serio_raw
CPU: 0 PID: 226 Comm: systemd-udevd Not tainted 3.12.0+ #112
ffffffff81a22d3d ffff88023cde5670 ffffffff8171a8e8 ffff88023cde56b8
ffff88023cde56a8 ffffffff8105430d 0000000000000046 00000000000080d0
0000000000000010 0000000000000001 ffff880244407a80 ffff88023cde5708
Call Trace:
[<ffffffff8171a8e8>] dump_stack+0x4e/0x82
[<ffffffff8105430d>] warn_slowpath_common+0x7d/0xa0
[<ffffffff8105437c>] warn_slowpath_fmt+0x4c/0x50
[<ffffffff810cbcd5>] lockdep_trace_alloc+0xc5/0xd0
[<ffffffff811a5633>] __kmalloc+0x53/0x350
[<ffffffff8153e417>] ? xhci_urb_enqueue+0xb7/0x610
[<ffffffff81339f5c>] ? debug_dma_mapping_error+0x7c/0x90
[<ffffffff8153e417>] xhci_urb_enqueue+0xb7/0x610
[<ffffffff8150fad6>] usb_hcd_submit_urb+0xa6/0xae0
[<ffffffff8100b2f4>] ? native_sched_clock+0x24/0x80
[<ffffffff810c531f>] ? trace_hardirqs_off_caller+0x1f/0xc0
[<ffffffff8100b2f4>] ? native_sched_clock+0x24/0x80
[<ffffffff810c531f>] ? trace_hardirqs_off_caller+0x1f/0xc0
[<ffffffff810c5439>] ? get_lock_stats+0x19/0x60
[<ffffffff810c5bae>] ? put_lock_stats.isra.28+0xe/0x40
[<ffffffff81511569>] usb_submit_urb+0x1f9/0x470
[<ffffffff81554ca5>] usb_serial_generic_write_start+0xf5/0x210
[<ffffffff81554ef0>] usb_serial_generic_write+0x70/0x90
[<ffffffff81555637>] usb_console_write+0xc7/0x220
[<ffffffff810af585>] call_console_drivers.constprop.23+0xa5/0x1e0
[<ffffffff810afe0c>] console_unlock+0x40c/0x460
[<ffffffff810b10ec>] register_console+0x12c/0x390
[<ffffffff81555b62>] usb_serial_console_init+0x22/0x40
[<ffffffff815539aa>] usb_serial_probe+0xfea/0x10e0
[<ffffffff8100b2f4>] ? native_sched_clock+0x24/0x80
[<ffffffff810c531f>] ? trace_hardirqs_off_caller+0x1f/0xc0
[<ffffffff810c5439>] ? get_lock_stats+0x19/0x60
[<ffffffff8172004d>] ? __mutex_unlock_slowpath+0xed/0x1a0
[<ffffffff810c8af5>] ? trace_hardirqs_on_caller+0x115/0x1e0
[<ffffffff810c8bcd>] ? trace_hardirqs_on+0xd/0x10
[<ffffffff815164ef>] usb_probe_interface+0x1cf/0x300
[<ffffffff814ad607>] driver_probe_device+0x87/0x390
[<ffffffff814ad9e3>] __driver_attach+0x93/0xa0
[<ffffffff814ad950>] ? __device_attach+0x40/0x40
[<ffffffff814ab53b>] bus_for_each_dev+0x6b/0xb0
[<ffffffff814ad02e>] driver_attach+0x1e/0x20
[<ffffffff8155246e>] usb_serial_register_drivers+0x29e/0x580
[<ffffffffa0005000>] ? 0xffffffffa0004fff
[<ffffffffa000501e>] usb_serial_module_init+0x1e/0x1000 [usb_debug]
[<ffffffff810002c2>] do_one_initcall+0xf2/0x1a0
[<ffffffff8103c7b3>] ? set_memory_nx+0x43/0x50
[<ffffffff810d9e42>] load_module+0x1fd2/0x26a0
[<ffffffff810d4f90>] ? store_uevent+0x40/0x40
[<ffffffff810da6a6>] SyS_finit_module+0x86/0xb0
[<ffffffff8172db64>] tracesys+0xdd/0xe2
---[ end trace ee033a3c9fd6263b ]---
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
==============================================================================
TOPIC: ARM: Introduce CPU_METHOD_OF_DECLARE() for cpu hotplug/smp
http://groups.google.com/group/linux.kernel/t/b1f45f3038c24409?hl=en
==============================================================================
== 1 of 1 ==
Date: Thurs, Nov 7 2013 2:40 pm
From: Stephen Boyd
On 11/06/13 17:50, Josh Cartwright wrote:
> On Fri, Nov 01, 2013 at 03:08:53PM -0700, Stephen Boyd wrote:
>> diff --git a/arch/arm/kernel/devtree.c b/arch/arm/kernel/devtree.c
>> index f35906b..71a8592 100644
>> --- a/arch/arm/kernel/devtree.c
>> +++ b/arch/arm/kernel/devtree.c
>> @@ -25,6 +25,7 @@
>> #include <asm/smp_plat.h>
>> #include <asm/mach/arch.h>
>> #include <asm/mach-types.h>
>> +#include <asm/smp.h>
>>
>> void __init early_init_dt_add_memory_arch(u64 base, u64 size)
>> {
>> @@ -63,6 +64,36 @@ void __init arm_dt_memblock_reserve(void)
>> }
>> }
>>
>> +#ifdef CONFIG_SMP
>> +extern struct of_cpu_method __cpu_method_of_table[];
>> +
>> +static const struct of_cpu_method __cpu_method_of_table_sentinel
>> + __used __section(__cpu_method_of_table_end);
> Having a sentinel allocated into the linked image makes a lot of sense
> in other cases (IRQCHIP/CLOCKSOURCE_OF_DECLARE, etc), where it's used to
> terminate an of_device_id table (as is expected by of_match_table and
> friends).
>
> In this case, however, you aren't building a match table, so having a
> sentinel allocated isn't necessary. I'd suggest bookending the table
> with a VMLINUX_SYMBOL(__cpu_method_of_table_end) instead.
>
> A whole 2 pointers worth of savings!
>
Yes, will do. Thanks.
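For reference, the bookending Josh suggests would make the lookup walk the
half-open range between the start and end linker symbols, with no sentinel
entry at all. A sketch (illustrative only; the field names of struct
of_cpu_method and the final v2 code may differ):

extern struct of_cpu_method __cpu_method_of_table[];
extern struct of_cpu_method __cpu_method_of_table_end[];

static const struct of_cpu_method * __init
cpu_method_lookup(const char *method)
{
	const struct of_cpu_method *m;

	/* Walk every entry the linker placed in the section. */
	for (m = __cpu_method_of_table; m < __cpu_method_of_table_end; m++)
		if (!strcmp(m->method, method))
			return m;

	return NULL;
}

The linker script then only has to define __cpu_method_of_table and
__cpu_method_of_table_end around the section; dropping the sentinel entry is
where the two pointers of savings come from.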
--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
==============================================================================
You received this message because you are subscribed to the Google Groups "linux.kernel"
group.
To post to this group, visit http://groups.google.com/group/linux.kernel?hl=en
To unsubscribe from this group, send email to linux.kernel+unsubscribe@googlegroups.com
To change the way you get mail from this group, visit:
http://groups.google.com/group/linux.kernel/subscribe?hl=en
To report abuse, send email explaining the problem to abuse@googlegroups.com
==============================================================================
Google Groups: http://groups.google.com/?hl=en