Wednesday, April 21, 2010

linux.kernel - 26 new messages in 21 topics - digest

linux.kernel
http://groups.google.com/group/linux.kernel?hl=en

linux.kernel@googlegroups.com

Today's topics:

* blkio: Fix blkio crash during rq stat update - 2 messages, 2 authors
http://groups.google.com/group/linux.kernel/t/ca7cdb4ebf628cfa?hl=en
* mm,migration: Allow the migration of PageSwapCache pages - 1 message, 1 author
http://groups.google.com/group/linux.kernel/t/960c66df176a3b18?hl=en
* Xonar DX invalid PCI I/O range since 977d17bb174 - 1 message, 1 author
http://groups.google.com/group/linux.kernel/t/b63a4d8da5b1e3c9?hl=en
* KVM bug, git bisected - 2 messages, 2 authors
http://groups.google.com/group/linux.kernel/t/cf6012901770396c?hl=en
* Uprobes Implementation - 1 message, 1 author
http://groups.google.com/group/linux.kernel/t/84dba0fdc14a384d?hl=en
* CFQ read performance regression - 1 message, 1 author
http://groups.google.com/group/linux.kernel/t/f106eaf14f59195d?hl=en
* busy inodes -> ext3 umount crash - 1 message, 1 author
http://groups.google.com/group/linux.kernel/t/f78e1c3f6ea5fa8a?hl=en
* perf: fix initialization bug in parse_single_tracepoint_event() - 1 message, 1 author
http://groups.google.com/group/linux.kernel/t/20270d80ef5317db?hl=en
* readahead on directories - 1 message, 1 author
http://groups.google.com/group/linux.kernel/t/2d53dc39c03ec4a5?hl=en
* perf lock: Fix state machine to recognize lock sequence - 2 messages, 1 author
http://groups.google.com/group/linux.kernel/t/96b38e24ca76f796?hl=en
* tracing update - 1 message, 1 author
http://groups.google.com/group/linux.kernel/t/4448b9a8aa388a5f?hl=en
* kbuild: add *.rej pattern to .gitignore - 1 message, 1 author
http://groups.google.com/group/linux.kernel/t/8c244be9807a27cc?hl=en
* proc: fix badness in fs/proc/generic.c - 1 message, 1 author
http://groups.google.com/group/linux.kernel/t/c9573c3936607e54?hl=en
* fs: fix bad string output - 1 message, 1 author
http://groups.google.com/group/linux.kernel/t/cebfaca6a899df53?hl=en
* perf: introduce model specific events and AMD IBS - 2 messages, 1 author
http://groups.google.com/group/linux.kernel/t/3660ac7771798be1?hl=en
* Threaded irq handler question - 1 message, 1 author
http://groups.google.com/group/linux.kernel/t/97b689eff0a3fa9a?hl=en
* Reservations request - 1 message, 1 author
http://groups.google.com/group/linux.kernel/t/8a1ce57be1ba655e?hl=en
* x86,pci,acpi: Handle invalid _CRS - 2 messages, 2 authors
http://groups.google.com/group/linux.kernel/t/711b7090c86b176e?hl=en
* Mass storage gadget: Handle eject request - 1 message, 1 author
http://groups.google.com/group/linux.kernel/t/0dcdd3896a0d44fc?hl=en
* init: Provide a kernel start parameter to increase pid_max v2 - 1 message, 1 author
http://groups.google.com/group/linux.kernel/t/9a25064c21d093f8?hl=en
* error at compaction (Re: mmotm 2010-04-15-14-42 uploaded - 1 message, 1 author
http://groups.google.com/group/linux.kernel/t/b663875a42c376e5?hl=en

==============================================================================
TOPIC: blkio: Fix blkio crash during rq stat update
http://groups.google.com/group/linux.kernel/t/ca7cdb4ebf628cfa?hl=en
==============================================================================

== 1 of 2 ==
Date: Wed, Apr 21 2010 8:50 am
From: Jens Axboe


On Wed, Apr 21 2010, Jens Axboe wrote:
> On Wed, Apr 21 2010, Vivek Goyal wrote:
> > @@ -1001,6 +1002,11 @@ static struct cfq_group *cfq_get_cfqg(struct cfq_data *cfqd, int create)
> > return cfqg;
> > }
> >
> > +static inline struct cfq_group *cfq_ref_get_cfqg(struct cfq_group *cfqg) {
> > + atomic_inc(&cfqg->ref);
> > + return cfqg;
> > +}
> > +
>
> Sorry to keep harping on this - style is still wrong, and:

I'll fix these up by hand.
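Jens does not spell out the style problem. The quoted reply stops at "and:". The most visible departure from the kernel's Documentation/CodingStyle in the quoted hunk is the opening brace of the function body sitting on the declaration line.

Below is a minimal userspace sketch of the helper in kernel brace style. It assumes C11 <stdatomic.h> as a stand-in for the kernel's atomic_t and atomic_inc(). It also assumes a cut-down struct cfq_group that models only the refcount.

```c
#include <assert.h>
#include <stdatomic.h>

/* Assumed stand-in for the kernel's struct cfq_group: only the refcount. */
struct cfq_group {
	atomic_int ref;
};

/*
 * Kernel coding style: a function's opening brace goes on a line of its
 * own, not at the end of the declaration line.
 */
static inline struct cfq_group *cfq_ref_get_cfqg(struct cfq_group *cfqg)
{
	atomic_fetch_add(&cfqg->ref, 1);	/* kernel: atomic_inc(&cfqg->ref) */
	return cfqg;
}
```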

--
Jens Axboe

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


== 2 of 2 ==
Date: Wed, Apr 21 2010 8:50 am
From: Vivek Goyal


On Wed, Apr 21, 2010 at 05:41:28PM +0200, Jens Axboe wrote:
> On Wed, Apr 21 2010, Jens Axboe wrote:
> > On Wed, Apr 21 2010, Vivek Goyal wrote:
> > > @@ -1001,6 +1002,11 @@ static struct cfq_group *cfq_get_cfqg(struct cfq_data *cfqd, int create)
> > > return cfqg;
> > > }
> > >
> > > +static inline struct cfq_group *cfq_ref_get_cfqg(struct cfq_group *cfqg) {
> > > + atomic_inc(&cfqg->ref);
> > > + return cfqg;
> > > +}
> > > +
> >
> > Sorry to keep harping on this - style is still wrong, and:
>
> I'll fix these up by hand.

Thanks Jens. Sorry for the trouble.

Vivek

>
> --
> Jens Axboe

==============================================================================
TOPIC: mm,migration: Allow the migration of PageSwapCache pages
http://groups.google.com/group/linux.kernel/t/960c66df176a3b18?hl=en
==============================================================================

== 1 of 1 ==
Date: Wed, Apr 21 2010 8:50 am
From: Christoph Lameter


On Wed, 21 Apr 2010, Mel Gorman wrote:

> > > 2. Is the BUG_ON check in
> > > include/linux/swapops.h#migration_entry_to_page() now wrong? (I
> > > think yes, but I'm not sure and I'm having trouble verifying it)
> >
> > The bug check ensures that migration entries only occur when the page
> > is locked. This patch changes that behavior. This is therefore going to
> > oops in unmap_and_move() when you try to remove the migration_ptes
> > from an unlocked page.
> >
>
> It's not unmap_and_move() that the problem is occurring on but during a
> page fault - presumably in do_swap_page but I'm not 100% certain.

remove_migration_pte() calls migration_entry_to_page(). So it must do that
only if the page is still locked.

You need to ensure that the page is not unlocked in move_to_new_page() if
the migration ptes are kept.

move_to_new_page() only unlocks the new page, not the original page. So that is safe.

And it seems that the old page is also unlocked in unmap_and_move() only
after the migration_ptes have been removed? So we are fine after all...?


==============================================================================
TOPIC: Xonar DX invalid PCI I/O range since 977d17bb174
http://groups.google.com/group/linux.kernel/t/b63a4d8da5b1e3c9?hl=en
==============================================================================

== 1 of 1 ==
Date: Wed, Apr 21 2010 9:00 am
From: Jesse Barnes


On Tue, 20 Apr 2010 23:06:09 -0700
Yinghai <yinghai.lu@oracle.com> wrote:
> After several retries, the MMIO ranges get assigned, but the I/O port range cannot be allocated enough space. It needs 16k,
> but under 05:01.0 to 08:00.0 and 09:04.0 the original I/O port allocation from the BIOS gets lost.
>
> It would be good if we could restore it in such a case.
>
> For now we may have to disable the bridge resizing feature by default.
>
> can you send out
> lspci -vvxxx
> lspci -tv

Since we don't really know which devices will be in use until drivers
load (and not even then if they're userspace drivers), it might be best
to put off the reassignment until a PCI driver expresses an interest in
the range.

At least, it seems like that would be closer to the ideal approach than
trying to reassign everything at boot, potentially making devices that
don't matter get resources and leaving important devices disabled.

--
Jesse Barnes, Intel Open Source Technology Center

==============================================================================
TOPIC: KVM bug, git bisected
http://groups.google.com/group/linux.kernel/t/cf6012901770396c?hl=en
==============================================================================

== 1 of 2 ==
Date: Wed, Apr 21 2010 9:00 am
From: "Rafael J. Wysocki"


On Wednesday 21 April 2010, Peter Zijlstra wrote:
> On Wed, 2010-04-21 at 08:20 +0200, Borislav Petkov wrote:
> > From: "Rafael J. Wysocki" <rjw@sisk.pl>
> > Date: Wed, Apr 21, 2010 at 07:02:02AM +0200
> >
> > > On Wednesday 21 April 2010, Rik van Riel wrote:
> > > > On 04/20/2010 05:11 PM, Rik van Riel wrote:
> > > > > On 04/19/2010 11:19 PM, Rafael J. Wysocki wrote:
> > > > >> This message has been generated automatically as a part of a summary
> > > > >> report
> > > > >> of recent regressions.
> > > > >>
> > > > >> The following bug entry is on the current list of known regressions
> > > > >> from 2.6.33. Please verify if it still should be listed and let the
> > > > >> tracking team
> > > > >> know (either way).
> > > > >>
> > > > >>
> > > > >> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=15672
> > > > >> Subject : KVM bug, git bisected
> > > > >> Submitter : Kent Overstreet<kent.overstreet@gmail.com>
> > > > >> Date : 2010-03-27 12:43 (24 days old)
> > > > >> First-Bad-Commit:
> > > > >> http://kernel.org/git/linus/5beb49305251e5669852ed541e8e2f2f7696c53e
> > > > >> Message-ID :<4BADFD74.8060904@gmail.com>
> > > > >> References : http://marc.info/?l=linux-kernel&m=126969385121711&w=2
> > > > >
> > > > > Should be fixed by commit ea90002b0fa7bdee86ec22eba1d951f30bf043a6
> > > >
> > > > Never mind me - this is a harmless (but loud) overflow
> > > > of PREEMPT_BITS in the preempt count.
> > >
> > > OK, what am I supposed to do with this entry, then? Close?
> >
> > FWIW, I hit that warning too when chasing the anon_vma regression. It
> > seems on certain workloads (for me it was several kvm guests) we're
> > close to max preemption depth.
> >
> > Anyway, adding some more people to Cc.
>
> Right, so my proposed solution to this is to make those locks
> preemptible, but that's a large and unfinished patch-set.
>
> As it is, it's only a warning, nothing really serious should happen, but
> the situation does suck.

I'm not sure if it's worth keeping that listed, though, as the problem is known
and won't be solved before .34 final.

OK to close as "will fix later"?

Rafael
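Rik's explanation is that nested lock acquisitions push the preempt-disable depth past what the PREEMPT_BITS field of preempt_count can hold.

Below is a rough model of that overflow. It assumes the 2.6.3x layout from include/linux/hardirq.h: 8 preempt bits at shift 0, with 8 softirq bits directly above. Check the header in your tree. Each spin_lock() adds one to the low field, so the 256th nested increment carries into the softirq field. The helper names are hypothetical.

```c
#include <assert.h>

/* Assumed 2.6.3x preempt_count layout: bits 0-7 preempt, bits 8-15 softirq. */
#define PREEMPT_BITS	8
#define SOFTIRQ_BITS	8
#define PREEMPT_SHIFT	0
#define SOFTIRQ_SHIFT	(PREEMPT_SHIFT + PREEMPT_BITS)
#define PREEMPT_MASK	(((1UL << PREEMPT_BITS) - 1) << PREEMPT_SHIFT)
#define SOFTIRQ_MASK	(((1UL << SOFTIRQ_BITS) - 1) << SOFTIRQ_SHIFT)

/* Hypothetical: simulate 'depth' nested preempt_disable() calls. */
static unsigned long nest_preempt_disable(unsigned int depth)
{
	unsigned long count = 0;

	for (unsigned int i = 0; i < depth; i++)
		count += 1UL << PREEMPT_SHIFT;	/* no carry check, like the real counter */
	return count;
}

static unsigned long preempt_field(unsigned long count)
{
	return (count & PREEMPT_MASK) >> PREEMPT_SHIFT;
}

static unsigned long softirq_field(unsigned long count)
{
	return (count & SOFTIRQ_MASK) >> SOFTIRQ_SHIFT;
}
```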


== 2 of 2 ==
Date: Wed, Apr 21 2010 9:10 am
From: Peter Zijlstra


On Wed, 2010-04-21 at 17:57 +0200, Rafael J. Wysocki wrote:
>
> OK to close as "will fix later"?
>
Sure


==============================================================================
TOPIC: Uprobes Implementation
http://groups.google.com/group/linux.kernel/t/84dba0fdc14a384d?hl=en
==============================================================================

== 1 of 1 ==
Date: Wed, Apr 21 2010 9:10 am
From: Oleg Nesterov


On 04/21, Srikar Dronamraju wrote:
>
> * Oleg Nesterov <oleg@redhat.com> [2010-04-20 17:30:23]:
>
> > I must have missed something. But I do not see where we use
> > uprobe_process->tg_leader. We never read it, apart from
> > BUG_ON(uproc->tg_leader != tg_leader). No?
>
> static int free_uprocess(struct uprobe_process *uproc)
> {
> ....
> put_pid(uproc->tg_leader);
> uproc->tg_leader = NULL;
>
> }

Yes, yes, I see it does get/put pid. But where do we actually use
uproc->tg_leader? Why is it needed at all?

> > Also the declarations don't look nice... Probably I missed something,
> > but why the code uses "void *" instead of "user_bkpt_xol_area *" for
> > xol_area everywhere?
> >
> > OK, even if "void *" makes sense for uproc->uprobe_process,
^^^^^^^^^^^^^^^^^^^^^
I meant uprobe_process->xol_area

> why
> > xol_alloc_area/xol_get_insn_slot/etc do not use "user_bkpt_xol_area *" ?
> >
>
> user_bkpt_xol_area isn't exposed. This provides flexibility in changing
> the algorithm for more efficient slot allocation. Currently we allocate
> slots from just one page. Later on we could end up having to allocate
> from more than contiguous pages. There was some discussion about
> allocating slots from TLS. So there is more than one reason that
> user_bkpt_xol can change. We could expose the struct and not access the
> fields directly but that would be hard to enforce.

Still can't understand... Yes, we shouldn't expose the details, but we
can just add "struct user_bkpt_xol_area;" into include file.

OK, this is minor.
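Oleg's alternative to "void *" is an incomplete (opaque) struct type. The header declares only the struct name, so callers get pointer type checking without seeing the fields. The slot allocator can still change behind that declaration.

This is a one-file sketch. xol_alloc_area(), xol_get_insn_slot() and xol_free_area() have hypothetical, simplified signatures here, not the ones in the uprobes patch.

```c
#include <assert.h>
#include <stdlib.h>

/* --- what the shared header would contain: an incomplete type --- */
struct user_bkpt_xol_area;

struct user_bkpt_xol_area *xol_alloc_area(unsigned int nslots);
long xol_get_insn_slot(struct user_bkpt_xol_area *area);
void xol_free_area(struct user_bkpt_xol_area *area);

/*
 * --- what stays private to the .c file ---
 * The single-page layout could later become several pages, TLS, etc.
 * without touching any caller.
 */
struct user_bkpt_xol_area {
	unsigned int nslots;
	unsigned int next;	/* next unused slot */
};

struct user_bkpt_xol_area *xol_alloc_area(unsigned int nslots)
{
	struct user_bkpt_xol_area *area = malloc(sizeof(*area));

	if (area) {
		area->nslots = nslots;
		area->next = 0;
	}
	return area;
}

/* Returns a slot index, or -1 when the area is full. */
long xol_get_insn_slot(struct user_bkpt_xol_area *area)
{
	if (area->next >= area->nslots)
		return -1;
	return area->next++;
}

void xol_free_area(struct user_bkpt_xol_area *area)
{
	free(area);
}
```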

> > > If the utask has to be allocated, then uprobes has to search
> > > for the probepoint again in task context.
> > > I don't think it would be an issue to search for the probepoint a
> > > second time in the task context.
> >
> > Agreed. Although we need the new TIF_ bit for tracehook_notify_resume(),
> > it can't trust "if (current->utask...)" checks.
>
> But do we need a new TIF bit? Can we just reuse the TIF_NOTIFY_RESUME
> flag that we use now?

Probably not... But somehow tracehook_notify_resume/uprobe_notify_resume
should know we hit the bp and we need to allocate utask. Yes,
tracehook_notify_resume() can always call uprobe_notify_resume()
unconditionally, and uprobe_notify_resume() can notice the
"find_probept() && !current->utask" case, but probably it is better to
make this more explicit. And of course, the new bit should be set along
with TIF_NOTIFY_RESUME.

Or. Instead of TIF_ bit, we can use something like

#define UTASK_PLEASE_ALLOCATE_ME ((struct uprobe_task *)1)

uprobe_bkpt_notifier() sets current->utask = UTASK_PLEASE_ALLOCATE_ME,
then tracehook_notify_resume/uprobe_notify_resume check this case.
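The sentinel-pointer idea can be sketched as follows. The sketch assumes, as the discussion implies, that the breakpoint notifier cannot allocate memory and the resume path can. struct task, bkpt_notifier() and notify_resume() are hypothetical stand-ins for current, uprobe_bkpt_notifier() and uprobe_notify_resume(). Any other code that dereferences ->utask must also treat the sentinel as "not yet allocated".

```c
#include <assert.h>
#include <stdlib.h>

struct uprobe_task {
	int state;	/* placeholder field */
};

/* Hypothetical stand-in for the task_struct member. */
struct task {
	struct uprobe_task *utask;
};

/* Sentinel: non-NULL, but never a real allocation. */
#define UTASK_PLEASE_ALLOCATE_ME ((struct uprobe_task *)1)

/* Breakpoint path (assumed unable to allocate): only record the request. */
static void bkpt_notifier(struct task *t)
{
	if (!t->utask)
		t->utask = UTASK_PLEASE_ALLOCATE_ME;
}

/* Return-to-user path (may sleep): perform the deferred allocation. */
static int notify_resume(struct task *t)
{
	if (t->utask == UTASK_PLEASE_ALLOCATE_ME) {
		struct uprobe_task *ut = calloc(1, sizeof(*ut));

		if (!ut) {
			t->utask = NULL;	/* drop the request on failure */
			return -1;
		}
		t->utask = ut;
	}
	return 0;
}
```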

I dunno, please do what you think is right.


OK, the last questions:

1. Can't multiple write_opcode()'s race with each other?

Say, pre_ssout() calls remove_bkpt() without any locking. Can't it race
with register_uprobe() which may write to the same page?

And, without uses_xol_strategy() there are more racy callers
of write_opcode()... Probably something else.

2. Can't write_opcode() conflict with ksm doing replace_page() ?

3. mprotect(). write_opcode() checks !VM_WRITE. This is correct,
otherwise we can race with the user-space writing to the same
page.

But suppose that the application does mprotect(PROT_WRITE) after
register_uprobe() installs the bp, now unregister_uprobe/etc can't
restore the original insn?

4. mremap(). What if the application does mremap() and moves the
memory? After that vaddr of user_bkpt/uprobe no longer matches
the virtual address of bp. This breaks uprobe_bkpt_notifier(),
unregister_uprobe(), etc.

Even worse. Say, unregister_uprobe() calls remove_bkpt().
mremap()+mmap() can be called after ->read_opcode() verifies vaddr
points to bkpt_insn, but before write_opcode() changes the page.

Oleg.


==============================================================================
TOPIC: CFQ read performance regression
http://groups.google.com/group/linux.kernel/t/f106eaf14f59195d?hl=en
==============================================================================

== 1 of 1 ==
Date: Wed, Apr 21 2010 9:10 am
From: Miklos Szeredi


Jens, Corrado,

Here's a graph showing the number of issued but not yet completed
requests versus time for CFQ and NOOP schedulers running the tiobench
benchmark with 8 threads:

http://www.kernel.org/pub/linux/kernel/people/mszeredi/blktrace/queue-depth.jpg

It shows pretty clearly that the performance problem is because CFQ is not
issuing enough requests to fill the bandwidth.

Is this the correct behavior of CFQ or is this a bug?

This is on a vanilla 2.6.34-rc4 kernel with two tunables modified:

read_ahead_kb=512
low_latency=0 (for CFQ)

Thanks,
Miklos



==============================================================================
TOPIC: busy inodes -> ext3 umount crash
http://groups.google.com/group/linux.kernel/t/f78e1c3f6ea5fa8a?hl=en
==============================================================================

== 1 of 1 ==
Date: Wed, Apr 21 2010 9:10 am
From: Eric Paris


On Wed, 2010-04-21 at 17:47 +0200, Jiri Slaby wrote:
> On 04/21/2010 05:24 PM, Eric Paris wrote:
> > I'll take a look, but I'm not seeing a problem right off
> > hand. This patch wasn't supposed to mess with inode refcounting at
> > all....
>
> Heh, but with very high probability now, it did :). Do you want me to
> inject some printouts anywhere? Can you reproduce it? KDE doesn't
> trigger the bug. After I switched from KDE to gnome in qemu, it started
> to occur (X :0 & (sleep 1; DISPLAY=:0 gnome-session)).

Well, I reproduced it and I'll take a look. Reliable steps seem to be:

# dd if=/dev/zero of=/dev/shm/ext3 bs=1024 count=1 seek=$((100*1024))
# mkfs.ext3 -m 0 /dev/shm/ext3
# mount -oloop /dev/shm/ext3 /mnt/c
# touch /mnt/c/file
# inotifywait -m /mnt/c/file &
# umount /mnt/c
# dmesg|tail

-Eric


==============================================================================
TOPIC: perf: fix initialization bug in parse_single_tracepoint_event()
http://groups.google.com/group/linux.kernel/t/20270d80ef5317db?hl=en
==============================================================================

== 1 of 1 ==
Date: Wed, Apr 21 2010 9:10 am
From: Stephane Eranian

parse_single_tracepoint_event() was setting some attributes
before it validated that the event was indeed a tracepoint event. This
caused problems with other initialization routines, such as the one in
builtin-top.c, which only sets sample_period if it is still 0.

Signed-off-by: Stephane Eranian <eranian@google.com>
--
parse-events.c | 13 +++++++------
1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 3b4ec67..82b8b7f 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -418,12 +418,6 @@ parse_single_tracepoint_event(char *sys_name,
u64 id;
int fd;

- attr->sample_type |= PERF_SAMPLE_RAW;
- attr->sample_type |= PERF_SAMPLE_TIME;
- attr->sample_type |= PERF_SAMPLE_CPU;
-
- attr->sample_period = 1;
-
snprintf(evt_path, MAXPATHLEN, "%s/%s/%s/id", debugfs_path,
sys_name, evt_name);

@@ -442,6 +436,13 @@ parse_single_tracepoint_event(char *sys_name,
attr->type = PERF_TYPE_TRACEPOINT;
*strp = evt_name + evt_length;

+ attr->sample_type |= PERF_SAMPLE_RAW;
+ attr->sample_type |= PERF_SAMPLE_TIME;
+ attr->sample_type |= PERF_SAMPLE_CPU;
+
+ attr->sample_period = 1;
+
+
return EVT_HANDLED;
}
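The fix is a "validate first, then mutate" ordering. Below is a minimal sketch of the pattern. It uses hypothetical, cut-down stand-ins for struct perf_event_attr, the PERF_SAMPLE_* flags and the debugfs id lookup. Only "sched:sched_switch" is treated as valid here.

On failure the caller's attr is left exactly as it was, so later defaults such as perf top's sample_period still apply.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for the perf ABI constants. */
#define SAMPLE_RAW	(1u << 0)
#define SAMPLE_TIME	(1u << 1)
#define SAMPLE_CPU	(1u << 2)
#define TYPE_TRACEPOINT	2u

struct attr {
	unsigned int type;
	unsigned long config;
	unsigned int sample_type;
	unsigned long sample_period;
};

enum { EVT_FAILED, EVT_HANDLED };

/* Hypothetical lookup: only "sched:sched_switch" exists, with id 42. */
static int lookup_tracepoint_id(const char *sys, const char *evt,
				unsigned long *id)
{
	if (strcmp(sys, "sched") || strcmp(evt, "sched_switch"))
		return -1;
	*id = 42;
	return 0;
}

/* Touch attr only after the name is confirmed to be a tracepoint. */
static int parse_tracepoint(const char *sys, const char *evt, struct attr *attr)
{
	unsigned long id;

	if (lookup_tracepoint_id(sys, evt, &id))
		return EVT_FAILED;	/* attr left untouched */

	attr->type = TYPE_TRACEPOINT;
	attr->config = id;
	attr->sample_type |= SAMPLE_RAW | SAMPLE_TIME | SAMPLE_CPU;
	attr->sample_period = 1;
	return EVT_HANDLED;
}
```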


==============================================================================
TOPIC: readahead on directories
http://groups.google.com/group/linux.kernel/t/2d53dc39c03ec4a5?hl=en
==============================================================================

== 1 of 1 ==
Date: Wed, Apr 21 2010 9:20 am
From: Jamie Lokier


Phillip Susi wrote:
> On 4/20/2010 8:44 PM, Jamie Lokier wrote:
> > readahead() doesn't make much sense on a directory - the offset and
> > size aren't meaningful.
> >
> > But does plain opendir/readdir/closedir solve the problem?
>
> No, since those are synchronous. I want to have readahead() queue up
> reading the entire directory in the background to avoid blocking, and
> get the queue filled with a bunch of requests that can be merged into
> larger segments before being dispatched to the hardware.

Asynchronous is available: Use clone or pthreads.
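One reading of "use clone or pthreads" is this: keep readdir() synchronous, but run one scan per directory in its own thread. Several directories then have I/O outstanding at once, which gives the elevator something to sort.

Below is a minimal sketch assuming POSIX threads. struct dir_prefetch and the helpers are hypothetical. The names are thrown away, and the cap of 16 threads is arbitrary. Link with -pthread on older glibc.

```c
#include <assert.h>
#include <dirent.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical work item for one background directory scan. */
struct dir_prefetch {
	const char *path;
	long nentries;	/* result: entries seen, or -1 on error */
};

/* Thread body: walk the directory only to pull its blocks into cache. */
static void *prefetch_dir(void *arg)
{
	struct dir_prefetch *p = arg;
	DIR *d = opendir(p->path);

	p->nentries = -1;
	if (!d)
		return NULL;
	p->nentries = 0;
	while (readdir(d) != NULL)
		p->nentries++;	/* names are discarded; only the I/O matters */
	closedir(d);
	return NULL;
}

/* Start up to 16 scans in parallel, then wait for all of them. */
static int prefetch_dirs(struct dir_prefetch *jobs, size_t n)
{
	pthread_t tids[16];
	size_t started = 0;

	if (n > 16)
		return -1;
	for (; started < n; started++)
		if (pthread_create(&tids[started], NULL, prefetch_dir,
				   &jobs[started]))
			break;
	for (size_t i = 0; i < started; i++)
		pthread_join(tids[i], NULL);
	return started == n ? 0 : -1;
}
```

A real caller would probably do other work before joining rather than joining immediately.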

More broadly: One of the ways to better I/O sorting is to make sure
you've got enough things in parallel that the I/O queue is never
empty, so what you issue has time to get sorted before it reaches the
head of the queue for dispatch. On the other hand, not so many things
in parallel that the queues fill up and throttle. Unfortunately it
only works if things aren't serialised by kernel locks - but there's been
a lot of work on lockless this and that in the kernel, which may help.

Back to your problem: You need a bunch of scattered block requests to
be queued and sorted sanely, and readdir doesn't do that, and even
waits for each block before issuing the next request.

Or does it?

A quick skim of fs/{ext3,ext4}/dir.c finds a call to
page_cache_sync_readahead. Doesn't that do any reading ahead? :-)

> I don't actually care to have the contents of the
> directories returned, so readdir() does more than I need in that
> respect, and also it performs a blocking read of one disk block at a
> time, which is horribly slow with a cold cache.

I/O is probably the biggest cost, so it's more important to get
the I/O pattern you want than worrying about return values you'll discard.

If readdir() calls are slowed by lots of calls and libc, consider
using the getdirentries system call directly.
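On Linux, getdirentries() is a glibc library function. The system call underneath readdir() is getdents64(2).

Below is a minimal sketch of calling it directly through syscall(2) with a 32 KiB buffer. It assumes a Linux target where SYS_getdents64 is defined. struct linux_dirent64 is declared locally, as the getdents(2) man page suggests.

Calling the syscall directly only cuts per-call overhead. It does not by itself change the filesystem's block-by-block read pattern.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Record layout returned by getdents64(2), declared by the caller. */
struct linux_dirent64 {
	uint64_t d_ino;
	int64_t d_off;
	unsigned short d_reclen;
	unsigned char d_type;
	char d_name[];
};

/* Count entries using raw getdents64 calls; returns -1 on error. */
static long count_entries(const char *path)
{
	_Alignas(8) char buf[32768];
	long total = 0;
	int fd = open(path, O_RDONLY | O_DIRECTORY);

	if (fd < 0)
		return -1;
	for (;;) {
		long n = syscall(SYS_getdents64, fd, buf, sizeof(buf));

		if (n < 0) {
			total = -1;
			break;
		}
		if (n == 0)
			break;	/* end of directory */
		for (long off = 0; off < n;) {
			struct linux_dirent64 *d =
				(struct linux_dirent64 *)(buf + off);
			total++;
			off += d->d_reclen;
		}
	}
	close(fd);
	return total;
}
```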

If not, fs/ext4/namei.c:ext4_dir_inode_operations points to
ext4_fiemap. So you may have luck calling FIEMAP or FIBMAP on the
directory, and then reading blocks using the block device. I'm not
sure if the cache loaded via the block device (when mounted) will then
be used for directory lookups.

-- Jamie

==============================================================================
TOPIC: perf lock: Fix state machine to recognize lock sequence
http://groups.google.com/group/linux.kernel/t/96b38e24ca76f796?hl=en
==============================================================================

== 1 of 2 ==
Date: Wed, Apr 21 2010 9:20 am
From: Frederic Weisbecker


On Wed, Apr 21, 2010 at 06:12:06PM +0900, Hitoshi Mitake wrote:
> On 04/21/10 10:26, Frederic Weisbecker wrote:
> > 2) I can't get lock_acquired traces. Not sure why yet...
>
> Really? It's a mystery... I'll look for the cause.


In fact they are there, but not displayed in perf trace because
of a format file parse error. I'm investigating; I think it
doesn't impact perf lock, though.

Thanks.



== 2 of 2 ==
Date: Wed, Apr 21 2010 9:20 am
From: Frederic Weisbecker


On Wed, Apr 21, 2010 at 10:55:58AM +0200, Peter Zijlstra wrote:
> On Wed, 2010-04-21 at 03:26 +0200, Frederic Weisbecker wrote:
> > The max lockdep depth may
> > change in the future, or become variable, so we can't rely on that.
> >
> Change, possible; variable, unlikely. You need a memory allocation to
> extend it, and memory allocation takes locks, which is not a nice thing
> to do in the middle of lockdep.


Sure, I didn't mean at runtime but at boot, like in John's recent
proposal.

Thanks.


==============================================================================
TOPIC: tracing update
http://groups.google.com/group/linux.kernel/t/4448b9a8aa388a5f?hl=en
==============================================================================

== 1 of 1 ==
Date: Wed, Apr 21 2010 9:20 am
From: Frederic Weisbecker


On Wed, Apr 21, 2010 at 10:05:44AM +0200, Ingo Molnar wrote:
>
> > > Frederic Weisbecker (1):
> > > tracing: Dump either the oops's cpu source or all cpus buffers
>
> FYI, it has build failures on x86:
>
> kernel/trace/trace.c:4412: error: conflicting types for '__ftrace_dump'
> kernel/trace/trace_selftest.c:258: note: previous declaration of '__ftrace_dump' was here


Oops, fix coming soon.

Thanks.


==============================================================================
TOPIC: kbuild: add *.rej pattern to .gitignore
http://groups.google.com/group/linux.kernel/t/8c244be9807a27cc?hl=en
==============================================================================

== 1 of 1 ==
Date: Wed, Apr 21 2010 9:30 am
From: Américo Wang


On Wed, Apr 21, 2010 at 05:31:34PM +0200, Guennadi Liakhovetski wrote:
>On Wed, 21 Apr 2010, Américo Wang wrote:
>
>> On Wed, Apr 21, 2010 at 01:47:29PM +0200, Guennadi Liakhovetski wrote:
>> >Tell git to ignore .rej files, generated by patch(1).
>> >
>> >Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
>>
>> NACK.
>>
>> You have to handle *.rej files, not leave them there.
>
>Hm? I know I have to handle them, and I do handle them, but IMHO it's
>entirely valid and up to the user to leave them where they are. You could
>apply a similar argument to *.orig or to *~ files. In other words, I'm not
>convinced.
>

I am not the only person who is against this; there
was a discussion about this before, please search the lkml
archive.

Thanks.

==============================================================================
TOPIC: proc: fix badness in fs/proc/generic.c
http://groups.google.com/group/linux.kernel/t/c9573c3936607e54?hl=en
==============================================================================

== 1 of 1 ==
Date: Wed, Apr 21 2010 9:30 am
From: Américo Wang


On Wed, Apr 21, 2010 at 07:09:42PM +0800, GuanJun He wrote:
>fix badness in fs/proc/generic.c, Bug 15589 - 2.6.34-rc1: Badness at
>fs/proc/generic.c:316
>
>Signed-off-by: Guanjun He <heguanbo@gmail.com>

NACK.

The callers of __xlate_proc_name() all hold proc_subdir_lock.
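The NACK rests on a locking convention. Every caller already holds proc_subdir_lock, so taking it again inside __xlate_proc_name() would be a recursive acquisition of a non-recursive lock. For a kernel spinlock, that means the CPU deadlocks against itself.

Below is a userspace model of this using a POSIX error-checking mutex. Instead of hanging, the mutex reports the second acquisition as EDEADLK. subdir_lock, xlate_locked_inside() and caller() are hypothetical stand-ins.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <pthread.h>

static pthread_mutex_t subdir_lock;	/* stand-in for proc_subdir_lock */

static void init_subdir_lock(void)
{
	pthread_mutexattr_t attr;

	pthread_mutexattr_init(&attr);
	/* Error-checking mutex: relocking by the owner returns EDEADLK. */
	pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	pthread_mutex_init(&subdir_lock, &attr);
	pthread_mutexattr_destroy(&attr);
}

/* Like the patched __xlate_proc_name(): takes the lock itself. */
static int xlate_locked_inside(void)
{
	int err = pthread_mutex_lock(&subdir_lock);

	if (err)
		return err;	/* EDEADLK if this thread already holds it */
	pthread_mutex_unlock(&subdir_lock);
	return 0;
}

/* Like the existing callers: hold the lock across the call. */
static int caller(void)
{
	int ret;

	pthread_mutex_lock(&subdir_lock);
	ret = xlate_locked_inside();
	pthread_mutex_unlock(&subdir_lock);
	return ret;
}
```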

>
>---
>diff -Nupr linux-2.6.34-rc1.orig/fs/proc/generic.c
>linux-2.6.34-rc1/fs/proc/generic.c
>--- linux-2.6.34-rc1.orig/fs/proc/generic.c 2010-03-09
>02:45:44.000000000 +0800
>+++ linux-2.6.34-rc1/fs/proc/generic.c 2010-04-21 19:02:49.000000000 +0800
>@@ -297,11 +297,13 @@ static int __xlate_proc_name(const char
> const char *cp = name, *next;
> struct proc_dir_entry *de;
> int len;
>+ int rtn = 0;
>
> de = *ret;
> if (!de)
> de = &proc_root;
>
>+ spin_lock(&proc_subdir_lock);
> while (1) {
> next = strchr(cp, '/');
> if (!next)
>@@ -313,14 +315,17 @@ static int __xlate_proc_name(const char
> break;
> }
> if (!de) {
>- WARN(1, "name '%s'\n", name);
>- return -ENOENT;
>+ WARN(1, "name \"%s\"\n", name);
>+ rtn = -ENOENT;
>+ goto out;
> }
> cp += len + 1;
> }
> *residual = cp;
> *ret = de;
>- return 0;
>+out:
>+ spin_unlock(&proc_subdir_lock);
>+ return rtn;
> }
>
> static int xlate_proc_name(const char *name, struct proc_dir_entry **ret,

--
Live like a child, think like the god.


==============================================================================
TOPIC: fs: fix bad string output
http://groups.google.com/group/linux.kernel/t/cebfaca6a899df53?hl=en
==============================================================================

== 1 of 1 ==
Date: Wed, Apr 21 2010 9:30 am
From: Américo Wang


On Wed, Apr 21, 2010 at 04:59:40PM +0800, GuanJun He wrote:
>fix bad string output,[Bug #15589] 2.6.34-rc1: Badness at fs/proc/generic.c:316
>
>Signed-off-by: Guanjun He <heguanbo@gmail.com>
>

NACK.

This doesn't even compile. Check the prototype of kstrdup().

Also, we already had a patch:

https://patchwork.kernel.org/patch/93658/
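kstrdup() takes a gfp_t as its second argument, e.g. kstrdup(name, GFP_KERNEL). The quoted patch has further problems:
- Its loop only advances next when it finds a '/', so any other character makes it spin forever.
- It writes through const char * pointers.
- It never kfree()s the copy.

Below is a userspace sketch of the intended copy-and-replace, with strdup()/free() standing in for kstrdup()/kfree(). sanitize_proc_name() is a hypothetical name, and this is not the patch at the patchwork link above.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return a heap copy of name with every '/' replaced by '_', or NULL on
 * allocation failure. The caller frees the result (kfree() in-kernel).
 */
static char *sanitize_proc_name(const char *name)
{
	char *copy = strdup(name);	/* kernel: kstrdup(name, GFP_KERNEL) */

	if (!copy)
		return NULL;
	for (char *p = copy; *p; p++)	/* advance on every character */
		if (*p == '/')
			*p = '_';
	return copy;
}
```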


>---
>diff -Nupr linux-2.6.34-rc1-orig/fs/proc/generic.c
>linux-2.6.34-rc1/fs/proc/generic.c
>--- linux-2.6.34-rc1-orig/fs/proc/generic.c 2010-04-21
>13:47:18.000000000 +0800
>+++ linux-2.6.34-rc1/fs/proc/generic.c 2010-04-21 16:42:11.000000000 +0800
>@@ -313,7 +313,14 @@ static int __xlate_proc_name(const char
> break;
> }
> if (!de) {
>- WARN(1, "name '%s'\n", name);
>+ cp = kstrdup(name);
>+ if (cp) {
>+ next = cp;
>+ while(*next) {
>+ if (*next == '/') *next++ = '_';
>+ }
>+ WARN(1, "name '%s'\n", cp);
>+ }
> return -ENOENT;
> }
> cp += len + 1;

--
Live like a child, think like the god.


==============================================================================
TOPIC: perf: introduce model specific events and AMD IBS
http://groups.google.com/group/linux.kernel/t/3660ac7771798be1?hl=en
==============================================================================

== 1 of 2 ==
Date: Wed, Apr 21 2010 9:40 am
From: Robert Richter


On 21.04.10 14:11:05, Peter Zijlstra wrote:
> On Thu, 2010-04-15 at 17:16 +0200, Robert Richter wrote:
> > On 15.04.10 09:44:21, Peter Zijlstra wrote:
> > > On Tue, 2010-04-13 at 22:23 +0200, Robert Richter wrote:
> > > > This patch series introduces model specific events and implements AMD
> > > > IBS (Instruction Based Sampling) for perf_events.
> > >
> > > I would much rather it uses the ->precise thing PEBS also uses,
> > > otherwise we keep growing funny arch extensions and end up with a
> > > totally fragmented trainwreck of an ABI.
> >
> > I agree that an existing flag could be reused. But the naming 'precise'
> > could be misleading. Maybe we rename it to 'model_spec' or something
> > else that underlines the idea of having model specific setups.
>
> Right, so I really hate PERF_SAMPLE_RAW, and I'm considering simply
> removing that for PEBS as well, it's just too ugly. If we want the
> register set we need to work on getting PERF_SAMPLE_REGS in a sensible
> shape.

The ABI should provide hw access to all pmu features. As it is not
possible to abstract these features in the same way for all models and
architectures, and a new feature may work completely differently, the
only solution I see is to provide a way to write to pmu registers and
receive sampling data back unfiltered. For IBS, for example, the data in
a sample does not fit into existing generic events. Making IBS generic
also does not help much, since it will not be available on other
models or architectures. Event types that will never be reused do not
need to be generalized; it is better to generalize the way this kind
of event is handled. This is the reason I like the
model_spec/raw_config/raw_sample approach.

> As to the meaning for ->precise, its meant to convey the counters are
> not affected by skid and the like, I thought IBS provided exact IPs as
> well (/me should re-read the IBS docs).

Yes, the real rip is in the sample, but there is much more data than
that. So the rip is only a subset.

> The thing with something like ->model_spec and PERF_SAMPLE_RAW is that
> it doesn't provide a clear model, the user doesn't know what to expect
> of it, it could be anything.
>
> We want the ABI to express clear concepts, and things like lets bypass
> everything and just dump stuff out to userspace really are to be avoided
> at all costs.

Of course, it could be anything. But this is not the intention. If
the configs and buffers are close to, or exactly match, the hw interface,
then it is specified and well defined. The user only has to know how to
configure a certain hw feature of a particular model and how to get data
back. This is how IBS is implemented. Maybe this is the same thing you
have in mind with PERF_SAMPLE_REGS? This interface can then be reused for
a very different feature, and this looks to me like a clear solution. I
do not see alternatives. And even if we process the samples in the
kernel somehow, in the end we pack them and send them to userspace.

> Sadly IBS seems to be an utter trainwreck in the concept department (I'm
> still struggling how to make a sensible interpretation of the data it
> gathers).

That's the point, there is no generalization of this type of data, but
still it is useful.

-Robert

>
> The thing I absolutely want to avoid is the ABI becoming a fragmented
> trainwreck like oprofile is.
>
> Also not using sample_period for the sample period is of course utterly
> unacceptable.
>

--
Advanced Micro Devices, Inc.
Operating System Research Center
email: robert.richter@amd.com



== 2 of 2 ==
Date: Wed, Apr 21 2010 10:00 am
From: Robert Richter


On 21.04.10 13:37:27, Stephane Eranian wrote:
> Why do you need model_spec, in addition to your special encoding?
>
> >  /*
> > + * Model specific hardware events
> > + *
> > + * With the attr.model_spec bit set we can setup hardware events
> > + * others than generic performance counters. A special PMU 64 bit
> > + * config value can be passed through the perf_event interface. The
> > + * concept of PMU model-specific arguments was practiced already in
> > + * Perfmon2. The type of event (8 bits) is determinded from the config
> > + * value too, bit 32-39 are reserved for this.
> > + */
> Isn't the config field big enough to encode all the information you need?
> In the kernel, you could check bit 32-39 and based on host CPU determine
> whether it refers to IBS or is a bogus value. I am trying to figure out what
> model_spec buys you. I believe RAW does not mean the final value as
> accepted by HW but a value that must be interpreted by the model-specific
> code to eventually come up with a raw HW value. In the current code, the
> RAW value is never passed as is, it is assembled from various bits and
> pieces incl. attr.config of course.

The raw config value without the model_spec bit set would assume a
config value for a generic x86 counter. We could also reuse an unused
bit in this config value, but what if this bit gets used at some point?
Or if it is available on one hw but not another? So I found it much
cleaner to introduce this attribute flag, which can be reused on other
architectures. Also, you will then have the freedom to implement model
specific generic events without using raw_config. As most pmu features
are set up with a 64 bit config value, I would also prefer to have the
encoding of the model specific event type outside of the config
value. I want to stay close to the hw register layout without shifting
bits back and forth. This may also cause trouble in the future if
we need all 64 bits.

-Robert

--
Advanced Micro Devices, Inc.
Operating System Research Center
email: robert.richter@amd.com
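The following is a small userspace sketch of the layout described in the quoted patch comment. It places the event type in bits 32-39 of the 64-bit config value. The macro and function names are hypothetical and not part of the perf ABI. The assertion in pack_config() reflects Robert's concern: the type byte can only be packed in while the hardware config leaves those bits clear.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: bits 32-39 of the config hold a model-specific
 * event type; all other bits are the raw hardware config. */
#define MODEL_TYPE_SHIFT 32
#define MODEL_TYPE_MASK  (UINT64_C(0xff) << MODEL_TYPE_SHIFT)

static uint64_t pack_config(uint8_t type, uint64_t hw_config)
{
	/* The type byte takes bits 32-39 away from the hardware config. */
	assert((hw_config & MODEL_TYPE_MASK) == 0);
	return hw_config | ((uint64_t)type << MODEL_TYPE_SHIFT);
}

static uint8_t config_type(uint64_t config)
{
	return (uint8_t)((config & MODEL_TYPE_MASK) >> MODEL_TYPE_SHIFT);
}

static uint64_t config_hw_bits(uint64_t config)
{
	return config & ~MODEL_TYPE_MASK;
}
```

Robert says he would prefer to keep the type in a separate attribute field. Doing so would leave all 64 config bits to the hardware register layout.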


==============================================================================
TOPIC: Threaded irq handler question
http://groups.google.com/group/linux.kernel/t/97b689eff0a3fa9a?hl=en
==============================================================================

== 1 of 1 ==
Date: Wed, Apr 21 2010 9:40 am
From: Will Newton


Hi all,

I have a threaded irq handler attached to a level-triggered gpio
interrupt line. The check handler checks the status of the gpio line
and, if it is asserted, disables the irq line and returns IRQ_WAKE_THREAD:

static irqreturn_t isr_check(int irq, void *p)
{
	if (gpio_status()) {
		disable_irq_nosync(irq);
		return IRQ_WAKE_THREAD;
	}

	return IRQ_NONE;
}

The thread then does some i2c transactions that will eventually
deassert the gpio line:

static irqreturn_t isr(int irq, void *p)
{
	/* do some i2c transfers */
	enable_irq(irq);
	return IRQ_HANDLED;
}

My problem is that this structure does not work, because once I call
disable_irq_nosync() on the irq in the check handler the thread will
no longer run because the irq is disabled. However if I don't call
disable_irq_nosync() I will get endless irqs because the line is
level-triggered and will not be deasserted until the thread has run.

Could someone tell me what I'm doing wrong here?

Thanks,
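For reference, request_threaded_irq() accepts an IRQF_ONESHOT flag intended for this case. With it, the line stays masked after the primary handler returns and is unmasked only once the thread handler has run. Neither handler then needs to call disable_irq_nosync() or enable_irq(). The sketch below is a userspace model of that masking order, not kernel code; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one level-triggered line with a oneshot
 * threaded handler. Only the masking order is modeled. */
struct line_model {
	bool asserted;     /* device is driving the line active */
	bool masked;       /* line masked at the interrupt controller */
	bool thread_woken; /* thread handler is pending */
	int hardirq_runs;
	int thread_runs;
};

/* Deliver the interrupt if the line is active and unmasked. The core
 * masks a level-triggered line before the primary handler runs and,
 * for a oneshot action, leaves it masked after IRQ_WAKE_THREAD. */
static void deliver(struct line_model *l)
{
	if (!l->asserted || l->masked)
		return;
	l->masked = true;
	l->hardirq_runs++;
	l->thread_woken = true; /* primary handler returned IRQ_WAKE_THREAD */
}

/* Thread handler: the modeled i2c transfers deassert the line, and
 * only afterwards is the line unmasked again. */
static void run_thread(struct line_model *l)
{
	if (!l->thread_woken)
		return;
	l->thread_woken = false;
	l->thread_runs++;
	l->asserted = false;
	assert(l->masked); /* the line stayed masked while the thread ran */
	l->masked = false;
}
```

In a driver this would roughly correspond to request_threaded_irq(irq, isr_check, isr, IRQF_ONESHOT | IRQF_TRIGGER_HIGH, name, dev), with the disable/enable calls removed. The trigger flag, name and dev arguments are placeholders. Whether a given GPIO controller supports this is an assumption you need to check.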

==============================================================================
TOPIC: Reservations request
http://groups.google.com/group/linux.kernel/t/8a1ce57be1ba655e?hl=en
==============================================================================

== 1 of 1 ==
Date: Wed, Apr 21 2010 9:40 am
From: "Corlett Andrew"

Hello,

I would like to make a reservation for my guests coming to your
country on the dates below, and I want you to confirm the availability
and total cost to me.

Check IN: July 20th 2010
Check OUT: July 25TH 2010
Rooms Needed: 3 SINGLE ROOMS
Guest Coming: 3

Kindly get back to me with the total cost for
their stay and advise whether you accept Visa or MasterCard credit cards,
so I can arrange the payment immediately with my credit card.
Please note that I want to settle the total cost for the entire
stay immediately.

corlette Andrew



==============================================================================
TOPIC: x86,pci,acpi: Handle invalid _CRS
http://groups.google.com/group/linux.kernel/t/711b7090c86b176e?hl=en
==============================================================================

== 1 of 2 ==
Date: Wed, Apr 21 2010 9:50 am
From: Yinghai Lu


On 04/21/2010 08:21 AM, Bjorn Helgaas wrote:
>
>
>> + if (pci_use_crs)
>> + pci_bus_remove_resources(bus);
>> +
>> info.res_num = 0;
>> acpi_walk_resources(device->handle, METHOD_NAME__CRS, setup_resource,
>> &info);
>>
>> + if (pci_use_crs && !info.res_num) {
>> + /* Restore default one */
>> + bus->resource[0] = &ioport_resource;
>> + bus->resource[1] = &iomem_resource;
>>
> This is ugly because it just repeats this code from pci_create_bus(),
> and there's no indication either here or there that they are connected.
>
> Admittedly, I think it's also sort of ugly that pci_bus_remove_resources()
> exists at all -- I'd rather have some sort of hook so we could set the
> bus resources correctly the first time.
>
Sure, that would be better. Can you do a patch for that?
> Maybe you could at least add a pci_bus_set_default_resources() that
> could be called both here and from pci_create_bus().
>
Good idea.
> Why are you doing this patch? Did you see a machine where the host
> bridge was left with no resources because of _CRS issues? If so,
> this patch feels like a band-aid. I'd rather investigate the issue
> directly, because that would probably be a Linux problem we could fix.
>
> Also, if there *is* a reported problem, you should include a link to
> the bugzilla or email thread.
>
No, just code review.

YH
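Bjorn's suggested helper could look roughly like this. pci_bus_set_default_resources() is only the name proposed in the thread. The structs below are minimal userspace stand-ins for the kernel's struct resource and struct pci_bus, so this is a sketch of the refactoring rather than a patch.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-ins for the kernel types (assumption: only the fields
 * used here are modeled). */
struct resource { const char *name; };
struct pci_bus { struct resource *resource[4]; };

static struct resource ioport_resource = { "PCI IO" };
static struct resource iomem_resource = { "PCI mem" };

/* Single place that knows the default host bridge windows, so that
 * pci_create_bus() and a _CRS fallback cannot drift apart. */
static void pci_bus_set_default_resources(struct pci_bus *bus)
{
	assert(bus != NULL);
	bus->resource[0] = &ioport_resource;
	bus->resource[1] = &iomem_resource;
}
```

Both pci_create_bus() and the fallback path would then call the same function. Bjorn's follow-up argues against the silent fallback itself; the helper only addresses the duplicated assignments.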


== 2 of 2 ==
Date: Wed, Apr 21 2010 10:00 am
From: Bjorn Helgaas


On Wednesday 21 April 2010 10:45:46 am Yinghai Lu wrote:
> On 04/21/2010 08:21 AM, Bjorn Helgaas wrote:

> > Why are you doing this patch? Did you see a machine where the host
> > bridge was left with no resources because of _CRS issues? If so,
> > this patch feels like a band-aid. I'd rather investigate the issue
> > directly, because that would probably be a Linux problem we could fix.
> >
> > Also, if there *is* a reported problem, you should include a link to
> > the bugzilla or email thread.
> >
> No, just code review.

In that case, I don't think this patch is a good idea. Assume we have
a Linux problem with _CRS parsing, and as a result, we don't find any
host bridge resources. This patch will cause us to silently revert to
the default resources, so we lose the opportunity to debug and fix the
Linux problem.

Even worse, if there's a *BIOS* problem with _CRS, this patch will hide
it and make it harder to diagnose.

Bjorn

==============================================================================
TOPIC: Mass storage gadget: Handle eject request
http://groups.google.com/group/linux.kernel/t/0dcdd3896a0d44fc?hl=en
==============================================================================

== 1 of 1 ==
Date: Wed, Apr 21 2010 9:50 am
From: Michał Nazarewicz


On Wed, 21 Apr 2010 17:38:22 +0200, <fabien.chouteau@gmail.com> wrote:
> From: Fabien Chouteau <fabien.chouteau@barco.com>
>
>
> Signed-off-by: Fabien Chouteau <fabien.chouteau@barco.com>

Commit message missing, in both patches. It should contain some
details on what the patch does and how this can be used.

> @@ -2412,7 +2451,6 @@ static void fsg_disable(struct usb_function *f)
> raise_exception(fsg->common, FSG_STATE_CONFIG_CHANGE);
> }
>-

This is not important, but if you are going to resend the patch,
how about not deleting this line?

> /*-------------------------------------------------------------------------*/
> static void handle_exception(struct fsg_common *common)
> @@ -2641,7 +2679,7 @@ static int fsg_main_thread(void *common_)
> /* Write permission is checked per LUN in store_*() functions. */
> static DEVICE_ATTR(ro, 0644, fsg_show_ro, fsg_store_ro);
> static DEVICE_ATTR(file, 0644, fsg_show_file, fsg_store_file);
> -
> +static DEVICE_ATTR(ejected, 0444, fsg_show_ejected, NULL);

Same here.

Otherwise, both patches look fine to me.

--
Best regards, _ _
.o. | Liege of Serenely Enlightened Majesty of o' \,=./ `o
..o | Computer Science, Michał "mina86" Nazarewicz (o o)
ooo +---[mina86@mina86.com]---[mina86@jabber.org]---ooO--(_)--Ooo--

==============================================================================
TOPIC: init: Provide a kernel start parameter to increase pid_max v2
http://groups.google.com/group/linux.kernel/t/9a25064c21d093f8?hl=en
==============================================================================

== 1 of 1 ==
Date: Wed, Apr 21 2010 10:00 am
From: Hedi Berriche


On Wed, Apr 21, 2010 at 10:20 Alan Cox wrote:
| > of 32k will not be enough. A system with 1664 CPU's, there are 25163 processes
| > started before the login prompt. It's estimated that with 2048 CPU's we will pass
|
| Is that perhaps the bug not the 32K limit?

Doubt it: I just checked on an *idle* 1664-CPU system and I can see 26844
tasks, all but a few of them kernel threads.

In the worst case, i.e. a 4096-CPU system (typically with thousands of disks),
the machine will almost certainly struggle to boot, if it manages to at all,
when pid_max is set to 32K.

Cheers,
Hedi.
--
Be careful of reading health books, you might die of a misprint.
-- Mark Twain
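Here is a rough extrapolation from Hedi's measurement, compared with the default pid_max of 32768. It assumes, for illustration only, that the boot-time task count grows linearly with the CPU count. The function and macro names are hypothetical.

```c
#include <assert.h>

/* Same value as the kernel's default PID_MAX_DEFAULT (0x8000). */
#define DEFAULT_PID_MAX 32768UL

/* Linear extrapolation of the boot-time task count (assumption:
 * tasks scale with CPUs, as per-CPU kernel threads do). */
static unsigned long estimate_tasks(unsigned long tasks_measured,
				    unsigned long cpus_measured,
				    unsigned long cpus_target)
{
	assert(cpus_measured > 0);
	return tasks_measured * cpus_target / cpus_measured;
}
```

Under that assumption, 26844 tasks on 1664 CPUs works out to about 33,000 tasks at 2048 CPUs. At 4096 CPUs it is about 66,000, roughly twice the default limit.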

==============================================================================
TOPIC: error at compaction (Re: mmotm 2010-04-15-14-42 uploaded)
http://groups.google.com/group/linux.kernel/t/b663875a42c376e5?hl=en
==============================================================================

== 1 of 1 ==
Date: Wed, Apr 21 2010 10:00 am
From: Minchan Kim


Hi, Mel.

On Wed, 2010-04-21 at 11:20 +0100, Mel Gorman wrote:
> On Wed, Apr 21, 2010 at 06:48:06PM +0900, KAMEZAWA Hiroyuki wrote:
> > On Wed, 21 Apr 2010 17:28:38 +0900
> > KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> >
> > > On Mon, 19 Apr 2010 20:39:19 +0100
> > > Mel Gorman <mel@csn.ul.ie> wrote:
> > >
> > > > On Mon, Apr 19, 2010 at 07:14:42PM +0100, Mel Gorman wrote:
> > > > ==== CUT HERE ====
> > > > mm,compaction: Map free pages in the address space after they get split for compaction
> > > >
> > > > split_free_page() is a helper function which takes a free page from the
> > > > buddy lists and splits it into order-0 pages. It is used by memory
> > > > compaction to build a list of destination pages. If
> > > > CONFIG_DEBUG_PAGEALLOC is set, a kernel paging request bug is triggered
> > > > because split_free_page() did not call the arch-allocation hooks or map
> > > > the page into the kernel address space.
> > > >
> > > > This patch does not update split_free_page() as it is called with
> > > > interrupts held. Instead it documents that callers of split_free_page()
> > > > are responsible for calling the arch hooks and to map the page and fixes
> > > > compaction.
> > > >
> > > > This is a fix to the patch mm-compaction-memory-compaction-core.patch.
> > > >
> > > > Signed-off-by: Mel Gorman <mel@csn.ul.ie>
> > >
> > > Sorry, I think I hit another? error again. (sorry, no log.)
> > > What I did was...
> > > Running 2 shells.
> > > while true; do make -j 16;make clean;done
> > > and
> > > while true; do echo 0 > /proc/sys/vm/compact_memory;done
> > >
> > >
> > > Using the same config.
> > >
> > > Apr 21 17:27:47 localhost kernel: ------------[ cut here ]------------
> > > Apr 21 17:27:47 localhost kernel: kernel BUG at include/linux/swapops.h:105!
> > > Apr 21 17:27:47 localhost kernel: invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
> > > Apr 21 17:27:47 localhost kernel: last sysfs file: /sys/devices/virtual/net/br0/statistics/collisions
> > > Apr 21 17:27:47 localhost kernel: CPU 3
> > > Apr 21 17:27:47 localhost kernel: Modules linked in: fuse sit tunnel4 ipt_MASQUERADE iptable_nat nf_nat bridge stp llc sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf xt_physdev ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 dm_multipath uinput ioatdma ppdev parport_pc i5000_edac bnx2 iTCO_wdt edac_core iTCO_vendor_support shpchp parport e1000e kvm_intel dca kvm i2c_i801 i2c_core i5k_amb pcspkr megaraid_sas [last unloaded: microcode]
> > > Apr 21 17:27:47 localhost kernel:
> > > Apr 21 17:27:47 localhost kernel: Pid: 27892, comm: cc1 Tainted: G W 2.6.34-rc4-mm1+ #4 D2519/PRIMERGY
> > > Apr 21 17:27:47 localhost kernel: RIP: 0010:[<ffffffff8114e9cf>] [<ffffffff8114e9cf>] migration_entry_wait+0x16f/0x180
> > > Apr 21 17:27:47 localhost kernel: RSP: 0000:ffff88008d9efe08 EFLAGS: 00010246
> > > Apr 21 17:27:47 localhost kernel: RAX: ffffea0000000000 RBX: ffffea0000241100 RCX: 0000000000000001
> > > Apr 21 17:27:47 localhost kernel: RDX: 000000000000a4e0 RSI: ffff880621a4ab00 RDI: 000000000149c03e
> > > Apr 21 17:27:47 localhost kernel: RBP: ffff88008d9efe38 R08: 0000000000000000 R09: 0000000000000000
> > > Apr 21 17:27:47 localhost kernel: R10: 0000000000000000 R11: 0000000000000001 R12: ffff880621a4aae8
> > > Apr 21 17:27:47 localhost kernel: R13: 00000000bf811000 R14: 000000000149c03e R15: 0000000000000000
> > > Apr 21 17:27:47 localhost kernel: FS: 00007fe6abc90700(0000) GS:ffff880005a00000(0000) knlGS:0000000000000000
> > > Apr 21 17:27:47 localhost kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > Apr 21 17:27:47 localhost kernel: CR2: 00007fe6a37279a0 CR3: 000000008d942000 CR4: 00000000000006e0
> > > Apr 21 17:27:47 localhost kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > Apr 21 17:27:47 localhost kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> > > Apr 21 17:27:47 localhost kernel: Process cc1 (pid: 27892, threadinfo ffff88008d9ee000, task ffff8800b23ec820)
> > > Apr 21 17:27:47 localhost kernel: Stack:
> > > Apr 21 17:27:47 localhost kernel: ffffea000101aee8 ffff880621a4aae8 ffff88008d9efe38 00007fe6a37279a0
> > > Apr 21 17:27:47 localhost kernel: <0> ffff8805d9706d90 ffff880621a4aa00 ffff88008d9efef8 ffffffff81126d05
> > > Apr 21 17:27:47 localhost kernel: <0> ffff88008d9efec8 0000000000000246 0000000000000000 ffffffff81586533
> > > Apr 21 17:27:47 localhost kernel: Call Trace:
> > > Apr 21 17:27:47 localhost kernel: [<ffffffff81126d05>] handle_mm_fault+0x995/0x9b0
> > > Apr 21 17:27:47 localhost kernel: [<ffffffff81586533>] ? do_page_fault+0x103/0x330
> > > Apr 21 17:27:47 localhost kernel: [<ffffffff8104bf40>] ? finish_task_switch+0x0/0xf0
> > > Apr 21 17:27:47 localhost kernel: [<ffffffff8158659e>] do_page_fault+0x16e/0x330
> > > Apr 21 17:27:47 localhost kernel: [<ffffffff81582f35>] page_fault+0x25/0x30
> > > Apr 21 17:27:47 localhost kernel: Code: 53 08 85 c9 0f 84 32 ff ff ff 8d 41 01 89 4d d8 89 45 d4 8b 75 d4 8b 45 d8 f0 0f b1 32 89 45 dc 8b 45 dc 39 c8 74 aa 89 c1 eb d7 <0f> 0b eb fe 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5
> > > Apr 21 17:27:47 localhost kernel: RIP [<ffffffff8114e9cf>] migration_entry_wait+0x16f/0x180
> > > Apr 21 17:27:47 localhost kernel: RSP <ffff88008d9efe08>
> > > Apr 21 17:27:47 localhost kernel: ---[ end trace 4860ab585c1fcddb ]---
> > >
> >
> > It seems that this is a new error.
> >
> >
> > static inline struct page *migration_entry_to_page(swp_entry_t entry)
> > {
> > struct page *p = pfn_to_page(swp_offset(entry));
> > /*
> > * Any use of migration entries may only occur while the
> > * corresponding page is locked
> > */
> > BUG_ON(!PageLocked(p));
> > return p;
> > }
> >
> >
> > Hits this BUG_ON()....then, the page migration_entry points to is unlocked.
> >
> > But we always do
> >
> > lock_page(old_page);
> > unmap(old_page);
> > remap(new_page);
> > unlock_page(old_page);
> >
> > So....some pte wasn't updated at remap ?
> >
>
> I'm working on reproducing the problem. I've hit it only once. My stress
> tests were using dd instead of make like yours did and my
> compilation-orientated test would not have been hitting compaction as
> hard.
>
> The theory I'm working on is that it's a PageSwapCache page that was
> unmapped and not remapped (remap_swapcache == 0) in move_to_new_page().
> In this case, the page would be migrated, left in place and unlocked.
> Later when a swap fault occurred, the migration PTE is found and the
> bug_on triggers i.e. the bug check is no longer valid because it is
> possible for an unlocked migration pte to be left behind.

Hmm. How about the following situation?


CPU A                                          CPU B

1. unmap_and_move
2. lock_page
3. PageAnon && !page_mapped && PageSwapCache   3' do_fork
4. remap_swapcache = 0                         4' pte lock, page_dup_rmap <- race happens
5. try_to_unmap - make migration entry by 4'
6. move_to_newpage
7. don't call remove_migration due to 4
                                               8. do_swap_page
                                               9. migration_entry_wait
                                               10. goto out
                                               11. fault!

In this case, the process on CPU B will be killed even though it passes the
PageLocked check, so I think we have to find another method.

I might be wrong, since I'm nearly falling asleep. :(

--
Kind regards,
Minchan Kim
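Below is a toy userspace model of the invariant checked by the BUG_ON() in migration_entry_to_page(). It also shows how skipping the remap breaks that invariant, as in Mel's theory and Minchan's race: once remap_swapcache has been decided as 0, a migration entry created by try_to_unmap() outlives the page lock. The types and function names are hypothetical simplifications, not the mm code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified state: one page and one pte. */
struct toy_page { bool locked; bool mapped; };
struct toy_pte { bool migration_entry; };

/* Follows the quoted order: lock, unmap, (maybe) remap, unlock.
 * Unmapping turns the mapping into a migration entry; only the remap
 * step turns it back into a normal pte. */
static void toy_migrate(struct toy_page *old, struct toy_pte *pte,
			bool remap)
{
	assert(!old->locked); /* single-threaded model: nobody holds the lock */
	old->locked = true;
	if (old->mapped) {
		pte->migration_entry = true;
		old->mapped = false;
	}
	if (remap)
		pte->migration_entry = false;
	old->locked = false;
}

/* The condition migration_entry_to_page() asserts: a migration entry
 * may only be seen while its page is locked. */
static bool toy_entry_valid(const struct toy_page *p,
			    const struct toy_pte *pte)
{
	return !pte->migration_entry || p->locked;
}
```

In Minchan's scenario, the mapped flag being set corresponds to the page_dup_rmap() done by do_fork on CPU B. That happens after CPU A has already decided not to remap.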




==============================================================================

You received this message because you are subscribed to the Google Groups "linux.kernel"
group.

To post to this group, visit http://groups.google.com/group/linux.kernel?hl=en

To unsubscribe from this group, send email to linux.kernel+unsubscribe@googlegroups.com

To change the way you get mail from this group, visit:
http://groups.google.com/group/linux.kernel/subscribe?hl=en

To report abuse, send email explaining the problem to abuse@googlegroups.com

==============================================================================
Google Groups: http://groups.google.com/?hl=en
