Wednesday, April 14, 2010

linux.kernel - 26 new messages in 15 topics - digest

linux.kernel
http://groups.google.com/group/linux.kernel?hl=en

linux.kernel@googlegroups.com

Today's topics:

* perf & kvm: Enhance perf to collect KVM guest os statistics from host side -
2 messages, 2 authors
http://groups.google.com/group/linux.kernel/t/1d27c8aca75a3d96?hl=en
* block: blk-timeout.c ensure jiffies wrap is handled correctly in
blk_rq_timed_out_timer - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/da66213e07823ac7?hl=en
* USB transfer_buffer allocations on 64bit systems - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/a96a33ce18e828e9?hl=en
* Sound goes too fast due to commit 7b3a177b0 - 2 messages, 1 author
http://groups.google.com/group/linux.kernel/t/2f9470dadc204a7c?hl=en
* mm: disallow direct reclaim page writeback - 4 messages, 3 authors
http://groups.google.com/group/linux.kernel/t/019901c2ac5d237e?hl=en
* tun: orphan an skb on tx - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/27f4f72d2c57841a?hl=en
* X86: Optimise fls(), ffs() and fls64() - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/ccd9cc33b88612fd?hl=en
* Trying to fix ITE-887x parallel/serial driver bugs (including unhandled IRQs)
- 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/8b58e8b6f16e6544?hl=en
* Input: add HP Compaq 2710p to 'noloop' table - 2 messages, 2 authors
http://groups.google.com/group/linux.kernel/t/1127ff79543457ae?hl=en
* powerpc/perf_event: Fix oops due to perf_event_do_pending call - 1 messages,
1 author
http://groups.google.com/group/linux.kernel/t/059e16f430142a41?hl=en
* change alloc function in alloc_slab_page - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/dd86fca671cd7477?hl=en
* eeepc-wmi: depends on INPUT_SPARSEKMAP - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/97d65304157da648?hl=en
* ext2: Preparation to remove BKL (v2) - 6 messages, 1 author
http://groups.google.com/group/linux.kernel/t/8dbc9da49e078d62?hl=en
* [PATCH 00/12] perf: introduce model specific events and AMD IBS - 1 messages,
1 author
http://groups.google.com/group/linux.kernel/t/3660ac7771798be1?hl=en
* vmalloc performance - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/4b8b51e9b25ac5ba?hl=en

==============================================================================
TOPIC: perf & kvm: Enhance perf to collect KVM guest os statistics from host
side
http://groups.google.com/group/linux.kernel/t/1d27c8aca75a3d96?hl=en
==============================================================================

== 1 of 2 ==
Date: Wed, Apr 14 2010 3:50 am
From: Ingo Molnar

* Avi Kivity <avi@redhat.com> wrote:

> On 04/14/2010 12:05 PM, Zhang, Yanmin wrote:
> >Here is the new patch of V3 against tip/master of April 13th
> >if anyone wants to try it.
> >
>
> Thanks for persisting despite the flames.
>
> Can you please separate arch/x86/kvm part of the patch? That will make for
> easier reviewing, and will need to go through separate trees.

Once it gets into a state that it can be applied could you please create a
separate, -git based branch for it, so that i can pull it for testing and
integration with the tools/perf/ bits?

Assuming there are no serious conflicts with pending KVM work.

(or i can do that too)

Thanks,

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


== 2 of 2 ==
Date: Wed, Apr 14 2010 4:20 am
From: Avi Kivity


On 04/14/2010 01:43 PM, Ingo Molnar wrote:
>>
>> Thanks for persisting despite the flames.
>>
>> Can you please separate arch/x86/kvm part of the patch? That will make for
>> easier reviewing, and will need to go through separate trees.
>>
> Once it gets into a state that it can be applied could you please create a
> separate, -git based branch for it, so that i can pull it for testing and
> integration with the tools/perf/ bits?
>
>

Sure.

> Assuming there are no serious conflicts with pending KVM work.
>

There will be a conflict with the NMI fix (which has to go in first,
we'll want to backport it), I'll put it on the same branch.

--
error compiling committee.c: too many arguments to function


==============================================================================
TOPIC: block: blk-timeout.c ensure jiffies wrap is handled correctly in
blk_rq_timed_out_timer
http://groups.google.com/group/linux.kernel/t/da66213e07823ac7?hl=en
==============================================================================

== 1 of 1 ==
Date: Wed, Apr 14 2010 4:00 am
From: Richard Kennedy


blk_rq_timed_out_timer() relied on blk_add_timer() never returning a
timer value of zero, but commit 7838c15b8dd18e78a523513749e5b54bda07b0cb
removed the code that bumped this value when it was zero.
Therefore when jiffies is near wrap we could get unlucky & not set the
timeout value correctly.

This patch uses a flag to indicate that the timeout value was set and so
handles jiffies wrap correctly, and it keeps all the logic in one
function so should be easier to maintain in the future.

Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk>

---
patch against 2.6.34-rc4
Compiled & tested on x86_64

regards
Richard


diff --git a/block/blk-timeout.c b/block/blk-timeout.c
index 1ba7e0a..4f0c06c 100644
--- a/block/blk-timeout.c
+++ b/block/blk-timeout.c
@@ -109,6 +109,7 @@ void blk_rq_timed_out_timer(unsigned long data)
 	struct request_queue *q = (struct request_queue *) data;
 	unsigned long flags, next = 0;
 	struct request *rq, *tmp;
+	int next_set = 0;

 	spin_lock_irqsave(q->queue_lock, flags);

@@ -122,16 +123,13 @@ void blk_rq_timed_out_timer(unsigned long data)
 			if (blk_mark_rq_complete(rq))
 				continue;
 			blk_rq_timed_out(rq);
-		} else if (!next || time_after(next, rq->deadline))
+		} else if (!next_set || time_after(next, rq->deadline)) {
 			next = rq->deadline;
+			next_set = 1;
+		}
 	}

-	/*
-	 * next can never be 0 here with the list non-empty, since we always
-	 * bump ->deadline to 1 so we can detect if the timer was ever added
-	 * or not. See comment in blk_add_timer()
-	 */
-	if (next)
+	if (next_set)
 		mod_timer(&q->timeout, round_jiffies_up(next));

 	spin_unlock_irqrestore(q->queue_lock, flags);
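
A minimal userspace sketch of the failure mode this patch addresses,
under stated assumptions: time_after() is copied from the kernel's
wrap-safe definition, while pick_next_sentinel() and pick_next_flag()
are hypothetical helpers that only model the loop over pending
deadlines, not the real request queue. It shows that a deadline of
exactly 0, which is legitimate once jiffies wraps, is lost when 0 also
means "not set".

```c
#include <assert.h>
#include <stddef.h>

/*
 * Userspace copy of the kernel's wrap-safe comparison: true when a is
 * later than b, provided the two are less than LONG_MAX ticks apart.
 */
#define time_after(a, b) ((long)((b) - (a)) < 0)

/*
 * Pre-patch logic (hypothetical helper): next == 0 doubles as
 * "no deadline recorded yet". Returns 1 if the timer would be re-armed.
 */
static int pick_next_sentinel(const unsigned long *deadline, size_t n,
			      unsigned long *out)
{
	unsigned long next = 0;
	size_t i;

	for (i = 0; i < n; i++)
		if (!next || time_after(next, deadline[i]))
			next = deadline[i];

	*out = next;
	return next != 0;
}

/*
 * Patched logic (hypothetical helper): a separate flag records whether
 * next was ever assigned, so 0 is an ordinary deadline value.
 */
static int pick_next_flag(const unsigned long *deadline, size_t n,
			  unsigned long *out)
{
	unsigned long next = 0;
	int next_set = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (!next_set || time_after(next, deadline[i])) {
			next = deadline[i];
			next_set = 1;
		}
	}

	*out = next;
	return next_set;
}
```

With a single pending deadline of 0, the sentinel version reports that
no timer needs arming, while the flag version arms it for 0. With
deadlines {0, 10} the sentinel version picks 10 instead of the earlier
0.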



==============================================================================
TOPIC: USB transfer_buffer allocations on 64bit systems
http://groups.google.com/group/linux.kernel/t/a96a33ce18e828e9?hl=en
==============================================================================

== 1 of 1 ==
Date: Wed, Apr 14 2010 4:10 am
From: Pedro Ribeiro


On 14 April 2010 11:47, Pedro Ribeiro <pedrib@gmail.com> wrote:
> On 14 April 2010 11:09, Daniel Mack <daniel@caiaq.de> wrote:
>
>> Thanks! So the only thing I can do for now is submit exactly this patch.
>> At least, it helps you and it shouldn't break anything. The question
>> remains whether this type of memory should be used for all
>> transfer_buffers.
>>
>
> Is there any chance you could push this to -stable? I don't care
> because I always use the latest kernel, but the next Debian stable and
> Ubuntu LTS are going to use 2.6.32.
>
>>> Any idea why is mem=4096m different than a regular boot since I have 4GB anyway?
>>
>> On Fri, Apr 09, 2010 at 04:11:52PM -0600, Robert Hancock wrote:
>>> If you have 4GB of RAM then almost certainly you have memory located
>>> at addresses over 4GB. If you look at the e820 memory map printed at
>>> the start of dmesg on bootup and see entries with addresses of
>>> 100000000 or higher reported as usable, then this is the case.
>>
>> Could you post these e820 lines from your dmesg when booted with
>> mem=4096?
>>
>> Daniel
>>
>>
>
> This is the e820 WITHOUT mem=4096m:
>
> [    0.000000] BIOS-provided physical RAM map:
> [    0.000000]  BIOS-e820: 0000000000000000 - 000000000009ec00 (usable)
> [    0.000000]  BIOS-e820: 000000000009ec00 - 00000000000a0000 (reserved)
> [    0.000000]  BIOS-e820: 00000000000dc000 - 0000000000100000 (reserved)
> [    0.000000]  BIOS-e820: 0000000000100000 - 00000000bd4a1000 (usable)
> [    0.000000]  BIOS-e820: 00000000bd4a1000 - 00000000bd4a7000 (reserved)
> [    0.000000]  BIOS-e820: 00000000bd4a7000 - 00000000bd5b8000 (usable)
> [    0.000000]  BIOS-e820: 00000000bd5b8000 - 00000000bd60f000 (reserved)
> [    0.000000]  BIOS-e820: 00000000bd60f000 - 00000000bd6c6000 (usable)
> [    0.000000]  BIOS-e820: 00000000bd6c6000 - 00000000bd6d1000 (ACPI NVS)
> [    0.000000]  BIOS-e820: 00000000bd6d1000 - 00000000bd6d4000 (ACPI data)
> [    0.000000]  BIOS-e820: 00000000bd6d4000 - 00000000bd6d8000 (reserved)
> [    0.000000]  BIOS-e820: 00000000bd6d8000 - 00000000bd6dc000 (ACPI NVS)
> [    0.000000]  BIOS-e820: 00000000bd6dc000 - 00000000bd6df000 (reserved)
> [    0.000000]  BIOS-e820: 00000000bd6df000 - 00000000bd706000 (ACPI NVS)
> [    0.000000]  BIOS-e820: 00000000bd706000 - 00000000bd708000 (ACPI data)
> [    0.000000]  BIOS-e820: 00000000bd708000 - 00000000bd90f000 (reserved)
> [    0.000000]  BIOS-e820: 00000000bd90f000 - 00000000bd99f000 (ACPI NVS)
> [    0.000000]  BIOS-e820: 00000000bd99f000 - 00000000bd9ff000 (ACPI data)
> [    0.000000]  BIOS-e820: 00000000bd9ff000 - 00000000bda00000 (usable)
> [    0.000000]  BIOS-e820: 00000000bdc00000 - 00000000c0000000 (reserved)
> [    0.000000]  BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
> [    0.000000]  BIOS-e820: 00000000fec00000 - 00000000fec10000 (reserved)
> [    0.000000]  BIOS-e820: 00000000fed00000 - 00000000fed00400 (reserved)
> [    0.000000]  BIOS-e820: 00000000fed10000 - 00000000fed14000 (reserved)
> [    0.000000]  BIOS-e820: 00000000fed18000 - 00000000fed1a000 (reserved)
> [    0.000000]  BIOS-e820: 00000000fed1c000 - 00000000fed90000 (reserved)
> [    0.000000]  BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
> [    0.000000]  BIOS-e820: 00000000ff800000 - 0000000100000000 (reserved)
> [    0.000000]  BIOS-e820: 0000000100000000 - 000000013c000000 (usable)
>
>
>
> This is the e820 output WITH mem=4096m
>
> [    0.000000] BIOS-provided physical RAM map:
> [    0.000000]  BIOS-e820: 0000000000000000 - 000000000009ec00 (usable)
> [    0.000000]  BIOS-e820: 000000000009ec00 - 00000000000a0000 (reserved)
> [    0.000000]  BIOS-e820: 00000000000dc000 - 0000000000100000 (reserved)
> [    0.000000]  BIOS-e820: 0000000000100000 - 00000000bd4a1000 (usable)
> [    0.000000]  BIOS-e820: 00000000bd4a1000 - 00000000bd4a7000 (reserved)
> [    0.000000]  BIOS-e820: 00000000bd4a7000 - 00000000bd5b8000 (usable)
> [    0.000000]  BIOS-e820: 00000000bd5b8000 - 00000000bd60f000 (reserved)
> [    0.000000]  BIOS-e820: 00000000bd60f000 - 00000000bd6c6000 (usable)
> [    0.000000]  BIOS-e820: 00000000bd6c6000 - 00000000bd6d1000 (ACPI NVS)
> [    0.000000]  BIOS-e820: 00000000bd6d1000 - 00000000bd6d4000 (ACPI data)
> [    0.000000]  BIOS-e820: 00000000bd6d4000 - 00000000bd6d8000 (reserved)
> [    0.000000]  BIOS-e820: 00000000bd6d8000 - 00000000bd6dc000 (ACPI NVS)
> [    0.000000]  BIOS-e820: 00000000bd6dc000 - 00000000bd6df000 (reserved)
> [    0.000000]  BIOS-e820: 00000000bd6df000 - 00000000bd706000 (ACPI NVS)
> [    0.000000]  BIOS-e820: 00000000bd706000 - 00000000bd708000 (ACPI data)
> [    0.000000]  BIOS-e820: 00000000bd708000 - 00000000bd90f000 (reserved)
> [    0.000000]  BIOS-e820: 00000000bd90f000 - 00000000bd99f000 (ACPI NVS)
> [    0.000000]  BIOS-e820: 00000000bd99f000 - 00000000bd9ff000 (ACPI data)
> [    0.000000]  BIOS-e820: 00000000bd9ff000 - 00000000bda00000 (usable)
> [    0.000000]  BIOS-e820: 00000000bdc00000 - 00000000c0000000 (reserved)
> [    0.000000]  BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
> [    0.000000]  BIOS-e820: 00000000fec00000 - 00000000fec10000 (reserved)
> [    0.000000]  BIOS-e820: 00000000fed00000 - 00000000fed00400 (reserved)
> [    0.000000]  BIOS-e820: 00000000fed10000 - 00000000fed14000 (reserved)
> [    0.000000]  BIOS-e820: 00000000fed18000 - 00000000fed1a000 (reserved)
> [    0.000000]  BIOS-e820: 00000000fed1c000 - 00000000fed90000 (reserved)
> [    0.000000]  BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
> [    0.000000]  BIOS-e820: 00000000ff800000 - 0000000100000000 (reserved)
> [    0.000000]  BIOS-e820: 0000000100000000 - 000000013c000000 (usable)
> [    0.000000] NX (Execute Disable) protection: active
> [    0.000000] user-defined physical RAM map:
> [    0.000000]  user: 0000000000000000 - 000000000009ec00 (usable)
> [    0.000000]  user: 000000000009ec00 - 00000000000a0000 (reserved)
> [    0.000000]  user: 00000000000dc000 - 0000000000100000 (reserved)
> [    0.000000]  user: 0000000000100000 - 00000000bd4a1000 (usable)
> [    0.000000]  user: 00000000bd4a1000 - 00000000bd4a7000 (reserved)
> [    0.000000]  user: 00000000bd4a7000 - 00000000bd5b8000 (usable)
> [    0.000000]  user: 00000000bd5b8000 - 00000000bd60f000 (reserved)
> [    0.000000]  user: 00000000bd60f000 - 00000000bd6c6000 (usable)
> [    0.000000]  user: 00000000bd6c6000 - 00000000bd6d1000 (ACPI NVS)
> [    0.000000]  user: 00000000bd6d1000 - 00000000bd6d4000 (ACPI data)
> [    0.000000]  user: 00000000bd6d4000 - 00000000bd6d8000 (reserved)
> [    0.000000]  user: 00000000bd6d8000 - 00000000bd6dc000 (ACPI NVS)
> [    0.000000]  user: 00000000bd6dc000 - 00000000bd6df000 (reserved)
> [    0.000000]  user: 00000000bd6df000 - 00000000bd706000 (ACPI NVS)
> [    0.000000]  user: 00000000bd706000 - 00000000bd708000 (ACPI data)
> [    0.000000]  user: 00000000bd708000 - 00000000bd90f000 (reserved)
> [    0.000000]  user: 00000000bd90f000 - 00000000bd99f000 (ACPI NVS)
> [    0.000000]  user: 00000000bd99f000 - 00000000bd9ff000 (ACPI data)
> [    0.000000]  user: 00000000bd9ff000 - 00000000bda00000 (usable)
> [    0.000000]  user: 00000000bdc00000 - 00000000c0000000 (reserved)
> [    0.000000]  user: 00000000e0000000 - 00000000f0000000 (reserved)
> [    0.000000]  user: 00000000fec00000 - 00000000fec10000 (reserved)
> [    0.000000]  user: 00000000fed00000 - 00000000fed00400 (reserved)
> [    0.000000]  user: 00000000fed10000 - 00000000fed14000 (reserved)
> [    0.000000]  user: 00000000fed18000 - 00000000fed1a000 (reserved)
> [    0.000000]  user: 00000000fed1c000 - 00000000fed90000 (reserved)
> [    0.000000]  user: 00000000fee00000 - 00000000fee01000 (reserved)
> [    0.000000]  user: 00000000ff800000 - 0000000100000000 (reserved)
>
> So basically the BIOS is incorrectly reporting
> BIOS-e820: 0000000100000000 - 000000013c000000 (usable)
>
> right?
>
> Thanks,
> Pedro
>

(sorry for the spam)

Actually this can't be right, because booting with mem=4096m only
gives me 3047008 kb of usable memory, versus 3949684 kb without
mem=4096m.

Pedro
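
A minimal sketch of the check Robert describes in the quoted text, for
anyone who wants to do the arithmetic on the maps above. The struct and
function names are hypothetical; the end address is treated as
exclusive, which is how this log format prints it. The "user:" map
drops the final range, which suggests mem=4096m acts as a ceiling on
physical addresses rather than on the amount of RAM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One line of the e820 map (hypothetical representation). */
struct e820_range {
	uint64_t start;	/* inclusive */
	uint64_t end;	/* exclusive, as printed in the log */
	int usable;	/* non-zero for "(usable)" entries */
};

/* Bytes of usable RAM located at physical addresses >= limit. */
static uint64_t usable_bytes_above(const struct e820_range *map, size_t n,
				   uint64_t limit)
{
	uint64_t total = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		uint64_t lo;

		if (!map[i].usable || map[i].end <= limit)
			continue;
		/* Count only the part of the range above the limit. */
		lo = map[i].start > limit ? map[i].start : limit;
		total += map[i].end - lo;
	}
	return total;
}
```

Fed the last BIOS-e820 entry (0000000100000000 - 000000013c000000) with
a limit of 4 GiB, this yields 0x3c000000 bytes, i.e. 960 MiB of usable
RAM above the 4 GiB boundary. That is the same order as the 902676 kB
difference between the two usable-memory figures reported below the
maps.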

==============================================================================
TOPIC: Sound goes too fast due to commit 7b3a177b0
http://groups.google.com/group/linux.kernel/t/2f9470dadc204a7c?hl=en
==============================================================================

== 1 of 2 ==
Date: Wed, Apr 14 2010 4:30 am
From: Éric Piel


On 14/04/10 08:08, Takashi Iwai wrote:
> At Tue, 13 Apr 2010 23:54:26 +0200,
> Éric Piel wrote:
>>
>> Hello,
>>
>> Since 2.6.34-rc*, I have a regression on alsa which prevents the sound
>> to be played correctly. When playing, the music goes too fast, skipping
>> some parts. Typically, it's very easy to reproduce by doing:
>> time mplayer -endpos 30 sound-file-which-lasts-more-than-thirty-sec.mp3
>>
>> If the wall clock is less than 30s, you have the bug. With my intel-hda
>> (AD1981), it's reliably reproducible: it gives ~27s, instead of the
>> normal ~31s.
>>
>> After bisection, it turns out that it is commit
>> 7b3a177b0d4f92b3431b8dca777313a07533a710, aka "ALSA: pcm_lib: fix
>> "something must be really wrong" condition" which caused this
>> regression. Reverting it on top of 2.6.34-rc3+ fixes the problem.
>
> What happens if you pass position_fix=1 option to snd-hda-intel?
Oh! Very good remark...
I've just noticed that I had an option already on the module:
bdl_pos_adj=0. It seems it's not needed anymore to get my card working
fine. If I remove every option (leaving bdl_pos_adj to the default value
1), it works fine. If I put bdl_pos_adj=0 and position_fix=1, it works
fine again.

I don't fully grasp the meaning of bdl_pos_adj, so I don't know if it's
a bug to not play correctly when forcing it to 0. Is it?

I'll ask another reporter who had the same problem whether bdl_pos_adj
is also set to 0...

> Is it via PulseAudio or other backend?
This happens both with pulseaudio, oss and alsa (in which case it plays
the 30s clip in 12s).

Eric


== 2 of 2 ==
Date: Wed, Apr 14 2010 5:00 am
From: Éric Piel


On 14/04/10 13:22, Éric Piel wrote:
:
> I don't fully grasp the meaning of bdl_pos_adj, so I don't know if it's
> a bug to not play correctly when forcing it to 0. Is it?
>
> I'll ask another reporter who had the same problem whether bdl_pos_adj
> is also set to 0...
Frank (cc'd here) has the same problem of music going too fast (on a
2.6.33 kernel with the "culprit" commit applied). On his system "cat
/sys/module/snd_hda_intel/parameters/bdl_pos_adj" gives:
32,32,-1,-1,-1,-1,-1,-1

He also mentioned that on another system also using snd_hda_intel, with
the same bdl_pos_adj, it works fine.

Eric

==============================================================================
TOPIC: mm: disallow direct reclaim page writeback
http://groups.google.com/group/linux.kernel/t/019901c2ac5d237e?hl=en
==============================================================================

== 1 of 4 ==
Date: Wed, Apr 14 2010 4:30 am
From: Chris Mason


On Wed, Apr 14, 2010 at 12:06:36PM +0200, Andi Kleen wrote:
> Chris Mason <chris.mason@oracle.com> writes:
> >
> > Huh, 912 bytes...for select, really? From poll.h:
> >
> > /* ~832 bytes of stack space used max in sys_select/sys_poll before allocating
> > additional memory. */
> > #define MAX_STACK_ALLOC 832
> > #define FRONTEND_STACK_ALLOC 256
> > #define SELECT_STACK_ALLOC FRONTEND_STACK_ALLOC
> > #define POLL_STACK_ALLOC FRONTEND_STACK_ALLOC
> > #define WQUEUES_STACK_ALLOC (MAX_STACK_ALLOC - FRONTEND_STACK_ALLOC)
> > #define N_INLINE_POLL_ENTRIES (WQUEUES_STACK_ALLOC / sizeof(struct poll_table_entry))
> >
> > So, select is intentionally trying to use that much stack. It should be using
> > GFP_NOFS if it really wants to suck down that much stack...
>
> There are lots of other call chains which use multiple KB by themselves,
> so why not give select() that measly 832 bytes?
>
> You think only file systems are allowed to use stack? :)

Grin, most definitely.

>
> Basically if you cannot tolerate 1K (or more likely more) of stack
> used before your fs is called you're toast in lots of other situations
> anyways.

Well, on a 4K stack kernel, 832 bytes is a very large percentage for
just one function.

Direct reclaim is a problem because it splices parts of the kernel that
normally aren't connected together. The people that code in select see
832 bytes and say that's teeny, I should have taken 3832 bytes.

But they don't realize their function can dive down into ecryptfs then
the filesystem then maybe loop and then perhaps raid6 on top of a
network block device.

>
> > kernel had some sort of way to dynamically allocate ram, it could try
> > that too.
>
> It does this for large inputs, but the whole point of the stack fast
> path is to avoid it for common cases when a small number of fds is
> only needed.
>
> It's significantly slower to go to any external allocator.

Yeah, but since the call chain does eventually go into the allocator,
this function needs to be more stack friendly.

I do agree that we can't really solve this with noinline_for_stack pixie
dust, the long call chains are going to be a problem no matter what.

Reading through all the comments so far, I think the short summary is:

Cleaning pages in direct reclaim helps the VM because it is able to make
sure that lumpy reclaim finds adjacent pages. This isn't a fast
operation, it has to wait for IO (infinitely slow compared to the CPU).

Will it be good enough for the VM if we add a hint to the bdi writeback
threads to work on a general area of the file? The filesystem will get
writepages(), the VM will get the IO it needs started.

I know Mel mentioned before he wasn't interested in waiting for helper
threads, but I don't see how we can work without it.

-chris
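
For readers following the numbers: with the quoted poll.h values,
WQUEUES_STACK_ALLOC works out to 832 - 256 = 576 bytes. The trade-off
being argued over (a fixed on-stack buffer for the common small case,
an allocator call only for large inputs) can be sketched in userspace
as below. This is an illustration under stated assumptions, not the
select() code: sum_with_scratch() is a hypothetical function, and
malloc() stands in for the kernel allocator.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define FRONTEND_STACK_ALLOC 256	/* value quoted from poll.h above */

/*
 * Sums n values through a scratch copy. Small inputs use the on-stack
 * buffer; larger ones fall back to the heap. *used_heap reports which
 * path was taken. Returns 0 on success, -1 if allocation fails.
 */
static int sum_with_scratch(const long *src, size_t n, long *total,
			    int *used_heap)
{
	long stack_buf[FRONTEND_STACK_ALLOC / sizeof(long)];
	long *buf = stack_buf;
	size_t i;

	*used_heap = 0;
	if (n > sizeof(stack_buf) / sizeof(stack_buf[0])) {
		/* Slow path: input does not fit in the stack buffer. */
		buf = malloc(n * sizeof(*buf));
		if (!buf)
			return -1;
		*used_heap = 1;
	}

	memcpy(buf, src, n * sizeof(*buf));
	*total = 0;
	for (i = 0; i < n; i++)
		*total += buf[i];

	if (buf != stack_buf)
		free(buf);
	return 0;
}
```

The fast path costs FRONTEND_STACK_ALLOC bytes of stack in every call,
whether or not the buffer is needed, which is the cost Chris objects to
when the call chain later descends into reclaim.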


== 2 of 4 ==
Date: Wed, Apr 14 2010 5:20 am
From: Andi Kleen


Chris Mason <chris.mason@oracle.com> writes:
>>
>> Basically if you cannot tolerate 1K (or more likely more) of stack
>> used before your fs is called you're toast in lots of other situations
>> anyways.
>
> Well, on a 4K stack kernel, 832 bytes is a very large percentage for
> just one function.

To be honest I think 4K stack simply has to go. I tend to call
it "russian roulette" mode.

It was just an old workaround for a very old buggy VM that couldn't free 8K
pages and the VM is a lot better at that now. And the general trend is
to more complex code everywhere, so 4K stacks become more and more hazardous.

It was a bad idea back then and is still a bad idea, getting
worse and worse with each MLOC being added to the kernel each year.

We don't have any good ways to verify that obscure paths through
the more and more subsystems won't exceed it (in fact I'm pretty
sure there are plenty of problems in exotic configurations)

And even if you can make a specific load work there's basically
no safety net.

The only part of the 4K stack code that's good is the separate
interrupt stack, but that one should be just combined with a sane 8K
process stack.

But yes on a 4K kernel you probably don't want to do any direct reclaim.
Maybe force GFP_NOFS everywhere except user allocations when it's set?
Or simply drop it?

> But they don't realize their function can dive down into ecryptfs then
> the filesystem then maybe loop and then perhaps raid6 on top of a
> network block device.

Those stackings need to use separate threads anyways. A lot of them
do in fact. Block avoided this problem by iterating instead of
recursing. Those that still recurse on the same stack simply
need to be fixed.

> Yeah, but since the call chain does eventually go into the allocator,
> this function needs to be more stack friendly.

For common fast paths it doesn't go into the allocator.

-Andi

--
ak@linux.intel.com -- Speaking for myself only.


== 3 of 4 ==
Date: Wed, Apr 14 2010 5:30 am
From: Alan Cox


> The only part of the 4K stack code that's good is the separate
> interrupt stack, but that one should be just combined with a sane 8K
> process stack.

The reality is that if you are blowing a 4K process stack you are
probably playing russian roulette on the current 8K x86-32 stack as well
because of the non IRQ split. So it needs fixing either way


== 4 of 4 ==
Date: Wed, Apr 14 2010 5:40 am
From: Andi Kleen


On Wed, Apr 14, 2010 at 01:32:29PM +0100, Alan Cox wrote:
> > The only part of the 4K stack code that's good is the separate
> > interrupt stack, but that one should be just combined with a sane 8K
> > process stack.
>
> The reality is that if you are blowing a 4K process stack you are
> probably playing russian roulette on the current 8K x86-32 stack as well
> because of the non IRQ split. So it needs fixing either way

Yes I think the 8K stack on 32bit should be combined with an interrupt
stack too. There's no reason not to have an interrupt stack ever.

Again the problem with fixing it is that you won't have any safety net
for a slightly different stacking etc. path that you didn't cover.

That said extreme examples (like some of those Chris listed) definitely
need fixing by moving them to different threads. But even after that
you still want a safety net. 4K is just too near the edge.

Maybe it would work if we never used any indirect calls, but that's
clearly not the case.

-Andi

--
ak@linux.intel.com -- Speaking for myself only.

==============================================================================
TOPIC: tun: orphan an skb on tx
http://groups.google.com/group/linux.kernel/t/27f4f72d2c57841a?hl=en
==============================================================================

== 1 of 1 ==
Date: Wed, Apr 14 2010 5:00 am
From: David Miller


From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Wed, 14 Apr 2010 08:58:22 +0800

> On Tue, Apr 13, 2010 at 08:31:03PM +0200, Eric Dumazet wrote:
>>
>> Herbert Acked your patch, so I guess its OK, but I think it can be
>> dangerous.
>
> The tun socket accounting was never designed to stop it from
> flooding another tun interface. It's there to stop it from
> transmitting above a destination interface TX bandwidth and
> cause unnecessary packet drops. It also limits the total amount
> of kernel memory that can be pinned down by a single tun interface.
>
> In this case, all we're doing is shifting the accounting from the
> "hardware" queue to the qdisc queue.
>
> So your ability to flood a tun interface is essentially unchanged.
>
> BTW we do the same thing in a number of hardware drivers, as well
> as virtio-net.

Right. Although this reminds me about the whole SKB
orphaning on xmit issue that keeps coming back to haunt
us.

If there weren't odd references to the SKB's socket in
the packet scheduler et al. we could just orphan these
things right upon entry to the qdisc and not have to
add hacks like this to every driver.

In fact... maybe we can just do it in dev_hard_start_xmit()
since we are out of the qdisc at that point.... but I guess
there might be weird drivers that want the SKB socket in
their ->xmit routine... Ho hum.

In any event that's net-next-2.6 exploratory material, and I've
applied this patch to net-2.6, thanks!

==============================================================================
TOPIC: X86: Optimise fls(), ffs() and fls64()
http://groups.google.com/group/linux.kernel/t/ccd9cc33b88612fd?hl=en
==============================================================================

== 1 of 1 ==
Date: Wed, Apr 14 2010 5:00 am
From: David Howells


Matthew Wilcox <matthew@wil.cx> wrote:

> I don't know whether we can get it /documented/, but the architect I
> asked said "We'll never get away with reverting to the older behavior,
> so in essence the architecture is set to not overwrite."

Does that mean we can rely on it? Linus?

David
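
For context, a minimal sketch of the contract these helpers have to
honour: fls() and ffs() return a 1-based bit index and 0 for a zero
input. The thread appears to be about whether the x86 bit-scan
instructions can be relied on to leave the destination untouched for a
zero source, which would make the explicit zero test unnecessary; that
reading is an assumption drawn from the subject line and the quote
above. The *_sketch names are hypothetical, and the code assumes the
GCC/Clang builtins __builtin_clz, __builtin_ctz and __builtin_clzll.

```c
#include <assert.h>
#include <limits.h>

/* 1-based index of the most significant set bit; 0 when x == 0. */
static int fls_sketch(unsigned int x)
{
	if (x == 0)
		return 0;	/* __builtin_clz(0) is undefined, so guard it */
	return (int)(sizeof(x) * CHAR_BIT) - __builtin_clz(x);
}

/* 1-based index of the least significant set bit; 0 when x == 0. */
static int ffs_sketch(unsigned int x)
{
	if (x == 0)
		return 0;	/* __builtin_ctz(0) is undefined as well */
	return __builtin_ctz(x) + 1;
}

/* 64-bit variant of fls_sketch(). */
static int fls64_sketch(unsigned long long x)
{
	if (x == 0)
		return 0;
	return (int)(sizeof(x) * CHAR_BIT) - __builtin_clzll(x);
}
```

The zero branch in each helper is the part an instruction-level
guarantee would allow the x86 implementation to drop.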

==============================================================================
TOPIC: Trying to fix ITE-887x parallel/serial driver bugs (including unhandled
IRQs)
http://groups.google.com/group/linux.kernel/t/8b58e8b6f16e6544?hl=en
==============================================================================

== 1 of 1 ==
Date: Wed, Apr 14 2010 5:10 am
From: Alan Cox


> > for free address ranges. It's hardly 'random probing' and it isn't a
> > normal BAR or guaranteed to have been configured by anything beforehand.

> Yes, but the drivers don't use INTCBAR. They scan through a hard-coded
> list of possible port ranges:

They can't just use INTCBAR as it may not have been configured by anything
before the device is probed. We do need to find and assign suitable ports
although this should probably be done via the PCI quirks on PCI setup.

> The interrupt controller (INTC) provides IRQ status bits in a single
> I/O port byte. I only have the datasheet for IT8875F, which is 1
> parallel and no serial ports, so my understanding of UART IRQs is sort
> of reverse engineering. Polling the UART IIR register shows that it
> acts like a normal 16550 UART when set to SERIRQ mode, but I get no
> IRQs generated. Setting to standard PCI INTA interrupts, the IRQs are
> generated, but the UART IIR registers do not set the IRQ bits.

Weird design.

http://kr.ic-on-line.cn/IOL/viewpdf/IT8871F_200216.htm

provides a tiny bit of info on the uart side but I've not found any
complete documentation.

> One possibility is to install custom io_serial_in/out functions. When
> the value of IIR is requested, the INTC can be checked, and modify the
> result to what it should be for a normal UART. That is a bit of a
> hack, but makes it much easier to fit into the existing code.

Seems a good starting point. The serial side code is robust for shared
IRQs.

Alan
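
A rough model of the custom serial_in idea quoted above, assuming the
behaviour described in the thread (in PCI INTA mode the INTC status
byte flags the interrupt while the UART's IIR still reads "no
interrupt"). Everything here is hypothetical except the two register
constants, which follow the standard 16550 layout: struct
ite_port_model, its fields and ite_serial_in() are illustrative names,
and which INTC bit belongs to which UART is not stated in the thread.

```c
#include <assert.h>

#define UART_IIR	2	/* interrupt identification register offset */
#define UART_IIR_NO_INT	0x01	/* bit 0 set: no interrupt pending */

/* Hypothetical model of one port: raw UART registers plus the INTC byte. */
struct ite_port_model {
	unsigned char uart_regs[8];
	unsigned char intc_status;	/* byte read from the INTC I/O port */
	unsigned char intc_bit;		/* assumed: status bit for this UART */
};

/*
 * Register read hook: pass every register through unchanged, except
 * that an IIR claiming "no interrupt" is corrected when the INTC says
 * this UART has one pending.
 */
static unsigned int ite_serial_in(const struct ite_port_model *p, int offset)
{
	unsigned int val = p->uart_regs[offset];

	if (offset == UART_IIR && (val & UART_IIR_NO_INT) &&
	    (p->intc_status & p->intc_bit))
		val &= ~(unsigned int)UART_IIR_NO_INT;

	return val;
}
```

The sketch only clears the no-interrupt bit; which interrupt ID bits
should be reported alongside it is left open, since the thread does not
settle that.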

==============================================================================
TOPIC: Input: add HP Compaq 2710p to 'noloop' table
http://groups.google.com/group/linux.kernel/t/1127ff79543457ae?hl=en
==============================================================================

== 1 of 2 ==
Date: Wed, Apr 14 2010 5:20 am
From: Jiri Kosina


Add HP Compaq 2710p to 'noloop' table. Otherwise this machine is reported
to not respond properly to the AUX_LOOP command from time to time, causing
a non-working keyboard due to a messed-up i8042.

Signed-off-by: Jiri Kosina <jkosina@suse.cz>

---

The relevant i8042.debug output from the failing case is

drivers/input/serio/i8042.c: 20 -> i8042 (command) [0]
drivers/input/serio/i8042.c: 65 <- i8042 (return) [0]
drivers/input/serio/i8042.c: 60 -> i8042 (command) [0]
drivers/input/serio/i8042.c: 74 -> i8042 (parameter) [0]
drivers/input/serio/i8042.c: d3 -> i8042 (command) [0]
drivers/input/serio/i8042.c: 5a -> i8042 (parameter) [0]
drivers/input/serio/i8042.c: 5a <- i8042 (return) [0]
drivers/input/serio/i8042.c: a7 -> i8042 (command) [0]
drivers/input/serio/i8042.c: 20 -> i8042 (command) [0]
drivers/input/serio/i8042.c: 74 <- i8042 (return) [0]
drivers/input/serio/i8042.c: a8 -> i8042 (command) [0]
drivers/input/serio/i8042.c: 20 -> i8042 (command) [0]
drivers/input/serio/i8042.c: 54 <- i8042 (return) [1]
drivers/input/serio/i8042.c: 60 -> i8042 (command) [1]
drivers/input/serio/i8042.c: 56 -> i8042 (parameter) [1]
drivers/input/serio/i8042.c: d3 -> i8042 (command) [1]
drivers/input/serio/i8042.c: a5 -> i8042 (parameter) [1]
drivers/input/serio/i8042.c: 60 -> i8042 (command) [1]
drivers/input/serio/i8042.c: 74 -> i8042 (parameter) [1]
drivers/input/serio/i8042.c: d3 -> i8042 (command) [1]
drivers/input/serio/i8042.c: f0 -> i8042 (parameter) [1]
drivers/input/serio/i8042.c: 00 <- i8042 (return) [1]
drivers/input/serio/i8042.c: 60 -> i8042 (command) [1]
drivers/input/serio/i8042.c: 56 -> i8042 (parameter) [1]
drivers/input/serio/i8042.c: 60 -> i8042 (command) [1]
drivers/input/serio/i8042.c: 47 -> i8042 (parameter) [1]
serio: i8042 KBD port at 0x60,0x64 irq 1
serio: i8042 AUX port at 0x60,0x64 irq 12
mice: PS/2 mouse device common for all mice
drivers/input/serio/i8042.c: f2 -> i8042 (kbd-data) [1]
...
drivers/input/serio/i8042.c: ed -> i8042 (kbd-data) [52]

drivers/input/serio/i8042-x86ia64io.h | 7 +++++++
1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/drivers/input/serio/i8042-x86ia64io.h b/drivers/input/serio/i8042-x86ia64io.h
index ead0494..d2c7cf0 100644
--- a/drivers/input/serio/i8042-x86ia64io.h
+++ b/drivers/input/serio/i8042-x86ia64io.h
@@ -172,6 +172,13 @@ static const struct dmi_system_id __initconst i8042_dmi_noloop_table[] = {
 			DMI_MATCH(DMI_PRODUCT_VERSION, "Rev 1"),
 		},
 	},
+	{
+		.matches = {
+			DMI_MATCH(DMI_SYS_VENDOR, "Hewlett-Packard"),
+			DMI_MATCH(DMI_PRODUCT_NAME, "HP Compaq 2710p (#ABD)"),
+			DMI_MATCH(DMI_PRODUCT_VERSION, "F.14"),
+		},
+	},
 	{ }
 };



== 2 of 2 ==
Date: Wed, Apr 14 2010 5:40 am
From: Matthew Garrett


On Wed, Apr 14, 2010 at 02:16:25PM +0200, Jiri Kosina wrote:
> + .matches = {
> + DMI_MATCH(DMI_SYS_VENDOR, "Hewlett-Packard"),
> + DMI_MATCH(DMI_PRODUCT_NAME, "HP Compaq 2710p (#ABD)"),
> + DMI_MATCH(DMI_PRODUCT_VERSION, "F.14"),

Is this really the only BIOS version affected? It seems unlikely. It's
also a pretty good indication that we're doing something wrong.
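
To make the concern concrete, here is a simplified user-space model of
how such a table entry matches. It is a sketch, not the kernel's DMI
code: struct system_id, id_matches(), strict_entry and loose_entry are
hypothetical names, and it assumes every populated slot must match as a
substring of the corresponding firmware string.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum field { SYS_VENDOR, PRODUCT_NAME, PRODUCT_VERSION, FIELD_COUNT };

struct match {
    enum field slot;
    const char *substr;     /* NULL marks the end of the list */
};

struct system_id {
    struct match matches[4];
};

/* Entry as posted: vendor, product name and version must all match. */
static const struct system_id strict_entry = {
    .matches = {
        { SYS_VENDOR,      "Hewlett-Packard" },
        { PRODUCT_NAME,    "HP Compaq 2710p (#ABD)" },
        { PRODUCT_VERSION, "F.14" },
    },
};

/* Same entry without the version constraint. */
static const struct system_id loose_entry = {
    .matches = {
        { SYS_VENDOR,   "Hewlett-Packard" },
        { PRODUCT_NAME, "HP Compaq 2710p (#ABD)" },
    },
};

/* True only if every populated slot is a substring of the machine's string. */
static bool id_matches(const struct system_id *id,
                       const char *const ident[FIELD_COUNT])
{
    for (size_t i = 0; i < 4; i++) {
        const struct match *m = &id->matches[i];

        if (!m->substr)
            break;          /* end of populated slots */
        if (!ident[m->slot] || !strstr(ident[m->slot], m->substr))
            return false;
    }
    return true;
}
```

With strict_entry, a 2710p reporting any version string other than "F.14"
falls through and the quirk is not applied; loose_entry would cover every
version of that model.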

--
Matthew Garrett | mjg59@srcf.ucam.org

==============================================================================
TOPIC: powerpc/perf_event: Fix oops due to perf_event_do_pending call
http://groups.google.com/group/linux.kernel/t/059e16f430142a41?hl=en
==============================================================================

== 1 of 1 ==
Date: Wed, Apr 14 2010 5:20 am
From: Michael Ellerman


On Wed, 2010-04-14 at 16:46 +1000, Paul Mackerras wrote:
> Anton Blanchard found that large POWER systems would occasionally
> crash in the exception exit path when profiling with perf_events.
> The symptom was that an interrupt would occur late in the exit path
> when the MSR[RI] (recoverable interrupt) bit was clear. Interrupts
> should be hard-disabled at this point but they were enabled. Because
> the interrupt was not recoverable the system panicked.
>
...
> diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
> index 1b16b9a..0441bbd 100644
> --- a/arch/powerpc/kernel/time.c
> +++ b/arch/powerpc/kernel/time.c
> @@ -532,25 +532,60 @@ void __init iSeries_time_init_early(void)
> }
>
