linux.kernel - 26 new messages in 12 topics

linux.kernel
http://groups.google.com/group/linux.kernel?hl=en

Today's topics:

* Unify KVM kernel-space and user-space code into a single project - 10
messages, 3 authors
http://groups.google.com/group/linux.kernel/t/728080d27436ebc6?hl=en
* iwl6050 firmware? - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/0cf355ced9420544?hl=en
* drivers:misc: sources for Init manager module - 2 messages, 2 authors
http://groups.google.com/group/linux.kernel/t/dea8452d95b440c1?hl=en
* serial: TTY: new ldisc for TI BT/FM/GPS chips - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/a48539970046ed8e?hl=en
* [PATCH 1/3] ARM: Rudimentary syscall interfaces - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/9125e56499e8192a?hl=en
* signals-clear-signal-tty-when-the-last-thread-exits.fix - 1 messages, 1
author
http://groups.google.com/group/linux.kernel/t/e005c59e3057f94b?hl=en
* move_task_off_dead_cpu: take rq->lock around select_fallback_rq() - 2
messages, 2 authors
http://groups.google.com/group/linux.kernel/t/4d5446e9c56ba098?hl=en
* first bad commit: 1f36f774 Switch !O_CREAT case to use of do_last() - 4
messages, 2 authors
http://groups.google.com/group/linux.kernel/t/038ede0f9291048c?hl=en
* Loan Offer - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/aa885d388204be40?hl=en
* x86,pat Convert memtype_lock into an rw_lock. - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/0eaa0952026b905c?hl=en
* behavior of recvmmsg() on blocking sockets - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/c024c7e374f1ca59?hl=en
* oom killer: break from infinite loop - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/5b8c0541d70dad4c?hl=en

==============================================================================
TOPIC: Unify KVM kernel-space and user-space code into a single project
http://groups.google.com/group/linux.kernel/t/728080d27436ebc6?hl=en
==============================================================================

== 1 of 10 ==
Date: Wed, Mar 24 2010 8:50 am
From: Joerg Roedel

On Wed, Mar 24, 2010 at 05:12:55PM +0200, Avi Kivity wrote:
> On 03/24/2010 05:01 PM, Joerg Roedel wrote:
>> $ cd /sys/kvm/guest0
>> $ ls -l
>> -r-------- 1 root root 0 2009-08-17 12:05 name
>> dr-x------ 1 root root 0 2009-08-17 12:05 fs
>> $ cat name
>> guest0
>> $ # ...
>>
>> The fs/ directory is used as the mount point for the guest root fs.
>
> The problem is /sys/kvm, not /sys/kvm/fs.

I am not tied to /sys/kvm. We could also use /proc/<pid>/kvm/ for
example. This would keep anything in the process space (except for the
global list of VMs which we should have anyway).

>> What I meant was: perf-kernel puts the guest-name into every sample and
>> perf-userspace accesses /sys/kvm/guest_name/fs/ later to resolve the
>> symbols. I leave the question of how the guest-fs is exposed to the host
>> out of this discussion. We should discuss this separately.
>
> How I see it: perf-kernel puts the guest pid into every sample, and
> perf-userspace uses that to resolve to a mountpoint served by fuse, or
> to a unix domain socket that serves the files.

We need a bit more information than just the qemu-pid, but yes, this
would also work out.

>> If a vm breaks into qemu it can access the host file system which is the
>> bigger problem. In this case there is no isolation anymore. From that
>> context it can even kill other VMs of the same user independent of a
>> hypothetical /sys/kvm/.
>
> It cannot. sVirt labels the disk image and other files qemu needs with
> the appropriate label, and everything else is off limits. Even if you
> run the guest as root, it won't have access to other files.

See my reply to Daniel's email.

>> Yes, but it's different from the implementation point-of-view. For the
>> user it surely all plays together.
>
> We need qemu to cooperate for mmio tracing, and we can cooperate with
> qemu for symbol resolution. If it prevents adding another kernel API,
> that's a win from my POV.

That's true. Probably qemu can inject this information in the
kvm-trace-events stream.

Joerg

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

== 2 of 10 ==
Date: Wed, Mar 24 2010 9:00 am
From: Joerg Roedel

On Wed, Mar 24, 2010 at 05:43:31PM +0200, Avi Kivity wrote:
> On 03/24/2010 05:37 PM, Joerg Roedel wrote:
>> Even better. So a guest which breaks out can't even access its own
>> /sys/kvm/ directory. Perfect, it doesn't need that access anyway.
>
> But what security label does that directory have? How can we make sure
> that whoever needs access to those files, gets them?
>
> Automatically created objects don't work well with that model. They're
> simply missing information.

If we go the /proc/<pid>/kvm way then the directory should probably
inherit the label from /proc/<pid>/?
Same could be applied to /sys/kvm/guest/ if we decide for it. The VM is
still bound to a single process with a /proc/<pid> after all.

Joerg

== 3 of 10 ==
Date: Wed, Mar 24 2010 9:00 am
From: Avi Kivity

On 03/24/2010 05:50 PM, Joerg Roedel wrote:
> On Wed, Mar 24, 2010 at 05:43:31PM +0200, Avi Kivity wrote:
>
>> On 03/24/2010 05:37 PM, Joerg Roedel wrote:
>>
>>> Even better. So a guest which breaks out can't even access its own
>>> /sys/kvm/ directory. Perfect, it doesn't need that access anyway.
>>>
>> But what security label does that directory have? How can we make sure
>> that whoever needs access to those files, gets them?
>>
>> Automatically created objects don't work well with that model. They're
>> simply missing information.
>>
> If we go the /proc/<pid>/kvm way then the directory should probably
> inherit the label from /proc/<pid>/?
>

That's a security policy. The security people like their policies
outside the kernel.

For example, they may want a label that allows a trace context to read
the data, and also qemu itself for introspection.

> Same could be applied to /sys/kvm/guest/ if we decide for it. The VM is
> still bound to a single process with a /proc/<pid> after all.
>

Ditto.

--
error compiling committee.c: too many arguments to function

== 4 of 10 ==
Date: Wed, Mar 24 2010 9:00 am
From: Joerg Roedel

On Wed, Mar 24, 2010 at 05:49:42PM +0200, Avi Kivity wrote:
> On 03/24/2010 05:46 PM, Joerg Roedel wrote:
>> On Wed, Mar 24, 2010 at 05:12:55PM +0200, Avi Kivity wrote:
>>
>>> On 03/24/2010 05:01 PM, Joerg Roedel wrote:
>>>
>>>> $ cd /sys/kvm/guest0
>>>> $ ls -l
>>>> -r-------- 1 root root 0 2009-08-17 12:05 name
>>>> dr-x------ 1 root root 0 2009-08-17 12:05 fs
>>>> $ cat name
>>>> guest0
>>>> $ # ...
>>>>
>>>> The fs/ directory is used as the mount point for the guest root fs.
>>>>
>>> The problem is /sys/kvm, not /sys/kvm/fs.
>>>
>> I am not tied to /sys/kvm. We could also use /proc/<pid>/kvm/ for
>> example. This would keep anything in the process space (except for the
>> global list of VMs which we should have anyway).
>>
>
> How about ~/.qemu/guests/$pid?

That makes it hard for perf to find it and even harder to get a list of
all VMs. With /proc/<pid>/kvm/guest we could symlink all guest
directories to /proc/kvm/ and perf reads the list from there. Also perf
can easily derive the directory for a guest from its pid.
Last but not least it's kernel-created and thus independent from the
userspace part being used.

Joerg
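
The "derive the directory from the pid" step can be sketched in a few lines of C. This is only an illustration of the lookup being proposed: the /proc/<pid>/kvm/guest layout is the hypothetical one from this thread, not an interface any kernel is known to export, and the helper name kvm_guest_dir is made up for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/* Build the per-guest directory path for a qemu pid.
 * ASSUMPTION: the /proc/<pid>/kvm/guest layout is the one proposed in
 * the thread; it is hypothetical and used here only for illustration.
 * Returns 0 on success, -1 if the buffer is too small. */
static int kvm_guest_dir(pid_t pid, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    int n = snprintf(buf, len, "/proc/%ld/kvm/guest", (long)pid);
    /* snprintf reports the length it wanted; treat truncation as failure. */
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

A tool holding only the pid from a sample could call kvm_guest_dir(pid, buf, sizeof buf) and open files below the result, with no per-user naming convention involved.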

== 5 of 10 ==
Date: Wed, Mar 24 2010 9:00 am
From: Avi Kivity

On 03/24/2010 05:46 PM, Joerg Roedel wrote:
> On Wed, Mar 24, 2010 at 05:12:55PM +0200, Avi Kivity wrote:
>
>> On 03/24/2010 05:01 PM, Joerg Roedel wrote:
>>
>>> $ cd /sys/kvm/guest0
>>> $ ls -l
>>> -r-------- 1 root root 0 2009-08-17 12:05 name
>>> dr-x------ 1 root root 0 2009-08-17 12:05 fs
>>> $ cat name
>>> guest0
>>> $ # ...
>>>
>>> The fs/ directory is used as the mount point for the guest root fs.
>>>
>> The problem is /sys/kvm, not /sys/kvm/fs.
>>
> I am not tied to /sys/kvm. We could also use /proc/<pid>/kvm/ for
> example. This would keep anything in the process space (except for the
> global list of VMs which we should have anyway).
>

How about ~/.qemu/guests/$pid?

--
error compiling committee.c: too many arguments to function

== 6 of 10 ==
Date: Wed, Mar 24 2010 9:10 am
From: Peter Zijlstra

On Wed, 2010-03-24 at 16:01 +0100, Joerg Roedel wrote:

> What I meant was: perf-kernel puts the guest-name into every sample and
> perf-userspace accesses /sys/kvm/guest_name/fs/ later to resolve the
> symbols. I leave the question of how the guest-fs is exposed to the host
> out of this discussion. We should discuss this separately.

I'd much prefer a pid like suggested later, keeps the samples smaller.

But that said, we need guest kernel events like mmap and context
switches too, otherwise we simply can't make sense of guest userspace
addresses, we need to know the guest address space layout.

So aside from a filesystem content, we first need mmap and context
switch events to find the files we need to access.

And while I appreciate all the security talk, it's basically pointless
anyway, the host can access it anyway, everybody agrees on that, but
still you're arguing the case..

== 7 of 10 ==
Date: Wed, Mar 24 2010 9:20 am
From: Avi Kivity

On 03/24/2010 06:03 PM, Peter Zijlstra wrote:
> On Wed, 2010-03-24 at 16:01 +0100, Joerg Roedel wrote:
>
>
>> What I meant was: perf-kernel puts the guest-name into every sample and
>> perf-userspace accesses /sys/kvm/guest_name/fs/ later to resolve the
>> symbols. I leave the question of how the guest-fs is exposed to the host
>> out of this discussion. We should discuss this separately.
>>
> I'd much prefer a pid like suggested later, keeps the samples smaller.
>
> But that said, we need guest kernel events like mmap and context
> switches too, otherwise we simply can't make sense of guest userspace
> addresses, we need to know the guest address space layout.
>

The kernel knows some of the address space layout, qemu knows all of it.

> So aside from a filesystem content, we first need mmap and context
> switch events to find the files we need to access.
>

This only works for the guest kernel, we don't know anything about guest
processes [1].

> And while I appreciate all the security talk, it's basically pointless
> anyway, the host can access it anyway, everybody agrees on that, but
> still you're arguing the case..
>

root can access anything, but we're not talking about root. The idea is
to protect against a guest that has exploited its qemu and is now
attacking the host and its fellow guests. uid protection is no good
since we want to isolate the guest from host processes belonging to the
same uid and from other guests running under the same uid.

[1] We can find out guest pids if we teach the kernel what to
dereference, i.e. gs:offset1->offset2->offset3. Of course this varies
from kernel to kernel, so we need some kind of bytecode that we can run
in perf nmi context. Kind of what we need to run an unwinder for
-fomit-frame-pointer.
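
A rough user-space model of the dereference chain in [1] might look like the following. It is a sketch under several assumptions: guest memory is modelled as a flat byte array, pointers are 64-bit, the final value is a 32-bit pid, and the function name follow_chain and its "program" format (a plain array of offsets) are invented for the example. A real implementation would have to run in perf NMI context and could not take faults.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Follow base:off[0]->off[1]->...->off[n-1] inside a flat memory image.
 * Every step but the last loads a 64-bit pointer; the last step loads
 * the 32-bit result (for example a pid). Returns 0 on success, -1 if
 * the chain is empty or any access would fall outside the image.
 * ASSUMPTION: the image, widths and offset format are illustrative. */
static int follow_chain(const uint8_t *mem, size_t mem_len, uint64_t base,
                        const uint32_t *off, size_t n, uint32_t *out)
{
    assert(mem != NULL && off != NULL && out != NULL);
    uint64_t addr = base;

    for (size_t i = 0; i < n; i++) {
        size_t width = (i + 1 < n) ? sizeof(uint64_t) : sizeof(uint32_t);

        /* Bounds-check before adding so the sum cannot wrap around. */
        if (addr > mem_len || off[i] > mem_len - addr)
            return -1;
        addr += off[i];
        if (mem_len - addr < width)
            return -1;

        if (i + 1 < n) {
            uint64_t next;
            memcpy(&next, mem + addr, sizeof next);  /* load next pointer */
            addr = next;
        } else {
            memcpy(out, mem + addr, sizeof *out);    /* load final value */
        }
    }
    return n ? 0 : -1;
}
```

The offsets would differ from guest kernel to guest kernel, which is why the footnote talks about supplying them as data (bytecode) rather than compiling them in.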

--
error compiling committee.c: too many arguments to function

== 8 of 10 ==
Date: Wed, Mar 24 2010 9:20 am
From: Avi Kivity

On 03/24/2010 05:59 PM, Joerg Roedel wrote:
>
>
>>> I am not tied to /sys/kvm. We could also use /proc/<pid>/kvm/ for
>>> example. This would keep anything in the process space (except for the
>>> global list of VMs which we should have anyway).
>>>
>>>
>> How about ~/.qemu/guests/$pid?
>>
> That makes it hard for perf to find it and even harder to get a list of
> all VMs.

Looks trivial to find a guest, less so with enumerating (still doable).

> With /proc/<pid>/kvm/guest we could symlink all guest
> directories to /proc/kvm/ and perf reads the list from there. Also perf
> can easily derive the directory for a guest from its pid.
> Last but not least it's kernel-created and thus independent from the
> userspace part being used.
>

Doesn't perf already have a dependency on naming conventions for finding
debug information?

--
error compiling committee.c: too many arguments to function

== 9 of 10 ==
Date: Wed, Mar 24 2010 9:20 am
From: Joerg Roedel

On Wed, Mar 24, 2010 at 05:52:54PM +0200, Avi Kivity wrote:
> On 03/24/2010 05:50 PM, Joerg Roedel wrote:
>> If we go the /proc/<pid>/kvm way then the directory should probably
>> inherit the label from /proc/<pid>/?
>
> That's a security policy. The security people like their policies
> outside the kernel.
>
> For example, they may want a label that allows a trace context to read
> the data, and also qemu itself for introspection.

Hm, I am not a security expert. But is this not only one entity more for
sVirt to handle? I would leave that decision to the sVirt developers.
Does attaching the same label as for the VM resources mean that root
could not access it anymore?

Joerg

== 10 of 10 ==
Date: Wed, Mar 24 2010 9:30 am
From: Avi Kivity

On 03/24/2010 06:17 PM, Joerg Roedel wrote:
> On Wed, Mar 24, 2010 at 05:52:54PM +0200, Avi Kivity wrote:
>
>> On 03/24/2010 05:50 PM, Joerg Roedel wrote:
>>
>>> If we go the /proc/<pid>/kvm way then the directory should probably
>>> inherit the label from /proc/<pid>/?
>>>
>> That's a security policy. The security people like their policies
>> outside the kernel.
>>
>> For example, they may want a label that allows a trace context to read
>> the data, and also qemu itself for introspection.
>>
> Hm, I am not a security expert.

I'm out of my depth here as well.

> But is this not only one entity more for
> sVirt to handle? I would leave that decision to the sVirt developers.
> Does attaching the same label as for the VM resources mean that root
> could not access it anymore?
>

IIUC processes run under a context, and there's a policy somewhere that
tells you which context can access which label (and with what
permissions). There was a server on the Internet once that gave you
root access and invited you to attack it. No idea if anyone succeeded
or not (I got bored after about a minute).

So it depends on the policy. If you attach the same label, that means
all files with the same label have the same access permissions. I think.

--
error compiling committee.c: too many arguments to function

==============================================================================
TOPIC: iwl6050 firmware?
http://groups.google.com/group/linux.kernel/t/0cf355ced9420544?hl=en
==============================================================================

== 1 of 1 ==
Date: Wed, Mar 24 2010 9:00 am
From: Andy Isaacson

On Wed, Mar 24, 2010 at 09:15:53AM +0100, drago01 wrote:
> > It's been several months since iwl6050 support was merged to mainline,
> >
> > commit e1228374d648efe451973bc5f3d1f9a8e943ec0b
[snip]
> > but the firmware isn't yet in linux-firmware.git nor on
>
> http://intellinuxwireless.org/iwlwifi/downloads/iwlwifi-6000-ucode-9.193.4.1.tgz
> ?
>
> I don't think the 6050 needs a separate firmware.

Yes, it does; it's asking for iwlwifi-6050-4.ucode on boot. (I was
surprised too.)

-andy

==============================================================================
TOPIC: drivers:misc: sources for Init manager module
http://groups.google.com/group/linux.kernel/t/dea8452d95b440c1?hl=en
==============================================================================

== 1 of 2 ==
Date: Wed, Mar 24 2010 9:00 am
From: Greg KH

On Wed, Mar 24, 2010 at 08:24:01PM +0530, Pavan Savoy wrote:
> 4. As Alan suggested, If I make it self-contained by pushing number of
> line disciplines to a slightly larger number, then would it be OK ?

That is fine with me, as long as you continue to work on fixing the
issues in the code, and do not object to changing the user/kernel
interface over time to be one that is more sane.

thanks,

greg k-h

== 2 of 2 ==
Date: Wed, Mar 24 2010 9:20 am
From: Marcel Holtmann

Hi Pavan,

> I wanted to somehow put this in staging because then it would probably have a thorough architectural review process.
> Some details about this driver -
>
> 1. This driver will be used by Bluetooth-BlueZ/FM-V4L2 and GPS (probably character device driver) using the EXPORTED symbols (-register/_unregister).
>
> 2. Much like the hciattach daemon which maintains N_HCI bluetooth line discipline, this driver will also have a User-Space N_TI_WL Init manager (UIM) maintaining the Line discipline.

Can you explain why you think this is needed and why we cannot interface
with this directly? If it is a serial port, what protocol does it talk?

> 3. Because the UIM should know when to install/uninstall line discipline, the /sys entry is created a root called UIM (a new kobject) and UIM daemon would write its PID to it.

I don't understand this. This sounds like a broken concept to me.

> 4. As Alan suggested, If I make it self-contained by pushing number of line disciplines to a slightly larger number, then would it be OK ?

Just from a quick look, I think within a few review cycles this might be
able to get proper upstream inclusion. No idea why bother with staging
in the first place. Let's do this correctly.

Regards

Marcel

==============================================================================
TOPIC: serial: TTY: new ldisc for TI BT/FM/GPS chips
http://groups.google.com/group/linux.kernel/t/a48539970046ed8e?hl=en
==============================================================================

== 1 of 1 ==
Date: Wed, Mar 24 2010 9:00 am
From: Greg KH

On Wed, Mar 24, 2010 at 08:43:07AM +0000, Alan Cox wrote:
> On Tue, 23 Mar 2010 19:28:03 -0700
> Greg KH <gregkh@suse.de> wrote:
>
> > On Tue, Mar 23, 2010 at 03:41:09PM -0500, pavan_savoy@ti.com wrote:
> > > From: Pavan Savoy <pavan_savoy@ti.com>
> > >
> > > A new N_TI_WL line discipline added for TI BT/FM/GPS
> > > combo chips which make use of same TTY to communicate
> > > with chip. This is to be made use of individual protocol
> > > BT/FM/GPS drivers.
> > >
> > > Signed-off-by: Pavan Savoy <pavan_savoy@ti.com>
> > > ---
> > > include/linux/tty.h | 3 ++-
> >
> > Staging code needs to be completely self-contained in the drivers/staging
> > directory. Is there any way to do this without touching this file?
>
> Put the N_TI_WL into a staging header, and bump the number of ldiscs as a
> separate non staging patch - we should probably simply bump it a bit
> anyway to leave room for experimental work using a built kernel image.
> There is no requirement that the number of ldiscs matches up to the last
> ldisc assigned.

That sounds good to me.

thanks,

==============================================================================
TOPIC: [PATCH 1/3] ARM: Rudimentary syscall interfaces
http://groups.google.com/group/linux.kernel/t/9125e56499e8192a?hl=en
==============================================================================

== 1 of 1 ==
Date: Wed, Mar 24 2010 9:00 am
From: Oren Laadan

Matt Helsley wrote:
> On Wed, Mar 24, 2010 at 12:57:46AM -0400, Oren Laadan wrote:
>> On Tue, 23 Mar 2010, Matt Helsley wrote:
>>
>>> On Tue, Mar 23, 2010 at 08:53:42PM +0000, Russell King - ARM Linux wrote:
>>>> On Sun, Mar 21, 2010 at 09:06:03PM -0400, Christoffer Dall wrote:
>>>>> This small commit introduces a global state of system calls for ARM
>>>>> making it possible for a debugger or checkpointing to gain information
>>>>> about another process' state with respect to system calls.
>>>> I don't particularly like the idea that we always store the syscall
>>>> number to memory for every system call, whether the stored version is
>>>> used or not.
>>>>
>>>> Since ARM caches are generally not write allocate, this means mostly
>>>> write-only variables can have a higher than expected expense.
>>>>
>>>> Is there not some thread flag which can be checked to see if we need to
>>>> store the syscall number?
>>> Perhaps before we freeze the task we can save the syscall number on ARM.
>>> The patches suggest that the signal delivery path -- which the freezer
>>> utilizes -- has the syscall number already.

Actually, the signal path doesn't have the syscall number, it has
a binary "in syscall" value.

>>>
>>> Should work since the threads must be frozen first anyway.
>> I like the idea.
>>
>> However, would it also work for those cases when the freezing does not
>> occur from the signal delivery path - e.g. for vfork and ptraced tasks ?
>
> We could just as easily set it before the vfork uninterruptible completion.
> ptracing I don't know about though.
>

vfork() uses freezer_do_not_count() to tell the freezer that it's
effectively frozen. It's also used by drivers/char/apm-emulation.c

Looking at calls to ptrace_notify(), ptrace_stop() and ptrace_event(),
there are several places where a ptraced task can stop with TASK_TRACED
(which is good enough for the freezer), outside the signal handling
path.

This means that recording the syscall number for all these cases is
going to be tedious and intrusive.

I prefer to somehow figure out the syscall from the task's state or
pt_regs, or by (re)using the same assembly code that already does that.

Oren.

==============================================================================
TOPIC: signals-clear-signal-tty-when-the-last-thread-exits.fix
http://groups.google.com/group/linux.kernel/t/e005c59e3057f94b?hl=en
==============================================================================

== 1 of 1 ==
Date: Wed, Mar 24 2010 9:10 am
From: Oleg Nesterov

On 03/24, Andrew Morton wrote:
>
> > - struct tty_struct *tty;
> > + struct tty_struct *tty = NULL; /* supress gcc warning */
>
> uninitialized_var() is a neater way.

Aha, indeed.

Will resend soon...

Oleg.

==============================================================================
TOPIC: move_task_off_dead_cpu: take rq->lock around select_fallback_rq()
http://groups.google.com/group/linux.kernel/t/4d5446e9c56ba098?hl=en
==============================================================================

== 1 of 2 ==
Date: Wed, Mar 24 2010 9:10 am
From: Oleg Nesterov

On 03/24, Peter Zijlstra wrote:
>
> On Mon, 2010-03-15 at 10:10 +0100, Oleg Nesterov wrote:
> > static void move_task_off_dead_cpu(int dead_cpu, struct task_struct *p)
> > {
> > + struct rq *rq = cpu_rq(dead_cpu);
> > + int needs_cpu, dest_cpu;
> > + unsigned long flags;
> > again:
> > + local_irq_save(flags);
> > +
> > + raw_spin_lock(&rq->lock);
> > + needs_cpu = (task_cpu(p) == dead_cpu) && (p->state != TASK_WAKING);
>
> ^
> kernel/sched.c:5445: warning: 'dest_cpu' may be used uninitialized in this function

Hmm. looks like my gcc is more friendly...

OK. certainly I'll send the updated patch, if this series passes
your review otherwise.

Oleg.

== 2 of 2 ==
Date: Wed, Mar 24 2010 9:20 am
From: Peter Zijlstra

On Wed, 2010-03-24 at 17:07 +0100, Oleg Nesterov wrote:
> On 03/24, Peter Zijlstra wrote:
> >
> > On Mon, 2010-03-15 at 10:10 +0100, Oleg Nesterov wrote:
> > > static void move_task_off_dead_cpu(int dead_cpu, struct task_struct *p)
> > > {
> > > + struct rq *rq = cpu_rq(dead_cpu);
> > > + int needs_cpu, dest_cpu;
> > > + unsigned long flags;
> > > again:
> > > + local_irq_save(flags);
> > > +
> > > + raw_spin_lock(&rq->lock);
> > > + needs_cpu = (task_cpu(p) == dead_cpu) && (p->state != TASK_WAKING);
> >
> > ^
> > kernel/sched.c:5445: warning: 'dest_cpu' may be used uninitialized in this function
>
> Hmm. looks like my gcc is more friendly...

Hrm, that and I'm apparently unable to read, it said dest_cpu, not
dead_cpu.. ah well, I'll slam an __maybe_unused in.

> OK. certainly I'll send the updated patch, if this series passes
> your review otherwise.

Yeah, you made a few good points in 0/6, am now staring at the code on
how to close those holes, hope to post something sensible soon.

==============================================================================
TOPIC: first bad commit: 1f36f774 Switch !O_CREAT case to use of do_last()
http://groups.google.com/group/linux.kernel/t/038ede0f9291048c?hl=en
==============================================================================

== 1 of 4 ==
Date: Wed, Mar 24 2010 9:10 am
From: Al Viro

On Wed, Mar 24, 2010 at 06:04:56PM +0200, Boaz Harrosh wrote:
> > Bloody impressive... Does that happen to underlying fs or to what you
> > are seeing via NFS?
>
> Only via NFS. All local access is fine.
>
> After the corruption above I can cd to the local mount cp a fresh copy
> of .git/index file and play around just fine.
> Once I return to the NFS mounted directory, a git status will do it.
> It does not matter if caches are cold (Takes a long time) or hot it happens
> every time.
>
> Weird I know, I'm playing some more with it as we speak

What happens if you export to box running older kernel *or* from box
running older kernel? IOW, is that nfsd or nfs client getting unhappy?
I'd suspect the latter, but...

== 2 of 4 ==
Date: Wed, Mar 24 2010 9:10 am
From: Boaz Harrosh

On 03/24/2010 06:00 PM, Al Viro wrote:
> On Wed, Mar 24, 2010 at 05:49:39PM +0200, Boaz Harrosh wrote:
>> - I have an exofs filesystem mounted on /mnt/exofs
>> - []$ cd /mnt/exofs/some_linux_git; git status;
>> All is fine
>> - []$ mount -t nfs4 -o minorversion=0 localhost:/ /mnt/nfs
>> (Where etc/exports will export /mnt/exofs via nfs4.1)
>> - []$ cd /mnt/nfs/some_linux_git; git status;
>> This will fail and will corrupt the .git/index file. Sometimes the file would be
>> too short, and sometimes the file will become a directory (Yes really)
>
> Bloody impressive... Does that happen to underlying fs or to what you
> are seeing via NFS?

Only via NFS. All local access is fine.

After the corruption above I can cd to the local mount cp a fresh copy
of .git/index file and play around just fine.
Once I return to the NFS mounted directory, a git status will do it.
It does not matter if caches are cold (Takes a long time) or hot it happens
every time.

Weird I know, I'm playing some more with it as we speak

Thanks
Boaz


== 3 of 4 ==
Date: Wed, Mar 24 2010 9:10 am
From: Al Viro

On Wed, Mar 24, 2010 at 05:49:39PM +0200, Boaz Harrosh wrote:
> - I have an exofs filesystem mounted on /mnt/exofs
> - []$ cd /mnt/exofs/some_linux_git; git status;
> All is fine
> - []$ mount -t nfs4 -o minorversion=0 localhost:/ /mnt/nfs
> (Where etc/exports will export /mnt/exofs via nfs4.1)
> - []$ cd /mnt/nfs/some_linux_git; git status;
> This will fail and will corrupt the .git/index file. Sometimes the file would be
> too short, and sometimes the file will become a directory (Yes really)

Bloody impressive... Does that happen to underlying fs or to what you
are seeing via NFS?

== 4 of 4 ==
Date: Wed, Mar 24 2010 9:20 am
From: Boaz Harrosh

On 03/24/2010 06:07 PM, Al Viro wrote:
> On Wed, Mar 24, 2010 at 06:04:56PM +0200, Boaz Harrosh wrote:
>>> Bloody impressive... Does that happen to underlying fs or to what you
>>> are seeing via NFS?
>>
>> Only via NFS. All local access is fine.
>>
>> After the corruption above I can cd to the local mount cp a fresh copy
>> of .git/index file and play around just fine.
>> Once I return to the NFS mounted directory, a git status will do it.
>> It does not matter if caches are cold (Takes a long time) or hot it happens
>> every time.
>>
>> Weird I know, I'm playing some more with it as we speak
>
> What happens if you export to box running older kernel *or* from box
> running older kernel? IOW, is that nfsd or nfs client getting unhappy?
> I'd suspect the latter, but...

Good question, I'm just getting to that because currently it's all
over localhost (same kernel, BTW inside a UML)

I will try what you said. Please throw any other tests at me, if needed.

Boaz

==============================================================================
TOPIC: Loan Offer
http://groups.google.com/group/linux.kernel/t/aa885d388204be40?hl=en
==============================================================================

== 1 of 1 ==
Date: Wed, Mar 24 2010 9:20 am
From: "global loan company"

==============================================================================
TOPIC: x86,pat Convert memtype_lock into an rw_lock.
http://groups.google.com/group/linux.kernel/t/0eaa0952026b905c?hl=en
==============================================================================

== 1 of 1 ==
Date: Wed, Mar 24 2010 9:20 am
From: Suresh Siddha

On Wed, 2010-03-24 at 04:32 -0700, Peter Zijlstra wrote:
> On Wed, 2010-03-17 at 16:19 -0800, Suresh Siddha wrote:
> > On Wed, 2010-03-17 at 12:51 -0700, H. Peter Anvin wrote:
> > > Well, as you know :) tglx and I are on the road ... I'll try to get to it on Friday before I take off again.
> >
> > Also I talked to Thomas about this rwlock conversion and he referred to
> > RT issues with rwlock. And the best is to avoid this using RCU.
>
> It's not just RT, even for mainline rwlock_t is a massive pain and often
> is no better (actually worse) than a spinlock due to the massive
> cacheline bouncing it introduces.

Don't we have the same cacheline bouncing issues with the ticket
spinlocks?

thanks,
suresh
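
For readers unfamiliar with ticket locks, the following is a simplified user-space model using C11 atomics. It is not the kernel's x86 implementation, and the names ticket_lock, ticket_acquire and ticket_release are made up for the sketch. It only shows the property the question is about: every acquirer performs an atomic read-modify-write on the same shared word, so the cache line holding the lock moves between CPUs on each acquisition.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Simplified ticket lock (illustrative model, not kernel code). */
struct ticket_lock {
    atomic_uint next;   /* next ticket to hand out */
    atomic_uint owner;  /* ticket currently being served */
};

static void ticket_acquire(struct ticket_lock *l)
{
    assert(l != NULL);
    /* Taking a ticket writes to shared state, even when uncontended. */
    unsigned me = atomic_fetch_add_explicit(&l->next, 1, memory_order_relaxed);

    /* Waiters then poll the owner field until their number comes up. */
    while (atomic_load_explicit(&l->owner, memory_order_acquire) != me)
        ; /* spin */
}

static void ticket_release(struct ticket_lock *l)
{
    assert(l != NULL);
    /* Only the holder writes owner, so a plain load-then-store is enough. */
    unsigned cur = atomic_load_explicit(&l->owner, memory_order_relaxed);
    atomic_store_explicit(&l->owner, cur + 1, memory_order_release);
}
```

Whether this costs more or less than the read-side of an rwlock in a given workload is exactly what the thread is debating; the sketch takes no position on that.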

==============================================================================
TOPIC: behavior of recvmmsg() on blocking sockets
http://groups.google.com/group/linux.kernel/t/c024c7e374f1ca59?hl=en
==============================================================================

== 1 of 1 ==
Date: Wed, Mar 24 2010 9:20 am
From: Brandon Black

[Not on the list, please CC responses]

Currently, my application code uses blocking UDP sockets and is
basically structured like this:

while(1) {
    recvmsg(fd, ...);
    // do some work on the packet
    sendmsg(fd, ...);
}

It uses a thread-per-socket model, and the "do some work" code is very
fast, and so this turns out to be more efficient than non-blocking
techniques for my use case. Today I started playing with 2.6.33's new
recvmmsg(), hoping to convert my code like so (still on blocking
sockets):

while(1) {
    recvmmsg(fd, ...);
    // do some work on up to N packets
    // loop over sendmsg() foreach packet to be sent
    // (or sendmmsg() if/when that interface becomes available)
}

The catch I ran into is that on a blocking socket, recvmmsg() will
block until *all* vlen packets have been received. The behavior I'd
prefer for my use case would be for it to only block until at least
one packet is available, not until all are available. Or in code
terms, the first internal call to recvmsg should use the supplied
flags, and the rest of the recvmsg calls should use flags |
MSG_DONTWAIT. It's not clear to me which is the better default
behavior, but I feel like at the very least there should be a flag
that can switch behavior between the two possible interpretations of
"blocking".
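
The semantics being asked for can be modelled in user space as below. This is a sketch only: it uses recv() rather than recvmsg() for brevity, the helper name recv_batch and the PKT_SIZE constant are invented for the example, and because it issues one syscall per packet it does not deliver the syscall savings that recvmmsg() is meant to provide. It just makes the "block for the first packet, then take only what is already queued" behaviour concrete.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/socket.h>

#define PKT_SIZE 64  /* illustrative per-datagram buffer size */

/* Receive up to max datagrams: honour the caller's flags for the first
 * one (so a blocking socket blocks), then add MSG_DONTWAIT so the rest
 * of the batch only picks up datagrams that are already queued.
 * Returns the number of datagrams stored, or -1 if the first receive
 * fails. Errors after the first datagram simply end the batch. */
static int recv_batch(int fd, char bufs[][PKT_SIZE], ssize_t lens[],
                      int max, int flags)
{
    assert(max > 0);
    int n = 0;

    while (n < max) {
        int f = (n == 0) ? flags : (flags | MSG_DONTWAIT);
        ssize_t r = recv(fd, bufs[n], PKT_SIZE, f);
        if (r < 0)
            break;  /* typically EAGAIN: nothing more queued right now */
        lens[n++] = r;
    }
    return n > 0 ? n : -1;
}
```

With two datagrams already queued and max set to 8, the helper returns 2 immediately instead of waiting for six more packets.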

Obviously, I can also work around this at the user level by simply
switching to a non-blocking socket and using select/poll before
recvmmsg, but then under any conditions where only 1 packet is
available (probably fairly common) I'm issuing two syscalls per packet
when only one should be necessary (and only one is necessary when
using recvmsg()). This seems inefficient and antithetical to one of
the design goals of recvmmsg (cut down the syscalls:packets ratio).

Thoughts on this?

Thanks,
-- Brandon

==============================================================================
TOPIC: oom killer: break from infinite loop
http://groups.google.com/group/linux.kernel/t/5b8c0541d70dad4c?hl=en
==============================================================================

== 1 of 1 ==
Date: Wed, Mar 24 2010 9:30 am
From: Anfei Zhou

In a multi-threaded environment, if the current task (A) holds the
mm->mmap_sem semaphore and another thread (B) in the same process is
selected to be OOM-killed, thread B cannot really be killed because
the two share the same semaphore. __alloc_pages_slowpath then turns
into an infinite loop. Set TIF_MEMDIE on all the threads in the group
here, so that it gets a chance to break out and exit.

Signed-off-by: Anfei Zhou <anfei.zhou@gmail.com>
---
mm/oom_kill.c | 4 ++++
1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 9b223af..aab9892 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -381,6 +381,8 @@ static void dump_header(struct task_struct *p, gfp_t gfp_mask, int order,
  */
 static void __oom_kill_task(struct task_struct *p, int verbose)
 {
+	struct task_struct *t;
+
 	if (is_global_init(p)) {
 		WARN_ON(1);
 		printk(KERN_WARNING "tried to kill init!\n");
@@ -412,6 +414,8 @@ static void __oom_kill_task(struct task_struct *p, int verbose)
 	 */
 	p->rt.time_slice = HZ;
 	set_tsk_thread_flag(p, TIF_MEMDIE);
+	for (t = next_thread(p); t != p; t = next_thread(t))
+		set_tsk_thread_flag(t, TIF_MEMDIE);

 	force_sig(SIGKILL, p);
 }
--
1.6.4.rc1
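
The loop added by the patch walks a circular list of threads. A toy user-space model of that traversal, with made-up names (toy_task, mark_group_memdie) and a plain next pointer standing in for next_thread(), might look like this. It is an illustration of the iteration pattern only, not kernel code, and it ignores the locking a real thread-group walk needs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for task_struct: threads of one process form a ring. */
struct toy_task {
    struct toy_task *next;  /* plays the role of next_thread() */
    bool memdie;            /* plays the role of TIF_MEMDIE */
};

/* Flag p and every other thread in its group. */
static void mark_group_memdie(struct toy_task *p)
{
    assert(p != NULL && p->next != NULL);
    p->memdie = true;

    /* Start at p's successor and stop on returning to p, as the patch
     * does; for a single-threaded group the body never runs. */
    for (struct toy_task *t = p->next; t != p; t = t->next)
        t->memdie = true;
}
```

Starting at the successor and terminating on p visits each sibling exactly once regardless of which thread in the ring was picked as the victim.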

==============================================================================

You received this message because you are subscribed to the Google Groups "linux.kernel"
group.

To post to this group, visit http://groups.google.com/group/linux.kernel?hl=en

To unsubscribe from this group, send email to linux.kernel+unsubscribe@googlegroups.com

To change the way you get mail from this group, visit:
http://groups.google.com/group/linux.kernel/subscribe?hl=en

To report abuse, send email explaining the problem to abuse@googlegroups.com

==============================================================================
Google Groups: http://groups.google.com/?hl=en
