linux.kernel - 26 new messages in 15 topics - digest
linux.kernel
http://groups.google.com/group/linux.kernel?hl=en
Today's topics:
* Q: select_fallback_rq() && cpuset_lock() - 5 messages, 2 authors
http://groups.google.com/group/linux.kernel/t/ac95f8c657b055e2?hl=en
* bit errors on spitz - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/32584eab7e27d325?hl=en
* oprofile, perf, x86: introduce new functions to reserve perfctrs - 1
messages, 1 author
http://groups.google.com/group/linux.kernel/t/716ea40ac646a344?hl=en
* Squashfs updates for 2.6.34-rc1 - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/c4f8fdfe2329a906?hl=en
* Race condition in TTY ldisc layer in 2.6.33 - 4 messages, 3 authors
http://groups.google.com/group/linux.kernel/t/0dd856e75c9c2098?hl=en
* ATA 4 KiB sector issues. - 4 messages, 4 authors
http://groups.google.com/group/linux.kernel/t/94d9b232ec44429a?hl=en
* aio: compat_ioctl issue? - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/b4bbeb78a34e8be1?hl=en
* 2.6.34-rc1: rcu lockdep bug? - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/20ae452eb2399ce9?hl=en
* trivial: driver-core: document ERR_PTR() return values - 1 messages, 1
author
http://groups.google.com/group/linux.kernel/t/50f4e3c0ee1c5143?hl=en
* x86,pat Update the page flags for memtype atomically instead of using
memtype_lock. - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/e2baf95b4f9a3c5d?hl=en
* 2.6.34-rc1: kernel BUG at mm/slab.c:2989! - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/affffdf23d5859d0?hl=en
* powerpc/BSR: fix device_create() return value check - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/8786ed49bb2be013?hl=en
* c2port: fix device_create() return value check - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/8aea6e5a429a53dd?hl=en
* mtd/nand/r852: fix build for CONFIG_PM=n - 2 messages, 1 author
http://groups.google.com/group/linux.kernel/t/56fd47df335c0b65?hl=en
* Reposting: [PATCH v1] serial: Add OMAP high-speed UART driver. - 1 messages,
1 author
http://groups.google.com/group/linux.kernel/t/9800da1317981df4?hl=en
==============================================================================
TOPIC: Q: select_fallback_rq() && cpuset_lock()
http://groups.google.com/group/linux.kernel/t/ac95f8c657b055e2?hl=en
==============================================================================
== 1 of 5 ==
Date: Thurs, Mar 11 2010 7:30 am
From: Oleg Nesterov
On 03/11, Oleg Nesterov wrote:
>
> How can we fix this later? Perhaps we can change
> cpuset_track_online_cpus(CPU_DEAD) to scan all affected cpusets and
> fixup the tasks with the wrong ->cpus_allowed == cpu_possible_mask.
Wait. We need to fix the CPU_DEAD case anyway?
Hmm. 6ad4c18884e864cf4c77f9074d3d1816063f99cd
"sched: Fix balance vs hotplug race" did s/CPU_DEAD/CPU_DOWN_PREPARE/
in cpuset_track_online_cpus(). This doesn't look exactly right to me,
we shouldn't do remove_tasks_in_empty_cpuset() at CPU_DOWN_PREPARE
stage, it can fail.
Otoh. This means that move_task_off_dead_cpu() can never see the
task without active cpus in ->cpus_allowed, it is called later by
CPU_DEAD. So, cpuset_lock() is not needed at all.
Oleg.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
== 2 of 5 ==
Date: Thurs, Mar 11 2010 7:40 am
From: Peter Zijlstra
On Thu, 2010-03-11 at 15:52 +0100, Oleg Nesterov wrote:
> On 03/10, Oleg Nesterov wrote:
> >
> > On 03/10, Peter Zijlstra wrote:
> > >
> > > Right, so if you refresh these patches, I'll feed them to mingo and they
> > > should eventually end up in -linus, how's that? :-)
> >
> > Great. Will redo/resend tomorrow ;)
>
> That was a great plan, but it doesn't work.
>
> With the recent changes we have even more problems with
> cpuset_cpus_allowed_locked(). Not only it misses cpuset_lock() (which
> doesn't work anyway and must die imho), it is very wrong to even call
> this function from try_to_wake_up() - this can deadlock.
>
> Because task_cs() is protected by task_lock() which is not irq-safe,
> and cpuset_cpus_allowed_locked() takes this lock.
You're right, and lockdep doesn't normally warn about that because
nobody really hits this path :/
> We need more changes in cpuset.c. Btw, select_fallback_rq() takes
> rcu_read_lock around cpuset_cpus_allowed_locked(). Why? I must have
> missed something, but afaics this buys nothing.
for task_cs() iirc.
> From the previous email:
>
> On 03/10, Peter Zijlstra wrote:
> >
> > On Wed, 2010-03-10 at 18:30 +0100, Oleg Nesterov wrote:
> > > On 03/10, Peter Zijlstra wrote:
> > > >
> > > > I guess the quick fix is to really bail and always use cpu_online_mask
> > > > in select_fallback_rq().
> > >
> > > Yes, but this breaks cpusets.
> >
> > Arguably, so, but you can also argue that binding a task to a cpu and
> > then unplugging that cpu without first fixing up the affinity is a 'you
> > get to keep both pieces' situation.
>
> Well yes, but still it was supposed the kernel should handle this case
> correctly, the task shouldn't escape its cpuset.
>
> However, currently I don't see another option. I think we should fix the
> possible deadlocks and kill cpuset_lock/cpuset_cpus_allowed_locked first,
> then try to fix cpusets.
>
> See the trivial (uncompiled) patch below. It just states a fact
> cpuset_cpus_allowed() logic is broken.
>
> How can we fix this later? Perhaps we can change
> cpuset_track_online_cpus(CPU_DEAD) to scan all affected cpusets and
> fixup the tasks with the wrong ->cpus_allowed == cpu_possible_mask.
Problem is, we can't really fix up tasks, wakeup must be able to find a
suitable cpu.
> At first glance this should work in try_to_wake_up(p) case, we can't
> race with cpuset_change_cpumask()/etc because of TASK_WAKING logic.
Well, cs->cpus_possible can still go funny on us.
> But I am not sure how can we fix move_task_off_dead_cpu(). I think
> __migrate_task_irq() itself is fine, but if select_fallback_rq() is
> called from move_task_off_dead_cpu() nothing protects ->cpus_allowed.
It has that retry loop in case the migration fails, right?
> We can race with cpusets, or even with the plain set_cpus_allowed().
> Probably nothing really bad can happen, if the resulting cpumask
> doesn't have online cpus due to the racing memcpys, we should retry
> after __migrate_task_irq() fails. Or we can take cpu_rq(cpu)-lock
> around cpumask_copy(p->cpus_allowed, cpu_possible_mask).
It does the retry thing.
> sched_exec() seems fine, the task is current and running,
> "No more Mr. Nice Guy." case is not possible.
>
> What do you think?
>
> Btw, I think there is a small bug in set_cpus_allowed_ptr(),
> wake_up_process(rq->migration_thread) can hit NULL, we should do
> wake_up_process(mt).
Agreed.
> @@ -2289,10 +2289,9 @@ static int select_fallback_rq(int cpu, s
>
> /* No more Mr. Nice Guy. */
> if (dest_cpu >= nr_cpu_ids) {
> - rcu_read_lock();
> - cpuset_cpus_allowed_locked(p, &p->cpus_allowed);
> - rcu_read_unlock();
> - dest_cpu = cpumask_any_and(cpu_active_mask, &p->cpus_allowed);
> + // XXX: take cpu_rq(cpu)->lock ???
> + cpumask_copy(&p->cpus_allowed, cpu_possible_mask);
> + dest_cpu = cpumask_any(cpu_active_mask);
Right, this seems safe.
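
The fallback order in the hunk above can be modelled in user space with
plain bitmasks. This is only a minimal sketch under stated assumptions:
at most 64 CPUs, a cpumask represented as a uint64_t, and no locking at
all. The names task_model, first_cpu, pick_fallback_cpu and NO_CPU are
hypothetical and are not kernel API.

```c
#include <assert.h>
#include <stdint.h>

#define NO_CPU (-1)   /* hypothetical "no cpu found" marker */

/* Hypothetical stand-in for the one task field that matters here. */
struct task_model {
    uint64_t cpus_allowed;
};

/* Index of the lowest set bit, or NO_CPU if the mask is empty. */
static int first_cpu(uint64_t mask)
{
    for (int cpu = 0; cpu < 64; cpu++)
        if (mask & (UINT64_C(1) << cpu))
            return cpu;
    return NO_CPU;
}

/*
 * Model of the patched fallback: prefer an allowed and active cpu;
 * if there is none, widen the affinity to every possible cpu and
 * take any active one.
 */
static int pick_fallback_cpu(struct task_model *p, uint64_t active,
                             uint64_t possible)
{
    int dest;

    assert(active != 0);   /* at least one cpu must still be active */

    dest = first_cpu(p->cpus_allowed & active);
    if (dest != NO_CPU)
        return dest;

    /* "No more Mr. Nice Guy": the task escapes its old affinity. */
    p->cpus_allowed = possible;
    return first_cpu(active);
}
```

A task allowed only on cpu 2 (mask 0x4), with cpus 0-1 active and cpus
0-3 possible, is placed on cpu 0 and ends up with cpus_allowed == 0xF.
That is the "task escapes its cpuset" effect discussed in the thread.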
== 3 of 5 ==
Date: Thurs, Mar 11 2010 7:50 am
From: Peter Zijlstra
On Thu, 2010-03-11 at 16:22 +0100, Oleg Nesterov wrote:
> On 03/11, Oleg Nesterov wrote:
> >
> > How can we fix this later? Perhaps we can change
> > cpuset_track_online_cpus(CPU_DEAD) to scan all affected cpusets and
> > fixup the tasks with the wrong ->cpus_allowed == cpu_possible_mask.
>
> Wait. We need to fix the CPU_DEAD case anyway?
>
> Hmm. 6ad4c18884e864cf4c77f9074d3d1816063f99cd
> "sched: Fix balance vs hotplug race" did s/CPU_DEAD/CPU_DOWN_PREPARE/
> in cpuset_track_online_cpus(). This doesn't look exactly right to me,
> we shouldn't do remove_tasks_in_empty_cpuset() at CPU_DOWN_PREPARE
> stage, it can fail.
Sure, tough luck for those few tasks.
> Otoh. This means that move_task_off_dead_cpu() can never see the
> task without active cpus in ->cpus_allowed, it is called later by
> CPU_DEAD. So, cpuset_lock() is not needed at all.
Right,.. so the whole problem is cpumask ops are terribly expensive
since we got this CONFIG_CPUMASK_OFFSTACK muck, so we try to reduce
these ops in the regular scheduling paths, in the patch you referenced
above the tradeoff was between fixing the sched_domains up too often vs
adding a cpumask_and in a hot-path, guess who won ;-)
== 4 of 5 ==
Date: Thurs, Mar 11 2010 8:30 am
From: Peter Zijlstra
On Thu, 2010-03-11 at 17:19 +0100, Oleg Nesterov wrote:
> > > How can we fix this later? Perhaps we can change
> > > cpuset_track_online_cpus(CPU_DEAD) to scan all affected cpusets and
> > > fixup the tasks with the wrong ->cpus_allowed == cpu_possible_mask.
> >
> > Problem is, we can't really fix up tasks, wakeup must be able to find a
> > suitable cpu.
>
> Yes sure. I meant, wakeup()->select_fallback_rq() sets cpus_allowed =
> cpu_possible_map as we discussed. Then cpuset_track_online_cpus(CPU_DEAD)
> fixes the affected tasks.
Ah, have that re-validate the p->cpus_allowed for all cpuset tasks, ok
that might work.
> > > At first glance this should work in try_to_wake_up(p) case, we can't
> > > race with cpuset_change_cpumask()/etc because of TASK_WAKING logic.
> >
> > Well, cs->cpus_possible can still go funny on us.
>
> What do you mean? Afaics, cpusets always uses set_cpus_allowed() to
> change task->cpus_allowed.
Confusion^2 ;-), I failed to grasp your fixup idea and got confused,
which confused you.
> > > But I am not sure how can we fix move_task_off_dead_cpu(). I think
> > > __migrate_task_irq() itself is fine, but if select_fallback_rq() is
> > > called from move_task_off_dead_cpu() nothing protects ->cpus_allowed.
> >
> > It has that retry loop in case the migration fails, right?
> >
> > > We can race with cpusets, or even with the plain set_cpus_allowed().
> > > Probably nothing really bad can happen, if the resulting cpumask
> > > doesn't have online cpus due to the racing memcpys, we should retry
> > > after __migrate_task_irq() fails. Or we can take cpu_rq(cpu)-lock
> > > around cpumask_copy(p->cpus_allowed, cpu_possible_mask).
> >
> > It does the retry thing.
>
> Yes, I mentioned retry logic too. But it can't always help, even without
> cpusets.
>
> Suppose a task T is bound to the dead CPU, and move_task_off_dead_cpu()
> races with set_cpus_allowed(new_mask). I think it is fine if T gets
> either new_mask or cpu_possible_map in ->cpus_allowed. But, it can get
> a "random" mix if 2 memcpy() run in parallel. And it is possible that
> __migrate_task_irq() will not fail if dest_cpu falls into resulting mask.
Ah indeed. One would almost construct a cpumask_assign that uses RCU
atomic pointer assignment for all this stupid cpumask juggling :/
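
The pointer-assignment idea can be sketched in user space with C11
atomics. This is an illustration only, not a proposed kernel patch: the
names mask_model, task_ptr_model, mask_assign and mask_read are
hypothetical, and the sketch leaves out the part RCU actually provides,
namely deferring the free of the old mask until all readers are done.
In the kernel the corresponding primitives would be rcu_assign_pointer()
and rcu_dereference().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define MASK_WORDS 4   /* assumed mask size for the sketch */

/* Hypothetical multi-word mask, too large to store atomically. */
struct mask_model {
    uint64_t bits[MASK_WORDS];
};

/* The task holds a pointer to an immutable mask, not the mask itself. */
struct task_ptr_model {
    _Atomic(const struct mask_model *) cpus_allowed;
};

/*
 * Writer: publish a fully built mask with a single atomic exchange.
 * Returns the previous mask so the caller can free it once no reader
 * can still hold it (that grace period is not modelled here).
 */
static const struct mask_model *
mask_assign(struct task_ptr_model *p, const struct mask_model *new_mask)
{
    return atomic_exchange_explicit(&p->cpus_allowed, new_mask,
                                    memory_order_acq_rel);
}

/* Reader: sees either the old or the new mask, never a mix of words. */
static const struct mask_model *mask_read(struct task_ptr_model *p)
{
    return atomic_load_explicit(&p->cpus_allowed, memory_order_acquire);
}
```

Because readers only ever load one pointer, two racing writers can
decide which mask wins, but neither can leave a half-copied mask behind.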
> > > @@ -2289,10 +2289,9 @@ static int select_fallback_rq(int cpu, s
> > >
> > > /* No more Mr. Nice Guy. */
> > > if (dest_cpu >= nr_cpu_ids) {
> > > - rcu_read_lock();
> > > - cpuset_cpus_allowed_locked(p, &p->cpus_allowed);
> > > - rcu_read_unlock();
> > > - dest_cpu = cpumask_any_and(cpu_active_mask, &p->cpus_allowed);
> > > + // XXX: take cpu_rq(cpu)->lock ???
> > > + cpumask_copy(&p->cpus_allowed, cpu_possible_mask);
> > > + dest_cpu = cpumask_any(cpu_active_mask);
> >
> >
> > Right, this seems safe.
>
> OK, I'll try to read this code a bit more and then send this patch.
Thanks!
== 5 of 5 ==
Date: Thurs, Mar 11 2010 8:30 am
From: Oleg Nesterov
On 03/11, Peter Zijlstra wrote:
>
> On Thu, 2010-03-11 at 15:52 +0100, Oleg Nesterov wrote:
>
> > Btw, select_fallback_rq() takes
> > rcu_read_lock around cpuset_cpus_allowed_locked(). Why? I must have
> > missed something, but afaics this buys nothing.
>
> for task_cs() iirc.
it should be stable under task_lock()... Never mind.
> > How can we fix this later? Perhaps we can change
> > cpuset_track_online_cpus(CPU_DEAD) to scan all affected cpusets and
> > fixup the tasks with the wrong ->cpus_allowed == cpu_possible_mask.
>
> Problem is, we can't really fix up tasks, wakeup must be able to find a
> suitable cpu.
Yes sure. I meant, wakeup()->select_fallback_rq() sets cpus_allowed =
cpu_possible_map as we discussed. Then cpuset_track_online_cpus(CPU_DEAD)
fixes the affected tasks.
> > At first glance this should work in try_to_wake_up(p) case, we can't
> > race with cpuset_change_cpumask()/etc because of TASK_WAKING logic.
>
> Well, cs->cpus_possible can still go funny on us.
What do you mean? Afaics, cpusets always uses set_cpus_allowed() to
change task->cpus_allowed.
> > But I am not sure how can we fix move_task_off_dead_cpu(). I think
> > __migrate_task_irq() itself is fine, but if select_fallback_rq() is
> > called from move_task_off_dead_cpu() nothing protects ->cpus_allowed.
>
> It has that retry loop in case the migration fails, right?
>
> > We can race with cpusets, or even with the plain set_cpus_allowed().
> > Probably nothing really bad can happen, if the resulting cpumask
> > doesn't have online cpus due to the racing memcpys, we should retry
> > after __migrate_task_irq() fails. Or we can take cpu_rq(cpu)-lock
> > around cpumask_copy(p->cpus_allowed, cpu_possible_mask).
>
> It does the retry thing.
Yes, I mentioned retry logic too. But it can't always help, even without
cpusets.
Suppose a task T is bound to the dead CPU, and move_task_off_dead_cpu()
races with set_cpus_allowed(new_mask). I think it is fine if T gets
either new_mask or cpu_possible_map in ->cpus_allowed. But, it can get
a "random" mix if 2 memcpy() run in parallel. And it is possible that
__migrate_task_irq() will not fail if dest_cpu falls into resulting mask.
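
The "random mix" can be reproduced deterministically in user space by
replaying one fixed interleaving of two word-by-word copies. This is a
sketch under stated assumptions: a two-word mask, and an explicit
last_writer table standing in for the scheduling of the two memcpy()
calls. The names interleaved_copy and masks_equal are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MASK_WORDS 2   /* assumed: a mask wider than one machine word */

/*
 * Replay one interleaving of two unsynchronised copies into dst.
 * last_writer[i] says whose store to word i lands last:
 * 0 for src_a, 1 for src_b.
 */
static void interleaved_copy(uint64_t dst[MASK_WORDS],
                             const uint64_t src_a[MASK_WORDS],
                             const uint64_t src_b[MASK_WORDS],
                             const int last_writer[MASK_WORDS])
{
    for (size_t i = 0; i < MASK_WORDS; i++) {
        assert(last_writer[i] == 0 || last_writer[i] == 1);
        dst[i] = last_writer[i] ? src_b[i] : src_a[i];
    }
}

/* Returns 1 if both masks hold the same bits, 0 otherwise. */
static int masks_equal(const uint64_t x[MASK_WORDS],
                       const uint64_t y[MASK_WORDS])
{
    for (size_t i = 0; i < MASK_WORDS; i++)
        if (x[i] != y[i])
            return 0;
    return 1;
}
```

With src_a = { all ones, all ones } standing in for cpu_possible_map,
src_b = { 0, 1 } for new_mask, and last_writer = { 0, 1 }, the result is
{ all ones, 1 }. It equals neither source, yet it still contains low
cpu numbers, so a dest_cpu picked from them passes the affinity check
and no retry happens.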
> > @@ -2289,10 +2289,9 @@ static int select_fallback_rq(int cpu, s
> >
> > /* No more Mr. Nice Guy. */
> > if (dest_cpu >= nr_cpu_ids) {
> > - rcu_read_lock();
> > - cpuset_cpus_allowed_locked(p, &p->cpus_allowed);
> > - rcu_read_unlock();
> > - dest_cpu = cpumask_any_and(cpu_active_mask, &p->cpus_allowed);
> > + // XXX: take cpu_rq(cpu)->lock ???
> > + cpumask_copy(&p->cpus_allowed, cpu_possible_mask);
> > + dest_cpu = cpumask_any(cpu_active_mask);
>
>
> Right, this seems safe.
OK, I'll try to read this code a bit more and then send this patch.
Oleg.
==============================================================================
TOPIC: bit errors on spitz
http://groups.google.com/group/linux.kernel/t/32584eab7e27d325?hl=en
==============================================================================
== 1 of 1 ==
Date: Thurs, Mar 11 2010 7:50 am
From: Stanislav Brabec
Andy Green wrote:
> I saw very similar failures for a long time on our iMX31 based device.
> Eventually I found a Freescale errata where the RAM inside the USB2
> macrocell started to make single bit errors below 1.38V Vcore; ours was
> 1.4V at that time but dipped on CPU load.
Good tip. It seems that nobody ported driver for the voltage control
chip ISL6271 from 2.4 kernel, and bootloader probably does not set
correct values.
Datasheet:
http://www.penguin.cz/~utx/zaurus/datasheets/power/Xscale/ISL6271.pdf
--
Stanislav Brabec
http://www.penguin.cz/~utx/zaurus
==============================================================================
TOPIC: oprofile, perf, x86: introduce new functions to reserve perfctrs
http://groups.google.com/group/linux.kernel/t/716ea40ac646a344?hl=en
==============================================================================
== 1 of 1 ==
Date: Thurs, Mar 11 2010 7:50 am
From: Robert Richter
On 11.03.10 13:47:16, Ingo Molnar wrote:
> * Peter Zijlstra <peterz@infradead.org> wrote:
> > Alternatively, could we maybe further simplify this reservation into:
> >
> > int reserve_pmu(void);
> > void release_pmu(void);
> >
> > And not bother with anything finer grained.
>
> Yeah, that looks quite a bit simpler.
It does not solve the current problem that some parts of the pmu *are*
used simultaneously by different subsystems. But, even if only perf
would be used in the kernel you still can't be sure that all parts of
the pmu are available to be used, you simply don't have it under your
control. So why accept such limitations when an 'int reserve_pmu(int index)' is
almost the same but provides much more flexibility?
The question already arose if the watchdog would use perf permanently
and thus would block oprofile by making it unusable. The current
reservation code would provide a framework to solve this, sharing
perfctrs with watchdog, perf and oprofile. And, since the pmu gains
more and more features other than perfctrs, why shouldn't it be
possible to run one feature with perf and the other with oprofile?
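
The difference between an all-or-nothing reserve_pmu() and reservation
by index can be sketched with a small bitmap. This is a single-threaded
user-space model under stated assumptions, not the code from the patch
set: NUM_COUNTERS, reserve_counter and release_counter are hypothetical
names, and a real implementation would need locking or atomic bit
operations.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define NUM_COUNTERS 4   /* assumed number of counters in the model */

/* One bit per counter; a set bit means "owned by some subsystem". */
static uint32_t reserved_counters;

/* Reserve one counter. Returns 0, -EINVAL (bad index) or -EBUSY. */
static int reserve_counter(unsigned int index)
{
    if (index >= NUM_COUNTERS)
        return -EINVAL;
    if (reserved_counters & (UINT32_C(1) << index))
        return -EBUSY;
    reserved_counters |= UINT32_C(1) << index;
    return 0;
}

/* Give a counter back so another subsystem can take it. */
static void release_counter(unsigned int index)
{
    assert(index < NUM_COUNTERS);
    reserved_counters &= ~(UINT32_C(1) << index);
}
```

In this model a watchdog could hold counter 0 permanently while a
profiler still reserves counters 1 to 3. A single global reserve_pmu()
would make the second caller fail outright.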
> It's all about making it easier to live with legacies anyway - all modern
> facilities will use perf events to access the PMU.
Scheduling events with perf is also 'some sort' of reservation, so
this code could later be moved into perf entirely. In that case we will
also have to be able to reserve single counters or features by their
index.
For now, I don't think it is possible to change oprofile to use perf
in a big bang. This will disrupt oprofile users. I want to do the
switch to perf in a series of small changes and patch sets to make
sure, oprofile will not break. And this new reservation code is a step
towards perf.
-Robert
--
Advanced Micro Devices, Inc.
Operating System Research Center
email: robert.richter@amd.com
==============================================================================
TOPIC: Squashfs updates for 2.6.34-rc1
http://groups.google.com/group/linux.kernel/t/c4f8fdfe2329a906?hl=en
==============================================================================
== 1 of 1 ==
Date: Thurs, Mar 11 2010 7:50 am
From: Woody Suwalski
Phillip Lougher wrote:
> Hi Linus,
>
> Please consider pulling the following revised Squashfs update for
> 2.6.34-rc1.
> I have removed all the lzma/bunzip2/inflate/lzo code changes (which as
> far as I
> know were the blocking issue previously).
>
> What remains in the pull request is a clean-up/refactoring of the zlib
> wrapper code, outline knowledge of lzma/lzo compressed filesystems
> (unsupported,
> but it gives users an understandable error message when they try to
> mount them),
> and some trivial code tidying.
>
> Please pull.
>
> Thanks
>
> Phillip
>
> -------------
>
> The following changes since commit
> 7284ce6c9f6153d1777df5f310c959724d1bd446:
> Linus Torvalds (1):
> Linux 2.6.33-rc4
>
> are available in the git repository at:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-linus.git
> master
>
> Phillip Lougher (6):
> Squashfs: move zlib decompression wrapper code into a separate file
> Squashfs: factor out remaining zlib dependencies into separate
> wrapper file
> Squashfs: add a decompressor framework
> Squashfs: add decompressor entries for lzma and lzo
> Squashfs: get rid of obsolete variable in struct squashfs_sb_info
> Squashfs: get rid of obsolete definition in header file
>
> fs/squashfs/Makefile | 2 +-
> fs/squashfs/block.c | 76 ++--------------------
> fs/squashfs/cache.c | 1 -
> fs/squashfs/decompressor.c | 68 +++++++++++++++++++
> fs/squashfs/decompressor.h | 55 +++++++++++++++
> fs/squashfs/dir.c | 1 -
> fs/squashfs/export.c | 1 -
> fs/squashfs/file.c | 1 -
> fs/squashfs/fragment.c | 1 -
> fs/squashfs/id.c | 1 -
> fs/squashfs/inode.c | 1 -
> fs/squashfs/namei.c | 1 -
> fs/squashfs/squashfs.h | 8 ++-
> fs/squashfs/squashfs_fs.h | 6 +-
> fs/squashfs/squashfs_fs_sb.h | 40 ++++++------
> fs/squashfs/super.c | 49 +++++++-------
> fs/squashfs/symlink.c | 1 -
> fs/squashfs/zlib_wrapper.c | 150
> ++++++++++++++++++++++++++++++++++++++++++
> 18 files changed, 335 insertions(+), 128 deletions(-)
> create mode 100644 fs/squashfs/decompressor.c
> create mode 100644 fs/squashfs/decompressor.h
> create mode 100644 fs/squashfs/zlib_wrapper.c
Current 2.6.34-rc1 squashfs is broken.
I have a large (500M) squashfs root filesystem, which is mounted from
initramfs.
After mounting I can chroot into it.
With 2.6.34-rc1 it seems that some directories are corrupted - e.g. /lib
may have only files up to libnss (alphabetically), or that some binaries
are not executing - because I see they are all NULLs on read, etc.
Basically - the root is not usable.
Replacing the fs/squashfs source dir with a 2.6.33's one fixes the
problem - so it is the new code issue...
Woody
==============================================================================
TOPIC: Race condition in TTY ldisc layer in 2.6.33
http://groups.google.com/group/linux.kernel/t/0dd856e75c9c2098?hl=en
==============================================================================
== 1 of 4 ==
Date: Thurs, Mar 11 2010 8:00 am
From: Michał Mirosław
On Thu, Mar 11, 2010 at 05:50:09AM -0800, Greg KH wrote:
> On Thu, Mar 11, 2010 at 02:32:22PM +0100, Michał Mirosław wrote:
> > Looks like there's some race condition in switching ldiscs on USB serial
> > ports. The following warnings trigger sometimes after killing and restarting
> > process that changes ldisc and waits forever. In case you want to look at
> > the code, there it is: http://rere.qmqm.pl/~mirq/sermmc/
> If you apply git commit 638b9648ab51c9c549ff5735d3de519ef6199df3 to the
> 2.6.33 kernel, does it solve the issue for you?
I'm running with the patch now. After couple cycles of starting and killing
the ldisc-setting process I get no warnings. I'll get back to you when/if
I encounter them again.
Best Regards,
Michał Mirosław
== 2 of 4 ==
Date: Thurs, Mar 11 2010 8:10 am
From: Michał Mirosław
On Thu, Mar 11, 2010 at 04:50:15PM +0100, Michał Mirosław wrote:
> On Thu, Mar 11, 2010 at 05:50:09AM -0800, Greg KH wrote:
> > On Thu, Mar 11, 2010 at 02:32:22PM +0100, Michał Mirosław wrote:
> > > Looks like there's some race condition in switching ldiscs on USB serial
> > > ports. The following warnings trigger sometimes after killing and restarting
> > > process that changes ldisc and waits forever. In case you want to look at
> > > the code, there it is: http://rere.qmqm.pl/~mirq/sermmc/
> > If you apply git commit 638b9648ab51c9c549ff5735d3de519ef6199df3 to the
> > 2.6.33 kernel, does it solve the issue for you?
> I'm running with the patch now. After couple cycles of starting and killing
> the ldisc-setting process I get no warnings. I'll get back to you when/if
> I encounter them again.
Hah. Just seconds after I sent that mail I hit this again. The stack traces
are exactly the same (except for different starting address of
flush_to_ldisc()).
The warnings are triggered by:
struct usb_serial_port *port = tty->driver_data;
...
/* count is managed under the mutex lock for the tty so cannot
drop to zero until after the last close completes */
WARN_ON(!port->port.count);
BTW, Couple of minutes earlier I got this message:
[ 201.629616] cp210x ttyUSB0: usb_serial_generic_resubmit_read_urb - failed resubmitting read urb, error -1
But after that I disconnected and reconnected the device, so this is probably
not relevant here.
Best Regards,
Michał Mirosław
== 3 of 4 ==
Date: Thurs, Mar 11 2010 8:30 am
From: Alan Stern
On Thu, 11 Mar 2010, Michał Mirosław wrote:
> On Thu, Mar 11, 2010 at 04:50:15PM +0100, Michał Mirosław wrote:
> > On Thu, Mar 11, 2010 at 05:50:09AM -0800, Greg KH wrote:
> > > On Thu, Mar 11, 2010 at 02:32:22PM +0100, Michał Mirosław wrote:
> > > > Looks like there's some race condition in switching ldiscs on USB serial
> > > > ports. The following warnings trigger sometimes after killing and restarting
> > > > process that changes ldisc and waits forever. In case you want to look at
> > > > the code, there it is: http://rere.qmqm.pl/~mirq/sermmc/
> > > If you apply git commit 638b9648ab51c9c549ff5735d3de519ef6199df3 to the
> > > 2.6.33 kernel, does it solve the issue for you?
> > I'm running with the patch now. After couple cycles of starting and killing
> > the ldisc-setting process I get no warnings. I'll get back to you when/if
> > I encounter them again.
>
> Hah. Just seconds after I sent that mail I hit this again. The stack traces
> are exactly the same (except for different starting address of
> flush_to_ldisc()).
>
> The warnings are triggered by:
>
> struct usb_serial_port *port = tty->driver_data;
> ...
> /* count is managed under the mutex lock for the tty so cannot
> drop to zero until after the last close completes */
> WARN_ON(!port->port.count);
>
> BTW, Couple of minutes earlier I got this message:
>
> [ 201.629616] cp210x ttyUSB0: usb_serial_generic_resubmit_read_urb - failed resubmitting read urb, error -1
>
> But after that I disconnected and reconnected the device, so this is probably
> not relevant here.
It looks like Greg gave you the wrong commit ID. You should use this
one: 49d3380e3f1297ff7bdc700c0a7fe6c3a036b3ab. It removes those
WARN_ON statements entirely.
Actually, you should use both commits.
Alan Stern
== 4 of 4 ==
Date: Thurs, Mar 11 2010 8:30 am
From: Johan Hovold
On Thu, Mar 11, 2010 at 05:06:53PM +0100, Michał Mirosław wrote:
> On Thu, Mar 11, 2010 at 04:50:15PM +0100, Michał Mirosław wrote:
> > On Thu, Mar 11, 2010 at 05:50:09AM -0800, Greg KH wrote:
> > > On Thu, Mar 11, 2010 at 02:32:22PM +0100, Michał Mirosław wrote:
> > > > Looks like there's some race condition in switching ldiscs on USB serial
> > > > ports. The following warnings trigger sometimes after killing and restarting
> > > > process that changes ldisc and waits forever. In case you want to look at
> > > > the code, there it is: http://rere.qmqm.pl/~mirq/sermmc/
> > > If you apply git commit 638b9648ab51c9c549ff5735d3de519ef6199df3 to the
> > > 2.6.33 kernel, does it solve the issue for you?
> > I'm running with the patch now. After couple cycles of starting and killing
> > the ldisc-setting process I get no warnings. I'll get back to you when/if
> > I encounter them again.
>
> Hah. Just seconds after I sent that mail I hit this again. The stack traces
> are exactly the same (except for different starting address of
> flush_to_ldisc()).
>
> The warnings are triggered by:
>
> struct usb_serial_port *port = tty->driver_data;
> ...
> /* count is managed under the mutex lock for the tty so cannot
> drop to zero until after the last close completes */
> WARN_ON(!port->port.count);
This is a false warning that was fixed by commit
49d3380e3f1297ff7bdc700c0a7fe6c3a036b3ab.
> BTW, Couple of minutes earlier I got this message:
>
> [ 201.629616] cp210x ttyUSB0: usb_serial_generic_resubmit_read_urb - failed resubmitting read urb, error -1
>
> But after that I disconnected and reconnected the device, so this is probably
> not relevant here.
You're right, it's unrelated. I submitted a patch a while ago that
removes this message (as it is not an error):
http://thread.gmane.org/gmane.linux.usb.general/28047/focus=28052
Regards,
Johan
==============================================================================
TOPIC: ATA 4 KiB sector issues.
http://groups.google.com/group/linux.kernel/t/94d9b232ec44429a?hl=en
==============================================================================
== 1 of 4 ==
Date: Thurs, Mar 11 2010 8:10 am
From: Mike Snitzer
On Thu, Mar 11, 2010 at 10:00 AM, Nikanth Karthikesan <knikanth@suse.de> wrote:
> On Thursday 11 March 2010 19:58:11 Theodore Tso wrote:
>> On Mar 11, 2010, at 8:57 AM, Nikanth Karthikesan wrote:
>> > I guess, what he meant was, to keep filesystem blocks aligned, even if
>> > the partition is not. Say if the partition is mis-aligned by 512-bytes,
>> > let the filesystem waste 4k-512bytes and keep its blocks aligned. But it
>> > might be a case of over-engineering, possibly requiring disk format
>> > change.
>>
>> Ah, yes, I agree with you; that's probably what he meant.
>>
>> Sure, that's theoretically possible, but it would mean changing every
>> single filesystem, and it would require a file system format change --- or
>> at least a file system format extension.
>>
>> It would seem to be way easier to simply fix the partitioning tools to do
>> the right thing, though.
>>
>
> Yes. Maybe, just a simple but transparent device-mapper like mapping on top
> of the mis-aligned partition, to do the alignment. Then the file-system code
> need not change much.
>
> But Linux already has device-mapper and Linux will not be affected with mis-
> aligned partitions, when we use LVM.
Well, device-mapper and LVM needed to be updated to make them "just
work" but yes that work has been done.
> But the actual problem here is that partitioning tools might create partitions
> that won't allow other operating-systems to boot. So it might be enough, if the
> partitioning tools just create partitions with (mis-)alignment requirement for
> Windows.
I'm not following...
Anyway, 4K drives that are 512b logical and 4K physical may or may not
also have "DOS partition compensation" that uses LBA -1 as the first
naturally (4K) aligned start. This means that the partition tools
need to shift the start of the first primary partition to be offset by
3584 bytes (7 512b sectors) for use with Linux. But for Windows,
AFAIK Windows XP and Windows 7 create all partitions aligned on 1MB
boundaries. Linux's parted and fdisk create 1MB aligned partitions
now too.
So the only outliers are older versions of Windows (< XP) and Linux (old
fdisk, parted, etc. also use DOS partitioning) that don't use
naturally aligned (e.g. 1MB) partition boundaries. In those versions
of Windows and Linux there are ways to change the default start of
sector 63. That said, there is an opportunity to improve
documentation for how to work around DOS partitioning on these
operating systems.
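As a rough illustration of the offset arithmetic above, the following C sketch computes how far a partition start sits from the physical-block grid. The function name, the 512-byte logical sector constant, and the modelling of the drive's compensation as a single byte offset are assumptions made for this example; they are not taken from the kernel or from any partitioning tool.

```c
#include <assert.h>
#include <stdint.h>

#define LOGICAL_SECTOR 512u   /* assumed logical sector size in bytes */

/*
 * Return how many bytes past a physical-block boundary a partition
 * starting at start_lba lies. 0 means the start is naturally aligned.
 *
 * alignment_offset models (as an assumption) the byte position of the
 * first physical-block boundary relative to LBA 0, e.g. 3584 on a drive
 * that compensates for a sector-63 start, 0 on a drive that does not.
 */
static uint64_t misalignment_bytes(uint64_t start_lba,
                                   uint32_t physical_block,
                                   uint32_t alignment_offset)
{
    assert(physical_block != 0 && physical_block % LOGICAL_SECTOR == 0);

    uint64_t start_bytes = start_lba * LOGICAL_SECTOR;

    /* Subtract the offset modulo the block size without going negative. */
    return (start_bytes + physical_block - alignment_offset % physical_block)
           % physical_block;
}
```

With a 4096-byte physical block and no compensation, `misalignment_bytes(63, 4096, 0)` gives 3584, while a 1MB start (LBA 2048) gives 0. On a drive modelled with a 3584-byte offset the results flip: LBA 63 is aligned and LBA 2048 is off by 512 bytes.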
One other piece worth mentioning on this "IO Topology" support in the
entire Linux I/O Stack is the virt layers. hch has already extended
the virt-io protocol and qemu is in the finishing stages of being
updated to properly consume the "IO Topology" information. So we
really don't have any gaps in the Linux I/O stack.
mkp in particular, Jens, James, myself, and others implemented and
refined the SCSI and block changes. kzak, jim meyering, hans de
goede, hch, eric sandeen, bob peterson, myself and others updated all
other I/O stack layers ranging from DM to LVM, libblkid, fdisk, parted
to anaconda to mkfs.ext[234], mkfs.xfs, mkfs.gfs2 to virt-io and qemu.
FYI, all of these advances will be in Fedora 13 (quite a few are
already in Fedora 12).
There are obviously other Linux systems and userland tools (likely
Xen, other mkfs.* and more) that should be updated. Hopefully
maintainers and/or contributors of these projects will follow-up to
address those that need updating.
Again please see:
http://oss.oracle.com/~mkp/docs/linux-advanced-storage.pdf
http://people.redhat.com/msnitzer/docs/io-limits.txt
Some omissions include: Linux MD, which has been updated as mkp
pointed out, and I neglected to talk about virt-io and qemu (but like
I said they have been updated too).
Hopefully we're all closer to being on the same page now.
Mike
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
== 2 of 4 ==
Date: Thurs, Mar 11 2010 8:30 am
From: Gene Heskett
On Thursday 11 March 2010, tytso@mit.edu wrote:
>On Thu, Mar 11, 2010 at 08:35:26PM +0530, Nikanth Karthikesan wrote:
>> The real problem, here is just that partitioning-tools should create
>> partitions that can work with both XP as well as Windows7. May be distro
>> installers, should ask the user which compatibility he needs.
>
>4k aligned sectors will *work* with Windows XP, will it not? It's
>just simply a matter of Windows XP, being really ancient, doesn't
>create properly aligned partitions by default.
>
>And how often are we going to see Windows XP systems with these new 4k
>physical sector drives anyway, where the first OS to touch the
>partition is Windows XP? And in the case where this does happen, the
>resulting partition will result in terrible performance for Windows
>XP as well as Linux.
>
>What's the specific scenario which you are trying to solve, and how
>likely is it to occur in real life?
And potentially one more question from a list lurker, Ted. Where are the
tools that allow us to check and/or adjust that? I ask since I have 3 of
these terabyte drives in this box now and have no clue how to either check
to see if they're off, or how to fix them if they are. And I have called myself
following this discussion without noting if the tools have been specifically
named.
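One way to check, sketched here under assumptions: kernels that carry the I/O topology work export the limits through sysfs, so a small program can compare a partition's start sector against the physical block size. The device names in the usage note (`sda`, `sda1`) are placeholders, the helper names are made up for this example, and the check ignores any drive-level alignment offset.

```c
#include <assert.h>
#include <stdio.h>

/* Read one unsigned decimal value from a sysfs-style text file.
 * Returns 0 on success, -1 if the file is missing or unparsable. */
static int read_u64_attr(const char *path, unsigned long long *value)
{
    assert(path != NULL && value != NULL);

    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    int ok = fscanf(f, "%llu", value) == 1;
    fclose(f);
    return ok ? 0 : -1;
}

/* Returns 1 if the partition start is a multiple of the physical block
 * size, 0 if it is not, and -1 if either attribute cannot be read.
 * The start attribute is expressed in 512-byte sectors. */
static int partition_is_aligned(const char *start_path, const char *pbs_path)
{
    unsigned long long start, pbs;

    if (read_u64_attr(start_path, &start) != 0 ||
        read_u64_attr(pbs_path, &pbs) != 0 || pbs == 0)
        return -1;
    return (start * 512ULL) % pbs == 0;
}
```

A call such as `partition_is_aligned("/sys/block/sda/sda1/start", "/sys/block/sda/queue/physical_block_size")` would then report whether the first partition on `sda` starts on a physical-block boundary.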
Thanks
--
Cheers, Gene
"There are four boxes to be used in defense of liberty:
soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
Authors are easy to get on with -- if you're fond of children.
-- Michael Joseph, "Observer"
== 3 of 4 ==
Date: Thurs, Mar 11 2010 8:40 am
From: "H. Peter Anvin"
On 03/11/2010 05:57 AM, Nikanth Karthikesan wrote:
>
> I guess, what he meant was, to keep filesystem blocks aligned, even if the
> partition is not. Say if the partition is mis-aligned by 512-bytes, let the
> filesystem waste 4k-512bytes and keep its blocks aligned. But it might be a
> case of over-engineering, possibly requiring disk format change.
>
That's basically what you end up having to do for FAT filesystems to be
aligned.
-hpa
== 4 of 4 ==
Date: Thurs, Mar 11 2010 8:40 am
From: Greg Freemyer
On Thu, Mar 11, 2010 at 10:25 AM, <tytso@mit.edu> wrote:
> On Thu, Mar 11, 2010 at 08:35:26PM +0530, Nikanth Karthikesan wrote:
>> The real problem, here is just that partitioning-tools should create
>> partitions that can work with both XP as well as Windows7. May be distro
>> installers, should ask the user which compatibility he needs.
>
> 4k aligned sectors will *work* with Windows XP, will it not? It's
> just simply a matter of Windows XP, being really ancient, doesn't
> create properly aligned partitions by default.
>
> And how often are we going to see Windows XP systems with these new 4k
> physical sector drives anyway, where the first OS to touch the
> partition is Windows XP? And in the case where this does happen, the
> resulting partition will result in terrible performance for Windows
> XP as well as Linux.
>
> What's the specific scenario which you are trying to solve, and how
> likely is it to occur in real life?
>
> - Ted
Ted,
Apparently the real issue is Win2K, not XP.
It seems to require that the boot partition, and possibly all partitions,
start on a cylinder boundary. It may also have 255/63 hard-coded to
define what a cylinder is. I agree with the apparent consensus that a
2010-era Linux partitioner does not need to be Win2K compatible. If
someone wants to install Win2K they will need to either use an older
generation partitioner to create the partitions or use specific
command-line args to force a non-optimal alignment.
I do think the Linux partitioners should provide a way to force a
cylinder alignment. Tejun, I would like to see your doc describe how
to force a Win2K-compatible partition layout.
FYI: The same issue apparently also exists for users still running OS/2.
Greg
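To make the conflict between cylinder alignment and 4K alignment concrete, here is a small C sketch. It assumes the 255-head, 63-sector geometry mentioned above and 512-byte logical sectors; the function names are invented for the example and do not correspond to any partitioner's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define HEADS                255u
#define SECTORS_PER_TRACK    63u
#define SECTORS_PER_CYLINDER (HEADS * SECTORS_PER_TRACK)   /* 16065 */
#define SECTORS_PER_4K       8u                            /* 4096 / 512 */

static bool on_cylinder_boundary(uint64_t lba)
{
    return lba % SECTORS_PER_CYLINDER == 0;
}

static bool on_4k_boundary(uint64_t lba)
{
    return lba % SECTORS_PER_4K == 0;
}

/* First LBA at or after min_lba that is both a cylinder boundary and
 * 4 KiB aligned. Because 16065 is odd, only every 8th cylinder works. */
static uint64_t next_cylinder_and_4k_aligned(uint64_t min_lba)
{
    /* Guard against overflow in the round-up and the search below. */
    assert(min_lba <= UINT64_MAX - 16u * SECTORS_PER_CYLINDER);

    uint64_t cyl = (min_lba + SECTORS_PER_CYLINDER - 1) / SECTORS_PER_CYLINDER;
    while (!on_4k_boundary(cyl * SECTORS_PER_CYLINDER))
        cyl++;
    return cyl * SECTORS_PER_CYLINDER;
}
```

Under these assumptions the first cylinder boundary (LBA 16065) is not 4K aligned, and the first non-zero LBA that satisfies both constraints is 128520, the start of cylinder 8. A partitioner forced into cylinder alignment therefore gives up 4K alignment for most partition starts.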
==============================================================================
TOPIC: aio: compat_ioctl issue?
http://groups.google.com/group/linux.kernel/t/b4bbeb78a34e8be1?hl=en
==============================================================================
== 1 of 1 ==
Date: Thurs, Mar 11 2010 8:10 am
From: Jeff Moyer
Michael Tokarev <mjt@tls.msk.ru> writes:
> Michael Tokarev wrote:
>> Jeff Moyer wrote:
>> []
>>>>>> I just came across a situation (next in a long row :)
>>>>>> when on x86, 32bit userspace does not work with 64bit
>>>>>> kernel. This time this is about aio requests.
>>> [snip]
> []
>>> Could you maybe print out the values that are passed to io_getevents?
>>
>> They were in my first email, here it goes again:
>>
>> io_submit: lio_opcode=7 reqprio=0 iov=0x9cd7018{0xf5599000,4096}, niov=1, offset=0
>> io_getevents: expected 4096 got -22 (EINVAL)
>>
>> This is what gets passed to libaio -- strace here
>> does not decode the arguments unfortunately.
> []
>> My *guess* is that it handles read/write correctly but
>> does not properly handle preadv/pwritev (opcode=7 is
>> IO_CMD_PREADV as far as I can see). That'll explain
>> my "testcase" with Oracle which does not use preadv.
>
> Actually, looking at the code in fs/compat.c, I don't see
> where it converts iovecs. Yes it converts iocbs, but for
> readv/writev it also needs to convert iovecs. Oh well,
> that looks to be quite painful... :(
Yeah, whoops. I built the libaio test harness using -m32 and this patch
works for me. Would you mind giving it a try?
Thanks,
Jeff
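The layout mismatch discussed above can be sketched in user space. A 32-bit iovec is two 32-bit fields (8 bytes) while a 64-bit one is two 64-bit fields (16 bytes), so each entry has to be widened individually rather than passed through. The struct names `iovec32` and `iovec64`, the function `widen_iovecs`, and the `MAX_SEGS` limit (a stand-in for the kernel's UIO_MAXIOV) are hypothetical and mirror the kernel types only in field widths; this is an illustration, not the code from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SEGS 1024u   /* stand-in for the kernel's UIO_MAXIOV */

/* Hypothetical mirror of a 32-bit task's iovec: 8 bytes per entry. */
struct iovec32 {
    uint32_t iov_base;   /* 32-bit user pointer */
    uint32_t iov_len;    /* 32-bit length */
};

/* Hypothetical mirror of the 64-bit layout: 16 bytes per entry. */
struct iovec64 {
    uint64_t iov_base;
    uint64_t iov_len;
};

/* Widen n 32-bit entries into out[] and return the total byte count,
 * or -1 if there are more segments than MAX_SEGS. */
static int64_t widen_iovecs(const struct iovec32 *in, size_t n,
                            struct iovec64 *out)
{
    int64_t total = 0;

    assert(in != NULL && out != NULL);
    if (n > MAX_SEGS)
        return -1;
    for (size_t i = 0; i < n; i++) {
        out[i].iov_base = in[i].iov_base;   /* zero-extend the pointer */
        out[i].iov_len = in[i].iov_len;     /* zero-extend the length */
        total += in[i].iov_len;
    }
    return total;
}
```

Feeding it the single entry from the report, base 0xf5599000 and length 4096, yields one 16-byte entry with the same base and length and a total of 4096, which is the byte count the test program expected from io_getevents.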
diff --git a/fs/aio.c b/fs/aio.c
index 1cf12b3..5a38805 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -36,6 +36,7 @@
#include <linux/blkdev.h>
#include <linux/mempool.h>
#include <linux/hash.h>
+#include <linux/compat.h>
#include <asm/kmap_types.h>
#include <asm/uaccess.h>
@@ -1384,13 +1385,20 @@ static ssize_t aio_fsync(struct kiocb *iocb)
return ret;
}
-static ssize_t aio_setup_vectored_rw(int type, struct kiocb *kiocb)
+static ssize_t aio_setup_vectored_rw(int type, struct kiocb *kiocb, bool compat)
{
ssize_t ret;
- ret = rw_copy_check_uvector(type, (struct iovec __user *)kiocb->ki_buf,
- kiocb->ki_nbytes, 1,
- &kiocb->ki_inline_vec, &kiocb->ki_iovec);
+ if (compat)
+ ret = compat_rw_copy_check_uvector(type,
+ (struct compat_iovec __user *)kiocb->ki_buf,
+ kiocb->ki_nbytes, 1, &kiocb->ki_inline_vec,
+ &kiocb->ki_iovec);
+ else
+ ret = rw_copy_check_uvector(type,
+ (struct iovec __user *)kiocb->ki_buf,
+ kiocb->ki_nbytes, 1, &kiocb->ki_inline_vec,
+ &kiocb->ki_iovec);
if (ret < 0)
goto out;
@@ -1420,7 +1428,7 @@ static ssize_t aio_setup_single_vector(struct kiocb *kiocb)
* Performs the initial checks and aio retry method
* setup for the kiocb at the time of io submission.
*/
-static ssize_t aio_setup_iocb(struct kiocb *kiocb)
+static ssize_t aio_setup_iocb(struct kiocb *kiocb, bool compat)
{
struct file *file = kiocb->ki_filp;
ssize_t ret = 0;
@@ -1469,7 +1477,7 @@ static ssize_t aio_setup_iocb(struct kiocb *kiocb)
ret = security_file_permission(file, MAY_READ);
if (unlikely(ret))
break;
- ret = aio_setup_vectored_rw(READ, kiocb);
+ ret = aio_setup_vectored_rw(READ, kiocb, compat);
if (ret)
break;
ret = -EINVAL;
@@ -1483,7 +1491,7 @@ static ssize_t aio_setup_iocb(struct kiocb *kiocb)
ret = security_file_permission(file, MAY_WRITE);
if (unlikely(ret))
break;
- ret = aio_setup_vectored_rw(WRITE, kiocb);
+ ret = aio_setup_vectored_rw(WRITE, kiocb, compat);
if (ret)
break;
ret = -EINVAL;
@@ -1548,7 +1556,8 @@ static void aio_batch_free(struct hlist_head *batch_hash)
}
static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
- struct iocb *iocb, struct hlist_head *batch_hash)
+ struct iocb *iocb, struct hlist_head *batch_hash,
+ bool compat)
{
struct kiocb *req;
struct file *file;
@@ -1609,7 +1618,7 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
req->ki_left = req->ki_nbytes = iocb->aio_nbytes;
req->ki_opcode = iocb->aio_lio_opcode;
- ret = aio_setup_iocb(req);
+ ret = aio_setup_iocb(req, compat);
if (ret)
goto out_put_req;
@@ -1637,20 +1646,8 @@ out_put_req:
return ret;
}
-/* sys_io_submit:
- * Queue the nr iocbs pointed to by iocbpp for processing. Returns
- * the number of iocbs queued. May return -EINVAL if the aio_context
- * specified by ctx_id is invalid, if nr is < 0, if the iocb at
- * *iocbpp[0] is not properly initialized, if the operation specified
- * is invalid for the file descriptor in the iocb. May fail with
- * -EFAULT if any of the data structures point to invalid data. May
- * fail with -EBADF if the file descriptor specified in the first
- * iocb is invalid. May fail with -EAGAIN if insufficient resources
- * are available to queue any iocbs. Will return 0 if nr is 0. Will
- * fail with -ENOSYS if not implemented.
- */
-SYSCALL_DEFINE3(io_submit, aio_context_t, ctx_id, long, nr,
- struct iocb __user * __user *, iocbpp)
+long do_io_submit(aio_context_t ctx_id, long nr,
+ struct iocb __user *__user * iocbpp, bool compat)
{
struct kioctx *ctx;
long ret = 0;
@@ -1687,7 +1684,7 @@ SYSCALL_DEFINE3(io_submit, aio_context_t, ctx_id, long, nr,
break;
}
- ret = io_submit_one(ctx, user_iocb, &tmp, batch_hash);
+ ret = io_submit_one(ctx, user_iocb, &tmp, batch_hash, compat);
if (ret)
break;
}
@@ -1696,6 +1693,24 @@ SYSCALL_DEFINE3(io_submit, aio_context_t, ctx_id, long, nr,
put_ioctx(ctx);
return i ? i : ret;
}
+
+/* sys_io_submit:
+ * Queue the nr iocbs pointed to by iocbpp for processing. Returns
+ * the number of iocbs queued. May return -EINVAL if the aio_context
+ * specified by ctx_id is invalid, if nr is < 0, if the iocb at
+ * *iocbpp[0] is not properly initialized, if the operation specified
+ * is invalid for the file descriptor in the iocb. May fail with
+ * -EFAULT if any of the data structures point to invalid data. May
+ * fail with -EBADF if the file descriptor specified in the first
+ * iocb is invalid. May fail with -EAGAIN if insufficient resources
+ * are available to queue any iocbs. Will return 0 if nr is 0. Will
+ * fail with -ENOSYS if not implemented.
+ */
+SYSCALL_DEFINE3(io_submit, aio_context_t, ctx_id, long, nr,
+ struct iocb __user * __user *, iocbpp)
+{
+ return do_io_submit(ctx_id, nr, iocbpp, 0);
+}
/* lookup_kiocb
* Finds a given iocb for cancellation.
diff --git a/fs/compat.c b/fs/compat.c
index 00d90c2..340f20d 100644
--- a/fs/compat.c
+++ b/fs/compat.c
@@ -567,6 +567,61 @@ out:
return ret;
}
+/* A write operation does a read from user space and vice versa */
+#define vrfy_dir(type) ((type) == READ ? VERIFY_WRITE : VERIFY_READ)
+
+ssize_t compat_rw_copy_check_uvector(int type,
+ const struct compat_iovec __user *uiov32, unsigned long niov,
+ unsigned long fast_segs, struct iovec *fast_pointer,
+ struct iovec **ret_pointer)
+{
+ ssize_t tot_len = 0;
+ struct iovec *kiov = fast_pointer;
+
+ if (niov == 0)
+ goto out;
+ if (niov > UIO_MAXIOV) {
+ tot_len = -EINVAL;
+ goto out;
+ }
+ if (niov > fast_segs) {
+ kiov = kmalloc(niov*sizeof(struct iovec), GFP_KERNEL);
+ if (kiov == NULL) {
+ tot_len = -ENOMEM;
+ goto out;
+ }
+ *ret_pointer = kiov;
+ }
+
+ while (niov > 0) {
+ compat_uptr_t buf;
+ compat_size_t len;
+
+ if (get_user(len, &uiov32->iov_len) ||
+ get_user(buf, &uiov32->iov_base)) {
+ tot_len = -EFAULT;
+ goto out;
+ }
+ if (len < 0 || (tot_len + len < tot_len)) {
+ tot_len = -EINVAL;
+ goto out;
+ }
+ if (!access_ok(vrfy_dir(type), buf, len)) {
+ tot_len = -EFAULT;
+ goto out;
+ }
+ tot_len += len;
+ kiov->iov_base = compat_ptr(buf);
+ kiov->iov_len = (__kernel_size_t) len;
+ uiov32++;
+ kiov++;
+ niov--;
+ }
+
+out:
+ return tot_len;
+}
+
static inline long
copy_iocb(long nr, u32 __user *ptr32, struct iocb __user * __user *ptr64)
{
@@ -599,7 +654,7 @@ compat_sys_io_submit(aio_context_t ctx_id, int nr, u32 __user *iocb)
iocb64 = compat_alloc_user_space(nr * sizeof(*iocb64));
ret = copy_iocb(nr, iocb, iocb64);
if (!ret)
- ret = sys_io_submit(ctx_id, nr, iocb64);
+ ret = do_io_submit(ctx_id, nr, iocb64, 1);
return ret;
}
diff --git a/include/linux/aio.h b/include/linux/aio.h
index 811dbb3..54b6ef8 100644
--- a/include/linux/aio.h
+++ b/include/linux/aio.h
@@ -212,6 +212,8 @@ extern void kick_iocb(struct kiocb *iocb);
extern int aio_complete(struct kiocb *iocb, long res, long res2);
struct mm_struct;
extern void exit_aio(struct mm_struct *mm);
+extern long do_io_submit(aio_context_t ctx_id, long nr,
+ struct iocb __user *__user * iocbpp, bool compat);
#else
static inline ssize_t wait_on_sync_kiocb(struct kiocb *iocb) { return 0; }
static inline int aio_put_req(struct kiocb *iocb) { return 0; }
@@ -219,6 +221,9 @@ static inline void kick_iocb(struct kiocb *iocb) { }
static inline int aio_complete(struct kiocb *iocb, long res, long res2) { return 0; }
struct mm_struct;
static inline void exit_aio(struct mm_struct *mm) { }
+static inline long do_io_submit(aio_context_t ctx_id, long nr,
+ struct iocb __user * __user * iocbpp,
+ bool compat) { return 0; }
