twitter: linux.kernel - 26 new messages in 16 topics

linux.kernel
http://groups.google.com/group/linux.kernel?hl=en

Today's topics:

* [RFC] futex: FUTEX_LOCK with optional adaptive spinning - 5 messages, 3
authors
http://groups.google.com/group/linux.kernel/t/7e8797cb4a1b598b?hl=en
* combine nmi_watchdog and softlockup - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/e11a086865962e7a?hl=en
* mm,migration: Allow the migration of PageSwapCache pages - 1 messages, 1
author
http://groups.google.com/group/linux.kernel/t/90e8b21b6e197190?hl=en
* ntfs: use add_to_page_cache_lru - 2 messages, 2 authors
http://groups.google.com/group/linux.kernel/t/39a26d0d748e53ee?hl=en
* Btrfs updates - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/b82fa5a8ba9bce86?hl=en
* hackbench regression due to commit 9dfc6e68bfe6e - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/234a5e1140b9914e?hl=en
* Ugly rmap NULL ptr deref oopsie on hibernate (was Linux 2.6.34-rc3) - 5
messages, 3 authors
http://groups.google.com/group/linux.kernel/t/eec470a501781823?hl=en
* x86: let 'reservetop' functioning right - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/eaaf1a980d8f3f01?hl=en
* perf, x86: Enable Nehalem-EX support - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/484d02ae823cd092?hl=en
* Extended partition mapping wrong size - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/acd203ae293d289b?hl=en
* libertas/sdio: 8686: set ECSI bit for 1-bit transfers - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/207b2568b9561ec6?hl=en
* OMAP: Fix for bus width which improves SD card's peformance. - 2 messages, 2
authors
http://groups.google.com/group/linux.kernel/t/b7a61b5d5d3b2476?hl=en
* NFS: Fix RCU warnings in nfs_inode_return_delegation_noreclaim() [ver #2] -
1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/efc3d1e6a49a4cbe?hl=en
* An incorrect assumption over radix_tree_tag_get() - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/bcc75fb93cd52d92?hl=en
* regulator: regulator_get behaviour without CONFIG_REGULATOR set - 1 messages,
1 author
http://groups.google.com/group/linux.kernel/t/30191bf6a382ba77?hl=en
* VMware Balloon driver - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/bcacb0fe945d573b?hl=en

==============================================================================
TOPIC: [RFC] futex: FUTEX_LOCK with optional adaptive spinning
http://groups.google.com/group/linux.kernel/t/7e8797cb4a1b598b?hl=en
==============================================================================

== 1 of 5 ==
Date: Tues, Apr 6 2010 8:40 am
From: Peter Zijlstra

On Tue, 2010-04-06 at 08:33 -0700, Darren Hart wrote:
> Peter Zijlstra wrote:
> > On Tue, 2010-04-06 at 07:47 -0700, Ulrich Drepper wrote:
> >> On Tue, Apr 6, 2010 at 01:48, Peter Zijlstra <peterz@infradead.org>
> >> wrote:
> >>> try
> >>> spin
> >>> try
> >>> syscall
> >> This is available for a long time in the mutex implementation
> >> (PTHREAD_MUTEX_ADAPTIVE_NP mutex type). It hasn't shown much
> >> improvement, if any. There were some people demanding this support, but
> >> as far as I know they are not using it now. This is adaptive
> >> spinning, learning from previous calls how long to wait. But it's
> >> still unguided. There is no way to get information like "the owner
> >> has been descheduled".
> >
> > That's where the FUTEX_LOCK thing comes in, it does all those, the above
> > was a single spin loop to amortize the syscall overhead.
> >
> > I wouldn't make it any more complex than a single pause ins, syscalls
> > are terribly cheap these days.
>
> And yet they still seem to have a real impact on the futex_lock
> benchmark. Perhaps I am just still looking at pathological cases, but
> there is a strong correlation between high syscall counts and really low
> iterations per second. Granted this also correlates with lock
> contention. However, when using the same period and duty-cycle I find
> that a locking mechanism that makes significantly fewer syscalls also
> significantly outperforms one that makes more. Kind of handwavy still,
> I'll have more numbers this afternoon.

Sure, but I'm still not sure why FUTEX_LOCK ends up making more syscalls
than FUTEX_WAIT based locking. Both should only do the syscall when the
lock is contended, both should only ever do 1 syscall per acquire,
right?
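The "try, spin, try, syscall" sequence quoted above can be sketched in userspace on top of the existing FUTEX_WAIT/FUTEX_WAKE operations. This is a minimal sketch, not the proposed FUTEX_LOCK interface: the three-state lock word follows the well-known FUTEX_WAIT-based mutex design (0 = free, 1 = held, 2 = held with possible sleepers), and SPIN_TRIES is an arbitrary, hypothetical spin budget.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Lock word: 0 = free, 1 = held, 2 = held and waiters may be asleep. */
typedef struct { atomic_int word; } spin_futex_lock;

#define SPIN_TRIES 100 /* hypothetical spin budget, not a tuned value */

static long futex(atomic_int *uaddr, int op, int val)
{
    return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
}

static void cpu_relax(void)
{
#if defined(__x86_64__) || defined(__i386__)
    __builtin_ia32_pause(); /* the single "pause" instruction per iteration */
#endif
}

static int try_lock(spin_futex_lock *l)
{
    int expected = 0;
    return atomic_compare_exchange_strong(&l->word, &expected, 1);
}

static void sf_lock(spin_futex_lock *l)
{
    if (try_lock(l))                        /* try */
        return;
    for (int i = 0; i < SPIN_TRIES; i++) {  /* spin, then try again */
        cpu_relax();
        if (atomic_load_explicit(&l->word, memory_order_relaxed) == 0 &&
            try_lock(l))
            return;
    }
    /* syscall: mark the lock contended and sleep until it is released */
    while (atomic_exchange(&l->word, 2) != 0)
        futex(&l->word, FUTEX_WAIT_PRIVATE, 2);
}

static void sf_unlock(spin_futex_lock *l)
{
    /* Enter the kernel only if a waiter may be sleeping. */
    if (atomic_exchange(&l->word, 0) == 2)
        futex(&l->word, FUTEX_WAKE_PRIVATE, 1);
}
```

In this sketch neither side enters the kernel when the lock is uncontended: sf_lock() reaches FUTEX_WAIT only after the spin fails, and sf_unlock() issues FUTEX_WAKE only when the lock word says a waiter may be sleeping.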

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

== 2 of 5 ==
Date: Tues, Apr 6 2010 9:10 am
From: Avi Kivity

On 04/06/2010 06:28 PM, Darren Hart wrote:
> Alan Cox wrote:
>> On Tue, 06 Apr 2010 15:35:31 +0200
>> Peter Zijlstra <peterz@infradead.org> wrote:
>>
>>> On Tue, 2010-04-06 at 16:28 +0300, Avi Kivity wrote:
>>>> Yes, but that's the best case for spinning. You could simply use a
>>>> userspace spinlock in this case.
>>> Userspace spinlocks are evil.. they should _never_ be used.
>>
>> That's a gross and inaccurate simplification. For the case Avi is talking
>> about spinning in userspace makes sense in a lot of environments. Once
>> you've got one thread pinned per cpu (or gang scheduling >-) ) there are
>> various environments where it makes complete and utter sense.
>
> Hi Alan,
>
> Do you feel some of these situations would also benefit from some
> kernel assistance to stop spinning when the owner schedules out? Or
> are you saying that there are situations where pure userspace
> spinlocks will always be the best option?
>
> If the latter, I'd think that they would also be situations where
> sched_yield() is not used as part of the spin loop. If so, then these
> are not our target situations for FUTEX_LOCK_ADAPTIVE, which hopes to
> provide a better informed mechanism for making spin or sleep
> decisions. If sleeping isn't part of the locking construct
> implementation, then FUTEX_LOCK_ADAPTIVE doesn't have much to offer.

IMO the best solution is to spin in userspace while the lock holder is
running, fall into the kernel when it is scheduled out.

--
error compiling committee.c: too many arguments to function

== 3 of 5 ==
Date: Tues, Apr 6 2010 9:20 am
From: Thomas Gleixner

On Tue, 6 Apr 2010, Avi Kivity wrote:

> On 04/06/2010 06:28 PM, Darren Hart wrote:
> > Alan Cox wrote:
> > > On Tue, 06 Apr 2010 15:35:31 +0200
> > > Peter Zijlstra <peterz@infradead.org> wrote:
> > >
> > > > On Tue, 2010-04-06 at 16:28 +0300, Avi Kivity wrote:
> > > > > Yes, but that's the best case for spinning. You could simply use a
> > > > > userspace spinlock in this case.
> > > > Userspace spinlocks are evil.. they should _never_ be used.
> > >
> > > That's a gross and inaccurate simplification. For the case Avi is talking
> > > about spinning in userspace makes sense in a lot of environments. Once
> > > you've got one thread pinned per cpu (or gang scheduling >-) ) there are
> > > various environments where it makes complete and utter sense.
> >
> > Hi Alan,
> >
> > Do you feel some of these situations would also benefit from some kernel
> > assistance to stop spinning when the owner schedules out? Or are you saying
> > that there are situations where pure userspace spinlocks will always be the
> > best option?
> >
> > If the latter, I'd think that they would also be situations where
> > sched_yield() is not used as part of the spin loop. If so, then these are
> > not our target situations for FUTEX_LOCK_ADAPTIVE, which hopes to provide a
> > better informed mechanism for making spin or sleep decisions. If sleeping
> > isn't part of the locking construct implementation, then FUTEX_LOCK_ADAPTIVE
> > doesn't have much to offer.
>
> IMO the best solution is to spin in userspace while the lock holder is
> running, fall into the kernel when it is scheduled out.

That's just not realistic as user space has no idea whether the lock
holder is running or not and when it's scheduled out without a syscall :)

Thanks,

tglx

== 4 of 5 ==
Date: Tues, Apr 6 2010 9:20 am
From: Avi Kivity

On 04/06/2010 05:09 PM, Peter Zijlstra wrote:
> On Tue, 2010-04-06 at 16:41 +0300, Avi Kivity wrote:
>
>> On 04/06/2010 04:35 PM, Peter Zijlstra wrote:
>>
>>> On Tue, 2010-04-06 at 16:28 +0300, Avi Kivity wrote:
>>>
>>>
>>>> Yes, but that's the best case for spinning. You could simply use a
>>>> userspace spinlock in this case.
>>>>
>>>>
>>> Userspace spinlocks are evil.. they should _never_ be used.
>>>
>>>
>> But in this case they're fastest. If we don't provide a non-evil
>> alternative, people will use them.
>>
>>
> That's what FUTEX_LOCK is about.
>

That works for the uncontended case. For the contended case, the waiter
and the owner have to go into the kernel and back out to transfer
ownership. In the non-adaptive case you have to switch to the idle task
and back as well, and send an IPI. That's a lot of latency if the
unlock happened just after the waiter started the descent into the kernel.


== 5 of 5 ==
Date: Tues, Apr 6 2010 9:30 am
From: Avi Kivity

On 04/06/2010 07:14 PM, Thomas Gleixner wrote:
>
>> IMO the best solution is to spin in userspace while the lock holder is
>> running, fall into the kernel when it is scheduled out.
>>
> That's just not realistic as user space has no idea whether the lock
> holder is running or not and when it's scheduled out without a syscall :)
>

The kernel could easily expose this information by writing into the
thread's TLS area.

So:

- the kernel maintains a current_cpu field in a thread's tls
- lock() atomically writes a pointer to the current thread's current_cpu
when acquiring
- the kernel writes an invalid value to current_cpu when switching out
- a contended lock() retrieves the current_cpu pointer, and spins as
long as it is a valid cpu
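Avi's scheme can be modelled in userspace to show the spin-or-sleep decision. Everything here is hypothetical: no kernel interface maintains such a current_cpu field, so in this sketch each thread publishes its own state through sim_set_running(), and sched_yield() stands in for "fall into the kernel". The lock word itself holds the pointer to the owner's field, so acquiring the lock and publishing the pointer happen in one compare-and-swap.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

#define CPU_INVALID (-1)

/* HYPOTHETICAL stand-in for a kernel-maintained per-thread field:
 * a CPU number while the thread runs, CPU_INVALID once switched out. */
static _Thread_local atomic_int current_cpu = CPU_INVALID;

/* Lock word: NULL when free, otherwise points at the owner's current_cpu. */
typedef struct { _Atomic(atomic_int *) owner; } adaptive_lock;

/* Simulation only: the kernel would do this on switch-in/switch-out. */
static void sim_set_running(int cpu) { atomic_store(&current_cpu, cpu); }

static int owner_is_running(atomic_int *owner_cpu)
{
    return owner_cpu != NULL && atomic_load(owner_cpu) != CPU_INVALID;
}

static void adaptive_lock_acquire(adaptive_lock *l)
{
    for (;;) {
        atomic_int *expected = NULL;
        /* Acquire and publish the owner pointer in one atomic step. */
        if (atomic_compare_exchange_weak(&l->owner, &expected, &current_cpu))
            return;
        /* On failure, 'expected' now holds the current owner pointer. */
        if (owner_is_running(expected))
            continue;     /* owner is on a CPU: keep spinning */
        sched_yield();    /* owner switched out: stand-in for sleeping */
    }
}

static void adaptive_lock_release(adaptive_lock *l)
{
    atomic_store(&l->owner, NULL);
}
```

A real implementation would also have to keep the pointed-to field valid after the owner releases the lock or exits; this sketch ignores that lifetime problem.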


==============================================================================
TOPIC: combine nmi_watchdog and softlockup
http://groups.google.com/group/linux.kernel/t/e11a086865962e7a?hl=en
==============================================================================

== 1 of 1 ==
Date: Tues, Apr 6 2010 8:40 am
From: Cyrill Gorcunov

On Tue, Apr 06, 2010 at 04:13:30PM +0200, Frederic Weisbecker wrote:
[...]
> > +static int watchdog_enable(int cpu)
> > +{
> > + struct perf_event_attr *wd_attr;
> > + struct perf_event *event = per_cpu(watchdog_ev, cpu);
> > + struct task_struct *p = per_cpu(softlockup_watchdog, cpu);
> > +
> > + /* is it already setup and enabled? */
> > + if (event && event->state > PERF_EVENT_STATE_OFF)
> > + goto out;
> > +
> > + /* it is setup but not enabled */
> > + if (event != NULL)
> > + goto out_enable;
> > +
> > + /* Try to register using hardware perf events first */
> > + wd_attr = &wd_hw_attr;
> > + wd_attr->sample_period = hw_nmi_get_sample_period();
> > + event = perf_event_create_kernel_counter(wd_attr, cpu, -1, watchdog_overflow_callback);
> > + if (!IS_ERR(event)) {
> > + printk(KERN_INFO "NMI watchdog enabled, takes one hw-pmu counter.\n");
> > + goto out_save;
> > + }
> > +
> > + /* hardware doesn't exist or not supported, fallback to software events */
> > + printk(KERN_INFO "NMI watchdog: hardware not available, trying software events\n");
> > + wd_attr = &wd_sw_attr;
> > + wd_attr->sample_period = softlockup_thresh * NSEC_PER_SEC;
> > + event = perf_event_create_kernel_counter(wd_attr, cpu, -1, watchdog_overflow_callback);
>
> I fear the cpu clock is not going to help you detecting any hard lockups.
> If you're stuck in an interrupt or an irq disabled loop, your cpu clock is
> not going to fire.
>

I guess it's not supposed to. For such cases only NMIs may help, which is what
the perf events are there for (/me needs to check whether we program the APIC timer
for anything like that). But it should help for other deadlocks. Or am I missing something?

-- Cyrill
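For comparison, the same hardware-first, software-fallback pattern can be tried from userspace with perf_event_open(2). This is a sketch, not the watchdog code: it opens events for the calling thread rather than per CPU, and the sample periods are arbitrary. It also illustrates Frederic's caveat: the fallback PERF_COUNT_SW_CPU_CLOCK event is timer driven, so in the kernel watchdog it cannot fire while interrupts are disabled, unlike a hardware counter overflow delivered as an NMI.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

enum wd_source { WD_NONE, WD_HARDWARE, WD_SOFTWARE };

/* Open a sampling event for the calling thread (pid 0) on any CPU (-1). */
static int perf_open(struct perf_event_attr *attr)
{
    return (int)syscall(SYS_perf_event_open, attr, 0, -1, -1, 0);
}

static void init_attr(struct perf_event_attr *attr, unsigned int type,
                      unsigned long long config, unsigned long long period)
{
    memset(attr, 0, sizeof(*attr));
    attr->size = sizeof(*attr);
    attr->type = type;
    attr->config = config;
    attr->sample_period = period;
    attr->disabled = 1;       /* created stopped */
    attr->exclude_kernel = 1; /* needs fewer privileges under perf_event_paranoid */
    attr->exclude_hv = 1;
}

/* Try a hardware cycle counter first, then fall back to the software cpu-clock. */
static enum wd_source open_sampling_event(int *fd, unsigned long long hw_period,
                                          unsigned long long sw_period_ns)
{
    struct perf_event_attr attr;

    init_attr(&attr, PERF_TYPE_HARDWARE, PERF_COUNT_HW_CPU_CYCLES, hw_period);
    *fd = perf_open(&attr);
    if (*fd >= 0)
        return WD_HARDWARE;

    /* No usable PMU (or not permitted): cpu-clock periods are in nanoseconds. */
    init_attr(&attr, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_CPU_CLOCK, sw_period_ns);
    *fd = perf_open(&attr);
    return *fd >= 0 ? WD_SOFTWARE : WD_NONE;
}
```

Depending on hardware, virtualization and perf_event_paranoid, the result may be either source or neither, so callers should check the returned value rather than assume a PMU is present.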

==============================================================================
TOPIC: mm,migration: Allow the migration of PageSwapCache pages
http://groups.google.com/group/linux.kernel/t/90e8b21b6e197190?hl=en
==============================================================================

== 1 of 1 ==
Date: Tues, Apr 6 2010 8:50 am
From: Minchan Kim

On Sat, Apr 3, 2010 at 1:02 AM, Mel Gorman <mel@csn.ul.ie> wrote:
> PageAnon pages that are unmapped may or may not have an anon_vma, so they are
> not currently migrated. However, a swap cache page can be migrated and
> fits this description. This patch identifies page swap caches and allows
> them to be migrated, but ensures that no attempt is made to remap the pages,
> which could potentially access an already freed anon_vma.
>
> Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>

Thanks for your effort, Mel.

--
Kind regards,
Minchan Kim

==============================================================================
TOPIC: ntfs: use add_to_page_cache_lru
http://groups.google.com/group/linux.kernel/t/39a26d0d748e53ee?hl=en
==============================================================================

== 1 of 2 ==
Date: Tues, Apr 6 2010 8:50 am
From: Nick Piggin

On Tue, Apr 06, 2010 at 12:46:05AM +0900, Minchan Kim wrote:
>
> While I discuss exportable add_to_page_cache_lru,
> http://marc.info/?l=linux-mm&m=127047788612445&w=2
> I found that
> http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg04472.html.
>
> Let's change to use it in ntfs.

Thanks, I went through most filesystems and changed it in those too,
including ntfs. Haven't had much feedback but I'll just send the
patches to Andrew if they haven't been merged by next release.

== 2 of 2 ==
Date: Tues, Apr 6 2010 9:40 am
From: Minchan Kim

Hi, Nick.

On Wed, Apr 7, 2010 at 12:47 AM, Nick Piggin <npiggin@suse.de> wrote:
> On Tue, Apr 06, 2010 at 12:46:05AM +0900, Minchan Kim wrote:
>>
>> While I discuss exportable add_to_page_cache_lru,
>> http://marc.info/?l=linux-mm&m=127047788612445&w=2
>> I found that
>> http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg04472.html.
>>
>> Let's change to use it in ntfs.
>
> Thanks, I went through most filesystems and changed it in those too,
> including ntfs. Haven't had much feedback but I'll just send the
> patches to Andrew if they haven't been merged by next release.
>

Please let me know.
I had actually planned to change some filesystems myself.
If you have already done it, I don't need to. :)

Thanks, Nick.

==============================================================================
TOPIC: Btrfs updates
http://groups.google.com/group/linux.kernel/t/b82fa5a8ba9bce86?hl=en
==============================================================================

== 1 of 1 ==
Date: Tues, Apr 6 2010 8:50 am
From: Chris Mason

On Mon, Apr 05, 2010 at 03:36:37PM -0400, Chris Mason wrote:
> Hello everyone,
>
> The master branch of the btrfs-unstable repo has a collection of fixes:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable.git master
>

That pull request was missing another enospc fix from Josef, and Yan
Zheng noticed we were sometimes trying to allocate zero sized chunks
from the disk pool. I've updated the git tree with two new commits,
here is the corrected shortlog:

Josef Bacik (5) commits (+47/-96):
Btrfs: fail to mount if we have problems reading the block groups (+12/-4)
Btrfs: fix small race with delalloc flushing waitqueue's (+4/-5)
Btrfs: fix chunk allocate size calculation (+3/-1)
Btrfs: fix data enospc check overflow (+15/-5)
Btrfs: kill max_extent mount option (+13/-81)

Zhao Lei (3) commits (+9/-12):
Btrfs: Simplify num_stripes's calculation logical for __btrfs_alloc_chunk() (+2/-2)
Btrfs: Add error handle for btrfs_search_slot() in btrfs_read_chunk_tree() (+2/-0)
Btrfs: Remove unnecessary finish_wait() in wait_current_trans() (+5/-10)

Dan Carpenter (3) commits (+5/-1):
Btrfs: handle kmalloc() failure in inode lookup ioctl (+3/-0)
Btrfs: check btrfs_get_extent return for IS_ERR() (+1/-1)
Btrfs: dereferencing freed memory (+1/-0)

Chris Mason (2) commits (+10/-0):
Btrfs: make sure the chunk allocator doesn't create zero length chunks (+6/-0)
Btrfs: add check for changed leaves in setup_leaf_for_split (+4/-0)

Sage Weil (1) commits (+31/-66):
Btrfs: create snapshot references in same commit as snapshot

Andrea Gelmini (1) commits (+0/-1):
Btrfs: remove duplicate include in ioctl.c

Miao Xie (1) commits (+4/-1):
Btrfs: add NULL check for do_walk_down()

Nick Piggin (1) commits (+5/-32):
Btrfs: use add_to_page_cache_lru, use __page_cache_alloc

Total: (17) commits

fs/btrfs/transaction.c | 112 +++++++++++++++---------------------------------
fs/btrfs/inode.c | 59 +------------------------
fs/btrfs/extent-tree.c | 43 ++++++++++++------
fs/btrfs/super.c | 23 +--------
fs/btrfs/compression.c | 22 +--------
fs/btrfs/volumes.c | 16 +++++-
fs/btrfs/extent_io.c | 15 ------
fs/btrfs/disk-io.c | 12 +++--
fs/btrfs/ioctl.c | 7 ++-
fs/btrfs/ordered-data.c | 6 +-
fs/btrfs/ctree.c | 4 +
fs/btrfs/ctree.h | 1
12 files changed, 111 insertions(+), 209 deletions(-)

==============================================================================
TOPIC: hackbench regression due to commit 9dfc6e68bfe6e
http://groups.google.com/group/linux.kernel/t/234a5e1140b9914e?hl=en
==============================================================================

== 1 of 1 ==
Date: Tues, Apr 6 2010 8:50 am
From: Christoph Lameter

On Tue, 6 Apr 2010, Zhang, Yanmin wrote:

> Thanks. I tried 2 and 4 times and didn't see much improvement.
> I checked /proc/vmallocinfo and it has no pcpu_get_vm_areas entry
> when I use 4 times PERCPU_DYNAMIC_RESERVE.

> I used perf to collect dtlb misses and LLC misses. dtlb miss data is not
> stable. Sometimes, we have a bigger dtlb miss, but get a better result.
>
> LLC misses data are more stable. Only LLC-load-misses is the clear sign now.
> LLC-store-misses has no big difference.

What exactly does LLC-load-miss count?

The cacheline environment in the hotpath should only include the following
cache lines (without debugging and counters):

1. The first cacheline from the kmem_cache structure

(This is different from the situation before the 2.6.34 changes. Earlier
some critical values (object length etc.) were available
from the kmem_cache_cpu structure. The cacheline containing the percpu
structure array was needed to determine the kmem_cache_cpu address!)

2. The first cacheline from kmem_cache_cpu

3. The first cacheline of the data object (free pointer)

And in case of a kfree/ kmem_cache_free:

4. Cacheline that contains the page struct of the page the object resides
in.
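To make the cacheline argument concrete, here is a hypothetical layout that keeps every field the allocation fast path reads within the first 64-byte line, checked at compile time. The structures and field names are illustrative assumptions, not the actual SLUB kmem_cache or kmem_cache_cpu definitions, and a 64-byte line size is assumed.

```c
#include <assert.h>
#include <stddef.h>

#define CACHELINE 64 /* assumed line size */

/* HYPOTHETICAL per-cache descriptor: hot fields first, cold ones after. */
struct cache_desc {
    unsigned int objsize; /* object length */
    unsigned int offset;  /* offset of the free pointer inside an object */
    unsigned long flags;
    char cold[256];       /* stand-in for name, lists, statistics ... */
};

/* HYPOTHETICAL per-CPU state. */
struct cache_cpu {
    void *freelist;       /* first free object on this CPU */
    void *page;           /* current slab page */
};

_Static_assert(offsetof(struct cache_desc, cold) <= CACHELINE,
               "hot descriptor fields must fit in the first cache line");
_Static_assert(sizeof(struct cache_cpu) <= CACHELINE,
               "per-CPU state must fit in one cache line");

/* Fast-path allocation: touches the descriptor's first line (s->offset),
 * the per-CPU line (c->freelist) and the free pointer in the object. */
static void *fast_alloc(const struct cache_desc *s, struct cache_cpu *c)
{
    void *object = c->freelist;

    if (object == NULL)
        return NULL; /* slow path (refill) not modelled */
    c->freelist = *(void **)((char *)object + s->offset);
    return object;
}
```

A kfree() path would additionally touch the page struct of the object's page, the fourth line in the list, which this sketch does not model.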

Can you post the .config you are using and the bootup messages?

==============================================================================
TOPIC: Ugly rmap NULL ptr deref oopsie on hibernate (was Linux 2.6.34-rc3)
http://groups.google.com/group/linux.kernel/t/eec470a501781823?hl=en
==============================================================================

== 1 of 5 ==
Date: Tues, Apr 6 2010 8:50 am
From: Rik van Riel

On 04/06/2010 11:34 AM, Minchan Kim wrote:

> Let's see the unlink_anon_vmas.
>
> 1. list_for_each_entry_safe(avc,next, vma->anon_vma_chain, same_vma)
> 2. anon_vma_unlink
> 3. spin_lock(anon_vma->lock)<-- HERE LOCK.
> 4. list_del(anon_vma_chain->same_anon_vma);
>
> What if anon_vma is destroyed and reused by SLAB_XXX_RCU for another
> anon_vma object between 2 and 3?
> I mean, how do we make sure 3) locks a valid anon_vma?
>
> I hope this is the culprit.

How can the anon_vma get destroyed and reused, when this
anon_vma_chain still has a reference to it (and the
anon_vma has not been freed yet)?

What combination of circumstances is necessary for
your hypothetical bug to happen?

== 2 of 5 ==
Date: Tues, Apr 6 2010 9:00 am
From: Minchan Kim

On Tue, 2010-04-06 at 11:40 -0400, Rik van Riel wrote:
> On 04/06/2010 11:34 AM, Minchan Kim wrote:
>
> > Let's see the unlink_anon_vmas.
> >
> > 1. list_for_each_entry_safe(avc,next, vma->anon_vma_chain, same_vma)
> > 2. anon_vma_unlink
> > 3. spin_lock(anon_vma->lock)<-- HERE LOCK.
> > 4. list_del(anon_vma_chain->same_anon_vma);
> >
> > What if anon_vma is destroyed and reused by SLAB_XXX_RCU for another
> > anon_vma object between 2 and 3?
> > I mean, how do we make sure 3) locks a valid anon_vma?
> >
> > I hope this is the culprit.
>
> How can the anon_vma get destroyed and reused, when this
> anon_vma_chain still has a reference to it (and the

Doesn't anon_vma_chain have a ref counter on anon_vma?

> anon_vma has not been freed yet)?

AFAIK, with SLAB_XXX_RCU an anon_vma object can be reused for another
anon_vma without its memory ever being freed. That is why we always
access it carefully, via page_lock_anon_vma or a manual check with RCU
and page_mapped.

What am I missing?

>
> What combination of circumstances is necessary for
> your hypothetical bug to happen?

CPU A CPU B

unlink_anon_vmas
list_for_each_entry

free_pgtable
anon_vma_unlink
<crazy stall> spin_lock(anon_vma);
list_del(same_anon_vma)
spin_unlock(anon_vma)
anon_vma_unlink
anon_vma_free
reuse for another anon_vma
spin_lock(another anon_vma)
list_del(another anon_vma)

If my assumption is wrong, please correct me.
Thanks, Rik.

--
Kind regards,
Minchan Kim

== 3 of 5 ==
Date: Tues, Apr 6 2010 9:10 am
From: Linus Torvalds

On Wed, 7 Apr 2010, Minchan Kim wrote:
>
> Let's see the unlink_anon_vmas.
>
> 1. list_for_each_entry_safe(avc,next, vma->anon_vma_chain, same_vma)
> 2. anon_vma_unlink
> 3. spin_lock(anon_vma->lock) <-- HERE LOCK.
> 4. list_del(anon_vma_chain->same_anon_vma);
>
> What if anon_vma is destroyed and reused by SLAB_XXX_RCU for another
> anon_vma object between 2 and 3?
> I mean, how do we make sure 3) locks a valid anon_vma?
>
> I hope this is the culprit.

I don't think so. That isn't the racy case. We're working with an
anon_vma_chain, so the anonvma is all there.

The racy case is when we look up an anonvma by the page, and the page gets
unmapped at the same time because somebody else is travelling over the LRU
list of the page itself, isn't it?

I do wonder if "page_lock_anon_vma()" should check the whole
"page_mapped()" case _after_ taking the anon_vma lock. Because if the race
happens, we're following an anon_vma list that has nothing to do with that
page (it's still a _valid_ list, since we locked the anon_vma, but will it
be ok?)

IOW, what is it that really keeps the anon_vma list reliable _and_
relevant wrt the page? We know we may get a stale anon_vma, are we ok if
that anon_vma list doesn't actually have anything to do with the page any
more?

I think the first check in "page_address_in_vma()" protects us, but
whatever.

However, that made me look at the PAGE_MIGRATION case. That seems to be
just broken. It's doing that page_anon_vma() + spin_lock without holding
any RCU locks, so there is no guarantee that anon_vma there is at all
valid.

Is that function always called with rcu_read_lock()?

Linus
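Linus's suggestion, re-checking page_mapped() only after the anon_vma lock is held, follows a common lookup, lock, revalidate pattern. The sketch below uses hypothetical, simplified stand-ins for struct page and struct anon_vma and a toy spinlock, not the kernel's page_lock_anon_vma(). It also replaces RCU with a stronger assumption: the anon objects are never returned to the system (type-stable), so locking a stale one is safe, and the re-check after locking tells us whether it still belongs to the page.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* HYPOTHETICAL stand-ins; anon_obj memory is assumed type-stable. */
struct anon_obj {
    atomic_flag lock;                /* toy spinlock */
};

struct page_model {
    _Atomic(struct anon_obj *) anon; /* owning anon object, if any */
    atomic_int mapcount;             /* > 0 while the page is mapped */
};

static void anon_obj_lock(struct anon_obj *a)
{
    while (atomic_flag_test_and_set_explicit(&a->lock, memory_order_acquire))
        ;                            /* spin */
}

static void anon_obj_unlock(struct anon_obj *a)
{
    atomic_flag_clear_explicit(&a->lock, memory_order_release);
}

/* Look up, lock, then re-check that the page is still mapped and still
 * points at the object we locked; otherwise drop the lock and give up. */
static struct anon_obj *page_lock_anon_model(struct page_model *p)
{
    struct anon_obj *a = atomic_load(&p->anon);

    if (a == NULL || atomic_load(&p->mapcount) <= 0)
        return NULL;
    anon_obj_lock(a);
    if (atomic_load(&p->mapcount) > 0 && atomic_load(&p->anon) == a)
        return a;                    /* locked and still relevant */
    anon_obj_unlock(a);              /* raced with unmap or reuse */
    return NULL;
}
```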

== 4 of 5 ==
Date: Tues, Apr 6 2010 9:30 am
From: Minchan Kim

Hi, Linus.

On Tue, 2010-04-06 at 08:55 -0700, Linus Torvalds wrote:
>
> On Wed, 7 Apr 2010, Minchan Kim wrote:
> >
> > Let's see the unlink_anon_vmas.
> >
> > 1. list_for_each_entry_safe(avc,next, vma->anon_vma_chain, same_vma)
> > 2. anon_vma_unlink
> > 3. spin_lock(anon_vma->lock) <-- HERE LOCK.
> > 4. list_del(anon_vma_chain->same_anon_vma);
> >
> > What if anon_vma is destroyed and reused by SLAB_XXX_RCU for another
> > anon_vma object between 2 and 3?
> > I mean, how do we make sure 3) locks a valid anon_vma?
> >
> > I hope this is the culprit.
>
> I don't think so. That isn't the racy case. We're working with an
> anon_vma_chain, so the anonvma is all there.
>

But the anon_vma may already be in use as another anon_vma.
Even so, anon_vma_unlink does list_del(anon_vma's same_anon_vma).
That is what I suspect.

> The racy case is when we look up an anonvma by the page, and the page gets
> unmapped at the same time because somebody else is travelling over the LRU
> list of the page itself, isn't it?

Yes, but I thought the walk from the page might go through anon_vmas
whose same_anon_vma entries had been deleted by the race.

>
> I do wonder if "page_lock_anon_vma()" should check the whole
> "page_mapped()" case _after_ taking the anon_vma lock. Because if the race
> happens, we're following an anon_vma list that has nothing to do with that
> page (it's still a _valid_ list, since we locked the anon_vma, but will it
> be ok?)

That is why we always use it together with vma_address and
page_check_address, to validate the anon_vma.
But I don't think that's a good design. I would like to take the lock
before checking page_mapped, but maybe that is a performance issue? I am not sure.

>
> IOW, what is it that really keeps the anon_vma list reliable _and_
> relevant wrt the page? We know we may get a stale anon_vma, are we ok if
> that anon_vma list doesn't actually have anything to do with the page any
> more?
> I think the first check in "page_address_in_vma()" protects us, but
> whatever.
>
> However, that made me look at the PAGE_MIGRATION case. That seems to be
> just broken. It's doing that page_anon_vma() + spin_lock without holding
> any RCU locks, so there is no guarantee that anon_vma there is at all
> valid.

FYI, there was a recent patch for the migration case:
http://lkml.org/lkml/2010/4/2/145

>
> Is that function always called with rcu_read_lock()?
>
> Linus

--
Kind regards,
Minchan Kim

== 5 of 5 ==
Date: Tues, Apr 6 2010 9:40 am
From: Linus Torvalds

On Wed, 7 Apr 2010, Minchan Kim wrote:
> >
> > I don't think so. That isn't the racy case. We're working with an
> > anon_vma_chain, so the anonvma is all there.
>
> But the anon_vma may already be in use as another anon_vma.

No, that can only happen if somebody has done "anon_vma_free()" on it. And
nobody does that if the anonvma still has a non-empty '&anon_vma->head'.

So as long as the anon_vma has a anon_vma_chain entry associated with it
(or a ksm refcount, but that's a separate issue), it's not going to be
re-allocated for any other use, because it's not going to be free'd.

Linus
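Linus's invariant, that an anon_vma is only freed once nothing is left on its list, can be illustrated with a small intrusive list. The names are hypothetical simplifications, not the kernel's anon_vma_unlink(); the sketch also assumes that no new chain is added concurrently, which it does not enforce.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Minimal circular doubly linked list (stand-in for list_head). */
struct link { struct link *prev, *next; };

static void link_init(struct link *h) { h->prev = h->next = h; }
static bool link_empty(const struct link *h) { return h->next == h; }

static void link_add(struct link *n, struct link *h)
{
    n->next = h->next;
    n->prev = h;
    h->next->prev = n;
    h->next = n;
}

static void link_del(struct link *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    n->prev = n->next = n;
}

/* HYPOTHETICAL stand-in for an anon_vma: a lock plus the list of chains
 * that reference it (the role '&anon_vma->head' plays in the thread). */
struct anon_node {
    atomic_flag lock;
    struct link head;
};

/* Drop one chain. The object is freed only when its list becomes empty,
 * so while any chain remains it cannot be reused for something else.
 * Returns true if the object was freed. */
static bool unlink_chain(struct anon_node *a, struct link *chain)
{
    bool empty;

    while (atomic_flag_test_and_set_explicit(&a->lock, memory_order_acquire))
        ;
    link_del(chain);
    empty = link_empty(&a->head);
    atomic_flag_clear_explicit(&a->lock, memory_order_release);
    if (empty)
        free(a);
    return empty;
}
```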

==============================================================================
TOPIC: x86: let 'reservetop' functioning right
http://groups.google.com/group/linux.kernel/t/eaaf1a980d8f3f01?hl=en
==============================================================================

== 1 of 1 ==
Date: Tues, Apr 6 2010 9:00 am
From: Liang Li

When the 'reservetop=0xbadc0de' kernel parameter is specified, the kernel
stops booting due to an early_ioremap bug related to commit 8827247ff.

The root cause of the boot failure is that 'slot_virt[i]' is initialized
in setup_arch->early_ioremap_init, but later in setup_arch the function
'parse_early_param' modifies 'FIXADDR_TOP' when 'reservetop=0xbadc0de' is
specified, so the cached slot_virt[] values become stale.

The simplest fix might be to use __fix_to_virt(idx0) in '__early_ioremap'
to get the updated value of 'FIXADDR_TOP', instead of referencing the old
value from slot_virt[slot] directly.

Signed-off-by: Liang Li <liang.li@windriver.com>
---
Hi,

I am not sure whether the 'reservetop' kernel command line option
blocking boot is expected. Could someone tell me whether we should fix
it, leave it as is, or whether someone is already replacing 'reservetop'
with something else?

arch/x86/mm/ioremap.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index 5eb1ba7..ea82ef0 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -537,9 +537,9 @@ __early_ioremap(resource_size_t phys_addr, unsigned long size, pgprot_t prot)
--nrpages;
}
if (early_ioremap_debug)
- printk(KERN_CONT "%08lx + %08lx\n", offset, slot_virt[slot]);
+ printk(KERN_CONT "%08lx + %08lx\n", offset, __fix_to_virt(idx0));

- prev_map[slot] = (void __iomem *)(offset + slot_virt[slot]);
+ prev_map[slot] = (void __iomem *)(offset + __fix_to_virt(idx0));
return prev_map[slot];
}

--
1.6.6
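The bug can be reproduced in a small userspace model: an address cached at init time goes stale once FIXADDR_TOP moves, while recomputing it at use time, as the patch does with __fix_to_virt(idx0), picks up the new value. fix_to_virt() mirrors the shape of the x86 __fix_to_virt() macro (FIXADDR_TOP minus idx << PAGE_SHIFT); the concrete top address, the one-page-per-slot layout and the helper names are assumptions for illustration.

```c
#include <assert.h>

#define PAGE_SHIFT 12
#define NR_SLOTS 4

/* HYPOTHETICAL model: FIXADDR_TOP as a variable that early parameter
 * parsing (reservetop=) can still lower after the slots were set up. */
static unsigned long fixaddr_top = 0xfffff000UL;

/* Same shape as x86 __fix_to_virt(): FIXADDR_TOP - (idx << PAGE_SHIFT). */
static unsigned long fix_to_virt(unsigned long idx)
{
    return fixaddr_top - (idx << PAGE_SHIFT);
}

static unsigned long slot_virt[NR_SLOTS];

/* Like early_ioremap_init(): cache each slot's address once.
 * One page per slot starting at first_idx is a simplification. */
static void init_slots(unsigned long first_idx)
{
    for (int i = 0; i < NR_SLOTS; i++)
        slot_virt[i] = fix_to_virt(first_idx + (unsigned long)i);
}

/* Like parse_early_param() handling reservetop=: move the top down. */
static void reserve_top(unsigned long bytes)
{
    fixaddr_top -= bytes;
}
```

After init_slots(16) and reserve_top(0x100000), slot_virt[1] still reflects the old FIXADDR_TOP, while fix_to_virt(17) is 0x100000 lower, matching the moved fixmap.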

==============================================================================
TOPIC: perf, x86: Enable Nehalem-EX support
http://groups.google.com/group/linux.kernel/t/484d02ae823cd092?hl=en
==============================================================================

== 1 of 1 ==
Date: Tues, Apr 6 2010 9:00 am
From: tip-bot for Vince Weaver

Commit-ID: 134fbadf028a5977a1b06b0253d3ee33e6f0c642
Gitweb: http://git.kernel.org/tip/134fbadf028a5977a1b06b0253d3ee33e6f0c642
Author: Vince Weaver <vweaver1@eecs.utk.edu>
AuthorDate: Tue, 6 Apr 2010 10:01:19 -0400
Committer: Ingo Molnar <mingo@elte.hu>
CommitDate: Tue, 6 Apr 2010 17:52:59 +0200

perf, x86: Enable Nehalem-EX support

According to Intel Software Devel Manual Volume 3B, the
Nehalem-EX PMU is just like regular Nehalem (except for the
uncore support, which is completely different).

Signed-off-by: Vince Weaver <vweaver1@eecs.utk.edu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Lin Ming <ming.m.lin@intel.com>
LKML-Reference: <alpine.DEB.2.00.1004060956580.1417@cl320.eecs.utk.edu>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
arch/x86/kernel/cpu/perf_event_intel.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index 84bfde6..9c794ac 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -936,6 +936,7 @@ static __init int intel_pmu_init(void)

case 26: /* 45 nm nehalem, "Bloomfield" */
case 30: /* 45 nm nehalem, "Lynnfield" */
+ case 46: /* 45 nm nehalem-ex, "Beckton" */
memcpy(hw_cache_event_ids, nehalem_hw_cache_event_ids,
sizeof(hw_cache_event_ids));

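The model numbers in the switch come from CPUID leaf 1. A small decoder shows how model 46 (0x2E) for Nehalem-EX is formed from the base and extended model fields, following the Intel SDM rule that the extended model field applies to base families 6 and 15. The helper names are hypothetical and the function only mirrors the three case labels visible in the hunk.

```c
#include <assert.h>

struct cpu_sig { unsigned int family, model, stepping; };

/* Decode CPUID leaf 1 EAX. The extended model field (bits 19:16) is used
 * for base families 6 and 15; extended family (bits 27:20) only for 15. */
static struct cpu_sig decode_cpuid1_eax(unsigned int eax)
{
    unsigned int base_family = (eax >> 8) & 0xf;
    struct cpu_sig s;

    s.stepping = eax & 0xf;
    s.model = (eax >> 4) & 0xf;
    s.family = base_family;
    if (base_family == 0xf)
        s.family += (eax >> 20) & 0xff;
    if (base_family == 0x6 || base_family == 0xf)
        s.model |= ((eax >> 16) & 0xf) << 4;
    return s;
}

/* Models that reuse the Nehalem cache event table per the patch hunk. */
static int uses_nehalem_events(const struct cpu_sig *s)
{
    if (s->family != 6)
        return 0;
    switch (s->model) {
    case 26: /* Bloomfield */
    case 30: /* Lynnfield */
    case 46: /* Nehalem-EX, "Beckton" */
        return 1;
    default:
        return 0;
    }
}
```

For example, an EAX value of 0x000206E6 decodes to family 6, model 46, stepping 6.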

==============================================================================
TOPIC: Extended partition mapping wrong size
http://groups.google.com/group/linux.kernel/t/acd203ae293d289b?hl=en
==============================================================================

== 1 of 1 ==
Date: Tues, Apr 6 2010 9:10 am
From: Phillip Susi

On 4/6/2010 11:33 AM, Karel Zak wrote:
> You have to care about the partition table (EBR).
>
> The current 1024 bytes is completely useless size, if you enlarge the
> size of the partition (for example to 1MiB) you will see reports from
> people who lost their extended partitions. (I don't believe that all
> mkfs programs are able to detect/skip EBR.)

Good point... mke2fs won't damage it, since it leaves the first sector
intact and places its boot sector in sector 1 (or was it 2?), but FAT
and NTFS won't be so kind.
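One way a cautious mkfs might "detect" an existing boot record before overwriting sector 0 is to look for the 0x55 0xAA signature that MBRs and EBRs end with. The sketch below is a minimal illustration under that assumption, and the names are hypothetical. This check is deliberately weak: FAT and NTFS boot sectors end with the same signature, so it only says that something boot-record-like is present, not that it is specifically an EBR.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_SECTOR_SIZE 512	/* classic 512-byte sector (assumption) */

/*
 * True if the sector ends with the 0x55 0xAA boot signature at
 * offsets 510 and 511, as MBRs and EBRs do.
 */
static bool has_boot_signature(const uint8_t sector[SKETCH_SECTOR_SIZE])
{
	return sector[510] == 0x55 && sector[511] == 0xAA;
}
```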

>> Then again, I could swear that once upon a time the kernel simply
>> did not bother creating a dev node for the extended partition, and
>> this seems to be a hack that was put in to make it easy for LILO to
>> install to one. Personally I'd prefer going back to the old
>> behavior of just not having a useless device there.
>
> This is probably better idea than enlarge the size :-)

Aye, I'd prefer that the useless device be removed, but at the very least
it should not use a size of 2 sectors when the second sector is not actually
part of it, because it is the first sector of the logical partition. That
said, we just patched parted to reserve that second sector anyway, even in
"none" alignment mode, just to avoid running into this problem.
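The sizing rule argued for above can be sketched as a small function. This models the proposal in this thread, not the kernel's actual partition-table code, and the names are hypothetical. The device would expose at most 2 sectors. It would never expose more than the gap between the EBR (which always sits at the start of the extended partition) and the first logical partition's data, and never more than the extended partition itself.

```c
#include <stdint.h>

typedef uint64_t sketch_sector_t;	/* sector numbers, 512-byte units */

static sketch_sector_t ext_dev_sectors(sketch_sector_t ext_start,
				       sketch_sector_t ext_size,
				       sketch_sector_t first_logical_start)
{
	sketch_sector_t n = 2;	/* the current 1024-byte device size */

	/* Don't cover sectors that belong to the first logical partition. */
	if (first_logical_start > ext_start &&
	    first_logical_start - ext_start < n)
		n = first_logical_start - ext_start;
	/* Never exceed the extended partition itself. */
	if (ext_size < n)
		n = ext_size;
	return n;
}
```

With a logical partition starting right after its EBR (for example, at ext_start + 1), this yields a 1-sector device instead of 2.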

==============================================================================
TOPIC: libertas/sdio: 8686: set ECSI bit for 1-bit transfers
http://groups.google.com/group/linux.kernel/t/207b2568b9561ec6?hl=en
==============================================================================

== 1 of 1 ==
Date: Tues, Apr 6 2010 9:10 am
From: Dan Williams

On Tue, 2010-04-06 at 10:52 +0200, Daniel Mack wrote:
> When operating in 1-bit mode, SDAT1 is used as dedicated interrupt line.
> However, the 8686 will only drive this line when the ECSI bit is set in
> the CCCR_IF register.
>
> Thanks to Alagu Sankar for pointing me in the right direction.
>
> Signed-off-by: Daniel Mack <daniel@caiaq.de>
> Cc: Alagu Sankar <alagusankar@embwise.com>
> Cc: Volker Ernst <volker.ernst@txtr.com>
> Cc: Dan Williams <dcbw@redhat.com>
> Cc: John W. Linville <linville@tuxdriver.com>
> Cc: Holger Schurig <hs4233@mail.mn-solutions.de>
> Cc: Bing Zhao <bzhao@marvell.com>
> Cc: libertas-dev@lists.infradead.org
> Cc: linux-wireless@vger.kernel.org
> Cc: linux-mmc@vger.kernel.org

Acked-by: Dan Williams <dcbw@redhat.com>

> ---
> drivers/net/wireless/libertas/if_sdio.c | 22 ++++++++++++++++++++++
> include/linux/mmc/sdio.h | 2 ++
> 2 files changed, 24 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/net/wireless/libertas/if_sdio.c b/drivers/net/wireless/libertas/if_sdio.c
> index 7a73f62..33206a9 100644
> --- a/drivers/net/wireless/libertas/if_sdio.c
> +++ b/drivers/net/wireless/libertas/if_sdio.c
> @@ -34,6 +34,8 @@
> #include <linux/mmc/card.h>
> #include <linux/mmc/sdio_func.h>
> #include <linux/mmc/sdio_ids.h>
> +#include <linux/mmc/sdio.h>
> +#include <linux/mmc/host.h>
>
> #include "host.h"
> #include "decl.h"
> @@ -942,6 +944,7 @@ static int if_sdio_probe(struct sdio_func *func,
>  	int ret, i;
>  	unsigned int model;
>  	struct if_sdio_packet *packet;
> +	struct mmc_host *host = func->card->host;
>
>  	lbs_deb_enter(LBS_DEB_SDIO);
>
> @@ -1022,6 +1025,25 @@ static int if_sdio_probe(struct sdio_func *func,
>  	if (ret)
>  		goto disable;
>
> +	/* For 1-bit transfers to the 8686 model, we need to enable the
> +	 * interrupt flag in the CCCR register. Set the MMC_QUIRK_LENIENT_FN0
> +	 * bit to allow access to non-vendor registers. */
> +	if ((card->model == IF_SDIO_MODEL_8686) &&
> +	    (host->caps & MMC_CAP_SDIO_IRQ) &&
> +	    (host->ios.bus_width == MMC_BUS_WIDTH_1)) {
> +		u8 reg;
> +
> +		func->card->quirks |= MMC_QUIRK_LENIENT_FN0;
> +		reg = sdio_f0_readb(func, SDIO_CCCR_IF, &ret);
> +		if (ret)
> +			goto release_int;
> +
> +		reg |= SDIO_BUS_ECSI;
> +		sdio_f0_writeb(func, reg, SDIO_CCCR_IF, &ret);
> +		if (ret)
> +			goto release_int;
> +	}
> +
>  	card->ioport = sdio_readb(func, IF_SDIO_IOPORT, &ret);
>  	if (ret)
>  		goto release_int;
> diff --git a/include/linux/mmc/sdio.h b/include/linux/mmc/sdio.h
> index 0ebaef5..329a8fa 100644
> --- a/include/linux/mmc/sdio.h
> +++ b/include/linux/mmc/sdio.h
> @@ -94,6 +94,8 @@
>
> #define SDIO_BUS_WIDTH_1BIT 0x00
> #define SDIO_BUS_WIDTH_4BIT 0x02
> +#define SDIO_BUS_ECSI 0x20 /* Enable continuous SPI interrupt */
> +#define SDIO_BUS_SCSI 0x40 /* Support continuous SPI interrupt */
>
> #define SDIO_BUS_ASYNC_INT 0x20
>
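The core of the patch is a guarded read-modify-write of the CCCR bus interface register. It sets the ECSI bit only for an 8686 in 1-bit mode on a host with SDIO IRQ support, and leaves the other bits in that register untouched. Below is a minimal user-space sketch of that logic. The constant values match the header hunk above and include/linux/mmc/sdio.h. The register array, enum, and function names are hypothetical stand-ins for the real card access done by sdio_f0_readb()/sdio_f0_writeb().

```c
#include <stdbool.h>
#include <stdint.h>

#define SDIO_CCCR_IF	0x07	/* bus interface control register */
#define SDIO_BUS_ECSI	0x20	/* enable continuous SPI interrupt */

enum sketch_bus_width { SKETCH_BUS_WIDTH_1, SKETCH_BUS_WIDTH_4 };

/* Same three-way condition as the patch, with hypothetical inputs. */
static bool needs_ecsi(bool is_8686, bool host_has_sdio_irq,
		       enum sketch_bus_width width)
{
	return is_8686 && host_has_sdio_irq && width == SKETCH_BUS_WIDTH_1;
}

/* Read-modify-write on a mock function 0 register space. */
static void set_ecsi(uint8_t cccr[])
{
	uint8_t reg = cccr[SDIO_CCCR_IF];	/* read */

	reg |= SDIO_BUS_ECSI;			/* modify: only the ECSI bit */
	cccr[SDIO_CCCR_IF] = reg;		/* write back */
}
```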

==============================================================================
TOPIC: OMAP: Fix for bus width which improves SD card's peformance.
http://groups.google.com/group/linux.kernel/t/b7a61b5d5d3b2476?hl=en
==============================================================================

== 1 of 2 ==
Date: Tues, Apr 6 2010 9:20 am
From: "Madhusudhan"

> -----Original Message-----
> From: Felipe Balbi [mailto:me@felipebalbi.com]
> Sent: Tuesday, April 06, 2010 12:01 AM
> To: Madhusudhan
> Cc: me@felipebalbi.com; 'kishore kadiyala'; 'Vimal Singh';
> tony@atomide.com; svenkatr@ti.com; linux-omap@vger.kernel.org; linux-
> kernel@vger.kernel.org; jarkko.lavinen@nokia.com
> Subject: Re: [PATCH v3] OMAP: Fix for bus width which improves SD card's
> peformance.
>
> Hi,
>
> On Mon, Apr 05, 2010 at 12:19:29PM -0500, Madhusudhan wrote:
> > Since the first if command already checks for the 8-bit the second check
> > like >= 4 is definitely not readable in my opinion.
>
> how come ???
>
> > Functionally do you see anything wrong with this patch??
>
> functionally no, but (hypothetical situation) and if on
> omap4/5/6/whatever, omap controller supports a bigger bus width then
> you'll have to add a line like:
>
> + if (mmc_slot(host).wires == 16)
> + mmc->caps |= (MMC_CAP_16_BIT_DATA | MMC_CAP_8_BIT_DATA |
> + MMC_CAP_4_BIT_DATA);
> - if (mmc_slot(host).wires == 8)
> + else if (mmc_slot(host).wires == 8)
>
> do you see the problem ?? In my opinion it doesn't scale well.
>

The point to note here is that the MMC spec supports a maximum bus width
of 8 bits, so anything beyond 8-bit is not in the picture as of today.

But my bad for misinterpreting the snippet Felipe sent earlier.

	if (mmc_slot(host).wires >= 8)
		mmc->caps |= MMC_CAP_8_BIT_DATA;
	if (mmc_slot(host).wires >= 4)
		mmc->caps |= MMC_CAP_4_BIT_DATA;

I missed the fact that you removed the setting of 4-bit from the first
check.

I am okay with the above snippet, since what we are patching here is a
trivial change that fixes an important issue.
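The ">=" snippet works because each bus width implies every narrower one, so an 8-wire slot ends up with both the 8-bit and 4-bit capabilities. Below is a minimal user-space sketch of that mapping. The CAP_* bits are hypothetical stand-ins for MMC_CAP_8_BIT_DATA and MMC_CAP_4_BIT_DATA, whose real values live in include/linux/mmc/host.h.

```c
#include <stdint.h>

#define CAP_4_BIT_DATA	(1u << 0)	/* hypothetical stand-in */
#define CAP_8_BIT_DATA	(1u << 1)	/* hypothetical stand-in */

/* Cumulative ">=" form: wider buses also advertise narrower widths. */
static uint32_t caps_from_wires(int wires)
{
	uint32_t caps = 0;

	if (wires >= 8)
		caps |= CAP_8_BIT_DATA;
	if (wires >= 4)
		caps |= CAP_4_BIT_DATA;
	return caps;
}
```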

Regards,
Madhu
> --
> balbi

== 2 of 2 ==
Date: Tues, Apr 6 2010 9:40 am
From: Felipe Balbi

On Tue, Apr 06, 2010 at 06:16:01PM +0200, ext Madhusudhan wrote:
>
>
>> -----Original Message-----
>> From: Felipe Balbi [mailto:me@felipebalbi.com]
>> Sent: Tuesday, April 06, 2010 12:01 AM
>> To: Madhusudhan
>> Cc: me@felipebalbi.com; 'kishore kadiyala'; 'Vimal Singh';
>> tony@atomide.com; svenkatr@ti.com; linux-omap@vger.kernel.org; linux-
>> kernel@vger.kernel.org; jarkko.lavinen@nokia.com
>> Subject: Re: [PATCH v3] OMAP: Fix for bus width which improves SD card's
>> peformance.
>>
>> Hi,
>>
>> On Mon, Apr 05, 2010 at 12:19:29PM -0500, Madhusudhan wrote:
>> > Since the first if command already checks for the 8-bit the second check
>> > like >= 4 is definitely not readable in my opinion.
>>
>> how come ???
>>
>> > Functionally do you see anything wrong with this patch??
>>
>> functionally no, but (hypothetical situation) and if on
>> omap4/5/6/whatever, omap controller supports a bigger bus width then
>> you'll have to add a line like:
>>
>> + if (mmc_slot(host).wires == 16)
>> + mmc->caps |= (MMC_CAP_16_BIT_DATA | MMC_CAP_8_BIT_DATA |
>> + MMC_CAP_4_BIT_DATA);
>> - if (mmc_slot(host).wires == 8)
>> + else if (mmc_slot(host).wires == 8)
>>
>> do you see the problem ?? In my opinion it doesn't scale well.
>>
>
>The point we should note here is that MMC spec supports a max bus width of
>8-bit. So anything beyond 8-bit is not in the picture as of today.

in that case, the code could be:

	WARN_ON(mmc_slot(host).wires > 8);

	if (mmc_slot(host).wires == 8)
		mmc->caps |= MMC_CAP_8_BIT_DATA;
	if (mmc_slot(host).wires >= 4)
		mmc->caps |= MMC_CAP_4_BIT_DATA;
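The WARN_ON() variant can be exercised outside the kernel with a small stand-in. This is a minimal sketch: warn_on() simply prints to stderr, and the CAP_* bits are hypothetical stand-ins for the MMC_CAP_* flags. The sketch makes one behavioral difference visible. An out-of-spec slot wider than 8 wires triggers the warning and falls back to 4-bit only, because the 8-bit test here is "==" rather than ">=".

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define CAP_4_BIT_DATA	(1u << 0)	/* hypothetical stand-in */
#define CAP_8_BIT_DATA	(1u << 1)	/* hypothetical stand-in */

/* User-space stand-in for WARN_ON(): report and return the condition. */
static bool warn_on(bool cond, const char *what)
{
	if (cond)
		fprintf(stderr, "WARNING: %s\n", what);
	return cond;
}

static uint32_t caps_from_wires_checked(int wires)
{
	uint32_t caps = 0;

	warn_on(wires > 8, "MMC bus wider than 8 bits");

	if (wires == 8)
		caps |= CAP_8_BIT_DATA;
	if (wires >= 4)
		caps |= CAP_4_BIT_DATA;
	return caps;
}
```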

--
balbi

==============================================================================
TOPIC: NFS: Fix RCU warnings in nfs_inode_return_delegation_noreclaim() [ver #2]
http://groups.google.com/group/linux.kernel/t/efc3d1e6a49a4cbe?hl=en
==============================================================================

== 1 of 1 ==
Date: Tues, Apr 6 2010 9:20 am
From: David Howells

Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote:

> > > So you have objected to needless memory barriers. How do you feel
> > > about possibly needless ACCESS_ONCE() calls?
> >
> > That would work here since it shouldn't emit any excess instructions.
>
> And here is the corresponding patch. Seem reasonable?

Actually, now that I've thought about it some more: no, it's not reasonable.
You've written:

This patch adds a variant of rcu_dereference() that handles situations
where the RCU-protected data structure cannot change, perhaps due to
our holding the update-side lock, or where the RCU-protected pointer is
only to be tested, not dereferenced.

But if we hold the update-side lock, then why should we be forced to use
ACCESS_ONCE()?

In fact, if we don't hold the lock, but we want to test the pointer twice in
succession, why should we be required to use ACCESS_ONCE()?

David
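For readers unfamiliar with ACCESS_ONCE(), the case under debate is a lockless "test the pointer, then use it" sequence. The macro forces a single load through a volatile access. Copying that load into a local guarantees that both the NULL test and the later use see the same value, even if an updater changes the field in between. Below is a minimal user-space sketch under those assumptions. The macro matches the kernel's definition from this era (__typeof__ is a GCC/Clang extension), and the structs are hypothetical. This is not rcu_dereference(): it provides no memory-ordering guarantees.

```c
#include <stddef.h>

/* Kernel definition of this era (include/linux/compiler.h). */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

struct delegation { unsigned long flags; };			/* hypothetical */
struct nfs_inode_sketch { struct delegation *delegation; };	/* hypothetical */

/*
 * Load the pointer exactly once, then test and use the local copy.
 * Two separate plain reads of nfsi->delegation could observe
 * different values if an updater ran in between.
 */
static unsigned long delegation_flags(struct nfs_inode_sketch *nfsi)
{
	struct delegation *d = ACCESS_ONCE(nfsi->delegation);

	return d != NULL ? d->flags : 0;
}
```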

==============================================================================
TOPIC: An incorrect assumption over radix_tree_tag_get()
http://groups.google.com/group/linux.kernel/t/bcc75fb93cd52d92?hl=en
==============================================================================

== 1 of 1 ==
Date: Tues, Apr 6 2010 9:30 am
From: David Howells

Hi,

I think I've made a bad assumption about my usage of radix_tree_tag_get() in
fs/fscache/page.c.

I've assumed that radix_tree_tag_get() is protected from radix_tree_tag_set()
and radix_tree_tag_clear() by the RCU read lock. However, now I'm not so
sure. I think it's only protected against removal of part of the tree.

Can you confirm?

==============================================================================
TOPIC: regulator: regulator_get behaviour without CONFIG_REGULATOR set
http://groups.google.com/group/linux.kernel/t/30191bf6a382ba77?hl=en
==============================================================================

== 1 of 1 ==
Date: Tues, Apr 6 2010 9:30 am
From: Jonathan Cameron

On 04/06/10 16:27, Liam Girdwood wrote:
> On Mon, 2010-04-05 at 14:23 +0100, Mark Brown wrote:
>> On Sat, Apr 03, 2010 at 05:37:45PM +0200, Jean Delvare wrote:
>>> On Fri, 2 Apr 2010 21:45:03 +0100, Mark Brown wrote:
>>
>>>> In this case you don't need the if (voltage) check - the code that uses
>>>> supply_uV is going to have to cope with it being set to 0 if the driver
>>>> doesn't just give up, and the enable wants to happen anyway (perhaps
>>>> we've got a switchable supply we can't read the voltage of). It should
>>>> never make any odds if the notifier never gets called since the supply
>>>> could be invariant.
>>
>>> We still need to check if (voltage) to not overwrite the previous value
>>> of data->supply_uV with 0. We will probably do that as an immediate fix
>>> to the sht15 driver. But yes, the rest doesn't need a condition.
>>
>> I was assuming that there wasn't a previous value since this was in
>> probe(), sorry.
>>
>>> Still, I'd prefer if drivers were just able to check for data->reg ==
>>> NULL and skip the whole thing. Would you apply the following patch?
>>
>>> From: Jean Delvare <khali@linux-fr.org>
>>> Subject: regulator: Let drivers know when they use the stub API
>>
>>> Have the stub variant of regulator_get() return NULL, so that drivers
>>> can (but still don't have to) handle this case specifically.
>>
>> I guess I'll ack it but I'd be suspicious of driver code which actually
>> makes use of this - there is actual hardware which has the same features
>> as the regulator that gets stubbed in and ought to be handled. On the
>> other hand, perhaps someone will come up with a good use for it.
>>
>> It also seems a bit odd to return a traditional error value in a success
>> case but it doesn't actually make much difference.
>
> Thanks, I've applied this with Mark's Ack.
>
> I suppose this is something we may look into more when we have more
> clients.

Makes sense. There will probably be quite a few IIO drivers over the next
few months doing much the same as sht15, where the voltage reference for a
device may well be fed by a regulator. In that case, we may only offer the
option of using an external v_ref if the regulator is available. Many devices
have an internal regulator to provide it, so typically we'll start them up
using that and provide an interface to switch to an external regulator if one
is available. I haven't yet thought through exactly how this will work. I'll
CC people when this comes up.
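The pattern discussed in this thread, with regulator_get() returning NULL when CONFIG_REGULATOR is not set, lets a driver skip the whole regulator path. It can also keep a known supply voltage from being overwritten by an unknown one. Below is a minimal user-space sketch of that decision. The struct and function names are hypothetical and only loosely modeled on sht15. The caller is assumed to have already read the voltage. Error pointers from a real regulator_get() are not modeled.

```c
#include <stdbool.h>
#include <stddef.h>

struct regulator;	/* opaque, as in the kernel API */

/* Hypothetical per-device state. */
struct vref_state {
	int supply_uV;		/* last known supply voltage */
	bool external_vref;	/* may we offer the external reference? */
};

/*
 * reg: what the (stub) regulator_get() returned; NULL means no
 * regulator support. voltage: what the caller read from it; values
 * <= 0 (unknown or an error) leave the previous reading in place.
 */
static void setup_vref(struct vref_state *st, const struct regulator *reg,
		       int voltage)
{
	st->external_vref = false;
	if (reg == NULL)
		return;		/* stay on the internal reference */

	if (voltage > 0)
		st->supply_uV = voltage;
	st->external_vref = true;
}
```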

Thanks,

Jonathan

==============================================================================
TOPIC: VMware Balloon driver
http://groups.google.com/group/linux.kernel/t/bcacb0fe945d573b?hl=en
==============================================================================

== 1 of 1 ==
Date: Tues, Apr 6 2010 9:30 am
From: Avi Kivity

On 04/06/2010 01:40 AM, Andrew Morton wrote:
> On Tue, 06 Apr 2010 01:26:11 +0300
> Avi Kivity<avi@redhat.com> wrote:
>
>
>> On 04/06/2010 01:17 AM, Andrew Morton wrote:
>>
>>>> The basic idea of the driver is to allow a guest system to give up
>>>> memory it isn't using so it can be reused by other virtual machines (or
>>>> the host itself).
>>>>
>>>>
>>> So... does this differ in any fundamental way from what hibernation
>>> does, via shrink_all_memory()?
>>>
>>>
>> Just the _all_ bit, and the fact that we need to report the freed page
>> numbers to the hypervisor.
>>
>>
> So... why not tweak that, rather than implementing some parallel thing?
>

That's maybe 5 lines of code. Most of the code is focused on
interpreting requests from the hypervisor and replying with the page
numbers.
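The two jobs Avi describes, interpreting the host's request and replying with page numbers, can be sketched in a few lines. This is a hypothetical illustration, not the VMware driver's actual protocol. It assumes the host sends a target balloon size in pages and that pages are 4 KiB, as on x86. A positive delta means inflate (give pages back to the host), and a negative delta means deflate.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* 4 KiB pages (assumption) */

/* Interpret a target-size request: pages to inflate (+) or deflate (-). */
static long balloon_delta(unsigned long current_pages,
			  unsigned long target_pages)
{
	return (long)target_pages - (long)current_pages;
}

/*
 * Reply side: convert guest-physical addresses of released pages
 * into page frame numbers. Returns the number of entries written.
 */
static size_t addrs_to_pfns(const uint64_t *addrs, size_t n,
			    uint64_t *pfns, size_t max)
{
	size_t i;

	for (i = 0; i < n && i < max; i++)
		pfns[i] = addrs[i] >> SKETCH_PAGE_SHIFT;
	return i;
}
```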

--
error compiling committee.c: too many arguments to function

==============================================================================

You received this message because you are subscribed to the Google Groups "linux.kernel"
group.

To post to this group, visit http://groups.google.com/group/linux.kernel?hl=en

To unsubscribe from this group, send email to linux.kernel+unsubscribe@googlegroups.com

To change the way you get mail from this group, visit:
http://groups.google.com/group/linux.kernel/subscribe?hl=en

To report abuse, send email explaining the problem to abuse@googlegroups.com

==============================================================================
Google Groups: http://groups.google.com/?hl=en

twitter

Tuesday, April 6, 2010

linux.kernel - 26 new messages in 16 topics - digest
