twitter: linux.kernel - 26 new messages in 18 topics

linux.kernel
http://groups.google.com/group/linux.kernel?hl=en

Today's topics:

* sysfs: Only take active references on attributes. - 3 messages, 1 author
http://groups.google.com/group/linux.kernel/t/63818dfc875730b2?hl=en
* slab: add memory hotplug support - 3 messages, 2 authors
http://groups.google.com/group/linux.kernel/t/a8beda1232363b5e?hl=en
* kconfig: place git SHA1 in .config output if in SCM - 2 messages, 2 authors
http://groups.google.com/group/linux.kernel/t/95d6e65f1985f2b9?hl=en
* mmc: omap_hsmmc: Fix conditional locking - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/944b336e5366644e?hl=en
* nfs: use 4*rsize readahead size - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/0ded33f7779e13c7?hl=en
* ACPI, APEI, PCIE AER, use general HEST table parsing in AER firmware_first
setup - 2 messages, 2 authors
http://groups.google.com/group/linux.kernel/t/7603bbf89a910ed1?hl=en
* snet: Security for NETwork syscalls - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/2ba3cee1400ac233?hl=en
* [PATCH]Support MCP89 and GT21x hdmi audio - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/16c949f2d92ac0a6?hl=en
* introduce sys_membarrier(): process-wide memory barrier (v9) - 1 messages, 1
author
http://groups.google.com/group/linux.kernel/t/c66948a2bac76935?hl=en
* binfmt_elf: plug a memory leak situation on dump_seek() - 2 messages, 2
authors
http://groups.google.com/group/linux.kernel/t/5bec5866c9bd2261?hl=en
* x86/mm fixes - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/65bf90b96de084b0?hl=en
* memcg: dirty pages instrumentation - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/98b8f3d66410be44?hl=en
* kconfig: place git SHA1 in .config output if in git tree - 1 messages, 1
author
http://groups.google.com/group/linux.kernel/t/9f6a7cb3a5d924c8?hl=en
* pci: move pci_set_dma_mask and pci_set_consistent_dma_mask to pci-dma-compat.h - 2 messages, 2 authors
http://groups.google.com/group/linux.kernel/t/b89576fbf68181bd?hl=en
* 2.6.33 dies on modprobe - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/89ac756ead52e900?hl=en
* [GIT PULL] - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/5cc69c0f0961ced2?hl=en
* nommu: get_user_pages(): pin last page on non-page-aligned start - 1
messages, 1 author
http://groups.google.com/group/linux.kernel/t/b6c94687ec99497a?hl=en
* mmotm 2010-03-02-18-38 uploaded - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/507a22f61ec0d703?hl=en

==============================================================================
TOPIC: sysfs: Only take active references on attributes.
http://groups.google.com/group/linux.kernel/t/63818dfc875730b2?hl=en
==============================================================================

== 1 of 3 ==
Date: Tues, Mar 2 2010 5:30 pm
From: Tejun Heo

On 03/03/2010 08:28 AM, Greg Kroah-Hartman wrote:
> From: Eric W. Biederman <ebiederm@xmission.com>
>
> If we exclude directories and symlinks from the set of sysfs
> dirents where we need active references we are left with
> sysfs attributes (binary or not).
>
> - Tweak sysfs_deactivate to only do something on attributes
> - Move lockdep initialization into sysfs_file_add_mode to
> limit it to just attributes.
>
> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
> Acked-by: WANG Cong <xiyou.wangcong@gmail.com>
> Cc: Tejun Heo <tj@kernel.org>
> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

Acked-by: Tejun Heo <tj@kernel.org>

--
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

== 2 of 3 ==
Date: Tues, Mar 2 2010 5:40 pm
From: Tejun Heo

On 03/03/2010 08:28 AM, Greg Kroah-Hartman wrote:
> From: Eric W. Biederman <ebiederm@xmission.com>
>
> These are the non-static sysfs attributes that exist on
> my test machine. Fix them to use sysfs_attr_init or
> sysfs_bin_attr_init as appropriate. It simply requires
> making a sysfs attribute present to see this. So this
> is a little bit tedious but otherwise not too bad.
>
> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
> Acked-by: WANG Cong <xiyou.wangcong@gmail.com>
> Cc: Tejun Heo <tj@kernel.org>
> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

Acked-by: Tejun Heo <tj@kernel.org>

Thanks.

== 3 of 3 ==
Date: Tues, Mar 2 2010 5:40 pm
From: Tejun Heo

On 03/03/2010 08:28 AM, Greg Kroah-Hartman wrote:
> From: Eric W. Biederman <ebiederm@xmission.com>
>
> Acknowledge that the logical sysfs rwsem has one instance per
> sysfs attribute with different locking dependencies for different
> attributes.
>
> There is a sysfs idiom where writing to one sysfs file causes the
> addition or removal of other sysfs files. Lumping all of the
> sysfs attributes together in one lock class causes lockdep to
> generate lots of false positives.
>
> This introduces the requirement that non-static sysfs attributes
> need to be initialized with sysfs_attr_init or sysfs_bin_attr_init.
> Strictly speaking this requirement only exists when lockdep is
> enabled, and when lockdep is enabled we get a big fat warning
> if this requirement is not met.
>
> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
> Acked-by: WANG Cong <xiyou.wangcong@gmail.com>
> Cc: Tejun Heo <tj@kernel.org>
> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

Acked-by: Tejun Heo <tj@kernel.org>, but it would be nice if the code
explained when and how attr->key is set for each case and how it's
supposed to trigger the big fat warning. It's a bit difficult to follow
currently.

==============================================================================
TOPIC: slab: add memory hotplug support
http://groups.google.com/group/linux.kernel/t/a8beda1232363b5e?hl=en
==============================================================================

== 1 of 3 ==
Date: Tues, Mar 2 2010 5:40 pm
From: KAMEZAWA Hiroyuki

On Tue, 2 Mar 2010 14:20:06 -0600 (CST)
Christoph Lameter <cl@linux-foundation.org> wrote:

>
> Not sure how this would sync with slab use during node bootstrap and
> shutdown. Kame-san?
>
> Otherwise
>
> Acked-by: Christoph Lameter <cl@linux-foundation.org>
>

What does this patch fix? Maybe I'm missing something...

At node hot-add

* pgdat is allocated from another node (because we have no memory for "nid")
* memmap for the first section (and possibly others) will be allocated from
other nodes.
* Once a section for the node is onlined, any memory can be allocated locally.

(Allocating memory from the local node requires some new implementation, such
as a bootmem allocator; we haven't done that.)

Before this patch, slab's control layer is allocated by cpuhotplug.
So, at least keeping this order,
memory online -> cpu online
slab's control layer is allocated from local node.

When node-hotadd is done in this order
cpu online -> memory online
kmalloc_node() will allocate memory from other node via fallback.

After this patch, slab's control layer is allocated by memory hotplug.
Then, in any order, slab's control will be allocated via fallback routine.

If this patch is an alternative fix for this logic of Andi's
==
Index: linux-2.6.32-memhotadd/mm/slab.c
===================================================================
--- linux-2.6.32-memhotadd.orig/mm/slab.c
+++ linux-2.6.32-memhotadd/mm/slab.c
@@ -4093,6 +4093,9 @@ static void cache_reap(struct work_struc
* we can do some work if the lock was obtained.
*/
l3 = searchp->nodelists[node];
+ /* Note node yet set up */
+ if (!l3)
+ break;
==
I'm not sure this really happens.

cache_reap() is for checking local node. The caller is set up by
CPU_ONLINE. searchp->nodelists[] is filled by CPU_PREPARE.

Then, cpu for the node should be onlined. (and it's done under proper mutex.)

I'm sorry if I'm missing something important. But how can anyone cause this?

Thanks,
-Kame


== 2 of 3 ==
Date: Tues, Mar 2 2010 6:40 pm
From: David Rientjes

On Wed, 3 Mar 2010, KAMEZAWA Hiroyuki wrote:

> At node hot-add
>
> * pgdat is allocated from another node (because we have no memory for "nid")
> * memmap for the first section (and possibly others) will be allocated from
> other nodes.
> * Once a section for the node is onlined, any memory can be allocated locally.
>

Correct, and the struct kmem_list3 is also allocated from other nodes with
my patch.

> (Allocating memory from the local node requires some new implementation, such
> as a bootmem allocator; we haven't done that.)
>
> Before this patch, slab's control layer is allocated by cpuhotplug.
> So, at least keeping this order,
> memory online -> cpu online
> slab's control layer is allocated from local node.
>
> When node-hotadd is done in this order
> cpu online -> memory online
> kmalloc_node() will allocate memory from other node via fallback.
>
> After this patch, slab's control layer is allocated by memory hotplug.
> Then, in any order, slab's control will be allocated via fallback routine.
>

Again, this addresses memory hotplug that requires a new node to be
onlined that does not have corresponding cpus that are being onlined. On
x86, these represent ACPI_SRAT_MEM_HOT_PLUGGABLE regions that are onlined
either by the acpi hotplug or done manually with CONFIG_ARCH_MEMORY_PROBE.
On other architectures such as powerpc, this is done in different ways.

All of this is spelled out in the changelog for the patch.

== 3 of 3 ==
Date: Tues, Mar 2 2010 7:00 pm
From: KAMEZAWA Hiroyuki

On Tue, 2 Mar 2010 18:39:20 -0800 (PST)
David Rientjes <rientjes@google.com> wrote:

> On Wed, 3 Mar 2010, KAMEZAWA Hiroyuki wrote:
>
> > At node hot-add
> >
> > * pgdat is allocated from another node (because we have no memory for "nid")
> > * memmap for the first section (and possibly others) will be allocated from
> > other nodes.
> > * Once a section for the node is onlined, any memory can be allocated locally.
> >
>
> Correct, and the struct kmem_list3 is also allocated from other nodes with
> my patch.
>
> > (Allocating memory from the local node requires some new implementation, such
> > as a bootmem allocator; we haven't done that.)
> >
> > Before this patch, slab's control layer is allocated by cpuhotplug.
> > So, at least keeping this order,
> > memory online -> cpu online
> > slab's control layer is allocated from local node.
> >
> > When node-hotadd is done in this order
> > cpu online -> memory online
> > kmalloc_node() will allocate memory from other node via fallback.
> >
> > After this patch, slab's control layer is allocated by memory hotplug.
> > Then, in any order, slab's control will be allocated via fallback routine.
> >
>
> Again, this addresses memory hotplug that requires a new node to be
> onlined that does not have corresponding cpus that are being onlined. On
> x86, these represent ACPI_SRAT_MEM_HOT_PLUGGABLE regions that are onlined
> either by the acpi hotplug or done manually with CONFIG_ARCH_MEMORY_PROBE.
> On other architectures such as powerpc, this is done in different ways.
>
> All of this is spelled out in the changelog for the patch.
>
Ah, ok. This is for a cpu-less node and kmalloc_node() against that node.

Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

Thanks,
-Kame

==============================================================================
TOPIC: kconfig: place git SHA1 in .config output if in SCM
http://groups.google.com/group/linux.kernel/t/95d6e65f1985f2b9?hl=en
==============================================================================

== 1 of 2 ==
Date: Tues, Mar 2 2010 5:40 pm
From: Linus Torvalds

On Tue, 2 Mar 2010, Paul E. McKenney wrote:
> + env = getenv(SRCTREE);
> + if (env) {
> + sprintf(cmdline,
> + "%s/scripts/setlocalversion %s 2> /dev/null",
> + env, env);
> + slv = popen(cmdline, "r");

I suspect this does various bad things if there are spaces or special
characters in $SRCTREE.

It would be a lot safer to use fork/execve rather than something
that interprets a shell command line.

Of course, I didn't check that all our old users of SRCTREE are safe
either, but at least docproc.c (the one I _did_ check) uses 'execvp()' and
'fopen()' that both take real filenames, not a shell string.

Linus

== 2 of 2 ==
Date: Tues, Mar 2 2010 6:20 pm
From: "Paul E. McKenney"

On Tue, Mar 02, 2010 at 05:29:50PM -0800, Linus Torvalds wrote:
>
>
> On Tue, 2 Mar 2010, Paul E. McKenney wrote:
> > + env = getenv(SRCTREE);
> > + if (env) {
> > + sprintf(cmdline,
> > + "%s/scripts/setlocalversion %s 2> /dev/null",
> > + env, env);
> > + slv = popen(cmdline, "r");
>
> I suspect this does various bad things if there are spaces or special
> characters in $SRCTREE.
>
> It would be a lot safer to use fork/execve rather than something
> that interprets a shell command line.
>
> Of course, I didn't check that all our old users of SRCTREE are safe
> either, but at least docproc.c (the one I _did_ check) uses 'execvp()' and
> 'fopen()' that both take real filenames, not a shell string.

Well, we certainly don't want or need bash's "$", "``", and other
interpretations in this case. I will update and send out a new patch.

Hmmm... It has been one good long time since I have used pipe(), dup2(),
exec*(), and friends. In happy contrast to last time, some of the man
pages now seem to have nice examples. ;-)
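
For readers following along, the pipe()/dup2()/exec*() approach discussed
in this thread can be sketched roughly as below. This is only an
illustration, not the updated patch: the helper name run_capture() and its
signature are made up here, and unlike the quoted popen() command line it
does not redirect the child's stderr.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: run argv[0] with the given arguments and capture
 * its stdout in buf. No shell is involved, so spaces, "$" and backquotes
 * in the arguments are passed through literally.
 * Returns the number of bytes captured, or -1 on failure.
 */
static ssize_t run_capture(char *const argv[], char *buf, size_t len)
{
	int fds[2];
	pid_t pid;
	ssize_t n, total = 0;
	int status;

	assert(argv && argv[0] && buf && len > 0);

	if (pipe(fds) < 0)
		return -1;

	pid = fork();
	if (pid < 0) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}

	if (pid == 0) {
		/* Child: make the pipe's write end our stdout, then exec. */
		close(fds[0]);
		if (dup2(fds[1], STDOUT_FILENO) < 0)
			_exit(127);
		close(fds[1]);
		execvp(argv[0], argv);
		_exit(127);	/* only reached if execvp() failed */
	}

	/* Parent: read until EOF, leaving room for the terminating NUL. */
	close(fds[1]);
	while ((size_t)total < len - 1 &&
	       (n = read(fds[0], buf + total, len - 1 - total)) > 0)
		total += n;
	buf[total] = '\0';
	close(fds[0]);

	if (waitpid(pid, &status, 0) < 0)
		return -1;
	if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
		return -1;
	return total;
}
```

Because the arguments go to execvp() as an array, a source tree path with
spaces or shell metacharacters reaches the child unchanged. If the output
does not fit in buf, the child may be killed by SIGPIPE and the helper then
reports failure.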

Thanx, Paul

==============================================================================
TOPIC: mmc: omap_hsmmc: Fix conditional locking
http://groups.google.com/group/linux.kernel/t/944b336e5366644e?hl=en
==============================================================================

== 1 of 1 ==
Date: Tues, Mar 2 2010 5:40 pm
From: "Madhusudhan"

> -----Original Message-----
> From: linux-mmc-owner@vger.kernel.org [mailto:linux-mmc-
> owner@vger.kernel.org] On Behalf Of Thomas Gleixner
> Sent: Monday, March 01, 2010 1:02 PM
> To: LKML
> Cc: linux-omap@vger.kernel.org; linux-mmc@vger.kernel.org
> Subject: [PATCH] mmc: omap_hsmmc: Fix conditional locking
>
> Conditional locking on (!in_interrupt()) is broken by design and there
> is no reason to keep the host->irq_lock across the call to
> mmc_request_done(). Also the host->protect_card magic hack does not
> depend on the context
>

Can you please elaborate why the existing logic is broken?

It locks at the new request and unlocks just before issuing the cmd. Further
IRQ handler has these calls hence the !in_interrupt check.

How does this patch improve that? In fact with your patch for a data
transfer cmd there are several lock-unlock calls.

> Fix the mess by dropping host->irq_lock before calling
> mmc_request_done().
>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
>
> diff --git a/drivers/mmc/host/omap_hsmmc.c b/drivers/mmc/host/omap_hsmmc.c
> index 4b23225..99a3383 100644
> --- a/drivers/mmc/host/omap_hsmmc.c
> +++ b/drivers/mmc/host/omap_hsmmc.c
> @@ -468,14 +468,6 @@ omap_hsmmc_start_command(struct omap_hsmmc_host
> *host, struct mmc_command *cmd,
> if (host->use_dma)
> cmdreg |= DMA_EN;
>
> - /*
> - * In an interrupt context (i.e. STOP command), the spinlock is unlocked
> - * by the interrupt handler, otherwise (i.e. for a new request) it is
> - * unlocked here.
> - */
> - if (!in_interrupt())
> - spin_unlock_irqrestore(&host->irq_lock, host->flags);
> -
> OMAP_HSMMC_WRITE(host->base, ARG, cmd->arg);
> OMAP_HSMMC_WRITE(host->base, CMD, cmdreg);
> }
> @@ -506,7 +498,9 @@ omap_hsmmc_xfer_done(struct omap_hsmmc_host *host,
> struct mmc_data *data)
> }
>
> host->mrq = NULL;
> + spin_unlock(&host->irq_lock);
> mmc_request_done(host->mmc, mrq);
> + spin_lock(&host->irq_lock);
> return;
> }
>
> @@ -523,7 +517,9 @@ omap_hsmmc_xfer_done(struct omap_hsmmc_host *host,
> struct mmc_data *data)
>
> if (!data->stop) {
> host->mrq = NULL;
> + spin_unlock(&host->irq_lock);
> mmc_request_done(host->mmc, data->mrq);
> + spin_lock(&host->irq_lock);
> return;
> }
> omap_hsmmc_start_command(host, data->stop, NULL);
> @@ -551,7 +547,9 @@ omap_hsmmc_cmd_done(struct omap_hsmmc_host *host,
> struct mmc_command *cmd)
> }
> if ((host->data == NULL && !host->response_busy) || cmd->error) {
> host->mrq = NULL;
> + spin_unlock(&host->irq_lock);
> mmc_request_done(host->mmc, cmd->mrq);
> + spin_lock(&host->irq_lock);
> }
> }
>
> @@ -1077,37 +1075,31 @@ static void omap_hsmmc_request(struct mmc_host
> *mmc, struct mmc_request *req)
> struct omap_hsmmc_host *host = mmc_priv(mmc);
> int err;
>
> + spin_lock_irqsave(&host->irq_lock, host->flags);
> /*
> - * Prevent races with the interrupt handler because of unexpected
> - * interrupts, but not if we are already in interrupt context i.e.
> - * retries.
> + * Protect the card from I/O if there is a possibility
> + * it can be removed.
> */
> - if (!in_interrupt()) {
> - spin_lock_irqsave(&host->irq_lock, host->flags);
> - /*
> - * Protect the card from I/O if there is a possibility
> - * it can be removed.
> - */
> - if (host->protect_card) {
> - if (host->reqs_blocked < 3) {
> - /*
> - * Ensure the controller is left in a consistent
> - * state by resetting the command and data state
> - * machines.
> - */
> - omap_hsmmc_reset_controller_fsm(host, SRD);
> - omap_hsmmc_reset_controller_fsm(host, SRC);
> - host->reqs_blocked += 1;
> - }
> - req->cmd->error = -EBADF;
> - if (req->data)
> - req->data->error = -EBADF;
> - spin_unlock_irqrestore(&host->irq_lock, host->flags);
> - mmc_request_done(mmc, req);
> - return;
> - } else if (host->reqs_blocked)
> - host->reqs_blocked = 0;
> - }
> + if (host->protect_card) {
> + if (host->reqs_blocked < 3) {
> + /*
> + * Ensure the controller is left in a consistent
> + * state by resetting the command and data state
> + * machines.
> + */
> + omap_hsmmc_reset_controller_fsm(host, SRD);
> + omap_hsmmc_reset_controller_fsm(host, SRC);
> + host->reqs_blocked += 1;
> + }
> + req->cmd->error = -EBADF;
> + if (req->data)
> + req->data->error = -EBADF;
> + spin_unlock_irqrestore(&host->irq_lock, host->flags);
> + mmc_request_done(mmc, req);
> + return;
> + } else if (host->reqs_blocked)
> + host->reqs_blocked = 0;
> +
> WARN_ON(host->mrq != NULL);
> host->mrq = req;
> err = omap_hsmmc_prepare_data(host, req);
> @@ -1116,13 +1108,13 @@ static void omap_hsmmc_request(struct mmc_host
> *mmc, struct mmc_request *req)
> if (req->data)
> req->data->error = err;
> host->mrq = NULL;
> - if (!in_interrupt())
> - spin_unlock_irqrestore(&host->irq_lock, host->flags);
> + spin_unlock_irqrestore(&host->irq_lock, host->flags);
> mmc_request_done(mmc, req);
> return;
> }
>
> omap_hsmmc_start_command(host, req->cmd, req->data);
> + spin_unlock_irqrestore(&host->irq_lock, host->flags);
> }
>
> /* Routine to configure clock values. Exposed API to core */
> --
> To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
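
The locking shape the quoted patch moves to (take the lock unconditionally
on entry, drop it around the completion callback) can be modelled in user
space roughly as follows. Everything named toy_* is hypothetical, the
"lock" is a plain flag that asserts on double acquisition, and the
assumption that the completion callback may submit a new request is taken
from the removed comment about retries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy non-recursive lock: acquiring it twice trips the assertion. */
struct toy_lock { bool held; };

static void toy_lock_acquire(struct toy_lock *l) { assert(!l->held); l->held = true; }
static void toy_lock_release(struct toy_lock *l) { assert(l->held); l->held = false; }

struct toy_host {
	struct toy_lock lock;
	int *current;		/* request in flight, or NULL */
	int completed;		/* number of completions seen */
	int retries_left;	/* how often the callback resubmits */
};

static void toy_request(struct toy_host *host, int *req);

/* Completion callback; it may submit a new request (a retry). */
static void toy_request_done(struct toy_host *host, int *req)
{
	host->completed++;
	if (host->retries_left > 0) {
		host->retries_left--;
		toy_request(host, req);
	}
}

/* Called with host->lock held, e.g. from the interrupt handler. */
static void toy_xfer_done(struct toy_host *host)
{
	int *req = host->current;

	host->current = NULL;
	toy_lock_release(&host->lock);	/* drop the lock around the callback */
	toy_request_done(host, req);
	toy_lock_acquire(&host->lock);
}

/* Request entry point: locks unconditionally, in any context. */
static void toy_request(struct toy_host *host, int *req)
{
	toy_lock_acquire(&host->lock);
	assert(host->current == NULL);
	host->current = req;
	toy_lock_release(&host->lock);
}

/* Stand-in for the interrupt handler. */
static void toy_irq(struct toy_host *host)
{
	toy_lock_acquire(&host->lock);
	if (host->current)
		toy_xfer_done(host);
	toy_lock_release(&host->lock);
}
```

If toy_xfer_done() kept the lock across the callback, the nested
toy_request() would trip the assertion, which is the situation the old
!in_interrupt() test was working around.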

==============================================================================
TOPIC: nfs: use 4*rsize readahead size
http://groups.google.com/group/linux.kernel/t/0ded33f7779e13c7?hl=en
==============================================================================

== 1 of 1 ==
Date: Tues, Mar 2 2010 5:50 pm
From: Wu Fengguang

On Wed, Mar 03, 2010 at 04:14:33AM +0800, Bret Towe wrote:

> how do you determine which bdi to use? I skimmed thru
> the filesystem in /sys and didn't see anything that says which is what

MOUNTPOINT=" /mnt/ext4_test "
# grep "$MOUNTPOINT" /proc/$$/mountinfo|awk '{print $3}'
0:24
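
The same lookup can be sketched in C, assuming the usual mountinfo field
order (mount ID, parent ID, major:minor, root, mount point, ...). The
function name mountinfo_devid() is made up for this illustration, and
mount points containing spaces (escaped as \040 in mountinfo) are not
handled.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Parse one /proc/<pid>/mountinfo line. If its mount point (field 5)
 * equals `mountpoint`, copy the "major:minor" device ID (field 3) into
 * `dev` and return 1; otherwise return 0.
 */
static int mountinfo_devid(const char *line, const char *mountpoint,
			   char *dev, size_t devlen)
{
	char id[32], mnt[256];

	assert(line && mountpoint && dev && devlen > 0);

	/* Fields: mount-ID parent-ID major:minor root mount-point ... */
	if (sscanf(line, "%*d %*d %31s %*s %255s", id, mnt) != 2)
		return 0;
	if (strcmp(mnt, mountpoint) != 0)
		return 0;
	snprintf(dev, devlen, "%s", id);
	return 1;
}
```

The resulting ID (such as 0:24 above) is the name of the bdi directory,
which on kernels that expose it appears under /sys/class/bdi/.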

Thanks,
Fengguang

==============================================================================
TOPIC: ACPI, APEI, PCIE AER, use general HEST table parsing in AER firmware_first setup
http://groups.google.com/group/linux.kernel/t/7603bbf89a910ed1?hl=en
==============================================================================

== 1 of 2 ==
Date: Tues, Mar 2 2010 5:50 pm
From: Huang Ying

On Tue, 2010-03-02 at 19:04 +0800, Hidetoshi Seto wrote:
> (2010/03/02 18:13), Huang Ying wrote:
> > On Tue, 2010-03-02 at 16:09 +0800, Hidetoshi Seto wrote:
> >> The aer_init() will be called for root ports, but not for end point
> >> devices or so on. So please remain the firmware_first setup code in
> >> PCI core. Otherwise endpoint drivers will get success on call of
> >> pci_enable_pcie_error_reporting() regardless of the firmware first.
> >
> > Or we can call firmware_first setup code in
> > pci_enable_pcie_error_reporting(), because
> >
> > 1. I think AER related code should be put in drivers/pci/pcie/aer
> > instead of PCI core or drivers/acpi, if it is possible.
> >
> > 2. pci_setup_device is called so early, so that it is hard to do some
> > HEST related initialization (such as checking bad format) before it.
>
> I understand the feeling, but before agreeing with your
> proposal, I'd like to have an answer of a question:
>
> - Is it necessary to setup the firmware_first flag
> for an endpoint even if the endpoint's driver never
> call pci_enable_pcie_error_reporting()?
>
> According to the current implementation, no driver refers
> to a firmware_first flag other than the one
> it owns. However I guess that the flag will be necessary
> for AER driver (i.e. aerdrv_core) in near future, because
> we can use the flag to determine whether the AER driver
> can check the device or not, when it is required to walk
> pci bus hierarchy to find an erroneous device.
>
> For example, assume that there are 2 endpoints under the same
> root port. One is a (likely on-board) "firmware first" endpoint,
> with a driver which does not call pci_enable_pcie_error_reporting()
> (because of no interest in AER, or just not implemented yet,
> anyway). The other is (likely a card seated in a slot) not
> firmware first, with a better driver which can handle its AER.
> If my understanding is correct and if everything goes well,
> errors on one should be reported via APEI while the other should
> be reported via AER driver.

Yes. I think this should be supported. How about something as follows?

struct pci_dev {
	...
	unsigned int __firmware_first:2;
	...
};

int pcie_aer_get_firmware_first(struct pci_dev *dev)
{
	if (!dev->__firmware_first)
		aer_set_firmware_first(dev);
	return dev->__firmware_first & 0x1;
}

Then we use pcie_aer_get_firmware_first() instead of dev->firmware_first
directly.

Best Regards,
Huang Ying

== 2 of 2 ==
Date: Tues, Mar 2 2010 6:40 pm
From: Hidetoshi Seto

(2010/03/03 10:43), Huang Ying wrote:
>> For example, assume that there are 2 endpoints under the same
>> root port. One is a (likely on-board) "firmware first" endpoint,
>> with a driver which does not call pci_enable_pcie_error_reporting()
>> (because of no interest in AER, or just not implemented yet,
>> anyway). The other is (likely a card seated in a slot) not
>> firmware first, with a better driver which can handle its AER.
>> If my understanding is correct and if everything goes well,
>> errors on one should be reported via APEI while the other should
>> be reported via AER driver.
>
> Yes. I think this should be supported. How about something as follows?
>
> struct pci_dev {
> 	...
> 	unsigned int __firmware_first:2;
> 	...
> };
>
> int pcie_aer_get_firmware_first(struct pci_dev *dev)
> {
> 	if (!dev->__firmware_first)
> 		aer_set_firmware_first(dev);
> 	return dev->__firmware_first & 0x1;
> }
>
> Then we use pcie_aer_get_firmware_first() instead of dev->firmware_first
> directly.

Looks reasonable. I think the following is more straightforward:

struct pci_dev {
	...
	unsigned int __firmware_first_valid:1;
	unsigned int __firmware_first:1;
	...
};

int pcie_aer_get_firmware_first(struct pci_dev *dev)
{
	if (!dev->__firmware_first_valid)
		aer_set_firmware_first(dev);
	return dev->__firmware_first;
}
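
A stand-alone model of this lazy lookup, compilable in user space, might
look like the sketch below. The toy_* names, the lookups counter and the
hest_says_firmware_first field are all invented for the illustration; in
particular the setter is assumed to mark the result valid, which the
snippets in this thread do not show.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the relevant part of struct pci_dev. */
struct toy_pci_dev {
	unsigned int firmware_first_valid:1;	/* has the lookup run yet? */
	unsigned int firmware_first:1;		/* cached lookup result */
	int lookups;				/* instrumentation for this sketch */
	int hest_says_firmware_first;		/* stand-in for the HEST contents */
};

/* Stand-in for aer_set_firmware_first(): look up once and cache. */
static void toy_set_firmware_first(struct toy_pci_dev *dev)
{
	dev->lookups++;
	dev->firmware_first = dev->hest_says_firmware_first ? 1 : 0;
	dev->firmware_first_valid = 1;	/* set even when the result is 0 */
}

static int toy_get_firmware_first(struct toy_pci_dev *dev)
{
	assert(dev != NULL);
	if (!dev->firmware_first_valid)
		toy_set_firmware_first(dev);
	return dev->firmware_first;
}
```

A separate valid bit makes the "looked up, answer is no" state explicit. A
single two-bit field can hold the same information, but only if the setter
stores a nonzero value for a negative result as well.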

Thanks,
H.Seto

==============================================================================
TOPIC: snet: Security for NETwork syscalls
http://groups.google.com/group/linux.kernel/t/2ba3cee1400ac233?hl=en
==============================================================================

== 1 of 1 ==
Date: Tues, Mar 2 2010 6:00 pm
From: Tetsuo Handa

Hello.

Regarding [RFC v2 02/10] Revert "lsm: Remove the socket_post_accept() hook"
@@ -1538,6 +1538,8 @@ SYSCALL_DEFINE4(accept4, int, fd, struct sockaddr __user *, upeer_sockaddr,
fd_install(newfd, newfile);
err = newfd;

+ security_socket_post_accept(sock, newsock);
+
out_put:
fput_light(sock->file, fput_needed);
out:

Please move security_socket_post_accept() to before fd_install().
Otherwise, other threads which share the fd table can use accept()ed
sockets whose security information has not yet been updated.

Regarding [RFC v2 04/10] snet: introduce snet_core
+static __init int snet_init(void)
+{
+ int ret;
+
+ pr_debug("initializing: event_hash_size=%u "
+ "verdict_hash_size=%u verdict_delay=%usecs "
+ "default_policy=%s\n",
+ snet_evh_size, snet_vdh_size, snet_verdict_delay,
+ snet_verdict_name(snet_verdict_policy));

Why not stop here if snet_evh_size == 0 or snet_vdh_size == 0, in order to
avoid "division by 0"?
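
A user-space sketch of that early check follows. The toy_* variables stand
in for the module parameters, and the assumption that a size is later used
as a modulus comes only from the "division by 0" remark above.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for the snet module parameters. */
static unsigned int toy_evh_size = 16;
static unsigned int toy_vdh_size = 16;

/* Refuse to initialize when a hash table size is zero. */
static int toy_snet_init(void)
{
	if (toy_evh_size == 0 || toy_vdh_size == 0)
		return -EINVAL;
	return 0;
}

/* Bucket selection; only safe once toy_snet_init() has succeeded. */
static unsigned int toy_bucket(unsigned int hash)
{
	assert(toy_evh_size != 0);
	return hash % toy_evh_size;
}
```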

Regarding [RFC v2 05/10] snet: introduce snet_event
+static rwlock_t snet_evh_lock = __RW_LOCK_UNLOCKED();

You can use "static DEFINE_RWLOCK(snet_evh_lock);".

+int snet_event_is_registered(const enum snet_syscall syscall, const u8 protocol)

Maybe rcu_read_lock() is better than rw spinlock because this function is
frequently called.

Regarding [RFC v2 06/10] snet: introduce snet_hooks
+ if ((verdict = snet_ticket_check(&info)) != SNET_VERDICT_NONE)

Please avoid assignment in "if" statement, as scripts/checkpatch.pl suggests.

Regarding [RFC v2 09/10] snet: introduce snet_ticket
+enum snet_verdict snet_ticket_check(struct snet_info *info)
+{
+ struct snet_ticket *st = NULL;
+ unsigned int h = 0, verdict = SNET_VERDICT_NONE;
+ struct list_head *l = NULL;
+ struct snet_task_security *tsec = NULL;
+
+ if (snet_ticket_mode == SNET_TICKET_OFF)
+ goto out;
+
+ tsec = (struct snet_task_security*) current_security();
+
+ h = jhash_2words(info->syscall, info->protocol, 0) % HSIZE;
+ l = &tsec->hash[h];
+
+ read_lock_bh(&tsec->lock);

Credentials are allocated on a copy-on-write basis.
Is sharing "tsec" among multiple "struct task_struct" what you intended?

Regards.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

==============================================================================
TOPIC: [PATCH]Support MCP89 and GT21x hdmi audio
http://groups.google.com/group/linux.kernel/t/16c949f2d92ac0a6?hl=en
==============================================================================

== 1 of 1 ==
Date: Tues, Mar 2 2010 6:00 pm
From: Wu Fengguang

On Mon, Mar 01, 2010 at 07:27:53PM +0800, Wei Ni wrote:
> Hi, Takashi
> I developed the hdmi audio driver for new chipset MCP89 and GT21x.
> The new HAD controller and codec support standard HDMI operation.
>
> I attached the patch file, please check it.

Wei Ni,

Can we avoid the big copy&paste and do more code reuse?
This benefits all of us in the long term.

==============================================================================
TOPIC: introduce sys_membarrier(): process-wide memory barrier (v9)
http://groups.google.com/group/linux.kernel/t/c66948a2bac76935?hl=en
==============================================================================

== 1 of 1 ==
Date: Tues, Mar 2 2010 6:00 pm
From: Josh Triplett

On Tue, Mar 02, 2010 at 06:07:10PM -0500, Mathieu Desnoyers wrote:
> * Josh Triplett (josh@joshtriplett.org) wrote:
> > On Thu, Feb 25, 2010 at 06:23:16PM -0500, Mathieu Desnoyers wrote:
> > > I am proposing this patch for the 2.6.34 merge window, as I think it is ready
> > > for inclusion.
> > >
> > > Here is an implementation of a new system call, sys_membarrier(), which
> > > executes a memory barrier on all threads of the current process.
> > [...]
> >
> > > Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> > > Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> > > Acked-by: Steven Rostedt <rostedt@goodmis.org>
> > > Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > > CC: Nicholas Miell <nmiell@comcast.net>
> > > CC: Linus Torvalds <torvalds@linux-foundation.org>
> > > CC: mingo@elte.hu
> > > CC: laijs@cn.fujitsu.com
> > > CC: dipankar@in.ibm.com
> > > CC: akpm@linux-foundation.org
> > > CC: josh@joshtriplett.org
> >
> > Acked-by: Josh Triplett <josh@joshtriplett.org>
> >
> > I agree that v9 seems ready for inclusion.
>
> Thanks!
>
> >
> > Out of curiosity, do you have any benchmarks for the case of not
> > detecting sys_membarrier dynamically? Detecting it at library
> > initialization time, for instance, or even just compiling to assume its
> > presence? I'd like to know how much that would improve the numbers.
>
> Citing the patch changelog:
>
> Results in liburcu:
>
> Operations in 10s, 6 readers, 2 writers:
>
> (what we previously had)
> memory barriers in reader: 973494744 reads, 892368 writes
> signal-based scheme: 6289946025 reads, 1251 writes
>
> (what we have now, with dynamic sys_membarrier check, expedited scheme)
> memory barriers in reader: 907693804 reads, 817793 writes
> sys_membarrier scheme: 4316818891 reads, 503790 writes
>
> So basically, yes, there is a significant overhead on the read-side if we
> compare the dynamic check (0.39 ns/read per reader) to the signal-based scheme
> (0.26 ns/read per reader) (which only needs the barrier()). On the update-side,
> we cannot care less though.

Just wanted to confirm that the signal results also hold for the
assume-sys_membarrier approach.

> > If significant, it might make sense to try to have a mechanism similar
> > to SMP alternatives, to have different code in either case. dlopen,
> > function pointers, runtime code patching (nop out the rmb), or similar.
>
> Yes, definitely. It could also be useful to switch between UP and SMP primitives
> dynamically when spawning the second thread in a process. We should be careful
> when sharing memory maps between processes though.

Might prove useful for some use cases, sure. Not a high priority given
complexity:performance ratio though, I think.
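
One of the options mentioned above, choosing the read-side barrier through
a function pointer at library initialization time, could be sketched like
this. It is not liburcu code: the names are invented, and the result of
probing for sys_membarrier is simply passed in as a flag.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Read-side barrier for when the writer can force barriers on all
 * threads on demand: only compiler reordering must be prevented.
 */
static void reader_barrier_light(void)
{
	atomic_signal_fence(memory_order_seq_cst);
}

/* Fallback: a real memory barrier in every reader. */
static void reader_barrier_full(void)
{
	atomic_thread_fence(memory_order_seq_cst);
}

/* Default to the safe variant until initialization has run. */
static void (*reader_barrier)(void) = reader_barrier_full;

/* Called once at library initialization with the probe result. */
static void toy_urcu_init(bool have_membarrier)
{
	assert(reader_barrier != NULL);
	reader_barrier = have_membarrier ? reader_barrier_light
					 : reader_barrier_full;
}
```

An indirect call has a cost of its own, so whether this beats a per-read
flag test would need measuring.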

- Josh

==============================================================================
TOPIC: binfmt_elf: plug a memory leak situation on dump_seek()
http://groups.google.com/group/linux.kernel/t/5bec5866c9bd2261?hl=en
==============================================================================

== 1 of 2 ==
Date: Tues, Mar 2 2010 6:10 pm
From: KOSAKI Motohiro

> On Fri, 26 Feb 2010 00:54:40 -0300
> André Goddard Rosa <andre.goddard@gmail.com> wrote:
>
> > Signed-off-by: André Goddard Rosa <andre.goddard@gmail.com>
> > ---
> > fs/binfmt_elf.c | 10 +++++++---
> > 1 files changed, 7 insertions(+), 3 deletions(-)
> >
> > diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
> > index fd5b2ea..13b0845 100644
> > --- a/fs/binfmt_elf.c
> > +++ b/fs/binfmt_elf.c
> > @@ -1096,6 +1096,8 @@ static int dump_write(struct file *file, const void *addr, int nr)
> >
> > static int dump_seek(struct file *file, loff_t off)
> > {
> > + int ret = 1;
> > +
> > if (file->f_op->llseek && file->f_op->llseek != no_llseek) {
> > if (file->f_op->llseek(file, off, SEEK_CUR) < 0)
> > return 0;
> > @@ -1107,13 +1109,15 @@ static int dump_seek(struct file *file, loff_t off)
> > unsigned long n = off;
> > if (n > PAGE_SIZE)
> > n = PAGE_SIZE;
> > - if (!dump_write(file, buf, n))
> > - return 0;
> > + if (!dump_write(file, buf, n)) {
> > + ret = 0;
> > + break;
> > + }
> > off -= n;
> > }
> > free_page((unsigned long)buf);
> > }
> > - return 1;
> > + return ret;
> > }
>
> Please don't send unchangelogged patches.
>
> Explain the leak.
>
> Explain the user impact (ie: how it is triggered).
>
> Explain how the patch fixes it.
>
> Thanks.

Hi Andre,

Also, could you please rebase this patch onto the -mmotm tree? It has a lot of ELF core dump related
fixes, which will probably be merged in this merge window.

I think your patch is correct, but I would like to avoid patch conflicts.

Thanks.
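
For readers following the diff: the early "return 0" inside the loop skips the free_page() call after it, and the patch replaces it with a flag plus break so that every path reaches the free. Below is a userspace sketch of that single-exit shape, assuming calloc()/free() in place of the kernel page allocator. fake_write(), pad_with_zeroes(), CHUNK and live_buffers are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdlib.h>

#define CHUNK 4096	/* stand-in for PAGE_SIZE */

static int live_buffers;	/* tracks allocations so a leak is observable */

/* Hypothetical writer: fails once the remaining byte budget is exhausted. */
static int fake_write(size_t *budget, const void *buf, size_t n)
{
	(void)buf;
	if (n > *budget)
		return 0;
	*budget -= n;
	return 1;
}

/* Pad `off` bytes with zeroes; returns 1 on success, 0 on failure. */
static int pad_with_zeroes(size_t *budget, size_t off)
{
	int ret = 1;
	char *buf = calloc(1, CHUNK);

	if (!buf)
		return 0;
	live_buffers++;

	while (off > 0) {
		size_t n = off > CHUNK ? CHUNK : off;

		if (!fake_write(budget, buf, n)) {
			ret = 0;	/* remember the failure ... */
			break;		/* ... and fall through to the free below */
		}
		off -= n;
	}

	free(buf);		/* reached on success and on failure */
	live_buffers--;
	return ret;
}
```

With a budget of three chunks, padding two chunks succeeds and a second request for two chunks fails halfway; in both cases live_buffers returns to zero.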

== 2 of 2 ==
Date: Tues, Mar 2 2010 6:30 pm
From: André Goddard Rosa

Hi Kosaki Motohiro, Andrew,

>> Please don't send unchangelogged patches.
>>
>> Explain the leak.
>>
>> Explain the user impact (ie: how it is triggered).
>>
>> Explain how the patch fixes it.
>>
>> Thanks.
>
> Hi Andre,
>
> Also, could you please rebase this patch onto the -mmotm tree? It has a lot of ELF core dump related
> fixes, which will probably be merged in this merge window.
>
> I think your patch is correct, but I would like to avoid patch conflicts.
>
> Thanks.
>

Sure, I'm going to include a proper changelog, rebase it on top of
-mmotm tree and send a new patch soon.

Thank you,
André

==============================================================================
TOPIC: x86/mm fixes
http://groups.google.com/group/linux.kernel/t/65bf90b96de084b0?hl=en
==============================================================================

== 1 of 1 ==
Date: Tues, Mar 2 2010 6:20 pm
From: Wu Fengguang

On Wed, Mar 03, 2010 at 03:31:48AM +0800, H. Peter Anvin wrote:
> On 03/02/2010 10:39 AM, Linus Torvalds wrote:
> >
> >
> > On Tue, 2 Mar 2010, H. Peter Anvin wrote:
> >> + pfn = (res.start + PAGE_SIZE - 1) >> PAGE_SHIFT;
> >> + end_pfn = (res.end + 1) >> PAGE_SHIFT;
> >> + if (end_pfn > pfn)
> >> + ret = (*func)(pfn, end_pfn - pfn, arg);
> >> if (ret)
> >> break;
> >> res.start = res.end + 1;
> >
> > What kind of messed-up indentation is that? We don't use 4-char indents.
> >
>
> Branch updated with an indentation patch. Sorry about that.

Sorry! I wonder why I didn't do the indent with vim's '=='.

Anyway I just hacked /usr/share/quilt/refresh to automatically run the
kernel style checker:

# wfg: check for kernel coding style
if [ -x scripts/checkpatch.pl ]; then
	scripts/checkpatch.pl "$patch_file"
fi
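
Indentation aside, the arithmetic in the hunk quoted above rounds the start of a byte range up to a page boundary and the (inclusive) end down, so only whole pages are passed to the callback. A small sketch of that calculation, assuming 4 KiB pages; DEMO_PAGE_SHIFT, DEMO_PAGE_SIZE and whole_pages() are hypothetical names, not kernel symbols:

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12
#define DEMO_PAGE_SIZE  (UINT64_C(1) << DEMO_PAGE_SHIFT)

/*
 * Count the whole pages inside the inclusive byte range [start, end]
 * and store the first page frame number in *first_pfn.
 */
static uint64_t whole_pages(uint64_t start, uint64_t end, uint64_t *first_pfn)
{
	/* Round the start up to the next page boundary. */
	uint64_t pfn = (start + DEMO_PAGE_SIZE - 1) >> DEMO_PAGE_SHIFT;
	/* Round the exclusive end (end + 1) down to a page boundary. */
	uint64_t end_pfn = (end + 1) >> DEMO_PAGE_SHIFT;

	*first_pfn = pfn;
	return end_pfn > pfn ? end_pfn - pfn : 0;
}
```

For the range 0x800..0x37ff this yields two whole pages starting at pfn 1, while a range such as 0x1001..0x1fff contains no whole page, which is the case the "end_pfn > pfn" test guards against.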

==============================================================================
TOPIC: memcg: dirty pages instrumentation
http://groups.google.com/group/linux.kernel/t/98b8f3d66410be44?hl=en
==============================================================================

== 1 of 1 ==
Date: Tues, Mar 2 2010 6:20 pm
From: Daisuke Nishimura

> diff --git a/mm/filemap.c b/mm/filemap.c
> index fe09e51..f85acae 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -135,6 +135,7 @@ void __remove_from_page_cache(struct page *page)
> * having removed the page entirely.
> */
> if (PageDirty(page) && mapping_cap_account_dirty(mapping)) {
> + mem_cgroup_update_stat(page, MEM_CGROUP_STAT_FILE_DIRTY, -1);
> dec_zone_page_state(page, NR_FILE_DIRTY);
> dec_bdi_stat(mapping->backing_dev_info, BDI_DIRTY);
> }
(snip)
> @@ -1096,6 +1113,7 @@ int __set_page_dirty_no_writeback(struct page *page)
> void account_page_dirtied(struct page *page, struct address_space *mapping)
> {
> if (mapping_cap_account_dirty(mapping)) {
> + mem_cgroup_update_stat(page, MEM_CGROUP_STAT_FILE_DIRTY, 1);
> __inc_zone_page_state(page, NR_FILE_DIRTY);
> __inc_bdi_stat(mapping->backing_dev_info, BDI_DIRTY);
> task_dirty_inc(current);
As far as I can see, those two functions (at least) call mem_cgroup_update_stat(),
which acquires the page cgroup lock, under mapping->tree_lock.
But as I fixed before in commit e767e056, the page cgroup lock must not be acquired under
mapping->tree_lock.
Hmm, we should either call mem_cgroup_update_stat() outside mapping->tree_lock,
or add local_irq_save/restore() around lock/unlock_page_cgroup() to avoid a deadlock.

Thanks,
Daisuke Nishimura.
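
The first option above (doing the accounting outside mapping->tree_lock) can be pictured with a toy, single-threaded model. This is only a sketch of the ordering, not kernel code: the two booleans stand in for the real locks, and update_stat(), remove_page() and file_dirty_stat are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for the two locks named above (no real locking here). */
static bool tree_lock_held;
static bool page_cgroup_lock_held;
static long file_dirty_stat;

static void update_stat(long delta)
{
	/* The rule from the thread: never take this lock under tree_lock. */
	assert(!tree_lock_held);
	page_cgroup_lock_held = true;
	file_dirty_stat += delta;
	page_cgroup_lock_held = false;
}

/* Remove a page: decide under tree_lock, account after dropping it. */
static void remove_page(bool page_was_dirty)
{
	long pending = 0;

	tree_lock_held = true;
	if (page_was_dirty)
		pending = -1;	/* only record what must be accounted */
	tree_lock_held = false;

	if (pending)
		update_stat(pending);	/* tree_lock is no longer held */
}
```

The point of the shape is that the work done under the outer lock only records a delta, and the function that needs the inner lock runs after the outer one has been released.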

On Mon, 1 Mar 2010 22:23:40 +0100, Andrea Righi <arighi@develer.com> wrote:
> Apply the cgroup dirty pages accounting and limiting infrastructure to
> the opportune kernel functions.
>
> Signed-off-by: Andrea Righi <arighi@develer.com>
> ---
> fs/fuse/file.c | 5 +++
> fs/nfs/write.c | 4 ++
> fs/nilfs2/segment.c | 10 +++++-
> mm/filemap.c | 1 +
> mm/page-writeback.c | 84 ++++++++++++++++++++++++++++++++------------------
> mm/rmap.c | 4 +-
> mm/truncate.c | 2 +
> 7 files changed, 76 insertions(+), 34 deletions(-)
>
> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> index a9f5e13..dbbdd53 100644
> --- a/fs/fuse/file.c
> +++ b/fs/fuse/file.c
> @@ -11,6 +11,7 @@
> #include <linux/pagemap.h>
> #include <linux/slab.h>
> #include <linux/kernel.h>
> +#include <linux/memcontrol.h>
> #include <linux/sched.h>
> #include <linux/module.h>
>
> @@ -1129,6 +1130,8 @@ static void fuse_writepage_finish(struct fuse_conn *fc, struct fuse_req *req)
>
> list_del(&req->writepages_entry);
> dec_bdi_stat(bdi, BDI_WRITEBACK);
> + mem_cgroup_update_stat(req->pages[0],
> + MEM_CGROUP_STAT_WRITEBACK_TEMP, -1);
> dec_zone_page_state(req->pages[0], NR_WRITEBACK_TEMP);
> bdi_writeout_inc(bdi);
> wake_up(&fi->page_waitq);
> @@ -1240,6 +1243,8 @@ static int fuse_writepage_locked(struct page *page)
> req->inode = inode;
>
> inc_bdi_stat(mapping->backing_dev_info, BDI_WRITEBACK);
> + mem_cgroup_update_stat(tmp_page,
> + MEM_CGROUP_STAT_WRITEBACK_TEMP, 1);
> inc_zone_page_state(tmp_page, NR_WRITEBACK_TEMP);
> end_page_writeback(page);
>
> diff --git a/fs/nfs/write.c b/fs/nfs/write.c
> index b753242..7316f7a 100644
> --- a/fs/nfs/write.c
> +++ b/fs/nfs/write.c
> @@ -439,6 +439,7 @@ nfs_mark_request_commit(struct nfs_page *req)
> req->wb_index,
> NFS_PAGE_TAG_COMMIT);
> spin_unlock(&inode->i_lock);
> + mem_cgroup_update_stat(req->wb_page, MEM_CGROUP_STAT_UNSTABLE_NFS, 1);
> inc_zone_page_state(req->wb_page, NR_UNSTABLE_NFS);
> inc_bdi_stat(req->wb_page->mapping->backing_dev_info, BDI_UNSTABLE);
> __mark_inode_dirty(inode, I_DIRTY_DATASYNC);
> @@ -450,6 +451,7 @@ nfs_clear_request_commit(struct nfs_page *req)
> struct page *page = req->wb_page;
>
> if (test_and_clear_bit(PG_CLEAN, &(req)->wb_flags)) {
> + mem_cgroup_update_stat(page, MEM_CGROUP_STAT_UNSTABLE_NFS, -1);
> dec_zone_page_state(page, NR_UNSTABLE_NFS);
> dec_bdi_stat(page->mapping->backing_dev_info, BDI_UNSTABLE);
> return 1;
> @@ -1273,6 +1275,8 @@ nfs_commit_list(struct inode *inode, struct list_head *head, int how)
> req = nfs_list_entry(head->next);
> nfs_list_remove_request(req);
> nfs_mark_request_commit(req);
> + mem_cgroup_update_stat(req->wb_page,
> + MEM_CGROUP_STAT_UNSTABLE_NFS, -1);
> dec_zone_page_state(req->wb_page, NR_UNSTABLE_NFS);
> dec_bdi_stat(req->wb_page->mapping->backing_dev_info,
> BDI_UNSTABLE);
> diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
> index ada2f1b..aef6d13 100644
> --- a/fs/nilfs2/segment.c
> +++ b/fs/nilfs2/segment.c
> @@ -1660,8 +1660,11 @@ nilfs_copy_replace_page_buffers(struct page *page, struct list_head *out)
> } while (bh = bh->b_this_page, bh2 = bh2->b_this_page, bh != head);
> kunmap_atomic(kaddr, KM_USER0);
>
> - if (!TestSetPageWriteback(clone_page))
> + if (!TestSetPageWriteback(clone_page)) {
> + mem_cgroup_update_stat(clone_page,
> + MEM_CGROUP_STAT_WRITEBACK, 1);
> inc_zone_page_state(clone_page, NR_WRITEBACK);
> + }
> unlock_page(clone_page);
>
> return 0;
> @@ -1783,8 +1786,11 @@ static void __nilfs_end_page_io(struct page *page, int err)
> }
>
> if (buffer_nilfs_allocated(page_buffers(page))) {
> - if (TestClearPageWriteback(page))
> + if (TestClearPageWriteback(page)) {
> + mem_cgroup_update_stat(clone_page,
> + MEM_CGROUP_STAT_WRITEBACK, -1);
> dec_zone_page_state(page, NR_WRITEBACK);
> + }
> } else
> end_page_writeback(page);
> }
> diff --git a/mm/filemap.c b/mm/filemap.c
> index fe09e51..f85acae 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -135,6 +135,7 @@ void __remove_from_page_cache(struct page *page)
> * having removed the page entirely.
> */
> if (PageDirty(page) && mapping_cap_account_dirty(mapping)) {
> + mem_cgroup_update_stat(page, MEM_CGROUP_STAT_FILE_DIRTY, -1);
> dec_zone_page_state(page, NR_FILE_DIRTY);
> dec_bdi_stat(mapping->backing_dev_info, BDI_DIRTY);
> }
> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index 5a0f8f3..d83f41c 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -137,13 +137,14 @@ static struct prop_descriptor vm_dirties;
> */
> static int calc_period_shift(void)
> {
> - unsigned long dirty_total;
> + unsigned long dirty_total, dirty_bytes;
>
> - if (vm_dirty_bytes)
> - dirty_total = vm_dirty_bytes / PAGE_SIZE;
> + dirty_bytes = mem_cgroup_dirty_bytes();
> + if (dirty_bytes)
> + dirty_total = dirty_bytes / PAGE_SIZE;
> else
> - dirty_total = (vm_dirty_ratio * determine_dirtyable_memory()) /
> - 100;
> + dirty_total = (mem_cgroup_dirty_ratio() *
> + determine_dirtyable_memory()) / 100;
> return 2 + ilog2(dirty_total - 1);
> }
>
> @@ -408,14 +409,16 @@ static unsigned long highmem_dirtyable_memory(unsigned long total)
> */
> unsigned long determine_dirtyable_memory(void)
> {
> - unsigned long x;
> -
> - x = global_page_state(NR_FREE_PAGES) + global_reclaimable_pages();
> + unsigned long memory;
> + s64 memcg_memory;
>
> + memory = global_page_state(NR_FREE_PAGES) + global_reclaimable_pages();
> if (!vm_highmem_is_dirtyable)
> - x -= highmem_dirtyable_memory(x);
> -
> - return x + 1; /* Ensure that we never return 0 */
> + memory -= highmem_dirtyable_memory(memory);
> + memcg_memory = mem_cgroup_page_stat(MEMCG_NR_DIRTYABLE_PAGES);
> + if (memcg_memory < 0)
> + return memory + 1;
> + return min((unsigned long)memcg_memory, memory + 1);
> }
>
> void
> @@ -423,26 +426,28 @@ get_dirty_limits(unsigned long *pbackground, unsigned long *pdirty,
> unsigned long *pbdi_dirty, struct backing_dev_info *bdi)
> {
> unsigned long background;
> - unsigned long dirty;
> + unsigned long dirty, dirty_bytes, dirty_background;
> unsigned long available_memory = determine_dirtyable_memory();
> struct task_struct *tsk;
>
> - if (vm_dirty_bytes)
> - dirty = DIV_ROUND_UP(vm_dirty_bytes, PAGE_SIZE);
> + dirty_bytes = mem_cgroup_dirty_bytes();
> + if (dirty_bytes)
> + dirty = DIV_ROUND_UP(dirty_bytes, PAGE_SIZE);
> else {
> int dirty_ratio;
>
> - dirty_ratio = vm_dirty_ratio;
> + dirty_ratio = mem_cgroup_dirty_ratio();
> if (dirty_ratio < 5)
> dirty_ratio = 5;
> dirty = (dirty_ratio * available_memory) / 100;
> }
>
> - if (dirty_background_bytes)
> - background = DIV_ROUND_UP(dirty_background_bytes, PAGE_SIZE);
> + dirty_background = mem_cgroup_dirty_background_bytes();
> + if (dirty_background)
> + background = DIV_ROUND_UP(dirty_background, PAGE_SIZE);
> else
> - background = (dirty_background_ratio * available_memory) / 100;
> -
> + background = (mem_cgroup_dirty_background_ratio() *
> + available_memory) / 100;
> if (background >= dirty)
> background = dirty / 2;
> tsk = current;
> @@ -508,9 +513,13 @@ static void balance_dirty_pages(struct address_space *mapping,
> get_dirty_limits(&background_thresh, &dirty_thresh,
> &bdi_thresh, bdi);
>
> - nr_reclaimable = global_page_state(NR_FILE_DIRTY) +
> + nr_reclaimable = mem_cgroup_page_stat(MEMCG_NR_RECLAIM_PAGES);
> + nr_writeback = mem_cgroup_page_stat(MEMCG_NR_WRITEBACK);
> + if ((nr_reclaimable < 0) || (nr_writeback < 0)) {
> + nr_reclaimable = global_page_state(NR_FILE_DIRTY) +
> global_page_state(NR_UNSTABLE_NFS);
> - nr_writeback = global_page_state(NR_WRITEBACK);
> + nr_writeback = global_page_state(NR_WRITEBACK);
> + }
>
> bdi_nr_reclaimable = bdi_stat(bdi, BDI_DIRTY);
> if (bdi_cap_account_unstable(bdi)) {
> @@ -611,10 +620,12 @@ static void balance_dirty_pages(struct address_space *mapping,
> * In normal mode, we start background writeout at the lower
> * background_thresh, to keep the amount of dirty memory low.
> */
> + nr_reclaimable = mem_cgroup_page_stat(MEMCG_NR_RECLAIM_PAGES);
> + if (nr_reclaimable < 0)
> + nr_reclaimable = global_page_state(NR_FILE_DIRTY) +
> + global_page_state(NR_UNSTABLE_NFS);
> if ((laptop_mode && pages_written) ||
> - (!laptop_mode && ((global_page_state(NR_FILE_DIRTY)
> - + global_page_state(NR_UNSTABLE_NFS))
> - > background_thresh)))
> + (!laptop_mode && (nr_reclaimable > background_thresh)))
> bdi_start_writeback(bdi, NULL, 0);
> }
>
> @@ -678,6 +689,8 @@ void throttle_vm_writeout(gfp_t gfp_mask)
> unsigned long dirty_thresh;
>
> for ( ; ; ) {
> + unsigned long dirty;
> +
> get_dirty_limits(&background_thresh, &dirty_thresh, NULL, NULL);
>
> /*
> @@ -686,10 +699,14 @@ void throttle_vm_writeout(gfp_t gfp_mask)
> */
> dirty_thresh += dirty_thresh / 10; /* wheeee... */
>
> - if (global_page_state(NR_UNSTABLE_NFS) +
> - global_page_state(NR_WRITEBACK) <= dirty_thresh)
> - break;
> - congestion_wait(BLK_RW_ASYNC, HZ/10);
> +
> + dirty = mem_cgroup_page_stat(MEMCG_NR_DIRTY_WRITEBACK_PAGES);
> + if (dirty < 0)
> + dirty = global_page_state(NR_UNSTABLE_NFS) +
> + global_page_state(NR_WRITEBACK);
> + if (dirty <= dirty_thresh)
> + break;
> + congestion_wait(BLK_RW_ASYNC, HZ/10);
>
> /*
> * The caller might hold locks which can prevent IO completion
> @@ -1096,6 +1113,7 @@ int __set_page_dirty_no_writeback(struct page *page)
> void account_page_dirtied(struct page *page, struct address_space *mapping)
> {
> if (mapping_cap_account_dirty(mapping)) {
> + mem_cgroup_update_stat(page, MEM_CGROUP_STAT_FILE_DIRTY, 1);
> __inc_zone_page_state(page, NR_FILE_DIRTY);
> __inc_bdi_stat(mapping->backing_dev_info, BDI_DIRTY);
> task_dirty_inc(current);
> @@ -1297,6 +1315,8 @@ int clear_page_dirty_for_io(struct page *page)
> * for more comments.
> */
> if (TestClearPageDirty(page)) {
> + mem_cgroup_update_stat(page,
> + MEM_CGROUP_STAT_FILE_DIRTY, -1);
> dec_zone_page_state(page, NR_FILE_DIRTY);
> dec_bdi_stat(mapping->backing_dev_info,
> BDI_DIRTY);
> @@ -1332,8 +1352,10 @@ int test_clear_page_writeback(struct page *page)
> } else {
> ret = TestClearPageWriteback(page);
> }
> - if (ret)
> + if (ret) {
> + mem_cgroup_update_stat(page, MEM_CGROUP_STAT_WRITEBACK, -1);
> dec_zone_page_state(page, NR_WRITEBACK);
> + }
> return ret;
> }
>
> @@ -1363,8 +1385,10 @@ int test_set_page_writeback(struct page *page)
> } else {
> ret = TestSetPageWriteback(page);
> }
> - if (!ret)
> + if (!ret) {
> + mem_cgroup_update_stat(page, MEM_CGROUP_STAT_WRITEBACK, 1);
> inc_zone_page_state(page, NR_WRITEBACK);
> + }
> return ret;
>
> }
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 4d2fb93..8d74335 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -832,7 +832,7 @@ void page_add_file_rmap(struct page *page)
> {
> if (atomic_inc_and_test(&page->_mapcount)) {
> __inc_zone_page_state(page, NR_FILE_MAPPED);
> - mem_cgroup_update_file_mapped(page, 1);
> + mem_cgroup_update_stat(page, MEM_CGROUP_STAT_FILE_MAPPED, 1);
> }
> }
>
> @@ -864,7 +864,7 @@ void page_remove_rmap(struct page *page)
> __dec_zone_page_state(page, NR_ANON_PAGES);
> } else {
> __dec_zone_page_state(page, NR_FILE_MAPPED);
> - mem_cgroup_update_file_mapped(page, -1);
> + mem_cgroup_update_stat(page, MEM_CGROUP_STAT_FILE_MAPPED, -1);
> }
> /*
> * It would be tidy to reset the PageAnon mapping here,
> diff --git a/mm/truncate.c b/mm/truncate.c
> index 2466e0c..5f437e7 100644
> --- a/mm/truncate.c
> +++ b/mm/truncate.c
> @@ -73,6 +73,8 @@ void cancel_dirty_page(struct page *page, unsigned int account_size)
> if (TestClearPageDirty(page)) {
> struct address_space *mapping = page->mapping;
> if (mapping && mapping_cap_account_dirty(mapping)) {
> + mem_cgroup_update_stat(page,
> + MEM_CGROUP_STAT_FILE_DIRTY, -1);
> dec_zone_page_state(page, NR_FILE_DIRTY);
> dec_bdi_stat(mapping->backing_dev_info,
> BDI_DIRTY);
> --
> 1.6.3.3
>
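
A note on the fallback pattern used throughout the quoted patch: a negative per-cgroup value means "fall back to the global counter". determine_dirtyable_memory() keeps the value in an s64 before testing it, whereas the throttle_vm_writeout() hunk declares "unsigned long dirty" and then tests "dirty < 0", a comparison that can never be true for an unsigned type. The sketch below shows the difference in plain C; cgroup_stat(), pick_signed() and as_unsigned() are hypothetical helpers, not functions from the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a per-cgroup counter lookup: a negative
 * result means "no per-cgroup value, use the global counter instead".
 */
static int64_t cgroup_stat(int available, int64_t value)
{
	return available ? value : -1;
}

/* Keep the result signed until the fallback test has been done. */
static unsigned long pick_signed(int64_t memcg, unsigned long global)
{
	if (memcg < 0)
		return global;
	return (unsigned long)memcg;
}

/*
 * Pitfall: converting the negative sentinel to unsigned yields a huge
 * positive number, so a later "< 0" test can never fire.
 */
static unsigned long as_unsigned(int64_t memcg)
{
	return (unsigned long)memcg;
}
```

Here pick_signed(cgroup_stat(0, 0), 42) falls back to 42, while as_unsigned(cgroup_stat(0, 0)) produces a very large count that would be compared against the threshold as if it were real.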

==============================================================================
TOPIC: kconfig: place git SHA1 in .config output if in git tree
http://groups.google.com/group/linux.kernel/t/9f6a7cb3a5d924c8?hl=en
==============================================================================

== 1 of 1 ==
Date: Tues, Mar 2 2010 6:20 pm
From: "Paul E. McKenney"

On Wed, Mar 03, 2010 at 01:42:41AM +0100, Frans Pop wrote:
> On Wednesday 03 March 2010, Paul E. McKenney wrote:
> > > Wouldn't it be more logical to include the line in the dmesg output?
> > > My preference would be a separate line below the existing (Linux
> > > version) line. That line could only be output if the kernel was built
> > > from a VCS. It could then even be repeated in oops output.
> >
> > My concern with only putting it in the dmesg output is that people do
> > not always capture that. The oops output is more often captured, but
> > the kernel only has this information if CONFIG_LOCALVERSION_AUTO is set.
>
> Yes, my suggestion is exactly because IMO it should be independent of
> CONFIG_LOCALVERSION_AUTO.
>
> I hugely dislike that option because it makes the git version part of the
> kernel version and thus affects how the kernel gets installed (names of
> files in /boot, name of the directory in /lib/modules, name of the Debian
> package created using the deb-pkg target, etc.).
> For all those things I want a "clean" kernel version and thus I will never
> enable CONFIG_LOCALVERSION_AUTO.

That does sound a bit painful...

> But I do see the value of a reliable and consistent identification of what
> exact source a kernel was built from. Including the git version separately
> from the kernel version would allow that.

Fortunately, scripts/setlocalversion seems to run quite a bit faster the
second time I run it, probably due to various caches having been warmed
up the first time. Because doing all of this straightforwardly means up
to three invocations in a given kernel build. ;-)

==============================================================================
TOPIC: pci: move pci_set_dma_mask and pci_set_consistent_dma_mask to pci-dma-compat.h
http://groups.google.com/group/linux.kernel/t/b89576fbf68181bd?hl=en
==============================================================================

== 1 of 2 ==
Date: Tues, Mar 2 2010 6:40 pm
From: Andrew Morton

On Fri, 12 Feb 2010 18:33:32 +0900 FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> wrote:

> We can use pci-dma-compat.h to implement pci_set_dma_mask and
> pci_set_consistent_dma_mask as we do with the other PCI DMA API.
>
> We can remove HAVE_ARCH_PCI_SET_DMA_MASK too.

i386 allnoconfig:

include/asm-generic/pci-dma-compat.h:105: error: redefinition of 'pci_set_dma_mask'
include/linux/pci.h:1092: error: previous definition of 'pci_set_dma_mask' was here
include/asm-generic/pci-dma-compat.h:110: error: redefinition of 'pci_set_consistent_dma_mask'
include/linux/pci.h:1097: error: previous definition of 'pci_set_consistent_dma_mask' was here

In fact the whole of include/asm-generic/pci-dma-compat.h seems a bit
fishy when CONFIG_PCI=n. Shouldn't all those functions be
EIO-returning stubs?
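
To illustrate the stub idea in isolation, here is a self-contained sketch in which exactly one definition of a helper is visible per configuration, which is also what avoids a redefinition error like the one above. Everything in it is a stand-in: struct demo_dev, demo_set_dma_mask() and DEMO_CONFIG_PCI are hypothetical names, not the kernel's.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

struct demo_dev {
	uint64_t dma_mask;
};

#ifdef DEMO_CONFIG_PCI
/* Bus support built in: the real helper records the mask. */
static inline int demo_set_dma_mask(struct demo_dev *dev, uint64_t mask)
{
	dev->dma_mask = mask;
	return 0;
}
#else
/* Bus support compiled out: a stub that only reports failure. */
static inline int demo_set_dma_mask(struct demo_dev *dev, uint64_t mask)
{
	(void)dev;
	(void)mask;
	return -EIO;
}
#endif
```

Built without DEMO_CONFIG_PCI, a call returns -EIO and leaves the device untouched; built with it, the same call stores the mask and returns 0.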

== 2 of 2 ==
Date: Tues, Mar 2 2010 7:00 pm
From: FUJITA Tomonori

On Tue, 2 Mar 2010 18:29:27 -0800
Andrew Morton <akpm@linux-foundation.org> wrote:

> On Fri, 12 Feb 2010 18:33:32 +0900 FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> wrote:
>
> > We can use pci-dma-compat.h to implement pci_set_dma_mask and
> > pci_set_consistent_dma_mask as we do with the other PCI DMA API.
> >
> > We can remove HAVE_ARCH_PCI_SET_DMA_MASK too.
>
> i386 allnoconfig:

Sorry about that.

> include/asm-generic/pci-dma-compat.h:105: error: redefinition of 'pci_set_dma_mask'
> include/linux/pci.h:1092: error: previous definition of 'pci_set_dma_mask' was here
> include/asm-generic/pci-dma-compat.h:110: error: redefinition of 'pci_set_consistent_dma_mask'
> include/linux/pci.h:1097: error: previous definition of 'pci_set_consistent_dma_mask' was here
>
> In fact the whole of include/asm-generic/pci-dma-compat.h seems a bit
> fishy when CONFIG_PCI=n. Shouldn't all those functions be
> EIO-returning stubs?

Might be (and it should work). However, when CONFIG_PCI=n, we have
silently converted the PCI DMA API to the generic DMA API.

In the long term, we will remove the pci_dma_* API.

Can you fold this into the above patch?

diff --git a/include/asm-generic/pci-dma-compat.h b/include/asm-generic/pci-dma-compat.h
index ddfa9c5..1437b7d 100644
--- a/include/asm-generic/pci-dma-compat.h
+++ b/include/asm-generic/pci-dma-compat.h
@@ -101,6 +101,7 @@ pci_dma_mapping_error(struct pci_dev *pdev, dma_addr_t dma_addr)
return dma_mapping_error(&pdev->dev, dma_addr);
}

+#ifdef CONFIG_PCI
static inline int pci_set_dma_mask(struct pci_dev *dev, u64 mask)
{
return dma_set_mask(&dev->dev, mask);
@@ -110,5 +111,6 @@ static inline int pci_set_consistent_dma_mask(struct pci_dev *dev, u64 mask)
{
return dma_set_coherent_mask(&dev->dev, mask);
}
+

Tuesday, March 2, 2010
