Tuesday, January 21, 2014

linux.kernel - 26 new messages in 14 topics - digest

linux.kernel
http://groups.google.com/group/linux.kernel?hl=en

linux.kernel@googlegroups.com

Today's topics:

* ACPI: add callback prepare() into acpi_hotplug_handler - 2 messages, 2
authors
http://groups.google.com/group/linux.kernel/t/6f6826cf9783dcfc?hl=en
* mm: thp: hugepage_vma_check has a blind spot - 2 messages, 2 authors
http://groups.google.com/group/linux.kernel/t/bf263f763b1b8c0c?hl=en
* drivers/base: delete non-required instances of include <linux/init.h> - 2
messages, 2 authors
http://groups.google.com/group/linux.kernel/t/bd6c2e133bc56440?hl=en
* pm/qos: allow state control of qos class - 2 messages, 2 authors
http://groups.google.com/group/linux.kernel/t/99d47d2ab1cbab9d?hl=en
* [PATCH 4-1/5] ACPI / scan: Add bind/unbind callbacks to struct
acpi_scan_handler - 3 messages, 1 author
http://groups.google.com/group/linux.kernel/t/9677ebe66238002a?hl=en
* tty: Allow stealing of controlling ttys within user namespaces - 1 message,
1 author
http://groups.google.com/group/linux.kernel/t/46ae0e7e30358a80?hl=en
* Linux 3.12.7 introduces page map handling regression - 1 message, 1 author
http://groups.google.com/group/linux.kernel/t/1076eaae005a05d1?hl=en
* x86: Inconsistent xAPIC synchronization in arch_irq_work_raise? - 1 message,
1 author
http://groups.google.com/group/linux.kernel/t/daa6d2e500b383a8?hl=en
* MCS Lock: Allow architectures to hook in to contended paths - 7 messages, 1
author
http://groups.google.com/group/linux.kernel/t/a70959a6ec9463ba?hl=en
* audit: store audit_pid as a struct pid pointer - 1 message, 1 author
http://groups.google.com/group/linux.kernel/t/867132a6ef8a8c2c?hl=en
* bio_integrity_verify() bug causing READ verify to be silently skipped - 1
message, 1 author
http://groups.google.com/group/linux.kernel/t/cf25ed79a5250081?hl=en
* [PATCH] preempt: Debug for possible missed preemption checks - 1 message, 1
author
http://groups.google.com/group/linux.kernel/t/68b76fe9e374c2dc?hl=en
* numa,sched: define some magic numbers - 1 message, 1 author
http://groups.google.com/group/linux.kernel/t/34ca8de3d5507b71?hl=en
* gpio: bcm281xx: Centralize register locking - 1 message, 1 author
http://groups.google.com/group/linux.kernel/t/2695bfd6ae884f11?hl=en

==============================================================================
TOPIC: ACPI: add callback prepare() into acpi_hotplug_handler
http://groups.google.com/group/linux.kernel/t/6f6826cf9783dcfc?hl=en
==============================================================================

== 1 of 2 ==
Date: Tues, Jan 21 2014 3:10 pm
From: "Rafael J. Wysocki"


On Tuesday, January 21, 2014 02:14:57 PM Toshi Kani wrote:
> On Sat, 2014-01-18 at 10:48 +0800, Jiang Liu wrote:
> > Add callback prepare() into acpi_hotplug_handler, which will get called
> > at the very beginning of ACPI hotplug event handler. The ACPI core will
> > ignore the event if prepare() returns NOTIFY_STOP.
> >
> > Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
> > ---
> > drivers/acpi/scan.c | 4 ++++
> > include/acpi/acpi_bus.h | 1 +
> > 2 files changed, 5 insertions(+)
> >
> > diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
> > index fd39459..6b0f419 100644
> > --- a/drivers/acpi/scan.c
> > +++ b/drivers/acpi/scan.c
> > @@ -392,6 +392,10 @@ static void acpi_hotplug_notify_cb(acpi_handle handle, u32 type, void *data)
> > struct acpi_device *adev;
> > acpi_status status;
> >
> > + if (handler->prepare &&
> > + handler->prepare(handle, type, data) == NOTIFY_STOP)
> > + return;
>
> The OS is responsible for calling _OST when it is implemented. So you
> cannot just return here. See acpi_hotplug_unsupported(handle, type)
> next line. Also, please describe why prepare() needs to be added.

I don't think it's needed any more, please see:

http://marc.info/?l=linux-acpi&m=139001691317575&w=2

Thanks,
Rafael

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/




== 2 of 2 ==
Date: Tues, Jan 21 2014 3:20 pm
From: Toshi Kani


On Wed, 2014-01-22 at 00:17 +0100, Rafael J. Wysocki wrote:
> On Tuesday, January 21, 2014 02:14:57 PM Toshi Kani wrote:
> > On Sat, 2014-01-18 at 10:48 +0800, Jiang Liu wrote:
> > > Add callback prepare() into acpi_hotplug_handler, which will get called
> > > at the very beginning of ACPI hotplug event handler. The ACPI core will
> > > ignore the event if prepare() returns NOTIFY_STOP.
> > >
> > > Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
> > > ---
> > > drivers/acpi/scan.c | 4 ++++
> > > include/acpi/acpi_bus.h | 1 +
> > > 2 files changed, 5 insertions(+)
> > >
> > > diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
> > > index fd39459..6b0f419 100644
> > > --- a/drivers/acpi/scan.c
> > > +++ b/drivers/acpi/scan.c
> > > @@ -392,6 +392,10 @@ static void acpi_hotplug_notify_cb(acpi_handle handle, u32 type, void *data)
> > > struct acpi_device *adev;
> > > acpi_status status;
> > >
> > > + if (handler->prepare &&
> > > + handler->prepare(handle, type, data) == NOTIFY_STOP)
> > > + return;
> >
> > The OS is responsible for calling _OST when it is implemented. So you
> > cannot just return here. See acpi_hotplug_unsupported(handle, type)
> > next line. Also, please describe why prepare() needs to be added.
>
> I don't think it's needed any more, please see:
>
> http://marc.info/?l=linux-acpi&m=139001691317575&w=2

Oh, I see. Thanks!
-Toshi






==============================================================================
TOPIC: mm: thp: hugepage_vma_check has a blind spot
http://groups.google.com/group/linux.kernel/t/bf263f763b1b8c0c?hl=en
==============================================================================

== 1 of 2 ==
Date: Tues, Jan 21 2014 3:10 pm
From: Alex Thorlton


hugepage_vma_check is called during khugepaged_scan_mm_slot to ensure
that khugepaged doesn't try to allocate THPs in vmas where they are
disallowed, either due to THPs being disabled system-wide, or through
MADV_NOHUGEPAGE.

The logic that hugepage_vma_check uses doesn't seem to cover all cases,
in my opinion. Looking at the original code:

if ((!(vma->vm_flags & VM_HUGEPAGE) && !khugepaged_always()) ||
(vma->vm_flags & VM_NOHUGEPAGE))

We can see that it's possible to have THP disabled system-wide, but still
receive THPs in this vma. It seems that it's assumed that just because
khugepaged_always == false, TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG must be
set, which is not the case. We could have VM_HUGEPAGE set, but have THP
set to "never" system-wide, in which case, the condition presented in the
if will evaluate to false, and (provided the other checks pass) we can
end up giving out a THP even though the behavior is set to "never."

While we do properly check these flags in khugepaged_has_work, it looks
like it's possible to sleep after we check khugepaged_has_work, but
before hugepage_vma_check, during which time, hugepages could have been
disabled system-wide, in which case, we could hand out THPs when we
shouldn't be.

This small fix makes hugepage_vma_check work more like
transparent_hugepage_enabled, checking if THPs are set to "always"
system-wide, then checking if THPs are set to "madvise," as well as
making sure that VM_HUGEPAGE is set for this vma.
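
To make the blind spot concrete, here is a minimal user-space sketch that
compares the original condition with the policy described in the paragraph
above. It is a model only, not a transcription of the diff below: the
thp_mode enum, the flag values, and both function names are assumptions made
for illustration and do not match the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the system-wide THP setting. */
enum thp_mode { THP_NEVER, THP_MADVISE, THP_ALWAYS };

/* Hypothetical flag bits; the kernel's real VM_* values differ. */
#define VM_HUGEPAGE   0x1u
#define VM_NOHUGEPAGE 0x2u

/* Model of the original check: returns false when the vma is rejected. */
static bool original_check(enum thp_mode mode, unsigned int vm_flags)
{
	bool always = (mode == THP_ALWAYS);

	if ((!(vm_flags & VM_HUGEPAGE) && !always) ||
	    (vm_flags & VM_NOHUGEPAGE))
		return false;
	return true;
}

/*
 * Policy as described in the text: allow when the mode is "always", or
 * when the mode is "madvise" and the vma has VM_HUGEPAGE; never allow a
 * vma that has VM_NOHUGEPAGE.
 */
static bool described_policy(enum thp_mode mode, unsigned int vm_flags)
{
	if (vm_flags & VM_NOHUGEPAGE)
		return false;
	if (mode == THP_ALWAYS)
		return true;
	return mode == THP_MADVISE && (vm_flags & VM_HUGEPAGE);
}

/* The case from the text: mode "never", but the vma has VM_HUGEPAGE. */
static void check_blind_spot(void)
{
	assert(original_check(THP_NEVER, VM_HUGEPAGE));    /* still accepted */
	assert(!described_policy(THP_NEVER, VM_HUGEPAGE)); /* rejected */
}
```

Calling check_blind_spot() from a small test program exercises exactly the
"never" plus VM_HUGEPAGE combination; the two functions agree on the other
combinations of the three modes and two flags.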

Signed-off-by: Alex Thorlton <athorlton@sgi.com>
Reported-by: Alex Thorlton <athorlton@sgi.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org

---
mm/huge_memory.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 95d1acb..f62fba9 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2394,7 +2394,8 @@ static struct page

static bool hugepage_vma_check(struct vm_area_struct *vma)
{
- if ((!(vma->vm_flags & VM_HUGEPAGE) && !khugepaged_always()) ||
+ if ((!khugepaged_always() ||
+ (!(vma->vm_flags & VM_HUGEPAGE) && !khugepaged_req_madv())) ||
(vma->vm_flags & VM_NOHUGEPAGE))
return false;

--
1.7.12.4





== 2 of 2 ==
Date: Tues, Jan 21 2014 3:30 pm
From: David Rientjes


On Tue, 21 Jan 2014, Alex Thorlton wrote:

> hugepage_vma_check is called during khugepaged_scan_mm_slot to ensure
> that khugepaged doesn't try to allocate THPs in vmas where they are
> disallowed, either due to THPs being disabled system-wide, or through
> MADV_NOHUGEPAGE.
>
> The logic that hugepage_vma_check uses doesn't seem to cover all cases,
> in my opinion. Looking at the original code:
>
> if ((!(vma->vm_flags & VM_HUGEPAGE) && !khugepaged_always()) ||
> (vma->vm_flags & VM_NOHUGEPAGE))
>
> We can see that it's possible to have THP disabled system-wide, but still
> receive THPs in this vma. It seems that it's assumed that just because
> khugepaged_always == false, TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG must be
> set, which is not the case. We could have VM_HUGEPAGE set, but have THP
> set to "never" system-wide, in which case, the condition presented in the
> if will evaluate to false, and (provided the other checks pass) we can
> end up giving out a THP even though the behavior is set to "never."
>

You should be able to add a

BUG_ON(current != khugepaged_thread);

here since khugepaged is supposed to be the only caller to the function.

> While we do properly check these flags in khugepaged_has_work, it looks
> like it's possible to sleep after we check khugepaged_has_work, but
> before hugepage_vma_check, during which time, hugepages could have been
> disabled system-wide, in which case, we could hand out THPs when we
> shouldn't be.
>

You're talking about when thp is set to "never" and before khugepaged has
stopped, correct?

That doesn't seem like a bug to me, or anything that needs to be fixed: the
sysfs knob could be switched even after hugepage_vma_check() is called and
before a hugepage is actually collapsed, so you have the same race.

The only thing that's guaranteed is that, upon writing "never" to
/sys/kernel/mm/transparent_hugepage/enabled, no more thp memory will be
collapsed after khugepaged has stopped.





==============================================================================
TOPIC: drivers/base: delete non-required instances of include <linux/init.h>
http://groups.google.com/group/linux.kernel/t/bd6c2e133bc56440?hl=en
==============================================================================

== 1 of 2 ==
Date: Tues, Jan 21 2014 3:10 pm
From: "Rafael J. Wysocki"


On Tuesday, January 21, 2014 04:23:10 PM Paul Gortmaker wrote:
> None of these files are actually using any __init type directives
> and hence don't need to include <linux/init.h>. Most are just
> left over from __devinit and __cpuinit removal, or simply due to
> code getting copied from one driver to the next.
>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Pavel Machek <pavel@ucw.cz>
> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
> Cc: Len Brown <len.brown@intel.com>
> Cc: Mark Brown <broonie@kernel.org>
> Cc: linux-pm@vger.kernel.org
> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>

Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

> ---
> drivers/base/attribute_container.c | 1 -
> drivers/base/power/clock_ops.c | 1 -
> drivers/base/power/common.c | 1 -
> drivers/base/power/domain.c | 1 -
> drivers/base/power/domain_governor.c | 1 -
> drivers/base/power/opp.c | 1 -
> drivers/base/regmap/regmap-i2c.c | 1 -
> drivers/base/regmap/regmap-mmio.c | 1 -
> drivers/base/regmap/regmap-spi.c | 1 -
> drivers/base/topology.c | 1 -
> 10 files changed, 10 deletions(-)
>
> diff --git a/drivers/base/attribute_container.c b/drivers/base/attribute_container.c
> index ecc1929..b84ca8f 100644
> --- a/drivers/base/attribute_container.c
> +++ b/drivers/base/attribute_container.c
> @@ -12,7 +12,6 @@
> */
>
> #include <linux/attribute_container.h>
> -#include <linux/init.h>
> #include <linux/device.h>
> #include <linux/kernel.h>
> #include <linux/slab.h>
> diff --git a/drivers/base/power/clock_ops.c b/drivers/base/power/clock_ops.c
> index e870bbe..b99e6c0 100644
> --- a/drivers/base/power/clock_ops.c
> +++ b/drivers/base/power/clock_ops.c
> @@ -6,7 +6,6 @@
> * This file is released under the GPLv2.
> */
>
> -#include <linux/init.h>
> #include <linux/kernel.h>
> #include <linux/device.h>
> #include <linux/io.h>
> diff --git a/drivers/base/power/common.c b/drivers/base/power/common.c
> index 5da9140..df2e5ee 100644
> --- a/drivers/base/power/common.c
> +++ b/drivers/base/power/common.c
> @@ -6,7 +6,6 @@
> * This file is released under the GPLv2.
> */
>
> -#include <linux/init.h>
> #include <linux/kernel.h>
> #include <linux/device.h>
> #include <linux/export.h>
> diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c
> index bfb8955..921b192 100644
> --- a/drivers/base/power/domain.c
> +++ b/drivers/base/power/domain.c
> @@ -6,7 +6,6 @@
> * This file is released under the GPLv2.
> */
>
> -#include <linux/init.h>
> #include <linux/kernel.h>
> #include <linux/io.h>
> #include <linux/pm_runtime.h>
> diff --git a/drivers/base/power/domain_governor.c b/drivers/base/power/domain_governor.c
> index 28dee30..a089e3b 100644
> --- a/drivers/base/power/domain_governor.c
> +++ b/drivers/base/power/domain_governor.c
> @@ -6,7 +6,6 @@
> * This file is released under the GPLv2.
> */
>
> -#include <linux/init.h>
> #include <linux/kernel.h>
> #include <linux/pm_domain.h>
> #include <linux/pm_qos.h>
> diff --git a/drivers/base/power/opp.c b/drivers/base/power/opp.c
> index fa41874..2553867 100644
> --- a/drivers/base/power/opp.c
> +++ b/drivers/base/power/opp.c
> @@ -14,7 +14,6 @@
> #include <linux/kernel.h>
> #include <linux/errno.h>
> #include <linux/err.h>
> -#include <linux/init.h>
> #include <linux/slab.h>
> #include <linux/cpufreq.h>
> #include <linux/device.h>
> diff --git a/drivers/base/regmap/regmap-i2c.c b/drivers/base/regmap/regmap-i2c.c
> index fa6bf52..ebd1895 100644
> --- a/drivers/base/regmap/regmap-i2c.c
> +++ b/drivers/base/regmap/regmap-i2c.c
> @@ -13,7 +13,6 @@
> #include <linux/regmap.h>
> #include <linux/i2c.h>
> #include <linux/module.h>
> -#include <linux/init.h>
>
> static int regmap_i2c_write(void *context, const void *data, size_t count)
> {
> diff --git a/drivers/base/regmap/regmap-mmio.c b/drivers/base/regmap/regmap-mmio.c
> index 81f9775..4410cb2 100644
> --- a/drivers/base/regmap/regmap-mmio.c
> +++ b/drivers/base/regmap/regmap-mmio.c
> @@ -18,7 +18,6 @@
>
> #include <linux/clk.h>
> #include <linux/err.h>
> -#include <linux/init.h>
> #include <linux/io.h>
> #include <linux/module.h>
> #include <linux/regmap.h>
> diff --git a/drivers/base/regmap/regmap-spi.c b/drivers/base/regmap/regmap-spi.c
> index 37f12ae..0eb3097 100644
> --- a/drivers/base/regmap/regmap-spi.c
> +++ b/drivers/base/regmap/regmap-spi.c
> @@ -12,7 +12,6 @@
>
> #include <linux/regmap.h>
> #include <linux/spi/spi.h>
> -#include <linux/init.h>
> #include <linux/module.h>
>
> #include "internal.h"
> diff --git a/drivers/base/topology.c b/drivers/base/topology.c
> index 94ffee3..ad9d177 100644
> --- a/drivers/base/topology.c
> +++ b/drivers/base/topology.c
> @@ -23,7 +23,6 @@
> * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
> *
> */
> -#include <linux/init.h>
> #include <linux/mm.h>
> #include <linux/cpu.h>
> #include <linux/module.h>
>

--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.




== 2 of 2 ==
Date: Tues, Jan 21 2014 3:50 pm
From: Geoff Levand


Hi Paul,

On Tue, 2014-01-21 at 16:22 -0500, Paul Gortmaker wrote:
> Currently these two RTC devices are in core platform code
> where it is not possible for them to be modular. It will
> never be modular, so using module_init as an alias for
> __initcall can be somewhat misleading.
>
> arch/powerpc/kernel/time.c | 2 +-
> arch/powerpc/platforms/ps3/time.c | 3 +--
> 2 files changed, 2 insertions(+), 3 deletions(-)

I tested the PS3 part of this patch and it seems to work OK.

Acked-by: Geoff Levand <geoff@infradead.org>






==============================================================================
TOPIC: pm/qos: allow state control of qos class
http://groups.google.com/group/linux.kernel/t/99d47d2ab1cbab9d?hl=en
==============================================================================

== 1 of 2 ==
Date: Tues, Jan 21 2014 3:10 pm
From: "Rafael J. Wysocki"


On Tuesday, January 21, 2014 02:10:42 PM Jacob Pan wrote:
> On Thu, 16 Jan 2014 02:17:01 +0100
> "Rafael J. Wysocki" <rjw@rjwysocki.net> wrote:
>
> > On Wednesday, November 27, 2013 12:28:16 AM Rafael J. Wysocki wrote:
> > > On 11/27/2013 12:20 AM, Jacob Pan wrote:
> > > > When power capping or thermal control is needed, CPU QOS latency
> > > > cannot be satisfied. This patch adds a state variable to indicate
> > > > whether a QOS class (including all constraint requests) should be
> > > > ignored.
> > > >
> > > > Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > >
> > > Honestly, I don't like this. I know the motivation and what you're
> > > trying to achieve, but I don't like the approach.
> > >
> > > I need to think a bit more about that.
> >
> > So the reason I don't like this patch is mainly because it affects
> > all of the users of struct pm_qos_constraints and
> > pm_qos_read_value(), which include device PM QoS among other things,
> > but it only really needs to affect PM_QOS_CPU_DMA_LATENCY.
> >
> > I would add a special routine, say pm_qos_cpu_dma_latency(), for
> > reading the current effective PM_QOS_CPU_DMA_LATENCY constraint and
> > checking whether or not it should be ignored. Then, I'd make cpuidle
> > use that.
> >
> Agreed, it was a little too broad. I will send an updated patch soon.
>
> Alternatively, can we add a special check for ignored system wide QOS
> class in:
> int pm_qos_request(int pm_qos_class)
>
> i.e.
> diff --git a/kernel/power/qos.c b/kernel/power/qos.c
> index 8dff9b4..9342da4 100644
> --- a/kernel/power/qos.c
> +++ b/kernel/power/qos.c
> @@ -286,10 +286,28 @@ bool pm_qos_update_flags(struct pm_qos_flags *pqf,
> */
> int pm_qos_request(int pm_qos_class)
> {
> - return pm_qos_read_value(pm_qos_array[pm_qos_class]->constraints);
> + struct pm_qos_constraints *c;
> +
> + c = pm_qos_array[pm_qos_class]->constraints;
> + if (c->state == PM_QOS_CONSTRAINT_IGNORED)
> + return PM_QOS_DEFAULT_VALUE;
> + return pm_qos_read_value(c);
>
>
> Then we don't have to add a special routine just for CPU_DMA_LATENCY
> class. It does not affect other system wide QOS classes unless the
> state is set to be ignored.

Yes, but then the check has to be done regardless, which is slightly
inefficient, and I'm not sure if we need/want a mechanism to set "ignored"
for all classes.

It actually is specific to CPU in practice, so I'd prefer to make it specific
in the code as well.
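
As a rough illustration of that preference, here is a user-space sketch of a
CPU-specific read routine. Everything in it is an assumption for
illustration: struct qos_constraints, qos_read_value() and the
cpu_dma_latency_ignored flag are simplified stand-ins rather than the
kernel's definitions, and pm_qos_cpu_dma_latency() is only the name floated
earlier in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed sentinel meaning "no constraint"; illustrative only. */
#define PM_QOS_DEFAULT_VALUE (-1)

/* Hypothetical, simplified model of a constraints object. */
struct qos_constraints {
	int target_value;	/* current effective constraint */
};

/* Hypothetical state kept only for the CPU DMA latency class. */
static bool cpu_dma_latency_ignored;

/* Generic read path: no "ignored" check, so other classes pay nothing. */
static int qos_read_value(const struct qos_constraints *c)
{
	return c->target_value;
}

/* CPU-specific routine that cpuidle alone would call. */
static int pm_qos_cpu_dma_latency(const struct qos_constraints *c)
{
	if (cpu_dma_latency_ignored)
		return PM_QOS_DEFAULT_VALUE;
	return qos_read_value(c);
}

/* Usage: the generic path is unaffected by the CPU-only flag. */
static void demo_cpu_dma_latency(void)
{
	struct qos_constraints c = { .target_value = 20 };

	cpu_dma_latency_ignored = false;
	assert(pm_qos_cpu_dma_latency(&c) == 20);

	cpu_dma_latency_ignored = true;
	assert(pm_qos_cpu_dma_latency(&c) == PM_QOS_DEFAULT_VALUE);
	assert(qos_read_value(&c) == 20);
}
```

The point of the split is that only the one caller that cares about the
"ignored" state pays for the extra check.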

Thanks,
Rafael





== 2 of 2 ==
Date: Tues, Jan 21 2014 3:50 pm
From: Jacob Pan


On Wed, 22 Jan 2014 00:15:44 +0100
"Rafael J. Wysocki" <rjw@rjwysocki.net> wrote:

> On Tuesday, January 21, 2014 02:10:42 PM Jacob Pan wrote:
> > On Thu, 16 Jan 2014 02:17:01 +0100
> > "Rafael J. Wysocki" <rjw@rjwysocki.net> wrote:
> >
> > > On Wednesday, November 27, 2013 12:28:16 AM Rafael J. Wysocki
> > > wrote:
> > > > On 11/27/2013 12:20 AM, Jacob Pan wrote:
> > > > > When power capping or thermal control is needed, CPU QOS
> > > > > latency cannot be satisfied. This patch adds a state variable
> > > > > to indicate whether a QOS class (including all constraint
> > > > > requests) should be ignored.
> > > > >
> > > > > Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > > >
> > > > Honestly, I don't like this. I know the motivation and what
> > > > you're trying to achieve, but I don't like the approach.
> > > >
> > > > I need to think a bit more about that.
> > >
> > > So the reason I don't like this patch is mainly because it affects
> > > all of the users of struct pm_qos_constraints and
> > > pm_qos_read_value(), which include device PM QoS among other
> > > things, but it only really needs to affect PM_QOS_CPU_DMA_LATENCY.
> > >
> > > I would add a special routine, say pm_qos_cpu_dma_latency(), for
> > > reading the current effective PM_QOS_CPU_DMA_LATENCY constraint
> > > and checking whether or not it should be ignored. Then, I'd make
> > > cpuidle use that.
> > >
> > Agreed, it was a little too broad. I will send an updated patch
> > soon.
> >
> > Alternatively, can we add a special check for ignored system wide
> > QOS class in:
> > int pm_qos_request(int pm_qos_class)
> >
> > i.e.
> > diff --git a/kernel/power/qos.c b/kernel/power/qos.c
> > index 8dff9b4..9342da4 100644
> > --- a/kernel/power/qos.c
> > +++ b/kernel/power/qos.c
> > @@ -286,10 +286,28 @@ bool pm_qos_update_flags(struct pm_qos_flags *pqf,
> > */
> > int pm_qos_request(int pm_qos_class)
> > {
> > - return pm_qos_read_value(pm_qos_array[pm_qos_class]->constraints);
> > + struct pm_qos_constraints *c;
> > +
> > + c = pm_qos_array[pm_qos_class]->constraints;
> > + if (c->state == PM_QOS_CONSTRAINT_IGNORED)
> > + return PM_QOS_DEFAULT_VALUE;
> > + return pm_qos_read_value(c);
> >
> >
> > Then we don't have to add a special routine just for CPU_DMA_LATENCY
> > class. It does not affect other system wide QOS classes unless the
> > state is set to be ignored.
>
> Yes, but then the check has to be done regardless which is slightly
> inefficient and I'm not sure if we need/want a mechanism to set
> "ignored" for all classes.
>
> It actually is specific to CPU in practice, so I'd prefer to make it
> specific in the code as well.
Actually, the idle consolidation patches that went into the tip tree do not
include the common idle loop. That differs from the earlier patch with
play_idle(), which causes idle injection to go through PM QoS.

There is no need for this patchset for now. The acpi_pad and powerclamp
drivers can still pick their own target C-states.

Thanks,

Jacob





==============================================================================
TOPIC: [PATCH 4-1/5] ACPI / scan: Add bind/unbind callbacks to struct acpi_scan_handler
http://groups.google.com/group/linux.kernel/t/9677ebe66238002a?hl=en
==============================================================================

== 1 of 3 ==
Date: Tues, Jan 21 2014 3:20 pm
From: "Rafael J. Wysocki"


From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

In some cases it may be necessary to perform certain setup/cleanup
operations on a device object representing a physical device after
it has been associated with an ACPI companion by acpi_bind_one() or
before disassociating it from that companion by acpi_unbind_one(),
respectively. If there is a struct acpi_bus_type object for the
given device's bus type, the .setup()/.cleanup() callbacks from there
are executed for these purposes. However, an analogous mechanism will
be necessary for devices whose bus types don't have corresponding
struct acpi_bus_type objects and that have specific ACPI scan handlers.

For those devices, add new .bind() and .unbind() callbacks to struct
acpi_scan_handler that will be executed by acpi_platform_notify()
right after the given device has been associated with an ACPI
companion and by acpi_platform_notify_remove() right before calling
acpi_unbind_one() for that device, respectively.
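
The dispatch order on the add path can be sketched in user space as follows.
This is a simplified model under stated assumptions: struct device,
struct bus_type_ops, struct scan_handler and notify_add() are hypothetical
stand-ins for the kernel objects, and the counters exist only so the example
can be checked.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical device with counters so the dispatch can be observed. */
struct device {
	int setup_calls;
	int bind_calls;
};

/* Stand-in for the bus type object with its .setup() callback. */
struct bus_type_ops {
	void (*setup)(struct device *dev);
};

/* Stand-in for the scan handler with the new .bind() callback. */
struct scan_handler {
	void (*bind)(struct device *dev);
};

static void count_setup(struct device *dev) { dev->setup_calls++; }
static void count_bind(struct device *dev)  { dev->bind_calls++; }

/*
 * Bus type .setup() takes precedence; the handler's .bind() runs only
 * when there is no bus type object or it has no .setup() callback.
 */
static void notify_add(struct device *dev, const struct bus_type_ops *type,
		       const struct scan_handler *handler)
{
	if (type && type->setup)
		type->setup(dev);
	else if (handler && handler->bind)
		handler->bind(dev);
}

/* Usage: one device per case, covering both branches. */
static void demo_notify_add(void)
{
	struct bus_type_ops bus = { .setup = count_setup };
	struct scan_handler handler = { .bind = count_bind };
	struct device a = { 0, 0 }, b = { 0, 0 };

	notify_add(&a, &bus, &handler);		/* bus type wins */
	assert(a.setup_calls == 1 && a.bind_calls == 0);

	notify_add(&b, NULL, &handler);		/* falls back to .bind() */
	assert(b.setup_calls == 0 && b.bind_calls == 1);
}
```

The remove path mirrors this with .cleanup() and .unbind().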

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
drivers/acpi/glue.c | 12 ++++++++++++
include/acpi/acpi_bus.h | 2 ++
2 files changed, 14 insertions(+)

Index: linux-pm/drivers/acpi/glue.c
===================================================================
--- linux-pm.orig/drivers/acpi/glue.c
+++ linux-pm/drivers/acpi/glue.c
@@ -287,6 +287,7 @@ EXPORT_SYMBOL_GPL(acpi_unbind_one);
static int acpi_platform_notify(struct device *dev)
{
struct acpi_bus_type *type = acpi_get_bus_type(dev);
+ struct acpi_device *adev;
int ret;

ret = acpi_bind_one(dev, NULL);
@@ -303,9 +304,14 @@ static int acpi_platform_notify(struct d
if (ret)
goto out;
}
+ adev = ACPI_COMPANION(dev);
+ if (!adev)
+ goto out;

if (type && type->setup)
type->setup(dev);
+ else if (adev->handler && adev->handler->bind)
+ adev->handler->bind(dev);

out:
#if ACPI_GLUE_DEBUG
@@ -324,11 +330,17 @@ static int acpi_platform_notify(struct d

static int acpi_platform_notify_remove(struct device *dev)
{
+ struct acpi_device *adev = ACPI_COMPANION(dev);
struct acpi_bus_type *type;

+ if (!adev)
+ return 0;
+
type = acpi_get_bus_type(dev);
if (type && type->cleanup)
type->cleanup(dev);
+ else if (adev->handler && adev->handler->unbind)
+ adev->handler->unbind(dev);

acpi_unbind_one(dev);
return 0;
Index: linux-pm/include/acpi/acpi_bus.h
===================================================================
--- linux-pm.orig/include/acpi/acpi_bus.h
+++ linux-pm/include/acpi/acpi_bus.h
@@ -133,6 +133,8 @@ struct acpi_scan_handler {
struct list_head list_node;
int (*attach)(struct acpi_device *dev, const struct acpi_device_id *id);
void (*detach)(struct acpi_device *dev);
+ void (*bind)(struct device *phys_dev);
+ void (*unbind)(struct device *phys_dev);
struct acpi_hotplug_profile hotplug;
};






== 2 of 3 ==
Date: Tues, Jan 21 2014 3:20 pm
From: "Rafael J. Wysocki"


On Monday, January 20, 2014 02:10:10 PM Rafael J. Wysocki wrote:
> On Monday, January 20, 2014 01:15:19 PM Mika Westerberg wrote:
> > On Fri, Jan 17, 2014 at 03:46:40PM +0100, Rafael J. Wysocki wrote:
> > > @@ -415,11 +472,12 @@ static int acpi_lpss_platform_notify(str
> > > return 0;
> > > }
> > >
> > > - if (action == BUS_NOTIFY_ADD_DEVICE)
> > > + if (action == BUS_NOTIFY_ADD_DEVICE) {
> > > ret = sysfs_create_group(&pdev->dev.kobj, &lpss_attr_group);
> > > - else if (action == BUS_NOTIFY_DEL_DEVICE)
> > > + pdev->dev.power.set_latency_tolerance = acpi_lpss_set_ltr;
> >
> > While trying to test this I noticed that BUS_NOTIFY_ADD_DEVICE happens
> > after call to dpm_sysfs_add(), so LTR field is never exposed to the
> > userspace.
>
> Ahh, I confused things, thanks for reporting this!
>
> I'll need to hook it up to acpi_platform_notify() somehow I think.

OK, two patches to replace this one ([4/5]) will follow.

On top of them we can move the code from acpi_lpss_platform_notify() to
acpi_lpss_(bind|unbind)() and get rid of acpi_lpss_nb, which I think would be
an improvement too. :-)

Thanks,
Rafael





== 3 of 3 ==
Date: Tues, Jan 21 2014 3:20 pm
From: "Rafael J. Wysocki"


From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Add a new routine, acpi_lpss_set_ltr(), for setting latency tolerance
values for LPSS devices having LTR (Latency Tolerance Reporting)
registers. Add .bind()/.unbind() callbacks to lpss_handler to set
the LPSS devices' power.set_latency_tolerance callback pointers to
acpi_lpss_set_ltr() during device addition and to clear those pointers
on device removal, respectively.

That will cause the device latency tolerance PM QoS to work for
the devices in question as documented.
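
For readers who want to see the value encoding in isolation, here is a
user-space sketch of the software LTR computation done by the patch below
for non-negative values. The constants are copied from the patch with the
LPSS_ prefix dropped; encode_sw_ltr() is a hypothetical helper name, and the
register accesses and the LTR mode switching are deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

#define LTR_SNOOP_REQ		(1u << 15)
#define LTR_SNOOP_MASK		0x0000FFFFu
#define LTR_SNOOP_LAT_1US	0x800u
#define LTR_SNOOP_LAT_32US	0xC00u
#define LTR_SNOOP_LAT_SHIFT	5
#define LTR_MAX_VAL		0x3FF

/*
 * Compute the new software LTR register value from the old one and a
 * non-negative latency tolerance. Negative values are handled separately
 * in the patch (they switch software LTR mode off) and are not valid here.
 */
static uint32_t encode_sw_ltr(uint32_t old_reg, int32_t val)
{
	uint32_t ltr_val = old_reg & ~LTR_SNOOP_MASK;

	if (val > LTR_MAX_VAL) {
		/* Too large for the 1us scale: use the 32us scale. */
		ltr_val |= LTR_SNOOP_LAT_32US | LTR_SNOOP_REQ;
		val >>= LTR_SNOOP_LAT_SHIFT;
		if (val > LTR_MAX_VAL)
			val = LTR_MAX_VAL;
	} else {
		ltr_val |= LTR_SNOOP_LAT_1US;
		if (val > 0)
			ltr_val |= LTR_SNOOP_REQ;
	}
	return ltr_val | (uint32_t)val;
}

/* Usage: a small value, then one that needs the coarser scale. */
static void demo_encode_sw_ltr(void)
{
	assert(encode_sw_ltr(0, 100) == 0x8864u);
	assert(encode_sw_ltr(0, 2000) == 0x8C3Eu);
}
```

Values above 0x3FF lose their low five bits in the shift, and anything that
still does not fit afterwards is clamped to 0x3FF.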

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
drivers/acpi/acpi_lpss.c | 71 ++++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 70 insertions(+), 1 deletion(-)

Index: linux-pm/drivers/acpi/acpi_lpss.c
===================================================================
--- linux-pm.orig/drivers/acpi/acpi_lpss.c
+++ linux-pm/drivers/acpi/acpi_lpss.c
@@ -33,6 +33,12 @@ ACPI_MODULE_NAME("acpi_lpss");
#define LPSS_GENERAL_UART_RTS_OVRD BIT(3)
#define LPSS_SW_LTR 0x10
#define LPSS_AUTO_LTR 0x14
+#define LPSS_LTR_SNOOP_REQ BIT(15)
+#define LPSS_LTR_SNOOP_MASK 0x0000FFFF
+#define LPSS_LTR_SNOOP_LAT_1US 0x800
+#define LPSS_LTR_SNOOP_LAT_32US 0xC00
+#define LPSS_LTR_SNOOP_LAT_SHIFT 5
+#define LPSS_LTR_MAX_VAL 0x3FF
#define LPSS_TX_INT 0x20
#define LPSS_TX_INT_MASK BIT(1)

@@ -316,6 +322,17 @@ static int acpi_lpss_create_device(struc
return ret;
}

+static u32 __lpss_reg_read(struct lpss_private_data *pdata, unsigned int reg)
+{
+ return readl(pdata->mmio_base + pdata->dev_desc->prv_offset + reg);
+}
+
+static void __lpss_reg_write(u32 val, struct lpss_private_data *pdata,
+ unsigned int reg)
+{
+ writel(val, pdata->mmio_base + pdata->dev_desc->prv_offset + reg);
+}
+
static int lpss_reg_read(struct device *dev, unsigned int reg, u32 *val)
{
struct acpi_device *adev;
@@ -337,7 +354,7 @@ static int lpss_reg_read(struct device *
ret = -ENODEV;
goto out;
}
- *val = readl(pdata->mmio_base + pdata->dev_desc->prv_offset + reg);
+ *val = __lpss_reg_read(pdata, reg);

out:
spin_unlock_irqrestore(&dev->power.lock, flags);
@@ -390,6 +407,38 @@ static struct attribute_group lpss_attr_
.name = "lpss_ltr",
};

+static void acpi_lpss_set_ltr(struct device *dev, s32 val)
+{
+ struct lpss_private_data *pdata = acpi_driver_data(ACPI_COMPANION(dev));
+ u32 ltr_mode, ltr_val;
+
+ ltr_mode = __lpss_reg_read(pdata, LPSS_GENERAL);
+ if (val < 0) {
+ if (ltr_mode & LPSS_GENERAL_LTR_MODE_SW) {
+ ltr_mode &= ~LPSS_GENERAL_LTR_MODE_SW;
+ __lpss_reg_write(ltr_mode, pdata, LPSS_GENERAL);
+ }
+ return;
+ }
+ ltr_val = __lpss_reg_read(pdata, LPSS_SW_LTR) & ~LPSS_LTR_SNOOP_MASK;
+ if (val > LPSS_LTR_MAX_VAL) {
+ ltr_val |= LPSS_LTR_SNOOP_LAT_32US | LPSS_LTR_SNOOP_REQ;
+ val >>= LPSS_LTR_SNOOP_LAT_SHIFT;
+ if (val > LPSS_LTR_MAX_VAL)
+ val = LPSS_LTR_MAX_VAL;
+ } else {
+ ltr_val |= LPSS_LTR_SNOOP_LAT_1US;
+ if (val > 0)
+ ltr_val |= LPSS_LTR_SNOOP_REQ;
+ }
+ ltr_val |= val;
+ __lpss_reg_write(ltr_val, pdata, LPSS_SW_LTR);
+ if (!(ltr_mode & LPSS_GENERAL_LTR_MODE_SW)) {
+ ltr_mode |= LPSS_GENERAL_LTR_MODE_SW;
+ __lpss_reg_write(ltr_mode, pdata, LPSS_GENERAL);
+ }
+}
+
static int acpi_lpss_platform_notify(struct notifier_block *nb,
unsigned long action, void *data)
{
@@ -427,9 +476,29 @@ static struct notifier_block acpi_lpss_n
.notifier_call = acpi_lpss_platform_notify,
};

+static void acpi_lpss_bind(struct device *dev)
+{
+ struct lpss_private_data *pdata = acpi_driver_data(ACPI_COMPANION(dev));
+
+ if (WARN_ON(!pdata || !pdata->mmio_base))
+ return;
+
+ if (pdata->mmio_size >= pdata->dev_desc->prv_offset + LPSS_LTR_SIZE)
+ dev->power.set_latency_tolerance = acpi_lpss_set_ltr;
+ else
+ dev_err(dev, "MMIO size insufficient to access LTR\n");
+}
+
+static void acpi_lpss_unbind(struct device *dev)
+{
+ dev->power.set_latency_tolerance = NULL;
+}
+
static struct acpi_scan_handler lpss_handler = {
.ids = acpi_lpss_device_ids,
.attach = acpi_lpss_create_device,
+ .bind = acpi_lpss_bind,
+ .unbind = acpi_lpss_unbind,
};

void __init acpi_lpss_init(void)

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/





==============================================================================
TOPIC: tty: Allow stealing of controlling ttys within user namespaces
http://groups.google.com/group/linux.kernel/t/46ae0e7e30358a80?hl=en
==============================================================================

== 1 of 1 ==
Date: Tues, Jan 21 2014 3:20 pm
From: ebiederm@xmission.com (Eric W. Biederman)


Seth Forshee <seth.forshee@canonical.com> writes:

> root is allowed to steal ttys from other sessions, but it
> requires system-wide CAP_SYS_ADMIN and therefore is not possible
> for root within a user namespace. This should be allowed so long
> as the process doing the stealing is privileged towards the
> session leader which currently owns the tty.
>
> Update the tty code to only require CAP_SYS_ADMIN in the
> namespace of the target session leader when stealing a tty. Fall
> back to using init_user_ns to preserve the existing behavior for
> system-wide root.
>
> Cc: stable@vger.kernel.org # 3.8+

This is not a regression of any form, nor is it obviously correct, so
this does not count as stable material.

> Cc: Serge Hallyn <serge.hallyn@canonical.com>
> Cc: "Eric W. Biederman" <ebiederm@xmission.com>
> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
> ---
> drivers/tty/tty_io.c | 14 +++++++++++++-
> 1 file changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c
> index c74a00a..1c47f16 100644
> --- a/drivers/tty/tty_io.c
> +++ b/drivers/tty/tty_io.c
> @@ -2410,7 +2410,19 @@ static int tiocsctty(struct tty_struct *tty, int arg)
> * This tty is already the controlling
> * tty for another session group!
> */
> - if (arg == 1 && capable(CAP_SYS_ADMIN)) {
> + struct user_namespace *ns = &init_user_ns;
> + struct task_struct *p;
> +
> + read_lock(&tasklist_lock);
> + do_each_pid_task(tty->session, PIDTYPE_SID, p) {
> + if (p->signal->leader) {
> + ns = task_cred_xxx(p, user_ns);
> + break;
> + }
> + } while_each_pid_task(tty->session, PIDTYPE_SID, p);
> + read_unlock(&tasklist_lock);

Ugh. That appears to be both racy (what protects the user_ns from going
away?) and to possibly allow revoking a tty from a more privileged process.

However I do see a form that can easily verify we won't revoke a tty from a
more privileged process.

if (arg == 1) {
	struct user_namespace *user_ns;
	struct task_struct *p;

	read_lock(&tasklist_lock);
	do_each_pid_task(tty->session, PIDTYPE_SID, p) {
		rcu_read_lock();
		user_ns = task_cred_xxx(p, user_ns);
		if (!ns_capable(user_ns, CAP_SYS_ADMIN)) {
			rcu_read_unlock();
			read_unlock(&tasklist_lock);
			ret = -EPERM;
			goto out_unlock;
		}
		rcu_read_unlock();
	} while_each_pid_task(tty->session, PIDTYPE_SID, p);
	/*
	 * Don't drop the tasklist_lock before stealing the tty,
	 * or the set of tasks can change, and we only have
	 * permission for this set of tasks.
	 */
	/*
	 * Steal it away
	 */
	session_clear_tty(tty->session);
	read_unlock(&tasklist_lock);
} else {
	ret = -EPERM;
	goto out_unlock;
}

My code above is ugly and could use some cleaning up but it should be
correct with respect to this issue.

Eric
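
The rule in the sketch above (the caller must be privileged towards every
task in the session, not only the leader) can be illustrated with a small
user-space model. This is not kernel code: struct ns, struct task,
ns_capable_model() and may_steal_tty() are hypothetical names invented for
illustration, and the capability check is reduced to "the caller holds
CAP_SYS_ADMIN in one namespace and therefore in its descendants".

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct user_namespace: only the parent link. */
struct ns {
	const struct ns *parent;	/* NULL for the initial namespace */
};

/* Hypothetical stand-in for a task: only the namespace of its credentials. */
struct task {
	const struct ns *user_ns;
};

/*
 * Simplified model of ns_capable(): the caller holds CAP_SYS_ADMIN in
 * cap_ns, which is assumed to grant it in every descendant of cap_ns.
 */
static bool ns_capable_model(const struct ns *cap_ns, const struct ns *target)
{
	for (; target != NULL; target = target->parent)
		if (target == cap_ns)
			return true;
	return false;
}

/*
 * The tty may be stolen only if the caller is privileged towards every
 * task in the session; one task in a more privileged namespace vetoes it.
 */
static bool may_steal_tty(const struct ns *cap_ns,
			  const struct task *session, size_t ntasks)
{
	for (size_t i = 0; i < ntasks; i++)
		if (!ns_capable_model(cap_ns, session[i].user_ns))
			return false;
	return true;
}
```

In this model a caller whose capability lives in a child namespace can steal
from a session made up only of tasks in that child, but is refused as soon as
the session contains a task whose credentials belong to the parent namespace.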


> + if (arg == 1 && ns_capable(user_ns, CAP_SYS_ADMIN)) {
> /*
> * Steal it away
> */





==============================================================================
TOPIC: Linux 3.12.7 introduces page map handling regression
http://groups.google.com/group/linux.kernel/t/1076eaae005a05d1?hl=en
==============================================================================

== 1 of 1 ==
Date: Tues, Jan 21 2014 3:30 pm
From: Steven Noonan


A user reported a problem starting vsftpd on a Xen paravirtualized
guest, with this in dmesg:

[ 60.654862] BUG: Bad page map in process vsftpd pte:8000000493b88165 pmd:e9cc01067
[ 60.654876] page:ffffea00124ee200 count:0 mapcount:-1 mapping: (null) index:0x0
[ 60.654879] page flags: 0x2ffc0000000014(referenced|dirty)
[ 60.654885] addr:00007f97eea74000 vm_flags:00100071 anon_vma:ffff880e98f80380 mapping: (null) index:7f97eea74
[ 60.654890] CPU: 4 PID: 587 Comm: vsftpd Not tainted 3.12.7-1-ec2 #1
[ 60.654893] ffff880e9cc6ec38 ffff880e9cc61ca0 ffffffff814c763b 00007f97eea74000
[ 60.654900] ffff880e9cc61ce8 ffffffff8116784e 0000000000000000 0000000000000000
[ 60.654906] ffff880e9cc013a0 ffffea00124ee200 00007f97eea75000 ffff880e9cc61e10
[ 60.654912] Call Trace:
[ 60.654921] [<ffffffff814c763b>] dump_stack+0x45/0x56
[ 60.654928] [<ffffffff8116784e>] print_bad_pte+0x22e/0x250
[ 60.654933] [<ffffffff81169073>] unmap_single_vma+0x583/0x890
[ 60.654938] [<ffffffff8116a405>] unmap_vmas+0x65/0x90
[ 60.654942] [<ffffffff81173795>] exit_mmap+0xc5/0x170
[ 60.654948] [<ffffffff8105d295>] mmput+0x65/0x100
[ 60.654952] [<ffffffff81062983>] do_exit+0x393/0x9e0
[ 60.654955] [<ffffffff810630dc>] do_group_exit+0xcc/0x140
[ 60.654959] [<ffffffff81063164>] SyS_exit_group+0x14/0x20
[ 60.654965] [<ffffffff814d602d>] system_call_fastpath+0x1a/0x1f
[ 60.654968] Disabling lock debugging due to kernel taint
[ 60.655191] BUG: Bad rss-counter state mm:ffff880e9ca60580 idx:0 val:-1
[ 60.655196] BUG: Bad rss-counter state mm:ffff880e9ca60580 idx:1 val:1


The issue could not be reproduced under an HVM instance with the same
kernel, so it appears to be exclusive to paravirtual Xen guests.

I noted that it wasn't present in 3.10.27, but was present in 3.12.7 and
3.12.8. I ran through a bisection to find the root cause:

# start: 'v3.12.7' 'v3.10.27'
# bad: [4301b7a8] Linux 3.12.7
# good: [1071ea6e] Linux 3.10.27
# good: [8bb495e3] Linux 3.10
# good: [8fe73691] staging: comedi: comedi_bond: change return value
# good: [22e04f6b] Merge branch 'for-linus' of git://git.kernel.org/p
# good: [b7c09ad4] Merge branch 'for-linus' of git://git.kernel.org/p
# good: [13caa8ed] Merge git://git.kernel.org/pub/scm/linux/kernel/gi
# good: [13caa8ed] Merge git://git.kernel.org/pub/scm/linux/kernel/gi
# good: [f5fa9283] ipv6: reset dst.expires value when clearing expire
# good: [4af9d888] bridge: flush br's address entry in fdb when remov
# good: [8c13daf6] dm delay: fix a possible deadlock due to shared wo
# good: [93c02d70] firewire: sbp2: bring back WRITE SAME support
# good: [18065245] ACPI / PCI / hotplug: Avoid warning when _ADR not
# bad: [8807a436] mm/memory-failure.c: transfer page count from head
# bad: [fd5df800] mm: numa: avoid unnecessary disruption of NUMA hin
# good: [c18e3316] mm: numa: do not clear PMD during PTE update scan
# good: [f3b578d9] mm: numa: avoid unnecessary work on the failure pa
# bad: [3d792d61] mm: numa: clear numa hinting information on mprote
# good: [cefeb279] sched: numa: skip inaccessible VMAs
# first bad: [3d792d61] mm: numa: clear numa hinting information on mprote

If only I'd tested v3.12.0, that bisection would have been a lot shorter!


It looks like this is the change implicated (introduced in v3.12.7):

commit 3d792d616ba408ab55a54c1bb75a9367d997acfa
Author: Mel Gorman <mgorman@suse.de>
Date: Tue Jan 7 14:00:44 2014 +0000

mm: numa: clear numa hinting information on mprotect

commit 1667918b6483b12a6496bf54151b827b8235d7b1 upstream.

On a protection change it is no longer clear if the page should be still
accessible. This patch clears the NUMA hinting fault bits on a
protection change.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Alex Thorlton <athorlton@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


This clearly points to breakage of mprotect() in particular. Checking
what vsftpd was doing via strace, I was able to come up with a simple
test case which triggers the issue:

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

void die(const char *what)
{
perror(what);
exit(1);
}

int main(int arg, char **argv)
{
void *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

if (p == MAP_FAILED)
die("mmap");

/* Tickle the page. */
((char *)p)[0] = 0;

if (mprotect(p, 4096, PROT_NONE) != 0)
die("mprotect");

if (mprotect(p, 4096, PROT_READ) != 0)
die("mprotect");

if (munmap(p, 4096) != 0)
die("munmap");

return 0;
}

This could probably be reduced further. I didn't spend much time on it.

Adding people cited in the patch to CC, as well as Konrad since this is
a Xen issue (I haven't been able to repro on HVM or bare metal so far).

Any ideas what's causing the BUG, and how we can fix it?

- Steven





==============================================================================
TOPIC: x86: Inconsistent xAPIC synchronization in arch_irq_work_raise?
http://groups.google.com/group/linux.kernel/t/daa6d2e500b383a8?hl=en
==============================================================================

== 1 of 1 ==
Date: Tues, Jan 21 2014 3:30 pm
From: Huang Ying


On Tue, 2014-01-21 at 15:51 +0100, Peter Zijlstra wrote:
> On Tue, Jan 21, 2014 at 03:01:13PM +0100, Peter Zijlstra wrote:
> > On Tue, Jan 21, 2014 at 02:02:06PM +0100, Jan Kiszka wrote:
> > > Hi all,
> > >
> > > while trying to plug a race in the CPU hotplug code on xAPIC systems, I
> > > was analyzing IPI transmission patterns. The handlers in
> > > arch/x86/include/asm/ipi.h first wait for ICR, then send. In contrast,
> > > arch_irq_work_raise sends the self-IPI directly and then waits. This
> > > looks inconsistent. Is it intended?
> > >
> > > BTW, the races are in wakeup_secondary_cpu_via_init and
> > > wakeup_secondary_cpu_via_nmi (lacking IRQ disable around ICR accesses).
> > > There we also send first, then wait for completion. But I guess that is
> > > due to the code originally only being used during boot. Will send fixes
> > > for those once the sync pattern is clear to me.
> >
> > Could be I had no clue what I was doing and copy/pasted the code until
> > it compiled and ran.
> >
> > In fact, I've got no clue what an ICR is.
>
> I dug about a bit, I borrowed that code from:
>
> lkml.kernel.org/r/1277348698-17311-3-git-send-email-ying.huang@intel.com
>
> Huang Ying, can you explain to Jan why you do the wait afterwards?

I borrowed the code from the original MCE report event code.

Andi, could you help us explain it?

Best Regards,
Huang Ying
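
The two orderings Jan describes can be contrasted in a toy user-space model.
Nothing here touches a real APIC: struct icr_model, the three-poll delivery
latency and the "clobbered" counter are all assumptions made up for
illustration, and the model takes no position on whether the ordering in
arch_irq_work_raise is intended or on what real hardware does with a write to
a busy ICR.

```c
#include <stdbool.h>

/* Hypothetical model of the ICR: a write keeps it busy for a few polls. */
struct icr_model {
	unsigned int busy_polls;	/* polls left until the busy bit clears */
	unsigned int sent;		/* ICR writes issued */
	unsigned int clobbered;		/* writes issued while still busy */
};

/* Poll the busy bit once; each poll brings the model closer to idle. */
static bool icr_busy(struct icr_model *icr)
{
	if (icr->busy_polls == 0)
		return false;
	icr->busy_polls--;
	return true;
}

/* Write the ICR, recording whether the previous IPI was still pending. */
static void icr_write(struct icr_model *icr)
{
	if (icr->busy_polls != 0)
		icr->clobbered++;
	icr->sent++;
	icr->busy_polls = 3;		/* arbitrary delivery latency */
}

static void wait_icr_idle(struct icr_model *icr)
{
	while (icr_busy(icr))
		;
}

/* Ordering Jan attributes to the handlers in arch/x86/include/asm/ipi.h. */
static void send_wait_first(struct icr_model *icr)
{
	wait_icr_idle(icr);
	icr_write(icr);
}

/* Ordering Jan attributes to arch_irq_work_raise(). */
static void send_wait_after(struct icr_model *icr)
{
	icr_write(icr);
	wait_icr_idle(icr);
}
```

In this model either ordering is safe when used on its own. A write lands on
a busy register only when a wait-then-send caller is immediately followed by a
send-then-wait caller, because the second one assumes the register is idle on
entry.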







==============================================================================
TOPIC: MCS Lock: Allow architectures to hook in to contended paths
http://groups.google.com/group/linux.kernel/t/a70959a6ec9463ba?hl=en
==============================================================================

== 1 of 7 ==
Date: Tues, Jan 21 2014 3:40 pm
From: Tim Chen


From: Will Deacon <will.deacon@arm.com>

When contended, architectures may be able to reduce the polling overhead
in ways which aren't expressible using a simple relax() primitive.

This patch allows architectures to hook into the mcs_{lock,unlock}
functions for the contended cases only.

Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
---
include/linux/mcs_spinlock.h | 42 ++++++++++++++++++++++++++++--------------
1 file changed, 28 insertions(+), 14 deletions(-)

diff --git a/include/linux/mcs_spinlock.h b/include/linux/mcs_spinlock.h
index 143fa42..e9a4d74 100644
--- a/include/linux/mcs_spinlock.h
+++ b/include/linux/mcs_spinlock.h
@@ -17,6 +17,28 @@ struct mcs_spinlock {
int locked; /* 1 if lock acquired */
};

+#ifndef arch_mcs_spin_lock_contended
+/*
+ * Using smp_load_acquire() provides a memory barrier that ensures
+ * subsequent operations happen after the lock is acquired.
+ */
+#define arch_mcs_spin_lock_contended(l) \
+do { \
+ while (!(smp_load_acquire(l))) \
+ arch_mutex_cpu_relax(); \
+} while (0)
+
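
The default contended-path wait shown in the hunk above can be approximated
in user space with C11 atomics. This is a sketch under assumptions, not the
kernel implementation: the function names are hypothetical, the busy loop
omits arch_mutex_cpu_relax(), and the release-store hand-off is an assumed
counterpart that is not visible in the portion of the patch quoted here.

```c
#include <stdatomic.h>

/*
 * Hypothetical user-space analogue of the contended wait: spin until the
 * previous holder sets *locked. The acquire load keeps accesses that
 * follow the loop from being reordered before the lock is observed.
 */
static inline void mcs_spin_lock_contended_model(atomic_int *locked)
{
	while (!atomic_load_explicit(locked, memory_order_acquire))
		;	/* the kernel macro calls arch_mutex_cpu_relax() here */
}

/*
 * Assumed matching hand-off: a release store, so that everything the
 * previous holder wrote is visible once the waiter sees locked != 0.
 */
static inline void mcs_spin_unlock_contended_model(atomic_int *locked)
{
	atomic_store_explicit(locked, 1, memory_order_release);
}
```

An architecture that can wait more cheaply than polling (for example with an
event-based wait instruction) would replace the loop body, which is the kind
of hook the commit message describes.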
