Thursday, March 11, 2010

linux.kernel - 26 new messages in 15 topics - digest

linux.kernel
http://groups.google.com/group/linux.kernel?hl=en

linux.kernel@googlegroups.com

Today's topics:

* execve for script don't return ENOEXEC, bug ? - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/53d156cf36c58e4f?hl=en
* lp3971: Fix BUCK_VOL_CHANGE_SHIFT logic - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/e038a561bd808851?hl=en
* (none) - 2 messages, 1 author
http://groups.google.com/group/linux.kernel/t/168ef74f99f3304d?hl=en
* ST SPEAr: Added source files for SPEAr platform - 2 messages, 1 author
http://groups.google.com/group/linux.kernel/t/ce805d5c50983fd8?hl=en
* Low latency mode and performance of pty's? - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/5a2b00e35b0864a7?hl=en
* tcp: bugs and cleanup for 2.6.34-rc1 - 8 messages, 1 author
http://groups.google.com/group/linux.kernel/t/c35ca27b143f9a64?hl=en
* [PATCH 7/7] xen: Enable event channel of PV extension of HVM - 1 messages, 1
author
http://groups.google.com/group/linux.kernel/t/eac82160cca26360?hl=en
* ramzswap: Eliminate stale data in compressed memory - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/7889db6e2e4c45f0?hl=en
* oprofile, perf, x86: introduce new functions to reserve perfctrs - 2
messages, 2 authors
http://groups.google.com/group/linux.kernel/t/716ea40ac646a344?hl=en
* Please pull my perf.git urgent branch - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/caf7112ff37b81a8?hl=en
* perf_events: add sampling period randomization support (v2) - 1 messages, 1
author
http://groups.google.com/group/linux.kernel/t/e1eb7c2b14245ed5?hl=en
* fs/partition/msdos: Fix unusable extended partition for > 512B sector - 1
messages, 1 author
http://groups.google.com/group/linux.kernel/t/feedecd5f30fb05f?hl=en
* allow retrieving entries with trailing dots - 2 messages, 2 authors
http://groups.google.com/group/linux.kernel/t/57d334aba1b16975?hl=en
* perf_events: fix X86 bogus counts when multiplexing - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/82db6d0109ff6b09?hl=en
* [PATCH 1/1] perf: add support for arch-dependent symbolic event names to "
perf stat" - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/d9c6c63579142582?hl=en

==============================================================================
TOPIC: execve for script don't return ENOEXEC, bug ?
http://groups.google.com/group/linux.kernel/t/53d156cf36c58e4f?hl=en
==============================================================================

== 1 of 1 ==
Date: Thurs, Mar 11 2010 3:10 am
From: Valery Reznic


Hi,

I Have following to scripts:

a.sh
#!/bin/sh
echo "It's a.sh

and b.sh:
#! ./b.sh
echo "It's b.sh"

As per execve man page, script interpreter should not be script itself.
When I run it on my Fedora 8 x86_64 box (with stock kernel, never updated)
under strace I got following:

strace -f -e execve setarch x86_64 ./b.sh
execve("/usr/bin/setarch", ["setarch", "x86_64", "./b.sh"], [/* 23 vars */]) = 0
execve("./b.sh", ["./b.sh"], [/* 23 vars */]) = -1 ENOEXEC (Exec format error)
execve("/bin/sh", ["/bin/sh", "./b.sh"], [/* 23 vars */]) = 0
It's b.sh

I.e execve failed as it should

When I run same scripts on Fedora 12 x86_64 box with stock kernel 2.6.31.5-127.fc12.x86_64 I got following:

strace -f -e execve setarch i386 ./b.sh
execve("/usr/bin/setarch", ["setarch", "i386", "./b.sh"], [/* 41 vars */]) = 0
execve("./b.sh", ["./b.sh"], [/* 41 vars */]) = 0
It's a.sh

I.e execve succeeded, instead of failing with ENOEXEC

Regards,
Valery.

P.S. I am not subscribed to this list, so please CC me



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

==============================================================================
TOPIC: lp3971: Fix BUCK_VOL_CHANGE_SHIFT logic
http://groups.google.com/group/linux.kernel/t/e038a561bd808851?hl=en
==============================================================================

== 1 of 1 ==
Date: Thurs, Mar 11 2010 3:20 am
From: Liam Girdwood


On Thu, 2010-03-11 at 11:04 +0000, Mark Brown wrote:
> On Thu, Mar 11, 2010 at 09:56:34AM +0800, axel lin wrote:
> > From 99175ed0f281d54efdaf8eef548077a8f0b4924a Mon Sep 17 00:00:00 2001
> > From: Axel Lin <axel.lin@gmail.com>
> > Date: Thu, 11 Mar 2010 09:50:07 +0800
> > Subject: [PATCH] lp3971: Fix BUCK_VOL_CHANGE_SHIFT logic
> >
> > Given x=0,1,2, current implementation of BUCK_VOL_CHANGE_SHIFT(x) returns 0,4,8.
> > The correct return value should be 0,4,6.
> > This patch fix the logic.
> >
> > Signed-off-by: Axel Lin <axel.lin@gmail.com>
>
> Acked-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
>
> with the same comment about ideally wanting someone who worked on the
> driver previously to check stuff.

Marek, Kyungmin, could someone please test/ack this before both are
applied.

Thanks

Liam

--
Freelance Developer, SlimLogic Ltd
ASoC and Voltage Regulator Maintainer.
http://www.slimlogic.co.uk

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

==============================================================================
TOPIC: (none)
http://groups.google.com/group/linux.kernel/t/168ef74f99f3304d?hl=en
==============================================================================

== 1 of 2 ==
Date: Thurs, Mar 11 2010 3:30 am
From: "ExxonMobil Oil Company, Malaysia"


Your Email just won you ($500,000.00 US Dollars ATM CARD) You are to contact
Mr. Abdul Razi With your,Names....Age...Country....Tell....
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


== 2 of 2 ==
Date: Thurs, Mar 11 2010 3:30 am
From: "ExxonMobil Oil Company, Malaysia"


Your Email just won you ($500,000.00 US Dollars ATM CARD) You are to contact
Mr. Abdul Razi With your,Names....Age...Country....Tell....
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

==============================================================================
TOPIC: ST SPEAr: Added source files for SPEAr platform
http://groups.google.com/group/linux.kernel/t/ce805d5c50983fd8?hl=en
==============================================================================

== 1 of 2 ==
Date: Thurs, Mar 11 2010 3:30 am
From: Linus Walleij


Hi Viresh,

I'm working my way through the patch set, be patient :)

2010/3/3 Viresh KUMAR <viresh.kumar@st.com>:
> (...)
> diff --git a/arch/arm/plat-spear/gpt.c b/arch/arm/plat-spear/gpt.c
> new file mode 100644
> index 0000000..3b16360
> --- /dev/null
> +++ b/arch/arm/plat-spear/gpt.c
> @@ -0,0 +1,537 @@
> +/*
> + * arch/arm/plat-spear/gpt.c

As mentioned I prefer that you merge this file into the timer.c file
and remove the extra .h interface totally. Also of course refactor to
take into account that it's one file only.

> + *
> + * ST GPT Timer driver, based on omap's dmtimer.c
> + *
> + * Copyright (C) 2009 ST Microelectronics
> + * Shiraz Hashim<shiraz.hashim@st.com>
> + *
> + * This file is licensed under the terms of the GNU General Public
> + * License version 2. This program is licensed "as is" without any
> + * warranty of any kind, whether express or implied.
> + */
> +
> +#include <linux/clk.h>
> +#include <linux/errno.h>
> +#include <linux/init.h>
> +#include <linux/io.h>
> +#include <linux/kernel.h>
> +#include <linux/module.h>
> +#include <linux/spinlock.h>
> +#include <mach/irqs.h>
> +#include <mach/spear.h>
> +#include <plat/gpt.h>
> +
> +static const char *src_clk_con_id[] __initdata = {
> + � � � "pll3_48m_clk",
> + � � � "pll1_clk",
> + � � � NULL
> +};
> +
> +static struct clk *spear_source_clocks[2];
> +static struct spear_timer *spear_timers;
> +static int spear_timer_count;
> +static spinlock_t spear_timer_lock;
> +
> +/*
> + * static helper functions
> + */
> +static inline u16 spear_timer_read_reg(struct spear_timer *timer, u32 reg)
> +{
> + � � � return readw(timer->io_base + (reg & 0xff));
> +}
> +
> +static inline void spear_timer_write_reg(struct spear_timer *timer, u32 reg,
> + � � � � � � � � � � � � � � � � � � � � � � � u16 value)
> +{
> + � � � writew(value, timer->io_base + (reg & 0xff));
> +}

I usually just have raw writew() and skip all this extra abstraction in
my code, but it's a matter of taste so if you like it, keep it.

> +
> +static void spear_timer_reset(struct spear_timer *timer)
> +{
> +
> + � � � spear_timer_write_reg(timer, GPT_CTRL_OFF, 0x0);
> + � � � spear_timer_write_reg(timer, GPT_LOAD_OFF, 0x0);
> + � � � spear_timer_write_reg(timer, GPT_INT_OFF, GPT_STATUS_MATCH);
> + � � � timer->prescaler = 0;
> +}
> +
> +static void spear_timer_prepare(struct spear_timer *timer)
> +{
> + � � � spear_timer_enable(timer);
> + � � � spear_timer_reset(timer);
> +}
> +
> +/* functions exported for application */
> +
> +/**
> + * spear_timer_enable - enable a timer
> + * @timer: timer
> + *
> + * This API enables the functional clock of the timer
> + */
> +int spear_timer_enable(struct spear_timer *timer)
> +{
> + � � � if (!timer)
> + � � � � � � � return -ENODEV;
> +
> + � � � if (timer->enabled)
> + � � � � � � � return -EINVAL;
> +
> + � � � clk_enable(timer->fclk);
> +
> + � � � timer->enabled = 1;
> +
> + � � � return 0;
> +}
> +
> +/**
> + * spear_timer_disable - disables timer
> + * @timer: timer
> + *
> + * The API disables the functional clock of the timer
> + */
> +int spear_timer_disable(struct spear_timer *timer)
> +{
> + � � � if (!timer)
> + � � � � � � � return -ENODEV;
> +
> + � � � if (!timer->enabled)
> + � � � � � � � return -EINVAL;
> +
> + � � � clk_disable(timer->fclk);
> +
> + � � � timer->enabled = 0;
> +
> + � � � return 0;
> +}
> +
> +/**
> + * spear_timer_request - request for a timer
> + *
> + * This API returns a timer which is available, returns NULL if none is
> + * available
> + */
> +struct spear_timer *spear_timer_request(void)
> +{
> + � � � struct spear_timer *timer = NULL;
> + � � � unsigned long flags;
> + � � � int i;
> +
> + � � � spin_lock_irqsave(&spear_timer_lock, flags);
> +
> + � � � for (i = 0; i < spear_timer_count; i++) {
> + � � � � � � � if (spear_timers[i].reserved)
> + � � � � � � � � � � � continue;
> +
> + � � � � � � � timer = &spear_timers[i];
> + � � � � � � � timer->reserved = 1;

I forgot to mention that if .reserved can only have the values 0 and 1,
the chances are big that you should turn that member of struct spear_timer
into a bool instead and use false/true as values.

> + � � � � � � � break;
> + � � � }
> +
> + � � � spin_unlock_irqrestore(&spear_timer_lock, flags);
> +
> + � � � if (timer != NULL)
> + � � � � � � � spear_timer_prepare(timer);
> +
> + � � � return timer;
> +}
> +EXPORT_SYMBOL(spear_timer_request);
> +
> +/**
> + * spear_timer_request_specific - request for a specific timer
> + * @id: timer id requested, 1 onwards
> + *
> + * This API returns specific timer which is requested, returns NULL if it is
> + * not available.
> + */
> +struct spear_timer *spear_timer_request_specific(int id)

I don't see the need for this functions. Are not all timers the same?
If the timers are hardwired for specific purposes, as in they have some
extra hardware routing of their output,, this should rather
show up in the API. Again I think moving this into timer.c will solve
the problem.

> +{
> + � � � struct spear_timer *timer;
> + � � � unsigned long flags;
> +
> + � � � spin_lock_irqsave(&spear_timer_lock, flags);
> +
> + � � � if (id < 0 || id >= spear_timer_count ||
> + � � � � � � � � � � � spear_timers[id].reserved) {
> + � � � � � � � spin_unlock_irqrestore(&spear_timer_lock, flags);
> + � � � � � � � printk(KERN_ERR "BUG: unable to get timer %d\n", id);
> + dump_stack();

Is this really a bug...? You can easily see that say timer 2 can have been
taken by the generic spear_timer_request() function. In that case, the raise
of a BUG message is subject to calling order and implicit semantics, e.g.
you require that all clients that require a certain timer issue their
spear_timer_request_specific() before any calls to spear_request_timer()
is issued. That's getting messy, at least it needs to documented.

Again merging into timer.c will solve this (it will just go away).

> + � � � � � � � return NULL;
> + � � � }
> +
> + � � � timer = &spear_timers[id];
> + � � � timer->reserved = 1;
> + � � � spin_unlock_irqrestore(&spear_timer_lock, flags);
> +
> + � � � spear_timer_prepare(timer);
> +
> + � � � return timer;
> +}
> +EXPORT_SYMBOL(spear_timer_request_specific);
> +
> +/**
> + * spear_timer_free - free the timer
> + * @timer: timer which should be freed
> + *
> + * Applications should free the timer when not in use. The API also resets and
> + * disables the timer clock
> + */
> +int spear_timer_free(struct spear_timer *timer)
> +{
> + � � � if (!timer)
> + � � � � � � � return -ENODEV;
> +
> + � � � WARN_ON(!timer->reserved);

This is a bug. If spear_request_timer[_specific]() and
spear_timer_free() are supposed to be called in pairs
this warning message will always be printed.

Or are you suggesting that clients should never ever
release their clocks?

> +
> + � � � spear_timer_reset(timer);
> + � � � spear_timer_disable(timer);
> +
> + � � � timer->reserved = 0;
> +
> + � � � return 0;
> +}
> +EXPORT_SYMBOL(spear_timer_free);
> +
> +/**
> + * spear_timer_get_irq - get the irq number associated
> + * @timer: return the irq number for this timer
> + *
> + * The application may require the irq associated with a timer to register it
> + * with the OS interrupt framework.
> + */
> +int spear_timer_get_irq(struct spear_timer *timer)
> +{
> + � � � return (timer) ? timer->irq : -ENODEV;
> +}
> +EXPORT_SYMBOL(spear_timer_get_irq);

This is messy IMHO. If you really want to route interrupts from the
timers away from the actual timer driver (which *should* be in this
file only!) you need to register a new struct irq_chip for the timer
array and register a callback along with the
spear_request_timer[_specific]() function.

If you look for the uses of struct irq_chip in the kernel you can find
a lot of this hierarchical irq_chip:s, notice that they are used as abstract
entities, an irq_chip does NOT have to map 1-to-1 to e.g. a PL190
VIC, they can be subrouters.

> +
> +/**
> + * spear_timer_get_fclk - return the functional clock
> + * @timer: timer for which fclk is required
> + *
> + * Returns the functional clock of the timer at which it is operating.
> + * Functional clock represents the actual clock at which the timer is
> + * operating. The count period depends on this clock
> + */
> +struct clk *spear_timer_get_fclk(struct spear_timer *timer)
> +{
> + � � � if (!timer)
> + � � � � � � � return NULL;
> +
> + � � � return timer->fclk;
> +}
> +EXPORT_SYMBOL(spear_timer_get_fclk);
> +
> +/**
> + * spear_timer_start - start the timer operation
> + * @timer: timer to start
> + *
> + * The timer shall start operating only after calling this API
> + */
> +int spear_timer_start(struct spear_timer *timer)
> +{
> + � � � u16 val;
> +
> + � � � if (!timer)
> + � � � � � � � return -ENODEV;
> +
> + � � � val = spear_timer_read_reg(timer, GPT_CTRL_OFF);
> + � � � if (!(val & GPT_CTRL_ENABLE)) {
> + � � � � � � � val |= GPT_CTRL_ENABLE;
> + � � � � � � � spear_timer_write_reg(timer, GPT_CTRL_OFF, val);
> + � � � }
> +
> + � � � return 0;
> +}
> +EXPORT_SYMBOL(spear_timer_start);
> +
> +/**
> + * spear_timer_stop - stop the timer operation
> + * @timer: timer to stop
> + *
> + * This API can be used to stop the timer operation
> + */
> +int spear_timer_stop(struct spear_timer *timer)
> +{
> + � � � u16 val;
> +
> + � � � if (!timer)
> + � � � � � � � return -ENODEV;
> +
> + � � � val = spear_timer_read_reg(timer, GPT_CTRL_OFF);
> + � � � if (val & GPT_CTRL_ENABLE) {
> + � � � � � � � val &= ~GPT_CTRL_ENABLE;
> + � � � � � � � spear_timer_write_reg(timer, GPT_CTRL_OFF, val);
> + � � � }
> +
> + � � � return 0;
> +}
> +EXPORT_SYMBOL(spear_timer_stop);
> +
> +/**
> + * spear_timer_set_source - select the proper source clock
> + * @timer: timer
> + * @source: the clock cource
> + *
> + * source specifies the clock which has to be selected as the functional clock
> + * of timer. The available choice are 48 MHz (PLL3) output or AHB clock.
> + *
> + */
> +int spear_timer_set_source(struct spear_timer *timer, int source)
> +{
> + � � � if (source < 0 || source >= 3)
> + � � � � � � � return -EINVAL;
> +
> + � � � if (!timer)
> + � � � � � � � return -ENODEV;
> +
> + � � � clk_disable(timer->fclk);
> + � � � clk_set_parent(timer->fclk, spear_source_clocks[source]);
> + � � � clk_enable(timer->fclk);
> + � � � return 0;
> +}
> +EXPORT_SYMBOL(spear_timer_set_source);

So depending on what you want each timer can be clocked at two different
frequencies. This is actually an argument *against* merging it into the timer.c
file, because the run frequency of clock event drivers are expected to be
fixed and static. But again, this depends on whether you use it or not.

When you use it as clocksource / clock event, you should just set it
to the highest possible frequency. However when power management
comes into the picture, this becomes a more complicated question.

So, are lower frequencies really used for anything?

> +
> +/**
> + * spear_timer_set_load - program the count value of timer
> + * @timer: timer
> + * @autoreload: boolean, whether the count will be reloaded
> + * @load: the actual count deciding the time period

Append a comment satting that the value to reload will depend on
the frequency of the parent PLL, because it does, right?

> + *
> + * The timer will be programmed as per the value in load. The time period will
> + * depend on the clock and prescaler selected.
> + */
> +int spear_timer_set_load(struct spear_timer *timer, int autoreload,
> + � � � � � � � u16 load)
> +{
> + � � � u16 val;
> +
> + � � � if (!timer)
> + � � � � � � � return -ENODEV;
> +
> + � � � val = spear_timer_read_reg(timer, GPT_CTRL_OFF);
> + � � � if (autoreload)
> + � � � � � � � val &= ~GPT_CTRL_MODE_AR;
> + � � � else
> + � � � � � � � val |= GPT_CTRL_MODE_AR;
> +
> + � � � spear_timer_write_reg(timer, GPT_LOAD_OFF, load);
> + � � � spear_timer_write_reg(timer, GPT_CTRL_OFF, val);
> +
> + � � � return 0;
> +}
> +EXPORT_SYMBOL(spear_timer_set_load);
> +
> +/**
> + * spear_timer_set_load_start - program the count value of timer and start
> + * @timer: timer
> + * @autoreload: boolean, whether the count will be reloaded
> + * @load: the actual count deciding the time period
> + *
> + * The timer will be programmed as per the value in load and the operation will
> + * be started.
> + */
> +int spear_timer_set_load_start(struct spear_timer *timer, int autoreload,
> + � � � � � � � u16 load)
> +{
> + � � � u16 val;
> +
> + � � � if (!timer)
> + � � � � � � � return -ENODEV;
> +
> + � � � val = spear_timer_read_reg(timer, GPT_CTRL_OFF);
> + � � � if (autoreload)
> + � � � � � � � val &= ~GPT_CTRL_MODE_AR;
> + � � � else
> + � � � � � � � val |= GPT_CTRL_MODE_AR;
> +
> + � � � val |= GPT_CTRL_ENABLE;
> +
> + � � � spear_timer_write_reg(timer, GPT_LOAD_OFF, load);
> + � � � spear_timer_write_reg(timer, GPT_CTRL_OFF, val);
> +
> + � � � return 0;
> +}
> +EXPORT_SYMBOL(spear_timer_set_load_start);

If you merge this into timer.c it becomes obvious that autoreload will
be used for the MODE_PERIODIC and non-autoreload for the oneshot
timer, and the abstraction goes away, so the timer driver will be a lot
simpler to read.

> +
> +/**
> + * spear_timer_match_irq - enable the match interrupt
> + * @timer: timer
> + * @enable: boolean, 1 for enable
> + *
> + * If application wants to enable the interrupt on count match then this API
> + * should be called
> + */
> +int spear_timer_match_irq(struct spear_timer *timer, int enable)
> +{
> + � � � u16 val;
> +
> + � � � if (!timer)
> + � � � � � � � return -ENODEV;
> +
> + � � � val = spear_timer_read_reg(timer, GPT_CTRL_OFF);
> + � � � if (enable)
> + � � � � � � � val |= GPT_CTRL_MATCH_INT;
> + � � � else
> + � � � � � � � val &= ~GPT_CTRL_MATCH_INT;
> + � � � spear_timer_write_reg(timer, GPT_CTRL_OFF, val);
> +
> + � � � return 0;
> +}
> +EXPORT_SYMBOL(spear_timer_match_irq);

Again, use an abstract struct irq_chip instead and you won't need all
these kludges.

> +
> +/**
> + * spear_timer_set_prescaler - set the prescaler for the timer
> + * @timer: timer
> + * @prescaler: the division factor to be applied on the src clk
> + *
> + * This API can be used to program the prescaler for the timer. Its value can
> + * be from 1 to 8 signifying division by 2^prescaler
> + */
> +int spear_timer_set_prescaler(struct spear_timer *timer, int prescaler)
> +{
> + � � � u16 val;
> +
> + � � � if (!timer)
> + � � � � � � � return -ENODEV;
> +
> + � � � val = spear_timer_read_reg(timer, GPT_CTRL_OFF);
> + � � � if (prescaler >= 0x00 && prescaler <= 0x08) {
> + � � � � � � � val &= 0xFFF0;
> + � � � � � � � val |= prescaler;
> + � � � } else {
> + � � � � � � � printk(KERN_ERR "Invalid prescaler\n");
> + � � � }
> +
> + � � � timer->prescaler = val & 0xF;
> + � � � spear_timer_write_reg(timer, GPT_CTRL_OFF, val);
> + � � � return 0;
> +}
> +EXPORT_SYMBOL(spear_timer_set_prescaler);

Same comment here as for set_source().

> +
> +/**
> + * spear_timer_read_status - read the interrupt status
> + * @timer: timer
> + *
> + * This API can be used to read the interrupt status of timer. Currently only
> + * match interrupt, at offset 0, is part of this.
> + */
> +int spear_timer_read_status(struct spear_timer *timer)
> +{
> + � � � u16 val;
> +
> + � � � if (!timer)
> + � � � � � � � return -ENODEV;
> +
> + � � � val = spear_timer_read_reg(timer, GPT_INT_OFF);
> +
> + � � � return val;
> +}
> +EXPORT_SYMBOL(spear_timer_read_status);

Goes away with struct irq_chip again.

> +
> +/**
> + * spear_timer_clear_status - write to the interrupt status reg to clear
> + * @timer: timer
> + * @value: value to be written
> + *
> + * The API can be used to clear the match interrupt status by writing a value
> + * 0.
> + */
> +int spear_timer_clear_status(struct spear_timer *timer, u16 value)
> +{
> + � � � if (!timer)
> + � � � � � � � return -ENODEV;
> +
> + � � � spear_timer_write_reg(timer, GPT_INT_OFF, value);
> + � � � return 0;
> +}
> +EXPORT_SYMBOL(spear_timer_clear_status);

Goes away with struct irq_chip again.

> +
> +/**
> + * spear_timer_read_counter - get the current count value
> + * @timer: timer
> + *
> + * The API returns the current count value of the timer. One has to remember
> + * that if the timer is operating on asynchronous clock source then there is a
> + * probability that the application may get some junk count value. This is due
> + * to the fact that the count register is being accessed by two independent
> + * clock, one driving the timer (48MHz) and the other on which this register is
> + * read (system AHB clock). For more details refer GPT Application note.
> + */
> +int spear_timer_read_counter(struct spear_timer *timer)
> +{
> + � � � u16 val;
> +
> + � � � if (!timer)
> + � � � � � � � return -ENODEV;
> +
> + � � � val = spear_timer_read_reg(timer, GPT_COUNT_OFF);
> +
> + � � � return val;
> +}
> +EXPORT_SYMBOL(spear_timer_read_counter);

Used for the clocksource right? No other in-kernel user would be interested
in this function. If you fold the driver into timer.c it goes away
naturally, the
driver for the clocksource just read and use this register.

> +
> +/**
> + * spear_timer_active - is timer active
> + * @timer: timer
> + *
> + * This API can be used to find whether a timer is already enabled and is
> + * active i.e. it is under operation
> + */
> +int spear_timer_active(struct spear_timer *timer)
> +{
> + � � � if (!timer)
> + � � � � � � � return 0;

return -EINVAL, no?

> +
> + � � � if (!timer->enabled)
> + � � � � � � � return 0;
> +
> + � � � if (spear_timer_read_reg(timer, GPT_CTRL_OFF) & GPT_CTRL_ENABLE)
> + � � � � � � � return 1;

These two if:s are counterintuitive, can't you just remove the .enabled field
in the struct spear_timer and *only* rely on the GPT_CTRL_ENABLE bit in the
GPT_CTRL register? This way there are two bits that are supposed to be either
both 00 or both 11 and any 10 or 01 is a BUG() really. The only reason of
having a shadow variable is if it takes a long time to read the register, which
appears to not be the case.

> +
> + � � � return 0;
> +}
> +EXPORT_SYMBOL(spear_timer_active);
> +
> +/**
> + * spear_timer_init - initialize the set of timers
> + * @timer: list of timers to be intialized
> + * @count: number of timers supported in platform
> + *
> + * This API must not be called by user and it is called only once by the
> + * platform specific intialization code.
> + */
> +int __init spear_timer_init(struct spear_timer *timers, int count)
> +{
> + � � � struct spear_timer *timer;
> + � � � int i;
> + � � � int mtu; /* timer unit */
> + � � � char dev_id[16];
> +
> + � � � spin_lock_init(&spear_timer_lock);
> +
> + � � � spear_timers = timers;
> + � � � spear_timer_count = count;
> +
> + � � � for (i = 0; src_clk_con_id[i] != NULL; i++)
> + � � � � � � � spear_source_clocks[i] = clk_get(NULL, src_clk_con_id[i]);
> +
> + � � � for (i = 0; i < spear_timer_count; i++) {
> + � � � � � � � timer = &spear_timers[i];

Here, before ioremap you should
request_mem_region(timer->phys_base, timer->phys_base + SZ_1K - 1);
Make sure that call doesn't return NULL.

> + � � � � � � � timer->io_base = (void __iomem *)ioremap(timer->phys_base,
> + � � � � � � � � � � � � � � � SZ_1K);
> + � � � � � � � if (!timer->io_base) {
> + � � � � � � � � � � � printk(KERN_ERR "BUG:ioremap failed for timer:%d\n", i);
> + � � � � � � � � � � � continue;

Um... Make a proper error path. What about goto err_ioremap; ?

> + � � � � � � � }
> +
> + � � � � � � � /* each mtu has 2 timers with same set of clocks */
> + � � � � � � � mtu = i >> 1;
> +
> + � � � � � � � sprintf(dev_id, "gpt%d", mtu);
> + � � � � � � � timer->fclk = clk_get_sys(dev_id, NULL);
> + � � � }
> +
> + � � � return 0;
> +}
> diff --git a/arch/arm/plat-spear/time.c b/arch/arm/plat-spear/time.c
> new file mode 100644
> index 0000000..d89ac17
> --- /dev/null
> +++ b/arch/arm/plat-spear/time.c
> @@ -0,0 +1,197 @@
> +/*
> + * arch/arm/plat-spear/time.c
> + *
> + * Copyright (C) 2009 ST Microelectronics
> + * Shiraz Hashim<shiraz.hashim@st.com>
> + *
> + * This file is licensed under the terms of the GNU General Public
> + * License version 2. This program is licensed "as is" without any
> + * warranty of any kind, whether express or implied.
> + */
> +
> +#include <linux/clk.h>
> +#include <linux/clockchips.h>
> +#include <linux/clocksource.h>
> +#include <linux/err.h>
> +#include <linux/init.h>
> +#include <linux/interrupt.h>
> +#include <linux/kernel.h>
> +#include <linux/time.h>
> +#include <linux/irq.h>
> +#include <asm/mach/time.h>
> +#include <mach/irqs.h>
> +#include <mach/hardware.h>
> +#include <mach/spear.h>
> +#include <mach/generic.h>
> +#include <plat/gpt.h>
> +
> +static void clockevent_set_mode(enum clock_event_mode mode,
> + � � � � � � � � � � � � � � � struct clock_event_device *clk_event_dev);
> +static int clockevent_next_event(unsigned long evt,
> + � � � � � � � � � � � � � � � �struct clock_event_device *clk_event_dev);
> +
> +/*
> + * Timer0 and Timer1 both belong to same mtu in cpu subbsystem. Further they
> + * share same functional clock. Any change in one's functional clock will also
> + * affect other timer.
> + */
> +static struct spear_timer *spear_timer_clkevt;
> +static struct spear_timer *spear_timer_clksrc;
> +
> +/*
> + * Clock Source
> + */
> +static cycle_t clocksource_read_cycles(struct clocksource *cs)
> +{
> + � � � return (cycle_t) spear_timer_read_counter(spear_timer_clksrc);
> +}
> +
> +static struct clocksource clksrc = {
> + � � � .name = "clock source",
> + � � � .rating = 200, � � � � �/* its a pretty decent clock */
> + � � � .read = clocksource_read_cycles,
> + � � � .mask = 0xFFFF, � � � � /* 16 bits */
> + � � � .mult = 0, � � � � � � �/* to be computed */
> + � � � .shift = 10,

How is this number (10) decided? I ask because many people just
copy something from some other platform. One algorithm you could consider
is the clocksource_calc_mult_shift() function.

> + � � � .flags = CLOCK_SOURCE_IS_CONTINUOUS,
> +};
> +
> +static void spear_clocksource_init(void)
> +{
> + � � � static struct spear_timer *timer;
> + � � � u32 tick_rate;
> +
> + � � � timer = spear_timer_request_specific(1);

Why does this need exactly timer 1?

Atleast the argument could be an enum GPT_TIMER_1, GPT_TIMER_2 etc.

If this is some way to force the clocksource on a certain timer this also
shows that gpt should be merged in here so we don't need strange
interfaces or numbers like this.

> + � � � BUG_ON(timer == NULL);
> + � � � spear_timer_clksrc = timer;
> +
> + � � � spear_timer_set_source(timer, SPEAR_TIMER_SRC_PLL3_CLK);
> + � � � spear_timer_set_prescaler(timer, GPT_CTRL_PRESCALER256);
> +
> + � � � /* find out actual clock driving Timer */
> + � � � tick_rate = clk_get_rate(spear_timer_get_fclk(timer));
> + � � � tick_rate >>= timer->prescaler;
> +
> + � � � spear_timer_set_load_start(timer, 1, 0xFFFF); /* 16 bit maximum */
> +
> + � � � clksrc.mult =
> + � � � � � � � clocksource_khz2mult((tick_rate / 1000), clksrc.shift);
> +
> + � � � /* register the clocksource */
> + � � � clocksource_register(&clksrc);
> +}
> +
> +/*
> +* Clock Event
> +*/
> +static struct clock_event_device clkevt = {
> + � � � .name = "clock_event",
> + � � � .features = CLOCK_EVT_FEAT_PERIODIC | CLOCK_EVT_FEAT_ONESHOT,
> + � � � .set_mode = clockevent_set_mode,
> + � � � .set_next_event = clockevent_next_event,
> + � � � .shift = 20,

Elaborate on why you choose 20.
Comtemplate using clockevents_calc_mult_shift().

> +};
> +
> +static void clockevent_set_mode(enum clock_event_mode mode,
> + � � � � � � � � � � � � � � � struct clock_event_device *clk_event_dev)
> +{
> + � � � u32 period;
> + � � � struct spear_timer *timer = spear_timer_clkevt;
> +
> + � � � if (timer)
> + � � � � � � � return;
> +
> + � � � spear_timer_stop(timer);
> +
> + � � � switch (mode) {
> + � � � case CLOCK_EVT_MODE_PERIODIC:
> + � � � � � � � period = clk_get_rate(spear_timer_get_fclk(timer)) / HZ;
> + � � � � � � � period >>= timer->prescaler;
> +
> + � � � � � � � spear_timer_match_irq(timer, 1);
> + � � � � � � � spear_timer_set_load_start(timer, 1, period);
> + � � � � � � � break;
> +
> + � � � case CLOCK_EVT_MODE_ONESHOT:
> + � � � � � � � spear_timer_match_irq(timer, 1);
> + � � � � � � � break;
> +
> + � � � case CLOCK_EVT_MODE_UNUSED:
> + � � � case CLOCK_EVT_MODE_SHUTDOWN:
> + � � � case CLOCK_EVT_MODE_RESUME:
> + � � � � � � � break;
> +
> + � � � default:
> + � � � � � � � printk(KERN_ERR "BUG: Invalid mode requested\n");

What about using pr_err() and also BUG()?

> + � � � � � � � break;
> + � � � }
> +}
> +
> +static int clockevent_next_event(unsigned long cycles,
> + � � � � � � � � � � � � � � � �struct clock_event_device *clk_event_dev)
> +{
> + � � � spear_timer_set_load_start(spear_timer_clkevt, 0, (u16) cycles);
> +
> + � � � return 0;
> +}
> +
> +static irqreturn_t spear_timer_interrupt(int irq, void *dev_id)
> +{
> + � � � struct clock_event_device *evt = &clkevt;
> + � � � struct spear_timer *timer = (struct spear_timer *)dev_id;
> +
> + � � � spear_timer_clear_status(timer, GPT_STATUS_MATCH);
> +
> + � � � evt->event_handler(evt);
> +
> + � � � return IRQ_HANDLED;
> +}
> +
> +static struct irqaction spear_timer_irq = {
> + � � � .name = "gp_timer",
> + � � � .flags = IRQF_DISABLED | IRQF_TIMER,
> + � � � .handler = spear_timer_interrupt
> +};
> +
> +static void __init spear_clockevent_init(void)
> +{
> + � � � struct spear_timer *timer;
> + � � � u32 tick_rate;
> +
> + � � � timer = spear_timer_request_specific(0);
> + � � � BUG_ON(timer == NULL);
> + � � � spear_timer_clkevt = timer;
> +
> + � � � spear_timer_set_source(timer, SPEAR_TIMER_SRC_PLL3_CLK);
> + � � � spear_timer_set_prescaler(timer, GPT_CTRL_PRESCALER16);
> +
> + � � � tick_rate = clk_get_rate(spear_timer_get_fclk(timer));
> + � � � tick_rate >>= timer->prescaler;
> +
> + � � � clkevt.mult = div_sc(tick_rate, NSEC_PER_SEC,
> + � � � � � � � � � � � � � � � � � � � clkevt.shift);
> + � � � clkevt.max_delta_ns = clockevent_delta2ns(0xfff0,
> + � � � � � � � � � � � &clkevt);
> + � � � clkevt.min_delta_ns = clockevent_delta2ns(3, &clkevt);
> +
> + � � � clkevt.cpumask = cpumask_of(0);
> + � � � clockevents_register_device(&clkevt);
> +
> + � � � spear_timer_irq.dev_id = (void *)timer;
> +
> + � � � setup_irq(spear_timer_get_irq(timer), &spear_timer_irq);
> + � � � spear_timer_match_irq(timer, 1);
> +}
> +
> +void __init spear_setup_timer(void)
> +{
> + � � � /* Initialize all General Purpose Timers */
> + � � � spear_gpt_init();
> +
> + � � � spear_clockevent_init();
> + � � � spear_clocksource_init();
> +}
> +
> +struct sys_timer spear_sys_timer = {
> + � � � .init = spear_setup_timer,
> +};

Consider also adding a simple
sched_clock() implementation or your scheduling and hrtimers will have tick
granularity only, (not very highres...) look at examples in other
archs like plat-omap.

Yours,
Linus Walleij
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


== 2 of 2 ==
Date: Thurs, Mar 11 2010 3:30 am
From: Linus Walleij


2010/3/11 Shiraz HASHIM <shiraz.hashim@st.com>:

> Can we use clockevents directly? I though they are abstracted by the kernel
> framework and using them directly is not advisable/feasible.

Not that I know, but as tglx says it might soon be possible to do so, and if
you have a patch for using some surplus clockevents for this along with some
example of it actually being used too, people will probably react,
atleast they will have opinions.

Yours,
Linus Walleij
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

==============================================================================
TOPIC: Low latency mode and performance of pty's?
http://groups.google.com/group/linux.kernel/t/5a2b00e35b0864a7?hl=en
==============================================================================

== 1 of 1 ==
Date: Thurs, Mar 11 2010 3:40 am
From: Jef Driesen


Hi,

I have an application that uses a pair of pseudo terminals (using socat
to link them together) to simulate a serial nullmodem cable. I use this
setup to simulate the serial communication with an external device in my
test environment. It works fine, but I noticed the performance is quite bad.

You can reproduce the problem by running these commands in three
different terminals:

# Terminal 1: Setup the pty's.
socat PTY,link=/tmp/ttyS0 PTY,link=/tmp/ttyS1

# Terminal 2: Send some data.
dd if=/dev/urandom of=input.bin bs=538368 count=1
sx input.bin >>/tmp/ttyS0 </tmp/ttyS0

# Terminal 2: Receive the data data.
time rx output.bin >/tmp/ttyS1 </tmp/ttyS1

On my system, this takes about 2m49s, which is much longer than it
should be. After some discussion in the comp.unix.programmer [1] and
comp.os.linux.development.apps [2] newsgroups, I learned pty's have a
low latency flag, which is disabled by default. When patching my kernel
to enable this flag (patch included at the end of this post), the
transfer time goes down to about 30s. When running my own applications,
it's get even better and the transfer is almost instantly.

Now, the reason why I'm reporting this on LKML is that the patch I
applied is almost identical to a commit [3] that got reverted again [4].
Based on the comment in that last commit, the slow performance was
addressed in another way, but according to my measurements it isn't. Or
at least it's not as good as setting the low latency flag directly.
Unfortunately I don't have any experience with kernel development and I
have no idea how to fix this properly.

[1]
http://groups.google.com/group/comp.unix.programmer/browse_thread/thread/503d529448f045e5
[2]
http://groups.google.be/group/comp.os.linux.development.apps/browse_thread/thread/e6041109eb91aaf7/fccf3946dce9c4b1
[3]
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=3a54297478e6578f96fd54bf4daa1751130aca86
[4]
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=e043e42bdb66885b3ac10d27a01ccb9972e2b0a3

Thanks,

Jef

--- drivers/char/pty.c.orig 2010-01-11 13:00:58.000000000 +0100
+++ drivers/char/pty.c 2010-01-11 13:01:04.000000000 +0100
@@ -200,6 +200,8 @@ static int pty_open(struct tty_struct *t
if (tty->link->count != 1)
goto out;

+ tty->low_latency = 1;
+
clear_bit(TTY_OTHER_CLOSED, &tty->link->flags);
set_bit(TTY_THROTTLED, &tty->flags);
retval = 0;


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

==============================================================================
TOPIC: tcp: bugs and cleanup for 2.6.34-rc1
http://groups.google.com/group/linux.kernel/t/c35ca27b143f9a64?hl=en
==============================================================================

== 1 of 8 ==
Date: Thurs, Mar 11 2010 3:40 am
From: William Allen Simpson


I'd have thought that there would be greater interest about patching
crashing bugs, signed versus unsigned (underflow) bugs, TCP DoS bugs,
TCP data corruption, and TCP performance problems....

There's been ample warning. Zero-day security issues will be reported
to the usual announcement lists. In particular, these 0day exploits
affect systems as far back as the 2005 changeover to git.

Combination of patches reported in October, November, December, January,
and February, for 2.6.32, 2.6.33, and now 2.6.34.

This code has had previous review and several months of limited testing.

Some portions were removed during the various TCPCT part 1 patch splits,
then were cut off by the sudden unexpected end of that merge window.
[03 Dec 2009] I've restarted the sub-numbering (again).

Of particular interest are the TCPCT header extensions that already
appear in the next phase of testing with other platforms. These patches
allow correct reception without data corruption.

The remainder of the original TCPCT part 2 will be merged with part 3.

[Updated to 2010 Mar 08 2.6.34-rc1.]
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


== 2 of 8 ==
Date: Thurs, Mar 11 2010 3:50 am
From: William Allen Simpson


Redefine two TCP header functions to accept TCP header pointer.
When subtracting, return signed int to allow error checking.

These functions will be used in subsequent patches that implement
additional features.

Signed-off-by: William.Allen.Simpson@gmail.com
---
include/linux/tcp.h | 12 ++++++++++++
1 files changed, 12 insertions(+), 0 deletions(-)


== 3 of 8 ==
Date: Thurs, Mar 11 2010 4:00 am
From: William Allen Simpson


The tcp_optlen() function returns a potential *negative* unsigned.

In the only two existing files using the old tcp_optlen() function,
clean up confusing and inconsistent mixing of both byte and word
offsets, and other coding style issues. Document assumptions.

Quoth David Miller:
This is transmit, and the packets can only come from the Linux
TCP stack, not some external entity.

You're being way too anal here, and adding these checks to
drivers would be just a lot of rediculious bloat. [sic]

Therefore, there are *no* checks for bad TCP and IP header sizes, nor
any semantic changes. The drivers should function exactly as existing,
although usage of int should ameliorate the issues.

No response from testers in 21+ weeks.

[removed comment references to commit log]

Requires:
net: tcp_header_len_th and tcp_option_len_th

Signed-off-by: William.Allen.Simpson@gmail.com
CC: Michael Chan <mchan@broadcom.com>
---
drivers/net/bnx2.c | 29 +++++++++++++-----------
drivers/net/tg3.c | 60 +++++++++++++++++++++++---------------------------
include/linux/tcp.h | 5 ----
3 files changed, 44 insertions(+), 50 deletions(-)


== 4 of 8 ==
Date: Thurs, Mar 11 2010 4:10 am
From: William Allen Simpson


The tcp_optlen() function returns a potential *negative* unsigned.

In the only two existing files using the old tcp_optlen() function,
clean up confusing and inconsistent mixing of both byte and word
offsets, and other coding style issues. Document assumptions.

Quoth David Miller:
This is transmit, and the packets can only come from the Linux
TCP stack, not some external entity.

You're being way too anal here, and adding these checks to
drivers would be just a lot of rediculious bloat. [sic]

Therefore, there are *no* checks for bad TCP and IP header sizes, nor
any semantic changes. The drivers should function exactly as existing,
although usage of int should ameliorate the issues.

No response from testers in 21+ weeks.

[removed comment references to commit log]

Requires:
net: tcp_header_len_th and tcp_option_len_th

Signed-off-by: William.Allen.Simpson@gmail.com
CC: Michael Chan <mchan@broadcom.com>
---
drivers/net/bnx2.c | 29 +++++++++++++-----------
drivers/net/tg3.c | 60 +++++++++++++++++++++++---------------------------
include/linux/tcp.h | 5 ----
3 files changed, 44 insertions(+), 50 deletions(-)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


== 5 of 8 ==
Date: Thurs, Mar 11 2010 4:10 am
From: William Allen Simpson


Harmonize tcp_v4_rcv() and tcp_v6_rcv() -- better document tcp doff
and header length assumptions, and carefully compare implementations.

Reduces multiply/shifts, marginally improving speed.

Removes redundant tcp header length checks before checksumming.

Instead, assumes (and documents) that any backlog processing and
transform policies will carefully preserve the header, and will
ensure the socket buffer length remains >= the header size.

Quoth David Miller:
Not true.

The skb->len can be modified by the call to sk_filter() done
by tcp_v4_rcv().

Therefore, added extra redundant checks for any sk_filter() problems,
also in tcp_v6_rcv().

[updated comments, resolved conflicts]

Stand-alone patch, originally developed for TCPCT.

Signed-off-by: William.Allen.Simpson@gmail.com
Reviewed-by: Andi Kleen <andi@firstfloor.org>
---
include/net/xfrm.h | 7 +++++
net/ipv4/tcp_ipv4.c | 57 ++++++++++++++++++++++++---------------
net/ipv6/tcp_ipv6.c | 72 +++++++++++++++++++++++++++++++-------------------
3 files changed, 87 insertions(+), 49 deletions(-)


== 6 of 8 ==
Date: Thurs, Mar 11 2010 4:30 am
From: William Allen Simpson


Fix incorrect header prediction flags documentation.

Relieve register pressure in (the i386) fast path by accessing skb->len
directly, instead of carrying a rarely used len parameter.

Eliminate unused len parameters in two other functions.

Don't use output calculated tp->tcp_header_len for input decisions.
While the output header is usually the same as the input (same options
in both directions), that's a poor assumption. In particular, Sack will
be different. Newer options are not guaranteed.

Moreover, in the fast path, that only saved a shift or two. The other
efficiencies in this patch more than make up the difference.

Instead, use tp->rx_opt.tstamp_ok to accurately predict header length.

Likewise, use tp->rx_opt.tstamp_ok for received MSS calculations.

Don't use "sizeof(struct tcphdr) + TCPOLEN_TSTAMP_ALIGNED" to guess that
the timestamp is present. This may have been OK in the days with fewer
possible options, but various combinations of newer options may yield
the same header length. (This bug is in 3 places.)

Instead, use tp->rx_opt.saw_tstamp to determine a timestamp is present.

There's no need to test buffer length against header length, already
checked by tcp_v[4,6]_rcv(). Straighten code for minor efficiency gain.

Stand-alone patch, originally developed for TCPCT.

Requires:
net: tcp_header_len_th and tcp_option_len_th
tcp: harmonize tcp_vx_rcv header length assumptions

Signed-off-by: William.Allen.Simpson@gmail.com
Reviewed-by: Andi Kleen <andi@firstfloor.org>
---
include/linux/tcp.h | 6 ++-
include/net/tcp.h | 18 ++++++--
net/ipv4/tcp_input.c | 96 +++++++++++++++++++---------------------------
net/ipv4/tcp_ipv4.c | 4 +-
net/ipv4/tcp_minisocks.c | 3 +-
net/ipv4/tcp_probe.c | 2 +-
net/ipv6/tcp_ipv6.c | 4 +-
7 files changed, 63 insertions(+), 70 deletions(-)


== 7 of 8 ==
Date: Thurs, Mar 11 2010 4:40 am
From: William Allen Simpson


When accompanied by cookie option, Initiator (client) queues incoming
SYNACK transaction data.

This is a straightforward re-implementation of an earlier (18 month old)
patch that no longer applies cleanly, with permission of the original
author (Adam Langley). The patch was previously reviewed:

http://thread.gmane.org/gmane.linux.network/102586

This function will also be used in subsequent patches that implement
additional features.

Requires:
TCPCT part 1g: Responder Cookie => Initiator
net: tcp_header_len_th and tcp_option_len_th

Signed-off-by: William.Allen.Simpson@gmail.com
---
net/ipv4/tcp_input.c | 26 +++++++++++++++++++++++++-
1 files changed, 25 insertions(+), 1 deletions(-)


== 8 of 8 ==
Date: Thurs, Mar 11 2010 4:50 am
From: William Allen Simpson


Split switch, shift cases to the left, fix most lines beyond column 80.

Add error return.

Harmonize parameter order with other tcp_input functions:
tcp_fast_parse_options() and tcp_validate_incoming().

Harmonize initialization in syncookies.
Fix initialization in tcp_minisocks.

Repair net/ipv4/tcp_ipv4.c errors with goto targets, overlooked by
David Miller in commit bb5b7c11263dbbe78253cd05945a6bf8f55add8e

[updated to latest posted internet draft-simpson-tcpct-00]

Requires:
TCPCT part 1g: Responder Cookie => Initiator
net: tcp_header_len_th and tcp_option_len_th

Signed-off-by: William.Allen.Simpson@gmail.com
---
include/net/tcp.h | 5 +-
net/ipv4/syncookies.c | 9 +-
net/ipv4/tcp_input.c | 238 +++++++++++++++++++++++++++-------------------
net/ipv4/tcp_ipv4.c | 10 ++-
net/ipv4/tcp_minisocks.c | 26 ++++--
net/ipv6/syncookies.c | 9 +-
net/ipv6/tcp_ipv6.c | 6 +-
7 files changed, 185 insertions(+), 118 deletions(-)

==============================================================================
TOPIC: [PATCH 7/7] xen: Enable event channel of PV extension of HVM
http://groups.google.com/group/linux.kernel/t/eac82160cca26360?hl=en
==============================================================================

== 1 of 1 ==
Date: Thurs, Mar 11 2010 3:40 am
From: Stefano Stabellini


On Thu, 11 Mar 2010, Sheng Yang wrote:
> On Wednesday 10 March 2010 23:29:30 Stefano Stabellini wrote:
> > On Wed, 10 Mar 2010, Sheng Yang wrote:
> > > I think we can leave the controversial thing later. At least, we want a
> > > framework for PV extension of HVM. We can work together to determine what
> > > is the better way for evtchn, as well as porting pirqs. (And the later
> > > MSI work may also depends on it)
> >
> > Most of my patch series can be upstreamed right now, the only thing that
> > needs some extra work is the pirq remapping.
> > So yes, we can upstream the rest and that would also make PV on HVM work
> > on linux upstream ASAP.
> > I think that is a worthy goal by itself, therefore I am going to send
> > another reduced patch series, ready to be upstreamed, without pirq
> > remappings, with support for PV on HVM.
> >
> I think my first 6 patches can be used without modification. And I've already
> spent lots of efforts on them with Jeremy.
>
> We can work on evtchn and platform pci later.
>

Agreed on evtchn but platform pci should be present even in the basic
version.
Please read the series "basic PV on HVM support" I posted yesterday
because I believe has already everything needed to be accepted upstream
right now; at the same time it introduces the pv clocksource and the
vector based callback mechanism that is going to be needed for the
interrupt remapping we are going to work on later on.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

==============================================================================
TOPIC: ramzswap: Eliminate stale data in compressed memory
http://groups.google.com/group/linux.kernel/t/7889db6e2e4c45f0?hl=en
==============================================================================

== 1 of 1 ==
Date: Thurs, Mar 11 2010 3:40 am
From: Nitin Gupta


On Thu, Mar 11, 2010 at 12:52 PM, Hugh Dickins
<hugh.dickins@tiscali.co.uk> wrote:
> On Fri, 5 Mar 2010, Nitin Gupta wrote:
<snip>
>
>> Adding handler for this callback:
>> � � � swapon notifier --> set_swap_free_notify(swap_type, fn)
>>
>> Removing handler:
>> � � � swapoff notifier --> set_swap_free_notify(swap_type, NULL)
>>
>>
>> Alternative approaches:
>> 1) Add callback directly in 'struct block_device_operations' but
>> that is considered too hacky.
>> 2) Use swap discard mechanism: It involves unncessary overhead of
>> allocating 'discard bio' requests and its too slow to serve ramzswap
>> needs.
>>
>> �drivers/staging/ramzswap/ramzswap_drv.c � | � 91 +++++++++++++++++++++++++++++
>> �drivers/staging/ramzswap/ramzswap_drv.h � | � �1 +
>> �drivers/staging/ramzswap/ramzswap_ioctl.h | � �1 +
>> �include/linux/swap.h � � � � � � � � � � �| � 16 +++++-
>> �mm/swapfile.c � � � � � � � � � � � � � � | � 78 ++++++++++++++++++++++++
>> �5 files changed, 185 insertions(+), 2 deletions(-)
>

> This is one of the various solutions I disliked already, isn't it?
>

This is a kind of hybrid of previous approaches. We now have notifier
for swapon/off events only -- which Linus seems to be okay with (earlier
we had notifier for slot free event also). Handler for swapon is responsible
for setting callback for slot free event (earlier we were installing it from
"ramzswap_make_request" directly, when first I/O is received).


> To me this is just a more convoluted and obscure variant of the
> block_device_operations swap_slot_free_notify patch you had in mmotm,
> which Linus has rejected.
>

This slot free notification is a sort of black area for which I have
really run out of
ideas. I tried all the approaches I could think of (and all you and
Pekka suggested)
but nothing seems to satisfy everyone. This is so essential for ramzswap but now
I don't know how to get this done.

> I admit, I did not understand at all what Linus was proposing with
> readpage, writepage and address_space_operations: I kept quiet in
> the hope that you'd understand where I didn't!
>

I guess he is suggesting creating a fictional 'address_space' just like
we have 'struct address_space swapper_space'. This way, we can
intercept writepage naturally in pageout() with new mapping->a_ops->writepage().
And then similarly with mapping->a_ops->readpage().

I'm now looking into this approach. This might allow compressing page cache
pages too. However, figuring out all the details can take significant
amount of time
and allowing ramzswap development to happen in parallel should be okay.

Thanks,
Nitin
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

==============================================================================
TOPIC: oprofile, perf, x86: introduce new functions to reserve perfctrs
http://groups.google.com/group/linux.kernel/t/716ea40ac646a344?hl=en
==============================================================================

== 1 of 2 ==
Date: Thurs, Mar 11 2010 3:50 am
From: Peter Zijlstra


On Thu, 2010-03-04 at 18:59 +0100, Peter Zijlstra wrote:
> On Thu, 2010-03-04 at 16:22 +0100, Robert Richter wrote:
> > This patch set improves the perfctr reservation code. New functions
> > are available to reserve a counter by its index only. It is no longer
> > necessary to allocate both msrs of a counter which also improves the
> > code and makes it easier.
> >
> > For oprofile a handler is implemented that returns an error now if a
> > counter is already reserved by a different subsystem such as perf or
> > watchdog. Before, oprofile silently ignored that counter. Finally the
> > new reservation functions can be used to allocate special parts of the
> > pmu such as IBS, which is necessary to use IBS with perf too.
> >
> > The patches are available in the oprofile tree:
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/rric/oprofile.git core
> >
> > If there are no objections, I suggest to merge it into the
> > tip/perf/core too, maybe after pending patches went in. If there are
> > already conflicts, I will do the merge for this.
>
> Right, so cleaning up that reservation code is nice, but wouldn't it be
> much nicer to simply do away with all that and make everything use the
> (low level) perf code?

Alternatively, could we maybe further simplify this reservation into:

int reserve_pmu(void);
void release_pmu(void);

And not bother with anything finer grained.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


== 2 of 2 ==
Date: Thurs, Mar 11 2010 4:50 am
From: Ingo Molnar

* Peter Zijlstra <peterz@infradead.org> wrote:

> On Thu, 2010-03-04 at 18:59 +0100, Peter Zijlstra wrote:
> > On Thu, 2010-03-04 at 16:22 +0100, Robert Richter wrote:
> > > This patch set improves the perfctr reservation code. New functions
> > > are available to reserve a counter by its index only. It is no longer
> > > necessary to allocate both msrs of a counter which also improves the
> > > code and makes it easier.
> > >
> > > For oprofile a handler is implemented that returns an error now if a
> > > counter is already reserved by a different subsystem such as perf or
> > > watchdog. Before, oprofile silently ignored that counter. Finally the
> > > new reservation functions can be used to allocate special parts of the
> > > pmu such as IBS, which is necessary to use IBS with perf too.
> > >
> > > The patches are available in the oprofile tree:
> > >
> > > git://git.kernel.org/pub/scm/linux/kernel/git/rric/oprofile.git core
> > >
> > > If there are no objections, I suggest to merge it into the
> > > tip/perf/core too, maybe after pending patches went in. If there are
> > > already conflicts, I will do the merge for this.
> >
> > Right, so cleaning up that reservation code is nice, but wouldn't it be
> > much nicer to simply do away with all that and make everything use the
> > (low level) perf code?
>
> Alternatively, could we maybe further simplify this reservation into:
>
> int reserve_pmu(void);
> void release_pmu(void);
>
> And not bother with anything finer grained.

Yeah, that looks quite a bit simpler.

It's all about making it easier to live with legacies anyway - all modern
facilities will use perf events to access the PMU.

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

==============================================================================
TOPIC: Please pull my perf.git urgent branch
http://groups.google.com/group/linux.kernel/t/caf7112ff37b81a8?hl=en
==============================================================================

== 1 of 1 ==
Date: Thurs, Mar 11 2010 3:50 am
From: Ingo Molnar

* Paul Mackerras <paulus@samba.org> wrote:

> Ingo,
>
> My fix for the oops Anton was seeing was acked by Peter Z and
> attracted no other comments, so please pull it and send it on to
> Linus with the next batch of urgent fixes.
>
> Thanks,
> Paul.
>
> The following changes since commit 548b84166917d6f5e2296123b85ad24aecd3801d:
> Ingo Molnar (1):
> Merge commit 'v2.6.34-rc1' into perf/urgent
>
> are available in the git repository at:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/paulus/perf.git urgent
>
> Paul Mackerras (1):
> perf_event: Fix oops triggered by cpu offline/online
>
> kernel/perf_event.c | 13 ++++++++++++-
> 1 files changed, 12 insertions(+), 1 deletions(-)

Applied, thanks a lot Paul!

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

==============================================================================
TOPIC: perf_events: add sampling period randomization support (v2)
http://groups.google.com/group/linux.kernel/t/e1eb7c2b14245ed5?hl=en
==============================================================================

== 1 of 1 ==
Date: Thurs, Mar 11 2010 4:00 am
From: Ingo Molnar

* Stephane Eranian <eranian@google.com> wrote:

> On Thu, Mar 4, 2010 at 1:13 PM, Ingo Molnar <mingo@elte.hu> wrote:
> >
> > * Stephane Eranian <eranian@google.com> wrote:
> >
> >> On Thu, Mar 4, 2010 at 3:32 AM, Ingo Molnar <mingo@elte.hu> wrote:
> >> >
> >> > * eranian@google.com <eranian@google.com> wrote:
> >> >
> >> >> This patch adds support for randomizing the sampling period. ??Randomization
> >> >> is very useful to mitigate the bias that exists with sampling. The random
> >> >> number generator does not need to be sophisticated. This patch uses the
> >> >> builtin random32() generator.
> >> >
> >> >> + ?? ?? if (width > 63 || attr->freq)
> >> >> + ?? ?? ?? ?? ?? ?? return -EINVAL;
> >> >
> >> > Why not for freq counters? Those are semi-randomized already, but it might
> >> > make sense to make them 'more' randomized in special circumstances. That would
> >> > also allow us to enable the randomization in perf top and perf record, by
> >> > default.
> >> >
> >>
> >> What's the goal of freq?
> >> Achieve and maintain the target interrupt/rate.
> >> In doing so, it has to adjust the period (not randomly).
> >
> > No, the goal of auto-freq is to keep a steady average rate of sampling.
>
> rate of samples = rate of interrupts (if period < 32 bits on Intel).

What's your point? I corrected your statement which said that the goal of
auto-freq was to maintain a target interrupt-rate and as such wouldnt be
randomizable. So i said that auto-freq is slightly different from that: it
provides a steady _average_ rate, and as such small amounts of randomization
'fuzz' could still be injected - the auto-freq system would auto-correct the
effects of that.

Think of it as a dynamic steady-state equilibrium with noise injected. If the
noise isnt too brutal and the system can adapt, the average sampling rate
doesnt change.

> > There is no requirement to keep it 'steady' - each sample comes with a
> > specific weight.
> >
> >> Randomization may prevent achieving the rate, or it may slow it down.
> >> What's the value add of that?
> >
> > Why do you assume that the two are incompatible? We can randomize
> > auto-freq and still have a perfectly stable average rate.
>
> What would that buy you compared to what you already have?

The same goal as randomization in general: to decrease the chance for sampling
artifacts that can occur due to the sampling frequency oscillating together
with some internal workload parameter, skewing the sample.

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

==============================================================================
TOPIC: fs/partition/msdos: Fix unusable extended partition for > 512B sector
http://groups.google.com/group/linux.kernel/t/feedecd5f30fb05f?hl=en
==============================================================================

== 1 of 1 ==
Date: Thurs, Mar 11 2010 4:00 am
From: OGAWA Hirofumi


"Daniel Taylor" <Daniel.Taylor@wdc.com> writes:

> In the near future, WD will be releasing products that need this patch.
>
> Wouldn't it be a better Linux user experience to never have the problem,
> rather than wait for a bug-fix cycle on the kernel?

Of course, if we can fix, it's better.

However, probably, users of this patch would be only boot loader,
because this is a first sector on extended-partition itself, not
logical-partitions in extended-partition.

So, maybe this wouldn't be so major problem for normal users, and what
is needed actually is not sure to me for now (I guess it might be
depending to BIOS, if boot loader is using BIOS call.).

So, this patch provides one logical sector, but it's just guess.

> OTOH, it would be reasonable to wait until someone else had a chance to
> test the change. We are awaiting NDAs from RedHat, Canonical, and
> Novell/SUSE to send them the affected products for library/application
> development.

Thanks.
--
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

==============================================================================
TOPIC: allow retrieving entries with trailing dots
http://groups.google.com/group/linux.kernel/t/57d334aba1b16975?hl=en
==============================================================================

== 1 of 2 ==
Date: Thurs, Mar 11 2010 4:10 am
From: Philippe De Muyter


Hello Ogawa,

On Thu, Mar 11, 2010 at 06:26:54PM +0900, OGAWA Hirofumi wrote:
> Philippe De Muyter <phdm@macqel.be> writes:
>
> >> >> This introduces unneeded directory-parse to standard one. And for
> >> >
> >> > No. There will only be a second parse if someone is looking for a filename
> >> > with a trailing dot, and it is not found (*). I there is no trailing dot in qname, only the first fat_search_long will be called, because len after vfat_striptail_len would be equal to qname->len.
> >>
> >> Yes, I'm saying about trailing-dot filename case. Any disadvantage to
> >> standard one is unacceptable to workaround.
> >
> > This is unavoidable.

We could perhaps reduce the disadvantage by moving the logic (test first
with the trailing dots, and then without if needed) into fat_search_long
instead of putting it in vfat_find as my patch proposal does. This way
the name to find would only be compared once to unrelated entries,
instead of possibly twice as my patch does.

But therefore I need your help : fat_search_long isn't easy to read/modify.

The logic could be modified as such :
- if the searched name contains trailing dots, first compare
the truncated part to the same length of the directory entry
- if they are the same, test the length of the rest of the directory entry
- if length_of_rest is 0, this could be the matching entry,
but there could be a better one later; keep searching till the end.
- if length_of_rest is not 0, compare the rest with the trailing dots
- if rests are equal, we have found it; return
- if rests are unequal, continue searching

>
> We would be able to introduce new mount option to do it if needed.

A new mount option should be the last programming option. It is better to
automatically do the right thing than to require the user to figure out
he must add a mount option in /etc/fstab or whatever. Remember this is
for hot-plug disks.

>
> >> $ ls
> >> a.. a. a
> >> $ rm -rf *
> >>
> >> $ ls
> >> a..
> >> $ touch a.
> >> $ touch a
> >> ...
> >>
> >> I assumed you want to define "a." and "a" are different name on
> >> "mv a a.", and _totally_.
> >
> > For file creation and renaming, I want to introduce no change, because there
> > is no problem. If one wants to create a "a." file on a IO-MEGA disk suing
> > linux and USB, it is currently called "a", and that will remain exactly the
> > same.
>
> This changed vfat_find(), so, this patch will change the behavior of all
> callers more or less. And the behavior seems to be really strange, you
> can remove "a.", but you can't create it?

Yes, you can remove any existing file, but if you want to create a file,
the name creation rules are kept unchanged. So, creating "a." will
succeed, it will actually be called "a" on disk, but you can still refer
to it as "a." : that will succeed. That's the strange part of the behavior
but that part is already present for compatibility reasons nor you nor me
can do anything about :(

> The behavior sounds random, right?

It's a compromise to avoid creating name entries that are not universally
accepted, but to allow accessing existing files and directories.

The mount option could be useful then to allow the creation of file and
directory names with trailing dot, but consistency between getdents and
stat/open must be automatic.

Best regards

Philippe
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


== 2 of 2 ==
Date: Thurs, Mar 11 2010 4:50 am
From: OGAWA Hirofumi


Philippe De Muyter <phdm@macqel.be> writes:

>> > This is unavoidable.
>
> We could perhaps reduce the disadvantage by moving the logic (test first
> with the trailing dots, and then without if needed) into fat_search_long
> instead of putting it in vfat_find as my patch proposal does. This way
> the name to find would only be compared once to unrelated entries,
> instead of possibly twice as my patch does.
>
> But therefore I need your help : fat_search_long isn't easy to read/modify.
>
> The logic could be modified as such :
> - if the searched name contains trailing dots, first compare
> the truncated part to the same length of the directory entry
> - if they are the same, test the length of the rest of the directory entry
> - if length_of_rest is 0, this could be the matching entry,
> but there could be a better one later; keep searching till the end.
> - if length_of_rest is not 0, compare the rest with the trailing dots
> - if rests are equal, we have found it; return
> - if rests are unequal, continue searching
>
>>
>> We would be able to introduce new mount option to do it if needed.
>
> A new mount option should be the last programming option. It is better to
> automatically do the right thing than to require the user to figure out
> he must add a mount option in /etc/fstab or whatever. Remember this is
> for hot-plug disks.

Sorry, but I'm not thinking this is primary one. So, requiring option
for it to avoid disadvantage of normal users, it sounds good.

>> This changed vfat_find(), so, this patch will change the behavior of all
>> callers more or less. And the behavior seems to be really strange, you
>> can remove "a.", but you can't create it?
>
> Yes, you can remove any existing file, but if you want to create a file,
> the name creation rules are kept unchanged. So, creating "a." will
> succeed, it will actually be called "a" on disk, but you can still refer
> to it as "a." : that will succeed. That's the strange part of the behavior
> but that part is already present for compatibility reasons nor you nor me
> can do anything about :(
>
>> The behavior sounds random, right?
>
> It's a compromise to avoid creating name entries that are not universally
> accepted, but to allow accessing existing files and directories.
>
> The mount option could be useful then to allow the creation of file and
> directory names with trailing dot, but consistency between getdents and
> stat/open must be automatic.

No, this breaks consistency. With this patch, unlink("a."), then
open("a.", O_CREAT) and write(), the result depend on existent
files. This patch is providing two files on one name.

Thanks.
--
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

==============================================================================
TOPIC: perf_events: fix X86 bogus counts when multiplexing
http://groups.google.com/group/linux.kernel/t/82db6d0109ff6b09?hl=en
==============================================================================

== 1 of 1 ==
Date: Thurs, Mar 11 2010 4:40 am
From: Peter Zijlstra


On Thu, 2010-03-11 at 09:32 +0100, Peter Zijlstra wrote:
> On Wed, 2010-03-10 at 22:17 -0800, eranian@google.com wrote:
> > This patch fixes a bug in 2.6.33 X86 event scheduling whereby
> > all counts are bogus as soon as events need to be multiplexed
> > because the PMU is overcommitted.
> >
> > The code in hw_perf_enable() was causing multiplexed events
> > to accumulate collected counts twice causing bogus results.
> >
> > This is demonstrated on AMD Barcelona with the example
> > below. First run, no conflict, you obtain the actual counts.
> > Second run, PMU overcommitted, multiplexing, all events are
> > over-counted. Third run, patch applied, you obtain the correct
> > count through scaling.
> >
>
> I'm a bit puzzled by this one, if we, during scheduling move an event
> from idx 1 to idx 2, we need to stop it on 1 and start if on 2,
> otherwise we do not properly transfer its count, right?
>
> With the below patch it does no such thing.
>
> I did fix some funnies I observed with hw_perf_enable() while doing the
> PEBS stuff, and -tip does it wrong differently from what you illustrate,
> so while there defenately is something to fix, I doubt the below is
> correct.

OK, so what happens is that we schedule badly like:

<...>-1987 [019] 280.252808: x86_pmu_start: event-46/1300c0: idx: 0
<...>-1987 [019] 280.252811: x86_pmu_start: event-47/1300c0: idx: 1
<...>-1987 [019] 280.252812: x86_pmu_start: event-48/1300c0: idx: 2
<...>-1987 [019] 280.252813: x86_pmu_start: event-49/1300c0: idx: 3
<...>-1987 [019] 280.252814: x86_pmu_start: event-50/1300c0: idx: 32
<...>-1987 [019] 280.252825: x86_pmu_stop: event-46/1300c0: idx: 0
<...>-1987 [019] 280.252826: x86_pmu_stop: event-47/1300c0: idx: 1
<...>-1987 [019] 280.252827: x86_pmu_stop: event-48/1300c0: idx: 2
<...>-1987 [019] 280.252828: x86_pmu_stop: event-49/1300c0: idx: 3
<...>-1987 [019] 280.252829: x86_pmu_stop: event-50/1300c0: idx: 32
<...>-1987 [019] 280.252834: x86_pmu_start: event-47/1300c0: idx: 1
<...>-1987 [019] 280.252834: x86_pmu_start: event-48/1300c0: idx: 2
<...>-1987 [019] 280.252835: x86_pmu_start: event-49/1300c0: idx: 3
<...>-1987 [019] 280.252836: x86_pmu_start: event-50/1300c0: idx: 32
<...>-1987 [019] 280.252837: x86_pmu_start: event-51/1300c0: idx: 32 *FAIL*

This happens because we only iterate the n_running events in the first
pass, and reset their index to -1 if they don't match to force a
re-assignment.

Now, in our RR example, n_running == 0 because we fully unscheduled, so
event-50 will retain its idx==32, even though in scheduling it will have
gotten idx=0, and we don't trigger the re-assign path.

The easiest way to fix this is the below patch, which simply validates
the full assignment in the second pass.

---
arch/x86/kernel/cpu/perf_event.c | 12 +++---------
1 files changed, 3 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 4a0514d..a3aff76 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -787,7 +787,6 @@ void hw_perf_enable(void)
* step2: reprogram moved events into new counters
*/
for (i = 0; i < n_running; i++) {
-
event = cpuc->event_list[i];
hwc = &event->hw;

@@ -802,21 +801,16 @@ void hw_perf_enable(void)
continue;

x86_pmu_stop(event);
-
- hwc->idx = -1;
}

for (i = 0; i < cpuc->n_events; i++) {
-
event = cpuc->event_list[i];
hwc = &event->hw;

- if (i < n_running &&
- match_prev_assignment(hwc, cpuc, i))
- continue;
-
- if (hwc->idx == -1)
+ if (!match_prev_assignment(hwc, cpuc, i))
x86_assign_hw_event(event, cpuc, i);
+ else if (i < n_running)
+ continue;

x86_pmu_start(event);
}


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

==============================================================================
TOPIC: [PATCH 1/1] perf: add support for arch-dependent symbolic event names
to "perf stat"
http://groups.google.com/group/linux.kernel/t/d9c6c63579142582?hl=en
==============================================================================

== 1 of 1 ==
Date: Thurs, Mar 11 2010 4:50 am
From: Ingo Molnar

* Corey Ashford <cjashfor@linux.vnet.ibm.com> wrote:

> On 3/3/2010 6:30 PM, Corey Ashford wrote:
> >For your review, this patch adds support for arch-dependent symbolic
> >event names to the "perf stat" tool, and could be expanded to other
> >"perf *" commands fairly easily, I suspect.
> >
> >To support arch-dependent event names without adding arch-dependent code
> >to perf, I added a callout mechanism whereby perf will look for the
> >environment variable: PERF_ARCH_DEP_LIB, and if it exists, it will try
> >to open it as a shared object. If that succeeds, it looks for the symbol
> >"parse_arch_dep_event". If that exists, that function will be called by
> >parse_events() before all of the other event parsing functions in
> >parse-events.c. It is passed the same arguments as the other
> >parse_*_event functions, namely the event string and a pointer to an
> >event attribute structure.
> >
> >As the code existed, "perf stat" would print out the count results, but
> >for raw events (which is how arch-dependent events are supported in
> >perf_events), it would just print out a raw code. This is not
> >acceptable, especially when a symbolic name was placed on the command
> >line. So I changed the code to save away the event name that was passed
> >on the command line, rather than doing a reverse translation to an event
> >string based on the event type and config fields of the attr structure.
> >In this way, there's no need for a reverse translation function in the
> >arch-dependent library; only a event string->attr struct function is
> >needed.
> >
> >I could well be missing something, but I don't understand why reverse
> >translation is ever needed in perf, as long as the tool keeps track of
> >the original event strings.
>
> A couple of follow-up comments on this patch:
>
> This functionality was designed to provide a generalized interface to an
> external event name -> attr struct library, such as libpfm4. libpfm4 has an
> interface that nearly exactly matches parse_*_event() profiles, so it's
> quite easy to write a small wrapper function to call libpfm4's function.
>
> Ingo Molnar discussed adding some visibility to the arch-dependent event
> names through some other interface, such as through /sys/devices/pmus
> perhaps, but that discussion is a long way (as far as I know) from having
> something usable today. So you could think of this external library
> approach to be a stop-gap until something better is developed. When/if that
> new event naming mechanism becomes available, we can easily remove this
> external library support from perf.

I'm quite much against stop-gap measures like this - they tend to become
tomorrow's impossible-to-remove quirk.

If you want extensible events you can already do it by providing an ftrace
tracepoint event via TRACE_EVENT. They are easy to add and ad-hoc, and are
supported throughout by perf.

That could be librarized further by providing an /eventfs or /proc/events
interface to enumerate them.

Or if you want to extend the perf events namespace ABI you can send patches
for that as well. (It's not a big issue if a particular event is currently
only supported on Power for example - as long as you make a good effort naming
and structuring it in a reasonably generic way.)

Thanks,

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


==============================================================================

You received this message because you are subscribed to the Google Groups "linux.kernel"
group.

To post to this group, visit http://groups.google.com/group/linux.kernel?hl=en

To unsubscribe from this group, send email to linux.kernel+unsubscribe@googlegroups.com

To change the way you get mail from this group, visit:
http://groups.google.com/group/linux.kernel/subscribe?hl=en

To report abuse, send email explaining the problem to abuse@googlegroups.com

==============================================================================
Google Groups: http://groups.google.com/?hl=en

0 Comments:

Post a Comment

Subscribe to Post Comments [Atom]

<< Home


Real Estate