twitter: linux.kernel - 26 new messages in 18 topics

linux.kernel
http://groups.google.com/group/linux.kernel?hl=en

Today's topics:

* perf symbols: Adopt the strlists for dso, comm, - 3 messages, 3 authors
http://groups.google.com/group/linux.kernel/t/6ce7ba2823135055?hl=en
* cfq: Take whether cfq group is changed into account when choosing service
tree - 2 messages, 2 authors
http://groups.google.com/group/linux.kernel/t/c5cd6622ae55dd5c?hl=en
* Defer skb allocation -- add destroy buffers function for virtio] - 1
messages, 1 author
http://groups.google.com/group/linux.kernel/t/8f0c8cf3582d2d68?hl=en
* PATCH v2 3/4] Defer skb allocation -- new recvbuf alloc & receive calls - 2
messages, 2 authors
http://groups.google.com/group/linux.kernel/t/451cae96dcf2b8b4?hl=en
* wireless: wext: allocate space for NULL-termination for 32byte SSIDs - 1
messages, 1 author
http://groups.google.com/group/linux.kernel/t/375be3f65cf8eb6c?hl=en
* Async suspend-resume patch w/ completions (was: Re: Async suspend-resume
patch w/ rwsems) - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/744b2baf61c0ac2a?hl=en
* sh: Fix test of unsigned in se7722_irq_demux() - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/d3980cfd65569980?hl=en
* [PATCH 2/2] iwlwifi: unify iwl_setup_rxon_timing - 4 messages, 3 authors
http://groups.google.com/group/linux.kernel/t/cf1f15614c9dac9c?hl=en
* THE-BIG-BIG LOTTO, UK ©2009 - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/58ca258f5eee64aa?hl=en
* 2.6.32.1: BUG and panic: unable to handle kernel NULL pointer dereference at
000000000000001f - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/ad244ee9c25bcdf4?hl=en
* [input] add mc13783 touchscreen driver - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/2ba86257c3bb866d?hl=en
* Possible data loss on ext[34], reiserfs with external journal - 1 messages,
1 author
http://groups.google.com/group/linux.kernel/t/326aed9339cfdbd9?hl=en
* Are these MTRR settings correct? - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/ee98ace36c359c1f?hl=en
* [PATCH 2/5] mm : avoid false sharing on mm_counter - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/1d905e0d3fc3ccff?hl=en
* x86: UV - XPC fixes with related support functionality V2. - 2 messages, 1
author
http://groups.google.com/group/linux.kernel/t/fd19743e8bee3c06?hl=en
* Per cpu atomics in core allocators and cleanup - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/92c1d57d42e5ddf3?hl=en
* Input: Fix test of unsigned in altera_ps2_probe() - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/4936bafefa349e70?hl=en
* On cgroup, cpuset and rcu and a patch. - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/42c0c82ab68ecf3f?hl=en

==============================================================================
TOPIC: perf symbols: Adopt the strlists for dso, comm,
http://groups.google.com/group/linux.kernel/t/6ce7ba2823135055?hl=en
==============================================================================

== 1 of 3 ==
Date: Tues, Dec 15 2009 8:10 am
From: Arnaldo Carvalho de Melo

From: Arnaldo Carvalho de Melo <acme@redhat.com>

Will be used in perf diff too.

Cc: Frédéric Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
tools/perf/builtin-annotate.c | 4 +-
tools/perf/builtin-diff.c | 9 ++---
tools/perf/builtin-kmem.c | 4 +-
tools/perf/builtin-record.c | 6 ++--
tools/perf/builtin-report.c | 73 +++++++++++++++++----------------------
tools/perf/builtin-timechart.c | 4 +-
tools/perf/builtin-trace.c | 4 +-
tools/perf/util/symbol.c | 33 ++++++++++++++++++
tools/perf/util/symbol.h | 9 +++++
9 files changed, 89 insertions(+), 57 deletions(-)

diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c
index e656e25..645d580 100644
--- a/tools/perf/builtin-annotate.c
+++ b/tools/perf/builtin-annotate.c
@@ -518,14 +518,14 @@ static const struct option options[] = {

int cmd_annotate(int argc, const char **argv, const char *prefix __used)
{
+ argc = parse_options(argc, argv, options, annotate_usage, 0);
+
symbol_conf.priv_size = sizeof(struct sym_priv);
symbol_conf.try_vmlinux_path = true;

if (symbol__init() < 0)
return -1;

- argc = parse_options(argc, argv, options, annotate_usage, 0);
-
setup_sorting(annotate_usage, options);

if (argc) {
diff --git a/tools/perf/builtin-diff.c b/tools/perf/builtin-diff.c
index 67328d1..4fde606 100644
--- a/tools/perf/builtin-diff.c
+++ b/tools/perf/builtin-diff.c
@@ -265,11 +265,6 @@ static const struct option options[] = {

int cmd_diff(int argc, const char **argv, const char *prefix __used)
{
- if (symbol__init() < 0)
- return -1;
-
- setup_sorting(diff_usage, options);
-
argc = parse_options(argc, argv, options, diff_usage, 0);
if (argc) {
if (argc > 2)
@@ -281,6 +276,10 @@ int cmd_diff(int argc, const char **argv, const char *prefix __used)
input_new = argv[0];
}

+ if (symbol__init() < 0)
+ return -1;
+
+ setup_sorting(diff_usage, options);
setup_pager();
return __cmd_diff();
}
diff --git a/tools/perf/builtin-kmem.c b/tools/perf/builtin-kmem.c
index e078797..fc21ad7 100644
--- a/tools/perf/builtin-kmem.c
+++ b/tools/perf/builtin-kmem.c
@@ -766,13 +766,13 @@ static int __cmd_record(int argc, const char **argv)

int cmd_kmem(int argc, const char **argv, const char *prefix __used)
{
- symbol__init();
-
argc = parse_options(argc, argv, kmem_options, kmem_usage, 0);

if (!argc)
usage_with_options(kmem_usage, kmem_options);

+ symbol__init();
+
if (!strncmp(argv[0], "rec", 3)) {
return __cmd_record(argc, argv);
} else if (!strcmp(argv[0], "stat")) {
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 1da48a8..65301c5 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -632,13 +632,13 @@ int cmd_record(int argc, const char **argv, const char *prefix __used)
{
int counter;

- symbol__init();
-
argc = parse_options(argc, argv, options, record_usage,
- PARSE_OPT_STOP_AT_NON_OPTION);
+ PARSE_OPT_STOP_AT_NON_OPTION);
if (!argc && target_pid == -1 && !system_wide)
usage_with_options(record_usage, options);

+ symbol__init();
+
if (!nr_counters) {
nr_counters = 1;
attrs[0].type = PERF_TYPE_HARDWARE;
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index c349bdb..03afac3 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -33,10 +33,6 @@

static char const *input_name = "perf.data";

-static char *dso_list_str, *comm_list_str, *sym_list_str,
- *col_width_list_str;
-static struct strlist *dso_list, *comm_list, *sym_list;
-
static int force;
static bool use_callchain;

@@ -365,8 +361,9 @@ static size_t hist_entry__fprintf(FILE *fp, struct hist_entry *self,

static void dso__calc_col_width(struct dso *self)
{
- if (!col_width_list_str && !field_sep &&
- (!dso_list || strlist__has_entry(dso_list, self->name))) {
+ if (!symbol_conf.col_width_list_str && !field_sep &&
+ (!symbol_conf.dso_list ||
+ strlist__has_entry(symbol_conf.dso_list, self->name))) {
unsigned int slen = strlen(self->name);
if (slen > dsos__col_width)
dsos__col_width = slen;
@@ -379,8 +376,9 @@ static void thread__comm_adjust(struct thread *self)
{
char *comm = self->comm;

- if (!col_width_list_str && !field_sep &&
- (!comm_list || strlist__has_entry(comm_list, comm))) {
+ if (!symbol_conf.col_width_list_str && !field_sep &&
+ (!symbol_conf.comm_list ||
+ strlist__has_entry(symbol_conf.comm_list, comm))) {
unsigned int slen = strlen(comm);

if (slen > comms__col_width) {
@@ -442,7 +440,7 @@ static size_t perf_session__fprintf_hist_entries(struct perf_session *self,
struct rb_node *nd;
size_t ret = 0;
unsigned int width;
- char *col_width = col_width_list_str;
+ char *col_width = symbol_conf.col_width_list_str;
int raw_printing_style;

raw_printing_style = !strcmp(pretty_printing_style, "raw");
@@ -468,7 +466,7 @@ static size_t perf_session__fprintf_hist_entries(struct perf_session *self,
}
width = strlen(se->header);
if (se->width) {
- if (col_width_list_str) {
+ if (symbol_conf.col_width_list_str) {
if (col_width) {
*se->width = atoi(col_width);
col_width = strchr(col_width, ',');
@@ -587,7 +585,8 @@ static int process_sample_event(event_t *event, struct perf_session *session)

dump_printf(" ... thread: %s:%d\n", thread->comm, thread->pid);

- if (comm_list && !strlist__has_entry(comm_list, thread->comm))
+ if (symbol_conf.comm_list &&
+ !strlist__has_entry(symbol_conf.comm_list, thread->comm))
return 0;

cpumode = event->header.misc & PERF_RECORD_MISC_CPUMODE_MASK;
@@ -601,14 +600,15 @@ static int process_sample_event(event_t *event, struct perf_session *session)
if (al.map && !sort_dso.elide && !al.map->dso->slen_calculated)
dso__calc_col_width(al.map->dso);

- if (dso_list &&
+ if (symbol_conf.dso_list &&
(!al.map || !al.map->dso ||
- !(strlist__has_entry(dso_list, al.map->dso->short_name) ||
+ !(strlist__has_entry(symbol_conf.dso_list, al.map->dso->short_name) ||
(al.map->dso->short_name != al.map->dso->long_name &&
- strlist__has_entry(dso_list, al.map->dso->long_name)))))
+ strlist__has_entry(symbol_conf.dso_list, al.map->dso->long_name)))))
return 0;

- if (sym_list && al.sym && !strlist__has_entry(sym_list, al.sym->name))
+ if (symbol_conf.sym_list && al.sym &&
+ !strlist__has_entry(symbol_conf.sym_list, al.sym->name))
return 0;

if (perf_session__add_hist_entry(session, &al, data.callchain, data.period)) {
@@ -825,13 +825,13 @@ static const struct option options[] = {
OPT_CALLBACK_DEFAULT('g', "call-graph", NULL, "output_type,min_percent",
"Display callchains using output_type and min percent threshold. "
"Default: fractal,0.5", &parse_callchain_opt, callchain_default_opt),
- OPT_STRING('d', "dsos", &dso_list_str, "dso[,dso...]",
+ OPT_STRING('d', "dsos", &symbol_conf.dso_list_str, "dso[,dso...]",
"only consider symbols in these dsos"),
- OPT_STRING('C', "comms", &comm_list_str, "comm[,comm...]",
+ OPT_STRING('C', "comms", &symbol_conf.comm_list_str, "comm[,comm...]",
"only consider symbols in these comms"),
- OPT_STRING('S', "symbols", &sym_list_str, "symbol[,symbol...]",
+ OPT_STRING('S', "symbols", &symbol_conf.sym_list_str, "symbol[,symbol...]",
"only consider these symbols"),
- OPT_STRING('w', "column-widths", &col_width_list_str,
+ OPT_STRING('w', "column-widths", &symbol_conf.col_width_list_str,
"width[,width...]",
"don't try to adjust column width, use these fixed values"),
OPT_STRING('t', "field-separator", &field_sep, "separator",
@@ -840,32 +840,25 @@ static const struct option options[] = {
OPT_END()
};

-static void setup_list(struct strlist **list, const char *list_str,
- struct sort_entry *se, const char *list_name,
- FILE *fp)
+static void sort_entry__setup_elide(struct sort_entry *self,
+ struct strlist *list,
+ const char *list_name, FILE *fp)
{
- if (list_str) {
- *list = strlist__new(true, list_str);
- if (!*list) {
- fprintf(stderr, "problems parsing %s list\n",
- list_name);
- exit(129);
- }
- if (strlist__nr_entries(*list) == 1) {
- fprintf(fp, "# %s: %s\n", list_name,
- strlist__entry(*list, 0)->s);
- se->elide = true;
- }
+ if (list && strlist__nr_entries(list) == 1) {
+ fprintf(fp, "# %s: %s\n", list_name, strlist__entry(list, 0)->s);
+ self->elide = true;
}
}

int cmd_report(int argc, const char **argv, const char *prefix __used)
{
+ argc = parse_options(argc, argv, options, report_usage, 0);
+
+ setup_pager();
+
if (symbol__init() < 0)
return -1;

- argc = parse_options(argc, argv, options, report_usage, 0);
-
setup_sorting(report_usage, options);

if (parent_pattern != default_parent_pattern) {
@@ -880,11 +873,9 @@ int cmd_report(int argc, const char **argv, const char *prefix __used)
if (argc)
usage_with_options(report_usage, options);

- setup_pager();
-
- setup_list(&dso_list, dso_list_str, &sort_dso, "dso", stdout);
- setup_list(&comm_list, comm_list_str, &sort_comm, "comm", stdout);
- setup_list(&sym_list, sym_list_str, &sort_sym, "symbol", stdout);
+ sort_entry__setup_elide(&sort_dso, symbol_conf.dso_list, "dso", stdout);
+ sort_entry__setup_elide(&sort_comm, symbol_conf.comm_list, "comm", stdout);
+ sort_entry__setup_elide(&sort_sym, symbol_conf.sym_list, "symbol", stdout);

if (field_sep && *field_sep == '.') {
fputs("'.' is the only non valid --field-separator argument\n",
diff --git a/tools/perf/builtin-timechart.c b/tools/perf/builtin-timechart.c
index 9c98b7a..a589a43 100644
--- a/tools/perf/builtin-timechart.c
+++ b/tools/perf/builtin-timechart.c
@@ -1137,11 +1137,11 @@ static const struct option options[] = {

int cmd_timechart(int argc, const char **argv, const char *prefix __used)
{
- symbol__init();
-
argc = parse_options(argc, argv, options, timechart_usage,
PARSE_OPT_STOP_AT_NON_OPTION);

+ symbol__init();
+
if (argc && !strncmp(argv[0], "rec", 3))
return __cmd_record(argc, argv);
else if (argc)
diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
index 07ad25c..e2285e2 100644
--- a/tools/perf/builtin-trace.c
+++ b/tools/perf/builtin-trace.c
@@ -579,13 +579,13 @@ int cmd_trace(int argc, const char **argv, const char *prefix __used)
exit(-1);
}

- symbol__init();
-
setup_scripting();

argc = parse_options(argc, argv, options, annotate_usage,
PARSE_OPT_STOP_AT_NON_OPTION);

+ if (symbol__init() < 0)
+ return -1;
setup_pager();

session = perf_session__new(input_name, O_RDONLY, 0);
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index 17ce012..164286a 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -1,6 +1,7 @@
#include "util.h"
#include "../perf.h"
#include "session.h"
+#include "sort.h"
#include "string.h"
#include "symbol.h"
#include "thread.h"
@@ -1739,6 +1740,20 @@ out_fail:
return -1;
}

+static int setup_list(struct strlist **list, const char *list_str,
+ const char *list_name)
+{
+ if (list_str == NULL)
+ return 0;
+
+ *list = strlist__new(true, list_str);
+ if (!*list) {
+ pr_err("problems parsing %s list\n", list_name);
+ return -1;
+ }
+ return 0;
+}
+
int symbol__init(void)
{
elf_version(EV_CURRENT);
@@ -1749,7 +1764,25 @@ int symbol__init(void)
if (symbol_conf.try_vmlinux_path && vmlinux_path__init() < 0)
return -1;

+ if (setup_list(&symbol_conf.dso_list,
+ symbol_conf.dso_list_str, "dso") < 0)
+ return -1;
+
+ if (setup_list(&symbol_conf.comm_list,
+ symbol_conf.comm_list_str, "comm") < 0)
+ goto out_free_dso_list;
+
+ if (setup_list(&symbol_conf.sym_list,
+ symbol_conf.sym_list_str, "symbol") < 0)
+ goto out_free_comm_list;
+
return 0;
+
+out_free_dso_list:
+ strlist__delete(symbol_conf.dso_list);
+out_free_comm_list:
+ strlist__delete(symbol_conf.comm_list);
+ return -1;
}

int perf_session__create_kernel_maps(struct perf_session *self)
diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index 7662947..d61f350 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -49,12 +49,21 @@ struct symbol {
char name[0];
};

+struct strlist;
+
struct symbol_conf {
unsigned short priv_size;
bool try_vmlinux_path,
use_modules,
sort_by_name;
const char *vmlinux_name;
+ char *dso_list_str,
+ *comm_list_str,
+ *sym_list_str,
+ *col_width_list_str;
+ struct strlist *dso_list,
+ *comm_list,
+ *sym_list;
};

extern struct symbol_conf symbol_conf;
--
1.6.2.5

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
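
The flow this patch sets up (parse the options into strings first, then let symbol__init() turn those strings into lists that the sample filters consult) can be sketched in user space roughly as follows. This is a minimal sketch, not perf's code: name_list, filter_conf, filter_conf__init() and filter__accepts() are hypothetical stand-ins for strlist, symbol_conf, symbol__init() and the strlist__has_entry() checks. The cleanup labels in the sketch are ordered so that each failure frees every list set up before it, which is a choice made here for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for perf's strlist: an array of copied names. */
struct name_list {
	char **names;
	size_t nr;
};

static void name_list__delete(struct name_list *list)
{
	size_t i;

	if (list == NULL)
		return;
	for (i = 0; i < list->nr; i++)
		free(list->names[i]);
	free(list->names);
	free(list);
}

/* Split "a,b,c" into a list; returns NULL on allocation failure. */
static struct name_list *name_list__new(const char *str)
{
	struct name_list *list = calloc(1, sizeof(*list));
	const char *p = str;

	if (list == NULL)
		return NULL;
	while (*p != '\0') {
		size_t len = strcspn(p, ",");

		if (len > 0) {
			char **grown = realloc(list->names,
					       (list->nr + 1) * sizeof(*grown));
			char *name;

			if (grown == NULL)
				goto out_delete;
			list->names = grown;
			name = malloc(len + 1);
			if (name == NULL)
				goto out_delete;
			memcpy(name, p, len);
			name[len] = '\0';
			list->names[list->nr++] = name;
		}
		p += len;
		if (*p == ',')
			p++;
	}
	return list;

out_delete:
	name_list__delete(list);
	return NULL;
}

/* Hypothetical counterpart of the new symbol_conf fields. */
struct filter_conf {
	const char *dso_list_str, *comm_list_str, *sym_list_str;
	struct name_list *dso_list, *comm_list, *sym_list;
};

static int setup_list(struct name_list **list, const char *list_str)
{
	if (list_str == NULL)
		return 0;	/* option not given: NULL list means "no filter" */
	*list = name_list__new(list_str);
	return *list ? 0 : -1;
}

/* Must run after option parsing has filled in the *_str fields. */
static int filter_conf__init(struct filter_conf *conf)
{
	assert(conf != NULL);
	if (setup_list(&conf->dso_list, conf->dso_list_str) < 0)
		return -1;
	if (setup_list(&conf->comm_list, conf->comm_list_str) < 0)
		goto out_free_dso_list;
	if (setup_list(&conf->sym_list, conf->sym_list_str) < 0)
		goto out_free_comm_list;
	return 0;

out_free_comm_list:	/* unwind in reverse order of setup */
	name_list__delete(conf->comm_list);
	conf->comm_list = NULL;
out_free_dso_list:
	name_list__delete(conf->dso_list);
	conf->dso_list = NULL;
	return -1;
}

/* A sample passes when no list was given or its name is on the list. */
static int filter__accepts(const struct name_list *list, const char *name)
{
	size_t i;

	if (list == NULL)
		return 1;
	for (i = 0; i < list->nr; i++)
		if (strcmp(list->names[i], name) == 0)
			return 1;
	return 0;
}
```

A caller would fill in the *_str fields from the command line, call filter_conf__init() once, and then test each sample with filter__accepts(). The sketch also shows why the patch moves symbol__init() after parse_options(): the strings have to exist before the lists can be built from them.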

== 2 of 3 ==
Date: Tues, Dec 15 2009 8:20 am
From: Arnaldo Carvalho de Melo

Em Tue, Dec 15, 2009 at 01:58:35PM -0200, Arnaldo Carvalho de Melo escreveu:
> From: Arnaldo Carvalho de Melo <acme@redhat.com>
>
> This simplifies a lot of functions, less stuff to be done by tool
> writers.

Ingo, this clashes with the batch Masami-san just submitted, please
defer applying my series so that I can rework it to take into account
the changes that Masami-san contributed.

Thanks!

- Arnaldo

== 3 of 3 ==
Date: Tues, Dec 15 2009 9:20 am
From: Masami Hiramatsu

Arnaldo Carvalho de Melo wrote:
> Em Tue, Dec 15, 2009 at 01:58:35PM -0200, Arnaldo Carvalho de Melo escreveu:
>> From: Arnaldo Carvalho de Melo <acme@redhat.com>
>>
>> This simplifies a lot of functions, less stuff to be done by tool
>> writers.
>
> Ingo, this clashes with the batch Masami-san just submitted, please
> defer applying my series so that I can rework it to take into account
> the changes that Masami-san contributed.

Ah, thank you for working on that!
Feel free to update builtin-probe.c too :-) It might use
symbol.c in a different way from the others (for getting the vmlinux path).

Thanks,

--
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division

e-mail: mhiramat@redhat.com

==============================================================================
TOPIC: cfq: Take whether cfq group is changed into account when choosing
service tree
http://groups.google.com/group/linux.kernel/t/c5cd6622ae55dd5c?hl=en
==============================================================================

== 1 of 2 ==
Date: Tues, Dec 15 2009 8:10 am
From: Corrado Zoccolo

Hi Vivek,

On Tue, Dec 15, 2009 at 4:23 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
>
> Thinking more about it...
>
> Moving all RT tasks to root group will increase the overall share of root
> group (including share of non RT workload like sync-idle, sync-noidle and
> async). Because everytime, RT task does some IO, root group will be put at the
> front of service tree (irrespective of the fact how much service it has
> received in the past w.r.t other groups). That will make root group gain
> share and in turn root group non-RT sync-idle, sync-noidle and async
> workload also gain share.
>
Yes, this can be a problem. However, it can obtain a similar effect to the
prio_changed concept (especially when group_isolation = 0). In fact, we wanted
to run sync_noidle after RT, and sync_noidle are in root group in the
uninsulated case.

> Another way to solve the issue could be to have a separate service tree
> and root group for RT workload. By default all the RT tasks (systemwide),
> will be put into that group and we will always serve that root rt group first
> and if that group does not have any request then serve the requests from
> regular (BE and IDLE tasks), group service tree.
Ok. In this way, in a group you won't have 2 arrays of service trees
(one for each priority), but just one of them.
Only the RT root group will be allowed to contain RT queues.
What about IDLE? Should we have another root group for it, to have
system-wide idle?
In that case, the design will become more orthogonal.

>
> This will make sure that RT tasks system wide get full access to disk
> first and then BE and IDLE tasks get to run. Also BE and IDLE tasks in
> root group will not gain share.
>
> One issue with this approach is prio_changed concept. Because now all the
> RT tasks are in a separate group altogether, there will be no concept of
> prio_changed with-in group. Rest of the group will have either BE or IDLE
> prio tasks only. So that would mean that I need to get rid of prio_changed
> concept while selecting workload with-in group and rely on either fresh
> selection of workload type based on rb_key offset or kind of force strict
> round-robin between workloads of type (sync-idle, sync-noidle and async).
>
> Does this make sense? Corrado, do you foresee any issues if I get rid of
> prio_changed concept. So if a workload has expired, we will always do
> fresh selection of workload based on rb_key across service trees of
> sync-idle, sync-noidle and async. This might lead to issues of sync-noidle
> workload not getting as good latency in the presence of RT tasks. Maybe
> forcing a strict round robin between workload types will mitigate that
> issue up to some extent.
>
I think the prio_changed concept can simply be dropped, and we can still have
lowest rb_key selection (that I think is superior to strict round robin).
I don't see any problem in this. In presence of RT, the latency will
go up anyway.

Thanks,
Corrado

> Thanks
> Vivek
>

== 2 of 2 ==
Date: Tues, Dec 15 2009 8:30 am
From: Vivek Goyal

On Tue, Dec 15, 2009 at 05:04:31PM +0100, Corrado Zoccolo wrote:
> Hi Vivek,
>
> On Tue, Dec 15, 2009 at 4:23 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> >
> > Thinking more about it...
> >
> > Moving all RT tasks to root group will increase the overall share of root
> > group (including share of non RT workload like sync-idle, sync-noidle and
> > async). Because everytime, RT task does some IO, root group will be put at the
> > front of service tree (irrespective of the fact how much service it has
> > received in the past w.r.t other groups). That will make root group gain
> > share and in turn root group non-RT sync-idle, sync-noidle and async
> > workload also gain share.
> >
> Yes, this can be a problem. However, it can obtain a similar effect to the
> prio_changed concept (especially when group_isolation = 0). In fact, we wanted
> to run sync_noidle after RT, and sync_noidle are in root group in the
> uninsulated case.
>
> > Another way to solve the issue could be to have a separate service tree
> > and root group for RT workload. By default all the RT tasks (systemwide),
> > will be put into that group and we will always serve that root rt group first
> > and if that group does not have any request then serve the requests from
> > regular (BE and IDLE tasks), group service tree.
> Ok. In this way, in a group you won't have 2 arrays of service trees
> (one for each priority), but just one of them.
> Only the RT root group will be allowed to contain RT queues.

Yes, something like that. So instead of a group hosting both RT and BE
queues, group can have a type and it will host only either RT or BE
queues.

> What about IDLE? Should we have another root group for it, to have
> system-wide idle?

Maybe. We can make a system wide idle group also along the lines of system
wide RT group.

So in this model, we will be doing proportional BW division only for BE
class tasks and not for RT and IDLE class tasks. RT and IDLE class tasks will
be system wide and will always go to fixed root RT or root IDLE groups. BE
class tasks will go in different groups based on their cgroups and will
get disk share in proportion to group weight.

> In that case, the design will become more orthogonal.
>
> >
> > This will make sure that RT tasks system wide get full access to disk
> > first and then BE and IDLE tasks get to run. Also BE and IDLE tasks in
> > root group will not gain share.
> >
> > One issue with this approach is prio_changed concept. Because now all the
> > RT tasks are in a separate group altogether, there will be no concept of
> > prio_changed with-in group. Rest of the group will have either BE or IDLE
> > prio tasks only. So that would mean that I need to get rid of prio_changed
> > concept while selecting workload with-in group and rely on either fresh
> > selection of workload type based on rb_key offset or kind of force strict
> > round-robin between workloads of type (sync-idle, sync-noidle and async).
> >
> > Does this make sense? Corrado, do you foresee any issues if I get rid of
> > prio_changed concept. So if a workload has expired, we will always do
> > fresh selection of workload based on rb_key across service trees of
> > sync-idle, sync-noidle and async. This might lead to issues of sync-noidle
> > workload not getting as good latency in the presence of RT tasks. Maybe
> > forcing a strict round robin between workload types will mitigate that
> > issue up to some extent.
> >
> I think the prio_changed concept can simply be dropped, and we can still have
> lowest rb_key selection (that I think is superior to strict round robin).
> I don't see any problem in this. In presence of RT, the latency will
> go up anyway.

Ok, initially we can stick to lowest rb_key based selection and if that
does not give satisfactory latencies for sync-noidle workload, we can
revisit this issue.

Cool, I will write a patch and see how well this thing works.

Thanks
Vivek
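
The selection scheme agreed on in this thread (serve the system-wide RT group first, then the BE groups, then IDLE, and within a group pick the workload type whose service tree has the lowest rb_key instead of rotating strictly) might look roughly like this. It is a minimal sketch under stated assumptions: service_tree, choose_class() and choose_workload() are hypothetical names, each tree is reduced to a queue count plus its smallest key, and keys are compared with a plain '<' where kernel code comparing time-based keys would need a wraparound-safe comparison.

```c
#include <assert.h>
#include <stddef.h>

enum io_class { CLASS_RT, CLASS_BE, CLASS_IDLE, IO_CLASSES };
enum wl_type { SYNC_IDLE, SYNC_NOIDLE, ASYNC, WL_TYPES };

/* Hypothetical summary of one service tree. */
struct service_tree {
	unsigned int count;		/* queues waiting on this tree */
	unsigned long lowest_rb_key;	/* key of the leftmost queue */
};

/*
 * Strict class order: RT before BE before IDLE.
 * busy[c] is the number of queued requests in class c.
 * Returns the class to serve, or -1 when nothing is queued.
 */
static int choose_class(const unsigned int busy[IO_CLASSES])
{
	int c;

	assert(busy != NULL);
	for (c = 0; c < IO_CLASSES; c++)
		if (busy[c] > 0)
			return c;
	return -1;
}

/*
 * Fresh selection with no prio_changed special case: among the
 * non-empty trees, take the one whose leftmost queue has the
 * lowest key. Returns -1 when every tree is empty.
 */
static int choose_workload(const struct service_tree trees[WL_TYPES])
{
	int best = -1;
	int i;

	assert(trees != NULL);
	for (i = 0; i < WL_TYPES; i++) {
		if (trees[i].count == 0)
			continue;
		if (best < 0 ||
		    trees[i].lowest_rb_key < trees[best].lowest_rb_key)
			best = i;
	}
	return best;
}
```

With lowest-key selection a workload type that has waited longest is picked next regardless of its position in a fixed rotation, which is the property Corrado prefers over strict round robin.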

==============================================================================
TOPIC: Defer skb allocation -- add destroy buffers function for virtio]
http://groups.google.com/group/linux.kernel/t/8f0c8cf3582d2d68?hl=en
==============================================================================

== 1 of 1 ==
Date: Tues, Dec 15 2009 8:20 am
From: Shirley Ma

Sorry, forgot to CC all.

Thanks
Shirley

On Tue, Dec 15, 2009 at 07:59:42AM -0800, Shirley Ma wrote:
> Hello Michael,
>
> On Tue, 2009-12-15 at 12:57 +0200, Michael S. Tsirkin wrote:
> > No, this code would be in virtio net.
> > destroy would simply be the virtqueue API that returns
> > data pointer.
>
> Since virtio_net doesn't maintain the descriptors of vring, to return
> the data pointer from vq destroy will go through vq->vring.num. It's a
> little expensive. Since it's shutdown code, it might be OK.
>
> Rusty,
>
> What do you think? If I change the code to something like this (not tested):

Yes, this is basically what I had in mind.
Looks slightly easier to understand than callbacks to me.
But far from critical of course.

> +static void *vring_destroy_bufs(struct virtqueue *_vq, void
> (*destroy)(void *))
> +{
> + struct vring_virtqueue *vq = to_vvq(_vq);
> + unsigned int i;
> + void *buf;
> +
> + START_USE(vq);
> +
> + for (i = 0; i < vq->vring.num; i++) {
> + if (vq->data[i]) {
> + /* detach_buf clears data, so grab it now. */
> + buf = vq->data[i];
> + detach_buf(vq, i);
> + END_USE(vq);
> + return buf;
> + }
> + }
> + /* That should have freed everything. */
> + BUG_ON(vq->num_free != vq->vring.num);
> + END_USE(vq);
> + return NULL;
> +}
>
>
> +static void virtnet_free_bufs(struct virtqueue *rvq)
> +{
> + void *buf;
> + for (;;) {
> + buf = rvq->destroy(rvq);
> + if (!buf) {
> + BUG_ON(vi->num != 0);
> + return;
> + } else {

Since we have return above, you can put the following code
in the outer block and reduce nesting (without "else").

> + if (vi->mergeable_rx_bufs || vi->big_packets) {
> + struct page *page, *next;
> +
> + for (page = buf; page; page = next) {
> + next = (struct page *)page->private;
> + __free_pages(page, 0);
> + }
> + } else
> + kfree_skb(buf);
> + }
> + --vi->num;
> + }
> +}
>
> Thanks
> Shirley
>
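
The teardown pattern discussed above (the queue hands back one outstanding data pointer per call, and the driver loops until it gets NULL, freeing each buffer itself) can be sketched in user space as follows. This is a minimal sketch, not the virtio API: struct ring, ring_detach_buf() and free_all_bufs() are hypothetical names, and plain malloc()/free() stand in for the page and skb handling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define RING_SIZE 8

/* Hypothetical stand-in for a virtqueue: one data pointer per slot. */
struct ring {
	void *data[RING_SIZE];
	unsigned int num_free;
};

/* Detach one outstanding buffer and return it; NULL once empty. */
static void *ring_detach_buf(struct ring *r)
{
	unsigned int i;

	for (i = 0; i < RING_SIZE; i++) {
		if (r->data[i] != NULL) {
			/* Clearing the slot loses the pointer, so grab it first. */
			void *buf = r->data[i];

			r->data[i] = NULL;
			r->num_free++;
			return buf;
		}
	}
	/* Nothing left: every slot should be free again. */
	assert(r->num_free == RING_SIZE);
	return NULL;
}

/*
 * Driver-side teardown. Returning as soon as the queue is empty
 * keeps the freeing code in the outer block, with no else branch.
 * Returns the number of buffers freed.
 */
static unsigned int free_all_bufs(struct ring *r)
{
	unsigned int freed = 0;
	void *buf;

	for (;;) {
		buf = ring_detach_buf(r);
		if (buf == NULL)
			return freed;
		free(buf);
		freed++;
	}
}
```

Each call rescans the slots from index 0, so draining a full ring costs on the order of RING_SIZE squared slot checks. That is the expense Shirley mentions, and it is tolerable here only because the path runs at shutdown.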

==============================================================================
TOPIC: PATCH v2 3/4] Defer skb allocation -- new recvbuf alloc & receive calls
http://groups.google.com/group/linux.kernel/t/451cae96dcf2b8b4?hl=en
==============================================================================

== 1 of 2 ==
Date: Tues, Dec 15 2009 8:30 am
From: Shirley Ma

On Tue, 2009-12-15 at 13:33 +0200, Michael S. Tsirkin wrote:
> So what I would suggest is, have function
> that just copies part of skb, and have
> caller open-code allocating the skb and free up
> pages as necessary.
Yes, the updated patch has changed the function.

> What I am asking is why do we add skb in vi->recv.
> Can't we use vq destroy hack here as well?
Yes, I removed recv queue skb link totally in the updated patch.

> > One is for big packet virtio_net_hdr, one is for goodcopy skb.
>
>
> Maybe put this in a comment then.
Ok, will do.

>
> I mean the for loop: can i be for(i = 0, ..., ++i) just as well?
> Why do you start at the end of buffer and decrement?

Are you asking why reverse order for new page to sg? The reason is we link
the new page in first, and only maintain the first pointer. So the most
recent new page should be related to sg[0], if we put the new page in
the last, then we need to travel the page list to get last pointer. Am I
missing your point here?

Thanks
Shirley

== 2 of 2 ==
Date: Tues, Dec 15 2009 8:50 am
From: "Michael S. Tsirkin"

On Tue, Dec 15, 2009 at 08:25:20AM -0800, Shirley Ma wrote:
> On Tue, 2009-12-15 at 13:33 +0200, Michael S. Tsirkin wrote:
> > So what I would suggest is, have function
> > that just copies part of skb, and have
> > caller open-code allocating the skb and free up
> > pages as necessary.
> Yes, the updated patch has changed the function.
>
> > What I am asking is why do we add skb in vi->recv.
> > Can't we use vq destroy hack here as well?
> Yes, I removed recv queue skb link totally in the updated patch.
>
> > > One is for big packet virtio_net_hdr, one is for goodcopy skb.
> >
> >
> > Maybe put this in a comment then.
> Ok, will do.
>
> >
> > I mean the for loop: can i be for(i = 0, ..., ++i) just as well?
> > Why do you start at the end of buffer and decrement?
>
> Are you asking why reverse order for new page to sg? The reason is we link
> the new page in first, and only maintain the first pointer. So the most
> recent new page should be related to sg[0], if we put the new page in
> the last, then we need to travel the page list to get last pointer. Am I
> missing your point here?
>
> Thanks
> Shirley

No, that was what I was looking for.
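
The reason for the decrementing loop can be shown with a small sketch: when each new page is linked in at the head of the list and only the head pointer is kept, filling the slots from last to first leaves the list in the same order as the array, with the most recently added page at slot 0. The names page_node, push_page() and fill_slots() are hypothetical; in the patch the chaining goes through page->private and the slots are sg entries.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a page chained to the next page. */
struct page_node {
	struct page_node *next;
	int id;
};

/* Link a new page in at the front; only the head pointer is kept. */
static void push_page(struct page_node **first, struct page_node *page)
{
	assert(first != NULL && page != NULL);
	page->next = *first;
	*first = page;
}

/*
 * Fill the slots from the end towards the start. Because every page
 * is prepended, the page added last becomes the head and sits in
 * slot 0, so walking the list from *first visits slots 0, 1, 2, ...
 * No tail pointer and no list traversal are needed while filling.
 */
static void fill_slots(struct page_node **first, struct page_node *pages,
		       struct page_node **slots, int n)
{
	int i;

	for (i = n - 1; i >= 0; i--) {
		pages[i].id = i;
		push_page(first, &pages[i]);
		slots[i] = &pages[i];
	}
}
```

Filling in increasing order would require either appending at the tail, which means walking the list or tracking a second pointer, or accepting a list that runs in the opposite order from the slots.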

==============================================================================
TOPIC: wireless: wext: allocate space for NULL-termination for 32byte SSIDs
http://groups.google.com/group/linux.kernel/t/375be3f65cf8eb6c?hl=en
==============================================================================

== 1 of 1 ==
Date: Tues, Dec 15 2009 8:30 am
From: Marcel Holtmann

Hi Daniel,

> We've experienced a long standing bug when quickly switching from
> ad-hoc to managed mode on a hardware using a Libertas chipset.

I don't know where you get your list of mailing lists from, but not
sending it to linux-wireless@vger.kernel.org will not really get you
anywhere. Check MAINTAINERS file first since it has all information
about it.

NETWORKING [WIRELESS]
M: "John W. Linville" <linville@tuxdriver.com>
L: linux-wireless@vger.kernel.org
T: git git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6.git
S: Maintained
F: net/mac80211/
F: net/rfkill/
F: net/wireless/
F: include/net/ieee80211*
F: include/linux/wireless.h
F: drivers/net/wireless/

Regards

Marcel

==============================================================================
TOPIC: Async suspend-resume patch w/ completions (was: Re: Async suspend-
resume patch w/ rwsems)
http://groups.google.com/group/linux.kernel/t/744b2baf61c0ac2a?hl=en
==============================================================================

== 1 of 1 ==
Date: Tues, Dec 15 2009 8:30 am
From: Linus Torvalds

On Tue, 15 Dec 2009, Alan Stern wrote:
>
> It doesn't feel like an ugly hack to me. It seems like exactly the
> Right Thing To Do: Make as many devices as possible use async
> suspend/resume.

The reason it's an ugly hack is that it's actually not a simple decision to
make. The devil is in the details:

> The only reason we don't make every device async is because we don't
> know whether it's safe. In the case of PCI bridges we _do_ know --
> because they don't have any work to do outside of
> late_suspend/early_resume -- and so they _should_ be async.

That's the theory, yes. And it was worth the comment to spell out that
theory. But..

It's a very subtle theory, and it's not necessarily always 100% true. For
example, a cardbus bridge is strictly speaking very much a PCI bridge, but
for cardbus bridges we _do_ have a suspend/resume function.

And perhaps worse than that, cardbus bridges are one of the canonical
examples where two different PCI devices actually share registers. It's
quite common that some of the control registers are shared across the two
subfunctions of a two-slot cardbus controller (and we generally don't even
have full docs for them!)

> The same goes for devices that don't have suspend or resume methods.

Yes and no.

Again, the "async_suspend" flag is done at the generic device layer, but
99% of all suspend/resume methods are _not_ done at that level: they are
bus-specific functions, where the bus has a generic suspend-resume
function that it exposes to the generic device layer, and that knows about
the bus-specific rules.

So if you are a PCI device (to take just that example - but it's true of
just about all other buses too), and you don't have any suspend or resume
methods, it's actually impossible to see that fact from the generic device
layer.

And even when you know it's PCI, our rules are actually not simple at all.
Our rules for PCI devices (and this strictly speaking is true for bridges
too) are rather complex:

- do we have _any_ legacy PM support (ie the "direct" driver
suspend/resume functions in the driver ops, rather than having a
"struct dev_pm_ops" pointer)? If so, call "->suspend()"

- If not - do we have that "dev_pm_ops" thing? If so, call it.

- If not - just disable the device entirely _UNLESS_ you're a PCI bridge.

Notice? The way things are set up, if you have no suspend routine, you'll
not get suspended, but you will get disabled.
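
The decision order described above can be sketched in plain userspace C. This is an illustration only: `struct fake_pci_dev`, its fields, and `pci_suspend_action()` are hypothetical names made up for this sketch, not the kernel's PCI power-management code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a PCI device; not the kernel's struct pci_dev. */
struct fake_pci_dev {
	bool has_legacy_suspend;  /* driver has "direct" suspend/resume ops */
	bool has_dev_pm_ops;      /* driver has a "struct dev_pm_ops" pointer */
	bool is_bridge;           /* device is a PCI bridge */
};

enum suspend_action {
	ACT_LEGACY_SUSPEND,  /* call the legacy ->suspend() */
	ACT_PM_OPS_SUSPEND,  /* call the dev_pm_ops callback */
	ACT_DISABLE,         /* no suspend routine: the device is disabled anyway */
	ACT_NOTHING,         /* bridge without suspend routines: left alone */
};

/* Applies the three rules in the order they are listed in the message. */
static enum suspend_action pci_suspend_action(const struct fake_pci_dev *dev)
{
	assert(dev != NULL);

	if (dev->has_legacy_suspend)
		return ACT_LEGACY_SUSPEND;
	if (dev->has_dev_pm_ops)
		return ACT_PM_OPS_SUSPEND;
	if (!dev->is_bridge)
		return ACT_DISABLE;
	return ACT_NOTHING;
}
```

The case that matters for the async discussion is `ACT_DISABLE`: a device with no driver and no suspend routines still has something done to it.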

So it's _not_ actually safe to asynchronously suspend a PCI device if that
device has no driver or no suspend routines - because even in the absence
of a driver and suspend routines, we'll still at least disable it. And if
there is some subtle dependency on that device that isn't obvious (say, it
might be used indirectly for some ACPI thing), then that async suspend is
the wrong thing to do.

Subtle? Hell yes.

So the whole thing about "we can do PCI bridges asynchronously because
they are obviously no-op" is kind of true - except for the "obviously"
part. It's not obvious at all. It's rather subtle.

As an example of this kind of subtlety - iirc PCIE bridges used to have
suspend and resume bugs when we initially switched over to the "new world"
suspend/resume exactly because they actually did things at "suspend" time
(rather than suspend_late), and that broke devices behind them (this was
not related to async, of course, but the point is that even when you look
like a PCI bridge, you might be doing odd things).

So just saying "let's do it asynchronously" is _not_ always guaranteed to
be the right thing at all. It's _probably_ safe for at least regular PCI
bridges. Cardbus bridges? Probably not, but since most modern laptops have
just a single slot - and people who have multiple slots seldom use them
all - most people will probably never see the problems that it _could_
introduce.

And PCIE bridges? Should be safe these days, but it wasn't quite as
obvious, because a PCIE bridge actually has a driver unlike a regular
plain PCI-PCI bridge.

Subtle, subtle.

> There remains a separate question: Should async devices also be forced
> to wait for their children? I don't see why not. For PCI bridges it
> won't make any significant difference. As long as the async code
> doesn't have to do anything, who cares when it runs?

That's why I just set the "async_resume = 1" thing.

But there might actually be reasons why we care. Like the fact that we
actually throttle the amount of parallel work we do in async_schedule().
So doing even a "no-op" asynchronously isn't actually a no-op: while it is
pending (and those things can be pending for a long time, since they have
to wait for those slow devices underneath them), it can cause _other_
async work - that isn't necessarily a no-op at all - to be then done
synchronously.

Now, admittedly our async throttling limits are high enough that the above
kind of detail will probably never ever really matter (default 256 worker
threads etc). But it's an example of how practice is different from theory
- in _theory_ it doesn't make any difference if you wait for something
asynchronously, but in practice it could make a difference under some
circumstances.
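
The throttling effect can be shown with a small userspace sketch. It assumes a pool that simply counts outstanding entries and falls back to running work in the caller once a limit is reached. `struct async_pool`, `async_schedule_sketch()` and `MAX_PENDING` are hypothetical names for this illustration, and the limit is set absurdly low so the effect shows up after two entries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Tiny limit so the effect is visible; the real limits are far higher. */
#define MAX_PENDING 2

/* Hypothetical pool state; not the kernel's async implementation. */
struct async_pool {
	int pending;   /* entries accepted but not yet finished */
};

typedef void (*async_fn)(void *data);

/*
 * Returns true if the work was accepted for asynchronous execution,
 * false if the pool was full and the work ran in the caller instead.
 */
static bool async_schedule_sketch(struct async_pool *pool, async_fn fn, void *data)
{
	assert(pool != NULL && fn != NULL);

	if (pool->pending >= MAX_PENDING) {
		fn(data);          /* throttled: runs synchronously */
		return false;
	}
	pool->pending++;           /* a real pool would hand fn to a worker here */
	return true;
}

/* Called when an accepted entry finishes and frees its slot. */
static void async_complete_sketch(struct async_pool *pool)
{
	assert(pool != NULL && pool->pending > 0);
	pool->pending--;
}

/* Sample work item: counts how often it ran. */
static void count_call(void *data)
{
	++*(int *)data;
}
```

With two "no-op" entries still pending (say, bridges waiting on slow children), a third call with real work gets `false` back and runs in the caller, which is the kind of side effect the message warns about.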

Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

==============================================================================
TOPIC: sh: Fix test of unsigned in se7722_irq_demux()
http://groups.google.com/group/linux.kernel/t/d3980cfd65569980?hl=en
==============================================================================

== 1 of 1 ==
Date: Tues, Dec 15 2009 8:40 am
From: Roel Kluin

se7722_fpga_irq[] is unsigned so the test does not work.

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
---
Found using coccinelle: http://coccinelle.lip6.fr/

diff --git a/arch/sh/boards/mach-se/7722/irq.c b/arch/sh/boards/mach-se/7722/irq.c
index 4eb31ac..b221b68 100644
--- a/arch/sh/boards/mach-se/7722/irq.c
+++ b/arch/sh/boards/mach-se/7722/irq.c
@@ -57,15 +57,16 @@ static void se7722_irq_demux(unsigned int irq, struct irq_desc *desc)
*/
void __init init_se7722_IRQ(void)
{
- int i;
+ int i, irq;

ctrl_outw(0, IRQ01_MASK); /* disable all irqs */
ctrl_outw(0x2000, 0xb03fffec); /* mrshpc irq enable */

for (i = 0; i < SE7722_FPGA_IRQ_NR; i++) {
- se7722_fpga_irq[i] = create_irq();
- if (se7722_fpga_irq[i] < 0)
+ irq = create_irq();
+ if (irq < 0)
return;
+ se7722_fpga_irq[i] = irq;

set_irq_chip_and_handler_name(se7722_fpga_irq[i],
&se7722_irq_chip,
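
Why the old test could never fire can be seen in a standalone sketch. `fake_create_irq()` is a hypothetical stand-in for create_irq() that returns a negative value on failure; the two helpers contrast storing the result in an unsigned slot first with testing a signed temporary first, as the patch does.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for create_irq(): negative on failure. */
static int fake_create_irq(bool fail)
{
	return fail ? -1 : 42;
}

/*
 * What the old code effectively did: the result lands in an unsigned
 * slot first, so a failure of -1 becomes UINT_MAX and a "slot < 0"
 * test can never be true.
 */
static unsigned int store_then_test(bool fail)
{
	unsigned int slot = (unsigned int)fake_create_irq(fail);

	return slot;
}

/*
 * The pattern from the patch: test a signed temporary and store it
 * only on success. Returns true when the failure was detected.
 */
static bool test_then_store(bool fail, unsigned int *slot)
{
	int irq = fake_create_irq(fail);

	assert(slot != NULL);
	if (irq < 0)
		return true;
	*slot = (unsigned int)irq;
	return false;
}
```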

==============================================================================
TOPIC: [PATCH 2/2] iwlwifi: unify iwl_setup_rxon_timing
http://groups.google.com/group/linux.kernel/t/cf1f15614c9dac9c?hl=en
==============================================================================

== 1 of 4 ==
Date: Tues, Dec 15 2009 8:40 am
From: Marcel Holtmann

Hi Greg,

> > >> This patch unifies setup_rxon_timing functions
> > >> of AGN and 3945. HWs differ only in supported maximal
> > >> beacon interval. This is reflected in hw_params.max_beacon_itrvl
> > >>
> > >> Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
> > >> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
> > >> Signed-off-by: John W. Linville <linville@tuxdriver.com>
> > >> (cherry picked from commit 2c2f3b33888419fb9e7d015b9dc67b9db4437efa)
> > >>
> > >> Conflicts:
> > >>
> > >> drivers/net/wireless/iwlwifi/iwl-dev.h
> > >
> > > What does this mean?
> > >
> > After cherry-pick:
> >
> > --- a/drivers/net/wireless/iwlwifi/iwl-dev.h
> > +++ b/drivers/net/wireless/iwlwifi/iwl-dev.h
> > @@@ -625,7 -608,7 +625,11 @@@ struct iwl_hw_params
> > u8 max_stations;
> > u8 bcast_sta_id;
> > u8 fat_channel;
> > ++<<<<<<< HEAD:drivers/net/wireless/iwlwifi/iwl-dev.h
> > + u8 sw_crypto;
> > ++=======
> > + u8 max_beacon_itrvl; /* in 1024 ms */
> > ++>>>>>>> 2c2f3b3... iwlwifi: unify iwl_setup_rxon_timing:drivers/net/wireless/iwlwifi/iwl-dev.h
> > u32 max_inst_size;
> > u32 max_data_size;
> > u32 max_bsm_size;
> >
> > The sw_crypto is removed in the prior commit (90e8e4), and the commit
> > is not in the stable tree. We still need sw_crypto.
> >
> > So, the patch is modified to keep sw_crypto.
>
> And why would we care? We've never used this kind of marking before in
> the kernel changelogs that I know of.
>
> > >> CC: stable@kernel.org
> > >> Signed-off-by: Ike Panhc <ike.pan@canonical.com>
> > >>
> > >> BugLink: http://bugs.launchpad.net/bugs/496496
> > >
> > > What are you expecting this patch to be applied to?
> > >
> > > confused,
> > Please consider applying to linux-2.6.31.y
>
> I need the subsystem maintainer to agree with this, have they?

I agree here. Not copying Reinette on this is just wrong. And blindly
picking some patches and sending them for -stable even more.

Regards

Marcel

== 2 of 4 ==
Date: Tues, Dec 15 2009 8:50 am
From: "John W. Linville"

On Tue, Dec 15, 2009 at 05:49:43AM -0800, Greg KH wrote:
> On Tue, Dec 15, 2009 at 03:02:17PM +0800, Ike Panhc wrote:

> > Please consider applying to linux-2.6.31.y
>
> I need the subsystem maintainer to agree with this, have they?

It seems fine to me. You may want to let Intel comment too.

John
--
John W. Linville                Someday the world will need a hero, and you
linville@tuxdriver.com                  might be all we have.  Be ready.

== 3 of 4 ==
Date: Tues, Dec 15 2009 8:50 am
From: "John W. Linville"

On Tue, Dec 15, 2009 at 08:34:00AM -0800, Marcel Holtmann wrote:

> I agree here. Not copying Reinette on this is just wrong. And blindly
> picking some patches and sending them for -stable even more.

I agree that Reinette should have been copied. I'm not sure I agree
that sending the patches was wrong by itself.

Canonical gets kicked in the teeth for lack of upstream participation
all the time. I even have some of their enamel on my shoes...
But let's not be too hard on them for making a mistake while trying
to do something that is overall helpful to the community at large.

Just my $0.02...

== 4 of 4 ==
Date: Tues, Dec 15 2009 9:20 am
From: Greg KH

On Tue, Dec 15, 2009 at 11:39:24AM -0500, John W. Linville wrote:
> On Tue, Dec 15, 2009 at 08:34:00AM -0800, Marcel Holtmann wrote:
>
> > I agree here. Not copying Reinette on this is just wrong. And blindly
> > picking some patches and sending them for -stable even more.
>
> I agree that Reinette should have been copied. I'm not sure I agree
> that sending the patches was wrong by itself.

I'm not saying it is wrong, but I agree, you need to copy the
maintainer, as we would need their ack before being able to accept any
type of backport.

Ike, the script, scripts/get_maintainer.pl is your friend here.

thanks,

greg k-h

==============================================================================
TOPIC: THE-BIG-BIG LOTTO, UK ©2009
http://groups.google.com/group/linux.kernel/t/58ca258f5eee64aa?hl=en
==============================================================================

== 1 of 1 ==
Date: Tues, Dec 15 2009 8:40 am
From: "winners@thebigbiglotto.co.uk"

--
Your e-mail address attached to Winning ticket number 00002765649541,Serial
number BIG-3673050706-07 and lucky numbers (46)0023/4440/20/89 won
£500,000.00 GBP. Contact Mr Benham Cole for info. Email:
2009agent.benhamcole@gmail.com

==============================================================================
TOPIC: 2.6.32.1: BUG and panic: unable to handle kernel NULL pointer
dereference at 000000000000001f
http://groups.google.com/group/linux.kernel/t/ad244ee9c25bcdf4?hl=en
==============================================================================

== 1 of 1 ==
Date: Tues, Dec 15 2009 8:40 am
From: Peter Palfrader

Hi,

we tried to upgrade a couple of our proliant servers from 2.6.31.6 to
2.6.32.1.

On two of our DL385g1 servers we had problems booting 2.6.32.1, as they
panicked.

One of them eventually booted correctly when it was decided to log its
serial console output; that strategy proved unsuccessful with the second
box.

[ 5.304749] BUG: unable to handle kernel NULL pointer dereference at 000000000000001f
..
[ 5.308739] Call Trace:
[ 5.308739] [<ffffffff810c3840>] kstrdup+0x40/0x70
[ 5.308739] [<ffffffff81150d77>] sysfs_new_dirent+0xf7/0x110
[ 5.308739] [<ffffffff8115121d>] create_dir+0x3d/0xc0
[ 5.308739] [<ffffffff81090af1>] ? autoremove_wake_function+0x11/0x40
[ 5.308739] [<ffffffff811512d4>] sysfs_create_dir+0x34/0x50
[ 5.308739] [<ffffffff8138e7ea>] ? kobject_get+0x1a/0x30
[ 5.308739] [<ffffffff8138e961>] kobject_add_internal+0xe1/0x1e0
[ 5.308739] [<ffffffff8138eb78>] kobject_add_varg+0x38/0x60
[ 5.308739] [<ffffffff8138ec15>] kobject_init_and_add+0x75/0x90
[ 5.308739] [<ffffffff81150560>] ? sysfs_ilookup_test+0x0/0x20
[ 5.308739] [<ffffffff8115082d>] ? sysfs_find_dirent+0x2d/0x40
[ 5.308739] [<ffffffff81150ec1>] ? sysfs_addrm_finish+0x21/0x250
[ 5.308739] [<ffffffff8138e7ea>] ? kobject_get+0x1a/0x30
[ 5.308739] [<ffffffff810e6fe4>] ? kmem_cache_alloc+0x84/0xc0
[ 5.308739] [<ffffffff814238d4>] bus_add_driver+0x94/0x260
[ 5.308739] [<ffffffff81424ed9>] driver_register+0x79/0x160
[ 5.308739] [<ffffffff815a28a3>] __hid_register_driver+0x43/0x80
[ 5.308739] [<ffffffff81a3d7ff>] ? gyration_init+0x0/0x1b
[ 5.308739] [<ffffffff81a3d818>] gyration_init+0x19/0x1b
[ 5.308739] [<ffffffff81009048>] do_one_initcall+0x38/0x1a0
[ 5.308739] [<ffffffff81a0e6b5>] kernel_init+0x172/0x1ca
[ 5.308739] [<ffffffff81036a0a>] child_rip+0xa/0x20
[ 5.308739] [<ffffffff81a0e543>] ? kernel_init+0x0/0x1ca
[ 5.308739] [<ffffffff81036a00>] ? child_rip+0x0/0x20

is from the machine that reliably fails to boot.
http://asteria.noreply.org/~weasel/volatile/2009-12-15-1VAB84BxJzE/ravel
hosts the complete serial console output.

What I caught on the second box, which eventually did boot, is
similar, but not identical:
[ 19.028333] Call Trace:
[ 19.028333] [<ffffffff81150560>] ? sysfs_ilookup_test+0x0/0x20
[ 19.028333] [<ffffffff810c3840>] kstrdup+0x40/0x70
[ 19.028333] [<ffffffff81150d77>] sysfs_new_dirent+0xf7/0x110
[ 19.028333] [<ffffffff81150b17>] ? sysfs_add_one+0x27/0xd0
[ 19.028333] [<ffffffff81151bf7>] sysfs_do_create_link+0x87/0x160
[ 19.028333] [<ffffffff81151cee>] sysfs_create_link+0xe/0x10
[ 19.028333] [<ffffffff81422072>] device_add+0x272/0x730
[ 19.028333] [<ffffffff8139779e>] ? kvasprintf+0x6e/0x90
[ 19.028333] [<ffffffff81422549>] device_register+0x19/0x20
[ 19.028333] [<ffffffff8142262c>] device_create_vargs+0xdc/0xf0
[ 19.028333] [<ffffffff8142268b>] device_create+0x4b/0x50
[ 19.028333] [<ffffffff813e9702>] ? extract_entropy+0xe2/0x140
[ 19.028333] [<ffffffff813f573f>] misc_register+0xbf/0x180
[ 19.028333] [<ffffffff8107a4e0>] ? init_oops_id+0x0/0x40
[ 19.028333] [<ffffffff81a2626b>] ? pm_qos_power_init+0x0/0xe1
[ 19.028333] [<ffffffff81a262a3>] pm_qos_power_init+0x38/0xe1
[ 19.028333] [<ffffffff81009048>] do_one_initcall+0x38/0x1a0
[ 19.028333] [<ffffffff81a0e6b5>] kernel_init+0x172/0x1ca
[ 19.028333] [<ffffffff81036a0a>] child_rip+0xa/0x20
[ 19.028333] [<ffffffff81a0e543>] ? kernel_init+0x0/0x1ca
[ 19.028333] [<ffffffff81036a00>] ? child_rip+0x0/0x20

http://asteria.noreply.org/~weasel/volatile/2009-12-15-1VAB84BxJzE/klecker-bad

http://asteria.noreply.org/~weasel/volatile/2009-12-15-1VAB84BxJzE/klecker-good
for the output during a successful boot.

The config file can be found at
http://asteria.noreply.org/~weasel/volatile/2009-12-15-1VAB84BxJzE/config-2.6.32.1-dsa-amd64

Cheers,
Peter
--
                           |  .''`.  ** Debian GNU/Linux **
      Peter Palfrader      | : :' :      The  universal
 http://www.palfrader.org/ | `. `'      Operating System
                           |   `-    http://www.debian.org/

==============================================================================
TOPIC: [input] add mc13783 touchscreen driver
http://groups.google.com/group/linux.kernel/t/2ba86257c3bb866d?hl=en
==============================================================================

== 1 of 1 ==
Date: Tues, Dec 15 2009 8:50 am
From: Dmitry Torokhov

Hi Uwe,

On Tue, Dec 15, 2009 at 11:10:28AM +0100, Uwe Kleine-König wrote:
> Hi Dmitry,
>
> I squashed your changes into this patch, restored my indentation style and
> simplified error handling in mc13783_ts_probe to assign ret = -ENOMEM
> once at the start of the function instead of each error branch.

I prefer to have the error set right before we jump because it lets the
reader see an explicitly set error condition instead of having to
verify that earlier code set it properly; it also forces you to set the
error on every error branch (if you forget while adding a new one, the
compiler will warn you about an uninitialized variable), but I won't
insist.
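
The style preferred here can be sketched as follows. This is not the mc13783 driver code: `probe_sketch()` and its `fail_step` parameter are hypothetical, and malloc() stands in for the driver's allocations. The point is that `ret` has no initial value, so every error branch has to assign it before jumping.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical probe: "fail_step" selects which step fails (0 = none). */
static int probe_sketch(int fail_step)
{
	int ret;              /* deliberately not initialized at the top */
	void *priv, *idev;

	assert(fail_step >= 0 && fail_step <= 3);

	priv = (fail_step == 1) ? NULL : malloc(16);
	if (!priv) {
		ret = -ENOMEM;    /* error set right where we bail out */
		goto err_out;
	}

	idev = (fail_step == 2) ? NULL : malloc(16);
	if (!idev) {
		ret = -ENOMEM;
		goto err_free_priv;
	}

	if (fail_step == 3) {
		ret = -EBUSY;     /* a later branch with a different error code */
		goto err_free_idev;
	}

	free(idev);
	free(priv);
	return 0;

err_free_idev:
	free(idev);
err_free_priv:
	free(priv);
err_out:
	return ret;
}
```

With `ret = -ENOMEM` assigned once at the top instead, the third branch would still compile if its assignment were forgotten, and it would silently return the wrong code.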

>
> In the meantime the changes to mc13783-core are merged in Linus' tree,
> so it can go via yours.
>

It's settled then.

--
Dmitry

==============================================================================
TOPIC: Possible data loss on ext[34], reiserfs with external journal
http://groups.google.com/group/linux.kernel/t/326aed9339cfdbd9?hl=en
==============================================================================

== 1 of 1 ==
Date: Tues, Dec 15 2009 8:50 am
From: tytso@mit.edu

On Tue, Dec 15, 2009 at 01:19:57AM -0500, Oleg Drokin wrote:
> > + /*
> > + * If the journal is not located on the file system device,
> > + * then we must flush the file system device before we issue
> > + * the commit record
> > + */
> > + if (commit_transaction->t_flushed_data_blocks &&
> > + (journal->j_fs_dev != journal->j_dev) &&
> > + (journal->j_flags & JBD2_BARRIER))
> > + blkdev_issue_flush(journal->j_fs_dev, NULL);
> > +
>
> I am afraid this is not enough. This code is called after journal
> was flushed for async commit case, so it leaves a race window where
> journal transaction is already on disk and complete, but the data is
> still in cache somewhere.

No, that's actually fine. In the ASYNC_COMMIT case, the commit won't
be valid until the checksum is correct, and we won't have written any
descriptor blocks yet at this point. So there is no race because
during that window, the commit is written but we won't write any
descriptor blocks until after the barrier returns.

> Also the callsite has this comment which is misleading, I think:
> /*
> * This is the right place to wait for data buffers both for ASYNC
> * and !ASYNC commit. If commit is ASYNC, we need to wait only after
> * the commit block went to disk (which happens above). If commit is
> * SYNC, we need to wait for data buffers before we start writing
> * commit block, which happens below in such setting.
> */

Yeah, that comment is confusing and not entirely accurate. I thought
about cleaning it up, and then decided to do that in a separate patch.

- Ted

==============================================================================
TOPIC: Are these MTRR settings correct?
http://groups.google.com/group/linux.kernel/t/ee98ace36c359c1f?hl=en
==============================================================================

== 1 of 1 ==
Date: Tues, Dec 15 2009 9:00 am
From: Bjorn Helgaas

On Monday 14 December 2009 06:42:11 pm Yinghai Lu wrote:
> Robert Hancock wrote:
> > Something else isn't quite right. It looks like MMCONFIG area should be
> > reserved:
> >
> > [ 0.308434] system 00:0c: iomem range 0xe0000000-0xefffffff has been
> > reserved
> >
> > but the code didn't seem to detect that. In fact there doesn't seem to
> > be any output about whether it was or wasn't reserved, which from the
> > code it seems there should be.
> >
> > Maybe because of that ACPI method execution error?
>
> could it be some pnpacpi brokenness?

Robert, I assume you're referring to this from Tvrtko's post
(http://lkml.org/lkml/2009/12/13/90):

[ 0.000000] BIOS-e820: 00000000dffd0000 - 00000000e0000000 (reserved)
[ 0.000000] BIOS-e820: 00000000ff700000 - 0000000100000000 (reserved)
...
[ 0.250088] PCI: Found AMD Family 10h NB with MMCONFIG support.
[ 0.250091] PCI: MCFG configuration 0: base e0000000 segment 0 buses 0 - 255
[ 0.250092] PCI: Not using MMCONFIG.
...
[ 0.253491] ACPI Error (psargs-0359): [ECEN] Namespace lookup failure, AE_NOT_FOUND
[ 0.253495] ACPI Error (psparse-0537): Method parse/execution failed [\] (Node ffffffff81656ab0), AE_NOT_FOUND
...
[ 0.308434] system 00:0c: iomem range 0xe0000000-0xefffffff has been reserved

I think we're rejecting MMCONFIG in the early call to
pci_mmcfg_reject_broken(), when we check only E820 resources, not
ACPI resources. And indeed, the 0xe0000000-0xefffffff range is
not mentioned in E820. Which output did you expect to see?
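
The E820-only check can be reproduced with the two reserved entries from the log above. The sketch below is a simplified, hypothetical coverage test (`struct e820_range` and `range_is_reserved()` are names made up here, not the kernel's E820 code). It treats entry ends as exclusive, as the boot log prints them, and assumes the map is sorted by start address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplified E820 entry; "end" is exclusive. */
struct e820_range {
	uint64_t start;
	uint64_t end;
	bool reserved;
};

/* The two reserved entries quoted from the boot log. */
static const struct e820_range e820_map[] = {
	{ 0x00000000dffd0000ULL, 0x00000000e0000000ULL, true },
	{ 0x00000000ff700000ULL, 0x0000000100000000ULL, true },
};

/*
 * Returns true if [start, end) is fully covered by reserved entries.
 * Assumes the map is sorted by start address.
 */
static bool range_is_reserved(const struct e820_range *map, size_t n,
			      uint64_t start, uint64_t end)
{
	assert(map != NULL && start < end);

	for (size_t i = 0; i < n; i++) {
		if (!map[i].reserved)
			continue;
		if (map[i].end <= start || map[i].start >= end)
			continue;       /* no overlap with the remaining range */
		if (map[i].start > start)
			return false;   /* gap before this entry */
		start = map[i].end;     /* covered up to here */
		if (start >= end)
			return true;
	}
	return false;
}
```

Called with the MMCONFIG window 0xe0000000 to 0xf0000000 (exclusive), the check fails: the first entry ends exactly where the window begins and the second starts well above it.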

I am uncomfortable with this early/late checking and looking at both
E820 and ACPI. It just feels hacky and error-prone. I'm not happy about
adding Yinghai's special-case "if we found AMD Fam10h, don't check for
reservations" patch either.

I assume that Windows runs on this box without requiring per-machine
hacks in the kernel. Linux should be able to do the same, and the fact
that we can't is telling us we're doing something wrong. We should fix
whatever's wrong rather than papering over it.

Bjorn

==============================================================================
TOPIC: [PATCH 2/5] mm : avoid false sharing on mm_counter
http://groups.google.com/group/linux.kernel/t/1d905e0d3fc3ccff?hl=en
==============================================================================

== 1 of 1 ==
Date: Tues, Dec 15 2009 9:00 am
From: "KAMEZAWA Hiroyuki"

Christoph Lameter wrote:
> On Tue, 15 Dec 2009, KAMEZAWA Hiroyuki wrote:
>
>> #if USE_SPLIT_PTLOCKS
>> +#define SPLIT_RSS_COUNTING
>> struct mm_rss_stat {
>> atomic_long_t count[NR_MM_COUNTERS];
>> };
>> +/* per-thread cached information, */
>> +struct task_rss_stat {
>> + int events; /* for synchronization threshold */
>
> Why count events? Just always increment the task counters and fold them
> at appropriate points into mm_struct.

I used an event counter because I think this patch is the _easy_ version of
everything I've written since November. I'd like to start from a simple one
rather than from code that is invasive and can cause complicated discussion.

This event counter is very simple and everything we do can be kept under /mm.
To be honest, I'd like to move the synchronization point to tick or
schedule(), but for now, I'd like to start from this.
The point of this patch is "splitting" the mm_counter counting and removing
false sharing. The problem of counter synchronization can be
discussed later.

As you know, I have an extreme version using percpu etc., but it's not
too late to think about the best counter after removing the false sharing
of mmap_sem. When measuring page-fault speed with more than 4 threads,
most of the time is spent on false sharing of mmap_sem, and this counter's
scalability is not a problem. (So my test program just uses 2 threads.)

Considering the trade-off, I'd like to start from an "implement everything
under /mm" implementation. We can revisit and modify this after the mmap_sem
problem is fixed.

If you recommend dropping this and just posting 1, 3, 4 and 5, I'll do so.

> Or get rid of the mm_struct counters and only sum them up on the fly if
> needed?
>
Getting rid of mm_struct's counters is impossible (for now) because of
get_user_pages(), kswapd, vmscan etc.

Then, we have 3 choices.
1. leave atomic counter on mm_struct
2. add pointer to some thread's counter in mm_struct.
3. use percpu counter on mm_struct.

With 2, we'd have to take care of the atomicity of updating the per-thread
counter, so I did not choose it. With 3, using a percpu counter, as you did,
seems attractive. But there is a scalability problem on the read side, and
we'd need some synchronization point to avoid a slowdown on the read side
even when using a percpu counter.

Considering memory footprint, the benefit of a per-thread counter is
that we can put it near the cache line of task->mm
and we don't have to worry about an extra cache miss
(if the counter is small enough).

> Add a pointer to thread rss_stat structure to mm_struct and remove the
> counters? If the task has only one thread then the pointer points to the
> accurate data (most frequent case). Otherwise it can be NULL and then we
> calculate it on the fly?
>
get_user_pages(), vmscan, kvm etc. will touch other processes' page tables.

>> +static void add_mm_counter_fast(struct mm_struct *mm, int member, int val)
>> +{
>> + struct task_struct *task = current;
>> +
>> + if (likely(task->mm == mm))
>> + task->rss_stat.count[member] += val;
>> + else
>> + add_mm_counter(mm, member, val);
>> +}
>> +#define inc_mm_counter_fast(mm, member) add_mm_counter_fast(mm, member,1)
>> +#define dec_mm_counter_fast(mm, member) add_mm_counter_fast(mm, member,-1)
>> +
>
> Code will be much simpler if you always increment the task counts.
>
Yes, I know; I tried but failed. Maybe a bigger patch will be required.

The results this patch shows are not bad, even if there is room for further improvement.
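
The event-counter idea discussed in this message can be sketched in userspace C11. This is only an illustration under stated assumptions: `SYNC_THRESHOLD`, `sync_task_rss()` and `add_counter_fast()` are hypothetical names, the threshold value is invented, and the `task->mm == mm` check from the quoted patch is left out for brevity. Each thread updates a private, non-atomic cache and folds it into the shared atomic counters only every few events.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define NR_MM_COUNTERS 2
#define SYNC_THRESHOLD 4   /* hypothetical; the thread does not give a value */

struct mm_rss_stat {
	atomic_long count[NR_MM_COUNTERS];   /* shared, updated atomically */
};

struct task_rss_stat {
	int events;                   /* updates since the last fold */
	int count[NR_MM_COUNTERS];    /* private to one thread, no atomics */
};

/* Fold the thread's cached deltas into the shared counters. */
static void sync_task_rss(struct mm_rss_stat *mm, struct task_rss_stat *task)
{
	for (int i = 0; i < NR_MM_COUNTERS; i++) {
		if (task->count[i]) {
			atomic_fetch_add(&mm->count[i], task->count[i]);
			task->count[i] = 0;
		}
	}
	task->events = 0;
}

/* Fast path: touch only thread-local memory, fold every SYNC_THRESHOLD events. */
static void add_counter_fast(struct mm_rss_stat *mm, struct task_rss_stat *task,
			     int member, int val)
{
	assert(mm != NULL && task != NULL);
	assert(member >= 0 && member < NR_MM_COUNTERS);

	task->count[member] += val;
	if (++task->events >= SYNC_THRESHOLD)
		sync_task_rss(mm, task);
}
```

Between folds the shared counter lags behind by at most the cached deltas, which is the accuracy trade-off that moving the synchronization point to tick or schedule() would change.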

Thanks,
-Kame

==============================================================================
TOPIC: x86: UV - XPC fixes with related support functionality V2.
http://groups.google.com/group/linux.kernel/t/fd19743e8bee3c06?hl=en
==============================================================================

== 1 of 2 ==
Date: Tues, Dec 15 2009 9:10 am
From: Robin Holt

Andrew, did this get lost in the shuffle? If you need me to
resubmit, please let me know.

Thanks,
Robin

On Mon, Nov 23, 2009 at 07:39:37PM -0600, Robin Holt wrote:
>
> The UV BIOS has been updated to implement some of our interface
> functionality differently than originally expected. These patches update
> the kernel to the bios implementation and include a few minor bug fixes
> which prevent us from doing significant testing on real hardware.
>
> Changes from V1:
>
> - Actually include the patch introducing the gru_read_gpa. This
> was missed in the V1 submission.
>
> - One additional BIOS change has the OS no longer passing blade
> to the BIOS when registering a message queue watchlist.
>
> ---
>
> arch/x86/include/asm/uv/bios.h | 11 --------
> arch/x86/include/asm/uv/uv_hub.h | 20 +++++++++++++++
> arch/x86/kernel/bios_uv.c | 8 +-----
> drivers/misc/sgi-gru/gru_instructions.h | 13 ++++++++++
> drivers/misc/sgi-gru/grukservices.c | 23 +++++++++++++++++
> drivers/misc/sgi-gru/grukservices.h | 14 ++++++++++
> drivers/misc/sgi-gru/gruprocfs.c | 1
> drivers/misc/sgi-gru/grutables.h | 1
> drivers/misc/sgi-xp/xp.h | 1
> drivers/misc/sgi-xp/xp_main.c | 3 ++
> drivers/misc/sgi-xp/xp_sn2.c | 10 +++++++
> drivers/misc/sgi-xp/xp_uv.c | 33 +++++++++++++++++++++++++
> drivers/misc/sgi-xp/xpc_partition.c | 13 +++++++---
> drivers/misc/sgi-xp/xpc_uv.c | 41 +++++++++++++++++---------------
> 14 files changed, 153 insertions(+), 39 deletions(-)

== 2 of 2 ==
Date: Tues, Dec 15 2009 9:10 am
From: Robin Holt

Argh. Forget it. I see them now. I am sorry for the noise.

Robin

On Tue, Dec 15, 2009 at 11:04:39AM -0600, Robin Holt wrote:
> Andrew, did this get lost in the shuffle? If you need me to
> resubmit, please let me know.
>
> Thanks,
> Robin
>
> On Mon, Nov 23, 2009 at 07:39:37PM -0600, Robin Holt wrote:
> >
> > The UV BIOS has been updated to implement some of our interface
> > functionality differently than originally expected. These patches update
> > the kernel to the bios implementation and include a few minor bug fixes
> > which prevent us from doing significant testing on real hardware.
> >
> > Changes from V1:
> >
> > - Actually include the patch introducing the gru_read_gpa. This
> > was missed in the V1 submission.
> >
> > - One additional BIOS change has the OS no longer passing blade
> > to the BIOS when registering a message queue watchlist.
> >
> > ---
> >
> > arch/x86/include/asm/uv/bios.h | 11 --------
> > arch/x86/include/asm/uv/uv_hub.h | 20 +++++++++++++++
> > arch/x86/kernel/bios_uv.c | 8 +-----
> > drivers/misc/sgi-gru/gru_instructions.h | 13 ++++++++++
> > drivers/misc/sgi-gru/grukservices.c | 23 +++++++++++++++++
> > drivers/misc/sgi-gru/grukservices.h | 14 ++++++++++
> > drivers/misc/sgi-gru/gruprocfs.c | 1
> > drivers/misc/sgi-gru/grutables.h | 1
> > drivers/misc/sgi-xp/xp.h | 1
> > drivers/misc/sgi-xp/xp_main.c | 3 ++
> > drivers/misc/sgi-xp/xp_sn2.c | 10 +++++++
> > drivers/misc/sgi-xp/xp_uv.c | 33 +++++++++++++++++++++++++
> > drivers/misc/sgi-xp/xpc_partition.c | 13 +++++++---
> > drivers/misc/sgi-xp/xpc_uv.c | 41 +++++++++++++++++---------------
> > 14 files changed, 153 insertions(+), 39 deletions(-)

==============================================================================
TOPIC: Per cpu atomics in core allocators and cleanup
http://groups.google.com/group/linux.kernel/t/92c1d57d42e5ddf3?hl=en
==============================================================================

== 1 of 1 ==
Date: Tues, Dec 15 2009 9:10 am
From: Mel Gorman

On Mon, Dec 14, 2009 at 04:03:20PM -0600, Christoph Lameter wrote:
> Leftovers from the earlier patchset. Mostly applications of per cpu counters
> to core components.
>
> After this patchset there will be only one user of local_t left: Mathieu's
> trace ringbuffer. Does it really need these ops?
>

What kernel are these patches based on? They do not cleanly apply and
when fixed up, they do not build against 2.6.32.

--
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

==============================================================================
TOPIC: Input: Fix test of unsigned in altera_ps2_probe()
http://groups.google.com/group/linux.kernel/t/4936bafefa349e70?hl=en
==============================================================================

== 1 of 1 ==
Date: Tues, Dec 15 2009 9:10 am
From: Roel Kluin

ps2if->irq is unsigned so the test does not work.

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
---
Found using coccinelle: http://coccinelle.lip6.fr/

diff --git a/drivers/input/serio/altera_ps2.c b/drivers/input/serio/altera_ps2.c
index f479ea5..3aa8f2e 100644
--- a/drivers/input/serio/altera_ps2.c
+++ b/drivers/input/serio/altera_ps2.c
@@ -83,7 +83,7 @@ static int altera_ps2_probe(struct platform_device *pdev)
 {
 	struct ps2if *ps2if;
 	struct serio *serio;
-	int error;
+	int error, irq;

 	ps2if = kzalloc(sizeof(struct ps2if), GFP_KERNEL);
 	serio = kzalloc(sizeof(struct serio), GFP_KERNEL);
@@ -108,11 +108,13 @@ static int altera_ps2_probe(struct platform_device *pdev)
 		goto err_free_mem;
 	}

-	ps2if->irq = platform_get_irq(pdev, 0);
-	if (ps2if->irq < 0) {
+
+	irq = platform_get_irq(pdev, 0);
+	if (irq < 0) {
 		error = -ENXIO;
 		goto err_free_mem;
 	}
+	ps2if->irq = irq;

 	if (!request_mem_region(ps2if->iomem_res->start,
 	    resource_size(ps2if->iomem_res), pdev->name)) {
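
For illustration, a minimal userspace sketch of why the original test
could not work: a negative return value stored in an unsigned field wraps
to a large positive number, so a later "< 0" comparison never succeeds.
Everything named demo_* or probe_* below is a hypothetical stand-in and
not the driver's code, and the field is assumed to be unsigned int; only
the ordering of test and assignment mirrors the patch.

```c
#include <errno.h>

/* Hypothetical stand-in for struct ps2if; the irq field is assumed unsigned. */
struct demo_ps2if {
	unsigned int irq;
};

/* Hypothetical stand-in for platform_get_irq(): returns an IRQ number
 * or a negative errno value, whichever the caller asks it to simulate. */
static int demo_get_irq(int simulated)
{
	return simulated;
}

/* Pre-patch pattern: the result lands in the unsigned field first,
 * so the "< 0" comparison is always false and errors slip through. */
static int probe_old(struct demo_ps2if *p, int simulated)
{
	p->irq = demo_get_irq(simulated);
	if (p->irq < 0)		/* never true for an unsigned type */
		return -ENXIO;
	return 0;
}

/* Post-patch pattern: test a signed local, assign only on success. */
static int probe_new(struct demo_ps2if *p, int simulated)
{
	int irq = demo_get_irq(simulated);

	if (irq < 0)
		return -ENXIO;
	p->irq = irq;
	return 0;
}
```

With a simulated error such as -ENXIO, probe_old() returns 0 and the
failure is missed, while probe_new() returns -ENXIO and leaves the irq
field untouched. GCC can flag the always-false comparison when built
with -Wtype-limits.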

==============================================================================
TOPIC: On cgroup, cpuset and rcu and a patch.
http://groups.google.com/group/linux.kernel/t/42c0c82ab68ecf3f?hl=en
==============================================================================

== 1 of 1 ==
Date: Tues, Dec 15 2009 9:20 am
From: Peter Zijlstra

Hi Paul,

I'd like to use cpuset_cpus_allowed_locked() from within a scheduler
path (to break affinity in some rare cases); this, however, precludes
holding any mutex.

From what I can see, it's perfectly safe to use guarantee_online_cpus()
without holding any mutex, or even task_lock(), as long as I hold the
RCU read-lock, since cgroups are RCU-freed and task_cs() includes the
appropriate rcu_dereference(), and cgroup_attach_task() includes the
matching rcu_assign_pointer() call.

Or is there still any merit to the cpuset comments and am I missing some
detail -- which then ought to get fixed?

Also, while looking over the cgroup code, I found the below...

---
Subject: cgroup: fix RCU assumptions

The code relies on RCU read-lock to iterate the task list, but fails to
actually take it.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
kernel/cgroup.c | 2 ++
1 file changed, 2 insertions(+)

Index: linux-2.6/kernel/cgroup.c
===================================================================
--- linux-2.6.orig/kernel/cgroup.c
+++ linux-2.6/kernel/cgroup.c
@@ -2127,6 +2127,7 @@ static void cgroup_enable_task_cg_lists(
 	struct task_struct *p, *g;
 	write_lock(&css_set_lock);
 	use_task_css_set_links = 1;
+	rcu_read_lock();
 	do_each_thread(g, p) {
 		task_lock(p);
 		/*
@@ -2138,6 +2139,7 @@ static void cgroup_enable_task_cg_lists(
 			list_add(&p->cg_list, &p->cgroups->tasks);
 		task_unlock(p);
 	} while_each_thread(g, p);
+	rcu_read_unlock();
 	write_unlock(&css_set_lock);
 }
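
The pairing of rcu_assign_pointer() on the update side with
rcu_dereference() on the read side, which the message above relies on,
can be sketched in userspace with C11 atomics. This is only an analogue
under stated assumptions: release/acquire ordering stands in for the
kernel primitives, all demo_* names are hypothetical, and the sketch
does not model rcu_read_lock(), grace periods, or RCU-deferred freeing.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a cpuset; not the kernel's struct cpuset. */
struct demo_cpuset {
	unsigned long cpus_allowed;
};

/* Hypothetical stand-in for a task's pointer to its current cpuset. */
static _Atomic(struct demo_cpuset *) demo_task_cs;

/* Update side, analogous to rcu_assign_pointer(): fully initialise the
 * object, then publish the pointer with release ordering so a reader
 * that sees the pointer also sees the initialised contents. */
static void demo_attach(struct demo_cpuset *cs, unsigned long mask)
{
	cs->cpus_allowed = mask;
	atomic_store_explicit(&demo_task_cs, cs, memory_order_release);
}

/* Read side, analogous to rcu_dereference(): load the pointer with
 * acquire ordering before following it. Returns 0 if nothing has been
 * published yet. */
static unsigned long demo_cpus_allowed(void)
{
	struct demo_cpuset *cs =
		atomic_load_explicit(&demo_task_cs, memory_order_acquire);

	return cs ? cs->cpus_allowed : 0UL;
}
```

A reader calling demo_cpus_allowed() takes no lock; it sees either the
previously published object or the new one, never a half-initialised
one. In the kernel, the surrounding rcu_read_lock()/rcu_read_unlock()
pair additionally keeps the object from being freed while the reader is
still using it, which is what the patch above adds around the task-list
walk.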

==============================================================================

You received this message because you are subscribed to the Google Groups "linux.kernel"
group.

To post to this group, visit http://groups.google.com/group/linux.kernel?hl=en

To unsubscribe from this group, send email to linux.kernel+unsubscribe@googlegroups.com

To change the way you get mail from this group, visit:
http://groups.google.com/group/linux.kernel/subscribe?hl=en

To report abuse, send email explaining the problem to abuse@googlegroups.com

==============================================================================
Google Groups: http://groups.google.com/?hl=en

twitter

Tuesday, December 15, 2009

linux.kernel - 26 new messages in 18 topics - digest
