linux.kernel - 26 new messages in 24 topics - digest
linux.kernel
http://groups.google.com/group/linux.kernel?hl=en
Today's topics:
* cpuset,mm: use rwlock to protect task->mempolicy and mems_allowed - 1
messages, 1 author
http://groups.google.com/group/linux.kernel/t/a4153e82e4e3d905?hl=en
* [PATCH 3/3] memcg: oom kill disable and stop and go hooks. - 2 messages, 1
author
http://groups.google.com/group/linux.kernel/t/bbf57513f577c468?hl=en
* fs: fat: use hex_asc_lo/hex_asc_hi instead of custom one - 2 messages, 2
authors
http://groups.google.com/group/linux.kernel/t/92eb1fedc85b2dd2?hl=en
* kbuild: Do not unnecessarily regenerate modules.builtin - 1 messages, 1
author
http://groups.google.com/group/linux.kernel/t/a2bfa67b9e0dbb06?hl=en
* Patch for tracing c states (power_end) on x86 - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/91e464354797bdad?hl=en
* linux-next: build failure after merge of the pcmcia tree - 1 messages, 1
author
http://groups.google.com/group/linux.kernel/t/550261f3edfeadd9?hl=en
* memcg: disable irq at page cgroup lock (Re: [PATCH -mmotm 3/4] memcg: dirty
pages accounting and limiting infrastructure) - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/eb7661b915f1ef96?hl=en
* cpuset,mm: update task's mems_allowed lazily - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/8cf36eef7f96d9f5?hl=en
* davinci: MMC: Pass number of SG segments as platform data - 1 messages, 1
author
http://groups.google.com/group/linux.kernel/t/432793a9cfd7c2de?hl=en
* x86/kvm: Show guest system/user cputime in cpustat - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/f85ba99e07cdffa5?hl=en
* intel-agp.c: Fix crash when accessing nonexistent GTT entries in i915 - 1
messages, 1 author
http://groups.google.com/group/linux.kernel/t/bcbed4037b6559cf?hl=en
* perf_events: fix X86 bogus counts when multiplexing - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/82db6d0109ff6b09?hl=en
* firmware loader: use statically initialized data attribute - 1 messages, 1
author
http://groups.google.com/group/linux.kernel/t/43f4f33dc707bb67?hl=en
* perf: export perf_trace_regs and perf_arch_fetch_caller_regs - 1 messages, 1
author
http://groups.google.com/group/linux.kernel/t/cd42e7124803ea93?hl=en
* perf_events: improve task_sched_in() - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/b56952ee7476b66d?hl=en
* tracing: Update comments - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/5b7ef9182ff8a8fe?hl=en
* x86: remove rdc321x_defs.h - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/7a5f3f3567fe68c3?hl=en
* RDC321x southbridge and GPIO support - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/6edde9df3d5ce190?hl=en
* WATCHDOG: convert rdc321x_wdt to use southbridge accessors - 1 messages, 1
author
http://groups.google.com/group/linux.kernel/t/c971edde03a72e5f?hl=en
* drm/i915: Convert some trace events to DEFINE_TRACE - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/b94eef8a017f8e1a?hl=en
* [PATCH] IRQ: Fix oneshot irq race between irq_finalize_oneshot and
handle_level_irq - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/06c69412ea327027?hl=en
* tracing: Convert some signal events to DEFINE_TRACE - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/ae1162a6d10e9683?hl=en
* GPIO: add support for RDC321x GPIO controller - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/713daf2b3db2e4b6?hl=en
* MFD: add support for the RDC321x southbridge - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/ccafbaf6c47386c3?hl=en
==============================================================================
TOPIC: cpuset,mm: use rwlock to protect task->mempolicy and mems_allowed
http://groups.google.com/group/linux.kernel/t/a4153e82e4e3d905?hl=en
==============================================================================
== 1 of 1 ==
Date: Thurs, Mar 11 2010 12:00 am
From: Miao Xie
on 2010-3-11 13:30, Nick Piggin wrote:
>>>> The problem is the following:
>>>> The size of nodemask_t is greater than the size of a long integer, so loads
>>>> and stores of nodemask_t are not atomic operations. Suppose task->mems_allowed
>>>> doesn't intersect with new_mask: for example, the first word of the mask is
>>>> empty and only the first word of new_mask is non-empty. If the allocator
>>>> loads one word of the mask before
>>>>
>>>> current->mems_allowed |= new_mask;
>>>>
>>>> and then loads another word of the mask after
>>>>
>>>> current->mems_allowed = new_mask;
>>>>
>>>> the allocator gets an empty nodemask.
>>>
>>> Couldn't that be solved by having the reader read the nodemask twice
>>> and compare them? In the normal case there's no race, so the second
>>> read is straight from L1 cache and is very cheap. In the unlikely case
>>> of a race, the reader would keep trying until it got two consistent
>>> values in a row.
>>
>> I think this method can't fix the problem because we can't guarantee that the second
>> read happens after the update of the mask completes.
>
> Any problem with using a seqlock?
>
> The other thing you could do is store a pointer to the nodemask, and
> allocate a new nodemask when changing it, issue a smp_wmb(), and then
> store the new pointer. Read side only needs a smp_read_barrier_depends().
Compared with my second version of the patch, I think both of these methods would
perform worse and require more code changes.
Thanks
Miao
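Nick's seqlock suggestion can be sketched in user-space C11 to make the trade-off concrete. The first function replays, on a plain unprotected two-word mask, the interleaving Miao describes (one word loaded before `mems_allowed |= new_mask`, the other after `mems_allowed = new_mask`). The rest is a generic seqlock with Boehm-style fences. This is not the kernel's seqlock API: `MASK_WORDS`, `struct seq_mask` and the function names are hypothetical, and writers are assumed to be serialized by some outer lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical two-word mask; the kernel's nodemask_t is an array of longs. */
#define MASK_WORDS 2

/* Replays the interleaving from the report on a plain, unprotected mask. */
static bool torn_read_gives_empty_mask(void)
{
	unsigned long mask[MASK_WORDS] = { 0x0, 0xf0 };           /* old: only word 1 set */
	const unsigned long new_mask[MASK_WORDS] = { 0x0f, 0x0 }; /* new: only word 0 set */
	unsigned long seen[MASK_WORDS];

	seen[0] = mask[0];                    /* reader loads word 0 first ... */
	for (int i = 0; i < MASK_WORDS; i++)  /* mems_allowed |= new_mask */
		mask[i] |= new_mask[i];
	for (int i = 0; i < MASK_WORDS; i++)  /* mems_allowed = new_mask */
		mask[i] = new_mask[i];
	seen[1] = mask[1];                    /* ... and word 1 after both steps */
	return seen[0] == 0 && seen[1] == 0;  /* true: the reader saw no nodes */
}

/* Seqlock-protected mask: seq is even when stable, odd while being written. */
struct seq_mask {
	atomic_uint seq;
	_Atomic unsigned long words[MASK_WORDS];
};

/* Writer; writers are assumed to be serialized by an outer lock. */
static void seq_mask_write(struct seq_mask *m, const unsigned long *val)
{
	unsigned int s = atomic_load_explicit(&m->seq, memory_order_relaxed);

	assert((s & 1) == 0);                 /* no other write in progress */
	atomic_store_explicit(&m->seq, s + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	for (int i = 0; i < MASK_WORDS; i++)
		atomic_store_explicit(&m->words[i], val[i], memory_order_relaxed);
	atomic_store_explicit(&m->seq, s + 2, memory_order_release);
}

/* Reader: retry until the same even sequence is seen before and after. */
static void seq_mask_read(struct seq_mask *m, unsigned long *out)
{
	unsigned int s1, s2;

	do {
		s1 = atomic_load_explicit(&m->seq, memory_order_acquire);
		for (int i = 0; i < MASK_WORDS; i++)
			out[i] = atomic_load_explicit(&m->words[i],
						      memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
		s2 = atomic_load_explicit(&m->seq, memory_order_relaxed);
	} while ((s1 & 1) || s1 != s2);
}
```

Readers never block the writer here; the cost Miao is weighing is the extra sequence loads on every read plus a retry whenever a read overlaps an update.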
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
==============================================================================
TOPIC: [PATCH 3/3] memcg: oom kill disable and stop and go hooks.
http://groups.google.com/group/linux.kernel/t/bbf57513f577c468?hl=en
==============================================================================
== 1 of 2 ==
Date: Thurs, Mar 11 2010 12:10 am
From: KAMEZAWA Hiroyuki
From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
This adds a feature to disable the oom-killer for a memcg. If it is disabled,
tasks under the memcg will, of course, stop.
But we now have an oom-notifier for memcg, and the world outside the
memcg is not out of memory: a memcg's out-of-memory only means that
the memcg has hit its limit. So an administrator or a
management daemon can recover the situation by
- killing some processes
- enlarging the limit or adding more swap
- migrating some tasks
- removing file cache on tmpfs (difficult?)
TODO:
more polishing, and finding races.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
Documentation/cgroups/memory.txt | 19 ++++++
mm/memcontrol.c | 113 ++++++++++++++++++++++++++++++++-------
2 files changed, 113 insertions(+), 19 deletions(-)
Index: mmotm-2.6.34-Mar9/mm/memcontrol.c
===================================================================
--- mmotm-2.6.34-Mar9.orig/mm/memcontrol.c
+++ mmotm-2.6.34-Mar9/mm/memcontrol.c
@@ -235,7 +235,8 @@ struct mem_cgroup {
* mem_cgroup ? And what type of charges should we move ?
*/
unsigned long move_charge_at_immigrate;
-
+ /* Disable OOM killer */
+ unsigned long oom_kill_disable;
/*
* percpu counter.
*/
@@ -1340,20 +1341,26 @@ static void memcg_wakeup_oom(struct mem_
__wake_up(&memcg_oom_waitq, TASK_NORMAL, 0, mem);
}
+static void memcg_oom_recover(struct mem_cgroup *mem)
+{
+ if (mem->oom_kill_disable && atomic_read(&mem->oom_lock))
+ memcg_wakeup_oom(mem);
+}
+
/*
* try to call OOM killer. returns false if we should exit memory-reclaim loop.
*/
bool mem_cgroup_handle_oom(struct mem_cgroup *mem, gfp_t mask)
{
struct oom_wait_info owait;
- bool locked;
+ bool locked, need_to_kill;
owait.mem = mem;
owait.wait.flags = 0;
owait.wait.func = memcg_oom_wake_function;
owait.wait.private = current;
INIT_LIST_HEAD(&owait.wait.task_list);
-
+ need_to_kill = true;
/* At first, try to OOM lock hierarchy under mem.*/
mutex_lock(&memcg_oom_mutex);
locked = mem_cgroup_oom_lock(mem);
@@ -1362,15 +1369,17 @@ bool mem_cgroup_handle_oom(struct mem_cg
* accounting. So, UNINTERRUPTIBLE is appropriate. But SIGKILL
* under OOM is always welcomed, use TASK_KILLABLE here.
*/
- if (!locked)
- prepare_to_wait(&memcg_oom_waitq, &owait.wait, TASK_KILLABLE);
- else
+ prepare_to_wait(&memcg_oom_waitq, &owait.wait, TASK_KILLABLE);
+ if (!locked || mem->oom_kill_disable)
+ need_to_kill = false;
+ if (locked)
mem_cgroup_oom_notify(mem);
mutex_unlock(&memcg_oom_mutex);
- if (locked)
+ if (need_to_kill) {
+ finish_wait(&memcg_oom_waitq, &owait.wait);
mem_cgroup_out_of_memory(mem, mask);
- else {
+ } else {
schedule();
finish_wait(&memcg_oom_waitq, &owait.wait);
}
@@ -2162,15 +2171,6 @@ __do_uncharge(struct mem_cgroup *mem, co
/* If swapout, usage of swap doesn't decrease */
if (!do_swap_account || ctype == MEM_CGROUP_CHARGE_TYPE_SWAPOUT)
uncharge_memsw = false;
- /*
- * do_batch > 0 when unmapping pages or inode invalidate/truncate.
- * In those cases, all pages freed continously can be expected to be in
- * the same cgroup and we have chance to coalesce uncharges.
- * But we do uncharge one by one if this is killed by OOM(TIF_MEMDIE)
- * because we want to do uncharge as soon as possible.
- */
- if (!current->memcg_batch.do_batch || test_thread_flag(TIF_MEMDIE))
- goto direct_uncharge;
batch = &current->memcg_batch;
/*
@@ -2181,6 +2181,17 @@ __do_uncharge(struct mem_cgroup *mem, co
if (!batch->memcg)
batch->memcg = mem;
/*
+ * do_batch > 0 when unmapping pages or inode invalidate/truncate.
+ * In those cases, all pages freed continously can be expected to be in
+ * the same cgroup and we have chance to coalesce uncharges.
+ * But we do uncharge one by one if this is killed by OOM(TIF_MEMDIE)
+ * because we want to do uncharge as soon as possible.
+ */
+
+ if (!batch->do_batch || test_thread_flag(TIF_MEMDIE))
+ goto direct_uncharge;
+
+ /*
* In typical case, batch->memcg == mem. This means we can
* merge a series of uncharges to an uncharge of res_counter.
* If not, we uncharge res_counter ony by one.
@@ -2196,6 +2207,8 @@ direct_uncharge:
res_counter_uncharge(&mem->res, PAGE_SIZE);
if (uncharge_memsw)
res_counter_uncharge(&mem->memsw, PAGE_SIZE);
+ if (unlikely(batch->memcg != mem))
+ memcg_oom_recover(mem);
return;
}
@@ -2332,6 +2345,7 @@ void mem_cgroup_uncharge_end(void)
res_counter_uncharge(&batch->memcg->res, batch->bytes);
if (batch->memsw_bytes)
res_counter_uncharge(&batch->memcg->memsw, batch->memsw_bytes);
+ memcg_oom_recover(batch->memcg);
/* forget this pointer (for sanity check) */
batch->memcg = NULL;
}
@@ -2568,10 +2582,11 @@ static int mem_cgroup_resize_limit(struc
unsigned long long val)
{
int retry_count;
- u64 memswlimit;
+ u64 memswlimit, memlimit;
int ret = 0;
int children = mem_cgroup_count_children(memcg);
u64 curusage, oldusage;
+ int enlarge;
/*
* For keeping hierarchical_reclaim simple, how long we should retry
@@ -2582,6 +2597,7 @@ static int mem_cgroup_resize_limit(struc
oldusage = res_counter_read_u64(&memcg->res, RES_USAGE);
+ enlarge = 0;
while (retry_count) {
if (signal_pending(current)) {
ret = -EINTR;
@@ -2599,6 +2615,11 @@ static int mem_cgroup_resize_limit(struc
mutex_unlock(&set_limit_mutex);
break;
}
+
+ memlimit = res_counter_read_u64(&memcg->res, RES_LIMIT);
+ if (memlimit < val)
+ enlarge = 1;
+
ret = res_counter_set_limit(&memcg->res, val);
if (!ret) {
if (memswlimit == val)
@@ -2620,6 +2641,8 @@ static int mem_cgroup_resize_limit(struc
else
oldusage = curusage;
}
+ if (!ret && enlarge)
+ memcg_oom_recover(memcg);
return ret;
}
@@ -2628,9 +2651,10 @@ static int mem_cgroup_resize_memsw_limit
unsigned long long val)
{
int retry_count;
- u64 memlimit, oldusage, curusage;
+ u64 memlimit, memswlimit, oldusage, curusage;
int children = mem_cgroup_count_children(memcg);
int ret = -EBUSY;
+ int enlarge = 0;
/* see mem_cgroup_resize_res_limit */
retry_count = children * MEM_CGROUP_RECLAIM_RETRIES;
@@ -2652,6 +2676,9 @@ static int mem_cgroup_resize_memsw_limit
mutex_unlock(&set_limit_mutex);
break;
}
+ memswlimit = res_counter_read_u64(&memcg->memsw, RES_LIMIT);
+ if (memswlimit < val)
+ enlarge = 1;
ret = res_counter_set_limit(&memcg->memsw, val);
if (!ret) {
if (memlimit == val)
@@ -2674,6 +2701,8 @@ static int mem_cgroup_resize_memsw_limit
else
oldusage = curusage;
}
+ if (!ret && enlarge)
+ memcg_oom_recover(memcg);
return ret;
}
@@ -2865,6 +2894,7 @@ move_account:
if (ret)
break;
}
+ memcg_oom_recover(mem);
/* it seems parent cgroup doesn't have enough mem */
if (ret == -ENOMEM)
goto try_to_free;
@@ -3650,6 +3680,46 @@ static int mem_cgroup_oom_unregister_eve
return 0;
}
+static int mem_cgroup_oom_control_read(struct cgroup *cgrp,
+ struct cftype *cft, struct cgroup_map_cb *cb)
+{
+ struct mem_cgroup *mem = mem_cgroup_from_cont(cgrp);
+
+ cb->fill(cb, "oom_kill_disable", mem->oom_kill_disable);
+
+ if (atomic_read(&mem->oom_lock))
+ cb->fill(cb, "under_oom", 1);
+ else
+ cb->fill(cb, "under_oom", 0);
+ return 0;
+}
+
+/*
+ */
+static int mem_cgroup_oom_control_write(struct cgroup *cgrp,
+ struct cftype *cft, u64 val)
+{
+ struct mem_cgroup *mem = mem_cgroup_from_cont(cgrp);
+ struct mem_cgroup *parent;
+
+ /* cannot set to root cgroup and only 0 and 1 are allowed */
+ if (!cgrp->parent || !((val == 0) || (val == 1)))
+ return -EINVAL;
+
+ parent = mem_cgroup_from_cont(cgrp->parent);
+
+ cgroup_lock();
+ /* oom-kill-disable is a flag for subhierarchy. */
+ if ((parent->use_hierarchy) ||
+ (mem->use_hierarchy && !list_empty(&cgrp->children))) {
+ cgroup_unlock();
+ return -EINVAL;
+ }
+ mem->oom_kill_disable = val;
+ cgroup_unlock();
+ return 0;
+}
+
static struct cftype mem_cgroup_files[] = {
{
.name = "usage_in_bytes",
@@ -3707,6 +3777,8 @@ static struct cftype mem_cgroup_files[]
},
{
.name = "oom_control",
+ .read_map = mem_cgroup_oom_control_read,
+ .write_u64 = mem_cgroup_oom_control_write,
.register_event = mem_cgroup_oom_register_event,
.unregister_event = mem_cgroup_oom_unregister_event,
.private = MEMFILE_PRIVATE(_OOM_TYPE, OOM_CONTROL),
@@ -3946,6 +4018,7 @@ mem_cgroup_create(struct cgroup_subsys *
} else {
parent = mem_cgroup_from_cont(cont->parent);
mem->use_hierarchy = parent->use_hierarchy;
+ mem->oom_kill_disable = parent->oom_kill_disable;
}
if (parent && parent->use_hierarchy) {
@@ -4240,6 +4313,7 @@ static void mem_cgroup_clear_mc(void)
if (mc.precharge) {
__mem_cgroup_cancel_charge(mc.to, mc.precharge);
mc.precharge = 0;
+ memcg_oom_recover(mc.to);
}
/*
* we didn't uncharge from mc.from at mem_cgroup_move_account(), so
@@ -4248,6 +4322,7 @@ static void mem_cgroup_clear_mc(void)
if (mc.moved_charge) {
__mem_cgroup_cancel_charge(mc.from, mc.moved_charge);
mc.moved_charge = 0;
+ memcg_oom_recover(mc.from);
}
/* we must fixup refcnts and charges */
if (mc.moved_swap) {
Index: mmotm-2.6.34-Mar9/Documentation/cgroups/memory.txt
===================================================================
--- mmotm-2.6.34-Mar9.orig/Documentation/cgroups/memory.txt
+++ mmotm-2.6.34-Mar9/Documentation/cgroups/memory.txt
@@ -493,6 +493,8 @@ It's applicable for root and non-root cg
10. OOM Control
+memory.oom_control file is for OOM notification and other controls.
+
Memory controler implements oom notifier using cgroup notification
API (See cgroups.txt). It allows to register multiple oom notification
delivery and gets notification when oom happens.
@@ -505,6 +507,23 @@ To register a notifier, application need
Application will be notifier through eventfd when oom happens.
OOM notification doesn't work for root cgroup.
+You can disable the oom-killer by writing "1" to the memory.oom_control file,
+e.g.:
+ #echo 1 > memory.oom_control
+
+This operation is only allowed on the top cgroup of a sub-hierarchy.
+If the oom-killer is disabled, tasks under the cgroup will hang/sleep
+in memcg's oom-waitq when they request accountable memory.
+To let them run again, you have to relieve the memcg's OOM situation by
+ * enlarging the limit,
+ * killing some tasks, or
+ * moving some tasks to another group with account migration.
+Then the stopped tasks will work again.
+
+On read, the current OOM status is shown:
+ oom_kill_disable 0 or 1 (if 1, oom-killer is disabled)
+ under_oom 0 or 1 (if 1, the memcg is under OOM, tasks may
+ be stopped.)
11. TODO
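The stop-and-go behaviour described above (tasks sleep on the memcg's oom-waitq and resume once the limit is enlarged) can be modelled in user space with a mutex and a condition variable. This is only an illustrative sketch, not kernel code: `struct toy_memcg` and the `toy_*` functions are hypothetical, the condition variable stands in for `memcg_oom_waitq`, and `toy_set_limit()` plays the role of the "enlarge" path that ends in `memcg_oom_recover()`. The model ignores the uncharge-side wakeups the patch also adds.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical user-space model of one memcg with oom_kill_disable = 1. */
struct toy_memcg {
	pthread_mutex_t lock;
	pthread_cond_t oom_waitq;     /* stands in for memcg_oom_waitq */
	unsigned long usage, limit;   /* in pages */
	unsigned int waiters;         /* tasks sleeping on oom_waitq */
};

#define TOY_MEMCG_INIT(u, l) \
	{ PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, (u), (l), 0 }

/* Charge one page; instead of OOM-killing, sleep until the limit allows it. */
static void toy_charge(struct toy_memcg *m)
{
	pthread_mutex_lock(&m->lock);
	while (m->usage + 1 > m->limit) {  /* re-check after every wakeup */
		m->waiters++;
		pthread_cond_wait(&m->oom_waitq, &m->lock);
		m->waiters--;
	}
	m->usage++;
	pthread_mutex_unlock(&m->lock);
}

/* Roughly what "under_oom 1" means here: some task is stopped. */
static bool toy_under_oom(struct toy_memcg *m)
{
	bool ret;

	pthread_mutex_lock(&m->lock);
	ret = m->waiters > 0;
	pthread_mutex_unlock(&m->lock);
	return ret;
}

/* Set a new limit; wake sleepers only if it grew (cf. memcg_oom_recover()). */
static void toy_set_limit(struct toy_memcg *m, unsigned long new_limit)
{
	pthread_mutex_lock(&m->lock);
	bool enlarge = new_limit > m->limit;
	m->limit = new_limit;
	if (enlarge && m->waiters)
		pthread_cond_broadcast(&m->oom_waitq);
	pthread_mutex_unlock(&m->lock);
}

/* Thread body for a task that charges one page. */
static void *toy_task(void *arg)
{
	toy_charge(arg);
	return NULL;
}
```

Because a woken task re-checks the limit before charging, a broadcast is safe even when several tasks are stopped and the new limit only satisfies some of them; the rest simply go back to sleep.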
== 2 of 2 ==
Date: Thurs, Mar 11 2010 12:10 am
From: KAMEZAWA Hiroyuki
From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Considering containers and other resource-management software in userland,
OOM event notification should be implemented for memcg.
memcg now has a "threshold" notifier that uses eventfd, so we can make
use of it for OOM notification.
This patch adds an OOM-notification eventfd callback for memcg. The usage
is very similar to the threshold notifier, but the control file is
memory.oom_control and no arguments other than the eventfd are required.
% cgroup_event_notifier /cgroup/A/memory.oom_control dummy
(About cgroup_event_notifier, see Documentation/cgroup/)
TODO:
- add a knob to disable oom-kill under a memcg.
- add read/write function to oom_control
Changelog: 20100309
- split from threshold functions; use a list rather than an array.
- moved everything inside the mutex.
Changelog: 20100304
- renewed implementation.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
Documentation/cgroups/memory.txt | 20 +++++++
mm/memcontrol.c | 105 ++++++++++++++++++++++++++++++++++++---
2 files changed, 116 insertions(+), 9 deletions(-)
Index: mmotm-2.6.34-Mar9/mm/memcontrol.c
===================================================================
--- mmotm-2.6.34-Mar9.orig/mm/memcontrol.c
+++ mmotm-2.6.34-Mar9/mm/memcontrol.c
@@ -149,6 +149,7 @@ struct mem_cgroup_threshold {
u64 threshold;
};
+/* For threshold */
struct mem_cgroup_threshold_ary {
/* An array index points to threshold just below usage. */
atomic_t current_threshold;
@@ -157,8 +158,14 @@ struct mem_cgroup_threshold_ary {
/* Array of thresholds */
struct mem_cgroup_threshold entries[0];
};
+/* for OOM */
+struct mem_cgroup_eventfd_list {
+ struct list_head list;
+ struct eventfd_ctx *eventfd;
+};
static void mem_cgroup_threshold(struct mem_cgroup *mem);
+static void mem_cgroup_oom_notify(struct mem_cgroup *mem);
/*
* The memory controller data structure. The memory controller controls both
@@ -220,6 +227,9 @@ struct mem_cgroup {
/* thresholds for mem+swap usage. RCU-protected */
struct mem_cgroup_threshold_ary *memsw_thresholds;
+ /* For oom notifier event fd */
+ struct list_head oom_notify;
+
/*
* Should we move charges of a task when a task is moved into this
* mem_cgroup ? And what type of charges should we move ?
@@ -282,9 +292,12 @@ enum charge_type {
/* for encoding cft->private value on file */
#define _MEM (0)
#define _MEMSWAP (1)
+#define _OOM_TYPE (2)
#define MEMFILE_PRIVATE(x, val) (((x) << 16) | (val))
#define MEMFILE_TYPE(val) (((val) >> 16) & 0xffff)
#define MEMFILE_ATTR(val) ((val) & 0xffff)
+/* Used for OOM notifier */
+#define OOM_CONTROL (0)
/*
* Reclaim flags for mem_cgroup_hierarchical_reclaim
@@ -1351,6 +1364,8 @@ bool mem_cgroup_handle_oom(struct mem_cg
*/
if (!locked)
prepare_to_wait(&memcg_oom_waitq, &owait.wait, TASK_KILLABLE);
+ else
+ mem_cgroup_oom_notify(mem);
mutex_unlock(&memcg_oom_mutex);
if (locked)
@@ -3398,8 +3413,22 @@ static int compare_thresholds(const void
return _a->threshold - _b->threshold;
}
-static int mem_cgroup_register_event(struct cgroup *cgrp, struct cftype *cft,
- struct eventfd_ctx *eventfd, const char *args)
+static int mem_cgroup_oom_notify_cb(struct mem_cgroup *mem, void *data)
+{
+ struct mem_cgroup_eventfd_list *ev;
+
+ list_for_each_entry(ev, &mem->oom_notify, list)
+ eventfd_signal(ev->eventfd, 1);
+ return 0;
+}
+
+static void mem_cgroup_oom_notify(struct mem_cgroup *mem)
+{
+ mem_cgroup_walk_tree(mem, NULL, mem_cgroup_oom_notify_cb);
+}
+
+static int mem_cgroup_usage_register_event(struct cgroup *cgrp,
+ struct cftype *cft, struct eventfd_ctx *eventfd, const char *args)
{
struct mem_cgroup *memcg = mem_cgroup_from_cont(cgrp);
struct mem_cgroup_threshold_ary *thresholds, *thresholds_new;
@@ -3483,8 +3512,8 @@ unlock:
return ret;
}
-static int mem_cgroup_unregister_event(struct cgroup *cgrp, struct cftype *cft,
- struct eventfd_ctx *eventfd)
+static int mem_cgroup_usage_unregister_event(struct cgroup *cgrp,
+ struct cftype *cft, struct eventfd_ctx *eventfd)
{
struct mem_cgroup *memcg = mem_cgroup_from_cont(cgrp);
struct mem_cgroup_threshold_ary *thresholds, *thresholds_new;
@@ -3568,13 +3597,66 @@ unlock:
return ret;
}
+static int mem_cgroup_oom_register_event(struct cgroup *cgrp,
+ struct cftype *cft, struct eventfd_ctx *eventfd, const char *args)
+{
+ struct mem_cgroup *memcg = mem_cgroup_from_cont(cgrp);
+ struct mem_cgroup_eventfd_list *event;
+ int type = MEMFILE_TYPE(cft->private);
+ int ret = -ENOMEM;
+
+ BUG_ON(type != _OOM_TYPE);
+
+ mutex_lock(&memcg_oom_mutex);
+
+ /* Allocate a new eventfd list entry */
+ event = kmalloc(sizeof(*event), GFP_KERNEL);
+ if (!event)
+ goto unlock;
+ /* Add it to this memcg's OOM notifier list */
+ event->eventfd = eventfd;
+ list_add(&event->list, &memcg->oom_notify);
+
+ /* already in OOM ? */
+ if (atomic_read(&memcg->oom_lock))
+ eventfd_signal(eventfd, 1);
+ ret = 0;
+unlock:
+ mutex_unlock(&memcg_oom_mutex);
+
+ return ret;
+}
+
+static int mem_cgroup_oom_unregister_event(struct cgroup *cgrp,
+ struct cftype *cft, struct eventfd_ctx *eventfd)
+{
+ struct mem_cgroup *mem = mem_cgroup_from_cont(cgrp);
+ struct mem_cgroup_eventfd_list *ev, *tmp;
+ int type = MEMFILE_TYPE(cft->private);
+
+ BUG_ON(type != _OOM_TYPE);
+
+ mutex_lock(&memcg_oom_mutex);
+
+ list_for_each_entry_safe(ev, tmp, &mem->oom_notify, list) {
+ if (ev->eventfd == eventfd) {
+ list_del(&ev->list);
+ kfree(ev);
+ }
+ }
+
+ mutex_unlock(&memcg_oom_mutex);
+
+ return 0;
+}
+
static struct cftype mem_cgroup_files[] = {
{
.name = "usage_in_bytes",
.private = MEMFILE_PRIVATE(_MEM, RES_USAGE),
.read_u64 = mem_cgroup_read,
- .register_event = mem_cgroup_register_event,
- .unregister_event = mem_cgroup_unregister_event,
+ .register_event = mem_cgroup_usage_register_event,
+ .unregister_event = mem_cgroup_usage_unregister_event,
},
{
.name = "max_usage_in_bytes",
@@ -3623,6 +3705,12 @@ static struct cftype mem_cgroup_files[]
.read_u64 = mem_cgroup_move_charge_read,
.write_u64 = mem_cgroup_move_charge_write,
},
+ {
+ .name = "oom_control",
+ .register_event = mem_cgroup_oom_register_event,
+ .unregister_event = mem_cgroup_oom_unregister_event,
+ .private = MEMFILE_PRIVATE(_OOM_TYPE, OOM_CONTROL),
+ },
};
#ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
@@ -3631,8 +3719,8 @@ static struct cftype memsw_cgroup_files[
.name = "memsw.usage_in_bytes",
.private = MEMFILE_PRIVATE(_MEMSWAP, RES_USAGE),
.read_u64 = mem_cgroup_read,
- .register_event = mem_cgroup_register_event,
- .unregister_event = mem_cgroup_unregister_event,
+ .register_event = mem_cgroup_usage_register_event,
+ .unregister_event = mem_cgroup_usage_unregister_event,
},
{
.name = "memsw.max_usage_in_bytes",
@@ -3876,6 +3964,7 @@ mem_cgroup_create(struct cgroup_subsys *
}
mem->last_scanned_child = 0;
spin_lock_init(&mem->reclaim_param_lock);
+ INIT_LIST_HEAD(&mem->oom_notify);
if (parent)
mem->swappiness = get_swappiness(parent);
Index: mmotm-2.6.34-Mar9/Documentation/cgroups/memory.txt
===================================================================
--- mmotm-2.6.34-Mar9.orig/Documentation/cgroups/memory.txt
+++ mmotm-2.6.34-Mar9/Documentation/cgroups/memory.txt
@@ -184,6 +184,9 @@ limits on the root cgroup.
Note2: When panic_on_oom is set to "2", the whole system will panic.
+When oom event notifier is registered, event will be delivered.
+(See oom_control section)
+
2. Locking
The memory controller uses the following hierarchy
@@ -488,7 +491,22 @@ threshold in any direction.
It's applicable for root and non-root cgroup.
-10. TODO
+10. OOM Control
+
+Memory controler implements oom notifier using cgroup notification
+API (See cgroups.txt). It allows to register multiple oom notification
+delivery and gets notification when oom happens.
+
+To register a notifier, an application needs to:
+ - create an eventfd using eventfd(2)
+ - open the memory.oom_control file
+ - write a string like "<event_fd> <fd of memory.oom_control>" to cgroup.event_control
+
+Application will be notifier through eventfd when oom happens.
+OOM notification doesn't work for root cgroup.
+
+
+11. TODO
1. Add support for accounting huge pages (as a separate controller)
2. Make per-cgroup scanner reclaim not-shared pages first
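The three registration steps in the documentation above can be sketched as a small C listener. This is a sketch under assumptions: the cgroup directory (for example the `/cgroup/A` used in the changelog) is supplied by the caller, the string written to `cgroup.event_control` is assumed to be "<eventfd> <fd of memory.oom_control>" with no further arguments, and `format_event_control()`, `register_oom_listener()` and `wait_for_oom()` are hypothetical helper names. Per the patch, registering while the memcg is already under OOM signals the eventfd immediately.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Format "<event_fd> <control_fd>"; returns its length, or -1 if truncated. */
static int format_event_control(char *buf, size_t len, int efd, int cfd)
{
	int n;

	assert(buf != NULL && len > 0);
	n = snprintf(buf, len, "%d %d", efd, cfd);
	return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/* Register for OOM notification on cgroup directory dir; returns eventfd or -1. */
static int register_oom_listener(const char *dir)
{
	char path[4096], line[64];
	int efd, cfd = -1, ecfd = -1, n;

	efd = eventfd(0, 0);                          /* step 1 */
	if (efd < 0)
		return -1;

	snprintf(path, sizeof(path), "%s/memory.oom_control", dir);
	cfd = open(path, O_RDONLY);                   /* step 2 */
	if (cfd < 0)
		goto fail;

	snprintf(path, sizeof(path), "%s/cgroup.event_control", dir);
	ecfd = open(path, O_WRONLY);
	if (ecfd < 0)
		goto fail;

	n = format_event_control(line, sizeof(line), efd, cfd);
	if (n < 0 || write(ecfd, line, (size_t)n) != n)   /* step 3 */
		goto fail;

	close(ecfd);
	/* cfd is conservatively left open for the listener's lifetime. */
	return efd;
fail:
	if (ecfd >= 0)
		close(ecfd);
	if (cfd >= 0)
		close(cfd);
	close(efd);
	return -1;
}

/* Block until the eventfd is signalled; returns the count, or 0 on error. */
static uint64_t wait_for_oom(int efd)
{
	uint64_t count = 0;

	if (read(efd, &count, sizeof(count)) != (ssize_t)sizeof(count))
		return 0;
	return count;
}
```

A management daemon would call `register_oom_listener()` once and then loop on `wait_for_oom()`, reacting (kill, enlarge the limit, migrate tasks) each time it returns a non-zero count.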
==============================================================================
TOPIC: fs: fat: use hex_asc_lo/hex_asc_hi instead of custom one
http://groups.google.com/group/linux.kernel/t/92eb1fedc85b2dd2?hl=en
==============================================================================
== 1 of 2 ==
Date: Thurs, Mar 11 2010 12:10 am
From: Andy Shevchenko
On Thu, Mar 11, 2010 at 9:21 AM, Joe Perches <joe@perches.com> wrote:
>> > + *op++ = hex_asc_hi(ec >> 8);
>> > + *op++ = hex_asc_lo(ec >> 8);
>> > + *op++ = hex_asc_hi(ec);
>> > + *op++ = hex_asc_lo(ec);
>> Why doesn't this use pack_hex_byte()?
> or snprintf
sprintf looks like overkill here.
--
With Best Regards,
Andy Shevchenko
== 2 of 2 ==
Date: Thurs, Mar 11 2010 12:20 am
From: Joe Perches
On Thu, 2010-03-11 at 09:59 +0200, Andy Shevchenko wrote:
> On Thu, Mar 11, 2010 at 9:21 AM, Joe Perches <joe@perches.com> wrote:
> >> > + *op++ = hex_asc_hi(ec >> 8);
> >> > + *op++ = hex_asc_lo(ec >> 8);
> >> > + *op++ = hex_asc_hi(ec);
> >> > + *op++ = hex_asc_lo(ec);
> >> Why doesn't this use pack_hex_byte()?
> > or snprintf
> sprintf looks like overkill here.
It's shorter and more intelligible though
op += sprintf(op, ":%04x:%04x", etc)
cheers, Joe
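The two styles being argued over can be compared side by side. The sketch below copies the kernel's `hex_asc_lo`/`hex_asc_hi` definitions into user space (assumed to match `include/linux/kernel.h`) and checks that the open-coded digits and Joe's `op += sprintf(op, ":%04x:%04x", ...)` produce the same text. Since the real arguments are elided as "etc" in the thread, the two values `a` and `b`, and the helper names `put_hex16()` and `hex_styles_agree()`, are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* User-space copies of the kernel helpers. */
static const char hex_asc[] = "0123456789abcdef";
#define hex_asc_lo(x)	hex_asc[((x) & 0x0f)]
#define hex_asc_hi(x)	hex_asc[((x) & 0xf0) >> 4]

/* Open-coded style from the patch: four lowercase hex digits of ec. */
static char *put_hex16(char *op, unsigned int ec)
{
	*op++ = hex_asc_hi(ec >> 8);
	*op++ = hex_asc_lo(ec >> 8);
	*op++ = hex_asc_hi(ec);
	*op++ = hex_asc_lo(ec);
	return op;
}

/* Builds ":xxxx:xxxx" both ways and reports whether they match. */
static int hex_styles_agree(unsigned int a, unsigned int b)
{
	char open_coded[16], via_sprintf[16];
	char *op = open_coded;

	*op++ = ':';
	op = put_hex16(op, a);
	*op++ = ':';
	op = put_hex16(op, b);
	*op = '\0';

	op = via_sprintf;
	op += sprintf(op, ":%04x:%04x", a & 0xffff, b & 0xffff);
	assert(op - via_sprintf == 10);   /* two colons plus eight digits */

	return strcmp(open_coded, via_sprintf) == 0;
}
```

The open-coded version avoids format-string parsing; the `sprintf` one is a single line, which is Joe's point about intelligibility.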
==============================================================================
TOPIC: kbuild: Do not unnecessarily regenerate modules.builtin
http://groups.google.com/group/linux.kernel/t/a2bfa67b9e0dbb06?hl=en
==============================================================================
== 1 of 1 ==
Date: Thurs, Mar 11 2010 12:20 am
From: Michal Marek
Hi Stephen,
On Thu, Mar 11, 2010 at 04:03:26PM +1100, Stephen Rothwell wrote:
> This breaks a "make modules modules_install" type build:
[...]
> $ make O=../ntest.obj INSTALL_MOD_PATH=../ -s modules modules_install
> cp: cannot stat `/home/sfr/kernels/ntest.obj/modules.builtin': No such file or directory
Andrew reported this yesterday, too. The following patch should fix it
(added to the for-next branch):
From 73d1393eb8507ed5fd7f8e696f6b1ecc18035ebe Mon Sep 17 00:00:00 2001
From: Michal Marek <mmarek@suse.cz>
Date: Wed, 10 Mar 2010 12:28:58 +0100
Subject: [PATCH] kbuild: Generate modules.builtin in make modules_install
The previous approach didn't work if one did
make modules && make modules_install
Add modules.builtin as dependency of _modinst_, which is the target that
actually needs the file.
Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Michal Marek <mmarek@suse.cz>
---
Makefile | 6 +++---
1 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/Makefile b/Makefile
index 160cada..b98943a 100644
--- a/Makefile
+++ b/Makefile
@@ -1086,7 +1086,7 @@ ifdef CONFIG_MODULES
# By default, build modules as well
-all: modules modules.builtin
+all: modules
# Build modules
#
@@ -1104,7 +1104,7 @@ modules: $(vmlinux-dirs) $(if $(KBUILD_BUILTIN),vmlinux)
modules.builtin: $(vmlinux-dirs:%=%/modules.builtin)
$(Q)$(AWK) '!x[$$0]++' $^ > $(objtree)/modules.builtin
-%/modules.builtin: include/config/auto.conf | modules
+%/modules.builtin: include/config/auto.conf
$(Q)$(MAKE) $(modbuiltin)=$*
@@ -1117,7 +1117,7 @@ PHONY += modules_install
modules_install: _modinst_ _modinst_post
PHONY += _modinst_
-_modinst_:
+_modinst_: modules.builtin
@if [ -z "`$(DEPMOD) -V 2>/dev/null | grep module-init-tools`" ]; then \
echo "Warning: you may need to install module-init-tools"; \
echo "See http://www.codemonkey.org.uk/docs/post-halloween-2.6.txt";\
--
1.6.6.1
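The `modules.builtin` rule in the patch builds the top-level file with `$(AWK) '!x[$$0]++' $^` (`$$` is make's escape for a literal `$`). awk prints a line only the first time it is seen, so concatenating the per-directory files drops duplicates while keeping the original order. A C sketch of the same filter follows; `dedup_lines()` is a hypothetical helper, and it uses a quadratic lookup where awk uses an associative array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy each distinct line to out[] the first time it appears, keeping
 * input order, like awk '!x[$0]++'. Returns the number of lines kept.
 * out[] must have room for n entries.
 */
static size_t dedup_lines(const char *const *in, size_t n, const char **out)
{
	size_t kept = 0;

	assert(in != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		int seen = 0;

		for (size_t j = 0; j < kept && !seen; j++)
			seen = strcmp(in[i], out[j]) == 0;
		if (!seen)
			out[kept++] = in[i];
	}
	return kept;
}
```

For example, the hypothetical input lines `kernel/a.ko`, `fs/b.ko`, `kernel/a.ko`, `net/c.ko`, `fs/b.ko` reduce to `kernel/a.ko`, `fs/b.ko`, `net/c.ko`.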
==============================================================================
TOPIC: Patch for tracing c states (power_end) on x86
http://groups.google.com/group/linux.kernel/t/91e464354797bdad?hl=en
==============================================================================
== 1 of 1 ==
Date: Thurs, Mar 11 2010 12:20 am
From: Peter Zijlstra
On Thu, 2010-03-11 at 07:36 +0100, Robert Schöne wrote:
> Did anyone look at the problem described and checked the patch?
> I still received no reply.
Ingo, how about applying it and letting people complain when they don't
agree? :-)
> Am Donnerstag, den 25.02.2010, 13:52 +0100 schrieb Peter Zijlstra:
> > On Wed, 2010-02-24 at 09:19 +0100, Robert Schöne wrote:
> > > Hello,
> > >
> > > Since no one replied to my last mail (Feb. 15th, 11:42) describing the
> > > way to fix the missing c-state tracing, here's a patch.
> > > Maybe it's easier that way.
> > >
> > > (I used the perf-fixes-for-linus git tree to obtain a
> > > more-than-up-to-date version)
> >
> > Arjan, any comments?, you seem skilled with this power stuff ;-)
> >
> > > diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> > > index 02d6780..b1cfb88 100644
> > > --- a/arch/x86/kernel/process.c
> > > +++ b/arch/x86/kernel/process.c
> > > @@ -384,6 +384,7 @@ void default_idle(void)
> > > else
> > > local_irq_enable();
> > > current_thread_info()->status |= TS_POLLING;
> > > + trace_power_end(1);
> > > } else {
> > > local_irq_enable();
> > > /* loop is done by the caller */
> > > @@ -451,6 +452,7 @@ void mwait_idle_with_hints(unsigned long ax, unsigned long cx)
> > > if (!need_resched())
> > > __mwait(ax, cx);
> > > }
> > > + trace_power_end((ax>>4)+1);
> > > }
> > >
> > > /* Default MONITOR/MWAIT with no hints, used for default C1 state */
> > > @@ -467,6 +469,7 @@ static void mwait_idle(void)
> > > __sti_mwait(0, 0);
> > > else
> > > local_irq_enable();
> > > + trace_power_end(1);
> > > } else
> > > local_irq_enable();
> > > }
> > >
> >
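In the quoted patch, `mwait_idle_with_hints()` reports `trace_power_end((ax>>4)+1)`. Assuming the MWAIT hint layout in Intel's SDM (EAX[7:4] holds the target C-state minus one, so 0 means C1; EAX[3:0] selects a sub-state), that expression turns the hint into a 1-based C-state number. Note that these are processor-specific C-states, not ACPI ones. The helper name below is hypothetical, and the `& 0xf` mask is not in the patch; it only matters if bits above 7 were ever set.

```c
#include <assert.h>

/*
 * Map an MWAIT hint to the 1-based C-state number passed to
 * trace_power_end(): EAX[7:4] = target C-state - 1, EAX[3:0] = sub-state.
 * The SDM's special value 0xF in EAX[7:4] (meaning C0) is not handled.
 */
static unsigned int mwait_hint_to_cstate(unsigned long ax)
{
	assert(((ax >> 4) & 0xf) != 0xf);   /* C0 hint not expected here */
	return (unsigned int)((ax >> 4) & 0xf) + 1;
}
```

With this mapping a hint of 0x00 reports C1, 0x10 reports C2 and 0x21 (C3, sub-state 1) reports C3, which matches the `trace_power_end(1)` calls the patch adds to the plain C1 paths in `default_idle()` and `mwait_idle()`.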
==============================================================================
TOPIC: linux-next: build failure after merge of the pcmcia tree
http://groups.google.com/group/linux.kernel/t/550261f3edfeadd9?hl=en
==============================================================================
== 1 of 1 ==
Date: Thurs, Mar 11 2010 12:20 am
From: Dominik Brodowski
Hey Stephen,
On Thu, Mar 11, 2010 at 01:21:49PM +1100, Stephen Rothwell wrote:
> After merging the pcmcia tree, today's linux-next build (powerpc
> ppc64_defconfig) failed like this:
>
> drivers/pcmcia/pcmcia_resource.c:704:1: error: unterminated #ifdef
Oops, I'm very sorry, it seems I pushed the wrong tree to master... Fixed
it, so it should be fine for tomorrow.
Best,
Dominik
==============================================================================
TOPIC: memcg: disable irq at page cgroup lock (Re: [PATCH -mmotm 3/4] memcg:
dirty pages accounting and limiting infrastructure)
http://groups.google.com/group/linux.kernel/t/eb7661b915f1ef96?hl=en
==============================================================================
== 1 of 1 ==
Date: Thurs, Mar 11 2010 12:20 am
From: KAMEZAWA Hiroyuki
On Thu, 11 Mar 2010 16:50:20 +0900
Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:
> On Thu, 11 Mar 2010 15:15:11 +0900, KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> > On Thu, 11 Mar 2010 14:13:00 +0900
> > KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> >
> > > On Thu, 11 Mar 2010 13:58:47 +0900
> > > Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:
> > > > > I'll consider yet another fix for the race in account migration if I can.
> > > > >
> > > > me too.
> > > >
> > >
> > > How about this? Assuming the race is very rare:
> > >
> > > 1. use trylock when updating statistics.
> > > If trylock fails, don't account it.
> > >
> > > 2. add a PCG_ACCT_* flag for each status:
> > >
> > > + PCG_ACCT_FILE_MAPPED, /* page is accounted as file rss*/
> > > + PCG_ACCT_DIRTY, /* page is dirty */
> > > + PCG_ACCT_WRITEBACK, /* page is being written back to disk */
> > > + PCG_ACCT_WRITEBACK_TEMP, /* page is used as temporary buffer for FUSE */
> > > + PCG_ACCT_UNSTABLE_NFS, /* NFS page not yet committed to the server */
> > >
> > > 3. When decrementing a counter, check the PCG_xxx flag with
> > > TESTCLEARPCGFLAG()
> > >
> > > This is similar to the _used_ method of LRU accounting, and we can expect
> > > this method's error range never to grow too large.
> > >
> I agree with you. I've been thinking about whether we can remove the page
> cgroup lock in update_stat, as we do in the LRU handling code.
>
> > > I think this kind of fuzzy accounting is enough for writeback status.
> > > Does anyone need strict accounting ?
> > >
> >
> IMHO, we don't need strict accounting.
>
> > How does this look?
> I agree with this direction. One concern is that we reintroduce "trylock"..
>
Yes, it's my concern, too.
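The trylock scheme proposed above can be modeled in user space to see why its error stays bounded. This is an illustration, not the memcg code: a C11 `atomic_flag` stands in for `trylock_page_cgroup()`/`unlock_page_cgroup()`, and `struct pc_model`, `struct memcg_model`, `ACCT_*` and `update_stat()` are hypothetical names. A skipped charge never sets the per-page flag, so a later uncharge of the same page is a no-op and the counter cannot drift negative.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-page accounting bits, standing in for PCG_ACCT_*. */
enum { ACCT_FILE_MAPPED = 1u << 0, ACCT_DIRTY = 1u << 1 };

struct pc_model {
	atomic_flag lock;     /* stands in for the page_cgroup lock bit */
	unsigned int flags;   /* which statistics this page is counted in */
};

struct memcg_model {
	long file_mapped;
	long dirty;
};

static long *counter_for(struct memcg_model *mem, unsigned int flag)
{
	return flag == ACCT_FILE_MAPPED ? &mem->file_mapped : &mem->dirty;
}

/*
 * Apply one charge/uncharge. Returns false when the "trylock" fails,
 * in which case nothing is accounted.
 */
static bool update_stat(struct pc_model *pc, struct memcg_model *mem,
			unsigned int flag, bool charge)
{
	if (atomic_flag_test_and_set(&pc->lock))
		return false;                  /* lock busy: skip this update */

	if (charge && !(pc->flags & flag)) {
		pc->flags |= flag;             /* first charge: count it */
		(*counter_for(mem, flag))++;
	} else if (!charge && (pc->flags & flag)) {
		pc->flags &= ~flag;            /* test-and-clear guards the decrement */
		(*counter_for(mem, flag))--;
	}

	atomic_flag_clear(&pc->lock);
	return true;
}
```

If the uncharge is the update that gets skipped, the flag stays set and the counter reads one too high; the patch's `__mem_cgroup_stat_fixup()` covers that case by decrementing and clearing any leftover flag when the page is finally uncharged.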
> Some comments are inlined.
> > + switch (idx) {
> > + case MEMCG_NR_FILE_MAPPED:
> > + if (charge) {
> > + if (!PageCgroupFileMapped(pc))
> > + SetPageCgroupFileMapped(pc);
> > + else
> > + val = 0;
> > + } else {
> > + if (PageCgroupFileMapped(pc))
> > + ClearPageCgroupFileMapped(pc);
> > + else
> > + val = 0;
> > + }
> Wouldn't using !TestSetPageCgroupFileMapped(pc) or TestClearPageCgroupFileMapped(pc) be better?
>
I used this style because we're under lock. (IOW, to show we're guarded by lock.)
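The point here is that, with the page_cgroup lock held, a plain test followed by a set is already race-free, so the atomic test-and-set form adds nothing for correctness; the explicit test/set mainly documents that the lock does the guarding. The sketch compares the two styles on a single flag word. `FILE_MAPPED`, `charge_locked()` and `charge_atomic()` are hypothetical, and C11 `atomic_fetch_or()` stands in for the kernel's test-and-set bit helpers.

```c
#include <assert.h>
#include <stdatomic.h>

#define FILE_MAPPED 1u   /* hypothetical stand-in for the PCG_FILE_MAPPED bit */

/* Patch style: separate test and set, safe only because the caller holds the lock. */
static long charge_locked(unsigned int *flags, long val)
{
	if (!(*flags & FILE_MAPPED))
		*flags |= FILE_MAPPED;
	else
		val = 0;            /* already counted: add nothing */
	return val;
}

/* Reviewer's style: one atomic test-and-set, correct even without the lock. */
static long charge_atomic(atomic_uint *flags, long val)
{
	if (atomic_fetch_or(flags, FILE_MAPPED) & FILE_MAPPED)
		val = 0;            /* bit was already set */
	return val;
}
```

For any sequence of calls made by a single lock holder, both functions return the same `val` and leave the same flag state; they would only diverge if callers could race without the lock, which is what the lock rules out.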
> > + idx = MEM_CGROUP_STAT_FILE_MAPPED;
> > + break;
> > + default:
> > + BUG();
> > + break;
> > + }
> > /*
> > * Preemption is already disabled. We can use __this_cpu_xxx
> > */
> > - __this_cpu_add(mem->stat->count[MEM_CGROUP_STAT_FILE_MAPPED], val);
> > + __this_cpu_add(mem->stat->count[idx], val);
> > +}
> >
> > -done:
> > - unlock_page_cgroup(pc);
> > +void mem_cgroup_update_stat(struct page *page, int idx, bool charge)
> > +{
> > + struct page_cgroup *pc;
> > +
> > + pc = lookup_page_cgroup(page);
> > + if (unlikely(!pc))
> > + return;
> > +
> > + if (trylock_page_cgroup(pc)) {
> > + __mem_cgroup_update_stat(pc, idx, charge);
> > + unlock_page_cgroup(pc);
> > + }
> > + return;
> > +}
> > +
> > +static void mem_cgroup_migrate_stat(struct page_cgroup *pc,
> > + struct mem_cgroup *from, struct mem_cgroup *to)
> > +{
> > + preempt_disable();
> > + if (PageCgroupFileMapped(pc)) {
> > + __this_cpu_dec(from->stat->count[MEM_CGROUP_STAT_FILE_MAPPED]);
> > + __this_cpu_inc(to->stat->count[MEM_CGROUP_STAT_FILE_MAPPED]);
> > + }
> > + preempt_enable();
> > +}
> > +
> I think preemption is already disabled here too (by lock_page_cgroup()).
>
Ah, yes.
> > +static void
> > +__mem_cgroup_stat_fixup(struct page_cgroup *pc, struct mem_cgroup *mem)
> > +{
> > + /* We're in uncharge() and under lock_page_cgroup() */
> > + if (PageCgroupFileMapped(pc)) {
> > + __this_cpu_dec(mem->stat->count[MEM_CGROUP_STAT_FILE_MAPPED]);
> > + ClearPageCgroupFileMapped(pc);
> > + }
> > }
> >
> ditto.
>
ok.
> > /*
> > @@ -1810,13 +1859,7 @@ static void __mem_cgroup_move_account(st
> > VM_BUG_ON(pc->mem_cgroup != from);
> >
> > page = pc->page;
> > - if (page_mapped(page) && !PageAnon(page)) {
> > - /* Update mapped_file data for mem_cgroup */
> > - preempt_disable();
> > - __this_cpu_dec(from->stat->count[MEM_CGROUP_STAT_FILE_MAPPED]);
> > - __this_cpu_inc(to->stat->count[MEM_CGROUP_STAT_FILE_MAPPED]);
> > - preempt_enable();
> > - }
> > + mem_cgroup_migrate_stat(pc, from, to);
> > mem_cgroup_charge_statistics(from, pc, false);
> > if (uncharge)
> > /* This is not "cancel", but cancel_charge does all we need. */
> I welcome this fixup. IIUC, we have a stat leak in the current implementation.
>
If necessary, I'd like to prepare the fix as an independent patch for mmotm.
Thanks,
-Kame
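One plausible reading of the stat leak mentioned in this thread: the old `__mem_cgroup_move_account()` moved the FILE_MAPPED count whenever `page_mapped(page) && !PageAnon(page)` held, which need not match whether that page was ever counted in `from`. Keying the move on the per-page flag, as `mem_cgroup_migrate_stat()` does, keeps both groups in step. A minimal model, with `migrate_stat()` and `struct memcg_model` as hypothetical names:

```c
#include <assert.h>
#include <stdbool.h>

struct memcg_model {         /* hypothetical: one counter per group */
	long file_mapped;
};

/*
 * Move the FILE_MAPPED count from one group to another, but only if
 * the page is actually counted (its per-page flag is set).
 */
static void migrate_stat(bool page_flagged, struct memcg_model *from,
			 struct memcg_model *to)
{
	if (page_flagged) {
		from->file_mapped--;
		to->file_mapped++;
	}
}
```

If a page were mapped but never counted (say, because a trylock skipped its charge), a `page_mapped()`-based test would still move it and leave `from` at -1; the flag-based test leaves both counters untouched.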
==============================================================================
TOPIC: cpuset,mm: update task's mems_allowed lazily
http://groups.google.com/group/linux.kernel/t/8cf36eef7f96d9f5?hl=en
==============================================================================
== 1 of 1 ==
Date: Thurs, Mar 11 2010 12:20 am
From: Nick Piggin
On Tue, Mar 09, 2010 at 03:25:54PM +0800, Miao Xie wrote:
> on 2010-3-9 5:46, David Rientjes wrote:
> [snip]
> >> Since task->mems_allowed changes infrequently, in this patch
> >> I use two variables as a tag to indicate whether task->mems_allowed needs to be
> >> updated or not. Before setting the tag, cpuset caches the new mask of every
> >> task in its task_struct.
> >>
> >
> > So what exactly is the benefit of 58568d2 from last June that caused this
> > issue to begin with? It seems like this entire patchset is a revert of
> > that commit. So why shouldn't we just revert that one commit and then add
> > the locking and updating necessary for configs where
> > MAX_NUMNODES > BITS_PER_LONG on top?
>
> I was worried about the consistency of task->mempolicy with task->mems_allowed for
> configs where MAX_NUMNODES <= BITS_PER_LONG.
>
> The problem I was worried about is the following:
> When the kernel allocator allocates pages for tasks, it first accesses task->mempolicy
> to get the allowed node, then checks whether that node is allowed by
> task->mems_allowed.
>
> But without this patch, ->mempolicy and ->mems_allowed are not updated at the same
> time, so the kernel allocator may see inconsistent ->mempolicy and ->mems_allowed.
> For example, the allocator may get the allowed node from the old mempolicy
> but check whether that node is allowed by the new mems_allowed, which doesn't
> intersect the old mempolicy.
>
> So I made this patchset.
I like your focus on keeping the hotpath light, but it is getting a bit
crazy. I wonder if it wouldn't be better just to teach those places that
matter to retry on finding an inconsistent nodemask? The only failure
case to worry about is getting an empty nodemask, isn't it?
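The retry alternative can be sketched as follows: rather than keeping ->mempolicy and ->mems_allowed strictly in sync, the reader intersects whatever it sees and retries when the result is empty, the failure case singled out here. In this hypothetical model, each element of `reads` stands for one attempt at reading the two masks while a cpuset update may be in flight. It assumes a single-word nodemask (MAX_NUMNODES <= BITS_PER_LONG); a real version would re-read the task fields on each pass, for example under a seqcount.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical snapshot of the two masks; one bit per node. */
struct snapshot {
	unsigned long policy_nodes;   /* nodes permitted by task->mempolicy */
	unsigned long mems_allowed;   /* task->mems_allowed */
};

/*
 * Return the usable node mask from the first snapshot whose masks
 * intersect; 0 if every attempt looked inconsistent.
 */
static unsigned long allowed_nodes(const struct snapshot *reads, size_t nreads)
{
	for (size_t i = 0; i < nreads; i++) {
		unsigned long nodes = reads[i].policy_nodes & reads[i].mems_allowed;

		if (nodes)
			return nodes;   /* non-empty: safe to allocate from */
		/* empty: we raced with a cpuset update, try again */
	}
	return 0;
}
```

For instance, an old policy on nodes 0-1 (`0x3`) read together with a new `mems_allowed` of nodes 2-3 (`0xc`) gives an empty mask, the case Miao describes; a retry that sees the updated policy (`0xc`) succeeds.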
==============================================================================
TOPIC: davinci: MMC: Pass number of SG segments as platform data
http://groups.google.com/group/linux.kernel/t/432793a9cfd7c2de?hl=en
==============================================================================
== 1 of 1 ==
Date: Thurs, Mar 11 2010 12:20 am
From: Sudhakar Rajashekhara
On some platforms, such as DM355, only a few EDMA parameter
slots are available for EDMA_SLOT_ANY usage. In such cases,
if MMC/SD uses 16 slots for each MMC controller instance,
very few slots are left for other modules.
By passing the number of EDMA slots to be used by the MMC driver
through platform data, the number of EDMA slots available for
other purposes can be controlled.
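The probe-side hunk that consumes `nr_sg` is not included in this digest, so the following is only an assumed sketch of how a board value might be applied: fall back to `MAX_NR_SG` when the field is left at 0 (as it would be in a board file that does not set it) and never exceed what the `links[MAX_NR_SG - 1]` array can hold. Each transfer uses one EDMA parameter slot for the first segment plus one per link, so `nr_sg` is also the per-controller slot count. `pick_nr_sg()` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_NR_SG 16   /* as in the patch: links[] holds MAX_NR_SG - 1 entries */

/*
 * Hypothetical helper: choose how many scatterlist segments (and thus
 * EDMA parameter slots) one controller may use, given the board's
 * davinci_mmc_config.nr_sg value.
 */
static uint32_t pick_nr_sg(uint32_t pdata_nr_sg)
{
	if (pdata_nr_sg == 0)          /* not set by the board: keep old default */
		return MAX_NR_SG;
	if (pdata_nr_sg > MAX_NR_SG)   /* links[] cannot hold more */
		return MAX_NR_SG;
	return pdata_nr_sg;
}
```

Under this rule a DM355 board could, for example, set `.nr_sg = 4` to use 4 slots per controller instead of 16.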
Signed-off-by: Sudhakar Rajashekhara <sudhakar.raj@ti.com>
---
arch/arm/mach-davinci/include/mach/mmc.h | 3 +++
drivers/mmc/host/davinci_mmc.c | 22 +++++++++++++++-------
2 files changed, 18 insertions(+), 7 deletions(-)
diff --git a/arch/arm/mach-davinci/include/mach/mmc.h b/arch/arm/mach-davinci/include/mach/mmc.h
index 5a85e24..384fc0e 100644
--- a/arch/arm/mach-davinci/include/mach/mmc.h
+++ b/arch/arm/mach-davinci/include/mach/mmc.h
@@ -22,6 +22,9 @@ struct davinci_mmc_config {
/* Version of the MMC/SD controller */
u8 version;
+
+ /* Number of sg segments */
+ u32 nr_sg;
};
void davinci_setup_mmc(int module, struct davinci_mmc_config *config);
diff --git a/drivers/mmc/host/davinci_mmc.c b/drivers/mmc/host/davinci_mmc.c
index 3bd0ba2..19c050c 100644
--- a/drivers/mmc/host/davinci_mmc.c
+++ b/drivers/mmc/host/davinci_mmc.c
@@ -137,15 +137,15 @@
/*
* One scatterlist dma "segment" is at most MAX_CCNT rw_threshold units,
- * and we handle up to NR_SG segments. MMC_BLOCK_BOUNCE kicks in only
+ * and we handle up to MAX_NR_SG segments. MMC_BLOCK_BOUNCE kicks in only
* for drivers with max_hw_segs == 1, making the segments bigger (64KB)
- * than the page or two that's otherwise typical. NR_SG == 16 gives at
- * least the same throughput boost, using EDMA transfer linkage instead
- * of spending CPU time copying pages.
+ * than the page or two that's otherwise typical. nr_sg (passed from
+ * platform data) == 16 gives at least the same throughput boost, using
+ * EDMA transfer linkage instead of spending CPU time copying pages.
*/
#define MAX_CCNT ((1 << 16) - 1)
-#define NR_SG 16
+#define MAX_NR_SG 16
static unsigned rw_threshold = 32;
module_param(rw_threshold, uint, S_IRUGO);
@@ -192,7 +192,7 @@ struct mmc_davinci_host {
struct edmacc_param tx_template;
struct edmacc_param rx_template;
unsigned n_link;
- u32 links[NR_SG - 1];
+ u32 links[MAX_NR_SG - 1];
/* For PIO we walk scatterlists one segment at a time. */
unsigned int sg_len;
@@ -202,6 +202,8 @@ struct mmc_davinci_host {
u8 version;
/* for ns in one cycle calculation */
unsigned ns_in_one_cycle;
+ /* Number of sg segments */
+ u32 nr_sg;
#ifdef CONFIG_CPU_FREQ
struct notifier_block freq_transition;