Wednesday, February 3, 2010

linux.kernel - 25 new messages in 16 topics - digest

linux.kernel
http://groups.google.com/group/linux.kernel?hl=en

linux.kernel@googlegroups.com

Today's topics:

* Improving OOM killer - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/389db2dcf6479d30?hl=en
* lkdtm: Add debugfs access and loosen KPROBE ties - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/6fe510f7d559ec5f?hl=en
* enhanced reimplemention of the kfifo API - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/21e9c147c9adf137?hl=en
* ohci1394_dma=early crash since 2.6.32 (was Re: [Bug #14487] PANIC: early
exception 08 rip 246:10 error ffffffff810251b5 cr2 0) - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/42f60222e13dd142?hl=en
* PATCH: Add non-Virtex 5 support for LL TEMAC driver - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/463e8b64b3d3d57d?hl=en
* [PATCH] PM: disable nonboot cpus before suspending devices - 1 messages, 1
author
http://groups.google.com/group/linux.kernel/t/e1dc3aa81b42a297?hl=en
* syslog: use defined constants instead of raw numbers - 2 messages, 1 author
http://groups.google.com/group/linux.kernel/t/e1594790606b7db6?hl=en
* PCI: split up pci_read_bridge_bases() - 5 messages, 1 author
http://groups.google.com/group/linux.kernel/t/513ef1add7919cee?hl=en
* libertas/if_spi: needs linux/semaphore.h - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/d65e4cde91861666?hl=en
* x86: ptrace and core-dump extensions for xstate - 2 messages, 2 authors
http://groups.google.com/group/linux.kernel/t/78d570b1c5c8bad5?hl=en
* PCI: try enabling "pci=use_crs" again - 3 messages, 1 author
http://groups.google.com/group/linux.kernel/t/a44a2df9f48247da?hl=en
* HID: add multi-input quirk for eGalax Touchcontroller - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/007c58f0d072109c?hl=en
* [PATCH] perf_events, x86: PEBS support - 2 messages, 2 authors
http://groups.google.com/group/linux.kernel/t/0acdc85baea49653?hl=en
* [PATCH 0/4] devmem and readahead fixes for 2.6.33 - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/875c093874f6d0ad?hl=en
* ntp: add hardpps implementation - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/0c8202dba360df9c?hl=en
* USB mass storage and ARM cache coherency - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/68938cdf1fa061a9?hl=en

==============================================================================
TOPIC: Improving OOM killer
http://groups.google.com/group/linux.kernel/t/389db2dcf6479d30?hl=en
==============================================================================

== 1 of 1 ==
Date: Wed, Feb 3 2010 3:00 pm
From: Lubos Lunak

Given that the badness() proposal I see in your another mail uses
get_mm_rss(), I take it that you've meanwhile changed your mind on the VmSize
vs VmRSS argument and considered that argument irrelevant now. I will comment
only on the suggested use of oom_adj on the desktop here. And actually I hope
that if something reasonably similar to your badness() proposal replaces the
current one it will make any use of oom_adj not needed on the desktop in the
usual case, so this may be irrelevant as well.

On Wednesday 03 of February 2010, David Rientjes wrote:
> On Tue, 2 Feb 2010, Lubos Lunak wrote:
> > Not that it really matters - the net result is that OOM killer usually
> > decides to kill kdeinit or ksmserver, starts killing their children,
> > vital KDE processes, and since the offenders are not among them, it ends
> > up either terminating the whole session by killing ksmserver or killing
> > enough vital processes there to free enough memory for the offenders to
> > finish their work cleanly.
>
> The kernel cannot possibly know what you consider a "vital" process, for
> that understanding you need to tell it using the very powerful
> /proc/pid/oom_adj tunable. I suspect if you were to product all of
> kdeinit's children by patching it to be OOM_DISABLE so that all threads it
> forks will inherit that value you'd actually see much improved behavior.

No. Almost everything in KDE is spawned by kdeinit, so everything would get
the adjustment, which means nothing would in practice get the adjustment.

> I'd also encourage you to talk to the KDE developers to ensure that proper
> precautions are taken to protect it in such conditions since people who
> use such desktop environments typically don't want them to be sacrificed
> for memory.

I am a KDE developer, it's written in my signature. And I've already talked
enough to the KDE developer who has done the oom_adj code that's already
there, as that's also me. I don't know kernel internals, but that doesn't
mean I'm completely clueless about the topic of the discussion I've started.

> > Worse, it worked for about a year or two and now it has only shifted the
> > problem elsewhere and that's it. We now protect kdeinit, which means the
> > OOM killer's choice will very likely ksmserver then. Ok, so let's say now
> > we start protecting also ksmserver, that's some additional hassle setting
> > it up, but that's doable. Now there's a good chance the OOM killer's
> > choice will be kwin (as a compositing manager it can have quite large
> > mappings because of graphics drivers). So ok, we need to protect the
> > window manager, but since that's not a hardcoded component like
> > ksmserver, that's even more hassle.
>
> No, you don't need to protect every KDE process from the oom killer unless
> it is going to be a contender for selection. You could certainly do so
> for completeness, but it shouldn't be required unless the nature of the
> thread demands it such that it forks many vital tasks (kdeinit) or its
> first-generation children's memory consumption can't be known either
> because it depends on how many children it can fork or their memory
> consumption is influenced by the user's work.

1) I think you missed that I said that every KDE application with the current
algorithm can be potentially a contender for selection, and I provided
numbers to demonstrate that in a selected case. Just because such application
is not vital does not mean it's good for it to get killed instead of an
obvious offender.

2) You probably do not realize the complexity involved in using oom_adj in a
desktop. Even when doing that manually I would have some difficulty finding
the right setup for my own desktop use. It'd be probably virtually impossible
to write code that would do it at least somewhat right with all the widely
differing various desktop setups that dynamically change.

3) oom_adj is ultimately just a kludge to handle special cases where the
heuristic doesn't get it right for whatever strange reason. But even you
yourself in another mail presented a heuristic that I believe would make any
use of oom_adj on the desktop unnecessary in the usual cases. The usual
desktop is not a special case.

> The heuristics are always well debated in this forum and there's little
> chance that we'll ever settle on a single formula that works for all
> possible use cases. That makes oom_adj even more vital to the overall
> efficiency of the oom killer, I really hope you start to use it to your
> advantage.

I really hope your latest badness() heuristics proposal allows us to dump
even the oom_adj use we already have.

--
Lubos Lunak
openSUSE Boosters team, KDE developer
l.lunak@suse.cz , l.lunak@kde.org
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

==============================================================================
TOPIC: lkdtm: Add debugfs access and loosen KPROBE ties
http://groups.google.com/group/linux.kernel/t/6fe510f7d559ec5f?hl=en
==============================================================================

== 1 of 1 ==
Date: Wed, Feb 3 2010 3:00 pm
From: Andrew Morton


On Wed, 3 Feb 2010 09:52:24 +0100
Simon Kagstrom <simon.kagstrom@netinsight.net> wrote:

> This patch adds a debugfs interface and additional failure modes to
> LKDTM to provide similar functionality to the provoke-crash driver
> submitted here:
>
> http://lwn.net/Articles/371208/
>
> Crashes can now be induced either through module parameters (as before)
> or through the debugfs interface as in provoke-crash.
>
> The patch also provides a new "direct" interface, where KPROBES are not
> used, i.e., the crash is invoked directly upon write to the debugfs
> file. When built without KPROBES configured, only this mode is available.
>
> Signed-off-by: Simon Kagstrom <simon.kagstrom@netinsight.net>
> ---
> I reused the debugfs directory name from provoke-crash since I think the
> name is more descriptive than "lkdtm".
>
> I also put some documentation in Documentation/fault-injection. While I
> know that the fault-injection framework isn't used for this driver, I
> think the name make sense (that's where I'd look for functionality like
> this).
>
>
> ...
>
> +static void lkdtm_do_action(enum ctype which)
> {
> - printk(KERN_INFO "lkdtm : Crash point %s of type %s hit\n",
> - cpoint_name, cpoint_type);
> + switch (which) {
> + case PANIC:
> + panic("dumptest");
> + break;
> + case BUG:
> + BUG();
> + break;
> + case EXCEPTION:
> + *((int *) 0) = 0;
> + break;
> + case LOOP:
> + for (;;);

Please feed the patch through scripts/checkpatch.pl and contemplate the
resulting report.

> + break;
> + case OVERFLOW:
> + (void) recursive_loop(0);
> + break;
> + case CORRUPT_STACK:
> + {
> + volatile u32 data[8];
> + volatile u32 *p = data;
> +
> + p[12] = 0x12345678;
> + } break;

Like this:

case CORRUPT_STACK: {
volatile u32 data[8];
volatile u32 *p = data;

p[12] = 0x12345678;
break;
}

> + case UNALIGNED_LOAD_STORE_WRITE:
> + {
> + static u8 data[5] __attribute__((aligned(4))) = {1,2,3,4,5};
> + u32 *p;
> + u32 val = 0x12345678;
> +
> + p = (u32*)(data + 1);
> + if (*p == 0)
> + val = 0x87654321;
> + *p = val;
> + } break;
> + case OVERWRITE_ALLOCATION:
> + {
> + size_t len = 1020;
> + u32 *data = kmalloc(len, GFP_KERNEL);
> +
> + data[1024 / sizeof(u32)] = 0x12345678;
> + kfree(data);
> + } break;
> + case WRITE_AFTER_FREE:
> + {
> + size_t len = 1024;
> + u32 *data = kmalloc(len, GFP_KERNEL);
> +
> + kfree(data);
> + schedule();
> + memset(data, 0x78, len);
> + } break;
> + case NONE:
> + default:
> + break;
> + }
> +
> +}
> +
>
> ...
>
> +static ssize_t do_register_entry(enum cname which, struct file *f,
> + const char __user *user_buf, size_t count, loff_t *off)
> +{
> + char *buf;
> + int err;
> +
> + if (count >= PAGE_SIZE)
> + return -EINVAL;
> +
> + buf = (char *)__get_free_page(GFP_TEMPORARY);

Someone ought to write

static inline void *__get_free_page_ptr(gfp_t flags)
{
return (void *)__get_free_page(flags);
}

and then delete 100000000 typecasts.

The use of GFP_TEMPORARY is incorrect. This page is not reclaimable.

> + if (!buf)
> + return -ENOMEM;
> + if (copy_from_user(buf, user_buf, count)) {
> + free_page((unsigned long) buf);
> + return -EFAULT;
> + }
> + /* NULL-terminate and remove enter */
> + buf[count] = '\0';
> + if (buf[count - 1] == '\r' || buf[count - 1] == '\n')
> + buf[count - 1] = '\0';

Use strim().

> + cptype = parse_cp_type(buf, count);
> + free_page((unsigned long) buf);

Write free_page_ptr() and delete another 100000000.

> +
> + if (cptype == NONE)
> + return -EINVAL;
> +
> + err = lkdtm_register_cpoint(which);
> + if (err < 0)
> + return err;
> +
> + *off += count;
> +
> + return count;
> +}
> +
>
> ...
>
> +/* Special entry to just crash directly. Available without KPROBEs */
> +static ssize_t direct_entry(struct file *f, const char __user *user_buf,
> + size_t count, loff_t *off)
> +{
> + enum ctype type;
> + char *buf;
> +
> + if (count >= PAGE_SIZE)
> + return -EINVAL;
> + if (count < 1)
> + return -EINVAL;
> +
> + buf = (char *)__get_free_page(GFP_TEMPORARY);

GFP_KERNEL

> + if (!buf)
> + return -ENOMEM;
> + if (copy_from_user(buf, user_buf, count)) {
> + free_page((unsigned long) buf);
> + return -EFAULT;
> + }
> + /* NULL-terminate and remove enter */
> + buf[count] = '\0';
> + if (buf[count - 1] == '\r' || buf[count - 1] == '\n')
> + buf[count - 1] = '\0';

strim().

> + type = parse_cp_type(buf, count);
> + free_page((unsigned long) buf);
> + if (type == NONE)
> + return -EINVAL;
> +
> + printk(KERN_INFO "lkdtm : Performing direct entry %s\n",
> + cp_type_to_str(type));
> + lkdtm_do_action(type);
> + *off += count;
> +
> + return count;
> +}
> +
> +struct crash_entry
> +{

struct crash_entry {

> + const char *name;
> + struct file_operations fops;
> +};
> +
> +static struct crash_entry crash_entries[] = {

const, perhaps.

> + {"DIRECT", {.read = lkdtm_debugfs_read,
> + .open = lkdtm_debugfs_open, .write = direct_entry}},
> + {"INT_HARDWARE_ENTRY", {.read = lkdtm_debugfs_read,
> + .open = lkdtm_debugfs_open, .write = int_hardware_entry}},
> + {"INT_HW_IRQ_EN", {.read = lkdtm_debugfs_read,
> + .open = lkdtm_debugfs_open, .write = int_hw_irq_en}},
> + {"INT_TASKLET_ENTRY", {.read = lkdtm_debugfs_read,
> + .open = lkdtm_debugfs_open, .write = int_tasklet_entry}},
> + {"FS_DEVRW", {.read = lkdtm_debugfs_read,
> + .open = lkdtm_debugfs_open, .write = fs_devrw_entry}},
> + {"MEM_SWAPOUT", {.read = lkdtm_debugfs_read,
> + .open = lkdtm_debugfs_open, .write = mem_swapout_entry}},
> + {"TIMERADD", {.read = lkdtm_debugfs_read,
> + .open = lkdtm_debugfs_open, .write = timeradd_entry}},
> + {"SCSI_DISPATCH_CMD", {.read = lkdtm_debugfs_read,
> + .open = lkdtm_debugfs_open, .write = scsi_dispatch_cmd_entry}},
> + {"IDE_CORE_CP", {.read = lkdtm_debugfs_read,
> + .open = lkdtm_debugfs_open, .write = ide_core_cp_entry}},
> +};
> +
> +static struct dentry *lkdtm_debugfs_root;
> +
> +static int __init lkdtm_module_init(void)
> +{
> + int ret = -EINVAL;
> + int n_debugfs_entries = 1; /* Assume only the direct entry */
> + int i;
> +
> + /* Register debugfs interface */
> + lkdtm_debugfs_root = debugfs_create_dir("provoke-crash", NULL);
> + if (!lkdtm_debugfs_root) {
> + printk(KERN_ERR "lkdtm: creating root dir failed\n");
> + return -ENODEV;
> + }
> +
> +#if defined(CONFIG_KPROBES)

#ifdef will suffice

> + n_debugfs_entries = ARRAY_SIZE(crash_entries);
> +

No comments:

Post a Comment