Wednesday, February 3, 2010

linux.kernel - 26 new messages in 16 topics - digest

linux.kernel
http://groups.google.com/group/linux.kernel?hl=en

linux.kernel@googlegroups.com

Today's topics:

* x86: fix race in create_irq_nr on irq_desc - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/8d79051b75e0b2aa?hl=en
* core: workqueue: BUG_ON on workqueue recursion - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/a975d634243c5123?hl=en
* netfilter: per netns nf_conntrack_cachep - 7 messages, 2 authors
http://groups.google.com/group/linux.kernel/t/8498db207556ff69?hl=en
* regression in 2.6.27.45 with usb and suspend-to-disk - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/f18f89f344fa881c?hl=en
* radio-si470x-common: -EINVAL overwritten in si470x_vidioc_s_tuner() - 1
messages, 1 author
http://groups.google.com/group/linux.kernel/t/b9df83e9a4867ffc?hl=en
* Improving OOM killer - 4 messages, 2 authors
http://groups.google.com/group/linux.kernel/t/389db2dcf6479d30?hl=en
* blackfin: use generic ptrace_resume code - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/5497cb48daecd717?hl=en
* new hardware support for atang tree - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/61c03fbc95cec86c?hl=en
* bitops: compile time optimization for hweight_long(CONSTANT) - 1 messages, 1
author
http://groups.google.com/group/linux.kernel/t/f58ac10e7917a328?hl=en
* mxc: Core support for i.MX5 series of processors from Freescale - 1 messages,
1 author
http://groups.google.com/group/linux.kernel/t/281ca1a532ca6d76?hl=en
* [PATCH] vmscan: balance local_irq_disable() and local_irq_enable() - 2
messages, 2 authors
http://groups.google.com/group/linux.kernel/t/9c6cfefcada8187f?hl=en
* enhanced reimplemention of the kfifo API - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/21e9c147c9adf137?hl=en
* improve sys_personality for compat architectures - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/c11a869d434f48b8?hl=en
* exec: refactor how call_usermodehelper works, and update the sense of the
core_pipe recursion check (v3) - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/5201de27ec6123a3?hl=en
* tags: put function prototypes back! - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/b4fc5beeed7d389a?hl=en
* reiserfs deadlock - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/8725f85be3262cd6?hl=en

==============================================================================
TOPIC: x86: fix race in create_irq_nr on irq_desc
http://groups.google.com/group/linux.kernel/t/8d79051b75e0b2aa?hl=en
==============================================================================

== 1 of 1 ==
Date: Wed, Feb 3 2010 11:40 am
From: Yinghai Lu


On 02/03/2010 09:42 AM, Brandon Philips wrote:
> On 02:20 Wed 03 Feb 2010, Yinghai Lu wrote:
>> On 02/02/2010 07:31 PM, Brandon Philips wrote:
>>> Race in create_irq_nr():
>>>
>>> - Thread 1 loops through and calls irq_to_desc_alloc_node with new=0x66.
>>>
>>> - Thread 2 has exited the loop with irq=0x66 and calls dynamic_irq_init(0x66)
>>> setting desc->chip_data = NULL
>>>
>>> - Thread 1 then dereferences NULL via desc_new->chip_data->vector
>>
>> two threads get same irq?
>
> This race happened when two drivers were setting up MSI-X at the same
> time via pci_enable_msix(). See this dmesg excerpt:
>
> [ 85.170610] ixgbe 0000:02:00.1: irq 97 for MSI/MSI-X
> [ 85.170611] alloc irq_desc for 99 on node -1
> [ 85.170613] igb 0000:08:00.1: irq 98 for MSI/MSI-X
> [ 85.170614] alloc kstat_irqs on node -1
> [ 85.170616] alloc irq_2_iommu on node -1
> [ 85.170617] alloc irq_desc for 100 on node -1
> [ 85.170619] alloc kstat_irqs on node -1
> [ 85.170621] alloc irq_2_iommu on node -1
> [ 85.170625] ixgbe 0000:02:00.1: irq 99 for MSI/MSI-X
> [ 85.170626] alloc irq_desc for 101 on node -1
> [ 85.170628] igb 0000:08:00.1: irq 100 for MSI/MSI-X
> [ 85.170630] alloc kstat_irqs on node -1
> [ 85.170631] alloc irq_2_iommu on node -1
> [ 85.170635] alloc irq_desc for 102 on node -1
> [ 85.170636] alloc kstat_irqs on node -1
> [ 85.170639] alloc irq_2_iommu on node -1
> [ 85.170646] BUG: unable to handle kernel NULL pointer dereference
> at 0000000000000088
>
> As you can see igb and ixgbe are both alternating on create_irq_nr()
> via pci_enable_msix() in their probe function. So, let me rewrite my
> explanation using this example:
>
> ixgbe: While looping through irq_desc_ptrs[] via create_irq_nr() ixgbe
> choses irq_desc_ptrs[102] and exits the loop, drops vector_lock and
> calls dynamic_irq_init. Then it sets irq_desc_ptrs[102]->chip_data =
> NULL via dynamic_irq_init().
>
> igb: Grabs the vector_lock now and starts looping over irq_desc_ptrs[]
> via create_irq_nr(). It gets to irq_desc_ptrs[102] and does this:
>
> cfg_new = irq_desc_ptrs[102]->chip_data;
> if (cfg_new->vector != 0)
> continue;
>
> This hits the NULL deref.
>

please try following patch in addition to

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=37ef2a3029fde884808ff1b369677abc7dd9a79a

diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 7edafc7..14099ba 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -3280,12 +3280,9 @@ unsigned int create_irq_nr(unsigned int irq_want, int node)
}
spin_unlock_irqrestore(&vector_lock, flags);

- if (irq > 0) {
- dynamic_irq_init(irq);
- /* restore it, in case dynamic_irq_init clear it */
- if (desc_new)
- desc_new->chip_data = cfg_new;
- }
+ if (irq > 0)
+ dynamic_irq_init_keep_chip_data(irq);
+
return irq;
}

diff --git a/include/linux/irq.h b/include/linux/irq.h
index d13492d..cd6b870 100644
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -400,6 +400,7 @@ static inline int irq_has_action(unsigned int irq)

/* Dynamic irq helper functions */
extern void dynamic_irq_init(unsigned int irq);
+void dynamic_irq_init_keep_chip_data(unsigned int irq);
extern void dynamic_irq_cleanup(unsigned int irq);

/* Set/get chip/data for an IRQ: */
diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
index ecc3fa2..370dbc4 100644
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -22,7 +22,7 @@
* dynamic_irq_init - initialize a dynamically allocated irq
* @irq: irq number to initialize
*/
-void dynamic_irq_init(unsigned int irq)
+static void dynamic_irq_init_x(unsigned int irq, bool keep_chip_data)
{
struct irq_desc *desc;
unsigned long flags;
@@ -41,7 +41,8 @@ void dynamic_irq_init(unsigned int irq)
desc->depth = 1;
desc->msi_desc = NULL;
desc->handler_data = NULL;
- desc->chip_data = NULL;
+ if (!keep_chip_data)
+ desc->chip_data = NULL;
desc->action = NULL;
desc->irq_count = 0;
desc->irqs_unhandled = 0;
@@ -54,6 +55,16 @@ void dynamic_irq_init(unsigned int irq)
raw_spin_unlock_irqrestore(&desc->lock, flags);
}

+void dynamic_irq_init(unsigned int irq)
+{
+ dynamic_irq_init_x(irq, false);
+}
+
+void dynamic_irq_init_keep_chip_data(unsigned int irq)
+{
+ dynamic_irq_init_x(irq, true);
+}
+
/**
* dynamic_irq_cleanup - cleanup a dynamically allocated irq
* @irq: irq number to initialize
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

==============================================================================
TOPIC: core: workqueue: BUG_ON on workqueue recursion
http://groups.google.com/group/linux.kernel/t/a975d634243c5123?hl=en
==============================================================================

== 1 of 1 ==
Date: Wed, Feb 3 2010 11:50 am
From: Oleg Nesterov


On 02/03, Simon Kagstrom wrote:
>
> When the workqueue is flushed from workqueue context (recursively), the
> system enters a strange state where things at random (dependent on the
> global workqueue) start misbehaving. For example, for us the console and
> logins locks up while the web server continues running.
>
> Since the system becomes unstable, change this to a BUG_ON instead.

I agree with this patch. We are going to deadlock anyway, if the
condition is true the caller is cwq->current_work, this means
flush_cpu_workqueue() will insert the barrier and hang.

However,

> @@ -482,7 +482,7 @@ static int flush_cpu_workqueue(struct cpu_workqueue_struct *cwq)
> int active = 0;
> struct wq_barrier barr;
>
> - WARN_ON(cwq->thread == current);
> + BUG_ON(cwq->thread == current);

Another option is change the code to do

if (WARN_ON(cwq->thread == current))
return;

This gives the kernel chance to survive after the warning.

What do you think?

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

==============================================================================
TOPIC: netfilter: per netns nf_conntrack_cachep
http://groups.google.com/group/linux.kernel/t/8498db207556ff69?hl=en
==============================================================================

== 1 of 7 ==
Date: Wed, Feb 3 2010 11:50 am
From: Jon Masters


On Wed, 2010-02-03 at 14:44 -0500, Jon Masters wrote:
> On Wed, 2010-02-03 at 21:09 +0200, Alexey Dobriyan wrote:
> > On Wed, Feb 03, 2010 at 01:38:09PM -0500, Jon Masters wrote:

> > > I also think it is necessary to expose net namespace layout
> >
> > Not necessary. Why?
>
> How am I as a sysadmin supposed to figure out which net namespaces exist
> on my system, and as a developer, supposed to debug these situations?

(without Jason's excellent kgdb patches, which really help)

Jon.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


== 2 of 7 ==
Date: Wed, Feb 3 2010 11:50 am
From: Jon Masters


On Wed, 2010-02-03 at 21:09 +0200, Alexey Dobriyan wrote:
> On Wed, Feb 03, 2010 at 01:38:09PM -0500, Jon Masters wrote:
> > *). Per namespace cacheing allocation (the cachep bits). We know it's
> > still possible for weirdness to happen in the SLAB cache here.
>
> Tiny race, needs reproducer.

Maybe. I think it's worth fixing anyway.

> > *). Per namespace hashsize tracking. Existing code corrupts hashtables
> > if the global size is changed when there is more than one netns
>
> I think, no.
> Changing hash size will change hashsize for all netns, current and future.

Nope. Look at the logic in nf_conntrack_set_hashsize where you iterate
over init_net.ct.hash but don't touch other namespaces. So then you go
setting nf_conntrack_htable_size and will deference that in accessing
other per-namespace hashtables using the wrong size information.

> > *). Per namespace expectations. This is for similar reasons to the need
> > for multiple hashtables, though I haven't poked at that.
>
> Expectation cache is not SLAB_DESTROY_BY_RCU, so the logic doesn't
> apply, I hope.

I'm not sure, I didn't poke at that much yet. But hashtables certainly
need fixing unless you can point out where I'm wrong.

> > I also think it is necessary to expose net namespace layout
>
> Not necessary. Why?

How am I as a sysadmin supposed to figure out which net namespaces exist
on my system, and as a developer, supposed to debug these situations?

Jon.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


== 3 of 7 ==
Date: Wed, Feb 3 2010 12:00 pm
From: Alexey Dobriyan


On Wed, Feb 03, 2010 at 02:43:47PM -0500, Jon Masters wrote:
> On Wed, 2010-02-03 at 21:09 +0200, Alexey Dobriyan wrote:
> > On Wed, Feb 03, 2010 at 01:38:09PM -0500, Jon Masters wrote:
> > > *). Per namespace hashsize tracking. Existing code corrupts hashtables
> > > if the global size is changed when there is more than one netns
> >
> > I think, no.
> > Changing hash size will change hashsize for all netns, current and future.
>
> Nope. Look at the logic in nf_conntrack_set_hashsize where you iterate
> over init_net.ct.hash but don't touch other namespaces. So then you go
> setting nf_conntrack_htable_size and will deference that in accessing
> other per-namespace hashtables using the wrong size information.

> > > I also think it is necessary to expose net namespace layout
> >
> > Not necessary. Why?
>
> How am I as a sysadmin supposed to figure out which net namespaces exist
> on my system, and as a developer, supposed to debug these situations?

We don't expose many relations to userspace, and it's generally fine.

As a developer you fire a debugger and look at net_namespace_list.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


== 4 of 7 ==
Date: Wed, Feb 3 2010 12:00 pm
From: Jon Masters


On Wed, 2010-02-03 at 21:51 +0200, Alexey Dobriyan wrote:
> On Wed, Feb 03, 2010 at 02:43:47PM -0500, Jon Masters wrote:
> > On Wed, 2010-02-03 at 21:09 +0200, Alexey Dobriyan wrote:
> > > On Wed, Feb 03, 2010 at 01:38:09PM -0500, Jon Masters wrote:

> > > > I also think it is necessary to expose net namespace layout
> > >
> > > Not necessary. Why?
> >
> > How am I as a sysadmin supposed to figure out which net namespaces exist
> > on my system, and as a developer, supposed to debug these situations?
>
> We don't expose many relations to userspace, and it's generally fine.

I can see slabs via /proc, memory layout, heck I can even expose the
kernel page tables if I really want to. I guess that's not too many :)

> As a developer you fire a debugger and look at net_namespace_list.

Yeah, but being able to cat a nice file is always handy.

Jon.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


== 5 of 7 ==
Date: Wed, Feb 3 2010 12:00 pm
From: Alexey Dobriyan


On Wed, Feb 03, 2010 at 02:46:11PM -0500, Jon Masters wrote:
> On Wed, 2010-02-03 at 14:44 -0500, Jon Masters wrote:
> > On Wed, 2010-02-03 at 21:09 +0200, Alexey Dobriyan wrote:
> > > On Wed, Feb 03, 2010 at 01:38:09PM -0500, Jon Masters wrote:
>
> > > > I also think it is necessary to expose net namespace layout
> > >
> > > Not necessary. Why?
> >
> > How am I as a sysadmin supposed to figure out which net namespaces exist
> > on my system, and as a developer, supposed to debug these situations?
>
> (without Jason's excellent kgdb patches, which really help)

Oh! Just like as usual, thinking and looking at oops and code.

Because when box is dead, netns info is not goint to be printed
anyway.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


== 6 of 7 ==
Date: Wed, Feb 3 2010 12:10 pm
From: Jon Masters


On Wed, 2010-02-03 at 21:53 +0200, Alexey Dobriyan wrote:
> On Wed, Feb 03, 2010 at 02:46:11PM -0500, Jon Masters wrote:
> > On Wed, 2010-02-03 at 14:44 -0500, Jon Masters wrote:
> > > On Wed, 2010-02-03 at 21:09 +0200, Alexey Dobriyan wrote:
> > > > On Wed, Feb 03, 2010 at 01:38:09PM -0500, Jon Masters wrote:
> >
> > > > > I also think it is necessary to expose net namespace layout
> > > >
> > > > Not necessary. Why?
> > >
> > > How am I as a sysadmin supposed to figure out which net namespaces exist
> > > on my system, and as a developer, supposed to debug these situations?
> >
> > (without Jason's excellent kgdb patches, which really help)
>
> Oh! Just like as usual, thinking and looking at oops and code.
>
> Because when box is dead, netns info is not goint to be printed
> anyway.

It would really have been helpful over the weekend to have just been
able to look around in sysfs to see what was going on with namespaces.
I'm not saying it can't be done in a debugger, or by poking at
backtraces, but it's easier to say to the many people who had this
problem in Fedora kernels "tell me what this file says" just like we do
for slab, memory, whatever other information. Just a suggestion.

> And what do you want to see at it?

Just an ability to walk through which namespaces exist (debugfs even)
and which resources are currently assigned to them. Somehow. I didn't
give it a lot of thought, but something would be useful.

Jon.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


== 7 of 7 ==
Date: Wed, Feb 3 2010 12:10 pm
From: Alexey Dobriyan


On Wed, Feb 03, 2010 at 02:53:58PM -0500, Jon Masters wrote:
> > As a developer you fire a debugger and look at net_namespace_list.
>
> Yeah, but being able to cat a nice file is always handy.

And what do you want to see at it?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

==============================================================================
TOPIC: regression in 2.6.27.45 with usb and suspend-to-disk
http://groups.google.com/group/linux.kernel/t/f18f89f344fa881c?hl=en
==============================================================================

== 1 of 1 ==
Date: Wed, Feb 3 2010 11:50 am
From: Corey Wright


On Wed, 3 Feb 2010 13:09:45 -0500 (EST)
Alan Stern <stern@rowland.harvard.edu> wrote:

> On Wed, 3 Feb 2010, Corey Wright wrote:
>
> > > Please build a kernel with CONFIG_USB_DEBUG enabled and post a dmesg
> > > log showing the problem during the second hibernation attempt.
> >
> > [ 675.443474] PM: Syncing filesystems ... done.
> > [ 675.576353] Freezing user space processes ... (elapsed 0.00 seconds)
> > done.
> > [ 675.606968] Freezing remaining freezable tasks ... (elapsed 0.00
> > seconds) done.
> > [ 675.629249] PM: Shrinking memory... - \ | done (3129 pages freed)
> > [ 679.547522] PM: Freed 12516 kbytes in 3.90 seconds (3.20 MB/s)
> > [ 679.565065] Suspending console(s) (use no_console_suspend to debug)
> > [ 679.585425] uhci_hcd 0000:00:10.0: release dev 2 ep81-INT, period 8,
> > phase 4, 118 us
> > [ 679.585453] hub 2-0:1.0: hub_suspend
> > [ 679.585460] usb usb2: bus suspend
> > [ 679.585463] usb usb2: suspend_rh
> > [ 679.585534] hub 1-0:1.0: hub_suspend
> > [ 679.585538] usb usb1: bus suspend
> > [ 679.585542] ehci_hcd 0000:00:10.4: suspend root hub
> > [ 679.585547] ehci_hcd 0000:00:10.4: suspend failed because port 8 is
> > resuming
>
> There you have it. What is the USB device attached to port 8 on bus 1
> (i.e., /sys/bus/usb/devices/1-8)?

the device is a usb SD reader w/ SD card in it.

# cat /sys/bus/usb/devices/1-8/product
USB 2.0 SD MMC READER

i removed the media reader after the failed suspend (it is only used at
boot-up to hold the LUKS key material to decrypt the filesystem) but the
message is the same:

[ 4002.585329] ehci_hcd 0000:00:10.4: suspend root hub
[ 4002.585334] ehci_hcd 0000:00:10.4: suspend failed because port 8 is
resuming
[ 4002.585338] usb usb1: bus suspend fail, err -16

and /sys/bus/usb/devices/1-8 no longer exists (little
alone /sys/bus/usb/devices/1-8/power/wakeup):

# ls -1 /sys/bus/usb/devices/
1-0:1.0
1-7
1-7:1.0
2-0:1.0
2-1
2-1:1.0
3-0:1.0
4-0:1.0
5-0:1.0
usb1
usb2
usb3
usb4
usb5

> And what happens if you do:
>
> echo disabled >/sys/bus/usb/devices/1-8/power/wakeup
>
> before trying to hibernate?

it's the same immediately after a reboot (and before suspending), a
successful suspend, or a failed suspend:

# echo disabled >/sys/bus/usb/devices/1-8/power/wakeup
bash: echo: write error: Invalid argument

and as previously said, that sysfs entry does not exist if the media reader
is removed, though the error appears to continue to refer to that
port/device.

corey
--
undefined@pobox.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

==============================================================================
TOPIC: radio-si470x-common: -EINVAL overwritten in si470x_vidioc_s_tuner()
http://groups.google.com/group/linux.kernel/t/b9df83e9a4867ffc?hl=en
==============================================================================

== 1 of 1 ==
Date: Wed, Feb 3 2010 11:50 am
From: Roel Kluin


The -EINVAL was overwritten by the si470x_disconnect_check().

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
---
Is this needed?

diff --git a/drivers/media/radio/si470x/radio-si470x-common.c b/drivers/media/radio/si470x/radio-si470x-common.c
index 4da0f15..65b4a92 100644
--- a/drivers/media/radio/si470x/radio-si470x-common.c
+++ b/drivers/media/radio/si470x/radio-si470x-common.c
@@ -748,12 +748,13 @@ static int si470x_vidioc_s_tuner(struct file *file, void *priv,
struct v4l2_tuner *tuner)
{
struct si470x_device *radio = video_drvdata(file);
- int retval = -EINVAL;
+ int retval;

/* safety checks */
retval = si470x_disconnect_check(radio);
if (retval)
goto done;
+ retval = -EINVAL;

if (tuner->index != 0)
goto done;
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

==============================================================================
TOPIC: Improving OOM killer
http://groups.google.com/group/linux.kernel/t/389db2dcf6479d30?hl=en
==============================================================================

== 1 of 4 ==
Date: Wed, Feb 3 2010 11:50 am
From: Frans Pop


David Rientjes wrote:
> /*
> * /proc/pid/oom_adj ranges from -1000 to +1000 to either
> * completely disable oom killing or always prefer it.
> */
> points += p->signal->oom_adj;
>

Wouldn't that cause a rather huge compatibility issue given that the
current oom_adj works in a totally different way:

! 3.1 /proc/<pid>/oom_adj - Adjust the oom-killer score
! ------------------------------------------------------
! This file can be used to adjust the score used to select which processes
! should be killed in an out-of-memory situation. Giving it a high score
! will increase the likelihood of this process being killed by the
! oom-killer. Valid values are in the range -16 to +15, plus the special
! value -17, which disables oom-killing altogether for this process.

?

Cheers,
FJP
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


== 2 of 4 ==
Date: Wed, Feb 3 2010 12:00 pm
From: David Rientjes


On Wed, 3 Feb 2010, Frans Pop wrote:

> > * /proc/pid/oom_adj ranges from -1000 to +1000 to either
> > * completely disable oom killing or always prefer it.
> > */
> > points += p->signal->oom_adj;
> >
>
> Wouldn't that cause a rather huge compatibility issue given that the
> current oom_adj works in a totally different way:
>
> ! 3.1 /proc/<pid>/oom_adj - Adjust the oom-killer score
> ! ------------------------------------------------------
> ! This file can be used to adjust the score used to select which processes
> ! should be killed in an out-of-memory situation. Giving it a high score
> ! will increase the likelihood of this process being killed by the
> ! oom-killer. Valid values are in the range -16 to +15, plus the special
> ! value -17, which disables oom-killing altogether for this process.
>
> ?
>

I thought about whether we'd need an additional, complementary tunable
such as /proc/pid/oom_bias that would effect this new memory-charging bias
in the heuristic. It could be implemented so that writing to oom_adj
would clear oom_bias and vice versa.

Although that would certainly be possible, I didn't propose it for a
couple of reasons:

- it would clutter the space to have two seperate tunables when the
metrics that /proc/pid/oom_adj uses has become obsolete by the new
baseline as a fraction of total RAM, and

- we have always exported OOM_DISABLE, OOM_ADJUST_MIN, and OOM_ADJUST_MAX
via include/oom.h so that userspace should use them sanely. Setting
a particular oom_adj value for anything other than OOM_DISABLE means
the score will be relative to other system tasks, so its a value that
is typically calibrated at runtime rather than static, hardcoded
values.

We could reuse /proc/pid/oom_adj for the new heuristic by severely
reducing its granularity than it otherwise would by doing
(oom_adj * 1000 / OOM_ADJUST_MAX), but that will eventually become
annoying and much more difficult to document.

Given your citation, I don't think we've ever described /proc/pid/oom_adj
outside of the implementation as a bitshift, either. So its use right now
for anything other than OOM_DISABLE is probably based on scalar thinking.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


== 3 of 4 ==
Date: Wed, Feb 3 2010 12:20 pm
From: Frans Pop


On Wednesday 03 February 2010, David Rientjes wrote:
> - we have always exported OOM_DISABLE, OOM_ADJUST_MIN, and
> OOM_ADJUST_MAX via include/oom.h so that userspace should use them
> sanely. Setting a particular oom_adj value for anything other than
> OOM_DISABLE means the score will be relative to other system tasks, so
> its a value that is typically calibrated at runtime rather than static,
> hardcoded values.

That doesn't take into account:
- applications where the oom_adj value is hardcoded to a specific value
(for whatever reason)
- sysadmin scripts that set oom_adj from the console

I would think that oom_adj is a documented part of the userspace ABI and
that the change you propose does not fit the normal backwards
compatibility requirements for exposed tunables.

I think that at least any user who's currently setting oom_adj to -17 has a
right to expect that to continue to mean "oom killer disabled". And for
any other value they should get a similar impact to the current impact,
and not one that's reduced by a factor 66.

> We could reuse /proc/pid/oom_adj for the new heuristic by severely
> reducing its granularity than it otherwise would by doing
> (oom_adj * 1000 / OOM_ADJUST_MAX), but that will eventually become
> annoying and much more difficult to document.

Probably quite true, but maybe unavoidable if one accepts the above.

But I'll readily admit I'm not the final authority on this.

Cheers,
FJP
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


== 4 of 4 ==
Date: Wed, Feb 3 2010 12:30 pm
From: David Rientjes


On Wed, 3 Feb 2010, Frans Pop wrote:

> That doesn't take into account:
> - applications where the oom_adj value is hardcoded to a specific value
> (for whatever reason)
> - sysadmin scripts that set oom_adj from the console
>

The fundamentals are the same: negative values mean the task is less
likely to be preferred and positive values mean the task is more likely,
only the scale is different. That scale is exported by the kernel via
OOM_ADJUST_MIN and OOM_ADJUST_MAX and has been since 2006. I don't think
we need to preserve legacy applications or scripts that use hardcoded
values without importing linux/oom.h.

> I would think that oom_adj is a documented part of the userspace ABI and
> that the change you propose does not fit the normal backwards
> compatibility requirements for exposed tunables.
>

The range is documented (but it should have been documented as being from
OOM_ADJUST_MIN to OOM_ADJUST_MAX) but its implementation as a bitshift is
not; it simply says that positive values mean the task is more preferred
and negative values mean it is less preferred. Those semantics are
preserved.

> I think that at least any user who's currently setting oom_adj to -17 has a
> right to expect that to continue to mean "oom killer disabled". And for
> any other value they should get a similar impact to the current impact,
> and not one that's reduced by a factor 66.
>

If the baseline changes as we all agree it needs to such that oom_adj no
longer represents the same thing it did in the first place (it would
become a linear bias), I think this breakage is actually beneficial.
Users will now be able to tune their oom_adj values based on a fraction of
system memory to bias their applications either preferrably or otherwise.

I think we should look at Linux over the next couple of years and decide
if we want to be married to the current semantics of oom_adj that are
going to change (as it would require being a factor of 66, as you
mentioned) when the implementation it was designed for has vanished.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

==============================================================================
TOPIC: blackfin: use generic ptrace_resume code
http://groups.google.com/group/linux.kernel/t/5497cb48daecd717?hl=en
==============================================================================

== 1 of 1 ==
Date: Wed, Feb 3 2010 11:50 am
From: Christoph Hellwig


On Wed, Feb 03, 2010 at 02:36:47PM -0500, Mike Frysinger wrote:
> i added tracehook support to Blackfin recently, so that covered all
> the new functions here. i just had to drop the handling of the
> PTRACE_xxx things that common code already does.
>
> when did you want to push through these updates ? i was planning on
> sending these ptrace() updates through the Blackfin tree as part of my
> 2.6.34 queue. i'm guessing you didnt want this stuff in 2.6.33 ...

No, it's all .34 material. I'd say send your bits through the
microblaze tree. The double prototype for the two functions won't
hurt during the merge window and we can fix it up later. Just make
sure you really have user_{enable,disable}_single_step implement
as functions and not as a #define for ptrace_disable as in your
earlier version.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

==============================================================================
TOPIC: new hardware support for atang tree
http://groups.google.com/group/linux.kernel/t/61c03fbc95cec86c?hl=en
==============================================================================

== 1 of 1 ==
Date: Wed, Feb 3 2010 12:00 pm
From: Jeff Garzik


On 02/03/2010 01:40 PM, Bartlomiej Zolnierkiewicz wrote:
>
> Since I haven't seen any progress on mainstreaming of s3c-ide and
> pata_altera_cf patches in the recent months I decided to merge them

pata_altera_cf still needs to be updated to v3, taking into account
Alan's feedback. A ping in Nov 2009 resulted in no response.

Jeff

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

==============================================================================
TOPIC: bitops: compile time optimization for hweight_long(CONSTANT)
http://groups.google.com/group/linux.kernel/t/f58ac10e7917a328?hl=en
==============================================================================

== 1 of 1 ==
Date: Wed, Feb 3 2010 12:00 pm
From: "H. Peter Anvin"


On 02/03/2010 10:47 AM, Peter Zijlstra wrote:
> On Wed, 2010-02-03 at 19:14 +0100, Borislav Petkov wrote:
>
>> alternative("call hweightXX", "popcnt", X86_FEATURE_POPCNT)
>
> Make sure to apply a 0xff bitmask to the popcnt r16 call for hweight8(),
> and hweight64() needs a bit of magic for 32bit, but yes, something like
> that ought to work nicely.
>

Arguably the "best" option is to have the alternative being a jump to an
out-of-line stub which does the necessary parameter marshalling before
calling a stub. This technique is already used in a few other places.

-hpa
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

==============================================================================
TOPIC: mxc: Core support for i.MX5 series of processors from Freescale
http://groups.google.com/group/linux.kernel/t/281ca1a532ca6d76?hl=en
==============================================================================

== 1 of 1 ==
Date: Wed, Feb 3 2010 12:10 pm
From: Amit Kucheria


On 10 Feb 03, Sascha Hauer wrote:
> On Tue, Feb 02, 2010 at 09:16:27PM -0800, Amit Kucheria wrote:
> > From: Amit Kucheria <amit.kucheria@verdurent.com>
> >
> > Add basic clock support, cpu identification, I/O mapping and serial port.
> >
> > Signed-off-by: Amit Kucheria <amit.kucheria@canonical.com>
> > ---
> > arch/arm/mach-mx5/clock.c | 848 ++++++++++++++++++++++++++
> > arch/arm/mach-mx5/cpu.c | 45 ++
> > arch/arm/mach-mx5/crm_regs.h | 583 ++++++++++++++++++
> > arch/arm/mach-mx5/devices.c | 96 +++
> > arch/arm/mach-mx5/devices.h | 4 +
> > arch/arm/mach-mx5/mm.c | 88 +++
> > arch/arm/plat-mxc/include/mach/common.h | 1 +
> > arch/arm/plat-mxc/include/mach/debug-macro.S | 4 +-
> > arch/arm/plat-mxc/include/mach/iomux-mx51.h | 340 +++++++++++
> > arch/arm/plat-mxc/include/mach/mx51.h | 454 ++++++++++++++
> > 10 files changed, 2461 insertions(+), 2 deletions(-)
> > create mode 100644 arch/arm/mach-mx5/clock.c
> > create mode 100644 arch/arm/mach-mx5/cpu.c
> > create mode 100644 arch/arm/mach-mx5/crm_regs.h
> > create mode 100644 arch/arm/mach-mx5/devices.c
> > create mode 100644 arch/arm/mach-mx5/devices.h
> > create mode 100644 arch/arm/mach-mx5/mm.c
> > create mode 100644 arch/arm/plat-mxc/include/mach/iomux-mx51.h
> > create mode 100644 arch/arm/plat-mxc/include/mach/mx51.h
> >
> > diff --git a/arch/arm/mach-mx5/clock.c b/arch/arm/mach-mx5/clock.c
>
> Can we rename this file to clock-mx51.c? We made this mistake on other
> i.MX platforms and ended with a clock.c and a clock-mx35.c in the same
> directory.

Will fix.

> > new file mode 100644
> > index 0000000..595f966
> > --- /dev/null
> > +++ b/arch/arm/mach-mx5/clock.c
> > @@ -0,0 +1,848 @@
> > +/*
> > + * Copyright 2008-2010 Freescale Semiconductor, Inc. All Rights Reserved.
> > + * Copyright (C) 2009-2010 Amit Kucheria <amit.kucheria@canonical.com>
> > + *
> > + * The code contained herein is licensed under the GNU General Public
> > + * License. You may obtain a copy of the GNU General Public License
> > + * Version 2 or later at the following locations:
> > + *
> > + * http://www.opensource.org/licenses/gpl-license.html
> > + * http://www.gnu.org/copyleft/gpl.html
> > + */
> > +
> > +#include <linux/mm.h>
> > +#include <linux/delay.h>
> > +#include <linux/clk.h>
> > +#include <linux/io.h>
> > +
> > +#include <asm/clkdev.h>
> > +
> > +#include <mach/hardware.h>
> > +#include <mach/common.h>
> > +#include <mach/clock.h>
> > +
> > +#include "crm_regs.h"
> > +
> > +static void __iomem *pll_base[] = {
> > + MX51_DPLL1_BASE,
> > + MX51_DPLL2_BASE,
> > + MX51_DPLL3_BASE,
> > +};
> > +
> > +/* External clock values passed-in by the board code */
> > +static unsigned long external_high_reference, external_low_reference;
> > +static unsigned long oscillator_reference, ckih2_reference;
> > +
> > +static struct clk osc_clk;
> > +static struct clk pll1_main_clk;
> > +static struct clk pll1_sw_clk;
> > +static struct clk pll2_sw_clk;
> > +static struct clk pll3_sw_clk;
> > +static struct clk lp_apm_clk;
> > +static struct clk periph_apm_clk;
> > +static struct clk ahb_clk;
> > +static struct clk ipg_clk;
> > +
> > +#define MAX_DPLL_WAIT_TRIES 1000 /* 1000 * udelay(1) = 1ms */
> > +
> > +static int _clk_ccgr_enable(struct clk *clk)
> > +{
> > + u32 reg;
> > +
> > + reg = __raw_readl(clk->enable_reg);
> > + reg |= MXC_CCM_CCGRx_MOD_ON << clk->enable_shift;
> > + __raw_writel(reg, clk->enable_reg);
> > +
> > + return 0;
> > +}
> > +
> > +static void _clk_ccgr_disable(struct clk *clk)
> > +{
> > + u32 reg;
> > + reg = __raw_readl(clk->enable_reg);
> > + reg &= ~(MXC_CCM_CCGRx_MOD_OFF << clk->enable_shift);
> > + __raw_writel(reg, clk->enable_reg);
> > +
> > +}
> > +
> > +static void _clk_ccgr_disable_inwait(struct clk *clk)
> > +{
> > + u32 reg;
> > +
> > + reg = __raw_readl(clk->enable_reg);
> > + reg &= ~(MXC_CCM_CCGRx_CG_MASK << clk->enable_shift);
> > + reg |= MXC_CCM_CCGRx_MOD_IDLE << clk->enable_shift;
> > + __raw_writel(reg, clk->enable_reg);
> > +}
> > +
> > +/*
> > + * For the 4-to-1 muxed input clock
> > + */
> > +static inline u32 _get_mux(struct clk *parent, struct clk *m0,
> > + struct clk *m1, struct clk *m2, struct clk *m3)
> > +{
> > + if (parent == m0)
> > + return 0;
> > + else if (parent == m1)
> > + return 1;
> > + else if (parent == m2)
> > + return 2;
> > + else if (parent == m3)
> > + return 3;
> > + else
> > + BUG();
> > +
> > + return -EINVAL;
> > +}
> > +
> > +static inline void __iomem *_get_pll_base(struct clk *pll)
> > +{
> > + if (pll == &pll1_main_clk)
> > + return pll_base[0];
> > + else if (pll == &pll2_sw_clk)
> > + return pll_base[1];
> > + else if (pll == &pll3_sw_clk)
> > + return pll_base[2];
> > + else
>
> I see no purpose for the pll_base[] array. It is used only here and you
> can return the values directly.

Fixed.

> > + BUG();
> > +
> > + return NULL;
> > +}
> > +
> > +static unsigned long clk_pll_get_rate(struct clk *clk)
> > +{
> > + long mfi, mfn, mfd, pdf, ref_clk, mfn_abs;
> > + unsigned long dp_op, dp_mfd, dp_mfn, dp_ctl, pll_hfsm, dbl;
> > + void __iomem *pllbase;
> > + s64 temp;
> > + unsigned long parent_rate;
> > +
> > + parent_rate = clk_get_rate(clk->parent);
> > +
> > + pllbase = _get_pll_base(clk);
> > +
> > + dp_ctl = __raw_readl(pllbase + MXC_PLL_DP_CTL);
> > + pll_hfsm = dp_ctl & MXC_PLL_DP_CTL_HFSM;
> > + dbl = dp_ctl & MXC_PLL_DP_CTL_DPDCK0_2_EN;
> > +
> > + if (pll_hfsm == 0) {
> > + dp_op = __raw_readl(pllbase + MXC_PLL_DP_OP);
> > + dp_mfd = __raw_readl(pllbase + MXC_PLL_DP_MFD);
> > + dp_mfn = __raw_readl(pllbase + MXC_PLL_DP_MFN);
> > + } else {
> > + dp_op = __raw_readl(pllbase + MXC_PLL_DP_HFS_OP);
> > + dp_mfd = __raw_readl(pllbase + MXC_PLL_DP_HFS_MFD);
> > + dp_mfn = __raw_readl(pllbase + MXC_PLL_DP_HFS_MFN);
> > + }
> > + pdf = dp_op & MXC_PLL_DP_OP_PDF_MASK;
> > + mfi = (dp_op & MXC_PLL_DP_OP_MFI_MASK) >> MXC_PLL_DP_OP_MFI_OFFSET;
> > + mfi = (mfi <= 5) ? 5 : mfi;
> > + mfd = dp_mfd & MXC_PLL_DP_MFD_MASK;
> > + mfn = mfn_abs = dp_mfn & MXC_PLL_DP_MFN_MASK;
> > + /* Sign extend to 32-bits */
> > + if (mfn >= 0x04000000) {
> > + mfn |= 0xFC000000;
> > + mfn_abs = -mfn;
> > + }
> > +
> > + ref_clk = 2 * parent_rate;
> > + if (dbl != 0)
> > + ref_clk *= 2;
> > +
> > + ref_clk /= (pdf + 1);
> > + temp = (u64) ref_clk * mfn_abs;
> > + do_div(temp, mfd + 1);
> > + if (mfn < 0)
> > + temp = -temp;
> > + temp = (ref_clk * mfi) + temp;
> > +
> > + return temp;
> > +}
> > +
> > +static int _clk_pll_set_rate(struct clk *clk, unsigned long rate)
> > +{
> > + u32 reg;
> > + void __iomem *pllbase;
> > +
> > + long mfi, pdf, mfn, mfd = 999999;
> > + s64 temp64;
> > + unsigned long quad_parent_rate;
> > + unsigned long pll_hfsm, dp_ctl;
> > + unsigned long parent_rate;
> > +
> > + parent_rate = clk_get_rate(clk->parent);
> > +
> > + pllbase = _get_pll_base(clk);
> > +
> > + quad_parent_rate = 4 * parent_rate;
> > + pdf = mfi = -1;
> > + while (++pdf < 16 && mfi < 5)
> > + mfi = rate * (pdf+1) / quad_parent_rate;
> > + if (mfi > 15)
> > + return -1;
> > + pdf--;
> > +
> > + temp64 = rate * (pdf+1) - quad_parent_rate * mfi;
> > + do_div(temp64, quad_parent_rate/1000000);
> > + mfn = (long)temp64;
> > +
> > + dp_ctl = __raw_readl(pllbase + MXC_PLL_DP_CTL);
> > + /* use dpdck0_2 */
> > + __raw_writel(dp_ctl | 0x1000L, pllbase + MXC_PLL_DP_CTL);
> > + pll_hfsm = dp_ctl & MXC_PLL_DP_CTL_HFSM;
> > + if (pll_hfsm == 0) {
> > + reg = mfi << 4 | pdf;
> > + __raw_writel(reg, pllbase + MXC_PLL_DP_OP);
> > + __raw_writel(mfd, pllbase + MXC_PLL_DP_MFD);
> > + __raw_writel(mfn, pllbase + MXC_PLL_DP_MFN);
> > + } else {
> > + reg = mfi << 4 | pdf;
> > + __raw_writel(reg, pllbase + MXC_PLL_DP_HFS_OP);
> > + __raw_writel(mfd, pllbase + MXC_PLL_DP_HFS_MFD);
> > + __raw_writel(mfn, pllbase + MXC_PLL_DP_HFS_MFN);
> > + }
> > +
> > + return 0;
> > +}
> > +
> > +static int _clk_pll_enable(struct clk *clk)
> > +{
> > + u32 reg;
> > + void __iomem *pllbase;
> > + int i = 0;
> > +
> > + pllbase = _get_pll_base(clk);
> > + reg = __raw_readl(pllbase + MXC_PLL_DP_CTL) | MXC_PLL_DP_CTL_UPEN;
> > + __raw_writel(reg, pllbase + MXC_PLL_DP_CTL);
> > +
> > + /* Wait for lock */
> > + while ((!(__raw_readl(pllbase + MXC_PLL_DP_CTL) & MXC_PLL_DP_CTL_LRF))
> > + && i < MAX_DPLL_WAIT_TRIES) {
> > + i++;
> > + udelay(1);
> > + }
> > +
> > + if (i == MAX_DPLL_WAIT_TRIES) {
> > + printk(KERN_ERR "MX5: pll locking failed\n");
> > + return -EINVAL;
> > + }
> > +
> > + return 0;
> > +}
> > +
> > +static void _clk_pll_disable(struct clk *clk)
> > +{
> > + u32 reg;
> > + void __iomem *pllbase;
> > +
> > + pllbase = _get_pll_base(clk);
> > + reg = __raw_readl(pllbase + MXC_PLL_DP_CTL) & ~MXC_PLL_DP_CTL_UPEN;
> > + __raw_writel(reg, pllbase + MXC_PLL_DP_CTL);
> > +}
> > +
> > +static int _clk_pll1_sw_set_parent(struct clk *clk, struct clk *parent)
> > +{
> > + u32 reg;
> > +
> > + reg = __raw_readl(MXC_CCM_CCSR);
> > +
> > + /* When switching from pll_main_clk to a bypass clock, first select a
> > + multiplexed clock in 'step_sel', then shift the glitchless mux
> > + 'pll1_sw_clk_sel'.
> > + When switching back, do it in reverse order
> > + */
> > + if (parent == &pll1_main_clk) {
> > + /* Switch to pll1_main_clk */
> > + reg &= ~MXC_CCM_CCSR_PLL1_SW_CLK_SEL;
> > + __raw_writel(reg, MXC_CCM_CCSR);
> > + /* step_clk mux switched to lp_apm, to save power. */
> > + reg = __raw_readl(MXC_CCM_CCSR);
> > + reg = (reg & ~MXC_CCM_CCSR_STEP_SEL_MASK) |
> > + (MXC_CCM_CCSR_STEP_SEL_LP_APM <<
> > + MXC_CCM_CCSR_STEP_SEL_OFFSET);
> > + } else {
> > + if (parent == &lp_apm_clk) {
> > + reg = (reg & ~MXC_CCM_CCSR_STEP_SEL_MASK) |
> > + (MXC_CCM_CCSR_STEP_SEL_LP_APM <<
> > + MXC_CCM_CCSR_STEP_SEL_OFFSET);
> > + } else if (parent == &pll2_sw_clk) {
> > + reg = (reg & ~MXC_CCM_CCSR_STEP_SEL_MASK) |
> > + (MXC_CCM_CCSR_STEP_SEL_PLL2_DIVIDED <<
> > + MXC_CCM_CCSR_STEP_SEL_OFFSET);
> > + } else if (parent == &pll3_sw_clk) {
> > + reg = (reg & ~MXC_CCM_CCSR_STEP_SEL_MASK) |
> > + (MXC_CCM_CCSR_STEP_SEL_PLL3_DIVIDED <<
> > + MXC_CCM_CCSR_STEP_SEL_OFFSET);
>
> Can we write this as
>
> reg &= ~MXC_CCM_CCSR_STEP_SEL_MASK;
> reg |= MXC_CCM_CCSR_STEP_SEL_PLL3_DIVIDED <<
> MXC_CCM_CCSR_STEP_SEL_OFFSET;
>
> At least for me this is much easier to read. Also, the &= part can be
> outside the if clause.

Fixed per Eric's comments

> > + } else
> > + return -EINVAL;
> > +
> > + __raw_writel(reg, MXC_CCM_CCSR);
> > + /* Switch to step_clk */
> > + reg = __raw_readl(MXC_CCM_CCSR);
> > + reg |= MXC_CCM_CCSR_PLL1_SW_CLK_SEL;
> > + }
> > + __raw_writel(reg, MXC_CCM_CCSR);
> > + return 0;
> > +}
> > +
> > +static unsigned long clk_pll1_sw_get_rate(struct clk *clk)
> > +{
> > + u32 reg, div;
> > + unsigned long parent_rate;
> > +
> > + parent_rate = clk_get_rate(clk->parent);
> > +
> > + div = 1;
> > + reg = __raw_readl(MXC_CCM_CCSR);
> > +
> > + if (clk->parent == &pll2_sw_clk) {
> > + div = ((reg & MXC_CCM_CCSR_PLL2_PODF_MASK) >>
> > + MXC_CCM_CCSR_PLL2_PODF_OFFSET) + 1;
> > + } else if (clk->parent == &pll3_sw_clk) {
> > + div = ((reg & MXC_CCM_CCSR_PLL3_PODF_MASK) >>
> > + MXC_CCM_CCSR_PLL3_PODF_OFFSET) + 1;
> > + }
> > + return parent_rate / div;
>
> This will return parent rate if the parent is not pll2_sw_clk and not
> pll3_sw_clk. Is this intended? If yes, you could write
>
> } else
> div = 1;
>
> to emphasize this is not an accident.

Fixed.

> > +}
> > +
> > +static int _clk_pll2_sw_set_parent(struct clk *clk, struct clk *parent)
> > +{
> > + u32 reg;
> > +
> > + reg = __raw_readl(MXC_CCM_CCSR);
> > +
> > + if (parent == &pll2_sw_clk)
> > + reg &= ~MXC_CCM_CCSR_PLL2_SW_CLK_SEL;
> > + else
> > + reg |= MXC_CCM_CCSR_PLL2_SW_CLK_SEL;
>
> What's the other clock in the else part? I think there is a check
> missing.

The other clock is the external pll2 bypass clock. And the reference manual
indicates that the bit should only be enabled for testing.

> > +
> > + __raw_writel(reg, MXC_CCM_CCSR);
> > + return 0;
> > +}
> > +
> > +static int _clk_lp_apm_set_parent(struct clk *clk, struct clk *parent)
> > +{
> > + u32 reg;
> > +
> > + if (parent == &osc_clk)
> > + reg = __raw_readl(MXC_CCM_CCSR) & ~MXC_CCM_CCSR_LP_APM_SEL;
> > + else
> > + return -EINVAL;
> > +
> > + __raw_writel(reg, MXC_CCM_CCSR);
> > +
> > + return 0;
> > +}
> > +
> > +static unsigned long clk_arm_get_rate(struct clk *clk)
> > +{
> > + u32 cacrr, div;
> > + unsigned long parent_rate;
> > +
> > + parent_rate = clk_get_rate(clk->parent);
> > + cacrr = __raw_readl(MXC_CCM_CACRR);
> > + div = (cacrr & MXC_CCM_CACRR_ARM_PODF_MASK) + 1;
> > +
> > + return parent_rate / div;
> > +}
> > +
> > +static int _clk_periph_apm_set_parent(struct clk *clk, struct clk *parent)
> > +{
> > + u32 reg, mux;
> > + int i = 0;
> > +
> > + mux = _get_mux(parent, &pll1_sw_clk, &pll3_sw_clk, &lp_apm_clk, NULL);
> > +
> > + reg = __raw_readl(MXC_CCM_CBCMR) & ~MXC_CCM_CBCMR_PERIPH_CLK_SEL_MASK;
> > + reg |= mux << MXC_CCM_CBCMR_PERIPH_CLK_SEL_OFFSET;
> > + __raw_writel(reg, MXC_CCM_CBCMR);
> > +
> > + /* Wait for lock */
> > + while ((__raw_readl(MXC_CCM_CDHIPR) & MXC_CCM_CDHIPR_PERIPH_CLK_SEL_BUSY)
> > + && i < MAX_DPLL_WAIT_TRIES) {
> > + i++;
> > + udelay(1);
> > + }
> > +
> > + if (i == MAX_DPLL_WAIT_TRIES) {
> > + printk(KERN_ERR "MX5: Set parent for periph_apm clock failed\n");
> > + return -EINVAL;
> > + }
> > +
> > + return 0;
> > +}
> > +
> > +static unsigned long clk_main_bus_get_rate(struct clk *clk)
> > +{
> > + return clk_get_rate(clk->parent);
> > +}
>
> The generic code will automatically return the parent rate if the
> get_rate field is set to NULL.

OK. removing...
> > +
> > +static int _clk_main_bus_set_parent(struct clk *clk, struct clk *parent)
> > +{
> > + u32 reg;
> > +
> > + reg = __raw_readl(MXC_CCM_CBCDR);
> > +
> > + if (parent == &pll2_sw_clk)
> > + reg &= ~MXC_CCM_CBCDR_PERIPH_CLK_SEL;
> > + else if (parent == &periph_apm_clk)
> > + reg |= MXC_CCM_CBCDR_PERIPH_CLK_SEL;
> > + else
> > + return -EINVAL;
> > +
> > + __raw_writel(reg, MXC_CCM_CBCDR);
> > +
> > + return 0;
> > +}
> > +
>
> [snip]
>
> > +
> > +static void clk_tree_init(void)
> > +{
> > + u32 reg;
> > +
> > + ipg_perclk.set_parent(&ipg_perclk, &lp_apm_clk);
>
> Something is wrong here. Here you set the ipg_perclk parent.
> _clk_ipg_per_set_parent will then set the MXC_CCM_CBCMR_PERCLK_IPG_CLK_SEL
> and MXC_CCM_CBCMR_PERCLK_LP_APM_CLK_SEL bits accordingly...
>
> > +
> > + /*
> > + * Initialise the IPG PER CLK dividers to 3. IPG_PER_CLK should be at
> > + * 8MHz, its derived from lp_apm.
> > + * FIXME: Verify if true for all boards
> > + */
> > + reg = __raw_readl(MXC_CCM_CBCDR);
> > + reg &= ~MXC_CCM_CBCDR_PERCLK_PRED1_MASK;
> > + reg &= ~MXC_CCM_CBCDR_PERCLK_PRED2_MASK;
> > + reg &= ~MXC_CCM_CBCDR_PERCLK_PODF_MASK;
> > + reg |= (2 << MXC_CCM_CBCDR_PERCLK_PRED1_OFFSET);
> > + __raw_writel(reg, MXC_CCM_CBCDR);
> > +
> > + /* set parent for pll1, pll2 and pll3 */
> > + pll1_main_clk.parent = &osc_clk;
> > + pll2_sw_clk.parent = &osc_clk;
> > + pll3_sw_clk.parent = &osc_clk;
> > +
> > + /* set ipg_perclk parent */
> > + ipg_perclk.parent = &lp_apm_clk;
> > + reg = __raw_readl(MXC_CCM_CBCMR);
> > + if ((reg & MXC_CCM_CBCMR_PERCLK_IPG_CLK_SEL) != 0) {
> > + ipg_perclk.parent = &ipg_clk;
> > + } else {
> > + if ((reg & MXC_CCM_CBCMR_PERCLK_LP_APM_CLK_SEL) == 0)
> > + ipg_perclk.parent = &main_bus_clk;
> > + }
>
> ...And here you set the parent according to the register bits. What's
> the intention here? Do you want to keep the bootloader settings or
> do you want to overwrite them with a known value?

It does look pointless. Infact, since MXC_CCM_CBCMR_PERCLK_LP_APM_CLK_SEL is
set because of the call on the top, this entire if statement does nothing
new.

> > +}
> > +
> > +int __init mx51_clocks_init(unsigned long ckil, unsigned long osc,
> > + unsigned long ckih1, unsigned long ckih2)
> > +{
> > + int i;
> > +
> > + external_low_reference = ckil;
> > + external_high_reference = ckih1;
> > + ckih2_reference = ckih2;
> > + oscillator_reference = osc;
> > +
> > + for (i = 0; i < ARRAY_SIZE(lookups); i++)
> > + clkdev_add(&lookups[i]);
> > +
> > + clk_tree_init();
> > +
> > + clk_enable(&cpu_clk);
> > + clk_enable(&main_bus_clk);
> > +
> > + /* System timer */
> > + mxc_timer_init(&gpt_clk, MX51_IO_ADDRESS(MX51_GPT1_BASE_ADDR),
> > + MX51_MXC_INT_GPT);
> > + return 0;
> > +}
> > diff --git a/arch/arm/mach-mx5/cpu.c b/arch/arm/mach-mx5/cpu.c
> > new file mode 100644
> > index 0000000..93f1d5a
> > --- /dev/null
> > +++ b/arch/arm/mach-mx5/cpu.c
> > @@ -0,0 +1,45 @@
> > +/*
> > + * Copyright 2008-2009 Freescale Semiconductor, Inc. All Rights Reserved.
> > + *
> > + * The code contained herein is licensed under the GNU General Public
> > + * License. You may obtain a copy of the GNU General Public License
> > + * Version 2 or later at the following locations:
> > + *
> > + * http://www.opensource.org/licenses/gpl-license.html
> > + * http://www.gnu.org/copyleft/gpl.html
> > + *
> > + * This file contains the CPU initialization code.
> > + */
> > +
> > +#include <linux/types.h>
> > +#include <linux/kernel.h>
> > +#include <linux/init.h>
> > +#include <mach/hardware.h>
> > +#include <asm/io.h>
> > +
> > +static int __init post_cpu_init(void)
> > +{
> > + unsigned int reg;
> > + void __iomem *base;
> > +
> > + if (cpu_is_mx51()) {
>
> if (!cpu_is_mx51())
> return 0;
>
> please. This way we have one indention level more in case this function
> gets more complicated.
>

Neat. Fixed.

> > + base = MX51_IO_ADDRESS(MX51_AIPS1_BASE_ADDR);
> > + __raw_writel(0x0, base + 0x40);
> > + __raw_writel(0x0, base + 0x44);
> > + __raw_writel(0x0, base + 0x48);
> > + __raw_writel(0x0, base + 0x4C);
> > + reg = __raw_readl(base + 0x50) & 0x00FFFFFF;
> > + __raw_writel(reg, base + 0x50);
> > +
> > + base = MX51_IO_ADDRESS(MX51_AIPS2_BASE_ADDR);
> > + __raw_writel(0x0, base + 0x40);
> > + __raw_writel(0x0, base + 0x44);
> > + __raw_writel(0x0, base + 0x48);
> > + __raw_writel(0x0, base + 0x4C);
> > + reg = __raw_readl(base + 0x50) & 0x00FFFFFF;
> > + __raw_writel(reg, base + 0x50);
> > + }
> > + return 0;
> > +}
> > +
> > +postcore_initcall(post_cpu_init);
> > diff --git a/arch/arm/mach-mx5/crm_regs.h b/arch/arm/mach-mx5/crm_regs.h
> > new file mode 100644
> > index 0000000..c776b9a
> > --- /dev/null
> > +++ b/arch/arm/mach-mx5/crm_regs.h
>
> [snip]
>
> > diff --git a/arch/arm/mach-mx5/devices.c b/arch/arm/mach-mx5/devices.c
> > new file mode 100644
> > index 0000000..55eb089
> > --- /dev/null
> > +++ b/arch/arm/mach-mx5/devices.c
> > @@ -0,0 +1,96 @@
> > +/*
> > + * Copyright 2009 Amit Kucheria <amit.kucheria@canonical.com>
> > + *
> > + * The code contained herein is licensed under the GNU General Public
> > + * License. You may obtain a copy of the GNU General Public License
> > + * Version 2 or later at the following locations:
> > + *
> > + * http://www.opensource.org/licenses/gpl-license.html
> > + * http://www.gnu.org/copyleft/gpl.html
> > + */
> > +
> > +#include <linux/platform_device.h>
> > +#include <mach/hardware.h>
> > +#include <mach/imx-uart.h>
> > +
> > +static struct resource uart0[] = {
> > + {
> > + .start = MX51_UART1_BASE_ADDR,
> > + .end = MX51_UART1_BASE_ADDR + 0x0B5,
>
> You can safely write MX51_UART1_BASE_ADDR + 0xfff here because that's
> the register space this device actually has. The last register in the
> datasheet is 0xb4 anyway and with 32bit register accesses the correct
> value here would be 0xb7.

Oops. Fixed.

> > + .flags = IORESOURCE_MEM,
> > + }, {
> > + .start = MX51_MXC_INT_UART1,
> > + .end = MX51_MXC_INT_UART1,
> > + .flags = IORESOURCE_IRQ,
> > + },
> > +};
> > +
> > +struct platform_device mxc_uart_device0 = {
> > + .name = "imx-uart",
> > + .id = 0,
> > + .resource = uart0,
> > + .num_resources = ARRAY_SIZE(uart0),
> > +};
> > +
> > +static struct resource uart1[] = {
> > + {
> > + .start = MX51_UART2_BASE_ADDR,
> > + .end = MX51_UART2_BASE_ADDR + 0x0B5,
> > + .flags = IORESOURCE_MEM,
> > + }, {
> > + .start = MX51_MXC_INT_UART2,
> > + .end = MX51_MXC_INT_UART2,
> > + .flags = IORESOURCE_IRQ,
> > + },
> > +};
> > +
> > +struct platform_device mxc_uart_device1 = {
> > + .name = "imx-uart",
> > + .id = 1,
> > + .resource = uart1,
> > + .num_resources = ARRAY_SIZE(uart1),
> > +};
> > +
> > +static struct resource uart2[] = {
> > + {
> > + .start = MX51_UART3_BASE_ADDR,
> > + .end = MX51_UART3_BASE_ADDR + 0x0B5,
> > + .flags = IORESOURCE_MEM,
> > + }, {
> > + .start = MX51_MXC_INT_UART3,
> > + .end = MX51_MXC_INT_UART3,
> > + .flags = IORESOURCE_IRQ,
> > + },
> > +};
> > +
> > +struct platform_device mxc_uart_device2 = {
> > + .name = "imx-uart",
> > + .id = 2,
> > + .resource = uart2,
> > + .num_resources = ARRAY_SIZE(uart2),
> > +};
> > +
> > +static struct resource mxc_fec_resources[] = {
> > + {
> > + .start = MX51_MXC_FEC_BASE_ADDR,
> > + .end = MX51_MXC_FEC_BASE_ADDR + 0xfff,
> > + .flags = IORESOURCE_MEM,
> > + }, {
> > + .start = MX51_MXC_INT_FEC,
> > + .end = MX51_MXC_INT_FEC,
> > + .flags = IORESOURCE_IRQ,
> > + },
> > +};
> > +
> > +struct platform_device mxc_fec_device = {
> > + .name = "fec",
> > + .id = 0,
> > + .num_resources = ARRAY_SIZE(mxc_fec_resources),
> > + .resource = mxc_fec_resources,
> > +};
> > +
> > +/* Dummy definition to allow compiling in AVIC and TZIC simultaneously */
> > +int __init mxc_register_gpios(void)
> > +{
> > + return 0;
> > +}
> > diff --git a/arch/arm/mach-mx5/devices.h b/arch/arm/mach-mx5/devices.h
> > new file mode 100644
> > index 0000000..f339ab8
> > --- /dev/null
> > +++ b/arch/arm/mach-mx5/devices.h
> > @@ -0,0 +1,4 @@
> > +extern struct platform_device mxc_uart_device0;
> > +extern struct platform_device mxc_uart_device1;
> > +extern struct platform_device mxc_uart_device2;
> > +extern struct platform_device mxc_fec_device;
> > diff --git a/arch/arm/mach-mx5/mm.c b/arch/arm/mach-mx5/mm.c
> > new file mode 100644
> > index 0000000..d66c31a
> > --- /dev/null
> > +++ b/arch/arm/mach-mx5/mm.c
> > @@ -0,0 +1,88 @@
> > +/*
> > + * Copyright 2008-2009 Freescale Semiconductor, Inc. All Rights Reserved.
> > + *
> > + * The code contained herein is licensed under the GNU General Public
> > + * License. You may obtain a copy of the GNU General Public License
> > + * Version 2 or later at the following locations:
> > + *
> > + * http://www.opensource.org/licenses/gpl-license.html
> > + * http://www.gnu.org/copyleft/gpl.html
> > + *
> > + * Create static mapping between physical to virtual memory.
> > + */
> > +
> > +#include <linux/mm.h>
> > +#include <linux/init.h>
> > +
> > +#include <asm/mach/map.h>
> > +
> > +#include <mach/hardware.h>
> > +#include <mach/common.h>
> > +#include <mach/iomux-v3.h>
> > +
> > +/*
> > + * Define the MX51 memory map.
> > + */
> > +static struct map_desc mxc_io_desc[] __initdata = {
> > + {
> > + .virtual = MX51_IRAM_BASE_ADDR_VIRT,
> > + .pfn = __phys_to_pfn(MX51_IRAM_BASE_ADDR),
> > + .length = MX51_IRAM_SIZE,
> > + .type = MT_DEVICE},
> > + {
> > + .virtual = MX51_DEBUG_BASE_ADDR_VIRT,
> > + .pfn = __phys_to_pfn(MX51_DEBUG_BASE_ADDR),
> > + .length = MX51_DEBUG_SIZE,
> > + .type = MT_DEVICE},
> > + {
> > + .virtual = MX51_TZIC_BASE_ADDR_VIRT,
> > + .pfn = __phys_to_pfn(MX51_TZIC_BASE_ADDR),
> > + .length = MX51_TZIC_SIZE,
> > + .type = MT_DEVICE},
> > + {
> > + .virtual = MX51_AIPS1_BASE_ADDR_VIRT,
> > + .pfn = __phys_to_pfn(MX51_AIPS1_BASE_ADDR),
> > + .length = MX51_AIPS1_SIZE,
> > + .type = MT_DEVICE},
> > + {
> > + .virtual = MX51_SPBA0_BASE_ADDR_VIRT,
> > + .pfn = __phys_to_pfn(MX51_SPBA0_BASE_ADDR),
> > + .length = MX51_SPBA0_SIZE,
> > + .type = MT_DEVICE},
> > + {
> > + .virtual = MX51_AIPS2_BASE_ADDR_VIRT,
> > + .pfn = __phys_to_pfn(MX51_AIPS2_BASE_ADDR),
> > + .length = MX51_AIPS2_SIZE,
> > + .type = MT_DEVICE},
> > + {
> > + .virtual = MX51_NFC_AXI_BASE_ADDR_VIRT,
> > + .pfn = __phys_to_pfn(MX51_NFC_AXI_BASE_ADDR),
> > + .length = MX51_NFC_AXI_SIZE,
> > + .type = MT_DEVICE},
> > +};
> > +
> > +/*
> > + * This function initializes the memory map. It is called during the
> > + * system startup to create static physical to virtual memory mappings
> > + * for the IO modules.
> > + */
> > +void __init mx51_map_io(void)
> > +{
> > + u32 tzic_addr;
> > +
> > + if (mx51_revision() < MX51_CHIP_REV_2_0)
> > + tzic_addr = 0x8FFFC000;
> > + else
> > + tzic_addr = 0xE0003000;
> > + mxc_io_desc[2].pfn = __phys_to_pfn(tzic_addr);
> > +
> > + mxc_set_cpu_type(MXC_CPU_MX51);
> > + mxc_iomux_v3_init(MX51_IO_ADDRESS(MX51_IOMUXC_BASE_ADDR));
> > + mxc_arch_reset_init(MX51_IO_ADDRESS(MX51_WDOG_BASE_ADDR));
> > + iotable_init(mxc_io_desc, ARRAY_SIZE(mxc_io_desc));
> > +}
> > +
> > +void __init mx51_init_irq(void)
> > +{
> > + tzic_init_irq(MX51_IO_ADDRESS(MX51_TZIC_BASE_ADDR));
> > +}
> > diff --git a/arch/arm/plat-mxc/include/mach/common.h b/arch/arm/plat-mxc/include/mach/common.h
> > index 5250a3f..0a25576 100644
> > --- a/arch/arm/plat-mxc/include/mach/common.h
> > +++ b/arch/arm/plat-mxc/include/mach/common.h
> > @@ -30,6 +30,7 @@ extern void mx25_init_irq(void);
> > extern void mx27_init_irq(void);
> > extern void mx31_init_irq(void);
> > extern void mx35_init_irq(void);
> > +extern void mx51_init_irq(void);
> > extern void mxc91231_init_irq(void);
> > extern void mxc_timer_init(struct clk *timer_clk, void __iomem *, int);
> > extern int mx1_clocks_init(unsigned long fref);
> > diff --git a/arch/arm/plat-mxc/include/mach/debug-macro.S b/arch/arm/plat-mxc/include/mach/debug-macro.S
> > index 9fe7300..9d41bfd 100644
> > --- a/arch/arm/plat-mxc/include/mach/debug-macro.S
> > +++ b/arch/arm/plat-mxc/include/mach/debug-macro.S
> > @@ -49,8 +49,8 @@
> > #error "CONFIG_DEBUG_LL is incompatible with multiple archs"
> >

0 Comments:

Post a Comment

Subscribe to Post Comments [Atom]

<< Home


Real Estate