linux.kernel - 25 new messages in 19 topics - digest
linux.kernel
http://groups.google.com/group/linux.kernel?hl=en
Today's topics:
* 2.6.33 pagemap endless read loop - 2 messages, 2 authors
http://groups.google.com/group/linux.kernel/t/5407a156928ecd21?hl=en
* Make Intel 8-way Xeons boot again - 2 messages, 2 authors
http://groups.google.com/group/linux.kernel/t/db38ed3ce2e471d0?hl=en
* BAR 0: can't allocate resource - 1 message, 1 author
http://groups.google.com/group/linux.kernel/t/fcc38db87a1444c0?hl=en
* ksym_tracer: Fix to make the tracer work - 1 message, 1 author
http://groups.google.com/group/linux.kernel/t/d229f17fd7404338?hl=en
* High cpu temperature with 2.6.32, bisection shows commit 69d258 (fwd) - 2 messages, 2 authors
http://groups.google.com/group/linux.kernel/t/64eb218a25568a2e?hl=en
* [Bugfix][x86][hw-breakpoint] Fix return-code to notifier chain in hw_breakpoint_handler - 2 messages, 1 author
http://groups.google.com/group/linux.kernel/t/1dfa78998a656c21?hl=en
* Introduce register_user_hbp_by_pid() and unregister_user_hbp_by_pid() - 1 message, 1 author
http://groups.google.com/group/linux.kernel/t/197d4e49668b93ac?hl=en
* MSI broken in libata? - 1 message, 1 author
http://groups.google.com/group/linux.kernel/t/a859f84b78879515?hl=en
* vfs: plug some holes involving LAST_BIND symlinks and file bind mounts (try #5) - 1 message, 1 author
http://groups.google.com/group/linux.kernel/t/1d3b43642dea48cb?hl=en
* generic sys_old_select - 1 message, 1 author
http://groups.google.com/group/linux.kernel/t/2944132d1dd8ed87?hl=en
* generic sys_ipc wrapper - 1 message, 1 author
http://groups.google.com/group/linux.kernel/t/cf75ba704494de9b?hl=en
* introduce sys_membarrier(): process-wide memory barrier - 3 messages, 1 author
http://groups.google.com/group/linux.kernel/t/c8972d397ccbdcff?hl=en
* [PATCH 6/8] mm: handle_speculative_fault() - 1 message, 1 author
http://groups.google.com/group/linux.kernel/t/2a5e8285ffb8a998?hl=en
* Leaky Bucket qdisc - 1 message, 1 author
http://groups.google.com/group/linux.kernel/t/87748e509dbc3166?hl=en
* input: make i2c device id constant - 1 message, 1 author
http://groups.google.com/group/linux.kernel/t/a2533a41cfac50f9?hl=en
* hp-wmi: fix double free - 1 message, 1 author
http://groups.google.com/group/linux.kernel/t/1ff50bb873776838?hl=en
* drivers/input/joystick/xpad.c: Add rumble support for original xbox controller - 1 message, 1 author
http://groups.google.com/group/linux.kernel/t/bf02e6c3ba0db192?hl=en
* 2.6.33-rc2: Xen/Guest switching to user mode with no user page tables - 1 message, 1 author
http://groups.google.com/group/linux.kernel/t/c8fbbf0b2885d688?hl=en
* uml: fix memory leak in arch/um/os-Linux/mem.c - 1 message, 1 author
http://groups.google.com/group/linux.kernel/t/3b79503b0d1fffb8?hl=en
http://groups.google.com/group/linux.kernel/t/3b79503b0d1fffb8?hl=en
==============================================================================
TOPIC: 2.6.33 pagemap endless read loop
http://groups.google.com/group/linux.kernel/t/5407a156928ecd21?hl=en
==============================================================================
== 1 of 2 ==
Date: Sat, Jan 9 2010 6:20 pm
From: Andi Kleen
An LTP run on x86-64/2.6.33-rc3 right now results in "proc01" hanging
lr-x------ 1 root root 64 2010-01-10 02:28 7 -> /proc/2679/task/2679/pagemap
strace shows an endless loop of
read(7, "\0\0\0\0\0\0\0\6\0\0\0\0\0\0\0\6\0\0\0\0\0\0\0\6\0\0\0"..., 1024) = 1024
read(7, "\0\0\0\0\0\0\0\6\0\0\0\0\0\0\0\6\0\0\0\0\0\0\0\6\0\0\0"..., 1024) = 1024
read(7, "\0\0\0\0\0\0\0\6\0\0\0\0\0\0\0\6\0\0\0\0\0\0\0\6\0\0\0"..., 1024) = 1024
read(7, "\0\0\0\0\0\0\0\6\0\0\0\0\0\0\0\6\0\0\0\0\0\0\0\6\0\0\0"..., 1024) = 1024
read(7, "\0\0\0\0\0\0\0\6\0\0\0\0\0\0\0\6\0\0\0\0\0\0\0\6\0\0\0"..., 1024) = 1024
read(7, "\0\0\0\0\0\0\0\6\0\0\0\0\0\0\0\6\0\0\0\0\0\0\0\6\0\0\0"..., 1024) = 1024
read(7, "\0\0\0\0\0\0\0\6\0\0\0\0\0\0\0\6\0\0\0\0\0\0\0\6\0\0\0"..., 1024) = 10
cat of that file hangs in the same way so it seems like a kernel bug.
A cat of /proc/2679/pagemap also hangs, in fact any pagemap cat seems to.
I tried to revert the latest change from Horiguchi-san:
commit 5dc37642cbce34619e4588a9f0bdad1d2f870956
Author: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Date: Mon Dec 14 18:00:01 2009 -0800
mm hugetlb: add hugepage support to pagemap
but that didn't fix it, so it must be something else.
Haven't done further bisect or anything, should be easy to reproduce.
-Andi
--
ak@linux.intel.com -- Speaking for myself only.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
== 2 of 2 ==
Date: Sat, Jan 9 2010 10:40 pm
From: Américo Wang
On Sun, Jan 10, 2010 at 03:09:55AM +0100, Andi Kleen wrote:
>
>An LTP run on x86-64/2.6.33-rc3 run right now results in "proc01" hanging
>
>lr-x------ 1 root root 64 2010-01-10 02:28 7 -> /proc/2679/task/2679/pagemap
>
>strace shows an endless loop of
>
>read(7, "\0\0\0\0\0\0\0\6\0\0\0\0\0\0\0\6\0\0\0\0\0\0\0\6\0\0\0"..., 1024) = 1024
>read(7, "\0\0\0\0\0\0\0\6\0\0\0\0\0\0\0\6\0\0\0\0\0\0\0\6\0\0\0"..., 1024) = 1024
>read(7, "\0\0\0\0\0\0\0\6\0\0\0\0\0\0\0\6\0\0\0\0\0\0\0\6\0\0\0"..., 1024) = 1024
>read(7, "\0\0\0\0\0\0\0\6\0\0\0\0\0\0\0\6\0\0\0\0\0\0\0\6\0\0\0"..., 1024) = 1024
>read(7, "\0\0\0\0\0\0\0\6\0\0\0\0\0\0\0\6\0\0\0\0\0\0\0\6\0\0\0"..., 1024) = 1024
>read(7, "\0\0\0\0\0\0\0\6\0\0\0\0\0\0\0\6\0\0\0\0\0\0\0\6\0\0\0"..., 1024) = 1024
>read(7, "\0\0\0\0\0\0\0\6\0\0\0\0\0\0\0\6\0\0\0\0\0\0\0\6\0\0\0"..., 1024) = 10
>
>cat of that file hangs in the same way so it seems like a kernel bug.
>A cat of /proc/2679/pagemap also hangs, in fact any pagemap cat seems to.
>
Are you sure?
On my 32-bit machine it runs for a long time, but it does eventually exit.
According to the pagemap documentation, the file maps every virtual page
of a process. On x86-32 that is 4G/4K = 1024*1024 pages, and when you
'cat' it the tty layer is also involved in displaying the output.
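The 4G/4K arithmetic can be turned into a rough estimate of how many read() calls a full scan takes. This is a sketch only: it assumes one 64-bit entry per virtual page (consistent with the strace output repeating every 8 bytes) and the 1024-byte buffer seen in the strace. The 47-bit user address space used for the x86-64 case is an assumption for illustration, not something stated in the thread.

```python
# Back-of-the-envelope size of /proc/<pid>/pagemap for a given address space.
PAGE_SIZE = 4 * 1024   # 4K pages, as in the message
ENTRY_SIZE = 8         # assumption: one 64-bit pagemap entry per virtual page
READ_SIZE = 1024       # buffer size seen in the strace output


def pagemap_reads(address_space_bytes):
    """Return (pages, file_bytes, read_calls) for a full pagemap scan."""
    pages = address_space_bytes // PAGE_SIZE
    file_bytes = pages * ENTRY_SIZE
    read_calls = file_bytes // READ_SIZE
    return pages, file_bytes, read_calls


# x86-32: the 4G/4K figure from the message.
print(pagemap_reads(4 * 1024**3))   # (1048576, 8388608, 8192)

# Assumption: a 47-bit user address space on x86-64.
print(pagemap_reads(2**47))
```

Under these assumptions a 32-bit scan is about 8 MiB, while a 64-bit scan would be 2**38 bytes, which from user space could look like a loop that never ends even if each read() succeeds.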
--
Live like a child, think like the god.
==============================================================================
TOPIC: Make Intel 8-way Xeons boot again
http://groups.google.com/group/linux.kernel/t/db38ed3ce2e471d0?hl=en
==============================================================================
== 1 of 2 ==
Date: Sat, Jan 9 2010 6:40 pm
From: Ananth N Mavinakayanahalli
On Sat, Jan 09, 2010 at 01:13:39PM -0800, Yinghai Lu wrote:
> On Sat, Jan 9, 2010 at 2:10 AM, Ananth N Mavinakayanahalli
> <ananth@in.ibm.com> wrote:
> > On an 8-way system with Intel Xeon X7350 CPUs, booting 2.6.32 or newer
> > kernels fails at:
> >
> > ...
> > CPU0: Intel(R) Xeon(R) CPU X7350 @ 2.93GHz stepping 0b
> > Booting Node 0, Processors #1 #2 #3 #4 #5 #6 #7 Ok.
> > Brought up 8 CPUs
> > Total of 8 processors activated (46906.05 BogoMIPS).
> >
> > Git bisect showed 2fbd07a5f as the offending commit.
> >
> > With the patch below, I am able to boot the latest Linus' git tree on
> > the machine. If this patch is correct, it needs to get into the stable
> > tree too.
> >
> > Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
> > ---
> > Index: linux-2.6/arch/x86/kernel/apic/probe_64.c
> > ===================================================================
> > --- linux-2.6.orig/arch/x86/kernel/apic/probe_64.c 2010-01-09 14:54:29.000000000 +0530
> > +++ linux-2.6/arch/x86/kernel/apic/probe_64.c 2010-01-09 14:57:53.000000000 +0530
> > @@ -70,7 +70,7 @@
> > if (apic == &apic_flat) {
> > switch (boot_cpu_data.x86_vendor) {
> > case X86_VENDOR_INTEL:
> > - if (num_processors > 8)
> > + if (num_processors >= 8)
> > apic = &apic_physflat;
> > break;
> > case X86_VENDOR_AMD:
>
> can you send out whole bootlog with apic=debug?
Here it is:
Linux version 2.6.33-rc3-bsect (ananth@llm69.in.ibm.com) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-46)) #1 SMP Sun Jan 10 07:36:02 IST 2010
Command line: ro root=LABEL=/ rhgb console=tty0 console=ttyS0,9600n1 apic=debug
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 000000000009bc00 (usable)
BIOS-e820: 000000000009bc00 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 00000000bff4b480 (usable)
BIOS-e820: 00000000bff4b480 - 00000000bff57b40 (ACPI data)
BIOS-e820: 00000000bff57b40 - 00000000c0000000 (reserved)
BIOS-e820: 00000000d0000000 - 00000000e0000000 (reserved)
BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
BIOS-e820: 0000000100000000 - 0000000840000000 (usable)
NX (Execute Disable) protection: active
DMI 2.4 present.
No AGP bridge found
last_pfn = 0x840000 max_arch_pfn = 0x400000000
x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106
last_pfn = 0xbff4b max_arch_pfn = 0x400000000
Scan SMP from ffff880000000000 for 1024 bytes.
Scan SMP from ffff88000009fc00 for 1024 bytes.
Scan SMP from ffff8800000f0000 for 65536 bytes.
Scan SMP from ffff88000009bc00 for 1024 bytes.
found SMP MP-table at [ffff88000009bd40] 9bd40
mpc: 9d920-9dc84
init_memory_mapping: 0000000000000000-00000000bff4b000
init_memory_mapping: 0000000100000000-0000000840000000
RAMDISK: 37d4d000 - 37fef9e3
ACPI: RSDP 000000000009bde0 00014 (v00 M IB)
ACPI: RSDT 00000000bff57ac0 00044 (v01 IBM EXA01ZEU 00001000 IBM 45444F43)
ACPI: FACP 00000000bff57900 000F4 (v03 IBM EXA01ZEU 00001000 IBM 45444F43)
ACPI: DSDT 00000000bff4b480 021B5 (v01 IBM EXA01ZEU 00001000 INTL 20060707)
ACPI: FACS 00000000bff53780 00040
ACPI: APIC 00000000bff57800 000F4 (v01 IBM EXA01ZEU 00001000 IBM 45444F43)
ACPI: SRAT 00000000bff57700 00100 (v01 IBM EXA01ZEU 00001000 IBM 45444F43)
ACPI: HPET 00000000bff576c0 00038 (v01 IBM EXA01ZEU 00001000 IBM 45444F43)
ACPI: TCPA 00000000bff57640 00064 (v02 IBM EXA01ZEU 00001000 IBM 45444F43)
ACPI: MCFG 00000000bff57600 0003C (v01 IBM EXA01ZEU 00001000 IBM 45444F43)
ACPI: ERST 00000000bff537c0 00230 (v01 IBM EXA01ZEU 00001000 IBM 45444F43)
ACPI: SSDT 00000000bff4d640 05686 (v01 IBM VIGSSDT0 00001000 INTL 20060707)
SRAT: PXM 0 -> APIC 0x0c -> Node 0
SRAT: PXM 0 -> APIC 0x10 -> Node 0
SRAT: PXM 0 -> APIC 0x0d -> Node 0
SRAT: PXM 0 -> APIC 0x11 -> Node 0
SRAT: PXM 0 -> APIC 0x0e -> Node 0
SRAT: PXM 0 -> APIC 0x12 -> Node 0
SRAT: PXM 0 -> APIC 0x0f -> Node 0
SRAT: PXM 0 -> APIC 0x13 -> Node 0
SRAT: Node 0 PXM 0 0-c0000000
SRAT: Node 0 PXM 0 100000000-840000000
Bootmem setup node 0 0000000000000000-0000000840000000
NODE_DATA [0000000000028000 - 000000000002efff]
bootmap [0000000000100000 - 0000000000207fff] pages 108
(13 early reservations) ==> bootmem [0000000000 - 0840000000]
#0 [0000000000 - 0000001000] BIOS data page ==> [0000000000 - 0000001000]
#1 [0001000000 - 0001d4b2b0] TEXT DATA BSS ==> [0001000000 - 0001d4b2b0]
#2 [0037d4d000 - 0037fef9e3] RAMDISK ==> [0037d4d000 - 0037fef9e3]
#3 [0001d4c000 - 0001d4c350] BRK ==> [0001d4c000 - 0001d4c350]
#4 [000009bc00 - 000009bd40] BIOS reserved ==> [000009bc00 - 000009bd40]
#5 [000009bd40 - 000009bd50] MP-table mpf ==> [000009bd40 - 000009bd50]
#6 [000009bd50 - 000009d920] BIOS reserved ==> [000009bd50 - 000009d920]
#7 [000009dc84 - 0000100000] BIOS reserved ==> [000009dc84 - 0000100000]
#8 [000009d920 - 000009dc84] MP-table mpc ==> [000009d920 - 000009dc84]
#9 [0000001000 - 0000003000] TRAMPOLINE ==> [0000001000 - 0000003000]
#10 [0000003000 - 0000007000] ACPI WAKEUP ==> [0000003000 - 0000007000]
#11 [0000008000 - 000000b000] PGTABLE ==> [0000008000 - 000000b000]
#12 [000000b000 - 0000028000] PGTABLE ==> [000000b000 - 0000028000]
Zone PFN ranges:
DMA 0x00000000 -> 0x00001000
DMA32 0x00001000 -> 0x00100000
Normal 0x00100000 -> 0x00840000
Movable zone start PFN for each node
early_node_map[3] active PFN ranges
0: 0x00000000 -> 0x0000009b
0: 0x00000100 -> 0x000bff4b
0: 0x00100000 -> 0x00840000
ACPI: PM-Timer IO Port: 0x588
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x0c] enabled)
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x10] enabled)
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x0d] enabled)
ACPI: LAPIC (acpi_id[0x03] lapic_id[0x11] enabled)
ACPI: LAPIC (acpi_id[0x04] lapic_id[0x0e] enabled)
ACPI: LAPIC (acpi_id[0x05] lapic_id[0x12] enabled)
ACPI: LAPIC (acpi_id[0x06] lapic_id[0x0f] enabled)
ACPI: LAPIC (acpi_id[0x07] lapic_id[0x13] enabled)
ACPI: LAPIC_NMI (acpi_id[0x00] dfl dfl lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x01] dfl dfl lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x02] dfl dfl lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x03] dfl dfl lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x04] dfl dfl lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x05] dfl dfl lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x06] dfl dfl lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x07] dfl dfl lint[0x1])
ACPI: IOAPIC (id[0x0f] address[0xfec00000] gsi_base[0])
IOAPIC[0]: apic_id 15, version 32, address 0xfec00000, GSI 0-23
ACPI: IOAPIC (id[0x10] address[0xfecff000] gsi_base[24])
IOAPIC[1]: apic_id 16, version 17, address 0xfecff000, GSI 24-26
ACPI: IOAPIC (id[0x0e] address[0xfec01000] gsi_base[27])
IOAPIC[2]: apic_id 14, version 17, address 0xfec01000, GSI 27-62
ACPI: IOAPIC (id[0x0d] address[0xfec02000] gsi_base[63])
IOAPIC[3]: apic_id 13, version 17, address 0xfec02000, GSI 63-98
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 8 global_irq 8 high edge)
ACPI: INT_SRC_OVR (bus 0 bus_irq 14 global_irq 14 high edge)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
Using ACPI (MADT) for SMP configuration information
ACPI: HPET id: 0x10142201 base: 0xfde84000
SMP: Allowing 8 CPUs, 0 hotplug CPUs
mapped APIC to ffffffffff5fc000 (fee00000)
mapped IOAPIC to ffffffffff5fb000 (fec00000)
mapped IOAPIC to ffffffffff5fa000 (fecff000)
mapped IOAPIC to ffffffffff5f9000 (fec01000)
mapped IOAPIC to ffffffffff5f8000 (fec02000)
Allocating PCI resources starting at e0000000 (gap: e0000000:1ec00000)
setup_percpu: NR_CPUS:255 nr_cpumask_bits:255 nr_cpu_ids:8 nr_node_ids:1
PERCPU: Embedded 27 pages/cpu @ffff880028200000 s80280 r8192 d22120 u262144
pcpu-alloc: s80280 r8192 d22120 u262144 alloc=1*2097152
pcpu-alloc: [0] 0 1 2 3 4 5 6 7
Built 1 zonelists in Zone order, mobility grouping on. Total pages: 8269916
Policy zone: Normal
Kernel command line: ro root=LABEL=/ rhgb console=tty0 console=ttyS0,9600n1 apic=debug
PID hash table entries: 4096 (order: 3, 32768 bytes)
Checking aperture...
No AGP bridge found
Memory: 33010860k/34603008k available (3066k kernel code, 1049704k absent, 542444k reserved, 5027k data, 476k init)
Hierarchical RCU implementation.
NR_IRQS:4352
Console: colour VGA+ 80x25
console [tty0] enabled
console [ttyS0] enabled
Fast TSC calibration using PIT
Detected 2931.853 MHz processor.
Calibrating delay loop (skipped), value calculated using timer frequency.. 5863.70 BogoMIPS (lpj=2931853)
Security Framework initialized
SELinux: Initializing.
Dentry cache hash table entries: 4194304 (order: 13, 33554432 bytes)
Inode-cache hash table entries: 2097152 (order: 12, 16777216 bytes)
Mount-cache hash table entries: 256
CPU: Physical Processor ID: 3
CPU: Processor Core ID: 0
mce: CPU supports 6 MCE banks
CPU0: Thermal monitoring enabled (TM1)
using mwait in idle threads.
Performance Events: Core2 events, Intel PMU driver.
... version: 2
... bit width: 40
... generic registers: 2
... value mask: 000000ffffffffff
... max period: 000000007fffffff
... fixed-purpose events: 3
... event mask: 0000000700000003
ACPI: Core revision 20091214
Setting APIC routing to flat
Getting VERSION: 50014
Getting VERSION: 50014
Getting ID: c000000
Getting ID: f3000000
Getting LVT0: 700
Getting LVT1: 400
enabled ExtINT on CPU#0
ESR value before enabling vector: 0x00000040 after: 0x00000000
ENABLING IO-APIC IRQs
..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
CPU0: Intel(R) Xeon(R) CPU X7350 @ 2.93GHz stepping 0b
Using local APIC timer interrupts.
calibrating APIC timer ...
... lapic delta = 1665619
... PM-Timer delta = 357899
... PM-Timer result ok
..... delta 1665619
..... mult: 71548666
..... calibration result: 266499
..... CPU clock speed is 2931.0489 MHz.
..... host bus clock speed is 266.0499 MHz.
Booting Node 0, Processors #1masked ExtINT on CPU#1
#2masked ExtINT on CPU#2
#3masked ExtINT on CPU#3
#4masked ExtINT on CPU#4
#5masked ExtINT on CPU#5
#6masked ExtINT on CPU#6
#7 Ok.
masked ExtINT on CPU#7
Brought up 8 CPUs
Total of 8 processors activated (46905.61 BogoMIPS).
== 2 of 2 ==
Date: Sat, Jan 9 2010 10:40 pm
From: Yinghai Lu
On Sat, Jan 9, 2010 at 6:30 PM, Ananth N Mavinakayanahalli
<ananth@in.ibm.com> wrote:
> On Sat, Jan 09, 2010 at 01:13:39PM -0800, Yinghai Lu wrote:
>> On Sat, Jan 9, 2010 at 2:10 AM, Ananth N Mavinakayanahalli
>> <ananth@in.ibm.com> wrote:
>> > On an 8-way system with Intel Xeon X7350 CPUs, booting 2.6.32 or newer
>> > kernels fails at:
>> >
>> > ...
>> > CPU0: Intel(R) Xeon(R) CPU X7350 @ 2.93GHz stepping 0b
>> > Booting Node 0, Processors #1 #2 #3 #4 #5 #6 #7 Ok.
>> > Brought up 8 CPUs
>> > Total of 8 processors activated (46906.05 BogoMIPS).
>> >
>> > Git bisect showed 2fbd07a5f as the offending commit.
>> >
>> > With the patch below, I am able to boot the latest Linus' git tree on
>> > the machine. If this patch is correct, it needs to get into the stable
>> > tree too.
>> >
>> > Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
>> > ---
>> > Index: linux-2.6/arch/x86/kernel/apic/probe_64.c
>> > ===================================================================
>> > --- linux-2.6.orig/arch/x86/kernel/apic/probe_64.c 2010-01-09 14:54:29.000000000 +0530
>> > +++ linux-2.6/arch/x86/kernel/apic/probe_64.c 2010-01-09 14:57:53.000000000 +0530
>> > @@ -70,7 +70,7 @@
>> > if (apic == &apic_flat) {
>> > switch (boot_cpu_data.x86_vendor) {
>> > case X86_VENDOR_INTEL:
>> > - if (num_processors > 8)
>> > + if (num_processors >= 8)
>> > apic = &apic_physflat;
>> > break;
>> > case X86_VENDOR_AMD:
>>
>> can you send out whole bootlog with apic=debug?
>
> Here it is:
> ACPI: LAPIC (acpi_id[0x00] lapic_id[0x0c] enabled)
> ACPI: LAPIC (acpi_id[0x01] lapic_id[0x10] enabled)
> ACPI: LAPIC (acpi_id[0x02] lapic_id[0x0d] enabled)
> ACPI: LAPIC (acpi_id[0x03] lapic_id[0x11] enabled)
> ACPI: LAPIC (acpi_id[0x04] lapic_id[0x0e] enabled)
> ACPI: LAPIC (acpi_id[0x05] lapic_id[0x12] enabled)
> ACPI: LAPIC (acpi_id[0x06] lapic_id[0x0f] enabled)
> ACPI: LAPIC (acpi_id[0x07] lapic_id[0x13] enabled)
...
> Setting APIC routing to flat
> Getting VERSION: 50014
> Getting VERSION: 50014
> Getting ID: c000000
> Getting ID: f3000000
> Getting LVT0: 700
> Getting LVT1: 400
> enabled ExtINT on CPU#0
> ESR value before enabling vector: 0x00000040 after: 0x00000000
> ENABLING IO-APIC IRQs
> ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
> CPU0: Intel(R) Xeon(R) CPU X7350 @ 2.93GHz stepping 0b
...
The BSP's physical APIC ID is 0x0c instead of 0.
Not sure whether Suresh tested that case.
YH
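To make the observation concrete, the sketch below pulls the LAPIC IDs out of the boot log lines quoted in this thread and models the Intel branch of the probe_64.c hunk before and after the proposed patch. The helper names (`lapic_ids`, `pick_apic`) are made up for illustration. Treating the first LAPIC entry as the BSP is an assumption that happens to match the 0x0c mentioned above.

```python
import re

# LAPIC lines copied from the boot log posted in this thread.
BOOT_LOG = """\
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x0c] enabled)
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x10] enabled)
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x0d] enabled)
ACPI: LAPIC (acpi_id[0x03] lapic_id[0x11] enabled)
ACPI: LAPIC (acpi_id[0x04] lapic_id[0x0e] enabled)
ACPI: LAPIC (acpi_id[0x05] lapic_id[0x12] enabled)
ACPI: LAPIC (acpi_id[0x06] lapic_id[0x0f] enabled)
ACPI: LAPIC (acpi_id[0x07] lapic_id[0x13] enabled)
"""

LAPIC_RE = re.compile(r"acpi_id\[(0x[0-9a-f]+)\] lapic_id\[(0x[0-9a-f]+)\] enabled")


def lapic_ids(log):
    """Return the physical LAPIC IDs in the order the log lists them."""
    return [int(m.group(2), 16) for m in LAPIC_RE.finditer(log)]


def pick_apic(num_processors, patched):
    """Model of the Intel branch in the quoted hunk (not the kernel code)."""
    over_limit = num_processors >= 8 if patched else num_processors > 8
    return "physflat" if over_limit else "flat"


ids = lapic_ids(BOOT_LOG)
print([hex(i) for i in ids])
print("first (assumed BSP) id:", hex(ids[0]))
print(pick_apic(len(ids), patched=False), pick_apic(len(ids), patched=True))
```

With 8 CPUs the unpatched comparison keeps flat routing and the patched one switches to physflat. None of the physical IDs fall in the 0..7 range.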
==============================================================================
TOPIC: BAR 0: can't allocate resource
http://groups.google.com/group/linux.kernel/t/fcc38db87a1444c0?hl=en
==============================================================================
== 1 of 1 ==
Date: Sat, Jan 9 2010 7:10 pm
From: Alex Brooks
> I was hoping for a kernel directly from Linus' git repo, e.g.,
> 2.6.33-rc3; I don't think the debug output I'm thinking about went in
> until after 2.6.32 was released.
I'm not sure how to modify the 2.6.33-rc3 kernel source exactly re MCFG -- the
dmesg output for 2.6.33-rc3 is quite different (attached), including some new
information about an address collision for the recalcitrant device:
pci 0000:01:04.0: address space collision: [mem 0x00800000-0x00800fff] already
in use
pci 0000:01:04.0: can't reserve [mem 0x00800000-0x00800fff]
Does this shed any more light on things (or can you tell me what I could
modify to get better debug info)?
Thanks,
Alex
==============================================================================
TOPIC: ksym_tracer: Fix to make the tracer work
http://groups.google.com/group/linux.kernel/t/d229f17fd7404338?hl=en
==============================================================================
== 1 of 1 ==
Date: Sat, Jan 9 2010 7:20 pm
From: Frederic Weisbecker
On Thu, Dec 31, 2009 at 01:54:35PM +0800, Li Zefan wrote:
> K.Prasad wrote:
> > Frederic must have used for_each_possible_cpu() to account for CPUs that
> > are offline at the time of registration, but may eventually turn online.
> > Since register_wide_hw_breakpoint() interface is designed to deliver
> > system-wide breakpoints, the debug registers of a newly onlined CPU
> > should have the breakpoints populated to comprehensively notify all
> > memory accesses over target address.
> >
> > I'd rather wait to hear from Frederic to know why
> > perf_event_create_kernel_counter() returns an error when run for an
> > offline cpu and how it can be solved.
> >
>
> See the comment in find_get_context() in kernel/perf_event.c:
>
> /*
> * We could be clever and allow to attach a event to an
> * offline CPU and activate it when the CPU comes up, but
> * that's for later.
> */
> if (!cpu_online(cpu))
> return ERR_PTR(-ENODEV);
>
> So I think we can use for_each_possible_cpu() in the future, but not now.
>
Ah, right I indeed missed that.
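The effect of that cpu_online() check can be modelled in a few lines. This is a toy illustration, not kernel code: the CPU sets are hypothetical, and `create_kernel_counter` and `register_wide` are stand-ins that mirror only the quoted check.

```python
import errno

# Hypothetical topology for illustration: 4 possible CPUs, CPU 3 offline.
POSSIBLE_CPUS = [0, 1, 2, 3]
ONLINE_CPUS = {0, 1, 2}


def create_kernel_counter(cpu):
    """Toy stand-in for perf_event_create_kernel_counter(); it mirrors only
    the cpu_online() check quoted from find_get_context()."""
    if cpu not in ONLINE_CPUS:
        return -errno.ENODEV
    return cpu  # any non-negative value stands for a valid event here


def register_wide(cpus):
    """Register on each CPU in turn and stop at the first error."""
    for cpu in cpus:
        ret = create_kernel_counter(cpu)
        if ret < 0:
            return ret
    return 0


print(register_wide(POSSIBLE_CPUS))        # -ENODEV: CPU 3 is offline
print(register_wide(sorted(ONLINE_CPUS)))  # 0
```

Iterating over possible CPUs fails as soon as an offline CPU is reached, which is why iterating over online CPUs is the workable choice for now.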
==============================================================================
TOPIC: High cpu temperature with 2.6.32, bisection shows commit 69d258 (fwd)
http://groups.google.com/group/linux.kernel/t/64eb218a25568a2e?hl=en
==============================================================================
== 1 of 2 ==
Date: Sat, Jan 9 2010 7:20 pm
From: Robert Hancock
On 01/09/2010 08:07 PM, Ray Lee wrote:
> On Sat, Jan 9, 2010 at 4:42 PM, Arjan van de Ven<arjan@infradead.org> wrote:
>> basically it appears that your machine, when the kernel asks for C2,
>> exits C2 immediately again.
>>
>> The old algorithm somehow caught this and stopped asking for C2 most of
>> the time; the new algorithm doesn't see any activity and asks for C2
>> again.
>
> This change of behavior will certainly bite more users out there. Is
> there any way we can detect the systems that aren't honoring the C2
> request and limit back to C1?
That seems like it would be a better approach, rather than adding to a
DMI list which is almost certainly incomplete. We've got too many DMI
special cases in the kernel already.
== 2 of 2 ==
Date: Sat, Jan 9 2010 8:20 pm
From: Arjan van de Ven
On Sat, 9 Jan 2010 18:07:14 -0800
Ray Lee <ray-lk@madrabbit.org> wrote:
> On Sat, Jan 9, 2010 at 4:42 PM, Arjan van de Ven
> <arjan@infradead.org> wrote:
> > basically it appears that your machine, when the kernel asks for C2,
> > exits C2 immediately again.
> >
> > The old algorithm somehow caught this and stopped asking for C2
> > most of the time; the new algorithm doesn't see any activity and
> > asks for C2 again.
>
> This change of behavior will certainly bite more users out there. Is
> there any way we can detect the systems that aren't honoring the C2
> request and limit back to C1?
it's not very likely that there are many such systems; it takes work to
break C2....
so far in 6 months 2 systems showed up, and this includes a fedora
release.
on the other hand, it's not so easy to detect the situation; exiting c2
quickly can also happen in normal use, so we'd have to have some sort of
threshold, which will be fragile by itself.
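
A threshold-based detector of the kind discussed here might look like the sketch below. Everything in it is hypothetical: the function name, the 5 us cutoff, the sample count and the 95% ratio are invented for illustration, and no such check is claimed to exist in the kernel.

```python
def c2_looks_broken(residencies_us, short_us=5, min_samples=100,
                    max_short_ratio=0.95):
    """Flag C2 as suspect when nearly every entry exits almost immediately.

    residencies_us: observed C2 residency times in microseconds.
    All thresholds are made-up illustration values.
    """
    if len(residencies_us) < min_samples:
        return False  # too little data to judge
    short = sum(1 for r in residencies_us if r < short_us)
    return short / len(residencies_us) > max_short_ratio


broken = [1] * 200                   # every C2 entry exits after about 1 us
busy = [1] * 120 + [400] * 80        # healthy but interrupt-heavy: 60% short
borderline = [1] * 192 + [400] * 8   # 96% short exits

print(c2_looks_broken(broken), c2_looks_broken(busy), c2_looks_broken(borderline))
```

The borderline case shows the fragility: a healthy machine under a heavy interrupt load could cross the same ratio as a machine whose C2 is broken.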
--
Arjan van de Ven Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
==============================================================================
TOPIC: [Bugfix][x86][hw-breakpoint] Fix return-code to notifier chain in hw_breakpoint_handler
http://groups.google.com/group/linux.kernel/t/1dfa78998a656c21?hl=en
==============================================================================
== 1 of 2 ==
Date: Sat, Jan 9 2010 7:20 pm
From: Frederic Weisbecker
On Fri, Jan 01, 2010 at 12:32:17AM +0530, K.Prasad wrote:
> On Thu, Dec 31, 2009 at 01:38:09AM +0100, Frederic Weisbecker wrote:
> > On Sat, Dec 26, 2009 at 11:58:33PM +0530, K.Prasad wrote:
> > > The hw-breakpoint handler will return NOTIFY_DONE for user-space breakpoints
> > > to generate SIGTRAP signal (and not for kernel-space addresses).
> > >
> > > Signed-off-by: K.Prasad <prasad@linux.vnet.ibm.com>
> > > ---
> > > arch/x86/kernel/hw_breakpoint.c | 9 +++++++--
> > > 1 file changed, 7 insertions(+), 2 deletions(-)
> > >
> > > Index: linux-2.6-tip/arch/x86/kernel/hw_breakpoint.c
> > > ===================================================================
> > > --- linux-2.6-tip.orig/arch/x86/kernel/hw_breakpoint.c
> > > +++ linux-2.6-tip/arch/x86/kernel/hw_breakpoint.c
> > > @@ -502,8 +502,6 @@ static int __kprobes hw_breakpoint_handl
> > > rcu_read_lock();
> > >
> > > bp = per_cpu(bp_per_reg[i], cpu);
> > > - if (bp)
> > > - rc = NOTIFY_DONE;
> > > /*
> > > * Reset the 'i'th TRAP bit in dr6 to denote completion of
> > > * exception handling
> > > @@ -517,6 +515,13 @@ static int __kprobes hw_breakpoint_handl
> > > rcu_read_unlock();
> > > break;
> > > }
> > > + /*
> > > + * Further processing in do_debug() is needed for a) user-space
> > > + * breakpoints (to generate signals) and b) when the system has
> > > + * taken exception due to multiple causes
> > > + */
> > > + if (bp->attr.bp_addr < TASK_SIZE)
> > > + rc = NOTIFY_DONE;
> > >
> > > perf_bp_event(bp, args->regs);
> > >
> > >
> >
> >
> > Oh and now that I see this patch, the previous one indeed makes sense
> > with this check:
> >
> > if (dr6 & (~DR_TRAP_BITS))
> > rc = NOTIFY_DONE;
> >
> > That said, it means thread.debugreg6 won't get the reserved bits anymore.
> > I see some use of them from kvm (it restores the reserved bits on guest<->host
> > switch). Not sure if this inconsistency could affect kvm...
> >
>
> Can you point me to the relevant code?
I see various uses of DR6_VOLATILE and DR6_FIXED_1 in arch/x86/kvm/,
DR6_FIXED_1 being the fixed unused bits in dr6. Not sure how
this patch would affect what's set there.
I'll wait for Jan's answer.
Thanks.
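
The return-code logic under discussion can be modelled roughly as follows. This is a simplified sketch for a single triggered breakpoint, not the handler itself. It assumes dr6 has already had its reserved bits cleared, as the companion patch proposes. The TASK_SIZE value is an illustrative x86-64 figure chosen for the example.

```python
NOTIFY_DONE = 0x0000
NOTIFY_STOP = 0x8001            # NOTIFY_OK | NOTIFY_STOP_MASK
DR_TRAP_BITS = 0xF              # DR_TRAP0..DR_TRAP3
DR_STEP = 0x4000                # single-step bit in DR6
TASK_SIZE = 0x00007FFFFFFFF000  # assumption: illustrative x86-64 value


def handler_return_code(dr6, bp_addr):
    """Model of the notifier return code for one triggered breakpoint."""
    if dr6 & ~DR_TRAP_BITS:
        # Some other debug exception cause is pending: let do_debug() go on.
        return NOTIFY_DONE
    if bp_addr < TASK_SIZE:
        # User-space breakpoint: do_debug() must still generate the signal.
        return NOTIFY_DONE
    return NOTIFY_STOP          # kernel-space breakpoint, fully handled


print(hex(handler_return_code(0x1, 0x1234567)))                    # user address
print(hex(handler_return_code(0x1, 0xFFFFFFFF81000000)))           # kernel address
print(hex(handler_return_code(0x1 | DR_STEP, 0xFFFFFFFF81000000))) # plus single-step
```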
== 2 of 2 ==
Date: Sat, Jan 9 2010 7:30 pm
From: Frederic Weisbecker
On Fri, Jan 01, 2010 at 12:19:49AM +0530, K.Prasad wrote:
> On Thu, Dec 31, 2009 at 12:45:00AM +0100, Frederic Weisbecker wrote:
> > On Sat, Dec 26, 2009 at 11:57:25PM +0530, K.Prasad wrote:
> > > Clear the reserved bits from the stored copy of debug status register (DR6).
> > > This will help easy bitwise operations.
> > >
> > > Signed-off-by: K.Prasad <prasad@linux.vnet.ibm.com>
> > > ---
> > > arch/x86/include/asm/debugreg.h | 3 +++
> > > arch/x86/kernel/traps.c | 3 +++
> > > 2 files changed, 6 insertions(+)
> > >
> > > Index: linux-2.6-tip/arch/x86/include/asm/debugreg.h
> > > ===================================================================
> > > --- linux-2.6-tip.orig/arch/x86/include/asm/debugreg.h
> > > +++ linux-2.6-tip/arch/x86/include/asm/debugreg.h
> > > @@ -14,6 +14,9 @@
> > > which debugging register was responsible for the trap. The other bits
> > > are either reserved or not of interest to us. */
> > >
> > > +/* Define reserved bits in DR6 which are always set to 1 */
> > > +#define DR6_RESERVED (0xFFFF0FF0)
> > > +
> >
> >
> > The 12th bit seems to be also reserved.
> > Shouldn't it be 0xffff1ff0 ?
> >
>
> The 12th bit is reserved to be 0 always.
Ah, ok.
> > What kind of bitwise operations do you think it could help?
> >
> > All of the operations I can find on dr6 are simple masks
> > test/set/clear.
> >
>
> As you found out later, this bitmask helps us in
> hw_breakpoint_handler().
Yeah, ok. Just waiting for Jan's answer to be sure it has
no side effects :)
Thanks.
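
The bit layout of the two masks discussed here is easy to check by hand. The snippet below is plain bit arithmetic on the values from the thread; `raw_dr6` is a hypothetical register value used only to show what clearing the reserved bits leaves behind.

```python
DR6_RESERVED = 0xFFFF0FF0   # mask proposed in the patch


def set_bits(mask):
    """Return the positions of the bits set in a 32-bit mask."""
    return [b for b in range(32) if (mask >> b) & 1]


bits = set_bits(DR6_RESERVED)
print(bits)                            # bits 4-11 and 16-31
print(12 in bits)                      # False: bit 12 is not part of the mask
print(hex(0xFFFF1FF0 ^ DR6_RESERVED))  # 0x1000: the two masks differ only in bit 12

raw_dr6 = 0xFFFF0FF1                   # hypothetical: always-1 bits plus trap bit 0
print(hex(raw_dr6 & ~DR6_RESERVED))    # 0x1: only the trap bit remains
```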
==============================================================================
TOPIC: Introduce register_user_hbp_by_pid() and unregister_user_hbp_by_pid()
http://groups.google.com/group/linux.kernel/t/197d4e49668b93ac?hl=en
==============================================================================
== 1 of 1 ==
Date: Sat, Jan 9 2010 8:10 pm
From: Frederic Weisbecker
On Fri, Jan 01, 2010 at 12:18:04AM +0530, K.Prasad wrote:
> On Wed, Dec 30, 2009 at 11:28:39PM +0100, Frederic Weisbecker wrote:
> > On Tue, Dec 22, 2009 at 12:16:31AM +0530, K.Prasad wrote:
> > > On Fri, Dec 18, 2009 at 09:47:48PM +0100, Frederic Weisbecker wrote:
> >
> > > > And this function needs rcu too.
> > > >
> > > > I don't see any in-kernel user for this new feature.
> > > > That would be required to integrate it.
> > > >
> > >
> > > The proposed interfaces, as obvious, are mere wrappers over existing
> > > (un)register_user_* interfaces, and don't do anything vastly different
> > > in order to demonstrate them separately.
> > >
> > > I can get a sample kernel module ready - that consumes pid and user-space
> > > address to track write accesses, if you prefer it.
> >
> >
> > Ok. The code looks good and useful.
> >
> > But the usual philosophy in the kernel is to not add code
> > that is left unused upstream. And samples don't substitute a user.
> > I'm not sure this is a good idea to merge this.
> >
>
> Back to the old trick!...How about an ftrace plugin that accepts pid,
> user-space address and memory access type and traces all the IP
> addresses that caused access?
>
> echo <pid>:<user_addr>:<access_type> > usym_trace_filter
> echo 567:0x1234567:rw- > usym_trace_filter
>
> Breakpoint IP
> ------------ ---------
> 567:0x1234567 0x0abcdef
>
> I'm unsure if it sounds interesting at all, but I suspect it wouldn't be
> as easy as above to gather the shown information through any existing
> tools.
Tracing userspace breakpoints is a good idea, but:
I think the perf interface to use breakpoints is much more powerful
than the breakpoint ftrace plugin, which is somewhat deprecated now.
The fact is that ftrace plugins in general (I mean the plugins based
on struct tracer, not the trace events) are mostly deprecated in favor
of trace events.
There are still some areas where the plugins are necessary though, such as
function tracing, and some other tracers. But these are exceptions.
The breakpoint interface was one of them, but now we have
a much more powerful interface for it in the perf tools.
When it comes to improving an ftrace plugin or adding a new one,
if we know of an existing interface that has proven better, let's
improve that interface instead (that's why I'll try to convert the
function graph tracer into a trace event...not an easy task).
The breakpoint ftrace plugin can only trace the whole kernel, has no
per-cpu or per task (+inheritance) granularity, backtraces,
or whatever the perf tools can already offer.
To sum-up, sure we could improve it but:
- we already have better, as a unified and easier to improve interface.
Extending this ftrace plugin would make it even harder to maintain.
- ftrace plugins are deprecated, except for particular cases
So, concerning breakpoints, I really suggest to focus on the perf tools
and deprecate this breakpoint ftrace plugin.
Also, I'm pretty sure that this would already work:
./perf record -e mem:@addr_in_ls:rw ls /usr
or:
./perf record -e mem:@addr_in_ls:rw --pid $(pid_of_a_running_ls)
I've never tested it, but if that doesn't work, that's probably because
of a guardian inside the kernel that only accepts userspace breakpoints
from ptraced processes. I should check that. But if it doesn't work yet,
that would require very few changes for it to work.
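
As a rough illustration of what such a perf-based user-space breakpoint looks like at the system-call level, here is a minimal sketch using perf_event_open(2) with a PERF_TYPE_BREAKPOINT event. It is an assumption-laden example rather than what the perf tool does internally: the helper names `fill_rw_breakpoint_attr()` and `open_rw_breakpoint()` are hypothetical, the 4-byte length and the sampling settings are arbitrary choices, and whether the kernel accepts the request for a non-ptraced task is exactly the open question raised above.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>
#include <linux/hw_breakpoint.h>
#include <linux/perf_event.h>

/* Describe a 4-byte read/write hardware breakpoint on a user-space address. */
void fill_rw_breakpoint_attr(struct perf_event_attr *attr, uint64_t addr)
{
    assert(attr != NULL);

    memset(attr, 0, sizeof(*attr));
    attr->type = PERF_TYPE_BREAKPOINT;
    attr->size = sizeof(*attr);
    attr->bp_type = HW_BREAKPOINT_RW;    /* trigger on reads and writes */
    attr->bp_addr = addr;
    attr->bp_len = HW_BREAKPOINT_LEN_4;  /* assumed width of the watched data */
    attr->sample_period = 1;             /* record every hit */
    attr->sample_type = PERF_SAMPLE_IP;  /* IP of the accessing instruction */
    attr->exclude_kernel = 1;
    attr->exclude_hv = 1;
}

/*
 * Attach the breakpoint to the task identified by pid, on any CPU.
 * Returns an event file descriptor, or -1 with errno set.
 */
int open_rw_breakpoint(pid_t pid, uint64_t addr)
{
    struct perf_event_attr attr;

    fill_rw_breakpoint_attr(&attr, addr);
    return (int)syscall(SYS_perf_event_open, &attr, pid, -1, -1, 0UL);
}
```

The returned descriptor would then be mmap()ed or read like any other perf event; error handling and permission checks are left out of the sketch.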
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
==============================================================================
TOPIC: MSI broken in libata?
http://groups.google.com/group/linux.kernel/t/a859f84b78879515?hl=en
==============================================================================
== 1 of 1 ==
Date: Sat, Jan 9 2010 8:40 pm
From: Torsten Kaiser
On Sat, Jan 9, 2010 at 10:11 AM, Tejun Heo <tj@kernel.org> wrote:
> On 12/25/2009 06:22 PM, Torsten Kaiser wrote:
>> As reported in http://lkml.org/lkml/2009/12/19/82 the new MSI support
>> for sata_sil24 does not work for me.
>> This is still the same with 2.6.33-rc2.
>>
>> Why I think this might be a problem within libata:
>> * other drivers can use MSI successfully on my system (tg3, radeon, hda-intel)
>> * happens both in sata_sil24 and sata_nv
>> * the count in /proc/interrupts increases for the MSIs assigned to
>> sata_sil24/sata_nv, so interrupt delivery seems to work
>> * only writing seems to fail
>
> How does it fail? Timeouts?
Yes, timeouts.
I posted the error messages in http://lkml.org/lkml/2009/12/19/82 and
http://lkml.org/lkml/2010/1/6/60
> Also, ahci enables MSI by default if
> available and works fine on many configurations so I don't think
> anything in libata core layer is broken regarding MSI (there just
> isn't anything which can break).
The system I'm using does not have an ahci-compatible controller, so I
could not compare this.
(And my other system, which does use ahci, does not use MSI for it.)
I just found it suspicious that 3 other drivers (tg3, hda-intel and
radeon) can use MSI, but both of the libata drivers (sata_sil24 and
sata_nv) fail in a similar way.
I did try the patch from Robert Hancock in
http://lkml.org/lkml/2010/1/6/417, but without success.
If you need any more information, or have something for me to try,
please just ask. I did look at the code and the documentation about
enabling MSI, but did not see anything (obviously) wrong, so I don't
know what to try next.
Torsten
==============================================================================
TOPIC: vfs: plug some holes involving LAST_BIND symlinks and file bind mounts (
try #5)
http://groups.google.com/group/linux.kernel/t/1d3b43642dea48cb?hl=en
==============================================================================
== 1 of 1 ==
Date: Sat, Jan 9 2010 8:50 pm
From: Al Viro
On Fri, Jan 01, 2010 at 04:40:27PM +0100, Pavel Machek wrote:
> > Access rights belong to file, not to a pathname (and there's no such thing
> > as _the_ pathname of a file).
> >
> > I'd buy that as a minor QoI issue; as a security one - no way.
>
> Ok, so you see it as a (QoI) problem, but not too major. Good; I hope
> it gets fixed one day.
Actually, I'm not even sure that it *is* worse than what we'd get after
such change. Note that it's not just about trying to reopen a file
currently opened r/o for write; there's the opposite case. We'd break
scripts that try to read /dev/stderr and expect to be called with stderr
redirected to caller-writable file. With redirects done with 2> and not
2<>. Sure, it's a lousy practice. And scripts in question are not
well-written in general. Downright unmaintainable, in fact. Written
by sysadmin that had left the job five years ago and can't be located,
even if he could be bribed into touching That Shite(tm) ever again.
We have far lousier kinds of behaviour we can't fix for compatibility
reasons. O_CREAT on dangling symlinks, for one. We tried to switch to
sane variant (from the current "create file wherever that symlink points
to") and had to revert due to userland crap that actually relied on that
insanity.
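
The O_CREAT behaviour mentioned above can be observed from user space with a small test. The sketch below is only an illustration under stated assumptions: the function name `creat_follows_dangling_symlink()` is made up, it assumes a writable /tmp, and it checks the behaviour of whatever kernel it runs on rather than asserting what any particular version does.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Returns 1 if open(O_CREAT) through a dangling symlink created the
 * symlink's target, 0 if it did not, and -1 if the test setup failed.
 */
int creat_follows_dangling_symlink(void)
{
    char dir[] = "/tmp/dangling-XXXXXX";
    char target[64], linkpath[64];
    struct stat st;
    int fd, n, created;

    if (mkdtemp(dir) == NULL)
        return -1;

    n = snprintf(target, sizeof(target), "%s/target", dir);
    assert(n > 0 && (size_t)n < sizeof(target));
    n = snprintf(linkpath, sizeof(linkpath), "%s/link", dir);
    assert(n > 0 && (size_t)n < sizeof(linkpath));

    /* "link" points at "target", which does not exist yet. */
    if (symlink(target, linkpath) != 0) {
        rmdir(dir);
        return -1;
    }

    /* O_CREAT without O_EXCL: the open is done on the symlink itself. */
    fd = open(linkpath, O_WRONLY | O_CREAT, 0600);
    if (fd >= 0)
        close(fd);

    /* Did a regular file appear where the symlink pointed? */
    created = (lstat(target, &st) == 0 && S_ISREG(st.st_mode));

    unlink(target);
    unlink(linkpath);
    rmdir(dir);
    return created;
}
```

With the "create file wherever that symlink points to" semantics described above, the function is expected to return 1.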
==============================================================================
TOPIC: generic sys_old_select
http://groups.google.com/group/linux.kernel/t/2944132d1dd8ed87?hl=en
==============================================================================
== 1 of 1 ==
Date: Sat, Jan 9 2010 9:00 pm
From: Al Viro
On Fri, Jan 08, 2010 at 02:22:41PM -0800, H. Peter Anvin wrote:
> > If we do that let's do it consistently for various old syscalls, not
> > an odd one out.
>
> Yes, and it would be a good idea to do so, rather than hiding all these
> compatibility calls in all kind of random places.
>
> There is, however, a reason *not* to do it which should be carefully
> considered: by co-locating the compatibility version with the modern
> version, it gets access to static functions that are part of the
> implementation of the modern version. If we move the compatibility
> versions out, it may entail having to export those statics.
So we don't move such ones... I agree that it's a separate patch
queue, BTW.
==============================================================================
TOPIC: generic sys_ipc wrapper
http://groups.google.com/group/linux.kernel/t/cf75ba704494de9b?hl=en
==============================================================================
== 1 of 1 ==
Date: Sat, Jan 9 2010 9:00 pm
From: Al Viro
On Fri, Jan 08, 2010 at 11:03:47AM +0100, Christoph Hellwig wrote:
> On Fri, Jan 08, 2010 at 10:50:23AM +0100, Christoph Hellwig wrote:
> > Always return -EINVAL for the iBCS2 special case in SHMAT, and add a
> > prototype to linux/syscalls.h
>
> and stop compiling the generic sys_ipc on s390:
... and maybe update the commit message? Other than that, ACK.
==============================================================================
TOPIC: introduce sys_membarrier(): process-wide memory barrier
http://groups.google.com/group/linux.kernel/t/c8972d397ccbdcff?hl=en
==============================================================================
== 1 of 3 ==
Date: Sat, Jan 9 2010 9:20 pm
From: "Paul E. McKenney"
On Sat, Jan 09, 2010 at 08:12:55PM -0500, Mathieu Desnoyers wrote:
> * Paul E. McKenney (paulmck@linux.vnet.ibm.com) wrote:
> > On Sat, Jan 09, 2010 at 06:16:40PM -0500, Steven Rostedt wrote:
> > > On Sat, 2010-01-09 at 18:05 -0500, Steven Rostedt wrote:
> > >
> > > > Then we should have O(tasks) for spinlocks taken, and
> > > > O(min(tasks, CPUS)) for IPIs.
> > >
> > > And for nr tasks >> CPUS, this may help too:
> > >
> > > > cpumask = 0;
> > > > foreach task {
> > >
> > > if (cpumask == online_cpus)
> > > break;
> > >
> > > > spin_lock(task_rq(task)->rq->lock);
> > > > if (task_rq(task)->curr == task)
> > > > cpu_set(task_cpu(task), cpumask);
> > > > spin_unlock(task_rq(task)->rq->lock);
> > > > }
> > > > send_ipi(cpumask);
> >
> > Good point, erring on the side of sending too many IPIs is safe. One
> > might even be able to just send the full set if enough of the CPUs were
> > running the current process and none of the remainder were running
> > real-time threads. And yes, it would then be necessary to throttle
> > calls to sys_membarrier().
> >
> > Quickly hiding behind a suitable boulder... ;-)
>
> :)
>
> One quick counter-argument against IPI-to-all: that will wake up all
> CPUs, including those which are asleep. Not really good for
> energy-saving.
Good point.
Thanx, Paul
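
The loop quoted above, including the early exit once every online CPU is already in the mask, can be modelled in user space as follows. This is a simplified sketch and not kernel code: `struct sim_task`, `collect_running_cpus()` and the 64-bit mask are stand-ins introduced for illustration, and the runqueue locking that the kernel version would need is reduced to a comment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for a thread: which CPU it last ran on and whether it is running now. */
struct sim_task {
    int cpu;
    int running;
};

/*
 * Build the set of CPUs currently running one of the given tasks.
 * Stops early once the mask already covers every online CPU, since
 * looking at further tasks cannot add any new IPI target.
 * If visited is non-NULL, it receives the number of tasks examined.
 */
uint64_t collect_running_cpus(const struct sim_task *tasks, size_t ntasks,
                              uint64_t online_mask, size_t *visited)
{
    uint64_t mask = 0;
    size_t i;

    for (i = 0; i < ntasks; i++) {
        if (mask == online_mask)
            break;

        /* The kernel version would hold the task's runqueue lock here. */
        if (tasks[i].running) {
            assert(tasks[i].cpu >= 0 && tasks[i].cpu < 64);
            mask |= UINT64_C(1) << tasks[i].cpu;
        }
    }

    if (visited != NULL)
        *visited = i;
    return mask;
}
```

The resulting mask is what would be handed to the IPI send; the number of locks taken is bounded by the number of tasks examined, and the number of IPIs by the number of bits set.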
== 2 of 3 ==
Date: Sat, Jan 9 2010 9:20 pm
From: "Paul E. McKenney"
On Sat, Jan 09, 2010 at 08:44:56PM -0500, Mathieu Desnoyers wrote:
> * Steven Rostedt (rostedt@goodmis.org) wrote:
> > On Sat, 2010-01-09 at 16:03 -0800, Paul E. McKenney wrote:
> > > On Sat, Jan 09, 2010 at 06:16:40PM -0500, Steven Rostedt wrote:
> > > > On Sat, 2010-01-09 at 18:05 -0500, Steven Rostedt wrote:
> > > >
> > > > > Then we should have O(tasks) for spinlocks taken, and
> > > > > O(min(tasks, CPUS)) for IPIs.
> > > >
> > > > And for nr tasks >> CPUS, this may help too:
> > > >
> > > > > cpumask = 0;
> > > > > foreach task {
> > > >
> > > > if (cpumask == online_cpus)
> > > > break;
> > > >
> > > > > spin_lock(task_rq(task)->rq->lock);
> > > > > if (task_rq(task)->curr == task)
> > > > > cpu_set(task_cpu(task), cpumask);
> > > > > spin_unlock(task_rq(task)->rq->lock);
> > > > > }
> > > > > send_ipi(cpumask);
> > >
> > > Good point, erring on the side of sending too many IPIs is safe. One
> > > might even be able to just send the full set if enough of the CPUs were
> > > running the current process and none of the remainder were running
> > > real-time threads. And yes, it would then be necessary to throttle
> > > calls to sys_membarrier().
> > >
> >
> > If you need to throttle calls to sys_membarrier(), then why bother
> > optimizing it? Again, this is like calling synchronize_sched() in the
> > kernel, which is a very heavy operation, and should only be called by
> > those that are not performance critical.
> >
> > Why are we struggling so much with optimizing the slow path?
> >
> > Here's how I take it. This method is much better than sending signals to
> > all threads. The advantage the sys_membarrier gives us, is also a way to
> > keep user rcu_read_locks barrier free, which means that rcu_read_locks
> > are quick and scale well.
> >
> > So what if we have a linear decrease in performance with the number of
> > threads on the write side?
>
> Hrm, looking at arch/x86/include/asm/mmu_context.h
>
> switch_mm(), which is basically called each time the scheduler needs to
> change the current task, does a
>
> cpumask_clear_cpu(cpu, mm_cpumask(prev));
>
> and
>
> cpumask_set_cpu(cpu, mm_cpumask(next));
>
> which precise goal is to stop the flush ipis for the previous mm. The
> 100$ question is : why do we have to confirm that the thread is indeed
> on the runqueue (taking locks and everything) when we could simply just
> bluntly use the mm_cpumask for our own IPIs ?
>
> cpumask_clear_cpu and cpumask_set_cpu translate into clear_bit/set_bit.
> cpumask_next does a find_next_bit on the cpumask.
>
> clear_bit/set_bit are atomic and not reordered on x86. PowerPC also uses
> ll/sc loops in bitops.h, so I think it should be pretty safe to assume
> that mm_cpumask is, by design, made to be used as cpumask to send a
> broadcast IPI to all CPUs which run threads belonging to a given
> process.
According to Documentation/atomic_ops.txt, clear_bit/set_bit are atomic,
but do not require memory-barrier semantics.
> So, how about just using mm_cpumask(current) for the broadcast ? Then we
> don't even need to allocate our own cpumask either.
>
> Or am I missing something ? It just sounds too simple.
What is missing in this case is a pair of memory barriers around the
clear_bit/set_bit in mm and a memory barrier before sampling the mask.
Yes, x86 gives you memory barriers on atomics whether you need them or
not, but they are not guaranteed on other architectures.
Thanx, Paul
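
The ordering requirement can be illustrated with a user-space analogue in C11 atomics. This is only a sketch of the idea and not the kernel implementation: `sim_mm_cpumask` and the `sim_*` functions are hypothetical, relaxed atomic operations stand in for set_bit/clear_bit (atomic, but with no ordering of their own), and `atomic_thread_fence(memory_order_seq_cst)` stands in for the full barrier that the kernel would get from smp_mb().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Stand-in for mm_cpumask(mm): one bit per CPU running a thread of the process. */
static _Atomic uint64_t sim_mm_cpumask;

/* Scheduler side: mark this CPU as running the mm, with barriers on both sides. */
void sim_switch_in(int cpu)
{
    assert(cpu >= 0 && cpu < 64);
    atomic_thread_fence(memory_order_seq_cst);   /* order prior accesses */
    atomic_fetch_or_explicit(&sim_mm_cpumask, UINT64_C(1) << cpu,
                             memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);   /* order later accesses */
}

/* Scheduler side: this CPU stops running the mm. */
void sim_switch_out(int cpu)
{
    assert(cpu >= 0 && cpu < 64);
    atomic_thread_fence(memory_order_seq_cst);
    atomic_fetch_and_explicit(&sim_mm_cpumask, ~(UINT64_C(1) << cpu),
                              memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);
}

/* sys_membarrier() side: full barrier before sampling the mask. */
uint64_t sim_sample_mask(void)
{
    atomic_thread_fence(memory_order_seq_cst);
    return atomic_load_explicit(&sim_mm_cpumask, memory_order_relaxed);
}
```

On x86 the fences around the read-modify-write are redundant because the locked instruction already orders memory, which is the "whether you need them or not" case; on weakly ordered machines they are what keeps the mask update from being reordered against the surrounding loads and stores.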
== 3 of 3 ==
Date: Sat, Jan 9 2010 9:30 pm
From: "Paul E. McKenney"
On Sat, Jan 09, 2010 at 09:12:58PM -0500, Steven Rostedt wrote:
> On Sat, 2010-01-09 at 20:44 -0500, Mathieu Desnoyers wrote:
>
> > > So what if we have a linear decrease in performance with the number of
> > > threads on the write side?
> >
> > Hrm, looking at arch/x86/include/asm/mmu_context.h
> >
> > switch_mm(), which is basically called each time the scheduler needs to
> > change the current task, does a
> >
> > cpumask_clear_cpu(cpu, mm_cpumask(prev));
> >
> > and
> >
> > cpumask_set_cpu(cpu, mm_cpumask(next));
> >
> > which precise goal is to stop the flush ipis for the previous mm. The
> > 100$ question is : why do we have to confirm that the thread is indeed
> > on the runqueue (taking locks and everything) when we could simply just
> > bluntly use the mm_cpumask for our own IPIs ?
>
> I was just looking at that code, and was thinking the same thing ;-)
>
> > cpumask_clear_cpu and cpumask_set_cpu translate into clear_bit/set_bit.
> > cpumask_next does a find_next_bit on the cpumask.
> >
> > clear_bit/set_bit are atomic and not reordered on x86. PowerPC also uses
> > ll/sc loops in bitops.h, so I think it should be pretty safe to assume
> > that mm_cpumask is, by design, made to be used as cpumask to send a
> > broadcast IPI to all CPUs which run threads belonging to a given
> > process.
> >
> > So, how about just using mm_cpumask(current) for the broadcast ? Then we
> > don't even need to allocate our own cpumask either.
> >
> > Or am I missing something ? It just sounds too simple.
>
> I think we can use it. If for some reason it does not satisfy what you
> need then I also think the TLB flushing is also broken.
>
> IIRC, (Paul help me out on this), what Paul said earlier, we are trying
> to protect against this scenario:
>
> (from Paul's email:)
>
>
> >
> > CPU 1                                   CPU 2
> > -----------                             -------------
> >
> > <user space>                            <kernel space, switching to task>
> >
> >                                         ->curr updated
> >
> >                                         <long code path, maybe mb?>
> >
> >                                         <user space>
> >
> >                                         rcu_read_lock(); [load only]
> >
> >                                         obj = list->next
> >
> > list_del(obj)
> >
> > sys_membarrier();
> > < kernel space >
> >
> > if (task_rq(task)->curr != task)
> > < but load to obj reordered before store to ->curr >
> >
> > < user space >
> >
> > < misses that CPU 2 is in rcu section >
>
>
> If the TLB flush misses that CPU 2 has a threaded task, and does not
> flush CPU 2s TLB, it can also risk the same type of crash.
But isn't the VM's locking helping us out in that case?
> > [CPU 2's ->curr update now visible]
> >
> > [CPU 2's rcu_read_lock() store now visible]
> >
> > free(obj);
> >
> >                                         use_object(obj); <=== crash!
> >
>
> Think about it. If you change a process mmap, say you updated a mmap of
> a file by flushing out one page and replacing it with another. If the
> above missed sending to CPU 2, then CPU 2 may still be accessing the old
> page of the file, and not the new one.
>
> I think this may be the safe bet.
You might well be correct that we can access that bitmap locklessly,
but there are additional things (like the loading of the arch-specific
page-table register) that are likely to be helping in the VM case, but
not necessarily helping in this case.
Thanx, Paul
==============================================================================
TOPIC: [PATCH 6/8] mm: handle_speculative_fault()
http://groups.google.com/group/linux.kernel/t/2a5e8285ffb8a998?hl=en
==============================================================================
== 1 of 1 ==
Date: Sat, Jan 9 2010 9:30 pm
From: Nitin Gupta
On 01/09/2010 08:17 PM, Ed Tomlinson wrote:
> On Friday 08 January 2010 11:53:30 Peter Zijlstra wrote:
>> On Tue, 2010-01-05 at 20:20 -0800, Linus Torvalds wrote:
>>>
>>> On Wed, 6 Jan 2010, KAMEZAWA Hiroyuki wrote:
>>>>>
>>>>> Of course, your other load with MADV_DONTNEED seems to be horrible, and
>>>>> has some nasty spinlock issues, but that looks like a separate deal (I
>>>>> assume that load is just very hard on the pgtable lock).
>>>>
>>>> It's zone->lock, I guess. My test program avoids pgtable lock problem.
>>>
>>> Yeah, I should have looked more at your callchain. That's nasty. Much
>>> worse than the per-mm lock. I thought the page buffering would avoid the
>>> zone lock becoming a huge problem, but clearly not in this case.
>>
>> Right, so I ran some numbers on a multi-socket (2) machine as well:
>>
>> pf/min
>>
>> -tip 56398626
>> -tip + xadd 174753190
>> -tip + speculative 189274319
>> -tip + xadd + speculative 200174641
>
> Has anyone tried these patches with ramzswap? Nitin do they help with the locking
> issues you mentioned?
>
Locking problem with ramzswap seems completely unrelated to what is being discussed here.
Thanks,
Nitin
==============================================================================
TOPIC: Leaky Bucket qdisc
http://groups.google.com/group/linux.kernel/t/87748e509dbc3166?hl=en
==============================================================================
== 1 of 1 ==
Date: Sat, Jan 9 2010 11:10 pm
From: David Miller
Networking developers hang out on netdev@vger.kernel.org, please
direct your posting there.
==============================================================================
TOPIC: input: make i2c device id constant
http://groups.google.com/group/linux.kernel/t/a2533a41cfac50f9?hl=en
==============================================================================
== 1 of 1 ==
Date: Sat, Jan 9 2010 11:50 pm
From: Dmitry Torokhov
On Sat, Jan 09, 2010 at 01:55:47PM +0100, Németh Márton wrote:
> From: Márton Németh <nm127@freemail.hu>
>
> The id_table field of the struct i2c_driver is constant in <linux/i2c.h>
> so it is worth to make the initialization data also constant.
>
Applied all 5 to the next branch, thank you Márton.
--
Dmitry
==============================================================================
TOPIC: hp-wmi: fix double free
http://groups.google.com/group/linux.kernel/t/1ff50bb873776838?hl=en
==============================================================================
== 1 of 1 ==
Date: Sat, Jan 9 2010 11:50 pm
From: Dan Carpenter
kfree(obj) was called earlier.
This was found by smatch and has only been compile tested. :/
Signed-off-by: Dan Carpenter <error27@gmail.com>
--- orig/drivers/platform/x86/hp-wmi.c 2010-01-09 21:43:13.000000000 +0300
+++ devel/drivers/platform/x86/hp-wmi.c 2010-01-09 21:43:28.000000000 +0300
@@ -388,8 +388,6 @@ static void hp_wmi_notify(u32 value, voi
} else
printk(KERN_INFO "HP WMI: Unknown key pressed - %x\n",
eventcode);
-
- kfree(obj);
}
static int __init hp_wmi_input_setup(void)
==============================================================================
TOPIC: drivers/input/joystick/xpad.c: Add rumble support for original xbox
controller
http://groups.google.com/group/linux.kernel/t/bf02e6c3ba0db192?hl=en
==============================================================================
== 1 of 1 ==
Date: Sun, Jan 10 2010 12:00 am
From: Dmitry Torokhov
On Fri, Jan 08, 2010 at 11:26:10AM +0100, Benjamin Valentin wrote:
> On Thu, 7 Jan 2010 23:50:54 -0800
> Dmitry Torokhov <dmitry.torokhov@gmail.com> wrote:
>
> > Thank you for your patch. Could I please have your "Signed-off-by: "
> > so I can apply it? Also, if you have any more patches for input
> > devices, could you please CC linux-input@vger.kernel.org?
>
> This way?
>
> Signed-off-by: Benjamin Valentin <benpicco@zedat.fu-berlin.de>
Yep, thanks.
>
> --- /usr/src/linux-source-2.6.33/drivers/input/joystick/xpad.c
> 2010-01-08 02:56:59.365851076 +0100 +++ xpad.c 2010-01-08
> 03:13:38.477835651 +0100 @@ -505,7 +505,7 @@
> struct usb_endpoint_descriptor *ep_irq_out;
> int error = -ENOMEM;
>
> - if (xpad->xtype != XTYPE_XBOX360)
> + if (xpad->xtype != XTYPE_XBOX360 && xpad->xtype != XTYPE_XBOX)
> return 0;
>
> xpad->odata = usb_buffer_alloc(xpad->udev, XPAD_PKT_LEN,
> @@ -535,13 +535,13 @@
>
> static void xpad_stop_output(struct usb_xpad *xpad)
> {
> - if (xpad->xtype == XTYPE_XBOX360)
> + if (xpad->xtype == XTYPE_XBOX360 || xpad->xtype != XTYPE_XBOX)
This should certainly be "... || xpad->xtype == XTYPE_XBOX)", I'll fix
it up locally.
> usb_kill_urb(xpad->irq_out);
> }
>
> static void xpad_deinit_output(struct usb_xpad *xpad)
> {
> - if (xpad->xtype == XTYPE_XBOX360) {
> + if (xpad->xtype == XTYPE_XBOX360 || xpad->xtype != XTYPE_XBOX)
Same here.
BTW, your mailer line-wraps e-mail which is bad when sending patches.
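
To see why the posted condition needs the fix-up, the two variants can be compared side by side. This is a stand-alone sketch: the `sim_xtype` enum and both function names are hypothetical and are not taken from xpad.c; only the shape of the two conditions mirrors the patch and the correction.

```c
/* Illustrative controller types; names and values are not taken from xpad.c. */
enum sim_xtype {
    SIM_XTYPE_XBOX,
    SIM_XTYPE_XBOX360,
    SIM_XTYPE_XBOX360W
};

/* Condition as posted: true for every type except SIM_XTYPE_XBOX. */
int sim_has_output_as_posted(enum sim_xtype t)
{
    return t == SIM_XTYPE_XBOX360 || t != SIM_XTYPE_XBOX;
}

/* Condition as corrected in the review: true only for the two listed types. */
int sim_has_output_fixed(enum sim_xtype t)
{
    return t == SIM_XTYPE_XBOX360 || t == SIM_XTYPE_XBOX;
}
```

The posted form is false for the original Xbox type, which is the one the patch is meant to add, and true for any other type, so the stop and deinit paths would disagree with the init path.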
--
Dmitry
==============================================================================
TOPIC: 2.6.33-rc2: Xen/Guest switching to user mode with no user page tables
http://groups.google.com/group/linux.kernel/t/c8fbbf0b2885d688?hl=en
==============================================================================
== 1 of 1 ==
Date: Sun, Jan 10 2010 12:20 am
From: Cyrill Gorcunov
On Sat, Jan 09, 2010 at 08:50:04PM -0500, Brian Gerst wrote:
...
> > ---
> > x86: kernel_thread -- initialize SS to a known state
> >
> > Before the kernel_thread was converted into "C" we had
> > pt_regs::ss set to __KERNEL_DS (by SAVE_ALL asm macro).
> >
> > Though I must admit I didn't find any *explicit* load of
> > %ss from this structure the better to be on a safe side
> > and set it to a known value.
>
> It shouldn't make any difference, but maybe Xen is doing something
> subtle. In 64-bit mode the %ss segment register is supposed to be
> ignored, which is why it is left set to zero. It works properly on
> real hardware. It can't hurt anything to put __KERNEL_DS back in, but
> I'd just like to know why Xen requires it if this does fix it.
Yeah, I didn't find any explicit %ss reloading for this _particular_
case (as I noted in the patch changelog). So the only suspect is Xen
itself. Since only Christian is able to test this, we will have to wait
for his results.
>
> --
> Brian Gerst
>
-- Cyrill
==============================================================================
TOPIC: uml: fix memory leak in arch/um/os-Linux/mem.c
http://groups.google.com/group/linux.kernel/t/3b79503b0d1fffb8?hl=en
==============================================================================
== 1 of 1 ==
Date: Sun, Jan 10 2010 12:20 am
From: Américo Wang
On Sat, Jan 09, 2010 at 08:26:24PM +0300, Alexander Beregalov wrote:
>Free tempname before exit.
>Found by cppcheck.
>
>Signed-off-by: Alexander Beregalov <a.beregalov@gmail.com>
Acked-by: WANG Cong <xiyou.wangcong@gmail.com>
Thanks!
>---
> arch/um/os-Linux/mem.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
>diff --git a/arch/um/os-Linux/mem.c b/arch/um/os-Linux/mem.c
>index 93a11d7..f079ea0 100644
>--- a/arch/um/os-Linux/mem.c
>+++ b/arch/um/os-Linux/mem.c
>@@ -175,7 +175,7 @@ static int __init make_tempfile(const char *template, char **out_tempname,
>
> find_tempdir();
> if ((tempdir == NULL) || (strlen(tempdir) >= MAXPATHLEN))
>- return -1;
>+ goto out;
>
> if (template[0] != '/')
> strcpy(tempname, tempdir);
>--
>1.6.6
>
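
The pattern the patch relies on, a single exit label that releases the buffer on every failure path, can be shown in a small user-space analogue. This is a sketch only: `sim_make_tempname()` and `SIM_MAXPATHLEN` are hypothetical, and the path handling is simplified compared with make_tempfile() in arch/um/os-Linux/mem.c.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SIM_MAXPATHLEN 4096  /* stand-in for MAXPATHLEN */

/*
 * Build "<tempdir><template>" in a freshly allocated buffer.
 * On success returns 0 and stores the buffer in *out_name (caller frees).
 * On failure returns -1 and leaves *out_name untouched; the buffer is
 * released through the common "out" label instead of leaking.
 */
int sim_make_tempname(const char *tempdir, const char *template,
                      char **out_name)
{
    char *name;

    assert(template != NULL && out_name != NULL);

    name = malloc(SIM_MAXPATHLEN);
    if (name == NULL)
        return -1;

    /* An early "return -1" here would leak name; jump to the cleanup instead. */
    if (tempdir == NULL ||
        strlen(tempdir) + strlen(template) >= SIM_MAXPATHLEN)
        goto out;

    strcpy(name, tempdir);
    strcat(name, template);
    *out_name = name;
    return 0;

out:
    free(name);
    return -1;
}
```

Every later error check added to such a function can reuse the same label, which is why the one-line change from `return -1` to `goto out` is enough to plug the leak.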
--
Live like a child, think like the god.
==============================================================================
You received this message because you are subscribed to the Google Groups "linux.kernel"
group.
To post to this group, visit http://groups.google.com/group/linux.kernel?hl=en
To unsubscribe from this group, send email to linux.kernel+unsubscribe@googlegroups.com
To change the way you get mail from this group, visit:
http://groups.google.com/group/linux.kernel/subscribe?hl=en
To report abuse, send email explaining the problem to abuse@googlegroups.com
==============================================================================
Google Groups: http://groups.google.com/?hl=en