Tuesday, November 1, 2011

linux.kernel - 25 new messages in 20 topics - digest

linux.kernel
http://groups.google.com/group/linux.kernel?hl=en

linux.kernel@googlegroups.com

Today's topics:

* linux-next: manual merge of the hwspinlock tree with the arm-soc tree - 1
messages, 1 author
http://groups.google.com/group/linux.kernel/t/c9d062ce5bcd7a71?hl=en
* ulist: generic data structure to build unique lists - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/c46f877d5cb7751d?hl=en
* cpumask: update the Note of setup_node_to_cpumask_map - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/49c169f4615a35a9?hl=en
* linux-next: build failure after merge of the moduleh tree - 1 messages, 1
author
http://groups.google.com/group/linux.kernel/t/a73aeb930ef0adec?hl=en
* writeback tree status (for 3.2 merge window) - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/9f8955e3d3206c2a?hl=en
* mm: add free_hot_cold_page_list helper - 2 messages, 1 author
http://groups.google.com/group/linux.kernel/t/1046629cbaebc7af?hl=en
* linux-next: manual merge of the akpm with the scsi tree - 1 messages, 1
author
http://groups.google.com/group/linux.kernel/t/7fcb3b719ed438dc?hl=en
* i2c-gpio.c: correct logic of pdata->scl_is_open_drain - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/cff5233b6757a96f?hl=en
* question about kernel panic related to sched_rt.c - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/e009b997343bf4fc?hl=en
* ata port runtime pm - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/62f6e4e08ae131b1?hl=en
* linux-next: manual merge of the block tree with Linus' tree - 2 messages, 2
authors
http://groups.google.com/group/linux.kernel/t/6dba4bcd675d2516?hl=en
* freezer: revert 27920651fe "PM / Freezer: Make fake_signal_wake_up() wake
TASK_KILLABLE tasks too" - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/af2947db266e716a?hl=en
* linux-next: manual merge of the akpm tree with Linus' tree - 1 messages, 1
author
http://groups.google.com/group/linux.kernel/t/677ce897b731eb35?hl=en
* [mm/memory.c]: transparent hugepage check condition missed - 3 messages, 2
authors
http://groups.google.com/group/linux.kernel/t/423b60da12215e28?hl=en
* hda_hwdep: Fix possible buffer overflow - 2 messages, 2 authors
http://groups.google.com/group/linux.kernel/t/0918224a3fbaac5a?hl=en
* ramoops appears geared to not support ARM - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/756889b4ecfdf6c6?hl=en
* linux-next: build failure after merge of the akpm tree - 1 messages, 1
author
http://groups.google.com/group/linux.kernel/t/efc4ff39e4b9bd8c?hl=en
* [RFC] A readahead complete notify approach to implement buffer aio - 1
messages, 1 author
http://groups.google.com/group/linux.kernel/t/a7ba5127efa4667f?hl=en
* drop unused Kconfig symbols - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/66b2858ee8b2f45b?hl=en
* intel-iommu:make identity_map default for crash dump - 1 messages, 1 author
http://groups.google.com/group/linux.kernel/t/601650cd6a87d233?hl=en

==============================================================================
TOPIC: linux-next: manual merge of the hwspinlock tree with the arm-soc tree
http://groups.google.com/group/linux.kernel/t/c9d062ce5bcd7a71?hl=en
==============================================================================

== 1 of 1 ==
Date: Tues, Nov 1 2011 12:30 am
From: Ohad Ben-Cohen


Hi Stephen,

On Tue, Nov 1, 2011 at 8:56 AM, Stephen Rothwell <sfr@canb.auug.org.au> wrote:
> I fixed it up (see below) and can carry the fix as necessary.

Looks good, thanks a lot !

Ohad.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

==============================================================================
TOPIC: ulist: generic data structure to build unique lists
http://groups.google.com/group/linux.kernel/t/c46f877d5cb7751d?hl=en
==============================================================================

== 1 of 1 ==
Date: Tues, Nov 1 2011 12:40 am
From: Arne Jansen


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 12.10.2011 01:37, Andrew Morton wrote:
>
> Generally, I really do ask that you provide us a complete description
> of what this utility is supposed to do. Once that is understood we can
> then start to determine why none of the existing container helpers were
> suitable, and were not capable of being modified to be suitable.

Thanks for your kind comments. I sent an updated version to the list
accordingly. Please let me know if the documentation is good enough
and shows the purpose clearly.

>
> For example, from a quick squint at the code I'm wondering why
> flex_array was unsuitable.
>

ulist is currently implemented as an array, but this is not the way
to go to scale to larger lists. I'd rather switch to using rbtrees
over a certain threshold and will do so as soon as we settle on
putting it in lib/ instead of burying it inside fs/btrfs.
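
For readers unfamiliar with the idea, the array-backed approach mentioned
above can be sketched in userspace C as a growable array that rejects
duplicates on insertion. This is only an illustration under stated
assumptions, not the btrfs code: the names uniq_list, uniq_list_init,
uniq_list_add and uniq_list_free are hypothetical, and the linear duplicate
scan is the per-insert cost that would motivate a tree for larger lists.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical userspace sketch of an array-backed unique list. */
struct uniq_list {
	uint64_t *vals;   /* stored values, in insertion order */
	size_t nr;        /* number of values in use */
	size_t cap;       /* allocated capacity */
};

static void uniq_list_init(struct uniq_list *ul)
{
	assert(ul != NULL);
	ul->vals = NULL;
	ul->nr = 0;
	ul->cap = 0;
}

/*
 * Add val unless it is already present.
 * Returns 1 if added, 0 if it was a duplicate, -1 on allocation failure.
 */
static int uniq_list_add(struct uniq_list *ul, uint64_t val)
{
	size_t i;

	/* Linear scan: fine for short lists, O(n) per insert for long ones. */
	for (i = 0; i < ul->nr; i++)
		if (ul->vals[i] == val)
			return 0;

	/* Grow the backing array geometrically when it is full. */
	if (ul->nr == ul->cap) {
		size_t new_cap = ul->cap ? ul->cap * 2 : 16;
		uint64_t *tmp = realloc(ul->vals, new_cap * sizeof(*tmp));

		if (!tmp)
			return -1;
		ul->vals = tmp;
		ul->cap = new_cap;
	}
	ul->vals[ul->nr++] = val;
	return 1;
}

static void uniq_list_free(struct uniq_list *ul)
{
	free(ul->vals);
	uniq_list_init(ul);
}
```

A caller would initialise the list once, add values as they are discovered,
and treat a return value of 0 as "already seen".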

- -Arne
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iQIcBAEBAgAGBQJOr6GeAAoJEPa4OwE4JkZ9TOcP/ArwGeCv8sQejihSahoXPum8
3F5/QhvNpQMtl//3iQ6duuyCbSNyskGQ6xViYmIhYoH8Q9E+DOgQrSa4lSgsJxte
4ItbiobmTnCOWHqbUckkjp8toBleXC2iSjvtpt7DCfP/H1enAi3lE9FONB5A6RcP
ekGoNlp0LIFqLa0hk8qZ8JGetxuR1YWcunkrZRAze/cj4nGcegxiH9bLvEWgEShk
o0QT8RNF1BFtef4unWyxnND+IUbwfvVMU61cKANbXvKhUfG6QxDH62QfYyxQ/Unw
S2IXFZ41I/vgZIE4MKUABYFhigFkmBC3Mze7BGeF084SussazisHdA6UQsQi/TPP
+fU73XTmpFbpg4z2XWAyDkq96qIYijgkSLwelSMRBza8/A3fGnLcKZSCNH3LbjGu
aBjIi8A4VbasiBt/Hx1Vu2v9feX+S1tlmiODoLVp5bsvjD592bAfJK8QOf9RAFYK
0PMV2Fw6yplv6RFfZ32aJwtsryLN6iTOq1LXOz71KBVDaMuZXEur298WhrVCmcEN
270Kd3cBbF/JAoCODw97f/UPE9qWF1Vye+dTEjIWqb8bk8RBrzW5C6Wsxk9BRoU4
dONNMet9ykhHJIbchlYiOv3SMz04+yJJV74MmypefXj6tbxrdYm4gvebx28bC4jm
Ok/DXMFAdLm6wCicjKgI
=GHYG
-----END PGP SIGNATURE-----

==============================================================================
TOPIC: cpumask: update the Note of setup_node_to_cpumask_map
http://groups.google.com/group/linux.kernel/t/49c169f4615a35a9?hl=en
==============================================================================

== 1 of 1 ==
Date: Tues, Nov 1 2011 12:40 am
From: Wanlong Gao


node_to_cpumask() has been replaced by cpumask_of_node(), and was wholly
removed in commit 29c337a0.

So update the Note of setup_node_to_cpumask_map().

Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
---
arch/powerpc/mm/numa.c | 2 +-
arch/x86/mm/numa.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 2164006..664272c 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -58,7 +58,7 @@ static int distance_lookup_table[MAX_NUMNODES][MAX_DISTANCE_REF_POINTS];
* Allocate node_to_cpumask_map based on number of available nodes
* Requires node_possible_map to be valid.
*
- * Note: node_to_cpumask() is not valid until after this is done.
+ * Note: cpumask_of_node() is not valid until after this is done.
*/
static void __init setup_node_to_cpumask_map(void)
{
diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index fbeaaf4..43049f0 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -110,7 +110,7 @@ void __cpuinit numa_clear_node(int cpu)
* Allocate node_to_cpumask_map based on number of available nodes
* Requires node_possible_map to be valid.
*
- * Note: node_to_cpumask() is not valid until after this is done.
+ * Note: cpumask_of_node() is not valid until after this is done.
* (Use CONFIG_DEBUG_PER_CPU_MAPS to check this.)
*/
void __init setup_node_to_cpumask_map(void)
--
1.7.8.rc0


==============================================================================
TOPIC: linux-next: build failure after merge of the moduleh tree
http://groups.google.com/group/linux.kernel/t/a73aeb930ef0adec?hl=en
==============================================================================

== 1 of 1 ==
Date: Tues, Nov 1 2011 12:40 am
From: Stephen Rothwell


Hi Paul,

After merging the moduleh tree, today's linux-next build (x86_64
allmodconfig) failed like this:

drivers/md/dm-bufio.c:988:1: warning: data definition has no type or storage class [enabled by default]
drivers/md/dm-bufio.c:988:1: warning: type defaults to 'int' in declaration of 'EXPORT_SYMBOL_GPL' [-Wimplicit-int]
drivers/md/dm-bufio.c:988:1: warning: parameter names (without types) in function declaration [enabled by default]
drivers/md/dm-bufio.c:997:1: warning: data definition has no type or storage class [enabled by default]
drivers/md/dm-bufio.c:997:1: warning: type defaults to 'int' in declaration of 'EXPORT_SYMBOL_GPL' [-Wimplicit-int]
drivers/md/dm-bufio.c:997:1: warning: parameter names (without types) in function declaration [enabled by default]
drivers/md/dm-bufio.c:1006:1: warning: data definition has no type or storage class [enabled by default]
drivers/md/dm-bufio.c:1006:1: warning: type defaults to 'int' in declaration of 'EXPORT_SYMBOL_GPL' [-Wimplicit-int]
drivers/md/dm-bufio.c:1006:1: warning: parameter names (without types) in function declaration [enabled by default]
drivers/md/dm-bufio.c:1036:1: warning: data definition has no type or storage class [enabled by default]
drivers/md/dm-bufio.c:1036:1: warning: type defaults to 'int' in declaration of 'EXPORT_SYMBOL_GPL' [-Wimplicit-int]
drivers/md/dm-bufio.c:1036:1: warning: parameter names (without types) in function declaration [enabled by default]
drivers/md/dm-bufio.c:1049:1: warning: data definition has no type or storage class [enabled by default]
drivers/md/dm-bufio.c:1049:1: warning: type defaults to 'int' in declaration of 'EXPORT_SYMBOL_GPL' [-Wimplicit-int]
drivers/md/dm-bufio.c:1049:1: warning: parameter names (without types) in function declaration [enabled by default]
drivers/md/dm-bufio.c:1059:1: warning: data definition has no type or storage class [enabled by default]
drivers/md/dm-bufio.c:1059:1: warning: type defaults to 'int' in declaration of 'EXPORT_SYMBOL_GPL' [-Wimplicit-int]
drivers/md/dm-bufio.c:1059:1: warning: parameter names (without types) in function declaration [enabled by default]
drivers/md/dm-bufio.c:1135:1: warning: data definition has no type or storage class [enabled by default]
drivers/md/dm-bufio.c:1135:1: warning: type defaults to 'int' in declaration of 'EXPORT_SYMBOL_GPL' [-Wimplicit-int]
drivers/md/dm-bufio.c:1135:1: warning: parameter names (without types) in function declaration [enabled by default]
drivers/md/dm-bufio.c:1158:1: warning: data definition has no type or storage class [enabled by default]
drivers/md/dm-bufio.c:1158:1: warning: type defaults to 'int' in declaration of 'EXPORT_SYMBOL_GPL' [-Wimplicit-int]
drivers/md/dm-bufio.c:1158:1: warning: parameter names (without types) in function declaration [enabled by default]
drivers/md/dm-bufio.c:1232:1: warning: data definition has no type or storage class [enabled by default]
drivers/md/dm-bufio.c:1232:1: warning: type defaults to 'int' in declaration of 'EXPORT_SYMBOL_GPL' [-Wimplicit-int]
drivers/md/dm-bufio.c:1232:1: warning: parameter names (without types) in function declaration [enabled by default]
drivers/md/dm-bufio.c:1238:1: warning: data definition has no type or storage class [enabled by default]
drivers/md/dm-bufio.c:1238:1: warning: type defaults to 'int' in declaration of 'EXPORT_SYMBOL_GPL' [-Wimplicit-int]
drivers/md/dm-bufio.c:1238:1: warning: parameter names (without types) in function declaration [enabled by default]
drivers/md/dm-bufio.c:1245:1: warning: data definition has no type or storage class [enabled by default]
drivers/md/dm-bufio.c:1245:1: warning: type defaults to 'int' in declaration of 'EXPORT_SYMBOL_GPL' [-Wimplicit-int]
drivers/md/dm-bufio.c:1245:1: warning: parameter names (without types) in function declaration [enabled by default]
drivers/md/dm-bufio.c:1251:1: warning: data definition has no type or storage class [enabled by default]
drivers/md/dm-bufio.c:1251:1: warning: type defaults to 'int' in declaration of 'EXPORT_SYMBOL_GPL' [-Wimplicit-int]
drivers/md/dm-bufio.c:1251:1: warning: parameter names (without types) in function declaration [enabled by default]
drivers/md/dm-bufio.c:1257:1: warning: data definition has no type or storage class [enabled by default]
drivers/md/dm-bufio.c:1257:1: warning: type defaults to 'int' in declaration of 'EXPORT_SYMBOL_GPL' [-Wimplicit-int]
drivers/md/dm-bufio.c:1257:1: warning: parameter names (without types) in function declaration [enabled by default]
drivers/md/dm-bufio.c:1263:1: warning: data definition has no type or storage class [enabled by default]
drivers/md/dm-bufio.c:1263:1: warning: type defaults to 'int' in declaration of 'EXPORT_SYMBOL_GPL' [-Wimplicit-int]
drivers/md/dm-bufio.c:1263:1: warning: parameter names (without types) in function declaration [enabled by default]
drivers/md/dm-bufio.c:1269:1: warning: data definition has no type or storage class [enabled by default]
drivers/md/dm-bufio.c:1269:1: warning: type defaults to 'int' in declaration of 'EXPORT_SYMBOL_GPL' [-Wimplicit-int]
drivers/md/dm-bufio.c:1269:1: warning: parameter names (without types) in function declaration [enabled by default]
drivers/md/dm-bufio.c:1489:1: warning: data definition has no type or storage class [enabled by default]
drivers/md/dm-bufio.c:1489:1: warning: type defaults to 'int' in declaration of 'EXPORT_SYMBOL_GPL' [-Wimplicit-int]
drivers/md/dm-bufio.c:1489:1: warning: parameter names (without types) in function declaration [enabled by default]
drivers/md/dm-bufio.c:1534:1: warning: data definition has no type or storage class [enabled by default]
drivers/md/dm-bufio.c:1534:1: warning: type defaults to 'int' in declaration of 'EXPORT_SYMBOL_GPL' [-Wimplicit-int]
drivers/md/dm-bufio.c:1534:1: warning: parameter names (without types) in function declaration [enabled by default]
drivers/md/dm-bufio.c:1676:63: error: expected ')' before 'ulong'
drivers/md/dm-bufio.c:1677:40: error: expected ')' before string constant
drivers/md/dm-bufio.c:1679:55: error: expected ')' before 'uint'
drivers/md/dm-bufio.c:1680:35: error: expected ')' before string constant
drivers/md/dm-bufio.c:1682:67: error: expected ')' before 'ulong'
drivers/md/dm-bufio.c:1683:40: error: expected ')' before string constant
drivers/md/dm-bufio.c:1685:79: error: expected ')' before 'ulong'
drivers/md/dm-bufio.c:1686:46: error: expected ')' before string constant
drivers/md/dm-bufio.c:1688:87: error: expected ')' before 'ulong'
drivers/md/dm-bufio.c:1689:50: error: expected ')' before string constant
drivers/md/dm-bufio.c:1691:73: error: expected ')' before 'ulong'
drivers/md/dm-bufio.c:1692:43: error: expected ')' before string constant
drivers/md/dm-bufio.c:1694:73: error: expected ')' before 'ulong'
drivers/md/dm-bufio.c:1695:43: error: expected ')' before string constant
drivers/md/dm-bufio.c:1697:15: error: expected declaration specifiers or '...' before string constant
drivers/md/dm-bufio.c:1698:20: error: expected declaration specifiers or '...' before string constant
drivers/md/dm-bufio.c:1699:16: error: expected declaration specifiers or '...' before string constant

Caused by commit 0b068238c5ef ("The dm-bufio interface allows you to do
cached I/O on devices") interacting with the module.h split up. This file
should have included module.h in any case.

I have added this merge fix patch (Alasdair, this could be added to the
device-mapper tree):

From: Stephen Rothwell <sfr@canb.auug.org.au>
Date: Tue, 1 Nov 2011 18:30:49 +1100
Subject: [PATCH] device-mapper: dm-bufio.c needs to include module.h

since it uses the module facilities.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
---
drivers/md/dm-bufio.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/md/dm-bufio.c b/drivers/md/dm-bufio.c
index cb24666..3a94ef4 100644
--- a/drivers/md/dm-bufio.c
+++ b/drivers/md/dm-bufio.c
@@ -14,6 +14,7 @@
#include <linux/vmalloc.h>
#include <linux/version.h>
#include <linux/shrinker.h>
+#include <linux/module.h>

#define DM_MSG_PREFIX "bufio"

--
1.7.7

--
Cheers,
Stephen Rothwell sfr@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/

==============================================================================
TOPIC: writeback tree status (for 3.2 merge window)
http://groups.google.com/group/linux.kernel/t/9f8955e3d3206c2a?hl=en
==============================================================================

== 1 of 1 ==
Date: Tues, Nov 1 2011 12:50 am
From: Wu Fengguang


Hi,

There are 3 patchsets sitting in the writeback tree.

1) IO-less dirty throttling v12
https://github.com/fengguang/linux/commits/dirty-throttling-v12

2) writeback reasons tracing from Curt Wohlgemuth
https://github.com/fengguang/linux/commits/writeback-reason

3) writeback queuing changes from Jan Kara and me
https://github.com/fengguang/linux/commits/requeue-io-wait

They have been merged into this branch testing in linux-next for a while:

https://github.com/fengguang/linux/commits/writeback-for-next

Since (3) still has an unresolved issue (detailed in the below
links), it looks better to hold it back for this merge window.

http://permalink.gmane.org/gmane.linux.kernel/1206315
http://permalink.gmane.org/gmane.linux.kernel/1206316

The patches from (1,2) together with 2 tracing patches essential for
debugging (1) have been pushed to the "writeback-for-linus" branch:

http://git.kernel.org/?p=linux/kernel/git/wfg/linux.git;a=shortlog;h=refs/heads/writeback-for-linus

If no objections, I'll send a pull request to Linus soon.

Thanks,
Fengguang

==============================================================================
TOPIC: mm: add free_hot_cold_page_list helper
http://groups.google.com/group/linux.kernel/t/1046629cbaebc7af?hl=en
==============================================================================

== 1 of 2 ==
Date: Tues, Nov 1 2011 1:00 am
From: Konstantin Khlebnikov


This patch adds a helper, free_hot_cold_page_list(), to free a list of 0-order
pages. It frees pages directly from the list without a temporary page-vector.
It also calls trace_mm_pagevec_free() to simulate pagevec_free() behaviour.
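
The core idea, freeing entries straight off a linked list instead of first
batching them into a temporary vector, can be sketched in userspace C as
follows. This is a rough illustration, not the kernel implementation:
list_node, fake_page, page_of and free_page_list_sketch are hypothetical
stand-ins for list_head, struct page, list_entry and the new helper, and
free() stands in for free_hot_cold_page().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Minimal doubly linked list, modelled loosely on the kernel's list_head. */
struct list_node {
	struct list_node *next, *prev;
};

static void list_init(struct list_node *head)
{
	head->next = head;
	head->prev = head;
}

static void list_add_tail_node(struct list_node *node, struct list_node *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

/* Hypothetical stand-in for struct page; only the list linkage matters. */
struct fake_page {
	int id;
	struct list_node lru;
};

/* Recover the containing fake_page from its embedded list node. */
static struct fake_page *page_of(struct list_node *node)
{
	return (struct fake_page *)((char *)node - offsetof(struct fake_page, lru));
}

/*
 * Free every entry on the list without copying pointers into a temporary
 * vector. The next pointer is saved before each free, which is what the
 * "_safe" list iterators do. Returns the number of entries freed.
 */
static int free_page_list_sketch(struct list_node *head)
{
	struct list_node *pos = head->next, *next;
	int freed = 0;

	while (pos != head) {
		next = pos->next;   /* save before the node is freed */
		free(page_of(pos));
		freed++;
		pos = next;
	}
	list_init(head);            /* head still pointed at freed nodes; reset it */
	return freed;
}
```

Unlike the helper in the patch, the sketch reinitialises the list head so
that the caller can reuse it; that is a choice made for the example only.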

bloat-o-meter:

add/remove: 1/1 grow/shrink: 1/3 up/down: 267/-295 (-28)
function old new delta
free_hot_cold_page_list - 264 +264
get_page_from_freelist 2129 2132 +3
__pagevec_free 243 239 -4
split_free_page 380 373 -7
release_pages 606 510 -96
free_page_list 188 - -188

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
---
include/linux/gfp.h | 1 +
mm/page_alloc.c | 13 +++++++++++++
mm/swap.c | 14 +++-----------
mm/vmscan.c | 20 +-------------------
4 files changed, 18 insertions(+), 30 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 3a76faf..6562958 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -358,6 +358,7 @@ void *alloc_pages_exact_nid(int nid, size_t size, gfp_t gfp_mask);
extern void __free_pages(struct page *page, unsigned int order);
extern void free_pages(unsigned long addr, unsigned int order);
extern void free_hot_cold_page(struct page *page, int cold);
+extern void free_hot_cold_page_list(struct list_head *list, int cold);

#define __free_page(page) __free_pages((page), 0)
#define free_page(addr) free_pages((addr), 0)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 9dd443d..5093114 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1211,6 +1211,19 @@ out:
}

/*
+ * Free a list of 0-order pages
+ */
+void free_hot_cold_page_list(struct list_head *list, int cold)
+{
+ struct page *page, *next;
+
+ list_for_each_entry_safe(page, next, list, lru) {
+ trace_mm_pagevec_free(page, cold);
+ free_hot_cold_page(page, cold);
+ }
+}
+
+/*
* split_page takes a non-compound higher-order page, and splits it into
* n (1<<order) sub-pages: page[0..n]
* Each sub-page must be freed individually.
diff --git a/mm/swap.c b/mm/swap.c
index 3a442f1..b9138c7 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -562,11 +562,10 @@ int lru_add_drain_all(void)
void release_pages(struct page **pages, int nr, int cold)
{
int i;
- struct pagevec pages_to_free;
+ LIST_HEAD(pages_to_free);
struct zone *zone = NULL;
unsigned long uninitialized_var(flags);

- pagevec_init(&pages_to_free, cold);
for (i = 0; i < nr; i++) {
struct page *page = pages[i];

@@ -597,19 +596,12 @@ void release_pages(struct page **pages, int nr, int cold)
del_page_from_lru(zone, page);
}

- if (!pagevec_add(&pages_to_free, page)) {
- if (zone) {
- spin_unlock_irqrestore(&zone->lru_lock, flags);
- zone = NULL;
- }
- __pagevec_free(&pages_to_free);
- pagevec_reinit(&pages_to_free);
- }
+ list_add_tail(&page->lru, &pages_to_free);
}
if (zone)
spin_unlock_irqrestore(&zone->lru_lock, flags);

- pagevec_free(&pages_to_free);
+ free_hot_cold_page_list(&pages_to_free, cold);
}
EXPORT_SYMBOL(release_pages);

diff --git a/mm/vmscan.c b/mm/vmscan.c
index a90c603..77f84ef 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -728,24 +728,6 @@ static enum page_references page_check_references(struct page *page,
return PAGEREF_RECLAIM;
}

-static noinline_for_stack void free_page_list(struct list_head *free_pages)
-{
- struct pagevec freed_pvec;
- struct page *page, *tmp;
-
- pagevec_init(&freed_pvec, 1);
-
- list_for_each_entry_safe(page, tmp, free_pages, lru) {
- list_del(&page->lru);
- if (!pagevec_add(&freed_pvec, page)) {
- __pagevec_free(&freed_pvec);
- pagevec_reinit(&freed_pvec);
- }
- }
-
- pagevec_free(&freed_pvec);
-}
-
/*
* shrink_page_list() returns the number of reclaimed pages
*/
@@ -1009,7 +991,7 @@ keep_lumpy:
if (nr_dirty && nr_dirty == nr_congested && scanning_global_lru(sc))
zone_set_flag(zone, ZONE_CONGESTED);

- free_page_list(&free_pages);
+ free_hot_cold_page_list(&free_pages, 1);

list_splice(&ret_pages, page_list);
count_vm_events(PGACTIVATE, pgactivate);



== 2 of 2 ==
Date: Tues, Nov 1 2011 1:00 am
From: Konstantin Khlebnikov


Andrew Morton wrote:
> On Mon, 29 Aug 2011 16:48:46 +0900
> Minchan Kim<minchan.kim@gmail.com> wrote:
>
>> On Fri, Jul 29, 2011 at 4:58 PM, Konstantin Khlebnikov
>> <khlebnikov@openvz.org> wrote:
>>> This patch adds helper free_hot_cold_page_list() to free list of 0-order pages.
>>> It frees pages directly from list without temporary page-vector.
>>> It also calls trace_mm_pagevec_free() to simulate pagevec_free() behaviour.
>>>
>>> bloat-o-meter:
>>>
>>> add/remove: 1/1 grow/shrink: 1/3 up/down: 267/-295 (-28)
>>> function old new delta
>>> free_hot_cold_page_list - 264 +264
>>> get_page_from_freelist 2129 2132 +3
>>> __pagevec_free 243 239 -4
>>> split_free_page 380 373 -7
>>> release_pages 606 510 -96
>>> free_page_list 188 - -188
>>>
>>> Signed-off-by: Konstantin Khlebnikov<khlebnikov@openvz.org>
>>> ---
>>> include/linux/gfp.h | 1 +
>>> mm/page_alloc.c | 12 ++++++++++++
>>> mm/swap.c | 14 +++-----------
>>> mm/vmscan.c | 20 +-------------------
>>> 4 files changed, 17 insertions(+), 30 deletions(-)
>>>
>>> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
>>> index cb40892..dd7b9cc 100644
>>> --- a/include/linux/gfp.h
>>> +++ b/include/linux/gfp.h
>>> @@ -358,6 +358,7 @@ void *alloc_pages_exact_nid(int nid, size_t size, gfp_t gfp_mask);
>>> extern void __free_pages(struct page *page, unsigned int order);
>>> extern void free_pages(unsigned long addr, unsigned int order);
>>> extern void free_hot_cold_page(struct page *page, int cold);
>>> +extern void free_hot_cold_page_list(struct list_head *list, int cold);
>>>
>>> #define __free_page(page) __free_pages((page), 0)
>>> #define free_page(addr) free_pages((addr), 0)
>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>> index 1dbcf88..af486e4 100644
>>> --- a/mm/page_alloc.c
>>> +++ b/mm/page_alloc.c
>>> @@ -1209,6 +1209,18 @@ out:
>>> local_irq_restore(flags);
>>> }
>>>
>>> +void free_hot_cold_page_list(struct list_head *list, int cold)
>>> +{
>>> + struct page *page, *next;
>>> +
>>> + list_for_each_entry_safe(page, next, list, lru) {
>>> + trace_mm_pagevec_free(page, cold);
>>
>>
>> I understand you want to minimize changes without breaking current ABI
>> with trace tools.
>> But apparently, it's not a pagevec_free. It just hurts readability.
>> As I take a look at the code, mm_pagevec_free isn't related to pagevec
>> but I guess it can represent 0-order pages free because 0-order pages
>> are freed only by pagevec until now.
>> So, how about renaming it with mm_page_free or mm_page_free_zero_order?
>> If you do, you need to do s/MM_PAGEVEC_FREE/MM_FREE_FREE/g in
>> trace-pagealloc-postprocess.pl.
>>
>>
>>> + free_hot_cold_page(page, cold);
>>> + }
>>> +
>>> + INIT_LIST_HEAD(list);
>>
>> Why do we need it?
>
> My email has been horrid for a couple of months (fixed now), so I might
> have missed any reply to Minchan's review comments?
>

Sorry, I forgot about this patch. v2 has been sent.

==============================================================================
TOPIC: linux-next: manual merge of the akpm with the scsi tree
http://groups.google.com/group/linux.kernel/t/7fcb3b719ed438dc?hl=en
==============================================================================

== 1 of 1 ==
Date: Tues, Nov 1 2011 1:00 am
From: Stephen Rothwell


Hi Andrew,

Today's linux-next merge of the akpm tree got a conflict in
drivers/scsi/sd.c between commit 21208ae5a21f ("[SCSI] sd: remove
arbitrary SD_MAX_DISKS namespace limit") from the scsi tree and
"drivers/scsi/sd.c: use ida_simple_get() and ida_simple_remove() in place
of boilerplate code" from the akpm tree.

I think that the latter supersedes the former, so I used it.
--
Cheers,
Stephen Rothwell sfr@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/

==============================================================================
TOPIC: i2c-gpio.c: correct logic of pdata->scl_is_open_drain
http://groups.google.com/group/linux.kernel/t/cff5233b6757a96f?hl=en
==============================================================================

== 1 of 1 ==
Date: Tues, Nov 1 2011 1:10 am
From: "Voss, Nikolaus"


> It is never correct to use push-pull I/O for i2c, so the flag does not
> specify the desired behavior of the driver, it specifies what the
> hardware has been configured to do so that the driver can choose the
> cheapest way to do open-drain I/O.

Ok, seems rather logical after all.

>
> And even if you could argue that the flag should be inverted, it has
> had the same meaning since the driver was introduced several years
> ago, so changing it now will break every single platform which
> currently uses i2c-gpio.

I completely agree.
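
The point in the quoted explanation, that the flag describes the hardware
and the driver then picks the cheapest way to get open-drain behaviour, can
be sketched with a simulated pin. This is an illustration only: sim_pin,
line_set and the helper predicates are hypothetical names, not the i2c-gpio
driver or any real GPIO API.

```c
#include <assert.h>
#include <stdbool.h>

/* Simulated GPIO pin; a hypothetical stand-in for a real GPIO API. */
struct sim_pin {
	bool is_open_drain;  /* hardware already configured as open drain */
	bool is_output;      /* current direction */
	bool out_value;      /* value latched in the output register */
};

/* True if the pin actively drives the bus line high (push-pull high). */
static bool sim_pin_drives_high(const struct sim_pin *pin)
{
	return pin->is_output && pin->out_value && !pin->is_open_drain;
}

/* True if the pin is pulling the bus line low. */
static bool sim_pin_pulls_low(const struct sim_pin *pin)
{
	return pin->is_output && !pin->out_value;
}

/*
 * Set the logical line state with open-drain semantics.
 * If the hardware is open drain, writing the value is enough.
 * Otherwise emulate it: drive low as an output, or release the line by
 * switching to input so the external pull-up raises it.
 */
static void line_set(struct sim_pin *pin, bool high)
{
	if (pin->is_open_drain) {
		pin->is_output = true;
		pin->out_value = high;
	} else if (high) {
		pin->is_output = false;   /* release: high impedance */
	} else {
		pin->is_output = true;
		pin->out_value = false;   /* actively pull low */
	}
}
```

In both branches the pin never drives the line high itself, which is the
property an i2c bus needs; the flag only selects how that is achieved.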

Thanks for the explanation,
Niko


==============================================================================
TOPIC: question about kernel panic related to sched_rt.c
http://groups.google.com/group/linux.kernel/t/e009b997343bf4fc?hl=en
==============================================================================

== 1 of 1 ==
Date: Tues, Nov 1 2011 1:10 am
From: MING ZHOU


Hi all,

May I ask a question about the scheduler (sched_rt.c)? I want to check
whether a related patch posted to the linux-kernel mailing list is valid.

https://lkml.org/lkml/2011/8/14/71

I recently encountered a kernel panic caused by a BUG_ON in
pick_next_pushable_task (my kernel version is 2.6.35, on the arm-omap
platform).

static struct task_struct *pick_next_pushable_task(struct rq *rq)
{
...
BUG_ON(task_current(rq, p)); <------------ panic here!!!
...
}

<4>[17583.180664] [<c00a3888>] (pick_next_pushable_task+0x4c/0xa4)
from [<c00ae21c>] (push_rt_task+0x20/0x264)
<4>[17583.180725] [<c00ae21c>] (push_rt_task+0x20/0x264) from
[<c00ae554>] (post_schedule_rt+0x14/0x20)
<4>[17583.180816] [<c00ae554>] (post_schedule_rt+0x14/0x20) from
[<c069352c>] (schedule+0x738/0x7c8)


I checked the patch history related to push_rt_task, and I think the
following patch may be related, since dequeuing a task improperly
could corrupt the task pointer.

https://lkml.org/lkml/2011/8/14/71

Commit-ID: 311e800e16f63d909136a64ed17ca353a160be59
Author: Hillf Danton <dhillf@gmail.com>

sched, rt: Fix rq->rt.pushable_tasks bug in push_rt_task()

Do not call dequeue_pushable_task() when failing to push an eligible
task, as it remains pushable, merely not at this particular moment.

Signed-off-by: Hillf Danton <dhillf@gmail.com>
Signed-off-by: Mike Galbraith <mgalbraith@gmx.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Yong Zhang <yong.zhang0@gmail.com>

I also noticed in the commit history of sched_rt.c that a similar
patch was submitted in 2008.
However, it was not picked up in the latest kernel code, so I am
wondering whether this patch is valid.

commit 1563513d34ed4b12ef32bc2adde4a53ce05701a1
Author: Gregory Haskins <ghaskins@novell.com>
Date: Mon Dec 29 09:39:53 2008 -0500

RT: fix push_rt_task() to handle dequeue_pushable properly

A panic was discovered by Chirag Jog where a BUG_ON sanity check
in the new "pushable_task" logic would trigger a panic under
certain circumstances:

http://lkml.org/lkml/2008/9/25/189

Gilles Carry discovered that the root cause was attributed to the
pushable_tasks list getting corrupted in the push_rt_task logic.
This was the result of a dropped rq lock in double_lock_balance
allowing a task in the process of being pushed to potentially migrate
away, and thus corrupt the pushable_tasks() list.

I traced back the problem as introduced by the pushable_tasks patch
that went in recently. There is a "retry" path in push_rt_task()
that actually had a compound conditional to decide whether to
retry or exit. I missed the meaning behind the rationale for the
virtual "if(!task) goto out;" portion of the compound statement and
thus did not handle it properly. The new pushable_tasks logic
actually creates three distinct conditions:

1) an untouched and unpushable task should be dequeued
2) a migrated task where more pushable tasks remain should be retried
3) a migrated task where no more pushable tasks exist should exit

The original logic mushed (1) and (3) together, resulting in the
system dequeuing a migrated task (against an unlocked foreign run-queue
nonetheless).

To fix this, we get rid of the notion of "paranoid" and we support the
three unique conditions properly. The paranoid feature is no longer
relevant with the new pushable logic (since pushable naturally limits
the loop) anyway, so lets just remove it.
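
The three conditions listed in the commit message above can be written
out as a small decision function. This is a sketch of the described logic
only, not the code in sched_rt.c: the enum, the function name
classify_failed_push and both parameters are hypothetical, chosen to mirror
the wording of the commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for the three outcomes listed in the commit message. */
enum push_outcome {
	PUSH_DEQUEUE,  /* (1) task untouched but unpushable: dequeue it */
	PUSH_RETRY,    /* (2) task migrated, more pushable tasks remain */
	PUSH_EXIT,     /* (3) task migrated, nothing left to push */
};

/*
 * Classify a failed push attempt.
 * task_migrated: the task moved away while the rq lock was dropped.
 * more_pushable: the pushable list still holds another candidate.
 */
static enum push_outcome classify_failed_push(bool task_migrated,
					      bool more_pushable)
{
	/* Only a task that stayed put may be dequeued from this run-queue. */
	if (!task_migrated)
		return PUSH_DEQUEUE;
	return more_pushable ? PUSH_RETRY : PUSH_EXIT;
}
```

Keeping (1) and (3) as separate outcomes is the point of the fix as
described: a migrated task is never dequeued here.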

Reported-By: Chirag Jog <chirag@linux.vnet.ibm.com>
Found-by: Gilles Carry <gilles.carry@bull.net>
Signed-off-by: Gregory Haskins <ghaskins@novell.com>


Best Regards,
Jane Zhou
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

==============================================================================
TOPIC: ata port runtime pm
http://groups.google.com/group/linux.kernel/t/62f6e4e08ae131b1?hl=en
==============================================================================

== 1 of 1 ==
Date: Tues, Nov 1 2011 1:20 am
From: Lin Ming


On Sat, 2011-10-29 at 02:51 +0800, Alan Stern wrote:
> On Fri, 28 Oct 2011, Rafael J. Wysocki wrote:
>
> > On Friday, October 28, 2011, Lin Ming wrote:
> > > On Fri, 2011-10-28 at 11:37 +0800, Jeff Garzik wrote:
> > > > On 10/27/2011 11:21 PM, Lin Ming wrote:
> > > > > @@ -3208,6 +3209,11 @@ int ata_scsi_queuecmd(struct Scsi_Host *shost, struct scsi_cmnd *cmd)
> > > > >
> > > > > ap = ata_shost_to_port(shost);
> > > > >
> > > > > + if (pm_runtime_suspended(&ap->tdev))
> > > > > + pm_runtime_resume(&ap->tdev);
> > > > > + pm_runtime_mark_last_busy(&ap->tdev);
> > > > > + pm_request_autosuspend(&ap->tdev);
> > > > > +
> > > > > spin_lock_irqsave(ap->lock, irq_flags);
> > > > >
> > > >
> > > >
> > > > Putting this into the core command dispatch fast-path is rather
> > > > disappointing. That's at least one additional lock, plus some atomic
> > > > instructions and tests.
>
> And it calls pm_runtime_resume(), which requires process context, from
> within a SCSI queuecmd routine, which runs in interrupt context.

Hi,

Thanks for pointing this out. I changed the code to do ata port runtime
suspend/resume through the scsi layer.

The scsi host runtime suspend/resume framework is already there (scsi_pm.c),
so I only need to insert hooks for the ata port in
scsi_runtime_suspend/resume(...).

But I hit a deadlock when testing my patch.

<scsi host runtime suspend>
scsi_autopm_put_host
pm_runtime_put_sync
<scsi_host runtime pm status updated to RPM_SUSPENDING>
......
<call libata hook to do suspend>
<wake up scsi EH to handle suspend>
<wait for scsi EH ...>

<scsi EH wake up>
scsi_error_handler
<resume scsi host>
scsi_autopm_get_host
pm_runtime_get_sync
.....
<sleep to wait for the ongoing scsi host suspend>

libata schedules scsi EH to handle the suspend, and then a deadlock happens
because scsi EH in turn waits for the ongoing suspend to finish.

Any idea how to resolve this deadlock?

Thanks,
Lin Ming


==============================================================================
TOPIC: linux-next: manual merge of the block tree with Linus' tree
http://groups.google.com/group/linux.kernel/t/6dba4bcd675d2516?hl=en
==============================================================================

== 1 of 2 ==
Date: Tues, Nov 1 2011 1:20 am
From: Jens Axboe


On 2011-11-01 06:15, Stephen Rothwell wrote:
> Hi Jens,
>
> Today's linux-next merge of the block tree got a conflict in
> drivers/md/raid10.c between commit fd01b88c75a7 ("md: remove typedefs:
> mddev_t -> struct mddev") from Linus' tree and commit 5a7bbad27a41
> ("block: remove support for bio remapping from ->make_request") from the
> block tree.
>
> I fixed it up (see below) and can carry the fix as necessary.

Fixup looks good, however don't you get the same conflict in basically
all the raid personalities? Fixup is indeed just the simple int -> void
transition, conflict is because of the mddev_t -> struct mddev change
that is now in Linus' tree.

I'll push my pending off to Linus today or tomorrow, so this will be
resolved shortly.

--
Jens Axboe



== 2 of 2 ==
Date: Tues, Nov 1 2011 2:10 am
From: Stephen Rothwell


Hi Jens,

On Tue, 01 Nov 2011 09:09:45 +0100 Jens Axboe <axboe@kernel.dk> wrote:
>
> Fixup looks good, however don't you get the same conflict in basically
> all the raid personalities? Fixup is indeed just the simple int -> void
> transition, conflict is because of the mddev_t -> struct mddev change
> that is now in Linus' tree.

I got these conflicts in all the raid personalities a while ago. I am
not sure why this turned up for just raid10 again.

> I'll push my pending off to Linus today or tomorrow, so this will be
> resolved shortly.

OK, thanks.

--
Cheers,
Stephen Rothwell sfr@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/

==============================================================================
TOPIC: freezer: revert 27920651fe "PM / Freezer: Make fake_signal_wake_up()
wake TASK_KILLABLE tasks too"
http://groups.google.com/group/linux.kernel/t/af2947db266e716a?hl=en
==============================================================================

== 1 of 1 ==
Date: Tues, Nov 1 2011 1:20 am
From: Jeff Layton


On Mon, 31 Oct 2011 17:55:05 -0700
Tejun Heo <tj@kernel.org> wrote:

> Hey, again.
>
> On Mon, Oct 31, 2011 at 04:30:59PM -0700, Tejun Heo wrote:
> > I can't remember one off the top of my head but I'm pretty sure there
> > at least are few which expect tight inter-locking between sleeps and
> > wakeups. I'll look for examples and post reply. ISTR them being
> > kernel threads so this might not apply directly but it's still a
> > dangerous game to play.
>
> Hmm... I couldn't find KILLABLE used like that but here are two
> UNINTERRUPTIBLE sleep examples.
>
> * kthread_start() depends on the fact that a kthread won't be woken up
> from UNINTERRUPTIBLE sleep spuriously.
>
> * jfs_flush_journal() doesn't check whether the wakeup was spurious
> after waiting if !tblkGC_COMMITTED.
>
> Maybe we can re-define KILLABLE as killable && freezable but IMHO that
> requires pretty strong rationales. If at all possible, let's not
> diddle with that if it can be worked around some other way.
>
> Thank you.
>

(cc'ing Trond and the linux-nfs mailing list -- fwiw, he maintains the
NFS client code -- Bruce is the NFS server maintainer and probably has
little interest in this thread).

The main reason for this change is that we have people with
laptops and nfs and cifs mounts that sometimes fail to suspend.

IIUC, the TASK_KILLABLE was mostly added to ensure that file-store
writes would be uninterruptible, but still allow those tasks to be
killed if the process is going down anyway.

The intr/nointr mount options in NFS have been deprecated since
TASK_KILLABLE was added. The scheme now is basically that those sleeps
ignore any signals except for fatal ones. So, that knob is meaningless
and has been for a long time now.

cifs never had a working intr/nointr knob, but signal handling while
waiting for replies was always a difficult thing to handle correctly. I
don't think the right answer is to go back to using such a knob in cifs
or nfs.

I suppose we could look at going back to the world of complicated
signal handling and TASK_INTERRUPTIBLE, but that's not a trivial change
either. The TASK_WAKE_FREEZABLE flag you mention might make more sense
than doing that.

--
Jeff Layton <jlayton@redhat.com>

==============================================================================
TOPIC: linux-next: manual merge of the akpm tree with Linus' tree
http://groups.google.com/group/linux.kernel/t/677ce897b731eb35?hl=en
==============================================================================

== 1 of 1 ==
Date: Tues, Nov 1 2011 1:20 am
From: Stephen Rothwell


Hi Andrew,

Today's linux-next merge of the akpm tree got a conflict in
fs/direct-io.c between commit eb28be2b4c0a ("direct-io: separate fields
only used in the submission path from struct dio") from Linus' tree and
commit "fs/direct-io.c: salcuate fs_count correctly in get_more_blocks()"
from the akpm tree.

I fixed it up (see below) and can carry the fix as necessary.
--
Cheers,
Stephen Rothwell sfr@canb.auug.org.au

diff --cc fs/direct-io.c
index d740ab6,b05f24e..0000000
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@@ -575,14 -564,13 +575,13 @@@ static inline int dio_bio_reap(struct d
* buffer_mapped(). However the direct-io code will only process holes one
* block at a time - it will repeatedly call get_block() as it walks the hole.
*/
-static int get_more_blocks(struct dio *dio)
+static int get_more_blocks(struct dio *dio, struct dio_submit *sdio,
+ struct buffer_head *map_bh)
{
int ret;
- struct buffer_head *map_bh = &dio->map_bh;
sector_t fs_startblk; /* Into file, in filesystem-sized blocks */
+ sector_t fs_endblk; /* Into file, in filesystem-sized blocks */
unsigned long fs_count; /* Number of filesystem-sized blocks */
- unsigned long dio_count;/* Number of dio_block-sized blocks */
- unsigned long blkmask;
int create;

/*
@@@ -591,13 -579,10 +590,10 @@@
*/
ret = dio->page_errors;
if (ret == 0) {
- BUG_ON(dio->block_in_file >= dio->final_block_in_request);
- fs_startblk = dio->block_in_file >> dio->blkfactor;
- fs_endblk = (dio->final_block_in_request - 1) >> dio->blkfactor;
+ BUG_ON(sdio->block_in_file >= sdio->final_block_in_request);
+ fs_startblk = sdio->block_in_file >> sdio->blkfactor;
- dio_count = sdio->final_block_in_request - sdio->block_in_file;
- fs_count = dio_count >> sdio->blkfactor;
- blkmask = (1 << sdio->blkfactor) - 1;
- if (dio_count & blkmask)
- fs_count++;
++ fs_endblk = (sdio->final_block_in_request - 1) >> sdio->blkfactor;
+ fs_count = fs_endblk - fs_startblk + 1;

map_bh->b_state = 0;
map_bh->b_size = fs_count << dio->inode->i_blkbits;
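
For readers following the resolution, here is a small userspace sketch of the
two ways of computing fs_count that appear in the diff above: the
length-based count (the dio_count lines that were dropped) and the
range-based count (fs_endblk - fs_startblk + 1). The function names are made
up for this illustration; only the arithmetic is taken from the diff.

```c
#include <assert.h>

/* Length-based count: round the number of dio blocks up to whole fs blocks. */
static unsigned long fs_count_by_length(unsigned long block_in_file,
                                        unsigned long final_block_in_request,
                                        unsigned int blkfactor)
{
    unsigned long dio_count = final_block_in_request - block_in_file;
    unsigned long fs_count = dio_count >> blkfactor;
    unsigned long blkmask = (1UL << blkfactor) - 1;

    if (dio_count & blkmask)
        fs_count++;
    return fs_count;
}

/* Range-based count: index of the last fs block minus the first, plus one. */
static unsigned long fs_count_by_range(unsigned long block_in_file,
                                       unsigned long final_block_in_request,
                                       unsigned int blkfactor)
{
    unsigned long fs_startblk, fs_endblk;

    assert(block_in_file < final_block_in_request);
    fs_startblk = block_in_file >> blkfactor;
    fs_endblk = (final_block_in_request - 1) >> blkfactor;
    return fs_endblk - fs_startblk + 1;
}
```

With blkfactor = 3 (eight dio blocks per fs block), a request covering dio
blocks 7 and 8 straddles two fs blocks. The length-based version reports one
block, while the range-based version reports two. When the start is aligned
to an fs block boundary the two agree.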

==============================================================================
TOPIC: [mm/memory.c]: transparent hugepage check condition missed
http://groups.google.com/group/linux.kernel/t/423b60da12215e28?hl=en
==============================================================================

== 1 of 3 ==
Date: Tues, Nov 1 2011 1:30 am
From: "Guan Jun He"


>>> On 11/1/2011 at 09:18 AM, in message <1320110288.22361.190.camel@sli10-conroe>,
Shaohua Li <shaohua.li@intel.com> wrote:
> On Mon, 2011-10-31 at 16:23 +0800, Guanjun He wrote:
>> For the transparent hugepage module still does not support
>> tmpfs and cache,the check condition should always be checked
>> to make sure that it only affect the anonymous maps, the
>> original check condition missed this, this patch is to fix this.
>> Otherwise,the hugepage may affect the file-backed maps,
>> then the cache for the small-size pages will be unuseful,
>> and till now there is still no implementation for hugepage's cache.
>>
>> Signed-off-by: Guanjun He <gjhe@suse.com>
>> ---
>> mm/memory.c | 3 ++-
>> 1 files changed, 2 insertions(+), 1 deletions(-)
>>
>> diff --git a/mm/memory.c b/mm/memory.c
>> index a56e3ba..79b85fe 100644
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -3475,7 +3475,8 @@ int handle_mm_fault(struct mm_struct *mm, struct
> vm_area_struct *vma,
>> if (pmd_trans_huge(orig_pmd)) {
>> if (flags & FAULT_FLAG_WRITE &&
>> !pmd_write(orig_pmd) &&
>> - !pmd_trans_splitting(orig_pmd))
>> + !pmd_trans_splitting(orig_pmd) &&
>> + !vma->vm_ops)
>> return do_huge_pmd_wp_page(mm, vma, address,
>> pmd, orig_pmd);
>> return 0;
> so if vma->vm_ops != NULL, how could the pmd_trans_huge(orig_pmd) be
> true? We never enable THP if vma->vm_ops != NULL.
Actually, pmd_trans_huge(orig_pmd) only checks the _PAGE_PSE bits;
they only indicate a page size and are not a flag that identifies a hugepage.
If I change my default pagesize to PAGE_PSE, then THP will be confused.

There is already a definition:

#define VM_HUGEPAGE 0x01000000 /* MADV_HUGEPAGE marked this vma */

Maybe this can be the flag that identifies a hugepage. But the comment says it only stands for MADV_HUGEPAGE,
and it's still a hugepage. So I suggest either adding the check condition !vma->vm_ops, or
using VM_HUGEPAGE as the flag,
or
adjusting the logic to:
(transparent_hugepage_enabled() uses the VM_HUGEPAGE flag)

if (transparent_hugepage_enabled(vma)) {
if (pmd_none(*pmd)) {
...
}
else
{
...
}
}


the original logic is:

if (pmd_none(*pmd) && transparent_hugepage_enabled(vma))
{
...
}
else
{
...
}



== 2 of 3 ==
Date: Tues, Nov 1 2011 1:50 am
From: Shaohua Li


2011/11/1 Guan Jun He <gjhe@suse.com>:
>
>
>>>> On 11/1/2011 at 09:18 AM, in message <1320110288.22361.190.camel@sli10-conroe>,
> Shaohua Li <shaohua.li@intel.com> wrote:
>> On Mon, 2011-10-31 at 16:23 +0800, Guanjun He wrote:
>>> For the transparent hugepage module still does not support
>>> tmpfs and cache,the check condition should always be checked
>>> to make sure that it only affect the anonymous maps, the
>>> original check condition missed this, this patch is to fix this.
>>> Otherwise,the hugepage may affect the file-backed maps,
>>> then the cache for the small-size pages will be unuseful,
>>> and till now there is still no implementation for hugepage's cache.
>>>
>>> Signed-off-by: Guanjun He <gjhe@suse.com>
>>> ---
>>>  mm/memory.c |    3 ++-
>>>  1 files changed, 2 insertions(+), 1 deletions(-)
>>>
>>> diff --git a/mm/memory.c b/mm/memory.c
>>> index a56e3ba..79b85fe 100644
>>> --- a/mm/memory.c
>>> +++ b/mm/memory.c
>>> @@ -3475,7 +3475,8 @@ int handle_mm_fault(struct mm_struct *mm, struct
>> vm_area_struct *vma,
>>>              if (pmd_trans_huge(orig_pmd)) {
>>>                      if (flags & FAULT_FLAG_WRITE &&
>>>                          !pmd_write(orig_pmd) &&
>>> -                        !pmd_trans_splitting(orig_pmd))
>>> +                        !pmd_trans_splitting(orig_pmd) &&
>>> +                        !vma->vm_ops)
>>>                              return do_huge_pmd_wp_page(mm, vma, address,
>>>                                                         pmd, orig_pmd);
>>>                      return 0;
>> so if vma->vm_ops != NULL, how could the pmd_trans_huge(orig_pmd) be
>> true? We never enable THP if vma->vm_ops != NULL.
> acturally, pmd_trans_huge(orig_pmd) only checks the _PAGE_PSE bits,
> it's only a pagesize, not a flag to identity a hugepage.
> If I change my default pagesize to PAGE_PSE,
Not sure what pagesize means here, assume pmd entry bits.
how could you make the default 'pagesize' to PAGE_PSE?


== 3 of 3 ==
Date: Tues, Nov 1 2011 2:20 am
From: "Guan Jun He"


>>> On 11/1/2011 at 04:42 PM, in message
<CANejiEVk41X-P+UyMf96jmPrJJ5-_vbubYtnQgaWXY2FLb41iw@mail.gmail.com>, Shaohua
Li <shaohua.li@intel.com> wrote:
> 2011/11/1 Guan Jun He <gjhe@suse.com>:
>>
>>
>>>>> On 11/1/2011 at 09:18 AM, in message <1320110288.22361.190.camel@sli10-conroe>,
>> Shaohua Li <shaohua.li@intel.com> wrote:
>>> On Mon, 2011-10-31 at 16:23 +0800, Guanjun He wrote:
>>>> For the transparent hugepage module still does not support
>>>> tmpfs and cache,the check condition should always be checked
>>>> to make sure that it only affect the anonymous maps, the
>>>> original check condition missed this, this patch is to fix this.
>>>> Otherwise,the hugepage may affect the file-backed maps,
>>>> then the cache for the small-size pages will be unuseful,
>>>> and till now there is still no implementation for hugepage's cache.
>>>>
>>>> Signed-off-by: Guanjun He <gjhe@suse.com>
>>>> ---
>>>> mm/memory.c | 3 ++-
>>>> 1 files changed, 2 insertions(+), 1 deletions(-)
>>>>
>>>> diff --git a/mm/memory.c b/mm/memory.c
>>>> index a56e3ba..79b85fe 100644
>>>> --- a/mm/memory.c
>>>> +++ b/mm/memory.c
>>>> @@ -3475,7 +3475,8 @@ int handle_mm_fault(struct mm_struct *mm, struct
>>> vm_area_struct *vma,
>>>> if (pmd_trans_huge(orig_pmd)) {
>>>> if (flags & FAULT_FLAG_WRITE &&
>>>> !pmd_write(orig_pmd) &&
>>>> - !pmd_trans_splitting(orig_pmd))
>>>> + !pmd_trans_splitting(orig_pmd) &&
>>>> + !vma->vm_ops)
>>>> return do_huge_pmd_wp_page(mm, vma, address,
>>>> pmd, orig_pmd);
>>>> return 0;
>>> so if vma->vm_ops != NULL, how could the pmd_trans_huge(orig_pmd) be
>>> true? We never enable THP if vma->vm_ops != NULL.
>> acturally, pmd_trans_huge(orig_pmd) only checks the _PAGE_PSE bits,
>> it's only a pagesize, not a flag to identity a hugepage.
>> If I change my default pagesize to PAGE_PSE,
> Not sure what pagesize means here, assume pmd entry bits.
yes, it's pmd entry bits.
> how could you make the default 'pagesize' to PAGE_PSE?
That would require some work and hardware support, so it won't happen any time soon.
But one can easily create the same pmd entry bits for some special use;
as noted above, these are pmd entry bits that only mark a size, not a flag.
Adjusting the logic to use the flag would avoid this potential issue
with basically no impact on the current code.



==============================================================================
TOPIC: hda_hwdep: Fix possible buffer overflow
http://groups.google.com/group/linux.kernel/t/0918224a3fbaac5a?hl=en
==============================================================================

== 1 of 2 ==
Date: Tues, Nov 1 2011 1:50 am
From: Alexander Stein


If a line in the firmware file is larger than the given buffer size (and
so the firmware file size), size is set to a value larger than the actual
buffer size. This results in an overflow in the buffer passed.

Signed-off-by: Alexander Stein <alexander.stein@systec-electronic.com>
---
Changes in v2:
* Just remove the erroneous check

sound/pci/hda/hda_hwdep.c | 2 --
1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/sound/pci/hda/hda_hwdep.c b/sound/pci/hda/hda_hwdep.c
index 72e5885..7e7d078 100644
--- a/sound/pci/hda/hda_hwdep.c
+++ b/sound/pci/hda/hda_hwdep.c
@@ -756,8 +756,6 @@ static int get_line_from_fw(char *buf, int size, struct firmware *fw)
}
if (!fw->size)
return 0;
- if (size < fw->size)
- size = fw->size;

for (len = 0; len < fw->size; len++) {
if (!*p)
--
1.7.3.4
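
The check removed by this patch raised size to fw->size whenever the
caller's buffer was smaller than the remaining firmware data, so the limit no
longer described the buffer. As a minimal userspace sketch of the intended
behaviour (not the driver code), the line reader below bounds every write by
the buffer capacity and every read by the remaining data. struct blob and
next_line are hypothetical names, and truncating over-long lines is an
assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the firmware blob: unread data and its length. */
struct blob {
    const char *data;
    size_t size;
};

/*
 * Copy the next line from src into buf, which holds cap bytes including the
 * terminating NUL. Characters that do not fit are consumed but dropped.
 * Returns 1 if a line was produced, 0 when no data is left.
 */
static int next_line(char *buf, size_t cap, struct blob *src)
{
    size_t out = 0;

    assert(cap > 0);
    if (src->size == 0)
        return 0;

    while (src->size > 0) {
        char c = *src->data++;

        src->size--;
        if (c == '\0' || c == '\n')
            break;
        if (out + 1 < cap)      /* always leave room for the terminator */
            buf[out++] = c;
    }
    buf[out] = '\0';
    return 1;
}
```

A four-byte buffer fed the line "abcdefgh" ends up holding "abc" plus the
terminator, and the next call starts at the following line.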



== 2 of 2 ==
Date: Tues, Nov 1 2011 1:50 am
From: Takashi Iwai


At Tue, 1 Nov 2011 09:40:07 +0100,
Alexander Stein wrote:
>
> If a line in the firmware file is larger than the given buffer size (and
> so the firmware file size), size is set to a value larger than the actual
> buffer size. This results in an overflow in the buffer passed.
>
> Signed-off-by: Alexander Stein <alexander.stein@systec-electronic.com>
> ---
> Changes in v2:
> * Just remove the erroneous check

Thanks, applied now.


Takashi

>
> sound/pci/hda/hda_hwdep.c | 2 --
> 1 files changed, 0 insertions(+), 2 deletions(-)
>
> diff --git a/sound/pci/hda/hda_hwdep.c b/sound/pci/hda/hda_hwdep.c
> index 72e5885..7e7d078 100644
> --- a/sound/pci/hda/hda_hwdep.c
> +++ b/sound/pci/hda/hda_hwdep.c
> @@ -756,8 +756,6 @@ static int get_line_from_fw(char *buf, int size, struct firmware *fw)
> }
> if (!fw->size)
> return 0;
> - if (size < fw->size)
> - size = fw->size;
>
> for (len = 0; len < fw->size; len++) {
> if (!*p)
> --
> 1.7.3.4
>

==============================================================================
TOPIC: ramoops appears geared to not support ARM
http://groups.google.com/group/linux.kernel/t/756889b4ecfdf6c6?hl=en
==============================================================================

== 1 of 1 ==
Date: Tues, Nov 1 2011 2:00 am
From: Marco Stornelli


On 01/11/2011 00:03, Bryan Freed wrote:
> On Mon, Oct 31, 2011 at 1:57 AM, Marco Stornelli
>>> And I cannot shake the feeling that we have a fairly simple disconnect
>>> here. Ramoops expects to use _device_ memory because it uses
>>> ioremap(). But the buffer itself is accessed through /dev/mem which
>>> (as we use it with no mmap() calls) expects to give access to _system_
>>
>> no mmap calls?! I don't understand how you are using /dev/mem.
>
> open(), lseek(), read(). No mmap is required for RAM, right?
> dd if=/dev/mem bs=1 count=1000 skip=32M
>

Mmmm, the operations done are different. Try: reserve the memory with
memblock_reserve and read some data with this useful program
http://free-electrons.com/pub/mirror/devmem2.c from the right location
(the address used for ramoops).

Let me know.

Marco

==============================================================================
TOPIC: linux-next: build failure after merge of the akpm tree
http://groups.google.com/group/linux.kernel/t/efc4ff39e4b9bd8c?hl=en
==============================================================================

== 1 of 1 ==
Date: Tues, Nov 1 2011 2:00 am
From: Stephen Rothwell


Hi Andrew,

After merging the akpm tree, today's linux-next build (powerpc
ppc64_defconfig) failed like this:

next/drivers/scsi/sd.c: In function 'sd_probe':
next/drivers/scsi/sd.c:2583:43: error: 'SD_MAX_DISKS' undeclared (first use in this function)

Caused by commit ddabd33db5a2 ("drivers/scsi/sd.c: use ida_simple_get()
and ida_simple_remove() in place of boilerplate code") (which was fixed
up by me). I am not sure how to fix this properly (SD_MAX_DISKS was
removed by other commits), so I have just reverted that patch for today.

--
Cheers,
Stephen Rothwell sfr@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/

==============================================================================
TOPIC: [RFC] A readahead complete notify approach to implement buffer aio
http://groups.google.com/group/linux.kernel/t/a7ba5127efa4667f?hl=en
==============================================================================

== 1 of 1 ==
Date: Tues, Nov 1 2011 2:00 am
From: Zhu Yanhai


The current libaio/aio has to use Direct-IO; otherwise it falls back to sync IO.
However, the aio core is already asynchronous by nature. This patch adds a completion
notify mechanism to implement buffered aio. The main idea is to do a readahead()-like pass in
io_submit(), count the non-uptodate pages associated with each iocb, put each ref
in the bio completion path just before unlock_page(), and finally hook the iocb onto the aio
ring buffer when the ref drops to zero. In io_getevents(), we call vfs_read() as a safety net,
since there is still a small possibility that the pages brought in were reclaimed
between io_submit() and io_getevents().
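
To make the reference counting described above concrete, here is a minimal
userspace sketch using C11 atomics. It only models the counting scheme: the
submitter holds one reference, each page sent for IO takes another, and the
request completes when the last reference is dropped. struct request and the
request_* helpers are hypothetical stand-ins, not the patch's ba_iocb code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the per-iocb control structure. */
struct request {
    atomic_int ref;   /* one per in-flight page, plus one for the submitter */
    bool completed;   /* set once, when the last reference goes away */
};

static void request_init(struct request *req)
{
    atomic_init(&req->ref, 1);   /* the submitter's own reference */
    req->completed = false;
}

/* Taken once for every page that is actually sent for IO. */
static void request_get(struct request *req)
{
    atomic_fetch_add(&req->ref, 1);
}

/* Dropped from each page's completion path and once by the submitter. */
static void request_put(struct request *req)
{
    assert(atomic_load(&req->ref) > 0);
    /* atomic_fetch_sub returns the old value: 1 means this was the last ref */
    if (atomic_fetch_sub(&req->ref, 1) == 1)
        req->completed = true;
}
```

Holding the submitter's reference until all pages have been queued keeps the
request from completing early if the first page finishes before the
remaining ones are submitted.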

I have tested this patch for a while. For small random io requests, its
performance is more or less the same as traditional aio; for big io requests,
the overhead of one extra memory copy shows up.

I think so far it has at least the following obvious drawbacks:

* mpage_readpage() is a really narrow interface. I have no way to pass down
the new control struct baiocb, so I just put it into struct task_struct and
refer to it through current as a workaround.

* the do_baio_read() routine is very similar to do_generic_file_read(), but
the latter is really hard to modify. I think we may push this code down into the
readahead path to reduce code duplication.

Hopefully the explanations are clear enough and don't muddy the water any worse.
I figure the code does need some better comments, and any suggestions are welcome.

Signed-off-by: Zhu Yanhai <gaoyang.zyh@taobao.com>

---
fs/aio.c | 319 ++++++++++++++++++++++++++++++++++++++++++-
fs/buffer.c | 26 ++++-
fs/mpage.c | 28 ++++-
include/linux/aio.h | 9 ++
include/linux/aio_abi.h | 1 +
include/linux/blk_types.h | 2 +
include/linux/buffer_head.h | 3 +
include/linux/page-flags.h | 2 +
include/linux/sched.h | 1 +
9 files changed, 386 insertions(+), 5 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index e29ec48..19fc95e 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -53,6 +53,7 @@ unsigned long aio_max_nr = 0x10000; /* system wide maximum number of aio request

static struct kmem_cache *kiocb_cachep;
static struct kmem_cache *kioctx_cachep;
+static struct kmem_cache *ba_iocb_cachep;

static struct workqueue_struct *aio_wq;

@@ -75,6 +76,7 @@ static int __init aio_setup(void)
kiocb_cachep = KMEM_CACHE(kiocb, SLAB_HWCACHE_ALIGN|SLAB_PANIC);
kioctx_cachep = KMEM_CACHE(kioctx,SLAB_HWCACHE_ALIGN|SLAB_PANIC);

+ ba_iocb_cachep = KMEM_CACHE(ba_iocb, SLAB_HWCACHE_ALIGN|SLAB_PANIC);
aio_wq = alloc_workqueue("aio", 0, 1); /* used to limit concurrency */
BUG_ON(!aio_wq);

@@ -1074,19 +1076,79 @@ static inline void clear_timeout(struct aio_timeout *to)
del_singleshot_timer_sync(&to->timer);
}

+static int baio_vfs_read(unsigned int fd, char __user *buf,
+ size_t count, loff_t pos)
+{
+ struct file *file;
+ ssize_t ret = -EBADF;
+ int fput_needed;
+
+ file = fget_light(fd, &fput_needed);
+ if (file) {
+ ret = vfs_read(file, buf, count, &pos);
+ fput_light(file, fput_needed);
+ }
+
+ return ret;
+}
+static int baio_read_to_user(struct io_event *ent)
+{
+ struct iocb __user *user_iocb;
+ struct iocb tmp;
+ int ret;
+
+ user_iocb = (struct iocb *)(ent->obj);
+ if (unlikely(copy_from_user(&tmp, user_iocb, sizeof(tmp)))) {
+ ret = -EFAULT;
+ goto out;
+ }
+
+ ret = baio_vfs_read(tmp.aio_fildes, (char *)tmp.aio_buf,
+ tmp.aio_nbytes, tmp.aio_offset);
+
+out:
+ return ret;
+}
+
+/*
+ * return 1 if ent->obj points to a buffer aio's iocb.
+ * 0 if it's not.
+ */
+static int check_baio(struct io_event *ent)
+{
+ struct iocb __user *user_iocb;
+ struct iocb tmp;
+ int ret;
+ user_iocb = (struct iocb *)ent->obj;
+ if (unlikely(copy_from_user(&tmp, user_iocb, sizeof(tmp)))) {
+ ret = -EFAULT;
+ goto out;
+ }
+
+ if (tmp.aio_lio_opcode == IOCB_CMD_BAIO_PREAD)
+ ret = 1;
+ else
+ ret = 0;
+out:
+ return ret;
+
+}
static int read_events(struct kioctx *ctx,
long min_nr, long nr,
struct io_event __user *event,
struct timespec __user *timeout)
+
{
long start_jiffies = jiffies;
struct task_struct *tsk = current;
DECLARE_WAITQUEUE(wait, tsk);
int ret;
+ int ret2;
int i = 0;
struct io_event ent;
struct aio_timeout to;
int retry = 0;
+ int is_baio = 0;

/* needed to zero any padding within an entry (there shouldn't be
* any, but C is fun!
@@ -1101,7 +1163,21 @@ retry:

dprintk("read event: %Lx %Lx %Lx %Lx\n",
ent.data, ent.obj, ent.res, ent.res2);
+ is_baio = check_baio(&ent);
+ if (unlikely(is_baio < 0)) {
+ ret = is_baio;
+ break;
+ }

+ if (is_baio) {
+ ret2 = baio_read_to_user(&ent);
+ if (unlikely(ret2 < 0)) {
+ ret = ret2;
+ dprintk("fail in baio_read_to_user: %d\n", ret);
+ break;
+ }
+ ent.res = ret2;
+ }
/* Could we split the check in two? */
ret = -EFAULT;
if (unlikely(copy_to_user(event, &ent, sizeof(ent)))) {
@@ -1167,12 +1243,27 @@ retry:
/*ret = aio_read_evt(ctx, &ent);*/
} while (1) ;

+
set_task_state(tsk, TASK_RUNNING);
remove_wait_queue(&ctx->wait, &wait);

if (unlikely(ret <= 0))
break;

+ is_baio = check_baio(&ent);
+ if (unlikely(is_baio < 0)) {
+ ret = is_baio;
+ break;
+ }
+ if (is_baio) {
+ ret2 = baio_read_to_user(&ent);
+ if (unlikely(ret2 < 0)) {
+ ret = ret2;
+ dprintk("fail in baio_read_to_user: %d\n", ret);
+ break;
+ }
+ ent.res = ret2;
+ }
ret = -EFAULT;
if (unlikely(copy_to_user(event, &ent, sizeof(ent)))) {
dprintk("aio: lost an event due to EFAULT.\n");
@@ -1284,6 +1375,32 @@ SYSCALL_DEFINE1(io_destroy, aio_context_t, ctx)
return -EINVAL;
}

+
+void baio_complete(struct ba_iocb *baiocb)
+{
+ ssize_t ret = 0;
+ if (baiocb->io_error)
+ ret = baiocb->io_error;
+ if (ret == 0)
+ ret = baiocb->result;
+ dprintk("baio_complete: io_error: %d, result: %d\n",
+ baiocb->io_error, baiocb->result);
+
+ aio_complete(baiocb->iocb, ret, 0);
+
+}
+
+void baiocb_put(struct ba_iocb *baiocb)
+{
+ BUG_ON(!baiocb);
+ dprintk("baiocb_put: ref: %d\n", atomic_read(&baiocb->ref));
+ if (atomic_dec_and_test(&baiocb->ref)) {
+ baio_complete(baiocb);
+ kmem_cache_free(ba_iocb_cachep, baiocb);
+ }
+}
+EXPORT_SYMBOL(baiocb_put);
+
static void aio_advance_iovec(struct kiocb *iocb, ssize_t ret)
{
struct iovec *iov = &iocb->ki_iovec[iocb->ki_cur_seg];
@@ -1306,7 +1423,202 @@ static void aio_advance_iovec(struct kiocb *iocb, ssize_t ret)
* the remaining iovecs */
BUG_ON(ret > 0 && iocb->ki_left == 0);
}
+#define list_to_page(head) (list_entry((head)->prev, struct page, lru))
+
+
+
+static void init_baiocb(struct ba_iocb *baiocb, struct kiocb *iocb)
+{
+ atomic_set(&baiocb->ref, 1);
+ baiocb->iocb = iocb;
+ baiocb->io_error = 0;
+ baiocb->result = 0;
+
+}
+static inline void baiocb_get(struct ba_iocb *baiocb)
+{
+ BUG_ON(!baiocb);
+ atomic_add(1, &baiocb->ref);
+ pr_debug("baiocb_add: ref: %d\n", atomic_read(&baiocb->ref));
+}
+
+
+/*
+ * Return value is in desc->error, return the submitted bytes
+ * to read on success,
+ * In fact the exact value doesn't matter because it will be
+ * ignored in upper level aio_run_iocb() in the async path,
+ * and our code won't be involved in the sync path
+ * anyway.
+ */
+void do_baio_read(struct file *file, struct kiocb *iocb, loff_t *ppos,
+ read_descriptor_t *desc)
+{
+ loff_t first_page_read_size;
+ size_t count = desc->count;
+ struct ba_iocb *baiocb;
+
+ unsigned long nr_pages_to_read, page_idx;
+ ssize_t ret = 0;
+ struct address_space *mapping;
+ struct inode *inode;
+ pgoff_t start, end, end_index;
+ loff_t isize;
+ LIST_HEAD(page_pool);
+ struct page *page;
+
+
+ start = *ppos >> PAGE_CACHE_SHIFT;
+ end = (*ppos + count - 1) >> PAGE_CACHE_SHIFT;
+ nr_pages_to_read = end - start + 1;
+ desc->error = 0;
+
+ first_page_read_size = PAGE_CACHE_SIZE - (*ppos & ~PAGE_CACHE_MASK);
+
+ mapping = file->f_mapping;
+ if (unlikely(!mapping->a_ops->readpage)) {
+ desc->error = -EINVAL;
+ return;
+ }
+
+ baiocb = kmem_cache_alloc(ba_iocb_cachep, GFP_KERNEL);
+ if (unlikely(!baiocb)) {
+ desc->error = -ENOMEM;
+ return;
+ }
+ /* allocate ba_iocb with one ref. */
+ init_baiocb(baiocb, iocb);
+ current->current_baiocb = baiocb;
+
+ inode = mapping->host;
+ isize = i_size_read(inode);
+ end_index = ((isize - 1) >> PAGE_CACHE_SHIFT);

+ for (page_idx = 0; page_idx < nr_pages_to_read; page_idx++) {
+ pgoff_t page_offset = start + page_idx;
+ unsigned long nr;
+
+ if (page_offset > end_index)
+ break;
+
+ nr = PAGE_CACHE_SIZE;
+ if (page_idx == 0)
+ nr = first_page_read_size;
+ if (count < nr)
+ nr = count;
+ count -= nr;
+find_page:
+ page = find_get_page(mapping, page_offset);
+
+ pr_debug("To read %d bytes\n", nr);
+ if (page) {
+ ret = lock_page_killable(page);
+ if (unlikely(ret)) {
+ page_cache_release(page);
+ desc->error = ret;
+ goto out;
+ }
+ if (PageUptodate(page)) {
+ /* This won't go for IO. */
+ pr_debug("No IO needed as page is up to date.\n");
+ unlock_page(page);
+ page_cache_release(page);
+ /* The page may be reclaimed after this release, which is
+ * not good. TODO: get_page(), keep the pages in a pool, and
+ * release them after all bios are finished.
+ */
+ /* mark_page_accessed(page); */
+ desc->written += nr;
+ continue;
+ }
+ if (PageError(page))
+ ClearPageError(page);
+ } else {
+ page = page_cache_alloc_cold(mapping);
+ if (!page) {
+ desc->error = -ENOMEM;
+ goto out;
+ }
+
+ ret = add_to_page_cache_lru(page, mapping,
+ page_offset, GFP_KERNEL);
+ if (ret) {
+ page_cache_release(page);
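+ if (ret == -EEXIST) {
+ /* Someone else added the page; look it up again
+ * instead of using the page we just released.
+ */
+ pr_debug("page already in page cache, retrying lookup\n");
+ goto find_page;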
+ } else {
+ pr_debug("error in add_to_page_cache_lru\n");
+ desc->error = ret;
+ goto out;
+ }
+ }
+ }
+ /* At this point we hold an extra ref to the page, and the page
+ * is locked.
+ */
+ BUG_ON(!page);
+ BUG_ON(!PageLocked(page));
+ SetPageBaio(page);
+ pr_debug("To readpage() %lu\n", page_idx);
+ baiocb_get(baiocb);
+ ret = mapping->a_ops->readpage(file, page);
+ if (unlikely(ret)) {
+ baiocb_put(baiocb);
+ if (ret == AOP_TRUNCATED_PAGE) {
+ /* The AOP method that was handed a locked page
+ * has unlocked it. We just release the refcount
+ */
+ ClearPageBaio(page);
+ page_cache_release(page);
+ goto find_page;
+ }
+ desc->error = ret;
+ goto out;
+ }
+ page_cache_release(page);
+ }
+out:
+ pr_debug("To the final baiocb_put()\n");
+ baiocb_put(baiocb);
+ current->current_baiocb = NULL;
+ return;
+
+}
+
+/*
+ * Return -EIOCBQUEUED on success. The exact number of bytes is
+ * ignored by the upper level caller. At least we don't have to
+ * make it very precise at this moment.
+ */
+ssize_t
+baio_read(struct kiocb *iocb, const struct iovec *iov,
+ unsigned long nr_segs, loff_t pos)
+{
+ int seg = 0;
+ ssize_t written = 0;
+ loff_t *ppos;
+
+ BUG_ON(!iocb);
+ ppos = &iocb->ki_pos;
+ for (seg = 0; seg < nr_segs; seg++) {
+ read_descriptor_t desc;
+ desc.written = 0;
+ desc.arg.buf = iov[seg].iov_base;
+ desc.count = iov[seg].iov_len;
+ if (desc.count == 0)
+ continue;
+ desc.error = 0;
+ do_baio_read(iocb->ki_filp, iocb, ppos, &desc);
+ written += desc.written;
+
+ if (desc.error) {
+ written = written ? : desc.error;
+ break;
+ }
+ }
+ return (written < 0) ? written : -EIOCBQUEUED;
+}
static ssize_t aio_rw_vect_retry(struct kiocb *iocb)
{
struct file *file = iocb->ki_filp;
@@ -1321,6 +1633,9 @@ static ssize_t aio_rw_vect_retry(struct kiocb *iocb)
(iocb->ki_opcode == IOCB_CMD_PREAD)) {
rw_op = file->f_op->aio_read;
opcode = IOCB_CMD_PREADV;
+ } else if (iocb->ki_opcode == IOCB_CMD_BAIO_PREAD) {
+ rw_op = baio_read;
+ opcode = IOCB_CMD_BAIO_PREAD;
} else {
rw_op = file->f_op->aio_write;
opcode = IOCB_CMD_PWRITEV;
@@ -1429,6 +1744,7 @@ static ssize_t aio_setup_iocb(struct kiocb *kiocb, bool compat)
ssize_t ret = 0;

switch (kiocb->ki_opcode) {
+ case IOCB_CMD_BAIO_PREAD:
case IOCB_CMD_PREAD:
ret = -EBADF;
if (unlikely(!(file->f_mode & FMODE_READ)))
@@ -1794,6 +2110,7 @@ SYSCALL_DEFINE5(io_getevents, aio_context_t, ctx_id,
put_ioctx(ioctx);
}

- asmlinkage_protect(5, ret, ctx_id, min_nr, nr, events, timeout);
+ asmlinkage_protect(5, ret, ctx_id, min_nr, nr,
+ events, timeout);
return ret;
}
diff --git a/fs/buffer.c b/fs/buffer.c
index 1a80b04..26d2bfe 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -52,6 +52,7 @@ init_buffer(struct buffer_head *bh, bh_end_io_t *handler, void *private)
{
bh->b_end_io = handler;
bh->b_private = private;
+ bh->b_private2 = NULL;
}
EXPORT_SYMBOL(init_buffer);

@@ -309,7 +310,7 @@ static void end_buffer_async_read(struct buffer_head *bh, int uptodate)
struct buffer_head *tmp;
struct page *page;
int page_uptodate = 1;
-
+ struct ba_iocb *baiocb;
BUG_ON(!buffer_async_read(bh));

page = bh->b_page;
@@ -351,6 +352,18 @@ static void end_buffer_async_read(struct buffer_head *bh, int uptodate)
*/
if (page_uptodate && !PageError(page))
SetPageUptodate(page);
+
+ baiocb = (struct ba_iocb *)bh->b_private2;
+ BUG_ON(baiocb && !PageBaio(page));
+ BUG_ON(!baiocb && PageBaio(page));
+
+ if (baiocb && PageBaio(page)) {
+ ClearPageBaio(page);
+ if (!page_uptodate || PageError(page))
+ baiocb->io_error = -EIO;
+ baiocb->result += PAGE_SIZE;
+ baiocb_put(baiocb);
+ }
unlock_page(page);
return;

@@ -2159,6 +2172,8 @@ int block_read_full_page(struct page *page, get_block_t *get_block)
*/
if (!PageError(page))
SetPageUptodate(page);
+ if (PageBaio(page))
+ baiocb_put(current->current_baiocb);
unlock_page(page);
return 0;
}
@@ -2902,7 +2917,11 @@ static void end_bio_bh_io_sync(struct bio *bio, int err)
if (unlikely (test_bit(BIO_QUIET,&bio->bi_flags)))
set_bit(BH_Quiet, &bh->b_state);

+ if (bio_flagged(bio, BIO_BAIO))
+ bh->b_private2 = (void *)bio->bi_private2;
+
bh->b_end_io(bh, test_bit(BIO_UPTODATE, &bio->bi_flags));
+ clear_bit(BIO_BAIO, &bio->bi_flags);
bio_put(bio);
}

@@ -2942,6 +2961,11 @@ int submit_bh(int rw, struct buffer_head * bh)
bio->bi_end_io = end_bio_bh_io_sync;
bio->bi_private = bh;

+ if (PageBaio(bh->b_page)) {
+ set_bit(BIO_BAIO, &bio->bi_flags);
+ bio->bi_private2 = (void *)current->current_baiocb;
+ }
+
bio_get(bio);
submit_bio(rw, bio);

diff --git a/fs/mpage.c b/fs/mpage.c
index fdfae9f..6bcfbed 100644
--- a/fs/mpage.c
+++ b/fs/mpage.c
@@ -58,6 +58,16 @@ static void mpage_end_io(struct bio *bio, int err)
ClearPageUptodate(page);
SetPageError(page);
}
+ if (bio_flagged(bio, BIO_BAIO) && PageBaio(page)) {
+ struct ba_iocb *baiocb =
+ (struct ba_iocb *)bio->bi_private2;
+ clear_bit(BIO_BAIO, &bio->bi_flags);
+ ClearPageBaio(page);
+ if (!uptodate)
+ baiocb->io_error = -EIO;
+ baiocb->result += bvec->bv_len;
+ baiocb_put(baiocb);
+ }
unlock_page(page);
} else { /* bio_data_dir(bio) == WRITE */
if (!uptodate) {
@@ -167,11 +177,12 @@ do_mpage_readpage(struct bio *bio, struct page *page, unsigned nr_pages,
unsigned page_block;
unsigned first_hole = blocks_per_page;
struct block_device *bdev = NULL;
- int length;
+ int length, bio_length;
int fully_mapped = 1;
unsigned nblocks;
unsigned relative_block;

+
if (page_has_buffers(page))
goto confused;

@@ -265,6 +276,8 @@ do_mpage_readpage(struct bio *bio, struct page *page, unsigned nr_pages,
zero_user_segment(page, first_hole << blkbits, PAGE_CACHE_SIZE);
if (first_hole == 0) {
SetPageUptodate(page);
+ if (PageBaio(page))
+ baiocb_put(current->current_baiocb);
unlock_page(page);
goto out;
}
@@ -294,7 +307,13 @@ alloc_new:
}

length = first_hole << blkbits;
- if (bio_add_page(bio, page, length, 0) < length) {
+ bio_length = bio_add_page(bio, page, length, 0);
+ if (PageBaio(page)) {
+ bio->bi_private2 = (void *)current->current_baiocb;
+ set_bit(BIO_BAIO, &bio->bi_flags);
+ }
+
+ if (bio_length < length) {
bio = mpage_bio_submit(READ, bio);
goto alloc_new;
}
@@ -314,8 +333,11 @@ confused:
bio = mpage_bio_submit(READ, bio);
if (!PageUptodate(page))
block_read_full_page(page, get_block);
- else
+ else {
+ if (PageBaio(page))
+ baiocb_put(current->current_baiocb);
unlock_page(page);
+ }
goto out;
}

diff --git a/include/linux/aio.h b/include/linux/aio.h
index 2dcb72b..36ce4f2 100644
--- a/include/linux/aio.h
+++ b/include/linux/aio.h
@@ -202,6 +202,13 @@ struct kioctx {
struct rcu_head rcu_head;
};

+struct ba_iocb {
+ atomic_t ref;
+ struct kiocb *iocb;
+ int io_error;
+ ssize_t result;
+};
+
/* prototypes */
extern unsigned aio_max_size;

@@ -214,6 +221,7 @@ struct mm_struct;
extern void exit_aio(struct mm_struct *mm);
extern long do_io_submit(aio_context_t ctx_id, long nr,
struct iocb __user *__user *iocbpp, bool compat);
+extern void baiocb_put(struct ba_iocb *baiocb);
#else
static inline ssize_t wait_on_sync_kiocb(struct kiocb *iocb) { return 0; }
static inline int aio_put_req(struct kiocb *iocb) { return 0; }
@@ -224,6 +232,7 @@ static inline void exit_aio(struct mm_struct *mm) { }
static inline long do_io_submit(aio_context_t ctx_id, long nr,
struct iocb __user * __user *iocbpp,
bool compat) { return 0; }
+static inline void baiocb_put(struct ba_iocb *baiocb) { }
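
For readers following the reference counting in the patch above, here is a minimal user-space sketch of the pattern it appears to use: the submitter holds one reference, each page sent for IO takes another, and each completion accounts its bytes and drops one. All names here (`demo_baiocb`, `demo_init`, `demo_get`, `demo_put`, `demo_page_done`) are hypothetical stand-ins, not kernel APIs. The body of `baiocb_put()` is not shown in this part of the patch, so the idea that the last put completes the request is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for struct ba_iocb. */
struct demo_baiocb {
	atomic_int ref;   /* one submitter ref plus one per page in flight */
	int io_error;     /* sticky error, set by a failed completion */
	long result;      /* bytes accounted by completions so far */
	bool completed;   /* assumption: set when the last ref is dropped */
};

/* Start with a single reference held by the submitter. */
static void demo_init(struct demo_baiocb *b)
{
	atomic_init(&b->ref, 1);
	b->io_error = 0;
	b->result = 0;
	b->completed = false;
}

/* Take one reference, e.g. before handing a page to readpage(). */
static void demo_get(struct demo_baiocb *b)
{
	atomic_fetch_add(&b->ref, 1);
}

/* Drop one reference; whoever drops the last one completes the request. */
static void demo_put(struct demo_baiocb *b)
{
	int old = atomic_fetch_sub(&b->ref, 1);

	assert(old > 0);
	if (old == 1)
		b->completed = true;
}

/* Completion of one page: record any error, account bytes, drop the ref. */
static void demo_page_done(struct demo_baiocb *b, long bytes, bool uptodate)
{
	if (!uptodate)
		b->io_error = -EIO;
	b->result += bytes;
	demo_put(b);
}
```

In this sketch the submitter's own `demo_put()` after the submission loop does not complete the request while pages are still in flight; completion only happens once the last outstanding `demo_page_done()` has run.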
