[PATCH] optimize follow_hugetlb_page
follow_hugetlb_page() walks a range of user virtual addresses and fills in
an array of struct page pointers supplied by the caller. It also takes a
reference on each page via get_page(). For a compound page, get_page()
actually traverses back to the head page via the page_private() macro and
then adds a reference count to the head page. Since we are doing a virtual
address to pte lookup, the kernel already has a struct page pointer to the
head page. So instead of touching the small-unit page struct and then
following a link back to the head page, increment the reference count
directly on the head page.
The benefit is that we don't take a cache miss on accessing the page struct
for the corresponding user address and, more importantly, we don't pollute
the cache with a "not very useful" round trip of pointer chasing. This
adds a moderate performance gain on an I/O intensive database transaction
workload.
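The index arithmetic can be illustrated with a small user-space sketch.
It is not kernel code: the page sizes (4 KB base pages, 2 MB huge pages)
are assumed for illustration, and old_offset(), new_offset() and
pages_per_lookup() are hypothetical helper names that do not appear in
the patch. The sketch only shows that the old and new index expressions
agree and how many small pages one pte lookup can cover.

```c
#include <assert.h>

/* Assumed sizes for illustration only: 4 KB base pages, 2 MB huge pages. */
#define PAGE_SHIFT   12
#define PAGE_SIZE    (1UL << PAGE_SHIFT)
#define HPAGE_SHIFT  21
#define HPAGE_SIZE   (1UL << HPAGE_SHIFT)
#define HPAGE_MASK   (~(HPAGE_SIZE - 1))

/* Old form: index derived from a running virtual page frame number. */
unsigned long old_offset(unsigned long vaddr)
{
	unsigned long vpfn = vaddr / PAGE_SIZE;

	return vpfn % (HPAGE_SIZE / PAGE_SIZE);
}

/* New form: index of the small page within its huge page. */
unsigned long new_offset(unsigned long vaddr)
{
	return (vaddr & ~HPAGE_MASK) >> PAGE_SHIFT;
}

/*
 * Hypothetical model of the same_page loop: count how many small pages
 * are handled after a single pte lookup before control returns to the
 * outer loop. The caller must pass vaddr < vm_end and remainder > 0,
 * mirroring the outer while condition.
 */
unsigned long pages_per_lookup(unsigned long vaddr, unsigned long vm_end,
			       unsigned long remainder)
{
	unsigned long pfn_offset = new_offset(vaddr);
	unsigned long handled = 0;

	assert(vaddr < vm_end && remainder > 0);

	do {
		/* The patch takes its reference on the head page here. */
		vaddr += PAGE_SIZE;
		++pfn_offset;
		--remainder;
		++handled;
	} while (vaddr < vm_end && remainder &&
		 pfn_offset < HPAGE_SIZE / PAGE_SIZE);

	return handled;
}
```

Under these assumed sizes, a walk that starts five small pages into a
huge page can handle the remaining 507 entries from one lookup, provided
the vma and the requested length extend that far.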
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 075877b..06699d8 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -661,10 +661,10 @@
struct page **pages, struct vm_area_struct **vmas,
unsigned long *position, int *length, int i)
{
- unsigned long vpfn, vaddr = *position;
+ unsigned long pfn_offset;
+ unsigned long vaddr = *position;
int remainder = *length;
- vpfn = vaddr/PAGE_SIZE;
spin_lock(&mm->page_table_lock);
while (vaddr < vma->vm_end && remainder) {
pte_t *pte;
@@ -692,19 +692,28 @@
break;
}
- if (pages) {
- page = &pte_page(*pte)[vpfn % (HPAGE_SIZE/PAGE_SIZE)];
- get_page(page);
- pages[i] = page;
- }
+ pfn_offset = (vaddr & ~HPAGE_MASK) >> PAGE_SHIFT;
+ page = pte_page(*pte);
+same_page:
+ get_page(page);
+ if (pages)
+ pages[i] = page + pfn_offset;
if (vmas)
vmas[i] = vma;
vaddr += PAGE_SIZE;
- ++vpfn;
+ ++pfn_offset;
--remainder;
++i;
+ if (vaddr < vma->vm_end && remainder &&
+ pfn_offset < HPAGE_SIZE/PAGE_SIZE) {
+ /*
+ * We use pfn_offset to avoid touching the pageframes
+ * of this compound page.
+ */
+ goto same_page;
+ }
}
spin_unlock(&mm->page_table_lock);
*length = remainder;