linux/include/asm-x86/bitops_64.h, branch master

x86: finalize bitops unification

2008-04-26T17:21:17Z

include/asm-x86/bitops_32.h and include/asm-x86/bitops_64.h are now almost identical. The 64-bit version sets ARCH_HAS_FAST_MULTIPLIER and has an extra inline function set_bit_string. The define currently has no influence on the generated code, but it can be argued that setting it on i386 is the right thing to do anyhow. The addition of the extra inline function on i386 does not hurt either. Signed-off-by: Alexander van Heukelum Signed-off-by: Ingo Molnar

x86, UML: remove x86-specific implementations of find_first_bit

2008-04-26T17:21:17Z

x86 has been switched to the generic versions of find_first_bit and find_first_zero_bit, but the original versions were retained. This patch just removes the now unused x86-specific versions. also update UML. Signed-off-by: Alexander van Heukelum Signed-off-by: Ingo Molnar

x86: switch 64-bit to generic find_first_bit

2008-04-26T17:21:16Z

Switch x86_64 to generic find_first_bit. The x86_64-specific implementation is not removed. Signed-off-by: Alexander van Heukelum Signed-off-by: Ingo Molnar

bitops: use __fls for fls64 on 64-bit archs

2008-04-26T17:21:16Z

Use __fls for fls64 on 64-bit archs. The implementation for 64-bit archs is moved from x86_64 to asm-generic. Signed-off-by: Alexander van Heukelum Signed-off-by: Ingo Molnar

x86: merge the simple bitops and move them to bitops.h

2008-04-26T17:21:16Z

Some of those can be written in such a way that the same inline assembly can be used to generate both 32 bit and 64 bit code. For ffs and fls, x86_64 unconditionally used the cmov instruction and i386 unconditionally used a conditional branch over a mov instruction. In the current patch I chose to select the version based on the availability of the cmov instruction instead. A small detail here is that x86_64 did not previously set CONFIG_X86_CMOV=y. Improved comments for ffs, ffz, fls and variations. Signed-off-by: Alexander van Heukelum Signed-off-by: Ingo Molnar

x86, generic: optimize find_next_(zero_)bit for small constant-size bitmaps

2008-04-26T17:21:16Z

This moves an optimization for searching constant-sized small bitmaps form x86_64-specific to generic code. On an i386 defconfig (the x86#testing one), the size of vmlinux hardly changes with this applied. I have observed only four places where this optimization avoids a call into find_next_bit: In the functions return_unused_surplus_pages, alloc_fresh_huge_page, and adjust_pool_surplus, this patch avoids a call for a 1-bit bitmap. In __next_cpu a call is avoided for a 32-bit bitmap. That's it. On x86_64, 52 locations are optimized with a minimal increase in code size: Current #testing defconfig: 146 x bsf, 27 x find_next_*bit text data bss dec hex filename 5392637 846592 724424 6963653 6a41c5 vmlinux After removing the x86_64 specific optimization for find_next_*bit: 94 x bsf, 79 x find_next_*bit text data bss dec hex filename 5392358 846592 724424 6963374 6a40ae vmlinux After this patch (making the optimization generic): 146 x bsf, 27 x find_next_*bit text data bss dec hex filename 5392396 846592 724424 6963412 6a40d4 vmlinux [ tglx@linutronix.de: build fixes ] Signed-off-by: Ingo Molnar

x86: change x86 to use generic find_next_bit

2008-04-26T17:21:16Z

The versions with inline assembly are in fact slower on the machines I tested them on (in userspace) (Athlon XP 2800+, p4-like Xeon 2.8GHz, AMD Opteron 270). The i386-version needed a fix similar to 06024f21 to avoid crashing the benchmark. Benchmark using: gcc -fomit-frame-pointer -Os. For each bitmap size 1...512, for each possible bitmap with one bit set, for each possible offset: find the position of the first bit starting at offset. If you follow ;). Times include setup of the bitmap and checking of the results. Athlon Xeon Opteron 32/64bit x86-specific: 0m3.692s 0m2.820s 0m3.196s / 0m2.480s generic: 0m2.622s 0m1.662s 0m2.100s / 0m1.572s If the bitmap size is not a multiple of BITS_PER_LONG, and no set (cleared) bit is found, find_next_bit (find_next_zero_bit) returns a value outside of the range [0, size]. The generic version always returns exactly size. The generic version also uses unsigned long everywhere, while the x86 versions use a mishmash of int, unsigned (int), long and unsigned long. Using the generic version does give a slightly bigger kernel, though. defconfig: text data bss dec hex filename x86-specific: 4738555 481232 626688 5846475 5935cb vmlinux (32 bit) generic: 4738621 481232 626688 5846541 59360d vmlinux (32 bit) x86-specific: 5392395 846568 724424 6963387 6a40bb vmlinux (64 bit) generic: 5392458 846568 724424 6963450 6a40fa vmlinux (64 bit) Signed-off-by: Alexander van Heukelum Signed-off-by: Ingo Molnar

include/asm-x86/bitops_64.h: checkpatch cleanups - formatting only

2008-04-17T15:41:22Z

Signed-off-by: Joe Perches Signed-off-by: Ingo Molnar

iommu sg: kill __clear_bit_string and find_next_zero_string

2008-02-05T17:44:11Z

This kills unused __clear_bit_string and find_next_zero_string (they were used by only gart and calgary IOMMUs). Signed-off-by: FUJITA Tomonori Cc: Jeff Garzik Cc: James Bottomley Cc: Jens Axboe Cc: Thomas Gleixner Cc: Ingo Molnar Cc: Muli Ben-Yehuda Cc: Andi Kleen Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds

x86: partial unification of asm-x86/bitops.h

2008-01-30T12:30:55Z

This unifies the set/clear/test bit functions of asm/bitops.h. I have not attempted to merge the bit-finding functions, since they rely on the machine word size and can't be easily restructured to work generically without a lot of #ifdefs. In particular, the 64-bit code can assume the presence of conditional move instructions, whereas 32-bit needs to be more careful. The inline assembly for the bit operations has been changed to remove explicit sizing hints on the instructions, so the assembler will pick the appropriate instruction forms depending on the architecture and the context. Signed-off-by: Jeremy Fitzhardinge Cc: Andi Kleen Cc: Linus Torvalds Signed-off-by: Ingo Molnar Signed-off-by: Thomas Gleixner