| Age | Commit message | Author | Files | Lines |
|
From: William Lee Irwin III <wli@holomorphy.com>
Without passing this parameter by reference, the changes to used_node_mask
are meaningless and do not affect the caller's copy.
This leads to a boot-time failure. This proposed fix passes it by reference.
Signed-off-by: William Irwin <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
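The following minimal C sketch illustrates this class of bug. It assumes a hypothetical struct-based `toy_nodemask_t` (not the kernel's nodemask_t) and hypothetical helpers `pick_node_byval()` and `pick_node_byref()`. The by-value version marks a node only in its own copy, so the caller keeps seeing that node as unused. The by-reference version records the node in the caller's mask.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical node mask wrapped in a struct, so passing it by value copies it. */
typedef struct {
	unsigned long bits;
} toy_nodemask_t;

/* BUG: 'used' is the callee's private copy, so the node it marks is
 * forgotten as soon as the function returns. */
static int pick_node_byval(toy_nodemask_t used, int nr_nodes)
{
	for (int n = 0; n < nr_nodes; n++) {
		if (!(used.bits & (1UL << n))) {
			used.bits |= 1UL << n;	/* updates the copy only */
			return n;
		}
	}
	return -1;
}

/* FIX: take a pointer so the caller's mask records the chosen node. */
static int pick_node_byref(toy_nodemask_t *used, int nr_nodes)
{
	assert(nr_nodes <= (int)(CHAR_BIT * sizeof(used->bits)));
	for (int n = 0; n < nr_nodes; n++) {
		if (!(used->bits & (1UL << n))) {
			used->bits |= 1UL << n;
			return n;
		}
	}
	return -1;
}
```

Calling `pick_node_byval()` twice on the same mask returns node 0 both times. `pick_node_byref()` returns 0 and then 1.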
|
|
Add x86_64 support for Jack Steiner's SLIT sysfs patch
Make Jack's code compile on x86-64, and add x86-64 low-level support for saving
the SLIT pointer, plus a node_distance() implementation.
Requires the previous SRAT patch.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
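A node_distance() built on a saved SLIT pointer can be sketched as a row-major table lookup. This sketch makes several assumptions:

- A hypothetical `struct toy_slit` holds the ACPI SLIT's locality count and its N x N one-byte entries.
- Node IDs map directly to SLIT locality indices. Real code may need a proximity-domain mapping.
- When no SLIT is present, local distance is 10 and remote distance is 20.

```c
#include <assert.h>

/* Hypothetical in-memory view of an ACPI SLIT: 'localities' rows of
 * 'localities' one-byte distances, stored row-major. */
struct toy_slit {
	unsigned int localities;
	const unsigned char *entry;
};

static const struct toy_slit *saved_slit;	/* pointer saved at boot */

static void toy_save_slit(const struct toy_slit *slit)
{
	saved_slit = slit;
}

/* Distance from node a to node b, assuming node IDs equal SLIT indices. */
static int toy_node_distance(int a, int b)
{
	if (!saved_slit)
		return a == b ? 10 : 20;	/* assumed fallback without a SLIT */
	assert(a >= 0 && (unsigned)a < saved_slit->localities);
	assert(b >= 0 && (unsigned)b < saved_slit->localities);
	return saved_slit->entry[a * saved_slit->localities + b];
}
```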
|
|
Add SLIT (inter-node distance) information to sysfs.
[This is Jack's patch that he submitted on l-k. I'm submitting
it for him because I need it for my x86-64 follow-on SLIT patch.
I hope I don't step on his toes with this one. If you have already
merged it, please ignore this.]
From: Jack Steiner
Here is an updated patch to externalize the SLIT information. (I think I have
incorporated all the comments that were posted previously.)
For example:
# cd /sys/devices/system
# find .
./node
./node/node5
./node/node5/distance
./node/node5/numastat
./node/node5/meminfo
./node/node5/cpumap
# cat ./node/node0/distance
10 20 64 42 42 22
# cat node/*/distance
10 20 64 42 42 22
20 10 42 22 64 84
64 42 10 20 22 42
42 22 20 10 42 62
42 64 22 42 10 20
22 84 42 62 20 10
Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
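Each distance file is one row of the matrix above, formatted as space-separated decimals with a trailing newline. The hypothetical `toy_distance_show()` below is a userspace sketch, not the kernel's sysfs show routine. It hard-codes the 6-node matrix from the example output and returns -1 rather than overrun its buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define EX_NODES 6

/* Distance matrix copied from the example output above. */
static const int example_distance[EX_NODES][EX_NODES] = {
	{ 10, 20, 64, 42, 42, 22 },
	{ 20, 10, 42, 22, 64, 84 },
	{ 64, 42, 10, 20, 22, 42 },
	{ 42, 22, 20, 10, 42, 62 },
	{ 42, 64, 22, 42, 10, 20 },
	{ 22, 84, 42, 62, 20, 10 },
};

/* Write one node's row into buf; returns bytes written or -1 if too small. */
static int toy_distance_show(int node, char *buf, size_t size)
{
	size_t len = 0;
	int n;

	assert(node >= 0 && node < EX_NODES && size > 0);
	for (int i = 0; i < EX_NODES; i++) {
		n = snprintf(buf + len, size - len, "%s%d", i ? " " : "",
			     example_distance[node][i]);
		if (n < 0 || (size_t)n >= size - len)
			return -1;	/* would not fit */
		len += (size_t)n;
	}
	n = snprintf(buf + len, size - len, "\n");
	if (n < 0 || (size_t)n >= size - len)
		return -1;
	return (int)(len + (size_t)n);
}
```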
|
|
The patch below makes it possible to display the size of Active/Inactive pages in
the per-node meminfo (/sys/devices/system/node/node%d/meminfo), as
/proc/meminfo already does.
With a small change to procps, "vmstat -a" can show these statistics for a
particular node.
From: mita akinobu <amgta@yacht.ocn.ne.jp>
get_zone_counts() is used by max_sane_readahead(), and
max_sane_readahead() is often called in filemap_nopage().
Signed-off-by: Akinobu Mita <amgta@yacht.ocn.ne.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
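Per-node Active/Inactive reporting amounts to summing the zone counters of a single node instead of all nodes. The sketch below is not the kernel's meminfo code. It assumes hypothetical, simplified `struct toy_zone` and `struct toy_pgdat` types, and its output layout is illustrative only.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical, simplified counters, in pages. */
struct toy_zone { unsigned long nr_active, nr_inactive, nr_free; };
struct toy_pgdat { int nr_zones; struct toy_zone zones[4]; };

/* Sum the counters over one node's zones only. */
static void toy_node_zone_counts(const struct toy_pgdat *pgdat,
				 unsigned long *active, unsigned long *inactive,
				 unsigned long *free)
{
	*active = *inactive = *free = 0;
	for (int i = 0; i < pgdat->nr_zones; i++) {
		*active += pgdat->zones[i].nr_active;
		*inactive += pgdat->zones[i].nr_inactive;
		*free += pgdat->zones[i].nr_free;
	}
}

/* Format the node's Active/Inactive sizes in kB (illustrative layout). */
static int toy_node_meminfo(int nid, const struct toy_pgdat *pgdat,
			    unsigned long page_kb, char *buf, size_t size)
{
	unsigned long active, inactive, free;

	toy_node_zone_counts(pgdat, &active, &inactive, &free);
	return snprintf(buf, size,
			"Node %d Active:   %8lu kB\n"
			"Node %d Inactive: %8lu kB\n",
			nid, active * page_kb, nid, inactive * page_kb);
}
```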
|
|
Paul Jackson points out that the sysfs code saves a node's cpumask in the
sysfs node, although it can change with CPU hotplug. Don't do this.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This adds per-node huge page statistics to sysfs.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Whitespace and formatting changes (a,b,c -> a, b, c) in drivers/base
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Signed-off-by: Greg Kroah-Hartman <greg@kroah.com>
|
|
As pointed out by Paul Jackson <pj@sgi.com>, sometimes 99 characters are not enough.
sysfs actually hands us a whole page to write into, so that code should check
that we haven't overrun it.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
From: Andi Kleen <ak@suse.de>
This patch re-adds the sysfs output of the NUMA API statistics. All my test
scripts need this, and it is very useful for checking whether the policy
actually works.
This got lost when the huge page NUMA API changes were dropped.
I decided not to resend the huge page NUMA API changes for now. Instead, I
will wait for this area to settle once demand-paged large pages are merged.
|
|
sys_xyz() names in Linux are all syscalls... except for
sys_device_register() and sys_device_unregister().
This patch renames them so that the sys_ namespace is once
again used only by syscalls.
|
|
From: Joe Korty <joe.korty@ccur.com>
Rename bitmap_snprintf() to bitmap_scnprintf() and cpumask_snprintf() to
cpumask_scnprintf(), as these functions now belong to the scnprintf family
of functions.
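The difference between the two families is the return value. snprintf() returns the length the output would have had. The kernel's scnprintf() returns the number of characters actually stored in the buffer, excluding the trailing NUL. `toy_scnprintf()` below is a userspace sketch of those semantics built on vsnprintf(); it is not the kernel source.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Returns the number of characters actually stored in buf (excluding the
 * NUL), never more than size - 1. snprintf() would instead return the
 * length of the full, untruncated output. */
static int toy_scnprintf(char *buf, size_t size, const char *fmt, ...)
{
	va_list args;
	int i;

	if (size == 0)
		return 0;
	va_start(args, fmt);
	i = vsnprintf(buf, size, fmt, args);
	va_end(args);
	if (i < 0)
		return 0;	/* encoding error: treat as nothing written */
	return (size_t)i < size ? i : (int)(size - 1);
}
```

The clamped value matters when output is appended in a loop with `len += ...`. With snprintf(), `len` can grow past the buffer size, and a later `size - len` on unsigned sizes wraps around.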
|
|
From: Paul Jackson <pj@sgi.com>
This patch is a followup to one from Bill Irwin. On Nov
17, he had consolidated the half-dozen chunks of code
that displayed cpumasks in /proc/irq/prof_cpu_mask and
/proc/irq/<pid>/smp_affinity into a single routine, which he
called format_cpumask().
I believe that Andrew Morton has accepted Bill's patch into
his 2.6.0-test10-mm1 patch set as the "format_cpumask" patch.
I hope that the following patch will replace Bill's patch.
I look forward to Bill's feedback on this patch.
The following patch carries Bill's work further:
1) It also consolidates the input side (write syscalls).
2) It adopts a new format, the same on input and output.
3) The core routines work for any multi-word bitmask,
not just cpumasks.
4) The core routines avoid overrunning their output
buffers.
Note esp. for David Mosberger:
The small patch I sent you and the linux-ia64 list
yesterday entitled: "check user access ok writing
/proc/irq/<pid>/smp_affinity" for arch ia64 only is
_separate_ from the following patch. Neither presumes the
other. However, they do collide on one line. Last one in
is a Monkey's Uncle and will need an updated patch from me
(or otherwise need to resolve the one obvious collision).
Details of the following patch:
Both the display and input of cpumasks on 9 archs are
consolidated into a single pair of routines, which use the
same format for input and output, as recommended by Tony
Luck. The two common routines work on any multi-word bitmask
(array of unsigned longs). A pair of trivial inline wrappers
cpumask_snprintf() and cpumask_parse() hide this generality
for the common case of cpumask input and output.
My real motivation for consolidating this code will become
visible later - when I seek to add a nodemask_t that resembles
cpumask_t (just a different length). These common underlying
routines will be used there as well, following up on a suggestion
of Christoph Hellwig that I investigate implementing nodemask_t
as an ADT sharing infrastructure with cpumask_t. However, I
believe that this patch stands on its own merit, consolidating
a couple hundred lines of duplicated code, and making the
cpumask display format usable on very large systems.
There are two exceptions to the consolidation - the alpha and
sparc64 archs manipulate bare unsigned longs, not cpumask_t's,
on input (write syscall), and do stuff that was more funky than
I could make sense of. So the input side of these two arch's
was left as-is. I'd welcome someone with access to either of
these systems to provide additional patches.
The new format consists of multiple 32 bit words, separated by
commas, displayed and input in hex. The following comment from
this patch describes this format further:
* The ascii representation of multi-word bit masks displays each
* 32bit word in hex (not zero filled), and for masks longer than
* one word, uses a comma separator between words. Words are
* displayed in big-endian order most significant first. And hex
* digits within a word are also in big-endian order, of course.
*
* Examples:
* A mask with just bit 0 set displays as "1".
* A mask with just bit 127 set displays as "80000000,0,0,0".
* A mask with just bit 64 set displays as "1,0,0".
* A mask with bits 0, 1, 2, 4, 8, 16, 32 and 64 set displays
* as "1,1,10117". The first "1" is for bit 64, the second
* for bit 32, the third for bit 16, and so forth, to the
* "7", which is for bits 2, 1 and 0.
* A mask with bits 32 through 39 set displays as "ff,0".
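A formatter for the display format in the comment above can be sketched as follows. It is inferred from the listed examples, not copied from lib/mask.c, and `toy_mask_format()` is a hypothetical name. The sketch makes these assumptions:

- The mask is an array of 32-bit words, with word 0 holding bits 0-31.
- Leading all-zero words are skipped, as the "1,0,0" example implies.

Like the display routine described below, it tracks the remaining buffer space with snprintf() and never overruns.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/* Format words[0..nwords-1] as comma-separated hex, most significant word
 * first, without zero fill. Returns characters stored (scnprintf-style). */
static int toy_mask_format(char *buf, size_t buflen,
			   const uint32_t *words, int nwords)
{
	int top = nwords - 1;
	size_t len = 0;

	assert(buflen > 0 && nwords > 0);
	buf[0] = '\0';
	/* Skip leading all-zero words, but always print at least word 0. */
	while (top > 0 && words[top] == 0)
		top--;
	for (int i = top; i >= 0; i--) {
		int n = snprintf(buf + len, buflen - len, "%s%" PRIx32,
				 i == top ? "" : ",", words[i]);
		if (n < 0)
			break;
		if ((size_t)n >= buflen - len)	/* truncated: clamp like scnprintf */
			return (int)(buflen - 1);
		len += (size_t)n;
	}
	return (int)len;
}
```

For example, a four-word mask with only `words[1] = 0xff` produces "ff,0".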
The essential reason for adding the comma breaks was to make
the long masks from our (SGI's) big 512-CPU systems parsable by
humans. An unbroken string of 128 hex digits is pretty difficult
to read. For those who are compiling systems with CONFIG_NR_CPUS
of 32 or less, there should be no visible change in format.
There are of course a thousand possible output formats that
meet similar criteria. If someone wants to lobby for and seek
consensus behind another such format, that's fine. Now that
the format is consolidated into a single pair of routines,
it should be easy to adapt whatever we choose.
Internally, the display routine uses snprintf to track the
remaining space in its output buffer, to avoid the risk of
overrunning it.
A new file, lib/mask.c, is added to the lib directory, to
hold the two common routines. I anticipate adding a few more
common routines for generic support of multi-word bit masks to
lib/mask.c, in subsequent patches that will add a nodemask_t
type as an ADT sharing implementation with cpumask_t.
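The input side accepts the same format. A hypothetical `toy_mask_parse()` could look like the sketch below, again assuming an array of 32-bit words. It rejects empty words, words wider than 32 bits, and more words than the mask holds. It tolerates trailing whitespace, such as the newline that `echo` appends.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <string.h>

/* Parse "w_k,...,w_1,w_0" (hex, most significant word first) into
 * words[0..nwords-1]. Returns 0 on success, -1 on malformed input. */
static int toy_mask_parse(const char *s, uint32_t *words, int nwords)
{
	int total = 1;

	for (const char *p = s; *p; p++)
		if (*p == ',')
			total++;
	if (total > nwords)
		return -1;		/* more words than the mask can hold */
	memset(words, 0, (size_t)nwords * sizeof(*words));

	for (int idx = total - 1; idx >= 0; idx--) {
		uint64_t val = 0;
		int digits = 0;

		while (isxdigit((unsigned char)*s)) {
			int c = tolower((unsigned char)*s);

			val = val * 16 + (uint64_t)(isdigit(c) ? c - '0' : c - 'a' + 10);
			if (val > UINT32_MAX)
				return -1;	/* word wider than 32 bits */
			digits++;
			s++;
		}
		if (digits == 0)
			return -1;		/* empty word, e.g. "1,,0" */
		words[idx] = (uint32_t)val;
		if (idx > 0) {
			if (*s != ',')
				return -1;
			s++;
		}
	}
	while (isspace((unsigned char)*s))
		s++;
	return *s == '\0' ? 0 : -1;
}
```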
|
|
From: William Lee Irwin III <wli@holomorphy.com>
Contributions from:
Jan Dittmer <jdittmer@sfhq.hn.org>
Arnd Bergmann <arnd@arndb.de>
"Bryan O'Sullivan" <bos@serpentine.com>
"David S. Miller" <davem@redhat.com>
Badari Pulavarty <pbadari@us.ibm.com>
"Martin J. Bligh" <mbligh@aracnet.com>
Zwane Mwaikambo <zwane@linuxpower.ca>
It has been tested on x86, sparc64, x86_64, ia64 (I think), ppc and ppc64.
cpumask_t enables systems with NR_CPUS > BITS_PER_LONG to utilize all their
cpus by creating an abstract data type dedicated to representing cpu
bitmasks, similar to fd sets from userspace, and sweeping the appropriate
code to update callers to the access API. The fd set-like structure is
according to Linus' own suggestion; the macro calling convention to ambiguate
representations with minimal code impact is my own invention.
Specifically, a new set of inline functions for manipulating arbitrary-width
bitmaps is introduced with a relatively simple implementation, in tandem with
a new data type representing bitmaps of width NR_CPUS, cpumask_t, whose
accessor functions are defined in terms of the bitmap manipulation inlines.
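The layering described here, with generic bitmap inlines and a fixed-width mask type on top, can be sketched in userspace C. All names (`toy_set_bit()`, `toy_cpumask_t`, `toy_cpu_set()`, and so on) are hypothetical, and `TOY_NR_CPUS` is an assumed value. The struct wrapper lets the mask be assigned and passed by value like an fd_set. The macros take the mask by name, so callers need not write `&`.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define TOY_NR_CPUS 128		/* assumed configuration */
#define TOY_BITS_PER_LONG (CHAR_BIT * (int)sizeof(unsigned long))
#define TOY_BITS_TO_LONGS(n) (((n) + TOY_BITS_PER_LONG - 1) / TOY_BITS_PER_LONG)

/* Generic arbitrary-width bitmap helpers. */
static inline void toy_set_bit(int nr, unsigned long *map)
{
	map[nr / TOY_BITS_PER_LONG] |= 1UL << (nr % TOY_BITS_PER_LONG);
}

static inline bool toy_test_bit(int nr, const unsigned long *map)
{
	return (map[nr / TOY_BITS_PER_LONG] >> (nr % TOY_BITS_PER_LONG)) & 1UL;
}

/* Fixed-width mask type built on the helpers; a struct so it copies by value. */
typedef struct {
	unsigned long bits[TOY_BITS_TO_LONGS(TOY_NR_CPUS)];
} toy_cpumask_t;

#define toy_cpu_set(cpu, mask)   toy_set_bit((cpu), (mask).bits)
#define toy_cpu_isset(cpu, mask) toy_test_bit((cpu), (mask).bits)

/* Count set CPUs; takes the mask by value, as most callers would. */
static inline int toy_cpus_weight(toy_cpumask_t mask)
{
	int w = 0;

	for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++)
		w += toy_cpu_isset(cpu, mask);
	return w;
}
```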
This bitmap ADT found an additional use in i386 arch code handling sparse
physical APIC IDs, which was convenient to use in this case as the
accounting structure was required to be wider to accommodate the physids
consumed by larger numbers of cpus.
For the sake of simplicity and low code impact, these cpu bitmasks are passed
primarily by value; however, an additional set of accessors along with an
auxiliary data type with const call-by-reference semantics is provided to
address performance concerns raised in connection with very large systems,
such as SGI's larger models, where copying and call-by-value overhead would
be prohibitive. Few (if any) users of the call-by-reference API are
immediately introduced.
Also, in order to avoid calling convention overhead on architectures where
structures are required to be passed by value, NR_CPUS <= BITS_PER_LONG is
special-cased so that cpumask_t falls back to an unsigned long and the
accessors perform the usual bit twiddling on unsigned longs as opposed to
arrays thereof. Audits were done with the structure overhead in-place,
restoring this special-casing only afterward so as to ensure a more complete
API conversion while undergoing the majority of its end-user exposure in -mm.
More -mm's were shipped after its restoration to be sure that was tested,
too.
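The NR_CPUS <= BITS_PER_LONG special case can be sketched with the preprocessor. Callers use the same macros in both cases, and only the representation changes. The names and the `TOY_NR_CPUS` value are hypothetical. The long-width detection assumes longs are either 32 or 64 bits.

```c
#include <assert.h>
#include <limits.h>

#define TOY_NR_CPUS 32		/* assumed; a value such as 200 selects the array form */

#if ULONG_MAX == 0xffffffffUL
#define TOY_BITS_PER_LONG 32
#else
#define TOY_BITS_PER_LONG 64	/* assumes long is 32 or 64 bits */
#endif

#if TOY_NR_CPUS <= TOY_BITS_PER_LONG
/* Small configuration: the mask is a bare unsigned long. */
typedef unsigned long toy_cpumask_t;
#define TOY_CPU_MASK_NONE        0UL
#define toy_cpu_set(cpu, mask)   ((mask) |= 1UL << (cpu))
#define toy_cpu_isset(cpu, mask) (((mask) >> (cpu)) & 1UL)
#else
/* Large configuration: an array of longs wrapped in a struct. */
typedef struct {
	unsigned long bits[(TOY_NR_CPUS + TOY_BITS_PER_LONG - 1) / TOY_BITS_PER_LONG];
} toy_cpumask_t;
#define TOY_CPU_MASK_NONE ((toy_cpumask_t){ { 0 } })
#define toy_cpu_set(cpu, mask) \
	((mask).bits[(cpu) / TOY_BITS_PER_LONG] |= 1UL << ((cpu) % TOY_BITS_PER_LONG))
#define toy_cpu_isset(cpu, mask) \
	(((mask).bits[(cpu) / TOY_BITS_PER_LONG] >> ((cpu) % TOY_BITS_PER_LONG)) & 1UL)
#endif
```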
The immediate users of this functionality are Sun sparc64 systems, SGI mips64
and ia64 systems, and IBM ia32, ppc64, and s390 systems. Of these, only the
ppc64 machines needing the functionality have yet to be released; all others
have had systems requiring it for full functionality for at least 6 months,
and in some cases, since the initial Linux port to the affected architecture.
|
|
From: Matthew Dobson <colpatch@us.ibm.com>
sched_best_cpu schedules processes on nodes based on node_nr_running. For
CPU-less nodes, this is always 0, and thus sched_best_cpu tends to migrate
tasks to these nodes, which eventually get remigrated elsewhere.
This patch adds include/linux/topology.h, and modifies all includes of
asm/topology.h to linux/topology.h. A subsequent patch in this series adds
helper functions to linux/topology.h to ensure processes are only migrated
to nodes with CPUs.
Test-compiled and booted by Andrew Theurer (habanero@us.ibm.com) on both
x440 and ppc64.
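The problem shows up clearly in a simplified node picker. A node with no CPUs reports zero running tasks, so it always looks least loaded. The sketch below uses a hypothetical `struct toy_node_info` and `toy_best_node()` and simply skips such nodes. It compares raw task counts and is not the scheduler's actual heuristic.

```c
#include <assert.h>

/* Hypothetical per-node snapshot. */
struct toy_node_info {
	int nr_running;		/* tasks running on the node */
	int nr_cpus;		/* CPUs the node has; 0 for memory-only nodes */
};

/* Least-loaded node that actually has CPUs, or -1 if there is none. */
static int toy_best_node(const struct toy_node_info *nodes, int nr_nodes)
{
	int best = -1;

	for (int n = 0; n < nr_nodes; n++) {
		if (nodes[n].nr_cpus == 0)
			continue;	/* nr_running is always 0 here: ignore */
		if (best < 0 || nodes[n].nr_running < nodes[best].nr_running)
			best = n;
	}
	return best;
}
```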
|
|
From: "Martin J. Bligh" <mbligh@aracnet.com>
Fix a couple of instances of "warning: suggest parentheses around assignment
used as truth value".
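gcc -Wall emits this warning for code such as `if (x = f())`, where `=` could be a typo for `==`. The usual fix is an extra pair of parentheses. A small example, with a hypothetical lookup function:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical lookup that may fail and return NULL. */
static const char *toy_lookup(int key)
{
	return key == 1 ? "one" : NULL;
}

static size_t toy_total_name_len(const int *keys, int n)
{
	const char *name;
	size_t total = 0;

	for (int i = 0; i < n; i++) {
		/* "if (name = toy_lookup(keys[i]))" compiles, but -Wall warns
		 * "suggest parentheses around assignment used as truth value".
		 * The extra parentheses mark the assignment as intentional. */
		if ((name = toy_lookup(keys[i])))
			total += strlen(name);
	}
	return total;
}
```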
|
|
From Matt Dobson:
The cpu, memblk, and node driver/device registration should handle
registration failures a little more cleanly, or at least
*consistently* across the topology elements. Right now, failures are
either silent, obscure, or leave things in an inconsistent state.
|
|
Patch from Matthew Dobson <colpatch@us.ibm.com>
When I originally wrote the patches implementing the in-kernel topology
macros, they were meant to be called as a second layer of functions,
sans underbars. This additional layer was deemed unnecessary and
summarily dropped. As such, carrying around (and typing!) all these
extra underbars is quite pointless. Here's a patch to nip this in the
(sorta) bud. The macros only appear in 16 files so far, most of them
being the definitions themselves.
|
|
- Remove count and off parameters from show() method.
|
|
- Remove count and off parameters from per-node meminfo show() method.
|
|
This (from wli & myself) was overlooked for 2.5.51. Without this fix,
sysfs panics when registering topology for NUMA boxen.
|
|
This fixes an Oops on boot on NUMA systems, since the driver tries to access the
device class when it's registered.
|
|
From Matthew Dobson.
This final patch from Matthew cleans up a few leftovers which were noted
after the code had been reviewed and tested a bit in the -mm patchsets.
1) Update register_XXX and arch_register_XXX functions to return int
instead of void. Callers of these functions need to know whether
they completed successfully, so they can either take the appropriate further
registration action or not bother.
2) Drop some pointless error checking in the arch_register_XXX
functions.
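Returning int lets each caller react to a failed step. One common way to keep failures consistent is to unwind whatever already succeeded. The sketch below illustrates that pattern; it is not the actual drivers/base code. It uses hypothetical `toy_register_step()` and `toy_unregister_step()` helpers and a test hook, `toy_fail_at`.

```c
#include <assert.h>
#include <errno.h>

#define TOY_NR_STEPS 3

static int toy_fail_at = -1;		/* test hook: index of the step that fails */
static int toy_registered[TOY_NR_STEPS];

/* Hypothetical registration step: 0 on success, negative errno on failure. */
static int toy_register_step(int i)
{
	if (i == toy_fail_at)
		return -ENOMEM;
	toy_registered[i] = 1;
	return 0;
}

static void toy_unregister_step(int i)
{
	toy_registered[i] = 0;
}

/* Register every step. If one fails, undo the earlier ones so nothing is
 * left half-registered, and pass the error back to the caller. */
static int toy_register_all(void)
{
	int i, err = 0;

	for (i = 0; i < TOY_NR_STEPS; i++) {
		err = toy_register_step(i);
		if (err)
			break;
	}
	if (err) {
		while (--i >= 0)
			toy_unregister_step(i);
	}
	return err;
}
```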
|
|
From Matthew Dobson.
Create nodeX/meminfo files for DriverFS Topology.
This patch adds code to DriverFS Topology to expose per-node memory
statistics. This information is exposed via: cat nodeX/meminfo
The patch also adds 2 helper functions to gather per-node memory info.
|
|
From Matthew Dobson.
Update/Create core files for DriverFS Topology.
This patch creates the generic structures that are (will be) embedded in
the per-arch structures. Also creates calls to register these generic
structures (CPUs, MemBlks, & Nodes).
Note that without arch-specific structures in which to embed these
structures, and an arch-specific initialization routine, these
functions/structures remain unused.
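A generic structure embedded in a per-arch one is usually paired with a container_of()-style macro. Common code then works only with the generic part, while arch code recovers its wrapper. A userspace sketch, using hypothetical `struct toy_node` and `struct toy_arch_node` types:

```c
#include <assert.h>
#include <stddef.h>

/* Userspace equivalent of the kernel's container_of() idea. */
#define toy_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Generic part, shared by all architectures (hypothetical). */
struct toy_node {
	int id;
};

/* Per-arch wrapper that embeds the generic part. */
struct toy_arch_node {
	unsigned long arch_private;	/* arch-specific data */
	struct toy_node node;		/* generic part, registered by common code */
};

/* Arch code gets back from the generic pointer to its own wrapper. */
static unsigned long toy_arch_private(struct toy_node *n)
{
	return toy_container_of(n, struct toy_arch_node, node)->arch_private;
}
```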
|