| Age | Commit message | Author | Files | Lines |
|
into mars.ravnborg.org:/home/sam/bk/kbuild
|
|
This is my cpuset patch, with the following changes in the last two weeks:
1) Updated to 2.6.8.1-mm1
2) [Simon Derr <Simon.Derr@bull.net>] Fix new cpuset to begin empty,
not copied from parent. Needed to avoid breaking exclusive property.
3) [Dinakar Guniguntala <dino@in.ibm.com>] Finish initializing top
cpuset from cpu_possible_map after smp_init() called.
4) [Paul Jackson <pj@sgi.com>] Check on each call to __alloc_pages()
if the current task's cpuset mems_allowed has changed. Use a cpuset
generation number, bumped on any cpuset memory placement change,
to make this check efficient. Update the task's mems_allowed from
its cpuset, if the cpuset has changed.
5) [Paul Jackson <pj@sgi.com>] If a task is moved to another cpuset,
then update its cpus_allowed, using set_cpus_allowed().
6) [Paul Jackson <pj@sgi.com>] Update Documentation/cpusets.txt to
reflect above changes (4) and (5).
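The generation-number check in item (4) can be pictured with a small user-space sketch. The struct layouts and the function names cpuset_set_mems() and refresh_mems_allowed() are hypothetical and only illustrate the idea; they are not the kernel's code.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct cpuset {
	uint64_t mems_allowed;        /* bitmask of allowed memory nodes */
	unsigned int mems_generation; /* bumped on every placement change */
};

struct task {
	struct cpuset *cpuset;
	uint64_t mems_allowed;               /* task's cached copy */
	unsigned int cpuset_mems_generation; /* generation of the cached copy */
};

/* Change a cpuset's memory placement and bump its generation number. */
static void cpuset_set_mems(struct cpuset *cs, uint64_t mems)
{
	cs->mems_allowed = mems;
	cs->mems_generation++;
}

/*
 * Cheap check meant to run on every allocation: a single integer compare
 * in the common case, a copy only when the cpuset has changed.
 * Returns 1 if the task's mask was refreshed, 0 otherwise.
 */
static int refresh_mems_allowed(struct task *t)
{
	if (t->cpuset_mems_generation == t->cpuset->mems_generation)
		return 0;
	t->mems_allowed = t->cpuset->mems_allowed;
	t->cpuset_mems_generation = t->cpuset->mems_generation;
	return 1;
}
```

The point of the design is that the allocator's hot path pays for one comparison, not for a mask comparison or a lock.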
I continue to recommend the following patch for inclusion in your 2.6.9-*mm
series, when that opens. It provides an important facility for high
performance computing on large systems. Simon Derr of Bull (France) and
myself are the primary authors. Erich Focht has indicated that NEC is also
a potential user of this patch on the TX-7 NUMA machines, and that he
"would very much welcome the inclusion of cpusets."
I offer this update to lkml, in order to invite continued feedback.
The one prerequisite patch for this cpuset patch was just posted before this
one. That was a patch to provide a new bitmap list format, of which
cpusets is the first user.
This patch has been built on top of 2.6.8.1-mm1, for the arch's:
i386 x86_64 sparc ia64 powerpc-405 powerpc-750 sparc64
with and without CONFIG_CPUSETS. It has been booted and tested on ia64
(sn2_defconfig, SN2 hardware). The 'alpha' arch also built, except for
what seems to be an unrelated toolchain problem (crosstool ld sigsegv) in
the final link step.
===
Cpusets provide a mechanism for assigning a set of CPUs and Memory Nodes to
a set of tasks.
Cpusets constrain the CPU and Memory placement of tasks to only the
processor and memory resources within a task's current cpuset. They form a
nested hierarchy visible in a virtual file system. These are the essential
hooks, beyond what is already present, required to manage dynamic job
placement on large systems.
Cpusets require small kernel hooks in init, exit, fork, mempolicy,
sched_setaffinity, page_alloc and vmscan. And they require a "struct
cpuset" pointer, a cpuset_mems_generation, and a "mems_allowed" nodemask_t
(to go along with the "cpus_allowed" cpumask_t that's already there) in
each task struct.
These hooks:
1) establish and propagate cpusets,
2) enforce CPU placement in sched_setaffinity,
3) enforce Memory placement in mbind and sys_set_mempolicy,
4) restrict page allocation and scanning to mems_allowed, and
5) restrict migration and set_cpus_allowed to cpus_allowed.
The other required hook, restricting task scheduling to CPUs in a task's
cpus_allowed mask, is already present.
Cpusets extend the usefulness of the existing placement support that was
added to Linux 2.6 kernels: sched_setaffinity() for CPU placement, and
mbind() and set_mempolicy() for memory placement. On smaller or dedicated
use systems, the existing calls are often sufficient.
On larger NUMA systems running more than one performance-critical job,
it is necessary to be able to manage jobs in their entirety. This includes
providing a job with exclusive CPU and memory that no other job can use,
and being able to list all tasks currently in a cpuset.
A given job running within a cpuset would likely use the existing
placement calls to manage its CPU and memory placement in more detail.
Cpusets are named, nested sets of CPUs and Memory Nodes. Each cpuset is
represented by a directory in the cpuset virtual file system, normally
mounted at /dev/cpuset.
Each cpuset directory provides the following files, which can be
read and written:
cpus:
List of CPUs allowed to tasks in that cpuset.
mems:
List of Memory Nodes allowed to tasks in that cpuset.
tasks:
List of pid's of tasks in that cpuset.
cpu_exclusive:
Flag (0 or 1) - if set, cpuset has exclusive use of
its CPUs (no sibling or cousin cpuset may overlap CPUs).
mem_exclusive:
Flag (0 or 1) - if set, cpuset has exclusive use of
its Memory Nodes (no sibling or cousin may overlap).
notify_on_release:
Flag (0 or 1) - if set, then /sbin/cpuset_release_agent
will be invoked, with the name (/dev/cpuset relative path)
of that cpuset in argv[1], when the last user of it (task
or child cpuset) goes away. This supports automatic
cleanup of abandoned cpusets.
In addition one new filetype is added to the /proc file system:
/proc/<pid>/cpuset:
For each task (pid), list its cpuset path, relative to the
root of the cpuset file system. This file is read-only.
New cpusets are created using 'mkdir' (at the shell or in C). Old ones are
removed using 'rmdir'. The above files are accessed using read(2) and
write(2) system calls, or shell commands such as 'cat' and 'echo'.
The CPUs and Memory Nodes in a given cpuset are always a subset of its
parent. The root cpuset has all possible CPUs and Memory Nodes in the
system. A cpuset may be exclusive (cpu or memory) only if its parent is
similarly exclusive.
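As a rough sketch of the "in C" route mentioned above: the helper names write_file() and cpuset_create() are hypothetical, and the root directory is a parameter so the code can be tried against an ordinary directory instead of a mounted /dev/cpuset. The values written ("0-3" and so on) assume the bitmap list format referred to earlier.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Write a string to <dir>/<file>; returns 0 on success, -1 on error. */
static int write_file(const char *dir, const char *file, const char *value)
{
	char path[512];
	FILE *f;
	int n, ok;

	n = snprintf(path, sizeof(path), "%s/%s", dir, file);
	if (n < 0 || n >= (int)sizeof(path))
		return -1;
	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fputs(value, f) >= 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/*
 * Create cpuset <root>/<name> and set its CPUs and Memory Nodes.
 * "root" would normally be "/dev/cpuset".
 */
static int cpuset_create(const char *root, const char *name,
			 const char *cpus, const char *mems)
{
	char dir[256];
	int n;

	n = snprintf(dir, sizeof(dir), "%s/%s", root, name);
	if (n < 0 || n >= (int)sizeof(dir))
		return -1;
	if (mkdir(dir, 0755) != 0 && errno != EEXIST)
		return -1;
	if (write_file(dir, "cpus", cpus) != 0)
		return -1;
	return write_file(dir, "mems", mems);
}
```

A call such as cpuset_create("/dev/cpuset", "job1", "0-3", "0-1") would then correspond to a mkdir followed by two echo commands at the shell. On a real cpuset file system the write is expected to fail if the requested CPUs or Memory Nodes are not a subset of the parent's.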
See further Documentation/cpusets.txt, at the top of the following
patch.
/proc interface:
It is useful, when learning and making new uses of cpusets and placement, to
be able to see the current values of a task's cpus_allowed and
mems_allowed, which are the actual placement used by the kernel scheduler and
memory allocator.
The cpus_allowed and mems_allowed values are needed by user space apps that
are micromanaging placement, such as when moving an app to a obtained by
that app within its cpuset using sched_setaffinity, mbind and
set_mempolicy.
The cpus_allowed value is also available via the sched_getaffinity system
call. But since the entire rest of the cpuset API, including the display
of mems_allowed added here, is via an ascii style presentation in /proc and
/dev/cpuset, it is worth the extra couple lines of code to display
cpus_allowed in the same way.
This patch adds the display of these two fields to the 'status' file in the
/proc/<pid> directory of each task. The fields are only added if
CONFIG_CPUSETS is enabled (which is also needed to define the mems_allowed
field of each task). The new output lines look like:
$ tail -2 /proc/1/status
Cpus_allowed: ffffffff,ffffffff,ffffffff,ffffffff
Mems_allowed: ffffffff,ffffffff
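A user-space tool reading these lines might want the number of CPUs or nodes in the mask. The following is a minimal sketch, assuming only the format visible in the sample above (comma-separated hexadecimal words); the function name mask_weight() is hypothetical.

```c
#include <ctype.h>

/*
 * Count the set bits in a mask printed as comma-separated hex words,
 * e.g. "ffffffff,ffffffff". Returns -1 on a malformed string.
 */
static int mask_weight(const char *s)
{
	int bits = 0;
	int digits = 0;

	for (; *s && *s != '\n'; s++) {
		int v;

		if (*s == ',') {
			if (digits == 0)
				return -1; /* empty word */
			digits = 0;
			continue;
		}
		if (!isxdigit((unsigned char)*s))
			return -1;
		v = isdigit((unsigned char)*s)
			? *s - '0'
			: tolower((unsigned char)*s) - 'a' + 10;
		for (; v; v >>= 1)
			bits += v & 1;
		digits++;
	}
	return digits ? bits : -1;
}
```

For the sample output, the Cpus_allowed value has four words of eight "f" digits, which this function counts as 128 bits, and the Mems_allowed value counts as 64.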
Signed-off-by: Dinakar Guniguntala <dino@in.ibm.com>
Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Simon Derr <simon.derr@bull.net>
Signed-off-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This patch provides support for thread and process CPU time clocks in the
POSIX clock interface. Both the existing utime and utime+stime information
(already available via getrusage et al) can be used, as well as a new
(potentially) more precise and accurate clock (which cannot distinguish user
from system time). The clock used is that provided by the `sched_clock'
function already used internally by the scheduler. This gives a way for
platforms to provide the highest-resolution CPU time tracking that is
available cheaply, and some already do so (such as x86 using TSC). Because
this clock is already sampled internally by the scheduler, this new tracking
adds only the tiniest new overhead to accomplish the bookkeeping.
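From user space, CPU time clocks are read through the ordinary POSIX clock calls. The sketch below uses the standard POSIX identifiers CLOCK_THREAD_CPUTIME_ID and CLOCK_PROCESS_CPUTIME_ID; that a given kernel and glibc map them onto the clocks added by this patch is an assumption. The helper names clock_ns() and measure_spin() are hypothetical.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <time.h>

/* Read a POSIX clock as nanoseconds; returns -1 on error. */
static long long clock_ns(clockid_t id)
{
	struct timespec ts;

	if (clock_gettime(id, &ts) != 0)
		return -1;
	return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Burn some CPU and return how much thread CPU time it consumed. */
static long long measure_spin(unsigned long iterations)
{
	volatile unsigned long sink = 0;
	long long start = clock_ns(CLOCK_THREAD_CPUTIME_ID);
	unsigned long i;

	for (i = 0; i < iterations; i++)
		sink += i;
	return clock_ns(CLOCK_THREAD_CPUTIME_ID) - start;
}
```

Unlike a wall-clock measurement, the value returned by measure_spin() should not grow while the thread is blocked or preempted. On older C libraries clock_gettime() may require linking with -lrt.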
Some notes:
This allows per-thread clocks to be accessed only by other threads in the same
process. The only POSIX calls that access these are defined only for
in-process use, and having this check is necessary for the userland
implementations of the POSIX clock functions to robustly refuse stale
clockid_t's in the face of potential PID reuse.
This makes no constraint on who can see whose per-process clocks. This
information is already available for the VIRT and PROF (i.e. utime and stime)
information via /proc. I am open to suggestions on if/how security
constraints on who can see whose clocks should be imposed.
The SCHED clock information is now available only via clock_* syscalls. This
means that per-thread information is not available outside the process.
Perhaps /proc should show sched_time as well? This would let ps et al show
this more-accurate information.
When this code is merged, it will be supported in glibc. I have written the
support and added some test programs for glibc, which are what I mainly used
to test the new kernel code. You can get those here:
http://people.redhat.com/roland/glibc/kernel-cpuclocks.patch
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
I'd need it merged into mainline at some point, unless anybody has strong
arguments against it. All I can guarantee here, is that I'll back it out
myself in the future, iff Cpushare will fail and nobody else started using
it in the meantime for similar security purposes.
(akpm: project details are at http://www.cpushare.com/technical. It seems
like a good idea to me, and one which is worth supporting. I agree that for
this to be successful, the added robustness of Andrea's simple and specific
jail is worthwhile).
Signed-off-by: Andrea Arcangeli <andrea@cpushare.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This patch makes a needlessly global variable static.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Randy Dunlap <rddunlap@osdl.org>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
|
|
Sticking the not-implemented syscall stuff in sys.c is a pain because the
cond_syscall()s explode when certain prototypes are in scope. And we need
those prototypes' header files for the C code in sys.c.
Fix all that up by moving all the sys_ni_syscall code into its own .c file.
Signed-off-by: Peter Chubb <peterc@gelato.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
|
|
A simple ringbuffer implementation for various character drivers.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
|
|
The following patch series consolidates the various instances of waitqueue
hashing to use a uniform structure and share the per-zone hashtable among all
waitqueue hashers. This is expected to increase the number of hashtable
buckets available for waiting on bh's and inodes and eliminate statically
allocated kernel data structures for greater node locality and reduced kernel
image size. Some attempt was made to look similar to Oleg Nesterov's
suggested API in order to provide some kind of credit for independent
invention of something very similar (the original versions of these patches
predated my public postings on the subject of filtered waitqueues).
These patches have the further benefit and intention of enabling aio to use
filtered wakeups by standardizing the data structure passed to wake functions
so that embedded waitqueue elements in aio structures may be successfully
passed to the filtered wakeup wake functions, though this patch series doesn't
implement that particular functionality.
Successfully stress-tested on x86-64, and ia64 in recent prior versions.
This patch:
Move waitqueue-related functions not needing static functions in sched.c
to kernel/wait.c
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
The main goal of this patch is to consolidate all the different but still
fundamentally similar arch/*/kernel/irq.c code into the kernel/irq/ subsystem.
The kernel/irq/ directory contains the following new files:
- handle.c: core bits: __do_IRQ() and handle_IRQ_event(),
callable from arch-specific irq.c code.
- manage.c: the main driver apis
- spurious.c: the handling of buggy interrupt sources.
- autoprobe.c: probing of interrupts - older code but still in use.
- proc.c: /proc/irq/ code.
- internals.h for irq-core-internal interfaces not visible to drivers
nor arch PIC code.
An architecture enables the generic hardirq code by defining
CONFIG_GENERIC_HARDIRQS in its arch Kconfig. People doing this conversion
should check out the x86/x64/ppc/ppc64 patches for details - the conversion is
quite straightforward but every converted function (i.e. every function
removed from the arch irq.c) _must_ be matched to the generic version and if
there is any detail that the generic code should do it has to be added to the
generic code. All of the currently converted 4 architectures were converted
like that, and the generic code was extended/fixed along the way.
Other changes related to this patchset:
- clean up the irq include files (linux/irq.h, linux/interrupt.h,
linux/hardirq.h) and consolidate asm-*/[hard]irq.h. Note, to keep all
non-touched architectures in an untouched state this consolidation is
done carefully and strictly under CONFIG_GENERIC_HARDIRQS.
Once the consolidation is done we can do a couple of final cleanups
to reach the following logical splitup of 3 include files:
linux/interrupt.h: driver-visible APIs and details
linux/irq.h: core irq and arch-PIC code, internals
asm-*/irq.h: arch PIC and irq delivery details
the following include files will likely vanish:
linux/hardirq.h merges into linux/irq.h
asm-*/hardirq.h: merges into asm-*/irq.h
asm-*/hw_irq.h: merges into asm-*/irq.h
Christoph would like to do these once the current wave of
cleanups gets in.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
into kroah.com:/home/greg/linux/BK/driver-2.6
|
|
Thanks to Kay Sievers for pointing this out.
Signed-off-by: Greg Kroah-Hartman <greg@kroah.com>
|
|
This patch achieves out of line spinlocks by creating kernel/spinlock.c
and using the _raw_* inline locking functions.
Now, as much as this is supposed to be arch agnostic, there was still a
fair amount of rummaging about in archs, mostly for the cases where the
arch already has out of line locks and i wanted to avoid the extra call,
saving that extra call also makes lock profiling easier. PPC32/64 was
an example of such an arch and i have added the necessary profile_pc()
function as an example.
Size differences are with CONFIG_PREEMPT enabled since we wanted to
determine how much could be saved by moving that lot out of line too.
ppc64 = 259897 bytes:
   text    data     bss     dec     hex filename
5489808 1962724  709064 8161596  7c893c vmlinux-after
5749577 1962852  709064 8421493  808075 vmlinux-before
sparc64 = 193368 bytes:
   text    data     bss     dec     hex filename
3472037  633712  308920 4414669  435ccd vmlinux-after
3665285  633832  308920 4608037  465025 vmlinux-before
i386 = 416075 bytes:
   text    data     bss     dec     hex filename
5808371  867442  326864 7002677  6ada35 vmlinux-after
6221254  870634  326864 7418752  713380 vmlinux-before
x86-64 = 282446 bytes:
   text    data     bss     dec     hex filename
4598025 1450644  523632 6572301  64490d vmlinux-after
4881679 1449436  523632 6854747  68985b vmlinux-before
It has been compile tested (UP, SMP, PREEMPT) on i386, x86-64, sparc,
sparc64, ppc64, ppc32 and runtime tested on i386, x86-64 and sparc64.
Signed-off-by: Zwane Mwaikambo <zwane@fsmlabs.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
o /sys/kernel/hotplug_seqnum exports the current number
o lib/kobject.c's sequence_num is renamed to hotplug_seqnum and
exported by include/linux/kobject.h
o the source file ksysfs.c in kernel/ creates on init the
subsystem "/sys/kernel/" in sysfs
Signed-off-by: Greg Kroah-Hartman <greg@kroah.com>
|
|
This patch helps developers to trap at almost any kernel code address,
specifying a handler routine to be invoked when the breakpoint is hit.
Useful for analysing the Linux kernel by collecting debugging information
non-disruptively. Employs single-stepping out-of-line to avoid probe
misses on SMP and may be especially useful in aiding debugging elusive
races and problems on live systems. More elaborate dynamic tracing tools
such as DProbes can be built over the kprobes interface.
Sample usage:
To place a probe on __blockdev_direct_IO:
	static int probe_handler(struct kprobe *p, struct pt_regs *regs)
	{
		/* ... whatever ... */
		return 0;
	}

	struct kprobe kp = {
		.addr = (kprobe_opcode_t *)__blockdev_direct_IO,
		.pre_handler = probe_handler,
	};

	register_kprobe(&kp);
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
From: Andy Whitcroft <apw@shadowen.org>
Being able to recover the configuration from a kernel is very useful and it
would be nice to default this option to Yes. Currently, to have the config
available both from the image (using extract-ikconfig) and via /proc we
keep two copies of the original .config in the kernel. One in plain text
and one gzip compressed. This is not optimal.
This patch removes the plain text version of the configuration and updates
the extraction tools to locate and use the gzip'd version of the file.
This has the added bonus of providing us with the exact same results in
both cases, the original .config; including the comments.
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
When using a separate output directory the in-kernel config was rebuilt
each time the kernel was compiled. Fix this by specifying correct path to
Makefile in the prerequisite to the ikconfig.h file.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
From: Rik Faith <faith@redhat.com>
This patch provides a low-overhead system-call auditing framework for Linux
that is usable by LSM components (e.g., SELinux). This is an update of the
patch discussed in this thread:
http://marc.theaimsgroup.com/?t=107815888100001&r=1&w=2
In brief, it provides for netlink-based logging of audit records that have
been generated in other parts of the kernel (e.g., SELinux) as well as the
ability to audit system calls, either independently (using simple
filtering) or as a complement to the audit record that another part of the
kernel generated.
The main goals were to provide system call auditing with 1) as low overhead
as possible, and 2) without duplicating functionality that is already
provided by SELinux (and/or other security infrastructures). This
framework will work "stand-alone", but is not designed to provide, e.g.,
CAPP functionality without another security component in place.
This updated patch includes changes from feedback I have received,
including the ability to compile without CONFIG_NET (and better use of
tabs, so use -w if you diff against the older patch).
Please see http://people.redhat.com/faith/audit/ for an early example
user-space client (auditd-0.4.tar.gz) and instructions on how to try it.
My future intentions at the kernel level include improving filtering (e.g.,
syscall personality/exit codes) and syscall support for more architectures.
First, though, I'm going to work on documentation, a (real) audit daemon,
and patches for other user-space tools so that people can play with the
framework and understand how it can be used with and without SELinux.
Update:
Light-weight Auditing Framework receive filter fixes
From: Rik Faith <faith@redhat.com>
Since audit_receive_filter() is only called with audit_netlink_sem held, it
cannot race with either audit_del_rule() or audit_add_rule(), so the
list_for_each_entry_rcu()s may be replaced by list_for_each_entry()s, and
the rcu_read_{un,}lock()s removed. A fix for this is part of the attached
patch.
Other features of the attached patch are:
1) generalized the ability to test for inequality
2) added syscall exit status reporting and testing
3) added ability to report and test first 4 syscall arguments (this adds
a large amount of flexibility for little cost; not implemented or tested
on ppc64)
4) added ability to report and test personality
User-space demo program enhanced for new fields and inequality testing:
http://people.redhat.com/faith/audit/auditd-0.5.tar.gz
|
|
The "bogolock" code was introduced in module.c, as a way of freezing
the machine when we wanted to remove a module. This patch moves it
out to stop_machine.c and stop_machine.h.
Since the code changes affinity and priority, it's impolite to hijack
the current context, so we use a kthread. This means we have to pass
the function rather than implement "stop_machine()" and
"restart_machine()".
|
|
From: Rusty Russell <rusty@rustcorp.com.au>
These two patches provide the framework for stopping kernel threads to
allow hotplug CPU. This one just adds kthread.c and kthread.h, next
one uses it.
Most importantly, adds a Monty Python quote to the kernel.
Details:
The hotplug CPU code introduces two major problems:
1) Threads which previously never stopped (migration thread,
ksoftirqd, keventd) have to be stopped cleanly as CPUs go offline.
2) Threads which previously never had to be created now have
to be created when a CPU goes online.
Unfortunately, stopping a thread is fairly baroque, involving memory
barriers, a completion and spinning until the task is actually dead
(for example, complete_and_exit() must be used if inside a module).
There are also three problems in starting a thread:
1) Doing it from a random process context risks environment contamination:
better to do it from keventd to guarantee a clean environment, a-la
call_usermodehelper.
2) Getting the task struct without races is hard: see kernel/sched.c
migration_call(), kernel/workqueue.c create_workqueue_thread().
3) There are races in starting a thread for a CPU which is not yet
online: migration thread does a complex dance at the moment for
a similar reason (there may be no migration thread to migrate us).
Place all this logic in some primitives to make life easier:
kthread_create() and kthread_stop(). These primitives require no
extra data-structures in the caller: they operate on normal "struct
task_struct"s.
Other changes:
- Expose keventd_up(), as keventd and migration threads will use kthread to
launch, and kthread normally uses workqueues and must recognize this case.
- Kthreads created at boot before "keventd" are spawned directly. However,
this means that they don't have all signals blocked, and hence can be
killed. The simplest solution is to always explicitly block all signals in
the kthread.
- Change over the migration threads, the workqueue threads and the
ksoftirqd threads to use kthread.
- module.c currently spawns threads directly to stop the machine, so a
module can be atomically tested for removal.
- Unfortunately, this means that the current task is manipulated (which
races with set_cpus_allowed, for example), and it can't set its priority
artificially high. Using a kernel thread can solve this cleanly, and with
kthread_run, it's simple.
- kthreads use keventd, so they inherit its cpus_allowed mask. Unset it.
All current users set it explicitly anyway, but it's nice to fix.
- call_usermode_helper uses keventd, so the process created inherits its
cpus_allowed mask. Unset it.
- Prevent errors in boot when cpus_possible() contains a cpu which is not
online (ie. a cpu didn't come up). This doesn't happen on x86, since a
boot failure makes that CPU no longer possible (hacky, but it works).
- When the cpu fails to come up, some callbacks do kthread_stop(), which
doesn't work without keventd (which hasn't started yet). Call it directly,
and take care that it restores signal state (note: do_sigaction does a
flush on blocked signals, so we don't need to repeat it).
|
|
|
|
From: "Randy.Dunlap" <randy.dunlap@verizon.net>
The SuSE kernels place their ikconfig info at /proc/config.gz: in a
different place, and compressed. We thought it was a good idea to do it
that way in 2.6 as well.
- gzip the /proc config file, put it in /proc/config.gz;
- Based on a SuSE patch by Oliver Xymoron <oxymoron@waste.org>, which was
derived from a patch by Nicholas Leon <nicholas@binary9.net>
- change /proc/ikconfig/built_with to /proc/config_build_info;
- cleanup ikconfig init/exit entry points (static, __init, __exit);
- Makefile help from Sam Ravnborg;
DESC
ikconfig cleanup
EDESC
From: Stephen Hemminger <shemminger@osdl.org>
Simplify and cleanup the code:
- use single interface to seq_file where possible
- don't need to do as much of the /proc interface, only read
- use copy_to_user to avoid char at a time copy
- remove unnecessary globals
- use const char[] rather than const char * where possible.
Didn't change the version since interface doesn't change.
|
|
Also remove $Id$ tag.
No other code change.
|
|
From: "Randy.Dunlap" <rddunlap@osdl.org>
Please merge this makefile update from Sam.
From: Sam Ravnborg <sam@ravnborg.org>
Remark, I removed dependencies for configs.o - they are generated by kbuild
anyway. Only generated files need explicit dependencies.
|
|
- Move kernel/pm.c to kernel/power/pm.c
- Move poweroff sysrq registration to kernel/power/poweroff.c
- Mark pm_* functions deprecated to prevent new users.
|
|
|
|
into osdl.org:/home/mochel/src/kernel/devel/linux-2.5-power
|
|
(Randy Dunlap)
Build the kernel config data into the kernel - either unloaded or accessible
via /proc
|
|
Move kernel/suspend.c -> drivers/power/swsusp.c
|
|
From: Christopher Hoover <ch@murgatroid.com>
Not everyone needs futex support, so it should be optional. This is needed
for small platforms.
|
|
This is version 23 or so of the POSIX timer code.
Internal changelog:
- Changed the signals code to match the new order of things. Also the
new xtime_lock code needed to be picked up. It made some things a lot
simpler.
- Fixed a spin lock hand off problem in locking timers (thanks
to Randy).
- Fixed nanosleep to test for out of bound nanoseconds
(thanks to Julie).
- Fixed a couple of id deallocation bugs that left old ids
lying around (hey I get this one).
- This version has a new timer id manager. Andrew Morton
suggested elimination of recursion (done) and I added code
to allow it to release unused nodes. The prior version only
released the leaf nodes. (The id manager uses radix tree
type nodes.) Also added is a reuse count so ids will not
repeat for at least 256 alloc/free cycles.
- The changes for the new sys_call restart now allow one
restart function to handle both nanosleep and clock_nanosleep.
Saves a bit of code, nice.
- All the requested changes and Lindent too :).
- I also broke clock_nanosleep() apart much the same way
nanosleep() was with the 2.5.50-bk5 changes.
TIMER STORMS
The POSIX clocks and timers code prevents "timer storms" by
not putting repeating timers back in the timer list until
the signal is delivered for the prior expiry. Timer events
missed by this delay are accounted for in the timer overrun
count. The net result is MUCH lower system overhead while
presenting the same info to the user as would be the case if
an interrupt and timer processing were required for each
increment in the overrun count.
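The overrun accounting described above amounts to a division at re-arm time. A minimal sketch, assuming times in abstract ticks; the struct ptimer and the function ptimer_rearm() are hypothetical and are not the names used by the patch.

```c
/* Hypothetical model of one repeating timer; times are in ticks. */
struct ptimer {
	unsigned long interval; /* period, must be non-zero */
	unsigned long expires;  /* expiry whose signal is pending */
};

/*
 * Called when the signal for the pending expiry is finally delivered at
 * time "now" (now >= t->expires). Re-arms the timer for the first period
 * boundary after "now" and returns the number of expiries that were
 * skipped in between, i.e. the overrun count.
 */
static unsigned long ptimer_rearm(struct ptimer *t, unsigned long now)
{
	unsigned long missed = (now - t->expires) / t->interval;

	t->expires += (missed + 1) * t->interval;
	return missed;
}
```

For example, with an interval of 10 and a pending expiry at 110, a delivery at time 145 reports 3 overruns (the expiries at 120, 130 and 140) and re-arms the timer for 150, without any timer processing for the skipped periods.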
|
|
One of the goals of the whole new modversions implementation:
export-objs is gone for good!
|
|
This was never used, and a bad idea to begin with.
|
|
This patch is a rewrite of the insmod and boot parameter handling,
to unify them.
The new format is fairly simple: built on top of __module_param_call there
are several helpers, eg "module_param(foo, int, 000)". The final argument
is the permissions bits, for exposing parameters in sysfs (if
non-zero) at a later stage.
|
|
Makefiles no longer need to include Rules.make, which is currently an
empty file. This patch removes it from the remaining Makefiles, and
removes the empty Rules.make file.
|
|
This is the generic part of the start of the compatibility syscall
layer. I think I have made it generic enough that each architecture can
define what compatibility means.
To use this, an architecture must create asm/compat.h and provide
typedefs for (currently) 'compat_time_t', 'struct compat_timeval' and
'struct compat_timespec'.
|
|
was enabled, so split it up into "extable.c"
|
|
This is an implementation of the in-kernel module loader extending
the try_inc_mod_count() primitive and making its use compulsory.
This has the benefit of simplicity, and similarity to the existing
scheme. To reduce the cost of the constant increments and
decrements, reference counters are lockless and per-cpu.
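The idea of lockless per-cpu reference counters can be sketched as follows. This is an illustration under simplifying assumptions, not the module loader's code: NR_CPUS is a small fixed constant here, the names module_ref, ref_get(), ref_put() and ref_total() are hypothetical, and the cache-line padding a real implementation would want is omitted.

```c
#define NR_CPUS 4 /* hypothetical fixed CPU count for the sketch */

/* One counter per CPU; each CPU only ever touches its own slot. */
struct module_ref {
	long count[NR_CPUS];
};

/* Fast paths: no lock, and no slot shared with another CPU. */
static void ref_get(struct module_ref *r, int cpu) { r->count[cpu]++; }
static void ref_put(struct module_ref *r, int cpu) { r->count[cpu]--; }

/*
 * Slow path, used only when deciding whether the module may be unloaded:
 * the true reference count is the sum over all CPUs. An individual slot
 * may be negative if a reference was taken on one CPU and dropped on
 * another.
 */
static long ref_total(const struct module_ref *r)
{
	long sum = 0;
	int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		sum += r->count[cpu];
	return sum;
}
```

The sum is only meaningful if no CPU is modifying its slot while it is being computed, so a real implementation has to quiesce the other CPUs before trusting it.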
Eliminated (coming in following patches):
o Modversions
o Module parameters
o kallsyms
o EXPORT_SYMBOL_GPL and MODULE_LICENSE checks
o DEVICE_TABLE support.
New features:
o Typesafe symbol_get/symbol_put
o Single "insert this module" syscall interface allows trivial userspace.
o Raceless loading and unloading
You will need the trivial replacement module utilities from:
http://ozlabs.org/~rusty/module-init-tools-0.6.tar.gz
|
|
|
|
This is the RCU core patch from akpm's tree. It has been in his
tree since about 2.5.37-mm1 along with dcache_rcu and so far it has
worked fine. For 2.5, I am hoping that we might get the following
RCU patches included -
1. rt_rcu - ipv4 routecache lookup. Davem agreed to include this patch
if and when you include RCU core in your tree.
2. dcache_rcu (by Maneesh Soni) - dcache lookup avoiding dcache_lock as
much as possible. This has been in akpm's tree - stable and gives us
good yield. I have been submitting this to Viro and I will publish
some more benchmark numbers later to help decide on this.
This RCU core implements RCU APIs, call_rcu() and synchronize_kernel(),
by monitoring a per-CPU quiescent state (idle/user etc.) counter.
call_rcu() queues a callback to be invoked after all the CPUs have
gone through a quiescent state. Queuing is per-CPU and each per-CPU
batch gets a batch number. As batches get their turn, a global
cpu mask is used to keep track of CPUs pending quiescent state.
Checking for quiescent cycle is done by saving the per-CPU
counter at the beginning of the batch and then monitoring it for change
through the local timer interrupt handler.
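The snapshot-and-compare scheme for detecting a quiescent cycle can be modelled in a few lines. This is a single-threaded sketch under stated assumptions: NR_CPUS is a small constant, and struct rcu_batch, batch_start() and batch_check() are hypothetical names, not the RCU core's.

```c
#define NR_CPUS 4 /* hypothetical CPU count for the sketch */

struct rcu_batch {
	unsigned long snapshot[NR_CPUS]; /* counters saved at batch start */
	unsigned int pending; /* bit n set: CPU n still owes a quiescent state */
};

/* Start a batch: snapshot every CPU's counter and mark all CPUs pending. */
static void batch_start(struct rcu_batch *b,
			const unsigned long qs_count[NR_CPUS])
{
	int cpu;

	b->pending = 0;
	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		b->snapshot[cpu] = qs_count[cpu];
		b->pending |= 1u << cpu;
	}
}

/*
 * What a CPU's timer tick would do: if its counter has moved since the
 * snapshot, it has passed a quiescent state, so clear its bit.
 * Returns 1 when the batch is complete and its callbacks may run.
 */
static int batch_check(struct rcu_batch *b, int cpu, unsigned long qs_now)
{
	if (qs_now != b->snapshot[cpu])
		b->pending &= ~(1u << cpu);
	return b->pending == 0;
}
```

Comparing for inequality instead of "greater than" keeps the check correct even if a counter wraps around.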
|
|
This implements the simple hooks we need to catch unmappings, and to
make sure no stale task_struct*'s are ever used by the main oprofile
core mechanism. If disabled, it compiles to nothing.
|
|
This is the next iteration of the workqueue abstraction.
The framework includes:
- per-CPU queueing support.
on SMP there is a per-CPU worker thread (bound to its CPU) and per-CPU
work queues - this feature is completely transparent to workqueue-users.
keventd automatically uses this feature. XFS can now update to work-queues
and have the same per-CPU performance as it had with its per-CPU worker
threads.
- delayed work submission
there's a new queue_delayed_work(wq, work, delay) function and a new
schedule_delayed_work(work, delay) function. The latter one is used to
correctly fix former tq_timer users. I've reverted those changes in 2.5.40
that changed tq_timer uses to schedule_work() - eg. in the case of
random.c or the tty flip queue it was definitely the wrong thing to do.
delayed work means a timer embedded in struct work_struct. I considered
using split struct work_struct and delayed_work_struct types, but lots
of code actively uses task-queues in both delayed and non-delayed mode,
so i went for the more generic approach that allows both methods of work
submission. Delayed timers do not cause any other overhead in the
normal submission path otherwise.
- multithreaded run_workqueue() implementation
the run_workqueue() function can now be called from multiple contexts, and
a worker thread will only use up a single entry - this property is used
by the flushing code, and can potentially be used in the future to extend
the number of per-CPU worker threads.
- more reliable flushing
there's now a 'pending work' counter, which is used to accurately detect
when the last work-function has finished execution. It's also used to
correctly flush against timed requests. I'm not convinced whether the old
keventd implementation got this detail right.
- i switched the arguments of the queueing function(s) per Jeff's
suggestion, it's more straightforward this way.
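
The delayed-submission and 'pending work' ideas above can be sketched in
user space as follows. This is a minimal model, not the kernel code:
struct work_sim, struct wq_sim and the sim_* functions are hypothetical,
time is a plain tick counter, and the flush helper advances that counter
itself, which a real flush would not do.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_WORK 8 /* hypothetical queue capacity */

struct work_sim {
	void (*func)(void *data);
	void *data;
	unsigned long expires; /* tick at which the work becomes runnable */
	int queued;            /* nonzero while sitting in a queue */
};

struct wq_sim {
	struct work_sim *slots[MAX_WORK];
	unsigned long now; /* simulated time in ticks */
	int pending;       /* work items submitted but not yet finished */
};

/* Queue work to run 'delay' ticks from now; returns 0 if already queued. */
static int sim_queue_delayed_work(struct wq_sim *wq, struct work_sim *w,
				  unsigned long delay)
{
	if (w->queued)
		return 0;
	for (size_t i = 0; i < MAX_WORK; i++) {
		if (wq->slots[i] == NULL) {
			wq->slots[i] = w;
			w->expires = wq->now + delay;
			w->queued = 1;
			wq->pending++; /* timed requests count as pending too */
			return 1;
		}
	}
	return 0; /* queue full */
}

/* Non-delayed submission is the same path with a zero delay. */
static int sim_queue_work(struct wq_sim *wq, struct work_sim *w)
{
	return sim_queue_delayed_work(wq, w, 0);
}

/* Run every work item whose delay has expired. */
static void sim_run_workqueue(struct wq_sim *wq)
{
	for (size_t i = 0; i < MAX_WORK; i++) {
		struct work_sim *w = wq->slots[i];

		if (w == NULL || w->expires > wq->now)
			continue;
		wq->slots[i] = NULL;
		w->queued = 0;
		w->func(w->data);
		wq->pending--; /* dropped only after the function returns */
	}
}

/* Flush: keep going until the pending counter reaches zero. */
static void sim_flush(struct wq_sim *wq)
{
	sim_run_workqueue(wq);
	while (wq->pending > 0) {
		wq->now++;
		sim_run_workqueue(wq);
	}
}

/* Example work function: bump an int counter passed as data. */
static void sim_count(void *data)
{
	++*(int *)data;
}
```

Because the counter is bumped at submission time and dropped only after
the work function returns, a flush that waits for zero also covers work
that is still waiting on its timer.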
Driver fixes:
i have converted almost every affected driver to the new framework. This
cleaned up tons of code. I also fixed a number of drivers that were still
using BHs (these drivers did not compile in 2.5.40).
while this means lots of changes, it might ease the QA decision whether to
put this patch into 2.5.
The patch converts roughly 80% of all tqueue-using code to workqueues - and
all the places that are not converted to workqueues yet are places that do
not compile in vanilla 2.5.40 anyway, due to unrelated changes. I've
converted a fair number of drivers that do not compile in 2.5.40, and i
think i've managed to convert every driver that compiles under 2.5.40.
|
|
CPUFreq core for 2.5.39
include/linux/cpufreq.h CPUFreq header
kernel/Makefile add cpufreq.c if necessary
kernel/cpufreq.c CPUFreq core
|
|
Make the kernel print out symbolic backtraces if symbol table information
is available (CONFIG_KALLSYMS)
|
|
This is the latest version of the generic pidhash patch. The biggest
change is the removal of separately allocated pid structures: they are
now part of the task structure and the first task that uses a PID will
provide the pid structure. Task refcounting is used to avoid the
freeing of the task structure before every member of a process group or
session has exited.
This approach has a number of advantages besides the performance gains.
Besides simplifying the whole hashing code significantly, attach_pid()
is now fundamentally atomic and can be called during create_process()
without worrying about task-list side-effects. It does not have to
re-search the pidhash to find out about raced PID-adding either, and
attach_pid() cannot fail due to OOM. detach_pid() can do a simple
put_task_struct() instead of the kmem_cache_free().
The only minimal downside is the potential pending task structures after
session leaders or group leaders have exited - but the number of orphan
sessions and process groups is usually very low - and even if it's
higher, this can be regarded as a slow execution of the final
deallocation of the session leader, not some additional burden.
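
The refcounting scheme above can be sketched like this. It is a rough
user-space model under stated assumptions: struct task_sim, struct pid_sim
and the sim_* helpers are hypothetical, there is no hash table and no
locking, and 'freed' is a flag rather than a real deallocation.

```c
#include <assert.h>

/* Hypothetical, simplified stand-ins; not the kernel's structures. */
struct pid_sim {
	int nr;       /* the PID number */
	int attached; /* tasks currently attached to this PID */
};

struct task_sim {
	int usage;          /* reference count on the task structure */
	int freed;          /* set when the last reference is dropped */
	struct pid_sim pid; /* embedded, so attaching needs no allocation */
};

static void sim_get_task(struct task_sim *t)
{
	t->usage++;
}

static void sim_put_task(struct task_sim *t)
{
	assert(t->usage > 0);
	if (--t->usage == 0)
		t->freed = 1; /* a real kernel would free the structure here */
}

/* Attach one task to the PID provided by 'owner'; this pins the owner. */
static void sim_attach_pid(struct task_sim *owner)
{
	sim_get_task(owner);
	owner->pid.attached++;
}

/* Detach is just a counter drop plus a put on the owner. */
static void sim_detach_pid(struct task_sim *owner)
{
	assert(owner->pid.attached > 0);
	owner->pid.attached--;
	sim_put_task(owner);
}
```

In this model a group leader that has exited stays allocated until the
last member detaches, which corresponds to the "pending task structures"
downside mentioned above.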
|
|
It's gone almost everywhere else already, and will eventually make for
a nicer top-level Makefile.
|
|
The following patch adds support for CONFIG_GENERIC_ISA_DMA, which went
into the 2.4-ac kernel series prior to 2.5 happening.
The following patch allows architectures to decide whether they want
the generic ISA DMA functionality provided by kernel/dma.c and other
supporting files.
In addition, we move the procfs "/proc/dma" support code out of fs/proc
into kernel/dma.c, and adapt it to use the new seq_file code.
|
|
This patch alters the boot sequence to "plug in" each CPU, one at a
time. You need the patch for each architecture, as well. The
interface used to be "smp_boot_cpus()", "smp_commence()", and each
arch implemented the "maxcpus" boot arg itself. With this patch,
it is:
smp_prepare_cpus(maxcpus): probe for cpus and set up cpu_possible(cpu).
__cpu_up(cpu): called *after* initcalls, for each cpu where
cpu_possible(cpu) is true.
smp_cpus_done(maxcpus): called after every cpu has been brought up
|
|
Looks like sys_sysinfo has not been touched in years. Among other
things, it uses a global cli() for protection; I switched it to an
existing rwlock. I also pulled it out of info.c and stuck it in timer.c
(I chose timer.c because it shares dependencies there already).
The details:
- move sys_sysinfo to kernel/timer.c from kernel/info.c:
why one small syscall got its own file is beyond me.
- delete kernel/info.c
- stop the global cli! now grab a read_lock on xtime_lock.
this is safe as we moved the write_unlock on xtime_lock
down one line to cover the calculating of avenrun.
- trivial code cleanup
|