Age | Commit message | Author | Files | Lines
2005-04-03[PATCH] cpuset.c __user annotationsAlexander Viro1-4/+4
Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Al Viro <viro@parcelfarce.linux.theplanet.co.uk>
2005-03-30[PATCH] vt: don't call unblank at irq timeBenjamin Herrenschmidt1-5/+13
This patch removes the call to unblank() from printk, and avoids calling unblank at irq time _unless_ oops_in_progress is 1. I also export oops_in_progress so that drivers that care, such as radeonfb, can test it and know what to do. I audited the call sites of unblank_screen(), console_unblank(), etc., and I _hope_ I got them all; for example, the patch includes a small change to the s390 bust_spinlocks code that sets oops_in_progress back to 0 _after_ unblanking. I added a few might_sleep() calls to help us catch possible remaining callers. I'll soon write a document explaining fbdev locking. The current situation after this patch is that:
- All callbacks have console_semaphore held (fbdev's are fully serialised).
- Everything is called in schedule'able context, except the cfb_* rendering operations and cursor operations, with the special case of unblank, which can be called at any time when "oops_in_progress" is true. A driver that needs to sleep in its unblank implementation is welcome to test that variable and use a fallback path (or just do nothing if it's not simple).
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-03-30[PATCH] sched: fix schedstats warningNick Piggin1-1/+1
Quiet a warning when compiling without CONFIG_SMP Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-03-30[PATCH] kernel/rcupdate.c: make the exports EXPORT_SYMBOL_GPLAdrian Bunk1-3/+3
Due to the patent situation at least in the USA, the exports of kernel/rcupdate.c should be EXPORT_SYMBOL_GPL. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-03-30[PATCH] seccomp for ppc64Andrea Arcangeli1-25/+7
This patch against 12-rc1 adds seccomp to the ppc64 arch. I tested it successfully with the seccomp_test. I didn't bother to change the syscall exit not to check for TIF_SECCOMP, in theory that bit could be optimized but it's an optimization in the slow path, and current code is a bit simpler. I also verified it still compiles and works fine on x86 and x86-64. Instead of the TIF_32BIT redefine, if you want to change x86-64 to use TIF_32BIT too (instead of TIF_IA32), let me know. Signed-off-by: Andrea Arcangeli <andrea@cpushare.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-03-30[PATCH] cpuset: make function decl. ANSIRandy Dunlap1-1/+1
kernel/cpuset.c:1428:41: warning: non-ANSI function declaration Signed-off-by: Randy Dunlap <rddunlap@osdl.org> Acked-by: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-03-30[PATCH] BDI: Provide backing device capability information [try #3]David Howells1-1/+1
The attached patch replaces backing_dev_info::memory_backed with a capabilities bitmap. The capabilities available include:
(*) BDI_CAP_NO_ACCT_DIRTY: Set if the pages associated with this backing device should not be tracked by the dirty page accounting.
(*) BDI_CAP_NO_WRITEBACK: Set if dirty pages associated with this backing device should not have writepage() or writepages() invoked upon them to clean them.
(*) Capability markers that indicate what a backing device is capable of with regard to memory mapping facilities. These flags indicate whether a device can be mapped directly, whether it can be copied for a mapping, and whether direct mappings can be read, written and/or executed. This information is primarily aimed at improving no-MMU private mapping support.
The patch also provides convenience functions for determining the dirty-page capabilities available on backing devices directly or on the backing devices associated with a mapping. These are provided to keep line length down when checking for the capabilities. Signed-Off-By: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
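The capability bitmap and its convenience helpers can be pictured with a small user-space sketch. Only the two flag names come from the commit message; the bit values, the `bdi_sketch` struct and the helper names are assumptions made for illustration and are not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Bit values are assumed for this sketch; only the names come from the commit message. */
#define BDI_CAP_NO_ACCT_DIRTY 0x01u /* pages are not tracked by dirty accounting */
#define BDI_CAP_NO_WRITEBACK  0x02u /* writepage()/writepages() are never invoked */

/* Hypothetical stand-in for struct backing_dev_info. */
struct bdi_sketch {
    unsigned int capabilities;
};

/* True if dirty pages on this device should be counted. */
static bool bdi_sketch_account_dirty(const struct bdi_sketch *bdi)
{
    assert(bdi != NULL);
    return !(bdi->capabilities & BDI_CAP_NO_ACCT_DIRTY);
}

/* True if dirty pages on this device should be written back. */
static bool bdi_sketch_writeback_dirty(const struct bdi_sketch *bdi)
{
    assert(bdi != NULL);
    return !(bdi->capabilities & BDI_CAP_NO_WRITEBACK);
}
```

A memory-backed device would set both flags, which is roughly what the single memory_backed field used to express; a bitmap lets the two properties vary independently.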
2005-03-28Merge whitespace and __nocast changesLinus Torvalds17-115/+213
2005-03-28[PATCH] Futex: make futex_wait() atomic againJakub Jelínek1-42/+47
Call get_futex_value_locked in futex_wait with futex hash bucket locked and only enqueue the futex if futex has the expected value. Simplify futex_requeue. Signed-off-by: Jakub Jelinek <jakub@redhat.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
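The idea of reading the futex word while the hash bucket is locked, and enqueueing only if it still holds the expected value, can be sketched in user space as follows. This is a minimal model, not the kernel code: `bucket_sketch`, its spin lock and the waiter counter are hypothetical stand-ins for the futex hash bucket and its queue.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a futex hash bucket. */
struct bucket_sketch {
    atomic_flag lock;   /* simple spin lock protecting the bucket */
    int nr_waiters;     /* stands in for the wait queue */
};

static void bucket_sketch_init(struct bucket_sketch *b)
{
    assert(b != NULL);
    atomic_flag_clear(&b->lock);
    b->nr_waiters = 0;
}

static void bucket_sketch_lock(struct bucket_sketch *b)
{
    while (atomic_flag_test_and_set_explicit(&b->lock, memory_order_acquire))
        ; /* spin until the holder releases the lock */
}

static void bucket_sketch_unlock(struct bucket_sketch *b)
{
    atomic_flag_clear_explicit(&b->lock, memory_order_release);
}

/*
 * Returns 0 if the caller was queued, -EWOULDBLOCK if the word no longer
 * holds the expected value. The check and the enqueue happen under one
 * lock, so a waker that takes the same lock cannot slip in between them.
 */
static int futex_wait_sketch(struct bucket_sketch *b, atomic_int *uaddr, int expected)
{
    int ret = 0;

    bucket_sketch_lock(b);
    if (atomic_load(uaddr) != expected)
        ret = -EWOULDBLOCK;
    else
        b->nr_waiters++;
    bucket_sketch_unlock(b);
    return ret;
}
```

A real implementation would go on to sleep after enqueueing; the sketch stops at the point the commit message is concerned with.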
2005-03-28[PATCH] posix-cpu-timers and cputime_t divisions.Martin Schwidefsky1-39/+67
The posix cpu timers introduced code that will not work with an arbitrary type for cputime_t. In particular the division of two cputime_t values broke the s390 build because cputime_t is defined as an unsigned long long.
The first problem is the division of a cputime_t value by a number of threads. That is a cputime_t divided by an integer. The patch adds another macro, cputime_div, to the cputime macro regime which implements this type of division and replaces all occurrences of a cputime / nthread in the posix cpu timer code.
The next problem is bump_cpu_timer. This function is severely broken:
1) In the body of the first if statement a timer->it.cpu.incr.sched is used as the second argument of do_div. do_div expects an unsigned long as "base" parameter but timer->it.cpu.incr.sched is an unsigned long long. If the timer increment ever happens to be >= 2^32 the result is wrong and if the lower 32 bits are zero this even crashes with a fixed point divide exception.
2) The cputime_le(now.cpu, timer->it.cpu.expires.cpu) in the else if condition is wrong. The cputime_le() reads as "now.cpu <= timer->it.cpu.expires.cpu" and the subsequent cputime_ge() reads as "now.cpu >= timer.it.cpu.expires.cpu". That means that the two values need to be equal for the body of the second if to have any effect. The first cputime_le should be a cputime_ge.
3) timer->it.cpu.expires.cpu and delta in the else part of the if are of type cputime_t. A division of two cputime_t values is undefined (think of cputime_t as e.g. a struct timespec, that just doesn't work). We could add a primitive for this type of division but we'd end up with a 64 bit division or something even more complicated.
The solution for bump_cpu_timer is to use the "slow" division algorithm that does shifts and subtracts. That adds yet another cputime macro, cputime_halve, to do the right shift of a cputime value.
The next problem is in arm_timer. The UPDATE_CLOCK macro does the wrong thing for it_prof_expires and it_virt_expires. Expanded the macro and added the cputime magic to it_prof/it_virt.
The remaining problems are rather simple: timespec_to_jiffies instead of timespec_to_cputime, and several cases where cputime_eq with cputime_zero needs to be used instead of "== 0". What still worries me a bit is using "timer->it.cpu.incr.sched == 0" as the check for whether the timer is armed at all. It should work but it's not really clean. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
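The "slow" shift-and-subtract division mentioned for bump_cpu_timer can be illustrated with a user-space sketch. It is not the kernel's function: `bump_timer_sketch` is a hypothetical name, `uint64_t` stands in for cputime_t, and the plain `>>= 1` plays the role the message assigns to cputime_halve. The sketch advances an expiry time by whole increments until it lies strictly after `now`, using only comparison, addition, subtraction and halving.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Advance *expires by whole multiples of incr until it is strictly later
 * than now. Returns the number of increments added (0 if the timer has not
 * expired yet). Overflow of *expires itself is not handled in this sketch.
 */
static uint64_t bump_timer_sketch(uint64_t *expires, uint64_t incr, uint64_t now)
{
    uint64_t delta, step, count, periods = 0;

    assert(expires != 0 && incr != 0);
    if (now < *expires)
        return 0;

    delta = now - *expires;
    step = incr;
    count = 1;

    /* Double step while 2*step <= delta, written so that 2*step cannot overflow. */
    while (step <= delta && step <= delta - step) {
        step <<= 1;
        count <<= 1;
    }

    /* Restoring division: try each power-of-two multiple, largest first. */
    while (count != 0) {
        if (delta >= step) {
            delta -= step;
            *expires += step;
            periods += count;
        }
        step >>= 1;   /* the "halve" operation */
        count >>= 1;
    }

    /* One more increment moves the expiry strictly past now. */
    *expires += incr;
    return periods + 1;
}
```

With an expiry of 100, an increment of 10 and a current time of 135, the function adds four increments and leaves the expiry at 140.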
2005-03-28[PATCH] Fix POSIX timers expiring before their scheduled timeGeorge Anzinger1-1/+4
This patch fixes the problem of POSIX timers returning too early due to not accounting for the time starting mid jiffie. Signed-off-by: George Anzinger <george@mvista.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-03-28[PATCH] cpusets: mems generation deadlock fixPaul Jackson1-2/+32
The cpuset code to update mems_generation could (in theory) deadlock on cpuset_sem if it needed to allocate some memory while creating (mkdir) or removing (rmdir) a cpuset, so already held cpuset_sem. Some other process would have to mess with this tasks cpuset memory placement at the same time. We avoid this possible deadlock by always updating mems_generation after we grab cpuset_sem on such operations, before we risk any operations that might require memory allocation. Thanks to Jack Steiner <steiner@sgi.com> for noticing this. Signed-off-by: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-03-28[PATCH] Missing set_fs() calls around kernel syscallRandolph Chung1-1/+6
Found by sparse... since we are passing kernel param to a syscall handler, we need to do the set_fs() wrappers. Signed-off-by: Randolph Chung <tausq@debian.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-03-28[PATCH] kprobes: incorrect spin_unlock_irqrestore() call in register_kprobe()Prasanna S. Panchamukhi1-2/+3
register_kprobe() routine was calling spin_unlock_irqrestore() wrongly. This patch removes unwanted spin_unlock_irqrestore() call in register_kprobe() routine. Signed-off-by: Prasanna S Panchamukhi <prasanna@in.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-03-28[PATCH] New console flag: CON_BOOTMatthew Wilcox1-0/+5
CON_BOOT is like early printk in that it allows for output really early on. It's better than early printk because it unregisters automatically when a real console is initialised. So if you don't get consoles registering in console_init, there isn't a huge delay between the boot console unregistering and the real console starting. This is the case on PA-RISC where we have serial ports that aren't discovered until the PCI bus has been walked. I think all the current early printk users could be converted to this scheme with a minimal amount of effort. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-03-28[PATCH] Exports to enable clock driver modulesChristoph Lameter2-1/+8
The following exports are necessary to allow loadable modules to define new clocks. Without these the mmtimer driver cannot be built correctly as a module (there is another mmtimer specific fix necessary to get it to build properly but that will be a separate patch): Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-03-28[PATCH] Fix irq_affinity write from /proc for ia64Ashok Raj1-2/+8
Made the GENERIC_HARDIRQ mechanism work for ia64 and CPU hotplug. When a write to /proc/irq is handled it is not appropriate to perform set_rte immediately, since there is a race when the interrupt is asserted while the re-program is happening. Hence such programming is only safe when we do the re-program at the time of servicing an interrupt. This got broken when GENERIC_HARDIRQ got introduced for ia64. - added CONFIG_PENDING_IRQ so the default /proc/irq write handler can do the right thing. TBD: We currently don't handle the redirectable hint either in the display, or when we handle writes to /proc/irq/XX/smp_affinity. We need an arch specific way to account for the presence of the "r" hint when we handle the proc write. Signed-off-by: Ashok Raj <ashok.raj@intel.com> Cc: <linux-ia64@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-03-28[PATCH] break_lock fixIngo Molnar2-4/+3
lock->break_lock is set when a lock is contended, but cleared only in cond_resched_lock. Users of need_lockbreak (journal_commit_transaction, copy_pte_range, unmap_vmas) don't necessarily use cond_resched_lock on it. So, if the lock has been contended at some time in the past, break_lock remains set thereafter, and the fastpath keeps dropping lock unnecessarily. Hanging the system if you make a change like I did, forever restarting a loop before making any progress. And even users of cond_resched_lock may well suffer an initial unnecessary lockbreak. There seems to be no point at which break_lock can be cleared when unlocking, any point being either too early or too late; but that's okay, it's only of interest while the lock is held. So clear it whenever the lock is acquired - and any waiting contenders will quickly set it again. Additional locking overhead? well, this is only when CONFIG_PREEMPT is on. Since cond_resched_lock's spin_lock clears break_lock, no need to clear it itself; and use need_lockbreak there too, preferring optimizer to #ifdefs. Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
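The rule "clear break_lock whenever the lock is acquired" can be modelled with a small user-space lock. This is a sketch under assumptions: `lock_sketch` and its functions are hypothetical, built on C11 atomics, and only the names break_lock and need_lockbreak echo the commit message.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lock with a "please drop me" hint set by contenders. */
struct lock_sketch {
    atomic_flag locked;
    atomic_bool break_lock;
};

static void lock_sketch_init(struct lock_sketch *l)
{
    assert(l != NULL);
    atomic_flag_clear(&l->locked);
    atomic_init(&l->break_lock, false);
}

/* One acquisition attempt; a failed attempt asks the holder to break the lock. */
static bool lock_sketch_trylock(struct lock_sketch *l)
{
    if (atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire)) {
        atomic_store(&l->break_lock, true);
        return false;
    }
    /* The fix: a fresh acquisition discards any stale request. */
    atomic_store(&l->break_lock, false);
    return true;
}

static void lock_sketch_unlock(struct lock_sketch *l)
{
    /* Deliberately leaves break_lock alone: any point here is too early or too late. */
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}

/* What a long-running holder polls to decide whether to drop the lock. */
static bool need_lockbreak_sketch(struct lock_sketch *l)
{
    return atomic_load(&l->break_lock);
}
```

Without the store in the success path, a single past contention would leave the hint set forever, which is the behaviour the commit message describes as the fastpath dropping the lock unnecessarily.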
2005-03-28[PATCH] swsusp: kill swsusp_restorePavel Machek1-11/+5
This kills swsusp_resume; it should be arch-neutral but some i386 code sneaked in. And arch-specific code is better done in assembly anyway. Plus it fixes memory leaks in error paths. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Pavel Machek <pavel@suse.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-03-28[PATCH] swsusp: small updatesPavel Machek2-6/+9
This kills unused macro and write-only variable, and adds messages where something goes wrong with suspending devices. Signed-off-by: Pavel Machek <pavel@suse.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-03-28[PATCH] swsusp: Add missing refrigerator callsPavel Machek1-0/+2
This adds a few more places where it is possible to freeze kernel threads. From: Nigel Cunningham <ncunningham@cyclades.com> Signed-off-by: Pavel Machek <pavel@suse.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-03-28[PATCH] x86: CMOS time update optimisationGeorge Anzinger1-0/+9
This patch changes the update of the cmos clock to be timer driven rather than poll driven by the timer interrupt function. If the clock is not being synced to an outside source the timer is removed and thus system overhead is nil in that case. The update frequency is still ~11 minutes and missing the update window still causes a retry in 60 seconds. We want the calls to sync_cmos_clock() to be made in a consistent environment. This was not true when calling it directly from the NTP call code. The change means that sync_cmos_clock() is ALWAYS called from run_timers(), i.e. as a timer callback function. Also, call the timer code only through the timer interface (set a short timer to do it from the ntp call). Signed-off-by: George Anzinger <george@mvista.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-03-28[PATCH] ppc64: fix linkage error on G5Anton Blanchard1-0/+1
Move the ppc64 specific cond_syscall(ppc_rtas) into sys_ni.c so that it takes effect. With this fixed we can remove the #define hack. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-03-28[PATCH] mm counter operations through macrosChristoph Lameter2-4/+4
This patch extracts all the operations on counters protected by the page table lock (currently rss and anon_rss) into definitions in include/linux/sched.h. All rss operations are performed through the following macros:
get_mm_counter(mm, member) -> Obtain the value of a counter
set_mm_counter(mm, member, value) -> Set the value of a counter
update_mm_counter(mm, member, value) -> Add to a counter
inc_mm_counter(mm, member) -> Increment a counter
dec_mm_counter(mm, member) -> Decrement a counter
With this patch it becomes easier to add new counters and it is possible to redefine the method of counter handling. The counters are an issue for scalability since they are used in frequently used code paths and may cause cache line bouncing. For example, one may not use counters at all and count the pages when needed, switch to atomic operations if the mm_struct locking changes, or split the rss into counters that can be locally incremented. The relevant fields of the task_struct are renamed with a leading underscore to catch out people who are not using the accessor macros. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
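One plausible shape for the five accessor macros is sketched below. The macro names and their meanings come from the commit message; the macro bodies, the `mm_sketch` struct and the `account_anon_pages` caller are assumptions for illustration, and locking is left out entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in holding the underscore-prefixed counters. */
struct mm_sketch {
    unsigned long _rss;
    unsigned long _anon_rss;
};

/* Callers name the counter without the underscore; the macro pastes it on. */
#define get_mm_counter(mm, member)           ((mm)->_##member)
#define set_mm_counter(mm, member, value)    ((mm)->_##member = (value))
#define update_mm_counter(mm, member, value) ((mm)->_##member += (value))
#define inc_mm_counter(mm, member)           ((mm)->_##member++)
#define dec_mm_counter(mm, member)           ((mm)->_##member--)

/* Example caller: account for n newly mapped anonymous pages. */
static void account_anon_pages(struct mm_sketch *mm, unsigned long n)
{
    assert(mm != NULL);
    update_mm_counter(mm, rss, n);
    update_mm_counter(mm, anon_rss, n);
}
```

Because every access goes through a macro, code that still writes `mm->rss` directly fails to compile, and the counter representation can later be swapped by changing the macros alone.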
2005-03-28Mark "gfp" masks as "unsigned int" and use __nocast to find violations.Linus Torvalds2-3/+3
This makes it hard(er) to mix argument orders by mistake for things like kmalloc() and friends, since silent integer promotion is now caught by sparse.
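The annotation pattern can be sketched as follows. The `__nocast` definition mirrors the usual arrangement in which the attribute is only visible to sparse (which defines `__CHECKER__`) and expands to nothing for the compiler; `alloc_sketch` and `SKETCH_GFP_ZERO` are hypothetical names used only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Only sparse sees the attribute; the compiler sees an empty macro. */
#ifdef __CHECKER__
# define __nocast __attribute__((nocast))
#else
# define __nocast
#endif

#define SKETCH_GFP_ZERO 0x1u /* hypothetical flag: return zeroed memory */

/*
 * With the flags parameter marked __nocast, a checker can warn when a value
 * of another integer type (for example a size_t passed in the wrong
 * position) is implicitly converted into it.
 */
static void *alloc_sketch(size_t size, unsigned int __nocast flags)
{
    assert(size > 0);
    return (flags & SKETCH_GFP_ZERO) ? calloc(1, size) : malloc(size);
}
```

Swapping the two arguments of such a call still compiles, but with the annotation in place sparse has a chance to flag it.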
2005-03-16MergeLinus Torvalds1-14/+100
2005-03-15[PATCH] tasklist left lockedHugh Dickins1-0/+1
On 4-way SMP, about one reboot in twenty hangs while killing processes: exit needs exclusive tasklist_lock, but something still holds read_lock. do_signal_stop race case misses unlock, and fixing it fixes the symptom. Signed-off-by: Hugh Dickins <hugh@veritas.com> Acked-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-03-13[PATCH] Update panic() commentHeiko Carstens1-2/+1
panic() doesn't flush the filesystem cache anymore. The comment above the function still claims it does. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-03-13[PATCH] verify_area cleanup : i386 and misc.Jesper Juhl3-9/+11
This patch converts verify_area to access_ok in arch/i386, fs/, kernel/ and a few other bits that didn't fit in the other patches or that I actually was able to test on my hardware - this is by far the best tested of all the patches. Signed-off-by: Jesper Juhl <juhl-lkml@dif.dk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-03-13[PATCH] Allow admin to enable only some of the Magic-Sysrq functionsJan Kara1-1/+2
Allow admin to enable only some of the Magic-Sysrq functions. This allows admin to disable sysrq functions he considers dangerous (e.g. sending kill signal, remounting fs RO) while keeping the possibility to use the others (e.g. debug deadlocks by dumps of processes etc.). Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
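Selective enabling of this kind is naturally expressed as a bitmask test. The sketch below is an assumption about the general shape, not the patch itself: the category names, their bit values and the convention that the value 1 enables everything are all hypothetical here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical function categories; names and values are assumptions. */
#define SKETCH_SYSRQ_DUMP    0x08u /* debugging dumps of processes */
#define SKETCH_SYSRQ_REMOUNT 0x20u /* remount filesystems read-only */
#define SKETCH_SYSRQ_SIGNAL  0x40u /* send kill signals */

/*
 * enabled: the admin's setting. In this sketch 0 disables everything,
 * 1 enables everything, and any other value is a mask of categories.
 * op_mask: the category bit of the function being requested.
 */
static bool sysrq_sketch_allowed(unsigned int enabled, unsigned int op_mask)
{
    assert(op_mask != 0);
    if (enabled == 1)
        return true;
    return (enabled & op_mask) != 0;
}
```

An admin who wants deadlock debugging but no kill signals would, in this model, set only the dump bit.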
2005-03-13[PATCH] Subject: swsusp: do not provoke emergency disk shutdownsStefan Seyfried1-1/+1
In platform swsusp mode, we were forgetting to spin disks down, leading to ugly emergency shutdown. This synchronizes platform method with other methods and actually helps. Signed-off-by: Pavel Machek <pavel@suse.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-03-13[PATCH] swsusp: enable resume from initrdPavel Machek2-43/+157
From: <mjg59@scrf.ucam.org> When using a fully modularized kernel it is necessary to activate resume manually as the device node might not be available during kernel init. This patch implements a new sysfs attribute '/sys/power/resume' which allows for manual activation of software resume. When read from it prints the configured resume device in 'major:minor' format. When written to it expects a device in 'major:minor' format. This device is then checked for a suspended image and resume is started if a valid image is found. The original functionality is left in place. It should be used from initramfs, or with care. Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Pavel Machek <pavel@suse.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
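Parsing the 'major:minor' string written to the attribute can be sketched in user-space C as below. `parse_resume_device` is a hypothetical helper, not the function in the patch; it only shows one straightforward way to accept the format the message describes.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Parse a device given as "major:minor" (a trailing newline, as produced by
 * echo, is tolerated). Returns 0 on success and -1 if the input does not
 * contain two unsigned decimal numbers separated by a colon.
 */
static int parse_resume_device(const char *buf, unsigned int *major, unsigned int *minor)
{
    assert(buf != NULL && major != NULL && minor != NULL);
    if (sscanf(buf, "%u:%u", major, minor) != 2)
        return -1;
    return 0;
}
```

From an initramfs the attribute would typically be written with something like `echo 8:3 > /sys/power/resume`, where 8:3 is only an example device number.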
2005-03-13[PATCH] swsusp: use non-contiguous memory on resumePavel Machek1-98/+239
The following patch is designed to fix a problem in the current implementation of swsusp in mainline kernels. Namely, swsusp uses an array of page backup entries (aka pagedir) to store pointers to memory pages that must be saved during suspend and restored during resume. Unfortunately, the pagedir has to be located in a contiguous chunk of memory and it sometimes turns out that an 8-order or even 9-order allocation is needed for this purpose. It sometimes is impossible to get such an allocation and swsusp may fail during either suspend or resume due to the lack of memory, although theoretically there is enough free memory for it to succeed. Moreover, swsusp is more likely to fail for this reason during resume, which means that it may fail during resume after a successful suspend (this actually has happened for some people, including me :-)) and this, potentially, may lead to the loss of data. The problem is fixed by replacing the pagedir with a linked list so that high-order memory allocations are avoided (the patches make swsusp use only 0-order allocations). Unfortunately this means that it's necessary to change the assembly routines used to restore the image after it's been loaded from swap so that they walk the list instead of walking the array. This patch makes swsusp allocate only individual pages during resume. It contains the necessary changes to the assembly routines etc. for i386 and x86-64. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Pavel Machek <pavel@suse.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-03-11Merge bk://linux-sam.bkbits.net/kbuildLinus Torvalds2-3/+3
into ppc970.osdl.org:/home/torvalds/v2.6/linux
2005-03-12Merge mars.ravnborg.org:/home/sam/bk/linux-2.6Sam Ravnborg2-3/+3
into mars.ravnborg.org:/home/sam/bk/kbuild
2005-03-11[PATCH] Make lots of things staticAdrian Bunk12-40/+40
This is a megarollup of ~60 patches which give various things static scope. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-03-11[PATCH] docbook: kernel-docify commentsMartin Waitz1-5/+5
Kernel-docify comments Signed-off-by: Martin Waitz <tali@admingilde.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-03-11[PATCH] docbook: update function parameter description in block/fs codeMartin Waitz1-0/+6
Update function parameter description in block/fs code Signed-off-by: Martin Waitz <tali@admingilde.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-03-11[PATCH] re-inline sched functionsKenneth W. Chen1-4/+4
This could be part of the unknown 2% performance regression with the db transaction processing benchmark. The four functions in the following patch used to be inline. They have been un-inlined since 2.6.7. We measured that re-inlining them on 2.6.9 improves performance for the db transaction processing benchmark by +0.2% (on real hardware :-) The cost is certainly larger kernel size: 928 bytes on x86, and 2728 bytes on ia64. But certainly worth the money for enterprise customers since they improve performance on enterprise workloads.
# size vmlinux.*
   text    data     bss     dec     hex filename
3261844  717184  262020 4241048  40b698 vmlinux.x86.orig
3262772  717488  262020 4242280  40bb68 vmlinux.x86.inline
   text    data     bss     dec     hex filename
5836933  903828  201940 6942701  69efed vmlinux.ia64.orig
5839661  903460  201940 6945061  69f925 vmlinux.ia64.inline
Possible we can introduce them back? Signed-off-by: Ken Chen <kenneth.w.chen@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-03-11[PATCH] sched: find_busiest_group cleanupNick Piggin1-5/+1
Cleanup find_busiest_group a bit. New sched-domains code means we can't have groups without a CPU. Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-03-11[PATCH] sched: find_busiest_group fixletsNick Piggin1-11/+12
Fix up a few small warts in the periodic multiprocessor rebalancing code. Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-03-11[PATCH] sched: rework schedstatsNick Piggin1-63/+49
Move balancing fields into struct sched_domain, so we can get more useful results on systems with multiple domains (eg SMT+SMP, CMP+NUMA, SMP+NUMA, etc). Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-03-11[PATCH] sched: timestamp fixesNick Piggin1-2/+7
Some fixes for unsynchronised TSCs. A task's timestamp may have been set by another CPU. Although we try to adjust this correctly with the timestamp_last_tick field, there is no guarantee this will be exactly right. Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-03-10[PATCH] printk-times bugfix for loglevel-only printksTim Bird1-2/+7
This patch fixes a bug with the recently added printk-times feature. In the case where a printk consists of only the log level (followed subsequently by printks with more text for the same line), the printk-times code doesn't correctly recognize the end of the string, and starts emitting chars at the 0 byte at the end of the string. The patch below fixes this problem. It also adjusts the handling of printed_len in the routine, which was affected by the printk-times feature. Signed-off-by: Tim Bird <tim.bird@am.sony.com> Acked-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-03-09[PATCH] cpusets - big numa cpu and memory placementPaul Jackson5-1/+1545
This is my cpuset patch, with the following changes in the last two weeks:
1) Updated to 2.6.8.1-mm1
2) [Simon Derr <Simon.Derr@bull.net>] Fix new cpuset to begin empty, not copied from parent. Needed to avoid breaking exclusive property.
3) [Dinakar Guniguntala <dino@in.ibm.com>] Finish initializing top cpuset from cpu_possible_map after smp_init() called.
4) [Paul Jackson <pj@sgi.com>] Check on each call to __alloc_pages() if the current task's cpuset mems_allowed has changed. Use a cpuset generation number, bumped on any cpuset memory placement change, to make this check efficient. Update the task's mems_allowed from its cpuset, if the cpuset has changed.
5) [Paul Jackson <pj@sgi.com>] If a task is moved to another cpuset, then update its cpus_allowed, using set_cpus_allowed().
6) [Paul Jackson <pj@sgi.com>] Update Documentation/cpusets.txt to reflect above changes (4) and (5).
I continue to recommend the following patch for inclusion in your 2.6.9-*mm series, when that opens. It provides an important facility for high performance computing on large systems. Simon Derr of Bull (France) and myself are the primary authors. Erich Focht has indicated that NEC is also a potential user of this patch on the TX-7 NUMA machines, and that he "would very much welcome the inclusion of cpusets." I offer this update to lkml, in order to invite continued feedback. The one prerequisite patch for this cpuset patch was just posted before this one. That was a patch to provide a new bitmap list format, of which cpusets is the first user. This patch has been built on top of 2.6.8.1-mm1, for the arch's: i386 x86_64 sparc ia64 powerpc-405 powerpc-750 sparc64 with and without CONFIG_CPUSET. It has been booted and tested on ia64 (sn2_defconfig, SN2 hardware). The 'alpha' arch also built, except for what seems to be an unrelated toolchain problem (crosstool ld sigsegv) in the final link step.
===
Cpusets provide a mechanism for assigning a set of CPUs and Memory Nodes to a set of tasks. 
Cpusets constrain the CPU and Memory placement of tasks to only the processor and memory resources within a task's current cpuset. They form a nested hierarchy visible in a virtual file system. These are the essential hooks, beyond what is already present, required to manage dynamic job placement on large systems. Cpusets require small kernel hooks in init, exit, fork, mempolicy, sched_setaffinity, page_alloc and vmscan. And they require a "struct cpuset" pointer, a cpuset_mems_generation, and a "mems_allowed" nodemask_t (to go along with the "cpus_allowed" cpumask_t that's already there) in each task struct. These hooks:
1) establish and propagate cpusets,
2) enforce CPU placement in sched_setaffinity,
3) enforce Memory placement in mbind and sys_set_mempolicy,
4) restrict page allocation and scanning to mems_allowed, and
5) restrict migration and set_cpus_allowed to cpus_allowed.
The other required hook, restricting task scheduling to CPUs in a task's cpus_allowed mask, is already present. Cpusets extend the usefulness of the existing placement support that was added to Linux 2.6 kernels: sched_setaffinity() for CPU placement, and mbind() and set_mempolicy() for memory placement. On smaller or dedicated use systems, the existing calls are often sufficient. On larger NUMA systems, running more than one performance-critical job, it is necessary to be able to manage jobs in their entirety. This includes providing a job with exclusive CPU and memory that no other job can use, and being able to list all tasks currently in a cpuset. A given job running within a cpuset would likely use the existing placement calls to manage its CPU and memory placement in more detail. Cpusets are named, nested sets of CPUs and Memory Nodes. Each cpuset is represented by a directory in the cpuset virtual file system, normally mounted at /dev/cpuset. Each cpuset directory provides the following files, which can be read and written:
cpus: List of CPUs allowed to tasks in that cpuset.
mems: List of Memory Nodes allowed to tasks in that cpuset.
tasks: List of pid's of tasks in that cpuset.
cpu_exclusive: Flag (0 or 1) - if set, cpuset has exclusive use of its CPUs (no sibling or cousin cpuset may overlap CPUs).
mem_exclusive: Flag (0 or 1) - if set, cpuset has exclusive use of its Memory Nodes (no sibling or cousin may overlap).
notify_on_release: Flag (0 or 1) - if set, then /sbin/cpuset_release_agent will be invoked, with the name (/dev/cpuset relative path) of that cpuset in argv[1], when the last user of it (task or child cpuset) goes away. This supports automatic cleanup of abandoned cpusets.
In addition one new filetype is added to the /proc file system:
/proc/<pid>/cpuset: For each task (pid), list its cpuset path, relative to the root of the cpuset file system. This file is read-only.
New cpusets are created using 'mkdir' (at the shell or in C). Old ones are removed using 'rmdir'. The above files are accessed using read(2) and write(2) system calls, or shell commands such as 'cat' and 'echo'. The CPUs and Memory Nodes in a given cpuset are always a subset of its parent. The root cpuset has all possible CPUs and Memory Nodes in the system. A cpuset may be exclusive (cpu or memory) only if its parent is similarly exclusive. See further Documentation/cpusets.txt, at the top of the following patch.
/proc interface: It is useful, when learning and making new uses of cpusets and placement, to be able to see the current values of a task's cpus_allowed and mems_allowed, which are the actual placement used by the kernel scheduler and memory allocator. The cpus_allowed and mems_allowed values are needed by user space apps that are micromanaging placement, such as when moving an app to a obtained by that app within its cpuset using sched_setaffinity, mbind and set_mempolicy. The cpus_allowed value is also available via the sched_getaffinity system call. 
But since the entire rest of the cpuset API, including the display of mems_allowed added here, is via an ascii style presentation in /proc and /dev/cpuset, it is worth the extra couple lines of code to display cpus_allowed in the same way. This patch adds the display of these two fields to the 'status' file in the /proc/<pid> directory of each task. The fields are only added if CONFIG_CPUSETS is enabled (which is also needed to define the mems_allowed field of each task). The new output lines look like: $ tail -2 /proc/1/status Cpus_allowed: ffffffff,ffffffff,ffffffff,ffffffff Mems_allowed: ffffffff,ffffffff Signed-off-by: Dinakar Guniguntala <dino@in.ibm.com> Signed-off-by: Paul Jackson <pj@sgi.com> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Simon Derr <simon.derr@bull.net> Signed-off-by: Matt Mackall <mpm@selenic.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
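A user-space tool reading the new Cpus_allowed and Mems_allowed lines has to interpret a comma-separated list of hexadecimal words. The sketch below counts the set bits in such a string; `mask_weight_sketch` is a hypothetical helper and the accepted syntax is inferred only from the sample output quoted in the message.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Count the set bits in a mask printed as comma-separated hex words, for
 * example "ffffffff,ffffffff". A trailing newline is accepted. Returns -1
 * if a word has no hex digits or an unexpected separator is found.
 */
static int mask_weight_sketch(const char *s)
{
    int weight = 0;

    assert(s != NULL);
    for (;;) {
        char *end;
        unsigned long word = strtoul(s, &end, 16);

        if (end == s)
            return -1;           /* no hex digits where a word was expected */
        while (word != 0) {      /* clear the lowest set bit each round */
            word &= word - 1;
            weight++;
        }
        if (*end == '\0' || *end == '\n')
            return weight;
        if (*end != ',')
            return -1;
        s = end + 1;             /* continue after the comma */
    }
}
```

Applied to the two sample lines in the message, the helper would report how many CPUs and Memory Nodes the task is allowed to use.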
2005-03-09 | [PATCH] Properly share process and session keyrings with CLONE_THREAD [try #2] | David Howells | 2 | -0/+9
The attached patch causes process and session keyrings to be shared properly when CLONE_THREAD is in force. It does this by moving the keyring pointers into struct signal_struct[*].

[*] I have a patch to rename this to struct thread_group that I'll revisit after the advent of 2.6.11.

Furthermore, once this patch is applied, process keyrings will no longer be allocated at fork, but will instead only be allocated when needed. Allocating them at fork was a way of half getting around the sharing across threads problem, but that's no longer necessary.

This revision of the patch has the documentation changes patch rolled into it and no longer abstracts the locking for signal_struct into a pair of macros.

Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-03-09 | [PATCH] make cond_syscall look right | Matt Mackall | 1 | -69/+69
The current cond_syscall #defines add a semicolon on the end, and then folks leave the semicolons off in kernel/sys_ni.c, which confuses editors that are language-aware and is just generally bad style. This sweeps all the users and makes sys_ni.c look like normal C code.

Signed-off-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
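The style point in the commit above can be sketched with a stand-in macro. This is only an illustration: NI_STUB, stub_ni, demo_quotactl and demo_acct are hypothetical names, and the function-pointer trick used here is not the mechanism behind the kernel's real, architecture-specific cond_syscall. What it shows is a macro that leaves the terminating semicolon to the caller, so each use reads as an ordinary C declaration.

```c
#include <errno.h>

/* Shared fallback: report that the call is not implemented. */
static long stub_ni(void)
{
	return -ENOSYS;
}

/*
 * No trailing ';' inside the macro. Had the macro ended in ';',
 * users would write bare "NI_STUB(demo_quotactl)" lines, which
 * language-aware editors do not parse as complete statements.
 */
#define NI_STUB(name) long (*name)(void) = stub_ni

/* Each use now carries its own semicolon, like any other declaration. */
NI_STUB(demo_quotactl);
NI_STUB(demo_acct);
```

Calling demo_quotactl() or demo_acct() in this sketch returns -ENOSYS through the shared stub.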
2005-03-09 | [PATCH] consolidate the last of the compat sigevent structs | Stephen Rothwell | 1 | -0/+21
This patch pulls together the compat_sigevent structs. It also consolidates the copying of these structures into the kernel. The only part of the second union in sigevent that the kernel looks at currently is the _tid, so that is the only bit we copy.

This patch depends on my previous two patches "add and use COMPAT_SIGEV_PAD_SIZE" and "Consolidate the last compat sigvals".

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-03-09 | [PATCH] remove barrier() in software_resume() | Coywolf Qi Hunt | 1 | -1/+0
This patch removes the redundant compiler barrier. As Linus once said: "The mb() should make sure that gcc cannot move things around...".

Signed-off-by: Coywolf Qi Hunt <coywolf@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
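Why a compiler barrier next to mb() is redundant can be sketched in user space, assuming GCC or Clang. Both macros below are illustrative stand-ins, not the kernel's definitions: barrier() is a compiler-only barrier, and mb() is approximated with the __sync_synchronize() builtin, which issues a full memory barrier and is also not reordered across by the compiler. The publish/consume pair and its variables are hypothetical and are exercised single-threaded here.

```c
/* Compiler-only barrier: emits no instruction, but the compiler may
 * not move memory accesses across it. */
#define barrier() __asm__ __volatile__("" ::: "memory")

/* Stand-in for a full memory barrier (assumption: GCC/Clang builtin). */
#define mb() __sync_synchronize()

static int payload;
static int ready;

/* Store the payload, then set the flag, in that order. */
void publish(int value)
{
	payload = value;
	mb();	/* already constrains both the CPU and the compiler */
	/* a barrier() here would add nothing, which is the patch's point */
	ready = 1;
}

/* Return 1 and fill *out if a payload has been published, else 0. */
int consume(int *out)
{
	if (!ready)
		return 0;
	mb();
	*out = payload;
	return 1;
}
```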
2005-03-07 | [PATCH] set RLIMIT_SIGPENDING limit based on RLIMIT_NPROC | Roland McGrath | 1 | -0/+2
While looking into the issues Jeremy had with the RLIMIT_SIGPENDING limit, it occurred to me that the normal setting of this limit is bizarrely low. The initial hard limit setting (MAX_SIGPENDING) was taken from the old max_queued_signals parameter, which was for the entire system in aggregate. But even as a per-user limit, the 1024 value is incongruously low for this. On my machine, RLIMIT_NPROC allows me 8192 processes, but only 1024 queued signals, i.e. fewer even than one pending signal in each process. (To me, this really puts in doubt the sensibility of using a per-user limit for this rather than a per-process one, i.e. counted in sighand_struct or signal_struct, which could have a much smaller reasonable value. I don't recall the rationale for making this new limit per-user in the first place.)

This patch sets the default RLIMIT_SIGPENDING limit at boot time, using the calculation that decides the default RLIMIT_NPROC limit. This uses the same value for those two limits, which I think is still pretty conservative on the RLIMIT_SIGPENDING value.

Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
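The comparison described in the commit above (queued signals versus allowed processes) can be checked from user space with getrlimit(2). The following is a sketch, assuming a Linux system where RLIMIT_SIGPENDING is available; the helper names soft_limit and sigpending_covers_nproc are hypothetical. It only reads the current limits and does not reproduce the kernel's boot-time calculation.

```c
#include <sys/resource.h>

/* Fetch the soft limit for `resource`; returns 0 on success, -1 on error. */
int soft_limit(int resource, rlim_t *out)
{
	struct rlimit rl;

	if (getrlimit(resource, &rl) != 0)
		return -1;
	*out = rl.rlim_cur;
	return 0;
}

/*
 * Returns 1 if the soft RLIMIT_SIGPENDING allows at least one queued
 * signal per process permitted by the soft RLIMIT_NPROC, 0 if it does
 * not, and -1 if either limit cannot be read.
 */
int sigpending_covers_nproc(void)
{
	rlim_t sig, nproc;

	if (soft_limit(RLIMIT_SIGPENDING, &sig) != 0 ||
	    soft_limit(RLIMIT_NPROC, &nproc) != 0)
		return -1;
	return sig >= nproc;
}
```

On the machine described in the message (8192 processes, 1024 queued signals) this check would return 0 before the patch; with both defaults set to the same value it would return 1.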