Summary: | java 1.4.2 client fails with erroneous "stack size too small" | ||
---|---|---|---|
Product: | [Developer tools] valgrind | Reporter: | JB West <jbwest> |
Component: | general | Assignee: | Julian Seward <jseward> |
Status: | RESOLVED FIXED | ||
Severity: | crash | CC: | johan.walles, msimons, tom |
Priority: | NOR | ||
Version: | 2.0.0 | ||
Target Milestone: | --- | ||
Platform: | RedHat Enterprise Linux | ||
OS: | Linux | ||
Latest Commit: | Version Fixed In: | ||
Sentry Crash Report: | |||
Attachments: |
Patch to allow fetching of stack details for a thread
Updated patch for current CVS head
Updated patch to implement more pthread stack attributes properly |
Description
JB West
2003-12-02 16:36:16 UTC
The abort following the message about the stack size is occurring at:

    ==9328== Process terminating with default action of signal 6 (SIGABRT): dumping core
    ==9328==    at 0x40166B11: __GI___kill (in /lib/i686/libc-2.3.2.so)
    ==9328==    by 0x40167F07: __GI_abort (in /lib/i686/libc-2.3.2.so)
    ==9328==    by 0x40CD01D6: os::abort(int) (in /usr/java/j2sdk1.4.1_05/jre/lib/i386/client/libjvm.so)
    ==9328==    by 0x40CCE465: os::Linux::install_alternate_signal_stack(void) (in /usr/java/j2sdk1.4.1_05/jre/lib/i386/client/libjvm.so)

I found an interesting comment in the RedHat Bugzilla (bug 26096) about a problem in os::Linux::install_alternate_signal_stack, where it makes assumptions about the size and alignment of a thread stack.

Created attachment 3599
Patch to allow fetching of stack details for a thread
This patch makes pthread_getattr_np save the stack address and size for the
requested thread in the attribute structure, and fixes pthread_attr_getstackaddr
and pthread_attr_getstacksize to return that information. This is enough to
make recent JVMs work.
I installed the patch on the current CVS version. I get a little further and then the impossible happens:

    ==22135== Invalid write of size 4
    ==22135==    at 0x4647B230: ???
    ==22135==    by 0x46475C82: ???
    ==22135==    by 0x46475D03: ???
    ==22135==    by 0x46475DDA: ???
    ==22135==    Address 0xBFFF9FBC is not stack'd, malloc'd or free'd
    ==22135==
    ==22135== Invalid write of size 4
    ==22135==    at 0x4647B237: ???
    ==22135==    by 0x46475C82: ???
    ==22135==    by 0x46475D03: ???
    ==22135==    by 0x46475DDA: ???
    ==22135==    Address 0xBFFF8FBC is not stack'd, malloc'd or free'd
    ==22135== warning: Valgrind's pthread_getattr_np is incomplete
    ==22135==          your program may misbehave as a result
    ==22135== warning: Valgrind's pthread_getattr_np is incomplete
    ==22135==          your program may misbehave as a result
    ==22135== warning: Valgrind's pthread_getattr_np is incomplete
    ==22135==          your program may misbehave as a result
    ==22135== warning: Valgrind's pthread_getattr_np is incomplete
    ==22135==          your program may misbehave as a result
    ==22135== warning: Valgrind's pthread_getattr_np is incomplete
    ==22135==          your program may misbehave as a result
    ==22135== warning: Valgrind's pthread_getattr_np is incomplete
    ==22135==          your program may misbehave as a result
    ==22135== warning: Valgrind's pthread_getattr_np is incomplete
    ==22135==          your program may misbehave as a result
    ==22135== warning: Valgrind's pthread_getattr_np is incomplete
    ==22135==          your program may misbehave as a result
    ==22135== warning: Valgrind's pthread_getattr_np is incomplete
    ==22135==          your program may misbehave as a result
    ==22135== warning: Valgrind's pthread_getattr_np is incomplete
    ==22135==          your program may misbehave as a result
    ==22135== warning: Valgrind's pthread_getattr_np is incomplete
    ==22135==          your program may misbehave as a result
    java version "1.4.2"
    Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2-b28)
    Java HotSpot(TM) Client VM (build 1.4.2-b28, mixed mode)
    ==22135==
    ==22135== Thread 6:
    ==22135== Syscall param mmap(args) contains uninitialised or unaddressable byte(s)
    ==22135==    at 0x4033412D: __mmap (in /lib/libc-2.2.5.so)
    ==22135==    by 0x416D794E: JavaThread::remove_stack_guard_pages(void) (in /usr/java/j2sdk1.4.2/jre/lib/i386/client/libjvm.so)
    ==22135==    by 0x416D3CBF: JavaThread::exit(int) (in /usr/java/j2sdk1.4.2/jre/lib/i386/client/libjvm.so)
    ==22135==    by 0x416D7C4C: JavaThread::thread_main_inner(void) (in /usr/java/j2sdk1.4.2/jre/lib/i386/client/libjvm.so)
    ==22135==    Address 0x67A8EE50 is on thread 6's stack
    ==22135== warning: Valgrind's pthread_cond_destroy is incomplete
    ==22135==          (it doesn't check if the cond is waited on)
    ==22135==          your program may misbehave as a result
    ==22135== warning: Valgrind's pthread_cond_destroy is incomplete
    ==22135==          (it doesn't check if the cond is waited on)
    ==22135==          your program may misbehave as a result
    ==22135== warning: Valgrind's pthread_cond_destroy is incomplete
    ==22135==          (it doesn't check if the cond is waited on)
    ==22135==          your program may misbehave as a result
    --22135-- INTERNAL ERROR: Valgrind received a signal 11 (SIGSEGV) - exiting
    --22135-- si_code=2 Fault EIP: 0x40020A4F; Faulting address: 0x67B42000

    valgrind: the `impossible' happened:
       Killed by fatal signal
    Basic block ctr is approximately 14050000
    ==22135==    at 0x40170A73: (within /usr/local/lib/valgrind/valgrind.so)
    ==22135==    by 0x40170A72: panic (vg_mylibc.c:1117)
    ==22135==    by 0x40170A99: vgPlain_core_panic (vg_mylibc.c:1122)
    ==22135==    by 0x40177AC7: vg_sync_signalhandler (vg_signals.c:1674)

    sched status:

    Thread 1: status = Runnable, associated_mx = 0x0, associated_cv = 0x0
    ==22135==    at 0x40230378: (within /usr/local/lib/valgrind/libpthread.so)
    ==22135==    by 0x40232C21: __pthread_getspecific (vg_libpthread.c:1446)
    ==22135==    by 0x4168E6B8: ThreadLocalStorage::thread(void) (in /usr/java/j2sdk1.4.2/jre/lib/i386/client/libjvm.so)
    ==22135==    by 0x41702888: Handle::Handle(oopDesc *) (in /usr/java/j2sdk1.4.2/jre/lib/i386/client/libjvm.so)

    Thread 2: status = WaitCV, associated_mx = 0x412E487C, associated_cv = 0x412E4894
    ==22135==    at 0x402321D9: pthread_cond_timedwait (vg_libpthread.c:1122)
    ==22135==    by 0x4168D499: os::Linux::safe_cond_timedwait(pthread_cond_t *, pthread_mutex_t *, timespec const *) (in /usr/java/j2sdk1.4.2/jre/lib/i386/client/libjvm.so)
    ==22135==    by 0x4167E1CE: Monitor::wait(int, long) (in /usr/java/j2sdk1.4.2/jre/lib/i386/client/libjvm.so)
    ==22135==    by 0x416F5979: VMThread::loop(void) (in /usr/java/j2sdk1.4.2/jre/lib/i386/client/libjvm.so)

    Thread 3: status = WaitCV, associated_mx = 0x41398CB8, associated_cv = 0x41398CD0
    ==22135==    at 0x40232061: pthread_cond_wait (vg_libpthread.c:1088)
    ==22135==    by 0x4168D32C: os::Linux::safe_cond_wait(pthread_cond_t *, pthread_mutex_t *) (in /usr/java/j2sdk1.4.2/jre/lib/i386/client/libjvm.so)
    ==22135==    by 0x4168663D: ObjectMonitor::wait(long long, int, Thread *) (in /usr/java/j2sdk1.4.2/jre/lib/i386/client/libjvm.so)
    ==22135==    by 0x416BBCB6: ObjectSynchronizer::wait(Handle, long long, Thread *) (in /usr/java/j2sdk1.4.2/jre/lib/i386/client/libjvm.so)

    Thread 4: status = WaitCV, associated_mx = 0x4139AF20, associated_cv = 0x4139AF38
    ==22135==    at 0x40232061: pthread_cond_wait (vg_libpthread.c:1088)
    ==22135==    by 0x4168D32C: os::Linux::safe_cond_wait(pthread_cond_t *, pthread_mutex_t *) (in /usr/java/j2sdk1.4.2/jre/lib/i386/client/libjvm.so)
    ==22135==    by 0x4168663D: ObjectMonitor::wait(long long, int, Thread *) (in /usr/java/j2sdk1.4.2/jre/lib/i386/client/libjvm.so)
    ==22135==    by 0x416BBCB6: ObjectSynchronizer::wait(Handle, long long, Thread *) (in /usr/java/j2sdk1.4.2/jre/lib/i386/client/libjvm.so)

    Thread 5: status = WaitCV, associated_mx = 0x412E38DC, associated_cv = 0x412E38F4
    ==22135==    at 0x40232061: pthread_cond_wait (vg_libpthread.c:1088)
    ==22135==    by 0x4168D28B: os::Linux::safe_cond_wait(pthread_cond_t *, pthread_mutex_t *) (in /usr/java/j2sdk1.4.2/jre/lib/i386/client/libjvm.so)
    ==22135==    by 0x4167E197: Monitor::wait(int, long) (in /usr/java/j2sdk1.4.2/jre/lib/i386/client/libjvm.so)
    ==22135==    by 0x416D2FE0: SuspendCheckerThread::run(void) (in /usr/java/j2sdk1.4.2/jre/lib/i386/client/libjvm.so)

    Thread 7: status = WaitCV, associated_mx = 0x412E531C, associated_cv = 0x412E5334
    ==22135==    at 0x40232061: pthread_cond_wait (vg_libpthread.c:1088)
    ==22135==    by 0x4168D32C: os::Linux::safe_cond_wait(pthread_cond_t *, pthread_mutex_t *) (in /usr/java/j2sdk1.4.2/jre/lib/i386/client/libjvm.so)
    ==22135==    by 0x4167E2C5: Monitor::wait(int, long) (in /usr/java/j2sdk1.4.2/jre/lib/i386/client/libjvm.so)
    ==22135==    by 0x4158E31E: CompileBroker::compiler_thread_loop(void) (in /usr/java/j2sdk1.4.2/jre/lib/i386/client/libjvm.so)

    Note: see also the FAQ.txt in the source distribution.
    It contains workarounds to several common problems.

    If that doesn't help, please report this bug to: valgrind.kde.org

    In the bug report, send all the above text, the valgrind
    version, and what Linux distro you are using.  Thanks.

I've managed to reproduce this latest issue, which looks like a completely separate problem. It also looks like a bug in the JVM to me. The crash in valgrind actually happens when it tries to mark the stack of a thread that has exited as inaccessible. The problem is that part of one of valgrind's data structures has mysteriously become inaccessible because of the following sequence of calls made by the JVM:

    SYSCALL[4481,7](192):mmap2 ( 0x661CD000, 12288, 7, 50, -1, 0 )
    SYSCALL[4481,7](125):mprotect ( 0x661CD000, 12288, 0 )

The first of those is an mmap with MAP_FIXED. The address it gives is within an area of memory that valgrind has already allocated for its own use, so I'm not sure why the JVM thinks it can fiddle with it. The second call marks that memory as inaccessible, which is what causes the crash later on.

Created attachment 3742
Updated patch for current CVS head
I wonder if the JVM is looking at /proc/self/maps and doing something with that info. Anyway, I tried your most recent patch, and it didn't seem to help.

Created attachment 4001
Updated patch to implement more pthread stack attributes properly
This patch extends the previous version of the patch to more fully implement
various stack-related pthread attributes.
What's the status on this one -- Tom, is the JVM working for you? Jeremy? Should this patch be committed?

Actually the JVM doesn't seem to be working for me at the moment, even with this patch, but it was working up to a point when I first submitted the patch. The current breakage is very odd and I haven't managed to track down the cause yet.

At the end of the day, though, all the patch does is improve valgrind's handling of various stack-related attributes in the pthread simulation. That is worthwhile even if it isn't enough to make the JVM work; it isn't entirely clear what use valgrind is on the JVM anyway unless you're working for Sun...

I have in fact found other code that needs this patch, namely current versions of wine when using the pthread driver rather than the kthread driver, which makes valgrinding wine programs much easier. That is actually what led to the third version of the patch, because I had to extend it a bit to get wine going.

valgrind could help find memory errors in JNI code called by the VM -- that's the value to a developer of mixed Java/JNI code.

*** Bug 75505 has been marked as a duplicate of this bug. ***

I have now committed the patch attached to this bug. It isn't sufficient to get current JVMs working, but it is a sensible extension to valgrind's pthread support anyway.

There are several current obstacles to getting the JVM running. Firstly, it doesn't like the fact that valgrind adds itself to the front of LD_LIBRARY_PATH, and it keeps re-execing itself. The fix in vg_main.c to make valgrind not add itself if already present doesn't actually stop this, for some reason that I can't figure out. The only thing that seems to fix it is changing valgrind to add itself to the end of LD_LIBRARY_PATH, but that is not a good idea in general.
The second problem is that the 1.4.0 versions of the JVM (at least 1.4.0_03 and 1.4.0_04) try to allocate an alternate signal stack at a fixed location, as shown in this strace output:

    mmap2(0xfee0e000, 12288, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xfee0e000
    mprotect(0xfee0e000, 12288, PROT_NONE) = 0
    mmap2(0xfee04000, 40960, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xfee04000
    sigaltstack({ss_sp=0xfee04000, ss_flags=0, ss_size=40960}, NULL) = 0

The problem is that this address is in valgrind's address space, so valgrind fails the first mmap call. The JVM then gives up with an error about failing to allocate the stack guard page.

The 1.5.0 beta 2 version of the JVM doesn't do this, but it fails with an abort in the HotSpot compiler, apparently in pthread_cond_wait. Turning off HotSpot doesn't seem to help, as it still seems to be used...

This specific bug is fixed in CVS head.

There still seem to be problems with using --trace-children=yes; the java command just keeps re-execing itself. I think it's getting confused by Valgrind's environment changes.

The re-execing happens because Java insists on having its own directory at the front of LD_LIBRARY_PATH; if it isn't there, Java adds it and re-execs. So valgrind puts itself at the front and starts Java, which puts itself at the front and restarts valgrind, which puts itself at the front, and so on ad infinitum.