Summary: | valgrind: m_libcfile.c:73 (vgPlain_safe_fd): Assertion 'newfd >= VG_(fd_hard_limit)' failed when valgrinding Wine | ||
---|---|---|---|
Product: | [Developer tools] valgrind | Reporter: | Austin English <austinenglish> |
Component: | general | Assignee: | Julian Seward <jseward> |
Status: | RESOLVED DUPLICATE | ||
Severity: | crash | CC: | kde-bugs, philippe.waroquiers, tom |
Priority: | NOR | ||
Version: | 3.7 SVN | ||
Target Milestone: | --- | ||
Platform: | Compiled Sources | ||
OS: | Linux | ||
Latest Commit: | Version Fixed In: | ||
Sentry Crash Report: | |||
Attachments: |
patch to help investigate safe_fd exhaustion
valgrind output with patch readlink output valgrind output with both patches strace -f valgrind Improve support for prlimit64 |
Description
Austin English
2011-06-15 00:21:16 UTC
On 15/06/11 00:21, Austin English wrote: > https://bugs.kde.org/show_bug.cgi?id=275673 > > Summary: valgrind: m_libcfile.c:73 (vgPlain_safe_fd): Assertion > 'newfd >= VG_(fd_hard_limit)' failed when valgrinding > Wine That should be the same problem that I have seen. A fix is here: https://bugs.kde.org/show_bug.cgi?id=214909#c103 (https://bugs.kde.org/attachment.cgi?id=60909) Well that may be the problem if the user is running with the gdbserver enabled, yes. If they're not then it won't be. (In reply to comment #2) > Well that may be the problem if the user is running with the gdbserver enabled, > yes. If they're not then it won't be. --vgdb arg has 3 values: no/yes/full, default value is yes. So, it really looks the same bug (unless the wine tests have been started with --vgdb=yes). To verify if this is the same problem, the easiest is to try with --vgdb=yes. A side question: when starting wine under Valgrind with gdbserver activated, is gdb able to debug wine ? Philippe (In reply to comment #3) > So, it really looks the same bug (unless the wine tests have been started > with --vgdb=yes). Humph, replace by --vgdb=no > To verify if this is the same problem, the easiest is to try with --vgdb=yes. Re-humph, replace by --vgdb=no (In reply to comment #1) > On 15/06/11 00:21, Austin English wrote: > > https://bugs.kde.org/show_bug.cgi?id=275673 > > > > Summary: valgrind: m_libcfile.c:73 (vgPlain_safe_fd): Assertion > > 'newfd >= VG_(fd_hard_limit)' failed when valgrinding > > Wine > > That should be the same problem that I have seen. A fix is here: > > https://bugs.kde.org/show_bug.cgi?id=214909#c103 > (https://bugs.kde.org/attachment.cgi?id=60909) Makes no difference here. (In reply to comment #4) > (In reply to comment #3) > > > So, it really looks the same bug (unless the wine tests have been started > > with --vgdb=yes). > Humph, replace by --vgdb=no > > > To verify if this is the same problem, the easiest is to try with --vgdb=yes. > Re-humph, replace by --vgdb=no Same problem. For reference, valgrind options used were: export VALGRIND_OPTS="--vgdb=no -q --trace-children=yes --track-origins=yes --gen-suppressions=all --suppressions=$WINESRC/tools/valgrind/valgrind-suppressions --suppressions=$WINESRC/tools/valgrind/gst.supp --leak-check=full --num-callers=20 --workaround-gcc296-bugs=yes --vex-iropt-precise-memory-exns=yes" (In reply to comment #5) > Same problem. > For reference, valgrind options used were: > export VALGRIND_OPTS="--vgdb=no -q --trace-children=yes --track-origins=yes > --gen-suppressions=all > --suppressions=$WINESRC/tools/valgrind/valgrind-suppressions > --suppressions=$WINESRC/tools/valgrind/gst.supp --leak-check=full > --num-callers=20 --workaround-gcc296-bugs=yes > --vex-iropt-precise-memory-exns=yes" With the patch and/or with --vgdb=no, it is unlikely that the problem is caused by gdbserver. Yesterday evening, I cloned the wine git and compiled/installed wine, and I have run w32 'sort.exe /?' executable under Valgrind under wine without problem. Is there an easy way to reproduce the problem ? (i.e. how to launch a wine test which produces the above problem). I can then try to further investigate Philippe (In reply to comment #6) > (In reply to comment #5) > > Same problem. > > For reference, valgrind options used were: > > export VALGRIND_OPTS="--vgdb=no -q --trace-children=yes --track-origins=yes > > --gen-suppressions=all > > --suppressions=$WINESRC/tools/valgrind/valgrind-suppressions > > --suppressions=$WINESRC/tools/valgrind/gst.supp --leak-check=full > > --num-callers=20 --workaround-gcc296-bugs=yes > > --vex-iropt-precise-memory-exns=yes" > > With the patch and/or with --vgdb=no, it is unlikely that the > problem is caused by gdbserver. > > Yesterday evening, I cloned the wine git and compiled/installed wine, and > I have run w32 'sort.exe /?' executable under Valgrind under wine without > problem. > > Is there an easy way to reproduce the problem ? (i.e. how to launch > a wine test which produces the above problem). > I can then try to further investigate > I was running the full test suite, using this script: http://code.google.com/p/winezeug/source/browse/trunk/valgrind/valgrind-daily.sh though for a quick method, you can use something like: #!/bin/sh export WINESRC="$HOME/wine-git" export CFLAGS="-g -O0 -fno-inline" export VALGRIND_OPTS="-q --trace-children=yes --track-origins=yes --gen-suppressions=all --suppressions=$WINESRC/tools/valgrind/valgrind-suppressions --leak-check=full --num-callers=20 --workaround-gcc296-bugs=yes --vex-iropt-precise-memory-exns=yes" export WINETEST_WRAPPER=valgrind export WINE_HEAP_TAIL_REDZONE=32 cd $WINESRC ./configure && make make -k test ---- I don't know a particular test off hand that has the bug, but if still needed, I can find you one this evening (US/Pacific Time). (In reply to comment #7) > I don't know a particular test off hand that has the bug, but if still needed, > I can find you one this evening (US/Pacific Time). If you can point at a specific test, that would be useful. I will then dig more in details this week-end. (In reply to comment #8) > If you can point at a specific test, that would be useful. > I will then dig more in details this week-end. Note that when trying to run the tests, I had a problem because the suppression file given in the valgrind opts is not present: philippe@gcc10:~/wine-git$ find . -name '*suppr*' philippe@gcc10:~/wine-git$ (I can download it from http://winezeug.googlecode.com/svn/trunk/valgrind/valgrind-suppressions but maybe this missing file means I did an error when downloading and building wine from git). When you have a specific test reproducing the problem, I will further investigate. (In reply to comment #9) > (In reply to comment #8) > > > If you can point at a specific test, that would be useful. > > I will then dig more in details this week-end. > > Note that when trying to run the tests, I had a problem because > the suppression file given in the valgrind opts is not present: > philippe@gcc10:~/wine-git$ find . -name '*suppr*' > philippe@gcc10:~/wine-git$ > > (I can download it from > http://winezeug.googlecode.com/svn/trunk/valgrind/valgrind-suppressions > but maybe this missing file means I did an error when downloading > and building wine from git). No, you're right, my apologies for not being more clear. > When you have a specific test reproducing the problem, I will further > investigate. Got one, dlls/advapi32/tests/security.c. To make it slightly easier to reproduce, open two terminals: Terminal A) $ cd $HOME/wine-git $ ./wine winemine Terminal B) $ cd $HOME/wine-git/dlls/advapi32/tests $ . ~/vg-wrapper # The script with those exports I posted earlier $ make security.ok Hope that helps. Thanks for looking! (In reply to comment #10) > $ make security.ok > > Hope that helps. Thanks for looking! I was able to run this test (multiple times, including without the winemine), but could not reproduce the problem. I re-introduced the gdbserver bug. Then the gdbserver test reproducing this bug failed again, but the wine regression test still did not fail. The way I attacked the gdbserver bug was to put a debug trace in VG_(safe_fd), outputting the stack trace of valgrind and the stack trace of all threads. This helped me to understand where the safe_fd exhaustion was coming from. For these stack traces, the easiest is copy/paste/modify the function __attribute__ ((noreturn)) static void report_and_quit ( const Char* report, UnwindStartRegs* startRegsIN ) in m_libcassert.c so as to obtain a non static function report_and_continue. Then if you call this from safe_fd (maybe also print the resulting fd), you might understand where all this safe_fd are coming from. You might also need to put some tracing in VG_(close) or similar to understand more about which ones are left opened. Sorry I can't help more if problem cannot be reproduced here ... (if I have more time/courage, I will take a look at extending --track-fds to also track the valgrind internal FDs, but in the meantime, the above addition of traces is the only idea I have). Philippe (In reply to comment #11) > (In reply to comment #10) > > > $ make security.ok > > > > Hope that helps. Thanks for looking! > > I was able to run this test (multiple times, including without > the winemine), but could not reproduce the problem. > I re-introduced the gdbserver bug. Then the gdbserver test reproducing > this bug failed again, but the wine regression test still did not fail. > > The way I attacked the gdbserver bug was to put a debug trace in > VG_(safe_fd), outputting the stack trace of valgrind and the stack > trace of all threads. > This helped me to understand where the safe_fd exhaustion was coming from. > > For these stack traces, the easiest is > copy/paste/modify the function __attribute__ ((noreturn)) > static void report_and_quit ( const Char* report, > UnwindStartRegs* startRegsIN ) > > in m_libcassert.c so as to obtain a non static function > report_and_continue. > > Then if you call this from safe_fd (maybe also print the resulting fd), > you might understand where all this safe_fd are coming from. > > You might also need to put some tracing in VG_(close) or similar to > understand more about which ones are left opened. > > Sorry I can't help more if problem cannot be reproduced here ... > > (if I have more time/courage, I will take a look at extending --track-fds to > also track the valgrind internal FDs, but in the meantime, the above > addition of traces is the only idea I have). > > Philippe Looking at my logs, it seems to consistently cause wine to hang on ole32/moniker.c, you might have better luck with that test. (In reply to comment #12) > Looking at my logs, it seems to consistently cause wine to hang on > ole32/moniker.c, you might have better luck with that test. Executed moniker two ways: with a separate winemine, and a full run (no winemine). In both cases, it was running under Valgrind, and reached the end without producing the 'safe_fd' error. (in the second case, there was more Valgrind processes). Here is some details about the full run : philippe@gcc10:~/wine-git/dlls/ole32$ rm moniker.ok; make moniker.ok > r_full.out 2>&1 philippe@gcc10:~/wine-git/dlls/ole32$ tail r_full.out fun:RtlAllocateHeap fun:rpcrt4_conn_np_alloc fun:RPCRT4_CreateConnection fun:rpcrt4_protseq_ncalrpc_open_endpoint fun:RPCRT4_use_protseq fun:RpcServerUseProtseqEpExW fun:RpcServerUseProtseqEpW fun:RPCSS_Initialize fun:main } philippe@gcc10:~/wine-git/dlls/ole32$ grep hard r_full.out ==4880== by 0xD61F510: harddisk_driver_entry (device.c:962) fun:harddisk_driver_entry philippe@gcc10:~/wine-git/dlls/ole32$ ls -lrt r*out -rw-r--r-- 1 philippe philippe 134847 2011-06-18 08:54 r.out -rw-r--r-- 1 philippe philippe 485231 2011-06-18 08:59 r_full.out philippe@gcc10:~/wine-git/dlls/ole32$ tail r.out fun:RtlAllocateHeap fun:rpcrt4_conn_np_alloc fun:RPCRT4_CreateConnection fun:rpcrt4_protseq_ncalrpc_open_endpoint fun:RPCRT4_use_protseq fun:RpcServerUseProtseqEpExW fun:RpcServerUseProtseqEpW fun:RPCSS_Initialize fun:main } philippe@gcc10:~/wine-git/dlls/ole32$ grep fd_hard r*out philippe@gcc10:~/wine-git/dlls/ole32$ So, still no better idea that adding trace in safe_fd. I'm not terribly familiar with the valgrind codebase, I don't suppose you have a diff I could apply and test? (In reply to comment #12) > Looking at my logs, it seems to consistently cause wine to hang on > ole32/moniker.c, you might have better luck with that test. I don't know if this is related, but I also see builds of recent Firefox hanging when controlled by GDB. I can attach in the normal way, with --vgdb-error=0, place a breakpoint at main, and it does break there. However, when I continue it hangs. After main, firefox forks (but does not exec). The child process soon exits with a segfault, and the hang happens after that (immediately afterwards, I assume). Does that make any sense? Could there be some kind of problem with fork + child exiting with a signal? Also: is there any possible bad interaction between the GDBserver functionality and --smc-check=all ? This flag causes V to check every SB that is run, to detect cases where the code from which the SB was made has changed. In such cases the translation is discarded and re-made. I can send exact how-to-repro info if you want. What it looks like when running --tool=none is: LD_LIBRARY_PATH=/space2/sewardj/MOZ/MC-09-06-2011-JEM/ff-opt/toolkit/library:/space2/sewardj/MOZ/MC-09-06-2011-JEM/ff-opt/js/src:/space2/sewardj/MOZ/MC-09-06-2011-JEM/ff-opt/dist/lib: EXE=./ff-opt/dist/bin/firefox-bin -P dev -no-remote ==15876== Nulgrind, the minimal Valgrind tool ==15876== Copyright (C) 2002-2010, and GNU GPL'd, by Nicholas Nethercote. ==15876== Using Valgrind-3.7.0.SVN and LibVEX; rerun with -h for copyright info ==15876== Command: ./ff-opt/dist/bin/firefox-bin -P dev -no-remote ==15876== ==15876== (action at startup) vgdb me ... ==15876== ==15876== TO DEBUG THIS PROCESS USING GDB: start GDB like this ==15876== /path/to/gdb ./ff-opt/dist/bin/firefox-bin ==15876== and then give GDB the following command ==15876== target remote | /home/sewardj/VgTRUNK/merge/Inst/lib/valgrind/../../bin/vgdb --pid=15876 ==15876== --pid is optional if only one valgrind process is running ==15876== (so I start GDB in another shell, as directed, and do "b main", then "cont", then we are waiting at main(), then "cont", and then the below happens and it hangs: ==15888== ==15888== TO DEBUG THIS PROCESS USING GDB: start GDB like this ==15888== /path/to/gdb ./ff-opt/dist/bin/firefox-bin ==15888== and then give GDB the following command ==15888== target remote | /home/sewardj/VgTRUNK/merge/Inst/lib/valgrind/../../bin/vgdb --pid=15888 ==15888== --pid is optional if only one valgrind process is running ==15888== ==15888== ==15888== Process terminating with default action of signal 11 (SIGSEGV) ==15888== Access not within mapped region at address 0x0 ==15888== at 0x5431593: glxtest() (glxtest.cpp:172) ==15888== by 0x5431779: fire_glxtest_process() (glxtest.cpp:234) ==15888== by 0x542881E: XRE_main (nsAppRunner.cpp:2752) ==15888== by 0x401BA4: main (nsBrowserApp.cpp:162) ==15888== If you believe this happened as a result of a stack ==15888== overflow in your program's main thread (unlikely but ==15888== possible), you can try to increase the size of the ==15888== main thread stack using the --main-stacksize= flag. ==15888== The main thread stack size used in this run was 8388608. ==15888== Created attachment 61107 [details]
patch to help investigate safe_fd exhaustion
The patch does two things:
Each time safe_fd is called, it prints the oldfd, the newfd, fg_hard_limit
and outputs the scheduler status.
The second thing is: just before exiting due to an internal error, it activates
the gdbserver. This will freeze the process, allowing either
to debug the guest process (probably not very interesting in this case)
or do any other command before the process dies.
What can be done in this case is then:
for f in /proc/8689/fd/*; do echo $f; readlink $f; done
(replace 8689 by the process which is close to die).
this will give information about fd which are not closed.
With the previous tracing, it should give a better idea
about where these safe_fd are coming from.
(In reply to comment #15) > I don't know if this is related, but I also see builds of recent > Firefox hanging when controlled by GDB. I can attach in the normal > way, with --vgdb-error=0, place a breakpoint at main, and it does > break there. However, when I continue it hangs. I understand that the problem Austin has is a crash due to safe_fd exhaustion causing an assert. (which is what the patch uploaded in comment 14 should help to investigate). Also, in his case, there is no usage of gdbserver (and problem also happens with --vgdb=no). > Does that make any sense? Could there be some kind of problem > with fork + child exiting with a signal? What might happen is the following: if a gdb is attached to the parent process firefox gdbserver and the child dies then IIRC, the parent might receive a signal SIGCHLD If you are connected to a gdbserver of the parent, then gdbserver should report the signal to gdb, where you have to continue the parent. Or possibly the child is waiting for gdb to connect ? > > Also: is there any possible bad interaction between the GDBserver > functionality and --smc-check=all ? This flag causes V to check every > SB that is run, to detect cases where the code from which the SB was > made has changed. In such cases the translation is discarded and > re-made. For what concerns the interaction between --smc-check=all and gdbserver: When inserting a break somewhere (or single stepping), the translation will be discarded by gdbserver to be re-instrumented. I do not think there is anything special with smc code. Is it different if you use the new -smc-check value ("all non file") ? I did not recently tried to debug firefox with Valgrind gdbserver, I will do that this WE and give feedback. Created attachment 61108 [details]
valgrind output with patch
Created attachment 61109 [details]
readlink output
ps showed two valgrind processes, I've attached the readlink output for both.
(In reply to comment #19) > Created an attachment (id=61109) [details] > readlink output > > ps showed two valgrind processes, I've attached the readlink output for both. Looking at the readlink output at the moment of the crash for the 2 processes, This is the situation at crash time (for the safe_fd range): 28814 (busy forking 28823) 1014 /home/austin/wine-git/loader/wine-preloader 1015 /tmp/valgrind_proc_28814_cmdline_d11c5e42 (deleted) 1016 pipe 1017 pipe 1018 pipe 1019 /tmp/vgdb-pipe-from-vgdb-to-28814 28323 1014 /home/austin/wine-git/loader/wine-preloader 1015 /tmp/valgrind_proc_28814_cmdline_d11c5e42 (deleted) 1016 pipe 1019 /tmp/vgdb-pipe-from-vgdb-to-28814 So, in terms of available safe_fd in process 28323, there should be no problem. If the unix kernel is correct, then at least 1017 and 1018 are free to be used by safe_fd, as they were properly usable by 28814 and are not marked as busy in 28323. So, a mystery. Maybe the newfd = VG_(fcntl)(oldfd, VKI_F_DUPFD, VG_(fd_hard_limit)); in VG_(safe_fd) fails for whatever strange reason. One reason might be the max limit of opened fd (very unlikely, as 28323 has about 25 FDs opened only according to readlink output). In any case, what is the output of ulimit -a ? Otherwise, assuming your ulimit is high, can you apply the following patch and re-run ? This should give the errno explaining fcntl failure. Index: coregrind/m_libcfile.c =================================================================== --- coregrind/m_libcfile.c (revision 11822) +++ coregrind/m_libcfile.c (working copy) @@ -383,6 +383,9 @@ # else # error "Unknown OS" # endif + if (sr_isError(res)) + VG_(printf) ("fcntl failure fd %d cmd %d err %lldn", + fd, cmd, (Long)sr_Err(res)); return sr_isError(res) ? -1 : sr_Res(res); } (above patch compiled, but no printf executed here). Alternatively, you might launch valgrind using strace -f valgrind ..... (In reply to comment #20) > Maybe the newfd = VG_(fcntl)(oldfd, VKI_F_DUPFD, VG_(fd_hard_limit)); > in VG_(safe_fd) fails for whatever strange reason. > One reason might be the max limit of opened fd (very unlikely, as 28323 > has about 25 FDs opened only according to readlink output). > In any case, what is the output of ulimit -a ? austin@debian:~$ uname -a Linux debian 2.6.38-2-686 #1 SMP Sun May 8 14:49:45 UTC 2011 i686 GNU/Linux austin@debian:~$ ulimit -a core file size (blocks, -c) 0 data seg size (kbytes, -d) unlimited scheduling priority (-e) 0 file size (blocks, -f) unlimited pending signals (-i) 6981 max locked memory (kbytes, -l) 64 max memory size (kbytes, -m) unlimited open files (-n) 1024 pipe size (512 bytes, -p) 8 POSIX message queues (bytes, -q) 819200 real-time priority (-r) 0 stack size (kbytes, -s) 8192 cpu time (seconds, -t) unlimited max user processes (-u) 6981 virtual memory (kbytes, -v) unlimited file locks (-x) unlimited > Otherwise, assuming your ulimit is high, can you apply the following > patch and re-run ? This should give the errno explaining fcntl failure. > > Index: coregrind/m_libcfile.c > =================================================================== > --- coregrind/m_libcfile.c (revision 11822) > +++ coregrind/m_libcfile.c (working copy) > @@ -383,6 +383,9 @@ > # else > # error "Unknown OS" > # endif > + if (sr_isError(res)) > + VG_(printf) ("fcntl failure fd %d cmd %d err %lldn", > + fd, cmd, (Long)sr_Err(res)); > return sr_isError(res) ? -1 : sr_Res(res); > } sched status during safe_fd safe_fd oldfd 3 newfd 1019 fd_hard_limit 1014 preloader: Warning: failed to reserve range 00110000-68000000 fcntl failure fd 20 cmd 0 err 22n==30665== at 0x38029992: report_and_continue (m_libcassert.c:264) ... sched status during safe_fd safe_fd oldfd 20 newfd -1 fd_hard_limit 1014 fcntl failure fd -1 cmd 2 err 9n valgrind: m_libcfile.c:76 (vgPlain_safe_fd): Assertion 'newfd >= VG_(fd_hard_limit)' failed. I'll attach full trace again. Created attachment 61118 [details]
valgrind output with both patches
(In reply to comment #21) > sched status during safe_fd > safe_fd oldfd 3 newfd 1019 fd_hard_limit 1014 > preloader: Warning: failed to reserve range 00110000-68000000 > fcntl failure fd 20 cmd 0 err 22n==30665== at 0x38029992: > report_and_continue (m_libcassert.c:264) 22 = EINVAL. According to man fcntl, it means that the 3arg is either negative or bigger than max allowed ??? But we do not properly see the output (seems to have a mixup between various processes output). I guess we will need to go to strace -f valgrind .... to have the full set of syscall results before. > > ... > > sched status during safe_fd > safe_fd oldfd 20 newfd -1 fd_hard_limit 1014 > fcntl failure fd -1 cmd 2 err 9n > valgrind: m_libcfile.c:76 (vgPlain_safe_fd): Assertion 'newfd >= > VG_(fd_hard_limit)' failed. This is the consequence of the dup failure. Created attachment 61120 [details]
strace -f valgrind
4 MB uncompressed.
I'll be away from this computer for about a week, but I can try reproducing it on another machine if needed.
(In reply to comment #24) > Created an attachment (id=61120) [details] > strace -f valgrind > > 4 MB uncompressed. > > I'll be away from this computer for about a week, but I can try reproducing it > on another machine if needed. From the strace output, it looks like fcntl fails with EINVAL, but it should not according to the manual. From the full strace, I cannot understand what is going on. After some cleanup (e.g. removing calls to sigprocmask, write, etc), here is a working pipe then fcntl: strace.txt:10806:clone(Process 2560 attached strace.txt:10807:child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x4199728) = 2560 strace.txt:10809:[pid 2560] gettid() = 2560 strace.txt:11838:[pid 2560] close(1017) = 0 strace.txt:11839:[pid 2560] close(1018) = 0 strace.txt:11840:[pid 2560] pipe([5, 6]) = 0 strace.txt:11841:[pid 2560] fcntl(5, F_DUPFD, 1014) = 1017 strace.txt:11842:[pid 2560] gettid() = 2560 strace.txt:11923:[pid 2560] close(5) = 0 strace.txt:11924:[pid 2560] fcntl(1017, F_SETFD, FD_CLOEXEC) = 0 strace.txt:11925:[pid 2560] fcntl(6, F_DUPFD, 1014) = 1018 strace.txt:11926:[pid 2560] gettid() = 2560 strace.txt:12007:[pid 2560] close(6) = 0 strace.txt:12008:[pid 2560] fcntl(1018, F_SETFD, FD_CLOEXEC) = 0 strace.txt:12010:[pid 2560] gettid() = 2560 strace.txt:12011:[pid 2560] read(1017, "A", 1) = 1 strace.txt:12012:[pid 2560] close(1019) = 0 strace.txt:12014:[pid 2560] mknod("/tmp/vgdb-pipe-from-vgdb-to-2560", S_IFIFO|0666) = 0 strace.txt:12015:[pid 2560] mknod("/tmp/vgdb-pipe-to-vgdb-from-2560", S_IFIFO|0666) = 0 strace.txt:12016:[pid 2560] open("/tmp/vgdb-pipe-shared-mem-vgdb-2560", O_RDWR|O_CREAT, 0666) = 5 .... And here is the failing pipe/fcntl: strace.txt:60232:[pid 2556] clone(Process 2563 attached strace.txt:60233:child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x42f7938) = 2563 strace.txt:60235:[pid 2563] gettid() = 2563 strace.txt:61322:[pid 2563] close(1017) = 0 strace.txt:61323:[pid 2563] close(1018) = 0 strace.txt:61324:[pid 2563] pipe([20, 21]) = 0 strace.txt:61325:[pid 2563] fcntl(20, F_DUPFD, 1014) = -1 EINVAL (Invalid argument) strace.txt:61327:[pid 2563] gettid() = 2563 strace.txt:61408:[pid 2563] fcntl(-1, F_SETFD, FD_CLOEXEC) = -1 EBADF (Bad file descriptor) strace.txt:61413:[pid 2563] gettid() = 2563 strace.txt:61509:[pid 2563] poll([{fd=1019, events=POLLIN}], 1, -1 man fcntl tells that for a DUP, EINVAL is returned if the 3rd arg is negative or bigger than the max allowed. The 3rd arg is not negative, and all the calls in the strace are setting the max no files to 1024. Moreover, 1017 and 1018 have just be closed above. From the traces captured till now, I have no idea anymore what to get. Maybe try on another computer ? Another linux version ? Maybe do an strace including the other wine processes (i.e. without winemine on the side) just to see really nobody does an setrlimit somewhere ? If the problem originates from a valgrind change, do a bissect to find the revision introducing the problem ? Doesn't occur on my Gentoo machine (also 32-bit) with 2.6.37 with valgrind/svn and wine/git. I'll be away from that machine for a week, I'll try to get a debian testing machine here and see if it has the same problem. Otherwise, I'll do the bisect next week. Cheers. (In reply to comment #25) > From the traces captured till now, I have no idea anymore what to get. > > Maybe try on another computer ? > Another linux version ? I don't experience the assertion that Austin sees with my 32-bit Ubuntu 10.04 machine or a 64-bit Fedora 14 virtual machine. I do see the problem on my 64-bit Fedora 15 machine. > If the problem originates from a valgrind change, do a bissect to find > the revision introducing the problem ? I wasn't able to perform the bisect on my Fedora 15 machine because the configure script in older revisions rejected glibc 2.14, but an educated guess pinpointed revision 11806 as the culprit, and reverting it resolves the assertion failure. (In reply to comment #27) > I wasn't able to perform the bisect on my Fedora 15 machine because the > configure script in older revisions rejected glibc 2.14, but an educated guess > pinpointed revision 11806 as the culprit, and reverting it resolves the > assertion failure. revision 11806 adds the support for a new system call (prlimit64). Is this syscall called by wine (to get or set the nr of FDs ?) (In reply to comment #28) > (In reply to comment #27) > > > I wasn't able to perform the bisect on my Fedora 15 machine because the > > configure script in older revisions rejected glibc 2.14, but an educated guess > > pinpointed revision 11806 as the culprit, and reverting it resolves the > > assertion failure. > > > revision 11806 adds the support for a new system call (prlimit64). > > Is this syscall called by wine (to get or set the nr of FDs ?) I don't believe it does so directly, to the best of my knowledge. My suspicion is that the call comes from newer glibc. Created attachment 68711 [details]
Improve support for prlimit64
This patch improves prlimit64 support.
It tries to make the code work like the existing setrlimit/getrlimit.
The support can still be improved.
Currently arg1 (pid) is completely ignored (and assumed the current process is meant).
Comment on attachment 68711 [details] Improve support for prlimit64 If this is caused by bad prlimit64 support, there now is an extra bug for it: Bug 294047 Still in 3.7.0. (In reply to comment #32) > Still in 3.7.0. The improvement to prlimit64 was committed only in 3.8.0. Can you retry with 3.8.0 or 3.8.1 ? Thanks (In reply to comment #33) > (In reply to comment #32) > > Still in 3.7.0. > The improvement to prlimit64 was committed only in 3.8.0. > Can you retry with 3.8.0 or 3.8.1 ? > Thanks Wonderful! Working fine in 3.8.1, thanks! duplicate of bug 294047 *** This bug has been marked as a duplicate of bug 294047 *** |