For the below (wrong) program, both helgrind and drd are crashing, showing an internal error. When running outside of helgrind/drd, the program just reports a failed system call. With the addition of the following code at the beginning of drd_pre_mem_read_asciiz and evh__pre_mem_read_asciiz, no crash occurs (program behaviour is then the same with or without helgrind/drd): // check if the address is valid. If not valid, do nothing. // We have a bit of a problem here: what is the length to be checked ? // For this, we must call strlen, but this implies to have access to it. // Currently, we just check one byte. if (!VG_(am_is_valid_for_client) (a, 1, VKI_PROT_NONE)) { VG_(umsg)("System call invoked with invalid string address %p\n", (void*)a); return; } Note that the wrong chmod happens in a big executable, in a lib for which we do not have the sources, so bypassing that in helgrind/drd would be nice to have. #include <stdio.h> #include <sys/types.h> #include <sys/stat.h> int main () { int res; char *path = (char *) 0x2; res = chmod (path, 0); if (res == 0) printf ("unexpected success of chmod of brol address\n"); else perror ("show error of chmod of brol address"); } gcc -g -o chmod_brol chmod_brol.c install/bin/valgrind --tool=helgrind ./chmod_brol ==26778== Helgrind, a thread error detector ==26778== Copyright (C) 2007-2010, and GNU GPL'd, by OpenWorks LLP et al. ==26778== Using Valgrind-3.6.0 and LibVEX; rerun with -h for copyright info ==26778== Command: ./chmod_brol ==26778== --26778-- VALGRIND INTERNAL ERROR: Valgrind received a signal 11 (SIGSEGV) - exiting --26778-- si_code=1; Faulting address: 0x2; sp: 0x40345bd38 valgrind: the 'impossible' happened: Killed by fatal signal ==26778== at 0x38019B42: vgPlain_strlen (m_libcbase.c:174) ==26778== by 0x38004582: evh__pre_mem_read_asciiz (hg_main.c:1769) ==26778== by 0x380588AF: vgPlain_client_syscall (syswrap-main.c:1443) ==26778== by 0x38055F71: vgPlain_scheduler (scheduler.c:895) ==26778== by 0x3807BD9E: run_a_thread_NORETURN (syswrap-linux.c:94) sched status: running_tid=1 Thread 1: status = VgTs_Runnable ==26778== at 0x4EEB6C7: chmod (in /lib64/libc-2.5.so) ==26778== by 0x400545: main (chmod_brol.c:8) install/bin/valgrind --tool=drd ./chmod_brol ==26894== drd, a thread error detector ==26894== Copyright (C) 2006-2010, and GNU GPL'd, by Bart Van Assche. ==26894== Using Valgrind-3.6.0 and LibVEX; rerun with -h for copyright info ==26894== Command: ./chmod_brol ==26894== --26894-- VALGRIND INTERNAL ERROR: Valgrind received a signal 11 (SIGSEGV) - exiting --26894-- si_code=1; Faulting address: 0x2; sp: 0x40305bd58 valgrind: the 'impossible' happened: Killed by fatal signal ==26894== at 0x3800FF40: drd_pre_mem_read_asciiz (drd_main.c:264) ==26894== by 0x380562DF: vgPlain_client_syscall (syswrap-main.c:1443) ==26894== by 0x380539A1: vgPlain_scheduler (scheduler.c:895) ==26894== by 0x380797CE: run_a_thread_NORETURN (syswrap-linux.c:94) sched status: running_tid=1 Thread 1: status = VgTs_Runnable ==26894== at 0x4EF56C7: chmod (in /lib64/libc-2.5.so) ==26894== by 0x400545: main (chmod_brol.c:8)
I can reproduce the reported behavior. But I'm not sure it is the purpose of Helgrind or DRD to report pointer errors instead of crashing on them - reporting pointer errors is the purpose of memcheck. Furthermore, in the manuals of both Helgrind and DRD it is recommended to make a program memcheck-clean before starting to analyze it with Helgrind or DRD. Is it OK for you to close this bug report ?
(In reply to comment #1) > I can reproduce the reported behavior. But I'm not sure it is the purpose of > Helgrind or DRD to report pointer errors instead of crashing on them - > reporting pointer errors is the purpose of memcheck. Furthermore, in the > manuals of both Helgrind and DRD it is recommended to make a program > memcheck-clean before starting to analyze it with Helgrind or DRD. Is it OK for > you to close this bug report ? This error happens in a library for which we do not have the sources => can't fix it. memcheck does not crash on such an error, so we can put it in the memcheck ignore list, but cannot tell drd/helgrind to ignore it, as they both crash. So, to still allow drd/helgrind to be used, it looks better to either ignore such unaddressable bytes (with a msg maybe: the patch I did does that) or alternatively by defining a new kind of drd/helgrind errors (which would tell something like: "unaddressable byte(s) accessed, use memcheck for chasing this error") With this, drd/helgrind would be more usable (and would not change the behaviour of system calls).
Philippe, the use of VKI_PROT_NONE in the patch is not quite right. It allows the use of VG_(strlen) even if the area is mapped but does not have read permissions (eg, --- or -x-). Can you test your patch with VKI_PROT_READ instead of VKI_PROT_NONE on your big application? For the small test case, VKI_PROT_READ works OK.
Modified version of the patch committed as r11533, for both Helgrind and DRD, on the assumption that VKI_PROT_READ is OK -- shout if not! Thanks for looking into this.
(In reply to comment #4) > Modified version of the patch committed as r11533, for both Helgrind > and DRD, on the assumption that VKI_PROT_READ is OK -- shout if not! > Thanks for looking into this. I am quite sure the patch will avoid the problem on the big executable (but I will check in any case) Two comments however: 1. if SHOW_EVENTS is set to >= 1, then I believe the trace will crash. 2. if a page is only PROT_WRITE or PROT_EXEC, posix standard tells the implementation may (implicitely) give read access. I think linux behaves like that (e.g. PROT_WRITE implicitely gives read access, but without having PROT_READ in the protection bits). So, a "fully correct" check implies to check for READ or (if the kernel gives implicit read) for WRITE or for EXEC. Using VKI_PROT_NONE is on the safe side for finding race condition, but is on the unsafe side for crashes: VG still crashes if the protection is only PROT_NONE. Using VKI_PROT_READ is on the safe side to avoid VG crashes, but might cause helgrind/drd to "not see" some reads of a PROT_WRITE or PROT_EXEC. At the end, the VKI_PROT_READ looks more readable/understandable than the VKI_PROT_NONE check, and avoids VG crashes. And probably no one will bitterly complain about a false negative in such a bizarre PROT_WRITE or PROT_EXEC only case. There might however be cases (or other tools) where the above subtility is more important, so this is why I give this (long) comment that you can ignore :). Thanks for this fix and for the threaded fork fix. Philippe
(In reply to comment #5) > Two comments however: > 1. if SHOW_EVENTS is set to >= 1, then I believe the trace will crash. At second reading, this belief is wrong: it is the pointer a which is traced, not the string. So, an invalid pointer will not cause a crash in the trace.