Trying to run the current git version of UltraStar Deluxe compiled with Free Pascal 3.0.0 in Valgrind's Memcheck almost immediately leads to the following crash: ==29223== Memcheck, a memory error detector ==29223== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al. ==29223== Using Valgrind-3.13.0.SVN and LibVEX; rerun with -h for copyright info ==29223== Command: ./game/ultrastardx ==29223== --29223-- VALGRIND INTERNAL ERROR: Valgrind received a signal 11 (SIGSEGV) - exiting --29223-- si_code=1; Faulting address: 0x0; sp: 0x62fede04 valgrind: the 'impossible' happened: Killed by fatal signal host stacktrace: ==29223== at 0x380128A3: mc_is_defined_asciiz (mc_main.c:4235) ==29223== by 0x380128A3: check_mem_is_defined_asciiz (mc_main.c:4313) sched status: running_tid=1 Thread 1: status = VgTs_Runnable (lwpid 29223) ==29223== at 0x804D668: SYSTEM_$$_FPSYSCALL$LONGINT$LONGINT$LONGINT$$LONGINT (in /home/daniel/USDX/game/ultrastardx) ==29223== by 0x804DBBD: SYSTEM_$$_FPACCESS$PCHAR$LONGINT$$LONGINT (in /home/daniel/USDX/game/ultrastardx) ==29223== by 0x8181621: SYSUTILS_$$_FILEEXISTS$RAWBYTESTRING$$BOOLEAN (in /home/daniel/USDX/game/ultrastardx) ==29223== by 0x81AE517: INIFILES$_$TINIFILE_$__$$_CREATE$ANSISTRING$BOOLEAN$$TINIFILE (in /home/daniel/USDX/game/ultrastardx) ==29223== by 0x81AF98A: INIFILES$_$TMEMINIFILE_$__$$_CREATE$ANSISTRING$BOOLEAN$$TMEMINIFILE (in /home/daniel/USDX/game/ultrastardx) The crash happens when mc_is_defined_asciiz looks at the first byte of the pathname argument of the access syscall. For some reason the Free Pascal system library wants to call access with NULL, but that should not make Memcheck crash. The question is why get_vabits2(NULL) returns VA_BITS2_DEFINED. It does not crash on a C application that only does the access(NULL, F_OK) call.
There should be no difference of behaviour between Pascal runtime calling access and a C application calling access. Can you run your small C application under memheck with --trace-syscalls=yes and give the relevant trace portion for the 'access' syscalls ? Then similarly run the pascal application. Thanks On my side, running the below C code gives as trace: SYSCALL[6688,1](21) sys_access ( 0x400614(/dev/null), 0 )[sync] --> Success(0x0) SYSCALL[6688,1](21) sys_access ( 0x0((null)), 0 )==6688== Syscall param access(pathname) points to unaddressable byte(s) ==6688== at 0x4F0EC57: access (syscall-template.S:81) ==6688== by 0x400578: main (sys_access.c:6) access /dev/null 0 access NULL -1 which seems all ok to me. #include <stdio.h> #include <unistd.h> main() { printf("access /dev/null %d\n", access("/dev/null", 0)); printf("access NULL %d\n", access(NULL, 0)); }
Created attachment 104477 [details] Log of int main(){access(NULL,F_OK);return 0;}
Created attachment 104478 [details] Log of UltraStar Deluxe
Strange: effectively, the 2 traces are showing that the same syscall is done. Unclear what is happening at this stage : it looks like the address 0x0 is considered by valgrind as accessible and defined in the Pascal application. Can you do the following test: Use vgdb to connect GDB to your program running under Valgrind. Put a breakpoint just before the failing access syscall, and then in GDB, check the VA bits using: monitor get_vbits 0x0 10 If the 10 bytes at address 0x0 are considered as not accessible, this should give (gdb) monitor get_vbits 0x0 10 ________ ________ ____ Address 0x0 len 10 has 10 bytes unaddressable for both the C application and the pascal application. If the bytes are not addressable and then the access syscall still SEGV, then the mystery deepens further : we need then to debug the Valgrind code that handles the VA bits to check the ascii character string. But if the address 0x0 is considered as addressable, then the best is to put breakpoints at regular places in the pascal application, and see at what points Valgrind starts to consider this address as accessible. Maybe there are some client requests changing the addressibility ? (to find this, an laternative solution is to run with more trace enabled e.g. valgrind -v -v <your pascal app> |& grep request:
Created attachment 104511 [details] First half of the fix Part of the problem is that SNDRV_PCM_IOCTL_PREPARE was not handled correctly. There is a "break" missing in the post-syscall function. I have attached a patch. The other half of the problem is that DRM_IOCTL_VERSION will not write more than name_len/date_len/desc_len bytes into name/date/desc but always sets name_len/date_len/desc_len to the number of bytes that would have been needed. Libdrm first tests how much is needed by setting all fields to 0/NULL before it calls the ioctl a second time with allocated buffers. Is there any way to pass the input values of name_len/date_len/desc_len from PRE(sys_ioctl) to POST(sys_ioctl)?
(In reply to Daniel Glöckner from comment #5) > Created attachment 104511 [details] > First half of the fix > > Part of the problem is that SNDRV_PCM_IOCTL_PREPARE was not handled > correctly. > There is a "break" missing in the post-syscall function. I have attached a > patch. So, IIUC, due to this missing break, the code was falling through VKI_SNDRV_CTL_IOCTL_PVERSION: which was then setting sizeof(int) bytes as defined at ARG3 so, randomly setting some bytes as defined, and I suppose in your case at address 0x0. Thanks for the analysis and the patch, this fix was committed as revision 16266. > > The other half of the problem is that DRM_IOCTL_VERSION will not write more > than name_len/date_len/desc_len bytes into name/date/desc but always sets > name_len/date_len/desc_len to the number of bytes that would have been > needed. > Libdrm first tests how much is needed by setting all fields to 0/NULL before > it calls the ioctl a second time with allocated buffers. > > Is there any way to pass the input values of name_len/date_len/desc_len from > PRE(sys_ioctl) to POST(sys_ioctl)? No easy way that I know of. I suppose that for this second half, the problem is that the the resulting *_len causes again the bytes from 0x0 to be marked as defined. A quick/partially correct (for your case) fix is to avoid setting bytes as defined if name/date/desc are NULL. But that will not cover the cases where these are not NULL, but are too small : too many bytes will be marked as defined. As you suggest the proper fix is in the POST to see what were the initial buffer lengths given by the user, and only mark as defined either the effective lengths touched by the kernel (if <= to the user lengths), or else the user lengths. As far as I know, a possibility to do what you need is : * In the PRE, allocate a block of memory big enough to 1. copy the struct the user has given 2. store after this struct user the 3 input length and the user original struct address. * In the POST, copy back the kernel result to the user struct, using the kernel returned length and the user saved length to decide what to copy and what to set define. I guess you also need to restore the ARG3 value. Then free the allocated block. This is a hack somewhat similar to what is done in e.g. pselect6 and ppoll PRE and POST.
Once I also needed to preserve some value between PRE- and POST- syscall handler. I was able to stash it into an unused syscall argument. See handling of VKI_CRYPTO_GET_PROVIDER_LIST in coregrind/m_syswrap/syswrap-solaris.c.
Created attachment 104515 [details] Second half of the fix The drm-version.patch allocates temporary memory to pass the buffer sizes to POST(sys_ioctl). I chose not to abuse ARG4-6 to do this since the kernel appears to always preserve these registers on the three architectures that I checked. The patch adds a macro container_of to pub_tool_basics.h. Contrary to the definition inside the Linux kernel, this one does not use GCC extensions to force an error if the pointer does not match the type of the member. Shall the macro be renamed to VG_something?
Patch (slightly modified) committed as revision 16274. Thanks for the analysis and patch, please retest to double check the modifications I did are ok. Thanks Some notes: * We should have a standard way to pass some data between the PRE and POST wrapper: up to now, we are lucky that we can either replace a pointer syscall ARG by another value (like in this patch) or on some archs, for some syscalls, hijack some unused args (like what Ivo did) * I highly suspect we have a similar bug in VKI_DRM_IOCTL_GET_UNIQUE except that when looking in the kernel code, I think that when the user buffer is too small, that nothing is written for GET_UNIQUE. See http://lxr.free-electrons.com/source/drivers/gpu/drm/drm_ioctl.c?v=3.2#L52
Following Ivo's comment that ARG3 would leak if syscall fails, a followup commit is done in revision 16277 to avoid leak in case the syscall fails. Would be nice to further retest if that all works :)
Thank you, Philippe!