Bug 376956 - syswrap of SNDDRV and DRM_IOCTL_VERSION causing some addresses to be wrongly marked as addressable
Summary: syswrap of SNDDRV and DRM_IOCTL_VERSION causing some addresses to be wrongly ...
Status: RESOLVED FIXED
Alias: None
Product: valgrind
Classification: Developer tools
Component: memcheck (show other bugs)
Version: 3.13 SVN
Platform: Other Linux
: NOR normal
Target Milestone: ---
Assignee: Julian Seward
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-02-26 14:47 UTC by Daniel Glöckner
Modified: 2017-03-17 18:48 UTC (History)
2 users (show)

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:


Attachments
Log of int main(){access(NULL,F_OK);return 0;} (4.55 KB, text/plain)
2017-03-09 22:59 UTC, Daniel Glöckner
Details
Log of UltraStar Deluxe (319.96 KB, text/plain)
2017-03-09 23:00 UTC, Daniel Glöckner
Details
First half of the fix (482 bytes, patch)
2017-03-11 18:17 UTC, Daniel Glöckner
Details
Second half of the fix (3.86 KB, patch)
2017-03-12 17:25 UTC, Daniel Glöckner
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Daniel Glöckner 2017-02-26 14:47:13 UTC
Trying to run the current git version of UltraStar Deluxe compiled with Free Pascal 3.0.0 in Valgrind's Memcheck almost immediately leads to the following crash:

==29223== Memcheck, a memory error detector
==29223== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==29223== Using Valgrind-3.13.0.SVN and LibVEX; rerun with -h for copyright info
==29223== Command: ./game/ultrastardx
==29223== 
--29223-- VALGRIND INTERNAL ERROR: Valgrind received a signal 11 (SIGSEGV) - exiting
--29223-- si_code=1;  Faulting address: 0x0;  sp: 0x62fede04

valgrind: the 'impossible' happened:
   Killed by fatal signal

host stacktrace:
==29223==    at 0x380128A3: mc_is_defined_asciiz (mc_main.c:4235)
==29223==    by 0x380128A3: check_mem_is_defined_asciiz (mc_main.c:4313)

sched status:
  running_tid=1

Thread 1: status = VgTs_Runnable (lwpid 29223)
==29223==    at 0x804D668: SYSTEM_$$_FPSYSCALL$LONGINT$LONGINT$LONGINT$$LONGINT (in /home/daniel/USDX/game/ultrastardx)
==29223==    by 0x804DBBD: SYSTEM_$$_FPACCESS$PCHAR$LONGINT$$LONGINT (in /home/daniel/USDX/game/ultrastardx)
==29223==    by 0x8181621: SYSUTILS_$$_FILEEXISTS$RAWBYTESTRING$$BOOLEAN (in /home/daniel/USDX/game/ultrastardx)
==29223==    by 0x81AE517: INIFILES$_$TINIFILE_$__$$_CREATE$ANSISTRING$BOOLEAN$$TINIFILE (in /home/daniel/USDX/game/ultrastardx)
==29223==    by 0x81AF98A: INIFILES$_$TMEMINIFILE_$__$$_CREATE$ANSISTRING$BOOLEAN$$TMEMINIFILE (in /home/daniel/USDX/game/ultrastardx)

The crash happens when mc_is_defined_asciiz looks at the first byte of the pathname argument of the access syscall. For some reason the Free Pascal system library wants to call access with NULL, but that should not make Memcheck crash. The question is why get_vabits2(NULL) returns VA_BITS2_DEFINED. It does not crash on a C application that only does the access(NULL, F_OK) call.
Comment 1 Philippe Waroquiers 2017-03-09 16:48:04 UTC
There should be no difference of behaviour between Pascal runtime calling
access and a C application calling access.

Can you run your small C application under memheck with
   --trace-syscalls=yes
and give the relevant trace portion for the 'access' syscalls ?

Then similarly run the pascal application.
Thanks


On my side, running the below C code gives as trace:
  SYSCALL[6688,1](21) sys_access ( 0x400614(/dev/null), 0 )[sync] --> Success(0x0) 
  SYSCALL[6688,1](21) sys_access ( 0x0((null)), 0 )==6688== Syscall param access(pathname) points to unaddressable byte(s)
  ==6688==    at 0x4F0EC57: access (syscall-template.S:81)
  ==6688==    by 0x400578: main (sys_access.c:6)
  access /dev/null 0
  access NULL -1
which seems all ok to me.

#include <stdio.h>
#include <unistd.h>
main()
{
   printf("access /dev/null %d\n", access("/dev/null", 0));
   printf("access NULL %d\n", access(NULL, 0));
}
Comment 2 Daniel Glöckner 2017-03-09 22:59:36 UTC
Created attachment 104477 [details]
Log of int main(){access(NULL,F_OK);return 0;}
Comment 3 Daniel Glöckner 2017-03-09 23:00:35 UTC
Created attachment 104478 [details]
Log of UltraStar Deluxe
Comment 4 Philippe Waroquiers 2017-03-11 13:07:56 UTC
Strange: effectively, the 2 traces are showing that the same syscall is done.

Unclear what is happening at this stage : it looks like the address 0x0
is considered by valgrind as accessible and defined in the Pascal application.

Can you do the following test:
Use vgdb to connect GDB to your program running under Valgrind.
Put a breakpoint just before the failing access syscall, and then
in GDB, check the VA bits using:
monitor get_vbits 0x0 10

If the 10 bytes at address 0x0 are considered as not accessible, this should give
(gdb) monitor get_vbits 0x0 10
________ ________ ____
Address 0x0 len 10 has 10 bytes unaddressable

for both the C application and the pascal application.
If the bytes are not addressable and then the access syscall still SEGV,
then the mystery deepens further : we need then to debug the Valgrind code
that handles the VA bits to check the ascii character string.

But if the address 0x0 is considered as addressable, then the best
is to put breakpoints at regular places in the pascal application,
and see at what points Valgrind starts to consider this address as
accessible.
Maybe there are some client requests changing the addressibility ?
(to find this, an laternative solution is to run with more trace enabled e.g.
valgrind -v -v <your pascal app> |& grep request:
Comment 5 Daniel Glöckner 2017-03-11 18:17:08 UTC
Created attachment 104511 [details]
First half of the fix

Part of the problem is that SNDRV_PCM_IOCTL_PREPARE was not handled correctly.
There is a "break" missing in the post-syscall function. I have attached a patch.

The other half of the problem is that DRM_IOCTL_VERSION will not write more than name_len/date_len/desc_len bytes into name/date/desc but always sets name_len/date_len/desc_len to the number of bytes that would have been needed.
Libdrm first tests how much is needed by setting all fields to 0/NULL before it calls the ioctl a second time with allocated buffers.

Is there any way to pass the input values of name_len/date_len/desc_len from PRE(sys_ioctl) to POST(sys_ioctl)?
Comment 6 Philippe Waroquiers 2017-03-11 21:24:28 UTC
(In reply to Daniel Glöckner from comment #5)
> Created attachment 104511 [details]
> First half of the fix
> 
> Part of the problem is that SNDRV_PCM_IOCTL_PREPARE was not handled
> correctly.
> There is a "break" missing in the post-syscall function. I have attached a
> patch.
So, IIUC, due to this missing break, the code was falling through
VKI_SNDRV_CTL_IOCTL_PVERSION:
which was then setting sizeof(int) bytes as defined at ARG3 
so, randomly setting some bytes as defined, and I suppose in your case
at address 0x0.
Thanks for the analysis and the patch, this fix was committed as
revision 16266.



> 
> The other half of the problem is that DRM_IOCTL_VERSION will not write more
> than name_len/date_len/desc_len bytes into name/date/desc but always sets
> name_len/date_len/desc_len to the number of bytes that would have been
> needed.
> Libdrm first tests how much is needed by setting all fields to 0/NULL before
> it calls the ioctl a second time with allocated buffers.
> 
> Is there any way to pass the input values of name_len/date_len/desc_len from
> PRE(sys_ioctl) to POST(sys_ioctl)?
No easy way that I know of.

I suppose that for this second half, the problem is that the
the resulting *_len causes again the bytes from 0x0 to be marked
as defined.

A quick/partially correct (for your case) fix is to avoid setting bytes
as defined if name/date/desc are NULL.
But that will not cover the cases where these are not NULL, but are too
small : too many bytes will be marked as defined.

As you suggest the proper fix is in the POST to see what were the initial
buffer lengths given by the user, and only mark as defined either the effective
lengths touched by the kernel (if <= to the user lengths), or else the
user lengths.

As far as I know, a possibility to do what you need is :
* In the PRE, allocate a block of memory big enough to 
   1. copy the struct the user has given
   2. store after this struct user the 3 input length and the user
       original struct address.
* In the POST, copy back the kernel result to the user struct, using
  the kernel returned length and the user saved length to decide what to copy
  and what to set define.
  I guess you also need to restore the ARG3 value.
  Then free the allocated block.

This is a hack somewhat similar to what is done in e.g. pselect6
and ppoll PRE and POST.
Comment 7 Ivo Raisr 2017-03-12 05:49:41 UTC
Once I also needed to preserve some value between PRE- and POST- syscall handler.
I was able to stash it into an unused syscall argument.
See handling of VKI_CRYPTO_GET_PROVIDER_LIST in coregrind/m_syswrap/syswrap-solaris.c.
Comment 8 Daniel Glöckner 2017-03-12 17:25:02 UTC
Created attachment 104515 [details]
Second half of the fix

The drm-version.patch allocates temporary memory to pass the buffer sizes to POST(sys_ioctl). I chose not to abuse ARG4-6 to do this since the kernel appears to always preserve these registers on the three architectures that I checked.

The patch adds a macro container_of to pub_tool_basics.h. Contrary to the definition inside the Linux kernel, this one does not use GCC extensions to force an error if the pointer does not match the type of the member. Shall the macro be renamed to VG_something?
Comment 9 Philippe Waroquiers 2017-03-15 19:44:01 UTC
Patch (slightly modified) committed as revision  16274.
Thanks for the analysis and patch, please retest to
double check the modifications I did are ok.
Thanks

Some notes:
* We should have a standard way to pass some data between
  the PRE and POST wrapper: up to now, we are lucky that we can either
  replace a pointer syscall ARG by another value (like in this patch) or on some archs,
  for some syscalls, hijack some unused args (like what Ivo did)
* I highly suspect we have a similar bug in VKI_DRM_IOCTL_GET_UNIQUE
  except that when looking in the kernel code, I think that when the user
  buffer is too small, that nothing is written for GET_UNIQUE.
  See http://lxr.free-electrons.com/source/drivers/gpu/drm/drm_ioctl.c?v=3.2#L52
Comment 10 Philippe Waroquiers 2017-03-17 18:39:56 UTC
Following Ivo's comment that ARG3 would leak if syscall fails,
a followup commit is done in revision 16277 to avoid leak in
case the syscall fails.

Would be nice to further retest if that all works :)
Comment 11 Ivo Raisr 2017-03-17 18:48:11 UTC
Thank you, Philippe!