Bug 86641 - memcheck doesn't work with Mesa OpenGL/ATI on Suse 9.1
Summary: memcheck doesn't work with Mesa OpenGL/ATI on Suse 9.1
Status: RESOLVED DUPLICATE of bug 74298
Alias: None
Product: valgrind
Classification: Developer tools
Component: memcheck (other bugs)
Version First Reported In: 2.1.2
Platform: openSUSE Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Julian Seward
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2004-08-05 17:35 UTC by Tom Teixeira
Modified: 2004-08-05 22:04 UTC

See Also:
Latest Commit:
Version Fixed/Implemented In:
Sentry Crash Report:


Description Tom Teixeira 2004-08-05 17:35:10 UTC
Running Suse 9.1 with an ATI 9200, using the Suse drivers (DRI and Mesa) rather than the ATI drivers.

Using valgrind 2.1.2, created by building an rpm from the distribution.

Unable to use valgrind with our applications using OpenGL. Can be reproduced
with glxgears. Console output is shown below:

tjt@linux:~> valgrind --tool=memcheck  glxgears
==3680== Memcheck, a memory error detector for x86-linux.
==3680== Copyright (C) 2002-2004, and GNU GPL'd, by Julian Seward et al.
==3680== Using valgrind-2.1.2, a program supervision framework for x86-linux.
==3680== Copyright (C) 2000-2004, and GNU GPL'd, by Julian Seward et al.
==3680== For more details, rerun with: -v
==3680==
==3680== Warning: set address range perms: large range 134217728, a 0, v 0
==3680== Syscall param ioctl(generic) contains uninitialised or unaddressable byte(s)
==3680==    at 0x1BBB7A49: ioctl (in /lib/tls/libc.so.6)
==3680==    by 0x1BE8828B: r200InitDriver (in /usr/X11R6/lib/modules/dri/r200_dri.so)
==3680==    by 0x1BD5052D: __driUtilCreateScreen (in /usr/X11R6/lib/modules/dri/r200_dri.so)
==3680==    by 0x1BE87D95: __driCreateScreen (in /usr/X11R6/lib/modules/dri/r200_dri.so)
==3680==  Address 0x52BFDAA4 is on thread 1's stack
==3680==
==3680== Syscall param write(buf) contains uninitialised or unaddressable byte(s)
==3680==    at 0x1BBB0FE3: __write_nocancel (in /lib/tls/libc.so.6)
==3680==    by 0x1BA0F01D: _X11TransSocketWrite (in /usr/X11R6/lib/libX11.so.6.2)
==3680==    by 0x1BA0E5AE: _X11TransWrite (in /usr/X11R6/lib/libX11.so.6.2)
==3680==    by 0x1B9F1765: _XFlushInt (in /usr/X11R6/lib/libX11.so.6.2)
==3680==  Address 0x1BC2F254 is 2300 bytes inside a block of size 16384 alloc'd
==3680==    at 0x1B9057ED: calloc (vg_replace_malloc.c:176)
==3680==    by 0x1B9E0CC2: XOpenDisplay (in /usr/X11R6/lib/libX11.so.6.2)
==3680==    by 0x8049DDA: main (in /usr/X11R6/bin/glxgears)
==3680==
==3680== Invalid read of size 2
==3680==    at 0x1BE677A3: sigfpe_handler (in /usr/X11R6/lib/modules/dri/r200_dri.so)
==3680==    by 0x52BFEFFF: ???
==3680==    by 0x1BDE45CA: _math_init (in /usr/X11R6/lib/modules/dri/r200_dri.so)
==3680==    by 0x1BD65AD9: _mesa_initialize_context (in /usr/X11R6/lib/modules/dri/r200_dri.so)
==3680==  Address 0x6E is not stack'd, malloc'd or (recently) free'd
==3680==
==3680== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==3680==  Access not within mapped region at address 0x6E
==3680==    at 0x1BE677A3: sigfpe_handler (in /usr/X11R6/lib/modules/dri/r200_dri.so)
==3680==    by 0x52BFEFFF: ???
==3680==    by 0x1BDE45CA: _math_init (in /usr/X11R6/lib/modules/dri/r200_dri.so)
==3680==    by 0x1BD65AD9: _mesa_initialize_context (in /usr/X11R6/lib/modules/dri/r200_dri.so)
==3680==
==3680== ERROR SUMMARY: 3 errors from 3 contexts (suppressed: 29 from 2)
==3680== malloc/free: in use at exit: 414558 bytes in 62 blocks.
==3680== malloc/free: 259 allocs, 197 frees, 418369 bytes allocated.
==3680== For a detailed leak analysis,  rerun with: --leak-check=yes
==3680== For counts of detected errors, rerun with: -v
Segmentation fault
tjt@linux:~>
Comment 1 Tom Hughes 2004-08-05 17:49:59 UTC
This looks like a bug in the OpenGL DRI module rather than a bug in valgrind to me - what makes you think it is a bug in valgrind?
Comment 2 Tom Teixeira 2004-08-05 18:07:00 UTC
Because glxgears runs without a segmentation fault when it is run without valgrind, and I'm able to use valgrind on programs that don't use OpenGL.
Comment 3 Tom Hughes 2004-08-05 18:11:49 UTC
That doesn't actually prove anything - the environment under valgrind is radically different, so an uninitialised read that you might "get away with" normally may well cause a segmentation fault under valgrind.

The fact is that valgrind is reporting uninitialised memory reads, followed by a segmentation fault at the same address. That looks pretty clear to me - valgrind has done its job and found an error in the client program.

It's just possible that there's something about that ioctl that is confusing valgrind - the ioctl has been handled by the generic code, so if it has weird side effects, that might lead to problems later.
Comment 4 Tom Teixeira 2004-08-05 18:58:56 UTC
I thought the environment wasn't supposed to be radically different -- I have no problem with it reporting an uninitialized read, thereby uncovering a latent bug, but by altering the program flow at that point, it's hiding more interesting bugs.

I'll admit that "more interesting" means somewhere in my code as opposed to latent bugs in library code.
Comment 5 Tom Hughes 2004-08-05 19:33:32 UTC
It isn't a question of altering program flow - it's a question of what data you happen to get when you read from an uninitialised location. Among other differences, valgrind has a different malloc, which can easily change what data an uninitialised location contains, especially since the valgrind malloc tries to delay reusing memory in order to give better reports.
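The allocator effect described above can be illustrated with a toy model. In this sketch, `ToyHeap`, its `reuse_delay` parameter, and the 0xAA fill pattern are all hypothetical: one policy hands a freed block straight back to the next caller, the other parks freed blocks in a queue first, loosely in the spirit of an allocator that delays reuse. Neither is glibc's or valgrind's real algorithm. The same "uninitialised" read returns the previous owner's leftover value under one policy and an unrelated fill pattern under the other.

```python
from collections import deque


class ToyHeap:
    """Hypothetical fixed-size-block heap that never clears memory.

    reuse_delay=0 hands the most recently freed block straight back to the
    next malloc(). A larger reuse_delay keeps freed blocks in a FIFO queue
    until more than reuse_delay of them are waiting. Both policies are
    illustrative only, not glibc's or valgrind's real allocators.
    """

    def __init__(self, reuse_delay: int) -> None:
        self.reuse_delay = reuse_delay
        self.memory: list[list[int]] = []      # block contents, never zeroed
        self.quarantine: deque[int] = deque()  # freed block ids, oldest first
        self.ready: list[int] = []             # ids available for reuse

    def malloc(self) -> int:
        if self.ready:
            return self.ready.pop()            # reuse a block as-is
        self.memory.append([0xAA] * 4)         # fresh block: arbitrary pattern
        return len(self.memory) - 1

    def free(self, block: int) -> None:
        self.quarantine.append(block)
        # Release blocks for reuse only once the queue exceeds reuse_delay.
        while len(self.quarantine) > self.reuse_delay:
            self.ready.append(self.quarantine.popleft())


def uninitialised_read(reuse_delay: int) -> int:
    heap = ToyHeap(reuse_delay)
    old = heap.malloc()
    heap.memory[old][0] = 42     # a previous user leaves a value behind
    heap.free(old)
    new = heap.malloc()          # new block, never written by its new owner
    return heap.memory[new][0]   # the "uninitialised" read


if __name__ == "__main__":
    print(uninitialised_read(0))  # 42: stale data from the freed block
    print(uninitialised_read(8))  # 170 (0xAA): a different, fresh block
```

Under the immediate-reuse policy the program "gets away" with the read because the stale value happens to be plausible; with delayed reuse, the same read yields unrelated data.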

One other thing that just occurred to me (given that this is in a SIGFPE handler) is that it might be trying to look at the FP state in the signal context, which valgrind doesn't fill in at the moment. Do you know where to get the source for that DRI module?
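The fault address in the log is consistent with this guess. Assuming the i386 Linux layout of `struct _fpstate` from the kernel's `<asm/sigcontext.h>` of that era, the 16-bit `magic` field sits at offset 0x6E. If the handler reads that field through an FP-state pointer that is NULL because valgrind did not fill it in, the result would be exactly the "Invalid read of size 2" at address 0x6E shown in the log. The sketch below is a hypothetical replica of the structure's leading fields, built with Python's `ctypes` to compute that offset; that the Mesa handler reads `magic` at all is an assumption, since the module source is not quoted here.

```python
import ctypes

# Hypothetical replica of the leading part of the i386 Linux kernel's
# struct _fpstate (<asm/sigcontext.h>), assumed from headers of that era.
# On i386 "unsigned long" is 32 bits, so c_uint32 is used explicitly to get
# the same layout on any host. The FXSR area that follows "magic" is
# omitted because it does not affect the offsets computed here.


class FpReg(ctypes.Structure):
    # struct _fpreg: one 80-bit x87 register stored as five 16-bit words
    _fields_ = [
        ("significand", ctypes.c_uint16 * 4),
        ("exponent", ctypes.c_uint16),
    ]


class FpState(ctypes.Structure):
    _fields_ = [
        ("cw", ctypes.c_uint32),
        ("sw", ctypes.c_uint32),
        ("tag", ctypes.c_uint32),
        ("ipoff", ctypes.c_uint32),
        ("cssel", ctypes.c_uint32),
        ("dataoff", ctypes.c_uint32),
        ("datasel", ctypes.c_uint32),
        ("_st", FpReg * 8),
        ("status", ctypes.c_uint16),
        ("magic", ctypes.c_uint16),  # 0xffff = regular FPU data only
    ]


def faulting_address(fpstate_ptr: int, field: str) -> int:
    """Address touched when reading `field` through `fpstate_ptr`."""
    return fpstate_ptr + getattr(FpState, field).offset


if __name__ == "__main__":
    # With a NULL fpstate pointer, reading "magic" touches address 0x6e.
    print(hex(faulting_address(0, "magic")))
```

The fields before `magic` add up to 7 × 4 + 8 × 10 + 2 = 110 bytes, which is 0x6E.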
Comment 6 Tom Teixeira 2004-08-05 20:10:40 UTC
I'll go hunting around to see if I can find the exact source SuSE uses. The DRI source is otherwise at http://dri.sourceforge.net, but doesn't seem to identify any releases: just access to the CVS tree.

I think the code that causes the FPE is part of the Mesa source (http://www.mesa3d.org), and is related to exception handling for SSE.

FWIW, here is the backtrace I get when I run glxgears under GDB:

tjt@linux:~> gdb `which glxgears`
GNU gdb 6.1
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i586-suse-linux"...(no debugging symbols found)...Using host libthread_db library "/lib/tls/libthread_db.so.1".

(gdb) run
Starting program: /usr/X11R6/bin/glxgears
(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...[Thread debugging using libthread_db enabled]
[New Thread 1076954912 (LWP 16702)]
(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...
Program received signal SIGFPE, Arithmetic exception.
[Switching to Thread 1076954912 (LWP 16702)]
0x4044c8c4 in _mesa_test_os_sse_exception_support ()
   from /usr/X11R6/lib/modules/dri/r200_dri.so
(gdb) bt
#0  0x4044c8c4 in _mesa_test_os_sse_exception_support ()
   from /usr/X11R6/lib/modules/dri/r200_dri.so
#1  0x4044c725 in _mesa_init_all_x86_transform_asm ()
   from /usr/X11R6/lib/modules/dri/r200_dri.so
#2  0x403c95cb in _math_init () from /usr/X11R6/lib/modules/dri/r200_dri.so
#3  0x4034aada in _mesa_initialize_context ()
   from /usr/X11R6/lib/modules/dri/r200_dri.so
#4  0x4034b9f7 in _mesa_create_context ()
   from /usr/X11R6/lib/modules/dri/r200_dri.so
#5  0x40452556 in r200CreateContext ()
   from /usr/X11R6/lib/modules/dri/r200_dri.so
#6  0x40336044 in driCreateContext ()
   from /usr/X11R6/lib/modules/dri/r200_dri.so
#7  0x40091204 in CreateContext () from /usr/lib/libGL.so.1
#8  0x40091732 in glXCreateContext () from /usr/lib/libGL.so.1
#9  0x08049f6c in main ()
(gdb)
Comment 7 Tom Hughes 2004-08-05 22:04:14 UTC
OK. This is down to the signal handler trying to inspect and update the FPE state.

*** This bug has been marked as a duplicate of 74298 ***