Bug 69616 - glibc 2.3.2 w/NPTL is massively different than what valgrind expects
Status: RESOLVED FIXED
Alias: None
Product: valgrind
Classification: Developer tools
Component: general
Version: unspecified
Platform: Compiled Sources Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Julian Seward
Duplicates: 72614
Reported: 2003-12-04 04:25 UTC by Jesse
Modified: 2004-03-10 15:29 UTC (History)


Attachments
bad pthreadtypes.h file (3.91 KB, text/x-chdr)
2003-12-09 05:37 UTC, Jesse
this is the pthreadtypes.h that valgrind is expecting (4.67 KB, text/plain)
2004-02-09 14:24 UTC, Christopher Rude
attempted fix (43.07 KB, patch)
2004-02-10 14:17 UTC, Nicholas Nethercote
output of patch-applying script (28.11 KB, text/plain)
2004-02-11 11:24 UTC, Nicholas Nethercote
output of script failing (4.33 KB, text/plain)
2004-02-11 12:10 UTC, Christopher Rude
Patch to handle PaX-style /proc/pid/maps files (2.11 KB, patch)
2004-02-12 16:28 UTC, Nicholas Nethercote

Description Jesse 2003-12-04 04:25:33 UTC
Version:           cvs (using KDE Devel)
Installed from:    Compiled sources
Compiler:          gcc version 3.3.2 20031022 (Gentoo Linux 3.3.2-r2, propolice) 
OS:          Linux

This was discovered by a couple of Gentoo users and has been discussed here:
http://forums.gentoo.org/viewtopic.php?t=100949&highlight=
http://forums.gentoo.org/viewtopic.php?t=106884&highlight=

The gist of the matter is that with NPTL enabled, /usr/include/bits/pthreadtypes.h has vastly different contents from what valgrind expects.

A normal pthreadtypes.h file with non-NPTL glibc 2.3.2 (Debian unstable) looks like:
typedef struct
{
  int __m_reserved;
  int __m_count;
  _pthread_descr __m_owner;
  int __m_kind;
  struct _pthread_fastlock __m_lock;
} pthread_mutex_t;
...
while ours looks like:
typedef union
{
  struct
  {
    int __lock;
    unsigned int __count;
    int __owner;
    /* KIND must stay at this position in the structure to maintain binary compatibility.  */
    int __kind;
    unsigned int __nusers;
  } __data;
  char __size[__SIZEOF_PTHREAD_MUTEX_T];
  long int __align;
} pthread_mutex_t;

---------
You can obviously see why valgrind would then spit these errors during compilation:
vg_scheduler.c: In function `release_one_thread_waiting_on_mutex': 
 vg_scheduler.c:1986: error: union has no member named `__m_owner' 
 vg_scheduler.c:1991: error: union has no member named `__m_count' 
 vg_scheduler.c:1992: error: union has no member named `__m_owner' 
 vg_scheduler.c:1998: error: union has no member named `__m_owner' 
 vg_scheduler.c:1998: error: `_pthread_descr' undeclared (first use in this function) 
 vg_scheduler.c:1998: error: (Each undeclared identifier is reported only once 
 vg_scheduler.c:1998: error: for each function it appears in.) 
 vg_scheduler.c:1998: error: syntax error before "i" 
 vg_scheduler.c: In function `do_pthread_mutex_lock':
Comment 1 Jeremy Fitzhardinge 2003-12-09 05:23:12 UTC
We don't really care about the definition of struct pthread_mutex.  We stash some private info in there, so as long as the struct is big enough, we're happy.  We could just have a private vg_pthread_mutex_t, assert that sizeof(pthread_mutex_t) >= sizeof(vg_pthread_mutex_t), and ignore the pthread.h definition.

This still has the problem that if the program goes crazy, it can trash the contents of the mutex, which tends to make the Valgrind core crash.  The alternative is to just use the pthread_mutex_t * as a cookie, and keep our information hidden away within the core.  This could be a bit less efficient, but it is more reliable.
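The "cookie" alternative could be sketched roughly as follows. This is only an illustration under stated assumptions, not Valgrind's actual code: the names vg_mutex_info, vg_mutex_lookup and vg_mutex_note_lock, the fixed table size, and the open-addressing scheme are all hypothetical. The point is that the client's mutex is never read or written; only its address is used as a key.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; not taken from the Valgrind sources. */
#define VG_MUTEX_TABLE_SIZE 64   /* must be a power of two */

typedef struct {
    const void *cookie;   /* address of the client's mutex; NULL = free slot */
    int         owner_tid; /* thread holding the mutex, or -1 if none */
    unsigned    count;     /* lock count (0 = unlocked) */
} vg_mutex_info;

static vg_mutex_info vg_mutex_table[VG_MUTEX_TABLE_SIZE];

/* Hash the address; the low bits are dropped because mutexes are aligned. */
static size_t vg_hash_cookie(const void *cookie)
{
    return (size_t)(((uintptr_t)cookie >> 3) & (VG_MUTEX_TABLE_SIZE - 1));
}

/* Find the record for a mutex address, creating it if absent.
   Returns NULL for a NULL cookie or when the table is full. */
vg_mutex_info *vg_mutex_lookup(const void *cookie)
{
    if (cookie == NULL)
        return NULL;
    size_t start = vg_hash_cookie(cookie);
    for (size_t probe = 0; probe < VG_MUTEX_TABLE_SIZE; probe++) {
        vg_mutex_info *slot =
            &vg_mutex_table[(start + probe) & (VG_MUTEX_TABLE_SIZE - 1)];
        if (slot->cookie == cookie)
            return slot;
        if (slot->cookie == NULL) {       /* first free slot: claim it */
            slot->cookie = cookie;
            slot->owner_tid = -1;
            slot->count = 0;
            return slot;
        }
    }
    return NULL;
}

/* Record that thread `tid` acquired the mutex identified by `cookie`. */
void vg_mutex_note_lock(const void *cookie, int tid)
{
    vg_mutex_info *info = vg_mutex_lookup(cookie);
    assert(info != NULL);   /* a full table would be a core bug here */
    info->owner_tid = tid;
    info->count++;
}
```

The extra lookup on every mutex operation is the efficiency cost mentioned above; in exchange, a client that scribbles over its own mutex cannot corrupt the core's bookkeeping.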
Comment 2 Jesse 2003-12-09 05:35:24 UTC
pthread_mutex was just an example ... All the structures in pthreadtypes.h are changed.

Here are the full errors:
vg_scheduler.c: In function `release_one_thread_waiting_on_mutex':
vg_scheduler.c:1986: error: union has no member named `__m_owner'
vg_scheduler.c:1991: error: union has no member named `__m_count'
vg_scheduler.c:1992: error: union has no member named `__m_owner'
vg_scheduler.c:1998: error: union has no member named `__m_owner'
vg_scheduler.c:1998: error: `_pthread_descr' undeclared (first use in this function)
vg_scheduler.c:1998: error: (Each undeclared identifier is reported only once
vg_scheduler.c:1998: error: for each function it appears in.)
vg_scheduler.c:1998: error: syntax error before "i"
vg_scheduler.c: In function `do_pthread_mutex_lock':
vg_scheduler.c:2042: error: union has no member named `__m_kind'
vg_scheduler.c:2052: error: union has no member named `__m_count'
vg_scheduler.c:2061: error: union has no member named `__m_count'
vg_scheduler.c:2063: error: union has no member named `__m_owner'
vg_scheduler.c:2066: error: union has no member named `__m_owner'
vg_scheduler.c:2068: error: union has no member named `__m_kind'
vg_scheduler.c:2070: error: union has no member named `__m_count'
vg_scheduler.c:2074: error: union has no member named `__m_count'
vg_scheduler.c:2107: error: union has no member named `__m_owner'
vg_scheduler.c:2112: error: union has no member named `__m_count'
vg_scheduler.c:2113: error: union has no member named `__m_owner'
vg_scheduler.c:2113: error: `_pthread_descr' undeclared (first use in this function)
vg_scheduler.c:2113: error: syntax error before "tid"
vg_scheduler.c: In function `do_pthread_mutex_unlock':
vg_scheduler.c:2147: error: union has no member named `__m_kind'
vg_scheduler.c:2148: error: union has no member named `__m_kind'
vg_scheduler.c:2149: error: union has no member named `__m_owner'
vg_scheduler.c:2150: error: union has no member named `__m_owner'
vg_scheduler.c:2154: error: union has no member named `__m_kind'
vg_scheduler.c:2164: error: union has no member named `__m_count'
vg_scheduler.c:2174: error: union has no member named `__m_count'
vg_scheduler.c:2182: error: union has no member named `__m_owner'
vg_scheduler.c:2192: error: union has no member named `__m_count'
vg_scheduler.c:2193: error: union has no member named `__m_kind'
vg_scheduler.c:2194: error: union has no member named `__m_count'
vg_scheduler.c:2201: error: union has no member named `__m_count'
vg_scheduler.c:2202: error: union has no member named `__m_owner'
vg_scheduler.c: In function `do_pthread_cond_timedwait_TIMEOUT':
vg_scheduler.c:2256: error: union has no member named `__m_owner'
vg_scheduler.c:2258: error: union has no member named `__m_count'
vg_scheduler.c:2263: error: union has no member named `__m_owner'
vg_scheduler.c:2263: error: `_pthread_descr' undeclared (first use in this function)
vg_scheduler.c:2263: error: syntax error before "tid"
vg_scheduler.c:2264: error: union has no member named `__m_count'
vg_scheduler.c:2276: error: union has no member named `__m_count'
vg_scheduler.c: In function `release_N_threads_waiting_on_cond':
vg_scheduler.c:2326: error: union has no member named `__m_owner'
vg_scheduler.c:2328: error: union has no member named `__m_count'
vg_scheduler.c:2332: error: union has no member named `__m_owner'
vg_scheduler.c:2332: error: `_pthread_descr' undeclared (first use in this function)
vg_scheduler.c:2332: error: syntax error before "i"
vg_scheduler.c:2333: error: union has no member named `__m_count'
vg_scheduler.c:2346: error: union has no member named `__m_count'
vg_scheduler.c: In function `do_pthread_cond_wait':
vg_scheduler.c:2395: error: union has no member named `__m_kind'
vg_scheduler.c:2405: error: union has no member named `__m_count'
vg_scheduler.c:2415: error: union has no member named `__m_count'
vg_scheduler.c:2416: error: union has no member named `__m_owner'
vg_scheduler.c: In function `scheduler_sanity':
vg_scheduler.c:3238: error: union has no member named `__m_count'
vg_scheduler.c:3239: error: union has no member named `__m_owner'
vg_scheduler.c:3240: error: union has no member named `__m_owner'
Comment 3 Jesse 2003-12-09 05:37:01 UTC
Created attachment 3628
bad pthreadtypes.h file

Here's the resulting file that is now used in glibc.  Yes, all other programs
build fine and my system is stable, so I don't think it's an install issue.
Comment 4 Jeremy Fitzhardinge 2004-01-15 01:33:52 UTC
*** Bug 72614 has been marked as a duplicate of this bug. ***
Comment 5 Christopher Rude 2004-02-06 05:15:04 UTC
Any progress on fixing this problem?

Comment 6 Nicholas Nethercote 2004-02-07 21:34:47 UTC
No progress, as far as I know.

Is it clear exactly what caused the problem?  Reading the related threads, I'm
not sure where this libpthreads.h change was introduced -- is it a glibc thing?
A Gentoo thing?  A glibc-on-Gentoo thing?  Clarifications welcome.  Binary-incompatible changes to the pthreads data structures are not easy for us to deal with.

Also, it's worth pointing out that this bug currently has 105 votes associated with it;  the next most-voted-for bug only has 25.  (By default, Bugzilla doesn't show votes in the "show all bugs" list.)
Comment 7 Christopher Rude 2004-02-09 14:12:39 UTC
pthreadtypes.h is where the change is. I am currently running
glibc-2.3.3-pre20040207.

After a bit of digging I found that if I copy the pthreadtypes.h from the
linuxthreads includes, valgrind builds (of course it does not run :) I was
only testing a theory). So I guess that this is (at the moment) a Gentoo
problem, but when people update to glibc-2.3.3 it will be a much bigger
problem (yes, I know the price of being cutting edge), as it will be this way
in mainline glibc when using NPTL.

So the short answer is that it is not a specific Gentoo patch, but rather a
change in the way glibc will handle things.

I hope that this info is helpful in some way.

Good luck, and thanks
Chris

Comment 8 Christopher Rude 2004-02-09 14:24:54 UTC
Created attachment 4593
this is the pthreadtypes.h that valgrind is expecting

Just thought that I would include the pthreadtypes.h that valgrind will build
against. I got this from the linuxthreads includes in glibc; the "bad"
pthreadtypes.h is already posted, and comes from the NPTL includes in glibc.

Thanks again
Comment 9 Tom Hughes 2004-02-09 15:23:53 UTC
I think I can probably explain what's going on here...

The various pthread structures are binary compatible between linuxthreads and NPTL, but they are not source compatible. The structures are all the same size, and any members whose value is important for statically initialised items such as mutexes have the same meaning. Other members sometimes have different uses and, more importantly as far as valgrind is concerned, the members in the structures generally have different names in the header files.

On some systems, such as RedHat 9 and Fedora Core 1, where multiple builds of the C library are shipped, the standard pthread.h is still the linuxthreads one, although even if you build with that you can run against NPTL. The NPTL header is shipped in /usr/include/nptl as part of the nptl-devel package. Obviously some other distributions are shipping the NPTL version of the headers as the default.
Comment 10 Nicholas Nethercote 2004-02-09 22:28:43 UTC
Some more details...

Comment #2 is misleading;  several of the structures in pthreadtypes.h may have changed, but judging from vg_scheduler.c and the compiler errors, pthread_mutex_t is the only one that Valgrind uses, and thus it's the only one that matters.

Tom is right in comment #9 about the change being binary-compatible but source-incompatible.  The three fields of the old pthread_mutex_t that Valgrind uses are __m_count, __m_owner, and __m_kind.  In the new version, these fields are in the same place in the pthread_mutex_t, but have unfortunately been renamed to __count, __owner, __kind.  Secondly, the type of the __m_owner/__owner field has changed from _pthread_descr to int;  _pthread_descr has disappeared in the new version.
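The "same place" claim can be checked mechanically with offsetof. The sketch below uses local stand-in types rather than the real headers, so it compiles anywhere. The type names (old_mutex_layout, new_mutex_layout, old_pthread_descr) and the function mutex_layouts_agree are hypothetical, and _pthread_descr is modelled as a 32-bit integer on the assumption of 32-bit x86, where a pointer and an int have the same size. With 64-bit pointers the __m_kind/__kind offsets would no longer match, so this only illustrates the x86 case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the linuxthreads _pthread_descr (a pointer), modelled as
   32 bits to mimic 32-bit x86 even on a 64-bit host (assumption). */
typedef uint32_t old_pthread_descr;

/* Leading fields of the linuxthreads pthread_mutex_t; the trailing
   __m_lock member is omitted because Valgrind does not use it. */
typedef struct {
    int32_t           __m_reserved;
    int32_t           __m_count;
    old_pthread_descr __m_owner;
    int32_t           __m_kind;
} old_mutex_layout;

/* The __data view of the NPTL pthread_mutex_t; the __size and __align
   union members are omitted. */
typedef union {
    struct {
        int32_t  __lock;
        uint32_t __count;
        int32_t  __owner;
        int32_t  __kind;
        uint32_t __nusers;
    } __data;
} new_mutex_layout;

/* Returns 1 if the three fields Valgrind uses sit at the same offsets
   in both layouts, 0 otherwise. */
int mutex_layouts_agree(void)
{
    return offsetof(old_mutex_layout, __m_count)
               == offsetof(new_mutex_layout, __data.__count)
        && offsetof(old_mutex_layout, __m_owner)
               == offsetof(new_mutex_layout, __data.__owner)
        && offsetof(old_mutex_layout, __m_kind)
               == offsetof(new_mutex_layout, __data.__kind);
}
```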

The (to me) obvious workaround is to do an autoconf-time test to determine
which header is present, and define three macros for these fields, which expand
to either __m_foo or __foo, as appropriate;  also a macro equating
_pthread_descr to int, if necessary (_pthread_descr is only used by Valgrind
in casts, fortunately).  Not very elegant, but fairly simple.  The other
workaround is to do as Jeremy suggested in comment #1 and remove our use of
pthread_mutex_t altogether.  This is probably more robust for the long-term.

Anyway, could one or more of the people affected by this bug please do a global
search-and-replace like this in vg_scheduler.c:

   s/__m_count/__count/
   s/__m_owner/__owner/
   s/__m_kind/__kind/
   s/_pthread_descr/int/

and see if that fixes the problem?  Thanks.
Comment 11 Christopher Rude 2004-02-09 22:53:14 UTC
I made the replacements you asked for, unfortunately it still does not build.

output is as follows:

Making all in demangle
make[1]: Entering directory `/var/tmp/portage/valgrind-2.1.0/work/valgrind-2.1.0/coregrind/demangle'
make[1]: Nothing to be done for `all'.
make[1]: Leaving directory `/var/tmp/portage/valgrind-2.1.0/work/valgrind-2.1.0/coregrind/demangle'
Making all in .
make[1]: Entering directory `/var/tmp/portage/valgrind-2.1.0/work/valgrind-2.1.0/coregrind'
if gcc -DHAVE_CONFIG_H -I. -I. -I..  -I./demangle -I../include -DVG_LIBDIR="\"/usr/local/lib"\"   -Winline -Wall -Wshadow -O -fno-omit-frame-pointer -mpreferred-stack-boundary=2 -g -fpic  -MT vg_scheduler.o -MD -MP -MF ".deps/vg_scheduler.Tpo" \
  -c -o vg_scheduler.o `test -f 'vg_scheduler.c' || echo './'`vg_scheduler.c; \
then mv -f ".deps/vg_scheduler.Tpo" ".deps/vg_scheduler.Po"; \
else rm -f ".deps/vg_scheduler.Tpo"; exit 1; \
fi
vg_scheduler.c: In function `release_one_thread_waiting_on_mutex':
vg_scheduler.c:1986: error: union has no member named `__owner'
vg_scheduler.c:1991: error: union has no member named `__count'
vg_scheduler.c:1992: error: union has no member named `__owner'
vg_scheduler.c:1998: error: union has no member named `__owner'
vg_scheduler.c: In function `do_pthread_mutex_lock':
vg_scheduler.c:2042: error: union has no member named `__kind'
vg_scheduler.c:2052: error: union has no member named `__count'
vg_scheduler.c:2061: error: union has no member named `__count'
vg_scheduler.c:2063: error: union has no member named `__owner'
vg_scheduler.c:2066: error: union has no member named `__owner'
vg_scheduler.c:2068: error: union has no member named `__kind'
vg_scheduler.c:2070: error: union has no member named `__count'
vg_scheduler.c:2074: error: union has no member named `__count'
vg_scheduler.c:2107: error: union has no member named `__owner'
vg_scheduler.c:2112: error: union has no member named `__count'
vg_scheduler.c:2113: error: union has no member named `__owner'
vg_scheduler.c: In function `do_pthread_mutex_unlock':
vg_scheduler.c:2147: error: union has no member named `__kind'
vg_scheduler.c:2148: error: union has no member named `__kind'
vg_scheduler.c:2149: error: union has no member named `__owner'
vg_scheduler.c:2150: error: union has no member named `__owner'
vg_scheduler.c:2154: error: union has no member named `__kind'
vg_scheduler.c:2164: error: union has no member named `__count'
vg_scheduler.c:2174: error: union has no member named `__count'
vg_scheduler.c:2182: error: union has no member named `__owner'
vg_scheduler.c:2192: error: union has no member named `__count'
vg_scheduler.c:2193: error: union has no member named `__kind'
vg_scheduler.c:2194: error: union has no member named `__count'
vg_scheduler.c:2201: error: union has no member named `__count'
vg_scheduler.c:2202: error: union has no member named `__owner'
vg_scheduler.c: In function `do_pthread_cond_timedwait_TIMEOUT':
vg_scheduler.c:2256: error: union has no member named `__owner'
vg_scheduler.c:2258: error: union has no member named `__count'
vg_scheduler.c:2263: error: union has no member named `__owner'
vg_scheduler.c:2264: error: union has no member named `__count'
vg_scheduler.c:2276: error: union has no member named `__count'
vg_scheduler.c: In function `release_N_threads_waiting_on_cond':
vg_scheduler.c:2326: error: union has no member named `__owner'
vg_scheduler.c:2328: error: union has no member named `__count'
vg_scheduler.c:2332: error: union has no member named `__owner'
vg_scheduler.c:2333: error: union has no member named `__count'
vg_scheduler.c:2346: error: union has no member named `__count'
vg_scheduler.c: In function `do_pthread_cond_wait':
vg_scheduler.c:2395: error: union has no member named `__kind'
vg_scheduler.c:2405: error: union has no member named `__count'
vg_scheduler.c:2415: error: union has no member named `__count'
vg_scheduler.c:2416: error: union has no member named `__owner'
vg_scheduler.c: In function `scheduler_sanity':
vg_scheduler.c:3238: error: union has no member named `__count'
vg_scheduler.c:3239: error: union has no member named `__owner'
vg_scheduler.c:3240: error: union has no member named `__owner'
make[1]: *** [vg_scheduler.o] Error 1
make[1]: Leaving directory `/var/tmp/portage/valgrind-2.1.0/work/valgrind-2.1.0/coregrind'
make: *** [all-recursive] Error 1
Comment 12 Nicholas Nethercote 2004-02-09 23:04:12 UTC
Ah, what about this:

s/__m_count/__data.__count/
s/__m_owner/__data.__owner/
s/__m_kind/__data.__kind/
s/_pthread_descr/int/ 

or something similar?  (I forgot that pthread_mutex_t is a union in the new version.)  If that doesn't work, can you fiddle with similar things until you find  something that does?
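Instead of a one-off search-and-replace, the same substitutions could be expressed as accessor macros chosen at configure time, so one source file builds against either header. The following is a rough sketch only: VG_HAVE_NPTL_PTHREADTYPES is a hypothetical configure-generated symbol, the VG_MUTEX_* macros and vg_mock_* names are invented for illustration, and the mutex types are local mocks of the two layouts quoted in the description rather than the real pthread_mutex_t.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical switch that a configure-time header test would set. */
#define VG_HAVE_NPTL_PTHREADTYPES 1

#if VG_HAVE_NPTL_PTHREADTYPES
/* Mock of the NPTL layout: fields live inside the __data member. */
typedef union {
    struct {
        int          __lock;
        unsigned int __count;
        int          __owner;
        int          __kind;
        unsigned int __nusers;
    } __data;
    long int __align;
} vg_mock_mutex_t;
typedef int vg_mock_owner_t;            /* _pthread_descr is gone: use int */
#define VG_MUTEX_COUNT(m) ((m)->__data.__count)
#define VG_MUTEX_OWNER(m) ((m)->__data.__owner)
#define VG_MUTEX_KIND(m)  ((m)->__data.__kind)
#else
/* Mock of the linuxthreads layout (trailing __m_lock omitted). */
typedef struct {
    int   __m_reserved;
    int   __m_count;
    void *__m_owner;                    /* stands in for _pthread_descr */
    int   __m_kind;
} vg_mock_mutex_t;
typedef void *vg_mock_owner_t;
#define VG_MUTEX_COUNT(m) ((m)->__m_count)
#define VG_MUTEX_OWNER(m) ((m)->__m_owner)
#define VG_MUTEX_KIND(m)  ((m)->__m_kind)
#endif

/* Scheduler-style code written once against the macros. */
void vg_mock_mutex_acquire(vg_mock_mutex_t *mutex, int tid)
{
    assert(VG_MUTEX_COUNT(mutex) == 0);          /* must be unlocked */
    VG_MUTEX_OWNER(mutex) = (vg_mock_owner_t)(intptr_t)tid;
    VG_MUTEX_COUNT(mutex) = 1;
}

/* Read the owner back as a thread id, whichever layout is in use. */
int vg_mock_mutex_owner(vg_mock_mutex_t *mutex)
{
    return (int)(intptr_t)VG_MUTEX_OWNER(mutex);
}
```

Flipping VG_HAVE_NPTL_PTHREADTYPES to 0 selects the old field names without touching the two functions, which is the property the macro approach is meant to provide.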
Comment 13 Christopher Rude 2004-02-10 00:13:12 UTC
Forward progress :) vg_scheduler.c now compiles.

now we run into a related problem with vg_libpthread.c

output is as follows:

gcc  -Winline -Wall -Wshadow -O -fno-omit-frame-pointer -mpreferred-stack-boundary=2 -g -fpic    -o valgrinq.so -shared vg_valgrinq_dummy.o
if gcc -DHAVE_CONFIG_H -I. -I. -I..  -I./demangle -I../include -DVG_LIBDIR="\"/usr/local/lib"\"   -Winline -Wall -Wshadow -O -fno-omit-frame-pointer -mpreferred-stack-boundary=2 -g -fpic -fno-omit-frame-pointer -MT vg_libpthread.o -MD -MP -MF ".deps/vg_libpthread.Tpo" \
  -c -o vg_libpthread.o `test -f 'vg_libpthread.c' || echo './'`vg_libpthread.c; \
then mv -f ".deps/vg_libpthread.Tpo" ".deps/vg_libpthread.Po"; \
else rm -f ".deps/vg_libpthread.Tpo"; exit 1; \
fi
vg_libpthread.c: In function `pthread_attr_init':
vg_libpthread.c:309: error: union has no member named `__detachstate'
vg_libpthread.c:312: error: union has no member named `__guardsize'
vg_libpthread.c: In function `pthread_attr_setdetachstate':
vg_libpthread.c:323: error: union has no member named `__detachstate'
vg_libpthread.c: In function `pthread_attr_getdetachstate':
vg_libpthread.c:329: error: union has no member named `__detachstate'
vg_libpthread.c: In function `pthread_getattr_np':
vg_libpthread.c:423: error: union has no member named `__detachstate'
vg_libpthread.c:424: error: union has no member named `__schedpolicy'
vg_libpthread.c:425: error: union has no member named `__schedparam'
vg_libpthread.c:426: error: union has no member named `__inheritsched'
vg_libpthread.c:427: error: union has no member named `__scope'
vg_libpthread.c:428: error: union has no member named `__guardsize'
vg_libpthread.c:429: error: union has no member named `__stackaddr'
vg_libpthread.c:430: error: union has no member named `__stackaddr_set'
vg_libpthread.c:431: error: union has no member named `__stacksize'
vg_libpthread.c:437: error: union has no member named `__detachstate'
vg_libpthread.c: In function `pthread_attr_setschedpolicy':
vg_libpthread.c:472: error: union has no member named `__schedpolicy'
vg_libpthread.c: In function `pthread_attr_getschedpolicy':
vg_libpthread.c:478: error: union has no member named `__schedpolicy'
vg_libpthread.c: In function `pthread_attr_getguardsize':
vg_libpthread.c:505: error: union has no member named `__guardsize'
vg_libpthread.c: In function `pthread_create':
vg_libpthread.c:715: error: union has no member named `__detachstate'
vg_libpthread.c: In function `__pthread_mutexattr_init':
vg_libpthread.c:877: error: union has no member named `__mutexkind'
vg_libpthread.c: In function `__pthread_mutexattr_settype':
vg_libpthread.c:893: error: union has no member named `__mutexkind'
vg_libpthread.c: In function `__pthread_mutex_init':
vg_libpthread.c:927: error: union has no member named `__m_count'
vg_libpthread.c:928: error: union has no member named `__m_owner'
vg_libpthread.c:928: error: `_pthread_descr' undeclared (first use in this function)
vg_libpthread.c:928: error: (Each undeclared identifier is reported only once
vg_libpthread.c:928: error: for each function it appears in.)
vg_libpthread.c:929: error: union has no member named `__m_kind'
vg_libpthread.c:931: error: union has no member named `__m_kind'
vg_libpthread.c:931: error: union has no member named `__mutexkind'
vg_libpthread.c: In function `__pthread_mutex_lock':
vg_libpthread.c:949: error: union has no member named `__m_owner'
vg_libpthread.c:949: error: `_pthread_descr' undeclared (first use in this function)
vg_libpthread.c:949: error: syntax error before numeric constant
vg_libpthread.c:950: error: union has no member named `__m_count'
vg_libpthread.c:951: error: union has no member named `__m_kind'
vg_libpthread.c: In function `__pthread_mutex_trylock':
vg_libpthread.c:970: error: union has no member named `__m_owner'
vg_libpthread.c:970: error: `_pthread_descr' undeclared (first use in this function)
vg_libpthread.c:970: error: syntax error before numeric constant
vg_libpthread.c:971: error: union has no member named `__m_count'
vg_libpthread.c:972: error: union has no member named `__m_kind'
vg_libpthread.c: In function `__pthread_mutex_unlock':
vg_libpthread.c:991: error: union has no member named `__m_owner'
vg_libpthread.c:992: error: union has no member named `__m_count'
vg_libpthread.c:993: error: union has no member named `__m_kind'
vg_libpthread.c: In function `__pthread_mutex_destroy':
vg_libpthread.c:1003: error: union has no member named `__m_count'
vg_libpthread.c:1012: error: union has no member named `__m_count'
vg_libpthread.c:1013: error: union has no member named `__m_owner'
vg_libpthread.c:1013: error: `_pthread_descr' undeclared (first use in this function)
vg_libpthread.c:1014: error: union has no member named `__m_kind'
vg_libpthread.c: In function `pthread_cond_init':
vg_libpthread.c:1038: error: union has no member named `__c_waiting'
vg_libpthread.c:1038: error: `_pthread_descr' undeclared (first use in this function)
vg_libpthread.c: In function `rw_remap':
vg_libpthread.c:2499: error: union has no member named `__rw_readers'
vg_libpthread.c:2500: error: union has no member named `__rw_readers'
vg_libpthread.c:2502: error: union has no member named `__rw_kind'
vg_libpthread.c: In function `pthread_rwlock_init':
vg_libpthread.c:2516: error: union has no member named `__rw_readers'
vg_libpthread.c:2518: error: union has no member named `__rw_kind'
vg_libpthread.c:2520: error: union has no member named `__rw_kind'
vg_libpthread.c:2520: error: union has no member named `__lockkind'
vg_libpthread.c: In function `pthread_rwlockattr_init':
vg_libpthread.c:2770: error: union has no member named `__lockkind'
vg_libpthread.c:2771: error: union has no member named `__pshared'
vg_libpthread.c: In function `pthread_rwlockattr_setpshared':
vg_libpthread.c:2794: error: union has no member named `__pshared'
make[1]: *** [vg_libpthread.o] Error 1
make[1]: Leaving directory `/var/tmp/portage/valgrind-2.1.0/work/valgrind-2.1.0/coregrind'
make: *** [all-recursive] Error 1
Comment 14 Nicholas Nethercote 2004-02-10 14:07:16 UTC
Erk, that makes things more difficult.

The following types have changed in the new pthreadtypes.h in a way that affects
Valgrind:

  pthread_mutex_t
  pthread_attr_t
  pthread_mutexattr_t
  pthread_rwlock_t
  pthread_rwlockattr_t

pthread_mutex_t is easy, because the layout is the same, at least w.r.t. the
fields Valgrind uses.  The others are more difficult.  E.g. pthread_attr_t
is defined in the new version in an opaque way that just says its size
is >= the size of the old pthread_attr_t struct.  The fields are not exposed
in pthreadtypes.h, but are handled internally:  pthread_attr_t (and similar
types) is cast to the internal type before use.

I guess we will have to do the same thing in vg_libpthread.c and vg_scheduler.c,
ie. have our own versions of the troublesome types, and convert to them.  I have
a patch that attempts this;  it seems to work on my machine (passes all regtests as expected) which has the old pthreadtypes.h.  I don't know if I have successfully removed all the code that causes problems with the new pthreadtypes.h, because it's difficult for me to test with that.  I will attach it.
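The general shape of "have our own versions of the troublesome types, and convert to them" might look like the sketch below. It is not the attached patch. The opaque mock_pthread_attr_t (including its 36-byte size), the private vg_pthread_attr_t, and every function name here are assumptions made for illustration. The conversion copies bytes with memcpy rather than casting pointers, which is a deliberate substitution to sidestep aliasing concerns in a standalone example.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

enum { VG_ATTR_JOINABLE = 0, VG_ATTR_DETACHED = 1 };

/* Mock of an opaque public type in the NPTL style: only a size and an
   alignment are exposed.  The size 36 is an illustrative assumption. */
typedef union {
    char     __size[36];
    long int __align;
} mock_pthread_attr_t;

/* Hypothetical private view holding the fields we care about. */
typedef struct {
    int  vg_detachstate;
    long vg_guardsize;
    long vg_stacksize;
} vg_pthread_attr_t;

_Static_assert(sizeof(vg_pthread_attr_t) <= sizeof(mock_pthread_attr_t),
               "private attr type must fit inside the opaque one");

/* Convert between the opaque storage and the private view. */
static vg_pthread_attr_t vg_attr_load(const mock_pthread_attr_t *attr)
{
    vg_pthread_attr_t view;
    memcpy(&view, attr->__size, sizeof view);
    return view;
}

static void vg_attr_store(mock_pthread_attr_t *attr,
                          const vg_pthread_attr_t *view)
{
    memcpy(attr->__size, view, sizeof *view);
}

int mock_attr_init(mock_pthread_attr_t *attr)
{
    vg_pthread_attr_t view = { VG_ATTR_JOINABLE, 0, 0 };
    assert(attr != NULL);
    vg_attr_store(attr, &view);
    return 0;
}

int mock_attr_setdetachstate(mock_pthread_attr_t *attr, int state)
{
    if (state != VG_ATTR_JOINABLE && state != VG_ATTR_DETACHED)
        return EINVAL;
    vg_pthread_attr_t view = vg_attr_load(attr);
    view.vg_detachstate = state;
    vg_attr_store(attr, &view);
    return 0;
}

int mock_attr_getdetachstate(const mock_pthread_attr_t *attr, int *state)
{
    vg_pthread_attr_t view = vg_attr_load(attr);
    *state = view.vg_detachstate;
    return 0;
}
```

Because nothing here names a field of the public type, the code compiles the same way whichever pthreadtypes.h is installed; only the size assertion depends on the header.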
Comment 15 Nicholas Nethercote 2004-02-10 14:17:28 UTC
Created attachment 4618
attempted fix

2nd attempt at attaching...
Comment 16 Andrew Mahone 2004-02-10 21:22:36 UTC
I've been watching this bug for a while, and I'd like to try the patch, but it doesn't apply for me against 2.0.0, 2.1.0, or a fresh CVS checkout.  What source was this patch made against?
Comment 17 Nicholas Nethercote 2004-02-10 23:09:44 UTC
It's against the current CVS HEAD.  I just tested it myself on a fresh checkout, it worked with:

patch -p0 < pth.patch
Comment 18 Christopher Rude 2004-02-10 23:30:32 UTC
Hmmm... I just checked out valgrind from CVS (I am not that experienced with CVS) and the patch does not apply for me either.

When I checked out valgrind I used this command set. Am I doing something wrong?

export CVSROOT=:pserver:anonymous@anoncvs.kde.org:/home/kde
cvs login

cvs co valgrind

thanks again, and I apologize for my ignorance
Comment 19 Nicholas Nethercote 2004-02-10 23:48:06 UTC
How are you obtaining the patch?  Cutting and pasting the text from your screen
won't work, because tabs and spaces get mixed up.  You need to choose "save link as" or similar option in your browser on the "attempted fix" link below.
Comment 20 Christopher Rude 2004-02-11 00:04:45 UTC
I have used Mozilla and Konqueror, always selecting the "save link as" option. With both browsers I get the patch, which appears to be correctly formatted, but hunks fail to apply. I have tried against clean extracts of 2.0.0 and 2.1.0 and a fresh checkout of CVS, using the method I described in my last post.
Comment 21 Andrew Mahone 2004-02-11 05:33:34 UTC
I used save as in konqueror and tried wget as well, all hunks of this patch fail for me against current CVS.  Have you tried downloading it from bugzilla yourself?  Maybe the attachment doesn't match your local copy of the patch.
Comment 22 Nicholas Nethercote 2004-02-11 11:23:05 UTC
I've tried it using anonymous CVS, and the patch from the website.  I just
tried it again, and it worked fine for me.  I've packaged the commands I
used into the following script:

  #! /bin/sh

  set -x

  export CVSROOT=:pserver:anonymous@anoncvs.kde.org:/home/kde
  cvs login
  cvs co valgrind
  cd valgrind
  wget bugs.kde.org/attachment.cgi?id=4618&action=view
  sleep 5     # hack: pause for wget to complete
  patch -p0 < attachment.cgi\?id\=4618

I will attach the output I get when I run this in case it helps.  Has
anyone else successfully applied the patch?

If you look at the .orig and .rej files produced when patch doesn't work,
you can usually work out what went wrong -- is looking at them
informative?

Comment 23 Nicholas Nethercote 2004-02-11 11:24:02 UTC
Created attachment 4634
output of patch-applying script

Script I used to successfully apply the patch.
Comment 24 Christopher Rude 2004-02-11 12:10:54 UTC
Created attachment 4635
output of script failing

OK... so I wonder what is going on at my end :) I copied your script and
ran it. No luck... I believe that every hunk is failing, and I am not sure why.

I just finished patching the CVS checkout by hand, but I have one problem:
neither Makefile.cvs nor configure is present.

Feel free to shoot me now, as I am at a complete loss. I have built things
from CVS before. Is there something I am missing? :(
Comment 25 Nicholas Nethercote 2004-02-11 12:15:29 UTC
I don't know anything about Makefile.cvs, but instructions for installing
from CVS are in the README.  Basically, you have to run "autogen.sh"
before doing ./configure.

Comment 26 Andrew Mahone 2004-02-11 12:26:01 UTC
Same here, I made a few tweaks to the script (the & in the URL is why you had to sleep on wget), and I get the same result - every hunk of the patch fails.  I use diffutils regularly without incident, I have no idea what's going on here :-/
Comment 27 Nicholas Nethercote 2004-02-11 12:34:55 UTC
I'm using patch 2.5.4.  Have you looked at the .orig and .rej files to
determine what's causing the failure?  I get the "(Stripping trailing CRs
from patch.)" messages, might your version of patch not handle trailing
CRs?

Comment 28 Sami Nieminen 2004-02-11 12:44:35 UTC
I had the same problem with all chunks failing. I succeeded in applying the patch after running "dos2unix file.diff" to convert the linebreaks to unix format.
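For anyone without dos2unix to hand, the conversion it performs on this patch amounts to dropping the CR from each CRLF pair. A minimal sketch of that step follows; strip_crlf is a hypothetical helper, not part of any tool mentioned here, and it leaves a lone CR untouched on the assumption that only CRLF line endings are the problem.

```c
#include <assert.h>
#include <stddef.h>

/* Convert CRLF line endings to LF in place and return the new length.
   A CR that is not immediately followed by LF is kept as-is. */
size_t strip_crlf(char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    size_t out = 0;
    for (size_t in = 0; in < len; in++) {
        if (buf[in] == '\r' && in + 1 < len && buf[in + 1] == '\n')
            continue;               /* drop the CR; the LF is copied next */
        buf[out++] = buf[in];
    }
    return out;
}
```

The buffer is not NUL-terminated by this function; a caller that treats it as a string would write the terminator at the returned length.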
Comment 29 Christopher Rude 2004-02-11 12:45:52 UTC
OK... moving in the right direction. My lack of sleep made me forget about needing to set which automake to use; that is why autogen wasn't working and why I was confused about Makefile.cvs (distant memories of days long ago). I had read the README, but because it didn't work I played dumb...

So the good news is that the CVS checkout that I hand-applied the patch to compiles cleanly :)

I don't know if anything useful was created, and I am probably going to just go to bed, as it has been a long day.

Thanks for your patience with me :)
Comment 30 Dave Henry 2004-02-11 13:17:43 UTC
Patch applies cleanly with dos2unix for me too. Valgrind now compiles fine and runs without any noticeable issues in the few short minutes that I used it.
Comment 31 Andrew Mahone 2004-02-11 22:33:00 UTC
Hrm, yes, dos2unix fixes the patch for me as well.  Regtest has 118/119 stderr failures and 47/119 stdout failures.  It builds, though, which is much farther than I've gotten before.  I'm going to play around a bit more and see if it seems to work properly.
Comment 32 Christopher Rude 2004-02-11 22:49:18 UTC
Alright... I added dos2unix to my helper script. The patch applies and valgrind compiles, but I am not sure anything useful is happening.

using the example in the howto

#valgrind ps -ax

I just get the usage text for valgrind.

although passing --tool=<anytool> works as I would expect

Chris :)
Comment 33 Andrew Mahone 2004-02-11 22:59:29 UTC
I have the same issue of getting usage if I don't specify a tool, but it also doesn't work if I specify a tool.  I get this output (this is for cachegrind, but it's similar for other tools):

ash-2.05b$ valgrind --tool=cachegrind ps -ax
==2499== Cachegrind, an I1/D1/L2 cache profiler for x86-linux.
==2499== Copyright (C) 2002-2004, and GNU GPL'd, by Nicholas Nethercote.
==2499== Using valgrind-2.1.0, a program supervision framework for x86-linux.
==2499== Copyright (C) 2000-2004, and GNU GPL'd, by Julian Seward.
==2499== FATAL: syntax error reading /proc/self/maps
last 50 chars: `08048000-08058000 R'

My guess would be it also doesn't like my kernel, but that'd be another bug entirely :-)
Comment 34 Nicholas Nethercote 2004-02-11 23:31:20 UTC
The usage message is because in the CVS HEAD, we've changed things so that there is no longer a default tool (although you can choose your own in a .valgrindrc file;  read the docs for more info).

As for the FATAL error:  "last 50 chars: `08048000-08058000 R", the 'R' is the problem -- normally the permissions look like 'rwxp', possibly with some of the letters replaced by a '-';  so yes, it does look like a kernel issue.  Can you do "cat /proc/self/maps" and attach the output?  Thanks.

Comment 35 Andrew Mahone 2004-02-12 01:27:26 UTC
This is definitely different from what it's expecting.  I'm using steel300's love-sources, which is based on 2.6 and includes PaX (a security patch which provides enforced no-exec pages, among other things).  I'm guessing one or the other of those is responsible for the difference.  Here's the output:

08048000-0804c000 R+Xp 00000000 fe:00 7213906    /bin/cat
0804c000-0804d000 RW+p 00003000 fe:00 7213906    /bin/cat
0804d000-08079000 RWXp 00000000 00:00 0
40000000-40016000 R+Xp 00000000 fe:00 7964329    /lib/ld-2.3.3.so
40016000-40017000 RW+p 00015000 fe:00 7964329    /lib/ld-2.3.3.so
40017000-40018000 RW+p 00000000 00:00 0
40032000-40164000 R+Xp 00000000 fe:00 7964186    /lib/libc-2.3.3.so
40164000-40167000 RW+p 00131000 fe:00 7964186    /lib/libc-2.3.3.so
40167000-4016b000 RW+p 00000000 00:00 0
bfffe000-c0000000 RWXp fffff000 00:00 0
ffffe000-fffff000 ---p 00000000 00:00 0
Comment 36 Nicholas Nethercote 2004-02-12 16:28:53 UTC
Created attachment 4655 [details]
Patch to handle PaX-style /proc/pid/maps files

Ok, I've discovered it's due to PaX, which is included in "Hardened Gentoo".

PaX introduces extra controls over memory mappings.  Judging from the patch at
pax.grsecurity.net/pax-linux-2.6.2-200402070035.patch, and the docs at
pax.grsecurity.net/docs/mprotect.txt, the following letters are used:

  R == allowed to be readable, and currently readable
  + == allowed to be readable, and currently not
  r == not allowed to be readable, and currently readable (impossible?)
  - == not allowed to be readable, and currently not

And similarly for W/w and X/x.  p is left untouched.

I think Valgrind can simply be modified to equate 'R' with 'r', 'W' with 'w',
'X' with 'x', and '+' with '-' in /proc/pid/maps.  The attached patch does just
this.
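
The proposed equivalence can be sketched in a few lines of Python. This is only an illustration of the mapping described in this comment, not the contents of the attached patch (which modifies Valgrind's C code); the names PAX_TO_STANDARD and normalise_perms are hypothetical.

```python
# Assumed mapping, taken from the proposal above: PaX letters on the left,
# the standard /proc/pid/maps letters they are treated as on the right.
PAX_TO_STANDARD = {"R": "r", "W": "w", "X": "x", "+": "-"}

def normalise_perms(perms):
    """Translate a 4-character permission field to standard letters.

    Only the first three characters (read/write/execute) are translated;
    the fourth (p or s) is left untouched.
    """
    rwx, shared = perms[:3], perms[3:]
    return "".join(PAX_TO_STANDARD.get(ch, ch) for ch in rwx) + shared
```

Applied to the output in comment 35, 'R+Xp' would be read as 'r-xp' and 'RW+p' as 'rw-p', while fields that already use the standard letters pass through unchanged.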
Comment 37 Nicholas Nethercote 2004-02-12 18:09:36 UTC
Some more info about PaX that might be of interest, from Jason Cox (a.k.a steel300):

Did the user enable PaX security? 
If so, did they rebuild everything using gcc and propolice? 
Some applications will not work in a PaX environment. Have them take a look at chpax to disable PaX security for certain binaries. If the user is using Gentoo, have them emerge chpax and hardened-gcc, then run emerge -e world.
That will put them in a true PaX environment. chpax can then be used on apps that no longer work.
Comment 38 Andrew Mahone 2004-02-13 01:56:10 UTC
I'm actually using love-sources, not hardened-sources, but both include PaX.  Other than that, I'm using the new hardened gentoo profile, and I'm in the process of rebuilding everything with hardened-gcc.  I'm also working on an ebuild for paxctl, which controls the new program header flags for executables (the old method of using unused ELF header fields is deprecated).

I can't find any documentation (not that I've looked *hard*) for the PaX changes to /proc/*/maps files.  Perhaps the r/w means "readable/writable, can not be changed"?
Comment 39 Andrew Mahone 2004-02-13 04:01:01 UTC
For anybody else using PaX or ProPolice/SSP:

I can't get valgrind to build with PP/SSP enabled.  The build seems to ignore my CFLAGS, so I had to use gentoo's hardened-gcc tool to make these options not on-by-default.

Chpax/paxctl must be used to disable *most* PaX options for valgrind to run.  I had to use "paxctl -zespm"; I believe the options are similar for chpax as well.  After disabling *all* PaX options (paxctl -zespmrx), make regtest has 67 stderr failures and 2 stdout failures (fewer than with -zespm).  Playing around w/ the various tools seems to work OK, and things are overall far, far better than when I first took an interest in valgrind (and couldn't make it build short of having a non-NPTL environment in a chroot).  I'd suggest limiting execution of valgrind to selected users, since it only functions properly with much of PaX's protection disabled; I'd assume this makes the program run under it vulnerable as well.

I only started using love-sources this week, so I have a vanilla 2.6.1 kernel around that I should still be able to boot.  When I get a chance (I don't like to mess with boot/kernel stuff when I'm not physically present), I'll boot that and see if the remaining regtest failures are all PaX-related, or if there's still anything missing for NPTL.  Would it be any help to know which tests failed on my system?  I can attach the failure list or the full make regtest output if necessary.
Comment 40 Nicholas Nethercote 2004-02-13 11:18:42 UTC
A list of the failed tests would be useful;  this would give us an idea of
whether the errors are spread across the whole suite, or clustering in
particular tools.

But what would be most useful is the .stderr.diff and .stdout.diff files.
Attaching 69 would be a bit much;  could you look through them, and if
they look to be failing for the same reason(s), collate and attach a
handful of representative ones?
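
One way to do that collation is to group the diff files by identical content, so that one file per group can be attached. The following Python sketch assumes the regression tests leave *.stderr.diff and *.stdout.diff files under a directory tree; the function name group_identical_diffs is hypothetical and not part of Valgrind.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

def group_identical_diffs(root):
    """Group .stderr.diff/.stdout.diff files under root by identical content.

    Returns a list of groups (lists of paths), largest group first.
    """
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*.diff")):
        if path.name.endswith((".stderr.diff", ".stdout.diff")):
            digest = hashlib.sha1(path.read_bytes()).hexdigest()
            groups[digest].append(path)
    return sorted(groups.values(), key=len, reverse=True)
```

Printing the first path of each group together with the group size gives a short list of representative failures. Files that differ only in small details (such as a test name) would land in separate groups, so this is a rough first pass.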

In general, if PaX fools with things so much, then it probably only makes
sense for us to aim to get Valgrind working underneath it with all options
turned off.

Comment 41 Nicholas Nethercote 2004-02-16 22:50:18 UTC
Dave (comment #30) and Christopher (comment #32) seem to be running things ok with the patch.  Andrew's having trouble because of PaX.  How about the other half-dozen people who are watching and voting for this bug -- is the patch ok?
Comment 42 Andrew Mahone 2004-02-17 00:30:05 UTC
Most of the .stderr.diff files look like this:
+ Conditional jump or move depends on uninitialised value(s)
+    at 0x........: _dl_relocate_object (in /lib/ld-2.3.3.so)
+    by 0x........: ...
+    by 0x........: ...
+    by 0x........: ...
+
+ Conditional jump or move depends on uninitialised value(s)
+    at 0x........: _dl_relocate_object (in /lib/ld-2.3.3.so)
+    by 0x........: ...
+    by 0x........: ...
+    by 0x........: ...
+
+ Conditional jump or move depends on uninitialised value(s)
+    at 0x........: _dl_relocate_object (in /lib/ld-2.3.3.so)
+    by 0x........: ...
+    by 0x........: ...
+    by 0x........: ...
+
+ Conditional jump or move depends on uninitialised value(s)
+    at 0x........: _dl_relocate_object (in /lib/ld-2.3.3.so)
+    by 0x........: ...
+    by 0x........: ...
+    by 0x........: ...
+
+ Conditional jump or move depends on uninitialised value(s)
+    at 0x........: _dl_relocate_object (in /lib/ld-2.3.3.so)
+    by 0x........: ...
+    by 0x........: ...
+    by 0x........: ...
+

Sounds like something in glibc is triggering it.  I'll boot my non-PaX kernel sometime tonight and see if I get the same result.
Comment 43 Igor Kovalenko 2004-02-18 19:56:38 UTC
I've applied this patch; it compiles OK and works for me, with glibc triggering the same errors as described by Andrew Mahone
(gentoo with glibc 20040207).
Comment 44 Martijn Koster 2004-02-21 02:01:03 UTC
"Me too" for comment #43.
Comment 45 Nicholas Nethercote 2004-02-28 17:20:30 UTC
The patch fixing the pthreadtypes.h problem has been committed.  Hopefully it fixes the problem.  I'll leave this bug open for the moment in case the patch has problems.  Further confirmations that it works with affected distros would be welcome.

The patch addressing the PaX /proc/self/maps problem has not been committed, since there seem to be more problems caused by PaX that it doesn't address.  Since PaX changes how the kernel works, it's hard to know what to do, so for the moment I'll say supporting PaX-enabled kernels is a non-goal.  However, Andrew, or anyone else using PaX, you might like to file another bug specifically describing the PaX issues.  If you do, please cite this bug, #69616, in the problem description.  (I should have asked you to do that when the problem first arose, to clearly distinguish between the two problems.)
Comment 46 Nicholas Nethercote 2004-03-10 15:29:39 UTC
No complaints about the commit in 12 days, so I'll close this.