Bug 139776 - Valgrind invalid read in unaligned memcpy with Intel compiler v9
Summary: Valgrind invalid read in unaligned memcpy with Intel compiler v9
Status: RESOLVED FIXED
Alias: None
Product: valgrind
Classification: Developer tools
Component: memcheck
Version: 3.2.1
Platform: unspecified Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Julian Seward
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-01-08 15:58 UTC by James Farmer
Modified: 2007-08-26 14:35 UTC

See Also:
Latest Commit:
Version Fixed In:


Attachments
Test case (982 bytes, text/plain)
2007-01-08 16:00 UTC, James Farmer
Test case 2 - demonstrates unexpected "uninitialised value" error (2.66 KB, text/plain)
2007-01-08 20:08 UTC, James Farmer
Valgrind output from test case 2 (9.24 KB, text/plain)
2007-01-08 20:09 UTC, James Farmer

Description James Farmer 2007-01-08 15:58:51 UTC
Hi.  We're very keen users of your excellent memory checker but we've
encountered a small problem.

We recently upgraded our Intel compiler from 7.0 to 9.1 and began to encounter
invalid read errors when running our code using Valgrind.  These only appear
when the build is optimized with -O2 or greater, and we tracked them down to the
intel_fast_memcpy intrinsic. However, tests confirm the memory is copied
correctly.  Here's a small example that seems to duplicate the error:

================================================================

#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <math.h>
#include <string.h>

void fill_array(char* array, size_t len) {
  unsigned int i;
  for (i=0;i<len;i++) {
    array[i] = (unsigned char) (rand()%255);
  }
}

void copy_array(char* dst, char* src, size_t len) {
  memcpy(dst, src, len);
}

void check_array(char* dst, char* src, size_t len) {
  unsigned int i;
  for (i=0;i<len;i++) {
    if (dst[i]!=src[i]) {
      printf("ERROR: Array position %u - dst=%d src=%d\n", i, (int) dst[i],
             (int) src[i]);
    }
  }
}


int main() {
  int j;
  for (j=0;j<16;j++) {
    size_t len = 128;
    size_t readoff = 1;
    printf("len=%zu, readoff=%zu\n", len, readoff);
    fflush(stdout);

    char* array1 = malloc(len);
    char* array2 = malloc(len);

    fill_array(array1, len);

    copy_array(array2, array1+readoff, len-readoff);

    check_array(array2, array1+readoff, len-readoff);

    free(array1);
    free(array2);
    usleep(250*1000);
  }
  return 0;
}

================================================================

When compiled with -O2 and run with valgrind (e.g. icc -O2 memcpytest.c ;
valgrind a.out), we see this output:

================================================================

==20187== Memcheck, a memory error detector.
==20187== Copyright (C) 2002-2006, and GNU GPL'd, by Julian Seward et al.
==20187== Using LibVEX rev 1658, a library for dynamic binary translation.
==20187== Copyright (C) 2004-2006, and GNU GPL'd, by OpenWorks LLP.
==20187== Using valgrind-3.2.1, a dynamic binary instrumentation framework.
==20187== Copyright (C) 2000-2006, and GNU GPL'd, by Julian Seward et al.
==20187== For more details, rerun with: -v
==20187==
len=128, readoff=1
==20187== Invalid read of size 8
==20187==    at 0x8048FA8: (within /home/port/test/a.out)
==20187==  Address 0x41990A8 is 0 bytes after a block of size 128 alloc'd
==20187==    at 0x401A6CE: malloc (vg_replace_malloc.c:149)
==20187==    by 0x8048616: main (in /home/port/test/a.out)
len=128, readoff=1
len=128, readoff=1
len=128, readoff=1
len=128, readoff=1
len=128, readoff=1
len=128, readoff=1
len=128, readoff=1
len=128, readoff=1
len=128, readoff=1
len=128, readoff=1
len=128, readoff=1
len=128, readoff=1
len=128, readoff=1
len=128, readoff=1
len=128, readoff=1
==20187==
==20187== ERROR SUMMARY: 16 errors from 1 contexts (suppressed: 19 from 1)
==20187== malloc/free: in use at exit: 0 bytes in 0 blocks.
==20187== malloc/free: 32 allocs, 32 frees, 4,096 bytes allocated.
==20187== For counts of detected errors, rerun with: -v
==20187== All heap blocks were freed -- no leaks are possible.

================================================================

Here's the equivalent output when run with valgrind -v :

================================================================

==20200== Memcheck, a memory error detector.
==20200== Copyright (C) 2002-2006, and GNU GPL'd, by Julian Seward et al.
==20200== Using LibVEX rev 1658, a library for dynamic binary translation.
==20200== Copyright (C) 2004-2006, and GNU GPL'd, by OpenWorks LLP.
==20200== Using valgrind-3.2.1, a dynamic binary instrumentation framework.
==20200== Copyright (C) 2000-2006, and GNU GPL'd, by Julian Seward et al.
==20200==
--20200-- Command line
--20200--    a.out
--20200-- Startup, with flags:
--20200--    -v
--20200-- Contents of /proc/version:
--20200--   Linux version 2.4.20-8smp (bhcompile@porky.devel.redhat.com) (gcc version 3.2.2 20030222 (Red Hat Linux 3.2.2-5)) #1 SMP Thu Mar 13 17:45:54 EST 2003
--20200-- Arch and hwcaps: X86, x86-sse1-sse2
--20200-- Valgrind library directory: /usr/local/lib/valgrind
--20200-- Reading syms from /lib/ld-2.3.2.so (0x4000000)
--20200-- Reading syms from /home/port/test/a.out (0x8048000)
--20200-- Reading syms from /usr/local/lib/valgrind/x86-linux/memcheck (0x38000000)
--20200--    object doesn't have a dynamic symbol table
--20200-- Reading suppressions file: /usr/local/lib/valgrind/default.supp
--20200-- REDIR: 0x40114A0 (index) redirected to 0x38020EEB (vgPlain_x86_linux_REDIR_FOR_index)
--20200-- Reading syms from /usr/local/lib/valgrind/x86-linux/vgpreload_core.so (0x4017000)
--20200-- Reading syms from /usr/local/lib/valgrind/x86-linux/vgpreload_memcheck.so (0x4019000)
==20200== WARNING: new redirection conflicts with existing -- ignoring it
--20200--     new: 0x040114A0 (index     ) R-> 0x0401C180 index
--20200-- REDIR: 0x4011640 (strlen) redirected to 0x401C3D4 (strlen)
--20200-- Reading syms from /lib/libm-2.3.2.so (0x4032000)
--20200-- Reading syms from /lib/libgcc_s-3.2.2-20030225.so.1 (0x4054000)
--20200--    object doesn't have a symbol table
--20200-- Reading syms from /lib/libc-2.3.2.so (0x405C000)
--20200-- Reading syms from /lib/libdl-2.3.2.so (0x4196000)
--20200-- REDIR: 0x40D6500 (rindex) redirected to 0x401C0A8 (rindex)
--20200-- REDIR: 0x40D7B20 (memset) redirected to 0x401CCB4 (memset)
len=128, readoff=1
--20200-- REDIR: 0x40CED80 (malloc) redirected to 0x401A64C (malloc)
==20200== Invalid read of size 8
==20200==    at 0x8048FA8: (within /home/port/test/a.out)
==20200==  Address 0x41990A8 is 0 bytes after a block of size 128 alloc'd
==20200==    at 0x401A6CE: malloc (vg_replace_malloc.c:149)
==20200==    by 0x8048616: main (in /home/port/test/a.out)
--20200-- REDIR: 0x40CEF40 (free) redirected to 0x401B203 (free)
len=128, readoff=1
len=128, readoff=1
len=128, readoff=1
len=128, readoff=1
len=128, readoff=1
len=128, readoff=1
len=128, readoff=1
len=128, readoff=1
len=128, readoff=1
len=128, readoff=1
len=128, readoff=1
len=128, readoff=1
len=128, readoff=1
len=128, readoff=1
len=128, readoff=1
==20200==
==20200== ERROR SUMMARY: 16 errors from 1 contexts (suppressed: 19 from 1)
==20200==
==20200== 16 errors in context 1 of 1:
==20200== Invalid read of size 8
==20200==    at 0x8048FA8: (within /home/port/test/a.out)
==20200==  Address 0x41990A8 is 0 bytes after a block of size 128 alloc'd
==20200==    at 0x401A6CE: malloc (vg_replace_malloc.c:149)
==20200==    by 0x8048616: main (in /home/port/test/a.out)
--20200--
--20200-- supp:   19 Ubuntu-stripped-ld.so
==20200==
==20200== IN SUMMARY: 16 errors from 1 contexts (suppressed: 19 from 1)
==20200==
==20200== malloc/free: in use at exit: 0 bytes in 0 blocks.
==20200== malloc/free: 32 allocs, 32 frees, 4,096 bytes allocated.
==20200==
==20200== All heap blocks were freed -- no leaks are possible.
--20200--  memcheck: sanity checks: 0 cheap, 1 expensive
--20200--  memcheck: auxmaps: 0 auxmap entries (0k, 0M) in use
--20200--  memcheck: auxmaps: 0 searches, 0 comparisons
--20200--  memcheck: SMs: n_issued      = 7 (112k, 0M)
--20200--  memcheck: SMs: n_deissued    = 0 (0k, 0M)
--20200--  memcheck: SMs: max_noaccess  = 65535 (1048560k, 1023M)
--20200--  memcheck: SMs: max_undefined = 0 (0k, 0M)
--20200--  memcheck: SMs: max_defined   = 22 (352k, 0M)
--20200--  memcheck: SMs: max_non_DSM   = 7 (112k, 0M)
--20200--  memcheck: max sec V bit nodes:    0 (0k, 0M)
--20200--  memcheck: set_sec_vbits8 calls: 0 (new: 0, updates: 0)
--20200--  memcheck: max shadow mem size:   416k, 0M
--20200-- translate:            fast SP updates identified: 1,673 ( 87.1%)
--20200-- translate:   generic_known SP updates identified: 110 (  5.7%)
--20200-- translate: generic_unknown SP updates identified: 137 (  7.1%)
--20200--     tt/tc: 3,828 tt lookups requiring 3,880 probes
--20200--     tt/tc: 3,828 fast-cache updates, 3 flushes
--20200--  transtab: new        1,809 (37,811 -> 634,313; ratio 167:10) [0 scs]
--20200--  transtab: dumped     0 (0 -> ??)
--20200--  transtab: discarded  8 (193 -> ??)
--20200-- scheduler: 72,932 jumps (bb entries).
--20200-- scheduler: 0/2,209 major/minor sched events.
--20200--    sanity: 1 cheap, 1 expensive checks.
--20200--    exectx: 30,011 lists, 13 contexts (avg 0 per list)
--20200--    exectx: 99 searches, 86 full compares (868 per 1000)
--20200--    exectx: 0 cmp2, 66 cmp4, 0 cmpAll

================================================================

For comparison, the valgrind -v output if our example code is compiled with less
optimization (icc -O1 memcpytest.c ; valgrind -v a.out):

================================================================

==20211== Memcheck, a memory error detector.
==20211== Copyright (C) 2002-2006, and GNU GPL'd, by Julian Seward et al.
==20211== Using LibVEX rev 1658, a library for dynamic binary translation.
==20211== Copyright (C) 2004-2006, and GNU GPL'd, by OpenWorks LLP.
==20211== Using valgrind-3.2.1, a dynamic binary instrumentation framework.
==20211== Copyright (C) 2000-2006, and GNU GPL'd, by Julian Seward et al.
==20211==
--20211-- Command line
--20211--    a.out
--20211-- Startup, with flags:
--20211--    -v
--20211-- Contents of /proc/version:
--20211--   Linux version 2.4.20-8smp (bhcompile@porky.devel.redhat.com) (gcc version 3.2.2 20030222 (Red Hat Linux 3.2.2-5)) #1 SMP Thu Mar 13 17:45:54 EST 2003
--20211-- Arch and hwcaps: X86, x86-sse1-sse2
--20211-- Valgrind library directory: /usr/local/lib/valgrind
--20211-- Reading syms from /lib/ld-2.3.2.so (0x4000000)
--20211-- Reading syms from /home/port/test/a.out (0x8048000)
--20211-- Reading syms from /usr/local/lib/valgrind/x86-linux/memcheck (0x38000000)
--20211--    object doesn't have a dynamic symbol table
--20211-- Reading suppressions file: /usr/local/lib/valgrind/default.supp
--20211-- REDIR: 0x40114A0 (index) redirected to 0x38020EEB (vgPlain_x86_linux_REDIR_FOR_index)
--20211-- Reading syms from /usr/local/lib/valgrind/x86-linux/vgpreload_core.so (0x4017000)
--20211-- Reading syms from /usr/local/lib/valgrind/x86-linux/vgpreload_memcheck.so (0x4019000)
==20211== WARNING: new redirection conflicts with existing -- ignoring it
--20211--     new: 0x040114A0 (index     ) R-> 0x0401C180 index
--20211-- REDIR: 0x4011640 (strlen) redirected to 0x401C3D4 (strlen)
--20211-- Reading syms from /lib/libm-2.3.2.so (0x4032000)
--20211-- Reading syms from /lib/libgcc_s-3.2.2-20030225.so.1 (0x4054000)
--20211--    object doesn't have a symbol table
--20211-- Reading syms from /lib/libc-2.3.2.so (0x405C000)
--20211-- Reading syms from /lib/libdl-2.3.2.so (0x4196000)
--20211-- REDIR: 0x40D6500 (rindex) redirected to 0x401C0A8 (rindex)
--20211-- REDIR: 0x40D7B20 (memset) redirected to 0x401CCB4 (memset)
len=128, readoff=1
--20211-- REDIR: 0x40CED80 (malloc) redirected to 0x401A64C (malloc)
--20211-- REDIR: 0x40D8130 (memcpy) redirected to 0x401C74C (memcpy)
--20211-- REDIR: 0x40CEF40 (free) redirected to 0x401B203 (free)
len=128, readoff=1
len=128, readoff=1
len=128, readoff=1
len=128, readoff=1
len=128, readoff=1
len=128, readoff=1
len=128, readoff=1
len=128, readoff=1
len=128, readoff=1
len=128, readoff=1
len=128, readoff=1
len=128, readoff=1
len=128, readoff=1
len=128, readoff=1
len=128, readoff=1
==20211==
==20211== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 19 from 1)
--20211--
--20211-- supp:   19 Ubuntu-stripped-ld.so
==20211== malloc/free: in use at exit: 0 bytes in 0 blocks.
==20211== malloc/free: 32 allocs, 32 frees, 4,096 bytes allocated.
==20211==
==20211== All heap blocks were freed -- no leaks are possible.
--20211--  memcheck: sanity checks: 0 cheap, 1 expensive
--20211--  memcheck: auxmaps: 0 auxmap entries (0k, 0M) in use
--20211--  memcheck: auxmaps: 0 searches, 0 comparisons
--20211--  memcheck: SMs: n_issued      = 7 (112k, 0M)
--20211--  memcheck: SMs: n_deissued    = 0 (0k, 0M)
--20211--  memcheck: SMs: max_noaccess  = 65535 (1048560k, 1023M)
--20211--  memcheck: SMs: max_undefined = 0 (0k, 0M)
--20211--  memcheck: SMs: max_defined   = 22 (352k, 0M)
--20211--  memcheck: SMs: max_non_DSM   = 7 (112k, 0M)
--20211--  memcheck: max sec V bit nodes:    0 (0k, 0M)
--20211--  memcheck: set_sec_vbits8 calls: 0 (new: 0, updates: 0)
--20211--  memcheck: max shadow mem size:   416k, 0M
--20211-- translate:            fast SP updates identified: 1,691 ( 87.2%)
--20211-- translate:   generic_known SP updates identified: 110 (  5.6%)
--20211-- translate: generic_unknown SP updates identified: 138 (  7.1%)
--20211--     tt/tc: 3,817 tt lookups requiring 3,881 probes
--20211--     tt/tc: 3,817 fast-cache updates, 3 flushes
--20211--  transtab: new        1,805 (37,606 -> 632,493; ratio 168:10) [0 scs]
--20211--  transtab: dumped     0 (0 -> ??)
--20211--  transtab: discarded  8 (193 -> ??)
--20211-- scheduler: 73,489 jumps (bb entries).
--20211-- scheduler: 0/2,202 major/minor sched events.
--20211--    sanity: 1 cheap, 1 expensive checks.
--20211--    exectx: 30,011 lists, 12 contexts (avg 0 per list)
--20211--    exectx: 83 searches, 71 full compares (855 per 1000)
--20211--    exectx: 0 cmp2, 51 cmp4, 0 cmpAll

================================================================

The uname -a output of our machine is:

Linux Anubis 2.4.20-8smp #1 SMP Thu Mar 13 17:45:54 EST 2003 i686 i686 i386 GNU/Linux

We've also seen similar errors on some of our other Linux machines.

These errors do seem spurious, in that the data seems to be copied correctly and
we haven't noticed any issues or unreliability when not running Valgrind that
could be traced to this issue.

Although the example we gave outputs only one Valgrind error, we have seen cases
where several are reported as the size and offset of the copied area are varied.

Thank you for giving your attention to this issue,

James Farmer
Dash Optimization
Comment 1 James Farmer 2007-01-08 16:00:25 UTC
Created attachment 19195 [details]
Test case
Comment 2 Julian Seward 2007-01-08 17:45:51 UTC
What happens if you run with --alignment=16 ?
Comment 3 James Farmer 2007-01-08 17:55:28 UTC
Thanks.  That removes the errors that were displayed by our test case but still leaves lots of other valgrind errors in our main program (which we assume are related, since they all vanish if we disable the memcpy intrinsic).  I'll try to create another test case to illustrate these.
Comment 4 James Farmer 2007-01-08 18:10:07 UTC
For example, setting len=5886 in our above test case results in:

================================================================ 

==20671== Memcheck, a memory error detector.
==20671== Copyright (C) 2002-2006, and GNU GPL'd, by Julian Seward et al.
==20671== Using LibVEX rev 1658, a library for dynamic binary translation.
==20671== Copyright (C) 2004-2006, and GNU GPL'd, by OpenWorks LLP.
==20671== Using valgrind-3.2.1, a dynamic binary instrumentation framework.
==20671== Copyright (C) 2000-2006, and GNU GPL'd, by Julian Seward et al.
==20671==
--20671-- Command line
--20671--    a.out
--20671-- Startup, with flags:
--20671--    -v
--20671--    --alignment=16
--20671-- Contents of /proc/version:
--20671--   Linux version 2.4.20-8smp (bhcompile@porky.devel.redhat.com) (gcc version 3.2.2 20030222 (Red Hat Linux 3.2.2-5)) #1 SMP Thu Mar 13 17:45:54 EST 2003
--20671-- Arch and hwcaps: X86, x86-sse1-sse2
--20671-- Valgrind library directory: /usr/local/lib/valgrind
--20671-- Reading syms from /lib/ld-2.3.2.so (0x4000000)
--20671-- Reading syms from /home/port/test/a.out (0x8048000)
--20671-- Reading syms from /usr/local/lib/valgrind/x86-linux/memcheck (0x38000000)
--20671--    object doesn't have a dynamic symbol table
--20671-- Reading suppressions file: /usr/local/lib/valgrind/default.supp
--20671-- REDIR: 0x40114A0 (index) redirected to 0x38020EEB (vgPlain_x86_linux_REDIR_FOR_index)
--20671-- Reading syms from /usr/local/lib/valgrind/x86-linux/vgpreload_core.so (0x4017000)
--20671-- Reading syms from /usr/local/lib/valgrind/x86-linux/vgpreload_memcheck.so (0x4019000)
==20671== WARNING: new redirection conflicts with existing -- ignoring it
--20671--     new: 0x040114A0 (index     ) R-> 0x0401C180 index
--20671-- REDIR: 0x4011640 (strlen) redirected to 0x401C3D4 (strlen)
--20671-- Reading syms from /lib/libm-2.3.2.so (0x4032000)
--20671-- Reading syms from /lib/libgcc_s-3.2.2-20030225.so.1 (0x4054000)
--20671--    object doesn't have a symbol table
--20671-- Reading syms from /lib/libc-2.3.2.so (0x405C000)
--20671-- Reading syms from /lib/libdl-2.3.2.so (0x4196000)
--20671-- REDIR: 0x40D6500 (rindex) redirected to 0x401C0A8 (rindex)
--20671-- REDIR: 0x40D7B20 (memset) redirected to 0x401CCB4 (memset)
len=5886, readoff=1
--20671-- REDIR: 0x40CED80 (malloc) redirected to 0x401A64C (malloc)
==20671== Invalid read of size 8
==20671==    at 0x8048FE8: (within /home/port/test/a.out)
==20671==  Address 0x419A758 is 5,880 bytes inside a block of size 5,886 alloc'd
==20671==    at 0x401A6CE: malloc (vg_replace_malloc.c:149)
==20671==    by 0x8048652: main (in /home/port/test/a.out)
--20671-- REDIR: 0x40CEF40 (free) redirected to 0x401B203 (free)
len=5886, readoff=1
len=5886, readoff=1
len=5886, readoff=1
len=5886, readoff=1
len=5886, readoff=1
len=5886, readoff=1
len=5886, readoff=1
len=5886, readoff=1
len=5886, readoff=1
len=5886, readoff=1
len=5886, readoff=1
len=5886, readoff=1
len=5886, readoff=1
len=5886, readoff=1
len=5886, readoff=1
==20671==
==20671== ERROR SUMMARY: 16 errors from 1 contexts (suppressed: 19 from 1)
==20671==
==20671== 16 errors in context 1 of 1:
==20671== Invalid read of size 8
==20671==    at 0x8048FE8: (within /home/port/test/a.out)
==20671==  Address 0x419A758 is 5,880 bytes inside a block of size 5,886 alloc'd
==20671==    at 0x401A6CE: malloc (vg_replace_malloc.c:149)
==20671==    by 0x8048652: main (in /home/port/test/a.out)
--20671--
--20671-- supp:   19 Ubuntu-stripped-ld.so
==20671==
==20671== IN SUMMARY: 16 errors from 1 contexts (suppressed: 19 from 1)
==20671==
==20671== malloc/free: in use at exit: 0 bytes in 0 blocks.
==20671== malloc/free: 32 allocs, 32 frees, 188,352 bytes allocated.
==20671==
==20671== All heap blocks were freed -- no leaks are possible.
--20671--  memcheck: sanity checks: 14 cheap, 1 expensive
--20671--  memcheck: auxmaps: 0 auxmap entries (0k, 0M) in use
--20671--  memcheck: auxmaps: 0 searches, 0 comparisons
--20671--  memcheck: SMs: n_issued      = 10 (160k, 0M)
--20671--  memcheck: SMs: n_deissued    = 0 (0k, 0M)
--20671--  memcheck: SMs: max_noaccess  = 65535 (1048560k, 1023M)
--20671--  memcheck: SMs: max_undefined = 0 (0k, 0M)
--20671--  memcheck: SMs: max_defined   = 22 (352k, 0M)
--20671--  memcheck: SMs: max_non_DSM   = 10 (160k, 0M)
--20671--  memcheck: max sec V bit nodes:    0 (0k, 0M)
--20671--  memcheck: set_sec_vbits8 calls: 0 (new: 0, updates: 0)
--20671--  memcheck: max shadow mem size:   464k, 0M
--20671-- translate:            fast SP updates identified: 1,674 ( 87.1%)
--20671-- translate:   generic_known SP updates identified: 110 (  5.7%)
--20671-- translate: generic_unknown SP updates identified: 137 (  7.1%)
--20671--     tt/tc: 3,840 tt lookups requiring 3,902 probes
--20671--     tt/tc: 3,840 fast-cache updates, 3 flushes
--20671--  transtab: new        1,812 (37,841 -> 634,583; ratio 167:10) [0 scs]
--20671--  transtab: dumped     0 (0 -> ??)
--20671--  transtab: discarded  8 (193 -> ??)
--20671-- scheduler: 1,458,352 jumps (bb entries).
--20671-- scheduler: 14/2,232 major/minor sched events.
--20671--    sanity: 15 cheap, 1 expensive checks.
--20671--    exectx: 30,011 lists, 13 contexts (avg 0 per list)
--20671--    exectx: 99 searches, 86 full compares (868 per 1000)
--20671--    exectx: 0 cmp2, 66 cmp4, 0 cmpAll

================================================================ 

What we haven't yet managed to reproduce in a test case is that our main program also reports lots of "Conditional jump or move depends on uninitialised value(s)" and "Use of uninitialised value" errors. These seem to be related, as again they vanish if we disable the 'memcpy' intrinsic.
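
The arithmetic behind this kind of report can be sketched in C. The function name `bytes_past_end` is hypothetical and not taken from Valgrind; it only computes how many bytes of a read fall beyond the end of a heap block, given the read's offset from the block start.

```c
#include <stddef.h>

/* Hypothetical helper (not part of Valgrind): number of bytes of a read of
 * `width` bytes, starting `offset` bytes from the start of a block, that
 * lie beyond a block of `size` bytes. */
size_t bytes_past_end(size_t offset, size_t width, size_t size)
{
    size_t end = offset + width;   /* one past the last byte read */

    if (offset >= size) {
        return width;              /* the read starts at or after the block end */
    }
    return end > size ? end - size : 0;
}
```

For the report above, an 8-byte read at offset 5,880 of a 5,886-byte block covers bytes 5,880 to 5,887, so 2 bytes lie past the end. In the original test case the read started exactly at the end of the 128-byte block, so all 8 bytes did.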


Comment 5 Julian Seward 2007-01-08 18:37:48 UTC
I'm not saying the --alignment=16 flag is a recommended fix: V's malloc
replacement should behave the same as the libc one in this respect, and any
difference is a bug.  If you compile (with icc -O2) and run the following
program (natively), the output should help you figure out the alignment
that your native malloc produces.  I always get the lowest digit as 0 or 8
indicating 8-aligned memory.

#include <stdio.h>
#include <stdlib.h>

int main ( void )
{
   int i;
   char* p;
   for (i = 0; i < 20; i++) {
      p = malloc(i);
      printf("0x%02lx\n", ((unsigned long)p) & 0xFF);
   }
   return 0;
}
Comment 6 James Farmer 2007-01-08 19:00:16 UTC
I see something similar (it looks 8-aligned to me: 0x28, 0x38, 0x48, 0x58, 0x68, 0x78, 0x88, 0x98, 0xa8, 0xb8, 0xc8, 0xd8, 0xe8, 0xf8, 0x10, 0x28, 0x40, 0x58, 0x70, 0x88).  But I'm not sure I follow the relevance, as memcpy doesn't have to start at an aligned address?
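
To make the alignment reading concrete, here is a small C sketch that reports the largest power-of-two alignment (capped at 16) of a value. The helper name `largest_pow2_alignment` is an assumption for illustration and appears nowhere in the thread. Because 256 is a multiple of 16, the low byte printed by the program in Comment 5 is enough to tell 8-byte from 16-byte alignment.

```c
#include <stddef.h>

/* Hypothetical helper: largest power of two, capped at 16, that divides addr. */
size_t largest_pow2_alignment(unsigned long addr)
{
    size_t align = 1;

    /* Keep doubling while the next power of two still divides addr. */
    while (align < 16 && (addr % (align * 2)) == 0) {
        align *= 2;
    }
    return align;
}
```

Applied to the values listed above, 0x28 and 0x58 come out as 8 while 0x10, 0x40 and 0x70 come out as 16, so only 8-byte alignment holds across the whole run.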
Comment 7 James Farmer 2007-01-08 20:06:57 UTC
Okay, I think the attached "Test Case 2" demonstrates the unexpected "Conditional jump or move depends on uninitialised value(s)" error.  Basically, this generates a great big array which is only populated in two contiguous blocks at the start and the end.  The data is memcpy-ed into another array of the same size and the populated areas compared.  After many tries with various random array sizes, I finally got it to produce the error:

==21603== Conditional jump or move depends on uninitialised value(s)
==21603==    at 0x8048938: check_memory_area. (memcpytest2.c:61)
==21603==    by 0x26412E: ???

The full valgrind -v output is rather long so I'll attach that as a file to this issue too.

Again, the issue vanishes if the code is compiled -O1 rather than -O2.
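
For readers without access to the attachment, the shape of such a test can be sketched as follows. This is a hypothetical reconstruction from the description above, not the attached file: the function name, its parameters and the return convention are all assumptions.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical reconstruction; the real attachment may differ.
 * Allocates `total` bytes, fills only the first and last `edge` bytes,
 * copies the whole array with memcpy, and returns the number of
 * mismatches found in the two populated regions.
 * Returns -1 on bad arguments or allocation failure. */
long copy_sparse_and_compare(size_t total, size_t edge)
{
    unsigned char *src, *dst;
    long mismatches = 0;
    size_t i;

    if (edge == 0 || edge > total / 2) {
        return -1;
    }
    src = malloc(total);
    dst = malloc(total);
    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return -1;
    }

    /* Populate the head and the tail; the middle stays uninitialised. */
    for (i = 0; i < edge; i++) {
        src[i] = (unsigned char) (rand() % 256);
        src[total - edge + i] = (unsigned char) (rand() % 256);
    }

    memcpy(dst, src, total);

    /* Compare only the regions that were actually written. */
    for (i = 0; i < edge; i++) {
        if (dst[i] != src[i]) mismatches++;
        if (dst[total - edge + i] != src[total - edge + i]) mismatches++;
    }

    free(src);
    free(dst);
    return mismatches;
}
```

Only initialised bytes are ever compared, so Memcheck would not be expected to report an uninitialised-value error for code of this shape. That is why the report quoted above is surprising.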

Comment 8 James Farmer 2007-01-08 20:08:19 UTC
Created attachment 19198 [details]
Test case 2 - demonstrates unexpected "uninitialised value" error
Comment 9 James Farmer 2007-01-08 20:09:39 UTC
Created attachment 19199 [details]
Valgrind output from test case 2
Comment 10 Julian Seward 2007-01-09 02:40:44 UTC
This is extremely strange.  I have reproduced the problem and made a
simplified test case.  I don't think it is a bug in the Intel code.
Valgrind appears to follow the same path of instructions through the
fast memcpy intrinsic as native execution does.  So I still have no 
idea why it complains.
Comment 11 Julian Seward 2007-01-09 03:33:46 UTC
Try this.  It's not a good solution but it's realistically about as good
as can easily be done.  It makes both your test cases run clean.

Index: memcheck/mc_replace_strmem.c
===================================================================
--- memcheck/mc_replace_strmem.c        (revision 6492)
+++ memcheck/mc_replace_strmem.c        (working copy)
@@ -401,6 +401,7 @@

 MEMCPY(m_libc_soname, memcpy)
 MEMCPY(m_ld_so_1,     memcpy) /* ld.so.1 */
+MEMCPY(NONE,          _intel_fast_memcpy)


 #define MEMCMP(soname, fnname) \
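
The idea behind the added line is to route calls to `_intel_fast_memcpy` to a simple replacement. As a rough illustration only, a copy that touches exactly `n` source bytes could look like the sketch below. This is not the body of Valgrind's MEMCPY macro, and the name `bytewise_copy` is made up for this example.

```c
#include <stddef.h>

/* Illustrative sketch, not Valgrind's code: copy n bytes one at a time,
 * so no source byte at index n or beyond is ever read. */
void *bytewise_copy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t i;

    for (i = 0; i < n; i++) {
        d[i] = s[i];
    }
    return dst;
}
```

A copy of this shape never reads past the end of the source block, so it gives Memcheck nothing to complain about, at the cost of the speed the wide-load version was written for.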
Comment 12 Julian Seward 2007-01-17 01:36:17 UTC
A couple of people have confirmed this patch partially but not
fully solves the problem.  I'll put it into 3.2.2.
Comment 13 Julian Seward 2007-01-20 00:03:21 UTC
Various fixes in 3.2.2 and trunk appear to have fixed, or
mostly fixed this.  Closing.
Comment 14 Kevin Neel 2007-07-28 00:22:04 UTC
Still getting this sometimes in 3.2.3. For us, it's an aligned source of 158 bytes. intel_fast_memcpy reads in 32-byte blocks, thus reading past the end. Since the compiler inserts the code locally, it doesn't need a PLT entry, and calls it directly. Is there any way to make valgrind smart enough to recognize this situation?

==3749== Invalid read of size 8
==3749==    at 0x48CB638: (within /scratch/kneel/view_storage/kneel_l1/rdbms/lib/libclntsh.so.11.1)
==3749==    by 0x42C1B74: nsdo (in /scratch/kneel/view_storage/kneel_l1/rdbms/lib/libclntsh.so.11.1)
==3749==    by 0x43D8547: nsbasic_sd (in /scratch/kneel/view_storage/kneel_l1/rdbms/lib/libclntsh.so.11.1)
==3749==    by 0x43D877A: nssend (in /scratch/kneel/view_storage/kneel_l1/rdbms/lib/libclntsh.so.11.1)
==3749==    by 0x432B56E: nsnasend (in /scratch/kneel/view_storage/kneel_l1/rdbms/lib/libclntsh.so.11.1)
==3749==    by 0x438F43E: nacomsn (in /scratch/kneel/view_storage/kneel_l1/rdbms/lib/libclntsh.so.11.1)
==3749==    by 0x4386355: na_client (in /scratch/kneel/view_storage/kneel_l1/rdbms/lib/libclntsh.so.11.1)
==3749==    by 0x437CFD1: naconnect (in /scratch/kneel/view_storage/kneel_l1/rdbms/lib/libclntsh.so.11.1)
==3749==    by 0x4322C85: nsnadoconn (in /scratch/kneel/view_storage/kneel_l1/rdbms/lib/libclntsh.so.11.1)
==3749==    by 0x431D2D7: nsnaconn (in /scratch/kneel/view_storage/kneel_l1/rdbms/lib/libclntsh.so.11.1)
==3749==    by 0x42B1F0A: nscall (in /scratch/kneel/view_storage/kneel_l1/rdbms/lib/libclntsh.so.11.1)
==3749==    by 0x4357D79: niotns (in /scratch/kneel/view_storage/kneel_l1/rdbms/lib/libclntsh.so.11.1)
==3749==  Address 0x5EB2478 is 152 bytes inside a block of size 158 alloc'd
==3749==    at 0x40046F2: malloc (vg_replace_malloc.c:149)
==3749==    by 0x4390BB4: nacomap (in /scratch/kneel/view_storage/kneel_l1/rdbms/lib/libclntsh.so.11.1)
==3749==    by 0x438F3F6: nacomsn (in /scratch/kneel/view_storage/kneel_l1/rdbms/lib/libclntsh.so.11.1)
==3749==    by 0x4386355: na_client (in /scratch/kneel/view_storage/kneel_l1/rdbms/lib/libclntsh.so.11.1)
==3749==    by 0x437CFD1: naconnect (in /scratch/kneel/view_storage/kneel_l1/rdbms/lib/libclntsh.so.11.1)
==3749==    by 0x4322C85: nsnadoconn (in /scratch/kneel/view_storage/kneel_l1/rdbms/lib/libclntsh.so.11.1)
==3749==    by 0x431D2D7: nsnaconn (in /scratch/kneel/view_storage/kneel_l1/rdbms/lib/libclntsh.so.11.1)
==3749==    by 0x42B1F0A: nscall (in /scratch/kneel/view_storage/kneel_l1/rdbms/lib/libclntsh.so.11.1)
==3749==    by 0x4357D79: niotns (in /scratch/kneel/view_storage/kneel_l1/rdbms/lib/libclntsh.so.11.1)
==3749==    by 0x43F26C4: nigcall (in /scratch/kneel/view_storage/kneel_l1/rdbms/lib/libclntsh.so.11.1)
==3749==    by 0x4363477: osncon (in /scratch/kneel/view_storage/kneel_l1/rdbms/lib/libclntsh.so.11.1)
==3749==    by 0x41E42B7: kpuadef (in /scratch/kneel/view_storage/kneel_l1/rdbms/lib/libclntsh.so.11.1)
==3749== 
==3749== ---- Attach to debugger ? --- [Return/N/n/Y/y/C/c] ---- y
starting debugger
==3749== starting debugger with cmd: $s/packages/install/bin/gdb -nw /proc/3803/fd/16374 3803
GNU gdb 6.6
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu"...
Finished Loading Init File
Using host libthread_db library "/lib/tls/libthread_db.so.1".
Attaching to program: /proc/3803/fd/16374, process 3803
Reading symbols from /net/stadd24/scratch/kneel/packages/install/lib/valgrind/x86-linux/vgpreload_core.so...done.
Loaded symbols for /net/stadd24/scratch/kneel/packages/install/lib/valgrind/x86-linux/vgpreload_core.so
Reading symbols from /net/stadd24/scratch/kneel/packages/install/lib/valgrind/x86-linux/vgpreload_memcheck.so...done.
Loaded symbols for /net/stadd24/scratch/kneel/packages/install/lib/valgrind/x86-linux/vgpreload_memcheck.so
Reading symbols from /usr/lib/libcwait.so...done.
Loaded symbols for /usr/lib/libcwait.so
Reading symbols from /scratch/kneel/view_storage/kneel_l1/rdbms/lib/libclntsh.so.11.1...done.
Loaded symbols for /ade/kneel_l1/rdbms/lib/libclntsh.so.11.1
Reading symbols from /scratch/kneel/view_storage/kneel_l1/oracle/.dispatch/lib/libnnz11.so...done.
Loaded symbols for /ade/kneel_l1/oracle/lib/libnnz11.so
Reading symbols from /scratch/kneel/view_storage/kneel_l1/rdbms/lib/libskgxp11.so...done.
Loaded symbols for /ade/kneel_l1/rdbms/lib/libskgxp11.so
Reading symbols from /lib/libdl.so.2...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib/tls/libm.so.6...done.
Loaded symbols for /lib/tls/libm.so.6
Reading symbols from /lib/tls/libpthread.so.0...done.
[Thread debugging using libthread_db enabled]
[New Thread 96344288 (LWP 3749)]
Loaded symbols for /lib/tls/libpthread.so.0
Reading symbols from /lib/libnsl.so.1...done.
Loaded symbols for /lib/libnsl.so.1
Reading symbols from /lib/tls/libc.so.6...done.
Loaded symbols for /lib/tls/libc.so.6
Reading symbols from /usr/lib/libaio.so.1...done.
Loaded symbols for /usr/lib/libaio.so.1
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /usr/lib/libnuma.so...done.
Loaded symbols for /usr/lib/libnuma.so
Reading symbols from /scratch/kneel/view_storage/kneel_l1/oracle/.dispatch/lib/libnque11.so...done.
Loaded symbols for /ade/kneel_l1/oracle/lib/libnque11.so
0x048cb638 in movdqa8 () from /ade/kneel_l1/rdbms/lib/libclntsh.so.11.1
(gdb) where
#0  0x048cb638 in movdqa8 () from /ade/kneel_l1/rdbms/lib/libclntsh.so.11.1
#1  0x05eb23e0 in ?? ()
#2  0x05e72a18 in ?? ()
#3  0x0592a04c in ?? () from /ade/kneel_l1/rdbms/lib/libclntsh.so.11.1
#4  0x048c9f5d in _intel_fast_memcpy.J ()
   from /ade/kneel_l1/rdbms/lib/libclntsh.so.11.1
#5  0x042c1b75 in nsdo () from /ade/kneel_l1/rdbms/lib/libclntsh.so.11.1
#6  0x043d8548 in nsbasic_sd () from /ade/kneel_l1/rdbms/lib/libclntsh.so.11.1
#7  0x043d877b in nssend () from /ade/kneel_l1/rdbms/lib/libclntsh.so.11.1
#8  0x0432b56f in nsnasend () from /ade/kneel_l1/rdbms/lib/libclntsh.so.11.1
#9  0x0438f43f in nacomsn () from /ade/kneel_l1/rdbms/lib/libclntsh.so.11.1
#10 0x04386356 in na_client () from /ade/kneel_l1/rdbms/lib/libclntsh.so.11.1
#11 0x0437cfd2 in naconnect () from /ade/kneel_l1/rdbms/lib/libclntsh.so.11.1
#12 0x04322c86 in nsnadoconn () from /ade/kneel_l1/rdbms/lib/libclntsh.so.11.1
#13 0x0431d2d8 in nsnaconn () from /ade/kneel_l1/rdbms/lib/libclntsh.so.11.1
#14 0x042b1f0b in nscall () from /ade/kneel_l1/rdbms/lib/libclntsh.so.11.1
#15 0x04357d7a in niotns () from /ade/kneel_l1/rdbms/lib/libclntsh.so.11.1
#16 0x043f26c5 in nigcall () from /ade/kneel_l1/rdbms/lib/libclntsh.so.11.1
#17 0x04363478 in osncon () from /ade/kneel_l1/rdbms/lib/libclntsh.so.11.1
#18 0x041e42b8 in kpuadef () from /ade/kneel_l1/rdbms/lib/libclntsh.so.11.1
#19 0x041d3034 in upiini () from /ade/kneel_l1/rdbms/lib/libclntsh.so.11.1
#20 0x041cebfe in upiah0 () from /ade/kneel_l1/rdbms/lib/libclntsh.so.11.1
#21 0x041e39a3 in kpuatch () from /ade/kneel_l1/rdbms/lib/libclntsh.so.11.1
#22 0x04aa54f0 in kpuspsessionget ()
   from /ade/kneel_l1/rdbms/lib/libclntsh.so.11.1
#23 0x0493d2c4 in OCISessionGet ()
   from /ade/kneel_l1/rdbms/lib/libclntsh.so.11.1
#24 0x08049083 in main (argc=1, argv=0xbeff71b4) at tkpgqc17c.c:162
Comment 15 Julian Seward 2007-08-26 14:35:19 UTC
> doesn't need a PLT entry, and calls it directly. Is there any way to make
> valgrind smart enough to recognize this situation?


I can't think of a way to fix this that doesn't burden the rest of Memcheck
with large extra performance overheads.   Any way you slice it, reading
beyond the end of allocated blocks is a violation of POSIX and really
icc shouldn't produce code that does it (regardless of the fact that
in this particular case it's cleverly done so it won't generate any
exceptions which wouldn't have happened anyway).

Reduce optimisation level?  Use a different compiler?
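
The remark about exceptions can be illustrated with a little address arithmetic. This is a minimal sketch assuming 4096-byte pages (the thread does not state the page size) and a hypothetical helper name. A naturally aligned load whose width divides the page size never straddles a page boundary, so if any byte it touches is in a mapped page, the whole load is.

```c
#include <stdint.h>

#define ASSUMED_PAGE_SIZE 4096u   /* assumption: not stated in the thread */

/* Hypothetical helper: returns 1 if a load of `width` bytes at `addr`
 * touches two different pages, 0 otherwise. */
int load_crosses_page(uintptr_t addr, unsigned width)
{
    uintptr_t first_page = addr / ASSUMED_PAGE_SIZE;
    uintptr_t last_page  = (addr + width - 1) / ASSUMED_PAGE_SIZE;

    return first_page != last_page;
}
```

For the 8-byte read at 0x5EB2478 reported in Comment 14, the helper returns 0: the read stays inside the page that holds the last valid bytes of the block, so it cannot fault even though 2 of its bytes lie outside the allocation.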