Bug 378068 - valgrind crashes on AVX2 function in FFmpeg
Status: RESOLVED DUPLICATE of bug 375839
Alias: None
Product: valgrind
Classification: Developer tools
Component: memcheck
Version: unspecified
Platform: Other Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Julian Seward
Reported: 2017-03-25 16:07 UTC by Ronald S. Bultje
Modified: 2017-03-27 13:11 UTC
Description Ronald S. Bultje 2017-03-25 16:07:21 UTC
In FFmpeg, we run automated tests in many configurations (including some under valgrind) to make sure everything works correctly. A recently added AVX2 optimization seems to crash valgrind:

====

==27088== Memcheck, a memory error detector
==27088== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==27088== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
==27088== Command: /home/fate/workdirs/x86_64-archlinux-gcc-valgrindundef/build/tests/checkasm/checkasm
==27088== 
checkasm: using random seed 2501648052
[..]
AVX2:
[..]
 - vp9dsp.ipred               [OK]
VEX temporary storage exhausted.
Pool = TEMP,  start 0x38fb37a8 curr 0x39463168 end 0x394782e7 (size 5000000)

vex: the `impossible' happened:
   VEX temporary storage exhausted.
Increase N_{TEMPORARY,PERMANENT}_BYTES and recompile.
vex storage: T total 6875178536 bytes allocated
vex storage: P total 640 bytes allocated

valgrind: the 'impossible' happened:
   LibVEX called failure_exit().

host stacktrace:
==27088==    at 0x38085218: ??? (in /usr/lib/valgrind/memcheck-amd64-linux)
==27088==    by 0x38085334: ??? (in /usr/lib/valgrind/memcheck-amd64-linux)
==27088==    by 0x38085571: ??? (in /usr/lib/valgrind/memcheck-amd64-linux)
==27088==    by 0x3808559A: ??? (in /usr/lib/valgrind/memcheck-amd64-linux)
==27088==    by 0x380A11C2: ??? (in /usr/lib/valgrind/memcheck-amd64-linux)
==27088==    by 0x3814F1F8: ??? (in /usr/lib/valgrind/memcheck-amd64-linux)
==27088==    by 0x3814F264: ??? (in /usr/lib/valgrind/memcheck-amd64-linux)
==27088==    by 0x3818876D: ??? (in /usr/lib/valgrind/memcheck-amd64-linux)
==27088==    by 0x38176101: ??? (in /usr/lib/valgrind/memcheck-amd64-linux)
==27088==    by 0x3814D1D9: ??? (in /usr/lib/valgrind/memcheck-amd64-linux)
==27088==    by 0x380A3A05: ??? (in /usr/lib/valgrind/memcheck-amd64-linux)
==27088==    by 0x380D72CB: ??? (in /usr/lib/valgrind/memcheck-amd64-linux)
==27088==    by 0x380D8E3F: ??? (in /usr/lib/valgrind/memcheck-amd64-linux)
==27088==    by 0x380E82C6: ??? (in /usr/lib/valgrind/memcheck-amd64-linux)

sched status:
  running_tid=1

Thread 1: status = VgTs_Runnable (lwpid 27088)
==27088==    at 0x5BEF4E: ff_vp9_idct_iadst_16x16_add_avx2 (vp9itxfm.asm:2149)
==27088==    by 0x42D37A: checkasm_checked_call_emms (checkasm.asm:243)
==27088==    by 0x9254092BC6A16273: ???
==27088==    by 0x1FF: ???
==27088==    by 0x1EBD33DCE: ???
==27088==    by 0x438A9500000001: ???
==27088==    by 0xF: ???
==27088==    by 0x6CD93F: ??? (in /home/fate/workdirs/x86_64-archlinux-gcc-valgrindundef/build/tests/checkasm/checkasm)
==27088==    by 0x3FF: ???
==27088==    by 0x1000000003: ???
==27088==    by 0x42D28F: ??? (checkasm.asm:243)
==27088==    by 0xDEADBEEFDEADBEEE: ???
==27088==    by 0xDEADBEEFDEADBEEE: ???
==27088==    by 0x4287B9: check_itxfm (vp9dsp.c:362)
[..]

Note: see also the FAQ in the source distribution.
It contains workarounds to several common problems.
In particular, if Valgrind aborted or crashed after
identifying problems in your program, there's a good chance
that fixing those problems will prevent Valgrind aborting or
crashing, especially if it happened in m_mallocfree.c.

If that doesn't help, please report this bug to: www.valgrind.org

In the bug report, send all the above text, the valgrind
version, and what OS and version you are using.  Thanks.

====
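As a reading aid for the "Pool = TEMP" line in the log above, here is a minimal Python sketch that works out how full the temporary pool was when VEX gave up. The helper name parse_pool_line is made up for illustration. Treating "end" as the address of the last valid byte is an assumption, chosen because it makes end - start + 1 equal the reported size.

```python
import re

# Copied verbatim from the valgrind output above.
POOL_LINE = "Pool = TEMP,  start 0x38fb37a8 curr 0x39463168 end 0x394782e7 (size 5000000)"


def parse_pool_line(line):
    """Hypothetical helper: return (start, curr, end, size) from a 'Pool = ...' line."""
    m = re.search(
        r"start (0x[0-9a-fA-F]+) curr (0x[0-9a-fA-F]+) "
        r"end (0x[0-9a-fA-F]+) \(size (\d+)\)",
        line,
    )
    if m is None:
        raise ValueError("not a recognised pool line")
    start, curr, end = (int(g, 16) for g in m.groups()[:3])
    return start, curr, end, int(m.group(4))


start, curr, end, size = parse_pool_line(POOL_LINE)

used = curr - start          # bytes handed out so far in this pool
capacity = end - start + 1   # assumption: 'end' is the last valid byte
free = end - curr + 1        # same assumption as above

print(f"reported size: {size}")
print(f"capacity:      {capacity}")
print(f"used:          {used}")
print(f"free:          {free}")
```

With the values from this log, that works out to 4913600 bytes used and 86400 bytes free out of 5000000. The log does not say how large the failing request was; presumably it was larger than the free remainder, but that is an inference.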

Note that different machines fail in different ways. For example, when one of our developers who is able to reproduce this runs the test locally, he gets this error instead:

====
[..]
 - vp9dsp.ipred               [OK]

valgrind: m_translate.c:1772 (vgPlain_translate): Assertion 'tres.status == VexTransOK' failed.

host stacktrace:
==19293==    at 0x38085B73: show_sched_status_wrk (m_libcassert.c:378)
==19293==    by 0x38085C74: report_and_quit (m_libcassert.c:449)
==19293==    by 0x38085E01: vgPlain_assert_fail (m_libcassert.c:515)
==19293==    by 0x380A4A96: vgPlain_translate (m_translate.c:1772)
==19293==    by 0x380DACDB: handle_chain_me (scheduler.c:1080)
==19293==    by 0x380DC69F: vgPlain_scheduler (scheduler.c:1424)
==19293==    by 0x380EBAB6: thread_wrapper (syswrap-linux.c:103)
==19293==    by 0x380EBAB6: run_a_thread_NORETURN (syswrap-linux.c:156)

sched status:
  running_tid=1

Thread 1: status = VgTs_Runnable (lwpid 21585)
==21585==    at 0x5C572F: ff_vp9_iadst_iadst_16x16_add_avx2 (vp9itxfm.asm:2151)
==21585==    by 0x42DF2A: checkasm_checked_call_emms (checkasm.asm:243)
[..]
====

Note that the function that fails is also different (iadst_iadst here vs. idct_iadst above). We don't really know what this means, but it looks like a bug in valgrind. Can you guys suggest next steps for us? The code, if it helps, is available here:

http://git.videolan.org/?p=ffmpeg.git;a=blob;f=libavcodec/x86/vp9itxfm.asm;h=2c63fe514a3f43a4b6dd32b583deae5a607f96c6;hb=HEAD#l2095

Let me know if I can provide any extra information to make this easier to debug. I'm on OSX 10.12 and valgrind doesn't seem to work at all for me, so unfortunately I cannot test it myself.
Comment 1 Julian Seward 2017-03-27 13:11:46 UTC
See easy workaround at https://bugs.kde.org/show_bug.cgi?id=375839#c4

*** This bug has been marked as a duplicate of bug 375839 ***