Created attachment 103747: output from GDB showing the faulty instruction sequence, plus the log from the valgrind run

I noticed a valgrind crash with the error "Temporary storage exhausted" when there is a long sequence (50+) of these instructions:

..
vfmadd231ps ymm0,ymm1,ymm2
vfmadd231ps ymm0,ymm1,ymm2
vfmadd231ps ymm0,ymm1,ymm2
vfmadd231ps ymm0,ymm1,ymm2
....

The attachment contains the full crash message and the instruction sequence, as seen in GDB (Intel syntax), that triggered the crash. The sequence was generated with the JIT assembler xbyak (https://github.com/herumi/xbyak). I can attach the Linux project if needed.

Notes:
- valgrind was built from source (3.12)
- command line used: <path to valgrind 3.12>/bin/valgrind --tool=memcheck <my project with sequence of vfmadd231ps ymm0,ymm1,ymm2>
- Operating system: Fedora 21, uname -a: Linux linux-brix 3.14.5 #1 SMP Fri Mar 13 16:27:51 CET 2015 x86_64 x86_64 x86_64 GNU/Linux
- without valgrind the program works fine (it does nothing, as it was modified to expose this problem)
Yes, VEX has a very poor (verbose) translation for such instructions and generates huge amounts of code, which breaks the JIT. We should fix this somehow.
*** Bug 377159 has been marked as a duplicate of this bug. ***
*** Bug 375150 has been marked as a duplicate of this bug. ***
I should add: as a workaround, you can try specifying --vex-guest-max-insns=25 and if that still doesn't work, lowering the value towards zero. You shouldn't go below about 10. Lower values reduce performance and increase the risk of false errors. The default value is 50.
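The workaround can be scripted. The following is a minimal sketch, assuming a POSIX shell: the function name `run_with_smaller_blocks`, the `VALGRIND` variable and the list of values tried are illustrative choices, not part of Valgrind. Only `--tool=memcheck` and `--vex-guest-max-insns` come from this thread.

```shell
#!/bin/sh
# Sketch only: retry a program under Valgrind with progressively smaller
# --vex-guest-max-insns values, stopping at 10 as advised above.
# VALGRIND is an assumed override for the valgrind binary to use.
VALGRIND=${VALGRIND:-valgrind}

run_with_smaller_blocks() {
    # "$@" is the program under test plus its arguments.
    for n in 25 20 15 10; do
        echo "trying --vex-guest-max-insns=$n" >&2
        # Caveat: a nonzero exit status may also come from the program
        # itself, not only from a VEX panic; check the log to be sure.
        if "$VALGRIND" --tool=memcheck --vex-guest-max-insns="$n" "$@"; then
            echo "$n"    # print the first value that ran to completion
            return 0
        fi
    done
    return 1
}
```

A call such as `run_with_smaller_blocks ./my_program` (program name hypothetical) prints the first value that ran to completion. Since lower values reduce performance and increase the risk of false errors, the largest working value is the one to keep.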
*** Bug 378068 has been marked as a duplicate of this bug. ***
The proposed workaround with --vex-guest-max-insns=25 worked for me. Thanks.
(In reply to Julian Seward from comment #1)
> VEX has a very poor (verbose) translation for such instructions [..]

VEX r3331 somewhat improves this, reducing the size of the generated code to about 75% of what it was before. Better than nothing.
VEX r3335 further improves the situation a bit. I would be interested to hear if this makes it possible to run these problematic cases without the workaround in comment #4 (the use of --vex-guest-max-insns=25). The only convincing fix though is to rewrite the amd64 front end translation for FMA instructions to use fewer, wider IROps. This isn't simple, though.
Fixed completely by vex r3337. You should now be able to throw any sequence of insns through the JIT without getting such failures and without using the workaround in comment 4. If the problems persist, please let me know ASAP.
We're still seeing the issue in FFmpeg. This is valgrind r16297 with vex r3344.

==15422== Memcheck, a memory error detector
==15422== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==15422== Using Valgrind-3.13.0.SVN and LibVEX; rerun with -h for copyright info
==15422== Command: /home/fate/workdirs/x86_64-archlinux-gcc-valgrindundef/build/ffmpeg -nostdin -nostats -cpuflags all -hwaccel none -threads 1 -thread_type frame+slice -i /home/fate/fate-suite/vp9-test-vectors/vp90-2-00-quantizer-01.webm -flags +bitexact -fflags +bitexact -f framemd5 -
==15422==
ffmpeg version N-85439-g03eb0515c1 Copyright (c) 2000-2017 the FFmpeg developers
  built with gcc 6.3.1 (GCC) 20170306
  configuration: --prefix=/home/fate/workdirs/x86_64-archlinux-gcc-valgrindundef/install --samples=/home/fate/fate-suite --enable-gpl --enable-memory-poisoning --enable-avresample --cc='ccache cc' --target-exec='/home/fate/src/valgrind/vg-in-place --error-exitcode=1 --malloc-fill=0xa2 --track-origins=yes --leak-check=full --gen-suppressions=all --suppressions=/home/fate/ffmpeg/tests/fate-valgrind.supp' --disable-stripping --disable-memory-poisoning
  libavutil 55. 60.100 / 55. 60.100
  libavcodec 57. 92.100 / 57. 92.100
  libavformat 57. 72.100 / 57. 72.100
  libavdevice 57. 7.100 / 57. 7.100
  libavfilter 6. 84.101 / 6. 84.101
  libavresample 3. 6. 0 / 3. 6. 0
  libswscale 4. 7.100 / 4. 7.100
  libswresample 2. 8.100 / 2. 8.100
  libpostproc 54. 6.100 / 54. 6.100
VEX temporary storage exhausted.
Pool = TEMP, start 0x38fbed68 curr 0x39477e28 end 0x394838a7 (size 5000000)

vex: the `impossible' happened:
   VEX temporary storage exhausted.
Increase N_{TEMPORARY,PERMANENT}_BYTES and recompile.
vex storage: T total 946892552 bytes allocated
vex storage: P total 640 bytes allocated

valgrind: the 'impossible' happened:
   LibVEX called failure_exit().

host stacktrace:
==15422==    at 0x38085C13: show_sched_status_wrk (m_libcassert.c:378)
==15422==    by 0x38085D14: report_and_quit (m_libcassert.c:449)
==15422==    by 0x38085F51: panic (m_libcassert.c:525)
==15422==    by 0x38085F51: vgPlain_core_panic_at (m_libcassert.c:530)
==15422==    by 0x38085F7A: vgPlain_core_panic (m_libcassert.c:535)
==15422==    by 0x380A1EE2: failure_exit (m_translate.c:740)
==15422==    by 0x38153748: vpanic (main_util.c:231)
==15422==    by 0x381537B4: private_LibVEX_alloc_OOM (main_util.c:171)
==15422==    by 0x38179650: LibVEX_Alloc_inline (main_util.h:167)
==15422==    by 0x38179650: doRegisterAllocation (host_generic_reg_alloc2.c:517)
==15422==    by 0x3815167C: libvex_BackEnd (main_main.c:1122)
==15422==    by 0x3815167C: LibVEX_Translate (main_main.c:1225)
==15422==    by 0x380A4725: vgPlain_translate (m_translate.c:1770)
==15422==    by 0x380DAD7B: handle_chain_me (scheduler.c:1080)
==15422==    by 0x380DC73F: vgPlain_scheduler (scheduler.c:1424)
==15422==    by 0x380EBB56: thread_wrapper (syswrap-linux.c:103)
==15422==    by 0x380EBB56: run_a_thread_NORETURN (syswrap-linux.c:156)

sched status:
  running_tid=1

Thread 1: status = VgTs_Runnable (lwpid 15422)
==15422==    at 0xCC33BF: ff_vp9_iadst_iadst_16x16_add_avx2 (vp9itxfm.asm:2151)
==15422==    by 0xB7D794: ff_vp9_decode_block (vp9block.c:1387)
==15422==    by 0x1: ???
==15422==    by 0x70CB63F: ???
==15422==    by 0xB: ???

Note: see also the FAQ in the source distribution.
It contains workarounds to several common problems.
In particular, if Valgrind aborted or crashed after
identifying problems in your program, there's a good chance
that fixing those problems will prevent Valgrind aborting or
crashing, especially if it happened in m_mallocfree.c.

If that doesn't help, please report this bug to: www.valgrind.org

In the bug report, send all the above text, the valgrind
version, and what OS and version you are using. Thanks.
Uh, I thought I fixed it pretty comprehensively in r3344. Anyway, can you please get me a copy of the basic block that contains the failing address?

   at 0xCC33BF: ff_vp9_iadst_iadst_16x16_add_avx2 (vp9itxfm.asm:2151)

Not just starting from there, but backing up to where the block starts and all the way to a conditional or indirect branch that finishes it.

And .. please .. disassembly? So I don't have to wade through layers of assembler macros to figure out what the actual instructions are.
Note: the issue only happens when using --track-origins=yes

Not sure if that's really what you are asking for, but with the following similar but simpler case:

☭ /home/ux/src/valgrind/vg-in-place --track-origins=yes tests/checkasm/checkasm --test=vp9dsp
==5547== Memcheck, a memory error detector
==5547== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==5547== Using Valgrind-3.13.0.SVN and LibVEX; rerun with -h for copyright info
==5547== Command: tests/checkasm/checkasm --test=vp9dsp
==5547==
checkasm: using random seed 2405440599
[...]
AVX2:
 - vp9dsp.ipred [OK]
VEX temporary storage exhausted.
Pool = TEMP, start 0x38fbed68 curr 0x3946ee10 end 0x394838a7 (size 5000000)

vex: the `impossible' happened:
   VEX temporary storage exhausted.
Increase N_{TEMPORARY,PERMANENT}_BYTES and recompile.
vex storage: T total 3774583552 bytes allocated
vex storage: P total 640 bytes allocated

valgrind: the 'impossible' happened:
   LibVEX called failure_exit().

host stacktrace:
==5547==    at 0x38085C13: show_sched_status_wrk (m_libcassert.c:378)
==5547==    by 0x38085D14: report_and_quit (m_libcassert.c:449)
==5547==    by 0x38085F51: panic (m_libcassert.c:525)
==5547==    by 0x38085F51: vgPlain_core_panic_at (m_libcassert.c:530)
==5547==    by 0x38085F7A: vgPlain_core_panic (m_libcassert.c:535)
==5547==    by 0x380A1EE2: failure_exit (m_translate.c:740)
==5547==    by 0x38153748: vpanic (main_util.c:231)
==5547==    by 0x381537B4: private_LibVEX_alloc_OOM (main_util.c:171)
==5547==    by 0x3818D15D: LibVEX_Alloc_inline (main_util.h:167)
==5547==    by 0x3818D15D: addHInstr_SLOW (host_generic_regs.c:300)
==5547==    by 0x3817A9A1: doRegisterAllocation (host_generic_reg_alloc2.c:1550)
==5547==    by 0x3815167C: libvex_BackEnd (main_main.c:1122)
==5547==    by 0x3815167C: LibVEX_Translate (main_main.c:1225)
==5547==    by 0x380A4725: vgPlain_translate (m_translate.c:1770)
==5547==    by 0x380DAD7B: handle_chain_me (scheduler.c:1080)
==5547==    by 0x380DC73F: vgPlain_scheduler (scheduler.c:1424)
==5547==    by 0x380EBB56: thread_wrapper (syswrap-linux.c:103)
==5547==    by 0x380EBB56: run_a_thread_NORETURN (syswrap-linux.c:156)

sched status:
  running_tid=1

Thread 1: status = VgTs_Runnable (lwpid 5547)
==5547==    at 0x5BF9BE: ff_vp9_idct_iadst_16x16_add_avx2 (vp9itxfm.asm:2149)
==5547==    by 0x42DD6A: checkasm_checked_call_emms (checkasm.asm:243)

The SIMD function:

000000000001df30 <ff_vp9_idct_iadst_16x16_add_avx2>:
   1df30: c5 fd 6f 4a 20          vmovdqa ymm1,YMMWORD PTR [rdx+0x20]
   1df35: c5 fd 6f 52 40          vmovdqa ymm2,YMMWORD PTR [rdx+0x40]
   1df3a: c5 fd 6f 5a 60          vmovdqa ymm3,YMMWORD PTR [rdx+0x60]
   1df3f: c5 fd 6f aa a0 00 00    vmovdqa ymm5,YMMWORD PTR [rdx+0xa0]
   1df46: 00
   1df47: c5 fd 6f b2 c0 00 00    vmovdqa ymm6,YMMWORD PTR [rdx+0xc0]
   1df4e: 00
   1df4f: c5 fd 6f ba e0 00 00    vmovdqa ymm7,YMMWORD PTR [rdx+0xe0]
   1df56: 00
   1df57: c5 7d 6f 82 00 01 00    vmovdqa ymm8,YMMWORD PTR [rdx+0x100]
   1df5e: 00
   1df5f: c5 7d 6f 8a 20 01 00    vmovdqa ymm9,YMMWORD PTR [rdx+0x120]
   1df66: 00
   1df67: c5 7d 6f 92 40 01 00    vmovdqa ymm10,YMMWORD PTR [rdx+0x140]
   1df6e: 00
   1df6f: c5 7d 6f 9a 60 01 00    vmovdqa ymm11,YMMWORD PTR [rdx+0x160]
   1df76: 00
   1df77: c5 7d 6f a2 80 01 00    vmovdqa ymm12,YMMWORD PTR [rdx+0x180]
   1df7e: 00
   1df7f: c5 7d 6f aa a0 01 00    vmovdqa ymm13,YMMWORD PTR [rdx+0x1a0]
   1df86: 00
   1df87: c5 7d 6f b2 c0 01 00    vmovdqa ymm14,YMMWORD PTR [rdx+0x1c0]
   1df8e: 00
   1df8f: c5 7d 6f ba e0 01 00    vmovdqa ymm15,YMMWORD PTR [rdx+0x1e0]
   1df96: 00
   1df97: c5 85 69 c1             vpunpckhwd ymm0,ymm15,ymm1
   1df9b: c5 05 61 f9             vpunpcklwd ymm15,ymm15,ymm1
[... only v* inst and a few lea ...]
   1f3f4: c5 f8 77                vzeroupper
   1f3f7: c3                      ret
   1f3f8: 0f 1f 84 00 00 00 00    nop DWORD PTR [rax+rax*1+0x0]
   1f3ff: 00

and its caller:

0000000000000000 <checkasm_stack_clobber>:
     0: 48 81 ec a8 00 00 00    sub rsp,0xa8
     7: 48 c7 c6 a0 00 00 00    mov rsi,0xa0

000000000000000e <checkasm_stack_clobber.loop>:
     e: 48 89 3c 34             mov QWORD PTR [rsp+rsi*1],rdi
    12: 48 83 ee 08             sub rsi,0x8
    16: 7d f6                   jge e <checkasm_stack_clobber.loop>
    18: 48 81 c4 a8 00 00 00    add rsp,0xa8
    1f: c3                      ret

0000000000000020 <checkasm_checked_call>:
    20: 53                      push rbx
    21: 55                      push rbp
    22: 41 54                   push r12
    24: 41 55                   push r13
    26: 41 56                   push r14
    28: 41 57                   push r15
    2a: 48 81 ec 88 00 00 00    sub rsp,0x88
    31: 49 89 fa                mov r10,rdi
    34: 48 8b bc 24 c0 00 00    mov rdi,QWORD PTR [rsp+0xc0]
    3b: 00
    3c: 48 8b b4 24 c8 00 00    mov rsi,QWORD PTR [rsp+0xc8]
    43: 00
    44: 48 8b 94 24 d0 00 00    mov rdx,QWORD PTR [rsp+0xd0]
    4b: 00
    4c: 48 8b 8c 24 d8 00 00    mov rcx,QWORD PTR [rsp+0xd8]
    53: 00
    54: 4c 8b 84 24 e0 00 00    mov r8,QWORD PTR [rsp+0xe0]
    5b: 00
    5c: 4c 8b 8c 24 e8 00 00    mov r9,QWORD PTR [rsp+0xe8]
    63: 00
    64: 48 8b 9c 24 f0 00 00    mov rbx,QWORD PTR [rsp+0xf0]
    6b: 00
    6c: 48 89 1c 24             mov QWORD PTR [rsp],rbx
    70: 48 8b 9c 24 f8 00 00    mov rbx,QWORD PTR [rsp+0xf8]
    77: 00
    78: 48 89 5c 24 08          mov QWORD PTR [rsp+0x8],rbx
    7d: 48 8b 9c 24 00 01 00    mov rbx,QWORD PTR [rsp+0x100]
    84: 00
    85: 48 89 5c 24 10          mov QWORD PTR [rsp+0x10],rbx
    8a: 48 8b 9c 24 08 01 00    mov rbx,QWORD PTR [rsp+0x108]
    91: 00
    92: 48 89 5c 24 18          mov QWORD PTR [rsp+0x18],rbx
    97: 48 8b 9c 24 10 01 00    mov rbx,QWORD PTR [rsp+0x110]
    9e: 00
    9f: 48 89 5c 24 20          mov QWORD PTR [rsp+0x20],rbx
    a4: 48 8b 9c 24 18 01 00    mov rbx,QWORD PTR [rsp+0x118]
    ab: 00
    ac: 48 89 5c 24 28          mov QWORD PTR [rsp+0x28],rbx
    b1: 48 8b 9c 24 20 01 00    mov rbx,QWORD PTR [rsp+0x120]
    b8: 00
    b9: 48 89 5c 24 30          mov QWORD PTR [rsp+0x30],rbx
    be: 48 8b 9c 24 28 01 00    mov rbx,QWORD PTR [rsp+0x128]
    c5: 00
    c6: 48 89 5c 24 38          mov QWORD PTR [rsp+0x38],rbx
    cb: 48 8b 9c 24 30 01 00    mov rbx,QWORD PTR [rsp+0x130]
    d2: 00
    d3: 48 89 5c 24 40          mov QWORD PTR [rsp+0x40],rbx
    d8: 4c 8b 3c 25 00 00 00    mov r15,QWORD PTR ds:0x0
    df: 00
    e0: 4c 8b 34 25 00 00 00    mov r14,QWORD PTR ds:0x0
    e7: 00
    e8: 4c 8b 2c 25 00 00 00    mov r13,QWORD PTR ds:0x0
    ef: 00
    f0: 4c 8b 24 25 00 00 00    mov r12,QWORD PTR ds:0x0
    f7: 00
    f8: 48 8b 2c 25 00 00 00    mov rbp,QWORD PTR ds:0x0
    ff: 00
   100: 48 8b 1c 25 00 00 00    mov rbx,QWORD PTR ds:0x0
   107: 00
   108: 41 ff d2                call r10
   10b: 4c 33 3c 25 00 00 00    xor r15,QWORD PTR ds:0x0
   112: 00
   113: 4d 09 ff                or r15,r15
   116: 4c 33 34 25 00 00 00    xor r14,QWORD PTR ds:0x0
   11d: 00
   11e: 4d 09 f7                or r15,r14
   121: 4c 33 2c 25 00 00 00    xor r13,QWORD PTR ds:0x0
   128: 00
   129: 4d 09 ef                or r15,r13
   12c: 4c 33 24 25 00 00 00    xor r12,QWORD PTR ds:0x0
   133: 00
   134: 4d 09 e7                or r15,r12
   137: 48 33 2c 25 00 00 00    xor rbp,QWORD PTR ds:0x0
   13e: 00
   13f: 49 09 ef                or r15,rbp
   142: 48 33 1c 25 00 00 00    xor rbx,QWORD PTR ds:0x0
   149: 00
   14a: 49 09 df                or r15,rbx
   14d: 74 1b                   je 16a <checkasm_checked_call.clobber_ok>
   14f: 48 89 c3                mov rbx,rax
   152: 48 89 d5                mov rbp,rdx
   155: 48 8d 3c 25 00 00 00    lea rdi,ds:0x0
   15c: 00
   15d: 31 c0                   xor eax,eax
   15f: e8 00 00 00 00          call 164 <checkasm_checked_call+0x144>
   164: 48 89 ea                mov rdx,rbp
   167: 48 89 d8                mov rax,rbx

000000000000016a <checkasm_checked_call.clobber_ok>:
   16a: 9b d9 34 24             fstenv [rsp]
   16e: 66 81 7c 24 08 ff ff    cmp WORD PTR [rsp+0x8],0xffff
   175: 74 1d                   je 194 <checkasm_checked_call.emms_ok>
   177: 48 89 c3                mov rbx,rax
   17a: 48 89 d5                mov rbp,rdx
   17d: 48 8d 3c 25 00 00 00    lea rdi,ds:0x0
   184: 00
   185: 31 c0                   xor eax,eax
   187: e8 00 00 00 00          call 18c <checkasm_checked_call.clobber_ok+0x22>
   18c: 48 89 ea                mov rdx,rbp
   18f: 48 89 d8                mov rax,rbx
   192: 0f 77                   emms

0000000000000194 <checkasm_checked_call.emms_ok>:
   194: 48 81 c4 88 00 00 00    add rsp,0x88
   19b: 41 5f                   pop r15
   19d: 41 5e                   pop r14
   19f: 41 5d                   pop r13
   1a1: 41 5c                   pop r12
   1a3: 5d                      pop rbp
   1a4: 5b                      pop rbx
   1a5: c3                      ret
   1a6: 66 66 0f 1f 84 00 00    data16 nop WORD PTR [rax+rax*1+0x0]
   1ad: 00 00 00

00000000000001b0 <checkasm_checked_call_emms>:
   1b0: 53                      push rbx
   1b1: 55                      push rbp
   1b2: 41 54                   push r12
   1b4: 41 55                   push r13
   1b6: 41 56                   push r14
   1b8: 41 57                   push r15
   1ba: 48 81 ec 88 00 00 00    sub rsp,0x88
   1c1: 49 89 fa                mov r10,rdi
   1c4: 48 8b bc 24 c0 00 00    mov rdi,QWORD PTR [rsp+0xc0]
   1cb: 00
   1cc: 48 8b b4 24 c8 00 00    mov rsi,QWORD PTR [rsp+0xc8]
   1d3: 00
   1d4: 48 8b 94 24 d0 00 00    mov rdx,QWORD PTR [rsp+0xd0]
   1db: 00
   1dc: 48 8b 8c 24 d8 00 00    mov rcx,QWORD PTR [rsp+0xd8]
   1e3: 00
   1e4: 4c 8b 84 24 e0 00 00    mov r8,QWORD PTR [rsp+0xe0]
   1eb: 00
   1ec: 4c 8b 8c 24 e8 00 00    mov r9,QWORD PTR [rsp+0xe8]
   1f3: 00
   1f4: 48 8b 9c 24 f0 00 00    mov rbx,QWORD PTR [rsp+0xf0]
   1fb: 00
   1fc: 48 89 1c 24             mov QWORD PTR [rsp],rbx
   200: 48 8b 9c 24 f8 00 00    mov rbx,QWORD PTR [rsp+0xf8]
   207: 00
   208: 48 89 5c 24 08          mov QWORD PTR [rsp+0x8],rbx
   20d: 48 8b 9c 24 00 01 00    mov rbx,QWORD PTR [rsp+0x100]
   214: 00
   215: 48 89 5c 24 10          mov QWORD PTR [rsp+0x10],rbx
   21a: 48 8b 9c 24 08 01 00    mov rbx,QWORD PTR [rsp+0x108]
   221: 00
   222: 48 89 5c 24 18          mov QWORD PTR [rsp+0x18],rbx
   227: 48 8b 9c 24 10 01 00    mov rbx,QWORD PTR [rsp+0x110]
   22e: 00
   22f: 48 89 5c 24 20          mov QWORD PTR [rsp+0x20],rbx
   234: 48 8b 9c 24 18 01 00    mov rbx,QWORD PTR [rsp+0x118]
   23b: 00
   23c: 48 89 5c 24 28          mov QWORD PTR [rsp+0x28],rbx
   241: 48 8b 9c 24 20 01 00    mov rbx,QWORD PTR [rsp+0x120]
   248: 00
   249: 48 89 5c 24 30          mov QWORD PTR [rsp+0x30],rbx
   24e: 48 8b 9c 24 28 01 00    mov rbx,QWORD PTR [rsp+0x128]
   255: 00
   256: 48 89 5c 24 38          mov QWORD PTR [rsp+0x38],rbx
   25b: 48 8b 9c 24 30 01 00    mov rbx,QWORD PTR [rsp+0x130]
   262: 00
   263: 48 89 5c 24 40          mov QWORD PTR [rsp+0x40],rbx
   268: 4c 8b 3c 25 00 00 00    mov r15,QWORD PTR ds:0x0
   26f: 00
   270: 4c 8b 34 25 00 00 00    mov r14,QWORD PTR ds:0x0
   277: 00
   278: 4c 8b 2c 25 00 00 00    mov r13,QWORD PTR ds:0x0
   27f: 00
   280: 4c 8b 24 25 00 00 00    mov r12,QWORD PTR ds:0x0
   287: 00
   288: 48 8b 2c 25 00 00 00    mov rbp,QWORD PTR ds:0x0
   28f: 00
   290: 48 8b 1c 25 00 00 00    mov rbx,QWORD PTR ds:0x0
   297: 00
   298: 41 ff d2                call r10
   29b: 4c 33 3c 25 00 00 00    xor r15,QWORD PTR ds:0x0
   2a2: 00
   2a3: 4d 09 ff                or r15,r15
   2a6: 4c 33 34 25 00 00 00    xor r14,QWORD PTR ds:0x0
   2ad: 00
   2ae: 4d 09 f7                or r15,r14
   2b1: 4c 33 2c 25 00 00 00    xor r13,QWORD PTR ds:0x0
   2b8: 00
   2b9: 4d 09 ef                or r15,r13
   2bc: 4c 33 24 25 00 00 00    xor r12,QWORD PTR ds:0x0
   2c3: 00
   2c4: 4d 09 e7                or r15,r12
   2c7: 48 33 2c 25 00 00 00    xor rbp,QWORD PTR ds:0x0
   2ce: 00
   2cf: 49 09 ef                or r15,rbp
   2d2: 48 33 1c 25 00 00 00    xor rbx,QWORD PTR ds:0x0
   2d9: 00
   2da: 49 09 df                or r15,rbx
   2dd: 74 1b                   je 2fa <checkasm_checked_call_emms.clobber_ok>
   2df: 48 89 c3                mov rbx,rax
   2e2: 48 89 d5                mov rbp,rdx
   2e5: 48 8d 3c 25 00 00 00    lea rdi,ds:0x0
   2ec: 00
   2ed: 31 c0                   xor eax,eax
   2ef: e8 00 00 00 00          call 2f4 <checkasm_checked_call_emms+0x144>
   2f4: 48 89 ea                mov rdx,rbp
   2f7: 48 89 d8                mov rax,rbx

00000000000002fa <checkasm_checked_call_emms.clobber_ok>:
   2fa: 0f 77                   emms
   2fc: 48 81 c4 88 00 00 00    add rsp,0x88
   303: 41 5f                   pop r15
   305: 41 5e                   pop r14
   307: 41 5d                   pop r13
   309: 41 5c                   pop r12
   30b: 5d                      pop rbp
   30c: 5b                      pop rbx
   30d: c3                      ret
(In reply to ux from comment #12)
> Thread 1: status = VgTs_Runnable (lwpid 5547)
> ==5547== at 0x5BF9BE: ff_vp9_idct_iadst_16x16_add_avx2 (vp9itxfm.asm:2149)
> ==5547== by 0x42DD6A: checkasm_checked_call_emms (checkasm.asm:243)

> 000000000001df30 <ff_vp9_idct_iadst_16x16_add_avx2>:
> 1df30: c5 fd 6f 4a 20 vmovdqa ymm1,YMMWORD PTR [rdx+0x20]
> [..]
> 1df9b: c5 05 61 f9 vpunpcklwd ymm15,ymm15,ymm1
>
> [... only v* inst and a few lea ...]

This was unfortunately the bit I need to see. Specifically, I need to see the block containing the instruction at address xxxxx9BE. If I had to guess, it would be at 1e9be.

If too complex then just send the entire disassembly of ff_vp9_idct_iadst_16x16_add_avx2 (but please attach as file, don't paste as comment.)
Created attachment 104952: ff_vp9_idct_iadst_16x16_add_avx2
My bad. So this block?

   1e978: c5 fd 72 e0 0e          vpsrad ymm0,ymm0,0xe
   1e97d: c5 cd 6b f5             vpackssdw ymm6,ymm6,ymm5
   1e981: c5 2d 6b d0             vpackssdw ymm10,ymm10,ymm0
   1e985: c4 41 45 fd fe          vpaddw ymm15,ymm7,ymm14
   1e98a: c5 0d f9 f7             vpsubw ymm14,ymm14,ymm7
   1e98e: c4 c1 65 fd fd          vpaddw ymm7,ymm3,ymm13
   1e993: c5 15 f9 eb             vpsubw ymm13,ymm13,ymm3
   1e997: c5 fd 6f 2a             vmovdqa ymm5,YMMWORD PTR [rdx]
   1e99b: c5 fd 6f 82 80 00 00    vmovdqa ymm0,YMMWORD PTR [rdx+0x80]
   1e9a2: 00
   1e9a3: c5 fd 6f 62 20          vmovdqa ymm4,YMMWORD PTR [rdx+0x20]
   1e9a8: c5 7d 6f 4a 40          vmovdqa ymm9,YMMWORD PTR [rdx+0x40]
   1e9ad: c5 fd 6f 5a 60          vmovdqa ymm3,YMMWORD PTR [rdx+0x60]
   1e9b2: c5 fd 7f 3a             vmovdqa YMMWORD PTR [rdx],ymm7
   1e9b6: c5 7d 7f ba 80 00 00    vmovdqa YMMWORD PTR [rdx+0x80],ymm15
   1e9bd: 00
   1e9be: c5 fd 7f 72 20          vmovdqa YMMWORD PTR [rdx+0x20],ymm6
   1e9c3: c5 7d 7f 42 40          vmovdqa YMMWORD PTR [rdx+0x40],ymm8
   1e9c8: c5 7d 7f 72 60          vmovdqa YMMWORD PTR [rdx+0x60],ymm14
   1e9cd: c4 41 55 69 c4          vpunpckhwd ymm8,ymm5,ymm12
   1e9d2: c4 c1 55 61 ec          vpunpcklwd ymm5,ymm5,ymm12
   1e9d7: c5 bd f5 3c 25 00 00    vpmaddwd ymm7,ymm8,YMMWORD PTR ds:0x0
   1e9de: 00 00
   1e9e0: c5 3d f5 04 25 00 00    vpmaddwd ymm8,ymm8,YMMWORD PTR ds:0x0
   1e9e7: 00 00
   1e9e9: c5 55 f5 24 25 00 00    vpmaddwd ymm12,ymm5,YMMWORD PTR ds:0x0
   1e9f0: 00 00
   1e9f2: c5 d5 f5 2c 25 00 00    vpmaddwd ymm5,ymm5,YMMWORD PTR ds:0x0
   1e9f9: 00 00
   1e9fb: c4 c1 65 69 f1          vpunpckhwd ymm6,ymm3,ymm9
   1ea00: c4 c1 65 61 d9          vpunpcklwd ymm3,ymm3,ymm9
   1ea05: c5 4d f5 34 25 00 00    vpmaddwd ymm14,ymm6,YMMWORD PTR ds:0x0
   1ea0c: 00 00
   1ea0e: c5 cd f5 34 25 00 00    vpmaddwd ymm6,ymm6,YMMWORD PTR ds:0x0
   1ea15: 00 00
   1ea17: c5 65 f5 0c 25 00 00    vpmaddwd ymm9,ymm3,YMMWORD PTR ds:0x0
   1ea1e: 00 00
   1ea20: c5 e5 f5 1c 25 00 00    vpmaddwd ymm3,ymm3,YMMWORD PTR ds:0x0
   1ea27: 00 00
   1ea29: c5 65 fe fd             vpaddd ymm15,ymm3,ymm5

Full function in attachment.
(In reply to ux from comment #15)
> My bad. So this block?
> Full function in attachment.

Ok .. nearly there. Could you please disassemble the function using the normal AT&T syntax rather than the Intel syntax, so I don't have to hand-translate it to AT&T syntax myself in order to have a test case? Thanks.
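For reference, the listings in this thread have the layout that GNU objdump produces, and objdump can emit either syntax. A minimal sketch, assuming GNU binutils is installed; the helper names `disasm_att` and `disasm_intel` and the `OBJDUMP` variable are hypothetical, and nothing in the thread confirms which tool was actually used.

```shell
#!/bin/sh
# Sketch only: thin wrappers around GNU objdump for the two x86 syntaxes.
# OBJDUMP is an assumed override for the objdump binary to use.
OBJDUMP=${OBJDUMP:-objdump}

# Disassemble in AT&T syntax (objdump's default on x86; -M att makes it explicit).
disasm_att() {
    "$OBJDUMP" -d -M att "$@"
}

# Disassemble in Intel syntax, as in the listings pasted in this thread.
disasm_intel() {
    "$OBJDUMP" -d -M intel "$@"
}
```

Extra objdump options pass straight through. For example, `disasm_att --start-address=0x1e978 --stop-address=0x1ea2d obj.o` (object file name hypothetical) would restrict the output to the block pasted in the previous comment.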
Created attachment 104959: ff_vp9_idct_iadst_16x16_add_avx2 (AT&T)
Try this. Does it help?

Index: priv/guest_amd64_toIR.c
===================================================================
--- priv/guest_amd64_toIR.c (revision 3345)
+++ priv/guest_amd64_toIR.c (working copy)
@@ -28152,6 +28152,7 @@
             )
          );
          *uses_vvvv = True;
+         dres->hint = Dis_HintVerbose;
          goto decode_success;
       }
       break;
Seems to work, all our VP9 tests pass with this. Thanks!
(In reply to ux from comment #19)
> Seems to work, all our VP9 tests pass with this. Thanks!

Committed as vex r3346.