| Summary: | Temporary storage exhausted when a long sequence of vfmadd231ps instructions is executed | ||
|---|---|---|---|
| Product: | [Developer tools] valgrind | Reporter: | Jacek Czaja <jacek.czaja> |
| Component: | general | Assignee: | Julian Seward <jseward> |
| Status: | RESOLVED FIXED | ||
| Severity: | crash | CC: | heetahke, jacek.czaja, nagendra.goel, nmanjofo, o1o2o3o4o5, rsbultje |
| Priority: | NOR | ||
| Version First Reported In: | 3.12.0 | ||
| Target Milestone: | --- | ||
| Platform: | Compiled Sources | ||
| OS: | Linux | ||
| Latest Commit: | Version Fixed/Implemented In: | ||
| Sentry Crash Report: | |||
| Attachments: | output from GDB showing faulty instruction sequence + log from the valgrind run; ff_vp9_idct_iadst_16x16_add_avx2; ff_vp9_idct_iadst_16x16_add_avx2 (AT&T) | ||
Description
Jacek Czaja
2017-02-01 12:11:47 UTC
Yes, VEX has a very poor (verbose) translation for such instructions and generates huge amounts of code, which breaks the JIT. We should fix this somehow.

*** Bug 377159 has been marked as a duplicate of this bug. ***

*** Bug 375150 has been marked as a duplicate of this bug. ***

I should add: as a workaround, you can try specifying --vex-guest-max-insns=25 and, if that still doesn't work, lowering the value towards zero. You shouldn't go below about 10. Lower values reduce performance and increase the risk of false errors. The default value is 50.

*** Bug 378068 has been marked as a duplicate of this bug. ***

The proposed workaround with --vex-guest-max-insns=25 worked for me. Thanks.

(In reply to Julian Seward from comment #1)
> VEX has a very poor (verbose) translation for such instructions [..]

VEX r3331 somewhat improves this, reducing the size of the generated code to about 75% of what it was before. Better than nothing.

VEX r3335 further improves the situation a bit. I would be interested to hear if this makes it possible to run these problematic cases without the workaround in comment #4 (the use of --vex-guest-max-insns=25). The only convincing fix is to rewrite the amd64 front-end translation for FMA instructions to use fewer, wider IROps. This isn't simple, though.

Fixed completely by vex r3337. You should now be able to throw any sequence of insns through the JIT without getting such failures and without using the workaround in comment #4. If the problems persist, please let me know ASAP.

We're still seeing the issue in FFmpeg. This is valgrind r16297 with vex r3344.
==15422== Memcheck, a memory error detector
==15422== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==15422== Using Valgrind-3.13.0.SVN and LibVEX; rerun with -h for copyright info
==15422== Command: /home/fate/workdirs/x86_64-archlinux-gcc-valgrindundef/build/ffmpeg -nostdin -nostats -cpuflags all -hwaccel none -threads 1 -thread_type frame+slice -i /home/fate/fate-suite/vp9-test-vectors/vp90-2-00-quantizer-01.webm -flags +bitexact -fflags +bitexact -f framemd5 -
==15422==
ffmpeg version N-85439-g03eb0515c1 Copyright (c) 2000-2017 the FFmpeg developers
built with gcc 6.3.1 (GCC) 20170306
configuration: --prefix=/home/fate/workdirs/x86_64-archlinux-gcc-valgrindundef/install --samples=/home/fate/fate-suite --enable-gpl --enable-memory-poisoning --enable-avresample --cc='ccache cc' --target-exec='/home/fate/src/valgrind/vg-in-place --error-exitcode=1 --malloc-fill=0xa2 --track-origins=yes --leak-check=full --gen-suppressions=all --suppressions=/home/fate/ffmpeg/tests/fate-valgrind.supp' --disable-stripping --disable-memory-poisoning
libavutil 55. 60.100 / 55. 60.100
libavcodec 57. 92.100 / 57. 92.100
libavformat 57. 72.100 / 57. 72.100
libavdevice 57. 7.100 / 57. 7.100
libavfilter 6. 84.101 / 6. 84.101
libavresample 3. 6. 0 / 3. 6. 0
libswscale 4. 7.100 / 4. 7.100
libswresample 2. 8.100 / 2. 8.100
libpostproc 54. 6.100 / 54. 6.100
VEX temporary storage exhausted.
Pool = TEMP, start 0x38fbed68 curr 0x39477e28 end 0x394838a7 (size 5000000)
vex: the `impossible' happened:
VEX temporary storage exhausted.
Increase N_{TEMPORARY,PERMANENT}_BYTES and recompile.
vex storage: T total 946892552 bytes allocated
vex storage: P total 640 bytes allocated
valgrind: the 'impossible' happened:
LibVEX called failure_exit().
host stacktrace:
==15422== at 0x38085C13: show_sched_status_wrk (m_libcassert.c:378)
==15422== by 0x38085D14: report_and_quit (m_libcassert.c:449)
==15422== by 0x38085F51: panic (m_libcassert.c:525)
==15422== by 0x38085F51: vgPlain_core_panic_at (m_libcassert.c:530)
==15422== by 0x38085F7A: vgPlain_core_panic (m_libcassert.c:535)
==15422== by 0x380A1EE2: failure_exit (m_translate.c:740)
==15422== by 0x38153748: vpanic (main_util.c:231)
==15422== by 0x381537B4: private_LibVEX_alloc_OOM (main_util.c:171)
==15422== by 0x38179650: LibVEX_Alloc_inline (main_util.h:167)
==15422== by 0x38179650: doRegisterAllocation (host_generic_reg_alloc2.c:517)
==15422== by 0x3815167C: libvex_BackEnd (main_main.c:1122)
==15422== by 0x3815167C: LibVEX_Translate (main_main.c:1225)
==15422== by 0x380A4725: vgPlain_translate (m_translate.c:1770)
==15422== by 0x380DAD7B: handle_chain_me (scheduler.c:1080)
==15422== by 0x380DC73F: vgPlain_scheduler (scheduler.c:1424)
==15422== by 0x380EBB56: thread_wrapper (syswrap-linux.c:103)
==15422== by 0x380EBB56: run_a_thread_NORETURN (syswrap-linux.c:156)
sched status:
running_tid=1
Thread 1: status = VgTs_Runnable (lwpid 15422)
==15422== at 0xCC33BF: ff_vp9_iadst_iadst_16x16_add_avx2 (vp9itxfm.asm:2151)
==15422== by 0xB7D794: ff_vp9_decode_block (vp9block.c:1387)
==15422== by 0x1: ???
==15422== by 0x70CB63F: ???
==15422== by 0xB: ???
Note: see also the FAQ in the source distribution.
It contains workarounds to several common problems.
In particular, if Valgrind aborted or crashed after
identifying problems in your program, there's a good chance
that fixing those problems will prevent Valgrind aborting or
crashing, especially if it happened in m_mallocfree.c.
If that doesn't help, please report this bug to: www.valgrind.org
In the bug report, send all the above text, the valgrind
version, and what OS and version you are using. Thanks.
Uh, I thought I fixed it pretty comprehensively in r3344. Anyway, could you please get me a copy of the basic block that contains the failing address?

at 0xCC33BF: ff_vp9_iadst_iadst_16x16_add_avx2 (vp9itxfm.asm:2151)

Not just starting from there, but backing up to where the block starts, and all the way to the conditional or indirect branch that finishes it. And .. please .. disassembly? That way I don't have to wade through layers of assembler macros to figure out what the actual instructions are.

Note: the issue only happens when using --track-origins=yes.
Not sure if that's really what you are asking for, but with the following similar but simpler case:
☭ /home/ux/src/valgrind/vg-in-place --track-origins=yes tests/checkasm/checkasm --test=vp9dsp
==5547== Memcheck, a memory error detector
==5547== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==5547== Using Valgrind-3.13.0.SVN and LibVEX; rerun with -h for copyright info
==5547== Command: tests/checkasm/checkasm --test=vp9dsp
==5547==
checkasm: using random seed 2405440599
[...]
AVX2:
- vp9dsp.ipred [OK]
VEX temporary storage exhausted.
Pool = TEMP, start 0x38fbed68 curr 0x3946ee10 end 0x394838a7 (size 5000000)
vex: the `impossible' happened:
VEX temporary storage exhausted.
Increase N_{TEMPORARY,PERMANENT}_BYTES and recompile.
vex storage: T total 3774583552 bytes allocated
vex storage: P total 640 bytes allocated
valgrind: the 'impossible' happened:
LibVEX called failure_exit().
host stacktrace:
==5547== at 0x38085C13: show_sched_status_wrk (m_libcassert.c:378)
==5547== by 0x38085D14: report_and_quit (m_libcassert.c:449)
==5547== by 0x38085F51: panic (m_libcassert.c:525)
==5547== by 0x38085F51: vgPlain_core_panic_at (m_libcassert.c:530)
==5547== by 0x38085F7A: vgPlain_core_panic (m_libcassert.c:535)
==5547== by 0x380A1EE2: failure_exit (m_translate.c:740)
==5547== by 0x38153748: vpanic (main_util.c:231)
==5547== by 0x381537B4: private_LibVEX_alloc_OOM (main_util.c:171)
==5547== by 0x3818D15D: LibVEX_Alloc_inline (main_util.h:167)
==5547== by 0x3818D15D: addHInstr_SLOW (host_generic_regs.c:300)
==5547== by 0x3817A9A1: doRegisterAllocation (host_generic_reg_alloc2.c:1550)
==5547== by 0x3815167C: libvex_BackEnd (main_main.c:1122)
==5547== by 0x3815167C: LibVEX_Translate (main_main.c:1225)
==5547== by 0x380A4725: vgPlain_translate (m_translate.c:1770)
==5547== by 0x380DAD7B: handle_chain_me (scheduler.c:1080)
==5547== by 0x380DC73F: vgPlain_scheduler (scheduler.c:1424)
==5547== by 0x380EBB56: thread_wrapper (syswrap-linux.c:103)
==5547== by 0x380EBB56: run_a_thread_NORETURN (syswrap-linux.c:156)
sched status:
running_tid=1
Thread 1: status = VgTs_Runnable (lwpid 5547)
==5547== at 0x5BF9BE: ff_vp9_idct_iadst_16x16_add_avx2 (vp9itxfm.asm:2149)
==5547== by 0x42DD6A: checkasm_checked_call_emms (checkasm.asm:243)
The SIMD function:
000000000001df30 <ff_vp9_idct_iadst_16x16_add_avx2>:
1df30: c5 fd 6f 4a 20 vmovdqa ymm1,YMMWORD PTR [rdx+0x20]
1df35: c5 fd 6f 52 40 vmovdqa ymm2,YMMWORD PTR [rdx+0x40]
1df3a: c5 fd 6f 5a 60 vmovdqa ymm3,YMMWORD PTR [rdx+0x60]
1df3f: c5 fd 6f aa a0 00 00 vmovdqa ymm5,YMMWORD PTR [rdx+0xa0]
1df46: 00
1df47: c5 fd 6f b2 c0 00 00 vmovdqa ymm6,YMMWORD PTR [rdx+0xc0]
1df4e: 00
1df4f: c5 fd 6f ba e0 00 00 vmovdqa ymm7,YMMWORD PTR [rdx+0xe0]
1df56: 00
1df57: c5 7d 6f 82 00 01 00 vmovdqa ymm8,YMMWORD PTR [rdx+0x100]
1df5e: 00
1df5f: c5 7d 6f 8a 20 01 00 vmovdqa ymm9,YMMWORD PTR [rdx+0x120]
1df66: 00
1df67: c5 7d 6f 92 40 01 00 vmovdqa ymm10,YMMWORD PTR [rdx+0x140]
1df6e: 00
1df6f: c5 7d 6f 9a 60 01 00 vmovdqa ymm11,YMMWORD PTR [rdx+0x160]
1df76: 00
1df77: c5 7d 6f a2 80 01 00 vmovdqa ymm12,YMMWORD PTR [rdx+0x180]
1df7e: 00
1df7f: c5 7d 6f aa a0 01 00 vmovdqa ymm13,YMMWORD PTR [rdx+0x1a0]
1df86: 00
1df87: c5 7d 6f b2 c0 01 00 vmovdqa ymm14,YMMWORD PTR [rdx+0x1c0]
1df8e: 00
1df8f: c5 7d 6f ba e0 01 00 vmovdqa ymm15,YMMWORD PTR [rdx+0x1e0]
1df96: 00
1df97: c5 85 69 c1 vpunpckhwd ymm0,ymm15,ymm1
1df9b: c5 05 61 f9 vpunpcklwd ymm15,ymm15,ymm1
[... only v* inst and a few lea ...]
1f3f4: c5 f8 77 vzeroupper
1f3f7: c3 ret
1f3f8: 0f 1f 84 00 00 00 00 nop DWORD PTR [rax+rax*1+0x0]
1f3ff: 00
and its caller:
0000000000000000 <checkasm_stack_clobber>:
0: 48 81 ec a8 00 00 00 sub rsp,0xa8
7: 48 c7 c6 a0 00 00 00 mov rsi,0xa0
000000000000000e <checkasm_stack_clobber.loop>:
e: 48 89 3c 34 mov QWORD PTR [rsp+rsi*1],rdi
12: 48 83 ee 08 sub rsi,0x8
16: 7d f6 jge e <checkasm_stack_clobber.loop>
18: 48 81 c4 a8 00 00 00 add rsp,0xa8
1f: c3 ret
0000000000000020 <checkasm_checked_call>:
20: 53 push rbx
21: 55 push rbp
22: 41 54 push r12
24: 41 55 push r13
26: 41 56 push r14
28: 41 57 push r15
2a: 48 81 ec 88 00 00 00 sub rsp,0x88
31: 49 89 fa mov r10,rdi
34: 48 8b bc 24 c0 00 00 mov rdi,QWORD PTR [rsp+0xc0]
3b: 00
3c: 48 8b b4 24 c8 00 00 mov rsi,QWORD PTR [rsp+0xc8]
43: 00
44: 48 8b 94 24 d0 00 00 mov rdx,QWORD PTR [rsp+0xd0]
4b: 00
4c: 48 8b 8c 24 d8 00 00 mov rcx,QWORD PTR [rsp+0xd8]
53: 00
54: 4c 8b 84 24 e0 00 00 mov r8,QWORD PTR [rsp+0xe0]
5b: 00
5c: 4c 8b 8c 24 e8 00 00 mov r9,QWORD PTR [rsp+0xe8]
63: 00
64: 48 8b 9c 24 f0 00 00 mov rbx,QWORD PTR [rsp+0xf0]
6b: 00
6c: 48 89 1c 24 mov QWORD PTR [rsp],rbx
70: 48 8b 9c 24 f8 00 00 mov rbx,QWORD PTR [rsp+0xf8]
77: 00
78: 48 89 5c 24 08 mov QWORD PTR [rsp+0x8],rbx
7d: 48 8b 9c 24 00 01 00 mov rbx,QWORD PTR [rsp+0x100]
84: 00
85: 48 89 5c 24 10 mov QWORD PTR [rsp+0x10],rbx
8a: 48 8b 9c 24 08 01 00 mov rbx,QWORD PTR [rsp+0x108]
91: 00
92: 48 89 5c 24 18 mov QWORD PTR [rsp+0x18],rbx
97: 48 8b 9c 24 10 01 00 mov rbx,QWORD PTR [rsp+0x110]
9e: 00
9f: 48 89 5c 24 20 mov QWORD PTR [rsp+0x20],rbx
a4: 48 8b 9c 24 18 01 00 mov rbx,QWORD PTR [rsp+0x118]
ab: 00
ac: 48 89 5c 24 28 mov QWORD PTR [rsp+0x28],rbx
b1: 48 8b 9c 24 20 01 00 mov rbx,QWORD PTR [rsp+0x120]
b8: 00
b9: 48 89 5c 24 30 mov QWORD PTR [rsp+0x30],rbx
be: 48 8b 9c 24 28 01 00 mov rbx,QWORD PTR [rsp+0x128]
c5: 00
c6: 48 89 5c 24 38 mov QWORD PTR [rsp+0x38],rbx
cb: 48 8b 9c 24 30 01 00 mov rbx,QWORD PTR [rsp+0x130]
d2: 00
d3: 48 89 5c 24 40 mov QWORD PTR [rsp+0x40],rbx
d8: 4c 8b 3c 25 00 00 00 mov r15,QWORD PTR ds:0x0
df: 00
e0: 4c 8b 34 25 00 00 00 mov r14,QWORD PTR ds:0x0
e7: 00
e8: 4c 8b 2c 25 00 00 00 mov r13,QWORD PTR ds:0x0
ef: 00
f0: 4c 8b 24 25 00 00 00 mov r12,QWORD PTR ds:0x0
f7: 00
f8: 48 8b 2c 25 00 00 00 mov rbp,QWORD PTR ds:0x0
ff: 00
100: 48 8b 1c 25 00 00 00 mov rbx,QWORD PTR ds:0x0
107: 00
108: 41 ff d2 call r10
10b: 4c 33 3c 25 00 00 00 xor r15,QWORD PTR ds:0x0
112: 00
113: 4d 09 ff or r15,r15
116: 4c 33 34 25 00 00 00 xor r14,QWORD PTR ds:0x0
11d: 00
11e: 4d 09 f7 or r15,r14
121: 4c 33 2c 25 00 00 00 xor r13,QWORD PTR ds:0x0
128: 00
129: 4d 09 ef or r15,r13
12c: 4c 33 24 25 00 00 00 xor r12,QWORD PTR ds:0x0
133: 00
134: 4d 09 e7 or r15,r12
137: 48 33 2c 25 00 00 00 xor rbp,QWORD PTR ds:0x0
13e: 00
13f: 49 09 ef or r15,rbp
142: 48 33 1c 25 00 00 00 xor rbx,QWORD PTR ds:0x0
149: 00
14a: 49 09 df or r15,rbx
14d: 74 1b je 16a <checkasm_checked_call.clobber_ok>
14f: 48 89 c3 mov rbx,rax
152: 48 89 d5 mov rbp,rdx
155: 48 8d 3c 25 00 00 00 lea rdi,ds:0x0
15c: 00
15d: 31 c0 xor eax,eax
15f: e8 00 00 00 00 call 164 <checkasm_checked_call+0x144>
164: 48 89 ea mov rdx,rbp
167: 48 89 d8 mov rax,rbx
000000000000016a <checkasm_checked_call.clobber_ok>:
16a: 9b d9 34 24 fstenv [rsp]
16e: 66 81 7c 24 08 ff ff cmp WORD PTR [rsp+0x8],0xffff
175: 74 1d je 194 <checkasm_checked_call.emms_ok>
177: 48 89 c3 mov rbx,rax
17a: 48 89 d5 mov rbp,rdx
17d: 48 8d 3c 25 00 00 00 lea rdi,ds:0x0
184: 00
185: 31 c0 xor eax,eax
187: e8 00 00 00 00 call 18c <checkasm_checked_call.clobber_ok+0x22>
18c: 48 89 ea mov rdx,rbp
18f: 48 89 d8 mov rax,rbx
192: 0f 77 emms
0000000000000194 <checkasm_checked_call.emms_ok>:
194: 48 81 c4 88 00 00 00 add rsp,0x88
19b: 41 5f pop r15
19d: 41 5e pop r14
19f: 41 5d pop r13
1a1: 41 5c pop r12
1a3: 5d pop rbp
1a4: 5b pop rbx
1a5: c3 ret
1a6: 66 66 0f 1f 84 00 00 data16 nop WORD PTR [rax+rax*1+0x0]
1ad: 00 00 00
00000000000001b0 <checkasm_checked_call_emms>:
1b0: 53 push rbx
1b1: 55 push rbp
1b2: 41 54 push r12
1b4: 41 55 push r13
1b6: 41 56 push r14
1b8: 41 57 push r15
1ba: 48 81 ec 88 00 00 00 sub rsp,0x88
1c1: 49 89 fa mov r10,rdi
1c4: 48 8b bc 24 c0 00 00 mov rdi,QWORD PTR [rsp+0xc0]
1cb: 00
1cc: 48 8b b4 24 c8 00 00 mov rsi,QWORD PTR [rsp+0xc8]
1d3: 00
1d4: 48 8b 94 24 d0 00 00 mov rdx,QWORD PTR [rsp+0xd0]
1db: 00
1dc: 48 8b 8c 24 d8 00 00 mov rcx,QWORD PTR [rsp+0xd8]
1e3: 00
1e4: 4c 8b 84 24 e0 00 00 mov r8,QWORD PTR [rsp+0xe0]
1eb: 00
1ec: 4c 8b 8c 24 e8 00 00 mov r9,QWORD PTR [rsp+0xe8]
1f3: 00
1f4: 48 8b 9c 24 f0 00 00 mov rbx,QWORD PTR [rsp+0xf0]
1fb: 00
1fc: 48 89 1c 24 mov QWORD PTR [rsp],rbx
200: 48 8b 9c 24 f8 00 00 mov rbx,QWORD PTR [rsp+0xf8]
207: 00
208: 48 89 5c 24 08 mov QWORD PTR [rsp+0x8],rbx
20d: 48 8b 9c 24 00 01 00 mov rbx,QWORD PTR [rsp+0x100]
214: 00
215: 48 89 5c 24 10 mov QWORD PTR [rsp+0x10],rbx
21a: 48 8b 9c 24 08 01 00 mov rbx,QWORD PTR [rsp+0x108]
221: 00
222: 48 89 5c 24 18 mov QWORD PTR [rsp+0x18],rbx
227: 48 8b 9c 24 10 01 00 mov rbx,QWORD PTR [rsp+0x110]
22e: 00
22f: 48 89 5c 24 20 mov QWORD PTR [rsp+0x20],rbx
234: 48 8b 9c 24 18 01 00 mov rbx,QWORD PTR [rsp+0x118]
23b: 00
23c: 48 89 5c 24 28 mov QWORD PTR [rsp+0x28],rbx
241: 48 8b 9c 24 20 01 00 mov rbx,QWORD PTR [rsp+0x120]
248: 00
249: 48 89 5c 24 30 mov QWORD PTR [rsp+0x30],rbx
24e: 48 8b 9c 24 28 01 00 mov rbx,QWORD PTR [rsp+0x128]
255: 00
256: 48 89 5c 24 38 mov QWORD PTR [rsp+0x38],rbx
25b: 48 8b 9c 24 30 01 00 mov rbx,QWORD PTR [rsp+0x130]
262: 00
263: 48 89 5c 24 40 mov QWORD PTR [rsp+0x40],rbx
268: 4c 8b 3c 25 00 00 00 mov r15,QWORD PTR ds:0x0
26f: 00
270: 4c 8b 34 25 00 00 00 mov r14,QWORD PTR ds:0x0
277: 00
278: 4c 8b 2c 25 00 00 00 mov r13,QWORD PTR ds:0x0
27f: 00
280: 4c 8b 24 25 00 00 00 mov r12,QWORD PTR ds:0x0
287: 00
288: 48 8b 2c 25 00 00 00 mov rbp,QWORD PTR ds:0x0
28f: 00
290: 48 8b 1c 25 00 00 00 mov rbx,QWORD PTR ds:0x0
297: 00
298: 41 ff d2 call r10
29b: 4c 33 3c 25 00 00 00 xor r15,QWORD PTR ds:0x0
2a2: 00
2a3: 4d 09 ff or r15,r15
2a6: 4c 33 34 25 00 00 00 xor r14,QWORD PTR ds:0x0
2ad: 00
2ae: 4d 09 f7 or r15,r14
2b1: 4c 33 2c 25 00 00 00 xor r13,QWORD PTR ds:0x0
2b8: 00
2b9: 4d 09 ef or r15,r13
2bc: 4c 33 24 25 00 00 00 xor r12,QWORD PTR ds:0x0
2c3: 00
2c4: 4d 09 e7 or r15,r12
2c7: 48 33 2c 25 00 00 00 xor rbp,QWORD PTR ds:0x0
2ce: 00
2cf: 49 09 ef or r15,rbp
2d2: 48 33 1c 25 00 00 00 xor rbx,QWORD PTR ds:0x0
2d9: 00
2da: 49 09 df or r15,rbx
2dd: 74 1b je 2fa <checkasm_checked_call_emms.clobber_ok>
2df: 48 89 c3 mov rbx,rax
2e2: 48 89 d5 mov rbp,rdx
2e5: 48 8d 3c 25 00 00 00 lea rdi,ds:0x0
2ec: 00
2ed: 31 c0 xor eax,eax
2ef: e8 00 00 00 00 call 2f4 <checkasm_checked_call_emms+0x144>
2f4: 48 89 ea mov rdx,rbp
2f7: 48 89 d8 mov rax,rbx
00000000000002fa <checkasm_checked_call_emms.clobber_ok>:
2fa: 0f 77 emms
2fc: 48 81 c4 88 00 00 00 add rsp,0x88
303: 41 5f pop r15
305: 41 5e pop r14
307: 41 5d pop r13
309: 41 5c pop r12
30b: 5d pop rbp
30c: 5b pop rbx
30d: c3 ret
(In reply to ux from comment #12)
> Thread 1: status = VgTs_Runnable (lwpid 5547)
> ==5547== at 0x5BF9BE: ff_vp9_idct_iadst_16x16_add_avx2 (vp9itxfm.asm:2149)
> ==5547== by 0x42DD6A: checkasm_checked_call_emms (checkasm.asm:243)
> 000000000001df30 <ff_vp9_idct_iadst_16x16_add_avx2>:
> 1df30: c5 fd 6f 4a 20 vmovdqa ymm1,YMMWORD PTR [rdx+0x20]
> [..]
> 1df9b: c5 05 61 f9 vpunpcklwd ymm15,ymm15,ymm1
>
> [... only v* inst and a few lea ...]

Unfortunately, that elided part is exactly the bit I need to see. Specifically, I need the block containing the instruction at address xxxxx9BE. If I had to guess, it would be at 1e9be. If that's too complex, just send the entire disassembly of ff_vp9_idct_iadst_16x16_add_avx2 (but please attach it as a file; don't paste it as a comment).

Created attachment 104952
ff_vp9_idct_iadst_16x16_add_avx2
My bad. So this block?
1e978: c5 fd 72 e0 0e vpsrad ymm0,ymm0,0xe
1e97d: c5 cd 6b f5 vpackssdw ymm6,ymm6,ymm5
1e981: c5 2d 6b d0 vpackssdw ymm10,ymm10,ymm0
1e985: c4 41 45 fd fe vpaddw ymm15,ymm7,ymm14
1e98a: c5 0d f9 f7 vpsubw ymm14,ymm14,ymm7
1e98e: c4 c1 65 fd fd vpaddw ymm7,ymm3,ymm13
1e993: c5 15 f9 eb vpsubw ymm13,ymm13,ymm3
1e997: c5 fd 6f 2a vmovdqa ymm5,YMMWORD PTR [rdx]
1e99b: c5 fd 6f 82 80 00 00 vmovdqa ymm0,YMMWORD PTR [rdx+0x80]
1e9a2: 00
1e9a3: c5 fd 6f 62 20 vmovdqa ymm4,YMMWORD PTR [rdx+0x20]
1e9a8: c5 7d 6f 4a 40 vmovdqa ymm9,YMMWORD PTR [rdx+0x40]
1e9ad: c5 fd 6f 5a 60 vmovdqa ymm3,YMMWORD PTR [rdx+0x60]
1e9b2: c5 fd 7f 3a vmovdqa YMMWORD PTR [rdx],ymm7
1e9b6: c5 7d 7f ba 80 00 00 vmovdqa YMMWORD PTR [rdx+0x80],ymm15
1e9bd: 00
1e9be: c5 fd 7f 72 20 vmovdqa YMMWORD PTR [rdx+0x20],ymm6
1e9c3: c5 7d 7f 42 40 vmovdqa YMMWORD PTR [rdx+0x40],ymm8
1e9c8: c5 7d 7f 72 60 vmovdqa YMMWORD PTR [rdx+0x60],ymm14
1e9cd: c4 41 55 69 c4 vpunpckhwd ymm8,ymm5,ymm12
1e9d2: c4 c1 55 61 ec vpunpcklwd ymm5,ymm5,ymm12
1e9d7: c5 bd f5 3c 25 00 00 vpmaddwd ymm7,ymm8,YMMWORD PTR ds:0x0
1e9de: 00 00
1e9e0: c5 3d f5 04 25 00 00 vpmaddwd ymm8,ymm8,YMMWORD PTR ds:0x0
1e9e7: 00 00
1e9e9: c5 55 f5 24 25 00 00 vpmaddwd ymm12,ymm5,YMMWORD PTR ds:0x0
1e9f0: 00 00
1e9f2: c5 d5 f5 2c 25 00 00 vpmaddwd ymm5,ymm5,YMMWORD PTR ds:0x0
1e9f9: 00 00
1e9fb: c4 c1 65 69 f1 vpunpckhwd ymm6,ymm3,ymm9
1ea00: c4 c1 65 61 d9 vpunpcklwd ymm3,ymm3,ymm9
1ea05: c5 4d f5 34 25 00 00 vpmaddwd ymm14,ymm6,YMMWORD PTR ds:0x0
1ea0c: 00 00
1ea0e: c5 cd f5 34 25 00 00 vpmaddwd ymm6,ymm6,YMMWORD PTR ds:0x0
1ea15: 00 00
1ea17: c5 65 f5 0c 25 00 00 vpmaddwd ymm9,ymm3,YMMWORD PTR ds:0x0
1ea1e: 00 00
1ea20: c5 e5 f5 1c 25 00 00 vpmaddwd ymm3,ymm3,YMMWORD PTR ds:0x0
1ea27: 00 00
1ea29: c5 65 fe fd vpaddd ymm15,ymm3,ymm5
Full function in attachment.

(In reply to ux from comment #15)
> My bad. So this block?
> Full function in attachment.

Ok .. nearly there.
Could you please disassemble the function using the normal AT&T syntax rather than the Intel syntax, so I don't have to hand-translate it to AT&T syntax myself in order to have a test case? Thanks.

Created attachment 104959
ff_vp9_idct_iadst_16x16_add_avx2 (AT&T)
Try this. Does it help?
Index: priv/guest_amd64_toIR.c
===================================================================
--- priv/guest_amd64_toIR.c (revision 3345)
+++ priv/guest_amd64_toIR.c (working copy)
@@ -28152,6 +28152,7 @@
)
);
*uses_vvvv = True;
+ dres->hint = Dis_HintVerbose;
goto decode_success;
}
break;
Seems to work, all our VP9 tests pass with this. Thanks!

(In reply to ux from comment #19)
> Seems to work, all our VP9 tests pass with this. Thanks!

Committed as vex r3346.