Bug 375839

Summary: Temporary storage exhausted when a long sequence of vfmadd231ps instructions is executed
Product: [Developer tools] valgrind
Reporter: Jacek Czaja <jacek.czaja>
Component: general
Assignee: Julian Seward <jseward>
Status: RESOLVED FIXED
Severity: crash
CC: heetahke, jacek.czaja, nagendra.goel, nmanjofo, o1o2o3o4o5, rsbultje
Priority: NOR
Version: 3.12.0   
Target Milestone: ---   
Platform: Compiled Sources   
OS: Linux   
Attachments: output from GDB showing faulty instruction sequence + log from run valgrind
ff_vp9_idct_iadst_16x16_add_avx2
ff_vp9_idct_iadst_16x16_add_avx2 (AT&T)

Description Jacek Czaja 2017-02-01 12:11:47 UTC
Created attachment 103747
output from GDB showing faulty instruction sequence + log from run valgrind

I noticed a valgrind crash with the error "Temporary storage exhausted" when
there is a long sequence (50+) of instructions:

..
vfmadd231ps ymm0,ymm1,ymm2
vfmadd231ps ymm0,ymm1,ymm2
vfmadd231ps ymm0,ymm1,ymm2
vfmadd231ps ymm0,ymm1,ymm2
....

The attachment contains the full crash message and the sequence of instructions, as seen in GDB (Intel syntax), that triggered the crash.

This sequence of instructions was generated with the JIT assembler xbyak (https://github.com/herumi/xbyak). I can attach the Linux project if needed.

Notes:
- valgrind was built from source (3.12)
- command line used: <path to valgrind 3.12>/bin/valgrind --tool=memcheck <my project with sequence of vfmadd231ps ymm0,ymm1,ymm2>
- Operating System: Fedora 21, uname -a:
Linux linux-brix 3.14.5 #1 SMP Fri Mar 13 16:27:51 CET 2015 x86_64 x86_64 x86_64 GNU/Linux
- without valgrind the program works fine (it does nothing, as it was stripped down to reproduce this problem)
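
For readers without the xbyak project, the reported pattern can be approximated in plain C. This is a minimal sketch, not the reporter's code: it assumes GCC or Clang on x86-64, uses the assembler's `.rept` directive to emit 64 back-to-back `vfmadd231ps` instructions in one straight-line run, and the function name `run_fma_chain` is made up for illustration. The registers are zeroed first so the result is predictable.

```c
#include <assert.h>

/* Runs 64 consecutive vfmadd231ps instructions (AT&T operand order for
   the Intel-syntax "vfmadd231ps ymm0,ymm1,ymm2" from the report).
   Returns 1 if the sequence ran, 0 if the CPU or compiler cannot run it. */
int run_fma_chain(void)
{
#if defined(__x86_64__) && defined(__GNUC__)
    float out = -1.0f;

    __builtin_cpu_init();
    if (!__builtin_cpu_supports("fma"))
        return 0;                 /* no FMA hardware: skip instead of faulting */

    __asm__ volatile(
        "vxorps %%ymm0, %%ymm0, %%ymm0\n\t"   /* accumulator = 0 */
        "vxorps %%ymm1, %%ymm1, %%ymm1\n\t"   /* multiplicand = 0 */
        "vxorps %%ymm2, %%ymm2, %%ymm2\n\t"   /* multiplier   = 0 */
        ".rept 64\n\t"
        "vfmadd231ps %%ymm2, %%ymm1, %%ymm0\n\t"
        ".endr\n\t"
        "vmovss %%xmm0, %0"                   /* store lane 0 of the result */
        : "=m"(out)
        :
        : "xmm0", "xmm1", "xmm2");

    assert(out == 0.0f);          /* 0 + 0*0 repeated 64 times is still 0 */
    return 1;
#else
    return 0;
#endif
}
```

Calling `run_fma_chain()` from `main` and running the binary under `valgrind --tool=memcheck` gives a test case of the same shape. Whether it actually exhausts the temporary pool depends on the valgrind/VEX revision in use.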
Comment 1 Julian Seward 2017-03-06 16:54:16 UTC
Yes, VEX has a very poor (verbose) translation for such instructions
and generates huge amounts of code, which breaks the JIT.  We should
fix this somehow.
Comment 2 Julian Seward 2017-03-06 17:05:16 UTC
*** Bug 377159 has been marked as a duplicate of this bug. ***
Comment 3 Julian Seward 2017-03-06 17:10:40 UTC
*** Bug 375150 has been marked as a duplicate of this bug. ***
Comment 4 Julian Seward 2017-03-27 13:10:45 UTC
I should add: as a workaround, you can try specifying

  --vex-guest-max-insns=25

and if that still doesn't work, lowering the value towards zero.
You shouldn't go below about 10.  Lower values reduce performance and
increase the risk of false errors.  The default value is 50.
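
As a sketch of how a test harness might apply this workaround, the helper below formats a memcheck command line with the flag. The function name, the program path passed in, and the accepted range of 10 to 50 are illustrative assumptions taken from the advice above (default 50, do not go below about 10); they are not Valgrind's own validation rules.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Formats a memcheck command line that applies the workaround.
   Returns 0 on success, -1 if max_insns is outside [10, 50] or the
   buffer is too small. */
int build_workaround_cmd(char *buf, size_t size, int max_insns,
                         const char *program)
{
    assert(buf != NULL && program != NULL);

    if (max_insns < 10 || max_insns > 50)
        return -1;                /* outside the range suggested in this comment */

    int n = snprintf(buf, size,
                     "valgrind --tool=memcheck --vex-guest-max-insns=%d %s",
                     max_insns, program);

    /* snprintf reports the length it wanted; treat truncation as failure. */
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

A caller would start at 25 and only retry with a smaller value if the crash persists, since lower values cost performance.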
Comment 5 Julian Seward 2017-03-27 13:11:46 UTC
*** Bug 378068 has been marked as a duplicate of this bug. ***
Comment 6 Eugene 2017-03-27 16:05:15 UTC
The proposed workaround with --vex-guest-max-insns=25 worked for me.

Thanks
Comment 7 Julian Seward 2017-03-27 18:33:29 UTC
(In reply to Julian Seward from comment #1)
> VEX has a very poor (verbose) translation for such instructions [..]

VEX r3331 somewhat improves this, reducing the size of the generated code
to about 75% of what it was before.  Better than nothing.
Comment 8 Julian Seward 2017-03-28 15:00:28 UTC
VEX r3335 further improves the situation a bit.  I would be interested
to hear if this makes it possible to run these problematic cases without
the workaround in comment #4 (the use of --vex-guest-max-insns=25).

The only convincing fix though is to rewrite the amd64 front end translation
for FMA instructions to use fewer, wider IROps.  This isn't simple, though.
Comment 9 Julian Seward 2017-03-29 16:15:07 UTC
Fixed completely by vex r3337.  You should now be able to throw any
sequence of insns through the JIT without getting such failures and
without using the workaround in comment 4.  If the problems persist,
please let me know ASAP.
Comment 10 Ronald S. Bultje 2017-04-10 11:21:48 UTC
We're still seeing the issue in FFmpeg. This is valgrind r16297 with vex r3344.

==15422== Memcheck, a memory error detector
==15422== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==15422== Using Valgrind-3.13.0.SVN and LibVEX; rerun with -h for copyright info
==15422== Command: /home/fate/workdirs/x86_64-archlinux-gcc-valgrindundef/build/ffmpeg -nostdin -nostats -cpuflags all -hwaccel none -threads 1 -thread_type frame+slice -i /home/fate/fate-suite/vp9-test-vectors/vp90-2-00-quantizer-01.webm -flags +bitexact -fflags +bitexact -f framemd5 -
==15422== 
ffmpeg version N-85439-g03eb0515c1 Copyright (c) 2000-2017 the FFmpeg developers
  built with gcc 6.3.1 (GCC) 20170306
  configuration: --prefix=/home/fate/workdirs/x86_64-archlinux-gcc-valgrindundef/install --samples=/home/fate/fate-suite --enable-gpl --enable-memory-poisoning --enable-avresample --cc='ccache cc' --target-exec='/home/fate/src/valgrind/vg-in-place --error-exitcode=1 --malloc-fill=0xa2 --track-origins=yes --leak-check=full --gen-suppressions=all --suppressions=/home/fate/ffmpeg/tests/fate-valgrind.supp' --disable-stripping --disable-memory-poisoning
  libavutil      55. 60.100 / 55. 60.100
  libavcodec     57. 92.100 / 57. 92.100
  libavformat    57. 72.100 / 57. 72.100
  libavdevice    57.  7.100 / 57.  7.100
  libavfilter     6. 84.101 /  6. 84.101
  libavresample   3.  6.  0 /  3.  6.  0
  libswscale      4.  7.100 /  4.  7.100
  libswresample   2.  8.100 /  2.  8.100
  libpostproc    54.  6.100 / 54.  6.100
VEX temporary storage exhausted.
Pool = TEMP,  start 0x38fbed68 curr 0x39477e28 end 0x394838a7 (size 5000000)

vex: the `impossible' happened:
   VEX temporary storage exhausted.
Increase N_{TEMPORARY,PERMANENT}_BYTES and recompile.
vex storage: T total 946892552 bytes allocated
vex storage: P total 640 bytes allocated

valgrind: the 'impossible' happened:
   LibVEX called failure_exit().

host stacktrace:
==15422==    at 0x38085C13: show_sched_status_wrk (m_libcassert.c:378)
==15422==    by 0x38085D14: report_and_quit (m_libcassert.c:449)
==15422==    by 0x38085F51: panic (m_libcassert.c:525)
==15422==    by 0x38085F51: vgPlain_core_panic_at (m_libcassert.c:530)
==15422==    by 0x38085F7A: vgPlain_core_panic (m_libcassert.c:535)
==15422==    by 0x380A1EE2: failure_exit (m_translate.c:740)
==15422==    by 0x38153748: vpanic (main_util.c:231)
==15422==    by 0x381537B4: private_LibVEX_alloc_OOM (main_util.c:171)
==15422==    by 0x38179650: LibVEX_Alloc_inline (main_util.h:167)
==15422==    by 0x38179650: doRegisterAllocation (host_generic_reg_alloc2.c:517)
==15422==    by 0x3815167C: libvex_BackEnd (main_main.c:1122)
==15422==    by 0x3815167C: LibVEX_Translate (main_main.c:1225)
==15422==    by 0x380A4725: vgPlain_translate (m_translate.c:1770)
==15422==    by 0x380DAD7B: handle_chain_me (scheduler.c:1080)
==15422==    by 0x380DC73F: vgPlain_scheduler (scheduler.c:1424)
==15422==    by 0x380EBB56: thread_wrapper (syswrap-linux.c:103)
==15422==    by 0x380EBB56: run_a_thread_NORETURN (syswrap-linux.c:156)

sched status:
  running_tid=1

Thread 1: status = VgTs_Runnable (lwpid 15422)
==15422==    at 0xCC33BF: ff_vp9_iadst_iadst_16x16_add_avx2 (vp9itxfm.asm:2151)
==15422==    by 0xB7D794: ff_vp9_decode_block (vp9block.c:1387)
==15422==    by 0x1: ???
==15422==    by 0x70CB63F: ???
==15422==    by 0xB: ???


Note: see also the FAQ in the source distribution.
It contains workarounds to several common problems.
In particular, if Valgrind aborted or crashed after
identifying problems in your program, there's a good chance
that fixing those problems will prevent Valgrind aborting or
crashing, especially if it happened in m_mallocfree.c.

If that doesn't help, please report this bug to: www.valgrind.org

In the bug report, send all the above text, the valgrind
version, and what OS and version you are using.  Thanks.
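
The "Pool = TEMP" line in the log above can be decoded with a little arithmetic. The sketch below assumes that `end` is the address of the last byte of the pool (inclusive), which is consistent with the reported size of 5000000; the struct and function names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint64_t size;       /* total bytes in the pool */
    uint64_t used;       /* bytes handed out so far */
    uint64_t remaining;  /* bytes still available */
} PoolUsage;

/* Derives pool usage from the start/curr/end addresses in the log line. */
PoolUsage pool_usage(uint64_t start, uint64_t curr, uint64_t end)
{
    assert(start <= curr && curr <= end + 1);

    PoolUsage u;
    u.size = end - start + 1;    /* end is treated as inclusive */
    u.used = curr - start;
    u.remaining = u.size - u.used;
    return u;
}
```

With the values from this log (start 0x38fbed68, curr 0x39477e28, end 0x394838a7) this gives a 5000000-byte pool with 4952256 bytes used and 47744 bytes left, so the allocation that failed in doRegisterAllocation presumably asked for more than that remainder.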
Comment 11 Julian Seward 2017-04-10 11:38:13 UTC
Uh, I thought I fixed it pretty comprehensively in r3344.  Anyway,
can you get me please a copy of the basic block that contains the
failing address?

  at 0xCC33BF: ff_vp9_iadst_iadst_16x16_add_avx2 (vp9itxfm.asm:2151)

Not just starting from there, but backing up to where the block starts
and all the way to a conditional or indirect branch that finishes it.
And .. please .. disassembly?  So I don't have to wade through layers
of assembler macros to figure out what the actual instructions are.
Comment 12 ux 2017-04-10 15:30:31 UTC
Note: the issue only happens when using --track-origins=yes

Not sure if that's really what you are asking for, but with the following similar but simpler case:

☭ /home/ux/src/valgrind/vg-in-place --track-origins=yes tests/checkasm/checkasm --test=vp9dsp
==5547== Memcheck, a memory error detector                                                                 
==5547== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==5547== Using Valgrind-3.13.0.SVN and LibVEX; rerun with -h for copyright info
==5547== Command: tests/checkasm/checkasm --test=vp9dsp
==5547== 
checkasm: using random seed 2405440599
[...]
AVX2:
 - vp9dsp.ipred      [OK]
VEX temporary storage exhausted.
Pool = TEMP,  start 0x38fbed68 curr 0x3946ee10 end 0x394838a7 (size 5000000)

vex: the `impossible' happened:
   VEX temporary storage exhausted.
Increase N_{TEMPORARY,PERMANENT}_BYTES and recompile.
vex storage: T total 3774583552 bytes allocated
vex storage: P total 640 bytes allocated

valgrind: the 'impossible' happened:
   LibVEX called failure_exit().

host stacktrace:
==5547==    at 0x38085C13: show_sched_status_wrk (m_libcassert.c:378)
==5547==    by 0x38085D14: report_and_quit (m_libcassert.c:449)
==5547==    by 0x38085F51: panic (m_libcassert.c:525)
==5547==    by 0x38085F51: vgPlain_core_panic_at (m_libcassert.c:530)
==5547==    by 0x38085F7A: vgPlain_core_panic (m_libcassert.c:535)
==5547==    by 0x380A1EE2: failure_exit (m_translate.c:740)
==5547==    by 0x38153748: vpanic (main_util.c:231)
==5547==    by 0x381537B4: private_LibVEX_alloc_OOM (main_util.c:171)
==5547==    by 0x3818D15D: LibVEX_Alloc_inline (main_util.h:167)
==5547==    by 0x3818D15D: addHInstr_SLOW (host_generic_regs.c:300)
==5547==    by 0x3817A9A1: doRegisterAllocation (host_generic_reg_alloc2.c:1550)
==5547==    by 0x3815167C: libvex_BackEnd (main_main.c:1122)
==5547==    by 0x3815167C: LibVEX_Translate (main_main.c:1225)
==5547==    by 0x380A4725: vgPlain_translate (m_translate.c:1770)
==5547==    by 0x380DAD7B: handle_chain_me (scheduler.c:1080)
==5547==    by 0x380DC73F: vgPlain_scheduler (scheduler.c:1424)
==5547==    by 0x380EBB56: thread_wrapper (syswrap-linux.c:103)
==5547==    by 0x380EBB56: run_a_thread_NORETURN (syswrap-linux.c:156)

sched status:
  running_tid=1

Thread 1: status = VgTs_Runnable (lwpid 5547)
==5547==    at 0x5BF9BE: ff_vp9_idct_iadst_16x16_add_avx2 (vp9itxfm.asm:2149)
==5547==    by 0x42DD6A: checkasm_checked_call_emms (checkasm.asm:243)


The SIMD function:

000000000001df30 <ff_vp9_idct_iadst_16x16_add_avx2>:
   1df30:	c5 fd 6f 4a 20       	vmovdqa ymm1,YMMWORD PTR [rdx+0x20]
   1df35:	c5 fd 6f 52 40       	vmovdqa ymm2,YMMWORD PTR [rdx+0x40]
   1df3a:	c5 fd 6f 5a 60       	vmovdqa ymm3,YMMWORD PTR [rdx+0x60]
   1df3f:	c5 fd 6f aa a0 00 00 	vmovdqa ymm5,YMMWORD PTR [rdx+0xa0]
   1df46:	00 
   1df47:	c5 fd 6f b2 c0 00 00 	vmovdqa ymm6,YMMWORD PTR [rdx+0xc0]
   1df4e:	00 
   1df4f:	c5 fd 6f ba e0 00 00 	vmovdqa ymm7,YMMWORD PTR [rdx+0xe0]
   1df56:	00 
   1df57:	c5 7d 6f 82 00 01 00 	vmovdqa ymm8,YMMWORD PTR [rdx+0x100]
   1df5e:	00 
   1df5f:	c5 7d 6f 8a 20 01 00 	vmovdqa ymm9,YMMWORD PTR [rdx+0x120]
   1df66:	00 
   1df67:	c5 7d 6f 92 40 01 00 	vmovdqa ymm10,YMMWORD PTR [rdx+0x140]
   1df6e:	00 
   1df6f:	c5 7d 6f 9a 60 01 00 	vmovdqa ymm11,YMMWORD PTR [rdx+0x160]
   1df76:	00 
   1df77:	c5 7d 6f a2 80 01 00 	vmovdqa ymm12,YMMWORD PTR [rdx+0x180]
   1df7e:	00 
   1df7f:	c5 7d 6f aa a0 01 00 	vmovdqa ymm13,YMMWORD PTR [rdx+0x1a0]
   1df86:	00 
   1df87:	c5 7d 6f b2 c0 01 00 	vmovdqa ymm14,YMMWORD PTR [rdx+0x1c0]
   1df8e:	00 
   1df8f:	c5 7d 6f ba e0 01 00 	vmovdqa ymm15,YMMWORD PTR [rdx+0x1e0]
   1df96:	00 
   1df97:	c5 85 69 c1          	vpunpckhwd ymm0,ymm15,ymm1
   1df9b:	c5 05 61 f9          	vpunpcklwd ymm15,ymm15,ymm1
   
   [... only v* inst and a few lea ...]

   1f3f4:	c5 f8 77             	vzeroupper 
   1f3f7:	c3                   	ret    
   1f3f8:	0f 1f 84 00 00 00 00 	nop    DWORD PTR [rax+rax*1+0x0]
   1f3ff:	00 


and its caller:

0000000000000000 <checkasm_stack_clobber>:
   0:	48 81 ec a8 00 00 00 	sub    rsp,0xa8
   7:	48 c7 c6 a0 00 00 00 	mov    rsi,0xa0

000000000000000e <checkasm_stack_clobber.loop>:
   e:	48 89 3c 34          	mov    QWORD PTR [rsp+rsi*1],rdi
  12:	48 83 ee 08          	sub    rsi,0x8
  16:	7d f6                	jge    e <checkasm_stack_clobber.loop>
  18:	48 81 c4 a8 00 00 00 	add    rsp,0xa8
  1f:	c3                   	ret    

0000000000000020 <checkasm_checked_call>:
  20:	53                   	push   rbx
  21:	55                   	push   rbp
  22:	41 54                	push   r12
  24:	41 55                	push   r13
  26:	41 56                	push   r14
  28:	41 57                	push   r15
  2a:	48 81 ec 88 00 00 00 	sub    rsp,0x88
  31:	49 89 fa             	mov    r10,rdi
  34:	48 8b bc 24 c0 00 00 	mov    rdi,QWORD PTR [rsp+0xc0]
  3b:	00 
  3c:	48 8b b4 24 c8 00 00 	mov    rsi,QWORD PTR [rsp+0xc8]
  43:	00 
  44:	48 8b 94 24 d0 00 00 	mov    rdx,QWORD PTR [rsp+0xd0]
  4b:	00 
  4c:	48 8b 8c 24 d8 00 00 	mov    rcx,QWORD PTR [rsp+0xd8]
  53:	00 
  54:	4c 8b 84 24 e0 00 00 	mov    r8,QWORD PTR [rsp+0xe0]
  5b:	00 
  5c:	4c 8b 8c 24 e8 00 00 	mov    r9,QWORD PTR [rsp+0xe8]
  63:	00 
  64:	48 8b 9c 24 f0 00 00 	mov    rbx,QWORD PTR [rsp+0xf0]
  6b:	00 
  6c:	48 89 1c 24          	mov    QWORD PTR [rsp],rbx
  70:	48 8b 9c 24 f8 00 00 	mov    rbx,QWORD PTR [rsp+0xf8]
  77:	00 
  78:	48 89 5c 24 08       	mov    QWORD PTR [rsp+0x8],rbx
  7d:	48 8b 9c 24 00 01 00 	mov    rbx,QWORD PTR [rsp+0x100]
  84:	00 
  85:	48 89 5c 24 10       	mov    QWORD PTR [rsp+0x10],rbx
  8a:	48 8b 9c 24 08 01 00 	mov    rbx,QWORD PTR [rsp+0x108]
  91:	00 
  92:	48 89 5c 24 18       	mov    QWORD PTR [rsp+0x18],rbx
  97:	48 8b 9c 24 10 01 00 	mov    rbx,QWORD PTR [rsp+0x110]
  9e:	00 
  9f:	48 89 5c 24 20       	mov    QWORD PTR [rsp+0x20],rbx
  a4:	48 8b 9c 24 18 01 00 	mov    rbx,QWORD PTR [rsp+0x118]
  ab:	00 
  ac:	48 89 5c 24 28       	mov    QWORD PTR [rsp+0x28],rbx
  b1:	48 8b 9c 24 20 01 00 	mov    rbx,QWORD PTR [rsp+0x120]
  b8:	00 
  b9:	48 89 5c 24 30       	mov    QWORD PTR [rsp+0x30],rbx
  be:	48 8b 9c 24 28 01 00 	mov    rbx,QWORD PTR [rsp+0x128]
  c5:	00 
  c6:	48 89 5c 24 38       	mov    QWORD PTR [rsp+0x38],rbx
  cb:	48 8b 9c 24 30 01 00 	mov    rbx,QWORD PTR [rsp+0x130]
  d2:	00 
  d3:	48 89 5c 24 40       	mov    QWORD PTR [rsp+0x40],rbx
  d8:	4c 8b 3c 25 00 00 00 	mov    r15,QWORD PTR ds:0x0
  df:	00 
  e0:	4c 8b 34 25 00 00 00 	mov    r14,QWORD PTR ds:0x0
  e7:	00 
  e8:	4c 8b 2c 25 00 00 00 	mov    r13,QWORD PTR ds:0x0
  ef:	00 
  f0:	4c 8b 24 25 00 00 00 	mov    r12,QWORD PTR ds:0x0
  f7:	00 
  f8:	48 8b 2c 25 00 00 00 	mov    rbp,QWORD PTR ds:0x0
  ff:	00 
 100:	48 8b 1c 25 00 00 00 	mov    rbx,QWORD PTR ds:0x0
 107:	00 
 108:	41 ff d2             	call   r10
 10b:	4c 33 3c 25 00 00 00 	xor    r15,QWORD PTR ds:0x0
 112:	00 
 113:	4d 09 ff             	or     r15,r15
 116:	4c 33 34 25 00 00 00 	xor    r14,QWORD PTR ds:0x0
 11d:	00 
 11e:	4d 09 f7             	or     r15,r14
 121:	4c 33 2c 25 00 00 00 	xor    r13,QWORD PTR ds:0x0
 128:	00 
 129:	4d 09 ef             	or     r15,r13
 12c:	4c 33 24 25 00 00 00 	xor    r12,QWORD PTR ds:0x0
 133:	00 
 134:	4d 09 e7             	or     r15,r12
 137:	48 33 2c 25 00 00 00 	xor    rbp,QWORD PTR ds:0x0
 13e:	00 
 13f:	49 09 ef             	or     r15,rbp
 142:	48 33 1c 25 00 00 00 	xor    rbx,QWORD PTR ds:0x0
 149:	00 
 14a:	49 09 df             	or     r15,rbx
 14d:	74 1b                	je     16a <checkasm_checked_call.clobber_ok>
 14f:	48 89 c3             	mov    rbx,rax
 152:	48 89 d5             	mov    rbp,rdx
 155:	48 8d 3c 25 00 00 00 	lea    rdi,ds:0x0
 15c:	00 
 15d:	31 c0                	xor    eax,eax
 15f:	e8 00 00 00 00       	call   164 <checkasm_checked_call+0x144>
 164:	48 89 ea             	mov    rdx,rbp
 167:	48 89 d8             	mov    rax,rbx

000000000000016a <checkasm_checked_call.clobber_ok>:
 16a:	9b d9 34 24          	fstenv [rsp]
 16e:	66 81 7c 24 08 ff ff 	cmp    WORD PTR [rsp+0x8],0xffff
 175:	74 1d                	je     194 <checkasm_checked_call.emms_ok>
 177:	48 89 c3             	mov    rbx,rax
 17a:	48 89 d5             	mov    rbp,rdx
 17d:	48 8d 3c 25 00 00 00 	lea    rdi,ds:0x0
 184:	00 
 185:	31 c0                	xor    eax,eax
 187:	e8 00 00 00 00       	call   18c <checkasm_checked_call.clobber_ok+0x22>
 18c:	48 89 ea             	mov    rdx,rbp
 18f:	48 89 d8             	mov    rax,rbx
 192:	0f 77                	emms   

0000000000000194 <checkasm_checked_call.emms_ok>:
 194:	48 81 c4 88 00 00 00 	add    rsp,0x88
 19b:	41 5f                	pop    r15
 19d:	41 5e                	pop    r14
 19f:	41 5d                	pop    r13
 1a1:	41 5c                	pop    r12
 1a3:	5d                   	pop    rbp
 1a4:	5b                   	pop    rbx
 1a5:	c3                   	ret    
 1a6:	66 66 0f 1f 84 00 00 	data16 nop WORD PTR [rax+rax*1+0x0]
 1ad:	00 00 00 

00000000000001b0 <checkasm_checked_call_emms>:
 1b0:	53                   	push   rbx
 1b1:	55                   	push   rbp
 1b2:	41 54                	push   r12
 1b4:	41 55                	push   r13
 1b6:	41 56                	push   r14
 1b8:	41 57                	push   r15
 1ba:	48 81 ec 88 00 00 00 	sub    rsp,0x88
 1c1:	49 89 fa             	mov    r10,rdi
 1c4:	48 8b bc 24 c0 00 00 	mov    rdi,QWORD PTR [rsp+0xc0]
 1cb:	00 
 1cc:	48 8b b4 24 c8 00 00 	mov    rsi,QWORD PTR [rsp+0xc8]
 1d3:	00 
 1d4:	48 8b 94 24 d0 00 00 	mov    rdx,QWORD PTR [rsp+0xd0]
 1db:	00 
 1dc:	48 8b 8c 24 d8 00 00 	mov    rcx,QWORD PTR [rsp+0xd8]
 1e3:	00 
 1e4:	4c 8b 84 24 e0 00 00 	mov    r8,QWORD PTR [rsp+0xe0]
 1eb:	00 
 1ec:	4c 8b 8c 24 e8 00 00 	mov    r9,QWORD PTR [rsp+0xe8]
 1f3:	00 
 1f4:	48 8b 9c 24 f0 00 00 	mov    rbx,QWORD PTR [rsp+0xf0]
 1fb:	00 
 1fc:	48 89 1c 24          	mov    QWORD PTR [rsp],rbx
 200:	48 8b 9c 24 f8 00 00 	mov    rbx,QWORD PTR [rsp+0xf8]
 207:	00 
 208:	48 89 5c 24 08       	mov    QWORD PTR [rsp+0x8],rbx
 20d:	48 8b 9c 24 00 01 00 	mov    rbx,QWORD PTR [rsp+0x100]
 214:	00 
 215:	48 89 5c 24 10       	mov    QWORD PTR [rsp+0x10],rbx
 21a:	48 8b 9c 24 08 01 00 	mov    rbx,QWORD PTR [rsp+0x108]
 221:	00 
 222:	48 89 5c 24 18       	mov    QWORD PTR [rsp+0x18],rbx
 227:	48 8b 9c 24 10 01 00 	mov    rbx,QWORD PTR [rsp+0x110]
 22e:	00 
 22f:	48 89 5c 24 20       	mov    QWORD PTR [rsp+0x20],rbx
 234:	48 8b 9c 24 18 01 00 	mov    rbx,QWORD PTR [rsp+0x118]
 23b:	00 
 23c:	48 89 5c 24 28       	mov    QWORD PTR [rsp+0x28],rbx
 241:	48 8b 9c 24 20 01 00 	mov    rbx,QWORD PTR [rsp+0x120]
 248:	00 
 249:	48 89 5c 24 30       	mov    QWORD PTR [rsp+0x30],rbx
 24e:	48 8b 9c 24 28 01 00 	mov    rbx,QWORD PTR [rsp+0x128]
 255:	00 
 256:	48 89 5c 24 38       	mov    QWORD PTR [rsp+0x38],rbx
 25b:	48 8b 9c 24 30 01 00 	mov    rbx,QWORD PTR [rsp+0x130]
 262:	00 
 263:	48 89 5c 24 40       	mov    QWORD PTR [rsp+0x40],rbx
 268:	4c 8b 3c 25 00 00 00 	mov    r15,QWORD PTR ds:0x0
 26f:	00 
 270:	4c 8b 34 25 00 00 00 	mov    r14,QWORD PTR ds:0x0
 277:	00 
 278:	4c 8b 2c 25 00 00 00 	mov    r13,QWORD PTR ds:0x0
 27f:	00 
 280:	4c 8b 24 25 00 00 00 	mov    r12,QWORD PTR ds:0x0
 287:	00 
 288:	48 8b 2c 25 00 00 00 	mov    rbp,QWORD PTR ds:0x0
 28f:	00 
 290:	48 8b 1c 25 00 00 00 	mov    rbx,QWORD PTR ds:0x0
 297:	00 
 298:	41 ff d2             	call   r10
 29b:	4c 33 3c 25 00 00 00 	xor    r15,QWORD PTR ds:0x0
 2a2:	00 
 2a3:	4d 09 ff             	or     r15,r15
 2a6:	4c 33 34 25 00 00 00 	xor    r14,QWORD PTR ds:0x0
 2ad:	00 
 2ae:	4d 09 f7             	or     r15,r14
 2b1:	4c 33 2c 25 00 00 00 	xor    r13,QWORD PTR ds:0x0
 2b8:	00 
 2b9:	4d 09 ef             	or     r15,r13
 2bc:	4c 33 24 25 00 00 00 	xor    r12,QWORD PTR ds:0x0
 2c3:	00 
 2c4:	4d 09 e7             	or     r15,r12
 2c7:	48 33 2c 25 00 00 00 	xor    rbp,QWORD PTR ds:0x0
 2ce:	00 
 2cf:	49 09 ef             	or     r15,rbp
 2d2:	48 33 1c 25 00 00 00 	xor    rbx,QWORD PTR ds:0x0
 2d9:	00 
 2da:	49 09 df             	or     r15,rbx
 2dd:	74 1b                	je     2fa <checkasm_checked_call_emms.clobber_ok>
 2df:	48 89 c3             	mov    rbx,rax
 2e2:	48 89 d5             	mov    rbp,rdx
 2e5:	48 8d 3c 25 00 00 00 	lea    rdi,ds:0x0
 2ec:	00 
 2ed:	31 c0                	xor    eax,eax
 2ef:	e8 00 00 00 00       	call   2f4 <checkasm_checked_call_emms+0x144>
 2f4:	48 89 ea             	mov    rdx,rbp
 2f7:	48 89 d8             	mov    rax,rbx

00000000000002fa <checkasm_checked_call_emms.clobber_ok>:
 2fa:	0f 77                	emms   
 2fc:	48 81 c4 88 00 00 00 	add    rsp,0x88
 303:	41 5f                	pop    r15
 305:	41 5e                	pop    r14
 307:	41 5d                	pop    r13
 309:	41 5c                	pop    r12
 30b:	5d                   	pop    rbp
 30c:	5b                   	pop    rbx
 30d:	c3                   	ret
Comment 13 Julian Seward 2017-04-10 15:58:20 UTC
(In reply to ux from comment #12)

> Thread 1: status = VgTs_Runnable (lwpid 5547)
> ==5547==    at 0x5BF9BE: ff_vp9_idct_iadst_16x16_add_avx2 (vp9itxfm.asm:2149)
> ==5547==    by 0x42DD6A: checkasm_checked_call_emms (checkasm.asm:243)

> 000000000001df30 <ff_vp9_idct_iadst_16x16_add_avx2>:
>    1df30:	c5 fd 6f 4a 20       	vmovdqa ymm1,YMMWORD PTR [rdx+0x20]
> [..]
>    1df9b:	c5 05 61 f9          	vpunpcklwd ymm15,ymm15,ymm1
>    
>    [... only v* inst and a few lea ...]

This was unfortunately the bit I need to see.  Specifically, I need to
see the block containing the instruction at address xxxxx9BE.  If I had
to guess, it would be at 1e9be.  If too complex then just send the entire
disassembly of ff_vp9_idct_iadst_16x16_add_avx2 (but please attach as file,
don't paste as comment.)
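
The guess of 1e9be can be reproduced mechanically. This sketch assumes that the low 12 bits of an address (the offset within a 4 KiB page) are the same in the running process and in the disassembly listing, which is an assumption rather than something the thread confirms; the function name is illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Returns the unique offset in [func_start, func_end) whose low 12 bits
   match those of runtime_addr, or 0 if there is no match or more than one. */
uint64_t match_offset(uint64_t runtime_addr, uint64_t func_start,
                      uint64_t func_end)
{
    assert(func_start < func_end);

    uint64_t low = runtime_addr & 0xFFF;             /* e.g. 0x9BE */
    uint64_t cand = (func_start & ~0xFFFull) | low;  /* same page as start */

    if (cand < func_start)
        cand += 0x1000;           /* first candidate at or after the start */
    if (cand >= func_end)
        return 0;                 /* nothing in range */
    if (cand + 0x1000 < func_end)
        return 0;                 /* a second candidate exists: ambiguous */
    return cand;
}
```

For the failing address 0x5BF9BE and the function spanning 1df30 up to 1f3f8 in the listing, the only candidate is 0x1e9be.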
Comment 14 ux 2017-04-10 17:17:27 UTC
Created attachment 104952
ff_vp9_idct_iadst_16x16_add_avx2
Comment 15 ux 2017-04-10 17:18:13 UTC
My bad. So this block?

   1e978:       c5 fd 72 e0 0e          vpsrad ymm0,ymm0,0xe
   1e97d:       c5 cd 6b f5             vpackssdw ymm6,ymm6,ymm5
   1e981:       c5 2d 6b d0             vpackssdw ymm10,ymm10,ymm0
   1e985:       c4 41 45 fd fe          vpaddw ymm15,ymm7,ymm14
   1e98a:       c5 0d f9 f7             vpsubw ymm14,ymm14,ymm7
   1e98e:       c4 c1 65 fd fd          vpaddw ymm7,ymm3,ymm13
   1e993:       c5 15 f9 eb             vpsubw ymm13,ymm13,ymm3
   1e997:       c5 fd 6f 2a             vmovdqa ymm5,YMMWORD PTR [rdx]
   1e99b:       c5 fd 6f 82 80 00 00    vmovdqa ymm0,YMMWORD PTR [rdx+0x80]
   1e9a2:       00 
   1e9a3:       c5 fd 6f 62 20          vmovdqa ymm4,YMMWORD PTR [rdx+0x20]
   1e9a8:       c5 7d 6f 4a 40          vmovdqa ymm9,YMMWORD PTR [rdx+0x40]
   1e9ad:       c5 fd 6f 5a 60          vmovdqa ymm3,YMMWORD PTR [rdx+0x60]
   1e9b2:       c5 fd 7f 3a             vmovdqa YMMWORD PTR [rdx],ymm7
   1e9b6:       c5 7d 7f ba 80 00 00    vmovdqa YMMWORD PTR [rdx+0x80],ymm15
   1e9bd:       00 
   1e9be:       c5 fd 7f 72 20          vmovdqa YMMWORD PTR [rdx+0x20],ymm6
   1e9c3:       c5 7d 7f 42 40          vmovdqa YMMWORD PTR [rdx+0x40],ymm8
   1e9c8:       c5 7d 7f 72 60          vmovdqa YMMWORD PTR [rdx+0x60],ymm14
   1e9cd:       c4 41 55 69 c4          vpunpckhwd ymm8,ymm5,ymm12
   1e9d2:       c4 c1 55 61 ec          vpunpcklwd ymm5,ymm5,ymm12
   1e9d7:       c5 bd f5 3c 25 00 00    vpmaddwd ymm7,ymm8,YMMWORD PTR ds:0x0
   1e9de:       00 00 
   1e9e0:       c5 3d f5 04 25 00 00    vpmaddwd ymm8,ymm8,YMMWORD PTR ds:0x0
   1e9e7:       00 00 
   1e9e9:       c5 55 f5 24 25 00 00    vpmaddwd ymm12,ymm5,YMMWORD PTR ds:0x0
   1e9f0:       00 00 
   1e9f2:       c5 d5 f5 2c 25 00 00    vpmaddwd ymm5,ymm5,YMMWORD PTR ds:0x0
   1e9f9:       00 00 
   1e9fb:       c4 c1 65 69 f1          vpunpckhwd ymm6,ymm3,ymm9
   1ea00:       c4 c1 65 61 d9          vpunpcklwd ymm3,ymm3,ymm9
   1ea05:       c5 4d f5 34 25 00 00    vpmaddwd ymm14,ymm6,YMMWORD PTR ds:0x0
   1ea0c:       00 00 
   1ea0e:       c5 cd f5 34 25 00 00    vpmaddwd ymm6,ymm6,YMMWORD PTR ds:0x0
   1ea15:       00 00 
   1ea17:       c5 65 f5 0c 25 00 00    vpmaddwd ymm9,ymm3,YMMWORD PTR ds:0x0
   1ea1e:       00 00 
   1ea20:       c5 e5 f5 1c 25 00 00    vpmaddwd ymm3,ymm3,YMMWORD PTR ds:0x0
   1ea27:       00 00 
   1ea29:       c5 65 fe fd             vpaddd ymm15,ymm3,ymm5

Full function in attachment.
Comment 16 Julian Seward 2017-04-11 05:21:18 UTC
(In reply to ux from comment #15)
> My bad. So this block?
> Full function in attachment.

Ok .. nearly there.  Could you please disassemble the function using the
normal AT&T syntax rather than the Intel syntax, so I don't have to
hand-translate it to AT&T syntax myself in order to have a test case?
Thanks.
Comment 17 ux 2017-04-11 05:52:04 UTC
Created attachment 104959
ff_vp9_idct_iadst_16x16_add_avx2 (AT&T)
Comment 18 Julian Seward 2017-04-11 11:46:30 UTC
Try this.  Does it help?

Index: priv/guest_amd64_toIR.c
===================================================================
--- priv/guest_amd64_toIR.c	(revision 3345)
+++ priv/guest_amd64_toIR.c	(working copy)
@@ -28152,6 +28152,7 @@
             )
          );
          *uses_vvvv = True;
+         dres->hint = Dis_HintVerbose;
          goto decode_success;
       }
       break;
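
The thread does not spell out how the front end's hint is consumed. One plausible reading, offered here purely as an assumption, is that an instruction flagged as verbose lowers the number of guest instructions allowed in the current block, so a run of such instructions cannot produce more code than the JIT can hold. The toy model below illustrates that idea only; the enum, the function, and the cap of 30 used in the note afterwards are hypothetical and are not VEX code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-instruction hint set by the patch. */
typedef enum { HINT_NONE = 0, HINT_VERBOSE = 1 } Hint;

/* Returns how many instructions from hints[0..n) end up in the first
   block, given the normal limit and a lower limit that applies once a
   verbose instruction has been seen. */
int block_length(const Hint *hints, int n, int max_insns, int verbose_cap)
{
    assert(hints != NULL && n >= 0);
    assert(verbose_cap > 0 && verbose_cap <= max_insns);

    int limit = max_insns;
    int count = 0;

    while (count < n && count < limit) {
        if (hints[count] == HINT_VERBOSE && limit > verbose_cap)
            limit = verbose_cap;  /* tighten the budget for this block */
        count++;
    }
    return count;
}
```

In this model, with a default limit of 50 and an assumed cap of 30, a block made entirely of flagged instructions stops at 30 while ordinary code still gets the full 50.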
Comment 19 ux 2017-04-11 12:18:08 UTC
Seems to work, all our VP9 tests pass with this. Thanks!
Comment 20 Julian Seward 2017-04-11 16:35:37 UTC
(In reply to ux from comment #19)
> Seems to work, all our VP9 tests pass with this. Thanks!

Committed as vex r3346.