Bug 398545

Summary:	Support for SHA instruction on Ryzen
Product:	[Developer tools] valgrind	Reporter:	Eric Hoffman <ehoffman>
Component:	vex	Assignee:	Julian Seward <jseward>
Status:	REPORTED ---
Severity:	normal	CC:	hy110001, michal.privoznik, pjfloyd, rurban, sam, tom
Priority:	NOR
Version First Reported In:	3.14 SVN
Target Milestone:	---
Platform:	Other
OS:	Linux
Latest Commit:		Version Fixed/Implemented In:
Sentry Crash Report:

Description Eric Hoffman 2018-09-12 15:18:17 UTC

Setup:

Ryzen 2700X

Issue:

Two issues, really.

First, the CPUID instruction does not report the SHA extension bit (CPUID leaf 7, subleaf 0, EBX bit 29).

For CPUID leaf 7, subleaf 0, my CPU returns (when NOT running under Valgrind): EBX:0x209C01A9
and, when running under Valgrind: EBX:0x000027AB

So the detection code does not detect SHA extension support.


Second, when the SHA instructions are forced to execute, Valgrind aborts with:

vex amd64->IR: unhandled instruction bytes: 0xF 0x38 0xCB 0xCA 0xC5 0xF8 0x28 0xC1 0xC5 0xF8
vex amd64->IR:   REX=0 REX.W=0 REX.R=0 REX.X=0 REX.B=0
vex amd64->IR:   VEX=0 VEX.L=0 VEX.nVVVV=0x0 ESC=0F38
vex amd64->IR:   PFX.66=0 PFX.F2=0 PFX.F3=0
==32224== valgrind: Unrecognised instruction at address 0x11cdd3.

In this case, it's the SHA256RNDS2 instruction (0F 38 CB) that triggers the abort.

Regards,
Eric

Comment 1 Tom Hughes 2018-09-12 15:35:54 UTC

That's why we remove that flag from the CPUID response - because we don't support it.

Comment 2 Eric Hoffman 2018-09-12 17:48:32 UTC

OK, that answers question 1 (by design), but question 2 still remains (and the CPUID bit would certainly follow once that is fixed).

I have not looked at the code yet, but is there a reason why it's not supported?  Is it because of implementation issues, or because it's "just not yet implemented"?

I guess this should then be classified as a feature request rather than a bug...

Best regards,
Eric

Comment 3 Tom Hughes 2018-09-12 18:15:11 UTC

Well the most obvious would be because nobody has submitted an implementation yet...

If that's an AMD-specific instruction, then in general I'm not sure we have much in the way of support for those; I don't know how much of that is deliberate and how much is just that the Intel ones are much more popular.

Comment 4 Reini Urban 2019-12-13 18:25:21 UTC

These new SHA extensions are supported on AMD since EPYC and on Intel since Goldmont (2017); comparable SHA instructions also exist on recent ARM cores and POWER8.

https://software.intel.com/en-us/articles/intel-sha-extensions

How do we add VEX support for it? It sounds trivial.
binutils/objdump has been able to decode these instructions for a long time.

Comment 5 Reini Urban 2019-12-13 19:46:19 UTC

> How to add vex support for it? Sounds trivial.
> binutils/objdump can do it for a long time.

I started with that at https://github.com/rurban/valgrind
Linux names the flag sha_ni, FreeBSD SHA1,SHA2;
on Windows it's Family 3, CPU model >= 92 on Intel and CPU model >= 23 on AMD.

But a 30-minute self-introduction to the code is certainly not enough for me to add the necessary logic stubs. I don't think much logic is needed; it should be similar to the AESDEC and CRC32 instructions, which are already supported.
I haven't even found where the hwcaps are set.

Comment 6 Michal Prívozník 2020-06-06 09:11:29 UTC

Just bought a new machine (Ryzen 9 3900X) and hit exactly this bug. I've tried to write a patch, but my VEX skills are poor.

Comment 7 Macro Hoober 2021-01-09 08:01:04 UTC

Another case:

Setup:

processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 23
model           : 49
model name      : AMD EPYC 7302P 16-Core Processor
stepping        : 0
microcode       : 0x8301034
cpu MHz         : 1499.828
cache size      : 512 KB
physical id     : 0
siblings        : 32
core id         : 0
cpu cores       : 16
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 16
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate sme ssbd ibrs ibpb stibp
vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists
pausefilter pfthreshold avic v_vmsave_vmload vgif umip rdpid overflow_recov succor smca
bugs            : sysret_ss_attrs spectre_v1 spectre_v2 spec_store_bypass
bogomips        : 6000.01
TLB size        : 3072 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 43 bits physical, 48 bits virtual
power management: ts ttp tm hwpstate cpb eff_freq_ro [13] [14]


Issue:

vex amd64->IR: unhandled instruction bytes: 0xF 0x38 0xCC 0xFA 0xF 0x38 0xCB 0xD9 0xC5 0xF9
vex amd64->IR:   REX=0 REX.W=0 REX.R=0 REX.X=0 REX.B=0
vex amd64->IR:   VEX=0 VEX.L=0 VEX.nVVVV=0x0 ESC=0F38
vex amd64->IR:   PFX.66=0 PFX.F2=0 PFX.F3=0
==10543== valgrind: Unrecognised instruction at address 0x1b9fbcc.
==10543==    at 0x1B9FBCC: _mm_sha256msg1_epu32 (sha.rs:100)

Comment 8 Macro Hoober 2021-01-09 08:06:52 UTC

(In reply to Reini Urban from comment #5)
> > How to add vex support for it? Sounds trivial.
> > binutils/objdump can do it for a long time.
> 
> I started with that at https://github.com/rurban/valgrind
> linux names it sha_ni, freebsd SHA1,SHA2, 
> on Windows it's Family 3, cpu Model >= 92 on Intel and cpu Model >= 23 on
> amd.
> 
> But for adding the necessary logic stubs my 30 min self-intro into the code
> is certainly not enough. There shouldn't be much logic needed I think.
> Similar to the aesdec and crc insn, which do exist already. 
> I haven't even found the location where hwcaps are set.

Thanks for the great work! I tested the patch "amd64: WIP start implementing the amd64 SHA extensions" on top of 3.16.1 and got the errors shown in the following comments. Patch:
https://github.com/rurban/valgrind/commit/f0fc15e32bba3fdd9d84e1ea7fd44916c4ff3d54