| Summary: | s390x: z14 vector instructions not implemented | | |
|---|---|---|---|
| Product: | [Developer tools] valgrind | Reporter: | Vadim Barkov <vbrkov> |
| Component: | vex | Assignee: | Andreas Arnez <arnez> |
| Status: | RESOLVED FIXED | | |
| Severity: | normal | CC: | arnez, jseward, mark, vbrkov |
| Priority: | NOR | | |
| Version: | unspecified | | |
| Target Milestone: | --- | | |
| Platform: | Other | | |
| OS: | Linux | | |
| Latest Commit: | | Version Fixed In: | |
| Sentry Crash Report: | | | |

Attachments:
- s390x: Support for z14 (arch12) vector instructions
- [v2] s390x: Support for z14 (arch12) vector instructions
- [v3] s390x: Support for z14 (arch12) vector instructions
Description
Vadim Barkov
2019-02-08 01:25:03 UTC
The vector decimal instructions are special because Valgrind doesn't even support any of the existing decimal instructions (chapter 8) yet. Thus I suggest splitting that out and limiting this Bugzilla to (1) the vector-enhancements facility 1, plus (2) the few instructions from the vector-packed decimal facility defined outside chapter 25. This means:

* New instructions:
  VBPERM -- "vector bit permute"
  VMSL -- "vector multiply sum logical"
  VNX -- "vector not exclusive or"
  VNN -- "vector nand"
  VOC -- "vector or with complement"
  VLRLR -- "vector load rightmost with length"
  VLRL -- "vector load rightmost with immediate length"
  VSTRLR -- "vector store rightmost with length"
  VSTRL -- "vector store rightmost with immediate length"
  VFMAX -- "vector fp maximum"
  VFMIN -- "vector fp minimum"
  VFNMA -- "vector fp negative multiply and add"
  VFNMS -- "vector fp negative multiply and subtract"

* New base mnemonic and new variants:
  VFLL -- "vector fp load lengthened" (extends VLDE)
  VFLR -- "vector fp load rounded" (extends VLED)

* New variant of VLLEZ:
  VLLEZLF -- "vector load logical word element and zero - left aligned"

* New variants of VPOPCT (previously byte elements only):
  VPOPCTH -- "vector population count halfword"
  VPOPCTF -- "vector population count word"
  VPOPCTG -- "vector population count double word"

* Add support for short and extended floating-point formats to:
  VFA -- "vector fp add"
  WFC -- "vector fp compare scalar"
  WFK -- "vector fp compare and signal scalar"
  VFD -- "vector fp divide"
  VFI -- "vector load fp integer"
  VFM -- "vector fp multiply"
  VFMA -- "vector fp multiply and add"
  VFMS -- "vector fp multiply and subtract"
  VFPSO -- "vector fp perform sign operation"
  VFSQ -- "vector fp square root"
  VFS -- "vector fp subtract"
  VFTCI -- "vector fp test data class immediate"

* Add short/extended FP formats and a signal-on-QNaN flag to:
  VFCE -- "vector fp compare equal"
  VFCH -- "vector fp compare high"
  VFCHE -- "vector fp compare high or equal"

(In reply to Andreas Arnez from comment #1)
> The vector decimal instructions are special because Valgrind doesn't even
> support any of the existing decimal instructions (chapter 8) yet. Thus I
> suggest splitting that out and limiting this Bugzilla to (1) the
> vector-enhancements facility 1, plus (2) the few instructions from the
> vector-packed decimal facility defined outside chapter 25.

Sounds reasonable. I am not familiar with the Bugzilla interface, so could you do it? Alternatively, we can leave it as is.

For reference, other parts of the z14 support are now tracked by Bug 404404 (for the vector decimal support) and Bug 404406 (for the miscellaneous instruction extensions facility 2).

Created attachment 132244 [details]
s390x: Support for z14 (arch12) vector instructions
This is a first version of z14 vector instruction support, as outlined in comment #1.

(In reply to Andreas Arnez from comment #4)
> Created attachment 132244 [details]
> s390x: Support for z14 (arch12) vector instructions

This all looks totally reasonable to me; OK to land as-is. My only observation is:

@@ -2219,6 +2256,11 @@ static IRExpr *
get_vr(UInt archreg, IRType type, UChar index)
{
   UInt offset = s390_vr_offset_by_index(archreg, type, index);
+  if (type == Ity_F128) {
+     return binop(Iop_F64HLtoF128,
+                  IRExpr_Get(offset, Ity_F64),
+                  IRExpr_Get(offset + 8, Ity_F64));
+  }

Can you simply use IRExpr_Get(offset, Ity_F128)? If you can, it would give better performance than doing the two F64 Gets and then joining the results together.

(In reply to Julian Seward from comment #5)
> Can you simply use IRExpr_Get(offset, Ity_F128)? If you can, it would give
> better performance than doing the two F64 Gets and then joining the results
> together.

Sorry for the late answer. The current implementation always stores F128 values in floating-point register pairs, even if they originate from vector registers. Thus a GET would have to be split into two operations anyhow. Theoretically this could be improved, e.g., by preferring vector registers for F128 values when the right hardware capabilities are available. I consider this a potential future improvement.

Created attachment 133812 [details]
[v2] s390x: Support for z14 (arch12) vector instructions
The original patch missed a few minor features. This updated version should complete the functionality required for z14 support. It adds:
- VFMIN/VFMAX
- VMSL
- VBPERM
In addition, it introduces a different approach to implementing STFLE. Until now, unknown (new) facilities were copied from the hardware facility list even if Valgrind had no handling for them. Since this has caused trouble in the past, the method is changed so that only known facilities are advertised.
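The "advertise only what is known" idea can be pictured as masking the host's facility list with a list of facilities the emulator handles. The following is a minimal sketch under stated assumptions, not the code from the patch: the helper names (facility_set, facility_test, facility_filter) are hypothetical, the bit numbers 129 and 135 are assumed here to be the vector facility and the vector-enhancements facility 1, and 190 merely stands in for some facility the emulator does not know. The one architectural detail relied on is that STFLE numbers bits from the left, so bit 0 is the most significant bit of the first doubleword.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* STFLE numbers facility bits from the left: bit 0 is the most
   significant bit of the first doubleword. */
static void facility_set(uint64_t *list, size_t nwords, unsigned bit)
{
    assert(bit / 64 < nwords);
    list[bit / 64] |= UINT64_C(1) << (63 - bit % 64);
}

/* Returns 1 if the facility bit is set, 0 otherwise (including
   bits beyond the end of the list). */
static int facility_test(const uint64_t *list, size_t nwords, unsigned bit)
{
    if (bit / 64 >= nwords)
        return 0;
    return (int)((list[bit / 64] >> (63 - bit % 64)) & 1);
}

/* Hypothetical filter: the guest sees a facility only if the host
   has it AND the emulator knows how to handle it. */
static void facility_filter(uint64_t *guest, const uint64_t *host,
                            const uint64_t *known, size_t nwords)
{
    for (size_t i = 0; i < nwords; i++)
        guest[i] = host[i] & known[i];
}
```

With a host list that has bits 129, 135 and 190 set and a known list containing only 129 and 135, the filtered guest list keeps 129 and 135 and drops 190, so a guest program never sees a facility it could not actually use under emulation.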
Created attachment 133841 [details]
[v3] s390x: Support for z14 (arch12) vector instructions
Yet another update to the patch. It mainly fixes an issue with the VMSL implementation.
To facilitate wider testing I have backported this patch to the Fedora 33 and rawhide valgrind package. Things look good, but my own testing was done on a setup that didn't have the (z14) vxe facility, so all I can personally conclude is that it doesn't introduce any regressions.

https://bodhi.fedoraproject.org/updates/FEDORA-2020-62b0db3640

Based on Julian's feedback on IRC ("ok from me to land") from today, I pushed this as git commit 159f132289160ab1a5a5cf4da14fb57ecdb248ca.