Bug 404076

Summary:	s390x: z14 vector instructions not implemented
Product:	[Developer tools] valgrind	Reporter:	Vadim Barkov <vbrkov>
Component:	vex	Assignee:	Andreas Arnez <arnez>
Status:	RESOLVED FIXED
Severity:	normal	CC:	arnez, jseward, mark, vbrkov
Priority:	NOR
Version:	unspecified
Target Milestone:	---
Platform:	Other
OS:	Linux
Latest Commit:		Version Fixed In:
Sentry Crash Report:
Attachments:	s390x: Support for z14 (arch12) vector instructions [v2] s390x: Support for z14 (arch12) vector instructions [v3] s390x: Support for z14 (arch12) vector instructions

Description Vadim Barkov 2019-02-08 01:25:03 UTC

Valgrind currently lacks support for the new z/Architecture instructions
(that were first implemented with z14) and the new behaviour of existing z13 ones. The list of such insns:

Chapter 21. Vector  Support Instructions:
    VECTOR BIT PERMUTE VBPERM
    VECTOR LOAD RIGHTMOST WITH LENGTH VLRL
    VECTOR LOAD RIGHTMOST WITH LENGTH VLRLR
    VECTOR STORE RIGHTMOST WITH LENGTH VSTRL
    VECTOR STORE RIGHTMOST WITH LENGTH VSTRLR
    VECTOR LOAD LOGICAL ELEMENT AND ZERO VLLEZ

Chapter 22. Vector Integer Instructions:
    VECTOR MULTIPLY SUM LOGICAL VMSL
    VECTOR NAND VNN
    VECTOR NOT EXCLUSIVE OR VNX
    VECTOR OR WITH COMPLEMENT VOC
    VECTOR POPULATION COUNT VPOPCT

Chapter 23, Vector String
Instructions
No changes needed

Chapter 24. Vector Floating-Point Instructions:
  Update all Vector FP instructions to support all BFP formats (short, long, extended) and implement new ones:
    VECTOR FP LOAD LENGTHENED VFLL
    VECTOR FP LOAD ROUNDED VFLR
    VECTOR FP MAXIMUM VFMAX
    VECTOR FP MINIMUM VFMIN
    VECTOR FP NEGATIVE MULTIPLY AND ADD VFNMA
    VECTOR FP NEGATIVE MULTIPLY AND SUBTRACT VFNMS

Chapter 25. Vector Decimal Instructions
    VECTOR ADD DECIMAL VAP
    VECTOR COMPARE DECIMAL VCP
    VECTOR CONVERT TO BINARY VCVB
    VECTOR CONVERT TO BINARY VCVBG
    VECTOR CONVERT TO DECIMAL VCVD
    VECTOR CONVERT TO DECIMAL VCVDG
    VECTOR DIVIDE DECIMAL VDP
    VECTOR LOAD IMMEDIATE DECIMAL VLIP
    VECTOR MULTIPLY AND SHIFT DECIMAL VMSP
    VECTOR MULTIPLY DECIMAL VMP
    VECTOR PACK ZONED VPKZ
    VECTOR PERFORM SIGN OPERATION DECIMAL VPSOP 
    VECTOR REMAINDER DECIMAL VRP 
    VECTOR SHIFT AND DIVIDE DECIMAL VSDP
    VECTOR SHIFT AND ROUND DECIMAL VSRP

Comment 1 Andreas Arnez 2019-02-14 19:03:45 UTC

The vector decimal instructions are special because Valgrind doesn't even support any of the existing decimal instructions (chapter 8) yet.  Thus I suggest to split that out and limit this Bugzilla to (1) the vector-enhancements facility 1, plus (2) the few instructions from the vector-packed decimal facility defined outside chapter 25.  This means:

* New instructions:
   VBPERM -- "vector bit permute"
   VMSL -- "vector multiply sum logical"
   VNX -- "vector not exclusive or"
   VNN -- "vector nand"
   VOC -- "vector or with complement"
   VLRLR -- "vector load rightmost with length"
   VLRL -- "vector load rightmost with immediate length"
   VSTRLR -- "vector store rightmost with length"
   VSTRL -- "vector store rightmost with immediate length"
   VFMAX -- "vector fp maximum"
   VFMIN -- "vector fp minimum"
   VFNMA -- "vector fp negative multiply and add"
   VFNMS -- "vector fp negative multiply and subtract"

* New base mnemonic and new variants:
   VFLL -- "vector fp load lengthened" (extends VLDE)
   VFLR -- "vector fp load rounded" (extends VLED)

* New variant of VLLEZ:
   VLLEZLF -- "vector load logical word element and zero - left aligned"

* New variants of VPOPCT (previously byte elements only):
   VPOPCTH -- "vector population count halfword"
   VPOPCTF -- "vector population count word"
   VPOPCTG -- "vector population count double word"

* Add support for short and extended floating-point formats to:
   VFA -- "vector fp add"
   WFC -- "vector fp compare scalar"
   WFK -- "vector fp compare and signal scalar"
   VFD -- "vector fp divide"
   VFI -- "vector load fp integer"
   VFM -- "vector fp multiply"
   VFMA -- "vector fp multiply and add"
   VFMS -- "vector fp multiply and subtract"
   VFPSO -- "vector fp perform sign operation"
   VFSQ -- "vector fp square root"
   VFS -- "vector fp subtract"
   VFTCI -- "vector fp test data class immediate"

* Add short/extended FP formats and a signal-on-QNaN flag to:
   VFCE -- "vector fp compare equal"
   VFCH -- "vector fp compare high"
   VFCHE -- "vector fp compare high or equal"

Comment 2 Vadim Barkov 2019-02-14 19:08:31 UTC

(In reply to Andreas Arnez from comment #1)
> The vector decimal instructions are special because Valgrind doesn't even
> support any of the existing decimal instructions (chapter 8) yet.  Thus I
> suggest to split that out and limit this Bugzilla to (1) the
> vector-enhancements facility 1, plus (2) the few instructions from the
> vector-packed decimal facility defined outside chapter 25.

Sounds reasonable. I am not familiar with bugzilla interface, so could you do it? Also we can stay it as is.

Comment 3 Andreas Arnez 2019-02-15 17:20:12 UTC

For reference, other parts of the z14 support are now tracked by Bug 404404 (for the vector decimal support) and Bug 404406 (for the miscellaneous instruction extensions facility 2).

Comment 4 Andreas Arnez 2020-10-09 18:38:51 UTC

Created attachment 132244 [details]
s390x: Support for z14 (arch12) vector instructions

This is a first version of z14 vector instruction support, as outlined in comment #1.

Comment 5 Julian Seward 2020-10-28 16:01:51 UTC

(In reply to Andreas Arnez from comment #4)
> Created attachment 132244 [details]
> s390x: Support for z14 (arch12) vector instructions

This all looks totally reasonable to me; OK to land as-is.

My only observation is:

@@ -2219,6 +2256,11 @@ static IRExpr *
 get_vr(UInt archreg, IRType type, UChar index)
 {
    UInt offset = s390_vr_offset_by_index(archreg, type, index);
+   if (type == Ity_F128) {
+      return binop(Iop_F64HLtoF128,
+                   IRExpr_Get(offset, Ity_F64),
+                   IRExpr_Get(offset + 8, Ity_F64));
+   }

Can you use simply IRExpr_Get(offset, Ity_F128) ?  If you can, it would give
better performance than doing the two F64 Gets and then joining the results
together.

Comment 6 Andreas Arnez 2020-12-02 15:32:49 UTC

(In reply to Julian Seward from comment #5)
> Can you use simply IRExpr_Get(offset, Ity_F128) ?  If you can, it would give
> better performance than doing the two F64 Gets and then joining the results
> together.
Sorry for the late answer.  The current implementation always stores F128 values in floating-point register pairs, even if they originate from vector registers.  Thus a GET would have to be split into two operations anyhow.  Theoretically this could be improved, e.g., by preferring vector registers for F128 values when the right hardware capabilities are available.  I consider this a potential future improvement.

Comment 7 Andreas Arnez 2020-12-02 15:38:36 UTC

Created attachment 133812 [details]
[v2] s390x: Support for z14 (arch12) vector instructions

The original patch missed a few minor features.  This updated version should complete the functionality required for z14 support.  It adds:
- VFMIN/VFMAX
- VMSL
- VBPERM
In addition, it introduces a different approach to implementing STFLE.  So far unknown (new) facilities were copied from the hardware facilities, even if Valgrind had no handling for them.  Since this has caused trouble in the past, the method is changed such that only known features are advertised.

Comment 8 Andreas Arnez 2020-12-03 16:11:53 UTC

Created attachment 133841 [details]
[v3] s390x: Support for z14 (arch12) vector instructions

Yet another update to the patch.  It mainly fixes an issue with the VMSL implementation.

Comment 9 Mark Wielaard 2020-12-05 00:26:58 UTC

To facilitate wider testing I have backported this patch to the Fedora 33 and rawhide valgrind package. Things look good, but my own testing was done on a setup that didn't have the (z14) vxe facility, so all I can personally conclude is that it doesn't introduce any regressions.

https://bodhi.fedoraproject.org/updates/FEDORA-2020-62b0db3640

Comment 10 Andreas Arnez 2020-12-08 18:41:58 UTC

Based on Julian's feedback on IRC ("ok from me to land") from today, I pushed this as git commit 159f132289160ab1a5a5cf4da14fb57ecdb248ca.