Bug 385408 - s390x: z13 vector "support" instructions not implemented
Summary: s390x: z13 vector "support" instructions not implemented
Status: RESOLVED FIXED
Alias: None
Product: valgrind
Classification: Developer tools
Component: vex (show other bugs)
Version: unspecified
Platform: Other Linux
: NOR normal
Target Milestone: ---
Assignee: Julian Seward
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-10-05 17:51 UTC by Andreas Arnez
Modified: 2018-01-11 21:32 UTC (History)
2 users (show)

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:


Attachments
Initial vector support for z13 (chapter 21) (1.27 MB, patch)
2017-10-14 06:43 UTC, Vadim Barkov
Details
Initial vector support for z13 (chapter 21) (1.28 MB, patch)
2017-10-19 16:18 UTC, Vadim Barkov
Details
Initial vector support for z13 (chapter 21) (1.34 MB, patch)
2017-10-20 18:49 UTC, Vadim Barkov
Details
Initial vector support (chapter 21) (remastered) (263.42 KB, patch)
2017-10-26 20:39 UTC, Vadim Barkov
Details
Initial vector support (chapter 21) (fixed doc) (327.62 KB, patch)
2017-11-02 18:25 UTC, Vadim Barkov
Details
Initial vector support (chapter 21) (fixed doc) (327.52 KB, patch)
2017-11-08 21:31 UTC, Vadim Barkov
Details
Initial vector support (chapter 21) (after review) (455.72 KB, patch)
2017-11-13 05:21 UTC, Vadim Barkov
Details
Initial vector support (chapter 21) (hope final) (444.28 KB, patch)
2018-01-03 00:23 UTC, Vadim Barkov
Details
Initial vector support (chapter 21) (hope final) (444.23 KB, patch)
2018-01-03 21:26 UTC, Vadim Barkov
Details
Initial vector support (chapter 21) (fixed final) (444.23 KB, patch)
2018-01-04 17:32 UTC, Vadim Barkov
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Andreas Arnez 2017-10-05 17:51:54 UTC
Valgrind currently lacks support for the z/Architecture vector "support" instructions introduced with z13.  These are documented in the z/Architecture Principles of Operation, Eleventh Edition (March, 2015), chapter 21: "Vector Overview and Support Instructions".
Comment 1 Vadim Barkov 2017-10-14 06:43:41 UTC
Created attachment 108343 [details]
Initial vector support for z13 (chapter 21)

Usage:
$ cd /path/to/valgrind/
$ patch -p1 < vector_z13.patch
Comment 2 Vadim Barkov 2017-10-14 07:20:43 UTC
Alternatively to patch you can clone my mirror:
$ git clone https://github.com/barkovv/valgrind
$ git checkout vector_basic_z13
Comment 3 Vadim Barkov 2017-10-17 15:46:09 UTC
The adding of -march=z13 to CFLAGS isn't good aproach however it's the only way to compile condition code helpers in VEX/priv/guest_s390x_helpers.c (I've tried ".insn vrr, ..." aproach but it doesn't compile)

gcc-6.1 adds support for "target" attribute. See example:
$ cat test.c
__attribute__ ((target("arch=z13")))
int main()
{
   asm volatile(
   "vlr %%r1, %%r2\n"
   :::
   );
   return 0;
}
$ gcc test.c -o test # Is fine for gcc-6
$ gcc --version
gcc version 6.2.1 20160826 [gcc-6-branch revision 239773] (SUSE Linux)

gcc-6 isn't avaible on RHEL12 so it could be a problem for users to compile such code there. If drop of gcc-{4.8,5} support is fine I can use above attributes and don't modify CFLAGS.

Let me know your opinion on this.
Comment 4 Vadim Barkov 2017-10-19 16:18:57 UTC
Created attachment 108457 [details]
Initial vector support for z13 (chapter 21)

Update: removed unused code
Comment 5 Andreas Arnez 2017-10-19 16:50:56 UTC
(In reply to Vadim Barkov from comment #3)
> The adding of -march=z13 to CFLAGS isn't good aproach however it's the only
> way to compile condition code helpers in VEX/priv/guest_s390x_helpers.c
> (I've tried ".insn vrr, ..." aproach but it doesn't compile)
Hm, but this would mean that Valgrind couldn't be used on older IBM Z platforms any longer... not good.

Without having looked at this in more detail, maybe another option is to manually encode the instructions, similar to what's already done for many other instructions in "none/tests/s390x/opcodes.h".
Comment 6 Vadim Barkov 2017-10-20 18:49:11 UTC
Created attachment 108482 [details]
Initial vector support for z13 (chapter 21)

Update: Removed -mcpu=z13 from CFLAGS and updated s390-opcodes.csv
Comment 7 Vadim Barkov 2017-10-23 16:42:39 UTC
(In reply to Andreas Arnez from comment #5)
> (In reply to Vadim Barkov from comment #3)
> > The adding of -march=z13 to CFLAGS isn't good aproach however it's the only
> > way to compile condition code helpers in VEX/priv/guest_s390x_helpers.c
> > (I've tried ".insn vrr, ..." aproach but it doesn't compile)
> Hm, but this would mean that Valgrind couldn't be used on older IBM Z
> platforms any longer... not good.
> 
> Without having looked at this in more detail, maybe another option is to
> manually encode the instructions, similar to what's already done for many
> other instructions in "none/tests/s390x/opcodes.h".

Fixed.
Comment 8 Julian Seward 2017-10-26 14:33:38 UTC
(In reply to Vadim Barkov from comment #6)
> Created attachment 108482 [details]
> Initial vector support for z13 (chapter 21)

This patch will need some further work before it is reviewable.
Right now it is so huge and difficult to follow that I have no
confidence in being able to review it properly.

Please:

* Divide it up into much smaller pieces, each of which can be reviewed
  independently.

* Remove all whitespace changes.  These just add noise and make the
  reviewing process more difficult.

* Remove the numerous __inline__ annotations in guest_s390_toIR.c.
  None of these functions are performance critical, and the compiler
  can decide for itself what to inline.

* Try to reduce the size of the patch as much as possible.  A few
  small changes widely spaced is ideal.  Right now, we have stuff
  like this

  -static void
  -s390_format_RRF_UUFR(const HChar *(*irgen)(UChar m3, UChar m4, UChar r1,
  -                                           UChar r2),
  -                     UChar m3, UChar m4, UChar r1, UChar r2)
  +/* Write byte #6 of a vr to the guest state. */
  +static __inline__ void
  +put_vr_b6(UInt archreg, IRExpr *expr)
   {
  -   const HChar *mnm = irgen(m3, m4, r1, r2);
  +   vassert(typeOfIRExpr(irsb->tyenv, expr) == Ity_I8);
 
  -   if (UNLIKELY(vex_traceflags & VEX_TRACE_FE))
  -      s390_disasm(ENC5(MNM, FPR, UINT, GPR, UINT), mnm, r1, m3, r2, m4);
  +   stmt(IRStmt_Put(vr_b6_offset(archreg), expr));
   }

  which is incomprehensible.  Is put_vr_b6 a new function?  A replacement
  for s390_format_RRF_UUFR?  A modified version of s390_format_RRF_UUFR?
  It's impossible to tell.

* Verify with Andreas that your changes, at a high level, make sense
  w.r.t. support of older hardware.

Thank you.
Comment 9 Andreas Arnez 2017-10-26 16:13:00 UTC
(In reply to Julian Seward from comment #8)
> This patch will need some further work before it is reviewable.
> Right now it is so huge and difficult to follow that I have no
> confidence in being able to review it properly.
I found that the patch is much easier to read when applying the whole series and then doing "git diff --minimal origin/master".  The resulting diffstat looks like this:
 VEX/priv/guest_s390_defs.h         |   43 +-
 VEX/priv/guest_s390_helpers.c      |   67 +-
 VEX/priv/guest_s390_toIR.c         | 2527 +++++++++++++++++++++++++++++++++++-
 VEX/priv/host_s390_defs.c          |  734 ++++++++++-
 VEX/priv/host_s390_defs.h          |   78 +-
 VEX/priv/host_s390_isel.c          |  451 ++++++-
 VEX/priv/ir_defs.c                 |  232 ++--
 VEX/priv/main_main.c               |  122 +-
 VEX/priv/s390_disasm.c             |   64 +
 VEX/priv/s390_disasm.h             |    4 +-
 VEX/pub/libvex.h                   |   40 +-
 VEX/pub/libvex_emnote.h            |   11 +-
 VEX/pub/libvex_guest_s390x.h       |  131 +-
 VEX/pub/libvex_ir.h                |  194 +--
 VEX/pub/libvex_s390x_common.h      |    3 +-
 VEX/useful/test_main.c             |   25 +
 coregrind/m_machine.c              |   21 +-
 docs/internals/s390-opcodes.csv    |  202 +--
 memcheck/mc_machine.c              |   11 +-
 memcheck/mc_translate.c            |  526 ++++----
 memcheck/tests/vbit-test/irops.c   |   13 +
 none/tests/s390x/Makefile.am       |    3 +-
 none/tests/s390x/stfle.c           |    4 +-
 none/tests/s390x/stfle.stdout.exp  |   12 +-
 none/tests/s390x/stfle.vgtest      |    2 +-
 none/tests/s390x/vector.c          | 1997 ++++++++++++++++++++++++++++
 none/tests/s390x/vector.h          |  139 ++
 none/tests/s390x/vector.stderr.exp |    2 +
 none/tests/s390x/vector.stdout.exp |   86 ++
 none/tests/s390x/vector.vgtest     |    2 +
 tests/s390x_features.c             |   58 +-
 31 files changed, 7026 insertions(+), 778 deletions(-)

Compared to 54 files changed, 18238 insertions, and 11990 deletions, this is obviously much smaller and also doesn't contain those deletions and re-insertions.

The patch still contains various white space corrections that are not necessarily related to this Bug.  They should probably be separated out.
Comment 10 Vadim Barkov 2017-10-26 20:39:17 UTC
Created attachment 108579 [details]
Initial vector support (chapter 21) (remastered)

Changes:
 - Removed ALL trailing whitespaces changes (patch size decreased in five times)
 - Removed some __inline__ 's from guest_s390_toIR.c
 - Rearranged code in sequence of commits (patches) to make the review process easier
Comment 11 Vadim Barkov 2017-10-26 20:55:53 UTC
(In reply to Julian Seward from comment #8)
> (In reply to Vadim Barkov from comment #6)

> * Divide it up into much smaller pieces, each of which can be reviewed
>   independently.
I divided code in logical parts as you suggested. Hope the review process will be MUCH easier now. Note that patch is supposed to be fully applied for compile and regtest purposes.

> * Remove all whitespace changes.  These just add noise and make the
>   reviewing process more difficult.
Done.

> * Remove the numerous __inline__ annotations in guest_s390_toIR.c.
>   None of these functions are performance critical, and the compiler
>   can decide for itself what to inline.
I've removed __inline__ from my helper functions but saved them for vr_xxy_offset(...), put_vr_xxy(...) and get_vr_xxy() ones.

The reason to do so that their gpr "brothers" are declared with inline. I don't know if it is important or not but want to declare similar functions with similar annotations.

If you disagree with this statement I will remove __inline__ from all my helper functions in "guest_s390_toIR.c" file.
Let me know your opinion on it.

> * Try to reduce the size of the patch as much as possible.  A few
>   small changes widely spaced is ideal.  Right now, we have stuff
>   like this
> 
>   -static void
>   -s390_format_RRF_UUFR(const HChar *(*irgen)(UChar m3, UChar m4, UChar r1,
>   -                                           UChar r2),
>   -                     UChar m3, UChar m4, UChar r1, UChar r2)
>   +/* Write byte #6 of a vr to the guest state. */
>   +static __inline__ void
>   +put_vr_b6(UInt archreg, IRExpr *expr)
>    {
>   -   const HChar *mnm = irgen(m3, m4, r1, r2);
>   +   vassert(typeOfIRExpr(irsb->tyenv, expr) == Ity_I8);
>  
>   -   if (UNLIKELY(vex_traceflags & VEX_TRACE_FE))
>   -      s390_disasm(ENC5(MNM, FPR, UINT, GPR, UINT), mnm, r1, m3, r2, m4);
>   +   stmt(IRStmt_Put(vr_b6_offset(archreg), expr));
>    }
> 
>   which is incomprehensible.  Is put_vr_b6 a new function?  A replacement
>   for s390_format_RRF_UUFR?  A modified version of s390_format_RRF_UUFR?
>   It's impossible to tell.
Fixed.
Comment 12 Andreas Arnez 2017-10-27 18:45:56 UTC
(In reply to Vadim Barkov from comment #10)
> Created attachment 108579 [details]
> Initial vector support (chapter 21) (remastered)
> 
> Changes:
>  - Removed ALL trailing whitespaces changes (patch size decreased in five
> times)
>  - Removed some __inline__ 's from guest_s390_toIR.c
>  - Rearranged code in sequence of commits (patches) to make the review
> process easier

Yeah, that looks much better.  Good work!  A few comments:
* In /none/tests/s390x/vector.c, you use z13 instructions in __asm__ directives.  This is probably OK if you make sure that this test is executed only on systems that support this.  See, for instance how this is done for AVX using the build-time variable BUILD_AVX_TESTS defined in configure.ac.
* When running auxprogs/s390-check-opcodes.pl with the appropriate parameters, I get some warnings.  It would be nice if you could get rid of them.
Comment 13 Vadim Barkov 2017-10-27 20:29:08 UTC
(In reply to Andreas Arnez from comment #12)
> (In reply to Vadim Barkov from comment #10)
> * In /none/tests/s390x/vector.c, you use z13 instructions in __asm__
> directives.  This is probably OK if you make sure that this test is executed
> only on systems that support this.  
This test is COMPILED on every s390x machine. It's okay since all branches of gcc (4.8, 4.9.5, 5.3, 6.2 are tested, 7.xx is missing either in RHEL or SLES) support this asm mnemonics.
Compilation is quick so perfomance of "make check" is not much affected.

This test is EXECUTED only on systems with vx support (z13 and later). "./tests/s390x-features s390x-vx" command in vector.vgtest ensures that test if executed if and only if the current CPU support vector instructions. For this reason I've extended valgrind's STFLE helper and test so it is able to perform check of VX support.

> * When running auxprogs/s390-check-opcodes.pl with the appropriate
> parameters, I get some warnings.  It would be nice if you could get rid of
> them.
For some reason binutils think that for example "vrepb, vreph, vrepf and vrep" are separate insns but it's a single insn with different "m" argument. I'll find some way to remove this warnings.
Comment 14 Vadim Barkov 2017-11-02 18:25:25 UTC
Created attachment 108682 [details]
Initial vector support (chapter 21) (fixed doc)

Changes:

Fixed all warnings in s390-check-opcodes.pl
Comment 15 Vadim Barkov 2017-11-02 18:27:27 UTC
I've fixed all problems you pointed to. Waiting for your verdict
Comment 16 Vadim Barkov 2017-11-06 19:00:45 UTC
(In reply to Julian Seward from comment #8)
> (In reply to Vadim Barkov from comment #6)
> > Created attachment 108482 [details]
> > Initial vector support for z13 (chapter 21)
> 
> This patch will need some further work before it is reviewable.
> Right now it is so huge and difficult to follow that I have no
> confidence in being able to review it properly.
Please let me know when you start review process.
Comment 17 Julian Seward 2017-11-07 11:46:05 UTC
(In reply to Vadim Barkov from comment #16)
I started to look at it.  It looks a lot better.  I will look at it in
more detail in the next day or so.
Comment 18 Vadim Barkov 2017-11-08 21:31:41 UTC
Created attachment 108754 [details]
Initial vector support (chapter 21) (fixed doc)

UPDATE: Fixed dublicated cases in guest_s390_toIR.c

For some reason there are multiple instructions for the same opcode in binutils' s390-opcodes.txt. I suppose they are just synonims for single instruction and ignore them as well as other synonyms.
Comment 19 Julian Seward 2017-11-09 21:09:11 UTC
(In reply to Vadim Barkov from comment #18)
> Created attachment 108754 [details]
> Initial vector support (chapter 21) (fixed doc)

This is pretty good on the whole, but there are a couple of areas of concern.
Comments below.


+++ b/memcheck/mc_translate.c

+      case Iop_Perm8x16x2:
+         /* (V128, V128, V128) -> V128 */
+         return assignNew('V', mce, Ity_V128, triop(op, vatom1, vatom2, vatom3));

This isn't right.  It needs to steer the shadow values but using atom3 as
the steering value, not vatom3.  Also it needs to somehow take into account
the fact that atom3 might be undefined.  See the handling of Iop_Perm8x8 in
the same file.


VEX/priv/host_s390_defs.c


+s390_amode *
+s390_amode_for_stack_pointer(Int offset)
+{
+   if (fits_unsigned_12bit(offset))
+      return s390_amode_b12(offset, s390_hreg_stack_pointer());
+
+   if (fits_signed_20bit(offset))
+      return s390_amode_b20(offset, s390_hreg_stack_pointer());

If in both cases you're returning a 's390_amode*', is there any purpose
in having both variants?  Why not always just use the 20 bit variant?


+s390_isel_vec_expr_wrk(ISelEnv *env, IRExpr *expr)
..
+      /* --------- LOAD --------- */
+   case Iex_Load: {
Nit: please indent the 'case' and the comment the same amount, here and below.


VEX/pub/libvex_guest_s390x.h

+   /*   64 */  union{ ULong guest_f0;  V128  guest_v0; };
..
+   /*  304 */  union{ ULong guest_f15; V128  guest_v15; };

This is definitely not right, because the two union components have
different sizes (8 vs 16 bytes).  Please remove the _f[0-15] variants and
the unions themselves, and instead use (eg) guest_vN.w64[0] or .w64[1]
instead.  This is what various other ports do, that have similar register
arrangements.


VEX/priv/guest_s390_defs.h
+   S390_CC_VEC_LAST = 3 // supposed to be tha last element in enum
Typo: s/tha/the


VEX/priv/guest_s390_helpers.c
+/*------------------------------------------------------------*/
+/*--- Dirty helper for vector instructions                 ---*/
+/*------------------------------------------------------------*/
+#if defined(VGA_s390x)
Nit: please, ensure there is at least one blank line before/after these big
comment blocks, here and below


VEX/priv/guest_s390_toIR.c
+s390_format_VRX_VRRD(const HChar *(*irgen)(UChar v1, IRTemp op2addr),
+                    UChar v1, UChar x2, UChar b2, UShort d2, UChar rxb)
+{
+   const HChar *mnm;
+   IRTemp op2addr = newTemp(Ity_I64);
+
+   if (! s390_host_has_vx) {
+      emulation_failure(EmFail_S390X_vx);
+      return;
+   }

Is this really right?  Looking at emulation_failure(), I see that this
produces a jump of the kind Ijk_EmFail.  For other targets, and
instruction-decode failure produces a jump of the kind Ijk_NoDecode, which
is different -- it causes Valgrind to send SIGILL to the guest.

So I don't think this is really right.  Unless the s390 response to an
undecodeable insn is something different than SIGILL.


+s390_irgen_VMRH(UChar v1, UChar v2, UChar v3, UChar m4)
+s390_irgen_VMRL(UChar v1, UChar v2, UChar v3, UChar m4)
+s390_irgen_VUPH(UChar v1, UChar v2, UChar m3)
+s390_irgen_VUPLH(UChar v1, UChar v2, UChar m3)
+s390_irgen_VUPL(UChar v1, UChar v2, UChar m3)
+s390_irgen_VUPLL(UChar v1, UChar v2, UChar m3)

Many of these look like they are performing vector operations by doing one
lane at a time.  Whilst this will generate working code, it will also cause
the translations to be big.  In extreme cases of this (on other
architectures), given long sequences of vector instructions in the input, we
have had problems in which VEX's internal storage overflowed and so it had
to abort.  I would therefore encourage you to generate shorter sequences
that make more extensive use of the vector IROps, if possible.


one/tests/s390x/vector.c

These tests test decoding, but they don't test the arithmetic much at all.
Please add more tests, possibly by running each test (eg) 50 times with
random data.  That often shakes out edge cases.  See for example
none/tests/amd64/avx-1.c.
Comment 20 Vadim Barkov 2017-11-13 05:21:36 UTC
Created attachment 108826 [details]
Initial vector support (chapter 21) (after review)

Changes:

Reworked tests to be similar to amd64's avx-1.c
Reworked VRs in guest structure (removed FPR & VR unions)
Reworked pack/unpack and interleave insn to use only one IR operation
Reworked Iop_Perm8x16x2 code in memcheck
Fixed typoes and formatting issues

Found and fixed some bugs with new tests for this insns:
VSTL
VPDI
VPKS (CC part)
Comment 21 Vadim Barkov 2017-11-13 05:36:16 UTC
(In reply to Julian Seward from comment #19)
> (In reply to Vadim Barkov from comment #18)
> > Created attachment 108754 [details]
> > Initial vector support (chapter 21) (fixed doc)
> 
> This is pretty good on the whole, but there are a couple of areas of concern.
> Comments below.
> 
> 
> +++ b/memcheck/mc_translate.c
> 
> +      case Iop_Perm8x16x2:
> +         /* (V128, V128, V128) -> V128 */
> +         return assignNew('V', mce, Ity_V128, triop(op, vatom1, vatom2,
> vatom3));
> 
> This isn't right.  It needs to steer the shadow values but using atom3 as
> the steering value, not vatom3.  Also it needs to somehow take into account
> the fact that atom3 might be undefined.  See the handling of Iop_Perm8x8 in
> the same file.
> 

Is this code right?

      case Iop_Perm8x16x2:
         /* (V128, V128, V128) -> V128 */
            complainIfUndefined(mce, atom3, NULL);
            return mkUifUV128(
                   mce,
                   assignNew('V', mce, Ity_V128, triop(op, vatom1, vatom2, atom3)),
                   mkPCast8x16(mce, vatom3)
                );

> VEX/priv/host_s390_defs.c
> 
> 
> +s390_amode *
> +s390_amode_for_stack_pointer(Int offset)
> +{
> +   if (fits_unsigned_12bit(offset))
> +      return s390_amode_b12(offset, s390_hreg_stack_pointer());
> +
> +   if (fits_signed_20bit(offset))
> +      return s390_amode_b20(offset, s390_hreg_stack_pointer());
> 
> If in both cases you're returning a 's390_amode*', is there any purpose
> in having both variants?  Why not always just use the 20 bit variant?
 
To be honest this function is heavily inspired by "s390_amode_for_guest_state" one from the same file and I don't know why we have two variants. I guess it's a try to save memory or something else.
Since the functions do the very similar job they have similar implementation.

> VEX/pub/libvex_guest_s390x.h
> 
> +   /*   64 */  union{ ULong guest_f0;  V128  guest_v0; };
> ..
> +   /*  304 */  union{ ULong guest_f15; V128  guest_v15; };
> 
> This is definitely not right, because the two union components have
> different sizes (8 vs 16 bytes).  Please remove the _f[0-15] variants and
> the unions themselves, and instead use (eg) guest_vN.w64[0] or .w64[1]
> instead.  This is what various other ports do, that have similar register
> arrangements.
 
Fixed. 

> VEX/priv/guest_s390_toIR.c
> +s390_format_VRX_VRRD(const HChar *(*irgen)(UChar v1, IRTemp op2addr),
> +                    UChar v1, UChar x2, UChar b2, UShort d2, UChar rxb)
> +{
> +   const HChar *mnm;
> +   IRTemp op2addr = newTemp(Ity_I64);
> +
> +   if (! s390_host_has_vx) {
> +      emulation_failure(EmFail_S390X_vx);
> +      return;
> +   }
> 
> Is this really right?  Looking at emulation_failure(), I see that this
> produces a jump of the kind Ijk_EmFail.  For other targets, and
> instruction-decode failure produces a jump of the kind Ijk_NoDecode, which
> is different -- it causes Valgrind to send SIGILL to the guest.
> 
> So I don't think this is really right.  Unless the s390 response to an
> undecodeable insn is something different than SIGILL.
> 

"s390_host_has_vx" is not a decoding failure reaction. It indicates that valgrind decoded vector instruction which it cannot emulate / handle due to missing vector facility. 
In the same way valgrind (on s390x machine) calls emulation_failure() when it decodes some FP insn and doesn't have fpext (floating-point extension) facility.
So I think this is right.

> +s390_irgen_VMRH(UChar v1, UChar v2, UChar v3, UChar m4)
> +s390_irgen_VMRL(UChar v1, UChar v2, UChar v3, UChar m4)
> +s390_irgen_VUPH(UChar v1, UChar v2, UChar m3)
> +s390_irgen_VUPLH(UChar v1, UChar v2, UChar m3)
> +s390_irgen_VUPL(UChar v1, UChar v2, UChar m3)
> +s390_irgen_VUPLL(UChar v1, UChar v2, UChar m3)
> 
> Many of these look like they are performing vector operations by doing one
> lane at a time.  Whilst this will generate working code, it will also cause
> the translations to be big.  In extreme cases of this (on other
> architectures), given long sequences of vector instructions in the input, we
> have had problems in which VEX's internal storage overflowed and so it had
> to abort.  I would therefore encourage you to generate shorter sequences
> that make more extensive use of the vector IROps, if possible.

 Fixed.

> one/tests/s390x/vector.c
> 
> These tests test decoding, but they don't test the arithmetic much at all.
> Please add more tests, possibly by running each test (eg) 50 times with
> random data.  That often shakes out edge cases.  See for example
> none/tests/amd64/avx-1.c.

Done.
Comment 22 Vadim Barkov 2017-11-23 19:24:05 UTC
(In reply to Julian Seward from comment #19)

If the new Iop_Perm8x16x2 handling in memcheck is incorrect I'll remove Iop_Perm8x16x2 operation and will handle VPERM insn via dirtyhelper. I already have some code for vector string insns so it isn't hard to implement.

Please let me know your opinion on the last patch.
Comment 23 Vadim Barkov 2017-12-08 17:36:13 UTC
(In reply to Andreas Arnez from comment #12)
> (In reply to Vadim Barkov from comment #10)
> > Created attachment 108579 [details]
> > Initial vector support (chapter 21) (remastered)
> > 
> > Changes:
> >  - Removed ALL trailing whitespaces changes (patch size decreased in five
> > times)
> >  - Removed some __inline__ 's from guest_s390_toIR.c
> >  - Rearranged code in sequence of commits (patches) to make the review
> > process easier
> 
> Yeah, that looks much better.  Good work!  A few comments:
> * In /none/tests/s390x/vector.c, you use z13 instructions in __asm__
> directives.  This is probably OK if you make sure that this test is executed
> only on systems that support this.  See, for instance how this is done for
> AVX using the build-time variable BUILD_AVX_TESTS defined in configure.ac.
> * When running auxprogs/s390-check-opcodes.pl with the appropriate
> parameters, I get some warnings.  It would be nice if you could get rid of
> them.

Is the current patch okay? Could you merge it or tell me what's wrong with it please?
Comment 24 Julian Seward 2017-12-08 18:01:16 UTC
Sorry for the delay.  I will review it in Monday.
Comment 25 Julian Seward 2017-12-08 18:01:40 UTC
*on* Monday, that is.
Comment 26 Andreas Arnez 2017-12-15 18:07:39 UTC
(In reply to Vadim Barkov from comment #23)
> Is the current patch okay? Could you merge it or tell me what's wrong with
> it please?

For the record, I didn't find any further problems with the patch, except
maybe a few nits:
- In none/tests/s390x/vector.c, test_once(vgbm) is
  invoked twice for no apparent reason.
- In none/tests/s390x/vector.h, the comment for s390_test_generate() seems
  misleading.  In particular it states the the macro takes up to three
  arguments (in fact, just two?) and that the arguments can be modified
  for testing (not sure what that means -- example?).
- Some minor spelling issues and long lines.

I can't approve the patch though.  Julian, are you OK with the patch now?
Comment 27 Vadim Barkov 2017-12-24 14:43:51 UTC
Julian, did you check this out? I just want to remind you if you forget about this issue for some reason.
Comment 28 Julian Seward 2018-01-02 09:52:09 UTC
(In reply to Vadim Barkov from comment #20)
> Created attachment 108826 [details]
> Initial vector support (chapter 21) (after review)

Looks good to me.  My only comment is below.

OK to land after you also deal with Andreas' comment 26, and if
this tests OK both on latest hardware and older hardware.

I am sorry for the really slow review here.  My bad.
Comment 29 Julian Seward 2018-01-02 10:02:41 UTC
(In reply to Julian Seward from comment #28)
> Looks good to me.  My only comment is below.

Duh!  Here is is:

guest_s390_helpers.c, __inline__ directives

eg
static __inline__ void

Please remove the __inline__ directives.  We build with -finline-functions,
these are all low-frequency functions, and I prefer to leave gcc to decide
for itself what to inline, in this case.
Comment 30 Vadim Barkov 2018-01-03 00:23:25 UTC
Created attachment 109635 [details]
Initial vector support (chapter 21) (hope final)

Changes:
Removed __inline__'s
Removed unused functions from guest_s390_toIR.c
Fixed minor issues from Andreas' comment 26

Test results are equal to nonpatched version on z13 machine.
On earlier machine vector test shouldn't be executed however I haven't any one for testing.

I guess it's ready to apply since all your suggestions are implemented.
Comment 31 Vadim Barkov 2018-01-03 18:19:47 UTC
Are here other task for me to do?
Comment 32 Vadim Barkov 2018-01-03 21:26:23 UTC
Created attachment 109653 [details]
Initial vector support (chapter 21) (hope final)

Previous patch (in comment 30) was corrupted while uploading. This is correct one.

Please check the comment to previous patch.
Comment 33 Julian Seward 2018-01-04 12:29:35 UTC
(In reply to Vadim Barkov from comment #32)
> Created attachment 109653 [details]
> Initial vector support (chapter 21) (hope final)

Hi Vadim.  Andreas and I tried to apply the patch to the current git trunk,
but it doesn't apply.  For example

  checking file none/tests/s390x/Makefile.am
  Hunk #1 FAILED at 18.
  1 out of 2 hunks FAILED

Also, given this kind of output ..

  checking file VEX/priv/guest_s390_toIR.c
  Hunk #1 succeeded at 15016 (offset -2189 lines).

.. I guess the patch is against a quite old version of the tree.

Could you please rebase it against the current git trunk, *and* re-test it?
Thanks.
Comment 34 Vadim Barkov 2018-01-04 14:22:09 UTC
Please try use --ignore-whitespaces option while paching (I forgot to mention it). It works for me (on latest commit from Jan, 3, 0f18cfc986f800b107c7eee063b8b7c04617e0b8).
Anyway I'll try to find problem and post the correct version of patch today.
Comment 35 Vadim Barkov 2018-01-04 17:32:25 UTC
Created attachment 109672 [details]
Initial vector support (chapter 21) (fixed final)

Changes:

Fixed applying issues (were some problems with mixed tabs and spaces).

Not it is fine. This patch has passed tests.
Comment 36 Julian Seward 2018-01-11 17:25:51 UTC
Pushed as f1a49eeb427caf42e7af2da2b91198e55c6f33b2.
Thanks for working on this.
Comment 37 Vadim Barkov 2018-01-11 21:01:02 UTC
Thank you for apply. Could you mark this as resolved and close?