Valgrind currently lacks support for the z/Architecture vector "support" instructions introduced with z13. These are documented in the z/Architecture Principles of Operation, Eleventh Edition (March, 2015), chapter 21: "Vector Overview and Support Instructions".
Created attachment 108343 [details] Initial vector support for z13 (chapter 21) Usage: $ cd /path/to/valgrind/ $ patch -p1 < vector_z13.patch
Alternatively to patch you can clone my mirror: $ git clone https://github.com/barkovv/valgrind $ git checkout vector_basic_z13
The adding of -march=z13 to CFLAGS isn't good aproach however it's the only way to compile condition code helpers in VEX/priv/guest_s390x_helpers.c (I've tried ".insn vrr, ..." aproach but it doesn't compile) gcc-6.1 adds support for "target" attribute. See example: $ cat test.c __attribute__ ((target("arch=z13"))) int main() { asm volatile( "vlr %%r1, %%r2\n" ::: ); return 0; } $ gcc test.c -o test # Is fine for gcc-6 $ gcc --version gcc version 6.2.1 20160826 [gcc-6-branch revision 239773] (SUSE Linux) gcc-6 isn't avaible on RHEL12 so it could be a problem for users to compile such code there. If drop of gcc-{4.8,5} support is fine I can use above attributes and don't modify CFLAGS. Let me know your opinion on this.
Created attachment 108457 [details] Initial vector support for z13 (chapter 21) Update: removed unused code
(In reply to Vadim Barkov from comment #3) > The adding of -march=z13 to CFLAGS isn't good aproach however it's the only > way to compile condition code helpers in VEX/priv/guest_s390x_helpers.c > (I've tried ".insn vrr, ..." aproach but it doesn't compile) Hm, but this would mean that Valgrind couldn't be used on older IBM Z platforms any longer... not good. Without having looked at this in more detail, maybe another option is to manually encode the instructions, similar to what's already done for many other instructions in "none/tests/s390x/opcodes.h".
Created attachment 108482 [details] Initial vector support for z13 (chapter 21) Update: Removed -mcpu=z13 from CFLAGS and updated s390-opcodes.csv
(In reply to Andreas Arnez from comment #5) > (In reply to Vadim Barkov from comment #3) > > The adding of -march=z13 to CFLAGS isn't good aproach however it's the only > > way to compile condition code helpers in VEX/priv/guest_s390x_helpers.c > > (I've tried ".insn vrr, ..." aproach but it doesn't compile) > Hm, but this would mean that Valgrind couldn't be used on older IBM Z > platforms any longer... not good. > > Without having looked at this in more detail, maybe another option is to > manually encode the instructions, similar to what's already done for many > other instructions in "none/tests/s390x/opcodes.h". Fixed.
(In reply to Vadim Barkov from comment #6) > Created attachment 108482 [details] > Initial vector support for z13 (chapter 21) This patch will need some further work before it is reviewable. Right now it is so huge and difficult to follow that I have no confidence in being able to review it properly. Please: * Divide it up into much smaller pieces, each of which can be reviewed independently. * Remove all whitespace changes. These just add noise and make the reviewing process more difficult. * Remove the numerous __inline__ annotations in guest_s390_toIR.c. None of these functions are performance critical, and the compiler can decide for itself what to inline. * Try to reduce the size of the patch as much as possible. A few small changes widely spaced is ideal. Right now, we have stuff like this -static void -s390_format_RRF_UUFR(const HChar *(*irgen)(UChar m3, UChar m4, UChar r1, - UChar r2), - UChar m3, UChar m4, UChar r1, UChar r2) +/* Write byte #6 of a vr to the guest state. */ +static __inline__ void +put_vr_b6(UInt archreg, IRExpr *expr) { - const HChar *mnm = irgen(m3, m4, r1, r2); + vassert(typeOfIRExpr(irsb->tyenv, expr) == Ity_I8); - if (UNLIKELY(vex_traceflags & VEX_TRACE_FE)) - s390_disasm(ENC5(MNM, FPR, UINT, GPR, UINT), mnm, r1, m3, r2, m4); + stmt(IRStmt_Put(vr_b6_offset(archreg), expr)); } which is incomprehensible. Is put_vr_b6 a new function? A replacement for s390_format_RRF_UUFR? A modified version of s390_format_RRF_UUFR? It's impossible to tell. * Verify with Andreas that your changes, at a high level, make sense w.r.t. support of older hardware. Thank you.
(In reply to Julian Seward from comment #8) > This patch will need some further work before it is reviewable. > Right now it is so huge and difficult to follow that I have no > confidence in being able to review it properly. I found that the patch is much easier to read when applying the whole series and then doing "git diff --minimal origin/master". The resulting diffstat looks like this: VEX/priv/guest_s390_defs.h | 43 +- VEX/priv/guest_s390_helpers.c | 67 +- VEX/priv/guest_s390_toIR.c | 2527 +++++++++++++++++++++++++++++++++++- VEX/priv/host_s390_defs.c | 734 ++++++++++- VEX/priv/host_s390_defs.h | 78 +- VEX/priv/host_s390_isel.c | 451 ++++++- VEX/priv/ir_defs.c | 232 ++-- VEX/priv/main_main.c | 122 +- VEX/priv/s390_disasm.c | 64 + VEX/priv/s390_disasm.h | 4 +- VEX/pub/libvex.h | 40 +- VEX/pub/libvex_emnote.h | 11 +- VEX/pub/libvex_guest_s390x.h | 131 +- VEX/pub/libvex_ir.h | 194 +-- VEX/pub/libvex_s390x_common.h | 3 +- VEX/useful/test_main.c | 25 + coregrind/m_machine.c | 21 +- docs/internals/s390-opcodes.csv | 202 +-- memcheck/mc_machine.c | 11 +- memcheck/mc_translate.c | 526 ++++---- memcheck/tests/vbit-test/irops.c | 13 + none/tests/s390x/Makefile.am | 3 +- none/tests/s390x/stfle.c | 4 +- none/tests/s390x/stfle.stdout.exp | 12 +- none/tests/s390x/stfle.vgtest | 2 +- none/tests/s390x/vector.c | 1997 ++++++++++++++++++++++++++++ none/tests/s390x/vector.h | 139 ++ none/tests/s390x/vector.stderr.exp | 2 + none/tests/s390x/vector.stdout.exp | 86 ++ none/tests/s390x/vector.vgtest | 2 + tests/s390x_features.c | 58 +- 31 files changed, 7026 insertions(+), 778 deletions(-) Compared to 54 files changed, 18238 insertions, and 11990 deletions, this is obviously much smaller and also doesn't contain those deletions and re-insertions. The patch still contains various white space corrections that are not necessarily related to this Bug. They should probably be separated out.
Created attachment 108579 [details] Initial vector support (chapter 21) (remastered) Changes: - Removed ALL trailing whitespaces changes (patch size decreased in five times) - Removed some __inline__ 's from guest_s390_toIR.c - Rearranged code in sequence of commits (patches) to make the review process easier
(In reply to Julian Seward from comment #8) > (In reply to Vadim Barkov from comment #6) > * Divide it up into much smaller pieces, each of which can be reviewed > independently. I divided code in logical parts as you suggested. Hope the review process will be MUCH easier now. Note that patch is supposed to be fully applied for compile and regtest purposes. > * Remove all whitespace changes. These just add noise and make the > reviewing process more difficult. Done. > * Remove the numerous __inline__ annotations in guest_s390_toIR.c. > None of these functions are performance critical, and the compiler > can decide for itself what to inline. I've removed __inline__ from my helper functions but saved them for vr_xxy_offset(...), put_vr_xxy(...) and get_vr_xxy() ones. The reason to do so that their gpr "brothers" are declared with inline. I don't know if it is important or not but want to declare similar functions with similar annotations. If you disagree with this statement I will remove __inline__ from all my helper functions in "guest_s390_toIR.c" file. Let me know your opinion on it. > * Try to reduce the size of the patch as much as possible. A few > small changes widely spaced is ideal. Right now, we have stuff > like this > > -static void > -s390_format_RRF_UUFR(const HChar *(*irgen)(UChar m3, UChar m4, UChar r1, > - UChar r2), > - UChar m3, UChar m4, UChar r1, UChar r2) > +/* Write byte #6 of a vr to the guest state. */ > +static __inline__ void > +put_vr_b6(UInt archreg, IRExpr *expr) > { > - const HChar *mnm = irgen(m3, m4, r1, r2); > + vassert(typeOfIRExpr(irsb->tyenv, expr) == Ity_I8); > > - if (UNLIKELY(vex_traceflags & VEX_TRACE_FE)) > - s390_disasm(ENC5(MNM, FPR, UINT, GPR, UINT), mnm, r1, m3, r2, m4); > + stmt(IRStmt_Put(vr_b6_offset(archreg), expr)); > } > > which is incomprehensible. Is put_vr_b6 a new function? A replacement > for s390_format_RRF_UUFR? A modified version of s390_format_RRF_UUFR? > It's impossible to tell. Fixed.
(In reply to Vadim Barkov from comment #10) > Created attachment 108579 [details] > Initial vector support (chapter 21) (remastered) > > Changes: > - Removed ALL trailing whitespaces changes (patch size decreased in five > times) > - Removed some __inline__ 's from guest_s390_toIR.c > - Rearranged code in sequence of commits (patches) to make the review > process easier Yeah, that looks much better. Good work! A few comments: * In /none/tests/s390x/vector.c, you use z13 instructions in __asm__ directives. This is probably OK if you make sure that this test is executed only on systems that support this. See, for instance how this is done for AVX using the build-time variable BUILD_AVX_TESTS defined in configure.ac. * When running auxprogs/s390-check-opcodes.pl with the appropriate parameters, I get some warnings. It would be nice if you could get rid of them.
(In reply to Andreas Arnez from comment #12) > (In reply to Vadim Barkov from comment #10) > * In /none/tests/s390x/vector.c, you use z13 instructions in __asm__ > directives. This is probably OK if you make sure that this test is executed > only on systems that support this. This test is COMPILED on every s390x machine. It's okay since all branches of gcc (4.8, 4.9.5, 5.3, 6.2 are tested, 7.xx is missing either in RHEL or SLES) support this asm mnemonics. Compilation is quick so perfomance of "make check" is not much affected. This test is EXECUTED only on systems with vx support (z13 and later). "./tests/s390x-features s390x-vx" command in vector.vgtest ensures that test if executed if and only if the current CPU support vector instructions. For this reason I've extended valgrind's STFLE helper and test so it is able to perform check of VX support. > * When running auxprogs/s390-check-opcodes.pl with the appropriate > parameters, I get some warnings. It would be nice if you could get rid of > them. For some reason binutils think that for example "vrepb, vreph, vrepf and vrep" are separate insns but it's a single insn with different "m" argument. I'll find some way to remove this warnings.
Created attachment 108682 [details] Initial vector support (chapter 21) (fixed doc) Changes: Fixed all warnings in s390-check-opcodes.pl
I've fixed all problems you pointed to. Waiting for your verdict
(In reply to Julian Seward from comment #8) > (In reply to Vadim Barkov from comment #6) > > Created attachment 108482 [details] > > Initial vector support for z13 (chapter 21) > > This patch will need some further work before it is reviewable. > Right now it is so huge and difficult to follow that I have no > confidence in being able to review it properly. Please let me know when you start review process.
(In reply to Vadim Barkov from comment #16) I started to look at it. It looks a lot better. I will look at it in more detail in the next day or so.
Created attachment 108754 [details] Initial vector support (chapter 21) (fixed doc) UPDATE: Fixed dublicated cases in guest_s390_toIR.c For some reason there are multiple instructions for the same opcode in binutils' s390-opcodes.txt. I suppose they are just synonims for single instruction and ignore them as well as other synonyms.
(In reply to Vadim Barkov from comment #18) > Created attachment 108754 [details] > Initial vector support (chapter 21) (fixed doc) This is pretty good on the whole, but there are a couple of areas of concern. Comments below. +++ b/memcheck/mc_translate.c + case Iop_Perm8x16x2: + /* (V128, V128, V128) -> V128 */ + return assignNew('V', mce, Ity_V128, triop(op, vatom1, vatom2, vatom3)); This isn't right. It needs to steer the shadow values but using atom3 as the steering value, not vatom3. Also it needs to somehow take into account the fact that atom3 might be undefined. See the handling of Iop_Perm8x8 in the same file. VEX/priv/host_s390_defs.c +s390_amode * +s390_amode_for_stack_pointer(Int offset) +{ + if (fits_unsigned_12bit(offset)) + return s390_amode_b12(offset, s390_hreg_stack_pointer()); + + if (fits_signed_20bit(offset)) + return s390_amode_b20(offset, s390_hreg_stack_pointer()); If in both cases you're returning a 's390_amode*', is there any purpose in having both variants? Why not always just use the 20 bit variant? +s390_isel_vec_expr_wrk(ISelEnv *env, IRExpr *expr) .. + /* --------- LOAD --------- */ + case Iex_Load: { Nit: please indent the 'case' and the comment the same amount, here and below. VEX/pub/libvex_guest_s390x.h + /* 64 */ union{ ULong guest_f0; V128 guest_v0; }; .. + /* 304 */ union{ ULong guest_f15; V128 guest_v15; }; This is definitely not right, because the two union components have different sizes (8 vs 16 bytes). Please remove the _f[0-15] variants and the unions themselves, and instead use (eg) guest_vN.w64[0] or .w64[1] instead. This is what various other ports do, that have similar register arrangements. VEX/priv/guest_s390_defs.h + S390_CC_VEC_LAST = 3 // supposed to be tha last element in enum Typo: s/tha/the VEX/priv/guest_s390_helpers.c +/*------------------------------------------------------------*/ +/*--- Dirty helper for vector instructions ---*/ +/*------------------------------------------------------------*/ +#if defined(VGA_s390x) Nit: please, ensure there is at least one blank line before/after these big comment blocks, here and below VEX/priv/guest_s390_toIR.c +s390_format_VRX_VRRD(const HChar *(*irgen)(UChar v1, IRTemp op2addr), + UChar v1, UChar x2, UChar b2, UShort d2, UChar rxb) +{ + const HChar *mnm; + IRTemp op2addr = newTemp(Ity_I64); + + if (! s390_host_has_vx) { + emulation_failure(EmFail_S390X_vx); + return; + } Is this really right? Looking at emulation_failure(), I see that this produces a jump of the kind Ijk_EmFail. For other targets, and instruction-decode failure produces a jump of the kind Ijk_NoDecode, which is different -- it causes Valgrind to send SIGILL to the guest. So I don't think this is really right. Unless the s390 response to an undecodeable insn is something different than SIGILL. +s390_irgen_VMRH(UChar v1, UChar v2, UChar v3, UChar m4) +s390_irgen_VMRL(UChar v1, UChar v2, UChar v3, UChar m4) +s390_irgen_VUPH(UChar v1, UChar v2, UChar m3) +s390_irgen_VUPLH(UChar v1, UChar v2, UChar m3) +s390_irgen_VUPL(UChar v1, UChar v2, UChar m3) +s390_irgen_VUPLL(UChar v1, UChar v2, UChar m3) Many of these look like they are performing vector operations by doing one lane at a time. Whilst this will generate working code, it will also cause the translations to be big. In extreme cases of this (on other architectures), given long sequences of vector instructions in the input, we have had problems in which VEX's internal storage overflowed and so it had to abort. I would therefore encourage you to generate shorter sequences that make more extensive use of the vector IROps, if possible. one/tests/s390x/vector.c These tests test decoding, but they don't test the arithmetic much at all. Please add more tests, possibly by running each test (eg) 50 times with random data. That often shakes out edge cases. See for example none/tests/amd64/avx-1.c.
Created attachment 108826 [details] Initial vector support (chapter 21) (after review) Changes: Reworked tests to be similar to amd64's avx-1.c Reworked VRs in guest structure (removed FPR & VR unions) Reworked pack/unpack and interleave insn to use only one IR operation Reworked Iop_Perm8x16x2 code in memcheck Fixed typoes and formatting issues Found and fixed some bugs with new tests for this insns: VSTL VPDI VPKS (CC part)
(In reply to Julian Seward from comment #19) > (In reply to Vadim Barkov from comment #18) > > Created attachment 108754 [details] > > Initial vector support (chapter 21) (fixed doc) > > This is pretty good on the whole, but there are a couple of areas of concern. > Comments below. > > > +++ b/memcheck/mc_translate.c > > + case Iop_Perm8x16x2: > + /* (V128, V128, V128) -> V128 */ > + return assignNew('V', mce, Ity_V128, triop(op, vatom1, vatom2, > vatom3)); > > This isn't right. It needs to steer the shadow values but using atom3 as > the steering value, not vatom3. Also it needs to somehow take into account > the fact that atom3 might be undefined. See the handling of Iop_Perm8x8 in > the same file. > Is this code right? case Iop_Perm8x16x2: /* (V128, V128, V128) -> V128 */ complainIfUndefined(mce, atom3, NULL); return mkUifUV128( mce, assignNew('V', mce, Ity_V128, triop(op, vatom1, vatom2, atom3)), mkPCast8x16(mce, vatom3) ); > VEX/priv/host_s390_defs.c > > > +s390_amode * > +s390_amode_for_stack_pointer(Int offset) > +{ > + if (fits_unsigned_12bit(offset)) > + return s390_amode_b12(offset, s390_hreg_stack_pointer()); > + > + if (fits_signed_20bit(offset)) > + return s390_amode_b20(offset, s390_hreg_stack_pointer()); > > If in both cases you're returning a 's390_amode*', is there any purpose > in having both variants? Why not always just use the 20 bit variant? To be honest this function is heavily inspired by "s390_amode_for_guest_state" one from the same file and I don't know why we have two variants. I guess it's a try to save memory or something else. Since the functions do the very similar job they have similar implementation. > VEX/pub/libvex_guest_s390x.h > > + /* 64 */ union{ ULong guest_f0; V128 guest_v0; }; > .. > + /* 304 */ union{ ULong guest_f15; V128 guest_v15; }; > > This is definitely not right, because the two union components have > different sizes (8 vs 16 bytes). Please remove the _f[0-15] variants and > the unions themselves, and instead use (eg) guest_vN.w64[0] or .w64[1] > instead. This is what various other ports do, that have similar register > arrangements. Fixed. > VEX/priv/guest_s390_toIR.c > +s390_format_VRX_VRRD(const HChar *(*irgen)(UChar v1, IRTemp op2addr), > + UChar v1, UChar x2, UChar b2, UShort d2, UChar rxb) > +{ > + const HChar *mnm; > + IRTemp op2addr = newTemp(Ity_I64); > + > + if (! s390_host_has_vx) { > + emulation_failure(EmFail_S390X_vx); > + return; > + } > > Is this really right? Looking at emulation_failure(), I see that this > produces a jump of the kind Ijk_EmFail. For other targets, and > instruction-decode failure produces a jump of the kind Ijk_NoDecode, which > is different -- it causes Valgrind to send SIGILL to the guest. > > So I don't think this is really right. Unless the s390 response to an > undecodeable insn is something different than SIGILL. > "s390_host_has_vx" is not a decoding failure reaction. It indicates that valgrind decoded vector instruction which it cannot emulate / handle due to missing vector facility. In the same way valgrind (on s390x machine) calls emulation_failure() when it decodes some FP insn and doesn't have fpext (floating-point extension) facility. So I think this is right. > +s390_irgen_VMRH(UChar v1, UChar v2, UChar v3, UChar m4) > +s390_irgen_VMRL(UChar v1, UChar v2, UChar v3, UChar m4) > +s390_irgen_VUPH(UChar v1, UChar v2, UChar m3) > +s390_irgen_VUPLH(UChar v1, UChar v2, UChar m3) > +s390_irgen_VUPL(UChar v1, UChar v2, UChar m3) > +s390_irgen_VUPLL(UChar v1, UChar v2, UChar m3) > > Many of these look like they are performing vector operations by doing one > lane at a time. Whilst this will generate working code, it will also cause > the translations to be big. In extreme cases of this (on other > architectures), given long sequences of vector instructions in the input, we > have had problems in which VEX's internal storage overflowed and so it had > to abort. I would therefore encourage you to generate shorter sequences > that make more extensive use of the vector IROps, if possible. Fixed. > one/tests/s390x/vector.c > > These tests test decoding, but they don't test the arithmetic much at all. > Please add more tests, possibly by running each test (eg) 50 times with > random data. That often shakes out edge cases. See for example > none/tests/amd64/avx-1.c. Done.
(In reply to Julian Seward from comment #19) If the new Iop_Perm8x16x2 handling in memcheck is incorrect I'll remove Iop_Perm8x16x2 operation and will handle VPERM insn via dirtyhelper. I already have some code for vector string insns so it isn't hard to implement. Please let me know your opinion on the last patch.
(In reply to Andreas Arnez from comment #12) > (In reply to Vadim Barkov from comment #10) > > Created attachment 108579 [details] > > Initial vector support (chapter 21) (remastered) > > > > Changes: > > - Removed ALL trailing whitespaces changes (patch size decreased in five > > times) > > - Removed some __inline__ 's from guest_s390_toIR.c > > - Rearranged code in sequence of commits (patches) to make the review > > process easier > > Yeah, that looks much better. Good work! A few comments: > * In /none/tests/s390x/vector.c, you use z13 instructions in __asm__ > directives. This is probably OK if you make sure that this test is executed > only on systems that support this. See, for instance how this is done for > AVX using the build-time variable BUILD_AVX_TESTS defined in configure.ac. > * When running auxprogs/s390-check-opcodes.pl with the appropriate > parameters, I get some warnings. It would be nice if you could get rid of > them. Is the current patch okay? Could you merge it or tell me what's wrong with it please?
Sorry for the delay. I will review it in Monday.
*on* Monday, that is.
(In reply to Vadim Barkov from comment #23) > Is the current patch okay? Could you merge it or tell me what's wrong with > it please? For the record, I didn't find any further problems with the patch, except maybe a few nits: - In none/tests/s390x/vector.c, test_once(vgbm) is invoked twice for no apparent reason. - In none/tests/s390x/vector.h, the comment for s390_test_generate() seems misleading. In particular it states the the macro takes up to three arguments (in fact, just two?) and that the arguments can be modified for testing (not sure what that means -- example?). - Some minor spelling issues and long lines. I can't approve the patch though. Julian, are you OK with the patch now?
Julian, did you check this out? I just want to remind you if you forget about this issue for some reason.
(In reply to Vadim Barkov from comment #20) > Created attachment 108826 [details] > Initial vector support (chapter 21) (after review) Looks good to me. My only comment is below. OK to land after you also deal with Andreas' comment 26, and if this tests OK both on latest hardware and older hardware. I am sorry for the really slow review here. My bad.
(In reply to Julian Seward from comment #28) > Looks good to me. My only comment is below. Duh! Here is is: guest_s390_helpers.c, __inline__ directives eg static __inline__ void Please remove the __inline__ directives. We build with -finline-functions, these are all low-frequency functions, and I prefer to leave gcc to decide for itself what to inline, in this case.
Created attachment 109635 [details] Initial vector support (chapter 21) (hope final) Changes: Removed __inline__'s Removed unused functions from guest_s390_toIR.c Fixed minor issues from Andreas' comment 26 Test results are equal to nonpatched version on z13 machine. On earlier machine vector test shouldn't be executed however I haven't any one for testing. I guess it's ready to apply since all your suggestions are implemented.
Are here other task for me to do?
Created attachment 109653 [details] Initial vector support (chapter 21) (hope final) Previous patch (in comment 30) was corrupted while uploading. This is correct one. Please check the comment to previous patch.
(In reply to Vadim Barkov from comment #32) > Created attachment 109653 [details] > Initial vector support (chapter 21) (hope final) Hi Vadim. Andreas and I tried to apply the patch to the current git trunk, but it doesn't apply. For example checking file none/tests/s390x/Makefile.am Hunk #1 FAILED at 18. 1 out of 2 hunks FAILED Also, given this kind of output .. checking file VEX/priv/guest_s390_toIR.c Hunk #1 succeeded at 15016 (offset -2189 lines). .. I guess the patch is against a quite old version of the tree. Could you please rebase it against the current git trunk, *and* re-test it? Thanks.
Please try use --ignore-whitespaces option while paching (I forgot to mention it). It works for me (on latest commit from Jan, 3, 0f18cfc986f800b107c7eee063b8b7c04617e0b8). Anyway I'll try to find problem and post the correct version of patch today.
Created attachment 109672 [details] Initial vector support (chapter 21) (fixed final) Changes: Fixed applying issues (were some problems with mixed tabs and spaces). Not it is fine. This patch has passed tests.
Pushed as f1a49eeb427caf42e7af2da2b91198e55c6f33b2. Thanks for working on this.
Thank you for apply. Could you mark this as resolved and close?