| Summary: | s390x: provide STFLE instruction support | ||
|---|---|---|---|
| Product: | [Developer tools] valgrind | Reporter: | Christian Borntraeger <borntraeger> |
| Component: | vex | Assignee: | Julian Seward <jseward> |
| Status: | RESOLVED FIXED | ||
| Severity: | wishlist | CC: | divyvyas, flo2030 |
| Priority: | NOR | ||
| Version First Reported In: | 3.7 SVN | ||
| Target Milestone: | --- | ||
| Platform: | Compiled Sources | ||
| OS: | Linux | ||
| Latest Commit: | Version Fixed/Implemented In: | ||
| Sentry Crash Report: | |||
| Attachments: |
Patch that implements STFLE instruction
updated patch for stfle instruction updated patch for stfle instruction updated patch for stfle instruction fixup for patch 60263 STFLE patch updated patch for stfle instruction with removed whitespaces updated patch for stfle instruction Changed patch ofr STFLE instruction STFLE patch updated patch for stfle instruction |
||
|
Description
Christian Borntraeger
2011-04-26 16:16:30 UTC
Created attachment 59933 [details]
Patch that implements STFLE instruction
(In reply to comment #1) This patch has some of the same issues as the initial patch for #271779 See below.. > Index: src/VEX/priv/guest_s390_toIR.c > =================================================================== > --- src/VEX/priv/guest_s390_toIR.c (revision 1057) > +++ src/VEX/priv/guest_s390_toIR.c (working copy) > @@ -50,8 +50,24 @@ > #define likely(x) __builtin_expect(!!(x), 1) > #define unlikely(x) __builtin_expect(!!(x), 0) > > +/* z architecture bit */ > +#define STFLE_DW0_ZARCH 0x6000000000000000UL This needs to be 0x6000000000000000ULL (note ULL suffix). Otherwise there will be compiler warnings when this code gets compiled on 32-bit platforms. Likewise for the other #defines. > +/* N3 bit*/ > +#define STFLE_DW0_N3 0x8000000000000000UL > +#define STFLE_DW0_STFLE 0x0100000000000000UL > +/*Long displacement bit 18 and 19 */ > +#define STFLE_DW0_LDIS (1ULL << ( 63 - 18 )) | (1ULL << ( 63 - 19 )) Name this STFLE_DW0_LDISP as we've been using LDISP to denote long-displacement in other places (see libvex.h and search for S390_HWCAPS) > +/* extended immediate facility*/ > +#define STFLE_DW0_EXIMM (1ULL << ( 63 - 21 )) STFLE_DW0_EIMM > +/*General instruction*/ > +#define STFLE_DW0_GENINS (1ULL << ( 63 - 34 )) STFLE_DW0_GIE > +#define STFLE_DW0_EXRL (1ULL << ( 63 - 35 )) > +/* floating point support*/ > +#define STFLE_DW0_FP (1ULL << ( 63 - 41 )) > STFLE_DW0_DFP ? > +#define STFLE_DW0 STFLE_DW0_ZARCH | STFLE_DW0_N3 | STFLE_DW0_STFLE | STFLE_DW0_LDIS | STFLE_DW0_EXIMM | STFLE_DW0_GENINS | STFLE_DW0_EXRL | STFLE_DW0_FP > Too long a line. 80 characters is the limit. I would also suggest to move this block of #defines right above the function s390_irgen_STFLE. Because that where it is used. > > +static HChar * > +s390_irgen_STFLE(IRTemp op2addr) > +{ > + IRExpr** args = mkIRExprVec_0(); The ** belong to the declarator not the type specifier. The postmodernism argument does not apply here :) > + IRDirty *d; > + IRTemp hoststfle = newTemp(Ity_I64); > + > + d= unsafeIRDirty_1_N (hoststfle,0, "s390x_dirtyhelper_STFLE", > + &s390x_dirtyhelper_STFLE,args); d = unsafe.... Let emacs do the alignment Use a space after a comma because whitespaceimprovesreadibility. When unsure about how to format things look around in the file and see how it's done elsewhere. That is good guidance, typically. > + stmt(IRStmt_Dirty(d)); > + > store(mkexpr(op2addr),binop(Iop_And64,mkexpr(hoststfle),mkU64(STFLE_DW0))); Here, too. > Index: src/VEX/priv/guest_s390_helpers.c > =================================================================== > --- src/VEX/priv/guest_s390_helpers.c (revision 1057) > +++ src/VEX/priv/guest_s390_helpers.c (working copy) > @@ -234,6 +234,15 @@ > last_execute_target = torun; > } > > +ULong > +s390x_dirtyhelper_STFLE(void) > +{ > + ULong hoststfle; > + asm volatile("stfle %0" /* stfle */ > + : "=m" (hoststfle) : : "0","cc"); > + return hoststfle; > +} > + Errr... And this compiles on non-s390x platforms? Add a header: /*------------------------------------------------------------*/ /*--- Dirty helper for Store Facility instruction ---*/ /*------------------------------------------------------------*/ or something like that. (In reply to comment #1) > Created an attachment (id=59933) [details] > Patch that implements STFLE instruction One more thing.. You also need to update none/tests/s390x/Makefile.am and add the new testcase. *** Bug 273113 has been marked as a duplicate of this bug. *** Divya, thanks for the first version. In addition to Florians comments here are some additional remarks. > +#define STFLE_DW0_STFLE 0x0100000000000000UL > +/*Long displacement bit 18 and 19 */ > +#define STFLE_DW0_LDIS (1ULL << ( 63 - 18 )) | (1ULL << ( 63 - 19 )) In addition to what Florian said (move before the function and ULL) it would be better to have the same encoding everywhere: either a hex number or something with a shift. > +the value is e100340030400000 and cc is 0 > +the written double words 1 [...] > Index: src/none/tests/s390x/stfle.c [...] > + printf("the value is %16.16lx and cc is %d\n",*list,cc); While the test will work on the latest model, the result will differ on a z9. Can you add 0x to the output and use 0x................ ? (see none/tests/x86/cpuid for an example). Then the value should be filtered by the test system. > --- src/none/tests/s390x/stfle.vgtest (revision 0) > +++ src/none/tests/s390x/stfle.vgtest (revision 0) > @@ -0,0 +1 @@ > +prog: stfle To disable the testcase on old systems without stfle like z900, can you add prereq: ../../../tests/s390x_features s390x-stckf (This requires the stck patch to be applied) Christian On 12/05/11 15:33, Florian Krohm wrote:
>> +#define STFLE_DW0_FP (1ULL << ( 63 - 41 ))
>>
>
> STFLE_DW0_DFP ?
This is
"The floating-point-support-enhancement facilities (FPR-GR-loading, FPS-sign-handling,
and DFP-rounding) are installed in the z/Architecture architectural mode."
What about STFLE_DW0_FPSUPP?
Christian
(In reply to comment #6) > On 12/05/11 15:33, Florian Krohm wrote: > >> +#define STFLE_DW0_FP (1ULL << ( 63 - 41 )) > >> > > > > STFLE_DW0_DFP ? > > This is > "The floating-point-support-enhancement facilities (FPR-GR-loading, > FPS-sign-handling, > and DFP-rounding) are installed in the z/Architecture architectural mode." > > What about STFLE_DW0_FPSUPP? > Sounds good to me. Created attachment 60114 [details]
updated patch for stfle instruction
Please review the patch.
> Index: tests/s390x_features.c > =================================================================== This file is already in svn. Did you do an "svn up" before cutting your patch? > Index: tests/Makefile.am > =================================================================== > --- tests/Makefile.am (revision 1070) > +++ tests/Makefile.am (working copy) > @@ -22,6 +22,7 @@ > os_test \ > true \ > x86_amd64_features > + s390x_features > already in svn > Index: VEX/priv/guest_s390_toIR.c > =================================================================== > --- VEX/priv/guest_s390_toIR.c (revision 1070) > +++ VEX/priv/guest_s390_toIR.c (working copy) > @@ -10498,6 +10498,39 @@ > return "flogr"; > } > > +#define STFLE_DW0_ZARCH (1ULL << (63-1))|(1ULL << (63-2)) /* z architecture bit */ If the replacement list of a macro is an expression it should be completely enclosed in parentheses to avoid subtle operator precedence errors when the macro is used. So it should be like so. #define STFLE_DW0_ZARCH ((1ULL << (63-1))|(1ULL << (63-2))) /* z architecture bit */ > +#define STFLE_DW0_N3 (1ULL << 63) /* N3 bit*/ > +#define STFLE_DW0_STFLE (1ULL << (63-7)) /*stfle facilty*/ > + These two are OK. > +#define STFLE_DW0_LDISP (1ULL << ( 63 - 18 )) | (1ULL << ( 63 - 19 )) /* Long-displacement facility*/ This one needs parentheses as described above > +#define STFLE_DW0_EIMM (1ULL << ( 63 - 21 )) /* extended immediate facility*/ > +#define STFLE_DW0_GIE (1ULL << ( 63 - 34 )) /*General-instruction-extension facility*/ > +#define STFLE_DW0_EXRL (1ULL << ( 63 - 35 )) > +#define STFLE_DW0_FP (1ULL << ( 63 - 41 )) /* floating point support*/ > + We had said that this ought to be called STFLE_DW0_FPSUPP (see Christian's comment). Bonus points for aligning things in a way that they are easier to the eye. Like so: #define STFLE_DW0_EIMM (1ULL << ( 63 - 21 )) /* Extended immediate facility */ #define STFLE_DW0_GIE (1ULL << ( 63 - 34 )) /* General-instruction-extension facility */ #define STFLE_DW0_EXRL (1ULL << ( 63 - 35 )) That also shows that you have good taste :) If lines get too long, drop "facility" from the comment. > +#define STFLE_DW0 STFLE_DW0_ZARCH | STFLE_DW0_N3 | STFLE_DW0_STFLE \ > +| STFLE_DW0_LDISP | STFLE_DW0_EIMM | STFLE_DW0_GIE \ > +| STFLE_DW0_EXRL | STFLE_DW0_FP > + Needs to be parenthesized (see above) and the continuation lines need indentation. Side note: I did not proof-read the encodings. > +static HChar * > +s390_irgen_STFLE(IRTemp op2addr) > +{ > + IRDirty *d; > + > + IRTemp hoststfle = newTemp(Ity_I64); > + > + d = unsafeIRDirty_1_N (hoststfle, 0, "s390x_dirtyhelper_STFLE", > + &s390x_dirtyhelper_STFLE, mkIRExprVec_0()); > + alignment > + stmt(IRStmt_Dirty(d)); > + > store(mkexpr(op2addr),binop(Iop_And64,mkexpr(hoststfle),mkU64(STFLE_DW0))); In an earlier review I had requested spaces after commas in argument lists. They are still missing. Please add them. > Index: VEX/priv/guest_s390_helpers.c > =================================================================== > --- VEX/priv/guest_s390_helpers.c (revision 1070) > +++ VEX/priv/guest_s390_helpers.c (working copy) > @@ -235,6 +235,23 @@ > } > > /*------------------------------------------------------------*/ > +/*--- Dirty helper for Store Facility instruction ---*/ > +/*------------------------------------------------------------*/ > + > +#if defined(VGA_s390x) > +ULong s390x_dirtyhelper_STFLE(void) > +{ > + ULong hoststfle; A blank line is missing here to deparate declarations from statements. > + asm volatile("stfle %0" /* stfle */ > + : "=m" (hoststfle) : : "0","cc"); > + return hoststfle; > + > +} > +#else > +ULong s390x_dirtyhelper_STFLE(void);{return 3;} > +#endif /* VGA_s390x */ > + > +/*------------------------------------------------------------*/ > /*--- Helper for condition code. ---*/ > /*------------------------------------------------------------*/ > Created attachment 60179 [details]
updated patch for stfle instruction
Hi,
I am giving the updated patch with suggested changes. I must admit that I am bad in giving indentations and spaces. Please review it .
(In reply to comment #10) > Created an attachment (id=60179) [details] > updated patch for stfle instruction > > Hi, > I am giving the updated patch with suggested changes. I must admit that I am > bad in giving indentations and spaces. Please review it . The patch does not apply cleanly. I see this: florian@ds9:~/valgrind/trunk> patch --dry -p0 < /home/florian/Downloads/stfle-patch patching file VEX/priv/guest_s390_toIR.c Hunk #1 succeeded at 10552 with fuzz 1 (offset 54 lines). Hunk #2 succeeded at 11025 (offset 54 lines). patching file VEX/priv/guest_s390_defs.h Hunk #1 FAILED at 79. 1 out of 1 hunk FAILED -- saving rejects to file VEX/priv/guest_s390_defs.h.rej patching file VEX/priv/guest_s390_helpers.c Hunk #1 succeeded at 278 with fuzz 1 (offset 43 lines). patching file tests/s390x_features.c patching file tests/Makefile.am Hunk #1 FAILED at 22. 1 out of 1 hunk FAILED -- saving rejects to file tests/Makefile.am.rej patching file none/tests/s390x/stfle.vgtest patching file none/tests/s390x/Makefile.am Hunk #1 FAILED at 5. 1 out of 1 hunk FAILED -- saving rejects to file none/tests/s390x/Makefile.am.rej patching file none/tests/s390x/stfle.stderr.exp patching file none/tests/s390x/stfle.stdout.exp patching file none/tests/s390x/stfle.c Hi Florian, I have created the patch using svn diff. and I am able to apply patch successfully. divya@tuxmaker:~/new1/trunk/src> patch --dry -p0 < stfle.patch patching file VEX/priv/guest_s390_toIR.c patching file VEX/priv/guest_s390_defs.h patching file VEX/priv/guest_s390_helpers.c patching file tests/s390x_features.c patching file tests/Makefile.am patching file none/tests/s390x/stfle.vgtest patching file none/tests/s390x/Makefile.am patching file none/tests/s390x/stfle.stderr.exp patching file none/tests/s390x/stfle.stdout.exp patching file none/tests/s390x/stfle.c Just the difference is I am doing in src folder.Can you please check by doing in src or should I change the patch. (In reply to comment #12) > Hi Florian, > > I have created the patch using svn diff. and I am able to apply patch > successfully. > > divya@tuxmaker:~/new1/trunk/src> patch --dry -p0 < stfle.patch I do not know what the src directory is. When I check out valgrind from svn there is no src directory. The file s390_features.c is already in the repository. See comment #9 above. There is no need to resubmit it. This is one of the reasons why your patch does not apply. I suggest you check out a brand-new version of valgrind and try to apply your patch. Then you see what I mean. Hi, I am not seeing s390_features.c when I am checking out the latest version of valgrind from the tuxmaker. src directory is in trunk/src. I saw that you are applying in trunk folder itself and I am doing in src folder because the patch is created in src folder. Created attachment 60263 [details]
updated patch for stfle instruction
This patch should work.
Created attachment 60272 [details]
fixup for patch 60263
Here is a fixup patch agains 60263.
(Divya, can you apply and merge both patches, test it and then reattach the new patch? Please also mark the old patches as obsolete when doing so)
1.
+#define STFLE_DW0 (STFLE_DW0_ZARCH | STFLE_DW0_N3 | STFLE_DW0_STFLE \
+| STFLE_DW0_LDISP | STFLE_DW0_EIMM | STFLE_DW0_GIE \
+| STFLE_DW0_EXRL | STFLE_DW0_FP)
This should be ^^^^ FPSUPP
2. add stfle to none/tests/s390x/Makefile.am
3. in none/tests/s390x/stfle.c: move main after stfle to avoid:
stfle.c:7: warning: implicit declaration of function ‘stfle’
4. make the stfle output machine independent (with 0x.... instead
of the real values) and stdout_filter
5. please add \n newlines to the printfs in stfle.c
(The testcase must not fail!)
6. Remove one newline in none/tests/s390x/stfle.stdout.exp
Created attachment 60348 [details]
STFLE patch
Updated patch with changes.
(In reply to comment #17) > Created an attachment (id=60348) [details] > STFLE patch > > Updated patch with changes. > Index: VEX/priv/guest_s390_helpers.c > =================================================================== > --- VEX/priv/guest_s390_helpers.c (revision 2152) > +++ VEX/priv/guest_s390_helpers.c (working copy) > @@ -278,6 +278,24 @@ > #endif /* VGA_s390x */ > > /*------------------------------------------------------------*/ > +/*--- Dirty helper for Store Facility instruction ---*/ > +/*------------------------------------------------------------*/ > + > +#if defined(VGA_s390x) > +ULong s390x_dirtyhelper_STFLE(void) > +{ > + ULong hoststfle; > + > + asm volatile("stfle %0" /* stfle */ > + : "=m" (hoststfle) : : "0","cc"); This does not compile with binutils versions that do not know about stfle. You need to use .long or .insn here. I also noticed that there were some lines with trailing white space. I don't remember where. We don't want that. Emacs will do the work for you if you add (add-hook 'before-save-hook 'delete-trailing-whitespace) to ~/.gnu-emacs Created attachment 60465 [details]
updated patch for stfle instruction with removed whitespaces
(In reply to comment #19) > Created an attachment (id=60465) [details] > updated patch for stfle instruction with removed whitespaces This code #else ULong s390x_dirtyhelper_STFLE(void);{return 3;} #endif /* VGA_s390x */ is not valid C. It won't compile on non-s390 platforms. Can you please test your patch before posting it? Created attachment 60499 [details]
updated patch for stfle instruction
Florian, Do you use the same testcase on non-s390x platforms . Can you please tell me on which platform you are checking.
It was a semi-colon mistake .
> Florian, Do you use the same testcase on non-s390x platforms . Can you please
> tell me on which platform you are checking.
At least check that any patch works on x86-linux, both 32- and 64-bit,
since those are our primary targets. "Works" means doesn't break the
build, and the regression tests have the same failures before and
after the patch.
Created attachment 60509 [details]
Changed patch ofr STFLE instruction
This patch should work fine.
> --- Comment #23 from divya <divyvyas linux vnet ibm com> 2011-05-31 14:39:54 ---
> Created an attachment (id=60509)
> --> (http://bugs.kde.org/attachment.cgi?id=60509)
> Changed patch ofr STFLE instruction
>
> This patch should work fine.
This one looks ok now.
Divya,
sorry for the long delay here. I was out on vacation for a while.. I'm looking at the patch again and want to nail it soon.
I was reading in the POPs what STFLE does and I'm not sure that the dirty helper does the right thing. This is the code from your patch:
ULong s390x_dirtyhelper_STFLE(void)
{
ULong hoststfle;
asm volatile(" .insn s,0xb2b00000,%0" /* stfle */
: "=m" (hoststfle) : : "0","cc");
return hoststfle;
}
From what I understand, STFLE can write some number of double words beginning at the address given in the instruction. The number of double words written can be larger than 1. The code above assumes that at most 1 double word is written. So there is a problem. And to avoid it you need to assign a value to register 0 that tells STFLE how many double words are available at the address. Here is the relevant paragraph from POPs:
The size of the second operand, in doublewords, is
one more than the value specified in bits 56-63 of
general register 0. The remaining bits of general reg-
ister 0 are unassigned and should contain zeros; oth-
erwise, the program may not operate compatibly in
the future.
I've also peeked into the linux kernel source and it seconds what I'm thinking :) If you're interested look at arch/s390/kernel/early.c around line 277 (I'm referring to the 2.6.39.2 kernel).
So I think register 0 needs to be set to 0 prior to issuing stfle. Note, that you need to modify the constaint for register 0. Currently, it is clobbered, but in reality it is read and then written. So you need something like +d(0) or so.
Finally, here is something I do not understand. In s390_irgen_STFLE you "and" the facility bits with STFLE_DW0. What is the reason for that? Shoudn't we simply pass along what stfle returns?
Hi Florian, Yes as a start,We started with only writing 1 double word and if it works then proceed with more than 1. I will look into that. and after calling dirty helper it returns the facilities bits which are installed in s390 machine. so we do 'and' with STFLE_DW0 to get the facilities bits which are available under valgrind. like if Decimal floating point support is not availible then it should turn off the respective bit. (In reply to comment #26) > > and after calling dirty helper it returns the facilities bits which are > installed in s390 machine. so we do 'and' with STFLE_DW0 to get the facilities > bits which are available under valgrind. like if Decimal floating point support > is not availible then it should turn off the respective bit. If we do what you propose then certain programs will behave differently under valgrind as compared to a native run. Consider the following program int facility_bit_8_is_set(void) { unsigned long long hoststfle; register unsigned long long reg0 asm("0") = 0; asm volatile(" .insn s,0xb2b00000,%0" /* stfle */ : "=m" (hoststfle), "+d"(reg0) : : "cc"); return hoststfle & (63 - 8); } int main() { if (facility_bit_8_is_set()) printf("Extended DAT facility is installed\n"); else printf("Extended DAT facility is not installed\n"); return 0; } Assuming further that the host has the extended DAT facility installed, then the program would print when run natively: Extended DAT facility is installed when run under valgrind: Extended DAT facility is not installed Generally, we want to avoid client programs to behave differently when run under valgrind. So let's not do the "and" operation. (In reply to comment #26) > Hi Florian, > > Yes as a start,We started with only writing 1 double word and if it works then > proceed with more than 1. > Hi Divya, here are some thoughts regarding extending the scheme. I think we should hammer this out now so we don't have to revisit it later. The largest assigned facility bit is (currently) #77. So we need 2 double words to store all bits. Problem is that we cannot return a value larger than 64 bit because it won't fit into a hardware register. So the initial idea of IRTemp hoststfle = newTemp(Ity_I128); // I128 d = unsafeIRDirty_1_N (hoststfle, 0, "s390x_dirtyhelper_STFLE", will not work. Something that might work is to add two slots in the guest state where the facility bits are stored. s390x_dirtyhelper_STFLE can then return the condition code. So we can set that properly (and not set it to always-zero as we currently do). Here's a sketch of what I was thinking about: In libvex_s390x_common.h /* Number of double words needed to store all facility bits */ #define S390_NUM_FACILITY_DW 2 In libvex_guest_s390x.h move this section to the end of the struct definition (and recompute the offsets): /*------------------------------------------------------------*/ /*--- S390 miscellaneous registers ---*/ /*------------------------------------------------------------*/ /* 320 */ ULong guest_counter; /* 328 */ UInt guest_fpc; /* 4-byte hole to enforce alignment requirements */ /* 336 */ ULong guest_IA; Then add: /* ??? */ ULong guest_facilities[S390_NUM_FACILITY_DW]; Should S390_NUM_FACILITY_DW change in the future, then we do not need to make any changes here. in guest_s390_toIR.c static HChar * s390_irgen_STFLE(IRTemp op2addr) { IRDirty *d; IRTemp cc = newTemp(Ity_I64); d = unsafeIRDirty_1_N (cc, 0, "s390x_dirtyhelper_STFLE", &s390x_dirtyhelper_STFLE, mkIRExprVec_0()); d->needsBBP = 1; /* Need to pass pointer to guest state to helper */ d->fxState[0].fx = Ifx_Write; d->fxState[0].offset = S390_GUEST_OFFSET(guest_facility_dw) d->fxState[0].size = S390_NUM_FACILITY_DW * sizeof(ULong); d->fxState[1].fx = Ifx_Modify; /* read then write */ d->fxState[1].offset = S390_GUEST_OFFSET(guest_r0) d->fxState[1].size = sizeof(ULong); stmt(IRStmt_Dirty(d)); s390_cc_set(cc); return "stfle"; } in guest_s390_helpers.c ULong s390x_dirtyhelper_STFLE(VexGuestS390xState *guest_state) { ULong hoststfle, cc; register ULong reg0 asm("0") = guest_state->guest_r0 & 0xF; /* r0[56:63] */ /* We cannot store more than S390_MAX_FACILITY_DW (and it makes not much sense to do so anyhow) */ if (reg0 > S390_MAX_FACILITY_DW - 1) reg0 = S390_MAX_FACILITY_DW - 1; asm volatile(" .insn s,0xb2b00000,%0" /* stfle */ "ipm %1\n" "srl %1,28\n" : "=m" (hoststfle), "+d"(reg0), "=d"(cc) : : "cc"); /* Update guest register 0 with what STFLE set r0 to */ guest_state->guest_r0 = reg0; return cc; } Also in function LibVEX_GuestS390X_initialise: for (i = 0; i < S390_MAX_FACILITY_DW; ++i) state->guest_facilities[i] = 0; This is all not tested, just a sketch. What kind of scheme have you been thinking about? (In reply to comment #28) > Problem is that we cannot return a value larger than 64 bit because it won't > fit into a hardware register. So the initial idea of Does this need to be atomic? Is it performance critical? If no to both questions, you could use a nasty kludge: call the helper twice, passing it an argument telling it whether you want to return the lower or upper 64 bits of a 128 bit result. It's ugly but simple. (In reply to comment #29) > (In reply to comment #28) > > Problem is that we cannot return a value larger than 64 bit because it won't > > fit into a hardware register. So the initial idea of > > Does this need to be atomic? Is it performance critical? If no to > both questions, you could use a nasty kludge: call the helper twice, > passing it an argument telling it whether you want to return the lower > or upper 64 bits of a 128 bit result. It's ugly but simple. The answer is "no" to both. So we could use the scheme. The nasty thing is that the number of double words stored by STFLE depends on the value of r0 at the time stfle is invoked. On a z196 (latest model): r0 #double words stored -------------------------- 0 1 1 2 >=2 2, maybe more (it's unspecified) I forgot to handle that in my sketch... Hmm, since the dirty helper allows writing to memory the following could also work. It wouldn't need the extra pseudo regs in the guest state. static HChar * s390_irgen_STFLE(IRTemp op2addr) { IRDirty *d; IRTemp cc = newTemp(Ity_I64); d = unsafeIRDirty_1_N (cc, 0, "s390x_dirtyhelper_STFLE", &s390x_dirtyhelper_STFLE, mkIRExprVec_1( mkexpr(op2addr))); d->needsBBP = 1; /* Need to pass pointer to guest state to helper */ d->fxState[0].fx = Ifx_Modify; /* read then write */ d->fxState[0].offset = S390_GUEST_OFFSET(guest_r0) d->fxState[0].size = sizeof(ULong); d->nFxState = 1; d->mAddr = mkexpr(op2addr); d->mSize = S390_NUM_FACILITY_DW * sizeof(ULong); /* pretend all double words are written */ d->mFx = Ifx_Write; stmt(IRStmt_Dirty(d)); s390_cc_set(cc); return "stfle"; } ULong s390x_dirtyhelper_STFLE(VexGuestS390xState *guest_state, HWord addr) { ULong hoststfle[S390_NUM_FACILITY_DW], cc, num_dw; register ULong reg0 asm("0") = guest_state->guest_r0 & 0xF; /* r0[56:63] */ /* We cannot store more than S390_NUM_FACILITY_DW (and it makes not much sense to do so anyhow) */ if (reg0 > S390_NUM_FACILITY_DW - 1) reg0 = S390_NUM_FACILITY_DW - 1; num_dw = reg0 + 1; /* number of double words written */ asm volatile(" .insn s,0xb2b00000,%0" /* stfle */ "ipm %1\n" "srl %1,28\n" : "=m" (hoststfle), "+d"(reg0), "=d"(cc) : : "cc"); /* Update guest register 0 with what STFLE set r0 to */ guest_state->guest_r0 = reg0; for (i = 0; i < num_dw; ++i) ((ULong *)addr)[i] = hoststfle[i]; return cc; } Hi florian, I agree with you that stfle write 2 double words as I have also noticed that. I think your suggestion is correct and I will implement stfle code according to the code given by you. and regarding the "anding" the stfle output of dirty helper with STFLE_DW0 , I have one doubt.We don't have decimal floating point support in valgrind.Will it give the respective bit disabled? (In reply to comment #31) > and regarding the "anding" the stfle output of dirty helper with STFLE_DW0 , I > have one doubt.We don't have decimal floating point support in valgrind.Will it > give the respective bit disabled? It is true, that valgrind does not yet support decimal floating point. However, the STFLE insn is about the facilities that the *hardware* supports, not what valgrind supports. So I think we should pass along the unmodified facility bits. But suppose, we did "and" with STFLE_DW. What would be the benefit? On the con side we have the argument from comment #27. Created attachment 61531 [details]
STFLE patch
Hi Florian,
Thank you so much for helping me and giving guidance.
Please try this patch if this matches your requirements.
I have made the number of doublewords as 2 and defined in libvex_s390x_common.h.
and,In future if it increases then we have to just edit this.
(In reply to comment #33) > Created an attachment (id=61531) [details] > > Please try this patch if this matches your requirements. After applying the patch compilation fails on x86. See comment #22. Check the prototype for s390x_dirtyhelper_STFLE. In function s390x_dirtyhelper_STFLE: indentation is 3 white spaces. In your patch it is two. (Hint: emacs will do the work for you) none/tests/s390x/stfle.c When the testcase is run, it may or may not successfully compare against the expected output. The reason is here: printf("the value is 0x%16.16llx and cc is %d\n",*list,cc); This writes out the facility bits of the host. But those may be different depending on the machine the test is running on. My suggestion here is to not write those facility bits out. Just write out the condition code. Additionally, we should check that the returned facility bits have bits #1 and #2 set. valgrind only runs on z/arch, so the host must have those facilities. That gives us a warm feeling that the stfle call returned something sensible and the inline assembly is correct :) The first argument to the stfle function points to 8 bytes of memory (facility_list_extended). But the stfle call will write up to 2 double words into that area, so there is a buffer overflow in the 2nd invocation of stfle. Just declare facility_list_extended to be an array of, say, 10. Which will be large enough until end of humanity. Finally, the number of double words written by stfle depends on the machine model. On newer machines it is 2 and the testcase will compare. But on z990 it will be 1, so the testcase will miscompare. We need to make sure that this testcase only runs on new enough machines. s390x_features should be extended so it can also test for the machine model. I'm thinking that s390x-features should take an optional additional command line argument of the form >z990 =z196 <z800 meaning that the machine model should be - any machine newer z990 - exactly z196 - any machine older than z800 respectively. Bonus points, for handling >= , <= , and != Look at coregrind/m_machine.c function VG_(get_machine_model) for how to figure out the machine mode by reading /proc/cpuinfo. You only need to adapt it which is pretty straight forward. Instead of VG_(open) just use "open" etc.. I'm sure you'll figure it out. Created attachment 61636 [details]
updated patch for stfle instruction
Hi Florian,
I am not sure whether I understood correctly or not but from what I understood I have edited s390x-features.c to take second parameter as machine-check for stfle instruction.So the testcase will run only on newer machines means z9,z10 and z196. Please correct if me I am wrong. In s390x-features.c ,I have given only one condition.If the coding is as you expected then I will proceed . I think what I have implemented is with respect to stfle instructions,I have to implement in a generalize way.
Other than that ,I think the testcase is working as expected.
I've checked in r11862 which provides an updated s390x_features.c Divya, I've checked in VEX r2171 and valgrind r11864 based on your patch. I added some modifications for consistent code layout, more comments and minor fixes in the testcase. Closing it. Thanks Florian.... > Generally, we want to avoid client programs to behave differently when run
> under valgrind. So let's not do the "and" operation.
Florian,
regarding this aspect, I think different.
We _must_ clear out facilities that valgrind cannot handle, no?
Hi Christian, are you back from hiatus or just being bored :) ?? Perhaps you could elaborate why you think that the facilities must be cleared out ? On 18/07/11 17:51, Florian Krohm wrote: > --- Comment #41 from Florian Krohm <britzel acm org> 2011-07-18 15:51:27 --- > Hi Christian, are you back from hiatus or just being bored :) ?? Just checked my mails, will be away another 4 weeks :-) > Perhaps you could elaborate why you think that the facilities must be cleared > out ? There are some newer libraries which use stfle to decide which backend to use. (e.g. some openssl + hw crypto libraries) When we claim support for instructions that valgrind cant handle the application will fail. When we clear out the facilities we force these libraries into a fall back code (which is not ideal) but at least the applications itself will run fine. So I think only passing known good facilities will increase the number of applications valgrind can check. In the end its a trade off between correctness (and we have other things that behave different from native) and better application support. Christian (In reply to comment #42) > There are some newer libraries which use stfle to decide which backend to use. > (e.g. some openssl + hw crypto libraries) > When we claim support for instructions that valgrind cant handle the > application > will fail. When we clear out the facilities we force these libraries into a > fall > back code (which is not ideal) but at least the applications itself will run > fine. > I understand. Looking at a previous patch revision the facility bits from STFLE were and'ed with this: #define STFLE_DW0 (STFLE_DW0_ZARCH | STFLE_DW0_N3 | STFLE_DW0_STFLE \ | STFLE_DW0_LDISP | STFLE_DW0_EIMM | STFLE_DW0_GIE \ | STFLE_DW0_EXRL | STFLE_DW0_FPSUPP) That's masking out too many bits. I think we only want to mask out the bits of those facilities that enable a set of user-space instructions which we don't support. E.g. we want to mask out bit 20 (HFP something) but not bit 8 (extended DAT). Can you assemble the list of those facilities? It would also be a good thing to issue a message upon decoding STFLE that we're mucking with the facility bits and that some programs may behave differently. Actually, ideally we want to give that message only in those cases where the actual set of facilities includes one that we're masking out. But it's probably not allowed to print something in a dirty helper, is it? |