Bug 271776

Summary: s390x: provide STFLE instruction support
Product: [Developer tools] valgrind Reporter: Christian Borntraeger <borntraeger>
Component: vexAssignee: Julian Seward <jseward>
Status: RESOLVED FIXED    
Severity: wishlist CC: divyvyas, flo2030
Priority: NOR    
Version First Reported In: 3.7 SVN   
Target Milestone: ---   
Platform: Compiled Sources   
OS: Linux   
Latest Commit: Version Fixed/Implemented In:
Sentry Crash Report:
Attachments: Patch that implements STFLE instruction
updated patch for stfle instruction
updated patch for stfle instruction
updated patch for stfle instruction
fixup for patch 60263
STFLE patch
updated patch for stfle instruction with removed whitespaces
updated patch for stfle instruction
Changed patch ofr STFLE instruction
STFLE patch
updated patch for stfle instruction

Description Christian Borntraeger 2011-04-26 16:16:30 UTC
Version:           3.7 SVN
OS:                Linux

some tools use the STFLE instruction to check for HW capabilities.

valgrind should implement STFLE and provide the intersection of VEX capabilites and system capabilities.

We are currently working on a patch.


Reproducible: Always
Comment 1 divya 2011-05-12 12:38:43 UTC
Created attachment 59933 [details]
Patch that implements STFLE instruction
Comment 2 Florian Krohm 2011-05-12 15:33:51 UTC
(In reply to comment #1)

This patch has some of the same issues as the initial patch for #271779
See below..

> Index: src/VEX/priv/guest_s390_toIR.c
> ===================================================================
> --- src/VEX/priv/guest_s390_toIR.c	(revision 1057)
> +++ src/VEX/priv/guest_s390_toIR.c	(working copy)
> @@ -50,8 +50,24 @@
>  #define likely(x)	__builtin_expect(!!(x), 1)
>  #define unlikely(x)	__builtin_expect(!!(x), 0)
>  
> +/* z architecture bit */
> +#define STFLE_DW0_ZARCH 0x6000000000000000UL

This needs to be 0x6000000000000000ULL  (note ULL suffix). Otherwise there
will be compiler warnings when this code gets compiled on 32-bit platforms.
Likewise for the other #defines.

> +/* N3 bit*/
> +#define STFLE_DW0_N3 0x8000000000000000UL
> +#define STFLE_DW0_STFLE 0x0100000000000000UL
> +/*Long displacement bit 18 and 19 */
> +#define STFLE_DW0_LDIS (1ULL << ( 63 - 18 )) | (1ULL << ( 63 - 19 ))

Name this STFLE_DW0_LDISP as we've been using LDISP to denote long-displacement in other places (see libvex.h and search for S390_HWCAPS)

> +/* extended immediate facility*/
> +#define STFLE_DW0_EXIMM (1ULL << ( 63 - 21 ))

STFLE_DW0_EIMM

> +/*General instruction*/
> +#define STFLE_DW0_GENINS (1ULL << ( 63 - 34 ))

STFLE_DW0_GIE

> +#define STFLE_DW0_EXRL (1ULL << ( 63 - 35 ))
> +/* floating point support*/
> +#define STFLE_DW0_FP (1ULL << ( 63 - 41 ))
>  

STFLE_DW0_DFP ?

> +#define STFLE_DW0 STFLE_DW0_ZARCH | STFLE_DW0_N3 | STFLE_DW0_STFLE | STFLE_DW0_LDIS | STFLE_DW0_EXIMM | STFLE_DW0_GENINS | STFLE_DW0_EXRL | STFLE_DW0_FP
>  

Too long a line. 80 characters is the limit.

I would also suggest to move this block of #defines right above the function
s390_irgen_STFLE. Because that where it is used.

>  
> +static HChar *
> +s390_irgen_STFLE(IRTemp op2addr)
> +{
> +   IRExpr** args = mkIRExprVec_0();

The ** belong to the declarator not the type specifier. The postmodernism
argument does not apply here :)  

> +   IRDirty *d;
> +   IRTemp hoststfle = newTemp(Ity_I64);
> +
> +   d= unsafeIRDirty_1_N (hoststfle,0, "s390x_dirtyhelper_STFLE",
> +                       &s390x_dirtyhelper_STFLE,args);

d = unsafe....
Let emacs do the alignment 
Use a space after a comma because whitespaceimprovesreadibility.

When unsure about how to format things look around in the file and see how it's done elsewhere. That is good guidance, typically.

> +   stmt(IRStmt_Dirty(d));
> +
> store(mkexpr(op2addr),binop(Iop_And64,mkexpr(hoststfle),mkU64(STFLE_DW0)));

Here, too.

> Index: src/VEX/priv/guest_s390_helpers.c
> ===================================================================
> --- src/VEX/priv/guest_s390_helpers.c	(revision 1057)
> +++ src/VEX/priv/guest_s390_helpers.c	(working copy)
> @@ -234,6 +234,15 @@
>     last_execute_target = torun;
>  }
>  
> +ULong
> +s390x_dirtyhelper_STFLE(void)
> +{
> +   ULong hoststfle;
> +   asm volatile("stfle %0" /* stfle */
> +                    : "=m" (hoststfle) : : "0","cc");
> +   return hoststfle;
> +}
> +

Errr...  And this compiles on non-s390x platforms?

Add a header:

/*------------------------------------------------------------*/
/*--- Dirty helper for Store Facility instruction          ---*/
/*------------------------------------------------------------*/

or something like that.
Comment 3 Florian Krohm 2011-05-14 16:44:53 UTC
(In reply to comment #1)
> Created an attachment (id=59933) [details]
> Patch that implements STFLE instruction

One more thing.. You also need to update none/tests/s390x/Makefile.am and add the new testcase.
Comment 4 Florian Krohm 2011-05-14 18:28:11 UTC
*** Bug 273113 has been marked as a duplicate of this bug. ***
Comment 5 Christian Borntraeger 2011-05-16 16:45:27 UTC
Divya,

thanks for the first version. In addition to Florians comments here are
some additional remarks.

> +#define STFLE_DW0_STFLE 0x0100000000000000UL
> +/*Long displacement bit 18 and 19 */
> +#define STFLE_DW0_LDIS (1ULL << ( 63 - 18 )) | (1ULL << ( 63 - 19 ))

In addition to what Florian said (move before the function and ULL)
it would be better to have the same encoding everywhere: either a hex number
or something with a shift.

> +the value is e100340030400000 and cc is 0
> +the written double words 1
[...]
> Index: src/none/tests/s390x/stfle.c
[...]
> +   printf("the value is %16.16lx and cc is %d\n",*list,cc);

While the test will work on the latest model, the result will differ on
a z9. Can you add 0x to the output and use 0x................ ? (see
none/tests/x86/cpuid for an example). Then the value should be filtered 
by the test system.

> --- src/none/tests/s390x/stfle.vgtest	(revision 0)
> +++ src/none/tests/s390x/stfle.vgtest	(revision 0)
> @@ -0,0 +1 @@
> +prog: stfle

To disable the testcase on old systems without stfle like z900, can you add
prereq: ../../../tests/s390x_features s390x-stckf
(This requires the stck patch to be applied)



Christian
Comment 6 Christian Borntraeger 2011-05-16 16:50:11 UTC
On 12/05/11 15:33, Florian Krohm wrote:
>> +#define STFLE_DW0_FP (1ULL << ( 63 - 41 ))
>>  
> 
> STFLE_DW0_DFP ?

This is 
"The floating-point-support-enhancement facilities (FPR-GR-loading, FPS-sign-handling,
and DFP-rounding) are installed in the z/Architecture architectural mode."

What about STFLE_DW0_FPSUPP?

Christian
Comment 7 Florian Krohm 2011-05-17 14:51:46 UTC
(In reply to comment #6)
> On 12/05/11 15:33, Florian Krohm wrote:
> >> +#define STFLE_DW0_FP (1ULL << ( 63 - 41 ))
> >>  
> > 
> > STFLE_DW0_DFP ?
> 
> This is 
> "The floating-point-support-enhancement facilities (FPR-GR-loading,
> FPS-sign-handling,
> and DFP-rounding) are installed in the z/Architecture architectural mode."
> 
> What about STFLE_DW0_FPSUPP?
> 

Sounds good to me.
Comment 8 divya 2011-05-18 13:29:26 UTC
Created attachment 60114 [details]
updated patch for stfle instruction

Please review the patch.
Comment 9 Florian Krohm 2011-05-18 22:25:01 UTC
> Index: tests/s390x_features.c
> ===================================================================

This file is already in svn. Did you do an "svn up" before cutting
your patch?

> Index: tests/Makefile.am
> ===================================================================
> --- tests/Makefile.am	(revision 1070)
> +++ tests/Makefile.am	(working copy)
> @@ -22,6 +22,7 @@
>  	os_test \
>  	true \
>  	x86_amd64_features
> +        s390x_features
>  

already in svn


> Index: VEX/priv/guest_s390_toIR.c
> ===================================================================
> --- VEX/priv/guest_s390_toIR.c	(revision 1070)
> +++ VEX/priv/guest_s390_toIR.c	(working copy)
> @@ -10498,6 +10498,39 @@
>     return "flogr";
>  }
>  
> +#define STFLE_DW0_ZARCH (1ULL << (63-1))|(1ULL << (63-2))  /* z architecture bit */

If the replacement list of a macro is an expression it should be
completely enclosed in parentheses to avoid subtle operator precedence
errors when the macro is used. So it should be like so.

#define STFLE_DW0_ZARCH ((1ULL << (63-1))|(1ULL << (63-2)))  /* z architecture bit */

> +#define STFLE_DW0_N3 (1ULL << 63)               /* N3 bit*/
> +#define STFLE_DW0_STFLE (1ULL << (63-7))        /*stfle facilty*/
> +

These two are OK.

> +#define STFLE_DW0_LDISP (1ULL << ( 63 - 18 )) | (1ULL << ( 63 - 19 ))                                 /* Long-displacement facility*/

This one needs parentheses as described above

> +#define STFLE_DW0_EIMM (1ULL << ( 63 - 21 )) /* extended immediate  facility*/
> +#define STFLE_DW0_GIE (1ULL << ( 63 - 34 ))  /*General-instruction-extension facility*/
> +#define STFLE_DW0_EXRL (1ULL << ( 63 - 35 ))

> +#define STFLE_DW0_FP (1ULL << ( 63 - 41 ))   /* floating point support*/
> +

We had said that this ought to be called STFLE_DW0_FPSUPP (see Christian's
comment).

Bonus points for aligning things in a way that they are easier to the eye.
Like so:

#define STFLE_DW0_EIMM (1ULL << ( 63 - 21 ))  /* Extended immediate facility */
#define STFLE_DW0_GIE  (1ULL << ( 63 - 34 ))  /* General-instruction-extension facility */
#define STFLE_DW0_EXRL (1ULL << ( 63 - 35 ))

That also shows that you have good taste :)
If lines get too long, drop "facility" from the comment. 

> +#define STFLE_DW0 STFLE_DW0_ZARCH | STFLE_DW0_N3 | STFLE_DW0_STFLE \
> +| STFLE_DW0_LDISP | STFLE_DW0_EIMM | STFLE_DW0_GIE \
> +| STFLE_DW0_EXRL | STFLE_DW0_FP
> +

Needs to be parenthesized (see above) and the continuation lines need
indentation.

Side note: I did not proof-read the encodings.

> +static HChar *
> +s390_irgen_STFLE(IRTemp op2addr)
> +{
> +   IRDirty *d;
> +
> +   IRTemp hoststfle = newTemp(Ity_I64);
> +
> +   d = unsafeIRDirty_1_N (hoststfle, 0, "s390x_dirtyhelper_STFLE",
> +                       &s390x_dirtyhelper_STFLE, mkIRExprVec_0());
> +

alignment

> +   stmt(IRStmt_Dirty(d));
> +
> store(mkexpr(op2addr),binop(Iop_And64,mkexpr(hoststfle),mkU64(STFLE_DW0)));

In an earlier review I had requested spaces after commas in argument lists.
They are still missing. Please add them.

> Index: VEX/priv/guest_s390_helpers.c
> ===================================================================
> --- VEX/priv/guest_s390_helpers.c	(revision 1070)
> +++ VEX/priv/guest_s390_helpers.c	(working copy)
> @@ -235,6 +235,23 @@
>  }
>  
>  /*------------------------------------------------------------*/
> +/*--- Dirty helper for Store Facility instruction          ---*/
> +/*------------------------------------------------------------*/
> +
> +#if defined(VGA_s390x)
> +ULong s390x_dirtyhelper_STFLE(void)
> +{
> +  ULong hoststfle;

A blank line is missing here to deparate declarations from statements.

> +  asm volatile("stfle %0" /* stfle */
> +                    : "=m" (hoststfle) : : "0","cc");
> +  return hoststfle;
> +
> +}
> +#else
> +ULong s390x_dirtyhelper_STFLE(void);{return 3;}
> +#endif /* VGA_s390x */
> +
> +/*------------------------------------------------------------*/
>  /*--- Helper for condition code.                           ---*/
>  /*------------------------------------------------------------*/
>
Comment 10 divya 2011-05-20 11:57:26 UTC
Created attachment 60179 [details]
updated patch for stfle instruction

Hi,
I am giving the updated patch with suggested changes. I must admit that I am bad in giving indentations and spaces. Please review it .
Comment 11 Florian Krohm 2011-05-23 09:27:32 UTC
(In reply to comment #10)
> Created an attachment (id=60179) [details]
> updated patch for stfle instruction
> 
> Hi,
> I am giving the updated patch with suggested changes. I must admit that I am
> bad in giving indentations and spaces. Please review it .

The patch does not apply cleanly.
I see this:

florian@ds9:~/valgrind/trunk> patch --dry -p0 < /home/florian/Downloads/stfle-patch
patching file VEX/priv/guest_s390_toIR.c
Hunk #1 succeeded at 10552 with fuzz 1 (offset 54 lines).
Hunk #2 succeeded at 11025 (offset 54 lines).
patching file VEX/priv/guest_s390_defs.h
Hunk #1 FAILED at 79.
1 out of 1 hunk FAILED -- saving rejects to file VEX/priv/guest_s390_defs.h.rej
patching file VEX/priv/guest_s390_helpers.c
Hunk #1 succeeded at 278 with fuzz 1 (offset 43 lines).
patching file tests/s390x_features.c
patching file tests/Makefile.am
Hunk #1 FAILED at 22.
1 out of 1 hunk FAILED -- saving rejects to file tests/Makefile.am.rej
patching file none/tests/s390x/stfle.vgtest
patching file none/tests/s390x/Makefile.am
Hunk #1 FAILED at 5.
1 out of 1 hunk FAILED -- saving rejects to file none/tests/s390x/Makefile.am.rej
patching file none/tests/s390x/stfle.stderr.exp
patching file none/tests/s390x/stfle.stdout.exp
patching file none/tests/s390x/stfle.c
Comment 12 divya 2011-05-23 13:27:30 UTC
Hi Florian,

I have created the patch using svn diff. and I am able to apply patch successfully.

divya@tuxmaker:~/new1/trunk/src> patch --dry -p0 < stfle.patch 
patching file VEX/priv/guest_s390_toIR.c
patching file VEX/priv/guest_s390_defs.h
patching file VEX/priv/guest_s390_helpers.c
patching file tests/s390x_features.c
patching file tests/Makefile.am
patching file none/tests/s390x/stfle.vgtest
patching file none/tests/s390x/Makefile.am
patching file none/tests/s390x/stfle.stderr.exp
patching file none/tests/s390x/stfle.stdout.exp
patching file none/tests/s390x/stfle.c

Just the difference is I am doing in src folder.Can you please check by doing in src or should I change the patch.
Comment 13 Florian Krohm 2011-05-23 14:00:32 UTC
(In reply to comment #12)
> Hi Florian,
> 
> I have created the patch using svn diff. and I am able to apply patch
> successfully.
> 
> divya@tuxmaker:~/new1/trunk/src> patch --dry -p0 < stfle.patch 

I do not know what the src directory is. When I check out valgrind from svn there is no src directory. 

The file s390_features.c is already in the repository. See comment #9 above.
There is no need to resubmit it. This is one of the reasons why your patch does not apply. 

I suggest you check out a brand-new version of valgrind and try to apply your patch. Then you see what I mean.
Comment 14 divya 2011-05-23 14:30:05 UTC
Hi,
I am not seeing s390_features.c when I am checking out the latest version of valgrind from the tuxmaker. 

src directory is in trunk/src.

I saw that you are applying in trunk folder itself and I am doing in src folder because the patch is created in src folder.
Comment 15 divya 2011-05-24 12:06:12 UTC
Created attachment 60263 [details]
updated patch for stfle instruction

This patch should work.
Comment 16 Christian Borntraeger 2011-05-24 15:50:01 UTC
Created attachment 60272 [details]
fixup for patch 60263

Here is a fixup patch agains 60263.
(Divya, can you apply and merge both patches, test it and then reattach the new patch? Please also mark the old patches as obsolete when doing so)

1. 
+#define STFLE_DW0 (STFLE_DW0_ZARCH | STFLE_DW0_N3 | STFLE_DW0_STFLE \
+| STFLE_DW0_LDISP | STFLE_DW0_EIMM | STFLE_DW0_GIE                  \
+| STFLE_DW0_EXRL | STFLE_DW0_FP)

This should be               ^^^^ FPSUPP

2. add stfle to none/tests/s390x/Makefile.am

3. in none/tests/s390x/stfle.c: move main after stfle to avoid:
stfle.c:7: warning: implicit declaration of function ‘stfle’

4. make the stfle output machine independent (with 0x.... instead
of the real values) and stdout_filter

5. please add \n newlines to the printfs in stfle.c
(The testcase must not fail!)

6. Remove one newline in none/tests/s390x/stfle.stdout.exp
Comment 17 divya 2011-05-26 13:01:49 UTC
Created attachment 60348 [details]
STFLE patch

Updated patch with changes.
Comment 18 Florian Krohm 2011-05-28 01:23:36 UTC
(In reply to comment #17)
> Created an attachment (id=60348) [details]
> STFLE patch
> 
> Updated patch with changes.

> Index: VEX/priv/guest_s390_helpers.c
> ===================================================================
> --- VEX/priv/guest_s390_helpers.c	(revision 2152)
> +++ VEX/priv/guest_s390_helpers.c	(working copy)
> @@ -278,6 +278,24 @@
>  #endif /* VGA_s390x */
>  
>  /*------------------------------------------------------------*/
> +/*--- Dirty helper for Store Facility instruction          ---*/
> +/*------------------------------------------------------------*/
> +
> +#if defined(VGA_s390x)
> +ULong s390x_dirtyhelper_STFLE(void)
> +{
> +  ULong hoststfle;
> +
> +  asm volatile("stfle %0" /* stfle */
> +                    : "=m" (hoststfle) : : "0","cc");

This does not compile with binutils versions that do not know about stfle.
You need to use .long or .insn here.

I also noticed that there were some lines with trailing white space. I don't remember where.  We don't want that.
Emacs will do the work for you if you add

(add-hook 'before-save-hook 'delete-trailing-whitespace)

to ~/.gnu-emacs
Comment 19 divya 2011-05-30 11:49:20 UTC
Created attachment 60465 [details]
updated patch for stfle instruction with removed whitespaces
Comment 20 Florian Krohm 2011-05-31 11:21:34 UTC
(In reply to comment #19)
> Created an attachment (id=60465) [details]
> updated patch for stfle instruction with removed whitespaces

This code

#else
ULong s390x_dirtyhelper_STFLE(void);{return 3;}
#endif /* VGA_s390x */

is not valid C. It won't compile on non-s390 platforms.
Can you please test your patch before posting it?
Comment 21 divya 2011-05-31 12:18:54 UTC
Created attachment 60499 [details]
updated patch for stfle instruction

Florian, Do you use the same testcase on non-s390x platforms . Can you please tell me on which platform you are checking.
It was a semi-colon mistake .
Comment 22 Julian Seward 2011-05-31 12:32:01 UTC
> Florian, Do you use the same testcase on non-s390x platforms . Can you please
> tell me on which platform you are checking.

At least check that any patch works on x86-linux, both 32- and 64-bit,
since those are our primary targets.  "Works" means doesn't break the
build, and the regression tests have the same failures before and
after the patch.
Comment 23 divya 2011-05-31 14:39:54 UTC
Created attachment 60509 [details]
Changed patch ofr STFLE instruction

This patch should work fine.
Comment 24 Christian Borntraeger 2011-05-31 15:16:36 UTC
> --- Comment #23 from divya <divyvyas linux vnet ibm com>  2011-05-31 14:39:54 ---
> Created an attachment (id=60509)
>  --> (http://bugs.kde.org/attachment.cgi?id=60509)
> Changed patch ofr STFLE instruction
> 
> This patch should work fine.

This one looks ok now.
Comment 25 Florian Krohm 2011-06-29 00:58:53 UTC
Divya,

sorry for the long delay here. I was out on vacation for a while.. I'm looking at the patch again and want to nail it soon.

I was reading in the POPs what STFLE does and I'm not sure that the dirty helper does the right thing. This is the code from your patch:

ULong s390x_dirtyhelper_STFLE(void)
{
  ULong hoststfle;

  asm volatile(" .insn s,0xb2b00000,%0"             /* stfle */
               : "=m" (hoststfle) : : "0","cc");
  return hoststfle;
}

From what I understand, STFLE can write some number of double words beginning at the address given in the instruction. The number of double words written can be larger than 1. The code above assumes that at most 1 double word is written. So there is a problem. And to avoid it you need to assign a value to register 0 that tells STFLE how many double words are available at the address. Here is the relevant paragraph from POPs:

The size of the second operand, in doublewords, is
one more than the value specified in bits 56-63 of
general register 0. The remaining bits of general reg-
ister 0 are unassigned and should contain zeros; oth-
erwise, the program may not operate compatibly in
the future.

I've also peeked into the linux kernel source and it seconds what I'm thinking :)   If you're interested look at arch/s390/kernel/early.c around line 277 (I'm referring to the 2.6.39.2 kernel).

So I think register 0 needs to be set to 0 prior to issuing stfle. Note, that you need to modify the constaint for register 0. Currently, it is clobbered, but in reality it is read and then written. So you need something like +d(0) or so.

Finally, here is something I do not understand. In s390_irgen_STFLE you "and" the facility bits with STFLE_DW0. What is the reason for that? Shoudn't we simply pass along what stfle returns?
Comment 26 divya 2011-06-29 05:27:42 UTC
Hi Florian,

Yes as a start,We started with only writing 1 double word and if it works then proceed with more than 1.

I will look into that.

and after calling dirty helper it returns the facilities bits which are installed in s390 machine. so we do 'and' with STFLE_DW0 to get the facilities bits which are available under valgrind. like if Decimal floating point support is not availible then it should turn off the respective bit.
Comment 27 Florian Krohm 2011-06-29 15:29:26 UTC
(In reply to comment #26)
> 
> and after calling dirty helper it returns the facilities bits which are
> installed in s390 machine. so we do 'and' with STFLE_DW0 to get the facilities
> bits which are available under valgrind. like if Decimal floating point support
> is not availible then it should turn off the respective bit.

If we do what you propose then certain programs will behave differently 
under valgrind as compared to a native run.
Consider the following program

int facility_bit_8_is_set(void)
{
  unsigned long long hoststfle;
  register unsigned long long reg0 asm("0") = 0;

  asm volatile(" .insn s,0xb2b00000,%0"             /* stfle */
               : "=m" (hoststfle), "+d"(reg0) : : "cc");

  return hoststfle & (63 - 8);
}

int main()
{
  if (facility_bit_8_is_set())
    printf("Extended DAT facility is installed\n");
  else
    printf("Extended DAT facility is not installed\n");

  return 0;
}

Assuming further that the host has the extended DAT facility installed, then
the program would print 
when run natively:  Extended DAT facility is installed
when run under valgrind: Extended DAT facility is not installed

Generally, we want to avoid client programs to behave differently when run
under valgrind. So let's not do the "and" operation.
Comment 28 Florian Krohm 2011-06-29 16:34:51 UTC
(In reply to comment #26)
> Hi Florian,
> 
> Yes as a start,We started with only writing 1 double word and if it works then
> proceed with more than 1.
> 

Hi Divya,

here are some thoughts regarding extending the scheme. I think we should hammer this out now so we don't have to revisit it later. 

The largest assigned facility bit is (currently) #77. 
So we need 2 double words to store all bits.

Problem is that we cannot return a value larger than 64 bit because it won't
fit into a hardware register. So the initial idea of 

   IRTemp hoststfle = newTemp(Ity_I128);  // I128

   d = unsafeIRDirty_1_N (hoststfle, 0, "s390x_dirtyhelper_STFLE",

will not work.

Something that might work is to add two slots in the guest state where the
facility bits are stored. 

s390x_dirtyhelper_STFLE can then return the condition code. So we can set that
properly (and not set it to always-zero as we currently do).

Here's a sketch of what I was thinking about:

In libvex_s390x_common.h

/* Number of double words needed to store all facility bits */
#define S390_NUM_FACILITY_DW 2


In libvex_guest_s390x.h move this section to the end of the struct definition
(and recompute the offsets):

/*------------------------------------------------------------*/
/*--- S390 miscellaneous registers                         ---*/
/*------------------------------------------------------------*/

   /*  320 */  ULong guest_counter;
   /*  328 */  UInt guest_fpc;
   /* 4-byte hole to enforce alignment requirements */
   /*  336 */  ULong guest_IA;

Then add:
   /*  ??? */  ULong guest_facilities[S390_NUM_FACILITY_DW];

Should S390_NUM_FACILITY_DW change in the future, then we do not need to make
any changes here.

in guest_s390_toIR.c

static HChar *
s390_irgen_STFLE(IRTemp op2addr)
{
   IRDirty *d;

   IRTemp cc = newTemp(Ity_I64);

   d = unsafeIRDirty_1_N (cc, 0, "s390x_dirtyhelper_STFLE",
                          &s390x_dirtyhelper_STFLE,
                          mkIRExprVec_0());

   d->needsBBP = 1;  /* Need to pass pointer to guest state to helper */

   d->fxState[0].fx     = Ifx_Write;
   d->fxState[0].offset = S390_GUEST_OFFSET(guest_facility_dw)
   d->fxState[0].size   = S390_NUM_FACILITY_DW * sizeof(ULong);

   d->fxState[1].fx     = Ifx_Modify;  /* read then write */
   d->fxState[1].offset = S390_GUEST_OFFSET(guest_r0)
   d->fxState[1].size   = sizeof(ULong);

   stmt(IRStmt_Dirty(d));

   s390_cc_set(cc);

   return "stfle";
}

in guest_s390_helpers.c

ULong s390x_dirtyhelper_STFLE(VexGuestS390xState *guest_state)
{
  ULong hoststfle, cc;
  register ULong reg0 asm("0") = guest_state->guest_r0 & 0xF;  /* r0[56:63] */

  /* We cannot store more than S390_MAX_FACILITY_DW (and it makes not much
     sense to do so anyhow) */
  if (reg0 > S390_MAX_FACILITY_DW - 1)
    reg0 = S390_MAX_FACILITY_DW - 1;

  asm volatile(" .insn s,0xb2b00000,%0"             /* stfle */
                "ipm %1\n"
                "srl %1,28\n"
               : "=m" (hoststfle), "+d"(reg0), "=d"(cc) : : "cc");

  /* Update guest register 0  with what STFLE set r0 to */
  guest_state->guest_r0 = reg0;

  return cc;
}

Also in function LibVEX_GuestS390X_initialise:

  for (i = 0; i < S390_MAX_FACILITY_DW; ++i)
    state->guest_facilities[i] = 0;

This is all not tested, just a sketch. What kind of scheme have you been thinking about?
Comment 29 Julian Seward 2011-06-29 16:42:50 UTC
(In reply to comment #28)
> Problem is that we cannot return a value larger than 64 bit because it won't
> fit into a hardware register. So the initial idea of

Does this need to be atomic?  Is it performance critical?  If no to
both questions, you could use a nasty kludge: call the helper twice,
passing it an argument telling it whether you want to return the lower
or upper 64 bits of a 128 bit result.  It's ugly but simple.
Comment 30 Florian Krohm 2011-06-29 19:30:17 UTC
(In reply to comment #29)
> (In reply to comment #28)
> > Problem is that we cannot return a value larger than 64 bit because it won't
> > fit into a hardware register. So the initial idea of
> 
> Does this need to be atomic?  Is it performance critical?  If no to
> both questions, you could use a nasty kludge: call the helper twice,
> passing it an argument telling it whether you want to return the lower
> or upper 64 bits of a 128 bit result.  It's ugly but simple.

The answer is "no" to both. So we could use the scheme. 
The nasty thing is that the number of double words stored by STFLE depends on the value of r0 at the time stfle is invoked. On a z196 (latest model):

r0   #double words stored
--------------------------
0      1
1      2
>=2    2, maybe more (it's unspecified)

I forgot to handle that in my sketch... 
Hmm, since the dirty helper allows writing to memory the following could also work. It wouldn't need the extra pseudo regs in the guest state.

static HChar *
s390_irgen_STFLE(IRTemp op2addr)
{
   IRDirty *d;

   IRTemp cc = newTemp(Ity_I64);

   d = unsafeIRDirty_1_N (cc, 0, "s390x_dirtyhelper_STFLE",
                          &s390x_dirtyhelper_STFLE,
                          mkIRExprVec_1( mkexpr(op2addr)));

   d->needsBBP = 1;  /* Need to pass pointer to guest state to helper */

   d->fxState[0].fx     = Ifx_Modify;  /* read then write */
   d->fxState[0].offset = S390_GUEST_OFFSET(guest_r0)
   d->fxState[0].size   = sizeof(ULong);
   d->nFxState = 1;

   d->mAddr = mkexpr(op2addr);
   d->mSize = S390_NUM_FACILITY_DW * sizeof(ULong); /* pretend all double words are written */
   d->mFx   = Ifx_Write;

   stmt(IRStmt_Dirty(d));

   s390_cc_set(cc);

   return "stfle";
}


ULong s390x_dirtyhelper_STFLE(VexGuestS390xState *guest_state, HWord addr)
{
  ULong hoststfle[S390_NUM_FACILITY_DW], cc, num_dw;
  register ULong reg0 asm("0") = guest_state->guest_r0 & 0xF;  /* r0[56:63] */

  /* We cannot store more than S390_NUM_FACILITY_DW (and it makes not much
     sense to do so anyhow) */
  if (reg0 > S390_NUM_FACILITY_DW - 1)
    reg0 = S390_NUM_FACILITY_DW - 1;

  num_dw = reg0 + 1;  /* number of double words written */

  asm volatile(" .insn s,0xb2b00000,%0"             /* stfle */
                "ipm %1\n"
                "srl %1,28\n"
               : "=m" (hoststfle), "+d"(reg0), "=d"(cc) : : "cc");

  /* Update guest register 0  with what STFLE set r0 to */
  guest_state->guest_r0 = reg0;

  for (i = 0; i < num_dw; ++i)
    ((ULong *)addr)[i] = hoststfle[i];

  return cc;
}
Comment 31 divya 2011-06-30 11:59:34 UTC
Hi florian,

I agree with you that stfle write 2 double words as I have also noticed that.
I think your suggestion is correct and I will implement stfle code according to the code given by you.

and regarding the "anding" the stfle output of dirty helper with STFLE_DW0 , I have one doubt.We don't have decimal floating point support in valgrind.Will it give the respective bit disabled?
Comment 32 Florian Krohm 2011-06-30 13:51:13 UTC
(In reply to comment #31)

> and regarding the "anding" the stfle output of dirty helper with STFLE_DW0 , I
> have one doubt.We don't have decimal floating point support in valgrind.Will it
> give the respective bit disabled?

It is true, that valgrind does not yet support decimal floating point. However, the STFLE insn is about the facilities that the *hardware* supports, not what valgrind supports. So I think we should pass along the unmodified facility bits.

But suppose, we did "and" with STFLE_DW. What would be the benefit? On the con side we have the argument from comment #27.
Comment 33 divya 2011-07-01 13:52:08 UTC
Created attachment 61531 [details]
STFLE patch

Hi Florian,

Thank you so much for helping me and giving guidance.

Please try this patch if this matches your requirements.
I have made the number of doublewords as 2 and defined in libvex_s390x_common.h.
and,In future if it increases then we have to just edit this.
Comment 34 Florian Krohm 2011-07-04 04:41:14 UTC
(In reply to comment #33)
> Created an attachment (id=61531) [details]

> 
> Please try this patch if this matches your requirements.

After applying the patch compilation fails on x86. See comment #22.
Check the prototype for s390x_dirtyhelper_STFLE.

In function s390x_dirtyhelper_STFLE: indentation is 3 white spaces. In your patch it is two. (Hint: emacs will do the work for you)

none/tests/s390x/stfle.c

When the testcase is run, it may or may not successfully compare against the expected output. The reason is here:

   printf("the value is 0x%16.16llx and cc is %d\n",*list,cc);

This writes out the facility bits of the host. But those may be different depending on the machine the test is running on. 
My suggestion here is to not write those facility bits out. Just write out the condition code. Additionally, we should check that the returned facility bits have bits #1 and #2 set. valgrind only runs on z/arch, so the host must have those facilities. That gives us a warm feeling that the stfle call returned something sensible and the inline assembly is correct :)

The first argument to the stfle function points to 8 bytes of memory (facility_list_extended). But the stfle call will write up to 2 double words into that area, so there is a buffer overflow in the 2nd invocation of stfle. 
Just declare facility_list_extended to be an array of, say, 10. Which will be large enough until end of humanity. 

Finally, the number of double words written by stfle depends on the machine model. On newer machines it is 2 and the testcase will compare. But on z990 it will be 1, so the testcase will miscompare. We need to make sure that this testcase only runs on new enough machines. s390x_features should be extended
so it can also test for the machine model. I'm thinking that s390x-features should take an optional additional command line argument of the form

>z990
=z196
<z800

meaning that the machine model should be
- any machine newer z990
- exactly z196
- any machine older than z800

respectively. Bonus points, for handling >= , <= , and !=
Look at coregrind/m_machine.c function VG_(get_machine_model) for how to figure out the machine mode by reading /proc/cpuinfo. You only need to adapt it which is pretty straight forward. Instead of VG_(open) just use "open" etc.. I'm sure you'll figure it out.
Comment 35 divya 2011-07-06 09:39:26 UTC
Created attachment 61636 [details]
updated patch for stfle instruction

Hi Florian,

I am not sure whether I understood correctly or not but from what I understood I have edited s390x-features.c to take second parameter as machine-check for stfle instruction.So the testcase will run only on newer machines means z9,z10 and z196. Please correct if me I am wrong. In s390x-features.c ,I have given only one condition.If the coding is as you expected then I will proceed . I think what I have implemented is with respect to stfle instructions,I have to implement in a generalize way. 
 
Other than that ,I think the testcase is working as expected.
Comment 36 Florian Krohm 2011-07-07 22:16:06 UTC
I've checked in r11862 which provides an updated s390x_features.c
Comment 37 Florian Krohm 2011-07-11 01:58:59 UTC
Divya, I've checked in  VEX r2171 and valgrind r11864 based on your patch. 
I added some modifications for consistent code layout, more comments and minor fixes in the testcase.
Comment 38 Florian Krohm 2011-07-11 01:59:32 UTC
Closing it.
Comment 39 divya 2011-07-11 04:21:35 UTC
Thanks Florian....
Comment 40 Christian Borntraeger 2011-07-18 14:54:02 UTC
> Generally, we want to avoid client programs to behave differently when run
> under valgrind. So let's not do the "and" operation.

Florian,

regarding this aspect, I think different.
We _must_ clear out facilities that valgrind cannot handle, no?
Comment 41 Florian Krohm 2011-07-18 15:51:27 UTC
Hi Christian, are you back from hiatus or just being bored :) ??

Perhaps you could elaborate why you think that the facilities must be cleared out ?
Comment 42 Christian Borntraeger 2011-07-19 17:36:31 UTC
On 18/07/11 17:51, Florian Krohm wrote:
> --- Comment #41 from Florian Krohm <britzel acm org>  2011-07-18 15:51:27 ---
> Hi Christian, are you back from hiatus or just being bored :) ??

Just checked my mails, will be away another 4 weeks :-)


> Perhaps you could elaborate why you think that the facilities must be cleared
> out ?

There are some newer libraries which use stfle to decide which backend to use. 
(e.g. some openssl + hw crypto libraries)
When we claim support for instructions that valgrind cant handle the application
will fail. When we clear out the facilities we force these libraries into a fall
back code (which is not ideal) but at least the applications itself will run fine.

So I think only passing known good facilities will increase the number of applications
valgrind can check.

In the end its a trade off between correctness (and we have other things that behave
different from native) and better application support.

Christian
Comment 43 Florian Krohm 2011-07-27 14:34:04 UTC
(In reply to comment #42)

> There are some newer libraries which use stfle to decide which backend to use. 
> (e.g. some openssl + hw crypto libraries)
> When we claim support for instructions that valgrind cant handle the
> application
> will fail. When we clear out the facilities we force these libraries into a
> fall
> back code (which is not ideal) but at least the applications itself will run
> fine.
> 

I understand. 
Looking at a previous patch revision the facility bits from STFLE were
and'ed with this:

#define STFLE_DW0 (STFLE_DW0_ZARCH | STFLE_DW0_N3 | STFLE_DW0_STFLE \
      | STFLE_DW0_LDISP | STFLE_DW0_EIMM | STFLE_DW0_GIE                  \
      | STFLE_DW0_EXRL | STFLE_DW0_FPSUPP)

That's masking out too many bits. I think we only want to mask out the bits
of those facilities that enable a set of user-space instructions which we 
don't support. E.g. we want to mask out bit 20 (HFP something) but not bit 8
(extended DAT).
Can you assemble the list of those facilities?

It would also be a good thing to issue a message upon decoding STFLE that
we're mucking with the facility bits and that some programs may behave
differently. Actually, ideally we want to give that message only in those
cases where the actual set of facilities includes one that we're masking out.
But it's probably not allowed to print something in a dirty helper, is it?