270851 – New IBM POWER7 fcfidus instruction causes memcheck to fail

Bug 270851 - New IBM POWER7 fcfidus instruction causes memcheck to fail

Summary: New IBM POWER7 fcfidus instruction causes memcheck to fail

Status:	RESOLVED FIXED

Alias:	None

Product:	valgrind
Classification:	Developer tools
Component:	memcheck (show other bugs)
Version:	3.7 SVN
Platform:	Unlisted Binaries Linux

Importance:	NOR normal
Target Milestone:	---
Assignee:	Julian Seward

URL:
Keywords:

Depends on:
Blocks:

Reported:	2011-04-13 18:06 UTC by Maynard Johnson
Modified:	2011-05-17 17:56 UTC (History)
CC List:	0 users

See Also:
Latest Commit:
Version Fixed In:

Attachments
Patch to fix this bug (784 bytes, patch) 2011-04-13 18:06 UTC, Maynard Johnson	Details
Version 2 of the patch to fix this bug (6.84 KB, patch) 2011-04-20 15:47 UTC, Maynard Johnson	Details
Version 3 of patch to fix this bug (7.11 KB, patch) 2011-04-22 00:01 UTC, Maynard Johnson	Details
Version 4 of the patch to fix this bug (7.30 KB, patch) 2011-05-07 03:09 UTC, Maynard Johnson	Details
Show Obsolete (3) View All Add an attachment

Note You need to log in before you can comment on or make changes to this bug.

Description Maynard Johnson 2011-04-13 18:06:18 UTC

Created attachment 58905 [details]
Patch to fix this bug

Some support for IBM POWER7 instructions is being added via bug # 267630.  There have been several iterations of the POWER7 support patch attached to that bug.  In Version 3 of that patch, a change was made to memcheck/mc_translate.c to fix a problem I had found.  Somehow, that fix did not get integrated into that patch correctly, because running the new none/tests/ppc{32|64}/test_isa_2_06_part1 under Valgrind+POWER7 results in the following error:

---------------------------------------------------
ERROR = IRStmt.Put.Tmp: tmp and expr do not match


vex: the `impossible' happened:
   sanityCheckFail: exiting due to bad IR
---------------------------------------------------

The error occurs when translating the new POWER7 fcfidus instruction. The attached patch fixes this error.

NOTE: This bug is dependent on bug # 267630.  How can I add that dependency?  I don't see an editable "Depends on" field in KDE bug tracker.

Comment 1 Julian Seward 2011-04-19 17:48:35 UTC

(In reply to comment #0)
> Created an attachment (id=58905) [details]
> Patch to fix this bug

Pls re-check this .. seems to me that the patch still doesn't
move it to the right category.  Shouldn't it live with its close
relative Iop_I64StoF32, in the category immediately preceding it
in the source file?

Comment 2 Maynard Johnson 2011-04-19 18:26:19 UTC

(In reply to comment #1)
> (In reply to comment #0)
> > Created an attachment (id=58905) [details] [details]
> > Patch to fix this bug
> 
> Pls re-check this .. seems to me that the patch still doesn't
> move it to the right category.  Shouldn't it live with its close
> relative Iop_I64StoF32, in the category immediately preceding it
> in the source file?
On ppc64, single precision floats are stored in memory with 32-bits, but when loaded into a floating point register (which are 64-bit registers), they are expanded to 64-bits.  So we need the destination to be Ity_I64.  If we put this case with Iop_I64StoF32 (which I see was just recently added for s390), the destination would be an Ity_I32.  We have a similar situation with Iop_RoundF64toF32.  I agree it's confusing.  I don't see a way around it, other than defining new ops (e.g., Iop_I64UtoF32DP).  And we may need to do so if we end up colliding with s390 interpretations of F32.

Comment 3 Julian Seward 2011-04-19 18:57:27 UTC

(In reply to comment #2)
> expanded to 64-bits.  So we need the destination to be Ity_I64.  If we put this
> case with Iop_I64StoF32 (which I see was just recently added for s390), the
> destination would be an Ity_I32.

It would be an Ity_F32.


> [...] end up colliding with s390 interpretations of F32.

To clarify: the IR ops (Iop_whatever) have a fixed meaning that is
independent of the front- and back- ends that produce and consume
them, respectively.  They also have fixed types, which are rigidly
enforced by the IR sanity checker.  If you think that the ops don't
match how you want to use them, then that's a sign that you need to
generate some other sequence of conversions.

Ah, I see it's been given a type that doesn't match its name:

      case Iop_I64UtoF32: BINARY(ity_RMode,Ity_I64, Ity_F64);

and that doesn't match the stated type in libvex_ir.h:

      Iop_I64UtoF32, /* IRRoundingMode(I32) x unsigned I64 -> F32 */

That needs to be fixed.

I think what your front end needs to generate is

   Iop_F32toF64 ( Iop_I64UtoF32(roundingmode, arg) )

Although it's longwinded, it's semantically correct.  And it doesn't
necessarily mean that the backend generates worse code, because the
backend is quite at liberty (in iselDblExpr_wrk) to spot the nested
IRExpr_Unops and generate a single sequence of instructions which have
the desired effect.  See iselIntExpr_AMode_wrk in host_x86_isel.c for
examples of nested matching.  There's even a completely general tree
matching facility in ir_match.[ch], although that's overkill in this
case.

Comment 4 Maynard Johnson 2011-04-19 21:44:56 UTC

(In reply to comment #3)
> (In reply to comment #2)
> > expanded to 64-bits.  So we need the destination to be Ity_I64.  If we put this
> > case with Iop_I64StoF32 (which I see was just recently added for s390), the
> > destination would be an Ity_I32.
> 
> It would be an Ity_F32.
> 
> 
> > [...] end up colliding with s390 interpretations of F32.
> 
> To clarify: the IR ops (Iop_whatever) have a fixed meaning that is
> independent of the front- and back- ends that produce and consume
> them, respectively.  They also have fixed types, which are rigidly
> enforced by the IR sanity checker.  If you think that the ops don't
> match how you want to use them, then that's a sign that you need to
> generate some other sequence of conversions.
> 
> Ah, I see it's been given a type that doesn't match its name:
> 
>       case Iop_I64UtoF32: BINARY(ity_RMode,Ity_I64, Ity_F64);
> 
> and that doesn't match the stated type in libvex_ir.h:
> 
>       Iop_I64UtoF32, /* IRRoundingMode(I32) x unsigned I64 -> F32 */
> 
> That needs to be fixed.

OK.  I have a patch that I'll attach that fixes the above and reworks the ppc64 frontend and backend as you suggested below.  The patch is a little bigger than it needs to be because I took the opportunity to create/use a helper function for the two places where I64-to-float converts were being requested.  Thanks for the advice on resolving this.

As I mentioned in comment #2, we have the same situation with Iop_RoundF64toF32, which has existed for some time (i.e., not part of the POWER7 work).  I will open a separate bug and attach a fix for that.
> 
> I think what your front end needs to generate is
> 
>    Iop_F32toF64 ( Iop_I64UtoF32(roundingmode, arg) )
> 
> Although it's longwinded, it's semantically correct.  And it doesn't
> necessarily mean that the backend generates worse code, because the
> backend is quite at liberty (in iselDblExpr_wrk) to spot the nested
> IRExpr_Unops and generate a single sequence of instructions which have
> the desired effect.  See iselIntExpr_AMode_wrk in host_x86_isel.c for
> examples of nested matching.  There's even a completely general tree
> matching facility in ir_match.[ch], although that's overkill in this
> case.

Comment 5 Maynard Johnson 2011-04-20 15:47:01 UTC

Created attachment 59158 [details]
Version 2 of the patch to fix this bug

hmmmm . . . I intended to attach this patch yesterday and *thought* that I had, but I guess I didn't.  Sorry for the delay.

Comment 6 Maynard Johnson 2011-04-21 16:44:22 UTC

(In reply to comment #4)
[snip]
> As I mentioned in comment #2, we have the same situation with
> Iop_RoundF64toF32, which has existed for some time (i.e., not part of the
> POWER7 work).  I will open a separate bug and attach a fix for that.
Just so you know I've not forgotten about this . . . After I get confirmation from you that my version 2 patch attached to this bug is acceptable, I'll open the new bug to fix Iop_RoundF64toF32 since the fix will be very similar.  Thanks.
[snip]

Comment 7 Julian Seward 2011-04-21 17:31:47 UTC

(In reply to comment #6)
> from you that my version 2 patch attached to this bug is acceptable, I'll open
> the new bug to fix Iop_RoundF64toF32 since the fix will be very similar. 

Yeah, that looks OK.  Sorry for the delay.  It might not land till
after easter now, though.

Pls continue w/ the Iop_RoundF64toF32 patch.  Although I'm not 100%
clear what problem you're referring to in this case.  Maybe I can
infer it from the patch.

Comment 8 Maynard Johnson 2011-04-22 00:01:22 UTC

Created attachment 59190 [details]
Version 3 of patch to fix this bug

I realized that the previous version of this patch was built against an older SVN snapshot, so I re-spun it against today's SVN.  No other changes were made.

Comment 9 Julian Seward 2011-04-27 12:36:03 UTC

Hmm, this turns out to be a much bigger swamp than it seemed at first,
and I got a headache trying to figure out if the patch is really
correct or not.  It doesn't help that I don't have access to a P7
capable machine to test stuff on -- the best I have access to is an
aging PPC970 box.

I committed r2130, which incorporates the type changes for
Iop_I64UtoF32, but nothing else.  So it will still fail in the back
end when a fcfidus comes through.


For one thing, what I said in comment #4 is wrong:

  Iop_F32toF64 ( Iop_I64UtoF32(roundingmode, arg) )

can't be generated directly as a I64U -> F64 conversion, since that
omits the rounding to F32 range.  So there's no point in trying to
handle this case specially.

So you'll need to make a iselFltExpr case to handle Iop_I64UtoF32
separately.  (Sorry.)


Second point of concern is the proliferation of cases associated with
Pin_FpCftI.  Prior to r2127 (your IBM Power ISA 2.06 -- stage 1 patch)
it looked like this:

     /* fcfid/fctid/fctiw.  Note there's no fcfiw so fromI==True
        && int32==True is not allowed. */
     struct {
        Bool fromI; /* False==F->I, True==I->F */
        Bool int32; /* True== I is 32, False==I is 64 */
        HReg src;
        HReg dst;
     } FpCftI;

With the 2 booleans, it could represent 4 different instructions, 3 of
which (fcfid/fctid/fctiw) actually exist in the 2.05 ISA (AIUI), and
the 4th is disallowed by the constructor, PPCInstr_FpCftI().  So
there's no possibility of representing nonexistent instructions.

r2127 introduced 2 new booleans

        Bool syned;
        Bool dst64; /* True==dest is 64bit; False==dest is 32bit */
 
For one thing, dst64 is mis-named.  What it indicates is whether the
float side is 64 bits or not; but whether it is the destination or
source is determined by fromI.  So it should be called flt64.

I'm also concerned that Pin_FpCftI can now represent 16 different
instructions, of which I suspect not all actually exist.  Yet neither
the constructor PPCInstr_FpCftI() nor the emitter (case Pin_FpCftI in
emit_PPCInstr) rigorously reject nonsensical combinations.  For
example:

    if (i->Pin.FpCftI.fromI == False && i->Pin.FpCftI.int32 == True) {
       // fctiw (conv f64 to i32), PPC32 p404
       p = mkFormX(p, 63, fr_dst, 0, fr_src, 14, 0);
       goto done;
    }

This ignores .dst64 and just assumes the floating side is 64 bit
(afaics).

It would be good to tidy this up, at a bare minimum so as to guarantee
that the emitter will assert on any nonsensical combinations.
Possibly splitting Pin_FpCftI into two cases, fromI and !fromI, might
help.  FWIW, I'm a big fan of the "No junk, no confusion" rule -- see
http://www.faqs.org/docs/artu/ch04s02.html.

Comment 10 Maynard Johnson 2011-05-07 03:09:25 UTC

Created attachment 59712 [details]
Version 4 of the patch to fix this bug

Julian, I think this version 4 patch addresses all the issues you raised above.

Comment 11 Julian Seward 2011-05-08 23:52:51 UTC

Committed, r2148.  Thanks for the tidying up.