Created attachment 58905 [details] Patch to fix this bug Some support for IBM POWER7 instructions is being added via bug # 267630. There have been several iterations of the POWER7 support patch attached to that bug. In Version 3 of that patch, a change was made to memcheck/mc_translate.c to fix a problem I had found. Somehow, that fix did not get integrated into that patch correctly, because running the new none/tests/ppc{32|64}/test_isa_2_06_part1 under Valgrind+POWER7 results in the following error: --------------------------------------------------- ERROR = IRStmt.Put.Tmp: tmp and expr do not match vex: the `impossible' happened: sanityCheckFail: exiting due to bad IR --------------------------------------------------- The error occurs when translating the new POWER7 fcfidus instruction. The attached patch fixes this error. NOTE: This bug is dependent on bug # 267630. How can I add that dependency? I don't see an editable "Depends on" field in KDE bug tracker.
(In reply to comment #0) > Created an attachment (id=58905) [details] > Patch to fix this bug Pls re-check this .. seems to me that the patch still doesn't move it to the right category. Shouldn't it live with its close relative Iop_I64StoF32, in the category immediately preceding it in the source file?
(In reply to comment #1) > (In reply to comment #0) > > Created an attachment (id=58905) [details] [details] > > Patch to fix this bug > > Pls re-check this .. seems to me that the patch still doesn't > move it to the right category. Shouldn't it live with its close > relative Iop_I64StoF32, in the category immediately preceding it > in the source file? On ppc64, single precision floats are stored in memory with 32-bits, but when loaded into a floating point register (which are 64-bit registers), they are expanded to 64-bits. So we need the destination to be Ity_I64. If we put this case with Iop_I64StoF32 (which I see was just recently added for s390), the destination would be an Ity_I32. We have a similar situation with Iop_RoundF64toF32. I agree it's confusing. I don't see a way around it, other than defining new ops (e.g., Iop_I64UtoF32DP). And we may need to do so if we end up colliding with s390 interpretations of F32.
(In reply to comment #2) > expanded to 64-bits. So we need the destination to be Ity_I64. If we put this > case with Iop_I64StoF32 (which I see was just recently added for s390), the > destination would be an Ity_I32. It would be an Ity_F32. > [...] end up colliding with s390 interpretations of F32. To clarify: the IR ops (Iop_whatever) have a fixed meaning that is independent of the front- and back- ends that produce and consume them, respectively. They also have fixed types, which are rigidly enforced by the IR sanity checker. If you think that the ops don't match how you want to use them, then that's a sign that you need to generate some other sequence of conversions. Ah, I see it's been given a type that doesn't match its name: case Iop_I64UtoF32: BINARY(ity_RMode,Ity_I64, Ity_F64); and that doesn't match the stated type in libvex_ir.h: Iop_I64UtoF32, /* IRRoundingMode(I32) x unsigned I64 -> F32 */ That needs to be fixed. I think what your front end needs to generate is Iop_F32toF64 ( Iop_I64UtoF32(roundingmode, arg) ) Although it's longwinded, it's semantically correct. And it doesn't necessarily mean that the backend generates worse code, because the backend is quite at liberty (in iselDblExpr_wrk) to spot the nested IRExpr_Unops and generate a single sequence of instructions which have the desired effect. See iselIntExpr_AMode_wrk in host_x86_isel.c for examples of nested matching. There's even a completely general tree matching facility in ir_match.[ch], although that's overkill in this case.
(In reply to comment #3) > (In reply to comment #2) > > expanded to 64-bits. So we need the destination to be Ity_I64. If we put this > > case with Iop_I64StoF32 (which I see was just recently added for s390), the > > destination would be an Ity_I32. > > It would be an Ity_F32. > > > > [...] end up colliding with s390 interpretations of F32. > > To clarify: the IR ops (Iop_whatever) have a fixed meaning that is > independent of the front- and back- ends that produce and consume > them, respectively. They also have fixed types, which are rigidly > enforced by the IR sanity checker. If you think that the ops don't > match how you want to use them, then that's a sign that you need to > generate some other sequence of conversions. > > Ah, I see it's been given a type that doesn't match its name: > > case Iop_I64UtoF32: BINARY(ity_RMode,Ity_I64, Ity_F64); > > and that doesn't match the stated type in libvex_ir.h: > > Iop_I64UtoF32, /* IRRoundingMode(I32) x unsigned I64 -> F32 */ > > That needs to be fixed. OK. I have a patch that I'll attach that fixes the above and reworks the ppc64 frontend and backend as you suggested below. The patch is a little bigger than it needs to be because I took the opportunity to create/use a helper function for the two places where I64-to-float converts were being requested. Thanks for the advice on resolving this. As I mentioned in comment #2, we have the same situation with Iop_RoundF64toF32, which has existed for some time (i.e., not part of the POWER7 work). I will open a separate bug and attach a fix for that. > > I think what your front end needs to generate is > > Iop_F32toF64 ( Iop_I64UtoF32(roundingmode, arg) ) > > Although it's longwinded, it's semantically correct. And it doesn't > necessarily mean that the backend generates worse code, because the > backend is quite at liberty (in iselDblExpr_wrk) to spot the nested > IRExpr_Unops and generate a single sequence of instructions which have > the desired effect. See iselIntExpr_AMode_wrk in host_x86_isel.c for > examples of nested matching. There's even a completely general tree > matching facility in ir_match.[ch], although that's overkill in this > case.
Created attachment 59158 [details] Version 2 of the patch to fix this bug hmmmm . . . I intended to attach this patch yesterday and *thought* that I had, but I guess I didn't. Sorry for the delay.
(In reply to comment #4) [snip] > As I mentioned in comment #2, we have the same situation with > Iop_RoundF64toF32, which has existed for some time (i.e., not part of the > POWER7 work). I will open a separate bug and attach a fix for that. Just so you know I've not forgotten about this . . . After I get confirmation from you that my version 2 patch attached to this bug is acceptable, I'll open the new bug to fix Iop_RoundF64toF32 since the fix will be very similar. Thanks. [snip]
(In reply to comment #6) > from you that my version 2 patch attached to this bug is acceptable, I'll open > the new bug to fix Iop_RoundF64toF32 since the fix will be very similar. Yeah, that looks OK. Sorry for the delay. It might not land till after easter now, though. Pls continue w/ the Iop_RoundF64toF32 patch. Although I'm not 100% clear what problem you're referring to in this case. Maybe I can infer it from the patch.
Created attachment 59190 [details] Version 3 of patch to fix this bug I realized that the previous version of this patch was built against an older SVN snapshot, so I re-spun it against today's SVN. No other changes were made.
Hmm, this turns out to be a much bigger swamp than it seemed at first, and I got a headache trying to figure out if the patch is really correct or not. It doesn't help that I don't have access to a P7 capable machine to test stuff on -- the best I have access to is an aging PPC970 box. I committed r2130, which incorporates the type changes for Iop_I64UtoF32, but nothing else. So it will still fail in the back end when a fcfidus comes through. For one thing, what I said in comment #4 is wrong: Iop_F32toF64 ( Iop_I64UtoF32(roundingmode, arg) ) can't be generated directly as a I64U -> F64 conversion, since that omits the rounding to F32 range. So there's no point in trying to handle this case specially. So you'll need to make a iselFltExpr case to handle Iop_I64UtoF32 separately. (Sorry.) Second point of concern is the proliferation of cases associated with Pin_FpCftI. Prior to r2127 (your IBM Power ISA 2.06 -- stage 1 patch) it looked like this: /* fcfid/fctid/fctiw. Note there's no fcfiw so fromI==True && int32==True is not allowed. */ struct { Bool fromI; /* False==F->I, True==I->F */ Bool int32; /* True== I is 32, False==I is 64 */ HReg src; HReg dst; } FpCftI; With the 2 booleans, it could represent 4 different instructions, 3 of which (fcfid/fctid/fctiw) actually exist in the 2.05 ISA (AIUI), and the 4th is disallowed by the constructor, PPCInstr_FpCftI(). So there's no possibility of representing nonexistent instructions. r2127 introduced 2 new booleans Bool syned; Bool dst64; /* True==dest is 64bit; False==dest is 32bit */ For one thing, dst64 is mis-named. What it indicates is whether the float side is 64 bits or not; but whether it is the destination or source is determined by fromI. So it should be called flt64. I'm also concerned that Pin_FpCftI can now represent 16 different instructions, of which I suspect not all actually exist. Yet neither the constructor PPCInstr_FpCftI() nor the emitter (case Pin_FpCftI in emit_PPCInstr) rigorously reject nonsensical combinations. For example: if (i->Pin.FpCftI.fromI == False && i->Pin.FpCftI.int32 == True) { // fctiw (conv f64 to i32), PPC32 p404 p = mkFormX(p, 63, fr_dst, 0, fr_src, 14, 0); goto done; } This ignores .dst64 and just assumes the floating side is 64 bit (afaics). It would be good to tidy this up, at a bare minimum so as to guarantee that the emitter will assert on any nonsensical combinations. Possibly splitting Pin_FpCftI into two cases, fromI and !fromI, might help. FWIW, I'm a big fan of the "No junk, no confusion" rule -- see http://www.faqs.org/docs/artu/ch04s02.html.
Created attachment 59712 [details] Version 4 of the patch to fix this bug Julian, I think this version 4 patch addresses all the issues you raised above.
Committed, r2148. Thanks for the tidying up.