Created attachment 107012 [details] Prototype implementation of several AVX-512 instructions Valgrind does not support AVX-512 yet. AVX-512 programming reference and hardware emulator are available at https://software.intel.com/en-us/isa-extensions AVX-512 is supported by GCC 4.9 and later. Attached a prototype implementation of several AVX-512 instructions in Nulgrind. A test for these instructions is located in none/tests/amd64/avx512.c
This is a quite good start. Here are some initial comments. This isn't a full review but should give you some initial feedback. ------------------------------ +// AVX-512 - use a real CPUid +// If XGETBV, XSAVE and XRSTOR fail, might need to update XSaveOpt +void amd64g_dirtyhelper_CPUID_avx512 ( VexGuestAMD64State* st ) { I see that you're doing this in order to get started quickly. But in general this doesn't work, because there's no fixed relationship between what the CPUID of the host claims is supported and what you actually support in your patch. In the end you'll need to return your own values from CPUID, like all the other CPUID helper function variants do. ---------------------------------- +static IRExpr* mkV512 ( UInt val ) +{ + return IRExpr_Const(IRConst_V512(val)); +} Assuming that IRConst_V512 contains 64 bits, one for each byte of the value, this is wrong. UInt is a 32 bit value. You need to use ULong, which is always 64 bits. ---------------------------------- @@ -2511,7 +2796,7 @@ static IRTemp disAMode_copy2tmp ( IRExpr* addr64 ) static IRTemp disAMode ( /*OUT*/Int* len, const VexAbiInfo* vbi, Prefix pfx, Long delta, - /*OUT*/HChar* buf, Int extra_bytes ) + /*OUT*/HChar* buf, Int extra_bytes ) Nit: please don't include whitespace changes ---------------------------------- @@ -17797,7 +18164,7 @@ static Long dis_AESx ( const VexAbiInfo* vbi, Prefix pfx, use of mkU64 rather than mkIRExpr_HWord implies the assumption that the host's word size is 64-bit. */ UInt gstOffD = ymmGuestRegOffset(rG); - UInt gstOffL = regNoL == 16 ? OFFB_YMM16 : ymmGuestRegOffset(regNoL); + UInt gstOffL = regNoL == 32 ? OFFB_ZMM32 : ymmGuestRegOffset(regNoL); Did you mean to use zmmGuestRegOffset here? (I don't know if that even exists, but there's a Y/Z inconsistency here, when comparing to the OFFB_YMM16/ZMM32 text. ---------------------------------- + default: + // All other AVX-512 instructions. Crash before things get worse. + vex_printf("dis_ESC_0F__EVEX - UNRECOGNIZED OPCODE 0x%x\n", opcode); + vpanic(" bye\n "); + break; Please, don't cause the instruction decoder to panic on unimplemented instructions. It might be a while before you implement them all. Instead, use whatever the return conventions are for this function, to indicate decode failure (I think it is to return delta unchanged, but you should check) and let V's illegal-instruction handling deal with it in the normal way. The same applies for all the dis_ESC_* functions that you've added. ---------------------------------- Please use the house style as much as possible, in particular, 3-space indents. I notice you have 4 spaces in many places. + if (pfx & PFX_EVEX) + { + delta=ParseEVEX(&pfx, &esc, delta); + } House style please: { at the end of the condition. And spaces on each side of the =. ---------------------------------- + if (pfx & PFX_EVEX) { + /* EVEX prefixed instruction */ + Bool uses_vvvv = False; + switch (esc) { ... + default: + vassert(0); + } + Per comments above, please ensure we can't get to the vassert(0) unless the decoder is buggy. We should never get there for valid but unhandled instructions. ---------------------------------- diff --git a/VEX/priv/host_amd64_isel.c b/VEX/priv/host_amd64_isel.c + - vregmap2 and vregmap3 are used for 512-bit vector IRTemps. + Maybe it'd be better to use them for 256-bit, too Allocating these is going to be expensive for the 99.9% of blocks that don't require them. It would be nice to only allocate them if we know they will be required. ---------------------------------- + } +do_F64AssistedUnary: + { The use of gotos is OK (it makes the code simpler) but please indent them at the same level as surrounding code. Definitely not at column 1: + } + do_F64AssistedUnary: + { ---------------------------------- + for (int i=0; i<4; i++) { + dst[i] = newVRegV(env); + } (1) please, house style: spaces before/after = and < (2) where does '4' come from? Can you make it less like a magic number (give it a name?) +IRConst* IRConst_V512 ( UInt con ) +{ + IRConst* c = LibVEX_Alloc_inline(sizeof(IRConst)); + c->tag = Ico_V512; + c->Ico.V512 = con; + return c; +} Per comments above, this is definitely wrong if you intend to have one bit per byte for a 512 bit value. ---------------------------------- /* A union for doing 128-bit vector primitives conveniently. */ typedef union { @@ -78,6 +81,7 @@ typedef UShort w16[8]; UInt w32[4]; ULong w64[2]; + double f64[2]; use the house types: Double, not double. ---------------------------------- + UInt V512; /* 64-bit value; see Ico_V512 comment above */ per comments above ---------------------------------- + /* Conflict detection */ + Iop_Clz32x16, + Iop_CfD32x16, + Iop_BcMask_W2D, Please describe the semantics of these a bit. What do they do? ---------------------------------- case Ity_V256: tmp1 = assignNew('V', mce, Ity_I64, unop(Iop_1Sto64, tmp1)); - tmp1 = assignNew('V', mce, Ity_V128, binop(Iop_64HLtoV128, - tmp1, tmp1)); - tmp1 = assignNew('V', mce, Ity_V256, binop(Iop_V128HLtoV256, - tmp1, tmp1)); + tmp1 = assignNew('V', mce, Ity_V128, binop(Iop_64HLtoV128, tmp1, tmp1)); + tmp1 = assignNew('V', mce, Ity_V256, binop(Iop_V128HLtoV256, tmp1, tmp1)); + return tmp1; + case Ity_V512: + tmp1 = assignNew('V', mce, Ity_I64, unop(Iop_1Sto64, tmp1)); + tmp1 = assignNew('V', mce, Ity_V128, binop(Iop_64HLtoV128, tmp1, tmp1)); + tmp1 = assignNew('V', mce, Ity_V256, binop(Iop_V128HLtoV256, tmp1, tmp1)); + tmp1 = assignNew('V', mce, Ity_V512, binop(Iop_V256HLtoV512, tmp1, tmp1)); return tmp1; Why does this change the V256 case? This code is fragile and it would be better not to change handling of existing types. ---------------------------------- - if (UNLIKELY(ty == Ity_V256)) { + if (UNLIKELY((ty == Ity_V512))) { + Int offQ[8]; + IRDirty *diQ[8]; + IRAtom *addrQ[8], *vdataQ[8], *eBiasQ[8]; + if (end == Iend_LE) { + for (int i=0; i<8; i++) + offQ[i]=i*8; + } else { + for (int i=0; i<8; i++) + offQ[7-i]=i*8; + } + for (int i=0; i<8; i++){ + eBiasQ[i] = tyAddr==Ity_I32 ? mkU32(bias+offQ[i]) : mkU64(bias+offQ[i]); House style: - spaces around =, < - 3 char indent - proper indentation inside the loop - use house types (Int, not int)
Created attachment 107354 [details] Corrected prototype implementation Julian, thank you for the comments! Attached a corrected version. Not fixed yet: - Return our own values from the CPUID - Allocate vregmap2 and vregmap3 only when needed I'm working on these issues.
Tanya, any progress on this?
(In reply to Julian Seward from comment #3) > Tanya, any progress on this? Julian, More AVX-512 instructions are added to Nulgrind (at the moment, 168 out of 347 are added), and some AVX-512 functionality (masking and displacement encoding) is updated. I will clean up the code and attach the updated patch in a few days. Regards, Tanya
Created attachment 109000 [details] Updated AVX-512 implementation prototype
Created attachment 109001 [details] Nulgrind test for the AVX-512 instructions The test is based on the existing Nulgrind AVX and AVX-2 tests
Created attachment 109002 [details] Nulgrind test for the AVX instructions on AVX-512 machine AVX regression test based on the existing Nulgrind AVX test, outputs ZMM registers instead of YMM
Created attachment 109004 [details] Nulgrind test for the AVX-2 instructions on AVX-512 machine AVX-2 regression test based on the existing Nulgrind AVX-2 test, outputs ZMM instead of YMM
Created attachment 109005 [details] Nulgrind test for the serial vFMA instructions The "AVX-512_prototype_v3" patch changes the behaviour of the serial vFMA instructions. For these instructions, Valgrind sets destination bits [127:64] or [127:32] to zero; according to the ISE, these bits should remain unchanged. The test covers the changed instructions.
Hello, Sorry for the late response. Please find the attachment for the patch AVX-512_prototype_v3.patch. It mainly adds new instructions to Nulgrind; most of these instructions are not added to Memcheck yet. The patch also enables AVX-512 in VGDB. The patch has these updates to the AVX/AVX2 functionality: 1) AVX and AVX-2 instructions now set the unused upper half of destination ZMM vector register to all zeros. 2) Serial FMA instructions are supposed to keep the destination bits [127:32] for 32-bit instructions or [127:64] for the 64-bit unchanged; original Valgrind sets them to zero. Please let us know if this should be a separate patch. The tests for AVX-512, AVX-2, AVX and serial vFMA instructions are attached. The vFMA test must run correctly on the AVX machines, too. Please let me know if you would like us to provide the list of the implemented AVX-512 instructions. Regards, Tanya
Sorry for the delay. I will review in the coming week (11-15 Dec).
(In reply to Tanya from comment #5) > Created attachment 109000 [details] > Updated AVX-512 implementation prototype This looks pretty good. I have a number of comments, many of them just style/layout, but a few are a bit more than that, but nothing serious. As a side note -- before this lands, I would want to do some performance runs to check that this doesn't impact performance (or correctness) of existing IA support. ----- static +Bool have_avx512(void) { return (vai.hwcaps & VEX_HWCAPS_AMD64_AVX512 ? True : False); Potential operator precedence problelsm for & vs ?: make me nervous. Please use (vai.hwcaps & VEX_HWCAPS_AMD64_AVX512) ? True : False so we don't accidentally end up with vai.hwcaps & (VEX_HWCAPS_AMD64_AVX512 ? True : False) ----- valgrind-wip/valgrind/memcheck/mc_machine.c @@ -873,7 +888,6 @@ if (o == GOF(FPSCR) && sz == 4) return -1; if (o == GOF(TPIDRURO) && sz == 4) return -1; - if (o == GOF(TPIDRURW) && sz == 4) return -1; if (o == GOF(ITSTATE) && sz == 4) return -1; Looks like you need to update your -wip copy. The TPIDRURW line was added recently for ARM. ----- void mc_LOADV_128_or_256_slow ( /*OUT*/ULong* res, Addr a, SizeT nBits, Bool bigendian ) { - ULong pessim[4]; /* only used when p-l-ok=yes */ + ULong pessim[16]; /* only used when p-l-ok=yes */ IIUC, you're now using this also to handle the slow case for 512 bit loads. Correct? If so, (1) please rename it to mc_LOADV_128_or_256_or_512_slow (2) please verify that the 4->16 transition here is correct -- I suspect it is not. A ULong is 64 bits, so 'ULong pessim[4]' contains one byte for each byte in a 256-bit load. So to make it work for 512 bit loads, don't you only need to change it to 'ULong pessim[8]' ? ----- +void amd64g_dirtyhelper_CPUID_avx512 ( VexGuestAMD64State* st ) { Please add a comment saying which processor this is taken from (or most closely represents), like with all the other CPUID helper functions. ----- +#define OFFB_YMM32 offsetof(VexGuestAMD64State,guest_ZMM32) //NULL What does //NULL here mean? Maybe remove it? ----- +#define PFX_EVEXb (1<<30) /* EVEX b bit, if EVEX present, else 0 */ +#define PFX_EVEXTT (0xF<<32) /* EVEX tuple type (4-bits) if EVEX present, else 0 */ Please change these (also, all the pre-existing PFX_ values like PFX_ASO etc) like this 1ULL << 30 0xFULL << 32 so that we don't get problems with (eg) the compiler shifting 0xF as 32-bits left by 32 (giving zero) and *then* widening to 64 bits. ----- +static Int tuple_disp_map[14][3] = { I assume that you want to access this using a TupleType for the first index. Please add this: STATIC_ASSERT(FullVectorMask == 14) just before the definition of tuple_disp_map. ----- +static int getMult(Prefix pfx) { + int type = getEvexType(pfx); Please, use house types (Int instead of int). Int is signed-32 on all platforms that valgrind supports. ----- I'm not sure what function these are in, but anyway .. + + // High 256 bits of ZMM0-ZMM15 registers + for (reg = 0; reg < 16; reg++) { + stmt( IRStmt_StoreG( + Iend_LE, + binop(Iop_Add64, mkexpr(addr), mkU64(1152 + reg * 32)), and later + binop(Iop_Add64, mkexpr(addr), mkU64(1664 + lane*16 + reg*64)), Please don't use magic numbers 1152 and 1664. Where do these come from? Are they guest-state offsets? If yes, please make them be derived from OFFB_* values. If no, please at least document where the numbers come from and what else assumes those same numbers. ----- IRExpr* ea = binop(Iop_Add64, mkexpr(addr), mkU64(576 + reg * 16)); same + IRExpr* ea = binop(Iop_Add64, mkexpr(addr), mkU64(1152 + reg * 32)); etc ----- +static IRExpr* mask_expr(Prefix pfx, IRExpr* unmasked, IRExpr* original) { Nit: for big functions like this, please place the { on its own line. In the same function: + default: + break; If it is an internal logic bug that we get to this point, it would be better to 'vassert(0)'. But only if it's a bug in the code -- not if it is an undecoded instruction. ----- + assign( e1, amt >= size + ? mkV512(0) + : binop(op, mkexpr(e0), mkU8(amt)) + ); Please fix the indentation here and below, to make it clearer that there are 2 args. (In emacs C-mode, pressing TAB often fixes it for me). + assign( e1, amt >= size + ? mkV512(0) + : binop(op, mkexpr(e0), mkU8(amt)) + ); ----- + if (!getEvexMask(pfx)) { putYMMRegLoAndZU( rG, unop(op, mkexpr(arg)) ); + } Indent the then-clause! ----- +static +Long dis_AVX512_cmp_V_E_to_k ( /*OUT*/Bool* uses_vvvv, + IRTemp addr = IRTemp_INVALID; + Int alen = 0; + HChar dis_buf[50]; + UInt mask = getEvexMask(pfx); + UChar modrm = getUChar(delta); + UInt rG = gregOfRexRM(pfx, modrm); + UInt rV = getVexNvvvv(pfx); + *uses_vvvv = True; + IRTemp tD = newTemp(Ity_V512); + IRExpr* unmasked; + UInt imm8; Please line up the declarationdb like in many other cases, to make it easier to read. ----- +static Long opmask_operation_decode (const VexAbiInfo* vbi, Prefix pfx, Long* delta) { { on its own line ----- +static IRTemp math_VPERMD_512 ( IRTemp ctrlV, IRTemp dataV ) { + /* In the control vector, zero out all but the bottom four bits of each 32-bit lane. */ Please stay within an 80 column limit. ----- + if (!isYMM) { putYMMRegLane128( rG, 1, mkV128(0) ); + } Indent! ----- @@ -30062,170 +31359,2467 @@ nameIRegG(size,pfx,rm)); delta += alen; } - - /* First mask off bits not set in mask, they are ignored - and it should be fine if they contain undefined values. */ - IRExpr* masked = binop(mkSizedOp(ty,Iop_And8), - mkexpr(src), mkexpr(mask)); - IRExpr** args = mkIRExprVec_2( widenUto64(masked), - widenUto64(mkexpr(mask)) ); - putIRegG( size, pfx, rm, - narrowTo(ty, mkIRExprCCall(Ity_I64, 0/*regparms*/, - "amd64g_calculate_pext", - &amd64g_calculate_pext, args)) ); - *uses_vvvv = True; - /* Flags aren't modified. */ - goto decode_success; + + /* First mask off bits not set in mask, they are ignored + and it should be fine if they contain undefined values. */ + IRExpr* masked = binop(mkSizedOp(ty,Iop_And8), + mkexpr(src), mkexpr(mask)); + IRExpr** args = mkIRExprVec_2( widenUto64(masked), + widenUto64(mkexpr(mask)) ); + putIRegG( size, pfx, rm, + narrowTo(ty, mkIRExprCCall(Ity_I64, 0/*regparms*/, + "amd64g_calculate_pext", + &amd64g_calculate_pext, args)) ); + *uses_vvvv = True; + /* Flags aren't modified. */ + goto decode_success; What actually changed here? I can't see any difference. If it's just whitespace, don't include it. ------ +#define MULT512 4 I see some uses of MULT512, but it's unclear what it means. Please add a comment. (Later) I see you're using it to mean 'the number of 128 bit registers into which an Ity_I512 expression is computed'. Is there a better name for this? I couldn't have guessed the meaning just from the existing name. ------ static HReg iselIntExpr_R_wrk ( ISelEnv* env, const IRExpr* e ) + Int octet = e->Iex.Unop.op - Iop_V512to64_0; + vec = temp[octet/2]; Use 'UInt octet' so that we're sure the division just turns into a shift. ----- @@ -2669,6 +2793,49 @@ return dst; } + if (e->tag == Iex_Unop && e->Iex.Unop.op == Iop_ExtractExpF32) { + HReg dst = newVRegV(env); + HReg arg = iselFltExpr(env, e->Iex.Unop.arg); + sub_from_rsp(env, 32); + addInstr(env, AMD64Instr_Lea64(AMD64AMode_IR(0, hregAMD64_RSP()), hregAMD64_RDI())); + addInstr(env, AMD64Instr_Lea64(AMD64AMode_IR(4, hregAMD64_RSP()), hregAMD64_RSI())); It was a bit unclear to me what you're using the RDI value for. I had to look at this a couple of times to see that it is the first (only?) arg to h_generic_calc_GetExp32. Please add a short comment saying that; also for the clauses that follow (calls to h_generic_calc_GetMant32 and h_generic_calc_RoundScaleF32) ----- static void iselVecExpr_wrk_512 ( /*OUT*/ HReg *dst, ISelEnv* env, IRExpr* e ) { { on its own line ----- -- valgrind-clean/valgrind/VEX/priv/host_generic_fixup.c 1969-12-31 18:00:00.000000000 -0600 +++ valgrind-wip/valgrind/VEX/priv/host_generic_fixup.c 2017-11-17 12:26:07.224622000 -0600 I see that you need a new source file here (fine), but (1) "fixup" is a pretty meaningless name -- it could mean anything. Please choose something that describes better what it does, eg host_generic_avx512_helpers.c (2) It needs a copyright notice. Copy from any other file (eg host_generic_simd256.c and s/OpenWorks etc/Intel (or whoever)) (3) Start/end comment blocks in the file would be nice. ----- --- valgrind-clean/valgrind/VEX/priv/host_generic_simd512.c 1969-12-31 18:00:00.000000000 -0600 +++ valgrind-wip/valgrind/VEX/priv/host_generic_simd512.c 2017-11-17 12:21:00.513156000 -0600 In the copyright, s/OpenWorks GBR/Intel (or whoever) Please use Int instead of int in this file. ----- --- valgrind-clean/valgrind/VEX/priv/host_generic_simd512.h 1969-12-31 18:00:00.000000000 -0600 +++ valgrind-wip/valgrind/VEX/priv/host_generic_simd512.h 2017-11-17 12:21:00.519156000 -0600 Change copyright notice name. ----- --- valgrind-clean/valgrind/VEX/pub/libvex_basictypes.h 2017-11-21 07:10:54.339770000 -0600 +++ valgrind-wip/valgrind/VEX/pub/libvex_basictypes.h 2017-11-14 09:46:54.822380000 -0600 @@ -71,6 +71,13 @@ /* Always 256 bits. */ typedef UInt U256[8]; +/* Always 512 bits. */ +typedef UInt U512[16]; + +/* Floating point. */ +typedef float Float; /* IEEE754 single-precision (32-bit) value */ +typedef double Double; /* IEEE754 double-precision (64-bit) value */ + Please add (at this point) STATIC_ASSERT(sizeof(Float) == 4) STATIC_ASSERT(sizeof(Double) == 8) (Call me paranoid. I don't care :-) ----- UInt V256; /* 32-bit value; see Ico_V256 comment above */ + ULong V512; /* 64-bit value; see Ico_V512 comment above */ Please align the 'V512' -- just to make it pretty ----- + /* MASKING */ + Iop_Mask32to512, Iop_Mask64to512, + Iop_Mask32to256, Iop_Mask64to256, Add a 1-line comment explaining roughly what these do, eg like the INTERLEAVING case just below. ----- + /* Detect duplicate values. Compare each element of the source vector + * for equality with all other elements closer to the least significant one, + * combine the comparison results to a bit vector */ + Iop_CfD32x16, Can you give this a better name? The comment makes sense, but I couldn't guess that from 'CfD'. What does CfD stand for? ----- + // V512 -> I64, must be sequential You could (partly) enforce this at compile time with STATIC_ASSERT(Iop_V512to64_7 == Iop_V512to64_0 + 7) ----- + Iop_Align32x16, Iop_Align64x8, + Iop_Expand32x16, Iop_Expand64x8, + Iop_Compress32x16, Iop_Compress64x8, + Iop_Ternlog32x16, Iop_Ternlog64x8, + Please add a short comment explaining approximately what these operations do. -----
(In reply to Tanya from comment #6) > Created attachment 109001 [details] > Nulgrind test for the AVX-512 instructions > > The test is based on the existing Nulgrind AVX and AVX-2 tests Looks fine. Just one comment: ----- UChar _randArray[2062] __attribute__((used)); Where does the 2062 number come from? I'd prefer a #define with some explanation.
(In reply to Tanya from comment #7) > Created attachment 109002 [details] > Nulgrind test for the AVX instructions on AVX-512 machine > > AVX regression test based on the existing Nulgrind AVX test, outputs ZMM > registers instead of YMM Is this intended to replace the existing AVX test? Or is it a new test? This is unclear.
(In reply to Tanya from comment #8) > Created attachment 109004 [details] > Nulgrind test for the AVX-2 instructions on AVX-512 machine > > AVX-2 regression test based on the existing Nulgrind AVX-2 test, outputs ZMM > instead of YMM Same question here as in comment 14.
(In reply to Tanya from comment #9) > Created attachment 109005 [details] > Nulgrind test for the serial vFMA instructions > > The "AVX-512_prototype_v3" patch changes the behaviour of the serial vFMA > instructions. For these instructions, Valgrind sets destination bits > [127:64] or [127:32] to zero; according to the ISE, these bits should remain > unchanged. The test covers the changed instructions. If I understand this right, that means the existing cases for serial vFMA insns are wrong, and also the VEX implementation is wrong. Is that correct? If so, shouldn't we just fix both the test case and implementation?
Hello Julian, Sorry for a late reply. Thank you very much for the comments. We have fixed most of these bugs, and hope to finish adding and debugging KNL AVX-512 instructions in about a month. Regarding the performance on AVX-2 code: >> As a side note -- before this lands, I would want to do some performance >> runs to check that this doesn't impact performance (or correctness) of >> existing IA support. On a few tiny AVX-2 benchmarks, Memcheck overhead is 0-1% bigger than that of a "clean" Valgrind version. We will run and measure it on bigger AVX-2 benchmarks. Do you have any obligatory benchmarks for Valgrind correctness and performance? Regarding the test files: >> Is this intended to replace the existing AVX test? Or is it a new test? >> This is unclear. The attached tests are new tests, usable on AVX-512 machines only. They recheck AVX and AVX-2 instructions on bigger vector registers, similarly to how avx-2.c test rechecks avx-1 instructions on ymm registers. Would it be ok to keep them as three separate tests files for AVX-512, or should they be merged into one avx-512.c test file? Regarding the FMA instructions: >> If I understand this right, that means the existing cases for serial vFMA >> insns are wrong, and also the VEX implementation is wrong. Is that >> correct? If so, shouldn't we just fix both the test case and >> implementation? The issue was, for serial (32- and 64-bit) FMA instructions, Valgrind used to set bits [128:32] or [128:64] of the destination to zero, while they should be left unchanged. We have fixed the implementation and added a new test, because the none/tests/amd64/fma.c test seems to be designed to only verify one float or one double value. Would you prefer us to provide not-AVX-512-related changes as a separate patches? I also have a question on our implementation of translation of EVEX instructions to IR. Currently, we use separate functions for VEX- and EVEX- prefixed instructions (file VEX/priv/guest_amd64_toIR.c, functions, for example, dis_ESC_0F38__VEX and dis_ESC_0F38__EVEX, respectively). However, looking at the next Intel AVX-512 instruction sets, the VL (Vector Length) set allows to run EVEX-prefixed instructions on xmm and ymm registers, so it basically duplicates the VEX code (for example, EVEX-prefixed "vmovpdd xmm1, xmm2" is an equivalent of VEX-prefixed "vmovpdd xmm1, xmm2"). The easiest way to implement it would be to unite the EVEX- and VEX- translator functions into something like "dis_ESC_0F38__VEX_EVEX". On the upside, there would be less duplicated code. On the downside, it means that EVEX-related code would no longer be contained in separate __EVEX functions, so it would probably be more difficult to review. An alternate approach would be to add VL code (basically, copy the __VEX translations) to the __EVEX functions. As a downside, it may be bothersome to maintain the __VEX and the EVEX VL implementations identical. It we were to implement those instructions in the future, what would be a preferable approach? Thank you, Tanya
Hello, I have patched the "AVX-512_prototype_v3" patch. But I can not use this patch for _mm512_setr_pd(...) Could you please help me to solve it ? I encounter the following ERROR: [elliot@blacksburg ~]$ avx512-valgrind ./aaa ==4021== Memcheck, a memory error detector ==4021== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al. ==4021== Using Valgrind-3.14.0.GIT and LibVEX; rerun with -h for copyright info ==4021== Command: ./aaa ==4021== dis_ESC_0F3A__EVEX - UNRECOGNIZED OPCODE 0x1A ==4021== Invalid read of size 4 ==4021== at 0x400AB6: simd_assign(int, double*, double*) (in /home/elliot/aaa) ==4021== by 0x400825: main (in /home/elliot/aaa) ==4021== Address 0xfffffffffffffff1 is not stack'd, malloc'd or (recently) free'd ==4021== ==4021== ==4021== Process terminating with default action of signal 11 (SIGSEGV): dumping core ==4021== Access not within mapped region at address 0xFFFFFFFFFFFFFFF1 ==4021== at 0x400AB6: simd_assign(int, double*, double*) (in /home/elliot/aaa) ==4021== by 0x400825: main (in /home/elliot/aaa) ==4021== If you believe this happened as a result of a stack ==4021== overflow in your program's main thread (unlikely but ==4021== possible), you can try to increase the size of the ==4021== main thread stack using the --main-stacksize= flag. ==4021== The main thread stack size used in this run was 8388608. And my code is: #include <xmmintrin.h> #include <emmintrin.h> #include <immintrin.h> #include <stdint.h> #include <stdio.h> #include <stdlib.h> #include <math.h> #include <time.h> #include <float.h> #include <avx512fintrin.h> #include <avx512dqintrin.h> #include <avx512cdintrin.h> typedef double real; typedef int64_t integer; typedef __m512d realVect ; typedef integer intVec __attribute__ ((vector_size(512))); #define VECTOR_SIZE 8 #define SHIFT_SIZE 3 #define modSize(a) (((a>>SHIFT_SIZE )+1)<< SHIFT_SIZE) #define NRM_RAND_VALE (5.0 - 10.0*(real)(rand()%10000)/(real)10000.0) real *new_real(int n) { real *c = (real *)aligned_alloc(1024, sizeof(real)*modSize(n)); if(c==NULL){ fprintf(stderr, "%s Error --- Cannot allocate real array\n",__FUNCTION__); exit(1); } return c; } integer *new_integer(int n) { integer *c = (integer *)aligned_alloc(1024, sizeof(integer)*n); if(c==NULL){ fprintf(stderr, "%s Error --- Cannot allocate int array\n",__FUNCTION__); exit(1); } return c; } inline void simd_assign(int n, real *c, real *a) { realVect *rvc=(realVect *)c; for(int i=0; i<n; i+=VECTOR_SIZE){ rvc[i>>SHIFT_SIZE] = _mm512_setr_pd(a[i], a[i+1], a[i+2], a[i+3], a[i+4], a[i+5], a[i+6], a[i+7]); } } int main(int argc, char *argv[]) { int N=100000; real *a=new_real(N); real *c=new_real(N); for(int i=0; i<N; i++) a[i] = NRM_RAND_VALE; simd_assign(N,c,a); for(int i=0;i<N; i++){ if(a[i] != c[i]){ printf("%s vector assign FAIL %d --- %f != %f\n",argv[0],i,c[i],a[i]); return 1; } } printf("PASS\n"); return 0; } Elliot
(In reply to Hou, Ao-ping from comment #18) > Hello, > I have patched the "AVX-512_prototype_v3" patch. > But I can not use this patch for _mm512_setr_pd(...) > Could you please help me to solve it ? > > I encounter the following ERROR: > ... > > Elliot Hello Elliot, I can not reproduce the issue, but it looks like the application uses AVX-512 "vinsertf64x4" instruciton, which haven't been implemented in the patch. We still have a few AVX-512 instructions missing. I'll attach the updated patch with all AVX-512 instructions next week. Thank you, Tanya
(In reply to Tanya from comment #19) > (In reply to Hou, Ao-ping from comment #18) > > Hello, > > I have patched the "AVX-512_prototype_v3" patch. > > But I can not use this patch for _mm512_setr_pd(...) > > Could you please help me to solve it ? > > > > I encounter the following ERROR: > > ... > > > > Elliot > > > Hello Elliot, > > I can not reproduce the issue, but it looks like the application uses > AVX-512 "vinsertf64x4" instruciton, which haven't been implemented in the > patch. > We still have a few AVX-512 instructions missing. I'll attach the updated > patch with all AVX-512 instructions next week. > > Thank you, > Tanya Hi, Tanya, Thank you so much. Once I get new patch, I will try it for basic operations in AVX-512 double precision. Best Regards Elliot
Created attachment 111693 [details] Updated AVX-512 implementation prototype with all KNL instructions The patch (AVX-512_prototype_v4_all_knl_insns.patch) implements all instructions from four subsets of AVX-512 (F, ER, CD, PF), available on KNL machines, in Nulgrind and in Memcheck. It also enables AVX-512 registers in vgdb. New source files, added by this patch: - VEX/priv/host_generic_simd512.c - Implementation of AVX-512 instrucitons in C, similar to VEX/priv/host_generic_simd256.c - VEX/priv/host_generic_vrcp14.c - Reference implementations of VRCP14 and VRSQRT14 instructions. - VEX/priv/host_generic_avx512er.c - Reference implementations of AVX-512 ER instruction set (VRCP28, VRSQRT28, and VEXP2 instructions). The latter two files are copied, with minor modifications, from https://software.intel.com/en-us/articles/reference-implementations-for-IA-approximation-instructions-vrcp14-vrsqrt14-vrcp28-vrsqrt28-vexp2 , files RECIP14.c and RECIP28EXP2.c, correspondingly AVX-512 KNL benchmarks that pass result verification under Nulgrind (not tested under Memcheck yet) with the patch: NPB IS, IOR, STREAM, STRIDE, AMG-2013, LAMMPS (both GCC-7.1.0 and Intel Compiler 17.0.4 builds). NPB benchmarks other than IS and the QMCPACK benchmark fail result verification or crash.
Created attachment 111694 [details] Nulgrind tests for AVX-512 machine, part 1
Created attachment 111695 [details] Nulgrind tests for AVX-512 machine, part 2
Created attachment 111696 [details] Nulgrind tests for AVX-512 machine, part 3
Created attachment 111697 [details] Nulgrind tests for AVX-512 machine, part 4
Created attachment 111698 [details] Nulgrind tests for AVX-512 machine, part 5 Files AVX-512_prototype_v4_tests.patch_p1 - AVX-512_prototype_v4_tests.patch_p5 are parts of new Nulgrind tests, split up because of site file size limit. To join them together, please run "cat AVX-512_prototype_v4_tests.patch_p? > AVX-512_prototype_v4_tests.patch" New test files, added by this patch (all in none/tests/amd64/): avx512.vgtest, avx512.c, avx512.stdout.exp, avx512.stderr.exp - test for AVX_512 instructions. VGETEXPPS and VSCALEFPS instruction tests fail on denormal values. avx-1_zmm.vgtest, avx-1_zmm.c, avx-1_zmm.stdout.exp, avx-1_zmm.stderr.exp - test for avx-1 instructions on AVX-512 machine. The only difference from original avx-1 test file is that it check bytes 511:256 of the used registers avx2-1_zmm.vgtest, avx2-1_zmm.c, avx2-1_zmm.stdout.exp avx2-1_zmm.stderr.exp - test for avx-2 instructions on AVX-512 machine. The only difference from the original file is that it check bytes 511:256 of the used registers
> Hi, Tanya, > > Thank you so much. > Once I get new patch, I will try it for basic operations in AVX-512 double > precision. > > Best Regards > > Elliot Hello Elliot, Attached a new patch. Sorry for the delay. Thank you, Tanya
Hi there, We hit the illegal instruction message on Skylake (6140) CPUs running valgrind on programs linked against OpenMPI 3.0.0; the cause being that with our GCC 5.5 and higher installs memset() causes the VPXORD command to be emitted during various initialisation routines which Valgrind can't handle. It took a bit of digging to track down what was going on, so we're certainly interested in this work! All the best, Chris (HPC sysadmin at Swinburne University of Technology in Melbourne, Australia)
(In reply to Tanya from comment #27) > > Hi, Tanya, > > > > Thank you so much. > > Once I get new patch, I will try it for basic operations in AVX-512 double > > precision. > > > > Best Regards > > > > Elliot > > Hello Elliot, > Attached a new patch. Sorry for the delay. > > Thank you, > Tanya Hi, Tanya, thank you. I'll try these codes later. Again, thank you so much. Best Regards. Elliot Hou, Ao-ping
(In reply to Tanya from comment #27) > > Hi, Tanya, > > > > Thank you so much. > > Once I get new patch, I will try it for basic operations in AVX-512 double > > precision. > > > > Best Regards > > > > Elliot > > Hello Elliot, > Attached a new patch. Sorry for the delay. > > Thank you, > Tanya Hi, Tanya, I cannnot merge with the following commands: git apply --stat ../AVX-512_prototype_v4_all_knl_insns.patch And there are two errors in this patch: memcheck/mc_translate.c @@ -3191,6 +3341,33 @@ IRAtom* expr2vbits_Triop ( MCEnv* mce, unary64Fx2_w_rm(mce, vatom1, vatom2), unary64Fx2_w_rm(mce, vatom1, vatom3))); And memcheck/tests/vbit-test/irops.c @@ -1131,12 +1143,169 @@ static irop_t irops[] = { { DEFOP(Iop_Rotx32, UNDEF_ALL), }, { DEFOP(Iop_Rotx64, UNDEF_ALL), }, { DEFOP(Iop_PwBitMtxXpose64x2, UNDEF_64x2_TRANSPOSE), .ppc64 = 1, .ppc32 = 1 }, Could you help me to solve it. THank you Elliot Hou, Ao-ping
*** This bug has been confirmed by popular vote. ***
*** Bug 420834 has been marked as a duplicate of this bug. ***
*** Bug 393351 has been marked as a duplicate of this bug. ***
Created attachment 128799 [details] Refactored implementation for Skylake machines Attached patch "AVX-512_prototype_all_skx_insns.patch". It implements Skylake AVX-512 instruction subsets (AVX-512F, AVX-512CD, AVX-512VL, AVX-512BW, AVX-512 DQ) and refactors AVX-512 functionality in the following way: - AVX-512 functionality is separated from the main code and only available under "AVX_512" define - AVX-512 instruction information is moved to C structures. They can be automatically generated form a master .csv file - Minimal AVX-512 Memcheck enabling The patch is in alpha-stage. It has been tested on AVX-512 NPB benchmarks: some benchmarks pass validation under Nulgrind and Memcheck, but validation failures and crashes still occur, and Memcheck often reports false-positive errors.
(In reply to Tanya from comment #34) Nice to see work resuming on this. Valgrind is almost useless for us at this point, because AVX-512 CPUs are now ubiquitous and the performance benefits are too large to ignore.
(In reply to Tanya from comment #34) > Created attachment 128799 [details] > Refactored implementation for Skylake machines > > Attached patch "AVX-512_prototype_all_skx_insns.patch". It implements > Skylake AVX-512 instruction subsets (AVX-512F, AVX-512CD, AVX-512VL, > AVX-512BW, AVX-512 DQ) and refactors AVX-512 functionality in the following > way: > - AVX-512 functionality is separated from the main code and only available > under "AVX_512" define > - AVX-512 instruction information is moved to C structures. They can be > automatically generated form a master .csv file > - Minimal AVX-512 Memcheck enabling > > The patch is in alpha-stage. It has been tested on AVX-512 NPB benchmarks: > some benchmarks pass validation under Nulgrind and Memcheck, but validation > failures and crashes still occur, and Memcheck often reports false-positive > errors. Hello Tanya, This work is great. But do you have any special reason to not to use git format-patch? This patch does not applies. Thank you, Alexandra
Created attachment 128856 [details] patch Hello Tanya, I slighly modified your patch to make it appliable (removed trailing whitespaces, etc.) and used git format-patch.
(In reply to Alexandra Hajkova from comment #37) > Created attachment 128856 [details] > patch > > Hello Tanya, > > I slighly modified your patch to make it appliable (removed trailing > whitespaces, etc.) and used git format-patch. Hello Alexandra, Thank you! I will use this format for further patches.
*** Bug 423182 has been marked as a duplicate of this bug. ***
Created attachment 129522 [details] patch
Created attachment 129525 [details] patch
Created attachment 129528 [details] patch
My git repo: https://github.com/sasshka/valgrind/commit/2a9d9c2a5e6021cd1b928e03aabb9e493cde5cdd I tested the patch on Knights Landing. I had to rename gdbserver xml files for gdbserver tests to pass - they used to have avx with upper cases: 64bit-avx512-valgrind-s2.xml 64bit-avx512-valgrind-s1.xml 64bit-avx512.xml The patch doesn't seem to handle AVX512 subsets, the knights landing I used didn't have KADD instruction which was added for BW/DQ (according to https://en.wikipedia.org/wiki/AVX-512#New_instructions_by_sets) avx512.stdout.exp is missing in the patch with makes avx512 test to always fail.
Created attachment 129652 [details] patch
*** Bug 426330 has been marked as a duplicate of this bug. ***
*** Bug 433272 has been marked as a duplicate of this bug. ***
Created attachment 135971 [details] Part 1 of AVX-512 patch - main implementation
Created attachment 135972 [details] Part 2 of AVX-512 patch - auto-generated files
Created attachment 135973 [details] Part 3 of AVX-512 patch - AVX-512 tests
Created attachment 135974 [details] Part 4 of AVX-512 patch - (optional) Generator of AVX-512 .c files
Hello, Attached updated AVX-512 patch for KNL and SkyLake. It is based on the GIT master from Feb. 20 (commit 1c9a0bf58a47e855e6e5bf78a30bcee0af835804) Attached files ~~~~~~~~~~~~~~ - AVX-512_KNL_SKX_p1_main.patch - Main AVX-512 Valgrind patch - AVX-512_KNL_SKX_p2_data.patch - Files, automatically generated by a script. These files are required for the build - AVX-512_KNL_SKX_p3_test.patch - AVX-512 regression tests - AVX-512_KNL_SKX_p4_filegen.patch - Generator of Valgrind AVX-512 .c files from a file with instruction descriptions. Not required for build or usage; useful for adding new instructions or modifying AVX-512 behaviour Functionality ~~~~~~~~~~~~~ - No known regressions on AVX2 machines - Nulgrind is functional (no known failures of small AVX-512 applications) on KNL and SkyLake - Memcheck does not crash on AVX-512 code; the analysis might still be incorrect Limitations ~~~~~~~~~~~ - Needs GCC version 8 or newer - Cannot emulate AVX-512 code on AVX-2 or older machine (some AVX-512 instructions are emulated through intrinsics) Build and test ~~~~~~~~~~~~~~ Clone Valgrind master: > git clone https://sourceware.org/git/valgrind.git > cd valgrind Apply the patch: > git apply AVX-512_KNL_SKX_p1_main.patch > git apply AVX-512_KNL_SKX_p2_data.patch > git apply AVX-512_KNL_SKX_p3_test.patch Optional development tool, not required for the build process: > git apply AVX-512_KNL_SKX_p4_filegen.patch Build: Please check that GCC version is 8 or higher. > ./autogen.sh > ./configure --prefix=<install-path> On AVX-512 machine, verify that AVX-512 version will be built: open config.log and check that "BUILD_AVX512_TESTS_TRUE=''" and "CFLAGS=' -DAVX_512'" lines exist. > make install Quick test: <install-path>/bin/valgrind ls Invoke a script to generate reference AVX-512 tests results (they are not provided with the patch because file size exceeds limit set by bugtracker): > ./scripts/generate_test_results.sh Run regression tests: > make regtest
*** Bug 428004 has been marked as a duplicate of this bug. ***
Created attachment 140025 [details] Version 2: Part 1 of AVX-512 patch - main implementation
Created attachment 140026 [details] Version 2: Part 2 of AVX-512 patch - auto-generated files
Created attachment 140027 [details] Version 2: Part 3 of AVX-512 patch - AVX-512 tests
Created attachment 140028 [details] Version 2: Part 4 of AVX-512 patch - (optional) Generator of AVX-512 .c files
Attached an updated version of the AVX-512 patch for KNL and SkyLake. It is based on Valgrind 3.18 (GIT master from Jul, 13, commit 61307ee83121aa5f0b57a12a80e90fc2f414380a) The major improvements are: - Fixes for runtime crashes - Changed the way masked instructions access memory For build and test instructions, please refer to comment #51 (https://bugs.kde.org/show_bug.cgi?id=383010#c51)
Created attachment 142992 [details] Version 3: Part 1 of AVX-512 patch - main implementation
Created attachment 142993 [details] Version 3: Part 2 of AVX-512 patch - auto-generated files
Created attachment 142994 [details] Version 3: Part 3 of AVX-512 patch - AVX-512 tests
Created attachment 142995 [details] Version 3: Part 4 of AVX-512 patch - (optional) Generator of AVX-512 .c files
Attached an updated version of the AVX-512 patch for KNL and SkyLake. It does not contain any principal changes comparing to the previous version, just bug fixes. It is based on Valgrind 3.18 (GIT master from Oct 22, commit b77dbefe72e4a5c7bcf1576a02c909010bd56991) For build and test instructions, please refer to comment #51 (https://bugs.kde.org/show_bug.cgi?id=383010#c51)
Could someone please work on getting these changes into mainline? Or at least make this a branch in the git repository? (Is there anything I can do to help?) AVX-512 is now a "must have" for us.
(In reply to Patrick J. LoPresti from comment #63) > Could someone please work on getting these changes into mainline? Or at > least make this a branch in the git repository? (Is there anything I can do > to help?) > > AVX-512 is now a "must have" for us. Hello, If you would make the patch into a new branch - would you be interested in its internal commit history, perhaps for git blame? If yes, preparing it with git-format-patch turned out to be too verbose. If it'd be useful, would it be possible to grant me access to this branch, perhaps temporarily?
(In reply to Tanya from comment #64) > > If you would make the patch into a new branch - would you be interested in > its internal commit history, perhaps for git blame? I am not a Valgrind maintainer; I have only submitted a few patches. I just did not want to ask for something without offering something... I just think it would be really nice if this support could live on a branch in the official repository, if not (yet) delivered to the mainline.
*** Bug 441609 has been marked as a duplicate of this bug. ***
Created attachment 144910 [details] Demonstrates misbehaving `vmovdqu8 %ymm7, (%r9){%k5}` I've been testing the patches from comment 58, 59, 60 against the trunk, using Fedora 35 running on a Core i5-1135G7. It passes the tests in the comment 60 patch, but causes regressions in various other tests. I tracked one problem down to an incorrect implementation of 256-bit stores that use a guard register (k1 .. k7). This causes glibc's memset() to misbehave, hence causing --tool=none runs to fail. Testcase is attached. I imagine it's caused by an incorrect translation into IR, but I haven't figured out how that translation is done.
Created attachment 144930 [details] valgrind-avx512-rollup-fixes-2021Dec29.diff Rollup fixes to be applied on top of (after) the patches in comments 58, 59 and 60: * fixes the problem described in comment 67. The patch set extends AMD64Instr::CStore and AMD64Instr::CLoad to also handle 8- and 16- bit conditional stores and loads. However, the emit_AMD64Instr clauses for these cases were not correct and still generating 64-bit transactions. This fixes them. That removes a bunch of incorrect results in regression tests, and crashing when running real programs. The test case avx512-skx is still failing, though. * [trivial fixes] fixes segfaults caused by insufficient alignment in test cases avx512-l1.c and avx512-l2.c * [temporary] disables a few test cases in avx512-l1.c since they don't run on my hardware (Core i5-1135G7), even natively.
Created attachment 144938 [details] Fix copyright notices on the new AVX-512 files Patches attached in comment #58, comment #59, comment #60 and comment #61 had incorrect copyright notices. Attached a patch with a fix.
Created attachment 145020 [details] Fix handling of no-mask reg-reg versions of VEXPAND* and VCOMPRESS* Here's a bug fix for the VEXPAND and VCOMPRESS instructions, specifically for the register-to-register, mask-free versions. By "mask-free" I mean they do not specify any of `{k1}` to `{k7}`. (I think that makes the instructions into trivial reg-to-reg copies, but that's irrelevant). The bug is that the generated IR acts as if `{k0}` had been specified, and so the result depends on whatever value is in `k0` at the time. I worry that there are potentially other places where the IR is generated using `getKReg(mask)` when really it should be `mask ? getKReg(mask) : mkU64(0)`, and that testing isn't catching these. Not sure though.
Created attachment 145364 [details] Update for AVX-512 Valgrind regression tests to spot differences between mask k0 and no mask (In reply to Julian Seward from comment #70) > I worry that there are potentially other places where the IR is generated > using `getKReg(mask)` when really it should be `mask ? getKReg(mask) : > mkU64(0)`, and that testing isn't catching these. Not sure though. Attached a patch to AVX-512 regression tests that catches this kind of errors by writing a random value in k0 before each test. It did not detect any other affected instructions.
Small remark: If valgrind is compiled explicitly for the skylake_avx512 architecture with GCC 11 (e.g. as done by https://spack.readthedocs.io on appropriate hardware), there is a kmovq instruction generated in vgpreload_memcheck-amd64-linux.so. This triggers an "unrecognised instruction" error for any executable run with valgrind (memcheck): ``` vex amd64->IR: unhandled instruction bytes: 0xC4 0xE1 0xFB 0x92 0xC8 0x48 0x8D 0x5 0xC0 0x71 vex amd64->IR: REX=0 REX.W=1 REX.R=0 REX.X=0 REX.B=0 vex amd64->IR: VEX=1 VEX.L=0 VEX.nVVVV=0x0 ESC=0F vex amd64->IR: PFX.66=0 PFX.F2=1 PFX.F3=0 ==3481== valgrind: Unrecognised instruction at address 0x4c629f4. ==3481== at 0x4C629F4: stpcpy (vg_replace_strmem.c:1180) ... ``` Everything works of course fine, if one just compiles (generically) for arch=x86_64.
Created attachment 146572 [details] Version 4: Part 2 of AVX-512 patch - auto-generated files
Created attachment 146573 [details] Version 4: Part 4 of AVX-512 patch - (optional) Generator of AVX-512 .c files
Julian reported VPTESTMW and VPTESTMB instruction failures - these instructions read data with incorrect granularity (64 and 32 bits instead of 16 and 8 bits, respectively). The granularity has been specified in the master file incorrectly. Attached an updated master file (also corrects granularity for VPTESTNMW and VPTESTNMB, and corrects exception types for several instructions) and the generated files ("Version 4: Part 2 of AVX-512 patch - auto-generated files").
*** Bug 408140 has been marked as a duplicate of this bug. ***
*** Bug 455279 has been marked as a duplicate of this bug. ***
*** Bug 451837 has been marked as a duplicate of this bug. ***
Are Tanya's patches still pending review? Is there an outstanding known issue with them?
*** Bug 458305 has been marked as a duplicate of this bug. ***
*** Bug 458218 has been marked as a duplicate of this bug. ***
*** Bug 462135 has been marked as a duplicate of this bug. ***
*** Bug 463082 has been marked as a duplicate of this bug. ***
(In reply to Sam James from comment #79) > Are Tanya's patches still pending review? Is there an outstanding known > issue with them? Similar question... What are the outstanding tasks here, and do they have owners? Can I help?
*** Bug 468544 has been marked as a duplicate of this bug. ***
*** Bug 460203 has been marked as a duplicate of this bug. ***
*** Bug 450952 has been marked as a duplicate of this bug. ***
Excellent work, it's a long time :) Is there any schedule for upstreaming about this feature ? these patches looks like a prototype implementation ? I opened another thread about RISC-V vector, some issues should be common in VEX or plugin Memcheck, even more vector ISA generator :) Any suggestions for these ? @Tanya @Julian Seward BTW, is there anyone interested in ARM's SVE or RISC-V vector ? See more details from https://bugs.kde.org/show_bug.cgi?id=468979
*** Bug 417572 has been marked as a duplicate of this bug. ***
*** Bug 339416 has been marked as a duplicate of this bug. ***
*** Bug 469878 has been marked as a duplicate of this bug. ***
*** Bug 470489 has been marked as a duplicate of this bug. ***
*** Bug 480545 has been marked as a duplicate of this bug. ***
*** Bug 481729 has been marked as a duplicate of this bug. ***
I wanted to give these patches a try, but I can't figure out which version of Valgrind these are supposed to apply to.
*** Bug 487124 has been marked as a duplicate of this bug. ***
*** Bug 489221 has been marked as a duplicate of this bug. ***
*** Bug 490009 has been marked as a duplicate of this bug. ***