413330 – avx-1 test fails on AMD EPYC 7401P 24-Core Processor

Summary: avx-1 test fails on AMD EPYC 7401P 24-Core Processor

Status:	RESOLVED FIXED

Alias:	None

Product:	valgrind
Classification:	Developer tools
Component:	general
Version:	unspecified
Platform:	Other Linux

Importance:	NOR normal
Target Milestone:	---
Assignee:	Mark Wielaard

URL:
Keywords:

Depends on:
Blocks:

Reported:	2019-10-22 19:07 UTC by Alexandra Hajkova
Modified:	2022-02-16 22:27 UTC
CC List:	3 users

See Also:
Latest Commit:
Version Fixed In:

Attachments
fix avx-1 amd64 test (1.42 MB, patch) 2019-11-12 12:25 UTC, Alexandra Hajkova
Add avx_estimate_insn.stdout.exp-amd variant (44.74 KB, text/plain) 2022-02-08 12:40 UTC, Mark Wielaard

Description Alexandra Hajkova 2019-10-22 19:07:56 UTC

SUMMARY
The avx-1 test fails on an AMD EPYC 7401P 24-Core Processor because its RSQRTPS implementation produces slightly different results.

rsqrtps may produce slightly different results on different CPU families because the results of estimate instructions such as reciprocal square root are not defined by the IEEE standard (https://randomascii.wordpress.com/2013/07/16/floating-point-determinism/).

I ran this simple code on an AMD EPYC 7401P 24-Core Processor (labelled "dell" below) and an
Intel(R) Core(TM) i7-7600U CPU @ 2.80GHz (labelled "intel" below).

code:
#include <stdio.h>

int main(void)
{
    /* Inputs whose reciprocal square roots are estimated. */
    float arr_in[4] = { 1, 2, 3, 4 };
    float arr_out[4] = { 0, 0, 0, 0 };

    /* rsqrtps estimates 1/sqrt(x) for all four lanes
       (rsqrtss would only compute lane 0). */
    __asm__ __volatile__(
        "movups  %1, %%xmm0 \n\t"
        "rsqrtps %%xmm0, %%xmm1 \n\t"
        "movups  %%xmm1, %0 \n\t"

        : "=m" (arr_out)        /* output: written by the asm */
        : "m" (arr_in)          /* input: read by the asm */
        : "xmm0", "xmm1"        /* registers the asm overwrites */
    );

    printf("arr_out: ");
    for (int i = 0; i < 4; i++)
        printf("%f ", arr_out[i]);
    printf("\n");

    return 0;
}

the results:
dell: arr_out: 0.999878 0.707031 0.577271 0.499939
intel: arr_out: 0.999756 0.706909 0.577271 0.499878

The avx-1 test always expects exactly the same output, which causes it to fail on some CPUs.

Comment 1 Mark Wielaard 2019-10-29 11:08:21 UTC

I am afraid rsqrtss is not like the other instructions in avx-1.

The blog post you quote is very on topic:
> (https://randomascii.wordpress.com/2013/07/16/floating-point-determinism/)

The estimate instructions (rcpss, rcpps, rsqrtps, rsqrtss) are, as the name suggests, not expected to give a fully accurate result. They are supposed to provide estimates with bounded errors.

See also http://www-archive.xenproject.org/files/xensummit_germany09/AMD.pdf#page=15

So ideally we move these estimate instructions out of avx-1 and into their own testcase that checks the results are within the bounded error. Unfortunately I don't know of a good way to do that.
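One possible shape for such a bounded-error check, as a minimal sketch only. It assumes the maximum relative error of 1.5 * 2^-12 that the vendor manuals give for rsqrtps/rsqrtss, and it only handles finite positive inputs. The names rsqrt_estimate_ok and count_bad_estimates are hypothetical and not part of the Valgrind test suite.

```c
#include <stdbool.h>

/* Assumed bound: maximum relative error of 1.5 * 2^-12 for rsqrtps/rsqrtss. */
#define RSQRT_MAX_REL_ERR (1.5 / 4096.0)

/* Hypothetical helper: true if `estimate` lies within the assumed relative
   error of 1/sqrt(input). If estimate = exact * (1 + e), then
   estimate^2 * input = (1 + e)^2, so |e| <= B is equivalent to
   (1 - B)^2 <= estimate^2 * input <= (1 + B)^2 for positive values.
   Checking it this way avoids calling sqrt(). */
static bool rsqrt_estimate_ok(float input, float estimate)
{
    if (!(input > 0.0f) || !(estimate > 0.0f))
        return false;   /* zero, negative and NaN cases are out of scope */

    double scaled = (double)estimate * (double)estimate * (double)input;
    double lo = (1.0 - RSQRT_MAX_REL_ERR) * (1.0 - RSQRT_MAX_REL_ERR);
    double hi = (1.0 + RSQRT_MAX_REL_ERR) * (1.0 + RSQRT_MAX_REL_ERR);
    return scaled >= lo && scaled <= hi;
}

/* Hypothetical helper: number of lanes whose estimate is out of bounds. */
static int count_bad_estimates(const float *in, const float *out, int n)
{
    int bad = 0;
    for (int i = 0; i < n; i++)
        if (!rsqrt_estimate_ok(in[i], out[i]))
            bad++;
    return bad;
}
```

With the inputs 1, 2, 3, 4 from the description, both the "dell" and the "intel" result rows pass this check, while a value such as 0.99 for input 1 does not. A test built this way would print only pass/fail per lane, so its expected output would not depend on the CPU.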

Alternatively, we might for now just assume there are only two implementations, the Intel and the AMD one, and print the results as we do now, but create two different .exp files for the new testcase. The test would then pass if the result matches either the Intel or the AMD implementation. We can see what to do if another implementation pops up.
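The "pass if the output matches any known variant" idea can be sketched as follows. This is an illustration only, not how the Valgrind test harness is implemented; matches_any_variant and rsqrt_variants are hypothetical names, and the two strings are the result lines from the description above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true if `actual` equals any of the `n` expected
   variants, mirroring "one expected-output file per implementation". */
static bool matches_any_variant(const char *actual,
                                const char *const *expected, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(actual, expected[i]) == 0)
            return true;
    return false;
}

/* The two known outputs reported in this bug. */
static const char *const rsqrt_variants[] = {
    "arr_out: 0.999756 0.706909 0.577271 0.499878",  /* Intel */
    "arr_out: 0.999878 0.707031 0.577271 0.499939",  /* AMD */
};
```

The trade-off is that a third implementation with yet another rounding would fail until a further variant is added, which is the situation the comment anticipates.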

Comment 2 Alexandra Hajkova 2019-11-12 12:25:59 UTC

Created attachment 123862
fix avx-1 amd64 test

My proposed patch:

The estimate instructions (rcpss, rcpps, rsqrtps, rsqrtss) are, as the name suggests, not expected to give a fully accurate result. They may produce slightly different results on different CPU families because their results are not defined by the IEEE standard. This is the reason the avx-1 test fails on AMD now.

This patch assumes there are only two implementations, the Intel and the AMD one. It moves these estimate instructions out of avx-1 and into their own testcase, avx_estimate_insn, and creates two different .exp files for Intel and AMD.

Comment 3 Mark Wielaard 2019-11-12 14:47:11 UTC

After some discussion on IRC, this was pushed to git master.
Thanks!

commit ef9ac3aa0fd3ed41d74707ffe49abe9ad2797ddd
Author: Alexandra Hájková <ahajkova@redhat.com>
Date:   Mon Nov 11 14:30:26 2019 +0100

    fix avx-1 amd64 test
    
    The estimate instructions (rcpss, rcpps, rsqrtps, rsqrtss) are, as the
    name suggests, not expected to give a fully accurate result. They may
    produce slighly different results on different CPU families because
    their results are not defined by the IEEE standard.  This is the
    reason avx-1 test fails on amd now.
    
    This patch assumes there are only two implementations, the intel and
    amd one.  It moves these estimate instructions out of avx-1 and into
    their own testcase - avx_estimate_insn and creates two different .exp
    files for intel and amd.
    
    https://bugs.kde.org/show_bug.cgi?id=413330

Comment 4 Mark Wielaard 2022-02-08 12:40:04 UTC

Created attachment 146437
Add avx_estimate_insn.stdout.exp-amd variant

commit ef9ac3aa0fd3ed41d74707ffe49abe9ad2797ddd "fix avx-1 amd64 test" split off the estimate instructions into their own testcase avx_estimate_insn.

The commit message suggested that two .exp files would be added, one for the intel and one for the amd cases.

It seems the .exp-amd variant was forgotten. This patch adds it.

Comment 5 Mark Wielaard 2022-02-16 22:27:13 UTC

commit df214356db9ec0555e1f022688a381cee40f68c3
Author: Mark Wielaard <mark@klomp.org>
Date:   Tue Feb 8 13:12:46 2022 +0100

    none/tests/amd64/avx_estimate_insn.vgtest fails on AMD processors
    
    commit ef9ac3aa0fd3ed41d74707ffe49abe9ad2797ddd
    "fix avx-1 amd64 test" split off the estimate instructions
    into their own testcase avx_estimate_insn.
    
    The commit message suggested that two .exp files would be
    added, one for the intel and one for the amd cases.
    
    It seems the .exp-amd variant was forgotten. This commit
    adds it.
    
    https://bugs.kde.org/show_bug.cgi?id=413330