SUMMARY
avx-1 test fails on AMD EPYC 7401P 24-Core Processor due to slightly different RSQRTPS implementation

rsqrtps may produce slightly different results on different CPU families, because the results of instructions like the reciprocal square root estimate are not defined by the IEEE standard (https://randomascii.wordpress.com/2013/07/16/floating-point-determinism/).

I ran this simple code on an AMD EPYC 7401P 24-Core Processor (dell) and an Intel(R) Core(TM) i7-7600U CPU @ 2.80GHz (intel).

code:

    #include <stdio.h>

    int main(void)
    {
        float arr_in[4]  = { 1, 2, 3, 4 };
        float arr_out[4] = { 0, 0, 0, 0 };

        /* Load four floats, take the packed reciprocal square root
           estimate of each lane, and store the four results. */
        __asm__ __volatile__(
            "movups %1, %%xmm0; \n\t"
            "rsqrtps %%xmm0, %%xmm1; \n\t"
            "movups %%xmm1, %0; \n\t"
            : "=m" (arr_out)     /* %0: output */
            : "m" (arr_in)       /* %1: input */
            : "xmm0", "xmm1"     /* registers clobbered by the asm */
        );

        printf("arr_out: ");
        for (int i = 0; i < 4; i++)
            printf("%f ", arr_out[i]);
        printf("\n");
        return 0;
    }

the results:

dell:
arr_out: 0.999878 0.707031 0.577271 0.499939

intel:
arr_out: 0.999756 0.706909 0.577271 0.499878

The avx-1 test always expects exactly the same output, which causes it to fail on some CPUs.
I am afraid rsqrtss is not like the other instructions in avx-1. The blog post you quote is very on topic:

> (https://randomascii.wordpress.com/2013/07/16/floating-point-determinism/)

The estimate instructions (rcpss, rcpps, rsqrtps, rsqrtss) are, as the name suggests, not expected to give a fully accurate result. They are supposed to provide estimates with bounded errors.

See also http://www-archive.xenproject.org/files/xensummit_germany09/AMD.pdf#page=15

So ideally we move these estimate instructions out of avx-1 and into their own testcase that checks the results are within the bounded error. Unfortunately I don't know of a good way to do that.

Alternatively we might for now just assume there are only two implementations, the intel and amd one, and just print the results as we do now, but create two different .exp files for the new testcase. The test would then just pass if the result is either like the intel or the amd implementation. We can see what to do if another implementation pops up.
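For illustration only, here is a minimal sketch of what a bounded-error check could look like. It assumes the relative error bound of 1.5 * 2^-12 that the vendor instruction references give for these estimate instructions; the helper names rsqrt_within_bound and rcp_within_bound and the macro ESTIMATE_MAX_REL_ERR are hypothetical and are not part of the Valgrind test suite.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: the documented bound for the rcp/rsqrt estimate
   instructions is |relative error| <= 1.5 * 2^-12. */
#define ESTIMATE_MAX_REL_ERR (1.5 / 4096.0)

/* Hypothetical helper: true if y is an acceptable estimate of 1/sqrt(x).
   Writing y = (1 + e) / sqrt(x) gives y*y*x = (1 + e)^2, so for y > 0 the
   test |e| <= B is the same as (1 - B)^2 <= y*y*x <= (1 + B)^2.
   This avoids calling sqrt() at all. */
static bool rsqrt_within_bound(float x, float y)
{
    assert(x > 0.0f);
    if (y <= 0.0f)
        return false;
    const double b = ESTIMATE_MAX_REL_ERR;
    const double t = (double)y * (double)y * (double)x;
    return t >= (1.0 - b) * (1.0 - b) && t <= (1.0 + b) * (1.0 + b);
}

/* Hypothetical helper: true if y is an acceptable estimate of 1/x.
   The relative error is simply e = y*x - 1. */
static bool rcp_within_bound(float x, float y)
{
    assert(x != 0.0f);
    const double b = ESTIMATE_MAX_REL_ERR;
    const double e = (double)y * (double)x - 1.0;
    return e >= -b && e <= b;
}
```

A testcase built this way would print only pass or fail per lane, instead of the raw estimate, so a single .exp file could cover every CPU. Under the assumed bound, both the dell and the intel results reported in this bug are accepted by rsqrt_within_bound.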
Created attachment 123862
fix avx-1 amd64 test

My proposed patch:

The estimate instructions (rcpss, rcpps, rsqrtps, rsqrtss) are, as the name suggests, not expected to give a fully accurate result. They may produce slightly different results on different CPU families because their results are not defined by the IEEE standard. This is the reason the avx-1 test fails on amd now.

This patch assumes there are only two implementations, the intel and amd one. It moves these estimate instructions out of avx-1 and into their own testcase, avx_estimate_insn, and creates two different .exp files for intel and amd.
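The idea of accepting either of two expected outputs can be sketched as follows. This is not the regression harness's actual code; matches_any_expected and rsqrt_expected are hypothetical names, and the two strings are the outputs of the small reproducer from this report, standing in for the contents of the intel and amd .exp files.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns true if `actual` is byte-for-byte equal
   to at least one of the n expected outputs. */
static bool matches_any_expected(const char *actual,
                                 const char *const expected[], size_t n)
{
    assert(actual != NULL);
    for (size_t i = 0; i < n; i++) {
        if (expected[i] != NULL && strcmp(actual, expected[i]) == 0)
            return true;
    }
    return false;
}

/* Outputs of the reproducer in this report, one per known implementation.
   Each line ends in a space followed by a newline, as the reproducer
   prints it. */
static const char *const rsqrt_expected[] = {
    "arr_out: 0.999756 0.706909 0.577271 0.499878 \n",  /* intel */
    "arr_out: 0.999878 0.707031 0.577271 0.499939 \n",  /* amd */
};
```

The trade-off of this approach is that a third implementation with different estimates would fail the test until another expected output is added.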
After some discussion on irc pushed to git master. Thanks!

commit ef9ac3aa0fd3ed41d74707ffe49abe9ad2797ddd
Author: Alexandra Hájková <ahajkova@redhat.com>
Date:   Mon Nov 11 14:30:26 2019 +0100

    fix avx-1 amd64 test

    The estimate instructions (rcpss, rcpps, rsqrtps, rsqrtss) are,
    as the name suggests, not expected to give a fully accurate result.
    They may produce slightly different results on different CPU
    families because their results are not defined by the IEEE standard.
    This is the reason avx-1 test fails on amd now.

    This patch assumes there are only two implementations, the intel
    and amd one. It moves these estimate instructions out of avx-1
    and into their own testcase - avx_estimate_insn and creates two
    different .exp files for intel and amd.

    https://bugs.kde.org/show_bug.cgi?id=413330
Created attachment 146437
Add avx_estimate_insn.stdout.exp-amd variant

commit ef9ac3aa0fd3ed41d74707ffe49abe9ad2797ddd "fix avx-1 amd64 test" split off the estimate instructions into their own testcase avx_estimate_insn. The commit message suggested that two .exp files would be added, one for the intel and one for the amd cases. It seems the .exp-amd variant was forgotten. This patch adds it.
commit df214356db9ec0555e1f022688a381cee40f68c3
Author: Mark Wielaard <mark@klomp.org>
Date:   Tue Feb 8 13:12:46 2022 +0100

    none/tests/amd64/avx_estimate_insn.vgtest fails on AMD processors

    commit ef9ac3aa0fd3ed41d74707ffe49abe9ad2797ddd "fix avx-1 amd64 test"
    split off the estimate instructions into their own testcase
    avx_estimate_insn. The commit message suggested that two .exp files
    would be added, one for the intel and one for the amd cases. It seems
    the .exp-amd variant was forgotten. This commit adds it.

    https://bugs.kde.org/show_bug.cgi?id=413330