243404 – Port to zSeries

Bug 243404 - Port to zSeries

Summary: Port to zSeries

Status:	RESOLVED FIXED

Alias:	None

Product:	valgrind
Classification:	Developer tools
Component:	general (show other bugs)
Version:	unspecified
Platform:	unspecified Linux

Importance:	NOR wishlist
Target Milestone:	---
Assignee:	Julian Seward

URL:
Keywords:

Depends on:	243270 253206
Blocks:
	Show dependency tree / graph

Reported:	2010-07-02 05:15 UTC by Florian Krohm
Modified:	2011-04-19 17:30 UTC (History)
CC List:	2 users (show)

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:

Attachments
Patch for configury and make (12.81 KB, patch) 2010-07-02 05:17 UTC, Florian Krohm	Details
Patch for VEX (740.88 KB, patch) 2010-07-02 05:19 UTC, Florian Krohm	Details
Patch for everything else (262.95 KB, patch) 2010-07-02 05:20 UTC, Florian Krohm	Details
Potential fix for the last test case failure in none (888 bytes, patch) 2010-07-05 10:11 UTC, Christian Borntraeger	Details
Version2: patch to implement VEX for s390x (749.78 KB, patch) 2010-07-27 09:13 UTC, Christian Borntraeger	Details
Version2: build system patch (12.20 KB, patch) 2010-07-27 09:14 UTC, Christian Borntraeger	Details
Version2: coregrind and tools (269.14 KB, patch) 2010-07-27 09:15 UTC, Christian Borntraeger	Details
Version2: Testsuite patches (28.99 KB, patch) 2010-07-27 09:38 UTC, Christian Borntraeger	Details
experimental patch to fix mvc (4.08 KB, patch) 2010-07-27 22:09 UTC, Christian Borntraeger	Details
Latest patch against 11347/2032 part1/2 (new files) (906.85 KB, patch) 2010-09-09 11:08 UTC, Christian Borntraeger	Details
Latest patch against 11347/2032 part2/2 (changes) (156.16 KB, patch) 2010-09-09 11:09 UTC, Christian Borntraeger	Details
updated patch to r11383/r2046 new files. (1/2) (919.15 KB, patch) 2010-09-27 14:06 UTC, Christian Borntraeger	Details
updated patch to r11383/r2046 changed files (2/2) (156.10 KB, patch) 2010-09-27 14:07 UTC, Christian Borntraeger	Details
Latest patch against mainline 11391 VEX 2057 new files (1/2) (919.22 KB, patch) 2010-10-01 12:53 UTC, Christian Borntraeger	Details
Latest patch against mainline 11391 VEX 2057 changed files (2/2) (152.01 KB, patch) 2010-10-01 12:54 UTC, Christian Borntraeger	Details
update to 11433/2065 new files (1/2) (927.78 KB, patch) 2010-10-12 20:07 UTC, Christian Borntraeger	Details
update to 11433/2065 changes (2/2) (155.46 KB, patch) 2010-10-12 20:08 UTC, Christian Borntraeger	Details
patch 1/2 against 11447/2066 new files (927.80 KB, patch) 2010-10-15 13:28 UTC, Christian Borntraeger	Details
patch 2/2 against 11447/2066 changes files (927.80 KB, patch) 2010-10-15 13:29 UTC, Christian Borntraeger	Details
patch 2/2 changed files (156.24 KB, patch) 2010-10-19 14:30 UTC, Christian Borntraeger	Details
Patch 1/2 against 3.6: new files (968.48 KB, patch) 2010-10-21 16:12 UTC, Christian Borntraeger	Details
Patch 2/2 against 3.6: changes (157.83 KB, patch) 2010-10-21 16:16 UTC, Christian Borntraeger	Details
Updated Patches for s390x 1/2 (created files) (970.87 KB, patch) 2010-11-09 13:33 UTC, Christian Borntraeger	Details
Updated Patches for s390x 2/2 (changed files) (159.28 KB, patch) 2010-11-09 13:34 UTC, Christian Borntraeger	Details
valgrind-s390x-build.patch (1.00 KB, patch) 2010-11-11 17:44 UTC, Jakub Jelinek	Details
Patch against 11502/2081 1of2 (new files) (970.25 KB, patch) 2011-01-20 13:01 UTC, Christian Borntraeger	Details
Patch against 11502/2081 2of2 (changes) (154.83 KB, patch) 2011-01-20 13:02 UTC, Christian Borntraeger	Details
what has changed since last patch set (6.98 KB, patch) 2011-01-20 13:04 UTC, Christian Borntraeger	Details
FYI: review feedback fir changes patch (11.11 KB, patch) 2011-02-11 09:35 UTC, Christian Borntraeger	Details
Combined patch addressing review comments (269.55 KB, patch) 2011-02-23 04:55 UTC, Florian Krohm	Details
Show Obsolete (24) View All Add an attachment

Note You need to log in before you can comment on or make changes to this bug.

Description Florian Krohm 2010-07-02 05:15:14 UTC

Version:           unspecified
OS:                Linux

Hello, here is an initial patch-set to add support for zSeries (aka IBM
mainframe) to valgrind. The patch is against r11140  VEX r1981 and consists
of three parts
- patch for configury and make machinery
- VEX patch
- everything else

In this initial version we are targeting the following tools: memcheck,
massif, lackey, and none.

The set of supported instructions is fairly complete with a few of the
complex ones still missing. Binary floating point is supported.

This is work in progress and not ready for primetime. But we wanted to
announce it to get the review started.

If you want to play with the patch please consult the file README.s390 for
hardware requirements and limitations.

Comments are welcome.


Reproducible: Didn't try

Comment 1 Florian Krohm 2010-07-02 05:17:45 UTC

Created attachment 48535 [details]
Patch for configury and make

Comment 2 Florian Krohm 2010-07-02 05:19:04 UTC

Created attachment 48536 [details]
Patch for VEX

Comment 3 Florian Krohm 2010-07-02 05:20:08 UTC

Created attachment 48537 [details]
Patch for everything else

Comment 4 Christian Borntraeger 2010-07-05 10:11:26 UTC

Created attachment 48600 [details]
Potential fix for the last test case failure in none

After some fixing this is the current status for the test suite for none.

I only get a failure in faultstatus.
faultstatus fails in 3 of the 4 cases, but also without valgrind:
 - one problem is an s390 kernel bug (the si_code for SIGBUS is broken)
 - the hardware does not deliver the page offset during a page fault 
 - the ftruncate does not work because we open the temp file without write access. So the hardware can deliver two faults: SIGSEGV because of write access and SIGBUS because of no file backing. Turns out that on s390 we get SIGBUS for test2 instead of SIGSEV.


This is a potential fix for 2 of the 3 failures.

Comment 5 Christian Borntraeger 2010-07-27 09:13:56 UTC

Created attachment 49510 [details]
Version2: patch to implement VEX for s390x

Comment 6 Christian Borntraeger 2010-07-27 09:14:48 UTC

Created attachment 49511 [details]
Version2: build system patch

Comment 7 Christian Borntraeger 2010-07-27 09:15:14 UTC

Created attachment 49512 [details]
Version2: coregrind and tools

Comment 8 Christian Borntraeger 2010-07-27 09:38:34 UTC

Created attachment 49514 [details]
Version2: Testsuite patches

This is the last patch of the newest snapshot for valgrind on s390x.
Several bugfixes and enhancements
- make drd and helgrind work better by fixing noredir and the EX trampoline
- make the EX trampoline robust against signal handler containing EX
- new instructions (cvb,cvd,exrl)
- a start to move our test cases into the regression suite
- implement the s390x parts of other testcases
- fix signal handling during system call
- implement gdb attach
- implement core dump
- fix system calls with 6 parameters
- implement hw capabilities by testing instructions
- implement mkLazy3: I32 x I128 x I128 -> I128
- fix ptrace
- improve the dispatcher code for performance

The main todo is to fix MVC. MVC has two modes:
1. destructive overlap (looks like a bytewise left to right copy. Thats what we currently have) - used for memset
2. no destructive overlap (access concurrent for other CPUs on double word (64bit) boundary - used for memcopy and memory2memory assigments
The current version implements 1, since it will also work for 2. But this will cause tests to see 1 byte read/writes instead of 2,4 or 8 byte reads if gcc used the mvc.

The testsuite:
massif: 0 errors
lackey: 0 errors
cachegrind: 0 errors
none: 1 error left (known kernel problem - wrong si code for SIGBUS)
memcheck: 5 errors left (mostly mvc related)
helgrind: 6 errors left (also mostly mvc related)
drd: 18 errors (not yet checked why)

no support for exp-bbv,exp-ptrcheck and callgrind yet.


As a funny side note: valgrind triggered a gcc error in gcc 4.6. Reported and fixed: (http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45017)

Comment 9 Christian Borntraeger 2010-07-27 22:09:41 UTC

Created attachment 49547 [details]
experimental patch to fix mvc

Here is a patch that implements mvc differently. With this patch the performance is worse, but the test suite looks good, only callgrind and exp-ptrcheck are still not working.

== 439 tests, 38 stderr failures, 7 stdout failures, 0 post failures ==
callgrind/tests/clreq                    (stderr)
callgrind/tests/notpower2-hwpref         (stderr)
callgrind/tests/notpower2-use            (stderr)
callgrind/tests/notpower2-wb             (stderr)
callgrind/tests/notpower2                (stderr)
callgrind/tests/simwork1                 (stdout)
callgrind/tests/simwork1                 (stderr)
callgrind/tests/simwork2                 (stdout)
callgrind/tests/simwork2                 (stderr)
callgrind/tests/simwork3                 (stdout)
callgrind/tests/simwork3                 (stderr)
callgrind/tests/threads                  (stderr)
drd/tests/tc04_free_lock                 (stderr)
drd/tests/tc09_bad_unlock                (stderr)
drd/tests/tc23_bogus_condwait            (stderr)
exp-ptrcheck/tests/bad_percentify        (stdout)
exp-ptrcheck/tests/bad_percentify        (stderr)
exp-ptrcheck/tests/base                  (stderr)
exp-ptrcheck/tests/ccc                   (stderr)
exp-ptrcheck/tests/fp                    (stderr)
exp-ptrcheck/tests/globalerr             (stderr)
exp-ptrcheck/tests/hackedbz2             (stdout)
exp-ptrcheck/tests/hackedbz2             (stderr)
exp-ptrcheck/tests/hp_bounds             (stderr)
exp-ptrcheck/tests/hp_dangle             (stderr)
exp-ptrcheck/tests/hsg                   (stdout)
exp-ptrcheck/tests/hsg                   (stderr)
exp-ptrcheck/tests/justify               (stderr)
exp-ptrcheck/tests/partial_bad           (stderr)
exp-ptrcheck/tests/partial_good          (stderr)
exp-ptrcheck/tests/preen_invars          (stdout)
exp-ptrcheck/tests/preen_invars          (stderr)
exp-ptrcheck/tests/pth_create            (stderr)
exp-ptrcheck/tests/pth_specific          (stderr)
exp-ptrcheck/tests/realloc               (stderr)
exp-ptrcheck/tests/stackerr              (stderr)
exp-ptrcheck/tests/strcpy                (stderr)
exp-ptrcheck/tests/supp                  (stderr)
exp-ptrcheck/tests/tricky                (stderr)
exp-ptrcheck/tests/unaligned             (stderr)
exp-ptrcheck/tests/zero                  (stderr)
helgrind/tests/tc06_two_races_xml        (stderr)
helgrind/tests/tc22_exit_w_lock          (stderr)
helgrind/tests/tc23_bogus_condwait       (stderr)
none/tests/faultstatus                   (stderr)

I will check if I can improve the performance.
Afterwards we can refresh all patch against latest upstream and slowly start the discussion what else is necessary for upstream integration.

Christian

Comment 10 Christian Borntraeger 2010-09-09 11:08:03 UTC

Created attachment 51454 [details]
Latest patch against 11347/2032 part1/2 (new files)

For reference, here is the latest version of the s390x port. We made good progress in reducing common code changes.  Most of the changes are now within elseif VG*s390 or in new files. We also did a lot of performance improvements thanks  due to spechelper and other things. This patch also has an EXecute implementation that uses self checking code instead of a trampoline. Given that we dont yet support callgrind and exp-ptrcheck, the test suite looks pretty good on SLES11SP1:

== 447 tests, 43 stderr failures, 10 stdout failures, 0 post failures ==
callgrind/tests/clreq                    (stderr)
callgrind/tests/notpower2-hwpref         (stderr)
callgrind/tests/notpower2-use            (stderr)
callgrind/tests/notpower2-wb             (stderr)
callgrind/tests/notpower2                (stderr)
callgrind/tests/simwork-both             (stdout)
callgrind/tests/simwork-both             (stderr)
callgrind/tests/simwork-branch           (stdout)
callgrind/tests/simwork-branch           (stderr)
callgrind/tests/simwork-cache            (stdout)
callgrind/tests/simwork-cache            (stderr)
callgrind/tests/simwork1                 (stdout)
callgrind/tests/simwork1                 (stderr)
callgrind/tests/simwork2                 (stdout)
callgrind/tests/simwork2                 (stderr)
callgrind/tests/simwork3                 (stdout)
callgrind/tests/simwork3                 (stderr)
callgrind/tests/threads-use              (stderr)
callgrind/tests/threads                  (stderr)
drd/tests/tc04_free_lock                 (stderr)
drd/tests/tc09_bad_unlock                (stderr)
drd/tests/tc23_bogus_condwait            (stderr)
exp-ptrcheck/tests/bad_percentify        (stdout)
exp-ptrcheck/tests/bad_percentify        (stderr)
exp-ptrcheck/tests/base                  (stderr)
exp-ptrcheck/tests/ccc                   (stderr)
exp-ptrcheck/tests/fp                    (stderr)
exp-ptrcheck/tests/globalerr             (stderr)
exp-ptrcheck/tests/hackedbz2             (stdout)
exp-ptrcheck/tests/hackedbz2             (stderr)
exp-ptrcheck/tests/hp_bounds             (stderr)
exp-ptrcheck/tests/hp_dangle             (stderr)
exp-ptrcheck/tests/hsg                   (stdout)
exp-ptrcheck/tests/hsg                   (stderr)
exp-ptrcheck/tests/justify               (stderr)
exp-ptrcheck/tests/partial_bad           (stderr)
exp-ptrcheck/tests/partial_good          (stderr)
exp-ptrcheck/tests/preen_invars          (stdout)
exp-ptrcheck/tests/preen_invars          (stderr)
exp-ptrcheck/tests/pth_create            (stderr)
exp-ptrcheck/tests/pth_specific          (stderr)
exp-ptrcheck/tests/realloc               (stderr)
exp-ptrcheck/tests/stackerr              (stderr)
exp-ptrcheck/tests/strcpy                (stderr)
exp-ptrcheck/tests/supp                  (stderr)
exp-ptrcheck/tests/tricky                (stderr)
exp-ptrcheck/tests/unaligned             (stderr)
exp-ptrcheck/tests/zero                  (stderr)
helgrind/tests/tc06_two_races_xml        (stderr)
helgrind/tests/tc09_bad_unlock           (stderr)
helgrind/tests/tc23_bogus_condwait       (stderr)
memcheck/tests/badfree3                  (stderr)
none/tests/faultstatus                   (stderr)

Thats 8 errors without callgrind/exp-ptrcheck.

Performance got also much better:
-- Running  tests in perf ----------------------------------------------
bigcode1 src       :0.29s  no: 7.2s (24.9x, -----)  me:13.5s (46.6x, -----)
bigcode2 src       :0.30s  no:19.2s (64.1x, -----)  me:38.0s (126.6x, -----)
bz2      src       :1.06s  no: 8.3s ( 7.8x, -----)  me:25.6s (24.1x, -----)
fbench   src       :0.74s  no: 4.2s ( 5.6x, -----)  me: 9.2s (12.5x, -----)
ffbench  src       :0.61s  no: 2.5s ( 4.1x, -----)  me: 6.4s (10.4x, -----)
heap     src       :0.41s  no: 2.9s ( 7.0x, -----)  me:16.2s (39.6x, -----)
sarp     src       :0.04s  no: 0.7s (17.5x, -----)  me: 5.8s (144.0x, -----)
tinycc   src       :0.34s  no: 5.9s (17.3x, -----)  me:25.6s (75.1x, -----)
-- Finished tests in perf ----------------------------------------------

What needs to be done for integration?
- There might be 2 or 3 places where we can further reduce the common code changes
- Integrate several of our own tests into the regression test suite
Any more ideas?

Comment 11 Christian Borntraeger 2010-09-09 11:09:12 UTC

Created attachment 51455 [details]
Latest patch against 11347/2032 part2/2 (changes)

2nd part.

Comment 12 Christian Borntraeger 2010-09-27 14:06:09 UTC

Created attachment 52024 [details]
updated patch to  r11383/r2046 new files. (1/2)

this is part1 of an updated patch.

on a z10 with sles11:

-- Running  tests in perf ----------------------------------------------
bigcode1 src       :0.29s  no: 7.1s (24.5x, -----)  me:13.2s (45.7x, -----)
bigcode2 src       :0.31s  no:18.9s (61.0x, -----)  me:36.9s (119.2x, -----)
bz2      src       :1.06s  no: 6.8s ( 6.4x, -----)  me:22.8s (21.5x, -----)
fbench   src       :0.74s  no: 3.0s ( 4.0x, -----)  me: 7.9s (10.6x, -----)
ffbench  src       :0.61s  no: 1.8s ( 3.0x, -----)  me: 5.8s ( 9.5x, -----)
heap     src       :0.41s  no: 2.6s ( 6.3x, -----)  me:16.3s (39.8x, -----)
sarp     src       :0.04s  no: 0.7s (16.8x, -----)  me: 5.6s (139.8x, -----)
tinycc   src       :0.35s  no: 5.5s (15.8x, -----)  me:25.1s (71.7x, -----)
-- Finished tests in perf ----------------------------------------------

== 8 programs, 16 timings =================

== 426 tests, 21 stderr failures, 6 stdout failures, 0 post failures ==
callgrind/tests/clreq                    (stderr)
callgrind/tests/notpower2-hwpref         (stderr)
callgrind/tests/notpower2-use            (stderr)
callgrind/tests/notpower2-wb             (stderr)
callgrind/tests/notpower2                (stderr)
callgrind/tests/simwork-both             (stdout)
callgrind/tests/simwork-both             (stderr)
callgrind/tests/simwork-branch           (stdout)
callgrind/tests/simwork-branch           (stderr)
callgrind/tests/simwork-cache            (stdout)
callgrind/tests/simwork-cache            (stderr)
callgrind/tests/simwork1                 (stdout)
callgrind/tests/simwork1                 (stderr)
callgrind/tests/simwork2                 (stdout)
callgrind/tests/simwork2                 (stderr)
callgrind/tests/simwork3                 (stdout)
callgrind/tests/simwork3                 (stderr)
callgrind/tests/threads-use              (stderr)
callgrind/tests/threads                  (stderr)
drd/tests/tc04_free_lock                 (stderr)
drd/tests/tc09_bad_unlock                (stderr)
drd/tests/tc23_bogus_condwait            (stderr)
helgrind/tests/tc06_two_races_xml        (stderr)
helgrind/tests/tc09_bad_unlock           (stderr)
helgrind/tests/tc23_bogus_condwait       (stderr)
memcheck/tests/badfree3                  (stderr)
none/tests/faultstatus                   (stderr)

Comment 13 Christian Borntraeger 2010-09-27 14:07:09 UTC

Created attachment 52025 [details]
updated patch to  r11383/r2046 changed files (2/2)

2nd part

Comment 14 Christian Borntraeger 2010-10-01 12:53:30 UTC

Created attachment 52141 [details]
Latest patch against mainline 11391 VEX  2057 new files (1/2)

This is the latest patch set against mainline 11391 VEX  2057, since the old patch did no longer apply due to the fixes from 248893, 246888,247875.

This also fixes all issues that I have seen on x86:
- compile warnings due to 32 vs. 64bit
- fix arch_test to not run s390x test cases on other platforms

Please note that you have to manually do a
"chmod u+x none/tests/s390x/filter_stderr"
since patch/diff does not save the file attributes.

Comment 15 Christian Borntraeger 2010-10-01 12:54:29 UTC

Created attachment 52142 [details]
 Latest patch against mainline 11391 VEX 2057 changed files (2/2) 

2nd part.

Comment 16 Christian Borntraeger 2010-10-12 20:07:06 UTC

Created attachment 52457 [details]
update to  11433/2065 new files (1/2)

update to latest upstream.
- Contains s390x fix for https://bugs.kde.org/show_bug.cgi?id=243270
- exp-dhat seems to work
- firefox runs fine with memcheck and is actually reasonable usable with --tool=none
- also contains the fix from https://bugs.kde.org/show_bug.cgi?id=253206 to make tool=none run without error

As always, you have to manually do a "chmod u+x none/tests/s390x/filter_stderr"
since patch/diff does not save the file attributes.

Comment 17 Christian Borntraeger 2010-10-12 20:08:22 UTC

Created attachment 52458 [details]
update to  11433/2065 changes (2/2)

part two 11433/2065

Comment 18 Christian Borntraeger 2010-10-14 12:08:19 UTC

Testing with a newer compiler actually revealed a small typo:
This makes valgrind compile with gcc 4.5 again:
Index: include/valgrind.h
===================================================================
--- include/valgrind.h  (revision 866)
+++ include/valgrind.h  (working copy)
@@ -4327,7 +4327,7 @@
       ".cfi_remember_state\n\t"                                   \
       ".cfi_def_cfa r11, 0\n\t"
 #  define VALGRIND_CFI_EPILOGUE                                   \
-      "movq 11, 7\n\t"                                            \
+      "lgr 11, 7\n\t"                                             \
       ".cfi_restore_state\n\t"
 #else
 #  define __FRAME_POINTER

Comment 19 Christian Borntraeger 2010-10-15 13:28:43 UTC

Created attachment 52534 [details]
patch 1/2 against 11447/2066 new files

fixes a bug in the new cfi annotations for the function wrappers:
We have to save the argvec pointer because gcc might use r11 which we overwrite.
Compiles and works now fine on older and newer gcc.
This patch applies without offset/hunk agaist latest upstream as well as against the current 3.6 branch.

Comment 20 Christian Borntraeger 2010-10-15 13:29:14 UTC

Created attachment 52535 [details]
patch 2/2 against 11447/2066 changes files

Comment 21 Christian Borntraeger 2010-10-19 14:30:16 UTC

Created attachment 52679 [details]
patch 2/2 changed files

somehow the changed patch got mixed up with the new files one. Update this one.,

Comment 22 Christian Borntraeger 2010-10-21 16:12:48 UTC

Created attachment 52718 [details]
Patch 1/2 against 3.6: new files

Updated patch against 3.6, containing the new files.

- refreshed patches
- removed some test case changes that are not required for the port and do have or will have their own bugzilla
- Florian added partial support for the z196
- compare and swap insns now yield if comparison fails
- more fixes for the functions wrappers
- you have to "chmod u+x none/tests/s390x/filter_stderr"

These patches also apply against current head (there is an offset 113 for configure.in)

Test status:
Tested on x86 (32bit Debian): no regressions

Test results on a sles11sp1 s390x:
== 426 tests, 19 stderr failures, 6 stdout failures, 0 post failures ==
callgrind/tests/clreq                    (stderr)
callgrind/tests/notpower2-hwpref         (stderr)
callgrind/tests/notpower2-use            (stderr)
callgrind/tests/notpower2-wb             (stderr)
callgrind/tests/notpower2                (stderr)
callgrind/tests/simwork-both             (stdout)
callgrind/tests/simwork-both             (stderr)
callgrind/tests/simwork-branch           (stdout)
callgrind/tests/simwork-branch           (stderr)
callgrind/tests/simwork-cache            (stdout)
callgrind/tests/simwork-cache            (stderr)
callgrind/tests/simwork1                 (stdout)
callgrind/tests/simwork1                 (stderr)
callgrind/tests/simwork2                 (stdout)
callgrind/tests/simwork2                 (stderr)
callgrind/tests/simwork3                 (stdout)
callgrind/tests/simwork3                 (stderr)
callgrind/tests/threads-use              (stderr)
callgrind/tests/threads                  (stderr)
drd/tests/tc04_free_lock                 (stderr)
drd/tests/tc09_bad_unlock                (stderr)
drd/tests/tc23_bogus_condwait            (stderr)
helgrind/tests/tc06_two_races_xml        (stderr)
helgrind/tests/tc23_bogus_condwait       (stderr)
none/tests/faultstatus                   (stderr)

Comment 23 Christian Borntraeger 2010-10-21 16:16:02 UTC

Created attachment 52719 [details]
Patch 2/2 against 3.6: changes

2nd part.
On a fedora like system the test suite even looks better, since callgrind works here:
== 410 tests, 8 stderr failures, 0 stdout failures, 0 post failures ==
drd/tests/annotate_spinlock              (stderr)
drd/tests/tc04_free_lock                 (stderr)
drd/tests/tc09_bad_unlock                (stderr)
drd/tests/tc23_bogus_condwait            (stderr)
helgrind/tests/tc06_two_races_xml        (stderr)
helgrind/tests/tc23_bogus_condwait       (stderr)
memcheck/tests/partiallydefinedeq        (stderr)
none/tests/faultstatus                   (stderr)

Comment 24 Christian Borntraeger 2010-10-21 16:38:36 UTC

As another reminder. The patches work against the 3.6 branch in the svn repository.
If the patches are applied against the tarball, you have to rerun the autotools stuff for a refreshed configure

aclocal
autoheader
automake -a
autoconf

Comment 25 Christian Borntraeger 2010-11-09 13:33:10 UTC

Created attachment 53280 [details]
Updated Patches for s390x 1/2 (created files)

This is a minor update of the s390x patches. The patch set will apply
on valgrind svn 11477/2069 without any hunk/offset but also on valgrind-3.6 (there is an offset 113 in configure.in).

There are only minor changes (all non-critical) to the last patch set:
- make lackey F128 aware
- reduce VEX GENOFFSET definitions to what is really needed
- whitespace and comment fixes
- move implicit knowledge of dwarf return address handling from debuginfo.c into readdwarf.c
- add some testcase fixes for s390x
- fix massif stack handling with lonjump (see https://bugs.kde.org/show_bug.cgi?id=256043)

Comment 26 Christian Borntraeger 2010-11-09 13:34:49 UTC

Created attachment 53281 [details]
Updated Patches for s390x 2/2 (changed files)

2nd part.

Comment 27 Jakub Jelinek 2010-11-11 17:44:03 UTC

Created attachment 53325 [details]
valgrind-s390x-build.patch

Unfortunately those patches break building on all other architectures.

The problem is that the whole guest_s390_cc.c file is guarded with VGA_s390x,
and functions from it are referenced from other files which aren't guarded that way.

This patch lets me at least build it on x86_64-linux.  Of course ideally there would be a replacement for #ifndef VGA_s390x for the stuff currently performed by inline asm only.

Comment 28 Christian Borntraeger 2010-11-11 20:04:48 UTC

That is strange.
I have no problems building valgrind on x86 (32bit Debian) with these patches.

Will double check.

Comment 29 Christian Borntraeger 2010-11-11 20:29:43 UTC

I got myself an x86_64 system - I can reproduce the problem - and your fix.
Still dont understand why it does not happen on 32bit.

Some of the codition code stuff would become less efficient in C and x86 also has several helpers with inline asms. 

Julian, Jakub, given that this patch also works on arm and ppc, would it be ok to use this patch for the time being?

Comment 30 Florian Krohm 2011-01-20 00:28:05 UTC

(In reply to comment #29)
> I got myself an x86_64 system - I can reproduce the problem - and your fix.
> Still dont understand why it does not happen on 32bit.
> 

It's an optimization thing.. 
When compiling with -m32 GCC will figure out that the assertion
vassert(0 == sizeof(VexGuestS390XState) % VexGuestS390XStateAlignment);
does not hold and therefore the process ends here. That in turn renders the
assignment 
disInstrFn = disInstr_S390;
useless and it gets optimized away thereby removing the only reference to
disInstr_S390.

The undefined symbols that are reported on x86_64 are only reachable if
disInstr_S390 is reachable.

Comment 31 Christian Borntraeger 2011-01-20 13:01:03 UTC

Created attachment 56228 [details]
Patch against 11502/2081 1of2 (new files)

Here is an updated patch set against 11502/2081. The patch set also applies to the 3.6 branch (with some offsets here and there). It also includes the build fix from Jakub and 2 small fixes:
- improve the comment about page faults on s390 which do not deliver the offset within the page
- fix the test data class instruction condition code(also add a testcase)

Comment 32 Christian Borntraeger 2011-01-20 13:02:13 UTC

Created attachment 56229 [details]
Patch against 11502/2081 2of2 (changes)

2nd part

Comment 33 Christian Borntraeger 2011-01-20 13:04:59 UTC

Created attachment 56230 [details]
what has changed since last patch set

For review purposes: This patch contains all changes since the last patch.

Comment 34 Julian Seward 2011-02-05 13:45:31 UTC

(In reply to comment #32)
> Created an attachment (id=56229) [details]
> Patch against 11502/2081 2of2 (changes)

================================================================
Comments for c32-changes.diff (attachment ID = 56229)

Generally the quality is high and this looks like a pretty
clean patch.  Only one larger concern, and a bunch of smaller
points.  I did not yet look at the new files patch.

Larger concerns
---------------

libvex_ir.h:
new additions to IRRoundingMode: hrrm, this isn't good.
What's the deal?  Why are these necessary?


Smaller comments
----------------

syswrap-main.c
I didn't understand the change to the comment about the use of
registers in syscalls.  (line 60).  What's SYSNO in the NUM column?
Why isn't it a register name?

m_debuginfo/priv_storage.h
For the typedef of DiCfSI, can we have a comment pls similar to those
preceding the definitions, (eg) along the lines of existing comment
"On ARM it's pretty much the same, except .."

m_signals.c, extend_stack_if_appropriate()
the change from
  fault >= (esp - VG_STACK_REDZONE_SZB)
to
  fault >= VG_PGROUNDDN(esp - VG_STACK_REDZONE_SZB)
I'd prefer the non-s390x case to not be changed.  This logic
is a bit tricky to get right.

coregrind/m_syscall.c
#define _svc_clobber ...
inline this -- it's only used once

m_initimg/initimg-linux.c
+   vg_assert(0 == sizeof(VexGuestS390XState) % VexGuestS390XStateAlignment);
Is this necessary?  None of the other linux targets have that.
Plus pre_run_checks in m_scheduler/scheduler.c should check everything
anyway.

(jseward TODO:
LibVEX_Translate: insn_bytes: how do we know this is not overrun?
(on any target?)

VEX/priv/main_main.c:
+         /* Make sure the size of the guest state is small enough to be used
+            as an offset in a B12 amode. */
+         vassert(sizeof(VexGuestS390XState) < (1 << 12));
Why is this a useful check?  We could be asked to generate an offset
of up to 3 * sizeof(VexGuestS390XState) from the baseblock pointer.
Plus anyway, surely it is the job of the instruction selector to
generate in-range offsets for such amodes.

VEX/pub/libvex_ir.h
rename F64HLto128 to F64HLtoF128
ditto F128to64, F128HIto64

no need to repeat comment on top of Iop_CmpF32/128, just add
CmpF32 and CmpF128 to CmpF64

(jseward TODO:
We should get rid of IRCmpF64Result and rename it simply to
IRCmpFResult, and not bother with IRCmpF32/128Result)

libvex.h:
+/* Special value representing all available s390x hwcaps */
+#define VEX_HWCAPS_S390X_ALL   (VEX_HWCAPS_S390X_LDISP | \
+                                VEX_HWCAPS_S390X_EIMM  | \
+                                VEX_HWCAPS_S390X_GIE   | \
+                                VEX_HWCAPS_S390X_DFP)
Is this necessary?

configure.in
+       VGCONF_PLATFORM_SEC_CAPS=""
Fix inconsistent indentation (tab vs space problem?)

(jseward TODO: re-check the mc_translate stuff.)

mc_instrument.c: what type is F128 shadowed by?  I64 or I128?
If the latter, then F64HLto128 and inverses might be doable better
(and this is relevant to all platforms, not just s390x).

How are atomic memory operations handled?  I saw no new IR
constructions related to atomic memory ops, so I suppose the
existing LL/SC or CAS constructions are enough?

Why does memcheck/tests/supp_unknown.supp have a new entry?

include/pub_tool_machine.h
+#  define VG_MAX_INSTR_SZB          6
is that really right?  Seems a bit low.

include/valgrind.h
__SPECIAL_INSTRUCTION_PREAMBLE
is that really unique enough?  What does an lr instruction do?
If lr x,y is just an integer register move, I'd say it is not
unique enough: a really stupid compiler having a really bad day
might actually generate that code.

comment fix: "Similar craziness as x86"
shouldn't that be "as amd64" ?

=================================================================

Comment 35 Florian Krohm 2011-02-05 18:16:36 UTC

(In reply to comment #34)
> 

Thanks for the review. We'll fix the minor issues related to indentation, comments, and VEX floating point operator name changes in the next patch.

> Larger concerns
> ---------------
> 
> libvex_ir.h:
> new additions to IRRoundingMode: hrrm, this isn't good.
> What's the deal?  Why are these necessary?
> 

The "convert to fixed" instruction distinguishes between "round to 
nearest with ties away from 0" and "round to nearest with ties to even".
We've added Irrm_NEAREST_AWAY to support the former. However, we will
investigate whether GCC actually uses that mode. If not, we can
probably drop this rounding mode, as it's obscure enough. 

The Irrm_CURRENT is in support of "round according to the current setting
of the rounding mode in the FPC register". First, I tried to understand how
this was done in the PPC port but did not quite follow. Then I thought: If
we had Irrm_CURRENT, then we could implement a simple scheme where the FPC 
register is saved and restored whenever we switch between valgrind proper 
and client code. So the FPC value should always be correct for the client 
and we would not have to worry about tracking the actual rounding mode setting. 
That is what is currently implemented. But we're of course open for
suggestions.

> 
> Smaller comments
> ----------------
> 
> syswrap-main.c
> I didn't understand the change to the comment about the use of
> registers in syscalls.  (line 60).  What's SYSNO in the NUM column?
> Why isn't it a register name?
> 

Not sure. It should be. 

> m_initimg/initimg-linux.c
> +   vg_assert(0 == sizeof(VexGuestS390XState) % VexGuestS390XStateAlignment);
> Is this necessary?  None of the other linux targets have that.
> Plus pre_run_checks in m_scheduler/scheduler.c should check everything
> anyway.
> 

This is a leftover from long time ago (before r8804) where there was no
requirement that the size of the guest state ought to be a multiple of 16.
It is redundant now and we can get rid of it. 

> VEX/priv/main_main.c:
> +         /* Make sure the size of the guest state is small enough to be used
> +            as an offset in a B12 amode. */
> +         vassert(sizeof(VexGuestS390XState) < (1 << 12));
> Why is this a useful check?  We could be asked to generate an offset
> of up to 3 * sizeof(VexGuestS390XState) from the baseblock pointer.

Good point. I'll fix that.

> libvex.h:
> +/* Special value representing all available s390x hwcaps */
> +#define VEX_HWCAPS_S390X_ALL   (VEX_HWCAPS_S390X_LDISP | \
> +                                VEX_HWCAPS_S390X_EIMM  | \
> +                                VEX_HWCAPS_S390X_GIE   | \
> +                                VEX_HWCAPS_S390X_DFP)
> Is this necessary?
> 

This is in support of the hwcaps sanity check at the beginning
of the toplevel insn selector. If we need to add a new hwcap in the future
we only need to adapt VEX_HWCAPS_S390X_ALL and its definition is right 
there. If we did not have it we would have to remember adapting 
the sanity check in the selector which is harder.

> mc_instrument.c: what type is F128 shadowed by?  I64 or I128?
> If the latter, then F64HLto128 and inverses might be doable better
> (and this is relevant to all platforms, not just s390x).
> 

I128. We'll look into it.

> How are atomic memory operations handled?  I saw no new IR
> constructions related to atomic memory ops, so I suppose the
> existing LL/SC or CAS constructions are enough?
> 

The CAS constructions have been sufficient.

> Why does memcheck/tests/supp_unknown.supp have a new entry?
> 

Because we report the correct line (17 in "main") where the error occurs.
See also the file supp_unknown.stderr.exp-s390x there. So we need to
adapt the suppression pattern for the testcase to pass.

> include/pub_tool_machine.h
> +#  define VG_MAX_INSTR_SZB          6
> is that really right?  Seems a bit low.
> 

Yes, it's correct. Instructions are fixed length; 2, 4, or 6 bytes
long.

> include/valgrind.h
> __SPECIAL_INSTRUCTION_PREAMBLE
> is that really unique enough?  What does an lr instruction do?
> If lr x,y is just an integer register move, I'd say it is not
> unique enough: a really stupid compiler having a really bad day
> might actually generate that code.
> 
> comment fix: "Similar craziness as x86"
> shouldn't that be "as amd64" ?
> 
> =================================================================

Comment 36 Christian Borntraeger 2011-02-06 13:51:42 UTC

> syswrap-main.c
> I didn't understand the change to the comment about the use of
> registers in syscalls.  (line 60).  What's SYSNO in the NUM column?
> Why isn't it a register name?

is it uderstandable with this patch on top?


--- valgrind-ibm.orig/coregrind/m_syswrap/syswrap-main.c
+++ valgrind-ibm/coregrind/m_syswrap/syswrap-main.c
@@ -67,7 +67,13 @@
    ppc32  r0    r3   r4   r5   r6   r7   r8   n/a  n/a  r3+CR0.SO (== ARG1)
    ppc64  r0    r3   r4   r5   r6   r7   r8   n/a  n/a  r3+CR0.SO (== ARG1)
    arm    r7    r0   r1   r2   r3   r4   r5   n/a  n/a  r0        (== ARG1)
-   s390x  SYSNO r2   r3   r4   r5   r6   r7   n/a  n/a  r2        (== ARG1)
+   On s390x the svc instruction is used for system calls. The system call
+   number is encoded in the instruction (8 bit immediate field). Since Linux
+   2.6 it is also allowed to use svc 0 with the system call number in r1.
+   This was introduced for system calls >255, but works for all. It is
+   also possible to see the svc 0 together with an EXecute instruction, that
+   fills in the immediate field.
+   s390x r1/SVC r2   r3   r4   r5   r6   r7   n/a  n/a  r2        (== ARG1)
    AIX:
    ppc32  r2  r3   r4   r5   r6   r7   r8   r9   r10  r3(res),r4(err)
    ppc64  r2  r3   r4   r5   r6   r7   r8   r9   r10  r3(res),r4(err)

Comment 37 Christian Borntraeger 2011-02-06 22:19:57 UTC

Some more feedback on the review (thanks for doing it)

> m_debuginfo/priv_storage.h
> For the typedef of DiCfSI, can we have a comment pls similar to those
> preceding the definitions, (eg) along the lines of existing comment
> "On ARM it's pretty much the same, except .."

Will do

> m_signals.c, extend_stack_if_appropriate()
> the change from
>   fault >= (esp - VG_STACK_REDZONE_SZB)
> to
>   fault >= VG_PGROUNDDN(esp - VG_STACK_REDZONE_SZB)
> I'd prefer the non-s390x case to not be changed.  This logic
> is a bit tricky to get right.

Ok. What about
    fault >= fault_mask(esp - VG_STACK_REDZONE_SZB)

with fault_mask(x) = x & ~0xfff on s390x
and fault_mask(x) = x otherwise

> coregrind/m_syscall.c
> #define _svc_clobber ...
> inline this -- it's only used once

will do

include/valgrind.h

> __SPECIAL_INSTRUCTION_PREAMBLE
> is that really unique enough?  What does an lr instruction do?
> If lr x,y is just an integer register move, I'd say it is not
> unique enough: a really stupid compiler having a really bad day
> might actually generate that code.

Yes, these are 4 integer loads from 4 different registers into itself. Given that by ABI definition r15 is the stack pointer (and therefore not available for the standard register allocator in gcc) I think it is reasonably safe to assume that we will never see this sequence in real life. Shall we change it anyway?

> comment fix: "Similar craziness as x86"
> shouldn't that be "as amd64" ?

will do

Comment 38 Christian Borntraeger 2011-02-07 11:05:12 UTC

>> libvex_ir.h:
>> new additions to IRRoundingMode: hrrm, this isn't good.
>> What's the deal?  Why are these necessary?
>> 
> 
> The "convert to fixed" instruction distinguishes between "round to 
> nearest with ties away from 0" and "round to nearest with ties to even".
> We've added Irrm_NEAREST_AWAY to support the former. However, we will
> investigate whether GCC actually uses that mode. If not, we can
> probably drop this rounding mode, as it's obscure enough. 

gcc never sets a rounding mode, glibc does. But fesetround only supports
FE_TONEAREST, FE_UPWARD, FE_DOWNWARD, and FE_TOWARDZERO so we should never see 
"NEAREST_AWAY". So lets drop this.

Comment 39 Julian Seward 2011-02-08 13:31:03 UTC

(In reply to comment #35)
> > libvex_ir.h:
> > new additions to IRRoundingMode: hrrm, this isn't good.
> > What's the deal?  Why are these necessary?
> > 
> 
> The "convert to fixed" instruction distinguishes between "round to 
> nearest with ties away from 0" and "round to nearest with ties to even".
> We've added Irrm_NEAREST_AWAY to support the former. However, we will
> investigate whether GCC actually uses that mode. If not, we can
> probably drop this rounding mode, as it's obscure enough. 

OK.

> The Irrm_CURRENT is in support of "round according to the current setting
> of the rounding mode in the FPC register". First, I tried to understand how
> this was done in the PPC port but did not quite follow. Then I thought: If
> we had Irrm_CURRENT, then we could implement a simple scheme where the FPC 
> register is saved and restored whenever we switch between valgrind proper 
> and client code. So the FPC value should always be correct for the client 
> and we would not have to worry about tracking the actual rounding
> mode setting.

The problem with this is (I think) that it means the behaviour of FP
instructions cannot be stated exactly in IR any more.  You have
introduced "under the table" semantics, in which the rounding to be
used for (eg) Iop_AddF64 is not specified by the first argument to the
primop itself (since you set it to Irrm_CURRENT), but instead depends
on how previous guest instructions set the rounding mode.  So the IR
translation for an FP add becomes context-dependent.

In the existing ports I used two schemes:

* for x86,amd64,arm-vfp:
  The rounding mode is simply ignored when translating FP
  instructions, and instead Irrm_NEAREST is used.  This means that
  if have code which sets the FPU to (eg) round to +infinity, and then
  do FP adds, subtracts, etc, it will produce different results
  natively vs on Valgrind.  This is generally not a big deal.  The
  only place where V respects the rounding mode is in FP<->Int
  conversions, since these simply wouldn't work at all if they
  ignored the rounding mode.

* on PPC/POWER, this approximation caused the AIX libc to not work.
  So the PPC front end really does encode the current rounding mode
  in all IR primops that require it.  eg:

   IRExpr* rm  = get_IR_roundingmode();
   ...
      case 0x15: // fadd (Floating Add (Double-Precision), PPC32 p400)
         if (frC_addr != 0)
            return False;
         DIP("fadd%s fr%u,fr%u,fr%u\n", flag_rC ? ".":"",
             frD_addr, frA_addr, frB_addr);
         assign( frD, triop(Iop_AddF64, rm, mkexpr(frA), mkexpr(frB)) );
         break;

  and the back end, on seeing an Iop_AddF64 or similar, sets the
  host's rounding mode accordingly, before generating the fadd itself.

  Now, that's insanely inefficient, particularly considering that
  the rounding mode has to be converted (in the front end, in IR)
  from the PPC format to the IR format (see get_IR_roundingmode in
  guest_ppc_toIR.c) and then back to the PPC format in the insn
  selector (see set_FPU_rounding_mode, roundModeIRtoPPC in
  host_ppc_isel.c)

  So host_ppc_isel.c optimises this: when it sees a setting of the
  host rounding mode, it checks whether the rounding mode has already
  been set to the same value in this superblock, and if so it omits
  the setting.  (see ISelEnv::previous_rm)  In most cases the effect
  is to reduce the overhead to one rounding-mode setting per
  superblock.  Given that FP superblocks tend to be quite long and
  have many FP insns in them, this is quite effective in reducing
  the overhead.

So I suggest you choose one of the above 2 schemes.  You could start
with the x86 cheap-hack scheme and upgrade to the PPC scheme if it
causes problems, or just use the PPC scheme directly.


> > libvex.h:
> > +/* Special value representing all available s390x hwcaps */
> > +#define VEX_HWCAPS_S390X_ALL   (VEX_HWCAPS_S390X_LDISP | \
> > +                                VEX_HWCAPS_S390X_EIMM  | \
> > +                                VEX_HWCAPS_S390X_GIE   | \
> > +                                VEX_HWCAPS_S390X_DFP)
> > Is this necessary?
> > 
> 
> This is in support of the hwcaps sanity check at the beginning
> of the toplevel insn selector. If we need to add a new hwcap in the future
> we only need to adapt VEX_HWCAPS_S390X_ALL and its definition is right 
> there. If we did not have it we would have to remember adapting 
> the sanity check in the selector which is harder.

Fine.

> > mc_instrument.c: what type is F128 shadowed by?  I64 or I128?
> > If the latter, then F64HLto128 and inverses might be doable better
> > (and this is relevant to all platforms, not just s390x).
> > 
> 
> I128. We'll look into it.

Are you sure about that?  But anyway, assuming yes, then you should
shadow them by Iop_64HLto128, Iop_128to64, Iop_128HIto64 since these
are cheaper than the PCasts (PCast is expensive) and more precise --
you're just steering bits around here, so the shadow operations can
just steer V bits in the same way.

Comment 40 Julian Seward 2011-02-08 13:39:58 UTC

(In reply to comment #37)

> > m_signals.c, extend_stack_if_appropriate()
> > the change from
> >   fault >= (esp - VG_STACK_REDZONE_SZB)
> > to
> >   fault >= VG_PGROUNDDN(esp - VG_STACK_REDZONE_SZB)
> > I'd prefer the non-s390x case to not be changed.  This logic
> > is a bit tricky to get right.
> 
> Ok. What about
>     fault >= fault_mask(esp - VG_STACK_REDZONE_SZB)
> 
> with fault_mask(x) = x & ~0xfff on s390x
> and fault_mask(x) = x otherwise

Yes, that would be good.  Pls do.


> > __SPECIAL_INSTRUCTION_PREAMBLE
> > is that really unique enough?  What does an lr instruction do?
> > If lr x,y is just an integer register move, I'd say it is not
> > unique enough: a really stupid compiler having a really bad day
> > might actually generate that code.
> 
> Yes, these are 4 integer loads from 4 different registers into itself. Given
> that by ABI definition r15 is the stack pointer (and therefore not available
> for the standard register allocator in gcc) I think it is reasonably safe to
> assume that we will never see this sequence in real life. Shall we change it
> anyway?

Well, it depends how paranoid you are, I suppose.  All the other
targets use sequences of rotate instructions which seem to me to be
even less likely to be generated by a human or compiler.  It's your
decision.  If there is ever a snafu in which the s390x front end
misinterprets an intended set of such moves as a special insn, whilst
deep in some 25 million LOC application you're running, you get to
spend days tracking down and fixing the bug, not me :-)

Comment 41 Julian Seward 2011-02-08 13:50:16 UTC

(In reply to comment #36)
> > syswrap-main.c
> > I didn't understand the change to the comment about the use of
> > registers in syscalls.  (line 60).  What's SYSNO in the NUM column?
> > Why isn't it a register name?
> 
> is it uderstandable with this patch on top?

Yes.

Comment 42 Julian Seward 2011-02-08 14:38:03 UTC

(In reply to comment #31)
> Created an attachment (id=56228) [details]
> Patch against 11502/2081 1of2 (new files)

=================================================================

Comments for c31-new-files.diff (attachment ID = 56228)

Again, not bad stuff.  I have three main concerns and a number of
smaller comments.


Larger concerns
---------------

1.
For the VEX stuff, I would much prefer the file structure to match the
file structure (and, therefore, the assignment of functions to files)
for the existing ports, even if you think that it generates some
overly large files (it does).  With your patch there are a whole bunch
of new files and I don't know what stuff is where.

2.
The handling of FP rounding modes.  As already discussed in comments
35 and 39 above.

3.
Unless I missed something, the instruction set tests seem mostly
missing.  There are some special case ones, but nothing like the
comprehensiveness of those for the other targets.  Eg, see
none/tests/arm/v6intThumb.c, none/tests/arm/vfp.c  Without something
that comprehensive, particularly for boring integer instructions, it's
hard to convince oneself that the implementation is really correct.
This might come back to bite you later, in that it can be literally
impossible to figure out why a large program is failing.  See
https://bugs.kde.org/show_bug.cgi?id=245925#c3
for a cautionary tale.


Smaller comments
----------------

sigframe-s390x-linux.c
just a check -- are you sure you need to handle both rt_sigframe
and the older plain sigframe formats?  AIUI the need for two
different formats mostly exists for older linux ports
(eg x86) and does not exist for newer ports (eg, amd64 --
see sigframe-amd64-linux.c)

dispatch-s390x-linux.S
run_innerloop__dispatch_unprofiled:
couldn't you save an instruction here?
       lgr  %r7, %r2              /* next guest addr */
       srlg %r7,%r7,1
-->
       srlg %r7,%r2,1

(We discussed this already, so this is
<kind of irrelevant>
  More seriously, you need to redo the way the FPC is handled.
  ... need to consider more .. (do FP primops take rounding
  mode that is actually considered?  they should; do like PPC)
  run_innerloop_exit: you really need to check that the FPC
  on exit from generated code is as expected.
  so when fixed up, need to check it's the right value at
  exit of the dispatcher (if ppc does that)
</kind of irrelevant>

vex stuff
would prefer a file structure that matches all the other ports

(sewardj to check: vex must be still be buildable standalone
with this patch); (cd VEX && make clean && make libvex.a)

guest_s390_toIR.c
+   /* fixs390: we should probably pass the resteer-function and the callback
+      data. It's not needed for correctness but improves performance. */
You should allow resteering (following of unconditional branches
and unconditional calls).  It can improve performance quite a bit.
This is something you can do with a followup patch, though -- not
important right now.

rename s390_irgen_noredir --> s390_irgen_call_noredir
(the whole purpose of this is to generate IR for a call instruction
which can't be redirected).

check -- are there vex own versions for:
+#define likely(x)       __builtin_expect(!!(x), 1)
+#define unlikely(x)     __builtin_expect(!!(x), 0)

host_s390_insn.c
+#include <stdarg.h>
urr, is this necessary?

(sewardj TODO: check everything for new #includes)

host_s390_disasm.[ch]
What's this for?  The host-side doesn't need any disassembly
facilities on any other platforms.


+/* fixs390: I'm not exactly sure whether it is allowed that cc_dep2 is refered
+            to twice in the expression we build up here. Elsewhere we try to avoid
+            that (see the bazillions mkU64(0) in irgen.c). On the other hand...
+            guest_x86_helper.c around line 1144 does this, too. */
It's OK because cc_dep2 should be an IRAtom (either an IRTemp or
IRConst) and so referring to it more than once does not duplicate work
-- specifically, it does not duplicate an IR expression tree (because
IRAtoms can't be trees).  You could assert that if you like ("isIRAtom()")

remove unused code in directReload_S390 (or make it work) --
probably is not important for s390, only has any noticable
effect on x86, since that has very few integer registers.

everywhere:
80 char max line length please.  In many places in
the front end it is a lot more that 80 chars wide.

+s390_irgen_FLOGR: "Note, that the Iop_Shl64 and
+      Iop_Shr64 semantics will preserve the value-to-be-shifted if the
+      shift-amount equals or is larger than the width of
value-to-be-shifted"
Wrong reason: The IR definition says that the behavior of these primops is
undefined in out of range shift cases -- regardless of what your
back end might do with them.

ISelEnv:
+    - A Bool to tell us if the host is 32 or 64bit.
+      This is set at the start and does not change.
Necessary?  I thought you only supported 64-bit guest and host
in this patch.

+#define VexGuestS390XStateAlignment 16
This can disappear now, yes?  pre_run_checks checks this for
all targets.

Comment 43 Christian Borntraeger 2011-02-09 09:43:34 UTC

Thanks for doing a deep review.

> 1.
> For the VEX stuff, I would much prefer the file structure to match the
> file structure (and, therefore, the assignment of functions to files)
> for the existing ports, even if you think that it generates some
> overly large files (it does).  With your patch there are a whole bunch
> of new files and I don't know what stuff is where.

Ok, so we will merge most of the guest_s390 and host_s390_ files.


> The handling of FP rounding modes.  As already discussed in comments
> 35 and 39 above.

Although this would be a step backwards in terms of full compliance,
I am tempted to do the x86 way and simply do default rounding to nearest
if the instruction normally would use the value from FPC. Most applications
do not set the rounding mode. We can improve that later if it is really
necessary for some tools.


> Unless I missed something, the instruction set tests seem mostly
> missing.  There are some special case ones, but nothing like the
> comprehensiveness of those for the other targets.  Eg, see
> none/tests/arm/v6intThumb.c, none/tests/arm/vfp.c  Without something
> that comprehensive, particularly for boring integer instructions, it's
> hard to convince oneself that the implementation is really correct.
> This might come back to bite you later, in that it can be literally
> impossible to figure out why a large program is failing.  See
> https://bugs.kde.org/show_bug.cgi?id=245925#c3
> for a cautionary tale.

Yes, for several reasons (e.g. licence, historical reasons) we have lots
of our testing outside the regression test suite (e.g compiler test
framework). Still, we have several testcases that could be moved to 
the regression suite relatively easy - will do.


> sigframe-s390x-linux.c
> just a check -- are you sure you need to handle both rt_sigframe
> and the older plain sigframe formats?  AIUI the need for two
> different formats mostly exists for older linux ports
> (eg x86) and does not exist for newer ports (eg, amd64 --
> see sigframe-amd64-linux.c)

Yes, s390/s390x was ported around 2000 -> lots of legacy cruft and
we need both.


> dispatch-s390x-linux.S
> run_innerloop__dispatch_unprofiled:
> couldn't you save an instruction here?
>        lgr  %r7, %r2              /* next guest addr */
>        srlg %r7,%r7,1
> -->
>        srlg %r7,%r2,1

Nice catch. Fixed in our repository. thanks


> (We discussed this already, so this is
> <kind of irrelevant>
>   More seriously, you need to redo the way the FPC is handled.
>   ... need to consider more .. (do FP primops take rounding
>   mode that is actually considered?  they should; do like PPC)
>   run_innerloop_exit: you really need to check that the FPC
>   on exit from generated code is as expected.
>   so when fixed up, need to check it's the right value at
>   exit of the dispatcher (if ppc does that)
> </kind of irrelevant>
see above, I want to start with the x86 way, but this was mostly
done by Florian, so he has more insight.

> vex stuff
> would prefer a file structure that matches all the other ports
see above. Will merge the files into a
guest_s390_defs.h guest_s390_helpers.c guest_s390_toIR.c
and
host_s390_defs.c host_s390_defs.h  host_s390_isel.c


> (sewardj to check: vex must be still be buildable standalone
> with this patch); (cd VEX && make clean && make libvex.a)
> 
> guest_s390_toIR.c
> +   /* fixs390: we should probably pass the resteer-function and the callback
> +      data. It's not needed for correctness but improves performance. */
> You should allow resteering (following of unconditional branches
> and unconditional calls).  It can improve performance quite a bit.
> This is something you can do with a followup patch, though -- not
> important right now.

Yes, we will do it later.

 
> rename s390_irgen_noredir --> s390_irgen_call_noredir
> (the whole purpose of this is to generate IR for a call instruction
> which can't be redirected).

done.


> check -- are there vex own versions for:
> +#define likely(x)       __builtin_expect(!!(x), 1)
> +#define unlikely(x)     __builtin_expect(!!(x), 0)
> 
> host_s390_insn.c
> +#include <stdarg.h>
> urr, is this necessary?

For va_arg. We could use __buildin_va_arg to get rid of this include.
This is used for the instruction printing below.

> 
> (sewardj TODO: check everything for new #includes)
> 
> host_s390_disasm.[ch]
> What's this for?  The host-side doesn't need any disassembly
> facilities on any other platforms.

Its mostly pretty printing for the trace/profile flags.
Maybe s390_disasm is a misnomer, what about print_instruction?

> +/* fixs390: I'm not exactly sure whether it is allowed that cc_dep2 is refered
> +            to twice in the expression we build up here. Elsewhere we try to
> avoid
> +            that (see the bazillions mkU64(0) in irgen.c). On the other
> hand...
> +            guest_x86_helper.c around line 1144 does this, too. */
> It's OK because cc_dep2 should be an IRAtom (either an IRTemp or
> IRConst) and so referring to it more than once does not duplicate work
> -- specifically, it does not duplicate an IR expression tree (because
> IRAtoms can't be trees).  You could assert that if you like ("isIRAtom()")

Comment removed. 

> 
> remove unused code in directReload_S390 (or make it work) --
> probably is not important for s390, only has any noticable
> effect on x86, since that has very few integer registers.

Its mostly working, but somehow creates a failure in our compiler test suite
and origin-bz5. Will remove until debugged.

> 
> everywhere:
> 80 char max line length please.  In many places in
> the front end it is a lot more that 80 chars wide.

Ok, will fix.

> 
> +s390_irgen_FLOGR: "Note, that the Iop_Shl64 and
> +      Iop_Shr64 semantics will preserve the value-to-be-shifted if the
> +      shift-amount equals or is larger than the width of
> value-to-be-shifted"
> Wrong reason: The IR definition says that the behavior of these primops is
> undefined in out of range shift cases -- regardless of what your
> back end might do with them.

Will have a look.


> ISelEnv:
> +    - A Bool to tell us if the host is 32 or 64bit.
> +      This is set at the start and does not change.
> Necessary?  I thought you only supported 64-bit guest and host
> in this patch.

This is a leftover. Fixed.

> +#define VexGuestS390XStateAlignment 16
> This can disappear now, yes?  pre_run_checks checks this for
> all targets.

Yes. Done.

Comment 44 Florian Krohm 2011-02-09 21:00:27 UTC

(In reply to comment #43)
> 
> > (We discussed this already, so this is
> > <kind of irrelevant>
> >   More seriously, you need to redo the way the FPC is handled.
> >   ... need to consider more .. (do FP primops take rounding
> >   mode that is actually considered?  they should; do like PPC)
> >   run_innerloop_exit: you really need to check that the FPC
> >   on exit from generated code is as expected.
> >   so when fixed up, need to check it's the right value at
> >   exit of the dispatcher (if ppc does that)
> > </kind of irrelevant>
> see above, I want to start with the x86 way, but this was mostly
> done by Florian, so he has more insight.
> 

I would much prefer to implement to PPC scheme, even if it is a bit more work. 
That way we don't have to revisit it, or have to discuss whether to revisit.

> 
> > check -- are there vex own versions for:
> > +#define likely(x)       __builtin_expect(!!(x), 1)
> > +#define unlikely(x)     __builtin_expect(!!(x), 0)
> > 

Not that I could find them, but it would definitely be nice if VEX proper provided them.

> > host_s390_insn.c
> > +#include <stdarg.h>
> > urr, is this necessary?
> 
> For va_arg. We could use __buildin_va_arg to get rid of this include.
> This is used for the instruction printing below.
> 

We could use the __builtin stuff but probably would have to check for GCC versions that support it reliably... Perhaps using stdarg.h is simpler, as it does not introduce an unwanted dependency on libc. Unles of course you consider the inclusion of the file itself unwanted.

> > host_s390_disasm.[ch]
> > What's this for?  The host-side doesn't need any disassembly
> > facilities on any other platforms.
> 
> Its mostly pretty printing for the trace/profile flags.
> Maybe s390_disasm is a misnomer, what about print_instruction?
> 

Not really a misnomer (see also wikipedia :) 
In addition to disassembling during IR generation we also can disassemble the emitted code. That is not done on other platforms but I found it quite convenient for debugging purposes. I could have called the file guest_s390_disasm.[ch] as well. It's neither specific to guest nor host.


> > 
> > +s390_irgen_FLOGR: "Note, that the Iop_Shl64 and
> > +      Iop_Shr64 semantics will preserve the value-to-be-shifted if the
> > +      shift-amount equals or is larger than the width of
> > value-to-be-shifted"
> > Wrong reason: The IR definition says that the behavior of these primops is
> > undefined in out of range shift cases -- regardless of what your
> > back end might do with them.
> 

Thanks for the clarification. I did not spot it in libvex_ir.h so I deduced the semantics from ir_defs/ir_opt.c

Comment 45 Christian Borntraeger 2011-02-11 09:35:53 UTC

Created attachment 57150 [details]
FYI:  review feedback fir changes patch

(In reply to comment #34)

Here is a patch that should address most issues in the first patch. The only open thing is the rounding mode.

One thing though: There was a naming issue:
F128toF64 already exists (rounding down from 128 bit to 64bit) so I renamed F128to64 to F128LOtoF64 to better match the F128HItoF64. OK?

Comment 46 Florian Krohm 2011-02-15 05:23:03 UTC

(In reply to comment #42)
> 
> +s390_irgen_FLOGR: "Note, that the Iop_Shl64 and
> +      Iop_Shr64 semantics will preserve the value-to-be-shifted if the
> +      shift-amount equals or is larger than the width of
> value-to-be-shifted"
> Wrong reason: The IR definition says that the behavior of these primops is
> undefined in out of range shift cases -- regardless of what your
> back end might do with them.
> 

OK, here is an updated version of s390_irgrn_FLOGR that avoids the undefined behaviour.

HChar *
s390_irgen_FLOGR(UChar r1, UChar r2)
{
   IRTemp input    = newTemp(Ity_I64);
   IRTemp not_zero = newTemp(Ity_I64);
   IRTemp tmpnum   = newTemp(Ity_I64);
   IRTemp num      = newTemp(Ity_I64);
   IRTemp shift_amount = newTemp(Ity_I8);

   /* We use the "count leading zeroes" operator because the number of
      leading zeroes is identical with the bit position of the first '1' bit.
      However, that operator does not work when the input value is zero.
      Therefore, we set the LSB of the input value to 1 and use Clz64 on
      the modified value. If input == 0, then the result is 64. Otherwise,
      the result of Clz64 is what we want. */

   assign(input, get_gpr_dw0(r2));
   assign(not_zero, binop(Iop_Or64, mkexpr(input), mkU64(1)));
   assign(tmpnum, unop(Iop_Clz64, mkexpr(not_zero)));

   /* num = (input == 0) ? 64 : tmpnum */
   assign(num, mkite(binop(Iop_CmpEQ64, mkexpr(input), mkU64(0)),
                     /* == 0 */ mkU64(64),
                     /* != 0 */ mkexpr(tmpnum)));

   put_gpr_dw0(r1, mkexpr(num));

   /* Set the leftmost '1' bit of the input value to zero. The general scheme
      is to first shift the input value by NUM + 1 bits to the left which
      causes the leftmost '1' bit to disappear. Then we shift logically to
      the right by NUM + 1 bits. Because the semantics of Iop_Shl64 and
      Iop_Shr64 are undefined if the shift-amount is greater than or equal to
      the width of the value-to-be-shifted, we need to special case 
      NUM + 1 >= 64. This is equivalent to INPUT != 0 && INPUT != 1.
      For both such INPUT values the result will be 0. */

   assign(shift_amount, unop(Iop_64to8, binop(Iop_Add64, mkexpr(num),
					      mkU64(1))));

   put_gpr_dw0(r1 + 1,
	       mkite(binop(Iop_CmpLE64U, mkexpr(input), mkU64(1)),
		     /* == 0 || == 1*/ mkU64(0),
		     /* otherwise */
		     binop(Iop_Shr64,
			   binop(Iop_Shl64, mkexpr(input),
				 mkexpr(shift_amount)),
			   mkexpr(shift_amount))));

   /* Compare the original value as an unsigned integer with 0. */
   s390_cc_thunk_put2(S390_CC_OP_UNSIGNED_COMPARE, input,
                      mktemp(Ity_I64, mkU64(0)), False);

   return "flogr";
}

Comment 47 Julian Seward 2011-02-16 13:15:19 UTC

(In reply to comment #45)
> Created an attachment (id=57150) [details]
> FYI:  review feedback fir changes patch

Looks OK to me.

Comment 48 Julian Seward 2011-02-16 13:21:37 UTC

(In reply to comment #44)

> [.. rounding for FP arithmetic ..]
> I would much prefer to implement to PPC scheme, even if it is a bit more work. 
> That way we don't have to revisit it, or have to discuss whether to
> revisit.

As you like.  The important thing is to make sure you have a good test
program to verify it's actually working properly.


> > > check -- are there vex own versions for:
> > > +#define likely(x)       __builtin_expect(!!(x), 1)
> > > +#define unlikely(x)     __builtin_expect(!!(x), 0)
> > > 
> 
> Not that I could find them, but it would definitely be nice if VEX proper
> provided them.

Somewhere in the Valgrind tree are defns for LIKELY and UNLIKELY.
We should copy those into vex.


> Not really a misnomer (see also wikipedia :) 
> In addition to disassembling during IR generation we also can disassemble the
> emitted code. That is not done on other platforms but I found it quite
> convenient for debugging purposes. I could have called the file
> guest_s390_disasm.[ch] as well. It's neither specific to guest nor
> host.

IMO this is of marginal value.  For all the other targets, the front
ends independently print insns (via the DIP macro) and the back ends
also independently print insns pre-assembly, via ppX86Instr etc.  The
only time where I wanted to disassemble the generated code is when
debugging emitX86Instr etc, since those fns were often a source of
bugs, but even then those were quickly fixed and then worked reliably
after that.

So I'd prefer not to have this facility, unless you feel really
strongly about it.

Comment 49 Julian Seward 2011-02-16 13:25:45 UTC

(In reply to comment #46)
> OK, here is an updated version of s390_irgrn_FLOGR that avoids the undefined
> behaviour.

Yeah .. the other front ends have similarly horrible kludges to solve
the same problem (deal with out of range shifts without presenting out
of range shifts to the IR primops).  The only important thing here is
to ensure that, for the IR you're generating, if the shift amount is
an immediate (as is most often the case), then ir_opt can fold out all
the Clz and Mux0X (ite) gunk and leave a single shift primop in place.
(if you see what I mean).  So you don't lose performance in the common
case.

Comment 50 Florian Krohm 2011-02-16 16:42:46 UTC

(In reply to comment #49)
> 
> Yeah .. the other front ends have similarly horrible kludges to solve
> the same problem (deal with out of range shifts without presenting out
> of range shifts to the IR primops).  The only important thing here is
> to ensure that, for the IR you're generating, if the shift amount is
> an immediate (as is most often the case), then ir_opt can fold out all
> the Clz and Mux0X (ite) gunk and leave a single shift primop in place.
> (if you see what I mean).  So you don't lose performance in the common
> case.

I will add a testcase, to make sure we get the desired optimizations.

Comment 51 Florian Krohm 2011-02-16 17:05:20 UTC

(In reply to comment #48)
> > In addition to disassembling during IR generation we also can disassemble the
> > emitted code. That is not done on other platforms but I found it quite
> > convenient for debugging purposes. I could have called the file
> > guest_s390_disasm.[ch] as well. It's neither specific to guest nor
> > host.
> 
> IMO this is of marginal value.  For all the other targets, the front
> ends independently print insns (via the DIP macro) and the back ends
> also independently print insns pre-assembly, via ppX86Instr etc.  The
> only time where I wanted to disassemble the generated code is when
> debugging emitX86Instr etc, since those fns were often a source of
> bugs, but even then those were quickly fixed and then worked reliably
> after that.
> 
> So I'd prefer not to have this facility, unless you feel really
> strongly about it.

I found the feature really helpful and would prefer to keep it. It is not
contributing a lot of code or adding note-worthy runtime.

Comment 52 Florian Krohm 2011-02-20 02:39:58 UTC

(In reply to comment #50)
> (In reply to comment #49)
> > 
> > Yeah .. the other front ends have similarly horrible kludges to solve
> > the same problem (deal with out of range shifts without presenting out
> > of range shifts to the IR primops).  The only important thing here is
> > to ensure that, for the IR you're generating, if the shift amount is
> > an immediate (as is most often the case), then ir_opt can fold out all
> > the Clz and Mux0X (ite) gunk and leave a single shift primop in place.
> > (if you see what I mean).  So you don't lose performance in the common
> > case.
> 
> I will add a testcase, to make sure we get the desired optimizations.

I noticed that ir_opt does not fold Iop_Clz32 / 64. Here is a patch.
Would you mind applying it?

Index: ir_opt.c
===================================================================
--- ir_opt.c	(revision 2104)
+++ ir_opt.c	(working copy)
@@ -957,6 +957,30 @@
 }
 
 
+static UInt fold_Clz64(ULong value)
+{
+  UInt i;
+
+  for (i = 0; i < 64; ++i) {
+    if (value & ((ULong)1 << (63 - i))) return i;
+  }
+
+  return 0;
+}
+
+
+static UInt fold_Clz32(UInt value)
+{
+  UInt i;
+
+  for (i = 0; i < 32; ++i) {
+    if (value & ((UInt)1 << (31 - i))) return i;
+  }
+
+  return 0;
+}
+
+
 static IRExpr* fold_Expr ( IRExpr* e )
 {
    Int     shift;
@@ -1154,6 +1178,16 @@
             break;
          }
 
+         case Iop_Clz32:
+            e2 = IRExpr_Const(IRConst_U32(
+                    fold_Clz32(e->Iex.Unop.arg->Iex.Const.con->Ico.U32)));
+            break;
+
+         case Iop_Clz64:
+            e2 = IRExpr_Const(IRConst_U64(
+                    fold_Clz64(e->Iex.Unop.arg->Iex.Const.con->Ico.U64)));
+            break;
+
          default: 
             goto unhandled;
       }

Comment 53 Florian Krohm 2011-02-23 04:53:39 UTC

(In reply to comment #42)
> 
> Comments for c31-new-files.diff (attachment ID = 56228)
> 
> Again, not bad stuff.  I have three main concerns and a number of
> smaller comments.
> 

Here is a patch against r11566 / VEX r2104 which addresses your comments.
Specifically we fixed all white space and formatting issues, combined 
the files in VEX to match what is done for the other architectures and 
added more testcases. The Irrm_CURRENT rounding mode has been eliminated. 
Instead we're using Irrm_NEAREST (as is done on x86, amd, etc). Eventually,
we'd like to implement what is done in the ppc port, but not in this patch.
We've tested successfully with auxprogs/gsl16test.

Included are also some minor bug fixes in s390 specific code (mostly in
testcases) which were exposed by running on an older Z machine with older
GCCs.

The patch compiles and links OK. Running the testsuite requires bugs
264800 and 265762 to be fixed. Both are trivial and patches are attached
to them. I tried to create a dependency in bugzilla, but could not figure
it out. Probably I don't have the necessary privs to do that.

When you apply the patch, note that none/tests/s390x/filter_stderr
needs execute permissions.

If there is anything else that you want addressed let us know.

Comment 54 Florian Krohm 2011-02-23 04:55:36 UTC

Created attachment 57456 [details]
Combined patch addressing review comments

Comment 55 Christian Borntraeger 2011-03-04 15:10:32 UTC

Julian, 
for reference, here is the make regtest result on SLES11 (with a small callgrind fix)
== 450 tests, 7 stderr failures, 0 stdout failures, 0 post failures ==
memcheck/tests/badfree3                  (stderr)
none/tests/faultstatus                   (stderr)  
helgrind/tests/tc06_two_races_xml        (stderr)  
helgrind/tests/tc23_bogus_condwait       (stderr)
drd/tests/tc04_free_lock                 (stderr)
drd/tests/tc09_bad_unlock                (stderr)
drd/tests/tc23_bogus_condwait            (stderr)

Please consider to apply the patch.

Comment 56 Julian Seward 2011-03-07 19:37:27 UTC

(In reply to comment #52)
> I noticed that ir_opt does not fold Iop_Clz32 / 64. Here is a patch.

Committed, with a check against zero -- since Clz32/64(0) has no
defined semantics, we can't fold in that case.  vex r2106.

Comment 57 Julian Seward 2011-03-07 19:44:01 UTC

(In reply to comment #54)
> Combined patch addressing review comments

Committed: vex r2105, valgrind r11604, 11605, 11606.  Thank you for
the high quality patches, and sorry for the slow handling.

I see there are followup bugs 264800 and 265762.  Any others?

Comment 58 Florian Krohm 2011-03-07 21:05:49 UTC

(In reply to comment #57)
> (In reply to comment #54)
> > Combined patch addressing review comments
> 
> Committed: vex r2105, valgrind r11604, 11605, 11606.  Thank you for
> the high quality patches, and sorry for the slow handling.
> 
> I see there are followup bugs 264800 and 265762.  Any others?

If you could apply 247223 that would be sweet. We've got about 60 warnings in our build about regparm. 
I'm not aware of immediate followup patches. I've some things in the queue but they won't materialize until end of the week or so.

Comment 59 Christian Borntraeger 2011-03-07 21:41:40 UTC

bug #253206 would also be good.

Comment 60 Julian Seward 2011-04-19 17:30:48 UTC

I think this "bug" is comprehensively fixed now.