Bug 374940

Summary: i386: Stack access at -1(%%esp)
Product: [Developer tools] valgrind Reporter: Matthias Urlichs <smurf>
Component: memcheckAssignee: Julian Seward <jseward>
Status: RESOLVED NOT A BUG    
Severity: normal CC: mark, tom
Priority: NOR    
Version: 3.12 SVN   
Target Milestone: ---   
Platform: Debian testing   
OS: Linux   
Latest Commit: Version Fixed In:
Sentry Crash Report:

Description Matthias Urlichs 2017-01-12 02:32:46 UTC
Hi,

the code of libev uses "lock; orb %0,-1(%%esp)" as a memory guard. memcheck complains.

In the libev author's opinion, this is a bug in valgrind as it should recognize that this is a guard instruction which accesses legitimate (if uninitialized) memory, and whose result isn't actually used for anything.

Would it be possible to each memcheck not to do that?
Comment 1 Tom Hughes 2017-01-12 07:11:24 UTC
I think you need to explain exactly what the purpose of the instruction is - if the result isn't being relied on then presumably it is being executed for some sort of side effect?

In general accessing below the stack pointer on x86_32 is unsafe even when ignoring questions of definedness because a signal may cause corruption of anything stored there.
Comment 2 Matthias Urlichs 2017-01-12 08:19:45 UTC
It's used as a memory barrier. The Linux kernel uses addl %0,0(%%esp) for the same purpose. I have no idea why libev uses -1 but its author argues that his use of -1 is not wrong and therefore valgrind shouldn't complain about it.

Yes I know it's unsafe. Nothing is being stored there, the access is simply used for the barrier side effect.

The libev author claims that you should simply mark the contents of whatever register %0 ends up being as to be undefined data, instead of emitting an unconditional warning.
Comment 3 Julian Seward 2017-01-12 08:33:21 UTC
The libev code is incorrect and should be fixed.  It it violates the ABI.
The problem isn't that the memory is uninitialised.  It is that the
program isn't allowed to access below %esp at any time, for at least
the following reasons:

* a signal may get delivered at any time, in which case the signal
  handler's frame will overwrite the value at -1(%esp).

* since the kernel "knows" that programs may not access below %esp, it
  would be within its rights to unmap the page containing -1(%esp).
  If %esp pointed exactly to the bottom of a page then the access
  at -1(%esp) would cause an unexpected page (segmentation) fault.
Comment 4 Matthias Urlichs 2017-01-23 03:18:29 UTC
Well the libev author thinks that his code works and doesn't break anything …

* the value isn't used, thus doesn't matter, thus valgrind has no business complaining.

* the kernel needs to handle negative %esp offsets all the time because when you grow the stack the kernel "sees" the old contents of %esp (necessarily, otherwise it couldn't restart the faulting instruction). See the kernel fault handler for details.
Comment 5 Julian Seward 2017-01-23 05:05:30 UTC
(In reply to Matthias Urlichs from comment #4)
> Well the libev author thinks that his code works and doesn't break anything …

http://www.sco.com/developers/devspecs/abi386-4.pdf, page 3-29:

  [..] Data in the stack segment at addresses below the
  stack pointer contain undefined values. [..]

is the best I can offer you for 32-bit x86.
Comment 6 Julian Seward 2017-01-23 17:25:13 UTC
(In reply to Matthias Urlichs from comment #4)
> * the kernel needs to handle negative %esp offsets all the time because when
> you grow the stack the kernel "sees" the old contents of %esp (necessarily,
> otherwise it couldn't restart the faulting instruction).

Page fault based stack extension only applies to the main thread's stack.

Why does libev have to be non-conforming in this way?  There are plenty of
other packages that do high-performance thread stuff and I've never heard
of any of them doing anything like this.
Comment 7 Julian Seward 2017-01-23 17:36:10 UTC
FWIW, the ppc32-elf ABI says

  Data in the stack segment at addresses below the stack pointer
  contain undefined values.

The ppc64-elf ABI says

  As discussed later in this chapter, the lowest valid stack address is
  288 bytes less than the value in the stack pointer.
  (ppc64-ELF has a 288 byte red zone, so this says, in effect, any
  address below the redzone is invalid).

You might be able to argue that for x86-linux, with the kernel sources
as they now stand, that libev is "safe".  But for sure you can't say
it's portable.  Will the same guarantees hold on, eg, x86-freebsd?
Or for non-x86 architectures on Linux?
Comment 8 Matthias Urlichs 2017-01-23 21:40:20 UTC
(In reply to Julian Seward from comment #6)

> Why does libev have to be non-conforming in this way?  There are plenty of
> other packages that do high-performance thread stuff and I've never heard
> of any of them doing anything like this.

I have no idea, and the author does not want to use this bug to discuss the issue.

FWIW, the libev mailing list is at http://lists.schmorp.de/mailman/listinfo/libev
Comment 9 Mark Wielaard 2017-02-20 15:29:01 UTC
Just as a reference. gcc[7] briefly implemented __sync_synchronize() on i686 using lock orl $0, -4(%esp). It caused valgrind errors and it has been reverted.
https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=245577

https://bugzilla.redhat.com/show_bug.cgi?id=1423434
Comment 10 Mark Wielaard 2017-02-20 15:31:41 UTC
gcc patch discussion: https://gcc.gnu.org/ml/gcc-patches/2017-02/msg01130.html