Hi, the code of libev uses "lock; orb %0,-1(%%esp)" as a memory guard. memcheck complains. In the libev author's opinion, this is a bug in valgrind as it should recognize that this is a guard instruction which accesses legitimate (if uninitialized) memory, and whose result isn't actually used for anything. Would it be possible to each memcheck not to do that?
I think you need to explain exactly what the purpose of the instruction is - if the result isn't being relied on then presumably it is being executed for some sort of side effect? In general accessing below the stack pointer on x86_32 is unsafe even when ignoring questions of definedness because a signal may cause corruption of anything stored there.
It's used as a memory barrier. The Linux kernel uses addl %0,0(%%esp) for the same purpose. I have no idea why libev uses -1 but its author argues that his use of -1 is not wrong and therefore valgrind shouldn't complain about it. Yes I know it's unsafe. Nothing is being stored there, the access is simply used for the barrier side effect. The libev author claims that you should simply mark the contents of whatever register %0 ends up being as to be undefined data, instead of emitting an unconditional warning.
The libev code is incorrect and should be fixed. It it violates the ABI. The problem isn't that the memory is uninitialised. It is that the program isn't allowed to access below %esp at any time, for at least the following reasons: * a signal may get delivered at any time, in which case the signal handler's frame will overwrite the value at -1(%esp). * since the kernel "knows" that programs may not access below %esp, it would be within its rights to unmap the page containing -1(%esp). If %esp pointed exactly to the bottom of a page then the access at -1(%esp) would cause an unexpected page (segmentation) fault.
Well the libev author thinks that his code works and doesn't break anything … * the value isn't used, thus doesn't matter, thus valgrind has no business complaining. * the kernel needs to handle negative %esp offsets all the time because when you grow the stack the kernel "sees" the old contents of %esp (necessarily, otherwise it couldn't restart the faulting instruction). See the kernel fault handler for details.
(In reply to Matthias Urlichs from comment #4) > Well the libev author thinks that his code works and doesn't break anything … http://www.sco.com/developers/devspecs/abi386-4.pdf, page 3-29: [..] Data in the stack segment at addresses below the stack pointer contain undefined values. [..] is the best I can offer you for 32-bit x86.
(In reply to Matthias Urlichs from comment #4) > * the kernel needs to handle negative %esp offsets all the time because when > you grow the stack the kernel "sees" the old contents of %esp (necessarily, > otherwise it couldn't restart the faulting instruction). Page fault based stack extension only applies to the main thread's stack. Why does libev have to be non-conforming in this way? There are plenty of other packages that do high-performance thread stuff and I've never heard of any of them doing anything like this.
FWIW, the ppc32-elf ABI says Data in the stack segment at addresses below the stack pointer contain undefined values. The ppc64-elf ABI says As discussed later in this chapter, the lowest valid stack address is 288 bytes less than the value in the stack pointer. (ppc64-ELF has a 288 byte red zone, so this says, in effect, any address below the redzone is invalid). You might be able to argue that for x86-linux, with the kernel sources as they now stand, that libev is "safe". But for sure you can't say it's portable. Will the same guarantees hold on, eg, x86-freebsd? Or for non-x86 architectures on Linux?
(In reply to Julian Seward from comment #6) > Why does libev have to be non-conforming in this way? There are plenty of > other packages that do high-performance thread stuff and I've never heard > of any of them doing anything like this. I have no idea, and the author does not want to use this bug to discuss the issue. FWIW, the libev mailing list is at http://lists.schmorp.de/mailman/listinfo/libev
Just as a reference. gcc[7] briefly implemented __sync_synchronize() on i686 using lock orl $0, -4(%esp). It caused valgrind errors and it has been reverted. https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=245577 https://bugzilla.redhat.com/show_bug.cgi?id=1423434
gcc patch discussion: https://gcc.gnu.org/ml/gcc-patches/2017-02/msg01130.html