It seems that the DWARF line info reader in readdwarf.c, mainly function read_dwarf2_lineblock() misinterprets "is_stmt" register. DWARF v4 says about "is_stmt": A boolean indicating that the current instruction is a recommended breakpoint location. A recommended breakpoint location is intended to “represent” a line, a statement and/or a semantically distinct subpart of a statement. However read_dwarf2_lineblock() interprets "is_stmt == 0" as: ignore opcodes in the current block until end of sequence or until "is_stmt = 1" again. This however has negative consequences: only instructions within the range covered by "is_stmt = 1" are considered a part of the line. Instructions belonging to the same line but covered by "is_stmt = 0" are not considered a part of the line. There are some discussions about use cases for "is_stmt" in conjunction with line numbers and I think for debuggers it boils down to the following four: 1 - To set a breakpoint at the beginning of a source line. 2 - When stepping at the source level to identify when a new source line has been encountered. 3 - When displaying interspersed disassembled machine code with source code the line number tables are used to identify where to insert source code into the disassembly listing. 4 - When a hardware exception occurs, or when displaying a stack trace then the tables are used to identify the particular source line associated with an instruction address. Valgrind itself is not a debugger and only 4) is therefore relevant. In that case, line numbers should correctly reflect all instructions belonging to a source line, regardless of is_stmt value. Therefore I am proposing to eliminate tracking of is_stmt in readdwarf.c. I tested the patch on Linux Ubuntu 15.04 amd64+x86 and Solaris 12 amd64+x86. Regression testing is ok.
Created attachment 95792 [details] proposed patch
On debian8/x86, with the patch, the debuginfo memory for a big executable increases by slightly less than 10%. Do you have specific examples of functional improvements with the patch ? (i.e. better stacktraces ?) Philippe
Am 28.11.2015 um 22:03 schrieb Ivo Raisr via KDE Bugzilla: > DWARF v4 says about "is_stmt": > A boolean indicating that the current instruction is a recommended breakpoint > location. A recommended breakpoint location is intended to “represent” a line, > a statement and/or a semantically distinct subpart of a statement. > > However read_dwarf2_lineblock() interprets "is_stmt == 0" as: ignore opcodes in > the current block until end of sequence or until "is_stmt = 1" again. This > however has negative consequences: only instructions within the range covered > by "is_stmt = 1" are considered a part of the line. Instructions belonging to > the same line but covered by "is_stmt = 0" are not considered a part of the > line. I never noted this problem with callgrind/cachegrind, but of course it would be bad if some guest instructions are not attributed to the correct source line in profiling output. Where did you notice this? Only with some compiler? Or is this about future behavior of GCC/clang/whatever ? Using "--dump-instr=yes", Callgrind adds detailed instruction/source code relation to the output, and one should be able to see issues when looking at the machine code annotation view in {q,k]cachegrind). Josef > > There are some discussions about use cases for "is_stmt" in conjunction with > line numbers and I think for debuggers it boils down to the following four: > 1 - To set a breakpoint at the beginning of a source line. > 2 - When stepping at the source level to identify when a new source line has > been encountered. > 3 - When displaying interspersed disassembled machine code with source code the > line number tables are used to identify where to insert source code into the > disassembly listing. > 4 - When a hardware exception occurs, or when displaying a stack trace then the > tables are used to identify the particular source line associated with an > instruction address. > > Valgrind itself is not a debugger and only 4) is therefore relevant. In that > case, line numbers should correctly reflect all instructions belonging to a > source line, regardless of is_stmt value. > > Therefore I am proposing to eliminate tracking of is_stmt in readdwarf.c. > I tested the patch on Linux Ubuntu 15.04 amd64+x86 and Solaris 12 amd64+x86. > Regression testing is ok. >
So I spotted the problem by a chance, because another redirected function got used on Solaris/x86+amd64 (using gcc 4.8.2). However this is a generic problem with location info on Linux as well. Consider the following decoded "Line Number Statements" as displayed by "readelf --debug-dump=rawline": (previous instructions were related to line 868) ... Set is_stmt to 1 Special opcode 174: advance Address by 12 to 0x1c72 and Line by 1 to 869 Advance PC by constant 17 to 0x1c83 Special opcode 5: advance Address by 0 to 0x1c83 and Line by 0 to 869 Extended opcode 4: set Discriminator to 1 Set is_stmt to 0 Special opcode 131: advance Address by 9 to 0x1c8c and Line by 0 to 869 Extended opcode 4: set Discriminator to 2 Special opcode 75: advance Address by 5 to 0x1c91 and Line by 0 to 869 Extended opcode 4: set Discriminator to 1 Special opcode 75: advance Address by 5 to 0x1c96 and Line by 0 to 869 Extended opcode 4: set Discriminator to 2 Copy Extended opcode 4: set Discriminator to 1 Special opcode 131: advance Address by 9 to 0x1c9f and Line by 0 to 869 Extended opcode 4: set Discriminator to 2 Advance PC by constant 17 to 0x1cb0 Special opcode 89: advance Address by 6 to 0x1cb6 and Line by 0 to 869 Extended opcode 4: set Discriminator to 1 Special opcode 159: advance Address by 11 to 0x1cc1 and Line by 0 to 869 Extended opcode 4: set Discriminator to 2 Special opcode 131: advance Address by 9 to 0x1cca and Line by 0 to 869 Special opcode 131: advance Address by 9 to 0x1cd3 and Line by 0 to 869 Extended opcode 4: set Discriminator to 3 Special opcode 117: advance Address by 8 to 0x1cdb and Line by 0 to 869 Extended opcode 4: set Discriminator to 1 Advance PC by 88 to 0x1d33 Special opcode 5: advance Address by 0 to 0x1d33 and Line by 0 to 869 Extended opcode 4: set Discriminator to 2 Advance PC by constant 17 to 0x1d44 Special opcode 47: advance Address by 3 to 0x1d47 and Line by 0 to 869 Advance PC by 12 to 0x1d53 Extended opcode 1: End of Sequence Line 869 corresponds to function _vgr10110ZU_VgSoSynsomalloc_memalign() - which happens to be the last one in this compilation unit. Its size is 0xe1 (225) bytes as confirmed by nm, readelf and also disassembly. However current Valgrind functionality adds DiLoc entries only for addresses where "is_stmt = 1", that is address range 0x1c72-0x1c83 (17 bytes). This is clearly wrong; address ranges for "is_stmt = 0" should be covered as well.
As for the memory increase, current functionality does not merge location entries for adjacent addresses. I will enhance the patch to merge entries for adjacent addresses and the same line - this should give us significant gains.
Created attachment 95828 [details] proposed patch Adjacent DiLoc entries are now merged if they refer to the same line. This should give an improvement in terms of memory used.
(In reply to Ivo Raisr from comment #6) > Created attachment 95828 [details] > proposed patch > > Adjacent DiLoc entries are now merged if they refer to the same line. This > should give an improvement in terms of memory used. Yes, results are good. With the patch, the memory used is now similar to the trunk. Just one question: the merging is done when adding a new entry in the loctab (i.e. in addLoc function). That is good to avoid uselessly growing the loctab during insertion. I am wondering however if this merges all what can be merged. Maybe it would be useful to also merge adjacent entries in canonicaliseLoctab (after having sorted on addr) ? This is of course only useful if there are non successive addLoc calls for mergeable entries. Otherwise, small style remark for the storage.c patch: I think (most of) the code splits the too long lines before the &&, not after. so this + if ((previous->lineno == loc->lineno) && + (previous->addr + previous->size == loc->addr)) { should be + if ((previous->lineno == loc->lineno) + && (previous->addr + previous->size == loc->addr)) {
Created attachment 95895 [details] patch with formatting fixed Patch with fixed formatting.
Yes, '&&' seems misplaced. I attached another patch which fixes this style issues. As regards possible additional merges in canonicaliseLoctab() - preliminary investigation shows that some gains are possible. However I would like to tackle this separately, to avoid combining too many things in one commit.
Fixed in SVN commit r15741.
See also: Bug 356273 - conserve memory by merging adjacent DiLoc entries in the debug info location table