Summary: | kdevelop crashes when parsing big qrc files | ||
---|---|---|---|
Product: | [Applications] kdevelop | Reporter: | tisi sit <vladaspams> |
Component: | Language Support: CPP (old) | Assignee: | kdevelop-bugs-null |
Status: | RESOLVED FIXED | ||
Severity: | crash | CC: | aleixpol, cwbryanii, david.nolden.kde |
Priority: | NOR | ||
Version: | 4.2.60 | ||
Target Milestone: | 4.2.3 | ||
Platform: | Fedora RPMs | ||
OS: | Linux | ||
Attachments: | massif file of duchainify run on the supplied test file |
Description
tisi sit
2012-01-11 08:51:49 UTC
*** Bug 294741 has been marked as a duplicate of this bug. ***

what project is this, can we investigate it on our own? it's clearly an out-of-memory issue - how much do you have available?

(In reply to comment #2)
> what project is this, can we investigate it on our own?

Unfortunately, it is a commercial product, and I cannot provide the source files. The size is about 250 kLOC, and it uses boost, glut, opengl, qt, sdl-audio, and several other 3rd party libraries. I have about 60-100 include paths. Could that cause an issue?

> it's clearly an
> out-of-memory issue - how much do you have available?

I have 8 GB of RAM, plus some virtual memory. Is there a way to provide kdevelop's configuration? Maybe I set something wrong (I am using 4 threads in the parser).

I think I have identified the source of the problem. We are using Qt in our project, and we create a shared library containing all resources for all Qt applications (we have only images in the resource files, but lots of them; they are mostly button icons, but there are some big images). Anyway, kdevelop crashes nearly every time it needs to parse the cpp file created from the qrc file (Qt resource file).

heh that might be the problem indeed. is there maybe a way you could provide a qrc-generated file with some free images that exhibits this behavior? then I could try to look at the problem and maybe find a way to improve the memory consumption. bye

(In reply to comment #5)
> heh that might be the problem indeed. is there maybe a way you could provide a
> qrc-generated file with some free images that exhibits this behavior? then I
> could try to look at the problem and maybe find a way to improve the memory
> consumption.

I tried attaching the file, but I couldn't because it is 50 MB large (even compressed it is big), and bugzilla only allows 1 MB files. Do you know of an alternative?

you can try one of the gazillions of free upload sites, like zippyshare, dropbox, ...
Here is the problematic file: http://www10.zippyshare.com/v/48949300/file.html

Git commit 79edb4115f9950a98dd395a05a45f391e19bd0e3 by Milian Wolff.
Committed on 01/03/2012 at 01:16.
Pushed by mwolff into branch 'master'.

optimize: reduce memory consumption of Token class by 50% on 64Bit

By removing the ParseSession pointer from it, we get rid of 8 bytes and furthermore reduce the alignment size to 4. This way, we now only require 12 bytes per token compared to 24 bytes previously.

This also allows us to define the Token class as a primitive type, potentially speeding up the TokenStream even further.

The "cost" is a changed API, to get the string representation of a token, one must now ask the TokenStream. In practice this is very rarely a real pita, as before one often did

    stream->token(i)->symbol()

now you just do

    stream->symbol(i)

Furthermore I've consolidated the tons of custom "AST* node to QString" functions into one central ParseSession::stringForNode.

Finally, I've replaced some costly

    token.symbol() == IndexedChar("somechar")

with the much faster

    token.kind == Token_xyz

comparisons.

All in all, this should a) make our code faster and b) let it use much less memory while at it.
For the big resource file in the bug below, the difference of 50% in the Token class results in ~250MB less memory consumption

M +4 -4 languages/cpp/cppduchain/cppeditorintegrator.cpp
M +2 -2 languages/cpp/cppduchain/declarationbuilder.cpp
M +1 -7 languages/cpp/cppduchain/dumpchain.cpp
M +7 -11 languages/cpp/cppduchain/expressionvisitor.cpp
M +6 -23 languages/cpp/cppduchain/name_visitor.cpp
M +3 -2 languages/cpp/parser/codegenerator.cpp
M +3 -3 languages/cpp/parser/dumptree.cpp
M +35 -38 languages/cpp/parser/lexer.cpp
M +53 -21 languages/cpp/parser/lexer.h
M +7 -22 languages/cpp/parser/name_compiler.cpp
M +6 -11 languages/cpp/parser/parser.cpp
M +15 -0 languages/cpp/parser/parsesession.cpp
M +10 -0 languages/cpp/parser/parsesession.h
M +15 -5 languages/cpp/parser/tests/test_generator.cpp
M +2 -11 languages/cpp/parser/tests/test_parser.cpp
M +0 -4 languages/cpp/parser/tests/test_parser.h
M +2 -2 languages/cpp/parser/tests/test_parser_cpp2011.cpp
M +3 -2 languages/cpp/tests/cpp-parser.cpp

http://commits.kde.org/kdevelop/79edb4115f9950a98dd395a05a45f391e19bd0e3

Created attachment 69199
massif file of duchainify run on the supplied test file

parsing your files shows that the clear culprit is the AST design... your file has roughly 10 million initializer clauses, see e.g. http://www.nongnu.org/hcb/#dcl.init

Our InitializerClauseAST has a sizeof of 32 on 64 bit machines (8-byte aligned, two pointers, four integers) and every number is in turn a PrimaryExpressionAST with a sizeof of 64. So we end up at ~10 million * 96 bytes = 960 MB. Massif (attached) says it's even more at ~1.1GB...

so yeah we need to figure out how to optimize the memory consumption of our ast :-/

btw we could also think about never parsing files larger than X bytes, something like 10MB should do already.

Furthermore an additional safeguard in our parser to "stop" after X tokens might be good, what do you think?
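The size figures quoted so far in this thread can be reproduced with a small sketch. The struct and field names below are hypothetical stand-ins, not KDevelop's actual declarations; only the shapes ("a pointer plus three 4-byte fields", "two pointers, four integers") and the numbers (24 vs. 12 bytes, 32 bytes, ~10 million clauses at 32 + 64 bytes) are taken from the comments above. The pointer-dependent sizes assume a typical 64-bit target.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical token that keeps a back-pointer to its owning session.
struct FatToken {
    void* session;            // 8 bytes on a 64-bit target, forces 8-byte alignment
    std::uint32_t kind;
    std::uint32_t position;
    std::uint32_t size;       // 20 bytes of fields, padded up to 24
};

// The same token without the pointer: three 4-byte fields, 4-byte alignment.
struct SlimToken {
    std::uint32_t kind;
    std::uint32_t position;
    std::uint32_t size;
};

// Shape described for InitializerClauseAST: two pointers and four integers.
struct InitializerClauseLike {
    void* first;
    void* second;
    std::int32_t a, b, c, d;
};

// Bytes needed for `count` nodes of `bytesPerNode` bytes each.
constexpr std::uint64_t estimateBytes(std::uint64_t count, std::uint64_t bytesPerNode) {
    return count * bytesPerNode;
}

// Figures quoted in the thread: ~10 million clauses, 32 + 64 bytes per clause.
constexpr std::uint64_t kClauses = 10000000;
constexpr std::uint64_t kBytesPerClause = 32 + 64;

static_assert(sizeof(SlimToken) == 12, "three 4-byte fields, no padding");
static_assert(estimateBytes(kClauses, kBytesPerClause) == 960000000ULL, "~960 MB");

// Runtime check of the pointer-dependent sizes; does nothing on non-64-bit targets.
inline void checkSixtyFourBitLayout() {
    if (sizeof(void*) == 8) {
        assert(sizeof(FatToken) == 24);
        assert(sizeof(InitializerClauseLike) == 32);
    }
}
```

Calling `checkSixtyFourBitLayout()` from any test or `main` confirms the padding behaviour on the machine at hand. The estimate deliberately ignores allocator overhead, which may be part of why Massif reports more than 960 MB.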
(In reply to comment #11)
> btw we could also think about never parsing files larger than X bytes,
> something like 10MB should do already.

1 MB files are already way too big :)

> Furthermore an additional safeguard in our parser to "stop" after X tokens
> might be good, what do you think?

If you ask me - sounds good.

the largest file I can find is /usr/include/ppl.h at 2.6MB so a limit of ~5MB could maybe work

I'll see - David, do you have any objections to this?

Git commit c0225a06eae31b6ccd46d8b4924faf02caf29dcf by Milian Wolff.
Committed on 01/03/2012 at 15:55.
Pushed by mwolff into branch '1.3'.

skip files that are larger than 5MiB during parsing

our memory consumption is quite considerable for large files, in the bug report below e.g. it is roughly 30x that of the parsed file's size. to prevent asserts / OOM exceptions we just skip files that are that large. often enough, these files are unimportant generated files anyways and don't need to be parsed to get useful developer features. Examples are e.g. Qt resource files.

note: not translated to get it in for 1.3. Since this error is not very user visible / important, I think it's ok this way. I'll properly mark it as translatable for 1.4.

M +14 -1 language/backgroundparser/parsejob.cpp

http://commits.kde.org/kdevplatform/c0225a06eae31b6ccd46d8b4924faf02caf29dcf

Git commit 7310ea61043390d5e31f1d07d5bb35f0b06ad4e7 by Milian Wolff.
Committed on 24/11/2012 at 21:16.
Pushed by mwolff into branch 'master'.

Use union in PrimaryExpressionAST to reduce memory footprint.

On 64Bit machines, we had sizeof(PrimaryExpressionAST) == 64. By using a union and enum this can be decreased to 40. This can result in dramatically less memory consumption, esp. for large qrc files e.g.

In the case of the file attached to bug 291248 the memory consumption dropped by 200MB. While the code handling is now a bit changed, I still think this change is worth it.
While at it, I've also refactored ExpressionVisitor::visitPrimaryExpression. This gives us cleaner code and should also be faster since we can use the token type instead of doing string comparisons to find numbers.

M +63 -78 languages/cpp/cppduchain/expressionvisitor.cpp
M +2 -0 languages/cpp/cppduchain/expressionvisitor.h
M +3 -1 languages/cpp/cppduchain/usedecoratorvisitor.cpp
M +15 -5 languages/cpp/parser/ast.h
M +12 -9 languages/cpp/parser/codegenerator.cpp
M +17 -4 languages/cpp/parser/default_visitor.cpp
M +5 -0 languages/cpp/parser/parser.cpp

http://commits.kde.org/kdevelop/7310ea61043390d5e31f1d07d5bb35f0b06ad4e7
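The union-and-enum technique named in the last commit can be illustrated with a minimal sketch. This is not the real PrimaryExpressionAST: the node types, field names, and the helper `literalValueOr` are hypothetical, and the exact sizes depend on the target. The point is only that mutually exclusive child pointers can share one slot when an enum records which one is active.

```cpp
#include <cassert>
#include <cstdint>

struct Literal { long value; };
struct Name { const char* text; };

// Before: one pointer per possible child, although only one is ever set.
struct WidePrimary {
    std::uint32_t startToken;
    std::uint32_t endToken;
    Literal* literal;
    Name* name;
    WidePrimary* subExpression;
};

// After: a tag says which member of the union is active.
struct TaggedPrimary {
    enum class Kind : std::uint32_t { Literal, Name, SubExpression };

    std::uint32_t startToken;
    std::uint32_t endToken;
    Kind kind;
    union {
        Literal* literal;
        Name* name;
        TaggedPrimary* subExpression;
    };
};

// The three pointers now share one slot, so the node shrinks.
static_assert(sizeof(TaggedPrimary) < sizeof(WidePrimary),
              "sharing one pointer slot should shrink the node");

// Dispatch on the tag (no string comparison) and follow nested
// sub-expressions down to a literal; names yield `fallback`.
inline long literalValueOr(const TaggedPrimary& node, long fallback) {
    switch (node.kind) {
    case TaggedPrimary::Kind::Literal:
        return node.literal->value;
    case TaggedPrimary::Kind::SubExpression:
        return literalValueOr(*node.subExpression, fallback);
    case TaggedPrimary::Kind::Name:
        return fallback;
    }
    return fallback;
}
```

The trade-off mentioned in the commit message shows up here as well: every reader of the node has to check `kind` before touching a union member, which is the "code handling is now a bit changed" cost paid for the smaller footprint.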