Summary: | baloo_file_extractor crashes in KFileMetaData::XmlExtractor::extract() while indexing files | ||
---|---|---|---|
Product: | [Frameworks and Libraries] frameworks-kfilemetadata | Reporter: | Yaroslav Sidlovsky <zawertun> |
Component: | general | Assignee: | Pinak Ahuja <pinak.ahuja> |
Status: | RESOLVED UPSTREAM | ||
Severity: | crash | CC: | nate, stefan.bruens |
Priority: | NOR | Keywords: | drkonqi |
Version First Reported In: | 5.54.0 | ||
Target Milestone: | --- | ||
Platform: | Fedora RPMs | ||
OS: | Linux | ||
Latest Commit: | | Version Fixed In: | |
Sentry Crash Report: | |||
Attachments: | internal-entity-polynomial-attribute.xml |
Description
Yaroslav Sidlovsky
2019-02-10 11:10:58 UTC
Created attachment 117962
internal-entity-polynomial-attribute.xml
Looks like the extractor was crashing while indexing internal-entity-polynomial-attribute.xml from the qt-5.11.0 sources. See attachment. The crash happens in KFileMetaData::XmlExtractor::extract.

Is the file you attached the one that makes Baloo crash?

Yes, I think so. This file causes huge memory usage when parsed with an XML parser, so it's OK that baloo_file_extractor crashes (Firefox can't parse this file either). What's bad is that this file is not skipped after I restart indexing.

That's a nasty file:
- size of e1: 120 characters
- e2: 64 * e1
- e3: 64 * e2
- e4: 64 * e3
- root id: 64 * e4 = 2^24 * 120 characters ~= 2 * 10^9 characters (4 GByte for UTF-16/QString)

For reallocation we need about 8 GByte (old data storage plus new data storage), plus anything else allocated.

Although the failed document is remembered, it is currently not taken into account when indexing. The reason for this is the lack of extractor versioning: if failed files were simply skipped, a file that failed once due to, e.g., a coding error in an extractor would never be tried again, even after the error had been fixed. See https://phabricator.kde.org/T9867, 3rd bullet point.

Git commit de81ddb651b14ca567e30c5bca4f7618894819a5 by Stefan Brüns. Committed on 23/02/2019 at 20:35. Pushed by bruns into branch 'master'.

[Extractor] Add metadata to extractors

Summary:
This adds extractor metadata in a backwards and forward compatible way. There are several use cases for this metadata:
- Delayed loading of extractor plugins - currently, all extractors are loaded and initialized when an ExtractorCollection is created.
- Versioning information - e.g. Baloo would benefit from versioning information, to reindex affected files after an extractor has been updated.

Although it would be possible to extend the extractor plugin interface with a method for each relevant property, it would require a bump of the plugin interface version each time the interface is extended.
See: T9867, T8079

Test Plan: ctest

Reviewers: #baloo, #frameworks, ngraham, astippich, poboiko

Reviewed By: astippich

Subscribers: kde-frameworks-devel

Tags: #frameworks, #baloo

Differential Revision: https://phabricator.kde.org/D19109

M +1 -0 autotests/CMakeLists.txt
M +57 -2 autotests/extractorcollectiontest.cpp
M +10 -0 src/extractor.cpp
M +4 -0 src/extractor.h
M +1 -0 src/extractor_p.h
M +10 -1 src/extractorcollection.cpp
M +3 -1 src/extractorcollection.h
M +2 -0 src/extractors/CMakeLists.txt
M +2 -1 src/extractors/appimageextractor.h
A +9 -0 src/extractors/appimageextractor.json
M +2 -1 src/extractors/epubextractor.h
A +8 -0 src/extractors/epubextractor.json
M +2 -1 src/extractors/exiv2extractor.h
A +29 -0 src/extractors/exiv2extractor.json.in
M +2 -1 src/extractors/ffmpegextractor.h
A +16 -0 src/extractors/ffmpegextractor.json
M +2 -1 src/extractors/mobiextractor.h
A +8 -0 src/extractors/mobiextractor.json
M +2 -1 src/extractors/odfextractor.h
A +10 -0 src/extractors/odfextractor.json
M +2 -1 src/extractors/office2007extractor.h
A +10 -0 src/extractors/office2007extractor.json
M +2 -1 src/extractors/officeextractor.h
A +19 -0 src/extractors/officeextractor.json
M +2 -1 src/extractors/plaintextextractor.h
A +8 -0 src/extractors/plaintextextractor.json
M +2 -1 src/extractors/poextractor.h
A +8 -0 src/extractors/poextractor.json
M +2 -1 src/extractors/popplerextractor.h
A +8 -0 src/extractors/popplerextractor.json
M +2 -1 src/extractors/postscriptdscextractor.h
A +9 -0 src/extractors/postscriptdscextractor.json
M +2 -1 src/extractors/taglibextractor.h
A +25 -0 src/extractors/taglibextractor.json
M +2 -1 src/extractors/xmlextractor.h
A +10 -0 src/extractors/xmlextractor.json

https://commits.kde.org/kfilemetadata/de81ddb651b14ca567e30c5bca4f7618894819a5
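The size estimate for the attached file in the discussion above can be reproduced with a few lines of arithmetic. This is a minimal Python sketch, assuming the structure described there (e1 is 120 characters, and each of e2, e3, e4 and the root id repeats the previous level 64 times, with every character taking one UTF-16 code unit). The helper name `expanded_length` is hypothetical and not part of KFileMetaData or Baloo.

```python
# Hypothetical helper reproducing the entity-expansion estimate, assuming
# e1 = 120 characters and four further levels (e2, e3, e4, root id), each
# repeating the previous level 64 times.

def expanded_length(base_chars: int, fan_out: int, levels: int) -> int:
    """Characters after fully expanding `levels` nested `fan_out`-fold references."""
    length = base_chars
    for _ in range(levels):
        length *= fan_out
    return length


BASE = 120     # size of e1, in characters
FAN_OUT = 64   # each entity references the previous one 64 times
LEVELS = 4     # e2, e3, e4 and the root id

chars = expanded_length(BASE, FAN_OUT, LEVELS)
utf16_bytes = chars * 2          # QString stores UTF-16: 2 bytes per code unit
realloc_bytes = utf16_bytes * 2  # old and new buffer both alive while growing

print(f"characters: {chars:,}")                               # 2,013,265,920
print(f"UTF-16 size: {utf16_bytes / 1e9:.1f} GB")             # 4.0 GB
print(f"peak during reallocation: {realloc_bytes / 1e9:.1f} GB")  # 8.1 GB
```

The growth is geometric in the number of nesting levels, so a comparatively small input file can demand gigabytes of memory once a parser expands the entities eagerly.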
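The versioning argument from the discussion above (skip a file that crashed the extractor, but retry it once the extractor changes) can be modelled in a few lines. This is a minimal Python sketch, not Baloo's actual implementation: it assumes the indexer records, for each failed file, the id and version of the extractor that failed on it, and that each extractor's metadata exposes those values without loading the plugin. The class `FailureLog`, its methods, and the metadata keys `Id` and `Version` are all hypothetical; the real KFileMetaData JSON schema may differ.

```python
import json
from dataclasses import dataclass, field

# Hypothetical extractor metadata, as it might be read from a JSON file shipped
# next to the plugin. The keys "Id" and "Version" are assumptions.
METADATA_V1 = json.loads('{"Id": "xmlextractor", "Version": "1"}')
METADATA_V2 = json.loads('{"Id": "xmlextractor", "Version": "2"}')


@dataclass
class FailureLog:
    """Hypothetical record of files an extractor has failed on."""

    # path -> (extractor id, extractor version) at the time of the failure
    failed: dict = field(default_factory=dict)

    def record_failure(self, path: str, metadata: dict) -> None:
        self.failed[path] = (metadata["Id"], metadata["Version"])

    def should_extract(self, path: str, metadata: dict) -> bool:
        previous = self.failed.get(path)
        if previous is None:
            return True  # never failed: index normally
        # Failed before: retry only if the extractor has changed since then,
        # e.g. because a coding error in it has been fixed.
        return previous != (metadata["Id"], metadata["Version"])


log = FailureLog()
path = "internal-entity-polynomial-attribute.xml"
log.record_failure(path, METADATA_V1)
print(log.should_extract(path, METADATA_V1))  # False: this version already failed
print(log.should_extract(path, METADATA_V2))  # True: the extractor was updated
```

Keying the failure on the extractor version satisfies both requirements raised in the thread: the crashing file is skipped when indexing restarts, yet it gets another chance once an updated extractor is installed.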