I downloaded this UHD demo file to benchmark H265 decoding on my machine:
http://demo-uhd3d.com/files/uhd4k/Demo_Samsung_2014_-_Iceland.zip (800MB .ts file)

As soon as it was extracted, baloo started indexing it, taking 100% CPU with RAM usage gradually growing towards 1.7-1.8GB before I killed it. I only have 4GB in this machine, so I did not want it to get swap locked.

file shows it as "MPEG Transport stream", while dolphin says it's a "Message catalog", which sounds off.

Reproducible: Always

Steps to Reproduce:
1. Download the file and extract it somewhere baloo is indexing
2. Check top and look for baloo_file_extractor using resources

Actual Results:
baloo_file_extractor starts eating 1.5+GB of RAM and 100% CPU

Expected Results:
The file extractor should quit within one or two seconds, since it's a binary file

Confirmed. This is even a problem with Qt5:

Fast Mimetype: text/vnd.trolltech.linguist
Slow Mimetype: text/vnd.trolltech.linguist

The fix will probably need to go into Qt.

Git commit c19b7a9ded994009c49007d8336afe92acf513cd by Vishesh Handa.
Committed on 13/05/2015 at 14:07.
Pushed by vhanda into branch 'Plasma/5.3'.

Only use the file's content during mimetype detection

During the first indexing phase, we only use the filename as we do not want the overhead of reading the contents of the file. During the second indexing phase, we are actually going to be indexing the contents of the file. At this time, it's perfectly fine to read the file's contents to determine the mimetype.

We were using QMimeDatabase::mimeTypeForFile with its default settings which takes both the filename and file contents into consideration. This results in interesting cases where if a file ends with '.ts' it is detected as a 'linguist' file, even though the magic byte mapping failed.

We want the mimetype to be as exact as possible. We now only use the files contents, and not the filename.

Related: bug 342312
FIXED-IN: 5.3.1

M +1 -1 src/file/extractor/app.cpp
M +2 -2 src/file/tests/indexerconfigtest.cpp

http://commits.kde.org/baloo/c19b7a9ded994009c49007d8336afe92acf513cd
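
To illustrate the difference the commit describes, here is a minimal Python sketch contrasting a filename-only guess with a content-only guess for a '.ts' file. It is not Baloo's code; the actual change lives in src/file/extractor/app.cpp and goes through QMimeDatabase. The function names, the single-entry extension mapping and the choice of checking four packets are all assumptions made for illustration. The content check relies only on the fact that MPEG transport stream packets are 188 bytes long and each begins with the sync byte 0x47.

```python
TS_PACKET_SIZE = 188  # MPEG transport stream packets are 188 bytes long
TS_SYNC_BYTE = 0x47   # every packet starts with this sync byte

FALLBACK = "application/octet-stream"


def guess_by_name(path):
    """Filename-only guess (hypothetical one-entry glob table)."""
    # '.ts' is also the extension of Qt Linguist translation sources,
    # which is how a video file can end up labelled as a message catalog.
    return "text/vnd.trolltech.linguist" if path.endswith(".ts") else FALLBACK


def guess_by_content(path, packets_to_check=4):
    """Content-only guess: ignore the name and look at the bytes."""
    needed = TS_PACKET_SIZE * packets_to_check
    with open(path, "rb") as f:
        data = f.read(needed)
    if len(data) < needed:
        return FALLBACK
    # A transport stream has the sync byte at the start of every packet.
    if all(data[i] == TS_SYNC_BYTE for i in range(0, needed, TS_PACKET_SIZE)):
        return "video/mp2t"
    return FALLBACK


def make_fake_ts(path, packets=4):
    """Write a tiny synthetic transport stream (sync byte plus zero padding)."""
    with open(path, "wb") as f:
        for _ in range(packets):
            f.write(bytes([TS_SYNC_BYTE]) + bytes(TS_PACKET_SIZE - 1))
```

With a synthetic file written by `make_fake_ts("clip.ts")`, the name-based guess reports the linguist type while the content-based guess reports a transport stream, which mirrors the mismatch between dolphin and `file` in the report. A real detector would consult the full shared MIME database rather than a single hard-coded rule.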