I discovered that I couldn't find some of my videos with baloo because they weren't indexed. I ran "baloo_file_extractor" on one of the files in konsole and it told me "<file> should not be indexed. Ignoring".

Looking at the source code, I found the reason:

commit 282c8dff201d19fd6dbaf42a07cb561b644c5b18
Author: Vishesh Handa <me@vhanda.in>
Date: Tue Jun 17 16:04:49 2014 +0200

    RegExpCache: Use 'QRegularExpression' instead of "QRegExp"

    This results in a performance increase of almost 10x. This is especially important because with this we will now consume less cpu when checking which files should be indexed, and we will be faster.

The problem with QRegularExpression is that it doesn't support wildcards (see http://qt-project.org/doc/qt-5/qregularexpression.html#wildcard-matching). So the exclude filters now match far too much.

Example: There is "*.o" in the exclude filters. This was fine with QRegExp in wildcard mode, where it meant "every file/folder ending with .o". As a regular expression, however, the dot is no longer a literal dot: it matches any character (except newline), so the filter effectively means "any characters, followed by o". Every file/folder ending with o is therefore ignored (that's the case for my videos).

So we could either revert to QRegExp or change the exclude filters to correct regular expressions. What's your opinion, Vishesh?

Reproducible: Always

Steps to Reproduce:
1. Create a file or a folder whose name ends with o
2. Try to index the file with baloo_file_extractor

Actual Results:
Because the last character is o, the name matches the exclude filter "*.o" and the file is ignored.

Expected Results:
It should get indexed.
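The over-matching can be reproduced outside baloo with a few lines of Python. This is only a sketch: it uses Python's re module rather than QRegularExpression, and the helper naive_wildcard_to_regex is hypothetical. It assumes the filter is turned into a regular expression simply by replacing "*" with ".*" and "?" with ".", leaving the dot untouched.

```python
import re


def naive_wildcard_to_regex(pattern):
    """Hypothetical conversion that does NOT escape dots (assumed buggy behaviour)."""
    # '?' -> any single character, '*' -> any run of characters.
    return pattern.replace("?", ".").replace("*", ".*")


def is_excluded_naive(name, wildcard="*.o"):
    """Return True if the whole name matches the naively converted filter."""
    regex = naive_wildcard_to_regex(wildcard)  # "*.o" becomes ".*.o"
    return re.fullmatch(regex, name) is not None


print(is_excluded_naive("main.o"))         # True  (intended: an object file)
print(is_excluded_naive("holiday_video"))  # True  (unintended: merely ends with "o")
print(is_excluded_naive("video.mp4"))      # False
```

With the unescaped dot, any name of at least two characters that ends in "o" is treated as excluded, which matches the behaviour described above.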
Please see https://git.reviewboard.kde.org/r/120570/ for my proposed fix.
Git commit 863ccc6f7901528338efabfef78098fc72cbd94f by Dominik Cermak.
Committed on 14/10/2014 at 11:36.
Pushed by cermak into branch 'Plasma/5.1'.

Escape dots in exclude filters

In regular expressions a dot (.) matches any character (except newline), but in the exclude filters we use wildcard syntax (*) and want a dot (.) to be interpreted as a literal character.

Example: With "*.o" in the exclude filters, the user expects that object files (ending with .o) are excluded. Without escaping, this would match every file and folder ending with o, though. This is the case for all entries of that form in the exclude filters ("*.moc", "*.la", etc.).

So just escape every dot we find in exclude filters with a backslash while building the regexp.

FIXED-IN: 5.1.1
REVIEW: 120570

M +1 -0 src/file/regexpcache.cpp

http://commits.kde.org/baloo/863ccc6f7901528338efabfef78098fc72cbd94f
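The idea of the fix can be sketched as follows. This is not the code from src/file/regexpcache.cpp: it is a Python illustration using the re module, and the helper names are hypothetical. The only point it carries over from the commit message is that every dot is escaped with a backslash before the wildcard characters are translated.

```python
import re


def escaped_wildcard_to_regex(pattern):
    """Hypothetical conversion that escapes dots before translating wildcards."""
    # Escape literal dots first; doing this after the wildcard translation
    # would also escape the dots introduced for '?' and '*'.
    escaped = pattern.replace(".", r"\.")
    return escaped.replace("?", ".").replace("*", ".*")


def is_excluded_fixed(name, wildcard):
    """Return True if the whole name matches the converted filter."""
    return re.fullmatch(escaped_wildcard_to_regex(wildcard), name) is not None


print(is_excluded_fixed("main.o", "*.o"))         # True
print(is_excluded_fixed("holiday_video", "*.o"))  # False
print(is_excluded_fixed("libfoo.la", "*.la"))     # True
print(is_excluded_fixed("koala", "*.la"))         # False
```

The order of the replacements matters: the dots are escaped while the pattern still contains only literal dots, so the "." and ".*" added afterwards keep their regular-expression meaning.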