Summary: | Optimize RegExpCache | ||
---|---|---|---|
Product: | [Unmaintained] nepomuk | Reporter: | Vishesh Handa <me> |
Component: | general | Assignee: | Nepomuk Bugs Coordination <nepomuk-bugs> |
Status: | RESOLVED UNMAINTAINED | ||
Severity: | task | CC: | lukiasz |
Priority: | NOR | ||
Version: | git master | ||
Target Milestone: | --- | ||
Platform: | unspecified | ||
OS: | Linux | ||
Latest Commit: | Version Fixed In: | ||
Sentry Crash Report: | |||
Attachments: | Patch includes a new RegExpCache class, can be applied to master branch. |
Description
Vishesh Handa
2012-07-17 09:07:13 UTC
Created attachment 78701 [details] Patch includes a new RegExpCache class, can be applied to master branch. Proposed solution transforms all filters from wildcard syntax to full regexp syntax and combines them into one big expression. For example this five filters: "*.pyc, *.po, *.tmp, *.orig, autom4te" will be converted to: "(.*\.(pyc|po|tmp|orig)|autom4te)". For default nepomuk filters and almost all files on my computer (fresh opensuse distribution with kde sources), testing one big regexp is about five times faster than comparing file names with every filter. For small amount of filters (about 4) old regexp cache is noticeably faster, so in that case this approach is in use. Provided patch is tested and works well with full wildcard syntax ('?', "*" and "[ ]" operators). Special full regexp characters not supported in wildcard format are treated like normal characters and escaped properly. Just in case if something went wrong and output regexp is not valid, it switch to old behaviour and logs error to kDebug. Isolated testcases can be downloaded from http://treadstone.sh.dug.net.pl/regexp_cache.zip . I've also included a small list of files (about 2.8MB), but if you want to get more valuable speedup result, you should provide much bigger list (I've tested it with 25MB file). Hey Lucas Wow! This is quite amazing. However, I think it would be better if you uploaded the patch on the reviewboard. That way the other developers can also have a look at it as well. It'll be a little hard, but it would be nice to commit your tests as well. That way everyone will know why this version is faster. Do you want any help in organizing the files properly? Hi Vishesh, Yes, I surely need some informations about those tests. How should they look like and what should be compared? I can also make some kind of unit tests and put them somewhere in repo. Also, part of code responsible for regexp generation is quite complicated and someone might need some time to get involved into - should I put more inline comments to make itmore understandable at first glance? Reviewboard's url for nepomuk is: https://git.reviewboard.kde.org? Thank you for taking the time to file a bug report. The Nepomuk project is no longer included in the KDE Software Compilation. With Plasma 5, we have replaced most of the underlying technology with Baloo and other components. Hopefully this will have addressed your concern. We encourage you to try to Plasma 5 (+Baloo) and let us know if your problem persists. |