Bug 349966 - Integrate with Baloo search
Summary: Integrate with Baloo search
Status: REPORTED
Alias: None
Product: kdevplatform
Classification: Developer tools
Component: grepview
Version: 1.7.1
Platform: Fedora RPMs Linux
Importance: LO wishlist
Target Milestone: ---
Assignee: kdevelop-bugs-null
 
Reported: 2015-07-06 17:05 UTC by Alexander Potashev
Modified: 2016-10-12 19:26 UTC



Description Alexander Potashev 2015-07-06 17:05:35 UTC
Please index projects' files to speed up full-text search. Currently it takes several seconds to search for a string in the whole Linux kernel source tree.

Reproducible: Always
Comment 1 Aleix Pol 2015-07-06 17:46:45 UTC
I would say that a few seconds is quite acceptable for the whole Linux tree...

OTOH, maybe we should make it possible to use git grep somehow? I don't think we should add yet another cache here...
Comment 2 Milian Wolff 2015-07-06 19:44:34 UTC
I agree. Either use code browsing, such as quickopen, to jump to symbols, or use git grep, which we could think about integrating. Or better yet, we integrate Baloo, since it already has exactly that: an index of words plus search functionality.
Comment 3 Alexander Potashev 2015-07-06 22:17:40 UTC
I was also thinking about git-grep because it searches a few times faster thanks to the reduced I/O load when reading Git database files, compared to reading a lot of small files in the working copy. Also, it doesn't require upfront indexing like Baloo.

Code browsing using quickopen is not enough in some cases. For example, I can't search for string constants or for text in inline comments in quickopen.
Comment 4 Alexander Potashev 2015-07-06 22:22:15 UTC
(In reply to Aleix Pol from comment #1)
> I would say that a few seconds is quite acceptable for the whole Linux
> tree...

Aleix,
"several seconds" in this case stands for maybe 20 seconds or even more; I didn't do precise measurements. And this time depends on how much of the data to be read are already in the disk cache.
Comment 5 Milian Wolff 2015-07-07 08:45:32 UTC
But git grep only works for stuff you checked in, not for stuff that you currently work on. So everything has a downside. Feel free to create another wishlist report about integrating git grep. But we definitely won't write yet another handcrafted plain text indexer.
Comment 6 Alexander Potashev 2015-07-10 08:38:58 UTC
I'm not sure how to present Baloo search and git-grep to the user. Maybe there should be a checkbox "Index files for faster search":
 1. If it is on and indexing has completed, then make requests to Baloo.
 2. If it is off or indexing is still in progress, then either fall back to the traditional search or use git-grep (when supported).
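The two rules above amount to a small dispatch function. A minimal sketch in Python, purely illustrative: the names `Backend` and `choose_backend` are hypothetical and not part of KDevelop, and preferring git-grep over the traditional search when both are possible is an assumption, since the proposal leaves that choice open.

```python
from enum import Enum


class Backend(Enum):
    """Hypothetical names for the three search mechanisms discussed here."""
    BALOO = "baloo"
    GIT_GREP = "git-grep"
    TRADITIONAL = "traditional"


def choose_backend(index_enabled, indexing_complete, git_grep_supported):
    """Pick a search backend from the checkbox state and project capabilities."""
    # Rule 1: the index is only usable once it has been fully built.
    if index_enabled and indexing_complete:
        return Backend.BALOO
    # Rule 2: checkbox off, or indexing still running.
    # Assumption: git-grep is preferred whenever the project supports it.
    if git_grep_supported:
        return Backend.GIT_GREP
    return Backend.TRADITIONAL
```

For example, `choose_backend(True, False, False)` returns `Backend.TRADITIONAL`, because the index is not ready and git-grep is unavailable.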

I will not create another ticket about git-grep for now, because Baloo will probably be faster and thus more interesting to have.

You are right, git-grep is faster only when you run it on a commit's file tree which is already in the Git database. But I think that we can combine git-grep with git-diff:
 1. Use git-grep for checked-in files,
 2. Detect modified and unstaged files with git-diff and use the traditional algorithm on them,
 3. Merge and filter the results of (1) and (2).
Maybe this approach would be better integrated into git-grep itself.
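The three steps above could be sketched roughly as follows. This is a minimal illustration in Python, not KDevelop code, and every function name in it is hypothetical. It assumes that `HEAD` exists, that untracked files should also get the traditional scan (listed with `git ls-files --others --exclude-standard`), and that paths contain no `:` or other characters that git would quote in its output.

```python
import subprocess
from pathlib import Path


def git(repo, *args):
    """Run a git command inside `repo` and return its stdout as text."""
    proc = subprocess.run(["git", "-C", str(repo), *args],
                          capture_output=True, text=True, errors="replace")
    # git grep exits with status 1 when nothing matched; that is not an error.
    if proc.returncode not in (0, 1):
        raise RuntimeError(proc.stderr.strip())
    return proc.stdout


def parse_grep_line(line, rev="HEAD"):
    """Split one 'HEAD:path:lineno:text' line into (path, lineno, text)."""
    path, lineno, text = line[len(rev) + 1:].split(":", 2)
    return path, int(lineno), text


def changed_paths(repo):
    """Paths whose working copy differs from HEAD, plus untracked files."""
    diff = git(repo, "diff", "--name-only", "HEAD")
    untracked = git(repo, "ls-files", "--others", "--exclude-standard")
    return set(diff.splitlines()) | set(untracked.splitlines())


def scan_file(repo, path, needle):
    """Traditional search: read one working-copy file line by line."""
    full = Path(repo) / path
    if not full.is_file():  # e.g. deleted in the working copy
        return []
    with open(full, errors="replace") as fh:
        return [(path, n, line.rstrip("\n"))
                for n, line in enumerate(fh, 1) if needle in line]


def merge_hits(committed, changed, worktree):
    """Step 3: drop stale hits for changed files, then add the fresh ones."""
    kept = [hit for hit in committed if hit[0] not in changed]
    return sorted(kept + worktree)


def search(repo, needle):
    """Fixed-string search over HEAD plus whatever changed since HEAD."""
    # Step 1: git-grep on the committed tree.
    out = git(repo, "grep", "-n", "-I", "-F", "-e", needle, "HEAD")
    committed = [parse_grep_line(line) for line in out.splitlines()]
    # Step 2: traditional scan of files that differ from HEAD.
    changed = changed_paths(repo)
    worktree = [hit for p in sorted(changed)
                for hit in scan_file(repo, p, needle)]
    return merge_hits(committed, changed, worktree)
```

The filtering in `merge_hits` matters because a file modified in the working copy may no longer contain a match that git-grep still reports from `HEAD`, or may contain it on a different line.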
Comment 7 Daniel Santos 2015-08-30 20:55:14 UTC
Personally, I would prefer to have the choice to specify what search mechanism(s) to use, perhaps even having an option to say "use git grep/diff" when available.

(In reply to Alexander Potashev from comment #4)
> And this time depends on how much of the
> data to be read are already in the disk cache.
I would venture to say that this is precisely the issue -- disk access. After I do an initial git grep on a tree (which takes several seconds, or even a minute), subsequent git greps are nearly instantaneous -- unless I've gone and blown my FS cache by reading/writing a bunch of other files or using up more memory somewhere else. For the case of the Linux kernel for sure (probably also BSD, I don't know) the FS cache seems to be sufficient.
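One rough way to check the cold-versus-warm difference described above is to time the same command twice in a row. The sketch below is in Python and only an illustration: `time_runs` is a hypothetical helper, and the first run is "cold" only if the files really are absent from the FS cache when it starts, which the script does not enforce.

```python
import subprocess
import time


def time_runs(cmd, runs=2, cwd=None):
    """Run `cmd` `runs` times and return the wall-clock seconds of each run."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # Output is discarded; only the elapsed time matters here.
        subprocess.run(cmd, cwd=cwd, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=False)
        timings.append(time.perf_counter() - start)
    return timings
```

For instance, `time_runs(["git", "grep", "-F", "-e", "some_string"], cwd="/path/to/linux")` returns two durations, where the path and the search string are placeholders. If the explanation above holds, the second number should be much smaller than the first when starting from a cold cache.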

Another thing to consider is (I will guess anyway) that it's quite likely that git users will have already loaded this data into their FS cache near the start of their development day. Usually, I'll run a git status (just to make sure I didn't leave something uncommitted) and maybe switch branches, do a fetch, pull, rebase, etc. I *believe* that each of these operations will read in all of the source files anyway (I could be wrong, they could just be looking at the timestamps & sizes from a stat).