Bug 427481 - Quick Open plugin does not support nested project directories
Summary: Quick Open plugin does not support nested project directories
Status: REPORTED
Alias: None
Product: kdevelop
Classification: Applications
Component: UI: QuickOpen (other bugs)
Version First Reported In: git master
Platform: Compiled Sources Linux
: LO normal
Target Milestone: ---
Assignee: kdevelop-bugs-null
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-10-09 11:33 UTC by Igor Kushnir
Modified: 2021-01-14 18:32 UTC (History)
0 users

See Also:
Latest Commit:
Version Fixed/Implemented In:
Sentry Crash Report:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Igor Kushnir 2020-10-09 11:33:56 UTC
SUMMARY
Quick Open plugin assumes that a given path can belong to a single project. See struct ProjectFile in https://commits.kde.org/kdevelop?path=plugins/quickopen/projectfilequickopen.h. When a file belongs to multiple open projects, closing one of the projects removes the file from Quick Open file list.

STEPS TO REPRODUCE
1. Open a project that has a subdirectory with a CMakeLists.txt file in it.
2. Check that a file in the aforementioned subdirectory can be opened via Quick Open search.
3. Open the aforementioned project's subdirectory as another project.
4. Check that the file from step 2 can still be opened via Quick Open search.
5. Close the subdirectory project.
6. Check if the file from step 4 can still be opened via Quick Open search.

OBSERVED RESULT
The file is no longer available in Quick Open list.

EXPECTED RESULT
The file is still available in Quick Open list.

ADDITIONAL INFORMATION
First of all: is there a use case for nested project directories? And should KDevelop support file paths that belong to multiple open projects?

If this should be supported, a possible fix is to store a QVector projectPaths in the ProjectFile struct. In ProjectFileDataProvider::fileRemovedFromSet(): if the file to be removed has projectPaths.size() > 1, simply remove the project path of the ProjectFileItem from projectPaths; if projectPaths.size() == 1, remove the entire ProjectFile.
Comment 1 Milian Wolff 2020-10-12 09:22:03 UTC
Hm, ideally we would somehow have a list of items for every project and then another step on top which de-duplicates that list. But I don't personally think this is a very critical issue, as one rarely opens projects in such a way. And if one does, then one wouldn't close the project again, or?

So I'm setting the importance to LO.

If this is important for your workflow, and you really need to have it fixed: Adding a vector of projects to every item will have a big performance impact if done naively. I would prefer not to do that. Rather, can't we check when a file is removed if there's another project that still contains it? Afaik that's a matter of iterating through the list of still-opened projects and doing a quick hash lookup each time. That will ensure that the performance impact of loading a project is minimal, and we also won't have to store more data in memory either.
Comment 2 Igor Kushnir 2020-10-26 17:22:02 UTC
(In reply to Milian Wolff from comment #1)
> If this is important for your workflow, and you really need to have it
> fixed: Adding a vector of projects to every item will have a big performance
> impact if done naively. I would prefer not to do that. Rather, can't we
> check when a file is removed if there's another project that still contains
> it? Afaik that's a matter of iterating through the list of still-opened
> projects and doing a quick hash lookup each time. That will ensure that the
> performance impact of loading a project is minimal, and we also won't have
> to store more data in memory either.
This is not important to my workflow, but I like to make the code work right unless it causes other issues. I tried to modify ProjectFileDataProvider::projectClosing() to fix this bug. First I merged all file sets from still open projects (obtained by a call to  IProject::fileSet()) into one QSet, then removed all elements of m_projectFiles that weren't in this set. But this turned out to be very slow compared to my optimizations in https://invent.kde.org/kdevelop/kdevelop/-/merge_requests/187. Then I tried to store the file sets in a std::vector<QSet<IndexedString>> and remove all elements of m_projectFiles that weren't in any of the sets. projectClosing() became much faster with this change, but not nearly as fast as the optimizations in the aforementioned merge request.

For example, consider closing many projects at KDevelop exit, the last two of which are WebKit with almost 320 thousand files and kdevelop with a little more than 67 thousand files. With the merge request's optimizations, closing a project (that may even contain less than a thousand files) located in the same directory as WebKit and kdevelop, takes 3 ms on average. Closing a project that has a projectPath with a number of segments unequal to that of WebKit and kdevelop takes 1 ms.

Neither of the QSet approaches described above depend on the number of segments in projectPath. Both 3 ms and 1 ms turned into 148 ms on average with one QSet and into 92 ms on average with a vector of QSets. These costs can add up and cause a one second slowdown on each KDevelop exit. And a simple QSet implementation wouldn't even make overlapping projects work right. In addition, the projectPath of a not-removed ProjectFile has to be replaced with the still open project's path that contains it. And if there are still multiple such open projects, which one to choose?

Until there is a convincing use case or a legitimate workflow that suffers from this bug, I don't want to sacrifice performance for a fix.
Comment 3 Milian Wolff 2020-10-26 20:38:43 UTC
I think one way to fix this would be to check if the closed project path clashes with one of the other open project paths, and if so, reset the provider again. that should then re-add any potentially closed files, no?
Comment 4 Igor Kushnir 2020-10-27 10:57:25 UTC
(In reply to Milian Wolff from comment #3)
> I think one way to fix this would be to check if the closed project path
> clashes with one of the other open project paths, and if so, reset the
> provider again. that should then re-add any potentially closed files, no?
Yes, that's an interesting idea. If this quick check detects a clash, clear m_projectFiles and add all still open projects one by one (as is done in ProjectFileDataProvider's constructor). I'll try this out later. This implementation wouldn't be bullet-proof though as there may be shared files outside of project directories, e.g. ProjectTargetItem can contain files from out-of-tree builds.
Comment 5 Igor Kushnir 2020-10-28 18:06:05 UTC
Git commit 8e0f203ada99acf9e39e4ccd2c8f989ef491eff3 by Igor Kushnir.
Committed on 28/10/2020 at 12:10.
Pushed by igorkushnir into branch 'master'.

ProjectFileDataProvider::projectClosing: match projectPath

The current algorithm iterates over project files and erases matching
items from QVector m_projectFiles one by one. This is very slow, because
KDevelop::forEachFile() iterates over project files in an order close to
lexicographic, and m_projectFiles is sorted in lexicographic order too.
So the average position of an erased item is close to QVector::begin().
The overall complexity of this algorithm is O(N*M), where N is the size
of m_projectFiles before a project is closed, and M is the number of
files in the project that is being closed.

Simply iterating over the project files in the reverse order speeds up
the erase one-by-one algorithm a lot. But there is an even faster
alternative, which is implemented in this commit: instead of iterating
over or collecting project files and then removing matching items from
m_projectFiles, remove all items that have projectPath equal to the path
of the project that is being closed. The complexity of this algorithm is
O(N).

Before this commit on average 3150 ms were spent in
ProjectFileDataProvider::projectClosing() during KDevelop exit while a
single project - kdevelop itself with a little more than 67 thousand
files - was open. At this commit the time is just 14 ms. When the single
project to be closed was WebKit with its almost 320 thousand files,
about 196 *seconds* were spent in this function before the commit and
only 56 ms at this commit.

The best time I could achieve for closing kdevelop project by iterating
over project files in the reverse order and applying minor heuristic
optimizations was 165 ms. And such a reverse iteration could still be
extremely slow when several projects were closed.

This change makes
BenchQuickOpen::benchProjectFileFilter_addRemoveProject() 1.5 times
faster for a project with 100 files; 1.7 times faster for a project with
500 files; 2.2 times faster for a project with 1000 files; 5.9 times
faster for a project with 10000 files.

This commit changes behavior in case when a single file belongs to
multiple projects. Previously all files belonging to the project being
closed were removed. Now only the files that were added in the matching
projectOpened() call are removed. Neither the previous nor the current
behavior is ideal. But removing the files that were added along with the
project makes a bit more sense to me.

M  +6    -3    plugins/quickopen/projectfilequickopen.cpp

https://invent.kde.org/kdevelop/kdevelop/commit/8e0f203ada99acf9e39e4ccd2c8f989ef491eff3
Comment 6 Igor Kushnir 2021-01-14 18:32:43 UTC
I think there is a way to fix this bug that would actually improve performance in the normal case when project file sets don't overlap: simply don't remove duplicates after merging sorted ranges that belong to different projects. fileAddedToSet(), fileRemovedFromSet() would have to use std::equal_range instead of std::lower_bound and consider two items equal only if their project paths are the same. The overall code complexity would remain roughly the same. The only remaining question: is it acceptable/good idea to have multiple items in the quick open list, which differ only in projectPath?