Bug 394002 - Ignore some common file patterns
Summary: Ignore some common file patterns
Status: RESOLVED FIXED
Alias: None
Product: frameworks-baloo
Classification: Frameworks and Libraries
Component: general
Version: 5.45.0
Platform: openSUSE Linux
Importance: NOR wishlist
Target Milestone: ---
Assignee: baloo-bugs-null
Reported: 2018-05-08 13:35 UTC by Guo Yunhe
Modified: 2018-05-15 22:40 UTC

Attachments
npm packages and source code files are filling up global search results (55.12 KB, image/png)
2018-05-09 08:49 UTC, Guo Yunhe

Description Guo Yunhe 2018-05-08 13:35:58 UTC
As a web developer, I use package managers such as NPM and Composer a lot. They can install an enormous number of source code files in a single project; NPM's folder is usually node_modules. With all these files indexed, what I actually want to find gets buried in the search results.

So perhaps Baloo could have a built-in filter that excludes 'node_modules'.
Comment 1 Nate Graham 2018-05-08 17:11:11 UTC
Please provide a list of the filename extensions and/or patterns that you would like excluded.
Comment 2 Guo Yunhe 2018-05-09 08:48:07 UTC
I assume that hidden files and folders (those starting with '.') are already excluded by default. Here are some file and folder patterns I would like to exclude:

* node_modules/ (NodeJS package folder, it is huge)
* vendor/ (PHP Composer package folder)
* README*
* LICENSE
* COPY
* *.js
* *.css
* *.json
* *.xml
* *.php
* *.py
* *.c
* *.h
* *.cpp
* *.hpp
* *.o
* *.so
* __pycache__ (python cache folder, only contains binary files)

However, there are many more source code file extensions. Users can probably search for such files inside their IDE, with find, or in Dolphin, but indexing all of them is expensive, and users may not want them in global search.
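
To make the request concrete, here is a minimal sketch of a name-based exclusion check built from a few of the patterns listed above. It is only an illustration and not Baloo's code: the names `EXCLUDED_FOLDERS`, `EXCLUDED_FILE_PATTERNS`, and `is_excluded` are hypothetical, and the handling of hidden files is not modeled.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Hypothetical lists, copied from some of the patterns proposed in this comment.
EXCLUDED_FOLDERS = {"node_modules", "vendor", "__pycache__"}
EXCLUDED_FILE_PATTERNS = ["README*", "LICENSE", "*.js", "*.css", "*.json", "*.py", "*.o", "*.so"]


def is_excluded(path: str) -> bool:
    """Return True if the path should be skipped by this sketch of a filter."""
    parts = PurePosixPath(path).parts
    if not parts:
        return False
    # Skip anything that is, or lives under, an excluded folder name.
    if any(part in EXCLUDED_FOLDERS for part in parts):
        return True
    # Otherwise compare only the last path component against the glob patterns.
    name = parts[-1]
    return any(fnmatchcase(name, pattern) for pattern in EXCLUDED_FILE_PATTERNS)


if __name__ == "__main__":
    for candidate in ("project/node_modules/lodash/index.txt",
                      "project/src/app.js",
                      "Documents/report.odt"):
        print(candidate, is_excluded(candidate))
```

Matching folder names anywhere in the path is what makes 'vendor' risky: any directory with that name is skipped, whether or not it belongs to Composer.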
Comment 3 Guo Yunhe 2018-05-09 08:49:01 UTC
Created attachment 112526
npm packages and source code files are filling up global search results
Comment 4 Guo Yunhe 2018-05-09 08:50:17 UTC
Also these:

* *.js.map
* *.css.map
* *.ts
* *.jsx
* *.tsx
Comment 5 Nate Graham 2018-05-09 13:47:01 UTC
Thanks, I'll be happy to make most of these changes. Don't think we can do "vendor", since it's quite possible that a user could give that name to a non-development directory that they'd like indexed.

Any chance you could provide the MIME types for the files you'd like excluded? Baloo matches by MIME type, not file extension.
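
The difference between matching by extension and matching by MIME type can be sketched roughly as below. This is a hedged illustration only: the thread does not show how Baloo detects a file's type or which types it excludes, so the MIME type of each file is supplied by the caller, and `EXCLUDED_MIME_TYPES` and `filter_indexable` are hypothetical names with an illustrative set of types.

```python
# Illustrative set of MIME types; not the list used by Baloo.
EXCLUDED_MIME_TYPES = {"text/css", "application/json", "text/x-python", "text/x-csrc"}


def filter_indexable(files):
    """Keep the names of files whose MIME type is not excluded.

    `files` is an iterable of (filename, mime_type) pairs; the MIME type is
    assumed to have been detected elsewhere.
    """
    return [name for name, mime in files
            if mime.lower() not in EXCLUDED_MIME_TYPES]


if __name__ == "__main__":
    sample = [
        ("script", "text/x-python"),      # no extension, still excluded by type
        ("notes.py.txt", "text/plain"),   # looks like code by name, kept by type
        ("style.css", "text/css"),
    ]
    print(filter_indexable(sample))
```

With this approach a file without any extension is still skipped when its detected type is excluded, which is why a list of MIME types is more useful here than a list of extensions.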
Comment 6 Nate Graham 2018-05-09 14:40:07 UTC
Never mind, I found them. I'll submit a patch today.
Comment 7 Nate Graham 2018-05-09 18:48:28 UTC
Patch available: https://phabricator.kde.org/D12787
Comment 8 Nate Graham 2018-05-15 22:40:41 UTC
Git commit 7529727e46242d2fdd71c4e8c92363600373fcb6 by Nathaniel Graham.
Committed on 15/05/2018 at 22:33.
Pushed by ngraham into branch 'master'.

Ignore more types of source files

Summary:
Add more types of development-related files to the exclusion lists. These files aren't useful to index, and having them there can bog down Baloo.
Related: bug 390932, bug 382117
FIXED-IN 5.47

Test Plan: Created a bunch of files of the newly excluded types. Baloo didn't index them.

Reviewers: michaelh, bruns

Reviewed By: bruns

Subscribers: broulik, cfeck, kde-frameworks-devel, #baloo

Tags: #frameworks, #baloo

Differential Revision: https://phabricator.kde.org/D12787

M  +37   -5    src/file/fileexcludefilters.cpp

https://commits.kde.org/baloo/7529727e46242d2fdd71c4e8c92363600373fcb6