Bug 336084

Summary: Excessive delay in returning meta-data for non-indexed files.
Product: [Frameworks and Libraries] Baloo
Reporter: Paul <pip.kde>
Component: Widgets
Assignee: Vishesh Handa <me>
Status: RESOLVED FIXED    
Severity: normal
CC: ukyoi
Priority: NOR    
Version: 4.13   
Target Milestone: ---   
Platform: openSUSE   
OS: Linux   
Latest Commit:
Version Fixed In: 5.0

Description Paul 2014-06-11 15:54:20 UTC
When running with 'Desktop Search' disabled, 'libbaloowidgets4 / baloo_file_extractor' takes an excessive amount of time to return meta-data for non-indexed files.

For example, using this file: https://wiki.documentfoundation.org/images/3/35/WG40-WriterGuideLO.pdf

Opening it in Okular is almost instant, and there is no delay in displaying the meta-data (from Okular's 'Properties').

In Dolphin, mouse-over displays basic file information immediately. baloo_file_extractor then takes 100% of one processor core and, after approximately 8 seconds [1], the meta-data is displayed. (I wonder if it's actually needlessly indexing the entire file rather than just returning the meta-data.)

This renders the information panel in Dolphin completely impractical, at least for PDF files.

There is a more detailed discussion of this on the openSUSE forum. It's a long thread; this would be a good starting point: https://forums.opensuse.org/showthread.php/498098-KDE-4-13-1-Dolphin-Information-Panel-No-Meta-Data?p=2645577#post2645577


[1] Measured on a relatively modest PC: AMD Athlon 64 X2 5600+, 4GB RAM, and an SSD
Comment 1 Vishesh Handa 2014-06-12 11:36:32 UTC
Confirmed.

The indexer is temporarily indexing the entire file including the plain text, and not just the metadata. Hence the noticeable delay.

I'll try to improve stuff, so that in this case only the metadata is extracted.
Comment 2 Paul 2014-06-12 15:07:52 UTC
(In reply to comment #1)
> I'll try to improve stuff, so that in this case only the metadata is
> extracted.

Excellent - Thanks. :)

For future flexibility perhaps baloo_file_extractor should take arguments to indicate what to return...
All meta-data, Specific Named meta-data, No meta-data, File Content... that sort of idea, then a programme calling baloo_file_extractor could specify exactly what it wanted.
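
The idea of letting a caller state what it wants back could be sketched roughly as follows. This is only an illustration of the suggestion, not how baloo_file_extractor works: the switch names "--metadata" and "--content", the ExtractPart flags, and both helper functions are hypothetical and invented for this example.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical flags describing what a caller wants back from the extractor.
enum ExtractPart : unsigned {
    ExtractNothing  = 0,
    ExtractMetadata = 1u << 0,  // title, author, page count, ...
    ExtractContent  = 1u << 1   // full plain text of the file
};

// Parse the hypothetical "--metadata" / "--content" switches into a flag set.
// Any other argument (for example a file path) is ignored here.
unsigned parseExtractParts(const std::vector<std::string>& args) {
    unsigned parts = ExtractNothing;
    for (const std::string& arg : args) {
        if (arg == "--metadata") {
            parts |= ExtractMetadata;
        } else if (arg == "--content") {
            parts |= ExtractContent;
        }
    }
    return parts;
}

// True if the caller asked for the given part.
// Querying for ExtractNothing is treated as a caller error.
bool wants(unsigned parts, ExtractPart part) {
    assert(part != ExtractNothing);
    return (parts & part) != 0;
}
```

With something like this, a caller such as the Dolphin information panel would pass only the metadata switch, and the extractor could skip the text pass whenever wants(parts, ExtractContent) is false.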
Comment 3 Vishesh Handa 2014-07-01 14:55:15 UTC
Git commit 434e3ef2500f64eb3ac2a4f656b47724d04d9c6f by Vishesh Handa.
Committed on 01/07/2014 at 15:03.
Pushed by vhanda into branch 'frameworks'.

Extractor: Do not extract the plain text in --bdata mode

This is the mode that is used to temporarily extract the metadata. It's
used in the dolphin side panel. It doesn't make sense for us to extract
the plain text and then discard it. Extracting pdf metadata is now much
much faster.
FIXED-IN: 5.0

M  +6    -1    src/file/extractor/app.cpp
M  +2    -2    src/file/extractor/result.cpp
M  +1    -1    src/file/extractor/result.h

http://commits.kde.org/baloo/434e3ef2500f64eb3ac2a4f656b47724d04d9c6f
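
For readers who do not want to open the diff, the general shape of such a change might look like the sketch below. It is not the code from the commit: the class ExtractionResult and all of its members are hypothetical names, and the only thing taken from the commit message is the idea that plain text is not collected when only metadata is wanted.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical result collector; not Baloo's actual Result class.
class ExtractionResult {
public:
    explicit ExtractionResult(bool metadataOnly) : m_metadataOnly(metadataOnly) {}

    // Extractors can check this before doing any expensive text extraction.
    bool wantsText() const { return !m_metadataOnly; }

    // Metadata is always kept.
    void add(const std::string& key, const std::string& value) {
        assert(!key.empty());
        m_metadata[key] = value;
    }

    // Plain text is dropped immediately in metadata-only mode.
    void append(const std::string& text) {
        if (m_metadataOnly) {
            return;
        }
        m_text += text;
    }

    const std::map<std::string, std::string>& metadata() const { return m_metadata; }
    const std::string& text() const { return m_text; }

private:
    bool m_metadataOnly;
    std::map<std::string, std::string> m_metadata;
    std::string m_text;
};
```

Dropping the text in append() only avoids storing it. The time saving would come from an extractor checking wantsText() first and never producing the text at all.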
Comment 4 Frank Reininghaus 2014-08-10 10:55:34 UTC
*** Bug 338170 has been marked as a duplicate of this bug. ***