Bug 336084

Summary:	Excessive delay in returning meta-data for non-indexed files.
Product:	[Unmaintained] Baloo	Reporter:	Paul <pip.kde>
Component:	Widgets	Assignee:	Vishesh Handa <me>
Status:	RESOLVED FIXED
Severity:	normal	CC:	ukyoi
Priority:	NOR
Version First Reported In:	4.13
Target Milestone:	---
Platform:	openSUSE
OS:	Linux
Latest Commit:	http://commits.kde.org/baloo/434e3ef2500f64eb3ac2a4f656b47724d04d9c6f	Version Fixed/Implemented In:	5.0
Sentry Crash Report:

Description Paul 2014-06-11 15:54:20 UTC

When running with 'Desktop Search' disabled, 'libbaloowidgets4 / baloo_file_extractor' takes an excessive amount of time to return meta-data for non-indexed files.

For example, using this file: https://wiki.documentfoundation.org/images/3/35/WG40-WriterGuideLO.pdf

Opening it in Okular is almost instant, and there is no delay in displaying the meta-data (from Okular's 'Properties').

In Dolphin, mouse-over displays basic file information immediately; baloo_file_extractor then takes 100% of one processor core and, after approximately 8 seconds [1], the meta-data is displayed. (I wonder if it's actually needlessly indexing the entire file rather than just returning the meta-data.)

This makes the information panel in Dolphin completely impractical to use, at least for PDF files.

There is a more detailed discussion of this on the openSUSE forum. It's a long thread, so this post would be a good starting point: https://forums.opensuse.org/showthread.php/498098-KDE-4-13-1-Dolphin-Information-Panel-No-Meta-Data?p=2645577#post2645577


[1] Using a relatively modest PC: AMD Athlon 64X2 5600+, 4GB RAM, and using an SSD

Comment 1 Vishesh Handa 2014-06-12 11:36:32 UTC

Confirmed.

The indexer is temporarily indexing the entire file including the plain text, and not just the metadata. Hence the noticeable delay.

I'll try to improve stuff, so that in this case only the metadata is extracted.

Comment 2 Paul 2014-06-12 15:07:52 UTC

(In reply to comment #1)
> I'll try to improve stuff, so that in this case only the metadata is
> extracted.

Excellent - Thanks. :)

For future flexibility perhaps baloo_file_extractor should take arguments to indicate what to return...
All meta-data, Specific Named meta-data, No meta-data, File Content... that sort of idea, then a programme calling baloo_file_extractor could specify exactly what it wanted.
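
A minimal sketch of the suggestion above, assuming a caller could pass options that say what it wants back. The option names "--metadata" and "--content", the ExtractFlag enum, and both functions are hypothetical and exist only for illustration; they are not options or APIs of the real baloo_file_extractor.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical request flags. These are NOT real baloo_file_extractor options.
enum ExtractFlag : std::uint8_t {
    ExtractNothing   = 0,
    ExtractMetadata  = 1 << 0,  // document properties such as title or author
    ExtractPlainText = 1 << 1,  // full text content of the file
};

// Translate hypothetical command-line options into a bitmask of flags.
// Unrecognised arguments (for example the file path) are ignored.
std::uint8_t parseExtractFlags(const std::vector<std::string>& args) {
    std::uint8_t flags = ExtractNothing;
    for (const std::string& arg : args) {
        if (arg == "--metadata") {
            flags |= ExtractMetadata;
        } else if (arg == "--content") {
            flags |= ExtractPlainText;
        }
    }
    return flags;
}

// True if the caller asked for the given kind of data.
bool wants(std::uint8_t flags, ExtractFlag kind) {
    assert(kind != ExtractNothing && "ask about a concrete kind of data");
    return (flags & kind) != 0;
}
```

With flags like these, a caller such as a file manager's information panel could request only ExtractMetadata, and the extractor could check wants(flags, ExtractPlainText) before doing any expensive text extraction.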

Comment 3 Vishesh Handa 2014-07-01 14:55:15 UTC

Git commit 434e3ef2500f64eb3ac2a4f656b47724d04d9c6f by Vishesh Handa.
Committed on 01/07/2014 at 15:03.
Pushed by vhanda into branch 'frameworks'.

Extractor: Do not extract the plain text in --bdata mode

This is the mode that is used to temporarily extract the metadata. It's
used in the dolphin side panel. It doesn't make sense for us to extract
the plain text and then discard it. Extracting pdf metadata is now much
much faster.
FIXED-IN: 5.0

M  +6    -1    src/file/extractor/app.cpp
M  +2    -2    src/file/extractor/result.cpp
M  +1    -1    src/file/extractor/result.h

http://commits.kde.org/baloo/434e3ef2500f64eb3ac2a4f656b47724d04d9c6f
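
The idea behind the commit above can be sketched roughly as follows. This is not the code from app.cpp or result.cpp; ExtractionResult, runExtractor, and the two callbacks are hypothetical names chosen for illustration. The sketch only shows the principle that, in a metadata-only mode, the expensive plain-text step is never run, rather than being run and then thrown away.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

// Hypothetical stand-in for an extraction result; not Baloo's actual Result class.
struct ExtractionResult {
    std::map<std::string, std::string> properties;  // e.g. "title" -> "..."
    std::string plainText;                          // empty in metadata-only mode
};

// readMetadata and readPlainText are hypothetical callbacks standing in for a
// format-specific extractor (for example, one that reads a PDF).
ExtractionResult runExtractor(
    const std::function<std::map<std::string, std::string>()>& readMetadata,
    const std::function<std::string()>& readPlainText,
    bool metadataOnly)
{
    assert(readMetadata && "a metadata reader is required");

    ExtractionResult result;
    result.properties = readMetadata();

    // Skip the costly text extraction entirely when only metadata is wanted.
    if (!metadataOnly && readPlainText) {
        result.plainText = readPlainText();
    }
    return result;
}
```

Calling runExtractor with metadataOnly set to true never invokes readPlainText, so the cost of walking the whole document is avoided, which is consistent with the speed-up the commit message reports for PDF files.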

Comment 4 Frank Reininghaus 2014-08-10 10:55:34 UTC

*** Bug 338170 has been marked as a duplicate of this bug. ***