Bug 325008

Summary: strange font in information panel in pdf files
Product: nepomuk Reporter: Simon Solinas <ksolsim>
Component: fileindexer Assignee: Nepomuk Bugs Coordination <nepomuk-bugs>
Status: RESOLVED FIXED    
Severity: normal CC: cfeck, dinolib2, frank78ac, nepomuk-bugs, peter.fink126
Priority: NOR    
Version: 4.11.1   
Target Milestone: ---   
Platform: Archlinux Packages   
OS: Linux   
URL: http://s21.postimg.org/pfv5ga6on/screen.png
Latest Commit: Version Fixed In: 4.11.3
Attachments: screenshoot
problematic file

Description Simon Solinas 2013-09-17 16:33:26 UTC
If I look at the title of a PDF file in the Information Panel, it consists of the real title followed by a series of strange characters (perhaps Chinese, Japanese, Korean, or unknown).

Reproducible: Sometimes

Steps to Reproduce:
1. Open or create a simple PDF file with Calligra Words or LibreOffice Writer.
2. Select the PDF file.
3. Look at the Information Panel.
Comment 1 Frank Reininghaus 2013-09-17 20:59:25 UTC
Thanks for the bug report. Please always include a screenshot when you see something strange in the application. I'm not quite sure if you refer to the preview image (in that case, it would be a problem with the thumbnailer) or to the title of the PDF file (in which case it might be a Nepomuk problem).

It would also be good if you could attach a problematic file, because I could not reproduce any problems with a few test files yet. Thanks for your help!
Comment 2 Simon Solinas 2013-09-17 21:39:50 UTC
Screenshot is present in the URL section above. Nepomuk is disabled. I don't know how I could create a problematic file in this case.
Comment 3 Simon Solinas 2013-09-17 21:40:29 UTC
Created attachment 82380 [details]
screenshoot
Comment 4 Frank Reininghaus 2013-09-17 21:45:48 UTC
Thanks for the quick reply.

(In reply to comment #2)
> Screenshot is present in the URL section above.

Oops, sorry, I must have missed that!

If I'm not mistaken, this information inside the Information Panel is provided by Nepomuk even if the indexer is disabled, so I'll reassign.

> I don't know how I could create a problematic file in this case.

Well, if "thisisatest.pdf" does not contain anything private, you could attach it here.
Comment 5 Simon Solinas 2013-09-17 21:48:34 UTC
Created attachment 82381 [details]
problematic file
Comment 6 Christoph Feck 2013-09-17 23:02:26 UTC
Another test file can be fetched from http://www.mabb.de/files/content/document/Foerderung/mabb_Broschuere_OER_in_der_Praxis.pdf

It displays "Title: Offene " followed by many garbage characters (looks like binary); the actual title should be "Offene Bildungsresourcen (OER) in der Praxis".
Comment 7 Christoph Feck 2013-09-18 01:21:48 UTC
Interesting detail: If I hover over the PDF from comment #6 back and forth multiple times, the "Title: Offene" part stays constant, while the garbage that follows it changes randomly, so it looks like the parser references random pointers.
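
A stable prefix followed by changing garbage is typical of a length-delimited buffer being read as if it were null-terminated. The sketch below only illustrates that general class of bug in plain C++; it is not the actual Nepomuk or Poppler code, and `RawTitle`, `sampleTitle`, `titleIgnoringLength`, and `titleWithLength` are hypothetical names invented for this example. The sample buffer deliberately contains stale data after the valid part so that the over-read is well defined here.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for a length-delimited string handed out by a PDF
// library: only the first `length` UTF-16 code units are valid title text.
struct RawTitle {
    const char16_t* data;
    std::size_t length;
};

// Sample data (assumption for illustration): 7 valid code units ("Offene ")
// followed by stale characters that are still inside the same buffer.
RawTitle sampleTitle() {
    static const char16_t buffer[] = u"Offene \u4e2d\u6587";
    return RawTitle{buffer, 7};
}

// Buggy variant: ignores the length and reads until a terminating zero,
// so whatever follows the valid text ends up in the title.
std::u16string titleIgnoringLength(const RawTitle& raw) {
    return std::u16string(raw.data);
}

// Safer variant: copies exactly `length` code units and nothing more.
std::u16string titleWithLength(const RawTitle& raw) {
    return std::u16string(raw.data, raw.length);
}
```

With the sample data, `titleWithLength` yields only the valid prefix, while `titleIgnoringLength` yields the same prefix plus the trailing characters. In a real extractor the bytes after the valid part would be whatever happens to be in memory, which would match the randomly changing garbage described above.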
Comment 8 Christoph Feck 2013-10-06 23:07:53 UTC
https://git.reviewboard.kde.org/r/113138/
Comment 9 Christoph Feck 2013-10-06 23:41:51 UTC
Git commit 4a719dc3a0a8ee8e896e56544c2dfa642fd0f037 by Christoph Feck.
Committed on 06/10/2013 at 23:39.
Pushed by cfeck into branch 'KDE/4.11'.

Fix trailing garbage in extracted PDF title
FIXED-IN: 4.11.3
REVIEW: 113138

M  +2    -3    services/fileindexer/indexer/popplerextractor.cpp

http://commits.kde.org/nepomuk-core/4a719dc3a0a8ee8e896e56544c2dfa642fd0f037
Comment 10 Christoph Feck 2013-10-12 19:47:26 UTC
*** Bug 324706 has been marked as a duplicate of this bug. ***