Summary: | Huge titles are assigned for PDF files without title | ||
---|---|---|---|
Product: | nepomuk | Reporter: | Antonio Rojas <arojas> |
Component: | fileindexer | Assignee: | Nepomuk Bugs Coordination <nepomuk-bugs> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | me, nepomuk-bugs |
Priority: | NOR | ||
Version: | 4.10.80 | ||
Target Milestone: | --- | ||
Platform: | Other | ||
OS: | Linux | ||
Latest Commit: | http://commits.kde.org/nepomuk-core/894661480595e90627bb6a10b2e073648b150758 | Version Fixed In: |
Description
Antonio Rojas
2013-06-21 21:24:17 UTC
Confirmed. Do you really think no title should be set? I was thinking of maybe trimming the title to the first 50 or 100 characters.

If a PDF file does not have a title set, why should nepomuk try to guess it? Even if it is trimmed, it would probably include some text besides the actual title. It could be confusing when displayed in Dolphin. I think the expected behavior is that the "Title" nepomuk field corresponds to the "Title" field in the PDF file.

Well, the reason it was added was that a large number of PDF files do not have titles, and we would still like a title, so we try to infer it from the first page. It works remarkably well for research papers. I'm not too keen on removing this feature. I can either try to guess the title better, or trim it.

Git commit 894661480595e90627bb6a10b2e073648b150758 by Vishesh Handa.
Committed on 25/06/2013 at 11:10.
Pushed by vhanda into branch 'master'.

PopplerExtractor: Trim the guessed title to the first 50 characters

Sometimes the guessed title is just too long, in those cases we try to trim it to the first 50 characters.

M +3 -0 services/fileindexer/indexer/popplerextractor.cpp

http://commits.kde.org/nepomuk-core/894661480595e90627bb6a10b2e073648b150758

In beta 2, the "guessed" titles are shorter, but they contain many Chinese and other UTF8 characters which seem unrelated to the contents of the PDF.

Antonio, could you report it as a separate bug, ideally attaching a small PDF file that shows the issue?
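For illustration only, here is a minimal sketch of what trimming a guessed title to 50 characters could look like. It is not the code from the commit above, whose three-line diff is not shown in this report. The function name `trimGuessedTitle` and the `maxChars` parameter are hypothetical, and the sketch assumes the title is a valid UTF-8 `std::string`, whereas the real extractor may use a different string type. It counts code points rather than bytes so that a multi-byte character is never cut in half.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not taken from nepomuk-core): keep at most
// maxChars UTF-8 code points of a guessed title.
// Assumes the input is valid UTF-8.
std::string trimGuessedTitle(const std::string& title, std::size_t maxChars = 50)
{
    std::size_t chars = 0;  // code points accepted so far
    std::size_t i = 0;      // byte offset into the title
    while (i < title.size()) {
        const unsigned char byte = static_cast<unsigned char>(title[i]);
        // Any byte that is not a continuation byte (10xxxxxx) starts a new code point.
        if ((byte & 0xC0) != 0x80) {
            if (chars == maxChars)
                break;  // the next code point would exceed the limit
            ++chars;
        }
        ++i;
    }
    return title.substr(0, i);
}
```

A title shorter than the limit is returned unchanged, so the helper could be applied to every guessed title unconditionally. Cutting at a word boundary instead of a fixed count would be a possible refinement, but nothing in this thread says the fix does that.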