Bug 125150 - PDF files are not searchable
Status: RESOLVED UNMAINTAINED
Alias: None
Product: kdeprint
Classification: Unmaintained
Component: general
Version: 3.3.2
Platform: unspecified Linux
Importance: NOR normal
Target Milestone: ---
Assignee: KDEPrint Devel Mailinglist
 
Reported: 2006-04-08 04:53 UTC by Greg Hartman
Modified: 2011-05-27 18:16 UTC (History)

Attachments
"Print to file (PDF)" special printer screenshot (39.26 KB, image/png)
2007-01-12 23:21 UTC, Kurt Pfeifle
"Print to file (PDF)" special printer: click "Properties --> Driver Settings" and set it up to embed all fonts fully (144.17 KB, image/png)
2007-01-12 23:22 UTC, Kurt Pfeifle
kpdf showing this very web page printed with KDE "Print to file (PDF)" being searchable (66.17 KB, image/png)
2007-01-12 23:23 UTC, Kurt Pfeifle

Description Greg Hartman 2006-04-08 04:53:18 UTC
Version:           3.3.2 (using KDE 3.3.2,  (3.1))
Compiler:          gcc version 3.3.5 (Debian 1:3.3.5-13)
OS:                Linux (i686) release 2.6.8-3-386

If I attempt to save a web page as a PDF file:

1. Browse to an interesting page
2. Choose Print from the Location menu
3. Choose Save To File (PDF)
4. Press the Print button

the resulting file is not searchable. If I run pdftotext on the resulting file, I get no output. Normally I would expect to get a text representation of the content of the web site.

When I encountered this problem in other web browsers I was able to work around it by disabling the FreeType 2 library. My guess is that FreeType 2 is doing something to the Postscript output for printing that obscures the characters.
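
The pdftotext check described above can be scripted. This is only a sketch: it assumes pdftotext is on the PATH, and the function names are made up for illustration. An output that contains nothing but whitespace or page breaks is treated as "not searchable".

```python
import subprocess


def has_extractable_text(extracted: str) -> bool:
    """Return True if the extracted text holds anything besides whitespace.

    pdftotext may emit form feeds between pages even when a page has no
    text, so whitespace-only output counts as empty.
    """
    return bool(extracted.strip())


def pdf_is_searchable(pdf_path: str) -> bool:
    """Run pdftotext on pdf_path and report whether any text came out.

    Passing "-" as the output file makes pdftotext write to stdout.
    """
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        capture_output=True,
        text=True,
        errors="replace",  # do not fail on bytes that are not valid text
        check=True,
    )
    return has_extractable_text(result.stdout)
```

Calling pdf_is_searchable("page.pdf") on a file produced by the steps above would be expected to return False while the bug is present.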
Comment 1 Philip Rodrigues 2006-04-08 11:01:31 UTC
Do you get the same result if you print to pdf from other KDE applications?
Comment 2 Greg Hartman 2006-04-10 18:30:21 UTC
I tried printing from Kontact and encountered the same problem.

I then tried printing to a Postscript file from Firefox (a non-KDE application) and running it through kprinter and printing to a PDF. The resulting PDF was searchable.

Does this mean that KDE applications are generating Postscript files that don't use the show operator to display strings?
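
One rough way to look into the question above is to count the text-painting operators in the generated PostScript file. The sketch below is an assumption-laden heuristic, not a PostScript parser: it strips simple (non-nested) string literals and comments line by line, then counts bare show-family operator names. A driver may wrap show in its own short procedure names, and text can be unsearchable for other reasons (for example fonts without usable names or encodings), so the counts are only a hint.

```python
import re
from collections import Counter

# Text-painting operators defined by the PostScript language.
SHOW_OPERATORS = {
    "show", "ashow", "widthshow", "awidthshow", "kshow",
    "cshow", "xshow", "yshow", "xyshow", "glyphshow",
}

# Simple string literal "(...)" without nested parentheses.
STRING_LITERAL = re.compile(r"\((?:\\.|[^()\\])*\)")
# A token, optionally starting with "/" (a literal name such as /show).
TOKEN = re.compile(r"/?[^\s()<>\[\]{}/%]+")


def count_show_operators(postscript_source: str) -> Counter:
    """Count occurrences of show-family operators outside strings and comments."""
    counts = Counter()
    for line in postscript_source.splitlines():
        code = STRING_LITERAL.sub(" ", line)  # drop string contents first
        code = code.split("%", 1)[0]          # then drop the comment part
        for token in TOKEN.findall(code):
            if token in SHOW_OPERATORS:       # "/show" is skipped: it is a name
                counts[token] += 1
    return counts
```

Running it on the PostScript from Firefox and on the PostScript from a KDE application would allow a side-by-side comparison of the two outputs.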
Comment 3 m.wege 2006-07-26 08:08:01 UTC
I have a question: Is there any way to get around this bug? I would like to create a PDF from my kaddressbook.
Comment 4 Kurt Pfeifle 2007-01-12 23:20:07 UTC
Mark, Greg, Phil:

This should work!

However, you need to enable font embedding into PostScript files. To be sure, enable it first for Qt. (Run "qtconfig")

Then enable it in KDEPrint as well. Run "kaddprinterwizard --kdeconfig", click "Fonts" in left column, configure it...

Last, once you've selected the special printer "Print to file (PDF)" in the print dialog, click "Properties" and go to the "Driver Settings" tab. Set "Embed all fonts", "Embed complete font" and "Maximum subset" to 1 (!!!).

It may fail with older Ghostscript versions. It should work with newer ones, like ESP Ghostscript 8.15.

See attached screenshots. They show how I access this very site with Konqueror, print it to PDF and open the result in kpdf, searching it for "searchable".

It may fail with websites that use fonts which are not available for embedding (your system may replace them with pixmap fonts, which are not really searchable). One such example on my system is Google, though I haven't yet had time to investigate the details.

Cheers,
Kurt
Comment 5 Kurt Pfeifle 2007-01-12 23:21:20 UTC
Created attachment 19251 [details]
"Print to file (PDF)" special printer screenshot
Comment 6 Kurt Pfeifle 2007-01-12 23:22:42 UTC
Created attachment 19252 [details]
"Print to file (PDF)" special printer: click "Properties --> Driver Settings" and set it up to embed all fonts fully
Comment 7 Kurt Pfeifle 2007-01-12 23:23:57 UTC
Created attachment 19253 [details]
kpdf showing this very web page printed with KDE "Print to file (PDF)" being searchable
Comment 8 Kurt Pfeifle 2007-01-12 23:26:28 UTC
Mark Wege,

tell me if this works for your kaddressbook.
Comment 9 m.wege 2007-01-13 09:09:04 UTC
Hi Kurt,
it works in general, but there are some strange things, and I do not know whether they are a KDE problem or not.
I ran the search with the Linux version of Adobe Reader: the search runs, but is not always successful. It does not seem to find words that contain double letters, like "Nonnendamm". When I search for another term near this entry, the result window shows the word reduced to "Nonendam". This happens with all words with double letters. The PDF displays correctly. Unfortunately KPDF does not support search, so I cannot tell whether it is a problem with the PDFs or with Adobe Reader.
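
To tell whether the dropped double letters come from the PDF itself or from the viewer, one option is to extract the text with pdftotext and look for the word both intact and with repeated letters collapsed. The helper names below are hypothetical and this is only a sketch of that comparison.

```python
import re


def collapse_repeated_letters(word: str) -> str:
    """Collapse each run of identical characters to a single character."""
    return re.sub(r"(.)\1+", r"\1", word)


def classify_match(expected: str, extracted_text: str) -> str:
    """Report how `expected` appears in text extracted from the PDF.

    "intact"    : the word is present as written
    "collapsed" : only the form with double letters removed is present
    "missing"   : neither form is present
    """
    if expected in extracted_text:
        return "intact"
    if collapse_repeated_letters(expected) in extracted_text:
        return "collapsed"
    return "missing"
```

If the extracted text yields "collapsed" for a word such as "Nonnendamm", the text layer of the PDF is the likelier culprit; "intact" would point toward the viewer's search.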
Comment 10 m.wege 2007-01-13 09:11:29 UTC
BTW: It is great that something is being done about this bug. Since this seems to be a settings problem, I hope the fix makes it into the distributions.
Comment 11 Kurt Pfeifle 2007-01-13 09:28:43 UTC
      "Unfortunately KPDF does not support search, so I can not tell, 
       if it is problem of the PDFs or Adobe Reader."

OF COURSE kpdf supports search!!!!! (Unless you are stuck with a very old version.)

Didn't you look at my screenshot http://bugs.kde.org/attachment.cgi?id=19253&action=view (showing exactly kpdf in search mode)? Look at the blue-ish highlight of the found word. The search field to type in the search string is on the top of the left thumbnail preview column.
Comment 12 Lars 2007-09-25 23:59:11 UTC
My experience on this subject:

Do NOT activate font embedding for PostScript files for Qt/KDE in qtconfig or kaddprinterwizard (as suggested in #4).
But DO activate font embedding for ghostscript in kprint -> Print to pdf -> properties -> driver settings (also suggested by Kurt in #4).

Background:
There are two steps to generate a pdf from konqueror:
1. konqueror generates postscript for a document (with Qt's ps driver)
2. ghostscript "converts" this postscript to a PDF file

Qt's PostScript driver seems to embed fonts improperly, as described here:
http://lists.kde.org/?l=kde&m=109064343317685&w=2
(only Type 3 and no names...)

So the PostScript generated by konqueror in the first step seems to contain improperly embedded fonts, and ghostscript (in step 2: PostScript -> PDF) can't "repair" this.

As soon as you deactivate font embedding for the first step (Qt's font embedding), ghostscript can THEN do a proper font embedding in the second step.
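
Step 2 can also be run by hand on a PostScript file saved from step 1. The sketch below only builds a Ghostscript command line. EmbedAllFonts and SubsetFonts are pdfwrite parameters, but how they correspond to the "Embed all fonts" and "Embed complete font" entries in the KDEPrint driver settings is an assumption, and the function name and file names are made up.

```python
def build_pdfwrite_command(ps_path: str, pdf_path: str,
                           embed_all_fonts: bool = True,
                           subset_fonts: bool = False) -> list:
    """Build an argv list that converts PostScript to PDF with Ghostscript."""
    def flag(value: bool) -> str:
        return "true" if value else "false"

    return [
        "gs",
        "-q", "-dBATCH", "-dNOPAUSE", "-dSAFER",
        "-sDEVICE=pdfwrite",
        # Ask Ghostscript to embed every font it can find.
        f"-dEmbedAllFonts={flag(embed_all_fonts)}",
        # Assumption: "Embed complete font" corresponds to disabling subsetting.
        f"-dSubsetFonts={flag(subset_fonts)}",
        f"-sOutputFile={pdf_path}",
        "-f", ps_path,
    ]
```

The resulting list can be passed to subprocess.run(cmd, check=True). Whether the output is searchable still depends on the fonts in the input PostScript, as described above.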


The PDFs I generate this way are all searchable in kpdf and contain properly embedded fonts (in my case with konqueror: TrueType fonts with proper font names).
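
The font types in a generated PDF can be checked with pdffonts, which is not mentioned in this report but ships in the same tool set as pdftotext. The sketch below assumes its usual output layout (a header row, a dashed separator, then one line per font with the type in a column) and simply picks out the lines that mention Type 3. The function name is hypothetical.

```python
def type3_font_lines(pdffonts_output: str) -> list:
    """Return the font lines of pdffonts output that report a Type 3 font."""
    lines = pdffonts_output.splitlines()
    font_lines = lines[2:]  # skip the header row and the dashed separator
    return [line for line in font_lines if "Type 3" in line]
```

An empty result for a PDF made with Qt's font embedding switched off, and a non-empty one with it switched on, would be consistent with the explanation above.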
Comment 13 Greg Hartman 2007-09-26 16:34:53 UTC
I followed the advice in comment #4 and still got unsearchable PDFs. In light of this I gave up on the bug until I read comment #12 this morning.

Disabling font embedding in qtconfig fixed the PDF files.

My guess is that this indicates that Qt's font embedding is broken, so I'm going to reopen the bug.

I'm currently running:

Debian Etch
qt 3.3.7-4
ESP Ghostscript 8.15.3
konqueror  3.5.5a.dfsg.1-6
Comment 14 Cristian Tibirna 2007-09-27 06:03:44 UTC
Maybe open it against Qt? KDEPrint offers the ability to generate fully usable PDFs, even where the fonts used aren't available. Unfortunately, the fact that the PDF resulting from Qt's output isn't searchable isn't KDEPrint's problem to solve.

Thanks for your help.
Comment 15 Philipp Sternberg 2007-10-17 15:42:23 UTC
Well yeah, I agree with comment #14: the bug should be (re)opened against Qt (at least as a wish), since Qt's PostScript driver is apparently the guilty party in this mess.
As stated in the mailing list thread mentioned in #12 (http://lists.kde.org/?l=kde&m=109064343317685&w=2)
'the Qt PostScript driver will not embed TrueType fonts as Type42
so if you embed TrueType fonts with any KDE application and make a PDF, the
TrueType fonts will not be scalable and will look BAD.' (and apparently the ability to search the text gets lost with this too)

For me, the approach in #12 worked: I get less ugly PDFs that are searchable.
Alternatively, instead of disabling font embedding in qtconfig and kaddprinterwizard, you can tell the program you print from to use a Type 1 font rather than a TrueType font (which Qt embeds as Type 3). However, that program must allow you to select a font (as kwrite does).
You may run "kcmshell kcmfontinst" as root to see which fonts are of which type (Type 1 or not).

cheers,

Phil

Comment 16 John Layt 2011-05-27 18:16:09 UTC
KDEPrint is obsolete, unmaintained and will never be revived.  Closing all open bugs.