Bug 376692 - Search function fails to find phrases split over two lines
Status: RESOLVED DUPLICATE of bug 418520
Alias: None
Product: okular
Classification: Applications
Component: general
Version: unspecified
Platform: Ubuntu Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Okular developers
URL:
Keywords: usability
Depends on:
Blocks:
 
Reported: 2017-02-20 00:39 UTC by Tom Colley
Modified: 2020-03-28 18:15 UTC
CC List: 3 users

See Also:
Latest Commit:
Version Fixed In:


Attachments
The example text formatted as a PDF file. (9.81 KB, application/pdf)
2017-02-23 03:58 UTC, Tom Colley

Description Tom Colley 2017-02-20 00:39:06 UTC
When using the search function to find phrases in PDFs, I've found that Okular fails to find phrases split over two lines. For example, if I search the following text in a PDF for "United States", it finds only the first instance, not the other two.

   I love the United States. My friend said she loves the United 
   States too but wouldn't want to live there. I live in the Unit-
   ed States.

Since the need to find phrases whether they are split over two lines or not is almost universal, I think this feature would be widely appreciated. Thank you!

I am using Okular 0.19.3 (KDE Development Platform 4.13.3) on Ubuntu 14.04, which is the latest version currently available in the Ubuntu standard repositories.
Comment 1 Albert Astals Cid 2017-02-20 23:35:45 UTC
Please attach such a file
Comment 2 Tom Colley 2017-02-23 03:58:43 UTC
Created attachment 104179
The example text formatted as a PDF file.

I've attached an example PDF file, using the example text given previously. The PDF contains this text:

   I love the United States. My friend said she loves the United 
   States too but wouldn't want to live there. I live in the Unit-
   ed States.

I revise my diagnosis. When searching for "United States", Okular will find the FIRST and THIRD instances of the phrase, but NOT the SECOND instance. The second instance involves two words in the same paragraph split only by a line break.
Comment 3 Albert Astals Cid 2017-03-03 23:02:42 UTC
Searching code is hard given we basically have to guess what's in the PDF.
Comment 4 Tom Colley 2017-03-03 23:35:28 UTC
(In reply to Albert Astals Cid from comment #3)
> Searching code is hard given we basically have to guess what's in the PDF.

Hi Albert - I don't understand your comment. This request is not about searching code; it's about searching for text (a phrase) in a PDF file. By a phrase, I mean two or more words separated by spaces. The PDF creation process tends to eliminate spaces at the end of lines, and Okular's search function doesn't appear to take this into account. I guess I'm asking for a search function that can interpret line breaks as spaces, though the implementation would have to be more considered. Some attention has already been given to this in Okular's search function, as shown by the way searching the provided PDF for:

"united states"

finds the third instance, which is formatted with a dash and a line break:

"Unit-
ed States"

So the search function will find instances of phrases split by a dash and a line break, but not phrases separated by a line break only!

Effective searching for phrases is a feature most users will find helpful.
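
To illustrate the request, here is a minimal Python sketch of a phrase search that treats a line break between words as a space and tolerates a hyphen plus line break inside a word. It is not Okular's search code, and it assumes the PDF text has already been extracted as a plain string with "\n" at line ends, which is the hard part for a real PDF. The names SOFT_BREAK, phrase_pattern and find_phrase are made up for this example.

```python
import re

# Hypothetical helper: an optional "hyphen + line break" that may appear
# between any two characters of a word (e.g. "Unit-\ned").
SOFT_BREAK = r"(?:-[ \t]*\n[ \t]*)?"


def phrase_pattern(phrase):
    """Build a regex that matches the phrase across line breaks."""
    words = phrase.split()
    # Inside a word, allow a hyphenated line break between characters.
    word_patterns = [
        SOFT_BREAK.join(re.escape(ch) for ch in word) for word in words
    ]
    # Between words, accept any run of whitespace, including a line break.
    return re.compile(r"\s+".join(word_patterns), re.IGNORECASE)


def find_phrase(text, phrase):
    """Return the start offsets of every match of phrase in text."""
    if not phrase.split():
        return []
    return [m.start() for m in phrase_pattern(phrase).finditer(text)]


# The example text from this report, with explicit line breaks.
sample = (
    "I love the United States. My friend said she loves the United \n"
    "States too but wouldn't want to live there. I live in the Unit-\n"
    "ed States.\n"
)

matches = find_phrase(sample, "United States")
print(len(matches))  # all three instances are found
```

A plain substring search such as sample.count("United States") finds only the first instance in this string, because the other two are interrupted by a line break.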
Comment 5 Albert Astals Cid 2017-03-05 18:18:34 UTC
I understand what you're saying; it's you who doesn't understand what I'm saying.

I was just saying how making code to search text in a pdf is not as easy as you would think since we have to guess what's in the PDF.
Comment 6 Tom Colley 2017-03-06 00:32:13 UTC
(In reply to Albert Astals Cid from comment #5)
> I was just saying how making code to search text in a pdf is not as easy as
> you would think since we have to guess what's in the PDF.

I see. I know these things often look simpler on the surface. Thanks for clarifying.
Comment 7 Nate Graham 2020-03-28 18:15:55 UTC
Just fixed; see Bug 418520!

*** This bug has been marked as a duplicate of bug 418520 ***