SUMMARY
Searching a PDF document in Okular does not match any accented character (diacritics)

STEPS TO REPRODUCE
1. Open PDF document containing an accented character like à
2. Search for "à"

OBSERVED RESULT
Search finds no match.

EXPECTED RESULT
Find a match.

SOFTWARE/OS VERSIONS
Linux/KDE Plasma: Mageia 7
KDE Frameworks Version: 5.57.0
Qt Version: Qt 5.12.6 (built against 5.12.2)

ADDITIONAL INFORMATION
This is a different problem from Bug #274933.
file please?
Created attachment 127151 [details] Screenshot showing search result for "e" Searching for e in case-insensitive mode seems to show that the accented character is interpreted as two characters, with only the first one matching.
Created attachment 127152 [details] PDF file with a series of accented characters
Another point I noticed: when using the Selection tool and selecting an accented character, the context menu offers to copy two characters.
All the needed data supplied. Conflicting bug report: https://bugs.kde.org/show_bug.cgi?id=274933
To connect this with bug #274933: if the text contains "aé éa" then searching for "ae" matches the first word (and highlights the "a" and only one half of the "é" character as in the screenshot). Searching for "ea" does not find a match, I assume because of the virtual, unmatchable second character of "é".
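The matching behaviour described above can be modelled with a short Python sketch. It assumes, hypothetically, that the viewer extracts each "é" as the base letter followed by a separate accent character; U+00B4 is used here only as a stand-in, since the actual code point in this PDF's text layer is not confirmed in this report.

```python
# Hypothetical model of the extracted text layer: each "é" is stored as
# the base letter "e" followed by a separate accent character.
ACCENT = "\u00b4"  # stand-in for the accent glyph (an assumption)

# What the text "aé éa" might look like after extraction.
extracted = f"ae{ACCENT} e{ACCENT}a"


def find(haystack: str, needle: str) -> int:
    """Plain substring search; returns the match index or -1."""
    return haystack.find(needle)


# "ae" matches the "a" plus the base letter of "é" in the first word.
ae_pos = find(extracted, "ae")

# "ea" finds nothing: the accent character sits between "e" and "a".
ea_pos = find(extracted, "ea")
```

Under this assumption, the search for "ae" succeeds at the start of the text while the search for "ea" fails, which is consistent with the half-highlighted "é" in the screenshot.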
I'm sorry, but this is not a bug. The PDF is simply not created correctly: it is created with an A and then a ` on top of it as two separate characters, not with a single À character. That's why search fails and why copy & paste gives you two characters: there are two characters. I have not been able to find any PDF viewer that can search for à in this document. (Adobe Reader cheats: since it can't find any à, it falls back to matching every a in the document, and also matches Ä, for example.)
The PDF is generated by pdflatex.

I still think that is a bug, because one way I can start such a search is by copying accented characters from the document and pasting them into the search box. I don't know if that is one or two characters but whatever that string is in the PDF, I'd like to search for it.

> I have not been able to find any PDF viewer that can search à in this document

I have. It was the first one I tried: evince.
Created attachment 127309 [details] Search for "à" in the document using evince
(In reply to Jerome from comment #8)
> The PDF is generated by pdflatex.

I know LaTeX has too many configuration options, and one of them stupidly does it wrong. Search around, because there's one that does it right and writes a single character.

> I still think that is a bug, because one way I can start such a search is by
> copying accented characters from the document and pasting them into the
> search box. I don't know if that is one or two characters but whatever that
> string is in the PDF, I'd like to search for it.
>
> > I have not been able to find any PDF viewer that can search à in this document
>
> I have. It was the first one I tried: evince.
Indeed, that PDF document was encoded as OT1, which is not recommended, and the search works when the document is encoded with T1. What I find strange is that when copied and pasted into the search box, the pair of characters (letter + diacritic) is correctly interpreted and, I assume, converted to its Unicode equivalent.
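As an illustration of how a letter + diacritic pair can be converted to a single Unicode character, here is a minimal Python sketch using NFC normalization. It assumes the diacritic is a combining character (U+0300); whether Okular or the search box does anything like this is not established in this report, and a spacing accent such as a plain ` would not be composed this way. The helper names are hypothetical.

```python
import unicodedata

precomposed = "\u00e0"  # "à" as a single code point
decomposed = "a\u0300"  # "a" followed by COMBINING GRAVE ACCENT


def nfc(text: str) -> str:
    """Compose base letters and combining marks where possible."""
    return unicodedata.normalize("NFC", text)


def normalized_contains(haystack: str, needle: str) -> bool:
    """Substring test after normalizing both sides to NFC."""
    return nfc(needle) in nfc(haystack)


# A raw comparison fails: the code point sequences differ.
raw_match = precomposed in ("voil" + decomposed)

# After normalization, the decomposed pair matches the single character.
norm_match = normalized_contains("voil" + decomposed, precomposed)
```

Normalizing both the query and the searched text is one way to make the two spellings compare equal; it is shown here only as a sketch of the idea, not as what any particular viewer does.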