Bug 207748

Summary: RTL languages search text backwards
Product: [Applications] okular Reporter: Dotan Cohen <kde-2011.08>
Component: general Assignee: Okular developers <okular-devel>
Status: CONFIRMED ---    
Severity: wishlist CC: aacid, adam.golanski, dragon, eladhen2, Fahad.alsaidi, matitiahu.allouche, med.medin.2014, mh.firouzjah, munzirtaha, nadavkav, nate, ohadcn, olivier, overman.supermundane, postix, shimi.chen, simonandric5, syn_org939, tsm.7
Priority: NOR Keywords: rtl, usability
Version: unspecified   
Target Milestone: ---   
Platform: Ubuntu   
OS: Unspecified   
See Also: https://bugs.kde.org/show_bug.cgi?id=407133
https://bugs.kde.org/show_bug.cgi?id=439791
Latest Commit: Version Fixed In:
Sentry Crash Report:
Attachments: Hebrew-language PDF document.
arabic text

Description Dotan Cohen 2009-09-18 02:38:00 UTC
Version:            (using KDE 4.3.1)
Installed from:    Ubuntu Packages

When searching through Hebrew text, text is searched for backwards (probably due to the fact that visual Hebrew is used in PDF documents). If the text searched for is all Hebrew, then Okular could reverse the order of the letters when searching.
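
The reversal suggested above can be sketched as follows. This is only an illustration in plain C++, not Okular code: `isHebrew` and `toVisualOrder` are hypothetical names, and the sketch assumes the query is made up solely of code points from the Unicode Hebrew block (U+0590 to U+05FF) and spaces. Anything else is returned unchanged, because mixed-direction text would need the full Unicode Bidirectional Algorithm.

```cpp
#include <string>

// Hypothetical helper: true if the code point lies in the Unicode
// Hebrew block (U+0590..U+05FF).
bool isHebrew(char32_t c) {
    return c >= 0x0590 && c <= 0x05FF;
}

// Hypothetical helper: reverse a query that consists only of Hebrew
// code points and spaces. Any other query is returned unchanged.
std::u32string toVisualOrder(const std::u32string& query) {
    bool hasHebrew = false;
    for (char32_t c : query) {
        if (isHebrew(c)) {
            hasHebrew = true;
        } else if (c != U' ') {
            return query;  // mixed content: out of scope for this sketch
        }
    }
    if (!hasHebrew) {
        return query;
    }
    // Pure-RTL text: visual order is the logical order reversed.
    return std::u32string(query.rbegin(), query.rend());
}
```

A search routine could try both the original query and `toVisualOrder(query)`, so that documents storing logical order keep working.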
Comment 1 Pino Toscano 2009-10-06 19:19:43 UTC
Can you attach a sample document showing the issue?
Comment 2 Dotan Cohen 2009-11-28 17:26:08 UTC
Created attachment 38661 [details]
Hebrew-language PDF document.

All Hebrew PDF documents display the issue. Here is one.
Comment 3 Gadi Cohen 2010-01-17 20:53:45 UTC
I can confirm this.

I don't think the fix is that complicated either.

I'm not really familiar with the libraries (and I'm a GNOME user), but a quick search reveals that KDE has had reliable BiDi support since 3.0.1. In particular, I found this function:

QCString QHebrewCodec::fromUnicode ( const QString & uc, int & lenInOut ) const [virtual]

(from http://doc.trolltech.com/3.3/qhebrewcodec.html)

My guess is that if the search string were piped through it before the search takes place, it would fix the problem.

(Although there might be a more suitable Qt bidi function which fixes Hebrew, Arabic and all other RTL languages in one go.)

Gadi
Comment 4 Dotan Cohen 2010-01-17 21:30:18 UTC
It seems that bug #128609 is for the same issue, but for KPDF instead of Okular. One of these bugs should be marked as a duplicate of the other. I will leave it to the devs to decide which. Thanks.
Comment 5 Matitiahu Allouche 2010-01-18 22:29:06 UTC
PDF's objective is to reflect the exact appearance of text.  For Hebrew, it means that the glyphs are stored in visual order.  If your PDF viewer accepts user input in logical order (which is the case in Windows and Linux), it should transform search arguments (captured from a user dialog) from logical to visual order before performing the search.
For Arabic, there is the additional issue that the glyphs represent letter shapes, and you must perform "shaping", in addition to reordering, on the search arguments to choose the proper glyphs for each Arabic letter.
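
The shaping step described above can be illustrated with a deliberately tiny sketch. It assumes the PDF text uses code points from Unicode's Arabic Presentation Forms-B block, which may not hold for every document. `Forms`, `kForms` and `shapeWord` are hypothetical names, and the table covers only three dual-joining letters (beh, seen, meem). A real implementation would need the full joining rules, including letters that do not connect to the following letter.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Presentation forms of one dual-joining Arabic letter.
struct Forms {
    char32_t isolated;
    char32_t final_;
    char32_t initial;
    char32_t medial;
};

// Illustrative table only: three letters from Arabic Presentation Forms-B.
const std::map<char32_t, Forms> kForms = {
    {0x0628, {0xFE8F, 0xFE90, 0xFE91, 0xFE92}},  // BEH
    {0x0633, {0xFEB1, 0xFEB2, 0xFEB3, 0xFEB4}},  // SEEN
    {0x0645, {0xFEE1, 0xFEE2, 0xFEE3, 0xFEE4}},  // MEEM
};

// Hypothetical helper: pick the positional form of each letter in a word
// made up only of letters from kForms. Any other input is returned as is.
std::u32string shapeWord(const std::u32string& word) {
    std::u32string out;
    for (std::size_t i = 0; i < word.size(); ++i) {
        auto it = kForms.find(word[i]);
        if (it == kForms.end()) {
            return word;  // letter not covered by this sketch
        }
        const Forms& f = it->second;
        const bool first = (i == 0);
        const bool last = (i + 1 == word.size());
        out += (first && last) ? f.isolated
             : first           ? f.initial
             : last            ? f.final_
                               : f.medial;
    }
    return out;
}
```

The output is still in logical order; reordering to visual order is a separate step.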
Comment 6 ohad cohen 2013-01-15 07:34:28 UTC
It might be related to https://bugs.kde.org/show_bug.cgi?id=184399
Comment 7 Albert Astals Cid 2014-03-05 23:55:41 UTC
*** Bug 282849 has been marked as a duplicate of this bug. ***
Comment 8 Albert Astals Cid 2014-03-05 23:56:53 UTC
*** Bug 331785 has been marked as a duplicate of this bug. ***
Comment 9 Fahad Al-Saidi 2015-09-29 05:44:11 UTC
Here is a quick patch to fix this problem. 
https://git.reviewboard.kde.org/r/125442/

Thanks
Comment 10 Fahad Al-Saidi 2016-07-21 09:30:46 UTC
This bug needs to be retested against Poppler >= 0.40 because of this:
https://bugs.freedesktop.org/show_bug.cgi?id=55977
Comment 11 Olivier Churlaud 2016-07-21 09:41:24 UTC
Using Poppler 0.42.0, typing Hebrew switches the search box to right-to-left, but I must type the word left to right (so backwards) for it to match.
Comment 12 Fahad Al-Saidi 2016-07-21 09:47:37 UTC
Created attachment 100228 [details]
arabic text
Comment 13 Fahad Al-Saidi 2016-07-21 09:49:24 UTC
You can search for the word "بسم" in the attached Arabic text PDF.
If you find it, the bug is fixed upstream; otherwise the problem is in Okular.
Comment 14 Olivier Churlaud 2016-07-21 09:50:37 UTC
See my comment about Hebrew: it didn't work, for the reasons given there.
Comment 15 Elad Hen 2016-11-21 15:05:01 UTC
This bug is still present in Mint Cinnamon 18 (and presumably in all of the Ubuntu 16.04 family). Note that the similar bugs in Evince, Atril and some others, which stemmed from Poppler, are fixed as of Ubuntu 16.04/Mint 18.
Comment 16 Christoph Feck 2017-11-23 02:16:44 UTC
*** Bug 386468 has been marked as a duplicate of this bug. ***
Comment 17 Fahad Al-Saidi 2017-11-23 09:50:30 UTC
This bug also affects copying RTL text: the copied text is reversed.
Comment 18 Nate Graham 2018-02-04 16:09:08 UTC
Fahad submitted a patch for this, which I've migrated to Phabricator:

https://phabricator.kde.org/D10298
Comment 19 Fahad Al-Saidi 2018-02-08 08:38:30 UTC
I think the problem comes from the Qt interface for Poppler. Please see this bug:

https://bugs.freedesktop.org/show_bug.cgi?id=105015
Comment 20 Fahad Al-Saidi 2018-02-11 17:57:10 UTC
I think I've found where the problem is. It comes from TextPagePrivate::correctTextOrder(), which sorts words and characters into LTR order using compareTinyTextEntityY and compareTinyTextEntityX.

This approach doesn't fit with RTL text.
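
To illustrate why a fixed left-to-right sort breaks RTL text, here is a minimal, self-contained sketch. It is not the Okular implementation: `Entity` and `sortReadingOrder` are hypothetical stand-ins, and the sketch assumes the direction of the text is already known and that entities on the same line share the same `y` value.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for a positioned character or word on a page.
struct Entity {
    char32_t ch;
    double x;  // horizontal position of the left edge
    double y;  // vertical position of the line
};

// Sort entities into reading order: lines from top to bottom, and within
// a line by ascending x for LTR text or descending x for RTL text.
void sortReadingOrder(std::vector<Entity>& entities, bool rtl) {
    std::stable_sort(entities.begin(), entities.end(),
                     [rtl](const Entity& a, const Entity& b) {
                         if (a.y != b.y) {
                             return a.y < b.y;
                         }
                         return rtl ? a.x > b.x : a.x < b.x;
                     });
}
```

With `rtl` set to false, a Hebrew line comes out in visual order, which matches the reversed search and copy results reported in this bug.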
Comment 21 Fahad Al-Saidi 2018-02-12 09:31:51 UTC
I proposed another patch to fix this bug here:

https://phabricator.kde.org/D10455
Comment 22 Laura David Hurka 2021-06-22 11:39:56 UTC
*** Bug 429869 has been marked as a duplicate of this bug. ***
Comment 23 mh-firouzjah 2022-05-28 15:41:18 UTC
Same problem for another RTL language, Persian.

Linux/KDE Plasma: 5.15.38-1-Manjaro(64-bit)
(available in About System)
KDE Plasma Version: 5.24.5
KDE Frameworks Version: 5.93
Qt Version: 5.15.3
Comment 24 Laura David Hurka 2022-08-05 14:36:29 UTC
*** Bug 442046 has been marked as a duplicate of this bug. ***
Comment 25 Laura David Hurka 2022-08-05 14:37:32 UTC
*** Bug 457448 has been marked as a duplicate of this bug. ***