Bug 445851 - Don't copy newlines within paragraph
Status: REPORTED
Alias: None
Product: okular
Classification: Applications
Component: general
Version: 21.08.3
Platform: Arch Linux Linux
Importance: NOR wishlist
Target Milestone: ---
Assignee: Okular developers
Reported: 2021-11-21 05:03 UTC by codingkoopa
Modified: 2023-04-12 14:30 UTC

Description codingkoopa 2021-11-21 05:03:56 UTC
SUMMARY
When a selection of text spanning multiple lines is copied, the newlines are included. This has the effect of including newlines in the middle of sentences, which is undesirable when copying text from the PDF to a new document.

STEPS TO REPRODUCE
1. Obtain a PDF containing a paragraph of text, such as this one: https://unec.edu.az/application/uploads/2014/12/pdf-sample.pdf.
2. Copy an entire paragraph of text, or a selection within the paragraph spanning multiple lines.
3. Paste the selection into a new document or text editor.

OBSERVED RESULT
Newlines used to break the content are preserved:

Adobe® Portable Document Format (PDF) is a universal file format that preserves all
of the fonts, formatting, colours and graphics of any source document, regardless of
the application and platform used to create it.

EXPECTED RESULT
Newlines used to break the content are not preserved:

Adobe® Portable Document Format (PDF) is a universal file format that preserves all of the fonts, formatting, colours and graphics of any source document, regardless of the application and platform used to create it.

SOFTWARE/OS VERSIONS
Linux/KDE Plasma: Arch Linux
KDE Plasma Version: 5.23.3
KDE Frameworks Version: 5.88.0
Qt Version: 5.15.2

ADDITIONAL INFORMATION
The pdf.js PDF viewer elides the newlines as I want, but butchers the spacing in seemingly unrelated ways:

Adobe® Portable Document Format (PDF) is a universal file format that preserves allof  the  fonts,  formatting,  colours  and  graphics  of  any  source  document,  regardless  ofthe application and platform used to create it.

Bug #359242 also discusses unwanted newlines in the clipboard, but this bug discusses the exclusion of newlines that *are* within the text selection.
Comment 1 Laura David Hurka 2021-11-21 15:06:58 UTC
> Newlines used to break the content are preserved:

This is not exactly what happens. The Okular user interface does not know about newlines or paragraphs; it only knows the positions of individual letters. If a letter is below the previous one, it inserts a newline into the selection.
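
For illustration, the position-based behaviour described here could look roughly like the sketch below. This is not Okular's actual code: the `Glyph` type, the `selection_to_text` function and the `line_tolerance` parameter are hypothetical names made up for this example, and the coordinates are assumed to increase downward on the page.

```python
from typing import Iterable, NamedTuple


class Glyph(NamedTuple):
    """Hypothetical stand-in for one positioned letter on a page."""
    char: str
    x: float  # horizontal position
    y: float  # vertical position, assumed to grow downward


def selection_to_text(glyphs: Iterable[Glyph], line_tolerance: float = 2.0) -> str:
    """Build clipboard text from positioned letters only.

    A newline is inserted whenever a letter sits lower than the previous
    one by more than line_tolerance; nothing here knows about paragraphs.
    """
    parts = []
    previous = None
    for glyph in glyphs:
        if previous is not None and glyph.y - previous.y > line_tolerance:
            parts.append("\n")
        parts.append(glyph.char)
        previous = glyph
    return "".join(parts)


glyphs = [
    Glyph("a", 0, 10), Glyph("l", 5, 10), Glyph("l", 10, 10),
    Glyph("o", 0, 22), Glyph("f", 5, 22),
]
print(repr(selection_to_text(glyphs)))  # 'all\nof'
```

With only positions available, a line wrap inside a paragraph and a real paragraph break look the same to this kind of logic, which is why the newline ends up in the copied text.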

Besides that, I think this behavior should not change, at least not for PDF. If newlines were not copied, the selection would still contain the line-break hyphens. Like this: Because every-thing is on one line, it will be diffi-cult to remove the hyphens manual-ly afterwards.

;)
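
The hyphen problem from Comment 1 can be reproduced on already-copied text with a small sketch. It works on plain strings, not on anything inside Okular; `unwrap_naive` and `unwrap_dehyphenate` are hypothetical helper names, and the dehyphenation rule (a lowercase ASCII letter, a hyphen, a line break, another lowercase ASCII letter) is only an assumed heuristic.

```python
import re


def unwrap_naive(text: str) -> str:
    """Join wrapped lines with single spaces; line-break hyphens stay in."""
    return " ".join(line.strip() for line in text.splitlines() if line.strip())


def unwrap_dehyphenate(text: str) -> str:
    """Remove a hyphen that ends a line between two lowercase letters, then join.

    Heuristic only: it cannot tell a line-break hyphen from a real one,
    so a compound such as "well-known" split across lines loses its hyphen.
    """
    joined = re.sub(r"(?<=[a-z])-[ \t]*\n[ \t]*(?=[a-z])", "", text)
    return unwrap_naive(joined)


wrapped = (
    "Because every-\n"
    "thing is on one line, it will be diffi-\n"
    "cult to remove the hyphens manual-\n"
    "ly afterwards."
)
print(unwrap_naive(wrapped))
# Because every- thing is on one line, it will be diffi- cult to remove the hyphens manual- ly afterwards.
print(unwrap_dehyphenate(wrapped))
# Because everything is on one line, it will be difficult to remove the hyphens manually afterwards.
print(unwrap_dehyphenate("a well-\nknown format"))
# a wellknown format
```

The last call shows the edge case: once the newline is gone, there is no reliable way to decide whether a hyphen at the end of a line belonged to the word.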
Comment 2 codingkoopa 2021-11-28 05:51:52 UTC
Thanks for the response! :)

I did have a feeling that there is more going on here than meets the eye. This would still be convenient to have, but I understand if it's not worth the time to fix those edge cases.