Bug 402017

Summary: Cannot save PDF when loaded file has been deleted
Product: [Applications] okular Reporter: Jens <jens-bugs.kde.org>
Component: generalAssignee: Okular developers <okular-devel>
Status: RESOLVED FIXED    
Severity: grave CC: aacid, bugseforuns, chronoangel, grgbrn, grmat, joerge-e, jonas743, karl, kde-bugs, kingfati4, kishore96, luigi.toscano, mail, markjballard, martin.marmsoler, mathieu, nate, null, oliver.sander, pino, pv.bugzilla, richard.dwight, steve, winghongchan, yurchor
Priority: VHI Keywords: usability
Version: 1.5.3   
Target Milestone: ---   
Platform: Other   
OS: Linux   
Latest Commit: Version Fixed In: 23.04.1
Attachments: attachment-9638-0.html

Description Jens 2018-12-11 20:01:36 UTC
SUMMARY
I opened a PDF from a mail (i.e. it was downloaded to /tmp/messageviewer_XYZfoobar/whatever.pdf and then opened), and did some commenting and annotating. In the meantime, some cron job (I suppose) removed this folder from /tmp. 
Now I cannot "Save as ..." and I cannot "save" any more because Okular complains the file was "modified by another program".

I think since the PDF is still in memory (since I can see it and browse it and continue modifying it) it should be possible to write it to disk?

Using KDE Neon 18.04 with all updates applied.
Comment 1 Nate Graham 2018-12-11 23:02:29 UTC
When this happens, it should probably detect it and do a Save As operation rather than futilely try to save. Not sure if we could implement this everywhere in KIO or if every KDE app would need it patched in.
Comment 2 Albert Astals Cid 2018-12-12 23:02:26 UTC
Nate, please read the report more carefully, "Save As" doesn't work either.
Comment 3 Jens 2018-12-13 23:12:38 UTC
This is true. I can, however, "Print" to a new PDF file.
This is bad because it loses quality and embedded text and comments / annotations, but it proves the file is still available in memory and *could* be saved.
Comment 4 Albert Astals Cid 2019-01-06 19:12:58 UTC
*** Bug 402925 has been marked as a duplicate of this bug. ***
Comment 5 David Hurka 2019-05-25 00:06:52 UTC
Just discovered a class FileKeeper, which should do this.

Added in commit 6531398b919e58def8c6ba4711d20ea517147189, eleven years ago.

Just grep for #ifdef OKULAR_KEEP_FILE_OPEN
Comment 6 Nate Graham 2019-06-16 22:29:11 UTC
*** Bug 407583 has been marked as a duplicate of this bug. ***
Comment 7 Nate Graham 2019-06-16 22:29:23 UTC
*** Bug 408774 has been marked as a duplicate of this bug. ***
Comment 8 Richard Dwight 2020-01-23 15:06:56 UTC
(In reply to Nate Graham from comment #7)
> *** Bug 408774 has been marked as a duplicate of this bug. ***

I'm repeatedly running into this bug in my workflow - the most recent time after extensive annotation... (in which okular is excellent btw).  I had to reproduce annotations by hand - essentially lost work.  Extremely frustrating to know that the data is right there (can print, have the original pdf), but saving (including Save As) is forbidden.

Is there any chance of this being addressed?
Comment 9 Nate Graham 2020-01-29 19:16:48 UTC
More of a bug than a wishlist. Also, as this can cause data loss with no workaround, I'm upgrading the severity accordingly.

David, would you be interested in investigating why it doesn't work and submitting a patch to fix it?
Comment 10 David Hurka 2020-05-11 20:43:52 UTC
*** Bug 421342 has been marked as a duplicate of this bug. ***
Comment 11 bark mallard 2020-05-12 09:09:53 UTC
This bug caused Okular to lose pdf files opened from Firefox, using Okular v.1.3.3, installed from the standard Ubuntu 18.04.4 repository, for a fresh installation about a month ago.

Okular said it couldn't save pdfs because they had been modified by some other program. Those files were left open for a period of days before attempting to save them. The files were consequently lost.
Comment 12 Nate Graham 2020-05-12 19:52:42 UTC
Setting to VHI as this causes data loss.
Comment 13 David Hurka 2020-05-13 22:56:44 UTC
Please look at the following. It seems like keeping the file was implemented in 2008, specifically with Firefox in mind. But it broke something else, so it was disabled using an #undef. There is no way anymore to enable it with a compile switch or through an environment variable. It was also only implemented for UNIX OSes.

@Pino: Can you tell more about the problem? Can it be fixed, or do we need another approach than creating a file handle?

If we need another approach, I vote for creating a temporary full copy of the file, if
a) the file is smaller than a limit, set by the memory performance mode
b) (optionally) the file is from a remote location or has a path that is likely to be temporary (e. g. /tmp/, or **/.*/)

---- APPENDIX A: This is a self-descriptive heading for an appendix ----

commit 5b5b4cabe1d8e07392641ba9cf574eda4678b12f
Author: Pino Toscano <pino@kde.org>
Date:   Sat Jun 7 21:52:11 2008 +0000

    disable the open handle on the file, it breaks the file watching
    
    CCBUG: 163363
    
    svn path=/trunk/KDE/kdegraphics/okular/; revision=818144

diff --git a/part.cpp b/part.cpp
--- a/part.cpp
+++ b/part.cpp
@@ -182,8 +182,5 @@
 #undef OKULAR_KEEP_FILE_OPEN
-#if defined(Q_OS_UNIX)
-#  define OKULAR_KEEP_FILE_OPEN
-#endif
 
 #ifdef OKULAR_KEEP_FILE_OPEN
 static bool keepFileOpen()
 {

commit 6531398b919e58def8c6ba4711d20ea517147189
Author: Pino Toscano <pino@kde.org>
Date:   Sat Jun 7 21:08:41 2008 +0000

    Keep an open file handle on the local file currently open: this way, we can get it back from it,
    in case for some reason (read: Firefox blindly removing temporary files) it gets "deleted".
    Of course, this works (and thus it is activated) only on UNIX systems (as the file is not deleted for real until there are open handles on it).
    BUG: 163363
    (If not wanted, this behavior can be disabled by export'ing OKULAR_NO_KEEP_FILE_OPEN to 1.)
    
    Also, in case the local file gets deleted but the real document is remote, use its (remote) URL for the copy.
    
    svn path=/trunk/KDE/kdegraphics/okular/; revision=818136

diff --git a/part.cpp b/part.cpp
--- a/part.cpp
+++ b/part.cpp
@@ -124,0 +182,8 @@
+#undef OKULAR_KEEP_FILE_OPEN
+#if defined(Q_OS_UNIX)
+#  define OKULAR_KEEP_FILE_OPEN
+#endif
+
+#ifdef OKULAR_KEEP_FILE_OPEN
+static bool keepFileOpen()
+{
Comment 14 Nate Graham 2020-05-13 23:46:07 UTC
Temporarily wasting some memory or disk space is a far better alternative than data loss. :)
Comment 15 Yuri Chornoivan 2020-05-14 06:21:36 UTC
(In reply to David Hurka from comment #13)
> Please look at the following. It seems like keeping the file was implemented
> in 2008, specifically with Firefox in mind. But it broke something else, so
> it was disabled using an #undef. There is no way anymore to enable it with a
> compile switch or through an environment variable. It was also only
> implemented for UNIX OSes.
> 
> @Pino: Can you tell more about the problem? Can it be fixed, or do we need
> another approach than creating a file handle?

I guess it will break Kile and interactive editing of TeX. You will not be able to recompile TeX and watch the results on the side panel without reloading Kile itself. There will be a large hassle if you uncomment this (much larger than from poor Firefox users). ;)
Comment 16 Pino Toscano 2020-05-14 07:09:54 UTC
(In reply to David Hurka from comment #13)
> @Pino: Can you tell more about the problem? Can it be fixed, or do we need
> another approach than creating a file handle?

See bug 163363.

Holding an open descriptor on the document fixes the case that the document disappears (hello firefox, and apparently hello kmail too...) while it is opened in okular.

The big downside, which is also what I mentioned, is that file watching breaks, and reading the commit should give an hint on why: the handle is kept for the _old_ file, so when some PDF producer (latex, etc) creates a new file by rename()ing the new temporary file on the old name (which is an atomic operation), you don't notice that.
At least, I remember there were problems in the results you get from file watching because of this.

While fixing firefox to not remove files under applications' feet is not an easy task (12 years and still no change on that), why don't we fix at least our own products (kmail)?

Enabling FileKeeper again will not fix this situation, as you will trade one issue with another.

(In reply to Nate Graham from comment #14)
> Temporarily wasting some memory or disk space is a far better alternative
> than data loss. :)

This won't behave nicely in editing sessions e.g. in kile or in other document producers, where you edit the sources and save+recompile to PDF often, resulting in a big churn of file creations/removals, and in slowdown due to the blocking file saving.
Comment 17 Nate Graham 2020-05-14 15:19:27 UTC
(In reply to Pino Toscano from comment #16)
> This won't behave nicely in editing sessions e.g. in kile or in other
> document producers, where you edit the sources and save+recompile to PDF
> often, resulting in a big churn of file creations/removals, and in slowdown
> due to the blocking file saving.
Slowdowns and inefficiency are better than data loss.

Hopefully we can come up with a way to avoid both, though.
Comment 18 bark mallard 2020-05-14 15:38:13 UTC
Created attachment 128456 [details]
attachment-9638-0.html

Just came across another variation of this problem. I have many documents
open on many workspaces, managing various inter-related workfklows. It is
inevitable in this scenario that I sometimes open the same document twice.
I just did this with Okular and ended up with two versions of an annotated
pdf, one of which could not be saved. Okular said it could not save the pdf
because it had been modified by another program (it had been modified by
another instance of itself, actually). I found the open version and copied
the annotations across.

Text editors usually tell me when this happens and might give me a choice
whether to proceed in read-only or edit mode. (If they were very clever,
which they are not, they would tell me which workspace and which program
had the open doc, because it could be any number of editors in any number
of places). They also give me a choice of actions when the underlying doc
is modified by the other program: ie. ignore changes / reload document from
changed version / cancel and ignore the mismatch completely (therefore to
take the option, not given, of saving the now different version under a
different name). If they were really clever, they would give me the option
of viewing the differences in something comprehensible to humans (ie. not
diff), and pick which ones to keep. Or to merge the two documents and view
only deletes or conflicts before deciding what to do with them, But that is
another matter.

This has come up with Okular because I've started using it to annotate
pdfs. (Hurrah! at long, long, long, long last. this is such a relief! pdfs
- the most ignorant (albeit under-appreciated) form of digital document
known to man, that are like pieces of paper that people are allowed to read
only with their hands tied behind their backs, at last given primitive
intelligence). Okular does annotation very well.
Comment 19 Albert Astals Cid 2020-05-14 19:53:04 UTC
>  the handle is kept for the _old_ file, so when some PDF producer (latex, etc) creates a new file by rename()ing the new temporary file on the old name (which is an atomic operation), you don't notice that.
> At least, I remember there were problems in the results you get from file watching because of this.

This could be workaroundable (or already is?) by watching the parent dir as the parent dir would see the file disappear and be created. I remember some case in which this was neeed and i think it was working, but as said, needs somebody to test it extensively (ideally adding autotests :))
Comment 20 Albert Astals Cid 2020-05-14 19:56:32 UTC
The other thing is trying to figure out if we can just use poppler's file handle to save the data (that should still be open), last time i tried it didn't seem to be working well at all, but i have been known to make mistakes
Comment 21 David Hurka 2020-05-14 23:04:26 UTC
I still don’t completely understand the issue with FileKeeper. Is it:
a) FileKeeper keeps Okulars own file handle opened, so Okular itself will not notice changes to the file through *this* handle?
b) FileKeeper opens another file handle, and such an open handle prevents others (including other Okular code) to monitor the file?

In case of a, the solution would be that FileKeeper opens a new file handle, right? Then Okular could continue to monitor the file through the old handle.

In case of b and other OSes, it is probably necessary to copy the file to another temporary file, if considered appropriate.

Besides that, how does saving the file after annotating work? There must be some function in Poppler to save the loaded and modified file, right?
Comment 22 Albert Astals Cid 2020-05-16 16:10:48 UTC
(In reply to David Hurka from comment #21)
> Besides that, how does saving the file after annotating work? There must be
> some function in Poppler to save the loaded and modified file, right?

I don't understand what "save the loaded and modified file" means, sorry my brain can't parse it.

But if you grep for saving in the okular poppler generator you'll find PDFGenerator::save. You can pull the thread from there.
Comment 23 David Hurka 2020-06-15 20:21:34 UTC
> But if you grep for saving in the okular poppler generator you'll find
> PDFGenerator::save. You can pull the thread from there.

Thanks. The related error message for annotated documents is “There are unsaved changes, and the file 'file.pdf' has been modified by another program. Your changes will be lost, because the file can no longer be saved. Do you want to continue closing the file? [Continue Closing] [Abort Closing]“, which is quite frustrating in itself.

> > Temporarily wasting some memory or disk space is a far better alternative
> > than data loss. :)
> 
> This won't behave nicely in editing sessions e.g. in kile or in other document
> producers, where you edit the sources and save+recompile to PDF often, [...]

Good point. But if okularpart runs in reader mode, there is no need to save the document, so we don’t need an extra backup, right?

I wanted to suggest a size limit for such a backup, which could be linked to the Okular memory setting. Suggestion:
Low: Backup files below 2MB; if they are not in the home directory below 10MB
Normal (default): Backup files below 10MB; if they are not in the home directory below 20MB
Agressive: Backup files below 50MB
Greedy: Backup all files which fit into the free memory

And I just want to add: It’s not a problem in Firefox, but in Okular. Firefox says “Hey Okular, please open this file and make the user happy!”, and not “Hey Okular, please open this file and make the user happy. Oh, and please tell me when you don’t need it anymore!”
Comment 24 Albert Astals Cid 2020-06-15 20:27:29 UTC
(In reply to David Hurka from comment #23)
> And I just want to add: It’s not a problem in Firefox, but in Okular.
> Firefox says “Hey Okular, please open this file and make the user happy!”,
> and not “Hey Okular, please open this file and make the user happy. Oh, and
> please tell me when you don’t need it anymore!”

We disagree, from the moment firefox tells me "open this file", the file is mine, in my opinion anything firefox does to it it's bad manners
Comment 25 Nate Graham 2020-06-15 20:34:47 UTC
Is there a technically feasible way for Firefox to do something better?
Comment 26 Pino Toscano 2020-06-15 20:48:22 UTC
(In reply to Nate Graham from comment #25)
> Is there a technically feasible way for Firefox to do something better?

In the same way XDG desktop environment do it: by handling the XDG base directory and XDG desktop files specifications to detect which applications can handle a certain content type and/or a certain remote protocol.

See also https://bugzilla.mozilla.org/show_bug.cgi?id=415441#c12 -- maybe there is a better non-closed bug asking this, and if not it would be good to open it.
However, in general Mozilla does not care that much about integrating properly Firefox/etc in Unix desktops, as their main share of users is on Windows...
Comment 27 Nate Graham 2020-06-15 21:16:57 UTC
That's a shame. Thanks for the history.

If Firefox isn't interested in playing ball, then our only realistic option is work around it in Okular. This isn't ideal, but seems preferable to letting user data be destroyed for the sake of a principle (making the fix in the right place).
Comment 28 Pino Toscano 2020-06-15 21:45:05 UTC
(In reply to Nate Graham from comment #27)
> If Firefox isn't interested in playing ball, then our only realistic option
> is work around it in Okular. This isn't ideal, but seems preferable to
> letting user data be destroyed for the sake of a principle (making the fix
> in the right place).

The "principle" here is one of the factors, and not one of bigger ones.

This behaviour (i.e. remove the temporary files when closing tabs or exiting) affects _any_ application launched by Firefox: okular, kwrite/kate, gwenview, you name it. Adding the same workarounds in any application potentially launched by Firefox and used by users means adding more complexity to those applications (like proven by the discussion in this bug, and my attempt 12 years ago, bug 163363), and even more so when each application usually handles file reading in its own way, making even harder to "share" a potential workaround.

Also, what if you try to talk with Mozilla people about this, for example?
Comment 29 Sven Brauch 2020-06-15 22:05:23 UTC
I was repeatedly bitten by this issue in the past as well and I don't quite get it. Can Okular not simply keep an open file handle for the document, which results in ext4 not removing the data from the disk, even if the original file is removed (i.e. like file handles always work on Linux)?

As a fact, that is already the case (as I think Albert considered a few comments above), as you can see in lsof. Okular can also still load the data for not-yet-rendered pages, so the data is clearly there and okular can access it. If it is not possible to re-use that file handle from the save logic, maybe just create another one?
Comment 30 Sven Brauch 2020-06-15 22:14:27 UTC
Regarding other applications: most applications (e.g. kate) are not affected by this issue, since they just load the whole file into memory, and save it from their in-memory buffer if requested. Gwenview also seems to do this, even though it closes the file when deleted, it can still save an e.g. rotated image, apparently from memory (it has no open file handle).

Okular is quite special here with its "stream from disk" behaviour, from a quick look around I didn't find another commonly used application doing this.
Comment 31 Sven Brauch 2020-06-16 21:16:00 UTC
I tried to just fix this now, and I am more confused than ever about this issue. Just removing the error message and early return makes Save As work flawlessly. Why is the error message even there?
Comment 32 Nate Graham 2020-06-18 14:20:35 UTC
Perhaps a relic from the past when things did not used to work properly? If you send a merge request, I can help test it.
Comment 33 Sven Brauch 2020-06-18 16:22:46 UTC
There's a bit of work around that place needed to make "Save" work properly ("Save as" does automatically, but not "Save"), so I'd like to hear Albert's opinion first ;)
Comment 34 Albert Astals Cid 2020-06-18 20:46:15 UTC
If you can make it work without breaking any features, i don't see the problem.
Comment 35 David Hurka 2020-07-23 09:35:20 UTC
Below is the commit that made Okular refuse to save if the file is modified. Looks like Sven Brauch was basically trying to revert this commit, but we should first figure out what’s the problem with poppler.

commit 559836c39270506e1f7c7b58bcf6b088a097b246
Author: Albert Astals Cid <albert.astals.cid@kdab.com>
Date:   Fri Nov 17 14:19:12 2017 +0100

    Give warnings when the file is modified externally
    
    Summary:
    Unfortunately, poppler (the only backed that supports saving) is not able
    to save properly if the file is modified by a third party while it is opened
    
    So we give the user a warning saying things went wrong and give him the option
    to not reload/close, that way if there was something very important in the annotations
    she added she can try to save them (even if by copy&paste the contents to a third program)

https://phabricator.kde.org/D8863
Comment 36 David Hurka 2020-07-23 13:07:34 UTC
If some generators can’t keep their own document file, we should probably consider every file a remote file, i. e. make a temporary copy of every file and open that instead. Right?
Comment 37 Nate Graham 2020-07-23 18:31:27 UTC
Hmm, that's an idea, yeah.
Comment 38 David Hurka 2020-07-24 12:33:36 UTC
There exists this code in part.cpp:

> if (m_document->canSaveChanges()) {
>     [...]
> } else {
>     // If the generators doesn't support saving changes, we will
>     // just copy the original file.
>     if (isDocumentArchive) {
>         [...]
>     } else {
>         // duh, our local file disappeared...
>         if (!QFile::exists(localFilePath())) {

So if we make all generators access a local copy, we might either move the problem or solve a second problem at the same time. I vote for the second option. ;)

Since there is Generator::loadDocumentFromData(QIODevice ...), it should be 
flexible to implement reading from a local copy. But since SaveInterface::save() exists only in QString version, we either need to use real temp files (QTemporaryFile?), or redefine this parameter to be ignored and write to the previously given QIODevice instead. I vote for the first option. ;)

With always accessing a local copy, we probably won’t need swapBackingFile() anymore. swapBackingFile() also exists only in QString version.
Comment 39 Albert Astals Cid 2020-07-24 15:01:00 UTC
I don't think copying the file before opening is acceptable.

It uses my hard disk more than it should, it wears&tears my hard disk more than it should and it moves us away from being able to load as you go PDF linearized files.

I mean think of this message you would have to show the user "Sorry, we can't open your file because you don't have enough space on disk".

And the user face would be all like "Dude the file is already on disk"
Comment 40 David Hurka 2020-07-24 18:50:44 UTC
> it moves us away from being able to load as you go PDF linearized files.
Hmm, ok.

Since this is not closed as INTENDED, I assume there must be another solution.

A) Keep the current swapBackingFile etc. mechanisms and make a backup file condititonally, like in #23:

> I wanted to suggest a size limit for such a backup, which could be linked
> to the Okular memory setting. Suggestion:
> * Low: Backup files below 2MB; if they are not in the home directory
> below 10MB
> * Normal (default): Backup files below 10MB; if they are not in the home
> directory below 20MB
> * Agressive: Backup files below 50MB
> * Greedy: Backup all files which fit into the free memory
(Remote files are always backed up, because of ReadOnlyPart.)

B) Disable editing tools (annotations, forms) when the file is not a remote file.
Comment 41 David Hurka 2020-07-24 19:35:51 UTC
> I mean think of this message you would have to show the user "Sorry, we
> can't open your file because you don't have enough space on disk".

If there is not enough space to create a temporary copy, how should the file be saved after editing?

If the file was streamed-from-disk in that case, it would have to be deleted in order to save the edited version. But then, nothing can be saved anymore, so I don’t see the point to not create temporary copies.

So I make another suggestion:

C) Create temporary copies of any file. But if there is not enough space to make a temporary copy, consider the original file to be the temporary copy (but remember to not delete it).

D) Like C, but also remove swapBackingFile. I think full reload instead of swapBackingFile is acceptable in this corner case. (If that would lose e. g. annotations, the user could still save to .okular first.)

...

Hmm, okay, the user could still save to a remote location.
Comment 42 Albert Astals Cid 2020-07-24 21:46:10 UTC
(In reply to David Hurka from comment #40)
> > it moves us away from being able to load as you go PDF linearized files.
> Hmm, ok.
> 
> Since this is not closed as INTENDED, I assume there must be another
> solution.

I would close it as INTENDED, you deleted the file, it's your fault, you should not have deleted it. If you did not delete the file, blame whoever did it.

But I understand the world is imperfect and it would be useful if we could work in that situation.

> 
> A) Keep the current swapBackingFile etc. mechanisms and make a backup file
> condititonally, like in #23:
> 
> > I wanted to suggest a size limit for such a backup, which could be linked
> > to the Okular memory setting. Suggestion:
> > * Low: Backup files below 2MB; if they are not in the home directory
> > below 10MB
> > * Normal (default): Backup files below 10MB; if they are not in the home
> > directory below 20MB
> > * Agressive: Backup files below 50MB
> > * Greedy: Backup all files which fit into the free memory
> (Remote files are always backed up, because of ReadOnlyPart.)
> 
> B) Disable editing tools (annotations, forms) when the file is not a remote
> file.

Personally, i would suggest trying to figure out why poppler fails and fix it other than doing all strange things that need the user to read lots of stuff that has "if this and if that and not that"
Comment 43 Albert Astals Cid 2020-07-24 21:47:20 UTC
(In reply to David Hurka from comment #41)
> > I mean think of this message you would have to show the user "Sorry, we
> > can't open your file because you don't have enough space on disk".
> 
> If there is not enough space to create a temporary copy, how should the file
> be saved after editing?

Getting a "not enough space" on saving, is totally normal and expected, getting it on opening is the weirdest thing ever.

> If the file was streamed-from-disk in that case, it would have to be deleted
> in order to save the edited version. But then, nothing can be saved anymore,
> so I don’t see the point to not create temporary copies.
> 
> So I make another suggestion:
> 
> C) Create temporary copies of any file. But if there is not enough space to
> make a temporary copy, consider the original file to be the temporary copy
> (but remember to not delete it).
> 
> D) Like C, but also remove swapBackingFile. I think full reload instead of
> swapBackingFile is acceptable in this corner case. (If that would lose e. g.
> annotations, the user could still save to .okular first.)
> 
> ...
> 
> Hmm, okay, the user could still save to a remote location.
Comment 44 David Hurka 2020-07-25 16:24:28 UTC
> Personally, i would suggest trying to figure out why poppler fails and
> fix it other than doing all strange things that need the user to read
> lots of stuff that has "if this and if that and not that"

I looked at what happens when Okular tries to save a *modified* *PDF* document. In this case, the Generator is able to save the document including the changes to a new file.

<investigation>

“->” will denote “function calls function”.
Poppler_qt5 will denote the Qt5 interface of Poppler, as Okular sees it.
Poppler will denote internal Poppler stuff.

1.1. Part requests to save the document to a local file:

Part::saveAs() -> ... -> PDFGenerator::save()

1.2. PDFGenerator::save() creates a new Poppler_qt5::PDFConverter from the existing Poppler_qt5::Document, and uses it to save the document:

PDFGenerator::save() -> Poppler_qt5::Document::pdfConverter()->convert()

1.3. Poppler_qt5::PDFConverter::convert() creates a Poppler::QIODeviceOutStream, which is a  Poppler::OutStream, on the local output file. Then it lets the document be saved in this OutStream:

Poppler_qt5::PDFConverter::convert() -> Poppler::PDFDoc::saveAs() -> Poppler::PDFDoc::saveIncrementalUpdate()

1.4. Poppler::PDFDoc::saveIncrementalUpdate() creates a copy of “str” (see 2.2), creating a new Poppler::FileStream.

1.5. Poppler::PDFDoc::saveIncrementalUpdate() first makes a verbatim copy from this FileStream to the OutStream. Later it modifies the output file within the OutStream to reflect the modifications.

So where does the input data Poppler::FileStream come from?

2.1. Document::openDocument() calls the PDFGenerator to open the PDF document from a local file. (It can also pass the data in a QByteArray, but that is only relevant when the document comes from stdin.)

Document::openDocument() -> ... -> PDFGenerator::loadDocumentWithPassword() -> Poppler_qt5::Document::load() -> new Poppler::DocumentData() -> new Poppler::PDFDoc() -> Poppler::GooFile::open() -> poppler/gfile.cc: openFileDescriptor() -> open()

2.2. Poppler::PDFDoc::PDFDoc() makes the new GooFile accessible as Poppler::FileStream, which is a Poppler::BaseStream, and stores it as “str”.

So what does reading from this Poppler::FileStream do?

3.1. As seen in 1.5, Poppler::PDFDoc::saveIncrementalUpdate() reads the whole input file from the FileStream:

Poppler::PDFDoc::saveIncrementalUpdate() -> Poppler::FileStream::getChar() -> Poppler::FileStream::fillBuf() -> Poppler::GooFile::read() -> unistd.h: pread()

So what happens in 1.4, when the stream is copied?

4.1. Poppler::PDFDoc::saveIncrementalUpdate() -> Poppler::FileStream::copy() -> new Poppler::FileStream()

4.2. copy() created a new stream on the same GooFile, but with an empty buffer. This means, any first call to FileStream::getChar() will call pread() on Poppler’s input file handle.

</investigation>

So there we are: The Poppler feature, which we already discovered as stream-from-disk, is implemented by calling pread(). pread() relies on the integrity of Poppler’s input file handle. This means Okular is responsible to guarantee the integrity of the file handle. Since Okular passes a path to a local file, we rely on the OS, which should not modify or delete the input file while Poppler has its file handle on it.

This works fine in these cases:
A) Okular opens a remote file. ReadOnlyPart will make a temporary file as local copy, so integrity is guaranteed.
B) Okular opens a persistent local file (e. g. ~/Mess/LM555.pdf). Integrity is guaranteed as long as the user does not modify the file intentionally.

This works not fine in this case:
C) Okular opens a temporary local file, because Firefox told it to do so. Integrity is not necessarily guaranteed, because Firefox will delete this file soon. Before commit 559836c3, Okular let Poppler try to read the deleted file through its still existing file handle. As seen in 4.2, this should work fine if the system uses e. g. ext4.

This works not at all in this case:
D) Okular opens a file from a thumb drive, and later the thumb drive fails. Because the data is absolutely unavailable now, Poppler can’t read any data through its file handle .

Apparently we are mostly concerned about case C. How about letting PDFGenerator try to save the file even if the source file is detected to be modified? If PDFGenerator fails, we can inform the user that the PDF backend is to blame.
Comment 45 Albert Astals Cid 2020-07-25 22:54:40 UTC
This is what the old posix says (or at least i remember/understood from my university years): "Once you have a file descriptor from a file open, the file can disappear, change, whatever, you will still read the contents of the file as if it was when you opened the file descriptor".

But either that has changed or i misunderstood it, because that is not what happens, see https://i.imgur.com/DaSS7Ad.gif where i change a file after having opened it, and the read after having changed the file gives me the changed contents and not the original ones.

If you want the code it's http://paste.debian.net/1157802/

Note if I change it with vi instead of kate, "it works", i.e. gives me the old contents instead of the new, but the fact that "it can fail", means we can't trust it to do A nor B.
Comment 46 David Hurka 2020-07-26 11:28:25 UTC
This means reverting 559836c3 would make Okular work nicely with Firefox in many cases, but probably not all cases.

I think we can agree on some priorities now?
1. Work nicely with Firefox in many cases (I. e. allow late saving)
2. Work nicely with Firefox in all cases
3. Linearized PDFs
4. Don’t use more memory than needed (I. e. allow stream-from-disk)
5. These are the priorities that come to my mind now, maybe you have others?

To implement 3, we would need to bypass ReadOnlyPart::openUrl(), which downloads the whole file first. Additionally we need to teach browsers to pass the remote URL to Okular instead of downloading the document first.

To implement 2, we need to create backup files for files which are suspected to vanish.

2 is in conflict with 3 and 4, so we need to offer configuration options and/or to actively use swapBackingFile.
Comment 47 Sven Brauch 2020-07-26 11:37:10 UTC
Sorry for not checking back, I had a look; for PDF it actually works perfectly by just removing the error message, but for other file types (e.g. PNG) we would need to keep the FD open manually. There actually already is code for that, which needs a slight refactoring to work correctly though, which is where I stopped -- sorry, need to pick that up again.

@Albert: I think you misunderstood. An open file descriptor refers to a specific file, which is basically ref-counted by all open file descriptors and all file names in the file system hierarchy referring to it. POSIX guarantees you that it won't swap this *pointer* behind your back, but not that you always get the same *contents*. So for modifying files, it depdends on how applications do that: if they do open("filename"), write(), close() you get the new contents; if they do open("filename.tmp"), write(), close(), rename("filename.tmp", "filename"), you get the old contents, because the new contents are in a *different* file (which now happens to have the same name).

So the behaviour is clear, well-defined and reliable, it just depends on what applications do.
Comment 48 David Hurka 2020-07-26 21:54:34 UTC
Sven wrote:
> if they do open("filename.tmp"), write(), close(),
> rename("filename.tmp", "filename"), you get the old contents,
> because the new contents are in a *different* file (which now
> happens to have the same name).
> 
> So the behaviour is clear, well-defined and reliable,
Well, it’s not necessarily reliable. On your ext4 home partition it should work, but if you use FAT for some reason, or even the thumb drive fails, the data is still lost.

> but for other file types (e.g. PNG) we would need to keep the FD
> open manually. There actually already is code for that, which needs
> a slight refactoring to work correctly though,
I think for generators which can’t save(), we should just go the KISS way and always make a temporary copy. Most document types other than PDF need (very) much less space on disk than we need space on RAM to render a single viewport. (E. g. PNG doesn’t have 30 other pages, TXT is always very compact.)

Which code do you mean? FileKeeper didn’t turn out to work very well, but if you understand how to fix it... :)
Comment 49 Sven Brauch 2020-07-26 22:06:14 UTC
> On your ext4 home partition it should work, but if 
> you use FAT for some reason, or even the thumb 
> drive fails, the data is still lost.

Ok, but now we have strayed quite far from the original goal. Those are not common cases on Linux. For FAT or NTFS, one would need to check if the file can even be removed by default, I think these use exclusive locking as standard?

Re. copying, to be honest I have never in my life seen anyone view any file except .pdf (or maybe .ps) in okular, nor done that myself, so I guess the solution for these doesn't matter that much ...
Comment 50 Luigi Toscano 2020-07-26 22:11:31 UTC
We have had a relevant amount of people asking and reporting issues with the comic-book files, for example. Let's not have a backend-specific solution, please.
Comment 51 Albert Astals Cid 2020-08-04 22:30:19 UTC
@Sven thanks for the refresher :)

Honestly the only good solution i see is getting the file keeper to work, and hopefully after 12 years file watching has improved and the file keeper doesn't break it.

David, which issues have you had with the file keeper?
Comment 52 Phil 2020-08-20 09:22:55 UTC
0. To ensure we have properly diagnosed the problem:
1. This error is easily reproduced,
2. Open the same file twice (simultaneously). 
   Make changes in both copies, and save one. 
3. It is not surprising that okular refuses to save over the changed file.
4. It IS surprising that the file cannot be saved at all.

5. In my test case, okular uses 40MB to view an 0.8MB file. It would be at most a 5% increase in RAM to keep the whole file in memory. Perhaps do that for small files? 

6. In the meantime, perhaps a change to the error message to clarify (3&4) above:

CURRENT VERSION:
There are unsaved changes, and the file 'file.pdf' has been modified by another program. Your changes will be lost, because the file can no longer be saved.
Do you want to continue reloading the file?

PROPOSED VERSION:
There are unsaved changes, and the file 'file.pdf' has been modified by another program. Your past *and future changes* will be lost, because the file can no longer be saved, *not even to a new file*.
Do you want to continue reloading the file? 

OPTIONAL SUGGESTION: 
(Consider manually copying over your changes to a freshly opened copy of the file.)
Comment 53 Phil 2020-08-21 11:35:55 UTC
I propose the bug report be titled, "Cannot save PDF when loaded file has been deleted *or touched by any other process*."

To reproduce: 
1. open $FILE in okular. 
2. Highlight a word. 
3. touch $FILE

Result: "There are unsaved changes, and the file '...' has been modified by another program. Your changes will be lost, because the file can no longer be saved. Do you want to continue reloading the file?"

Note: Okular takes 800 MB to render a 5.4 MB file I'm reading; keeping it all in memory would require *at most* 0.7 % more memory. Why not do that if memory is available?
Comment 54 Albert Astals Cid 2020-12-22 20:51:30 UTC
*** Bug 430690 has been marked as a duplicate of this bug. ***
Comment 55 Steve Kelem 2020-12-24 23:00:16 UTC
This bug has been languishing for two years. 
Can't you apply the fixes you have?
Comment 56 Nate Graham 2021-01-06 05:19:37 UTC
*** Bug 431198 has been marked as a duplicate of this bug. ***
Comment 57 Nate Graham 2021-01-06 05:20:19 UTC
Can we please put our heads together and come up with a solution for this issue?
Comment 58 mateMat 2021-01-06 08:35:20 UTC
Yes this bug is a pain, I lost a lot of work in annotations and comments yesterday.

Since I am not a KDE dev and not a software developper in general, I can only give you a external observation of this nasty bug.

I think it is a 2 factors conjecture that lead to this bug: 
1. The okular annotation tool seems to be detected by the main instance as modifier from another program/stack.
I have seen that okular is having multiple entries in the program list, this must be due to the different back-ends that run it.
These back-ends should be considered as the same software in plasma/filekeeper. This might solve the problem, but the technicalities are beyond my proficiency in C.Sc.

2. The background job should look for OKULAR_KEEP_FILE_OPEN before purging /tmp files

What about keeping the file in memory (/tmp) as long as okular has not closed it and if the file is detected in /tmp during a save, ask to save it in the ~/ folder.
I think it is the way Preview works.
Comment 59 Karl 2021-01-08 05:19:15 UTC
This bug is a pain.. 

I found a way to recover using PV .. a work-around

1) find the process+fd by looking directly in /proc:

ls -al /proc/*/fd/* 2>/dev/null | grep {filename}

2) Then with pv thrown in:

tail -c +0 -f /proc/{procnum}/fd/{fdnum} | pv -s {expectedsize} > {recovery_filename}

You may need to Ctrl-C when done (ls /proc/{procnum}/fd/{fdnum} will tell you that the file no longer exists)), but if you know the exact size in bytes, you can use pv -S to make it exit when the count is reached.
Comment 60 Yuri Chornoivan 2021-01-29 10:37:38 UTC
*** Bug 432271 has been marked as a duplicate of this bug. ***
Comment 61 Karl 2021-01-29 19:51:37 UTC
See post 59 - as this works - I've used it a few times - it is obvious that it can be fixed within the program. 

At the worst, there would be some system call - more likely the library calls to do this already exist within the dependencies. 

So this bug is confirmed and fixable.
Comment 62 kingfati4 2021-04-15 23:28:55 UTC
(In reply to Karl from comment #59)
> This bug is a pain.. 
> 
> I found a way to recover using PV .. a work-around
> 
> 1) find the process+fd by looking directly in /proc:
> 
> ls -al /proc/*/fd/* 2>/dev/null | grep {filename}
> 
> 2) Then with pv thrown in:
> 
> tail -c +0 -f /proc/{procnum}/fd/{fdnum} | pv -s {expectedsize} >
> {recovery_filename}
> 
> You may need to Ctrl-C when done (ls /proc/{procnum}/fd/{fdnum} will tell
> you that the file no longer exists)), but if you know the exact size in
> bytes, you can use pv -S to make it exit when the count is reached.


This bug has caused me a lot of troubles, lost nerves and data, incredibly frustrating that this is ongoing for 3 years. I'm happy you were able to find a workaround. Can you maybe give a further explanation? Are you supposed to close Okular throughout? I tried copying into another directory using {recovery_filename}, but the progress chart does not move. I'm not able to create files inside that folder either. It keeps being stuck at 0Bytes and does not change.
Comment 63 Nate Graham 2021-04-16 13:15:29 UTC
Albert, we really need to fix this. Can we come up with a path forward?
Comment 64 Unknown 2021-04-17 22:19:36 UTC
I too have encountered this bug a lot, as recent as this morning where I
almost lost a PDF I had tracked down.  I’m not knowledgeable in Qt or
C++, but I’m looking around in the codebase nevertheless, in the hopes
that I spot something useful. I see that, in document_p.h, there’s this
definition

    QVector<Page *> m_pagesVector;

which probably does hold the data necessary to save even if the original
document is gone.

Now, as an Emacs user I observe the following patterns:

- A file that looks like ‘.#<filename>’ is created, this is a lock
  file, used to prevent simultaneous editing

- Emacs listens on the target file ‘<filename>’ using file system
  notification libraries. If you edit a buffer that visits a file that
  has been edited outside of Emacs, it will query you, asking you to
  either confirm the edit, cancel the edit, or to ‘revert the buffer’
  before editing, which means to update the contents of the buffer
  re-reading the file from disk.

- When enabled, Emacs writes backup files after each save, which look
  like, without customisations, ‘<filename>~’.  This is governed by a
  variety of environment and in-Emacs variables, but the created file
  contains the version of the edited file as it was before it was saved,
  e.g.:
  - open file.txt, containing "hi"
  - append ", john"
  - save
  - file.txt~ is created, containing "hi"
  - file.txt  is overwritten, containing "hi, john"

- The contents of a buffer is separate from the contents of the file,
  the data is spared in memory even if the file is deleted, so you can
  save a buffer into a file regardless.

I doubt it’s desirable to copy this wholesale to Okular, but the
following could be done:

- Upon opening a file, it’s wholly read into a buffer as a string of
  bytes, which acts as a buffer between the Document class, and the on
  disk file. This would ensure that data is always in memory, but
  without taking up too much space, as indicated in the 53rd comment in
  the thread:

Phil 2020-08-21 11:35:55 UTC writes:
> Note: Okular takes 800 MB to render a 5.4 MB file I'm reading; keeping
> it all in memory would require *at most* 0.7 % more memory. Why not do
> that if memory is available?
(https://bugs.kde.org/show_bug.cgi?id=402017#c53)

- Okular keeps tabs on the file through filesystem notification
  libraries like inotify, kqueue, etc., and alerts the user if the
  file on disk is changed, asking them to take action.

- If the user has enabled auto-reloading the files, Okular filters for
  deletion events and apparently destructive events (i.e. new contents
  of the file do not constitute a valid document, or file is deleted, or
  new file is empty, or user does not have read/write permissions
  anymore), and the user is again queried as to what to do.

- When the user attempts to exit the application, Okular checks if the
  file is somehow destroyed or modified, and offers the user one last
  chance to record the file as it appears in Okular.

- These queries would look like the example modal below, with slight
  variation depending on the triggering event:

,----
| The file ‘/home/u/some.pdf’ has been modified outside Okular.  Before
| further modifications, you need to decide how to respond to these
| changes.
| 
| [_S_ave as a copy...]  [_C_ancel]  [Edit anyways...]  [Overwrite]
`----

- ‘Save as a copy...’ means the document as it currently is in Okular is
  saved under a new name, and Okular switches to that file for the
  current document.

- ‘Cancel’ cancels the attempted modification by the user.

- ‘Edit anyways...’ allows the user to continue editing the document,
  without creating a copy.  The user is responsible to ‘Save’ or ‘Save
  as...’ the document later.  This option is dangerous, so does not
  present a mnemonic keyboard shortcut.

- ‘Overwrite’ will overwrite the file the current document targets with
  the contents of the document as they currently appear in Okular.  This
  option too does not get a mnemonic keyboard shortcut, as it’s
  destructive.

The above would perfectly remedy the problem at hand, and below are some
possible enhancements:

- ‘VERSION_CONTROL’ and ‘SIMPLE_BACKUP_SUFFIX’ could be respected, as
  they appear in "(coreutils) Backup options" info manual
  (https://www.gnu.org/software/coreutils/manual/html_node/Backup-options.html),
  accompanied by options in the Okular’s preferences system itself

- Even if the above is not set, Okular could preemptively store copies
  of files in a known temporary location like ~/.cache and/or /tmp; and
  use lock files and file notifications to prevent alterations to
  documents it’s displaying.

Even if this whole thing is too much, Okular keeping the binary source
file data in memory along with the costly object tree that it uses to
render the file could help with the case that bites the users most
frequently: download file through browser, which saves it in a tmp dir,
and if you exit the browser before you save a copy in okular, the
browser will remove the tmp dir, leading to this problem.

Hope this was useful as a user perspective, comparing it with software
that handles this more gracefully.  I doubt I could contribute any code
in reasonable time without extensive guidance, but I’m totally willing
to help with testing any attempts.
Comment 65 Otto Richter 2021-04-18 09:36:43 UTC
Agree to the last comment. I mean, agree to many comments in these years, there are many valid points and ideas and after reading all of that, I don't really understand why this bug is still open after so many users (and probably many more without knowledge of bug tracking systems) ran into this.
Just hit this bug today and an hour of annotation, because I did not figure out how to save the file. It's super nasty that there is no way to save the file, and even after I restored the original file, I couldn't find a way to save it.
Comment 66 Oliver Sander 2021-04-18 11:53:54 UTC
> I don't really understand why this bug is still open after so many users

Because there is nobody who has both the skills and the free time to fix it.
Comment 67 mateMat 2021-04-18 17:42:07 UTC
Perfectly understood.

I terms of strategy, it seems to happen when we open a PDF on kmail or elsewhere with Okular. Would it make sense to transfer the right to hold this file from kmail to okular, this way when okular closes, it is still in /tmp and we can save it?
Comment 68 Karl 2021-04-18 22:45:43 UTC
There was a time when creating a disk copy would have been seen as a 'bad-thing®', but today, with modern sizes of RAM and disk-drives, I don't think it is probably the right answer. 

If I look at the typical size of PDFs - they are not huge - If I look at the extreme cases -   

It might appear as a sledge-hammer approach, but if you look at all the variable situations, and not being able to count on the downloading programs to do things sanely - I think it is the way to go. No need to create complicated user interface choices..
Comment 69 Jonas 2021-12-27 21:09:06 UTC
I'm also affected by this, in my case it's because I'm synchronizing files via cloud storage between different computers, and sometimes an update from another machine only comes through after I opened an old version (at least I think that's what happens). 

The effect is also that I end up with annotations that I can't save (not even via "save as"), which is very annoying.
Comment 70 Greg B 2023-03-20 11:26:01 UTC
It would be great to see some progress on this! I've lost data multiple times from this bug while filling out PDF forms (lengthy forms, unfortunately). Not being able to "Save As" is a real disaster. Was looking for an alternative to an Adobe product for this task, so it's especially disappointing.

A few more details: I'm editing a local file (no browser downloads) but it's in a directory monitored by syncthing. Nothing has changed the file, but it's possibly changed some metadata which, from skimming this thread, seems to be enough to cause the problem?
Comment 71 Bug Janitor Service 2023-04-20 20:47:58 UTC
A possibly relevant merge request was started @ https://invent.kde.org/graphics/okular/-/merge_requests/733
Comment 72 Bug Janitor Service 2023-04-20 20:48:00 UTC
A possibly relevant merge request was started @ https://invent.kde.org/graphics/okular/-/merge_requests/733
Comment 73 Nate Graham 2023-05-02 21:10:36 UTC
Git commit b5191a2c1fa8f4ce45e36258942c6157a3799e7a by Nate Graham.
Committed on 02/05/2023 at 20:57.
Pushed by ngraham into branch 'master'.

Let user "Save As..." when document has been externally modified

Currently Okular stops the user from saving their document if it has
unsaved changes and was modified externally. This makes some sense
because there are now two sources of truth, and Okular cannot reconcile
them itself. However as a consequence, it causes data loss since the
user's unsaved changes in Okular become un-save-able. This is quite
frustrating when it happens.

But this prohibiton on saving over an externally-modified document only
really makes sense for a "Save" operation that overwrites the original
document. If instead, the user does a "Save As...", then they can save
their local changes to another file and avoid losing unsaved changes.
Then if needed, they can manually compare the original
externally-modified document with their newly-saved document and
reconcile the changes by hand.

Accordingly, this commit avoids showing the error message box and
blocking saving if the user is doing a "Save As..." to a new location,
rather than overwriting the open file.
FIXED-IN: 23.04.1

M  +3    -1    part/part.cpp

https://invent.kde.org/graphics/okular/commit/b5191a2c1fa8f4ce45e36258942c6157a3799e7a
Comment 74 Nate Graham 2023-05-02 21:10:44 UTC
Git commit 89f17155bffa20500d7f3f098b08f61eedcb780d by Nate Graham.
Committed on 02/05/2023 at 20:57.
Pushed by ngraham into branch 'master'.

Offer options when trying to overwrite externally-modified file

When the user tries to do a "Save" operation on an open file that was
since modified externally, currently Okular prohibits it. We can make
this a bit more user-friendly by instead explaining the situation
clearly and offering the user some options:
- Yes, really overwrite the changes made in the other program
- Instead save the unsaved changes as a new file elsewhere
- Abort

M  +35   -14   part/part.cpp

https://invent.kde.org/graphics/okular/commit/89f17155bffa20500d7f3f098b08f61eedcb780d
Comment 75 Nate Graham 2023-05-02 21:27:04 UTC
Git commit b596df0c937e015ab284f44fdd7159e925c7acb2 by Nate Graham.
Committed on 02/05/2023 at 21:11.
Pushed by ngraham into branch 'release/23.04'.

Let user "Save As..." when document has been externally modified

Currently Okular stops the user from saving their document if it has
unsaved changes and was modified externally. This makes some sense
because there are now two sources of truth, and Okular cannot reconcile
them itself. However as a consequence, it causes data loss since the
user's unsaved changes in Okular become un-save-able. This is quite
frustrating when it happens.

But this prohibiton on saving over an externally-modified document only
really makes sense for a "Save" operation that overwrites the original
document. If instead, the user does a "Save As...", then they can save
their local changes to another file and avoid losing unsaved changes.
Then if needed, they can manually compare the original
externally-modified document with their newly-saved document and
reconcile the changes by hand.

Accordingly, this commit avoids showing the error message box and
blocking saving if the user is doing a "Save As..." to a new location,
rather than overwriting the open file.
FIXED-IN: 23.04.1


(cherry picked from commit b5191a2c1fa8f4ce45e36258942c6157a3799e7a)

M  +3    -1    part/part.cpp

https://invent.kde.org/graphics/okular/commit/b596df0c937e015ab284f44fdd7159e925c7acb2
Comment 76 Bug Janitor Service 2023-05-14 22:45:19 UTC
A possibly relevant merge request was started @ https://invent.kde.org/graphics/okular/-/merge_requests/752
Comment 77 Nate Graham 2023-06-08 19:24:46 UTC
Git commit 99600667d44ce0002b1cb1f0dfb3db3c634bc4c6 by Nate Graham.
Committed on 08/06/2023 at 15:21.
Pushed by ngraham into branch 'master'.

Don't stop or warn if the edited file was deleted on disk

If an open  file with unsaved changes was deleted on disk, Okular
currently warns the user about it (in master) or prevents them from
saving it (in 23.04). But in both cases this is unnecessary; if the
open file was deleted on disk when the user tries to save changes,
Okular can simply save to its file path and re-create it, with no
risk of data loss or stomping on anyone else's work. So it should do
that.

M  +8    -3    part/part.cpp

https://invent.kde.org/graphics/okular/-/commit/99600667d44ce0002b1cb1f0dfb3db3c634bc4c6
Comment 78 Sune Vuorela 2023-11-17 13:04:17 UTC
*** Bug 477136 has been marked as a duplicate of this bug. ***