Summary: | Cannot save PDF when loaded file has been deleted | ||
---|---|---|---|
Product: | [Applications] okular | Reporter: | Jens <jens-bugs.kde.org> |
Component: | general | Assignee: | Okular developers <okular-devel> |
Status: | RESOLVED FIXED | ||
Severity: | grave | CC: | aacid, bugseforuns, chronoangel, grgbrn, grmat, joerge-e, jonas743, karl, kde-bugs, kingfati4, kishore96, luigi.toscano, mail, markjballard, martin.marmsoler, mathieu, nate, null, oliver.sander, pino, pv.bugzilla, richard.dwight, steve, winghongchan, yurchor |
Priority: | VHI | Keywords: | usability |
Version: | 1.5.3 | ||
Target Milestone: | --- | ||
Platform: | Other | ||
OS: | Linux | ||
Latest Commit: | https://invent.kde.org/graphics/okular/commit/b596df0c937e015ab284f44fdd7159e925c7acb2 | Version Fixed In: | 23.04.1 |
Sentry Crash Report: | |||
Attachments: | attachment-9638-0.html |
Description
Jens
2018-12-11 20:01:36 UTC
When this happens, it should probably detect it and do a Save As operation rather than futilely try to save. Not sure if we could implement this everywhere in KIO or if every KDE app would need it patched in. Nate, please read the report more carefully, "Save As" doesn't work either. This is true. I can, however, "Print" to a new PDF file. This is bad because it loses quality and embedded text and comments / annotations, but it proves the file is still available in memory and *could* be saved. *** Bug 402925 has been marked as a duplicate of this bug. *** Just discovered a class FileKeeper, which should do this. Added in commit 6531398b919e58def8c6ba4711d20ea517147189, eleven years ago. Just grep for #ifdef OKULAR_KEEP_FILE_OPEN *** Bug 407583 has been marked as a duplicate of this bug. *** *** Bug 408774 has been marked as a duplicate of this bug. *** (In reply to Nate Graham from comment #7) > *** Bug 408774 has been marked as a duplicate of this bug. *** I'm repeatedly running into this bug in my workflow - the most recent time after extensive annotation... (in which okular is excellent btw). I had to reproduce annotations by hand - essentially lost work. Extremely frustrating to know that the data is right there (can print, have the original pdf), but saving (including Save As) is forbidden. Is there any chance of this being addressed? More of a bug than a wishlist. Also, as this can cause data loss with no workaround, I'm upgrading the severity accordingly. David, would you be interested in investigating why it doesn't work and submitting a patch to fix it? *** Bug 421342 has been marked as a duplicate of this bug. *** This bug caused Okular to lose pdf files opened from Firefox, using Okular v.1.3.3, installed from the standard Ubuntu 18.04.4 repository, for a fresh installation about a month ago. Okular said it couldn't save pdfs because they had been modified by some other program. Those files were left open for a period of days before attempting to save them. The files were consequently lost. Setting to VHI as this causes data loss. Please look at the following. It seems like keeping the file was implemented in 2008, specifically with Firefox in mind. But it broke something else, so it was disabled using an #undef. There is no way anymore to enable it with a compile switch or through an environment variable. It was also only implemented for UNIX OSes. @Pino: Can you tell more about the problem? Can it be fixed, or do we need another approach than creating a file handle? If we need another approach, I vote for creating a temporary full copy of the file, if a) the file is smaller than a limit, set by the memory performance mode b) (optionally) the file is from a remote location or has a path that is likely to be temporary (e. g. /tmp/, or **/.*/) ---- APPENDIX A: This is a self-descriptive heading for an appendix ---- commit 5b5b4cabe1d8e07392641ba9cf574eda4678b12f Author: Pino Toscano <pino@kde.org> Date: Sat Jun 7 21:52:11 2008 +0000 disable the open handle on the file, it breaks the file watching CCBUG: 163363 svn path=/trunk/KDE/kdegraphics/okular/; revision=818144 diff --git a/part.cpp b/part.cpp --- a/part.cpp +++ b/part.cpp @@ -182,8 +182,5 @@ #undef OKULAR_KEEP_FILE_OPEN -#if defined(Q_OS_UNIX) -# define OKULAR_KEEP_FILE_OPEN -#endif #ifdef OKULAR_KEEP_FILE_OPEN static bool keepFileOpen() { commit 6531398b919e58def8c6ba4711d20ea517147189 Author: Pino Toscano <pino@kde.org> Date: Sat Jun 7 21:08:41 2008 +0000 Keep an open file handle on the local file currently open: this way, we can get it back from it, in case for some reason (read: Firefox blindly removing temporary files) it gets "deleted". Of course, this works (and thus it is activated) only on UNIX systems (as the file is not deleted for real until there are open handles on it). BUG: 163363 (If not wanted, this behavior can be disabled by export'ing OKULAR_NO_KEEP_FILE_OPEN to 1.) Also, in case the local file gets deleted but the real document is remote, use its (remote) URL for the copy. svn path=/trunk/KDE/kdegraphics/okular/; revision=818136 diff --git a/part.cpp b/part.cpp --- a/part.cpp +++ b/part.cpp @@ -124,0 +182,8 @@ +#undef OKULAR_KEEP_FILE_OPEN +#if defined(Q_OS_UNIX) +# define OKULAR_KEEP_FILE_OPEN +#endif + +#ifdef OKULAR_KEEP_FILE_OPEN +static bool keepFileOpen() +{ Temporarily wasting some memory or disk space is a far better alternative than data loss. :) (In reply to David Hurka from comment #13) > Please look at the following. It seems like keeping the file was implemented > in 2008, specifically with Firefox in mind. But it broke something else, so > it was disabled using an #undef. There is no way anymore to enable it with a > compile switch or through an environment variable. It was also only > implemented for UNIX OSes. > > @Pino: Can you tell more about the problem? Can it be fixed, or do we need > another approach than creating a file handle? I guess it will break Kile and interactive editing of TeX. You will not be able to recompile TeX and watch the results on the side panel without reloading Kile itself. There will be a large hassle if you uncomment this (much larger than from poor Firefox users). ;) (In reply to David Hurka from comment #13) > @Pino: Can you tell more about the problem? Can it be fixed, or do we need > another approach than creating a file handle? See bug 163363. Holding an open descriptor on the document fixes the case that the document disappears (hello firefox, and apparently hello kmail too...) while it is opened in okular. The big downside, which is also what I mentioned, is that file watching breaks, and reading the commit should give an hint on why: the handle is kept for the _old_ file, so when some PDF producer (latex, etc) creates a new file by rename()ing the new temporary file on the old name (which is an atomic operation), you don't notice that. At least, I remember there were problems in the results you get from file watching because of this. While fixing firefox to not remove files under applications' feet is not an easy task (12 years and still no change on that), why don't we fix at least our own products (kmail)? Enabling FileKeeper again will not fix this situation, as you will trade one issue with another. (In reply to Nate Graham from comment #14) > Temporarily wasting some memory or disk space is a far better alternative > than data loss. :) This won't behave nicely in editing sessions e.g. in kile or in other document producers, where you edit the sources and save+recompile to PDF often, resulting in a big churn of file creations/removals, and in slowdown due to the blocking file saving. (In reply to Pino Toscano from comment #16) > This won't behave nicely in editing sessions e.g. in kile or in other > document producers, where you edit the sources and save+recompile to PDF > often, resulting in a big churn of file creations/removals, and in slowdown > due to the blocking file saving. Slowdowns and inefficiency are better than data loss. Hopefully we can come up with a way to avoid both, though. Created attachment 128456 [details]
attachment-9638-0.html
Just came across another variation of this problem. I have many documents
open on many workspaces, managing various inter-related workfklows. It is
inevitable in this scenario that I sometimes open the same document twice.
I just did this with Okular and ended up with two versions of an annotated
pdf, one of which could not be saved. Okular said it could not save the pdf
because it had been modified by another program (it had been modified by
another instance of itself, actually). I found the open version and copied
the annotations across.
Text editors usually tell me when this happens and might give me a choice
whether to proceed in read-only or edit mode. (If they were very clever,
which they are not, they would tell me which workspace and which program
had the open doc, because it could be any number of editors in any number
of places). They also give me a choice of actions when the underlying doc
is modified by the other program: ie. ignore changes / reload document from
changed version / cancel and ignore the mismatch completely (therefore to
take the option, not given, of saving the now different version under a
different name). If they were really clever, they would give me the option
of viewing the differences in something comprehensible to humans (ie. not
diff), and pick which ones to keep. Or to merge the two documents and view
only deletes or conflicts before deciding what to do with them, But that is
another matter.
This has come up with Okular because I've started using it to annotate
pdfs. (Hurrah! at long, long, long, long last. this is such a relief! pdfs
- the most ignorant (albeit under-appreciated) form of digital document
known to man, that are like pieces of paper that people are allowed to read
only with their hands tied behind their backs, at last given primitive
intelligence). Okular does annotation very well.
> the handle is kept for the _old_ file, so when some PDF producer (latex, etc) creates a new file by rename()ing the new temporary file on the old name (which is an atomic operation), you don't notice that.
> At least, I remember there were problems in the results you get from file watching because of this.
This could be workaroundable (or already is?) by watching the parent dir as the parent dir would see the file disappear and be created. I remember some case in which this was neeed and i think it was working, but as said, needs somebody to test it extensively (ideally adding autotests :))
The other thing is trying to figure out if we can just use poppler's file handle to save the data (that should still be open), last time i tried it didn't seem to be working well at all, but i have been known to make mistakes I still don’t completely understand the issue with FileKeeper. Is it: a) FileKeeper keeps Okulars own file handle opened, so Okular itself will not notice changes to the file through *this* handle? b) FileKeeper opens another file handle, and such an open handle prevents others (including other Okular code) to monitor the file? In case of a, the solution would be that FileKeeper opens a new file handle, right? Then Okular could continue to monitor the file through the old handle. In case of b and other OSes, it is probably necessary to copy the file to another temporary file, if considered appropriate. Besides that, how does saving the file after annotating work? There must be some function in Poppler to save the loaded and modified file, right? (In reply to David Hurka from comment #21) > Besides that, how does saving the file after annotating work? There must be > some function in Poppler to save the loaded and modified file, right? I don't understand what "save the loaded and modified file" means, sorry my brain can't parse it. But if you grep for saving in the okular poppler generator you'll find PDFGenerator::save. You can pull the thread from there. > But if you grep for saving in the okular poppler generator you'll find > PDFGenerator::save. You can pull the thread from there. Thanks. The related error message for annotated documents is “There are unsaved changes, and the file 'file.pdf' has been modified by another program. Your changes will be lost, because the file can no longer be saved. Do you want to continue closing the file? [Continue Closing] [Abort Closing]“, which is quite frustrating in itself. > > Temporarily wasting some memory or disk space is a far better alternative > > than data loss. :) > > This won't behave nicely in editing sessions e.g. in kile or in other document > producers, where you edit the sources and save+recompile to PDF often, [...] Good point. But if okularpart runs in reader mode, there is no need to save the document, so we don’t need an extra backup, right? I wanted to suggest a size limit for such a backup, which could be linked to the Okular memory setting. Suggestion: Low: Backup files below 2MB; if they are not in the home directory below 10MB Normal (default): Backup files below 10MB; if they are not in the home directory below 20MB Agressive: Backup files below 50MB Greedy: Backup all files which fit into the free memory And I just want to add: It’s not a problem in Firefox, but in Okular. Firefox says “Hey Okular, please open this file and make the user happy!”, and not “Hey Okular, please open this file and make the user happy. Oh, and please tell me when you don’t need it anymore!” (In reply to David Hurka from comment #23) > And I just want to add: It’s not a problem in Firefox, but in Okular. > Firefox says “Hey Okular, please open this file and make the user happy!”, > and not “Hey Okular, please open this file and make the user happy. Oh, and > please tell me when you don’t need it anymore!” We disagree, from the moment firefox tells me "open this file", the file is mine, in my opinion anything firefox does to it it's bad manners Is there a technically feasible way for Firefox to do something better? (In reply to Nate Graham from comment #25) > Is there a technically feasible way for Firefox to do something better? In the same way XDG desktop environment do it: by handling the XDG base directory and XDG desktop files specifications to detect which applications can handle a certain content type and/or a certain remote protocol. See also https://bugzilla.mozilla.org/show_bug.cgi?id=415441#c12 -- maybe there is a better non-closed bug asking this, and if not it would be good to open it. However, in general Mozilla does not care that much about integrating properly Firefox/etc in Unix desktops, as their main share of users is on Windows... That's a shame. Thanks for the history. If Firefox isn't interested in playing ball, then our only realistic option is work around it in Okular. This isn't ideal, but seems preferable to letting user data be destroyed for the sake of a principle (making the fix in the right place). (In reply to Nate Graham from comment #27) > If Firefox isn't interested in playing ball, then our only realistic option > is work around it in Okular. This isn't ideal, but seems preferable to > letting user data be destroyed for the sake of a principle (making the fix > in the right place). The "principle" here is one of the factors, and not one of bigger ones. This behaviour (i.e. remove the temporary files when closing tabs or exiting) affects _any_ application launched by Firefox: okular, kwrite/kate, gwenview, you name it. Adding the same workarounds in any application potentially launched by Firefox and used by users means adding more complexity to those applications (like proven by the discussion in this bug, and my attempt 12 years ago, bug 163363), and even more so when each application usually handles file reading in its own way, making even harder to "share" a potential workaround. Also, what if you try to talk with Mozilla people about this, for example? I was repeatedly bitten by this issue in the past as well and I don't quite get it. Can Okular not simply keep an open file handle for the document, which results in ext4 not removing the data from the disk, even if the original file is removed (i.e. like file handles always work on Linux)? As a fact, that is already the case (as I think Albert considered a few comments above), as you can see in lsof. Okular can also still load the data for not-yet-rendered pages, so the data is clearly there and okular can access it. If it is not possible to re-use that file handle from the save logic, maybe just create another one? Regarding other applications: most applications (e.g. kate) are not affected by this issue, since they just load the whole file into memory, and save it from their in-memory buffer if requested. Gwenview also seems to do this, even though it closes the file when deleted, it can still save an e.g. rotated image, apparently from memory (it has no open file handle). Okular is quite special here with its "stream from disk" behaviour, from a quick look around I didn't find another commonly used application doing this. I tried to just fix this now, and I am more confused than ever about this issue. Just removing the error message and early return makes Save As work flawlessly. Why is the error message even there? Perhaps a relic from the past when things did not used to work properly? If you send a merge request, I can help test it. There's a bit of work around that place needed to make "Save" work properly ("Save as" does automatically, but not "Save"), so I'd like to hear Albert's opinion first ;) If you can make it work without breaking any features, i don't see the problem. Below is the commit that made Okular refuse to save if the file is modified. Looks like Sven Brauch was basically trying to revert this commit, but we should first figure out what’s the problem with poppler. commit 559836c39270506e1f7c7b58bcf6b088a097b246 Author: Albert Astals Cid <albert.astals.cid@kdab.com> Date: Fri Nov 17 14:19:12 2017 +0100 Give warnings when the file is modified externally Summary: Unfortunately, poppler (the only backed that supports saving) is not able to save properly if the file is modified by a third party while it is opened So we give the user a warning saying things went wrong and give him the option to not reload/close, that way if there was something very important in the annotations she added she can try to save them (even if by copy&paste the contents to a third program) https://phabricator.kde.org/D8863 If some generators can’t keep their own document file, we should probably consider every file a remote file, i. e. make a temporary copy of every file and open that instead. Right? Hmm, that's an idea, yeah. There exists this code in part.cpp:
> if (m_document->canSaveChanges()) {
> [...]
> } else {
> // If the generators doesn't support saving changes, we will
> // just copy the original file.
> if (isDocumentArchive) {
> [...]
> } else {
> // duh, our local file disappeared...
> if (!QFile::exists(localFilePath())) {
So if we make all generators access a local copy, we might either move the problem or solve a second problem at the same time. I vote for the second option. ;)
Since there is Generator::loadDocumentFromData(QIODevice ...), it should be
flexible to implement reading from a local copy. But since SaveInterface::save() exists only in QString version, we either need to use real temp files (QTemporaryFile?), or redefine this parameter to be ignored and write to the previously given QIODevice instead. I vote for the first option. ;)
With always accessing a local copy, we probably won’t need swapBackingFile() anymore. swapBackingFile() also exists only in QString version.
I don't think copying the file before opening is acceptable. It uses my hard disk more than it should, it wears&tears my hard disk more than it should and it moves us away from being able to load as you go PDF linearized files. I mean think of this message you would have to show the user "Sorry, we can't open your file because you don't have enough space on disk". And the user face would be all like "Dude the file is already on disk" > it moves us away from being able to load as you go PDF linearized files. Hmm, ok. Since this is not closed as INTENDED, I assume there must be another solution. A) Keep the current swapBackingFile etc. mechanisms and make a backup file condititonally, like in #23: > I wanted to suggest a size limit for such a backup, which could be linked > to the Okular memory setting. Suggestion: > * Low: Backup files below 2MB; if they are not in the home directory > below 10MB > * Normal (default): Backup files below 10MB; if they are not in the home > directory below 20MB > * Agressive: Backup files below 50MB > * Greedy: Backup all files which fit into the free memory (Remote files are always backed up, because of ReadOnlyPart.) B) Disable editing tools (annotations, forms) when the file is not a remote file. > I mean think of this message you would have to show the user "Sorry, we
> can't open your file because you don't have enough space on disk".
If there is not enough space to create a temporary copy, how should the file be saved after editing?
If the file was streamed-from-disk in that case, it would have to be deleted in order to save the edited version. But then, nothing can be saved anymore, so I don’t see the point to not create temporary copies.
So I make another suggestion:
C) Create temporary copies of any file. But if there is not enough space to make a temporary copy, consider the original file to be the temporary copy (but remember to not delete it).
D) Like C, but also remove swapBackingFile. I think full reload instead of swapBackingFile is acceptable in this corner case. (If that would lose e. g. annotations, the user could still save to .okular first.)
...
Hmm, okay, the user could still save to a remote location.
(In reply to David Hurka from comment #40) > > it moves us away from being able to load as you go PDF linearized files. > Hmm, ok. > > Since this is not closed as INTENDED, I assume there must be another > solution. I would close it as INTENDED, you deleted the file, it's your fault, you should not have deleted it. If you did not delete the file, blame whoever did it. But I understand the world is imperfect and it would be useful if we could work in that situation. > > A) Keep the current swapBackingFile etc. mechanisms and make a backup file > condititonally, like in #23: > > > I wanted to suggest a size limit for such a backup, which could be linked > > to the Okular memory setting. Suggestion: > > * Low: Backup files below 2MB; if they are not in the home directory > > below 10MB > > * Normal (default): Backup files below 10MB; if they are not in the home > > directory below 20MB > > * Agressive: Backup files below 50MB > > * Greedy: Backup all files which fit into the free memory > (Remote files are always backed up, because of ReadOnlyPart.) > > B) Disable editing tools (annotations, forms) when the file is not a remote > file. Personally, i would suggest trying to figure out why poppler fails and fix it other than doing all strange things that need the user to read lots of stuff that has "if this and if that and not that" (In reply to David Hurka from comment #41) > > I mean think of this message you would have to show the user "Sorry, we > > can't open your file because you don't have enough space on disk". > > If there is not enough space to create a temporary copy, how should the file > be saved after editing? Getting a "not enough space" on saving, is totally normal and expected, getting it on opening is the weirdest thing ever. > If the file was streamed-from-disk in that case, it would have to be deleted > in order to save the edited version. But then, nothing can be saved anymore, > so I don’t see the point to not create temporary copies. > > So I make another suggestion: > > C) Create temporary copies of any file. But if there is not enough space to > make a temporary copy, consider the original file to be the temporary copy > (but remember to not delete it). > > D) Like C, but also remove swapBackingFile. I think full reload instead of > swapBackingFile is acceptable in this corner case. (If that would lose e. g. > annotations, the user could still save to .okular first.) > > ... > > Hmm, okay, the user could still save to a remote location. > Personally, i would suggest trying to figure out why poppler fails and
> fix it other than doing all strange things that need the user to read
> lots of stuff that has "if this and if that and not that"
I looked at what happens when Okular tries to save a *modified* *PDF* document. In this case, the Generator is able to save the document including the changes to a new file.
<investigation>
“->” will denote “function calls function”.
Poppler_qt5 will denote the Qt5 interface of Poppler, as Okular sees it.
Poppler will denote internal Poppler stuff.
1.1. Part requests to save the document to a local file:
Part::saveAs() -> ... -> PDFGenerator::save()
1.2. PDFGenerator::save() creates a new Poppler_qt5::PDFConverter from the existing Poppler_qt5::Document, and uses it to save the document:
PDFGenerator::save() -> Poppler_qt5::Document::pdfConverter()->convert()
1.3. Poppler_qt5::PDFConverter::convert() creates a Poppler::QIODeviceOutStream, which is a Poppler::OutStream, on the local output file. Then it lets the document be saved in this OutStream:
Poppler_qt5::PDFConverter::convert() -> Poppler::PDFDoc::saveAs() -> Poppler::PDFDoc::saveIncrementalUpdate()
1.4. Poppler::PDFDoc::saveIncrementalUpdate() creates a copy of “str” (see 2.2), creating a new Poppler::FileStream.
1.5. Poppler::PDFDoc::saveIncrementalUpdate() first makes a verbatim copy from this FileStream to the OutStream. Later it modifies the output file within the OutStream to reflect the modifications.
So where does the input data Poppler::FileStream come from?
2.1. Document::openDocument() calls the PDFGenerator to open the PDF document from a local file. (It can also pass the data in a QByteArray, but that is only relevant when the document comes from stdin.)
Document::openDocument() -> ... -> PDFGenerator::loadDocumentWithPassword() -> Poppler_qt5::Document::load() -> new Poppler::DocumentData() -> new Poppler::PDFDoc() -> Poppler::GooFile::open() -> poppler/gfile.cc: openFileDescriptor() -> open()
2.2. Poppler::PDFDoc::PDFDoc() makes the new GooFile accessible as Poppler::FileStream, which is a Poppler::BaseStream, and stores it as “str”.
So what does reading from this Poppler::FileStream do?
3.1. As seen in 1.5, Poppler::PDFDoc::saveIncrementalUpdate() reads the whole input file from the FileStream:
Poppler::PDFDoc::saveIncrementalUpdate() -> Poppler::FileStream::getChar() -> Poppler::FileStream::fillBuf() -> Poppler::GooFile::read() -> unistd.h: pread()
So what happens in 1.4, when the stream is copied?
4.1. Poppler::PDFDoc::saveIncrementalUpdate() -> Poppler::FileStream::copy() -> new Poppler::FileStream()
4.2. copy() created a new stream on the same GooFile, but with an empty buffer. This means, any first call to FileStream::getChar() will call pread() on Poppler’s input file handle.
</investigation>
So there we are: The Poppler feature, which we already discovered as stream-from-disk, is implemented by calling pread(). pread() relies on the integrity of Poppler’s input file handle. This means Okular is responsible to guarantee the integrity of the file handle. Since Okular passes a path to a local file, we rely on the OS, which should not modify or delete the input file while Poppler has its file handle on it.
This works fine in these cases:
A) Okular opens a remote file. ReadOnlyPart will make a temporary file as local copy, so integrity is guaranteed.
B) Okular opens a persistent local file (e. g. ~/Mess/LM555.pdf). Integrity is guaranteed as long as the user does not modify the file intentionally.
This works not fine in this case:
C) Okular opens a temporary local file, because Firefox told it to do so. Integrity is not necessarily guaranteed, because Firefox will delete this file soon. Before commit 559836c3, Okular let Poppler try to read the deleted file through its still existing file handle. As seen in 4.2, this should work fine if the system uses e. g. ext4.
This works not at all in this case:
D) Okular opens a file from a thumb drive, and later the thumb drive fails. Because the data is absolutely unavailable now, Poppler can’t read any data through its file handle .
Apparently we are mostly concerned about case C. How about letting PDFGenerator try to save the file even if the source file is detected to be modified? If PDFGenerator fails, we can inform the user that the PDF backend is to blame.
This is what the old posix says (or at least i remember/understood from my university years): "Once you have a file descriptor from a file open, the file can disappear, change, whatever, you will still read the contents of the file as if it was when you opened the file descriptor". But either that has changed or i misunderstood it, because that is not what happens, see https://i.imgur.com/DaSS7Ad.gif where i change a file after having opened it, and the read after having changed the file gives me the changed contents and not the original ones. If you want the code it's http://paste.debian.net/1157802/ Note if I change it with vi instead of kate, "it works", i.e. gives me the old contents instead of the new, but the fact that "it can fail", means we can't trust it to do A nor B. This means reverting 559836c3 would make Okular work nicely with Firefox in many cases, but probably not all cases. I think we can agree on some priorities now? 1. Work nicely with Firefox in many cases (I. e. allow late saving) 2. Work nicely with Firefox in all cases 3. Linearized PDFs 4. Don’t use more memory than needed (I. e. allow stream-from-disk) 5. These are the priorities that come to my mind now, maybe you have others? To implement 3, we would need to bypass ReadOnlyPart::openUrl(), which downloads the whole file first. Additionally we need to teach browsers to pass the remote URL to Okular instead of downloading the document first. To implement 2, we need to create backup files for files which are suspected to vanish. 2 is in conflict with 3 and 4, so we need to offer configuration options and/or to actively use swapBackingFile. Sorry for not checking back, I had a look; for PDF it actually works perfectly by just removing the error message, but for other file types (e.g. PNG) we would need to keep the FD open manually. There actually already is code for that, which needs a slight refactoring to work correctly though, which is where I stopped -- sorry, need to pick that up again. @Albert: I think you misunderstood. An open file descriptor refers to a specific file, which is basically ref-counted by all open file descriptors and all file names in the file system hierarchy referring to it. POSIX guarantees you that it won't swap this *pointer* behind your back, but not that you always get the same *contents*. So for modifying files, it depdends on how applications do that: if they do open("filename"), write(), close() you get the new contents; if they do open("filename.tmp"), write(), close(), rename("filename.tmp", "filename"), you get the old contents, because the new contents are in a *different* file (which now happens to have the same name). So the behaviour is clear, well-defined and reliable, it just depends on what applications do. Sven wrote: > if they do open("filename.tmp"), write(), close(), > rename("filename.tmp", "filename"), you get the old contents, > because the new contents are in a *different* file (which now > happens to have the same name). > > So the behaviour is clear, well-defined and reliable, Well, it’s not necessarily reliable. On your ext4 home partition it should work, but if you use FAT for some reason, or even the thumb drive fails, the data is still lost. > but for other file types (e.g. PNG) we would need to keep the FD > open manually. There actually already is code for that, which needs > a slight refactoring to work correctly though, I think for generators which can’t save(), we should just go the KISS way and always make a temporary copy. Most document types other than PDF need (very) much less space on disk than we need space on RAM to render a single viewport. (E. g. PNG doesn’t have 30 other pages, TXT is always very compact.) Which code do you mean? FileKeeper didn’t turn out to work very well, but if you understand how to fix it... :) > On your ext4 home partition it should work, but if
> you use FAT for some reason, or even the thumb
> drive fails, the data is still lost.
Ok, but now we have strayed quite far from the original goal. Those are not common cases on Linux. For FAT or NTFS, one would need to check if the file can even be removed by default, I think these use exclusive locking as standard?
Re. copying, to be honest I have never in my life seen anyone view any file except .pdf (or maybe .ps) in okular, nor done that myself, so I guess the solution for these doesn't matter that much ...
We have had a relevant amount of people asking and reporting issues with the comic-book files, for example. Let's not have a backend-specific solution, please. @Sven thanks for the refresher :) Honestly the only good solution i see is getting the file keeper to work, and hopefully after 12 years file watching has improved and the file keeper doesn't break it. David, which issues have you had with the file keeper? 0. To ensure we have properly diagnosed the problem: 1. This error is easily reproduced, 2. Open the same file twice (simultaneously). Make changes in both copies, and save one. 3. It is not surprising that okular refuses to save over the changed file. 4. It IS surprising that the file cannot be saved at all. 5. In my test case, okular uses 40MB to view an 0.8MB file. It would be at most a 5% increase in RAM to keep the whole file in memory. Perhaps do that for small files? 6. In the meantime, perhaps a change to the error message to clarify (3&4) above: CURRENT VERSION: There are unsaved changes, and the file 'file.pdf' has been modified by another program. Your changes will be lost, because the file can no longer be saved. Do you want to continue reloading the file? PROPOSED VERSION: There are unsaved changes, and the file 'file.pdf' has been modified by another program. Your past *and future changes* will be lost, because the file can no longer be saved, *not even to a new file*. Do you want to continue reloading the file? OPTIONAL SUGGESTION: (Consider manually copying over your changes to a freshly opened copy of the file.) I propose the bug report be titled, "Cannot save PDF when loaded file has been deleted *or touched by any other process*." To reproduce: 1. open $FILE in okular. 2. Highlight a word. 3. touch $FILE Result: "There are unsaved changes, and the file '...' has been modified by another program. Your changes will be lost, because the file can no longer be saved. Do you want to continue reloading the file?" Note: Okular takes 800 MB to render a 5.4 MB file I'm reading; keeping it all in memory would require *at most* 0.7 % more memory. Why not do that if memory is available? *** Bug 430690 has been marked as a duplicate of this bug. *** This bug has been languishing for two years. Can't you apply the fixes you have? *** Bug 431198 has been marked as a duplicate of this bug. *** Can we please put our heads together and come up with a solution for this issue? Yes this bug is a pain, I lost a lot of work in annotations and comments yesterday. Since I am not a KDE dev and not a software developper in general, I can only give you a external observation of this nasty bug. I think it is a 2 factors conjecture that lead to this bug: 1. The okular annotation tool seems to be detected by the main instance as modifier from another program/stack. I have seen that okular is having multiple entries in the program list, this must be due to the different back-ends that run it. These back-ends should be considered as the same software in plasma/filekeeper. This might solve the problem, but the technicalities are beyond my proficiency in C.Sc. 2. The background job should look for OKULAR_KEEP_FILE_OPEN before purging /tmp files What about keeping the file in memory (/tmp) as long as okular has not closed it and if the file is detected in /tmp during a save, ask to save it in the ~/ folder. I think it is the way Preview works. This bug is a pain.. I found a way to recover using PV .. a work-around 1) find the process+fd by looking directly in /proc: ls -al /proc/*/fd/* 2>/dev/null | grep {filename} 2) Then with pv thrown in: tail -c +0 -f /proc/{procnum}/fd/{fdnum} | pv -s {expectedsize} > {recovery_filename} You may need to Ctrl-C when done (ls /proc/{procnum}/fd/{fdnum} will tell you that the file no longer exists)), but if you know the exact size in bytes, you can use pv -S to make it exit when the count is reached. *** Bug 432271 has been marked as a duplicate of this bug. *** See post 59 - as this works - I've used it a few times - it is obvious that it can be fixed within the program. At the worst, there would be some system call - more likely the library calls to do this already exist within the dependencies. So this bug is confirmed and fixable. (In reply to Karl from comment #59) > This bug is a pain.. > > I found a way to recover using PV .. a work-around > > 1) find the process+fd by looking directly in /proc: > > ls -al /proc/*/fd/* 2>/dev/null | grep {filename} > > 2) Then with pv thrown in: > > tail -c +0 -f /proc/{procnum}/fd/{fdnum} | pv -s {expectedsize} > > {recovery_filename} > > You may need to Ctrl-C when done (ls /proc/{procnum}/fd/{fdnum} will tell > you that the file no longer exists)), but if you know the exact size in > bytes, you can use pv -S to make it exit when the count is reached. This bug has caused me a lot of troubles, lost nerves and data, incredibly frustrating that this is ongoing for 3 years. I'm happy you were able to find a workaround. Can you maybe give a further explanation? Are you supposed to close Okular throughout? I tried copying into another directory using {recovery_filename}, but the progress chart does not move. I'm not able to create files inside that folder either. It keeps being stuck at 0Bytes and does not change. Albert, we really need to fix this. Can we come up with a path forward? I too have encountered this bug a lot, as recent as this morning where I almost lost a PDF I had tracked down. I’m not knowledgeable in Qt or C++, but I’m looking around in the codebase nevertheless, in the hopes that I spot something useful. I see that, in document_p.h, there’s this definition QVector<Page *> m_pagesVector; which probably does hold the data necessary to save even if the original document is gone. Now, as an Emacs user I observe the following patterns: - A file that looks like ‘.#<filename>’ is created, this is a lock file, used to prevent simultaneous editing - Emacs listens on the target file ‘<filename>’ using file system notification libraries. If you edit a buffer that visits a file that has been edited outside of Emacs, it will query you, asking you to either confirm the edit, cancel the edit, or to ‘revert the buffer’ before editing, which means to update the contents of the buffer re-reading the file from disk. - When enabled, Emacs writes backup files after each save, which look like, without customisations, ‘<filename>~’. This is governed by a variety of environment and in-Emacs variables, but the created file contains the version of the edited file as it was before it was saved, e.g.: - open file.txt, containing "hi" - append ", john" - save - file.txt~ is created, containing "hi" - file.txt is overwritten, containing "hi, john" - The contents of a buffer is separate from the contents of the file, the data is spared in memory even if the file is deleted, so you can save a buffer into a file regardless. I doubt it’s desirable to copy this wholesale to Okular, but the following could be done: - Upon opening a file, it’s wholly read into a buffer as a string of bytes, which acts as a buffer between the Document class, and the on disk file. This would ensure that data is always in memory, but without taking up too much space, as indicated in the 53rd comment in the thread: Phil 2020-08-21 11:35:55 UTC writes: > Note: Okular takes 800 MB to render a 5.4 MB file I'm reading; keeping > it all in memory would require *at most* 0.7 % more memory. Why not do > that if memory is available? (https://bugs.kde.org/show_bug.cgi?id=402017#c53) - Okular keeps tabs on the file through filesystem notification libraries like inotify, kqueue, etc., and alerts the user if the file on disk is changed, asking them to take action. - If the user has enabled auto-reloading the files, Okular filters for deletion events and apparently destructive events (i.e. new contents of the file do not constitute a valid document, or file is deleted, or new file is empty, or user does not have read/write permissions anymore), and the user is again queried as to what to do. - When the user attempts to exit the application, Okular checks if the file is somehow destroyed or modified, and offers the user one last chance to record the file as it appears in Okular. - These queries would look like the example modal below, with slight variation depending on the triggering event: ,---- | The file ‘/home/u/some.pdf’ has been modified outside Okular. Before | further modifications, you need to decide how to respond to these | changes. | | [_S_ave as a copy...] [_C_ancel] [Edit anyways...] [Overwrite] `---- - ‘Save as a copy...’ means the document as it currently is in Okular is saved under a new name, and Okular switches to that file for the current document. - ‘Cancel’ cancels the attempted modification by the user. - ‘Edit anyways...’ allows the user to continue editing the document, without creating a copy. The user is responsible to ‘Save’ or ‘Save as...’ the document later. This option is dangerous, so does not present a mnemonic keyboard shortcut. - ‘Overwrite’ will overwrite the file the current document targets with the contents of the document as they currently appear in Okular. This option too does not get a mnemonic keyboard shortcut, as it’s destructive. The above would perfectly remedy the problem at hand, and below are some possible enhancements: - ‘VERSION_CONTROL’ and ‘SIMPLE_BACKUP_SUFFIX’ could be respected, as they appear in "(coreutils) Backup options" info manual (https://www.gnu.org/software/coreutils/manual/html_node/Backup-options.html), accompanied by options in the Okular’s preferences system itself - Even if the above is not set, Okular could preemptively store copies of files in a known temporary location like ~/.cache and/or /tmp; and use lock files and file notifications to prevent alterations to documents it’s displaying. Even if this whole thing is too much, Okular keeping the binary source file data in memory along with the costly object tree that it uses to render the file could help with the case that bites the users most frequently: download file through browser, which saves it in a tmp dir, and if you exit the browser before you save a copy in okular, the browser will remove the tmp dir, leading to this problem. Hope this was useful as a user perspective, comparing it with software that handles this more gracefully. I doubt I could contribute any code in reasonable time without extensive guidance, but I’m totally willing to help with testing any attempts. Agree to the last comment. I mean, agree to many comments in these years, there are many valid points and ideas and after reading all of that, I don't really understand why this bug is still open after so many users (and probably many more without knowledge of bug tracking systems) ran into this. Just hit this bug today and an hour of annotation, because I did not figure out how to save the file. It's super nasty that there is no way to save the file, and even after I restored the original file, I couldn't find a way to save it. > I don't really understand why this bug is still open after so many users
Because there is nobody who has both the skills and the free time to fix it.
Perfectly understood. I terms of strategy, it seems to happen when we open a PDF on kmail or elsewhere with Okular. Would it make sense to transfer the right to hold this file from kmail to okular, this way when okular closes, it is still in /tmp and we can save it? There was a time when creating a disk copy would have been seen as a 'bad-thing®', but today, with modern sizes of RAM and disk-drives, I don't think it is probably the right answer. If I look at the typical size of PDFs - they are not huge - If I look at the extreme cases - It might appear as a sledge-hammer approach, but if you look at all the variable situations, and not being able to count on the downloading programs to do things sanely - I think it is the way to go. No need to create complicated user interface choices.. I'm also affected by this, in my case it's because I'm synchronizing files via cloud storage between different computers, and sometimes an update from another machine only comes through after I opened an old version (at least I think that's what happens). The effect is also that I end up with annotations that I can't save (not even via "save as"), which is very annoying. It would be great to see some progress on this! I've lost data multiple times from this bug while filling out PDF forms (lengthy forms, unfortunately). Not being able to "Save As" is a real disaster. Was looking for an alternative to an Adobe product for this task, so it's especially disappointing. A few more details: I'm editing a local file (no browser downloads) but it's in a directory monitored by syncthing. Nothing has changed the file, but it's possibly changed some metadata which, from skimming this thread, seems to be enough to cause the problem? A possibly relevant merge request was started @ https://invent.kde.org/graphics/okular/-/merge_requests/733 A possibly relevant merge request was started @ https://invent.kde.org/graphics/okular/-/merge_requests/733 Git commit b5191a2c1fa8f4ce45e36258942c6157a3799e7a by Nate Graham. Committed on 02/05/2023 at 20:57. Pushed by ngraham into branch 'master'. Let user "Save As..." when document has been externally modified Currently Okular stops the user from saving their document if it has unsaved changes and was modified externally. This makes some sense because there are now two sources of truth, and Okular cannot reconcile them itself. However as a consequence, it causes data loss since the user's unsaved changes in Okular become un-save-able. This is quite frustrating when it happens. But this prohibiton on saving over an externally-modified document only really makes sense for a "Save" operation that overwrites the original document. If instead, the user does a "Save As...", then they can save their local changes to another file and avoid losing unsaved changes. Then if needed, they can manually compare the original externally-modified document with their newly-saved document and reconcile the changes by hand. Accordingly, this commit avoids showing the error message box and blocking saving if the user is doing a "Save As..." to a new location, rather than overwriting the open file. FIXED-IN: 23.04.1 M +3 -1 part/part.cpp https://invent.kde.org/graphics/okular/commit/b5191a2c1fa8f4ce45e36258942c6157a3799e7a Git commit 89f17155bffa20500d7f3f098b08f61eedcb780d by Nate Graham. Committed on 02/05/2023 at 20:57. Pushed by ngraham into branch 'master'. Offer options when trying to overwrite externally-modified file When the user tries to do a "Save" operation on an open file that was since modified externally, currently Okular prohibits it. We can make this a bit more user-friendly by instead explaining the situation clearly and offering the user some options: - Yes, really overwrite the changes made in the other program - Instead save the unsaved changes as a new file elsewhere - Abort M +35 -14 part/part.cpp https://invent.kde.org/graphics/okular/commit/89f17155bffa20500d7f3f098b08f61eedcb780d Git commit b596df0c937e015ab284f44fdd7159e925c7acb2 by Nate Graham. Committed on 02/05/2023 at 21:11. Pushed by ngraham into branch 'release/23.04'. Let user "Save As..." when document has been externally modified Currently Okular stops the user from saving their document if it has unsaved changes and was modified externally. This makes some sense because there are now two sources of truth, and Okular cannot reconcile them itself. However as a consequence, it causes data loss since the user's unsaved changes in Okular become un-save-able. This is quite frustrating when it happens. But this prohibiton on saving over an externally-modified document only really makes sense for a "Save" operation that overwrites the original document. If instead, the user does a "Save As...", then they can save their local changes to another file and avoid losing unsaved changes. Then if needed, they can manually compare the original externally-modified document with their newly-saved document and reconcile the changes by hand. Accordingly, this commit avoids showing the error message box and blocking saving if the user is doing a "Save As..." to a new location, rather than overwriting the open file. FIXED-IN: 23.04.1 (cherry picked from commit b5191a2c1fa8f4ce45e36258942c6157a3799e7a) M +3 -1 part/part.cpp https://invent.kde.org/graphics/okular/commit/b596df0c937e015ab284f44fdd7159e925c7acb2 A possibly relevant merge request was started @ https://invent.kde.org/graphics/okular/-/merge_requests/752 Git commit 99600667d44ce0002b1cb1f0dfb3db3c634bc4c6 by Nate Graham. Committed on 08/06/2023 at 15:21. Pushed by ngraham into branch 'master'. Don't stop or warn if the edited file was deleted on disk If an open file with unsaved changes was deleted on disk, Okular currently warns the user about it (in master) or prevents them from saving it (in 23.04). But in both cases this is unnecessary; if the open file was deleted on disk when the user tries to save changes, Okular can simply save to its file path and re-create it, with no risk of data loss or stomping on anyone else's work. So it should do that. M +8 -3 part/part.cpp https://invent.kde.org/graphics/okular/-/commit/99600667d44ce0002b1cb1f0dfb3db3c634bc4c6 *** Bug 477136 has been marked as a duplicate of this bug. *** |