Bug 394775 - Annotations in the separated XML files
Status: RESOLVED INTENTIONAL
Alias: None
Product: okular
Classification: Applications
Component: general
Version: 1.3.3
Platform: Other Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Okular developers
Duplicates: 396681
Reported: 2018-05-28 04:25 UTC by linnets
Modified: 2023-03-07 10:59 UTC

Description linnets 2018-05-28 04:25:38 UTC
I've installed openSUSE 15, and Okular now asks me to save annotations into the PDF file, because they are no longer stored locally.

Although there is a more advanced PDF editor for Linux, 'Master PDF Editor', I still like to use Okular as a viewer because of its annotations, which were saved in separate XML files.

This has advantages:

1) Searching inside notes and editing them is very easy.
2) It is easy to transfer and back up annotations without copying the PDF files, some of which are very large.
3) If I want to save annotations into the PDF, Okular already has a feature for that.
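
A minimal sketch of the kind of search that point 1 refers to, assuming only that the annotation files are well-formed XML sitting in one directory. The function names are hypothetical, and the code deliberately makes no assumption about Okular's element or attribute names: it scans every attribute value and text node.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def find_in_xml(path, needle):
    """Return every attribute value or text node in one XML file that contains needle."""
    needle = needle.lower()
    hits = []
    for elem in ET.parse(path).iter():
        # Look at attribute values and element text alike, because this
        # sketch does not assume where the note text is stored.
        candidates = list(elem.attrib.values())
        if elem.text:
            candidates.append(elem.text)
        hits.extend(c.strip() for c in candidates if needle in c.lower())
    return hits


def search_notes(directory, needle):
    """Map each *.xml file in directory to its matching strings (case-insensitive)."""
    results = {}
    for path in sorted(Path(directory).glob("*.xml")):
        try:
            hits = find_in_xml(path, needle)
        except ET.ParseError:
            continue  # skip files that are not well-formed XML
        if hits:
            results[path.name] = hits
    return results
```

Calling search_notes() on the docdata directory with a keyword would list the files whose notes mention it; the directory location depends on the Okular version and is not assumed here.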
Comment 1 Albert Astals Cid 2018-05-30 07:33:43 UTC
Saving annotations to a separate file is something that won't come back. Sorry.
Comment 2 Christoph Feck 2018-07-20 18:42:32 UTC
*** Bug 396681 has been marked as a duplicate of this bug. ***
Comment 3 kamp 2019-07-08 11:51:25 UTC
I am missing this feature too.

Saving annotations to a file separate from the PDF was very convenient, and I could keep the original file untouched.

It would be nice to have this feature as an option in the Okular settings.

Is there a reason why it should not come back?
The feature was already available.
Comment 4 Sebastian Guttenberg 2019-09-29 09:39:25 UTC
In the duplicate #396681, reasons are given for why this feature disappeared, but I don't find all of them convincing.

One reason is of course understandable (not wanting to maintain two user interfaces, and the fact that storage inside the PDF was one of the most frequently requested features). However, what I don't agree with at all is that the old way was bad because it "didn't do what the user expected and was full of bugs". This shortcoming of the old feature was due only to poor documentation, not to the method itself. One really had to do research to find out that the .xml was stored in .kde/share/apps/okular/docdata. If this had been clearly announced when pressing "save", there would have been far less confusion. It might also have been a good solution to allow the .xml to lie in the same directory as the PDF, so that it would have been easier to copy them together, if desired.
One of the bugs meant by "full of bugs" was apparently the fact that an .xml was no longer associated with its PDF after the PDF was renamed. But this wouldn't have been a bug if users had simply known about the mechanism behind it. Then it's clear: if you rename the PDF and want to keep the annotations, you have to rename the .xml as well.

In #397097 there are a few arguments in favour of the old way. For me, one of the main arguments is disk space. Assume you're working through a scanned book of, say, 40 MB. You make a few tiny annotations worth a kB, but if you don't want to overwrite the original, you get another 40 MB. As a researcher you might have tons of PDFs whose originals you don't want to change, so you have to duplicate each article that you annotate.

On top of that, if you work with a tool like JabRef and have the original PDFs linked to the entries of your bibliography, then making annotations and saving them to a different PDF forces you to update the links in your library; otherwise you won't see the annotations next time.

Furthermore, the old way in principle would have allowed (though I think it wasn't implemented) to switch on and off the annotations easily. If they are embedded in the pdf this might be more complicated (or am I wrong there?).
Comment 5 Ambrogio De Lorenzo 2020-01-10 10:38:01 UTC
There are a lot of reasons to leave files untouched.
I'll try to summarise my point of view.
1. I can annotate a shared PDF, but I don't want others to see my notes. I also want to open the same file every time (maybe because I open it from a web browser).
2. The file is signed (so it cannot or must not be modified).
3. The file is indexed (sometimes using its hash), so it should not be modified.

Using the XML for annotations would be a good way to save annotations without sharing them.
If I modify the PDF and want to share the modified version, I can always do so with the "save as" function.

So I think this "old" function should be reintroduced as a "new feature".

I have 2 questions:
1. What was the last Okular version that had this function?
2. Are there other PDF viewers that permit annotations without changing the PDF?

Regards
 Ambrogio
Comment 6 ederag 2020-04-12 19:53:37 UTC
(In reply to Ambrogio De Lorenzo from comment #5)
> 1. what was the last okular version used this function
> 2. There are other PDF viewer that permit annotations without changing PDF?

Version 1.2.
Okular is too good to move away from ! A package for openSUSE can be found in
https://build.opensuse.org/package/show/home:ederag/okular-1.2


Another description of use cases: in the last paragraph of
https://bugs.kde.org/show_bug.cgi?id=397097#c2
and in
https://bugs.kde.org/show_bug.cgi?id=397097#c3


An interface design was proposed in
https://bugs.kde.org/show_bug.cgi?id=397097#c8
what do you think ?


I'm ready to help implement it,
although my C++ is rusty and the task is daunting.
Discussion and pointers would be appreciated.
Comment 7 Ambrogio De Lorenzo 2020-04-14 08:25:28 UTC
(In reply to ederag from comment #6)
> Version 1.2.
> Okular is too good to move away from ! A package for openSUSE can be found in
> https://build.opensuse.org/package/show/home:ederag/okular-1.2
I currently use Okular 1.9.3.
1.2 is too old. A lot of newer PDF compatibility functionality could be lost with it.
> 
> 
> Another description of use cases: in the last paragraph of
> https://bugs.kde.org/show_bug.cgi?id=397097#c2
> and in
> https://bugs.kde.org/show_bug.cgi?id=397097#c3
> 
> 
> An interface design was proposed in
> https://bugs.kde.org/show_bug.cgi?id=397097#c8
> what do you think ?
> 
> 
> I'm ready to help implementing it, 
> although my c++ is rusty and the task is daunting.
> Discussion and pointers would be appreciated.

Any work that makes it possible to annotate a PDF while leaving the original file untouched is really useful.
There are a lot of cases in which the original document cannot be modified.
A second copy, modified with annotations, is not so simple to use and maintain.

I think developers should consider our point of view and decide whether it is possible to return to the previous behaviour.

Regards
 Ambrogio
Comment 8 Jonathan Schmidt-Dominé 2020-04-14 08:44:29 UTC
To summarise my situation: I am currently still using Okular 1.3. I read and annotate a lot of PDFs for my work, and any newer version of Okular would completely break my workflow. I need the original PDFs so that I can share them with others without my annotations. Many of them are very large (books etc.). Always being asked whether I want to save my annotations sounds like madness to me. I also often search for something within the XML files.

Best regards,

Jonathan
Comment 9 David Hurka 2020-04-14 15:07:38 UTC
If I understand it correctly, you want to make local notes on a remote PDF file. (Or similar to that, where the “remote PDF” is on your local machine.) There are some types of annotations:
* Notes: You spot an interesting point in the PDF and make some notes in the floating popup note window.
* Drawings: You spot something in the PDF that needs to be changed, so you draw directly on the page.
* Drawings with popup note: I don’t think that makes sense.

In case of Notes, it is understandable that the local note should stay when the remote PDF changes. But this can be done with Bookmarks. Bookmarks can even be searched from within Okular, no need to deal with the XML files.

In case of Drawings, they should not stay when the remote PDF changes, because when the PDF gets fixed they become obsolete.

The only problem with Bookmarks is that they are not visible in the viewport. But that could be changed.

Viewing the document without annotations would probably make a good feature request: A button in the annotation toolbar to hide all annotations.
Comment 10 ederag 2020-04-15 11:27:05 UTC
(In reply to David Hurka from comment #9)
> * Drawings with popup note: I don’t think that makes sense.

Here is my use case for that:
highlight a sentence in orange to mean "there's an issue here",
and give details in the popup. Very handy.


> In case of Notes, it is understandable that the local note should stay when
> the remote PDF changes. But this can be done with Bookmarks.

Bookmarks were attached to pages (or did that change ?),
the granularity was not fine enough for me.


> In case of Drawings, they should not stay when the remote PDF changes,
> because when the PDF gets fixed they become obsolete.

Indeed. But why focus on pdf changes ?
I only use annotations when the underlying pdf stays unaltered.
(articles, or official documents)
Comment 11 David Hurka 2020-04-16 13:34:56 UTC
I just remembered about Xournal. It’s a notetaking application, which can use PDFs as background. I never used it, and I’m not sure how much this paragraph applies to Xournal, but:

> Fileformat
> The fileformat *.xopp is an XML which is .gz compressed. PDFs are not embedded
> into the file, so if the PDF is deleted, the background is lost.
(From Xournal README)
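
If the README's description holds (gzip-compressed XML), such a file should be readable with nothing but a standard library. A small sketch under that assumption; the helper names are hypothetical, and nothing is assumed about the element names inside the file, so it only counts the tags it finds.

```python
import gzip
import xml.etree.ElementTree as ET
from collections import Counter


def load_gzipped_xml(path):
    """Parse a gzip-compressed XML file and return its root element."""
    with gzip.open(path, "rb") as handle:
        return ET.parse(handle).getroot()


def tag_counts(root):
    """Count how often each element tag occurs, as a quick overview of the file."""
    return Counter(elem.tag for elem in root.iter())
```

Being able to inspect the file this way means the notes stay searchable outside the application, which is the property several commenters here valued in the old docdata XML.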
Comment 12 ederag 2020-04-18 12:06:07 UTC
Thanks for the tip, xournal improved a lot!
Yet xournalpp is fine for few pages, but currently slow to open books.
(tested with a 56MB, 500 pages long pdf, 
 xournalpp versions 1.0.8 and current master: 4d2e2fb)
Development is active, that might improve quickly.

I'm really fond of Okular's responsiveness.
The text/columns aware highlighter of okular is also amazing.

The migration of docdata would also be an issue.
And it does not look feasible to annotate a pdf attached to a mail,
move the mail to its folder, reopen the pdf and see the annotations,
as used to be possible with okular.

But the LaTeX annotations of xournal are appealing.

okular part.cpp was very readable (as often with kde code),
and docdata capability is still around (for archives).
That opens other possibilities.

A workaround might be found, without bothering okular devs. Need to think.
Comment 13 David Hurka 2020-04-19 15:22:23 UTC
(In reply to ederag from comment #12)
> okular part.cpp was very readable (as often with kde code),
> and docdata capability is still around (for archives).
> That opens other possibilities.
> 
> A workaround might be found, without bothering okular devs. Need to think.

Wow, I had a hard time understanding the code. (I didn’t read much C++ code before.) If you invest time, it should be possible to make an Okular fork which uses XML annotations. (“Oxular”?)

It will probably not be merged, because users who don’t know the old versions of Okular will not miss it, and for the other users it was probably more frustrating than useful.
Comment 14 David Hurka 2020-04-25 12:10:56 UTC
From the real world I can name another use case for separated XML files now. I just sent a PDF with some questions added to a teacher. So if you let a document be reviewed by multiple reviewers, or send it to students, it would be useful to export all annotations to an XML file (or copy them to the clipboard), and import them into another document.

I think an import/export menu could be added to the bottom toolbar of the Reviews/Annotations side panel, without cluttering the UI too much. Or even just in the context menu.

Of course I need to open a new feature request for that, but would that help the reporters of this bug?
Comment 15 ederag 2020-04-26 11:59:41 UTC
(In reply to David Hurka from comment #14)
> So if you let a document be reviewed by multiple reviewers, ...
> it would be useful to export all annotations ...
>  and import them into another document.
> ... would that help the reporters of this bug?

Interesting use case and feature, but we need an automatic,
instant save of the annotations (as it used to be).

Another use case where separate annotations are necessary:
when we receive a protected file.
That will be prevented by
https://invent.kde.org/kde/okular/-/merge_requests/105
> The current document is protected: All actions are disabled
Comment 16 ederag 2020-05-09 12:06:15 UTC
Good news: an experimental helper script seems to work around the problem for our use case.
(There are issues; it's not ready to share yet.)
It required a two-line hack (not satisfying yet) to Okular,
so that the archive holds the original file,
and the annotations in its metadata.xml.
The helper uses sha512sum, so the annotations follow file renames.
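
To illustrate why hashing makes annotations follow renames, here is a rough sketch in Python rather than the actual helper script, which is not shared in this thread. The function names and the storage layout (one `<sha512>.xml` file per document in a store directory) are assumptions for illustration only.

```python
import hashlib
from pathlib import Path


def sha512_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so that large PDFs are not loaded into memory at once."""
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def annotation_path(pdf_path, store_dir):
    """Hypothetical layout: annotations for a document live at <store_dir>/<sha512>.xml."""
    return Path(store_dir) / f"{sha512_of(pdf_path)}.xml"
```

Because the key depends only on the file's content, renaming or moving the PDF leaves the lookup unchanged. The flip side is that any modification of the PDF itself changes the key, so this only works while the original stays untouched.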

So if okular would keep the ability/option to store annotations in metadata.xml,
together with an option to avoid https://bugs.kde.org/show_bug.cgi?id=397097,
then chances are we could all be happy.

Why think so ?
The reasons for burning things into the pdf were indeed compelling:
- confusion on renames, 
- use cases opposite to ours (https://bugs.kde.org/show_bug.cgi?id=315552)
- and security issues with forms data (https://bugs.kde.org/show_bug.cgi?id=267350)

The latter makes it necessary to burn the form data into the pdf,
so it was decided to burn annotations in as well.

Yet there is a strong difference between forms and annotations:
one writes *into* a form ("fill a form"), while
one *adds* an annotation *onto* a document.

Please don't get me wrong, 
I'm not trying to change the default behavior at all.
Just advocating that our use case makes sense as well, 
and would be a nice option to have.
Comment 17 Andrew Norton 2022-09-05 05:46:36 UTC
I prefer to keep my PDF collection clean by saving Okular annotations to separate files, so I've written a Python3 program (annotation-mgr) to do that.

The purpose of annotation-mgr is to simulate the old Okular behaviour for new Okular versions. While annotation-mgr is running, Okular appears to behave as though it saves annotations to separate files, leaving the original PDF files unmodified.

For further info and download:   https://github.com/ahnorton/annotation-manager
Comment 18 ederag 2022-09-05 18:58:22 UTC
I have not been able to fix the issues mentioned in comment #16,
because the annotation saving into the pdf is deeply entangled with annotation handling in general now.
It might be doable, but would be a huge change, and discussions gave little hope that it would be welcome,
so I went back to okular-1.2.

Hence I'm delighted to see another take.
The idea to keep pristine okular, and store/apply the diff (tail) of the annotated files looks clever !

Did I read correctly that the pdf files have to be writable, and thus temporarily modified,
which I don't want (I want a pure viewer) ?
Are you open to discussions about this ?
Comment 19 Andrew Norton 2022-09-18 12:15:45 UTC
(In reply to ederag from comment #18)

> Did I read correctly that the pdf files have to be writable, and thus
> temporarily modified,
> which I don't want (I want a pure viewer) ?

The Python script (annotation-mgr) replaces any annotated pdf by its backed-up original when Okular closes. Thus, permissions and modification time of the original pdf are preserved. However, while the "working copy" of the pdf is being viewed and annotated, yes, that file is writable.

BTW, I don't think Okular cares whether or not a pdf file has write permission. If it does not, then when Okular saves annotations it simply changes the write permission (if the user is owner). 

I'm not clear why you would want the pdf file always non-writable?  Perhaps you have simultaneous multiple users viewing and annotating the same pdf and you want each user to save their own annotations?  If so, I see no way to do that without changing the Okular code.
Comment 20 ederag 2022-09-18 14:48:36 UTC
> I'm not clear why you would want the pdf file always non-writable? 

Because many of these pdf are no longer available, and I don't want any corruption risk.

> Perhaps you have simultaneous multiple users viewing and annotating the same pdf and
> you want each user to save their own annotations? 

That would be a relevant use case, indeed.

> If so, I see no way to do that without changing the Okular code.

Actually, since there was no reply,
I rolled my own take on this, and got a good proof of concept.

As in comment #16, it is using sha512, but this time I took some of your ideas
such as keeping pristine okular and working with binary deltas
(except I used rdiff instead of plain tail).

Currently, there are two bash scripts.
The first script copies the original file, applies the former delta if it exists,
and opens the working copy with the pristine okular.
The other bash script, running in the background, monitors the diffs folder;
when the working copy is updated (when the user saves the file in okular),
the delta is recalculated (within a lock to prevent race conditions).
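
For readers who want the gist of the two steps, here is a simplified sketch in Python. It is not the bash scripts described above: instead of rdiff it swaps in a plain suffix ("tail") delta, which assumes the viewer only appends to the working copy when saving; it also leaves out the hashing, the folder monitoring and the lock. All function names are hypothetical.

```python
import shutil
from pathlib import Path


def open_working_copy(original, delta, working):
    """Step 1: copy the original and append the stored delta, if there is one."""
    shutil.copyfile(original, working)
    if Path(delta).exists():
        with open(working, "ab") as out:
            out.write(Path(delta).read_bytes())
    return Path(working)


def update_delta(original, working, delta):
    """Step 2: store whatever was appended to the working copy on save.

    Raises ValueError when the working copy no longer starts with the
    original bytes, because a suffix delta cannot represent a rewrite.
    A binary diff tool such as rdiff does not have this limitation.
    """
    base = Path(original).read_bytes()  # whole-file reads keep the sketch short
    current = Path(working).read_bytes()
    if not current.startswith(base):
        raise ValueError("working copy was rewritten, not appended to")
    Path(delta).write_bytes(current[len(base):])
```

The original is only ever read, never written, which is the property asked for earlier in this thread. A real version would also have to serialize update_delta() against concurrent saves, as the lock in the second script does.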

I'm working on the second script to make it a dbus service, with automatic launch.
It's a bit like your manager, but much simpler.