Bug 425148 - okular import of old embedded annotations does not import the original create/modify time stamp
Summary: okular import of old embedded annotations does not import the original create...
Status: REPORTED
Alias: None
Product: okular
Classification: Applications
Component: PDF backend (show other bugs)
Version: unspecified
Platform: Compiled Sources Linux
: NOR normal
Target Milestone: ---
Assignee: Okular developers
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-08-08 19:50 UTC by jprorama
Modified: 2020-08-24 21:56 UTC (History)
1 user (show)

See Also:
Latest Commit:
Version Fixed In:


Attachments
attachment-26762-0.html (2.54 KB, text/html)
2020-08-09 23:39 UTC, jprorama
Details
attachment-29420-0.html (1.07 KB, text/html)
2020-08-18 19:53 UTC, jprorama
Details
attachment-28902-0.html (941 bytes, text/html)
2020-08-22 21:43 UTC, jprorama
Details

Note You need to log in before you can comment on or make changes to this bug.
Description jprorama 2020-08-08 19:50:28 UTC
SUMMARY

Observed with okular versions since 1.3.3 on Ubuntu 18.04.  When opening a PDF with annotations using the old "internal format" storage the user is prompted to save the annotations to the PDF file.


"This document contains annotations or form data that were saved internally by a previous Okular version. Internal storage is no longer supported.
Please save to a file in order to move them if you want to continue to edit the document."

When saving the annotations, the original create and modify time stamps are lost and replaced with the current datetime.

STEPS TO REPRODUCE
1. use current version of okular to open a PDF with annotations stored in the original internal storage (eg. okular xml file for pdf in  ~/.kde/share/apps/okular/docdata/)
2. the need to migrate is detected
3. saving the file to a new PDF preserves the original annotation.
4. the create and modify time stamps of from the original annotation are lost and replaced with the current date.

OBSERVED RESULT

The original time stamps for create and modify of an annotation are lost on import to the PDF and replaced with the current date.

EXPECTED RESULT

The original time stamps for create and modify of an annotation are should be preserved and imported to the PDF.


SOFTWARE/OS VERSIONS
Windows: 
macOS: 
Linux/KDE Plasma: Ubuntu 18.04 is test system but any version of okular seems to behave this way.
(available in About System)
KDE Plasma Version:  
KDE Frameworks Version:  5.44.0 and above
Qt Version: 5.9.5 (built against 5.9.5) and above

ADDITIONAL INFORMATION

This appears to be part of how the annotations are imported.  The annotation is properly reconstructed from the original annotation but the metadata on the annotation is not migrated to the new annotation format.

The migration logic seems to begin here: 

https://github.com/KDE/okular/blob/master/core/document.cpp#L1192

This appears to reach the code where the local data is converted to new annotation format.

https://github.com/KDE/okular/blob/master/core/page.cpp#L811

The newly instantiated annotations don't appear to have their time stamps explicitly set from the original so the new instance timestamps are the ones that get saved.
Comment 1 Albert Astals Cid 2020-08-09 22:13:43 UTC
Sure, valid issue, but at this point how many people are left using such old versions that this is actually an issue we want to spend time fixing?
Comment 2 jprorama 2020-08-09 23:39:05 UTC
Created attachment 130742 [details]
attachment-26762-0.html

It’s an issue in 1.10 so wouldn’t call it old.

1.3 is in Ubuntu 18.04 and is a valid upgrade path from soon to expire
Ubuntu 16.04 which included a prior okular release that still used the
internal storage for annotation.

This is a bug still present in the data migration routines of 1.10 that
actually causes data loss by ignoring the metadata of the original
annotation.

For those of us who use the annotation feature to take notes in Okular and
are confronted by the loss of timestamps of a large collection of
annotations, it’s a real bug that doesn't have any resolution but in the
code base of okular during migration steps.

I filed the bug as the first step of problem resolution with the intent of
coding a solution.  I believe it comes down to a line or so of code: set
the properties from the original annotation after creating the migrated
Annotation instance. See the code sections referenced in the report.

I’m looking for bug reviewers that can help guide that implementation.



On Sun, Aug 9, 2020 at 5:13 PM Albert Astals Cid <bugzilla_noreply@kde.org>
wrote:

> https://bugs.kde.org/show_bug.cgi?id=425148
>
> Albert Astals Cid <aacid@kde.org> changed:
>
>            What    |Removed                     |Added
>
> ----------------------------------------------------------------------------
>                  CC|                            |aacid@kde.org
>
> --- Comment #1 from Albert Astals Cid <aacid@kde.org> ---
> Sure, valid issue, but at this point how many people are left using such
> old
> versions that this is actually an issue we want to spend time fixing?
>
> --
> You are receiving this mail because:
> You reported the bug.
Comment 3 jprorama 2020-08-10 00:41:06 UTC
Here's a proposed fix to the also set the properties when creating the new annotations.

https://invent.kde.org/graphics/okular/-/merge_requests/239

I look forward to suggestions on whether this fix would be sufficient. 
I'm new to KDE dev and okular so don't understand the code base well, so
accept that this may be naive and require additional considerations.
Comment 4 Laura David Hurka 2020-08-10 09:16:11 UTC
(In reply to Albert Astals Cid from comment #1)
> Sure, valid issue, but at this point how many people are left using such old
> versions that this is actually an issue we want to spend time fixing?

Yes, I think we should support upgrading from such old versions. At least when the fix does not require structural changes in Okular. The submitted MR 239 looks promising to me, although it doesn’t pass tests right now.
Comment 5 jprorama 2020-08-10 11:20:55 UTC
The first pipeline failed On clang because of tab vs space.  I added a new commit to fis it and the second one passed. 

I have yet to test it because I’m still getting my local build env set up. This is currently based on a code read.
Comment 6 Albert Astals Cid 2020-08-10 22:06:39 UTC
(In reply to David Hurka from comment #4)
> (In reply to Albert Astals Cid from comment #1)
> > Sure, valid issue, but at this point how many people are left using such old
> > versions that this is actually an issue we want to spend time fixing?
> 
> Yes, I think we should support upgrading from such old versions. At least
> when the fix does not require structural changes in Okular. The submitted MR
> 239 looks promising to me, although it doesn’t pass tests right now.

To be fair we support upgrading from such old versions, one could argue that the annotation timestamp being changed is even correct since now it's inside the PDF and previously wasn't, but i guess one could argue that this is a technicality and that the annotation was really added in the past to the users point of view.
Comment 7 jprorama 2020-08-10 22:20:14 UTC
(In reply to Albert Astals Cid from comment #6)
> 
> To be fair we support upgrading from such old versions, one could argue that
> the annotation timestamp being changed is even correct since now it's inside
> the PDF and previously wasn't, but i guess one could argue that this is a
> technicality and that the annotation was really added in the past to the
> users point of view.

Being the user, I naturally favor the second interpretation. ;)

In that line of thinking, the annotation is a data object created by the user.  That it was once stored in an an okular xml file and is now stored in the pdf itself is really of little importance to the user.  It's the data that I created that I care about.  In the current implementation, it's not just the timestamp that I would lose. Since the new annotation element is created by me, I'm the owner, the old object may in fact have a different owner listed in the xml.  That owner information is also lost by not migrating the metadata and this seems a even more significant than the time stamp.

Thanks for the additional thoughts and support in working on the migration.

I'm still in the process of getting a pristine okular to build in my build env so that I can test out the proposed fix with my data.  At the moment, most of the build dependencies succeed (78/84).  I'm running into an issue with polkit-qt-1.
Comment 8 Albert Astals Cid 2020-08-10 22:32:04 UTC
(In reply to jprorama from comment #7)
>  I'm running into an issue with polkit-qt-1.

Just don't build polkit-qt-1, it's not needed for what you want to test.
Comment 9 jprorama 2020-08-13 21:13:17 UTC
I've had a chance to debug this further.  It appears that the annotations are properly restored with original metadata (author, create, mod dates etc) during the import.  

When the message about the internal storage conversion is present, you can look at the annotations and see all the original properties are present.

It appears that the date stamps are overwritten when the PDF is saved.

This makes my original source code mode unecessary since the attributes are in fact read from internal storage correctly.   I'm trying to track down where/why the restored dates are lost during the save.
Comment 10 jprorama 2020-08-16 18:08:48 UTC
I'm starting to wonder if this might actually be an issue in Poppler but need some guidance in understanding the logic that Okular follows in the conversion process.  

As noted, Okular is correctly restoring the original create/mod dates of the annotation from its original storage. It prompts the user to save the PDF to a file to preserve the annotations with the PDF.

At this point I have traced lots of code, but it seems to come down to Okular handing of the Save operation to the PDFGenerator::save() method.  There is a QHashIteration across the annotations that compares unique ids between okular annotations and poppler annotations, but doesn't seem to have any other affect.

The actual save of the current Okular document that needs migration appears to be handed off to the Poppler::PDFConverter().  It seems like this utility takes the current Okular PDF document and saves it out to a temporary file.  It's during this operation that the original annotation create/modify times are lost.  The temp file is then copied to the saveAs target, but by that time the times are already set to the current date.

Is this understanding correct?

If so, is there any way to control the conversion of the annotations within Okular to avoid this issue.  I thought the PopplerAnnotationProxy might be able to, but it doesn't seem to be invoked by Poppler during the PDFConverter.
Comment 11 Albert Astals Cid 2020-08-18 19:30:57 UTC
There's some issues in the poppler side setting the modification date, i'm working on them, no idea if it will fix this or not, but it's at least a pre-requisite
Comment 12 jprorama 2020-08-18 19:53:40 UTC
Created attachment 130978 [details]
attachment-29420-0.html

Great, thanks!

Happy to review/test/contribute.   This is a personal need to convert pdfs
with preserved timestamps. So i have interest and can make time. 😉

On Tue, Aug 18, 2020 at 2:30 PM Albert Astals Cid <bugzilla_noreply@kde.org>
wrote:

> https://bugs.kde.org/show_bug.cgi?id=425148
>
> --- Comment #11 from Albert Astals Cid <aacid@kde.org> ---
> There's some issues in the poppler side setting the modification date, i'm
> working on them, no idea if it will fix this or not, but it's at least a
> pre-requisite
>
> --
> You are receiving this mail because:
> You reported the bug.
Comment 13 Albert Astals Cid 2020-08-18 21:06:14 UTC
This should help you i think https://gitlab.freedesktop.org/poppler/poppler/-/merge_requests/612
Comment 14 jprorama 2020-08-22 15:32:09 UTC
Thanks for the link.  I'm working to test it but having difficulty getting my okular build to use the poppler I built of that branch.

I'm using kdesrc-build for okular and ninja for poppler (as described by the CI build steps).  My kde build env is in ~/kde.

I build configured poppler with:

cmake -G Ninja -DCMAKE_INSTALL_PREFIX=~/kde/ ..

and run install to get the libs in ~/kde/usr/lib

ninja install

I set okular CMake
Comment 15 jprorama 2020-08-22 15:36:20 UTC
ugh. pressed enter too soon.

I set Okular CMake to look for poppler version 0.102 which looks like that's the tag on the libraries from the branch.

find_package(Poppler "0.102.0" COMPONENTS Qt5)

I set LD_LIBRARY_PATH to include ~/kde/usr/lib at the front. Then build okular.

When I look at the backend components in okular after loading a pdf, it still reports poppler 0.85 as the build target.

Needless to say my documenttest runs aren't showing any impact of the poppler improvements from your branch.

Thanks for any build hints.
Comment 16 Albert Astals Cid 2020-08-22 21:31:45 UTC
I'm sorry but bugzilla is not the place to explain how to compile things, if you want drop by the #okular or #poppler IRC channel in freenode and ask for help, if I (or someone else is there) we may help you.
Comment 17 jprorama 2020-08-22 21:43:43 UTC
Created attachment 131106 [details]
attachment-28902-0.html

Got it, thanks!

On Sat, Aug 22, 2020 at 4:31 PM Albert Astals Cid <bugzilla_noreply@kde.org>
wrote:

> https://bugs.kde.org/show_bug.cgi?id=425148
>
> --- Comment #16 from Albert Astals Cid <aacid@kde.org> ---
> I'm sorry but bugzilla is not the place to explain how to compile things,
> if
> you want drop by the #okular or #poppler IRC channel in freenode and ask
> for
> help, if I (or someone else is there) we may help you.
>
> --
> You are receiving this mail because:
> You reported the bug.
Comment 18 jprorama 2020-08-23 15:57:29 UTC
I was able to test your latest patch and it does preserve the creation time but still updates the modification time to the current time of the conversion.

I've updated the DocumentTest for migration to deal with the dates a little more sanely.
Comment 19 Albert Astals Cid 2020-08-24 21:30:02 UTC
(In reply to jprorama from comment #18)
> I was able to test your latest patch and it does preserve the creation time
> but still updates the modification time to the current time of the
> conversion.

Which from poppler's point of view is correct, you're modifying the annotation, so the modification time changes, if you want the modification time to be preserved, you need to make sure we set the modification time the last when recreating the poppler annotation from the okular annotation.
Comment 20 jprorama 2020-08-24 21:38:45 UTC
Not sure I’m following. 

It seems the document is resident in okular with correct create/modify times during migration when loaded from the internal xml.  The document is handed to poppler for writing (pdfconvert() i think).  It seems poppler should serialize what is there in the okular runtime. 

This behavior suggest to me that okular has done what you are suggesting: set modification time last during save. 

Could you clarify? Thanks. 

> On Aug 24, 2020, at 4:30 PM, Albert Astals Cid <bugzilla_noreply@kde.org> wrote:
> 
> https://bugs.kde.org/show_bug.cgi?id=425148
> 
> --- Comment #19 from Albert Astals Cid <aacid@kde.org> ---
> (In reply to jprorama from comment #18)
>> I was able to test your latest patch and it does preserve the creation time
>> but still updates the modification time to the current time of the
>> conversion.
> 
> Which from poppler's point of view is correct, you're modifying the annotation,
> so the modification time changes, if you want the modification time to be
> preserved, you need to make sure we set the modification time the last when
> recreating the poppler annotation from the okular annotation.
> 
> -- 
> You are receiving this mail because:
> You reported the bug.
Comment 21 Albert Astals Cid 2020-08-24 21:56:20 UTC
(In reply to jprorama from comment #20)
> Not sure I’m following. 
> 
> It seems the document is resident in okular with correct create/modify times
> during migration when loaded from the internal xml.  The document is handed
> to poppler for writing (pdfconvert() i think).  It seems poppler should
> serialize what is there in the okular runtime. 
> 
> This behavior suggest to me that okular has done what you are suggesting:
> set modification time last during save. 
>
> Could you clarify? Thanks. 

You need to understand the code and add the missing line yourself. There's basically a setModificationDate() call missing after creating the poppler annotation from the stored okular annotation in okular code.

The only way for me to be clearer is for me to do the change by myself and then tell you "see if you change this line it works", which i already said i wasn't interested in working on, and I've already spent more time than i wanted on this.