Bug 443055 - Pdf File's Size Changed to 0 Bytes After Sudden Power Outage
Summary: Pdf File's Size Changed to 0 Bytes After Sudden Power Outage
Status: REPORTED
Alias: None
Product: okular
Classification: Applications
Component: general
Version: unspecified
Platform: Ubuntu Linux
Importance: NOR crash
Target Milestone: ---
Assignee: Okular developers
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-09-28 05:55 UTC by Jason Liam
Modified: 2021-09-29 13:06 UTC
0 users

See Also:
Latest Commit:
Version Fixed In:


Description Jason Liam 2021-09-28 05:55:32 UTC
I was reading a book in Okular. I made some annotations and then pressed Ctrl+S to save the changes. Just 2 seconds after I pressed Ctrl+S there was a power outage and my PC powered off (note: the PC didn't have a UPS). When the power came back, I started my PC and found that the same PDF file now occupies 0 bytes. Unfortunately, I was not using git for this file, so I cannot get the old version back.

I have found that there are some xml files under the folder /home/username/.local/share/okular/docdata/ that correspond to my old pdf. For example the files are named like:
/home/username/.local/share/okular/docdata/4297319.NetworkProgramming.pdf.xml
/home/username/.local/share/okular/docdata/4298415.NetworkProgramming.pdf.xml
...

My question: is there a way to get my old PDF file back using the XML files above? For example, I have an older copy of the PDF that is identical to the lost one except that it doesn't have any annotations in it. Would it be possible to add the information from these XML files to that unannotated copy?

I am using Ubuntu 18.04 x86_64 and Okular 1.3.3
Comment 1 Laura David Hurka 2021-09-28 12:47:35 UTC
> My question is that Is there a way to get back my old pdf file back
> using these above xml files?

No, but...

> For example, i have an old pdf which is the same as the old pdf
> except that this one doesn't have any annotations in it. So Will
> it be possible to add these xml information into that file(that
> doesn't have any annotation)?
> 
> I am using Ubuntu 18.04 x86_64 and Okular 1.3.3

This old Okular version stored annotations in these XML files. So if you find that one of these XML files contains your annotations, you only need to get the original PDF file back. Then the annotations will either appear magically in the PDF file, or you can figure out which new XML file is created for this new PDF file, and copy the annotations from the old XML file to the new XML file.

To get the annotations actually into the PDF file, you will need to install a newer version of Okular. It should offer to migrate the annotations from the XML file to the PDF file.

I am surprised that the PDF file has been lost. Your Okular version should theoretically not modify it...
Comment 2 Jason Liam 2021-09-28 14:43:28 UTC
(In reply to David Hurka from comment #1)
> > My question is that Is there a way to get back my old pdf file back
> > using these above xml files?
> 
> No, but...
> 
> > For example, i have an old pdf which is the same as the old pdf
> > except that this one doesn't have any annotations in it. So Will
> > it be possible to add these xml information into that file(that
> > doesn't have any annotation)?
> > 
> > I am using Ubuntu 18.04 x86_64 and Okular 1.3.3
> 
> This old Okular version stored annotations in these XML files. So if you
> find that one of these XML files contains your annotations, you only need to
> get the original PDF file back. Then the annotations will either appear
> magically in the PDF file, or you can figure out which new XML file is
> created for this new PDF file, and copy the annotations from the old XML
> file to the new XML file.
> 
I tried this, but it doesn't seem to work. I have the following things:
1) NetworkProgramming.pdf. This file is a copy of the file that I lost, except that it doesn't have any annotations in it. When I downloaded this file, I copied it to another folder and started adding annotations to that copy. So this is the file I started with.
2) The XML files for the version that had a lot of annotations in it and which I lost. I checked these XML files, and almost all of them (AFAIK) have very similar content. For example, one XML file contains the following:
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE documentInfo>
<documentInfo url="/home/username/Documents/NetworkProgramming.pdf">
 <generalInfo>
  <history>
   <oldPage viewport="199;C2:0.5:0.99783:1"/>
   <oldPage viewport="200;C2:0.5:1.0038:1"/>
   <oldPage viewport="201;C2:0.5:0.990287:1"/>
   <oldPage viewport="202;C2:0.5:0.970861:1"/>
   <oldPage viewport="203;C2:0.5:0.975084:1"/>
   <oldPage viewport="204;C2:0.5:0.956503:1"/>
   <oldPage viewport="205;C2:0.5:0.0713682:1"/>
   <oldPage viewport="204;C2:0.5:0.000422297:1"/>
   <oldPage viewport="203;C2:0.5:0.00802365:1"/>
   <oldPage viewport="202;C2:0.5:0.0646115:1"/>
   <current viewport="201;C2:0.5:0.467483:1"/>
  </history>
  <views>
   <view name="PageView">
    <zoom mode="0" value="1.28479"/>
   </view>
  </views>
 </generalInfo>
</documentInfo>

I had a lot of annotations in my PDF, around 2 MB or so, and looking at the content of the XML files, it looks like they have very little to do with those annotations. Is this how Okular's XML format saves annotation information such as underlines and text? Each of these XML files takes around 800 bytes of space.
This is what I did next:
Step 1) I put the PDF file (with no annotations) in the same folder as the one I lost.
Step 2) I opened an XML file from yesterday, since I added a lot of annotations yesterday. The content of this XML file is what I have pasted above.
Step 3) Next I opened the new copy of the PDF (the one without any annotations) with Okular and found the XML file that was created. Actually, 4 new XML files were created for this new copy of the PDF.
Step 4) I pasted the content that I copied from yesterday's XML file into this new XML file, saved it, and closed the application (text editor).
Step 5) I opened the PDF with Okular to see whether yesterday's annotations had been added, but they hadn't.
What am I doing wrong?
> To get the annotations actually in the PDF file, you will need to install a
> newer version of Okular. It should offer you to migrate the annotations from
> the XML file to the PDF file.
> 
> I am surprised that the PDF file has been lost. Your Okular version should
> theoretically not modify it...
Yes. If the save had written to a temporary file first and replaced the original only after checking that the write succeeded, this problem could have been avoided.
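
The temporary-file approach described above could look roughly like the following Python sketch. It illustrates the general technique only and is not Okular's actual save code; the function name atomic_write is hypothetical.

```python
import os
import tempfile


def atomic_write(path, data):
    """Write bytes to `path` so that a crash leaves either the old or the new file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the same directory so that the final
    # rename stays on one file system.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # ask the OS to push the data to disk
        # Replace the original only after the new content was written.
        os.replace(tmp_path, path)
    except BaseException:
        # Remove the temporary file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

With this pattern the original file is never truncated in place, so an interruption during the write should leave the previous version intact. How strong that guarantee is in practice still depends on the file system and its mount options.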
Comment 3 Laura David Hurka 2021-09-28 22:28:01 UTC
If you say you created at least X bytes worth of annotations, but you can’t find a docdata file which is bigger than X bytes, I am afraid you cannot recover the annotations. :(

Unless there is another location where Okular stored annotations in those days. It was definitely an XML file. But as far as I remember, it was these docdata files.
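
To check many docdata files at once, a small script can list each file's size together with the XML element names it contains. The following Python sketch assumes the docdata folder and file naming shown earlier in this report; the function name summarize_docdata is hypothetical. Since this thread does not show what annotation elements would look like, the script only reports the element names it finds. Any name beyond those in the pasted sample (documentInfo, generalInfo, history, oldPage, current, views, view, zoom) would be worth a closer look.

```python
import glob
import os
import xml.etree.ElementTree as ET


def summarize_docdata(folder, name_fragment):
    """Return (file name, size in bytes, sorted element names) per matching file."""
    pattern = os.path.join(
        glob.escape(folder), "*" + glob.escape(name_fragment) + "*.xml"
    )
    rows = []
    for path in glob.glob(pattern):
        try:
            tags = sorted({elem.tag for elem in ET.parse(path).iter()})
        except ET.ParseError:
            # Empty or damaged files cannot be parsed; flag them instead.
            tags = ["<unparseable>"]
        rows.append((os.path.basename(path), os.path.getsize(path), tags))
    # Largest files first, since annotation data would make a file bigger.
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows
```

For example, calling summarize_docdata(os.path.expanduser("~/.local/share/okular/docdata"), "NetworkProgramming.pdf") and printing the first few rows shows the largest candidates first.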

> > I am surprised that the PDF file has been lost. Your Okular version should
> > theoretically not modify it...
> 
> Yes if instead a temporary file were used to write and checked if the write
> was successful this problem could have been avoided.

Okular should theoretically open a PDF file only for reading, not for writing. So it should stay in its state on disk.

But actually I remember a bug report where a PDF file shrank to 0 bytes after a power outage.
Comment 4 Jason Liam 2021-09-29 05:42:38 UTC
(In reply to David Hurka from comment #3)
> If you say you created at least X bytes worth of annotations, but you can’t
> find a docdata file which is bigger than X bytes, I am afraid, you can not
> recover the annotations. :(
> 
These are the things I noticed:
1) I have 976 files corresponding to the PDF I lost. Their modification dates range from 26 August 2021 to 28 September 2021. During this one month I added a lot of annotations every day. The PDF file that I originally started with (without annotations) is 3.2 MB. While I was adding annotations, I noticed that the PDF got bigger every day. On the last day (28 September 2021) it was taking around 4.3 MB of space. So the size increased by around 1 MB over a span of one month.
2) Whenever I added a text annotation to my PDF, for example one saying "this is some example text annotation inside the pdf", I was able to press Ctrl+F and search for that string, or for a part of it such as "this is some". This makes me wonder whether the annotations I was adding day by day were actually embedded in the PDF instead of being stored in the XML files. I am not sure whether the version of Okular that I am using (1.3.3) has this embedded annotations feature.
3) To confirm that the XML files have nothing to do with the annotations I add, and that the annotations are actually embedded in the PDF itself, I took another PDF, say MyBook.pdf. I checked the docdata folder to make sure that there were no XML files corresponding to this PDF. So at this point I had no XML files corresponding to MyBook.pdf. Then I opened MyBook.pdf in Okular, added some annotations (I drew a rectangle and added a text annotation saying "some xyz text for testing"), and pressed Ctrl+S. When I went back to the docdata folder, there were now 4 XML files corresponding to MyBook.pdf. Each of them takes around 943 bytes. When I open all of them side by side in Sublime Text, I see that they all have almost the same content, and there is no tag for the rectangle or the text annotation that I made. Below is the content of one of the XML files:
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE documentInfo>
<documentInfo url="/home/username/Documents/MyBook.pdf">
 <generalInfo>
  <history>
   <oldPage viewport="199;C2:0.5:0.99783:1"/>
   <oldPage viewport="200;C2:0.5:1.0038:1"/>
   <oldPage viewport="201;C2:0.5:0.990287:1"/>
   <oldPage viewport="202;C2:0.5:0.970861:1"/>
   <oldPage viewport="203;C2:0.5:0.975084:1"/>
   <oldPage viewport="204;C2:0.5:0.956503:1"/>
   <oldPage viewport="205;C2:0.5:0.0713682:1"/>
   <oldPage viewport="204;C2:0.5:0.000422297:1"/>
   <oldPage viewport="203;C2:0.5:0.00802365:1"/>
   <oldPage viewport="202;C2:0.5:0.0646115:1"/>
   <current viewport="201;C2:0.5:0.467483:1"/>
  </history>
  <views>
   <view name="PageView">
    <zoom mode="0" value="1.28479"/>
   </view>
  </views>
 </generalInfo>
</documentInfo>

Now I am confused: if these XML files have nothing to do with the annotations I add to the PDF, why are they created in the first place? What are they needed for? Is this a bug in Okular 1.3.3? Just FYI, the 4 files are named as follows:
3187327.MyBook.pdf.xml
...
...

Moreover, these files take around 900 bytes each, so they seem to waste disk space unnecessarily.
> Except there is another location where Okular stored annotations in these
> days. It was definitifely an XML file. But as far as I remember, it was
> these docdata files.
> 
> > > I am surprised that the PDF file has been lost. Your Okular version should
> > > theoretically not modify it...
> > 
> > Yes if instead a temporary file were used to write and checked if the write
> > was successful this problem could have been avoided.
> 
> Okular should theoretically open a PDF file only for reading, not for
> writing. So it should stay in its state on disk.
> 
> But actually I remember a bug report where a PDF file has shrunken to 0
> bytes after a power outage.
Comment 5 Laura David Hurka 2021-09-29 13:06:39 UTC
You are right, Okular 1.3.3 already saves annotations in PDF files. I thought this feature had been added in 1.4 or later, but it was actually added in 0.15.

In that case you unfortunately cannot recover the annotations in the usual way. You might try some data recovery tools, depending on the file system you use.

The docdata files are used to remember which page you viewed last, so when you open the document again, you see the same page.
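
As an illustration, the remembered position can be read from such a file with a few lines of Python. This is a sketch based only on the sample pasted in this report: it assumes that the number before the first ";" in the viewport attribute of the current element is a page index (whether it counts from 0 or 1 is not stated here), and the function name current_viewport_page is hypothetical.

```python
import xml.etree.ElementTree as ET


def current_viewport_page(path):
    """Return the leading number of the <current> viewport, or None if absent."""
    root = ET.parse(path).getroot()
    current = root.find("./generalInfo/history/current")
    if current is None:
        return None
    viewport = current.get("viewport", "")
    # Assumption: the field before the first ";" identifies the page.
    first_field = viewport.split(";", 1)[0]
    return int(first_field) if first_field.isdigit() else None
```

For the sample file above, whose current element has viewport="201;C2:0.5:0.467483:1", the function returns 201.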