Bug 249572 - Add a way to configure file metadata storage options
Summary: Add a way to configure file metadata storage options
Status: REPORTED
Alias: None
Product: okular
Classification: Applications
Component: general
Version: unspecified
Platform: Ubuntu Linux
Importance: NOR wishlist
Target Milestone: ---
Assignee: Okular developers
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-08-31 03:12 UTC by Dave Jarvis
Modified: 2015-03-14 09:01 UTC
CC List: 1 user

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:


Description Dave Jarvis 2010-08-31 03:12:08 UTC
Version:           unspecified (using KDE 4.5.0) 
OS:                Linux

~/.kde/share/apps/okular/docdata$ ll | wc -l
4055
~/.kde/share/apps/okular/docdata$ du -hcs
17M     .
17M     total


Reproducible: Didn't try

Steps to Reproduce:
1. Use Okular.
2. Open lots of PDFs over several weeks.
3. Wonder where 17 MB of disk space went.

Actual Results:  
Okular's .xml files are not erased after a while.

Expected Results:  
The files are 300 bytes each. Surely they aren't necessary to keep around more than a week?

Not leave files hanging around until the Sun becomes a red giant.
Comment 1 Albert Astals Cid 2010-08-31 19:51:54 UTC
Not a bug. These files are used to maintain metadata about the documents, so they can't be erased without Okular losing functionality.
Comment 2 Dave Jarvis 2010-08-31 20:49:01 UTC
That does not make sense.

Keeping meta data around for PDFs that have been deleted will not cause Okular to lose functionality.
Comment 3 Albert Astals Cid 2010-08-31 21:11:41 UTC
Agreed, unfortunately Okular is not a sentient being and has no way to know you deleted the file.
Comment 4 Christoph Feck 2010-08-31 21:53:48 UTC
Using Nepomuk in the future would be an option. That way, Okular could keep associations between its metadata and the files, and Nepomuk probably knows which files got deleted. But I guess it will be a long and bumpy ride until Nepomuk is used/accepted by both users and developers.
Comment 5 Dave Jarvis 2010-08-31 21:57:06 UTC
Are shell scripts sentient, then?

if [ -e "$file" ]; then
  echo "File exists!"
fi

;-)

Thus, on startup, it is technically possible for Okular to examine the XML files it has littered about and incinerate those that have no corresponding PDF file.

Another idea is to delete files that are over a certain age.
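The age-based idea can be sketched as a small shell function. This is a minimal sketch, not Okular code: the function name `prune_old_metadata` is hypothetical, it assumes a `find` that supports `-maxdepth` and `-delete` (GNU or BSD), and it treats every `*.xml` file in the given directory as disposable metadata.

```shell
#!/bin/sh
# Hypothetical helper (not part of Okular): delete docdata metadata files
# that have not been modified for more than a given number of days.
# Assumes GNU or BSD find (for -maxdepth and -delete).
prune_old_metadata() {
    dir=$1      # e.g. "$HOME/.kde/share/apps/okular/docdata"
    days=$2     # age threshold in whole days
    [ -d "$dir" ] || return 1
    # -mtime +N matches files whose age, truncated to whole days, exceeds N.
    find "$dir" -maxdepth 1 -type f -name '*.xml' -mtime +"$days" -delete
}
```

For example, `prune_old_metadata "$HOME/.kde/share/apps/okular/docdata" 7` would drop entries untouched for more than a week; the trade-off is that reopening an old document afterwards starts from the first page again.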

Okular losing functionality without the files still does not make sense. Okular is not sentient enough to know that tomorrow I am going to download and read a PDF from Springer (at least, it better not, because that would make it sentient and clairvoyant). Thus it does not create the XML file *before* I need it. Therefore the XML files are not critical to its functionality after I close the program.

Furthermore, FoxIt, xpdf, and Acroread, to my knowledge, do not leave extraneous files lurking around.

This is a bug.
Comment 6 Albert Astals Cid 2010-08-31 22:27:33 UTC
It is not a bug. FoxIt, xpdf, and Acroread do not provide the features we do, so they don't need files around (you really need to learn what "lurk" means, because it doesn't mean that).

Anyway, if you think it's a bug, you can provide a patch that does what you say, and then we will discuss whether we accept it or not.
Comment 7 Dave Jarvis 2010-08-31 22:47:46 UTC
Lurk: lie in wait, lie in ambush, behave in a sneaky and secretive manner 
Lurk: to hang out or wait around a location, preferably without drawing attention to oneself

If I lost 17MB on my old laptop (as it has very little space left) to XML files, I would certainly feel as though I had been ambushed by Okular. That the files were buried in $HOME/.kde implies a secret (hidden) location. That they build up over time, and never get erased, means that the issue is not immediately apparent. Ergo, to lurk.

I most certainly will not waste my time developing something and THEN have it discussed for inclusion, or tossed by the wayside.

If you want to accept my proposal that Okular automatically deletes XML files for which it cannot find an existing PDF, then I will develop it and you will include it on the main branch (provided it meets your standards).
Comment 8 Albert Astals Cid 2010-08-31 23:03:16 UTC
You won't be able to develop that since Okular doesn't store the full path of the file in its metadata.

The only solution to your problem is adding a new page in the settings dialog of Okular. This page would have the following components.

A combo box that would let you select whether you want to "Keep all metadata files", "Keep metadata files for X days", "Keep up to X MB of metadata files", or "Do not keep metadata at all".

It would also have a list of all the metadata files and would let you remove metadata files from there on a file-by-file basis.

I would accept that as a valid solution for your problem.
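The "Keep up to X MB of metadata files" option could behave roughly like the shell sketch below, which removes the least recently modified files until the remainder fits a byte budget. The function name `prune_to_size_cap` is hypothetical; the sketch assumes GNU `find` (for `-printf`) and file names without newlines, and it is not how Okular implements anything.

```shell
#!/bin/sh
# Hypothetical helper (not part of Okular): keep the docdata XML files
# within a byte budget by deleting the least recently modified ones first.
# Assumes GNU find (-printf) and no newlines in file names.
prune_to_size_cap() {
    dir=$1
    max_bytes=$2
    # One line per file: "<mtime-seconds> <size-bytes> <path>", oldest first.
    find "$dir" -maxdepth 1 -type f -name '*.xml' -printf '%T@ %s %p\n' |
        sort -n |
        awk -v max="$max_bytes" '
            {
                size[NR] = $2
                p = $0; sub(/^[^ ]+ [^ ]+ /, "", p)  # path may contain spaces
                path[NR] = p
                total += $2
            }
            END {
                # Walk from oldest to newest, listing files until under budget.
                for (i = 1; i <= NR && total > max; i++) {
                    print path[i]
                    total -= size[i]
                }
            }' |
        while IFS= read -r f; do
            rm -f -- "$f"
        done
}
```

Deleting oldest-first means recently read documents keep their position; a 1 MB cap would be `prune_to_size_cap "$dir" 1048576`.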
Comment 9 Albert Astals Cid 2010-08-31 23:09:31 UTC
Correction to my comment: we do have the URL inside the XML file. Still, I think my solution is much more complete.
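Because the URL is stored in each XML file, the orphan check proposed earlier in the thread becomes possible. The shell sketch below assumes (this is a guess, not Okular's documented format) that the location appears in a `url="..."` attribute, optionally prefixed with `file://`; the function name `prune_orphaned_metadata` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of Okular): remove docdata XML files whose
# referenced document no longer exists.
# ASSUMPTION: the document location is stored in a url="..." attribute,
# possibly with a file:// prefix. XML entities (&amp;) and percent-encoding
# are not decoded, so unusual paths may be misjudged.
prune_orphaned_metadata() {
    dir=$1
    for xml in "$dir"/*.xml; do
        [ -f "$xml" ] || continue              # glob matched nothing
        # Extract the first url="..." value found in the file.
        url=$(sed -n 's/.*url="\([^"]*\)".*/\1/p' "$xml" | head -n 1)
        [ -n "$url" ] || continue              # no URL found: leave it alone
        path=${url#file://}                    # strip an optional file:// scheme
        case $path in *://*) continue ;; esac  # remote URL: cannot check, keep
        if [ ! -e "$path" ]; then
            rm -f -- "$xml"
        fi
    done
}
```

One caveat: a document on an unmounted removable drive looks "deleted" to this check, so its metadata would be removed too.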
Comment 10 Dave Jarvis 2010-09-01 01:47:21 UTC
Given that you can bring Okular closer to sentience in this regard, I recommend deleting the meta data if the corresponding PDF no longer exists. I will implement this solution for you between October and November.

This should be done when the main Okular window is closed, but before the application terminates. (To avoid impacting start-up times for those people who have thousands of files, on slower disk drives, that can be erased.)

Note: in the Macintosh world things "just work" without user intervention. Users *don't care* about software and users don't want to learn. Users don't want to babysit their PDF reader. Users should expect that software applications will be well-behaved and operate under the principle of least astonishment. 17 MB was a surprise when I stumbled upon the horde.
Comment 11 Albert Astals Cid 2010-09-01 01:50:50 UTC
Frankly, I don't care what happens in the Macintosh world.

And thanks for volunteering to write that code.
Comment 12 Dave Jarvis 2010-10-13 23:05:51 UTC
Until the KDE packages are fixed, I cannot work on this bug, as I cannot create a development environment.

<<<  PACKAGES FAILED TO BUILD  >>>
kdebase - ~/kdesvn/log/2010-10-13-04/kdebase/cmake.log
kdepim - ~/kdesvn/log/2010-10-13-04/kdepim/build-1.log
kdegraphics - ~/kdesvn/log/2010-10-13-04/kdegraphics/cmake.log
kdeplasma-addons - ~/kdesvn/log/2010-10-13-04/kdeplasma-addons/cmake.log
Script finished processing at Wed Oct 13 13:37:01 2010

ERROR #1 - ~/kdesvn/log/2010-10-13-04/kdebase/cmake.log
  Could NOT find DBusMenuQt: Found version "..", but required is at least
  "0.6.0" (found /usr/lib/libdbusmenu-qt.so)

ERROR #2 - ~/kdesvn/log/2010-10-13-04/kdepim/build-1.log
/home/kde-devel/kdesvn/kdepim/mailcommon/mailutil.cpp: In function ‘bool MailCommon::Util::createTodoFromMail(const Akonadi::Item&)’:
/home/kde-devel/kdesvn/kdepim/mailcommon/mailutil.cpp:170: error: ‘i18n’ was not declared in this scope
make[2]: *** [mailcommon/CMakeFiles/mailcommon.dir/mailutil.o] Error 1
make[1]: *** [mailcommon/CMakeFiles/mailcommon.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....

ERROR #3 - ~/kdesvn/log/2010-10-13-04/kdegraphics/cmake.log
-- checking for module 'lcms'
--   package 'lcms' not found
CMake Error at /home/kde-devel/kde/share/apps/cmake/modules/FindPackageHandleStandardArgs.cmake:204 (MESSAGE):

Likely resolved with: $ sudo apt-get install liblcms-dev

However, the scripts should be written so as to install any dependencies automatically.

ERROR #4 - 
-- checking for module 'scim'
--   package 'scim' not found
-- Could NOT find SCIM  (missing:  SCIM_LIBRARIES SCIM_INCLUDE_DIR)
CMake Error at /home/kde-devel/kde/share/apps/cmake/modules/FindPackageHandleStandardArgs.cmake:204 (MESSAGE):
  Could NOT find QCA2 (missing: QCA2_LIBRARIES QCA2_INCLUDE_DIR)
Call Stack (most recent call first):
  /home/kde-devel/kde/share/apps/cmake/modules/FindQCA2.cmake:44 (find_package_handle_standard_args)
  dataengines/microblog/CMakeLists.txt:1 (find_package)

Likely resolved with: $ sudo apt-get install libscim-dev

Once the first two errors have been resolved, I can continue setting up a development environment.
Comment 13 Albert Astals Cid 2010-10-14 00:35:04 UTC
This is not the place to report problems building things that have nothing to do with Okular.
Comment 14 Dave Jarvis 2010-10-14 06:06:16 UTC
1. I was giving you a status update: I cannot create a development environment due to problems with the KDE scripts; ergo, I cannot fix this bug at this time.

2. I posted the information on the off-chance that someone here, as a developer of Okular who has had to install a working KDE development environment--as it is a requirement to build Okular--may have stumbled upon these issues and could offer a quick work-around. (I have posted the content to the appropriate KDE mailing list, too.)

If there is another way to build Okular that does not depend on having the source code to KDE kicking around, I'd be glad to know it. Or direct me to a document that fully describes the build process.
Comment 15 Albert Astals Cid 2010-10-14 20:08:05 UTC
FYI, I fixed #3 and #4. I am not hitting #1 or #2, so I can't fix them. The easiest way to get help with those is to go to the #kde-devel channel on freenode and ask there.
Comment 16 Sergei Ivanov 2012-12-04 19:32:52 UTC
I am annoyed by this bug too. I cleaned my ~/.kde/share/apps/okular/docdata two months ago and now it has 3500+ files taking away 14M of storage. Everything else under ~/.kde occupies less than 800K combined.

I never had that many PDFs. I just use LaTeX, edit LaTeX files with Kile, and display the generated PDFs with Okular. Every time a LaTeX file is edited, Okular creates a new XML file for the generated PDF and does not delete the old one. This is ridiculous.

For example, I am currently working on a LaTeX file 'symplectic.tex'. It is a single file, but I keep editing it, and in .../okular/docdata I have 206 files named *.symplectic.pdf.xml:

~/.kde/share/apps/okular/docdata$ ls -l *symplectic*
-rw-rw-r-- 1 serg serg 359 Nov 18 23:07 100993.symplectic.pdf.xml
-rw-rw-r-- 1 serg serg 359 Nov 18 23:06 101028.symplectic.pdf.xml
-rw-rw-r-- 1 serg serg 358 Nov 18 23:13 101644.symplectic.pdf.xml
-rw-rw-r-- 1 serg serg 359 Nov 19 00:37 108118.symplectic.pdf.xml
-rw-rw-r-- 1 serg serg 359 Nov 18 23:06 108267.symplectic.pdf.xml
-rw-rw-r-- 1 serg serg 359 Nov 19 00:39 108329.symplectic.pdf.xml
-rw-rw-r-- 1 serg serg 359 Nov 19 00:55 125006.symplectic.pdf.xml
[... about 200 similar entries skipped]

These tiny files contain no information other than the last viewing mode and position in the file. It is pointless to remember this information for so many files and for so long. Some other applications manage to keep their 'recently visited' data in one file (for example, Kile in ~/.kde/share/config/kilerc) and do not let it grow indefinitely. It would be nice if Okular did that too.
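As a stopgap for the duplicate entries listed above, one could keep only the most recently modified file per document name. This shell sketch assumes docdata names look like `<number>.<document file name>.xml`, as in the listing, and treats entries that differ only in the numeric prefix as duplicates. The function name `keep_latest_per_document` is hypothetical, it needs GNU `find` (for `-printf`), and it prunes duplicates rather than merging them into a single file the way kilerc works.

```shell
#!/bin/sh
# Hypothetical helper (not part of Okular): for each document name, keep
# only the newest "<number>.<name>.xml" file and delete the older ones.
# Assumes GNU find (-printf) and no newlines in file names.
keep_latest_per_document() {
    dir=$1
    find "$dir" -maxdepth 1 -type f -name '*.xml' -printf '%T@ %f\n' |
        sort -rn |                                   # newest first
        awk '{
            f = $0; sub(/^[^ ]+ /, "", f)            # file name without mtime
            key = f; sub(/^[0-9]+\./, "", key)       # drop the numeric prefix
            if (key in seen) print f                 # older duplicate: delete
            else seen[key] = 1                       # newest copy: keep
        }' |
        while IFS= read -r name; do
            rm -f -- "$dir/$name"
        done
}
```

Applied to the listing above, this would leave a single `*.symplectic.pdf.xml` (the newest one) instead of 206.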