Bug 195628

Summary: extract notes into plain text
Product: [Applications] okular
Reporter: heikki.lehvaslaiho
Component: general
Assignee: Okular developers <okular-devel>
Status: REPORTED ---    
Severity: wishlist    
Priority: NOR    
Version: 0.8.4   
Target Milestone: ---   
Platform: unspecified   
OS: Linux   
Latest Commit: Version Fixed In:

Description heikki.lehvaslaiho 2009-06-08 07:21:27 UTC
Version:           0.8.4 (using 4.2.4 (KDE 4.2.4), Kubuntu packages)
Compiler:          cc
OS:                Linux (i686) release 2.6.28-11-generic

I like the new note-taking system in Okular. I use it to add notes to the academic theses that I am reviewing. What is missing is an easy way to export the notes to a text file.

Since the notes are kept in an XML file, this is easy to do, although it is beyond the technical capabilities of a casual user.

For my own purposes, I've written the following Perl script. It has a hard-coded XML file name and understands only notes made in the "Note[1]" style. It uses XML::Simple for simplicity, but the module converts lists of one into hash references and longer lists into array references, which leads to unnecessarily complex code.
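
The list-of-one behaviour can be illustrated without XML::Simple itself. The following is a minimal sketch in plain Perl: the two hand-built structures only imitate the shape the module would return for a page with one annotation and a page with two (an assumption based on the keys the script uses), and `as_list` is a hypothetical helper, not part of XML::Simple.

```perl
use strict;
use warnings;

# Hypothetical helper: always return a list, whether the parser produced
# a single hash reference or an array reference of hash references.
sub as_list {
    my ($value) = @_;
    return () unless defined $value;
    return ref($value) eq 'ARRAY' ? @$value : ($value);
}

# Hand-built stand-ins for parsed annotation lists (assumed shape).
my $one_note = { annotation => { base => { contents => 'Typo here' } } };
my $two_notes = {
    annotation => [
        { base => { contents => 'Unclear sentence' } },
        { base => { contents => 'Cite a source' } },
    ],
};

# One loop now covers both cases.
for my $list ($one_note, $two_notes) {
    for my $note (as_list($list->{annotation})) {
        print $note->{base}->{contents}, "\n";
    }
}
```

With a helper like this, the caller needs no separate branch for the single-annotation case.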

Done properly within Okular and the KDE framework, the note export functionality would have access to the note XML file and could write the exported text in various formats (plain text, HTML, ODT, ...). Naturally, it would export all note styles.
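
As a rough illustration of exporting the same notes in more than one format, here is a small sketch in plain Perl. The note records, the field names `page` and `text`, and the functions `notes_as_text`, `notes_as_html` and `escape_html` are all hypothetical; a real exporter inside Okular would read the annotations directly and would not look like this.

```perl
use strict;
use warnings;

# Hypothetical note records; a real exporter would read these from the annotation data.
my @notes = (
    { page => 3, text => 'Define the term before using it.' },
    { page => 7, text => 'Is p < 0.05 & n > 30 here?' },
);

# Escape the characters that are special in HTML text.
sub escape_html {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    return $s;
}

# Plain text: one line per note.
sub notes_as_text {
    my @list = @_;
    my $out = '';
    for my $note (@list) {
        $out .= "Page $note->{page}: $note->{text}\n";
    }
    return $out;
}

# HTML: an unordered list with escaped note text.
sub notes_as_html {
    my @list = @_;
    my $out = "<ul>\n";
    for my $note (@list) {
        $out .= "<li>Page $note->{page}: " . escape_html($note->{text}) . "</li>\n";
    }
    return $out . "</ul>\n";
}

print notes_as_text(@notes);
print notes_as_html(@notes);
```

Keeping the note records separate from the formatters means a further format only needs one more function over the same list.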


---------------------------------------
#!/usr/bin/env perl

use strict;
use warnings;
use XML::Simple;
use Data::Dumper;

# Hard-coded path to the Okular annotation file of the document under review
my $file = "/home/heikki/.kde/share/apps/okular/docdata/2653117.sarahthesis.pdf.xml";

my $ref = XMLin($file);
#print Dumper $ref;

print "Notes on ", $ref->{url}, "\n\n";

my $c = 1;    # running note number across all pages

# XML::Simple returns a hash reference for a single element and an array
# reference for several, so wrap the single case into a one-element array
my $pages = $ref->{pageList}->{page};
$pages = [$pages] unless ref($pages) eq 'ARRAY';

foreach my $page (@$pages) {
    print "\n#===== Page ", $page->{number}, " =====\n";

    my $notes = $page->{annotationList}->{annotation};
    next unless defined $notes;    # page without annotations
    $notes = [$notes] unless ref($notes) eq 'ARRAY';    # only one annotation on the page

    # $note rather than $a, which Perl reserves for sort()
    foreach my $note (@$notes) {
        print "\n!----- Note $c -----\n\n";
#        print $note->{base}->{creationDate}, "\n";
        print $note->{base}->{contents}, "\n";
        $c++;
    }
}

print "\n";
---------------------------------------