Bug 195628 - extract notes into plain text
Summary: extract notes into plain text
Status: REPORTED
Alias: None
Product: okular
Classification: Applications
Component: general
Version: 0.8.4
Platform: unspecified Linux
Importance: NOR wishlist
Target Milestone: ---
Assignee: Okular developers
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-06-08 07:21 UTC by heikki.lehvaslaiho
Modified: 2009-06-08 07:21 UTC (History)
0 users

See Also:
Latest Commit:
Version Fixed In:


Description heikki.lehvaslaiho 2009-06-08 07:21:27 UTC
Version:           0.8.4 (using 4.2.4 (KDE 4.2.4), Kubuntu packages)
Compiler:          cc
OS:                Linux (i686) release 2.6.28-11-generic

I like the new note-taking system in Okular. I use it for adding notes to academic theses that I am reviewing. What is missing is an easy way to export the notes to a text file.

Since the notes are kept in an XML file, this is easy to do, although it is beyond the technical capabilities of a casual user.

For my own purposes, I've written the following Perl script. It has a hard-coded XML file name and understands only notes made in the "Note[1]" style. It uses XML::Simple for simplicity, but by default that module converts single-element lists into hash references and longer lists into array references, which can lead to unnecessarily complex code.

Done properly within Okular and the KDE framework, the note export feature would have direct access to the note XML file and could write the exported text in various formats (plain text, HTML, ODT, ...). Naturally, it would export all note styles.


---------------------------------------
#!/usr/bin/env perl

use strict;
use warnings;
use XML::Simple;
use Data::Dumper;    # only needed for the commented-out debug dump below

# Hard-coded path to the Okular annotation file of one document
my $file = "/home/heikki/.kde/share/apps/okular/docdata/2653117.sarahthesis.pdf.xml";

# ForceArray turns even a single <page> or <annotation> element into an
# array reference, so one loop covers both the single and the multiple case.
# KeyAttr => [] switches off XML::Simple's folding of elements by attribute.
my $ref = XMLin($file, ForceArray => ['page', 'annotation'], KeyAttr => []);
#print Dumper $ref;

print "Notes on ", $ref->{url}, "\n\n";

my $c = 1;    # running note number across all pages

foreach my $page (@{ $ref->{pageList}->{page} || [] }) {
    print "\n#===== Page ", $page->{number}, " =====\n";

    # $note rather than $a, because $a is reserved for sort blocks
    foreach my $note (@{ $page->{annotationList}->{annotation} || [] }) {
        print "\n!----- Note $c -----\n\n";
#       print $note->{base}->{creationDate}, "\n";
        # annotations without text are printed as an empty line
        print $note->{base}->{contents} // '', "\n";
        $c++;
    }
}

print "\n";
---------------------------------------
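
To illustrate the multi-format export suggested above, here is a rough sketch of how the same notes could be written either as plain text or as HTML. It assumes the notes have already been collected into a list of page/contents pairs. The function name format_notes, the data layout and the sample notes are all made up for illustration; none of this is part of Okular.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper, not part of Okular: render collected notes
# as plain text or as a minimal HTML fragment.
# $notes is an array reference of { page => N, contents => "..." } hashes.
sub format_notes {
    my ($notes, $format) = @_;
    my $out = '';
    my $c   = 1;    # running note number
    foreach my $note (@$notes) {
        my $text = defined $note->{contents} ? $note->{contents} : '';
        if ($format eq 'html') {
            # Escape the characters that are special in HTML text.
            $text =~ s/&/&amp;/g;
            $text =~ s/</&lt;/g;
            $text =~ s/>/&gt;/g;
            $out .= "<h3>Note $c (page $note->{page})</h3>\n<p>$text</p>\n";
        } else {
            # Any other format name falls back to plain text.
            $out .= "!----- Note $c (page $note->{page}) -----\n\n$text\n\n";
        }
        $c++;
    }
    return $out;
}

# Made-up sample data for illustration only.
my @notes = (
    { page => 3, contents => 'Check that p < 0.05 here' },
    { page => 7, contents => 'Cite the original paper' },
);

print format_notes(\@notes, 'text');
print format_notes(\@notes, 'html');
```

Keeping the formatting separate from the XML parsing means that further output formats would only need another branch in this one function.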