342504 – Add possibility to copy formulas as MATHML/Latex Math/OO Math

Status:	REPORTED

Alias:	None

Product:	okular
Classification:	Applications
Component:	PDF backend (other bugs)
Version First Reported In:	0.20.2
Platform:	Other Linux

Importance:	NOR wishlist
Target Milestone:	---
Assignee:	Okular developers

Reported:	2015-01-05 12:20 UTC by Christoph Thielecke
Modified:	2015-01-05 20:36 UTC
CC List:	2 users

Description Christoph Thielecke 2015-01-05 12:20:14 UTC

It would be nice to be able to copy selected text that is a formula as a formula: Okular would detect the formula, convert it, and put the text in the chosen formula syntax on the clipboard.
Possible target formats are MathML and LaTeX. Another candidate is LibreOffice/OpenOffice Math.

From a user's perspective, I expect to choose the selection tool and get additional entries like those under "Text": copy as MathML to clipboard, copy as LaTeX to clipboard, copy as LibreOffice Math to clipboard.
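
To make the request concrete, here is a minimal sketch of what the three clipboard strings could look like for one formula. It is not Okular code: the `Sym` and `Frac` classes and the `to_*` functions are hypothetical names made up for this illustration, and it assumes the formula has already been recognized as a tree. The detection step itself is left out entirely.

```python
from dataclasses import dataclass
from typing import Union
from xml.sax.saxutils import escape


@dataclass
class Sym:
    """A single identifier such as a, b or x (hypothetical helper type)."""
    name: str


@dataclass
class Frac:
    """A fraction with a numerator and a denominator (hypothetical helper type)."""
    num: "Expr"
    den: "Expr"


Expr = Union[Sym, Frac]


def to_latex(e: Expr) -> str:
    # LaTeX writes a fraction as \frac{numerator}{denominator}.
    if isinstance(e, Sym):
        return e.name
    return r"\frac{%s}{%s}" % (to_latex(e.num), to_latex(e.den))


def to_starmath(e: Expr) -> str:
    # LibreOffice/OpenOffice Math writes a fraction as {numerator} over {denominator}.
    if isinstance(e, Sym):
        return e.name
    return "{%s} over {%s}" % (to_starmath(e.num), to_starmath(e.den))


def to_mathml(e: Expr) -> str:
    # Presentation MathML uses <mi> for identifiers and <mfrac> for fractions.
    if isinstance(e, Sym):
        return "<mi>%s</mi>" % escape(e.name)
    return "<mfrac>%s%s</mfrac>" % (to_mathml(e.num), to_mathml(e.den))


def to_mathml_document(e: Expr) -> str:
    # Wrap the fragment in a <math> root element with the MathML namespace.
    return '<math xmlns="http://www.w3.org/1998/Math/MathML">%s</math>' % to_mathml(e)


if __name__ == "__main__":
    # a divided by (b divided by c): a fraction nested inside a fraction.
    formula = Frac(Sym("a"), Frac(Sym("b"), Sym("c")))
    print(to_latex(formula))
    print(to_starmath(formula))
    print(to_mathml_document(formula))
```

Each menu entry would put one of these strings on the clipboard. The nested fraction is chosen on purpose, since multi-level formulas are where the conversion from a PDF is expected to be difficult.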

Reproducible: Always

Comment 1 Christoph Feck 2015-01-05 20:17:44 UTC

Is there any other software able to extract formulas from PDF? To me it looks like a very hard problem, as soon as the formulas use multiple levels of text (fractions etc.)

Comment 2 Yuri Chornoivan 2015-01-05 20:36:31 UTC

(In reply to Christoph Feck from comment #1)
> Is there any other software able to extract formulas from PDF? To me it
> looks like a very hard problem, as soon as the formulas use multiple levels
> of text (fractions etc.)

MaxTract (development canceled) can do the extraction directly.

http://www.cs.bham.ac.uk/research/groupings/reasoning/sdag/maxtract.php

Infty Reader can do it using OCR.

Some thoughts on the problem can be found here (my tests confirm the conclusions of this paper, and nothing seems to have changed since 2011):

http://www.cs.bham.ac.uk/~aps/research/papers/pdf/BaSeSoSu-ICDAR11-ComparingApproachesToMathematicalDocumentAnalysisFromPDF.pdf

IMHO, it is hard to expect that free OCR engines like Ocropus/Tesseract can solve the problem in the near future. At least, I failed to train Tesseract to recognize even rather simple formulas.