Bug 481785 - Torture testing for Baloo
Summary: Torture testing for Baloo
Status: RESOLVED WORKSFORME
Alias: None
Product: frameworks-baloo
Classification: Frameworks and Libraries
Component: Baloo File Daemon
Version: unspecified
Platform: Other iOS
Importance: NOR normal
Target Milestone: ---
Assignee: baloo-bugs-null
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2024-02-24 20:39 UTC by Alejandro Nova
Modified: 2024-06-14 03:47 UTC
CC List: 2 users

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:


Attachments

Description Alejandro Nova 2024-02-24 20:39:59 UTC
At the request of the maintainer, I’m filing a bug against Baloo to upload a file that will be used to torture the File Indexer. 

File: History of the Chilean Constitution (volume 1)
License: Public domain (according to Chilean Law)
Length: 1,155 pages, including accented text
Comment 1 tagwerk19 2024-02-26 06:53:01 UTC
My Google skills are not up to this :-) I tried "History of the Chilean Constitution" PDF and "Historia de la Constitución chilena" PDF and came up blank. I did find this one:

http://sitios.uvm.cl/derechosfundamentales/revista/04.075-239.Bibliografia-Juridica-Chilena.pdf

It's quite a bit shorter and seems very clean (BibTeX?). Baloo copes with it. When I look for large PDFs, I often get scanned documents: pages without any plain text.
Comment 2 tagwerk19 2024-03-07 11:25:40 UTC
Do you have a link to a PDF of "History of the Chilean Constitution" that we can test?
Comment 3 Alejandro Nova 2024-03-16 00:03:11 UTC
Sure. 

https://www.bcn.cl/leychile/consulta/antecedentes_const_1980

The official name is "Comisión de Estudios de la Nueva Constitución" or "Comisión Ortúzar". 

You will find eleven volumes, of which I uploaded the first. You may freely download all eleven volumes.

With this you will be able to properly torture the Baloo indexer.
Comment 4 tagwerk19 2024-03-16 07:57:53 UTC
(In reply to Alejandro Nova from comment #3)
> ... You will find eleven volumes, of which I uploaded the first. You may
> freely download all eleven volumes ...
Takes me to:
    https://nuevo.leychile.cl/servicios/Navegar/scripts/obtienearchivo?id=recursoslegales/10221.3/3764/2/Tomo_I_Comision_Ortuzar.pdf
Let me try this out on various systems...
Comment 5 tagwerk19 2024-03-16 13:23:05 UTC
(In reply to tagwerk19 from comment #4)
> ... Let me try this out on various systems ...
Hmmm. For me, it works.

A "pdf to text" conversion:
    $ pdftotext Tomo_I_Comision_Ortuzar.pdf
gave me a .txt version
    $ wc Tomo_I_Comision_Ortuzar.txt 
      49821  501678 3233386 Tomo_I_Comision_Ortuzar.txt
in a few seconds...
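
For reading that output: by default `wc` prints the line count, the word count and the byte count, so the extracted text is 49,821 lines, 501,678 words and 3,233,386 bytes (about 3.2 MB). Below is a minimal Python sketch, standard library only, that reproduces those three numbers for any file, which may help when comparing extracted text against an indexer's size limit. `wc_counts` and `wc_counts_bytes` are hypothetical helper names, and the word count is an approximation: it splits on ASCII whitespace only, so it can differ slightly from `wc` for non-ASCII whitespace depending on the locale.

```python
import sys


def wc_counts_bytes(data: bytes) -> tuple[int, int, int]:
    """Return (lines, words, bytes), approximating `wc` default counting.

    lines: number of newline bytes (a last line with no trailing newline
           is not counted, same as wc)
    words: runs of non-whitespace, split on ASCII whitespace only
    bytes: raw length, so an accented UTF-8 character such as "ó" counts as 2
    """
    return data.count(b"\n"), len(data.split()), len(data)


def wc_counts(path: str) -> tuple[int, int, int]:
    """Read a file in binary mode and count it like `wc FILE`."""
    with open(path, "rb") as f:
        return wc_counts_bytes(f.read())


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python wc_counts.py Tomo_I_Comision_Ortuzar.txt
    lines, words, nbytes = wc_counts(sys.argv[1])
    print(f"{lines} {words} {nbytes} {sys.argv[1]} ({nbytes / 1_000_000:.1f} MB)")
```

Counting bytes rather than characters matches what `wc` reports without `-m`, which is also the figure that matters for a size limit on files.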

It seems that Baloo was able to index Tomo_I_Comision_Ortuzar.pdf; searching for a collection of words from the file works:
    $ baloosearch actuen actuo actus acucia acucioso

I had a moment of surprise when baloosearch did not find the .txt version. Baloo has a file size limit for text files, but as far as I remember it is larger than this one (about 3.2 MB)...
Comment 6 Stefan Brüns 2024-05-14 14:37:33 UTC
(In reply to Alejandro Nova from comment #0)
> At the request of the maintainer, I’m filing a bug against Baloo to upload
> a file that will be used to torture the File Indexer. 
> 
> File: History of the Chilean Constitution (volume 1)
> License: Public domain (according to Chilean Law)
> Length: 1,155 pages, including accented text

You left out any information about what is not working for you ...
Comment 7 tagwerk19 2024-05-15 05:21:17 UTC
(In reply to Stefan Brüns from comment #6)
> You left out any information about what is not working for you ...
It was a continuation of a thread on KDE Discuss, where the file was taking a disproportionately long time to index.

Blame my weak memory: I didn't keep a copy. It's sobering when you cannot find what you are looking for :-/
Comment 8 Bug Janitor Service 2024-05-30 03:45:48 UTC
Dear Bug Submitter,

This bug has been in NEEDSINFO status with no change for at least
15 days. Please provide the requested information as soon as
possible and set the bug status as REPORTED. Due to regular bug
tracker maintenance, if the bug is still in NEEDSINFO status with
no change in 30 days the bug will be closed as RESOLVED > WORKSFORME
due to lack of needed information.

For more information about our bug triaging procedures please read the
wiki located here:
https://community.kde.org/Guidelines_and_HOWTOs/Bug_triaging

If you have already provided the requested information, please
mark the bug as REPORTED so that the KDE team knows that the bug is
ready to be confirmed.

Thank you for helping us make KDE software even better for everyone!
Comment 9 Bug Janitor Service 2024-06-14 03:47:18 UTC
This bug has been in NEEDSINFO status with no change for at least
30 days. The bug is now closed as RESOLVED > WORKSFORME
due to lack of needed information.

For more information about our bug triaging procedures please read the
wiki located here:
https://community.kde.org/Guidelines_and_HOWTOs/Bug_triaging

Thank you for helping us make KDE software even better for everyone!