Bug 405247

Summary: baloo (baloo_file_extractor) should report what file is causing it problems
Product: [Frameworks and Libraries] frameworks-baloo Reporter: skierpage <skierpage>
Component: Baloo File Daemon Assignee: baloo-bugs-null
Status: RESOLVED FIXED    
Severity: normal CC: aspotashev, meven29, nate, stefan.bruens
Priority: NOR    
Version: 5.55.0   
Target Milestone: ---   
Platform: Fedora RPMs   
OS: Linux   
Latest Commit: Version Fixed In:
Sentry Crash Report:

Description skierpage 2019-03-09 00:54:36 UTC
SUMMARY
When I run balooctl enable, I get a lot of warnings and then a crash (bug 405017). But journalctl and .xsession-errors give no indication of which files are causing these problems, making further debugging difficult.

STEPS TO REPRODUCE
1. In a terminal, run balooctl disable, then balooctl enable

OBSERVED RESULT
I get lots of output including:

Could not determine correct datetime format from: "2017:01:14 24:01:42"
Error: XMP Toolkit error 203: Duplicate property or field node
Warning: Failed to decode XMP metadata.
Warning: Directory Image, entry 0x0001 has unknown Exif (TIFF) type 0; setting type size 1.
Warning: Directory Thumbnail, entry 0x0201: Data area exceeds data buffer, ignoring it.
...
and I often get a crash in kfilemetadata_exiv2extractor library.


EXPECTED RESULT
I just want to know what files are causing these problems so I can fix their metadata, file better bug reports, and avoid the crash.

SOFTWARE/OS VERSIONS
KDE Plasma Version: 5.14.5
KDE Frameworks Version: 5.55.0
Qt Version: 5.11.3

ADDITIONAL INFORMATION
It would help if there was documentation on how to run baloo_file_extractor on a particular file. (I know I can run `balooctl index path/to/file` but that usually reports "Already indexed". I want to see the extractor run regardless.)
Comment 2 Méven Car 2019-03-13 09:48:10 UTC
Have you tried "balooshow file.jpg"? It could be of help.
Seen from https://community.kde.org/Baloo/Debugging

Your issue seems to be triggered by a photo created on the 14th of January 2017, probably at 1am; that could help you find it.
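
A minimal sketch of why that timestamp is rejected, assuming the usual Exif-style "YYYY:MM:DD HH:MM:SS" layout. This is not Baloo's or KFileMetaData's actual parsing code, and the helper names are made up for illustration. It shows that the hour field "24" makes the full timestamp invalid, while the date part can still be recovered and used to narrow down the photo.

```python
from datetime import datetime

# Assumed layout of the timestamp quoted in the warning (Exif-style).
EXIF_FORMAT = "%Y:%m:%d %H:%M:%S"


def parse_exif_timestamp(raw):
    """Return a datetime, or None if the string is not a valid timestamp."""
    try:
        return datetime.strptime(raw, EXIF_FORMAT)
    except ValueError:
        # Hour 24 is outside the 00-23 range accepted by %H.
        return None


def date_hint(raw):
    """Recover only the calendar date, which may be valid even if the time is not."""
    try:
        return datetime.strptime(raw.split(" ")[0], "%Y:%m:%d").date()
    except ValueError:
        return None


if __name__ == "__main__":
    bad = "2017:01:14 24:01:42"
    print(parse_exif_timestamp(bad))  # the full timestamp does not parse
    print(date_hint(bad))             # the date part still does
```

The recovered date can then be matched against file modification dates or photo folder names to find candidates.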
Comment 3 Nate Graham 2019-03-16 23:37:12 UTC
The problem with writing the file it's currently indexing to some log somewhere is that this would be a huge privacy violation and we'd be storing a lot of potentially sensitive and personally-identifiable information in that log file.

`balooctl monitor` already does show you which file it's working on; is there a reason why that's not sufficient?
Comment 4 skierpage 2019-04-14 00:16:15 UTC
(In reply to Nate Graham from comment #3)
> The problem with writing the file it's currently indexing to some log
> somewhere is that this would be a huge privacy violation and we'd be storing
> a lot of potentially sensitive and personally-identifiable information in
> that log file.
1) That seems more an issue for user permissions and logging/journalctl. User log files including Baloo logs should only be readable by other users with admin rights. journalctl output already contains warnings that expose paths like 'baloo_file[1540]: File moved to path which now no longer exists - "/home/spage/OMG/my/secrets"'

FWIW in Fedora 29 I tried to view journal files as an unprivileged user (`sudo -u openvpn journalctl`), and got
    Hint: You are currently not seeing messages from other users and the system.
      Users in groups 'adm', 'systemd-journal', 'wheel' can see all messages.
      Pass -q to turn off this notice.
    No journal files were opened due to insufficient permissions.

2) If logging failing files is really a privacy issue, then it could be under the control of a BALOO_LOG_FAILING_PATHS environment variable.
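
A rough sketch of what such an opt-in could look like. BALOO_LOG_FAILING_PATHS is only the name proposed in this comment, not an existing Baloo setting as far as this report shows, and the function names are hypothetical. Python is used purely for illustration; Baloo itself is C++.

```python
import os

# Name proposed in this comment; an assumption, not a known Baloo setting.
ENV_VAR = "BALOO_LOG_FAILING_PATHS"


def path_logging_enabled(environ=None):
    """Return True only if the user explicitly opted in to logging paths."""
    environ = os.environ if environ is None else environ
    return environ.get(ENV_VAR, "").strip().lower() in {"1", "true", "yes"}


def failure_message(path, error, environ=None):
    """Build a log line that includes the path only when opted in."""
    if path_logging_enabled(environ):
        return f"extraction failed for {path!r}: {error}"
    return f"extraction failed: {error} (set {ENV_VAR}=1 to log the path)"


if __name__ == "__main__":
    print(failure_message("/home/user/photo.jpg", "XMP Toolkit error 203"))
```

With this shape the default stays private, and a user who is debugging can enable path logging for one session.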

> `balooctl monitor` already does show you which file it's working on; is
> there a reason why that's not sufficient?
I had problems getting this to work. When I boot Linux, baloo starts reindexing on startup, reaches a problem file, and crashes. If I'm lucky my desktop is up and I get a DrKonqi crash notice, but nothing tells me what file had the problem. Also, `balooctl monitor` reports the file being indexed but not the crash. I filed this bug because it was hard for me to figure out what was crashing baloo indexing in bug 405017. The only reason I was able to identify the file is that I had "only" added 50 new pictures and I guessed that one of my PANO files might have caused the problem.

We all want Baloo to be more reliable, and making it easy to find out what file causes it problems will surely help. Cheers, thanks!
Comment 5 Stefan Brüns 2020-08-05 04:43:18 UTC
1) "balooctl failed" reports problematic files
2) "balooctl monitor" reports the filename when it starts to index a file, and if there is no "OK" after it, it failed.
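
The second point can be automated against saved monitor output. The sketch below assumes each entry is on its own line, starts with "Indexing: " followed by the path, and ends with ": Ok" on success. That format is an assumption and may differ between Baloo versions, so the prefix and suffix are parameters; the function name is hypothetical.

```python
def files_without_ok(lines, start_prefix="Indexing: ", ok_suffix=": Ok"):
    """Return paths whose monitor line has no trailing OK marker.

    The prefix and suffix are assumptions about the monitor output
    format; adjust them to match what your Baloo version prints.
    """
    suspects = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.startswith(start_prefix):
            continue  # not an indexing entry (banner, blank line, ...)
        if line.endswith(ok_suffix):
            continue  # indexing of this file completed
        suspects.append(line[len(start_prefix):])
    return suspects


if __name__ == "__main__":
    sample = [
        "Indexing: /home/user/Pictures/a.jpg: Ok\n",
        "Indexing: /home/user/Pictures/PANO_1.jpg\n",
    ]
    for path in files_without_ok(sample):
        print(path)
```

One way to use it is to redirect `balooctl monitor` to a file while indexing runs, then pass that file's lines to the function after a crash.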