Bug 515572

Summary: Baloo is not able to create a reliable database
Product: [Frameworks and Libraries] frameworks-baloo
Reporter: Daniel <daniel.schoeni>
Component: Baloo File Daemon
Assignee: baloo-bugs-null
Status: REPORTED ---
Severity: normal
CC: nicolas.fella
Priority: NOR
Version First Reported In: 5.115.0
Target Milestone: ---
Platform: Ubuntu
OS: Linux
Latest Commit:
Version Fixed/Implemented In:
Sentry Crash Report:

Description Daniel 2026-02-05 20:17:23 UTC
SUMMARY
Files can be added to the index, but querying an entry returns wrong information. The whole database seems to be scrambled.

STEPS TO REPRODUCE
1. Configure Baloo as follows:
    exclude folders[$e]=$HOME/
    folders[$e]=
    index contents=false
    only basic indexing=false

2. Add individual files, for example:
    balooctl index "/home/sd/Media/Filme/_mkv/2067 Kampf um die Zukunft (2020)/2067 Kampf um die Zukunft (2020).mkv"
    or folders:
    find $DIR_NAME -type f -regex '.*\.\(mp4\|mkv\|avi\|mp3\|jpg\|jpeg\|png\|MP4\|MKV\|AVI\|MP3\|JPG\|JPEG\|PNG\)' -exec balooctl index {} +

3. Check the results, for example:
    balooshow -x "/home/sd/Media/Filme/_mkv/2067 Kampf um die Zukunft (2020)/2067 Kampf um die Zukunft (2020).mkv"

OBSERVED RESULT
balooshow -x "/home/sd/Media/Filme/_mkv/2067 Kampf um die Zukunft (2020)/2067 Kampf um die Zukunft (2020).mkv"

The document IDs of the Baloo DB and the filesystem are different:
Url: /home/sd/Media/Filme/_mkv/2067 Kampf um die Zukunft (2020)/2067 Kampf um die Zukunft (2020).mkv
ID:       878226761127958 (DB) <-> 537330911877142 (FS)
Inode:    204478 (DB) <-> 125107 (FS)
DeviceID: 438376470 (DB) == 438376470 (FS)
1e8b31a211816 438376470 125107 /home/sd/Media/Filme/_mkv/2067 Kampf um die Zukunft (2020)/2067 Kampf um die Zukunft (2020).mkv [/home/sd/Media/Musik/_Classic/Vaclav Neumann/Rusalka/1-04-Czech Philharmonic-04 He comes here frequently.mp3]
        Mtime: 1516553273 2018-01-21T17:47:53
        Ctime: 1516553273 2018-01-21T17:47:53
        Cached properties:
                Bitrate: 160000
                Channels: 2
                Duration: 295
                Genre: Classical
                Sample Rate: 44100
                Track Number: 4
                Release Year: 1998
                Comment: D107
                Artist: Czech Philharmonic
                Album: Rusalka
                Album Artist: Vaclav Neumann
                Composer: Dvořák
                Title: 04 He comes here frequently
                Disc Number: 1
                ReplayGain Album Peak: 0.999969
                ReplayGain Album Gain: 5.38
                ReplayGain Track Peak: 0.652557
                ReplayGain Track Gain: 4.71

Internal Info
File Name Terms: F04 F1 Fcomes Fczech Ffrequently Fhe Fhere Fmp3 Fphilharmonic 
XAttr Terms: 
Plain Text Terms: 04 classical comes czech dvorak frequently he here neumann philharmonic rusalka vaclav 
Property Terms: Maudio Mmpeg T2 X1-160000 X10-rusalka X11-neumann X11-vaclav X12-dvorak X15-04 X15-comes X15-frequently X15-he X15-here X2-2 X3-295 X4-classical X5-44100 X6-4 X62-1 X7-1998 X74-0.999969 X75-5.38 X76-0.652557 X77-4.71 X8-d107 X9-czech X9-philharmonic 
replayGainAlbumPeak: 0.999969
replayGainAlbumGain: 5.38
channels: 2
duration: 295
bitRate: 160000
trackNumber: 4
replayGainTrackPeak: 0.652557
releaseYear: 1998
replayGainTrackGain: 4.71
genre: classical
sampleRate: 44100
album: rusalka
albumArtist: neumann vaclav
comment: d107
artist: czech philharmonic
title: 04 comes frequently he here
composer: dvorak
discNumber: 1
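
The numbers in the output above suggest how the document ID relates to the other two fields: the ID appears to be the inode in the upper 32 bits and the device ID in the lower 32 bits. This layout is inferred only from the values in this report, not taken from the Baloo documentation, and the function name `split_document_id` is made up for illustration. A minimal Python sketch to decode the two IDs:

```python
def split_document_id(doc_id: int) -> tuple[int, int]:
    """Split an ID into (inode, device_id).

    Assumption (inferred from this report's output only):
    inode = upper 32 bits, device ID = lower 32 bits.
    """
    return doc_id >> 32, doc_id & 0xFFFFFFFF


# Values copied from the "ID:" line of the balooshow output above.
for label, doc_id in [("DB", 878226761127958), ("FS", 537330911877142)]:
    inode, device_id = split_document_id(doc_id)
    print(f"{label}: hex={doc_id:x} inode={inode} device={device_id}")
```

Under that assumption the decoded values match the "Inode:" and "DeviceID:" lines, and the hex form of the FS ID matches the first column of the entry line, so the ID mismatch is the same fact as the inode mismatch.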

Now query the file that the previous call returned in brackets:
balooshow -x "/home/sd/Media/Musik/_Classic/Vaclav Neumann/Rusalka/1-04-Czech Philharmonic-04 He comes here frequently.mp3"

The document IDs of the Baloo DB and the filesystem are different:
Url: /home/sd/Media/Musik/_Classic/Vaclav Neumann/Rusalka/1-04-Czech Philharmonic-04 He comes here frequently.mp3
ID:       537330911877142 (DB) <-> 378198078593046 (FS)
Inode:    125107 (DB) <-> 88056 (FS)
DeviceID: 438376470 (DB) == 438376470 (FS)
157f81a211816 438376470 88056 /home/sd/Media/Musik/_Classic/Vaclav Neumann/Rusalka/1-04-Czech Philharmonic-04 He comes here frequently.mp3 [/home/sd/Media/Musik/M/Meg Myers/Take Me to the Disco/10 Little Black Death (Meg Myers).mp3]
        Mtime: 1538768000 2018-10-05T21:33:20
        Ctime: 1538768000 2018-10-05T21:33:20
        Cached properties:
                Bitrate: 320000
                Channels: 2
                Duration: 242
                Genre: Rock
                Sample Rate: 44100
                Track Number: 10
                Release Year: 2018
                Artist: Meg Myers
                Album: Take Me to the Disco
                Title: Little Black Death
                Copyright: 2018 300 Entertainment
                Publisher: 300 Entertainment
                Label: 300 Entertainment
                ReplayGain Album Peak: 1.097838
                ReplayGain Album Gain: -11.63
                ReplayGain Track Peak: 1.079415
                ReplayGain Track Gain: -11.13

Internal Info
File Name Terms: F10 Fblack Fdeath Flittle Fmeg Fmp3 Fmyers 
XAttr Terms: 
Plain Text Terms: 300 black death disco entertainment little me meg myers rock take the to 
Property Terms: Maudio Mmpeg T2 X1-320000 X10-disco X10-me X10-take X10-the X10-to X15-black X15-death X15-little X2-2 X22-2018 X22-300 X22-entertainment X23-300 X23-entertainment X3-242 X4-rock X5-44100 X6-10 X69-300 X69-entertainment X7-2018 X74-1.097838 X75-11.63 X76-1.079415 X77-11.13 X9-meg X9-myers 
publisher: 300 entertainment
trackNumber: 10
replayGainAlbumPeak: 1.097838
releaseYear: 2018
replayGainAlbumGain: 11.63
genre: rock
sampleRate: 44100
album: disco me take the to
replayGainTrackPeak: 1.079415
replayGainTrackGain: 11.13
artist: meg myers
title: black death little
channels: 2
duration: 242
bitRate: 320000
copyright: 2018 300 entertainment
label: 300 entertainment

balooctl status

Baloo File Indexer is running
Indexer state: Idle
Total files indexed: 211,871
Files waiting for content indexing: 0
Files failed to index: 0
Current size of index is 338.82 MiB


EXPECTED RESULT
The command balooshow -x "file" should show the information for that specific file, not for some other file.

SOFTWARE/OS VERSIONS
Operating System: Ubuntu Studio 24.04
KDE Plasma Version: 5.27.12
KDE Frameworks Version: 5.115.0
Qt Version: 5.15.13
Kernel Version: 6.14.0-37-generic (64-bit)
Graphics Platform: X11
Processors: 16 × 13th Gen Intel® Core™ i7-13700K
Memory: 62.5 GiB of RAM
Graphics Processor: NVIDIA GeForce RTX 4060 Ti/PCIe/SSE2
Manufacturer: ASUS

ADDITIONAL INFORMATION

I would expect the data for every file to be correct. If there is a problem reading a file, the indexer should simply not create an index entry. It looks as if, when a file has bad or missing data, the indexer keeps the entry open and fills in the data of the next file. It may be something else, but either way the result is unusable.

I need the index information to get values for duration, width and height so that I can compare media files in Dolphin. In the current state of the data, that is impossible. This is a mark against Linux, because I have no problem getting this data in Windows Explorer. I am currently trying to switch from Windows to Linux, but with problems like this I can't really do it.
Comment 1 Daniel 2026-02-06 01:03:38 UTC
I have now found one file (out of more than 200,000) that had a '\009' inside its name. The special character is invisible, and if the file is downloaded or copied with a file manager, nothing identifies it as a file name with a problematic character. I guess this was the reason the database struggled. This really should not break the whole database, but it did.
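
Since such a character is invisible in a file manager, a small script can help locate affected names. The following Python sketch walks a directory tree and reports every file or folder name that contains a control character (Unicode category "Cc", which includes the tab character). The function names and the idea of scanning this way are illustrative assumptions, not something Baloo provides:

```python
import os
import unicodedata


def has_control_char(name: str) -> bool:
    """True if the name contains any control character (category Cc)."""
    return any(unicodedata.category(ch) == "Cc" for ch in name)


def find_suspicious_names(root: str) -> list[str]:
    """Return full paths below root whose last component has a control char."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if has_control_char(name):
                hits.append(os.path.join(dirpath, name))
    return hits
```

Printing each hit with `repr(path)` makes the hidden character visible as an escape sequence such as `\t`.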

After renaming the file, purging the database and reindexing, the problem is gone, at least for the moment.

balooshow -x "/home/sd/Media/Filme/_mkv/2067 Kampf um die Zukunft (2020)/2067 Kampf um die Zukunft (2020).mkv"
1e8b31a211816 438376470 125107 /home/sd/Media/Filme/_mkv/2067 Kampf um die Zukunft (2020)/2067 Kampf um die Zukunft (2020).mkv
        Mtime: 1748907580 2025-06-03T01:39:40
        Ctime: 1748907580 2025-06-03T01:39:40
        Cached properties:
                Bitrate: 2194924
                Duration: 6858
                Width: 1280
                Height: 538
                Aspect Ratio: 2.376237623762376
                Frame Rate: 23.976023976023978

Internal Info
File Name Terms: F2020 F2067 Fdie Fkampf Fmkv Fum Fzukunft 
XAttr Terms: 
Plain Text Terms: 
Property Terms: Mmatroska Mvideo Mx T3 X1-2194924 X26-1280 X27-538 X28-2.376237623762376 X29-23.976023976023978 X3-6858 
height: 538
width: 1280
frameRate: 23.976023976023978
aspectRatio: 2.376237623762376
bitRate: 2194924
duration: 6858
Comment 2 Daniel 2026-02-08 16:53:30 UTC
Today I realized that the database is corrupted again.

Yesterday it was working. I shut down the system, went to sleep, and after starting the system today I now have wrong data on all entries, such as a "duration" for pictures and a size of 75x75 for a movie that is in fact 1280x720, with no duration. For another movie, the duration shows 0:03:18, which must come from an MP3, not from this movie.

So the cause of this corruption does not seem to be the file name with a control character, but something else that I can neither find nor identify.
Comment 3 Daniel 2026-02-08 16:55:52 UTC
Yes, the data for the movie I used for comparison is also corrupted.

balooshow -x "/home/sd/Media/Filme/_mkv/2067 Kampf um die Zukunft (2020)/2067 Kampf um die Zukunft (2020).mkv"
The document IDs of the Baloo DB and the filesystem are different:
Url: /home/sd/Media/Filme/_mkv/2067 Kampf um die Zukunft (2020)/2067 Kampf um die Zukunft (2020).mkv
ID:       94927656982 (DB) <-> 110802004678678 (FS)
Inode:    22 (DB) <-> 25798 (FS)
DeviceID: 438376470 (DB) == 438376470 (FS)
64c61a211816 438376470 25798 /home/sd/Media/Filme/_mkv/2067 Kampf um die Zukunft (2020)/2067 Kampf um die Zukunft (2020).mkv [/home/sd/Media/Musik/A/A Taste of Honey/A Taste of Honey + Twice as Sweet [Disc 2]/2-06-A Taste of Honey-Don't You Lead Me On.mp3]
        Mtime: 1453754438 2016-01-25T21:40:38
        Ctime: 1453754438 2016-01-25T21:40:38
        Cached properties:
                Bitrate: 192000
                Channels: 2
                Duration: 198
                Genre: Funk
                Sample Rate: 44100
                Track Number: 6
                Release Year: 2000
                Comment: Track 2
                Artist: A Taste of Honey
                Album: A Taste of Honey + Twice as Sweet [Disc 2]
                Album Artist: A Taste of Honey
                Title: Don't You Lead Me On
                Disc Number: 2
                ReplayGain Album Peak: 0.456482
                ReplayGain Album Gain: 1.25
                ReplayGain Track Peak: 0.393982
                ReplayGain Track Gain: 1.37

Internal Info
File Name Terms: F06 F2 Fa Fdon't Fhoney Flead Fme Fmp3 Fof Fon Ftaste Fyou 
XAttr Terms: 
Plain Text Terms: + 2 a as disc don't funk honey lead me of on sweet taste twice you 
Property Terms: Maudio Mmpeg T2 X1-192000 X10-+ X10-2 X10-a X10-as X10-disc X10-honey X10-of X10-sweet X10-taste X10-twice X11-a X11-honey X11-of X11-taste X15-don't X15-lead X15-me X15-on X15-you X2-2 X3-198 X4-funk X5-44100 X6-6 X62-2 X7-2000 X74-0.456482 X75-1.25 X76-0.393982 X77-1.37 X8-2 X8-track X9-a X9-honey X9-of X9-taste 
title: don't lead me on you
replayGainAlbumGain: 1.25
comment: 2 track
artist: a honey of taste
album: + 2 a as disc honey of sweet taste twice
albumArtist: a honey of taste
genre: funk
sampleRate: 44100
trackNumber: 6
releaseYear: 2000
bitRate: 192000
replayGainTrackPeak: 0.393982
channels: 2
replayGainTrackGain: 1.37
duration: 198
discNumber: 2
replayGainAlbumPeak: 0.456482
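
To check many files for this kind of mismatch, the "ID:", "Inode:" and "DeviceID:" lines that balooshow -x prints could be parsed automatically. The Python sketch below relies only on the line format visible in the output in this report. The regular expression and the function name `parse_mismatch` are illustrative assumptions, and the format may differ in other Baloo versions:

```python
import re

# Matches lines such as "Inode:    22 (DB) <-> 25798 (FS)".
MISMATCH = re.compile(
    r"^(ID|Inode|DeviceID):\s+(\d+) \(DB\) (?:<->|==) (\d+) \(FS\)\s*$",
    re.MULTILINE,
)


def parse_mismatch(output: str) -> dict[str, tuple[int, int]]:
    """Map each field name to its (DB value, filesystem value) pair."""
    return {
        m.group(1): (int(m.group(2)), int(m.group(3)))
        for m in MISMATCH.finditer(output)
    }


# Lines copied from the balooshow output in Comment 3.
sample = """ID:       94927656982 (DB) <-> 110802004678678 (FS)
Inode:    22 (DB) <-> 25798 (FS)
DeviceID: 438376470 (DB) == 438376470 (FS)
"""

fields = parse_mismatch(sample)
differing = [name for name, (db, fs) in fields.items() if db != fs]
print(differing)
```

An empty result from `parse_mismatch` would mean balooshow printed no mismatch block for that file, as in the healthy output in Comment 1.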