Bug 435161

Summary: Baloo crashes with ASSERT in metadatamover.cpp
Product: [Frameworks and Libraries] frameworks-baloo Reporter: Oded Arbel <oded>
Component: Baloo File DaemonAssignee: Stefan Brüns <stefan.bruens>
Status: RESOLVED FIXED    
Severity: normal CC: baloo-bugs-null, bernie, christoph, fahmy.khaled, nate, postix, tagwerk19
Priority: NOR    
Version First Reported In: 5.80.0   
Target Milestone: ---   
Platform: Other   
OS: Linux   
See Also: https://bugs.kde.org/show_bug.cgi?id=478854
Latest Commit: Version Fixed In:
Sentry Crash Report:

Description Oded Arbel 2021-03-30 18:38:40 UTC
SUMMARY
Another Baloo assert crash, this time:

----8<----
Mar 30 19:04:59 vesho baloo_file[2358496]: kf.baloo: Failed to create watch for "/home/odeda/.cache/mozilla/firefox/i1m74zv1.default/safebrowsing-updating/" No such file or directory
Mar 30 19:04:59 vesho baloo_file[2358496]: kf.baloo: "/home/odeda/.cache/mozilla/firefox/i1m74zv1.default/safebrowsing-updating/" -> "/home/odeda/.cache/mozilla/firefox/i1m74zv1.default/safebrowsing-backup/"
Mar 30 19:04:59 vesho baloo_file[2358496]: ASSERT: "from[from.size()-1] != QLatin1Char('/')" in file [...]src/file/metadatamover.cpp, line 80
----8<----

Looks like `MetadataMover::updateMetadata()` (that is called from `MetadataMover::moveFileMetadata()` is called for a directory.

STEPS TO REPRODUCE
I'm not really sure of the process - it has happened during first scan, but it looks like a directory being moved and I think that it was moved before it was indexed, but I'm not sure of the exact process. Regardless, the directory no longer exists - I think it was moved than deleted before baloo got a chance to index it, but I don't think that is related to the problem.

OBSERVED RESULT
baloo_file crashes and indexing stops.

EXPECTED RESULT
It shouldn't crash

SOFTWARE/OS VERSIONS
Linux/KDE Plasma: 
(available in About System)
KDE Plasma Version: 5.21.80
KDE Frameworks Version: 5.81.0
Qt Version: 5.15.2

ADDITIONAL INFORMATION
It looks like the issue is in `KInotify::slotEvent()`, which emits `KInotify::moved` and hits `FileWatch::slotFileMoved`, but looking at the code:

----8<----
                // update the path cache
                if (event->mask & IN_ISDIR) {
                    // ... stuff in case event is for a directory
                }
//                qCDebug(BALOO) << oldPath << "EventMoveTo" << path;
                Q_EMIT moved(QFile::decodeName(oldPath), fname);
----8<----

Which means that `KInotify::moved` is emitted for both directories and files, but `FileWatch::slotFileMoved` expects only files and enforces that with an assert.
Comment 1 Oded Arbel 2021-03-30 18:45:39 UTC
(In reply to Oded Arbel from comment #0)
> Which means that `KInotify::moved` is emitted for both directories and
> files, but `FileWatch::slotFileMoved` expects only files and enforces that
> with an assert.

Lookgin deeper into `FileWatch::slotFileMoved`, it does expect to handle directories in some cases, but for calls that are forwarded to `MetadataMover::moveFileMetadata()` - it doesn't care. `MetadataMover::moveFileMetadata()` itself doesn't seem to care either, but it calls `MetadataMover::updateMetadata()` that will crash on paths that end with '/'.

Maybe the correct approach would be to have `MetadataMover::updateMetadata()` silently ignore paths that end with '/'?
Comment 2 Oded Arbel 2021-03-30 18:58:11 UTC
(In reply to Oded Arbel from comment #0)
> Regardless, the
> directory no longer exists - I think it was moved than deleted before baloo
> got a chance to index it, but I don't think that is related to the problem.

This could actually be related to the problem because I have configured `$HOME/.cache` to be excluded from scanning so it shouldn't have been scanned.

`FileWatch::slotFileMoved` consults ``FileIndexerConfig::shouldBeIndexed()` which in turn uses `QFileInfo::isDir()` which - as far as I understand - check if the path is an actual directory and will return `false` for paths that don't exist. I think the case where a directory was moved and then deleted is not handled correctly and will cause weird behaviors that might trigger asserts.
Comment 3 tagwerk19 2021-04-01 10:35:01 UTC
Not sure whether this is related...

It's a glitch I get "every so often". When I've missed capturing it, I've not found a way of reproducing it...

    cd ~/Documents
    mkdir temp
    echo "Hello Penguin" > temp/file1.txt
    balooshow -x temp/file1.txt

gave:

    144da00000fc01 64513 1330592 temp/file1.txt [/home/test/.kde/share/config/kdeglobals]
            Mtime: 1617136526 2021-03-30T22:35:26
            Ctime: 1617136526 2021-03-30T22:35:26
            Cached properties:
                    Line Count: 158

    Internal Info
    Terms: 0 0.025 ... Mplain Mtext T5 T8 X20-158 activebackground activeblend activeforeground backgroundalternate backgroundnormal ... widgetstyle window wm
    File Name Terms: Fkdeglobals
    XAttr Terms:
    lineCount: 158

That is, the details of something else completely (a different file, indexed at a different time). "Balooctl monitor" did not show "file1.txt" being indexed.

After a reboot I get the more expected:

    144da00000fc01 64513 1330592 temp/file1.txt [/home/test/Documents/temp/file1.txt]
            Mtime: 1617270546 2021-04-01T11:49:06
            Ctime: 1617270546 2021-04-01T11:49:06
            Cached properties:
                    Line Count: 1

    Internal Info
    Terms: Mplain Mtext T5 T8 X20-1 hello penguin
    File Name Terms: Ffile1 Ftxt
    XAttr Terms:
    lineCount: 1

I was hoping to be able to snapshot/clone the "confused" system, seemingly not this time though :-(

It may be that if such a "confused" system is referencing a directory rather than a file then you might get such an assert....

    Neon Unstable
    Plasma: 5.21.80
    Frameworks: 5.81.0
    Qt: 5.15.2
Comment 4 tagwerk19 2021-04-01 10:36:52 UTC
Had to trim the list of terms in the previous post as the full list looked too much like spam.
Comment 5 Bernie Innocenti 2021-12-06 04:10:15 UTC
I just hit this same ASSERT() in baloo built from git master, and the pathname was: "/home/bernie/thunderbird.bak/".

Invalid encoding. Ignoring "/home/bernie/emu/amiga/amigahd/Work/Network/Programs/MicroDot/Data/DBX_60/EDJAEDCOBDJEJLABDMJFJNPKGINIEG"
ASSERT: "from[from.size()-1] != QLatin1Char('/')" in file /home/bernie/kde/src/baloo/src/file/metadatamover.cpp, line 80

(gdb) bt
#0  0x00007ffff70cdd22 in raise () from /usr/lib/libc.so.6
#1  0x00007ffff70b7862 in abort () from /usr/lib/libc.so.6
#2  0x00007ffff7680910 in QMessageLogger::fatal(char const*, ...) const () from /usr/lib/libQt5Core.so.5
#3  0x00007ffff767fcf5 in qt_assert(char const*, char const*, int) () from /usr/lib/libQt5Core.so.5
#4  0x00005555555908b2 in Baloo::MetadataMover::updateMetadata (this=0x5555555d9800, tr=0x7fffffffcc00, from="/home/bernie/thunderbird/", to="/home/bernie/thunderbird.bak/") at /home/bernie/kde/src/baloo/src/file/metadatamover.cpp:80
#5  0x00005555555904aa in Baloo::MetadataMover::moveFileMetadata (this=0x5555555d9800, from="/home/bernie/thunderbird/", to="/home/bernie/thunderbird.bak/") at /home/bernie/kde/src/baloo/src/file/metadatamover.cpp:42
#6  0x000055555558a1ed in Baloo::FileWatch::slotFileMoved (this=0x7fffffffd570, urlFrom="/home/bernie/thunderbird/", urlTo="/home/bernie/thunderbird.bak/") at /home/bernie/kde/src/baloo/src/file/filewatch.cpp:77
...
Comment 6 Bernie Innocenti 2021-12-06 06:18:28 UTC
I'm able to trigger a similar assertion a different location by simply creating a new directory in my home:



...
[Detaching after fork from child process 31519]
QIBusPlatformInputContext: invalid portal bus.
QSocketNotifier: Can only be used with threads started with QThread
[Detaching after fork from child process 31538]
QIBusPlatformInputContext: invalid portal bus.
QSocketNotifier: Can only be used with threads started with QThread
ASSERT: "!filePath.endsWith(QLatin1Char('/'))" in file /home/bernie/kde/src/baloo/src/file/newfileindexer.cpp, line 38

Thread 4 "Thread (pooled)" received signal SIGABRT, Aborted.
[Switching to Thread 0x7ffff1680640 (LWP 31481)]
0x00007ffff70cdd22 in raise () from /usr/lib/libc.so.6
(gdb) bt
#0  0x00007ffff70cdd22 in raise () from /usr/lib/libc.so.6
#1  0x00007ffff70b7862 in abort () from /usr/lib/libc.so.6
#2  0x00007ffff7680910 in QMessageLogger::fatal(char const*, ...) const () from /usr/lib/libQt5Core.so.5
#3  0x00007ffff767fcf5 in qt_assert(char const*, char const*, int) () from /usr/lib/libQt5Core.so.5
#4  0x0000555555579e99 in Baloo::NewFileIndexer::run (this=0x555555870bf0)
    at /home/bernie/kde/src/baloo/src/file/newfileindexer.cpp:38
#5  0x00007ffff76c1332 in ?? () from /usr/lib/libQt5Core.so.5
#6  0x00007ffff76be02f in ?? () from /usr/lib/libQt5Core.so.5
#7  0x00007ffff6016259 in start_thread () from /usr/lib/libpthread.so.0
#8  0x00007ffff718f5e3 in clone () from /usr/lib/libc.so.6
(gdb) f 4
#4  0x0000555555579e99 in Baloo::NewFileIndexer::run (this=0x555555870bf0)
    at /home/bernie/kde/src/baloo/src/file/newfileindexer.cpp:38
38              Q_ASSERT(!filePath.endsWith(QLatin1Char('/')));
(gdb) list
33              : BasicIndexingJob::MarkForContentIndexing;
34
35          Transaction tr(m_db, Transaction::ReadWrite);
36
37          for (const QString& filePath : std::as_const(m_files)) {
38              Q_ASSERT(!filePath.endsWith(QLatin1Char('/')));
39
40              QString mimetype;
41              QFileInfo fileInfo(filePath);
42
(gdb) p filePath
$1 = "/home/bernie/foo/"
Comment 7 Bernie Innocenti 2021-12-06 08:22:36 UTC
I don't see how these asserts could have ever worked in the past. I traced back the pathnames, and the '/' is appended to all directories with normalizeTrailingSlash(): https://invent.kde.org/frameworks/baloo/-/blob/master/src/file/kinotify.cpp#L359

So, new directories will always have a trailing slash, why is NewFileIndexer::run() not expecting to see them?
My first hypothesis was that NewFileIndex is only for processing files. But there's a specific check for directories a few lines after the Q_ASSERT that fails: ​https://invent.kde.org/frameworks/baloo/-/blob/master/src/file/newfileindexer.cpp#L47-51

Code to append slashes to directories was added ~1 year ago:
https://invent.kde.org/frameworks/baloo/-/commit/73183acf00a2b7f10b286a1241716cb7e0785f0d

A few months later, we removed a couple of asserts on directory removal and rename:
https://invent.kde.org/frameworks/baloo/-/commit/96284739b4b63110113af0ba7dedfd18c4459063

The second check was missing a '!', and it was added shortly after by Nate:
https://invent.kde.org/frameworks/baloo/-/commit/dc04879f058dd1eab72e4bd4c80bdc1ea3c06c9b

So I think we missed directory creation and a few other cases. It should be safe to keep trailing slashes, because BasicIndexingJob strips them in the constructor:
https://invent.kde.org/frameworks/baloo/-/blob/master/src/file/basicindexingjob.cpp#L26-28

There are other asserts disallowing trailing slashes. These are placed just before code like this:
  QString fileName = filePath.mid(filePath.lastIndexOf(QLatin1Char('/')) + 1);

So I think we can't simply remove the check, as it will set fileName to an empty string. QFileInfo::fileName() is explicitly documented to behave like this:
  "Note that, if this QFileInfo object is given a path ending in a slash, the name of the file is considered empty."
Comment 8 Bernie Innocenti 2021-12-06 08:23:41 UTC
On a tangential note, why didn't the autotests catch this regression?
Comment 9 Bug Janitor Service 2021-12-06 09:01:33 UTC
A possibly relevant merge request was started @ https://invent.kde.org/frameworks/baloo/-/merge_requests/71
Comment 10 Bernie Innocenti 2021-12-06 09:04:23 UTC
Just to be clear: I could 100% reproduce the crash in baloo_file with these directory operations in my home:

  cd ~
  mkdir foo   <- CRASH!
  mv foo bar  <- CRASH!
  chmod -x bar  <- CRASH!
Comment 11 tagwerk19 2021-12-06 11:41:48 UTC
(In reply to Bernie Innocenti from comment #10)
> ... 100% reproduce ...
I remember having a go at building a "with a clean index" test and got nowhere. Chasing up what would happen if an inode was reused but I couldn't pin down anything.

My recollection is that the Asserts were included (and kept) to flush out "bad cases" before they made things worse. Alas, cannot remember/find where I read that.
Comment 12 tagwerk19 2024-01-07 22:58:10 UTC
(In reply to tagwerk19 from comment #11)
> ... but got nowhere ...
I stumbled on this again (Bug 478854)

I can reliably reproduce on neon-unstable; on newly-installed, as vanilla as possible, systems but if I try the same on a neon-testing, I don't get failures.
Comment 13 Christoph Cullmann 2025-06-23 18:21:29 UTC
Baloo::MetadataMover::updateMetadata

got fixed in

commit 27a5e70abe6479dee045f5ed8412c55fc1d87243
Author: Méven Car <meven@kde.org>
Date:   Wed Jan 15 18:35:47 2025 +0100

    MetadataMover::updateMetadata: tolerate path ending with /

    Since KInotify can return path ending with '/', instead of asserting,
    sanitize the input data by chopping the end /.

    SENTRY: BALOO-1H4

in KF6.

The /home/bernie/kde/src/baloo/src/file/newfileindexer.cpp:38

in

commit 33756dd44c4f67a7984fa4c5da61726b46aa6bc6
Author: Méven Car <meven.car@kdemail.net>
Date:   Fri Mar 29 17:52:28 2024 +0000

    Some indexers need non-slash trailed path
    
    Fix regression introduced in 73183acf, prior to this commit all path
    from KInotify did not have a trailing slash, adjust in consequence.
    
    9b61371fdefbd538938f20cdc87eed03d170fa5b adjusted BasicIndexingJob but
    not the other cases.
    
    Add some missing unit test cases, although they don't cover the bug
    cases, they allow to illustrate the crash conditions.
    
    BUG: 478854
    
    Superseeds https://invent.kde.org/frameworks/baloo/-/merge_requests/119
    
    cc @tagwerk19.invent @cullmann
Comment 14 Christoph Cullmann 2025-06-23 18:24:21 UTC
*** Bug 497014 has been marked as a duplicate of this bug. ***