Summary: | KRunner crashes inside LMDB with assertion “mdb.c:6125: Assertion 'IS_LEAF(mp)' failed in mdb_cursor_set()” when using `Baloo::PostingDB::prefixIter()` | ||
---|---|---|---|
Product: | [Frameworks and Libraries] frameworks-baloo | Reporter: | Chris Long <achrislong> |
Component: | Engine | Assignee: | baloo-bugs-null |
Status: | RESOLVED WORKSFORME | ||
Severity: | crash | CC: | alexander.lohnau, antonio.ponzetto, dmalick, erin-kde, ilochab, johnjaylward, jplx256, kde, kfunk, nate, pythonshell, rasum.subedi13, xinfeiyang-2008 |
Priority: | VHI | Keywords: | drkonqi |
Version: | unspecified | ||
Target Milestone: | --- | ||
Platform: | Ubuntu | ||
OS: | Linux | ||
Latest Commit: | Version Fixed In: |
Description
Chris Long
2018-04-24 16:37:02 UTC
*** Bug 394008 has been marked as a duplicate of this bug. ***

*** Bug 391574 has been marked as a duplicate of this bug. ***

*** Bug 387647 has been marked as a duplicate of this bug. ***

*** Bug 386266 has been marked as a duplicate of this bug. ***

*** Bug 386938 has been marked as a duplicate of this bug. ***

*** Bug 402741 has been marked as a duplicate of this bug. ***

*** Bug 405485 has been marked as a duplicate of this bug. ***

*** Bug 408779 has been marked as a duplicate of this bug. ***

I beg to differ on the list of duplicates of this bug: only this bug, bug 387647, and my bug 408779 relate to the assertion raised in LMDB, which occurs inside `Baloo::PostingDB::prefixIter`:

> [KCrash Handler]
> #6 0x00007f8e59f51428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
> #7 0x00007f8e59f5302a in __GI_abort () at abort.c:89
> #8 0x00007f8e2d51d002 in mdb_assert_fail (env=0x7f8df8003d30, expr_txt=expr_txt@entry=0x7f8e2d51eb22 "IS_LEAF(mp)", func=func@entry=0x7f8e2d51f330 <__func__.7885> "mdb_cursor_next", line=line@entry=5685, file=0x7f8e2d51ea40 "mdb.c") at mdb.c:1481
> #9 0x00007f8e2d514740 in mdb_cursor_next (mc=mc@entry=0x7f8dfc066140, key=0x7f8e1dffa390, data=data@entry=0x7f8e1dffa3a0, op=MDB_NEXT) at mdb.c:5685
> #10 0x00007f8e2d5131e9 in mdb_cursor_get (mc=0x7f8dfc066140, key=key@entry=0x7f8e1dffa390, data=data@entry=0x7f8e1dffa3a0, op=op@entry=MDB_NEXT) at mdb.c:6184
> #11 0x00007f8e2d7478da in Baloo::PostingDB::iter<Baloo::PostingDB::prefixIter(const QByteArray&)::<lambda(const QByteArray&)> > (this=0xfc003b90, this=0xfc003b90, prefix=..., validate=...) at /build/baloo-kf5-B5k90A/baloo-kf5-5.36.0/src/engine/postingdb.cpp:227
> #12 Baloo::PostingDB::prefixIter (this=this@entry=0x7f8e1dffa480, prefix=...) at /build/baloo-kf5-B5k90A/baloo-kf5-5.36.0/src/engine/postingdb.cpp:246
> #13 0x00007f8e2d751c42 in Baloo::Transaction::postingIterator (this=this@entry=0x7f8e1dffa8a0, query=...) at /build/baloo-kf5-B5k90A/baloo-kf5-5.36.0/src/engine/transaction.cpp:296
> …

All the other duplicate bugs are either completely useless (405485) or are instead caused by a segfault due to incorrect string manipulation inside `Baloo::PostingDB::iter` (`iter` ≠ `prefixIter`):

> [KCrash Handler]
> #6 __memcpy_avx_unaligned () at ../sysdeps/x86_64/multiarch/memcpy-avx-unaligned.S:245
> #7 0x00007f1bfb8934e0 in memcpy (__len=567624928, __src=0x7f1b21d54510, __dest=<optimized out>) at /usr/include/x86_64-linux-gnu/bits/string3.h:53
> #8 QByteArray::QByteArray (this=0x7f1b21d54410, data=0x7f1b21d54510 "\200R0\004\033\177", size=567624928) at tools/qbytearray.cpp:1452
> #9 0x00007f1b239a2917 in DBPostingIterator::DBPostingIterator (this=0x7f1b04305810, data=<optimized out>, size=<optimized out>) at /build/baloo-kf5-wIK3t6/baloo-kf5-5.18.0/src/engine/postingdb.cpp:177
> #10 0x00007f1b239a29f1 in Baloo::PostingDB::iter (this=this@entry=0x7f1b21d54500, term=...) at /build/baloo-kf5-wIK3t6/baloo-kf5-5.18.0/src/engine/postingdb.cpp:169
> #11 0x00007f1b239ab8bb in Baloo::Transaction::postingIterator (this=this@entry=0x7f1b21d54900, query=...) at /build/baloo-kf5-wIK3t6/baloo-kf5-5.18.0/src/engine/transaction.cpp:294
> …

Oops, my mistake! Feel free to rejigger the bugs' duplicates as needed.

Done! All `Baloo::PostingDB::prefixIter(QByteArray const&)` related crashes have been moved to bug 386266.

According to https://www.openldap.org/lists/openldap-bugs/201502/msg00056.html, it looks like this assertion does indeed mean that the LMDB database used by Baloo is corrupted. Interestingly enough, however, when I build LMDB from master, `baloosearch` works fine, but `baloorunner` still crashes as usual…

I moved `${XDG_DATA_HOME}/baloo` to `${XDG_DATA_HOME}/baloo.bak`, restarted Baloo and `baloorunner`, and let it reindex all files. It appears to work correctly now.
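The workaround described above (moving the index directory aside so Baloo rebuilds it from scratch) can be scripted. This is a minimal sketch, assuming the index lives in `baloo` under `$XDG_DATA_HOME` as stated above, falling back to the XDG Base Directory default of `~/.local/share`, and assuming Baloo and `baloorunner` are stopped before it runs. The helper names `xdg_data_home` and `move_index_aside` are hypothetical.

```py
import os
import pathlib


def xdg_data_home() -> pathlib.Path:
    """Return $XDG_DATA_HOME, or the XDG Base Directory default."""
    value = os.environ.get("XDG_DATA_HOME", "")
    # Per the XDG Base Directory spec, an unset, empty, or relative value is
    # ignored and $HOME/.local/share is used instead.
    if value and os.path.isabs(value):
        return pathlib.Path(value)
    return pathlib.Path.home() / ".local" / "share"


def move_index_aside(data_home: pathlib.Path) -> pathlib.Path:
    """Rename <data_home>/baloo to <data_home>/baloo.bak (hypothetical helper).

    Assumes Baloo and baloorunner are not running. Refuses to overwrite an
    existing backup so that an earlier copy of the index is never lost.
    """
    index_dir = data_home / "baloo"
    backup_dir = data_home / "baloo.bak"
    if not index_dir.exists():
        raise FileNotFoundError(f"no Baloo index found at {index_dir}")
    if backup_dir.exists():
        raise FileExistsError(f"backup already exists at {backup_dir}")
    # A rename within the same directory is cheap, even for a multi-GiB index.
    index_dir.rename(backup_dir)
    return backup_dir
```

Typical use would be `move_index_aside(xdg_data_home())` with Baloo stopped, then restarting Baloo and `baloorunner` and letting them reindex. Keeping the old index as `baloo.bak` rather than deleting it preserves the corrupted database for later analysis, e.g. with the scanning script in this report.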
Since it probably **was** database corruption and LMDB apparently does not support database recovery, maybe we should open a new issue about running some integrity checks up front and otherwise suggesting that the database be recreated? 😕

Thanks so much for the bug wrangling and investigation! Yes, I agree that this kind of thing could be much improved. I'm not sure an integrity check would be the right way to go, though, just for performance reasons. Maybe it should be more intelligent and repair itself rather than crash when it encounters this situation?

Better LMDB error handling in general is tracked at Bug 368557.

I actually managed to reproduce this crash consistently using the following script. It exits successfully on the rebuilt database but crashes with the same assertion on the original one:

```py
#!/usr/bin/python3
"""Scan through a complete LMDB database."""
import argparse
import pathlib
import sys

import lmdb

__version__ = "0.1.0"


def main(argv=sys.argv[1:], program=sys.argv[0]):
    parser = argparse.ArgumentParser(description=__doc__, prog=pathlib.Path(program).name)
    parser.add_argument("-V", "--version", action="version", version="%(prog)s {0}".format(__version__))
    # Use for environments stored as a single file (MDB_NOSUBDIR) instead of a directory.
    parser.add_argument("-n", "--no-subdir", action="store_false", dest="subdir")
    parser.add_argument("dbpath", action="store")
    args = parser.parse_args(argv)

    with lmdb.open(args.dbpath, subdir=args.subdir, max_dbs=100) as env:
        print(f"Scanning database: {args.dbpath}")

        # Collect the names of the named sub-databases stored in the main database.
        dbnames = []
        transaction = env.begin()
        try:
            cursor = transaction.cursor()
            try:
                while cursor.next():
                    dbnames.append(cursor.key())
            finally:
                cursor.close()
        finally:
            transaction.abort()

        # Walk every record of every sub-database. On a corrupted database, LMDB
        # trips an assertion here and aborts the whole process.
        for dbname in dbnames:
            print(f"Scanning database: {args.dbpath}/{dbname.decode('utf-8')}")
            handle = env.open_db(dbname)
            transaction = env.begin(db=handle)
            try:
                cursor = transaction.cursor()
                try:
                    while cursor.next():
                        pass
                finally:
                    cursor.close()
            finally:
                transaction.abort()
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

Running this on my rebuilt 1.8GiB database takes only about 1.2s (in CPython), so it could apparently be used as an actual integrity scanner within Baloo to detect whether the loaded database is reliable. The only caveat is that it **needs** to run in a subprocess with exit-status monitoring, since the assertion **cannot** be converted into an error status returned to the caller.

The problem with bug 368557 is that it is about assertion failures inside Baloo when Baloo receives an error return code from LMDB, not about LMDB getting so confused about the database that it simply terminates the process. The script I wrote would only need to run once on Baloo startup.

The deeper problem I see, though, is that, to my knowledge, LMDB offers **absolutely no recovery mechanisms whatsoever**. In a lot of ways that actually disqualifies it as a database entirely: it is advertised as ACID, but apparently that's not always the case, or this should not be possible in the first place. (Fixing this would likely be quite disruptive to Baloo's codebase.)

*** Bug 415769 has been marked as a duplicate of this bug. ***

The baloo runner has been refactored into a D-Bus runner and runs in its own process.

*** Bug 414122 has been marked as a duplicate of this bug. ***
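The subprocess approach suggested above (run the scanner once at startup and only trust the index if the child exits cleanly) might look like the following minimal sketch. As the backtrace shows, the assertion goes through `mdb_assert_fail()` into `abort()`, so on POSIX the parent can only observe it as the child dying from `SIGABRT`, which `subprocess.run` reports as the negative return code `-SIGABRT`. The helper name `index_is_readable` and its `scanner_cmd` and `timeout` parameters are hypothetical; the sketch assumes a scanner, such as the script in this report, that exits with status 0 after walking every sub-database.

```py
import signal
import subprocess
import sys
from typing import Sequence


def index_is_readable(scanner_cmd: Sequence[str], timeout: float = 60.0) -> bool:
    """Run a read-only scan of the index in a child process (hypothetical helper).

    The scan is isolated in a child so that an LMDB assertion, which aborts
    the process instead of returning an error code, cannot take the caller
    down with it. Returns True only if the scan finished with exit status 0.
    """
    try:
        result = subprocess.run(scanner_cmd, capture_output=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child on timeout; a hung scan is treated
        # as unreliable rather than as healthy.
        return False
    if result.returncode == -signal.SIGABRT:
        # On POSIX, a negative return code means "terminated by that signal".
        print("index scan aborted (LMDB assertion?); consider rebuilding the index",
              file=sys.stderr)
        return False
    return result.returncode == 0
```

For example, `index_is_readable(["python3", "scan.py", "-n", index_path])`, where `scan.py` is whatever name the script above is saved under and `index_path` points at the index inside `${XDG_DATA_HOME}/baloo`, would return `False` on the original corrupted database and `True` on the rebuilt one. Treating every non-zero status as "unreliable", not just `SIGABRT`, keeps the check conservative; the `SIGABRT` branch only adds a more specific hint for the user.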