Bug 364133 - baloo_file_extractor crashes when database is larger than 5GB
Status: RESOLVED DUPLICATE of bug 364475
Alias: None
Product: frameworks-baloo
Classification: Frameworks and Libraries
Component: general
Version: 5.22.0
Platform: openSUSE Linux
Importance: NOR crash
Target Milestone: ---
Assignee: Pinak Ahuja
Reported: 2016-06-09 09:00 UTC by marvin24
Modified: 2016-09-11 11:20 UTC

Description marvin24 2016-06-09 09:00:31 UTC
OK, the subject is just a guess. The version is 5.22.0 from openSUSE Tumbleweed.
# ls .local/share/baloo:
-rw-r--r-- 1 me users 5364727808  8. Jun 23:33 index

backtrace:

Program terminated with signal SIGSEGV, Segmentation fault.
#0  Baloo::getVarint32Ptr (value=<synthetic pointer>, limit=0x7a4b8a0 "\001", p=0x0) at /usr/src/debug/baloo-5.22.0/src/codecs/coding.h:120
(gdb) bt full
#0  0x00007fa845f83cf9 in Baloo::getDifferentialVarInt32(char*, char*, QVector<unsigned int>*) (value=<synthetic pointer>, limit=0x7a4b8a0 "\001", p=0x0)
    at /usr/src/debug/baloo-5.22.0/src/codecs/coding.h:120
        result = <optimized out>
        n = 182
        size = 4978149
        v = 1744503365
#1  0x00007fa845f83cf9 in Baloo::getDifferentialVarInt32(char*, char*, QVector<unsigned int>*) (p=0x0, 
    p@entry=0x3d27520 "\220ݴ\003", limit=limit@entry=0x7a4b8a0 "\001", values=values@entry=0x7fff4d721218)
    at /usr/src/debug/baloo-5.22.0/src/codecs/coding.cpp:158
        n = 182
        size = 4978149
        v = 1744503365
#2  0x00007fa845f8331f in Baloo::PositionCodec::decode(QByteArray const&) (this=this@entry=0x7fff4d72126f, arr=...)
    at /usr/src/debug/baloo-5.22.0/src/codecs/positioncodec.cpp:55
        info = {docId = 33791235021865007, positions = {d = 0xf106460}}
        data = 0x3d27520 "\220ݴ\003"
        end = 0x7a4b8a0 "\001"
        vec = {d = 0x4ad0090}
#3  0x00007fa845f6c910 in Baloo::PositionDB::get(QByteArray const&) (this=this@entry=0x7fff4d721360, term=...)
    at /usr/src/debug/baloo-5.22.0/src/engine/positiondb.cpp:101
        key = {mv_size = 7, mv_data = 0x373b658}
        val = {mv_size = 64117840, mv_data = 0x3d25c50}
        rc = <optimized out>
        data = {d = 0x245aa30}
        codec = {<No data fields>}
#4  0x00007fa845f80ebc in Baloo::WriteTransaction::commit() (this=<optimized out>) at /usr/src/debug/baloo-5.22.0/src/engine/writetransaction.cpp:299
        id = 33791230726897711
        op = @0x420aa18: {type = Baloo::WriteTransaction::AddId, data = {docId = 33791230726897711, positions = {d = 0x373b6d0}}}
        __for_range = <synthetic pointer>
        __for_begin = 0x420aa18
        operations = <optimized out>
        list = {d = 0x7fa6da3ff010}
        fetchedPositionList = false
        positionList = {d = 0x7fa8444adc00 <QArrayData::shared_null>}
        postingDB = {m_txn = 0xa6e860, m_dbi = 2}
        positionDB = {m_txn = 0xa6e860, m_dbi = 3}
        iter = {c = {{d = 0xa80120, e = 0xa80120}}, i = {i = 0x4625cd0}, n = {i = 0x38dc240}}
#5  0x00007fa845f77c02 in Baloo::Transaction::commit() (this=0xa2ad10) at /usr/src/debug/baloo-5.22.0/src/engine/transaction.cpp:262
#6  0x00000000004094ec in Baloo::App::processNextFile() (this=0x7fff4d721a20) at /usr/src/debug/baloo-5.22.0/src/file/extractor/app.cpp:100
        message = {d_ptr = 0x1}
        vl = 
            {<QListSpecialMethods<QVariant>> = {<No data fields>}, {p = {static shared_null = {ref = {atomic = {_q_value = {<std::__atomic_base<int>> = {static _S_alignment = 4, _M_i = -1}, <No data fields>}}}, alloc = 0, begin = 0, end = 0, array = {0x0}}, d = 0x7fa84489a460}, d = 0x7fa84489a460}}
#7  0x00007fa84443d5c6 in  () at /usr/lib64/libQt5Core.so.5
#8  0x00007fa84443250b in QObject::event(QEvent*) () at /usr/lib64/libQt5Core.so.5
#9  0x00007fa84512891c in QApplicationPrivate::notify_helper(QObject*, QEvent*) () at /usr/lib64/libQt5Widgets.so.5
#10 0x00007fa84512d7d6 in QApplication::notify(QObject*, QEvent*) () at /usr/lib64/libQt5Widgets.so.5
#11 0x00007fa8444068b8 in QCoreApplication::notifyInternal2(QObject*, QEvent*) () at /usr/lib64/libQt5Core.so.5
#12 0x00007fa844457bbe in QTimerInfoList::activateTimers() () at /usr/lib64/libQt5Core.so.5
#13 0x00007fa844458139 in  () at /usr/lib64/libQt5Core.so.5
#14 0x00007fa842555227 in g_main_context_dispatch () at /usr/lib64/libglib-2.0.so.0
#15 0x00007fa842555458 in  () at /usr/lib64/libglib-2.0.so.0
#16 0x00007fa8425554fc in g_main_context_iteration () at /usr/lib64/libglib-2.0.so.0
#17 0x00007fa844458c3f in QEventDispatcherGlib::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) () at /usr/lib64/libQt5Core.so.5
#18 0x00007fa84440479a in QEventLoop::exec(QFlags<QEventLoop::ProcessEventsFlag>) () at /usr/lib64/libQt5Core.so.5
#19 0x00007fa84440ca6d in QCoreApplication::exec() () at /usr/lib64/libQt5Core.so.5
#20 0x00000000004083fc in main(int, char**) (argc=1, argv=0x7fff4d721c08) at /usr/src/debug/baloo-5.22.0/src/file/extractor/main.cpp:57
        aboutData = {d = 0x955d60}
        app = <incomplete type>
        appObject = 
          {<QObject> = {<No data fields>}, static staticMetaObject = {d = {superdata = 0x7fa8448418e0 <QObject::staticMetaObject>, stringdata = 0x41d940 <qt_meta_stringdata_Baloo__App>, data = 0x41d8c0 <qt_meta_data_Baloo__App>, static_metacall = 0x41bed0 <Baloo::App::qt_static_metacall(QObject*, QMetaObject::Call, int, void**)>, relatedMetaObjects = 0x0, extradata = 0x0}}, m_mimeDb = {d = 0x7fa844850900}, m_extractorCollection = <incomplete type>, m_config = {<QObject> = {<No data fields>}, static staticMetaObject = {d = {superdata = 0x7fa8448418e0 <QObject::staticMetaObject>, stringdata = 0x41d6c0 <qt_meta_stringdata_Baloo__FileIndexerConfig>, data = 0x41d640 <qt_meta_data_Baloo__FileIndexerConfig>, static_metacall = 0x41c0e0 <Baloo::FileIndexerConfig::qt_static_metacall(QObject*, QMetaObject::Call, int, void**)>, relatedMetaObjects = 0x0, extradata = 0x0}}, m_config = <incomplete type>, m_folderCache = {<QListSpecialMethods<QPair<QString, bool> >> = {<No data fields>}, {p = {static shared_null = {ref = {atomic = {_q_value = {<std::__atomic_base<int>> = {static _S_alignment = 4, _M_i = -1}, <No data fields>}}}, alloc = 0, begin = 0, end = 0, array = {0x0}}, d = 0xa2f760}, d = 0xa2f760}}, m_excludeFilterRegExpCache = {m_regexpCache = {<QListSpecialMethods<QRegularExpression>> = {<No data fields>}, {p = {static shared_null = {ref = {atomic = {_q_value = {<std::__atomic_base<int>> = {static _S_alignment = 4, _M_i = -1}, <No data fields>}}}, alloc = 0, begin = 0, end = 0, array = {0x0}}, d = 0xa53520}, d = 0xa53520}}}, m_excludeMimetypes = {q_hash = {{d = 0xa20b30, e = 0xa20b30}}}, m_indexHidden = false, m_onlyBasicIndexing = false, m_devices = 0x9fe690, m_maxUncomittedFiles = 40}, m_notifyNewData = <incomplete type>, m_io = {m_stdinHandle = 0, m_stdoutHandle = 1, m_batchSize = 0, m_stdout = <incomplete type>}, m_idleMonitor = {<QObject> = {<No data fields>}, static staticMetaObject = {d = {superdata = 0x7fa8448418e0 <QObject::staticMetaObject>, stringdata = 0x41d800 
<qt_meta_stringdata_Baloo__IdleStateMonitor>, data = 0x41d780 <qt_meta_data_Baloo__IdleStateMonitor>, static_metacall = 0x41bf10 <Baloo::IdleStateMonitor::qt_static_metacall(QObject*, QMetaObject::Call, int, void**)>, relatedMetaObjects = 0x0, extradata = 0x0}}, m_isIdle = false}, m_updatedFiles = {<QList<QString>> = {<QListSpecialMethods<QString>> = {<No data fields>}, {p = {static shared_null = {ref = {atomic = {_q_value = {<std::__atomic_base<int>> = {static _S_alignment = 4, _M_i = -1}, <No data fields>}}}, alloc = 0, begin = 0, end = 0, array = {0x0}}, d = 0x2688410}, d = 0x2688410}}, <No data fields>}, m_tr = 0xa2ad10}



Reproducible: Always
Comment 1 marvin24 2016-06-17 08:12:18 UTC
Here is another one. This time I attached gdb to baloo_file_extractor and waited for the segfault, whereas the backtrace above came from a core dump. Somehow this one looks different. There also seems to be some kind of overflow. I ran balooctl status right after the segfault:

# balooctl status
Baloo File Indexer is running
Indexer state: Indexing file content
Indexed 75224 / 96084 files
Current size of index is 1.008,71 MiB

# ls .local/share/baloo:
-rw-r--r-- 1 user users 5352673280 16. Jun 21:02 index

output of bt full:
Thread 1 "baloo_file_extr" received signal SIGSEGV, Segmentation fault.
0x00007f1a501926d4 in __memcpy_sse2_unaligned () from /lib64/libc.so.6
(gdb) bt full
#0  0x00007f1a501926d4 in __memcpy_sse2_unaligned () at /lib64/libc.so.6
#1  0x00007f1a5261e503 in Baloo::PostingCodec::decode(QByteArray const&) (__len=18446744073052851664, __src=<optimized out>, __dest=<optimized out>)
    at /usr/include/bits/string3.h:53
        vec = {d = 0x7f1a50b48bb8}
#2  0x00007f1a5261e503 in Baloo::PostingCodec::decode(QByteArray const&) (this=this@entry=0x7ffcd9bfd5ff, arr=...)
    at /usr/src/debug/baloo-5.22.0/src/codecs/postingcodec.cpp:42
        vec = {d = 0x7f1a50b48bb8}
#3  0x00007f1a5260a150 in Baloo::PostingDB::get(QByteArray const&) (this=this@entry=0x7ffcd9bfd6e0, term=...)
    at /usr/src/debug/baloo-5.22.0/src/engine/postingdb.cpp:100
        key = {mv_size = 10, mv_data = 0x5e57e6d8}
        val = {mv_size = 3638267344, mv_data = 0xd8db8dd0}
        rc = <optimized out>
        arr = {d = 0x9f23d50}
        codec = {<No data fields>}
#4  0x00007f1a5261b72d in Baloo::WriteTransaction::commit() (this=<optimized out>) at /usr/src/debug/baloo-5.22.0/src/engine/writetransaction.cpp:286
        operations = {d = 0x64833860}
        list = {d = 0x7f1a50b48bb8}
        fetchedPositionList = <optimized out>
        positionList = {d = 0xc97c9900}
        postingDB = {m_txn = 0x1b493f0, m_dbi = 2}
        positionDB = {m_txn = 0x1b493f0, m_dbi = 3}
        iter = {c = {{d = 0xc8b8d50, e = 0xc8b8d50}}, i = {i = 0x44089e0}, n = {i = 0x64833830}}
#5  0x00007f1a52612c02 in Baloo::Transaction::commit() (this=0x1099b750) at /usr/src/debug/baloo-5.22.0/src/engine/transaction.cpp:262
#6  0x00000000004094ec in Baloo::App::processNextFile() (this=0x7ffcd9bfddb0) at /usr/src/debug/baloo-5.22.0/src/file/extractor/app.cpp:100
        message = {d_ptr = 0x1}
        vl = 
            {<QListSpecialMethods<QVariant>> = {<No data fields>}, {p = {static shared_null = {ref = {atomic = {_q_value = {<std::__atomic_base<int>> = {static _S_alignment = 4, _M_i = -1}, <No data fields>}}}, alloc = 0, begin = 0, end = 0, array = {0x0}}, d = 0x7ffcd9bfd840}, d = 0x7ffcd9bfd840}}
#7  0x00007f1a50ad85c6 in  () at /usr/lib64/libQt5Core.so.5
#8  0x00007f1a50acd50b in QObject::event(QEvent*) () at /usr/lib64/libQt5Core.so.5
#9  0x00007f1a517c391c in QApplicationPrivate::notify_helper(QObject*, QEvent*) () at /usr/lib64/libQt5Widgets.so.5
#10 0x00007f1a517c87d6 in QApplication::notify(QObject*, QEvent*) () at /usr/lib64/libQt5Widgets.so.5
#11 0x00007f1a50aa18b8 in QCoreApplication::notifyInternal2(QObject*, QEvent*) () at /usr/lib64/libQt5Core.so.5
#12 0x00007f1a50af2bbe in QTimerInfoList::activateTimers() () at /usr/lib64/libQt5Core.so.5
#13 0x00007f1a50af3139 in  () at /usr/lib64/libQt5Core.so.5
#14 0x00007f1a4ebf0227 in g_main_context_dispatch () at /usr/lib64/libglib-2.0.so.0
#15 0x00007f1a4ebf0458 in  () at /usr/lib64/libglib-2.0.so.0
#16 0x00007f1a4ebf04fc in g_main_context_iteration () at /usr/lib64/libglib-2.0.so.0
#17 0x00007f1a50af3c3f in QEventDispatcherGlib::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) () at /usr/lib64/libQt5Core.so.5
#18 0x00007f1a50a9f79a in QEventLoop::exec(QFlags<QEventLoop::ProcessEventsFlag>) () at /usr/lib64/libQt5Core.so.5
#19 0x00007f1a50aa7a6d in QCoreApplication::exec() () at /usr/lib64/libQt5Core.so.5
#20 0x00000000004083fc in main(int, char**) (argc=1, argv=0x7ffcd9bfdf98) at /usr/src/debug/baloo-5.22.0/src/file/extractor/main.cpp:57
        aboutData = {d = 0x1a32d60}
        app = <incomplete type>
        appObject = 
          {<QObject> = {<No data fields>}, static staticMetaObject = {d = {superdata = 0x7f1a50edc8e0 <QObject::staticMetaObject>, stringdata = 0x41d940 <qt_meta_stringdata_Baloo__App>, data = 0x41d8c0 <qt_meta_data_Baloo__App>, static_metacall = 0x41bed0 <Baloo::App::qt_static_metacall(QObject*, QMetaObject::Call, int, void**)>, relatedMetaObjects = 0x0, extradata = 0x0}}, m_mimeDb = {d = 0x7f1a50eeb900}, m_extractorCollection = <incomplete type>, m_config = {<QObject> = {<No data fields>}, static staticMetaObject = {d = {superdata = 0x7f1a50edc8e0 <QObject::staticMetaObject>, stringdata = 0x41d6c0 <qt_meta_stringdata_Baloo__FileIndexerConfig>, data = 0x41d640 <qt_meta_data_Baloo__FileIndexerConfig>, static_metacall = 0x41c0e0 <Baloo::FileIndexerConfig::qt_static_metacall(QObject*, QMetaObject::Call, int, void**)>, relatedMetaObjects = 0x0, extradata = 0x0}}, m_config = <incomplete type>, m_folderCache = {<QListSpecialMethods<QPair<QString, bool> >> = {<No data fields>}, {p = {static shared_null = {ref = {atomic = {_q_value = {<std::__atomic_base<int>> = {static _S_alignment = 4, _M_i = -1}, <No data fields>}}}, alloc = 0, begin = 0, end = 0, array = {0x0}}, d = 0x1b2ab00}, d = 0x1b2ab00}}, m_excludeFilterRegExpCache = {m_regexpCache = {<QListSpecialMethods<QRegularExpression>> = {<No data fields>}, {p = {static shared_null = {ref = {atomic = {_q_value = {<std::__atomic_base<int>> = {static _S_alignment = 4, _M_i = -1}, <No data fields>}}}, alloc = 0, begin = 0, end = 0, array = {0x0}}, d = 0x1b47970}, d = 0x1b47970}}}, m_excludeMimetypes = {q_hash = {{d = 0x1b4b470, e = 0x1b4b470}}}, m_indexHidden = false, m_onlyBasicIndexing = false, m_devices = 0x1adb4d0, m_maxUncomittedFiles = 40}, m_notifyNewData = <incomplete type>, m_io = {m_stdinHandle = 0, m_stdoutHandle = 1, m_batchSize = 0, m_stdout = <incomplete type>}, m_idleMonitor = {<QObject> = {<No data fields>}, static staticMetaObject = {d = {superdata = 0x7f1a50edc8e0 <QObject::staticMetaObject>, stringdata = 0x41d800 <qt_meta_stringdata_Baloo__IdleStateMonitor>, data = 0x41d780 <qt_meta_data_Baloo__IdleStateMonitor>, static_metacall = 0x41bf10 <Baloo::IdleStateMonitor::qt_static_metacall(QObject*, QMetaObject::Call, int, void**)>, relatedMetaObjects = 0x0, extradata = 0x0}}, m_isIdle = true}, m_updatedFiles = {<QList<QString>> = {<QListSpecialMethods<QString>> = {<No data fields>}, {p = {static shared_null = {ref = {atomic = {_q_value = {<std::__atomic_base<int>> = {static _S_alignment = 4, _M_i = -1}, <No data fields>}}}, alloc = 0, begin = 0, end = 0, array = {0x0}}, d = 0xdad2ad70}, d = 0xdad2ad70}}, <No data fields>}, m_tr = 0x1099b750}
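
A note on the suspected overflow: the numbers in this backtrace are consistent with a 32-bit signed truncation. Frame #3 shows val.mv_size = 3638267344, and frame #1 shows memcpy being called with __len=18446744073052851664. The sketch below is only an illustration under one assumption: that somewhere between the two frames the size passes through a 32-bit signed int (as it would if it were handed to an API that takes an int length) and is then widened back to a 64-bit unsigned value. The helper names toSigned32 and toSizeT64 are made up for this example and are not Baloo functions.

```cpp
#include <cstdint>

// Hypothetical helper: interpret the low 32 bits of a value as a signed
// 32-bit integer, written with plain arithmetic so the result does not
// depend on implementation-defined narrowing.
constexpr std::int64_t toSigned32(std::uint64_t value) {
    return static_cast<std::int64_t>(value & 0xFFFFFFFFULL)
         - ((value & 0x80000000ULL) ? (1LL << 32) : 0LL);
}

// Hypothetical helper: widen a (possibly negative) length to a 64-bit
// unsigned value, which is what a size_t parameter would see on x86_64.
constexpr std::uint64_t toSizeT64(std::int64_t value) {
    return static_cast<std::uint64_t>(value);
}

// mv_size from frame #3 becomes negative once squeezed into 32 signed bits.
static_assert(toSigned32(3638267344ULL) == -656699952LL,
              "mv_size does not fit in a signed 32-bit int");

// Widened back to 64 unsigned bits, it equals __len from frame #1.
static_assert(toSizeT64(toSigned32(3638267344ULL)) == 18446744073052851664ULL,
              "matches the memcpy length in the backtrace");
```

Both static_asserts hold at compile time, so the mismatch between mv_size and __len is at least compatible with this kind of truncation. It does not prove where in the code the truncation happens.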
Comment 2 marvin24 2016-07-10 17:10:03 UTC
Ah, https://github.com/KDE/baloo/blob/master/src/engine/database.cpp#L97 shows that the 5 GB limit is hardcoded (the old size was 50 GB). It seems the size can be adjusted at runtime, though: http://lmdb.tech/doc/group__mdb.html#gaa2506ec8dab3d969b0e609cd82e619e5
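
For reference, a quick check of how close the two index files reported above are to that limit. This is a rough sketch that assumes "5 GB" means 5 GiB (5 * 1024^3 bytes); kAssumedMapSize and headroomBytes are names invented for the example, and the on-disk file size is only a proxy for how full the database really is.

```cpp
#include <cstdint>

// Assumption: the hardcoded "5 GB" limit is 5 GiB.
constexpr std::uint64_t kAssumedMapSize = 5ULL * 1024 * 1024 * 1024;

// Hypothetical helper: bytes left between an index file size and the
// assumed limit (0 if the file is already at or past it).
constexpr std::uint64_t headroomBytes(std::uint64_t indexFileSize) {
    return indexFileSize >= kAssumedMapSize ? 0 : kAssumedMapSize - indexFileSize;
}

static_assert(kAssumedMapSize == 5368709120ULL, "5 GiB in bytes");

// Index size from the description (5364727808 bytes): under 4 MiB left.
static_assert(headroomBytes(5364727808ULL) == 3981312ULL, "");

// Index size from comment 1 (5352673280 bytes): about 15 MiB left.
static_assert(headroomBytes(5352673280ULL) == 16035840ULL, "");
```

Under that assumption both crashes happened with the index file within a few MiB of the limit, which fits the guess in the subject line.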
Comment 3 Christoph Cullmann 2016-09-11 11:20:52 UTC
Yeah, we should just increase the limit. Baloo has no "out of space" handling of any kind, so increasing it on demand won't work ATM :/

*** This bug has been marked as a duplicate of bug 364475 ***