Bug 257931

Summary: Strigi (nepomuk indexing) crashes with "is not a UTF8 or latin1 string" error
Product: [Unmaintained] nepomuk
Component: general
Status: RESOLVED FIXED
Severity: crash
Priority: NOR
Version: 4.5
Target Milestone: ---
Platform: Fedora RPMs
OS: Linux
Reporter: Syam <get.sonic>
Assignee: Sebastian Trueg <sebastian>
CC: ammonid, barbara.kronsfoth, david-ac94, roland.scheer, rooksy, trueg
Latest Commit:
Version Fixed In:
Sentry Crash Report:

Description Syam 2010-11-26 02:51:38 UTC
Version:           unspecified (using KDE 4.5.3) 
OS:                Linux

I started using Nepomuk/Strigi recently and enabled indexing. After a while, the indexing icon in the systray vanishes, and System Settings shows that Strigi is disabled. If I enable it again, it runs for a couple of seconds (the icon reappears in the systray) and then vanishes again.

Instead of enabling it in System Settings, I started Strigi from a terminal by running:
nepomukservicestub nepomukstrigiservice

The same behavior repeated, but when Strigi crashed, I got the following in the terminal:

==============================

'le1 le2' is not a UTF8 or latin1 string
'le1' is not a UTF8 or latin1 string
'le2' is not a UTF8 or latin1 string
' Edit le1' is not a UTF8 or latin1 string
'le1 le2' is not a UTF8 or latin1 string
'le1' is not a UTF8 or latin1 string
'le2' is not a UTF8 or latin1 string
' Add le1 to' is not a UTF8 or latin1 string
'le1 le2' is not a UTF8 or latin1 string
'le1' is not a UTF8 or latin1 string
'le2' is not a UTF8 or latin1 string
'le1 le2' is not a UTF8 or latin1 string
'le1' is not a UTF8 or latin1 string
'le2' is not a UTF8 or latin1 string
'git di' is not a UTF8 or latin1 string
'git di HEAD' is not a UTF8 or latin1 string
'git di --cached' is not a UTF8 or latin1 string
'le1 le2' is not a UTF8 or latin1 string
'le1' is not a UTF8 or latin1 string
'le2' is not a UTF8 or latin1 string
'Di' is not a UTF8 or latin1 string
'di between' is not a UTF8 or latin1 string
'di between' is not a UTF8 or latin1 string
Analyzer AuThroughAnalyzer has left the stream in a bad state.
Analyzer Audible has left the stream in a bad state.
Analyzer DdsThroughAnalyzer has left the stream in a bad state.
Analyzer FontThroughAnalyzer has left the stream in a bad state.
Analyzer GifThroughAnalyzer has left the stream in a bad state.
Analyzer IcoThroughAnalyzer has left the stream in a bad state.
Analyzer Mp4 has left the stream in a bad state.
Analyzer PcxThroughAnalyzer has left the stream in a bad state.
Analyzer RgbThroughAnalyzer has left the stream in a bad state.
Analyzer SidThroughAnalyzer has left the stream in a bad state.
Analyzer XbmThroughAnalyzer has left the stream in a bad state.
Analyzer OggThroughAnalyzer has left the stream in a bad state.
nepomukservicestub: /builddir/build/BUILD/strigi-0.7.2/src/streams/dataeventinputstream.cpp:30: Strigi::DataEventInputStream::DataEventInputStream(Strigi::InputStream*, Strigi::DataEventHandler&): Assertion `input->position() == 0' failed.
KCrash: Application 'nepomukservicestub' crashing...
KCrash: Attempting to start /usr/libexec/kde4/drkonqi from kdeinit
sock_file=/home/syamcr/.kde/socket-square/kdeinit4__0
QSocketNotifier: Invalid socket 8 and type 'Read', disabling...
QSocketNotifier: Invalid socket 11 and type 'Read', disabling...
QSocketNotifier: Invalid socket 13 and type 'Read', disabling...
QSocketNotifier: Invalid socket 22 and type 'Read', disabling...
nepomukservicestub: Fatal IO error: client killed
kDebugStream called after destruction (from virtual Strigi::NepomukIndexManager::~NepomukIndexManager() file /builddir/build/BUILD/kdebase-runtime-4.5.3/nepomuk/strigibackend/nepomukindexmanager.cpp line 78)

kDebugStream called after destruction (from virtual Strigi::NepomukIndexWriter::~NepomukIndexWriter() file /builddir/build/BUILD/kdebase-runtime-4.5.3/nepomuk/strigibackend/nepomukindexwriter.cpp line 332) 

Reproducible: Always

Steps to Reproduce:
Start Strigi again (from System Settings or by running nepomukservicestub nepomukstrigiservice in a terminal).
Comment 1 Syam 2010-11-26 02:53:04 UTC
This is similar to bug #238138, which also reports an "is not a UTF8 or latin1 string" error, but I don't see the "Error in parsing: Keyword obj not found." message here.
I have no idea which file(s) are causing this problem.
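
For readers unfamiliar with the message, here is a minimal sketch of the kind of check that rejects a byte string as invalid UTF-8. It is written in Python rather than Strigi's C++, the function name and sample bytes are hypothetical, and it is not Strigi's actual implementation.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Sample byte strings invented for illustration; they are not taken from the log.
samples = [
    b"caf\xc3\xa9",  # "e acute" encoded as UTF-8
    b"caf\xe9",      # the same character as a single Latin-1 byte
    b"\xff",         # a byte that never appears in valid UTF-8
]

for raw in samples:
    print(repr(raw), "valid UTF-8" if is_valid_utf8(raw) else "not valid UTF-8")
```

A check like this only says that some extracted text was malformed; it does not identify which file the text came from.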
Comment 2 Sebastian Trueg 2011-11-04 19:57:18 UTC
The indexer was moved into its own process in KDE 4.7, so buggy Strigi analyzer plugins can no longer crash the service. From a Nepomuk point of view, this is thus fixed.
As for the Strigi crash: a lot of stability improvements have been made. If the crash persists with Strigi 0.7.6, please open a bug at strigi.sf.net.
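
The general idea of isolating the indexer in its own process can be sketched as follows. This is a Python illustration of the pattern only, not the KDE 4.7 implementation: the function name is hypothetical, and the "indexing work" is a stand-in snippet that either succeeds or aborts.

```python
import subprocess
import sys


def index_in_child(snippet: str) -> bool:
    """Run a stand-in indexing step in a separate process (hypothetical helper).

    Returns True if the child exited cleanly, False if it crashed or failed.
    The parent keeps running either way.
    """
    result = subprocess.run(
        [sys.executable, "-c", snippet],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


# A well-behaved child and one that aborts, standing in for a buggy analyzer plugin.
ok = index_in_child("pass")
crashed = index_in_child("import os; os.abort()")

print("clean run succeeded:", ok)
print("aborting run succeeded:", crashed)
print("parent process is still alive")
```

Because the abort happens in the child, the parent only sees a non-zero exit status and can decide to skip the offending file or restart the worker.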
Comment 3 Sebastian Trueg 2011-11-04 19:58:30 UTC
*** Bug 261083 has been marked as a duplicate of this bug. ***
Comment 4 Sebastian Trueg 2011-11-04 20:05:39 UTC
*** Bug 271663 has been marked as a duplicate of this bug. ***
Comment 5 Sebastian Trueg 2011-11-04 20:34:07 UTC
*** Bug 283750 has been marked as a duplicate of this bug. ***
Comment 6 Thijs 2012-01-20 18:44:19 UTC
*** Bug 292055 has been marked as a duplicate of this bug. ***
Comment 7 Anne-Marie Mahfouf 2012-04-05 08:29:02 UTC
*** Bug 297512 has been marked as a duplicate of this bug. ***