Summary: | kio_mbox produces 100% cpu load | ||
---|---|---|---|
Product: | [Unmaintained] korn | Reporter: | Thomas Eschenbacher <Thomas.Eschenbacher> |
Component: | general | Assignee: | Mart Kelder <mart> |
Status: | RESOLVED FIXED | ||
Severity: | normal | ||
Priority: | NOR | ||
Version: | 0.3 | ||
Target Milestone: | --- | ||
Platform: | Gentoo Packages | ||
OS: | Linux | ||
Latest Commit: | Version Fixed In: | ||
Sentry Crash Report: | |||
Attachments: | strace of kio_mbox parsing a large mbox |
Description
Thomas Eschenbacher
2005-12-23 23:24:25 UTC
Thanks for your report. Can you reproduce this bug? Does it always happen, or if (and only if) you have at least one mail in that mailbox (or something else that could trigger this bug)? Does this happen with the previous versie (in KDE 3.4.x) of KOrn? Yes, I now found a way to reproduce it. I just send a mail with my current kernel binary as attachment to myself, at least two times. Then kio_mbox produces 100% cpu load for a long time (~20sec). After this I can identify the process id with "top" and did a strace -f -p <pid>. It shows up that it does loop with calls to fstat64, like this: ---------------------------- fstat64(7, {st_mode=S_IFREG|0600, st_size=4016911, ...}) = 0 [repeats over 200.000 times !] ---------------------------- Then, after some time I see the following sequence: -------------------- fstat64(7, {st_mode=S_IFREG|0600, st_size=4016911, ...}) = 0 fstat64(7, {st_mode=S_IFREG|0600, st_size=4016911, ...}) = 0 write(3, " 436_6a_\0\0\0\3\0\0\0\6\0\0\10\2\0\0\200\0\0\0@\1\0"..., 1024) = 1024 write(3, "\0S\0a\0t\0 \0D\0e\0c\0 \0002\0004\0 \0001\0000\0:\000"..., 64) = 64 gettimeofday({1135416104, 679553}, NULL) = 0 write(3, " 0_68_", 10) = 10 close(7) = 0 munmap(0xb6389000, 131072) = 0 select(4, [3], NULL, NULL, NULL -------------------- ...and after this it stays calm. I check for e-mails every 30 seconds, so If I let the mbox grow more and the process mentioned above does not finish within those 30 seconds it produces a permanent 100%-cpu-load. AFAIR this came since the upgrade to KDE-3.4.2, but I'm not sure. (As a workaround I currently check by POP3, but this spills my syslog with messages from pop3d) Thanks for your reply. So, if I understand correctly, it is a performance issue and not a infinitive loop. I can understand that it is slow if the mbox file is very large: it have to search for a new email, and therefor, it reads every line. By reading that line, if also calles fstat to see if it is at the end of the file. It seems possible to use another function to leave out fstat's, but it is a little bit tricky (it can easily cause new bugs), so I have to be carefull. I also don't know what the performance revenu is, but it should not be much slower (most likely, it is faster). It should only cause a permanent CPU load if the processing time is a multiply of 30 seconds (thus 30 secs, 60 secs, 90 secs, 120 secs, etc.), because if shouldn't check again if the previous check isn't finished (of course, there can be a bug in this system). Created attachment 14034 [details]
strace of kio_mbox parsing a large mbox
(bzipped, the original is about 6.5 MB)
Well, I understand that parsing line by line is quite slow, but not this much!? I made another test run, which took ~20sec per pass. If I cat the mbox file to a xterm it's about the same time, whereas "dd if=mbox of=/dev/null" takes less than one second. IMO somthing between would be acceptable. This time I logged a complete run with strace, from one select() to the next. You can see that there is a read() from time to time, but between are thousands of fstats... (I will attach you the logfile). Thanks for your reply and for your file. Your mbox-file is 4016911 bytes large. It is reasanable to assume that there are 80 characters a line, thus there are 50211 lines. According to the file you send, there are 110192 fstat's, which means 2 fstat's per line. These fstat's come from the Qt-library (at least one is coming from the atEnd() function, maybe both). It is possible to rewrite the atEnd() calls to use another function (with possible side-effects, but that doesn't mean it can't be done). I have made a fix which works around the atEnd() functions, but I need to test it well before I commit. I also need to see that it also resolves both fstat calls. SVN commit 491742 by mkelder: Remove atEnd() calls and use the result of readLine to find out if the file is at EOF. This should speed up reading files with a large number of lines. CCBUG: 118926 M +7 -6 readmbox.cc M +1 -0 readmbox.h --- trunk/KDE/kdepim/kioslave/mbox/readmbox.cc #491741:491742 @@ -44,6 +44,7 @@ m_stream( 0 ), m_current_line( new QString( QString::null ) ), m_current_id( new QString( QString::null ) ), + m_atend( true ), m_prev_time( 0 ), m_only_new( onlynew ), m_savetime( savetime ), @@ -84,16 +85,15 @@ if( !m_stream ) return true; - if( m_stream->atEnd() ) + *m_current_line = m_stream->readLine(); + m_atend = m_current_line->isNull(); + if( m_atend ) // Cursor was at EOF { - *m_current_line = QString::null; *m_current_id = QString::null; m_prev_status = m_status; return true; } - *m_current_line = m_stream->readLine(); - //New message if( m_current_line->left( 5 ) == "From " ) { @@ -122,7 +122,7 @@ if( !m_stream ) return false; - while( !m_stream->atEnd() && *m_current_id != id ) + while( !m_atend && *m_current_id != id ) nextLine(); return *m_current_id == id; @@ -145,6 +145,7 @@ { if( m_stream ) m_stream->device()->reset(); + m_atend = m_stream->atEnd(); } bool ReadMBox::atEnd() const @@ -152,7 +153,7 @@ if( !m_stream ) return true; - return m_stream->atEnd() || ( m_info->type() == UrlInfo::message && *m_current_id != m_info->id() ); + return m_atend || ( m_info->type() == UrlInfo::message && *m_current_id != m_info->id() ); } bool ReadMBox::inListing() const --- trunk/KDE/kdepim/kioslave/mbox/readmbox.h #491741:491742 @@ -123,6 +123,7 @@ QTextStream* m_stream; QString* m_current_line; QString* m_current_id; + bool m_atend; struct utimbuf* m_prev_time; SVN commit 491743 by mkelder: Backport. Remove atEnd() calls and use the result of readLine to find out if the file is at EOF. This should speed up reading files with a large number of lines. BUG: 118926 M +7 -6 readmbox.cc M +1 -0 readmbox.h --- branches/KDE/3.5/kdepim/kioslaves/mbox/readmbox.cc #491742:491743 @@ -43,6 +43,7 @@ m_stream( 0 ), m_current_line( new QString( QString::null ) ), m_current_id( new QString( QString::null ) ), + m_atend( true ), m_prev_time( 0 ), m_only_new( onlynew ), m_savetime( savetime ), @@ -83,16 +84,15 @@ if( !m_stream ) return true; - if( m_stream->atEnd() ) + *m_current_line = m_stream->readLine(); + m_atend = m_current_line->isNull(); + if( m_atend ) // Cursor was at EOF { - *m_current_line = QString::null; *m_current_id = QString::null; m_prev_status = m_status; return true; } - *m_current_line = m_stream->readLine(); - //New message if( m_current_line->left( 5 ) == "From " ) { @@ -121,7 +121,7 @@ if( !m_stream ) return false; - while( !m_stream->atEnd() && *m_current_id != id ) + while( !m_atend && *m_current_id != id ) nextLine(); return *m_current_id == id; @@ -144,6 +144,7 @@ { if( m_stream ) m_stream->device()->reset(); + m_atend = m_stream->atEnd(); } bool ReadMBox::atEnd() const @@ -151,7 +152,7 @@ if( !m_stream ) return true; - return m_stream->atEnd() || ( m_info->type() == UrlInfo::message && *m_current_id != m_info->id() ); + return m_atend || ( m_info->type() == UrlInfo::message && *m_current_id != m_info->id() ); } bool ReadMBox::inListing() const --- branches/KDE/3.5/kdepim/kioslaves/mbox/readmbox.h #491742:491743 @@ -121,6 +121,7 @@ QTextStream* m_stream; QString* m_current_line; QString* m_current_id; + bool m_atend; struct utimbuf* m_prev_time; Thanks for your very quick response! With my test mbox file it now takes 8sec instead of 20sec, a big speedup :-) However, I guess that users who have the choice would be much happier by switching from mbox to maildir if possible, the handling of maildir should be much more efficient than mbox handling could ever be... I filed a report on http://bugs.gentoo.org/show_bug.cgi?id=116870 with a modified ebuild package that includes your patch. Thanks for testing. It is a nice speedup for such a fix. Of course, you only need a bigger file to get the same problem again. Maildir and mbox both have adventages and disavantages. Maildir can be inefficient (harddist usage) if you have a lot of small emails and a filesystem that uses a couple of kilobytes as blocksize. |