Bug 403643 - Okular stops rendering PDF and uses up 100% cpu (possible regression)
Summary: Okular stops rendering PDF and uses up 100% cpu (possible regression)
Status: RESOLVED DUPLICATE of bug 396137
Alias: None
Product: okular
Classification: Applications
Component: PDF backend (show other bugs)
Version: 1.9.1
Platform: unspecified Linux
: NOR normal
Target Milestone: ---
Assignee: Okular developers
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-01-26 23:58 UTC by Sergio
Modified: 2020-01-30 08:59 UTC (History)
5 users (show)

See Also:
Latest Commit:
Version Fixed In:


Attachments
gdb stacktrace (22.61 KB, text/plain)
2020-01-26 11:34 UTC, postix
Details
gdb backtrace #2 (75.86 KB, text/plain)
2020-01-26 12:52 UTC, postix
Details
GDB Backtrace (231.17 KB, text/plain)
2020-01-26 13:07 UTC, postix
Details
gdb backtrace #3 (253.22 KB, text/plain)
2020-01-26 13:36 UTC, postix
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Sergio 2019-01-26 23:58:48 UTC
SUMMARY

Seen on kubuntu cosmic, never experienced it in bionic or before.

Occasionally, okular stops rendering PDF pages. This happens both in the thumbnails and the mail pages. Looking at the document shows the initial pages looking OK, then one page where the rendering is so to say partial (looks like rendering at a very low resolution), then all the other pages in the document appear blank.

At the same time cpu usage goes up to 100%.

When this happens, exiting the application closes its window but leaves a process behind running (it must be explicitly killed).

The documents on which this happens are not broken and the issue is not easily reproduced. Killing all okular processes and reopening the same document typically results in okular now working just fine on it.


STEPS TO REPRODUCE

Cannot reproduce reliably. Issue happens occasionally by opening PDF documents
Comment 1 Michael D 2019-02-24 13:35:03 UTC
I had the same issue today and wondered why my battery life was plummeting. I'm on Manjaro testing, okular 1.6.2.
Comment 2 postix 2020-01-22 14:44:05 UTC
I can confirm this issue. I have an PDF with a lots of scanned pages from a text book. 

After scrolling and reading a while Okular stops rendering new pages and a few appear extremely blurry. At the same time the CPU goes to 100% and I can no longer close the tab of the document, which has problems.
 
However, I can close the Okular app itself, however it does not terminate actually, but the process keeps running in the background with 100% CPU.


# System settings
Operating System: Manjaro Linux 
KDE Plasma Version: 5.17.5
KDE Frameworks Version: 5.67.0
Qt Version: 5.14.0
Kernel Version: 5.4.13-3-MANJARO
OS Type: 64-bit
Processors: 4 × Intel® Core™ i7-7500U CPU @ 2.70GHz
Memory: 15,4 GiB

# Additional information
RAM settings are were either set to 2nd or highest setting.
Comment 3 Oliver Sander 2020-01-22 20:03:28 UTC
Is this the same as https://bugs.kde.org/show_bug.cgi?id=396137 ?
Comment 4 postix 2020-01-24 16:32:26 UTC
(In reply to Oliver Sander from comment #3)
> Is this the same as https://bugs.kde.org/show_bug.cgi?id=396137 ?

It could be, however I don't need to "scroll fast".

Here's a backtrace when I just happened again. I tried to exit Okular, the window  closed, but the process kept running in the background for a while.
I will try to reproduce it at another time with debug symbols attached. :)

```
[postix ~]# gdb attach `pidof okular`
GNU gdb (GDB) 8.3.1

(...)

Attaching to process 4039
[New LWP 4041]
[New LWP 4042]
[New LWP 4046]
[New LWP 4047]
[New LWP 4048]
[New LWP 4049]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
0x00007f429aece480 in g_mutex_lock () from /usr/lib/libglib-2.0.so.0
/usr/lib/../share/gcc-9.2.0/python/libstdcxx/v6/xmethods.py:731: SyntaxWarning: list indices must be integers or slices, not str; perhaps you missed a comma?
  refcounts = ['_M_refcount']['_M_pi']
(gdb) bt
#0  0x00007f429aece480 in g_mutex_lock () at /usr/lib/libglib-2.0.so.0
#1  0x00007f429af1da0c in g_main_context_prepare () at /usr/lib/libglib-2.0.so.0
#2  0x00007f429af1e046 in  () at /usr/lib/libglib-2.0.so.0
#3  0x00007f429af1e1f1 in g_main_context_iteration () at /usr/lib/libglib-2.0.so.0
#4  0x00007f429ca78d40 in QEventDispatcherGlib::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) () at /usr/lib/libQt5Core.so.5
#5  0x00007f429ca1f41c in QEventLoop::exec(QFlags<QEventLoop::ProcessEventsFlag>) () at /usr/lib/libQt5Core.so.5
#6  0x00007f42947bab7a in  () at /usr/lib/libOkular5Core.so.9
#7  0x00007f42947bbe43 in Okular::Document::closeDocument() () at /usr/lib/libOkular5Core.so.9
#8  0x00007f4294959513 in Okular::Part::closeUrl(bool) () at /usr/lib/qt/plugins/okularpart.so
#9  0x0000562c14bdd7b6 in  ()
#10 0x0000562c14bdd8c9 in  ()
#11 0x00007f429ca4df50 in QObject::event(QEvent*) () at /usr/lib/libQt5Core.so.5
#12 0x00007f429d59678b in QWidget::event(QEvent*) () at /usr/lib/libQt5Widgets.so.5
#13 0x00007f429d6b0865 in QMainWindow::event(QEvent*) () at /usr/lib/libQt5Widgets.so.5
#14 0x00007f429dec045c in KMainWindow::event(QEvent*) () at /usr/lib/libKF5XmlGui.so.5
#15 0x00007f429df0b988 in KXmlGuiWindow::event(QEvent*) () at /usr/lib/libKF5XmlGui.so.5
#16 0x00007f429d552472 in QApplicationPrivate::notify_helper(QObject*, QEvent*) () at /usr/lib/libQt5Widgets.so.5
#17 0x00007f429d55bc89 in QApplication::notify(QObject*, QEvent*) () at /usr/lib/libQt5Widgets.so.5
#18 0x00007f429ca208b2 in QCoreApplication::notifyInternal2(QObject*, QEvent*) () at /usr/lib/libQt5Core.so.5
#19 0x00007f429ca23116 in QCoreApplicationPrivate::sendPostedEvents(QObject*, int, QThreadData*) () at /usr/lib/libQt5Core.so.5
#20 0x00007f429ca796f4 in  () at /usr/lib/libQt5Core.so.5
#21 0x00007f429af1c39e in g_main_context_dispatch () at /usr/lib/libglib-2.0.so.0
#22 0x00007f429af1e1b1 in  () at /usr/lib/libglib-2.0.so.0
#23 0x00007f429af1e1f1 in g_main_context_iteration () at /usr/lib/libglib-2.0.so.0
#24 0x00007f429ca78d40 in QEventDispatcherGlib::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) () at /usr/lib/libQt5Core.so.5
#25 0x00007f429ca1f41c in QEventLoop::exec(QFlags<QEventLoop::ProcessEventsFlag>) () at /usr/lib/libQt5Core.so.5
#26 0x00007f429ca27426 in QCoreApplication::exec() () at /usr/lib/libQt5Core.so.5
#27 0x0000562c14bd379a in  ()
#28 0x00007f429c3f0153 in __libc_start_main () at /usr/lib/libc.so.6
#29 0x0000562c14bd47ce in  ()
```
Comment 5 postix 2020-01-24 16:33:59 UTC
Additional remark: The CPU goes up to 100% for a single core.
Comment 6 Albert Astals Cid 2020-01-25 00:19:41 UTC
(In reply to Postix from comment #4)
> (In reply to Oliver Sander from comment #3)
> > Is this the same as https://bugs.kde.org/show_bug.cgi?id=396137 ?
> 
> It could be, however I don't need to "scroll fast".
> 
> Here's a backtrace when I just happened again. I tried to exit Okular, the
> window  closed, but the process kept running in the background for a while.
> I will try to reproduce it at another time with debug symbols attached. :)
> 
> ```
> [postix ~]# gdb attach `pidof okular`
> GNU gdb (GDB) 8.3.1
> 
> (...)
> 
> Attaching to process 4039
> [New LWP 4041]
> [New LWP 4042]
> [New LWP 4046]
> [New LWP 4047]
> [New LWP 4048]
> [New LWP 4049]
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/usr/lib/libthread_db.so.1".
> 0x00007f429aece480 in g_mutex_lock () from /usr/lib/libglib-2.0.so.0
> /usr/lib/../share/gcc-9.2.0/python/libstdcxx/v6/xmethods.py:731:
> SyntaxWarning: list indices must be integers or slices, not str; perhaps you
> missed a comma?
>   refcounts = ['_M_refcount']['_M_pi']
> (gdb) bt
> #0  0x00007f429aece480 in g_mutex_lock () at /usr/lib/libglib-2.0.so.0
> #1  0x00007f429af1da0c in g_main_context_prepare () at
> /usr/lib/libglib-2.0.so.0
> #2  0x00007f429af1e046 in  () at /usr/lib/libglib-2.0.so.0
> #3  0x00007f429af1e1f1 in g_main_context_iteration () at
> /usr/lib/libglib-2.0.so.0
> #4  0x00007f429ca78d40 in
> QEventDispatcherGlib::processEvents(QFlags<QEventLoop::ProcessEventsFlag>)
> () at /usr/lib/libQt5Core.so.5
> #5  0x00007f429ca1f41c in
> QEventLoop::exec(QFlags<QEventLoop::ProcessEventsFlag>) () at
> /usr/lib/libQt5Core.so.5
> #6  0x00007f42947bab7a in  () at /usr/lib/libOkular5Core.so.9
> #7  0x00007f42947bbe43 in Okular::Document::closeDocument() () at
> /usr/lib/libOkular5Core.so.9
> #8  0x00007f4294959513 in Okular::Part::closeUrl(bool) () at
> /usr/lib/qt/plugins/okularpart.so
> #9  0x0000562c14bdd7b6 in  ()
> #10 0x0000562c14bdd8c9 in  ()
> #11 0x00007f429ca4df50 in QObject::event(QEvent*) () at
> /usr/lib/libQt5Core.so.5
> #12 0x00007f429d59678b in QWidget::event(QEvent*) () at
> /usr/lib/libQt5Widgets.so.5
> #13 0x00007f429d6b0865 in QMainWindow::event(QEvent*) () at
> /usr/lib/libQt5Widgets.so.5
> #14 0x00007f429dec045c in KMainWindow::event(QEvent*) () at
> /usr/lib/libKF5XmlGui.so.5
> #15 0x00007f429df0b988 in KXmlGuiWindow::event(QEvent*) () at
> /usr/lib/libKF5XmlGui.so.5
> #16 0x00007f429d552472 in QApplicationPrivate::notify_helper(QObject*,
> QEvent*) () at /usr/lib/libQt5Widgets.so.5
> #17 0x00007f429d55bc89 in QApplication::notify(QObject*, QEvent*) () at
> /usr/lib/libQt5Widgets.so.5
> #18 0x00007f429ca208b2 in QCoreApplication::notifyInternal2(QObject*,
> QEvent*) () at /usr/lib/libQt5Core.so.5
> #19 0x00007f429ca23116 in
> QCoreApplicationPrivate::sendPostedEvents(QObject*, int, QThreadData*) () at
> /usr/lib/libQt5Core.so.5
> #20 0x00007f429ca796f4 in  () at /usr/lib/libQt5Core.so.5
> #21 0x00007f429af1c39e in g_main_context_dispatch () at
> /usr/lib/libglib-2.0.so.0
> #22 0x00007f429af1e1b1 in  () at /usr/lib/libglib-2.0.so.0
> #23 0x00007f429af1e1f1 in g_main_context_iteration () at
> /usr/lib/libglib-2.0.so.0
> #24 0x00007f429ca78d40 in
> QEventDispatcherGlib::processEvents(QFlags<QEventLoop::ProcessEventsFlag>)
> () at /usr/lib/libQt5Core.so.5
> #25 0x00007f429ca1f41c in
> QEventLoop::exec(QFlags<QEventLoop::ProcessEventsFlag>) () at
> /usr/lib/libQt5Core.so.5
> #26 0x00007f429ca27426 in QCoreApplication::exec() () at
> /usr/lib/libQt5Core.so.5
> #27 0x0000562c14bd379a in  ()
> #28 0x00007f429c3f0153 in __libc_start_main () at /usr/lib/libc.so.6
> #29 0x0000562c14bd47ce in  ()
> ```

This backtrace is not useful, next time you need to trace all threads not just the main one 
"thread apply all bt" in gdb

Also ideally do that before closing

Also if you have a file in which it's relatively easy to reproduce, attach it
Comment 7 postix 2020-01-26 11:34:21 UTC
Created attachment 125427 [details]
gdb stacktrace

(In reply to Albert Astals Cid from comment #6)
> "thread apply all bt" in gdb
> Also ideally do that before closing


I tried it again, this time with a debug build of Okular and the qt-debug packages on Manjaro Unstable in a virtual machine. Also glibc-debug was installed, but it still seems to lack the debug symbols.

What did I do?

1) Started Okular with gdb
gdb okular
gdb run

2) Loaded a troublesome PDF and started scrolling until the CPU went high and no more - not yet rendered - pages were loaded
3) Interrupted the process in the terminal by ctrl+c
4) Generated the backtrace via
thread apply all bt full

which is now attached and hopefully more useful to you now. If not, please tell what I should do instead or additionally of steps 1) to 4). :)
 
> Also if you have a file in which it's relatively easy to reproduce, attach it
With 17 MB it's too big to attach it here. Maybe I can send it to you via mail?
Comment 8 Albert Astals Cid 2020-01-26 11:45:15 UTC
The backtrace is not very useful really, also it seems like you were trying to quit okular, did you do that? If so, don't do that.

Sure send me the file, but you didn't answer the important question of "it's relatively easy to reproduce"
Comment 9 postix 2020-01-26 12:04:33 UTC
(In reply to Albert Astals Cid from comment #8)
> The backtrace is not very useful really, also it seems like you were trying
> to quit okular, did you do that? If so, don't do that.

I don't quit Okular but at one step I need to hold the program (ctrl + c) in gdb to generate the debug trace? I kindly asked you how I do it correctly. What I did exactly I mentioned above.

> Sure send me the file, but you didn't answer the important question of "it's
> relatively easy to reproduce"

There was no "question" and if there was one, I answered it implicitly: Yes it's easy to reproduce on my side. ;)
Comment 10 postix 2020-01-26 12:52:47 UTC
Created attachment 125431 [details]
gdb backtrace #2

I tried it again just as you described it here: https://bugs.kde.org/show_bug.cgi?id=396137#c1

There are also a few lines with
> generatePixmap
like in the other report: https://bugs.kde.org/show_bug.cgi?id=396137#c2

Hopefully you are now more happy or do you need more bt/ctrl+c/continue sequences?

P.S. I also send you a mail.
Comment 11 postix 2020-01-26 13:07:10 UTC
Created attachment 125432 [details]
GDB Backtrace

Some more ctrl+c/thread apply bt full/continue sequences.
Comment 12 postix 2020-01-26 13:36:54 UTC
Created attachment 125435 [details]
gdb backtrace #3

Update.
Comment 13 Albert Astals Cid 2020-01-26 21:48:04 UTC

*** This bug has been marked as a duplicate of bug 396137 ***
Comment 14 postix 2020-01-26 21:55:24 UTC
(In reply to Albert Astals Cid from comment #13)
> 
> *** This bug has been marked as a duplicate of bug 396137 ***

Albert, have my efforts at least helped you a little bit besides probably identifying this bug as a duplicate? I just hope I did not invest so much time for nothing. :-/
Comment 15 Albert Astals Cid 2020-01-26 22:06:17 UTC
(In reply to Postix from comment #14)
> (In reply to Albert Astals Cid from comment #13)
> > 
> > *** This bug has been marked as a duplicate of bug 396137 ***
> 
> Albert, have my efforts at least helped you a little bit besides probably
> identifying this bug as a duplicate? I just hope I did not invest so much
> time for nothing. :-/

Sadly not really, have not been able to reproduce it myself nor I have find any good clue on the backtraces :/
Comment 16 Albert Astals Cid 2020-01-26 22:12:09 UTC
(In reply to Albert Astals Cid from comment #15)
> (In reply to Postix from comment #14)
> > (In reply to Albert Astals Cid from comment #13)
> > > 
> > > *** This bug has been marked as a duplicate of bug 396137 ***
> > 
> > Albert, have my efforts at least helped you a little bit besides probably
> > identifying this bug as a duplicate? I just hope I did not invest so much
> > time for nothing. :-/
> 
> Sadly not really, have not been able to reproduce it myself nor I have find
> any good clue on the backtraces :/

One thing, since you seem to be able to reproduce it relatively easily, could you try okular from flathub and try reproducing it using that okular instead of your distribution one? https://flathub.org/apps/details/org.kde.okular
Comment 17 Tobias Deiminger 2020-01-26 23:48:55 UTC
(In reply to Postix from comment #10)
> There are also a few lines with
> > generatePixmap

Qt has a shortcut for QTimer::singleShot(msec = 0, ...), where they call QMetaObject::invokeMethodImpl to trigger your lambda immediately via a QueuedConnection plus one iteration through the event loop.

So a callstack like yours
#n+1 QMetaObject::invokeMethodImpl
#n   Okular::Generator::generatePixmap
...
#n-m QCoreApplicationPrivate::sendPostedEvents
could indeed stem from a "text generation already finished but mTextPageReady still false" endless loop, assuming the compiler has inlined some calls for optimization. Note the "could", I see no definite proof yet.

I generally agree with Albert that this issue is particularly hard to debug without being able to reproduce it locally.
Comment 18 Albert Astals Cid 2020-01-27 22:54:19 UTC
(In reply to Tobias Deiminger from comment #17)
> (In reply to Postix from comment #10)
> > There are also a few lines with
> > > generatePixmap
> 
> Qt has a shortcut for QTimer::singleShot(msec = 0, ...), where they call
> QMetaObject::invokeMethodImpl to trigger your lambda immediately via a
> QueuedConnection plus one iteration through the event loop.
> 
> So a callstack like yours
> #n+1 QMetaObject::invokeMethodImpl
> #n   Okular::Generator::generatePixmap
> ...
> #n-m QCoreApplicationPrivate::sendPostedEvents
> could indeed stem from a "text generation already finished but
> mTextPageReady still false" endless loop, assuming the compiler has inlined
> some calls for optimization. Note the "could", I see no definite proof yet.
> 
> I generally agree with Albert that this issue is particularly hard to debug
> without being able to reproduce it locally.

The timer is not the problem, that timer gets triggered quite easily and doesn't infinite loop.

The problem is that sometimes it seems to infinite loop, so yeah, no idea :D
Comment 19 Tobias Deiminger 2020-01-28 07:56:47 UTC
(In reply to Albert Astals Cid from comment #18)
> The timer is not the problem, that timer gets triggered quite easily and
> doesn't infinite loop.
> 
> The problem is that sometimes it seems to infinite loop, so yeah, no idea :D

Just thought about scenarios how such an inifte loop could occur. All can be ruled out by attached backtraces:
- isFinished never signaled -  can be ruled out: If TextPageGenerationThread ran forever, TextPageGenerationThread::run would be on stack, but it's not. If the thread exited non-gracefully (i.e. abort()/SIGABRT), the whole program would have crashed, but it did not.
- textpageGenerationFinished() never called - can be ruled out: It's connected to finished(), then same reasons as above apply.
- textpageGenerationFinished() called, but QMutexLocker blocking forever - can be ruled out: textpageGenerationFinished is called in main thread via QueuedConnection, so the lock() would be visible on Thread 1 stack, and more, the mutex would also block the single shot timer, which is obviously not the case.

After that I see no possibility for an infinite loop. How about generatePixmap simply being called very frequently in regular ways, then hitting the singleShot sometimes (but leaving it again), misleading us?

I'll try to reproduce in kubuntu 19.10. Other ways of debugging could be to collect more data by means of a remotely executed gdb script or even a remote gdb session. But both have severe privacy/security implications, so I'm not sure if we could even suggest this to bug reporters.
Comment 20 Oliver Sander 2020-01-28 08:00:56 UTC
> Other ways of debugging could be to collect more data by means of a remotely executed gdb script or even a remote gdb session. But both have severe privacy/security implications, so I'm not sure if we could even suggest this to bug reporters.

Tobias, you can suggest that to me.  I cannot reproduce the issue reliably, but it does happen regularly.
Comment 21 Tobias Deiminger 2020-01-28 18:47:25 UTC
(In reply to Oliver Sander from comment #20)
> Tobias, you can suggest that to me.

Ok, suggested herewith :)
Comment 22 Tobias Deiminger 2020-01-29 00:01:23 UTC
I can now reproduce the issue (CPU 30%, pages no more rendered, but okular symbol shown) in Ubuntu 19.10. But: For me it happens only if strace is attached to okular while scrolling through a PDF (the one from bug 39613). The issue then remains after detaching strace, it will never recover, and can be observed by attaching gdb.

I see that QEventLoop is no more going to sleep, because the underlying poll() system call is always called with a timeout of 0, and that generatePixmap is called at high frequency while TextPageGenerationThread::isFinished == 1. Can't go on for today, but seems like a good position to continue debugging upcoming days.

Albert, could you try if the "strace trick" helps to reproduce the issue at your side?
Comment 23 Tobias Deiminger 2020-01-29 22:59:17 UTC
Merge request at https://invent.kde.org/kde/okular/merge_requests/98 describes and fixes the issue I was able to reproduce locally. Hopefully it's the same as root cause as our bug reporters observed.
Comment 24 Tobias Deiminger 2020-01-30 08:59:29 UTC
Git commit 593803b0a1d98eba64aac38316aa521130b4ae78 by Tobias Deiminger.
Committed on 30/01/2020 at 08:15.
Pushed by aacid into branch 'release/19.12'.

Fix render stop and high load due to timing issue

Text generation is connected to pixmap generation thread started signal.
However the signal may have been emitted faster than the connect could took place,
and because started is fired only once, the connected lambda never got executed.

generatePixmap tried to sync up with that never happening text generation anyway
by means of scheduling itself. This lead to a infinite loop via a no more sleeping QEventLoop.

Fixed by moving the connect in front of starting the thread.
Related: bug 396137, bug 396087

M  +2    -2    core/generator.cpp

https://invent.kde.org/kde/okular/commit/593803b0a1d98eba64aac38316aa521130b4ae78