Bug 368907 - kaffeine sometimes uses 100% CPU
Summary: kaffeine sometimes uses 100% CPU
Status: RESOLVED FIXED
Alias: None
Product: kaffeine
Classification: Applications
Component: general
Version: unspecified
Platform: Mageia RPMs Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Mauro Carvalho Chehab
 
Reported: 2016-09-16 16:58 UTC by José JORGE
Modified: 2017-07-09 14:05 UTC
CC List: 9 users



Attachments
perf report (6.04 KB, text/plain), 2017-07-05 21:54 UTC, Jan Palus
callgrind output (1.02 MB, application/gzip), 2017-07-05 23:33 UTC, Jan Palus
Don't ignore errors when writing to libVLC pipe (637 bytes, patch), 2017-07-08 12:16 UTC, Mauro Carvalho Chehab
Wait 100ms before retrying after an error (1.83 KB, patch), 2017-07-08 12:59 UTC, Mauro Carvalho Chehab
Better handle errors when writing to pipe (1.61 KB, patch), 2017-07-08 13:43 UTC, Mauro Carvalho Chehab
Better handle errors when writing to pipe - v2 (2.97 KB, patch), 2017-07-08 14:06 UTC, Mauro Carvalho Chehab
Disable notifier if no data (1.25 KB, patch), 2017-07-08 14:58 UTC, Mauro Carvalho Chehab
Disable notifier if no data - v2 (1.73 KB, patch), 2017-07-08 15:21 UTC, Mauro Carvalho Chehab
Demo app to show infinite signal emission (914 bytes, application/zip), 2017-07-08 17:55 UTC, coekbe
A patch using a read notifier (689 bytes, patch), 2017-07-08 17:57 UTC, coekbe

Description José JORGE 2016-09-16 16:58:10 UTC
I use Kaffeine for DVB-T, and it uses VDPAU nicely: CPU usage stays below 20%, like other players with VDPAU. But sometimes it starts using 100% CPU, and it then stays at that level even when channels are stopped.

This sometimes happens right when I start playing the first channel, or when changing channels. It never starts while a channel is simply playing.

I tried to run with the debug log (-d), but I saw no difference between sessions with and without the bug.

Reproducible: Sometimes

Steps to Reproduce:
1. Play DVB-T
2. This is on the French TNT system


Actual Results:  
100% CPU

Expected Results:  
At least the CPU should not stay busy after I stop playback.
Comment 1 Oscar Fuentes 2016-10-15 14:37:44 UTC
I'm experiencing this too after upgrading from Kubuntu 16.04 to 16.10.
Comment 2 knossos456 2016-11-19 16:12:00 UTC
Do you have the same problem with VLC (the newest Kaffeine is based on the VLC core)?
For me, with an ATI 5450 on Ubuntu 16.04, everything is OK (VLC and Kaffeine) with VDPAU: under 3% CPU usage.
Please post your graphics card model.
Comment 3 José JORGE 2016-11-19 16:25:37 UTC
(In reply to knossos456 from comment #2)
> Have you same problem with VLC (because newest Kaffeine is based on VLC
> core) ?
> For me and ATI 5450, all is OK (VLC and Kaffeine) for VDPAU and ubuntu 16.04
> ( under 3% cpu usage
> Please post your graphical card model

It is an ATI HD3200 (an R600 GPU integrated on the motherboard). I have no problems when using VLC directly.
Comment 4 Oscar Fuentes 2016-11-19 19:26:23 UTC
Mine is an ATI HD6450 (also R600) using the non-proprietary radeon driver.

No problems with VLC.
Comment 5 knossos456 2016-11-20 09:28:24 UTC
And which version of Kaffeine for each of you (exact 2.0.x version)?

Do you have the same problem with Kaffeine 1.2.2, which is based on Xine?
Comment 6 Oscar Fuentes 2016-11-20 14:08:26 UTC
Kubuntu 16.10 comes with Kaffeine 2.0.4.

As far as I recall, I never experienced this problem with Kaffeine 1.x in the eight years I have been using it.

(Currently I'm using Kaffeine 1.2.2 again due to this problem and another one with scheduled recordings.)
Comment 7 knossos456 2016-11-20 16:44:23 UTC
I'm on Ubuntu 16.04 LTS.
I rolled back to Kaffeine 1.2.2 too, because Kaffeine 2.0.x has crash problems with the virtual DVB adapter (dvbloopback), which is needed for descrambling.

If you don't use virtual adapters, please note also that Kaffeine 2.0.x uses VLC as its DVB core, and perhaps your VLC version is not suitable; version > 2.2.2 is required.
This is explained on the VLC forum: VDPAU did not work in old VLC versions.
Comment 8 José JORGE 2016-11-21 22:32:01 UTC
(In reply to knossos456 from comment #7)
> If you dont use virtual adapters, please notice also that kaffeine 2.0x use
> vlc as dvb core, and perhaps your version of Vlc is not correct, version >
> 2.22 is requested.
> This is explained on VLC forum, VDPAU did not work on old VLC versions.

I use VLC 2.2.4. It is important to understand that the 100% CPU does not stop if I stop the player; I have to close it. So it seems to be a Kaffeine interface bug, not a VLC one.
Comment 9 knossos456 2016-11-22 18:07:31 UTC
OK, understood.
Does VLC have the same problem with a recorded HD stream?
Did you install Ubuntu from scratch or upgrade a previous version?
What does the command vdpauinfo report?


On my side, I had to install Ubuntu 16.04 LTS from scratch to get VDPAU working; upgrading did not work and I had 50% CPU usage.

It is not normal for a 6000-series ATI card to sit at 20% CPU usage...
You should get better than me, around 2% usage.
Comment 10 Mauro Carvalho Chehab 2016-12-02 08:36:10 UTC
(In reply to José JORGE from comment #8)
> (In reply to knossos456 from comment #7)
> > If you dont use virtual adapters, please notice also that kaffeine 2.0x use
> > vlc as dvb core, and perhaps your version of Vlc is not correct, version >
> > 2.22 is requested.
> > This is explained on VLC forum, VDPAU did not work on old VLC versions.
> 
> I use vlc 2.2.4. It is important to understand that the 100% CPU does not
> stop if I stop the player. I have to close it. So it seems a Kaffeine
> interface bug, not a vlc one.

It could still be a libVlc issue, as Kaffeine starts using it when initializing, and only stops when Kaffeine is closed.
Comment 11 Mauro Carvalho Chehab 2016-12-02 08:45:17 UTC
(In reply to Mauro Carvalho Chehab from comment #10)
> (In reply to José JORGE from comment #8)
> > (In reply to knossos456 from comment #7)
> > I use vlc 2.2.4. It is important to understand that the 100% CPU does not
> > stop if I stop the player. I have to close it. So it seems a Kaffeine
> > interface bug, not a vlc one.
> 
> It could still be a libVlc issue, as Kaffeine starts using it when
> initializing, and only stops when Kaffeine is closed.

By the way, one way to check whether this is due to VDPAU is to configure Kaffeine not to use it.
You can change the output plugin via these menu options:

"Settings" -->  "Configure Kaffeine" -->  "libVLC". Any VLC command line option is accepted there. You could try one of these:
	--no-video-title-show -V xcb_glx
or:

	--no-video-title-show -V xcb_xv

and re-start Kaffeine.

Please test whether changing to one of the above affects the CPU usage.
Comment 12 José JORGE 2016-12-02 22:15:57 UTC
(In reply to Mauro Carvalho Chehab from comment #11)
> option is accepted there. You could try those:
> 	--no-video-title-show -V xcb_glx
> or:
> 	--no-video-title-show -V xcb_xv
> 
> and re-start Kaffeine.
> 
> Please test if changing to one of the above will affect the CPU usage.

None of those affects the CPU usage. The bug still triggers after watching one channel, either by clicking stop or by clicking on another channel.

Is there a way to debug Kaffeine's use of libvlc?
Comment 13 knossos456 2016-12-18 16:19:25 UTC
My DiSEqC problem is solved; I've changed the switch.

I now have problems with some DVB-S2 8PSK transponders.
DVB-S2 QPSK seems to be OK in all cases.
For example, I can't scan:
http://fr.kingofsat.net/tp.php?tp=259
My parameters for this transponder in scanfile.dvb are:
S2 11681000 H 27500000 3/4 25 8PSK
What is wrong?
Comment 14 knossos456 2016-12-18 16:21:57 UTC
Oops, posted in the wrong section.
Comment 15 johan.claudebreuninger 2016-12-19 22:11:17 UTC
I have the same problem on Kaffeine 2.0.5

The first time I start the app, the CPU load is normal: 1-2%. After switching channels (HD or SD), Kaffeine uses a complete CPU core.
Comment 16 Mauro Carvalho Chehab 2016-12-27 18:46:08 UTC
a "100% CPU" usage, doesn't really mean that the CPU is 100%. The way the
Kernel generates this stat is by identifying how many times the Kernel task
scheduler is idle, e. g. if there's no pending task to be scheduled, it increments a counter. If such counter is not incremented, it shows 100% usage.

So, a 100% CPU only means that, every time the scheduler is called, there's
a task waiting to be awake. The way Kaffeine works is that there are multiple synchronous tasks running:
- tasks that receive DVB audio, video and EPG streams;
- tasks to parse MPEG data, decompress it and display the video via libVlc.

If any of this task happens as often as the scheduler is called, it will
show 100% CPU usage. It is hard to discover what task is taking more time without profiling it. One could run a profiler to check what task is
consuming more time, in order to identify the culprit.
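For illustration only, here is a minimal standalone sketch (not Kaffeine code, plain Linux C++) of how such a percentage is derived from the kernel's time accounting in /proc/stat: the "busy" figure is just the share of time that was not accounted as idle between two samples.

#include <chrono>
#include <fstream>
#include <iostream>
#include <string>
#include <thread>
#include <vector>

// Read the aggregate "cpu" line of /proc/stat and return the idle and
// total jiffie counts accumulated since boot.
static void readCpuTimes(unsigned long long &idle, unsigned long long &total)
{
	std::ifstream stat("/proc/stat");
	std::string label;
	stat >> label;                        // first token: "cpu"
	std::vector<unsigned long long> fields;
	unsigned long long value;
	while (stat >> value)                 // stops at the next "cpuN" label
		fields.push_back(value);
	total = 0;
	for (unsigned long long f : fields)
		total += f;
	// fields[3] is idle, fields[4] is iowait
	idle = (fields.size() > 4) ? fields[3] + fields[4] : 0;
}

int main()
{
	unsigned long long idle1, total1, idle2, total2;
	readCpuTimes(idle1, total1);
	std::this_thread::sleep_for(std::chrono::seconds(1));
	readCpuTimes(idle2, total2);

	double busy = 100.0 * (1.0 - double(idle2 - idle1) / double(total2 - total1));
	std::cout << "CPU busy over the last second: " << busy << "%\n";
	return 0;
}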

Maybe this could be a side effect of EPG parsing, as reported in this bug:
   https://bugs.kde.org/show_bug.cgi?id=371579
Comment 17 José JORGE 2016-12-28 07:42:10 UTC
(In reply to Mauro Carvalho Chehab from comment #16)
> If any of this task happens as often as the scheduler is called, it will
> show 100% CPU usage. It is hard to discover what task is taking more time
> without profiling it. One could run a profiler to check what task is
> consuming more time, in order to identify the culprit.

Any hint on how to profile? Thanks.

> 
> Maybe this could be a reflex of EPG parsing, as reported on this bug:
>    https://bugs.kde.org/show_bug.cgi?id=371579

I don't think so, as the 100% CPU sometimes stops just by clicking stop and then play again.
Comment 18 Mauro Carvalho Chehab 2017-02-16 11:31:15 UTC
(In reply to José JORGE from comment #17)
> (In reply to Mauro Carvalho Chehab from comment #16)
> > If any of this task happens as often as the scheduler is called, it will
> > show 100% CPU usage. It is hard to discover what task is taking more time
> > without profiling it. One could run a profiler to check what task is
> > consuming more time, in order to identify the culprit.
> 
> Any hint on how to profile? Thanks.
> 
> > 
> > Maybe this could be a reflex of EPG parsing, as reported on this bug:
> >    https://bugs.kde.org/show_bug.cgi?id=371579
> 
> I don't think so, as the 100% CPU sometimes stops only by clicking on stop,
> then play again.

Bug #371579 was fixed upstream. Could you please test if this solved the issue for you?
Comment 19 José JORGE 2017-02-16 14:35:44 UTC
(In reply to Mauro Carvalho Chehab from comment #18)
> Bug #371579 was fixed upstream. Could you please test if this solved the
> issue for you?

Thanks, but I suppose I have to check out from git? Or can I try a tarball?
Comment 20 Mauro Carvalho Chehab 2017-02-16 17:14:46 UTC
(In reply to José JORGE from comment #19)
> (In reply to Mauro Carvalho Chehab from comment #18)
> > Bug #371579 was fixed upstream. Could you please test if this solved the
> > issue for you?
> 
> Thanks, but I suppose I have to checkout form git? Or can I try a tarball?

You'll need to check out from git, as I haven't created a tarball yet. I'm actually waiting a couple of days to receive more feedback before releasing Kaffeine 2.0.6 with this (and other) bug fixes.
Comment 21 José JORGE 2017-02-19 21:36:13 UTC
(In reply to Mauro Carvalho Chehab from comment #18)
> Bug #371579 was fixed upstream. Could you please test if this solved the
> issue for you?

The issue is not solved as of today's git. I did once get Kaffeine to use only ~25% CPU after lots of play/stop on DVB, but 99% of the time the CPU is at 125% (dual-core system) when playing, and at 100% when playback is stopped.

Any hint on how to dump per-task load is welcome...
Comment 22 Mauro Carvalho Chehab 2017-03-23 10:07:00 UTC
(In reply to José JORGE from comment #21)
> (In reply to Mauro Carvalho Chehab from comment #18)
> > Bug #371579 was fixed upstream. Could you please test if this solved the
> > issue for you?
> 
> Issue not solved as of today's GIT. I could get one time kaffeine to only
> use ~25% CPU after lots of play/stop on DVB. But 99% of the time, CPU is at
> 125% (dual-core system) when playing, and at 100% when playback is stopped.
> 
> Any hint on how to dump tasks load welcome...

You could use:
   QT_LOGGING_RULES=<something>

in order to enable debug logs, e. g. something like:

   $ QT_LOGGING_RULES="*.debug=true" kaffeine

That will enable *a lot* of debug messages, but may help to show what's going wrong.
Comment 23 Stetsbequem 2017-04-30 09:53:44 UTC
> „Status: UNCONFIRMED“
 
[x] I use Kaffeine for DVB-S/S2. Channel hopping with Kaffeine 2: >= 100% CPU.
Reproducible: always with Kaffeine 2.0.* – 2.0.5-1. (With Kaffeine 1.*: never!)
Comment 24 Jan Palus 2017-07-05 21:54:53 UTC
Created attachment 106451 [details]
perf report
Comment 25 Jan Palus 2017-07-05 21:55:47 UTC
I experience the same issue (version 2.0.10). It seems to start as soon as some glitch in decoding is encountered -- video artifacts or a green screen, which Kaffeine recovers from within seconds, but it keeps on using 100% CPU. The following errors are logged then:

[h264 @ 0x7fc1780ce820] top block unavailable for requested intra mode
[h264 @ 0x7fc1780ce820] error while decoding MB 22 0, bytestream 26429
[h264 @ 0x7fc1780eae80] mmco: unref short failure
[h264 @ 0x7fc1781069e0] mmco: unref short failure
[h264 @ 0x7fc1780eae80] decode_slice_header error
...
[h264 @ 0x7fc178122620] reference picture missing during reorder
[h264 @ 0x7fc178122620] reference picture missing during reorder
[h264 @ 0x7fc178122620] Missing reference picture, default is 65460
[h264 @ 0x7fc178122620] Missing reference picture, default is 65460
[h264 @ 0x7fc1780ce820] mmco: unref short failure
[h264 @ 0x7fc1780ce820] reference picture missing during reorder
[h264 @ 0x7fc1780ce820] Missing reference picture, default is 65461
[h264 @ 0x7fc1780ce820] mmco: unref short failure
[h264 @ 0x7fc1780ce820] mmco: unref short failure
...
[h264 @ 0x7fc178141480] reference picture missing during reorder
[h264 @ 0x7fc178141480] reference picture missing during reorder
[h264 @ 0x7fc178141480] Missing reference picture, default is 65292
[h264 @ 0x7fc178141480] Missing reference picture, default is 65292
[h264 @ 0x7fc1780e1140] reference picture missing during reorder
[h264 @ 0x7fc1780e1140] Missing reference picture, default is 65293
[h264 @ 0x7fc178141480] reference picture missing during reorder
[h264 @ 0x7fc178141480] Missing reference picture, default is 65300
...
[h264 @ 0x7fc178098940] reference picture missing during reorder
[h264 @ 0x7fc178098940] reference picture missing during reorder
[h264 @ 0x7fc178098940] Missing reference picture, default is 65452
[h264 @ 0x7fc178098940] Missing reference picture, default is 65452
[h264 @ 0x7fc178141b20] reference picture missing during reorder
[h264 @ 0x7fc178141b20] Missing reference picture, default is 65453
[h264 @ 0x7fc178098940] reference picture missing during reorder
[h264 @ 0x7fc178098940] Missing reference picture, default is 65460

I've tried to profile Kaffeine, but it's really hard to tell where the issue is. It appears to go into some notification loop (quite a lot of time is spent in mutex lock/unlock -- the biggest self time). See the attached perf-report.txt.
Comment 26 Jan Palus 2017-07-05 23:33:02 UTC
Created attachment 106452 [details]
callgrind output

Also attaching callgrind output
Comment 27 Szymon Stefanek 2017-07-08 02:05:57 UTC
I have the same issue with version 2.0.10. Viewing the same DVB channel sometimes uses low CPU and sometimes stays almost fixed at 100% (saturating one core, I guess). Version 2.0.10 installed from https://launchpad.net/~mtron/+archive/ubuntu/kaffeine-stable seems to be affected just like the 2.0.5 from the official ubuntu repositories. In addition 2.0.10 also tends to spit out a lot of warnings like the following:

08-07-17 03:55:54.236 [Debug   ] kaffeine.epg: channel invalid
08-07-17 03:55:54.263 [Debug   ] kaffeine.epg: channel invalid
08-07-17 03:55:54.390 [Debug   ] kaffeine.epg: table 0x4f, extension 0x2189, session 1/1, size 383
08-07-17 03:55:54.392 [Warning ] "2 instead of 1 arguments to message {Country %1: not rate...} supplied before conversion."

but these might be unrelated as they seem to appear also when CPU usage is low.

Another message I see often (in both versions) is:

[000000002d6da6e8] core input error: ES_OUT_SET_(GROUP_)PCR  is called too late (pts_delay increased to 488 ms)
[000000002d6da6e8] core input error: ES_OUT_RESET_PCR called

Hard to say if it's related.

Profiling shows similar results to the ones obtained by Jan Palus.
Most instruction fetches seem to happen inside src_process in libsamplerate.so.

An interesting fact is that when profiling via valgrind the CPU usage tends to be lower than when running normally. 100% is very rare. So it's even possible that the bug is not triggered when running under valgrind.
Comment 28 Jan Palus 2017-07-08 10:44:50 UTC
Some random observations:
1. For some reason the issue is much easier to reproduce if Kaffeine is run from gdb. Usually a single channel change is enough to trigger the problem.
2. With G_MAIN_POLL_DEBUG enabled in glib, I see quite a lot of input events when Kaffeine is operating normally:

polling context=0x7fffd0000990 n=4 timeout=-1
g_main_poll(4) timeout: -1 - elapsed 0.0000060000 seconds [13 :i]

but once the issue is triggered I see a big flood of output events:

polling context=0x7fffd8002c10 n=8 timeout=253
g_main_poll(8) timeout: 253 - elapsed 0.0000010000 seconds [22 :o]

where FD 22 is:

$ ls -l 22
l-wx------ 1 jan users 64 07-08 12:37 22 -> /run/user/1000/dvbpipe.m2t
Comment 29 Jan Palus 2017-07-08 11:38:59 UTC
It appears that once the issue is triggered, DvbLiveViewInternal::writeToPipe() is called thousands of times a second, but buffers is always empty.
Comment 30 Mauro Carvalho Chehab 2017-07-08 12:16:15 UTC
Created attachment 106498 [details]
Don't ignore errors when writing to libVLC pipe

(In reply to Jan Palus from comment #29)
> It appears once the issue is triggered DvbLiveViewInternal::writeToPipe() is
> called thousands of times a second but buffers is always empty.

Hmm... maybe some error happened when writing to the pipe. Could you please check whether this patch avoids the issue? I guess we may need some extra code to make it retry after a while.
Comment 31 Jan Palus 2017-07-08 12:27:49 UTC
Thanks for the patch. Unfortunately it does not change much -- the issue is still there. A new warning is constantly printed while Kaffeine operates normally:

08-07-17 14:23:17.447 [Warning ] kaffeine.mediawidget: Error 11 while writing to pipe
08-07-17 14:23:17.477 [Warning ] kaffeine.mediawidget: Error 11 while writing to pipe
08-07-17 14:23:17.482 [Warning ] kaffeine.mediawidget: Error 11 while writing to pipe

but it is no longer printed once the issue is triggered.
Comment 32 Mauro Carvalho Chehab 2017-07-08 12:59:33 UTC
Created attachment 106500 [details]
Wait 100ms before retrying after an error

(In reply to Jan Palus from comment #31)
> Thanks for the patch. Unfortunately it does not change much -- issue is
> still there. New warning is constantly printed when kaffeine operates
> normally:
> 
> 08-07-17 14:23:17.447 [Warning ] kaffeine.mediawidget: Error 11 while
> writing to pipe
> 08-07-17 14:23:17.477 [Warning ] kaffeine.mediawidget: Error 11 while
> writing to pipe
> 08-07-17 14:23:17.482 [Warning ] kaffeine.mediawidget: Error 11 while
> writing to pipe
> 
> but it is no longer printed when issue is triggered.

OK, error 11 is EAGAIN. I suspect that the pipe buffer is full; probably VLC is too slow to handle the DVB channel you're tuning.

Please check what happens with the attached patch. It waits 100 ms before trying again. That will likely cause frame drops, but it should reduce the CPU load.
Comment 33 Mauro Carvalho Chehab 2017-07-08 13:06:07 UTC
(In reply to Mauro Carvalho Chehab from comment #32)
> Ok, error 11 is EAGAIN. I suspect that the pipeline buffer is full.
> Probably, VLC is too slow to handle the DVB channel you're tuning.
> 
> Please check what happens with the attached patch. It will wait 100 ms
> before trying again. That will likely cause frame drops, but it should
> reduce the CPU load.

Please note that I don't expect this patch to be in its final form. Delaying retries for a long time can cause data loss even when everything is fine. So I guess we should probably set the timeout to a lower value and increment a retry counter every time a retry happens. If the counter gets too big, perhaps we should flush the buffer by calling resetPipe().
Comment 34 Mauro Carvalho Chehab 2017-07-08 13:43:12 UTC
Created attachment 106501 [details]
Better handle errors when writing to pipe

Please try this version. It is simpler than the previous one. It should prevent using 100% CPU if something goes wrong while writing to the pipe.

I added extra logic there that resets the pipe if it takes too many retries.
Comment 35 Mauro Carvalho Chehab 2017-07-08 14:06:54 UTC
Created attachment 106502 [details]
Better handle errors when writing to pipe - v2

(In reply to Mauro Carvalho Chehab from comment #34)
> Created attachment 106501 [details]
> Better handle errors when writing to pipe
> 
> Please try this version. It is simpler than the previous one. It should
> prevent using 100% of CPU if something gets wrong while writing to the pipe.
> 
> I added an extra logic there that will reset the pipe if it takes too many
> retries.

I forgot to reset the counter when write() succeeds. This patch should fix it.
Comment 36 Jan Palus 2017-07-08 14:31:49 UTC
Looks like it's still the same, but I'm not really sure whether we are on the same page -- writeToPipe() is not looping. Rather, it is called a lot once the bug is triggered. I've made the following small adjustment to its code:

 void DvbLiveViewInternal::writeToPipe()
 {
+  static int notEmptyCnt = 0;
+  static int emptyCnt = 0;
+  if (buffers.isEmpty()) {
+    ++emptyCnt;
+  } else {
+    ++notEmptyCnt;
+  }
+    printf("empty: %d, not empty: %d\n", emptyCnt, notEmptyCnt);
 	while (!buffers.isEmpty()) {
 		const QByteArray &currentBuffer = buffers.at(0);
 		int bytesWritten = int(write(writeFd, currentBuffer.constData(), currentBuffer.size()));

That's output after 10 seconds "without bug":

empty: 0, not empty: 614

Now, after the bug is triggered, the empty counter skyrockets to some crazy value after 10 seconds (and keeps growing, which is most probably the cause of the high CPU usage):

empty: 5053133, not empty: 534

as if notifications about the pipe being ready for writing were suddenly coming very, very frequently.
Comment 37 Jan Palus 2017-07-08 14:38:49 UTC
And just to be clear -- the not-empty counter continues to increase at the same pace once the bug is triggered, and everything works fine (TV runs smoothly); it's just the high CPU usage in the background that changes.
Comment 38 Mauro Carvalho Chehab 2017-07-08 14:58:53 UTC
Created attachment 106506 [details]
Disable notifier if no data

(In reply to Jan Palus from comment #36)
> Looks like it's still the same, but I'm not really sure whether we are on
> the same page -- writeToPipe() is not looping. It is rather called a lot
> once the bug is triggered. I've made following small adjustment to its code:
> 
>  void DvbLiveViewInternal::writeToPipe()
>  {
> +  static int notEmptyCnt = 0;
> +  static int emptyCnt = 0;
> +  if (buffers.isEmpty()) {
> +    ++emptyCnt;
> +  } else {
> +    ++notEmptyCnt;
> +  }
> +    printf("empty: %d, not empty: %d\n", emptyCnt, notEmptyCnt);
>  	while (!buffers.isEmpty()) {
>  		const QByteArray &currentBuffer = buffers.at(0);
>  		int bytesWritten = int(write(writeFd, currentBuffer.constData(),
> currentBuffer.size()));
> 
> That's output after 10 seconds "without bug":
> 
> empty: 0, not empty: 614
> 
> Now after bug is triggered empty counter skyrockets to some crazy value
> after 10 secs (and continues which is most probably the cause of high CPU
> usage):
> 
> empty: 5053133, not empty: 534
> 
> as if notifications about pipe being ready for writing were suddenly coming
> very very frequently

Ah, OK. Well, I guess all we need to do is to disable the notifier to solve this issue. Still, I think that the first patch will make it more reliable.

Please test with both the previous patch and this one.
Comment 39 Mauro Carvalho Chehab 2017-07-08 15:21:27 UTC
Created attachment 106507 [details]
Disable notifier if no data - v2

It makes sense to also disable the notifier in resetPipe(), which is called when a channel changes.
Comment 40 Jan Palus 2017-07-08 17:04:38 UTC
Looks like this was it! After applying the last patch, the issue did not occur within 10 minutes of constant channel switching. Thanks a lot, Mauro!
Comment 41 coekbe 2017-07-08 17:55:54 UTC
Created attachment 106508 [details]
Demo app to show infinite signal emission

It looks like using a write notifier on a FIFO (or an ordinary file) is invalid. I attached a small application to demonstrate this. After unzipping and changing to the directory, try it with:

qmake -project
qmake
make
mkfifo testpipe
./testapp

On my system, writeToPipe() is called over and over, starting immediately, even though nothing is going on at the FIFO. On the other hand, as a read notifier (if the app is modified accordingly) it appears to work as expected even on FIFOs. Perhaps the notifier should be on reading instead?
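
For reference, here is a minimal reconstruction of what the test app does (the real attachment may differ; this sketch assumes Qt 5 and opens the FIFO O_RDWR so that open() does not block waiting for a reader):

#include <QCoreApplication>
#include <QSocketNotifier>
#include <QTimer>
#include <cstdio>
#include <fcntl.h>

int main(int argc, char *argv[])
{
	QCoreApplication app(argc, argv);

	// Open the FIFO created with "mkfifo testpipe". O_RDWR keeps open()
	// from blocking even though nobody else reads or writes the pipe.
	int fd = open("testpipe", O_RDWR | O_NONBLOCK);
	if (fd < 0)
		return 1;

	// Write notifier on the FIFO: since a pipe with free buffer space is
	// always ready for writing, the event loop reports it on every
	// iteration, so activated() fires nonstop and the loop spins.
	QSocketNotifier notifier(fd, QSocketNotifier::Write);
	long activations = 0;
	QObject::connect(&notifier, &QSocketNotifier::activated,
			 [&activations](int) { ++activations; });

	// Print how often the "writeToPipe" slot would have been called.
	QTimer report;
	QObject::connect(&report, &QTimer::timeout, [&activations]() {
		printf("activations in the last second: %ld\n", activations);
		activations = 0;
	});
	report.start(1000);

	return app.exec();
}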
Comment 42 coekbe 2017-07-08 17:57:51 UTC
Created attachment 106509 [details]
A patch using a read notifier

Another patch fixing the 100% CPU usage.
Comment 43 Mauro Carvalho Chehab 2017-07-08 18:59:54 UTC
(In reply to coekbe from comment #41)
> Created attachment 106508 [details]
> Demo app to show infinite signal emission
> 
> It looks like using a write notifier on a FIFO (or an ordinary file) is
> invalid. I attached a small application to demonstrate this. After unzipping
> and changing to the directory, try it with:
> 
> qmake -project
> qmake
> make
> mkfifo testpipe
> ./testapp
> 
> On my system, writeToPipe() will be called on and on and starting
> immediately even though there is nothing going on at the fifo. 

That's the expected behavior. Basically, reads and writes are buffered internally by the kernel. When that buffer is full, a write will return an error code (if the file descriptor is opened in non-blocking mode) until the buffer is drained. So, if you do something like this (which is what the writeToPipe loop effectively does):

    while (write(fd, data, size) < 0) {}

it will call write() several times, spending a lot of CPU cycles doing nothing.

The QSocketNotifier there is meant to avoid doing that. Basically, if write() returns an error (typically EAGAIN), indicating that the kernel buffer is full, the routine exits and is called again later, when the buffer has been drained.

That should save some CPU load during normal Kaffeine operation.
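
To illustrate the point with a standalone sketch (not Kaffeine code): a non-blocking write() to a pipe succeeds while the kernel buffer has room, and fails with EAGAIN once the buffer is full and nothing drains it:

#include <cerrno>
#include <cstdio>
#include <cstring>
#include <fcntl.h>
#include <unistd.h>

int main()
{
	int fds[2];
	if (pipe(fds) < 0)
		return 1;
	fcntl(fds[1], F_SETFL, O_NONBLOCK);   // non-blocking write end

	char block[4096];
	memset(block, 'x', sizeof(block));

	long total = 0;
	for (;;) {
		ssize_t n = write(fds[1], block, sizeof(block));
		if (n < 0) {
			// Typically EAGAIN once the pipe capacity is reached.
			printf("write failed after %ld bytes: %s\n",
			       total, strerror(errno));
			break;
		}
		total += n;
	}
	return 0;
}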

> On the other
> hand, (if the app is modified accordingly) as a read notifier it appears to
> work as expected even on FIFOs. Perhaps the notifier should be on reading
> instead?

A read notifier doesn't make much sense with the current code. Take a look at processData(): basically, when some data is received, it updates the buffer, then it calls writeToPipe().
Comment 44 Mauro Carvalho Chehab 2017-07-09 01:59:09 UTC
@Jan: thanks for identifying what part of Kaffeine was causing high CPU load!

Fixed by these two patches:

https://cgit.kde.org/kaffeine.git/commit/?id=7df55bf402a21778a945478ff44fca304fe06ab3:

    dvbliveview: disable write ready notifications if nothing to write
    
    The write notifier only makes sense if there's something to write.
    
    Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>

https://cgit.kde.org/kaffeine.git/commit/?id=a233d847ab7fdf13215f41e7240dcd307215ae42:

    dvbliveview: better handle errors when writing to pipe
    
    The current logic of writeToPipe() loops forever if an error
    occurs. That causes CPU load to be 100%, as it will call write()
    in non-block mode inside a loop until it succeeds. That's not
    needed, as the write logic can already monitor the file descriptor
    availability via QSocketNotifier.
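
For reference, here is a self-contained sketch of the pattern those two commits describe (assuming Qt 5; the class and names below are illustrative, not Kaffeine's actual code): the write notifier stays disabled while the queue is empty, is enabled only when a write cannot complete, and is disabled again once the queue drains.

#include <QByteArray>
#include <QCoreApplication>
#include <QList>
#include <QObject>
#include <QSocketNotifier>
#include <QTimer>
#include <cerrno>
#include <fcntl.h>
#include <unistd.h>

// Illustrative class, not Kaffeine's DvbLiveViewInternal: queue data and
// write it to a non-blocking pipe, using the write notifier only while
// there is pending data that could not be written yet.
class PipeWriter : public QObject
{
public:
	explicit PipeWriter(int fd)
		: writeFd(fd), notifier(fd, QSocketNotifier::Write)
	{
		notifier.setEnabled(false);          // nothing queued yet
		connect(&notifier, &QSocketNotifier::activated,
			[this](int) { writeToPipe(); });
	}

	void queueData(const QByteArray &data)
	{
		buffers.append(data);
		writeToPipe();
	}

private:
	void writeToPipe()
	{
		while (!buffers.isEmpty()) {
			const QByteArray &current = buffers.at(0);
			ssize_t n = write(writeFd, current.constData(), current.size());

			if (n < 0) {
				if (errno == EINTR)
					continue;    // interrupted: retry at once
				// Pipe full (EAGAIN) or real error: stop and let
				// the notifier call us back when writable again.
				notifier.setEnabled(true);
				return;
			}
			if (n < current.size())
				buffers[0].remove(0, int(n));   // partial write
			else
				buffers.removeFirst();
		}
		notifier.setEnabled(false);   // queue drained: no more wakeups
	}

	int writeFd;
	QSocketNotifier notifier;
	QList<QByteArray> buffers;
};

int main(int argc, char *argv[])
{
	QCoreApplication app(argc, argv);

	int fds[2];
	if (pipe(fds) < 0)
		return 1;
	fcntl(fds[0], F_SETFL, O_NONBLOCK);       // non-blocking read end
	fcntl(fds[1], F_SETFL, O_NONBLOCK);       // non-blocking write end

	PipeWriter writer(fds[1]);

	QTimer producer;                          // queue data faster than it is read
	QObject::connect(&producer, &QTimer::timeout,
			 [&writer]() { writer.queueData(QByteArray(4096, 'x')); });
	producer.start(1);

	QTimer consumer;                          // slow reader keeps the pipe near full
	QObject::connect(&consumer, &QTimer::timeout, [&fds]() {
		char buf[4096];
		if (read(fds[0], buf, sizeof(buf)) < 0) { /* nothing to read */ }
	});
	consumer.start(20);

	return app.exec();
}

With this arrangement the event loop only wakes the writer while there really is pending data and the pipe is full, which is what keeps the idle CPU usage down.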
Comment 45 coekbe 2017-07-09 06:13:16 UTC
(In reply to Mauro Carvalho Chehab from comment #43)
>
> That's the expected behavior. Basically, reads and writes are buffered
> internally at the Kernel. When such buffer is filled, a write to the
> buffer will return an error code, if the file descriptor is opened in
> non-blocking mode, until the buffer is flushed. So, if you do something
> like (with is what the writeToPipe loop actually does):
> 
>     while (write(fd, data, size) < 0) {}
> 
> it will call write() several times, spending a lot of CPU cycles doing
> nothing.

You have an incorrect idea about the cause of this bug. Look at my demo app. I was referring to the writeToPipe() of the demo app, not of Kaffeine, although I named the method the same for obvious reasons. The point is that, at least on (my) Linux, enabling a write QSocketNotifier on a FIFO is *always* wrong; it just should not be used that way. Once enabled, Qt will start hammering the writeToPipe() method nonstop, all the time. Note that there are never any writes or reads on the FIFO at all in the test app, yet the hammering goes on.

> 
> the usage of a QSocketNotifier there is meant to avoid doing that.
> Basically, if write() returns an error (typically, EAGAIN), indicating
> that the Kernel buffer is full, the routine will exit and called later,
> when the buffer is flushed.

The lines

		if ((bytesWritten < 0) && (errno == EINTR)) {
			continue;
		}

are not the cause of the CPU consumption. The write is repeated *only if* the write system call was interrupted (EINTR), and that is not what is occurring here.

> A read notifier doesn't make much sense with the current code. Take a look
> at processData(): basically, when some data is received, it updates the
> buffer, then it calls writeToPipe().

The idea of using a read notifier is that Kaffeine's writeToPipe() cannot write to the FIFO if the VLC backend is not reading fast enough. So when writeToPipe() cannot write all the buffers, it could enable a read notifier. Once the VLC backend reads, Qt will emit the read signal and writeToPipe() could resume writing the rest of the buffers.

But again, the write notifier does not work correctly on FIFOs and cannot be the right way to do this.
Comment 46 José JORGE 2017-07-09 07:20:46 UTC
Yes, a big thanks to everyone; it was beyond my abilities...

For an end user, is there any idea when an official release will include this fix?
Comment 47 Mauro Carvalho Chehab 2017-07-09 11:22:04 UTC
(In reply to José JORGE from comment #46)
> Yes a big thanks for all, it was beyond my abilities...

Yeah, fixing performance issues usually is not trivial.

> For an end user, is there an idea when an official release will bring this
> fix?

I intend to wait a while for people to test it for regressions before releasing a new version. If everything is OK, I'll likely release it next week.
Comment 48 coekbe 2017-07-09 14:05:07 UTC
(In reply to coekbe from comment #45)
> Note how
> there are never any writes or reads on the FIFO at all in the test app. Yet
> the hammering goes on.
> 

I was confused about this. This is actually the correct behaviour for a write QSocketNotifier: I thought it would notify when something was written, but instead it notifies when writing is possible. Sorry. So the bug was that the notifications were never disabled once they were enabled.