Bug 250398 - Playing JOGL games causes lag
Summary: Playing JOGL games causes lag
Status: RESOLVED NOT A BUG
Alias: None
Product: kwin
Classification: Plasma
Component: general
Version: unspecified
Platform: Gentoo Packages Linux
Importance: NOR normal
Target Milestone: ---
Assignee: KWin default assignee
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-09-06 22:55 UTC by Richard
Modified: 2011-02-06 15:42 UTC
CC List: 0 users

See Also:
Latest Commit:
Version Fixed In:


Description Richard 2010-09-06 22:55:22 UTC
Version:           unspecified (using KDE 4.5.0) 
OS:                Linux

I have a GeForce GTS 250 graphics card with Nvidia's proprietary 256.53 driver. If I try to play an online game called Runescape, the X Window System begins to slow down and, if I ignore it, eventually freezes. The slowdown is not limited to the game; it affects every graphical application on the system.

Reproducible: Always

Steps to Reproduce:
1. Have compositing on.
2. Start a web browser and play a JOGL game (e.g. Runescape).

Actual Results:  
After about 5 minutes, the UI becomes sluggish: the mouse pointer visibly skips and rotating the game world is laggy. Within an hour, the X Window System freezes.

Expected Results:  
The game should work without any lag.

I have observed this under KDE 4.4.5 and KDE 4.5.1. I recently watched kwin while this happened and observed that CPU utilization on all cores was low until the lag set in. Then one core was at 100% all the time. After I stopped compositing, the lag became better, but it was still there and that core was still at 100%. Several minutes after I stopped compositing, the lag disappeared and CPU usage reverted to about 15% to 20% on each core.

I am using dev-java/sun-jdk-1.6.0.21 and dev-java/jogl-1.1.1a. No configuration change in the Advanced Desktop Effects seems to significantly alter my observations.
Comment 1 Thomas Lübking 2010-09-06 23:09:07 UTC
sounds like a leak - and since it's related to a particular toolkit (? what about eg. ioquake & friends) it's pretty much for sure in either this toolkit or the driver.

However:
what if you turn off all effects (and esp. blurring)
Comment 2 Richard 2010-09-06 23:46:23 UTC
I turned off all of the effects and ran Runescape for 18 minutes. While X Windows normally develops severe lag issues in that time frame, this time, X Windows did not lag or freeze.

My system had the default effects settings when I verified that the issue was unfixed in KDE 4.5.1 earlier today. I have limited time, but if I can be helpful troubleshoot further, I will try to troubleshoot further. How should I proceed?
Comment 3 Richard 2010-09-06 23:48:15 UTC
I caught a typo after I posted my reply. That second paragraph should have been:

"My system had the default effects settings when I verified that the issue was unfixed in KDE 4.5.1 earlier today. I have limited time, but if I can be helpful, I will try to troubleshoot further. How should I proceed?"
Comment 4 Thomas Lübking 2010-09-06 23:54:59 UTC
figure out which effect causes this by turning them on one by one and testing the game.

i frankly don't know what effects are active by default, but since sharpen is likely not (and you should not use it, since the nvidia driver provides this with zero overhead) and it would be a persistent (non-transition) effect, i'd start with blur, then shadows...

this could be (related to) bug #242457
Comment 5 Richard 2010-10-02 04:03:24 UTC
I have deleted my ~/.kde4 directory and now I no longer seem to be able to reproduce this problem. I am closing the bug report. I will reopen it if it develops again further down the road.
Comment 6 Richard 2010-10-02 04:05:25 UTC
The problem struck again just as I clicked "Commit". :/

I am very busy with my class work; I will troubleshoot this when I find time between all of my class work.
Comment 7 Richard 2010-12-10 20:16:22 UTC
(In reply to comment #4)
> figure out which effect causes this by turning them on one by one and testing
> the game.
> 
> i frankly don't know what effects are active by default, but since sharpen is
> likely not (and you should not use it, since the nvidia driver provides this
> with zero overhead) and it would be a persistent (non-transition) effect, i'd
> start with blur, then shadows...
> 
> this could be (related to) bug #242457

While I have tried investigating this a few times in my spare time over the past few months, none of my attempts were successful until today. Sadly, it seems that if I had read your comment more carefully, I would have discovered the cause of this issue much sooner. The issue indeed appears to be the blur effect.

I found this after noticing that switching to XRender in KDE 4.5.4 fixed my issue and that doing so disabled two effects, Magic Lamp and Blur. I had only enabled Magic Lamp a month ago, well after the problem first appeared, so that left Blur as the potential culprit. I disabled Blur, switched back to OpenGL and everything is fast again.

*** This bug has been marked as a duplicate of bug 242457 ***
Comment 8 Thomas Lübking 2010-12-11 00:30:56 UTC
can you then confirm your issue's gone in 4.6 beta?
(the possible dupe's marked "works for me", there's no explicit "was fixed by...", so it can also probably not be backported unless somebody knows what caused this in particular)
Comment 9 Richard 2010-12-11 01:07:59 UTC
(In reply to comment #8)
> can you then confirm your issue's gone in 4.6 beta?
> (the possible dupe's marked "works for me", there's no explicit "was fixed
> by...", so it can also probably not be backported unless somebody knows what
> caused this in particular)

I have some bad news. After I posted about the issue going away with the Blur effect, I encountered some minor lag with Runescape running in a background tab in Chromium. Switching to the tab and switching back to whatever I was doing seemed to alleviate the issue, so things definitely seem better with compositing on and the blur effect off.

It might be a related issue, but I also tried playing some native OpenGL games, specifically Scorched 3D and Aquaria, and I encountered lag in them too. When playing Aquaria, I tried turning compositing off and I still had lag issues. With compositing off, the lag went away several minutes after its onset as I was trying to get to a save crystal.

I am right now using xorg-server 1.9.3 RC2 with Nvidia's 260.19.26 driver. This is a production system, attempting to reproduce these issues is incredibly time consuming, and I am still tied down with school work, so I am going to try downgrading xorg-server to 1.8.2 in the meantime to see if these issues go away. There is a possibly related issue that people at nvnews.net found in the xorg-server 1.9.x branch, so hopefully I can kill two birds with one stone. I have been wrestling with this since xorg-server 1.7.x, so this will be more of a test of whether a combination of things was causing this problem for me than anything else.

http://www.nvnews.net/vbulletin/showpost.php?p=2361777&postcount=28

I will try an upgrade to KDE 4.6 in a few weeks when I have a bit more time. I wish I could do more sooner, but this issue is hard enough to reproduce that I cannot do more at this time.
Comment 10 Richard 2010-12-20 03:22:55 UTC
I tried upgrading to KDE 4.6 Beta 2, but that did not go well. There are issues with the packages on Gentoo Linux.

I also tried using LXDE as a troubleshooting measure and I had similar problems there, so I believe that this is not a KWin issue. I did some digging around and found out that MTRR is somewhat important for the nvidia drivers, and I noticed that CONFIG_MTRR_SANITIZER was not compiled into my kernel. I recompiled my kernel with it and the lag issues appear to be gone. I was able to play a game of Osmosis with Runescape running in the background and no lag occurred.
Comment 11 Thomas Lübking 2010-12-20 14:32:51 UTC
I hope (for you) that this is it, but afaik nvidia uses PAT since ages and only falls back to MTRR (+the sanitizer has to be activated either by default in the kernel config or by a GRUB kernel parameter)
Comment 12 Richard 2010-12-20 18:28:56 UTC
(In reply to comment #11)
> I hope (for you) that this is it, but afaik nvidia uses PAT since ages and only
> falls back to MTRR (+the sanitizer has to be activated either by default in the
> kernel config or by a GRUB kernel parameter)

The following article claims that PAT is complementary to MTRR:

http://en.gentoo-wiki.com/wiki/MTRR

Gentoo Linux's guide for the Nvidia drivers claims that it requires MTRR support in the kernel and that having uncachable entries in /proc/mtrr causes problems:

http://www.gentoo.org/doc/en/nvidia-guide.xml

I had uncachable entries in /proc/mtrr. With MTRR cleanup enabled, those entries became writeback entries. That appears to have solved my lag issues, although given how difficult it is for me to reproduce this issue, only time will tell.

Since issues caused by uncachable entries in /proc/mtrr are incredibly difficult to diagnose, it might be a good idea if KDE scanned /proc/mtrr for uncachable entries and put a warning message into some sort of log upon finding such entries. That would give people encountering issues some kind of clue as to what is wrong so that they do not blame KDE for it.
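As a rough illustration of the scan suggested above, here is a minimal Python sketch. It assumes the usual /proc/mtrr line layout (`regNN: base=..., size=..., count=N: type`); the function names are made up for this example and are not part of KDE. An uncachable entry is only a hint worth logging, not proof that anything is wrong.

```python
from pathlib import Path


def find_uncachable(mtrr_text):
    """Return (register, details) pairs for every uncachable MTRR entry.

    Assumes lines shaped like:
    reg01: base=0x0c0000000 ( 3072MB), size= 1024MB, count=1: uncachable
    """
    hits = []
    for line in mtrr_text.splitlines():
        line = line.strip()
        if not line.startswith("reg"):
            continue
        # Text before the first colon is the register name.
        reg, _, rest = line.partition(":")
        # Text after the last colon is the memory type.
        details, _, mem_type = rest.rpartition(":")
        if mem_type.strip() == "uncachable":
            hits.append((reg, details.strip()))
    return hits


def main(path="/proc/mtrr"):
    try:
        text = Path(path).read_text()
    except OSError:
        # The file may be missing or readable only by root.
        print(f"{path} is not readable; nothing to check")
        return
    for reg, details in find_uncachable(text):
        print(f"warning: {reg} is uncachable ({details})")


if __name__ == "__main__":
    main()
```

Running it on a machine without MTRR support, or without permission to read the file, prints a short notice and exits.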
Comment 13 Richard 2010-12-20 21:07:49 UTC
My previous comment was somewhat overreaching. The Gentoo Linux Nvidia Driver Documentation says that this only affects systems with 4GB or more memory and after raising some false alarms on the Gentoo Linux forums, it appears that this is the case.

Having MTRR cleanup enabled clearly fixes any problems caused by an improperly configured MTRR table, but it is not obvious from finding uncachable entries in the MTRR table that there is a problem.

I have done some more experiments with the JOGL problems I had. There is some minor lag that occurs when Runescape is running in a background tab in Chromium for an extended period of time, but it occurs regardless of whether or not compositing is enabled. Switching back to the Runescape tab causes it to disappear within 2 seconds, during which time, Runescape's framerate was visibly choppy. If I do not switch back, the lag appears to resolve itself on its own.

The causes of the JOGL issues have been incredibly difficult to diagnose and they have extremely poor reproducibility, but it appears that the crashes that were related to them were caused by race conditions in Nvidia's binary driver that were triggered by an improperly configured MTRR table, which is not a KDE issue.

Since I am still having minor JOGL issues, I will reopen this pending some experiments in either LXDE or OpenBox to determine that nothing in KDE is causing them. I will also edit the title to reflect that. I suggest that the severity of this bug be downgraded. It really was two separate issues and the one that caused things to freeze/crash is clearly not a KDE issue.
Comment 15 Richard 2011-02-06 14:52:39 UTC
(In reply to comment #11)
> I hope (for you) that this is it, but afaik nvidia uses PAT since ages and only
> falls back to MTRR (+the sanitizer has to be activated either by default in the
> kernel config or by a GRUB kernel parameter)

Thomas, I have identified the cause of my issue and it is specific to my system. My graphics card was running at 103 +/- 5 degrees Celsius. The Nvidia driver will thermal throttle the card at 105 degrees Celsius. I had switched from Windows 7 to Gentoo Linux around the time the dust had become an issue, so I mistakenly assumed that the switch had caused it. After dealing with the dust accumulation in my case, temperatures dropped to 50 +/- 5 degrees Celsius. Not only have the JOGL issues been fixed, but everything feels faster, including web browsing.
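For anyone who wants to rule out the same cause, the check can be sketched in a few lines of Python. This is only a sketch under assumptions: the 105 degree throttle point is the figure quoted above and may differ per card, the 10 degree warning margin is arbitrary, and the temperature query assumes `nvidia-settings` is installed and exposes the `GPUCoreTemp` attribute on the running X display.

```python
import subprocess

# Throttle point as quoted in this comment; an assumption, not a verified spec.
THROTTLE_C = 105


def describe(temp_c, warn_margin_c=10, throttle_c=THROTTLE_C):
    """Classify a GPU core temperature relative to the throttle point."""
    headroom = throttle_c - temp_c
    if headroom <= 0:
        return "throttling likely"
    if headroom <= warn_margin_c:
        return "close to throttle point"
    return "ok"


def read_gpu_temp():
    """Ask nvidia-settings for the core temperature; None if unavailable."""
    try:
        out = subprocess.run(
            ["nvidia-settings", "-q", "GPUCoreTemp", "-t"],
            capture_output=True, text=True, timeout=10, check=True,
        ).stdout
    except (OSError, subprocess.SubprocessError):
        return None
    # Terse output is expected to be a bare number; take the first one found.
    for token in out.split():
        if token.isdigit():
            return int(token)
    return None


if __name__ == "__main__":
    temp = read_gpu_temp()
    if temp is None:
        print("could not read GPU temperature")
    else:
        print(f"{temp} C: {describe(temp)}")
```

With the readings reported here, 103 degrees falls in the "close to throttle point" band and 50 degrees is reported as "ok".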

Since fixing this, I have actually stumbled across two other issues on my systems that I had not noticed because everything was so horribly slow. The major one is that my laptop's 2D performance is slow. Switching from the Nvidia blob to Nouveau fixes the slow 2D performance, but Nouveau increases idle power consumption by 10 watts, so I cannot use it as a permanent solution. The other issue involves random interface lag on my desktop, but it clears itself within 1 to 2 seconds. Martin reported on his blog today that he fixed an issue that affected the Nvidia blob, but not Nouveau, so I am going to try the latest development code and see if that fixes my issues. If it does not (or causes me to discover something else), I will file new bug reports for them.

With that said, I am closing this as Invalid. I apologize for any inconvenience that my thermal throttling issue might have caused you.
Comment 16 Thomas Lübking 2011-02-06 15:42:30 UTC
(In reply to comment #15)
> The major one is that my laptop's 2D performance is slow. Switching from the Nvidia
> blob to Nouveau fixes the slow 2D performance, but Nouveau increases idle power
> consumption by 10 watts, so I cannot use it as a permanent solution.

pretty known issue and pretty likely related to a flooded pixmap cache, try:

nvidia-settings -a PixmapCache=0; nvidia-settings -a PixmapCache=1

(do NOT try to be smart and combine both assignments into one call - the tool just won't do anything then ;-)
another way would be to use the "export QT_GRAPHICSSYSTEM=raster" environment (which will bypass XRender and do everything on the CPU)

This is (likely) triggered by something apparently in KApplication which (in consequence) "abuses" QPixmap::fill(Qt::transparent) - and the nvidia driver really does not like that :-(

>  The other issue involves random interface lag on my desktop, but it clears itself within
> 1 to 2 seconds.
That's (likely again) nvidia's pixmap cache reorganizing itself - can possibly be prevented by cron'ing the above hack.
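A minimal sketch of a script that cron could call for this, written in Python. It issues the two assignments from the hack above as two separate `nvidia-settings` invocations, as advised. The function name and the injectable `run` parameter are invented for this example, and whether the reset helps at all remains the guess stated above.

```python
import subprocess

# The two assignments from the hack above, kept as separate invocations.
RESET_COMMANDS = [
    ["nvidia-settings", "-a", "PixmapCache=0"],
    ["nvidia-settings", "-a", "PixmapCache=1"],
]


def reset_pixmap_cache(run=subprocess.run):
    """Run each command in order; return exit codes (None if a call failed)."""
    results = []
    for cmd in RESET_COMMANDS:
        try:
            proc = run(cmd, capture_output=True, text=True, timeout=10)
            results.append(proc.returncode)
        except (OSError, subprocess.SubprocessError):
            # nvidia-settings is missing, timed out, or could not be started.
            results.append(None)
    return results


if __name__ == "__main__":
    print(reset_pixmap_cache())
```

A crontab entry would also have to give the job access to the X session, for example `*/30 * * * * DISPLAY=:0 python3 /path/to/reset_pixmap_cache.py`. The interval, display name and path are placeholders, and an XAUTHORITY setting may be needed as well.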

> With that said, I am closing this as Invalid. I apologize for any inconvenience
> that my thermal throttling issue might have caused you.
At least I don't feel inconvenienced at all =)