Bug 260248 - kwin lockup with nvidia driver 260*
Summary: kwin lockup with nvidia driver 260*
Status: RESOLVED WORKSFORME
Alias: None
Product: kwin
Classification: Plasma
Component: compositing
Version: unspecified
Platform: Gentoo Packages Linux
Importance: NOR major
Target Milestone: ---
Assignee: KWin default assignee
URL:
Keywords: investigated, triaged
Depends on:
Blocks:
 
Reported: 2010-12-16 08:10 UTC by Jamie Smith
Modified: 2018-09-23 10:13 UTC

See Also:
Latest Commit:
Version Fixed In:


Description Jamie Smith 2010-12-16 08:10:06 UTC
Version:           unspecified (using KDE 4.5.85) 
OS:                Linux

KWin locks up intermittently with the Nvidia driver when OpenGL compositing is enabled. XComposite (non-GL) works fine.

Reproducible: Couldn't Reproduce

Steps to Reproduce:
Not reliably reproducible.

Actual Results:  
..

Expected Results:  
The system should reliably do the opposite of the actual behavior, i.e. not lock up or freeze the cursor.

Tried adjusting the vsync and direct/indirect rendering options and still get a lockup.
Comment 1 Thomas Lübking 2010-12-16 14:21:47 UTC
- how deep is the lockup?
  * can you still "shift+alt+F12" to suspend compositing?
  * if not: can you still "ctrl+alt+F1" to reach VT1?
  * does the pointer keep moving?

- can you link this to a specific effect plugin?
(try to disable "blur"/"sharpen" first, don't use "sharpen" at all, since the nvidia driver can do this better/faster)
Comment 2 Jamie Smith 2010-12-21 00:12:53 UTC
I can not use the keyboard; even numlock doesn't work.

The pointer is frozen. The last time this happened the soundcard also acted 
up.

I will try disabling all effects now to see if it makes the computer any 
more stable; I get a lockup only very intermittently.

It seems to be a bug in the transparency effect. I had wobbly windows and 
transparency both enabled at the time, with no other effects enabled.

It almost looks like a race condition.

Anyway it works flawlessly so far with both fps and wobbly windows effects 
running.


On Thursday December 16 2010 6:21:47 am Thomas Lübking wrote:
> https://bugs.kde.org/show_bug.cgi?id=260248
> 
> 
> Thomas Lübking <thomas.luebking@gmail.com> changed:
> 
>            What    |Removed                     |Added
> ---------------------------------------------------------------------------
>              Status|UNCONFIRMED                 |NEEDSINFO
>          Resolution|                            |WAITINGFORINFO
Comment 3 Thomas Lübking 2010-12-21 00:29:53 UTC
(In reply to comment #2)
> I can not use the keyboard; even numlock doesn't work.
> The pointer is frozen. The last time this happened the soundcard also acted 
> up.

This (esp. the affected sound) sounds much like a halted kernel, i.e. a severe issue in either the kernel, the nvidia kernel module, or your hardware.
(a race in kwin would not prevent you from switching to VT1, nor would it lock the cursor)

Since you're on gentoo you might hit this issue:
https://bugs.kde.org/show_bug.cgi?id=250398

esp. if you've got >= 4GB RAM, run "cat /proc/mtrr". Every line should end with "write-back" or "write-combining" and NONE with "uncachable" or similar.

(please note that any kind of halted kernel would no longer be a kwin issue, as userspace processes should by no means be able to halt the kernel - ever...)
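
The /proc/mtrr check described above can also be scripted. The following is only a minimal sketch: the function name is made up for illustration, and it assumes the usual /proc/mtrr layout in which the cache type is the text after the last colon on each line.

```python
# Cache types named as acceptable in the comment above.
ACCEPTABLE = {"write-back", "write-combining"}


def suspicious_mtrr_lines(mtrr_text):
    """Return the MTRR entries whose cache type is not in ACCEPTABLE.

    Hypothetical helper; assumes the cache type follows the last ':'.
    """
    flagged = []
    for line in mtrr_text.splitlines():
        line = line.strip()
        if not line:
            continue
        cache_type = line.rsplit(":", 1)[-1].strip()
        if cache_type not in ACCEPTABLE:
            flagged.append(line)
    return flagged


if __name__ == "__main__":
    try:
        with open("/proc/mtrr") as handle:
            text = handle.read()
    except OSError:
        text = ""  # no readable MTRR interface on this machine
    for entry in suspicious_mtrr_lines(text):
        print("check this entry:", entry)
```

An empty result only means that no entry has an unexpected cache type; it does not rule out other kernel or driver problems.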
Comment 4 perrantrevan 2011-01-09 11:40:15 UTC
I get a fairly reproducible lockup with nvidia 260* drivers.

I have desktop cube activated by moving the mouse to the bottom-right corner.

The cube effect is smooth but on exiting the effect (by right clicking), many times the desktop becomes unresponsive (the mouse cursor still works).

Sometimes kwin halts desktop effects and I can re-enable them (Alt+Shift+F12).

Sometimes even the keyboard stops responding and I have to hit the power button!

This problem only seems to occur with desktop cube.
Comment 5 Thomas Lübking 2011-01-09 15:15:33 UTC
Do you have a second machine and an sshd on the affected one, so you can check whether you can still ssh into the broken one in this case? (If so, it's "only" an X11 issue - otherwise the kernel is likely halted.)
Comment 6 perrantrevan 2011-01-09 15:36:11 UTC
Unfortunately I don't have a 2nd machine. Would anything useful be in the xorg log?
Comment 7 perrantrevan 2011-01-09 15:40:14 UTC
I use Ubuntu packages and since yesterday have started to use their mainline kernel 2.6.37-999. However, the problem still occurs with the new kernel.
Comment 8 Thomas Lübking 2011-01-09 15:48:13 UTC
While it's alotoffun(tm) to blame ubuntu for shipping low quality stuff, the problem is more likely in the nvidia kernel module.

And no - if the kernel halts, there won't be any log entries, however if it's an X11 thing the nvidia driver /might/ spam /var/log/Xorg.0.log with error messages ("EE") or warnings ("WW"), so maybe

grep -E "EE|WW" /var/log/Xorg.*log*
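
Note that the plain pattern "EE|WW" matches any line containing those letters, for example the "EE" in "SCREEN". A stricter filter that only accepts the parenthesised "(EE)" and "(WW)" markers could look like the sketch below; the function name is hypothetical and the log path is simply the one used in the grep above.

```python
import glob
import re

# Xorg tags each message with a marker in parentheses, e.g. "(EE)" or "(WW)".
MARKER = re.compile(r"\((EE|WW)\)")


def errors_and_warnings(log_text):
    """Return the log lines that carry an (EE) or (WW) marker."""
    return [line for line in log_text.splitlines() if MARKER.search(line)]


if __name__ == "__main__":
    # Same files the grep above would read; silently skip unreadable ones.
    for path in sorted(glob.glob("/var/log/Xorg.*log*")):
        try:
            with open(path, errors="replace") as handle:
                text = handle.read()
        except OSError:
            continue
        for line in errors_and_warnings(text):
            print(f"{path}:{line}")
```

The legend line near the top of each log still matches, since it contains both markers, but informational "(II)" lines no longer do.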
Comment 9 omega 2011-01-09 19:11:19 UTC
I have the same issue.


grep -E "EE|WW" /var/log/Xorg.*log*
/var/log/Xorg.0.log:    (WW) warning, (EE) error, (NI) not implemented, (??) unknown.
/var/log/Xorg.0.log:[    25.308] (WW) The directory "/usr/share/fonts/X11/cyrillic" does not exist.
/var/log/Xorg.0.log:[    25.308] (WW) AllowEmptyInput is on, devices using drivers 'kbd', 'mouse' or 'vmmouse' will be disabled.
/var/log/Xorg.0.log:[    25.308] (WW) Disabling Keyboard0
/var/log/Xorg.0.log:[    25.308] (WW) Disabling Mouse0
/var/log/Xorg.0.log:[    25.309] (II) Loading extension MIT-SCREEN-SAVER
/var/log/Xorg.0.log:[    30.225] (II) XKB: reuse xkmfile /var/lib/xkb/server-CC7E2D49EE1636C9B218267AD3BAD04B12023D75.xkm
/var/log/Xorg.0.log:[    30.260] (WW) Microsoft Microsoft® Nano Transceiver v2.0: ignoring absolute axes.
/var/log/Xorg.0.log:[    30.261] (EE) Microsoft Microsoft® Nano Transceiver v2.0: failed to initialize for relative axes.
/var/log/Xorg.0.log:[    30.261] (WW) Device 'Microsoft Microsoft® Nano Transceiver v2.0' has 37 axes, only using first 36.
/var/log/Xorg.0.log.old:        (WW) warning, (EE) error, (NI) not implemented, (??) unknown.
/var/log/Xorg.0.log.old:[    25.343] (WW) The directory "/usr/share/fonts/X11/cyrillic" does not exist.
/var/log/Xorg.0.log.old:[    25.343] (WW) AllowEmptyInput is on, devices using drivers 'kbd', 'mouse' or 'vmmouse' will be disabled.
/var/log/Xorg.0.log.old:[    25.343] (WW) Disabling Keyboard0
/var/log/Xorg.0.log.old:[    25.343] (WW) Disabling Mouse0
/var/log/Xorg.0.log.old:[    25.344] (WW) Open ACPI failed (/var/run/acpid.socket) (No such file or directory)
/var/log/Xorg.0.log.old:[    25.344] (II) Loading extension MIT-SCREEN-SAVER
/var/log/Xorg.0.log.old:[    30.241] (II) XKB: reuse xkmfile /var/lib/xkb/server-CC7E2D49EE1636C9B218267AD3BAD04B12023D75.xkm
/var/log/Xorg.0.log.old:[    30.271] (WW) Microsoft Microsoft® Nano Transceiver v2.0: ignoring absolute axes.
/var/log/Xorg.0.log.old:[    30.280] (EE) Microsoft Microsoft® Nano Transceiver v2.0: failed to initialize for relative axes.
/var/log/Xorg.0.log.old:[    30.280] (WW) Device 'Microsoft Microsoft® Nano Transceiver v2.0' has 37 axes, only using first 36.
Comment 10 Thomas Lübking 2011-01-09 20:39:35 UTC
nothing spectacular in that log - only some input device (mouse, touchpad - M$ nano thing) has too many axes =D

the only thing is the input device autodetection, but that should not cause "hangs" when leaving some effect.

could you move to VT1 from the situation or did you have to hard reset?
Comment 11 Reartes Guillermo 2011-01-11 17:42:29 UTC
Distribution: Fedora 13 X86_64

Kernel: 2.6.34.7-66.fc13.x86_64
kdebase-workspace-4.5.4-1.fc13.x86_64
nVIDIA Proprietary(P) Drivers: 260.19.29

nVIDIA GT220 (XFX) on an Asus M4N72-E
02:00.0 VGA compatible controller [0300]: nVidia Corporation GT216 [GeForce GT 220] [10de:0a20] (rev a2)

Issue: FREEZE

When desktop effects are enabled, the system randomly freezes. It is a hard freeze, no ssh. The only available
option is to use the SYSRQ feature to power off the system. (So at least some piece of the kernel is still alive.)

I started using these drivers(P) recently, and I had about 3 freezes in less than a week.
It happens mostly when the cursor is hovering over the taskbar to switch to another task. I cannot remember the task
name, probably firefox, opera, kwrite. 

MTRR output (I have 8 GB RAM):
# cat /proc/mtrr 
reg00: base=0x000000000 (    0MB), size= 2048MB, count=1: write-back
reg01: base=0x080000000 ( 2048MB), size= 1024MB, count=1: write-back
reg02: base=0x0c0000000 ( 3072MB), size=  128MB, count=1: write-back

background:
* originally I intended to use nouveau, but the card has random GPU locks (seems common in these models).
  This situation lasted some months after debugging. I did not find a solution. Acceleration was disabled.
* installed the drivers(P), and enabled desktop effects last week. And I noticed the issue very soon...
Comment 12 Thomas Lübking 2011-01-11 21:07:02 UTC
(In reply to comment #11)
> When desktop effects are enabled, system randomly freezes. It is a hard-freeze,
> no ssh. The only available option is to use SYSRQ feature to poweroff the system.

ok, that's for sure a major driver / kernel module issue then.

My impression is that it's in the CUDA or rather VDPAU part since
- all cards supporting them seem affected
- all others seem not
- from the release notes a lot of work went there recently.

Unfortunately i don't know whether it's possible to block those drivers/parts from loading - you could only attempt to move away the related libs
/usr/lib/libcuda.so -> libcuda.so.x.y.z
/usr/X11R6/lib/libvdpau.so.x.y.z
/usr/X11R6/lib/libvdpau_trace.so.x.y.z
/usr/X11R6/lib/libvdpau_nvidia.so.x.y.z

but i don't know whether X11 will still start up if those libs are missing (or nvidia simply disables the feature then)
Comment 13 Jamie Smith 2011-01-14 17:42:13 UTC
I just experienced a lockup with no effects and OpenGL / Direct Rendering, so 
it may not be a problem with any plugin, but may be an issue with the opacity 
code.

I ran fine for ages with XRender and plugins when the bug first manifested in 
a driver update from Nvidia. I think the 200-series was the first 
manifestation. KDE 4.4 I think was also (somewhat topically) the second part 
of the equation.


On January 9 2011 11:11:21 am omega wrote:
> https://bugs.kde.org/show_bug.cgi?id=260248
> 
> 
> omega <biasquez@inwind.it> changed:
> 
>            What    |Removed                     |Added
> ---------------------------------------------------------------------------
>                  CC|                            |biasquez@inwind.it
Comment 14 Reartes Guillermo 2011-02-13 00:01:29 UTC
I also experienced lockups with desktop effects disabled.

Info: 
I run the system with pcie_aspm=off, as a workaround for the sata controller.

I recently noticed that when disabling pcie_aspm the PCIe link speed is 2.5GT/s
and when pcie_aspm is enabled it is 5GT/s.

Also nvidia-settings reports PCIe gen1 (but it is set to auto [actually gen2])

Also I even bought an ATI HD5670 and to my surprise it also freezes and also has the same issue with the link speed. (Tested with the (P) Catalyst)

There are other differences when executing lspci -vvv in a system with pcie aspm enabled and disabled.
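
To compare the link speeds between two such runs, the saved lspci -vvv output can be parsed. This is a rough sketch only: both function names are hypothetical, and it assumes the "LnkCap:" and "LnkSta:" lines in the form pciutils commonly prints them (for example "LnkSta: Speed 2.5GT/s, Width x16, ..."), which normally requires running lspci as root.

```python
import re

# Device header lines start in column 0, e.g. "02:00.0 VGA compatible controller ...".
DEVICE = re.compile(r"^((?:[0-9a-f]{4}:)?[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\s")
# Assumed field layout: "LnkCap: ... Speed 5GT/s, ..." / "LnkSta: Speed 2.5GT/s, ...".
LINK = re.compile(r"\b(LnkCap|LnkSta):.*?Speed ([0-9.]+GT/s)")


def link_speeds(lspci_text):
    """Map each device address to its LnkCap/LnkSta speeds, where present."""
    speeds = {}
    device = None
    for line in lspci_text.splitlines():
        header = DEVICE.match(line)
        if header:
            device = header.group(1)
            continue
        field = LINK.search(line)
        if field and device is not None:
            speeds.setdefault(device, {})[field.group(1)] = field.group(2)
    return speeds


def below_capability(speeds):
    """List devices whose current speed (LnkSta) differs from LnkCap."""
    return sorted(
        dev for dev, s in speeds.items()
        if "LnkCap" in s and "LnkSta" in s and s["LnkCap"] != s["LnkSta"]
    )
```

Running link_speeds on the output captured with pcie_aspm enabled and again with it disabled gives two dictionaries that can be compared per device. A link running below its capability is not necessarily a fault, since some cards lower the link speed while idle.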

The OtherOS (which is no longer in its partition) never crashed with either nv or ati.
Comment 15 Thomas Lübking 2011-02-13 20:01:09 UTC
> I also experienced lockups with desktop effects disabled
That's for sure not related to KWin (which has no impact on painting while compositing is off) but sth. in the kernel (PCI subsystem, I assume from your description and the fact that it applies to all kinds of GPU)
Comment 16 Thomas Lübking 2011-02-13 20:04:10 UTC
@perrantrevan (comment #4)
your issue sounds different from the OP's and related to new bug #266182
Comment 17 RussianNeuroMancer 2011-02-15 07:04:27 UTC
Is this still an issue with the 270 beta driver?
Comment 18 Martin Flöser 2011-04-30 10:54:53 UTC
most likely a driver or distro bug. Is this still reproducible with a newer stack?
Comment 19 Andrew Crouthamel 2018-09-22 01:44:32 UTC
Dear Bug Submitter,

This bug has been in NEEDSINFO status with no change for at least 15 days. Please provide the requested information as soon as possible and set the bug status as REPORTED. Due to regular bug tracker maintenance, if the bug is still in NEEDSINFO status with no change in 30 days, the bug will be closed as RESOLVED > WORKSFORME due to lack of needed information.

For more information about our bug triaging procedures please read the wiki located here: https://community.kde.org/Guidelines_and_HOWTOs/Bug_triaging

If you have already provided the requested information, please set the bug status as REPORTED so that the KDE team knows that the bug is ready to be confirmed.

Thank you for helping us make KDE software even better for everyone!
Comment 20 Jamie Smith 2018-09-22 02:13:56 UTC
The Nouveau driver IIRC doesn't manifest this particular issue. I can't say as much for the nvidia binary driver. 

Closing; please feel free to reopen if this issue is still verifiable.