Hallo, After system startup everything works great. But after running X11 for around 49 days moving windows with click+drag on the titlebar and windows switching using Alt-Tab with kwin_x11 stop working. The sympton is exactly like described in https://bugs.kde.org/show_bug.cgi?id=348569 - only without using Cluster-SSH, only using xterm, Firefox and Chrome: Thomas Lübking 2015-06-02 07:38:39 UTC a) Do windows still have a titlebar? b) Can you move them using "Alt+Left Mousebutton"? c) Can you switch focus using Alt+Tab? d) Can you interact with the remaining windows (with the mouse) at all? Jan-Marten Brüggemann 2015-06-02 07:49:44 UTC a) Yes, I have titlebars and can use the minimize, maximize and close buttons as well. b) No c) No d) Yes, all functions of the remaining windows are normal, the only thing is, I can not move any windows and alt+tab does not work. Putting windows to the front/back by clicking on the title bar of half-overlapping windows still works, as does focus-changing with click or focus-follows mouse. Only moving is not possible anymore, until X11 gets restarted - then it will work again for 49 days, even without a system reboot, and then break again. And this problem is reproduceable on multiple computers and for a quite long time (years). I will not restart the currently affected laptop just yet, in case there are some more infos to find out about the current broken state. The 49 day "delay" until the bug hits makes it quite difficult to quickly reproduce. Greetings Haegar
Sure this is not a duplicate of that bug? To help diagnose the error, you may need to post the output of either: kwin-x11 --replace if you use X11 or kwin-wayland --replace if you use Wayland (which is almost perfect in 5.10 previews!) Plus this is 5.8.5; in neon, also probably in Arch and Parabola, 5.8.6 is available
As the platform of the bug says this is Debian Unstable, which at least right now still has 5.8.5. haegar@aurora:~$ kwin_x11 --replace Module 'org.kde.kwin.decoration' does not contain a module identifier directive - it cannot be protected from external registrations. kwin_core: Compositing is not possible QPainter::begin: Paint device returned engine == 0, type: 3 QPainter::setRenderHint: Painter must be active to set rendering hints QPainter::setWindow: Painter not active QPainter::setClipRect: Painter not active QPainter::begin: Paint device returned engine == 0, type: 3 QPainter::setRenderHint: Painter must be active to set rendering hints QPainter::setWindow: Painter not active QPainter::setClipRect: Painter not active Clicking on a window titlebar to move it gets the output: QXcbConnection: XCB error: 3 (BadWindow), sequence: 6086, resource id: 44119848, major code: 2 (ChangeWindowAttributes), minor code: 0 QXcbConnection: XCB error: 3 (BadWindow), sequence: 7820, resource id: 67109147, major code: 18 (ChangeProperty), minor code: 0 QXcbConnection: XCB error: 3 (BadWindow), sequence: 9398, resource id: 67109148, major code: 18 (ChangeProperty), minor code: 0 QXcbConnection: XCB error: 3 (BadWindow), sequence: 9614, resource id: 67109149, major code: 18 (ChangeProperty), minor code: 0 Not sure that this bug does not have the same source - but as that bug is around Cluster-SSH sending invalid timestamps, which I neither use nor have installed, I decided to open a new one.
This affects me, too, exactly as described. I'm using 32-bit ~x86 Gentoo Linux on a 32-bit system. KWin/Plasma 5.9.2 QT 5.7.1 and 4.8.7 xorg 7.4-r2 Wayland is not installed. I'm using kwin_x11 rather than kwin_wayland If I refrain from either Alt+Tabbing _or_ switching virtual desktops, kwin is perfectly happy to keep resizing and moving windows. If I do _either_ of those things, kwin emits a single "QXcbConnection: XCB error: 3 ... " message to stdout every time I try to move or resize a window. Alt+Tab doesn't produce any messages to stdout. While kwin is failing to resize/move windows, I can: Alter window Z-order (either by clicking on a window's titlebar or using the raise/lower window keyboard shortcuts) Switch between virtual desktops. Create new windows. Close windows. Start krunner (but unlike normal (where it automatically gets keyboard focus) I have to manually click it for it to have keyboard focus). The only "fix" is to restart kwin_x11 and never press alt+tab, or switch virtual desktops. :( I've tried restarting kwin_x11 by doing all of killall kwin_x11 ; sleep $X ; kwin_x11 & kwin_x11 --replace & kstart5 -- kwin_x11 --replace kwrapper5 kwin_x11 --replace & None of them fix the problem. I've gotta use this computer for work, so I'm going to begrudgingly go throw all my state in a trash fire and restart KDE. I politely suggest that someone fire up a VM and leave it running for ~49.5 days to debug this. Some of us have long-running systems that we don't want to have to restart for no reasonable reason. (Don't get me wrong. As a programmer, I get that KDE is a complicated system. I'm also fully aware that most folks working on it are volunteers. :) I'm also aware that the faulty component might not even be _in_ KDE. But, like, _Windows 95_ was affected by this sort of error. Everyone who _ever_ works with anything having to do with time should have learned from that. ;)) I'm more than happy to provide additional information upon request. :)
49 days are 2^32 ms, so it might be because of a timer only using 32 bit.
Waiting for 49 from a dev running the system in a VM won't really help you. All we could know then is that we can reproduce it, that info is not worth to wait two days. The good information is the 32 bit msec timestamp. Those are used a lot on X11 so it's quite likely that they overflow. I try to investigate where they are used and what a smaller timestamp might mean. We do have sanity checks there, maybe they need to be weakened.
> We do have sanity checks there, maybe they need to be weakened. I'm 100% certain that I had uninterrupted KDE4 sessions that ran trouble-free for longer than 50 days. Best of luck in tracking down the errors. Thanks for looking at this!
We do have the following code: void setX11Time(xcb_timestamp_t timestamp) { if (timestamp > m_x11Time) { m_x11Time = timestamp; } } this means if the timestamp wraps around we will never update again. And the x11Time is used for various checks. So this is clearly fishy. I need to check in our git log why we did the sanity checking here, but I'm sure there is a good reason for it.
Relevant commit in KWin is: commit d95ab94f0a664c09223cf1b6d273da41b1cb625e Author: Martin Gräßlin <mgraesslin@kde.org> Date: Thu Feb 19 09:59:21 2015 +0100 Migrate away from QX11Info::appTime The porting to Qt5 broke the timestamp handling in many areas. A deeper look into Qt's xcb plugin shows that the appTime handling is not sufficient for KWin's need. E.g. the time is only updated in response to a property notify event if it's for a Qt created window, which is hardly ever the case in KWin. Another example is that key press/release events never updated the appTime. As the functionality in Qt is rather trivial we can do the timestamp handling ourselves. We filter all events anyway and it is slightly faster as we don't have to go through the QPA interface any more. ---- This means I need to compare how it was done in Qt 4.
Some additional investigation results: both moving windows and Alt+Tab require a keyboard/pointer grab. The xTime() is going into those requests and the documentation says: "The request fails with status InvalidTime if the specified time is earlier than the last-pointer-grab time or later than the current server time. Otherwise, the last-pointer-grab time is set to the specified time, with CurrentTime replaced by the current server time." If we don't update the xTime any more due to overflow we are in the situation that our specified time is "later than current server time". On the other hand if we pass the new wrapped around number we fall into the condition that "specified time is earlier than the last-pointer-grab time". To illustrate: let's assume the last-pointer-grab time is 2^32, Now we increment by 1 and will have a server time of 0. If we send in 0 it is earlier than the last-pointer-grab time. On the other hand if we send in our last state from updating X time we are later than the server time. To a certain degree this is broken in the X11 specification and if the X-Server behaves exactly like described in the spec we won't be able to fix it. KWin cannot use the CurrentTime check as we need to be explicit in the time we request, or we would run into other bugs. So currently what I see is that we need to relax our conditions in order to get the grab still working, but window management will probably still be broken, till some other application requests a pointer grab with 0.
(In reply to Martin Gräßlin from comment #9) > Some additional investigation results: both moving windows and Alt+Tab > require a keyboard/pointer grab. The xTime() is going into those requests > and the documentation says: > > "The request fails with status InvalidTime if the specified time is earlier > than the last-pointer-grab time or later than the current server time. > Otherwise, the last-pointer-grab time is set to the specified time, with > CurrentTime replaced by the current server time." I may be out of my depth (or reading the wrong documentation), but this quote seems relevant: "A timestamp is a time value, expressed in milliseconds. It typically is the time since the last server reset. Timestamp values wrap around (after about 49.7 days). The server, given its current time is represented by timestamp T, always interprets timestamps from clients by treating half of the timestamp space as being later in time than T. One timestamp value, named CurrentTime, is never generated by the server. This value is reserved for use in requests to represent the current server time." via: https://www.x.org/releases/X11R7.7/doc/libX11/libX11/libX11.html#Pointer_Grabbing So it looks like maybe the X11 folks already handle the timestamp rollover case, and kwin is being too clever?
That is the documentation of Xlib, but KWin uses xcb. Mostly the two libraries match, but sometimes there are differences. Also a spec is one thing, the other is how the one and only X-Server behaves. Long story short: I don't trust any x documentation.
(In reply to Martin Gräßlin from comment #11) > That is the documentation of Xlib, but KWin uses xcb. Mostly the two > libraries match, but sometimes there are differences. Fair enough, but the XCB guide [0] says: "For XCB, the documentation relies even more heavily on the protocol specifications. The generated API is an exact mapping to the X protocol; it translates the C call data into X protocol encoded packets as straightforwardly as possible." I would _expect_ that XCB wouldn't do something that runs contrary to the xlib documentation's definition of basic datatypes, such as timestamps. Maybe you've gotten bitten by just such an incompatibility, though? > Also a spec is one thing, the other is how the one and only > X-Server behaves. True, but I expect a few things to be true * xlib's description of how timestamp rollover is handled in the X server is how it's intended to be handled. * The x.org developers working on the parts of the server that handle timestamps aren't careless noobs, so xorg's handling of timestamp rollover is going to match the described behavior. > Long story short: I don't trust any x documentation. That's not an entirely unreasonable stance to have. But, if you can't easily test to check whether the documented behavior matches the implemented behavior, isn't it more pragmatic to code as if the documented behavior is true, and rely on user bug reports (supported by "The implemented behavior didn't match the docs!" prints in the code, where practical) to let you know when the implementation doesn't match the docs? :) [0] https://www.x.org/wiki/guide/xlib-and-xcb/#index5h2
> I would _expect_ that XCB wouldn't do something that runs contrary to the > xlib documentation's definition of basic datatypes, such as timestamps. > Maybe you've gotten bitten by just such an incompatibility, though? Unfortunately yes. Recent case: https://phabricator.kde.org/D5549 Also there are differences in the library. E.g. Xlib is older than 64 bit, so some parts which are in the protocol defined as int32 are set to long in Xlib, which results in nice incompatible code on 64 bit. So yeah it doesn't match 1:1 and in these aspects doesn't match the X protocol. That's why I don't trust the XLib library any more for interaction with X. Anyway I checked the x protocol and timestamp is a card32 which would support the description of XLib. Unfortunately the x protocol doesn't talk about how timestamps are intended to be. > > > Also a spec is one thing, the other is how the one and only > > X-Server behaves. > > True, but I expect a few things to be true > * xlib's description of how timestamp rollover is handled in the X server is > how it's intended to be handled. > * The x.org developers working on the parts of the server that handle > timestamps aren't careless noobs, so xorg's handling of timestamp rollover > is going to match the described behavior. > > > Long story short: I don't trust any x documentation. > > That's not an entirely unreasonable stance to have. But, if you can't easily > test to check whether the documented behavior matches the implemented > behavior, isn't it more pragmatic to code as if the documented behavior is > true, and rely on user bug reports (supported by "The implemented behavior > didn't match the docs!" prints in the code, where practical) to let you know > when the implementation doesn't match the docs? :) well I would like to fix it for good and not have to wait for the next bug fix release + 50 days to figure out whether it worked ;-)
(In reply to Martin Gräßlin from comment #13) > well I would like to fix it for good and not have to wait for the next bug > fix release + 50 days to figure out whether it worked ;-) The Linux kernel also has a counter that could wrap around and produce potential errors. The devs help themselves by setting the initial value of that counter very high, so a wrap occurs soon after boot: /* * Have the 32 bit jiffies value wrap 5 minutes after boot * so jiffies wrap bugs show up earlier. */ #define INITIAL_JIFFIES ((unsigned long)(unsigned int) (-300*HZ)) Could the same be done for the X-Server, if just for debugging and testing?
(In reply to Christian Gerloff from comment #14) > The devs help themselves by setting the initial value of > that counter very high, so a wrap occurs soon after boot: Why's that better than doing what the X server is documented to do, [0] namely: "Timestamp values wrap around (after about 49.7 days). The server, given its current time is represented by timestamp T, always interprets timestamps from clients by treating half of the timestamp space as being later in time than T." [0] And almost certainly _does_ do, given that KWin 4 and numerous other X11 window managers manage to run for > 49.7 days without issue.
Git commit 0bec9ad7337536e319c17c5684d97e1156399fdb by Martin Gräßlin. Committed on 03/05/2017 at 19:36. Pushed by graesslin into branch 'Plasma/5.8'. Improve the x11 timestamp handling Summary: So far KWin only updated the x11 timestamp if the new timestamp is larger than the existing one. While this is a useful thing it creates problems when the 32 bit msec based time stamp wraps around which happens after running an X server for 49 days. After the timestamp wrapped around KWin would not update the timestamp any more and thus some calls might fail. Most prominent victims are keyboard and pointer grab which fails as the timestamp is either larger than the server timestamp or smaller than the last grab timestamp. Another problem related to timestamp handling is KWin getting broken by wrong timestamps sent by applications. A prominent example is clusterssh which used to send a timestamp as unix time which is larger than the x timestamp and thus our timestamp gets too large. This change addresses these problems by allowing to reset the timestamp. This is only used from updateXTime (which is normally invoked before we do things like grabKeyboard). Thus we make QX11Info::getTimestamp the ultimate trusted source for timestamps. Related: bug 348569 FIXED-IN: 5.8.7 Test Plan: As I cannot wait 50 days: unit tests for the two conditions added. Reviewers: #kwin, #plasma Subscribers: plasma-devel, kwin Tags: #kwin Differential Revision: https://phabricator.kde.org/D5704 M +11 -0 autotests/CMakeLists.txt A +123 -0 autotests/test_x11_timestamp_update.cpp [License: GPL (v2)] M +6 -2 main.h M +1 -1 utils.cpp M +3 -3 utils.h https://commits.kde.org/kwin/0bec9ad7337536e319c17c5684d97e1156399fdb
*** Bug 381823 has been marked as a duplicate of this bug. ***
50 days have past on my machine. $ uptime 10:54:38 up 50 days, 4 min Alt-Tab is still working. I can move and resize windows around. Nice. But copying from old windows is still broken. When I copy (mouse select, mouse select - right click - copy, select Ctrl-C) in KDE/Qt-Windows, nothing is copied. No new entry in klipper. And something different then the copied thing when I paste somewhere (Ctrl-V, mouse middle click, mouse right click - paste). It seems specific to those KDE/Qt-Windows, because copying from Firefox, XTerm, GVim still works. And pasting in old KDE/Qt-Windows works too. When I restart the application, it works normally again. Debian-Testing $ dpkg -l | grep kwin ii kwin-addons 4:5.8.5-2 amd64 additional desktop and window switchers for KWin ii kwin-common 4:5.8.7-1 amd64 KDE window manager, common files ii kwin-data 4:5.8.7-1 all KDE window manager data files ii kwin-style-breeze 4:5.8.5-2 amd64 KWin Breeze Style ii kwin-x11 4:5.8.7-1 amd64 KDE window manager, X11 version ii libkwin4-effect-builtins1 4:5.8.7-1 amd64 KDE window manager effect builtins library ii libkwineffects9 4:5.8.7-1 amd64 KDE window manager effects library ii libkwinglutils9 4:5.8.7-1 amd64 KDE window manager gl utils library ii libkwinxrenderutils9 4:5.8.7-1 amd64 KDE window manager render utils library ii qml-module-org-kde-kwindowsystem:amd64 5.28.0-1 amd64 provides integration of QML and KDE frameworks - kwindowsystem
The copy but is a different issue and sounds a lot like a bug in Qt. Please report there.