Bug 425864 - Aurorae-based windecos vanishing with libglvnd
Status: RESOLVED WORKSFORME
Alias: None
Product: kwin
Classification: Plasma
Component: aurorae
Version: git master
Platform: Gentoo Packages Linux
Importance: NOR normal
Target Milestone: ---
Assignee: KWin default assignee
Reported: 2020-08-27 11:30 UTC by Duncan
Modified: 2022-03-23 04:35 UTC
CC List: 2 users

1i5t5.duncan: Wayland-
1i5t5.duncan: X11+


Attachments
debug patch (952 bytes, patch)
2020-11-25 08:24 UTC, Vlad Zahorodnii
kwin_x11 --replace output, switching windeco oxygen > plastik > oxygen (4.86 KB, text/plain)
2020-11-29 17:33 UTC, Duncan
Picture of titlebar issues (565.67 KB, image/png)
2021-02-16 22:18 UTC, freaky

Description Duncan 2020-08-27 11:30:06 UTC
So Gentoo's currently switching from a manual selection of the X/Mesa OpenGL driver (using Gentoo's eselect-opengl tool) to automatic, using libglvnd.  After doing the necessary rebuilds (mesa and xorg-server, plus switching between the mutually blocking eselect-opengl and libglvnd) and restarting X/plasma, my windecos were all transparent!

After various troubleshooting, it seems all aurorae-based windecos, including kwin's native plastik, are affected, while oxygen and breeze work just fine.  Additionally, everything else appears to be fine, so it's only aurorae-based windecos affected.

I did note that with kwin's blur-behind-semi-transparent effect on, the area where the windeco should be was blurred (well, just the titlebar, since I have the no-borders option set).  Additionally, while the titlebar buttons are as invisible as the titlebar itself, they still respond to clicks.  (Of course with blur-behind off, the titlebar area was entirely invisible.)

Hardware/Software:
All frameworks/plasma/kde-apps live-git versions from the gentoo/kde overlay (-9999 master versions), on a Radeon rx460 card (Polaris11), running the native freedomware kernel/mesa/xf86-video-amdgpu drivers.  Possibly relevant version numbers: Gentoo/~amd64, Linux 5.8/5.9-rcs (tested/current), Qt-5.15.0, xorg-server-1.20.8-r1, mesa-20.1.6/20.2.0_rc2 (tested/current).

Further testing revealed that with either the opengl-3.1 or opengl-2.0 backends set in the compositor kcm, aurorae-based titlebars were transparent, while xrender, as expected, rendered fine, tho of course slower and disabling opengl-only effects like wobbly-windows.  Disabling compositing of course rendered the titlebars too, but disabled effects such as zoom and window transparency that I would have serious trouble doing without (especially transparency due to old-eyes).

The gentoo/xorg folks suggested I try restarting kwin_x11 from a terminal window to see if I got any useful diagnostics.  Unfortunately, as I pointed out, most kde apps, kwin_x11 included, spit out all sorts of alarming looking output even when they're (to all appearances) working just fine so such output tends to be effectively useless unless you can get a diff between the output in the good and bad cases.  Fortunately I was able to rebuild multiple times for testing, toggling between eselect-opengl and libglvnd mediated mesa, so getting the good/bad outputs to diff wasn't /too/ horribly difficult, just inconvenient.

Unfortunately even the kwin_x11 good/bad output diff, while smaller, was still incredibly noisy.  Manually cutting it down further based on messages the diff had already excluded, the result was two lines appearing only in the bad/libglvnd case, in order with some noise in between:

QQuickRenderControl::initialize called with incorrect current context

QOpenGLVertexArrayObject::destroy() failed to make VAO's context current

Further testing indicated that (as one might expect) the ::initialize error appears at kwin_x11 --replace load time, while the ::destroy() error appears later, when closing a window.
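
For anyone wanting to repeat the good/bad comparison, the filtering step described above can be sketched roughly as follows. This is only an illustration, not what was actually run: the helper names `normalize` and `only_in_bad` and the masking regex are assumptions made for the example, and nothing here comes from kwin or Qt.

```python
import re

# Hypothetical helpers for illustration only; not part of kwin or Qt.
# Mask hex addresses and plain integers, which differ between otherwise
# identical runs and are the main source of diff noise.
_VOLATILE = re.compile(r"0x[0-9a-fA-F]+|\b\d+\b")


def normalize(line):
    """Return the line with surrounding whitespace stripped and numbers masked."""
    return _VOLATILE.sub("#", line.strip())


def only_in_bad(good_lines, bad_lines):
    """Return lines from the bad run whose normalized form never occurs in
    the good run, in order of first appearance and without repeats."""
    seen_good = {normalize(line) for line in good_lines}
    emitted = set()
    result = []
    for line in bad_lines:
        key = normalize(line)
        if key and key not in seen_good and key not in emitted:
            emitted.add(key)
            result.append(line.strip())
    return result
```

Feeding it the line lists of the two captured outputs (eselect-opengl run as "good", libglvnd run as "bad") should leave only messages specific to the bad case, though some manual pruning may still be needed.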

Typically, regardless of the windeco (so running unaffected breeze/oxygen as well as affected aurorae-based), kwin_x11 --replace will crash a few times then come up with the other-wm dialog.  With no other wms installed I just hit OK with the pre-filled kwin_x11, but by then it has restarted without composite and doesn't crash further.  *But* I can then turn composite back on and it still doesn't crash; aurorae-based windecos simply go transparent, while oxygen/breeze windecos and kwin effects work normally.  I'll only crash kwin again if I do another --replace, at which point it starts the multi-crash, return-with-compositing-disabled, re-enable-compositing to be stable again but with transparent aurorae-based windecos routine all over again.


Given that the bug originally triggered with the Gentoo switch to libglvnd (with the older eselect-opengl masked and set to be removed in a month, now perhaps 3 weeks), I originally filed a bug there, but the bug now appears to be in kwin/aurorae, so I'm filing it here.

Two other points of interest on the Gentoo bug are:

1) A gentoo/kde dev tested as well and couldn't duplicate for whatever reason.  He was running a Radeon of some sort, he didn't say what, and versions he said were similar to mine.  I'd guess he's running a newer Radeon and the difference is either the hardware itself or down to hardware-specific drivers, but of course on gentoo there's all sorts of possible configuration differences as well.

And more interestingly and likely to be of value:

2) The gentoo/xorg/mesa dev suggested the problem was likely an uninitialized opengl context, which the two specific errors in the output suggest as well.  He pointed to this commit correcting a similar context-init omission in xdriinfo as an example of what might be missing/needed.

https://cgit.freedesktop.org/xorg/app/xdriinfo/commit/?id=6273d9dacbf165331c21bcda5a8945c8931d87b8

Not being a dev I really wouldn't know where to start thinking about how to apply that to kwin's aurorae engine, but the general idea does seem to fit the errors I saw in the output and seems reasonable.  And if you give me a patch or simply make the commit in kwin-master I can test it.

Finally, here's the gentoo bug link with various attachments including the full kwin_x11 --replace output, along with the original troubleshooting I've summarized above.

https://bugs.gentoo.org/736916
Comment 1 Duncan 2020-08-27 11:33:30 UTC
Adding the upstream kwin bug URL here as a comment and to the URL field: https://bugs.kde.org/show_bug.cgi?id=425864
Comment 2 Duncan 2020-08-27 11:36:30 UTC
(In reply to Duncan from comment #1)
> Adding the upstream kwin bug URL here as a comment and to the URL field:
> https://bugs.kde.org/show_bug.cgi?id=425864

Oops, wrong bugzi tab! =:^)
Comment 3 Duncan 2020-11-18 13:41:43 UTC
So I've been working on switching to wayland and am far enough along that plasma-wayland's my default environment now.

I got wondering about this bug and forgot that it was X/libglvnd related and that I was on wayland, so thought I'd test it.  Sure enough, I had aurorae-based titlebars again! =:^)  

Then I remembered I was on wayland, and being in a convenient task lull so I could restart plasma in X mode, tested it back on X as well.  Unfortunately the bug continues to exist there. =:^(

Fortunately I've both patched up Oxygen windecos to find them of acceptable height now (bug #425874) and prettier as well, so aurorae-based windecos don't matter so much now, and after adapting various scripts/configs and in some cases adjusting workflow, am finding wayland now usable.  While wayland still has many bugs and some stability issues, it doesn't have /this/ bug, and X-only bugs aren't a big deal for me any more either.

Extra fortunate since this bug doesn't seem to be getting anywhere...  Well if nothing else in a few years as people switch to wayland I suppose it can be resolved/obsolete, but thought I'd add this update anyway, since it does confirm the expected, that the problem doesn't appear on wayland.
Comment 4 Duncan 2020-11-24 10:24:41 UTC
Setting flags +X11 -wayland

Switched to kde/plasma on wayland now.  Generally only starting plasma on X when testing whether some bug showing up on wayland is on X as well, but (as commented on the 18th) the problem was still there on X as of a few days ago.
Comment 5 Vlad Zahorodnii 2020-11-25 08:24:56 UTC
Created attachment 133628
debug patch

Can you apply the attached patch and check if there is any "Aaaaaaargggggghhh! OpenGL context is not valid" message printed in the terminal? (you would need to run kwin_x11 --replace in terminal after applying the patch)
Comment 6 Duncan 2020-11-29 17:33:44 UTC
Created attachment 133731
kwin_x11 --replace output, switching windeco oxygen > plastik > oxygen

(In reply to Vlad Zahorodnii from comment #5)
> Can you apply the attached patch and check if there is any
> "Aaaaaaargggggghhh! OpenGL context is not valid"

No such message (and I double-checked my build log to be sure the patch had applied).  There's some nasty-looking output logged but not that.  But in case it's useful...

Here's the log from a kwin_x11 --replace >kwin.debug 2>&1 , while already having the windeco kcm open so I can immediately test switching to an aurorae-based one.  I did the replace with the windeco set to oxygen, switched it to plastik, applied, switched it back to oxygen, applied, and did another replace to stop logging.

The first set of OpenGL/Mesa/etc info is from the initial replace.

The second is triggered by the apply after switching to plastik.  You can see the nasty-looking output after it from trying to load plastik (my previously preferred aurorae-based blacksquare has similar output, but plastik is the kde-shipped aurorae-based windeco so I did the log with it).

The third is from the apply after switching back to oxygen -- no nasty output.

(The final three lines of XCB BadWindow errors are unrelated; I was launching recursive pdmenus in konsole to run the final replace from there.)
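
A captured log such as kwin.debug can also be checked mechanically instead of by eye. The following is a minimal sketch under stated assumptions: the function name `count_markers` is made up for the example, and the marker strings are simply the debug-patch message from comment #5 and the two Qt warnings quoted in the description.

```python
# Hypothetical helper for illustration only; not part of kwin or Qt.
# Marker strings are taken verbatim from this bug report.
MARKERS = (
    "OpenGL context is not valid",
    "QQuickRenderControl::initialize called with incorrect current context",
    "QOpenGLVertexArrayObject::destroy() failed to make VAO's context current",
)


def count_markers(log_text, markers=MARKERS):
    """Return a dict mapping each marker to the number of log lines containing it."""
    lines = log_text.splitlines()
    return {marker: sum(marker in line for line in lines) for marker in markers}
```

Reading the file with something like `open("kwin.debug", errors="replace").read()` and passing the text in gives a per-message count; a zero for the first marker would match the "no such message" result reported here.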

(I'm pretty much switched to wayland now, having hack-patched a local workaround to the last real irritating bug #429177 on that today.  So I quit plasma/wayland to CLI (no *DM installed here) and start plasma/X to test X-side-only bugs like this, now.  I had my brain full of wayland bugs, workarounds and hack-patches with the switch so it took me a few days to apply this patch and get a log.)
Comment 7 Vlad Zahorodnii 2020-12-11 08:23:54 UTC
> QQuickRenderControl::initialize called with incorrect current context

Okay, this sort of explains why window decorations vanish. But still, it doesn't explain why there's no current opengl context.

Either EffectQuickView fails to create an offscreen surface, or something goes wrong while it tries to make the context current.
Comment 8 freaky 2021-02-16 22:17:21 UTC
Running into this issue as well.

I have a Dell XPS 15 9550, however (Intel/NVIDIA hybrid).  So this seems pretty much independent of the hardware, considering Duncan has AMD.
Comment 9 freaky 2021-02-16 22:18:29 UTC
Created attachment 135745
Picture of titlebar issues
Comment 10 Eugene Shalygin 2021-02-24 21:57:47 UTC
Same problem with an old Intel Haswell: 

Extended renderer info (GLX_MESA_query_renderer):
    Vendor: Intel Open Source Technology Center (0x8086)
    Device: Mesa DRI Intel(R) HD Graphics 4600 (HSW GT2) (0x416)
    Version: 21.0.0
    Accelerated: yes
    Video memory: 1536MB
    Unified memory: yes
    Preferred profile: core (0x1)
    Max core profile version: 4.5
    Max compat profile version: 3.0
    Max GLES1 profile version: 1.1
    Max GLES[23] profile version: 3.1
OpenGL vendor string: Intel Open Source Technology Center
OpenGL renderer string: Mesa DRI Intel(R) HD Graphics 4600 (HSW GT2)
OpenGL core profile version string: 4.5 (Core Profile) Mesa 21.0.0-rc5
OpenGL core profile shading language version string: 4.50
Comment 11 Eugene Shalygin 2022-02-16 09:03:09 UTC
Works on Gentoo after libglvnd upgrade to 1.4.0.
Comment 12 Duncan 2022-02-21 05:57:59 UTC
Anyone else still seeing this bug?  Pending that I'm setting status to NEEDSINFO/WAITINGFORINFO

FWIW I've been wayland-only for a while now and in fact don't have xorg installed at all any longer -- the only X I have is xwayland on kwin_wayland (with weston as a backup compositor), so for me this is obsolete.  I believe the needsinfo/waitingforinfo status will auto-resolve after a few weeks if nobody says they're still seeing it and resets the status.
Comment 13 Bug Janitor Service 2022-03-08 04:35:38 UTC
Dear Bug Submitter,

This bug has been in NEEDSINFO status with no change for at least
15 days. Please provide the requested information as soon as
possible and set the bug status as REPORTED. Due to regular bug
tracker maintenance, if the bug is still in NEEDSINFO status with
no change in 30 days the bug will be closed as RESOLVED > WORKSFORME
due to lack of needed information.

For more information about our bug triaging procedures please read the
wiki located here:
https://community.kde.org/Guidelines_and_HOWTOs/Bug_triaging

If you have already provided the requested information, please
mark the bug as REPORTED so that the KDE team knows that the bug is
ready to be confirmed.

Thank you for helping us make KDE software even better for everyone!
Comment 14 Bug Janitor Service 2022-03-23 04:35:18 UTC
This bug has been in NEEDSINFO status with no change for at least
30 days. The bug is now closed as RESOLVED > WORKSFORME
due to lack of needed information.