Since commit e641022bf9482a11209577b5654cd43231be0755, KWin crashes during startup when compositing is enabled. The crash is caused by the addition of

if (dpy == EGL_NO_DISPLAY)
    dpy = eglGetDisplay(EGL_DEFAULT_DISPLAY);

in libkwineffects/kwinglutils.cpp. When reverting the commit or removing these two lines, everything works fine again.

Right before the crash, I see "radeon: Failed to get PCI ID, error number -13", which comes from Mesa (more precisely, from do_winsys_init() in src/gallium/winsys/radeon/drm/radeon_drm_winsys.c when RADEON_INFO_DEVICE_ID is queried).

Reproducible: Always

Steps to Reproduce:
1. Start KWin with compositing enabled
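For reference, the affected code path boils down to roughly this standalone reduction (my own sketch, not code from KWin; the real initEGL() does more than this):

#include <EGL/egl.h>

int main()
{
    // What the early initEGL() amounts to when no display exists yet:
    // the fallback added by the offending commit. On the affected setup,
    // driver initialization dies below this point, right after printing
    // "radeon: Failed to get PCI ID, error number -13".
    EGLDisplay dpy = eglGetDisplay(EGL_DEFAULT_DISPLAY);
    EGLint major = 0, minor = 0;
    if (dpy == EGL_NO_DISPLAY || !eglInitialize(dpy, &major, &minor))
        return 1;
    eglTerminate(dpy);
    return 0;
}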
Created attachment 96906 [details] Output of "qdbus org.kde.KWin /KWin supportInformation"
Could you please paste the backtrace?
From the Mesa troubleshooting guide:

If you see this error message: radeon: Failed to get PCI ID, error number -13, make sure you have permissions to access the device (usually /dev/dri/card0), and get the latest version of Mesa from git. Prior to this commit: 044de40cb0c6af54d99252f55145972780362afa, you would have seen this error message when running compute programs and X at the same time.
Created attachment 96907 [details] Backtrace
(In reply to Thomas Lübking from comment #3)
> From the Mesa troubleshooting guide:
>
> If you see this error message: radeon: Failed to get PCI ID, error number
> -13, make sure you have permissions to access the device (usually
> /dev/dri/card0), and get the latest version of Mesa from git. Prior to this
> commit: 044de40cb0c6af54d99252f55145972780362afa, you would have seen this
> error message when running compute programs and X at the same time.

Yes, I know. It's not a permission problem, though.
Created attachment 96908 [details] Backtrace with debug symbols
Does it also happen with the latest stable Mesa release? (This very much looks like a driver bug, notably since the EGL call should maybe cause an error, but certainly not a segfault.)

The call is kinda crucial for the Intel drivers atm, so if this is some design issue in the radeon drivers, we'll have to branch drivers here ...
I have only tried with Mesa master. I can also compile the latest stable version, but this will take a bit since I will have to downgrade LLVM, Clang and maybe some other packages first.
What's the output of eglinfo, btw.?
It segfaults in the same function. So it really seems to be a driver issue.
Thank you for your patience. I have sent a patch to mesa-dev that fixes the crash. I still see "radeon: Failed to get PCI ID, error number -13" (both with KWin and eglinfo) but apart from that, both seem to work.
Many thanks for your efforts. Does eglinfo work as expected w/ the patch (beyond not segfaulting ;-)? I'd say do_winsys_init() should not fail in the first place.
It seems so, this is the output I get:

radeon: Failed to get PCI ID, error number -13
EGL API version: 1.4
EGL vendor string: Mesa Project
EGL version string: 1.4 (DRI2)
EGL client APIs: OpenGL OpenGL_ES OpenGL_ES2 OpenGL_ES3
EGL extensions string:
    EGL_EXT_buffer_age EGL_EXT_image_dma_buf_import EGL_KHR_create_context
    EGL_KHR_get_all_proc_addresses EGL_KHR_gl_renderbuffer_image
    EGL_KHR_gl_texture_2D_image EGL_KHR_gl_texture_cubemap_image
    EGL_KHR_image EGL_KHR_image_base EGL_KHR_image_pixmap
    EGL_KHR_surfaceless_context EGL_MESA_configless_context
    EGL_MESA_image_dma_buf_export
EGL client extensions string:
    EGL_EXT_client_extensions EGL_EXT_platform_base EGL_EXT_platform_wayland
    EGL_EXT_platform_x11 EGL_KHR_client_get_all_proc_addresses
    EGL_MESA_platform_gbm
Configurations:
     bf lv colorbuffer dp st  ms    vis   cav bi  renderable  supported
  id sz  l  r  g  b  a th cl ns b    id   eat nd gl es es2 vg surfaces
---------------------------------------------------------------------
0x01 32  0  8  8  8  8  0  0  0 0 0x34325241--         y  y  y       win
0x07 32  0  8  8  8  8 16  0  0 0 0x34325241--         y  y  y       win
0x0d 32  0  8  8  8  8 24  0  0 0 0x34325241--         y  y  y       win
0x13 32  0  8  8  8  8 24  8  0 0 0x34325241--         y  y  y       win
0x19 32  0  8  8  8  8 32  0  0 0 0x34325241--         y  y  y       win
0x1f 24  0  8  8  8  0  0  0  0 0 0x34325258--         y  y  y       win
0x25 24  0  8  8  8  0 16  0  0 0 0x34325258--         y  y  y       win
0x2b 24  0  8  8  8  0 24  0  0 0 0x34325258--         y  y  y       win
0x31 24  0  8  8  8  0 24  8  0 0 0x34325258--         y  y  y       win
0x37 24  0  8  8  8  0 32  0  0 0 0x34325258--         y  y  y       win
0x79 16  0  5  6  5  0  0  0  0 0 0x36314752--         y  y  y       win
0x7f 16  0  5  6  5  0 16  0  0 0 0x36314752--         y  y  y       win
0x85 16  0  5  6  5  0 24  0  0 0 0x36314752--         y  y  y       win
0x8b 16  0  5  6  5  0 24  8  0 0 0x36314752--         y  y  y       win
0x91 16  0  5  6  5  0 32  0  0 0 0x36314752--         y  y  y       win

I'm not sure if it is ok to fail or not. I will have to look into it a bit more...

Just for reference, a link to the patch:
http://lists.freedesktop.org/archives/mesa-dev/2016-January/106102.html
I'd say you should have more client extensions (this is what the intel driver spits out in the EGL_NO_DISPLAY case)
The output seems to be missing?

I wonder if the problem has something to do with render nodes. The platform code opens /dev/dri/card0 when no display is supplied, but the pipe-loader code seems to open a render node (/dev/dri/renderD*). The RADEON_INFO ioctl that fails has DRM_AUTH|DRM_RENDER_ALLOW as access restrictions. This means the ioctl can be called from render nodes, but I am not completely sure what DRM_AUTH implies and whether it can cause the call to fail when using /dev/dri/card0.
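For illustration, a minimal standalone probe of that same ioctl path (my own sketch, not code from Mesa; it assumes the libdrm headers, i.e. building with -I/usr/include/libdrm and -ldrm):

#include <cstdint>
#include <cstdio>
#include <fcntl.h>
#include <xf86drm.h>     // drmCommandWriteRead()
#include <radeon_drm.h>  // DRM_RADEON_INFO, RADEON_INFO_DEVICE_ID

int main()
{
    // Open the primary node, as the EGL drm platform does without a display.
    int fd = open("/dev/dri/card0", O_RDWR);
    if (fd < 0)
        return 1;

    uint32_t device_id = 0;
    struct drm_radeon_info info = {};
    info.request = RADEON_INFO_DEVICE_ID;
    info.value = (uintptr_t)&device_id;

    // DRM_AUTH|DRM_RENDER_ALLOW: allowed on a render node, but on a card
    // node the caller must first be authenticated against the DRM master --
    // otherwise this returns -EACCES (-13), matching the message above.
    int ret = drmCommandWriteRead(fd, DRM_RADEON_INFO, &info, sizeof(info));
    printf("RADEON_INFO_DEVICE_ID -> %d (0x%x)\n", ret, device_id);
    return 0;
}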
Created attachment 96909 [details]
Mesa patch to prefer render nodes

OK, I get significantly more output and no error with the attached patch. But I am not sure if this is the right thing to do, or if it is supposed to work with /dev/dri/card0...
This call happens very early, in particular before creating the context, since we want to know whether there's support for the buffer_age extension to flag the preservation bit. The situation for eglinfo will be likewise (it simply doesn't require a context anyway, let alone an X drawable), and the result is the option of render nodes:
https://dvdhrm.wordpress.com/2013/09/01/splitting-drm-and-kms-device-nodes/

> fd = loader_open_device("/dev/dri/renderD128");

I assume a hardcoded string won't be accepted ;-)

The code in question looks like it unconditionally tries /dev/dri/renderD0; that one's likely occupied, and in return it resorts to card0.

What *should* (probably, I'm no expert) happen is to try all available render nodes (i.e. loop from 0-128?) until an available one is found?!
> The code in question looks like it unconditionally tries /dev/dri/renderD0;
> that one's likely occupied, and in return it resorts to card0.
>
> What *should* (probably, I'm no expert) happen is to try all available
> render nodes (i.e. loop from 0-128?) until an available one is found?!

The current code just tries to open /dev/dri/card0 twice (on Linux), because

int n = snprintf(buf, sizeof(buf), DRM_DEV_NAME, DRM_DIR_NAME, 0);

just copies that string into buf. So it does not use render nodes at all. Of course, the path of a render node could be calculated in a similar way - the hardcoded path was mostly just to see if it works.
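A loop like the one suggested above could look roughly like this (my own sketch, not the actual Mesa loader code; note that render node minors start at 128, so the conventional range is /dev/dri/renderD128 through /dev/dri/renderD191 rather than 0-128):

#include <cstdio>
#include <fcntl.h>
#include <unistd.h>

// Try every conventional render node, fall back to the primary node.
// Purely illustrative.
static int open_any_drm_node()
{
    char buf[64];
    for (int minor = 128; minor < 192; ++minor) {
        snprintf(buf, sizeof(buf), "/dev/dri/renderD%d", minor);
        int fd = open(buf, O_RDWR);
        if (fd >= 0)
            return fd; // first render node that exists and is accessible
    }
    // No render node available (old kernel?): resort to the card node.
    return open("/dev/dri/card0", O_RDWR);
}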
The bug is that you end up in that function at all - it should be dri2_initialize_surfaceless(), but I can't say why atm. Respectively, this one should fail and then the next one be tried (until the surfaceless one succeeds).
No, the current code tries all EGL drivers (which is only drm), but it only tries one platform for each driver. This platform is either what is set by the EGL_PLATFORM env var or, if that is empty, whatever was passed first to the configure script in --with-egl-platforms. In my case this was drm, and I was not aware that the order matters.
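In other words, the selection boils down to something like this (a simplified sketch of the behaviour described above, not the actual Mesa source):

#include <cstdlib>

// Which platform will eglGetDisplay() use? (simplified)
static const char *egl_native_platform()
{
    const char *plat = getenv("EGL_PLATFORM");
    if (plat && *plat)
        return plat; // explicit override wins
    return "drm";    // otherwise: the first entry given to
                     // --with-egl-platforms at build time
}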
Maybe KWin should set EGL_PLATFORM to x11 or wayland, depending on where it runs. This is at least what other compositors do.
Or even better, use eglGetPlatformDisplayEXT to get a display for the right platform (if available).
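A minimal sketch of that approach (my own example, not KWin code; it checks the client extension string on EGL_NO_DISPLAY before trusting the entry point):

#include <cstring>
#include <EGL/egl.h>
#include <EGL/eglext.h>

// Ask EGL for an X11 display explicitly instead of letting eglGetDisplay()
// guess the platform. Falls back to the old call if the extension is absent.
static EGLDisplay get_x11_display(void *x11Display)
{
    const char *clientExts = eglQueryString(EGL_NO_DISPLAY, EGL_EXTENSIONS);
    if (clientExts && strstr(clientExts, "EGL_EXT_platform_x11")) {
        auto getPlatformDisplay = reinterpret_cast<PFNEGLGETPLATFORMDISPLAYEXTPROC>(
            eglGetProcAddress("eglGetPlatformDisplayEXT"));
        if (getPlatformDisplay)
            return getPlatformDisplay(EGL_PLATFORM_X11_EXT, x11Display, nullptr);
    }
    return eglGetDisplay(static_cast<EGLNativeDisplayType>(x11Display));
}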
The relevant code (initEGL()) is platform agnostic in libkwineffects - we'd either have to transfer it into the platform implementations to make use of eglGetPlatformDisplayEXT (or pass it down alongside the EGL_EXT_platform* tests, incl. an explicit query for client extensions on EGL_NO_DISPLAY).

=> Simply setting EGL_PLATFORM avoids the crash?

I'm however not sure whether that's a general solution to the problem (see the eglinfo situation).
(In reply to Thomas Lübking from comment #23)
> The relevant code (initEGL()) is platform agnostic in libkwineffects - we'd
> either have to transfer it into the platform implementations to make use of
> eglGetPlatformDisplayEXT (or pass it down alongside the EGL_EXT_platform*
> tests, incl. an explicit query for client extensions on EGL_NO_DISPLAY)

Yes, that's why I haven't proposed a patch yet...

> => Simply setting EGL_PLATFORM avoids the crash?

The crash is already avoided by the patch I sent to mesa-dev. But simply setting EGL_PLATFORM makes Mesa choose another platform that uses render nodes instead of /dev/dri/card0, so I no longer see "radeon: Failed to get PCI ID, error number -13". Also, it would avoid the crash even if my patch were not applied to Mesa.

EGL_PLATFORM was a workaround by Mesa for this problem before eglGetPlatformDisplayEXT was around. It still works, but I think the latter is the cleaner solution.

> I'm however not sure whether that's a general solution to the problem (see
> the eglinfo situation)

I'm afraid there is not really a general solution, because unfortunately eglGetDisplay made it into the standard in its current form without considering that there might be drivers with multiple platforms. Applications should probably use eglGetPlatformDisplayEXT when possible. And for all that don't, distributions should set a sane default backend that will hopefully not fail (surfaceless?).
The problem should only be in the EglOnXBackend, so a

qputenv("EGL_PLATFORM", "x11");

could be added, if it helps.
(In reply to Martin Gräßlin from comment #25)
> The problem should only be in the EglOnXBackend, so a
>
> qputenv("EGL_PLATFORM", "x11");
>
> could be added, if it helps.

+1

We might remove the early initEGL() once we expect buffer_age support, and/or drop EGL_SWAP_BEHAVIOR_PRESERVED_BIT otherwise.

@Niels, is this sufficient (the nvidia blob resolves some environment on loading only ...)?

diff --git a/eglonxbackend.cpp b/eglonxbackend.cpp
index 63ef6e7..eca6e23 100644
--- a/eglonxbackend.cpp
+++ b/eglonxbackend.cpp
@@ -89,6 +89,7 @@ EglOnXBackend::~EglOnXBackend()
 
 void EglOnXBackend::init()
 {
+    qputenv("EGL_PLATFORM", "x11");
     initEGL(); // required to toggle
     initBufferAge(); // EGL_SWAP_BEHAVIOR_PRESERVED_BIT
     if (!initRenderingContext()) {
Yes, this would be sufficient - for Mesa at least, since it queries the environment variable in eglGetDisplay. We already use EGL_PLATFORM for the hwcomposer backend and the Wayland QPA, so it should be fine to also use it here.
https://git.reviewboard.kde.org/r/126958/
Git commit 176f492085f4a7f159cd78e525e1d52701beb51b by Thomas Lübking.
Committed on 02/02/2016 at 10:06.
Pushed by luebking into branch 'master'.

export EGL_PLATFORM as x11 in eglonxbackend

required because the early initEGL to detect buffer_age support
can cause usage of the wrong driver in eglGetDisplay()

FIXED-IN: 5.6
REVIEW: 126958

M  +1    -0    eglonxbackend.cpp

http://commits.kde.org/kwin/176f492085f4a7f159cd78e525e1d52701beb51b