Bug 397834

Summary: Native Firefox/Wayland port works poorly on Wayland Plasma
Product: [Plasma] kwin Reporter: Martin Stransky <stransky>
Component: wayland-generic Assignee: KWin default assignee <kwin-bugs-null>
Status: CLOSED UPSTREAM    
Severity: normal CC: andreamtp+bz, bugseforuns, christian.rohmann, Ivan.qrt, kde, kode54, leonard, maggu2810, matejm98mthw, nate, nortexoid, rainer, sawyerbergeron
Priority: NOR    
Version: 5.16.0   
Target Milestone: ---   
Platform: Other   
OS: Linux   
See Also: https://bugs.kde.org/show_bug.cgi?id=404836
Latest Commit: Version Fixed In:
Sentry Crash Report:
Attachments: Firefox Wayland with hidden surface
Log of "frozen" error dialog
Wayland debug output on Kwin
Wayland debug output on Weston

Description Martin Stransky 2018-08-24 12:33:45 UTC
Follow up from https://bugzilla.mozilla.org/show_bug.cgi?id=1478283

Hi Guys,

I work on the Firefox/Wayland port and I have trouble making it functional under the Plasma Wayland compositor. I have a stock Fedora 28 build with all updates, packages:

plasma-*-5.13.4-1.fc28.x86_64 
qt5-*-5.10.1-1.fc28.x86_64

and Intel(R) HD Graphics 530

Firefox Wayland can be installed from https://firefox-flatpak.mojefedora.cz/ as a flatpak or built from source with this line in .mozconfig:

ac_add_options --enable-default-toolkit=cairo-gtk3-wayland

As for the issues:

1) With the standard (non-accelerated) config, Firefox creates a wl_subsurface which is attached to the wl_surface owned by the GtkWidget. That works on mutter/weston, but Plasma ignores it and just a blank window is shown instead of the Firefox window. Running with WAYLAND_DEBUG doesn't show any error/failure.

2) When accelerated graphics is used (egl_window), Firefox content is rendered but mouse input is ignored; the mouse pointer reacts to content under the Firefox window.

I appreciate any help and/or guidance on how to debug and fix those issues on Plasma.

Thanks!
Comment 1 Rainer Finke 2018-08-24 19:04:23 UTC
I would like to confirm these issues, tested with AMD and Intel modesetting drivers.

Arch Linux
Linux 4.18.4
Mesa-git
Kwin-git
Kwayland-git
Qt 5.11.1
KDE Frameworks 5.49
XWayland 1.20.1
Comment 2 Rainer Finke 2018-09-08 13:05:36 UTC
Created attachment 114837 [details]
Firefox Wayland with hidden surface

Attached is a screenshot of Firefox Wayland where only a blank page is shown on Plasma. Interestingly the Kwin debug console does show that there is some content rendered by Firefox but it is not shown in the main window (Kwin debug console will collapse all surfaces as soon as I try to make a screenshot with spectacle, so it is not visible in the screenshot).
Somehow the mouse input is not working either. The window borders should be shown as soon as Gtk is updated to 3.24; not sure if it will change anything else for the Firefox Wayland build though.
Is there a possibility to get some additional output to see where something is going wrong?
Comment 3 Martin Flöser 2018-09-08 15:12:01 UTC
Please provide the output of Wayland debug. I assume there's a subtle protocol error - we had similar problems with Qt.
Comment 4 David Edmundson 2018-09-08 15:50:52 UTC
Ideally debug of both working on mutter and the non-working kwin case.
Comment 5 David Edmundson 2018-09-08 16:10:37 UTC
On my system with the flatpak I get a toplevel with
"Nightly closed unexpectedly whilst starting. blah blah" and two buttons.

and it acts frozen.
Could be good as it's a simpler test case of something else broken.

This window responds to pings
Debug shows I'm sending motion and buttons events as you'd expect
I'm not seeing any damage events.
Comment 6 David Edmundson 2018-09-08 16:15:40 UTC
Created attachment 114840 [details]
Log of "frozen" error dialog

WAYLAND_DEBUG Log of just a firefox error dialog that isn't responding
Comment 7 Rainer Finke 2018-09-08 17:12:59 UTC
Created attachment 114842 [details]
Wayland debug output on Kwin
Comment 8 Rainer Finke 2018-09-08 17:16:21 UTC
Created attachment 114843 [details]
Wayland debug output on Weston

I had to cut the file, as on Weston too much output was created just by moving the mouse.

Tried to test this on Gnome on Wayland, but Gnome crashed too fast with my AMD GPU, may have to try with Intel.
Comment 9 David Edmundson 2018-09-10 08:08:16 UTC
To confirm this is for the "non-accelerated" mode in 1?

@MartinS can you clarify how one switches between the egl_window and raster version (the 1 and 2 in your first post)
Comment 10 Martin Stransky 2018-09-10 08:18:22 UTC
(In reply to David Edmundson from comment #9)
> @MartinS can you clarify how one switches between the egl_window and raster
> version (the 1 and 2 in your first post)

1) sw rendering via wl_subsurface attached to mGdkWindow - set layers.acceleration.force-enabled = false at about:config

2) EGL rendering via egl_window - set layers.acceleration.force-enabled = true at about:config

Go to about:support to see which rendering path is used; it's in the "Compositing" field.
Comment 11 David Edmundson 2018-09-11 20:55:29 UTC
> ...rendered but mouse input is ignored, mouse pointer reacts to content under Firefox window.

I've been investigating this first as I have a raster dialog on my startup.



Firefox uses subsurfaces.

[1911371.914]  -> zxdg_shell_v6@17.get_xdg_surface(new id zxdg_surface_v6@31, wl_surface@30)

[1911388.950]  -> wl_subcompositor@29.get_subsurface(new id wl_subsurface@41, wl_surface@40, wl_surface@30)

Here we have surface 30 as the parent which is the toplevel, surface 40 is the subsurface

our pointer only enters surface 40
wl_pointer@3.enter(4096, wl_surface@40, 323.000000, 122.000000)

as it fills the whole area we never enter surface 30


gdkdevice-wayland.c:1507

pointer_handle_enter

has the line 
  if (!GDK_IS_WINDOW (wl_surface_get_user_data (surface)))
    return;

our subsurface is not a window so we fail early. It then ignores all mouse events as it thinks no window has focus.
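
A toy model of that behaviour, as a hedged sketch only: `GdkWindowStub` and `Seat` are hypothetical stand-ins, not GTK types, and the button handling is an assumption based on the description above ("it thinks no window has focus").

```python
class GdkWindowStub:
    """Stand-in for a GdkWindow stored as wl_surface user data."""


class Seat:
    """Tracks pointer focus the way the quoted check implies."""

    def __init__(self):
        self.pointer_focus = None
        self.delivered = []

    def handle_enter(self, surface_user_data):
        # Mirrors the quoted check: bail out unless the user data is a window.
        if not isinstance(surface_user_data, GdkWindowStub):
            return
        self.pointer_focus = surface_user_data

    def handle_button(self, button):
        # Assumption: without a focused window the event is dropped.
        if self.pointer_focus is None:
            return False
        self.delivered.append(button)
        return True


seat = Seat()
seat.handle_enter(object())        # custom user data, as on the Firefox subsurface
print(seat.handle_button(272))     # 272 is BTN_LEFT

seat.handle_enter(GdkWindowStub()) # user data that passes the window check
print(seat.handle_button(272))
```

The first click is dropped because the enter event never established focus; the second is delivered.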

Our code is definitely going out of its way to make sure we send things to the subsurface; docs are a bit unclear on whether it's right or not.
Comment 12 Martin Flöser 2018-09-12 07:08:49 UTC
The last time we ran into such a situation, clarification on the wayland mailing list showed that my interpretation was correct and that e.g. Weston was wrong.

Can you please clarify whether an enter should be sent to the parent? IMHO it doesn't make sense as we would have to send a leave directly afterwards when entering the subsurface.
Comment 13 David Edmundson 2018-09-12 10:44:24 UTC
Reading GTK code a bit more.

Subsurfaces are still windows (just like in Qt); GDK_IS_WINDOW is just a cast check on the wl_surface userdata.
There's no sensible reason that should be failing for any GDK windows inside windows.

/but/ in firefox code (moz_container_map_surface) we're creating surfaces directly with custom userdata that GTK doesn't directly know about.

FWIW bodging kwayland to send enter/exit to the root surface (https://phabricator.kde.org/P259) does make firefox clicks work.

However, I'm now quite confident that the clicks are a bug in Firefox as above.
Comment 14 David Edmundson 2018-09-13 22:43:17 UTC
Edit:

Nope. It is a kwin bug.

I was right about the surfaces and everything, but the mozilla surfaces don't set an input region. Which means that when we click in them, we *should* be selecting the parent GTK window. KWayland checks the subsurface based on visual region.
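
The difference between the two hit-testing strategies can be sketched as follows. This is an illustration only, not KWayland code: the `Surface` class and `pick` function are hypothetical, the 800x600 sizes are arbitrary, and it assumes the Firefox subsurface accepts no input at all (modelled here as an empty input region).

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Rect = Tuple[int, int, int, int]  # x, y, width, height in surface-local coordinates


def contains(rect: Rect, x: float, y: float) -> bool:
    rx, ry, rw, rh = rect
    return rx <= x < rx + rw and ry <= y < ry + rh


@dataclass
class Surface:
    name: str
    geometry: Rect                      # position relative to the parent, plus size
    input_region: Optional[List[Rect]]  # None means the whole surface accepts input
    children: List["Surface"] = field(default_factory=list)


def pick(surface: Surface, x: float, y: float, use_input_region: bool) -> Optional[str]:
    """Return the name of the topmost surface under (x, y), or None.

    (x, y) are given in the coordinate space of the surface's parent.
    """
    sx, sy, w, h = surface.geometry
    lx, ly = x - sx, y - sy
    # Subsurfaces are stacked above their parent; the last child is on top.
    for child in reversed(surface.children):
        hit = pick(child, lx, ly, use_input_region)
        if hit is not None:
            return hit
    if not (0 <= lx < w and 0 <= ly < h):
        return None
    if use_input_region and surface.input_region is not None:
        if not any(contains(r, lx, ly) for r in surface.input_region):
            return None
    return surface.name


subsurface = Surface("wl_surface@40", (0, 0, 800, 600), input_region=[])
toplevel = Surface("wl_surface@30", (0, 0, 800, 600), input_region=None,
                   children=[subsurface])

print(pick(toplevel, 323, 122, use_input_region=False))  # visual-region lookup
print(pick(toplevel, 323, 122, use_input_region=True))   # input-region lookup
```

With the visual-region lookup the pointer lands on the subsurface; once the input region is honoured it falls through to the parent toplevel, which is the surface GTK knows about.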
Comment 15 Markus Rathgeb 2018-09-25 22:08:59 UTC
If it is a kwin bug, any chance to get this fixed in 5.14?
Should https://phabricator.kde.org/P259 already improve the situation?
Comment 16 David Edmundson 2018-09-27 09:15:28 UTC
We are not merging that paste. There is a better approach, it will be merged when it's ready.

There is still the separate SHM issue to sort out.
On that it seems Firefox simply stops sending buffers. I've yet to determine why.
Comment 17 Martin Stransky 2018-09-27 10:56:14 UTC
(In reply to David Edmundson from comment #16)
> There is still the separate SHM issue to sort out.
> On that it seems Firefox simply stops sending buffers. I've yet to determine
> why.

Firefox has two wl_buffers and does double buffering. Under KDE the buffers are usually not released in time, so there's no buffer available to draw to and Firefox does not send another buffer.

When I tried to use as many buffers as possible (always create a new buffer when there's one at compositor) Firefox created ~100 wl_buffers but without any visible result.
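
The stall described here can be reproduced with a small simulation. This is a rough sketch with hypothetical names (`BufferPool`, `run_frames`); it assumes a compositor that, when behaving, releases the previous buffer once a new one has been committed, and it models neither Firefox nor kwin.

```python
class BufferPool:
    """Client-side pool of wl_buffer stand-ins (plain integers)."""

    def __init__(self, size):
        self.free = list(range(size))
        self.busy = []

    def acquire(self):
        """Take a free buffer to draw into, or None if the compositor holds them all."""
        if not self.free:
            return None
        buf = self.free.pop(0)
        self.busy.append(buf)
        return buf

    def release(self, buf):
        """Called when the compositor sends wl_buffer.release for buf."""
        self.busy.remove(buf)
        self.free.append(buf)


def run_frames(pool, frames, compositor_releases):
    """Try to submit `frames` frames and return how many were actually sent."""
    submitted = 0
    held = []  # buffers currently held by the compositor, oldest first
    for _ in range(frames):
        buf = pool.acquire()
        if buf is None:
            continue  # nothing to draw into, so this frame is dropped
        submitted += 1
        held.append(buf)
        if compositor_releases and len(held) > 1:
            # The previous buffer is handed back once a newer one is in use.
            pool.release(held.pop(0))
    return submitted


print(run_frames(BufferPool(2), 10, compositor_releases=True))
print(run_frames(BufferPool(2), 10, compositor_releases=False))
```

With releases arriving, two buffers are enough for every frame. Without them, the client sends exactly as many frames as it has buffers and then stops, and a larger pool only postpones that point.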
Comment 18 Markus Rathgeb 2018-09-27 11:13:40 UTC
(In reply to David Edmundson from comment #16)
> We are not merging that paste. There is a better approach, it will be merged
> when it's ready.

Sure, merge the solution that is clean and fits the kwin architecture.

As I am using Gentoo it is pretty easy for me to try such patches.
I would like to use Firefox Wayland on KDE very much and I will try to help with testing code changes (kwin, kwayland, firefox, ...).


I merged the above patch for kwayland myself, rebuilt and installed it on my system.
But Firefox/Wayland doesn't work better on my system. It is still unusable using the Wayland backend on kwin.

(FTR: I build the firefox nightly flatpak myself to use the most recent Gnome SDK to get GTK+ 3.24).
Comment 19 David Edmundson 2018-10-08 08:16:31 UTC
Git commit b4cd89ea4977609ba17634d62efd7b1bc8bd112a by David Edmundson.
Committed on 08/10/2018 at 08:15.
Pushed by davidedmundson into branch 'master'.

Don't silently error if damage is sent before buffer

Summary:
Firefox sends

wl_surface@37.damage(0, 0, 808, 622)
wl_surface@37.attach(wl_buffer@34, 0, 0)

Which we silently treat as an error.

There's nothing in the spec to forbid this. The only thing that matters
is the state on commit. This moves a check there.

Test Plan:
Had a debug in there which was being activated
Gets firefox slightly further (but not complete)

Reviewers: #kwin

Subscribers: kde-frameworks-devel

Tags: #frameworks

Differential Revision: https://phabricator.kde.org/D15912

M  +1    -5    src/server/surface_interface.cpp

https://commits.kde.org/kwayland/b4cd89ea4977609ba17634d62efd7b1bc8bd112a
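
The idea behind that commit, that requests only accumulate pending state and validation belongs at commit time, can be sketched like this. It is a simplified illustration, not the code from surface_interface.cpp: the `SurfaceState` class is hypothetical and detaching a buffer is not modelled.

```python
class SurfaceState:
    """Double-buffered surface state: requests fill 'pending', commit applies it."""

    def __init__(self):
        self.pending_buffer = None
        self.pending_damage = []
        self.current_buffer = None
        self.current_damage = []

    def attach(self, buffer):
        self.pending_buffer = buffer

    def damage(self, x, y, width, height):
        # Only recorded here; whether a buffer exists is not checked at request time.
        self.pending_damage.append((x, y, width, height))

    def commit(self):
        if self.pending_buffer is not None:
            self.current_buffer = self.pending_buffer
        # Damage is only meaningful if a buffer is in place after the commit.
        if self.current_buffer is None:
            self.current_damage = []
        else:
            self.current_damage = self.pending_damage
        self.pending_buffer = None
        self.pending_damage = []


surface = SurfaceState()
surface.damage(0, 0, 808, 622)    # same order as the trace: damage first...
surface.attach("wl_buffer@34")    # ...then attach
surface.commit()
print(surface.current_buffer, surface.current_damage)
```

Because the check happens in `commit`, the order of `damage` and `attach` before the commit no longer matters.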
Comment 20 David Edmundson 2018-10-11 15:02:54 UTC
On the SHM subsurfaces, I've not got anywhere so far. 

We seem to end up in the code path where Firefox doesn't have a buffer available and waits. However FF should be coping with that state. Something it'd be good for Martin S to check.

It's interesting to note that the weston-subsurface test on kwin goes berserk. On weston the GL icon spins smoothly; on kwin it looks like it's animating at a million fps (even after removing that Qt workaround that sent an early frameRendered). That'll be something unusual on the kwin side.

Hopefully that's related, because it's a much easier test piece. 

I'm also going to add basic subsurface support to the testrenderingserver in kwayland/tests in the hope that will narrow things down.
Comment 21 Martin Flöser 2018-10-11 16:58:39 UTC
what's the weston-subsurface test called? I cannot find it in the installed weston package
Comment 22 David Edmundson 2018-10-11 18:15:28 UTC
weston-subsurfaces, but I think that's a red herring.

I wrote subsurface support in testrenderingserver; weston-subsurfaces works at normal speed, but firefox continues to be broken.

That's good as it makes things easier to narrow down.
Comment 23 Martin Stransky 2018-10-12 07:23:14 UTC
(In reply to David Edmundson from comment #20)
> We seem to end up in the code path where Firefox doesn't have a buffer
> available and waits. However FF should be coping with that state. Something
> it'd be good for Martin S to check.

I can patch Firefox to use more buffers if that helps you. 

Also, Firefox draws to Wayland in an extra thread (the compositor thread), and the wl_surfaces/buffers are created/attached there. Maybe the event loop on the compositor thread is wrongly configured/handled, so events are not propagated between the Firefox compositor thread and the Wayland compositor and thus the buffers are not released/attached?
Comment 24 Martin Stransky 2018-10-12 07:27:19 UTC
(In reply to David Edmundson from comment #20)
> On the SHM subsurfaces, I've not got anywhere so far. 
> 
> We seem to end up in the code path where Firefox doesn't have a buffer
> available and waits. However FF should be coping with that state. Something
> it'd be good for Martin S to check.

On second thought, a missing buffer should lead to frame loss and not a complete lock-up. I expect the wl_buffer is released after some time, when the compositor finishes drawing, and is then available again, isn't it?

If the wl_buffers are locked by compositor indefinitely then there's something obviously wrong there.
Comment 25 David Edmundson 2018-10-12 13:23:16 UTC
I've updated kwayland's testRenderingServer to support very subcompositors and XDGShell. Branch is  davidedmundson/test_render_subcomp

It still has a layer over libwayland, but it reduces a lot of the complexity and noise.

Interestingly it works in QtWayland's minimal examples; the wayland traces between the two are near identical, up until one stops.
Comment 26 Martin Stransky 2018-10-24 08:47:15 UTC
Guys, is there any fixed kwin version I can test on Fedora 29?

Also there's a new Firefox wayland build at https://firefox-flatpak.mojefedora.cz/ which uses triple buffering.
Comment 27 Christopher Snowhill 2018-12-14 08:58:13 UTC
Two more months, any additional news?
Comment 28 Markus Rathgeb 2018-12-14 09:04:21 UTC
The last news: https://bugzilla.mozilla.org/show_bug.cgi?id=1478283#c12
Comment 29 David Edmundson 2019-01-25 18:42:43 UTC
Update:

I just reran my previously failing test server in kwayland branch davidedmundson/test_render_subcomp  

and now things seem to be working there

So there has been some sort of important low-level fix on the FF side!

Running in kwin still fails, but that means it's back on us to try stuff again now there's been some movement.
Comment 30 Martin Stransky 2019-01-28 09:43:57 UTC
Good to hear so.

btw. Mozilla nightly builds are now created with Wayland support, you just need to set GDK_BACKEND=wayland.
Comment 31 Matej Mrenica 2019-05-21 12:52:57 UTC
I have tried FF Nightly on Plasma 5.15.90 and it works as well as, or even better than, Xorg or Xwayland, as long as you don't try to resize or move the window. Also, the "Maximize" and "Minimize" buttons are missing.
Comment 32 Martin Stransky 2019-05-22 08:07:30 UTC
(In reply to mthw0 from comment #31)
> I have tried FF Nightly on Plasma 5.15.90 and it works as good, or even
> better than, Xorg or Xwayland, 

I expect your nightly is running with the WebRender (HW accelerated) backend, which was recently enabled in nightly. That means the window is rendered by GL and not by the Basic (SW) compositor, which is broken.

You can see your current backend in about:support in the Compositing field.

WebRender can be disabled in about:config page.

> as long as you don't try to resize, or move
> the window. Also "Maximize" and "Minimize" buttons are missing.

Do you have system titlebar disabled or enabled? You switch that in Customize -> Title Bar.
Comment 33 Matej Mrenica 2019-05-22 12:02:09 UTC
After further testing:
Using Webrender or OpenGL makes Firefox stop responding after an attempt to resize, so I switched to 'Basic'.
Running FF Nightly with Basic makes it really (unusably) slow (CPU is not at 100%) and also there are lots of screen artifacts (like repeated text).
Resizing the window most often leads to a black window, but snapping(?) the window to left or right usually works correctly (same with maximizing).
Using CSD or SSD makes almost no difference, only with CSD there is a 20px (may vary) invisible border around the window, when snapped(?) to a side.
Comment 34 Patrick Silva 2019-05-22 14:01:14 UTC
I have the nightly build (running basic compositing) installed via flatpak on Arch Linux running Plasma 5.16 beta.
Hamburger menu is entirely black when opened for the first time,
it is shown correctly when it is opened again, but nothing happens
when any of its entries is clicked.
System title bar is disabled here and maximize, minimize and close buttons are missing.
Comment 35 Rainer Finke 2019-05-25 08:45:12 UTC
I also did some tests with Firefox Nightly (2019-05-23) and Plasma 5.16 beta + Qt 5.13 beta. Firefox itself is now quite stable on Wayland, including menus/tabs.

Webrender compositing:
- Firefox window cannot be maximized and reduced to window size without freezing
- Resizing the window will freeze Firefox

OpenGL compositing:
- Firefox window can now be maximized and reduced to window size at least sometimes before it freezes as well
- Resizing the window will freeze Firefox
- Typing in the address bar can freeze firefox (happened several times when trying to open about:support)

Basic compositing:
- Firefox window can now be maximized and reduced to window size, but rendering shows a lot of graphical glitches and is slow or at least needs user interaction (scrolling) to show content
- Resizing the window works as well, but there are a lot of graphical glitches and the window border often has a different size than the Firefox content; the website is often not rendered correctly until the page is reloaded or there is user interaction
- Sometimes you just get a black window and see the content only during resizing the window

General issues:
- Copy and Paste doesn't work, you cannot insert any content to a Firefox Wayland window
- kwin has issues rendering Firefox correctly, if other applications are shown above the Firefox window there is a lot of flickering (probably the subsurface issue https://bugs.kde.org/show_bug.cgi?id=387313) and with the wobbling windows effect you can see that the Firefox frame is wobbling but not the content itself
- Firefox doesn't use the global font dpi defined in systemsettings (it works with xwayland). On my 4k screen I cannot use the 2x scaling as it renders everything way too big, and therefore font dpi is increased from 90 to 130.
- Mouse cursor size is very small, when it is inside the Firefox window, it grows to normal size again if outside of Firefox
Comment 36 David Edmundson 2019-06-16 08:25:15 UTC
This bug has gotten too convoluted. The original SHM updates bug has gone away, in part due to fixes by us and in part due to changes upstream.

Let's open new reports for new issues.

To save some time:
 - subsurfaces: known issue, will fix
 - copy paste - unknown, though we need to research it
 - fontdpi - I should remove the setting. There's not a global font DPI standard, and it doesn't make sense.
 - mouse - clients provide cursor bitmaps which implicitly includes sizes. Almost certainly upstream.