Bug 477820 - Minimal Overlapping Window Placement freezes kwin when opening very big window under fractional scaling
Summary: Minimal Overlapping Window Placement freezes kwin when opening very big windo...
Status: RESOLVED FIXED
Alias: None
Product: kwin
Classification: Plasma
Component: general (show other bugs)
Version: git master
Platform: Arch Linux Linux
: NOR major
Target Milestone: ---
Assignee: KWin default assignee
URL:
Keywords: qt6
Depends on:
Blocks:
 
Reported: 2023-12-01 08:34 UTC by fanzhuyifan
Modified: 2023-12-04 19:43 UTC (History)
1 user (show)

See Also:
Latest Commit:
Version Fixed In:


Attachments
gdb trace of kwin (4.80 KB, text/plain)
2023-12-01 22:49 UTC, fanzhuyifan
Details

Note You need to log in before you can comment on or make changes to this bug.
Description fanzhuyifan 2023-12-01 08:34:10 UTC
SUMMARY
***
NOTE: If you are reporting a crash, please try to attach a backtrace with debug symbols.
See https://community.kde.org/Guidelines_and_HOWTOs/Debugging/How_to_create_useful_crash_reports
***


STEPS TO REPRODUCE
1. set Window Placement to Minimal Overlapping
2. Set screen resolution to 2560x1600
3. Open konsole
4. Run `xclock -geometry 200x1600`

Step 4 can be replaced with other sizes based on your resolution. I suspect the issue can be triggered as long as it is big enough.

OBSERVED RESULT
System freezes. Can't even switch to virtual consoles.

EXPECTED RESULT
System doesn't freeze.

SOFTWARE/OS VERSIONS
Operating System: Arch Linux 
KDE Plasma Version: 5.90.0
KDE Frameworks Version: 5.246.0
Qt Version: 6.6.1
Kernel Version: 6.6.3-arch1-1 (64-bit)
Graphics Platform: Wayland
Processors: 20 × 13th Gen Intel® Core™ i9-13900H
Memory: 15.2 GiB of RAM
Graphics Processor: Mesa Intel® Graphics
Manufacturer: ASUSTeK COMPUTER INC.
Product Name: ROG Zephyrus G16 GU603VV_GU603VV
System Version: 1.0

ADDITIONAL INFORMATION

Only triggered for Minimal Overlapping, not for other window placements.

Reproduced in neon unstable.

I would argue that this bug should have high severity and priority. 
Even though the example is quite contrived, it serves to illustrate a very general
problem of freezing when the opened window is too big. This can happen for multiple
reasons, including apps remembering their last size. This would lead to the entire system
being frozen upon opening a certain app (which happened to me.
These are very hard to debug, especially because there would be nothing in the system journals.
(I had to do a alt+sysrq r before even being able to switch to a virtual console using ctr-alt-fx !
Then I found out that kwin was using 200% cpu, and it seems to be spending most of the time in placeSmart)

The relevant code seems to be the Placement::placeSmart function in https://invent.kde.org/plasma/kwin/-/blob/master/src/placement.cpp?ref_type=heads
Comment 1 Nate Graham 2023-12-01 20:08:12 UTC
Cannot reproduce the issue with "Minimal Overlapping window placement mode set and xclock using that geometry, a geometry with a vertical height matching either the logical or physical resolution of my highest DPI screen, or any other geometry that I try.

Can you reproduce it in any other way?
Comment 2 fanzhuyifan 2023-12-01 21:28:40 UTC
(In reply to Nate Graham from comment #1)
> Cannot reproduce the issue with "Minimal Overlapping window placement mode
> set and xclock using that geometry, a geometry with a vertical height
> matching either the logical or physical resolution of my highest DPI screen,
> or any other geometry that I try.
> 
> Can you reproduce it in any other way?

Fractional scaling seems to be a necessary component. I cannot reproduce this at all when my global scale is 100%.

When I set global scale to 105%, I am able to consistently reproduce it with the command `xclock -geometry 200x1600`.

Output of `kscreen-doctor -o`:

Output: 1 eDP-1
        enabled
        connected
        priority 1
        Panel
        Modes:  0:2560x1600@240*!  1:2560x1600@60  2:1600x1200@60  3:1280x1024@60  4:1024x768@60  5:2560x1600@60  6:1920x1200@60  7:1280x800@60  8:2560x1440@60  9:1920x1080@60  10:1600x900@60  11:1368x768@60  12:1280x720@60 
        Geometry: 0,0 2438x1524
        Scale: 1.05
        Rotation: 1
        Overscan: 0
        Vrr: Automatic
        RgbRange: Automatic
        HDR: incapable
        Wide Color Gamut: incapable
        ICC profile: none
Comment 3 fanzhuyifan 2023-12-01 21:39:17 UTC
(In reply to fanzhuyifan from comment #2)
> Fractional scaling seems to be a necessary component. I cannot reproduce
> this at all when my global scale is 100%.
> 
> When I set global scale to 105%, I am able to consistently reproduce it with
> the command `xclock -geometry 200x1600`.
> 
> Output of `kscreen-doctor -o`:
> 
> Output: 1 eDP-1
>         enabled
>         connected
>         priority 1
>         Panel
>         Modes:  0:2560x1600@240*!  1:2560x1600@60  2:1600x1200@60 
> 3:1280x1024@60  4:1024x768@60  5:2560x1600@60  6:1920x1200@60  7:1280x800@60
> 8:2560x1440@60  9:1920x1080@60  10:1600x900@60  11:1368x768@60 
> 12:1280x720@60 
>         Geometry: 0,0 2438x1524
>         Scale: 1.05
>         Rotation: 1
>         Overscan: 0
>         Vrr: Automatic
>         RgbRange: Automatic
>         HDR: incapable
>         Wide Color Gamut: incapable
>         ICC profile: none

In this case the cutoff is between 1558 and 1559. 200x1558 works, but 200x1559 and higher freezes the system (this is tested on neon unstable)
Comment 4 fanzhuyifan 2023-12-01 22:23:19 UTC
I think I figured out what was causing this bug.

The original placeSmart [1] was written under the assumption that all coordinates and dimensions are integers.
Thus, it uses the `int` type to store such things. However, this assumption no longer holds under fractional scaling.
In particular, the various methods of QRectF return qreal types [2], which can be non integers.

Hence, currently the code mixes arithmetic between integers and floats, and sometimes implicitly converts the results
to ints. Thus, after this sequence of assignments,
> int possible;
> possible = area.y() + area.height();
> y = possible;
the expression `(y < area.y() + area.height())` might evaluate to true, causing an infinite loop.

[1] https://invent.kde.org/plasma/kwin/-/blob/dd9e04fb180a7e7c9a49cd029fe497f19c537618/src/placement.cpp#L159
[2] https://doc.qt.io/qt-6/qrectf.html
Comment 5 fanzhuyifan 2023-12-01 22:49:11 UTC
Created attachment 163732 [details]
gdb trace of kwin

The behavior mentioned in the previous comment is clearly illustrated in this gdb trace.

We see that area.y() is 40, area.height() is 1026.66, and so y is 1066, which is smaller than area.y() + area.height()!
Comment 6 Bug Janitor Service 2023-12-01 23:13:39 UTC
A possibly relevant merge request was started @ https://invent.kde.org/plasma/kwin/-/merge_requests/4740
Comment 7 fanzhuyifan 2023-12-01 23:20:42 UTC
(In reply to fanzhuyifan from comment #5)
> Created attachment 163732 [details]
> gdb trace of kwin
> 
> The behavior mentioned in the previous comment is clearly illustrated in
> this gdb trace.
> 
> We see that area.y() is 40, area.height() is 1026.66, and so y is 1066,
> which is smaller than area.y() + area.height()!

This trace was created for global scale=1.5, on a 2560x1600 screen.
Comment 8 fanzhuyifan 2023-12-04 19:43:01 UTC
Git commit c441ce5cd1a691c245d0d65ae5d9df10c91d328a by Yifan Zhu.
Committed on 04/12/2023 at 17:59.
Pushed by davidedmundson into branch 'master'.

Avoid accidental mixing of qreal and int

The code for placeSmart uses ints to store coordinates and dimensions,
but the various methods of QRectF return qreal types, which can be
non-integers under fractional scaling, causing multiple issues.

This commit explicitly converts the needed quantities to ints,
avoiding the issues.

Previously, the code relied on the assumption that
y = area.y() + area.height(); implies !(y < area.y() + area.height()).
This doesn't always hold when mixing qreals and ints, causing infinite loops
under fractional scaling when attempting to open large windows, as
reported in BUG 477820.

This commit also extends the fix in 5502ce9 to fractional scaling scenarios.
The ceiling of client widths and height is used, instead of the implicit
floor, which caused BUG 477886.
Related: bug 477886

M  +12   -7    src/placement.cpp

https://invent.kde.org/plasma/kwin/-/commit/c441ce5cd1a691c245d0d65ae5d9df10c91d328a