Bug 464782 - kscreen picks DoubleScan modelines ahead of regular ones, causing system hang on amdgpu
Status: REPORTED
Alias: None
Product: KScreen
Classification: Plasma
Component: common
Version: 5.26.5
Platform: Other Linux
Importance: NOR grave
Target Milestone: ---
Assignee: kscreen-bugs-null@kde.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-01-25 04:23 UTC by nyanpasu64
Modified: 2023-01-25 17:28 UTC

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:


Attachments
xrandr --verbose, all modelines of my VGA CRT monitor (VX720) (26.70 KB, text/plain)
2023-01-25 04:23 UTC, nyanpasu64

Description nyanpasu64 2023-01-25 04:23:19 UTC
Created attachment 155574
xrandr --verbose, all modelines of my VGA CRT monitor (VX720)

SUMMARY
When I connect a CRT to a machine running KDE and pick a low resolution (like 800x600) in System Settings -> Display Configuration, and the modelines available for the chosen resolution and refresh rate include one with DoubleScan enabled, kscreen sometimes picks the DoubleScan modeline instead of the regular one. On amdgpu, this causes a full system hang. The hang recurs every time I reboot and log in again with the CRT plugged in, until I delete the ~/.local/share/kscreen folder.

STEPS TO REPRODUCE
1. Power on a VGA CRT and connect it to a DP-to-VGA adapter (in any order).
2. Plug the DP-to-VGA adapter (connected to a powered-on CRT) into a DP port on your GPU. (My CRT does not emit EDID data through a plugged-in VGA cable when powered off, which is not good behavior but you can't exactly ask Gateway to release a fixed CRT these days.)
3. In "Display Configuration", set the CRT to 800x600 at 60 Hz and Apply.

OBSERVED RESULT
The system hangs, amdgpu emits ominous warnings into the systemd journal, and the only way out is REISUB (the screen doesn't even update across Alt+SysRq+REISU until you press B to reboot), or a hard reset.

I get the same system hang when I run `xrandr --output DP-2 --mode 0x9c` which picks `800x600 (0x9c) 81.000MHz +HSync +VSync DoubleScan`, but no system hang when I run `xrandr --output DP-2 --mode 0x9f` which picks `800x600 (0x9f) 40.000MHz +HSync +VSync`.
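To find which mode IDs are affected without reading the whole attachment by eye, something like the following could work. This is only a rough sketch: it assumes mode header lines look like the two quoted above (`name (0xID) clockMHz flags...`), and `parse_modes` and the `sample` text are names made up for illustration, not part of xrandr or kscreen.

```python
import re

# Matches mode header lines in the form quoted in this report, e.g.
#   800x600 (0x9c) 81.000MHz +HSync +VSync DoubleScan
# Assumption: other `xrandr --verbose` lines (output headers, h:/v: timing
# lines, properties) do not match this pattern and are skipped.
MODE_RE = re.compile(
    r"^\s*(?P<name>\S+)\s+\((?P<id>0x[0-9a-fA-F]+)\)\s+"
    r"(?P<clock>[0-9.]+)MHz(?P<flags>.*)$"
)


def parse_modes(xrandr_verbose_text):
    """Return one dict per mode header line found in the given text."""
    modes = []
    for line in xrandr_verbose_text.splitlines():
        m = MODE_RE.match(line)
        if m is None:
            continue
        flags = m.group("flags").split()
        modes.append({
            "name": m.group("name"),
            "id": m.group("id"),
            "clock_mhz": float(m.group("clock")),
            "flags": flags,
            "doublescan": "DoubleScan" in flags,
        })
    return modes


# The two mode lines quoted in this report.
sample = """\
  800x600 (0x9c) 81.000MHz +HSync +VSync DoubleScan
  800x600 (0x9f) 40.000MHz +HSync +VSync
"""

for mode in parse_modes(sample):
    kind = "DoubleScan" if mode["doublescan"] else "regular"
    print(mode["id"], mode["name"], kind)
```

To run it against a real system, the text could come from `subprocess.run(["xrandr", "--verbose"], capture_output=True, text=True).stdout`.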

EXPECTED RESULT
kscreen picks a regular 800x600 mode without DoubleScan whenever one is available, avoiding old, creaky, untested Xorg display modes which crash on modern drivers. (I think the intended function of DoubleScan is to emit a 1200-line signal, outputting each of the 600 rows of pixels twice, but modern amdgpu doesn't handle that anymore... or interlacing... grrr https://gitlab.freedesktop.org/drm/amd/-/issues/1636)

If a given resolution or refresh rate is not available without DoubleScan, I'm not sure what to do... Don't list it at all? Let the user pick it and hang their system if they're on amdgpu (it might work fine on radeon or other drivers?)
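The selection policy I have in mind can be sketched roughly as follows. This is not kscreen's actual code (I haven't checked how it chooses between modes today). `pick_mode`, the dict layout, and the `allow_doublescan_fallback` switch are all hypothetical, and the switch exists only to show the two options for the open question above.

```python
def pick_mode(candidates, allow_doublescan_fallback=False):
    """Pick a mode from candidates that share a resolution and refresh rate.

    Prefers the first candidate without the DoubleScan flag. If every
    candidate is DoubleScan, returns None (the "don't list it" option)
    unless allow_doublescan_fallback is set (the "let the user pick it"
    option).
    """
    regular = [m for m in candidates if "DoubleScan" not in m["flags"]]
    if regular:
        return regular[0]
    if allow_doublescan_fallback and candidates:
        return candidates[0]
    return None


# The two 800x600 modes from this report, in the order xrandr lists them.
candidates = [
    {"id": "0x9c", "flags": ["+HSync", "+VSync", "DoubleScan"]},
    {"id": "0x9f", "flags": ["+HSync", "+VSync"]},
]

chosen = pick_mode(candidates)
print(chosen["id"])  # prints 0x9f, the regular mode
```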

SOFTWARE/OS VERSIONS
Operating System: Arch Linux
KDE Plasma Version: 5.26.5
KDE Frameworks Version: 5.102.0
Qt Version: 5.15.8
Kernel Version: 6.1.7-zen1-1-zen (64-bit)
Graphics Platform: X11
Processors: 12 × AMD Ryzen 5 5600X 6-Core Processor
Memory: 15.5 GiB of RAM
Graphics Processor: AMD Radeon RX 570 Series
Manufacturer: Gigabyte Technology Co., Ltd.
Product Name: B550M DS3H

DP-to-VGA adapter: https://plugable.com/products/dpm-vgaf, https://www.amazon.com/Plugable-Passive-DisplayPort-Adapter-Supports/dp/B01GW8FV7U, operates using DP++ (HDMI over DisplayPort)
VGA monitor: Gateway VX720

ADDITIONAL INFORMATION
Comment 1 nyanpasu64 2023-01-25 04:54:16 UTC
Out of curiosity, I decided to look into how amdgpu crashed. The journal says:

```
Jan 24 19:05:51 ryzen kernel: ------------[ cut here ]------------
Jan 24 19:05:51 ryzen kernel: WARNING: CPU: 1 PID: 1012 at drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc.c:1228 dc_commit_state_no_check+0x1581/0x18a0 [amdgpu]
Jan 24 19:05:51 ryzen kernel: Modules linked in: rfcomm snd_seq_dummy snd_hrtimer snd_seq cmac algif_hash algif_skcipher af_alg xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp bnep ip6table_mangle ip6table_nat ip6table_filter ip6_tables iptable_mangle btusb iptable_nat btrtl nf_nat btbcm btintel nf_conntrack btmtk nf_defrag_ipv6 joydev nf_defrag_ipv4 mousedev iptable_filter bluetooth ecdh_generic crc16 bridge stp llc rfkill intel_rapl_msr lm92 snd_usb_audio snd_usbmidi_lib snd_rawmidi snd_seq_device mc gigabyte_wmi wmi_bmof intel_rapl_common uas usbhid vfat edac_mce_amd usb_storage fat kvm_amd amdgpu snd_hda_codec_realtek snd_hda_codec_generic kvm ledtrig_audio snd_hda_codec_hdmi irqbypass crct10dif_pclmul snd_hda_intel crc32_pclmul polyval_clmulni snd_intel_dspcfg polyval_generic snd_intel_sdw_acpi gf128mul ghash_clmulni_intel gpu_sched snd_hda_codec sha512_ssse3 drm_buddy snd_hda_core video aesni_intel drm_ttm_helper snd_hwdep ttm crypto_simd snd_pcm cryptd rapl
Jan 24 19:05:51 ryzen kernel:  drm_display_helper snd_timer sp5100_tco pcspkr snd cec psmouse soundcore ccp zenpower(OE) i2c_piix4 r8168(OE) wmi gpio_amdpt gpio_generic acpi_cpufreq mac_hid uinput dm_multipath dm_mod it87(OE) hwmon_vid sg crypto_user fuse bpf_preload ip_tables x_tables serio_raw atkbd libps2 vivaldi_fmap nvme xhci_pci nvme_core i8042 xhci_pci_renesas nvme_common serio btrfs blake2b_generic libcrc32c crc32c_generic crc32c_intel xor raid6_pq
Jan 24 19:05:51 ryzen kernel: CPU: 1 PID: 1012 Comm: Xorg Tainted: G S         OE      6.1.7-zen1-1-zen #1 251eee86d1e3407eafb15439b5bcc81efef5caf9
Jan 24 19:05:51 ryzen kernel: Hardware name: Gigabyte Technology Co., Ltd. B550M DS3H/B550M DS3H, BIOS F15c 05/11/2022
Jan 24 19:05:51 ryzen kernel: RIP: 0010:dc_commit_state_no_check+0x1581/0x18a0 [amdgpu]
Jan 24 19:05:51 ryzen kernel: Code: ff 89 c6 e9 f8 f5 ff ff 48 89 ef e8 29 7b 01 00 48 89 ef e8 41 ab e7 e7 e9 b7 f6 ff ff 80 b8 80 03 00 00 00 0f 84 d4 f4 ff ff <0f> 0b e9 cd f4 ff ff 31 c0 e9 ca f5 ff ff be 03 00 00 00 e8 27 38
Jan 24 19:05:51 ryzen kernel: RSP: 0018:ffffc44941737730 EFLAGS: 00010202
Jan 24 19:05:51 ryzen kernel: RAX: ffffa0ec47d81000 RBX: ffffa0ec795c0000 RCX: 0000000000000001
Jan 24 19:05:51 ryzen kernel: RDX: 00000000000011c7 RSI: 0000000000000e6e RDI: 000000285ce7d259
Jan 24 19:05:51 ryzen kernel: RBP: ffffa0ec795c0aa0 R08: ffffc449417376fc R09: ffffa0ec795c0000
Jan 24 19:05:51 ryzen kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
Jan 24 19:05:51 ryzen kernel: R13: ffffa0ec795c3638 R14: ffffa0ebc7340000 R15: 0000000000000002
Jan 24 19:05:51 ryzen kernel: FS:  00007f7eb8527400(0000) GS:ffffa0eeee840000(0000) knlGS:0000000000000000
Jan 24 19:05:51 ryzen kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 24 19:05:51 ryzen kernel: CR2: 00007f3c936057c0 CR3: 0000000105688000 CR4: 0000000000750ee0
Jan 24 19:05:51 ryzen kernel: PKRU: 55555554
Jan 24 19:05:51 ryzen kernel: Call Trace:
Jan 24 19:05:51 ryzen kernel:  <TASK>
Jan 24 19:05:51 ryzen kernel:  dc_commit_state+0x11e/0x170 [amdgpu ea650a4e77dfc87577a726d0395dd5509c6cbd3f]
Jan 24 19:05:51 ryzen kernel:  amdgpu_dm_atomic_commit_tail+0x55f/0x2de0 [amdgpu ea650a4e77dfc87577a726d0395dd5509c6cbd3f]
Jan 24 19:05:51 ryzen kernel:  ? free_unref_page+0x36f/0x780
Jan 24 19:05:51 ryzen kernel:  ? bw_calcs+0x806/0x1f80 [amdgpu ea650a4e77dfc87577a726d0395dd5509c6cbd3f]
Jan 24 19:05:51 ryzen kernel:  ? dce112_validate_bandwidth+0x77/0x1d0 [amdgpu ea650a4e77dfc87577a726d0395dd5509c6cbd3f]
Jan 24 19:05:51 ryzen kernel:  ? dc_validate_global_state+0x3db/0x580 [amdgpu ea650a4e77dfc87577a726d0395dd5509c6cbd3f]
Jan 24 19:05:51 ryzen kernel:  ? dma_resv_get_fences+0xa3/0x2c0
Jan 24 19:05:51 ryzen kernel:  ? dma_resv_get_singleton+0x46/0x140
Jan 24 19:05:51 ryzen kernel:  ? wait_for_completion_timeout+0x13e/0x170
Jan 24 19:05:51 ryzen kernel:  ? wait_for_completion_interruptible+0x139/0x1e0
Jan 24 19:05:51 ryzen kernel:  commit_tail+0x94/0x130
Jan 24 19:05:51 ryzen kernel:  drm_atomic_helper_commit+0x1ca/0x200
Jan 24 19:05:51 ryzen kernel:  drm_atomic_commit+0x7b/0x100
Jan 24 19:05:51 ryzen kernel:  ? drm_plane_get_damage_clips.cold+0x1c/0x1c
Jan 24 19:05:51 ryzen kernel:  drm_atomic_helper_set_config+0x74/0xb0
Jan 24 19:05:51 ryzen kernel:  drm_mode_setcrtc+0x3ee/0x900
Jan 24 19:05:51 ryzen kernel:  ? drm_mode_getcrtc+0x180/0x180
Jan 24 19:05:51 ryzen kernel:  drm_ioctl_kernel+0xcd/0x170
Jan 24 19:05:51 ryzen kernel:  drm_ioctl+0x1eb/0x450
Jan 24 19:05:51 ryzen kernel:  ? drm_mode_getcrtc+0x180/0x180
Jan 24 19:05:51 ryzen kernel:  amdgpu_drm_ioctl+0x4e/0x90 [amdgpu ea650a4e77dfc87577a726d0395dd5509c6cbd3f]
Jan 24 19:05:51 ryzen kernel:  __x64_sys_ioctl+0x94/0xd0
Jan 24 19:05:51 ryzen kernel:  do_syscall_64+0x5f/0x90
Jan 24 19:05:51 ryzen kernel:  ? do_user_addr_fault+0x1e9/0x6c0
Jan 24 19:05:51 ryzen kernel:  ? exc_page_fault+0x74/0x170
Jan 24 19:05:51 ryzen kernel:  entry_SYSCALL_64_after_hwframe+0x63/0xcd
Jan 24 19:05:51 ryzen kernel: RIP: 0033:0x7f7eb8ea4ecf
Jan 24 19:05:51 ryzen kernel: Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 18 48 8b 44 24 18 64 48 2b 04 25 28 00 00
Jan 24 19:05:51 ryzen kernel: RSP: 002b:00007ffc0aedcfc0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Jan 24 19:05:51 ryzen kernel: RAX: ffffffffffffffda RBX: 000056367abbe990 RCX: 00007f7eb8ea4ecf
Jan 24 19:05:51 ryzen kernel: RDX: 00007ffc0aedd050 RSI: 00000000c06864a2 RDI: 0000000000000012
Jan 24 19:05:51 ryzen kernel: RBP: 00007ffc0aedd050 R08: 0000000000000000 R09: 000056367ab45960
Jan 24 19:05:51 ryzen kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 00000000c06864a2
Jan 24 19:05:51 ryzen kernel: R13: 0000000000000012 R14: 000056367a9dc308 R15: 00007ffc0aedd100
Jan 24 19:05:51 ryzen kernel:  </TASK>
Jan 24 19:05:51 ryzen kernel: ---[ end trace 0000000000000000 ]---
```

I'm running the Arch linux-zen package, version 6.1.7.zen1-1. This maps to https://github.com/zen-kernel/zen-kernel/blob/v6.1.7-zen1/drivers/gpu/drm/amd/display/dc/core/dc.c. But line 1228 doesn't lie in dc_commit_state_no_check! Instead, looking through Ghidra's disassembly of dc_commit_state_no_check, the crash site `dc_commit_state_no_check+0x1581` seems to fall between two `dm_perf_trace_timestamp` calls (which arise from the `PERF_TRACE` macro), in code originating from `wait_for_no_pipes_pending`, which has been inlined into dc_commit_state_no_check. This matches the source: line 1228 lies in wait_for_no_pipes_pending and consists of `ASSERT(!pipe->plane_state->status.is_flip_pending);`.

It seems that when modern amdgpu is given a DoubleScan mode, it sets up the GPU with the wrong register contents, causing the GPU to never properly output a frame, then crashes waiting for the GPU "pipe" (pipeline?) to finish processing. I'm not sure if this is worth reporting upstream, since I don't know of *anyone* (not even me, who has written my own modelines with interlacing) who deliberately uses DoubleScan modes today, and you can replicate similar functionality if you truly desire by using xrandr to upscale a low resolution.
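As a rough illustration of the xrandr upscaling idea, the helper below builds an `xrandr --scale` command line. It is only a sketch: `scale_command` is a made-up name, and the `1600x1200` mode is a hypothetical example, so check which regular modes your own monitor actually lists before trying it. A scale factor below 1 zooms in, so 0.5 on a 1600x1200 mode gives an 800x600 desktop.

```python
import shlex


def scale_command(output, mode, factor):
    """Build an xrandr argv that drives `mode` while showing a smaller desktop."""
    return [
        "xrandr",
        "--output", output,
        "--mode", mode,
        "--scale", f"{factor}x{factor}",
    ]


# Hypothetical example: an 800x600 desktop shown on a regular 1600x1200 mode.
argv = scale_command("DP-2", "1600x1200", 0.5)
print(shlex.join(argv))
```

Passing `argv` to `subprocess.run` would apply it. I'd test it from a terminal first so the change is easy to revert.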