491664 – plasmashell crash due to AMDGPU "*ERROR* failed to unmap legacy queue"

Summary: plasmashell crash due to AMDGPU "*ERROR* failed to unmap legacy queue"

Status:	RESOLVED UPSTREAM

Alias:	None

Product:	plasmashell
Classification:	Plasma
Component:	generic-crash (other bugs)
Version First Reported In:	6.1.4
Platform:	Arch Linux Linux

Importance:	NOR crash
Target Milestone:	1.0
Assignee:	Plasma Bugs List

URL:
Keywords:	drkonqi

Depends on:
Blocks:

Reported:	2024-08-13 12:32 UTC by Thorondir
Modified:	2024-08-13 17:56 UTC
CC List:	3 users

See Also:
Latest Commit:
Version Fixed/Implemented In:
Sentry Crash Report:	https://crash-reports.kde.org/organizations/kde/issues/23450/events/8fdafce14fa344fd90637eed203bcf05/

Attachments
New crash information added by DrKonqi (81.69 KB, text/plain) 2024-08-13 12:32 UTC, Thorondir

Description Thorondir 2024-08-13 12:32:16 UTC

Application: plasmashell (6.1.4)

Qt Version: 6.7.2
Frameworks Version: 6.4.0
Operating System: Linux 6.10.4-arch2-1 x86_64
Windowing System: Wayland
Distribution: Arch Linux
DrKonqi: 6.1.4 [CoredumpBackend]

-- Information about the crash:
This happens a lot when I watch YouTube videos while on battery, for some reason.
Plasmashell freezes, and the only two ways to get back to a working desktop are:
1. Reboot.
2. Suspend the laptop and wake it back up, which makes it recover for some reason.


`dmesg` says:

[ 2669.375871] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=330653, emitted seq=330655
[ 2669.376449] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process firefox pid 2858 thread firefox:cs0 pid 2932
[ 2669.376944] amdgpu 0000:c5:00.0: amdgpu: GPU reset begin!
[ 2671.422667] amdgpu 0000:c5:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE
[ 2671.422680] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
[ 2671.683695] [drm:gfx_v11_0_hw_fini [amdgpu]] *ERROR* failed to halt cp gfx
[ 2671.685417] amdgpu 0000:c5:00.0: amdgpu: MODE2 reset
[ 2671.724949] amdgpu 0000:c5:00.0: amdgpu: GPU reset succeeded, trying to resume
[ 2671.725641] [drm] PCIE GART of 512M enabled (table at 0x00000080FFD00000).
[ 2671.725726] amdgpu 0000:c5:00.0: amdgpu: SMU is resuming...
[ 2671.726957] amdgpu 0000:c5:00.0: amdgpu: SMU is resumed successfully!
[ 2671.729328] [drm] DMUB hardware initialized: version=0x08003D00
[ 2672.177192] [drm] kiq ring mec 3 pipe 1 q 0
[ 2672.179267] amdgpu 0000:c5:00.0: [drm:jpeg_v4_0_hw_init [amdgpu]] JPEG decode initialized successfully.
[ 2672.179916] amdgpu 0000:c5:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[ 2672.179920] amdgpu 0000:c5:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[ 2672.179923] amdgpu 0000:c5:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[ 2672.179925] amdgpu 0000:c5:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 6 on hub 0
[ 2672.179927] amdgpu 0000:c5:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 7 on hub 0
[ 2672.179929] amdgpu 0000:c5:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 8 on hub 0
[ 2672.179931] amdgpu 0000:c5:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 9 on hub 0
[ 2672.179934] amdgpu 0000:c5:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 10 on hub 0
[ 2672.179936] amdgpu 0000:c5:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 11 on hub 0
[ 2672.179938] amdgpu 0000:c5:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
[ 2672.179940] amdgpu 0000:c5:00.0: amdgpu: ring vcn_unified_0 uses VM inv eng 0 on hub 8
[ 2672.179942] amdgpu 0000:c5:00.0: amdgpu: ring jpeg_dec uses VM inv eng 1 on hub 8
[ 2672.179945] amdgpu 0000:c5:00.0: amdgpu: ring mes_kiq_3.1.0 uses VM inv eng 13 on hub 0
[ 2672.182272] amdgpu 0000:c5:00.0: amdgpu: recover vram bo from shadow start
[ 2672.182275] amdgpu 0000:c5:00.0: amdgpu: recover vram bo from shadow done
[ 2672.182296] amdgpu 0000:c5:00.0: amdgpu: GPU reset(6) succeeded!
[ 2672.199626] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
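For anyone collecting similar logs, the key events in the excerpt above (ring timeout, offending process, reset start, reset completion) can be pulled out with a short script. This is only a rough sketch: the function name `extract_reset_events`, the event labels, and the regular expressions are illustrative assumptions based on the exact message wording in this one log, and other kernel versions may word these messages differently.

```python
import re

# Kernel timestamp prefix, e.g. "[ 2669.375871] message"
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")

# Patterns are assumptions derived from the wording in this particular log.
PATTERNS = [
    ("ring_timeout", re.compile(r"\*ERROR\* ring (\S+) timeout")),
    ("process", re.compile(r"\*ERROR\* Process information: process (\S+) pid (\d+)")),
    ("reset_begin", re.compile(r"GPU reset begin!")),
    ("reset_done", re.compile(r"GPU reset\(\d+\) succeeded!")),
]


def extract_reset_events(dmesg_text):
    """Return a list of (timestamp, kind, captured_groups) tuples."""
    events = []
    for line in dmesg_text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # not a timestamped kernel line
        ts, msg = float(m.group(1)), m.group(2)
        for kind, pattern in PATTERNS:
            hit = pattern.search(msg)
            if hit:
                events.append((ts, kind, hit.groups()))
                break  # first matching pattern wins
    return events


# A few lines copied from the dmesg output in this report.
SAMPLE = """
[ 2669.375871] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=330653, emitted seq=330655
[ 2669.376449] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process firefox pid 2858 thread firefox:cs0 pid 2932
[ 2669.376944] amdgpu 0000:c5:00.0: amdgpu: GPU reset begin!
[ 2671.724949] amdgpu 0000:c5:00.0: amdgpu: GPU reset succeeded, trying to resume
[ 2672.182296] amdgpu 0000:c5:00.0: amdgpu: GPU reset(6) succeeded!
"""

if __name__ == "__main__":
    for ts, kind, groups in extract_reset_events(SAMPLE):
        print(f"{ts:12.6f}  {kind:12s}  {' '.join(groups)}")
```

In practice the input would come from something like `dmesg` or `journalctl -k` output saved to a file. The intermediate "GPU reset succeeded, trying to resume" line is deliberately not counted as completion, since only the final `GPU reset(N) succeeded!` line marks the end of recovery in this log.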

The crash can be reproduced sometimes.

-- Backtrace (Reduced):
#5  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
#6  0x000072732f6a5463 in __pthread_kill_internal (threadid=<optimized out>, signo=6) at pthread_kill.c:78
#7  0x000072732f64c120 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#8  0x000072732f6334c3 in __GI_abort () at abort.c:79
[...]
#13 0x000072732f6a339d in start_thread (arg=<optimized out>) at pthread_create.c:447


Reported using DrKonqi

Comment 1 Thorondir 2024-08-13 12:32:17 UTC

Created attachment 172580 [details]
New crash information added by DrKonqi

DrKonqi auto-attaching complete backtrace.

Comment 2 Nate Graham 2024-08-13 17:30:01 UTC

KWin folks, any reason to suspect this might be actionable for us at all, as opposed to being an amdgpu driver bug?

Comment 3 Zamundaaa 2024-08-13 17:56:43 UTC

While plasmashell would ideally survive GPU resets (which will be handled in the future by Qt), the actual bug here is that there is a GPU reset happening in the first place. Please report that at https://gitlab.freedesktop.org/mesa/mesa/-/issues