Bug 458771

Summary:	flickering
Product:	[Frameworks and Libraries] libplasma	Reporter:	pf
Component:	libplasma	Assignee:	Plasma Bugs List <plasma-bugs-null>
Status:	RESOLVED WORKSFORME
Severity:	normal	CC:	me, notmart
Priority:	NOR
Version First Reported In:	unspecified
Target Milestone:	---
Platform:	Other
OS:	Linux
Latest Commit:		Version Fixed/Implemented In:
Sentry Crash Report:
Attachments:	annotated journal; Second occurrence of screens flashing within a few hours.

Description pf 2022-09-06 02:44:55 UTC

SUMMARY (frameworks-plasma is my best guess; sorry if wrong)
Random flickering: sometimes extremely fast, other times slow, and sometimes even acting as a toggle.
Affects windows, taskbar, systray, digital clock, individual icons, physical screen areas, tabs in tabbed windows, etc.

I see many flickering reports, but most are filed against specific products. This report is to inform KDE that the flickering problem is widespread, but appears to be at the container level, where windows, the taskbar, the systray, physical screen areas, etc. act as containers.

Initially, I blamed firefox nightly: https://bugzilla.mozilla.org/show_bug.cgi?id=1746708
More recently, I'm seeing flickering in multiple applications: firefox, claws-mail, libreoffice calc, ...
Sometimes, the flickering acts like a toggle switch:
- perform some activity: scroll, click, grab/move, select, etc.
- the display does not change until I exit/enter the window; often, I have to move the window ever so slightly for the display to update.
- window shading/unshading sometimes leaves an empty window until I move it.

STEPS TO REPRODUCE (firefox)
1. use firefox extensively (lots of windows & tabs)
2. start streaming some video; after a while (usually an hour or more)
3. flickering starts (video is affected, sound continues). Usually, the video stream loops between a couple of frames over and over; the speed varies.
In this case, I can open a new firefox window and move the affected tabs to it, which "solves" the issue for a while. So the problem appears to be the window as a container.

STEPS TO REPRODUCE (taskbar)
1. use the system
2. usually hours after a system restart, taskbar icons "dance" back and forth; sometimes one icon comes and goes, aggravating the "dance"
3. moving the mouse over the affected area affects the issue; it usually stops the flickering

STEPS TO REPRODUCE (random screen blocks flicker)
1. no idea what triggers these
2. no obvious trigger seen so far; the flickering blocks overlap kinda like a Venn diagram, but with rectangles rather than circles.
3. what defines these overlapping rectangles is still not obvious.

STEPS TO REPRODUCE (zoom video calls)
1. use it
2. after 15-30 minutes, zoom participants freeze or toggle between a couple of frames
3. restarting zoom clears it up (the zoom window is a container).

OBSERVED RESULT
flickering

EXPECTED RESULT
no flickering

SOFTWARE/OS VERSIONS
Linux/KDE Plasma:
Operating System: Mageia 9
KDE Plasma Version: 5.25.4
KDE Frameworks Version: 5.97.0
Qt Version: 5.15.5
Kernel Version: 5.19.4-server-2.mga9 (64-bit)
Graphics Platform: X11
Processors: 20 × 12th Gen Intel® Core™ i7-12700K
Memory: 125.5 GiB of RAM
Graphics Processor: AMD Radeon RX 6600 XT
Manufacturer: Dell Inc.
Product Name: XPS 8950

ADDITIONAL INFORMATION
After many months of trying to find a common factor[1], it's looking more like a race condition.
claws-mail uses a notification icon in the systray; normally, its updates (new mail, unread mail, no mail) appear instantly. Once the flickering problem begins, the systray updates are delayed until the mouse is moved out of and back into the claws-mail window.
When the flickering gets excessive, even the digital clock starts to flicker, either individual digits or the entire clock.

Just now, I moved the mouse over the taskbar, and no tooltips appeared until I switched desktops.

[1] I first saw this flickering on an old Dell M6800 running Mageia 8, so I'm not convinced it's a hardware issue, as some have suggested.

Comment 1 pf 2022-09-06 12:29:07 UTC

The flickering got so bad this morning that I had to reboot. Once it starts, the issue gets progressively worse.
LibreOffice Calc: one document was flickering; another would not accept ANY input.
Firefox: popup tooltips (from the left menu bar on a GitLab page) flickering wildly.
Taskbar with 2 app icons, each ~3" wide: left icon not moving; right icon bouncing at high speed between its normal position and overlapping the left icon by ~1/3, as though the left icon's "reported width" kept changing while its visible width remained unchanged.
Individual taskbar icons flickering between colors (light blue and lighter blue).
There is so much flickering of different types that it's impossible to identify all the flickering modes. Flickering happens randomly at random locations on both screens. It is getting progressively worse and now affects all apps; in the beginning, it was mainly firefox, usually after streaming video for a while.
Some responses in bug reports blame the video hardware, but I'm seeing this both on a Dell M6800 laptop and on a new Dell XPS 8950 (128GB RAM) with an AMD Radeon RX 6600 XT video card. Running X11, not Wayland.

This has the earmarks of code segment(s) missing the lock(s) that would prevent them from being interrupted, so values get changed by a parallel process...

Comment 2 pf 2022-09-06 12:32:27 UTC

see also: https://bugs.mageia.org/show_bug.cgi?id=30482

Comment 3 pf 2022-09-12 22:46:13 UTC

Another variant just now: when I move the mouse over the systray and then slide across the various systray icons, each icon brings up a tooltip. The tooltips vary in size, so one icon has a large tooltip (say ~2"x3") and the next icon the mouse moves over has a small one. However, the small tooltip is initially displayed at the size of the large one, with its font scaled up to fit (magnified); then it is scaled down to its normal size. Between the moment the small tooltip first appears at the large size and the moment it reaches its normal small size, it gets caught in a flicker between what appears to be the large icon's tooltip and the small icon's tooltip. This time the only way I could stop the flicker was to move the mouse to the secondary screen.

Each time I try to get a video capture, I move the mouse to another container (see previous comments), and the flickering usually stops. Capturing with my phone may be the only way...

Operating System: Mageia 9
KDE Plasma Version: 5.25.4
KDE Frameworks Version: 5.97.0
Qt Version: 5.15.6
Kernel Version: 5.19.8-server-1.mga9 (64-bit)
Graphics Platform: X11
Processors: 20 × 12th Gen Intel® Core™ i7-12700K
Memory: 125.5 GiB of RAM
Graphics Processor: AMD Radeon RX 6600 XT
Manufacturer: Dell Inc.
Product Name: XPS 8950

BIOS updated from 1.3.0 to 1.6.0 yesterday.

Comment 4 pf 2022-10-07 06:38:57 UTC

I've been working with shotcut (video editor), and the flickering started affecting various regions of its window. After shading and restoring the window, I noticed something new and strange: keystrokes were totally ignored (space to start/stop playback, etc.). Then the entire window started shading and unshading on its own. Each on/off appeared synchronized with the clock updates: every second (sometimes every 2 seconds), the window would be shaded, then restored, and so on. Very slow flickering.
Referring back to my "container" comments, it appears that various containers (areas within a window, and the entire window) flicker independently. The flicker speed is not consistent: milliseconds for some flickering, and now on/off cycles of seconds for the entire window. Other windows/applications are not affected.
It's been suggested that this could be a monitor issue; I consider that premise impossible, because physical monitors only see the video stream as pixels and have no concept of display areas ranging in size from a few flickering pixels to entire flickering application windows.
When flickering starts, an application can also appear visually frozen. Moving the window by as little as one pixel refreshes it to the view it should have, but it instantly appears frozen again.
Sigh... so many symptoms, though they all appear "container" related: updating a window area, an entire window, or anything in between... Is there some common code that handles painting all areas regardless of size? This is beginning to feel like a memory leak or heap overflow. I'm mainly a user who writes Python code, not a system developer.

Comment 5 pf 2022-10-09 22:24:42 UTC

Created attachment 152675
annotated journal

I was waiting for a convenient time to reboot, but the system had other plans...

The attached journal starts well before the failure, in case something in there provides a clue. I've added whitespace around some entries, plus some comments, but nothing was removed.

After the screens went dark and started flashing narrow horizontal video strips, I tried restarting KDE with Ctrl+Alt+Backspace a few times. While that did bring up the login screen, login never completed; IIRC it only displayed the Breeze splash screen...

Comment 6 pf 2022-10-10 02:23:30 UTC

Created attachment 152676
Second occurrence of screens flashing within a few hours.

It just occurred again. This time the journal is from the earlier reboot. I was able to restore my KDE/Plasma session without rebooting: killing kded5 didn't appear to help, but killing startplasma-x11 stopped the flashing and gave me an SDDM login. This journal looks just like the previous one:

Oct 09 21:30:03 pf.pfortin.com kernel: amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:40 vmid:1 pasid:32769, for process Xorg pid 22526 thread Xorg:cs0 pid 22807)
Oct 09 21:30:03 pf.pfortin.com kernel: amdgpu 0000:03:00.0: amdgpu:   in page starting at address 0x0000800116201000 from client 0x1b (UTCL2)
Oct 09 21:30:03 pf.pfortin.com kernel: amdgpu 0000:03:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00141051
Oct 09 21:30:03 pf.pfortin.com kernel: amdgpu 0000:03:00.0: amdgpu:          Faulty UTCL2 client ID: TCP (0x8)
Oct 09 21:30:03 pf.pfortin.com kernel: amdgpu 0000:03:00.0: amdgpu:          MORE_FAULTS: 0x1
Oct 09 21:30:03 pf.pfortin.com kernel: amdgpu 0000:03:00.0: amdgpu:          WALKER_ERROR: 0x0
Oct 09 21:30:03 pf.pfortin.com kernel: amdgpu 0000:03:00.0: amdgpu:          PERMISSION_FAULTS: 0x5
Oct 09 21:30:03 pf.pfortin.com kernel: amdgpu 0000:03:00.0: amdgpu:          MAPPING_ERROR: 0x0
Oct 09 21:30:03 pf.pfortin.com kernel: amdgpu 0000:03:00.0: amdgpu:          RW: 0x1
Oct 09 21:30:03 pf.pfortin.com kernel: amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:40 vmid:1 pasid:32769, for process Xorg pid 22526 thread Xorg:cs0 pid 22807)
Oct 09 21:30:03 pf.pfortin.com kernel: amdgpu 0000:03:00.0: amdgpu:   in page starting at address 0x0000800116200000 from client 0x1b (UTCL2)
Oct 09 21:30:03 pf.pfortin.com kernel: amdgpu 0000:03:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00141051
Oct 09 21:30:03 pf.pfortin.com kernel: amdgpu 0000:03:00.0: amdgpu:          Faulty UTCL2 client ID: TCP (0x8)
Oct 09 21:30:03 pf.pfortin.com kernel: amdgpu 0000:03:00.0: amdgpu:          MORE_FAULTS: 0x1
Oct 09 21:30:03 pf.pfortin.com kernel: amdgpu 0000:03:00.0: amdgpu:          WALKER_ERROR: 0x0
Oct 09 21:30:03 pf.pfortin.com kernel: amdgpu 0000:03:00.0: amdgpu:          PERMISSION_FAULTS: 0x5
Oct 09 21:30:03 pf.pfortin.com kernel: amdgpu 0000:03:00.0: amdgpu:          MAPPING_ERROR: 0x0
Oct 09 21:30:03 pf.pfortin.com kernel: amdgpu 0000:03:00.0: amdgpu:          RW: 0x1

In case it matters, the "page starting at address" alternates between 0x0000800116201000 and 0x0000800116200000
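For anyone wanting to check whether later journals show the same pair of pages, here is a minimal Python sketch that pulls the faulting page addresses out of journal text like the excerpt above and reports the distance between them. It is not part of any KDE or kernel tooling; the script name amdgpu_faults.py and its helper functions are hypothetical, and the regex only assumes the "in page starting at address" line format shown above. On the two addresses quoted above, the distance is 0x1000 bytes (4 KiB), i.e. adjacent pages assuming 4 KiB page granularity.

```python
import re
import sys
from collections import Counter

# Matches the amdgpu lines of the form
#   "in page starting at address 0x0000800116201000 from client 0x1b (UTCL2)"
PAGE_RE = re.compile(r"in page starting at address (0x[0-9a-fA-F]+)")


def fault_pages(journal_text: str) -> list[int]:
    """Return the faulting page addresses in the order they appear."""
    return [int(m.group(1), 16) for m in PAGE_RE.finditer(journal_text)]


def distinct_gaps(addresses: list[int]) -> set[int]:
    """Distances between consecutive fault addresses that differ."""
    return {abs(b - a) for a, b in zip(addresses, addresses[1:]) if a != b}


if __name__ == "__main__":
    pages = fault_pages(sys.stdin.read())
    # How often each page shows up in the log.
    for addr, count in Counter(pages).items():
        print(f"{addr:#018x}  {count} fault(s)")
    print("gaps between consecutive faulting pages:", sorted(distinct_gaps(pages)))
```

Assumed usage, feeding it the previous boot's kernel messages: `journalctl -k -b -1 | python3 amdgpu_faults.py`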

Comment 7 pf 2022-10-10 02:37:10 UTC

I captured a video of the problem with my phone a few weeks ago; this is what I saw twice so far today:
https://drive.google.com/file/d/1xgzy1zE-TlHeC5pAouvkfKkUIzDakPVf/view?usp=sharing
The screens are secondary (left) and primary (right).

Comment 8 pf 2022-10-15 15:52:52 UTC

After killing startplasma-x11 (comment 6) and __avoiding streaming video__, the system was quite stable until the evening of 10/14/22 (about 4 days); then I decided to install a 4-port USB PCIe card, which has issues. While examining dmesg, I noticed these amdgpu messages; posting them here in case there's a clue within:

[    2.176993] AMD-Vi: AMD IOMMUv2 functionality not available on this system - This is not a bug.
[    2.848136] [drm] amdgpu kernel modesetting enabled.
[    2.848185] amdgpu: CRAT table not found
[    2.848186] amdgpu: Virtual CRAT table created for CPU
[    2.848190] amdgpu: Topology: Add CPU node
[    2.848289] Console: switching to colour dummy device 80x25
[    2.848311] amdgpu 0000:03:00.0: vgaarb: deactivate vga console
[    2.848352] amdgpu 0000:03:00.0: enabling device (0006 -> 0007)
[    2.850196] amdgpu 0000:03:00.0: amdgpu: Fetched VBIOS from VFCT
[    2.850196] amdgpu: ATOM BIOS: BR77997.001
[    2.850203] amdgpu 0000:03:00.0: amdgpu: Trusted Memory Zone (TMZ) feature disabled as experimental (default)
[    2.850336] amdgpu 0000:03:00.0: BAR 2: releasing [mem 0x4010000000-0x40101fffff 64bit pref]
[    2.850348] amdgpu 0000:03:00.0: BAR 0: releasing [mem 0x4000000000-0x400fffffff 64bit pref]
[    2.850380] amdgpu 0000:03:00.0: BAR 0: assigned [mem 0x4200000000-0x43ffffffff 64bit pref]
[    2.850386] amdgpu 0000:03:00.0: BAR 2: assigned [mem 0x4100000000-0x41001fffff 64bit pref]
[    2.850425] amdgpu 0000:03:00.0: amdgpu: VRAM: 8176M 0x0000008000000000 - 0x00000081FEFFFFFF (8176M used)
[    2.850427] amdgpu 0000:03:00.0: amdgpu: GART: 512M 0x0000000000000000 - 0x000000001FFFFFFF
[    2.850428] amdgpu 0000:03:00.0: amdgpu: AGP: 267894784M 0x0000008400000000 - 0x0000FFFFFFFFFFFF
[    2.850673] amdgpu 0000:03:00.0: amdgpu: PSP runtime database doesn't exist
[    2.850677] amdgpu 0000:03:00.0: amdgpu: PSP runtime database doesn't exist
[    4.064604] amdgpu 0000:03:00.0: amdgpu: STB initialized to 2048 entries
[    4.065094] amdgpu 0000:03:00.0: amdgpu: Will use PSP to load VCN firmware
[    4.259768] amdgpu 0000:03:00.0: amdgpu: RAS: optional ras ta ucode is not available
[    4.281896] amdgpu 0000:03:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
[    4.281920] amdgpu 0000:03:00.0: amdgpu: smu driver if version = 0x0000000f, smu fw if version = 0x00000013, smu fw program = 0, version = 0x003b2900 (59.41.0)
[    4.281925] amdgpu 0000:03:00.0: amdgpu: SMU driver if version not matched
[    4.281962] amdgpu 0000:03:00.0: amdgpu: use vbios provided pptable
[    4.330961] amdgpu 0000:03:00.0: amdgpu: SMU is initialized successfully!
[    4.463147] amdgpu: sdma_bitmap: ffff
[    4.463185] amdgpu: SRAT table not found
[    4.463186] amdgpu: Virtual CRAT table created for GPU
[    4.463332] amdgpu: Topology: Add dGPU node [0x73ff:0x1002]
[    4.463333] kfd kfd: amdgpu: added device 1002:73ff
[    4.463349] amdgpu 0000:03:00.0: amdgpu: SE 2, SH per SE 2, CU per SH 8, active_cu_number 32
[    4.463380] amdgpu 0000:03:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[    4.463380] amdgpu 0000:03:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[    4.463381] amdgpu 0000:03:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[    4.463381] amdgpu 0000:03:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
[    4.463382] amdgpu 0000:03:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
[    4.463382] amdgpu 0000:03:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
[    4.463382] amdgpu 0000:03:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
[    4.463383] amdgpu 0000:03:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
[    4.463383] amdgpu 0000:03:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
[    4.463384] amdgpu 0000:03:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
[    4.463384] amdgpu 0000:03:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
[    4.463385] amdgpu 0000:03:00.0: amdgpu: ring sdma1 uses VM inv eng 13 on hub 0
[    4.463385] amdgpu 0000:03:00.0: amdgpu: ring vcn_dec_0 uses VM inv eng 0 on hub 1
[    4.463385] amdgpu 0000:03:00.0: amdgpu: ring vcn_enc_0.0 uses VM inv eng 1 on hub 1
[    4.463386] amdgpu 0000:03:00.0: amdgpu: ring vcn_enc_0.1 uses VM inv eng 4 on hub 1
[    4.463386] amdgpu 0000:03:00.0: amdgpu: ring jpeg_dec uses VM inv eng 5 on hub 1
[    4.464248] amdgpu 0000:03:00.0: amdgpu: Using BACO for runtime pm
[    4.464401] [drm] Initialized amdgpu 3.48.0 20150101 for 0000:03:00.0 on minor 0
[    4.468844] fbcon: amdgpudrmfb (fb0) is primary device
[    4.468872] [drm] DSC precompute is not needed.
[    4.647304] Console: switching to colour frame buffer device 240x67
[    4.664260] amdgpu 0000:03:00.0: [drm] fb0: amdgpudrmfb frame buffer device
[    9.616281] snd_hda_intel 0000:03:00.1: bound 0000:03:00.0 (ops amdgpu_dm_audio_component_bind_ops [amdgpu])
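The VRAM, GART and AGP lines above report each region's size in MiB next to its inclusive address range. As a quick sanity check that those lines are internally consistent, here is a small Python sketch that recomputes each size from the range; for example, VRAM 0x0000008000000000 - 0x00000081FEFFFFFF spans 0x1FF000000 bytes, which is exactly 8176 MiB. The script name check_ranges.py is hypothetical, and the regex only assumes the line format shown above.

```python
import re
import sys

# Matches dmesg lines such as
#   "amdgpu: VRAM: 8176M 0x0000008000000000 - 0x00000081FEFFFFFF (8176M used)"
RANGE_RE = re.compile(
    r"amdgpu: (VRAM|GART|AGP): (\d+)M (0x[0-9a-fA-F]+) - (0x[0-9a-fA-F]+)"
)
MIB = 1024 * 1024


def check_ranges(dmesg_text: str) -> dict[str, tuple[int, int]]:
    """Map region name -> (reported MiB, MiB spanned by the inclusive range)."""
    result = {}
    for name, size_mib, start, end in RANGE_RE.findall(dmesg_text):
        span = int(end, 16) - int(start, 16) + 1  # both bounds are inclusive
        result[name] = (int(size_mib), span // MIB)
    return result


if __name__ == "__main__":
    for name, (reported, computed) in check_ranges(sys.stdin.read()).items():
        status = "ok" if reported == computed else "MISMATCH"
        print(f"{name}: reported {reported}M, range spans {computed}M -> {status}")
```

Assumed usage: `dmesg | python3 check_ranges.py` (dmesg may require root on some systems). With the three lines quoted above, all regions come out consistent, so these boot messages by themselves do not point at a memory-layout problem.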

Comment 9 pf 2022-11-11 05:31:51 UTC

Not sure what resolved this, but the system has been stable for nearly 3 weeks.

Comment 10 pf 2022-11-30 04:17:51 UTC

Should I reopen this bug or create a new one?

There is still some flickering, but it's mild. It most often occurs if I move the mouse into the systray and then slide left towards the panel's desktop selector. As the mouse moves across the application icons, flickering happens briefly, and it clears up when the mouse is moved up and away from the panel.