Bug 346184 - Movit kills the graphics driver
Summary: Movit kills the graphics driver
Status: RESOLVED FIXED
Alias: None
Product: kdenlive
Classification: Applications
Component: Video Display & Export
Version: unspecified
Platform: Compiled Sources Linux
Importance: NOR critical
Target Milestone: ---
Assignee: Jean-Baptiste Mardelle
Reported: 2015-04-14 19:26 UTC by Paul Konecny
Modified: 2018-12-05 18:42 UTC
CC List: 1 user

fritzibaby: MOVIT+


Attachments
Effects in Timeline (2.86 MB, image/jpeg)
2015-04-14 19:28 UTC, Paul Konecny
Radeon crash messages (3.07 MB, image/jpeg)
2015-04-14 19:29 UTC, Paul Konecny
movit pkgbuild (642 bytes, text/plain)
2015-04-14 19:30 UTC, Paul Konecny
MLT pkgbuild (1.41 KB, text/plain)
2015-04-14 19:31 UTC, Paul Konecny
kdenlive pkgbuild (1.50 KB, text/plain)
2015-04-14 19:31 UTC, Paul Konecny
Journald output for radeon (96.17 KB, text/x-log)
2015-04-26 11:18 UTC, Paul Konecny
Backtrace for nouveau (23.42 KB, text/x-log)
2015-07-20 08:29 UTC, Paul Konecny
Journalctl grepped for nouveau (7.64 KB, text/x-log)
2015-07-20 08:30 UTC, Paul Konecny

Description Paul Konecny 2015-04-14 19:26:47 UTC
Hello J-B, 
I tried grading a GoPro video with GPU effects and this led to a GPU lockup.
I don't know exactly what caused this. It could be the Arch drivers, upstream, Movit, kdenlive or a combination of all of them.
The effects used were White balance, Saturation and Deconvolution sharpen.
After adding all the effects and trying to seek in the timeline, my screen went black, came back, went black again, and so forth. Switching to a console, I got the error messages you can see in the attached images.
I don't know if this is the right place to start searching for a solution, but I just wanted to let you know.
I'll attach the PKGBUILDs I used for kdenlive, Movit and MLT.
If you want, I can send you the video file I used. Sadly I have no project file, as things go wrong before I can save one.

Reproducible: Always

Steps to Reproduce:
1. Create project
2. Add clips to bin and drag to timeline
3. Add effects
4. Seek in the timeline and watch the GPU crash and burn



Hardware:
AMD Phenom X6 1090T, 8GB RAM
AMD Radeon HD6970, 2 GB VRAM
Software:
Arch 64bit
Kdenlive version up to commit 2a5bd36
MLT version 0.9.6
Movit 1.1.3
Comment 1 Paul Konecny 2015-04-14 19:28:05 UTC
Created attachment 92027 [details]
Effects in Timeline
Comment 2 Paul Konecny 2015-04-14 19:29:12 UTC
Created attachment 92028 [details]
Radeon crash messages
Comment 3 Paul Konecny 2015-04-14 19:30:46 UTC
Created attachment 92029 [details]
movit pkgbuild
Comment 4 Paul Konecny 2015-04-14 19:31:10 UTC
Created attachment 92030 [details]
MLT pkgbuild
Comment 5 Paul Konecny 2015-04-14 19:31:41 UTC
Created attachment 92031 [details]
kdenlive pkgbuild
Comment 6 Jean-Baptiste Mardelle 2015-04-14 21:36:58 UTC
I could reproduce some kind of monitor freeze after seeking with the 3 effects mentioned.
I have reported a few issues about OpenGL and Movit to Dan (MLT's author), which he is investigating. Maybe that will help; we have to wait for his feedback, since that area is not easy for me to understand.
Comment 7 Paul Konecny 2015-04-15 07:22:12 UTC
Thanks J-B, I'll test it today on my notebook with Intel HD4400 graphics.
What GPU are you using? Did yours lock up as well? Because if not, maybe the issues you found should be reported to the Mesa/kernel developers so they can fix the radeon driver.
Comment 8 Paul Konecny 2015-04-15 10:39:03 UTC
I just tried it on my notebook, and while the Intel driver didn't crash, my system became so slow it was unusable. I guess the Intel IGP can't handle the computational workload? The good news is that I now have a project file for you if you want it.

Bad news: my notebook has hybrid graphics, so I thought I'd try crashing my laptop's Radeon, but kdenlive wouldn't even start. I filed a bug for this here: https://bugs.kde.org/show_bug.cgi?id=346213
Comment 9 Paul Konecny 2015-04-26 11:16:42 UTC
I managed to recover the journal from the lockup and it seems pretty bad. 
I'll attach it. I hope it helps you and Dan.
Comment 10 Paul Konecny 2015-04-26 11:18:13 UTC
Created attachment 92228 [details]
Journald output for radeon
Comment 11 Paul Konecny 2015-04-26 12:22:08 UTC
As the Intel driver didn't lock up, I decided to file a bug for r600g as well.
https://bugs.freedesktop.org/show_bug.cgi?id=90184
I hope they can help.
Comment 12 Paul Konecny 2015-05-17 15:38:04 UTC
Hi J-B,
After further testing on this issue, I wanted to ask: is it possible for kdenlive to detect a GPU lockup?
I went through my logs again, and it seems that the GPU is able to reset successfully but locks up again immediately after the reset. Is there a way to stop kdenlive's command stream to the GPU if a GPU lockup is detected?
Comment 13 Jean-Baptiste Mardelle 2015-05-18 19:35:12 UTC
Paul, just to be sure, did you try Kdenlive's recent git? I made some changes around the 10th of May that might help with this problem.

The Deconvolution effect requires a huge amount of computing power, so seeking is a pain and there is a big lag, but no freeze for me.
Comment 14 Paul Konecny 2015-05-18 20:09:49 UTC
Yes, I tried it again with version 15.07.0 (rev. v15.03.97-287-gb74d079) (from May 12, I think).
On my Intel hardware there is no lockup. As you said, the system just slows down considerably.
On my HD6970 though, as soon as I drop the effect on the clip ... boom. 
As you can see below, the GPU locks up, is then reset by the driver, tries to resume processing and hangs again. As mentioned above, I already filed a bug for radeon, but so far I've had no responses. Unfortunately I don't have access to a newer GCN card to see whether it's just a problem with my GPU generation. Do you have access to AMD hardware so you could try as well?
I was lucky to get the dmesg output via a shell on VT6, as every reset also erased what I had typed on the command line.

[ 3456.613024] radeon 0000:01:00.0: ring 0 stalled for more than 29503msec
[ 3456.616011] radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000008360a last fence id 0x000000000008360c on ring 0)
[ 3457.112264] radeon 0000:01:00.0: ring 0 stalled for more than 30003msec
[ 3457.115228] radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000008360a last fence id 0x000000000008360c on ring 0)
[ 3457.233528] radeon 0000:01:00.0: Saved 42 dwords of commands on ring 0.
[ 3457.233611] radeon 0000:01:00.0: GPU softreset: 0x00000009
[ 3457.233614] radeon 0000:01:00.0:   GRBM_STATUS               = 0xE5704828
[ 3457.233616] radeon 0000:01:00.0:   GRBM_STATUS_SE0           = 0xFE000001
[ 3457.233618] radeon 0000:01:00.0:   GRBM_STATUS_SE1           = 0xFE000001
[ 3457.233619] radeon 0000:01:00.0:   SRBM_STATUS               = 0x200000C0
[ 3457.233686] radeon 0000:01:00.0:   SRBM_STATUS2              = 0x00000000
[ 3457.233688] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
[ 3457.233690] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00018000
[ 3457.233692] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00008000
[ 3457.233693] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x80030243
[ 3457.233695] radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
[ 3457.233697] radeon 0000:01:00.0:   R_00D834_DMA_STATUS_REG   = 0x44C83D57
[ 3457.233699] radeon 0000:01:00.0:   VM_CONTEXT0_PROTECTION_FAULT_ADDR   0x00000000
[ 3457.233701] radeon 0000:01:00.0:   VM_CONTEXT0_PROTECTION_FAULT_STATUS 0x00000000
[ 3457.233704] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000000
[ 3457.233706] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x00000000
[ 3457.248306] radeon 0000:01:00.0: GRBM_SOFT_RESET=0x0000DF7B
[ 3457.248359] radeon 0000:01:00.0: SRBM_SOFT_RESET=0x00000100
[ 3457.249513] radeon 0000:01:00.0:   GRBM_STATUS               = 0x00003828
[ 3457.249515] radeon 0000:01:00.0:   GRBM_STATUS_SE0           = 0x00000007
[ 3457.249517] radeon 0000:01:00.0:   GRBM_STATUS_SE1           = 0x00000007
[ 3457.249519] radeon 0000:01:00.0:   SRBM_STATUS               = 0x200000C0
[ 3457.249586] radeon 0000:01:00.0:   SRBM_STATUS2              = 0x00000000
[ 3457.249588] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
[ 3457.249589] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00000000
[ 3457.249591] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00000000
[ 3457.249593] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x00000000
[ 3457.249595] radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
[ 3457.249597] radeon 0000:01:00.0:   R_00D834_DMA_STATUS_REG   = 0x44C83D57
[ 3457.249674] radeon 0000:01:00.0: GPU reset succeeded, trying to resume
[ 3457.288596] [drm] enabling PCIE gen 2 link speeds, disable with radeon.pcie_gen2=0
[ 3457.290595] [drm] PCIE GART of 1024M enabled (table at 0x0000000000274000).
[ 3457.290666] radeon 0000:01:00.0: WB enabled
[ 3457.290669] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000080000c00 and cpu addr 0xffff880221362c00
[ 3457.291431] radeon 0000:01:00.0: fence driver on ring 5 use gpu addr 0x0000000000072118 and cpu addr 0xffffc90012532118
[ 3457.291433] radeon 0000:01:00.0: fence driver on ring 1 use gpu addr 0x0000000080000c04 and cpu addr 0xffff880221362c04
[ 3457.291434] radeon 0000:01:00.0: fence driver on ring 2 use gpu addr 0x0000000080000c08 and cpu addr 0xffff880221362c08
[ 3457.291436] radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000080000c0c and cpu addr 0xffff880221362c0c
[ 3457.291437] radeon 0000:01:00.0: fence driver on ring 4 use gpu addr 0x0000000080000c10 and cpu addr 0xffff880221362c10
[ 3457.310054] [drm] ring test on 0 succeeded in 3 usecs
[ 3457.310061] [drm] ring test on 3 succeeded in 4 usecs
[ 3457.310068] [drm] ring test on 4 succeeded in 4 usecs
[ 3457.486939] [drm] ring test on 5 succeeded in 2 usecs
[ 3457.486945] [drm] UVD initialized successfully.
[ 3457.491545] [drm] ib test on ring 0 succeeded in 0 usecs
[ 3457.492036] [drm] ib test on ring 3 succeeded in 0 usecs
[ 3457.492469] [drm] ib test on ring 4 succeeded in 0 usecs
[ 3457.644113] [drm] ib test on ring 5 succeeded
[ 3467.629336] radeon 0000:01:00.0: ring 0 stalled for more than 10000msec
[ 3467.632010] radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000008360e last fence id 0x0000000000083610 on ring 0)
[ 3468.128563] radeon 0000:01:00.0: ring 0 stalled for more than 10500msec
[ 3468.131315] radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000008360e last fence id 0x0000000000083610 on ring 0)
[ 3468.627795] radeon 0000:01:00.0: ring 0 stalled for more than 11000msec
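Comment 12 asks whether a lockup can be detected. As a minimal, hypothetical sketch for post-mortem log triage (not part of kdenlive, MLT or Movit, and not a way for an application to detect a lockup in-process), the reset-then-relock cycle in the log above can be spotted by scanning `dmesg` output for the radeon `GPU lockup` and `GPU reset succeeded` messages. The function name `find_relockups` and the 60-second window are assumptions chosen for illustration; the patterns only match the radeon message format shown here, so nouveau messages would need their own patterns.

```python
import re
import sys

# Hypothetical patterns for radeon kernel log lines in dmesg's default format, e.g.
# "[ 3456.616011] radeon 0000:01:00.0: GPU lockup (current fence id ...)"
LOCKUP_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+radeon\s+\S+:\s+GPU lockup\b")
RESET_OK_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+radeon\s+\S+:\s+GPU reset succeeded\b")


def find_relockups(log_text, window_s=60.0):
    """Return (reset_time, lockup_time) pairs in which a new GPU lockup
    follows a successful reset within window_s seconds (assumed threshold)."""
    events = []
    for line in log_text.splitlines():
        m = LOCKUP_RE.match(line)
        if m:
            events.append(("lockup", float(m.group(1))))
            continue
        m = RESET_OK_RE.match(line)
        if m:
            events.append(("reset", float(m.group(1))))

    pairs = []
    last_reset = None
    for kind, t in events:
        if kind == "reset":
            last_reset = t
        elif last_reset is not None and t - last_reset <= window_s:
            # Only the first lockup after each reset is reported.
            pairs.append((last_reset, t))
            last_reset = None
    return pairs


if __name__ == "__main__":
    # Usage (assumption): dmesg | python3 find_relockups.py
    for reset_t, lockup_t in find_relockups(sys.stdin.read()):
        print(f"reset at {reset_t:.3f}s, locked up again at {lockup_t:.3f}s")
```

Fed the excerpt above, it would pair the reset at 3457.249674 with the lockup at 3467.632010, which is the "resets successfully but locks up again" pattern described in Comment 12.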
Comment 15 Paul Konecny 2015-06-22 13:42:59 UTC
Hi JB, 
I just got a chance to test this on an AMD HD7950 Tahiti GPU, and fortunately there's no lockup there. It seems this is only an issue on older VLIW4 and maybe VLIW5 cards (up to HD6000).
GCN cards (HD7000 and newer) don't seem to be affected.
As this does not appear to be an issue in kdenlive, I believe we can close this.
Thanks!

On a side note: you mentioned there would be a dev sprint this summer where donations would be appreciated. Any news on that? Jesse (a professional video editor) and I would love to support you guys and the great work you're doing.
Thanks and Cheers!
Comment 16 Paul Konecny 2015-07-20 08:29:36 UTC
Created attachment 93656 [details]
Backtrace for nouveau

Version 15.07.0 (rev. v15.04.0-439-g0d73372)
mlt 0.9.6
movit 1.1.3
It seems that nouveau is also affected by this, although not as badly as r600g. I was able to get a backtrace with SIGABRT out of it before it locked up. I'll try the NVIDIA proprietary driver and report back.
Thanks!
Comment 17 Paul Konecny 2015-07-20 08:30:22 UTC
Created attachment 93657 [details]
Journalctl grepped for nouveau
Comment 18 Paul Konecny 2015-07-20 08:50:19 UTC
I can confirm the NVIDIA 352 driver has no issues.
Affected drivers so far are r600g and nouveau.
Should there be a warning for people using these drivers?