Bug 408004

Summary: No display: ATSC tuner appears to work, but "mjpeg demux error: cannot peek"
Product: [Applications] kaffeine Reporter: awf_ul <awf_ul>
Component: generalAssignee: Mauro Carvalho Chehab <mchehab>
Status: RESOLVED UPSTREAM    
Severity: normal    
Priority: NOR    
Version: 2.0.17   
Target Milestone: ---   
Platform: Other   
OS: Linux   
Latest Commit: Version Fixed In:
Bug Depends on:    
Bug Blocks: 408090    
Attachments: dvb5scan
dvb5zap
dvb_channel.conf
dvb-fe
dmesg from f29 kern 4.18.16-300 - good
dmesg from f30 kern 5.1.6-300 - bad
lspci from f29 kern 4.18.16-300 - good
lspci from f30 kern 5.1.6-300 - bad
Second bisect; failure repeatable and solid
dmesg after bisect2 - from 4.20.0-rc4 - good
dmesg after bisect2 - from 4.20.0-rc4 - bad
dmesg from kernel v5.2
lspci from kernel v5.2

Description awf_ul 2019-05-27 17:40:59 UTC
ATSC TV tuner appears to work, Channels show lock and good SNR on known good station.

gets [00007fe81001d3e0] mjpeg demux error: cannot peek

STEPS TO REPRODUCE
1. Start Kaffeine using known scan tables from Fedora 29 which was functional
2. Select Digital TV; says Playing in upper left corner, station info and time display
3. 

OBSERVED RESULT

Black screen, randomly *does* start to function but very rarely.


27-05-19 13:13:42.059 [Info    ] kaffeine.dvb: Using built-in dvb device manager
27-05-19 13:13:42.315 [Info    ] kaffeine.dev: Found dvb device P1131713317de7352: Nextwave NXT200X VSB/QAM frontend
27-05-19 13:13:42.524 [Warning ] kf5.kio.core: "Could not enter folder tags:/."
[00007fe81001d3e0] mjpeg demux error: cannot peek
[00007fe82901dbe0] mjpeg demux error: cannot peek
[00007fe82901df50] mjpeg demux error: cannot peek
[00007fe82901e1b0] mjpeg demux error: cannot peek
[00007fe82901e1b0] mjpeg demux error: cannot peek
[00007fe82901e1b0] mjpeg demux error: cannot peek


EXPECTED RESULT
Ran well in Fedora 29. Online updated to Fedora 30. Other video player like VLC works with tuner and plays ATSC channel.


SOFTWARE/OS VERSIONS
 
Linux/KDE Plasma: 
Up to date Fedora 30, x86_64. 

ADDITIONAL INFORMATION

58930.143142] radeon_dp_aux_transfer_native: 32 callbacks suppressed <repeated many times>

In DMesg: [131517.358412] dvb_frontend: dvb_frontend_get_frequency_limits: frequency interval: tuner: 0...0, frontend: 54000000...860000000
Comment 1 awf_ul 2019-05-29 15:35:37 UTC
Updated today, May 25th 2019 newest update to video drivers.

Is functioning. Thank You.
Comment 2 awf_ul 2019-05-30 00:15:59 UTC
Nope; it appears to not be fixed. If the tuner or screenbuffer is opened, tuned, initialized by another app it appears to work. Otherwise, just a blank screen.
Comment 3 Mauro Carvalho Chehab 2019-05-30 02:00:42 UTC
(In reply to awf_ul from comment #2)
> Nope; it appears to not be fixed. If the tuner or screenbuffer is opened,
> tuned, initialized by another app it appears to work. Otherwise, just a
> blank screen.

Why did you open #408090? I can't see the difference from this one...

What's really weird here is to complain about mjpeg... This sounds completely bogus, as it doesn't sound right to receive mjpeg streams with ATSC.

In order to be able to debug it, I need ~60 seconds with a full record of the channel you're having problems, in order to use a RF generator and test it locally. It can be captured using dvbv5-zap:

https://linuxtv.org/wiki/index.php/Dvbv5-zap#Recording_the_full_MPEG_stream_on_a_transponder

Even 60 seconds of record will produce a big file. So, you'll likely need to use Google Driver or something similar and e-mail me a link for download.
Comment 4 Mauro Carvalho Chehab 2019-05-30 02:01:06 UTC
*** Bug 408090 has been marked as a duplicate of this bug. ***
Comment 5 awf_ul 2019-05-31 14:12:05 UTC
Created attachment 120438 [details]
dvb5scan
Comment 6 awf_ul 2019-05-31 14:12:35 UTC
Created attachment 120439 [details]
dvb5zap
Comment 7 awf_ul 2019-05-31 14:14:07 UTC
Created attachment 120440 [details]
dvb_channel.conf
Comment 8 awf_ul 2019-05-31 14:16:09 UTC
Created attachment 120441 [details]
dvb-fe
Comment 9 Mauro Carvalho Chehab 2019-05-31 14:20:22 UTC
(In reply to awf_ul from comment #6)
> Created attachment 120439 [details]
> dvb5zap

Good! I need the file created by this command (mpeg.ts).

As I said before, as it is recording the full TS with all tables, this is too big to be attached here (or even to be sent via e-mail). You need to place it on some place (like Google Drive or any other place where I can get it) and send me a link to download it (it could be directly to my e-mail).
Comment 10 awf_ul 2019-05-31 14:33:46 UTC
I first did a dvb5scan to identify my local channels using ATSC center freq.

-> dvbv5-scan -o ./dvb_channel.conf /usr/share/dvbv5/atsc/us-ATSC-center-frequencies-8VSB -C US -v

         *an anomaly - certain stations that locked did not make it into    dvb_channel.conf

Then I did a dvb5-zap to dump the stream.

-> dvbv5-zap -P -t 60 "FOX66" -c ./dvb_channel.conf -o mpeg.ts

The mpeg stream was empty file, several attempts, after power down to reset board. Makes sense as the display screen is always blank.

Note: oddly the on screen display is updating, the EPG?, showing the upcoming shows correctly. Isn't that some kind of a substream? Why no mpeg.ts output then?


<why did I open the other bug? This is maybe the second time ever I have used these systems and do not have a statechart of operations or a feel of how it all works; am just a messenger. I can open a bug, close a bug, but not reopen one? So I cloned it. I did not know what to do to make sure it wasn't lost. >
Comment 11 awf_ul 2019-06-03 19:04:13 UTC
confirmed many times that the dvb5zap would start "Record to file 'mpeg.ts' started", but dump no mpeg.ts file.

After maybe 20-30 tries, and an overnight time delay, dvb5zap actually recorded an mpeg.ts file, and at that time kaffeine began to work normally across many stations.

After shutting down the app, it would black screen again, dvb5zap created empty mpeg.ts files again. Same after power cycle.

It appears to me that for some reason the board (KWorld ATSC110) is not always initializing properly.

My system is very stable in all regards, voltages good, though an outside chance a bad tuner board?
Comment 12 awf_ul 2019-06-03 19:19:44 UTC
https://gofile.io/?c=tlNZN2  - a bit short as it caught me off guard when it began to finally record.
Comment 13 Mauro Carvalho Chehab 2019-06-03 20:17:37 UTC
(In reply to awf_ul from comment #11)
> confirmed many times that the dvb5zap would start "Record to file 'mpeg.ts'
> started", but dump no mpeg.ts file.
> 
> After maybe 20-30 tries, and an overnight time delay, dvb5zap actually
> recorded an mpeg.ts file, and at that time kaffeine began to work normally
> across many stations.
> 
> After shutting down the app, it would black screen again, dvb5zap created
> empty mpeg.ts files again. Same after power cycle.
> 
> It appears to me that for some reason the board (KWorld ATSC110) is not
> always initializing properly.
> 
> My system is very stable in all regards, voltages good, though an outside
> chance a bad tuner board?

If you're having problems with dvbv5-zap too, then indeed it is a hardware problem. There are some alternatives:

1) bad antenna/antenna cabling. Usually PC devices require a stronger/better signal than normal TV sets. So, even if you have a good signal at TV, you may not have a good enough signal for those cheap USB sticks;

2) it could indeed be a broken USB stick that it is starting to fail;

3) If you're using hardware like Raspberry Pi, it could be power problem at USB. There are known problems with USB hardware on some RPi devices that cause the device to restart multiple times. Also, you need very strong power supply, in order to give enough power to RPi3 and still have at least 500 mW for the USB stick, without dropping voltage.

The dvbv5-zap has a monitoring mode (-m). It runs the device on "all-pid" mode (just like -P) and checks if there are broken frames, counting them. I suggest you to run it for some time on this mode and see what happens.
Comment 14 awf_ul 2019-06-05 17:21:47 UTC
Update:
Not a hardware problem. Everything runs well in Fedora 29 Live but broken in Fedora 30. So broken dvbv5-scan is not functioning properly.

What I did:
Loaded Fedora 29 Live from DVD. Reloaded saa7134 module so board firmware would load. dvbv5-scan ran fine, created all channels in dvb_channels.conf. dvb-zap ran well on two channels, capturing streams from both. Both streams fine using VLC.

I then tried Fedora 30. I did dnf clean all; dnf update; dnf distro-sync.
dvbv5-scan sees and locks a good channel but does not get the information from the stream.

...
Scanning frequency #6 177028615
ERROR    command INVERSION (6) not found during store
ERROR    command INVERSION (6) not found during store
FREQUENCY = 177028615
MODULATION = VSB/8
DELIVERY_SYSTEM = ATSC
       (0x00) Signal= 88.40% C/N= 51.32% UCB= 33630 postBER= 2328
Got parameters for ATSC:
FREQUENCY = 177028615
MODULATION = VSB/8
Lock   (0x1f) Signal= 89.55% C/N= 53.58% UCB= 35112 postBER= 3936
dvb_read_sections: waiting for table ID 0x00, program ID 0x00
ERROR    dvb_read_sections: no data read on section filter
ERROR    error while waiting for PAT table
Scanning frequency #7 183028615
...
Comment 15 Mauro Carvalho Chehab 2019-06-05 17:29:48 UTC
(In reply to awf_ul from comment #14)
> Update:
> Not a hardware problem. Everything runs well in Fedora 29 Live but broken in
> Fedora 30. So broken dvbv5-scan is not functioning properly.
> 
> What I did:
> Loaded Fedora 29 Live from DVD. Reloaded saa7134 module so board firmware
> would load. dvbv5-scan ran fine, created all channels in dvb_channels.conf.
> dvb-zap ran well on two channels, capturing streams from both. Both streams
> fine using VLC.
> 
> I then tried Fedora 30. I did dnf clean all; dnf update; dnf distro-sync.
> dvbv5-scan sees and locks a good channel but does not get the information
> from the stream.
> 
> ...
> Scanning frequency #6 177028615
> ERROR    command INVERSION (6) not found during store
> ERROR    command INVERSION (6) not found during store
> FREQUENCY = 177028615
> MODULATION = VSB/8
> DELIVERY_SYSTEM = ATSC
>        (0x00) Signal= 88.40% C/N= 51.32% UCB= 33630 postBER= 2328
> Got parameters for ATSC:
> FREQUENCY = 177028615
> MODULATION = VSB/8
> Lock   (0x1f) Signal= 89.55% C/N= 53.58% UCB= 35112 postBER= 3936
> dvb_read_sections: waiting for table ID 0x00, program ID 0x00
> ERROR    dvb_read_sections: no data read on section filter
> ERROR    error while waiting for PAT table
> Scanning frequency #7 183028615
> ...

Ok. It seems very likely that this is either a Kernel or a dvbv5 library regression.

Let's try to isolate it. Could you please try to:

1) run the Fedora 29 Kernel you have at the LiveCD on Fedora 30?

You should be able to download it from https://koji.fedoraproject.org/koji/, for example (or even from a F29 repository).

2) do the same for dvbv5?
Comment 16 awf_ul 2019-06-05 18:43:36 UTC
I installed kernel 4.18.16 from F29 onto my F30 environment.

dvbv5-scan worked, as well as dvb-zap, and I can view the mpeg.ts file with vlc. ,apparently pointing to something problematic in the 5.1.x kernel.
Comment 17 Mauro Carvalho Chehab 2019-06-05 19:13:25 UTC
(In reply to awf_ul from comment #16)
> I installed kernel 4.18.16 from F29 onto my F30 environment.
> 
> dvbv5-scan worked, as well as dvb-zap, and I can view the mpeg.ts file with
> vlc. ,apparently pointing to something problematic in the 5.1.x kernel.

Ok. Please keep doing more tests with this Kernel version. As this seems to be an intermitent bug, better to keep this running for a while. If it keep working properly for a couple of days, please attach the output of dmesg with the working Kernel and with the problematic one.

The device you're using is the saa7134-based Kworld ATSC 110, right?

The only change on this driver since 4.18 was the addition of composite and s-video inputs for "ASUS P7131 4871", with shoudn't affect anything.

So, if a regression was introduced, it was somewhere either at the PCI core or inside the Kernel core. Before trying to dig into it, better to confirm that the bug has gone with 4.18.
Comment 18 awf_ul 2019-06-07 13:21:55 UTC
The KWorld ATSC 115 (PCI ID 17de:7352) board works without fail on the Fedora 29 4.18.16-300 kernel under an up to date Fedora 30 environment. It fails continuously under the current Fedora 30 5.1.6-300 kernel under the same conditions.

I have attached the dmesgs from each kernel, as well as the lspci -vvv.

If I may add; under the 5.1.x kernels, the board did work maybe one in twenty times, just once. And only after I tried multiple multiple dtv applications. It appeared the board was not being initialized or maybe tuned properly.

Thank You for your assistance!
Comment 19 awf_ul 2019-06-07 13:23:17 UTC
Created attachment 120655 [details]
dmesg from f29 kern 4.18.16-300 - good
Comment 20 awf_ul 2019-06-07 13:24:12 UTC
Created attachment 120656 [details]
dmesg from f30 kern 5.1.6-300 - bad
Comment 21 awf_ul 2019-06-07 13:25:28 UTC
Created attachment 120657 [details]
lspci from f29 kern 4.18.16-300 - good
Comment 22 awf_ul 2019-06-07 13:26:19 UTC
Created attachment 120658 [details]
lspci from f30 kern 5.1.6-300 - bad
Comment 23 awf_ul 2019-06-07 13:30:07 UTC
I will add there doesn't appear to be any obvious significant difference between the dmesg's or the lspci's.
Comment 24 Mauro Carvalho Chehab 2019-06-07 15:00:19 UTC
Logs seem normal. Nothing seems to point to a problem. I guess you'll need to do a git bisect, in order to pinpoint the patch that broke it.

There are some instructions about how to proceed at:

   https://www.kernel.org/doc/html/latest/admin-guide/bug-bisect.html
   https://wiki.gentoo.org/wiki/Kernel_git-bisect

A bisect usually takes ~14 Kernel compilations, in order to be able to identify a broken patch.

I also asked a friend of mine with has some Kworld devices if he may be able do a test there, and eventually help with bisect. However, it may take a while for him to do that, as he needs to find not only his board, but also a working machine with a PCI slot.
Comment 25 awf_ul 2019-07-14 12:54:33 UTC
Did bisect, checking each kernel compilation along the way; the failing commit is here:

3731c3d4774e38b9d91c01943e1e6a243c1776be is the first bad commit
commit 3731c3d4774e38b9d91c01943e1e6a243c1776be
Author: Christoph Hellwig <hch@lst.de>
Date:   Thu Dec 6 12:50:26 2018 -0800

    dma-mapping: always build the direct mapping code
    
    All architectures except for sparc64 use the dma-direct code in some
    form, and even for sparc64 we had the discussion of a direct mapping
    mode a while ago.  In preparation for directly calling the direct
    mapping code don't bother having it optionally but always build the
    code in.  This is a minor hardship for some powerpc and arm configs
    that don't pull it in yet (although they should in a relase ot two),
    and sparc64 which currently doesn't need it at all, but it will
    reduce the ifdef mess we'd otherwise need significantly.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
    Tested-by: Jesper Dangaard Brouer <brouer@redhat.com>
    Tested-by: Tony Luck <tony.luck@intel.com>

:040000 040000 639c86795e7435f7dde3bf8a48f183f932c08e16 ab40fc2f930f14428513fca3
739ef28b7b9cefd1 M	arch
:040000 040000 28ddf6fbe5648c7655ac4eed39116e451c996885 e5c852697674caada8c62f1936646e0155d90061 M	kernel
Comment 26 awf_ul 2019-07-14 15:47:38 UTC
As I added before it looks like the PC may have to be powered down between new kernels; there is a possibility the initialization of the card may stick across reboots. Once in maybe 10-20 tries, the card works in what would normally be a "bad" kernel, but it may be from the previous "good" kernel.
That said, I will replay the last good/bad bisect with power down to verify.
Comment 27 awf_ul 2019-07-15 02:03:14 UTC
Ooops; somethings wrong... I'll go over it again.
Comment 28 awf_ul 2019-07-16 11:29:11 UTC
Created attachment 121549 [details]
Second bisect; failure repeatable and solid

Repeated last few stages of bisect; now have two kernels across this boundary; one works with PCIe board, and the other does not.
Comment 29 awf_ul 2019-07-16 16:17:54 UTC
Created attachment 121554 [details]
dmesg after bisect2 - from 4.20.0-rc4 - good
Comment 30 awf_ul 2019-07-16 16:19:09 UTC
Created attachment 121555 [details]
dmesg after bisect2 - from 4.20.0-rc4 - bad
Comment 31 awf_ul 2019-07-17 01:48:55 UTC
Created attachment 121568 [details]
dmesg from kernel v5.2
Comment 32 awf_ul 2019-07-17 02:18:46 UTC
Created attachment 121569 [details]
lspci from kernel v5.2
Comment 33 awf_ul 2019-07-17 02:28:21 UTC
Below is the result of the second bisect (see bisect log in attachment "Second bisect; failure repeatable and solid") 

55897af63091ebc2c3f239c6a6666f748113ac50 is the first bad commit
commit 55897af63091ebc2c3f239c6a6666f748113ac50
Author: Christoph Hellwig <hch@lst.de>
Date:   Mon Dec 3 11:43:54 2018 +0100

    dma-direct: merge swiotlb_dma_ops into the dma_direct code
    
    While the dma-direct code is (relatively) clean and simple we actually
    have to use the swiotlb ops for the mapping on many architectures due
    to devices with addressing limits.  Instead of keeping two
    implementations around this commit allows the dma-direct
    implementation to call the swiotlb bounce buffering functions and
    thus share the guts of the mapping implementation.  This also
    simplified the dma-mapping setup on a few architectures where we
    don't have to differenciate which implementation to use.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
    Tested-by: Jesper Dangaard Brouer <brouer@redhat.com>
    Tested-by: Tony Luck <tony.luck@intel.com>

:040000 040000 e9b0807231777e3da732cb3f744a8867f3128f90 95962d177078f04ae1bd66f560c87a9eed9d0f0e M	arch
:040000 040000 e7a90740563e49061bb6baf6413e9de37a919820 b537630e5cf02def1bcfac744cd06da0f8201ff8 M	include
:040000 040000 4698daf585ffbc830cfda58e8a2e8390f66b4cf9 a942b6a22ca5f49aaef13490a03e4589766fd497 M	kernel
Comment 34 awf_ul 2019-07-17 02:30:44 UTC
Sent upstream to iommu@lists.linux-foundation.org on advice of Christoph Hellwig. Will update this bug.
Comment 35 awf_ul 2019-07-19 13:50:26 UTC
Fixed. I applied the following patch to kernel v5.2 by the great efforts of those on the iommu@lists.linux-foundation.org team and Mauro. My machine works as normal.

Fixes: 55897af63091("dma-direct: merge swiotlb_dma_ops into the dma_direct code")
Signed-off-by: Fugang Duan <fugang.duan@nxp.com>
---
  kernel/dma/direct.c | 18 +++++++++++-------
  1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index b90e1ae..0e87f86 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -242,12 +242,14 @@ void dma_direct_sync_sg_for_device(struct device *dev,
      int i;
        for_each_sg(sgl, sg, nents, i) {
-        if (unlikely(is_swiotlb_buffer(sg_phys(sg))))
-            swiotlb_tbl_sync_single(dev, sg_phys(sg), sg->length,
+        phys_addr_t paddr = dma_to_phys(dev, sg_dma_address(sg));
+
+        if (unlikely(is_swiotlb_buffer(paddr)))
+            swiotlb_tbl_sync_single(dev, paddr, sg->length,
                      dir, SYNC_FOR_DEVICE);
            if (!dev_is_dma_coherent(dev))
-            arch_sync_dma_for_device(dev, sg_phys(sg), sg->length,
+            arch_sync_dma_for_device(dev, paddr, sg->length,
                      dir);
      }
  }
@@ -279,11 +281,13 @@ void dma_direct_sync_sg_for_cpu(struct device *dev,
      int i;
        for_each_sg(sgl, sg, nents, i) {
+        phys_addr_t paddr = dma_to_phys(dev, sg_dma_address(sg));
+
          if (!dev_is_dma_coherent(dev))
-            arch_sync_dma_for_cpu(dev, sg_phys(sg), sg->length, dir);
-   
-        if (unlikely(is_swiotlb_buffer(sg_phys(sg))))
-            swiotlb_tbl_sync_single(dev, sg_phys(sg), sg->length, dir,
+            arch_sync_dma_for_cpu(dev, paddr, sg->length, dir);
+
+        if (unlikely(is_swiotlb_buffer(paddr)))
+            swiotlb_tbl_sync_single(dev, paddr, sg->length, dir,
                      SYNC_FOR_CPU);
      }
Comment 36 Mauro Carvalho Chehab 2019-07-19 14:11:30 UTC
(In reply to awf_ul from comment #35)
> Fixed. I applied the following patch to kernel v5.2 by the great efforts of
> those on the iommu@lists.linux-foundation.org team and Mauro. My machine
> works as normal.
> 
> Fixes: 55897af63091("dma-direct: merge swiotlb_dma_ops into the dma_direct
> code")
> Signed-off-by: Fugang Duan <fugang.duan@nxp.com>

Great! Thanks for your help tracking this.

As the bug was solved at Kernel upstream, I'm closing this one.