Bug 372023 - ISO files listing/extracting broken
Summary: ISO files listing/extracting broken
Status: RESOLVED FIXED
Alias: None
Product: krusader
Classification: Applications
Component: general (show other bugs)
Version: 2.5.0
Platform: Compiled Sources Linux
: NOR normal
Target Milestone: ---
Assignee: Nikita Melnichenko
URL:
Keywords: reproducible, triaged
Depends on:
Blocks:
 
Reported: 2016-11-03 13:49 UTC by Andrej Krutak
Modified: 2018-06-22 08:16 UTC (History)
4 users (show)

See Also:
Latest Commit:
Version Fixed In:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Andrej Krutak 2016-11-03 13:49:48 UTC
Whichever ISO file I open, the contents are not shown completely (e.g. sometimes ISO9660 directory is empty, sometimes not), and opening any of the files only shows garbage.

Krusader 2.4.0 and MC show the contents of the same ISO correctly...
Comment 1 Alex Bikadorov 2016-11-22 18:56:02 UTC
I'm not familiar with ISO. When opened with the iso:/ protocol the first level are always three directories:

>/El Torito Boot
>/El Torito BootJoliet level 3
>/ISO9660

The actual content is in one of these and Ark shows this content directly.

And yes, opening any of the files shows garbage. Maybe an encoding issue.

Windows/Microsoft ISOs cannot be be opened with Krusader OR Ark. Content is one README file:

> This disc contains a "UDF" file system and requires an operating system
> that supports the ISO-13346 "UDF" file system specification.

But I don't have an ISO with missing files. Can you give an example?
Comment 2 Theo 2018-05-24 21:13:49 UTC
Can confirm for kio_iso 2.6.0-2.1 from openSUSE Tumbleweed. I don't know about missing files, but I get binary garbage on the first try opening a file. Reloading the opened file (for instance by pressing F5 in KrViewer, KWrite, or Okteta) gives me the correct data of the file.
Comment 3 Nikita Melnichenko 2018-05-26 05:54:10 UTC
What's the example of such a file? The smaller the better. I can't repro so far.
Comment 4 Theo 2018-05-26 15:25:53 UTC
(In reply to Nikita Melnichenko from comment #3)
> What's the example of such a file? The smaller the better.
It happens with every ISO image I try. I did a test with a small sample size of ISO images and the data offset seems to be off by the same sector count on the first try for a given ISO. I got 52, 64, and 80 2K sectors ahead for three different CDs/DVDs.

For instance, for

https://sourceforge.net/projects/supergrub2/files/1.30/

kio_iso seems to deliver data that is 52 sectors ahead on the first try.

By the way, is it normal that a lot of data is read when I access a small file on a big ISO image?
Comment 5 Nikita Melnichenko 2018-05-28 05:30:42 UTC
(In reply to Theo from comment #4)
> (In reply to Nikita Melnichenko from comment #3)
> > What's the example of such a file? The smaller the better.
> It happens with every ISO image I try. I did a test with a small sample size
> of ISO images and the data offset seems to be off by the same sector count
> on the first try for a given ISO. I got 52, 64, and 80 2K sectors ahead for
> three different CDs/DVDs.

How did you determine this? This may be useful for later debugging.

> 
> For instance, for
> 
> https://sourceforge.net/projects/supergrub2/files/1.30/
> 
> kio_iso seems to deliver data that is 52 sectors ahead on the first try.

This image is not identified as ISO in my krusader (v2.7). Dolphin recognizes it fine and reads from the first try. For example, I can browse to

iso:/.../sgd_cdrom_1.30.iso/El Torito BootJoliet level 3/boot/grub/

without a problem. If the link is directly copied to Krusader, it also opens the link fine.

> 
> By the way, is it normal that a lot of data is read when I access a small
> file on a big ISO image?

I'm not sure as I'm not familiar with the format.
Comment 6 Theo 2018-05-28 12:21:33 UTC
(In reply to Nikita Melnichenko from comment #5)
> (In reply to Theo from comment #4)
> > It happens with every ISO image I try. I did a test with a small sample size
> > of ISO images and the data offset seems to be off by the same sector count
> > on the first try for a given ISO. I got 52, 64, and 80 2K sectors ahead for
> > three different CDs/DVDs.
> 
> How did you determine this?

By brute force. I opened a file on the ISO with Okteta using iso:/, copied the first 16 bytes of binary garbage in hex format, then fed this into a binary grep on the whole ISO image to get the offset. The same for the correct data after reloading the file. I did this for more than one file for every tested ISO image.

I use https://github.com/tmbinc/bgrep for the binary grep, but it should also be possible to use standard grep (or some other method), albeit not as fast: https://stackoverflow.com/questions/4180081/binary-grep-on-linux/17168777#17168777

Example:

$ okteta iso:/home/user/sgd_cdrom_1.30.iso/ISO9660/grubdetect.lua
$ bgrep -A 20 C78508FFFFFF00000000C7850CFFFFFF sgd_cdrom_1.30.iso | sed 's/\\x//g'
sgd_cdrom_1.30.iso: 00042800
c78508ffffff00000000c7850cffffff00000080
sgd_cdrom_1.30.iso: 00042872
c78508ffffff00000000c7850cffffff00000000

In this case I have to use "-A" to check that the first match is the correct one. After reloading the file in Okteta:

$ bgrep 23216C75610A0A66756E6374696F6E20 sgd_cdrom_1.30.iso
sgd_cdrom_1.30.iso: 0005c800
$ echo 'ibase=16;(0005C800-00042800)/800' | bc -l
52.00000000000000000000

> > For instance, for
> > 
> > https://sourceforge.net/projects/supergrub2/files/1.30/
> > 
> > kio_iso seems to deliver data that is 52 sectors ahead on the first try.
> 
> This image is not identified as ISO in my krusader (v2.7).

Could this be related to this annoyance: https://bugs.freedesktop.org/show_bug.cgi?id=99421#c20? I get this nonsense:

$ kmimetypefinder5 sgd_cdrom_1.30.iso
model/x.stl-binary

> Dolphin
> recognizes it fine and reads from the first try. For example, I can browse to
> 
> iso:/.../sgd_cdrom_1.30.iso/El Torito BootJoliet level 3/boot/grub/
> 
> without a problem. If the link is directly copied to Krusader, it also opens
> the link fine.

Doesn't make a difference whether I use Krusader or any other KDE program. It seems to depend only on the iso:/ protocol. Did you also try opening a file on the ISO to check if the data is correct? There might also be a problem with some missing files in the directory listing (comment #0).

> > By the way, is it normal that a lot of data is read when I access a small
> > file on a big ISO image?
> 
> I'm not sure as I'm not familiar with the format.

It's unexpected because one should get the position of the file from the file system on the ISO and then go directly to the file without reading a lot of data from the image. But maybe I have a completely wrong idea about how ISO 9606 works...
Comment 7 Nikita Melnichenko 2018-05-30 04:09:21 UTC
Theo, thanks for providing the details, it will help with debugging. Indeed, I completely missed that I needed to read file contents. I confirm on my end that viewing files on the first try shows binary garbage.

I think we don't have people who know the code of kio_iso and ISO format very well (original dev is retired long ago), so if you feel like you can help figuring out the reason, you're more than welcome to do this. I can help you with the debugging setup, etc.
Comment 8 Theo 2018-06-10 23:18:52 UTC
(In reply to Nikita Melnichenko from comment #7)
> if you feel like you can
> help figuring out the reason, you're more than welcome to do this. I can
> help you with the debugging setup, etc.

What would this setup roughly look like? Using my distributor's debuginfo/debugsource packages and investigating with GDB, I can say the following so far (note that I might not really know what I'm doing here, and that I can provide details on my findings if necessary):

kio_iso uses KArchive's KCompressionDevice[1] to access the ISO file. The device is created in KIso::prepareDevice and KCompressionDevice's constructor creates a QIODevice and a QFileDevice. The corresponding *Private devices have separate sets of member variables 'pos' and 'devicePos' that get out of sync when the QFileDevice is closed at the end of KIso::openArchive. The QFileDevice is reset to offset zero while QIODevice keeps the offset positions it has after reading the ISO file system. When the ISO file is opened for reading the file data, QFileDevice's offset positions are treated as if they haven't been reset and subsequent seeks fall short of the requested offset.

Now I don't know whether QIODevice's offset should be reset when the QFileDevice is closed after reading the file system or when it is opened for reading the requested file data, but I tried the latter after reading the following in the Qt documentation:

 "When subclassing QIODevice, you must call QIODevice::seek() at the start of your function to ensure integrity with QIODevice's built-in buffer."[2]

Replacing

    if (size && !m_isoFile->device()->isOpen()) m_isoFile->device()->open(QIODevice::ReadOnly);

by

    if (size && !m_isoFile->device()->isOpen()) {
        m_isoFile->device()->open(QIODevice::ReadOnly);
        m_isoFile->device()->seek(0);
    }

in kio_isoProtocol::getFile (iso/iso.cpp) seems to work and I get the correct file data on the first try.

This does not fix the issue of missing files in the directory listing, compare for instance the contents of '/ISO9660' and '/El Torito BootJoliet level 3' when browsing sgd_cdrom_1.30.iso as archive.

Regarding my question about reading through a lot of data when requesting file data, it seems that the problem here is kio_iso's use of KCompressionDevice for uncompressed ISO files. KCompressionDevice does not seem to support random access and a seek is implemented as a read from the beginning, even for uncompressed files. This seems like a really stupid idea since ISO files tend to be big and should be read with a method that supports random access.

[1] https://api.kde.org/frameworks/karchive/html/classKCompressionDevice.html
[2] http://doc.qt.io/qt-5/qiodevice.html#seek
Comment 9 Theo 2018-06-13 00:27:41 UTC
(In reply to Theo from comment #8)
> This does not fix the issue of missing files in the directory listing,
> compare for instance the contents of '/ISO9660' and '/El Torito BootJoliet
> level 3' when browsing sgd_cdrom_1.30.iso as archive.

This is caused by the same problem: KCompressionDevice and its QFile, both derived from QIODevice, have member variables 'pos' and 'devicePos' that get out of sync. The following workaround fixes both the incomplete file listing and the wrong file data on the first try (makes my previously posted fix unnecessary): Insert

    dev->seek(0);

before

    if (dev->seek((qint64)start << (qint64)11)) {
        if ((dev->read(buf, len << 11u)) != -1) return (len);
    }

in the callback function 'readf' in kiso.cpp. This seek is done in KCompressionDevice::seek(qint64 pos) only in some cases, see kcompressiondevice.cpp[1]:

    if (d->deviceReadPos < pos) { // we can start from here
[...]
    } else {
        // we have to start from 0 ! Ugly and slow, but better than the previous
        // solution (KTarGz was allocating everything into memory)
        if (!seek(0)) { // recursive
            return false;
        }
[...]
    }

Apparently, this is considered ugly, but I start to believe that KCompressionDevice is broken, and unless someone explains to me why I'm wrong and how KCompressionDevice is supposed to work I see no other way to fix this (again, I'm not a developer and might have no clue what I'm talking about).

Ceterum censeo KCompressionDevice should not be used on uncompressed ISO files anyway: https://bugs.kde.org/show_bug.cgi?id=395296

[1] https://api.kde.org/frameworks/karchive/html/kcompressiondevice_8cpp_source.html#l00178
Comment 10 Nikita Melnichenko 2018-06-13 07:22:41 UTC
Theo, thanks for sharing all your investigation results and a potential fix.

(In reply to Theo from comment #8)
> What would this setup roughly look like? Using my distributor's
> debuginfo/debugsource packages and investigating with GDB
Some people hesitate to do this and to compile the modified code. It's great that you can do it by yourself!

> I start to believe that KCompressionDevice is broken, and unless someone explains to me why I'm wrong
I doubt we have KArchive experts here. If you feel there is a bug in KCompressionDevice implementation, could you please file a bug against the product "frameworks-karchive" to discuss with the devs? This may help other apps too.

Some thoughts why KCompressionDevice might have been used instead of a plain file. If you look into KIso::KIso and KIso::prepareDevice, there is a code to guess mime type and take a different action. AFAIK, ISO format doesn't support any internal compression, so I guess it's for supporting .iso.gz, .iso.bz2 on the fly.

> Insert dev->seek(0);
The workaround seems to be harmless as there is another seek that follows. I'll check more and prepare a code review for the team when I have the right amount of free time.
Comment 11 Theo 2018-06-16 23:04:14 UTC
(In reply to Theo from comment #9)
> The following workaround fixes both the incomplete file listing
> and the wrong file data on the first try (makes my previously posted fix
> unnecessary): Insert
> 
>     dev->seek(0);

I'm afraid I was wrong about that. This only fixes the directory listing. For the correct file data on the first try the previous fix from comment #8 is required. (I think I accidentally kept the changes from the first fix while testing the second fix.)
Comment 12 Nikita Melnichenko 2018-06-20 07:22:18 UTC
FYI: https://phabricator.kde.org/D13626
Comment 13 Theo 2018-06-20 12:48:30 UTC
(In reply to Nikita Melnichenko from comment #10)
> If you feel there is a bug in
> KCompressionDevice implementation, could you please file a bug against the
> product "frameworks-karchive" to discuss with the devs?

Here it is: https://bugs.kde.org/show_bug.cgi?id=395471
Comment 14 Nikita Melnichenko 2018-06-22 07:47:25 UTC
Git commit c33f9629c7e313be9e8bdde0579e00a5f98fad39 by Nikita Melnichenko.
Committed on 22/06/2018 at 07:44.
Pushed by melnichenko into branch 'master'.

KIso: Fixed file offsets of the underlying IO device

The patch is proposed by Theo <alpha0x89@yahoo.de>.

kio_iso uses KArchive's KCompressionDevice to access the ISO file.
The device is created in KIso::prepareDevice and KCompressionDevice's
constructor creates a QIODevice and a QFileDevice. The corresponding
*Private devices have separate sets of member variables 'pos' and
'devicePos' that get out of sync when the QFileDevice is closed at the end
of KIso::openArchive. The QFileDevice is reset to offset zero while
QIODevice keeps the offset positions it has after reading the ISO file
system. When the ISO file is opened for reading the file data,
QFileDevice's offset positions are treated as if they haven't been reset
and subsequent seeks fall short of the requested offset.

The workaround is suggested by Qt documentation [1]
"When subclassing QIODevice, you must call QIODevice::seek() at the start
of your function to ensure integrity with QIODevice's built-in buffer."

For more details see the discussion in the bug.

FIXED: [ 372023 ] ISO files listing/extracting broken

Differential Revision: https://phabricator.kde.org/D13626

[1] http://doc.qt.io/qt-5/qiodevice.html#seek

M  +7    -1    iso/iso.cpp
M  +4    -0    iso/kiso.cpp

https://commits.kde.org/krusader/c33f9629c7e313be9e8bdde0579e00a5f98fad39
Comment 15 Nikita Melnichenko 2018-06-22 08:16:50 UTC
Git commit 16f54c8e1b7a4c223288f7f183e5b36286689097 by Nikita Melnichenko.
Committed on 22/06/2018 at 07:57.
Pushed by melnichenko into branch '2.7'.

KIso: Fixed file offsets of the underlying IO device

The patch is proposed by Theo <alpha0x89@yahoo.de>.

kio_iso uses KArchive's KCompressionDevice to access the ISO file.
The device is created in KIso::prepareDevice and KCompressionDevice's
constructor creates a QIODevice and a QFileDevice. The corresponding
*Private devices have separate sets of member variables 'pos' and
'devicePos' that get out of sync when the QFileDevice is closed at the end
of KIso::openArchive. The QFileDevice is reset to offset zero while
QIODevice keeps the offset positions it has after reading the ISO file
system. When the ISO file is opened for reading the file data,
QFileDevice's offset positions are treated as if they haven't been reset
and subsequent seeks fall short of the requested offset.

The workaround is suggested by Qt documentation [1]
"When subclassing QIODevice, you must call QIODevice::seek() at the start
of your function to ensure integrity with QIODevice's built-in buffer."

For more details see the discussion in the bug.

FIXED: [ 372023 ] ISO files listing/extracting broken

Differential Revision: https://phabricator.kde.org/D13626

[1] http://doc.qt.io/qt-5/qiodevice.html#seek

(cherry picked from commit c33f9629c7e313be9e8bdde0579e00a5f98fad39)

M  +7    -1    iso/iso.cpp
M  +4    -0    iso/kiso.cpp

https://commits.kde.org/krusader/16f54c8e1b7a4c223288f7f183e5b36286689097