Bug 102265 - nested kioslaves for archives
Summary: nested kioslaves for archives
Status: RESOLVED DUPLICATE of bug 73821
Alias: None
Product: kio
Classification: Unmaintained
Component: kioslave
Version: unspecified
Platform: Compiled Sources Linux
Importance: NOR wishlist
Target Milestone: ---
Assignee: Konqueror Developers
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2005-03-23 13:57 UTC by Andy
Modified: 2005-05-15 19:31 UTC
0 users

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:


Attachments
C++ code to check URIs (632 bytes, text/plain)
2005-04-01 08:48 UTC, Jos van den Oever
java code to check URIs (613 bytes, text/plain)
2005-04-01 08:49 UTC, Jos van den Oever

Description Andy 2005-03-23 13:57:41 UTC
Version:            (using KDE 3.4.0)
Installed from:    Compiled From Sources
OS:                Linux

In Konqueror, when clicking on an archive on a local drive, it is opened right there using the relevant kioslave, which is very useful.
 
However, when clicking on an archive on a remote drive, e.g. sftp or smb, it is opened using "ark", which looks and feels very different to the Konqueror icon and tree views. 
 
That's confusing and inconsistent. 
 
The problem is that kioslaves can't be nested. When clicking on a local archive, the resulting URL is something like: 
 
  tar:/dir/archive.tgz 
 
This means that the 'tar' slave directly accesses local files, which duplicates functionality of the 'file' slave and stops it from working on remote files (unless they're mounted locally of course). 
 
Ideally, ioslaves for archives would be wrappers around other slaves, thereby allowing for remote access, e.g.: 
 
  tar:(file:/dir/archive.tgz) 
  tar:(sftp://user@server/archive.tgz) 
 
The parentheses (or some other syntax) would be necessary to disambiguate directories when browsing into archives, e.g.: 
 
  tar:(sftp://user@server/archive.zip)/dir 
 
In this way, nested archives would be possible too, e.g.: 
 
  zip:(tar:(file:/dir/archive.tgz)/file.zip) 
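
To make the proposed parenthesised syntax concrete, here is a minimal Python sketch of how such an address could be split into layers. The function names `parse_nested` and `unwrap` are hypothetical and not part of KIO or KURL, and the sketch assumes that parentheses never occur inside a path.

```python
def parse_nested(url):
    """Split 'scheme:(inner)/path' into (scheme, inner, path).

    For a plain 'scheme:rest' address, inner is None and path is rest.
    """
    scheme, sep, rest = url.partition(":")
    if not sep:
        raise ValueError("no scheme in " + url)
    if not rest.startswith("("):
        return scheme, None, rest
    depth = 0
    for i, ch in enumerate(rest):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                # rest[1:i] is the wrapped address, rest[i+1:] the path inside it
                return scheme, rest[1:i], rest[i + 1:]
    raise ValueError("unbalanced parentheses in " + url)


def unwrap(url):
    """Return the chain of (scheme, path) layers, innermost first."""
    scheme, inner, path = parse_nested(url)
    if inner is None:
        return [(scheme, path)]
    return unwrap(inner) + [(scheme, path)]
```

Calling `unwrap("zip:(tar:(file:/dir/archive.tgz)/file.zip)")` yields the layers in the order a chain of slaves would have to process them: the `file` source first, then `tar`, then `zip`.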

I think this should be considered for KDE4.0.
Comment 1 Thiago Macieira 2005-03-24 02:02:37 UTC
Problems with that:

1) That address would no longer be a valid URL. 

We could come up with a "hostname" that encodes the real address, like
  zip://smb%3a%2f%2fserver%2fsharename%2ffile.zip/
but I am afraid that the hostname-strictness we have would also prevent that hostname (STD 3 compliance)

2) IOSlaves don't provide random-access functions. So, in order to do that, we have to download the file all over. That's what happens with ark, but not what happens on local files using the ioslaves.
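
The concern in point 1 can be illustrated with a small Python sketch. It percent-encodes the inner address into a "hostname" and then checks it against the letters-digits-hyphen rule that STD 3 host names follow. The helper `is_std3_hostname` is a hypothetical simplification written for this illustration; it is not KURL's actual check.

```python
import re
from urllib.parse import quote

inner = "smb://server/sharename/file.zip"
# safe="" forces '/' and ':' to be encoded as well
encoded = quote(inner, safe="")
wrapped = "zip://" + encoded + "/"

# One host name label: letters, digits and hyphens, not starting or ending with a hyphen
LDH_LABEL = re.compile(r"^[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?$")


def is_std3_hostname(host):
    """Simplified check: every dot-separated label must be letters/digits/hyphens."""
    return all(LDH_LABEL.match(label) for label in host.split("."))
```

Python emits upper-case hex digits (`%3A`), which is equivalent to the lower-case form shown above. The encoded string fails the check because `%` is not allowed in a host name label, which is the obstacle described in point 1.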
Comment 2 Andy 2005-03-24 11:42:57 UTC
I accept this isn't a trivial request, but do you agree that nested ioslaves would be good for interface consistency and network transparency?

KDE4 is a great opportunity to make these kinds of substantial changes.


regarding 1)

True, these addresses wouldn't be URLs as we know them, but should that be a showstopper? KURL could be extended to support nested protocols, whereby standard one-protocol URLs would just become a special case.

I don't know what STD3 compliance entails, but I don't think the hostname encoding scheme would be a good idea anyway because it's not really human-readable and because it wouldn't allow for multiple nesting levels.


Regarding 2)

I don't think that would be a problem from the user's perspective, as long as the download is performed transparently, e.g. the user doesn't get to see the temp file address, and longer downloads are indicated by a progress bar, which is of course what already happens when applications access ioslaves.

But perhaps it's a good time anyway to look at random-access functions in ioslaves for protocols that do support it?
Comment 3 Thiago Macieira 2005-03-24 12:19:07 UTC
Should the fact that you can't represent the address as a URL be a problem? Yes, a very grave one indeed. If it isn't a URL, it cannot be passed to other programs.

Extending KURL to "accept" the broken URIs would be unwise, as we are trying to clean it up. Maybe we could conjure up a new URI scheme that would allow us to do that, but remember that URIs are a superset of URLs and not all programs can accept them.

As for random-access functions, I fully agree with you.
Comment 4 Andy 2005-03-24 13:12:48 UTC
If nested URLs were implemented in KURL and kio, wouldn't all KDE programs automatically be able to use them? In the case of non-KDE programs, they have to be tricked using temporary files anyway if they're supposed to work with ioslaves. So I'm not sure I understand the problem.

As for cleaning up or splitting up KURL, I don't think that necessarily rules out extending its functionality, as long as it's done in a coherent and well thought-out way.

Thinking about it, things like tar and zip aren't really protocols, but filters, so perhaps a syntax along the following lines might be worth considering:

sftp://user@server/dir/archive.tgz|tar/file

This would allow for multiple filters without requiring parentheses and correspond nicely with shell syntax. (But I suspect the bar character isn't available to be used like that?)
Comment 5 Thiago Macieira 2005-03-24 21:14:29 UTC
There's no such thing as a nested URL. It's either a proper URL, or it's not.

However, if we get decent URI support (which I intend to provide), we could invent a URI scheme of our own and do things like:

accessing a file inside a remote tarball:
  multi:fish://me@server.domain.com/home/me/file.tar.gz,tar:/directory/file.c
automatically uncompressing a gzip's content:
  multi:file:///home/thiago/text.ps.gz,gzip:/

It would probably be a good idea to try and standardise that as a FreeDesktop spec, so that we are allowed to pass them in %U/%u.

It would also be possible to craft such a solution with the current implementation, using kioslaves:

multi://me@server.domain.com/?p1=fish,home/me/file.tar.gz&p2=tar,directory/file.c
multi:///?p1=file,home/thiago/text.ps.gz&p2=gzip

What do you think?
I prefer the URI way, since it's cleaner, but I'm not sure if it is supported by KIO currently. What it would lack is proper hostname-encoding, since KURL doesn't support URIs very well now. The second one, though, is fully compliant and would certainly be doable now.
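
As a rough illustration of why the query-string form is workable with a standard URL parser, the following Python sketch splits such an address into its host part and an ordered list of steps. The function name `parse_multi_query` is hypothetical, and the sketch assumes the parameters are always named p1, p2, ... and that the first comma in each value separates the protocol from the path.

```python
from urllib.parse import urlsplit, parse_qsl


def parse_multi_query(url):
    """Return (authority, [(protocol, path), ...]) for a multi://...?p1=...&p2=... URL."""
    parts = urlsplit(url)
    if parts.scheme != "multi":
        raise ValueError("not a multi: URL")
    steps = []
    for key, value in parse_qsl(parts.query, keep_blank_values=True):
        protocol, _, path = value.partition(",")
        # key is 'p<N>'; keep N so the steps can be put in order
        steps.append((int(key[1:]), protocol, path))
    steps.sort()
    return parts.netloc, [(protocol, path) for _, protocol, path in steps]
```

The user name and host stay in the ordinary authority part of the URL, so only the first step can make use of them, which matches the assumption stated in the next paragraph.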

I am assuming the second and further ioslaves down the chain don't need hostnames, ports, usernames or passwords. Can you think of a situation where they would be required?

(I can imagine people wanting to nest fish inside fish for proxying, but I don't think we should support that)
Comment 6 Andy 2005-03-25 10:22:00 UTC
When you say the second way is doable now, do you mean for 3.5? That would be great, in whatever form, especially if it gets discussions about ioslave architecture for 4.0 going.

I also prefer the first scheme though, because it's cleaner and more readable.

The fish-inside-fish scenario would actually be quite useful, e.g. it would allow me to browse my computer at work while having to go through my department's ssh gateway, but it doesn't seem to fit the kind of URI nesting we're discussing here. 

But extending the fish URI scheme to support host chains might do the job. (Should I file a separate wish for that?)

  fish://user1@host1//user2@host2/dir/file

Apart from that, one could imagine ioslaves that support encrypted files or archives, thus requiring username or password.

If something like the multi: scheme could be done with URIs, then how about the scheme I had originally suggested? It doesn't necessarily have to use parentheses; perhaps square brackets or curly braces would be preferable, e.g.:

  tar:(file:/dir/archive.tar)/file
  gzip:[sftp://user@host/dir/doc.ps.gz]
  bzip2:{tar:{fish://host/archive.tar}/file.bz2}

I think any of those would be cleaner and more readable than the multi: scheme. (Things like gzip and bzip2 wouldn't really require the bracketing, but I think it's needed for consistency. The current gzip:/dir/file style URLs could still be supported for compatibility.)

The difference in syntax of course also suggests a difference in implementation. The multi: scheme would only require one new ioslave that would be able to glue existing slaves together, probably using temporary files.

The bracketing scheme on the other hand would require changes to all the relevant ioslaves, replacing code for accessing the local file system with code for emulating random access to the nested ioslave. Much of that though could be factored out into a common base class (KIO::FilterSlave?). 

This would also prepare it nicely for ioslaves that directly support random access, whereas the multi scheme couldn't do that without making substantial changes to the filter-style ioslaves too.
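
The "emulating random access" part could look roughly like the Python sketch below. It is only an illustration of the idea, not a proposal for the real class: `BufferedRandomAccess` and `read_at` are made-up names, and the nested slave is stood in for by any object with a sequential `read(n)` method. Data is pulled from the source only as far as needed and cached in a temporary file, so later reads at lower offsets do not trigger another download.

```python
import tempfile


class BufferedRandomAccess:
    """Offer read_at(offset, size) on top of a source that can only be read sequentially."""

    def __init__(self, source):
        self._source = source                  # hypothetical stand-in for the nested slave
        self._cache = tempfile.TemporaryFile() # local copy of everything fetched so far
        self._fetched = 0                      # number of bytes already cached
        self._eof = False

    def _fetch_until(self, end):
        # Append to the cache until it covers [0, end) or the source is exhausted
        self._cache.seek(self._fetched)
        while not self._eof and self._fetched < end:
            chunk = self._source.read(min(65536, end - self._fetched))
            if not chunk:
                self._eof = True
                break
            self._cache.write(chunk)
            self._fetched += len(chunk)

    def read_at(self, offset, size):
        self._fetch_until(offset + size)
        self._cache.seek(offset)
        return self._cache.read(size)
```

A slave for a protocol that does support random access could replace `_fetch_until` with a direct ranged read and skip the cache entirely, which is the advantage claimed for this design above.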
Comment 7 Jos van den Oever 2005-03-31 14:53:56 UTC
Adding nestability to kioslaves is a wonderful idea. When extending the kioslave implementation like this, please consider arbitrary nesting depth. A common example where this is highly useful is nested archives, e.g. a text file in a tar.bz2 in a tar.gz in a zip file.

Catting such a file is relatively easy (unzip -p file.zip file.tar.gz | tar xzO file.tar.bz2 | tar xjO file.txt). Trying to fit access to such a file in a URI is, as the above discussion illustrates, difficult, especially if it is to be done in a user-friendly manner.
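
For comparison, the same three-level extraction can be sketched in Python with the standard `zipfile` and `tarfile` modules, keeping every intermediate layer in memory. The file names are the ones from the shell command above; `build_sample` is a hypothetical helper that only exists to create test data.

```python
import io
import tarfile
import zipfile


def read_nested(zip_bytes):
    """Counterpart of: unzip -p file.zip file.tar.gz | tar xzO file.tar.bz2 | tar xjO file.txt"""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as outer:
        tar_gz = outer.read("file.tar.gz")
    with tarfile.open(fileobj=io.BytesIO(tar_gz), mode="r:gz") as middle:
        tar_bz2 = middle.extractfile("file.tar.bz2").read()
    with tarfile.open(fileobj=io.BytesIO(tar_bz2), mode="r:bz2") as inner:
        return inner.extractfile("file.txt").read()


def _tar_bytes(name, payload, mode):
    """Pack a single member into a compressed tar archive held in memory."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode=mode) as tar:
        info = tarfile.TarInfo(name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def build_sample(text):
    """Create file.txt inside file.tar.bz2 inside file.tar.gz inside a zip archive."""
    tar_bz2 = _tar_bytes("file.txt", text, "w:bz2")
    tar_gz = _tar_bytes("file.tar.bz2", tar_bz2, "w:gz")
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        archive.writestr("file.tar.gz", tar_gz)
    return buf.getvalue()
```

Each layer has to be fully available before the next one can be opened, which is the same constraint a chain of nested slaves would face without random-access support.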

The advantages of nested kioslaves are, however, huge. A good example would be a file indexer that would automatically be able to read all files on disc, regardless of whether they are in some sort of archive. A program that would greatly benefit from this is Kat (http://kat.sourceforge.net/).


Comment 8 Thiago Macieira 2005-04-01 00:33:10 UTC
> Catting such a file is relatively easy 
> (unzip -p file.zip file.tar.gz | tar xzO file.tar.bz2 | tar xjO file.txt)

The URL form I proposed (KDE3):
multi:///?p1=file,/home/me/file.zip&p2=zip,/file.tar.gz&p3=gzip&p4=tar,/file.tar.bz2&p5=bzip2&p6=tar,/file.txt

The URI for KDE4 that I proposed:
multi:file:///home/me/file.zip,zip:/file.tar.gz,gzip,tar:/file.tar.bz2,bzip2,tar:/file.txt

The URI Andy proposed:
tar:{bzip2:{tar:{gzip:{zip:{file:///home/me/file.zip}/file.tar.gz}}/file.tar.bz2}}/file.txt
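
To compare the three notations side by side, the Python sketch below generates each of them from one list of (protocol, path) steps. The function names are hypothetical, and the bracketed form follows the same nesting order as the other two (zip contains file.tar.gz, and so on). No escaping is attempted, so paths containing commas, ampersands or braces would need extra handling.

```python
STEPS = [("file", "/home/me/file.zip"), ("zip", "/file.tar.gz"), ("gzip", ""),
         ("tar", "/file.tar.bz2"), ("bzip2", ""), ("tar", "/file.txt")]


def as_query_url(steps):
    """Query-string form: multi:///?p1=proto,path&p2=..."""
    params = [f"p{i}={proto},{path}" if path else f"p{i}={proto}"
              for i, (proto, path) in enumerate(steps, start=1)]
    return "multi:///?" + "&".join(params)


def as_comma_uri(steps):
    """Comma-separated form: multi:source,filter:/path,filter,..."""
    (first_proto, first_path), rest = steps[0], steps[1:]
    parts = [f"{first_proto}://{first_path}"]
    parts += [f"{proto}:{path}" if path else proto for proto, path in rest]
    return "multi:" + ",".join(parts)


def as_bracket_uri(steps):
    """Bracketed form: each filter wraps everything before it in curly braces."""
    (first_proto, first_path), rest = steps[0], steps[1:]
    uri = f"{first_proto}://{first_path}"
    for proto, path in rest:
        uri = f"{proto}:{{{uri}}}{path}"
    return uri
```

The first two forms list the steps in processing order, while the bracketed form has to be read from the inside out.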
Comment 9 Jos van den Oever 2005-04-01 08:46:30 UTC
Just to check the proposals, I ran the URIs through QUrl and java.net.URI.
Only numbers 1 and 2 are valid according to java.net.URI, and only numbers 2 and 3 are valid according to QUrl (v3.2).

Comment 10 Jos van den Oever 2005-04-01 08:48:27 UTC
Created attachment 10462
C++ code to check URIs
Comment 11 Jos van den Oever 2005-04-01 08:49:07 UTC
Created attachment 10463
java code to check URIs
Comment 12 Andy 2005-04-01 11:57:20 UTC
Looking at these examples, the bracketing scheme appears rather awkward to parse for a human, because it's difficult to match the curly braces without explicitly counting them. Of course in the usual cases with only one or two nesting levels this wouldn't be a problem, but still ...

The multi scheme deals better with deep nesting, but it has other problems.

First, 'multi:' itself is an implementation detail that shouldn't be visible to the user, who is only interested in where the data is, not how to glue components for accessing it together.

Second, it raises the question of how to deal with nested multis.

Third, the syntax does not reflect the fact that data filters like 'tar' or 'gzip' are conceptually different from data sources like 'http' or 'file'. Therefore, it permits things like these:

  multi:file:///home/me/file.zip,http://host
  multi:gzip:,zip:/file

Because of the problems with both the bracketing and the multi schemes, I would like to throw the pipe syntax I had mentioned earlier back into the mix:

  file:///home/me/file.zip|zip/file.tar.gz|gzip|tar/file.tar.bz2|bzip2|tar/file.txt

This copes well with deep nesting and clearly distinguishes between data sources and filters.
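
A minimal Python sketch of how the pipe syntax could be taken apart, assuming that `|` never appears inside a file name and that each stage after the first is a filter name optionally followed by a path. The function name `parse_pipe_chain` is hypothetical.

```python
def parse_pipe_chain(location):
    """Split 'source|filter/path|filter|...' into (source, [(filter, path), ...])."""
    source, *stages = location.split("|")
    filters = []
    for stage in stages:
        name, slash, inner_path = stage.partition("/")
        # Keep the leading slash so the path reads like a path inside the archive
        filters.append((name, slash + inner_path))
    return source, filters
```

The data source keeps its ordinary URL form, and everything after the first bar is a filter, which is the distinction the multi: syntax does not make.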
Comment 13 Thiago Macieira 2005-04-01 13:01:06 UTC
> I ran the URIs through QUrl

Well, you can't do that. They are URIs, not URLs. Only the first one is supposed to be a URL and should validate against all. That indicates a problem in QUrl somewhere.

> First, 'multi:' itself is an implementation detail that shouldn't be visible
> to the user, who is only interested in where the data is, not how to glue
> components for accessing it together.

The idea is that Konqueror creates those automatically. If you click a zip file in file:///home/me, it'll automatically generate the URL multi:///?p1=file,/home/me/file.zip&p2=zip,/

The user will see a mess in the Location, but it'll work. Hence the idea of a cleaner URI for KDE4.

> Therefore, it permits things like these: 
> 
>    multi:file:///home/me/file.zip,http://host 
>    multi:gzip:,zip:/file

We would enforce that only the first protocol is allowed a username-password-hostname-port part. That is, everything after the first one has just the first slash.

As for the second option you gave, that would be an error because no file was given.
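
The two rules just described could be checked along the following lines. This Python sketch is only one possible reading of them: `check_multi` is a hypothetical name, steps are assumed to be separated by unescaped commas, and "no file was given" is interpreted as the first step having nothing but slashes after its scheme.

```python
def check_multi(uri):
    """Return a list of rule violations for a comma-form multi: URI (empty list = accepted)."""
    if not uri.startswith("multi:"):
        return ["missing multi: prefix"]
    first, *rest = uri[len("multi:"):].split(",")
    problems = []
    _, _, location = first.partition(":")
    if not location.strip("/"):
        problems.append(f"first step '{first}' names no file")
    for step in rest:
        _, _, path = step.partition(":")
        # A path starting with '//' would introduce a user/host/port part
        if path.startswith("//"):
            problems.append(f"step '{step}' carries a host part; only the first step may")
    return problems
```

Both of the problematic examples quoted above are rejected by this check, each for a different reason, while the six-step example from comment 8 passes.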

> I would like to throw the pipe syntax I had mentioned earlier back into the
> mix: 
> 
>   file:///home/me/file.zip|zip/file.tar.gz|gzip|tar/file.tar.bz2|bzip2|tar/file.txt

The big problem here is that that's a URL and we can't change its meaning. That URL means file "file.zip|zip/file.tar.gz|gzip|tar/file.tar.bz2|bzip2|tar/file.txt" in /home/me on your local filesystem. That can't be changed.
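
This can be seen with any generic URL parser. The Python sketch below uses `urllib.parse.urlsplit`, which is lenient and does not validate characters; it is shown only to illustrate that the bar gets no special treatment and simply ends up in the path.

```python
from urllib.parse import urlsplit

location = "file:///home/me/file.zip|zip/file.tar.gz|gzip|tar/file.tar.bz2|bzip2|tar/file.txt"
parts = urlsplit(location)

# The scheme is plain 'file', there is no host, and the bars stay inside the path
scheme, host, path = parts.scheme, parts.netloc, parts.path
```

Any program that receives this string and treats it as an ordinary file: URL will look for that literal path on the local filesystem.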

We could add in a multi: prefix, which would allow us to change the meaning of the special characters. But in that case, we would stumble upon what I had proposed as URI, only with pipes instead of commas.
Comment 14 Jos van den Oever 2005-04-01 15:04:08 UTC
On 4/1/2005, "Thiago Macieira" <thiago@kde.org> wrote:
>------- Additional Comments From thiago kde org  2005-04-01 13:01 -------
>> I ran the URIs through QUrl
>
>Well, you can't do that. They are URIs, not URLs. Only the first one is supposed to be a URL and should validate against all. That indicates a problem in QUrl somewhere.


You're right, in Qt4 you can, but in Qt3 you can't. In Qt4 QUrl is
really a URI!
http://doc.trolltech.com/4.0/qurl.html#isValid

When using Qt4, all 3 URIs are seen as valid. It is beta software
though.

A nice example of nested URIs is the Active URI specification:
http://www.1060research-server-1.co.uk/docs/2.0.2/book/advdev/doc_guide_compoundURI.html
http://www.1060research-server-1.co.uk/docs/2.0.2/book/introduction/doc_intro_concepts_requests.html
The goal is not completely equal because the URIs are named and typed,
which isn't necessary for the kioslaves. The choice of separators there
is '@' and '+'. Important remark: "In order to be valid a URI a
compound URI must be carefully escaped."

This format is similar to
multi:file:///home/me/file.zip,zip:/file.tar.gz,gzip,tar:/file.tar.bz2,bzip2,tar:/file.txt
which, more generally formulated is
multi:{escaped_uri},{escaped_uri},...
I think this is the most elegant solution. One could argue about what
delimiter to use.
I'm in favor of the pipe.

Would it be possible to make a piping kioslave for KDE3? Instead of
starting from file:/home/user, you could then start from
'multi:file:/home/user'. In KDE4 one could then extend the behavior
such that the linked URI can be a multi: automatically, when needed.
Comment 15 Thiago Macieira 2005-05-15 19:31:35 UTC

*** This bug has been marked as a duplicate of 73821 ***