Bug 204423

Summary: Spaces in Domains makes SMB:// useless
Product: [Frameworks and Libraries] kio-extras
Component: Samba
Status: RESOLVED FIXED
Severity: normal
Priority: NOR
Version: 18.04.2
Target Milestone: ---
Platform: Ubuntu
OS: Linux
Reporter: Vladimir <vladimiroski>
Assignee: Harald Sitter <sitter>
CC: altheimer, faure, K_2005, nate, sitter, thiago
Latest Commit:
Version Fixed In: 20.04
Sentry Crash Report:

Description Vladimir 2009-08-19 17:20:29 UTC
Version:           4.3.0 (using KDE 4.3.0)
Compiler:          deb http://ppa.launchpad.net/kubuntu-ppa/backports/ubuntu jaunty main KDE 4.3.0 downloaded from that PPA.
OS:                Linux
Installed from:    Ubuntu Packages

I'm on an office network whose domain name was chosen a long time ago and includes spaces, e.g. "Office C.V.".

The problem arises when you try to browse the network using "smb://Office C.V.", although you can still see the network if you just type "smb://".

It seems that Konqueror (and Dolphin too) is unable to process the space in the domain name and fails to open it.

I've had problems mounting shares with spaces in fstab, but that can be worked around by using "\040" instead of the spaces.

I've tried everything in Konqueror/Dolphin to access this network, some attempts are:

* smb://Office\040C.V.
 Konqueror complains about wrong URL format
 Dolphin says it's an invalid URL

* smb://Office+C.V.
 Konqueror & Dolphin say that "server can't be contacted".


This is a big bug for me, as I have to run "smbtree" to find the individual PCs on the network and use "smb://<name>" to skip the domain issue.

But when it comes to someone having spaces in his network's name... well, there I have to use fstab and the \040 trick.

That is impractical; it's as impractical as telling everyone to change the domain name just because of me.
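
For reference, the fstab workaround mentioned in this description can be sketched as a small helper. This is only an illustration in plain C++; the function name `escapeForFstab` is hypothetical, and only the space-to-`\040` mapping comes from the report (the tab-to-`\011` mapping is the analogous octal escape documented for fstab).

```cpp
#include <string>

// Replace characters that fstab treats as field separators with octal escapes.
// space -> \040 is the trick described in the report; tab -> \011 is the
// analogous escape and is added here as an assumption about what callers need.
std::string escapeForFstab(const std::string &field)
{
    std::string out;
    for (char c : field) {
        switch (c) {
        case ' ':
            out += "\\040";
            break;
        case '\t':
            out += "\\011";
            break;
        default:
            out += c;
        }
    }
    return out;
}
```

With this helper, `escapeForFstab("Office C.V.")` yields the text `Office\040C.V.`, which is the form that works in fstab but is rejected as a URL by Konqueror and Dolphin.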
Comment 1 Vladimir 2009-08-26 00:41:06 UTC
Well, my system crashed and I decided to try Ubuntu. Nautilus does not suffer from this bug, so I don't think I'm going back to KDE while this inconvenience remains.

I think I'll not be able to test this anymore.
Comment 2 Nate Graham 2017-10-28 15:31:44 UTC
*** Bug 246471 has been marked as a duplicate of this bug. ***
Comment 3 Nate Graham 2017-10-28 15:39:38 UTC
*** Bug 195012 has been marked as a duplicate of this bug. ***
Comment 4 Harald Sitter 2020-02-06 12:10:04 UTC
CCing David Faure for some input

This is either a bug in QUrl or not a bug at all.

We do

```
               QUrl u("smb://");
               u.setHost(dirpName);
```

to which QUrl says
> Invalid hostname (contains invalid characters); source was \"FOO BAR\"; scheme = \"smb\", host = \"\"

https://tools.ietf.org/html/rfc3986#section-3.2

defines host as
>   host          = IP-literal / IPv4address / reg-name
of which the only relevant group for the bug is reg-name:
>      reg-name    = *( unreserved / pct-encoded / sub-delims )
of which no group would allow for spaces except for pct-encoded, assuming the space is percent-encoded of course.

Which would make this a QUrl bug if the RFC didn't also explicitly say:

> URI producing
>   applications must not use percent-encoding in host unless it is used
>   to represent a UTF-8 character sequence.

I **think** that is meant to say that one must not percent-encode if the character is plain ASCII, so by extension a space cannot be part of reg-name at all.

OTOH I ran smb://FOO%20BAR/ through a bunch of other rfc3986/7 implementations and they all found it to be perfectly valid.

So, I am really not sure.

Iff spaces cannot be expressed, then spaces in workgroup and domains are probably not supportable as it'd impair URI portability. Also QUrl would then be behaving correctly in declaring the URI invalid, and we use QUrl all over the place, so that'd be a bit of a problem.
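
To make the grammar argument concrete, here is a minimal sketch (plain C++, no Qt) of a checker for the RFC 3986 `reg-name` production quoted in this comment. The function name `isRegName` is hypothetical. It tests only the grammar, so it says nothing about the "must not use percent-encoding in host" sentence and does not reproduce QUrl's own host validation.

```cpp
#include <cctype>
#include <string>

// Returns true if `s` matches RFC 3986:
//   reg-name = *( unreserved / pct-encoded / sub-delims )
// Grammar check only; this is not QUrl's validation logic.
bool isRegName(const std::string &s)
{
    const std::string unreservedMarks = "-._~";
    const std::string subDelims = "!$&'()*+,;=";

    for (std::size_t i = 0; i < s.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(s[i]);

        // unreserved: ASCII letters and digits plus "-", ".", "_", "~"
        const bool asciiAlnum = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                             || (c >= '0' && c <= '9');
        if (asciiAlnum || unreservedMarks.find(s[i]) != std::string::npos
            || subDelims.find(s[i]) != std::string::npos) {
            continue;
        }

        // pct-encoded: "%" followed by exactly two hex digits
        if (c == '%' && i + 2 < s.size()
            && std::isxdigit(static_cast<unsigned char>(s[i + 1]))
            && std::isxdigit(static_cast<unsigned char>(s[i + 2]))) {
            i += 2;
            continue;
        }

        return false; // anything else, including a literal space
    }
    return true;
}
```

Under this check `"FOO BAR"` is rejected while `"FOO%20BAR"` is accepted, which matches what the other implementations reported and leaves the question of the percent-encoding restriction open.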
Comment 5 David Faure 2020-02-10 17:27:59 UTC
Outdated RFC, you need to read about IDN and punycode.

https://en.wikipedia.org/wiki/Internationalized_domain_name

Thiago is the expert about these things.
Comment 6 Thiago Macieira 2020-02-13 17:27:50 UTC
QUrl's behaviour is intentional. The hostname component of the URL has to be a valid hostname.

Do not store anything in that component that is not a hostname. Like a workgroup name. Store that elsewhere.
Comment 7 Harald Sitter 2020-02-13 18:10:02 UTC
Thanks.

With that in mind we cannot really support spaces while also following the smb URI format [1]. I suppose we'll just have to deviate a bit iff the workgroup name contains a space, by using a variant of the notation that stuffs the workgroup into the userinfo `smb://work group;@/` and then translating that back to an smb URI for libsmbclient again. This means the URLs won't be portable, but at least navigation within our tech works.

[1] https://www.iana.org/assignments/uri-schemes/prov/smb
Comment 8 Thiago Macieira 2020-02-13 18:38:42 UTC
(In reply to Harald Sitter from comment #7)
> Thanks.
> 
> With that in mind we cannot really support spaces while also following the
> smb URI format [1]. I suppose we'll just have to deviate a bit iff the
> workgroup name contains a space by using a variant of the notation that
> stuffs the workgroup into the userinfo `smb://work group;@/` and then
> translate that back to an smb URI for libsmbclient again. Means the urls
> wont be portable but at least navigation within our tech works.
> 
> [1] https://www.iana.org/assignments/uri-schemes/prov/smb

You may need the user info for the actual user name that is being used to search that work group. I would recommend using the path or query component instead:

smb://userwg;user:password@/browsed_workgroup
smb://userwg;user:password@/?search=browsed_workgroup

This searches the workgroup named "browsed_workgroup" with the user "userwg\user".

The query has the added benefit that a server inside the workgroup is a proper sub-URL:

smb://userwg;user:password@server/share/folder/file.txt?search=browsed_workgroup

That is,

  QUrl wg("smb://user@/?search=WG");
  QUrl relative("/share/folder/file.txt");
  qDebug() << wg.resolved(relative);  // "smb://user@/share/folder/file.txt?search=WG"
Comment 9 Harald Sitter 2020-03-03 13:13:27 UTC
https://phabricator.kde.org/D27804
Comment 10 Harald Sitter 2020-04-06 09:29:09 UTC
Git commit f40191a147c9643717fda1cf9d1f42c526550893 by Harald Sitter.
Committed on 06/04/2020 at 09:27.
Pushed by sitter into branch 'master'.

smb: add hack to support spaces in workgroup names

Summary:
workgroup names are as best I can tell always still netbios names which
means they can contain a bunch of characters ordinarily not found in valid
host names. e.g. spaces
this causes trouble with the IANA SMB URI draft, as used by libsmbc,
since the workgroup would be the host field of the URI when browsing
a workgroup (i.e. filtering hosts that are member of a given workgroup)
because QUrl does not allow invalid hostnames in the host field.

to bypass this problem we now put the workgroup name into the query of the
url as `kio-workgroup`, should it cause trouble in the host field. SMBUrl
takes this query into account when constructing the url for smbc.
since the latter has uniquely exciting potential for breakage this entire
dance is only done when absolutely necessary and otherwise we continue with
all the same code and behavior as without this commit.

on a side note: the awkward name flexibility seems to not extend to
computer names anymore (supposedly because of LLMNR) so this entire
use case is already very niche as we (and libsmbclient) currently only
support workgroup browsing for NT1 networks, and NT1 is by default not
supported on windows10 or samba.

FIXED-IN: 20.04

Test Plan: builds, test passes, can browse workgroup with space in name

Reviewers: ngraham

Subscribers: kde-frameworks-devel, kfm-devel, thiago

Tags: #dolphin, #frameworks

Differential Revision: https://phabricator.kde.org/D27804

M  +35   -0    smb/autotests/smburltest.cpp
M  +11   -1    smb/kio_smb_browse.cpp
M  +57   -4    smb/smburl.cpp

https://commits.kde.org/kio-extras/f40191a147c9643717fda1cf9d1f42c526550893
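
The idea in the commit message can be illustrated with a rough sketch. This is not the actual SMBUrl code: it uses plain C++ strings instead of QUrl, the helper names (`percentEncode`, `looksLikeHostname`, `workgroupBrowseUrl`) are hypothetical, the hostname test is deliberately simplistic, and the exact shape of the fallback URL is an assumption. Only the query key `kio-workgroup` is taken from the commit message.

```cpp
#include <string>

namespace {

// RFC 3986 "unreserved" characters, which never need percent-encoding.
bool isUnreservedChar(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
        || (c >= '0' && c <= '9') || c == '-' || c == '.' || c == '_' || c == '~';
}

} // namespace

// Percent-encode everything that is not an unreserved character.
std::string percentEncode(const std::string &in)
{
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : in) {
        if (isUnreservedChar(c)) {
            out += static_cast<char>(c);
        } else {
            out += '%';
            out += hex[c >> 4];
            out += hex[c & 0x0F];
        }
    }
    return out;
}

// Simplified stand-in for "would this be accepted as a host?".
// Assumption: letters, digits, '-' and '.' only.
bool looksLikeHostname(const std::string &name)
{
    if (name.empty()) {
        return false;
    }
    for (unsigned char c : name) {
        const bool ok = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                     || (c >= '0' && c <= '9') || c == '-' || c == '.';
        if (!ok) {
            return false;
        }
    }
    return true;
}

// Keep the workgroup in the host field when possible; otherwise move it
// into the query as kio-workgroup, so the common case is left unchanged.
std::string workgroupBrowseUrl(const std::string &workgroup)
{
    if (looksLikeHostname(workgroup)) {
        return "smb://" + workgroup + "/";
    }
    return "smb:///?kio-workgroup=" + percentEncode(workgroup);
}
```

In this sketch an ordinary name such as `WORKGROUP` keeps the usual `smb://WORKGROUP/` form, and only a name like `Office C.V.` takes the query route, mirroring the commit's stated goal of doing the dance only when necessary.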