Bug 450650

Summary: URL encoded chars in feed-entry-link-href become invalid - replaced by question marks
Product: [Applications] akregator
Reporter: Delian Krustev <krustev>
Component: feed parser
Assignee: kdepim bugs <kdepim-bugs>
Status: RESOLVED FIXED
Severity: major
CC: montel
Priority: NOR
Version: 5.15.3
Target Milestone: ---
Platform: Debian stable
OS: Linux
Latest Commit:
Version Fixed In: 5.19.3

Description Delian Krustev 2022-02-21 13:13:58 UTC
SUMMARY

The Atom feed can be seen here:

  https://blog.krustev.net/atom.xml

Here's a feed entry:

<entry>
<id>https://blog.krustev.net/blog/%D0%97%D0%B0+%D0%B7%D0%B0%D0%BF%D0%B5%D1%82%D0%B0%D0%B9%D0%BA%D0%B8%D1%82%D0%B5</id>
<title>За запетайките</title>
<published>2022-02-15T16:44:29+02:00</published>
<updated>2022-02-15T16:44:29+02:00</updated>
<link rel="alternate" type="text/html" href="https://blog.krustev.net/blog/%D0%97%D0%B0+%D0%B7%D0%B0%D0%BF%D0%B5%D1%82%D0%B0%D0%B9%D0%BA%D0%B8%D1%82%D0%B5"/>
<summary>Нужно ли е да има толкова много правила за поставяне на запетайки ?</summary>
</entry>


The blog's URLs are user-controlled and can contain all sorts of characters:
non-ASCII (international) text, question marks, double quotes, <, >, etc.
The links therefore have to be URL-encoded in order to be valid.
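
As a rough illustration of why the encoding is needed, the Python sketch below percent-encodes a made-up slug containing such characters. The slug and the use of Python's urllib are assumptions for illustration only; the blog's own encoder is not shown in this report.

```python
from urllib.parse import quote, unquote

# Hypothetical user-chosen slug with characters that are not valid in a URL path.
slug = 'Why "<b>" ?'

# safe="" forces every reserved character, including "/", to be escaped.
encoded = quote(slug, safe="")
print(encoded)  # Why%20%22%3Cb%3E%22%20%3F

# Decoding restores the original text exactly.
assert unquote(encoded) == slug
```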

STEPS TO REPRODUCE
1. Create an article with a URL which needs URL encoding.
2. Open the article in Akregator.
3. Right-click "Complete story" and choose "Copy the link address".

OBSERVED RESULT
When the link is copied, the following text ends up in the clipboard:

https://blog.krustev.net/blog/??+???????????

It is displayed the same way in the status bar when hovered (with every percent-encoded char shown as a question mark).

EXPECTED RESULT
The link should be displayed as URL decoded in the status bar:

  https://blog.krustev.net/blog/За+запетайките

And the encoded link should be "copied" or passed to the browser when clicked.
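
A minimal Python sketch of that expected split between the displayed text and the link target, using the href from the entry above. The helper name link_views is hypothetical and this is not Akregator's code; it only shows that the decoded form can be derived for display while the original encoded href is kept for the clipboard or browser.

```python
from urllib.parse import unquote

# href exactly as it appears in the feed entry from this report.
href = ("https://blog.krustev.net/blog/"
        "%D0%97%D0%B0+%D0%B7%D0%B0%D0%BF%D0%B5%D1%82%D0%B0%D0%B9%D0%BA%D0%B8%D1%82%D0%B5")

def link_views(href):
    """Return (text for the status bar, text for the clipboard or browser)."""
    # unquote() decodes percent-escapes as UTF-8 and leaves "+" untouched.
    return unquote(href), href

display, target = link_views(href)
print(display)  # https://blog.krustev.net/blog/За+запетайките
print(target)   # unchanged, still percent-encoded
```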

Various other feed readers handle these links as expected, e.g. gnome-feeds and liferea.
Web-based ones do too: feedly.com, feedburner.com, feeder.co


SOFTWARE/OS VERSIONS
Up-to-date versions from Debian Bullseye:

KDE Plasma Version: kde-plasma-desktop version 5:111
KDE Frameworks Version: 5.78.0
Qt Version: 5.15.2
Comment 1 Laurent Montel 2022-02-22 06:49:35 UTC
Confirmed.
The URL is not converted correctly when we extract the info from the feed.
Comment 2 Laurent Montel 2022-02-22 07:09:43 UTC
Git commit 4c061653e32af8944a78fdd1c3f3edda0b106b44 by Laurent Montel.
Committed on 22/02/2022 at 07:08.
Pushed by mlaurent into branch 'release/21.12'.

Fix bug 450650:  URL encoded chars in feed-entry-link-href become invalid - replaced by question marks

Fix store encoding url

FIXED-IN: 5.19.3

M  +2    -2    plugins/mk4storage/feedstoragemk4impl.cpp

https://invent.kde.org/pim/akregator/commit/4c061653e32af8944a78fdd1c3f3edda0b106b44
Comment 3 Laurent Montel 2022-02-22 07:11:28 UTC
Hi,
The bug was in how we stored the URL: we stored it as latin1, but it really needs to be stored as utf8.
You need to remove the feed and reimport it; otherwise the info already stored will remain in latin1.

Regards
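
The effect described in Comment 3 can be reproduced outside Akregator with a short Python sketch. It assumes that the link was held in decoded form before being written to storage and that the Latin-1 conversion substitutes "?" for characters it cannot represent (modelled here with errors="replace"). The helper store_and_load is hypothetical and does not mirror feedstoragemk4impl.cpp.

```python
from urllib.parse import unquote

href = ("https://blog.krustev.net/blog/"
        "%D0%97%D0%B0+%D0%B7%D0%B0%D0%BF%D0%B5%D1%82%D0%B0%D0%B9%D0%BA%D0%B8%D1%82%D0%B5")

# Assumption: the URL is in decoded (human-readable) form when it is stored.
decoded = unquote(href)

def store_and_load(text, codec):
    """Round-trip text through a byte store using the given codec."""
    stored = text.encode(codec, errors="replace")  # unrepresentable chars become "?"
    return stored.decode(codec)

lossy = store_and_load(decoded, "latin-1")
intact = store_and_load(decoded, "utf-8")

print(lossy)   # https://blog.krustev.net/blog/??+???????????
print(intact)  # https://blog.krustev.net/blog/За+запетайките
```

The Latin-1 round trip yields the same string as the observed result in the description, while the UTF-8 round trip is lossless. Data already written through the lossy path cannot be recovered, which is consistent with the advice to remove and reimport the feed.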