Bug 507265 - SRU search using Dublin Core returns wrong/empty metadata
Summary: SRU search using Dublin Core returns wrong/empty metadata
Status: RESOLVED UPSTREAM
Alias: None
Product: tellico
Classification: Applications
Component: general (other bugs)
Version First Reported In: unspecified
Platform: Other Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Robby Stephenson
URL:
Keywords:
Depends on: 507523
Blocks:
Reported: 2025-07-20 09:49 UTC by Karl Ove Hufthammer
Modified: 2025-08-16 12:17 UTC
CC List: 0 users

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:


Description Karl Ove Hufthammer 2025-07-20 09:49:16 UTC
SUMMARY
Running an SRU search with Dublin Core as the output format returns only a single search result, in which all metadata fields are empty.


STEPS TO REPRODUCE
1. Add an SRU source with the following settings:
   Scheme: http
   Host: brgbib.bib.no
   Port: 80
   Path: /cgi-bin/sru
   Format: Dublin Core
2. Run a web search using the title "kokebok" (without the quotes)

OBSERVED RESULT
A single search result (row) is returned, and all metadata fields are empty. No error message is displayed.


EXPECTED RESULT
Multiple search results should be returned, and the metadata (title, author etc.) properly extracted.

Here is the URL for the raw search results:
https://brgbib.bib.no/cgi-bin/sru?operation=searchRetrieve&version=1.1&maximumRecords=25&recordSchema=dc&query=dc.title%3D%22kokebok%22
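For reference, the URL above is an ordinary SRU searchRetrieve request: the source's scheme, host and path, plus the `operation`, `version`, `maximumRecords`, `recordSchema` and CQL `query` parameters. The sketch below assembles it with Python's standard library; `sru_search_url` is a hypothetical helper for illustration, not Tellico's own code.

```python
from urllib.parse import urlencode, urlunsplit


def sru_search_url(scheme, host, path, query, record_schema,
                   version="1.1", maximum_records=25, port=None):
    """Hypothetical helper: build an SRU searchRetrieve URL for an HTTP GET."""
    # Leave the port out when it is None so the scheme's default port is used.
    netloc = host if port is None else f"{host}:{port}"
    params = {
        "operation": "searchRetrieve",
        "version": version,
        "maximumRecords": str(maximum_records),
        "recordSchema": record_schema,
        "query": query,  # a CQL query, e.g. dc.title="kokebok"
    }
    # urlencode() percent-encodes '=' as %3D and '"' as %22 inside the query.
    return urlunsplit((scheme, netloc, path, urlencode(params), ""))


url = sru_search_url("https", "brgbib.bib.no", "/cgi-bin/sru",
                     'dc.title="kokebok"', "dc")
print(url)
```

Passing `scheme="http"` and `port=80` instead corresponds to the source settings listed in the steps to reproduce.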

As you can see, multiple books are listed, e.g.:
Asiatisk : en visuell kokebok : fremgangsmåten bilde for bilde
Baking : en visuell kokebok : fremgangsmåten bilde for bilde
Barnas fargerike kokebok : spis grønt, rødt, gult og lilla!


SOFTWARE/OS VERSIONS
Operating System: openSUSE Tumbleweed 20250718
KDE Plasma Version: 6.4.3
KDE Frameworks Version: 6.16.0
Qt Version: 6.9.1
Kernel Version: 6.15.6-1-default (64-bit)
Graphics Platform: X11


ADDITIONAL INFORMATION
I observe this in Tellico 4.1.2, but that version wasn't available as an option in Bugzilla.

I have also tested other output formats (MODS, MARCXML, PAM), but none of them work. They either return no search results and no error message (MODS, MARCXML) or an empty search result (PAM).
Comment 1 Karl Ove Hufthammer 2025-07-20 10:04:50 UTC
BTW, the server also supports MarcXchange (which I understand is a superset of MARCXML?):
https://brgbib.bib.no/cgi-bin/sru?operation=searchRetrieve&version=1.1&maximumRecords=25&recordSchema=marcxchange&query=dc.title%3D%22kokebok%22
Comment 2 Robby Stephenson 2025-07-24 19:35:02 UTC
(In reply to Karl Ove Hufthammer from comment #0)
> Running a SRU search using Dublin Core as the output format only returns a
> single search result, and where all metadata is empty.

This is rather interesting. The SRU server returns different data to a KDE-based HTTP request than to other clients. I haven't yet figured out what causes that. I've tried switching user agents and every other HTTP request header that I can think of so far.

You can compare
kioclient cat "https://brgbib.bib.no/cgi-bin/sru?operation=searchRetrieve&version=1.1&maximumRecords=25&recordSchema=dc&query=dc.title%3D%22kokebok%22"
to
curl "https://brgbib.bib.no/cgi-bin/sru?operation=searchRetrieve&version=1.1&maximumRecords=25&recordSchema=dc&query=dc.title%3D%22kokebok%22"

In the course of testing that, I did find other issues with the schema that I've fixed. Thanks for the bug report. I'll continue investigating.
Comment 3 Robby Stephenson 2025-07-24 19:39:53 UTC
Git commit dfbd56530369df47a9f783f3ae28a6acac5031dc by Robby Stephenson.
Committed on 24/07/2025 at 19:38.
Pushed by rstephenson into branch 'master'.

Improve SRU result parsing

Include additional Dublin Core and marcXchange namespaces. Correct a
typo in the editor request query info.

FIXED-IN: 4.1.3

M  +7    -11   src/fetch/srufetcher.cpp
M  +5    -1    src/tests/srufetchertest.cpp
M  +46   -25   xslt/srw2tellico.xsl

https://invent.kde.org/office/tellico/-/commit/dfbd56530369df47a9f783f3ae28a6acac5031dc
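As an illustration of why extra namespaces matter: an XML parser only matches `dc:title` if it knows the exact namespace URI the server used, so a record whose fields live in an unexpected namespace yields empty metadata. The minimal sketch below collects titles per SRU 1.1 record while accepting several Dublin Core namespaces. The namespace list and the sample response are assumptions for illustration (the response is hand-written, not captured from brgbib), and `record_titles` is a hypothetical name; Tellico itself does this in `xslt/srw2tellico.xsl`.

```python
import xml.etree.ElementTree as ET

SRW_NS = "http://www.loc.gov/zing/srw/"  # SRU 1.1 response namespace
# Assumed set of namespaces a server might use for Dublin Core elements.
DC_NAMESPACES = (
    "http://purl.org/dc/elements/1.1/",
    "http://purl.org/dc/terms/",
)


def record_titles(xml_text):
    """Hypothetical helper: return one list of titles per SRU record."""
    root = ET.fromstring(xml_text)
    results = []
    for record in root.iter(f"{{{SRW_NS}}}record"):
        data = record.find(f"{{{SRW_NS}}}recordData")
        titles = []
        if data is not None:
            for ns in DC_NAMESPACES:
                titles.extend(el.text.strip()
                              for el in data.iter(f"{{{ns}}}title") if el.text)
        results.append(titles)
    return results


# Hand-written sample: the second record uses a different DC namespace.
SAMPLE_RESPONSE = """\
<srw:searchRetrieveResponse xmlns:srw="http://www.loc.gov/zing/srw/">
  <srw:numberOfRecords>2</srw:numberOfRecords>
  <srw:records>
    <srw:record>
      <srw:recordSchema>dc</srw:recordSchema>
      <srw:recordData>
        <srw_dc:dc xmlns:srw_dc="info:srw/schema/1/dc-schema"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Baking : en visuell kokebok : fremgangsmåten bilde for bilde</dc:title>
        </srw_dc:dc>
      </srw:recordData>
    </srw:record>
    <srw:record>
      <srw:recordSchema>dc</srw:recordSchema>
      <srw:recordData>
        <record xmlns:dcterms="http://purl.org/dc/terms/">
          <dcterms:title>Barnas fargerike kokebok : spis grønt, rødt, gult og lilla!</dcterms:title>
        </record>
      </srw:recordData>
    </srw:record>
  </srw:records>
</srw:searchRetrieveResponse>
"""

for titles in record_titles(SAMPLE_RESPONSE):
    print(titles)
```

If `DC_NAMESPACES` held only the first URI, the second record would come back with an empty title list, which is the kind of silent empty result described in this report.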
Comment 4 Robby Stephenson 2025-07-25 00:59:02 UTC
(In reply to Robby Stephenson from comment #2)
> (In reply to Karl Ove Hufthammer from comment #0)
> > Running a SRU search using Dublin Core as the output format only returns a
> > single search result, and where all metadata is empty.
> 
> This is rather interesting. The SRU server returns different data to a
> KDE-based http request than for others. I haven't yet figured out what
> causes that. I've tried switching user-agents and every other http request
> header that I can think of so far.

Still no luck with HTTP header changes, but I did discover that sending a POST request works (where the default GET does not), so I'll add that as an option.
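To make the GET/POST difference concrete: with POST, the same SRU parameters move out of the URL and into a form-encoded request body. The sketch below only builds the request with Python's `urllib.request.Request` and does not send it. It assumes the server accepts `application/x-www-form-urlencoded` parameters (the form SRU's HTTP POST binding describes); `sru_post_request` is a hypothetical name and not how Tellico, which uses KIO, implements it.

```python
from urllib.parse import urlencode
from urllib.request import Request


def sru_post_request(base_url, query, record_schema,
                     version="1.1", maximum_records=25):
    """Hypothetical helper: build (but do not send) an SRU searchRetrieve POST."""
    params = {
        "operation": "searchRetrieve",
        "version": version,
        "maximumRecords": str(maximum_records),
        "recordSchema": record_schema,
        "query": query,
    }
    # The parameters travel in the request body instead of the URL query string.
    body = urlencode(params).encode("ascii")
    return Request(base_url, data=body, method="POST",
                   headers={"Content-Type": "application/x-www-form-urlencoded"})


req = sru_post_request("https://brgbib.bib.no/cgi-bin/sru",
                       'dc.title="kokebok"', "dc")
print(req.get_method(), req.full_url)
print(req.data.decode())
```

Because a POST request carries real content, a `Content-Length` header is expected here, unlike on a body-less GET.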
Comment 5 Robby Stephenson 2025-07-25 00:59:54 UTC
(In reply to Karl Ove Hufthammer from comment #1)
> BTW, the server also supports MarcXchange (which I understand is a superset
> of MARCXML?):
> https://brgbib.bib.no/cgi-bin/sru?operation=searchRetrieve&version=1.
> 1&maximumRecords=25&recordSchema=marcxchange&query=dc.title%3D%22kokebok%22

You can enter marcXchange as a custom method which should work (though the needed schema changes aren't in 4.1.2).
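On the MarcXchange question: MARCXML and marcXchange records use the same element names (`record`, `datafield`, `subfield`) but different namespace URIs, so a parser written for one silently finds nothing in the other unless both namespaces are listed. The sketch below pulls the title proper (field 245, subfield a) from either flavour. The sample records are hand-written for illustration, and `marc_titles` is a hypothetical helper, not Tellico's XSLT.

```python
import xml.etree.ElementTree as ET

# Same element names, different namespaces.
MARC_NAMESPACES = (
    "http://www.loc.gov/MARC21/slim",  # MARCXML
    "info:lc/xmlns/marcxchange-v1",    # marcXchange
)
DATAFIELD_TAGS = {f"{{{ns}}}datafield" for ns in MARC_NAMESPACES}


def marc_titles(xml_text):
    """Hypothetical helper: return 245$a values in document order."""
    root = ET.fromstring(xml_text)
    titles = []
    for field in root.iter():
        if field.tag not in DATAFIELD_TAGS or field.get("tag") != "245":
            continue
        # Reuse the datafield's own namespace to find its subfields.
        ns = field.tag[1:].split("}", 1)[0]
        for sub in field.findall(f"{{{ns}}}subfield"):
            if sub.get("code") == "a" and sub.text:
                titles.append(sub.text.strip())
    return titles


# Hand-written sample: one MARCXML record, one marcXchange record.
SAMPLE = """\
<records>
  <record xmlns="http://www.loc.gov/MARC21/slim">
    <datafield tag="245" ind1="1" ind2="0">
      <subfield code="a">Asiatisk</subfield>
      <subfield code="b">en visuell kokebok</subfield>
    </datafield>
  </record>
  <marcx:record xmlns:marcx="info:lc/xmlns/marcxchange-v1" format="MARC21" type="Bibliographic">
    <marcx:datafield tag="245" ind1="1" ind2="0">
      <marcx:subfield code="a">Baking</marcx:subfield>
    </marcx:datafield>
  </marcx:record>
</records>
"""

print(marc_titles(SAMPLE))
```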
Comment 6 Robby Stephenson 2025-07-25 01:00:23 UTC
Git commit 4d3a39542a05b4a2f4f8045f433a56cb9c71e617 by Robby Stephenson.
Committed on 25/07/2025 at 00:57.
Pushed by rstephenson into branch 'master'.

Allow for POST SRU requests

M  +5    -0    ChangeLog
M  +30   -8    src/fetch/srufetcher.cpp
M  +26   -0    src/tests/srufetchertest.cpp
M  +1    -0    src/tests/srufetchertest.h

https://invent.kde.org/office/tellico/-/commit/4d3a39542a05b4a2f4f8045f433a56cb9c71e617
Comment 7 Robby Stephenson 2025-07-25 02:10:30 UTC
Git commit 243b58baa77f9ecce9eac79136e77fcd43447621 by Robby Stephenson.
Committed on 25/07/2025 at 01:00.
Pushed by rstephenson into branch '4.1'.

Allow for POST SRU requests

M  +5    -0    ChangeLog
M  +30   -8    src/fetch/srufetcher.cpp
M  +26   -0    src/tests/srufetchertest.cpp
M  +1    -0    src/tests/srufetchertest.h

https://invent.kde.org/office/tellico/-/commit/243b58baa77f9ecce9eac79136e77fcd43447621
Comment 8 Karl Ove Hufthammer 2025-07-25 07:45:45 UTC
(In reply to Robby Stephenson from comment #2)
> This is rather interesting. The SRU server returns different data to a
> KDE-based http request than for others. I haven't yet figured out what
> causes that. I've tried switching user-agents and every other http request
> header that I can think of so far.

Thanks for looking into this! I’ve done some testing. For some reason, kioclient seems to send a content-length header with a value of zero (i.e., a request to not return a content body?!). You can duplicate the kioclient results with:

curl -H "content-length: 0" "https://brgbib.bib.no/cgi-bin/sru?operation=searchRetrieve&version=1.1&maximumRecords=25&recordSchema=dc&query=dc.title%3D%22kokebok%22"
Comment 9 Robby Stephenson 2025-07-26 01:41:57 UTC
(In reply to Karl Ove Hufthammer from comment #8)
> (In reply to Robby Stephenson from comment #2)
> > This is rather interesting. The SRU server returns different data to a
> > KDE-based http request than for others. I haven't yet figured out what
> > causes that. I've tried switching user-agents and every other http request
> > header that I can think of so far.
> 
> Thanks for looking into this! I’ve done some testing. For some reason,
> kioclient seems to send a content-length header with a value of zero (i.e.,
> a request to not return a content body?!). 

Interesting. From what I read, that seems to indicate there's no body in the http request itself (which is true) but curl doesn't send that in typical requests, I guess. You caught that over the wire? I wasn't seeing that in the outgoingMetaData in the KIO job, but it may be hiding it somehow.
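One way to see what actually goes over the wire, independent of what a client library reports internally, is to point the client at a local server that echoes back the request headers it received. The sketch below uses Python's `http.server` and `http.client` as a stand-in (it does not involve KIO or the brgbib server) and sends the same GET twice, once plainly and once with an explicit `Content-Length: 0`, mirroring the curl comparison from comment 8. Names such as `HeaderEchoHandler` and `headers_seen` are hypothetical.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HeaderEchoHandler(BaseHTTPRequestHandler):
    """Replies to GET with the request headers it received, one per line."""

    def do_GET(self):
        body = "".join(f"{k}: {v}\n" for k, v in self.headers.items()).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):  # keep the demo output quiet
        pass


def headers_seen(port, extra_headers):
    """Send one GET to the local echo server and return the headers it saw."""
    conn = http.client.HTTPConnection("127.0.0.1", port)
    conn.request("GET", "/cgi-bin/sru?operation=explain", headers=extra_headers)
    text = conn.getresponse().read().decode()
    conn.close()
    return text


server = HTTPServer(("127.0.0.1", 0), HeaderEchoHandler)  # port 0: pick a free port
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

plain = headers_seen(port, {})
with_zero = headers_seen(port, {"Content-Length": "0"})
server.shutdown()
server.server_close()

print(plain)
print(with_zero)
```

For a body-less GET, `http.client` adds no `Content-Length` of its own, so the header only shows up in the second request, where it was set explicitly.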
Comment 10 Robby Stephenson 2025-07-26 19:44:04 UTC
The pointer was helpful. I did confirm that the Content-Length: 0 header is being sent, which neither curl nor a direct QNetworkAccessManager/QNetworkRequest request does. RFC 9110, https://www.rfc-editor.org/rfc/rfc9110.html#name-content-length, says:
> A user agent SHOULD NOT send a Content-Length header field when the request message does not contain content
> and the method semantics do not anticipate such data.
An older one, RFC2616, https://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.13, is worded differently.
> Any Content-Length greater than or equal to zero is a valid value.
There doesn't seem to be anything that distinguishes method semantics in the older RFC.
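The RFC 9110 guidance quoted above can be written as a small client-side decision rule: send a length when there is content, send `0` for empty content only when the method anticipates content (RFC 9110 gives a POST with empty content as its example), and otherwise omit the header. The sketch below is a minimal illustration; the set of methods treated as anticipating content is an assumption chosen for the sketch, and `content_length_header` is a hypothetical name.

```python
# Assumption for this sketch: methods whose semantics anticipate request content.
METHODS_EXPECTING_CONTENT = {"POST", "PUT", "PATCH"}


def content_length_header(method, body):
    """Return the Content-Length value to send, or None to omit the header."""
    if body:
        return str(len(body))
    if method.upper() in METHODS_EXPECTING_CONTENT:
        return "0"  # empty content the method still anticipates, e.g. an empty POST
    return None     # e.g. a GET without content: leave the header out entirely


for method, body in [("GET", b""), ("POST", b""), ("POST", b"query=x")]:
    print(method, repr(body), content_length_header(method, body))
```

Under this rule, the SRU GET in this report would carry no Content-Length header at all.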

I'll try to pull together a separate bug report against KIO and also see if there's a workaround Tellico could use to keep the header from being included. From my quick glance through the code, I didn't see one (nor where the Content-Length header is actually being set for the GET request).

There are various reports around the web of this being an issue for some CDNs.
https://github.com/http-kit/http-kit/issues/583
https://github.com/httprb/http/issues/487

In the meantime, the brgbib server seems to work with SRU POST requests. I've now added support for specifying a POST request for the next Tellico release.
Comment 11 Robby Stephenson 2025-08-16 12:17:00 UTC
Bug 507523 has been fixed, so I think the underlying KIO issue will be resolved as of Frameworks 6.18.0, once it is released. At that point, you should be able to use Tellico with the BRGBIB SRU server as is. Until then, you can use the just-released Tellico 4.1.3 and configure it to send SRU requests via POST instead of GET, which seems to work. Thanks again for reporting and helping with the issue!