Summary: | SRU search using Dublin Core returns wrong/empty metadata | ||
---|---|---|---|
Product: | [Applications] tellico | Reporter: | Karl Ove Hufthammer <karl> |
Component: | general | Assignee: | Robby Stephenson <robby> |
Status: | RESOLVED UPSTREAM | ||
Severity: | normal | ||
Priority: | NOR | ||
Version First Reported In: | unspecified | ||
Target Milestone: | --- | ||
Platform: | Other | ||
OS: | Linux | ||
Latest Commit: | Version Fixed In: | ||
Sentry Crash Report: | |||
Bug Depends on: | 507523 | ||
Bug Blocks: |
Description
Karl Ove Hufthammer
2025-07-20 09:49:16 UTC
BTW, the server also supports MarcXchange (which I understand is a superset of MARCXML?):
https://brgbib.bib.no/cgi-bin/sru?operation=searchRetrieve&version=1.1&maximumRecords=25&recordSchema=marcxchange&query=dc.title%3D%22kokebok%22

(In reply to Karl Ove Hufthammer from comment #0)
> Running a SRU search using Dublin Core as the output format only returns a
> single search result, and where all metadata is empty.

This is rather interesting. The SRU server returns different data to a KDE-based http request than for others. I haven't yet figured out what causes that. I've tried switching user-agents and every other http request header that I can think of so far.

You can compare

kioclient cat "https://brgbib.bib.no/cgi-bin/sru?operation=searchRetrieve&version=1.1&maximumRecords=25&recordSchema=dc&query=dc.title%3D%22kokebok%22"

to

curl "https://brgbib.bib.no/cgi-bin/sru?operation=searchRetrieve&version=1.1&maximumRecords=25&recordSchema=dc&query=dc.title%3D%22kokebok%22"

In the course of testing that, I did find other issues with the schema that I've fixed. Thanks for the bug report. I'll continue investigating.

Git commit dfbd56530369df47a9f783f3ae28a6acac5031dc by Robby Stephenson.
Committed on 24/07/2025 at 19:38.
Pushed by rstephenson into branch 'master'.

Improve SRU result parsing

Include additional Dublin Core and marcXchange namespaces.
Correct a typo in the editor request query info.
FIXED-IN: 4.1.3

M +7 -11 src/fetch/srufetcher.cpp
M +5 -1 src/tests/srufetchertest.cpp
M +46 -25 xslt/srw2tellico.xsl

https://invent.kde.org/office/tellico/-/commit/dfbd56530369df47a9f783f3ae28a6acac5031dc

(In reply to Robby Stephenson from comment #2)
> (In reply to Karl Ove Hufthammer from comment #0)
> > Running a SRU search using Dublin Core as the output format only returns a
> > single search result, and where all metadata is empty.
>
> This is rather interesting. The SRU server returns different data to a
> KDE-based http request than for others. I haven't yet figured out what
> causes that. I've tried switching user-agents and every other http request
> header that I can think of so far.

Still no luck with http header changes, but I did discover that sending a POST request works (where the default GET does not), so I'll add that as an option.

(In reply to Karl Ove Hufthammer from comment #1)
> BTW, the server also supports MarcXchange (which I understand is a superset
> of MARCXML?):
> https://brgbib.bib.no/cgi-bin/sru?operation=searchRetrieve&version=1.1&maximumRecords=25&recordSchema=marcxchange&query=dc.title%3D%22kokebok%22

You can enter marcXchange as a custom method, which should work (though the needed schema changes aren't in 4.1.2).

Git commit 4d3a39542a05b4a2f4f8045f433a56cb9c71e617 by Robby Stephenson.
Committed on 25/07/2025 at 00:57.
Pushed by rstephenson into branch 'master'.

Allow for POST SRU requests

M +5 -0 ChangeLog
M +30 -8 src/fetch/srufetcher.cpp
M +26 -0 src/tests/srufetchertest.cpp
M +1 -0 src/tests/srufetchertest.h

https://invent.kde.org/office/tellico/-/commit/4d3a39542a05b4a2f4f8045f433a56cb9c71e617

Git commit 243b58baa77f9ecce9eac79136e77fcd43447621 by Robby Stephenson.
Committed on 25/07/2025 at 01:00.
Pushed by rstephenson into branch '4.1'.

Allow for POST SRU requests

M +5 -0 ChangeLog
M +30 -8 src/fetch/srufetcher.cpp
M +26 -0 src/tests/srufetchertest.cpp
M +1 -0 src/tests/srufetchertest.h

https://invent.kde.org/office/tellico/-/commit/243b58baa77f9ecce9eac79136e77fcd43447621

(In reply to Robby Stephenson from comment #2)
> This is rather interesting. The SRU server returns different data to a
> KDE-based http request than for others. I haven't yet figured out what
> causes that. I've tried switching user-agents and every other http request
> header that I can think of so far.

Thanks for looking into this! I’ve done some testing. For some reason, kioclient seems to send a content-length header with a value of zero (i.e., a request to not return a content body?!). You can duplicate the kioclient results with:

curl -H "content-length: 0" "https://brgbib.bib.no/cgi-bin/sru?operation=searchRetrieve&version=1.1&maximumRecords=25&recordSchema=dc&query=dc.title%3D%22kokebok%22"

(In reply to Karl Ove Hufthammer from comment #8)
> (In reply to Robby Stephenson from comment #2)
> > This is rather interesting. The SRU server returns different data to a
> > KDE-based http request than for others. I haven't yet figured out what
> > causes that. I've tried switching user-agents and every other http request
> > header that I can think of so far.
>
> Thanks for looking into this! I’ve done some testing. For some reason,
> kioclient seems to send a content-length header with a value of zero (i.e.,
> a request to not return a content body?!).

Interesting. From what I read, that header indicates there's no body in the http request itself (which is true), but curl doesn't send it in typical requests, I guess. You caught that over the wire? I wasn't seeing it in the outgoingMetaData in the KIO job, but it may be hiding it somehow.

The pointer was helpful. I did confirm that the Content-Length: 0 header is being sent, in contrast to curl and even to using QNetworkAccessManager/QNetworkRequest directly.

RFC9110, https://www.rfc-editor.org/rfc/rfc9110.html#name-content-length, says:
> A user agent SHOULD NOT send a Content-Length header field when the request message does not contain content
> and the method semantics do not anticipate such data.

An older one, RFC2616, https://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.13, is worded differently.
> Any Content-Length greater than or equal to zero is a valid value.

There doesn't seem to be anything that distinguishes method semantics in the older RFC.

I'll try to pull together a separate bug report against KIO and also see if there's a workaround that Tellico could use to block the header from being included. From my quick glance through the code, I didn't see one (nor, actually, where the Content-Length header is even being set for the GET request).

There are various reports around the web of this being an issue for some CDNs.
https://github.com/http-kit/http-kit/issues/583
https://github.com/httprb/http/issues/487

In the meantime, the brgbib server seems to work with SRU POST requests. I've now added support for specifying a POST request for the next Tellico release.

Bug 507523 has been fixed, so I think the underlying issue in KIO will be resolved as of Frameworks 6.18.0 when it comes out. At that point, you could use Tellico with the BRGBIB SRU server. Before that, you can use the just-released Tellico 4.1.3 and configure it to do SRU requests via POST instead of GET, which seems to work.

Thanks again for reporting and helping with the issue!
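
For anyone who wants to reproduce the three request variants discussed in this report side by side, here is a minimal shell sketch. The endpoint, query and the content-length header come from the comments above. The function names, the `wc -c` size comparison and the form-encoded POST body are assumptions made for illustration; how Tellico itself encodes its POST request is not shown in this report.

```shell
#!/bin/sh
# Endpoint and query string as quoted in the comments above.
SRU_BASE="https://brgbib.bib.no/cgi-bin/sru"
SRU_PARAMS='operation=searchRetrieve&version=1.1&maximumRecords=25&recordSchema=dc&query=dc.title%3D%22kokebok%22'

# Hypothetical helper: print the full GET URL.
sru_url() {
    printf '%s\n' "${SRU_BASE}?${SRU_PARAMS}"
}

# Plain GET; curl adds no Content-Length header here.
sru_get() {
    curl -s "$(sru_url)"
}

# GET with the extra header that kioclient was observed to send.
sru_get_zero_length() {
    curl -s -H "content-length: 0" "$(sru_url)"
}

# POST with the same parameters as a form-encoded body.
# Assumption: the server accepts this encoding.
sru_post() {
    curl -s --data "${SRU_PARAMS}" "${SRU_BASE}"
}

# Example usage (needs network access): compare response sizes in bytes.
#   sru_get | wc -c
#   sru_get_zero_length | wc -c
#   sru_post | wc -c
```

If the behaviour described above still holds, the second call would be expected to return a noticeably different (smaller) response than the other two.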