SUMMARY
`baloosearch` couldn't find a word-processing file containing a given term. The file was a .doc file, not .docx or .odt.

STEPS TO REPRODUCE
1. In LibreOffice Writer, create a document containing just "baloopleaseindexme".
2. Use File > Save As to save it in Word 97-2003 format as baloo_indexing_test.doc in a directory that Baloo indexes.
3. In a terminal, run `baloosearch baloopleaseindexme`
4. In a terminal, run `balooshow -x /path/to/baloo_indexing_test.doc`

OBSERVED RESULT
The document contents aren't indexed, so baloosearch for the content fails. balooshow doesn't list any words in the document, just
Terms: Mapplication Mmsword T5 X19-0 X20-0

EXPECTED RESULT
Baloo should index these files as it does .odt and .docx files.

SOFTWARE/OS VERSIONS
Linux/KDE Plasma:
KDE Plasma Version: 5.21.5
KDE Frameworks Version: 5.82.0
Qt Version: 5.15.2 on Wayland

ADDITIONAL INFORMATION
There are tools to extract text from MS Office files, e.g.
% flatpak run org.libreoffice.LibreOffice --invisible --convert-to txt --outdir /tmp/ /path/to/baloo_indexing_test.doc
will convert a .doc file to .txt. The TDF/Document Liberation Project also offers introspection tools such as mso-dumper's doc-dump, which dumps in a rather odd XML format.

In the interim this limitation should be mentioned somewhere, but I can't find where Baloo describes the file types whose content it indexes. I don't know whether Baloo indexes the contents of other MS Office 1990-2000 formats. Again, I shouldn't have to create test files to find out; known limitations should be documented.
Does Baloo use KFileMetaData extractors? https://invent.kde.org/frameworks/kfilemetadata/-/blob/master/src/extractors/officeextractor.cpp#L20 suggests that KFileMetaData relies on the external programs catdoc for application/msword, xls2csv for application/vnd.ms-excel, and catppt for application/vnd.ms-powerpoint. I have catdoc (and the others) installed, yet these .doc files didn't get indexed. Maybe if the programs baloo_file_extractor and baloo_filemetadata_temp_extractor were documented, I could run them by hand and figure out what's going on.
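As a rough illustration of the dispatch described above (not KFileMetaData's actual C++ code), here is a minimal Python sketch. It assumes the mimetype-to-program mapping suggested by the linked officeextractor.cpp and the catdoc arguments reported later in this thread (`-s cp1252 -d utf8 -w`); the arguments for xls2csv and catppt are not given here, so the sketch passes only the file path to them. `EXTRACTORS`, `build_command` and `extractor_available` are hypothetical names.

```python
import shutil

# Hypothetical table modelled on the mimetypes mentioned for officeextractor.cpp.
# Only the catdoc arguments come from this report; the others are assumptions.
EXTRACTORS = {
    "application/msword": ("catdoc", ["-s", "cp1252", "-d", "utf8", "-w"]),
    "application/vnd.ms-excel": ("xls2csv", []),
    "application/vnd.ms-powerpoint": ("catppt", []),
}


def build_command(mimetype, path):
    """Return the argv to run for `path`, or None if no extractor applies."""
    entry = EXTRACTORS.get(mimetype)
    if entry is None:
        # e.g. "application/x-ole-storage": no extractor, so no content is indexed
        return None
    program, args = entry
    return [program, *args, path]


def extractor_available(mimetype):
    """True if the helper program for this mimetype is installed on PATH."""
    entry = EXTRACTORS.get(mimetype)
    return entry is not None and shutil.which(entry[0]) is not None


if __name__ == "__main__":
    print(build_command("application/msword", "/tmp/test.doc"))
    print(build_command("application/x-ole-storage", "/tmp/test.xls"))
```

A file whose mimetype is not in the table, such as "application/x-ole-storage", gets no extractor at all, which is the situation the later debug logs in this thread show.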
So it turns out Baloo can and did index the contents of other .doc files, e.g. external .doc files I received in 2016 and earlier, and `catdoc` displays their contents; but catdoc displays nothing for the recent .doc file I received or for the .doc file generated by LibreOffice 7.1.3.2 that Baloo doesn't index.

I couldn't find any Linux utility that identifies the version of the Word file format that a .doc file uses, or whether it has been saved with Word's "Fast Save" feature. The two failing documents contain the string "Microsoft Word-Dokument" near the front, whereas the working ones contain "Microsoft Word 9.0" or "Microsoft Word 97-2004 Document" near the end.

So the problem here seems to be with KFileMetaData and its use of catdoc. I couldn't find a bug report saying that catdoc doesn't support some Word file formats; its maintainer's CVSTrac is dead, and the most active bug list seems to be Debian's bug tracker.
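The marker-string comparison described above can be scripted. A rough Python sketch, assuming the marker strings are stored as plain ASCII somewhere in the file (as observed in this report); it only reports where each marker occurs and does not determine the Word file format version. `find_markers` and the `MARKERS` list are hypothetical.

```python
import sys

# Marker strings observed in this report; extend the list as needed.
MARKERS = [
    b"Microsoft Word-Dokument",
    b"Microsoft Word 9.0",
    b"Microsoft Word 97-2004 Document",
]


def find_markers(data):
    """Map each marker found in `data` to (byte offset, 'front' or 'end')."""
    found = {}
    half = len(data) // 2
    for marker in MARKERS:
        offset = data.find(marker)
        if offset != -1:
            found[marker.decode("ascii")] = (offset, "front" if offset < half else "end")
    return found


if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            print(path, find_markers(f.read()))
```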
The bug is still present in Frameworks 5.100.
If the problem is catdoc, antiword is a good alternative.
Baloo uses KFileMetaData, and clearly states so. Without an example file this is not reproducible, and nothing can be done to improve the file type support.
Pinak has been inactive for years. Default assignee is broken.
(In reply to skierpage from comment #0)
> STEPS TO REPRODUCE
> 1. In LibreOffice Writer, create a document containing just
> "baloopleaseindexme"
> 2. File > Save As in Word 97-2003 format as baloo_indexing_test.doc in some
> directory that Baloo indexes.
> 3. In a terminal, run `baloosearch baloopleaseindexme`
> 4. In a terminal, run `balooshow -x /path/to/baloo_indexing_test.doc

Maybe LibreOffice Writer has been fixed. I've just followed the steps with

Version: 7.3.7.2 / LibreOffice Community

on Neon testing, and I get:

$ balooshow -x baloo_indexing_test.doc
1437d40000fc01 64513 1325012 baloo_indexing_test.doc [/home/test/testfiles/baloo_indexing_test.doc]
Mtime: 1669018797 2022-11-21T09:19:57
Ctime: 1669018974 2022-11-21T09:22:54
Cached properties:
Word Count: 1
Line Count: 1

Internal Info
Terms: Mapplication Mmsword T5 X19-1 X20-1 baloopleaseindexme
File Name Terms: Fbaloo Fdoc Findexing Ftest
XAttr Terms:
lineCount: 1
wordCount: 1

$ baloosearch baloopleaseindexme
/home/test/testfiles/baloo_indexing_test.doc
Elapsed: 0.25022 msecs

I can probably go back through earlier releases and see whether the behaviour has changed. Likely to be somewhat hit or miss though :-/
(In reply to tagwerk19 from comment #7)
> Maybe LibreOffice Writer has been fixed, I've just followed the steps with
> ...
> Terms: Mapplication Mmsword T5 X19-1 X20-1 baloopleaseindexme
> ...

No, it should also show the indexed content (keywords).
Created attachment 153915: baloo test .doc (Libreoffice)
Created attachment 153916: baloo test .doc WPS office
Created attachment 153917: powerpoint by WPS
Created attachment 153918: powerpoint by libreoffice
I attached some files (.doc, .ppt, .xls), created with both LibreOffice and WPS Office. Their contents are not indexed by Baloo.
Created attachment 153919: xls by libreoffice
Created attachment 153920: xls by WPS office
(In reply to Guido from comment #8)
> ... it should show also the content (keywords) indexed.

What I see with "balooshow -x" is:

> Terms: Mapplication Mmsword T5 X19-1 X20-1 baloopleaseindexme

where "baloopleaseindexme" is the content.

... I think things are working here
(In reply to tagwerk19 from comment #16)
> ... I think things are working here

Can you upload your file? I would like to test it.
(In reply to Guido from comment #9)
> Created attachment 153915
> baloo test .doc (Libreoffice)

This is the baloo_test_Libreoffice_7.4.2.3.doc file and...

(In reply to Guido from comment #10)
> Created attachment 153916
> baloo test .doc WPS office

This is the baloo_test_WPS_Office.doc file.

Start with checking mime types...

$ kmimetypefinder baloo_test_Libreoffice_7.4.2.3.doc
application/msword
$ kmimetypefinder baloo_test_WPS_Office.doc
application/msword

Both are "thought of" as MS Word files...

If I set up debugging and move the two files to an indexed folder, I see:

Nov 21 17:59:51 testmc baloo_file_extractor[3120]: kf.baloo: Folder cache: std::vector("/home/test/testfiles/": included)
Nov 21 17:59:51 testmc baloo_file_extractor[3120]: kf.baloo: Indexing 5660354579332097 "/home/test/testfiles/baloo_test_Libreoffice_7.4.2.3.doc" "application/msword"
Nov 21 17:59:51 testmc baloo_file_extractor[3120]: kf.filemetadata: Fetching extractors for "application/msword"
Nov 21 17:59:51 testmc baloo_file_extractor[3120]: kf.baloo: Indexing 5674107064613889 "/home/test/testfiles/baloo_test_WPS_Office.doc" "application/msword"
Nov 21 17:59:51 testmc baloo_file_extractor[3120]: kf.filemetadata: Fetching extractors for "application/msword"

and "balooshow -x" for each gives me:

$ balooshow -x baloo_test_Libreoffice_7.4.2.3.doc
141c100000fc01 64513 1317904 baloo_test_Libreoffice_7.4.2.3.doc [/home/test/testfiles/baloo_test_Libreoffice_7.4.2.3.doc]
Mtime: 1669049328 2022-11-21T17:48:48
Ctime: 1669049328 2022-11-21T17:48:48
Cached properties:
Word Count: 84
Line Count: 4

Internal Info
Terms: 14 2022 5 5.0 5.100.0 83 Mapplication Mmsword T5 X19-84 X20-4 a addon an and announcement announcements announces are available commonly developers for frameworks friendly functionality https hyperlink improvements in introduction is kde libraries licensing making manner mature monday monthly needed november of org part peer planned predictable provide qt quick release releases reviewed see series terms tested the this to today variety well which wide with �
File Name Terms: F7.4.2.3 Fbaloo Fdoc Flibreoffice Ftest
XAttr Terms:
wordCount: 84
lineCount: 4

$ balooshow -x baloo_test_WPS_Office.doc
1428920000fc01 64513 1321106 baloo_test_WPS_Office.doc [/home/test/testfiles/baloo_test_WPS_Office.doc]
Mtime: 1669049328 2022-11-21T17:48:48
Ctime: 1669049328 2022-11-21T17:48:48
Cached properties:
Word Count: 85
Line Count: 4

Internal Info
Terms: 14 2022 5 5.0 5.100.0 83 Mapplication Mmsword T5 X19-85 X20-4 a addon an and announcement announcements announces are available commonly developers for frameworks friendly functionality h https hyperlink improvements in introduction is kde libraries licensing making manner mature monday monthly needed november of org part peer planned predictable provide qt quick release releases reviewed see series terms tested the this to today variety well which wide with �
File Name Terms: Fbaloo Fdoc Foffice Ftest Fwps
XAttr Terms:
wordCount: 85
lineCount: 4

Again, it seems that this is OK. I checked on a Neon Testing system with LibreOffice installed (presumably the LibreOffice from Ubuntu 22.04).

... That's the good news.
Interestingly, on my system kmimetypefinder identifies all the files as WPS Office types. I will try removing the WPS mimetypes, or WPS itself.
(In reply to Guido from comment #11)
> Created attachment 153917
> powerpoint by WPS

That's the baloo_test_WPS.ppt file...

(In reply to Guido from comment #12)
> Created attachment 153918
> powerpoint by libreoffice

... and the baloo_test_libreoffice.ppt

Again, try the mime types...

$ kmimetypefinder baloo_test_WPS.ppt
application/vnd.ms-powerpoint
$ kmimetypefinder baloo_test_libreoffice.ppt
application/vnd.ms-powerpoint

which look OK to an untutored eye. However, for some reason Baloo picks a more generic mimetype...

Nov 21 17:59:51 testmc baloo_file_extractor[3120]: kf.baloo: Indexing 5664426208328705 "/home/test/testfiles/baloo_test_WPS.ppt" "application/x-ole-storage"
Nov 21 17:59:51 testmc baloo_file_extractor[3120]: kf.filemetadata: No extractor for "application/x-ole-storage"
Nov 21 17:59:51 testmc baloo_file_extractor[3120]: kf.baloo: Indexing 5691149494844417 "/home/test/testfiles/baloo_test_libreoffice.ppt" "application/x-ole-storage"
Nov 21 17:59:51 testmc baloo_file_extractor[3120]: kf.filemetadata: No extractor for "application/x-ole-storage"

... and balooshow shows the "application/x-ole-storage" mimetype, not the content:

$ balooshow -x baloo_test_WPS.ppt
141fc40000fc01 64513 1318852 baloo_test_WPS.ppt [/home/test/testfiles/baloo_test_WPS.ppt]
Mtime: 1669049328 2022-11-21T17:48:48
Ctime: 1669049328 2022-11-21T17:48:48

Internal Info
Terms: Mapplication Mole Mstorage Mx
File Name Terms: Fbaloo Fppt Ftest Fwps
XAttr Terms:

$ balooshow -x baloo_test_libreoffice.ppt
1438120000fc01 64513 1325074 baloo_test_libreoffice.ppt [/home/test/testfiles/baloo_test_libreoffice.ppt]
Mtime: 1669049328 2022-11-21T17:48:48
Ctime: 1669049328 2022-11-21T17:48:48

Internal Info
Terms: Mapplication Mole Mstorage Mx
File Name Terms: Fbaloo Flibreoffice Fppt Ftest
XAttr Terms:
OK, I removed the WPS mimetypes, and now:

$ kmimetypefinder '/run/media/guido/nvme1/baloo test/baloo_test_Libreoffice_7.4.2.3.doc'
application/msword

Nevertheless, Baloo doesn't index it:

$ balooshow -x baloo_test_Libreoffice_7.4.2.3.doc
6d59800010305 66309 447896 baloo_test_Libreoffice_7.4.2.3.doc [/run/media/guido/nvme1/baloo test/baloo_test_Libreoffice_7.4.2.3.doc]
Mtime: 1669025062 2022-11-21T11:04:22
Ctime: 1669053488 2022-11-21T18:58:08
Cached properties:
Word Count: 0
Line Count: 0

Internal Info
Terms: Mapplication Mmsword T5 X19-0 X20-0
File Name Terms: F7.4.2.3 Fbaloo Fdoc Flibreoffice Ftest
XAttr Terms:
lineCount: 0
wordCount: 0
(In reply to Guido from comment #14)
> Created attachment 153919
> xls by libreoffice

This is the baloo_test_libreoffice.xls file...

(In reply to Guido from comment #15)
> Created attachment 153920
> xls by WPS office

... and the baloo_test_wps.xls

The mimetypes are...

$ kmimetypefinder baloo_test_libreoffice.xls
application/vnd.ms-excel
$ kmimetypefinder baloo_test_wps.xls
application/vnd.ms-excel

but, as with the .ppt files above, Baloo treats the files as "application/x-ole-storage" and doesn't find an extractor for them:

Nov 21 17:59:51 testmc baloo_file_extractor[3120]: kf.baloo: Indexing 5691454437522433 "/home/test/testfiles/baloo_test_libreoffice.xls" "application/x-ole-storage"
Nov 21 17:59:51 testmc baloo_file_extractor[3120]: kf.filemetadata: No extractor for "application/x-ole-storage"
Nov 21 17:59:51 testmc baloo_file_extractor[3120]: kf.baloo: Indexing 5691463027457025 "/home/test/testfiles/baloo_test_wps.xls" "application/x-ole-storage"
Nov 21 17:59:51 testmc baloo_file_extractor[3120]: kf.filemetadata: No extractor for "application/x-ole-storage"

With the "balooshow -x" results...
$ balooshow -x baloo_test_libreoffice.xls
1438590000fc01 64513 1325145 baloo_test_libreoffice.xls [/home/test/testfiles/baloo_test_libreoffice.xls]
Mtime: 1669049328 2022-11-21T17:48:48
Ctime: 1669049328 2022-11-21T17:48:48

Internal Info
Terms: Mapplication Mole Mstorage Mx
File Name Terms: Fbaloo Flibreoffice Ftest Fxls
XAttr Terms:

$ balooshow -x baloo_test_wps.xls
14385b0000fc01 64513 1325147 baloo_test_wps.xls [/home/test/testfiles/baloo_test_wps.xls]
Mtime: 1669049991 2022-11-21T17:59:51
Ctime: 1669049991 2022-11-21T17:59:51

Internal Info
Terms: Mapplication Mole Mstorage Mx
File Name Terms: Fbaloo Ftest Fwps Fxls
XAttr Terms:

It's possible to get kmimetypefinder to consider "just" the filename or "just" the content:

$ kmimetypefinder -f baloo_test_libreoffice.xls
application/vnd.ms-excel
$ kmimetypefinder -c baloo_test_libreoffice.xls
application/x-ole-storage

which suggests some confusion with priorities and "magic" in the mimetype database.
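The -c result is what one would expect from the file format: .doc, .xls and .ppt files are all OLE2 compound documents that begin with the same 8-byte signature, so content sniffing alone yields the generic application/x-ole-storage unless a more specific magic rule matches. A minimal Python sketch of that signature check (the real shared-mime-info magic rules are more elaborate; `looks_like_ole_storage` is a hypothetical helper):

```python
import sys

# Signature at offset 0 of every OLE2 / Compound File Binary container
# (.doc, .xls, .ppt and others).
OLE2_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")


def looks_like_ole_storage(header):
    """True if `header` starts with the OLE2 compound-file signature."""
    return header[:8] == OLE2_SIGNATURE


if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            print(path, looks_like_ole_storage(f.read(8)))
```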
(In reply to tagwerk19 from comment #20)
> $ kmimetypefinder baloo_test_libreoffice.ppt
> application/vnd.ms-powerpoint

This only checks the filename:

$> echo "This is not a powerpoint document" > /tmp/foo.ppt
$> kmimetypefinder /tmp/foo.ppt
application/vnd.ms-powerpoint

> which look OK to an untutored eye. However for some reason baloo picks a
> more generic mimetype...
> ...
> ... and balooshow shows the "application/x-ole-storage" mimetype, not the
> content

Bug in shared mime info, https://gitlab.freedesktop.org/xdg/shared-mime-info/

/usr/share/mime/packages/freedesktop.org.xml has:

<mime-type type="application/msword">
  <sub-class-of type="application/x-ole-storage"/>

but application/vnd.ms-powerpoint has no sub-class-of. Ditto for e.g. Access and Excel documents.
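To see why the missing sub-class-of matters, here is a simplified Python model. It assumes, as the logs above suggest, that the extension-based type (what `kmimetypefinder -f` reports) is trusted only when it is, directly or transitively, a sub-class-of the content-based type (`kmimetypefinder -c`), and that the content-based type wins otherwise. This is an illustration, not Baloo's or Qt's actual code; the `PARENTS_*` tables are tiny hypothetical stand-ins for the shared-mime-info database.

```python
# Hypothetical stand-ins for the sub-class-of relations in shared-mime-info.
PARENTS_BEFORE_FIX = {
    "application/msword": ["application/x-ole-storage"],
    "application/vnd.ms-excel": [],
    "application/vnd.ms-powerpoint": [],
}
PARENTS_AFTER_FIX = {
    "application/msword": ["application/x-ole-storage"],
    "application/vnd.ms-excel": ["application/x-ole-storage"],
    "application/vnd.ms-powerpoint": ["application/x-ole-storage"],
}


def inherits(mimetype, ancestor, parents):
    """True if `mimetype` equals `ancestor` or is a (transitive) sub-class of it."""
    if mimetype == ancestor:
        return True
    return any(inherits(p, ancestor, parents) for p in parents.get(mimetype, []))


def pick_mimetype(by_extension, by_content, parents):
    """Prefer the extension-based type only when it refines the sniffed type."""
    if inherits(by_extension, by_content, parents):
        return by_extension
    return by_content


if __name__ == "__main__":
    for label, parents in (("before", PARENTS_BEFORE_FIX), ("after", PARENTS_AFTER_FIX)):
        print(label, pick_mimetype("application/vnd.ms-excel", "application/x-ole-storage", parents))
```

Running it prints `before application/x-ole-storage` and then `after application/vnd.ms-excel`, which matches the effect of the Override.xml workaround reported in this thread.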
(In reply to Guido from comment #21)
> ok, I removed the WPS mimetypes and now ...
> ...
> lineCount: 0
> wordCount: 0

That doesn't look right somehow...

I have enabled debugging by creating a file
~/.config/QtProject/qtlogging.ini
containing

[Rules]
kf.filemetadata=true
kf.baloo=true

and checked journalctl for debug output; maybe you see something there...
(In reply to tagwerk19 from comment #24)
> I have enabled debugging by creating a file
> ~/.config/QtProject/qtlogging.ini
> ...

I tried your suggestion: rebooted, stopped Baloo, purged, re-enabled, but there is nothing in journald about indexing, only the message that Baloo is starting.
(In reply to Stefan Brüns from comment #23)
> Bug in shared mime info, https://gitlab.freedesktop.org/xdg/shared-mime-info/
> ...
> but application/vnd.ms-powerpoint has no sub-class-of.

Oh dear ...

... I'm guessing that means an Override.xml file 8-/
(In reply to Guido from comment #25)
> (In reply to tagwerk19 from comment #24)
> I tried your suggestion, rebooted, stopped baloo, purged, reenabled but I
> have nothing in journald about indexing, only the message tha baloo is
> starting

I'll admit I've not fully understood how to get Baloo to output debug messages. My experience so far is that, having set up the qtlogging.ini file, if I do a 'balooctl purge' on a console, I see the warning/debug messages streamed to that console. I have recently found that if I redirect stderr to /dev/null, with 'balooctl purge 2> /dev/null', I see the messages in the journal.

I would love to know how to control this properly (Bug 460390).
Created attachment 153934: Override.xml file to sidestep the .xls and .ppt baloo indexing issues.

I've attached an Override.xml file that adds the:

<sub-class-of type="application/x-ole-storage"/>

lines for the "application/vnd.ms-powerpoint" and "application/vnd.ms-excel" entries.

Copy it, as root, to the /usr/share/mime/packages folder (the one that contains freedesktop.org.xml) and rebuild the mimetype database:

# update-mime-database -V /usr/share/mime

That worked for me.
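For reference, an Override.xml of this shape can be generated with a short Python script. This is a sketch assuming the standard shared-mime-info package format (a `mime-info` root element in the `http://www.freedesktop.org/standards/shared-mime-info` namespace); it is not the attachment itself, which may differ in detail. The script prints the XML to stdout, so it could be run as, for example, `python3 make_override.py > Override.xml` (the script name is hypothetical) before copying the file into place as described above.

```python
import xml.etree.ElementTree as ET

NS = "http://www.freedesktop.org/standards/shared-mime-info"
# Types that should be declared as OLE2 containers (the .xls and .ppt cases above).
OLE_TYPES = ["application/vnd.ms-excel", "application/vnd.ms-powerpoint"]


def build_override(types):
    """Return Override.xml text adding sub-class-of x-ole-storage to `types`."""
    ET.register_namespace("", NS)  # serialize NS as the default namespace
    root = ET.Element(f"{{{NS}}}mime-info")
    for mimetype in types:
        entry = ET.SubElement(root, f"{{{NS}}}mime-type", type=mimetype)
        ET.SubElement(entry, f"{{{NS}}}sub-class-of", type="application/x-ole-storage")
    ET.indent(root)  # pretty-print; requires Python 3.9+
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body + "\n"


if __name__ == "__main__":
    print(build_override(OLE_TYPES), end="")
```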
(In reply to tagwerk19 from comment #28)
> That worked for me.

That should of course be...

That worked for me. Thank you, Stefan!
This bug report has gotten very hard to follow. But:

1. If I follow my own steps (with LibreOffice Writer 7.4.2.3), Baloo doesn't index the file.
2. If I download Guido's attachment 153915, baloo_test_Libreoffice_7.4.2.3.doc, Baloo doesn't index it.
3. If I download Guido's attachment 153916, baloo_test_WPS_Office.doc, Baloo does index it.
4. I have old MS Office documents that Baloo does index.

In all cases, the output of `catdoc FILENAME` matches Baloo's indexing behavior: the files Baloo doesn't index are the ones for which catdoc's output is empty and its exit code is 69.

@tagwerk19, what are your results with attachment 153915?

I wrote
> I couldn't find any Linux utility that identifies the version of the Word file format that a .doc file uses

`file FILENAME` gives a lot of information; the non-indexed LibreOffice documents have Code page -535. I don't know whether this is significant. I stepped through catdoc with gdb, and for my file it didn't find an oleEntry matching WordDocument and exited with error code 69.

It is unhelpful that kfilemetadata's officeextractor.cpp doesn't log anything when `catdoc` fails to extract any text!

kmimetypefinder identifies all of these .doc files as application/msword.
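The missing diagnostics are easy to picture. Here is a minimal Python sketch (KFileMetaData itself is C++, and `run_extractor` is a hypothetical helper) that runs an external extractor such as catdoc and logs a warning when it cannot be started, exits non-zero, or prints no text. Exit code 69 is EX_UNAVAILABLE in sysexits.h; whether catdoc uses it in that sense is an assumption.

```python
import logging
import subprocess

log = logging.getLogger("officeextractor-sketch")


def run_extractor(argv, timeout=30):
    """Run an external text extractor; return its text, or None on failure."""
    try:
        proc = subprocess.run(argv, capture_output=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired) as exc:
        log.warning("could not run %s: %s", argv[0], exc)
        return None
    text = proc.stdout.decode("utf-8", errors="replace")
    if proc.returncode != 0:
        # In the gdb session above, catdoc exited with 69 after finding
        # no WordDocument stream.
        log.warning("%s exited with %d for %s", argv[0], proc.returncode, argv[-1])
        return None
    if not text.strip():
        log.warning("%s produced no text for %s", argv[0], argv[-1])
        return None
    return text
```

For example, `run_extractor(["catdoc", "-s", "cp1252", "-d", "utf8", "-w", "baloo_test_Libreoffice_7.4.2.3.doc"])` would return the text on a system with a working catdoc, and log a warning instead of failing silently where catdoc exits with 69.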
@tagwerk19, it looks like in comment #18 you did try @Guido's file baloo_test_Libreoffice_7.4.2.3.doc, and according to `balooshow -x` it did index its terms. I thought maybe it's because you have a different `catdoc`, but Debian and Fedora both use essentially the same 0.95 version. So I'm confused. What does `catdoc baloo_test_Libreoffice_7.4.2.3.doc` output for you, and what's its exit status?

(In reply to Guido from comment #4)
> if the problem is catdoc, antiword is a good alternative

I wrote a hacky script that strips the "-s cp1252 -d utf8 -w" arguments that kfilemetadata passes to catdoc and then execs `antiword` with the remaining arguments (I think just the path to the file to index). If I put that in /usr/local/bin/catdoc (so kfilemetadata finds it first), then Baloo does index baloo_test_Libreoffice_7.4.2.3.doc, yay! However, antiword doesn't handle a small .doc file like my one-word "baloopleaseindexme"; run from the command line, it prints "I'm afraid the text stream of this file is too small to handle."
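A Python version of such a wrapper might look like the sketch below; it is not the script used here. It assumes, as described above, that kfilemetadata calls `catdoc -s cp1252 -d utf8 -w FILE`, drops exactly those options, and passes everything else to antiword. Installed as /usr/local/bin/catdoc, it shadows the real catdoc for every program on the system, so it is best treated as a local experiment.

```python
#!/usr/bin/env python3
"""Hypothetical catdoc stand-in that forwards to antiword."""
import os
import sys

# catdoc options that take a value (-s charset, -d charset) and a flag without one (-w).
OPTS_WITH_VALUE = {"-s", "-d"}
FLAGS = {"-w"}


def strip_catdoc_flags(args):
    """Drop the catdoc-specific options, keeping the remaining arguments."""
    kept = []
    skip_next = False
    for arg in args:
        if skip_next:
            skip_next = False  # this is the value of -s or -d
        elif arg in OPTS_WITH_VALUE:
            skip_next = True
        elif arg not in FLAGS:
            kept.append(arg)
    return kept


if __name__ == "__main__" and len(sys.argv) > 1:
    # Replace this process with antiword, which prints the document text to stdout.
    os.execvp("antiword", ["antiword", *strip_catdoc_flags(sys.argv[1:])])
```

As noted above, this inherits antiword's limitation with very small .doc files, so it trades one gap for another.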
(In reply to Stefan Brüns from comment #23)
> ... Bug in shared mime info, https://gitlab.freedesktop.org/xdg/shared-mime-info/

It looks like there is also .ppt and .xls mimetype info in /usr/share/mime/packages/libreoffice.xml. These entries also lack the:

<sub-class-of type="application/x-ole-storage"/>

I don't know what happens when there are multiple distinct entries for a mime type, but the Override.xml, https://bugs.kde.org/attachment.cgi?id=153934, seems to override both.
(In reply to skierpage from comment #31)
> ... What does `catdoc baloo_test_Libreoffice_7.4.2.3.doc` output for you ...

I've not tried catdoc as a command before, but as they say, every day a learning day :-)

On Neon Testing (rebased on Ubuntu 22.04) and catdoc 0.95:

$ catdoc baloo_test_Libreoffice_7.4.2.3.doc
$ catdoc baloo_test_WPS_Office.doc

both worked and gave me the "KDE today announces..." text.

However, on Fedora 37 and Manjaro, also with catdoc 0.95:

$ catdoc baloo_test_WPS_Office.doc

worked but:

$ catdoc baloo_test_Libreoffice_7.4.2.3.doc

gave nothing, and I see the same as skierpage:

> ... the output of `catdoc FILENAME` matches baloo's indexing behavior

Where catdoc fails, I get the same:

> lineCount: 0
> wordCount: 0

as Guido (in comment #21).
(In reply to skierpage from comment #0)
> ADDITIONAL INFORMATION
> There are tools to extract text from MSOffice files...

That is a good lead, thanks!

It looks like you can convert a doc to text with:

$ libreoffice --headless --convert-to "txt:Text (encoded):UTF8" document.doc

or stream the text to stdout, minimally with:

$ libreoffice --cat document.doc

but this can give some "extraneous" warning messages. I'm trying out:

$ libreoffice --headless --safe-mode --cat document.doc

and:

$ libreoffice --headless "-env:UserInstallation=file:///tmp/Baloo_Conversion_${USER}" --cat document.doc

It seems that this conversion ought to work more generally, but I get failures with .xls or .ppt files; maybe watch: https://bugs.documentfoundation.org/show_bug.cgi?id=150846
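The last variant can be wrapped in a small Python helper if someone wants to experiment with LibreOffice as a fallback extractor. This is a sketch only: `libreoffice_cat_command` and `extract_text` are hypothetical names, the command line is the one above, and the throwaway profile directory is assumed (not verified here) to keep a running LibreOffice session from interfering. stderr is captured separately, which may keep the warning messages out of the text if LibreOffice writes them there.

```python
import getpass
import subprocess


def libreoffice_cat_command(path, user=None):
    """Build the headless `libreoffice --cat` command with a throwaway profile."""
    user = user or getpass.getuser()
    # Same per-user profile location as the shell command above.
    profile = f"-env:UserInstallation=file:///tmp/Baloo_Conversion_{user}"
    return ["libreoffice", "--headless", profile, "--cat", path]


def extract_text(path, timeout=120):
    """Return the document text printed by LibreOffice, or None on failure."""
    try:
        proc = subprocess.run(
            libreoffice_cat_command(path),
            capture_output=True,  # stdout carries the text; stderr is kept apart
            text=True,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    return proc.stdout if proc.returncode == 0 else None
```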
Finally, I certainly had issues with the mime type database. Following Stefan's suggestion (comment #23) fixed it for me. Neon Testing, Fedora 37 and Manjaro all have the same issue; they all need the Override.xml. The mime type fix is necessary but not sufficient.
Confirming...
> On Neon Testing (rebased on Ubuntu 22.04) and catdoc 0.95
>
> $ catdoc baloo_test_Libreoffice_7.4.2.3.doc
> $ catdoc baloo_test_WPS_Office.doc
>
> both worked and gave me the "KDE today announces..." text.

Thanks! I think I figured it out. Even though every distro and upstream are at version 0.95, Debian has a patch to catdoc that fixes this bug, https://bugs.debian.org/874048 (and carries some other catdoc patches), but upstream lacks it, and so Fedora lacks it too. I filed https://bugzilla.redhat.com/2150140. So the problem with LibreOffice .doc files on Fedora can be RESOLVED > UPSTREAM.

This should be two bug reports, one for LibreOffice .doc files and another for the .ppt and .xls mime-info bug; the current bug title doesn't match either problem.
(In reply to Stefan Brüns from comment #23)
> Bug in shared mime info, https://gitlab.freedesktop.org/xdg/shared-mime-info/
> ...
> but application/vnd.ms-powerpoint has no sub-class-of.

Reported upstream: https://gitlab.freedesktop.org/xdg/shared-mime-info/-/issues/190
Requested upstream:
https://bugs.documentfoundation.org/show_bug.cgi?id=152446
https://bugs.documentfoundation.org/show_bug.cgi?id=152451
Rolled up into:
https://bugs.documentfoundation.org/show_bug.cgi?id=70625
These were bugs in several upstream projects (catdoc, shared-mime-info), which should contain the fixes by now.