Bug 420939 - Baloo purge not completing
Summary: Baloo purge not completing
Status: RESOLVED NOT A BUG
Alias: None
Product: frameworks-baloo
Classification: Frameworks and Libraries
Component: balooctl
Version: 5.68.0
Platform: Other Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Stefan Brüns
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-05-03 04:52 UTC by Scott
Modified: 2021-08-12 08:29 UTC
CC: 2 users

See Also:
Latest Commit:
Version Fixed In:


Attachments
output of running baloo purge (135.74 KB, text/plain)
2020-05-03 04:52 UTC, Scott
baloofilerc (1.10 KB, text/plain)
2020-05-09 22:59 UTC, Scott
conf file (1.13 KB, text/plain)
2021-07-14 03:18 UTC, Scott
window1 (104.82 KB, image/png)
2021-07-14 03:19 UTC, Scott
window2 (410.48 KB, image/png)
2021-07-14 03:19 UTC, Scott
Screen1 (198.58 KB, image/png)
2021-07-15 02:07 UTC, Scott
mediainfo1 (210.47 KB, image/png)
2021-07-16 03:16 UTC, Scott
mediainfo2 (183.52 KB, image/png)
2021-07-16 03:17 UTC, Scott
attachment-8879-0.html (5.58 KB, text/html)
2021-08-05 22:40 UTC, Scott
attachment-21123-0.html (8.48 KB, text/html)
2021-08-08 01:18 UTC, Scott
stderr.txt (1.28 MB, text/plain)
2021-08-09 01:50 UTC, Scott
baloofilerc (1.44 KB, application/octet-stream)
2021-08-09 01:50 UTC, Scott

Description Scott 2020-05-03 04:52:20 UTC
Created attachment 128107
output of running baloo purge

SUMMARY
command: baloo purge hangs and does not complete, repeatedly

STEPS TO REPRODUCE
1. baloo purge

OBSERVED RESULT
See the attached output; the terminal then just hangs with no prompt.

EXPECTED RESULT
The baloo database is entirely empty.

SOFTWARE/OS VERSIONS
Linux/KDE Plasma: 
KDE Plasma Version: 5.18.4
KDE Frameworks Version: 5.68.0
Qt Version: 5.12.8
Baloo Version: 5.68.0 

ADDITIONAL INFORMATION
I have run the purge 5 times because after each purge baloo status came back with a database of thousands of entries. The last 3 times it reported the same 3949 entries even though it appeared to be doing something. The initial status before the first purge was over 9000 records.
Comment 1 Nate Graham 2020-05-07 14:44:41 UTC
What is the exact command you're running? It should be `balooctl purge`.
Comment 2 Scott 2020-05-07 22:08:18 UTC
Yes, it was balooctl purge; baloo purge as a command does nothing.
Comment 3 Stefan Brüns 2020-05-08 03:30:55 UTC
baloo is not a command, so of course it does nothing.

balooctl purge removes the DB, and then the DB is rebuilt, as intended.

I can't see any bug here.
Comment 4 Scott 2020-05-08 23:16:46 UTC
@Stefan Bruns
Are you saying that the purge command is designed so that, after completely removing the database, it immediately rebuilds it?

Why does the terminal just hang without returning the prompt?

If it is rebuilding the database as you imply, then it is getting it wrong by thousands: it repeatedly arrives at 3949 entries when there should be approximately 6000 records based on the configuration I am using. Baloo is only searching non-home-directory items whose contents are tightly controlled.

I am astounded that you make the no-bug claim about this software, which to the best of my knowledge has only really worked at a reasonable level on Ubuntu 19.10, as it did for me. After a fresh install of 20.04 I don't get the information baloo should be providing. This ticket does not address that issue, but rather the first issue I ran into while trying to solve the bigger issue of indexing not displaying in Dolphin, an issue (excluding 19.10) that has been the subject of many bug reports over many years.
Comment 5 Stefan Brüns 2020-05-09 00:51:20 UTC
It is not "it" that rebuilds the DB: the indexing daemon is enabled and adds any item included in the config to the DB. The purge command purges the DB; it does nothing else.

If the daemon stops after ~4000 items, either the other 2000 items are excluded (by mime type, by file name, hidden files/folders, ...), or it is hitting something where it gets stuck.

Unfortunately, you only provide very vague information. To begin with, you should run 'balooctl monitor' in another terminal window during your experiments and report its output.
Comment 6 Scott 2020-05-09 09:47:30 UTC
So I thought I might follow your advice and did the following:

1/ (Window 1) scott@scottlounge:~$ balooctl monitor
Waiting for file indexer to start
Press Ctrl+C to stop monitoring
2/ (Window 2) scott@scottlounge:~$ balooctl purge
Deleted the index database
3/ (Window 2) scott@scottlounge:~$ barlooctl status

Command 'barlooctl' not found, did you mean:

  command 'balooctl' from deb baloo-kf5 (5.68.0-0ubuntu1)

Try: sudo apt install <deb name>

1/ So even this simple process does not work properly because I get error that the command cannot be found. I assume we agree this a bug?
2/ This time I get the expected message from the purge command, "Deleted the index database" and my prompt.
3/ At this point window 1 (balooctl monitor) reported "Idle" (no errors).
4/ Running balooctl purge again (a minute or 2 later) gave rise to the same behaviour as in my initial post. I assume that not getting the same behaviour each time a command such as this is issued is also a bug, as is not terminating and displaying the prompt?
5/ I also consider it a bug that the purge command despite not terminating then goes on to, I assume, display to the terminal what is being indexed.
6/ A brief spot check showed that indeed thousands of files were not indexed. There appears to be no commonality in what is not indexed as it includes files, sub directories, directories and even entire disks.

It may be worth noting that the total of approx. 6,000 entries comprises only 4 file types: .ts, .m2ts, .mkv and .avi.

So, running monitor does not appear to have achieved anything; perhaps you have another suggestion? In view of this product's troubled history over the best part of a decade, my suggestion would lie somewhere in product testing. The errors are so huge and glaring: what detail can any single user give to pinpoint problems that are not instantly recognisable to anyone with a large(ish) data set? By way of comparison, I use another piece of software, Plex, with the same data set without issue, which would suggest that the data set is not the problem.

Then, on top of this, I still can't look at my file manager and see the run time of a bloody movie or see any metadata detail at all from a networked PC. I appreciate that it's a one issue one ticket world so I will leave off on these in this thread.
Comment 7 Stefan Brüns 2020-05-09 14:07:35 UTC
If you can't type, that's a bug on your side.

"barlooctl status"

There is an "r" too much ...

You are making too many assumptions without even checking what you are doing.
Comment 8 Scott 2020-05-09 22:41:31 UTC
OK, you have me on one point which is not central to this ticket. What assumptions are you referring to? Focusing on my limited ability to spell as your only response is not progressing this problem to resolution. Are you able to replicate not getting a command termination and prompt after multiple submissions of the purge command?
Comment 9 Nate Graham 2020-05-09 22:42:17 UTC
(In reply to Scott from comment #8)
> OK, you have me on one point which is not central to this ticket. What
> assumptions are you referring to? Focusing on my limited ability to spell as
> your only response is not progressing this problem to resolution. Are you
> able to replicate not getting a command termination and prompt after
> multiple submissions of the purge command?
No.
Comment 10 Scott 2020-05-09 22:59:40 UTC
Created attachment 128318
baloofilerc
Comment 11 Scott 2020-05-09 23:02:26 UTC
Text for the above attachment. Perhaps the incomplete indexing is related to this issue. Why are so many files not indexed, and why are whole disks not indexed, considering they are specifically requested to be indexed in the rc file?
Comment 12 Scott 2020-05-09 23:14:33 UTC
The attachment shows an error for the inclusion of HDD disk10: I sent you a local backup copy of the file; the live version is correct.
Comment 13 Christoph Feck 2020-05-22 20:40:53 UTC
New information was added in recent comments; changing status for inspection.
Comment 14 Stefan Brüns 2020-05-22 22:22:03 UTC
So, going through your complaints ...

> 1/ So even this simple process does not work properly because I get error that the command cannot be found. I assume we agree this a bug?

Obviously not a bug, you spelled it wrong, and it notifies you about it.

> 5/ I also consider it a bug that the purge command despite not terminating then goes on to, I assume, display to the terminal what is being indexed.

What makes you think the *command* has not completed? When you start a program from the command line, its output is directed to the terminal. That is standard behavior, not a bug.

For me, the "balooctl purge" command completes immediately when baloo is idle, and after a few seconds when it is busy.

You still have not listed any specific files which should be indexed and are not. You do not provide the paths of the files not indexed, and you do not provide the files themselves. The config file without any context is pointless.
Comment 15 Scott 2020-05-23 00:49:15 UTC
1/ You have already mentioned my spelling error in comment 7 and, not surprisingly, I have acknowledged it.

2/ I feel the command is not terminating because the terminal is not returned to the prompt as is normal. The expected behaviour of balooctl purge is to purge the database and return to the prompt. I agree that the terminal will normally display some message(s) regarding the program that has been started before displaying the prompt at termination of that program.

What it actually does is purge the database and then display what I assume to be the results of another process, some form of baloo indexing, and at the end of the indexing the prompt is not returned. I assume that, whatever it is doing, it is waiting for some kind of indexing-related input/activity, nothing to do with the purge command.

3/ I will address the file indexing issue separately.
Comment 16 Bug Janitor Service 2020-06-07 04:33:08 UTC
Dear Bug Submitter,

This bug has been in NEEDSINFO status with no change for at least
15 days. Please provide the requested information as soon as
possible and set the bug status as REPORTED. Due to regular bug
tracker maintenance, if the bug is still in NEEDSINFO status with
no change in 30 days the bug will be closed as RESOLVED > WORKSFORME
due to lack of needed information.

For more information about our bug triaging procedures please read the
wiki located here:
https://community.kde.org/Guidelines_and_HOWTOs/Bug_triaging

If you have already provided the requested information, please
mark the bug as REPORTED so that the KDE team knows that the bug is
ready to be confirmed.

Thank you for helping us make KDE software even better for everyone!
Comment 17 Christoph Feck 2020-06-12 10:39:48 UTC
New information was added with comment 15; changing status for inspection.
Comment 18 tagwerk19 2021-07-08 10:04:17 UTC
Did you get this sorted?

Have a look at Bug 431664; it might mean you were seeing a corrupted database, and since 5.78 that issue has been fixed.

A "balooctl purge" does three things: it closes down the running baloo_file process, removes the .local/share/baloo/index database and restarts baloo_file. It can happen that the "closedown" hangs. If in doubt, you can manually kill the process, check that it has gone, and delete the index file.
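The manual fallback just described (stop baloo_file, confirm it is gone, delete the index, restart) could be scripted roughly as below. This is a minimal sketch, not how balooctl itself is implemented: it assumes `balooctl`, `pgrep` and `pkill` are on the PATH and that the index lives at `$XDG_DATA_HOME/baloo/index` (normally `~/.local/share/baloo/index`). The helper names are hypothetical.

```python
import os
import subprocess
import time
from pathlib import Path


def baloo_index_path() -> Path:
    """Location of the index (assumption: XDG data dir, defaulting to ~/.local/share)."""
    data_home = os.environ.get("XDG_DATA_HOME") or str(Path.home() / ".local" / "share")
    return Path(data_home) / "baloo" / "index"


def baloo_file_running() -> bool:
    """True if a process named exactly 'baloo_file' exists (pgrep exits 0 on a match)."""
    result = subprocess.run(["pgrep", "-x", "baloo_file"], capture_output=True)
    return result.returncode == 0


def wait_until_stopped(timeout: float) -> bool:
    """Poll until baloo_file has exited; return False if it is still there at the timeout."""
    deadline = time.monotonic() + timeout
    while baloo_file_running():
        if time.monotonic() >= deadline:
            return False
        time.sleep(0.5)
    return True


def reset_baloo_index(timeout: float = 10.0) -> None:
    """Stop the indexer, make sure it is gone, delete the index, then re-enable indexing."""
    subprocess.run(["balooctl", "disable"], check=False)
    if not wait_until_stopped(timeout):
        # The "closedown" hung: kill the process and check again before touching the DB.
        subprocess.run(["pkill", "-x", "baloo_file"], check=False)
        if not wait_until_stopped(timeout):
            raise RuntimeError("baloo_file is still running; not deleting the index")
    index = baloo_index_path()
    if index.exists():
        index.unlink()
    subprocess.run(["balooctl", "enable"], check=False)
```

Refusing to delete the index while baloo_file is still alive mirrors the "check that it has gone" step; if the monitor does not report the indexer running afterwards, `balooctl enable` may need repeating.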

I've noticed that you can get log messages streaming to your terminal session after you've closed down and restarted baloo. What you are seeing in your original attachment looks like failures/warnings when baloo_file_extractor tries to read metadata from the various files. If you are running "balooctl monitor" in a second window, you should see a stream of files listed as they are indexed (they are dealt with in batches of 40). If this stream "suddenly stops" and you haven't successfully indexed everything, there's a chance that you have one file (or filetype) that chokes baloo. There'd be a bit of troubleshooting required there.
Comment 19 Scott 2021-07-11 23:00:31 UTC
With Stefan on the case, I decided this issue would never be resolved, so since my last comment over a year ago I have just ignored the problem. I have since upgraded to 21.04 and the issue remains unresolved; movie durations are still not displayed.

With regard to the warnings, they appear to be generated by subtitles that cannot be read properly.

The strange thing here is that, of the thousands of files I have, 99% fall into one of 3 file types:
1/ .ts muxed with PGS subtitles and DTS audio
2/ .m2ts muxed with PGS subtitles and DTS audio
3/ .mkv muxed with SRT subtitles and DTS audio
Of course, the video codecs are more varied and too time-demanding to standardise.

It seems that someone realises there are issues with obtaining movie durations, as this field is not available in Dolphin>Show additional information>Video; only aspect ratio and frame rate are shown. For the very limited number of files that do display a duration, I get that information from Dolphin>Show additional information>Audio>Duration.

If I can provide further information please advise.
Comment 20 tagwerk19 2021-07-12 13:45:10 UTC
(In reply to Scott from comment #19)
> If I can provide further information please advise.
The issues are:

    You don't get as many files indexed as you expect.
    A "balooctl purge" hangs

where the files are video formats (with subtitles), plus there's a separate issue with durations not appearing as indexed metadata.

Let's simplify and take a few steps back; apologies if this seems rather basic...

It's worth temporarily setting up the indexing just for a test folder under your "home" and copying "test" files into it one at a time. If you are running "balooctl monitor" you should see baloo notice a file arrive and index it.

To be sure that you are starting "clean" you can close down the indexer,

    balooctl disable

and check that it has really gone (try "pgrep -l baloo_file" and kill the process if necessary). Then manually delete the database:

    rm ~/.local/share/baloo/index

Run

    balooctl monitor

in a separate window and

    balooctl enable

Repeat the "enable" if you don't see the monitor say "File indexer is running". Sometimes it needs to be told twice.

When you copy a file you should see it listed in the monitor and be able to see the information extracted with

    balooshow -x ...testfile...

I would expect that this shows you the embedded metadata (some? all?). You should be able to see the underlying metadata that dolphin displays as aspect ratio, frame rate, duration etc.

If you don't get any results, you can prompt baloo to index the file:

    balooctl index ...testfile...

which may give you extra information/error messages.

If you are not seeing the metadata for particular file formats (or for some files and not others), then I think you should split off this issue and submit a separate bug. You are likely to need someone with deep and detailed knowledge of video formats who can take it on from there :-/

It's also worth noting whether the "balooctl index" takes a long time to process. There's a bug fixed in 5.83.0 where determining the mimetype meant that the whole file was read. If you are dealing with large video files, that could be a considerable load. See Bug 398908.

Similarly, check whether the indexer is crashing when you copy the test file or do a manual index (see if coredumpctl gives you anything). A repeatable crash would mean installing debug symbols and submitting that as a separate bug.

Any metadata that "balooshow -x" shows you should be findable with baloosearch.

If "things work" with your test files in your home directory and don't work on your mounted discs, then that's another angle to follow up.

Hope this is followable...
Comment 21 Scott 2021-07-14 03:18:55 UTC
Created attachment 140036
conf file
Comment 22 Scott 2021-07-14 03:19:29 UTC
Created attachment 140037
window1
Comment 23 Scott 2021-07-14 03:19:50 UTC
Created attachment 140038
window2
Comment 24 Scott 2021-07-14 03:20:26 UTC
In relation to the issues, I believe the first two result in the third, that is, no duration displayed. I believe your approach of taking a step back is spot-on.

I followed your directions and am attaching 2 files that contain the output of the 2 terminals. I am also attaching a copy of the config file I am using. I might add at this point that the contents were somewhat different from what is documented at https://community.kde.org/Baloo/Configuration, which is what I used to set up my config file.

What I did was to copy 3 small video files and an mp3 music file to ~/Videos and set that up as the only folder to be indexed. As you can see in the window1 file only 3 new files were indexed and 5 modified files were indexed. In one instance a new file was labelled as a modified file and the remaning 4 modified files appeared without any keyboard or other input. No duration values were displayed.

As per your direction I then proceeded to index one of the video files files manually as shown in window2 and that displayed the errors as shown.

If I have not followed your instructions correctly, please advise and I will re-do.
Comment 25 tagwerk19 2021-07-14 08:46:43 UTC
(In reply to Scott from comment #24)
> ... I am also attaching a copy of the config file I am using ...
The "only basic indexing=true" means that baloo will not index the content of the files; it will note that the file exists and index the filename (*).

> What I did was to copy 3 small video files and an mp3 music file to ~/Videos
> and set that up as the only folder to be indexed. As you can see in the
> window1 file only 3 new files were indexed and 5 modified files were
> indexed. In one instance a new file was labelled as a modified file and the
> remaning 4 modified files appeared without any keyboard or other input. No
> duration values were displayed.
I'm not sure how much you can deduce from these messages. Baloo has "been told" that the file has been modified; if you are copying a big file, that could happen a number of times. I'd need to watch the behaviour on my system to be sure, but I know that if you are appending information to a log file, baloo notes that the file's changed.

(Balooctl monitor only lists files by name when the file is sent over to the baloo_file_extractor for content indexing)
 
> As per your direction I then proceeded to index one of the video files files
> manually as shown in window2 and that displayed the errors as shown.
I think at this point, try the:

   balooshow -x ~/Videos/S01E01.m2ts

and see what information baloo has extracted.

If you've got nothing (it would appear under an "Internal Info" line), then something's failed.

You should have "File Name Terms" that includes S01E01 and m2ts.

You may have, if the file extractors understand/cope with the format, embedded metadata appearing in a "Terms:" line, or maybe listed with a property name like "Width:" or "Height:" (taking these examples from indexed JPGs; I'm afraid I don't know video formats...)

Baloosearch should find your videos "by filename"

    baloosearch S01E01
    baloosearch m2ts

If you get anything listed in "Terms" you should find the file with a

    baloosearch yourterm

If you have properties appearing you could try (as an example)

    baloosearch width:1024

Hope this is all OK...

*)  Basic indexing will also index any "Extended Attributes", that is
    if you've added a tag, rating or comment in Dolphin, they'll
    be indexed
Comment 26 Scott 2021-07-15 02:07:28 UTC
I have deleted "only basic indexing=true" from my config file and rerun all the steps from yesterday.

In addition, I ran the balooshow -x command on 1 video file and the mp3 file. I will upload the terminal printout as Screen1.

Some information has been indexed, as shown in the attachment. Of interest are the 2 items labelled Mtime and Ctime I don't know what they perport to display but neither of them is the duration time, despite a time format being displayed after the date (the last 6 digits at the end of the line). In any case there is no value displayed in the Duration column of Dolphin.

The other data one might expect to be collected, such as width or ratio, seems not to have been collected.

I am most surprised that even simple mp3 files seem not to be indexed correctly/fully.
Comment 27 Scott 2021-07-15 02:07:59 UTC
Created attachment 140064
Screen1
Comment 28 tagwerk19 2021-07-15 06:49:28 UTC
(In reply to Scott from comment #26)
> ... the 2 items labelled Mtime and Ctime I don't know what they perport to
> display ...
Mtime is the modification date/time of the file, shown both as the "internal format" long number and converted to something more human-readable. That's when the file was created or last edited.

Ctime is the date/time the details about the file were changed. If you change the permissions, you'd see the Ctime change. You'd normally see the Mtime and Ctime be the same...
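To make the Mtime/Ctime distinction concrete, here is a small sketch using Python's `os.stat`. It assumes Linux semantics (where `st_ctime` is the inode change time, not a creation time) and assumes, without having checked the baloo source, that balooshow's Mtime/Ctime correspond to these same stat fields. Changing only the permissions moves Ctime while Mtime stays put.

```python
import os
from datetime import datetime, timezone


def file_times(path: str) -> dict:
    """Return Mtime and Ctime for a file, raw and as readable UTC strings (Linux semantics)."""
    st = os.stat(path)
    return {
        # Mtime: when the file's contents were last written.
        "mtime": st.st_mtime,
        # Ctime: when the file's metadata (permissions, owner, ...) last changed.
        "ctime": st.st_ctime,
        "mtime_utc": datetime.fromtimestamp(st.st_mtime, tz=timezone.utc).isoformat(),
        "ctime_utc": datetime.fromtimestamp(st.st_ctime, tz=timezone.utc).isoformat(),
    }
```

For example, after `os.chmod(path, 0o600)` on an existing file, `file_times(path)["ctime"]` moves forward while `"mtime"` does not; neither value has anything to do with the duration stored inside a video.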

They've got nothing to do with "embedded" information within the file, like the Height, Width, Duration...
 
> I am most surprised that even simple mp3 files seem not to be indexed
> correctly/fully.
You might have a look at the MP3s with something like Easytag. That digs down inside the files to get at the ID3 tags and allows you to change them.

There's a bit about this in Bug 437189.

Baloo does need to be doing "content indexing" to see these tags - after all it is opening the file and having a look inside.

Alas, I've not tried to find an "Easytag" equivalent for video files.
Comment 29 tagwerk19 2021-07-15 07:10:19 UTC
(In reply to Scott from comment #27)
> Created attachment 140064
> Screen1
Interesting screenshot...

I'd agree, strange that there's nothing shown after "Terms:" beyond the mimetype...

In this instance, I'd find a completely different MP3, look at that and compare.

I've just discovered that "exiftool", something I'd only considered for looking at metadata in image files, can also list the metadata for audio/video. Well, it worked for me on a small sample of files. Try

    exiftool ...yourfile...
Comment 30 Scott 2021-07-15 09:05:09 UTC
It seems we had a mid-air collision and my last post was lost, as it was submitted at the same time as your last post. Anyway, to briefly recap:

Firstly, just to confirm, what is the correct config setting/syntax to enable content indexing:
1/ Remove the line, only basic indexing=true, or,
2/ Include the line, only basic indexing=false

Baloo is part and parcel of Kubuntu and should work straight out of the box, just like my monitor or speakers. There should be no need to install programs for baloo to work. Following your instructions I have demonstrated that the expected indexing does not occur. So we can confirm that this step, which precedes the issues you identified in an earlier post, is a problem.

It is clear that baloo should index duration (the play time of a music piece), as is evidenced by the selectable field in Dolphin, Audio>Duration.

As you are likely aware, issues surrounding the usability of baloo have been around for many years, I have been trying to use it for 7 years without success.

This bug was originally about purging and now it throws up issues with initial indexing, which is also mentioned early on in the dialogue of the bug. I think, though, that someone needs to decide what baloo is supposed to do: is it supposed to collect audio/visual file metadata? If yes, it fails, and that is the first bug, as is amply demonstrated by the numerous bug reports about baloo and metadata information not appearing.

If no, then Dolphin needs to be amended so these metadata fields are not accessible.

If I have to install extra software so that baloo might work, I am not really interested; I can do what I have done and install other software that gives me durations, which are also visible from remote computers, something baloo cannot do.

From the information in this post it is clear that multiple people have an issue with baloo not indexing correctly. Is it possible to just fix that issue first and then revisit the other downstream issues?

Better yet, why don't we use a file manager that can read metadata and give baloo the heave-ho.
Comment 31 tagwerk19 2021-07-15 11:39:31 UTC
(In reply to Scott from comment #30)
> ... just to confirm, what is the correct config setting/syntax to
> enable content indexing:
> 1/ Remove the line, only basic indexing=true, or,
> 2/ Include the line, only basic indexing=false
Yes. Or go through "System Settings > Search > File Search" and you'll see an "Also index file content" checkbox.
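Since it is not obvious which setting is actually in effect, a quick way to see what baloofilerc itself says is to parse it with Python's `configparser`. This is a minimal sketch under stated assumptions: the key is spelled `only basic indexing` and sits in the `[General]` group, the file lives at `~/.config/baloofilerc`, and a missing key means content indexing is on. It only reads the file; it says nothing about what the GUI or the running indexer does with it.

```python
import configparser
from pathlib import Path


def content_indexing_enabled(rc_text: str) -> bool:
    """Return False if the baloofilerc text sets 'only basic indexing=true' in [General]."""
    # KDE config files use '=' only; keys such as 'folders[$e]' and any '%' in values
    # must be taken literally, so interpolation is disabled.
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None, strict=False)
    parser.optionxform = str  # keep key spelling exactly as written
    parser.read_string(rc_text)
    # Assumption: a missing key or group means content indexing is enabled.
    value = parser.get("General", "only basic indexing", fallback="false")
    return value.strip().lower() != "true"


def read_baloofilerc() -> str:
    """Read ~/.config/baloofilerc (assumes the default XDG config location)."""
    return (Path.home() / ".config" / "baloofilerc").read_text()
```

Calling `content_indexing_enabled(read_baloofilerc())` before and after ticking the checkbox would show whether the GUI actually rewrote the file.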

> It is clear that baloo should index duration (the play time of a music
> piece) as is evidenced by the selectable field in Dolphin, Audio>Duration.
The question is, where has the "duration" information got lost.

From your screenshot (with balooshow), baloo hasn't extracted it. On to the next step...

If you run 'exiftool' on the file, you should see if the information is there in the file. If it is, then there's a bug in the extractor. If exiftool does not show the info, then maybe there's an issue with the file.

A useful comparative test is to download and try a different MP3 file...

In general I see the "Duration" listed for the MP3's I've checked (in Dolphin and listed by "balooshow -x" and exiftool). I don't see much info for videos (a little and maybe not sufficient) even with exiftool.

I think yes, it's good that baloo tries to extract embedded metadata from the files. However, it's clearly a horrible, messy job to have to do.

As far as I remember I installed exiftool on Neon with:
    sudo apt install libimage-exiftool-perl
It is handy, solid and straightforward.
Comment 32 Scott 2021-07-16 03:16:23 UTC
(In reply to Scott from comment #30)
> ... just to confirm, what is the correct config setting/syntax to
> enable content indexing:
> 1/ Remove the line, only basic indexing=true, or,
> 2/ Include the line, only basic indexing=false
Yes. Or go through "System Settings > Search > File Search" and you'll see an "Also index file content" checkbox.
-----------------------------------------------------------------------------------------
So here is the problem, the above information is not entirely correct. The settings in baloofilerc do not control whether full file indexing occurs. That control rests with the setting in System Settings > Search > File Search. Once that is enabled (ticked) the four files we spoke about yesterday are now more fully indexed and display valid data in the Duration column of Dolphin. It appears that the GUI takes precedence over the config file, as a directory included in the config is ignored (not indexed) if that directory is set to "Not indexed" in the GUI.

There are still issues with warning/error output:
1/ [mpegts @ 0x55e6d2be2d00] start time for stream 2 is not set in estimate_timings_from_pts
2/ [mpegts @ 0x55e6d2be2d00] Could not find codec parameters for stream 2 (Subtitle: hdmv_pgs_subtitle (pgssub) ([144][0][0][0] / 0x0090)): unspecified size
3/ Consider increasing the value for the 'analyzeduration' and 'probesize' options

Is it possible to increase the values shown in item 3 above?

Having had some success in getting these 4 files indexed, I proceeded to index a directory within the main disk array. Baloo was only able to index metadata for about 30% of the files. I then copied one indexed file and one non-indexed file to the same directory as the previously mentioned 4 files and renamed them. I repeated the indexing and attempted single-file indexing with the same result. I then deleted 2 of the video files, S01E04 and S01E05.
My Test directory now has 4 files, 1 x mp3 file, 1 x initial video file, 1 new file with metadata not indexed and 1 new file with metadata indexed. Both new files are close to 15GB in size.

As you are aware I have 2 terminal windows open during this process. When I unable balooctl the monitor window reports, Indexing: /directory.file.m2ts OK, for all 4 files. In the other terminal window error/warning messages are printed (the purple prefixed messages in my previous attachments)  as shown in the numbered items above. The exception was the new file that did not have its metadata indexed, for which an additional error/warning message was displayed:
[mpegts @ 0x5645497daec0] Failed to open codec in avformat_find_stream_info


I then used a program, MediaInfo, to read the file details (attached) and discovered that this file uses the VC-1 codec. This is a little surprising, as this codec is quite an early one, dating back to 2006. I looked at several more files and found that the video codec is not the sole reason: I found numerous AVC files that were not indexed, and even for late-model codecs such as H.265 some were indexed while others were not.
Comment 33 Scott 2021-07-16 03:16:46 UTC
Created attachment 140105
mediainfo1
Comment 34 Scott 2021-07-16 03:17:07 UTC
Created attachment 140106
mediainfo2
Comment 35 tagwerk19 2021-07-16 21:58:00 UTC
(In reply to Scott from comment #32)
> So here is the problem, the above information is not entirely correct. The
> settings in baloofilerc do not control whether full file indexing occurs.
I think baloofilerc is the single source of config information, the GUI updates it and prompts baloo to reread it. I think the GUI does do some other stuff like the equivalent of a "balooctl check" when you include other folders. Beyond that, I'm afraid I don't know.

> ... Once that is enabled (ticked) the four files we spoke about
> yesterday are now more fully indexed and display valid data in the Duration
> column of Dolphin ...
That sounds good news (some good news at least :-)

> Is it possible to increase the values shown in item 3 above?
If I google for the error message, I get hits for ffmpeg.

It seems sensible for baloo to depend on ffmpeg and I also see I've a "kfilemetadata_ffmpegextractor.so" on my system. It looks as if you are ahead of me though with video formats, cf Bug 399650. My guess is that you are running into a similar sort of issue you had before.

I'd say check your files with ffmpeg/ffprobe to see if ffmpeg can read the metadata. You can install ffmpeg on Kubuntu with "sudo apt install ffmpeg" and try out ffprobe:

    https://trac.ffmpeg.org/wiki/FFprobeTips

Maybe this helps. You've the advantage that you know the metadata _can_ be read with MediaInfo, ffprobe could help pin down where the issue is.
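To apply that check to a whole folder rather than one file at a time, the ffprobe call can be looped over every video and the files with no container duration collected; those are good candidates for the files that also trip up baloo's extractor. This is a sketch that assumes ffprobe is installed; the function names and the default suffix list are illustrative. `-analyzeduration` (microseconds) and `-probesize` (bytes) are ffprobe input options, so larger values can be tried on the ffprobe side to see whether the "Consider increasing..." warnings go away; this changes nothing inside baloo.

```python
import subprocess
from pathlib import Path
from typing import List, Optional

VIDEO_SUFFIXES = (".ts", ".m2ts", ".mkv", ".avi")


def parse_duration(text: str) -> Optional[float]:
    """Convert ffprobe output such as '8601.801467' or '2:23:21.801467' to seconds."""
    text = text.strip()
    if not text or text == "N/A":
        return None
    seconds = 0.0
    for part in text.split(":"):  # h:mm:ss.ffffff when -sexagesimal is used
        seconds = seconds * 60 + float(part)
    return seconds


def probe_duration(path: Path, analyzeduration: Optional[int] = None,
                   probesize: Optional[int] = None) -> Optional[float]:
    """Ask ffprobe for the container (format-level) duration; None if unavailable."""
    cmd = ["ffprobe", "-v", "error"]
    if analyzeduration is not None:
        cmd += ["-analyzeduration", str(analyzeduration)]  # microseconds
    if probesize is not None:
        cmd += ["-probesize", str(probesize)]  # bytes
    cmd += ["-show_entries", "format=duration",
            "-of", "default=noprint_wrappers=1:nokey=1", str(path)]
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    if result.returncode != 0:
        return None
    return parse_duration(result.stdout)


def files_without_duration(folder: Path, suffixes=VIDEO_SUFFIXES) -> List[Path]:
    """List video files under folder for which ffprobe reports no duration."""
    return [
        path for path in sorted(folder.rglob("*"))
        if path.is_file() and path.suffix.lower() in suffixes
        and probe_duration(path) is None
    ]
```

For instance, `files_without_duration(Path.home() / "Videos")` would list the test files ffprobe cannot time; comparing that list with what balooshow reports for the same files helps separate "ffmpeg can't read it" from "baloo didn't pick it up".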

> As you are aware I have 2 terminal windows open during this process. When I
> unable balooctl the monitor window reports, Indexing: /directory.file.m2ts
> OK, for all 4 files. In the other terminal window error/warning messages are
> printed (the purple prefixed messages in my previous attachments)  as shown
> in the numbered items above.
Yes. I've noticed that after you've restarted baloo from the command line (a balooctl enable or balooctl purge) you get the error messages scrolling up on your screen.

These are from the "background processes" baloo_file and baloo_file_extractor and would normally go to a log.

Were you getting these messages when you were running "balooctl purge"? In this case you wouldn't get the command prompt repeated after the message.
Comment 36 Scott 2021-07-17 00:23:45 UTC
I think baloofilerc is the single source of config information, the GUI updates it and prompts baloo to reread it. I think the GUI does do some other stuff like the equivalent of a "balooctl check" when you include other folders. Beyond that, I'm afraid I don't know.
---------------------------------------------------------------------------------------------
I have tried various config settings: if the "index file contents" checkbox is not ticked in the GUI, content indexing does not occur.

Below is the result of probing the duration of the file that baloo did not index.

scott@scottlounge:~/Videos$ ffprobe -v error -sexagesimal -show_entries format=duration -of default=noprint_wrappers=1:nokey=1 ~/Videos/"File2 BD".m2ts
2:23:21.801467

The result, 2:23:21.801467, is correct. I tested the remaining 3 files in my test directory and they were also correct.
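To repeat the same check over a whole directory rather than one file at a time, here is a minimal Python sketch. It assumes `ffprobe` is installed and on the PATH and reuses exactly the options from the command above; the helper names (`parse_sexagesimal`, `probe_duration`, `missing_durations`) are hypothetical, chosen for illustration. It lists files for which ffprobe reports no usable container duration.

```python
import subprocess
from pathlib import Path


def parse_sexagesimal(text):
    """Convert ffprobe's -sexagesimal output (H:MM:SS.micro) to seconds.

    Returns None for empty output or "N/A".
    """
    text = text.strip()
    if not text or text == "N/A":
        return None
    hours, minutes, seconds = text.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def probe_duration(path):
    """Run the ffprobe command used above on one file.

    Raises FileNotFoundError if ffprobe itself is not installed.
    """
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-sexagesimal",
         "-show_entries", "format=duration",
         "-of", "default=noprint_wrappers=1:nokey=1", str(path)],
        capture_output=True, text=True,
    )
    if result.returncode != 0:
        return None
    return parse_sexagesimal(result.stdout)


def missing_durations(directory, suffixes=(".ts", ".m2ts", ".mkv")):
    """List video files under `directory` for which ffprobe gives no duration."""
    return [p for p in sorted(Path(directory).rglob("*"))
            if p.is_file() and p.suffix.lower() in suffixes
            and probe_duration(p) is None]
```

For example, `missing_durations(Path.home() / "Videos")` would return the files in ~/Videos whose duration ffprobe cannot read. If that list stays empty while baloo still shows no duration, the problem is more likely on the indexing side than in the files.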

By way of helpful information, this warning/note appears on the ffmpeg link in your last post:
Note: Not all formats, such as Matroska and WebM, store duration at the stream level resulting in duration=N/A. Refer to Format (container) duration instead. 

This appears to be no longer correct, as tests I ran on some mkv files returned a correct duration.
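The note distinguishes the container (format) duration from per-stream durations, and the two can be checked side by side with `ffprobe -v error -show_format -show_streams -of json FILE`. The Python sketch below is one assumed way to pull both out for comparison; the function names `ffprobe_json` and `durations` are hypothetical, and a missing or "N/A" value comes back as None, matching the `duration=N/A` case the note describes.

```python
import json
import subprocess


def ffprobe_json(path):
    """Return ffprobe's format and stream information for `path` as a dict."""
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-show_format", "-show_streams",
         "-of", "json", str(path)],
        capture_output=True, text=True, check=True,
    ).stdout
    return json.loads(out)


def _seconds(value):
    # ffprobe writes durations as decimal strings; absent or "N/A" means unknown.
    if value in (None, "N/A"):
        return None
    return float(value)


def durations(info):
    """Split parsed ffprobe JSON into the format duration and per-stream durations.

    Returns (format_seconds, {stream_index: (codec_type, stream_seconds)}).
    """
    fmt = _seconds(info.get("format", {}).get("duration"))
    streams = {
        s.get("index"): (s.get("codec_type"), _seconds(s.get("duration")))
        for s in info.get("streams", [])
    }
    return fmt, streams
```

Running `durations(ffprobe_json(path))` on an mkv and on an m2ts file would show directly which level carries the duration in each container.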

-------------------------------------------------------------------------------------
Were you getting these messages when you were running "balooctl purge"? In this case you wouldn't get the command prompt repeated after the message.

It's over a year since I raised the issue; I cannot recall that level of detail.

How are you quoting in your replies?

I refer back to my comment #19, where I described the makeup of my data files by file type. What I did not say was that every file in my database was created by me on this computer, using the same software every time in the case of .ts and .m2ts files, and newer versions of the same software in the case of .mkv files. Every file is the result of me muxing the RAW data streams (video, audio and subtitles) into one of the 3 containers. Every audio and subtitle stream is broken down to its component parts and then rebuilt to the format I want. In the case of audio, every audio track is reduced to its component wav file parts using the same decoding codec and then encoded as DTS, either MA or normal. So, from a group perspective, the only difference between files is the video codec; in every other respect the files (in each group) are identical. The point is that, as advised in an earlier post, the video codec is not the deciding factor, since the same codec is sometimes indexed and sometimes not (though codec version may possibly be an issue). Which leaves the question: what decides whether baloo indexes the metadata or not?

I have also looked at audio bit width and video resolution and discounted them as reasons for baloo not indexing the files since, just as with video codecs, some are indexed and some are not. If I look at UHD files, where the standards are much tighter (e.g. all the video files I have use the h.265 video codec), the situation is still the same: some indexed, some not. I could ffprobe every one of my 5000+ files and get a positive result for all of them, because I made all of them in exactly the same way.

There seems to be no problem with the files: they play on literally dozens of players on both Linux and Windows, as I have spent considerable time testing various players for suitability for my own use.
Comment 37 tagwerk19 2021-07-17 12:42:50 UTC
(In reply to Scott from comment #36)
> ... The result of, 2:23:21.801467 is correct. I tested the remaining 3
> files in my test directory and they were also correct ...
That's reassuring...

> ... Every file is the result of me muxing the RAW data streams (video,
> audio and subtitles) into one of the 3 containers. Every audio and subtitle
> stream is broken down to its component parts and then rebuilt to the format
> I want ...
and ...

> ... If I look at UHD files where the standards are
> much tighter, eg all video files I have use the h.265 video codec the
> situation is still the same, some indexed, some not ...
It sounds as if you have complete control of how you build the files. 

Thinking of the group who decides to trace the problem in the code. The aim would be to create something "minimal" that works with ffprobe and fails when being indexed (which I assume fails in ffmpegextractor...). They would need "small examples" of what's failing (small enough that can be uploaded as attachments).

> Were you getting these messages when you were running "balooctl purge"? In
> this case you wouldn't get the command prompt repeated after the message.
> 
> It's over a year since I raised the issue, I cannot recall that level of
> detail.
Accepted :-) For me, sometimes last week is hard...

> How are you quoting in your replies?
There's a [reply] link in the top right of each comment. Click on it and the comment magically appears in the Additional Comments box as quoted text. It's just copied there; you can cut/paste/re-sort the text to weave in your replies as you see fit. I do check with the "Preview" tab to see whether what I've typed formats OK.
Comment 38 Scott 2021-07-17 22:08:31 UTC
Quotes: the reply button only appears if you are logged in. I thought I was logged in because I could post.


> Thinking of the group who decides to trace the problem in the code. The aim
> would be to create something "minimal" that works with ffprobe and fails
> when being indexed (which I assume fails in ffmpegextractor...). They would
> need "small examples" of what's failing (small enough that can be uploaded
> as attachments).

I can understand the requirement for small, easy-to-manipulate data sets that can be uploaded as an attachment, and I can produce small files, but the site file size upload limit makes this unrealistic in this case. All of my UHD movies, which you indicated an interest in, contain much more than 4000 KB of data per second of play time. Even low-quality HD files will blow past the limit within a few seconds.

I can reduce the file size by cutting out a piece of a file, which would maintain its characteristics. Reducing a file's size by lowering the quality requires transcoding, which would fundamentally alter the file's characteristics or properties.
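Cutting a piece out without transcoding is what ffmpeg's stream copy (`-c copy`) does. The Python sketch below only builds and runs such a command; it assumes ffmpeg is installed, and the helper names `stream_copy_clip_cmd` and `make_clip` are hypothetical. `-map 0` keeps every stream (all audio and subtitle tracks, not just one per type). Because nothing is re-encoded, ffmpeg can typically only start the clip at a keyframe, so the clip may begin slightly before the requested time.

```python
import subprocess


def stream_copy_clip_cmd(source, target, start_seconds=0, length_seconds=10):
    """Build an ffmpeg command that cuts a clip without re-encoding.

    -map 0 keeps every input stream and -c copy copies the encoded
    data unchanged instead of transcoding it.
    """
    return [
        "ffmpeg", "-hide_banner",
        "-ss", str(start_seconds),    # seek in the input before decoding
        "-i", str(source),
        "-t", str(length_seconds),    # clip length in seconds
        "-map", "0",
        "-c", "copy",
        str(target),
    ]


def make_clip(source, target, start_seconds=0, length_seconds=10):
    """Run the command; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(
        stream_copy_clip_cmd(source, target, start_seconds, length_seconds),
        check=True,
    )
```

For example, `make_clip("File2 BD.m2ts", "File2 BD clip.m2ts", 60, 10)` would copy ten seconds starting about one minute in. If ffmpeg refuses to copy one particular stream into the output container, that stream may need to be left out with a more specific `-map`.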

Regarding your purge question, I looked at the first attachment, output of running baloo purge, and that information appears there.
Comment 39 tagwerk19 2021-07-19 06:04:56 UTC
(In reply to Scott from comment #38)
> ... I can produce small files but the
> site file size upload limit makes this unrealistic in this case ...
I was imagining a test file would have to be very short and it's probably a matter of much luck (and gut-feeling) to find test cases where the baloo  / ffmpegextractor failed.

> ... I can reduce the file size by cutting out a piece of a file which would
> maintain its characteristics, to reduce a file's size by reducing the
> quality requires transcoding which will fundamentally alter the file's
> characteristics or properties ...
There's a "container format" and the encoding? (I only have a very basic knowledge here :-)

Where is the metadata (like duration) held? Is this clear cut? Or is it "It's copied to the container, but if it's not copied, you have to look down into the encoded source"?

> Quotes - The reply button only appears if you are logged in, thought I was
> logged in because I could post ...
Puzzled. You'd have to be logged in to post (otherwise how would it know who is posting?). Maybe something different happens if you "remain logged in" because you have kept the appropriate cookie.
Comment 40 Scott 2021-07-19 23:06:20 UTC
(In reply to tagwerk19 from comment #39)

> I was imagining a test file would have to be very short and it's probably a
> matter of much luck (and gut-feeling) to find test cases where the baloo  /
> ffmpegextractor failed.

There is no problem identifying where ffmpegextractor failed: it is all the files without a duration, and there are thousands of them.

> There's a "container format" and the encoding? (I only have a very basic
> knowledge here :-)
> 
> Where is the metadata (like duration) held? Is this clear cut? Or is it
> "It's copied to the container, but if it's not copied, you have to look down
> into the encoded source"?

I am not sufficiently knowledgeable to provide the detail you appear to be after, but my limited understanding is this: in a standard video, from the perspective of the video player, you have 3 streams: video, audio and subtitle. In an advanced container such as the ones I have discussed in this thread (.ts, .m2ts and .mkv files, i.e. containers/wrappers) there may be many more streams than the 3 which the video player uses to produce the watchable video.

Each stream is encoded and read using a specific codec and is entirely stand-alone, bearing no dependency relationship to the container whatsoever. The audio and video streams have duration metadata; for subtitles I don't know, as they are either a collection of still images or a text file. M2ts and ts containers also have duration metadata, while mkv containers do not (see https://trac.ffmpeg.org/wiki/FFprobeTips).

Unless provision can be made to upload larger files, it will not be possible to send a sample to be investigated. I would think that 20MB per sample file would be sufficient.
Comment 41 tagwerk19 2021-07-20 20:55:40 UTC
(In reply to Scott from comment #40)
> ... I am not sufficiently knowledgeable to provide the detail you appear to be
> after ...
> ... I would think that 20MB per sample file would be sufficient ...

No problem, I'm certainly groping in the dark and learning as I go...

I think that finding comparative test cases, where one can be indexed and the other not, is the bigger job. If you've got that, then the sharing of the samples "ought" to be easier (dropbox perhaps...)
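One way to look for the difference between a file that gets a duration and one that doesn't is to diff everything ffprobe reports for the two. A Python sketch, assuming each file has first been dumped with `ffprobe -v error -show_format -show_streams -of json FILE > FILE.json`; `flatten`, `diff_metadata` and `diff_files` are hypothetical helper names. Identical fields are dropped so only the differences remain (fields such as `format.filename` and `format.size` will of course always differ).

```python
import json


def flatten(value, prefix=""):
    """Flatten nested dicts/lists into {"streams.0.codec_name": ...} pairs."""
    items = {}
    if isinstance(value, dict):
        for key, child in value.items():
            items.update(flatten(child, f"{prefix}{key}."))
    elif isinstance(value, list):
        for i, child in enumerate(value):
            items.update(flatten(child, f"{prefix}{i}."))
    else:
        items[prefix.rstrip(".")] = value
    return items


def diff_metadata(indexed, not_indexed):
    """Return {field: (indexed_value, not_indexed_value)} for fields that differ."""
    a, b = flatten(indexed), flatten(not_indexed)
    return {k: (a.get(k), b.get(k)) for k in sorted(a.keys() | b.keys())
            if a.get(k) != b.get(k)}


def diff_files(indexed_json, not_indexed_json):
    """Load two saved ffprobe JSON dumps and diff them."""
    with open(indexed_json) as fa, open(not_indexed_json) as fb:
        return diff_metadata(json.load(fa), json.load(fb))
```

A pair of files that differ in only one or two fields would make a good minimal test case for a bug report.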
Comment 42 Scott 2021-07-22 00:10:51 UTC
I have created a few files of 10 seconds each and they appear to maintain their properties, meaning they are either indexed or not indexed in the same way as the parent file. How many do you need? I presume at least one each of mkv, ts and m2ts. File size can approach 100MB.
Comment 43 tagwerk19 2021-07-22 22:25:13 UTC
(In reply to Scott from comment #42)
> I have created a few files of 10 seconds each and they appear to maintain
> their properties, meaning they are either indexed or not indexed in the same
> way as the parent file. How many do you need and I presume at least one mkv,
> ts and m2ts? File size can approach 100MB.
I would say it would be best to open a new bug under frameworks-kfilemetadata, saying you've got test files, some that show "Duration" and some that don't. Say they are too big to upload and see what's proposed.

... Well done replicating the issue. If a 10-second video is still 100 Mbyte, it does seem unlikely that you can get it down to 4 Mbyte :-/
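Rough arithmetic with the figures from this thread (about 100 MB for a 10-second clip, and an upload limit of roughly 4 MB, treated here as 4,000,000 bytes) shows why: at that data rate the limit allows well under a second of video. A trivial Python sketch; `max_clip_seconds` is a hypothetical name.

```python
def max_clip_seconds(limit_bytes, sample_bytes, sample_seconds):
    """Longest clip that fits under an upload limit at the sample's data rate."""
    bytes_per_second = sample_bytes / sample_seconds
    return limit_bytes / bytes_per_second


# Figures from this thread: ~100 MB for 10 seconds, ~4 MB upload limit.
print(max_clip_seconds(4_000_000, 100_000_000, 10))  # prints 0.4
```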
Comment 44 tagwerk19 2021-08-05 07:31:42 UTC
(In reply to Scott from comment #40)
> ... In a standard video, from the
> perspective of the video player, you have 3 streams, video, audio and
> subtitle. In an advanced container as I have discussed in this thread such
> as .ts, .m2ts and .mkv files (containers/wrappers) ...
Looks as if we've got a couple of mimetype issues...

    .ts video files are getting misidentified as text/vnd.qt.linguist

    The video files that _embed_ the DTS HD audio are being identified
    _as_ audio/vnd.dts.hd and not the video container format (there's
    some very broad magic used when looking for DTS HD files)
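For context on the first point: MPEG-2 transport streams are built from fixed-size packets that each start with the sync byte 0x47 (188-byte packets in .ts files; Blu-ray .m2ts files prefix each packet with a 4-byte timestamp, giving 192-byte packets), whereas Qt Linguist .ts files are XML text. The Python sketch below, with the hypothetical name `looks_like_mpegts`, checks for that repeating sync byte. It only illustrates the kind of content check that can tell the two apart; it is not how shared-mime-info or Qt actually decide.

```python
def looks_like_mpegts(path, packets=5):
    """Heuristic: True if the file starts with a run of MPEG-TS packets.

    Checks plain 188-byte packets (.ts) and 192-byte packets with a
    4-byte timestamp prefix (.m2ts), requiring `packets` consecutive
    sync bytes (0x47) at the expected positions.
    """
    layouts = ((188, 0), (192, 4))  # (packet size, offset of the sync byte)
    with open(path, "rb") as f:
        head = f.read(192 * packets)
    for size, offset in layouts:
        positions = [offset + i * size for i in range(packets)]
        if positions[-1] < len(head) and all(head[p] == 0x47 for p in positions):
            return True
    return False
```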

The original issue seems to be that restarting baloo from the command line (with a balooctl purge or balooctl enable) sends a voluminous stream of warning / debugging messages from the background baloo_file_extractor to the terminal, pushing the command prompt up off the top of the screen.

Will close this "not completing" issue and open new bugs for the mimetype issues. Hope this is OK.
Comment 45 tagwerk19 2021-08-05 15:17:13 UTC
(In reply to tagwerk19 from comment #44)
> ... new bugs for the mimetype issues ...
Cf:
    Bug 440631 : MPEG-2 TS .ts video file flagged as text/vnd.qt.linguist
    Bug 440632 : MPEG-2 TS and Matroska video files flagged as audio/vnd.dts.hd
Comment 46 Scott 2021-08-05 22:40:04 UTC
Created attachment 140554 [details]
attachment-8879-0.html

I refer to the following from my last e-mail:

Seems like a little bit of progress and then comes a status update:
scott@scottlounge:/$ balooctl status
Baloo File Indexer is running
Indexer state: Idle
Total files indexed: 21,848
Files waiting for content indexing: 0
Files failed to index: 0
Current size of index is 28.06 MiB

I think I did not explain sufficiently why this status report is a problem and a bug. As we know, I have about six thousand files that need to be indexed, which are all located in one top-level directory called Entertainment. I have provided a copy of the config file showing that this is the only directory that baloo should be indexing, and it appears that baloo is doing that correctly. The problems are:
1/ baloo terminates during indexing for unknown reasons (not hanging/freezing as I erroneously stated previously) without providing a reason code.
2/ On restarting the indexing, baloo re-indexes the same files with an erroneous message that the files have changed (see my last email) or were added while baloo was turned off. Baloo is not checking that these index entries already exist, or there is some problem with the index file itself, and so it just duplicates them, which is why baloo reports over 21,000 files indexed from a dataset only containing 6,000 entries.
3/ Further, it reports files waiting to be indexed and files failed to index both as zero, when in fact approximately 1,000 of the 6,000 files in the dataset have not been indexed. I have restarted baloo repeatedly and they never get indexed; it re-indexes what it had before.

I think the problem is more than just misidentifying mime types.

I had to disable baloo because it somehow seriously interferes with my ability to move files from the admin PC to the server. With baloo running on the server, any attempt to transfer files to it results in very slow transfer speeds and on occasion failure to complete the move, and this is occurring while the indexer is reporting idle.
Comment 47 tagwerk19 2021-08-06 08:11:10 UTC
(In reply to Scott from comment #46)

No problem, we carry on troubleshooting.

> I think the problem is more than just misidentifying mime types.
Finding out about the mimetypes, and that baloo would never attempt to index some files, was one step along the way. Good to find out, but there's more to do.

> 3/ Further it reports files waiting to be indexed and files failed to index
> both being zero when in fact approximately 1,000 of the 6,000 files in the
> dataset have not been indexed. I have restarted baloo repeatedly and they
> never get indexed, it re-indexes what it had before.
It's possible that we've got another mimetype issue with these files, or that they are your 1000 biggest files, or something else. I think you should copy one of them to your home directory and check with

    xdg-mime query filetype ...newstrangefile...

Check that the mimetype is sensible, then see what

    balooshow -x ...newstrangefile...

says.
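With around a thousand unindexed files, checking one at a time gets tedious. A Python sketch that runs the same `xdg-mime query filetype` over many files and groups them by the MIME type returned, assuming `xdg-mime` is installed; the function names are hypothetical, and the query function can be swapped out so the grouping can be tried without xdg-mime.

```python
import subprocess
from collections import defaultdict


def query_mimetype(path):
    """Ask xdg-mime which MIME type it assigns to `path`."""
    result = subprocess.run(["xdg-mime", "query", "filetype", str(path)],
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()


def group_by_mimetype(paths, query=query_mimetype):
    """Map each MIME type to the list of files that received it."""
    groups = defaultdict(list)
    for path in paths:
        groups[query(path)].append(path)
    return dict(groups)


def suspicious(groups):
    """Groups whose MIME type is not video/*, e.g. text/vnd.qt.linguist for a .ts."""
    return {mime: files for mime, files in groups.items()
            if not mime.startswith("video/")}
```

For example, `suspicious(group_by_mimetype(pathlib.Path.home().rglob("*.ts")))` would list any .ts files under the home directory that are not being identified as video.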

> 1/ baloo terminates during indexing for unknown reasons (not
> hanging/freezing as I erroneously stated previously) without providing a
> reason code.
I'll ask a bit more about this. Your "balooctl status" output says 

> Baloo File Indexer is running
> Indexer state: Idle
That's what baloo says when it's alive and thinks it has nothing more to do. The content indexer process, "baloo_file_extractor", is started when indexing is necessary; it does its job, saves the results, stops, and is started again when there is more to do. This would/should happen in the background and you wouldn't see exit codes.

> 2/ On restarting the indexing baloo re-indexes the same files with an
> erroneous message that the files have changed (see my last email) or added
> with baloo being turned off. Baloo is not checking that these index entries
> already exist or there is some problem with the index file itself and so
> just duplicates them which is why baloo reports over 21,000 files indexed
> from a dataset only containing 6,000 entries.
The error message is a:

> ... id seems to have changed. Perhaps baloo was not running, and this file was deleted + re-created
Need to check the Id and see if it is really changing. Ask with "stat", you'll get something like:

    $ stat 1.ts
      File: 1.ts
      Size: 41416704        Blocks: 80896      IO Block: 4096   regular file
    Device: fc01h/64513d    Inode: 794964      Links: 1
    Access: (0664/-rw-rw-r--)  Uid: ( 1000/    test)   Gid: ( 1000/    test)
    Access: 2021-07-24 22:50:57.838161084 +0200
    Modify: 2021-07-24 22:50:57.838161084 +0200
    Change: 2021-07-24 22:51:42.686181710 +0200
    Birth: -

It's the "Device" and "Inode" numbers that you need to keep your eye on:

    Device: fc01h/64513d    Inode: 794964

If you reboot and these change, baloo will think it's got a new file and try to index it again. Keep a note of the numbers, check again after a reboot and compare.
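Keeping that note by hand works for a file or two; for more, a small script can do it. A Python sketch, assuming Python 3 is available on the server: `os.stat` exposes the same numbers `stat` prints as Device (the decimal part, e.g. 64513d) and Inode. The helper names `snapshot`, `save_snapshot` and `changed_since` are hypothetical.

```python
import json
import os


def snapshot(paths):
    """Record [device, inode] for each path, as `stat` reports them."""
    result = {}
    for path in paths:
        st = os.stat(path)
        result[str(path)] = [st.st_dev, st.st_ino]
    return result


def save_snapshot(paths, filename):
    """Write the current device/inode numbers to a JSON file."""
    with open(filename, "w") as f:
        json.dump(snapshot(paths), f, indent=2)


def changed_since(filename, paths):
    """Paths whose device or inode differs from the saved snapshot."""
    with open(filename) as f:
        before = json.load(f)
    now = snapshot(paths)
    return {p: (before.get(p), ids) for p, ids in now.items()
            if before.get(p) != ids}
```

Save a snapshot before rebooting, call `changed_since` afterwards with the same paths, and an empty result means the IDs baloo relies on stayed put.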

You could also try a baloosearch for one of the files that always seems to be reindexed

    $ baloosearch -i ...oneofyoursavedfiles...

If you are OK, baloosearch will give a single result. If the id has been changing, "baloosearch -i" will show several lines, with different ID numbers and the same file/pathname. Something like:

    $ baloosearch -i testfile
    9ca00000028 /home/test/testfile
    9ca0000002a /home/test/testfile
    9ca0000002c /home/test/testfile

That would be a red flag...
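To check many files for that red flag at once, the `baloosearch -i` output can be scanned for paths that appear under more than one ID. A Python sketch, assuming the output is one `<hex id> <path>` pair per line as in the example above (and printed to stdout); `duplicate_paths` is a hypothetical name, and lines whose first field is not a hex ID, such as an "Elapsed:" footer, are skipped.

```python
from collections import defaultdict


def duplicate_paths(baloosearch_output):
    """Return {path: [ids]} for paths listed under more than one baloo ID."""
    ids_by_path = defaultdict(list)
    for line in baloosearch_output.splitlines():
        parts = line.strip().split(" ", 1)  # path may itself contain spaces
        if len(parts) != 2:
            continue
        doc_id, path = parts
        try:
            int(doc_id, 16)  # skip lines whose first field is not a hex ID
        except ValueError:
            continue
        ids_by_path[path].append(doc_id)
    return {p: ids for p, ids in ids_by_path.items() if len(ids) > 1}
```

The text could come from `subprocess.run(["baloosearch", "-i", name], capture_output=True, text=True).stdout`; any non-empty result is worth a closer look.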

> I had to disable baloo because it somehow seriously interferes with my
> ability to move files from the admin PC to the server. With baloo running
> on the server any attempt to transfer files to it results in very slow
> transfer speeds and on occasion failure to complete the move and this is
> occurring while the indexer is reporting idle.
I can only guess where the problem lies, but you are indexing *really* large files, and there were a couple of fixes two months ago to stop a MIME lookup from reading the whole file into memory. Bug 398908, fixed according to
   https://bugs.kde.org/show_bug.cgi?id=398908#c97
with 5.83. If you don't have this version, maybe the best thing to do is to wait until it reaches you with an update.
Comment 48 Scott 2021-08-08 01:18:06 UTC
Created attachment 140578 [details]
attachment-21123-0.html

To try and understand better what is going on, I decided to delete the index file and start from scratch. I have also piped the terminal output to a file to help me do this. I will list the issues as I come across them.

1/ Baloo indexer terminated again, but I was not able to regain the prompt by pressing enter.
2/ I think I may have confused the issue of indexing with displaying duration by using them interchangeably. I have now discovered that some files are indexed but do not display a duration. The following file appears in the indexed list but there is no duration displayed:
    Indexing: /mnt/pool/Entertainment/Documentaries/Movies/Battle of Jutland the Navys Bloodiest Day 2016.ts: Ok
    xdg-mime query filetype "Battle of Jutland the Navys Bloodiest Day 2016".ts = video/mp2t
    balooshow -x "Battle of Jutland the Navys Bloodiest Day 2016".ts =
    f8c500000031 49 63685 Battle of Jutland the Navys Bloodiest Day 2016.ts: No index information found
3/ I then did a balooctl disable followed by a balooctl enable and compared what was indexed this time:
    First index: 5491 files indexed
    Second index: 5345 files indexed
    The two lists of indexed files seem totally unrelated, with both significant overlap and differences. File size seems to be unrelated: some large files are indexed and many small files are not.
4/ The Jutland film was indexed on both occasions.
    stat "Battle of Jutland the Navys Bloodiest Day 2016".ts
    File: Battle of Jutland the Navys Bloodiest Day 2016.ts
    Size: 1299907200      Blocks: 2538896    IO Block: 4096   regular file
    Device: 31h/49d Inode: 70217       Links: 1
    Access: (0775/-rwxrwxr-x)  Uid: ( 1001/  shagoo)   Gid: ( 1002/shagadmin)
    Access: 2021-08-07 08:50:24.837378054 +0800
    Modify: 2016-06-03 11:55:55.180896791 +0800
    Change: 2021-08-02 20:01:39.015217906 +0800
5/ This seems to be your "red flag"
    baloosearch -i "Battle of Jutland the Navys Bloodiest Day 2016".ts
    f3f600000031 /mnt/pool/Entertainment/Documentaries/Movies/Battle of Jutland the Navys Bloodiest Day 2016.ts
    f8c500000031 /mnt/pool/Entertainment/Documentaries/Movies/Battle of Jutland the Navys Bloodiest Day 2016.ts
    Elapsed: 0.997359 msecs
6/ No duration was displayed for this file on the first indexing or the second.

These results were obtained consecutively, without any other programs being run and without rebooting the computer.
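Comparing the two runs in item 3 by eye is hard with over 5000 entries each. A Python sketch that turns two saved monitor logs into sets and reports the overlap, assuming each successful entry looks like the "Indexing: <path>: Ok" line in item 2 (the exact wording may differ between baloo versions); `indexed_paths` and `compare_runs` are hypothetical names.

```python
def indexed_paths(monitor_log):
    """Paths reported as "Indexing: <path>: Ok" in a saved balooctl monitor log."""
    prefix, suffix = "Indexing: ", ": Ok"
    paths = set()
    for line in monitor_log.splitlines():
        line = line.strip()
        if line.startswith(prefix) and line.endswith(suffix):
            paths.add(line[len(prefix):-len(suffix)])
    return paths


def compare_runs(first_log, second_log):
    """Split the paths from two runs into both / only-first / only-second."""
    first, second = indexed_paths(first_log), indexed_paths(second_log)
    return {
        "both": first & second,
        "only_first": first - second,
        "only_second": second - first,
    }
```

Sorting the "only_first" and "only_second" sets by file size or container type could show whether the files that drop out have anything in common.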

Comment 49 tagwerk19 2021-08-08 08:16:19 UTC
(In reply to Scott from comment #48)
> ... I have now discovered that some
> files are indexed but do not display a duration. The following file appears
> in the indexed list but there is no duration displayed...
It's "video/mp2t", that's good news.

Does it give a "Duration" if you look with "MediaInfo"?

If it does...

    Try copying to your home directory, see if you can index it there and
    check the results of "balooshow -x"

If it doesn't...

    It'd be a question of thinking back to ask "what is different" or "what
    did I do differently" with this one. Not going to be easy...

> 5/ This seems to be your "red flag"
>     baloosearch -i "Battle of Jutland the Navys Bloodiest Day 2016".ts
>     f3f600000031 /mnt/pool/Entertainment/Documentaries/Movies/Battle of Jutland the Navys Bloodiest Day 2016.ts
>     f8c500000031 /mnt/pool/Entertainment/Documentaries/Movies/Battle of Jutland the Navys Bloodiest Day 2016.ts
>     Elapsed: 0.997359 msecs
This looks "complicated"... Maybe bad news for your indexing.

In each of your examples the "device Id" was 31(hex)/49(decimal). My worry was that it was changing when you rebooted.

What is strange is that the (supposed) "Inode" for the file _is_ changing. It may be that I've got this wrong, but it looks like it appears as:

    63685(decimal)
    70217(decimal)

plus baloo has seen:

    f3f6(hex) - which is 62454(decimal)
    f8c5(hex) - which is 63685(decimal)

If these IDs are jumping around then baloo doesn't have a chance.
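The hex-to-decimal reading above can be made mechanical. A Python sketch, assuming the 64-bit baloo ID packs the inode into the high 32 bits and the device id into the low 32 bits, which is what the balooshow line in comment 48 (f8c500000031 49 63685) suggests; `split_baloo_id` is a hypothetical name, not part of baloo.

```python
def split_baloo_id(hex_id):
    """Split a baloo document ID into (inode, device id).

    Assumes inode in the high 32 bits and device id in the low 32 bits,
    as the balooshow output in this thread suggests.
    """
    value = int(hex_id, 16)
    inode = value >> 32
    device = value & 0xFFFFFFFF
    return inode, device


for doc_id in ("f3f600000031", "f8c500000031"):
    inode, device = split_baloo_id(doc_id)
    print(doc_id, "inode", inode, "device", device)
# prints:
# f3f600000031 inode 62454 device 49
# f8c500000031 inode 63685 device 49
```

Together with the stat output (Inode: 70217), that is at least three different inode numbers for the same path on the same device.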

You are looking at

    /mnt/pool/...

and have a set of discs mounted behind it? Your original .config/baloofilerc mentioned /media/data/disk01, disk02, disk03 and so on... Maybe the "pool" doesn't keep inodes stable :-/

Try finding your "Battle of Jutland the Navys Bloodiest Day 2016".ts on the mounted discs (rather than on the "pool") and do a stat of it there... It'd be something like:

    cd /media/data/disk1/entertainment/Movies
    stat "Battle of Jutland the Navys Bloodiest Day 2016".ts

The hope is that the device number/inode remains stable (and indeed stays the same when you reboot)
Comment 50 Scott 2021-08-09 01:50:36 UTC
Created attachment 140593 [details]
stderr.txt

The first issue, the pool drive: I use mergerfs (https://github.com/trapexit/mergerfs), version 2.31. Following your suspicions, I deleted the index file, /home/scott/.local/share/baloo/index, and changed the config to follow absolute addresses (attached), removing any reference to the pool drive. This cured the red-flag issue from my last post: baloo now does its indexing and is only a couple of files shy of the number of files I expect to be indexed. More importantly, it no longer duplicates entries; if I disable and then enable balooctl, no further indexing occurs, and the same is true after rebooting the machine. I performed this 3 times.

The new 2nd issue: running balooctl enable for the first time produces a piped balooctl monitor file with 5488 files indexed, but running balooctl status returns 5976 files indexed.

So after doing all this testing I find that not one file displays a duration anymore. If I copy any file to my /home/Videos directory, the duration is displayed. I have only entered commands which start with baloo...?

I have also attached the piped stderr file, which may be of some value to you.
Comment 51 Scott 2021-08-09 01:50:36 UTC
Created attachment 140594 [details]
baloofilerc
Comment 52 tagwerk19 2021-08-12 08:29:44 UTC
It's been a long journey with this bug but I think we've finally got there...

From your information (and in case someone else is seeing similar issues), there's a mergerfs configuration option "use_ino":

    https://github.com/trapexit/mergerfs#inodecalc

That means that mergerfs provides calculated (but stable) inode numbers to things, like baloo, that need them. 

I can also say that the tangled mimetypes you found with video/mp2t and audio/vnd.dts.hd (Bug 440632) have been pinned down and, I think, sorted. It's likely to take a while, though, before the fix gets to people's desktops...