Bug 443693 - Clicking pause indexer should pause the Baloo indexer, but it doesn't
Summary: Clicking pause indexer should pause the Baloo indexer, but it doesn't
Status: REOPENED
Alias: None
Product: systemsettings
Classification: Unclassified
Component: kcm_baloo
Version: 5.22.5
Platform: Archlinux Packages Linux
Importance: NOR normal
Target Milestone: ---
Assignee: baloo-bugs-null
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-10-14 06:24 UTC by Adam Fontenot
Modified: 2022-06-21 23:19 UTC
CC List: 2 users

See Also:
Latest Commit:
Version Fixed In: 5.24


Description Adam Fontenot 2021-10-14 06:24:15 UTC
SUMMARY

When clicking "Pause Indexer" to pause file indexing in the File Search KCM, most users will expect indexing to be paused. But if the indexer is in the middle of extracting text from a file, clicking the button seemingly does nothing at all.

STEPS TO REPRODUCE
1. Get into a situation where baloo_file_extractor is using way too much CPU or memory and needs to be paused because it's slowing the rest of the system down.
2. Click "Pause Indexer".

OBSERVED RESULT

Nothing at all happens. In my case, I verified with lsof that baloo_file_extractor was working on one specific file continuously. It was using 100% of one CPU core and about 1 GB of memory. After clicking the button, it continued using 100% of one CPU core and 1 GB of memory for several minutes. I gave up and SIGKILLed it.

EXPECTED RESULT

Baloo *instantly* terminates all running baloo_file_extractor instances until the appropriate time to unpause the indexer. (When is that, exactly? The KCM interface provides no hints.)

If for some reason that is not possible (e.g. the state of the extractor must be kept in memory or be lost completely, and the latter is considered unacceptable), at the very least the process should be sent a signal to sleep until woken up so that it won't use any CPU.
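
The fallback described above (suspend the process rather than terminate it) maps onto the standard POSIX signals SIGSTOP and SIGCONT. Below is a minimal sketch in Python, assuming a POSIX system. The busy-looping child is a hypothetical stand-in for baloo_file_extractor, the function names are made up for illustration, and nothing here reflects how Baloo itself is implemented.

```python
import os
import signal
import subprocess
import sys


def pause_process(pid: int) -> None:
    """Suspend a process: it keeps its memory but is no longer scheduled."""
    os.kill(pid, signal.SIGSTOP)


def resume_process(pid: int) -> None:
    """Let a previously suspended process continue where it left off."""
    os.kill(pid, signal.SIGCONT)


def demo_pause_resume() -> bool:
    """Pause and resume a CPU-bound child; return True if it really stopped."""
    # Hypothetical stand-in for an extractor stuck on one file.
    child = subprocess.Popen([sys.executable, "-c", "while True: pass"])
    try:
        pause_process(child.pid)
        # WUNTRACED makes waitpid report a stopped (not exited) child.
        _, status = os.waitpid(child.pid, os.WUNTRACED)
        stopped = os.WIFSTOPPED(status)
        resume_process(child.pid)
        return stopped
    finally:
        child.kill()
        child.wait()
```

A stopped process uses no CPU but keeps its resident memory, so this would address the CPU complaint here and not the 1 GB of memory.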

Reasoning: ordinary users should not have to terminate malfunctioning system processes. Those who do are likely to be frustrated when, some time later, Baloo starts indexing again and they run into the same problem. Rather, managing Baloo should be possible in the GUI without disabling it completely. This means that it needs to respond instantly to the user's commands, even if this behavior seems suboptimal from the point of view of the program.

SOFTWARE/OS VERSIONS
Linux: Arch Linux x86_64 (kernel 5.14.9)
KDE Plasma Version: 5.22.5
KDE Frameworks Version: 5.86.0
Qt Version: 5.15.2
Comment 1 Nate Graham 2021-10-14 23:18:27 UTC
Yeah this does sometimes happen.
Comment 2 Nate Graham 2022-01-04 04:10:29 UTC
Fixed by Yerrey Dev with https://invent.kde.org/plasma/plasma-desktop/-/commit/7206fd574e636fe1d5056f398182e6e3439045d1 in Plasma 5.24!
Comment 3 Adam Fontenot 2022-06-19 19:15:24 UTC
This isn't fixed for me. Perhaps it's fixed in some situations but not others? Let me know if you want me to open a new bug.

Specific situation in which I saw this bug: I was compiling some software. All eight (virtual) cores on my system were in use. I had htop open and noticed that despite all cores being at 100%, baloo_file was using about 60% CPU (of one core) consistently, and more than 10% of my RAM. This despite being nice 19! (Perhaps something wrong with the scheduler, but I haven't looked into it further.)

I opened the system settings, and Baloo's status said "Idle, 100% complete". Clicking "Pause Indexer" did nothing. baloo_file continued to run with a pretty aggressive amount of CPU use given that I was trying to compile some software.

Speculation:

 * Baloo can be wrong about whether its indexer is running or not
 * When Baloo thinks its indexer is not running, and it is, clicking "Pause Indexer" will fail to stop the indexer.

SIGTERM sent to baloo_file stopped it successfully. 

SOFTWARE/OS VERSIONS (updated)
Linux: Arch Linux x86_64 (kernel 5.18.5)
KDE Plasma Version: 5.25.0
KDE Frameworks Version: 5.95.0
Qt Version: 5.15.4
Comment 4 Nate Graham 2022-06-21 16:11:21 UTC
When you click "Pause Indexer", can you run `balooctl status` in a terminal window and paste the output? I suspect what might be going on is that the indexer itself does in fact get paused, but its running indexing processes don't.
Comment 5 Adam Fontenot 2022-06-21 23:19:00 UTC
(In reply to Nate Graham from comment #4)
> When you click "Pause Indexer", can you run `balooctl status` in a terminal
> window and paste the output? I suspect what might be going on is that the
> indexer itself does in fact get paused, but its running indexing processes
> don't.

I was able to reproduce the exact conditions. Here's a timeline.

1. Start `balooctl monitor` and `htop`.
2. Run `balooctl status`. See Output 1 below.
3. Start compiling a large piece of software that will create many new files in a directory that is indexed by Baloo. For this test, I was compiling the Arch Linux package `libreoffice-fresh`.
4. Observe that `baloo_file` becomes active in htop. In the monitor, note that the status changes to "Indexing new files" or "Indexing modified files". High CPU use by baloo is observed despite near-100% CPU demand from the compiler.
5. Open System Settings. Baloo settings page is oddly slow to load, but when it loads it says "Idle". I have noticed that running `balooctl status` will *hang* for a long time if the indexer is running. Presumably the indexer has briefly caught up with the compiler?
6. Click "Pause Indexer". Baloo seems indecisive, but after 20 seconds or so "Suspended" appears in the monitor window.
7. Observe that `baloo_file` continues to run in `htop`, using a large amount of CPU.
8. Run `balooctl status`. See Output 2 below.
9. Start writing this bug report. As I type this, `baloo_file` has continued to run. It has grown to over 1.7 GB of resident memory use, and continues to eat a lot of CPU. For good measure, I am running `balooctl status` again now. It takes over 50 seconds to complete. The monitor window continues to show "Suspended" as its last status message. See Output 3 below.
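
The delays in steps 5 and 9 could be measured rather than estimated. The sketch below is one way to do that, assuming Python 3.7 or later. `time_command` is a hypothetical helper, not part of Baloo, and the 120-second timeout is an arbitrary choice.

```python
import subprocess
import time


def time_command(argv, timeout=120.0):
    """Run argv and return (elapsed_seconds, stdout), stdout=None on timeout."""
    start = time.monotonic()
    try:
        result = subprocess.run(
            argv, capture_output=True, text=True, timeout=timeout
        )
        output = result.stdout
    except subprocess.TimeoutExpired:
        output = None  # the command was still hanging when the timeout hit
    return time.monotonic() - start, output
```

Calling `time_command(["balooctl", "status"])` at each step of the timeline would give a wall-clock figure to paste next to each status output.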

Something that sticks out to me about this last state is that the number of indexed files is actually *dropping* over time (295,546 in Output 2, 283,689 in Output 3), and despite this, the index has ballooned (pun not intended) from 182.13 MiB to 2.37 GiB, an enormous size given the relatively moderate number of files indexed. I know for sure that this dramatic increase was the result of this testing, because I was recently forced to delete the Baloo data directory after seeing this issue.

Nothing I have been able to do (including deleting the build directory entirely and trying to force baloo to recheck it) has been able to reduce this increased disk space usage. Every time it happens I am forced to delete Baloo's index and start over.

I'm including the result of `balooctl indexSize` below, as Output 4. I'm in the process of writing a separate bug report to cover disk usage problems with Baloo, as I don't think one exists at present. (There are already reports for I/O utilization, CPU use, memory consumption, etc.) If there is anything I can provide to make that bug report better, please let me know. I'd like to avoid side-tracking this bug with the disk usage problem, as it isn't the central issue here.

Content indexing has been completely disabled throughout this entire process.

Output 1 (status before compiling begins):

Baloo File Indexer is running
Indexer state: Idle
Total files indexed: 270,102
Files waiting for content indexing: 0
Files failed to index: 0
Current size of index is 134.10 MiB

Output 2 (status while compiling continues, and after Baloo is suspended):

Baloo File Indexer is running
Indexer state: Suspended
Total files indexed: 295,546
Files waiting for content indexing: 0
Files failed to index: 0
Current size of index is 182.13 MiB

Output 3 (status after writing this comment, after previously stopping the build):

Baloo File Indexer is running
Indexer state: Idle
Total files indexed: 283,689
Files waiting for content indexing: 0
Files failed to index: 0
Current size of index is 2.37 GiB
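
To make the change between Output 2 and Output 3 explicit, the two status texts can be compared with a short script. This is only a sketch: `parse_status` is a hypothetical helper, and it assumes the output format and the comma thousands separators shown in this comment.

```python
import re

# Conversion factors to MiB for the units balooctl prints in this report.
UNITS_IN_MIB = {"B": 1 / 1024**2, "KiB": 1 / 1024, "MiB": 1.0, "GiB": 1024.0}


def parse_status(text):
    """Extract the file count and index size (in MiB) from status output."""
    files = re.search(r"Total files indexed:\s*([\d,]+)", text)
    size = re.search(r"Current size of index is\s*([\d.]+)\s*(\w+)", text)
    if files is None or size is None:
        raise ValueError("unrecognized status output")
    return {
        "files": int(files.group(1).replace(",", "")),
        "size_mib": float(size.group(1)) * UNITS_IN_MIB[size.group(2)],
    }


# Relevant lines copied from Output 2 and Output 3 of this comment.
OUTPUT_2 = """Indexer state: Suspended
Total files indexed: 295,546
Current size of index is 182.13 MiB"""

OUTPUT_3 = """Indexer state: Idle
Total files indexed: 283,689
Current size of index is 2.37 GiB"""

before, after = parse_status(OUTPUT_2), parse_status(OUTPUT_3)
files_delta = after["files"] - before["files"]    # negative: files were dropped
growth = after["size_mib"] / before["size_mib"]   # ratio of index sizes
print(f"files: {files_delta:+,}  index size: x{growth:.1f}")
```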

Output 4 (current indexSize):

File Size: 2.37 GiB
Used:      114.48 MiB

           PostingDB:      18.00 MiB    15.723 %
          PositionDB:      18.85 MiB    16.467 %
            DocTerms:      19.36 MiB    16.907 %
    DocFilenameTerms:      17.24 MiB    15.061 %
       DocXattrTerms:       4.00 KiB     0.003 %
              IdTree:       7.27 MiB     6.350 %
          IdFileName:      19.51 MiB    17.040 %
             DocTime:      12.43 MiB    10.861 %
             DocData:            0 B     0.000 %
   ContentIndexingDB:            0 B     0.000 %
         FailedIdsDB:            0 B     0.000 %
             MTimeDB:       1.82 MiB     1.587 %
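
The figures in Output 4 can be checked with simple arithmetic. This sketch uses only the numbers printed above and makes no claim about why the index file is larger than the data it holds.

```python
# Values copied from Output 4, converted to MiB.
file_size_mib = 2.37 * 1024      # "File Size: 2.37 GiB"
used_mib = 114.48                # "Used: 114.48 MiB"

sub_db_mib = {
    "PostingDB": 18.00,
    "PositionDB": 18.85,
    "DocTerms": 19.36,
    "DocFilenameTerms": 17.24,
    "DocXattrTerms": 4.00 / 1024,  # 4.00 KiB
    "IdTree": 7.27,
    "IdFileName": 19.51,
    "DocTime": 12.43,
    "DocData": 0.0,
    "ContentIndexingDB": 0.0,
    "FailedIdsDB": 0.0,
    "MTimeDB": 1.82,
}

total_mib = sum(sub_db_mib.values())         # should match the "Used" line
used_fraction = used_mib / file_size_mib     # share of the file that is in use
unused_mib = file_size_mib - used_mib        # space on disk holding no data

print(f"sum of sub-databases: {total_mib:.2f} MiB")
print(f"in use: {used_fraction:.1%} of the file, unused: {unused_mib:.0f} MiB")
```

The per-database sizes add up to the "Used" figure, so the percentages in the table are relative to the 114.48 MiB in use, not to the 2.37 GiB file.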