Bug 514195 - Application becomes unresponsive when searching for rare words in a large file (>2.4 GB)
Summary: Application becomes unresponsive when searching for rare words in a large file (>2.4 GB)
Status: REPORTED
Alias: None
Product: kate
Classification: Applications
Component: search (other bugs)
Version First Reported In: 25.12.0
Platform: Arch Linux
Importance: NOR minor
Target Milestone: ---
Assignee: KWrite Developers
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2026-01-05 17:53 UTC by ldargevicius20
Modified: 2026-01-10 18:12 UTC
CC List: 2 users

See Also:
Latest Commit:
Version Fixed/Implemented In:
Sentry Crash Report:


Description ldargevicius20 2026-01-05 17:53:06 UTC
Application becomes unresponsive when searching for rare words in a large file (>2.4 GB)

SUMMARY
When searching for a word that appears only once or twice in a large file (larger than roughly 2.4 GB, based on my testing), the application becomes unresponsive: the UI freezes for some time, depending on the file size. On my machine, a 2.5 GB text file causes a freeze of 20-60 seconds.

STEPS TO REPRODUCE
1. Open the large file in Kate
2. Search for a word that appears only rarely in the file (see the snippet after the generator script below for one way to plant such a word)
3. Observe UI freeze

OBSERVED RESULT
The UI becomes unresponsive.

EXPECTED RESULT
The UI should remain responsive, and/or show search progress.

SOFTWARE/OS VERSIONS
- Kate 25.12.0-1 from the official Arch Linux repository (Arch Linux)
- Kate built from source on 2026-01-02 (Arch Linux)
- Kate 25.12.0 from the Fedora repository (Fedora Linux 43)

ADDITIONAL INFORMATION
To reproduce the issue, I created synthetic test data using a simple Python script:
```python
import random
import string
import os

OUTPUT_FILE = "large_test_file_2-5gb.txt"
FILE_SIZE_IN_GB = 2.5

TARGET_SIZE_BYTES = int(FILE_SIZE_IN_GB * pow(1024, 3))
CHUNK_SIZE = pow(1024, 2)  # build roughly 1 MiB of text per iteration
MIN_WORD_LEN = 3
MAX_WORD_LEN = 12
WORDS_PER_LINE = 1000

def generate_chunk(target_bytes):
    """Return at least target_bytes of newline-terminated lines of random lowercase words."""
    lines = []
    size = 0

    while size < target_bytes:
        line_words = (
            "".join(random.choices(string.ascii_lowercase,
                                   k=random.randint(MIN_WORD_LEN, MAX_WORD_LEN)))
            for _ in range(WORDS_PER_LINE)
        )
        line = " ".join(line_words) + "\n"
        lines.append(line)
        size += len(line)

    return "".join(lines)

def create_file():
    print("Program started!")
    written = 0

    with open(OUTPUT_FILE, "w", encoding="utf-8") as f:
        while written < TARGET_SIZE_BYTES:
            remaining = TARGET_SIZE_BYTES - written
            chunk_size = min(CHUNK_SIZE, remaining)

            chunk = generate_chunk(chunk_size)
            f.write(chunk)
            # The generated text is pure ASCII, so encoding gives an
            # exact byte count for tracking progress toward the target.
            written += len(chunk.encode("utf-8"))

            # Report progress roughly every 100 MiB written.
            if written % (100 * 1024 * 1024) < CHUNK_SIZE:
                print(f"Written: {written / (1024**3):.2f} GB")

    print("Program done!")
    print("Final size:", os.path.getsize(OUTPUT_FILE))

create_file()
```
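The generator emits only random 3-12 letter words, so an arbitrary search term may not occur at all. To guarantee a term that appears exactly once, you can append a known marker word to the finished file. A minimal sketch (the `plant_needle` helper and the `kateneedleword` marker are names of my own choosing, not part of the script above):

```python
# Sketch: plant a unique, easy-to-search word at the end of the
# generated file so a "rare word" search is guaranteed one match.
def plant_needle(path="large_test_file_2-5gb.txt", needle="kateneedleword"):
    with open(path, "a", encoding="utf-8") as f:
        f.write(needle + "\n")

plant_needle()
```

Searching for that marker in Kate then exercises exactly the rare-match case described in this report.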

I’ll try to look into this problem to understand what’s wrong. If you have any suggestions on how to solve it, I’d really appreciate hearing them.
Comment 1 Christoph Cullmann 2026-01-10 18:12:11 UTC
You could profile it with perf.
The code for the in-document search is in KTextEditor.
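For what it's worth, a typical perf session against a running Kate instance could look like the commands below. This is only a sketch of the suggestion above; the flag choices are mine, and debug symbols for kate and KTextEditor are needed to get readable stacks:

```
# Attach to the running Kate process and sample with DWARF call graphs.
perf record --call-graph dwarf -p "$(pidof kate)"
# Trigger the search, wait out the freeze, then stop recording with Ctrl-C.
perf report
```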