Bug 514195

Summary: Application becomes unresponsive when searching for rare words in a large file (>2.4 GB)
Product: [Applications] kate
Reporter: ldargevicius20
Component: search
Assignee: KWrite Developers <kwrite-bugs-null>
Status: REPORTED
Severity: minor
CC: christoph, ldargevicius20
Priority: NOR
Version First Reported In: 25.12.0
Target Milestone: ---
Platform: Arch Linux
OS: Linux

Description ldargevicius20 2026-01-05 17:53:06 UTC
SUMMARY
When searching for a word that appears only once or twice in a large file (>2.4 GB, based on my testing), the application's UI freezes for some time. The duration depends on the file size; on my machine it is 20-60 seconds with a 2.5 GB text file.

STEPS TO REPRODUCE
1. Open the large file in Kate
2. Search for a word that appears only rarely in the file
3. Observe UI freeze

OBSERVED RESULT
UI becomes unresponsive

EXPECTED RESULT
UI should remain responsive, and/or show progress
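
To illustrate the expected behavior: the search could process the file in chunks and yield control between chunks, which is where a GUI could repaint, update a progress bar, or allow cancellation. This is only a sketch of the idea in Python, not Kate's actual implementation (which lives in KTextEditor and is C++); the function name and callback are made up for illustration:

```python
def chunked_find(path, needle, chunk_size=1 << 20, on_progress=None):
    """Find byte offsets of `needle` in a file, reading it chunk by chunk.

    Between chunks, control returns to the caller via on_progress; in a GUI
    this is the point where the event loop could run or the search could be
    cancelled. An overlap of len(needle) - 1 bytes is carried over so matches
    straddling a chunk boundary are not missed.
    """
    needle = needle.encode("utf-8")
    overlap = len(needle) - 1
    matches = []
    offset = 0   # total bytes consumed from the file so far
    tail = b""   # trailing bytes of the previous chunk, for boundary matches
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            buf = tail + chunk  # buf starts at absolute offset - len(tail)
            start = 0
            while True:
                i = buf.find(needle, start)
                if i == -1:
                    break
                matches.append(offset - len(tail) + i)
                start = i + 1
            if on_progress:
                on_progress(offset + len(chunk))
            tail = buf[-overlap:] if overlap else b""
            offset += len(chunk)
    return matches
```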

SOFTWARE/OS VERSIONS
- Kate 25.12.0-1 from the official Arch Linux repository (Arch Linux)
- Kate built from source on 2026-01-02 (Arch Linux)
- Kate 25.12.0 from the Fedora repository (Fedora Linux 43)

ADDITIONAL INFORMATION
To reproduce the issue, I created synthetic test data using a simple Python script:
```python
import os
import random
import string

OUTPUT_FILE = "large_test_file_2-5gb.txt"
FILE_SIZE_IN_GB = 2.5

TARGET_SIZE_BYTES = int(FILE_SIZE_IN_GB * 1024**3)
CHUNK_SIZE = 1024**2  # build the file in ~1 MiB chunks
MIN_WORD_LEN = 3
MAX_WORD_LEN = 12
WORDS_PER_LINE = 1000

def generate_chunk(target_bytes):
    """Generate at least target_bytes of random lowercase 'words' (ASCII only)."""
    lines = []
    size = 0
    while size < target_bytes:
        line_words = (
            ''.join(random.choices(string.ascii_lowercase,
                                   k=random.randint(MIN_WORD_LEN, MAX_WORD_LEN)))
            for _ in range(WORDS_PER_LINE)
        )
        line = " ".join(line_words) + "\n"
        lines.append(line)
        size += len(line)  # ASCII only, so character count == byte count
    return "".join(lines)

def create_file():
    print("Program started!")
    written = 0
    with open(OUTPUT_FILE, "w", encoding="utf-8") as f:
        while written < TARGET_SIZE_BYTES:
            remaining = TARGET_SIZE_BYTES - written
            chunk = generate_chunk(min(CHUNK_SIZE, remaining))
            f.write(chunk)
            written += len(chunk.encode("utf-8"))
            # Report progress roughly every 100 MiB
            if written % (100 * 1024 * 1024) < CHUNK_SIZE:
                print(f"Written: {written / 1024**3:.2f} GB")
    print("Program done!")
    print("Final size:", os.path.getsize(OUTPUT_FILE))

if __name__ == "__main__":
    create_file()
```
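
For comparison, a raw substring scan over the same file outside any editor gives a rough baseline for how long the pure search work should take. A minimal sketch using Python's mmap module (the file name is assumed to match the generator script above; the word searched for is a placeholder):

```python
import mmap
import time

def baseline_scan(path, needle):
    """Time a single pass over a file counting occurrences of `needle`,
    using mmap so the file is paged in by the OS rather than read eagerly."""
    needle = needle.encode("utf-8")
    t0 = time.perf_counter()
    count = 0
    with open(path, "rb") as f, \
         mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        i = mm.find(needle)
        while i != -1:
            count += 1
            i = mm.find(needle, i + 1)
    return count, time.perf_counter() - t0
```

Usage would be along the lines of `baseline_scan("large_test_file_2-5gb.txt", "somerareword")`, printing the match count and elapsed seconds; if this finishes much faster than Kate's search, the time is being lost somewhere other than the raw scan.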

I’ll try to look into this problem to understand what’s wrong. If you have any suggestions on how to solve it, I’d really appreciate hearing them.
Comment 1 Christoph Cullmann 2026-01-10 18:12:11 UTC
You could profile it with perf.
The code for in-document search is in KTextEditor.