| Summary: | konsole history fill tmpfs | | |
|---|---|---|---|
| Product: | [Applications] konsole | Reporter: | humufr |
| Component: | history | Assignee: | Konsole Bugs <konsole-bugs-null> |
| Status: | CONFIRMED --- | | |
| Severity: | normal | CC: | mglb |
| Priority: | NOR | | |
| Version First Reported In: | 18.12.3 | | |
| Target Milestone: | --- | | |
| Platform: | Arch Linux | | |
| OS: | Linux | | |
| Latest Commit: | | Version Fixed/Implemented In: | |
| Sentry Crash Report: | | | |
Description
humufr
2019-04-10 07:57:43 UTC
Konsole v18.12.03, Ubuntu 18.04
Test environment: default config (ran with empty $HOME) + infinite scrollback + store history in /tmp.
Before:
0 tmp/konsole-J14446.history
0 tmp/konsole-M14446.history
0 tmp/konsole-T14446.history
Running: base64 -w 511 /dev/urandom | head -n $((1024 * 1024))
After:
48M tmp/konsole-L14446.history
8.0G tmp/konsole-S14446.history
6.0M tmp/konsole-n14446.history
The command above prints 512 MiB of characters (1024 × 1024 lines of 511 characters plus a newline, assuming 1 character = 1 B). The Character struct is 16 B, so the 8.0G history file is consistent with the Konsole code (512 MiB × 16 B = 8 GiB).
For reference, this is slightly more than 6.7 million lines of 80 characters each.
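The figures above can be rechecked with a few lines of Python. This is only a sanity-check sketch: it assumes binary units (MiB/GiB) and that the newline of each line counts as one stored character, which is what the 512 MB figure implies. The constant names are made up for illustration.

```python
# Sizes taken from the report; binary units are an assumption.
LINES = 1024 * 1024          # head -n $((1024 * 1024))
CHARS_PER_LINE = 511 + 1     # base64 -w 511, plus the newline
CHARACTER_SIZE = 16          # bytes per Character struct, as stated above

total_chars = LINES * CHARS_PER_LINE
history_bytes = total_chars * CHARACTER_SIZE
lines_of_80 = total_chars / 80

print(total_chars // 2**20, "MiB of characters")
print(history_bytes / 2**30, "GiB of history")
print(round(lines_of_80 / 1e6, 2), "million 80-character lines")
```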
Let's try compressing the history file (i.e. single-format random alphanumeric characters).
Algorithm: LZ4, 4MB block, fast compression (1)
Result: 8GB reduced to 1.5G => 3B/character.
Characters that share a single format repeat most of the structure, which is why the file compresses so well. Also, most people do not read half a GB of random characters (I hope).
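The effect of repeated structure can be illustrated with a small simulation. This is a rough sketch, not the Konsole implementation: the 16-byte record layout (a 4-byte code point followed by 12 bytes of format data) is hypothetical, and zlib at level 1 stands in for LZ4 because it ships with Python, so the exact ratio will differ from the numbers above.

```python
import random
import string
import struct
import zlib

CHARACTER_SIZE = 16            # bytes per record, as stated in the report
BLOCK_SIZE = 4 * 1024 * 1024   # 4 MB blocks, as in the test above
ALPHABET = (string.ascii_letters + string.digits + "+/").encode()  # base64 alphabet


def make_history(n_chars, seed=0):
    """Build n_chars fixed-size records that all share one format."""
    rng = random.Random(seed)
    fmt = bytes(range(12))     # hypothetical 12 bytes of format data
    return b"".join(
        struct.pack("<I12s", rng.choice(ALPHABET), fmt) for _ in range(n_chars)
    )


def compress_blocks(data, level=1):
    """Compress data block by block and return the total compressed size."""
    return sum(
        len(zlib.compress(data[i:i + BLOCK_SIZE], level))
        for i in range(0, len(data), BLOCK_SIZE)
    )


n_chars = 512 * 1024
raw = make_history(n_chars)
compressed_size = compress_blocks(raw)
bytes_per_char = compressed_size / n_chars
print(f"{len(raw)} B -> {compressed_size} B, {bytes_per_char:.2f} B/character")
```

Only the code point varies between records here, so the compressor mostly has to encode one random byte per 16-byte record.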
More realistic input:
Running: find src tests tools \( -name '*.cpp' -or -name '*.h' -or -name '*.py' \) -exec pygmentize {} \;
24M /tmp/konsole-F14446.history
47K /tmp/konsole-V14446.history
374K /tmp/konsole-a14446.history
This outputs all Konsole source files, colorized, but not heavily (only keywords, function names in definitions, strings, primitive types, and preprocessor directives). It is a good simulation of a fancy prompt, colorful greps and ls here and there, errors/warnings from the compiler, and mostly regular text.
Algorithm: The same as above
Result: 24M reduced to 3.8M => 2.5B/character.
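For comparison, both bytes-per-character figures follow from the file sizes and the 16 B record size. A minimal sketch of that arithmetic, assuming binary units; the helper name is made up for illustration:

```python
CHARACTER_SIZE = 16  # bytes per Character struct, as stated in the report
MIB = 2**20
GIB = 2**30


def bytes_per_character(raw_size, compressed_size):
    """Compressed bytes spent on each stored character."""
    n_chars = raw_size / CHARACTER_SIZE
    return compressed_size / n_chars


random_case = bytes_per_character(8 * GIB, 1.5 * GIB)
source_case = bytes_per_character(24 * MIB, 3.8 * MIB)
print(f"random base64:    {random_case:.1f} B/character")
print(f"colorized source: {source_case:.1f} B/character")
```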
This might be even better with another algorithm; LZ4 is just the first fast algorithm I thought of.
Actually it would be great to use compression even on in-memory history. I'll probably implement compression after finishing my current tasks, unless someone else wants to do it.