SUMMARY
I am running Filelight 20.12.2, installed from the Microsoft Store, on Windows 10 against a folder whose subfolders have CJK characters (Korean, in this case) in their names. Here is the folder listing. Note that it also includes a folder name with Nordic characters (Ófærð).

Confession - 자백 (2019)/
Heartless City - 무정도시 (2013)/
Partners for Justice - 검법남녀 (2018-2019)/
Police University - 경찰수업 (2021)/
Run On - 런 온 (2020)/
Silicon Valley (2014)/
So I Married The Anti-Fan - 그래서 나는 안티팬과 결혼했다 (2021)/
Tell Me What You Saw - 본 대로 말하라 (2020)/
The Road - The Tragedy of One - 더 로드 1의 비극 (2021)/
Trapped (Ófærð) (2015)/
What's Wrong With Secretary Kim? 비서가 왜 그럴까 (2018)/

In Filelight's display of the space usage of the parent folder, every subfolder whose name contains Korean script is omitted entirely and does not factor into the analysis. The folder with Nordic characters is included correctly.

STEPS TO REPRODUCE
1. Take any NTFS volume. Create a directory on it, and create subdirectories with Korean characters in their names.
2. Run Filelight on the volume or on the parent directory.

OBSERVED RESULT
Any folders that contain Korean characters in their names are excluded from the analysis and the view, as if they did not exist.

EXPECTED RESULT
Folders with Korean characters in their names should be included normally.

SOFTWARE/OS VERSIONS
Windows: Windows 10 21H1 19043.1165 Education
macOS: -
Linux/KDE Plasma: -
(available in About System)
KDE Plasma Version: n/a
KDE Frameworks Version: 5.79.0
Qt Version: 5.15.2

ADDITIONAL INFORMATION
The same problem occurs when the folder name contains Japanese characters, including mixed-language names such as "Some latin text 完全版".
Is this still a problem with the latest release on the store?
(In reply to Harald Sitter from comment #1)
> Is this still a problem with the latest release on the store?

I have Filelight 21.12.3 with kdeframeworks 5.91.0 and qt 5.15.2 from the store at the moment, and entries with CJK characters in their names are still getting omitted from the scan.
There is a bug somewhere in the iteration system where it doesn't handle Unicode properly. Haven't managed to find it yet though :(( Can confirm the issue.
The wrapper API we use for directory iteration is garbage and not properly Unicode-aware. I think the most solid fix, both short and long term, is to port away from it and properly abstract the code paths for Windows and POSIX so we get solid iteration results on either platform. It does require lots of new code though, so that's a bit sad.
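One way such a platform abstraction could be shaped is a single-pass iterator that fronts for a per-platform walker object. The following is only a minimal sketch under stated assumptions: `Entry`, `Walker`, `VectorWalker`, `DirIterator` and `totalSize` are hypothetical names invented for illustration and are not taken from the Filelight sources, and the in-memory `VectorWalker` merely stands in for real Windows and POSIX walkers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical entry type: the name is always UTF-8, the size is in bytes.
struct Entry {
    std::string name;
    std::uint64_t size;
    bool isDir;
};

// Hypothetical walker interface. A real build would provide one
// implementation per platform behind this same interface.
class Walker {
public:
    virtual ~Walker() = default;
    // Fetches the next entry; returns false once nothing is left.
    virtual bool next(Entry &out) = 0;
};

// Stand-in walker over a fixed list, used here instead of real OS calls.
class VectorWalker : public Walker {
public:
    explicit VectorWalker(std::vector<Entry> entries)
        : m_entries(std::move(entries)) {}

    bool next(Entry &out) override {
        if (m_pos >= m_entries.size()) {
            return false;
        }
        out = m_entries[m_pos++];
        return true;
    }

private:
    std::vector<Entry> m_entries;
    std::size_t m_pos = 0;
};

// Single-pass iterator that hides which walker does the actual work.
class DirIterator {
public:
    using iterator_category = std::input_iterator_tag;
    using value_type = Entry;
    using difference_type = std::ptrdiff_t;
    using pointer = const Entry *;
    using reference = const Entry &;

    DirIterator() = default; // default-constructed iterator is the end marker

    explicit DirIterator(std::shared_ptr<Walker> walker)
        : m_walker(std::move(walker)) {
        advance(); // load the first entry, or become the end marker
    }

    reference operator*() const {
        assert(m_walker && "dereferencing the end iterator");
        return m_current;
    }
    pointer operator->() const { return &**this; }

    DirIterator &operator++() {
        advance();
        return *this;
    }

    bool operator==(const DirIterator &other) const {
        return m_walker == other.m_walker;
    }
    bool operator!=(const DirIterator &other) const {
        return !(*this == other);
    }

private:
    void advance() {
        // Dropping the walker turns this iterator into the end marker.
        if (m_walker && !m_walker->next(m_current)) {
            m_walker.reset();
        }
    }

    std::shared_ptr<Walker> m_walker;
    Entry m_current{};
};

// Example consumer: sums the sizes of all entries a walker yields.
std::uint64_t totalSize(std::shared_ptr<Walker> walker) {
    std::uint64_t sum = 0;
    for (DirIterator it(std::move(walker)); it != DirIterator(); ++it) {
        sum += it->size;
    }
    return sum;
}
```

With this shape the scanning code only ever sees `DirIterator` and UTF-8 names, so a Windows walker built on the wide-character file APIs could be swapped in without touching the callers.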
Git commit e4c9db692acf2969ef14a927a842fa5edc657887 by Harald Sitter.
Committed on 28/04/2022 at 11:33.
Pushed by sitter into branch 'release/22.04'.

rebuild the iteration tech using better architecture

the previous approach just didn't cut it for windows. the new code sports a forward iterator that fronts for a platform-dependent walker object that encapsulates the iteration logic

this looks and feels a lot like std::filesystem API but unfortunately we cannot really use that API directly because I want this change to be conservative enough to land in 22.04 as a bugfix for windows, also on POSIX std::filesystem returns the st_size (size in bytes) whereas we want the actual occupied blocks (st_blocks*size), and lastly it's also a tad slower because of heavier abstraction

should we choose to go the std::filesystem route in the future anyway it should be a trivial switch because of how similar the APIs are.

furthermore move to always convert from/to utf8. the QFile helpers ultimately end up in the same code paths anyway, so it seems simpler to just go with the utf8 variants directly (also on windows QFile somehow produces bogus output for actual unicode characters)

the combined set of changes improves windows support substantially. it's now correctly iterating unicode entries, and correctly displaying unicode characters. iteration in general now has unit testing.
M +13 -0 autotests/CMakeLists.txt
A +129 -0 autotests/directoryIteratorTest.cpp [License: GPL(3+eV) GPL(v3.0) GPL(v2.0)]
A +0 -0 autotests/iterator-tree.in/Con 자백/.keep
A +1 -0 autotests/iterator-tree.in/bar
A +0 -0 autotests/iterator-tree.in/foo/.keep
A +7 -0 autotests/test-config.h.cmake
M +12 -1 src/CMakeLists.txt
A +14 -0 src/directoryEntry.h [License: GPL(3+eV) GPL(v3.0) GPL(v2.0)]
A +4 -0 src/directoryIterator.cpp [License: GPL(3+eV) GPL(v3.0) GPL(v2.0)]
A +66 -0 src/directoryIterator.h [License: GPL(3+eV) GPL(v3.0) GPL(v2.0)]
M +2 -2 src/fileTree.cpp
M +1 -2 src/fileTree.h
M +25 -137 src/localLister.cpp
A +105 -0 src/posixWalker.cpp [License: GPL(3+eV) GPL(v3.0) GPL(v2.0)]
A +37 -0 src/posixWalker.h [License: GPL(3+eV) GPL(v3.0) GPL(v2.0)]
M +1 -1 src/radialMap/map.cpp
A +115 -0 src/windowsWalker.cpp [License: GPL(3+eV) GPL(v3.0) GPL(v2.0)]
A +36 -0 src/windowsWalker.h [License: GPL(3+eV) GPL(v3.0) GPL(v2.0)]

https://invent.kde.org/utilities/filelight/commit/e4c9db692acf2969ef14a927a842fa5edc657887
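To illustrate the commit message's point about st_size versus occupied blocks on POSIX, here is a rough sketch and not the code from the commit. `Sizes` and `querySizes` are hypothetical names, and the 512-byte unit for `st_blocks` is an assumption that holds on Linux and most Unix systems but is not guaranteed everywhere.

```cpp
#include <sys/stat.h>

#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical result type: both values are in bytes.
struct Sizes {
    std::uint64_t apparent;  // st_size: the logical length of the file
    std::uint64_t allocated; // space actually occupied on disk
    bool ok;                 // false if the path could not be stat'ed
};

// Queries both sizes for a UTF-8 encoded path. POSIX only.
Sizes querySizes(const std::string &utf8Path) {
    assert(!utf8Path.empty());

    struct stat st {};
    // lstat() does not follow symlinks, so a link is not counted as its target.
    if (::lstat(utf8Path.c_str(), &st) != 0) {
        return {0, 0, false};
    }

    // Assumption: st_blocks counts 512-byte units.
    const std::uint64_t blockUnit = 512;
    return {static_cast<std::uint64_t>(st.st_size),
            static_cast<std::uint64_t>(st.st_blocks) * blockUnit,
            true};
}
```

The two numbers can differ in either direction: a sparse file occupies less space than its apparent size, while a tiny file still occupies at least one filesystem block on many filesystems. A disk usage tool is normally interested in the allocated figure.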