IN  0700-2000
OUT 07:00-20:00

IN  Tu-Th 8:30-17:30, Fr 8:30-1700
OUT Tu-Th 08:30-17:30, Fr 08:30-17:00
This is indeed a somewhat common mistake in OSM data. Unfortunately it's not easy to support (if it is possible at all), as 4 digit numbers match year numbers as well. I.e. "1900-2100" is a valid expression, but it's a year range, not a time range.
Thanks for taking a look at this, I agree this is a frequent mistake. Random ideas:
- Is it possible to parse in stages, so that once likely year placeholders have been parsed, any remaining 4 digit number is preferably read as hhmm?
- Is it possible to tighten the year definition?
  - 19##: drop or deprioritize as a year.
  - 20[0-3]#: recognize as a year; drop or deprioritize anything after that.
Right, accepting 4 digit numbers that are a valid time and that fall outside of the expected year values should be possible and could probably cover the majority of these cases already. I'll give that a try.
A possibly relevant merge request was started @ https://invent.kde.org/libraries/kopeninghours/-/merge_requests/81
Git commit 024831289df89c2282637eea7bfe774286b20b22 by Volker Krause.
Committed on 05/12/2021 at 10:32.
Pushed by vkrause into branch 'release/21.12'.

Support 4 digit times to the extend possible

The problem with 4 digit times is the ambiguity with year numbers. To solve this we now assume anything that would be a valid time outside of the [2001-2099] range to be a time. That leaves all practically relevant years valid and still covers the vast majority of 4 digit times found in the OSM corpus. This is worth it given how common the 4 digit time mistake is in OSM data.

M +2 -2 autotests/evaluatetest.cpp
M +5 -1 autotests/parsertest.cpp
M +12 -6 src/lib/openinghourslexer.l
M +4 -0 src/lib/openinghoursparser.y

https://invent.kde.org/libraries/kopeninghours/commit/024831289df89c2282637eea7bfe774286b20b22
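
For illustration only, the rule described in the commit message can be sketched in a few lines of Python. This is not the actual change (that lives in the flex/bison sources listed above). The function names `classify_four_digits` and `normalize_range` are hypothetical, and the definition of a "valid time" used here (00:00 to 23:59, plus 24:00) is an assumption.

```python
import re

# Year range quoted in the commit message: values in here stay years.
YEAR_MIN, YEAR_MAX = 2001, 2099


def classify_four_digits(token: str) -> str:
    """Classify a 4 digit token as 'time', 'year' or 'invalid' (hypothetical helper)."""
    if not re.fullmatch(r"\d{4}", token):
        return "invalid"
    value = int(token)
    hours, minutes = divmod(value, 100)
    # Assumption: a valid time is 00:00..23:59, or exactly 24:00.
    is_valid_time = (hours <= 23 and minutes <= 59) or value == 2400
    if is_valid_time and not (YEAR_MIN <= value <= YEAR_MAX):
        return "time"
    # Everything else keeps its previous meaning as a year.
    return "year"


def normalize_range(expr: str) -> str:
    """Rewrite 'hhmm-hhmm' as 'hh:mm-hh:mm' when both ends classify as times."""
    parts = expr.split("-")
    if len(parts) == 2 and all(classify_four_digits(p) == "time" for p in parts):
        return "-".join(f"{p[:2]}:{p[2:]}" for p in parts)
    return expr  # leave year ranges and anything else untouched


if __name__ == "__main__":
    for sample in ("0700-2000", "1900-2100", "2020-2030"):
        print(sample, "->", normalize_range(sample))
```

Under these assumptions "0700-2000" becomes "07:00-20:00" and "2020-2030" is left alone as a year range. "1900-2100", which used to be a year range, is now read as a time range; that is the trade-off the commit message describes.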