Bug 463246 - Importing CSV's with double-sets of quotes ("") splits the field
Summary: Importing CSV's with double-sets of quotes ("") splits the field
Status: RESOLVED FIXED
Alias: None
Product: kmymoney
Classification: Applications
Component: importer (show other bugs)
Version: 5.1.3
Platform: Other Linux
: NOR normal
Target Milestone: ---
Assignee: KMyMoney Devel Mailing List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-12-19 18:51 UTC by matt
Modified: 2024-12-19 10:13 UTC (History)
2 users (show)

See Also:
Latest Commit:
Version Fixed In: 5.2
Sentry Crash Report:


Attachments
Test CSV (1.38 KB, text/csv)
2022-12-20 15:25 UTC, matt
Details

Note You need to log in before you can comment on or make changes to this bug.
Description matt 2022-12-19 18:51:17 UTC
SUMMARY
I'm attempting to import Amazon Item Exports for reconciliation.
Amazon's Description column uses `""` to indicate a " symbol (since " is used for surrounding a field).  While LibreOffice handles this fine, KMyMoney treats that as a field delimiter, even though I've specified that fields are delimited by a comma (,).
This forces me to have to perform manual manipulation before importing, and is against spec.

Thank you!

STEPS TO REPRODUCE
1. Create a CSV file with a line where the description field has a ""
    (eg.    "The thing we purchased measures 4""x6""." )
2. Import that CSV into KMyMoney
3. Look at the fields, and you'll see the "x6" and "." have their own fields.

OBSERVED RESULT
A few of my hundreds of lines parsed incorrectly, with "additional fields" that don't exist in other transactions, because the " symbol is necessary for data and the imported CSV uses "" to cause a " symbol to be part of the data.

EXPECTED RESULT
I expect the CSV to be parsed correctly, where fields are only separated by a comma, if the comma is not within "-symbols, and "" causes a " symbol to be part of the data.

SOFTWARE/OS VERSIONS
Windows: 
macOS: 
Linux/KDE Plasma: 
(available in About System)
KDE Plasma Version: 
KDE Frameworks Version: 
Qt Version: 

ADDITIONAL INFORMATION
Comment 1 alx.kuzza 2022-12-19 19:44:50 UTC
Could you, please, attach a fragment of the CSV file you are talking about?
Comment 2 matt 2022-12-19 21:10:25 UTC
https://www.rfc-editor.org/rfc/rfc4180

```
2.  Definition of the CSV Format
   ...
   7.  If double-quotes are used to enclose fields, then a double-quote
       appearing inside a field must be escaped by preceding it with
       another double quote.  For example:

       "aaa","b""bb","ccc"
```
Comment 3 Jack 2022-12-19 21:36:02 UTC
It really would make it easier if you provide a sample csv file, even if it's only a header and one data row.  It may be obvious to you what is wrong, but debugging to find exactly where in the code the problem occurs is easier if the developers have a real example to trace through.
Comment 4 matt 2022-12-20 15:20:13 UTC
(In reply to alx.kuzza from comment #1)
> Could you, please, attach a fragment of the CSV file you are talking about?

Sure!
Comment 5 matt 2022-12-20 15:25:23 UTC
Created attachment 154711 [details]
Test CSV

This is a test CSV.  It includes a header (so the fields are understandable) directly from Amazon's Item Download.
Two entries are included, left mostly unmodified.  The first entry should parse fine.  The second line will break up the string in field 3 and stick "650-Sheet Capacity" into field 4.

Please accept my apologies, it appears my original description is likely off.  It seems that, while the fields are not breaking at the double-quote (""), KMyMoney appears to "stop protecting commas" such that commas after a "" are treated as delimiters, not string text.
Comment 6 matt 2022-12-20 15:26:11 UTC
(In reply to Jack from comment #3)
> It really would make it easier if you provide a sample csv file, even if
> it's only a header and one data row.  It may be obvious to you what is
> wrong, but debugging to find exactly where in the code the problem occurs is
> easier if the developers have a real example to trace through.

That's totally reasonable.
And a good request, since I realized that my original description was likely misleading.  Please see test file and accompanying note.
Comment 7 matt 2022-12-20 15:27:35 UTC
Thank you for fast responses, and more thanks for creating such a powerful, beautiful, and friendly piece of F/OSS money-management software!
Comment 8 alx.kuzza 2022-12-21 08:53:04 UTC
(In reply to matt from comment #0)
> SUMMARY
> I'm attempting to import Amazon Item Exports for reconciliation.
> Amazon's Description column uses `""` to indicate a " symbol (since " is
> used for surrounding a field).  While LibreOffice handles this fine,
> KMyMoney treats that as a field delimiter, even though I've specified that
> fields are delimited by a comma (,).
> This forces me to have to perform manual manipulation before importing, and
> is against spec.
> 
> Thank you!
> 
> STEPS TO REPRODUCE
> 1. Create a CSV file with a line where the description field has a ""
>     (eg.    "The thing we purchased measures 4""x6""." )
> 2. Import that CSV into KMyMoney
> 3. Look at the fields, and you'll see the "x6" and "." have their own fields.
> 
> OBSERVED RESULT
> A few of my hundreds of lines parsed incorrectly, with "additional fields"
> that don't exist in other transactions, because the " symbol is necessary
> for data and the imported CSV uses "" to cause a " symbol to be part of the
> data.
> 
> EXPECTED RESULT
> I expect the CSV to be parsed correctly, where fields are only separated by
> a comma, if the comma is not within "-symbols, and "" causes a " symbol to
> be part of the data.
> 
> SOFTWARE/OS VERSIONS
> Windows: 
> macOS: 
> Linux/KDE Plasma: 
> (available in About System)
> KDE Plasma Version: 
> KDE Frameworks Version: 
> Qt Version: 
> 
> ADDITIONAL INFORMATION

Good news, it is reproducible :)
Could you, please, also provide the OS/Version you have a problem with.
Comment 9 Bug Janitor Service 2022-12-21 13:03:36 UTC
A possibly relevant merge request was started @ https://invent.kde.org/office/kmymoney/-/merge_requests/180
Comment 10 matt 2022-12-21 13:31:36 UTC
This is reproducible on Kubuntu 22.04 running the KMyMoney-5.1-322-linux-gcc--x86_64.AppImage AppImage (Version 5.1.3-71876c371)
Comment 11 Thomas Baumgart 2022-12-28 15:46:53 UTC
Git commit 33879f13b955753214385bc1c67e77c6d510a7d5 by Thomas Baumgart, on behalf of Alexander Kuznetsov.
Committed on 28/12/2022 at 15:46.
Pushed by tbaumgart into branch 'master'.

Fix CSV parser to correctly support quoted cells

M  +60   -33   kmymoney/plugins/csv/import/core/csvutil.cpp
M  +23   -11   kmymoney/plugins/csv/import/core/tests/parsedata-test.cpp

https://invent.kde.org/office/kmymoney/commit/33879f13b955753214385bc1c67e77c6d510a7d5
Comment 12 Thomas Baumgart 2022-12-28 15:50:26 UTC
Git commit fc8f557ef56d07faf369ecc5ab917f5ef119493d by Thomas Baumgart, on behalf of Alexander Kuznetsov.
Committed on 28/12/2022 at 15:50.
Pushed by tbaumgart into branch '5.1'.

Fix CSV parser to correctly support quoted cells 

Backported from master
FIXED-IN: 5.1.4

M  +2    -0    .gitignore
M  +40   -13   kmymoney/plugins/csv/import/core/csvutil.cpp
M  +21   -11   kmymoney/plugins/csv/import/core/tests/parsedata-test.cpp

https://invent.kde.org/office/kmymoney/commit/fc8f557ef56d07faf369ecc5ab917f5ef119493d
Comment 13 matt 2023-01-09 22:16:15 UTC
Downloaded AppImage KMyMoney-5.1-348-linux-gcc-x86_64.AppImage and parsing appears to work correctly.  Thank you!