Bug 374352 - Payee merge converts imported name containing string "Check" to existing Payee name "Check"
Summary: Payee merge converts imported name containing string "Check" to existing Paye...
Status: RESOLVED FIXED
Alias: None
Product: kmymoney
Classification: Applications
Component: importer (show other bugs)
Version: 4.8.0
Platform: Microsoft Windows Microsoft Windows
: NOR normal
Target Milestone: ---
Assignee: KMyMoney Devel Mailing List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-12-30 20:28 UTC by Peter J. Farley III
Modified: 2019-08-29 01:32 UTC (History)
3 users (show)

See Also:
Latest Commit:
Version Fixed In: 4.8.1,5.0.0


Attachments
Test CSV file for demonstrating bug behavior (1.31 KB, application/csv)
2016-12-31 05:54 UTC, Peter J. Farley III
Details
CSV Export of checking account after inmporting test input file (1.08 KB, application/csv)
2016-12-31 05:57 UTC, Peter J. Farley III
Details
CSV export of same transactions with all "Transfer" Payees as initially desired (1.13 KB, application/csv)
2016-12-31 06:01 UTC, Peter J. Farley III
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Peter J. Farley III 2016-12-30 20:28:41 UTC
CSV import of transactions with Payee name "Transfer to Checking Plus" incorrectly converts Payee text to previously imported Payee name "Check".

Several other Payee names like "Transfer to Checking" and "Transfer From Checking" are also incorrectly converted to Payee name "Check".

In each case of incorrect change to Payee name "Check", a "Check Number" value was also present in the imported data, but in all of these cases this number does not actually represent a "check number".  Instead that value is an online "reference number" generated from online banking transfers.

Not all such transfer transactions are changed.  Other similar transfers with non-null "Check Number" fields are not changed.  Only those transaction which also have the string "Check" embedded in the Payee name are incorrectly changed.

Payee merge should respect distinct Payee names even when the "Check Number" field is populated and the string "Check" (or other string representing an existing Payee value) is present in the Payee field.
Comment 1 NSLW 2016-12-30 21:00:46 UTC
Hi Peter,
your problem explanation is tedious and time consuming. Could you attach CSV file you've got problems with (anonymize it somehow)? Then maybe explain CSV row by CSV row what is your result and what you expect. It would be easier to solve this issue.
Comment 2 allan 2016-12-30 21:10:26 UTC
Also, would you say how you have setup matching for the troublesome payee/s.  It may require some tuning.
Allan
Comment 3 Jack 2016-12-30 22:25:46 UTC
Please note that the import is not changing any names, it is matching the imported row to an existing payee - albeit not the one you want.  As Allan asked - check whether you have matching turned on for the "Check" payee.  In addition, I would even ask if you really want to have a payee named "Check"?

Separately, things like "Transfer to Checking" and the others you listed do not seem like payees at all, but more likely a Memo field.  So - I wonder if this is a problem with terminology.

Final point - when you do a CSV import, you have to specify which columns contains the various fields.  Until you can provide a sample csv file, and state which columns you map to which fields for csv import, I have trouble imagining that a column with "Transfer to Checking" really contains what you want to match to a Payee.
Comment 4 allan 2016-12-30 22:44:35 UTC
(In reply to Jack from comment #3)
> Please note that the import is not changing any names, it is matching the
> imported row to an existing payee - albeit not the one you want.  As Allan
> asked - check whether you have matching turned on for the "Check" payee.  In
> addition, I would even ask if you really want to have a payee named "Check"?
> 
> Separately, things like "Transfer to Checking" and the others you listed do
> not seem like payees at all, but more likely a Memo field.  So - I wonder if
> this is a problem with terminology.

Unfortunately, my banks don't really have a concept of Payee, but tend to use the field more descriptively.  Such entries as these are not uncommon.  One can select to copy the payee field to the memo, but the matching still goes by the payee.
Allan

> Final point - when you do a CSV import, you have to specify which columns
> contains the various fields.  Until you can provide a sample csv file, and
> state which columns you map to which fields for csv import, I have trouble
> imagining that a column with "Transfer to Checking" really contains what you
> want to match to a Payee.
Comment 5 Peter J. Farley III 2016-12-31 02:22:15 UTC
When I begin this import process there are NO PAYEES AT ALL and no transactions at all in the system, so there are no Payee settings to begin with.  This first import process establishes all of my Payees over the last 3 years.

I indeed do want Payee names like "Transfer to Checking" and "Check".  The ones that are simply "Check" will be manually modified by me after import to reflect the actual hand-written Payee name on the actual check that was written, and the Category will be changed from the imported "Misc/Other" to the true category for that transaction.  The bank I use does not OCR checks to try to extract even the most clearly block-printed Payee names on the actual, physical checks that I have written, so I do have to change those manually after import. 

The "Transfer" Payee names are a result of online transfers initiated manually by me between multiple accounts of multiple types (savings, checking, etc.) linked together on the bank website.  The "Transfer" items have no actual Payee name, and I have chosen these "Transfer" Payee names to suit my personal preferences.

The CSV import file is a processed version of the CSV items actually downloaded from my bank, edited for consistency of format and names in the Payee and Memo fields, and to automatically add Category fields for each edited Payee.  Unfortunately the bank "Payee" names are horribly inconsistent and change over time as well, so pre-processing for consistency and clarity is required.

I will extract a small representative set of CSV import items which demonstrate the problem results and the expected results and upload them soon.

FWIW, the CSV file I import has these column assignments, which I will reiterate when I upload the sample file:

Col 2 -- Date
Col 3 -- Number
Col 4 -- Payee
Col 5 -- Memo
Col 6 -- Debit
Col 7 -- Credit
Col 11 -- Category

Thanks for all of your prompt and helpful comments.  I will get to the sample CSV as soon as I may, given RL intrusions this holiday season.

Regards,

Peter
Comment 6 Jack 2016-12-31 04:09:14 UTC
Given the process that you need to follow, based on what your bank provides, it seems that the problem is likely as Allan suggested - if the payee "Check" got created first, then based on how the matching criteria, anything with "Check" in the column you matched to payee will match to "Check" and not create or match with something else.  In this case, if you are starting from scratch, it might be worth manually creating that single payee, and assuring that none of the Matching criteria are checked.  I'm not completely sure that will work, but I think it's worth trying before doing much more.

One other approach would be to manually fix the actual payee in the csv file, rather than fixing it after import.  I can't be sure whether it would be more or less extra typing, but it might avoid the problem you are having with your current process.

One thing to remember, hopefully once you get it sorted out this time, thing will go more smoothly in the future.
Comment 7 Peter J. Farley III 2016-12-31 05:54:57 UTC
Created attachment 103107 [details]
Test CSV file for demonstrating bug behavior

Import this test file to an empty checking account with starting balance 6000.00 in US dollars.
Comment 8 Peter J. Farley III 2016-12-31 05:57:19 UTC
Created attachment 103108 [details]
CSV Export of checking account after inmporting test input file

Imported transactions with Payee names "Transfer to Checking Plus", "Transfer to Checking" and "Transfer From Checking" are all converted to Payee name "Check".
Comment 9 Peter J. Farley III 2016-12-31 06:01:15 UTC
Created attachment 103109 [details]
CSV export of same transactions with all "Transfer" Payees as initially desired

This is a CSV export after manual correction of the incorrect Payee changes showing the results desired from the initial Import instead of requiring manual changes after the import.
Comment 10 Peter J. Farley III 2016-12-31 06:09:48 UTC
Creating the several "Transfer From/to ..." and the "Check" Payees ahead of the import and setting them to "No Matching" did not help resolve this issue.

When you import the test CSV file that I uploaded, use these column settings:

Col 2 -- Date (Format y m d)
Col 3 -- Number
Col 4 -- Payee
Col 5 -- Memo
Col 6 -- Debit
Col 7 -- Credit
Col 11 -- Category

There are custom categories for Income and Expense in the test CSV not found in the standard category lists:

Income:
Banking:Transfer In
Other Income:Tax Refund

Expense:
Banking:Transfer Out
Miscellaneous/Other:Other

If there is anything I left out to help demonstrate the issue please let me know.
Comment 11 NSLW 2016-12-31 11:16:13 UTC
Git commit aef75f44b4276c526679fb3727ef64719222f17e by Łukasz Wojniłowicz.
Committed on 31/12/2016 at 11:15.
Pushed by wojnilowicz into branch '4.8'.

Exact payees matching
FIXED-IN: 4.8.1

M  +1    -1    kmymoney/converter/mymoneystatementreader.cpp

https://commits.kde.org/kmymoney/aef75f44b4276c526679fb3727ef64719222f17e
Comment 12 NSLW 2016-12-31 11:20:55 UTC
Git commit 586224503f42441365e284df71fe32ab88590dd3 by Łukasz Wojniłowicz.
Committed on 31/12/2016 at 11:20.
Pushed by wojnilowicz into branch 'master'.

Exact payees matching
FIXED-IN: 4.8.1
GIT_SILENT

M  +1    -1    kmymoney/converter/mymoneystatementreader.cpp

https://commits.kde.org/kmymoney/586224503f42441365e284df71fe32ab88590dd3
Comment 13 NSLW 2016-12-31 11:25:25 UTC
(In reply to Jack from comment #6)
> Given the process that you need to follow, based on what your bank provides,
> it seems that the problem is likely as Allan suggested - if the payee
> "Check" got created first, then based on how the matching criteria, anything
> with "Check" in the column you matched to payee will match to "Check" and
> not create or match with something else.  In this case, if you are starting
> from scratch, it might be worth manually creating that single payee, and
> assuring that none of the Matching criteria are checked.  I'm not completely
> sure that will work, but I think it's worth trying before doing much more.

It won't work because during import new "Check" payee will be created with matching criteria enabled and it will block off creation of other payees with given CSV files.
Comment 14 Thomas Baumgart 2016-12-31 12:14:39 UTC
I might have to revert this last change by Lukasz because it changes the behavior of the matching process and may break existing installations (mine as well ;) ). I have not found all the details, but there was a reason for this change back in Dec. 2008 (wow, we come a long way)

http://kmymoney2.cvs.sourceforge.net/viewvc/kmymoney2/kmymoney2/kmymoney2/converter/mymoneystatementreader.cpp?r1=1.62&r2=1.63&

on line 813. The checkin comment says

  Reworked matching logic so that in case of multiple payees match,
  the one with the largest match is taken. This returns the name
  matching to the 'old' functionality of partial matching.

and thus we wanted the 'partial matching' which Lukasz removed with his change.

A better solution to fix the problem here would be to setup the matching of newly created entries to be exact which means to simply enclose them in ^ and $ and assign the MyMoneyPayee::matchKey method instead of MyMoneyPayee::matchName in MyMoneyStatementReader::processTransactionEntry() where the MyMoneyFile::addPayee parameters are constructed. This should solve the issue as well w/o interfering existing databases.

I therefore reopen the tracker entry.
Comment 15 NSLW 2016-12-31 12:37:48 UTC
(In reply to Thomas Baumgart from comment #14)
> I might have to revert this last change by Lukasz because it changes the
> behavior of the matching process and may break existing installations (mine
> as well ;) ). I have not found all the details, but there was a reason for
> this change back in Dec. 2008 (wow, we come a long way)
> 
> http://kmymoney2.cvs.sourceforge.net/viewvc/kmymoney2/kmymoney2/kmymoney2/
> converter/mymoneystatementreader.cpp?r1=1.62&r2=1.63&
> 
> on line 813. The checkin comment says
> 
>   Reworked matching logic so that in case of multiple payees match,
>   the one with the largest match is taken. This returns the name
>   matching to the 'old' functionality of partial matching.
> 
> and thus we wanted the 'partial matching' which Lukasz removed with his
> change.
> 
> A better solution to fix the problem here would be to setup the matching of
> newly created entries to be exact which means to simply enclose them in ^
> and $ and assign the MyMoneyPayee::matchKey method instead of
> MyMoneyPayee::matchName in MyMoneyStatementReader::processTransactionEntry()
> where the MyMoneyFile::addPayee parameters are constructed. This should
> solve the issue as well w/o interfering existing databases.
> 
> I therefore reopen the tracker entry.

In which way it breaks existing installations?

Your suggestion would allow creating of all payees during first import as expected, 
but, as I assume, if you've already got "Transfer to Checking" and "Check" in your payees list then:
"Check" will match with "Check" as expected
but it will also match with
"Transfer to Checking"
as described in this bug. MyMoneyPayee::matchKey and MyMoneyPayee::matchName doesn't matter here, because user can change it freely after first import. 
What's your opinion?
Comment 16 Thomas Baumgart 2016-12-31 12:52:07 UTC
See https://docs.kde.org/stable4/en/extragear-office/kmymoney/details.payees.personalinformation.html#idm45525669640480 The given example will not work anymore. And I have plenty of those.

If one already has data on file it needs to be changed to work as expected. So to make the given case work, one would setup an exact match for 'Check' to be entered as '^Check$' and then no mismatch with other payees that include the substring 'check' will happen. And then, the match method really differs.

I am talking about the initial assignment for new payees needs to be setup and know that things can freely be customized afterwords using the UI.
Comment 17 NSLW 2016-12-31 13:10:16 UTC
(In reply to Thomas Baumgart from comment #16)
> See
> https://docs.kde.org/stable4/en/extragear-office/kmymoney/details.payees.
> personalinformation.html#idm45525669640480 The given example will not work
> anymore. And I have plenty of those.
> 
> If one already has data on file it needs to be changed to work as expected.
> So to make the given case work, one would setup an exact match for 'Check'
> to be entered as '^Check$' and then no mismatch with other payees that
> include the substring 'check' will happen. And then, the match method really
> differs.
> 
> I am talking about the initial assignment for new payees needs to be setup
> and know that things can freely be customized afterwords using the UI.

Ok, I understand now how it could be resolved differently.
Doesn't it mean that from now on all payees should have MyMoneyPayee::matchKey (and not MyMoneyPayee::matchName) by default and first key should be created automatically and be something like this '^Check$' but then MyMoneyPayee::matchName wouldn't be needed anymore. Moreover new user would have to learn regular expressions syntax.

That seems like a lot of complications for average user which only have cumbersome payees.

Wouldn't it be better if regular expressions (MyMoneyPayee::matchKey) would be partially matched and plain names (MyMoneyPayee::matchName) would be exactly matched?
Comment 18 Peter J. Farley III 2016-12-31 19:57:33 UTC
I would be happy for a solution that required me (the end user) to signal "do not match this Payee, take it literally as written", perhaps with a leading special character that would not normally appear in Payee names (perhaps tilde, "~", for instance).  The import process would remove the leading special character and store the remainder of the Payee field as-is and mark that Payee as "Do not match" if it is a new Payee.

For users like me who need to pre-process CSV imports anyway, such a solution is not a lot of additional effort and is a reasonable solution to the problem which does not break any existing Payee matching procedures.

However, even with this solution I can see a potential issue for CSV exports.  In order to be useful to re-create an entire account, the Payees marked "Do not match" would need to have the "special character" added to at least the first occurrence of that Payee in the export file, and that may break or at least annoy users who export to CSV and expect to see exactly what is in the base file.

To avoid that export issue may require an option on CSV export to mark the first "Do not match" Payee with the "special character" or not, defaulting to NOT add the character, and requiring users like me to check that option to get the results that we desire.

I also have not considered what QIF/OFX imports and exports should or should not do, since I am not dealing with those formats at the moment.  I would think they have the same issue as I have reported, and may need the same solution (whatever that may become).

Peter
Comment 19 Peter J. Farley III 2016-12-31 20:06:37 UTC
Thomas, I note this statement in the link you provided to "Transaction Matching Settings":

No Matching. Disables the feature for this payee. This is the default setting for all payees.

My CSV import process test marked all new Payees with the option "Match on Payee Name" instead of "No Matching".  Is that perhaps the issue at hand?  Or is the documentation out of date?

Peter
Comment 20 Thomas Baumgart 2017-01-01 12:02:34 UTC
This probably needs some adjustment of the documentation as there is a difference for the initial value for this setting between "manually created by the user" and "automatically created during import".
Comment 21 Thomas Baumgart 2017-01-01 13:42:32 UTC
Git commit 5ecbe8f8a74fbf14fbbac06ea418731aa51d884c by Thomas Baumgart.
Committed on 01/01/2017 at 13:38.
Pushed by tbaumgart into branch 'master'.

Improve previous commit for exact name match

Only perform the exact name match on payees that have been created
automatically during import.
This keeps the behavior for existing payee data untouched while
solving the problem.

M  +3    -3    kmymoney/converter/mymoneystatementreader.cpp

https://commits.kde.org/kmymoney/5ecbe8f8a74fbf14fbbac06ea418731aa51d884c
Comment 22 NSLW 2017-01-01 14:16:43 UTC
I tested this patch as follows:
1) create new .kmy file
2) create new payee named "Chec" and select "Matching on Payee name"
3) create import profile for "KMY_TST_BUG_374352.csv"
4) import "KMY_TST_BUG_374352.csv"

Results:
Transactions with payee
"Transfer to Checking"
"Transfer to Checking Plus"
"Transfer From Checking"
"Check"
all went to payee
"Chec" (not as expected)
and no payee with one of following names
"Transfer to Checking"
"Transfer to Checking Plus"
"Transfer From Checking"
"Check"
were created (not as expected).

This patch fixes specific user's single use case and certainly doesn't resolve bigger problem.
Comment 23 Thomas Baumgart 2017-01-01 16:10:38 UTC
Sorry, in this case your test case is wrong. To get what you want do

1) create new .kmy file
2) create new payee named "Chec" and select "Matching on on a name listed below" and enter "^Chec$" into the list (w/o the quotes)
3) create import profile for "KMY_TST_BUG_374352.csv"
4) import "KMY_TST_BUG_374352.csv

I know, this is not an easy to use solution, but it will be the work-around for the 4.8 series. In KF5 my plan is to add a "Matching on name (exact)" or similar option to avoid regexp's for the 'normal' user.
Comment 24 Peter J. Farley III 2017-01-01 19:19:55 UTC
(In reply to Thomas Baumgart from comment #23)
> Sorry, in this case your test case is wrong. To get what you want do
<Trimmed>
> I know, this is not an easy to use solution, but it will be the work-around
> for the 4.8 series. In KF5 my plan is to add a "Matching on name (exact)" or
> similar option to avoid regexp's for the 'normal' user.

This slightly modified method worked for me:

1) create new .kmy file
2) create new payee named "Check" and select "Matching on on a name listed below" and enter "^Check$" into the list (w/o the quotes)
3) create import profile for "KMY_TST_BUG_374352.csv"
4) import "KMY_TST_BUG_374352.csv

Note I used Payee name "Check" instead of "Chec" and regular expression "^Check$" instead of "^Chec$".  Test case succeeds and my larger 3-year import worked correctly as well.

This is an acceptable solution for me.  I look forward to the KF5 version.

You might also consider making an additional doc update for the 4.x and KF5 versions (in your copious spare time . . . :) ) with examples of short Payee names that match longer Payee names which have the short Payee name embedded in the longer name would be helpful to other users, mentioning this technique as a solution.

Thank you very much for your prompt and knowledgeable help.

Regards,

Peter
Comment 25 Thomas Baumgart 2017-01-07 08:08:52 UTC
Git commit f642c1913c2272429a08483acaee0162f958ef3b by Thomas Baumgart.
Committed on 07/01/2017 at 08:02.
Pushed by tbaumgart into branch 'master'.

Add exact payee name match option

The current 'Match on payees name' option works on a partial match
of the name. In some scenarios, an exact match is expected. The
new option allows to select that on a per payee basis.

The default for new payees entered through the GUI is not changed.
It remains 'No matching'. The default for new payees created during
statement import is the new 'Match exact name' option.

Also, the radio button based selection has been changed to a combobox.

GUI:

M  +7    -1    kmymoney/converter/mymoneystatementreader.cpp
M  +6    -1    kmymoney/mymoney/mymoneypayee.cpp
M  +2    -1    kmymoney/mymoney/mymoneypayee.h
M  +13   -0    kmymoney/mymoney/tests/mymoneypayee-test.cpp
M  +1    -0    kmymoney/mymoney/tests/mymoneypayee-test.h
M  +13   -14   kmymoney/views/kpayeesview.cpp
M  +489  -540  kmymoney/views/kpayeesviewdecl.ui

https://commits.kde.org/kmymoney/f642c1913c2272429a08483acaee0162f958ef3b
Comment 26 Thomas Baumgart 2017-01-25 08:46:44 UTC
Git commit dd85d3b1ee08c0d3499b840b655d9500c0f07ac5 by Thomas Baumgart.
Committed on 25/01/2017 at 08:41.
Pushed by tbaumgart into branch '4.8'.

Improve previous commit for exact name match

Only perform the exact name match on payees that have been created
automatically during import.
This keeps the behavior for existing payee data untouched while
solving the problem.
(cherry picked from commit 5ecbe8f8a74fbf14fbbac06ea418731aa51d884c)

M  +3    -3    kmymoney/converter/mymoneystatementreader.cpp

https://commits.kde.org/kmymoney/dd85d3b1ee08c0d3499b840b655d9500c0f07ac5