SUMMARY When importing a simple and small CSV kmymoney takes 100% CPU and it's getting really too much time (more than 15 minutes). STEPS TO REPRODUCE 1. Import a CSV file with about 4k lines OBSERVED RESULT Too slow and 100% CPU used. EXPECTED RESULT Fast import. SOFTWARE/OS VERSIONS Windows: macOS: Linux/KDE Plasma: (available in About System) KDE Plasma Version: KDE Frameworks Version: Qt Version: ADDITIONAL INFORMATION
I"m not sure I'd call 4000 lines a small csv, but I do agree it shouldn't take 15 minutes to import. First question, which distribution are you running, and where did you download the flatpak from? I am not aware of any official flatpak version of KMyMoney. Next step is have you tried importing a csv with only ten or so rows? This could help determine if the program is getting stuck importing any csv data, or if the problem is related to the size of the file. Also probably not relevant, but how large is your kmy file? Just a few accounts, or hundreds? Again, it shouldn't really matter, but I'm looking for any information which might help figure out where it is getting stuck. Finally, how long did it eventually take for the import, or did you interrupt it before it finished?
Created attachment 154945 [details] test data Use this data file to reproduce the bug. I've imported it to a new XML file. It tool more that 10 minutes to finish.
I'm going to skip a repeated request for more information (which distribution, where did you download KMM from) and go straight to the csv file. I see only three columns in the file. The first is always "01-01-2023" a valid date. The second is always -1. I don't know what this is, but it doesn't look like an amount or payee name or a category to me. The third is a consistent very long string followed by a space and some digits. While the digits might be an amount depending on your currency, they are not in a separate column, so I don't see any way KMyMoney could parse the value. Please tell us how you mapped the columns to the various fields in the csv import wizard, or at least what you expect KMyMoney to do with those three values.
What I will ask is what type of PC are you using? I just did import the csv to a newly created XML file with one checking account. It accepted column 1 as the date, column 2 as the payee, and column 3 as the payment amount. I have no idea what currency processing it used, but it did take the digits at the end of the values in the third column as whole dollars. The entire import took well under one minute, although I didn't use a stopwatch. My PC has an AMD Ryzen at about 3.5GHz. The multiple cores are irrelevant, as KMyMoney only uses a single thread. The cause of your slow import seems not to be your XML file, the csv file, or KMyMoney. Are both your files on a local hard drive or on a network? If local, are you sure the drive is healthy? If network, have you testing network throughput? Is there perhaps anything else odd or uncommon about your setup?
Hi, sorry for the few information provided. You imported my sample data as expected. I'm using a "standard" setup, no network involved. My drive is healthy. My PC is a DELL with 6 cores i7-8750H CPU @ 2.20GHz. I'm under Alpine Linux. I'm checking under another distro.
No sorry you should import it as follows: first column: date second column: amount third column: payee I'm using the third column as payee because I don't have the payee specified in my CSV exported from the bank. So I'll try to match the correct payee later using the automated matching feature. I've downloaded and installed kmymoney using the official Alpine Linux repository. Can you reproduce the bug now?
confirmed, yeah, this is what I have: 1) same 100% cpu 2) took 20 minutes to import that file.
Importing 100 rows took under 3 seconds. I only let the whole file go about 7 minutes before killing it, so it may well have gone near or over 10 minutes.. My best guess is that the time to process each row is increasing linearly with the number of payees, because it is creating a new payee for each row (the text may be the same, but the digits at the end are different) and for each row, it needs to compare against all the existing payees to see if the payee already exists. I would personally call your sample csv is a pathological example (meaning it triggers a possibly problem behavior in the worst possible way.) To be sure, I would time the import of 100, 200, 300, ... rows to see how the time compares to number of rows. My personal data file, which has data for well over 10 years, has 950 accounts (including 723 stock accounts, many closed) but only 335 payees. Note I don't keep a separate payee for absolutely every different possible one - I have some like "Miscellaneous gas station" and "Miscellaneous grocery store" with individual ones only for Payees I frequently use. So - I see how that file can take an excessively long time to import, but I think it is because the file is not really a good example of real data. If you really do expect to be importing thousands of transactions at a time, with very little possibility of recognizing payees as already known, then you will end up with slow imports, and I can't think of any way to make the program faster at what it has to do. Note it might well be notably faster if you used shorter payee examples, as your samples are all over 200 characters.
I have a suggestion. Add a fourth value to each row, "payee." Then import that column (4) as the payee, and import column 3 as the memo. I did this, and it took 45 seconds. If your point is to manually assign the payee later, then you probably don't really want 4000 new payees you will have to delete anyway. I am also assuming that column 3 is intended to be what the bank provides, which actually includes multiple bits of information, so it can't directly be the payee. Let me know how this works for you.
Hi Jack, thanks for your suggestion. I can indeed import it as Memo, as it really is. But I also need to import it as a Payee because I can't match the Memo field to determine the Payee (and thus it's category) automatically, using the bank-provided information alone. As you can imagine, I can't / don't want to do that job manually each time I import a CSV, even a small one. Instead, I need something that can add all the metadata automatically (including payees and categories). If I could match the Memo field and popupate the Payee and the Category fields, for example using another kMyMoney feature with regular expressions, as done in the Payee tab, it would be perfect. Is that possible? If not, do you have some other suggestions? Remember: my bank-provided CSVs don't contain the Payee nor Category information but only the Description field. My banks don't give me that information ready to import (and even if so, I would want to use my own categories...). Thank you.
First, I do not believe there is any way to match the Category on import. The Category is set if the imported transaction is matched to an existing transaction. If there is not already a wishlist for that, perhaps we can eventually alter this bug for that purpose. Second, there are some import features available for online OFX import which might help, but are unfortunately not available for csv import. I have no idea how hard it would be to extend those features to file import. Finally, I'm actually not sure I understand how you plan to find the actual Payee from the long imported Payee. Payee matching happens at import time - I'm not aware of being able to do it after the import. Have you read the Handbook chapter on Payees, specifically the section on Transaction Matching Settings? https://docs.kde.org/stable5/en/kmymoney/kmymoney/details.payees.personalinformation.html#details.payees.personalinformation.matching Before I make any suggestions, I would need to see some real examples of the Description field your bank provides. Can you provide some? If it does include the Payee name, then you could set that payee to match on that name. Finally, I wonder if what you really want to do is to pre-process your downloaded csv to extract explicit payee and category columns from the description. It might be an extra step, but once you write the script, the extra work for each import might be less than what you plan. Separately, even if you continue as you started, how may payees do you eventually expect to have? The excessive import time is because each row needs to compare the "payee" (description) to every existing payee, looking for a match. If you are only importing a hundred rows at a time, and you only have a few hundred Payees, it won't be instant, but much less than 20 minutes.
Hi Jack, thank you for your suggestions. Since there is no other solution, I think I will end up preprocessing the CSV as you said. Regards
First: what is the point to match a payee name having 213 characters? I updated a sample file to have "Some Payee Name" and an import finished in less than 2 minutes, well it is very slow, but not too excessive. For the future we need to see why it is slow. I would say it should run for maybe like 5-10 seconds. File may contain everything, this is an example what i'm importing: "Trans. Date","Post Date",Description,Amount,Category 2022-02-14,02/14/2022,"ZYBOOK COLLEGE TXTBOOK 4086349926 CA",72.00,Services "Services" is a category, it gets automatically imported and matched. NOTE: updated file is attached.
Created attachment 154977 [details] File with updated payee name from excessive 213 chars to reasonable.
Dear Bug Submitter, This bug has been in NEEDSINFO status with no change for at least 15 days. Please provide the requested information as soon as possible and set the bug status as REPORTED. Due to regular bug tracker maintenance, if the bug is still in NEEDSINFO status with no change in 30 days the bug will be closed as RESOLVED > WORKSFORME due to lack of needed information. For more information about our bug triaging procedures please read the wiki located here: https://community.kde.org/Guidelines_and_HOWTOs/Bug_triaging If you have already provided the requested information, please mark the bug as REPORTED so that the KDE team knows that the bug is ready to be confirmed. Thank you for helping us make KDE software even better for everyone!
This bug has been in NEEDSINFO status with no change for at least 30 days. The bug is now closed as RESOLVED > WORKSFORME due to lack of needed information. For more information about our bug triaging procedures please read the wiki located here: https://community.kde.org/Guidelines_and_HOWTOs/Bug_triaging Thank you for helping us make KDE software even better for everyone!