Bug 289282

Summary: USDA database incomplete & has wrong ID numbers
Product: [Applications] krecipes
Reporter: Anthony DeRobertis <anthony>
Component: general
Assignee: Unassigned bugs mailing-list <unassigned-bugs>
Status: CONFIRMED ---    
Severity: normal
CC: aacid, sgmoore, unassigned-bugs
Priority: NOR    
Version: 2.0-beta2   
Target Milestone: ---   
Platform: Debian testing   
OS: Linux   
Latest Commit:
Version Fixed In:
Attachments:
  Patch to update to SR24
  Program to aid in verifying ingredient-data-en_US.txt
  Patch to update ingredient data file (instead of generate from USDA data as in other patch)

Description Anthony DeRobertis 2011-12-18 14:13:02 UTC
Version:           2.0-beta2 (using KDE 4.6.5) 
OS:                Linux

I'm trying to find 'red wine vinegar' in the nutrition information. Unfortunately, only two vinegars are listed—cider and distilled.

But, it should be there: http://www.nal.usda.gov/fnic/foodcomp/cgi-bin/list_nut_edit.pl?NDB_NO=02068&FDGP_CD=0200&FOOD_NAME=Vinegar%252c%2520red%2520wine&SCI_NAME=&COM_NAME=&MSRE_NO0=100grams&GRAMS_100=1.00&1=1.00&2=1.00&3=1.00&NUMBER_OF_CHECKBOXES=3

Its NDB ID is 02068, so I tried sorting by ID in the dialog, and 2068 (I guess leading 0s are stripped?) is "HORMEL CANADIAN STYLE BACON".
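
To illustrate the leading-zero guess: if an NDB number is stored or shown as an integer, 02068 turns into 2068. The Python sketch below is only an illustration, not krecipes code (krecipes is not written in Python). The function names are made up, and it assumes NDB numbers are five-digit strings, as in the USDA URL above.

```python
def ndb_to_int(ndb_no: str) -> int:
    """Parse an NDB number as an integer; leading zeros are lost."""
    return int(ndb_no)


def int_to_ndb(value: int, width: int = 5) -> str:
    """Re-pad an integer ID to the assumed five-digit NDB form."""
    return str(value).zfill(width)


print(ndb_to_int("02068"))  # prints 2068
print(int_to_ndb(2068))     # prints 02068
```

Comparing IDs as zero-padded strings avoids the mismatch between what the USDA site shows and what the dialog shows.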

I think this is because you need to update abbrev.txt. No idea if the format has changed, hopefully not. The ID numbers definitely have.


Reproducible: Always

Steps to Reproduce:
N/A

Actual Results:  
N/A

Expected Results:  
N/A

OS: Linux (x86_64) release 3.1.0-1-amd64
Compiler: gcc
Comment 1 Anthony DeRobertis 2011-12-18 14:51:41 UTC
Actually, looking at the old abbrev.txt file, I have no idea where that ID (for the Canadian bacon) came from—it doesn't appear in the NDB file.

Updating abbrev.txt seems to have mostly worked, but it looks like there are several other data files I need to update as well.
Comment 2 Anthony DeRobertis 2011-12-18 14:56:04 UTC
Also, it appears that there is an 'ingredient-data-en_US.txt' file, which I guess was done by hand? Unfortunately, it contains incorrect data, for example:

tomatoes, stewed:11693

but 11693 is crushed tomatoes. Stewed tomatoes are 11533 ("Tomatoes, red, ripe, canned, stewed").

I'm going to replace it with data from FOOD_DES.txt; will upload all the new files once I finish them...
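
A rough sketch of how such a cross-check could look is below. It is not the program attached later in this bug. It assumes that the ingredient file holds one 'name:id' pair per line (as in the example above), that FOOD_DES.txt uses the usual USDA SR layout ('^' between fields, '~' around text fields, NDB number first, long description third), and that IDs may need zero-padding to five digits. All function names are hypothetical.

```python
def parse_food_des(lines):
    """Map NDB number -> long description from FOOD_DES.txt-style lines.

    Assumed layout: fields separated by '^', text fields wrapped in '~';
    field 0 is the NDB number, field 2 is the long description.
    """
    foods = {}
    for line in lines:
        fields = [f.strip("~") for f in line.rstrip("\r\n").split("^")]
        if len(fields) >= 3:
            foods[fields[0]] = fields[2]
    return foods


def parse_ingredient_data(lines):
    """Yield (name, ndb_no) pairs from 'name:id' lines (assumed format)."""
    for line in lines:
        line = line.strip()
        if not line or ":" not in line:
            continue
        # Split on the last colon so names containing ':' survive.
        name, _, ndb_no = line.rpartition(":")
        yield name.strip(), ndb_no.strip().zfill(5)


def report(ingredient_lines, food_des_lines):
    """Pair each common name with the USDA description for manual review."""
    foods = parse_food_des(food_des_lines)
    return [
        (name, ndb_no, foods.get(ndb_no, "<not in FOOD_DES>"))
        for name, ndb_no in parse_ingredient_data(ingredient_lines)
    ]


# Hand-written sample rows, truncated to three fields for illustration;
# the 11693 description paraphrases "crushed tomatoes" from the comment.
sample_food_des = [
    "~11533~^~1100~^~Tomatoes, red, ripe, canned, stewed~",
    "~11693~^~1100~^~Tomatoes, crushed, canned~",
]
for row in report(["tomatoes, stewed:11693"], sample_food_des):
    print(row)
```

The output puts the common name next to the USDA description, so a mismatch like "tomatoes, stewed" against a crushed-tomatoes entry is easy to spot by eye. Deciding which ID is actually right still needs a human.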
Comment 3 Anthony DeRobertis 2011-12-18 16:24:05 UTC
Created attachment 66870 [details]
Patch to update to SR24

This updates the weights.txt file as well, even though krecipes is (unfortunately!) not using it.

The number of fields in abbrev.txt changed, so I updated a define.
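
One way to notice this kind of format change before it breaks an importer is to count the fields per line. The Python sketch below assumes abbrev.txt uses the same '^'-separated layout as the other USDA SR files; `field_counts` is a hypothetical helper, not part of krecipes or the patch.

```python
from collections import Counter


def field_counts(lines):
    """Count how many non-blank lines have each number of '^'-separated fields."""
    return Counter(
        len(line.rstrip("\r\n").split("^"))
        for line in lines
        if line.strip()
    )
```

If the result has a single key, that key is the field count to compare against the define; more than one key suggests malformed lines or a mixed-format file.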

(xz'd; Bugzilla refused the patch for being too large. USDA data files are large. Not much I can do about that. gzip and bzip both exceeded 1MB)
Comment 4 Anthony DeRobertis 2011-12-21 09:07:30 UTC
It appears that the ingredient-data-en_US.txt file also controls which ingredients are loaded by default, and the common names in it help with recipe matching.

I'm going through it and fixing it, but it's going to take a bit. I'm a third of the way through, and there are a *lot* of mistakes in it.
Comment 5 Anthony DeRobertis 2011-12-22 07:35:46 UTC
Created attachment 67007 [details]
Program to aid in verifying ingredient-data-en_US.txt
Comment 6 Anthony DeRobertis 2011-12-22 07:46:09 UTC
Created attachment 67008 [details]
Patch to update ingredient data file (instead of generate from USDA data as in other patch)

Some entries were plain wrong: they pointed to the wrong food. Many of them were close matches, so they may have come from before the correct food existed in the USDA database. But that strategy doesn't really work, as it leaves incorrect data imported. So where I couldn't find a match, I just deleted the entry.

I also fixed some names, to use what they're normally called in the US. Though where that leads to confusion, I've added more to the name to clarify.

I may have missed some...

At this point, I'm done with spamming this bug for a while. Hope to actually use krecipes now...