Bug 289282 - USDA database incomplete & has wrong ID numbers
Status: CONFIRMED
Alias: None
Product: krecipes
Classification: Applications
Component: general
Version: 2.0-beta2
Platform: Debian testing Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Unassigned bugs mailing-list
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-12-18 14:13 UTC by Anthony DeRobertis
Modified: 2016-06-20 22:45 UTC

See Also:
Latest Commit:
Version Fixed In:


Attachments
Patch to update to SR24 (824.52 KB, application/x-xz)
2011-12-18 16:24 UTC, Anthony DeRobertis
Program to aid in verifying ingredient-data-en_US.txt (793 bytes, application/x-perl)
2011-12-22 07:35 UTC, Anthony DeRobertis
Patch to update ingredient data file (instead of generate from USDA data as in other patch) (13.26 KB, patch)
2011-12-22 07:46 UTC, Anthony DeRobertis

Description Anthony DeRobertis 2011-12-18 14:13:02 UTC
Version:           2.0-beta2 (using KDE 4.6.5) 
OS:                Linux

I'm trying to find 'red wine vinegar' in the nutrition information. Unfortunately, only two vinegars are listed—cider and distilled.

But, it should be there: http://www.nal.usda.gov/fnic/foodcomp/cgi-bin/list_nut_edit.pl?NDB_NO=02068&FDGP_CD=0200&FOOD_NAME=Vinegar%252c%2520red%2520wine&SCI_NAME=&COM_NAME=&MSRE_NO0=100grams&GRAMS_100=1.00&1=1.00&2=1.00&3=1.00&NUMBER_OF_CHECKBOXES=3

Its NDB ID is 02068, so I tried sorting by ID in the dialog, and 2068 (I guess leading 0s are stripped?) is "HORMEL CANADIAN STYLE BACON".

I think this is because abbrev.txt needs to be updated. I don't know whether the format has changed (hopefully not), but the ID numbers definitely have.
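
To illustrate the leading-zero guess above: if IDs are stored as integers, the USDA number 02068 and the 2068 shown in the dialog are the same key. This is only a minimal sketch; the function names are hypothetical and not taken from krecipes, and the five-digit zero padding is assumed from the 02068 example.

```python
def normalize_ndb_no(raw: str) -> int:
    """Turn an NDB number as written by the USDA (e.g. '02068') into an integer key."""
    # Assumption: the ID is purely numeric; surrounding whitespace is ignored.
    return int(raw.strip())


def format_ndb_no(key: int) -> str:
    """Render an integer key back in zero-padded form (five digits assumed)."""
    return f"{key:05d}"


if __name__ == "__main__":
    key = normalize_ndb_no("02068")
    print(key, format_ndb_no(key))
```

With a lookup like this, a wrong name showing up under 2068 points to stale data rather than to the zero stripping itself.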


Reproducible: Always

Steps to Reproduce:
N/A

Actual Results:  
N/A

Expected Results:  
N/A

OS: Linux (x86_64) release 3.1.0-1-amd64
Compiler: gcc
Comment 1 Anthony DeRobertis 2011-12-18 14:51:41 UTC
Actually, looking at the old abbrev.txt file, I have no idea where that ID (for the Canadian bacon) came from—it doesn't appear in the NDB file.

Updating abbrev.txt seems to have mostly worked, but it looks like there are several other data files I need to update as well.
Comment 2 Anthony DeRobertis 2011-12-18 14:56:04 UTC
Also, it appears that there is an 'ingredient-data-en-US' which I guess was done by hand? Unfortunately, it contains incorrect data, for example:

tomatoes, stewed:11693

but 11693 is crushed tomatoes. Stewed tomatoes are 11533 ("Tomatoes, red, ripe, canned, stewed").

I'm going to replace it with data from FOOD_DES.txt; will upload all the new files once I finish them...
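
A rough sketch of the kind of cross-check described above: print the USDA description next to each hand-written entry so mismatches like the stewed/crushed one stand out. It assumes FOOD_DES.txt uses caret-separated fields with text wrapped in tildes, and that the ingredient file has one "name:id" entry per line, as in the example line. The function names and the sample rows are illustrative only; in particular the description used for 11693 is a placeholder, not copied from the USDA file.

```python
# Illustrative rows only, trimmed to four fields; not real USDA file contents.
SAMPLE_FOOD_DES = [
    "~11533~^~1100~^~Tomatoes, red, ripe, canned, stewed~^~TOMATOES,RED,RIPE,CND,STWD~",
    "~11693~^~1100~^~Tomatoes, crushed, canned~^~TOMATOES,CRUSHED,CND~",
]


def parse_food_des(lines):
    """Map integer NDB number -> long description (assumed to be the third field)."""
    foods = {}
    for line in lines:
        fields = [field.strip("~") for field in line.rstrip("\r\n").split("^")]
        if len(fields) >= 3 and fields[0].isdigit():
            foods[int(fields[0])] = fields[2]
    return foods


def check_ingredients(lines, foods):
    """Return (name, id, usda_description_or_None) for every 'name:id' line."""
    report = []
    for line in lines:
        # Split on the last colon, in case a name itself contains one.
        name, sep, raw_id = line.strip().rpartition(":")
        if not sep or not raw_id.strip().isdigit():
            continue  # skip blank or malformed lines
        ndb_no = int(raw_id)
        report.append((name, ndb_no, foods.get(ndb_no)))
    return report


if __name__ == "__main__":
    foods = parse_food_des(SAMPLE_FOOD_DES)
    for name, ndb_no, desc in check_ingredients(["tomatoes, stewed:11693"], foods):
        print(f"{name!r} -> {ndb_no}: {desc or 'NOT FOUND'}")
```

The output still has to be read by a person; the script only makes the comparison quick, it does not decide which entries are wrong.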
Comment 3 Anthony DeRobertis 2011-12-18 16:24:05 UTC
Created attachment 66870
Patch to update to SR24

This updates the weights.txt file as well, even though krecipes is (unfortunately!) not using it.

The number of fields in abbrev.txt changed, so I updated a define.

(xz'd; Bugzilla refused the patch for being too large. USDA data files are large. Not much I can do about that. gzip and bzip both exceeded 1MB)
Comment 4 Anthony DeRobertis 2011-12-21 09:07:30 UTC
It appears that the ingredient-data-en-US.txt file also controls which ingredients are loaded by default, and its common names help in recipe matching.

I'm going through it and fixing it, but it's going to take a bit. I'm a third of the way through, and there are a *lot* of mistakes in it.
Comment 5 Anthony DeRobertis 2011-12-22 07:35:46 UTC
Created attachment 67007
Program to aid in verifying ingredient-data-en_US.txt
Comment 6 Anthony DeRobertis 2011-12-22 07:46:09 UTC
Created attachment 67008
Patch to update ingredient data file (instead of generate from USDA data as in other patch)

Some entries were plain wrong: they pointed to the wrong food. Many of them were close matches, so they may date from before the correct food existed in the USDA database. But that strategy doesn't really work, as it leaves incorrect data imported. So where I couldn't find a match, I just deleted the entry.

I also changed some names to what the foods are normally called in the US. Where that could lead to confusion, I've added more to the name to clarify.

I may have missed some...

At this point, I'm done with spamming this bug for a while. Hope to actually use krecipes now...