Bug 343848 - simon model compilation fails, stuck at 20%. filename encoding
Summary: simon model compilation fails, stuck at 20%. filename encoding
Status: RESOLVED FIXED
Alias: None
Product: simon
Classification: Applications
Component: simon
Version: 0.4.1
Platform: Ubuntu Linux
Importance: NOR grave
Target Milestone: ---
Assignee: Peter Grasch
URL: http://codeviewer.org/view/code:493a
Keywords:
Depends on:
Blocks:
 
Reported: 2015-02-06 02:22 UTC by Dimitrios Papantoniou
Modified: 2015-07-26 12:57 UTC

See Also:
Latest Commit:
Version Fixed In:


Attachments

Description Dimitrios Papantoniou 2015-02-06 02:22:25 UTC
Both simon and sam get stuck at 20% for a couple of minutes, and then I get an error log (attached) plus a message that I don't have a large enough training corpus.
I have some 200 recordings, though. In the summer I had also recorded several hours of speech, at least a couple of thousand sentence-long samples, on another machine, but I was getting precisely the same error messages.
There is another thing that worries me. If you take a look at the error log, the file names for the Greek sound files consist of unrecognizable characters. I went into the simon directory, and all file names look like this. This must be something with the 0.4 series: I checked an old 0.3 installation on an old machine and the file names looked normal. It is also simon-specific and happens on both SUSE and Ubuntu/Mint. Can it be related to the error? It does produce a whole lot of warnings in the error log.
WARNING: Error in '/tmp/kde-k8oylos/sam/model/internalSamuser{384b5872-2331-4f81-8d4a-091dd1616103}/etc/internalSamuser{384b5872-2331-4f81-8d4a-091dd1616103}_train.fileids', the feature file '/tmp/kde-k8oylos/sam/model/internalSamuser{384b5872-2331-4f81-8d4a-091dd1616103}/feat/�³�¹�¬�½�½�·��_�µ���¹���­�»�¿����_�µ���¹���­�»�¿����_�µ�¯�½�±�¹_�³�¹�¬�½�½�·��_S13_2014-12-25_02-34-04.0.mfc' does not exist, or is empty
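The pattern in the warning above looks like UTF-8 bytes being read back in a single-byte encoding. The following is a minimal sketch of that mechanism only, assuming the names were written as UTF-8 and later decoded as Latin-1; it is not a confirmed diagnosis of what simon 0.4 does. The function names and the sample word are hypothetical and chosen for illustration.

```python
def mojibake(name: str) -> str:
    """Encode a name as UTF-8, then (wrongly) decode those bytes as Latin-1."""
    return name.encode("utf-8").decode("latin-1")


def repair(garbled: str) -> str:
    """Undo the mistake: recover the raw bytes and decode them as UTF-8."""
    return garbled.encode("latin-1").decode("utf-8")


# Hypothetical sample word; not taken from the training data.
original = "είναι"
garbled = mojibake(original)

# Each Greek letter is two bytes in UTF-8, so the garbled form is twice as long.
print(len(original), len(garbled))
print(garbled)
print(repair(garbled) == original)
```

If the assumption holds, the round trip is lossless, so affected file names could in principle be renamed back rather than re-recorded.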

Discussed previously on this thread

Reproducible: Always

Steps to Reproduce:
1. Add Greek words to the vocabulary in the default scenario
2. Train a few samples
3. Compile the model

Actual Results:  
Text with weird encoding in file names; model compilation stuck as described above.


Link to training folder
https://drive.google.com/file/d/0B4gzqgHGk05CbExxMWJFZ3FrWnM/view?usp=sharing
Comment 1 Peter Grasch 2015-07-26 12:57:53 UTC
This has been fixed in the current development version.