Bug 343848

Summary: simon model compilation fails, stuck at 20%. filename encoding
Product: [Applications] simon Reporter: Dimitrios Papantoniou <papantoniou.dimitris>
Component: simon Assignee: Peter Grasch <me>
Status: RESOLVED FIXED    
Severity: grave CC: papantoniou.dimitris
Priority: NOR    
Version: 0.4.1   
Target Milestone: ---   
Platform: Ubuntu   
OS: Linux   
URL: http://codeviewer.org/view/code:493a
Latest Commit: Version Fixed In:

Description Dimitrios Papantoniou 2015-02-06 02:22:25 UTC
Both simon and sam are stuck at 20% for a couple of minutes, and then I get an error log (attached) plus a message that I don't have a large enough training corpus.
I have some 200 recordings, though. In the summer I also recorded several hours of speech, at least a couple of thousand sentence-long samples, on another machine, but I got exactly the same error messages.
There is another thing that worries me. If you look at the error log, the file names of the Greek sound files consist of unrecognizable characters. I went into the simon directory, and all file names look like this. This must be something in the 0.4 series: I checked an old 0.3 installation on an old machine, and the file names looked normal. It is also simon-specific and happens on both SUSE and Ubuntu/Mint. Could it be related to the error? It does produce a lot of warnings in the error log.
WARNING: Error in '/tmp/kde-k8oylos/sam/model/internalSamuser{384b5872-2331-4f81-8d4a-091dd1616103}/etc/internalSamuser{384b5872-2331-4f81-8d4a-091dd1616103}_train.fileids', the feature file '/tmp/kde-k8oylos/sam/model/internalSamuser{384b5872-2331-4f81-8d4a-091dd1616103}/feat/�³�¹�¬�½�½�·��_�µ���¹���­�»�¿����_�µ���¹���­�»�¿����_�µ�¯�½�±�¹_�³�¹�¬�½�½�·��_S13_2014-12-25_02-34-04.0.mfc' does not exist, or is empty
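The mangled name in the warning is consistent with a common mojibake pattern: UTF-8 bytes misread as Latin-1 (ISO-8859-1), so every two-byte Greek letter turns into two Latin-1 characters (here the lead byte apparently shows up as �). Read that way, the first segment of the file name would be the Greek word γιάννης. The Python sketch below assumes that mechanism, which the report itself does not confirm; `mangle` and `repair` are hypothetical helper names used only for illustration.

```python
# Hypothetical sketch of the suspected file-name mangling.
# ASSUMPTION: a UTF-8 file name was decoded as Latin-1 instead of UTF-8
# somewhere in the pipeline; this is not confirmed by the bug report.

def mangle(name: str) -> str:
    """Encode as UTF-8, then misread the bytes as Latin-1 (one char per byte)."""
    return name.encode("utf-8").decode("latin-1")


def repair(mangled: str) -> str:
    """Undo mangle(): Latin-1 maps each char back to its byte, then decode as UTF-8."""
    return mangled.encode("latin-1").decode("utf-8")


original = "γιάννης"  # seven Greek letters, two UTF-8 bytes each
garbled = mangle(original)

# For this word every letter becomes a pair led by 'Î' (0xCE) or 'Ï' (0xCF);
# ascii() escapes the result so it prints on any terminal.
print(ascii(garbled))  # '\xce\xb3\xce\xb9\xce\xac\xce\xbd\xce\xbd\xce\xb7\xcf\x82'
print(len(original), len(garbled))  # 7 14
print(repair(garbled) == original)  # True
```

The inverse only works on intact text: the name in the log above already contains U+FFFD replacement characters (�), so the original bytes cannot be recovered from the log alone.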

Discussed previously on this thread

Reproducible: Always

Steps to Reproduce:
1. Add Greek words to the vocabulary in the default scenario
2. Train a few samples
3. Compile the model

Actual Results:  
Text with weird encoding in file names; model compilation stuck as described above.


Link to training folder
https://drive.google.com/file/d/0B4gzqgHGk05CbExxMWJFZ3FrWnM/view?usp=sharing
Comment 1 Peter Grasch 2015-07-26 12:57:53 UTC
This has been fixed in the current development version.