Both Simon and Sam get stuck at 20% for a couple of minutes, and then I get an error log (attached) plus a message that I don't have a large enough training corpus. I have some 200 recordings, though. Last summer I had also recorded several hours of speech, at least a couple of thousand sentence-long samples, on another machine, and I was getting exactly the same error messages.

There is another thing that worries me. If you take a look at the error log, the file names for the Greek sound files consist of unrecognizable characters. I looked in the Simon directory, and all file names look like this. This must be something in the 0.4 series: I checked an old 0.3 installation on an old machine and the file names looked normal. It is also Simon-specific, and happens on both SUSE and Ubuntu/Mint. Can it be related to the error? It does produce a whole lot of warnings in the error log:

WARNING: Error in '/tmp/kde-k8oylos/sam/model/internalSamuser{384b5872-2331-4f81-8d4a-091dd1616103}/etc/internalSamuser{384b5872-2331-4f81-8d4a-091dd1616103}_train.fileids', the feature file '/tmp/kde-k8oylos/sam/model/internalSamuser{384b5872-2331-4f81-8d4a-091dd1616103}/feat/Ã�³Ã�¹Ã�¬Ã�½Ã�½Ã�·Ã�Â�_Ã�µÃ�Â�Ã�¹Ã�Â�Ã�ÂÃ�»Ã�¿Ã�Â�Ã�Â�_Ã�µÃ�Â�Ã�¹Ã�Â�Ã�ÂÃ�»Ã�¿Ã�Â�Ã�Â�_Ã�µÃ�¯Ã�½Ã�±Ã�¹_Ã�³Ã�¹Ã�¬Ã�½Ã�½Ã�·Ã�Â�_S13_2014-12-25_02-34-04.0.mfc' does not exist, or is empty

Discussed previously on this thread.

Reproducible: Always

Steps to Reproduce:
1. Add Greek words to the vocabulary in the default scenario
2. Train a few samples
3. Compile the model

Actual Results:
Text with garbled encoding in the file names; model compilation gets stuck as described above.

Link to training folder: https://drive.google.com/file/d/0B4gzqgHGk05CbExxMWJFZ3FrWnM/view?usp=sharing
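For reference, unrecognizable file names of this kind are typical of UTF-8 text being decoded with the wrong single-byte codec, sometimes more than once. The Python sketch below only illustrates that general mechanism; it is an assumption about the symptom, not a confirmed diagnosis of what Simon 0.4 does. The helpers `mis_decode` and `repair` and the sample word are hypothetical and chosen for illustration.

```python
# Minimal sketch of how Greek file names can turn into mojibake.
# Assumption: the names are valid UTF-8 that gets decoded as Latin-1
# somewhere along the way. This is not taken from Simon's source code.

def mis_decode(name: str, wrong_codec: str = "latin-1") -> str:
    """Encode as UTF-8, then decode the bytes with the wrong codec."""
    return name.encode("utf-8").decode(wrong_codec)


def repair(name: str, wrong_codec: str = "latin-1") -> str:
    """Reverse one round of mis-decoding."""
    return name.encode(wrong_codec).decode("utf-8")


original = "γιάννης"            # illustrative Greek word
garbled = mis_decode(original)   # one round of damage
twice = mis_decode(garbled)      # a second round, even less readable

# Each Greek letter is 2 bytes in UTF-8, so the length doubles per round.
print(len(original), len(garbled), len(twice))

# The damage is reversible as long as no bytes were dropped on the way.
print(repair(repair(twice)) == original)
```

If bytes were replaced or dropped during the conversion (for example shown as a replacement character), the original name cannot be recovered this way.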
This has been fixed in the current development version.