343848 – simon model compilation fails, stuck at 20%. filename encoding

Status:	RESOLVED FIXED

Alias:	None

Product:	simon
Classification:	Unmaintained
Component:	simon (other bugs)
Version First Reported In:	0.4.1
Platform:	Ubuntu Linux

Importance:	NOR grave
Target Milestone:	---
Assignee:	Peter Grasch

URL:	http://codeviewer.org/view/code:493a

Reported:	2015-02-06 02:22 UTC by Dimitrios Papantoniou
Modified:	2015-07-26 12:57 UTC (History)

Description Dimitrios Papantoniou 2015-02-06 02:22:25 UTC

Both simon and sam stay stuck at 20% for a couple of minutes, and then I get an error log (attached) plus a message that I don't have a large enough training corpus.
I have some 200 recordings, though. In the summer I had also recorded several hours of speech, at least a couple of thousand sentence-long samples, on another machine, but I was getting precisely the same error messages.
There is another thing that worries me. If you take a look at the error log, the file names of the Greek sound files consist of unrecognizable characters. I looked in the simon directory, and all file names look like this. This must be something in the 0.4 series: I checked an old 0.3 installation on an old machine, and the file names looked normal there. It is also simon-specific and happens on both SUSE and Ubuntu/Mint. Could it be related to the error? The error log does contain a whole lot of warnings like this one:
WARNING: Error in '/tmp/kde-k8oylos/sam/model/internalSamuser{384b5872-2331-4f81-8d4a-091dd1616103}/etc/internalSamuser{384b5872-2331-4f81-8d4a-091dd1616103}_train.fileids', the feature file '/tmp/kde-k8oylos/sam/model/internalSamuser{384b5872-2331-4f81-8d4a-091dd1616103}/feat/Ã�Â³Ã�Â¹Ã�Â¬Ã�Â½Ã�Â½Ã�Â·Ã�Â�_Ã�ÂµÃ�Â�Ã�Â¹Ã�Â�Ã�ÂÃ�Â»Ã�Â¿Ã�Â�Ã�Â�_Ã�ÂµÃ�Â�Ã�Â¹Ã�Â�Ã�ÂÃ�Â»Ã�Â¿Ã�Â�Ã�Â�_Ã�ÂµÃ�Â¯Ã�Â½Ã�Â±Ã�Â¹_Ã�Â³Ã�Â¹Ã�Â¬Ã�Â½Ã�Â½Ã�Â·Ã�Â�_S13_2014-12-25_02-34-04.0.mfc' does not exist, or is empty
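For readers unfamiliar with this kind of garbling: the visible characters in the file name above (runs of "Ã", "Â" and symbols such as "³", "¹", "½") are what typically appears when UTF-8 text is decoded as Latin-1 and the mistake is applied twice. The Python sketch below only illustrates that general mechanism. The sample word and the helper names `mis_decode` and `repair` are assumptions made for the illustration, and nothing here is taken from simon's code or confirmed as the actual cause of this bug.

```python
# Sketch of "double mojibake": UTF-8 bytes wrongly decoded as Latin-1, twice.
# The sample word and the helper names are illustrative assumptions only.

def mis_decode(text: str, rounds: int = 1) -> str:
    """Encode as UTF-8, then wrongly decode the bytes as Latin-1, `rounds` times."""
    for _ in range(rounds):
        text = text.encode("utf-8").decode("latin-1")
    return text


def repair(text: str, rounds: int = 1) -> str:
    """Undo mis_decode by reversing each wrong decoding step."""
    for _ in range(rounds):
        text = text.encode("latin-1").decode("utf-8")
    return text


original = "γιάννης"  # assumed Greek sample word, not taken from the report
garbled = mis_decode(original, rounds=2)

# ascii() makes the invisible C1 control characters visible as escapes.
print(ascii(garbled))
print(repair(garbled, rounds=2) == original)
```

Because Latin-1 maps every byte to exactly one code point, the garbling is lossless and can be reversed as long as the bytes were not altered afterwards. Whether the file names on disk in this report can be recovered that way is not something the report establishes.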

Discussed previously on this thread

Reproducible: Always

Steps to Reproduce:
1. Add Greek words to the vocabulary in the default scenario
2. Train a few samples
3. Compile the model

Actual Results:
File names with garbled encoding; model compilation stuck as described above.

Link to training folder
https://drive.google.com/file/d/0B4gzqgHGk05CbExxMWJFZ3FrWnM/view?usp=sharing

Comment 1 Peter Grasch 2015-07-26 12:57:53 UTC

This has been fixed in the current development version.