Bug 175071

Summary: add automatic download of pronunciation files from Wikimedia Commons
Product: [Applications] parley
Reporter: Piotr Kubowicz <derbeth>
Component: general
Assignee: Parley Developers <parley-devel>
Status: REPORTED ---    
Severity: wishlist    
Priority: NOR    
Version: unspecified   
Target Milestone: ---   
Platform: Microsoft Windows   
OS: Microsoft Windows   
Latest Commit: Version Fixed In:
Sentry Crash Report:
Attachments: Perl script fetching information about pronunciation from Commons

Description Piotr Kubowicz 2008-11-13 19:23:14 UTC
Version:            (using KDE 4.1.3)
OS:                MS Windows
Installed from:    MS Windows

The main Parley website suggests helping by recording pronunciation files for Parley. This does not make any sense: Wikimedia Commons already has about 12 000 freely licensed audio files for English (http://commons.wikimedia.org/wiki/Category:Pronunciation), about 9 000 for Dutch, 7 000 for French, 5 000 for Russian, and many, many more. Pronunciation files exist for _75_ languages. The task is to make use of this material.

My bug 175070 proposes a user interface for fetching files and displaying their copyright information:
http://bugs.kde.org/show_bug.cgi?id=175070

In addition, there could be a dedicated Parley directory for downloaded pronunciation files, so that they are not duplicated across dictionaries. However, because such files are small (about 17 KB each), this is not urgent.
Comment 1 Piotr Kubowicz 2008-11-13 19:33:32 UTC
Created attachment 28546 [details]
Perl script fetching information about pronunciation from Commons

This is the Perl script I use to find out which pronunciation files are available in Wikimedia Commons for each language. It's a bit tricky: although these files are organised into categories, they follow a naming convention that encodes the language and sometimes also the dialect. So, Image:De-Buch.ogg is the pronunciation of "Buch" in German (case-sensitive) and Image:En-us-cat.ogg is the pronunciation of "cat" in English with a US accent.
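
To illustrate the naming convention, here is a minimal Python sketch (not taken from the attached Perl script) that splits such a file name into language, dialect and word. The function name `parse_pronunciation_name` and the exact pattern are assumptions inferred only from the two examples above: a two-letter language code, an optional two-letter lowercase dialect code, then the word and ".ogg". Real file names on Commons may not always follow it, and a word that itself starts with two lowercase letters and a hyphen would be misread as having a dialect.

```python
import re

# Assumed pattern, inferred from "De-Buch.ogg" and "En-us-cat.ogg":
# optional "Image:"/"File:" namespace, two-letter language code,
# optional two-letter lowercase dialect code, the word, then ".ogg".
_NAME_PATTERN = re.compile(
    r"^(?:(?:Image|File):)?"
    r"(?P<lang>[A-Za-z]{2})-"
    r"(?:(?P<dialect>[a-z]{2})-)?"
    r"(?P<word>.+)\.ogg$"
)


def parse_pronunciation_name(name):
    """Return (language, dialect, word), or None if the name does not match.

    The language code is lowercased; the word keeps its case because
    file names are case-sensitive. dialect is None when absent.
    """
    match = _NAME_PATTERN.match(name)
    if match is None:
        return None
    return (match.group("lang").lower(),
            match.group("dialect"),
            match.group("word"))


if __name__ == "__main__":
    for example in ("Image:De-Buch.ogg", "Image:En-us-cat.ogg"):
        print(example, parse_pronunciation_name(example))
```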
Comment 2 Piotr Kubowicz 2008-11-13 19:38:00 UTC
When searching for pronunciation files, Parley could use one of two methods. The first is to guess the file name on Wikimedia Commons (algorithm: take the word, prepend the two-letter language prefix followed by a hyphen, and append the ".ogg" extension; if this fails, try adding a dialect prefix after the language prefix). The second is to use some kind of web service on a Parley server, which would rely on a prebuilt list of the pronunciation files available for each word (a crude way of building such a "database" is shown in my script).
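
The file-name guessing method could be sketched roughly as follows in Python. This is only an illustration: the function name `candidate_filenames` and the `dialects` parameter are hypothetical, and capitalising the language prefix is an assumption based on the "De-Buch.ogg" and "En-us-cat.ogg" examples. The caller would try each candidate in order against Wikimedia Commons and stop at the first one that exists.

```python
def candidate_filenames(word, lang, dialects=()):
    """List file names to try, most likely first.

    word     -- the vocabulary entry, case preserved (names are case-sensitive)
    lang     -- two-letter language code, e.g. "de" or "en"
    dialects -- optional dialect codes to fall back on, e.g. ["us", "uk"]
    """
    # Assumption: the language prefix is capitalised, as in "De-Buch.ogg".
    prefix = lang.capitalize()

    # First guess: language prefix, hyphen, word, ".ogg".
    names = ["{}-{}.ogg".format(prefix, word)]

    # Fallback guesses: insert a dialect prefix after the language prefix.
    for dialect in dialects:
        names.append("{}-{}-{}.ogg".format(prefix, dialect.lower(), word))
    return names


if __name__ == "__main__":
    print(candidate_filenames("Buch", "de"))
    print(candidate_filenames("cat", "en", ["us"]))
```

Guessing costs one request per candidate and needs no server-side infrastructure, while the prebuilt-list approach trades that for a single lookup against data that has to be regenerated from time to time.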