Bug 175071 - add automatic download of pronunciation files from Wikimedia Commons
Summary: add automatic download of pronunciation files from Wikimedia Commons
Status: REPORTED
Alias: None
Product: parley
Classification: Applications
Component: general
Version: unspecified
Platform: Microsoft Windows
Importance: NOR wishlist
Target Milestone: ---
Assignee: Parley Developers
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-11-13 19:23 UTC by Piotr Kubowicz
Modified: 2008-11-13 19:38 UTC
CC List: 0 users

See Also:
Latest Commit:
Version Fixed In:


Attachments
Perl script fetching information about pronunciation from Commons (8.52 KB, text/x-perl)
2008-11-13 19:33 UTC, Piotr Kubowicz

Description Piotr Kubowicz 2008-11-13 19:23:14 UTC
Version:            (using KDE 4.1.3)
OS:                MS Windows
Installed from:    MS Windows

The main Parley website suggests helping by recording pronunciation files for Parley. This does not make sense: Wikimedia Commons already has about 12 000 free-licensed audio files for English (http://commons.wikimedia.org/wiki/Category:Pronunciation), about 9 000 for Dutch, 7 000 for French, 5 000 for Russian, and many more. There are _75_ languages for which at least some pronunciation files exist. The task is to make use of these materials.

My bug 175070 proposes a user interface for fetching the files and displaying their copyright information:
http://bugs.kde.org/show_bug.cgi?id=175070

Plus, there could be a dedicated Parley directory for storing downloaded pronunciation files, so that they are not duplicated across different dictionaries. However, because such files are small (about 17 KB each), this is not urgent.
Comment 1 Piotr Kubowicz 2008-11-13 19:33:32 UTC
Created attachment 28546
Perl script fetching information about pronunciation from Commons

This is the Perl script I use to fetch information about which pronunciation files are available on Wikimedia Commons for each language. It's a bit tricky because, although these files are organised into categories, their names follow a special convention that encodes the language and sometimes also the dialect. So Image:De-Buch.ogg is the pronunciation of "Buch" in German (names are case-sensitive) and Image:En-us-cat.ogg is the pronunciation of "cat" in English with a US accent.
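
To illustrate the naming convention described above, here is a minimal Python sketch (not the attached Perl script) that splits such a name into language, optional dialect, and word. The function name parse_name, the Pronunciation type, and the KNOWN_DIALECTS set are hypothetical; the actual list of dialect codes used on Commons is not given in this report, so it is only a placeholder.

```python
from typing import NamedTuple, Optional

# Hypothetical set of dialect codes; the real list used on Commons is not
# stated in this report and would have to be collected separately.
KNOWN_DIALECTS = frozenset({"us", "uk", "au", "ca"})


class Pronunciation(NamedTuple):
    language: str           # lower-cased language prefix, e.g. "de"
    dialect: Optional[str]  # lower-cased dialect code, or None
    word: str               # the word itself, case preserved


def parse_name(name: str) -> Optional[Pronunciation]:
    """Split a name like 'Image:En-us-cat.ogg' into its parts.

    Returns None if the name does not look like '<lang>-<word>.ogg'.
    """
    # Drop the "Image:" namespace prefix if present.
    if name.startswith("Image:"):
        name = name[len("Image:"):]
    if not name.endswith(".ogg"):
        return None
    stem = name[:-len(".ogg")]

    # Everything before the first hyphen is taken as the language prefix.
    language, sep, rest = stem.partition("-")
    if not sep or not language or not rest:
        return None

    # Treat the next segment as a dialect only if it is a known code;
    # otherwise it is assumed to be part of the word (e.g. a hyphenated word).
    dialect = None
    head, sep2, tail = rest.partition("-")
    if sep2 and tail and head.lower() in KNOWN_DIALECTS:
        dialect, rest = head.lower(), tail

    return Pronunciation(language.lower(), dialect, rest)
```

With these assumptions, parse_name("Image:De-Buch.ogg") gives language "de", no dialect, and word "Buch", while parse_name("Image:En-us-cat.ogg") gives language "en", dialect "us", and word "cat". A hyphenated word whose first part happens to match a dialect code would be misread, which is one reason the convention is tricky to handle.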
Comment 2 Piotr Kubowicz 2008-11-13 19:38:00 UTC
When searching for pronunciation files, Parley could use one of two methods. The first is to guess the filename on Wikimedia Commons (algorithm: to build the image name, take the word, prepend the two-letter language prefix followed by a hyphen, and append the ".ogg" extension; if this fails, try adding a dialect prefix after the language prefix). The second is to use some kind of web service on a Parley server, which would rely on a prebuilt list of the pronunciation files available for each word (a naive method for creating such a "database" is shown in my script).
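
The filename-guessing method above could be sketched roughly as follows in Python. This is only an illustration under assumptions: the function name candidate_names is hypothetical, the capitalised language prefix is inferred from the two example names in comment 1, and the dialect codes to try must be supplied by the caller. Checking whether a candidate actually exists on Commons is left out.

```python
from typing import Iterable, List


def candidate_names(word: str, language: str,
                    dialects: Iterable[str] = ()) -> List[str]:
    """Return filenames to try on Commons, in the order they should be tried.

    The plain '<Lang>-<word>.ogg' form comes first; the dialect variants
    '<Lang>-<dialect>-<word>.ogg' follow as fallbacks.
    """
    # Assumption: the prefix is written with a leading capital ("De", "En"),
    # as in the examples De-Buch.ogg and En-us-cat.ogg.
    lang = language.capitalize()

    # The word is used unchanged because names are case-sensitive.
    names = [f"{lang}-{word}.ogg"]
    names += [f"{lang}-{d.lower()}-{word}.ogg" for d in dialects]
    return names
```

For example, candidate_names("cat", "en", ["us"]) yields "En-cat.ogg" followed by "En-us-cat.ogg", and candidate_names("Buch", "de") yields only "De-Buch.ogg". Parley would then request each candidate in turn and stop at the first one that exists.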