175071 – add automatic download of pronunciation files from Wikimedia Commons

Summary: add automatic download of pronunciation files from Wikimedia Commons

Status:	REPORTED

Alias:	None

Product:	parley
Classification:	Applications
Component:	general
Version:	unspecified
Platform:	Microsoft Windows Microsoft Windows

Importance:	NOR wishlist
Target Milestone:	---
Assignee:	Parley Developers

URL:
Keywords:

Depends on:
Blocks:

Reported:	2008-11-13 19:23 UTC by Piotr Kubowicz
Modified:	2008-11-13 19:38 UTC
CC List:	0 users

See Also:
Latest Commit:
Version Fixed In:

Attachments
Perl script fetching information about pronunciation from Commons (8.52 KB, text/x-perl) 2008-11-13 19:33 UTC, Piotr Kubowicz

Description Piotr Kubowicz 2008-11-13 19:23:14 UTC

Version:            (using KDE 4.1.3)
OS:                MS Windows
Installed from:    MS Windows

The main Parley website suggests helping by recording pronunciation files for Parley. This does not make any sense: Wikimedia Commons already has about 12 000 freely licensed audio files for English (http://commons.wikimedia.org/wiki/Category:Pronunciation), about 9 000 for Dutch, 7 000 for French, 5 000 for Russian, and many more. In total, _75_ languages have at least some pronunciation files. The task is to utilise these materials.

My bug 175070 discusses a proposal for a user interface for fetching files and displaying their copyright information:
http://bugs.kde.org/show_bug.cgi?id=175070

In addition, there could be a special Parley directory for saving downloaded pronunciation files, so that they are not duplicated across different dictionaries. However, because such files are small (about 17 KB each), this is not urgent.

Comment 1 Piotr Kubowicz 2008-11-13 19:33:32 UTC

Created attachment 28546 [details]
Perl script fetching information about pronunciation from Commons

This is the Perl script I use to fetch information about which pronunciation files are available on Wikimedia Commons for each language. It's a bit tricky because, although these files are organised into categories, they are named in a special way that encodes the language and sometimes also the dialect. So Image:De-Buch.ogg is the pronunciation of "Buch" in German (case sensitive) and Image:En-us-cat.ogg is the pronunciation of "cat" in English with a US accent.
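
To illustrate the naming scheme, here is a minimal Python sketch (not taken from the attached Perl script) that splits such a name into language, optional dialect, and word. The function name parse_pronunciation_name and the assumption that dialect codes are exactly two lowercase letters are mine; a word that itself begins with two lowercase letters and a hyphen would be misread as carrying a dialect.

```python
import re

# Assumed pattern: "<Lang>-[<dialect>-]<word>.ogg", optionally prefixed by "Image:".
# The two-lowercase-letter dialect rule is a guess based on "En-us-cat.ogg".
_NAME_RE = re.compile(r"^(?:Image:)?([A-Za-z]{2})-(?:([a-z]{2})-)?(.+)\.ogg$")


def parse_pronunciation_name(name):
    """Return (language, dialect or None, word), or None if the name does not fit."""
    match = _NAME_RE.match(name)
    if match is None:
        return None
    lang, dialect, word = match.groups()
    # The language code is lower-cased for comparison; the word keeps its case,
    # because file names on Commons are case sensitive.
    return lang.lower(), dialect, word


print(parse_pronunciation_name("Image:De-Buch.ogg"))
print(parse_pronunciation_name("Image:En-us-cat.ogg"))
```

Names that do not follow the pattern yield None, so a caller can skip them instead of guessing.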

Comment 2 Piotr Kubowicz 2008-11-13 19:38:00 UTC

When searching for pronunciation files, Parley could use one of two methods. The first is to guess the filename on Wikimedia Commons (algorithm: take the word, prepend the two-letter language prefix followed by a hyphen, and append the ".ogg" extension; if this fails, try adding a dialect prefix after the language prefix). The second is to use some kind of web service on a Parley server, which would rely on a prebuilt list of the pronunciation files available for each word (a dummy method for creating such a "database" is presented in my script).
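
The filename-guessing method could look roughly like the following Python sketch. The names candidate_filenames and first_existing, the dialect list, and the exists callback are all hypothetical; the callback stands in for whatever check Parley would actually perform against Commons, and the capitalised language prefix is inferred from the examples De-Buch.ogg and En-us-cat.ogg.

```python
def candidate_filenames(word, lang, dialects=()):
    """List filenames to try: the plain language prefix first, then each dialect."""
    prefix = lang.capitalize()  # "en" -> "En", matching "En-us-cat.ogg"
    names = [f"{prefix}-{word}.ogg"]
    names += [f"{prefix}-{dialect.lower()}-{word}.ogg" for dialect in dialects]
    return names


def first_existing(word, lang, dialects, exists):
    """Return the first candidate for which exists(name) is true, else None.

    `exists` is a caller-supplied function (an assumption of this sketch),
    for example one that asks Commons whether the file page is present.
    """
    for name in candidate_filenames(word, lang, dialects):
        if exists(name):
            return name
    return None


# Usage with a stand-in for the real lookup:
known_files = {"En-us-cat.ogg", "De-Buch.ogg"}
print(first_existing("cat", "en", ["us", "uk"], known_files.__contains__))
```

Keeping the existence check as a parameter means the same guessing logic could be reused with the prebuilt list from the second method.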