Bug 331552

Summary: Integrate wikilocation.org for retrieving wikipedia articles, part 1: Compare geonames and wikilocation
Product: [Applications] marble
Component: general
Status: RESOLVED FIXED
Severity: task
Priority: NOR
Version: 1.7 (KDE 4.12)
Target Milestone: ---
Platform: unspecified
OS: Linux
Reporter: Dennis Nienhüser <nienhueser>
Assignee: Cruceru Calin-Cristian <crucerucalincristian>
CC: crucerucalincristian, sanjiban22393
Keywords: junior-jobs

Description Dennis Nienhüser 2014-02-26 21:34:59 UTC
The online Wikipedia service in Marble retrieves Wikipedia articles by location from geonames.org and displays them on the map. For some time now, geonames has limited the number of daily queries. This task is about researching a way to use wikilocation.org as an additional source for Wikipedia articles. The options are:
- query articles from both geonames and wikilocation in parallel,
- replace geonames with wikilocation, or
- run wikilocation when detecting that we used up the limit

To decide on the best approach, this task involves some research into the quality of both services:
- Do both differ in the articles they provide? Compare a couple of sample locations
- What kind of additional information do they provide that is useful for Marble?
- Are there any speed differences between them?

See http://wikilocation.org/documentation/#api-articles and http://www.geonames.org/export/wikipedia-webservice.html#wikipediaBoundingBox
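For comparing sample locations, the two query styles could be built with a small script like the one below. This is only a sketch: the endpoint URLs and the parameter names (`north`, `south`, `east`, `west`, `maxRows`, `username`, `lat`, `lng`, `radius`, `limit`) are assumptions based on the two documentation pages linked above and should be checked against them, and the function names are made up for illustration.

```python
from urllib.parse import urlencode

# Assumed endpoints, taken from the linked documentation pages; verify before use.
GEONAMES_URL = "http://api.geonames.org/wikipediaBoundingBox"
WIKILOCATION_URL = "http://api.wikilocation.org/articles"


def geonames_query(north, south, east, west, username, max_rows=20):
    """Build a geonames query URL for all articles inside a bounding box."""
    params = {
        "north": north,
        "south": south,
        "east": east,
        "west": west,
        "maxRows": max_rows,
        "username": username,
    }
    return GEONAMES_URL + "?" + urlencode(params)


def wikilocation_query(lat, lng, radius=None, limit=None):
    """Build a wikilocation query URL for articles around a point."""
    params = {"lat": lat, "lng": lng}
    # radius and limit are optional, so only send them when given.
    if radius is not None:
        params["radius"] = radius
    if limit is not None:
        params["limit"] = limit
    return WIKILOCATION_URL + "?" + urlencode(params)
```

Fetching both URLs for the same region and listing the returned titles side by side would answer the first question above.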
Comment 1 Sanjiban Bairagya 2014-02-26 21:49:53 UTC
I want to work on this task!
Comment 2 Sanjiban Bairagya 2014-02-26 22:34:08 UTC
So is this task supposed to be only research-based, or is there some coding present in it as well?
Comment 3 Dennis Nienhüser 2014-02-26 23:00:06 UTC
I don't expect any coding here (or none that would end up in Marble directly; you might want to write a script or similar, though, to visualize the API call results and compare them for similar regions).
Comment 4 Sanjiban Bairagya 2014-02-26 23:21:22 UTC
Okay, actually I was wondering whether it would be possible to assign this task to someone else and assign me the next task instead. I am sorry, I should not have gone for this task in such a hurry and then messed it up midway like this.
Comment 5 Dennis Nienhüser 2014-02-26 23:23:16 UTC
No worries ;-)
Comment 6 Sanjiban Bairagya 2014-02-26 23:25:06 UTC
Thank you :)
Comment 7 Cruceru Calin-Cristian 2014-02-27 00:04:22 UTC
I want to work on this task.
Comment 8 Cruceru Calin-Cristian 2014-02-28 10:27:44 UTC
I did some research and came to these conclusions:

- The two services do not differ much in the articles they provide. One difference which may be interesting is that geonames.org provides a small summary of the entry. However, both provide Wikipedia URLs, so if one wants to read more about an entry, it is only one click away.

- As I said, both offer almost the same information to Marble. There are, though, some small differences: geonames additionally provides the elevation, while wikilocation returns the distance from the given point. The two services work similarly but take different input: geonames takes a bounding box and returns the Wikipedia entries within it, while wikilocation takes a point given by its lat and lng, plus an optional radius to search within and a limit on the number of results to return.

- As far as time is concerned, I spotted some differences. In my tests, geonames took "real 0m5.679s" to provide 20 entries, while wikilocation returned 9 articles in "real 0m6.602s". So wikilocation was almost 1 second slower and returned about half as many entries as geonames.
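Because one service takes a bounding box and the other a point plus radius, comparing them on the same region needs a conversion between the two. The sketch below is one possible way to do it, not something either service prescribes: it uses the box midpoint as the query point and the great-circle (haversine) distance to the farthest corner as the radius. The function names are hypothetical, the radius is assumed to be in meters, and the box is assumed not to cross the antimeridian.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def bbox_to_point_radius(north, south, east, west):
    """Return (lat, lng, radius_m) of a circle that roughly covers the box.

    Assumes the box does not cross the antimeridian (west <= east).
    """
    lat = (north + south) / 2
    lng = (east + west) / 2
    corners = [(north, east), (north, west), (south, east), (south, west)]
    # The farthest corner decides the radius; corners differ because
    # longitude lines converge toward the poles.
    radius = max(haversine_m(lat, lng, c_lat, c_lng) for c_lat, c_lng in corners)
    return lat, lng, radius
```

The circle is larger than the box, so the point query may return articles that the bounding-box query does not; a fair comparison would filter those out first.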

In conclusion, taking the time differences into consideration, I think that replacing geonames with wikilocation would be the worst solution. I also think that the last solution, "run wikilocation when detecting that we used up the limit", is not very good, because once the limit is reached the difference in speed would be obvious. So my opinion is that querying articles from both geonames and wikilocation in parallel would be the best solution.

PS: I couldn't run more tests because geonames said I had reached the number of credits for the day. I tried changing the username, but it still didn't let me check more than one or two XML files.
Comment 9 Dennis Nienhüser 2014-03-02 08:14:11 UTC
The timing results here (Germany, Sunday morning local time) are
- about 2 seconds for wikilocation (high variance)
- 0.1 seconds for geonames (low variance)
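Since the variance matters as much as the average here, a repeated measurement is more telling than a single timed call. A minimal sketch of such a harness follows; `time_calls` is a made-up helper, and `fetch` stands for any zero-argument callable that performs one request (for example a lambda wrapping `urllib.request.urlopen(url).read()`).

```python
import statistics
import time


def time_calls(fetch, repeats=10):
    """Call fetch() `repeats` times; return (mean, standard deviation) in seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fetch()
        samples.append(time.perf_counter() - start)
    # stdev needs at least two samples, so report 0.0 for a single run.
    spread = statistics.stdev(samples) if len(samples) > 1 else 0.0
    return statistics.mean(samples), spread
```

Running it against both services from the same machine and at the same time of day keeps network conditions comparable.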

Clear win for geonames, but that doesn't rule out wikilocation.

For geonames I used our marble username without any problems. Which username did you use? The demo one is often problematic as it is shared by all users trying their API.

I agree that replacing geonames with wikilocation is not an option right now, but I feel that detecting problems with geonames and only then working with wikilocation might be the better approach. It avoids unnecessary network traffic and duplicated results shown over each other in the map.
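To illustrate the duplication concern: if both services were queried in parallel, their results would have to be merged before display. The sketch below assumes each result is a dict with a `url` key holding the Wikipedia article URL; that layout and the function names are hypothetical, and real responses would need to be mapped into it first.

```python
def article_key(url):
    """Normalize an article URL so the same page compares equal across sources."""
    for prefix in ("https://", "http://"):
        if url.startswith(prefix):
            url = url[len(prefix):]
            break
    return url.rstrip("/")


def merge_articles(primary, secondary):
    """Return all articles from both lists, keeping the first copy of each page."""
    seen = set()
    merged = []
    for article in list(primary) + list(secondary):
        key = article_key(article["url"])
        if key not in seen:
            seen.add(key)
            merged.append(article)
    return merged
```

Entries from `primary` win on conflict, so the source with richer data (such as the summary) would go first.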
Comment 10 Cruceru Calin-Cristian 2014-03-02 11:29:06 UTC
Yes, I only used the demo user and some random usernames. I didn't know that I could use my marble username.

I think that your conclusion of detecting problems with geonames and then using wikilocation is close to what I thought, since I specifically excluded the option of replacing geonames with wikilocation too.
I wasn't sure which of the other two to choose, and I used the difference in speed as the criterion. However, on second thought, I think that you are right and it is more important to avoid unnecessary network traffic and duplicated results, so running wikilocation when detecting that we used up the limit would be the best option.
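The agreed approach, using wikilocation only once geonames reports that the limit is used up, could be sketched roughly as follows. This is an illustration in Python rather than Marble code: `QuotaExceeded`, `ArticleSource` and the two fetch callables are hypothetical names, and how geonames actually signals an exhausted limit in its response is not covered here.

```python
class QuotaExceeded(Exception):
    """Raised by the primary fetcher when the daily limit is used up (hypothetical)."""


class ArticleSource:
    """Use the primary service until its quota is exhausted, then the fallback."""

    def __init__(self, fetch_primary, fetch_fallback):
        self.fetch_primary = fetch_primary
        self.fetch_fallback = fetch_fallback
        self.primary_exhausted = False

    def fetch(self, *args):
        if not self.primary_exhausted:
            try:
                return self.fetch_primary(*args)
            except QuotaExceeded:
                # Remember the failure so later requests skip the primary
                # service instead of wasting a round trip on it.
                self.primary_exhausted = True
        return self.fetch_fallback(*args)
```

A real implementation would also need to reset `primary_exhausted` when the limit renews, which this sketch leaves out.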