Summary: | Integrate wikilocation.org for retrieving wikipedia articles, part 1: Compare geonames and wikilocation | ||
---|---|---|---|
Product: | [Applications] marble | Reporter: | Dennis Nienhüser <nienhueser> |
Component: | general | Assignee: | Cruceru Calin-Cristian <crucerucalincristian> |
Status: | RESOLVED FIXED | ||
Severity: | task | CC: | crucerucalincristian, sanjiban22393 |
Priority: | NOR | Keywords: | junior-jobs |
Version: | 1.7 (KDE 4.12) | ||
Target Milestone: | --- | ||
Platform: | unspecified | ||
OS: | Linux | ||
Latest Commit: | Version Fixed In: | ||
Sentry Crash Report: |
Description
Dennis Nienhüser
2014-02-26 21:34:59 UTC
I want to work on this task! Is this task supposed to be only research-based, or is there some coding involved as well?

I don't expect any coding here (or none that would end up in Marble directly; you might want to write a script or similar, though, to visualize the API call results and compare the results for similar regions).

Okay, actually I was wondering whether it would be possible to assign this task to someone else and assign me to the next task instead. I am sorry; I should not have taken this task in such a hurry and then abandoned it midway like this.

No worries ;-)

Thank you :)

I want to work on this task. I did some research and came to these conclusions:

- The two services do not differ much in the articles they provide. One difference that may be interesting is that geonames.org provides a short summary of each entry. However, both provide Wikipedia URLs, so reading more about an entry is only one click away.
- As I said, they both offer almost the same information to Marble. There are, though, some small differences: geonames additionally provides the elevation, while wikilocation returns the distance from the given point. The two services work similarly, but they differ in how a query is specified: with geonames you give a bounding box and it returns the Wikipedia entries within that box, while with wikilocation you give a point by its lat and lng, optionally with a radius to search within and a limit on the number of results to return.
- As far as time is concerned, I spotted some differences. In my tests, geonames took "real 0m5.679s" to provide 20 entries, while wikilocation returned 9 articles in "real 0m6.602s". That is almost 1 second slower, and wikilocation returned fewer than half as many entries as geonames.

In conclusion, taking the time differences into consideration, I think that replacing geonames with wikilocation would be the worst solution.
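Because geonames is queried with a bounding box and wikilocation with a point plus radius, comparing the two "for similar regions" needs a way to turn one query shape into the other. The sketch below is one possible approach, not something from Marble: it centres a circle on the box, sizes it to reach the farthest corner, and filters circle results back to the box. The function names are hypothetical, the Earth is treated as a sphere, and the unit and any maximum of wikilocation's radius parameter are not checked here.

```python
import math

# Mean Earth radius in metres (spherical approximation).
EARTH_RADIUS_M = 6_371_000.0


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lng2 - lng1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def bbox_to_point_radius(north, south, east, west):
    """Turn a geonames-style bounding box into a wikilocation-style (lat, lng, radius).

    The circle is centred on the box and reaches its farthest corner.
    Assumes the box does not cross the antimeridian (west <= east).
    """
    lat = (north + south) / 2
    lng = (east + west) / 2
    radius_m = max(haversine_m(lat, lng, corner_lat, corner_lng)
                   for corner_lat in (north, south)
                   for corner_lng in (east, west))
    return lat, lng, radius_m


def inside_bbox(lat, lng, north, south, east, west):
    """True if (lat, lng) lies in the box; used to drop circle results outside it."""
    return south <= lat <= north and west <= lng <= east
```

For a box such as north=52.6, south=52.4, east=13.5, west=13.3, `bbox_to_point_radius` gives a centre near (52.5, 13.4) and a radius of roughly 14 km. For map-sized boxes the farthest point of the box from its centre is one of the corners, so the circle covers the whole box; it also covers some area outside it, which is why results from the circle query would be passed through `inside_bbox` before comparing them with the geonames results, for example by their Wikipedia URLs.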
I don't think the last option, "run wikilocation when detecting that we used up the limit", is very good either, because once the limit is reached the difference in speed would be obvious. So my opinion is that querying articles from both geonames and wikilocation in parallel would be the best solution.

PS: I couldn't run more tests because geonames said I had used up my credits for the day. I tried changing the username, but it still didn't let me check more than one or two XML files.

The timing results here (Germany, Sunday morning local time) are:
- about 2 seconds for wikilocation (high variance)
- 0.1 seconds for geonames (low variance)

Clear win for geonames, but that doesn't rule out wikilocation. For geonames I used our marble username without any problems. Which username did you use? The demo one is often problematic, as it is shared by all users trying out the API.

I agree that replacing geonames with wikilocation is not an option right now, but I feel that detecting problems with geonames and only then querying wikilocation might be the better approach. It avoids unnecessary network traffic and duplicated results drawn on top of each other on the map.

Yes, I only used the demo user and some random usernames. I didn't know that I could use my marble username. I think your conclusion of detecting problems with geonames and then using wikilocation is close to what I thought, since I had specifically excluded the option of replacing geonames with wikilocation too. I wasn't sure which of the other two options to choose, and I used the difference in speed as the criterion. However, on second thought, I think you are right: it is more important to avoid unnecessary network traffic and duplicated results, so running wikilocation when detecting that we have used up the limit would be the best option.
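The approach agreed on above (use geonames, and query wikilocation only once geonames reports that the limit is used up) can be sketched as a small fallback wrapper. This is an illustration, not Marble code: `QuotaExceededError`, `fetch_articles` and the article dictionaries are hypothetical, the two fetchers are passed in as callables, and how a real client would detect the exhausted limit (for example from an error status in the geonames response) is assumed rather than shown.

```python
from typing import Any, Callable, Dict, List

# Hypothetical article record, e.g. {"title": ..., "url": ..., "lat": ..., "lng": ...}.
Article = Dict[str, Any]


class QuotaExceededError(Exception):
    """Hypothetical error a geonames client raises once the credit limit is used up."""


def fetch_articles(fetch_geonames: Callable[[], List[Article]],
                   fetch_wikilocation: Callable[[], List[Article]]) -> List[Article]:
    """Ask geonames first and fall back to wikilocation only when its limit is hit.

    In the normal case only one service is contacted, so there is no extra
    network traffic and no duplicate placemarks drawn on top of each other.
    """
    try:
        return fetch_geonames()
    except QuotaExceededError:
        # geonames refused the request because the credits are used up.
        return fetch_wikilocation()
```

Compared with querying both services in parallel, this keeps the fast geonames path for the common case and needs no de-duplication step, at the cost of the slower wikilocation response only after the limit has been reached.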