Grants talk:IEG/WikiBrainTools

This is an archived version of this page, as edited by Stuartyeates (talk | contribs) at 08:53, 3 October 2014 (reply). It may differ significantly from the current version.


Some algorithmically intensive tools that already exist

This is a neat idea! Here are a few tools that already use rich algorithms and might be helpful to look into/talk to their developers:

Useful people to talk to:

Useful places to seek feedback and post notifications:

Looking forward to hearing more about your idea! Cheers, Jake Ocaasi (talk) 17:16, 25 September 2014 (UTC)

Thanks for the suggestions, User:Ocaasi! I've already made some of the content-based changes you suggested (e.g. useful mailing lists to tap). I've also been in touch with User:EpochFail. Once the feedback period is open on Wed, I'll email the remaining people to see what kinds of improvements they'd suggest. Shilad (talk) 19:31, 28 September 2014 (UTC)

Finalize your proposal this week!

Hi Shilad and Brenthecht. Thanks for drafting this proposal!

  • We're hosting one last IEG proposal help session in Google Hangouts this weekend, so please join us if you'd like to get some last-minute help or feedback as you finalize your submission.
  • Once you're ready to submit it for review, please update its status (in your page's Probox markup) from DRAFT to PROPOSED, as the deadline is September 30th.
  • If you have any questions at all, feel free to contact me (IEG committee member) or Siko (IEG program head), or just post a note on this talk page and we'll see it.

Cheers, Ocaasi (talk) 20:04, 25 September 2014 (UTC)

Promoting WikiBrain?

@Shilad and Brenthecht: Hey there. Very pleased to read this as someone who reads up on Wikipedia-related research. I'm not surprised to hear many researchers stray away from research in the area due to the interface-related obstacles they face that prevent data extraction. As this is one of the main problems you identify, though, I wanted to ask what this team might consider doing to inform and attract potential researchers to this new treasure trove of algorithms. Presumably (but correct me if I am wrong), many in the WikiTools community are already familiar with extracting data to inform their work. I get that this proposal would make their jobs easier and would open up new research avenues, but how will you be reaching out beyond the WikiTools community? I JethroBT (talk) 22:08, 29 September 2014 (UTC)

Good question! We've already taken some first steps to promote WikiBrain to algorithmic researchers. We've published a paper describing WikiBrain, along with several other papers that use WikiBrain and point algorithmic and community researchers of Wikipedia to it. We have begun to make some inroads, and have received algorithmic contributions from some other research groups. In addition, this grant would support travel to two major algorithmic conferences (SIGIR and WWW), where we would present demo posters and organize "Birds of a Feather" sessions. I'd also be interested to hear any other ideas you have! Shilad (talk) 03:18, 30 September 2014 (UTC)

Suggestions

Here are a couple of suggestions based on a short poke around the website(s):

  1. It seems like WikiBrain is entirely based on the Wikipedia dumps. If it is, it needs to be made clear that data not in the dumps is not accessible via WikiBrain.
  2. It seems like WikiBrain relies on downloads of the Wikipedia dump files, which are huge. The pitfalls of this need to be made clear.
  3. https://github.com/shilad/wikibrain has contributions from 13 contributors, which is better than I expected.
  4. It seems to me that to make non-trivial use of WikiBrain, an intricate Java development environment needs to be installed. This needs to be made clearer.
  5. Is there a continuous integration server? That seems like the kind of thing that would be very useful.
  6. The mailing list needs to show active use. You may need to encourage your co-located devs to switch to communicating via it.
  7. The beginner's example at https://shilad.github.io/wikibrain/# links to https://github.com/shilad/wikibrain/blob/master/wikibrain-cookbook/src/main/java/org/wikibrain/phrases/cookbook/ResolveExample.java which returns a 404.
  8. I STRONGLY recommend that you move from a co-located team to a geographically diverse team.

cheers Stuartyeates (talk) 01:41, 3 October 2014 (UTC)

Stuartyeates, thanks for all the great feedback! I want to follow up on a few of your suggestions.
    • WikiBrain installation needs: WikiBrain makes use of a few other data sources (page view data, Natural Earth GIS data, several public NLP datasets), but you are correct that it primarily uses WikiDumps. One of the primary goals of this project is to eliminate the need for tool developers to install WikiBrain at all. We would install WikiBrain on Wikimedia Labs, preprocess the data, and provide a web API for bots and researchers. I think this point should address your first few concerns.
      • The primary issue with reliance on WikiDumps is not size (as you point out, that can be overcome) but the fact that not all information is there, so some questions can't be answered. Defining the data available defines the nature of the research that can be undertaken and helps researchers identify early whether the project is right for them. (Deleted articles? Old versions of live articles? User edit traces? etc.) Stuartyeates (talk)
    • Integration tests: At the moment, we do have a continuous unit test server (Travis CI), but not an integration test server. I have a short-term (next month) goal to revive our integration tests.
      • Testing against the versions of Java and the Java libraries on the Wikimedia servers, as well as the defaults on other servers, is very important for getting things to 'just work' for a large group of people. Stuartyeates (talk) 07:57, 3 October 2014 (UTC)
    • Mailing list: Totally agreed! I'll use your suggestion as a catalyst to encourage this change.
    • 404: Thanks for the tip. Looks like the link didn't survive a recent refactoring. I've now fixed it.
    • Geographically diverse team: YES! Are you volunteering? :) I'm only partially kidding. I do hope that a side-effect of the engagement plan for this IEG is to build a broader coalition of developers. I understand we'll need to be better about communication patterns to make this work (e.g. the mailing list).
      • Maybe, but I'm still genuinely confused as to whether the data I'm interested in is available via WikiBrain. I'm interested in tracking the growth of individual articles over time and the long-term edit patterns of users ('edit traces'). Stuartyeates (talk) 08:53, 3 October 2014 (UTC)
Shilad (talk) 05:19, 3 October 2014 (UTC)
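
To make the 'edit traces' question in this thread concrete, here is a minimal Python sketch of the two measurements mentioned: per-revision article growth and per-user edit counts over time. It assumes revision metadata (timestamp, user, page size) has already been obtained from some source; whether WikiBrain exposes this data is exactly the open question above. The `Revision` record and both function names are hypothetical illustrations, not part of WikiBrain or any Wikimedia API.

```python
from collections import defaultdict
from datetime import datetime
from typing import NamedTuple


class Revision(NamedTuple):
    """Hypothetical revision record (an assumption, not a WikiBrain type)."""
    timestamp: datetime
    user: str
    size_bytes: int  # page size after this revision


def article_growth(revisions):
    """Return (timestamp, size change in bytes) per revision, oldest first."""
    ordered = sorted(revisions, key=lambda r: r.timestamp)
    growth = []
    previous_size = 0  # treat the page as empty before its first revision
    for rev in ordered:
        growth.append((rev.timestamp, rev.size_bytes - previous_size))
        previous_size = rev.size_bytes
    return growth


def edits_per_user_per_year(revisions):
    """Count edits keyed by (user, year): a coarse long-term edit trace."""
    counts = defaultdict(int)
    for rev in revisions:
        counts[(rev.user, rev.timestamp.year)] += 1
    return dict(counts)


# Made-up sample data for one article, deliberately out of order.
sample = [
    Revision(datetime(2013, 5, 1), "Alice", 1200),
    Revision(datetime(2012, 1, 10), "Alice", 500),
    Revision(datetime(2014, 3, 7), "Bob", 900),
]

for when, delta in article_growth(sample):
    print(when.date(), delta)
print(edits_per_user_per_year(sample))
```

Both measurements need the full revision history of a page, which is why it matters whether the underlying data source includes old versions of live articles rather than only current page text.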