Tooling for the digital humanities

Data-driven research is all about collecting data, processing it, analysing it, deriving some conclusions and sharing all of that with the rest of the community. For this last step three deliverables are often, if not always, used: publications, datasets and web-based end-user interfaces. On April 16 I proposed to discuss the relevance of the last item in the light of modern web-based development techniques, and we had a very interesting afternoon! The point made through the slides was that creating the perfect tool, one that matches all the needs of its potential users and is maintained over a long period of time, is a utopia. Betting that in the worst case interested people will fall back on downloading the data and importing it into their own research environment may not be a winning alternative either, as this often comes at a very high cost. ICT moves very fast, and work on tools is limited by the availability of their original developer, the availability of funds and the community adoption rate. But it does not really matter, as the thing end-users are really interested in is the data. This is what they will import into their own infrastructure, and it is important to facilitate this as much as possible.

The key to making data easy to import and re-use is not only to make properly documented data dumps. Though this is an important thing to do, data consumers can get what they need even more rapidly out of an Application Programming Interface (API). Contrary to a data dump, an API does not give away the data but exposes part of it. Look for instance at the APIs provided by Twitter, RottenTomatoes, SlideShare and Flickr. Using any of these it is possible to rapidly get some social network data, images or movie information without having to go through the hassle of downloading a large data set and loading it into some database system. It can further be remarked that none of these providers would give you their data anyway. APIs are also a good way to share part of your data without having to give it away 😉 Another advantage of APIs is that they require much less coding than end-user interfaces and can also use fewer computing resources.
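To make the difference between a dump and an API a bit more concrete, here is a minimal Python sketch. Everything in it is made up for illustration: the `RECORDS` list stands in for a research dataset, and `PUBLIC_FIELDS` and `query_records` are hypothetical names, not part of any real service. The idea is only that a consumer asks a question and gets back a small, filtered slice, while fields the provider wants to keep are never sent.

```python
import json

# Hypothetical dataset held by the data provider (an assumption for this sketch).
RECORDS = [
    {"id": 1, "title": "Letter to a publisher", "year": 1871, "internal_notes": "unverified"},
    {"id": 2, "title": "Travel diary", "year": 1902, "internal_notes": "donor wishes anonymity"},
    {"id": 3, "title": "Harvest accounts", "year": 1871, "internal_notes": ""},
    {"id": 4, "title": "Parish register", "year": 1871, "internal_notes": "damaged pages"},
]

# Only these fields are ever exposed; the rest stays with the provider.
PUBLIC_FIELDS = ("id", "title", "year")


def query_records(year=None, limit=2, offset=0):
    """Return one page of matching records, restricted to the public fields."""
    matches = [r for r in RECORDS if year is None or r["year"] == year]
    page = matches[offset:offset + limit]
    return {
        "total": len(matches),
        "offset": offset,
        "items": [{field: r[field] for field in PUBLIC_FIELDS} for r in page],
    }


if __name__ == "__main__":
    # What a web API built on this function would send back as JSON.
    print(json.dumps(query_records(year=1871), indent=2))
```

In a real setting this function would sit behind an HTTP endpoint, but the principle is the same: the consumer never downloads or loads the full data set.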

A take-home message is that the design of an end-user interface should start with the design of the API that will power it and, possibly, a wide range of third-party tools. This does not take away the fact that if no third party is interested in the data nobody will use that API anyway, but at least the opportunity for easy data re-use will be there, and this may actually foster interest in the data.
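A small Python sketch of this API-first idea, under the assumption of a toy in-memory dataset: the names `api_list_items`, `render_html_list` and `export_csv` are hypothetical and only serve the illustration. The project's own interface and a stand-in for a third-party tool both consume the same API function, and neither touches the underlying data directly.

```python
import csv
import html
import io

# Hypothetical data store; only the API layer below is allowed to read it.
_LETTERS = [
    {"id": 1, "title": "Letter to a publisher", "year": 1871},
    {"id": 2, "title": "Travel diary <draft>", "year": 1902},
]


def api_list_items():
    """The API layer: returns copies of the records it chooses to expose."""
    return [dict(item) for item in _LETTERS]


def render_html_list(items):
    """The project's own end-user interface, built purely on the API output."""
    rows = "".join(
        f"<li>{html.escape(item['title'])} ({item['year']})</li>" for item in items
    )
    return f"<ul>{rows}</ul>"


def export_csv(items):
    """A stand-in for a third-party tool that consumes the same API."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["id", "title", "year"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(items)
    return buffer.getvalue()


if __name__ == "__main__":
    items = api_list_items()
    print(render_html_list(items))
    print(export_csv(items))
```

If the interface later stops being maintained, the API function is still there for anyone who wants to build something else on top of it.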

To find out more you can follow the links in this post and also have a look at these additional ones:
• A related workshop that will also look into tooling for the humanities
• A blog post from Elena Spadini about the session