CEDAR at the Digital Humanities Congress in Sheffield

The Digital Humanities Congress is a conference held in Sheffield every two years. Organized by the Humanities Research Institute of the University of Sheffield, its purpose is to promote the sharing of knowledge, ideas, and techniques within the digital humanities. This year's program was quite impressive: 47 paper presentations, organized in 18 sessions and 4 plenaries spread over 3 days. Plenaries were placed as the last presentation of each day (except the opening), a format I find interesting because it leaves attendees with good food for thought until the next day.

Content-wise, the conference was a big surprise to me, for a number of reasons. First, I was very impressed by the technical quality of most of the presentations. A DH venue is sometimes a good showcase for key philosophical questions and interesting proposals, and less of an exhibition of working solutions, but this event showed that the latter is increasingly gaining momentum, which is really great to see. Second, it showed a great balance between the ‘digital’ and the ‘humanities’, with a strong emphasis on computationally solving problems derived from pure Humanities requirements (e.g. semantic similarity metrics across photography metadata), often by developing novel methods, which also proves that DH can inspire CS methods. This was confirmed by the fact that a majority of papers were based on data-driven projects. Usually these devote their efforts to (a) increasing the degree of structure of their datasets (including their internal/external linkage to other data sources); (b) using CS methods to gain insight into the data (e.g. graph analysis, semantic similarity metrics, data mining); or (c) both.

Going a bit deeper into the content, it was good to see that, besides sessions specifically devoted to semantics and meaning, there were mentions everywhere of the Semantic Web and Linked Data as basic paradigms to open up, link and study the meaning of the Humanities on the Web. It was not a surprise, then, to find a whole lot of research in historical photography based on analysing and linking its metadata, instead of e.g. applying computer vision methods. That being said, it’s also true that this time a few more papers were devoted to literature than to (social) history. But irrespective of their humanities nature, licensing of research datasets, methods and tools is still an issue in DH and received lots of discussion. Usually CC-derived licenses are sufficient, but funding issues or data owner restrictions sometimes put scholars in a pre-openness-era position.

As a computer scientist, it’s always interesting to spot the differences between DH conferences and CS conferences. An interesting observation about the culture in DH venues: questions in the question rounds aren’t supposed to be “answered”, but to be “followed up”. Part of the academic discourse is built live during the discussion round, in contrast with CS, where questions are devoted to disambiguating what has just been communicated or to criticising assumptions/hypotheses. In DH I find the discussion noticeably more exploratory.

Our own presentation was scheduled in a session called “Linking Challenging Data”, and we showcased everything CEDAR has produced during 2014: refinement of our 5-star Linked Data publishing workflow, the (automatic) building of concept schemes (to be presented in detail at the SemStats workshop of ISWC), the monitoring of dimensions in current Linked Statistical Data (through LSD Dimensions), and the analysis of concept drift in the historical censuses. We received acclaim and great feedback. First, the DH audience is always concerned about whether our transformations can be trusted, and this time was no exception. They were very pleased to know that we implement PROV to keep provenance of all data items (i.e. down to the original Excel cell level), and were keen on examining our decision-making process. Second, there was discussion of the pros and cons of applying ontology learning to historical datasets, and of how to model the semantic differences with contemporary versions. Third, to model concept drift, they pointed us to the Historical Thesaurus of English, an invaluable linguistic resource for understanding the dynamics of meaning. And fourth, as a general concern, scholars see the value of LOD but are skeptical about the usefulness of its current content, claiming that (a) more Humanities datasets need to be published as LOD; and (b) Humanities data needs to be heavily pre-processed before being published in the Semantic Web. This was somewhat at odds with our approach: we publish historical data as-is first, and run all our cleaning and harmonization workflow only afterwards.

The DH Congress also gave a lot of space for informal discussion and thought. In particular, much of the discussion about authenticity, deconstruction and reconstruction of sources, and what it really means to be a historical source, was closely related to our recently accepted EKAW position paper “What is Linked Historical Data?”. Talking about sources always comes with thinking about their preservation; for this reason, it was argued that there is a tension/trade-off at digital archives between the materiality of sources (i.e. keeping them as close as possible to their original context) and their preservation for eternity. The more detailed the data/metadata preserved, the more expensive it is to maintain. Some kind of study on archiving priorities seems fundamental here, assuming that we cannot preserve everything at its highest level of granularity.

It was a pleasure to find that (Dutch) industry was also present with a great tool called Node Goat, specifically tailored to the visualization of DH research. We took the opportunity to promote the forthcoming COMMIT event in Amsterdam, a good mix of private and public institutions, and the 2015 visiting fellowship at the eHumanities Group.

If I had to choose a favourite presentation, I would probably pick Toby Burrows’ presentation “Ontologies and the Humanities: Some Issues Affecting the Design of Digital Infrastructure”, in the “Modeling Meaning” session. He did a fantastic job of pointing out, from a critical perspective, major issues affecting the publishing of DH on the Semantic Web, including explicit mentions of our research on concept drift. It seems that our networking at Intersect and HuNI in Australia last year worked great!

Overall, the Digital Humanities Congress turned out to be a great DH conference, with well-balanced content, impressive organization, fantastic technical quality and excellent networking. Looking forward to attending again in 2016!