Data, Scholarship, and the eHumanities

A highlight of my September-October KNAW Visiting Scholar residency with DANS and the eHumanities Group was the audience discussion in the seminar on “Data Scholarship in the Humanities” that I led on 9 October. Participants came from all over the Netherlands and from the humanities, social sciences, computing, law, libraries, and policy, spanning a wide range of specialties. My talk (Borgman, 2014), based on the book Big Data, Little Data, No Data: Scholarship in the Networked World, which is due out in January (Borgman, 2015), compared contemporary policy promoting open access to publications and data with the means and incentives that humanities scholars have to provide such access.

The premise of my book is that enthusiasm for “big data” is obscuring the complexity and diversity of data in scholarship and the challenges for stewardship. Data practices are local, varying from field to field, individual to individual, and country to country. They are a lens to observe the rapidly changing landscape of scholarly work in the sciences, social sciences, and the humanities. Inside the black box of data is a plethora of research, technology, and policy issues. Data are best understood as representations of observations, objects, or other entities used as evidence of phenomena for the purposes of research or scholarship. Rarely do they stand alone, separable from software, protocols, lab and field conditions, and other context. Concerns for data sharing and open access raise questions about what data to keep, what to share, when, how, and with whom. Open data is sometimes viewed simply as releasing data without payment of fees. In research contexts, open data may pose complex issues of licensing, ownership, responsibility, standards, interoperability, and legal harmonization. To scholars, data can be assets, liabilities, or both. To librarians, data also are scholarly products to curate for future users. However, data are much more difficult to manage than publications.

The diversity of stakeholders participating in the seminar discussion reflected the many and varied stakes in research data and the challenges that data sharing policies pose for scholars, librarians, policy makers, publishers, students, and others. The usual scholarly issues arose first: incentives, integrity, interpretation, and control over one’s research. Legal issues abound in copyright and control; participants disagreed on how responsibility for data should be divided among researchers, funders, institutions, and private parties. Several people raised issues about the labor and resources required not only to release data, but also to reuse data. Others drew historical lessons about incentives to control and release research data that are largely ignored in contemporary research policy. The conversation expanded to address a wide array of other concerns. Economic issues for open data came to the fore. If universities and funding agencies claim that a business model exists for open data, then someone must earn money from those data. Researchers may be even more reluctant to release data if others will benefit monetarily, even if those monies are devoted to data stewardship. A nuanced exploration of the political economy around data sharing unfolded. Some were concerned about the normative models of science that would be promoted and enforced by these policies. Rather than promoting innovation, data release might stifle innovation by forcing standardization prematurely.

Open access and open data have resulted in unintended consequences, such as the ability of large corporate interests to exploit data at scales far beyond what individual researchers can accomplish. Legal and licensing issues were a common thread of the discussion. Scholars want to control their own data to varying degrees, depending upon the discipline, funding support, and many other factors. Institutions such as universities and data archives want to make their data available selectively, balancing dissemination with trust and protection of research subjects. Considerations of how to balance public and private interests also occur at the technical boundaries. For example, institutions can offer technical interfaces such as APIs that allow others to innovate with their data. These are among the layered approaches to data access that are emerging. The opportunities to employ data in scholarship are vast – but so are the challenges for individual researchers, institutions, and policy makers.

My understanding of these complex issues was greatly enhanced by the discussion at that seminar and by many other conversations throughout my residency. I thank my generous and supportive hosts, and look forward to returning soon to the stimulating intellectual environment of the eHumanities Group.

Christine L. Borgman
Professor & Presidential Chair in Information Studies
University of California, Los Angeles

References
Borgman, C. L. (2014). Data Scholarship in the Humanities. Presented at the New Trends in eHumanities, Meertens Institute, eHumanities Group, Amsterdam.
Borgman, C. L. (2015). Big Data, Little Data, No Data: Scholarship in the Networked World. Cambridge MA: MIT Press.