Scientific topics in bibliometric looking glasses

a KnoweScape COST TD1210 workshop

The definition of what a “topic” in science is forms the baseline for many follow-up questions, such as how to define research diversity, how to measure interdisciplinarity, or how to identify breakthrough research. As reported elsewhere, for five years Jochen Gläser and colleagues have organized a small-scale workshop in Berlin with bibliometric experts to see how this fundamental problem for understanding the science system can be supported by quantitative means.

Quite uniquely, this summer's workshop, “Measuring the Diversity of Research” (August 29-30, 2014), engaged the participants in applying their specific clustering and mapping methods to a shared, cleaned dataset of publications from astrophysics. The topical landscape of a field or subfield is not easy to determine automatically. Depending on which information signals are used, e.g. lexical elements or references, we get a different mirror image of the field. Science is made up of a dense fabric of thoughts and results. Putting them into disjoint classes always misses certain aspects. What counts as an optimal compromise in the delineation of a topic also depends very much on the question to be answered.

This workshop brought together applications of so-called hybrid methods, in which bibliographic clustering is combined with lexical/textual analysis (Wolfgang Glänzel), direct citation methods (Nees Jan van Eck, Theresa Velden, Frank Havemann) and co-citation methods (Kevin Boyack).
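To make the difference between these information signals concrete, here is a minimal sketch in Python. It is not any participant's actual method: the toy reference lists and all function names are hypothetical, and bibliographic coupling is included only as a commonly used third reference-based signal for comparison. It shows how the same citation data yields different link structures depending on which relation is counted.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each paper maps to the set of papers it cites.
references = {
    "A": {"C", "D"},
    "B": {"C", "D", "E"},
    "C": {"E"},
    "D": set(),
    "E": set(),
}

def direct_citation_links(refs):
    """Undirected links between a citing paper and each paper it cites."""
    return {
        frozenset((citing, cited))
        for citing, cited_set in refs.items()
        for cited in cited_set
    }

def co_citation_counts(refs):
    """How often two papers appear together in the same reference list."""
    counts = Counter()
    for cited_set in refs.values():
        for pair in combinations(sorted(cited_set), 2):
            counts[pair] += 1
    return counts

def bibliographic_coupling_counts(refs):
    """How many references two citing papers have in common."""
    counts = Counter()
    for p, q in combinations(sorted(refs), 2):
        shared = len(refs[p] & refs[q])
        if shared:
            counts[(p, q)] = shared
    return counts

direct = direct_citation_links(references)
co_cited = co_citation_counts(references)
coupled = bibliographic_coupling_counts(references)
```

In this toy example, papers C and D are linked by co-citation but not by direct citation, while A and B are linked only by bibliographic coupling. A clustering algorithm run on each of the three networks would therefore start from a different picture of the same small "field".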

The methodological core of the contributions to the workshop concerned clustering algorithms that run over complex networks. Their design is a research problem in itself. Networks are also common to other projects in the Computational Humanities programme, such as Elite Network Shifts. There is another commonality that links Digital Humanities and Bibliometrics: the problem of how to allocate features or characteristics of objects, events or persons to a classification system or, in other words, how to define relevant dimensions in networked information spaces. Because of this underlying information-theoretical problem, such scientometric exercises have a meaning beyond bibliometrics and research evaluation.

The Berlin workshop also proved that working together on one dataset is an excellent way to test methods and to reveal their meaning. While this seems evident epistemologically, it does not often happen in scientific practice. This is why sharing data and, at the same time, organizing hands-on activities around the shared data is so important. Trust is the decisive ingredient for such a change in practice. Together with hospitality, friendliness, and a focused but unhurried atmosphere of discussion, it made this workshop a treat for knowledge workers, for which we all thank Jochen, Michael and Frank as local organizers!

For KnoweScape, this workshop contributed to the wider shared interest in the information spaces that result from scholarly communication and in their implications for research evaluation and impact measurement.
The new ideas that emerged will be discussed at the Second Annual KnoweScape Conference in Thessaloniki in November, and we will also address how to disseminate ideas and results beyond the TD1210 community.