First results from the Elite Network Shifts Project
Extracting Entity Networks
One of the main tasks of the Elite Network Shifts (ENS) project is to automatically extract named entities – persons, organisations or places – from a corpus of digitized newspaper articles. Although the entity extraction field has been developing for a number of years now, the extraction of entity networks is still in its infancy.
In September, I presented a paper on extracting an entity network at the International Conference on Theory and Practice of Digital Libraries (TPDL) in Malta. It was an unusual topic for the conference participants, who were mostly concerned with organizing and storing digital data, but it was well received as one example of how the data in digital libraries can be used in practice. An entity network is constructed by detecting named entities and then using them as nodes in a graph. The links between the nodes are based on the co-occurrence of named entities found within the documents.
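As a rough illustration of the construction just described, the sketch below counts document-level co-occurrences and treats each count as an edge between two entity nodes. It is a minimal sketch and not the project's actual pipeline: it assumes named entities have already been detected, that each document is given as a plain list of entity strings, and that repeated mentions within a document count once. The function name and the toy entities are hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence_network(documents):
    """Return a Counter mapping each unordered entity pair to the
    number of documents in which both entities appear."""
    edges = Counter()
    for entities in documents:
        # Deduplicate so repeated mentions in one document count once,
        # and sort so each pair always has the same orientation.
        unique_entities = sorted(set(entities))
        for a, b in combinations(unique_entities, 2):
            edges[(a, b)] += 1
    return edges


# Toy input (invented for illustration): one list of entities per article.
docs = [
    ["Person A", "Organisation B", "Place C"],
    ["Person A", "Organisation B"],
    ["Place C", "Organisation B", "Organisation B"],
]
network = build_cooccurrence_network(docs)
for (a, b), count in sorted(network.items()):
    print(a, "--", b, "co-occurs in", count, "document(s)")
```

The keys of the resulting counter are the links of the graph and the entities appearing in them are its nodes; the raw counts are only the simplest possible link weight.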
In the paper, I view network extraction as a problem of how to rank related entities. With hundreds of thousands of co-occurrences found among the entities in our project, we assign a weight to indicate the strength of the relation and then take only the very “strong” relations for visualization and interpretation. There are different ways to assign weights. They can be based on simple frequencies or on statistical methods like “conditional probability” or “pointwise mutual information.” The paper explores the utility of these various scoring methods by “ground truthing” them against the actual relations, as identified by some of the social scientists on the project.
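To make the three kinds of weight concrete, here is a minimal sketch that scores a single entity pair from document counts and then keeps only the top-ranked pairs. It uses the standard textbook definitions of conditional probability and pointwise mutual information, which may differ in detail from the variants evaluated in the paper. The function names, parameter names and example counts are assumptions made for illustration.

```python
import math


def score_pair(n_a, n_b, n_ab, n_docs):
    """Score the relation between entities a and b.

    n_a, n_b : number of documents mentioning a (resp. b)
    n_ab     : number of documents mentioning both
    n_docs   : total number of documents in the collection
    """
    p_a = n_a / n_docs
    p_b = n_b / n_docs
    p_ab = n_ab / n_docs
    # PMI is undefined for pairs that never co-occur; use -inf so
    # that such pairs always rank last.
    pmi = math.log2(p_ab / (p_a * p_b)) if n_ab > 0 else float("-inf")
    return {
        "frequency": n_ab,                      # simple co-occurrence count
        "conditional_probability": n_ab / n_a,  # P(b | a)
        "pmi": pmi,
    }


def top_relations(scored_pairs, method, k):
    """Return the k pairs with the highest score under one method."""
    ranked = sorted(scored_pairs, key=lambda p: scored_pairs[p][method], reverse=True)
    return ranked[:k]


# Invented counts: a appears in 4 of 8 documents, b in 2, both together in 2.
example = score_pair(n_a=4, n_b=2, n_ab=2, n_docs=8)
print(example)
```

Note that conditional probability is asymmetric (P(b | a) generally differs from P(a | b)), whereas frequency and PMI are symmetric, so the choice of method can change which relations count as "strong".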
This first ground truthing showed that it was best to use a combination of scoring methods. It also helped us to identify two kinds of problems with the co-occurrence method more generally. The first problem we call “enumeration,” where entities appear together only because they are part of a list, for example, of presidential candidates or of people attending a meeting. The second problem we call “observer,” where a political observer mentions other persons in relation to a subject even though those persons have no direct connection to each other. These are just two of the problems that we continue to work on in this multi-faceted project.
Fig. 1. A sample network retrieved in response to “BL Habibie” as query entity. The thickness of the links depicts the association strength as represented in the document collection.
ENS Project Co-Ordinator Honoured
The Elite Network Shifts (ENS) team were pleased to attend the inauguration of ENS co-ordinator, Gerry van Klinken, as professor by special appointment at the University of Amsterdam on October 2nd. Gerry gave a well-received lecture to a packed audience, interweaving theoretical insight on citizenship with the biography of an Indonesian political activist and some personal reflections on his career to date. Please find here the transcript of the lecture.