Peter Christen is a Professor in the School of Computing at the Australian National University (ANU) in Canberra. He received his PhD in Computer Science from the University of Basel, Switzerland, in 1999. He is also the Research Lead on the Scottish Historic Population Platform (SHiPP) within the Scottish Centre for Administrative Data Research (SCADR) at the University of Edinburgh in the UK. Peter’s main research interests are in record linkage and data mining, with a focus on privacy preservation, data quality, and machine learning aspects of record linkage. He has published over 200 articles in these areas, including the two books “Data Matching” (2012) and “Linking Sensitive Data” (2020, co-authored with Thilina Ranbaduge and Rainer Schnell). His work has attracted over 15,000 citations on Google Scholar.
Lessons from twenty years of working with (administrative) Big Data
Abstract: The last twenty years have seen a massive increase in the collection of data about people by businesses and governments. Such data are mostly collected for administrative purposes, for example to manage the patients in a hospital. The wealth of knowledge that can be gained from analysing such administrative databases, and the value this brings to organisations, have led to the widespread use of data science technologies across both the private and public sectors.
However, administrative databases can also be used for research aimed at improving the social good, and to facilitate population studies across numerous domains. This use of administrative databases, known as Population Data Science, poses various challenges. These include data quality; the human and social nature of how personal data are collected, processed, and potentially integrated; and the privacy issues that arise when working with databases that contain (possibly sensitive) personal information.
In this talk I will first provide an overview of what administrative data are, and give examples of how such data can be used in research to improve the social good. I will then highlight some common misconceptions about administrative data when they are used for analysis or research. I will also touch upon the challenges of accessing real administrative databases, and conclude with a set of lessons learnt and recommendations for anybody working with administrative Big Data.