
dc.contributor.author: Feigenbaum, James J. [en_US]
dc.date.accessioned: 2018-03-15T17:19:20Z
dc.date.available: 2018-03-15T17:19:20Z
dc.date.issued: 2016-03-28
dc.identifier.citation: J Feigenbaum. "Automated Census Record Linking: A Machine Learning Approach."
dc.identifier.uri: https://hdl.handle.net/2144/27526
dc.description.abstract: Thanks to the availability of new historical census sources and advances in record linking technology, economic historians are becoming big data genealogists. Linking individuals over time and between databases has opened up new avenues for research into intergenerational mobility, assimilation, discrimination, and the returns to education. To take advantage of these new research opportunities, scholars need to be able to accurately and efficiently match historical records and produce an unbiased dataset of links for downstream analysis. I detail a standard and transparent census matching technique for constructing linked samples that can be replicated across a variety of cases. The procedure applies insights from machine learning classification and text comparison to the well-known problem of record linkage, but with a focus on the sorts of costs and benefits of working with historical data. I begin by extracting a subset of possible matches for each record, and then use training data to tune a matching algorithm that attempts to minimize both false positives and false negatives, taking into account the inherent noise in historical records. To make the procedure precise, I trace its application to an example from my own work, linking children from the 1915 Iowa State Census to their adult selves in the 1940 Federal Census. In addition, I provide guidance on a number of practical questions, including how large the training data needs to be relative to the sample. [en_US]
dc.description.sponsorship: This research has been supported by the NSF-IGERT Multidisciplinary Program in Inequality & Social Policy at Harvard University (Grant No. 0333403). [en_US]
dc.subject: Census data [en_US]
dc.subject: United States [en_US]
dc.subject: Machine learning [en_US]
dc.subject: Longitudinal historical samples [en_US]
dc.title: Automated census record linking: a machine learning approach [en_US]
dc.type: Article [en_US]
pubs.elements-source: manual-entry [en_US]
pubs.notes: Embargo: Not known [en_US]
pubs.organisational-group: Boston University [en_US]
pubs.organisational-group: Boston University, College of Arts & Sciences [en_US]
pubs.organisational-group: Boston University, College of Arts & Sciences, Department of Economics [en_US]
pubs.publication-status: Unpublished [en_US]
dc.identifier.orcid: 0000-0002-1625-2021 (Feigenbaum, J)

