Differentiating Emigration from Return Migration in Digital Trace Data: A case study of scholarly migration

RISIS Research Seminar

March 12, 2025 @ 12:30 PM – 2:00 PM

Presenter: Aliakbar Akbaritabar, Max Planck Institute for Demographic Research

Discussant: TBA

Abstract

Most digital trace data does not include the nationality of individuals, for privacy reasons. When this data is used for migration research, it can suffer from left truncation because the migrant’s country of origin is uncertain. Identifying nationality enables a better differentiation between emigration and return migration. We infer nationality from the least data available, full names, and use it instead of the country of academic origin in studying the migration of scholars. We gathered 2.6 million unique name-nationality pairs from Wikipedia and categorized them into families of nationalities at three levels of granularity. We used a character-based machine learning model that reached a weighted F1-score of 80% for the highest-level and 64% for the country-level categorization. We discuss the shifts in migration rates when the country of origin is assigned based on authors’ names rather than the previously used country of first academic affiliation. Our results show that this impact is more pronounced for countries of immigration that have a more diverse academic workforce, such as the USA, Australia, and Canada. For instance, 75% of the outmigration from the United States to China is return migration of scholars with Chinese names, which has implications for research using bibliometric data to study the migration of scholars.
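To illustrate why the assumed country of origin matters, the following is a minimal sketch, not the presenter's implementation. It labels a single move relative to an assumed origin and shows how the same move reads as emigration under the first-affiliation assumption but as return migration under a name-based assumption. The `Move` class, the `classify_move` function, the country codes, and the "onward migration" label are all hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Move:
    """One observed change of affiliation country (hypothetical structure)."""
    scholar: str
    from_country: str
    to_country: str


def classify_move(move: Move, assumed_origin: str) -> str:
    """Label a move relative to an assumed country of origin.

    The three labels are an illustrative simplification, not the
    categories used in the study.
    """
    if move.to_country == assumed_origin:
        return "return migration"
    if move.from_country == assumed_origin:
        return "emigration"
    return "onward migration"


# Hypothetical scholar whose first affiliation is in the US and whose
# name-based origin is assumed to be CN, moving from the US to CN.
move = Move(scholar="scholar_a", from_country="US", to_country="CN")

by_first_affiliation = classify_move(move, assumed_origin="US")
by_name_based_origin = classify_move(move, assumed_origin="CN")

print(by_first_affiliation)   # emigration
print(by_name_based_origin)   # return migration
```

Under these assumptions, only the choice of origin changes the label, which is the kind of shift in migration rates the abstract describes.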