Due to licensing restrictions, we cannot provide the raw data. However, we provide estimates obtained with the methods we developed, which are made available alongside our scientific publications for reproducibility purposes. Currently, the database is global, at the country level, and covers the period from 1998 to 2018.
Description | Research article | Data download | Data source | Materials for reproducibility |
---|---|---|---|---|
Yearly measures of the number of scholars, emigrations, immigrations, net migration rates and other variables, per country | | Replication data on Github | Scopus | Github repository |
Yearly bilateral flows of scholars between countries | | Replication data on Github | Scopus | Github repository |
If you download or use this data, please subscribe to our newsletter to receive updates on the project.
More information about how the data is produced and processed can be found in our official data documentation.
For questions or problems concerning access to the data, please write to us.
Currently, our main data source is the Scopus bibliometric database because of its high quality in author name disambiguation. It covers the metadata and abstracts of over 50 million articles from more than 9,000 publishers, along with over 17 million author profiles.
Via the Max Planck Digital Library, we use the infrastructure of the German Competence Centre for Bibliometrics to generate and download 240 million authorship records from the data. One authorship record is the unique combination of author, publication and affiliation address.
We filter out unreliable entries of the Scopus database (please read the Methods and Documentation Working Paper for more information). In the next step, we group the data by year and author:
year | author | affiliation country |
---|---|---|
2008 | Jane Doe | DEU |
2008 | Jane Doe | DEU |
2008 | Jane Doe | FRA |
2012 | Jane Doe | USA |
If an author has more than one affiliation country in a year, we take the most frequent one. If an author has no affiliation country in a year, we fill up to two years before a publication with the country of the next available year:
year | author | inferred residence country |
---|---|---|
2006 | Jane Doe | DEU |
2007 | Jane Doe | DEU |
2008 | Jane Doe | DEU |
2010 | Jane Doe | USA |
2011 | Jane Doe | USA |
2012 | Jane Doe | USA |
Aggregating these inferred residence records by country and year gives each country's population of researchers.
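The two inference rules can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's actual pipeline: the function name `infer_residence`, the `max_gap` parameter and the `(year, country)` record layout are hypothetical, and ties between equally frequent countries are broken by first occurrence, which the description above does not specify.

```python
from collections import Counter, defaultdict

def infer_residence(records, max_gap=2):
    """Return {year: country} for one author.

    records: iterable of (year, country) pairs, one per authorship record.
    max_gap: how many years before a publication year may be back-filled.
    """
    by_year = defaultdict(list)
    for year, country in records:
        by_year[year].append(country)

    # Rule 1: most frequent affiliation country per publication year.
    # Counter.most_common breaks ties by first occurrence (an assumption here).
    observed = {y: Counter(cs).most_common(1)[0][0] for y, cs in by_year.items()}

    # Rule 2: back-fill up to max_gap empty years before each publication year
    # with the country of that (next available) year.
    residence = dict(observed)
    for year, country in observed.items():
        for back in range(1, max_gap + 1):
            gap_year = year - back
            if gap_year in observed:
                break  # an earlier publication year covers everything before it
            residence[gap_year] = country
    return dict(sorted(residence.items()))

jane = [(2008, "DEU"), (2008, "DEU"), (2008, "FRA"), (2012, "USA")]
print(infer_residence(jane))
# {2006: 'DEU', 2007: 'DEU', 2008: 'DEU', 2010: 'USA', 2011: 'USA', 2012: 'USA'}
```

The year 2009 stays empty because it lies more than two years before the next publication year, which matches the example table.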
If the country of residence changes, this will create a "migration event". In our example we register one migration event. The outmigration country is Germany, the inmigration country is the USA. The year of the migration is the first year with a new residence country: 2010.
year | author | outmigration country | inmigration country |
---|---|---|---|
2010 | Jane Doe | DEU | USA |
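As a rough sketch of how such events could be detected, the snippet below walks through one author's inferred residence years in order and records an event whenever the country differs from the previous available year. The function name `migration_events` and the `{year: country}` input are hypothetical, and the handling of empty years (they are simply skipped) is an assumption based on the Jane Doe example.

```python
def migration_events(residence):
    """Return a list of (year, outmigration country, inmigration country).

    residence: {year: country} for one author; years may have gaps.
    """
    events = []
    previous = None
    for year in sorted(residence):
        country = residence[year]
        # A change relative to the previous available year is one event,
        # dated to the first year with the new residence country.
        if previous is not None and country != previous:
            events.append((year, previous, country))
        previous = country
    return events

jane = {2006: "DEU", 2007: "DEU", 2008: "DEU", 2010: "USA", 2011: "USA", 2012: "USA"}
print(migration_events(jane))  # [(2010, 'DEU', 'USA')]
```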
The migration numbers for each country are obtained by aggregating all migration events by country and year. The migration rates are calculated by dividing the migration numbers by the country's population of researchers.
The data is aggregated by country and year.
year | country | population of researchers | inmigration total | outmigration total | netmigration | outmigration rate | inmigration rate | netmigration rate |
---|---|---|---|---|---|---|---|---|
2016 | DEU | 131310 | 4499 | 4523 | -24 | 0.034 | 0.034 | -0.0002 |
2016 | USA | 759857 | 15296 | 14250 | 1046 | 0.019 | 0.020 | 0.0014 |
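The derived columns can be recomputed from the three counts in each row. The sketch below assumes that net migration is inmigration minus outmigration (consistent with the DEU row, where 4499 - 4523 = -24) and that each rate is the corresponding number divided by the population of researchers; the function name `migration_rates` is hypothetical.

```python
def migration_rates(population, inmigration, outmigration):
    """Return net migration and the three rates for one country-year."""
    if population <= 0:
        raise ValueError("population of researchers must be positive")
    net = inmigration - outmigration
    return {
        "netmigration": net,
        "outmigration rate": outmigration / population,
        "inmigration rate": inmigration / population,
        "netmigration rate": net / population,
    }

# Counts taken from the 2016 DEU row of the table.
deu_2016 = migration_rates(population=131310, inmigration=4499, outmigration=4523)
print(deu_2016["netmigration"])                 # -24
print(round(deu_2016["outmigration rate"], 3))  # 0.034
print(round(deu_2016["inmigration rate"], 3))   # 0.034
print(round(deu_2016["netmigration rate"], 4))  # -0.0002
```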
The data is aggregated by inmigration country, outmigration country and year. It shows the flows of scholars between all countries with at least one migration event.
year | outmigration country | inmigration country | number of migrations |
---|---|---|---|
2010 | DEU | USA | 2393 |
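Counting events per year and country pair, and then summing those counts per country, might look like the sketch below. The function names `bilateral_flows` and `country_totals` are hypothetical, and the sample events are invented for illustration; they are not taken from the dataset.

```python
from collections import Counter, defaultdict

def bilateral_flows(events):
    """Count events per (year, outmigration country, inmigration country)."""
    return Counter(events)

def country_totals(flows):
    """Sum bilateral flows into in/out totals per (year, country)."""
    totals = defaultdict(lambda: {"in": 0, "out": 0})
    for (year, origin, destination), n in flows.items():
        totals[(year, origin)]["out"] += n
        totals[(year, destination)]["in"] += n
    return dict(totals)

# Invented sample events, not real data: (year, origin, destination).
events = [
    (2010, "DEU", "USA"),
    (2010, "DEU", "USA"),
    (2010, "USA", "DEU"),
    (2011, "FRA", "DEU"),
]
flows = bilateral_flows(events)
totals = country_totals(flows)
print(flows[(2010, "DEU", "USA")])  # 2
print(totals[(2010, "DEU")])        # {'in': 1, 'out': 2}
```

Only country pairs with at least one event appear as keys, which mirrors the note that the table lists flows between countries with at least one migration event.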