Tuesday, 16 January 2018

Tapping Into Advertising Data for Studying International Migration

Ingmar Weber is the Research Director of the Social Computing Group at the Qatar Computing Research Institute (QCRI). As an undergraduate Ingmar studied mathematics at Cambridge University, before pursuing a PhD at the Max-Planck Institute for Computer Science. He subsequently held positions at the Ecole Polytechnique Fédérale de Lausanne and Yahoo Research Barcelona. In his interdisciplinary research, he applies computational methods to large amounts of online data from social media and other sources to study human behaviour at scale. Particular topics of interest include quantifying international migration using digital methods and other data for development projects. He has published over 100 peer-reviewed articles and his work is frequently featured in the popular press. Since 2016 he has been selected as an ACM Distinguished Speaker.

International migration is one of the key drivers of demographic change. However, official statistics on “stocks of migrants”, i.e. how many people with origin country X are residing in country Y, are often unreliable. Reasons for this include the free movement of EU nationals within the EU, as well as generally inadequate census and civil registration systems for many developing countries.

Work done by Emilio Zagheni, Krishna Gummadi and me tries to address some of the shortcomings of traditional methods for creating migration statistics by tapping into a new kind of data: audience estimates provided by Facebook.

Facebook and other internet giants collect rich data on their users in order to serve them more targeted and more relevant advertising. The data collected includes self-declared attributes such as age or gender; metadata such as the device or internet connection type used to access the service; third-party information such as credit card or voter registration data; and attributes such as topical interests inferred from behavior such as "liking" posts on Facebook or visiting websites with social plugins. See https://www.cision.com/us/2017/07/how-to-improve-social-media-targeting/ for a good list of available targeting options on Facebook, Twitter, LinkedIn and Snapchat.

The detailed user profiles are generally not available to researchers outside the companies. However, aggregate and anonymized data is shared with potential advertisers in the form of audience estimates. Basically, Facebook and other social networks provide advertisers with information on "how many users match criteria X". For example, to help with planning an advertising campaign, an advertiser could ask "how many monthly active Facebook users are married, male German expats aged 30-50 living in Qatar?" Answer: 120 (as of Dec 20, 2017).

This type of real-time digital census over Facebook's user base could potentially be of value to augment existing population estimates, in particular for countries where official statistics are unreliable or outdated. However, due to selection biases and an estimated 13% of duplicate or fake accounts, it is clear that using this data set as a simplistic enumeration tool for the whole population will not give accurate results. See https://www.theguardian.com/technology/2017/sep/07/facebook-claims-it-can-reach-more-people-than-actually-exist-in-uk-us-and-other-countries for more indications of shortcomings of the data.

In our own research, we do not use the raw advertising audience estimates as the final answer. Rather, we treat them as one of potentially many input signals for an estimation task of the kind "how many Germans are living in Qatar today?" As long as the biases in the underlying data are either (i) uniform, e.g. 13% of duplicate or fake Facebook accounts for all countries, or (ii) systematic, e.g. Western Europeans are always less likely to be on Facebook compared to Arab nationals, an appropriately fitted model can account for and correct such biases.
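
To make the idea of correcting a uniform bias concrete, here is a minimal sketch in Python. It is not the model from our paper: the function names, the log-linear form and all of the numbers are assumptions made up purely for illustration. It fits log(reference count) against log(Facebook estimate) on places where a trusted reference count exists, then applies the fitted relation to a new Facebook estimate.

```python
import numpy as np


def fit_log_linear(fb_estimates, reference_counts):
    """Fit log(reference) = intercept + slope * log(fb_estimate) by least squares."""
    x = np.log(np.asarray(fb_estimates, dtype=float))
    y = np.log(np.asarray(reference_counts, dtype=float))
    slope, intercept = np.polyfit(x, y, deg=1)
    return slope, intercept


def predict(fb_estimate, slope, intercept):
    """Map a raw Facebook audience estimate to a bias-corrected count."""
    return float(np.exp(intercept + slope * np.log(fb_estimate)))


# Synthetic, made-up data (not real statistics): assume 60% of migrants are
# on Facebook and that counts are inflated by 13% duplicate or fake accounts.
HYPOTHETICAL_PENETRATION = 0.6
HYPOTHETICAL_INFLATION = 1.13
bias = HYPOTHETICAL_PENETRATION * HYPOTHETICAL_INFLATION

reference_counts = np.array([500.0, 2000.0, 8000.0, 30000.0, 120000.0])
fb_estimates = bias * reference_counts

slope, intercept = fit_log_linear(fb_estimates, reference_counts)

# Correct a new Facebook estimate that corresponds to 10,000 actual migrants.
corrected = predict(bias * 10000.0, slope, intercept)
print(f"slope={slope:.3f}, corrected estimate={corrected:.0f}")
```

Because the synthetic bias is perfectly uniform, the fit recovers it exactly. With real data the relation is noisy, and a systematic bias of type (ii) would call for additional covariates, such as the origin country, in the model.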

In our paper “Leveraging Facebook's Advertising Platform to Monitor Stocks of Migrants”, Emilio, Krishna and I show the feasibility of this approach to derive stocks of migrants across different US states and around the world. Concretely, we show that it is indeed possible to build models to make out-of-sample predictions on how many people from a certain origin country are residing in a particular US state. Similarly, it is possible to predict the percentage of expats out of the whole population for countries around the globe.

Potentially, the Facebook audience estimates could also give estimates for stocks of migrants at the sub-national and even the sub-city level. To illustrate this, Matheus Araujo, Michael Aupetit, Yelena Mejova and I created a data visualization for the Facebook data for Doha: http://fb-doha.qcri.org.

As an example, this shows a density map of Nepali expats across Doha, with the highest density in the Industrial Area. The tool also shows that Nepali expats in Doha are predominantly male (93%) and are Android users (94%). Contrast this with the same map for Western expats, which shows the highest densities in West Bay and the Pearl. Western expats are more gender balanced (44% female) and more likely to own iPhones (56%). A similar visualization for New York City can be explored at http://fb-nyc.qcri.org [Usage info for the two data visualizations: Select several filters on the left to drill down to smaller populations by nationality, gender or other criteria. Click a selection again to de-select and revert to the whole category such as all nationalities or all genders.]

Given Facebook’s global reach of 2.1B monthly active users we believe there is a lot of potential in using this data source to support global development efforts, in particular given its easy accessibility through official APIs. At the same time, no single data source is a cure-all and many have complementary strengths. Satellite data has truly global reach and can give estimates of population densities but satellite data will never reveal the nationality or gender of earthlings. Call detail records (CDR, https://en.wikipedia.org/wiki/Call_detail_record) are great for studying dynamic changes in population density, but there are limitations for monitoring international migration as people often change their SIM cards once they move.

I’m truly optimistic that as Digital Demography advances and matures as a field and as researchers start to work collaboratively, combining different data sources, we will see more and more scientific work with real impact on the creation of migration statistics. If you’re interested in how to use new data sources and methodologies to help fill data gaps around the globe, please get in touch by email at: iweber -atsignal - hbku.edu.qa.

Relevant slide decks:

Using internet advertising data for studying international migration (https://www.slideshare.net/IngmarWeber/using-internet-advertising-data-for-studying-international-migration)

Digital Demography - WWW'17 Tutorial - Part II (https://www.slideshare.net/IngmarWeber/digital-demography-www17-tutorial-part-ii)

Wrapper libraries to obtain Facebook advertising audience estimates:

Wrapper library in Python (https://github.com/maraujo/pySocialWatcher) by Matheus Araujo (https://sites.google.com/view/matheusaraujo/)

All of my publications are available at https://ingmarweber.de/publications/. Feel free to follow me at https://twitter.com/ingmarweber.

Friday, 3 November 2017

Accidental Disclosure in the Online Medium

This blog is written by Ana Latia, a student at Cardiff University. She has been studying relevant sociological theories and methods for how we can understand the digital society and how the Internet shapes our everyday lives.

To what extent does performance on Facebook distort someone’s identity, and how is their privacy put in danger?
I am writing this in order to contribute to the conflicting debates concerning privacy on social media. This post examines individuals’ behaviours and performances on Facebook, with the concept of ‘accidental disclosure’ as its main focus.
A large number of academics have recently started to address problems such as users’ sense of privacy when using digital media. I find it very interesting that the digital world varies so much between countries and that findings may differ accordingly. For instance, there are individuals who are not aware that the content they share on Facebook can travel long distances in the online medium. The meme phenomenon is the clearest example: people are found on Facebook doing unconventional things, and their images, videos or texts are then transformed into a meme that circulates all around the internet. This raises questions regarding privacy and the extent to which people are aware of accidental disclosure.
The information that we provide on social media can now reach a very large number of individuals within seconds. With regard to Facebook, identity is performed by interacting with other people, through pictures, videos, or over Messenger, comments and likes. Social practices become permanent and reachable once they are performed online (Solove 2007). Accidental disclosure in this case refers to the unintended revealing of personal information: when a user posts sensitive information about themselves on a social networking site such as Facebook, or when someone else provides confidential material without the user’s authorization. This is important in social media because people cannot monitor everything that is shared about them: they are evaluated not only by what they post online, but also by their peers’ actions (Jernigan and Mistree 2009).
Perceptions about privacy and practices on Facebook have developed over time (Vitak 2017: 636). According to Chakrabortly et al (2013) and Madden et al (2013), younger users on Facebook tend to disclose a lot more information; they also have less restrictive attitudes on the issues of sharing personal information.
In May this year I conducted a small research project in order to understand individuals’ opinions about accidental disclosure and how this concept shapes their performance on Facebook. The methodology involved semi-structured interviews and visual methods in the form of images (collected from Facebook), while the sample was composed of three undergraduate students from Cardiff University.
Analysing these interviews and photos as social performances demonstrated the extent to which apparently ‘private’ experiences of the self are manifested by means of displaying photos on social media. All three interviewees admitted that they construct a self-identity on Facebook. Moreover, all of them claimed that at some point they would not share personal information if it would be seen by a large public. They were all fully aware that employers might look at their social media accounts and they had therefore divided their Facebook profile into two parts: close friends and general public.
An important contribution of this study is that it has practical implications. The findings can articulate how constrained a person feels when constructing an identity on Facebook. Moreover, they can also provide advice for employers who use social media when selecting people, prompting them to ask themselves: “Is this who X really is?” For example, two of the respondents claimed that they do not feel that the identity they construct on Facebook is accurate.
All of these represent a starting point for a better understanding of contemporary Facebook phenomena; studying individuals’ attitudes and behaviours on Facebook helps advance theory within social media studies.


· Solove, D. 2007. The future of reputation: Gossip, rumor, and privacy on the Internet. New Haven, CT: Yale University Press.
· Jernigan, C. and Mistree, B. 2009. “Gaydar: Facebook friendships expose sexual orientations.” In First Monday 14(10). Online. Available: http://firstmonday.org/htbin/cgiwrap/bin/ojs/index.php/fm/article/view/2611/2302 [Accessed: 03.12.2016].
· Vitak, J. 2017. “Facebook as a Research Tool in the Social and Computer Sciences”. In Sloan, L. and Quan-Haase, A. eds. 2017. The SAGE Handbook of Social Media Research Methods. Sage Publications. Online. Available at: https://www.dawsonera.com/readonline/9781473987210 [Accessed: 08.03.2017].