Meta-Facebook data
Digital footprint data
A digital footprint refers to the trail of digital activities and information left by individuals as they interact with digital platforms and services (Rowe, Cabrera-Arnau, and Pietrostefani 2023). It encompasses data generated through online activities such as browsing history, social media interactions, location tracking and other digital transactions. The cumulative collection of these data forms a digital profile that provides insights into an individual’s online behaviour, preferences and activities. These data can also be aggregated to shed light on macro-structural processes and trends, such as urban mobility, consumer demand, transport usage, and population ageing and decline.
In particular, Digital Footprint Data (DFD) can be harnessed to analyse human mobility patterns, including patterns of internal mobility within a specific geographical area. By leveraging data from sources such as mobile devices, transportation apps and geolocation services, we can gain a deeper understanding of how individuals move within a region. For example, DFD can reveal the spatiotemporal patterns of commuting behaviour, the popularity of a route connecting two locations, the likelihood that a given location experiences congestion, or the impact of external factors such as weather conditions, public events or COVID-19 on mobility. Understanding human mobility patterns is therefore key to supporting fundamental activities, including urban planning, transportation, service delivery, public health and sustainability. For an extended discussion of digital footprint data, see Rowe, Cabrera-Arnau, and Pietrostefani (2023).
Meta-Facebook data
With its large user base, the social media platform Facebook offers unique advantages for analysing human mobility. In the course of providing services to their users, and with the users’ consent (whether or not they are fully aware of it), smartphones and smartphone apps regularly collect precise location information. In the case of Facebook, people can choose whether or not to provide this information to Facebook (“Learn about Your Location Privacy | Privacy Center | Manage Your Privacy on Facebook, Instagram and Messenger | Facebook Privacy”). Location data are used to provide a variety of services, including helping people find nearby friends and nearby Wi-Fi hotspots, and showing location-relevant ads. These data also enable the targeting of AMBER alerts and of prompts to check in as “safe” after a hazard event. Beyond powering Facebook product features, these location data can provide insights into how populations are affected by hazard events as they occur (Maas et al. 2019).
Through its Data for Good programme, Meta, Facebook’s parent company, provides tools built from privacy-protected data on the Facebook platform, as well as tools developed using commercially and publicly available sources such as satellite imagery and census data. In particular, Data for Good has created two datasets, Facebook Population During Crisis and Facebook Movement During Crisis, which will be used in this workshop.
These datasets are built from anonymised and aggregated data, including current and historical location data. While the raw data used to create the datasets remain available only to the data owners, the aggregated data, with privacy and security protections, are shared with non-profit organisations and researchers on an ongoing basis in the days and weeks following a hazard event, subject to the signature of a data sharing agreement (Maas et al. 2019).
Meta-Facebook population and movement data
Both datasets identified above (i.e. Facebook Population and Facebook Movement) cover a period of roughly two years, from March 2020 to May 2022, and include data for Mexico.
The records in both datasets are temporally aggregated into three 8-hour windows (00:00–08:00, 08:00–16:00 and 16:00–00:00) for every day.
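To make the temporal aggregation concrete, the sketch below assigns a timestamp to one of the three daily 8-hour windows. It is a minimal illustration only: the function name `time_window` and the `"HH:00-HH:00"` labels are hypothetical and may not match how the windows are labelled in the files Meta distributes.

```python
from datetime import datetime

# The three daily 8-hour windows described above, as (start_hour, end_hour).
WINDOWS = [(0, 8), (8, 16), (16, 24)]


def time_window(ts: datetime) -> str:
    """Return a label for the 8-hour window that contains timestamp ts."""
    for start, end in WINDOWS:
        if start <= ts.hour < end:
            # The last window ends at midnight, so 24 is written as 00.
            return f"{start:02d}:00-{end % 24:02d}:00"
    raise ValueError("hour out of range")  # unreachable for a valid datetime


print(time_window(datetime(2020, 3, 15, 7, 59)))   # 00:00-08:00
print(time_window(datetime(2020, 3, 15, 8, 0)))    # 08:00-16:00
print(time_window(datetime(2020, 3, 15, 23, 30)))  # 16:00-00:00
```

Because each window is half-open (the start hour is included, the end hour is not), every timestamp falls into exactly one window.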
The data are spatially aggregated into tiles according to the Bing Maps Tile System. This geospatial indexing system, developed by Microsoft, partitions the world into square tiles at multiple levels of detail (i.e. resolution).
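As an illustration of how a location is mapped to a Bing tile, the sketch below follows the formulas published in Microsoft's Bing Maps Tile System documentation: a latitude/longitude pair is projected to Web Mercator pixel coordinates, converted to tile coordinates, and encoded as a quadkey string. This is a sketch rather than Meta's own code; the level of detail used in the Facebook datasets is not stated here, so `level` is left as a parameter and the value 13 in the example is hypothetical.

```python
from __future__ import annotations

import math

# Latitude limits of the Web Mercator projection used by Bing Maps.
MIN_LAT, MAX_LAT = -85.05112878, 85.05112878


def clip(value: float, lo: float, hi: float) -> float:
    """Constrain value to the interval [lo, hi]."""
    return min(max(value, lo), hi)


def latlon_to_tile(lat: float, lon: float, level: int) -> tuple[int, int]:
    """Return the Bing tile (tile_x, tile_y) containing a WGS84 point at `level`."""
    lat = clip(lat, MIN_LAT, MAX_LAT)
    lon = clip(lon, -180.0, 180.0)
    x = (lon + 180.0) / 360.0
    sin_lat = math.sin(math.radians(lat))
    y = 0.5 - math.log((1 + sin_lat) / (1 - sin_lat)) / (4 * math.pi)
    map_size = 256 << level  # width and height of the whole map in pixels
    pixel_x = int(clip(x * map_size + 0.5, 0, map_size - 1))
    pixel_y = int(clip(y * map_size + 0.5, 0, map_size - 1))
    return pixel_x // 256, pixel_y // 256  # each tile is 256 x 256 pixels


def tile_to_quadkey(tile_x: int, tile_y: int, level: int) -> str:
    """Interleave the bits of tile_x and tile_y into a base-4 quadkey string."""
    digits = []
    for i in range(level, 0, -1):
        mask = 1 << (i - 1)
        digit = 0
        if tile_x & mask:
            digit += 1
        if tile_y & mask:
            digit += 2
        digits.append(str(digit))
    return "".join(digits)


# Example: a point in Mexico City at a hypothetical level of detail of 13.
tx, ty = latlon_to_tile(19.4326, -99.1332, 13)
print(tx, ty, tile_to_quadkey(tx, ty, 13))
```

Each additional level splits every tile into four, so a quadkey has one digit per level, and its leading digits identify the coarser tiles that contain it.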
The Facebook Population data provide information on the number of active Facebook users in each tile.
The Facebook Movement data capture the total number of Facebook users moving between pairs of origin and destination Bing tiles.
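Origin-destination counts of this kind can be summarised per tile. The sketch below computes the net flow for each tile (users arriving minus users leaving) within one time window. The record layout `(origin_quadkey, destination_quadkey, n_users)` and the quadkeys themselves are hypothetical and do not reproduce the actual column names or values of the Facebook Movement files.

```python
from collections import defaultdict

# Hypothetical OD records for one 8-hour window:
# (origin_quadkey, destination_quadkey, n_users).
od_records = [
    ("0231", "0233", 120),
    ("0233", "0231", 80),
    ("0231", "0232", 40),
    ("0232", "0231", 10),
]


def net_flows(records):
    """Return inflow minus outflow of Facebook users for each tile."""
    net = defaultdict(int)
    for origin, destination, n in records:
        if origin == destination:
            continue  # a same-tile record would add and subtract the same amount
        net[origin] -= n
        net[destination] += n
    return dict(net)


print(net_flows(od_records))  # {'0231': -70, '0233': 40, '0232': 30}
```

Because every user counted as leaving one tile is counted as arriving in another, the net flows across all tiles sum to zero.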
We note here that due to the nature of the Facebook Movement data, we cannot distinguish between different types of movements, for example, daily commutes to work or permanent changes of address. However, we are still able to detect the evolution of movements between origin-destination pairs of Bing tiles, and hence, we can capture the impact of disasters, such as COVID-19, earthquakes, wildfires or hurricanes on human mobility patterns and local population numbers.
Each entry in the Facebook Population and Facebook Movement datasets includes baseline values describing levels before the disaster event. The baseline values are computed over a 45-day period ending before the date of the disaster.
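A minimal sketch of how a baseline and the change relative to it could be computed for a single tile and 8-hour window is shown below. It assumes the baseline is the simple mean of the counts over the 45 days before the event; Meta's exact procedure may differ (for example, by matching days of the week), and the input layout, a dictionary from dates to counts, is hypothetical.

```python
from datetime import date, timedelta
from statistics import mean

BASELINE_DAYS = 45  # length of the baseline period stated in the text


def baseline(counts, event_date):
    """Mean count over the 45 days ending the day before event_date.

    Assumption: `counts` maps a date to the number of users observed in one
    tile and one 8-hour window on that date; days with no record are skipped.
    """
    days = [event_date - timedelta(days=k) for k in range(1, BASELINE_DAYS + 1)]
    values = [counts[d] for d in days if d in counts]
    if not values:
        raise ValueError("no observations in the baseline period")
    return mean(values)


def percent_change(observed, base):
    """Percentage change of an observed count relative to its baseline."""
    return 100.0 * (observed - base) / base


# Hypothetical series: 200 users per day before the event, 150 on the event date.
event = date(2020, 3, 20)
counts = {event - timedelta(days=k): 200 for k in range(1, 46)}
counts[event] = 150

base = baseline(counts, event)
change = percent_change(counts[event], base)  # baseline of 200 users; a 25% drop
```

Excluding the event date itself from the baseline keeps the reference level independent of the disruption being measured.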
The datasets also include a ‘quality’ score indicating the number of standard deviations by which the observed data at specific locations and time windows differ from the baseline values, highlighting statistically significant changes.
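The score described above is a standardised difference, or z-score: the gap between an observation and the baseline mean, divided by the standard deviation of the baseline values. The sketch below computes it from a list of baseline observations. It assumes the sample standard deviation is used; the function name and the baseline values are hypothetical, and this is not Meta's published implementation.

```python
from statistics import mean, stdev


def z_score(observed, baseline_values):
    """Number of standard deviations between an observation and the baseline mean."""
    mu = mean(baseline_values)
    sigma = stdev(baseline_values)  # sample standard deviation; needs >= 2 values
    if sigma == 0:
        raise ValueError("baseline has zero variance")
    return (observed - mu) / sigma


baseline_values = [98, 102, 100, 101, 99]  # hypothetical baseline counts
print(round(z_score(90, baseline_values), 2))  # -6.32
```

Here an observed count of 90 lies more than six standard deviations below a baseline mean of 100, which would flag a marked drop in that tile and time window.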
Data generation
Prior to releasing the above-mentioned datasets, Meta applies three techniques to ensure privacy and anonymisation. First, a small, undisclosed amount of random noise is added so that precise locations cannot be identified from small population counts in sparsely populated areas. Second, spatial smoothing is applied to produce a smooth population count surface using inverse distance-weighted averaging. Third, any remaining population counts of less than 10 are removed from the final dataset. While removing small counts may lead to an underrepresentation of the population in these places, the geographic distribution of the population is still reflected in the data (see Maas et al. (2019) for details).
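The three steps can be illustrated with a toy pipeline over tile counts keyed by `(tile_x, tile_y)`. This is a hypothetical sketch, not Meta's implementation: the noise distribution (Gaussian) and its scale, the smoothing radius, the distance exponent and the weight given to a tile's own value are all assumptions, since the actual values are undisclosed. Only the order of the steps and the threshold of 10 come from the text.

```python
import math
import random

MIN_COUNT = 10  # threshold stated in the text


def add_noise(counts, scale, rng):
    """Step 1 (hypothetical): add Gaussian noise with an arbitrary scale."""
    return {tile: c + rng.gauss(0.0, scale) for tile, c in counts.items()}


def idw_smooth(counts, radius=1.5, power=2.0):
    """Step 2: inverse distance-weighted average over tiles within `radius`."""
    smoothed = {}
    for (x, y) in counts:
        num = den = 0.0
        for (u, v), c in counts.items():
            d = math.hypot(x - u, y - v)
            if d > radius:
                continue
            w = 1.0 if d == 0 else 1.0 / d ** power  # self-weight of 1 is an assumption
            num += w * c
            den += w
        smoothed[(x, y)] = num / den
    return smoothed


def drop_small(counts, threshold=MIN_COUNT):
    """Step 3: remove any remaining counts below the threshold."""
    return {tile: c for tile, c in counts.items() if c >= threshold}


def privacy_pipeline(counts, noise_scale=1.0, seed=0):
    rng = random.Random(seed)  # seeded so the example is reproducible
    return drop_small(idw_smooth(add_noise(counts, noise_scale, rng)))


# Hypothetical counts: two adjacent tiles and one isolated, sparsely populated tile.
tile_counts = {(0, 0): 100, (1, 0): 2, (5, 5): 3}
print(privacy_pipeline(tile_counts, noise_scale=0.0))  # {(0, 0): 51.0, (1, 0): 51.0}
```

With the noise switched off, smoothing spreads the large count onto its neighbour, while the isolated tile keeps its small count and is then removed. This mirrors how sparsely populated places can end up underrepresented in the released data.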
Challenges of digital footprint data
Despite the numerous advantages of using DFD to study human mobility patterns, the presence of biases in this type of data is widely regarded as a problem. These biases usually stem from the fact that certain population groups, for example younger people or people living in urban areas, may be more likely to use location-tracking technologies than others. Therefore, DFD may not be representative of the entire population, and the accuracy of analyses based on DFD may be compromised, especially when biases are not accounted for.
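One simple way to partially account for uneven coverage is to scale each flow by the inverse of the Facebook penetration rate in its origin area, where the penetration rate is the number of Facebook users divided by a census population count. The sketch below is an illustration with hypothetical area names and figures. It assumes that Facebook users are representative of the population within each origin area, so it does not correct for biases by age or other characteristics inside an area.

```python
def adjust_flows(flows, fb_users, census_pop):
    """Scale each OD flow by the inverse Facebook penetration rate of its origin."""
    adjusted = {}
    for (origin, destination), n in flows.items():
        rate = fb_users[origin] / census_pop[origin]  # share of residents on Facebook
        adjusted[(origin, destination)] = n / rate
    return adjusted


# Hypothetical figures: Facebook coverage is higher in the urban area.
fb_users = {"urban": 5000, "rural": 1000}
census_pop = {"urban": 10000, "rural": 4000}
flows = {("urban", "rural"): 100, ("rural", "urban"): 100}

print(adjust_flows(flows, fb_users, census_pop))
# {('urban', 'rural'): 200.0, ('rural', 'urban'): 400.0}
```

Two raw flows of equal size become unequal once coverage is taken into account, because each Facebook user in the rural area stands for more residents than each user in the urban area.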