Using Digital Footprint Data to Measure and Monitor Human Mobility

1.1 Introduction to digital footprint data

A digital footprint is the trail of activity and information that individuals leave as they interact with digital platforms and services (Rowe, Cabrera-Arnau, and Pietrostefani 2023). It encompasses data generated through online activities such as browsing, social media interactions, location tracking, and other digital transactions. Accumulated over time, these data form a digital profile that provides insights into an individual’s online behaviour, preferences, and activities. The data can also be aggregated to shed light on macro-structural processes and trends, such as urban mobility, consumer demand, transport usage, and population ageing and decline.

In particular, digital footprint data can be harnessed to analyse human mobility patterns, including internal mobility within a specific geographical area. By leveraging data from sources such as mobile devices, transportation apps, and geolocation services, we can gain a deeper understanding of how individuals move within a region. For example, digital footprint data can reveal the spatiotemporal patterns of commuting behaviour, the popularity of a route connecting two locations, the likelihood that a location experiences congestion at a given time of day, and even the impact of external factors such as weather conditions, public events or COVID-19 on mobility. Understanding human mobility patterns is therefore key to supporting urban planning, transportation, service delivery, public health and sustainability. For an extended discussion of digital footprint data, see Rowe, Cabrera-Arnau, and Pietrostefani (2023).

1.2 Meta-Facebook data

The social media platform Facebook, with its vast user base, offers unique advantages for analysing human mobility. In the course of providing services to their users, many smartphones and smartphone apps regularly collect precise location information. In the case of Facebook, people can choose whether or not to share this information with the platform (“Learn about Your Location Privacy | Privacy Center | Manage Your Privacy on Facebook, Instagram and Messenger | Facebook Privacy”). Location data is used to provide a variety of services, including helping people find nearby friends, information about nearby Wi-Fi hotspots, and location-relevant ads. This data also enables the targeting of AMBER alerts and of prompts to check in as “safe” after a hazard event. In addition to powering Facebook product features, this location data can provide insights into how populations are affected by hazard events as they happen (Maas et al. 2019).

Through Meta’s Data for Good programme, Facebook’s parent company, Meta Platforms Inc., provides tools built from privacy-protected data on the Facebook platform, as well as tools developed using commercially and publicly available sources like satellite imagery and census data. In particular, Data for Good has created two data sets, Facebook Population During Crisis and Facebook Movement During Crisis, that will be of use for this workshop.

These data sets make use of anonymised and aggregated data, including current and historical location data. While the raw data used to create the data sets remains available only to the data owners, the aggregated data, with privacy and security protections, is shared with non-profit organisations and researchers on an ongoing basis in the days and weeks following a hazard event (Maas et al. 2019).

1.3 Meta-Facebook population and movement data

Both data sets, Facebook Population and Facebook Movement, cover a two-year period starting in March 2020 and four Latin American countries: Argentina, Chile, Colombia and Mexico.

The data in both data sets is temporally aggregated into three 8-hour windows (00:00–08:00, 08:00–16:00 and 16:00–00:00) for every day in the aforementioned two-year time period.
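As a minimal sketch of this temporal aggregation, the function below assigns a timestamp to one of the three 8-hour windows. The function name and the window labels are illustrative and not part of the Meta data sets. The text above does not state which time zone the windows refer to, so the sketch simply uses the hour of whatever timestamp it is given.

```python
from datetime import datetime

# Labels for the three 8-hour windows described in the text.
WINDOWS = ("00:00-08:00", "08:00-16:00", "16:00-00:00")


def time_window(ts: datetime) -> str:
    """Return the label of the 8-hour window that contains `ts`.

    Hours 0-7 fall in the first window, 8-15 in the second and
    16-23 in the third, so integer division by 8 gives the index.
    """
    return WINDOWS[ts.hour // 8]


print(time_window(datetime(2020, 3, 10, 7, 59)))   # 00:00-08:00
print(time_window(datetime(2020, 3, 10, 8, 0)))    # 08:00-16:00
print(time_window(datetime(2020, 3, 10, 23, 30)))  # 16:00-00:00
```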

It is spatially aggregated into tiles according to the Bing Maps Tile System. This geospatial indexing system was developed by Microsoft and it partitions the world into square cells at various levels of resolution.
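To illustrate how such a tile index works, the sketch below converts a latitude and longitude into tile coordinates at a given level of detail, and then into a quadkey, the string identifier that the Bing Maps Tile System uses for each tile. It follows the Web Mercator projection and quadkey construction described in Microsoft's public documentation, but it is a simplified version: it skips the intermediate pixel coordinates, so results may differ from the reference implementation for points lying exactly on a tile boundary. The function names are our own, and the level of detail used in the Meta data sets is not stated here, so `level` is left as a parameter.

```python
import math

# Latitude limits of the Web Mercator projection used by the tile system.
MIN_LAT, MAX_LAT = -85.05112878, 85.05112878


def latlon_to_tile(lat: float, lon: float, level: int) -> tuple[int, int]:
    """Return the (tile_x, tile_y) of the tile containing a point."""
    lat = min(max(lat, MIN_LAT), MAX_LAT)
    lon = min(max(lon, -180.0), 180.0)
    # Normalised map coordinates in [0, 1], origin at the top-left corner.
    x = (lon + 180.0) / 360.0
    sin_lat = math.sin(math.radians(lat))
    y = 0.5 - math.log((1 + sin_lat) / (1 - sin_lat)) / (4 * math.pi)
    n = 1 << level  # number of tiles along each axis at this level
    tile_x = min(max(int(x * n), 0), n - 1)
    tile_y = min(max(int(y * n), 0), n - 1)
    return tile_x, tile_y


def tile_to_quadkey(tile_x: int, tile_y: int, level: int) -> str:
    """Interleave the bits of tile_x and tile_y into a base-4 quadkey."""
    digits = []
    for i in range(level, 0, -1):
        mask = 1 << (i - 1)
        digit = 0
        if tile_x & mask:
            digit += 1
        if tile_y & mask:
            digit += 2
        digits.append(str(digit))
    return "".join(digits)


print(tile_to_quadkey(3, 5, 3))  # 213
```

A quadkey has one digit per level, and the quadkey of a parent tile is a prefix of the quadkeys of its children, which makes it easy to aggregate tiles to a coarser resolution by truncating the string.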

The Facebook Population data provide information on the number of active Facebook users in each tile.

The Facebook Movement data capture the total number of Facebook users moving between pairs of origin and destination Bing tiles.
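One simple way to work with origin-destination records of this kind is to summarise them per tile. The sketch below computes the net flow (arrivals minus departures) for each tile from a list of `(origin, destination, n_moving)` tuples. This record layout and the tile identifiers are hypothetical and chosen only for illustration; they do not reflect the actual column names of the Facebook Movement data.

```python
from collections import defaultdict


def net_flows(od_records):
    """Return {tile: inflow - outflow} for a list of OD records.

    Each record is a tuple (origin_tile, destination_tile, n_moving).
    Records whose origin and destination coincide are skipped, since
    they do not change the net balance of a tile.
    """
    inflow = defaultdict(int)
    outflow = defaultdict(int)
    for origin, destination, n_moving in od_records:
        if origin == destination:
            continue
        outflow[origin] += n_moving
        inflow[destination] += n_moving
    tiles = set(inflow) | set(outflow)
    return {tile: inflow[tile] - outflow[tile] for tile in tiles}


# Hypothetical records for two tiles labelled "a" and "b".
records = [("a", "b", 10), ("b", "a", 4), ("a", "a", 7)]
print(net_flows(records) == {"a": -6, "b": 6})  # True
```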

Note that the Facebook Movement data do not allow us to distinguish between different types of movement, for example daily commutes to work and permanent changes of address. However, we can still track how movements between origin-destination pairs of Bing tiles evolve over time, and hence capture the impact of COVID-19 on mobility patterns.

In addition to the data for the two-year period, each entry in the Facebook Population and Facebook Movement data sets includes baseline levels from before COVID-19. The baseline values are computed over a 45-day period ending on 10 March 2020.

The data sets also include a ‘quality’ score indicating the number of standard deviations by which the observed data at specific locations and time windows differ from the baseline values, hence highlighting statistically significant changes.
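The relationship between observed values, baseline values and a score in standard deviations can be sketched as follows. This is a generic z-score, together with a percentage change from baseline, computed from a list of baseline-period counts. It is an assumption about the general form of such a score, not Meta's exact procedure (see Maas et al. (2019) for that). The function names and the example numbers are made up for illustration.

```python
from statistics import mean, stdev


def baseline_stats(baseline_counts):
    """Return the mean and sample standard deviation of baseline counts."""
    return mean(baseline_counts), stdev(baseline_counts)


def percent_change(observed, baseline_mean):
    """Percentage change of an observed count relative to the baseline mean."""
    return 100.0 * (observed - baseline_mean) / baseline_mean


def z_score(observed, baseline_mean, baseline_sd):
    """Number of baseline standard deviations between observed and baseline."""
    return (observed - baseline_mean) / baseline_sd


# Made-up baseline counts for one tile and time window.
baseline = [100, 110, 90, 100]
m, sd = baseline_stats(baseline)
print(percent_change(80, m))         # -20.0
print(round(z_score(80, m, sd), 2))  # -2.45
```

A score far from zero, in either direction, flags an observation that is unusual relative to the variability seen during the baseline period. The sketch does not handle a baseline mean or standard deviation of zero, which would need a guard in practice.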

1.3.1 Data generation

Before releasing these data sets, Meta applies three techniques to protect privacy and ensure anonymity. First, a small, undisclosed amount of random noise is added so that precise locations cannot be identified from small population counts in sparsely populated areas. Second, spatial smoothing based on inverse distance-weighted averaging is applied to produce a smooth population count surface. Third, any remaining population counts of less than 10 are removed from the final data set. While removing small counts may lead to an underrepresentation of the population in sparsely populated places, the geographic distribution of the population is still reflected in the data (see Maas et al. (2019) for details).
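The three steps can be sketched on a toy grid of tile counts as shown below. This is an illustration of the general techniques only, not Meta's implementation: the noise distribution and scale are undisclosed, so Gaussian noise with a user-supplied `scale` is an assumption; the weighting function `1 / (1 + d) ** power` is one of several possible inverse-distance weights; and all function names are our own. Only the threshold of 10 comes from the description above.

```python
import math
import random


def add_noise(counts, scale, rng):
    """Step 1: add random noise to each tile count (Gaussian is assumed)."""
    return {tile: c + rng.gauss(0.0, scale) for tile, c in counts.items()}


def idw_smooth(counts, power=2.0):
    """Step 2: replace each count by an inverse distance-weighted average.

    Tiles are keyed by (x, y) grid coordinates. The weight 1 / (1 + d) ** power
    stays finite at distance d = 0, so a tile contributes to its own average.
    """
    smoothed = {}
    for (x0, y0) in counts:
        num = den = 0.0
        for (x1, y1), c in counts.items():
            d = math.hypot(x1 - x0, y1 - y0)
            w = 1.0 / (1.0 + d) ** power
            num += w * c
            den += w
        smoothed[(x0, y0)] = num / den
    return smoothed


def drop_small(counts, threshold=10):
    """Step 3: remove tiles whose count is below the threshold."""
    return {tile: c for tile, c in counts.items() if c >= threshold}


# Toy example: three tiles on a grid, with a fixed seed for reproducibility.
rng = random.Random(42)
raw = {(0, 0): 3, (0, 1): 40, (1, 1): 120}
released = drop_small(idw_smooth(add_noise(raw, scale=1.0, rng=rng)))
print(sorted(released))
```

The quadratic loop in `idw_smooth` is fine for a toy grid but would be too slow for country-wide data, where a spatial index or a fixed neighbourhood radius would be needed.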

1.4 Challenges of digital footprint data

Despite the numerous advantages of using digital footprint data (DFD) to study patterns of human mobility, biases in this type of data are widely regarded as a problem. These biases usually stem from the fact that some population groups, for example younger people or people living in urban areas, are more likely than others to use location-tracking technologies. DFD may therefore not be representative of the entire population, and analyses based on DFD may be inaccurate as a result, especially when these biases are not accounted for.