Using broadband infrastructure as a social sensor to detect inequities in unemployment during the COVID-19 pandemic – Scientific Reports


Mainline results for broadband: asymmetries in broadband access increase unemployment

Before discussing the results, we must first define empirically our key variable, namely what we mean by broadband access. We start with the definition used by the Federal Communications Commission (FCC), which states that broadband is “high-speed internet access that is always on and faster than the traditional dial-up access”. They define adequate broadband access for an individual as at least 25 Mbps download speed and 3 Mbps upload speed44. Broadband access can be measured in terms of several different metrics including a) actual speed (measured by Microsoft (MSFT), Ookla and MLab), b) advertised availability (measured by the FCC), and c) adoption (measured by the American Community Survey (ACS)). Therefore, our operating definition of broadband access is the degree to which an individual can access high-speed internet as measured in terms of self-reported access, advertised speed, and/or actual speed tests. For the purposes of this study, “adequate” means either that 50% or more of the population has 25 Mbps download and 3 Mbps upload in terms of advertised or actual speed, or that 50% or more of the population report access to broadband. This definition aligns with those of other well-regarded sources45. We discuss the differences in these broadband metrics further in the Methods section and in SI Appendices C, D, and E. Table 1 also highlights how each metric is measured.
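The 50% rule above can be illustrated with a minimal sketch. The function name, the county labels, and the shares below are hypothetical and chosen only for illustration; treating a share of exactly 50% as adequate follows the "50% or more" wording.

```python
def is_adequate(share_with_broadband, threshold=0.50):
    """Return True when the share of a county's population with
    25 Mbps download / 3 Mbps upload access meets the threshold."""
    return share_with_broadband >= threshold


# Hypothetical county-level shares (not data from the study).
shares = {"county_a": 0.62, "county_b": 0.50, "county_c": 0.31}
labels = {name: is_adequate(share) for name, share in shares.items()}
print(labels)
```

Passing a different `threshold` (for example 0.25, 0.40, or 0.75) would give the alternative penetration levels the study also reports.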

Table 1 A summary of each broadband metric used and an explanation of how it is measured and collected.

Figure 1 presents our base case results exploring the efficacy of using broadband as a social sensor to detect unemployment during the COVID-19 pandemic. When we refer to the “base case” throughout the study, we are referring to the use of the Microsoft (MSFT) 2020 dataset to separate out treated counties (50% or more broadband penetration) and control counties (less than 50% broadband penetration). We use this as the base case for our work because this data represents actual user speeds and is the most publicly accessible dataset with the widest geographic coverage. In this base case, when all else is equal after the shock of the stay-at-home orders, we find that counties with more than 50% of the population having access to 25 Mbps download/3 Mbps upload broadband speeds experience an increase of 1.34% in their unemployment rate over similar counties that have less than 50% access. Presented on the right-hand side of Fig. 1 is the parallel trends plot for the base regression, which shows that the unemployment rates for both treatment and control counties remain constant prior to the impact of COVID, which visually demonstrates that the parallel trends assumption is upheld (Fig. 7 provides a more formalized statistical test). We also conduct a synthetic controls analysis, which relaxes the parallel trends assumption and combines this approach with fixed effects as explained in prior work46 (see Table 3 and Appendix H for more details), and results remain robust to this approach. Presented on the left-hand side of Fig. 1 are the difference-in-differences (DiD) estimators for each kind of broadband data considered in the study, using varying levels of penetration to define the control and treatment groups in the regression models (cases using M-Lab and Ookla data are available in Appendix G).
When using MSFT, FCC or ACS as the treatment lens, this positive impact on unemployment remains robust across these different datasets and across different penetration levels (i.e., 25%, 40%, 50%, and 75%). While the results are robust across indicators, they do demonstrate variation based on what is being measured. At 50%, the results range from an increase of 0.84% in unemployment reported from FCC 2019 data to an increase of 2.31% in unemployment as reported by the ACS 2020 data. The regression models that serve as the basis for the heat maps in Fig. 1 are in SI Appendix I.
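As a rough illustration of what a DiD estimator measures, the sketch below computes the simplest two-group, two-period version: the change in the treated group's mean minus the change in the control group's mean. It is not the study's model; the regressions in SI Appendix I include controls and state-clustered standard errors that are omitted here, and all observations below are hypothetical.

```python
from statistics import mean


def did_estimate(rows):
    """Two-by-two difference-in-differences on group means.

    rows: list of (treated, post, unemployment_rate) tuples, where
    treated and post are booleans.
    """
    def cell_mean(treated, post):
        return mean(r for t, p, r in rows if t == treated and p == post)

    treated_change = cell_mean(True, True) - cell_mean(True, False)
    control_change = cell_mean(False, True) - cell_mean(False, False)
    return treated_change - control_change


# Hypothetical county-month unemployment rates in percent (not study data).
rows = [
    (True, False, 4.0), (True, False, 4.2),    # treated, before the shock
    (True, True, 9.0), (True, True, 9.4),      # treated, after the shock
    (False, False, 4.5), (False, False, 4.7),  # control, before the shock
    (False, True, 8.2), (False, True, 8.4),    # control, after the shock
]
estimate = did_estimate(rows)
print(round(estimate, 2))
```

A positive estimate means unemployment rose more in the treated group than in the control group after the shock, which is the direction of the base case result.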

Figure 1

Left: difference-in-difference estimators for the full dataset. Robust standard errors, clustered at the state level, are included below each estimate with statistical significance indicated by the stars based off of a two-tailed test. In the base case (MSFT 2020 data, 50% penetration), when all else is equal, after COVID, counties with more than 50% access to 25 Mbps download and 3 Mbps upload experience an increase of 1.34% in their unemployment rate over similar counties that have less than 50% access. Right: parallel trends between the control (below adequate access to broadband at a county level for MSFT 2020) and treatment (above adequate access to broadband on average at a county level for MSFT 2020). Parallel trends hold prior to the shock of the COVID-19 pandemic.

One may find this baseline result surprising as one would surmise greater broadband access means easier ability to stay-at-home to work, which should better ensure individuals can stay employed. However, what this immediate intuition does not consider is the tension between how broadband access is typically measured vs. what prior empirics, especially in the social sciences, tell us about how infrastructure is distributed. This has important empirical and conceptual implications.

Empirically, all sets of publicly available broadband data are currently measured by spatial area. While this is the convention in prior work33,47,48, a tacit assumption then is that access levels are evenly distributed across the spatial unit of analysis. For example, if we measure that 90% of the population within a spatial unit has adequate broadband access, the expectation is that 90% is evenly scattered across the unit. In our case, let’s assume a county has 100 neighborhoods, each with equivalent populations, this implies 90% of the people in each neighborhood has adequate access. However, several studies note that infrastructure access is actually not evenly distributed18,21,49,50. In other words, let’s revisit our fictitious county with 100 neighborhoods. The reality could be that if we measure on average 90% of a population within a county that has adequate access, perhaps 100% of the population in 90 of the county’s neighborhoods have adequate access but no one in the remaining 10 neighborhoods have adequate access. This would still arrive at an average of 90% coverage. For illustrative purposes in this example, we assume uneven distribution is based on space, but such unevenness can also be based on specific industries or populations that are known to have asymmetric need for broadband or access to it. We will discuss this in the next section.
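The fictitious county above can be worked through numerically. The sketch below assumes, as in the text, 100 neighborhoods with equal populations, so the county-level share is a simple average of the neighborhood shares; the variable names are illustrative only.

```python
def county_share(neighborhood_shares):
    """County-level access share when all neighborhoods have equal populations."""
    return sum(neighborhood_shares) / len(neighborhood_shares)


# Scenario 1: access is spread evenly, 90% in every neighborhood.
even = [0.9] * 100

# Scenario 2: access is concentrated, 100% in 90 neighborhoods and 0% in 10.
concentrated = [1.0] * 90 + [0.0] * 10

even_avg = county_share(even)
concentrated_avg = county_share(concentrated)
unserved_neighborhoods = sum(1 for share in concentrated if share == 0.0)

print(round(even_avg, 2), round(concentrated_avg, 2), unserved_neighborhoods)
```

Both scenarios report the same county-level figure, but only the second contains neighborhoods with no access at all, which is the detail a county-level metric cannot reveal.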

Conceptually then, those without access being confined to ever more concentrated areas may be “out of sight, out of mind”. If employers increasingly see that those around them have broadband access, then they are likely to assume everyone has access. In essence, what we theoretically argue is happening is a “halo effect”, whereby employers assume everyone can access broadband to continue work as they increasingly do not observe those ever more concentrated few without such access.

Given these empirical and conceptual considerations, we would expect unemployment to be sensitive to lack of access being confined to a specific set of neighborhoods, especially for counties where the majority have adequate access. We find evidence that this may indeed be the case. To gain insight into this, we sought census tract data, which is the finest grained publicly available data that approximates the level of a neighborhood. While ACS, FCC and Ookla datasets all release broadband penetration estimates at the census-tract level annually, unemployment records at the census-tract level are only available from the ACS 5-year estimates, which fail to provide the monthly temporal resolution required to run the prescribed DiD regression models. In order to make use of the data which is available, Fig. 2 shows the DiD estimator for sub-groups of counties which have above a given number of census-tracts with broadband access. In this subset, we use Ookla as our base case as this is the only dataset available at both a county and a census-tract level which measures quality of broadband speed thresholds. We find in Fig. 2 that as we subset the treated counties to ever greater percentages of census tracts with adequate access (25%, 40%, 50%, and 75%), the difference in unemployment rate between the treated and control counties increases. As we argue and anticipate, this suggests that lack of access is not evenly distributed but concentrated and, when it is, employers are arguably less able to adequately detect and accommodate those without adequate broadband access. This further implies that broadband may be a useful social sensor in detecting gaps at ever finer spatial scales to more sharply identify those groups which most need broadband support in times of crises such as during a pandemic.

Figure 2

Left: difference-in-difference estimators for the dataset, subset by percentage of census-tracts which have access to adequate broadband within the treated group of counties. Robust standard errors, clustered at the state level, are included below each estimate with statistical significance indicated by the stars based off of a two-tailed test. In the base case (Ookla data, 50% of census-tracts with access in the county), when all else is equal, after COVID, counties with more than 50% of their census-tracts having adequate access to broadband experience an increase of 1.10% in their unemployment rate over similar counties that have less than 50% of the census-tracts in their county with adequate access. The plot of the parallel trends between the control (below adequate access to broadband at a county level) and treatment (above adequate access to broadband on average at a county level, parsed out by counties with varying levels of census tracts with adequate broadband access) is available in Appendix G. The parallel trends hold prior to the shock of the COVID-19 pandemic, and while the differences between treated and control groups are small, they can be observed in the plots. The idea here is to assess whether concentrated lack of access is what drives these potential effects. The full regressions that underpin these results are in SI Appendix I.

Fine tuning the sensor using socioeconomic and sociodemographic data

In light of the findings at this point, our aim now is to detect gaps in broadband spatial data and not assume the measure means even distribution across the spatial unit of analysis. In this spirit, we unpack who specifically within a spatial unit (in our case, a given county) does not have adequate broadband access, and in turn, is most negatively impacted by stay-at-home mandates implemented during the COVID pandemic. This is done by subsetting to specific populations known to have lower access levels or industries known to need broadband access more to work productively. If these gaps are consequential, we assert that unemployment will be higher for those counties where lack of access is indeed concentrated to particular groups or industries, hence the formulation of our aforementioned hypotheses.

Sensor sharpening and coarsening

To reiterate our hypotheses, broadband should sharpen as a social sensor for groups whose lack of broadband access makes them less resilient to the COVID-19 stay-at-home mandates, while broadband as a social sensor should coarsen for groups and locations where such access should be inconsequential in their ability to manage the pandemic. Several groupings lend support to our hypotheses. While we report the summary of our results in Fig. 3 and Table 2, the full regression models are all available in the SI Appendix I.

Figure 3

This 2 × 2 matrix typology shows where broadband as a social sensor is appropriately sharpened and coarsened, as well as prone to error (either false positives or false negatives). Robust standard errors, clustered at the state level, are included below each estimate with statistical significance indicated by the stars based off of a two-tailed test. The above figure includes regression results for an example of each scenario, but one should note here that this is just a representative sample of each case, hence why we document all other variables that fit into each cell in a line below these representative samples. The full regressions that underpin these results are in SI Appendix I.

Table 2 The difference-in-difference regression estimators for each of the mechanism explorations are included in this table, along with the expected result, and a description of what this means for use as a social sensor.

The first subset focuses on regulatory-based mechanisms resulting from the stay-at-home orders which mandated that some groups work in-person based on their role in the local economy. In this scenario, we expect to see a greater impact on unemployment in those counties that have below median numbers of essential industry designated (EID) workers, because these workers were more reliant on having broadband access to both comply with stay-at-home mandates and to maintain their employment. For counties with above the median EID workers, when all else is equal, the base case finds that treated counties experience an increase in unemployment rate of 1.04% compared to control counties after the onset of COVID-19. For counties below the median number of EID workers, when all else is equal, the base case finds that the unemployment rate difference is 1.55% for treated counties when compared to control counties after the onset of COVID-19. We see in this gap that the sensor behaves as we expect; lower levels of essential workers sharpen the sensor and identify a larger gap in unemployment, while higher levels serve to coarsen the sensor.
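The above/below-median subsetting used for this and the following mechanisms can be sketched as below. The county identifiers, the `eid_workers` field, and the counts are hypothetical; assigning counties exactly at the median to the "below" group is also an assumption, since the text does not say how ties are handled. In the study, the DiD model would then be run separately on each subset.

```python
from statistics import median


def split_by_median(counties, key):
    """Split county records into (above median, at or below median) on `key`."""
    cutoff = median(c[key] for c in counties)
    above = [c for c in counties if c[key] > cutoff]
    below = [c for c in counties if c[key] <= cutoff]
    return above, below


# Hypothetical counties with counts of essential industry designated workers.
counties = [
    {"fips": "A", "eid_workers": 1200},
    {"fips": "B", "eid_workers": 300},
    {"fips": "C", "eid_workers": 800},
    {"fips": "D", "eid_workers": 150},
]
above, below = split_by_median(counties, "eid_workers")
print([c["fips"] for c in above], [c["fips"] for c in below])
```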

To further solidify this mechanism, we also look at occupations that, by the nature of the tasks required for the work, are more amenable to the work-from-home mandates51. Here we expect that for counties that have on average higher numbers of people that are able to work from home prior to the pandemic, the impact of the forced work-from-home mandates during the COVID pandemic would have less of an impact as these occupations already allowed them to pivot to working from home (WFH) more easily. We find this to be the case. For counties with above the median numbers of people employed in industries that could easily work from home, when all else is equal, the base case finds that treated counties experience an increase in unemployment rate of 0.96% compared to control counties after the onset of COVID-19. For counties below the median numbers of people employed in industries that could easily work from home, the base case finds that the unemployment rate difference is 1.29% for treated counties when compared to control counties after the onset of COVID-19. Here we see the effect that people working in occupations and industries better suited to working online experienced lower rates of unemployment as a result of the work-from-home mandates.

The second subset focuses on marginalization. Certain groups, specifically Black and Hispanic populations, are known to have less access to broadband52 and we therefore expect them to be asymmetrically impacted when resilience depends upon broadband access during the stay-at-home mandates. Given this premise, we expect to see counties with higher median percentages of Black and Hispanic populations to also have higher rates of unemployment. For counties with above median percentage of Hispanic and Black populations, the base cases respectively find an increase of 1.62% and 1.42% in the unemployment rate of treated counties over control counties after the onset of COVID-19. For those counties below the median, when all else is equal, the base cases respectively find that treated counties experience a lesser unemployment rate increase of 0.69% and 0.95% over the control counties after the onset of COVID-19. The difference for counties which have higher percentages of Hispanic and Black populations suggests that marginalized groups are indeed more detrimentally impacted by COVID-mandated stay-at-home orders, likely due to their systematically lower broadband access; higher levels of Hispanic and Black populations sharpen the sensor and lower levels coarsen it.

The third group focuses on industrial composition. We first look at industrial composition as demarcated by geography and how that should impact the importance of broadband access during COVID-19. Rural and urban areas are fundamentally different in their demand and supply for broadband services. The industries which drive the economic engines in rural areas (e.g., agriculture) are less dependent on broadband access. Therefore, we expect to see less impact on unemployment in rural areas when using broadband access to demarcate treatment and control groups. For urban counties and mixed urban/rural counties, when all else is equal, the base case finds an increase of 1.12% in the unemployment rate for treated counties over control counties after the onset of COVID-19. For solely rural counties, when all else is equal, the base case finds a statistically insignificant impact on unemployment after the onset of COVID-19. These findings suggest that the use of broadband as a social sensor is sharpened in urban areas, where the primary economic motors are more influenced by broadband access, and the sensor is coarsened in rural areas whose local economies are less dependent on broadband.

In addition to exploring the broad economic sectors associated with urban and rural areas, we also explore how the way work gets done in specific sectors can impact social sensor sharpening or coarsening. For instance, the computational and analytical work in technology sectors likely necessitates greater reliance on broadband, so we would expect to see those counties with higher proportions of individuals employed in these sectors impacted more than counties with more individuals employed in other sectors. For counties with above median number of tech workers, when all else is held equal, the base case finds that treated counties experience an increase of 1.01% in unemployment over control counties after the onset of COVID-19. For counties below the median, when all else is equal, the base case increase in unemployment rate is insignificant for treated counties over control counties after the onset of COVID-19. This aligns with our expectations; given broadband is crucial for tech industry work, high-tech employment levels sharpen the sensor, while low-tech employment levels coarsen the sensor.

False negatives and positives

So far, broadband operates effectively as a social sensor when what it measures (i.e., broadband access) is predominantly driving unemployment impacts for the social group of concern. However, as we know from engineering, sensors can start experiencing error when what they measure is conflated with other signals53. This means that, while we find alignment with our hypotheses for the subsets we discuss above, broadband will not always coarsen and sharpen as a social sensor in anticipated ways if subsets capture multiple conflating signals beyond broadband access. In particular, we seek to characterize both type I errors (false positives) and type II errors (false negatives).

False positives are those subgroupings that we expected not to affect our social sensor but demonstrate an effect. For example, we anticipate no effect on employment for service workers because their reasons for unemployment are due to COVID-induced business closures and arguably not access to broadband (i.e., you cannot necessarily deliver food, laundry, or run concierge services purely online). However, we find that for counties with above median levels of service workers, when all else is equal, the base case finds an increase in unemployment rate of 1.17% for treated counties over control counties after the onset of COVID-19. For those counties below the median, when all else is equal, the base case finds an unemployment rate increase of 0.51% for treated over control counties after the onset of COVID-19. While this could mean such services are increasingly moving online54, this may also be due to confounding aspects associated with service metrics. For instance, the service sector is strongly associated and co-located with high-tech sectors (r = 0.47), which suggests those who work in high-tech also increasingly use such services. This suggests collinearity between industry variables that is spuriously picked up by broadband. Therefore in this case, the conflating signal that is leading to error is arguably the spatial and sectoral linkages between service sectors that are less broadband-dependent and high-tech sectors that are more broadband-dependent.

False negatives are those subgroupings that we expected to sharpen (or coarsen) our social sensor but prove inconclusive. For example, we would expect that counties which have higher average income would experience a less severe impact from stay-at-home orders due to the capability of being able to purchase improved broadband speeds. However, we find that in counties with above median average incomes, when all else is equal, treated counties experience an increase in unemployment of 1.55% over control counties after the onset of COVID-19. Conversely for below median counties, when all else is equal, treated counties experience an increase in unemployment of 1.03% over control counties after the onset of COVID-19. This again is likely due to the fact that there are confounding aspects associated with income metrics such as education considerations (population with bachelor’s degree or higher—r = 0.49). In this case, the conflating factor is other proximate and interlinked sociodemographic characteristics that one must carefully tune and calibrate upon deployment. For instance, income may potentially be conflating sociodemographic factors that drive lack of broadband access (i.e., income) with those that reflect greater capabilities and skills to put it to good use (i.e., education).

We see similar false negatives with households with children. Here, we posit that, because children needed to engage in virtual school under stay-at-home mandates, these households would likely require a parent to stay at home to tend to their children during these times, risking that parent's employment. However, here too we do not find this expected impact, as there is little difference in the unemployment rate increases in the base cases (households with children: 1.27% unemployment rate increase for above median vs. 1.39% for below median). We do see this is more consistently the case for counties with above vs. below median levels of single parent households, but the differences between the groups overlap, suggesting they are less significant (see Table 2). Again in this case, the conflating factor may be linkages to other sociodemographic characteristics. For example, and as suggested above, perhaps these false negatives are conflating family composition with income, such that a larger family may be better able to afford child support than a smaller family. This suggests one interlinkage between income and family composition that is difficult to disentangle.

Overall then, social sensors are designed to measure one signal (in our case, access to high-speed internet from broadband). Inevitably then, when that signal no longer dominates and/or has other conflating and competing signals, errors are likely to result. Figure 3 presents a 2 × 2 typology that summarizes the results from our analysis and presents an indicative heatmap that reflects each cell of the typology. Table 2 reports all DiD coefficient estimates from these regressions and provides a synopsis of what we expected versus what we actually find in our analysis. In line with Boundary Condition 1, Fig. 4 shows how the strongest signals are closest to those dimensions that most directly measure broadband-based dependent work.

Figure 4

This figure presents a graphical summary of case consistency. As expected per Boundary Condition 1, the social sensor weakens as dimension is increasingly distant from broadband-dependent work.

Creating an array of social sensors: additional built environment sensors—a proof-of-concept

How then can we help mitigate the errors from a social sensor (in this case broadband)? Perhaps as in engineering, sensors are more effective when they are arrayed, whereby multiple sensors that read different signals are linked together to mitigate weakness in any one sensor. This reflects practices in the field of engineering to understand how sensors are developed and used to measure various parameters. One means for doing this is to have a set of redundant measures to ensure accurate readings. We see this in the field of atmospheric science. When measuring the temperature of clouds, which is a critical predictor of storm dynamics and cloud formation, both radiosondes and drone profiling are used to get the most accurate measure possible to include in the models55. We argue that this holds true for the use of infrastructure as a social sensor. We can perhaps strengthen the sensor by incorporating other, additional sensor measures highlighting the relationship between broadband internet and unemployment. Secondly, while there are some applications where a single sensor is enough to measure a given parameter, this is not always the case for more complex parameters. For example, one can get an accurate reading of temperature by only using a thermometer. However, in order to track an object’s movement, a complementary array of sensors may be required, including but not limited to an accelerometer and optical tracking capabilities56. In order to understand the complex dynamics of economic metrics during times of crises, a full array of infrastructure sensors may more accurately pinpoint counties which are detrimentally impacted.

We start by assessing how a redundant measure of broadband access, WiFi enabled public libraries, may work to strengthen the results of our study. Public libraries serve as a “first choice, first refuge, and last resort in a range of emergency and e–government circumstances”57, and many people who did not have access to broadband in their homes during COVID made use of their local public libraries to help fill the broadband gap58. For this redundant public (as opposed to private household) measure check, we created a parallel metric to our broadband metric which assessed what percent of a county’s population falls within the “legal service area” of a public library, defined as the population that lives within the boundaries of the geographic area the library was established to serve. For an additional sensor to be a useful component in an array, there should be some orthogonality, meaning that the additional sensor provides information that is not being captured by the existing sensors. In this case, there is some correlation between public libraries and broadband access, but not perfect correlation, which suggests libraries are providing additional information not captured in our core broadband sensor. We find the correlation between the metrics, when in binary form of treated vs control, to be 0.2. Using this metric, we classify counties with below 50% of their population in the legal service area of a public library (akin to our below 50% penetration of broadband at a county level) as our control group and we classify counties with above 50% of their population in the legal service area of a public library as the treatment group (akin to our above 50% penetration for broadband). We separately run the same model presented above and find that our results directionally hold with the results using broadband access as the sensor, but that the signal across most of the cases is, on average, smaller and less significant.
We argue this suggests that public access to broadband helped reduce gaps seen in private access to broadband. The results from this analysis are included in Table 2. Triangulating across multiple sensor signals not only reduces the instances of false positives and false negatives, but also isolates which subset provides the strongest signal with which to inform more targeted policy support. In this case, coupling libraries and broadband renders insignificant many of the false positives and negatives found in broadband alone, and helps identify occupations most equipped to work from home as the subset with the strongest signal for which to target policy support, as both broadband and libraries detect this effect. Perhaps the reason for this is that public access to broadband is more suitable for less data-intensive needs (e.g., email or accessing websites) and less so for more data-intensive needs (e.g., Zoom calls and computational analyses), which many occupations that were not yet equipped for WFH may have necessitated.

To further explore the creation of sensor arrays to detect gaps, we also selected two forms of physical infrastructure which serve to complement the upstream and downstream rollout of broadband: bridges per county and new building permits per state (selected based on data availability). We selected these because they capture different dimensions as to how the built environment can influence broadband through rollout and point of access. Building networks are likely where broadband is deployed more downstream and therefore where it is accessed. Building networks have a correlation with the broadband access metric of 0.15. Bridge networks impact urban connectivity and therefore may influence where broadband is rolled out more upstream. Bridge networks have a correlation of 0.25 with broadband59. As with libraries, building and bridge networks are adding novel information to our core broadband sensor. As a result of this, we would expect that integrating both sets of physical infrastructure into our broadband models would help sharpen the (broadband) sensor. Perhaps also areas with physical connectivity enhance the expectation of digital connectivity more than areas without such connectivity.

As shown in Fig. 5, for counties with above median number of new building permits per state, when all else is equal, the base case finds treated counties experience an increase in unemployment rate of 1.35% over control counties after the onset of the COVID-19 stay-at-home mandates. This is compared to an increase in unemployment rate of 0.94% for counties below the median. For counties with above median number of bridges, the increase in unemployment for the base case is 1.35%, compared to the below median subset with a 0.88% increase, holding all else equal after the onset of COVID-19. Given the built environment has similar upstream and downstream impacts, we then further assessed whether these are complementary or substitutive. These effects seem to be complementary as the subset of areas where both bridges and buildings are above the median generate the largest delta (1.42%) between the treated and control counties in the base case. One may presume that perhaps these impacts are due to multicollinearity and that bridges and buildings are simply collocated with each other. This appears to not be the case as the correlation between our binary measures of bridges and buildings is near to zero (r = −0.02).
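A correlation check between two binary (above/below median) indicators of this kind can be sketched as follows. The sketch assumes the reported r is a Pearson correlation, which for 0/1 variables equals the phi coefficient; the text does not name the statistic. The indicator vectors are hypothetical and were constructed to be uncorrelated.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical indicators: 1 = above median, 0 = below median (not study data).
bridges_above_median = [1, 1, 0, 0, 1, 0, 1, 0]
buildings_above_median = [1, 0, 1, 0, 1, 0, 0, 1]

r = pearson(bridges_above_median, buildings_above_median)
print(round(r, 2))
```

A value of r near zero, as in the study's reported −0.02, indicates that the two indicators are not simply marking the same counties.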

Figure 5

Top: difference-in-difference estimators for the full dataset. Robust standard errors, clustered at the state level, are included below each estimate with statistical significance indicated by the stars based off of a two-tailed test. Under the base case using MSFT 2020 data and a 50% penetration rate to define the treatment and control groups, treated counties in the subset with above the median number of new building permits per county experience, all else equal and after the shock of the stay-at-home mandates, an increase in unemployment of 1.35%. This is compared to an increase in unemployment of 0.94% when the subset has a below median number of new buildings. We see that in the subset with above median number of bridges, the increase in unemployment for the base case is 1.35%, compared to the below median number of bridges subset with a 0.88% increase. Bottom: based off the findings in the top figure, we investigate further the compounding effect of infrastructure and find that in counties with above median density of bridges and new houses, the impact on unemployment is further exacerbated, suggesting that infrastructure services may be integrated with the provision of broadband. The parallel trends between the control (below adequate access to broadband at a county level) and treatment (above adequate access to broadband on average at a county level) for both number of buildings (upper row) and number of bridges (bottom row) can be found in Appendix G. Parallel trends hold prior to the shock of the COVID-19 pandemic. The full regressions that underpin these results are in SI Appendix I.

Overall, this suggests linking sensors into “arrays” strengthens the signal, reduces errors, mitigates detection gaps, and helps identify the most prominent subsets for targeting. In this case, public libraries reduce the false positives and negatives from broadband alone, help prioritize which subset yields the most promising gaps for targeting (i.e., occupations not yet equipped for WFH), and identify more precisely where broadband signals weaken. Moreover, in incorporating additional built environment features, we can detect influences on these broadband gaps based on gaps in rollout (bridges) or gaps in points of access (buildings). Clearly, we can conceive of many other sensors for such an array, so we see this as demonstrating a proof-of-concept for future work to explore more systematically other sensor arrays and outcomes, even beyond those centrally focused on broadband and unemployment. Overall, in line with Boundary Condition 2, using multiple sensors in an array improves targeting to the key variables (i.e., occupations not yet equipped for WFH) and to the key locations (i.e., those with both upstream rollout and downstream access infrastructure).
