
COVID-19 cluster surveillance using exposure data collected from routine contact tracing: The genomic validation of a novel informatics-based approach to outbreak detection in England

  • Simon Packer,

    Roles Conceptualization, Formal analysis, Investigation, Methodology, Writing – original draft, Writing – review & editing

    Affiliation United Kingdom Health Security Agency, London, United Kingdom

  • Piotr Patrzylas,

    Roles Data curation, Formal analysis, Methodology, Visualization, Writing – original draft, Writing – review & editing

    Affiliation United Kingdom Health Security Agency, London, United Kingdom

  • Iona Smith,

    Roles Data curation, Formal analysis, Methodology, Writing – original draft, Writing – review & editing

    Affiliation United Kingdom Health Security Agency, London, United Kingdom

  • Cong Chen,

    Roles Conceptualization, Data curation, Formal analysis, Software, Writing – original draft, Writing – review & editing

    Affiliation United Kingdom Health Security Agency, London, United Kingdom

  • Adrian Wensley,

    Roles Conceptualization, Data curation, Formal analysis, Writing – original draft, Writing – review & editing

    Affiliation United Kingdom Health Security Agency, London, United Kingdom

  • Olisaeloka Nsonwu,

    Roles Conceptualization, Data curation, Formal analysis, Software, Writing – original draft, Writing – review & editing

    Affiliation United Kingdom Health Security Agency, London, United Kingdom

  • Kyle Dack,

    Roles Data curation, Formal analysis, Software, Writing – review & editing

    Affiliation United Kingdom Health Security Agency, London, United Kingdom

  • Charlie Turner,

    Roles Conceptualization, Data curation, Formal analysis, Methodology, Software, Writing – review & editing

    Affiliation United Kingdom Health Security Agency, London, United Kingdom

  • Charlotte Anderson,

    Roles Conceptualization, Data curation, Methodology, Writing – original draft, Writing – review & editing

    Affiliation United Kingdom Health Security Agency, London, United Kingdom

  • Rachel Kwiatkowska,

    Roles Conceptualization, Investigation, Supervision, Writing – review & editing

    Affiliation United Kingdom Health Security Agency, London, United Kingdom

  • Isabel Oliver,

    Roles Conceptualization, Supervision, Writing – original draft, Writing – review & editing

    Affiliation United Kingdom Health Security Agency, London, United Kingdom

  • Obaghe Edeghere,

    Roles Conceptualization, Investigation, Methodology, Supervision, Writing – original draft, Writing – review & editing

    Affiliation United Kingdom Health Security Agency, London, United Kingdom

  • Graham Fraser,

    Roles Conceptualization, Formal analysis, Methodology, Supervision, Writing – original draft, Writing – review & editing

    ‡ These authors are joint senior authors on this work.

    Affiliation United Kingdom Health Security Agency, London, United Kingdom

  • Gareth Hughes

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Supervision, Writing – original draft, Writing – review & editing

    gareth.hughes@ukhsa.gov.uk

    ‡ These authors are joint senior authors on this work.

    Affiliation United Kingdom Health Security Agency, London, United Kingdom

Abstract

Contact tracing was used globally to prevent onwards transmission of COVID-19. Tracing contacts alone is unlikely to be sufficient in controlling community transmission, due to the pre-symptomatic, overdispersed and airborne nature of COVID-19 transmission. We describe and demonstrate the validity of a national enhanced contact tracing programme for COVID-19 cluster surveillance in England. Data on cases occurring between October 2020 and September 2021 were extracted from the national contact tracing system. Exposure clusters were identified algorithmically by matching ≥2 cases attending the same event, identified by matching postcode and event category within a 7-day rolling window. Genetic validity was defined as exposure clusters with ≥2 cases from different households with identical viral sequences. Exposure clusters were fuzzy matched to the national incident management system (HPZone) by postcode and setting description. Multivariable logistic regression modelling was used to determine cluster characteristics associated with genetic validity. Over a quarter of a million (269,470) exposure clusters were identified. Of the eligible clusters, 25% (3,306/13,008) were genetically valid. Of these, 81% (2,684/3,306) were not recorded on HPZone; those that were matched to HPZone were identified a median of one day earlier than the corresponding HPZone incident. Multivariable analysis demonstrated that exposure clusters occurring in workplaces (aOR = 5·10, 95% CI 4·23–6·17) and education (aOR = 3·72, 95% CI 3·08–4·49) settings were those most strongly associated with genetic validity. Cluster surveillance using enhanced contact tracing in England was a timely, comprehensive and systematic approach to the detection of transmission events occurring in community settings. Cluster surveillance can provide intelligence to stakeholders to support the assessment and management of clusters of COVID-19 at a local, regional, and national level. 
Future systems should include predictive modelling and network analysis to support risk assessment of exposure clusters and to improve the effectiveness of enhanced contact tracing for outbreak detection.

Author summary

We studied how to use routine contact tracing data to detect outbreaks of COVID-19. We identified clusters of people who had potentially been exposed to the virus at the same location (an exposure cluster). We then linked these to genomic data to identify whether clusters included persons who were infected with the same strain of the virus and therefore represent a shared transmission event. Our study is the first to show that routine contact tracing data can be used to systematically detect outbreaks at a population level. Using data from over 4 million cases of COVID-19 in England, we identified more than a quarter of a million exposure clusters. When linked to genomic data, we found that 25% of these were likely to be part of a real outbreak. The size and setting of these exposure clusters determined the likelihood that they were real outbreaks. We recommend that future responses take a machine learning approach to help them to prioritize resources and target their responses to signals that are more likely to represent real outbreaks. Overall, our study suggests that exposure cluster surveillance is a valuable tool for finding and controlling outbreaks of COVID-19 during a pandemic.

Introduction

Globally, contact tracing was deployed during the COVID-19 pandemic to limit and prevent viral transmission through the identification and isolation of persons at greatest risk of developing disease [1–4]. It was recognised early in the pandemic that SARS-CoV-2 could be transmitted prior to symptom onset, and that transmission was overdispersed, with a minority of cases contributing to the majority of onward transmission events [5,6]. These observations suggested that conventional contact tracing alone, primarily focusing on the identification and isolation of named contacts, would have limited impact on the control of community transmission [2,7–13]. These observations were supported by several investigations of COVID-19 clusters where the primary cases and subsequent chains of transmission would not have been identified using traditional contact tracing methods alone [14–21]. While cluster surveillance based on confirmed cases can have significant utility (e.g. for the monitoring of continuing outbreaks in institutional settings), it provides at best indirect evidence for primary events responsible for transmission.

Backwards contact tracing (BCT) aims to identify the index case and other cases linked to the common source/setting of infection [22]. Modelling studies suggested that capturing cases’ exposure data during BCT could substantially increase contact tracing effectiveness [6,23]. This was supported by a prospective epidemiological study demonstrating the use and benefit of BCT among students in Belgium [24]. A small number of countries, including Japan, adopted this approach early in the pandemic, leading to more timely recognition and termination of transmission chains [25,26].

In June 2020 the United Kingdom Scientific Advisory Group for Emergencies (SAGE) recommended that a bidirectional approach to contact tracing be developed and implemented [27]. The Public Health England (now United Kingdom Health Security Agency; UKHSA) Enhanced Contact Tracing Programme (ECT) was integrated with the conventional forwards national tracing programme of NHS Test and Trace (NHS T&T) in October 2020 and continued until the cessation of contact tracing of all cases in February 2022. The ECT programme deployed a cluster surveillance system based on case exposures during the pre-symptomatic period (3 to 7 days prior to symptom onset). Exposure data were collected on all cases during routine contact tracing and on a daily basis were algorithmically matched to data from other cases to define “exposure clusters”. These clusters were risk assessed by local public health teams to identify events and/or locations potentially associated with transmission [28]. We describe here the ECT programme in England, the epidemiology of clusters of case exposures of COVID-19 and provide evidence for the validity and operational utility of this approach for disease control through the early identification of transmission events and outbreaks.

Methods

COVID-19 contact tracing in England

The NHS T&T national contact tracing programme was launched in England in May 2020. The system received reports on all confirmed cases of SARS-CoV-2 identified through laboratory testing in England. Cases were initially invited by text message or email to self-complete a contact tracing questionnaire; those who could not be contacted or who did not respond within a defined period were contacted by telephone [28,29]. Case information collected included demographic and clinical data, locations visited outside the home, and close contacts during the infectious period (defined as two days before symptoms or confirmatory laboratory test, to date of self-report of contact).

Enhanced contact tracing in England

Additional questions were added to the case questionnaire in October 2020 to collect information on events and activities outside the home during the period where infection was most likely to have occurred. This was defined as 3–7 days before symptom onset or date of positive test (Fig 1) [30]. Data collected included: event description, event category, attendance date, postcode, and proximity risk indicators (crowded, close contact, closed space). Event categories were defined at three levels: the first indicated the type of event (workplace/education, household or accommodation, or events/activities) while the second and third level categories provided an increasing level of detail regarding the type of activity and its location (S1 Table).

Fig 1. Periods of data collection for enhanced contact tracing.

The backwards period of contact tracing was that likely to reflect probable exposure for the case (3–7 day period before symptom onset or date of the positive test). Data was collected on events and activities at workplace, education, household and other settings (such as hospitality and leisure).

https://doi.org/10.1371/journal.pdig.0000485.g001

Exposure clusters, 2-day window and same day event groupings

Exposure clusters were defined as instances where ≥2 cases reported attending an event with the same postcode and setting category (the location) and with attendance dates within a seven-day rolling period (for example, three exposure events would be linked together if cases attended the same location on the first, fourth and tenth day of the month). A 2-day event window was defined where matching events occurred within a two-day rolling window. Same day events were defined as where the matching events occurred on the same day.
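The linkage rule above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the programme's implementation: the function name `find_exposure_clusters`, the tuple layout of an event, and the reading of "seven-day rolling period" as a gap of at most seven days between consecutive attendance dates at the same location are all hypothetical choices made for the example.

```python
from collections import defaultdict
from datetime import date


def find_exposure_clusters(events, window_days=7):
    """Chain events at the same location whose dates fall within a rolling window.

    events: iterable of (case_id, postcode, category, attendance_date).
    Returns a list of (location, visits), where visits is a date-sorted list
    of (attendance_date, case_id) and contains at least two distinct cases.
    Assumption: a new event joins the chain if it is at most `window_days`
    after the previous event at that location.
    """
    by_location = defaultdict(list)
    for case_id, postcode, category, attended in events:
        by_location[(postcode, category)].append((attended, case_id))

    def keep(chain):
        # A cluster needs events from at least two different cases.
        return len({case_id for _, case_id in chain}) >= 2

    clusters = []
    for location, visits in by_location.items():
        visits.sort()
        chain = [visits[0]]
        for visit in visits[1:]:
            if (visit[0] - chain[-1][0]).days <= window_days:
                chain.append(visit)
            else:
                if keep(chain):
                    clusters.append((location, chain))
                chain = [visit]
        if keep(chain):
            clusters.append((location, chain))
    return clusters


# The worked example from the text: attendance on the 1st, 4th and 10th.
example_events = [
    ("case_a", "AB1 2CD", "workplace", date(2021, 6, 1)),
    ("case_b", "AB1 2CD", "workplace", date(2021, 6, 4)),
    ("case_c", "AB1 2CD", "workplace", date(2021, 6, 10)),
    ("case_d", "AB1 2CD", "workplace", date(2021, 6, 20)),  # too late to link
]
example_clusters = find_exposure_clusters(example_events)
```

Passing `window_days=2` or `window_days=0` gives an approximation of the 2-day window and same day groupings. The sketch does not exclude links between two events reported by the same case beyond requiring two distinct cases per cluster.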

Ethical statement

Ethical approval was not required as the work was part of the public health response to COVID-19. Consent was not required as all data were originally collected for contact tracing and health protection purposes and fall under Regulation 3 of the UK Health Service (Control of Patient Information) Regulations 2002.

Operational use of exposure cluster reports

Daily lists of exposure clusters were automatically processed into a PowerBI dashboard where they were accessed by local and regional public health teams and national incident managers, with regional-level access controlled using Microsoft security groups. Public health teams used exposure cluster information to identify and risk assess clusters as possible outbreaks. Weekly surveillance reports on incident exposure clusters at local authority level were also made available to public health teams.

Data analysis

Exposure clusters.

Exposure clusters were identified from event data collected from confirmed cases of COVID-19 referred to NHS T&T for contact tracing between 23 October 2020 and 1 August 2021. Cases were included if they had completed a case questionnaire (via digital self-report or call handler), had a residential address in England, and reported at least one event outside their home. Events were linked to an exposure cluster if they matched deterministically on event postcode and setting category and had attendance dates within 7 days of another matched event. Matches were not permitted between events reported by the same case, e.g., if a case attended a workplace across multiple days. The notification date for exposure clusters was the date of entry of the second case into the contact tracing system. Exposure cluster reports were derived through daily linkage of all events reported by cases in the backwards period with geographical information and an attendance date within the past 30 days. Common exposures with a postcode outside England were removed.

Descriptive epidemiology.

National case numbers by specimen date and vaccination data were obtained from the Public Health England (PHE) Coronavirus dashboard [31]. Descriptive analysis included trend analysis of events per case and frequency of exposure clusters by setting, number of cases, distribution of cases over time, background incidence, median age, duration, cumulative 2nd dose vaccination coverage and sex ratio of the exposure cluster. Background incidence (cases per 100,000 population) and 2nd dose vaccine coverage were assigned to exposure clusters based on the earliest attendance date and upper tier local authority (a local government structure responsible for a range of services to the population of a defined area) of the setting. Descriptive statistics (mean, median, interquartile range) were calculated according to the type of data. Events and exposure clusters were grouped into time periods based on the national restrictions in place in England [32,33].

Validation of exposure clusters using genomics data.

Contact tracing records were linked to their corresponding laboratory records and whole genome sequencing data as previously described [34]. Exposure clusters were included in validation analysis if ≥2 cases were successfully linked to genomics data. An exposure cluster was considered genetically valid if it included ≥2 cases from different households where sequences were zero single-nucleotide polymorphisms apart. Household sharing was determined using unique property reference number (UPRN) obtained from address matching using the Ordnance Survey Address Base [35].
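The genetic validity rule can be illustrated with a short Python sketch. It assumes each sequenced case is represented by a household identifier (the UPRN) and an aligned consensus sequence; the function names and the decision to ignore ambiguous `N` positions when counting differences are assumptions made for the example, since the text does not describe how SNP distances were computed.

```python
from itertools import combinations


def snp_distance(seq_a, seq_b):
    """Count differing positions between two aligned sequences.

    Assumption: positions where either sequence has an ambiguous 'N'
    are skipped rather than counted as differences.
    """
    return sum(
        1 for x, y in zip(seq_a, seq_b) if x != y and "N" not in (x, y)
    )


def is_genetically_valid(cases):
    """cases: list of (uprn, aligned_sequence) for one exposure cluster.

    True if at least one pair of cases from different households has
    sequences that are zero SNPs apart.
    """
    for (uprn_a, seq_a), (uprn_b, seq_b) in combinations(cases, 2):
        if uprn_a != uprn_b and snp_distance(seq_a, seq_b) == 0:
            return True
    return False
```

Under this rule a cluster whose only identical sequences come from the same household would not count as genetically valid.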

Exposure clusters and reported outbreaks and incidents.

Data on COVID-19 incidents and outbreaks notified to and/or managed by regional health protection teams (HPT) were obtained from HPZone, the national health protection case and incident management system. Exposure clusters were linked to reported incidents/outbreaks by postcode and a further fuzzy match on the free text description of the exposure cluster setting provided by cases during contact tracing and included on HPZone. A successful match was made where ≥70% of words (a pragmatic cut-off, irrespective of length) matched between the exposure cluster and HPZone free text description (i.e., 70% of the words within each description were also found in the other description). Valid links were defined as those where the exposure cluster report date was up to seven days before or after the date the situation was entered onto HPZone.
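A minimal Python sketch of this linkage rule is given below. The dictionary keys, function names and whitespace-based tokenisation are hypothetical; the sketch assumes the 70% threshold is applied to the set of distinct lower-cased words in each description and must hold in both directions.

```python
from datetime import date


def word_overlap(source, target):
    """Proportion of distinct words in `source` that also appear in `target`."""
    source_words = set(source.lower().split())
    target_words = set(target.lower().split())
    if not source_words:
        return 0.0
    return len(source_words & target_words) / len(source_words)


def is_valid_link(cluster, incident, threshold=0.7, max_days=7):
    """Link an exposure cluster to an HPZone record.

    Requires the same postcode, at least `threshold` word overlap in each
    direction, and a report date within `max_days` of the HPZone entry date.
    """
    same_postcode = cluster["postcode"] == incident["postcode"]
    text_match = (
        word_overlap(cluster["description"], incident["description"]) >= threshold
        and word_overlap(incident["description"], cluster["description"]) >= threshold
    )
    gap_days = abs((cluster["report_date"] - incident["entered_date"]).days)
    return same_postcode and text_match and gap_days <= max_days


# Illustrative records only; these are not real data.
example_cluster = {
    "postcode": "LS1 1AA",
    "description": "St Marys Primary School",
    "report_date": date(2021, 3, 1),
}
example_incident = {
    "postcode": "LS1 1AA",
    "description": "st marys primary school leeds",
    "entered_date": date(2021, 3, 4),
}
```

Here the cluster description shares all four of its words with the incident description, and the incident shares four of its five words with the cluster, so both directions pass the 70% threshold.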

Multivariable analysis.

Single variable and multivariable analyses were used to identify factors associated with genetically valid exposure clusters. Odds ratios (OR) and corresponding 95% confidence intervals (CI) were calculated. A forward approach was used to build a model with the contribution of variables assessed through reduction of Akaike information criterion (AIC) and significance of likelihood ratio test (p<0.05). Variable coefficients and p-values were assessed in the single variable analysis and sequentially added to the multivariable model in order of decreasing significance.
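For a single binary characteristic, the crude odds ratio and its 95% confidence interval can be computed directly from a 2×2 table, which gives the same estimate as a single variable logistic regression. The Python sketch below uses the Wald interval on the log odds scale; the function name and the counts in the usage note are hypothetical, and the study itself fitted its models in R.

```python
import math


def odds_ratio_wald(a, b, c, d, z=1.96):
    """Crude odds ratio with a Wald confidence interval.

    a, b: genetically valid / not valid clusters with the characteristic.
    c, d: genetically valid / not valid clusters in the reference group.
    All four counts must be greater than zero.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper
```

For example, with made-up counts of 20 valid and 80 not valid clusters in one group against 10 and 90 in the reference group, `odds_ratio_wald(20, 80, 10, 90)` returns an odds ratio of 2.25 with an interval that just includes 1.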

Characteristics of exposure clusters considered for inclusion were the total number of cases, setting, median age of cases, number of same day events, duration, and standard deviation of the sex ratio. A priori confounders (background COVID-19 incidence, cumulative 2nd dose vaccination coverage, urban-rural classification [36], Index of Multiple Deprivation [37] of exposure postcode) significant through single variable analysis (p<0.05) were considered for inclusion in the model containing exposure cluster characteristics and effects assessed for inclusion as above.

The model was assessed for influential observations (Cook’s distance via residuals versus leverage plot using a cut off 0.5), multi-collinearity (variance inflation factor >5), and the assumption of linearity for continuous variables (Local Polynomial Regression Fitting via graphical assessment of “Loess” line monotonicity). Model prediction of genetic validity was assessed by calculating predicted probabilities and the area under the receiver operating characteristic (ROC) curve. All analysis was undertaken using R version 4.2.1 [38].
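The area under the ROC curve has a simple interpretation: it is the probability that a randomly chosen genetically valid cluster receives a higher predicted probability than a randomly chosen cluster that is not valid, with ties counted as one half. The Python sketch below computes it by direct pairwise comparison; the function name is hypothetical and this is an illustration of the statistic, not the R routine used in the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve by pairwise comparison.

    labels: 1 for genetically valid clusters, 0 otherwise.
    scores: predicted probabilities from the model.
    Quadratic in the number of clusters, so suited to small examples only.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))
```

A value of 0.5 corresponds to a model that ranks clusters no better than chance, and 1.0 to perfect ranking.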

Role of the funding source

This work was conducted as part of the public health response to the COVID-19 pandemic in England.

Results

Contact tracing data and reported events

There were 4,628,798 confirmed cases referred for contact tracing during the study period, of which 89% (4,119,630) completed the contact tracing questionnaire. Of those, 57% (2,318,450) declared at least one event outside the home during the backwards period; these cases reported a total of 7,368,666 events (mean 1·6 events per case). Work or education events were most frequently reported with 4,474,540 events declared by 1,494,773 cases (average of 1·7 events per case; Table 1). The median interval between the earliest backwards event and symptom onset was 5 days, with a duration between onset and testing of 6 days and time to referral for contact tracing of 8 days (Table 1).

Table 1. Events declared by cases during contact tracing by direction relative to symptom onset and by event type.

https://doi.org/10.1371/journal.pdig.0000485.t001

Epidemiology of exposure clusters in England

The distribution and magnitude of exposure clusters varied in relation to changes in case incidence and the implementation of national non-pharmaceutical interventions (Fig 2). Overall, more than a quarter of a million exposure clusters (269,470) were identified during the study period; a median of 4,142 (IQR: 1,402–10,598) clusters each week in England. At the peak, 22,879 exposure clusters were identified in a single week (July 12–18 2021). Clusters were most frequently identified in education (19·8%), shopping (19·4%) and workplace (14·3%) settings (Fig 2).

Fig 2. COVID-19 incidence and events and exposure clusters reported to contact tracing.

(A) incidence of new confirmed cases; (B) number of events by period of attendance; (C) exposure clusters by event type. Backwards events reflect those reported by cases during the likely exposure period (3–7 days before symptom onset or date of positive test), forwards events those reported after the case was likely infectious (from 2 days before symptom onset or date of positive test to the time of contact tracing). Data are shown relative to national restrictions in England from 23 October 2020 to 1 August 2021. National non-pharmaceutical interventions: 1: second national lockdown; 2: third national lockdown; 3–6: roadmap out of restrictions.

https://doi.org/10.1371/journal.pdig.0000485.g002

At the start of the study period there was a rapid increase in exposure clusters with a peak at the beginning of November 2020, closely following the peak of the concurrent wave of COVID-19 in England. Most of these exposure clusters were in education, hospitality, and entertainment settings. The lockdown period that followed in November 2020 was associated with a sharp but slightly delayed decrease in exposure clusters, with clusters in education settings retaining a high frequency. Lifting of the lockdown in December 2020 led to a substantial increase in exposure clusters: exposure cluster incidence was high in education settings, but also increased substantially in shopping, workplace and hospitality and entertainment settings. The increase was sustained throughout December 2020, with the exception of educational settings where exposure cluster numbers decreased following school and university closures.

The start of the next national lockdown in January 2021 was associated with a decrease in exposure cluster incidence, with declines in hospitality and entertainment settings, but not in workplace and other settings. From February 2021 to the end of May 2021, case numbers fell markedly, and exposure clusters remained infrequent. In June 2021 during the final lifting of national restrictions, case numbers rose sharply and a concomitant increase in exposure clusters was observed in all settings, with high numbers identified in hospitality, entertainment, education, and workplace settings (Fig 2).

Factors associated with genetically valid exposure clusters

There were 13,008 (5·2%) exposure clusters eligible for inclusion in the analysis of genetic validity. Of these, 25% (3,306) were defined as genetically valid (Table 2). The proportion of genetically valid clusters varied over the study period: from 14% in November 2020 and July 2021 to 36% in April 2021. The proportion of genetically valid exposure clusters was highest in clusters of ≥10 cases (37%, 260/712), in education (35%, 1246/3528) or workplace (42%, 577/1371) settings and those containing more than five instances where ≥2 cases reported attending on the same day (same day attendance) (43%, 470/1088) (Table 3). Neither IMD nor rural/urban classification was significantly associated with genetic validity in single variable analysis, and neither was included in multivariable modelling.

Table 2. Exposure cluster genetic validity and matching to managed incidents.

https://doi.org/10.1371/journal.pdig.0000485.t002

Table 3. Crude and adjusted associations between characteristics of exposure clusters and genetic validity.

https://doi.org/10.1371/journal.pdig.0000485.t003

The final model included 12,786 observations (267 exposure clusters were excluded because they had postcodes outside of England, had missing values for ≥1 variable, or were found to be highly influential on model fit) and had an area under the ROC curve of 0·71 (95% CI 0·70–0·72). Five influential observations, which produced a large percentage change in the association between two settings (personal care and custodial institutions) and genetic validity, were removed from the model. No collinearity was observed between variables included in the final model. No continuous variable showed substantial evidence of non-linearity on visual assessment.

Exposure clusters that included more cases, were shorter in duration, and contained a greater number of same day events, were more likely to indicate genetically valid transmission events (Table 3). There was a dose-response relationship between the number of events in an exposure cluster and likelihood of genetically linked cases. Clusters of longer duration were significantly less likely to represent genetically valid signals for outbreaks. An increased number of same day events within the cluster was associated with genetically linked cases, with odds increasing significantly (using the absence of same day events as the reference group) with the number of same day events included: two same day events (aOR 1·58 [95% CI 1·37–1·82]) and >5 same day events (aOR 3·57 [95% CI 2·89–4·41]) (Table 3).

Clusters in all settings other than those in custodial institutions were found to be independently associated with increased odds of genetic validity (using shopping as the reference group). Strong associations were observed for workplace settings (aOR 5·10 [95% CI 4·23–6·17]), education settings (aOR 3·72 [95% CI 3·08–4·49]), healthcare settings (aOR 3·09 [95% CI 2·27–4·19]), and hospitality settings (aOR 2·89 [95% CI 2·41–3·47]).

Genetically valid exposure clusters and reported incidents/outbreaks

Over 5% (13,494/248,864) of all exposure clusters identified during the study period were linked to incidents recorded on the national incident management system (HPZone). Of the exposure clusters eligible for inclusion in the genetic validity analysis (n = 13,008), 47% (622/1318) of HPZone matched exposure clusters were genetically valid compared to 23% (2684/11690) of those that were not matched. Genetically valid exposure clusters linked to situations on HPZone were identified a median of one day (IQR 0–4, range -7 to 7) earlier through ECT than the corresponding entry on HPZone (Table 2).

Discussion

In this study we have described the epidemiology of COVID-19 case exposure clusters identified by the ECT programme in England and provided evidence for their validity and utility for the rapid identification of outbreaks. Approximately 25% of exposure clusters detected through the ECT programme included ≥2 genetically indistinguishable SARS-CoV-2 infections. This proportion increased to >30% during low incidence periods, where the impact of early action by local public health teams to break transmission chains would be highest. We have also identified cluster characteristics independently associated with increased likelihood of genetic validity; these include clusters of larger size, including same day events, and those in particular settings (including healthcare and workplaces).

The ECT cluster surveillance system frequently detected outbreaks before they were recorded as managed incidents by local health protection services; approximately one half of exposure clusters linked to subsequently confirmed outbreaks were detected before registration on the national incident management system. These events frequently occurred outside of formal institutional settings and could represent important foci of community transmission. Exposure cluster settings included hospitality and mass gatherings, where contacts were likely to be unknown to each other and would not be rapidly identified, if at all, through conventional contact tracing. Community settings contribute significantly to onward spread of COVID-19 [14,17,18,39–42] and cluster identification provided corroborative and real-time information to support local risk assessment and management of outbreaks.

To our knowledge, the ECT programme in England was the only national programme using contact tracing information for systematic surveillance of COVID-19 clusters based on the exposures of cases during their pre-symptomatic period. A key consideration for any cluster surveillance system is achieving the optimal balance between sensitivity and specificity. The ECT exposure cluster algorithm was initially designed to prioritise sensitivity over specificity and used a broad time period and postcodes for linking case events. We have shown that clusters defined through shorter time period linkages (e.g., 2-day event window or same day events) are more likely to represent actual transmission events and can be used to improve specificity. Furthermore, the use of unique property reference numbers was introduced towards the end of the programme to increase the precision of geolocation. Improving specificity whilst maintaining sensitivity would be a key development for future cluster surveillance.

The strengths of this study stem from the secondary analysis of systematically collected national contact tracing data. Exposure data were collected from more than 85% of confirmed COVID-19 cases in England over the study period, providing comprehensive and representative coverage with considerable statistical power. Linkage to available genomics data provided a means to validate exposure clusters using a highly specific indicator of probable transmission. Although genomic sequencing coverage limited the proportion of cases which could be included in assessment of genomic validity, cases were largely selected randomly for sequencing (by geographically weighted sampling of community cases), with some oversampling of high-risk groups (such as healthcare workers and international travellers). Strengthening the coverage and timeliness of genomic surveillance is critical for more effective cluster detection of this kind.

Limitations include the use of primary source data that were collected for operational purposes and are likely subject to a degree of heterogeneity and incompleteness in data collection. A significant proportion of exposure events may not have been recorded because cases were either unaware of them or deliberately chose not to report them, although the direction and potential size of any resulting bias is unclear. Additionally, the genetic validity investigations were based on a small proportion of all exposure clusters; this may have introduced representativity bias, the nature and direction of which cannot be determined.

The use of a highly specific definition for genetic validity means we have likely underestimated the true number of valid clusters. Minor variant genomes can emerge to dominance within an individual [43] with the potential for genetic compartmentalisation between the respiratory tract and gastrointestinal tract [44]. In addition, treatments for COVID-19 that interfere with viral replication can induce mutational signatures associated with greater sequence divergence between transmission pairs [45]. Such signals may be greater in certain population groups (e.g., older adults more likely to receive treatment).

Given that transmission of a genetically identical sequence is more likely to occur earlier during an infection [43], settings more likely to be associated with close to continuous exposure (such as households) are more likely to have been detected using our conservative methodology. The observation that longer clusters were less likely to be genetically valid may also be in part due to the accumulation of substitutions during longer transmission chains. Future work needs to assess the impact of these elements and evaluate the use of more relaxed genetic matches on cluster assessment and outbreak detection.

The ECT programme identified and communicated exposure clusters to local public health teams daily. Based on expert opinion and guidance, exposure clusters were risk assessed for the need for public health action. Without the availability of genetic validation during the response, exposure clusters lacked specificity. In future we recommend that a predictive modelling approach, which uses genomic validation, is used to help triage and prioritise clusters for risk assessment. The use of predictive modelling and genomic validation could enable real-time model calibration based on changes in background epidemiology of the virus. However, such an approach may be limited by the turnaround time for sequencing of isolates. Further work could use network analysis methods to combine exposure cluster data with other available transmission indicators to build a transmission network of extant links. These networks could be used to infer the setting/source of infection for all cases as the pandemic progresses, providing vital information on which settings are associated with transmission and to target interventions.

For this study we employed simplistic text matching methods to detect exposure clusters and to link them to situations under public health management. This has exposed a need for machine learning methods to improve text matching and exposure cluster detection from contact tracing data. Additionally, there is a further need to develop unsupervised machine learning models to provide timely predictions of whether exposure clusters are outbreaks. These three health protection requirements for future work are detailed in Table 4.
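To show why simple text matching leaves room for improvement, the sketch below groups reported exposures by a normalised venue name and postcode and flags groups reported by two or more cases. It is not the study's implementation. The record layout, the normalisation rules, the stop-word list and the two-case threshold are assumptions chosen for illustration.

```python
import re
from collections import defaultdict

# Hypothetical stop words dropped before matching venue names.
STOP_WORDS = {"the", "ltd", "limited"}


def normalise(text: str) -> str:
    """Lower-case, strip punctuation and drop stop words from a venue name."""
    text = re.sub(r"[^a-z0-9 ]", " ", text.lower())
    return " ".join(t for t in text.split() if t not in STOP_WORDS)


def exposure_clusters(records, min_cases=2):
    """Group (case_id, venue, postcode) records by exact match on a normalised key."""
    groups = defaultdict(set)
    for case_id, venue, postcode in records:
        key = (normalise(venue), postcode.replace(" ", "").upper())
        groups[key].add(case_id)
    return {k: sorted(v) for k, v in groups.items() if len(v) >= min_cases}


# Toy records (illustrative only).
records = [
    ("c1", "The Red Lion", "ab1 2cd"),
    ("c2", "Red Lion.", "AB12CD"),
    ("c3", "Red Lyon", "AB1 2CD"),      # misspelling: missed by exact matching
    ("c4", "Green Park Gym", "EF3 4GH"),
]

clusters = exposure_clusters(records)
print(clusters)
```

Case `c3` reports the same venue with a spelling error and is not linked. This is the kind of miss that fuzzy or learned text matching would aim to recover.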

Table 4. Technological development needs for future studies.

https://doi.org/10.1371/journal.pdig.0000485.t004

Through analysis of routine contact tracing data collected in England, we have shown that systematic case exposure cluster surveillance is a feasible and valid tool for outbreak detection and situational awareness that can complement traditional methods. Although an evaluation of the effectiveness of such programmes in reducing transmission is required, exposure cluster surveillance should be considered for pandemics or epidemics where contact tracing is integral to the response. The methodology may be applicable across a range of infectious diseases, particularly those characterised by overdispersion of transmission and where transmission occurs across a variety of different settings.

Supporting information

S1 Table. Event categories used to classify forward and backward events reported by confirmed COVID-19 cases in the national contact tracing system in England.

https://doi.org/10.1371/journal.pdig.0000485.s001

(DOCX)

Acknowledgments

We thank members of the UKHSA Contact Tracing Cell for guidance and operational support. We thank Paul Cleary for helpful discussions on potential methodological developments.

References

  1. World Health Organization. Contact tracing and quarantine in the context of COVID-19: interim guidance, 6 July 2022. Available from: https://www.who.int/publications/i/item/WHO-2019-nCoV-Contact_tracing_and_quarantine-2022.1.
  2. Eames KTD, Keeling MJ. Contact tracing and disease control. Proc R Soc B Biol Sci. 2003;270:2565–2571. pmid:14728778
  3. ECDC. Contact tracing: public health management of persons, including healthcare workers, who have had contact with COVID-19 cases in the European Union–third update. Available from: https://www.ecdc.europa.eu/sites/default/files/documents/covid-19-contact-tracing-public-health-management-third-update.pdf.
  4. Hale T, Angrist N, Goldszmidt R, Kira B, Petherick A, Phillips T, et al. A global panel database of pandemic policies (Oxford COVID-19 Government Response Tracker). Nat Hum Behav. 2021;5:529–538. pmid:33686204
  5. Ko YK, Furuse Y, Ninomiya K, Otani K, Akaba H, Miyahara R, et al. Secondary transmission of SARS-CoV-2 during the first two waves in Japan: Demographic characteristics and overdispersion. Int J Infect Dis. 2022;116:365–373. pmid:35066162
  6. Endo A, Abbott S, Kucharski AJ, Funk S. Estimating the overdispersion in COVID-19 transmission using outbreak sizes outside China. Wellcome Open Res. 2020;5:67. pmid:32685698
  7. Greenhalgh T, Jimenez JL, Prather KA, Tufekci Z, Fisman D, Schooley R. Ten scientific reasons in support of airborne transmission of SARS-CoV-2. Lancet. 2021;397:1603–1605. pmid:33865497
  8. Tang JW, Marr LC, Li Y, Dancer SJ. Covid-19 has redefined airborne transmission. BMJ. 2021;373:1–2. pmid:33853842
  9. Wilmes P, Zimmer J, Schulz J, Glod F, Veiber L, Monbaerts L, et al. SARS-CoV-2 transmission risk from asymptomatic carriers: Results from a mass screening programme in Luxembourg. Lancet Reg Health Eur. 2021;4:1–9. pmid:33997830
  10. Muller CP. Do asymptomatic carriers of SARS-COV-2 transmit the virus? Lancet Reg Health Eur. 2021;4:100082. pmid:33997832
  11. Liu Y, Rocklöv J. The effective reproductive number of the Omicron variant of SARS-CoV-2 is several times relative to Delta. J Travel Med. 2022;29:1–4. pmid:35262737
  12. Craig KJT, Rizvi R, Willis VC, Kassler WJ, Jackson GP. Effectiveness of contact tracing for viral disease mitigation and suppression: Evidence-based review. JMIR Public Health Surveill. 2021;7:e32468. pmid:34612841
  13. Kucharski AJ, Klepac P, Conlan AJK, Kissler SM, Tang ML, Fry H, et al. Effectiveness of isolation, testing, contact tracing, and physical distancing on reducing transmission of SARS-CoV-2 in different settings: a mathematical modelling study. Lancet Infect Dis. 2020;20:1151–1160. pmid:32559451
  14. Leclerc QJ, Fuller NM, Knight LE, Funk S, Knight GM. What settings have been linked to SARS-CoV-2 transmission clusters? Wellcome Open Res. 2020;5:83. pmid:32656368
  15. Wong NS, Lee SS, Kwan TH, Yeoh E-K. Settings of virus exposure and their implications in the propagation of transmission networks in a COVID-19 outbreak. Lancet Reg Health West Pac. 2020;4:100052. pmid:34013218
  16. Furuse Y, Sando E, Tsuchiya N, Miyahara R, Yasuda I, Ko YK, et al. Clusters of Coronavirus Disease in Communities, Japan, January–April 2020. Emerg Infect Dis. 2020;26:2176–2179. pmid:32521222
  17. Chau NVV, Hong NTT, Ngoc NM, Thanh TT, Khanh PNQ, Nguyet LA, et al. Superspreading event of SARS-CoV-2 infection at a bar, Ho Chi Minh City, Vietnam. Emerg Infect Dis. 2021;27:310–314. pmid:33063657
  18. Kang CR, Leea JY, Park Y, Huh IS, Ham HJ, Han JK, et al. Coronavirus disease exposure and spread from nightclubs, South Korea. Emerg Infect Dis. 2020;26:2499–2501. pmid:32633713
  19. Lu J, Gu J, Li K, Xu C, Su W, Lai Z, et al. COVID-19 outbreak associated with air conditioning in restaurant, Guangzhou, China, 2020. Emerg Infect Dis. 2020;26:1628–1631. pmid:32240078
  20. Shen Y, Li C, Dong H, Wang Z, Martinez L, Sun Z, et al. Community outbreak investigation of SARS-CoV-2 transmission among bus riders in Eastern China. JAMA Intern Med. 2020;180:1665–1671. pmid:32870239
  21. Illingworth C, Hamilton WL, Warne B, Routledge M, Popay A, Jackson C, et al. Superspreaders drive the largest outbreaks of hospital onset COVID-19 infections. Elife. 2021;10:e67308. pmid:34425938
  22. Ontario Agency for Health Protection and Promotion (Public Health Ontario). Focus on: backward contact tracing. Available from: https://www.publichealthontario.ca/-/media/documents/ncov/phm/2021/05/covid-19-backward-contact-tracing.pdf?la=en.
  23. Bradshaw WJ, Alley EC, Huggins JH, Lloyd AL, Esvelt KM. Bidirectional contact tracing could dramatically improve COVID-19 control. Nat Commun. 2021;12:232. pmid:33431829
  24. Raymenants J, Geenen C, Thibaut J, Nelissen K, Gorissen S, Andre E. Empirical evidence on the efficiency of bidirectional contact tracing in COVID-19. Nat Commun. 2022;13:4750.
  25. Imamura T, Saito T, Oshitani H. Roles of public health centers and cluster-based approach for COVID-19 response in Japan. Health Secur. 2021;19:229–231. pmid:33346703
  26. Seto J, Aoki Y, Komabayashi K, Ikdea Y, Sampei M, Ogawa N, et al. Epidemiology of coronavirus disease 2019 in Yamagata prefecture, Japan, January–May 2020: the Importance of retrospective contact tracing. Jpn J Infect Dis. 2021;74:522–529. pmid:33790065
  27. Scientific Advisory Group for Emergencies. Sixty-third SAGE meeting on COVID-19, 22 October 2020. Available from: https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/935103/sage-63-meeting-covid-19-s0842.pdf.
  28. Department of Health & Social Care. Technical report on the COVID-19 pandemic in the UK. Chapter 6: testing. Available from: https://www.gov.uk/government/publications/technical-report-on-the-covid-19-pandemic-in-the-uk/chapter-6-testing.
  29. Briggs A, Jenkins D, Fraser C. NHS Test and Trace: the journey so far. Available from: https://www.health.org.uk/publications/long-reads/nhs-test-and-trace-the-journey-so-far.
  30. Fontana LM, Villamagna AH. Understanding viral shedding of severe acute respiratory coronavirus virus 2 (SARS-CoV-2): review of current literature. Infect Control Hosp Epidemiol. 2021;42:659–668. pmid:33077007
  31. UKHSA. Coronavirus (COVID-19) in the UK. England Summary. Available from: https://coronavirus.data.gov.uk/.
  32. Cabinet Office. COVID-19 Response: Summer 2021. Available from: https://www.gov.uk/government/publications/covid-19-response-summer-2021-roadmap/covid-19-response-summer-2021.
  33. Cabinet Office. COVID-19 Response - Spring 2021 (Summary). Available from: https://www.gov.uk/government/publications/covid-19-response-spring-2021/covid-19-response-spring-2021-summary.
  34. Allen H, Tessier E, Turner C, Anderson C, Blomquist P, Simons D, et al. Comparative transmission of SARS-CoV-2 Omicron (B.1.1.529) and Delta (B.1.617.2) variants and the impact of vaccination: national cohort study, England. Epidemiol Infect. 2023;151:e58. pmid:36938806
  35. Ordnance Survey. AddressBase. Available from: https://www.ordnancesurvey.co.uk/products/addressbase.
  36. McLennan D, Noble S, Noble M, Plunkett E, Wright G, Gutacker N. The English Indices of Deprivation 2019. Available from: https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/833951/IoD2019_Technical_Report.pdf.
  37. Office for National Statistics. 2011 rural/urban classification. Available from: https://www.ons.gov.uk/methodology/geography/geographicalproducts/ruralurbanclassifications/2011ruralurbanclassification.
  38. R Core Team. R: A language and environment for statistical computing. Available from: https://www.r-project.org/.
  39. Jang S, Han SH, Rhee JY. Cluster of Coronavirus disease associated with fitness dance classes, South Korea. Emerg Infect Dis. 2020;26:1917–1920. pmid:32412896
  40. Kong D, Zheng Y, Wu H, Pan H, Wagner AL, Zheng Y, et al. Pre-symptomatic transmission of novel coronavirus in community settings. Influenza Other Respir Viruses. 2020;14:610–614. pmid:32558175
  41. Li H, Wang Y, Ji M, Pei F, Zhao Q, Zhou Y, et al. Transmission routes analysis of SARS-CoV-2: a systematic review and case report. Front Cell Dev Biol. 2020;8:618. pmid:32754600
  42. Mat NFC, Edinur HA, Razab MKAA, Safuan S. A single mass gathering resulted in massive transmission of COVID-19 infections in Malaysia with further international spread. J Travel Med. 2020;27:taaa059. pmid:32307549
  43. Goldswain H, Dong X, Penrice-Randal R, Alruwaili M, Shawli GT, Prince T, et al. The P323L substitution in the SARS-CoV-2 polymerase (NSP12) confers a selective advantage during infection. Genome Biology. 2023;24:47. pmid:36915185
  44. Wang Y, Wang D, Zhang L, Sun W, Zhang Z, Chen W, et al. Intra-host variation and evolutionary dynamics of SARS-CoV-2 populations in COVID-19 patients. Genome Medicine. 2021;13:30. pmid:33618765
  45. Sanderson T, Hisner R, Donovan-Banfield I, Hartman H, Lochen A, Peacock TB, et al. A molnupiravir-associated mutational signature in global SARS-CoV-2 genomes. Nature. 2023;623:594–600. pmid:37748513