
Nowcasting reported covid-19 hospitalizations using de-identified, aggregated medical insurance claims data

  • Xueda Shen ,

    Roles Investigation, Methodology, Software, Writing – original draft, Writing – review & editing

    shenxueda@berkeley.edu

    Affiliation Department of Biostatistics, University of California, Berkeley, California, United States of America

  • Aaron Rumack,

    Roles Methodology, Writing – review & editing

    Affiliation Machine Learning Department, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America

  • Bryan Wilder,

    Roles Supervision, Writing – review & editing

    Affiliation Machine Learning Department, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America

  • Ryan J Tibshirani

    Roles Supervision, Writing – review & editing

    Affiliation Department of Statistics, University of California, Berkeley, California, United States of America

Abstract

We propose, implement, and evaluate a method for nowcasting the daily number of new COVID-19 hospitalizations, at the level of individual US states, based on de-identified, aggregated medical insurance claims data. Our analysis proceeds under a hypothetical scenario in which, during the Delta wave, states only report data on the first day of each month, and on this day, report COVID-19 hospitalization counts for each day in the previous month. In this hypothetical scenario (just as in reality), medical insurance claims data continues to be available daily. At the beginning of each month, we train a regression model, using all data available thus far, to predict hospitalization counts from medical insurance claims. We then use this model to nowcast the (unseen) values of COVID-19 hospitalization counts from medical insurance claims, at each day in the following month. Our analysis uses properly-versioned data, which would have been available in real-time at the time predictions are produced (instead of using data that would have only been available in hindsight). In spite of the difficulties inherent to real-time estimation (e.g., latency and backfill) and the complex dynamics behind COVID-19 hospitalizations themselves, we find altogether that medical insurance claims can be an accurate predictor of hospitalization reports, with mean absolute errors typically around 0.4 hospitalizations per 100,000 people, i.e., proportion of variance explained around 75%. Perhaps more importantly, we find that nowcasts made using medical insurance claims are able to qualitatively capture the dynamics (upswings and downswings) of hospitalization waves, which are key features that inform public health decision-making.

Author summary

Daily reported COVID-19 hospitalizations have been a topline indicator throughout the pandemic in the US, and an up-to-date awareness of the load on the hospital system has been a key factor in public health decision-making. However, collecting and maintaining this indicator comes at a high price, as frequent reporting of hospitalizations is itself burdensome on the health system. This is especially true at times when it is needed the most: staff shortages in hospitals tended to coincide with surges in hospitalizations, making reporting even more challenging in peak times. In this paper, we explore the use of auxiliary indicators based on de-identified, aggregated medical insurance claims data, and build relatively simple statistical models to track hospitalizations using these auxiliary indicators, so that reporting may be (hypothetically) reduced in frequency, thereby reducing the burden on hospitals. We find that these models can track reported hospitalizations closely, even in critical times (surges), suggesting that our approach and similar ones may be good candidates for reducing reporting frequency in future public health crises.

Introduction

Timely access to public health data is critical to enable informed decision making during infectious disease outbreaks. However, setting up and maintaining public health reporting pipelines can be a burden on the health system itself. For example, beginning in May 2020, hospitals in the US have been required to report data on COVID-19 hospitalizations to the Department of Health and Human Services (HHS) [1]. This data has been critical for understanding the state of the pandemic and the current load on the health system. But the frequent reporting required for up-to-date situational awareness (daily, throughout most of the pandemic) has been quite difficult to implement and maintain. In an effort to achieve compliance, the Centers for Medicare and Medicaid Services (CMS) issued regulations in August 2020 that threatened to expel hospitals from the Medicare program, and apply monetary penalties, if they failed to comply with daily COVID-19 reporting requirements. This was strongly and openly opposed by the American Hospital Association (AHA) [2], but the regulations remained in place for nearly three years, lasting until the conclusion of the COVID-19 Public Health Emergency in May 2023.

An intriguing alternative lies in data streams which already exist and are maintained for other purposes, yet are relevant for inferring disease activity. One example is medical insurance claims, which are filed by a healthcare provider to seek reimbursement from an insurance company for medical services performed. In this paper, we examine the use of de-identified, aggregated medical insurance claims data as a complement to public health reporting over the course of the pandemic. Specifically, we investigate the following: if hospital reporting on COVID-19 would have been reduced in frequency from daily to monthly during the Delta wave, could we use signals derived from medical insurance claims to accurately nowcast COVID-19 hospitalization counts during the interim periods?

Medical insurance claims have long been utilized in public health policy analysis, ranging from economic implications of healthcare (e.g., recent examples include [3–8]), to examinations of treatment effectiveness and satisfaction (e.g., [9–13]). All of these works, however, are retrospective in nature: they seek to develop an understanding of a particular phenomenon using data that would not have been available in real-time, but only in hindsight. In contrast, our goal is to carry out an analysis that reflects real-time estimation, so that we can understand (to the best extent we can) how our models would perform if they were to be operationalized in the future, for true prospective nowcasting. This requires us to use properly-versioned data at all times in our analysis, which would have been available in real-time, at the time nowcasts are produced. This is true of all data sources in question, but it is especially crucial for medical insurance claims signals: these signals are subject to heavy revisions, as claims can be filed long after a service was performed (we describe this more concretely later in the paper), altering previously-computed signal values. Therefore, using finalized (rather than properly-versioned) values of medical insurance claims signals for modeling and prediction can present a misleading picture of nowcasting performance.

Fig 1 displays an example of this: it plots reported COVID-19 hospitalizations in California, over a 1-year period spanning the winter wave of 2020 through the Delta wave of 2021, alongside a signal derived from COVID-associated inpatient claims. The latter has its own separate units (as explained precisely in the methods section), and is thus given its own y-axis, on the right-hand side of the figure. Two versions of this inpatient signal are shown: a real-time signal, computed using claims that would have been available at each reference date on the x-axis, and a finalized signal, computed using claims that would have only been available 30 days after each reference date. All three series are smoothed over a trailing 7-day window. We can see that the finalized inpatient signal tracks reported hospitalizations quite well; the real-time signal, however, is much more volatile, and its concordance with reported hospitalizations is much worse. (The real-time signal is also missing at some dates around the summer of 2021, which explains why no values are plotted there. We will discuss this shortly in the methods section.)

Fig 1. Reported daily COVID-19 hospitalizations, plotted alongside a signal derived from medical claims that measures daily COVID-associated inpatient admissions, for the state of California.

https://doi.org/10.1371/journal.pcbi.1012717.g001

The importance of data versioning for epidemic tracking and prediction tasks is emphasized in [14,15], and the importance of leveraging existing healthcare data streams for epidemic surveillance, as a complement to traditional public health reporting, is motivated in [16]. Infectious disease nowcasting, using data from healthcare pipelines and also from a variety of other auxiliary data sources, has received increased attention over the last decade or so: see, e.g., [17–37]. The goal in much of this work is to produce high-resolution, up-to-date estimates of disease activity for a fast-moving pathogen like influenza or COVID-19. The nowcasting problem is essentially to project finalized values from preliminary or partial measurements, making use of backfill or delay distributions, and possibly proxy signals which are exogenous to the main data reporting pipeline. The action of backfill or delay is naturally modeled via convolution, and Bayesian methods have been popular here, given their ability to streamline estimation and uncertainty quantification.

Our paper complements this line of work by considering the effectiveness of nowcasting based purely on proxy signals (for us, medical claims signals): we consider a hypothetical scenario in which hospital reporting on COVID-19 would have been set up at a coarser frequency, and under this hypothetical there is no partial or preliminary reporting data in the interim periods between reports (lasting a month, or longer) on which to base nowcasts. Finally, we examine intentionally simple statistical models based on regression, which would be relatively easy for a public health office to operationalize in practice.

Methods

This section describes the data that we use, the hypothetical scenarios (for hospital reporting cadence) that we consider, and the nowcast and backcast models that we build and evaluate. In describing these models, we also cover model training techniques which account for nonstationarity, time series cross-validation schemes for selecting tuning parameters, and variants of the basic (state-level) model which pool data across states. In the last subsection, we describe a nonparametric method for constructing prediction intervals.

Data

Throughout, we restrict our attention to nowcasting state-level hospitalizations. This is because the medical claims signals that we describe below are not generally available at each US county. That said, in principle, the same ideas we describe in what follows could be applied to nowcast hospital reports in large counties (subject to enough claims data being available in order to form robust signals). We also restrict our attention to nowcasting daily reported hospitalizations between April 1, 2021 and August 1, 2023, though we use data back until November 1, 2020 for training models.

State-level, daily COVID-19 hospitalization reports are obtained from the HHS, accessed via the Delphi Epidata API [14,38]. We use $Y_{\ell,s}$ to denote the 7-day trailing average of finalized reported new COVID-19 hospitalization counts corresponding to location $\ell$ and time s. Averaging over 7-day trailing windows is mainly used as a smoother, and to account for weekday-weekend differences. Reported hospitalization counts are subject to revision (these are typically minor in comparison to revisions for claims signals), hence we also introduce notation to work with versioned data henceforth: we denote by $Y^t_{\ell,s}$ the 7-day trailing average of hospitalization counts for location $\ell$ and time s, but whose data version is as of time t. In this context, we often refer to s as the reference date and t the issue date. We use analogous notation and nomenclature for all versioned data in this paper.

De-identified, aggregated medical insurance claims data are provided to us by Change Healthcare, which covers around 25% of all commercial medical insurance claims in the US. Moreover, based on comparing total counts of COVID-19 hospitalizations from inpatient claims and those reported to the HHS (over the full period of our analysis), we find that there is a broad range: at the maximum end, for some US states, over 55% of hospitalizations are reflected in the claims data; at the minimum end, for other states, less than 20% appear in the claims data. (A more detailed analysis is given in Sect A of S1 Appendix.) Nevertheless, we have enough claims data per state in order for us to be able to derive meaningful signals which reflect COVID-associated outpatient and inpatient activity. Specifically, in this work, we consider the following two signals:

  • $O_{\ell,s}$: the finalized percentage of outpatient claims in a 7-day trailing window with a confirmed COVID diagnostic code, corresponding to location $\ell$ and time s.
  • $I_{\ell,s}$: the finalized percentage of inpatient claims in a 7-day trailing window with a COVID-associated diagnostic code, corresponding to location $\ell$ and time s.

As before, we use $O^t_{\ell,s}$ and $I^t_{\ell,s}$ to denote the versioned outpatient and inpatient claims signals, respectively, corresponding to issue date t. The use of versioned data is extremely important when working with claims-based signals because medical insurance claims are often submitted and/or processed late, many days (and even months) after a given date of service. This process is generally referred to as backfill, and it has quite a pronounced effect on both outpatient and inpatient signals. We can look back at Fig 1 for an example of the inpatient signal in California. What is labeled as the “real-time inpatient signal” in the figure is $I^s_{\ell,s}$ in our notation introduced here, for s ranging over the time values along the x-axis (and $\ell$ = California). What is labeled as the “finalized inpatient signal” is actually $I^{s+30}_{\ell,s}$; note that 30 is just a large number chosen for simplicity, and it is not necessarily the case that all claims would have been filed after 30 days.

Sometimes backfill is so severe that no claims for reference date s are filed at all until a later issue date t, which we refer to as latency. If the latency is large enough, in particular, if no claims are available until a full 7 days after a given reference date, then the real-time claims signal value will be missing. This happens at a few dates in Fig 1 in May and June of 2021.

In defining the inpatient and outpatient claims signals, the use of a ratio of COVID-associated claims to all claims, instead of a count of COVID-associated claims, is important for two reasons. First, it adjusts for the unknown market share (unknown to us) of Change Healthcare in each given location. Second, this ratio tends to be more robust to backfill than a pure count, as both the numerator and denominator get updated as new claims are filed (whereas a count would only be revised upward, and as we will see later in Fig 2, especially for the inpatient signal, only a small fraction of the total volume of claims is available in the first few days after a given date of service).

Fig 2. Analysis to help guide lag selection for inpatient and outpatient features in the working model (1).

The rows correspond to different features, and the columns to different metrics, as explained precisely in the main text. The shaded regions show  ± 1 standard error bands for each metric, over the state averages.

https://doi.org/10.1371/journal.pcbi.1012717.g002

Like the HHS hospitalization reports, various signals derived from medical insurance claims are available in the Delphi Epidata API. The precise diagnostic codes used in defining the outpatient and inpatient signals are given in the API documentation: https://cmu-delphi.github.io/delphi-epidata. For convenience, we have relayed these definitions in Sect A of S1 Appendix; we have also made all data (including properly-versioned data) used in our analysis available for download at: https://github.com/cmu-delphi/hhs-nowcasting.
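To make the data access concrete, below is a minimal sketch of pulling versioned ("as of") signals from the COVIDcast endpoint of the Delphi Epidata API using the covidcast Python client. The data-source and signal names shown are illustrative assumptions; the exact signals and diagnostic-code definitions used in our analysis are those documented at the links above.

```python
# A minimal sketch (not our production pipeline) of fetching properly-versioned
# signals from the Delphi Epidata / COVIDcast API with the `covidcast` client.
# The data-source and signal names below are assumptions for illustration.
from datetime import date

import covidcast

# Reported COVID-19 hospital admissions for one state, as they would have
# appeared on a given issue date (`as_of` controls the data version).
hosp = covidcast.signal(
    "hhs", "confirmed_admissions_covid_1d",
    start_day=date(2020, 11, 1), end_day=date(2021, 3, 31),
    geo_type="state", geo_values="ca",
    as_of=date(2021, 4, 1),
)

# A claims-based inpatient signal, again versioned via `as_of`.
inpatient = covidcast.signal(
    "hospital-admissions", "smoothed_adj_covid19_from_claims",
    start_day=date(2020, 11, 1), end_day=date(2021, 3, 31),
    geo_type="state", geo_values="ca",
    as_of=date(2021, 4, 1),
)
```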

Hypothetical scenarios

We consider two hypothetical scenarios, described next. Recall that our nowcasting evaluation spans April 1, 2021 to August 1, 2023, and we have training data all the way back until November 1, 2020.

Scenario 1: monthly updates.

Our first hypothetical scenario examines nowcasting from April 1, 2021 to November 30, 2021. In this scenario, on the first day of each month during this period, we receive daily hospitalization counts for the previous month, and we nowcast and backcast the (unobserved) hospitalization counts for each following day in the given month, using the outpatient and inpatient claims signals described above. Note that the period in this scenario covers the Delta wave in the US. The start of this hypothetical scenario is chosen to be April 1, 2021 so that we have a large enough “burn-in set” (initial training set), which extends back to November 1, 2020. (This includes the winter wave of 2020, which helps the initial models capture relationships during a time of dynamic change.)

To make this all more precise, let us introduce some notation. We drop reference to the location $\ell$ here and in what follows, whenever convenient (whenever it is not needed for the given explanation). Let $t_0$ be a date that marks the start of a month during the period of April 1, 2021 to November 30, 2021. Then on each day $t \in \{t_0, \dots, t_1 - 1\}$, where $t_1$ marks the first day of the next month, we have access to:

  • $Y^{t_0}_s$, for $s < t_0$: hospitalization counts through day $t_0 - 1$, with versions as of day $t_0$; and
  • $O^t_s$ and $I^t_s$, for $s \le t$: outpatient and inpatient signals through day t, with versions as of day t.

On each such day t, we use the data we have to train a regression model, call it $\hat{f}_t$, to predict hospitalizations from claims signals. We use this to make nowcasts $\hat{Y}^t_t$ (predictions of the hospitalization count at the current day t), and lag-k backcasts $\hat{Y}^t_{t-k}$ (predictions of the count at day t − k), for each k = 1, …, 10. (Clearly, a nowcast is equivalent to a lag-0 backcast.) This is repeated for each of the 9 months in the period spanned by this monthly-update scenario. The details of how regression models are trained and evaluated will be given in the next subsection.
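As a rough sketch of the monthly-update loop just described, the following Python skeleton retrains a model at each observation boundary and then predicts every day of the month ahead. The data frame layout, column names, and the use of a plain (unweighted) least squares fit are simplifying assumptions here; the actual weighting and tuning are described in the next subsection.

```python
# Schematic of scenario 1: refit at each observation boundary on all reports
# available so far, then nowcast each day of the following month from the
# (versioned) claims features. Column names are hypothetical.
import pandas as pd
from sklearn.linear_model import LinearRegression

def run_monthly_scenario(hosp: pd.DataFrame, claims: pd.DataFrame,
                         boundaries: list) -> pd.DataFrame:
    """hosp: columns ['date', 'y'] (reported counts); claims: column 'date'
    plus lagged feature columns; boundaries: first-of-month dates."""
    feat_cols = [c for c in claims.columns if c != "date"]
    preds = []
    for t0, t1 in zip(boundaries[:-1], boundaries[1:]):
        # Train on everything reported before the boundary t0.
        train = hosp[hosp["date"] < t0].merge(claims, on="date")
        model = LinearRegression().fit(train[feat_cols], train["y"])
        # Nowcast/backcast each day of the month [t0, t1).
        month = claims[(claims["date"] >= t0) & (claims["date"] < t1)]
        out = month[["date"]].copy()
        out["yhat"] = model.predict(month[feat_cols])
        preds.append(out)
    return pd.concat(preds, ignore_index=True)
```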

Scenario 2: no updates.

Our second hypothetical scenario spans nowcasting dates from December 1, 2021 to August 1, 2023. In this scenario, we receive reported hospitalizations up through November 30, 2021 with versions as of December 1, 2021, and we receive no further hospitalization counts after that. As before, we nowcast and backcast the (unobserved) hospitalization counts in the remaining period using the outpatient and inpatient claims signals. Note that the period in this scenario covers the Omicron wave in the US, and that this second scenario is generally far more challenging than the first.

The data received, and the nowcasts and backcasts made, in the no-update scenario can be written in precise notation exactly as introduced in the description of the monthly-update scenario above, except that we now fix $t_0$ at December 1, 2021, and make nowcasts and backcasts at each t from $t_0$ through the end of the period, August 31, 2023. The details of how regression models are trained and evaluated in this scenario will again be covered in the next subsection.

Regression model

As a basic working model, we predict hospitalizations using a linear combination of a set of lags of the inpatient signal and a set of lags of the outpatient signal,

$Y_s \approx \alpha + \sum_{j \in J_I} \beta_j I_{s-j} + \sum_{j \in J_O} \delta_j O_{s-j},$   (1)

where the coefficients $\alpha$, $\beta_j$, and $\delta_j$ are estimated, i.e., the model is fit, by training on historical data available at time t, either separately for each location (recall, the notational dependence on the location $\ell$ has been dropped for now), or in a way that pools data across locations. Details will be given below. First, we describe how we select the lag sets $J_I$, $J_O$ for the inpatient and outpatient features.

Selecting feature lags.

To select the lag sets $J_I$, $J_O$ for the working model (1), we restrict our attention to the burn-in set, before the first nowcast date of April 1, 2021 (so as to avoid overfitting to the data in our nowcasting period, and to reserve this period for proper evaluation of our models in the two scenarios outlined above).

To guide lag selection, we consider three metrics, which measure predictive power, feature stability, and data availability. Let $X^t_s$ denote a generic feature in our usual notation for versioned data, i.e., $X = I$ for the inpatient feature or $X = O$ for the outpatient feature. We consider, as a function of lag j:

  • the correlation of $X^t_{t-j}$ with finalized hospitalizations $Y_t$, which is a measure of predictive power;
  • the correlation of $X^t_{t-j}$ with its own finalized value $X_{t-j}$, which is a measure of stability;
  • the fraction of total claims used to compute the finalized signal $X_{t-j}$ that are observed by time t.

(Note: the finalized signal values would not have been available at the end of the burn-in period. However, this does not pose any additional risk of overfitting to the data in the nowcasting period, which was the reason for separating out the burn-in set in the first place.) Each metric is computed over time, between November 1, 2020 and March 31, 2021, for each location. The results are displayed in Fig 2, averaged over all locations (all states). The shaded regions display  ± 1 standard error bands, computed over the state averages.

These metrics exhibit a tradeoff as a function of the lag j. The first is nonmonotone: it increases at first, because the claims signal at a larger lag j is less volatile (more total claims have been observed), but eventually decreases, because the signal at reference date t − j becomes less related to hospitalizations at t. The second and third metrics are monotone in j. Altogether, we can see that lag 6, for each of the inpatient and outpatient signals, offers a nice balance across the three metrics: maximum predictive power (the highest correlation with hospitalizations for both signals), along with high stability and availability. We therefore include lag 6 in each of the sets $J_I$ and $J_O$. To improve robustness and capture longer-range dependencies, we also include lags 13 and 20 in each of $J_I$ and $J_O$, which is roughly consistent with choices of feature lags in other basic epidemic prediction models (e.g., [15]). The choice of 7-day spacing here is also motivated by the desire to limit correlations between features in (1); recall that the inpatient and outpatient signals are computed using a trailing 7-day window of claims data.
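As a small illustration, the three metrics for a given lag j can be computed from aligned arrays over the burn-in period roughly as in the sketch below; the input arrays, and how they are assembled from the versioned data, are assumptions made for illustration.

```python
# A sketch of the three lag-selection metrics for one state and one lag j.
# x_rt: real-time (versioned) feature values at lag j; x_fin: their finalized
# values; y_fin: finalized hospitalizations; claims_obs / claims_tot: claim
# volumes observed in real time vs. in the finalized data.
import numpy as np

def lag_metrics(x_rt, x_fin, y_fin, claims_obs, claims_tot) -> dict:
    x_rt, x_fin, y_fin = map(np.asarray, (x_rt, x_fin, y_fin))
    return {
        # predictive power: real-time feature vs. finalized hospitalizations
        "predictive_power": np.corrcoef(x_rt, y_fin)[0, 1],
        # stability: real-time feature vs. its own finalized value
        "stability": np.corrcoef(x_rt, x_fin)[0, 1],
        # availability: fraction of finalized claim volume seen in real time
        "availability": float(np.mean(np.asarray(claims_obs) / np.asarray(claims_tot))),
    }
```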

Training the regression model.

At each nowcast date t, we fit the coefficients $\alpha$, $\beta_j$, $\delta_j$ in (1) by solving the following (weighted) least squares optimization problem:

$\min_{\alpha,\, \beta,\, \delta} \;\; \sum_{s < t_0} w_s \Big( Y^{t_0}_s - \alpha - \sum_{j \in J_I} \beta_j I^t_{s-j} - \sum_{j \in J_O} \delta_j O^t_{s-j} \Big)^2,$   (2)

where we use exponentially decaying observation weights:

$w_s = \exp\big( -\gamma\, (t_0 - s) \big),$   (3)

for a tuning parameter γ ≥ 0. The index $t_0$ in (2) marks the latest observation boundary for hospitalization reports before time t: this is either the start of the month in scenario 1 (where we receive monthly updates), or December 1, 2021 in scenario 2 (where we receive no further updates). In other words, we fit the coefficients in (2) by minimizing the weighted mean squared error of our working regression model, over all dates at which response values (reported hospitalization counts) are available. Note carefully that in (2) we only use properly-versioned data that would have been available at t. If any such data, response or feature values, are missing at time t then the corresponding summand is simply omitted from (2).
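A minimal sketch of this fitting step is given below, assuming the versioned training rows have already been assembled into a data frame with one column per feature lag; the exact weight parameterization follows our reading of (3) and is an assumption.

```python
# Weighted least squares fit of the working model, with exponentially
# decaying observation weights (older reference dates count less).
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

def fit_weighted_model(train: pd.DataFrame, t0: pd.Timestamp,
                       gamma: float) -> LinearRegression:
    """train: rows for reference dates s < t0, with a 'date' column, lagged
    inpatient/outpatient feature columns, and response column 'y'."""
    feat_cols = [c for c in train.columns if c not in ("date", "y")]
    age = (t0 - train["date"]).dt.days.to_numpy()  # t0 - s, in days
    w = np.exp(-gamma * age)                        # weights as in (3)
    model = LinearRegression()
    model.fit(train[feat_cols], train["y"], sample_weight=w)
    return model
```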

As a default, we fit the model by solving (2) separately for each location $\ell$ (recall, we have suppressed the dependence on $\ell$ in the notation for simplicity). We call this the state-level model. Later, we discuss schemes for pooling training data across locations. Next, we focus on the decay parameter γ in (2), (3).

Selecting the decay parameter by cross-validation.

Before describing how we select γ in the exponential weights (3) that are used in the weighted least squares problem (2), we pause to discuss the question: why is it useful to use decaying observation weights in the first place? The reason is that the features (lagged versions of the inpatient and outpatient medical insurance claims signals) and the response (reported hospitalizations) need not be jointly stationary, i.e., their relationship may be changing over time.

Looking back at Fig 1, note that we observe evidence of nonstationarity in the relationship between the real-time inpatient signal and reported hospitalizations. If we were to regress reported hospitalizations on the inpatient signal alone, then the regression coefficient that would be appropriate for the period April–July 2021 would be too small for the first hospitalization wave starting in December 2020 (and to a lesser extent, also too small for the second wave starting in August 2021).

By allowing γ itself to change over time, we can adapt to the degree of nonstationarity at any point in time, i.e., we can adapt to the amount of past training data that is relevant for the current prediction. Indeed, we will select γ in a dynamic, time-varying fashion using cross-validation (CV). Given the sequential nature of our prediction problem, we use a version of CV that is purely forward-looking, and is sometimes referred to as time series cross-validation in the literature [39]. This works as follows. Let $t_0$ denote the index that marks the most recent observation boundary for hospitalization reports, and let $t_{-1}$ and $t_{-2}$ denote the indices that mark the previous two observation boundaries before $t_0$. For each γ in a grid Γ of tuning parameter values, we carry out the following procedure:

  • for each $t \in \{t_{-2}, \dots, t_{-1} - 1\}$:
    • fit a regression model by solving (2), with $t_0$ replaced by $t_{-2}$ and weights (3) defined using the given γ;
    • produce backcasts $\hat{Y}^t_{t-k}$, for k = 0, …, 10;
  • for each $t \in \{t_{-1}, \dots, t_0 - 1\}$:
    • fit a regression model by solving (2), with $t_0$ replaced by $t_{-1}$;
    • produce backcasts $\hat{Y}^t_{t-k}$, for k = 0, …, 10;
  • compute the mean absolute error (MAE) of all backcasts made at all times t, against the corresponding reported hospitalization counts $Y^{t_0}_{t-k}$.

In other words, this procedure uses the last 2 months of data as a validation set for tuning γ. Ultimately, we choose the γ ∈ Γ that minimizes the MAE computed in the last step above. To be clear, after choosing γ in this way, we then fit the model by solving (2) and use this to make backcasts at each time $t \in \{t_0, \dots, t_1 - 1\}$, where $t_1$ is the observation boundary after $t_0$.

Toward defining the tuning parameter set Γ, we first compute $\gamma_{\max}$, the value of γ such that the effective sample size of the weight sequence (3) equals 30, i.e., it solves

$\sum_{k \ge 0} \exp(-\gamma k) = 30.$

This is chosen as a heuristic upper bound on the “reasonable” range of γ values, motivated by the idea that restricting to at least (effectively) 1 month of training data helps to avoid fitted models which are too volatile. We then set Γ to contain 25 evenly-spaced values between 0 and $\gamma_{\max}$.
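Putting the pieces together, the forward-looking cross-validation for γ can be sketched as below, reusing the fit_weighted_model helper from the previous sketch; the fold construction and column names are illustrative assumptions.

```python
# Time series CV for the decay parameter: the two months before the current
# observation boundary serve as validation folds, each predicted by a model
# trained only on data preceding that fold.
import numpy as np
import pandas as pd

def select_gamma(data: pd.DataFrame, boundaries, gamma_grid) -> float:
    """boundaries = (t_m2, t_m1, t0): the three most recent observation
    boundaries; data: frame with 'date', feature columns, and response 'y'."""
    t_m2, t_m1, t0 = boundaries
    feat_cols = [c for c in data.columns if c not in ("date", "y")]
    best_gamma, best_mae = gamma_grid[0], np.inf
    for gamma in gamma_grid:
        fold_errs = []
        for lo, hi in [(t_m2, t_m1), (t_m1, t0)]:
            model = fit_weighted_model(data[data["date"] < lo], lo, gamma)
            fold = data[(data["date"] >= lo) & (data["date"] < hi)]
            fold_errs.append(
                np.abs(model.predict(fold[feat_cols]) - fold["y"]).mean())
        mae = float(np.mean(fold_errs))
        if mae < best_mae:
            best_gamma, best_mae = gamma, mae
    return best_gamma
```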

Stabilizing predictions by geo-pooling.

To borrow strength across locations, we consider fitting the regression model at each time t by pooling data across locations, which we refer to as the geo-pooled model. To be precise, instead of solving (2) per location $\ell$, we instead solve:

$\min_{\alpha,\, \beta,\, \delta} \;\; \sum_{\ell} \sum_{s < t_0} w_s \Big( R^t_{\ell,s} - \alpha - \sum_{j \in J_I} \beta_j I^t_{\ell,s-j} - \sum_{j \in J_O} \delta_j O^t_{\ell,s-j} \Big)^2.$   (4)

Note that in (4), the training set includes data from all locations $\ell$. Importantly, in (4), the response $R^t_{\ell,s}$ is the hospitalization rate in location $\ell$ at reference date s, as of time t, which is defined as the hospitalization count per 100,000 people, i.e.,

$R^t_{\ell,s} = \frac{Y^t_{\ell,s}}{N_\ell} \times 100{,}000,$

where $N_\ell$ is the population of location $\ell$. The use of rates rather than counts in the pooled regression (4) is critical, because otherwise the coefficients would have different meanings in different locations, and pooling across locations would not make sense. After solving (4), in order to use the fitted model to make predictions of hospitalization counts at a given location $\ell$, we would then need to rescale the predictions by $N_\ell / 100{,}000$.

Finally, we also consider a mixed model, which linearly combines the state-level and geo-pooled predictions, via

$\hat{Y}^{\mathrm{mixed}}_{\ell,s} = \lambda\, \hat{Y}^{\mathrm{state}}_{\ell,s} + (1 - \lambda)\, \hat{Y}^{\mathrm{geo}}_{\ell,s},$   (5)

for a mixing parameter λ ∈ [0, 1]. We select λ, separately per state, using the same cross-validation strategy described previously, where we tune λ over a grid of 50 evenly-spaced values between 0 and 1. (We tune over γ separately for each of the state-level and geo-pooled models, and then tune over λ.)
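A small sketch of the mixing step follows. Which of the two predictions λ multiplies in (5), and the rescaling of geo-pooled rate predictions back to counts, are assumptions spelled out in the comments.

```python
# Combine a state-level count prediction with a geo-pooled rate prediction.
# Assumption: lam weights the state-level term and (1 - lam) the geo-pooled
# term; the geo-pooled model predicts rates per 100,000 and is rescaled.
def mixed_prediction(yhat_state: float, rate_hat_pooled: float,
                     population: float, lam: float) -> float:
    """lam in [0, 1], selected per state by cross-validation."""
    yhat_pooled = rate_hat_pooled * population / 100_000  # rate -> count
    return lam * yhat_state + (1 - lam) * yhat_pooled
```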

Prediction intervals

In addition to considering point predictions from the models described above, we use residuals from these models and quantile tracking [40] to generate prediction intervals. Quantile tracking is a method from the online conformal literature, and produces nonparametric intervals which are guaranteed to attain long-run coverage over an arbitrary bounded data sequence (including nonstationary time series). Abstractly, given a predicted value $\hat{y}_t$ of some unobserved target value $y_t$ at time t, we begin by specifying a score function φ (assumed to be negatively-oriented, so that lower values correspond to better accuracy) and we construct a prediction set at time t via

$C_t = \{\, y : \varphi(\hat{y}_t, y) \le q_t \,\}.$

Here $q_t$ denotes a parameter that is output by the quantile tracking algorithm. Given a nominal coverage level 1 − α, this parameter is updated using an online gradient descent step with respect to an optimization problem defined by summing the level 1 − α quantile losses of the scores, relative to $q_t$, over the sequence t = 1, 2, 3, ….

In our problem, we do not receive new target values after we make each prediction, unlike the standard online learning setup used in [40], so we adapt their method so that the tracked parameter is adjusted only at the observation boundaries in scenario 1. Further, to allow our intervals to exhibit varying width between observation boundaries, we use a scaled score that divides the residual by the value of the prediction. Finally, in order to allow our intervals to be asymmetric around the predicted value, we define a separate lower and upper score, $\varphi^{\mathrm{lo}}$ and $\varphi^{\mathrm{up}}$, respectively:

$\varphi^{\mathrm{lo}}(\hat{y}_t, y) = \frac{\hat{y}_t - y}{\hat{y}_t}, \qquad \varphi^{\mathrm{up}}(\hat{y}_t, y) = \frac{y - \hat{y}_t}{\hat{y}_t},$

and effectively run two instances of quantile tracking, with parameters $q^{\mathrm{lo}}_t$ and $q^{\mathrm{up}}_t$, respectively, each at the level 1 − α∕2. Combining the results from these two procedures then gives us the lower and upper endpoints of the prediction intervals.

We are now ready to describe our adaptation of quantile tracking (under batched updates). In the same notation as that used to describe fitting the regression models above, let $t_0$ denote the latest observation boundary, and $t_1$ the next observation boundary. Suppose we are making backcasts at lag k (with k = 0 representing nowcasts). For each $t \in \{t_0, \dots, t_1 - 1\}$, we:

  • produce a point prediction $\hat{Y}^t_{t-k}$ using one of the regression models described in previous subsections;
  • produce a prediction interval via

$\Big[\; \hat{Y}^t_{t-k}\, (1 - q^{\mathrm{lo}}),\;\; \hat{Y}^t_{t-k}\, (1 + q^{\mathrm{up}}) \;\Big].$

Then, at $t_1$, we:

  • look back and compute the one-sided coverage errors, i.e., for each prediction made since $t_0$, whether the (newly reported) target fell below the lower endpoint, and whether it fell above the upper endpoint;
  • update each of $q^{\mathrm{lo}}$, $q^{\mathrm{up}}$ by taking appropriate gradient descent steps on the corresponding quantile losses,
    where η > 0 is a small fixed learning rate.

To prevent crossing of quantiles, we enforce $q^{\mathrm{lo}} \ge 0$ and $q^{\mathrm{up}} \ge 0$ (by clipping them at zero if a gradient descent update were to make them negative).
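The sketch below summarizes our reading of this batched procedure: intervals are formed by scaling the point prediction by the two tracked parameters, and the parameters are only updated once a new batch of reports arrives. The exact score and gradient step are assumptions consistent with the description above, not a verbatim transcription of the method in [40].

```python
# Batched quantile tracking: form intervals from scaled one-sided scores, and
# update the tracked parameters only at observation boundaries.
import numpy as np

def make_interval(yhat, q_lo, q_up):
    # Scaled scores => interval width proportional to the prediction itself.
    return yhat * (1 - q_lo), yhat * (1 + q_up)

def update_at_boundary(q_lo, q_up, yhats, ys, alpha, eta):
    """One gradient-descent-style update per batch of newly reported values
    (yhats, ys are arrays over the interim period; eta is the learning rate)."""
    lo, up = make_interval(np.asarray(yhats), q_lo, q_up)
    ys = np.asarray(ys)
    miss_lo = np.mean(ys < lo)   # one-sided coverage errors, lower endpoint
    miss_up = np.mean(ys > up)   # one-sided coverage errors, upper endpoint
    q_lo += eta * (miss_lo - alpha / 2)
    q_up += eta * (miss_up - alpha / 2)
    return max(q_lo, 0.0), max(q_up, 0.0)  # clip to prevent quantile crossing
```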

Results

This section examines the nowcasting and backcasting results in scenario 1 (monthly-update period), and scenario 2 (no-update period), through quantitative backcast error analysis and qualitative inspection of the nowcast dynamics over time. We then perform an ablation study, to tease apart how individual parts of the model relate to overall accuracy, and finish by analyzing the prediction intervals.

Scenario 1: backcast error analysis

Fig 3 shows the MAE of all backcasts, as a function of lag k = 0, …, 10, from the state-level, geo-pooled, and mixed models over the monthly-update period (which, recall, spans April 1, 2021 to November 30, 2021). To be clear, for each model, this is computed by averaging the absolute error of backcasts against finalized hospitalization counts, per state. The left panel displays the average of these state MAE values, as well as standard error bands. The right panel is similar, but normalizes the state MAE values by state population times $10^{-5}$, thus considering MAE on the scale of hospitalization rates (rather than counts).

Fig 3. MAE as a function of backcast lag for the state-level, geo-pooled, and mixed models in scenario 1, the monthly-update period.

The shaded regions show  ± 1 standard error bands, over the state MAE values.

https://doi.org/10.1371/journal.pcbi.1012717.g003

The geo-pooled model is clearly worse in terms of MAE than the mixed and state-level models. On the counts scale (left panel), the state-level model has slightly better MAE than the mixed model; on the rates scale (right panel), the opposite is true. This is because the mixed model generally performs better for smaller states, where predictions tend to be more volatile and shrinking toward the geo-pooled model helps to reduce variance. This is not reflected in the MAE plot on the left panel, since poor backcast performance on small states contributes little when measured in terms of counts.

In absolute terms, the backcasts made by the state-level and mixed models are generally quite accurate. For example, the nowcasts (backcasts at lag 0) made by the state-level and mixed models have MAEs of 0.428 and 0.405, respectively, on the scale of hospitalization rates. These correspond to proportions of variance explained (PVEs) of 71.6% and 75.4%, respectively. Even the geo-pooled model, which is quite a bit worse in relative terms, produces nowcasts with an MAE of 0.524 on the scale of hospitalization rates, which corresponds to a still respectable PVE of 61.1%.

Scenario 1: illustrative nowcast examples

We examine nowcasts made by the state-level and mixed models during the monthly-update period, in four states: California (CA), Kentucky (KY), Vermont (VT), and New York (NY). The first three are chosen to demonstrate the qualitative behavior of nowcasts in states of different sizes, with CA having the largest population, KY having roughly the median population, and VT having the smallest. NY is chosen because it represents somewhat of a failure case for the robustness of nowcasts from the state-level model.

Fig 4 displays state-level and mixed model nowcasts for CA, KY, and VT. To be clear, in each panel of the figure, the nowcasts use real-time inpatient and outpatient claims signals as predictive features, and the models behind these nowcasts are trained using hospitalization data up to the latest observation boundary (marked by dotted vertical lines). The shaded bars in the figure display finalized reported hospitalizations, the target of ultimate interest. In CA and KY, where the Delta wave is prominent (roughly July to November), we can see that both the state-level and mixed model nowcasts track the dynamics of the Delta wave. For example, looking at July 1, 2021 in CA, the hospitalization reports from the previous months (which is all that would have been available at that time) show no indication of an upswing to come. Still, the nowcasts during July present a clear upward trend, which means that policy-makers would know from these nowcasts that a wave is underway. As we see it, evaluating nowcasts for their qualitative shape (in comparison to hospitalization waves) is important—just as much as (if not more so than) numerical error analysis.

Fig 4. Nowcasts from the state-level and mixed models in scenario 1, the monthly-update period, for CA, KY, and VT.

The dotted vertical lines mark the observation boundaries—when hospitalization reports for the previous month are received, and models are retrained.

https://doi.org/10.1371/journal.pcbi.1012717.g004

Being much smaller, VT has on the order of ten reported COVID-19 hospitalizations daily, not several hundred, as in KY or CA. Accordingly, we can see in Fig 4 that the (finalized) reported hospitalizations curve in VT is much noisier—we do not really see a clear Delta wave (instead, an increasing but quite noisy trend from August through December). The claims signals are also noisier in a small state like VT, since they are ratios of small counts. Therefore, both the response and features are more volatile in the prediction problem for VT, and not surprisingly, the nowcasts here appear qualitatively worse. That said, the nowcasts still roughly track the gross trends in reported hospitalizations: higher in April and May, lower in June and July, and growing again from August onward.

Generally, the discussion above translates to the rest of the US: nowcasts tend to be in good qualitative agreement with hospitalization waves in larger states, and less so in smaller states. Sect C of S1 Appendix provides a full set of nowcast and backcast plots for all 50 states, from the mixed model, under scenario 1. Occasionally, we find that the trend in the predictions disagrees with that in hospitalization reports in a given month, but this gets corrected at the next observation boundary, once new training data is available and the regression model is refit. We also find that backcasts at lag 5 or 10 tend to be smoother than nowcasts.

Fig 5 shows state-level and mixed model nowcasts for NY. This is a notable example of a failure in robustness of the state-level model: its nowcasts for a good part of the month of June are actually negative (and we truncate them at zero for visualization purposes). This is due to a large, systematic revision which occurred on June 8 in the outpatient signal, where its values for reference dates in May were revised upward. Moreover, when the state-level model is fit to data through the end of May, the regression coefficient on the largest outpatient feature lag ends up being negative, and consequently the upward revision of past feature values on June 8 introduces a strong downward bias in the subsequent predictions. Supporting analysis for this explanation is given in Sect B of S1 Appendix.

Fig 5. Nowcasts from the state-level and mixed models in scenario 1, the monthly-update period, for NY.

The dotted vertical lines mark the observation boundaries, as in Fig 4.

https://doi.org/10.1371/journal.pcbi.1012717.g005

As we can see in the figure, the mixed model produces nowcasts that are much more stable in the month of June. However, the mixed model and even the geo-pooled model are themselves not immune to large and erratic revisions in the input features at prediction time. These revisions can result in erratic nowcasts and backcasts. To mitigate this, we could turn to models that fuse information across a wider variety of auxiliary signals (where ideally, some of these signals would have less severe backfill than claims-based signals), an idea that we return to in the discussion section.

Scenario 2: backcast error analysis

Fig 6 shows the MAE of all backcasts, as a function of lag k = 0, …, 10, from the state-level, geo-pooled, and mixed models over the no-update period (spanning December 1, 2021 to August 31, 2023). The format is as in Fig 3. We see two notable differences compared to the results in the monthly-update period. First, as expected, each model performs worse than it did in Fig 3. On the scale of hospitalization rates (right panel), the range of MAEs has jumped from (roughly) 0.4–0.53 in the monthly-update period to 0.62–0.72 in the no-update period, and correspondingly, the PVEs have dropped from (roughly) 61–75% to 10–37%.

Fig 6. MAE as a function of backcast lag for the state-level, geo-pooled, and mixed models in scenario 2, the no-update period.

The shaded regions show  ± 1 standard error bands, over the state MAE values.

https://doi.org/10.1371/journal.pcbi.1012717.g006

Second (and perhaps a bit more surprising), we see in the no-update period that the state-level model performs clearly worse than the geo-pooled and mixed models when MAE is measured either on the counts or rates scale. The mixed model MAE is somewhat better than the geo-pooled model MAE with respect to counts, while the two are basically the same with respect to rates. It is encouraging to see that the mixed model performs competitively across both monthly-update and no-update scenarios: this model—equipped with the CV procedure for tuning λ—is able to effectively adapt the strengths of local and global training approaches (underlying the state-level and geo-pooled models) to the task at hand, in order to yield accurate predictions in an average-case sense.

Scenario 2: illustrative nowcast examples

We examine nowcasts made by the geo-pooled and mixed models for CA, KY, and VT. NY, which served as a failure case in the monthly-update period, is not shown here, because KY itself provides such an example: as we will see, the mixed model lacks robustness over a part of the Omicron wave (meanwhile, the mixed model performs fine for NY throughout the no-update period, as shown in Sect B of S1 Appendix).

Fig 7 displays the nowcasts for CA, KY, and VT, with the same general format as in Fig 4. CA is a clear success case: both the geo-pooled and mixed models capture the dynamics of the Omicron wave faithfully, and also track the summer 2023 and winter 2023 waves, despite (recall) receiving no reported hospitalizations past December 1, 2021. VT, as in the monthly-update period, is a challenging case because it corresponds to a much noisier prediction problem, and the nowcasts here look qualitatively worse overall. However, the mixed model nowcasts still pick up the Omicron wave. KY represents a failure case: the mixed model nowcasts for all of February are negative, and are truncated at zero for the visualization. What this is actually demonstrating is a failure of the state-level model, and simultaneously, a failure of the tuning of the mixing parameter λ. The mixed model here ends up placing a large weight on state-level predictions (not shown), which are themselves volatile for reasons similar to what happens in NY during the monthly-update period—large revisions to the input features at prediction time.

Fig 7. Nowcasts from the geo-pooled and mixed models in scenario 2, the no-update period, for CA, KY, and VT.

The dotted vertical line (on the left side of the plot) marks the sole observation boundary in this period—hospitalization reports for all dates prior to this are available, but no reports are received after that.

https://doi.org/10.1371/journal.pcbi.1012717.g007

The volatility of the mixed model nowcasts in KY during Omicron provides an important perspective: in the MAE sense, we found in Fig 6 that the mixed model (with its CV tuning for λ) successfully navigates the strengths and weaknesses of the geo-pooled and state-level models; and yet for specific states and periods of time, the mixed model can still lack robustness. To be clear, such fragility is not limited to the no-update period, and it could have happened in NY in the monthly-update period, had the CV tuning procedure for λ not downweighted the state-level predictions so heavily. Improving robustness is a direction for future work, and we revisit this topic in the discussion section.

Lastly, in Sect D of S1 Appendix, we again provide a full set of nowcast and backcast plots in all 50 states, from the mixed model, under scenario 2. For larger states, we find that the nowcasts generally trace out the Omicron wave, but for smaller states, this happens less consistently and the nowcasts look more noisy. Backcasts at lag 5 or 10 tend to smooth out the nowcasts, though not dramatically.

Ablation study

To examine the importance of some of our modeling choices (as described in the methods section), we carry out an ablation study in which we remove a particular component of the model, use a simpler alternative in its place, and evaluate the result in terms of MAE. In particular, we consider four ablated models:

  1. Unweighted, all past: we fit the regression model (2) without observation weights, on all past reported hospitalization data available (considering summands s < t in (2)).
  2. Unweighted, two months: we fit the regression model (2) without observation weights, on the reported hospitalization data available for the latest two months (considering only summands s from the latest two months in (2)).
  3. Weighted, inpatient only: we fit the regression model (2) using only the lags of the inpatient signal.
  4. Weighted, outpatient only: we fit the regression model (2) using only the lags of the outpatient signal.

Tables 1 and 2 compare the performance of the state-level model to these ablated models, for scenarios 1 and 2, respectively. In each scenario, for each model, we compute its MAE over the range of time values in the scenario, per backcast lag k = 0, …, 10 and state, then we report the average and standard error of these MAE values across all backcast lags and states. The state-level and inpatient-only models perform the best overall, and make up the two best models in either scenario. Meanwhile, the all-past model is slightly worse and the two-month model is significantly worse, which emphasizes the importance of decaying weights (and CV tuning) in the original model design. The outpatient-only model is competitive in scenario 1 but not in scenario 2. This could be a reflection of distribution drift in the relationship between the outpatient signal and hospitalization reports post-Omicron, which in scenario 2 is compounded by our inability to retrain the regression model in order to account for such drift.

Table 1. MAE results from the state-level and ablated models, averaged over all backcast lags and all states, in scenario 1.

https://doi.org/10.1371/journal.pone.0313772.t001

Table 2. MAE results from the state-level and ablated models, averaged over all backcast lags and all states, in scenario 2.

https://doi.org/10.1371/journal.pone.0313772.t002

Prediction interval analysis

We now examine the performance of the quantile tracker, applied to the state-level model in scenario 1. To gain an understanding of how quantile tracking performs against off-the-shelf alternatives, we consider:

  • parametric intervals: we apply the standard method for obtaining prediction intervals associated with a well-specified linear regression model with independent Gaussian errors;
  • sample quantiles: we obtain intervals by simply computing $q^{\mathrm{lo}}$ and $q^{\mathrm{up}}$ as sample quantiles of all past scores (in lieu of quantile tracking), where the empirical distribution of past scores is weighted with the same decaying weights that are used to train the regression model.

We evaluate all methods in terms of both the coverage of their prediction intervals, and the interval score (a metric that combines coverage and sharpness), which for a predicted interval  [ l , u ]  at the nominal level 1 − α, and target value y, is defined as

$\mathrm{IS}_{\alpha}(l, u;\, y) = (u - l) + \frac{2}{\alpha}\, (l - y)\, \mathbb{1}\{y < l\} + \frac{2}{\alpha}\, (y - u)\, \mathbb{1}\{y > u\}.$
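For concreteness, a direct implementation of this interval score (assuming the standard definition above) is:

```python
# Interval score for a central prediction interval [lower, upper] at nominal
# level 1 - alpha: width plus penalties for the target falling outside.
def interval_score(lower: float, upper: float, y: float, alpha: float) -> float:
    width = upper - lower
    below = (2 / alpha) * max(lower - y, 0.0)  # penalty if y < lower
    above = (2 / alpha) * max(y - upper, 0.0)  # penalty if y > upper
    return width + below + above
```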

Fig 8 reports interval score and coverage averaged over all dates in scenario 1, at the nominal coverage levels 0.6 and 0.8. (The setup is the same as that in Fig 3, except with either interval score or coverage replacing absolute error.) Quantile tracking clearly dominates the other methods, and it is the only method achieving anywhere close to the nominal coverage level.

Fig 8. Interval score and coverage as a function of backcast lag for quantile tracking, parametric (linear regression) modeling, and sample quantiles in scenario 1, when the nominal coverage level is 0.6 (first row) and 0.8 (second row).

The shaded regions show  ± 1 standard error bands, over the state values.

https://doi.org/10.1371/journal.pcbi.1012717.g008

Fig 9 visualizes the intervals formed by quantile tracking and the two alternative methods in CA, KY, and VT. The nominal coverage level is 0.8. We can see that quantile tracking generally accounts for uncertainty much better than the other methods, avoiding exceedingly narrow intervals.

Fig 9. Nowcasts and prediction intervals from quantile tracking, parametric (linear regression) modeling, and sample quantiles in scenario 1, for CA, KY, and VT.

https://doi.org/10.1371/journal.pcbi.1012717.g009

Discussion

This paper demonstrates that relatively simple regression models and medical insurance claims data can be used to provide accurate real-time estimates of reported COVID-19 hospitalizations, in different hypothetical scenarios in which hospitalization reporting is dramatically reduced in frequency, or shut down entirely. Of course, we are not advocating for nowcasts from claims data to replace traditional public health surveillance. However, leveraging statistical models which use auxiliary data for nowcasting, so that we may then reduce reporting frequency and thus reduce the burden that reporting entails, may be a favorable tradeoff for public health agencies to consider.

We now reflect on some aspects of the modeling in this paper. First, the working linear models in the methods section could have been expressed equivalently using an appropriate probabilistic framing. For example, if we ignore versioning for simplicity, we can of course recast the regression in (2) used to make predictions at time t as the maximum likelihood estimator in the model:

$Y_s \,\big|\, \{I_{s-j}\}_{j \in J_I}, \{O_{s-j}\}_{j \in J_O} \;\sim\; N\Big( \alpha + \sum_{j \in J_I} \beta_j I_{s-j} + \sum_{j \in J_O} \delta_j O_{s-j},\; \sigma^2_s \Big),$   (6)

where $N(a, v)$ denotes the normal distribution with mean a and variance v, and where $\sigma^2_s = \sigma^2 / w_s$. Specifying distributional assumptions explicitly, as done in (6), becomes particularly important if one relies on the model for problems of inference. Our focus in this paper was primarily on prediction, evaluated via pure prospective predictive accuracy. However, recall, we did touch on inference by investigating prediction intervals, and for this, we found that the parametric prediction intervals generated by the Gaussian linear model (6) performed significantly worse according to coverage and interval score (see Figs 8 and 9) compared to the nonparametric intervals obtained using quantile tracking. For this reason, and because of our focus on prediction more generally, we chose to present our models in the methods section from the perspective of optimization rather than from a probabilistic framing.

The observation weights in (3), used to deal with nonstationarity in the relationship between $Y_s$ and the signals $I_{s-j}$ and $O_{s-j}$ as s varies, cause the variance in the equivalent probabilistic model (6) to diverge as s moves back into the past. Traditional time series models handle nonstationarity in a different manner. For example, in ARIMA, we would model the differences (of a given order) of the series as a stationary process. This accommodates certain forms of nonstationarity, depending on the order d of the difference, i.e., d = 1 accommodates a linear drift, d = 2 a quadratic drift, and so on. Meanwhile, the model (2), or equivalently (6), allows for a more general form of nonstationarity, albeit one that varies smoothly in time (the exponential training weights can be viewed as a softer version of using a trailing training window in (2), or limiting the number of samples in (6)). The ablation studies, recall, exposed the importance of these observation weights (with CV tuning for the decay parameter). It would be interesting to connect the use of these weights to a more traditional time series method. At present, the precise connection is unclear to us, and this may be a useful direction for future work.

In terms of smoothing, the geo-pooled (4) and mixed models (5) were simple modifications of the basic regression model (2), based on pooling the training data across different locations. The literature on spatial modeling offers various alternatives for richer models which leverage spatial dependence more intelligently (note that pooling does not leverage this dependence at all). Unfortunately, the resolution (and data) we have available in our study is too coarse to support this type of model. Epidemics are driven by much more local dynamics, and at the local level, further issues like political dynamics [41] and especially social determinants of health [42] likely become important factors as well in terms of modeling and leveraging dependence.

Transitioning to data sources, the medical claims signals used in our work have been available, in real-time, for essentially the entire COVID-19 pandemic. In general, medical insurance claims cover a wide range of health conditions, and COVID-19 is not the only worthwhile target for nowcasting systems built on top of claims data. As an example, nowcasts of influenza hospitalizations would also be useful to public health decision-makers, and could be generated via analogous regression models using proxy claims signals. Whether an approach like ours could be used operationally in a future pandemic is a complex issue, which depends on many factors (including data availability). That said, one could argue that the methods which tend to be the most reliable in times of emergency are the ones which have been developed and tested ahead of time. This motivates us to continue working on proof-of-concept systems like the one in this paper.

In a different vein, medical insurance claims are certainly not the only relevant auxiliary data stream for tracking COVID-19 hospitalizations, and this analysis may be repeated with any number of other auxiliary signals. In operational systems in public health, robustness is (arguably) more important than average-case performance, and failure examples of the proposed nowcasting methods, as seen in the results section (NY in scenario 1, and KY in scenario 2) would likely be concerning to public health decision-makers. These failure examples were driven by large revisions to the claims signals, occurring at certain points in time. Any model which makes predictions based on a linear combination of a set of features can suffer from erratic behavior if these features are subject to large fluctuations at prediction time. Combining multiple nowcasts built from different auxiliary signals is a way to improve robustness, especially when some of the signals are more stable and less subject to heavy revisions. Signals derived from electronic medical records (EMR), medical device data, and internet search queries all typically have less backfill compared to insurance claims signals.

Model combination methods—which also go by names such as aggregation, ensembling, or fusion—have been quite successful in influenza nowcasting systems in pre-pandemic years (e.g., [20–23]). Ensembles of COVID-19 forecasts have likewise demonstrated considerable robustness compared to the constituent forecasters (e.g., [43,44]). An important direction for future work is to incorporate similar ideas into the nowcasting settings considered in this paper, in an effort to move toward more reliable and robust systems.

Supporting information

S1 Appendix. The file contains additional details on claims signals and an exhaustive list of backcasts produced.

https://doi.org/10.1371/journal.pcbi.1012717.s001

(PDF)

Acknowledgments

We would like to thank members of the Delphi research group for valuable feedback, and Change Healthcare and Optum/United Health Group for their invaluable data partnership and their collaboration.

References

  1. United States Department of Health and Human Services. COVID-19 guidance for hospital reporting and FAQs for hospitals, hospital laboratory, and acute care facility data reporting [Internet]. 2023 [cited 2023 Aug 01]. Available from: https://healthwatchusa.org/downloads/20230611-covid-19-faqs-hospitals-hospital-laboratory-acute-care-facility-pdf
  2. Nickels TP. RE: CMS-3401-IFC [Internet]. 2020 [cited 2023 Aug 01]. Available from: https://www.aha.org/system/files/media/file/2020/11/aha-comment-cms-aug-25-interim-final-rule-on-covid-19-data-reporting-lett.pdf.
  3. Panczak R, von Wyl V, Reich O, Luta X, Maessen M, Stuck AE, et al. Death at no cost? Persons with no health insurance claims in the last year of life in Switzerland. BMC Health Serv Res. 2018;18(1):317.
  4. Li S, Yang Y. An empirical study on the influence of the basic medical insurance for urban and rural residents on family financial asset allocation. Front Public Health. 2021;9:631304.
  5. Sakai M, Ohtera S, Iwao T, Neff Y, Kato G, Takahashi Y, et al. Validation of claims data to identify death among aged persons utilizing enrollment data from health insurance unions. Environ Health Prev Med. 2019;24(1):63. pmid:31759388
  6. Zheng L, Peng L. Effect of major illness insurance on vulnerability to poverty: evidence from China. Front Public Health. 2021;9:671035.
  7. Durizzo K, Harttgen K, Tediosi F, Sahu M, Kuwawenaruwa A, Salari P, et al. Toward mandatory health insurance in low-income countries? An analysis of claims data in Tanzania. Health Econ. 2022;31(10):2187–207. pmid:35933731
  8. Mori T, Komiyama J, Fujii T, Sanuki M, Kume K, Kato G, et al. Medical expenditures for fragility hip fracture in Japan: a study using the nationwide health insurance claims database. Arch Osteoporos. 2022;17(1):57.
  9. Nakayama T, Imanaka Y, Okuno Y, Kato G, Kuroda T, Goto R, et al. Analysis of the evidence-practice gap to facilitate proper medical care for the elderly: investigation, using databases, of utilization measures for National Database of Health Insurance Claims and Specific Health Checkups of Japan (NDB). Environ Health Prev Med. 2017;22(1):51. pmid:29165139
  10. Jung YS, Kim YE, Go DS, Munkhzul R, Jung J, Yoon SJ. Associations between private health insurance and medical care utilization for musculoskeletal disorders: using the Korea health panel survey data for 2014 to 2015. Inquiry. 2020;57:004695802098146.
  11. Geng J, Chen X, Shi J, Bao H, Chen Q, Yu H. Assessment of the satisfaction with public health insurance programs by patients with chronic diseases in China: a structural equation modeling approach. BMC Public Health. 2021;21(1):1736.
  12. Yao Q, Li H, Liu C. Use of social health insurance for hospital care by internal migrants in China: evidence from the 2018 China migrants dynamic survey. Front Public Health. 2022;10:838983.
  13. Song SO, Han E, Son KJ, Cha BS, Lee BW. Age at mortality in patients with type 2 diabetes who underwent kidney transplantation: an analysis of data from the Korean national health insurance and statistical information service, 2006 to 2018. J Clin Med. 2023;12(9):3124.
  14. Reinhart A, Brooks L, Jahja M, Rumack A, Tang J, Agrawal S, et al. An open repository of real-time COVID-19 indicators. Proc Natl Acad Sci U S A. 2021;118(51):e2111452118. pmid:34903654
  15. McDonald DJ, Bien J, Green A, Hu AJ, DeFries N, Hyun S, et al. Can auxiliary indicators improve COVID-19 forecasting and hotspot prediction? Proc Natl Acad Sci U S A. 2021;118(51):e2111453118. pmid:34903655
  16. Rosenfeld R, Tibshirani RJ. Epidemic tracking and forecasting: lessons learned from a tumultuous year. Proc Natl Acad Sci U S A. 2021;118(51):e2111456118. pmid:34903658
  17. Viboud C, Charu V, Olson D, Ballesteros S, Gog J, Khan F, et al. Demonstrating the use of high-volume electronic medical claims data to monitor local and regional influenza activity in the US. PLoS One. 2014;9(7):e102429. pmid:25072598
  18. Smolinski MS, Crawley AW, Baltrusaitis K, Chunara R, Olsen JM, Wójcik O, et al. Flu near you: crowdsourced symptom reporting spanning 2 influenza seasons. Am J Public Health. 2015;105(10):2124–30. pmid:26270299
  19. Yang S, Santillana M, Kou SC. Accurate estimation of influenza epidemics using Google search data via ARGO. Proc Natl Acad Sci U S A. 2015;112(47):14473–8. pmid:26553980
  20. Farrow DC. Modeling the past, present, and future of influenza [dissertation]. Pittsburgh, PA: Carnegie Mellon University; 2016.
  21. Santillana M, Nguyen AT, Louie T, Zink A, Gray J, Sung I. Cloud-based electronic health records for real-time, region-specific influenza surveillance. Sci Rep. 2016;6(1):1–8.
  22. Bastos LS, Economou T, Gomes MFC, Villela DAM, Coelho FC, Cruz OG, et al. A modelling approach for correcting reporting delays in disease surveillance data. Stat Med. 2019;38(22):4363–77. pmid:31292995
  23. Jahja M, Farrow DC, Rosenfeld R, Tibshirani RJ. Kalman filter, sensor fusion, and constrained regression: equivalences and insights. In: Wallach HM, Larochelle H, Beygelzimer A, d'Alche-Buc F, Fox EB, Garnett R, editors. Advances in neural information processing systems. Vol. 32; 2019 Dec 8–14; Vancouver, Canada. Red Hook, NY: Curran Associates; 2019. p. 13166–75.
  24. Yang C-Y, Chen R-J, Chou W-L, Lee Y-J, Lo Y-S. An integrated influenza surveillance framework based on national influenza-like illness incidence and multiple hospital electronic medical records for early prediction of influenza epidemics: design and evaluation. J Med Internet Res. 2019;21(2):e12341. pmid:30707099
  25. Ackley SF, Pilewski S, Petrovic VS, Worden L, Murray E, Porco TC. Assessing the utility of a smart thermometer and mobile application as a surveillance tool for influenza and influenza-like illness. Health Informatics J. 2020;26(3):2148–58. pmid:31969046
  26. 26. Brooks LC. Pancasting: forecasting epidemics from provisional data [dissertation]. Pittsburgh, PA: Carnegie Mellon University; 2020.
  27. 27. Leuba SI, Yaesoubi R, Antillon M, Cohen T, Zimmer C. Tracking and predicting U.S. influenza activity with a real-time surveillance network. PLoS Comput Biol. 2020;16(11):e1008180. pmid:33137088
  28. 28. Radin JM, Wineinger NE, Topol EJ, Steinhubl SR. Harnessing wearable device data to improve state-level real-time surveillance of influenza-like illness in the USA: a population-based study. Lancet Digit Health 2020;2(2):e85–93. pmid:33334565
  29. 29. Jahja M, Chin A, Tibshirani RJ. Real-time estimation of COVID-19 infections: deconvolution and sensor fusion. Statist Sci. 2021;37(2):207–28.
  30. 30. Li T, White LF. Bayesian back-calculation and nowcasting for line list data during the COVID-19 pandemic. PLoS Comput Biol 2021;17(7):e1009210. pmid:34252078
  31. 31. Menkir TF, Cox H, Poirier C, Saul M, Jones-Weekes S, Clementson C, et al. A nowcasting framework for correcting for reporting delays in malaria surveillance. PLoS Comput Biol 2021;17(11):e1009570. pmid:34784353
  32. 32. Bergström F, Günther F, Höhle M, Britton T. Bayesian nowcasting with leading indicators applied to COVID-19 fatalities in Sweden. PLoS Comput Biol 2022;18(12):e1010767. pmid:36477048
  33. 33. Miller AC, Hannah LA, Futoma J, Foti NJ, Fox EB, D’Amour A, et al. Statistical deconvolution for inference of infection time series. Epidemiology 2022;33(4):470–9. pmid:35545230
  34. 34. Seaman SR, Samartsidis P, Kall M, De Angelis D. Nowcasting COVID-19 deaths in England by age and region. J R Stat Soc C. 2022;71(5):1266–81.
  35. 35. De Salazar PM, Lu F, Hay JA, Gómez-Barroso D, Fernández-Navarro P, Martínez EV, et al. Near real-time surveillance of the SARS-CoV-2 epidemic with incomplete data. PLoS Comput Biol 2022;18(3):e1009964. pmid:35358171
  36. 36. Wolffram D, Abbott S, An der Heiden M, Funk S, Günther F, Hailer D, et al. Collaborative nowcasting of COVID-19 hospitalization incidences in Germany. PLoS Comput Biol 2023;19(8):e1011394. pmid:37566642
  37. 37. Lison A, Abbott S, Huisman J, Stadler T. Generative Bayesian modeling to nowcast the effective reproduction number from line list data with missing symptom onset dates. PLoS Comput Biol 2024;20(4):e1012021. pmid:38626217
  38. 38. Farrow DC, Brooks LC, Tibshirani RJ, Rosenfield R. Delphi Epidata API [Internet]. 2015 [cited 2023 Aug 01]. Available from: https://github.com/cmu-delphi/delphi-epidata
  39. 39. Hyndman RJ, Athanasopoulos G. Forecasting: principles and practice. 3rd ed. Melbourne: OTexts; 2021.
  40. 40. Angelopoulos A, Candes EJ, Tibshirani RJ. Conformal PID control for time series prediction. In: Oh A, Naumann T, Globerson A, Saenko K, Hardt M, Levine S, editors. Advances in Neural Information Processing Systems 36; 2023 Dec 10–16; New Orleans, USA. Red Hook, NY: Curran Associates; 2023.
  41. 41. Jones B, Kiley J. The changing geography of COVID-19 in the U.S. [Internet]. Washington, DC: Pew Research Center; 2020 [cited 2023 Aug 01]. Available from: https://www.pewresearch.org/politics/wp-content/uploads/sites/4/2020/12/PP_2020.12.08_COVID-19-Deaths-Geography_Data-Essay.pdf
  42. 42. Khan SS, Krefman AE, McCabe ME, Petito LC, Yang X, Kershaw KN, et al. Association between county-level risk groups and COVID-19 outcomes in the United States: a socioecological study. BMC Public Health 2022;22(1):81. pmid:35027022
  43. 43. Cramer EY, Ray EL, Lopez VK, Bracher J, Brennen A, Castro Rivadeneira AJ, et al. Evaluation of individual and ensemble probabilistic forecasts of COVID-19 mortality in the United States. Proc Natl Acad Sci U S A 2022;119(15):e2113561119. pmid:35394862
  44. 44. Ray EL, Brooks LC, Bien J, Biggerstaff M, Bosse NI, Bracher J, et al. Comparing trained and untrained probabilistic ensemble forecasts of COVID-19 cases and deaths in the United States. Int J Forecast 2023;39(3):1366–83. pmid:35791416