What’s new?
Today’s announcement of 356 new local cases in NSW was the highest so far in this outbreak. The recent daily case counts have been bouncing around somewhat, but the model projections remain largely stable. Both models are performing well, with low error in their 7-day projections. The projections of today’s cumulative number of local cases, made one week ago by the Gompertz and Richards’ growth models, underestimated the actual count by only 5% and 3%, respectively.
The model projections suggest that the outbreak is continuing to grow, with the rate of growth increasing. We are likely to see around 2,000 to 2,500 new cases over the coming week. The current lockdown conditions in Greater Sydney have been extended to the end of August 2021, but it is implausible that the outbreak will have reached a stage where these conditions can be lifted by then. That would require a major improvement in the transmission dynamics of the outbreak. Worryingly, it is difficult to see how this will occur with the current policy approach.
From 8 August 2021, I have completely changed how the code computes the confidence intervals and confidence bands for the projections in the new daily cases charts. The details are in the code comments, reproduced here:
# Projections of NEW daily case counts based on the Gompertz/Richards' model
# Note: This is effectively the first derivative of the model.
# Technically, this could be calculated exactly, as there is an analytic
# solution for the first derivative of the Gompertz equation. However, I don't know
# how to do this or implement it in R. Prior to 7 August 2021, I estimated this by
# fitting a cubic spline to the Gompertz/Richards' model estimate, which then
# easily allowed for an estimated first derivative. However, I didn't have a
# reasonable way to calculate the confidence intervals, so resorted to a hack:
# I used the confidence interval of the model's estimate as the CI of the
# derivative, assuming that this would be an upper bound. Clearly this was
# unsatisfactory.
#
# The following is the new logic I used to create the estimate for the number of NEW
# daily cases, and the 95% confidence interval/distribution for that estimate. This is
# effectively a parametric bootstrap (Monte Carlo simulation) method.
#
# For each day's TOTAL case number projection in the fitted Gompertz/Richards' model, I use
# the standard error of that fitted value to generate random values from a normal
# distribution, 1,000,000 values for each day. I then subtract the prior day's list of
# values from the current day's list, creating a list of differences between the days.
# This distribution of values is likely to be overly conservative (too wide), because
# in reality the total number of cases can never be less than the previous day's.
#
# The median (middle value) of this list of differences is the NEW case estimate as
# directly computed from the model. Specific percentiles of the distribution can be
# interpreted as confidence limits. For instance, the 95% confidence interval is the
# interval between the 2.5th and 97.5th percentiles.
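As a rough illustration of the simulation described in these comments, here is a minimal Python sketch (the author's own code is in R and is not reproduced here). It assumes each day's fitted TOTAL is normally distributed around the model estimate with that day's standard error, and that the two days are drawn independently. The function name `new_case_interval` and the example numbers are hypothetical, and the default of 100,000 draws (rather than 1,000,000) is only to keep the sketch quick. In practice, fitted values on consecutive days are strongly positively correlated, so drawing them independently inflates the spread of the differences; this is one reason the resulting interval tends to be wide.

```python
import random
import statistics


def new_case_interval(total_today, se_today, total_prev, se_prev,
                      n_draws=100_000, seed=1):
    """Monte Carlo estimate of NEW cases between two consecutive days.

    Hypothetical helper: each day's fitted TOTAL is treated as normally
    distributed with its standard error, and the two days are drawn
    independently. Returns (median, 2.5th percentile, 97.5th percentile).
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    # Difference between one draw for today and one draw for the prior day
    diffs = [
        rng.gauss(total_today, se_today) - rng.gauss(total_prev, se_prev)
        for _ in range(n_draws)
    ]
    median = statistics.median(diffs)
    # n=40 gives cut points at 2.5%, 5%, ..., 97.5%; keep the outermost two
    cuts = statistics.quantiles(diffs, n=40)
    return median, cuts[0], cuts[-1]


# Example with hypothetical fitted totals and standard errors
median, lower, upper = new_case_interval(5000, 120, 4650, 115)
print(round(median), round(lower), round(upper))
```

With these hypothetical inputs, the median is close to 350 new cases, and the 95% interval runs from roughly 25 to 675.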
Context and timeline of the Sydney 2021 outbreak
I started these analyses in early July with the new COVID-19 (Delta variant) outbreak in Sydney, NSW. The purpose of these charts is to provide projections into the near future, which will hopefully allow for some data-driven expectations. The Sydney 2021 series starts on 12 July 2021 and is available here: https://vitualis.com/?page_id=4071
The current Sydney outbreak started on 17 June 2021, the first day with reports of community cases in Bondi. The NSW Government formally commenced stay-at-home orders (“lockdown”) for a number of inner-Sydney regions at 2359 on 25 June 2021, having announced this earlier in the day. This was then broadened 18 hours later to cover the whole of Greater Sydney. The initial plan was a two-week lockdown, potentially ending on 9 July 2021.
By 9 July 2021, there were early signs that COVID-19 transmission was worsening. On 15 July 2021, the lockdown was again extended until the end of July, and on 28 July 2021 it was extended once more, this time by four weeks. By the end of the month, conditions for people living in South Western and Western Sydney included a 5 km limit on travel from home, masks at all times outside, and only essential workers allowed to leave their local government area, subject to a COVID-19 swab every 72 hours.
Projections of new daily cases and cumulative counts of COVID-19, with data up to 10 August 2021
What is this?
The top image is a chart of the cumulative (total) COVID-19 cases in NSW, starting from 17 June 2021, and the lower image is a chart of the daily new cases. Only local cases are included (i.e., cases identified in quarantine are excluded). Projections are given for the next 7 days. Note that the estimates are highly uncertain beyond a few days and must be interpreted cautiously.
The projections are made by fitting a Gompertz curve to the cumulative case data since 17 June 2021 using non-linear regression. The dark central dashed lines are the model estimates, shown with their 95% confidence intervals. On the lower chart, the blue gradations indicate the degree of uncertainty in the model projections.
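Under a normal approximation, a 95% confidence interval is the estimate plus or minus about 1.96 standard errors. The Python sketch below shows that calculation. It assumes the fitted value and its standard error are already available from the non-linear regression; the function name `normal_ci` and the example values are hypothetical and not taken from the author's R code.

```python
from statistics import NormalDist


def normal_ci(estimate, se, level=0.95):
    """Two-sided confidence interval assuming a normal sampling distribution.

    estimate: fitted value (e.g. a projected cumulative case count)
    se: standard error of that fitted value
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level=0.95
    return estimate - z * se, estimate + z * se


# Hypothetical fitted total of 1,000 cases with a standard error of 50
print(normal_ci(1000, 50))
```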
Gompertz and Richards’ growth curves
The Gompertz function is a type of sigmoid, or “S”-shaped, curve. It has been around since the early 19th century and was initially used to describe and model demographic mortality curves; hence, it is well known to actuaries. The Gompertz function can also be used to model biological growth accurately (e.g., epidemics, tumour size, enzymatic reactions). I chose this model because, earlier in the pandemic, it was found to be useful in modelling cumulative COVID-19 cases from the Chinese outbreaks (Jia et al. arXiv:2003.05447v2 [q-bio.PE]).
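For reference, one common parameterisation of the Gompertz curve for cumulative cases N(t) is shown below, with K the final outbreak size (upper asymptote), b a displacement parameter and c a growth-rate parameter. These symbols are assumptions for illustration; the parameterisation in the author's R code may differ. Differentiating gives the expected number of new cases per unit time, which is the analytic first derivative mentioned in the code comments above.

```latex
N(t) = K \exp\!\left(-b\, e^{-c t}\right), \qquad K,\ b,\ c > 0

\frac{dN}{dt} = b\, c\, e^{-c t}\, K \exp\!\left(-b\, e^{-c t}\right)
             = c\, N(t) \ln\!\frac{K}{N(t)}
```

The second form follows because \ln(K/N(t)) = b e^{-ct}, and it shows that daily growth is fastest when N(t) = K/e, roughly 37% of the final size.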
I had previously used the Richards’ growth curve (or generalised logistic function), a broad family of sigmoid (S-shaped) curves that can describe many types of growth well, including epidemics. It has been shown to be useful in modelling COVID-19 outbreaks in 2020 (Lee et al. PLoS One 2020 doi: 10.1371/journal.pone.0236860).
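Similarly, one common parameterisation of the Richards’ curve is sketched below, with K the final size, r a growth-rate parameter, t_m a time-shift parameter and ν > 0 a shape parameter (symbols assumed for illustration, not taken from the author's code). Setting ν = 1 gives the ordinary logistic curve, and letting ν → 0 recovers a Gompertz curve.

```latex
N(t) = K \left[ 1 + \nu\, e^{-r (t - t_m)} \right]^{-1/\nu}, \qquad \nu > 0

\nu = 1: \quad N(t) = \frac{K}{1 + e^{-r (t - t_m)}}

\nu \to 0: \quad N(t) \to K \exp\!\left( -e^{-r (t - t_m)} \right)
```

The limit uses (1 + \nu x)^{-1/\nu} \to e^{-x} as \nu \to 0.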
Why the change?
It was clear that neither model was fitting the data well, but the Richards’ curve model was performing especially poorly, with obviously implausible estimates. I have assessed the predictive error of both the Gompertz and Richards’ growth curve models. These charts compare the models’ 7- and 14-day total case projections with what actually occurred 7 and 14 days later. For interpretation, a point above the 0% error line means that the model over-estimated compared with reality, and a point below the line means it under-estimated.
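The percentage error in these charts can be expressed as 100 × (projected − actual) / actual, so that over-estimates are positive and under-estimates negative. A minimal Python sketch follows; the exact definition behind the charts is not stated, so this relative-to-actual form is an assumption, and the function name and numbers are hypothetical.

```python
def projection_error_pct(projected, actual):
    """Percentage error of a projection relative to what actually occurred.

    Positive values are over-estimates (above the 0% line);
    negative values are under-estimates (below the 0% line).
    """
    if actual == 0:
        raise ValueError("actual count must be non-zero")
    return 100.0 * (projected - actual) / actual


# Hypothetical example: projected 9,500 total cases, 10,000 actually reported
print(projection_error_pct(9500, 10000))  # -5.0
```

A projection of 9,500 against an actual 10,000 is therefore a 5% underestimate, the same kind of figure quoted for the latest 7-day projections.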
In early July, both models were providing substantial underestimates, which is what I suspected when I started this series. The Richards’ growth curve model has struggled to fit the data and has poor predictive accuracy; it is entirely useless at 14 days. Recently, both models have been doing quite well with the 7-day estimates, though the 14-day projections appear to be substantial underestimates. The Richards’ growth model described the Melbourne 2020 outbreak well, especially when I applied it after the daily number of new cases had peaked. However, applying the model retrospectively to the early phase of that outbreak showed a similar pattern of poor performance.
Daily case trends
Comparison between the Gompertz and Richards’ growth curve model projections, along with smoothed data trends (7-day simple moving average and GAM), with data up to 10 August 2021
The generalised additive model (GAM) gives a descriptive “reality check” for the models; it can be considered an advanced smoothed trend of the daily counts. My interpretation is that the initial lockdown did reduce the growth rate in cases, but did not reverse the trend. In early July 2021, we see a sudden increase in the rate of growth, which reflects the outbreak and community transmission in South Western Sydney.
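A 7-day simple moving average, one of the smoothed trends in the comparison chart, can be sketched in a few lines of Python. This assumes a trailing window (each value averages that day and the six days before it); whether the charts use a trailing or centred window is not stated. The GAM smoother is not reproduced here, as it is normally fitted with a dedicated library (in R, for example, the mgcv package).

```python
def moving_average(values, window=7):
    """Trailing simple moving average of daily counts.

    Each output is the mean of that day and the previous window-1 days;
    days without a full window yet are returned as None.
    """
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window:i + 1]) / window)
    return out


print(moving_average([1, 2, 3, 4, 5, 6, 7, 8]))
# [None, None, None, None, None, None, 4.0, 5.0]
```

During steady growth, a trailing 7-day average lags the underlying trend by roughly three days.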
Want to know more?
The primary data source for daily new cases is NSW Health. The analysis is performed in RStudio Cloud using R version 4.1.0.