I restarted these analyses in early July 2020 with the onset of the new Melbourne outbreak. The logic behind these charts is that they fill an information gap: official data sources only give historical data series, and mainstream media typically only give near-term predictions based on opinion.
Chart update 11 August 2020
What’s new?
Today’s number of new cases, 353 as per the Australian Government Department of Health on 11/8/2020, is again substantially lower than the model projected. In my view, we have definitely crossed the peak. Furthermore, I feel reasonably confident that transmission control has improved, likely through a combination of the stage 4 restrictions (which commenced on 2 August 2020) and earlier policies (such as mandatory face coverings). The model has consistently over-estimated case counts in the past week, which is what we would expect to see if there is a change (for the better) in transmission dynamics. As such, it is very plausible that the 7- and 14-day projections given by the model are much higher than what will eventuate.
My experience with this model from March 2020 was that projections made early in the epidemic tend to underestimate slightly in the short term (days) and overestimate in the longer term (weeks). This bias is something to keep in mind.
Projection of new daily cases of COVID-19 with data up to 11 August 2020
What is this?
The image is a chart of the confirmed daily new cases of COVID-19 in Australia, with a projection for the next 2 weeks. The projection is made by fitting the data since 1 June 2020 to a Gompertz equation using non-linear regression. The dark blue dashed line is the model estimate. The grey dashed lines are the 95% prediction intervals, with the values given at 7 and 14 days into the future. The blue gradations can be understood as the degree of uncertainty in the model projections.
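To make the fitting step more concrete, here is a rough sketch in Python (not the R code behind the charts). It is a crude stand-in for non-linear regression: for each candidate final outbreak size, the Gompertz curve becomes a straight line after a double-log transform and can be fitted by ordinary least squares, and the candidate with the smallest squared error is kept. The function names, the parameter names (k, b, c), the choice to fit cumulative counts and the synthetic data are all assumptions made for illustration, and the sketch does not compute prediction intervals.

```python
import math

def gompertz(t, k, b, c):
    """Cumulative cases on day t; k is the final outbreak size."""
    return k * math.exp(-b * math.exp(-c * t))

def fit_b_c(days, cumulative, k):
    """With k fixed, ln(ln(k / C)) = ln(b) - c * t is linear in t, so use least squares."""
    ys = [math.log(math.log(k / y)) for y in cumulative]
    n = len(days)
    mean_t = sum(days) / n
    mean_y = sum(ys) / n
    sxx = sum((t - mean_t) ** 2 for t in days)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(days, ys))
    slope = sxy / sxx
    return math.exp(mean_y - slope * mean_t), -slope  # (b, c)

def fit_gompertz(days, cumulative, k_candidates):
    """Try each candidate k and keep the fit with the smallest squared error."""
    best = None
    for k in k_candidates:
        if k <= max(cumulative):
            continue  # the log transform needs k above every observation
        b, c = fit_b_c(days, cumulative, k)
        sse = sum((gompertz(t, k, b, c) - y) ** 2 for t, y in zip(days, cumulative))
        if best is None or sse < best[0]:
            best = (sse, k, b, c)
    if best is None:
        raise ValueError("no candidate k exceeds the observed cumulative cases")
    return best[1], best[2], best[3]

# Synthetic, noise-free data from made-up parameters (not the real case series).
days = list(range(41))
observed = [gompertz(t, 20000.0, 30.0, 0.07) for t in days]

k, b, c = fit_gompertz(days, observed, range(5000, 40001, 1000))
last_day = days[-1]
for horizon in (7, 14):
    t = last_day + horizon
    # Daily new cases are the one-day change in the fitted cumulative curve.
    projected_new = gompertz(t, k, b, c) - gompertz(t - 1, k, b, c)
    print(f"day +{horizon}: about {projected_new:.0f} new cases")
```

With real, noisy data a proper non-linear least squares routine would be the better choice, since it also provides the uncertainty estimates needed for prediction intervals.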
“Gompertz” equation?
The Gompertz function is a type of sigmoid, or “S”-shaped curve. It has been around since the early 19th century and was initially used to describe and model demographic mortality curves, so it is well known to actuaries. The Gompertz function can also be used to model biological growth (e.g., epidemics, tumour size, enzymatic reactions). I chose this model because, earlier in the pandemic, it was found to be useful in modelling cumulative cases of COVID-19 from the Chinese outbreaks (Jia et al. arXiv:2003.05447v2 [q-bio.PE]). My experience from the initial outbreak earlier in the year was that this equation gave reasonable descriptions of Australian and New Zealand data (for instance, NZ data below).
How have the model projections changed over the month?
The video demonstrates how the projections have evolved over time as new daily data have become available. This can give a better sense of where we are headed, given that the model cannot account for changes in context (e.g., policy changes, changes in testing rates, etc.).
My interpretation
In my view, there is a clear and substantial change in trajectory in the model projections in the past few days. This is excellent news. It is plausibly due to the combined effect of the mandatory face coverings and stage 4 restrictions policies. Because the model is shifting from day to day as it incorporates the new data, its performance is likely to be poor at present, and it is very likely that its projections into the future are substantial overestimates. My experience with doing the same analyses in March-April is that the model should improve again with more data, but the precision will be low (i.e., wide prediction intervals).
New daily numbers in NSW haven’t climbed in the past fortnight, but they haven’t dwindled either. With the widespread testing and contact tracing, and improving physical distancing adherence and use of masks, I am hopeful that we won’t see a major outbreak in Sydney.
More information about the “peak” in new cases
What does it mean to have reached the peak in new cases? IF our suppression of transmission doesn’t become more effective after the peak, the peak occurs at roughly 40% of the total cumulative cases in an outbreak. Our intuition might be that it is the “halfway point”, but this is not the case. That means that when we hit the peak, we can expect roughly another one-and-a-half times the number of cases so far before the outbreak ends. The insight is that we must resist the psychological temptation to relax transmission control mechanisms simply because we “crossed the peak”. Indeed, crossing the peak is an opportunity to increase the intensity of transmission control, as the system has now developed increased capacity. Doing so may accelerate the drop in new cases. I have some confidence that we have the political and social will to do this in Australia.
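The arithmetic behind this can be checked with a short Python sketch, assuming the outbreak follows a Gompertz curve with unchanged transmission dynamics. Under that assumption the peak in daily cases falls where the cumulative curve reaches 1/e (about 37%) of its final total, in line with the rough 40% figure, and the cases still to come are e - 1 (about 1.7) times the cases so far. The parameter values below are made up for illustration and are not the fitted values behind the charts.

```python
import math

def gompertz(t, k, b, c):
    """Cumulative cases on day t; k is the final outbreak size."""
    return k * math.exp(-b * math.exp(-c * t))

# Hypothetical parameters, chosen only for illustration.
K, B, C = 20000.0, 30.0, 0.07

# Daily new cases peak where the cumulative curve is steepest: t = ln(b) / c.
t_peak = math.log(B) / C
fraction_at_peak = gompertz(t_peak, K, B, C) / K
remaining_multiple = (1 - fraction_at_peak) / fraction_at_peak

print(f"peak day: {t_peak:.1f}")
print(f"share of final total reached at the peak: {fraction_at_peak:.1%}")
print(f"cases still to come, as a multiple of cases so far: {remaining_multiple:.2f}")
```

The share at the peak does not depend on the parameter values chosen: it is a property of the Gompertz shape itself.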
More information about “moving averages”
Several people have requested adding “moving averages” to the charts. This is not something that I will include in the main chart and video, so I wanted to provide an explanation. Moving averages are a type of “smoothing” algorithm. This is potentially useful because the daily fluctuations in the case numbers are less interesting than the underlying pattern of growth. Daily fluctuations relate to system effects such as batching in testing and reporting, while the underlying trend relates to the transmission dynamics of COVID-19 in the community.
So, this sounds pretty useful?! The problem is that moving averages are a rather crude method of smoothing. It doesn’t matter whether we use simple moving averages or exponential moving averages (which give greater weight to more recent data). More sophisticated (and in my opinion, rather better) smoothing algorithms can and should be used, such as cubic splines, LOESS, or a generalized additive model (GAM). Examples of these are in the comparative chart below.
However, something to notice is that the model that has been used to create projections (the aforementioned Gompertz equation) does a really good job of “smoothing” the existing data points. It is a very close fit to the smoothed trend from the GAM. This is good, as it implies that the model does a reasonable job of describing the existing data. If it didn’t, we shouldn’t have any confidence that the model would provide reasonable future projections. Basically, providing “moving averages” in addition to the actual data points would do little other than clutter the chart with less useful information.
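For reference, the two kinds of moving average mentioned above can be sketched in a few lines of Python. The function names, the window of 3 days, the weight of 0.5 and the daily counts are illustrative assumptions, not the settings or data used for the comparative chart.

```python
def simple_moving_average(values, window):
    """Mean of the latest `window` values; None until a full window is available."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            chunk = values[i + 1 - window : i + 1]
            out.append(sum(chunk) / window)
    return out

def exponential_moving_average(values, alpha):
    """Recursive EMA: each new value gets weight alpha, the running average 1 - alpha."""
    out = []
    for v in values:
        if not out:
            out.append(float(v))  # start the average at the first observation
        else:
            out.append(alpha * v + (1 - alpha) * out[-1])
    return out

# Made-up daily counts, for illustration only (not the Department of Health data).
daily = [10, 14, 9, 20, 25, 18, 30]
print(simple_moving_average(daily, 3))
print(exponential_moving_average(daily, 0.5))
```

Both versions only look backwards, so the smoothed line lags behind a trend that is rising or falling, which is one reason they are a crude summary compared with a fitted curve.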
Smoothing methods (SMA vs EMA vs GAM) of case numbers, compared to model projections with data up to 11 August 2020
Want to know more?
The primary data source for daily new cases is the Australian Government Department of Health COVID-19 website. The analysis was done in RStudio Cloud using R version 4.0.2.