I restarted the analyses almost a month ago with the onset of the new Melbourne outbreak in July 2020. The logic behind these charts is that they fill an information gap. Official data sources only give historical data series, and mainstream media typically only give near-term predictions based on opinion.
Chart update 7 August 2020
What’s new?
Today’s number of new cases, 464 as per the Australian Government Department of Health update of 7/8/2020, is lower than the daily projected estimate. It should be noted that case counts have been “lumpy” when considered at a daily level, bouncing up and down. When considered as a trend, the model remains very compatible with the data. The projections of the model remain stable with the inclusion of today’s data.
Stage 3 restrictions started in Melbourne on 9 July 2020, and stage 4 restrictions commenced on 2 August 2020. These restrictions will place much greater limits on social movement, including an overnight curfew. The move to stage 4 can be considered an acknowledgement that, at a whole-of-system level, “stage 3” was not reducing transmission enough. This would be consistent with what we have seen with the model projections. If stage 4 restrictions reduce transmission, what we should see in the coming week is the model progressively over-estimating case counts as the curve “bends”.
My experience with this model from March 2020 was that projections from early on in the epidemic tend to underestimate slightly in the short term (days), and overestimate in the longer term (weeks). This bias is something to keep in mind.
Projection of new daily cases of COVID-19 with data up to 7 August 2020
What is this?
The image is a chart of the confirmed daily new cases of COVID-19 in Australia, with a projection for the next 2 weeks. The projection comes from a model fitted to the data since 1 June 2020, using a Gompertz equation and non-linear regression. The dark blue dashed line is the model estimate. The grey dashed lines are the 95% prediction intervals, with the values given at 7 and 14 days into the future. The blue gradations can be understood as the degree of uncertainty in the model projections.
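As a rough illustration of the fitting step, here is a minimal Python sketch. It is not the code behind the charts (the analysis itself was done in R). It assumes the common three-parameter Gompertz form for cumulative cases, K·exp(−b·exp(−c·t)), and it swaps full non-linear regression for a simpler stand-in: scan candidate values of K, fit b and c by ordinary least squares on a linearised scale, and keep the K with the smallest squared error on the original scale. The data are synthetic and all names are hypothetical. Prediction intervals are not covered.

```python
import math

def gompertz(t, K, b, c):
    """Cumulative cases at day t under a three-parameter Gompertz curve."""
    return K * math.exp(-b * math.exp(-c * t))

def fit_gompertz(days, cumulative, K_grid):
    """Crude fit: for each candidate K, linearise and solve for b and c.

    ln(-ln(y / K)) = ln(b) - c * t is a straight line in t, so b and c
    come from ordinary least squares. The K with the smallest squared
    error on the original scale is kept.
    """
    best = None
    n = len(days)
    t_mean = sum(days) / n
    sxx = sum((t - t_mean) ** 2 for t in days)
    for K in K_grid:
        if K <= max(cumulative):
            continue  # the asymptote must sit above every observed value
        z = [math.log(-math.log(y / K)) for y in cumulative]
        z_mean = sum(z) / n
        sxz = sum((t - t_mean) * (zi - z_mean) for t, zi in zip(days, z))
        slope = sxz / sxx
        intercept = z_mean - slope * t_mean
        b, c = math.exp(intercept), -slope
        sse = sum((gompertz(t, K, b, c) - y) ** 2 for t, y in zip(days, cumulative))
        if best is None or sse < best[0]:
            best = (sse, K, b, c)
    _, K, b, c = best
    return K, b, c

# Synthetic, noise-free "observations" (illustrative values, not real case data)
TRUE_K, TRUE_B, TRUE_C = 20000.0, 8.0, 0.05
days = list(range(0, 60))
observed = [gompertz(t, TRUE_K, TRUE_B, TRUE_C) for t in days]

K_hat, b_hat, c_hat = fit_gompertz(days, observed, K_grid=range(5000, 40001, 500))

# Projected cumulative cases 14 days past the last observation (day 59)
projection_day_73 = gompertz(73, K_hat, b_hat, c_hat)
```

With real, noisy counts a proper non-linear least squares routine would be the better tool; the grid scan is only meant to show what "fitting a Gompertz curve" involves.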
“Gompertz” equation?
The Gompertz function is a type of sigmoid, or “S”-shaped curve. It’s been around since the early 19th century and was initially used to describe and model demographic mortality curves, and hence is well known to actuaries. The Gompertz function can also be used to model biological growth (e.g., epidemics, tumour size, enzymatic reactions). I chose this model because, earlier in the pandemic, it was found to be useful in modelling cumulative cases of COVID-19 from the Chinese outbreaks (Jia et al. arXiv:2003.05447v2 [q-bio.PE]). My experience from the initial outbreak earlier in the year was that this equation gave reasonable descriptions of Australian and New Zealand data (for instance, NZ data below).
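To make the “S” shape concrete, the short Python sketch below evaluates a Gompertz curve for cumulative cases and takes day-to-day differences to get daily new cases. The three-parameter form and the parameter values are assumptions chosen for illustration; they are not the fitted values behind the charts.

```python
import math

def gompertz(t, K, b, c):
    """Cumulative cases at day t: K is the final outbreak size,
    b shifts the curve along the time axis, c sets the growth rate."""
    return K * math.exp(-b * math.exp(-c * t))

# Illustrative parameter values only (not fitted to any real data)
K, b, c = 10000.0, 10.0, 0.08

cumulative = [gompertz(t, K, b, c) for t in range(0, 121)]
# Daily new cases are the day-to-day differences of the cumulative curve
daily_new = [later - earlier for earlier, later in zip(cumulative, cumulative[1:])]

# Day on which the largest daily increase is recorded
peak_day = daily_new.index(max(daily_new)) + 1

for day in (0, 15, 30, 60, 90, 120):
    print(f"day {day:3d}: cumulative = {cumulative[day]:8.0f}")
print("largest daily increase is recorded on day", peak_day)
```

The cumulative series starts slowly, accelerates, then flattens towards K, while the daily series rises to a single peak and falls away more slowly than it rose.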
How have the model projections changed over the month?
The video demonstrates how the projections have evolved over time as new daily data have become available. This can give a better sense of where we are headed, given that the model cannot account for changes in context (e.g., policy changes, changes in testing rates, etc.).
My interpretation
According to the model, the peak in new cases is about a week away. I am concerned, and have noted for some days, at how “wide” the peak appears. An implication of this is that if transmission suppression is not improved, new case counts may take a long time to lower even after growth has plateaued. On a cumulative case chart, this would appear as a period of relatively linear growth.
There is no indication from the data that stage 4 restrictions have had an effect on daily new case numbers, nor would this be expected at this time. If we think of the restrictions as having an effect on transmission starting from the date of implementation, we should hopefully start to see the earliest effect by around 7 days. Of interest, a marked change in the trajectory of projections was seen in the model starting about 14 days after stage 3 restrictions.
New daily numbers haven’t climbed in the past fortnight in NSW, but also haven’t whittled away. I think the phrase “on a knife’s edge” has been used to describe the situation in NSW. Widespread testing and contact tracing are taking place, along with improving physical distancing adherence and use of masks. Hopefully, it will be enough.
More information about the “peak” in new cases
What does it mean to have reached the peak in new cases? Assuming that our suppression of transmission doesn’t become MORE effective after the peak, it’s important to recognise that it is not the “halfway point”, which might be the intuition. The peak in the “new cases” curve corresponds to the “inflexion point” on the S-shaped cumulative cases curve (e.g., the first chart of the NZ cases in the brief description of the “Gompertz equation”). Roughly, the peak in new cases occurs at 40% of the total cumulative cases in an outbreak (for a Gompertz curve the exact figure is 1/e, or about 37%). That means that at the time we hit the peak, we can expect at least another one-and-a-half times the number of cases so far in the outbreak before it ends. The insight is that we must resist the psychological temptation to relax transmission control mechanisms simply because we “crossed the peak”.
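The rough 40% figure can be checked with a short derivation. This is a sketch that assumes the common three-parameter form of the Gompertz curve for cumulative cases; the symbols K (final outbreak size), b and c (shape and rate) are chosen here for illustration, as the parameterisation used for the charts is not stated.

```latex
\begin{aligned}
C(t) &= K\,e^{-b e^{-ct}} \\
C'(t) &= c\,b\,e^{-ct}\,C(t) \\
C''(t) &= c\,\bigl(b\,e^{-ct} - 1\bigr)\,C'(t) \\
C''(t^{*}) = 0 \;&\Longrightarrow\; b\,e^{-ct^{*}} = 1 \;\Longrightarrow\; t^{*} = \frac{\ln b}{c} \\
C(t^{*}) &= K\,e^{-1} \approx 0.37\,K \\
\frac{K - C(t^{*})}{C(t^{*})} &= e - 1 \approx 1.72
\end{aligned}
```

Under these assumptions, the peak in daily new cases arrives when about 37% of the eventual total has been reported, and the cases still to come are about 1.7 times those already counted.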
More information about “moving averages”
Several people have requested adding “moving averages” on the charts. This is not something that I will include, so I wanted to provide an explanation. Moving averages are a type of “smoothing” algorithm. This is potentially useful as the daily fluctuations in the case numbers are less interesting than the underlying pattern of growth. Daily fluctuations relate to system effects such as batching in testing and reporting, while the underlying trend relates to the transmission dynamics of COVID-19 in the community.
So, this sounds pretty useful?! The problem is that moving averages are a rather crude method of smoothing. This is true whether we use simple moving averages (SMA) or exponential moving averages (EMA, which give greater weight to more recent data). Other more sophisticated (and in my opinion, rather better) smoothing algorithms can/should be used, such as cubic splines, LOESS, or a generalised additive model (GAM). Examples of these are in the comparative chart below.
However, something to notice is that the model that has been used to create projections (the aforementioned Gompertz equation) does a really good job at providing a “smoothing” of the existing data points. It is a very close fit to the smoothed trend using the GAM. This is good as it implies that the model does a reasonable job at describing the known existing data – if it didn’t do this then we shouldn’t have any confidence that the model would provide reasonable future projections. Basically, there is no point in providing “moving averages” in addition to the actual data points other than cluttering the chart with less useful information.
Smoothing methods (SMA vs EMA vs GAM) of case numbers, compared to model projections with data up to 7 August 2020
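For readers unfamiliar with the two moving averages in the chart, here is a minimal Python sketch of how each is calculated. It is not the code used for the chart (that was written in R); the daily counts are made up, and the function names, the 7-day window, and the choice of alpha are assumptions for illustration.

```python
def simple_moving_average(values, window):
    """Mean of the most recent `window` values; None until enough data exist."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window : i + 1]) / window)
    return out

def exponential_moving_average(values, alpha):
    """Each point blends today's value (weight alpha) with yesterday's EMA."""
    out = [float(values[0])]
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out

# Made-up "lumpy" daily counts, for illustration only
daily = [300, 420, 350, 480, 390, 520, 450, 560, 470, 600]

sma7 = simple_moving_average(daily, window=7)
# alpha = 2 / (span + 1) is a common convention for a 7-day span
ema = exponential_moving_average(daily, alpha=2 / (7 + 1))
```

One way to see the crudeness: both averages only look backwards, so on a rising series they trail the most recent counts (here the final SMA value sits well below the final daily count).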
Want to know more?
The primary data source for daily new cases is the Australian Government Department of Health COVID-19 website. The analysis was done in RStudio Cloud using R version 4.0.2.
Today’s charts
Code for smoothing methods comparison: smoothing_options