I've long said that confirmed cases in the US are a poor count of total cases. The number is obviously way too low. But how low? A recent article by three authors (Silverman, Hupert and Washburne--see the article here: https://www.medrxiv.org/content/10.1101/2020.04.01.20050542v2 ) suggests the number of people symptomatic with COVID-19 in a three-week span in March was 28 million, and the ratio of symptomatic to confirmed cases was about 100 to 1 during that time. By the end of March, the authors believe the ratio had dropped some. Still, asymptomatic cases and continued increases in infections would mean that if the 28 million estimate were correct, we would be somewhere north of 70 million cases of COVID in the US right now.
I found the article quite compelling because the authors used CDC flu-like-illness data, which have been collected for about 10 years, to come to their conclusions. They looked at the excess of people coming to the doctor with such symptoms. In other words, they accounted for the regular flow of people going to the doctor for flu-like symptoms, and found there were far too many. Now, the first thing I thought was that more people came to the doctor because they were worried about COVID (so perhaps people who were less severely sick came in). The authors acknowledge this also, but using the percentage of people who were later admitted to the hospital in NYC, they found that sicker (not less sick) people were coming in. The question, of course, is whether that bias in NYC went the same way in other localities.
The authors also found that excess flu-like illness grew at roughly the same rate as COVID deaths did a few weeks later. This is consistent with those additional individuals having COVID as opposed to a re-emergence of the flu.
So, let's say the article is correct and we have maybe 70 million cases (roughly 100 times the current confirmed cases of 760,000) in the US. We have 40,000 reported deaths. What death rate does that imply? Well, first we need to adjust the 40,000, since we know that many people currently living with the disease will die. As a very rough measure, let's say that it will be at least 70,000 total (meaning 30,000 more deaths based on current cases). This implies a death rate of 0.1%, or 1 in 1,000. Even if we believe current cases will lead to 160,000 deaths (roughly four times the number we have today), the death rate would only be about 0.25%. Could the death rate be that low?
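For anyone who wants to check the arithmetic, here is a minimal Python sketch of the calculation above. The function name implied_death_rate is just something I made up for illustration; the inputs are the figures quoted in the text. Note that the second scenario works out closer to 0.23%, which I have rounded to about 0.25%.

```python
def implied_death_rate(total_deaths, total_infections):
    """Return deaths divided by infections, as a fraction."""
    return total_deaths / total_infections


# 70 million infections is the estimate discussed above.
infections = 70_000_000

# Scenario 1: at least 70,000 eventual deaths from current cases.
low = implied_death_rate(70_000, infections)

# Scenario 2: 160,000 eventual deaths (about four times today's count).
high = implied_death_rate(160_000, infections)

print(f"{low:.2%}")   # 0.10%
print(f"{high:.2%}")  # 0.23%
```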
My conclusion is that it's very unlikely, because it conflicts with other data we have.
First, let's look at New York City, where we have roughly 10,000 deaths. A 0.1% death rate would mean that even if everyone who will die of COVID in NYC has already died, then 10 million people have been infected. This cannot be true because it exceeds the population of NYC. Assuming another 10,000 will die in NYC (a much more reasonable estimate based on the experience of Italy and other countries), then the death rate cannot be less than about 0.25% (and that is only if everyone in NYC is currently infected or has been infected). The same logic holds for New York State, which has about 20,000 deaths now and 20 million people.
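The bound above can be sketched as follows. This is only a rough check: the NYC population of 8.4 million is my own approximate figure (the text only says 10 million exceeds it), the function names are made up, and for New York State I am assuming the same doubling of deaths that I assumed for the city.

```python
# Assumption: approximate NYC population, not a figure quoted in the post.
NYC_POPULATION = 8_400_000
NY_STATE_POPULATION = 20_000_000


def infections_needed(deaths, death_rate):
    """Infections required to produce `deaths` at a given death rate."""
    return deaths / death_rate


def death_rate_floor(deaths, population):
    """Lowest possible death rate: assumes the whole population was infected."""
    return deaths / population


# At a 0.1% death rate, 10,000 deaths would need about 10 million infections.
needed = infections_needed(10_000, 0.001)
exceeds = needed > NYC_POPULATION
print(exceeds)  # True

# If deaths in NYC double to 20,000, the floor is deaths / population.
floor_nyc = death_rate_floor(20_000, NYC_POPULATION)
print(f"{floor_nyc:.2%}")  # 0.24%

# Same logic for the state, assuming deaths double from 20,000 to 40,000.
floor_state = death_rate_floor(40_000, NY_STATE_POPULATION)
print(f"{floor_state:.2%}")  # 0.20%
```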
So while the death rate could theoretically be 0.25%, that would imply that everyone in NY is infected now or has been already. So I conclude the death rate must be higher than 0.25%, meaning that US cases must be less than 70 million.
But how high? Let's consider the Diamond Princess cruise ship data, where (virtually) everyone was tested, and there were 12 deaths out of about 700 infections. Adjusting for the fact that those passengers were very old compared to the US population, that implies about a 1% death rate (see this article: https://cmmid.github.io/topics/covid19/severity/diamond_cruise_cfr_estimates.html). An updated analysis, which also includes estimates for Wuhan, is here ( https://www.thelancet.com/action/showPdf?pii=S1473-3099%2820%2930243-7 ). Using the Lancet article and adjusting for the age distribution in the US, we get an implied US death rate of just a little under 1%. The Lancet article did not include the recently revised death estimates in China, but it's not clear whether this would have an effect, because the death rate was estimated from cases of people who left China, not from those who stayed in China.
Thus, let's use 1%, and assume the total deaths in the US, based on current cases, will be around 80,000 (about double what it is today). This implies that about 8 million people (between 2 and 3 percent of the US) are infected.
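Here is the same back-of-the-envelope step as a short Python sketch. The US population of 330 million is my own round number (an assumption, not a figure from any of the articles), and the variable names are just for illustration.

```python
# Assumption: approximate US population, used only to turn a count into a share.
US_POPULATION = 330_000_000

expected_deaths = 80_000  # about double today's 40,000
death_rate = 0.01         # the 1% estimate discussed above

# Infections implied by the deaths: infections = deaths / death rate.
infected = expected_deaths / death_rate
share = infected / US_POPULATION

print(f"{infected:,.0f}")  # 8,000,000
print(f"{share:.1%}")      # 2.4%
```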
Now let's look at one final source for the death rate: Iceland. Iceland is one of the very few places that have done wide-scale testing that included random testing. This does not mean that Iceland identified every positive case, but their figures are probably better than most (see this paper: https://www.nejm.org/doi/full/10.1056/NEJMoa2006100?query=featured_home ). In Iceland, there are 1,300 cases that have "resolved" (death or recovery) and 9 deaths (see https://www.worldometers.info/coronavirus/country/iceland/ ). This implies a death rate of 0.7%, with a 95% confidence interval of 0.3% to 1.3% (meaning if the cases were representative, the true death rate could reasonably be anywhere from 0.3% to 1.3%). Iceland's age distribution is similar to that of the US, but the US is slightly older and likely a little less healthy (just a guess on the health part). Thus, the Iceland estimate is likely on the low side.
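For the curious, an interval like that can be computed with the standard library alone. The sketch below uses the exact (Clopper-Pearson) binomial interval, which is my choice for illustration and only one of several reasonable methods; the function names are made up. With 9 deaths out of 1,300 resolved cases it lands at roughly 0.3% to 1.3%.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def solve_decreasing(f, target, tol=1e-10):
    """Find p in [0, 1] with f(p) = target, where f decreases as p grows."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid  # f is still too large, so the root lies to the right
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x, n, alpha=0.05):
    """Exact two-sided confidence interval for a binomial proportion."""
    # Lower bound: P(X >= x | p) = alpha/2, i.e. P(X <= x-1 | p) = 1 - alpha/2.
    lower = 0.0 if x == 0 else solve_decreasing(
        lambda p: binom_cdf(x - 1, n, p), 1 - alpha / 2)
    # Upper bound: P(X <= x | p) = alpha/2.
    upper = 1.0 if x == n else solve_decreasing(
        lambda p: binom_cdf(x, n, p), alpha / 2)
    return lower, upper


deaths, resolved = 9, 1300
point = deaths / resolved
lower, upper = clopper_pearson(deaths, resolved)
print(f"{point:.1%} ({lower:.1%} to {upper:.1%})")
```

The interval only reflects sampling noise in the 9 deaths. It says nothing about whether the resolved cases are representative, which is the bigger uncertainty here.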
So what's the net-net? The estimate of an infection rate of roughly 20% (70 million out of 300+ million) is likely much too high. A more reasonable estimate is between 2% and 3% (about 8 million), which is based on a death rate of 1%. Is this correct or close to the correct number? Probably not--there are too many assumptions baked into it. Could it be off by an order of magnitude? That seems unlikely.