Redskins are lucky to play bad teams, but how lucky?

A recent article in Yahoo Sports pointed out that the Washington Redskins are the first team in history to play six winless teams in a row. Here is their schedule so far (also according to the article cited above):


Week 1 — at New York Giants (0-0)

Week 2 — vs. St. Louis Rams (0-1)

Week 3 — at Detroit Lions (0-2)

Week 4 — vs. Tampa Bay Buccaneers (0-3)

Week 5 — at Carolina Panthers (0-3)

Week 6 — vs. Kansas City Chiefs (0-5)


The author of the article, Chris Chase (or, as he notes, his dad; let’s call him Mr. Chase), calculates the odds of this as 1 in 32,768. This calculation is incorrect, and the 32,768 figure is far too high, for several reasons, which I get to below. But first, let me explain how the calculation was likely performed.


The calculation assumes, plausibly, that the Redskins have the same chance of playing any given team (unlike some college teams, which purposely make their schedules easy, NFL teams cannot choose their opponents).


The calculation also assumes, not plausibly, that teams that have thus far won no games have a 50-50 chance of winning each game. The implicit assumption there is that all NFL teams are evenly matched. The fact is that there are a few really good teams, a few really bad teams, and a bunch of teams in the middle. Thus, there are likely to be a bunch of winless teams after 5 games, and not, as the incorrect calculation below implies, only 1 winless team of 32 after 5 games.


Finally, the calculation, apparently in a careless error, assumes the chances of playing a winless team the first week are 50-50, when, of course, all teams are winless the first week.


So Mr. Chase’s (incorrect) calculation is:
Week 1 chances: 50% (1 in 2)
Week 2 chances: 50% (1 in 2)
Week 3 chances: 50%*50%=25% (1 in 4)
Week 4 chances: 50%*50%*50%=12.5% (1 in 8)
Week 5 chances: 50%*50%*50%=12.5% (1 in 8; same as week 4 because the team they played had only played three games)
Week 6 chances: 50%*50%*50%*50%*50%=3.125% (1 in 32)


A law of probability is that the chance of two unrelated events happening is the product of their individual chances. Thus, if the chance of rain today is 50% and the chance of rain tomorrow is 50%, the chance of rain both days is 25%, if those chances are unrelated (which, by the way, they probably aren’t). This is why the chances for multiple losses are multiplied together.


But back to the football schedule. To calculate the chances of 6 straight games against winless teams, Mr. Chase reasonably multiplied the 6 individual chances (again assuming the 6 matchups were unrelated):
50% * 50% * 25% * 12.5% * 12.5% * 3.125% = 0.003%, or 1 in 32,768.
So, 32,768 is the number reported in the article.


The easy correction is that the chance of playing a winless team in the first game is 100%, so the calculation should be:
100% * 50% * 25% * 12.5% * 12.5% * 3.125% = 0.006%, or 1 in 16,384.
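The coin-flip arithmetic can be checked with exact fractions. This is a minimal sketch under the same assumption the article makes (every game is a 50-50 coin flip, so a team with k games played is winless with chance (1/2)^k); the games-played counts come from the records in the schedule above, and the helper names are mine, not the article's.

```python
from fractions import Fraction

# Games each opponent had played before facing Washington, from the records listed above.
GAMES_PLAYED = [0, 1, 2, 3, 3, 5]


def chance_winless(games_played: int, fix_week_one: bool) -> Fraction:
    """Chance an opponent is winless if every game were a 50-50 coin flip."""
    if games_played == 0:
        # Mr. Chase used 1/2 here; before week 1, every team is winless.
        return Fraction(1) if fix_week_one else Fraction(1, 2)
    return Fraction(1, 2) ** games_played


def schedule_odds(fix_week_one: bool) -> Fraction:
    """Multiply the weekly chances, treating the six matchups as independent."""
    total = Fraction(1)
    for games in GAMES_PLAYED:
        total *= chance_winless(games, fix_week_one)
    return total


print(schedule_odds(fix_week_one=False))  # 1/32768 (the article's figure)
print(schedule_odds(fix_week_one=True))   # 1/16384 (week 1 corrected)
```

Using `Fraction` instead of floats keeps the powers of two exact, so the "1 in N" figures fall straight out.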
This error has been pointed out in comments on the article.


In addition, other comments point out the other major flaw: teams do not have equal probability of losing. Thus the chance that a team will be, say, 0-2 is not 25% (50%*50%) but something else, depending on the quality of the teams. At the extreme, half the teams lose every game and half win every game (this of course assumes losing teams only play against winning teams, but it is possible).


The reality is certainly not this extreme, which would imply a 50-50 chance each week (after the first) of playing a winless team, and thus a 1 in 32 chance of playing 6 in a row. So, how do we figure out the reality?


The easiest way is to look at the percentage of teams that are winless each week. If we assume the Redskins have an equal chance of playing each team, then we can compute the odds each week. Note that everything is out of 31 teams instead of 32 because the Redskins can’t play themselves.


Week 1: 31 out of 31 teams winless. Chances: 31/31 = 100%
Week 2: 15 out of 31 teams winless. Chances: 15/31 = 48% (I am assuming no byes the first week, and I know the Redskins lost their first game)
Week 3: 8 out of 31 teams winless. Chances: 8/31 = 26%
Week 4: 6 out of 31 teams winless. Chances: 6/31 = 19%
Week 5: 6 out of 31 teams winless. Chances: 6/31 = 19%
Week 6: 4 out of 31 teams winless. Chances: 4/31 = 13%


So the actual chances, assuming the Redskins have an equal chance of playing each team each week and cannot play themselves, are: 100%*48%*26%*19%*19%*13% = 0.06%, or 1 in about 1,700. Much more likely than 1 in 32,000 but still pretty unlikely.
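Here is the same product computed exactly, as a small sketch that assumes (as above) each of the 31 other teams is equally likely to be the opponent each week. The weekly winless counts are the ones listed above; the function name is just for illustration.

```python
from fractions import Fraction

OTHER_TEAMS = 31  # Washington cannot play itself
# Winless teams (other than Washington) before weeks 1-6, from the list above.
WINLESS = [31, 15, 8, 6, 6, 4]


def empirical_odds(winless_counts, pool=OTHER_TEAMS) -> Fraction:
    """Product of weekly chances of drawing a winless opponent."""
    total = Fraction(1)
    for count in winless_counts:
        total *= Fraction(count, pool)
    return total


odds = empirical_odds(WINLESS)
print(f"{float(odds):.4%}")            # 0.0604%
print(f"1 in {float(1 / odds):,.0f}")  # 1 in 1,657
```

Rounding each week to a whole percentage first, as in the text, gives essentially the same answer: roughly 1 in 1,700.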


And after all these easy games, how are they doing? Unluckily for Redskins fans, not too well…they’re 2-3 going into Sunday’s game against the winless Chiefs.

Unemployment down but joblessness is up?

There was a bit of interesting news that came out Friday: the nation’s unemployment rate actually declined, from 9.5% to 9.4%. This is true despite a net loss of 247,000 jobs (see the NY Times article). How could this happen?

Well, the unemployment rate is calculated by taking the number unemployed and dividing by the labor force: Unemployment Rate= Number Unemployed / Labor Force.

The numerator in the equation, Number Unemployed, is defined as the number of people not employed minus anyone who hasn’t looked for a job in the last 4 weeks. The denominator of the equation, Labor Force, is defined as the Number Unemployed plus the number of people currently working (either full or part-time).

Thus, if people give up (and giving up is defined as not looking for work in the last 4 weeks), they are no longer counted in either the numerator or the denominator of the equation. And that is exactly what happened between June and July of this year. According to the BLS (Bureau of Labor Statistics), 637,000 people left the labor force between June and July. Thus, even though the number of people employed fell (by a seasonally adjusted 155,000), the unemployment rate also fell, because the number of unemployed people (those still looking for work) fell too, by 267,000. The net result was a drop in the unemployment rate, even though fewer people were working and more people lost jobs than found jobs.

A note about the math. At first blush, you may wonder whether it matters, since the people not looking are removed both from the numerator (Number Unemployed) and denominator (Labor Force). But mathematically, it does matter. Suppose we have a ratio 2/10, which equals 20%. Subtract 1 from the numerator and 1 from the denominator and you have 1/9, which equals 11.1%. Thus we subtracted the same number from the numerator and denominator but we did not end up with the same 20%. Instead we ended up with far less (11.1%).

The general rule is that the ratio falls when the same number is subtracted from the numerator and denominator, as long as the ratio is less than 1. So, 2/10 > 1/9, but 20/10 < 19/9 (2 versus about 2.11).
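To see the effect in labor-force terms, here is a minimal sketch that recomputes the 2/10 example as 2 unemployed out of a labor force of 10 (so 8 employed), then has one job seeker give up. The function name is hypothetical, and the second comparison uses a ratio above 1 chosen only for illustration.

```python
def unemployment_rate(unemployed: int, employed: int) -> float:
    """Unemployed divided by the labor force (unemployed + employed)."""
    labor_force = unemployed + employed
    return unemployed / labor_force


before = unemployment_rate(unemployed=2, employed=8)  # 2/10
after = unemployment_rate(unemployed=1, employed=8)   # one job seeker gives up: 1/9
print(f"{before:.1%} -> {after:.1%}")  # 20.0% -> 11.1%

# A ratio above 1 moves the other way when 1 is removed from both parts.
print(20 / 10 < (20 - 1) / (10 - 1))   # True
```

Nobody found a job in this example; the rate drops purely because the discouraged worker leaves both the numerator and the denominator.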
I would guess that the labor force drop-offs would be far higher during deeper recessions where many despair of getting work or decide to take a break from their search, and this guess is borne out by recent information on the BLS site, which cites the increase in discouraged workers this last year: “Among the marginally attached, there were 796,000 discouraged workers in July, up by 335,000 over the past 12 months. (The data are not seasonally adjusted.) Discouraged workers are persons not currently looking for work because they believe no jobs are available for them.”

This NY Times chart of unemployment uses a more reasonable definition and shows unemployment far higher than the official 9.4% rate. It includes all those who have looked for a job in the past year as well as part-time workers who want full-time work as part of the unemployed, and the unemployment rate is between 10 and 20%, depending on the state.

Riding a bike? Wear a helmet.

Now that the sun has finally come out in NYC today after what seems like weeks of rain and cold weather, it seems an appropriate time to talk about one of my favorite summer recreational activities: riding a bike.

Growing up in the 1970s, I don’t think I ever saw a helmet, much less wore one. However, in the same way we’ve figured out that seatbelts (and airbags) save lives, we also now know that biking with a helmet makes you safer. The Consumer Product Safety Commission reported that wearing a helmet can decrease the risk of head injury by as much as 85%.

Sadly, there are still a lot of enthusiasts out there who have a take-no-prisoners attitude about not wearing helmets, even implying that helmets make cyclists less safe (see, for instance, the helmet section of this web page on Bicycle Universe). Yet I think anyone who understands the statistics will see that the “freedom” of riding without a helmet is far outweighed by the risk.

The Insurance Institute for Highway Safety (IIHS) has long been a great source for safety information. They’ve got the same goal that hopefully most of us do: reducing deaths and injuries. In a 2003 report, the IIHS reports that child bicycle deaths have declined by more than 50% since 1975 (despite increased biking, and presumably because most children wear helmets now). In addition, about 92% of all bicycle deaths were cyclists not wearing helmets (see this report). The same report also shows that while child bicycle deaths have declined precipitously (from 675 in 1975 to 106 in 2007), adult deaths have increased since 1975 (from 323 to 583).

Helmet usage is harder to figure out, but most sources put overall use around 50%, with children’s use higher. This means that, given that 92% of deaths are cyclists not wearing helmets, you stand about 11 times the chance of getting killed if you don’t wear a helmet. This number can be played with a little and whittled down if you assume, say, that cyclists not wearing helmets ride more dangerously, but there would have to be enormous differences for helmets to be shown to be ineffective. Moreover, all the major scientific studies show large positive effects from helmet usage (see this ANTI-helmet site for a summary of the case-control studies).
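The "about 11 times" figure comes from comparing deaths per rider in each group. Here is a minimal sketch of that arithmetic, assuming (as the paragraph notes) that helmeted and unhelmeted cyclists ride about the same amount and equally carefully; the function name is mine.

```python
def relative_risk(share_of_deaths_unhelmeted: float, helmet_use_rate: float) -> float:
    """Death risk per rider without a helmet divided by death risk per rider with one.

    Assumes both groups have the same exposure (miles ridden, riding style).
    """
    risk_without = share_of_deaths_unhelmeted / (1 - helmet_use_rate)
    risk_with = (1 - share_of_deaths_unhelmeted) / helmet_use_rate
    return risk_without / risk_with


print(round(relative_risk(0.92, 0.50), 1))  # 11.5

# Sensitivity: if helmet use were lower than 50%, the ratio shrinks.
for use in (0.3, 0.4, 0.5, 0.6):
    print(use, round(relative_risk(0.92, use), 1))
```

The loop shows how much the estimate depends on the uncertain helmet-use rate; even at 30% use, the ratio stays well above 1.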

So why, when you search the internet for helmet effectiveness, or read through the literature of a number of pro-cycling organizations, do they cast aspersions upon helmet use? This one, for me, is an enigma. I understood why the auto industry was against airbags and seatbelts (they cost money) and why the cigarette and gun manufacturers are against regulation, but why do people care so much about us not wearing helmets? I can think of only a couple of things: a) cyclists want bike lanes and other safety measures without committing to anything on their own, and b) some are too lazy/cool to bother with a helmet. Of course, I’m a cyclist and clearly, I’m all for helmets (and yes, laws requiring them). I also think that if we want state and city governments to take us seriously about increasing cyclist safety through new bike lanes, changing traffic patterns, and building of greenways, we need to do our part, too.

Why Swine Flu is a bunch of hogwash.

I first thought of writing about this a couple of weeks ago, when the nationwide hysteria concerning swine flu was just beginning, but then, as quickly as it came, it went. Now, with the first death from swine flu in NY, the front pages of the major newspapers have returned to the topic. The New York Times article and headline were, as always, close to languid. However, the New York Post’s article and photos are, also as usual, a bit hysterical. My son’s school, apparently a reader of the Post, has covered all the water fountains with plastic bags, perhaps unaware that the CDC clearly states there seems to be little or no chance of infection through drinking water.

What’s more, so far this flu has been a very minor one, with about 5,000 documented cases and 6 deaths. The blog of record relays that the “regular” flu has already killed something like 13,000 people in the US this year (it’s not clear whether this is derived from the CDC’s annual estimate of 36,000). This amounts to about 100 people a day.

While one CDC scientist estimates the number of people with swine flu at 50,000 or so, this estimate assumes that under-reporting of swine flu is the same as under-reporting of flu in general. Given the focus on swine flu, I expect that it is far less under-reported than flu in general, and thus the true number of people with swine flu is far fewer than 50,000. The CDC’s current weekly flu report shows about one-third of the 1,286 new cases as swine flu (novel H1N1). The same report has a great graph, showing an irregular spike in flu diagnoses just at the time when reported flu usually falls.

There are three pieces of good news, despite the scary spiked graph. First, with spring, flu cases quickly fall, because flu spreads less when people are further away from each other (i.e., outside instead of inside). Second, cases are already falling (though it’s only two weeks of data). Third, diagnoses of all types of flu increased in the last two weeks versus the several weeks leading up to May, implying that one of the reasons (perhaps the only reason) for the spike is that we are testing much more than usual, due to the swine flu outbreak.

Thus, swine flu has so far killed a documented 6 people in the U.S. out of more than 5,000 confirmed cases.

In conclusion, though our own hysteria may drive documented cases up some, and lead to my children having to bring a water bottle to school, the swine flu does not appear to be particularly dangerous or deadly.

Facebook and grades

I don’t have a long post for today, but I want to briefly weigh in on the debate over a study on Facebook and grades. It was the subject of the Wall Street Journal’s Numbers Guy blog last week.


The basic question is under what conditions should we publicize results, and should we wait for peer review?


Here was my comment:
I think if the caveats were printed along with the study results, then the publication is reasonable. Otherwise, we are being a bit paternalistic by implying that the general public cannot understand the caveats but we researchers can.


Suppose instead this was a study linking domestic air travel through a particular city to a new and deadly virus (say, swine flu?). Then there might be more reason for caution (and paternalism), because the cost of being wrong is very high. Still, there would be the counter-argument that not publishing could endanger people’s lives. We always have this trade-off, I believe, between unintentionally misleading people into believing that a study is correct when it is not, and vice versa.


In this open era, especially, I think the balance leans towards publishing, because the blogging/commenting public will quickly crucify poor research and find supporting evidence for good research.


How big a sample?

Suppose we want to figure out what percentage of BIGbank’s 1,000,000 loans are bad. We also want to look at smallbank, with 100,000 loans. Many people seem to think you’d need to look at 10 times as many loans from BIGbank as you would for smallbank.


The fact is that you would use the same size sample, in almost all practical circumstances, for the two populations above. Ditto if the population were 100,000,000 or 1,000.


The reasons for this, and the concept behind it, go back to the early part of the 20th century, when modern experimental methods were developed by (Sir) Ronald A. Fisher. Though Wikipedia correctly cites Fisher in its entry on experimental design, the seminal book, The Design of Experiments, is out of stock at Amazon (for $157.50, you can get a reprint of this and two other texts together in a single book). Luckily, for a mere $15.30, you can get David Salsburg’s (no relation, and he spells my name wrong! 😉) The Lady Tasting Tea, which talks about Fisher’s work. Maybe this is why no one knows this important fact about sample size: because we statisticians have bought up all the books that you would otherwise be breaking down the doors (or clogging the internet) to buy. Fisher developed the idea of using randomization to create a mathematical and probability framework for making inferences from data. In English? He figured out a great way to do experiments, and this idea of randomization is what allows us to make statistical inferences about all sorts of things (and the lack of randomization is what sometimes makes it very difficult to prove otherwise obvious things).


Why doesn’t (population) size matter?
To answer this question, we have to use the concept of randomization, as developed by Fisher. First, let’s think about the million loans we want to know about at BIGbank. Each of them is no doubt very different, and we could probably group them into thousands of different categories. Yet, let’s ignore that and just look at the two categories we care about: 1) good loan or 2) bad loan. Now, with enough time studying a given loan, suppose we can reasonably make a determination about which category it falls into. Thus, if we had enough time, we could look at the million loans and figure out that G% are good and B% (100% – G%) are bad.


Now suppose that we took BIGbank’s loan database (ok, we need to assume they know who they loaned money to), and randomly sampled 100 loans from it. Now, stop for a second. Take a deep breath. You have just entered probability bliss, all with that one word, randomly. The beauty of what we’ve just done is that we’ve taken a million disparate loans and, from them, formed a set of 100 “good”s and “bad”s that are identical in their probability distribution. This means that each of the 100 sampled loans that we are about to draw has exactly a G% chance of being a good one and a B% chance of being a bad one, corresponding to the actual proportions in the population of 1,000,000.


If this makes sense so far, skip this paragraph. Otherwise, envision the million loans as quarters lying on a football field. Quarters heads up denote good loans and quarters tails up denote bad loans. We randomly select a single coin. What chance does it have of being heads up? G%, of course, because exactly G% of the million are heads up and we had an equal chance of selecting each one.


Now, once we actually select (and look at) one of the coins, the chances for the second selection change slightly, because where we had G% exactly, now there is one less quarter to choose from, so we have to adjust accordingly. However, that adjustment is very slight. Suppose, G were 90%. Then, we’d have, for the second selection, if the first were a good coin, a 899999/999999 chance of selecting another good one (that’s an 89.99999% chance instead of a 90% chance). For smallbank, we’d be looking at a whopping reduction to an 89.9999% chance from a 90% chance. This gives an inkling of why population size, as long as it is much bigger than sample size, doesn’t much matter.
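The second-draw adjustment can be computed directly. This is a small sketch assuming G = 90% as in the example; the function name is mine. It gives the chance that the second loan drawn is good, given that the first (not put back) was good.

```python
from fractions import Fraction


def second_draw_good(population: int, good_share: Fraction = Fraction(9, 10)) -> Fraction:
    """Chance the 2nd draw is good when the 1st draw (without replacement) was good."""
    good = population * good_share
    return (good - 1) / (population - 1)


for size in (1_000_000, 100_000):
    print(f"{size:>9,}: {float(second_draw_good(size)):.5%}")
```

The two results differ from 90% only in the fifth decimal place of the percentage, matching the 89.99999% and 89.9999% figures in the paragraph.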


So, now we have a sample set of 100 loans. We find that 80 are good and 20 are bad. Right off, we know that, whether dealing with the 100,000 population or the 1,000,000 population, our best guess for the percentage of good loans, G, is 80%. That is because of how we selected our sample. It doesn’t matter one bit how different the loans are. They are just quarters on a football field. It follows from the fact that we selected them randomly.


We also can calculate several other facts, based on this sample. For example, if the actual number of good loans were 90% (900,000 out of 1,000,000), we’d get 80 or fewer in our sample of 100 only 0.1977% of the time. The corresponding figure, if we had sampled from the population of 100,000 (and had 90,000 good loans), would be 0.1968%. What does this lead us to conclude? Very likely, the proportion of “good” loans is less than 90%. We can continue to do this calculation for different possible values of G:

If G were 89%: .586% of the time would you get 80 or fewer.
If G were 88%: 1.47% of the time would you get 80 or fewer.
If G were 87%: 3.12% of the time would you get 80 or fewer.
If G were 86.3%: 5.0% of the time would you get 80 or fewer.
If G were 86%: 6.14% of the time would you get 80 or fewer.

In each of the above cases, the difference between a population of 1,000,000 and 100,000 loans changes the result only in the second decimal place, if that.
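Here is a sketch of how tail probabilities like these can be computed, using the hypergeometric distribution (exact, sampling without replacement) next to the binomial (an infinite population). The function names are my own, and I use only Python's standard `math.comb`. Exact integer arithmetic is kept until the final division, so the very large binomial coefficients for a population of a million do not overflow.

```python
from math import comb


def binomial_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for an infinite population (sampling with replacement)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def hypergeom_cdf(k: int, n: int, good: int, population: int) -> float:
    """P(X <= k) drawing n items without replacement from `population`, `good` of them good."""
    bad = population - good
    hits = sum(comb(good, i) * comb(bad, n - i) for i in range(k + 1))
    return hits / comb(population, n)  # big-integer ratio, converted to float once


for size in (1_000_000, 100_000):
    good = size * 9 // 10  # G = 90%
    print(f"{size:>9,}: {hypergeom_cdf(80, 100, good, size):.4%}")
print(f"{'infinite':>9}: {binomial_cdf(80, 100, 0.9):.4%}")
```

Swapping `9 // 10` for other proportions (89%, 88%, and so on) reproduces the rest of the list.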


Such a process allows us to create something called a confidence interval. A confidence interval kind of turns this calculation on its head and says, “Hey, if we only get 80 or fewer in a sample 1.47% of the time when the population is 88% good, and I got only 80 good loans in my sample, it doesn’t sound too likely that the population is 88% good.” The question then becomes, at what percentage would you start to worry?


For absolutely no reason at all (and I mean that), people seem to like to limit this percent to 5%. Thus, in the example above, most would allow that, if we estimated G such that 5% (or more) of the time, 80 or fewer of 100 loans would be good (where 80 is the number of good in our sample), then they would feel comfortable. Thus, for the above, we would say, with “95% confidence, 86.3% or fewer of the loans in the population are good.” We could just as well have figured out the number that corresponded to 1% and stated the above in terms of 99% confidence, with the corresponding higher G or figured out the number that corresponds to 30% and stated the above in terms of 70% confidence. However, everyone seems to love 5% and the 95% confidence that goes with it.
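Finding the G that corresponds to 5% can be automated with a simple bisection. This is a minimal sketch that uses the binomial (infinite-population) formula, which, as shown above, differs negligibly from the exact population calculation here; `upper_bound` and its `alpha` parameter are hypothetical names of my own. It relies on the chance of seeing 80 or fewer falling steadily as G rises.

```python
from math import comb


def binomial_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) good loans in a sample of n when the true share of good loans is p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def upper_bound(successes: int, n: int, alpha: float = 0.05, tol: float = 1e-6) -> float:
    """Largest G for which P(X <= successes) is still at least alpha (found by bisection)."""
    lo, hi = successes / n, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if binomial_cdf(successes, n, mid) >= alpha:
            lo = mid  # our sample is still plausible at mid, so the bound is higher
        else:
            hi = mid
    return lo


for alpha in (0.30, 0.05, 0.01):  # 70%, 95%, and 99% confidence
    print(f"{1 - alpha:.0%} confidence: G <= {upper_bound(80, 100, alpha):.1%}")
```

Changing `alpha` is all it takes to move between the 70%, 95%, and 99% statements. The 95% line should land near the 86.3% found above.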


Back to sample size versus population. As stated above, the population size, though 10 times bigger, doesn’t make a difference. For each probability above, we are using the hypergeometric distribution to calculate the exact figure (the mathematics behind it are discussed somewhat in my earlier post).


Here are some of the chances associated with a G of 85% and a sample size of 100 that yields 80 good loans.

Population infinite : 10.65443%
Population 1,000,000: 10.65331%
Population 100,000 : 10.64%
Population 10,000 : 10.54%
Population 1,000 : 9.49%
Population 500 : 8.21%

This example follows the rule of thumb: you can ignore the population size unless the sample is at least 10% of the population.
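The population table above can be regenerated with the same hypergeometric calculation, and the 10% rule of thumb can be flagged alongside it. This is a sketch with G = 85%, a sample of 100, and 80 good loans observed; the helper name and the `needs_correction` flag are my own.

```python
from math import comb


def hypergeom_cdf(k: int, n: int, good: int, population: int) -> float:
    """P(at most k good items) in a sample of n drawn without replacement."""
    bad = population - good
    hits = sum(comb(good, i) * comb(bad, n - i) for i in range(k + 1))
    return hits / comb(population, n)


SAMPLE = 100
results = {}
for population in (1_000_000, 100_000, 10_000, 1_000, 500):
    good = population * 85 // 100  # G = 85%
    results[population] = hypergeom_cdf(80, SAMPLE, good, population)
    needs_correction = SAMPLE / population >= 0.10  # the 10% rule of thumb
    print(f"{population:>9,}: {results[population]:.4%}  size matters: {needs_correction}")
```

Only the two smallest populations, where the sample is 10% or more of the total, trip the flag, and those are the rows where the probability moves noticeably.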

7 letter scrabble word redux

A recent article by the Wall Street Journal’s “Numbers Guy” has resurfaced one of my old posts regarding Scrabble. In it, I said that after the first turn, you must get an 8-letter word to use all your letters, because your seven letters need to connect to an existing word.


This, of course, is not correct, as was pointed out in comments on the Numbers Guy’s blog (and also by my sister). All you need to do to use all your letters with a 7-letter word is find a place to play it parallel to an existing word. For example, ‘weather’ could be played parallel to a word ending in ‘E’, since ‘we’ is a word.


Maybe that’s why my sister won so many scrabble games against me when I was a kid.

Are same-sex classes better?

Yesterday’s New York Times had an article, “Boys and Girls Together, Taught Separately in Public School,” about same-sex classes in New York City. In particular, the article focused on P.S. 140 in the Bronx. The article looks upon such classes favorably, despite the fact that there is, as far as I can tell, no evidence that such classes lead to better achievement.


In particular, the article states: “Students of both sexes in the co-ed fifth grade did better on last year’s state tests in math and English than their counterparts in the single-sex rooms, and this year’s co-ed class had the highest percentage of students passing the state social studies exam.”


In other words, the City is continuing this program, even though the evidence indicates that not only are students in same-sex classes doing no better, they are doing worse! The principal, who has introduced some programs that have achieved material results, said: “We will do whatever works, however we can get there…we thought this would be another tool to try.” This seems reasonable, but the article states, “…unlike other programs aimed at improving student performance, there is no extra cost.” There may not be a monetary cost, but making these students laboratory rats in someone’s education research project doesn’t help them and, apparently in this case, hurts them. Not to mention the opportunity cost of not exposing these children to other programs that might actually help.


To be fair, the scholarly literature is not consistent in its conclusions about whether same-sex classes improve achievement. However, many of the U.S. studies showed little or no improvement. See, for example:
Singh and Vaught’s study
LePore and Warren


On the other hand, some English and Australian studies indicate that, at least for girls, same-sex classes or schools may result in higher achievement (see, for example, Gillibrand E.; Robinson P.; Brawn R.; Osborn A.) while others indicate that there are no differences (see Harker).


So the literature seems to be mixed, and I would imagine there are numerous confounding factors that make this hard to measure. For example, typical single-sex classes in New York City consist of low-income minority students, where the boys are seen as being more at risk than the girls. Contrast this with the British and other foreign studies, where the girls are the greater concern for under-achievement.


Despite this, it’s questionable how long it is ethical to continue a program, like the one at P.S. 140, where the current known outcome is that boys and girls are doing worse in same-sex classes.


The age-old NY subway question–unlimited or pay-per-ride?

Before you get onto the New York City subway these days, you have to purchase a metro card. For us daily commuters, the choice would appear to be obvious–purchase an unlimited card. With it, you can get on and off the subway as many times as you like within some period of time. Surely, the MTA prices it to make it worth the money.


However, when I do the calculation for my own behavior, I never seem to get my money’s worth from the unlimited. This is because the unlimited card price is always higher than the cost of buying a per ride card if you are only using the card to commute to work during the week. Even if you are using it for one round trip on the weekend, you would still pay less by buying a “pay-per-ride” card unless you buy a 30-day card.


The following table shows the cost of each unlimited ride card, followed by the number of trips that could be purchased for that same amount. Because the MTA gives a 15% bonus on all “pay-per-ride” purchases over $7, the nominal value in the table is the value that will be shown on your metro card if you purchase a pay-per-ride card.


Unlimited days | Unlimited cost | Nominal value as a “pay-per-ride” card | Trips if purchased per ride | Trips used, to and from work only, 5 days a week | Trips lost buying unlimited (work only) vs. a regular card | Trips used, work plus one weekend round trip each week | Trips lost buying unlimited (work plus weekend) vs. a regular card
1 | $7.50 | $8.63 | 4.3 | n/a | n/a | n/a | n/a
7 | $25.00 | $28.75 | 14.4 | 10 | 4.4 | 12 | 2.4
14 | $47.00 | $54.05 | 27.0 | 20 | 7.0 | 24 | 3.0
30 | $81.00 | $93.15 | 46.6 | 44 | 2.6 | 52 | -5.4


The first line shows the one-day card, which can be purchased for $7.50. You can use that same $7.50 to purchase $8.63 in value instead, which will be good for 4 trips plus $0.63. Thus, you’d only want to get a one-day unlimited card if you were making more than 4 trips (that is, more than 2 round trips).


The 7-day unlimited costs $25. If you use that same $25 to instead purchase a pay-per-ride card, you get $28.75 of value, entitling you to 14 trips (plus $0.75 of additional stored value). If you go to work every weekday during the 7-day period, you’d use just 10 trips (5 round trips). If you also use the card for one round trip during the weekend, you are up to 12 trips, still 2.4 trips short of what you could have purchased with the $25 for a pay-per-ride.


As the table shows, you are always better off purchasing pay-per-ride cards instead of unlimited cards if you are just using your metro card for commuting. Even if you take one round trip in addition to work every week, only the 30-day unlimited would be worth it, and this only if you go to work every weekday during the period and use the card once each weekend. Many people work at home from time to time and there is typically a federal holiday each month, so the 30-day figures are optimistic.
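The table's arithmetic is easy to redo for your own commuting pattern. Here is a minimal sketch assuming a $2.00 fare (implied by the table, where $8.63 buys 4 rides plus $0.63) and the 15% bonus described above. The trip counts are the table's, including 22 workdays and 4 weekend round trips for the 30-day card; swap in your own numbers.

```python
FARE = 2.00   # assumed per-ride fare, implied by the table
BONUS = 0.15  # pay-per-ride bonus on purchases over $7, per the text

# (days, unlimited price, work trips, work plus one weekend round trip), from the table
CARDS = [
    (7, 25.00, 10, 12),
    (14, 47.00, 20, 24),
    (30, 81.00, 44, 52),
]


def pay_per_ride_trips(price: float) -> float:
    """Rides the same money buys on a pay-per-ride card, bonus included."""
    return price * (1 + BONUS) / FARE


for days, price, work, work_plus_weekend in CARDS:
    trips = pay_per_ride_trips(price)
    # A positive "lost" value means the pay-per-ride card is the better deal.
    print(f"{days:>2}-day: {trips:.1f} rides; lost (work only) {trips - work:.1f}; "
          f"lost (work + weekend) {trips - work_plus_weekend:.1f}")
```

Only the 30-day card with a weekend trip every week comes out negative, which is the one case where the unlimited wins.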


The other issue with the unlimited is a psychological one: I get upset if I forget my unlimited card or end up not taking the subway a couple days when I could have used the card. With the pay-per-ride, you only pay for what you use. Perhaps more annoyingly, the pay-per-ride cards display the amount left each time you enter the subway, but the unlimited cards do not tell you the number of days left on your card when you enter the subway, and thus, if you don’t keep track of it yourself, you will be jammed in the legs with a locked turnstile at least once a month when you purchase an unlimited card.


I realize there are some who not only commute to work but also very frequently take subway trips to go out or run errands. For those, the unlimited cards may be worth it. For others, stick with the pay-per-ride.

Nutty about Peanuts.

Visiting South Carolina this weekend, I picked up an old Southern favorite from Publix: peanut butter cookies (I didn’t see my real favorite: boiled peanuts). No sooner had I returned home than my Mom admonished me for buying them, because they were unsafe, and possibly tainted with Salmonella.


Sure enough, there was an article in The State confirming the outbreak. So far, around 500 people around the country have been sickened (and possibly 6 deaths) from what is believed to be contaminated peanuts. USA Today confirms the continuing “epidemic” today. While these figures seem high, 500 people sickened with food poisoning in a period of four months, across the entire U.S., is hardly a risk worth mentioning. According to one website, the number of incidents of food poisoning or sickness is 200,000 a day. OK, you might say, but Salmonella is pretty serious and if you don’t take antibiotics you might be laid up for several days. Fine, but the same site says that there are about 1.4 million cases of Salmonella annually, or about 3,835 a day (the CDC says about 40,000 cases are reported annually, but that there are many more unreported).
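To put the outbreak on a per-day footing, here is a quick sketch of the comparison. The "four months" is approximated as 120 days, which is my assumption. The case counts are the ones quoted above.

```python
OUTBREAK_CASES = 500
OUTBREAK_DAYS = 120  # "four months", approximated as 120 days
SALMONELLA_PER_YEAR = 1_400_000

outbreak_per_day = OUTBREAK_CASES / OUTBREAK_DAYS
salmonella_per_day = SALMONELLA_PER_YEAR / 365

print(f"{outbreak_per_day:.1f} outbreak cases/day vs "
      f"{salmonella_per_day:,.0f} Salmonella cases/day")
print(f"outbreak share of daily cases: {outbreak_per_day / salmonella_per_day:.2%}")
```

On these numbers, the peanut outbreak accounts for roughly a tenth of one percent of daily Salmonella cases.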


So why are we getting exercised about a mere 4 cases a day, as with the current outbreak? My best answer is that 1) it makes for interesting news, 2) any problem that affects so broad a population, even with minuscule or infinitesimal risk, is seen by reporters as being important, and 3) people cannot easily assess their relative risk.


As for me, I explained to my Mom that I’m not too concerned, and quickly had a peanut butter cookie before she could run back to the store. After waiting a day or so to make sure I was Salmonella-free, the rest of the family followed. 😉


Are we entering “unprecedented” territory?


Reports of gloom and doom abound, and, I must admit, I believe most of them.


I am going to focus purely on the stock market, because the data is readily available, and because I believe the broader economic problems are only just beginning. My March post pointed out that in the stocks versus bonds 20-year view, stocks almost always won, but the results are much more mixed over shorter periods. I also need to point out that I overestimated the results for stocks by assuming dividends were not included in the indices. For the Dow indices, the subject of much of that discussion, dividends are included (see the Dow Jones site), so the graphs in that post are correct, but the numbers should not be adjusted further for dividends, meaning that stocks’ edge over bonds is less impressive.


Today’s post, though, is really about the graph above, showing 1-, 10-, and 20-year returns on the Dow since 1928 (from December to December). From December 3, 2007 through December 1, 2008, the Dow lost 37% of its value. This horrible run has been exceeded only once, from December 1930 to December 1931, when the Dow lost 53%. The years 1930, 1937, and 1974 (again, December to December) were the only other years in which the 12-month loss was more than 20%.


Thus, historically, though not unprecedented, the yearly drop in the Dow is, well, statistically “improbable” (that is, if you base your probabilities only on history). While the 10 and 20 year numbers are much more in line with history, they are still on the low end of the distribution. The last time the 10-year change was negative, as it is now, was 30 years ago, in 1978, in the waning years of very tough economic times.


The next few months will start to indicate how deep an economic hole we’ve dug for ourselves, but the stock market numbers are not encouraging, and the extent to which the economy is dependent on the market (in the sense that assets are tied to it) seems much more like the 20s and 30s than like the 70s. Let’s hope I’m wrong.


Election Prediction Explained

So here’s the explanation.


I am following three major websites now:

– Electoral-vote: This consolidates polls by state to predict the Electoral College count. Electoral-vote apparently uses simple averaging to consolidate its data. I prefer this method because it requires little interpretation on their part. Interpretation involves assumptions about bias in the polls, and I believe it is hard to figure out the exact impact of the bias or even its direction. Electoral-vote has Obama at 364. At this time in 2004, they had Kerry at 283 (see this page), whereas his election day total was 252, with the main difference being Florida. More telling, the “strong” Obama states total 264 votes, as opposed to 95 for Kerry at this point.

– The second site also consolidates polls by state to predict the count, but uses a complex weighting system. It’s a neat idea, but its end result is about the same as averaging, and I am not at all convinced it is better. They’ve got Obama at 346.5, much more than the 270 needed to win.

– Gallup: This well-established survey company is different from the two sites above in that it actually conducts the polls. Gallup is showing primarily national results, and has Obama significantly up, both in raw percentages and when adjusting for “likely” voters (people Gallup has determined are likely to vote, based on two different models). Gallup’s daily tracking poll has shown Obama’s lead almost unchanged since the start of October (the changes have never exceeded the statistical margin of error).


My conclusion from the above—Obama will be the next US President.


So why the change from before, when I said polls are difficult to trust and spoke of biases?


Three reasons:


1) the closer we get to the election, the better the correlation between intentions and actions


2) the closer we get to the election, the fewer undecided voters. A recent Reuters poll shows this at about 2%. Even if it is 5% and the undecided break 4 to 1 for McCain, he’s going to lose.
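The arithmetic behind point 2 fits in a few lines. This is a sketch that assumes every undecided voter ends up choosing one of the two candidates. It only computes McCain’s net gain, which you would then compare with whatever lead the polls show for Obama; that lead isn’t stated here.

```python
def net_swing(undecided_share, mccain_share_of_undecided):
    """Net points McCain gains over Obama if the undecided vote splits as given.

    undecided_share: fraction of voters still undecided (e.g. 0.05 for 5%).
    mccain_share_of_undecided: fraction of them who break for McCain (4 to 1 -> 0.8).
    Assumes every undecided voter picks one of the two candidates.
    """
    mccain = undecided_share * mccain_share_of_undecided
    obama = undecided_share * (1 - mccain_share_of_undecided)
    return 100 * (mccain - obama)


print(f"5% undecided, 4 to 1 for McCain: {net_swing(0.05, 0.8):.1f} points")
print(f"2% undecided (the Reuters figure), 4 to 1: {net_swing(0.02, 0.8):.1f} points")
```

Even the pessimistic case moves the race by only about 3 points.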


3) The biases appear to lean in Obama’s favor: more young voters are likely to turn out, and more people are voting early. Very biased reporting from Grandma in S.C. says that lots of young people were out voting early (she spent 2 hours on line to vote early, by the way).

Election Polls

A short note about election polls, which I’ve been following somewhat religiously for the last few weeks.


Election polls differ in at least four significant ways from actual voting.


First, polls are typically of around 1,000 people or fewer, which means that, at best, they are statistically precise to within plus or minus three percentage points. This means that a six-point difference between two candidates may be nothing more than sampling error (i.e., a statistical anomaly).
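Where does “plus or minus three” come from? Here is a minimal sketch. It assumes a simple random sample and the usual normal approximation at 95% confidence. Real polls weight their samples, which usually widens the margin somewhat.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for one candidate's share, from a simple random sample of n people.

    p=0.5 gives the widest margin; z=1.96 corresponds to 95% confidence.
    """
    return z * math.sqrt(p * (1 - p) / n)


for n in (500, 1000, 2000):
    print(f"n = {n}: plus or minus {100 * margin_of_error(n):.1f} points")
```

For 1,000 respondents this gives roughly 3.1 points. The uncertainty in the *gap* between two candidates is roughly double the margin for either one alone, which is why a six-point difference can be within sampling error.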


Second, polls tend to be of the general population and not of likely Electoral College votes, which is how the election is counted (though some sites do tally Electoral votes based on state polls). As we know from recent elections, the Electoral vote percentages frequently (and seemingly increasingly) do not correspond to popular vote percentages.


Third, polls are snapshots of how people feel on a certain day. Americans seem to be particularly fickle in their opinions recently, perhaps due to the economic turmoil, so don’t trust that today’s lead won’t disappear tomorrow.


Finally, many polls do not remove unlikely voters (though you do see some figures concerning “likely voters”). Polls of people who do not vote are fairly useless, but pollsters haven’t been very successful in predicting who will actually vote. Thus, the tendency is to include respondents who are registered and say they plan to vote, without looking at their demographics to see what they’ve done in the past.


For all these reasons, if you’re an Obama supporter, you should be worried and if you’re a McCain supporter, you should have some hope. Either way, vote!

The Atlantic Monthly is criminally misusing statistics

I spent the last week vacationing in South Carolina, where my parents’ house seems to have issues of the Atlantic Monthly and Harper’s from the dawn of time. What luck, then, that one of the most interesting articles (at least statistically) was in the Atlantic’s July/August 2008 issue, a recent one by that house’s standards. The article is called “American Murder Mystery” and it’s by Hanna Rosin.


The article talks about the recent increase in violent crime in mid-sized cities. In many of these cities, government housing projects have been torn down. In their place, the government has provided the poor with rent subsidies (“Section 8” vouchers) so that they can move to private housing. Rosin describes how Phyllis Betts and Richard Janikowski, of the University of Memphis, tie the increase in crime in these cities to the destruction of these projects. A striking quote in the article is from the Memphis police chief, Larry Godwin: “It used to be the criminal element was more confined. Now it’s all spread out.”



The primary statistical evidence the article gives for an association between crime and former housing-project residents is a map showing that areas with many crime incidents correspond to areas with many people receiving Section 8 subsidies (i.e., former residents of housing projects). As convincing as this might sound, it has a fatal flaw: the map looks at total incidents rather than the crime rate. This means that an area with 10,000 people and 100 crimes (and 100 Section 8 subsidy recipients) will look much worse than an area with 100 people and 1 crime (and 1 Section 8 subsidy recipient). However, both areas have the same rate of crime (1 crime per 100 residents) and, presumably, the same odds of being a victim of crime (see my earlier blog about the safest place to live for some explanation of the use of rates in measuring crime). Yet in Betts and Janikowski’s analysis, the area with 10,000 people has a higher number of Section 8 subsidy recipients and higher crime, thus “proving” their theory of association.


Of course, there will be both a greater number of Section 8 subsidy recipients and a greater number of crimes in the area with 10,000 people than in the area with 100 people. Thus, while the map presented in the Atlantic article does indeed seem to indicate that there is higher crime in areas where there are more Section 8 subsidies, this differential might be entirely an artifact of population density, and, in fact, the crime rate may be completely unrelated to where Section 8 subsidy recipients reside. Without an adjustment for population density, the inferences made from the association are statistically meaningless.
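The fix is to divide by population before comparing areas. Here is a minimal sketch using the two hypothetical areas from the example above. These are the post’s illustrative numbers, not data from the Atlantic article.

```python
def crime_rate_per_1000(crimes, population):
    """Crimes per 1,000 residents."""
    return 1000 * crimes / population


# Hypothetical areas from the example: (population, crimes, Section 8 recipients)
areas = {
    "large area": (10_000, 100, 100),
    "small area": (100, 1, 1),
}

for name, (population, crimes, recipients) in areas.items():
    rate = crime_rate_per_1000(crimes, population)
    print(f"{name}: {crimes} crimes, {recipients} recipients, "
          f"{rate:.1f} crimes per 1,000 residents")
```

The raw counts differ a hundredfold, but both areas come out at 10 crimes per 1,000 residents.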

Statistics in Politics – Lies and Damn Lies

The nice thing about politicians and the newspaper columnists who write about them is that they lie a lot about statistics. That makes a blog that points out the errors easy to write. This week’s subject is David Brooks’ latest New York Times column.


Brooks takes issue with Obama’s claim that his fundraising comes from a broad base of small donors, and goes on to compare the money Obama and McCain have raised from various special interest groups. I am not going to attack the actual dollar figures that Brooks gives. He cites no sources whatsoever, which makes them hard to attack anyway. Instead, I am going to show how presenting raw numbers without proper context creates a biased picture.


Let’s take Brooks’ first claim: He says “lawyers account for the biggest chunk of Democratic donations” and have donated $18 million, as compared to $5 million for McCain. This makes it sound as if 1) Obama is getting most (“biggest chunk”) of his donations from one big special interest group (lawyers) and 2) Obama is getting more than 3 times as much from this group as McCain.


Here’s the problem: Obama has outraised McCain by more than 2 to 1. According to CBS News, Obama’s total amount raised is $295.5 million compared to McCain’s $121.9 million. Thus, the $18 million raised from lawyers represents only about 6% of Obama’s money. Still a lot of money, but it puts the “biggest chunk” in context. McCain’s $5 million raised from lawyers, on the other hand, represents about 4% of the total money he raised. Thus, Obama is getting more from lawyers as a percentage, but instead of 18 to 5, or more than 3 times as much, it’s 6% to 4%, or about 50% more. Another issue is that there is a difference between individuals who are lawyers and public interest groups for lawyers. Brooks is trying to blur those lines by grouping all their donations together (to be fair, he does not say “special interest groups”). A lot of lawyers certainly support some of the public interest groups, but others do not. Also, these groups can be at odds with one another, so grouping all lawyers together gives you the bigger number but is inaccurate.
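The percentages above can be checked with a few lines of code. This uses only the figures quoted in the post, in millions of dollars: $18 and $5 from lawyers, and $295.5 and $121.9 in total.

```python
def share(part, total):
    """Fraction of a candidate's total fundraising that came from one group."""
    return part / total


# Figures in millions of dollars, as quoted above (Brooks for lawyers, CBS News for totals).
obama_share = share(18.0, 295.5)
mccain_share = share(5.0, 121.9)

print(f"Obama:  {obama_share:.1%} of funds from lawyers")
print(f"McCain: {mccain_share:.1%} of funds from lawyers")
print(f"Ratio of raw dollars: {18.0 / 5.0:.1f} to 1")
print(f"Ratio of shares:      {obama_share / mccain_share:.2f} to 1")
```

The raw-dollar ratio is 3.6 to 1, while the ratio of shares is just under 1.5 to 1. The same `share` comparison applies to each of the professions discussed below.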


Brooks goes on to compare several other professional groups. In each of these areas, Obama receives more money in absolute dollars. However, in terms of percentage of total donations, McCain usually receives more: from financial securities workers, McCain gets 36% more as a percent of his total; from real estate workers, McCain gets 94% more; from bank workers, McCain gets 82% more; from hedge fund workers, McCain gets 29% more; from medical/health care workers, McCain gets 4% more.


There are two other areas (in addition to lawyers) where Obama is receiving more in percentage terms. The first is “communications and electronics”, where Obama is getting 106% more in percentage terms. The second is “Professors and other people who work in education.” In that area, Obama gets a whopping 4 times as much as McCain as a percentage of total funds raised. Brooks implies that these are “part of a spontaneous movement of small-money enthusiasts,” but he doesn’t support that with any evidence showing that these groups are anything more than an unorganized group of individuals–all the polls have indicated that more educated people lean toward Obama, so why wouldn’t they give more?


The last thing that Brooks points out is that although, as Obama claims, 90% of his donors gave less than $200, only 45% of his donated money comes from such small donors. This is a good point, and Obama, who has been claiming this for a while, should be called on it.


However, it would be more interesting to look at the percent of small donors and money from small donors in McCain’s campaign as a comparison. You can bet that it’s less than 45% of donated money and less than 90% of donors. A comment on Brooks’ article by a New Republic blogger puts it into context, though, pointing out that “31 percent of Bush’s money in 2004 came from donations of $200 or less (compared to 16 percent in 2000). Kerry, meanwhile, raised 37 percent…” (the blog cites an article on 2004 donations by Joseph Graf as its source). Thus, 45% is a lot, but the number has been increasing for both parties, the most obvious reason being that the Internet has allowed candidates to reach out easily to everyone, rather than raising most of their money through $1,000-a-plate dinners and the like (campaign finance reform, which limits individual contributions, also played a role in raising the percentage that comes from smaller donors).


The lesson here is, of course: “don’t believe the numbers.” David Brooks is going to make them look good for McCain (he’s a columnist, not a reporter), just as other columnists are going to make them look good for Obama.

Let’s make a deal problem


Those of us who grew up with the show Let’s Make a Deal can understand the gist of the Let’s Make a Deal problem right away. For those of you too young (or old) to remember, here is a summary.


Monty Hall, the host, allows you to choose one of three curtains. Behind one of the curtains is a new car or another big prize, while behind the other two is a year’s supply of shampoo or the equivalent. You choose Curtain 1. Monty opens Curtain 2 and shows you it has a year’s supply of the shampoo. Then he gives you a choice:


a) stick with your original decision, or

b) switch to Curtain 3.


The intuitive conclusion is that it doesn’t matter: there are two curtains remaining, and they are equally likely to contain the prize. However, in this case, the intuition is wrong. If we assume Monty


1) always shows a curtain with the shampoo behind it;

2) never reveals the curtain you chose; and

3) randomly decides which of the remaining two curtains to reveal if the curtain you chose contains the car,


then switching to Curtain 3 gives you a 2/3 chance of winning while sticking to Curtain 1 gives you a 1/3 chance of winning.
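If the 2/3 answer still feels wrong, a quick simulation can help. This is a minimal Monte Carlo sketch that follows the three assumptions above. You always start with Curtain 1, and only the rounds in which Monty opens Curtain 2 are kept, matching the story. The trial count and seed are arbitrary choices.

```python
import random


def simulate(trials=100_000, seed=0):
    """Estimate win rates for sticking and switching, given that Monty opened Curtain 2.

    You always pick Curtain 1. Monty never opens your curtain or the car's curtain,
    and picks at random between Curtains 2 and 3 when both hide shampoo.
    """
    rng = random.Random(seed)
    rounds = stick_wins = switch_wins = 0
    for _ in range(trials):
        car = rng.choice([1, 2, 3])
        opened = rng.choice([c for c in (2, 3) if c != car])
        if opened != 2:
            continue  # only count the situation in the story
        rounds += 1
        stick_wins += (car == 1)   # sticking keeps Curtain 1
        switch_wins += (car == 3)  # switching moves to Curtain 3
    return stick_wins / rounds, switch_wins / rounds


stick, switch = simulate()
print(f"stick with Curtain 1: {stick:.3f}")
print(f"switch to Curtain 3:  {switch:.3f}")
```

Over many rounds, sticking wins about a third of the time and switching about two thirds.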




This problem, like many probability problems, is one of information. Initially, you have no information about any of the three curtains, so each choice gives you a 1/3 chance of winning. By showing you the curtain with the shampoo, Monty has told you nothing new about the curtain you originally chose, because there was no way, whether your curtain had the car or the shampoo, that Monty was going to show you what was behind your curtain. Your curtain had, and still has (as far as you know), a 1/3 chance of containing the car. However, you did get information about Curtain 3: Monty did not choose to reveal it. This could mean one of two things:


A. Curtain 3 has the car, and therefore Monty had to show you Curtain 2, as he would never reveal the curtain with the car (1/3 chance, calculated by taking the 1/3 chance that Curtain 3 has the car and multiplying by the 100% chance that he reveals Curtain 2 when the car is behind Curtain 3); or


B. Curtain 1 has the car, and Monty chose to reveal Curtain 2 (1/6 chance, calculated by taking the 1/3 chance that Curtain 1 has the car and multiplying by the ½ chance that Monty reveals Curtain 2 when the car is behind Curtain 1).


These probabilities do not sum to 1, because we are excluding the outcomes, now impossible, where Monty reveals Curtain 3. To revise the probabilities to take into account what Monty revealed, we need to divide the probabilities in A (1/3) and B (1/6) above by the chance of the two possible remaining outcomes (1/3 plus 1/6 = 1/2). Thus, outcome A (car is behind Curtain 3) has a probability of (1/3)/(1/2) = 2/3, while outcome B (car is behind Curtain 1) has a probability of (1/6)/(1/2) = 1/3.


The intuition is as follows: Monty always reveals Curtain 2 or 3 when you choose 1, so this revelation gives you no more information about whether the car is behind 1. You do, however, gain information about 2 and 3, since he never reveals 3 if the car is behind it but does sometimes reveal 3 if the car is not behind it. Thus, the fact that Monty did not reveal Curtain 3 tells you something.


[Note: this problem has been around for a while, but was made famous by Marilyn vos Savant’s discussion of it and the subsequent outcry by those who insisted her answer, the correct one, was wrong. See, for example:]


Technical Explanation


There is a whole class of problems in probability that involve updating the chances based on new information. These problems are solved using Bayes’ Rule, a law of probability that specifies how to update probabilities with new information (for a full discussion, including whether the Reverend Bayes was actually the first to discover this theorem, see the Wikipedia entry on Bayes’ theorem).


To understand Bayes’ Rule, we first need to know the notation used for conditional probability. We use the vertical line ( | ) to denote a condition and, as in prior blogs, P(A) is the probability that event A occurs. Thus, P(A|R2) is the probability that A occurs, given that R2 has already occurred. Bayes’ Rule is:


P(A|R2) = P(R2|A)*P(A) / P(R2)


So let:


A=event that prize is under Curtain 3

R2= event that Monty reveals the curtain 2 contents

C=event that prize is under Curtain 1


Now we can figure out the right side of the Bayes’ Rule equation, in order to figure out P(A| R2).


We know P(R2|A) = 1, because Monty won’t reveal curtain 3 when it contains the prize and he won’t reveal curtain 1 because you chose curtain 1.


P(A) = P(C) = 1/3 ==> remember, this one is unconditional, so given three curtains, there’s a 1/3 chance of the prize being behind each.


To figure out P(R2), it is useful to note that for any events R2 and A, P(R2 and A) = P(A) * P(R2|A)


In our case, P(R2) is the sum of the probabilities of two mutually exclusive events:


1) prize is under curtain 3 (event A) and Monty reveals curtain 2 (event R2): 1/3 * 1 = 1/3

2) prize is under curtain 1 (event C) and Monty reveals curtain 2 (event R2): 1/3 * 1/2 = 1/6.


This sum, 1/3 plus 1/6, is ½ = P(R2).


Thus, by Bayes’ Rule, P(A|R2) = (1*1/3) / ½ = 2/3


Just for fun, now you can compute P(C|R2) = P(prize is under Curtain 1 given that Curtain 2 is revealed) = 1/3 using Bayes’ Rule.
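For anyone who wants to check the exercise, here is a small sketch that applies Bayes’ Rule with exact fractions. It lists the third possibility (prize behind Curtain 2) explicitly, with zero likelihood, because Monty never reveals the prize. That term contributes nothing to P(R2), which is why the text needed only two events.

```python
from fractions import Fraction

# Prior: the prize is equally likely to be behind each curtain.
prior = {1: Fraction(1, 3), 2: Fraction(1, 3), 3: Fraction(1, 3)}

# Likelihood P(R2 | prize behind curtain k), given that you chose Curtain 1.
likelihood_r2 = {
    1: Fraction(1, 2),  # Monty picks Curtain 2 or 3 at random
    2: Fraction(0),     # Monty never reveals the prize
    3: Fraction(1),     # Monty must reveal Curtain 2
}

# P(R2): sum over the mutually exclusive locations of the prize.
p_r2 = sum(prior[k] * likelihood_r2[k] for k in prior)

# Bayes' Rule: P(prize behind k | R2) = P(R2 | k) * P(k) / P(R2)
posterior = {k: likelihood_r2[k] * prior[k] / p_r2 for k in prior}

print("P(R2) =", p_r2)
for k, p in posterior.items():
    print(f"P(prize behind Curtain {k} | R2) = {p}")
```

The results match the text: P(R2) = 1/2, Curtain 3 has probability 2/3, and Curtain 1 has probability 1/3.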


False Positives in Cancer Diagnoses


The outcome of Bayes’ Rule can be very counterintuitive, and it is worth keeping in mind for problems far more important than the Let’s Make a Deal problem. For example, suppose an MRI for breast cancer has a false negative rate of 1 in 100, meaning that 1 time in 100 the test will incorrectly indicate that you do not have cancer when in fact you do. Similarly, the test might also have a false positive rate of 1 in 100, meaning that 1 time in 100 the test will incorrectly indicate that you do have cancer when in fact you do not (cumulative false positive rates for MRIs can be much higher, because the scans are frequently done once or twice a year: see the recent article about a study of false positives in MRIs for breast cancer screening, which were around 25% over time).


Suppose your MRI result just came out positive for breast cancer.What are the chances you actually have breast cancer?


First, it’s useful to know that around 250,000 women a year get breast cancer (see this site) and there are about 60 million women above the age of 40 (see census site), the age group in which most cases occur. This represents an annual incidence rate of about 1 in 240, which I’ll round to 1 in 200.


Let’s define the probabilities:


P(C) = Probability of breast cancer in a given year = 1/200 = 0.005


P(D| not C) = Probability that MRI diagnosed cancer given that you do NOT have cancer = false positive = 1/100 =0.01


P(N|C) = Probability that the MRI did not diagnose cancer given that you have cancer = false negative = 1/100 = 0.01


P(D|C) = 1-P(N|C) = Probability that MRI diagnosed cancer given that you have cancer = 99/100 =0.99


We want P(C|D) = Probability of cancer, given a cancer diagnosis by MRI.


Before using Bayes’ Rule, we can first define P(D) as the sum of the probabilities of all mutually exclusive events that include D. In English, the chance of a diagnosis is the sum of 1) the chance that you have cancer and are diagnosed and 2) the chance that you do not have cancer and are diagnosed. Thus, P(D) = P(D|C)*P(C) + P(D|not C)*P(not C) = 0.99*0.005 + 0.01*0.995 = 0.0149


Using Bayes’ Rule:


P(have cancer given the MRI result shows cancer) = P(C|D) = P(D|C)*P(C)/ P(D) = 0.99 * .005 / .0149 = .33 or about 1/3.
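Here is the same calculation as a small function, so you can try other prevalence or error rates. The rates plugged in below are the illustrative ones from the text, not measured values for any real MRI.

```python
def posterior_given_positive(prevalence, false_positive_rate, false_negative_rate):
    """P(cancer | positive MRI) by Bayes' Rule.

    prevalence = P(C); false_positive_rate = P(D | not C); false_negative_rate = P(N | C).
    """
    sensitivity = 1 - false_negative_rate                      # P(D | C)
    p_positive = (sensitivity * prevalence
                  + false_positive_rate * (1 - prevalence))    # P(D)
    return sensitivity * prevalence / p_positive               # P(C | D)


print(f"P(cancer | positive MRI) = {posterior_given_positive(0.005, 0.01, 0.01):.3f}")
```

Raising the false positive rate, even modestly, pushes the posterior down further. With a rare condition, false positives from the large healthy group dominate.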


Thus, even a very effective MRI test for cancer, which gives the wrong result only 1% of the time, is still suspect when it indicates cancer. In fact, an MRI diagnosis of cancer indicates only a 1/3 probability of actually having cancer (keep in mind that the error rates I used here for the MRI are made up, though there are indications that real false positive rates are at least in the 1% range).


It’s easy to understand what happens logically when you imagine that 200 women come in for screening. Only 1 will probably have cancer, since the cancer rate is about 1 in 200. The MRI will almost surely diagnose her (99% chance). For the other 199, the MRI will indicate no cancer for all but about 1%, which means it will indicate cancer for about 2 of them. Thus, of the 3 cases where the MRI indicates cancer, 2 will be false positives.

Why are there too many boys in China?

For a long time now, the ratio of males to females in China has been increasing. In fact, one of the most recent articles I could find on it was from 2004, where the ratio stood at around 120 boys to every 100 girls (see the msnbc article).


It’s clear to most that the combination of the one-child law, which prevents most Chinese couples from having more than one child, and the preference in China for boys is driving this (though there are other explanations, including the possible effects of some diseases on the sex ratio: see this BusinessWeek article).


There are two sinister mechanisms for ensuring that your only child is a boy: selective abortion or infanticide. Yet there is another option: just have another baby if the first is a girl, and don’t tell the government. I think this third option is more likely, because I do not think most families can afford an abortion (illegal for sex selection) and very few mothers would kill their babies.


So how much does this non-reporting need to happen to change the ratio from the normal 106 to 100 male to female births to the abnormal 120 to 100?


The answer to this is the combination of three things: 1) percent of births that are girls (with no intervention), 2) percent of families that have another baby (hoping for a boy), given the first is a girl, and 3) the percent of families that do not tell the government about the first baby.


Let’s call these percentages Pg, P2, and Ps (for girl, 2nd child, and secret). Let’s also call Pr the reported percent of girls, which is 100/220, or 45.45%. We’ll also assume for simplicity that families quit trying when they have a boy or have 2 children, whichever comes first. Also, we’ll assume families always report the first child if it is a boy or if they have no more children.


Pg is known at around 100/206=48.54%
P2 and Ps are unknown.


We want to figure out what P2 and Ps could lead to the Pr being 45.45% when Pg is 48.54%.


First, consider that, given the ground rules above, the following are the types of families that can exist (in birth order):
B (boy, one child only)
G (girl, one child only)
GB (girl boy, two children)
GG (girl girl, two children)


To figure out the percent of girls reported, we need the total girls reported divided by the total children reported. This is easy to figure out for each combination above:
B = 0 girls / 1 child
G = 1 Girl / 1 child
GB = 0 girls / 1 child Ps percent of the time, and 1 girl / 2 children (1-Ps) percent of the time
GG = 1 girl / 1 child Ps percent of the time, and 2 girls / 2 children (1-Ps) percent of the time


We are almost there. Now we just need to sum the numerators multiplied by their probabilities and the denominators multiplied by their probabilities. Here are the probabilities of each family combination:
B = 1-Pg
G = Pg*(1 – P2) ==> It’s just Pg times the percent of families who do not have more children
GB = Pg*(P2)*(1-Pg) ==> It’s the chance of a girl, followed by the decision to have a 2nd child, followed by having a boy.
GG = Pg*(P2)*Pg=Pg^2*P2


Thus the numerator (the average number of girls reported) is:
Num = (1-Pg)*0 +
Pg*(1-P2)*1 +
Pg*P2*(1-Pg)*Ps*0 +
Pg*P2*(1-Pg)*(1-Ps)*1 +
Pg^2*P2*Ps*1 +
Pg^2*P2*(1-Ps)*2


and the denominator (the average number of children reported) is:
Den = (1-Pg)*1 +
Pg*(1-P2)*1 +
Pg*P2*(1-Pg)*Ps*1 +
Pg*P2*(1-Pg)*(1-Ps)*2 +
Pg^2*P2*Ps*1 +
Pg^2*P2*(1-Ps)*2


We know that, in China, Pr= Num/Den = 45.45% and that, in general, Pg=48.54%. Thus, we can solve .4545=Num/Den in terms of Ps and P2.


Since we have one equation and two unknowns, there are infinitely many solutions, but here are a few possibilities (values of Ps rounded to the nearest 5%):
0% have a second child: impossible
10% have a second child: impossible
15% have a second child and 85% of those keep the first a secret from the government
20% have a second child and 65% of those keep the first a secret from the government
30% have a second child and 45% of those keep the first a secret from the government
40% have a second child and 35% of those keep the first a secret from the government
50% have a second child and 30% of those keep the first a secret from the government
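These solutions can be reproduced numerically. Here is a minimal sketch. `reported_girl_share` adds up the Num and Den terms above, family type by family type, and `solve_secret_share` uses bisection to find the Ps that brings the reported share down to 100/220. Bisection works because the reported share falls steadily as Ps rises. Both function names are my own, for illustration.

```python
def reported_girl_share(pg, p2, ps):
    """Num / Den from the text: expected share of reported children who are girls."""
    b = 1 - pg
    # (probability of family type, girls reported, children reported)
    families = [
        (b, 0, 1),                        # B
        (pg * (1 - p2), 1, 1),            # G
        (pg * p2 * b * ps, 0, 1),         # GB, first girl kept secret
        (pg * p2 * b * (1 - ps), 1, 2),   # GB, both reported
        (pg * pg * p2 * ps, 1, 1),        # GG, first girl kept secret
        (pg * pg * p2 * (1 - ps), 2, 2),  # GG, both reported
    ]
    num = sum(prob * girls for prob, girls, _ in families)
    den = sum(prob * kids for prob, _, kids in families)
    return num / den


def solve_secret_share(pg, p2, target, tol=1e-9):
    """Find ps in [0, 1] with reported_girl_share(pg, p2, ps) == target, or None if impossible."""
    if reported_girl_share(pg, p2, 1.0) > target:
        return None  # even hiding every first girl leaves the share too high
    lo, hi = 0.0, 1.0  # share(lo) > target >= share(hi)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if reported_girl_share(pg, p2, mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


PG = 100 / 206  # natural share of girls
PR = 100 / 220  # reported share of girls in China
for p2 in (0.10, 0.15, 0.20, 0.30, 0.40, 0.50):
    ps = solve_secret_share(PG, p2, PR)
    print(f"P2 = {p2:.0%}: " + ("impossible" if ps is None else f"Ps = {ps:.1%}"))
```

Unrounded, the required Ps is about 83% when P2 is 15% and about 45% when P2 is 30%. Setting Ps = 0 returns exactly Pg, as noted below.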


One thing to note (that is not necessarily obvious in these calculations) is that if everyone reports all the children they have (Ps=0), then the percent of girls will be exactly 48.54%, the same as if everyone had one child, as long as infanticide and selective abortion are not occurring.


But the main point here is that a small number (15%) of couples having second children and not reporting the first girl leads to the warped percentages of baby girls, if there is high under-reporting of these first children. You do not need to assume that infanticide or selective abortion plays a role at all.

What’s the chance of rain?

Everyday probability barely shows up in weather forecasts these days. For example, Yahoo’s weather will say something like “a few showers” to mean that there is a chance of showers. However, if you are a purist, go to the National Weather Service (NWS) site, where they still make predictions using probabilities. See the Chicago forecast for this week in Yahoo and at the NWS site (the online NY Times can barely be bothered to give any information at all, unless you can interpret their icons).


When it comes to data, I’ve always felt more is more, and so if I am really interested in the weather, I go to the NWS site where I’ll get more than just “chance of rain” or “a few showers.”


But what does it mean when the weather forecast says there is a 30% chance of rain Wednesday and a 50% chance of rain Thursday? If we focus on a single time period, say Thursday, the conclusion is pretty clear: there’s a 50-50 chance of rain. Put another way, when encountering conditions like this in the past, the NWS model data shows rain half the time and no rain half the time.


The inference becomes more difficult when we want to ask a more complex question. For example, suppose I’m going to Chicago Wednesday and returning Thursday night. I want to know whether to bring an umbrella. Since I hate lugging an umbrella along, I only want to bring one if there is at least a 75% chance of rain at some point while I’m there.


It turns out that the answer to this question cannot be determined with the information given (don’t you just love that choice on multiple choice tests?).


Before we explain why, though, we need some definitions and notation.
To do the math for this, we generally define each possible outcome as an event. In this case, we have the following events:
Event A: Rains Wednesday
Event B: Rains Thursday


We are interested in the chance that either Event A or Event B occurs. We have a shorthand for expressing the probability of Event A: “P(A)”.


There is a simple probability formula that is very useful here:
P(A or B) = P(A) + P(B) – P(A and B)
This formula says that the probability of Event A or Event B happening is the probability of A plus the probability of B minus the probability that A and B both happened (the event that A and B occurred is called the intersection of Events A and B). This makes sense because if we just added them (as you might intuitively do) we are double counting the times both events occur, and thus we need to subtract out the intersection once at the end.


In some cases P(A and B)=0. In other words, Events A and B never occur together. You may have noticed this comes up when you toss a coin: it is never both heads and tails at the same time (except for that time I got the coin to stand on its side). Events like these are called mutually exclusive.


In other cases P(A and B)=P(A)*P(B). This means the probability of A and B is the product of the probabilities A and B. In this case, the two events can both occur, but they have nothing to do with each other. Events like these are called independent events.


In still other cases, P(A and B) is neither 0 nor P(A)*P(B).


If we assume the events A and B are mutually exclusive, then there’s an 80% chance (30 + 50) of rain either Wednesday or Thursday. This seems unlikely, though, because a storm can easily last more than one day.


If we assume the events A and B are independent, then there’s a 65% chance of rain either Wednesday or Thursday. This is a little more complicated to calculate, because we need to figure out the chance of it raining both Wednesday and Thursday, which under independence is P(A)*P(B) = 30%*50% = 15%. Thus:
P(A or B) = P(A) + P(B) – P(A and B) = 30% + 50% – 15% = 65%.

[We could also figure out the chance of no rain on either day. Since rain on the two days is independent, no rain on the two days is also independent. Also, the chance of rain plus the chance of no rain must be one. Thus P(no rain Wednesday) = 1 – P(A) = 100% – 30% = 70%. Similarly, P(not B) = 100% – 50% = 50%. Then, the chance of no rain on either day = P(not A and not B) = 70%*50% = 35%. Thus, there is a 35% chance it will not rain on either day, and we can conclude there would be a 65% chance of rain on at least one of the days, of course all hinging on the independence assumption.]


Okay. So finally, we have two probabilities, 80% and 65%, based on two different and rather extreme assumptions. On the high side, 80% is the most extreme. We can see this by noting that in order to get a larger number, we’d have to plug in a negative probability for the value of P(A and B) in the general formula (which does not assume independence or anything else):
P(A or B) = P(A) + P(B) – P(A and B)


Since probabilities must always be at least 0 and at most 100%, we cannot have a negative number for P(A and B). So at most, the chance of rain Wednesday or Thursday is 80%.


But what about the lowest the chance might be? Independence seems a pretty extreme assumption in the other direction, but in fact it is not. What would lead to the smallest probability is if the two events A and B were highly related, in fact so related that P(B) = 1 if A occurs. This would mean that P(A and B) = 30% (the smaller of P(A) and P(B)). This would lead to a probability that it rains either Wednesday or Thursday of just 50%:
P(A or B) = P(A) + P(B) – P(A and B) = 30% + 50% – 30% = 50%
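The reasoning above boils down to a range. Here is a minimal sketch that computes the lowest and highest possible P(A or B) from P(A) and P(B) alone, plus the value under independence. The min(1, ...) cap is a small generalization for cases where P(A) + P(B) exceeds 100%, which doesn’t arise with these forecasts.

```python
def union_bounds(p_a, p_b):
    """Lowest and highest possible P(A or B) when only P(A) and P(B) are known.

    Lowest: one event always comes with the other, so P(A and B) = min(P(A), P(B)).
    Highest: the events overlap as little as possible (mutually exclusive if P(A) + P(B) <= 1).
    """
    lowest = max(p_a, p_b)
    highest = min(1.0, p_a + p_b)
    return lowest, highest


def union_if_independent(p_a, p_b):
    """P(A or B) = 1 - P(not A) * P(not B); valid only for independent events."""
    return 1 - (1 - p_a) * (1 - p_b)


p_wed, p_thu = 0.30, 0.50
low, high = union_bounds(p_wed, p_thu)
print(f"P(rain Wed or Thu) is somewhere between {low:.0%} and {high:.0%}")
print(f"If the two days are independent: {union_if_independent(p_wed, p_thu):.0%}")
```

Since the range (50% to 80%) straddles my 75% umbrella threshold, the forecast alone can’t settle the question.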


So now that we’ve got rain down, let me go back to the original impetus for this blog: it is easy to make the wrong inference when given information about the chances of a series of events.


The recent Freakonomics blog about chances of birth defects addresses this issue. In it, Steven Levitt describes a couple who were told that there was a 1 in 10 chance that a test, which showed an embryo was not viable, was wrong. The test was done twice on each of two embryos, and all four times the outcome was that the embryos were not viable. Thus, the lab told them that the chance of having healthy twins from two such embryos was 1 in 10,000. Of course, after reading about rain above, you recognize this as the application of the independence assumption (1/10 times 1/10 times 1/10 times 1/10 equals 1 in 10,000). The couple didn’t listen to the lab, though, and, nine months later, 2 very viable and very healthy babies were born.


Post hoc, it seems the lab should have (at least) said the chances were somewhere between 1 in 10 and 1 in 10,000. In addition, the 1 in 10 seems like an awfully round number; could it have been rounded down from some larger probability (1 in 8, 1 in 5, 1 in 3, who knows?). Levitt wonders whether the whole test is just nonsense in the first place.


So what do you do when confronted with a critical medical problem and a slew of probabilities? There’s no easy answer, of course, but I believe gathering as much hard data as possible is important. Then make sure you distinguish between the inferences made by your doctor, nurse, or lab technician (which are more subject to error) and the underlying probabilities associated with the drug, the test, or the procedure (which are less subject to error).

Stocks or Bonds?

A couple of posts ago, I talked about the question of whether to rent or buy (the answer: in the long run, buy; in the short run, rent). With all the turmoil in the financial markets, it seems a good time to take up the question of whether stocks or bonds are a better investment.


Using data from 1929 to 2007 for the Dow Jones Industrial Average (DJIA), a 20-year investment in stocks yielded an inflation-adjusted return of about 2.6% annually. That figure is before dividends, which add at least a couple of percentage points (recently, the dividend yield has been closer to 2%, while in the past 4-5% was the norm; see this article for relevant charts). Including dividends, the real return for stocks is around 6% annually.


Bonds also generally beat inflation, but by less: their average return is around 2% annually after subtracting out inflation.
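To see what the gap between roughly 6% and roughly 2% real returns means over a 20-year horizon, here is a minimal sketch. The 2.6% price return and 2% bond return come from the text; the 3.5% dividend yield is an assumed value inside the 2% to 5% range mentioned above, dividends are assumed to be reinvested, and the two returns are simply added as a rough approximation.

```python
def compound(annual_rate: float, years: int) -> float:
    """Growth multiple of $1 compounded at annual_rate for the given years."""
    return (1 + annual_rate) ** years

price_return = 0.026    # real DJIA price return per year (from the text)
dividend_yield = 0.035  # ASSUMED dividend yield, reinvested
stock_real = price_return + dividend_yield  # rough additive approximation
bond_real = 0.02        # approximate real Treasury return (from the text)

years = 20
print(f"stocks: ${compound(stock_real, years):.2f} per real dollar invested")
print(f"bonds:  ${compound(bond_real, years):.2f} per real dollar invested")
```

Under these assumptions a real dollar grows to roughly $3.27 in stocks versus roughly $1.49 in bonds; a four-point annual gap more than doubles the ending value over two decades.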


The following chart shows inflation-adjusted annual returns on the DJIA for 10- and 20-year investments (in blue and purple; data downloaded from Yahoo Finance), versus Treasury bonds (in yellow). The Treasury bond return is based on 20-year bonds when the Fed has data for them, and otherwise on 10-year bonds or on estimates from the article cited above.


The year shown is the final year of the investment. For example, if you made a 20-year investment beginning in 1985, the points for 2005 show that by the start of 2005 you would have earned approximately 10% annually after inflation, whether in bonds (yellow) or stocks (purple).
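Points like these can be computed from yearly index levels and a price index. A minimal sketch, assuming start-of-year DJIA levels and CPI values stored in dictionaries keyed by year; the function name is hypothetical and the data below are made-up placeholders, not real figures. Like the DJIA series described above, it ignores dividends.

```python
def rolling_real_annualized(levels: dict[int, float], cpi: dict[int, float],
                            window: int) -> dict[int, float]:
    """Map each END year to the inflation-adjusted annualized return of a
    `window`-year investment ending in that year (as in the chart)."""
    out = {}
    for end in sorted(levels):
        start = end - window
        if start in levels and start in cpi and end in cpi:
            nominal_growth = levels[end] / levels[start]
            inflation_growth = cpi[end] / cpi[start]
            real_growth = nominal_growth / inflation_growth
            out[end] = real_growth ** (1 / window) - 1
    return out

# Hypothetical placeholder data: the index doubles while prices rise 50%.
levels = {2000: 100.0, 2010: 200.0}
cpi = {2000: 100.0, 2010: 150.0}
result = rolling_real_annualized(levels, cpi, window=10)
print(result)  # one entry, keyed by the end year 2010 (about 2.9% per year)
```

Keying results by end year mirrors the chart’s convention, so a 20-year investment starting in 1985 appears at 2005.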




The yellow line, denoting Treasury bond returns, is mostly below the blue and purple lines, indicating that in most years a 10- or 20-year investment in the Dow Jones average beat a similar investment in Treasury bonds. In fact, stocks did better in 57 of the 68 ten-year periods. For 20-year periods the record is even more favorable to stocks, which did better in 56 of 57.
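Counting how often stocks win is a simple year-by-year comparison of the two lines. A minimal sketch, assuming both series are dictionaries of annualized real returns keyed by end year; the helper name and the tiny example data are hypothetical.

```python
def stock_win_share(stocks: dict[int, float], bonds: dict[int, float]) -> tuple[int, int]:
    """Return (end years where stocks beat bonds, end years with data for both)."""
    common = set(stocks) & set(bonds)
    wins = sum(1 for year in common if stocks[year] > bonds[year])
    return wins, len(common)

# Hypothetical example data (annualized real returns by end year).
stocks = {2001: 0.05, 2002: -0.01, 2003: 0.03}
bonds = {2001: 0.02, 2002: 0.01, 2003: 0.02}
print(stock_win_share(stocks, bonds))  # (2, 3)

# The counts quoted above, expressed as percentages.
print(f"10-year: {57 / 68:.0%}, 20-year: {56 / 57:.0%}")  # 10-year: 84%, 20-year: 98%
```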


Treasury bonds, however, are considered risk-free: there is no risk that the U.S. Treasury will default on its loans (or, if it does default, there are far more serious problems to worry about). Stocks, by contrast, come with no guarantee that they will not go down.


This basic idea, that stocks have broadly outperformed bonds and beaten inflation in the long run, seems to be well understood. It does not imply, however, that individual stock investments will outperform bonds, or that shorter-term investments will.


The “truth” about stocks implied by this graph is also a little misleading, for three reasons.


1. The graph tells you little about what might happen to a 20-year investment beginning in a particular year


The graph shows that stocks do well over most periods, but there is an enormous amount of variation, even for the long-term investments considered here. If you had made a long-term investment in 1962, at the end of a recession and the beginning of a long boom, you’d still be out of luck if you needed that money in 1982, when its real value would have been far less than what you invested (the 20-year point for 1982 shows a 1% annual loss over the previous 20 years, about an 18% total loss in real dollars).
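The 18% figure follows from compounding: losing 1% of real value each year for 20 years leaves 0.99^20, or about 0.82, of the original. A minimal sketch; the helper name is just illustrative.

```python
def total_from_annual(annual_rate: float, years: int) -> float:
    """Total fractional change after compounding annual_rate for `years` years."""
    return (1 + annual_rate) ** years - 1

loss = total_from_annual(-0.01, 20)
print(f"{loss:.1%}")  # -18.2%
```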


[The “safe” Treasury bond portfolio, however, would have done far worse, losing about twice as much over the same period. This is not because the U.S. defaulted on its bonds but because of inflation: the 20-year bond you bought in 1962, yielding 4%, did not keep up with rising prices. This is the long-term, somewhat hidden risk of bonds.]
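Inflation’s effect on a fixed-rate bond can be sketched the same way. This assumes a constant 4% nominal yield with coupons reinvested at that yield and the bond held to maturity, plus an assumed constant 6% annual inflation rate chosen only for illustration; actual inflation from 1962 to 1982 was uneven, so this shows the mechanism rather than reproducing the chart.

```python
def real_annual_return(nominal: float, inflation: float) -> float:
    """Exact real return: (1 + nominal) / (1 + inflation) - 1."""
    return (1 + nominal) / (1 + inflation) - 1

nominal_yield = 0.04  # the 1962 bond yield from the text
inflation = 0.06      # ASSUMED average inflation, for illustration only

r = real_annual_return(nominal_yield, inflation)
total = (1 + r) ** 20 - 1
print(f"real return: {r:.1%} per year, {total:.0%} over 20 years")
# prints: real return: -1.9% per year, -32% over 20 years
```

Whenever inflation runs above the fixed yield, the real return is negative, even though every promised payment arrives on time.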


2. The graph averages over a portfolio of stocks


Another issue is that although the graph very clearly implies that stocks are better, even in bad times, this does not necessarily mean that individual stocks are better. Returns on an individual stock, or even on a small portfolio of stocks, vary much more wildly than the Dow Jones average shown in the graph. Also, in recent years the Treasury has issued inflation-indexed bonds, which, when bought at a positive real yield and held to maturity, guarantee a real return above 0, ensuring that the yellow line in a future graph stays above 0 (see information on inflation-indexed bonds).
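Why does averaging tame the swings? A standard textbook result: for an equally weighted portfolio of n stocks that each have volatility σ and the same pairwise correlation ρ, portfolio volatility is σ·sqrt(ρ + (1 − ρ)/n). The sketch below assumes σ = 30% and ρ = 0.3, illustrative values rather than measured ones.

```python
import math

def portfolio_volatility(sigma: float, rho: float, n: int) -> float:
    """Volatility of an equally weighted portfolio of n stocks, each with
    volatility sigma and the same pairwise correlation rho."""
    return sigma * math.sqrt(rho + (1 - rho) / n)

sigma, rho = 0.30, 0.3  # ASSUMED illustrative values
for n in (1, 5, 30, 500):
    print(f"{n:>3} stocks: {portfolio_volatility(sigma, rho, n):.1%}")
```

A single stock swings with the full 30%, while large portfolios approach the floor σ·sqrt(ρ), about 16.4% here: the market-wide risk that diversification cannot remove.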


3. The future is not now


While 80 years of history show stocks in a very positive light, we need only look at Japan, whose Nikkei average has lost about 15% since 1985 after adjusting for inflation (and far more than 50% for an investment made near Japan’s stock market peak). There are some indications that the current U.S. problems (a real estate boom and bust, credit problems) are worse than Japan’s were. Much of U.S. investment in the last few decades has been fueled by the safety of the dollar. The rise of the euro and of globalization has already begun to change that, and a continuing fall in the dollar will almost certainly cause inflation, which was devastating to the stock market in the 1970s.
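Converting a cumulative loss into an annual rate makes Japan’s experience comparable with the annualized U.S. figures above. A minimal sketch; the 23-year horizon (1985 to about 2008) is an assumption about when this was written, and the helper name is illustrative.

```python
def annualized_from_total(total_change: float, years: float) -> float:
    """Annual rate that compounds to total_change over `years` years."""
    return (1 + total_change) ** (1 / years) - 1

rate = annualized_from_total(-0.15, 23)  # ASSUMED 23-year horizon
print(f"{rate:.1%} per year")  # -0.7% per year
```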


Bottom Line
There’s a lot of evidence that, in the past, long-term stock investments paid off relative to both risk-free bonds and inflation. However, there is no guarantee the party will continue, especially if there is a sea change in dollar investments.


Where’s my retirement money? Almost all in stocks…but almost all foreign stocks.