In my professional work, I like being the statistical sleuth, trying to figure out whether a person or company cheated and, if so, by how much. Thus it was with a lot of interest that I read a recent article in USA Today describing suspicious activity on some standardized tests in DC schools.
It seems that scores on standardized tests at certain DC schools have improved dramatically. For example, the article says, “in 2008, 84% of fourth-grade math students were listed as proficient or advanced, up from 22% for the previous fourth-grade class.” Of course, this could just be part of the amazing turnaround.
However, the review found that this dramatic change coincided with another interesting statistic: the school had a very high number of erased answers that were changed from wrong answers to right answers (WTR erasures). Again, here’s what the article said: “On the 2009 reading test, for example, seventh-graders in one Noyes classroom averaged 12.7 wrong-to-right erasures per student on answer sheets; the average for seventh-graders in all D.C. schools on that test was less than 1. The odds are better for winning the Powerball grand prize than having that many erasures by chance.”
Here’s my problem with this logic: the calculation of the odds assumes that each student acts independently, so a classroom full of students who each erase much more than usual is treated as a long run of separate unlikely events. In other words, the odds are calculated assuming that students are grouped into schools at random with respect to their number of WTR erasures, so no school should have a particularly high or low number of erasures: the number of erasures and the school would be statistically independent.
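To see how the independence assumption produces such extreme odds, here is a minimal sketch of the kind of calculation involved. The article does not say what model the analysts used, so everything here is an assumption: each student’s WTR erasure count is treated as an independent Poisson draw with mean 1 (generous, since the article says the district average was below 1), and the class size of 25 is hypothetical. Independence is what lets us add the students up: the sum of 25 independent Poisson(1) counts is Poisson(25).

```python
import math

def log10_poisson_upper_tail(k, mu, n_terms=1000):
    """log10 of P(X >= k) for X ~ Poisson(mu).

    Sums the first n_terms of the tail in log space so that tiny
    probabilities do not underflow to zero. Assumes the mass beyond
    k + n_terms is negligible, which holds when n_terms is large
    compared with mu.
    """
    log_terms = [j * math.log(mu) - mu - math.lgamma(j + 1)
                 for j in range(k, k + n_terms)]
    top = max(log_terms)
    log_tail = top + math.log(sum(math.exp(t - top) for t in log_terms))
    return log_tail / math.log(10)

# Hypothetical inputs, not taken from the article (except the 12.7 average).
N_STUDENTS = 25          # assumed class size
MEAN_PER_STUDENT = 1.0   # assumed per-student mean under "no cheating"
OBSERVED_AVERAGE = 12.7  # WTR erasures per student quoted for the Noyes classroom

# Under independence, the class total is Poisson(N_STUDENTS * MEAN_PER_STUDENT).
threshold = math.ceil(OBSERVED_AVERAGE * N_STUDENTS)
exponent = log10_poisson_upper_tail(threshold, N_STUDENTS * MEAN_PER_STUDENT)
print(f"P(class total >= {threshold}) is about 10^{exponent:.0f}")
```

The resulting probability is far beyond lottery odds, but only because every student is assumed to be an independent draw from the same district-wide distribution. If something shared by the whole school shifts every student’s erasure rate, this number says nothing about how surprising the school actually is.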
This independence assumption falls apart if there is cheating, in which teachers erase wrong answers and replace them with correct ones after the test is completed. However, the assumption could also fall apart for innocuous reasons.
Suppose the students at this school were instructed to fill in arbitrary guesses for the last 10 questions as soon as the exam began (this might be a good strategy if there is no penalty for guessing and many students do not finish the exam). Then the students who reach the end of the test would erase most of those guesses, and every wrong guess replaced by a correct worked answer would count as a WTR erasure. This is a completely legitimate strategy, but it would raise the number of WTR erasures a great deal. Many more complicated test-taking strategies would also lead to more erasures, and if this school in particular taught those strategies, it would very likely show far more erasures than other schools. Indeed, some of the people interviewed for the article cited strategies that may have led to more erasures.
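A small simulation makes the guess-first strategy concrete. It is only a sketch: the number of answer choices, the chance of reaching the guessed questions, and the chance of working a question correctly are all made-up parameters, not figures from the article. Each student pre-fills 10 blind guesses; students who reach the end work those questions and erase any guess that differs from the worked answer.

```python
import random

# Hypothetical parameters for illustration only; none come from the article.
N_CHOICES = 4     # answer options per question; option 0 is the correct one
N_GUESSED = 10    # questions pre-filled with a blind guess at the start
P_FINISH = 0.7    # chance a student reaches the pre-filled questions
P_CORRECT = 0.6   # chance a worked answer is correct

def guess_then_work(rng):
    """Return (wtr, wtw) erasure counts for one student using the strategy."""
    wtr = wtw = 0
    if rng.random() >= P_FINISH:
        return wtr, wtw  # ran out of time, so the guesses stay as they are
    for _ in range(N_GUESSED):
        guess = rng.randrange(N_CHOICES)
        if rng.random() < P_CORRECT:
            worked = 0
        else:
            worked = rng.randrange(1, N_CHOICES)  # one of the wrong options
        if worked == guess:
            continue  # worked answer matches the guess: nothing erased
        if guess != 0 and worked == 0:
            wtr += 1  # wrong guess replaced by the right answer
        elif guess != 0:
            wtw += 1  # wrong guess replaced by a different wrong answer
        # guess == 0 and worked != 0 is right-to-wrong, not counted here
    return wtr, wtw

def school_averages(n_students, seed=0):
    """Average WTR and WTW erasures per student for a simulated coached school."""
    rng = random.Random(seed)
    results = [guess_then_work(rng) for _ in range(n_students)]
    mean_wtr = sum(r[0] for r in results) / n_students
    mean_wtw = sum(r[1] for r in results) / n_students
    return mean_wtr, mean_wtw

mean_wtr, mean_wtw = school_averages(2000)
print(f"coached school: {mean_wtr:.2f} WTR and {mean_wtw:.2f} WTW erasures per student")
```

With these made-up numbers the expected values are 10 × 0.7 × 0.75 × 0.6 = 3.15 WTR and 1.4 WTW erasures per student, more than three times the district-wide average of under 1 quoted in the article, with no cheating at all. A school that did not teach the strategy would produce none of these strategy-driven erasures, so erasure counts would no longer be independent of school.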
Thus, the high erasure rate, even of WTR erasures, may have a relatively simple explanation: this school effectively coached its students in test taking, while other schools did not or coached them differently.
The article provides a link to several documents summarizing the results of the analysis. What I find interesting is that the worst school in terms of WTR erasures, BS Monroe ES, also has a lot of WTW (wrong-to-wrong) erasures. On average, this school has between 2 and 3 WTW erasures per student, or about 1 WTW for every 5 WTR erasures. A more interesting, and I think more revealing, analysis would be to compare this ratio with the ratio across all schools. If the typical ratio is also about 1 WTW to 5 WTR, it suggests cheating may not have been the reason for the erasures (unless the cheaters were purposely changing some answers to other wrong answers, which seems unlikely since there is no indication that potential cheaters realized erasures could be detected at all). If the typical ratio is far from 1 to 5, it would be another indicator of a different process going on at BS Monroe ES, perhaps involving cheating, though it would still be hard to rule out other, innocuous explanations that involve test-taking strategy.
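One way to make the ratio comparison concrete is to look at the share of erasures from a wrong answer that ended up right, WTR / (WTR + WTW); a ratio of 1 WTW to 5 WTR corresponds to a share of 5/6, about 0.83. The sketch below compares one school against all other schools with a standard two-proportion z-test. The counts in the usage lines are hypothetical, not taken from the article’s documents. Note also that this test treats every erasure as an independent trial, which is exactly the kind of assumption questioned above, so it is at best a rough screen.

```python
import math

def compare_wtr_share(school_wtr, school_wtw, other_wtr, other_wtw):
    """Two-proportion z-test on the share of wrong-answer erasures that end up right.

    Compares one school with all other schools. Returns
    (school_share, other_share, z, two_sided_p). Treats every erasure as an
    independent trial, which is only a rough approximation.
    """
    n_school = school_wtr + school_wtw
    n_other = other_wtr + other_wtw
    share_school = school_wtr / n_school
    share_other = other_wtr / n_other
    pooled = (school_wtr + other_wtr) / (n_school + n_other)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_school + 1 / n_other))
    z = (share_school - share_other) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return share_school, share_other, z, p_value

# Hypothetical counts, chosen only to illustrate the two cases discussed above.
same_ratio = compare_wtr_share(1000, 200, 5000, 1000)       # both about 5 WTR per WTW
different_ratio = compare_wtr_share(1000, 200, 3000, 3000)  # other schools about 1 to 1
print(same_ratio)
print(different_ratio)
```

In the first case the school looks just like everyone else and the test finds nothing; in the second the school’s erasures are far more often “fixes” than elsewhere, which would be worth a closer look but, as the post argues, still would not by itself distinguish cheating from coaching.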
Another analysis would be to look at WTR vs. WTW erasures student by student. Presumably, students who answered a higher percentage of un-erased problems correctly would have a better ratio of WTR to WTW erasures. If that relationship did not hold, it would point more clearly to the conclusion that someone else was doing the erasing.
The research reported in the article shows the correlation of two things: a dramatic increase in test scores and an unusually large number of WTR erasures. Cheating is one explanation for this correlation. Another, however, is the implementation of a smart test-taking strategy at the school, which might well be part of an overall program to raise test scores and improve the school. A statistical test can produce a seemingly dramatic result (less likely than winning the lottery), but while it rejects a specific hypothesis (that erasures are independent of school), it doesn’t necessarily prove another hypothesis (cheating).