Simson's Paradox
Essay by review • June 20, 2011 • Essay • 395 Words (2 Pages) • 1,160 Views
Simpson's paradox
Simpson's paradox is a statistical paradox described by E. H. Simpson in 1951, in which the accomplishments of several groups seem to be reversed with the groups are combined.
It's a well accepted rule of thumb that the larger the data set, the more reliable the conclusions drawn. Simpson' paradox, however, slams a hammer down on the rule and the result is a good deal worse than a sore thumb. Unfortunately Simpson's paradox demonstrates that a great deal of care has to be taken when combining small data sets into a large one. Sometimes conclusions from the large data set are exactly the opposite of conclusion from the smaller sets. Unfortunately, the conclusions from the large set are also usually wrong.
Here is an example. Suppose two people, Ann and Bob, who are let loose on Wikipedia. In the first test, Ann improves 60 percent of the articles she edits while Bob improves 90 percent of the articles he edits. In the second test, Ann improves just 10 percent of the articles she edits while Bob improves 30 percent.
Both times, Bob improved a much higher percentage of articles than Ann - yet when the two tests are combined, Ann has improved a much higher percentage than Bob!
The result comes about this way: In the first test, Ann edits 100 articles, improving 60 of them, while Bob edits just 10 articles, improving 9 of them. In the second test, Ann edits only 10 articles, improving 1 of them, while Bob edits 100 articles, improving 30 of them. When the two tests are added together, both edited 110 articles, yet Ann improved 69 of them (63 percent) while Bob improved only 40 of them (36 percent)!
Simpson's Paradox is caused by a combination of a lurking variable and data from unequal sized groups being combined into a single data set. The unequal group sizes, in the presence of a lurking variable, can weight the results incorrectly. This can lead to seriously flawed conclusions. The obvious way to prevent it is to not combine data sets of different sizes from diverse sources. Simpson's Paradox will generally not be a problem in a well designed experiment or survey if possible lurking variables are identified ahead of time and properly controlled. This includes eliminating them, holding them constant for all groups or making them part of the study.
...
...