(I wrote this paper in 2007 for a Statistics class I took while trying to do a Ph.D. I am sharing it here for posterity.)
McNemar’s test is a non-parametric method used on nominal data to determine whether the row and column marginal frequencies are equal. It is applied to 2×2 contingency tables of a dichotomous trait observed on matched pairs of subjects.
Simpson’s paradox is a statistical paradox in which a trend that appears within each of several groups is reversed when the groups are combined. This seemingly impossible result is encountered often in social science statistics. It occurs when a weighting variable, which is not relevant to the assessment of each individual group, must be used in the combined assessment.
This paper evaluates the potential effect of Simpson’s paradox on the results and conclusions of McNemar’s test.
Theory
McNemar’s test for the significance of changes
The test is named after Quinn McNemar, who introduced it in 1947. It can be introduced as a variation of the sign test for the case when the data are nominal and can therefore be coded as 0s and 1s.[1] Consider a study of N subjects (i = 1, …, N) in which the effects of two treatments, X and Y, on some characteristic of subject i are recorded as the pair (Xi, Yi). The set of values Xi and the set of values Yi then constitute two paired samples, where each Xi and Yi can take only the value 0 or 1.
Contingency table
The data can be presented in a 2×2 contingency table of the following form:
Table 1. 2×2 contingency table
| | Yi = 0 | Yi = 1 |
|---|---|---|
| Xi = 0 | A = number of (0,0) | B = number of (0,1) |
| Xi = 1 | C = number of (1,0) | D = number of (1,1) |
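Tabulating the data is mechanical. The following is a minimal Python sketch that assumes the paired results are given as two equal-length lists of 0s and 1s. The function name `mcnemar_counts` and the six-subject data are hypothetical.

```python
from collections import Counter


def mcnemar_counts(x, y):
    """Tabulate paired 0/1 outcomes into the cells A, B, C, D of Table 1.

    x[i] and y[i] are the results of treatments X and Y on subject i.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length (paired samples)")
    if any(v not in (0, 1) for v in list(x) + list(y)):
        raise ValueError("all values must be 0 or 1")
    pairs = Counter(zip(x, y))  # number of subjects with each (Xi, Yi) combination
    return {
        "A": pairs[(0, 0)],
        "B": pairs[(0, 1)],
        "C": pairs[(1, 0)],
        "D": pairs[(1, 1)],
    }


# Hypothetical results for six subjects
x = [0, 0, 1, 1, 0, 1]
y = [0, 1, 0, 1, 1, 1]
print(mcnemar_counts(x, y))  # {'A': 1, 'B': 2, 'C': 1, 'D': 2}
```

Only the discordant cells B and C enter the test statistic. A and D still count toward N.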
Assumptions
The data consists of the characteristics resulting from treatments on N randomly selected subjects, denoted as (Xi, Yi). The pairs (Xi, Yi) are mutually independent, and the measurement scale is nominal. This results in four possible categories, presented above as (0,0), (0,1), (1,0), and (1,1).
Hypotheses
We want to test the hypothesis that the treatments make a difference in the incidence of the characteristic. The null hypothesis therefore states that the treatment does not change the incidence of the characteristic or, equivalently, that the incidence of the characteristic is the same for both treatments[2]. Thus, we test:
$$H_0: P(X_i = 0) = P(Y_i = 0) \quad \text{or, equivalently,} \quad H_0: P(X_i = 1) = P(Y_i = 1) \qquad [1]$$
Expressed in terms of proportions[3], the null hypothesis is:
$$H_0: p_1 = p_2 \qquad [2]$$
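In summary, the possible pairs of hypotheses, expressed in proportions, are:
$$
\begin{aligned}
&H_0: p_1 = p_2 &\quad& H_1: p_1 \neq p_2 \\
&H_0: p_1 \ge p_2 && H_1: p_1 < p_2 \\
&H_0: p_1 \le p_2 && H_1: p_1 > p_2
\end{aligned}
\qquad [3]
$$
Test statistic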
Since we are testing p1 = p2, we can rewrite the null hypothesis as p1 − p2 = 0. Using the equivalence between [1] and [2] and the values in Table 1, we have:
$$p_1 = \frac{A+B}{N} \qquad [4]$$
$$p_2 = \frac{A+C}{N} \qquad [5]$$
Therefore:
$$p_1 - p_2 = \frac{B-C}{N}$$
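McNemar showed that, under the null hypothesis and when B + C > 10, B − C is approximately normally distributed with mean 0 and standard deviation $\sqrt{B+C}$. The appropriate test statistic is therefore:
$$Z = \frac{B-C}{\sqrt{B+C}} \qquad [6]$$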
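The following is a minimal Python sketch of the computations in this section, including the p-values used by the decision rule. Some parts are assumptions. The function names `mcnemar_test`, `upper_tail` and `chi2_1df_upper` are hypothetical, as are the counts A = 40, B = 15, C = 5, D = 40. The normal and chi-squared tail probabilities are computed with `math.erfc` rather than a statistics library.

The sketch returns the proportions [4] and [5], the statistic [6], and the p-value for a two-sided or one-sided alternative. It also compares the two-sided p-value of Z with the upper-tail chi-squared (1 df) p-value of Z², the equivalence noted by Conover (1999).

```python
import math


def upper_tail(z):
    """P(Z >= z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def mcnemar_test(a, b, c, d, alternative="two-sided", alpha=0.05):
    """McNemar Z test from the cells of Table 1 (normal approximation).

    alternative: "two-sided" (H1: p1 != p2), "greater" (H1: p1 > p2)
    or "less" (H1: p1 < p2). Returns (p1, p2, z, p_value, reject_h0).
    """
    if b + c <= 10:
        # Rule of thumb from the text: use the normal approximation only if B + C > 10.
        raise ValueError("B + C must exceed 10 for the normal approximation")
    n = a + b + c + d
    p1 = (a + b) / n                 # [4]: proportion with Xi = 0
    p2 = (a + c) / n                 # [5]: proportion with Yi = 0
    z = (b - c) / math.sqrt(b + c)   # [6]; a large Z favors p1 > p2
    if alternative == "two-sided":
        p = 2 * upper_tail(abs(z))
    elif alternative == "greater":
        p = upper_tail(z)
    elif alternative == "less":
        p = upper_tail(-z)           # P(Z <= z), by symmetry of the normal
    else:
        raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")
    return p1, p2, z, p, p < alpha


def chi2_1df_upper(x):
    """P(X >= x) for X chi-squared with 1 df, which equals erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2))


# Hypothetical counts: A = 40, B = 15, C = 5, D = 40 (N = 100)
p1, p2, z, p, reject = mcnemar_test(40, 15, 5, 40)
print(round(p1 - p2, 3), round(z, 3), reject)  # 0.1 2.236 True
# Two-sided p-value of Z versus upper-tail chi-squared p-value of Z**2 = (B - C)**2 / (B + C)
print(p, chi2_1df_upper((15 - 5) ** 2 / (15 + 5)))
```

For these counts the two-sided p-value is about 0.025, so H0 is rejected at the 0.05 level. The two printed p-values agree up to floating-point rounding. In practice a library routine would usually be preferred, such as `mcnemar` in `statsmodels.stats.contingency_tables`. This sketch only mirrors the formulas above.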
As Conover (1999) points out, for large enough values of B + C, the two-tailed test based on Z is equivalent to the upper-tailed test based on $Z^2 = (B-C)^2/(B+C)$, using a chi-squared distribution with 1 degree of freedom.
Decision rule
For the two-sided test, the p-value is twice the probability of observing a value of Z greater than the absolute value of the observed Z. For a one-sided test, the p-value is the probability of observing a value of Z at least as extreme as the observed one in the direction of the alternative. That means greater than the observed Z for H1: p1 > p2, or less than it for H1: p1 < p2. In either case, we reject the null hypothesis if the p-value is less than the desired level of significance.
Simpson’s Paradox
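Simpson’s paradox is the common name for a situation that may occur when two populations are compared with respect to the frequency of some characteristic. If each population is divided into the same two categories, the population with the higher overall frequency may show a lower frequency within each category. Let a/b and c/d be the frequencies of the characteristic in the two categories of the first population, and A/B and C/D those of the second. The paradox arises when the following counter-intuitive relationships hold:
$$\frac{a}{b} < \frac{A}{B}, \qquad \frac{c}{d} < \frac{C}{D}, \qquad \text{and} \qquad \frac{a+c}{b+d} > \frac{A+C}{B+D} \qquad [7]$$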
A simple illustrative example (adapted from Shapiro, 1982) shows the success rates of two treatments for men, for women, and for both sexes combined:
Table 2. Paradox example
| | Men | Women | Both sexes |
|---|---|---|---|
| Treatment 1 | 60/80 = 0.75 | 40/120 = 0.33 | 100/200 = 0.50 |
| Treatment 2 | 100/150 = 0.67 | 10/40 = 0.25 | 110/190 = 0.58 |
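The three inequalities can be checked with exact fractions using the counts in Table 2. In this Python sketch, lower-case letters are treatment 2 and upper-case letters are treatment 1. The combined rate of each population is taken as total successes over total subjects, (a+c)/(b+d). The function name `is_simpson_reversal` is a hypothetical choice for illustration.

```python
from fractions import Fraction


def is_simpson_reversal(a, b, c, d, A, B, C, D):
    """Check the reversal pattern with exact arithmetic.

    a/b and c/d: rates of the first population in categories 1 and 2.
    A/B and C/D: rates of the second population in the same categories.
    """
    lower_in_each = Fraction(a, b) < Fraction(A, B) and Fraction(c, d) < Fraction(C, D)
    higher_combined = Fraction(a + c, b + d) > Fraction(A + C, B + D)
    return lower_in_each and higher_combined


# Table 2: first population = treatment 2, second = treatment 1;
# category 1 = men, category 2 = women (successes, patients).
print(is_simpson_reversal(100, 150, 10, 40, 60, 80, 40, 120))  # True
```

Exact fractions avoid any doubt from rounded rates such as 0.67 or 0.58 when comparing values that are close together.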
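Formally, consider three dichotomous attributes A, B, and C, where a bar denotes the absence of the attribute. The proportions of the population falling in each of the eight combinations are labeled a through h:
Table 3. 2×2×2 contingency table
| | CB | CB̅ | C̅B | C̅B̅ |
|---|---|---|---|---|
| A | a | c | e | g |
| A̅ | b | d | f | h |
where a + b + c + d + e + f + g + h = 1.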
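The following Python sketch places the Table 2 counts in the cells of Table 3, with A = treatment 1, B = men and C = success. It converts the counts to proportions and evaluates two cross-product conditions. The first is Bartlett's zero second-order interaction condition, ad/bc = eh/fg. The second is Simpson's condition as stated in this paper, af = be or ag = ce. The failure counts e through h are derived by subtracting the successes from the totals in Table 2. The sketch is an illustration and not part of the original analysis.

```python
from fractions import Fraction

# Table 2 counts placed in the cells of Table 3, with
# A = treatment 1, B = men, C = success (the barred attribute means "not").
counts = {
    "a": 60,   # treatment 1, success, men
    "b": 100,  # treatment 2, success, men
    "c": 40,   # treatment 1, success, women
    "d": 10,   # treatment 2, success, women
    "e": 20,   # treatment 1, failure, men   (80 - 60)
    "f": 50,   # treatment 2, failure, men   (150 - 100)
    "g": 80,   # treatment 1, failure, women (120 - 40)
    "h": 30,   # treatment 2, failure, women (40 - 10)
}
total = sum(counts.values())
p = {k: Fraction(v, total) for k, v in counts.items()}  # proportions, summing to 1
a, b, c, d, e, f, g, h = (p[k] for k in "abcdefgh")

# Bartlett's zero second-order interaction: ad/bc == eh/fg
zero_second_order = a * d / (b * c) == e * h / (f * g)
# Simpson's second condition as given in the text: af == be or ag == ce
second_condition = a * f == b * e or a * g == c * e
print(total, zero_second_order, second_condition)  # 390 True False
```

Both cross-product ratios equal 3/20, so the zero second-order interaction condition holds. Neither of the other two equalities does.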
In the example given earlier, if A is treatment 1, B is men, and C is success, then a is 60, b is 100, and so forth. These are counts; dividing each by the total of 390 gives the proportions. According to Bartlett, as cited by Simpson, the condition for a zero second-order interaction is:
$$\frac{ad}{bc} = \frac{eh}{fg} \qquad [8]$$
This condition holds in the example, where both ratios equal 0.15. According to Simpson, the second condition, assuming zero second-order interaction, is that the collapsed variable, “sex”, is independent of treatment for both success and failure, or that it is independent of success for both treatments. Mathematically, the condition is:
$$af = be \quad \text{or} \quad ag = ce \qquad [9]$$
Neither equality holds in the example (af = 3,000 versus be = 2,000, and ag = 4,800 versus ce = 800).
A practical example
Wardrop (1995) studies the effect of Simpson’s paradox on the perception of a “hot hand” in basketball. Fans believe that making a shot makes a player more likely to make the following shot. He tests each player’s shooting data and the pooled data (pairs of consecutive free throws) using the McNemar test. He finds that the pooled results support the “hot hand”, which can lead fans to believe in it even though the results for the individual players do not necessarily show the same. The data are as follows:
Table 4. Hot-hand summary
| Player | First shot | Second shot: Hit | Second shot: Miss | Total |
|---|---|---|---|---|
| Larry Bird | Hit | 251 | 34 | 285 |
| Larry Bird | Miss | 48 | 5 | 53 |
| Larry Bird | Total | 299 | 39 | 338 |
| Rick Robey | Hit | 54 | 37 | 91 |
| Rick Robey | Miss | 49 | 31 | 80 |
| Rick Robey | Total | 103 | 68 | 171 |
| Total | Hit | 305 | 71 | 376 |
| Total | Miss | 97 | 36 | 133 |
| Total | Total | 402 | 107 | 509 |
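The reversal Wardrop describes can be seen by computing the conditional hit rates from Table 4 for each player and for the pooled counts. The following is a Python sketch; the dictionary layout and the function name `conditional_hit_rates` are illustrative choices, not taken from the paper.

```python
# Table 4 counts, keyed by (first shot, second shot)
bird = {("hit", "hit"): 251, ("hit", "miss"): 34, ("miss", "hit"): 48, ("miss", "miss"): 5}
robey = {("hit", "hit"): 54, ("hit", "miss"): 37, ("miss", "hit"): 49, ("miss", "miss"): 31}
pooled = {k: bird[k] + robey[k] for k in bird}  # the "Total" rows of Table 4


def conditional_hit_rates(t):
    """Return (phh, pmh) = P(hit 2nd | hit 1st), P(hit 2nd | missed 1st)."""
    phh = t[("hit", "hit")] / (t[("hit", "hit")] + t[("hit", "miss")])
    pmh = t[("miss", "hit")] / (t[("miss", "hit")] + t[("miss", "miss")])
    return phh, pmh


for name, table in [("Larry Bird", bird), ("Rick Robey", robey), ("Total", pooled)]:
    phh, pmh = conditional_hit_rates(table)
    print(f"{name}: phh = {phh:.3f}, pmh = {pmh:.3f}")
```

For each player, the hit rate after a miss (pmh) exceeds the hit rate after a hit (phh). For the pooled counts the order reverses: phh = 305/376 exceeds pmh = 97/133. This is the same pattern as in [7].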
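Table 5. Test results
| | Larry Bird | Rick Robey | Total |
|---|---|---|---|
| phh (probability of a hit on the second shot given a hit on the first) | 0.881 | 0.593 | 0.811 |
| pmh (probability of a hit on the second shot given a miss on the first) | 0.906 | 0.613 | 0.729 |
| p-value (one-sided McNemar test) | 0.061 | 0.098 | 0.022 |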
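The p-value row of Table 5 can be reproduced under one assumption, inferred from the numbers rather than stated in the paper. The assumption is that the p-values come from the normal approximation [6] with a lower-tailed alternative, P(Z ≤ z). A hit is coded as 0, so the discordant cells of Table 1 are B = (hit, miss) and C = (miss, hit). The following is a minimal Python sketch; the function name is hypothetical.

```python
import math


def mcnemar_lower_tail_p(b, c):
    """One-sided McNemar p-value P(Z <= z), with Z = (B - C) / sqrt(B + C)."""
    z = (b - c) / math.sqrt(b + c)
    return 0.5 * math.erfc(-z / math.sqrt(2))  # standard normal CDF evaluated at z


# Discordant cells from Table 4: B = (hit, miss), C = (miss, hit)
discordant = {"Larry Bird": (34, 48), "Rick Robey": (37, 49), "Total": (71, 97)}
for name, (b, c) in discordant.items():
    print(name, round(mcnemar_lower_tail_p(b, c), 3))
```

The loop prints 0.061, 0.098 and 0.022, which matches Table 5. Under this reading, only the pooled result is significant at the 5% level.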
