
Modified Bonferroni Adjustments

Author: Dr Simon Moss

Overview

Modified Bonferroni tests are a set of procedures that researchers and statisticians sometimes use when they need to conduct many statistical tests, all of which bear on the same overarching hypothesis. In these instances, some researchers apply a Bonferroni adjustment to ensure the probability of committing at least one Type I error does not exceed .05. This approach, however, is sometimes regarded as unnecessarily conservative, reducing power appreciably. The modified Bonferroni tests are intended to increase power, while still ensuring this probability does not exceed .05.

The original Bonferroni test

Need to control the Type I error rate

To demonstrate the importance of Bonferroni adjustments, consider the following example. A researcher wants to examine whether personality is related to intelligence. Participants complete a personality inventory that measures five traits: extraversion, neuroticism, conscientiousness, agreeableness, and openness. In addition, participants complete three measures of intelligence: a test of verbal ability, a test of numerical ability, and a test of abstract ability. The correlation between each personality trait and each ability test appears in the following table.


| | Extraversion | Neuroticism | Conscientiousness | Agreeableness | Openness |
| --- | --- | --- | --- | --- | --- |
| Verbal ability | r = .17, p = .10 | r = .07, p = .29 | r = .33, p = .004 | r = .05, p = .56 | r = .16, p = .15 |
| Numerical ability | r = .18, p = .08 | r = .47, p = .002 | r = .20, p = .09 | r = .21, p = .09 | r = .06, p = .35 |
| Abstract ability | r = .29, p = .04 | r = .04, p = .57 | r = .28, p = .04 | r = .15, p = .15 | r = .07, p = .30 |

In this instance, the p value for 4 of the correlations is less than .05 and thus significant. The researcher will, therefore, conclude that personality is related to ability.

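The problem, however, is that 15 tests have been conducted to assess the same hypothesis: that personality is related to ability. To highlight this problem, consider the following rationale. Suppose personality is unrelated to ability in the population. Each test, conducted at an alpha of .05, then has a 5% chance of generating a Type I error. If the 15 tests were independent, the probability that at least one of them generates a Type I error would be 1 - .95^15, or approximately .54. The researcher would thus be more likely than not to uncover at least one significant correlation by chance alone.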
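The rationale can be sketched numerically. The short example below assumes the tests are independent, which correlations computed from a single sample generally are not, so the figures are illustrative rather than exact. The function name `familywise_error_rate` is introduced here for illustration only.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Probability of at least one Type I error across n_tests tests.

    Assumption: the tests are independent and every null hypothesis is true.
    """
    return 1 - (1 - alpha) ** n_tests


# Show how the chance of at least one false positive grows with the
# number of tests when each test uses an unadjusted alpha of .05.
for n_tests in (1, 5, 15):
    rate = familywise_error_rate(0.05, n_tests)
    print(f"{n_tests:2d} tests: {rate:.3f}")
```

With 15 tests, the rate is a little above one half, which is why four significant correlations out of 15 are weak evidence by themselves.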

The Bonferroni adjustment

To overcome this problem, some researchers apply a Bonferroni adjustment. Specifically, they change the level of alpha: the level of alpha they apply to each test is equal to the original level of alpha divided by the number of tests. In this instance, for example, the adjusted alpha is .05 / 15 = .0033. Only the correlation between neuroticism and numerical ability, for which p = .002, would remain significant.
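As a minimal sketch of this adjustment, the snippet below divides alpha by the number of tests and checks which of the 15 p values from the example fall below the result. The names `ALPHA`, `P_VALUES`, and `bonferroni_alpha` are illustrative choices, not part of any library.

```python
ALPHA = 0.05

# The 15 p values from the personality and ability example, ascending.
P_VALUES = [0.002, 0.004, 0.04, 0.04, 0.08, 0.09, 0.09, 0.10,
            0.15, 0.15, 0.29, 0.30, 0.35, 0.56, 0.57]


def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Per-test alpha under the original Bonferroni adjustment."""
    return alpha / n_tests


adjusted_alpha = bonferroni_alpha(ALPHA, len(P_VALUES))

# A test is deemed significant only if its p value is below the adjusted alpha.
significant = [p for p in P_VALUES if p < adjusted_alpha]

print(f"adjusted alpha = {adjusted_alpha:.4f}")
print(f"significant p values: {significant}")
```

Every test is compared against the same threshold here, which is the feature the modified procedures relax.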

The Bonferroni adjustment ensures the probability that at least one of the 15 tests generates a Type I error does not exceed .05; if the tests are independent, this probability is approximately .049. Hence, the probability that researchers falsely conclude that personality is related to ability would not exceed .05.

Nevertheless, the Bonferroni adjustment appreciably reduces power. That is, the adjustment diminishes the likelihood that researchers will generate significant results when the variables are indeed related in the population.

Several scholars have recommended refinements to the Bonferroni adjustment that redress this problem. These modified versions are more powerful.

Modified Bonferroni tests

The Holm procedure

Holm (1979) introduced a variant of the Bonferroni adjustment that is often applied by researchers. To conduct this procedure, researchers first arrange the p values from lowest to highest, as shown below.

| Position in sequence | p value | Correlation |
| --- | --- | --- |
| 1 | p = .002 | Neuroticism and numerical ability, r = .47 |
| 2 | p = .004 | Conscientiousness and verbal ability, r = .33 |
| 3 | p = .04 | Extraversion and abstract ability, r = .29 |
| 4 | p = .04 | Conscientiousness and abstract ability, r = .28 |
| 5 | p = .08 | Extraversion and numerical ability, r = .18 |
| 6 | p = .09 | Agreeableness and numerical ability, r = .21 |
| 7 | p = .09 | Conscientiousness and numerical ability, r = .20 |
| 8 | p = .10 | Extraversion and verbal ability, r = .17 |
| 9 | p = .15 | Openness and verbal ability, r = .16 |
| 10 | p = .15 | Agreeableness and abstract ability, r = .15 |
| 11 | p = .29 | Neuroticism and verbal ability, r = .07 |
| 12 | p = .30 | Openness and abstract ability, r = .07 |
| 13 | p = .35 | Openness and numerical ability, r = .06 |
| 14 | p = .56 | Agreeableness and verbal ability, r = .05 |
| 15 | p = .57 | Neuroticism and abstract ability, r = .04 |

For this procedure, and indeed for all the modified Bonferroni tests, the adjusted alpha is different for each p value. In particular, the alpha at each position equals alpha divided by (the number of tests - position in the sequence + 1). At the first position, for example, the adjusted alpha is .05 / (15 - 1 + 1) = .0033. In this instance:

| Adjusted alpha | p value | Correlation |
| --- | --- | --- |
| 0.0033 | p = .002 | Neuroticism and numerical ability, r = .47 |
| 0.0036 | p = .004 | Conscientiousness and verbal ability, r = .33 |
| 0.0038 | p = .04 | Extraversion and abstract ability, r = .29 |
| 0.0042 | p = .04 | Conscientiousness and abstract ability, r = .28 |
| 0.0045 | p = .08 | Extraversion and numerical ability, r = .18 |
| 0.0050 | p = .09 | Agreeableness and numerical ability, r = .21 |
| 0.0056 | p = .09 | Conscientiousness and numerical ability, r = .20 |
| 0.0063 | p = .10 | Extraversion and verbal ability, r = .17 |
| 0.0071 | p = .15 | Openness and verbal ability, r = .16 |
| 0.0083 | p = .15 | Agreeableness and abstract ability, r = .15 |
| 0.0100 | p = .29 | Neuroticism and verbal ability, r = .07 |
| 0.0125 | p = .30 | Openness and abstract ability, r = .07 |
| 0.0167 | p = .35 | Openness and numerical ability, r = .06 |
| 0.0250 | p = .56 | Agreeableness and verbal ability, r = .05 |
| 0.0500 | p = .57 | Neuroticism and abstract ability, r = .04 |

Finally, the researcher, beginning at the top of this table, scans the rows and stops at the first row in which the p value exceeds the adjusted alpha. Only the rows above this point are significant. In this instance, the researcher would stop at the second row, because .004 exceeds .0036. Hence, the researcher would conclude that only one of the relationships is significant: the relationship between neuroticism and numerical ability.
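The step-down scan described above can be sketched as follows. This is a minimal illustration, not a reference implementation; the function name `holm` and its return format (one True/False flag per input p value) are assumptions made for the example.

```python
def holm(p_values, alpha=0.05):
    """Holm step-down procedure: return a rejection flag per p value."""
    m = len(p_values)
    # Indices of the p values, ordered from lowest to highest p.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for position, idx in enumerate(order, start=1):
        adjusted_alpha = alpha / (m - position + 1)
        if p_values[idx] > adjusted_alpha:
            break  # stop at the first p value that exceeds its adjusted alpha
        reject[idx] = True
    return reject


# The 15 p values from the personality and ability example.
p_values = [0.002, 0.004, 0.04, 0.04, 0.08, 0.09, 0.09, 0.10,
            0.15, 0.15, 0.29, 0.30, 0.35, 0.56, 0.57]

holm_reject = holm(p_values)
print(sum(holm_reject), "significant result(s)")
```

Because the scan stops at the first failure, later p values are never declared significant even if they would fall below their own, more lenient, adjusted alpha.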

The Holland-Copenhaver procedure

The Holland-Copenhaver procedure, developed by these two authors in 1987, is similar to the Holm procedure. However, Holland and Copenhaver applied the inequality derived by Šidák (1967) to justify their criteria.

Again, to conduct this procedure, researchers first arrange the p values from lowest to highest. Next, to compute the adjusted alpha for each p value, they apply a more complex formula. They first compute the number of tests - position in the sequence + 1; we will designate the answer the "operant". Next, they raise (1 - alpha) to the power of 1 divided by this operant. Finally, they subtract the answer from 1. At the first position, for example, the adjusted alpha is 1 - .95^(1/15) = .0034.

To illustrate, in this instance:

| Adjusted alpha | p value | Correlation |
| --- | --- | --- |
| 0.0034 | p = .002 | Neuroticism and numerical ability, r = .47 |
| 0.0037 | p = .004 | Conscientiousness and verbal ability, r = .33 |
| 0.0039 | p = .04 | Extraversion and abstract ability, r = .29 |
| 0.0043 | p = .04 | Conscientiousness and abstract ability, r = .28 |
| 0.0047 | p = .08 | Extraversion and numerical ability, r = .18 |
| 0.0051 | p = .09 | Agreeableness and numerical ability, r = .21 |
| 0.0057 | p = .09 | Conscientiousness and numerical ability, r = .20 |
| 0.0064 | p = .10 | Extraversion and verbal ability, r = .17 |
| 0.0073 | p = .15 | Openness and verbal ability, r = .16 |
| 0.0085 | p = .15 | Agreeableness and abstract ability, r = .15 |
| 0.0102 | p = .29 | Neuroticism and verbal ability, r = .07 |
| 0.0127 | p = .30 | Openness and abstract ability, r = .07 |
| 0.0170 | p = .35 | Openness and numerical ability, r = .06 |
| 0.0253 | p = .56 | Agreeableness and verbal ability, r = .05 |
| 0.0500 | p = .57 | Neuroticism and abstract ability, r = .04 |

Finally, as in the Holm procedure, the researcher, beginning at the top of this table, scans the rows and stops at the first row in which the p value exceeds the adjusted alpha. In this instance, the researcher would stop at the second row, because .004 exceeds .0037. Again, the researcher would conclude that only one of the relationships is significant: the relationship between neuroticism and numerical ability.

The technique ensures the likelihood of Type I errors does not exceed .05, provided a specific condition is fulfilled: the test statistics must be positive orthant dependent. According to Holland and Copenhaver (1987), this assumption tends to be satisfied in most practical settings.
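A minimal sketch of this procedure follows. It assumes the adjusted alpha at each position is 1 - (1 - alpha)^(1 / operant), where the operant is the number of tests - position + 1; this form reproduces the tabulated values, such as 0.0034 at the first position and 0.0253 at the fourteenth. The function names are illustrative only.

```python
def holland_copenhaver_alphas(n_tests, alpha=0.05):
    """Adjusted alpha for each position, from lowest to highest p value."""
    alphas = []
    for position in range(1, n_tests + 1):
        operant = n_tests - position + 1
        alphas.append(1 - (1 - alpha) ** (1 / operant))
    return alphas


def holland_copenhaver(p_values, alpha=0.05):
    """Step-down scan using the adjusted alphas above."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    thresholds = holland_copenhaver_alphas(m, alpha)
    reject = [False] * m
    for idx, threshold in zip(order, thresholds):
        if p_values[idx] > threshold:
            break  # stop at the first p value that exceeds its adjusted alpha
        reject[idx] = True
    return reject


# The 15 p values from the personality and ability example.
p_values = [0.002, 0.004, 0.04, 0.04, 0.08, 0.09, 0.09, 0.10,
            0.15, 0.15, 0.29, 0.30, 0.35, 0.56, 0.57]

hc_alphas = holland_copenhaver_alphas(len(p_values))
hc_reject = holland_copenhaver(p_values)
print([round(a, 4) for a in hc_alphas[:3]], sum(hc_reject))
```

Each threshold is slightly larger than the corresponding Holm threshold, so this procedure can never reject fewer hypotheses than Holm on the same data.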

The Hochberg procedure

The Hochberg procedure, promulgated by Hochberg in 1988, is derived from the Simes (1986) inequality. This procedure is almost identical to the Holm procedure, apart from one key difference: researchers scan the table, beginning at the bottom, not the top, and stop as soon as the p value is less than the adjusted alpha. To illustrate, consider the adjusted values of alpha, as calculated in the context of the Holm procedure.

| Adjusted alpha | p value | Correlation |
| --- | --- | --- |
| 0.0033 | p = .002 | Neuroticism and numerical ability, r = .47 |
| 0.0036 | p = .004 | Conscientiousness and verbal ability, r = .33 |
| 0.0038 | p = .04 | Extraversion and abstract ability, r = .29 |
| 0.0042 | p = .04 | Conscientiousness and abstract ability, r = .28 |
| 0.0045 | p = .08 | Extraversion and numerical ability, r = .18 |
| 0.0050 | p = .09 | Agreeableness and numerical ability, r = .21 |
| 0.0056 | p = .09 | Conscientiousness and numerical ability, r = .20 |
| 0.0063 | p = .10 | Extraversion and verbal ability, r = .17 |
| 0.0071 | p = .15 | Openness and verbal ability, r = .16 |
| 0.0083 | p = .15 | Agreeableness and abstract ability, r = .15 |
| 0.0100 | p = .29 | Neuroticism and verbal ability, r = .07 |
| 0.0125 | p = .30 | Openness and abstract ability, r = .07 |
| 0.0167 | p = .35 | Openness and numerical ability, r = .06 |
| 0.0250 | p = .56 | Agreeableness and verbal ability, r = .05 |
| 0.0500 | p = .57 | Neuroticism and abstract ability, r = .04 |

In this instance, the researcher scans the table, beginning at the bottom, and stops at the top row, which is the first row in which the p value is less than the adjusted alpha. This row and every row above it represent significant p values; in this instance, only one finding is significant. Because researchers begin at the bottom, with the largest p value, this technique is regarded as a step-up, rather than step-down, procedure.
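The step-up scan can be sketched in the same style. The sketch below treats a p value equal to its adjusted alpha as significant, which is an assumption about ties rather than something stated in the text. The function name `hochberg` and the two-test example at the end are hypothetical, included only to show a case in which the step-up scan rejects more hypotheses than a step-down scan would.

```python
def hochberg(p_values, alpha=0.05):
    """Hochberg step-up procedure: return a rejection flag per p value."""
    m = len(p_values)
    # Indices of the p values, ordered from lowest to highest p.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    # Scan from the largest p value (bottom of the table) upwards.
    for position in range(m, 0, -1):
        idx = order[position - 1]
        adjusted_alpha = alpha / (m - position + 1)
        if p_values[idx] <= adjusted_alpha:
            # This row and every row above it are significant.
            for j in order[:position]:
                reject[j] = True
            break
    return reject


# The 15 p values from the personality and ability example.
p_values = [0.002, 0.004, 0.04, 0.04, 0.08, 0.09, 0.09, 0.10,
            0.15, 0.15, 0.29, 0.30, 0.35, 0.56, 0.57]
hochberg_reject = hochberg(p_values)
print(sum(hochberg_reject), "significant result(s)")

# Hypothetical pair of tests: the larger p value passes at alpha / 1 = .05,
# so both are rejected, although .03 exceeds the first-position alpha of .025.
print(hochberg([0.03, 0.04]))
```

The hypothetical pair shows where the additional power comes from: a step-down scan would stop at .03, whereas the step-up scan reaches .04 first and carries the smaller p value with it.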

Evaluation of the modified Bonferroni procedures

Many studies have compared these modified Bonferroni procedures to each other or to the original Bonferroni adjustment. Olejnik, Li, Supattathum, and Huberty (1997), for example, compared five modified Bonferroni procedures: the Holm, Holland-Copenhaver, and Hochberg procedures as well as two more complex variants, the Rom procedure (Rom, 1990) and the Hommel procedure (Hommel, 1989). These procedures are appreciably better than the Bonferroni adjustment at detecting two or more significant effects. However, power does not vary appreciably across the five modified techniques. The Holm or Hochberg alternatives, which are simple and do not assume the test statistics must be positive orthant dependent, are thus recommended. The complex techniques might be applicable if researchers can access computer algorithms to compute the adjusted alpha levels.

References

Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society B, 57, 289-300.

Benjamini, Y., & Hochberg, Y. (2000). On the adaptive control of the false discovery rate in multiple testing with independent statistics. Journal of Educational & Behavioral Statistics, 25, 60-83.

Benjamini, Y., & Liu, W. (1999). A step-down multiple hypothesis testing procedure that controls the false discovery rate under independence. Journal of Statistical Planning and Inference, 82, 163-170.

Benjamini, Y., & Yekutieli, D. (2001). The control of the false discovery rate in multiple testing under dependency. Annals of Statistics, 29, 1152-1175.

Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.

Drigalenko, E. I., & Elston, R. C. (1997). False discoveries in genome scanning. Genetic Epidemiology, 14, 779-784.

Dunnett, C. W. & Tamhane, A. C. (1992). A step-up multiple test procedure. Journal of the American Statistical Association, 87, 162-170.

Einot, I., & Gabriel, K. R. (1975). A study of the powers of several methods of multiple comparisons. Journal of the American Statistical Association, 70, 574-583.

Halperin, M., Lan, K. K., & Hamdy, M. I. (1988). Some implications of an alternative definition of the multiple comparison problem. Biometrika, 75, 773-778.

Hochberg, Y. (1988). A sharper Bonferroni procedure for multiple tests of significance. Biometrika, 75, 800-802.

Hochberg, Y., & Benjamini, Y. (1990). More powerful procedures for multiple significance testing. Statistics in Medicine, 9, 811-818.

Holland, B. (1991). On the application of three modified Bonferroni procedures to pairwise multiple comparisons in balanced repeated measures designs. Computational Statistics Quarterly, 3, 219-231.

Holland, B., & Cheung, S. H. (2002). Family size robustness criteria for multiple comparison procedures. Journal of the Royal Statistical Society, B, 54, 63-77.

Holland, B. S., & Copenhaver, M. D. (1987). An improved sequentially rejective Bonferroni test procedure. Biometrics, 43, 417-423.

Holland, B. S., & Copenhaver, M. D. (1988). Improved Bonferroni-type multiple testing procedures. Psychological Bulletin, 104, 145-149.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6, 65-70.

Hommel, G. (1989). A comparison of two modified Bonferroni procedures. Biometrika, 76, 624-625.

Kaiser, H. F., & Dickman, K. (1962). Sample and population score matrices and sample correlation matrices from an arbitrary population correlation matrix. Psychometrika, 27, 179-182.

Keppel, G. (1991). Design and analysis: A researcher's handbook (3rd ed.). Englewood Cliffs, NJ: Prentice Hall.

Keselman, H. J., Cribbie, R., & Holland, B. (1999). The pairwise multiple comparison multiplicity problem: An alternative approach to familywise/comparisonwise Type I error control. Psychological Methods, 4, 58-69.

Keselman, H. J., Cribbie, R., & Holland, B. (2002). Controlling the rate of Type I error over a large set of statistical tests. British Journal of Mathematical and Statistical Psychology, 55, 27-39.

Maxwell, S. E., & Delaney, H. D. (1990). Designing experiments and analyzing data: A model comparison perspective. Belmont, CA: Wadsworth.

Olejnik, S., Li, J., Supattathum, S., & Huberty, C. J. (1997). Multiple testing and statistical power with modified Bonferroni procedures. Journal of Educational & Behavioral Statistics, 22, 389-406.

Rom, D. M. (1990). A sequentially rejective test procedure based on a modified Bonferroni inequality. Biometrika, 77, 663-665.

Rothman, K. J. (1990). No adjustments are needed for multiple comparisons. Epidemiology (Cambridge, Mass.), 1, 43-46.

Ryan, T. A. (1959). Multiple comparisons in psychological research. Psychological Bulletin, 56, 26-47.

Sarkar, S. K., & Chang, C. (1997). The Simes method for multiple hypothesis testing with positively dependent test statistics. Journal of the American Statistical Association, 92, 1601-1608.

Saville, D. J. (1990). Multiple comparison procedures: The practical solution. American Statistician, 44, 174-180.

Schippman, J. S., & Prien, E. P. (1986). Psychometric evaluation of an integrated assessment procedure. Psychological Reports, 59, 111-122.

Shaffer, J. P. (1995). Multiple hypothesis testing: A review. Annual Review of Psychology, 46, 561-584.

Šidák, Z. (1967). Rectangular confidence regions for the means of multivariate normal distributions. Journal of the American Statistical Association, 62, 623-633.

Simes, R. J. (1986). An improved Bonferroni procedure for multiple tests of significance. Biometrika, 73, 751-754.

Toothaker, L. E. (1991). Multiple comparisons for researchers. Newbury, CA: Sage.

Westfall, P. H., & Young, S. S. (1993). On adjusting p-values for multiplicity. Biometrics, 49, 941-945.

Williams, V. S., Jones, L. V., & Tukey, J. W. (1999). Controlling error in multiple comparisons, with examples from state-to-state differences in educational achievement. Journal of Educational & Behavioral Statistics, 24, 42-69.

Wilson, W. (1962). A note on the inconsistency inherent in the necessity to perform multiple comparisons. Psychological Bulletin, 59, 296-300.

Wright, S. P. (1992). Adjusted P-values for simultaneous inference. Biometrics, 48, 1005-1013.

Yekutieli, D., & Benjamini, Y. (1999). Resampling-based false discovery rate controlling multiple test procedures for correlated test statistics. Journal of Statistical Planning and Inference, 82, 171-196.








Last Update: 6/22/2016