The majority of scientific papers in psychology report tests of significance--z, t, and F values, for example. Generally, if these values are high, the researchers can conclude the groups differ significantly from one another or the variables are significantly related to one another.
Nevertheless, a high z, t, or F value does not necessarily imply the difference between the groups--or the relationship between these variables--is large. A high z, t, or F value can be generated even when the difference between the groups, or the relationship between the variables, is small, provided the sample size is large. In other words, a high z, t, or F value merely indicates the researcher can be quite confident the groups differ from each other or the variables are related to each other; it says little about the size of that difference or relationship.
In contrast, measures of effect size, such as the Cohen d value, represent the extent to which the groups differ from one another or the degree to which the variables are related (Cohen, 1965). To illustrate, suppose that researchers want to examine whether males or females have acquired more Facebook friends. The d value simply equals the difference between the means of the two genders divided by the standard deviation within the groups--technically, the pooled standard deviation.
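As a rough sketch, the calculation described in the previous paragraph can be written in a few lines of Python. The function name and the use of sample variances (dividing by n - 1) are illustrative choices, not something prescribed by the sources above.

```python
import math

def cohens_d(group_a, group_b):
    """Cohen's d: difference between the group means divided by the pooled SD."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a = sum(group_a) / n_a
    mean_b = sum(group_b) / n_b
    # Sample variances (n - 1 in the denominator).
    var_a = sum((x - mean_a) ** 2 for x in group_a) / (n_a - 1)
    var_b = sum((x - mean_b) ** 2 for x in group_b) / (n_b - 1)
    # Pooled standard deviation across the two groups.
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (mean_a - mean_b) / pooled_sd
```

For example, `cohens_d([2, 4, 6, 8], [1, 3, 5, 7])` returns a d of about 0.39, which Cohen's guidelines below would class as a small-to-medium effect.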
If the d value, sometimes called delta, is approximately .20, the effect size is regarded as small. That is, researchers would conclude that any difference between the genders is not especially pronounced, almost imperceptible to the average person. If the d value is approximately .50, the effect size is regarded as medium. Specifically, researchers would conclude that the difference between the genders is moderate, apparent to experts but probably not to a layperson. Finally, if the d value is approximately .80, the effect size is regarded as large. In this case, researchers would conclude that the difference between the genders is conspicuous, clear to almost anyone.
Unfortunately, this d value is not especially robust. That is, almost trivial changes in the population can generate major differences in the Cohen d value. This article presents a variant, developed by Algina, Keselman, and Penfield (2005). To compute this variant, researchers Winsorize the lowest and highest 20% of values in each group, apply the formula developed by Cohen to these adjusted scores, and finally multiply the answer by .642. This measure is suitable whenever researchers need to compare two independent groups or conditions.
Algina, Keselman, and Penfield (2005) described a technique to compute a robust variant of delta. First, researchers should Winsorize the highest 20% of values in each group; that is, each of these extreme values is replaced with the highest remaining value, rather than simply deleted.
Second, researchers should Winsorize the lowest 20% of values, using the same method; that is, each of these values is replaced with the lowest remaining value.
Third, researchers should apply the usual formula, developed by Cohen, to these Winsorized scores, and then multiply the result by .642.
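The three steps above can be sketched in Python as follows. This is a minimal illustration of the procedure as described in this article, not the authors' own code; the function names, and the reuse of the sample-variance formula on the Winsorized scores, are assumptions made for the example.

```python
import math

def winsorize(values, proportion=0.2):
    """Replace the lowest and highest `proportion` of values with the
    nearest remaining value (Winsorizing), rather than deleting them."""
    ordered = sorted(values)
    g = int(proportion * len(ordered))  # values to replace in each tail
    low, high = ordered[g], ordered[len(ordered) - 1 - g]
    return [min(max(x, low), high) for x in values]

def robust_d(group_a, group_b, proportion=0.2):
    """Robust delta: Winsorize 20% in each tail, apply Cohen's formula
    to the adjusted scores, then rescale by .642."""
    wa = winsorize(group_a, proportion)
    wb = winsorize(group_b, proportion)
    na, nb = len(wa), len(wb)
    ma, mb = sum(wa) / na, sum(wb) / nb
    va = sum((x - ma) ** 2 for x in wa) / (na - 1)
    vb = sum((x - mb) ** 2 for x in wb) / (nb - 1)
    pooled_sd = math.sqrt(((na - 1) * va + (nb - 1) * vb) / (na + nb - 2))
    return 0.642 * (ma - mb) / pooled_sd
```

For instance, Winsorizing the scores 1 through 10 at 20% replaces the two lowest values with 3 and the two highest with 8, so a single extreme outlier can no longer dominate the standard deviation.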
As demonstrated by Algina, Keselman, and Penfield (2005), this variant of delta is more robust: trivial changes in the population or sample no longer generate major shifts in the index. The problem with the original measure, according to Algina, Keselman, and Penfield (2005), arises because that index is too dependent on variations in the tails of each distribution. As a consequence, deviations from a normal distribution can affect the tails and bias the measure appreciably. Winsorizing overcomes this problem.
To interpret the magnitude of this effect size, the guidelines developed by Cohen can be applied in this instance as well. That is, values of 0.2, 0.5, and 0.8 represent small, medium, and large effects respectively.
In addition, Algina, Keselman, and Penfield (2005) recommend that researchers use bootstrap procedures to estimate confidence intervals around this index. In their research, these bootstrap procedures were very effective.
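A generic percentile bootstrap for a two-group statistic can be sketched as below. This is only an illustration of the general bootstrap idea, assuming a simple percentile interval; Algina, Keselman, and Penfield (2005) should be consulted for the specific procedure they evaluated.

```python
import random

def bootstrap_ci(group_a, group_b, stat, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for any two-group statistic.

    `stat` is a function of (group_a, group_b), such as a robust delta.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        # Resample each group with replacement, preserving group sizes.
        resample_a = [rng.choice(group_a) for _ in group_a]
        resample_b = [rng.choice(group_b) for _ in group_b]
        estimates.append(stat(resample_a, resample_b))
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper
```

Because `stat` is passed in as a function, the same routine can bracket the ordinary Cohen d, the robust variant, or a simple difference in means.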
Many alternative indices of effect size are applicable when researchers merely want to compare two groups, such as males and females, on some measure; examples include dominance statistics (Cliff, 1993, 1996) and the common language effect size (McGraw & Wong, 1992). Many of these measures are also applicable to studies in which researchers compare more than two groups.

References
Algina, J., Keselman, H. J., & Penfield, R. D. (2005). An alternative to Cohen's standardized mean difference effect size: A robust parameter and confidence interval in the two independent groups case. Psychological Methods, 10, 317-328.
Cliff, N. (1993). Dominance statistics: Ordinal analyses to answer ordinal questions. Psychological Bulletin, 114, 494-509.
Cliff, N. (1996). Answering ordinal questions with ordinal data using ordinal statistics. Multivariate Behavioral Research, 31, 331-350.
Cohen, J. (1965). Some statistical issues in psychological research. In B. B. Wolman (Ed.), Handbook of clinical psychology (pp. 95-121). New York: Academic Press.
McGraw, K. O., & Wong, S. P. (1992). A common language effect size statistic. Psychological Bulletin, 111, 361-365.
Last Update: 6/26/2016