The majority of scientific papers in psychology report tests of significance--z, t, and F values, for example. Generally, if these values are high, the researchers can conclude the groups differ significantly from one another or the variables are significantly related to one another.
Nevertheless, a high z, t, or F value does not necessarily imply the difference between the groups--or the relationship between these variables--is large. A high z, t, or F value can be generated even when the difference between the groups, or the relationship between the variables, is small, provided the sample size is large. In other words, a high z, t, or F value merely indicates the researcher can be quite confident the groups differ from each other or the variables are related to each other; it says little about the size of that difference or relationship.
In contrast, measures of effect size, such as the Cohen d value, represent the extent to which the groups differ from one another or the degree to which the variables are related (Cohen, 1965). To illustrate, suppose that researchers want to examine whether males or females have acquired more Facebook friends. The d value simply equals the difference between the means of the two genders divided by the standard deviation within the groups--technically, the pooled standard deviation.
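As a rough sketch, the calculation described in the previous paragraph can be written in a few lines of Python. The function name and the use of sample variances (dividing by n - 1) are illustrative choices, not something prescribed by the sources above.

```python
import math

def cohens_d(group_a, group_b):
    """Cohen's d: difference between the group means divided by the pooled SD."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a = sum(group_a) / n_a
    mean_b = sum(group_b) / n_b
    # Sample variances (n - 1 in the denominator).
    var_a = sum((x - mean_a) ** 2 for x in group_a) / (n_a - 1)
    var_b = sum((x - mean_b) ** 2 for x in group_b) / (n_b - 1)
    # Pooled standard deviation across the two groups.
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (mean_a - mean_b) / pooled_sd
```

For example, `cohens_d([2, 4, 6, 8], [1, 3, 5, 7])` returns a d of about 0.39, which Cohen's guidelines below would class as a small-to-medium effect.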
If the d value, sometimes called delta, is approximately .20, the effect size is regarded as small. That is, researchers would conclude that any difference between the genders is not especially pronounced, almost imperceptible to the average person. If the d value is approximately .50, the effect size is regarded as medium. Specifically, researchers would conclude that the difference between the genders is moderate, apparent to experts but probably not to a layperson. Finally, if the d value is approximately .80, the effect size is regarded as large. In this case, researchers would conclude that the difference between the genders is conspicuous, clear to almost anyone.
Unfortunately, this d value is not especially robust. That is, almost trivial changes in the population can generate major differences in the Cohen d value. This article presents a variant, developed by Algina, Keselman, and Penfield (2005). To compute this variant, researchers Winsorize the lowest and highest 20% of values in each group, apply the formula developed by Cohen to these adjusted scores, and finally multiply the answer by .642. This measure is suitable whenever researchers need to compare two independent groups or conditions.
Algina, Keselman, and Penfield (2005) described a technique to compute a robust variant of delta. First, researchers should Winsorize the highest 20% of values in each group; that is, each of these extreme values is replaced with the highest remaining value, rather than simply deleted.
Second, researchers should Winsorize the lowest 20% of values, using the same method; that is, each of these values is replaced with the lowest remaining value.
Third, researchers should apply the usual formula, developed by Cohen, to these Winsorized scores, and then multiply the result by .642.
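The three steps above can be sketched in Python as follows. This is a minimal illustration of the procedure as described in this article, not the authors' own code; the function names, and the reuse of the sample-variance formula on the Winsorized scores, are assumptions made for the example.

```python
import math

def winsorize(values, proportion=0.2):
    """Replace the lowest and highest `proportion` of values with the
    nearest remaining value (Winsorizing), rather than deleting them."""
    ordered = sorted(values)
    g = int(proportion * len(ordered))  # values to replace in each tail
    low, high = ordered[g], ordered[len(ordered) - 1 - g]
    return [min(max(x, low), high) for x in values]

def robust_d(group_a, group_b, proportion=0.2):
    """Robust delta: Winsorize 20% in each tail, apply Cohen's formula
    to the adjusted scores, then rescale by .642."""
    wa = winsorize(group_a, proportion)
    wb = winsorize(group_b, proportion)
    na, nb = len(wa), len(wb)
    ma, mb = sum(wa) / na, sum(wb) / nb
    va = sum((x - ma) ** 2 for x in wa) / (na - 1)
    vb = sum((x - mb) ** 2 for x in wb) / (nb - 1)
    pooled_sd = math.sqrt(((na - 1) * va + (nb - 1) * vb) / (na + nb - 2))
    return 0.642 * (ma - mb) / pooled_sd
```

For instance, Winsorizing the scores 1 through 10 at 20% replaces the two lowest values with 3 and the two highest with 8, so a single extreme outlier can no longer dominate the standard deviation.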
As demonstrated by Algina, Keselman, and Penfield (2005), this variant of delta is more robust: trivial changes in the population or sample no longer generate major shifts in the index. The problem with the original measure, according to Algina, Keselman, and Penfield (2005), arises because that index is too dependent on variations in the tails of each distribution. As a consequence, deviations from a normal distribution can affect the tails and bias the measure appreciably. Winsorizing overcomes this problem.
To interpret the magnitude of this effect size, the guidelines developed by Cohen can be applied in this instance as well. That is, values of 0.2, 0.5, and 0.8 represent small, medium, and large effects respectively.
In addition, Algina, Keselman, and Penfield (2005) recommend that researchers use bootstrap procedures to estimate confidence intervals around this index. In their research, these bootstrap procedures were very effective.
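A generic percentile bootstrap for a two-group statistic can be sketched as below. This is only an illustration of the general bootstrap idea, assuming a simple percentile interval; Algina, Keselman, and Penfield (2005) should be consulted for the specific procedure they evaluated.

```python
import random

def bootstrap_ci(group_a, group_b, stat, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for any two-group statistic.

    `stat` is a function of (group_a, group_b), such as a robust delta.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        # Resample each group with replacement, preserving group sizes.
        resample_a = [rng.choice(group_a) for _ in group_a]
        resample_b = [rng.choice(group_b) for _ in group_b]
        estimates.append(stat(resample_a, resample_b))
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper
```

Because `stat` is passed in as a function, the same routine can bracket the ordinary Cohen d, the robust variant, or a simple difference in means.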
Many alternative indices of effect size are applicable when researchers merely want to compare two groups, such as males and females, on some measure; examples include dominance statistics (Cliff, 1993, 1996) and the common language effect size (McGraw & Wong, 1992). Many of these measures are also applicable to studies in which researchers compare more than two groups.

References
Algina, J., Keselman, H. J., & Penfield, R. D. (2005). An alternative to Cohen's standardized mean difference effect size: A robust parameter and confidence interval in the two independent groups case. Psychological Methods, 10, 317-328.
Cliff, N. (1993). Dominance statistics: Ordinal analyses to answer ordinal questions. Psychological Bulletin, 114, 494-509.
Cliff, N. (1996). Answering ordinal questions with ordinal data using ordinal statistics. Multivariate Behavioral Research, 31, 331-350.
Cohen, J. (1965). Some statistical issues in psychological research. In B. B. Wolman (Ed.), Handbook of clinical psychology (pp. 95-121). New York: Academic Press.
McGraw, K. O., & Wong, S. P. (1992). A common language effect size statistic. Psychological Bulletin, 111, 361-365.
Last Update: 6/26/2016