# Between-subject versus repeated-measures designs

### Introduction

A researcher would like to ascertain whether or not music influences the mood of individuals. In particular, the researcher wants to ascertain whether or not Metallica's 'Kill and maim' and their cover version of 'I'm a little teapot short and stout' confer different effects on the mood of listeners.

Presumably, the researcher could present 'Kill and maim' to one group of participants and then present 'I'm a little teapot short and stout' to another. This design, which is called "between-subjects" as well as "independent", is represented in the table below. Only an extract of the data is provided. The first column specifies the name of each participant. The second column specifies the song that participant received. The third column specifies the mood of participants on a 10-point scale, where 1 represents very negative and 10 represents very positive.

| Participant | Song that is presented | Mood |
| --- | --- | --- |
| Amber | Kill and maim | 6 |
| Bob | I'm a little teapot... | 4 |
| Carl | Kill and maim | 9 |
| Donna | I'm a little teapot... | 2 |
| ... | ... | ... |
| Zoe | Kill and maim | 6 |

Alternatively, the researcher could present both 'Kill and maim' and 'I'm a little teapot short and stout' to the same group of participants and measure their mood after each song. This design, which is called 'repeated measures' as well as 'within-subjects', 'dependent', 'matched', or 'related', is represented in the table below. Again, the first column specifies the name of each participant. The second column specifies their mood after they listen to 'Kill and maim'. The third column specifies their mood after they listen to 'I'm a little teapot short and stout'.

| Participant | Kill and maim | I'm a little teapot... |
| --- | --- | --- |
| Amber | 5 | 6 |
| Bob | 2 | 4 |
| Carl | 7 | 9 |
| Donna | 5 | 2 |
| ... | ... | ... |
| Mandy | 1 | 6 |

This document discusses the criteria that should be considered to ascertain whether a between-subject or repeated-measures design is appropriate.

### Benefits of repeated measures designs: Power

Repeated-measures designs present some vital benefits. The principal benefit revolves around the power of these designs. That is, when a genuine effect exists, repeated-measures designs are more likely to yield significant results than are between-subject designs.

To demonstrate the source of this power, consider the following table. This table provides an extract of data that emerged from a between-subject design. As this table reveals, the participants differ from one another on many attributes, such as their confidence, number of children, and so forth. As a consequence, mood will vary dramatically across the individuals. This variability tends to inflate the error term - the denominator of a t or F value. Hence, the t or F value is often reduced and thus significant results become more difficult to achieve.

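| Participant | Song | Mood | Confidence | Number of children |
| --- | --- | --- | --- | --- |
| Amber | Kill and maim | 5 | 4 | 2 |
| Bob | Kill and maim | 2 | 2 | 4 |
| Carl | Kill and maim | 9 | 8 | 0 |
| Donna | Kill and maim | 1 | 3 | 4 |
| Ernie | Kill and maim | 8 | 8 | 0 |
| Fred | Kill and maim | 7 | 7 | 0 |
| George | I'm a little teapot... | 9 | 8 | 0 |
| Helen | I'm a little teapot... | 2 | 3 | 3 |
| ... | ... | ... | ... | ... |
| Zoe | I'm a little teapot... | 6 | 5 | 2 |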

In contrast, consider the next table, which provides an extract of data that emerged from a repeated-measures design. Again, the participants differ from one another on many attributes. In this instance, however, these attributes will influence the mood of participants to a similar extent in both conditions. As a consequence, the difference in mood between the two conditions is somewhat uniform across participants. The error term - or denominator of the t and F value - depends primarily upon the variability of this difference across participants, rather than upon the variability of mood itself. Hence, the t or F values will tend to be elevated, which increases the likelihood of significant results.

| Participant | Kill and maim | I'm a little teapot... | Difference between the two songs |
| --- | --- | --- | --- |
| Amber | 5 | 6 | -1 |
| Bob | 2 | 4 | -2 |
| Carl | 7 | 9 | -2 |
| Donna | 5 | 5 | 0 |
| ... | ... | ... | ... |
| Mandy | 1 | 3 | -2 |

In short, a design that involves repeated measures tends to be more powerful; that is, significant effects are more likely to be achieved. Indeed, these designs can be more powerful than between-subject research, even when only half the number of participants is utilized.
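As a rough illustration of this point, the sketch below computes both a between-subject (pooled-variance) t value and a paired t value from the five participants listed in the repeated-measures extract above. The elided rows are not used, and the function names are hypothetical. Treating the two columns as if they came from separate groups is only a thought experiment to show how the error term changes.

```python
from math import sqrt
from statistics import mean, variance

# Moods of the five listed participants (Amber, Bob, Carl, Donna, Mandy).
kill_and_maim = [5, 2, 7, 5, 1]
teapot = [6, 4, 9, 5, 3]


def independent_t(a, b):
    """Pooled-variance t statistic, treating a and b as separate groups."""
    n_a, n_b = len(a), len(b)
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_a + n_b - 2)
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(a) - mean(b)) / standard_error


def paired_t(a, b):
    """Paired t statistic, computed from the within-participant differences."""
    differences = [x - y for x, y in zip(a, b)]
    standard_error = sqrt(variance(differences) / len(differences))
    return mean(differences) / standard_error


print(round(independent_t(kill_and_maim, teapot), 2))  # -0.93
print(round(paired_t(kill_and_maim, teapot), 2))       # -3.5
```

The mean difference (-1.4) is identical in both calculations. Only the denominator changes, because the paired version is not inflated by the large differences in mood between participants.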

### Benefits of repeated measures designs: Equality of treatment

To reiterate, in the between-subject design, some participants listen to 'Kill and maim' and other participants listen to 'I'm a little teapot short and stout'. In the repeated-measures design, however, every participant listens to both songs. Hence, in repeated-measures designs, all participants experience the same process.

Three principal benefits accrue when all participants experience the same process and thus receive the same treatment. The first benefit revolves around the prevention of compensation. To illustrate, consider the between-subject design, in which participants did not all experience the same process. Instead, only half the participants receive the popular and stirring hit 'Kill and maim'. The remaining participants might thus feel disappointed they did not receive such an energising number. To compensate, they might attempt to energise themselves, perhaps banging their head to the beat of 'I'm a little teapot short and stout'. This compensation contaminates the findings.

The second benefit revolves around ethical issues. To demonstrate, suppose that 'Kill and maim' is thought to enhance confidence and assurance. Thus, participants who listen only to 'I'm a little teapot short and stout' do not receive the benefits that stem from 'Kill and maim'. In essence, the between-subject design thus prevents participants from receiving a potential treatment for their uncertainty and timidity. The repeated-measures design circumvents this dilemma.

The final benefit revolves around logistics. The between-subject design, in which participants must be assigned to different conditions, can be difficult to organise.

### Drawbacks of repeated-measures designs: Withdrawal

The previous sections identified the benefits of repeated-measures designs. Unfortunately, these designs also present some drawbacks. The first drawback emerges from the extended commitment that participants must exhibit. In the between-subject design, the participants merely listened to one song. In the repeated-measures design, the participants listened to an additional song. Hence, the repeated-measures design demanded more effort and time. If the design comprised many, protracted conditions, this disparity between between-subject and repeated-measures designs would be magnified.

The main difficulty associated with this additional effort is that fewer participants complete all conditions. Specifically, many individuals will refuse to participate in such an extensive regime. In addition, many participants could withdraw even before they complete the entire process.

If many individuals withdraw from the study or refuse to participate at all, the sample size obviously diminishes. This reduction in the sample size could even offset the increased power that repeated-measures designs usually afford. Indeed, suppose the additional effort associated with the repeated-measures design halves the sample size. This design will thus be approximately as powerful as a between-subject study.

Nevertheless, the withdrawal of participants presents some additional, and more acute, problems as well. Specifically, this withdrawal biases the final sample. To illustrate, suppose that most individuals refuse to participate in the entire process. Accordingly, the final sample will comprise the few individuals who consent to this research. These participants most likely exhibit some unique qualities; otherwise, their behaviour would have conformed to the majority of individuals who refused to participate. As a consequence, the findings that emerge from this sample cannot be extended to the majority of individuals who would not participate. The results, hence, cannot be generalised with confidence.

### Drawbacks of repeated-measures design: Reactivity and variation over time

The repeated-measures design also presents another drawback. In particular, performance in one condition can sometimes contaminate performance in another. To illustrate, suppose that all participants listen to 'Kill and maim' and then 'I'm a little teapot short and stout'. Many of these participants might feel fatigued after they listen to the first song, 'Kill and maim'. As a consequence, their attention might wane during the presentation of 'I'm a little teapot short and stout'. This song thus might not enhance mood as expected. In other words, performance in one condition influences the effect of another condition.

To demonstrate this issue, consider the following pair of tables. The first table presents the mood of participants who listened to one song only. According to this table, mood is elevated in participants who listen to 'I'm a little teapot short and stout'. Presumably, this cheerful melody induces a pleasant mood. The second table presents the mood of participants who listened to 'Kill and maim' and then 'I'm a little teapot short and stout'. According to this table, mood is not elevated in participants who listen to 'I'm a little teapot short and stout'. Perhaps individuals were too fatigued to listen intently to 'I'm a little teapot short and stout'. That is, the first condition imposes an unfair disadvantage upon the second condition, which is represented by the asterisks.

| Participant | Song | Mood |
| --- | --- | --- |
| Amber | Kill and maim | 2 |
| Bob | Kill and maim | 1 |
| Carl | Kill and maim | 3 |
| Donna | I'm a little teapot... | 9 |
| Ernie | I'm a little teapot... | 8 |
| ... | ... | ... |
| Zoe | I'm a little teapot... | 9 |

| Participant | Kill and maim | I'm a little teapot... |
| --- | --- | --- |
| Amber | 5 | 6* |
| Bob | 2 | 3* |
| Carl | 7 | 4* |
| Donna | 5 | 1* |
| ... | ... | ... |
| Mandy | 1 | 4* |

### Counterbalancing

The previous section revealed that performance in one condition could influence the effect of another condition. A technique, called counterbalancing, provides a means to counteract this problem. Specifically, the order of each condition or song is balanced across individuals.

To counterbalance the conditions in this context, half of the participants listened first to 'Kill and maim' and then to 'I'm a little teapot short and stout'. The mood of these participants is represented in the top half of the following table. For these participants, mood in response to 'I'm a little teapot short and stout' is somewhat depressed, because of the fatigue they experience while they listen to this song. In other words, fatigue contaminates the mood of these individuals, and this contamination is denoted by asterisks.

The remaining participants listened first to 'I'm a little teapot short and stout' and then to 'Kill and maim'. The mood of these participants is represented in the bottom half of the following table. For these participants, mood in response to 'I'm a little teapot short and stout' is elevated. Instead, mood in response to 'Kill and maim' is particularly depressed as a consequence of fatigue.

| Participant | First song heard | Kill and maim | I'm a little teapot... |
| --- | --- | --- | --- |
| Amber | Kill and maim | 5 | 3* |
| Bob | Kill and maim | 2 | 4* |
| Carl | Kill and maim | 7 | 6* |
| Donna | Kill and maim | 5 | 5* |
| ... | ... | ... | ... |
| Mandy | Kill and maim | 1 | 3* |
| Neal | I'm a little teapot | 3* | 6 |
| Oliver | I'm a little teapot | 1* | 8 |
| Petra | I'm a little teapot | 6* | 9 |
| Rachel | I'm a little teapot | 3* | 8 |
| ... | ... | ... | ... |
| Zoe | I'm a little teapot | 1* | 5 |

Thus, for half the participants, fatigue contaminates mood after 'I'm a little teapot short and stout'. For the remaining participants, fatigue contaminates mood after 'Kill and maim'. Overall, when the data of all participants are combined, these distortions annul one another. If the mean level of mood differs between the two conditions, the researcher can thus conclude that mood is contingent upon the song that participants receive. This difference cannot be ascribed to fatigue or any other variation over time.
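This cancellation can be written out under a simple additive assumption that the text does not state explicitly. Suppose the true mean moods are $\mu_K$ for 'Kill and maim' and $\mu_T$ for 'I'm a little teapot short and stout', and fatigue lowers mood after whichever song is heard second by the same amount $f$. All of these symbols are illustrative. With equal numbers of participants in each order, the observed means are:

$$
\bar{K} = \tfrac{1}{2}\bigl[\mu_K + (\mu_K - f)\bigr] = \mu_K - \tfrac{f}{2},
\qquad
\bar{T} = \tfrac{1}{2}\bigl[(\mu_T - f) + \mu_T\bigr] = \mu_T - \tfrac{f}{2}
$$

$$
\bar{T} - \bar{K} = \mu_T - \mu_K
$$

Both means are lowered by $f/2$, so the fatigue term drops out of the difference between the two conditions.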

Counterbalancing can also be applied if the design entails more than two conditions. For example, suppose the researcher also examines the impact of 'Kill the little teapot' on mood. In this instance:

A sixth of the participants will listen to 'Kill and maim', then 'I'm a little teapot short and stout', and finally 'Kill the little teapot'.

In addition, a sixth of the participants will listen to 'Kill the little teapot', then 'Kill and maim', and finally 'I'm a little teapot short and stout'.

...and so forth for each possible arrangement of these three songs.

In other words, six arrangements of these three songs need to be presented. Unfortunately, counterbalancing becomes cumbersome if the design comprises more than three conditions. For example, if the study involved four songs, 24 different arrangements need to be presented.
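The arrangements can be enumerated mechanically. The following is a minimal sketch using Python's standard library. The `assign_orders` helper and its round-robin allocation are assumptions for illustration only, since the text does not say how participants are allocated to arrangements.

```python
from itertools import permutations
from math import factorial

songs = [
    "Kill and maim",
    "I'm a little teapot short and stout",
    "Kill the little teapot",
]

# Every possible presentation order of the three songs.
orders = list(permutations(songs))
for order in orders:
    print(" -> ".join(order))

print(len(orders))   # 6
print(factorial(4))  # 24 arrangements would be needed for four songs


def assign_orders(participants, orders):
    """Hypothetical helper: cycle through the orders so each is used about equally often."""
    return {name: orders[i % len(orders)] for i, name in enumerate(participants)}


allocation = assign_orders(["Amber", "Bob", "Carl", "Donna", "Ernie", "Fred"], orders)
```

With six participants and six arrangements, each arrangement is used exactly once. In practice, the allocation would usually be randomised rather than following the order in which people enrol.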

To overcome this complication, researchers introduced an approach called a Latin square. When this approach is applied, the number of arrangements that need to be presented is equivalent to the number of conditions. For example, suppose the study involved four songs or conditions. Only 4, rather than 24, arrangements need to be presented.
These arrangements are depicted in the following table. The top row represents the first arrangement of the conditions. The next row represents the second arrangement of the conditions, and so forth. These Latin squares ensure that each song is presented first for 25% of participants, second for 25% of participants, third for 25% of participants, and fourth for 25% of participants.

| Group | First song heard | Second song heard | Third song heard | Final song heard |
| --- | --- | --- | --- | --- |
| First group of participants | Kill and maim | I'm a little teapot | Kill the teapot | My way |
| Second group of participants | I'm a little teapot | Kill and maim | My way | Kill the teapot |
| Third group of participants | My way | Kill the teapot | Kill and maim | I'm a little teapot |
| Fourth group of participants | Kill the teapot | My way | I'm a little teapot | Kill and maim |
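The defining property of a Latin square, that each condition appears exactly once in every row and every position, can be checked in a few lines. The sketch below verifies the four arrangements in the table above. It also shows one simple way to build a square by rotation; this is not necessarily how the table was generated. Both function names are hypothetical.

```python
def is_latin_square(rows):
    """True if every condition appears exactly once in each row and each column."""
    conditions = set(rows[0])
    if len(rows) != len(conditions):
        return False
    rows_ok = all(len(row) == len(conditions) and set(row) == conditions for row in rows)
    columns_ok = rows_ok and all(set(column) == conditions for column in zip(*rows))
    return rows_ok and columns_ok


def cyclic_latin_square(conditions):
    """Build a Latin square by rotating the condition list one step per row."""
    n = len(conditions)
    return [[conditions[(row + col) % n] for col in range(n)] for row in range(n)]


# The four arrangements from the table above.
table = [
    ["Kill and maim", "I'm a little teapot", "Kill the teapot", "My way"],
    ["I'm a little teapot", "Kill and maim", "My way", "Kill the teapot"],
    ["My way", "Kill the teapot", "Kill and maim", "I'm a little teapot"],
    ["Kill the teapot", "My way", "I'm a little teapot", "Kill and maim"],
]
print(is_latin_square(table))                          # True
print(is_latin_square(cyclic_latin_square(table[0])))  # True
```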

### Obstacles to counterbalancing

The previous section revealed that one of the drawbacks of repeated-measures designs, reactivity and variation over time, can be overcome through counterbalancing. In other words, if counterbalancing can be applied, the only remaining drawback of repeated-measures designs is the potential for withdrawal.

Unfortunately, counterbalancing cannot always be applied successfully. First, sometimes a problem called asymmetric transfer or differential carry-over arises. To illustrate this problem, consider the pair of tables that are provided below. Again, the first table represents the mood of individuals who listened to 'Kill and maim' first. The second table represents the mood of individuals who listened to 'I'm a little teapot short and stout' first.

| Participant | First song heard | Kill and maim | I'm a little teapot... |
| --- | --- | --- | --- |
| Amber | Kill and maim | 5 | 3*** |
| Bob | Kill and maim | 2 | 4*** |
| Carl | Kill and maim | 7 | 6*** |
| Donna | Kill and maim | 5 | 5*** |
| ... | ... | ... | ... |
| Mandy | Kill and maim | 1 | 3*** |

| Participant | First song heard | Kill and maim | I'm a little teapot... |
| --- | --- | --- | --- |
| Neal | I'm a little teapot | 3* | 6 |
| Oliver | I'm a little teapot | 1* | 8 |
| Petra | I'm a little teapot | 6* | 9 |
| Rachel | I'm a little teapot | 3* | 8 |
| ... | ... | ... | ... |
| Zoe | I'm a little teapot | 1* | 5 |

Consider the participants who listen to 'Kill and maim' first. These individuals are likely to be deafened, as well as fatigued, after they listen to 'Kill and maim'. Hence, these participants will not even be able to hear 'I'm a little teapot short and stout'. As a consequence, 'I'm a little teapot short and stout' will be unable to enhance mood at all. In other words, the presentation of 'Kill and maim' appreciably contaminates the responses to 'I'm a little teapot short and stout', as depicted by the three asterisks.

On the other hand, consider the participants who listen to 'I'm a little teapot short and stout' first. These individuals will not be deafened after they listen to 'I'm a little teapot short and stout'. Hence, these participants will be able to hear 'Kill and maim'. As a consequence, 'Kill and maim' will theoretically be able to enhance mood. In other words, the presentation of 'I'm a little teapot short and stout' only marginally contaminates the responses to 'Kill and maim', as depicted by the single asterisk.

As a result, when the data from both sets of participants are combined, these distortions will not annul each other. Instead, mood in response to 'I'm a little teapot short and stout' will be unfairly depressed. This problem arises whenever the impact of the first condition on the second differs from the impact of the second condition on the first. Whenever asymmetric transfer is likely, a between-subject design is more applicable.
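A small numerical sketch may help to show why the distortions fail to cancel. It assumes a purely additive model in which each song subtracts a fixed penalty from the mood score of whichever song follows it. The function name, the true mood values, and the penalties are all hypothetical and are not taken from the tables.

```python
def observed_difference(true_kill, true_teapot, penalty_after_kill, penalty_after_teapot):
    """Mean observed (teapot - kill) difference under counterbalancing.

    Half the participants hear 'Kill and maim' first, so their teapot score
    loses penalty_after_kill; the other half hear the teapot song first, so
    their 'Kill and maim' score loses penalty_after_teapot.
    """
    mean_kill = (true_kill + (true_kill - penalty_after_teapot)) / 2
    mean_teapot = ((true_teapot - penalty_after_kill) + true_teapot) / 2
    return mean_teapot - mean_kill


# Hypothetical true moods of 5 and 7, so the true difference is 2.
# Symmetric carry-over: the penalties cancel.
print(observed_difference(5, 7, penalty_after_kill=2, penalty_after_teapot=2))  # 2.0
# Asymmetric carry-over: the teapot song is unfairly depressed.
print(observed_difference(5, 7, penalty_after_kill=4, penalty_after_teapot=1))  # 0.5
```

Under this model, the observed difference is biased by half the gap between the two penalties, and counterbalancing alone cannot remove that bias.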

In addition to asymmetric transfer, other issues can also undermine the legitimacy of counterbalancing. For example, counterbalancing is difficult to organize. That is, the need to vary the order in which the conditions are administered can be cumbersome. Indeed, in some instances, this task is unattainable.

To illustrate, suppose you would like to compare the mood of individuals before and after they listen to 'Kill and maim'. In this instance, assessments of mood before participants listen to the song must always precede assessments of mood after participants listen to the song. The order of these conditions cannot be varied or counterbalanced. In other words, a between-subject design must be introduced, where some of the participants do not listen to 'Kill and maim'.

### Conclusion

In summary, to ascertain whether a between-subject or repeated-measures design is most applicable, the following scheme should be followed.

First, determine whether or not a repeated-measures design, with counterbalanced conditions, could yield asymmetric transfer. If so, use a between-subject design.

Second, estimate the proportion of participants who would refuse to participate if they had to undertake all conditions rather than one. If more than 50% of the individuals who would agree to complete one condition would refuse to partake in all conditions, use a between-subject design.

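Third, determine whether or not all participants must undergo the same process in the same order, such that the conditions cannot be counterbalanced, as in a comparison of mood before and after a song. If so, use a between-subject design. If none of these restrictions apply, use a repeated-measures design that entails counterbalancing.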
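The scheme can be summarised as a short decision function. This is only a sketch of the three checks as listed. The function and parameter names are hypothetical labels, and the inputs are judgements the researcher has to estimate, not quantities the code can measure.

```python
def choose_design(asymmetric_transfer_likely, refusal_proportion, same_process_required):
    """Apply the three checks in order and return the suggested design.

    refusal_proportion is the estimated share (0 to 1) of people willing to
    complete one condition who would refuse to complete all conditions.
    """
    if asymmetric_transfer_likely:
        return "between-subject"
    if refusal_proportion > 0.5:
        return "between-subject"
    if same_process_required:
        return "between-subject"
    return "repeated-measures with counterbalancing"


print(choose_design(False, 0.2, False))  # repeated-measures with counterbalancing
print(choose_design(False, 0.7, False))  # between-subject
```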