
Sample syntax for regression analyses

Author: Dr Simon Moss

Introduction

To execute regression analyses in SPSS, researchers often prefer to create a syntax file rather than work through the menus. With a syntax file, they can repeat the analysis on other datasets efficiently, without selecting the same menus and options again. This article presents sample syntax that illustrates the phases researchers often complete: recoding items, checking missing data, checking internal consistency, testing assumptions, and then conducting the regression analyses.

How to use the syntax

To use the syntax:

Step 1. Reverse score or recode items

First, identify items that need to be reverse scored. For example, suppose 4 items are used to assess anxiety. Suppose high scores on two of the items reflect high anxiety, whereas low scores on the other two items reflect high anxiety. The scores on these last two items need to be reverse scored. That is, researchers need to ensure that high scores on each item correspond to high levels of the variable. Copy and paste the syntax below.

RECODE item3 (1=5) (2=4) (3=3) (4=2) (5=1) INTO item3r.

RECODE item4 (1=5) (2=4) (3=3) (4=2) (5=1) INTO item4r.

EXECUTE.

In this code, for example, the first line converts the 1s to 5s, the 2s to 4s, the 4s to 2s, and the 5s to 1s, leaving the 3s unchanged. Because of the INTO keyword, the recoded values are written to a new column in the data file called item3r; the original item3 column remains intact.
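As a hedged alternative, on a 1-to-5 response scale the same reversal can be computed arithmetically, because the reversed score equals 6 minus the original score. The sketch below assumes that item3 and item4 contain only the values 1 to 5. The CROSSTABS command is merely a check that each original value pairs with the intended reversed value.

```spss
* Assumes a 1-to-5 response scale, so reversed score = 6 - original score.
COMPUTE item3r = 6 - item3.
COMPUTE item4r = 6 - item4.
EXECUTE.

* Check the mapping, in which 1 should pair with 5, 2 with 4, and so forth.
CROSSTABS
  /TABLES=item3 BY item3r.
```

For a scale with a different range, the constant would be the minimum plus the maximum possible score.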

Other adjustments might be necessary.

Step 1b. Manage missing data

The next step is often to replace missing data with estimates, using a technique called expectation maximization (for information about the underlying rationale, see Expectation Maximization). This technique is only necessary if participants have not answered all of the key questions.

Type the following syntax:

MVA VARIABLES=height weight item1 item2 item3 item4 gender haircolor

/MAXCAT=25

/CATEGORICAL=gender haircolor

/EM(TOLERANCE=0.001 CONVERGENCE=0.0001 ITERATIONS=25

OUTFILE='C:\My Documents\Sample data file.sav').

Use the updated data, unless the data are not missing at random. That is, the updated file is not appropriate if whether a response is missing is related to the values on a variable even after other variables are controlled.
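To judge whether this step is needed at all, it can help to gauge how many responses are missing. The following is a minimal sketch rather than part of the original procedure; nmiss is a hypothetical variable name, and the item list is an assumption that should be replaced with your own key questions.

```spss
* Count, for each participant, how many of the listed items are missing.
* The name nmiss is hypothetical.
COUNT nmiss = item1 item2 item3 item4 (MISSING).
EXECUTE.

* Tabulate how many participants omitted 0, 1, 2, or more items.
FREQUENCIES VARIABLES=nmiss.
```

If every participant has a count of 0, the MVA command is unnecessary.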

Step 2. Examine alpha reliability or internal consistency

The next step is to compute Cronbach's alpha for each of your scales. Copy and paste the following syntax.

RELIABILITY

/VARIABLES = item1 item2 item3

/FORMAT=NOLABELS

/SCALE(ALPHA) = ALL/MODEL=ALPHA

/SUMMARY=TOTAL.

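Alpha values that exceed 0.7 are considered acceptable (e.g., Nunnally & Bernstein, 1994). Suppose the alpha value is less than 0.7. In this instance, consider removing an item whose deletion would raise alpha, as indicated in the "Cronbach's Alpha if Item Deleted" column that /SUMMARY=TOTAL produces, and then run the analysis again.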

You can continue removing items even if alpha exceeds 0.7. However, if the scale is popular, only remove items if necessary. That is, only remove items if alpha rises considerably.

Then, repeat this process with other scales or subscales. That is, copy and then paste these five lines. Replace the items with the next subscale, and continue.
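As an illustration only, suppose the output indicated that alpha would rise if item1 were removed. A second run might then resemble the sketch below. The choice of item1 is hypothetical, and the sketch assumes the scale uses the reverse-scored columns item3r and item4r created in Step 1.

```spss
* Hypothetical second run in which item1 has been dropped from the scale.
RELIABILITY
  /VARIABLES=item2 item3r item4r
  /SCALE(ALPHA)=ALL
  /MODEL=ALPHA
  /SUMMARY=TOTAL.
```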

Step 3. Compute the scale scores

You now need to compute the scores for each scale or subscale. For example, suppose that qn1, qn2, qn3r, and qn4r pertain to anxiety. However, suppose that qn1 was deleted, because this item reduces alpha. You thus need to create a new column that is the average of qn2, qn3r, and qn4r. This column would represent the level of anxiety. Accordingly, copy and paste the following syntax:

COMPUTE anxiety = mean(qn2, qn3r, qn4r).

EXECUTE.

In this syntax, each COMPUTE line creates one new column. The word after COMPUTE is merely the name of your new column. For example, the line above creates a column called anxiety.

Continue this process for each scale or subscale. That is, copy and paste the COMPUTE line. If your data includes 6 scales, then you need 6 COMPUTE commands. Highlight these lines, as well as the EXECUTE command, and then run.
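One optional refinement, offered as a sketch rather than as part of the original procedure, is the MEAN.n form of the function, which returns a score only when at least n of the listed items were answered. The second scale below is purely illustrative: qn5, qn6, and qn7 are placeholder item names.

```spss
* MEAN.2 returns a mean only when at least 2 of the listed items are not missing.
COMPUTE anxiety = MEAN.2(qn2, qn3r, qn4r).
* The items qn5 to qn7 are placeholders for a second scale.
COMPUTE openness = MEAN.2(qn5, qn6, qn7).
EXECUTE.
```

Without the suffix, MEAN returns a score whenever at least one item is present, which may or may not be what you want.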

Step 4. Assess multicollinearity

You now need to examine the correlation between all scales and other key variables. To create a correlation matrix, copy and paste the following syntax:

CORRELATIONS

/VARIABLES=anxiety openness height weight gender

/PRINT=TWOTAIL NOSIG

/MISSING=PAIRWISE.

Replace 'anxiety openness height weight gender' with all your scales and subscales. You can also include other numerical variables, such as height, as well as dichotomous variables, such as gender. The two categories in these dichotomous variables should be coded as 0 and 1 respectively in the data file. Finally, highlight and then execute this syntax.

Correlations that exceed 0.8 or so indicate multicollinearity. That is, these correlations suggest the two variables overlap unduly. These two variables should not be included in the same analysis, unless one is the dependent variable and the other is the independent variable. You might want to collapse these two variables into one scale.

Indeed, if the sample size is small, such as fewer than 100 participants, correlations that exceed 0.7 might indicate multicollinearity. Regardless, the correlations often provide some insight into the hypotheses as well.
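Pairwise correlations cannot reveal overlap that involves three or more variables at once. As a supplementary check, not part of the original procedure, the REGRESSION command can report collinearity diagnostics for the predictors in a planned analysis. The sketch below assumes verbaliq is the dependent variable, as in the later steps.

```spss
* COLLIN and TOL request tolerance and VIF in the coefficients table.
* COLLIN also requests a table of eigenvalues and condition indices.
REGRESSION
  /MISSING PAIRWISE
  /STATISTICS COEFF R ANOVA COLLIN TOL
  /NOORIGIN
  /DEPENDENT verbaliq
  /METHOD = ENTER anxiety openness gender.
```

A predictor with low tolerance (equivalently, high VIF) is largely redundant with the other predictors.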

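Sometimes, the correlation matrix comprises too many variables and is thus unwieldy. You could divide the matrix into smaller tables, such as three. For example, the following syntax uses the WITH keyword to correlate each variable listed before WITH with each variable listed after WITH: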

CORRELATIONS

/VARIABLES=anxiety openness WITH gender height weight

/PRINT=TWOTAIL NOSIG

/MISSING=PAIRWISE.

Step 5. Examine outliers

You can next identify multivariate outliers, that is, individuals whose profile of scores diverges appreciably from that of a typical person. These outliers might indicate the individual is not a member of your target population. Alternatively, these outliers might represent errors or compromise normality. To identify these outliers, copy and paste the following syntax:

REGRESSION

/MISSING PAIRWISE

/STATISTICS COEFF OUTS R ANOVA

/NOORIGIN

/DEPENDENT other

/METHOD = ENTER anxiety openness gender

/SAVE MAHAL.

Sometimes, every variable of interest already appears in the list after ENTER, so no variable remains to serve as the dependent variable. In this instance, create a new column of numbers in the data file and replace 'other' with the name of this column; the Mahalanobis distances depend only on the variables listed after ENTER.

Highlight and execute this syntax, which creates a new column in the data file called mah_1. High values represent potential multivariate outliers.
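One common way to decide which values of mah_1 are high, offered here as a sketch rather than as the author's own rule, is to compare each value with a chi-square distribution whose degrees of freedom equal the number of variables after ENTER. The example above has three such variables. The name p_mah is hypothetical.

```spss
* Use df = 3 because the regression above lists three variables after ENTER.
* The name p_mah is hypothetical.
COMPUTE p_mah = 1 - CDF.CHISQ(mah_1, 3).
EXECUTE.

* List the most extreme cases first.
SORT CASES BY mah_1 (D).
```

By a widely used convention, cases with p_mah below .001 are treated as potential multivariate outliers and inspected individually.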

Step 6. Specify possible regressions.

Before you proceed, specify the regressions you plan to undertake. For example, you might need to know how personality affects IQ. Suppose your study includes 5 measures of personality. Suppose your study includes 2 measures of IQ: verbal and spatial.

In this example, you would undertake two regressions. For the first regression, the dependent variable would be verbal IQ. The independent variables would be five personality traits. For the second regression, the dependent variable would be spatial IQ. The independent variables would again be 5 personality traits.

In short, for each regression, you need to specify the dependent variable. Then, you need to specify the independent variables. Sometimes, this step is simple; sometimes, this step demands some creativity. Mediated and moderated models will be discussed later.

Step 7. Undertake your first regression, partly as practice.

To undertake your first regression analysis, copy and paste the following syntax:

REGRESSION

/MISSING PAIRWISE

/STATISTICS COEFF OUTS R ANOVA

/NOORIGIN

/DEPENDENT verbaliq

/METHOD = ENTER extravrt neurot openness

/SAVE PRED COOK RESID.

DESCRIPTIVES

VARIABLES=res_1

/STATISTICS=SKEWNESS KURTOSIS.

GRAPH

/SCATTERPLOT(BIVAR)=pre_1 WITH res_1

/MISSING=LISTWISE.

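Before you examine the output of this regression, you need to assess the assumptions.

First, identify influential cases by inspecting the column labeled Coo_1, which contains Cook's distance for each case.

Second, after you remove these influential cases, you need to examine whether the residuals, which appear in the column labeled Res_1, are normally distributed. The skewness and kurtosis values in the DESCRIPTIVES output inform this judgment.

Third, you should examine the assumptions of homoscedasticity and linearity, using the scatterplot of Pre_1 against Res_1.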
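The following sketch shows one possible way to carry out these checks on the saved columns. It is an assumption about how the checks might be run, not the author's own procedure.

```spss
* Sort so that the largest coo_1 values appear at the top of the data file.
SORT CASES BY coo_1 (D).

* Histogram, normal Q-Q plot, and tests of normality for the residuals.
EXAMINE VARIABLES=res_1
  /PLOT HISTOGRAM NPPLOT
  /STATISTICS DESCRIPTIVES.
```

A frequently cited rule of thumb treats Cook's distances above 1 as influential, although other cutoffs are also used.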

Finally, examine the output of the regression. Specifically, examine the ANOVA table to ascertain whether R is significant. Examine the coefficients table to ascertain which independent variables are significant.

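Once you have completed the first regression, switch to the data file. Delete the columns Coo_1, Res_1, and Pre_1. Ensure that you delete these columns after each regression; otherwise, the next regression saves its values under new names, such as Res_2, and the syntax that refers to Res_1 would examine the old values. Finally, you can conduct other regression analyses as well.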
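The columns can also be removed with syntax rather than by hand. The line below is a minimal sketch that assumes the three saved columns carry their default names.

```spss
* Remove the saved columns so the next regression again creates pre_1, res_1, and coo_1.
DELETE VARIABLES pre_1 res_1 coo_1.
```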

Step 8. Undertake logistic regression

Sometimes, your dependent variable is dichotomous, such as gender. In these instances, you should undertake logistic regression rather than multiple regression. Copy and paste the following syntax:

LOGISTIC REGRESSION VARIABLES gender

/METHOD=ENTER height weight

/CRITERIA=PIN(.05) POUT(.10) ITERATE(20) CUT(.5).
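As an optional extension, the PRINT subcommand can request additional output. In the sketch below, GOODFIT requests the Hosmer-Lemeshow goodness-of-fit test and CI(95) requests 95% confidence intervals for the odds ratios. Whether this output is needed depends on your reporting requirements.

```spss
LOGISTIC REGRESSION VARIABLES gender
  /METHOD=ENTER height weight
  /PRINT=GOODFIT CI(95)
  /CRITERIA=PIN(.05) POUT(.10) ITERATE(20) CUT(.5).
```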

Step 9. Conduct moderated regression analyses.

If you want to examine whether one numerical variable moderates, or changes, the relationship between two other variables, moderated regression is often useful. For example, age might affect the relationship between personality and IQ. To undertake moderated regression analyses, first ascertain the mean of all your independent variables and moderators. In this example, the researcher would compute the means of the personality variables and age. Specifically, copy and paste the following syntax, but replace 'extravrt neurot openness age' with the names of your independent variables and moderators.

DESCRIPTIVES

VARIABLES= extravrt neurot openness age

/STATISTICS=MEAN.

Next create new columns that, in essence, combine each independent variable with each moderator. In this example, the researcher would create one column that combines extravrt with age. The researcher would then create another column that combines neurot with age and so forth. That is, copy and paste the following syntax.

COMPUTE ext_x_age = (extravrt - 1.76)*(age - 1.45).

COMPUTE neu_x_age = (neurot - 2.46)*(age - 1.45).

COMPUTE ope_x_age = (openness - 2.54)*(age - 1.45).

EXECUTE.
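In the COMPUTE lines above, the numbers 1.76, 2.46, 2.54, and 1.45 stand for the means reported by the DESCRIPTIVES command, so they need to be retyped for each dataset. As a hedged alternative, the sketch below lets SPSS supply the means. It assumes a version of SPSS in which the BREAK subcommand of AGGREGATE is optional, and the names m_ext, m_neu, m_ope, and m_age are hypothetical.

```spss
* Add each sample mean to every row of the data file.
* The names m_ext, m_neu, m_ope, and m_age are hypothetical.
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /m_ext = MEAN(extravrt)
  /m_neu = MEAN(neurot)
  /m_ope = MEAN(openness)
  /m_age = MEAN(age).

* Center each variable at its mean, then multiply.
COMPUTE ext_x_age = (extravrt - m_ext) * (age - m_age).
COMPUTE neu_x_age = (neurot - m_neu) * (age - m_age).
COMPUTE ope_x_age = (openness - m_ope) * (age - m_age).
EXECUTE.
```

The resulting product columns should match those created with hand-typed means, apart from rounding.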

Finally, include these new columns, called products or interactions, in the regression after the ENTER keyword. Whenever you include these products, also include the constituent independent variables and moderators. Copy and paste the following example.

REGRESSION

/MISSING PAIRWISE

/STATISTICS COEFF OUTS R ANOVA

/NOORIGIN

/DEPENDENT verbaliq

/METHOD = ENTER extravrt neurot openness age ext_x_age neu_x_age ope_x_age

/SAVE PRED COOK RESID.

DESCRIPTIVES

VARIABLES=res_1

/STATISTICS=SKEWNESS KURTOSIS.

GRAPH

/SCATTERPLOT(BIVAR)=pre_1 WITH res_1

/MISSING=LISTWISE.

Sometimes, these moderated regression analyses include too many variables. If so, you could examine each independent variable or moderator separately. In this instance, you could examine extravrt, neurot, and openness separately.
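For instance, a separate analysis of extravrt alone might resemble the sketch below. Entering the product in a second block, together with the CHANGE keyword, is one possible arrangement rather than the only one; it reports whether the product term explains additional variance beyond extravrt and age.

```spss
REGRESSION
  /MISSING PAIRWISE
  /STATISTICS COEFF OUTS R ANOVA CHANGE
  /NOORIGIN
  /DEPENDENT verbaliq
  /METHOD = ENTER extravrt age
  /METHOD = ENTER ext_x_age.
```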

Step 10. Conduct mediation analyses.

Two main techniques can be undertaken to examine mediation (for more information on these techniques, see Mediation analyses; for more information on analyses with multiple mediators, see Mediation analyses with multiple mediators).

The most common, but not necessarily the most effective, technique is called the Baron and Kenny approach or the causal steps approach (see Baron & Kenny, 1986). For example, suppose your study includes the following hypothesis: The association between personality and IQ is mediated by anxiety and depression. To assess this hypothesis, you need to undertake three to four steps.

First, show the independent variables are associated with the dependent variables. You could undertake two regression analyses: one in which verbal IQ is the dependent variable and one in which spatial IQ is the dependent variable.

Second, show the independent variables are associated with the mediators.

Third, show the mediators are associated with the dependent variables, after controlling the independent variables.

Finally, show the independent and dependent variables are unrelated once you control the mediators.

Copy and paste the syntax below. You should simply replace 'anxiety depressn' with your mediators as well as 'extravrt neurot openness' with your independent variables.

REGRESSION

/MISSING PAIRWISE

/STATISTICS COEFF OUTS R ANOVA CHANGE

/NOORIGIN

/DEPENDENT verbaliq

/METHOD = ENTER anxiety depressn

/METHOD = ENTER extravrt neurot openness

/SAVE PRED COOK RESID.

DESCRIPTIVES

VARIABLES=res_1

/STATISTICS=SKEWNESS KURTOSIS.

GRAPH

/SCATTERPLOT(BIVAR)=pre_1 WITH res_1

/MISSING=LISTWISE.

REGRESSION

/MISSING PAIRWISE

/STATISTICS COEFF OUTS R ANOVA CHANGE

/NOORIGIN

/DEPENDENT spatiaiq

/METHOD = ENTER anxiety depressn

/METHOD = ENTER extravrt neurot openness

/SAVE PRED COOK RESID.

DESCRIPTIVES

VARIABLES=res_1

/STATISTICS=SKEWNESS KURTOSIS.

GRAPH

/SCATTERPLOT(BIVAR)=pre_1 WITH res_1

/MISSING=LISTWISE.

The CHANGE keyword on the STATISTICS subcommand yields some vital output. Specifically, locate the p value associated with F change in the second step, in which the independent variables are entered. If this value is not significant, the independent variables are not significantly related to the dependent variable after the mediators are controlled. In other words, full mediation is operating.

If this value is significant, the independent variables were significantly related to the dependent variable even after the mediators are controlled. In other words, full mediation is not operating.
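The hierarchical syntax above addresses the later steps of the causal steps approach. The first two steps can be examined with ordinary regressions, as in the following sketch. The sketch shows only verbaliq and anxiety; it assumes you would repeat the first command with spatiaiq and the second with depressn as the dependent variable.

```spss
* First step, in which the independent variables predict a dependent variable.
REGRESSION
  /MISSING PAIRWISE
  /STATISTICS COEFF OUTS R ANOVA
  /NOORIGIN
  /DEPENDENT verbaliq
  /METHOD = ENTER extravrt neurot openness.

* Second step, in which the independent variables predict one mediator.
REGRESSION
  /MISSING PAIRWISE
  /STATISTICS COEFF OUTS R ANOVA
  /NOORIGIN
  /DEPENDENT anxiety
  /METHOD = ENTER extravrt neurot openness.
```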

References

Aiken, L. S., & West, S. G. (1991). Multiple regression: Testing and interpreting interactions. Thousand Oaks: Sage.

Baron, R. M., & Kenny, D. A. (1986). The moderator-mediator distinction in social psychological research: Conceptual, strategic, and statistical considerations. Journal of Personality and Social Psychology, 51, 1173-1182.

Nunnally, J. C., & Bernstein, I. H. (1994). Psychometric Theory. New York: McGraw-Hill.








Last Update: 6/28/2016