Choosing A Statistical Test


Steps in Statistical Testing:

1) State the null hypothesis (Ho) and the alternative hypothesis (Ha).

2) Choose an acceptable and appropriate level of significance (α) and sample size (n) for your particular study design.

3) Determine the appropriate statistical technique and corresponding test statistic.

4) Collect the data and compute the value of the test statistic.

5) Calculate the number of degrees of freedom for the data set.

6) Compare the value of the test statistic with the critical values in a statistical table for the appropriate distribution and using the correct degrees of freedom.

7) Make a statistical decision and express the statistical decision in terms of the problem under study.


Step 1: Statement of Statistical Hypotheses

In statistical testing, the null hypothesis (Ho) always states that there is no difference between the distributions being compared. For a trial of a drug's effect on the survival of animals, for example:

Ho: The survival of the animals is independent of drug treatment.

Ha: The survival of the animals is associated with drug treatment.

The alternative hypothesis (Ha) is obviously that the drug treatment does affect survival in some way. This may seem an odd way to phrase things since most biologists think of their experiment as a test of the hypothesis that drug treatment has an effect on the survival of the animals in the trial. While this may be an appropriate research hypothesis, it is actually the alternative statistical hypothesis. In statistics, we are always testing the null hypothesis. In short, all statistical tests are simply ways to examine different types of data and to determine whether or not you have a statistically significant reason to reject the null hypothesis. The distinction between the biological or research hypothesis and the statistical hypothesis is very important.

Step 2: Levels of Significance

Next, we need to ask whether the proportion of surviving animals with the drug treatment differed from the proportion of untreated survivors by more than chance alone could explain. Researchers specify the level of significance (α) for this decision beforehand, usually 0.05 or smaller. This means that the researcher is willing to accept a 5% chance of rejecting the null hypothesis (Ho) when it is in fact true. In other words, there is a 5% chance that the statistic will cause us to believe that the survival of the animals is associated with drug treatment when, in fact, their survival is independent of treatment. In biology we typically choose a level of significance of 0.05 or less, but a doctor working with human subjects might choose a level of 0.01 (1%) or less to be safe.
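To make the 5% figure concrete, the following sketch simulates many trials in which the null hypothesis is true by construction (treated and untreated animals share the same survival probability) and counts how often a chi-square test at the 0.05 level rejects it anyway. Everything specific here is an assumption made for illustration: the group size, the survival probability, the number of trials, the random seed, and the function names. The value 3.841 is the tabled chi-square critical value for 1 degree of freedom at the 0.05 level.

```python
import random

CRITICAL_VALUE = 3.841  # chi-square table, 1 degree of freedom, 0.05 level


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        return 0.0  # an empty row or column gives no evidence against Ho
    return n * (a * d - b * c) ** 2 / denominator


def simulate_type_i_error(trials=2000, group_size=50, survival_prob=0.6, seed=1):
    """Fraction of simulated trials that reject Ho even though Ho is true."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        # Both groups use the same survival probability, so Ho holds.
        treated = sum(rng.random() < survival_prob for _ in range(group_size))
        untreated = sum(rng.random() < survival_prob for _ in range(group_size))
        statistic = chi_square_2x2(treated, group_size - treated,
                                   untreated, group_size - untreated)
        if statistic > CRITICAL_VALUE:
            rejections += 1
    return rejections / trials


false_positive_rate = simulate_type_i_error()
print(f"Ho rejected in {false_positive_rate:.1%} of trials where Ho was true")
```

The printed rate should land near 5%, though not exactly on it, because the counts are discrete and the chi-square distribution is only an approximation for samples of this size.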

The p value approach has become common in the life sciences, and published results often require this format. The p value is the probability, when the null hypothesis is really true, of obtaining a test statistic value equal to or more extreme than the one calculated from the sample data. In other words, the p value is the smallest level of significance at which the null hypothesis can be rejected for a given dataset.
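As a rough illustration of steps 4 through 7 and of the p value, the sketch below runs a chi-square test of independence on the drug-survival example. The counts are hypothetical (invented here, not taken from any study), and the function names are illustrative. The p value uses the fact that, for 1 degree of freedom only, the upper-tail area of the chi-square distribution equals erfc(sqrt(x / 2)).

```python
import math

# Hypothetical counts, invented for illustration only.
# Rows: treated, untreated. Columns: survived, died.
observed = [[18, 7],
            [9, 16]]
alpha = 0.05


def chi_square_statistic(table):
    """Pearson chi-square statistic (no continuity correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (count - expected) ** 2 / expected
    return statistic


def p_value_df1(statistic):
    """Upper-tail area of the chi-square distribution, valid for 1 df only."""
    return math.erfc(math.sqrt(statistic / 2.0))


# Step 4: compute the test statistic.
statistic = chi_square_statistic(observed)
# Step 5: degrees of freedom = (rows - 1) * (columns - 1).
degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)
# Step 6: compare with the tabled critical value (1 df, 0.05 level).
critical_value = 3.841
reject_by_critical_value = statistic > critical_value
# Equivalent decision using the p value.
p_value = p_value_df1(statistic)
reject_by_p_value = p_value < alpha

# Step 7: state the decision in terms of the problem.
print(f"chi-square = {statistic:.3f}, df = {degrees_of_freedom}, p = {p_value:.4f}")
if reject_by_p_value:
    print("Reject Ho: survival appears to be associated with drug treatment.")
else:
    print("Fail to reject Ho: no significant association with treatment.")
```

The critical-value comparison and the p value comparison always lead to the same decision; the p value simply reports how far past (or short of) the threshold the data fall.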

Step 3: Choice of the Appropriate Statistical Test

Assume we have designed our experiment, stated our statistical hypothesis correctly, and determined the level of significance to be the typical 0.05 level. How do we choose the appropriate statistic from the many tests available? The choice of statistical test is in part determined by the design of the study and the type of data that is collected.

Data Types

There are basically two types of data: categorical and numerical. Categorical data fall into specific "categories," such as yes and no responses to a survey. Here there are only two choices and there are no intermediates possible. Sex (male or female) would be another good example of a categorical variable. On the other hand, data on the number of offspring a group of females produce is a type of numerical data because the answer is a number. Numerical data can be discrete or continuous. Discrete numerical variables arise from counting processes (e.g. How many cars do you own?), while continuous numerical variables arise from measuring processes (e.g. How tall are you?). The number of cars owned is discrete because the only possible responses are integers. You can't own half a car. Height, on the other hand, is continuous because it can take on any value within a range or interval, depending on the precision of the measuring device.

Measurement Scales

Technically, discrete numerical data are "measured by counting," so we can also talk about levels of measurement or types of measurement scales. There are four basic types: nominal, ordinal, interval and ratio. Categorical data are measured using either a nominal or ordinal scale. For example, data classified into distinct categories with no ordering of the categories are considered nominal (e.g. yes or no; political party affiliation: Democrat, Republican, Green, etc.). Categorical data in which the categories imply some sort of ranking scheme are considered ordinal (e.g. freshman, sophomore, junior, senior). Interval and ratio scales of measurement are numerical but differ in one important feature. Interval scales are ordered and the differences between points on the scale have the same meaning anywhere on the scale. Temperature measurements are good examples of interval scales. If the scale has all the features of an interval scale and there is a true zero point, then it is called a ratio scale. Measurements of length and weight are ratio scales because there is a true zero (e.g. you can't have a negative height). Note that the Celsius and Fahrenheit temperature scales each have a zero point, but that zero point is arbitrary rather than a true zero.
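One way to see why an arbitrary zero matters is to check whether ratios survive a change of units. The short sketch below, with illustrative values and function names chosen here, compares two temperatures and two lengths. On a ratio scale (length) the ratio is the same in any unit; on an interval scale (Celsius or Fahrenheit) it is not, so a statement like "twice as hot" has no meaning.

```python
def celsius_to_fahrenheit(celsius):
    return celsius * 9 / 5 + 32


def celsius_to_kelvin(celsius):
    return celsius + 273.15


def cm_to_inches(cm):
    return cm / 2.54


# Illustrative values only.
low_c, high_c = 10.0, 20.0
low_cm, high_cm = 10.0, 20.0

# Interval scale: the ratio depends on where the scale puts its zero.
ratio_celsius = high_c / low_c
ratio_fahrenheit = celsius_to_fahrenheit(high_c) / celsius_to_fahrenheit(low_c)
# Kelvin has a true zero, so only this temperature ratio is meaningful.
ratio_kelvin = celsius_to_kelvin(high_c) / celsius_to_kelvin(low_c)

# Ratio scale: the ratio is unchanged by a change of units.
ratio_cm = high_cm / low_cm
ratio_inches = cm_to_inches(high_cm) / cm_to_inches(low_cm)

print(f"temperature ratio: C {ratio_celsius:.3f}, "
      f"F {ratio_fahrenheit:.3f}, K {ratio_kelvin:.3f}")
print(f"length ratio: cm {ratio_cm:.3f}, in {ratio_inches:.3f}")
```

Differences, by contrast, keep their meaning on both kinds of scale: a 10 degree Celsius difference is always an 18 degree Fahrenheit difference, wherever on the scale it occurs.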

Try to classify the following numerical variables by the type of scale:

OK, back to how to choose a statistical test.

Let's keep it simple for now. On the left side of the table below are the goals of the study. Across the top of the table are types of data. The details will be discussed below.


                                            Type of Data Collected

Goal of the Study                           From Normal distribution    Rank, Score, or Non-normal Distribution  Binomial (2 outcomes)
---------------------------------------------------------------------------------------------------------------------------------------------
Describe one population                     Mean, Std Dev               Median, Interquartile range              Proportion
Compare pop. to hypothetical value          One-sample t Test           Wilcoxon Test                            Chi-square or Binomial Test
Compare 2 unpaired groups                   Unpaired t Test             Mann-Whitney Test                        Fisher's Test or Chi-square*
Compare >2 unmatched groups                 One-way ANOVA               Kruskal-Wallis Test                      Chi-square Test
Compare >2 matched groups                   Repeated-measures ANOVA     Friedman Test
Association b/w 2 variables                 Pearson correlation         Spearman correlation
Predict value from a measured value         Simple linear regression    Nonparametric regression
Predict value from several measured values  Multiple linear regression

* Chi-square for large samples
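The table above can also be read as a simple two-key lookup: pick the row by the goal of the study, then the column by the type of data. The sketch below transcribes the table into a Python dictionary. The key strings ("normal", "rank", "binomial") and the function name are labels made up here for illustration; cells that are empty in the table are simply absent.

```python
# Transcription of the table above. Keys are illustrative labels only.
TEST_TABLE = {
    "describe one population": {
        "normal": "Mean, Std Dev",
        "rank": "Median, Interquartile range",
        "binomial": "Proportion",
    },
    "compare pop. to hypothetical value": {
        "normal": "One-sample t Test",
        "rank": "Wilcoxon Test",
        "binomial": "Chi-square or Binomial Test",
    },
    "compare 2 unpaired groups": {
        "normal": "Unpaired t Test",
        "rank": "Mann-Whitney Test",
        "binomial": "Fisher's Test or Chi-square (large samples)",
    },
    "compare >2 unmatched groups": {
        "normal": "One-way ANOVA",
        "rank": "Kruskal-Wallis Test",
        "binomial": "Chi-square Test",
    },
    "compare >2 matched groups": {
        "normal": "Repeated-measures ANOVA",
        "rank": "Friedman Test",
    },
    "association b/w 2 variables": {
        "normal": "Pearson correlation",
        "rank": "Spearman correlation",
    },
    "predict value from a measured value": {
        "normal": "Simple linear regression",
        "rank": "Nonparametric regression",
    },
    "predict value from several measured values": {
        "normal": "Multiple linear regression",
    },
}


def choose_test(goal, data_type):
    """Return the table entry for a goal and data type, or None if the cell is empty."""
    if goal not in TEST_TABLE:
        raise KeyError(f"unknown goal: {goal!r}")
    return TEST_TABLE[goal].get(data_type)


print(choose_test("compare 2 unpaired groups", "rank"))
```

A lookup like this only narrows the choice. Whether the data actually meet the assumptions of the suggested test still has to be checked separately.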

Notice that there is more than one method for each type of study. How do we choose? First we have to make sure our data fit the assumptions of the test (otherwise GIGO: garbage in, garbage out). The first type of goal, description of a sample population, is straightforward and won't be considered further here. Suppose, however, we wish to compare our sample population with some hypothetical value for the population. The one-sample t test assumes that our numerical data are independently drawn from, and represent a random sample of, the population as a whole, and that the population is normally distributed. In practice, this test should be used with caution on small data sets (fewer than about 30 observations), where departures from normality are hard to detect and matter most. Less stringent assumptions are required by the non-parametric Wilcoxon Signed-Ranks Test, which does not require that the sample be drawn from a normally distributed population. (Tests that make no assumptions about the form of the population distribution are called non-parametric tests.) The Wilcoxon test requires interval or ratio data.