The next step in the procedure is to decide what the level of significance
should be, usually denoted α in textbooks, and to calculate the boundaries
of the critical region on the t curve. If a two-sided test is required then
the α value is split evenly between the two tails of the t curve. To find
the critical t-values it is necessary to consult a table of the
t-distribution, an example of which is given below:
                      t-Distribution

 Degrees          Significance (area in one tail)
 of Freedom    0.100    0.050    0.025    0.010    0.005
 -------------------------------------------------------
      1        3.078    6.314   12.706   31.821   63.657
      2        1.886    2.920    4.303    6.965    9.925
      3        1.638    2.353    3.182    4.541    5.841
      4        1.533    2.132    2.776    3.747    4.604
      5        1.476    2.015    2.571    3.365    4.032
      6        1.440    1.943    2.447    3.143    3.707
      7        1.415    1.895    2.365    2.998    3.499
      8        1.397    1.860    2.306    2.896    3.355
      9        1.383    1.833    2.262    2.821    3.250
     10        1.372    1.812    2.228    2.764    3.169
     11        1.363    1.796    2.201    2.718    3.106
     12        1.356    1.782    2.179    2.681    3.055
     13        1.350    1.771    2.160    2.650    3.012
     14        1.345    1.761    2.145    2.624    2.977
     15        1.341    1.753    2.131    2.602    2.947
     16        1.337    1.746    2.120    2.583    2.921
     17        1.333    1.740    2.110    2.567    2.898
     18        1.330    1.734    2.101    2.552    2.878
     19        1.328    1.729    2.093    2.539    2.861
     20        1.325    1.725    2.086    2.528    2.845
     21        1.323    1.721    2.080    2.518    2.831
     22        1.321    1.717    2.074    2.508    2.819
     23        1.319    1.714    2.069    2.500    2.807
     24        1.318    1.711    2.064    2.492    2.797
     25        1.316    1.708    2.060    2.485    2.787
     26        1.315    1.706    2.056    2.479    2.779
     27        1.314    1.703    2.052    2.473    2.771
     28        1.313    1.701    2.048    2.467    2.763
     29        1.311    1.699    2.045    2.462    2.756
     30        1.310    1.697    2.042    2.457    2.750
Before the correct t-value can be obtained it is necessary to determine the
"degrees of freedom", which equal n-1: if there are 15 observations then
the degrees of freedom looked up in the table are 15-1=14. If the test is
two-sided and uses an α level of 0.10 then the boundaries are
t_{0.05}=1.761 and -t_{0.05}=-1.761.
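As an alternative to the printed table, the same boundaries can be obtained from the TINV quantile function in BASE SAS. The following is a minimal sketch; the variable names (alpha, n, df, t_crit) are chosen here for illustration only.

```sas
/* Sketch: critical t-values for a two-sided test without a printed table. */
/* Variable names are illustrative assumptions.                            */
data _null_;
  alpha  = 0.10;                    /* two-sided significance level        */
  n      = 15;                      /* number of observations              */
  df     = n - 1;                   /* degrees of freedom                  */
  t_crit = tinv(1 - alpha/2, df);   /* upper boundary; lower is -t_crit    */
  put df= t_crit= 8.3;
run;
```

For 14 degrees of freedom this should agree with the tabulated value of 1.761 to three decimals.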
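The only real computation in this test is to calculate a value for t from
the data using the following formula:
$$ t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} \qquad (1.1) $$
where \bar{x} is the sample mean, s the sample standard deviation, n the
number of observations and \mu_0 the mean stated in H_{0}.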
With the critical boundaries known and a value of t computed from the
formula above it is now decision time: the hypothesis H_{0} is rejected
(and H_{1} accepted) if the value t is in the critical region, otherwise
accept the hypothesis H_{0}.
The following examples demonstrate the procedure used for the tests.
Example 1
A sample of eight bottles of a certain product was taken and their liquid
content measured. The results are below:
369, 357, 356, 364, 348, 361, 345, 364
The researcher wants to test the null hypothesis that the mean equals 355
versus the alternative that it does not. Let α = 0.01.
H_{0}: μ = 355, H_{1}: μ ≠ 355
As this is a two-tailed test with α=0.01 and (8-1)=7 degrees of freedom,
the t-table gives t_{0.005}=3.499 and -t_{0.005}=-3.499.
Do computations
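The arithmetic for this step can be sketched as follows, working from the eight observations (sum 2864, corrected sum of squares 476) and the one-sample formula for t:

```latex
\begin{align*}
\bar{x} &= \frac{2864}{8} = 358, \qquad
s = \sqrt{\frac{476}{7}} = \sqrt{68} \approx 8.246 \\
t &= \frac{\bar{x} - \mu_0}{s/\sqrt{n}}
   = \frac{358 - 355}{8.246/\sqrt{8}}
   \approx \frac{3}{2.9155} \approx 1.029
\end{align*}
```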
As -3.499 < 1.029 < 3.499 then accept H_{0}.
How is the same test done in SAS? There is a procedure called PROC TTEST
that will do the calculations; however, the default output and its
interpretation are very different. Using the same data as in Example 1
(loaded into a dataset called PRODA with a variable VOLUME) and running
the following code
proc ttest data=proda h0=355;  /* H0= sets the hypothesised mean (default is 0) */
  var volume;
run;
will produce the following output
The TTEST Procedure
Statistics
Lower CL Upper CL Lower CL Upper CL
Variable N Mean Mean Mean Std Dev Std Dev Std Dev Std Err
volume 8 346.25 358 369.75 4.6472 8.2462 24.477 2.9155
T-Tests
Variable DF t Value Pr > |t|
volume 7 1.03 0.3377
From the output, how is the decision made whether to accept or reject the
hypothesis? The number to look for is under the label "Pr > |t|". This is
the p-value of the two-sided test, and it is compared directly against the
significance level α (it is not halved, because both tails are already
included): if the p-value is greater than or equal to α then accept H_{0},
otherwise accept H_{1}. In this case the significance level was 0.01, and
as 0.3377 > 0.01, accept H_{0}.
Is there a way to check the p-value from the TTEST procedure against the
hand calculation? After all, the manual procedure for choosing between the
hypotheses is well known and appears in countless textbooks. The PROBT
function in BASE SAS gives the p-value for the t value computed in
Example 1, as the following data step and its result show:
data _null_;
  x=1.029;
  df=7;
  p=(1-probt(abs(x),df))*2; /*significance level of a two-tailed t test*/
  put p=;
run;
p=0.3377168268
For the single sample case the t-test can also be done with the UNIVARIATE
procedure and its MU0= option, as the following SAS code and output show
(look at the Student's t result in the Tests for Location section):
proc univariate data=proda mu0=355;
  var volume;
run;
The UNIVARIATE Procedure
Variable: volume
Moments
N 8 Sum Weights 8
Mean 358 Sum Observations 2864
Std Deviation 8.24621125 Variance 68
Skewness -0.480995 Kurtosis -0.7557588
Uncorrected SS 1025788 Corrected SS 476
Coeff Variation 2.30341096 Std Error Mean 2.91547595
Basic Statistical Measures
Location Variability
Mean 358.0000 Std Deviation 8.24621
Median 359.0000 Variance 68.00000
Mode 364.0000 Range 24.00000
Interquartile Range 12.00000
Tests for Location: Mu0=355
Test -Statistic- -----p Value------
Student's t t 1.028992 Pr > |t| 0.3377
Sign M 2 Pr >= |M| 0.2891
Signed Rank S 7 Pr >= |S| 0.3672
It is also possible to do the calculations using data step code within
BASE SAS, as outlined below, and produce output similar to the following:
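data _null_;
  /* ...SAS statements... */
  tsigl=-abs(tinv(alpha,df));      /* lower critical boundary */
  tsigh=abs(tinv(alpha,df));       /* upper critical boundary */
  tval=(mean-mju)/(std/sqrt(n));   /* t statistic */
  p=(1-probt(abs(tval),df))*2;     /* two-sided p-value */
  /* ...more SAS statements... */
run;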
--- Output ---
T-TEST
Dataset = PRODA
Variable = volume
H0 = 355 , H1 ^= 355
alpha = 0.01 (2-sided test: alpha/2=0.005)
N= 8
Mean=358
S= 8.2462112512
DF= 7
-3.499483297 < 1.0289915109 < 3.4994832974 : accept H0, reject H1
Pr > |t| = 0.3377205477
I keep the data step method as a macro in my macro collection. Every
programmer should carry around a collection of their useful or frequently
used code.
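One way such a data step could be filled in is sketched below. It is not the author's macro: the intermediate dataset STATS and the variable names (n_obs, mean_vol, std_vol, mu0, tcrit) are assumptions made for illustration. PROC MEANS supplies the summary statistics and the data step applies the one-sample formula.

```sas
/* Example 1 data, using the dataset and variable names from the text */
data proda;
  input volume @@;
  datalines;
369 357 356 364 348 361 345 364
;
run;

/* N, mean and standard deviation into a one-row dataset (STATS is a hypothetical name) */
proc means data=proda noprint;
  var volume;
  output out=stats n=n_obs mean=mean_vol std=std_vol;
run;

data _null_;
  set stats;
  mu0   = 355;                           /* mean under H0                 */
  alpha = 0.01;                          /* two-sided significance level  */
  df    = n_obs - 1;                     /* degrees of freedom            */
  tcrit = tinv(1 - alpha/2, df);         /* upper critical boundary       */
  tval  = (mean_vol - mu0) / (std_vol / sqrt(n_obs));
  p     = 2 * (1 - probt(abs(tval), df));   /* two-sided p-value          */
  if abs(tval) > tcrit then put 'Reject H0: ' tval= tcrit= p=;
  else put 'Accept H0: ' tval= tcrit= p=;
run;
```

Wrapping these steps in a macro with the dataset, variable, hypothesised mean and alpha as parameters would make the sketch reusable.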
Example 2
A company took a random sample of ten components and clocked the time a
machine took to recondition and inspect each component (in seconds). The
times were
5.7, 4.8, 5.9, 4.9, 6.1, 4.2, 6.5, 6.4, 5.8, 5.7
The goal is to have an average time of 5 seconds. Using a significance
level of 0.01, was the goal met?
H_{0}: μ = 5, H_{1}: μ > 5 (only concerned if the average time is greater
than 5 seconds)
As this is a single-tailed test with α=0.01 and (10-1)=9 degrees of
freedom, the t-table gives t_{0.01}=2.821.
Computation
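A sketch of the arithmetic, working from the ten observations (sum 56, corrected sum of squares 4.94) and rounding s to two decimals:

```latex
\begin{align*}
\bar{x} &= \frac{56}{10} = 5.6, \qquad
s = \sqrt{\frac{4.94}{9}} \approx 0.74 \\
t &= \frac{\bar{x} - \mu_0}{s/\sqrt{n}}
   = \frac{5.6 - 5}{0.74/\sqrt{10}}
   \approx \frac{0.6}{0.2340} \approx 2.564
\end{align*}
```

Carrying more digits for s (0.7409) gives t of about 2.561, so the third decimal depends on rounding.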
As 2.564 < 2.821 then accept H_{0}.
In this example, had the significance level been 0.05 (t_{0.05}=1.833)
then the result would have been quite different: as 1.833 < 2.564,
reject H_{0} and accept H_{1}.
Using SAS and the UNIVARIATE procedure (MU0=5) the output is:
The UNIVARIATE Procedure
Variable: time
Moments
N 10 Sum Weights 10
Mean 5.6 Sum Observations 56
Std Deviation 0.74087036 Variance 0.54888889
Skewness -0.7500206 Kurtosis -0.2612091
Uncorrected SS 318.54 Corrected SS 4.94
Coeff Variation 13.2298278 Std Error Mean 0.23428378
Basic Statistical Measures
Location Variability
Mean 5.600000 Std Deviation 0.74087
Median 5.750000 Variance 0.54889
Mode 5.700000 Range 2.30000
Interquartile Range 1.20000
Tests for Location: Mu0=5
Test -Statistic- -----p Value------
Student's t t 2.560997 Pr > |t| 0.0306
Sign M 2 Pr >= |M| 0.3438
Signed Rank S 19 Pr >= |S| 0.0488
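The decision to accept H_{0} or H_{1} is made by comparing the p-value
against the significance level. "Pr > |t|" is a two-sided p-value; for
this one-tailed test, with t in the direction of H_{1}, it is halved:
0.0306/2 = 0.0153. As 0.01 < 0.0153, accept H_{0}.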
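PROC TTEST can also report a one-sided p-value directly through its SIDES= option (available from SAS/STAT 9.2), which avoids adjusting the two-sided value by hand. The sketch below assumes a hypothetical dataset name RECOND; the variable name TIME follows the UNIVARIATE output.

```sas
/* Example 2 data; RECOND is a hypothetical dataset name */
data recond;
  input time @@;
  datalines;
5.7 4.8 5.9 4.9 6.1 4.2 6.5 6.4 5.8 5.7
;
run;

/* SIDES=U requests the upper-tailed alternative H1: mu > 5 */
proc ttest data=recond h0=5 sides=u alpha=0.01;
  var time;
run;
```

With SIDES=U the p-value column is labelled "Pr > t" and can be compared with the significance level as it stands.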
TWO SAMPLES
The second part of this
paper will discuss using the t-test to compare two means with an
unknown but assumed common population variance.
The hypothesis that is usually tested is that the means are equal, denoted
by H_{0}: μ_{1}=μ_{2}. The hypothesis can also be rewritten as
H_{0}: μ_{1}-μ_{2}=0. This form is useful because it is then easy to write
a test of whether the difference between the means is larger or smaller
than a specified value, sometimes denoted in textbooks as ∆.
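The test is very much the same as for the single sample except that the
calculation for t is now:
$$ t = \frac{(\bar{x}_1 - \bar{x}_2) - \Delta}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \qquad (2.1) $$
where the pooled variance is
$$ s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} \qquad (2.2) $$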
Summarizing, the procedure is:
The first step is to state the null and alternative hypotheses. As the
test is on the means, the question being asked, for a two-tailed test, is
H_{0}: μ_{1} - μ_{2} = ∆
H_{1}: μ_{1} - μ_{2} ≠ ∆
For a single-tailed test the alternative hypothesis would be written as
H_{1}: μ_{1} - μ_{2} > ∆
or
H_{1}: μ_{1} - μ_{2} < ∆
depending on the question being asked.
The next step in
the procedure is to decide what the level of significance should be
as above.
Before the correct t-value can be obtained it is necessary to determine the
"degrees of freedom", which equal n_{1} + n_{2} - 2: if there are 7
observations in sample 1 and 8 observations in sample 2, then the degrees
of freedom looked up in the table are 7+8-2=13. If the test is two-sided
and uses an α level of 0.10 then the boundaries are t_{0.05}=1.771 and
-t_{0.05}=-1.771.
The only real computation in this test is to calculate a value for t from
the data using equations (2.1) and (2.2).
With the critical boundaries known and a value of t computed from the
formula above it is now decision time: the hypothesis H_{0} is rejected
(and H_{1} accepted) if the value t is in the critical region, otherwise
accept the hypothesis H_{0}.
The following examples
demonstrate the procedure used for the tests.
Example 3
Samples of free-range eggs were taken from two farms, and a test was
requested to determine whether the mean weights (ounces) of the eggs from
the two farms are the same, using a significance level of 0.01:
Farm A: 20, 28, 24, 20, 24, 21, 17, 28, 25, 19
Farm B: 29, 16, 25, 27, 27, 18, 22, 27
H_{0}: μ_{1} - μ_{2} = 0, H_{1}: μ_{1} - μ_{2} ≠ 0
As this is a two-tailed test with α=0.01 and (10+8-2)=16 degrees of
freedom, the t-table gives t_{0.005}=2.921 and -t_{0.005}=-2.921.
Do computations
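A sketch of the arithmetic, using the sample means 22.6 and 23.875 and the sample variances 14.267 (Farm A, n=10) and 22.411 (Farm B, n=8) computed from the data, with the pooled-variance form of the two-sample statistic and ∆ = 0:

```latex
\begin{align*}
s_p^2 &= \frac{(10-1)(14.267) + (8-1)(22.411)}{10 + 8 - 2}
       = \frac{128.4 + 156.875}{16} \approx 17.830 \\
t &= \frac{(\bar{x}_1 - \bar{x}_2) - \Delta}{\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}
   = \frac{(22.6 - 23.875) - 0}{\sqrt{17.830 \times 0.225}}
   \approx \frac{-1.275}{2.003} \approx -0.637
\end{align*}
```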
As -2.921 < -0.637 < 2.921 then accept H_{0}.
To run the two-sample t-test with SAS procedures, the BASE SAS procedures
are not sufficient; instead use others, for example TTEST from the
SAS/STAT module. Using the data above, with the variable FARM indicating
where the sample came from and WTOZ holding the weight in ounces, and the
following code:
proc ttest data=eggs0;
class farm;
var wtoz;
run;
the following output appears:
The TTEST Procedure
Statistics
Lower CL Upper CL Lower CL Upper CL
Variable farm N Mean Mean Mean Std Dev Std Dev Std Dev Std Err
wtoz 1 10 19.898 22.6 25.302 2.598 3.7771 6.8956 1.1944
wtoz 2 8 19.917 23.875 27.833 3.13 4.734 9.635 1.6737
wtoz Diff (1-2) -5.521 -1.275 2.971 3.1448 4.2225 6.4264 2.0029
T-Tests
Variable Method Variances DF t Value Pr > |t|
wtoz Pooled Equal 16 -0.64 0.5334
wtoz Satterthwaite Unequal 13.3 -0.62 0.5457
Equality of Variances
Variable Method Num DF Den DF F Value Pr > F
wtoz Folded F 7 9 1.57 0.5173
As with the UNIVARIATE procedure above, the result to look at is the
"Pr > |t|" value, here in the row for the Pooled method. If the assumption
is that the two populations have the same variance then the Pooled value
is used, otherwise the Satterthwaite value is used; the distinction is too
advanced for this paper and the interested reader should refer to the SAS
documentation. As with the single sample method, the decision to accept
H_{0} or H_{1} is made by comparing the "Pr > |t|" value against the
significance level. Because "Pr > |t|" is already a two-sided p-value it
is compared with α itself rather than α/2: in this case 0.01 < 0.5334, so
accept H_{0}.
To check that the p-value from the TTEST procedure agrees with the t value
calculated by hand above, the following data step code is used:
data _null_;
  x=-0.637;
  df=16;
  p=(1-probt(abs(x),df))*2; /*significance level of a two-tailed t test*/
  put p=;
run;
p=0.533133971
The value 0.5331 is close to the 0.5334 reported by the TTEST procedure;
the difference is due to rounding the t value to three decimals.
A good programmer will carry around a piece of SAS code that does this
test in BASE SAS, that is, the manual method plus the calculation of the
p-value.
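A minimal sketch of such code for Example 3 follows. The intermediate dataset STATS and the variable names (n_obs, mean_wt, var_wt, sp2 and so on) are assumptions for illustration; FARM is coded 1 for Farm A and 2 for Farm B, as in the TTEST output.

```sas
/* Example 3 data: farm 1 = Farm A, farm 2 = Farm B */
data eggs0;
  input farm wtoz @@;
  datalines;
1 20 1 28 1 24 1 20 1 24 1 21 1 17 1 28 1 25 1 19
2 29 2 16 2 25 2 27 2 27 2 18 2 22 2 27
;
run;

/* One row of summary statistics per farm (STATS is a hypothetical name) */
proc means data=eggs0 noprint nway;
  class farm;
  var wtoz;
  output out=stats n=n_obs mean=mean_wt var=var_wt;
run;

data _null_;
  set stats end=last;
  retain n1 m1 v1 n2 m2 v2;
  if farm = 1 then do; n1 = n_obs; m1 = mean_wt; v1 = var_wt; end;
  else if farm = 2 then do; n2 = n_obs; m2 = mean_wt; v2 = var_wt; end;
  if last then do;
    delta = 0;                                 /* difference under H0      */
    alpha = 0.01;                              /* two-sided significance   */
    df    = n1 + n2 - 2;
    sp2   = ((n1 - 1)*v1 + (n2 - 1)*v2) / df;  /* pooled variance          */
    tval  = ((m1 - m2) - delta) / sqrt(sp2 * (1/n1 + 1/n2));
    tcrit = tinv(1 - alpha/2, df);             /* upper critical boundary  */
    p     = 2 * (1 - probt(abs(tval), df));    /* two-sided p-value        */
    put tval= tcrit= p=;
  end;
run;
```

The sketch assumes equal population variances, so its result corresponds to the Pooled row of the TTEST output rather than the Satterthwaite row.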
Example 4
Samples of free-range eggs were taken from two farms, and a test was
requested to determine whether the mean weight (ounces) of the eggs from
Farm A exceeds that of Farm B by more than three ounces, using a
significance level of 0.01:
Farm A: 26, 26, 28, 33, 32, 27, 24, 24
Farm B: 19, 16, 26, 18, 28, 20, 18, 23, 18, 27
H_{0}: μ_{1} - μ_{2} = 3, H_{1}: μ_{1} - μ_{2} > 3
As this is a one-tailed test with α=0.01 and (8+10-2)=16 degrees of
freedom, the t-table gives t_{0.01}=2.583.
Do computations
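A sketch of the arithmetic, using the sample means 27.5 and 21.3 and the sample variances 11.429 (Farm A, n=8) and 18.9 (Farm B, n=10) computed from the data, with the pooled-variance form of the two-sample statistic and ∆ = 3:

```latex
\begin{align*}
s_p^2 &= \frac{(8-1)(11.429) + (10-1)(18.9)}{8 + 10 - 2}
       = \frac{80 + 170.1}{16} \approx 15.631 \\
t &= \frac{(\bar{x}_1 - \bar{x}_2) - \Delta}{\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}
   = \frac{(27.5 - 21.3) - 3}{\sqrt{15.631 \times 0.225}}
   \approx \frac{3.2}{1.8754} \approx 1.71
\end{align*}
```

With unrounded intermediate values the statistic is about 1.706; small differences in the second decimal arise from rounding along the way.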
As 1.71 < 2.583 then accept H_{0}.
Using the TTEST
procedure and with the following SAS code
proc ttest data=eggs0 H0=3;
class farm;
var wtoz;
run;
the following output is generated:
The TTEST Procedure
Statistics
Lower CL Upper CL Lower CL Upper CL
Variable farm N Mean Mean Mean Std Dev Std Dev Std Dev Std Err
wtoz 1 8 23.317 27.5 31.683 1.9863 3.3806 8.9927 1.1952
wtoz 2 10 16.832 21.3 25.768 2.6853 4.3474 9.9017 1.3748
wtoz Diff (1-2) 0.7224 6.2 11.678 2.7016 3.9536 6.974 1.8754
T-Tests
Variable Method Variances DF t Value Pr > |t|
wtoz Pooled Equal 16 1.71 0.1073
wtoz Satterthwaite Unequal 16 1.76 0.0981
Equality of Variances
Variable Method Num DF Den DF F Value Pr > F
wtoz Folded F 9 7 1.65 0.5198
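The decision to accept H_{0} or H_{1} is made by comparing the p-value
against the significance level. "Pr > |t|" is a two-sided p-value; for
this one-tailed test, with t in the direction of H_{1}, it is halved:
0.1073/2 = 0.0537. As 0.01 < 0.0537, accept H_{0}.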
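The one-sided p-value can also be requested directly with the SIDES= option of PROC TTEST (available from SAS/STAT 9.2). The sketch below loads the Example 4 data into a hypothetical dataset EGGS4, with FARM coded 1 for Farm A and 2 for Farm B so that the difference is taken as Farm A minus Farm B.

```sas
/* Example 4 data; EGGS4 is a hypothetical dataset name */
data eggs4;
  input farm wtoz @@;
  datalines;
1 26 1 26 1 28 1 33 1 32 1 27 1 24 1 24
2 19 2 16 2 26 2 18 2 28 2 20 2 18 2 23 2 18 2 27
;
run;

/* H0=3 with SIDES=U tests H1: mu1 - mu2 > 3 */
proc ttest data=eggs4 h0=3 sides=u alpha=0.01;
  class farm;
  var wtoz;
run;
```

With SIDES=U the p-value column is labelled "Pr > t", and the Pooled row can be compared with the significance level as it stands.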