EGN 3443 Probability and Statistics for Engineers

Hui Yang

 

This course presents the theory and methods of probability and statistical models needed to support engineering decision-making. The course objectives include:

To understand the basic concepts of probability and statistics.

To understand data representation techniques.

To learn discrete and continuous random variables, probability distributions, measures of central tendency, and measures of dispersion.

To learn statistical inference and hypothesis testing.

To understand regression analysis using least-squares parameter estimation.

To develop a statistical way of thinking.

Bayes' Theorem

1.   Suppose there is a rare cancer, and the probability that a randomly selected person has it is 1/10000. Suppose also that there is a lab blood test for this cancer that is 98% accurate in the sense that 98% of those who have the cancer test positive. However, the test also shows positive for 2% of those who do not have the cancer.

 

If a randomly selected person takes the lab blood test and the result is positive, what is the probability that he really has this cancer?

 

Consider the following confusion matrix (what a fitting name! :)

 

True positive: Diseased people correctly tested positive

False positive: Healthy people wrongly tested as positive

True negative: Healthy people correctly tested as negative

False negative: Diseased people wrongly tested as negative

 

 

                        Actual condition
Test result             Disease                            No disease
Positive                True Positive                      False Positive (Type I error)
                        (disease reported and present)     (disease reported but not present)
                        P(positive | disease)              P(positive | no disease)
Negative                False Negative (Type II error)     True Negative
                        (disease not detected)             (disease not reported and not present)
                        P(negative | disease)              P(negative | no disease)

 

Note: the false positive rate, P(positive | no disease), and the false negative rate, P(negative | disease), are not necessarily the same. Both are 0.02 in this example for simplicity.

 

We know that P(positive | disease) = 0.98.

But what we want to know is P(disease | positive).

 

$$P(\text{disease} \mid \text{positive}) = \frac{P(\text{positive} \mid \text{disease})\,P(\text{disease})}{P(\text{positive})}$$

$$= \frac{P(\text{positive} \mid \text{disease})\,P(\text{disease})}{P(\text{positive} \mid \text{disease})\,P(\text{disease}) + P(\text{positive} \mid \text{no disease})\,P(\text{no disease})}$$

$$= \frac{0.98 \times 0.0001}{0.98 \times 0.0001 + 0.02 \times 0.9999} \approx 0.004877$$

 

Knowing that you tested positive increased your probability of having the disease from 0.0001 to 0.004877, but not all the way to 0.98.
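The calculation above can be checked numerically. The following is a minimal Python sketch. The function name `posterior_given_positive` and its parameter names are hypothetical, chosen only for this illustration. It applies Bayes' theorem with P(positive) expanded by the law of total probability, exactly as in the second line of the derivation.

```python
def posterior_given_positive(prevalence: float, sensitivity: float, false_positive_rate: float) -> float:
    """Return P(disease | positive) via Bayes' theorem (hypothetical helper).

    prevalence          = P(disease)
    sensitivity         = P(positive | disease)
    false_positive_rate = P(positive | no disease)
    """
    # Law of total probability: P(positive) summed over both disease states
    p_positive = sensitivity * prevalence + false_positive_rate * (1 - prevalence)
    return sensitivity * prevalence / p_positive


if __name__ == "__main__":
    p = posterior_given_positive(prevalence=0.0001, sensitivity=0.98, false_positive_rate=0.02)
    print(f"P(disease | positive) = {p:.6f}")  # prints: P(disease | positive) = 0.004877
```

Most of the denominator comes from healthy people who test positive (0.02 × 0.9999). This is why the posterior stays small.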

 

2.   Now let’s look at what would happen if 60% of the population had the disease (instead of 1/10000).

When the prevalence of the disease is 0.60, the probability of having the disease given a positive test result is

 

$$P(\text{disease} \mid \text{positive}) = \frac{P(\text{positive} \mid \text{disease})\,P(\text{disease})}{P(\text{positive})}$$

$$= \frac{P(\text{positive} \mid \text{disease})\,P(\text{disease})}{P(\text{positive} \mid \text{disease})\,P(\text{disease}) + P(\text{positive} \mid \text{no disease})\,P(\text{no disease})}$$

$$= \frac{0.98 \times 0.6}{0.98 \times 0.6 + 0.02 \times 0.4} \approx 0.986577$$

         

If the prevalence of the disease is 60%, then knowing that you tested positive increases your probability of having the disease from 60% to 98.6577%.
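The two cases above differ only in the prevalence, so it can help to see the posterior across several prevalences with the same test. The following is a minimal Python sketch. The helper name is hypothetical, and the intermediate prevalences (0.001, 0.01, 0.1) are illustrative choices, not values from the problem.

```python
def posterior_given_positive(prevalence, sensitivity=0.98, false_positive_rate=0.02):
    # Bayes' theorem with P(positive) expanded by the law of total probability
    p_positive = sensitivity * prevalence + false_positive_rate * (1 - prevalence)
    return sensitivity * prevalence / p_positive


# 0.0001 and 0.6 are the prevalences used in this section; the others are illustrative
for prevalence in (0.0001, 0.001, 0.01, 0.1, 0.6):
    print(f"prevalence = {prevalence:<7} P(disease | positive) = {posterior_given_positive(prevalence):.6f}")
```

The test's accuracy is held fixed throughout. Only P(disease) changes, and that alone moves the posterior from well under 1% to nearly 99%.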

 

3.   What prevalence of the disease would make P(disease | positive) = P(positive | disease), if the accuracy of the lab test stays the same?

 

$$0.98 = \frac{0.98\,x}{0.98\,x + 0.02\,(1 - x)}$$

(Solve this equation for x, the prevalence of the disease.)
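One way to solve the equation: multiply both sides by the denominator, divide by 0.98, and collect terms. This worked derivation is added for checking.

```latex
\begin{aligned}
0.98\,\bigl[0.98\,x + 0.02\,(1 - x)\bigr] &= 0.98\,x \\
0.98\,x + 0.02\,(1 - x) &= x \\
0.02\,(1 - x) &= 0.02\,x \\
x &= 0.5
\end{aligned}
```

More generally, let s = P(positive | disease) and f = P(positive | no disease). The same steps give x = f / (1 − s + f). Here f = 1 − s = 0.02, so x = 0.5.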

 

4.   Bulls mind-expanding: If a randomly selected person tests negative, what is the probability that he does not have the disease?
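A minimal Python sketch for checking an answer to this question. It assumes the same test rates as in Question 1: P(negative | no disease) = 1 − 0.02 = 0.98 and P(negative | disease) = 1 − 0.98 = 0.02. The function name is hypothetical. The structure mirrors the Bayes calculation above, with "negative" in place of "positive" and "no disease" in place of "disease".

```python
def prob_no_disease_given_negative(prevalence, specificity, false_negative_rate):
    """P(no disease | negative) by Bayes' theorem (hypothetical helper).

    specificity         = P(negative | no disease)
    false_negative_rate = P(negative | disease)
    """
    p_no_disease = 1 - prevalence
    # Law of total probability: P(negative) summed over both disease states
    p_negative = specificity * p_no_disease + false_negative_rate * prevalence
    return specificity * p_no_disease / p_negative


print(prob_no_disease_given_negative(prevalence=0.0001, specificity=0.98, false_negative_rate=0.02))
```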

 

 

Discrete Random Variables and Probability Distributions

Rolling Two Fair Dice

1.   List all possible outcomes (a,b) of rolling the two dice. Let a denote the number on top of the first die and b the number on top of the second die. Each of a and b can be any integer from 1 through 6.

(1,1) (1,2) (1,3) (1,4) (1,5) (1,6)
(2,1) (2,2) (2,3) (2,4) (2,5) (2,6)
(3,1) (3,2) (3,3) (3,4) (3,5) (3,6)
(4,1) (4,2) (4,3) (4,4) (4,5) (4,6)
(5,1) (5,2) (5,3) (5,4) (5,5) (5,6)
(6,1) (6,2) (6,3) (6,4) (6,5) (6,6)
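The 36 outcomes above are the Cartesian product of {1, ..., 6} with itself. The following is a minimal Python sketch that generates them with the standard-library `itertools.product`.

```python
from itertools import product

# All ordered pairs (a, b): a = first die, b = second die, each from 1 through 6
outcomes = list(product(range(1, 7), repeat=2))

print(len(outcomes))   # 36
print(outcomes[:3])    # [(1, 1), (1, 2), (1, 3)]
```

Ordered pairs are used because (1,2) and (2,1) are different outcomes of two distinguishable dice.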

2.  Let the random variable X be the sum of the values shown after a throw of two dice. What is the probability mass function (PMF) f(x) = P(X = x)?

 

Table: Probability mass function (PMF) f(x) of the sum of two fair dice

x     f(x) = P(X = x)
2     1/36
3     2/36
4     3/36
5     4/36
6     5/36
7     6/36
8     5/36
9     4/36
10    3/36
11    2/36
12    1/36

 

[Figure: PMF of the sum of two fair dice, P(X = x) on the y-axis vs. x on the x-axis]
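The PMF table can be reproduced by counting how many of the 36 equally likely outcomes give each sum. The following is a minimal Python sketch using the standard-library `Counter` and `Fraction`. The name `pmf` is illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count how many of the 36 equally likely outcomes (a, b) give each sum a + b
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))

# f(x) = (number of outcomes with sum x) / 36, stored as an exact fraction
pmf = {x: Fraction(counts[x], 36) for x in sorted(counts)}

for x in sorted(counts):
    # Print as count/36 to match the table (Fraction itself would reduce 2/36 to 1/18)
    print(f"f({x}) = {counts[x]}/36")
```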

3.  Calculate the mean E(X) and variance V(X).

Discrete random variable X = sum of two fair dice (mean and variance)

x    f(x)                x·f(x)                  (x − μ)²   (x − μ)²·f(x)
2    1/36 = 0.02777778   2·1/36 = 0.05555556     25         0.694444444
3    2/36 = 0.05555556   3·2/36 = 0.16666667     16         0.888888889
4    3/36 = 0.08333333   4·3/36 = 0.33333333     9          0.750000000
5    4/36 = 0.11111111   5·4/36 = 0.55555556     4          0.444444444
6    5/36 = 0.13888889   6·5/36 = 0.83333333     1          0.138888889
7    6/36 = 0.16666667   7·6/36 = 1.16666667     0          0
8    5/36 = 0.13888889   8·5/36 = 1.11111111     1          0.138888889
9    4/36 = 0.11111111   9·4/36 = 1.00000000     4          0.444444444
10   3/36 = 0.08333333   10·3/36 = 0.83333333    9          0.750000000
11   2/36 = 0.05555556   11·2/36 = 0.61111111    16         0.888888889
12   1/36 = 0.02777778   12·1/36 = 0.33333333    25         0.694444444

μ = E(X) = Σ x·f(x) = 7.00
σ² = V(X) = Σ (x − μ)²·f(x) = 5.83
σ = 2.42
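The table's sums can be checked exactly. The following is a minimal Python sketch that builds the PMF as fractions, then applies E(X) = Σ x·f(x) and V(X) = Σ (x − μ)²·f(x). Exact arithmetic shows that the variance 5.83 in the table is 35/6 rounded.

```python
import math
from fractions import Fraction
from itertools import product

# PMF of the sum of two fair dice, as exact fractions
pmf = {}
for a, b in product(range(1, 7), repeat=2):
    pmf[a + b] = pmf.get(a + b, Fraction(0)) + Fraction(1, 36)

mean = sum(x * p for x, p in pmf.items())                    # E(X) = sum of x*f(x)
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())  # V(X) = sum of (x - mu)^2 * f(x)
std_dev = math.sqrt(variance)                                # sigma = sqrt(V(X))

print(mean, variance, round(std_dev, 2))  # prints: 7 35/6 2.42
```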

 

4.   Find the probability that the sum of two fair dice is less than or equal to a given value x, fill out the following table, and draw the graph of F(x).

Cumulative distribution function of the sum of rolling two dice: F(x) = P(X ≤ x) = … + f(x − 1) + f(x), where f(x) = P(X = x)

x     Sum of f(t) for t ≤ x                  F(x)      P(X > x) = 1 − F(x)
-2    0                                      0.0000    1.0000
-1    0                                      0.0000    1.0000
0     0                                      0.0000    1.0000
1     0                                      0.0000    1.0000
2     1/36                                   0.0278    0.9722
3     (1+2)/36 = 3/36                        0.0833    0.9167
4     (1+2+3)/36 = 6/36                      0.1667    0.8333
5     (1+2+3+4)/36 = 10/36                   0.2778    0.7222
6     (1+2+3+4+5)/36 = 15/36                 0.4167    0.5833
7     (1+2+3+4+5+6)/36 = 21/36               0.5833    0.4167
8     (1+2+3+4+5+6+5)/36 = 26/36             0.7222    0.2778
9     (1+2+3+4+5+6+5+4)/36 = 30/36           0.8333    0.1667
10    (1+2+3+4+5+6+5+4+3)/36 = 33/36         0.9167    0.0833
11    (1+2+3+4+5+6+5+4+3+2)/36 = 35/36       0.9722    0.0278
12    (1+2+3+4+5+6+5+4+3+2+1)/36 = 36/36     1.0000    0.0000
13    36/36 + 0 = 1                          1.0000    0.0000
14    36/36 + 0 + 0 = 1                      1.0000    0.0000
15    36/36 + 0 + 0 + 0 = 1                  1.0000    0.0000

[Figure: Cumulative distribution function F(x) of the sum of rolling two dice]
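The CDF table can be generated by accumulating the PMF. The following is a minimal Python sketch. The helper name `cdf` is hypothetical. It sums f(t) over every support point t ≤ x, so values of x below 2 give 0 and values above 12 give 1, just as in the table.

```python
from fractions import Fraction
from itertools import product

# PMF of the sum of two fair dice, as exact fractions
pmf = {}
for a, b in product(range(1, 7), repeat=2):
    pmf[a + b] = pmf.get(a + b, Fraction(0)) + Fraction(1, 36)


def cdf(x):
    """F(x) = P(X <= x): add f(t) over all support points t <= x."""
    return sum((p for t, p in pmf.items() if t <= x), Fraction(0))


# Same range of x as the table
for x in range(-2, 16):
    F = cdf(x)
    print(f"{x:>3}  F(x) = {float(F):.4f}  P(X > x) = {float(1 - F):.4f}")
```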

Confidence Interval

A machine is set up such that the average content of juice per bottle equals μ.

A sample of 100 bottles yields an average content of 48 oz.

Calculate a 90% and a 95% confidence interval for the average content.

Assume that the population standard deviation σ = 5 oz.

 

 

100(1 − α)%     90%      95%      99%
Zα/2            1.645    1.96     2.576

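90%:

$$\bar{x} \pm Z_{0.05}\,\frac{\sigma}{\sqrt{n}} = 48 \pm 1.645 \times \frac{5}{\sqrt{100}} = 48 \pm 0.8225 \;\Rightarrow\; (47.18,\ 48.82)\ \text{oz}$$

95%:

$$\bar{x} \pm Z_{0.025}\,\frac{\sigma}{\sqrt{n}} = 48 \pm 1.96 \times \frac{5}{\sqrt{100}} = 48 \pm 0.98 \;\Rightarrow\; (47.02,\ 48.98)\ \text{oz}$$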
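The interval x̄ ± Zα/2·σ/√n can be computed directly. The following is a minimal Python sketch using the standard-library `statistics.NormalDist` (Python 3.8+) to obtain Zα/2. The helper name `z_interval` is hypothetical. `inv_cdf` returns unrounded quantiles (about 1.6449 and 1.9600), so later decimals may differ slightly from results computed with the rounded table values.

```python
from math import sqrt
from statistics import NormalDist

x_bar, sigma, n = 48.0, 5.0, 100   # sample mean (oz), known sigma (oz), sample size


def z_interval(x_bar, sigma, n, confidence):
    """Two-sided z confidence interval for the mean when sigma is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # Z_{alpha/2}
    moe = z * sigma / sqrt(n)                  # margin of error
    return x_bar - moe, x_bar + moe


for level in (0.90, 0.95):
    low, high = z_interval(x_bar, sigma, n, level)
    print(f"{level:.0%} CI: ({low:.2f}, {high:.2f}) oz")
# prints:
# 90% CI: (47.18, 48.82) oz
# 95% CI: (47.02, 48.98) oz
```

The 95% interval is wider than the 90% interval because higher confidence requires a larger Zα/2.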

What sample size is required to keep the margin of error (MOE) within ±0.5 oz at the 95% confidence level?

Assume that the population standard deviation σ = 5 oz.

 

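$$n = \left(\frac{Z_{\alpha/2}\,\sigma}{\text{MOE}}\right)^2 = \left(\frac{1.96 \times 5}{0.5}\right)^2 = 19.6^2 = 384.16$$

Rounding up, n = 385 bottles. Always round up, so that the MOE does not exceed 0.5 oz.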
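The sample-size formula n = (Zα/2·σ/MOE)² can be wrapped in a small function. The following is a minimal Python sketch. The function name is hypothetical. It uses `statistics.NormalDist` for Zα/2 and `math.ceil` to round up.

```python
import math
from statistics import NormalDist


def required_sample_size(sigma, moe, confidence):
    """Smallest n with Z_{alpha/2} * sigma / sqrt(n) <= moe, for known sigma."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Always round up: rounding down would leave the margin of error above target
    return math.ceil((z * sigma / moe) ** 2)


print(required_sample_size(sigma=5, moe=0.5, confidence=0.95))  # 385
```

With the unrounded quantile (about 1.95996) the raw value is about 384.15 rather than 384.16, but both round up to 385.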

 

Hypothesis Testing

A machine is set up such that the average content of juice per bottle equals μ. A sample of 36 bottles yields an average content of 51.5 oz. Test the hypothesis that the average content per bottle is 50 oz at the 5% significance level.

Assume that the population standard deviation σ = 5 oz.

Classical approach:

Steps:

(a)  Formulate H0 and H1

H0: μ=50      H1: μ≠50     

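(b)  Calculate the test statistic Z0:

$$Z_0 = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{51.5 - 50}{5/\sqrt{36}} = \frac{1.5}{0.8333} = 1.8$$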

(c)   For the two-sided test, reject H0 if Z0 > Zα/2 or Z0 < −Zα/2.

Zα/2 = Z0.025=1.96

Since −Zα/2 < Z0 < Zα/2 (−1.96 < 1.8 < 1.96), Z0 falls within the acceptance region, so the null hypothesis cannot be rejected at the 5% significance level.
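The classical steps (b) and (c) can be sketched in Python as follows. This is a minimal illustration using `statistics.NormalDist` for the critical value. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu_0, x_bar, sigma, n, alpha = 50.0, 51.5, 5.0, 36, 0.05

# Test statistic Z0 = (x_bar - mu_0) / (sigma / sqrt(n))
z0 = (x_bar - mu_0) / (sigma / sqrt(n))

# Two-sided critical value Z_{alpha/2}
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

# Reject H0 when Z0 falls outside (-Z_{alpha/2}, Z_{alpha/2})
reject = abs(z0) > z_crit
print(f"Z0 = {z0:.2f}, critical value = {z_crit:.2f}, reject H0: {reject}")
# prints: Z0 = 1.80, critical value = 1.96, reject H0: False
```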

P-value approach:

Steps:

(a)  Formulate H0 and H1

H0: μ=50      H1: μ≠50     

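(b)  Calculate the test statistic Z0:

$$Z_0 = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{51.5 - 50}{5/\sqrt{36}} = \frac{1.5}{0.8333} = 1.8$$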

(c)   For the two-sided test, P-value = 2[1 − Φ(Z0)] = 2[1 − Φ(1.8)] = 2(1 − 0.9641) = 2(0.0359) = 0.0718.

(d)  Since the P-value 0.0718 > α = 0.05, the null hypothesis cannot be rejected.
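The P-value approach can be sketched the same way. The following is a minimal Python illustration in which `NormalDist().cdf` plays the role of Φ. It uses |Z0| so that the formula also works for negative Z0. The exact Φ(1.8) ≈ 0.96407 gives a P-value of about 0.0719, while the table-rounded Φ = 0.9641 gives 0.0718. Either way the conclusion is the same.

```python
from math import sqrt
from statistics import NormalDist

mu_0, x_bar, sigma, n, alpha = 50.0, 51.5, 5.0, 36, 0.05

z0 = (x_bar - mu_0) / (sigma / sqrt(n))

# Two-sided P-value: 2 * [1 - Phi(|Z0|)]
p_value = 2 * (1 - NormalDist().cdf(abs(z0)))

print(f"P-value = {p_value:.4f}; reject H0 at alpha = {alpha}: {p_value < alpha}")
# prints: P-value = 0.0719; reject H0 at alpha = 0.05: False
```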