Lab 2I
Directions: Follow along with the slides, completing the questions in blue on your computer, and answering the questions in red in your journal.
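The mean of random shuffles also produces differences that are normally distributed.
In this lab, we'll use R functions to simulate draws from a normal model and to calculate probabilities with it.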
Load the titanic data. Write and run code calculating the mean age of people in the data but shuffle their survival status 500 times.
Assign this data the name shfls.
Using shfls, write and run code using mutate to add a new variable to the dataset. Name this variable diff; it should be the mean age of those who survived minus those who died.
Calculate the mean and sd of the diff variable.
Assign these values the names diff_mean and diff_sd.
Check whether the diff variable looks approximately normally distributed.
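The shuffling steps above can be sketched in base R. This is only an illustration under stated assumptions: the data frame toy is made-up stand-in data (not the real titanic data), the helper one_shuffle is a hypothetical name, and the lab itself may expect different functions (such as mutate) for the same steps.

```r
set.seed(1)  # makes this illustration reproducible

# Hypothetical stand-in data: NOT the real titanic data.
toy <- data.frame(
  age = round(runif(200, min = 1, max = 70)),
  survived = sample(c("Yes", "No"), size = 200, replace = TRUE)
)

# One shuffle: permute the survival labels, then compute the mean age
# within each shuffled group.
one_shuffle <- function(data) {
  shuffled <- sample(data$survived)
  c(No = mean(data$age[shuffled == "No"]),
    Yes = mean(data$age[shuffled == "Yes"]))
}

# Repeat the shuffle 500 times; each row holds the two group means
# from one shuffle.
shfls <- as.data.frame(t(replicate(500, one_shuffle(toy))))

# diff = mean age of "survivors" minus mean age of "non-survivors".
shfls$diff <- shfls$Yes - shfls$No

# Center and spread of the shuffled differences.
diff_mean <- mean(shfls$diff)
diff_sd <- sd(shfls$diff)
```

Because the labels are shuffled at random, diff_mean should land close to 0. Running hist(shfls$diff) afterwards gives a quick visual check of whether the differences look roughly bell-shaped.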
Since the distribution of our diff variable appears
normally distributed, we can use a normal model to estimate the
probability of seeing differences that are more extreme than our actual
data.
(6) Draw a sketch of a normal curve. Label the mean age difference, based on your shuffles, and the actual age difference of survivors minus non-survivors from the actual data. Then shade in the area, under the normal curve, that is smaller than the actual difference.
(7) Fill in the blanks to calculate the probability of an even smaller difference occurring than our actual difference using a normal model.
The probability you calculated in the previous slide is an estimate for how often we expect to see a difference smaller than the actual one we observed, by chance alone.
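As a rough sketch of how a normal model turns a value into a "smaller than" probability, the code below uses pnorm with made-up numbers. The names example_mean, example_sd, and example_actual and their values are assumptions for illustration only; they are not results from the titanic data.

```r
# Hypothetical values for illustration only; use your own results in the lab.
example_mean <- 0      # center of the shuffled differences
example_sd <- 1.5      # spread of the shuffled differences
example_actual <- -2   # an observed difference

# Area under the normal curve to the left of example_actual,
# i.e. P(difference < example_actual) under the normal model.
p_smaller <- pnorm(example_actual, mean = example_mean, sd = example_sd)

# The same area, found by standardizing first and using the
# standard normal curve (mean 0, sd 1).
z <- (example_actual - example_mean) / example_sd
p_from_z <- pnorm(z)
```

Both approaches give the same probability, because standardizing only relabels the horizontal axis in units of standard deviations.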
(8) If we wanted to instead calculate the probability that the difference would be larger than the one observed, we could run (fill in the blanks):
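We can simulate random draws from a normal distribution with the rnorm function.
Simulate heights where the mean height is 67 inches and the standard deviation is 3 inches.
Plot the simulated heights with a histogram.
Use pnorm to calculate probabilities based on a specified quantity.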
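A minimal sketch of rnorm and pnorm working together is shown below. The mean of 67 inches and standard deviation of 3 inches come from the text; the number of draws (1000) and the cutoff of 70 inches are assumptions chosen only for illustration.

```r
set.seed(2)  # makes this illustration reproducible

# Simulate heights from a normal model with mean 67 and sd 3.
# The number of draws (1000) is an assumption.
heights <- rnorm(1000, mean = 67, sd = 3)

# Plot the simulated heights.
hist(heights, main = "Simulated heights", xlab = "Height (inches)")

# P(height < 70) under the normal model; 70 is a hypothetical cutoff.
p_below_70 <- pnorm(70, mean = 67, sd = 3)

# P(height > 70): the area to the right of the cutoff.
p_above_70 <- pnorm(70, mean = 67, sd = 3, lower.tail = FALSE)

# Proportion of simulated heights below 70, for comparison with pnorm.
prop_below_70 <- mean(heights < 70)
```

The two pnorm results add up to 1, since every height is either below or above the cutoff. The simulated proportion should be close to p_below_70, but not identical, because it is based on random draws.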
Conduct one of the statistical investigations below:
Using the titanic data:
Using the cdc data:
Male in our data is taller than the average
Female?