```{r}
library(Lahman)
library(mosaic)

# Add batting average (hits per at-bat) and keep hitters with
# more than 400 at-bats in the 1941 and 1980 seasons
mlb <- Batting %>%
  mutate(BAvg = H / AB) %>%
  filter(yearID %in% c(1941, 1980) & AB > 400)
```

Only two exceptional players hit above .360 in these seasons.

```{r}
mlb %>%
  filter(BAvg > .36) %>%
  select(playerID, yearID, BAvg)
```

Grouping by year gives the mean and standard deviation of batting average within each season.

```{r}
mlb %>%
  group_by(yearID) %>%
  summarize(N = n(), mean_BAvg = mean(BAvg), sd_BAvg = sd(BAvg))
```

Next, we locate Reggie Jackson's .300 average within the actual 1980 data. By default, `pdata()` returns the lower tail (the proportion of values at or below `q`), so it gives his percentile.

```{r}
pdata(~BAvg, q = .300, data = filter(mlb, yearID == 1980))
```

Now we compare Reggie Jackson's performance using only two summary statistics (the 1980 mean and SD) and assuming a normal model. Why would this number not be identical to the one above?

```{r}
pnorm(.300, mean = .279, sd = 0.0276)
xpnorm(.300, mean = .279, sd = 0.0276)  # same probability, plus a shaded normal-curve plot
```

Lastly, we can simulate draws from that normal distribution. Why would this number not be the same as the one from `pnorm()`?

```{r}
set.seed(1980)  # make the random draws reproducible
sim_BAvg <- rnorm(10000, mean = .279, sd = 0.0276)
pdata(~sim_BAvg, q = .300)
```

Fun with `pnorm()` and `qnorm()`: the two functions are inverses, so feeding the probability returned by `pnorm()` into `qnorm()` recovers the original .300.

```{r}
pnorm(.300, mean = .279, sd = 0.0276)
qnorm(.7766325, mean = .279, sd = 0.0276)
```
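As a base-R sketch tying the normal-model steps together, the chunk below standardizes .300 into a z-score, checks that the standard-normal tail at that z matches `pnorm()` on the batting-average scale, inverts it with `qnorm()`, and estimates the same tail by simulation. It assumes the 1980 summary values .279 and .0276 used above, and it uses `mean(sims <= x)` as a plain stand-in for `pdata(~sim_BAvg, q = .300)`; the variable names and the one-million-draw count are illustrative choices, not part of the original analysis.

```r
# Assumed 1980 summary statistics (taken from the normal-model chunks above)
mu <- 0.279
sigma <- 0.0276
x <- 0.300  # Reggie Jackson's batting average

# Standardize: how many SDs above the 1980 mean is a .300 average?
z <- (x - mu) / sigma

# The standard-normal lower tail at z equals pnorm() on the original scale
p_std <- pnorm(z)
p_raw <- pnorm(x, mean = mu, sd = sigma)

# qnorm() inverts pnorm(), recovering the original batting average
x_back <- qnorm(p_raw, mean = mu, sd = sigma)

# Simulated proportion at or below .300; approaches p_raw as draws increase
set.seed(2024)
sims <- rnorm(1e6, mean = mu, sd = sigma)
p_sim <- mean(sims <= x)

c(z = z, p_std = p_std, p_raw = p_raw, x_back = x_back, p_sim = p_sim)
```

With a million draws the simulated proportion typically lands within about 0.001 of `p_raw`, which shows why the 10,000-draw version above differs slightly from `pnorm()`: the gap is sampling noise that shrinks as the number of draws grows.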