```{r}
library(Lahman)
library(mosaic)

mlb <- Batting %>%
  mutate(BAvg = H / AB) %>%
  filter(yearID %in% c(1941, 1980) & AB > 400)
```

Just two players clear a .360 average across these two seasons, and both are exceptional cases.

```{r}
mlb %>%
  filter(BAvg > .36) %>%
  select(playerID, yearID, BAvg)
```

Group by year to get the mean and standard deviation of batting average within each season.

```{r}
mlb %>%
  group_by(yearID) %>%
  summarize(N = n(), mean_BAvg = mean(BAvg), sd_BAvg = sd(BAvg))
```

Now compare Reggie Jackson's performance (a .300 average) against the actual 1980 data. By default, `pdata()` gives the lower tail, so it returns the proportion of batters at or below .300, which is his percentile.

```{r}
pdata(~BAvg, q = .300, data = filter(mlb, yearID == 1980))
```

Now compare Reggie Jackson's performance under a normal model, plugging in sample statistics (a mean of .279 and an sd of 0.0276). Why would this number not be identical to the number above?

```{r}
pnorm(.300, mean = .279, sd = 0.0276)
xpnorm(.300, mean = .279, sd = 0.0276)  # same probability, plus a figure of the shaded area
```
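
The normal-model calculation above amounts to standardizing .300 and looking it up on the standard normal curve. A minimal sketch in base R, reusing the same mean and sd; the name `z` is introduced here only for illustration:

```{r}
# Standardize .300 against the mean and sd used above
z <- (.300 - .279) / 0.0276
z

# pnorm() on the z-score should agree with pnorm() given mean and sd directly
pnorm(z)
all.equal(pnorm(z), pnorm(.300, mean = .279, sd = 0.0276))
```

The z-score says how many standard deviations .300 sits above the mean, which is a useful way to compare performances across seasons with different spreads.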

Lastly, we can simulate draws from a normal distribution using the computer. Why would this number not be the same as with `pnorm()`?

```{r}
sim_BAvg <- rnorm(10000, mean = .279, sd = 0.0276)
pdata(~sim_BAvg, q = .300)
```
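
One way to see the answer is to repeat the simulation a few times and watch the proportion move. This is a rough sketch in base R; it assumes `mean(x <= .300)` is an acceptable stand-in for the lower-tail proportion that `pdata()` reports, and the name `props` is made up for this example:

```{r}
# Repeat the 10,000-draw simulation 5 times, recording the
# proportion of simulated averages at or below .300 each time
props <- replicate(5, mean(rnorm(10000, mean = .279, sd = 0.0276) <= .300))
props

# The exact lower-tail probability under the same normal model
pnorm(.300, mean = .279, sd = 0.0276)
```

Each run lands near the `pnorm()` value but not exactly on it, because a finite sample only approximates the distribution it was drawn from.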

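Fun with `pnorm()` and `qnorm()`: the two are inverses, so `qnorm()` takes the probability that `pnorm()` returned and gives back the original value.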

```{r}
pnorm(.300, mean = .279, sd = 0.0276)
qnorm(.7766325, mean = .279, sd = 0.0276)
```
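
The two calls above can also be chained so that no probability has to be typed in by hand. A small sketch, with `p` as an illustrative variable name:

```{r}
# Store the lower-tail probability for .300 ...
p <- pnorm(.300, mean = .279, sd = 0.0276)

# ... and feed it straight back to qnorm() to recover the quantile
qnorm(p, mean = .279, sd = 0.0276)
```

Passing the stored value avoids the small rounding error that comes from copying a printed probability.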