Letters from a young statistician

Occasionally, I have thoughts that I can't fit into a 140-character tweet, and I need somewhere to put them. This is that place! I'm learning Jekyll as I go, so please bear with any bugs, and email me if you see something wrong.

Dagstuhl reflections

Over the last 10 days, I had the pleasure of attending rstudio::conf and the Dagstuhl seminar on Evidence about Programmers for Programming Language Design. At rstudio::conf, I taught a two-day workshop on Intro to R & RStudio, and at the Dagstuhl I mostly thought about exposing novices to programming in scientific contexts. So, there was a lot of overlap.

More guys

One incidental outcome of the Dagstuhl was that I thought more about the use of the term “guys.” While we were going around doing our initial 3-minute introductions, I heard a lot of uses of “guys” that rubbed me the wrong way, so when I introduced myself I asked people to think about their gendered word choices throughout the seminar.

I regret doing this, because I think it marked me as someone who cares mainly about gender issues (I was there for the PL design stuff!) and I don’t think anyone really understood my point. People did become more aware of when they were saying it (and we used “squirrels!” as an interjection when someone did it), but not in a way that fostered inclusiveness. I’ll try to outline my personal opinion on the term here.

Scientists Programming

At the Dagstuhl seminar, Andreas Stefik asked me to speak about “Scientists Programming.” I’m not sure whether he did this on purpose, but by assigning people their talk titles, he drew us a little outside our comfort zone and got us to think more deeply. “Scientists Programming” wasn’t a talk that I had in my pocket, and wasn’t what I would have proposed, but it was fun to pull together. It made me realize that scientists programming is actually one of my primary interests, although I’ve never written that anywhere before. Felienne wrote up a nice summary of my talk, but I wanted to write down some thoughts of my own.

R syntax comparison

For my rstudio::conf workshop, Intro to R & RStudio, I finally finished my cheatsheet on R syntaxes. I've been working on this for at least a year (😱), so I'm glad to see it out in the wild.

When I posted the cheatsheet online, I got some critical feedback, which I would like to address in the form of a FAQ.

On microaggressions

I was glad to see Karl Broman's blog post on a troubling conference talk. In the post, Karl describes how the speaker repeatedly used the word “guys” to describe people involved in statistics, and how he and Hadley Wickham reached out to the speaker in an effort to get him to change his ways.

If you were in the room during the talk, I'm sure the experience is still viscerally available to you, but for those who were not, I think Karl's post falls far short of conveying how serious the problem was. As I said in a recent tweet, it's hard to convey in words how icky the talk was. But, I'm going to make an attempt.

What are the chances my name is Amanda?

I get called Amanda a lot. This tends to drive me crazy, because I think my real name is much more interesting. But, I realized recently that given the prior probabilities, it’s actually a very reasonable thing to call me.

[Note: this blog post has some interactive elements, so it is probably more fun to read on my Shiny server. And of course, if you want to see what I did, the code is on GitHub.]
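The “prior probabilities” argument is an application of Bayes' rule: P(name | what someone remembers) is proportional to P(what they remember | name) × P(name). Below is a minimal Python sketch under loudly stated assumptions: the name counts are made up purely for illustration (they are not real name frequencies), the keys other than "Amanda" are hypothetical placeholders, and it assumes the guesser remembers only that the name starts with "Am".

```python
def posterior(prior_counts, likelihood):
    """Bayes' rule over a small set of candidate names.

    prior_counts: how common each name is (any scale; only ratios matter)
    likelihood:   P(what the guesser remembers | name), e.g. 1.0 if the name
                  matches the memory and 0.0 if it does not
    Returns P(name | what the guesser remembers) for every candidate.
    """
    unnormalized = {name: prior_counts[name] * likelihood.get(name, 0.0)
                    for name in prior_counts}
    total = sum(unnormalized.values())
    return {name: value / total for name, value in unnormalized.items()}


# Made-up counts for illustration only; NOT real name frequencies.
prior_counts = {"Amanda": 6, "rarer_Am_name": 1, "non_Am_name": 8}
# Hypothetical assumption: the guesser only remembers that the name starts
# with "Am", and both Am- names fit that memory equally well.
likelihood = {"Amanda": 1.0, "rarer_Am_name": 1.0, "non_Am_name": 0.0}

post = posterior(prior_counts, likelihood)
best_guess = max(post, key=post.get)  # the most probable name given the memory
print(best_guess, round(post[best_guess], 3))
```

Because the likelihood is equal for every name that fits the memory, the posterior just renormalizes the prior over those names, so the more common name wins whenever the guesser has to pick one.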

Census data: A rant

As usual toward the end of the semester, my students have begun working in earnest on their final projects. And as usual, this means my selective amnesia about census.gov gets ruptured yet again. Looking for a way out, I turned to Twitter and got some good recommendations. In case it's useful to other teachers (or anyone else), I'll share my thoughts.

(The tl;dr is that you probably want students to be using Social Explorer, but read on for the full rant.)

Statistics graduate school advice

It's that time of year again, when I find myself meeting with students who are thinking about graduate school in statistics. Since I often end up sending people the same things, I figured I'd pull them together into a blog post. You probably already know about The GradCafe, The Professor Is In, and PhD Comics. These other links might not be as well known.

Worth adding to your inbox

For the most part, I get my data news from the web (blogs like FlowingData, Simply Statistics, and StatsChat) and Twitter (check out the people I follow). However, there are a few mailing lists I've joined that are worth the additional lines in my inbox. Here they are:

OpenVisConf talk transcript

During OpenVisConf, I hurriedly posted some links from my talk so people could follow up on resources I had mentioned. I kept meaning to write up a fuller blog post, but of course life got in the way (in the interim, Smith has finished finals, and I've graded a ton of projects, traveled out of state twice, and attended commencement).

So, when the official transcripts from the conference were posted, I knew I'd struck gold. Amanda Lundberg was the transcriptionist, and she did a fantastic job. The transcripts are provided under a Creative Commons license, so I grabbed the text from my talk, edited it a bit, and stuck in some images.

It was quite interesting to read the transcripts as-is. I know Amanda edited out a lot of “um”s and “uh”s as everyone was speaking, but a person’s verbal tics still get through. In my case, I say “that”, “so”, and “really” a LOT. Some of those have been removed here for ease of reading.

On a more interesting note, the organizers used TF-IDF to identify distinctive words and bigrams from each talk. Mine make it pretty clear I'm a statistician: words like “distribution”, “statistic”, “observed difference”, and “bootstrap” all appeared more in my talk than in the others.
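TF-IDF scores a term highly for a talk when it is frequent in that talk (term frequency) but shows up in few of the other talks (inverse document frequency). Below is a minimal Python sketch, assuming tf = count / total terms in the talk and idf = log(N / number of talks containing the term); the organizers' exact variant (smoothing, tokenization, stop words) isn't stated, so the helper names and the toy talk snippets are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def terms(tokens):
    """Return unigrams plus adjacent-word bigrams (joined with a space)."""
    bigrams = [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]
    return tokens + bigrams


def tfidf(docs):
    """Compute a TF-IDF score for every term in every document.

    tf  = count of the term in the document / total terms in the document
    idf = log(N / number of documents containing the term)
    """
    term_lists = [terms(tokenize(d)) for d in docs]
    n_docs = len(docs)
    df = Counter()
    for tl in term_lists:
        df.update(set(tl))  # count each term at most once per document
    scores = []
    for tl in term_lists:
        counts = Counter(tl)
        total = len(tl)
        scores.append({t: (c / total) * math.log(n_docs / df[t])
                       for t, c in counts.items()})
    return scores


def top_terms(score, k=3):
    """Return the k highest-scoring terms, breaking ties alphabetically."""
    ranked = sorted(score.items(), key=lambda kv: (-kv[1], kv[0]))
    return [t for t, _ in ranked[:k]]


# Hypothetical toy "talks", not the real conference transcripts.
talks = [
    "the bootstrap distribution of the observed difference",
    "the color of the chart",
    "the map of the data",
]
scores = tfidf(talks)
print(top_terms(scores[0], k=3))
```

Words every talk shares (“the”, “of”, “of the”) get an idf of log(1) = 0, so only terms distinctive to a talk can rank near the top; that is why statistics vocabulary surfaces for a statistics talk.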