I am a data scientist. Data science is an emerging field commonly described as “the practice of deriving valuable insights from data,” and this thread runs through all of my work. My scholarly contributions have come in five main areas:

Subfields of interest to me include network science, applied statistics, sabermetrics, sports analytics, statistical modeling, analysis of algorithms, combinatorial optimization, data visualization, graph theory, and combinatorics. My Erdős number is 3, as I have co-authored a paper with Amotz Bar-Noy, who has co-authored a paper with Noga Alon, who has co-authored a paper with Paul Erdős.

My background is academically diverse: my undergraduate degree is in economics (my first declared major was English), my doctorate is in mathematics, my thesis advisor is in computer science, and my professional experience is in statistics. As such, my research tends to be interdisciplinary, with an emphasis on applying available techniques from any discipline to address the question of interest.

In 2012, I completed my Ph.D. in Mathematics at the Graduate Center of the City University of New York, where my advisor was Amotz Bar-Noy, also of Brooklyn College. Previously, I earned an M.A. in Applied Mathematics from the University of California, San Diego, and a B.A. in Economics from Wesleyan University.

In 2019, I won the Significant Contributor Award from the Section on Statistics in Sports of the American Statistical Association.

Please see my C.V. for complete details on my work.


Books

Analyzing Baseball Data with R, 2nd edition

Analyzing Baseball Data with R, 2nd Edition introduces R to sabermetricians, baseball enthusiasts, and students interested in exploring the richness of baseball data. It equips you with the necessary skills and software tools to perform all the analysis steps, from importing the data to transforming them into an appropriate format to visualizing the data via graphs to performing a statistical analysis.


Modern Data Science with R, 2nd edition

Contemporary data science uses both statistical modeling and computer programming to extract meaning from data. It requires a tight integration of knowledge from statistics, computer science, mathematics, and a domain of application. This book, which is intended for readers with some background in statistics and modest prior experience with coding, helps them develop and practice the appropriate skills to tackle complex data science projects. Most of the examples are done in R, but SQL, Python, and other cutting-edge tools are discussed as well.

Read the 2nd edition


The Sabermetric Revolution

Since leaving the Mets, I’ve written a book, entitled The Sabermetric Revolution: Assessing the Growth of Analytics in Baseball, with leading sports economist Andrew Zimbalist. We examine the evolution of sabermetrics in baseball and other sports since the publication of Moneyball, summarize the current state of sabermetric thinking, and address the question of whether there is any evidence that sabermetrics has actually worked. The book will be published by the University of Pennsylvania Press and is scheduled for a December 2013 release.


Ongoing Projects

OpenIntro

I developed a sequence of courses on Introductory Statistics with R for DataCamp, an interactive platform for learning R and data science. Mine Çetinkaya-Rundel (Duke), Andrew Bray (Reed), and Jo Hardin (Pomona) are working with me on these courses. We are horrified by the recent sexual harassment scandal at DataCamp and the ensuing coverup.

Much of that content is now available as interactive tutorials, developed with the learnr package, that support the OpenIntro textbook Introduction to Modern Statistics.


ETL packages for R

etl is an R package to facilitate Extract - Transform - Load (ETL) operations for medium data. The end result is generally a populated SQL database, but the user interaction takes place solely within R.


Publication List


[1]
C. Legacy, A. Zieffler, B. S. Baumer, V. Barr, and N. J. Horton, “Facilitating team-based data science: Lessons learned from the DSC-WAV project,” Foundations of Data Science, 2021 [Online]. Available: https://arxiv.org/abs/2106.11209
[2]
N. J. Horton, B. S. Baumer, A. Zieffler, and V. Barr, “The Data Science Corps Wrangle-Analyze-Visualize program: Building data acumen for undergraduate students,” Harvard Data Science Review, vol. 3, no. 1, pp. 1–8, Feb. 2021 [Online]. Available: https://hdsr.mitpress.mit.edu/pub/nvflcexe
[3]
A. A. McNamara, N. J. Horton, and B. S. Baumer, “Greater data science at baccalaureate institutions,” Journal of Computational and Graphical Statistics, vol. 26, no. 4, pp. 781–783, 2017 [Online]. Available: https://doi.org/10.1080/10618600.2017.1386568
[4]
B. S. Baumer, A. Y. Kim, K. M. Kinnaird, M. Q. Ott, and R. L. Garcia, “Integrating data science ethics into an undergraduate major,” Journal of Statistics and Data Science Education, 2020 [Online]. Available: http://arxiv.org/abs/2001.07649
[5]
J. Albert, M. Marchi, and B. S. Baumer, Analyzing baseball data with R, 2nd ed. CRC Press: Boca Raton, FL, 2018, p. 342 [Online]. Available: https://www.crcpress.com/Analyzing-Baseball-Data-with-R-Second-Edition/Marchi-Albert-Baumer/p/book/9780815353515
[6]
D. J. Kelley, B. S. Baumer, C. G. Brush, M. Cole, M. Dean, M. Madavi, M. Majbouri, P. Greene, and R. Heavlow, “Global entrepreneurship monitor 2016/2017 women’s entrepreneurship report,” Global Entrepreneurship Monitor; Global Entrepreneurship Research Association, Jul. 2017.
[7]
A. B. Elam, C. G. Brush, P. G. Greene, B. S. Baumer, M. Dean, and R. Heavlow, “Global entrepreneurship monitor 2018/2019 women’s entrepreneurship report,” Global Entrepreneurship Monitor; Global Entrepreneurship Research Association, Nov. 2019 [Online]. Available: https://www.gemconsortium.org/file/open?fileId=50405
[8]
B. S. Baumer and A. S. Zimbalist, “The impact of college athletic success on donations and applicant quality,” International Journal of Financial Studies, vol. 7, no. 2, p. 19, 2019 [Online]. Available: https://www.mdpi.com/2227-7072/7/2/19
[9]
M. Papaiakovou, N. Pilotte, B. S. Baumer, J. Grant, K. Asbjornsdottir, F. Schaer, Y. Hu, R. Aroian, J. Walson, and S. A. Williams, “A comparative analysis of preservation techniques for the optimal molecular detection of hookworm DNA in human fecal specimens,” PLOS Neglected Tropical Diseases, vol. 12, no. 1, pp. 1–17, Jan. 2018 [Online]. Available: http://journals.plos.org/plosntds/article?id=10.1371/journal.pntd.0006130
[10]
M. S. Schwartz, J. Schnabl, M. P. H. Litz, B. S. Baumer, and M. Barresi, “ΔSCOPE: A new method to quantify 3D biological structures and identify differences in zebrafish forebrain development,” Developmental Biology, vol. 460, no. 2, pp. 115–138, Apr. 2020 [Online]. Available: https://doi.org/10.1016/j.ydbio.2019.11.014
[11]
R. D. De Veaux, M. Agarwal, M. Averett, B. S. Baumer, A. Bray, T. C. Bressoud, L. Bryant, L. Z. Cheng, A. Francis, R. Gould, A. Y. Kim, M. Kretchmar, Q. Lu, A. Moskol, D. Nolan, R. Pelayo, S. Raleigh, R. J. Sethi, M. Sondjaja, N. Tiruviluamala, P. X. Uhlig, T. M. Washington, C. L. Wesley, D. White, and P. Ye, “Curriculum guidelines for undergraduate programs in data science,” Annual Review of Statistics and Its Application, vol. 4, no. 1, pp. 1–16, 2017 [Online]. Available: http://www.annualreviews.org/doi/abs/10.1146/annurev-statistics-060116-053930
[12]
A. M. Bertin and B. S. Baumer, “Creating optimal conditions for reproducible data analysis in R with ‘fertile’,” Stat, vol. 10, no. 1, p. e332, Dec. 2020 [Online]. Available: https://doi.org/10.1002/sta4.332
[13]
B. S. Baumer, “Lessons from between the white lines for isolated data scientists,” The American Statistician, vol. 72, no. 1, pp. 66–71, 2018 [Online]. Available: http://amstat.tandfonline.com/doi/full/10.1080/00031305.2017.1375985
[14]
B. S. Baumer, “Lessons from between the white lines for isolated data scientists,” PeerJ Preprints, vol. 5, p. e3160v2, Aug. 2017 [Online]. Available: https://doi.org/10.7287/peerj.preprints.3160v2
[15]
M. Lopez, G. J. Matthews, and B. S. Baumer, “How often does the best team win? A unified approach to understanding randomness in North American sport,” Annals of Applied Statistics, vol. 12, no. 4, pp. 2483–2516, 2018 [Online]. Available: https://projecteuclid.org/euclid.aoas/1542078053
[16]
M. Çetinkaya-Rundel, J. S. Hardin, B. S. Baumer, A. A. McNamara, N. J. Horton, and C. W. Rundel, “An educator’s perspective of the tidyverse,” Technology Innovations in Statistics Education, 2021 [Online]. Available: https://arxiv.org/abs/2108.03510
[17]
B. S. Baumer, A. S. Bray, M. Çetinkaya-Rundel, and J. Hardin, “Teaching introductory statistics with DataCamp,” Journal of Statistics Education, vol. 28, no. 1, Mar. 2020 [Online]. Available: https://www.tandfonline.com/doi/ref/10.1080/10691898.2020.1730734
[18]
B. S. Baumer, “A grammar for reproducible and painless extract-transform-load operations on medium data,” Journal of Computational and Graphical Statistics, vol. 28, no. 2, pp. 256–264, 2019 [Online]. Available: https://amstat.tandfonline.com/doi/full/10.1080/10618600.2018.1512867
[19]
R. Gould, B. Baumer, M. Çetinkaya-Rundel, and A. Bray, “Big data goes to college,” AMSTAT News, no. 444, pp. 17–19, 2014 [Online]. Available: http://magazine.amstat.org/blog/2014/06/01/datafest/
[20]
B. Baumer, “Applied mathematics at the ballpark: The life of one sabermetrician,” Math Horizons, vol. 22, no. 1, pp. 18–20, 2014 [Online]. Available: http://www.jstor.org/stable/10.4169/mathhorizons.22.1.18
[21]
B. Baumer, “In a Moneyball world, a number of teams remain slow to buy into sabermetrics,” in The great analytics rankings, R. Webb, Ed. ESPN.com, 2015 [Online]. Available: http://espn.go.com/espn/feature/story/_/id/12331388/the-great-analytics-rankings#!mlb
[22]
B. S. Baumer, “The Oxford anthology of statistics in sports: Volume 1: 2000-2004 by James J. Cochran, Jay Bennett, Jim Albert,” The American Statistician, vol. 72, no. 3. Taylor & Francis, pp. 297–298, 2018 [Online]. Available: https://www.tandfonline.com/doi/abs/10.1080/00031305.2018.1496649
[23]
B. S. Baumer, “Analyzing baseball data with R by Max Marchi, Jim Albert,” International Statistical Review, vol. 82, no. 2. Wiley Online Library, pp. 313–315, Aug. 2014 [Online]. Available: http://onlinelibrary.wiley.com/doi/10.1111/insr.12068_5/full
[24]
S. Stoudt, L. Santana, and B. Baumer, “In pursuit of perfection: An ensemble method for predicting march madness match-up probabilities,” in JSM proceedings, 2014.
[25]
B. Baumer and D. Udwin, “R Markdown,” Wiley Interdisciplinary Reviews: Computational Statistics, vol. 7, no. 3, pp. 167–177, 2015 [Online]. Available: http://onlinelibrary.wiley.com/doi/10.1002/wics.1348/full
[26]
N. J. Horton, B. S. Baumer, and H. Wickham, “Setting the stage for data science: Integration of data management skills in introductory and second courses in statistics,” CHANCE, vol. 28, no. 3, pp. 40–50, 2015 [Online]. Available: http://chance.amstat.org/2015/04/setting-the-stage/
[27]
B. S. Baumer and G. J. Matthews, “There is no avoiding WAR,” CHANCE, vol. 27, no. 3, pp. 41–44, 2014 [Online]. Available: http://chance.amstat.org/2014/09/avoiding-war/
[28]
B. S. Baumer and P. Badian-Pessot, “Evaluation of batters and base runners,” in Handbook of statistical methods and analyses in sports, J. Albert, M. E. Glickman, T. B. Swartz, and R. H. Koning, Eds. Chapman & Hall/CRC Press: Boca Raton, FL, 2016, pp. 1–37 [Online]. Available: https://www.crcpress.com/Handbook-of-Statistical-Methods-and-Analyses-in-Sports/Albert-Glickman-Swartz-Koning/p/book/9781498737364
[29]
B. S. Baumer, Y. Wei, and G. S. Bloom, “The smallest non-autograph,” Discussiones Mathematicae Graph Theory, vol. 36, no. 3, pp. 577–602, 2016 [Online]. Available: http://www.discuss.wmie.uz.zgora.pl/gt/index.php?doi=10.7151/dmgt.1881
[30]
B. S. Baumer, D. T. Kaplan, and N. J. Horton, Modern Data Science with R. Chapman & Hall/CRC Press: Boca Raton, 2017, p. 551 [Online]. Available: https://www.crcpress.com/Modern-Data-Science-with-R/Baumer-Kaplan-Horton/9781498724487
[31]
B. S. Baumer, D. T. Kaplan, and N. J. Horton, Modern Data Science with R, 2nd ed. Chapman & Hall/CRC Press: Boca Raton, 2021, pp. 1–673 [Online]. Available: https://www.routledge.com/Modern-Data-Science-with-R/Baumer-Kaplan-Horton/p/book/9780367191498
[32]
B. Baumer, G. Rabanca, A. Bar-Noy, and P. Basu, “Star search: Effective subgroups in collaborative social networks.” ACM, New York, NY, USA, pp. 729–736, 2015 [Online]. Available: http://dl.acm.org/citation.cfm?id=2810062
[33]
J. Hardin, R. Hoerl, N. J. Horton, D. Nolan, B. Baumer, O. Hall-Holt, P. Murrell, R. Peng, P. Roback, D. Temple Lang, and others, “Data science in statistics curricula: Preparing students to ‘think with data’,” The American Statistician, vol. 69, no. 4, pp. 343–353, 2015 [Online]. Available: http://www.tandfonline.com/doi/abs/10.1080/00031305.2015.1077729
[34]
B. Baumer, “A data science course for undergraduates: Thinking with data,” The American Statistician, vol. 69, no. 4, pp. 334–342, 2015 [Online]. Available: http://amstat.tandfonline.com/doi/abs/10.1080/00031305.2015.1081105
[35]
B. Baumer, M. Çetinkaya-Rundel, A. Bray, L. Loi, and N. J. Horton, “R Markdown: Integrating a reproducible analysis tool into introductory statistics,” Technology Innovations in Statistics Education, vol. 8, no. 1, 2014 [Online]. Available: http://escholarship.org/uc/item/90b2f5xh
[36]
A. Bar-Noy and B. Baumer, “Average case network lifetime on an interval with adjustable sensing ranges,” Algorithmica, vol. 72, no. 1, pp. 148–166, 2015 [Online]. Available: http://link.springer.com/article/10.1007/s00453-013-9853-5
[37]
B. Baumer and A. Zimbalist, “Quantifying Market Inefficiencies in the Baseball Players’ Market,” Eastern Economic Journal, vol. 40, pp. 488–498, Dec. 2014 [Online]. Available: http://www.palgrave-journals.com/eej/journal/vaop/ncurrent/full/eej201343a.html
[38]
A. Bar-Noy, B. Baumer, and D. Rawitz, “Changing of the guards: Strip cover with duty cycling,” Theoretical Computer Science, vol. 610, pp. 135–148, 2016 [Online]. Available: https://doi.org/10.1016/j.tcs.2014.09.002
[39]
B. S. Baumer, S. T. Jensen, and G. J. Matthews, “OpenWAR: An open source system for evaluating overall player performance in Major League Baseball,” Journal of Quantitative Analysis in Sports, vol. 11, no. 2, pp. 69–84, 2015 [Online]. Available: https://doi.org/10.1515/jqas-2014-0098
[40]
B. Baumer, P. Basu, A. Bar-Noy, and C. Chau, “Social-communication composite networks,” in Opportunistic mobile social networks, CRC Press, 2014, pp. 1–36 [Online]. Available: https://www.crcpress.com/Opportunistic-Mobile-Social-Networks/Wu-Wang/p/book/9781466594944
[41]
P. Bogdanov, B. Baumer, P. Basu, A. Bar-Noy, and A. K. Singh, “As strong as the weakest link: Mining diverse cliques in weighted graphs,” vol. 8188. Springer, pp. 525–540, 2013 [Online]. Available: https://link.springer.com/chapter/10.1007/978-3-642-40988-2_34
[42]
B. Baumer and A. Zimbalist, The Sabermetric Revolution: Assessing the Growth of Analytics in Baseball. University of Pennsylvania Press, 2014, p. 240 [Online]. Available: http://www.upenn.edu/pennpress/book/15168.html
[43]
A. Bar-Noy, B. Baumer, and D. Rawitz, “Brief announcement: Set it and forget it - approximating the set once strip cover problem.” ACM, pp. 105–107, 2013 [Online]. Available: https://dl.acm.org/citation.cfm?id=2486162
[44]
B. S. Baumer, “Sensor strip cover: Maximizing network lifetime on an interval,” PhD thesis, City University of New York, 2012 [Online]. Available: http://proquest.umi.com/pqdweb?did=2677679131&sid=1&Fmt=2&clientId=29054&RQT=309&VName=PQD
[45]
B. S. Baumer, J. Piette, and B. Null, “Parsing the relationship between baserunning and batting abilities within lineups,” Journal of Quantitative Analysis in Sports, vol. 8, no. 2, pp. 1–17, 2012 [Online]. Available: https://doi.org/10.1515/1559-0410.1429
[46]
A. Bar-Noy and B. Baumer, “Maximizing network lifetime on the line with adjustable sensing ranges,” in ALGOSENSORS, 2011, vol. 7111, pp. 28–41 [Online]. Available: https://link.springer.com/content/pdf/10.1007/978-3-642-28209-6.pdf#page=38
[47]
A. Bar-Noy, B. Baumer, and D. Rawitz, “Changing of the guards: Strip cover with duty cycling,” vol. 7355. Springer, pp. 36–47, 2012 [Online]. Available: http://www.springer.com/us/book/9783642311031
[48]
B. Baumer, P. Basu, and A. Bar-Noy, “Modeling and analysis of composite network embeddings,” in MSWiM, 2011, pp. 341–350 [Online]. Available: https://dl.acm.org/citation.cfm?id=2068956
[49]
A. Bar-Noy, B. Baumer, and D. Rawitz, “Set it and forget it: Tighter approximation bounds for RoundRobin in a restricted lifetime model,” Algorithmica, vol. 76, no. 2, pp. 1–19, Oct. 2016 [Online]. Available: http://link.springer.com/article/10.1007/s00453-016-0198-8
[50]
B. S. Baumer and P. Terlecky, “Improved Estimates for the Impact of Baserunning in Baseball,” in JSM proceedings, 2010.
[51]
B. S. Baumer and D. Draghicescu, “Mapping Batter Ability in Baseball: A Study in Spatial Modeling,” in JSM proceedings, 2010.
[52]
B. S. Baumer, A. Galdi, and R. Sebastian, “A Survey of Methods for the Statistical Evaluation of Defensive Ability in Major League Baseball,” in JSM proceedings, 2009.
[53]
B. S. Baumer, “Using Simulation to Estimate the Impact of Baserunning Ability in Baseball,” Journal of Quantitative Analysis in Sports, vol. 5, no. 2, pp. 1–16, 2009 [Online]. Available: https://doi.org/10.2202/1559-0410.1174
[54]
B. S. Baumer, “Why On-Base Percentage is a Better Indicator of Future Performance than Batting Average: An Algebraic Proof,” Journal of Quantitative Analysis in Sports, vol. 4, no. 2, pp. 1–11, 2008 [Online]. Available: https://doi.org/10.2202/1559-0410.1101