MLB Statistics Visualization

Design & development of a D3 InfoVis for MLB stats

Overview

Purpose

I chose to visualize MLB statistics because baseball is one of the most statistics-laden sports. Three things drive this: its relatively additive nature (individual player contributions can be well separated), its large sample sizes (500+ at-bats for a starting player during a season), and a long tradition of applying statistical concepts and tools. As a result, baseball statistics are of interest not only to the analytics departments of professional teams but are also widely popular among fans.

The available data set is immense: over 6,000 plate appearances by hitters for each team per season, dating back to 1901. Displaying all of this data in unique and usable ways, so that users gain a better understanding of player and team statistics and performance, is an interesting challenge, and one that excited me.

Challenges / Constraints

Ideally, I would conduct user interviews for this project with both baseball fans and Baseball R&D Analysts and others involved in baseball operations teams. However, the latter group is a very niche user base, and access to those users is difficult to get. I have contacted one person from an MLB club's baseball ops team and hope to interview them soon. Because of this lack of access, my research and design decisions will be based primarily on the baseball fan demographic.

Project Info

Type: Master's Research Project
Advisor: Alex Endert
Duration: 1 Semester (6 Credits)
Tools Used: Balsamiq, Sketch, D3.js

Problem

Baseball is one of the most statistics-laden sports because of its relatively additive nature (individual player contributions can be well separated), its large sample sizes (500+ at-bats for a starting player during a season), and a long tradition of applying statistical concepts and tools. As a result, baseball statistics are of interest not only to the analytics departments of professional teams but are also widely popular among fans. The available data set is immense: over 190,000 plate appearances by hitters per season, dating back to 1901. Displaying all of this data in unique and usable ways, so that users gain a better understanding of player and team statistics and performance, is an interesting challenge.

Many people love baseball because of its inherent statistical characteristics. However, for both team analytics departments and fans, statistical inquiry is often still limited to looking through multiple spreadsheets containing thousands of values. Finding patterns or trends this way takes considerable effort.

A few of the better online baseball analytics tools (Fangraphs, Hardball Times, 538, etc.) sometimes include visualizations. However, these are usually simple static figures such as scatter plots or bar charts. Existing solutions are rarely interactive and are often presented in views that are independent of each other. As a result, each view tends to be limited to a single statistic and to either a narrow time frame or a high-level overview.


Potential Solution

The goal of this project is to design an online interactive MLB statistics visualization tool that conveys a greater scope of information in a single view while maintaining the information saliency of simple visualizations. Combined with user interactions, this will allow users both to quickly find answers to simple statistical questions and to discover more complex insights and trends.

Focusing the visualization on the temporal dimension will enable visual exploration of the history of MLB, highlighting individual player performance and development, team performance, and the relation between the two. Temporal patterns in these aspects are especially difficult to see with traditional methods. Although finding historical data for an individual player is relatively easy, it is much harder to integrate the temporal storylines of different players on a team or across the whole league.
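
As a rough illustration of what integrating these storylines could involve on the data side, here is a minimal TypeScript sketch. It is not the project's implementation. The record shape (one row per player per team per season), the field names, the `seriesBy` helper, and the sample rows are all hypothetical, and batting average (hits divided by at-bats) is used only as a familiar example statistic. The same grouping function produces a per-player series or a per-team series, so both could be plotted against a shared season axis.

```typescript
// Hypothetical record shape: one row per player per team per season.
interface PlayerSeason {
  player: string;
  team: string;
  season: number;
  hits: number;
  atBats: number;
}

// One point on a storyline: a season and the batting average for that season.
interface SeasonPoint {
  season: number;
  battingAverage: number;
}

type Totals = { hits: number; atBats: number };

// Group rows by an arbitrary key (player, team, ...) and build one
// chronologically sorted series per key. Hits and at-bats are summed before
// dividing, so a team's average is weighted by each player's at-bats.
function seriesBy(
  rows: PlayerSeason[],
  key: (row: PlayerSeason) => string
): Record<string, SeasonPoint[]> {
  const totals: Record<string, Record<number, Totals>> = {};
  for (const row of rows) {
    const k = key(row);
    const bySeason = totals[k] || (totals[k] = {});
    const t = bySeason[row.season] || (bySeason[row.season] = { hits: 0, atBats: 0 });
    t.hits += row.hits;
    t.atBats += row.atBats;
  }

  const result: Record<string, SeasonPoint[]> = {};
  for (const k of Object.keys(totals)) {
    result[k] = Object.keys(totals[k])
      .map(Number)
      .sort((a, b) => a - b)
      .map((season) => {
        const t = totals[k][season];
        // Guard against division by zero for players with no at-bats.
        return { season, battingAverage: t.atBats === 0 ? 0 : t.hits / t.atBats };
      });
  }
  return result;
}

// Made-up sample rows, for illustration only (not real MLB data).
const rows: PlayerSeason[] = [
  { player: "Player A", team: "Team X", season: 2002, hits: 180, atBats: 600 },
  { player: "Player A", team: "Team X", season: 2001, hits: 150, atBats: 500 },
  { player: "Player B", team: "Team X", season: 2001, hits: 100, atBats: 500 },
];

const byPlayer = seriesBy(rows, (r) => r.player); // individual storylines
const byTeam = seriesBy(rows, (r) => r.team);     // the team's combined storyline
```

Because both results share the same `SeasonPoint` shape, player lines and the team line could be drawn on one time axis, which is the kind of combined view that separate spreadsheets make hard to see.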

Process

Steps

  1. Literature Review
  2. Competitive Analysis
  3. User Interviews
  4. Low-fidelity Design Prototype
  5. User Testing
  6. High-fidelity Interactive Design Prototype
  7. User Testing
  8. Develop Final System
  9. Final System
  10. User Testing

Method

My goal is to perform initial user interviews to understand how participants use current solutions and how an interactive visualization could help them achieve their goals more effectively and efficiently. I will then go through an iterative design process, from paper design prototypes to a fully functioning D3 solution, performing user tests as part of each iteration.

User Research

Paper Prototypes

Low-Fidelity Prototype

To be continued

I'm currently still working on this project and will update this page periodically.

Presentation