MLB Statistics Visualization

Design & development of a D3 InfoVis for MLB stats

Summary

TL;DR

As a group project for an information visualization course, we were tasked with developing an InfoVis using any data set. I chose MLB stats because of the inherently statistical nature of the sport, as well as the challenges and limitations of current solutions for displaying immense amounts of baseball statistics.

We analyzed current solutions and user needs to create our solution goals. With current solutions, when a user wants to understand a hitter's performance or value, they often search through multiple pages of data tables. This becomes even more difficult when users want to answer more complex relational questions, such as a player’s performance in the context of the history of baseball, or the relationship between hitter and team performance. Our goal was to create a solution that simplifies this process and allows the user to retrieve insights from a simple, single-page interactive InfoVis.

The other two team members were M.S. Computer Science students, so I was responsible for the design and they led the development. I went through multiple iterations of design prototypes, from paper to high-fidelity, incorporating feedback on each iteration. The course and project are still in progress, but we are nearing the end of the design phase and will soon start development.

Overview

Purpose

I chose the topic of visualizing MLB statistics because baseball is one of the most statistics-laden sports. This is due to its relatively additive nature (individual player contributions can be well separated), large sample sizes (500+ at-bats for a starting player during the season), and a long tradition of applying statistical concepts and tools. As a result, baseball statistics are not only of interest to the analytics departments of professional teams, but are also widely popular among fans. The immense available data set consists of over 6000 plate appearances by hitters for each team per season, dating back to 1901. This presents an interesting challenge: displaying all of this data in unique and usable ways that help users gain a better understanding of statistics and performance for players and teams. This challenge excited me.

Challenges / Constraints

The purpose of the course project was to develop an InfoVis using D3. This meant there was little opportunity in our timeline for me to employ user research methods to better understand the problem and the users.

Another challenge was that our group was supposed to have 4-6 members, but I could only recruit 2 other members, so we each had to do more work to satisfy the project requirements.

Project Info

Type: Group Class Project
Class: CS 7450 Information Visualization
Duration: 1 Semester
Tools Used: Balsamiq, Sketch, D3

Team

Myself, M.S. HCI
UX Research, UX Design, Front-end Dev, Video

Member 2, M.S. CS
Finding data, D3 Programming, Front-end Dev

Member 3, M.S. CS
D3 Programming, Data analysis, Database

Links

Video: Coming Soon

Problem

Many people love baseball because of its inherently statistical character. However, most of the time fans' statistical inquiries are still limited to looking at spreadsheets with tables of hundreds of values, and it takes effort to figure out patterns or trends. Even in the best internet resources for baseball analytics (Fangraphs, Hardball Times, 538, etc.), where visualizations are often included, the figures usually take simple forms such as scatterplots or bar charts. To be fair, we do think using scatterplots and bar charts is justified in most cases, and they are the norm for good reasons. They are often extremely effective in conveying the intended message, and for the same reason our designs do not deviate very far from them. Put briefly, our project can be seen as an attempt to extend the conventional designs to convey richer information in a single view, while maintaining the information saliency of the simple designs, which allows users to quickly figure out answers to interesting questions.

For this purpose, we chose to focus on the temporal dimension and to enable visual exploration of the history of MLB, highlighting individual player performance/development, team performance, and the relation between the two. Temporal patterns in these aspects are especially difficult to see with traditional methods: even though finding historical data for an individual player is relatively easy, it is much harder to integrate the temporal storylines of different players on a team or across the whole league.

Process

Method

Because the purpose of the course project was to develop an InfoVis using D3, there was little opportunity in our timeline for me to employ user research methods to better understand the problem and the users. I did, however, take advantage of the early stages of the project to go through a mini “design sprint” and multiple stages of design prototypes.

Steps

  1. Met with my two teammates to discuss the user group and the problem
  2. Listed questions users might be looking to answer when looking at baseball data
  3. Analyzed current solutions such as Baseball Reference, Fangraphs, Hardball Times, 538, etc.
  4. Listed areas where our solution could improve on current solutions
  5. Developed our solution goals
  6. Each member drew a paper prototype of a solution
  7. Identified the ideas/features of each paper prototype that would be most effective to incorporate in our solution

User Needs

Some simple and complex insights that users may be looking to retrieve:
Player
  • How does a specific player compare to others that year?
  • How does a specific player's statistics change over his career? Does a player develop over time?
  • What events contributed to changes in statistics, such as the Giants' move to SF or the Steroid Era?
  • What factors contribute most to the top players' performance?
Team
  • How did a team perform over the years? How did a team do in a specific year? Who contributed to the team's performance, and by how much?
  • Are there any fun facts about the team or its players?
  • What factors contribute the most to top team performance?
Player-Team
  • Which player statistics are most important to a team’s performance?
  • How do individual player performances correlate with team performance?
  • Do successful teams rely more on a few stars or on strong performance from the whole team?
  • Are there discernible patterns of player composition correlated with the ups and downs of a team?
  • Are there any interesting storylines? (e.g. the Giants’ hugely successful draft picks in the late 2000s)
  • Can we see trends in team/player payrolls and their relation to performance?
  • How long does a player typically stay with one team? How frequently and to what extent does a team’s roster change over time?
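
One of the Player-Team questions above asks how player statistics correlate with team performance. A minimal sketch of how that could be quantified is a Pearson correlation between a batting statistic and win percentage. This is plain TypeScript with no D3 dependency; the TeamSeason shape, its field names, and the sample values are all hypothetical and are not taken from the project's data.

```typescript
// Hypothetical record shape; field names are assumptions, not the project's schema.
interface TeamSeason {
  year: number;
  teamOps: number; // team on-base plus slugging for the season
  winPct: number;  // wins divided by games played
}

// Pearson correlation coefficient between two equal-length numeric series.
function pearson(xs: number[], ys: number[]): number {
  if (xs.length !== ys.length || xs.length < 2) {
    throw new Error("need two series of equal length >= 2");
  }
  const n = xs.length;
  const meanX = xs.reduce((a, b) => a + b, 0) / n;
  const meanY = ys.reduce((a, b) => a + b, 0) / n;
  let cov = 0;
  let varX = 0;
  let varY = 0;
  for (let i = 0; i < n; i++) {
    const dx = xs[i] - meanX;
    const dy = ys[i] - meanY;
    cov += dx * dy;
    varX += dx * dx;
    varY += dy * dy;
  }
  // The coefficient is undefined when either series is constant.
  if (varX === 0 || varY === 0) return NaN;
  return cov / Math.sqrt(varX * varY);
}

// Made-up illustrative values, NOT real MLB data.
const sampleSeasons: TeamSeason[] = [
  { year: 2001, teamOps: 0.72, winPct: 0.45 },
  { year: 2002, teamOps: 0.75, winPct: 0.52 },
  { year: 2003, teamOps: 0.78, winPct: 0.56 },
  { year: 2004, teamOps: 0.74, winPct: 0.49 },
];

const opsVsWins = pearson(
  sampleSeasons.map((s) => s.teamOps),
  sampleSeasons.map((s) => s.winPct),
);
console.log("correlation between team OPS and win percentage:", opsVsWins);
```

A value near 1 or -1 would suggest a strong linear relationship, while a value near 0 would suggest little linear relationship. With real data, the same function could be run once per statistic to rank which ones track team performance most closely.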

Solution Goals

  • Display hitter performance in the context of the rest of the league and the history of baseball
  • Allow users to quickly retrieve insights related to complex questions
  • Highlight individual player performance/development, team performance, and the relation between the two
  • Convey more information than existing solutions on a single page with minimal need for filtering
  • Enable simple filtering features so the user is not overwhelmed by the amount of data
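
The first and last goals above (league context and simple filtering) can be sketched together as a season filter followed by a percentile rank. This is only an illustration under stated assumptions: the HitterSeason shape, the choice of batting average as the statistic, the midrank tie handling, and the sample rows are all hypothetical and do not reflect the project's actual implementation.

```typescript
// Hypothetical record shape; field names are assumptions, not the project's schema.
interface HitterSeason {
  player: string;
  year: number;
  avg: number; // batting average for that season
}

// Percentile rank (0 to 100) of one hitter's batting average among all hitters
// in the same season. Ties count as half (midrank). Returns undefined when the
// player has no row for that year.
function percentileInSeason(
  rows: HitterSeason[],
  player: string,
  year: number,
): number | undefined {
  // Simple filter: keep only the requested season.
  const season = rows.filter((r) => r.year === year);
  const target = season.find((r) => r.player === player);
  if (target === undefined) return undefined;
  const below = season.filter((r) => r.avg < target.avg).length;
  const equal = season.filter((r) => r.avg === target.avg).length;
  return (100 * (below + 0.5 * equal)) / season.length;
}

// Made-up illustrative rows, NOT real MLB data.
const sampleHitters: HitterSeason[] = [
  { player: "A", year: 2000, avg: 0.25 },
  { player: "B", year: 2000, avg: 0.3 },
  { player: "C", year: 2000, avg: 0.35 },
  { player: "D", year: 2000, avg: 0.275 },
  { player: "A", year: 2001, avg: 0.31 },
];

console.log(percentileInSeason(sampleHitters, "C", 2000)); // 87.5
console.log(percentileInSeason(sampleHitters, "A", 2000)); // 12.5
```

Computing the rank per season rather than over all rows keeps each hitter's value comparable across eras, which is one way to place a player in the context of both the league and the history of baseball.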

Paper Prototypes

Two of the three paper prototypes we produced in our "design sprint" session

Low-Fidelity Prototype

High-Fidelity Prototype I

Poster

High-Fidelity Prototype II

Solution

Live Web App

Coming Soon: This project is still in progress

Impact on Problem

Coming Soon: This project is still in progress

Reflection

What I Learned

What I Would Have Done Differently

Coming Soon: This project is still in progress

Next Steps

Coming Soon: This project is still in progress