MLB Statistics Visualization

Summary

TL;DR

As a group project for an information visualization course, we were tasked with developing an InfoVis using any data set. I chose MLB stats because of the inherently statistical nature of the sport, as well as the challenges and limitations of current solutions for displaying immense amounts of baseball statistics.

We analyzed current solutions and user needs to create our solution goals. With current solutions, a user who wants to understand a hitter's performance or value often has to search through multiple pages of data tables. This becomes even more difficult when users want to answer more complex relational questions, such as a player’s performance in the context of the history of baseball, or the relationship between hitter and team performance. Our goal was to create a solution that simplifies this process and allows the user to retrieve insights from a simple, single-page interactive InfoVis.

The other two team members were M.S. Computer Science students, so I was responsible for the design and they led the development. I went through multiple iterations of design prototypes, from paper to high-fidelity, incorporating feedback on each iteration.

Overview

Purpose

I chose the topic of visualizing MLB statistics because baseball is one of the most statistics-laden sports, thanks to its relatively additive nature (individual player contributions can be well separated), large sample sizes (500+ at-bats for a starting player during the season), and a long tradition of applying statistical concepts and tools. As a result, baseball statistics are not only of interest to the analytics departments of professional teams, but are also widely popular among fans. The immense available data set consists of over 6000 plate appearances by hitters for each team per season, dating back to 1901. Displaying all of this data in unique and usable ways, so that users gain a better understanding of player and team statistics and performance, is an interesting challenge, and it excited me.

Challenges / Constraints

One challenge was that our group was supposed to have 4-6 members, but I could only recruit 2 others, so each of us had to do more work to satisfy the project requirements.

Both of my teammates were from China. One did his undergrad in the US and had a good understanding of baseball. The other did not know anything about baseball, which presented a challenge.

Project Info

Type	Group Class Project
Class	CS 7450 Information Visualization
Duration	1 Semester
Tools Used	Balsamiq, Sketch, D3

Team

Myself, M.S. HCI
UX Research, UX Design, Front-end Dev, Video

Member 2, M.S. CS
Finding data, D3 Programming, Front-end Dev

Member 3, M.S. CS
D3 Programming, Data Analysis, Database

Links

Web App: http://mlbstat.com

Video: https://youtu.be/gCGGDFH4zt8

Problem

Many people love baseball because of its inherent statistical characteristics. However, most of the time fans' statistical inquiries are still limited to looking at spreadsheets with tables of hundreds of values, and it takes effort to figure out patterns or trends. Even in the best internet resources for baseball analytics (Fangraphs, Hardball Times, 538, etc.), where visualizations are often included, the figures usually take simple forms such as scatter plots or bar charts. Existing solutions are rarely interactive and are often presented in views that are independent of each other.

Our project can be seen as an attempt at extending the conventional designs to convey richer information on a single view, while maintaining the information saliency of the simple designs which allow users to quickly figure out answers to interesting questions.

For this purpose, we chose to focus on the temporal dimension and to enable visual exploration of the history of MLB, highlighting individual player performance/development, team performance, and the relation between the two. Temporal patterns in these aspects are especially difficult to see with traditional methods. Even though finding historical data for an individual player is relatively easy, it is much harder to integrate the temporal storylines of different players on a team or across the whole league.

Process

Method

The purpose of the course project was to develop an InfoVis using D3, which meant there was little opportunity in our timeline for me to employ user research methods to better understand the problem and the users. I did, however, take advantage of the early stages of the project to go through a mini “design sprint” and multiple stages of design prototypes, incorporating feedback on each iteration.

Steps

Met with my two teammates to discuss the user group and the problem
Listed questions users might be looking to answer when looking at baseball data
Analyzed current solutions such as Baseball Reference, Fangraphs, Hardball Times, 538, etc.
Listed areas where our solution could improve on current solutions
Developed solution goals
Each member drew a paper prototype of a solution
Identified the ideas/features of each paper prototype that would be most effective to incorporate in our solution

User Needs

Some simple and complex insights that users may be looking to retrieve

Player

How does a specific player compare to others that year?
How does a specific player's statistics change over his career? Does a player develop over time?
What events contributed to changes in statistics, such as the Giants' move to SF or the Steroid Era?
Which factors contribute most to the top players' performance?

Team

How did a team perform over the years? How did a team do in a specific year? Who contributed to the team's performance, and by how much?
Are there any fun facts about the team/players?
What factors contribute the most to the top performance?

Player-Team

Which player statistics are most important to a team’s performance?
How do individual player performances correlate with team performance?
Do successful teams rely more on a few stars or on strong performance from the whole team?
Are there discernable patterns of player decomposition correlated to the ups and downs of a team?
Any interesting storylines? (e.g. the Giants' hugely successful draft picks in the late 2000s)
Can we see the trend of team/player payrolls and their relation to performance?
How long does a player generally play for one team? How frequently and to what extent does a team’s roster change over time?

Solution Goals

Display hitter performance in the context of the rest of the league, and the history of baseball
Allow users to quickly retrieve insights related to complex questions
Highlight individual player performance/development, team performance, and the relation between the two
Convey more information than existing solutions on a single page with minimal need for filtering
Enable simple filtering features so the user is not overwhelmed by the amount of data

Paper Prototypes

2 of the 3 paper prototypes we produced in our "design sprint" session

Paper Prototype A

Paper Prototype B

Low-Fidelity Prototype

Low-Fidelity

Low-Fidelity Annotated

High-Fidelity I

Poster

High-Fidelity Prototype II

High-Fidelity II - Annotated

High-Fidelity II - Timeline Zoomed

Solution

Because I did all of the design work, I decided to let my teammates take care of the development. In typical semester-long project fashion, development of the final app didn’t start until the last couple of weeks of the semester. Fortunately, my teammates were great developers and turned around a functional web app in a short time.

Unfortunately, I experienced the classic designer disappointment of handing off designs from a semester of hard work and having them all but ignored. The web app that they designed and developed works decently, and includes some of my design and research. However, the user experience is painfully disappointing to me.

I often like to say, “If you need an onboarding tutorial to explain to a user how to use your app, it means your UX sucks.” The inclusion of an onboarding screen in our app is a great example of this.

Final Web App

http://mlbstat.com

Video Demo

Reflection

What I Learned

Although the final solution was incredibly disappointing to me, I took away some important lessons. I learned the importance of good communication between designers and developers. Despite me communicating my research and design decisions, the developers ignored them. I’m not sure what the issue was, but maybe I didn’t communicate the justifications for the design decisions effectively enough to motivate the developers to incorporate them.

However, I also learned that as the designer, I could have benefitted from more communication with my team. Although what the developers designed and built wasn’t what we designed, they made some design decisions that I thought were great and I would have loved to incorporate them in the design process.

What I Would Have Done Differently

Although it was a restriction based on the fact that this was not a design project, I would have done much more initial user research to better understand the problem and design a solution.
I would have done more user testing between design iterations.
I would have loved to do one more design iteration. Based on user feedback we decided to incorporate the WAR/Salary timeline in a bigger way. I sketched and explained to the developers how we could do this. However, I didn’t have a chance to do a new high-fidelity iteration of this design.
I would have liked to quantify the impact of our solution by timing how long users take to discover an insight with both existing solutions and our solution, e.g. “Which Red Sox player had the highest value (WAR to salary ratio) in the last 5 years?”
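
To make that benchmark question concrete, here is a rough Python sketch of the calculation a user would otherwise do by hand across several data tables: sum each player's WAR and salary over a window of seasons, then rank by the ratio. The rows, the "BOS" team code, and the function name are all hypothetical placeholders for illustration; they are not real MLB data and not code from our web app.

```python
from collections import defaultdict

# Hypothetical sample rows: (player, team, season, WAR, salary in USD).
# These values are made up for illustration and are not real MLB data.
SEASONS = [
    ("Player A", "BOS", 2014, 5.0, 10_000_000),
    ("Player A", "BOS", 2015, 3.0, 10_000_000),
    ("Player B", "BOS", 2014, 2.0, 1_000_000),
    ("Player B", "BOS", 2015, 2.5, 2_000_000),
    ("Player C", "NYY", 2015, 6.0, 20_000_000),
    ("Player B", "BOS", 2008, 9.0, 500_000),  # falls outside a 5-year window ending in 2015
]


def best_value_player(rows, team, last_season, years=5):
    """Return (player, WAR per $1M of salary) for the player on `team` with the
    highest ratio over the `years` seasons ending at `last_season`.
    Returns None if no rows match."""
    first_season = last_season - years + 1
    war = defaultdict(float)
    salary = defaultdict(float)
    for player, row_team, season, row_war, row_salary in rows:
        if row_team == team and first_season <= season <= last_season:
            war[player] += row_war
            salary[player] += row_salary
    # Skip players with no recorded salary to avoid dividing by zero.
    ratios = {p: war[p] / (salary[p] / 1_000_000) for p in war if salary[p] > 0}
    if not ratios:
        return None
    best = max(ratios, key=ratios.get)
    return best, ratios[best]


result = best_value_player(SEASONS, "BOS", 2015)
print(result)
```

Summing WAR and salary before dividing (rather than averaging per-season ratios) is just one reasonable definition of "value"; a timed user test would need to fix a definition like this up front so that both tools are answering the same question.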

Next Steps

I'd really like to take a step back and do another design iteration, then develop a web app based on it and do some user testing to quantitatively compare my solution to existing solutions. As a busy college student it'll be tough to find time to do this on the side, but I think it will be fun and give me some closure on this project.