6 Data Visualization Fundamentals Every Data Scientist Needs to Know

Posted by Nick Paul on Tuesday, January 25, 2022

TOC

  1. Introudction
  2. Understand the Power of Data Viz
  3. Start With the Big Picture
  4. Design With Your Audience in Mind
  5. Focus On Details
  6. Find Mentors
  7. Practice

Introduction

Whether you are just starting out on your data science journey or been in the game for a while, there is never a shortage of things to learn or skills to pickup. One must-have skill all data scientist need in their toolkit is the ability to create clear, concise, and meaningful Data Visualizations.

Even if you are the best ML engineer of all-time, your models will sit in your github repo if you cannot communicate the value to leaders. Leaders are busy and most make many decisions across different domains. They want – and deserve – their information presented in a clear and concise manner. If you make them spend their cognitive energy and time deciphering your chart, they may not have time to think about the next steps or, worse, may come away with the wrong conclusion.

Understand the Power of Data Viz

The power behind data visualization is directly linked to the way our brains process information. Our brains are not capable of holding large volumes of data in memory - most people struggle to keep more than seven items in their working memory at a time - but are excellent at recognizing patterns. Data visualization takes advantage of these two facts by removing the requirement to hold a large number of data points in memory, which frees up our cognitive energy to focus on discovering patterns and relationships.

Consider Anscombe’s quartet:

Anscombe’s quartet
I II III IV
x y x y x y x y
10.0 8.04 10.0 9.14 10.0 7.46 8.0 6.58
8.0 6.95 8.0 8.14 8.0 6.77 8.0 5.76
13.0 7.58 13.0 8.74 13.0 12.74 8.0 7.71
9.0 8.81 9.0 8.77 9.0 7.11 8.0 8.84
11.0 8.33 11.0 9.26 11.0 7.81 8.0 8.47
14.0 9.96 14.0 8.10 14.0 8.84 8.0 7.04
6.0 7.24 6.0 6.13 6.0 6.08 8.0 5.25
4.0 4.26 4.0 3.10 4.0 5.39 19.0 12.50
12.0 10.84 12.0 9.13 12.0 8.15 8.0 5.56
7.0 4.82 7.0 7.26 7.0 6.42 8.0 7.91
5.0 5.68 5.0 4.74 5.0 5.73 8.0 6.89

Each of these four data sets has the same mean and standard deviation.

Statistic Data Set 1 Data Set 2 Data Set 3 Data Set 4
Mean of x 9.00 9.00 9.00 9.00
Mean of y 7.50 7.50 7.50 7.50
SD of x 3.32 3.32 3.32 3.32
SD of y 2.03 2.03 2.03 2.03

However, when we plot these data sets, we see a totally different story.

Without any statistical training, the average person can explain how these data sets are different.

Consider this example from Stephen Few:

Can you determine which country has the lowest unfavorable rating? What about the highest?

How about now?

Using color, ordering, spatial representation, create a much clearer picture. If you hand the first one to a leader, they might throw it at you. On the other hand, the second chart leads to thoughtful discussion and a clear picture of who does and does not like the US. Instead of trying to decipher the story, the leader can now move on to new questions: what is causing the data that we are seeing.

Start With the Big Picture

You should always begin the data science process by defining your problem. What question are you trying to answer? What decision are we trying to inform? Creating a data visualization is no exception. Before we start generating plots, we need to think about the key points we are trying to communicate. Focus on the key points first! You never want to waste effort creating sophisticated visualizations that do not tie back to your original objective.

As an example, remember the old Snickers commercial where the guy writes CHEFS instead of CHEIFS in the end-zone?!? He painstakingly colors each blade of grass and uses superior technical skill but loses sight of the big picture. He is now stuck with hours of work to undue the mess; don’t be this guy!
Hey that’s great, but who are the chefs?

Design With Your Audience in Mind

The type of visualization you create is dependent on the audience. You should create the graphic with minimal expectations on the users. Below are some important considerations about your audience:

  1. What is their familiarity with the subject?
  2. What is their technical background?
  3. How much time will they have to consume your product?
  4. Is the visualization going in a report, presentation, website, etc.?
  5. What do you want your audience to know or do?

The last one is extremely important. We conduct analysis to inform decisions so think about the decision your audience needs to make.

Focus On Details

While crafting a visualization, you should constantly question every element on the chart with one question in mind: if I remove this item, do I lose valuable information or context? Every detail should add clarity and context for the audience. Distractions increase the likelihood your message will be lost.

Best Practices:

  • Accurately convey the data
  • Clearly label your plot
  • Remove clutter
  • Make order matter
  • Highlight the important parts
  • Tell your story

In my days as an analyst and even as a consultant, I would often have people pile on my charts and try to add a bunch of clutter. Other analysts, mid-level managers, or staff members would critique my charts as too simple and show me options to make 3-D bar charts with black backgrounds and cover every inch of the chart with wording and numbers - often resembling a used car ad by the time they were done. But, when it came time to brief leadership on a topic, I always had better success when I kept it simple and was relentless with removing distractions.

Defend you work and your story from those that want to jazz-up your chart and make it sexier! Just because your favorite Data Visualization tool has the option, does not mean you should use it - a 3-D exploding pie chart may be the worst data visualization of all-time but excel will still offer it to you.

Keep it simple and your story will come through.

Find Mentors

If you ask a successful writer how they perfected their craft, many will tell you by reading, a lot! Storytelling with data is no different. In order to create meaningful visualizations you need to see what others have done. Finding mentors is key!

Do not worry if you cannot find one where you work or are currently going to school. To be honest, most of the people I worked with or for were pretty bad at data visualization. Even if you have someone that is decent, chances are they are not one of the best of in the world. With the internet, you have access to thoughtful approaches and practices that have timeless fundamentals. My approach is to read a couple books that get to the fundamentals and follow creators on social media to explore the realm of possible. Below are a few of my favorites.

Practice

Finally, as I like to tell my kids, in order to make good barbecue, you have to have experience, and the only way to get experience is by making bad barbecue. Likewise for storytelling with data. You will not make that New York Times front page visualization until you make a lot of really bad charts. So, what are you waiting for? Get out there and make some bad barbecue.