Hamilton Ulmer

data visualization

Good Data Viz is About the Small Things

It's all too common to fixate on the choice of data visualization method at the expense of the just-as-important small choices – labels, annotations, animation, and other bits you can add to tell the story better. That’s really where good data viz shines.
published: Nov 11, 2020
topics: dataviz, communication, data science

I recently consulted on an internal data visualization project at work, where someone asked this question: “What type of data graphic best shows this particular insight?” I've been asked variations of this a lot over the years, and thought it might be better to just write an answer for posterity's sake.

In my experience, picking a data visualization method to convey some kind of insight is really the start of your journey, not the end. This is not to say that the choice isn’t important. There’s plenty of reading material on the internet that addresses the tradeoffs between, say, a funnel chart and a Sankey diagram. It's just that fixating on the graphic type focuses the solution on a specific output, not a desired outcome. You have to go so much further.

changing the framing

To that end, I try to get others to reframe the question to this: “how can this data visualization enable the reader to effortlessly see the story I see?” This question can change your tactics in profound ways. After all, getting someone to understand your viz requires you to practice basic reader empathy. Your readers don't always have the time, the context, or the skills to uncover insights from the data on their own. Your job is to make it easy – hell, I'd say trivial – for them to reach that "aha!" moment, where their thinking changes and the possibilities open up.

enter small things

When you shift your focus to telling the story well, you'll begin to understand why your work doesn't stop at picking a visualization method or graphic type, and what to do next. It's the small things that really make the story stick – labels, annotations, design choices, animations, mouse interactions, tooltips, data sources, documentation, and other extraneous details not always covered in your favorite data viz book.

The small things make the reader's journey possible. They're the contextual pieces needed to see the story clearly. They're the details that delight them into trusting your expertise; that one annotation that guides their attention; that helpful tooltip that explains a complicated metric; all the pre-empted answers to their immediate questions. Sometimes, they even make room for the reader to explore on their own.

Your graphic choice may point the reader in the right direction. It may even get them part-way there. But the road that leads them to that "aha!" moment is almost always paved with small things.

Ridge hiking near Muir Beach, San Francisco in the distance.

"Small" may not be the right word – not all of these things are visually small – but I think it works in this context. Juxtaposed with the "big" choice of graphic type, these other parts are seen as smaller and more numerous. This is probably why they're considered afterthoughts, if they're even considered at all. But without them, readers are liable to:

  • get lost – misinterpreting the visualization can push them to form the wrong conclusion or make the wrong decision.
  • get stuck in the mud – they might fixate on a meaningless part of the visualization, assuming there is something important that isn't really there, without arriving at the core insight.
  • give up and head elsewhere – if your data visualization is hard to understand, they may just give up, making a decision without any data.

Building intuition about what small things to add is a byproduct of hands-on experience and relentless reader empathy. It requires putting a data visualization in front of someone and seeing them struggle to understand, and having the drive to understand and address their confusion and frustration. The more you do it, the easier it gets to anticipate the ways you can make your visual story clearer.

If you're impatient for a checklist, you can jump to the small-thing recommendations I outline below.

When a reader does truly understand the story, it's a rewarding experience. The insights will often lead to deeper, more interesting questions and explorations (I sometimes call this the data viz happy path). Isn't this the whole point of telling visual stories with data – helping others reach a new understanding?

It's not controversial to say that this idea – small details transforming a work from "meh" to "good" – is true in just about every communication medium. Small things reduce the cognitive load and make the medium itself disappear, leaving only the story. A great screenplay, an enthralling book, a song that grooves so well it can either fade into the setting or command your attention. "Good" work is not constructed by accident yet feels natural. And so it is with good data visualization. When it's really good, the graphic choices disappear, and the insights remain.

An Example

Let's see the difference between these two framings – picking a graphic vs. telling the story well – through a practical example. Say you are fixated on “which graphic?” and pick a Sankey chart as a way of expressing some sort of user acquisition funnel for your company's newly-launched product, Sprockets Desktop. The software package you're using can easily express a series of state transitions as a static Sankey diagram, and you're surprised by how easy it is to get something together. You share the chart below with the product manager, who has never seen the acquisition funnel numbers before: "Here's the acquisition funnel we talked about. Any thoughts?"

Wikipedia: funnel analysis.

Stopping at "what graphic to use?"

The response you get back from the very busy product manager is, well, terse:

an underwhelmed / underwhelming response

Looks good, thanks.

– the PM

You've reached a crucial moment. The product manager may never give you feedback on what's wrong, especially if they are not particularly data-savvy or don't have the time. But you can tell they're underwhelmed.

And this is where people sometimes screw it up. They assume it's because the visualization method isn't right. Before you change directions, let's say you prod this product manager for some real feedback. This encourages them to unleash a longer critique:

the product manager has a moment of candor

What period of time is this chart for? These labels look like columns in a SQL result set and I don't understand all of them. This Sankey chart gives me a good sense of the overall funnel dynamics but it's missing the numbers. Are these user states big or small in practice? I'd like to just SEE the numbers on the thing directly – there’s so much room available. This is for Sprockets Desktop, right? Why isn’t the title more descriptive? How do you define these different states? Can I get the raw data somehow? It's neat to see this funnel, but I'm struggling to understand it. Sorry, just being honest!

– the PM

Changing what type of graphic you use won't answer these questions. The reader didn't even criticize the choice of Sankey chart.

Now, let's say you shift your thinking from picking a graphic to telling the story well, and add the small things they complained about:

Focusing on "how do I get the reader to see the story I see?"

The product manager's response is effusive:

the product manager has that "aha!" moment

This is fascinating. One in three signups never install? Wow – what a big opportunity. And this is for the last week, so it could be a result of changes we made a couple weeks ago but never tracked. I am going to share these numbers in our next strategy meeting and see if anyone else has ideas. I think we've been fixated on converting web sessions to signups, but we haven't been considering all the other weak points. Nice that there's a link to the data source. I'm going to pair this with data from our external data vendor. I have so many follow-up questions about where we can go from here!

– the PM

Your updated chart got the reader to an "aha!" moment quickly, answered all of their immediate contextual questions, and put them on the path to asking the next set of deeper product questions. All of these were achieved by focusing on the small things that made the data visualization more immediately interpretable.
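
To make "put the numbers on the thing directly" concrete, here is a minimal sketch of turning funnel counts into labels a reader can take in at a glance. The step names and counts are made up for illustration (they are not real Sprockets Desktop data), and `funnel_labels` is a hypothetical helper, not part of any charting library.

```python
def funnel_labels(steps):
    """Build one human-readable label per funnel step.

    `steps` is an ordered list of (name, count) pairs. Every step after
    the first also reports its share of the previous step.
    """
    labels = []
    previous_count = None
    for name, count in steps:
        label = f"{name}: {count:,}"
        if previous_count:
            label += f" ({count / previous_count:.0%} of previous step)"
        labels.append(label)
        previous_count = count
    return labels


# Hypothetical numbers, chosen only to illustrate the idea.
example_steps = [
    ("Web sessions", 12000),
    ("Signups", 3000),
    ("Installs", 2000),
]
example_labels = funnel_labels(example_steps)
for line in example_labels:
    print(line)
```

Labels like these can be passed to whatever node-label or annotation option your charting tool offers, so the reader does not have to ask for the numbers.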

This is obviously a contrived example. The data visualization does have to be relevant to the audience. In our example, it'd be a problem if the product manager didn't care about the user journey of the product they're working on. If your data visualization isn't showing something meaningful to the reader, it probably won't get them to care. This said, it's not always so straightforward to get someone to understand that a new insight is relevant, especially if the visual story you're telling is novel to them in some way. But then again, that's why we've reframed the challenge to telling the story well, isn't it?

The Two Constraints

The reality is, even if you have the desire to take your data visualization further, there are two things that are likely to get in your way: limitations with your tools, and diminishing returns.

limitations with your tools

If you’ve used a visualization library before, you’ve probably discovered that there are always limits to what you can do. The lower level you go, the more work it is to get annotations and animations to do what you want.

That said, most libraries support the basic small things in some way. I'm hesitant to prescribe what to add to your charts, since building intuition around effective visual storytelling is more important than having a checklist. Still, there are some common, easy-to-implement small things that almost always improve the readability and interpretation of your graphic:

  • human-readable labels – labels are always the easiest thing to add, and often the most impactful. Make sure they're expressed in clear, simple language. Don't expose users to snake_case formatting or things they wouldn't encounter in normal reading. If your axes have numbers on them, spend the time to make them more human-readable, transforming a label like 1.54342E8 to something easier like 154M or even 154,342,000. Dates should always have a human-friendly format like Oct 2, 2020, not 2020-10-02.
  • contextually-useful axes – this is another obvious one, but you can do more than just adding axes and making sure the labels on them are human-readable. Where you place your ticks and what additional text you put on your axes can help contextualize the data better. The default axis layout may not be adequate. After all, the machine doesn't know the story; you do.
  • legends – another obvious element to add, and one that most graphics libraries support. Legends teach readers what symbols, colors, and element styles map to categories or ranges.
  • theming – people tend to scoff at the idea of spending time making the default aesthetics of a data visualization fit within the reader's context. It shouldn't matter, after all. But it does. Most tools do make it easy to match the fonts and colors of your graphs to fit in with the page. If it's easy to do, I suggest putting together your own theme. When I do custom data visualization for the web, this is easy, since I tend to have access to all the same CSS properties, color tokens, and fonts. When I've made static graphics with ggplot, I tend to put together my own custom theme that makes the data graphic blend into the writeup a bit more. It sounds silly, and I would say you have to be careful not to overinvest, but it does reduce a bit of cognitive friction.
  • titles, subtitles, etc. – always provide a descriptive chart title. If you can add a subtitle, use one to give more context about the range of time the chart is about or other useful information. Always, always, always use the title and subtitle elements if they're available.
  • annotations – annotations do a great job of providing context and interpretation. A vertical marker and label can show an event in a line chart. A horizontal marker with a label can show some kind of baseline, some number to compare against a trendline, bar chart, or box plot. I find that annotations tend to really help drive the visual storytelling. They tend to be the hardest to wrangle, since a lot of libraries don't make it easy to do what you want. If your library of choice can't easily do what you want, you can always just open up Preview, or Figma, or any other free design tool, and just add annotations to your viz and export an image or svg. There is no shame in using more than one tool. You just need to practice empathy for your reader and use everything at your disposal. When you remember your end goal, you'll tend to care a bit less about how you reach it.
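
As a rough illustration of the human-readable labels item above, here is a small sketch using only the Python standard library. The helper names (`humanize_number`, `humanize_date`, `humanize_label`) are hypothetical, and the rounding rules are one reasonable choice rather than a standard; most charting libraries accept a formatter function along these lines for ticks and labels.

```python
from datetime import date

MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def humanize_number(value):
    """Abbreviate large numbers, e.g. 154342000 -> '154M'."""
    for threshold, suffix in ((1e9, "B"), (1e6, "M"), (1e3, "K")):
        if abs(value) >= threshold:
            scaled = value / threshold
            # One decimal for small scaled values (1.5K), none otherwise (154M).
            digits = 1 if abs(scaled) < 10 else 0
            text = f"{scaled:.{digits}f}"
            if text.endswith(".0"):
                text = text[:-2]
            # Note: values just under a threshold (999,999) round to '1000K'.
            return text + suffix
    return f"{value:,.0f}"


def humanize_date(day):
    """Format a date the way it would appear in prose, e.g. 'Oct 2, 2020'."""
    return f"{MONTHS[day.month - 1]} {day.day}, {day.year}"


def humanize_label(column_name):
    """Turn a column name like 'new_signups' into 'New signups'."""
    text = column_name.replace("_", " ").strip()
    return text[:1].upper() + text[1:]


print(humanize_number(1.54342e8))
print(f"{154342000:,}")
print(humanize_date(date(2020, 10, 2)))
print(humanize_label("new_signups"))
```

Whether "154M" or "154,342,000" is the better label depends on how much precision the story needs; the point is to choose deliberately instead of accepting the default.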

Interactive "small things" are much less-supported, but I find them to be very useful:

  • tooltips – if the data visualization library you're using produces interactive charts, and tooltips are an option, use them to clarify terms and provide context or caveats. People are always grateful to have access to additional information like this. They tend to reduce the most common question someone has with a data graphic: "what does this label mean?"
  • animation – animation looks nice, but it also makes an interactive visualization more accessible. It helps guide the reader's eye to the next state, and the tweening between states keeps them from being confused or jarred. Good animation makes the work feel natural.
  • interactivity, reactivity – this is an area where the tools are getting better and better. With a reactive data visualization framework, changes in your data are automatically propagated to all the components that listen to the data. This makes it easy, for instance, to link a mouseover in one graph to trigger mouseovers in all the other graphs on the same page. I tend to use Svelte & D3 for almost all my data visualization these days for this reason: it saves me substantial time getting to where I want to go, and it is relatively easy to get started with compared to React. Observable notebooks also make reactive data visualization easy to implement if you have less experience with frontend work. Python-based tools like Streamlit are promising, and of course Bokeh has been great for a really long time.
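
The reactive idea above can be sketched in a few lines. This is a toy Python analogue of a subscribable value, written for illustration only; it is not the API of Svelte, Observable, Streamlit, or Bokeh, and the `Store` class and the two "charts" are assumptions of the sketch.

```python
class Store:
    """A tiny observable value: subscribers are called on every change."""

    def __init__(self, value):
        self._value = value
        self._subscribers = []

    def subscribe(self, callback):
        # Register the listener and immediately give it the current value.
        self._subscribers.append(callback)
        callback(self._value)

    def set(self, value):
        # Update the value and notify every listener.
        self._value = value
        for callback in self._subscribers:
            callback(value)


# Two stand-in "charts" that simply record which date is highlighted.
hovered_date = Store(None)
line_chart_highlights = []
bar_chart_highlights = []
hovered_date.subscribe(line_chart_highlights.append)
hovered_date.subscribe(bar_chart_highlights.append)

# A mouseover in one chart sets the shared value; both charts react.
hovered_date.set("Nov 11, 2020")
print(line_chart_highlights, bar_chart_highlights)
```

Because both charts listen to the same value, linking their hover states takes no extra wiring, which is the time saving a reactive framework gives you.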

Wikipedia: tweening.

the Svelte web framework.

Observable.

Streamlit.

Bokeh.

diminishing returns

There isn’t always a good case to be made for investing further in making your data visualization shine. Knowing where to draw the line is key. Data viz that answers a brief analytics question doesn't necessarily need as much investment; a data visualization that is meant to tell the kind of story that changes how people think does. Small things have a higher ROI when (1) the data visualization is considered important and (2) you might not be around to explain it to others and they're left to interpret on their own.

Even so, I do think that investing in your ability to tell good data stories pays big dividends over time, regardless of the context – especially if you’re writing your own code (in some capacity) to do the visualization. The tricks you learn to make a data visualization more understandable become reusable patterns rather than one-offs.


There is almost always a bit more you can do to tell the story better with the data viz tools you have today. Challenge yourself – you might be surprised. When you put yourself in your reader’s position and think about how to get them to that "aha!" moment, then the work in front of you becomes a lot clearer. And I promise you that work will be about lots of little things.

Thanks to Marissa Gorlick for reading over an early draft and providing great feedback.


Thanks for reading. If you have any questions, comments, or disagreements, please drop me a line at hamilton dot ulmer at gmail.