Data Foundations: Visualization

This is the fourth installment of my blog series that attempts to demystify the data world. No single post could possibly tell you everything you need to know, but each is meant to give you a sense of the space. They're a quick glimpse of the various areas of specialization, providing an overview of the types of challenges faced in each and the technical tools used.

  • Part 1 discussed the accessibility of data analytics and provided background in the form of a foundational data pyramid.
  • Part 2 dove into the first two layers of the data pyramid and shed light on why it's important to dabble in these areas.
  • Part 3 reviewed the various structural considerations you need to take into account as you begin to work with data.

Part 4 is where we currently find ourselves and it covers the area of data visualization. Much of what is done in previous layers of the pyramid is to get us to this point -- where we can make use of the collected, stored, and prepared data. While data visualization may sound like it is just fun and is focused on making pretty charts, it is actually much more than that. In all fairness, though, it is also fun and beautiful.

Visualization is about telling a story.

It is about finding interesting insights in a pile of data, exploring the relationships within your data, and presenting your findings. The ultimate hope is that you find something impactful that can help people make a decision driven by facts.

When I began my Power of Visualization presentation I used a common example of how visualizing data can lead to different insights than simply looking at raw data.

Find the top 15 values

Which image made the task the easiest? I'm guessing the 3rd one. The unique combinations of region and year are listed as rows, with the bar representing the value, and the whole thing is sorted (and ranked) for us! Now, this is an obvious and easy example. You may be thinking that I created the question to fit the visual, when in fact I created the visual to answer the question. The visual can actually answer a variety of questions, but each question has a similar flavor. This visual could not easily answer questions specific to just region -- you would have to hunt down every instance of the Midwest region and add up the values yourself. One of the other visuals would probably make that easier.

But why ask a user to do that?! Design your visual to answer the question at hand. If you have a different question, create a different visual!
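If you like to think in code, here is a minimal sketch of that idea in Python. The records are made up for illustration (they are not the data from my presentation), and the function names `top_n` and `total_by_region` are just names I picked. The point is that each question gets its own shaping of the data: one sorted and ranked, the other summed by region.

```python
from collections import defaultdict

# Made-up illustrative records: (region, year, value).
# These are NOT the numbers from the presentation.
records = [
    ("Midwest", 2019, 120),
    ("Midwest", 2020, 95),
    ("South", 2019, 140),
    ("South", 2020, 160),
    ("West", 2019, 80),
    ("West", 2020, 110),
]

def top_n(rows, n):
    """Return the n rows with the largest values, largest first."""
    return sorted(rows, key=lambda row: row[2], reverse=True)[:n]

def total_by_region(rows):
    """Sum the values for each region, ignoring year."""
    totals = defaultdict(int)
    for region, _year, value in rows:
        totals[region] += value
    return dict(totals)

# Question 1: which region/year combinations rank highest?
for rank, (region, year, value) in enumerate(top_n(records, 3), start=1):
    print(f"{rank}. {region} {year}: {value}")

# Question 2: what is the total for each region?
print(total_by_region(records))
```

The ranked list is what a sorted bar chart shows; the per-region totals would want a different chart altogether.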

This leads us to the types of questions to ask of your data

Well, if you are building visualizations for an organization, they probably have a pile of questions they're looking to get answers for. But it's possible you've been given a bunch of data and are being asked to find something interesting. Where do you begin?

  • Explore each field in your data: what is the range of values (minimum and maximum), what are the most common and least common values, etc.
  • Understand the relationships between fields in the data: how does price relate to profits, what region has the largest percentage of customers, etc.
  • Does anything stand out to you as unusual or interesting? This is where you need to use your creativity and curiosity to ask questions of your data.
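The first two bullets can be sketched in a few lines of Python. This is only a rough starting point, assuming your data is a list of rows; the rows and field names below (`region`, `price`, `profit`) are made up for illustration, and `profile_field` and `share_by` are hypothetical helper names, not part of any library.

```python
from collections import Counter

# Made-up illustrative rows; the field names are hypothetical.
rows = [
    {"region": "Midwest", "price": 20.0, "profit": 4.0},
    {"region": "South", "price": 35.0, "profit": 9.0},
    {"region": "South", "price": 15.0, "profit": 2.5},
    {"region": "West", "price": 50.0, "profit": 12.0},
    {"region": "South", "price": 30.0, "profit": 7.0},
]

def profile_field(rows, field):
    """Summarize one field: its range and its most/least common values."""
    values = [row[field] for row in rows]
    # most_common() sorts by count, highest first; ties are not resolved
    # in any meaningful way, so treat tied results as "one of several".
    counts = Counter(values).most_common()
    return {
        "min": min(values),
        "max": max(values),
        "most_common": counts[0][0],
        "least_common": counts[-1][0],
    }

def share_by(rows, field):
    """Return the fraction of rows that fall under each value of a field."""
    counts = Counter(row[field] for row in rows)
    total = sum(counts.values())
    return {key: count / total for key, count in counts.items()}

print(profile_field(rows, "price"))
print(profile_field(rows, "region"))
print(share_by(rows, "region"))
```

The third bullet is the part no helper function can do for you: the numbers only tell you where to look.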

Once you find something interesting, you need to consider what you expect someone to do with that information. Do you want them to take action on something? If so, what chart is best to communicate that information? How do you tell a story built from data?

There is a whole field of study on how humans perceive certain types of charts

Instead of attempting to cover all of that today, I will provide some resources:

  • Storytelling with Data is a great community space and blog, with resources and opportunities to practice your visualization skills
  • Information Dashboard Design is a book that discusses the numerous considerations when building a visual
  • Edward Tufte is a master of data visualization, who has written many books and even provides seminars

It may seem overwhelming, but the best thing you can do is find data and practice exploring it. Build visualizations, share them, and get feedback. The best tool I can recommend as you begin this journey is Tableau Public (it's the free version of Tableau Desktop). Tableau has a ton of free resources and a great community. Find them on the forums or Twitter if you have any trouble! Of course, feel free to reach out to me if I can help you in any way.