You probably would not be reading this if you didn’t already know that GIS is a great tool for parsing information and representing it in a spatial format. But what if I said that GIS is too good at it… and that GIS makes it too easy to misrepresent your data? And what if I said that all the amazing data interpretation tools at our disposal make it too easy to tell a story with data, rather than letting the data accurately represent reality?
Ok… example time, right? Imagine it’s the spring of 2020 again (sorry), and you have been tasked with identifying COVID regulation compliance problem areas in the state and where they might overlap with vulnerable populations. You are provided with the most recent copy of the Governor’s COVID compliance complaint data, some limited English proficiency (LEP) data layers, and the ACS demographics data. Go forth and do. Now, what does your analysis look like? I can tell you what mine looked like. It was a beautiful heat map with a nice color ramp from purple to green (my favorite) and it very accurately showed… where people live in Washington State. Turns out, overlaying datasets that all scale with the general population is very likely to just show you where the most people congregate. It was mostly useless (hence there not being a nice pic of it here). Looked great, told us nothing. There were a few interesting tidbits in the data. But if you had relied on the nice map I had originally produced, you could have confidently held it up and proclaimed that “All cities in Washington are exhibiting high levels of COVID regulation non-compliance!”
Fig. 1: The probably all too familiar Johns Hopkins COVID Dashboard for Washington State.
But that wouldn’t really be true. Say you start over and normalize the compliance complaint data by population. What might the data say then? It won’t tell the same story, that’s for sure. Normalizing that data wasn’t exactly easy… Since the complaint form was open to the public and clearly labeled “complaint”, well, you can probably already imagine the abundance of inappropriate entries. Political complaints against a certain federal facility had to be filtered out, as did anything else that obviously had nothing to do with COVID regulations. After filtering the data and normalizing it per capita, the result was nowhere near as interesting. A few outliers remained, some of which were almost certainly just exacerbated by population density and job dissatisfaction. All in a day’s work. I reported what I found and moved on.
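For anyone who wants to see the filter-then-normalize step spelled out, here is a minimal Python sketch. It is not the workflow I actually used: the counties, complaint text, populations and the keyword filter are all made up for illustration, and a real filter would need far more than one keyword.

```python
# Hypothetical complaint records (county, complaint text), invented for illustration.
complaints = [
    ("County A", "Restaurant staff not wearing masks"),
    ("County A", "Store over capacity limit"),
    ("County A", "No distancing at checkout"),
    ("County A", "Employer not providing masks"),
    ("County A", "Complaint about federal politics"),
    ("County B", "Gym open during closure order"),
]

# Hypothetical population figures.
population = {"County A": 800_000, "County B": 20_000}

# Assumed keyword filter for off-topic entries; real data would need a better one.
OFF_TOPIC_TERMS = ("politic",)


def is_relevant(text):
    """Return True unless the complaint text contains an off-topic keyword."""
    lowered = text.lower()
    return not any(term in lowered for term in OFF_TOPIC_TERMS)


def raw_counts(records):
    """Count relevant complaints per county."""
    counts = {}
    for county, text in records:
        if is_relevant(text):
            counts[county] = counts.get(county, 0) + 1
    return counts


def complaints_per_100k(records, population):
    """Normalize relevant complaint counts to complaints per 100,000 residents."""
    counts = raw_counts(records)
    return {
        county: counts.get(county, 0) * 100_000 / pop
        for county, pop in population.items()
    }


counts = raw_counts(complaints)
rates = complaints_per_100k(complaints, population)
print(counts)  # raw counts: County A has the most complaints
print(rates)   # per-capita rates: County B now stands out instead
```

With these invented numbers the raw counts point at the big county, while the per-capita rates point at the small one. That reversal is the whole reason to normalize before mapping.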
Fig. 2: A very official-looking map to make up for the one I deleted years ago. This one also just shows where people live.
So what should I have done instead? It was a rush job, right? Take data, make map. Got it. Well… provided I’d had the time to think it over properly, I should have written out an analysis hypothesis. Yes, a hypothesis. Given the available data, normalized for population and filtered for erroneous entries, is there an overlap between areas that have a low COVID compliance rate and areas that have a large LEP population? Easy enough, right? Essentially, you need to create an analysis plan and stick to it. No changing things up because your data returned nothing interesting. (That is data dredging, and it’s bad.)
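One way to hold yourself to a plan is to write the test down, thresholds included, before you look at any results. The sketch below shows the idea in Python. The tract names, values and cutoffs are hypothetical, and using a high complaint rate as a stand-in for low compliance is my assumption here, not something the data guarantees.

```python
# Thresholds fixed BEFORE looking at results (hypothetical values).
COMPLAINT_RATE_THRESHOLD = 5.0  # complaints per 100,000 residents
LEP_SHARE_THRESHOLD = 0.10      # share of residents with limited English proficiency

# Hypothetical, already filtered and normalized inputs per area.
areas = {
    "Tract 1": {"complaint_rate": 7.2, "lep_share": 0.15},
    "Tract 2": {"complaint_rate": 6.8, "lep_share": 0.03},
    "Tract 3": {"complaint_rate": 1.1, "lep_share": 0.22},
    "Tract 4": {"complaint_rate": 0.9, "lep_share": 0.02},
}


def overlap_areas(areas, rate_min, lep_min):
    """Return the sorted names of areas meeting BOTH pre-registered thresholds."""
    return sorted(
        name
        for name, attrs in areas.items()
        if attrs["complaint_rate"] >= rate_min and attrs["lep_share"] >= lep_min
    )


flagged = overlap_areas(areas, COMPLAINT_RATE_THRESHOLD, LEP_SHARE_THRESHOLD)
print(flagged)  # only areas that pass both tests are reported
```

If the list comes back empty, that is the result. Nudging the thresholds afterward until something shows up is exactly the dredging the plan is meant to prevent.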
I feel like that was a fairly innocuous example (and yes, there is more to that story...). I saw my mistake, corrected it and moved on. But what if I had wanted to make a statement about low compliance levels? Or had simply thought less about what my results meant? I could have left the map as is, submitted it to the response team and let them sound the alarms (or not). In his now infamous (at least to geographers) book How to Lie with Maps, Mark Monmonier basically spells out all the ways you can, purposely or not, deceive with your maps. Which brings me to example 2: Do you know which cell phone company has the best coverage area? They all do, just ask them. They have the maps to prove it.
Fig. 3: Cell phone coverage maps and one 2010 map showing Google search comparisons for pizza, guns and strip clubs. I can’t tell the difference at this scale anyway. Credit to FloatingSheep.org. At least the fun map used an appropriate projection for its data.
And this is the heart of the problem. Modern GIS is so powerful and so complex that you can easily misrepresent your data, purposefully or not, just by how you symbolize it. Say you were tasked with presenting sensitive population demographics for your city. Depending on your color ramp alone, you could highlight diversity or deemphasize it. Take a look at Figure 4. In this example the exact same data with the exact same color ramp is served up using three different classification methods. Equal interval suddenly looks far more uniform than it sounds.
Fig. 4: Percentage of Populace in Age Dependent Groups (Children and Seniors) symbolized 3 ways.
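To see why the classification method matters so much, here is a small Python sketch comparing equal interval and quantile classification on the same values. The numbers are invented (nine similar tracts and one outlier) and are not the data behind Figure 4. The functions are simplified stand-ins for what GIS software does, and this quantile version splits tied values by rank.

```python
# Hypothetical percentages of population in age-dependent groups, one per tract.
values = [30, 31, 32, 33, 34, 35, 36, 37, 38, 70]


def equal_interval(values, k):
    """Assign each value to one of k classes of equal width between min and max."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    if width == 0:
        return [0] * len(values)
    # The maximum value sits on the top edge, so clamp it into the last class.
    return [min(int((v - lo) / width), k - 1) for v in values]


def quantile(values, k):
    """Assign classes so each holds roughly the same number of values (by rank)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    classes = [0] * len(values)
    for rank, i in enumerate(order):
        classes[i] = rank * k // len(values)
    return classes


def class_counts(classes, k):
    """Count how many values landed in each class."""
    return [classes.count(c) for c in range(k)]


ei_counts = class_counts(equal_interval(values, 3), 3)
q_counts = class_counts(quantile(values, 3), 3)
print("equal interval:", ei_counts)  # nearly everything in the lowest class
print("quantile:      ", q_counts)   # values spread across all three classes
```

With one outlier stretching the range, equal interval puts almost every tract in the same class, so the map reads as one color. Quantile spreads the same tracts across all three classes, which makes small differences look large. Neither is wrong, but they tell different stories.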
So what do you do when your map suddenly comes out as one monochrome blob? Is it ethical to fish around for a histogram classification that highlights the differences in your data? What if your analysis just doesn’t show what you want? These are the types of questions a good GIS analyst should be able to answer before beginning an analysis. Understanding classification methods, data normalization and the nature of your data, and more specifically knowing when to employ which classification or normalization method… these are the basics for framing your data so that it accurately represents reality. The answer might also involve lots of explanatory text.
Fig. 5: What a map looks like when it really doesn’t say what you want it to. Or, “How to lie poorly with a map”.
When I was a graduate student, a large part of our early curriculum was based around the discussion of GIS vs. GIST: Geographic Information Systems vs. Geographic Information Science and Technology. Science was the operative word in that debate. GIS involves a lot of science, and it is the application of those scientific principles that I felt really gave weight to the argument for Geographic Information Science. Those principles can and should be applied in the day-to-day work of a GIS analyst. Without them, you might just be showing where people live in your next analysis.
Clint Lusk is a Senior GIS Analyst Technical with the City of Tacoma. He was probably left alone in a basement room full of maps for too long before writing this article and he freely admits that he has to look up histogram classification methods before using them most of the time.