Context For Cheap - The Map Reference Overlay

Wednesday, January 23, 2013
I used to hate building reference layers in my maps. Labeling placenames was painful, but nothing compared to the chest-hair-waxing misery of scaling transportation symbology by road class. No out-of-the-box defaults were ever cartographically pleasing, and hours were incinerated in the fires of annotation placement. [Sorry, QGIS and Illustrator - you're just as much to blame here as the Redlands upstart.]

But that's basically all in the past. Not long after I slogged my way into web interactive maps, the crew at Mapbox released the finest preconfigured basemap I've ever seen, "Mapbox Streets". They quickly followed with a dozen attractive starter styles and an impressive customization palette.

Mapbox Streets

The idea of the pre-baked map was nothing new - Google, ESRI, GeoIQ and others had entered the web map age with variations on the Reference-Canvas idea: "You provide the overlay data and the story to tell, we'll provide the geographic context." And this works really well for monodimensional Point of Interest (POI) data, as demonstrated by the Google pushpins that these days rain from the sky to skewer every interesting set of coordinates on the planet. Pins and icons don't get in the way of labels and reference features.

The siren call of the martini glass . . .

But what about polygons and the world of the choropleth? Not that it stopped anyone from trying, but polygons on top of a reference canvas either obscure the features beneath or require too much transparency to make a thematic point. I bombed out on early attempts to work with this essential truth:

NYC Metro, covered in bubble bath

The solution to this problem is under our noses. I first noticed this technique when John Keefe and Steven Melendez at the WNYC Data Desk posted a Mapbox-based interactive looking at NYC's proposed wards; the streets were curiously visible above the color-coded ward polygons.

They had introduced me to the reference overlay.

Leveraging the customization options of Mapbox Streets, they had:
  • Winnowed out the layers representing land and water from their basemap, leaving just roads, land use and text,
  • Set these to a modestly transparent level (maybe 30-40%),
  • Used the compositing of the Mapbox API to lay this semi-transparent layer on top of the thematic polygon layer, inverting the standard reference canvas model.
After I recovered the pieces of my brain that had exploded out my ears (maybe I'm easily impressed), I set to applying this tactic to my own maps. I also realized that this could be expanded to allow for a sort of map sandwich, with land and water below, thematic data next and reference data on top:
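If the stacking order is hard to picture, here's a minimal sketch of what happens to a single pixel, assuming plain alpha compositing (the standard "over" operator). The colors and the 35% opacity are made-up illustration values, and this isn't how Mapbox does its compositing under the hood; it just shows why the order of the layers matters.

```python
# A toy, single-pixel version of the map sandwich.
# Colors are straight (non-premultiplied) RGBA tuples with channels in 0..1.
# All values here are hypothetical, chosen only for illustration.

def over(top, bottom):
    """Composite one RGBA pixel over another using the 'over' operator."""
    tr, tg, tb, ta = top
    br, bg, bb, ba = bottom
    out_a = ta + ba * (1.0 - ta)
    if out_a == 0.0:
        return (0.0, 0.0, 0.0, 0.0)

    def blend(t, b):
        return (t * ta + b * ba * (1.0 - ta)) / out_a

    return (blend(tr, br), blend(tg, bg), blend(tb, bb), out_a)


def composite(layers):
    """Composite a list of RGBA pixels given in bottom-to-top order."""
    result = (0.0, 0.0, 0.0, 0.0)  # start from a fully transparent pixel
    for layer in layers:
        result = over(layer, result)
    return result


land = (0.95, 0.95, 0.90, 1.0)      # opaque land/water base
thematic = (0.80, 0.10, 0.10, 1.0)  # opaque choropleth fill
road = (0.0, 0.0, 0.0, 0.35)        # dark road line at 35% opacity

# Reference overlay: roads sit on top and tint the thematic color.
sandwich = composite([land, thematic, road])

# Standard reference canvas: the opaque polygon hides the road entirely.
canvas = composite([land, road, thematic])

print(sandwich)
print(canvas)
```

With the road on top, the pixel comes out as a darkened version of the thematic red, so the road reads as context without erasing the data. With the polygon on top, the road is simply gone.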

And it's not just a Mapbox thing:
Hell, if you can do better than AJ Ashton and company, build your own mostly-transparent reference overlay and cache it in a tile server for future projects.

Subtle Context on the Census Dotmap

While I realize this is all old hat in the GIS world (yes, of course you place your labels above your polygons, dude), its usefulness in web mapping can't be overstated. The reference overlay saves us serious time, solves the "mashup" problem, and lets us focus on our data and what it has to say.

Update, Wednesday Night:
After conversations with some of the Stamen and former-GeoIQ folks, I figured it'd be worth comparing what happens when three different teams build reference overlays from the same (OpenStreetMap) data. Check it out here.


Open Thresholds

Friday, January 18, 2013
They went too far, clearly.

In publishing the precise locations and names of all the permitted handgun owners in two New York counties, the New York Journal-News has done a serious disservice to data journalists in particular. More broadly, they may have made things more difficult for the "Open Data" community at large.

Got guns?
But a lot of ink and rage has already been leveled at the J-N for this; in the New York Times, David Carr pointed out that even in an era of minimized privacy this was a step too far, lacking in due diligence. Jeff Sonderman in Poynter noted that the context matters kind of a lot - that the timing and lack of justification seemed to associate the mapped gun owners with the Sandy Hook massacre. Sonderman also had sage words for those sitting on piles of prospective open-data boodle:

"If you can’t come up with a better reason than 'because we can' or 'because we think it would look cool,' stop here, you’re about to data dump."

So the smarter folks have weighed in on the implications for journalism and data management, but this awkward business leaves me with two HUMUNGO-GONZO TAKE-HOME MESSAGES for the geographic opendata community:

1. Aggregate to Support the Story.

We - as a society - are flat-out not comfortable with publishing the names and locations of individuals. At the very least strip the identifiers out of your points; better still, aggregate the points to coarser-scale geographic units. Census blocks work fantastically well for detailed data like this, and I hear that hexagonal bins are all the rage these days. More importantly, the coarser scale brings context and emphasizes patterns; that's where the story is at.
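For the curious, here's a rough sketch of the strip-and-aggregate step. It bins points into a plain square grid as a stand-in for census blocks or hexbins (real block aggregation would need a point-in-polygon join against the block boundaries). The records, the names, and the 0.01-degree cell size are all invented for illustration.

```python
import math
from collections import Counter


def bin_points(points, cell_size_deg):
    """Count (lon, lat) points per square grid cell.

    Returns a Counter keyed by (column, row) cell index. Cells are
    measured in degrees, so they are not equal-area; fine for a sketch.
    """
    counts = Counter()
    for lon, lat in points:
        col = math.floor(lon / cell_size_deg)
        row = math.floor(lat / cell_size_deg)
        counts[(col, row)] += 1
    return counts


# Hypothetical input records: (name, lon, lat).
records = [
    ("Person A", -73.212, 44.476),
    ("Person B", -73.214, 44.478),
    ("Person C", -73.198, 44.481),
]

# Step 1: strip the identifiers, keeping only coordinates.
points = [(lon, lat) for _name, lon, lat in records]

# Step 2: aggregate to coarser units; only the counts get published.
cells = bin_points(points, 0.01)

for cell, count in sorted(cells.items()):
    print(cell, count)
```

The output is a count per cell and nothing else, which is all a choropleth needs and all a reader should get.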

2. QA/QC, Punks.

Google Fusion Tables - for all its awesomeness - is an extremely blunt instrument for data journalism. Styles, filters and deployment are all very limited for getting your message out. But Fusion Tables also makes it a little too easy to presume accuracy. The handgun ownership maps were piped through the Google geocoding engine (by all accounts the most accurate one out there today) and deposited in their supposed locations on the map. The Journal-News may have tried to clean up the output before publishing, but they didn't catch a few that missed their targets and landed in Burbank and Houston. If you're going to publish something like this, sloppiness is profoundly unhelpful.
Yeah, that guy doesn't live there.
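The cheapest sanity check I can think of is a bounding box: anything the geocoder drops outside your study area gets flagged for a human to look at. Here's a sketch, assuming a very rough box around the two counties (my approximation, not an official extent) and three invented records with approximate city coordinates.

```python
def outside_bbox(points, bbox):
    """Return the points whose coordinates fall outside the bounding box.

    bbox is (min_lon, min_lat, max_lon, max_lat) in decimal degrees.
    """
    min_lon, min_lat, max_lon, max_lat = bbox
    return [
        p for p in points
        if not (min_lon <= p["lon"] <= max_lon and min_lat <= p["lat"] <= max_lat)
    ]


# Rough, illustrative study-area box; an assumption, not a surveyed boundary.
STUDY_AREA = (-74.30, 40.80, -73.40, 41.40)

# Hypothetical geocoder output with approximate coordinates.
geocoded = [
    {"id": 1, "lon": -73.76, "lat": 41.03},   # near White Plains, NY
    {"id": 2, "lon": -118.31, "lat": 34.18},  # Burbank, CA
    {"id": 3, "lon": -95.37, "lat": 29.76},   # Houston, TX
]

suspects = outside_bbox(geocoded, STUDY_AREA)
for point in suspects:
    print("Check by hand:", point["id"], point["lon"], point["lat"])
```

A box won't catch a point that lands on the wrong street in the right county, but it would have caught Burbank and Houston before they went live.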

Geosprocket built an application for the Burlington Free Press (coincidentally a sister publication of the Journal-News; don't run out of free article views now!) in late 2012, in which we tried to show the month-to-month patterns of burglaries in the city of Burlington. The data was provided by the BPD in response to a FOIA request by the Free Press, and it was extremely specific - down to the address of the incident. The context and story were clear - there's a February bump in Nighttime Burglaries - and we tailored the visualization to focus on that pattern.

BTVCrime - via the Burlington Free Press
At the time I thought we were being conscientious by stripping out the address text and using only the badge number of the responding officer, but in retrospect I would have aggregated these to the census block level. With the cool tools available today, it's a relative snap to make a polygon flash every time an incident occurs, and let the incidents stack up in accumulated color (though not so much of a snap that I'll do it for a blog post).

Basically, the Journal-News handgun owners' map has caused me to rethink a few of my own methods and, I hope, has provided us all with a sense of the threshold between responsible data journalism and data dumping.
