Sunday, May 19, 2013

Taking Tilemill 2 for a Spin


When I took my first GIS class, I was told that 5% of my time would be spent making maps, while 90% of it would be spent corralling data into a usable format and 5% would be spent beating an unresponsive plotter with a cardboard tube. This balance has definitely changed as interoperability has gone from prayer to practice, but I've continued to be frustrated by the time I feel gets lost to a missing shapefile extension or a falsely-defined projection.

For all the fun afforded us by Mapbox's original Tilemill map design platform, it's always been sort of a GIS-y hassle to wrangle data into it. Mind you this is slight, glancing criticism - it's NOTHING compared to getting your geodata to work in Illustrator or any actual GIS platform. But I always found myself wishing I could spend less time racking my brain for where I put that awesome building footprint layer, or trying in vain to find the right ORDER BY syntax for a PostGIS datasource.

Perhaps you've heard, but Mapbox sort of solved my whiny problems. Tilemill 2 is in unsupported alpha, but it's already fulfilling my dream of data-agnostic cartography. The basic idea is that the world - as derived from OpenStreetmap - is some pretty centralizable basedata, so why not just tap the source and style it however you like? Instead of pulling extracts or copies or subsets, Tilemill 2 just gives you the whole damn world, all 330GB-and-counting of it, via super-fast vector tiles (great explanation here). You get to style those tiles with the CartoCSS language, and at that point they're ready for your audience.

Suffice it to say, it is A LOT of fun to have the world at your fingertips:





For the moment there's no export or serving option, and it'll be interesting to see what Mapbox does with this tool in its already-robust custom mapping lineup. It's also worth noting that planet.osm is not ALL TEH DATAZ - we'll always need a way to use local or personal geodata. But this is a great leap for an already-impressive platform.

It's liberating to be able to focus on cartography with the building blocks just there at my fingertips.






Friday, April 19, 2013

Geoflow: A Hasty Review


Remember those Microsoft guys?

Yeah, me neither. They're not really geo-involved in any loud way. Their contribution to GIS has consisted of a few Excel plugins and a dedicated deference to the ArcGIS platform. They made a halfhearted attempt to enter the web mapping space with Bing Maps, but it's safe to say the developer community hasn't expressed much interest:


But there are some geo-progressive (How many times am I allowed to use it as a prefix?) corners of the Windows empire, notably in the hiring of OpenStreetmap founder Steve Coast, and in a generous licensing of high-resolution imagery for OSM mappers. Add to that a new tool for Excel called Geoflow, which launched to some fanfare last week as we all looked up from our maps with a collective "Microsoft? Really?"

I've been fiddling with it a bit since and I'm glad to have given it a try. It can probably be best described as the CartoDB/Fusion Tables of the desktop, and while that may sound like a backhanded insult, I think it will be extremely useful to the large and dedicated population of Excel users. This is not GIS. This is geovisualization, and it's not half bad.



The premise is simple - put your spreadsheet records on a map - and it executes with little difficulty. The user just needs a column with placenames (or coordinates), and the rest of the columns become thematic options in a 3D map without having to leave Excel. The defaults are novel and in some cases visually appealing - which is fortunate because there's not a lot of tweaking you can do (until Microsoft exposes a Visual Basic UI. Heaven help us.) The orthographic perspective is nice and the application did a better job of figuring out my data structure than Excel usually does.

A quick roundup:

  • Only available on Office 2013 for now.
  • It's the antithesis of open-source but they get a temporary pass because Excel is ubiquitous. I look forward to the LibreOffice Calc plugin.
  • The Nokia-based geocoding is solid, with only one misfire to null island.
  • The map UI isn't intuitive, but I'm not exactly sure if there is such a thing as intuitive 3D navigation. Maybe this is one for the Leap Motion developers. 
  • Not portable. Unless there's some feature in the Office365 version that I'm missing, this is Geoflow's biggest shortcoming. The one nod to the desire to show your map somewhere other than in your cubicle is the "Copy Screen" button. Which I already had on my keyboard.
  • The tour function is clunky, roped to a jerky graphics engine that tries to load every zoom level of tiles even after you've arrived at your stop. This happens every time, suggesting there's no tile caching at all.
  • There's a nice selection of default map themes from Bing Maps, but they have trouble in the orthographic environment. Labels and imagery are slow to load, and text keeps its north orientation even if you want the Hobo-Dyer projection.

Overall: not bad, Microsoft. I like novelty, and they've somehow managed to combine some of the better elements of Google Earth with a spreadsheet and make it look compelling. I wouldn't say it's buggy, just in need of some UX rethinking. Ultimately it's good for all of us when a new map technology emerges from unexpected quarters.


Thursday, April 18, 2013

Toward an Ideal Geoportal

A Geoportal Identity Crisis

A city wants to open its geodata for public use. An NGO wants to spark a transparency initiative. A regional planning commission wants to stop emailing zipped shapefiles when pestered. They want to deliver two contrasting products - raw data and parsed themes - to as many as a dozen different audiences: policymakers, technical service providers, the press, professional curmudgeons, etc.

In the past decade, the solution to this nearly-impossible balancing act has been to build a Geoportal, and we've seen some pretty memorable misfires. Most of the trouble can be chalked up to the good intentions of the technicians who produce the data and the tools to distribute it; accustomed to a desktop GIS environment for exploration and analysis, they have built and re-built "GIS on the Web" (Chris Herwig documented his travels around a broad selection of them). They've done this under the assumption that users will be filtering, buffering, selecting by location and overlaying, and as it happens that's not really true. This approach fails all sectors of the public.

For many months now, Brian Timoney has been championing sanity in the wasteland of geoportals. He expertly trolls the mapjunk and the Faustian UX, but he's also offered an analytics-based selection of "best practices" for getting geodata to the public. Beyond how users actually interact with online maps, he's drawn attention to the subtle distinction between "open data" and "useable open data". But he is still something of a voice in the wilderness, as we all nod vigorously in agreement and go back to downloading file geodatabases from the USGS. We work with the system we have, because it's hard to envision the details of the alternative.

A Template

I've been trying to envision such an alternative, spurred on by some adventurous clients. Specifically, I wanted to see if it was possible to crack open a public dataset in a way that was compelling to technicians as well as to the lay public - something that would adhere to emerging best practices as well as to my own bias toward open architecture for open data. Timoney himself has already taken a stab at this, but we have different styles. So I cobbled together a template geoportal, wrapped around a standard multipurpose bit of public data: building records.

Give it a spin here.

I assumed two audiences: 1.) citizens who want their own zoning information and permit history fast, and 2.) analysts who want to grab bulk chunks of building footprint geodata for urban planning, disaster response or just noodling cartography. The former group can search for their address, see their building in context, and get the basic info before heading off to print their fact sheet. One minute or less.


The GIS specialists of the world can hit the download link and vacuum in all the features within the current view extent, in the interoperable format flavor that suits them (+TopoJSON for the bleeding-edge types).
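For the curious, the "everything in the current view extent" download boils down to a bounding-box query. Here's a minimal sketch of how such a request could be assembled, assuming a hosted CartoDB account with its SQL API at the usual endpoint. The account name, table name and coordinates are hypothetical placeholders, and the template itself may build its links differently.

```python
from urllib.parse import urlencode


def extent_download_url(account, table, bbox, fmt="geojson"):
    """Build a download URL for every feature intersecting the map view.

    bbox is (west, south, east, north) in WGS84 degrees.
    Sketch only: account and table are trusted, hypothetical names.
    """
    west, south, east, north = bbox
    # the_geom is CartoDB's default geometry column;
    # && is the PostGIS bounding-box intersection operator.
    sql = (
        f"SELECT * FROM {table} "
        f"WHERE the_geom && ST_MakeEnvelope({west}, {south}, {east}, {north}, 4326)"
    )
    query = urlencode({"q": sql, "format": fmt})
    # Assumed endpoint shape for a hosted CartoDB account.
    return f"https://{account}.cartodb.com/api/v2/sql?{query}"


# Hypothetical account, table and view extent
url = extent_download_url(
    "example-account", "building_footprints", (-73.22, 44.47, -73.20, 44.49)
)
print(url)
```

Swapping the `fmt` argument is all it would take to offer the other format flavors, provided the service supports them.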


This is just a page for a single dataset, but I think it meets the needs of both audiences and doesn't suck to look at or navigate. A few of the other features I wanted to include:
  • Lightweight and JavaScript-based - less than 1MB before the tiles show up.
  • Shareable URLs - the root URL is subject-specific, and the location hash allows users to pass around a focused view of a neighborhood or house.
  • Traditional search (in the hanging dialog box) is prominent without obscuring the map, since I'm of the opinion that a map is better as a page canvas than as a tiny sidebar window.
  • Very few visible bells and whistles - This is built to some narrow workflows with no mission creep and no toolbars. A bunch of info is socked away in a modal popup for the intrepid.
  • Cartography - I'm done with auto-pixelated graphics and bad default symbology; this is 2013 and we should all be using Mapnik for the web.
  • Open-source undercarriage - this is a combo of Bootstrap, Leaflet, CartoDB and Mapbox.
  • A note on CartoDB and Mapbox - their server code is legit open-source (meaning I could just run it my own damn self), but I've used their hosted services here since it's just - ack - easier to let them handle the services and flexibility thereof. And still cheaper than the competition.

It's true that many of my requirements were about what to omit, but it is truly a difficult task to limit the scope of an application that is being driven to omni-functionality by stakeholders. Resistance is part of the process.
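The shareable-URL item in the list above is simple enough to sketch. This assumes a `#zoom/lat/lng` hash layout, which is a common convention for web maps but not necessarily the exact one the template uses; the function names are my own.

```python
def format_view_hash(zoom, lat, lng, precision=4):
    """Encode a map view as a URL fragment like '#16/44.4759/-73.2121'."""
    return f"#{zoom}/{lat:.{precision}f}/{lng:.{precision}f}"


def parse_view_hash(fragment):
    """Decode '#zoom/lat/lng' into (zoom, lat, lng), or None if malformed."""
    parts = fragment.lstrip("#").split("/")
    if len(parts) != 3:
        return None
    try:
        zoom = int(parts[0])
        lat = float(parts[1])
        lng = float(parts[2])
    except ValueError:
        return None
    # Reject coordinates that cannot be on the globe
    if not (-90 <= lat <= 90 and -180 <= lng <= 180):
        return None
    return zoom, lat, lng


view = parse_view_hash("#16/44.4759/-73.2121")
print(view)
print(format_view_hash(*view))
```

The point of returning `None` for junk input is that a mangled link should fall back to the default view rather than break the page.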

Here's the code on GitHub. The index.html is commented liberally, so you should be able to tell where to swap things in and out. Scaling this approach to an entire suite of geodata could be as easy as forking the repo for every dataset you want to present to the public. Single-theme maps will see the most use in the long run, so keep it simple.

The To-Do List

This app needs typeahead in the search box to really deliver options. Andrew Hill at Vizzuality shows how easy this is when pulling from a CartoDB address column, but I'm still trying to bolt it onto the partially-abstracted Leaflet geosearch module with my meager JavaScript skills. I will gleefully accept pull requests. Additionally, you may note that each building's "Fact Sheet" points to the same place. The city of Burlington is doing great things with its data, but we don't have building URLs keyed to parcel IDs or addresses yet :)

The bigger question is one of data discovery; if we're going to limit geoportals to one theme at a time, how do users get where they want to go? I hate being directed to a Silverlight-slinging "Map Gallery" when I'm looking for info, but I'm not sure what the alternative is for top-level geodata search. Is it Chicago's spare text-based search? Is it the 300-button web GIS that requires training to use? Is it good SEO?

We know something about how users behave once they find the map they want, but how do we get them there in the first place?

Listocracy from Chicago, GIS-in-a-browser from VT Nat. Resources
Many thanks to Jay Appleton from the city of Burlington for being the single-handed support structure of open data here, including emailing the occasional zipped shapefile :)


Tuesday, February 12, 2013

TopoJSON and Messed-Up Topology



The awesomeness of Mike Bostock's TopoJSON format for geodata is not disputable. It drops file sizes by up to 90% and opens the door to seamless feature simplification. And Josh Livni's shpescape.com makes it accessible to everyone in a web conversion UI. But in the GIS world topology is a fearful thing - it blows up your geoprocessing when incorrect, and "fixing" the errors of features that partially share messy borders can take hours. So I wish the conversion of a messy feature set from .shp or GeoJSON to TopoJSON would magically erase the gaps and overlaps of topological suckage. Would that I could click my heels and make it so, along with a smoked porter appearing on my desk. 

The above example shows some postal codes at the U.S./Canada border; in 1816 some surveyor-deficient New Yorkers accidentally built a fort 3/4 of a mile into Canada, before realizing their mistake and abandoning it. If the current residents of zipcode 12979 decided they wanted to take that site back, the resulting overlap would be about what you see here, with a U.S. postal code overlapping a Canadian one. This being TopoJSON, the U.S. feature shares a D3 "path" with its New York neighbor. But its northern border is now unique to that feature, no longer coinciding with the southern path of the Canadian feature it invaded.

This doesn't really pose any showstopping problems in this use case, but it certainly could if the symbology were at all complicated or the label placement were important. My point here is that clean geometry still matters when using TopoJSON; errors don't go away when you make the conversion.
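To make that concrete, here's a toy sketch of shared-boundary detection. It assumes that matching exact vertex pairs is a fair stand-in for how a topology builder decides two features share a border; the real TopoJSON converter is more sophisticated, and the coordinates are invented.

```python
def ring_edges(ring):
    """Return the undirected edges of a closed ring of (x, y) vertices."""
    return {frozenset((a, b)) for a, b in zip(ring, ring[1:])}


def shared_edges(ring_a, ring_b):
    """Edges that appear, vertex for vertex, in both rings."""
    return ring_edges(ring_a) & ring_edges(ring_b)


# Two clean neighbours meeting along the line y = 1
south = [(0, 0), (2, 0), (2, 1), (0, 1), (0, 0)]
north = [(0, 1), (2, 1), (2, 2), (0, 2), (0, 1)]

# The same southern feature with its top edge pushed into the neighbour
south_overlapping = [(0, 0), (2, 0), (2, 1.25), (0, 1.25), (0, 0)]

print(len(shared_edges(south, north)))              # 1
print(len(shared_edges(south_overlapping, north)))  # 0
```

The clean pair shares one edge that could be stored once; the overlapping pair shares nothing, so both borders get stored and the overlap survives the conversion intact.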

Fortunately it looks like more cleaning products are in the works . . .





Wednesday, February 6, 2013

Navigation on Planks

We're not there yet . . . (Skier dude by Saman Bemel Benrud)
The news yesterday was that Google has added the trails of 38 ski resorts in the US and Canada to its Maps database. Here in Vermont, my stashes are safe from further encroachment, since the only two Green Mountain resorts on the list are perennial tourist-bait favorites Okemo and Stowe. That said, this is a cool move by Google in its desire to permeate our mobile world. I'm used to leaving my phone behind while I ski, but this sort of information would be useful to have at my fingertips if I were in unfamiliar terrain at a new resort. It could also be useful in tandem with the ski-run-tracking apps that are proliferating these days.

However, one item is missing from the new functionality: ski navigation. You can't request the shortest (or gnarliest) route from the top of the Sensation Quad to the bottom of the Alpine Double. These new lines - while color-coded according to difficulty - are just background images in Google's otherwise-rich network of information. The navigation engine suggests that the shortest route down the Liftline at Stowe is to pop off my cables and trudge down the Toll Road or the Long Trail.

The longest way around
I can see why Google would be nervous about offering directions in this context. There are two problems that are unique to alpine skiing as a navigation paradigm:

Obey the Rope.

Trail conditions (especially in the East) change more rapidly than road conditions. However, they don't change more rapidly than traffic conditions, and Google's got that covered in near-realtime in major metro areas. 
Solution: GTFS for ski resorts - While it's not immediately realistic to ask every resort to come up with a trails API, it's possible to work out a common format for trail reports (Y'know, those things you look at in the morning to see where it's not bulletproof) much like Google does for public transit. It would be an afternoon task for a data ninja at Mountain View to figure out a way of scraping and parsing that information on a daily basis.

Gravity's a . . . Precondition.


Leaving nordic and backcountry skiing aside, ski navigation would require consideration of elevation change. No uphill turns allowed. 
Solution: Spatial Analysis -  This is one of those bread-and-butter geoprocessing problems that GIS undergrads are given: calculate a flow accumulation surface, add turn attributes to the trail data accordingly. This is actually kind of a fun one.
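Here's a rough sketch of the "no uphill turns" rule. Rather than a flow accumulation surface, it takes a simpler route: assume each trail junction has a known elevation, point every segment from its high end to its low end, and run Dijkstra's algorithm on the result. The junction names, elevations and lengths are made up.

```python
import heapq


def downhill_route(elevation, segments, start, goal):
    """Shortest downhill-only route via Dijkstra's algorithm.

    elevation: {node: metres}; segments: [(node_a, node_b, length_m)].
    Returns (total_length, [nodes]) or None if no downhill route exists.
    """
    graph = {node: [] for node in elevation}
    for a, b, length in segments:
        # Orient each trail segment from its higher end to its lower end
        if elevation[a] > elevation[b]:
            graph[a].append((b, length))
        elif elevation[b] > elevation[a]:
            graph[b].append((a, length))
        # Perfectly flat segments are dropped in this sketch

    queue = [(0, start, [start])]
    visited = set()
    while queue:
        dist, node, path = heapq.heappop(queue)
        if node == goal:
            return dist, path
        if node in visited:
            continue
        visited.add(node)
        for nxt, length in graph[node]:
            if nxt not in visited:
                heapq.heappush(queue, (dist + length, nxt, path + [nxt]))
    return None


# Hypothetical mountain
elevation = {"summit": 1100, "junction": 900, "midstation": 800, "base": 500}
segments = [
    ("summit", "junction", 800),
    ("junction", "base", 1500),
    ("summit", "midstation", 1200),
    ("midstation", "base", 900),
    ("midstation", "junction", 100),
]

print(downhill_route(elevation, segments, "summit", "base"))
print(downhill_route(elevation, segments, "base", "summit"))
```

Closed trails from a morning report could be handled by simply leaving those segments out of the list before routing.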

Lots of folks are never going to look at a mobile device while skiing, maybe myself included. But for the ones who do, it'd be pretty danged cool to have navigation assistance built into the vertical territory that Google is now adding to its already-thorough map system. Just a few hitches to overcome, and I think they're up to it :)

"If only there were some way I could technify this experience . . ."

Wednesday, January 23, 2013

Context For Cheap - The Map Reference Overlay

I used to hate building reference layers in my maps. Labeling placenames was painful, but nothing compared to the chest-hair-waxing misery of scaling transportation symbology by road class. No out-of-the-box defaults were ever cartographically pleasing, and hours were incinerated in the fires of annotation placement. [Sorry, QGIS and Illustrator - you're just as much to blame here as the Redlands upstart.]

But that's basically all in the past. Not long after I slogged my way into web interactive maps, the crew at Mapbox released the finest preconfigured basemap I've ever seen, "Mapbox Streets". They quickly followed with a dozen attractive starter styles and an impressive customization palette.

Mapbox Streets

The idea of the pre-baked map was nothing new - Google, ESRI, GeoIQ and others had entered the web map age with variations on the Reference-Canvas idea: "You provide the overlay data and the story to tell, we'll provide the geographic context." And this works really well for monodimensional Point of Interest (POI) data, as demonstrated by the Google pushpins that these days rain from the sky to skewer every interesting set of coordinates on the planet. Pins and icons don't get in the way of labels and reference features.

The siren call of the martini glass . . .

But what about polygons and the world of the choropleth? Not that it stopped anyone from trying, but polygons on top of a reference canvas either obscure the features beneath or require too much transparency to make a thematic point. I bombed out on early attempts to work with this essential truth:

NYC Metro, covered in bubble bath

The solution to this problem is under our noses. I first noticed this technique when John Keefe and Steven Melendez at the WNYC Data Desk posted a Mapbox-based interactive looking at NYC's proposed wards; the streets were curiously visible above the color-coded ward polygons.

They had introduced me to the reference overlay.

Leveraging the customization options of Mapbox Streets, they had:
  • Winnowed out the layers representing land and water from their basemap, leaving just roads, land use and text,
  • Set these to a modestly transparent level (maybe 30-40%),
  • Used the compositing of the Mapbox API to lay this semi-transparent layer on top of the thematic polygon layer, inverting the standard reference canvas model.

After I recovered the pieces of my brain that had exploded out my ears (maybe I'm easily impressed), I set to applying this tactic to my own maps. I also realized that this could be expanded to allow for a sort of map sandwich, with land and water below, thematic data next and reference data on top:
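The sandwich is ordinary alpha compositing, and a per-pixel sketch shows why it works. This is not the Mapbox API, just the arithmetic behind stacking layers. The colors are invented, and the 0.65 opacity is one reading of "30-40% transparent".

```python
def over(top_rgb, top_opacity, bottom_rgb):
    """Composite a semi-transparent top pixel over an opaque bottom pixel."""
    return tuple(
        round(top_opacity * t + (1 - top_opacity) * b)
        for t, b in zip(top_rgb, bottom_rgb)
    )


def sandwich(base_rgb, thematic, reference):
    """Stack base, then thematic, then reference.

    Each upper layer is (rgb, opacity), or None where it has no pixel.
    """
    pixel = base_rgb
    for layer in (thematic, reference):
        if layer is not None:
            rgb, opacity = layer
            pixel = over(rgb, opacity, pixel)
    return pixel


land = (240, 240, 240)          # base layer: pale land
ward = ((200, 60, 60), 1.0)     # thematic layer: opaque ward color
road = ((0, 0, 0), 0.65)        # reference layer: dark road line

print(sandwich(land, ward, None))   # ward color where there is no road
print(sandwich(land, ward, road))   # darker, but still red, under the road
print(sandwich(land, None, road))   # road over bare land
```

The thematic hue stays readable under the road pixel because the reference layer never fully replaces what's beneath it.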


And it's not just a Mapbox thing:
Hell, if you can do better than AJ Ashton and company, build your own mostly-transparent reference overlay and cache it in a tile server for future projects.

Subtle Context on the Census Dotmap

While I realize this is all old hat in the GIS world (yes, of course you place your labels above your polygons dude), its usefulness in web mapping can't be overstated. The reference overlay saves us serious time, solves the "mashup" problem, and lets us focus on our data and what it has to say.

Update, Wednesday Night:
After conversations with some of the Stamen and former-GeoIQ folks, I figured it'd be worth comparing what happens when three different teams build reference overlays from the same (OpenStreetmap) data. Check it out here.







Friday, January 18, 2013

Open Thresholds

They went too far, clearly.

In publishing the precise locations and names of all the permitted handgun owners in two New York counties, the New York Journal-News has done a serious disservice to data journalists in particular. More broadly, they may have made things more difficult for the "Open Data" community at large.

Got guns?
But a lot of ink and rage has already been leveled at the J-N for this; in the New York Times, David Carr pointed out that even in an era of minimized privacy this was a step too far, lacking in due diligence. Jeff Sonderman in Poynter noted that the context matters kind of a lot - that the timing and lack of justification seemed to associate the mapped gun owners with the Sandy Hook massacre. Sonderman also had sage words for those sitting on piles of prospective open-data boodle:

"If you can’t come up with a better reason than 'because we can' or 'because we think it would look cool,' stop here, you’re about to data dump."

So the smarter folks have weighed in on the implications for journalism and data management, but this awkward business leaves me with two HUMUNGO-GONZO TAKE-HOME MESSAGES for the geographic opendata community:

1. Aggregate to Support the Story.

We - as a society - are flat-out not comfortable with publishing the names and locations of individuals. At the very least strip the identifiers out of your points; better still, aggregate the points to coarser-scale geographic units. Census blocks work fantastically well for detailed data like this, and I hear that hexagonal bins are all the rage these days. More importantly, the coarser scale brings context and emphasizes patterns; that's where the story is at.
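As a minimal sketch of that aggregation step, here's a version that uses plain square grid cells as a stand-in for census blocks or hex bins (both of which need real boundary data or more geometry). The records are fabricated for illustration only.

```python
from collections import Counter
import math


def aggregate_to_grid(records, cell_size_deg=0.01):
    """Count points per grid cell, discarding names and exact coordinates.

    records: iterable of dicts with 'lat' and 'lon' keys (other keys ignored).
    Returns {(row, col): count}.
    """
    counts = Counter()
    for rec in records:
        row = math.floor(rec["lat"] / cell_size_deg)
        col = math.floor(rec["lon"] / cell_size_deg)
        counts[(row, col)] += 1
    return dict(counts)


# Made-up records; the names never reach the output
records = [
    {"name": "A", "lat": 41.0312, "lon": -73.7649},
    {"name": "B", "lat": 41.0358, "lon": -73.7602},
    {"name": "C", "lat": 41.1523, "lon": -73.8011},
]

print(aggregate_to_grid(records))
```

What gets published is a count per cell, which is both safer and closer to the pattern you were trying to show in the first place.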

2. QA/QC, Punks.

Google Fusion Tables - for all its awesomeness - is an extremely blunt instrument for data journalism. Styles, filters and deployment are all very limited for getting your message out. But Fusion Tables also makes it a little too easy to presume accuracy. The handgun ownership maps were piped through the Google geocoding engine (by all accounts the most accurate one out there today) and deposited in their supposed locations on the map. The Journal-News may have tried to clean up the output before publishing, but they didn't catch a few that missed their targets and landed in Burbank and Houston. If you're going to publish something like this, sloppiness is profoundly unhelpful.
Yeah, that guy doesn't live there.
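The cheapest QA/QC pass for geocoder misfires like the ones described above is a bounding-box check before anything gets published. A minimal sketch follows, with a deliberately rough box around the study area and approximate city coordinates; all values are assumptions for illustration.

```python
def flag_outliers(points, bbox):
    """Return the points that fall outside the expected study area.

    bbox is (west, south, east, north) in WGS84 degrees.
    """
    west, south, east, north = bbox
    return [
        p for p in points
        if not (west <= p["lon"] <= east and south <= p["lat"] <= north)
    ]


# Rough, assumed box around the lower Hudson Valley
study_area = (-74.3, 40.8, -73.4, 41.4)

# Approximate coordinates, for illustration
geocoded = [
    {"id": "a", "lat": 41.03, "lon": -73.76},    # near White Plains
    {"id": "b", "lat": 34.18, "lon": -118.31},   # near Burbank
    {"id": "c", "lat": 29.76, "lon": -95.37},    # near Houston
]

for point in flag_outliers(geocoded, study_area):
    print("check by hand:", point["id"])
```

It won't catch a point that lands on the wrong street in the right county, but it does catch the embarrassing ones.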



Geosprocket built an application for the Burlington Free Press (coincidentally a sister publication of the Journal-News; don't run out of free article views now!) in late 2012, in which we tried to show the month-to-month patterns of burglaries in the city of Burlington. The data was provided by the BPD in response to a FOIA request by the Free Press, and it was extremely specific - down to the address of the incident. The context and story were clear - there's a February bump in nighttime burglaries - and we tailored the visualization to focus on that pattern.

BTVCrime - via the Burlington Free Press
At the time I thought we were being conscientious by stripping out the address text and using only the badge number of the responding officer, but in retrospect I would have aggregated these to the census block level. With the cool tools available today, it's a relative snap to make a polygon flash every time an incident occurs, and let the incidents stack up in accumulated color (though not so much of a snap that I'll do it for a blog post).

Basically, the Journal-News handgun owners' map has caused me to rethink a few of my own methods, and I hope it has given us all a sense of the threshold between responsible data journalism and data dumping.