Thursday, July 30, 2009

I can see my house.. - Part 2 : Open Source Geocoding

The drawback of relying on Google APIs is that they've got all these 'Terms of Use'. Unless you pay, you can only call the geocoder so often and can't use it for applications which aren't available freely to the public, and you can't store results except for caching.

What you could really do with is your own geocoding service and that might not be as far-fetched as it seems. If you have enough monkeys and typewriters, they'll eventually write Hamlet. In a more realistic timescale, if you have enough people driving around with GPS devices, they'll come up with a map. They have and it is called OpenStreetMap. It is still a work-in-progress (and always will be, as roads get built or redeveloped). You'll have to see if the data is sufficient for your area of interest. If not, buy a GPS and fix it.

You can get raw data from them, in an XML format called OSM. I had a look at that route and it is not ideal. I'll blog on that later. In the meantime, here's one some-else made earlier. Cloudmade have done a nice chunk of processing for us, and generated Shapefiles similar to those I loaded up in the prior post. From here, I can get a zip file containing shapefiles, one of which is 'australia_highway'. I load that the same way I did the others.

They do other countries too and boundaries as well as highways, so if the Australian postcodes/suburbs from the last post weren't useful to you, this resource may help out.

The local_geocode.sql is an example of how I've used the data to do some basic geocoding. To get that code to work in XE, you'd have to get rid of the call to "sdo_geom.sdo_closest_points" as it is new in 11g. Lets hope that OpenWorld will give us a hint on when we may see 11gR2 XE.

The data isn't at the level of Google. It doesn't do street numbers, for example. Some streets in the source OSM don't have names, and the street next to the one I live in is missing entirely. But if you are trying to determine the closest store for your customer based on their address, it could be good enough. Even if coverage isn't perfect, you may still be able to reduce the amount of calls you make to a "paid for" geocoder (which could reduce the amount you need to pay for it).

Warning: You may be tempted to switch to use sdo_geom.sdo_intersection to find where two streets cross. This is actually part of Oracle Spatial and isn't licensed for use with the basic Locator functionality that comes free with the database. In 10gR2, Locator only included the SGO_GEOM functions SDO_DISTANCE , VALIDATE_GEOMETRY_WITH_CONTEXT, VALIDATE_LAYER_WITH_CONTEXT. In 11g, it includes all subprograms EXCEPT for the following: SDO_GEOM.RELATE, SDO_GEOM.SDO_DIFFERENCE, SDO_GEOM.SDO_INTERSECTION, SDO_GEOM.SDO_UNION, SDO_GEOM.SDO_VOLUME, SDO_GEOM.SDO_XOR.

With XE, the licensing is less-defined. According to this, Spatial is not included with XE. Logically that would mean sdo_intersection can't be part of Spatial as it is included. And even if it is part of Spatial, then are the license restrictions in either the 10gR2 or 11g 'Spatial' documents applicable. Since though either would preclude sdo_intersection, the last question is probably academic.

A discussion of Spatial vs Locator licensing can be found here. As reported here, the cost of Spatial has risen from an arm and a leg to boths legs, one arm and an ear (or by 52% in dollar terms).

I can see my house from here - Part 1

On a recent project, I was looking at Geocoding with Google Maps.

Searching with a simple string would sometimes produce odd results, like places in America when I really wanted one in Australia, or one in Victoria when I expected it in Sydney. Looking further, I've found that you can tell Google Maps to limit its results to a specific 'box' specified by latitude and longitude. The trick is knowing what latitude and longitude to use as the limits.

In Australia, we can get boundaries from the Australian Bureau of Statistics here. It's a few years out of date, but will be sufficiently accurate for most purposes. The drawback is that this is in a shapefile (which is actually a set of files - go figure).

But a Googlehunt directed me to the GDAL/OGR, which can convert shapefiles into Oracle spatial/locator geometries. I downloaded the source to my ubuntu box (which has the database installed), then uncommented and set the line in nmake.opt regarding ORACLE_HOME and installed it (don't forget to set the LD_LIBRARY_PATH).

Unzip the the data files, and gdal/ogr can load the shape files straight into the database, even creating the table. That's the ease of use I like.
ogr2ogr -f OCI OCI:dev/[email protected]/db11r1 SSC06aAUST_region.shp ogr2ogr -f OCI OCI:dev/[email protected]/db11r1 POA06aAUST_region.shp

From there, I want to summarize up a 'box' for each postcode/suburb/state that tells me the outer limits of the area. Luckily for me, Australia is positioned such that it is pretty easy to determine whether a value is a latitude or longitude.

create table aus_postcode as
select a.poa_2006,
decode(state_2006 ,1,'NSW',2,'VIC', 3, 'QLD',
4,'SA', 5,'WA', 6,'TAS',7,'NT',8,'ACT','Other') state,
min(case when b.column_value between 80 and 160 then b.column_value end) west_longitude,
max(case when b.column_value between 80 and 160 then b.column_value end) east_longitude,
min(case when b.column_value between -50 and -20 then b.column_value end) south_latitude,
max(case when b.column_value between -50 and -20 then b.column_value end) north_latitude
from POA06AAUST_REGION a, table(a.ora_geometry.SDO_ORDINATES) b
group by a.poa_2006, state_2006;

create table aus_localities as
select a.name_2006 locality, ssc_2006 locality_number,
decode(state_2006 ,1,'NSW',2,'VIC', 3, 'QLD',
4,'SA', 5,'WA', 6,'TAS',7,'NT',8,'ACT','Other') state,
min(case when b.column_value between 80 and 160 then b.column_value end) west_longitude,
max(case when b.column_value between 80 and 160 then b.column_value end) east_longitude,
min(case when b.column_value between -50 and -20 then b.column_value end) south_latitude,
max(case when b.column_value between -50 and -20 then b.column_value end) north_latitude
from SSC06aAUST_region a, table(a.ora_geometry.SDO_ORDINATES) b
group by a.name_2006, ssc_2006, state_2006;

create table aus_states as
select state, min(west_longitude) west_longitude, max(east_longitude) east_longitude,
min(south_latitude) south_latitude, max(north_latitude) north_latitude
from aus_localities
group by state;

So now, before I geocode an address with suburb, postcode or state, I can query the bounding box and pass it to Google to get a more accurate answer. For an example of the code for this, look at the util_geocode package here. You'll need to plug in your own Google Maps API key before using it. Also, if you want to borrow the code for parsing the returned KML/XML, you may need to make adjustments for the address fields returned for searches in your country of interest.