Australian GeoSpatial Data – Free

Edit: There are notes in the comments from Tim that explain the changes for PostgreSql 8.4.  Thanks Tim!

I’ve built a couple of sites that needed geospatial data. One was a social networking site that needed a way to list people who were near other people, the other was a art web site that allowed users to upload steet art and show it on a map. I thought it would be interesting to get the basics of an Australian suburb dataset up and running in a geospatial database and do some simple queries.

Install PostgreSql and PostGIS

First thing to do is setup PostgreSql and PostGIS. I’m sure you can do this in MySQL but I haven’t done it, so leave a note in the comments if you get that up and running :). There are a few article on how to do this and it is platform specific so go and do that.

Get some Suburb data

Now we need some data. The ABS is kind enough to provide Australia broken down into suburbs and postcodes on their site. I’m going to deal with suburbs so go ahead and download the State Suburbs (SSC) 2006 Digital Boundaries in ESRI Shapefile format data cube. This data cube has every suburb in Australia defined as a Polygon (or a multipolygon) with each node defined as a latitude and longitude.

Converting it to SQL

Unzip the downloaded shapefile and you’ll get 8 files but we are only concerned with the SSC06aAUST_region.* ones. We are going to load the POA06aAUST_region data into the database but firstly we need to convert it into SQL.

shp2pgsql SSC06aAUST_region.shp suburbs -s 4283 -I -d > suburbs.sql

shp2pgsql converts the ESRI Shapefile into SQL. -I adds an index (which is very important for speed) and the -d Drop and recreates the table. The -s 4283 make sure the suburb data is defined in with the correct projection. The earth isn’t a sphere and different parts of the earth are curved slighly differently so the geo-bods came up with a whole bunch of projections. 4283 is the standardized number for the GDA 1994 projection which is the projection the suburb data comes in (you can just take a peek inside the POA06aAUST_region.prj file to see what the project is).

Create a Geo-enabled DB and load the data

createdb australia
createlang plpgsql australia
psql -f /opt/local/share/postgis/lwpostgis.sql -d australia
psql -f /opt/local/share/postgis/spatial_ref_sys.sql -d australia
psql australia < suburbs.sql

Note: The directories for the lwpostgis.sql and spatial_ref_sys will vary from system to system so you’ll have to find them on your own machine.

You will also want to create a reference table for the Australian States

create table aust_states (id integer primary key, state_name varchar, state_abbrev varchar);
insert into aust_states (id, state_name, state_abbrev) values (1, 'New South Wales', 'NSW');
insert into aust_states (id, state_name, state_abbrev) values (2, 'Victoria', 'VIC');
insert into aust_states (id, state_name, state_abbrev) values (3, 'Queensland', 'QLD');
insert into aust_states (id, state_name, state_abbrev) values (4, 'South Australia', 'SA');
insert into aust_states (id, state_name, state_abbrev) values (5, 'Western Australia', 'WA');
insert into aust_states (id, state_name, state_abbrev) values (6, 'Tasmania', 'TAS');
insert into aust_states (id, state_name, state_abbrev) values (7, 'Northern Territory', 'NT');
insert into aust_states (id, state_name, state_abbrev) values (8, 'Australian Captial Territory', 'ACT');
insert into aust_states (id, state_name, state_abbrev) values (9, 'Other Territories', 'OT');

Get some awesome answers!

Show me the polygon of Port Melbourne

select name_2006, astext(the_geom)  from suburbs where name_2006 = 'Port Melbourne';

This returns a whole bunch of lat and longs. Pretty useless really. Maybe having the center of a suburb would be more useful.

Show me the center of Port Melbourne

select name_2006, astext(centroid(the_geom))  from suburbs where name_2006 = 'Port Melbourne';

   name_2006    |                  astext
----------------+-------------------------------------------
 Port Melbourne | POINT(144.921987367191 -37.8328692507562)
(1 row)

Much better!

Show me the suburbs that surround Port Melbourne

select surrounding.name_2006
    from suburbs source, suburbs surrounding
    where source.name_2006 = 'Port Melbourne'
        and touches(source.the_geom, surrounding.the_geom);

    name_2006
-----------------
 Albert Park
 Docklands
 South Melbourne
 Southbank
 Spotswood
 West Melbourne
 Yarraville
(7 rows)

Here I select the suburb table twice, once to represent it as the source suburb, in this case Port Melbourne and as a destination or surrounding suburb. I then restrict my matches to only show polygons that touch the source.

Show me the suburbs that surround Port Melbourne with distances between suburbs

select surrounding.name_2006,
       distance(transform(centroid(source.the_geom),3112),
                transform(centroid(surrounding.the_geom),3112))
    From suburbs source, suburbs surrounding
    where source.name_2006 = 'Port Melbourne'
        and touches(source.the_geom, surrounding.the_geom);

    name_2006    |     distance
-----------------+------------------
 Albert Park     | 3908.06472236311
 Docklands       | 2316.21021732757
 South Melbourne | 3106.68573231296
 Southbank       | 3492.93829708397
 Spotswood       |  3035.6283677131
 West Melbourne  | 2682.84381789969
 Yarraville      | 3914.04324956383
(7 rows)

The interesting part here is getting the distance between suburbs. The distance() method gets the distance between two points, which for us is the 2 center points of our suburbs. Unfortunately if you measure the distance you’ll get an answer in degrees which isn’t that useful. So you need to transform the projection from a degree (lat and long are in degrees) to a meter based projection . Australia happens to have one called the Lambert Conformal Conic projection known as number 3112. Hence:

distance(transform(centroid(source.the_geom),3112),
                transform(centroid(surrounding.the_geom),3112))

will get the distance, in meters, betwwen two suburbs.

Show me all the suburbs named Richmond

select name_2006,
       state_name
    from suburbs
    inner join aust_states on suburbs.state_2006 = aust_states.id
    where name_2006 = 'Richmond';

 name_2006 |   state_name
-----------+-----------------
 Richmond  | Victoria
 Richmond  | South Australia
 Richmond  | Tasmania
(3 rows)

What’s next?

This is all very nice, but when you start geocoding data and getting lat/longs of items you can store in the db then you can do some really fun stuff. If this article generates enough interest I’ll follow up with some Ruby code and Google Maps integration.

  1. Great article mate, didn’t realize all that data was available, looking forward to the next articles. :)

  2. elton says:

    just what i was looking for… thanks…

  3. John Roth says:

    Great article. Cheers!

  4. Michael says:

    Great article!

  5. Tim Preston says:

    For those on postgresql 8.4, your paths have changed!

    The GIS sql files to set up your database are now located in the postgres84 share directory, and there has been one name change from lwpostgis.sql to postgis.sql. So, for the example above use:

    psql -f /opt/local/share/postgresql84/contrib/postgis.sql -d australia
    psql -f /opt/local/share/postgresql84/contrib/spatial_ref_sys.sql -d australia

    Many thanks to Mark!

  6. Brian Cooper says:

    Be careful when using Australian Bureau of Statistics non standard geographies as they could be misleading in many cases they are approxmations of areas or combinations of localities where the real geography may be split over several ABS geographies. This is very common in NSW.

  1. There are no trackbacks for this post yet.

Leave a Reply