
Google and Sensis – The reports of my death are greatly exaggerated [1]

Posted in Home on November 4th, 2008 by mark – 2 Comments

These are my opinions of what is going on and not those of Sensis.

Fairfax (The Age and Sydney Morning Herald) are running a story titled Sensis concedes defeat to Google [2]. It’s a great headline, but it is very misleading. There are two facts in the announcement – syndication and search.

Syndication

Yellow Pages Australia is syndicating its data. They’ve had a commercial agreement with NineMSN for a very long time which allows Yellow Pages data to be used within the NineMSN search. The new announcement extends the syndication of Yellow data to include Google Maps Australia.

If you are a Yellow Pages advertiser this means your advertisement can be found in more places. Therefore the Yellow Pages ad you bought is now even more valuable. Win!

Search on Sensis.com.au

Here is a nice quote from the previously mentioned Fairfax article [2] – ‘and abandon its own search engine for one powered by Google’. The site in question, sensis.com.au, currently has searches of all of the Sensis properties – yellow, white, trading post AND the web – a one stop shop for searching the Sensis properties. Outsourcing the web search on sensis.com.au makes perfect sense as Internet search is not Sensis’ core competency. If you are going to outsource your search it should be to someone who knows how to do it, like Google, Yahoo or Microsoft. Sensis went with Google. Not too shocking really.

The Yellow Pages, to my knowledge, will remain powered by yellow.com.au which is built by Sensis in Melbourne.

[1] The reports of my death are greatly exaggerated

[2] Sensis concedes defeat to Google

Yellow Lab – Sensis innovates

Posted in Home on October 28th, 2008 by mark – 3 Comments

I’ve been at Sensis for a year now and my latest project is Yellow Lab. The Yellow Pages is a significant part of the Sensis and Telstra portfolio, earning $1,273 million in revenue (p34) during 2007/8 across both print and online. Yellow Pages is a product that you only change when you are absolutely sure you’re going to make it better. Labs was created to trial ideas for Yellow Online and mitigate risk in delivering new features. The ideas we trial are product, technical and user experience related.

It’s built on Rails

This is the first public site that Sensis has built in Rails. The momentum has been building within Sensis for a while but nothing of this scale has made it out to the public. Rails was chosen because it allowed Sensis to get to market quickly. From development to deployment, Rails has been a win. We’ve had many developers rotate through the project already, and all of them have picked up Ruby and Rails very quickly with more experienced devs passing on their knowledge. That’s a win for Ruby and Rails.

Microformats everywhere

The Yellow Pages has a lot of data. Every single search result, and listing, is microformatted. I’ll be writing a separate post about this very soon, but it is really exciting stuff.
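As a taste of what that looks like, here is a minimal hCard-style sketch of a marked-up listing. It is a hypothetical example to illustrate the idea, not the actual Yellow Lab markup:

<!-- hypothetical listing marked up with the hCard microformat -->
<div class="vcard">
  <span class="fn org">Mark's Hardware</span>
  <div class="adr">
    <span class="street-address">1 Example St</span>,
    <span class="locality">Port Melbourne</span>
    <span class="region">VIC</span>
    <span class="postal-code">3207</span>
  </div>
  <span class="tel">03 9999 9999</span>
</div>

Because the classes follow the hCard spec, anything that understands microformats can lift the business name, address and phone number straight out of the page.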

Building a search engine

We put a lot of time and effort into search. It’s hard and it’s fun. It also crosses all elements of the business, from product management to user experience to technology. It is certainly more complicated than performing LIKE queries on a database or installing Sphinx or Solr. Business rules, physical locality, voting and tags all need to contribute towards getting a good result.
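To give a flavour of what “contributing towards a good result” means, here is a toy Ruby sketch that blends a few signals into one sortable score. The signal names and weights are completely made up for illustration; this is not the Yellow Lab ranking algorithm:

# Toy example only: made-up signals and weights, not the real ranking code.
Listing = Struct.new(:text_relevance, :distance_km, :votes, :matching_tags)

def score(listing)
  popularity = Math.log(1 + listing.votes)      # diminishing returns on votes
  tag_boost  = 0.5 * listing.matching_tags      # user tags that match the query
  proximity  = -0.5 * listing.distance_km       # closer businesses rank higher

  (2.0 * listing.text_relevance) + popularity + tag_boost + proximity
end

bunnings = Listing.new(0.9, 1.2, 40, 2)
puts score(bunnings)                            # a single number results can be sorted by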

Aggregated Listings

Yellow Pages is a directory at its heart, which lends itself to browsing rather than searching. Let’s say you’re looking for hardware in Port Melbourne. In a print world you’d flick open the book, look for the hardware category and then thumb your way through until you find a business that works for you. Big businesses like Bunnings would list themselves under Hardware, but they may also list themselves under Nurseries as they also sell plants. That’s an awesome solution in the book because Bunnings can now be found in several places throughout it. That’s a win for the business advertising and it’s a win for the user as they can find what they are after more easily.

When directories put their info online you usually get a modified browse experience where you have to select a category and then search within it. For example, you say you’re looking for a hardware store in Port Melbourne and find that Bunnings turns up. This is the way the core site works.

We wanted to change the experience of finding businesses from a directory browsing experience to an internet style search. One hitch: if you put these listings online and search for Bunnings in Port Melbourne you’ll now find two entries (actually 5 on the site), one for Hardware and one for Nurseries – and that sucks if you are looking online. So we’ve aggregated listings from the same business together, using some jiggery pokery, to provide a single view of a business. It’s not perfect but it makes a big difference – it definitely solves 80% of the problem.
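Here is a rough Ruby illustration of the idea (the real matching rules are fuzzier than this, so treat the grouping key as a guess):

# Illustration only: collapse raw category listings into one view per business.
RawListing = Struct.new(:business_name, :phone, :category)

raw = [
  RawListing.new("Bunnings Port Melbourne", "03 9999 9999", "Hardware"),
  RawListing.new("Bunnings Port Melbourne", "03 9999 9999", "Nurseries"),
  RawListing.new("Joe's Hardware",          "03 8888 8888", "Hardware")
]

# Guess at a grouping key: normalised name plus phone number.
grouped = raw.group_by { |l| [l.business_name.downcase.strip, l.phone] }

grouped.each_value do |listings|
  categories = listings.map { |l| l.category }.uniq
  puts "#{listings.first.business_name}: #{categories.join(', ')}"
end
# Bunnings Port Melbourne: Hardware, Nurseries
# Joe's Hardware: Hardware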

Sweet, sweet maps

Whereis has freaking good maps, despite Google having customer mind-share on maps. You can zoom in and see buildings with names, and the map data gets updated regularly. You can send a location right to your GPS device, and in general it is really well tailored to the Aussie market. Besides, they look much nicer than the current Yellow Pages maps, so we’ve put them on Yellow Lab.

User generated content

Dude… We are so Web 2.0 – finally. We’ve got tagging (which we index immediately in our search engine so you can find ’em quickly) and user ratings in the form of “WOMming”. The tech is straightforward, and the features may seem obvious, but how does this fit into being an innovation platform? Finding out how things work in the real world is hard, and the questions we are asking are more social. So we’ll build it and test if and how people use it. Do we moderate tagging content? How? Does a ‘positive’ only recommendation work? Let’s build something and find out.

The future starts now

We have just dropped the very first iteration of Yellow Lab. It is rough and ready. This is the beginning of the adventure, not the end. We are going to put new features in and rip features out once we’ve finished learning from them. Trying out a new idea, measuring its success and deciding to go forward – or to throw it away – helps the business make better decisions. There is some good stuff coming down the pike. Keep coming back to the Lab on a regular basis to check out what is new as we update the site several times a month – and please send us feedback.

Australian GeoSpatial Data – Free

Posted in Home on October 19th, 2008 by mark – 6 Comments

Edit: There are notes in the comments from Tim that explain the changes for PostgreSQL 8.4. Thanks Tim!

I’ve built a couple of sites that needed geospatial data. One was a social networking site that needed a way to list people who were near other people; the other was an art website that allowed users to upload street art and show it on a map. I thought it would be interesting to get the basics of an Australian suburb dataset up and running in a geospatial database and do some simple queries.

Install PostgreSQL and PostGIS

The first thing to do is set up PostgreSQL and PostGIS. I’m sure you can do this in MySQL but I haven’t done it, so leave a note in the comments if you get that up and running :). There are a few articles on how to do this and it is platform specific, so go and do that.

Get some Suburb data

Now we need some data. The ABS (Australian Bureau of Statistics) is kind enough to provide Australia broken down into suburbs and postcodes on their site. I’m going to deal with suburbs, so go ahead and download the State Suburbs (SSC) 2006 Digital Boundaries in ESRI Shapefile format data cube. This data cube has every suburb in Australia defined as a polygon (or a multipolygon) with each node defined as a latitude and longitude.

Converting it to SQL

Unzip the downloaded shapefile and you’ll get 8 files, but we are only concerned with the SSC06aAUST_region.* ones. We are going to load the SSC06aAUST_region data into the database, but first we need to convert it into SQL.

shp2pgsql SSC06aAUST_region.shp suburbs -s 4283 -I -d > suburbs.sql

shp2pgsql converts the ESRI Shapefile into SQL. -I adds an index (which is very important for speed) and -d drops and recreates the table. The -s 4283 makes sure the suburb data is defined with the correct spatial reference. The earth isn’t a sphere and different parts of the earth are curved slightly differently, so the geo-bods came up with a whole bunch of projections. 4283 is the standardised EPSG number for GDA 1994, which is what the suburb data comes in (you can take a peek inside the SSC06aAUST_region.prj file to see what the projection is).

Create a Geo-enabled DB and load the data

createdb australia
createlang plpgsql australia
psql -f /opt/local/share/postgis/lwpostgis.sql -d australia
psql -f /opt/local/share/postgis/spatial_ref_sys.sql -d australia
psql australia < suburbs.sql

Note: The directories for lwpostgis.sql and spatial_ref_sys.sql will vary from system to system so you’ll have to find them on your own machine.

You will also want to create a reference table for the Australian States

create table aust_states (id integer primary key, state_name varchar, state_abbrev varchar);
insert into aust_states (id, state_name, state_abbrev) values (1, 'New South Wales', 'NSW');
insert into aust_states (id, state_name, state_abbrev) values (2, 'Victoria', 'VIC');
insert into aust_states (id, state_name, state_abbrev) values (3, 'Queensland', 'QLD');
insert into aust_states (id, state_name, state_abbrev) values (4, 'South Australia', 'SA');
insert into aust_states (id, state_name, state_abbrev) values (5, 'Western Australia', 'WA');
insert into aust_states (id, state_name, state_abbrev) values (6, 'Tasmania', 'TAS');
insert into aust_states (id, state_name, state_abbrev) values (7, 'Northern Territory', 'NT');
insert into aust_states (id, state_name, state_abbrev) values (8, 'Australian Capital Territory', 'ACT');
insert into aust_states (id, state_name, state_abbrev) values (9, 'Other Territories', 'OT');

Get some awesome answers!

Show me the polygon of Port Melbourne

select name_2006, astext(the_geom)  from suburbs where name_2006 = 'Port Melbourne';

This returns a whole bunch of lat and longs. Pretty useless really. Maybe having the center of a suburb would be more useful.

Show me the center of Port Melbourne

select name_2006, astext(centroid(the_geom))  from suburbs where name_2006 = 'Port Melbourne';

   name_2006    |                  astext
----------------+-------------------------------------------
 Port Melbourne | POINT(144.921987367191 -37.8328692507562)
(1 row)

Much better!

Show me the suburbs that surround Port Melbourne

select surrounding.name_2006
    from suburbs source, suburbs surrounding
    where source.name_2006 = 'Port Melbourne'
        and touches(source.the_geom, surrounding.the_geom);

    name_2006
-----------------
 Albert Park
 Docklands
 South Melbourne
 Southbank
 Spotswood
 West Melbourne
 Yarraville
(7 rows)

Here I select from the suburbs table twice, once to represent the source suburb (in this case Port Melbourne) and once as the surrounding suburb. I then restrict my matches to only show polygons that touch the source.

Show me the suburbs that surround Port Melbourne with distances between suburbs

select surrounding.name_2006,
       distance(transform(centroid(source.the_geom),3112),
                transform(centroid(surrounding.the_geom),3112))
    From suburbs source, suburbs surrounding
    where source.name_2006 = 'Port Melbourne'
        and touches(source.the_geom, surrounding.the_geom);

    name_2006    |     distance
-----------------+------------------
 Albert Park     | 3908.06472236311
 Docklands       | 2316.21021732757
 South Melbourne | 3106.68573231296
 Southbank       | 3492.93829708397
 Spotswood       |  3035.6283677131
 West Melbourne  | 2682.84381789969
 Yarraville      | 3914.04324956383
(7 rows)

The interesting part here is getting the distance between suburbs. The distance() function gets the distance between two points, which for us are the two center points of our suburbs. Unfortunately, if you measure the distance you’ll get an answer in degrees, which isn’t that useful. So you need to transform from degrees (lat and long are in degrees) to a meter based projection. Australia happens to have one called the Lambert Conformal Conic projection, known as number 3112. Hence:

distance(transform(centroid(source.the_geom),3112),
                transform(centroid(surrounding.the_geom),3112))

will get the distance, in meters, between two suburbs.
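Once you have distances in meters you can do handy things like a radius search. For example, suburbs whose center is within 5km of the center of Port Melbourne (same old-style PostGIS functions as above, and centroid-to-centroid so it’s only approximate):

select surrounding.name_2006,
       distance(transform(centroid(source.the_geom),3112),
                transform(centroid(surrounding.the_geom),3112)) as meters
    from suburbs source, suburbs surrounding
    where source.name_2006 = 'Port Melbourne'
        and surrounding.name_2006 != 'Port Melbourne'
        and distance(transform(centroid(source.the_geom),3112),
                     transform(centroid(surrounding.the_geom),3112)) < 5000
    order by meters;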

Show me all the suburbs named Richmond

select name_2006,
       state_name
    from suburbs
    inner join aust_states on suburbs.state_2006 = aust_states.id
    where name_2006 = 'Richmond';

 name_2006 |   state_name
-----------+-----------------
 Richmond  | Victoria
 Richmond  | South Australia
 Richmond  | Tasmania
(3 rows)

What’s next?

This is all very nice, but when you start geocoding data and storing the lat/longs of items in the db, you can do some really fun stuff. If this article generates enough interest I’ll follow up with some Ruby code and Google Maps integration.
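As a small taster of that follow-up, here is a rough Ruby sketch using the ‘pg’ gem (the same library mentioned later on this blog) to ask which suburb contains a given lat/long. It uses the current ‘pg’ API and the same old-style PostGIS function names as above, so treat it as a sketch rather than gospel:

require 'pg'

# Sketch: which suburb contains this point?
conn = PG.connect(:dbname => 'australia')

lat, lng  = -37.8329, 144.9220                  # roughly the Port Melbourne centroid from earlier
point_wkt = "POINT(#{lng} #{lat})"              # WKT points are "POINT(x y)", i.e. longitude first

result = conn.exec_params(
  "select name_2006 from suburbs where contains(the_geom, geomfromtext($1, 4283))",
  [point_wkt])

puts result[0]['name_2006']                     # => Port Melbourne
conn.close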

Class inherits from

Posted in Home on August 3rd, 2008 by mark – 2 Comments

I wanted to know if a class was an Active Record class. I couldn’t find an easy way to do it in Ruby so I monkey patched the Class object like so (assuming Person is an Active Record model object).

class Class
  def inherits_from?(klass, me=self)
    return false if me.nil?
    return true if me == klass
    inherits_from? klass, me.superclass
  end
end

>> Person.inherits_from? ActiveRecord::Base
=> true
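(For what it’s worth, Ruby’s built-in class comparisons will answer the same question without the monkey patch:)

>> Person <= ActiveRecord::Base
=> true
>> Person.ancestors.include? ActiveRecord::Base
=> true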

Generate static pages in Rails

Posted in Home on July 14th, 2008 by mark – 2 Comments

I wanted nice looking 404, 500 and maintenance pages for my Rails app and I couldn’t serve them from Rails.

My requirements were:

  • I didn’t want to hand code the pages – I’m using a web framework for a reason!
  • I wanted to use the application’s layout
  • I needed the pages to be static so I could serve them when the Rails app was either down or when it had ‘issues’

My solution was to:

  1. create a controller for the purpose of rendering the static pages
  2. tailor your views so you have nice 404, 500 and maintenance pages
  3. modify the layout so that sign-in and register were no longer present
  4. create a rake task to render the pages and write them out to a file

Items 1 & 2 are just standard Rails stuff – so go nutz young coders.

Item 3 was pretty straightforward; in the layout I put:

  <% unless controller.controller_name == "errors" %>
     put your sign-in code here
  <% end %>

Item 4 was a bit trickier, but this rake task should get you up and running.

  namespace :generate do
    task :pages => :environment do
      require 'action_controller/integration'
      app = ActionController::Integration::Session.new
      app.host! "stateofflux.com"
      [['/errors/error_404', 'public/404.html'],
       ['/errors/error_500', 'public/500.html']].each do |url, file|
        begin
          app.get url
          File.open(file, "w") { |f| f.write app.response.body }
        rescue Exception => e
          puts "Could not write file #{e}"
        end
      end
    end
  end

We run the rake task in the development environment then check it in, but you could run it in production if there was production data that you needed to create the page.
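For reference, invoking the task is just the namespaced rake call (development is the default environment):

  rake generate:pages
  RAILS_ENV=production rake generate:pages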

activerecord-postgresql-adapter in Rails 2.1

Posted in Home on July 13th, 2008 by mark – 1 Comment

If you get the following message

Please install the postgresql adapter: `gem install activerecord-postgresql-adapter` 

It means you don’t have the new ‘pg’ postgresql library installed. This is easily fixed with a bit of

sudo gem install pg
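Note that config/database.yml doesn’t change; the adapter is still called postgresql, it’s only the underlying gem that is new. Something along these lines (database name and username are placeholders):

development:
  adapter: postgresql
  database: myapp_development
  username: markmansour
  host: localhost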

Column editing with Emacs

Posted in Home on June 14th, 2008 by mark – 3 Comments


Emacs Column Editing from Mark Mansour on Vimeo.

My OS X lovin’ buddies love to point out how easy it is to manipulate columns in TextMate. But I’m old skool and I still love Emacs. So to prove that column editing is a cinch I’ve put together my first screencast to demonstrate column editing mode in Emacs. Emacs ships with CUA mode, which you need to turn on, and then let the good times begin. Enjoy…
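If you want to follow along at home, enabling it is a couple of lines in your ~/.emacs (rectangle editing then starts with C-RET):

(setq cua-enable-cua-keys nil)  ; optional: keep the normal Emacs C-x/C-c/C-v bindings
(cua-mode t)                    ; turn on CUA mode, which includes rectangle (column) editing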

Cleaning dirty database data

Posted in Home on June 9th, 2008 by mark – Be the first to comment

I have a database with duplicate records in it and I want to know how many records I should have once I clean out the duplicates. Boy is this thing dirty! The dataset I’m working with is mid-sized (approximately 2 million records) and it’s a dump from another system. One of the problems is that the data in the dump has been denormalized. The second part of the problem is that some data has been entered multiple times in the source system [1].

Let me give you an example.

blog_example=# \d
              List of relations
 Schema |     Name      | Type  |    Owner    
--------+---------------+-------+-------------
 public | addresses     | table | markmansour
 public | phone_numbers | table | markmansour
 public | users         | table | markmansour
(3 rows)

If I wanted to count the number of users, this would be straightforward, I’d just:

blog_example=# select count(*) from users;
 count 
-------
     3
(1 row)

But let’s look at the data a bit more closely.

blog_example=# select * from users;
 id | name  
----+-------
  1 | Korny
  2 | Tim
  3 | Korny
(3 rows)

blog_example=# select * from phone_numbers;
 id | user_id |  number  
----+---------+----------
  1 |       1 | 11111111
  2 |       1 | 22222222
  3 |       2 | 33333333
  4 |       3 | 11111111
  5 |       3 | 22222222
(5 rows)

For the purposes of this example I’ll consider a duplicate to be a user with exactly the same name, phone number and address – the main thing is that there are multiple one-to-many relationships and that there is repetition. In this example the two users named Korny (ids 1 & 3) have the same phone numbers and the same address and should be considered duplicates.

In SQL the normal way to group things together is to use the cleverly named “group by” clause, but that doesn’t get us what we’re after [2]. I’d like to see the following:

blog_example=# magic select name, number but put the numbers on the same line
 name  |  number  
-------+----------
 Korny | 11111111, 22222222
 Tim   | 33333333
(2 rows)

This can be done with PostgreSQL (if you know how to do this in MySQL please let me know!) by creating your own aggregate function. You’ve probably used an aggregate function like MAX or AVG before. I’m after a string aggregation function. You can define one like this:

CREATE AGGREGATE array_accum_text (
    basetype = text,
    sfunc = array_append,
    stype = text[],
    initcond = '{}',
    sortop = >);

This allows related rows to be grouped up, for example:

blog_example=# select u.*, array_to_string(array_accum_text(cast(ph.number as text)), ',') as all_phone_numbers
blog_example=#   from users as u
blog_example=#   inner join phone_numbers as ph on u.id = ph.user_id
blog_example=#   group by u.id, u.name;
 id | name  | all_phone_numbers 
----+-------+-------------------
  3 | Korny | 11111111,22222222
  1 | Korny | 11111111,22222222
  2 | Tim   | 33333333
(3 rows)

To take it a step further we can now group related fields together, but I’ll do it via a view. I want the user’s id to remain in the grouping so that when the numbers are joined together, users who merely share a name (but have different ids) don’t have all their telephone numbers collapsed into a single row (this is really hard to explain so I suggest trying it out without a view to see what I mean).

blog_example=# create view extended_users as
blog_example-#   select u.id as user_id,
blog_example-#          u.name as name, 
blog_example-#          array_to_string(array_accum_text(cast(ph.number as text)), ',') as all_phone_numbers
blog_example-#     from users as u
blog_example-#     inner join phone_numbers as ph on u.id = ph.user_id
blog_example-#     group by u.id, u.name;
CREATE VIEW

blog_example=# select name, all_phone_numbers from extended_users
blog_example-#   group by name, all_phone_numbers;
 name  | all_phone_numbers 
-------+-------------------
 Korny | 11111111,22222222
 Tim   | 33333333
(2 rows)

From this query I know that there are only two records once I remove all duplicates. When I ran this over my dataset it took the rows from 2 million down to 1.4 million. That is a lot of redundancy that my users don’t want to see. My next step is going to be writing some Rails migrations to clean it up :), but that will have to wait for another post.
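Over 2 million rows you obviously can’t eyeball the output, but the deduplicated count falls straight out of the same view (it gives 2 for the toy data above):

select count(*)
    from (select name, all_phone_numbers
              from extended_users
              group by name, all_phone_numbers) as deduplicated;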

Footnotes

[1] I want to talk about a technical solution so let’s ignore the politics of the situation (i.e. let’s presume that I can’t get the data keyed in a better way or have the data delivered in a more normalized format).

[2] Some SQL:

blog_example=# select u.name, ph.number from users as u
blog_example-#        inner join phone_numbers as ph on u.id = ph.user_id
blog_example-#        group by u.name, ph.number;
 name  |  number  
-------+----------
 Korny | 11111111
 Korny | 22222222
 Tim   | 33333333
(3 rows)

Rubinius comes to Melbourne

Posted in Home on March 24th, 2008 by mark – Be the first to comment

Rubinius is a reimplementation of the Ruby interpreter initiated by Evan Phoenix, who has recently been hired by Engine Yard. Engine Yard is a hosting company that is positioning itself as the hosting provider of choice for Ruby applications through a variety of developer-focused strategies. In an effort to raise their profile and gain credibility in the Ruby community they’ve hired Evan to work full-time on the development of Rubinius. They’ve also hired the excellent Ezra Zygmuntowicz of camping fame, which gives them huge cred in the developer community. They aren’t just latching onto established efforts either; they are also creating new markets by going after the shared hosting crew by developing mod_rubinius. I would guess that the benefits are twofold – they look like nice guys by giving back to the community, and they also create a very easy path to upsell shared hosting into VPS. Everyone wins when this kind of effort is put into a business model.

Evan and Eric Hodel are out in Australia for the Rubinius Sprint set up by Marcus Crafter and fellow Engine Yarder Dylan Egan. On a side trip Evan came to Melbourne and spent three hours explaining how Rubinius works and answering a heap of really interesting technical questions from the Melbourne crew before heading out for a few quiet (5am) drinks.

BarCamp Melbourne 2008

Posted in Home on March 21st, 2008 by mark – Be the first to comment

I’ve been a bit absent with my blogging so I’ve got a few retrospective entries for you.

BarCamp Melbourne: 60 people, 25 talks and an amazing day of interdisciplinary cahoots unorganized by Ben Balbo! What I loved about BarCamp (http://barcamp.org/) was the range of presentations – everything from programming, databases and information modelling through to media centers, industrial relations, social activities and a hell of a lot of fun.

I’d like to give special mention to Paul Fenwick’s hilarious and poignant talk on “An Illustrated History of Failure”. He managed to show why testing costs – and quantified it. It was a showcase of brilliant execution on a topic which would, with an average presenter, make you groan.

Other outstanding talks were Andy G’s “OpenLazlo GUI, jiggy itouch GUI, openmaji” and Brent Snook’s “Lowering Usability Hurdles with the Wii”. Very, very cool stuff. In reality all the talks were excellent and had a lot to offer. I can’t wait to see what the next BarCamp has to offer.

I was lucky enough to win a copy of The CSS Anthology from SitePoint (http://sitepoint.com/) for my talk on Monkey Patching (which I’d given previously at the Ruby Nuby night a few months ago). The after-event drinks were in North Melbourne, with the suds provided by Microsoft.

You can keep up with the activities by checking out the BarCampMelbourne2008 tag on da internets.