Taming Open Data – the process behind Top Areas

The announcement of our new Top Areas product is one milestone in our long, ongoing process of Open Data analysis. When we started working with Open Data we anticipated a few roadblocks, but the challenges turned out to be greater than we expected. The primary problem with Open Data is that no standardized system is shared between countries, or even between cities within the same country. That variation made it hard to generate the relevant results we sought.

Open Data in Paris, France (left) vs Rome, Italy (right)

Our team has worked hard to overcome the lack of standardization by manually combing through the data of every city, looking for patterns that repeat across cities. We began the validation process by focusing solely on Barcelona, a city that everyone on our team knows well, in order to uncover patterns that could be applied to other cities. The data used in our analysis came from the 2011 census; specifically, we looked at the statistical data for census sections (areas with 1500 to 2000 inhabitants) across the city.

Our primary goal was to identify which data sets could reveal larger trends throughout the city. We focused first on uncovering the posh areas. After some trial and error we found that housing size could be used to identify affluent areas, as could immigration data (we looked for a high concentration of predominantly European immigrants with more purchasing power). We cross-referenced these findings with public data on real estate sale prices around the city. The results are very consistent with the areas actually considered posh in Barcelona.
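To make the idea concrete, here is a minimal sketch of how two census indicators could be combined into a single score per census section and then compared against price data. It is an illustration only, not our production method: the field names, the equal weighting, and every number in the sample records are assumptions made up for the example.

```python
from statistics import mean, pstdev

# Hypothetical census-section records; all values are invented for illustration.
sections = [
    {"id": "A", "avg_dwelling_m2": 120.0, "eu_immigrant_share": 0.12, "price_eur_m2": 5200.0},
    {"id": "B", "avg_dwelling_m2": 65.0, "eu_immigrant_share": 0.03, "price_eur_m2": 2400.0},
    {"id": "C", "avg_dwelling_m2": 95.0, "eu_immigrant_share": 0.08, "price_eur_m2": 4100.0},
    {"id": "D", "avg_dwelling_m2": 70.0, "eu_immigrant_share": 0.05, "price_eur_m2": 2900.0},
]


def z_scores(values):
    """Standardize values so indicators with different units can be combined."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def affluence_scores(rows):
    """Average the standardized housing size and immigrant share (equal weights, an assumption)."""
    size_z = z_scores([r["avg_dwelling_m2"] for r in rows])
    immig_z = z_scores([r["eu_immigrant_share"] for r in rows])
    return {r["id"]: (s + i) / 2 for r, s, i in zip(rows, size_z, immig_z)}


def ranking(score_by_id):
    """Return section ids ordered from highest to lowest score."""
    return sorted(score_by_id, key=score_by_id.get, reverse=True)


scores = affluence_scores(sections)
by_score = ranking(scores)
by_price = ranking({r["id"]: r["price_eur_m2"] for r in sections})

# Cross-reference: do the census-based ranking and the price ranking agree?
print(by_score, by_price, by_score == by_price)
```

Standardizing each indicator before averaging keeps square metres from swamping a share between 0 and 1. Comparing the resulting ranking with the price ranking is a simple stand-in for the cross-referencing step described above.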

Top Areas 

Data maps based on real-estate price (left) and size of housing (middle) versus our Top Area maps

Our next test was to identify and define different types of shopping areas in Barcelona. We used both our own GeoPopularity data and the available census data to uncover the main commercial areas and shopping centers in the city.

Top shopping centers in Barcelona

The success of our first two trials made us optimistic about our ability to use Open Data to generate relevant information about a city. We continued the project by brainstorming neighborhood characteristics that would be of interest to travelers. Our research has led to the successful implementation of the following categories in our Top Areas product: Historic, High Street Shopping, Beaches, Business, Multi-cultural, Green Parks, City Center, Young People/Students and Posh Areas.

Our goal was to uncover the top (most prominent and popular) city areas in each of these categories.

Currently the parameters we use to classify our areas include:

  • Open Data: Geodata of each city, which gives us the geographic reference to which published census data can be attached.
  • Open Data: Census information on immigration and housing. 
  • Open Data: Industrial and green areas.
  • Public Data: Information on housing prices. 
  • AVUXI Data: Categorized GeoPopularity data grouped by census areas. 
  • AVUXI Data: Heat maps of areas of interest by basic traveler activities: Eating, Shopping, Sightseeing and Nightlife.
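As a rough illustration of how sources like these can be brought together, the sketch below joins several per-section tables into one record per census section, using the section identifier from the geodata as the common key. The `SectionProfile` class, its field names and the sample values are all hypothetical; they are not the schema of our platform.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class SectionProfile:
    """One census section with whatever indicators are available for it (hypothetical fields)."""
    section_id: str
    eu_immigrant_share: Optional[float] = None   # Open Data: census
    avg_dwelling_m2: Optional[float] = None      # Open Data: census
    green_share: Optional[float] = None          # Open Data: industrial and green areas
    price_eur_m2: Optional[float] = None         # Public data: housing prices
    shopping_popularity: Optional[float] = None  # popularity data grouped by census area


def build_profiles(
    section_ids: Iterable[str], **sources: Dict[str, float]
) -> Dict[str, SectionProfile]:
    """Join per-source tables on section id; indicators missing for a section stay None."""
    profiles = {}
    for sid in section_ids:
        values = {name: table[sid] for name, table in sources.items() if sid in table}
        profiles[sid] = SectionProfile(section_id=sid, **values)
    return profiles


# Invented sample data: the ids would come from the city's geodata.
profiles = build_profiles(
    ["S1", "S2"],
    green_share={"S1": 0.4},
    price_eur_m2={"S1": 3000.0, "S2": 4500.0},
)
print(profiles["S2"])
```

Each keyword argument must match a field name on `SectionProfile`, so a misspelled source raises a `TypeError` instead of being silently dropped. Leaving missing indicators as `None` reflects the fact that not every city publishes every data set.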

The frustration and hard work with Open Data were well worth it now that we have officially added our Top Areas product to the TopPlace Product Suite. We are incredibly proud of its accuracy and of the value its integration can bring to any OTA or Metasearch site. While Open Data remains challenging to work with, the platform we have built has made it much easier. We have come a long way and continue to refine our analysis process daily.