Smart Cities: Pushing Open Data with Power BI

Monday, 24 April 2017

Written by Paloma Recuero, a Physics graduate with a Master's in eLearning and Educational Technology, currently collaborating with LUCA.

In this article, which will consist of two posts, we will talk about Smart Cities and how they can use their data to become "smarter". We will work with an example dataset from the London Data Store (about the Fire Brigade) to learn how to use Power BI Desktop as a data analysis and visualization tool. The article will finish by drawing various conclusions from the data studied.

I. Background: Smart Cities and Open Data

Cities offer different services to citizens and therefore need to collect and store a large amount of diverse data. Other bodies and public administrations financed by citizens also generate information such as geolocated data, weather conditions and medical information. Supporters of Open Data argue that this information should be accessible and reusable by the general public, without requiring specific permissions. They deem restricting access to it as going against the common good, since the information belongs to society, having been financed by its members.

In 2010, the first Open Data Day was celebrated, bringing together 60 cities interested in using this information to offer better services to citizens and to help solve the different challenges facing them.

Figure 1: What opportunities can Smart Cities offer?

What are these challenges? 

Although each city has its peculiarities, many of them are shared between different cities of the world. We are talking about challenges such as the continuous increase in population which stems from people moving from rural areas to cities. We are then also faced with problems such as traffic congestion, pollution, rising housing prices, aging infrastructure and services on the verge of collapse.

How does a Smart City address these challenges?

When we talk about Smart Cities, we basically talk about infrastructure, connectivity, IoT and of course Open Data. We are talking about Big Data technologies, machine learning, and the extraction, transformation, normalization, processing and publication of data. All of these prior steps allow us to take advantage of all that information: detecting problems, extracting patterns, analyzing behaviours, testing hypotheses and optimizing models. In short, using the available data to optimize the services offered to citizens.

An example:

When Open Data is managed at a local level, it allows for very focused initiatives that address the real problems of citizens. For example, in many large cities the shortage of parking spaces is a real problem, even though at the regional level the number of spaces may seem sufficient. Cities that provide real-time information on the availability of these spaces promote the development of apps that direct drivers to them, reducing lost travel time and pollution. Another interesting example is apps based on the data that local governments offer on pollution and allergen levels, so that the people affected can take measures to minimize their effects.

Figure 2: Examples of applications, available on Google Play, that measure pollen levels.

You can also combine data from different sources: data from public administrations with data offered by private companies (processed to remove confidential customer information). LUCA collaborates in Big Data for Social Good initiatives, providing great value by offering NGOs and governments anonymized data on the location of mobile terminals, which has proven to be of great interest in generating insight into natural disasters. These insights allow them to send the needed emergency help and resources faster and more efficiently.

For this reason, many cities, not just big capitals, want to become Smart Cities. A connected city can use information and technology to offer its citizens a policy of transparency (based on open data), more efficient services (optimizing the use of public resources and reducing costs), and greater participation of the people in the decision-making that affects their day-to-day lives. However, there is always a "but" that must be considered...

As always with innovative projects, there has been no shortage of alarm bells ringing and criticisms about the risks of Smart Cities. Some authors believe that it is a model driven by large engineering, technology and consulting firms solely for the benefit of their own interests, and that it will be the cause of the "end of democracy", as highlighted in the article "The truth about smart cities: 'In the end, they will destroy democracy'".

It's safe to say that sometimes technology is put ahead of people. When you start to think only in terms of sensors and algorithms, confusing the means with the ends, you do not take into account the needs or problems of real people. In these cases, the result can never be positive. "Making Cities Smarter: How Citizens' Collective Intelligence Can Guide Better Decision Making" suggests we should draw on citizens' collective intelligence to guide decision-making and build a truly human smart city, taking a bottom-up approach led by the citizens themselves.

II. The Problem

Figure 3: Image of London Data Store.

London was one of the first European cities to opt for an Open Data initiative, and in January 2010 it launched the London Data Store, where datasets were published covering a wide range of topics including the economy, employment, transport, the environment, security, housing and health. The Data Store contains more than 500 datasets, and its website is visited by 50,000 people every month. In 2015 it received the ODI Open Data Publisher Award for pioneering work in publishing open data at both local and regional levels.

This initiative gives us an excellent example of Open Data maintained locally. It has proven very useful for undertaking initiatives that concern its citizens, providing solutions to specific problems and translating into innovative uses of technology for the benefit of the city's inhabitants.

In the next section of this post we will work with one of the more "picturesque" datasets offered by the London Data Store: data collected about the animal rescue services provided by the city's Fire Brigade. The analysis of this dataset will present a new problem, to which we will try to propose solutions.

What challenge faced the London firemen?

Figure 4: London Fire Brigade firefighters in Tottenham who rescued a pregnant cat.

The data

We will now take you step by step through how to analyze the London Data Store information. Firstly, please download the following Excel document.

This is a table of 4,751 records, which collects information on firefighter call-outs to cover this type of service. The main fields are the incident number, the date and time of the alert, the hours dedicated to the incident, the cost of the service, the type of animal involved, a brief description of the problem, the origin of the alert and the type of location.
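For readers who prefer a scripted first look at a file like this, here is a minimal stdlib-only Python sketch. The rows below are made-up stand-ins for the real records, and the column names are assumptions based on the fields described above (the real file's headings may differ):

```python
import csv
import io

# Made-up sample rows mimicking the fields described above
# (the real file from the London Data Store has 4,751 records).
sample_csv = """IncidentNumber,DateTimeOfCall,PumpHoursTotal,IncidentCost,AnimalGroupParent,OriginOfCall,PropertyCategory
139091,01/01/2009 03:01,2,510,Dog,Person (land line),Dwelling
275091,01/01/2009 08:51,1,255,Cat,Person (mobile),Outdoor
2075091,04/01/2009 10:07,1,255,Bird,Person (mobile),Dwelling
"""

rows = list(csv.DictReader(io.StringIO(sample_csv)))
print(len(rows))               # number of records → 3
print(sorted(rows[0].keys()))  # the available fields

# A quick sanity check of a numeric field: total cost of the call-outs.
total_cost = sum(int(r["IncidentCost"]) for r in rows)
print(total_cost)              # → 1020
```

With the actual download, the same `csv.DictReader` (or an Excel reader) would give a first view of the record count and field names before any tool-specific work begins.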

Although it is not relevant for our analysis, whenever a file is downloaded from the web (in this case an Excel table), it is advisable to check its level of security regarding metadata exposure. There can be metadata hidden within the file that reveals sensitive information about our working environment without us being aware of it. We use the Metashield Analyzer tool to ensure security. In this case, we will simply analyze the file (Step 1). When we ourselves are the ones publishing information in a datastore, we will make sure, using Metashield Protector, that we do not expose compromising information hidden in the metadata (Step 2).

The result of the analysis shows a low level of risk. The "exposed" information is limited to details about the printer and the creator of the document. However, it is always advisable to check.

We will work with a tool that allows us to perform visual analysis very intuitively and create reports that will highlight the patterns that characterize the data. This is the Power BI Desktop tool.
"Power BI is a collection of software services, applications and connectors that work together to turn unrelated data sources into coherent, interactive and visually appealing information. A typical Power BI workflow begins with Power BI Desktop, where a report is created. This report is then published to the Power BI service and then shared so that users of Power BI Mobile applications can use the information."

Power BI allows the user to create visual reports using different data sources, from a simple Excel table to a collection of local or cloud-based hybrid data stores. Reports can be shared and updated in real time, but our example does not require such complications. We have a simple Excel table; we will load it through the desktop tool and generate various reports for internal use. We can work without any problems with the free version of the tool, for which we only need to register.

We downloaded the application and installed it just like any other Windows application.

Figure 7: The initial installation screen for Power BI Desktop; simply choose the correct version for your desktop.

Format the Data

In this example, we will work with an Excel file. If we worked with Excel files in OneDrive, reports and dashboards in Power BI would automatically update when you make changes and save the job. 

For Power BI to be able to import the data from the workbook, the data must be in a table format. It is that simple. In Excel, highlight a range of cells and, on the Insert tab of the Excel ribbon, click Table. It is also important to make sure that all columns have a proper name, so that we can easily find the data that interests us when creating reports in Power BI.

Although many of these aspects can be modified again once the file is loaded, it is usually more convenient and faster to clean the data before loading it.

We create a new report and choose the first available data source: Excel. Go to Get Data > Files > Local File to find and select the Excel file you want.

Figure 8: Screenshot of the data source selection, showing all the available data sources.

A new window then appears that allows us to choose which sheet of the workbook we are most interested in. It shows a preview of the table and gives us the option, as we indicated before, to edit it. In this case we load it directly by selecting the "Load" button.

Figure 9: Selection screen. It allows us to select which pages from the Excel we wish to load. It then offers a partial visualization followed by the ability to edit the data.

As these things never go well at first, we get a nice error message: no fewer than 2,853 errors. We can investigate their origin directly from the "View Errors" section.

Figure 10: Error message that appeared when we loaded our example.

It seems that the problem stems from the format of the "End of Service" field: the import has assigned the same value to every row.

Figure 11: Yellow highlights the fields that were not able to load correctly.

As this data does not give us any relevant information (the services last hours rather than days, and we already have the start date for each service), the column is redundant and we can simply delete it. Highlight the entire column and select "Remove" from the menu.
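The same clean-up step can be sketched in plain Python, treating each record as a dictionary. The column name and values below are illustrative, not taken from the real file:

```python
# Each record is a dict; dropping the redundant column is one pop() per row.
rows = [
    {"IncidentNumber": "139091", "DateTimeOfCall": "01/01/2009 03:01",
     "DateOfEndService": "01/01/2009", "AnimalGroupParent": "Dog"},
    {"IncidentNumber": "275091", "DateTimeOfCall": "01/01/2009 08:51",
     "DateOfEndService": "01/01/2009", "AnimalGroupParent": "Cat"},
]

for row in rows:
    # Remove the redundant end-of-service column; ignore rows lacking it.
    row.pop("DateOfEndService", None)

print(sorted(rows[0].keys()))
```

Whether done in a script or through Power BI's "Remove" menu, the point is the same: dropping a column that carries no information simplifies everything downstream.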

Figure 12: Deleting one of the columns

Another option, instead of deleting the column, would be to keep only the year:

Figure 13: Instead of deleting the data we can change the form of viewing it and use the data for the year.

The format of the "DateTimeofCall" field also doesn't seem right. This time we opt for the "Change Type" option from the same menu and convert it to a decimal number.

Despite reducing the number of errors by reloading the data, there are still errors in some of the fields. Now we are dealing with the "IncidentNumber" field; the format that should be applied is "Text". Then select "Close and Apply".
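In a script, the equivalent of these two type fixes would look like the sketch below. The values are illustrative, and the date format is an assumption (the real file may use a different one):

```python
from datetime import datetime

rows = [
    {"IncidentNumber": 139091, "DateTimeOfCall": "01/01/2009 03:01"},
    {"IncidentNumber": 275091, "DateTimeOfCall": "01/01/2009 08:51"},
]

for row in rows:
    # Incident numbers are identifiers, not quantities: store them as text.
    row["IncidentNumber"] = str(row["IncidentNumber"])
    # Parse the call timestamp into a proper datetime object.
    row["DateTimeOfCall"] = datetime.strptime(
        row["DateTimeOfCall"], "%d/%m/%Y %H:%M"
    )

print(type(rows[0]["IncidentNumber"]).__name__)  # → str
print(rows[0]["DateTimeOfCall"].year)            # → 2009
```

The underlying idea matches what Power BI's "Change Type" does: identifiers become text so they are never summed, and timestamps become real date values so they can be grouped and filtered by year, month and so on.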

Figure 15: Changing the format of "IncidentNumber" to text.

Visualizing the data with Power BI: basic concepts

Once we have loaded the data we can begin to generate reports. We will first look at some of the basic concepts of the tool and we will then move on to the panel of reports for this case. 

In the report view there are five clear areas:

1. The ribbon, which shows the common tasks associated with reports and visualizations.
2. The Report view, or canvas, where visualizations are created and arranged.
3. The tab area at the bottom of the Report view, which allows us to select or add a page to the report.
4. The Visualizations panel, where a visualization can be edited: personalizing colours and axes, applying filters, dragging fields and so on.
5. The Fields panel, from which fields and filters can be dragged onto the report canvas or onto the Visualizations panel. Once the data is loaded, we can see the distinct fields from which reports can be built.

To create a visualization, simply drag a field from the list of fields to the report. The result will be the default data table. 

We can also first select the type of visualization that we want and then drag data fields onto its placeholder on the canvas.

Figure 16: By choosing a visualization, a placeholder appears on the canvas.

The end result depends on how the data is formatted, and if we change the visualization type, the result is updated automatically.

Figure 17: Make a selection from the list of fields that we want to add, then drag each one to the canvas under the corresponding label in the "Visualizations" panel (Axis, Legend, etc.).

Finally, we can change the size of the visualization on the canvas, reposition it, edit it, add labels, modify colours and so on. As we hover over sections of the visualization, tooltips appear with details about that segment, such as its label and total value.

Figure 18: We can modify the visualization directly from the canvas (with the menu of its own window) or using the different tools that appear in the menus to the right of the screen, as we hover over it.
Returning to our first example, we decided to create a table visualization of the data. The placeholder is created, and from this point we select the fields of data that we are interested in (highlighted in yellow); these are then added to the table.

Figure 19: The selected fields (highlighted in yellow) appear under the "Values" column.

Using this method you can add the selected fields directly from the list. In other visualizations, it is advisable to drag each field under the correctly labelled column (Legend, Shared Axis, Column values, Values, etc.).

At the same time, filters are added for each data field:

Figure 20: In our example, adding the "AnimalGroupParent" field automatically allows us to filter by each type of animal that appears in the group.
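The effect of such a filter, slicing or counting incidents per animal type, can be sketched in a few lines of Python. The sample rows are invented, and the field name follows the dataset's "AnimalGroupParent" column:

```python
from collections import Counter

# Invented sample records; only the field used by the filter is shown.
rows = [
    {"AnimalGroupParent": "Cat"}, {"AnimalGroupParent": "Dog"},
    {"AnimalGroupParent": "Cat"}, {"AnimalGroupParent": "Bird"},
    {"AnimalGroupParent": "Cat"},
]

# Count call-outs per animal type (what the report's filter slices on).
counts = Counter(r["AnimalGroupParent"] for r in rows)
print(counts.most_common(1))  # → [('Cat', 3)]

# Filtering to a single animal type, like ticking "Cat" in the filter pane:
cats = [r for r in rows if r["AnimalGroupParent"] == "Cat"]
print(len(cats))              # → 3
```

This is exactly the kind of question the Power BI filter pane answers interactively: which animals generate the most call-outs, and what does the subset for one animal look like.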

We now have access to a tool to work with the dataset from the special services of the London firemen. We have the magnifying glass to analyse the clues. The question is, what conclusions will we reach? Look out for our next post which will hopefully give you an answer.
