What is moving in your neighbouring country, and beyond

Wednesday, November 30, 2016

By Pedro de Alarcón and Javier Carro, Data Scientists at LUCA.

The value of international phone calls for understanding our society

Telefónica has an extensive global network infrastructure that it offers to other operators to carry their international voice and data traffic. This and many other services are marketed by Telefonica Business Solutions (the "wholesale" business). The service that concerns us here basically consists of collecting voice and data traffic in one country (provided by a telecommunications operator), transporting it and delivering it to another operator in another country.

International call traffic may tell you more than first thought

Originally by Pedro de Alarcón and Javier Carro, Data Scientists at LUCA.

This post discusses the value of international phone calls in understanding society.

Telefónica has a wide global network infrastructure which other operators can use to carry their international voice and data traffic. Telefonica Business Solutions sells this service, amongst others, as part of its wholesale business. The service we look at here basically consists of collecting voice and data traffic in one country (provided by a telecommunications operator), transporting it and delivering it to another operator in a different country.

International call patterns
Figure 1: We have analysed international call patterns to observe how different countries interact.

We recently had the chance to process and analyse a few months’ worth of data from this service. The aim was to understand the data and to discover some interesting facts with a fairly simple analysis. All the information stored and processed by our global Big Data team here at LUCA is anonymised, ensuring a secure working environment.

When dealing with voice calls, each record (or event) in the dataset can be summed up as an originating phone number, a destination phone number, a timestamp and the call duration. Further fields could support a deeper analysis, but we won’t use them on this occasion.
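To make the record structure concrete, here is a minimal Python sketch. The field names, the "<country code>-<token>" number format and the sample values are all assumptions made for illustration; they are not the actual schema used at LUCA.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class CallRecord:
    """One call event. Field names are hypothetical."""
    origin: str          # anonymised caller, assumed format "<country code>-<token>"
    destination: str     # anonymised callee, same assumed format
    timestamp: datetime  # when the call started
    duration_s: int      # call duration in seconds


def country_code(number: str) -> str:
    """Return the country code prefix of an anonymised number."""
    return number.split("-", 1)[0]


def calls_per_day(records, dest_country):
    """Count calls per calendar day towards one destination country."""
    counts = Counter()
    for record in records:
        if country_code(record.destination) == dest_country:
            counts[record.timestamp.date()] += 1
    return dict(counts)


# Made-up sample records, for illustration only.
records = [
    CallRecord("34-a1f3", "39-9bc2", datetime(2016, 8, 24, 9, 15), 180),
    CallRecord("54-77de", "39-02aa", datetime(2016, 8, 24, 9, 40), 420),
    CallRecord("34-a1f3", "44-5e10", datetime(2016, 8, 25, 18, 5), 60),
]
daily_calls_to_italy = calls_per_day(records, "39")
```

Counting records per day and per destination country in this way is all that is needed to produce daily traffic series like the ones discussed in this post.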

Whilst the phone numbers we deal with are anonymised, we can still access the country code and, in some cases, the region or province of both the caller and the person receiving the call. This dataset is limited in the variety of its fields but very large in volume. In terms of structure it is very similar to some popular open data relating to air traffic (see this example). In fact, this resemblance has allowed us to easily reuse some visualisations that had already been built for that kind of data with CARTO.

Let’s listen to what the data tells us:

Given the basic information in this dataset, the first thing we explored was how the total number of calls evolves over time. The following graph shows the daily traffic over the period studied.

Amount of calls managed by Telefónica
Figure 2: Representation over time of the number of calls managed by Telefónica. We can see a clear weekly pattern and curious changes between different weeks.


There is a weekly pattern, with dips during the weekend. What is most notable is the variation from week to week, especially where it is very pronounced. The data is starting to show something worth debating, so what is it trying to tell us?

The answer becomes clearer when we start to travel between countries. For example, the following graph shows the daily number of calls to Italy from various countries. The biggest peaks, on the right-hand side of the graph, fall on the 24th of August 2016, when a large earthquake took place in Italy.

Calls made to Italy
Figure 3: Representation of the amount of calls made to Italy from different countries worldwide during the earthquake that took place on the 24th of August in Italy.


So the data can also help us analyse international events, and it reveals the ties between countries. Looking more closely, subtler information starts to appear: why was the response from Ecuador or Argentina more notable than that of other countries? We could try to explain it with a few well thought out arguments, but we want the data to do the talking.

A very useful tool for interpreting what we find is GDELT, a project supported by Google that monitors in real time what is happening in the world and the impact it is having, taking into account the language and location of each event. It lets us enrich the information we already have by combining the local and the global. GDELT can be queried through Google’s BigQuery platform, either with your own parameters, in which case the results will vary accordingly, or with the preconfigured analytical tools.

As an example, in June 2016 the United Kingdom voted on whether to remain in the European Union. Can our data reflect this? It certainly can, and not just the immediate effect but also the impact in the following weeks. The following graph shows the number of calls between the United Kingdom and Belgium (headquarters of the European Commission). The first marked date (in red) is the day of the vote (Thursday June 23), and the impact in the weeks after the event is also visible. The second marked date, exactly a month later, coincides with the first published economic index that highlighted the economic contraction of the United Kingdom.

Calls between UK and Belgium
Figure 4: Representation of the amount of calls between the UK and Belgium around the time of the Brexit vote in the UK.

These initial investigations help us to build a more formal model. The entities in the model can be the anonymised phone numbers themselves or the geographical regions of origin and destination. They can be treated individually, producing a series of indicators per entity (such as minutes received and minutes made), or in pairs, in which case we are talking about a network or graph in which the nodes are the entities and the arcs connect nodes between which there has been traffic.
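As a rough illustration of the two views described above (per-entity indicators and a graph of paired entities), the following Python sketch aggregates call minutes between countries. The function names, the tuple layout and the sample figures are assumptions made for this example only.

```python
from collections import defaultdict


def build_traffic_graph(calls):
    """Build a directed graph {origin: {destination: total minutes}}.

    `calls` is an iterable of (origin, destination, minutes) tuples;
    this layout is a hypothetical simplification of the real records.
    """
    graph = defaultdict(lambda: defaultdict(float))
    for origin, destination, minutes in calls:
        graph[origin][destination] += minutes
    # Convert to plain dicts so later lookups cannot create empty nodes.
    return {origin: dict(arcs) for origin, arcs in graph.items()}


def node_indicators(graph):
    """Return (minutes made, minutes received) per node."""
    made = {origin: sum(arcs.values()) for origin, arcs in graph.items()}
    received = defaultdict(float)
    for arcs in graph.values():
        for destination, minutes in arcs.items():
            received[destination] += minutes
    return made, dict(received)


# Made-up traffic between country nodes, for illustration only.
calls = [
    ("ES", "IT", 10.0),
    ("ES", "IT", 5.0),
    ("AR", "IT", 20.0),
    ("IT", "ES", 3.0),
]
graph = build_traffic_graph(calls)
made, received = node_indicators(graph)
```

The same structure works whether the nodes are countries, regions or anonymised numbers; only the key used for each node changes.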



Type of data considered Suggested Analysis:




The next figure shows an example of such a graph drawn on a map, representing the connections between Spain and the rest of the countries in the data. The data corresponds to July 7 2016 and highlights the connections with Islamic countries, as it was the last day of Ramadan. It also shows links to the countries that contribute most to summer tourism. The video below shows the daily changes in the data on the map.

Connection graphic
Figure 5: Graph showing the connections between Spain and the rest of the world on the 7th of July 2016 (the end of Ramadan)

Time Series

The sequential and temporal nature of the data allows us to model it as a time series. Time series analysis is a popular statistical discipline, so libraries for it exist in almost all the programming languages regularly used for data analysis (R, Python and Matlab). There are even free tools such as iNZight which allow us to do basic analyses without writing a single line of code.

As a first step before any analysis, it is important to verify that our series is stationary (the mean, variance and covariance of its values do not depend on time) and, if it is not, to transform it so that it is. A series taken from the call dataset will not usually be stationary, so we need to work on that.
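One common way of removing time dependence from a series is differencing, that is, replacing each value with its change relative to an earlier value. The post does not say which transformation was actually applied, so the Python sketch below is only an illustration on synthetic data: a made-up daily series with a linear trend and a weekly pattern becomes constant after differencing at a lag of seven days.

```python
def difference(series, lag=1):
    """Return series[t] - series[t - lag] for every t >= lag."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


# Synthetic daily traffic: linear growth plus a weekly pattern (made-up numbers).
weekly = [0, 5, 5, 5, 5, -10, -10]
series = [100 + 2 * t + weekly[t % 7] for t in range(28)]

# Differencing at lag 7 cancels the weekly pattern and turns the
# linear trend into a constant weekly increase.
weekly_change = difference(series, lag=7)
```

On real traffic the differenced series will of course still contain noise, and a formal stationarity test would be a sensible next check.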

Put simply, a time series like the ones we have identified in the call traffic data can be decomposed into three components that are added or multiplied together to reproduce the original series.

Trend: In our case it depends on the volume of traffic that Telefónica handles with a particular country, i.e. it is mainly linked to the growth or contraction of the business.

Seasonality: There are notable weekly cycles, in which a significant increase in calls happens during the weekend.

Remainder: This is the difference between the original series and the part explained by trend and seasonality. It is the most interesting component, as its peaks and troughs can be related to technical issues, international events and public holidays. Ultimately, the remainder is where we look if we want to analyse what happened outside the normal patterns.

Time series packages for R (such as zoo, xts or timeSeries) allow us to easily extract these three components.
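The three components can be sketched in plain Python with a classical additive decomposition. This is a simplified stand-in for what the R packages do, not the code behind the figures: the trend is a centred moving average over one cycle (which assumes an odd period such as 7), the seasonal component is the average detrended value at each position in the cycle, and the remainder is whatever is left. The series is synthetic.

```python
def decompose_additive(series, period=7):
    """Classical additive decomposition; assumes an odd period."""
    n = len(series)
    half = period // 2

    # Trend: centred moving average; undefined at both ends of the series.
    trend = [None] * n
    for t in range(half, n - half):
        trend[t] = sum(series[t - half:t + half + 1]) / period

    # Seasonality: mean detrended value per position in the cycle,
    # shifted so that one full cycle sums to zero.
    sums = [0.0] * period
    counts = [0] * period
    for t in range(n):
        if trend[t] is not None:
            sums[t % period] += series[t] - trend[t]
            counts[t % period] += 1
    means = [s / c for s, c in zip(sums, counts)]
    centre = sum(means) / period
    pattern = [m - centre for m in means]
    seasonal = [pattern[t % period] for t in range(n)]

    # Remainder: what trend and seasonality do not explain.
    remainder = [
        None if trend[t] is None else series[t] - trend[t] - seasonal[t]
        for t in range(n)
    ]
    return trend, seasonal, remainder


# Synthetic series: flat level, weekly pattern and one unusual day (index 14).
weekly = [0, 5, 5, 5, 5, -10, -10]
series = [100 + weekly[t % 7] for t in range(28)]
series[14] += 70
trend, seasonal, remainder = decompose_additive(series)
```

In this example the unusual day stands out as the largest value in the remainder, which is the kind of signal an earthquake or a referendum leaves in the real data.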

Graphic
Figure 6: Decomposition into trend, seasonality and remainder of the minutes directed to a single country.



The usual reason for doing a time series analysis is to build a predictive model (such as exponential smoothing or ARIMA) that allows us to anticipate, for example, how much traffic we will have in the next few days. It can also help us to find the true outliers in the series (values that fall outside the prediction intervals we can state with confidence), which is quite simple to do in R.
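The idea of flagging values that fall outside a prediction interval can be illustrated with a deliberately simple forecaster in Python. This sketch is an assumption-laden stand-in for the exponential smoothing or ARIMA intervals mentioned above: the forecast for each day is the mean of earlier values on the same weekday, and the interval is that forecast plus or minus `z` standard deviations of past forecast errors. The function name, the parameters and the data are all hypothetical.

```python
from statistics import mean, stdev


def flag_outliers(series, period=7, z=3.0, warmup=7):
    """Return the indices whose value falls outside forecast +/- z * sd."""
    history = [[] for _ in range(period)]  # earlier values per cycle position
    errors = []                            # past forecast errors
    outliers = []
    for t, value in enumerate(series):
        past = history[t % period]
        if past:
            error = value - mean(past)
            if len(errors) >= warmup and abs(error) > z * stdev(errors):
                outliers.append(t)
                continue  # keep the outlier out of future forecasts
            errors.append(error)
        past.append(value)
    return outliers


# Synthetic series: weekly pattern, small deterministic noise, one spike.
weekly = [0, 5, 5, 5, 5, -10, -10]
series = [100 + weekly[t % 7] + (3 * t) % 5 - 2 for t in range(35)]
series[24] += 60
flagged = flag_outliers(series)  # only the injected spike at index 24 is flagged
```

Excluding flagged values from the history is a design choice: it stops one exceptional day from distorting the forecast for the same weekday in later weeks.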

Because it is reliable for short-term predictions, the Holt-Winters family of exponential smoothing methods has become popular and is available in analysis tools like Tableau or TIBCO Spotfire.
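For readers who want to see what the method does, here is a minimal additive Holt-Winters sketch in plain Python. The smoothing parameters, the initialisation from the first two cycles and the function name are choices made for this illustration, not the settings behind the figures in this post; in practice a library implementation that tunes the parameters would normally be used.

```python
def holt_winters_additive(series, period=7, alpha=0.3, beta=0.1,
                          gamma=0.2, horizon=7):
    """Forecast `horizon` steps ahead with additive Holt-Winters."""
    if len(series) < 2 * period:
        raise ValueError("need at least two full cycles to initialise")

    # Initialise level, trend and seasonal terms from the first two cycles.
    first = series[:period]
    second = series[period:2 * period]
    level = sum(first) / period
    trend = (sum(second) - sum(first)) / period ** 2
    seasonal = [y - level for y in first]

    # Update the three components one observation at a time.
    for t in range(period, len(series)):
        y = series[t]
        s = seasonal[t % period]
        previous_level = level
        level = alpha * (y - s) + (1 - alpha) * (level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
        seasonal[t % period] = gamma * (y - level) + (1 - gamma) * s

    # Project the level and trend forward and add the seasonal term.
    n = len(series)
    return [
        level + h * trend + seasonal[(n - 1 + h) % period]
        for h in range(1, horizon + 1)
    ]


# Synthetic series with a stable weekly pattern (made-up numbers).
weekly = [0, 5, 5, 5, 5, -10, -10]
series = [100 + weekly[t % 7] for t in range(28)]
forecast = holt_winters_additive(series)
```

On this perfectly regular series the forecast simply repeats the weekly pattern; on real traffic the three parameters control how quickly the model adapts to changes in level, trend and seasonality.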

ARIMA models are more complex to apply but in most cases improve on the predictions of the previous methods, because they model the correlation between each value and earlier values, giving the model more context.

Traffic generated
Figure 7: Prediction of traffic generated with exponential smoothing (Holt-Winters)

Multi-country social networks:

Information about social networks is an asset that businesses exploit extensively. The main reason is to segment their clients so that they can communicate with them effectively and increase the chances of product consumption. However, the main obstacles for businesses trying to exploit these sources are complexity and cost.

Telefónica brings experience and differential knowledge in building social network models, or SNA (Social Network Analysis), from calling patterns. This time we want to understand the international relationships formed through these networks and how telecommunications data can explain their relevance. We have been inspired by social initiatives like Combatting global epidemics with big mobile data and also Behavioural insights for the 2030 agenda.

The next figure gives us a first look at the data from this perspective. Just by taking into account the relative volume of calls that were actually answered, and applying some common sense, we can see how this data aligns with global socio-economic data.

Amount of calls
Figure 8: The number of calls between pairs of origin and destination countries throughout August 2016. Only the countries that generate the highest call volumes are shown, together with the main destinations for each of them.

There are good sources of international socio-economic data with which to contrast and complement what we observe in our data. For example, large amounts of economic data can be found in the economic observatory of MIT, in the Databank of the World Bank, or in Eurostat, and more social (as well as economic) data in the United Nations or UNICEF databases. This type of data can be very useful even if its temporal or spatial granularity, or the frequency with which it is published, is not ideal.

Before we continue looking at how countries interact, we need to stop for a moment and think about how people behave when it comes to making a call.

In Figure 9 we have divided the daily calls into four user groups: those who tend to call during working hours (green), those who call in their free time (blue), those who call during the weekend (red), and those who call at night (purple). Although this first division may seem simple, it lets us distinguish users who normally call for personal reasons from those who call for work. We can highlight how, for example, the level of calls during the weekend easily exceeds those made from Monday to Friday, and furthermore that once you are in one of these groups you tend to stay there. It’s easy to say this was expected, but it’s the data that has allowed us to state and quantify it.
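A grouping like this can be sketched in a few lines of Python. The hour boundaries, the rule that weekend calls always count as "weekend" regardless of the hour, and the function names are assumptions made for this example; the post does not state the exact definitions used.

```python
from collections import Counter
from datetime import datetime


def time_slot(ts):
    """Assign a call timestamp to one of four slots (assumed boundaries)."""
    if ts.weekday() >= 5:        # Saturday or Sunday
        return "weekend"
    if 8 <= ts.hour < 18:        # assumed office hours, Monday to Friday
        return "working_hours"
    if 18 <= ts.hour < 23:       # assumed weekday free time
        return "free_time"
    return "night"


def dominant_slot(timestamps):
    """Return the slot in which a user makes most of their calls."""
    slots = Counter(time_slot(ts) for ts in timestamps)
    return slots.most_common(1)[0][0]


# Made-up call times for one user, for illustration only.
user_calls = [
    datetime(2016, 8, 24, 10, 0),   # Wednesday morning
    datetime(2016, 8, 25, 11, 30),  # Thursday morning
    datetime(2016, 8, 27, 12, 0),   # Saturday midday
]
group = dominant_slot(user_calls)
```

Assigning each user to the slot where most of their calls fall gives the four groups; checking whether users keep the same group from week to week is then a simple comparison.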

Daily evolution
Figure 9: The daily evolution of calls made by users who normally call during office hours (green), those who call during the afternoon Monday-Friday (blue), weekend callers (red) and night time callers (purple)

Coming back to the inter-country perspective, in the following graph we can see a geographical representation that helps us to better understand the communication flows. The original data has been simplified and scaled for convenience and ease of reading. We can follow the changes in the data from the end of Ramadan (7/7/16) to the earthquake in Italy (24/8/16).


Video 1: Animated representation of the connections made between Spain and the rest of the listed countries throughout June, July and August 2016. We can see how these connections change around the end of Ramadan and the earthquake in Italy.

These representations confirm the personal (social) and professional (economic) links that we mentioned before when referring to socio-economic data. In the next graphic we go a little deeper and show a more specific central European zone, giving us a closer look at the data.

Geographical representation
Figure 10: Geographical representation of communications through a defined zone in Europe.

To recap what we have learnt from the analysis: once we have separated callers by their habits and know where they are located, we can start to understand the fundamental relationship between call data and other socio-economic indicators, not forgetting the link with global events, commercial relations between regions and even the simple interaction between people in their local communities.

For example:

We could analyse communications between eminently industrial zones and compare them with those of the commercial seaports to which they are connected by transport links.
Combining the communications between the caller country and the destination country with historic immigration patterns gives the data a deeper meaning. We saw that this was consistent with the data for Argentina and Italy during the earthquake crisis in Italy, and we would expect the same pattern between Spain and Germany. Does this mean that call data could become a true indicator of modern-day migration? It’s possible that one day we might be able to predict the flow of people through data.

Clearly our digital footprint goes a long way in describing us and our behaviour. 

Artificial Intelligence: What even is that?

Tuesday, November 29, 2016


By Dr Richard Benjamins, VP for External Positioning and Big Data for Social Good at LUCA.

Artificial Intelligence (AI) is the hottest topic out there at the moment, and often it is merely associated with chatbots such as Siri or other cognitive programs such as Watson. However, AI is much broader than just that. To understand what these systems mean for Artificial Intelligence, it is important to understand the "AI basics", which are often lost in the midst of AI hype out there at the moment. By understanding these fundamental principles, you will be able to make your own judgment on what you read or hear about AI.

Peru joins the Data-Driven journey

Monday, November 28, 2016

According to the latest report from the International Monetary Fund (IMF), Peru will be the country with the second-highest economic growth in South America, surpassed only by Bolivia. The Peruvian economy is expected to grow by 4.1% in 2017. This growth makes Peru one of the countries to watch for the adoption and growth of Big Data technologies.

Telefónica Mannequin Challenge

Friday, November 25, 2016

Today in the office we decided to do our very own Mannequin Challenge, bringing together employees from all over Telefónica. This viral internet craze has even frozen the internet in recent weeks so we decided to do our own version:

Can Mobile Data combat Climate Change in Germany?

Thursday, November 24, 2016


One of our favourite topics here at LUCA is using Big Data for Social Good, to measure our progress on Sustainable Development Goals. Three of the 17 goals are closely linked to Climate Change: Affordable and Clean Energy; Sustainable Cities and Communities and Climate Action.

Commuter Traffic: Can Big Data solve the problem?

Wednesday, November 23, 2016


By Javier Carro and Pedro de Alarcón, Data Scientists at LUCA.

When we sit in our daily traffic jams, many of us may think: Where do the other commuters come from? Where are they on their way to? Are we all going in the same direction? Perhaps the lady sitting in the car next to me actually lives 2 doors away and has the exact same commute as me every day. Here at LUCA, we decided to take a data-driven approach, looking at our mobile data insights to show you the huge potential of carsharing and demonstrating that we commuters have a lot more in common than you may think.

#LanzamosLUCA: Carlos Marina talks about conversion and privacy in retail

Tuesday, November 22, 2016

This week we move on to the talk given at our launch event by Carlos Marina, CEO of On The Spot, who tells us about the retail sector and explains some use cases of Big Data applied to businesses.
The talk begins by defining the retail sector as the set of spaces in which an end customer interacts with a brand. Carlos demystifies Big Data with examples of everyday situations in which it can be applied.

The CEO of On The Spot explains how we are all connected to retail: if you are not a shop owner, you are an employee or a customer. Carlos tells us that we need to take advantage of what new technologies offer, Big Data among them, since this is a prerequisite for moving forward in the market and staying competitive.

The key for retail is conversion. Big Data can help you at every step of the retail cycle to achieve the conversion you want: from choosing where to open your shop, to getting customers to spend more in it, to making sure they remember you when they leave.

Telefónica has developed mechanisms, such as Smart Steps, to find out where the population moves and what its sociodemographic profile is. Applied to retail, the platform lets us answer questions such as: How many people enter the shop? How long do customers spend in the shop? What is their profile? The second key word is privacy. Bear in mind that all the information must be aggregated and anonymised, respecting customers' privacy.

Carlos walks us through a series of retail use cases, using demos and examples of how customers use Big Data. You can watch the full talk here:



Big Dating: Could AI be the real matchmaker on Tinder?

Monday, November 21, 2016

Online dating platforms such as Tinder, Happn and Hinge are seeing exponential growth, slowly sliding on to the home screens of smartphone users all over the world. Last week at the Web Summit in Lisbon, Tinder's CEO, Sean Rad, spoke about just how popular the world of swiping and superliking has become, declaring that 80% of people on the app are actually searching for "serious relationships". He also shared that 85% of users are Millennials and that 1.4 billion swipes take place every day, creating 26 million daily matches.

LUCA at Big Data Spain 2016: Our Full Roundup

Friday, November 18, 2016

Over the past 2 days our LUCA team have been in full force at the Big Data Spain 2016 event, which was held this year in Kinepolis in the Ciudad de la Imagen. This technology summit, run by Paradigma Digital, brings together over a thousand Big Data professionals and is already in its fifth edition.

Open Data and Business - a paradox?

Thursday, November 17, 2016


While Open Data has a wide range of definitions, Wikipedia provides one of the most commonly accepted: "Open Data is the idea that some data should be freely available to everyone to use and republish as they wish, without restrictions from copyright, patents or other mechanisms of control." 

#LanzamosLUCA: From reaction to prediction with Carme Artigas

Wednesday, November 16, 2016

As every week, we are releasing a new talk from our launch event. This week it is the turn of Carme Artigas, CEO and founder of Synergic Partners, who talks to us about Big Data in action. Her international profile and business success have led a recent study by 'Insights Success' to recognise Carme as the only Spanish woman among the 30 most influential female executives.

The Data Transparency Lab Conference 2016 kicks off tomorrow

Tuesday, November 15, 2016


By Ramon Sangüesa, Data Transparency Lab coordinator.

This week the 2016 edition of the Data Transparency Lab conference will take place.  In this event, a community of technologists, researchers, policymakers and industry representatives come together at Columbia University in New York in their ambition to advance online personal data transparency through scientific research and design. This same conference took place last year in Boston at MIT as you can see below:

Big Data Spain 2016: Your chance to join the LUCA team?

This week on the 17th and 18th of November, Big Data Spain 2016 will be taking place bringing together over 1000 Big Data experts from all over the world to discuss the latest trends and challenges facing the modern organization when it comes to data. 

Smart Energy: predicting consumption to detect deviations

Monday, November 14, 2016

By Elena Cruz Martín (Data Scientist) and Francisco Javier Vilchez Torralba (Intern) at LUCA.

Not even 150 years have passed since Thomas Alva Edison patented his light bulb. Yet in a world where technology advances so quickly, in a few decades we have gone from being amazed at having light at night to mass energy consumption: not only do we no longer marvel at it, we take it for granted that our buildings will have the right lighting for every hour of the day and that our office will be at a pleasant 22º.

#LanzamosLUCA: we present our portfolio of Big Data solutions

Friday, November 11, 2016

Continuing with the series of talks from our launch event, this week we move on to the talks by José Luis Gilperez and Elena Gil, who tell us more about LUCA and Telefónica's Big Data proposition.

Chatbots? New? You haven't met ELIZA

Thursday, November 10, 2016

By Dr Richard Benjamins, VP for External Positioning and Big Data for Social Good at LUCA.

Artificial Intelligence is a hot topic at the moment. We definitely live in the AI summer, as opposed to the AI winter of the 1970s when AI research suffered a decline in interest and funding due to undelivered expectations. Today, AI is back in, and chatbots in particular are at the centre of every analyst's attention.

Big Data and Elections: We shine a light on Trump and Clinton

Wednesday, November 9, 2016


Twitter is widely used as a tool to understand and predict phenomena in the real world. Today on our blog, we have been using Twitter to understand the US Presidential Elections of November 8th 2016. There are no conclusive research results on whether it is possible to predict the outcomes of elections using tweets but we decided to investigate.

Can Big Data and IOT prevent motorcycle crashes?

Tuesday, November 8, 2016

Most of us are familiar with the dangers involved in driving motorbikes, with motorcyclists being 27 times more likely than passenger car occupants to die in a crash per vehicle mile traveled, and almost five times more likely to be injured, according to the US based III.

The data speaks in Castilla y León

Monday, November 7, 2016

On 3 November the fourth edition of Big Data CyL took place, an event that brings together numerous Big Data professionals from Castilla y León to share experiences on data analysis, visualisation, technologies, data management and business management in this field.

Fighting Fraud: The $3.7 trillion black hole facing today's organizations

Friday, November 4, 2016

By Daniel Torres, Global Product Manager at LUCA.

The global cost of fraud per year is approximately $3.7 trillion according to a 2014 survey, with the average fraud impact per organization estimated at around 5% of its annual revenue. Whilst many believe that fraud cases tend to be multi-million dollar affairs, in reality the survey revealed that the average loss was actually $145,000.

#LanzamosLUCA: Chema Alonso on the transformation towards data-driven

Thursday, November 3, 2016

On 20 October the launch event for LUCA, Telefónica's new B2B Big Data unit, took place. As promised, we will be sharing the talks given by the speakers at our launch event, which revealed both insights into LUCA's new offering and success stories in which Telefónica has worked with Big Data.

LUCA and CARTO to work together bringing location to the next frontier of Big Data

Wednesday, November 2, 2016

The ability to derive actionable insights from analyzing Big Data is a huge component to success for any telecommunications company. Big Data is typically defined as having an inordinate amount of velocity, volume, and variety of data, which often contains a location element. Therefore, performing a holistic analysis requires location contextualization. We are now working with CARTO to do just that.

Big Data for Social Good: How 6 billion mobile phones are social sensors to save lives

Tuesday, November 1, 2016

By Florence Broderick, Strategic Marketing Manager at LUCA.

Attending One Young World as a returning ambassador this year as one of the 40-strong delegation was an absolute privilege. Taking part alongside young employees from hundreds of public and private sector organizations was an eye-opener into the different professional, personal and political situations we face, depending on the country in which we work or study.