Open Algorithms, what are they? Yves-Alexandre de Montjoye explains

Monday, February 20, 2017


Yves-Alexandre de Montjoye is a lecturer at Imperial College London's new Data Science Institute. His research, including his work as a postdoctoral researcher at Harvard IQSS, focuses on metadata privacy.


This week, we spoke to Yves about the OPAL project, which is an open technology platform where open algorithms run directly on the servers of partner companies, behind their firewalls, to extract key development indicators for a wide range of potential users, including national statistical offices, ministries, civil society organizations and media organizations - just to name a few.



So Yves, tell us more about the OPAL initiative.

OPAL is a platform that enables companies to use private data for social good. As I see it, OPAL is a new way of looking at data that preserves people’s privacy. It takes us back to the basics. The question is: why did the anonymization of data start? Because we needed a way to use data on a large scale to do statistics while making sure it wasn’t going to affect you, because nobody knew which part of the data was yours. That is how it all started. OPAL is a question-and-answer system, and the idea behind it is to replace the privacy guarantee that anonymization used to provide for small datasets with solid proofs and IT security.

"OPAL is turning a privacy question into a security question."
The philosophy behind this is to allow people to use data without actually giving the data away. We allow organizations to ask questions about the data and then give them an aggregated answer, without handing over the actual dataset. People from social organizations and NGOs can formulate questions in the form of an open algorithm; we then review and validate the algorithm, and run it on the data.
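To make the question-and-answer idea concrete, here is a minimal Python sketch. It is not OPAL's actual code or API: the `QueryPlatform` class, the `users_per_region` algorithm, the sample records, and the minimum group size used to suppress small groups are all hypothetical, chosen only to illustrate "validate the algorithm, run it next to the data, return an aggregate".

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List

# Hypothetical record type: one row of raw data held by the data owner.
Record = Dict[str, str]
# A "question" maps raw records to aggregated counts per group.
Algorithm = Callable[[Iterable[Record]], Dict[str, int]]


class QueryPlatform:
    """Toy question-and-answer system: the raw data never leaves this object."""

    def __init__(self, records: List[Record], min_group_size: int = 3):
        self._records = records                 # stays with the data owner
        self._validated: Dict[str, Algorithm] = {}
        self._min_group_size = min_group_size   # assumed suppression threshold

    def validate(self, name: str, algorithm: Algorithm) -> None:
        """Stand-in for the review step: register an approved algorithm."""
        self._validated[name] = algorithm

    def ask(self, name: str) -> Dict[str, int]:
        """Run a validated algorithm and return only sufficiently large groups."""
        if name not in self._validated:
            raise PermissionError(f"algorithm {name!r} has not been validated")
        answer = self._validated[name](self._records)
        return {g: n for g, n in answer.items() if n >= self._min_group_size}


def users_per_region(records: Iterable[Record]) -> Dict[str, int]:
    """Example open algorithm: count distinct users in each region."""
    seen = {(r["region"], r["user"]) for r in records}
    return dict(Counter(region for region, _ in seen))


# Made-up sample data, for illustration only.
records = [
    {"user": "u1", "region": "north"},
    {"user": "u2", "region": "north"},
    {"user": "u3", "region": "north"},
    {"user": "u1", "region": "north"},
    {"user": "u4", "region": "south"},
]
platform = QueryPlatform(records)
platform.validate("users_per_region", users_per_region)
result = platform.ask("users_per_region")
print(result)
```

The caller only ever receives the aggregated dictionary; the single-user "south" group is dropped by the assumed threshold, and asking a question that has not been validated raises an error instead of touching the data.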

"The fundamental difference of OPAL is that the platform allows you to use the data without giving you a copy of it."

In 2006, AOL had a data breach and basically released search data. After some time, it turned out it was really easy to identify people in this dataset. At that point you think, we can’t have this data out there, but everyone already had a copy of it. That was 11 years ago, and you can still download this dataset on BitTorrent.

In 1995, when data anonymization really started, we barely had the internet, but now we're in the era of Big Data. A lot has changed over the past few years, so it is essential that we stop and rethink the way we approach data privacy, using proper computer science and security to ensure we use data in a way that preserves people’s privacy.

Data privacy
Figure 2: "It is essential that we stop and rethink the way we approach data privacy".

What are the next steps for OPAL to move forward?

We have a lot of work ahead of us, figuring out the details and building the platform in a scalable way. We are building a development team here at Imperial College Data Science Institute to do this. In its initial phase of deployment, the project will focus on a small set of countries in Latin America, Africa and Asia. First, we are going to deploy it in Senegal, with the help of Orange Sonatel, and in Colombia, with Telefónica.

Once we have tested the platform and run it there, we can add new partners. This is the benefit of software solutions: they are easy to deploy in new countries. Once an algorithm has been validated, it can be run around the world!

How would you picture privacy in 10 years?

In general, you can really see that people are starting to care a great deal about privacy. We have seen a lot of backlash around privacy. There are a lot of good uses of Big Data, such as the data generated by mobile phones, but we need to build solutions that allow this data to be used in a way that really protects people's privacy. This is why we're building OPAL.

