Homomorphic Encryption — The holy grail of online confidentiality

Andreas Pogiatzis
Jul 20, 2018


Data… I need more data….

We are rapidly transitioning into a digital society where personal data is becoming increasingly valuable. The colossal evolution of AI and the advent of deep learning have paved the way for an era of immense innovation. AI is becoming so widespread that in the very near future every single digital device will use some sort of intelligent agent to carry out a task. However, as magical as it sounds, it does not come out of nowhere. It all boils down to a single element. Data, data and yet… more data.

It is not just a matter of hard work and capital that big players like Google, Facebook and Microsoft are dominant in the field. It is more than that.

Numbers indicate that Google and FB influence 70% of all internet traffic [1].

A ridiculous number of Google searches are carried out every minute all around the world. Imagine the details that FB keeps about its users, and I am not talking about direct personal data but rather generated data: what pages you visited, how much you engage with your friends, what kind of videos you watch, etc. In other words, your digital footprint!

Observing this from where we stand now is simply astonishing. Recent examples are Google’s Duplex [2] with its human-like conversational abilities, or NVIDIA’s real-time synthetic image generation [3], whose output is almost indistinguishable from real images. But fast forward a few years from now and imagine this fictional scenario: Facebook can use your chat history to give you a personal intelligent agent that can respond to chats in your Messenger on your behalf with the exact same writing style as yours! Also, you can give Amazon all your banking details and it will roll out specifically crafted shopping/grocery lists (now that it has acquired Whole Foods) that match your financial status and preferences! A new genome company can use your DNA sequence to give you a detailed list of medicines that would work really well on you.

As you can see, all of these come at a cost: giving away your privacy to access these services. What if one of these companies suffers a data breach? Or even worse, what if they sell your data without your consent? Scenarios like these are already happening for real, let alone in the future, when these pools of data will become a treasure of ever-increasing value.

But at the end of the day, you did consciously let that happen to yourself anyway, right? You gave your data away. What else could you do? Unplug the Ethernet cable from the router and read the news in the daily newspaper? Nah… Who would do that? No one is willing to give up their online activities now just for the sake of preserving their privacy in the long term.

On the other hand, even if we do it, even if we erase our online identity for the sake of privacy, we indirectly damage the evolution of AI. You may not want to consent to your location data being used for research due to confidentiality concerns, and that’s totally understandable, but at the same time you are contributing to the barrier that holds AI innovation back. ML and DL both require huge amounts of data to work well and, vice versa, unstructured data is utterly useless without some sort of processing to extract meaningful information.

Homomorphic Encryption to the rescue

Now, what if I told you there is a technology that can solve all these issues? That’s right, and in fact it has been around for many years, yet it was not used because of practicality and security issues. That technology is Homomorphic Encryption (HE).

What is HE?

Homomorphic Encryption is a form of encryption that allows computations to be performed directly on encrypted data, without decrypting it first. Decrypting the result gives the same output as if the computation had been run on the plaintexts.

More formally, let (P, C, K, E, D) be an encryption scheme where P is the plaintext space, C is the ciphertext space, K is the key space, E is the encryption operation and D is the decryption operation. If, for plaintexts a and b in P and a key k in K, there is an operation $\oplus$ on ciphertexts such that the following holds:

$$D_k(E_k(a) \oplus E_k(b)) = a + b$$

then the scheme is additively homomorphic. The same principle applies to other mathematical operations, such as multiplication.
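To make the additive property a bit more tangible, here is a minimal sketch using a toy version of the Paillier cryptosystem, a well-known additively homomorphic scheme. This is my own illustration under loud assumptions: the primes are tiny (so it is completely insecure), and the function names `keygen`, `encrypt`, `decrypt` and `add` are made up for this example rather than taken from any library. In Paillier, the operation on ciphertexts happens to be multiplication modulo n².

```python
import math
import random

def keygen(p=1789, q=1861):
    """Toy Paillier key pair from two small primes (insecure, demo only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse (Python 3.8+); valid for g = n + 1
    return n, (lam, mu)

def encrypt(n, m):
    """Encrypt an integer 0 <= m < n under the public key n."""
    n2 = n * n
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:  # r must be invertible modulo n
        r = random.randrange(1, n)
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(n, priv, c):
    """Recover the plaintext using the private values (lam, mu)."""
    lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n) * mu % n

def add(n, c1, c2):
    """Homomorphic addition: multiply the two ciphertexts modulo n^2."""
    return (c1 * c2) % (n * n)

n, priv = keygen()
c_sum = add(n, encrypt(n, 20), encrypt(n, 22))
assert decrypt(n, priv, c_sum) == 42  # the sum was computed on ciphertexts
```

Whoever calls `add` never needs the private key, which is the whole point: the party doing the arithmetic does not have to be the party who can read the numbers.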

A bit of history

As I mentioned earlier, homomorphic encryption has been around for a while; however, until recently it could not be used in a practical way. One of the first attempts to compute on data that stays private was Yao’s work on secure two-party computation [4], which dates back to 1982 and later led to his Garbled Circuits technique. In that study he also defined the famous Yao’s Millionaires’ Problem, where the objective is to let two millionaires find out which one is richer without revealing to each other how much money they have. Unfortunately, Yao’s solution was very inefficient and needed a lot of communication overhead to derive the result.

Following his lead, many other homomorphic encryption schemes emerged, but with serious limitations. A common issue was that these schemes allowed either only one type of operation or only a limited number of operations to be executed on the encrypted data, which of course had a serious impact on practicality. Interestingly, these included widely adopted modern schemes like RSA and ElGamal. It is true that RSA exhibits a multiplicative homomorphic property when unpadded, but unfortunately unpadded RSA is insecure. Thus, in order to achieve security, RSA gave away its homomorphic properties [5].
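The multiplicative property of unpadded ("textbook") RSA is easy to check yourself. The snippet below is a small sketch with tiny, well-known toy parameters that I picked for illustration; it is exactly the insecure variant mentioned above and should never be used for real data.

```python
# Toy textbook RSA: tiny primes and no padding, insecure by design.
p, q = 61, 53
n = p * q                 # public modulus (3233)
phi = (p - 1) * (q - 1)   # Euler's totient of n (3120)
e = 17                    # public exponent
d = pow(e, -1, phi)       # private exponent via modular inverse (Python 3.8+)

def encrypt(m):
    return pow(m, e, n)

def decrypt(c):
    return pow(c, d, n)

a, b = 7, 12
# Multiply the two ciphertexts without ever decrypting them.
product_ct = (encrypt(a) * encrypt(b)) % n
assert decrypt(product_ct) == (a * b) % n  # 84
```

This works because (a^e)(b^e) = (ab)^e mod n. Padding deliberately breaks that algebraic structure, which is why padded RSA loses the homomorphic property.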

As a matter of fact, the restrictions encountered in HE schemes led to their categorization into the following three groups:

  1. Partially Homomorphic Encryption (PHE): Allows only one type of operation, an unlimited number of times.
  2. Somewhat Homomorphic Encryption (SWHE): Allows some types of operations, a limited number of times.
  3. Fully Homomorphic Encryption (FHE): Allows all types of operations, an unlimited number of times.

It wasn’t until 2009 that Craig Gentry released his groundbreaking PhD thesis, which described a Fully Homomorphic Encryption scheme for the first time [6] (write this down… this man is getting a Turing Award one day 👏). Although his scheme was still impractical, his research introduced a framework for achieving FHE. This sparked a lot of interest throughout the academic community, and many improvements and novel schemes came thereafter.

Source: A Survey on Homomorphic Encryption Schemes: Theory and Implementation

Gentry’s proposal was based on ideal lattices. Although very promising, it was hard to implement due to its complex mathematical concepts and its computational cost. As a result, lattices became a hot topic in the cryptographic community. In particular, one lattice-based problem that is increasingly gaining popularity, mainly because of its conjectured post-quantum hardness, is Learning With Errors (LWE), and it was used as the basis of a novel FHE scheme in 2014 [7].
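To give a feel for what "learning with errors" means in practice, here is a rough sketch of a toy symmetric, Regev-style bit encryption. It is my own simplified illustration, not the scheme from [7]: the parameters `Q` and `N` and all function names are assumptions chosen for readability, and nothing here is secure. Each ciphertext hides the bit behind a noisy inner product with the secret, and adding two ciphertexts XORs the underlying bits while the noise terms add up.

```python
import random

Q = 1 << 15   # ciphertext modulus (toy size)
N = 16        # dimension of the secret vector (toy size)

def keygen(rng):
    return [rng.randrange(Q) for _ in range(N)]

def inner(a, s):
    return sum(x * y for x, y in zip(a, s)) % Q

def encrypt(s, bit, rng):
    a = [rng.randrange(Q) for _ in range(N)]
    e = rng.randint(-4, 4)                       # small random error
    b = (inner(a, s) + e + bit * (Q // 2)) % Q   # bit sits in the top half
    return a, b

def decrypt(s, ct):
    a, b = ct
    v = (b - inner(a, s)) % Q   # equals noise + bit * Q/2 (mod Q)
    return 1 if Q // 4 <= v < 3 * Q // 4 else 0

def add(ct1, ct2):
    """Component-wise addition: XORs the bits, sums the noise."""
    (a1, b1), (a2, b2) = ct1, ct2
    return [(x + y) % Q for x, y in zip(a1, a2)], (b1 + b2) % Q

rng = random.Random(0)
s = keygen(rng)
for x in (0, 1):
    for y in (0, 1):
        assert decrypt(s, add(encrypt(s, x, rng), encrypt(s, y, rng))) == x ^ y
```

Decryption only stays correct while the accumulated noise is smaller than Q/4. That noise budget is the reason many schemes can only handle a limited number of operations.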

Not long after Gentry’s thesis, an FHE scheme over the integers, which aimed for simplicity, was introduced by van Dijk et al. [8]. Finally, a different FHE scheme, based on the (lattice-based) NTRU public-key cryptosystem, was proposed by López-Alt et al. [9] and gained attention due to its efficiency.
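The integer-based approach really is simple enough to sketch in a few lines. The snippet below is a toy symmetric scheme in the spirit of the construction in [8]; the concrete key `P`, the noise and multiplier sizes, and the helper names are my own assumptions for illustration and offer no security. A bit is hidden as `bit + 2*r + P*q`, so adding ciphertexts XORs the bits and multiplying ciphertexts ANDs them, as long as the noise stays below the secret key.

```python
import random

P = (1 << 61) - 1   # secret key: a large odd integer (toy choice)

def encrypt(bit, rng=random):
    r = rng.randrange(0, 1 << 8)    # small noise term
    q = rng.randrange(1, 1 << 40)   # random multiple of the secret key
    return bit + 2 * r + P * q

def noise(c):
    """The part of the ciphertext that survives reduction modulo the key."""
    return c % P

def decrypt(c):
    return noise(c) % 2

for x in (0, 1):
    for y in (0, 1):
        cx, cy = encrypt(x), encrypt(y)
        assert decrypt(cx + cy) == x ^ y   # addition acts as XOR
        assert decrypt(cx * cy) == x & y   # multiplication acts as AND

c1, c2 = encrypt(1), encrypt(1)
# Noise multiplies under multiplication, so it grows much faster than under addition.
assert noise(c1 * c2) >= noise(c1)
```

Once the noise grows past `P`, decryption breaks. That is why a scheme like this is only "somewhat" homomorphic on its own, and why Gentry’s framework needs a way to refresh ciphertexts.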

Given the above, FHE schemes can be categorized into:

  1. Ideal Lattice Based
  2. Over Integers
  3. LWE Based
  4. NTRU-like

Application

Still can’t see the use cases? Let me give you a simple example. Take, for instance, the scenario above with Facebook’s autonomous chat agent. With the use of homomorphic encryption, FB can store all your chat conversations on its own servers, encrypted with your encryption key, and still perform operations on them, e.g. train a neural net to respond to conversations, without ever seeing any of the information in your conversations.

To put it simply, HE is perfect for outsourcing processing. The illustration below highlights how this works:

Source: A Survey on Homomorphic Encryption Schemes: Theory and Implementation
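The outsourcing flow can be sketched in code as well. Below is a minimal, hypothetical example: a `Client` class that owns a toy Paillier key pair (tiny primes, insecure, for illustration only) and a `server_weighted_sum` function that stands in for the untrusted cloud. The names and the weighted-sum task are assumptions of mine. The server receives only ciphertexts and the public modulus, yet it can compute a linear score over the client’s private values.

```python
import math
import random

class Client:
    """Owns the key pair; the only party able to decrypt."""

    def __init__(self, p=1789, q=1861):
        self.n = p * q  # public modulus, shared with the server
        self._lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)
        self._mu = pow(self._lam, -1, self.n)  # Python 3.8+

    def encrypt(self, m):
        n2 = self.n * self.n
        r = random.randrange(1, self.n)
        while math.gcd(r, self.n) != 1:
            r = random.randrange(1, self.n)
        return pow(self.n + 1, m, n2) * pow(r, self.n, n2) % n2

    def decrypt(self, c):
        u = pow(c, self._lam, self.n * self.n)
        return (u - 1) // self.n * self._mu % self.n

def server_weighted_sum(n, encrypted_values, weights):
    """Runs on the untrusted server: sees only ciphertexts and n."""
    n2 = n * n
    acc = 1  # a trivial encryption of 0
    for c, w in zip(encrypted_values, weights):
        # Raising a ciphertext to w multiplies its plaintext by w;
        # multiplying ciphertexts adds their plaintexts.
        acc = acc * pow(c, w, n2) % n2
    return acc

client = Client()
private_values = [3, 5, 2]
encrypted = [client.encrypt(v) for v in private_values]          # step 1: client encrypts
result_ct = server_weighted_sum(client.n, encrypted, [10, 20, 30])  # step 2: server computes
assert client.decrypt(result_ct) == 190                          # step 3: client decrypts
```

Only the last step touches the private key, so the server never learns the values or the result.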

In addition, HE can be practically applied in smart city infrastructure. The extensive deployment of IoT devices has proven very promising for the emergence of smart cities, but at the same time it has opened up a brand new attack surface for hackers. As such, applying HE on these devices can add another powerful layer of protection on top of this sensitive data collection.

The applications are endless. I could keep going, but I would rather keep this section short. I dare you to participate in a local hackathon and read the challenges that are given. I am sure there will be at least one occasion where HE would be useful.

Current state of HE

HE has become an active area of research that is attracting a lot of attention from academia. Large companies like Microsoft and IBM are actively contributing to and exploring the field because of its potential impact on cloud processing. The latest FHE schemes can be practically applied on commodity hardware, and several software libraries have surfaced that make it easier to access and experiment with this awesome technology.

Currently, some of the most influential FHE schemes are:

a. Fully Homomorphic Encryption without Bootstrapping (BGV)

b. FHEW: Bootstrapping Homomorphic Encryption in less than a second (DM15)

c. Faster Fully Homomorphic Encryption: Bootstrapping in less than 0.1 Seconds (CGGI16)

d. Somewhat Practical Fully Homomorphic Encryption (FV12)

Furthermore, HE has already been shown to work with ML and DL at close to unencrypted accuracy, which is quite amazing. Researchers have demonstrated the execution of linear machine learning algorithms on encrypted data, and have also developed a framework for training and classification on encrypted data via deep neural networks.

Two of the most widely used software libraries for HE are:

  1. Simple Encrypted Arithmetic Library (SEAL) by Microsoft
  2. HElib

Of course there is still a lot of room for improvement and experimentation, yet the progress has been astonishing. If you are a researcher, developer or entrepreneur, this area is rapidly expanding and I would highly recommend keeping an eye on it.

Conclusions

I know that I have been rambling for a while now, so I guess it is time to wrap this up. HE is a very exciting subject with tremendous potential to disrupt the landscape of online privacy and AI evolution. The urgent need for such a solution is apparent, and some of the first use cases have already been implemented. There is certainly much progress still to be made, with much more just around the corner, so now is definitely a good time to get involved.

Overall, my humble opinion is that HE will be a cornerstone of privacy-enhancing data analytics applications in the not-so-distant future.

Plus, because I know what all of you devs are thinking right now (“This shit is cool, but show me the money!”), I am planning to release a full tutorial on developing a simple web application that uses HE for data processing in my next posts, so stay tuned!

References:

[1] https://staltz.com/the-web-began-dying-in-2014-heres-how.html

[2] https://ai.googleblog.com/2018/05/duplex-ai-system-for-natural-conversation.html

[3] https://news.developer.nvidia.com/generating-and-editing-high-resolution-synthetic-images-with-gans/

[4] https://research.cs.wisc.edu/areas/sec/yao1982-ocr.pdf

[5] https://people.csail.mit.edu/vinodv/6892-Fall2013/6892-lec01.pdf

[6] https://crypto.stanford.edu/craig/craig-thesis.pdf

[7] http://ro.uow.edu.au/theses/4028/

[8] https://eprint.iacr.org/2009/616.pdf

[9] https://eprint.iacr.org/2013/094.pdf

Andreas Pogiatzis

☰ PhD Candidate @ UoG ● Combining Cyber Security with Data Science ● Writing to Understand