Where Cryptography and Machine Learning Meet

Author: Alexandria Lim  Majors: Computer Engineering and German

Using the Morfix.io website to play around with fully homomorphic encryption

My name is Alexandria Lim, and I am a student at the College of Engineering. I am majoring in Computer Engineering and German. My mentor is Dr. Alexander Nelson, a professor in the Department of Computer Science and Computer Engineering. This Fall 2020 semester is my last semester of funding. Upon graduation, I would like to work in industry for a couple of years before returning for post-graduate study.

In a world where technology is all around us, we often neglect the important task of preserving the privacy and security of our devices and the data they collect. One might believe that technology cannot analyze data for its users without knowing the data's contents, but I believe otherwise. With my Honors College Research grant, I sought to preserve the privacy and security of technology through fully homomorphic encryption (FHE), a type of encryption that allows computations such as addition and multiplication to be performed on encrypted information and returns an encrypted result. The special feature of this encryption is that the encrypted results, when decrypted, match the results that would have been obtained had the computations been done on regular, readable information. Thus, with FHE, no readable data needs to be stored, so none is at risk of being compromised.
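To make the property concrete, here is a minimal sketch in Python. It is a toy symmetric scheme over the integers that works on single bits, written only for illustration: it is not secure, it supports only a few operations before its noise grows too large, and it is not the scheme used in this project. The function names (keygen, encrypt, decrypt, he_add, he_multiply) and the parameter sizes are assumptions chosen for this example. Adding two ciphertexts corresponds to XOR of the hidden bits, and multiplying them corresponds to AND.

```python
import secrets


def keygen(bits=64):
    """Return a secret odd integer p with its top bit set (toy key)."""
    return secrets.randbits(bits) | (1 << (bits - 1)) | 1


def encrypt(p, bit, noise_bits=8, q_bits=96):
    """Hide one bit as p*q + 2*r + bit, where q and r are random."""
    q = secrets.randbits(q_bits)      # large random multiplier
    r = secrets.randbits(noise_bits)  # small random noise
    return p * q + 2 * r + bit


def decrypt(p, ciphertext):
    """Strip the multiple of p, then read the parity of what remains."""
    return (ciphertext % p) % 2


def he_add(c1, c2):
    """Adding ciphertexts adds the hidden bits modulo 2 (XOR)."""
    return c1 + c2


def he_multiply(c1, c2):
    """Multiplying ciphertexts multiplies the hidden bits (AND)."""
    return c1 * c2


if __name__ == "__main__":
    key = keygen()
    for a in (0, 1):
        for b in (0, 1):
            ca, cb = encrypt(key, a), encrypt(key, b)
            # The computation happens on ciphertexts only; the key is
            # needed just to check the result afterwards.
            total = decrypt(key, he_add(ca, cb))
            product = decrypt(key, he_multiply(ca, cb))
            print(a, b, "sum:", total, "product:", product)
```

The party doing the addition and multiplication never sees a or b, yet the decrypted results agree with the same operations done on the readable bits. Practical FHE libraries use far more elaborate constructions, but the matching-results property is the same idea.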

As a freshman, I participated in the Freshman Engineering Research Colloquium class, which gives freshmen the opportunity to engage in research. My research mentor for this class was Dr. Alexander Nelson, who introduced me to the topic of FHE. For my Honors College research project, I asked him to be my mentor again. He proofread and edited my proposal, which aided my success in receiving this research grant, and continued to meet with me through on-demand meetings.

This fall semester, after the campus had adjusted to changes from COVID-19, my research mentor and I discussed future directions that this research could take from my original plan, which was to create a system in which user data could be analyzed while maintaining privacy. We discussed two approaches to accomplishing this goal.

In the first approach, which was the original plan, all the FHE computation would be done on the “edge devices”, that is, the smartwatches and smartphones themselves. However, this approach is suboptimal for two reasons: edge devices will probably not have enough computing power for the demands of FHE, and edge devices are insecure because malicious attackers could gain physical access to a device, making it easier for them to steal proprietary service code. In the second approach, which became the revised plan, a server would communicate constantly with the client devices and perform the calculations for them. A server has more resources and is located in a data center, so it can perform more complex computations and stay more physically secure than edge devices.

For servers that need to quickly interpret incoming sensor information from their clients, machine learning algorithms may be used to efficiently categorize the information as a certain type of activity. Thus, the end goal was finalized as a proof-of-concept machine learning algorithm using FHE. Because it is based on an existing machine learning algorithm, my code would be able to analyze and classify various activities within the incoming data; because it uses FHE, it would run without knowing the true values of the data, preserving privacy.

To start, I read through the GitHub repository for the Microsoft SEAL library, which contains helpful information for developers who wish to use the code or learn more about it. After getting an overview of the example code, Dr. Nelson and I looked at various tools. The simplest tool looked to be Morfix.io, a website that allows quick experimentation with FHE thanks to a simple layout for choosing encryption parameters and generating the needed ciphertexts. Next, to choose the specific algorithm that I would translate into FHE, I performed a literature review of various studies on the UCI Machine Learning Repository that conducted activity analysis and classification with various sensor data, to see what types of algorithms were being used. Dr. Nelson advised me to use the naïve Bayes algorithm, as it would be fairly simple compared to the other algorithms, and directed me to resources to learn about it.
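As a rough illustration of why naïve Bayes is a simple starting point, here is a plain, unencrypted sketch in Python. It is not the project's code: the function names, the two binary features (a motion flag and a heart-rate flag), and the six training samples are all hypothetical and invented for this example. The point it shows is that, once the model is trained, scoring a new sample in log space needs only additions of stored numbers, which is the kind of arithmetic that homomorphic encryption supports. Picking the highest score requires a comparison, which this sketch assumes would be done separately, for example after decryption.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(samples, n_values=2, alpha=1.0):
    """Build log-probability tables from (features, label) pairs.

    Each feature is assumed to take integer values in range(n_values);
    alpha is the Laplace smoothing constant.
    """
    class_counts = Counter(label for _, label in samples)
    feature_counts = defaultdict(Counter)  # (label, index) -> value counts
    for features, label in samples:
        for i, value in enumerate(features):
            feature_counts[(label, i)][value] += 1

    n_features = len(samples[0][0])
    total = len(samples)
    log_prior = {c: math.log(n / total) for c, n in class_counts.items()}
    log_likelihood = {}
    for c, n in class_counts.items():
        for i in range(n_features):
            for value in range(n_values):
                count = feature_counts[(c, i)][value]
                log_likelihood[(c, i, value)] = math.log(
                    (count + alpha) / (n + alpha * n_values)
                )
    return log_prior, log_likelihood


def class_scores(model, features):
    """Score each class as log prior plus a sum of log likelihoods."""
    log_prior, log_likelihood = model
    return {
        c: log_prior[c]
        + sum(log_likelihood[(c, i, v)] for i, v in enumerate(features))
        for c in log_prior
    }


def classify(model, features):
    """Return the label with the highest score."""
    scores = class_scores(model, features)
    return max(scores, key=scores.get)


# Hypothetical toy data: (motion flag, elevated heart-rate flag) -> activity
TRAINING = [
    ((1, 1), "walking"), ((1, 0), "walking"), ((1, 1), "walking"),
    ((0, 0), "resting"), ((0, 0), "resting"), ((0, 1), "resting"),
]

if __name__ == "__main__":
    model = train_naive_bayes(TRAINING)
    print(classify(model, (1, 1)))
    print(classify(model, (0, 0)))
```

Working in log space turns the product of probabilities into a sum, so each class score is a short chain of additions. How the lookups and the final comparison would be handled on encrypted data is a design question that this sketch leaves open.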

As of now, the process of translating the algorithm to use the Microsoft SEAL library is ongoing. The next step in my research is to finish the algorithm code and run different datasets from the UCI Machine Learning Repository through it, comparing the classifications my algorithm gives with those of the algorithms in the original papers. If my algorithm were to match the classification accuracy of the original, non-FHE activity classification algorithms, it would suggest that activity classification through FHE is feasible while also providing privacy and security for the user's data.

This research experience has broadened my knowledge of privacy and information security by allowing me to learn more about FHE and machine learning. I enjoy the open-ended aspects of research and the flexibility to try different routes to my solutions. I hope to continue this research and build on the work to turn the project into an honors thesis. I also see the possibility of pursuing research in the field of privacy and information security as a career.