Author: Joshua Jowers | Major: Industrial Engineering | Semester: Fall 2023
My name is Joshua Jowers, and I am a senior entering my final semester, majoring in Industrial Engineering with minors in data analytics and computer science. In the spring of 2023, I initiated undergraduate research with Dr. Chase Rainwater of the Industrial Engineering department. The overarching goal of the research is to devise a method for autonomous drone navigation that is not reliant on GPS. During the spring, I delved into code previously written for this problem by Wint Harvey during his time earning a Ph.D. here at the University of Arkansas. In the fall, I extended this work to run some experiments in an attempt to match the performance of Wint’s model using a different machine-learning library. This also sets up the focus of my project next semester, which is to use a more complex model type to improve the accuracy of the machine-learning model.
Wint used a specific type of machine-learning model called a convolutional neural network (CNN). The specific architecture is called Xception, a 71-layer convolutional neural network that is particularly effective at learning from image data. Wint implemented this CNN using the machine-learning library TensorFlow. His model was trained on a set of 10,000 high-altitude images taken in a 32.2 km × 32.2 km region of Washington County, AR, and it achieved a minimum average error of 115 meters.
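To make the setup concrete, here is a minimal sketch of how an Xception-based location regressor might be assembled in Keras. This is an illustration under my own assumptions, not Wint's actual code: the Xception backbone has its classification head removed, and a two-unit dense layer predicts latitude and longitude.

```python
import tensorflow as tf

# Illustrative sketch only -- not Wint's actual implementation.
backbone = tf.keras.applications.Xception(
    include_top=False,          # drop the ImageNet classification head
    weights=None,
    pooling="avg",              # global average pooling -> one feature vector per image
    input_shape=(299, 299, 3),  # Xception's standard input size
)
coords = tf.keras.layers.Dense(2)(backbone.output)  # (latitude, longitude)
model = tf.keras.Model(inputs=backbone.input, outputs=coords)
model.compile(optimizer="adam", loss="mse")
```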
Dr. Rainwater presented some of his ongoing research projects to a class that I was part of in the spring semester of 2022. The project he was working on with Wint seemed particularly interesting to me, so I reached out to him to find out how I could contribute. We met, and he described some possible extensions of the code that could be tested. Wint's original code uses the neural network to take in the high-altitude images one at a time and output a prediction of the latitude and longitude of where each image was taken. The difference between the true location and the predicted location of the image is the error used to evaluate the model. As is, the model is fed the images in random order and makes each prediction independently of the others.
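Since the error is reported in meters, the gap between a predicted and a true (latitude, longitude) pair has to be converted into a ground distance. I do not know exactly how Wint computed this, but a haversine-style calculation like the sketch below is one standard way to do it.

```python
import math

def distance_meters(lat1, lon1, lat2, lon2):
    """Approximate ground distance between two lat/long points (haversine formula)."""
    r = 6371000.0  # mean Earth radius in meters
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Example: a prediction off by about 0.001 degrees of latitude is roughly 111 meters off.
print(distance_meters(36.0620, -94.1574, 36.0630, -94.1574))
```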
The goal of my research is to extend the model so that it considers multiple images when estimating the location of each one. The neural network would be rewritten to use long short-term memory (LSTM), a mechanism that allows a neural network to carry information from previous inputs forward when making the current prediction. I will then feed the model a series of images ordered as if they were taken by a moving drone, to test whether performance improves when consecutive images arrive in the same way they would from a drone in flight. The same approach applies when the drone records video while flying: the code can split the video into a series of frames, which can then be fed to the model.
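The sketch below shows roughly what I have in mind for that extension, written in PyTorch with hypothetical names (the backbone stands in for the Xception feature extractor): each frame in a flight sequence is encoded by the CNN, an LSTM passes information between consecutive frames, and a small head outputs a coordinate for every frame.

```python
import torch
import torch.nn as nn

class SequenceLocator(nn.Module):
    """Hypothetical CNN + LSTM sketch for locating a sequence of drone images."""

    def __init__(self, backbone: nn.Module, feature_dim: int, hidden_dim: int = 256):
        super().__init__()
        self.backbone = backbone                      # e.g. an Xception-style feature extractor
        self.lstm = nn.LSTM(feature_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 2)          # (latitude, longitude) per frame

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, sequence_length, channels, height, width)
        b, t, c, h, w = frames.shape
        feats = self.backbone(frames.reshape(b * t, c, h, w))  # encode every frame
        feats = feats.reshape(b, t, -1)
        hidden, _ = self.lstm(feats)                  # hidden state carries earlier frames forward
        return self.head(hidden)                      # (batch, sequence_length, 2)
```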
During this past spring, I spent most of my time familiarizing myself with Wint's code. During that semester, Dr. Rainwater and I decided to change the implementation to use PyTorch instead of TensorFlow. PyTorch offers more flexibility, as well as more documentation of implementations similar to this project. So, during the spring and fall semesters, most of my effort has gone into porting the code to PyTorch. This has come with several challenges due to major differences in how the two libraries are constructed.
Using Wint's code as a basis, I created a PyTorch version of the Xception CNN. This is the base model that the LSTM model will be built upon. Throughout this semester, the focus has been on getting the performance of the PyTorch model to match that of the TensorFlow model. When the initial PyTorch implementation was tested, it performed very poorly, with learning plateauing at an average error of around 10% of the width of the training area, roughly 3,200 meters. I spent much of the fall semester experimenting with different loss functions and regularization techniques to improve performance. After fine-tuning multiple parameters, the PyTorch model is still underperforming, with its best average error around 620 meters, or about 2% of the width of the sampled area.
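As a rough illustration of these experiments, the sketch below (with hypothetical model and dataloader objects standing in for the project's own code) shows the pattern I have been following: swap in a different loss function, add weight decay as a regularizer, train, and then compare the resulting average error in meters.

```python
import torch
import torch.nn as nn

def train_setting(model, loader, loss_fn, weight_decay, epochs=10, lr=1e-4):
    """Train one loss/regularization combination; model and loader come from the project code."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=weight_decay)
    for _ in range(epochs):
        for images, coords in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(images), coords)
            loss.backward()
            optimizer.step()

# Roughly the kind of grid tried: different losses and strengths of weight decay.
# for loss_fn in (nn.MSELoss(), nn.SmoothL1Loss()):
#     for wd in (0.0, 1e-4, 1e-2):
#         train_setting(model, loader, loss_fn, wd)  # then measure average error in meters
```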
Going into the spring semester, I will continue working on improving the performance of the PyTorch model with help from Dr. Rainwater and another student working with him, Rafael Toche Pizano. Once the PyTorch implementation performs at a level comparable to the TensorFlow model, I will begin implementing the updated model with LSTM capabilities and compare its performance to the base PyTorch model. The goal is to determine whether using LSTM is worth the extra computational power it will require compared to the more basic model.