June 7, 2019   //    GitHub

[Screenshot of the web app]

I built an American Sign Language classifier web app by training a model on a Kaggle dataset and then serving it with Flask. To train the model, I tried a variety of approaches: preprocessing with histograms of oriented gradients (HOG) followed by multi-class SVMs, implementing the LeNet-5 CNN architecture in PyTorch, and finally a VGG-11 architecture.
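The LeNet-5 attempt was straightforward to express in PyTorch. The sketch below is a minimal reconstruction rather than the exact model I trained; the 32×32 grayscale input and the 26-way output are assumptions based on the classic architecture and the ASL alphabet.

```python
import torch
import torch.nn as nn

class LeNet5(nn.Module):
    """Classic LeNet-5: two conv/pool stages followed by three fully connected layers."""
    def __init__(self, num_classes=26):  # 26 assumed: one class per ASL letter
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5),   # 32x32 -> 28x28
            nn.Tanh(),
            nn.AvgPool2d(2),                  # 28x28 -> 14x14
            nn.Conv2d(6, 16, kernel_size=5),  # 14x14 -> 10x10
            nn.Tanh(),
            nn.AvgPool2d(2),                  # 10x10 -> 5x5
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 5 * 5, 120),
            nn.Tanh(),
            nn.Linear(120, 84),
            nn.Tanh(),
            nn.Linear(84, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = LeNet5()
logits = model(torch.randn(1, 1, 32, 32))  # one 32x32 grayscale image
```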

The Kaggle dataset I used had very little variety, so I had to apply heavy data augmentation to avoid overfitting; this let me brush up on my OpenCV skills and on PyTorch dataloaders. I ultimately chose VGG-11 largely because I needed a network deep enough to handle the augmented data. The final CNN reached 99.3% accuracy on the test set. When I fed the model 26 pictures of my own hand (one for each letter), it classified 75% of them correctly. The letters most often confused with one another were M, N, S, and T.