Because the app brings speech therapy onto users' devices, its pronunciation evaluation model is crucial. I contributed to the research Sara Technology had been conducting in this space by helping train a phoneme classification model on both open source and internally collected data. I also helped collect data for the team and wrote training scripts built on foundation audio embedding models, which supported the AI team's work on improving pronunciation evaluation.
Key skills:
• Audio Learning Models
• Connectionist Temporal Classification
• wav2vec
• AWS Amplify
• React
My time with Sara Technology was incredibly exciting, as I joined the team during one of the most crucial periods for any startup: app launch. I was quickly immersed in the process of building out app pages and pushing them to production. Apart from the rush of app development, my primary task was training the evaluation model. This involved two steps: building a tool for internal data collection and training open source models on phoneme data.
A phoneme is the smallest unit of sound that distinguishes one word from another, so correctly identifying which phonemes are spoken in an audio recording is an important step in deciding whether a word was pronounced correctly. The internal data collection tool simplified the process of collecting audio recordings from the team. Once it was built, I was able to focus on model training. I started from an existing audio embedding model pretrained on a large audio data set and used Connectionist Temporal Classification (CTC) for the actual classification step.
Training took a few iterations before we saw an improvement in the classification error rate. Common obstacles included running into compute limits and not having enough data, even after combining multiple open source data sets. Despite these constraints, I saw a significant improvement in classification error and enjoyed delving into a real-world deep learning problem.
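To make the classification step and the error rate more concrete, here is a minimal sketch of how per-frame CTC predictions can be turned into a phoneme sequence and scored against a reference. It is not the code used at Sara Technology: the blank symbol, the ARPAbet-style phoneme labels, and the function names are all assumptions made for illustration, and greedy decoding is only one of several ways to decode CTC output.

```python
from itertools import groupby

BLANK = "_"  # hypothetical symbol for the CTC blank label


def ctc_greedy_decode(frame_labels, blank=BLANK):
    """Collapse consecutive repeated labels, then drop blanks."""
    collapsed = [label for label, _ in groupby(frame_labels)]
    return [label for label in collapsed if label != blank]


def edit_distance(ref, hyp):
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def phoneme_error_rate(ref, hyp):
    """Edit distance normalized by the reference length."""
    if not ref:
        raise ValueError("reference sequence must not be empty")
    return edit_distance(ref, hyp) / len(ref)


# Illustrative per-frame predictions for the word "cat" (made-up data).
frames = ["_", "K", "K", "_", "AE", "AE", "_", "T", "T"]
predicted = ctc_greedy_decode(frames)
print(predicted)
print(phoneme_error_rate(["K", "AE", "T"], predicted))
```

The blank label is what lets CTC represent a genuinely repeated phoneme: two identical labels separated by a blank survive the collapse step, while two adjacent identical labels are merged into one.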