Sinhala audio transcription - TensorFlow audio processing

Project information

  • Category: Machine learning
  • Client: University of Kelaniya
  • Project URL: project link

The goal of the project was to create an API endpoint that takes in a WAV audio file and transcribes it.
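As a rough illustration of such an endpoint, here is a minimal sketch using only Python's standard library. The project description does not name the web framework or the request format, so the raw-WAV POST body, the JSON response shape, and the names build_response, make_handler, and serve are all assumptions made for this example. The transcribe callable stands in for the trained model.

```python
import io
import json
import wave
from http.server import BaseHTTPRequestHandler, HTTPServer


def build_response(wav_bytes, transcribe):
    """Validate an uploaded WAV body and return (status_code, json_text).

    `transcribe` is a hypothetical callable (bytes -> str) wrapping the model.
    """
    try:
        with wave.open(io.BytesIO(wav_bytes), "rb") as wf:
            n_frames = wf.getnframes()
    except (wave.Error, EOFError):
        return 400, json.dumps({"error": "body is not a valid WAV file"})
    if n_frames == 0:
        return 400, json.dumps({"error": "WAV file contains no audio"})
    # ensure_ascii=False keeps Sinhala text readable in the JSON output.
    return 200, json.dumps({"transcript": transcribe(wav_bytes)}, ensure_ascii=False)


def make_handler(transcribe):
    """Create a request handler class bound to one transcribe callable."""

    class TranscribeHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            length = int(self.headers.get("Content-Length", 0))
            status, body = build_response(self.rfile.read(length), transcribe)
            payload = body.encode("utf-8")
            self.send_response(status)
            self.send_header("Content-Type", "application/json; charset=utf-8")
            self.send_header("Content-Length", str(len(payload)))
            self.end_headers()
            self.wfile.write(payload)

    return TranscribeHandler


def serve(transcribe, host="127.0.0.1", port=8000):
    """Start the server (blocks until interrupted); host and port are examples."""
    HTTPServer((host, port), make_handler(transcribe)).serve_forever()
```

Keeping the validation in build_response, separate from the HTTP handler, means the request logic can be exercised without starting a server.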

The approach taken to accomplish this task was as follows:

  • 1. Convert the WAV file into a resampled mono 16 kHz format
  • 2. Convert the 16 kHz audio into spectrograms
  • 3. Train a CNN model on a dataset composed of these spectrograms
  • 4. Load the model into an API endpoint to run inference
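Step 1 above can be sketched roughly as follows. This is not the project's actual preprocessing code, which is not shown here. It is a standard-library illustration that assumes 16-bit PCM input, and the function names (read_pcm16, to_mono, resample_linear, load_mono_16k) are made up for the example. Linear interpolation is used for brevity; it applies no anti-aliasing filter, so a production pipeline would more likely rely on a dedicated resampler.

```python
import array
import sys
import wave

TARGET_RATE = 16_000  # 16 kHz, as in step 1


def read_pcm16(path):
    """Read a 16-bit PCM WAV file; return (sample_rate, channels, samples)."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        rate, channels = wf.getframerate(), wf.getnchannels()
        raw = wf.readframes(wf.getnframes())
    pcm = array.array("h")
    pcm.frombytes(raw)
    if sys.byteorder == "big":  # WAV sample data is little-endian
        pcm.byteswap()
    # Scale integers to floats in [-1.0, 1.0).
    return rate, channels, [s / 32768.0 for s in pcm]


def to_mono(samples, channels):
    """Average interleaved channels into a single channel."""
    return [
        sum(samples[i:i + channels]) / channels
        for i in range(0, len(samples) - channels + 1, channels)
    ]


def resample_linear(samples, src_rate, dst_rate=TARGET_RATE):
    """Resample by linear interpolation between neighbouring samples."""
    if src_rate == dst_rate or not samples:
        return list(samples)
    n_out = int(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate  # fractional position in the input
        left = int(pos)
        right = min(left + 1, len(samples) - 1)
        frac = pos - left
        out.append(samples[left] * (1 - frac) + samples[right] * frac)
    return out


def load_mono_16k(path):
    """Load a WAV file and return mono samples at 16 kHz."""
    rate, channels, samples = read_pcm16(path)
    return resample_linear(to_mono(samples, channels), rate)
```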

Here the model was trained on a selected list of Sinhala words, with a batch of audio samples collected for each word.

The audio samples were then converted to spectrograms, which were used to train the CNN model. The model reached an overall validation accuracy of 60%. More training with higher-quality samples would likely lead to better validation accuracy.
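To make the spectrogram step concrete, here is a minimal sketch of a magnitude spectrogram computed with a short-time Fourier transform. The project title suggests TensorFlow's audio utilities were used, but the exact calls and parameters are not given, so this example uses only the standard library with a naive DFT. The frame length of 256 samples, the hop of 128 samples, and the function names are assumptions chosen for illustration; real code would use an FFT routine instead.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def spectrogram(samples, frame_len=256, hop=128):
    """Return a list of frames, each holding frame_len // 2 + 1 magnitudes.

    Each frame is windowed and passed through a naive DFT. Only the
    non-negative frequency bins are kept, since the input is real-valued.
    """
    window = hann(frame_len)
    n_bins = frame_len // 2 + 1
    # Precompute the complex exponentials exp(-2*pi*i*m / frame_len).
    twiddle = [cmath.exp(-2j * math.pi * m / frame_len) for m in range(frame_len)]
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + i] * window[i] for i in range(frame_len)]
        mags = []
        for k in range(n_bins):
            acc = sum(frame[n] * twiddle[(k * n) % frame_len] for n in range(frame_len))
            mags.append(abs(acc))
        frames.append(mags)
    return frames


# Example input: a 1 kHz tone sampled at 16 kHz, which falls in
# bin 1000 / (16000 / 256) = 16 of every frame.
tone = [math.sin(2 * math.pi * 1000 * n / 16000) for n in range(1024)]
spec = spectrogram(tone)
```

The resulting time-by-frequency grid is what a CNN can treat as a single-channel image, which is why spectrograms are a common input for word classifiers of this kind.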