Modern automatic speech recognition (ASR) systems are mostly data driven. Acoustic models based on deep neural networks are highly effective, but they are opaque and require massive amounts of training data. Phonologists and linguists, on the other hand, have elaborate theories modelling the nature and behaviour of languages around the world, but have had limited success in contributing to mainstream ASR technology. Phonologists describe speech in terms of phonological features, which are grounded in articulatory as well as perceptual properties. After a brief overview of the universal nature of these features, this talk will describe some ways in which they can contribute to modern ASR technology. Because phonological features are sub-phonetic elements shared across languages, they can serve as a useful means of cross-language model transfer for under-resourced languages, i.e., languages with limited training data. Moreover, the well-organised structure of these features makes them valuable in computer-aided learning of non-native languages -- both for evaluating learners and for providing them with corrective feedback.
Dr. Vipul Arora is a post-doctoral researcher in the Language and Brain Laboratory at the Faculty of Linguistics, University of Oxford (UK). He works on automatic speech recognition (ASR), focusing on the use of phonological principles to enhance ASR systems. He received his B.Tech. and Ph.D. degrees in electrical engineering from the Indian Institute of Technology (IIT) Kanpur, in 2009 and 2015, respectively. His Ph.D. research was on automatic transcription of polyphonic music.