Human-Quality Kannada TTS using Transfer Learning on Tacotron2 and WaveGlow [A. G. Ramakrishnan, EE]

RaGaVeRa Indic Technologies, a startup conceptualised at IISc by Ramakrishnan A G and Shiva Kumar H R and incubated by the Society for Innovation and Development (SID), has developed the best-quality text-to-speech (TTS) system for Kannada to date, trained on about 44.8 hours of studio recordings of a Kannada teacher with good diction. Transfer learning is used: training continues from Tacotron2 and WaveGlow checkpoints pre-trained on English. Thirty-five native Kannada speakers gave the synthesized speech a mean opinion score (MOS) of 4.51, whereas the speaker’s original speech received a MOS of 4.62. In another evaluation, five sentences synthesized by RaGaVeRa’s, Google’s WaveNet, and Nuance’s TTS were presented anonymously in random order, and listeners were asked to choose their preferred output. Based on 55 human evaluators, RaGaVeRa’s Kannada TTS obtained a mean preference score of 78.2%, whereas Google’s and Nuance’s TTS got scores of 13.1% and 5.1%, respectively.
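
For readers unfamiliar with the two listening-test metrics, the following is a minimal sketch of how such scores are commonly computed. It assumes that MOS is the arithmetic mean of ratings on a 1-5 scale and that a preference score is each system's share of listener votes; the exact aggregation used in the study is not described here. The function names and the toy data are hypothetical and are not the study's data.

```python
from collections import Counter
from statistics import mean


def mean_opinion_score(ratings):
    """Average a list of 1-5 listener ratings into a MOS (assumed definition)."""
    if not ratings:
        raise ValueError("need at least one rating")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie on the 1-5 scale")
    return mean(ratings)


def preference_shares(votes):
    """Return each system's share of the votes as a percentage."""
    if not votes:
        raise ValueError("need at least one vote")
    counts = Counter(votes)
    total = len(votes)
    return {system: 100.0 * n / total for system, n in counts.items()}


# Toy, made-up data purely for illustration (NOT the study's ratings or votes).
toy_ratings = [5, 4, 5, 4, 5]
toy_votes = ["A", "A", "B", "A", "C"]

print(mean_opinion_score(toy_ratings))
print(preference_shares(toy_votes))
```

A "mean" preference score, as reported above, may instead be averaged per sentence or per listener; the sketch pools all votes only for simplicity.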

Reference:

Anil Kumar K K, Shiva Kumar H R, Ramakrishnan A G, and Jnanesh K P, “Efficient Human-Quality Kannada TTS using Transfer Learning on NVIDIA’s Tacotron2,” International Conference on Electronics, Computing and Communication Technologies (IEEE CONECCT), 2021.

Samples of synthesized speech are available at https://www.ragavera.com/tts/sg-kan-samples
