SFNet: A Computationally Efficient Source Filter Model Based Neural Speech Synthesis [Prasanta Kumar Ghosh, EE]

A reduced-complexity speech synthesizer is developed by reformulating the source-filter model of speech, in which the excitation signal is modeled as the sum of a pitch-dependent impulse train and colored noise. The parameters of the reformulated source-filter model are predicted by a neural network, referred to as SFNet. The network is trained by minimizing the L1 error between the log Mel-spectrum of the predicted waveform and that of the ground-truth waveform. We demonstrate a significant reduction in memory and computational complexity compared to the state-of-the-art speaker-independent neural speech synthesizer, without any loss in the naturalness of the synthesized speech.
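
To make the excitation model concrete, the following is a minimal NumPy sketch of a source-filter synthesizer whose excitation is a pitch-dependent impulse train plus colored noise. It is an illustration only and not the SFNet implementation: the function names, the FIR filter coefficients, and the noise gain are all hypothetical, hand-picked values, whereas in SFNet the model parameters are predicted by the neural network.

```python
import numpy as np


def impulse_train(f0, sr):
    """Unit impulses spaced by the pitch period.

    f0 is a per-sample pitch contour in Hz; 0 marks unvoiced samples.
    """
    phase = np.cumsum(f0 / sr)  # accumulated pitch cycles
    # An impulse fires whenever the accumulated phase passes an integer.
    crossings = np.diff(np.floor(phase), prepend=0.0) > 0
    return crossings.astype(np.float64)


def colored_noise(n, noise_fir, rng):
    """White Gaussian noise shaped by an FIR filter (assumed coloring)."""
    white = rng.standard_normal(n)
    return np.convolve(white, noise_fir, mode="full")[:n]


def synthesize(f0, sr, noise_fir, vocal_fir, noise_gain=0.1, seed=0):
    """Excitation = impulse train + colored noise, then a vocal-tract filter.

    noise_fir, vocal_fir and noise_gain are placeholders for the
    quantities a trained network would supply.
    """
    rng = np.random.default_rng(seed)
    n = len(f0)
    excitation = impulse_train(f0, sr) + noise_gain * colored_noise(n, noise_fir, rng)
    return np.convolve(excitation, vocal_fir, mode="full")[:n]


# Example: one second of a steady 100 Hz voiced sound at 16 kHz.
sr = 16000
f0 = np.full(sr, 100.0)
waveform = synthesize(
    f0,
    sr,
    noise_fir=np.array([1.0, -0.9]),        # hypothetical high-pass coloring
    vocal_fir=np.array([1.0, 0.5, 0.25]),   # hypothetical vocal-tract response
)
```

Using fixed FIR filters keeps the sketch short; a time-varying filter updated frame by frame would be closer to how speech is actually produced.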

