Unsupervised Learning of Disentangled Speech Content and Style Representation


Andros Tjandra¹, Ruoming Pang², Yu Zhang², Shigeki Karita²
¹NAIST, ²Google LLC


Description:
Content X: original input speech fed to the content encoder
Style Y: original input speech fed to the style encoder
Generated (content X, style Y): speech synthesized by the decoder from the content encoder output for X combined with the style encoder output for Y
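
The data flow described above can be illustrated with a minimal sketch. It is not the authors' model: the weights are random and untrained, and the names (`content_encoder`, `style_encoder`, `decoder`, `generate`), the feature sizes, and the choice of a single utterance-level style vector are all assumptions made for illustration. It only shows how a per-frame content representation of X might be combined with a style representation of Y before decoding.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 80-dim input features, 16-dim content, 8-dim style.
N_FEATS, D_CONTENT, D_STYLE = 80, 16, 8

# Random, untrained weights standing in for learned parameters (assumption).
W_content = rng.standard_normal((N_FEATS, D_CONTENT)) * 0.1
W_style = rng.standard_normal((N_FEATS, D_STYLE)) * 0.1
W_dec = rng.standard_normal((D_CONTENT + D_STYLE, N_FEATS)) * 0.1


def content_encoder(feats):
    """Map (frames, N_FEATS) features to one content vector per frame."""
    return np.tanh(feats @ W_content)


def style_encoder(feats):
    """Map (frames, N_FEATS) features to a single utterance-level style vector."""
    return np.tanh(feats @ W_style).mean(axis=0)


def decoder(content, style):
    """Repeat the style vector for every frame, concatenate, and project back."""
    style_tiled = np.tile(style, (content.shape[0], 1))
    return np.concatenate([content, style_tiled], axis=1) @ W_dec


def generate(x_feats, y_feats):
    """Content comes from X, style comes from Y."""
    return decoder(content_encoder(x_feats), style_encoder(y_feats))


x = rng.standard_normal((120, N_FEATS))  # "Content X" utterance, 120 frames
y = rng.standard_normal((95, N_FEATS))   # "Style Y" utterance, 95 frames
out = generate(x, y)
print(out.shape)  # (120, 80)
```

In this sketch the output has as many frames as X, because the per-frame content sequence sets the length while Y contributes only a fixed-size vector.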

Tested on Google Chrome
ID Audio