Unsupervised Learning of Disentangled Speech Content and Style Representation

Andros Tjandra¹, Ruoming Pang², Yu Zhang², Shigeki Karita²
¹NAIST, ²Google LLC

Description:
Content X: the original input speech X, fed to the content encoder
Style Y: the original input speech Y, fed to the style encoder
Generated content X, style Y: speech synthesized by the decoder from the content encoder output for X combined with the style encoder output for Y

Tested on Google Chrome

Audio
