Description:

- Content X: the original input speech fed to the content encoder.
- Style Y: the original input speech fed to the style encoder.
- Generated content X, style Y: speech synthesized by the decoder, which combines the content encoder output for X with the style encoder output for Y.
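The data flow behind these three labels can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the function names, the representation of speech as a list of per-frame feature vectors, and the simple mean-based stand-ins for the encoders and decoder. The actual models behind these samples are neural networks and are not described on this page.

```python
from typing import List

Frames = List[List[float]]  # hypothetical format: one feature vector per time frame


def _frame_mean(speech: Frames) -> List[float]:
    """Average each feature dimension over all frames."""
    dim = len(speech[0])
    return [sum(frame[d] for frame in speech) / len(speech) for d in range(dim)]


def content_encoder(speech: Frames) -> Frames:
    """Toy stand-in: per-frame features with the utterance mean removed."""
    mean = _frame_mean(speech)
    return [[value - m for value, m in zip(frame, mean)] for frame in speech]


def style_encoder(speech: Frames) -> List[float]:
    """Toy stand-in: a single utterance-level vector (the mean over frames)."""
    return _frame_mean(speech)


def decoder(content: Frames, style: List[float]) -> Frames:
    """Toy stand-in: add the style vector to every content frame."""
    return [[c + s for c, s in zip(frame, style)] for frame in content]


def convert(x: Frames, y: Frames) -> Frames:
    """Generated content X, style Y: content from x, style from y."""
    return decoder(content_encoder(x), style_encoder(y))


x = [[1.0, 2.0], [3.0, 4.0]]      # content utterance X
y = [[10.0, 10.0], [20.0, 30.0]]  # style utterance Y
generated = convert(x, y)
```

In this sketch the output keeps the frame count of X, since only the content path is time-varying, while Y contributes a single utterance-level vector. Feeding the same utterance to both encoders reconstructs it unchanged.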