Description:

- Content X: the original input speech fed to the content encoder.
- Style Y: the original input speech fed to the style encoder.
- Generated (content X, style Y): speech synthesized by the decoder, which combines the content encoder output for X with the style encoder output for Y.
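The data flow described above can be sketched as a toy pipeline. This is only an illustration under loud assumptions: the function names, the framing, and the use of peak amplitude as "style" are all hypothetical stand-ins, not the actual model, which would use learned neural encoders and a learned decoder.

```python
from typing import List

Frame = List[float]


def content_encoder(speech_x: List[float], frame_size: int = 4) -> List[Frame]:
    """Hypothetical stand-in: peak-normalize X and split it into frames.

    Normalizing removes loudness, so only the time-varying shape remains.
    """
    peak = max(abs(s) for s in speech_x) or 1.0  # avoid dividing by zero
    normalized = [s / peak for s in speech_x]
    last_start = len(normalized) - frame_size + 1
    return [normalized[i:i + frame_size] for i in range(0, last_start, frame_size)]


def style_encoder(speech_y: List[float]) -> List[float]:
    """Hypothetical stand-in: summarize Y as one global vector (its peak level)."""
    return [max(abs(s) for s in speech_y)]


def decoder(content: List[Frame], style: List[float]) -> List[float]:
    """Hypothetical stand-in: apply the global style to every content frame."""
    gain = style[0]
    return [sample * gain for frame in content for sample in frame]


def convert(speech_x: List[float], speech_y: List[float]) -> List[float]:
    """Generated (content X, style Y) = decoder(content(X), style(Y))."""
    return decoder(content_encoder(speech_x), style_encoder(speech_y))


x = [0.5, -1.0, 0.25, 0.0, 1.0, -0.5, 0.0, 0.25]  # toy "content" waveform
y = [0.1, -0.2, 0.05]                              # toy "style" waveform
generated = convert(x, y)
```

In this toy, `generated` keeps the sample-by-sample shape of `x` but takes its overall level from `y`, which mirrors the role split between the two encoders: per-frame information comes from X, and a single global vector comes from Y.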