This page presents audio demos of ControlVC, our proposed controllable voice conversion algorithm, alongside several baselines. ControlVC lets users impose fine-grained temporal controls on the pitch and speed of the converted utterance. The converted utterance preserves the linguistic content of the source utterance, mimics the timbre of the target speaker, and sounds natural while following the user-specified pitch and/or speed controls.

0. Links

Paper:

ControlVC: Zero-Shot Voice Conversion with Time-Varying Controls...

Code:

https://github.com/MelissaChen15/control-vc


1. Free Conversion

We first perform free, non-parallel, many-to-many voice conversion without user controls, to assess linguistic content preservation and target timbre mimicry.

This is the traditional voice conversion setting, in which the converted utterance is expected to have the same linguistic content as the source utterance, but with a timbre and speaking style similar to those of the target speaker.

Notation