This page presents audio demos of ControlVC, our proposed controllable voice conversion algorithm, compared with several baselines. ControlVC lets users impose fine-grained, time-varying controls on the pitch and speed of the converted utterance. The converted utterance preserves the linguistic content of the source utterance, mimics the timbre of the target speaker, and sounds natural while following the user-specified pitch and/or speed controls.
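
To make "time-varying controls" concrete, the sketch below shows one way such controls could be represented: a per-frame pitch shift in semitones applied to an F0 contour, and a per-frame speed factor turned into a frame-index mapping. This is only an illustration under our own assumptions, not the ControlVC implementation; the function names, the frame-level representation, and the convention that 0 Hz marks an unvoiced frame are all hypothetical.

```python
def apply_pitch_control(f0_hz, shift_semitones):
    """Scale each voiced frame's F0 by its own semitone shift.

    Assumption for this sketch: unvoiced frames are marked with 0 Hz.
    """
    if len(f0_hz) != len(shift_semitones):
        raise ValueError("need exactly one shift value per frame")
    out = []
    for f0, shift in zip(f0_hz, shift_semitones):
        # A shift of +12 semitones doubles the frequency; unvoiced stays unvoiced.
        out.append(0.0 if f0 <= 0 else f0 * 2.0 ** (shift / 12.0))
    return out


def warp_frame_indices(speed):
    """Turn a per-frame speed curve into source-frame indices for the output.

    Source frame i occupies 1 / speed[i] output frames, so speed > 1 shortens
    that region and speed < 1 stretches it.
    """
    indices = []
    frame_end = 0.0  # end of the current source frame on the output timeline
    k = 0            # index of the next output frame to fill
    for i, s in enumerate(speed):
        if s <= 0:
            raise ValueError("speed factors must be positive")
        frame_end += 1.0 / s
        while k < frame_end - 1e-9:
            indices.append(i)
            k += 1
    return indices
```

For example, a speed curve of 0.5 on every frame repeats each source frame twice (half speed), while a curve of 2.0 keeps every other frame (double speed). Real systems interpolate rather than repeat or drop frames; the nearest-frame mapping here just keeps the sketch short.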

0. Links

Paper:

ControlVC: Zero-Shot Voice Conversion with Time-Varying Controls...

Code:

https://github.com/MelissaChen15/control-vc

1. Free Conversion

We first perform free, non-parallel, many-to-many voice conversion without user controls, to assess how well each system preserves linguistic content and mimics the target timbre.

This is the traditional setting of voice conversion research: the converted utterance is expected to have the same linguistic content as the source utterance, but with a timbre and speaking style similar to those of the target speaker.

Notation

Source: the recorded source speech utterance
Target: the recorded target speech utterance
ControlVC: our proposed controllable voice conversion system
PSOLA-LPC: a controllable voice conversion system using digital signal processing (DSP) techniques, including pitch synchronous overlap and add (PSOLA) and linear predictive coding (LPC). [PSOLA code can be found here: https://github.com/maxrmorrison/psola] [refer to ch. 3 of this book for LPC]
PSOLA-AutoVC: a controllable voice conversion system using PSOLA to achieve pitch and speed control and using AutoVC to perform the conversion. [AutoVC paper and code can be found here: https://github.com/auspicious3000/autovc]
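
For readers unfamiliar with LPC, the sketch below estimates linear prediction coefficients for a single frame using the autocorrelation method and the Levinson-Durbin recursion. It is a minimal illustration of the general technique, not the code used in the PSOLA-LPC baseline; the function names are our own, and windowing and pre-emphasis are omitted for brevity.

```python
def autocorrelation(frame, max_lag):
    """Biased autocorrelation r[0..max_lag] of one frame of samples."""
    n = len(frame)
    return [
        sum(frame[t] * frame[t + lag] for t in range(n - lag))
        for lag in range(max_lag + 1)
    ]


def lpc(frame, order):
    """Return (a, err) from the Levinson-Durbin recursion.

    a[0] is 1.0 and the predictor is x[t] ~= -sum(a[k] * x[t - k], k = 1..order).
    err is the residual prediction error energy.
    """
    r = autocorrelation(frame, order)
    if r[0] == 0:
        raise ValueError("cannot fit LPC to an all-zero frame")
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i.
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err
```

As a sanity check, a decaying exponential such as x[t] = 0.5^t is a first-order process, so fitting it with order 2 should give a[1] close to -0.5 and a[2] close to 0. The coefficients describe the spectral envelope of the frame, which is the part of the signal most closely tied to timbre.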