This post recommends MockingBird, a Python-based AI voice cloning project.
MockingBird can clone a voice from just 5 seconds of audio. The timbre of the output closely resembles the original speaker, it can synthesize sounds and consonants that do not exist in the original audio sample, and it supports generating arbitrary speech content.
MockingBird Features:
Chinese: supports Mandarin and has been tested with multiple Chinese datasets: aidatatang_200zh, magicdata, aishell3, biaobei, MozillaCommonVoice, data_aishell, etc.
PyTorch: runs on PyTorch, tested on version 1.9.0 (the latest as of August 2021) with Tesla T4 and GTX 2060 GPUs.
Windows + Linux: runs on both Windows and Linux operating systems (community members have also reported success on Apple Silicon macOS).
Easy & Awesome: produces good results with only a newly trained synthesizer, by reusing the pretrained encoder/vocoder, or by using real-time HiFi-GAN as the vocoder.
Webserver Ready: can serve your training results over a web server for remote invocation.
How to use:
1. Installation
Install PyTorch.
Install ffmpeg.
Run pip install -r requirements.txt to install the necessary packages.
Install webrtcvad: pip install webrtcvad-wheels.
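For reference, a complete setup might look like the following (a sketch: the repository URL and the apt-based ffmpeg install are assumptions; pick the PyTorch build for your platform from pytorch.org):

    git clone https://github.com/babysor/MockingBird.git
    cd MockingBird
    pip install torch                # choose the build matching your platform/CUDA
    sudo apt install ffmpeg          # Debian/Ubuntu; use your platform's equivalent
    pip install -r requirements.txt
    pip install webrtcvad-wheels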
2. Prepare the pre-training model
2.1 Train the synthesizer model yourself on a dataset (choose either this or 2.2)
Download a dataset and extract it, making sure you can access all the audio files (e.g., .wav) in the train folder.
Preprocess the audio and mel spectrograms: python pre.py <datasets_root> -d {dataset} -n {number}. The following parameters can be passed:
-d {dataset} specifies the dataset. aidatatang_200zh, magicdata, aishell3, and data_aishell are supported; the default is aidatatang_200zh.
-n {number} specifies the number of parallel processes. On an i7-11700K CPU with 32 GB RAM, a value of 10 was measured to work without problems.
If you downloaded aidatatang_200zh to drive D and the train folder path is D:\data\aidatatang_200zh\corpus\train, then your <datasets_root> is D:\data\.
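For example, with the directory layout above, preprocessing aidatatang_200zh with 10 parallel processes would look like this (a sketch; use forward slashes on Linux):

    python pre.py D:\data\ -d aidatatang_200zh -n 10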
Train synthesizer: python synthesizer_train.py mandarin <datasets_root>/SV2TTS/synthesizer
When the attention plot appears in the training output folder synthesizer/saved_models/ and the loss drops to a level that meets your needs, move on to the launch step.
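Continuing the D:\data\ example from above, a concrete training invocation would be (mandarin is the run name from the command above):

    python synthesizer_train.py mandarin D:\data\SV2TTS\synthesizer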
2.2 Use a community pre-trained synthesizer (choose either this or 2.1)
Please refer to the link at the end of this article
2.3 Train the vocoder (optional)
Preprocess the data: python vocoder_preprocess.py <datasets_root> -m <synthesizer_model_path>
Replace <datasets_root> with your dataset root directory and <synthesizer_model_path> with the directory of your best synthesizer model, such as synthesizer\saved_models\xxx.
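As an illustration (the checkpoint directory name mandarin here is hypothetical; substitute your own best synthesizer model directory):

    python vocoder_preprocess.py D:\data\ -m synthesizer\saved_models\mandarin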
Train the WaveRNN vocoder: python vocoder_train.py <trainid> <datasets_root>
Train the HiFi-GAN vocoder: python vocoder_train.py <trainid> <datasets_root> hifigan
In both commands, replace <trainid> with an identifier of your choice; retraining with the same identifier resumes from the existing model.
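For example, with a hypothetical identifier my_run and the same dataset root:

    python vocoder_train.py my_run D:\data\
    python vocoder_train.py my_run D:\data\ hifigan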
3. Start the program or toolbox
3.1 Start the Web program:
Run python web.py and, once it starts successfully, open the address in a browser; the default is http://localhost:8080.
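Once the server is running, a minimal Python sanity check that the page is being served might look like this (it assumes the default address above and the third-party requests package; see the repository for the actual API):

    import requests  # pip install requests

    # Fetch the web UI served by web.py; assumes the default port 8080.
    resp = requests.get("http://localhost:8080")
    print(resp.status_code)  # expect 200 while the server is up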
3.2 Start the toolbox:
python demo_toolbox.py -d <datasets_root>
Pass the path of an available dataset directory. If supported datasets exist there, they are loaded automatically for debugging, and the directory also serves as the storage location for manually recorded audio.
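For example, using the dataset root from section 2.1:

    python demo_toolbox.py -d D:\data\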
You can explore the rest of the project on your own.