To leverage crowd-sourced data for training multi-speaker text-to-speech (TTS) models that synthesize clean speech for every speaker, it is essential to learn disentangled representations that independently control the speaker identity and the background noise in generated signals. Learning such representations is challenging, however. Training examples lack labels describing their recording conditions, and speakers and recording conditions are often correlated, e.g., because users frequently make many recordings with the same equipment. This paper addresses the problem with three components: (1) a conditional generative model with factorized latent variables, (2) data augmentation that adds noise which is uncorrelated with speaker identity and whose label is known during training, and (3) adversarial factorization to improve disentanglement. Experimental results demonstrate that the proposed method can disentangle speaker and noise attributes even when they are correlated in the training data, and that it can consistently synthesize clean speech for all speakers. Ablation studies verify the importance of each proposed component.
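The data-augmentation component (2) can be illustrated with a minimal sketch. It assumes that augmentation means mixing an additive noise clip into a training utterance at a randomly drawn signal-to-noise ratio, and that the "known label" is a binary flag marking whether augmentation noise was added. The names `mix_at_snr`, `augment`, `p_augment` and `snr_range` are hypothetical and do not come from the paper. This is not the authors' implementation. Because the flag is drawn independently of the speaker, it is uncorrelated with speaker identity by construction, which is the property the abstract relies on.

```python
import math
import random


def rms(signal):
    """Root-mean-square amplitude of a list of samples."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def mix_at_snr(speech, noise, snr_db):
    """Add `noise` to `speech`, scaled so the mixture has the requested SNR (dB).

    The noise clip is tiled or truncated to the length of the speech.
    """
    if not speech or not noise:
        raise ValueError("speech and noise must be non-empty")
    noise = [noise[i % len(noise)] for i in range(len(speech))]
    noise_rms = rms(noise)
    if noise_rms == 0:
        raise ValueError("noise must not be all zeros")
    # SNR_dB = 20 * log10(speech_rms / (gain * noise_rms)), solved for gain.
    gain = rms(speech) / (noise_rms * 10 ** (snr_db / 20))
    return [s + gain * n for s, n in zip(speech, noise)]


def augment(example, noise_bank, rng, p_augment=0.5, snr_range=(5.0, 20.0)):
    """Hypothetical augmentation step returning (audio, speaker_id, noise_label).

    noise_label is 1 when augmentation noise was added and 0 otherwise.
    The coin flip ignores speaker_id, so the label is independent of the speaker.
    """
    audio, speaker_id = example
    if rng.random() < p_augment:
        noise = rng.choice(noise_bank)
        snr_db = rng.uniform(*snr_range)
        return mix_at_snr(audio, noise, snr_db), speaker_id, 1
    # Unaugmented example: copy the audio so callers can't mutate the original.
    return list(audio), speaker_id, 0
```

In a training loop, the returned `noise_label` would serve as supervision for the noise-related latent variable, while `speaker_id` conditions the speaker branch. Assuming the model otherwise matches the description above, the adversarial factorization term (3) would then penalize speaker information that leaks into the noise representation, and vice versa.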


SP108-Disentangling Correlated Speaker and Noise for Speech Synthesis via Data Augmentation and Adversarial Factorization


Company

Home

About Us

Shop

Contact Us

Projects

Software

Hardware

Mini Projects

Mechanical

Policy

Terms & Conditions

Privacy Policy

Refund & Cancellation policy

Shipping & Delivery Policy

Address

F16, Sai Kripa Mall, L.T. Road, Dahisar Railway Station West, Mumbai, Maharashtra 400 068.

Powered by © Tech Cryptors ( TCIS ) 2024

Privacy Policy

Terms & Conditions

Related products

SP101-Deep Learning Based Phase Reconstruction for Speaker Separation: A Trigonometric Perspective

SP118-Comparison of Machine Learning Methods for Breast Cancer Diagnosis

SP123-Artificial Intelligence in Future Evolution of Mobile Communication
