Name: SP113-Phonemic-level Duration Control Using Attention Alignment for Natural Speech Synthesis - Projects Lelo
Price: 12000.00 INR
Availability: InStock

Recent attention-based end-to-end text-to-speech (TTS) systems have achieved human-level performance. However, many of these approaches cause the sequence-to-sequence model to generate only averaged output for a given input text, which makes it difficult to control the duration of an utterance. To solve this problem, we present a novel mechanism for phonemic-level duration control (PDC) that works in a nearly end-to-end manner. We use a teacher attention alignment generated by an annotation speech analyzer program. Our method is inspired by the observation that the duration of a phoneme is closely related to its phonemic features. These phonemic features are preserved in the attention alignment by adding a duration embedding to it, which enables the model to learn and control the phonemic and rhythmic features of speech. We also show that providing alignment information as a teacher loss term speeds up training. Notably, a subjective evaluation shows that it also makes the model better at controlling speech rate under dramatic changes in phonemic-level duration. As a result, our PDC speech synthesis with alignment loss outperforms other baseline methods and converges faster, without losing the ability to control phoneme durations even under extreme adjustments.
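The abstract gives no equations or code, so the following is only a minimal Python sketch, under our own assumptions, of two ideas it names: a teacher alignment built from per-phoneme durations and compared against the model's attention as an extra loss term, and phoneme-level duration control by rescaling those durations. The function names, the hard one-hot alignment, frame-count durations, and the mean-absolute-error (L1) loss are hypothetical choices, not the authors' published method. The duration embedding itself is not modeled.

```python
from typing import List, Sequence

# Hypothetical type: rows are decoder frames, columns are phonemes.
Matrix = List[List[float]]


def teacher_alignment(durations: Sequence[int]) -> Matrix:
    """Build a hard (one-hot) alignment: frame t attends only to its phoneme.

    Assumption: durations[i] is the number of decoder frames for phoneme i,
    e.g. as read from an external annotation / forced-alignment tool.
    """
    n_phonemes = len(durations)
    rows: Matrix = []
    for i, d in enumerate(durations):
        for _ in range(d):
            row = [0.0] * n_phonemes
            row[i] = 1.0
            rows.append(row)
    return rows


def scale_durations(durations: Sequence[int], factors: Sequence[float]) -> List[int]:
    """Hypothetical phoneme-level control: stretch or shrink each phoneme.

    Each scaled duration is rounded with Python's round() and kept at
    least one frame long so no phoneme disappears.
    """
    return [max(1, round(d * f)) for d, f in zip(durations, factors)]


def alignment_loss(predicted: Matrix, teacher: Matrix) -> float:
    """Mean absolute error between predicted attention weights and the teacher alignment."""
    if len(predicted) != len(teacher):
        raise ValueError("frame counts differ")
    total, count = 0.0, 0
    for p_row, t_row in zip(predicted, teacher):
        if len(p_row) != len(t_row):
            raise ValueError("phoneme counts differ")
        for p, t in zip(p_row, t_row):
            total += abs(p - t)
            count += 1
    return total / count


if __name__ == "__main__":
    durations = [2, 3, 1]                      # frames per phoneme (made-up values)
    teacher = teacher_alignment(durations)     # 6 frames x 3 phonemes
    uniform = [[1 / 3] * 3 for _ in teacher]   # an untrained, uniform attention
    print(alignment_loss(uniform, teacher))    # positive: attention is far from teacher
    print(scale_durations(durations, [1.0, 2.0, 1.0]))  # prints [2, 6, 1]
```

In a real training loop this loss term would be added to the usual spectrogram reconstruction loss, and at inference the rescaled durations would be turned back into an alignment to drive the decoder. Both steps are assumptions about how such a system could be wired, not details confirmed by the abstract.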

Company

Home

About Us

Shop

Projects

Software

Hardware

Mini Projects

Mechanical

Policy

Terms & Conditions

Privacy Policy

Refund & Cancellation policy

Shipping & Delivery Policy

Address

Saikrupa Mall, Dahisar Railway Station, F16, Lokmanya Tilak Rd, West, Dahisar East, Mumbai, Maharashtra 400068

Contact Us To Know More +918356839486

Copyright © ProjectsLelo 2025

Designed & Developed by Tech Cryptors IT Services

Related products

SP109-A Fully Convolutional Neural Network for Complex Spectrogram Processing in Speech Enhancement

SP122-Fully Supervised Speaker Diarization

SP106-DRL360: 360-degree Video Streaming with Deep Reinforcement Learning
