UNIT-DSR: DYSARTHRIC SPEECH RECONSTRUCTION SYSTEM USING SPEECH UNIT NORMALIZATION

Authors: Yuejiao Wang, Xixin Wu, Disong Wang, Lingwei Meng, Helen Meng

System comparison

Original: Original dysarthric speech.

Reference: Healthy reference speech of control speaker CF02.

ASR-TTS: Cascaded DSR pipeline with a HuBERT-CTC ASR model and a Tacotron 2 TTS model, followed by a HiFi-GAN vocoder.

E2E-DSR: End-to-end voice conversion via cross-modal knowledge distillation for dysarthric speech reconstruction.

ASA-DSR: Speaker identity preservation in dysarthric speech reconstruction by adversarial speaker adaptation, using an ASR model fine-tuned on dysarthric speech as the content encoder.

Unit-DSR (proposed): A speech unit-based dysarthric speech reconstruction system, which is efficiently fine-tuned from a pre-trained HuBERT backbone using a multi-stage strategy. A unit HiFi-GAN vocoder is utilized for speech generation.

Diagram and example of Unit-DSR system

Fig. 1. (a) Diagram of the Unit-DSR system; (b) An example of original speech units of different speakers uttering 'bath', and the reconstructed norm units from the speech unit normalizer, which have a high correspondence with the reference speech units.
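The correspondence between unit sequences described in Fig. 1(b) can be made concrete with a small sketch. It collapses repeated units and measures the edit distance between a unit sequence and the reference units. The unit IDs below are invented for illustration, and the `unit_error_rate` helper is a hypothetical name; this is not necessarily the measure used by the authors.

```python
from itertools import groupby


def collapse_repeats(units):
    """Merge runs of identical consecutive unit IDs into a single unit."""
    return [unit for unit, _ in groupby(units)]


def edit_distance(a, b):
    """Levenshtein distance between two sequences of unit IDs."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,           # delete x
                            curr[j - 1] + 1,       # insert y
                            prev[j - 1] + cost))   # substitute or match
        prev = curr
    return prev[-1]


def unit_error_rate(hypothesis, reference):
    """Edit distance between collapsed sequences, divided by reference length."""
    hyp = collapse_repeats(hypothesis)
    ref = collapse_repeats(reference)
    return edit_distance(hyp, ref) / max(len(ref), 1)


# Hypothetical unit IDs, invented for illustration only.
reference_units = [12, 12, 40, 40, 40, 7, 7, 93]
dysarthric_units = [12, 55, 55, 40, 7, 7, 7, 18, 93]
normalized_units = [12, 12, 40, 7, 7, 93, 93]

print(unit_error_rate(dysarthric_units, reference_units))
print(unit_error_rate(normalized_units, reference_units))
```

A lower value means the sequence is closer to the reference units after differences in unit duration are ignored.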

Dysarthric speech reconstruction for different speakers

Four dysarthric speakers with different levels of speech intelligibility are used in the experiments: M05 (mid), F04 (mid), M07 (low), and F02 (low). 'F' and 'M' denote female and male, respectively.

By replacing the target speaker code, the proposed multi-speaker unit HiFi-GAN vocoder can generate high-quality speech with different speaker identities. On this demo page, the target speaker of the unit vocoder is CF02.
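As a rough illustration of what replacing the target speaker code means, the toy sketch below assumes the vocoder input for each frame is a unit embedding concatenated with a speaker embedding. The page does not describe the vocoder internals, so this conditioning scheme, the embedding sizes, the random tables, and the speaker label "OTHER" are all assumptions made for the example.

```python
import random

random.seed(0)

UNIT_DIM, SPEAKER_DIM = 4, 2  # toy sizes, chosen arbitrarily


def make_table(keys, dim):
    """Build a lookup table of random embeddings (stand-in for learned ones)."""
    return {key: [random.uniform(-1.0, 1.0) for _ in range(dim)] for key in keys}


unit_table = make_table(range(100), UNIT_DIM)
# "CF02" is the target speaker named on this page; "OTHER" is hypothetical.
speaker_table = make_table(["CF02", "OTHER"], SPEAKER_DIM)


def vocoder_inputs(units, speaker):
    """Concatenate each unit embedding with the chosen speaker embedding."""
    speaker_code = speaker_table[speaker]
    return [unit_table[unit] + speaker_code for unit in units]


units = [12, 40, 7, 93]  # hypothetical normalized units
frames_cf02 = vocoder_inputs(units, "CF02")
frames_other = vocoder_inputs(units, "OTHER")
```

Under this assumption, swapping the speaker label changes only the last `SPEAKER_DIM` values of each frame, while the content part derived from the units stays the same.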

M05 (intelligibility-mid):

No.	Original	Reference	ASR-TTS	E2E-DSR	ASA-DSR	Unit-DSR	Text
1							advantageous
2							backspace
3							chair
4							command
5							delete
6							downward
7							escape
8							hotel
9							rabbit
10							sentence
11							unusual
12							upward
13							watches
14							x-ray
15							yankee

F04 (intelligibility-mid):

No.	Original	Reference	ASR-TTS	E2E-DSR	ASA-DSR	Unit-DSR	Text
1							advantageous
2							backspace
3							chair
4							command
5							delete
6							downward
7							escape
8							hotel
9							rabbit
10							sentence
11							unusual
12							upward
13							watches
14							x-ray
15							yankee

M07 (intelligibility-low):

No.	Original	Reference	ASR-TTS	E2E-DSR	ASA-DSR	Unit-DSR	Text
1							advantageous
2							backspace
3							paragraph
4							command
5							delete
6							downward
7							escape
8							hotel
9							rabbit
10							sentence
11							unusual
12							upward
13							watches
14							x-ray
15							yankee

F02 (intelligibility-low):

No.	Original	Reference	ASR-TTS	E2E-DSR	ASA-DSR	Unit-DSR	Text
1							advantageous
2							backspace
3							paragraph
4							command
5							delete
6							downward
7							escape
8							hotel
9							rabbit
10							sentence
11							unusual
12							upward
13							watches
14							x-ray
15							yankee