UNIT-DSR: dysarthric speech reconstruction system using speech unit normalization

Authors: Yuejiao Wang, Xixin Wu, Disong Wang, Lingwei Meng, Helen Meng


System comparison

  • Original: Original dysarthric speech.
  • Reference: Healthy reference speech of control speaker CF02.
  • ASR-TTS: A cascaded DSR pipeline consisting of a HuBERT-CTC ASR model and a Tacotron 2 TTS model, followed by a HiFi-GAN vocoder.
  • E2E-DSR: End-to-end voice conversion via cross-modal knowledge distillation for dysarthric speech reconstruction.
  • ASA-DSR: Speaker identity preservation in dysarthric speech reconstruction by adversarial speaker adaptation, with an ASR model fine-tuned on dysarthric speech as the content encoder.
  • Unit-DSR (proposed): A speech unit-based dysarthric speech reconstruction system, which is efficiently fine-tuned from a pre-trained HuBERT backbone using a multi-stage strategy. A unit HiFi-GAN vocoder is utilized for speech generation.
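Unit-DSR operates on discrete speech units rather than on text or continuous features. A common way to obtain such units from a self-supervised model like HuBERT is to assign each frame-level feature vector to its nearest k-means centroid and, optionally, merge consecutive duplicate units. The sketch below assumes that scheme. It is not taken from the Unit-DSR implementation, and the toy 2-D features, the centroids, and the function names `nearest_centroid` and `features_to_units` are hypothetical stand-ins for real HuBERT features and a trained codebook.

```python
from itertools import groupby
from typing import List, Sequence


def nearest_centroid(frame: Sequence[float], centroids: Sequence[Sequence[float]]) -> int:
    """Return the index of the centroid closest to `frame` (squared Euclidean distance)."""
    best_idx, best_dist = 0, float("inf")
    for idx, c in enumerate(centroids):
        dist = sum((f - x) ** 2 for f, x in zip(frame, c))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx


def features_to_units(features: Sequence[Sequence[float]],
                      centroids: Sequence[Sequence[float]],
                      dedup: bool = True) -> List[int]:
    """Map each frame to a unit ID; optionally collapse runs of repeated units."""
    units = [nearest_centroid(f, centroids) for f in features]
    if dedup:
        units = [u for u, _ in groupby(units)]
    return units


# Hypothetical toy codebook and frame features (real HuBERT features are high-dimensional).
centroids = [[0.0, 0.0], [1.0, 1.0], [0.0, 2.0]]
features = [[0.1, 0.0], [0.0, 0.2], [0.9, 1.1], [1.0, 0.8], [0.1, 1.9]]
print(features_to_units(features, centroids, dedup=False))  # [0, 0, 1, 1, 2]
print(features_to_units(features, centroids))               # [0, 1, 2]
```

Deduplication removes duration information from the unit sequence, which is one reason unit-based systems can be less sensitive to the abnormal timing of dysarthric speech; whether a given system deduplicates is a design choice.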

  • Diagram and example of the Unit-DSR system

  • Fig. 1. (a) Diagram of the Unit-DSR system; (b) An example of original speech units of different speakers uttering 'bath', and the reconstructed norm units from the speech unit normalizer, which have a high correspondence with the reference speech units.
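One way to put a number on the "high correspondence" between normalized units and reference units in Fig. 1(b) is a unit-level edit distance, normalized by the reference length (analogous to a word error rate). The sketch below assumes that metric; it is not necessarily how the authors evaluate the normalizer, and the unit IDs are hypothetical toy values, not read from the figure.

```python
from typing import Sequence


def edit_distance(hyp: Sequence[int], ref: Sequence[int]) -> int:
    """Levenshtein distance between two unit sequences (all edits cost 1)."""
    prev = list(range(len(ref) + 1))  # distances from an empty hyp prefix
    for i, h in enumerate(hyp, start=1):
        curr = [i] + [0] * len(ref)
        for j, r in enumerate(ref, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete h from hyp
                curr[j - 1] + 1,         # insert r into hyp
                prev[j - 1] + (h != r),  # substitute (cost 1) or match (cost 0)
            )
        prev = curr
    return prev[-1]


def unit_error_rate(hyp: Sequence[int], ref: Sequence[int]) -> float:
    """Edit distance normalized by the reference length."""
    return edit_distance(hyp, ref) / max(len(ref), 1)


# Hypothetical toy unit IDs for one word, not taken from Fig. 1.
reference = [12, 7, 7, 33, 5]
original = [12, 40, 40, 9, 33]   # dysarthric units
normalized = [12, 7, 33, 5]      # units after normalization
print(unit_error_rate(original, reference))    # 0.8
print(unit_error_rate(normalized, reference))  # 0.2
```

A lower rate for the normalized sequence than for the original one is the pattern the figure illustrates qualitatively.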

  • Dysarthric speech reconstruction for different speakers

  • Four dysarthric speakers with different levels of speech intelligibility are used in the experiments: M05 (mid), F04 (mid), M07 (low), and F02 (low). 'F' and 'M' denote female and male, respectively.
  • By replacing the target speaker code, the proposed multi-speaker unit HiFi-GAN vocoder can generate high-quality speech with different speaker identities. On this demo page, the target speaker of the unit vocoder is CF02.
  • M05 (intelligibility-mid):

    No. Original Reference ASR-TTS E2E-DSR ASA-DSR Unit-DSR Text
    1 advantageous
    2 backspace
    3 chair
    4 command
    5 delete
    6 downward
    7 escape
    8 hotel
    9 rabbit
    10 sentence
    11 unusual
    12 upward
    13 watches
    14 x-ray
    15 yankee

  • F04 (intelligibility-mid):

    No. Original Reference ASR-TTS E2E-DSR ASA-DSR Unit-DSR Text
    1 advantageous
    2 backspace
    3 chair
    4 command
    5 delete
    6 downward
    7 escape
    8 hotel
    9 rabbit
    10 sentence
    11 unusual
    12 upward
    13 watches
    14 x-ray
    15 yankee

  • M07 (intelligibility-low):

    No. Original Reference ASR-TTS E2E-DSR ASA-DSR Unit-DSR Text
    1 advantageous
    2 backspace
    3 paragraph
    4 command
    5 delete
    6 downward
    7 escape
    8 hotel
    9 rabbit
    10 sentence
    11 unusual
    12 upward
    13 watches
    14 x-ray
    15 yankee

  • F02 (intelligibility-low):

    No. Original Reference ASR-TTS E2E-DSR ASA-DSR Unit-DSR Text
    1 advantageous
    2 backspace
    3 paragraph
    4 command
    5 delete
    6 downward
    7 escape
    8 hotel
    9 rabbit
    10 sentence
    11 unusual
    12 upward
    13 watches
    14 x-ray
    15 yankee
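The speaker-code replacement described for the multi-speaker unit HiFi-GAN vocoder can be illustrated with a minimal sketch. It assumes one common conditioning design, in which each frame's unit embedding is concatenated with a single speaker embedding looked up from the speaker code. The embedding tables, their values, the second speaker code `SPK_B`, and the name `build_vocoder_input` are hypothetical, and the actual Unit-DSR vocoder may condition on the speaker differently.

```python
from typing import Dict, List, Sequence

# Hypothetical toy embedding tables; a real vocoder learns these during training.
UNIT_TABLE: Dict[int, List[float]] = {0: [0.0, 0.1], 1: [0.5, -0.2], 2: [1.0, 0.3]}
SPEAKER_TABLE: Dict[str, List[float]] = {"CF02": [1.0], "SPK_B": [-1.0]}


def build_vocoder_input(units: Sequence[int], speaker_code: str) -> List[List[float]]:
    """Per-frame conditioning: unit embedding concatenated with the speaker embedding.

    The same speaker vector is appended to every frame, so swapping
    `speaker_code` changes the voice condition while the unit content stays fixed.
    """
    spk = SPEAKER_TABLE[speaker_code]
    return [UNIT_TABLE[u] + spk for u in units]  # list concatenation, new list per frame


units = [0, 2, 1]
a = build_vocoder_input(units, "CF02")
b = build_vocoder_input(units, "SPK_B")
print(a)  # [[0.0, 0.1, 1.0], [1.0, 0.3, 1.0], [0.5, -0.2, 1.0]]
```

Because only the appended speaker vector differs between the two calls, the unit content passed to the vocoder is identical for both voices.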