Decoding Strategy with Perceptual Rating Prediction
for Language Model-Based Text-to-Speech Synthesis

Kazuki Yamauchi, Wataru Nakata, Yuki Saito, Hiroshi Saruwatari
The University of Tokyo, Japan.

Demo page

Compared methods

We present speech samples synthesized with the following decoding strategies, together with the ground-truth recordings.

Samples of synthetic speech

Greedy decoding
Naive sampling
Top-k top-p sampling
Sequence-wise BOK-PRP (proposed)
Block-wise BOK-PRP (proposed)
Ground truth
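
For readers unfamiliar with the three baseline strategies named above, the following is a minimal sketch of how greedy decoding, naive sampling, and top-k top-p sampling each pick one token from a single vector of logits. It is an illustration under stated assumptions, not the implementation used to produce these samples: the function names, the toy logits, and the choice to apply top-k first and then top-p on the renormalized distribution are all assumptions. The proposed BOK-PRP methods are not sketched because this page does not describe how they work.

```python
import math
import random


def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def greedy(logits):
    """Greedy decoding: always pick the highest-scoring token."""
    return max(range(len(logits)), key=lambda i: logits[i])


def naive_sample(logits, rng):
    """Naive sampling: draw from the full, untruncated distribution."""
    probs = softmax(logits)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def top_k_top_p_candidates(logits, k, p):
    """Return the token ids (and weights) that survive top-k, then top-p.

    Assumption: top-k is applied first, the survivors are renormalized,
    and top-p keeps the smallest prefix whose cumulative mass reaches p.
    """
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    mass = sum(probs[i] for i in ranked)
    kept, weights, cumulative = [], [], 0.0
    for i in ranked:
        w = probs[i] / mass
        kept.append(i)
        weights.append(w)
        cumulative += w
        if cumulative >= p:
            break
    return kept, weights


def top_k_top_p_sample(logits, k, p, rng):
    """Top-k top-p sampling: draw only from the truncated candidate set."""
    kept, weights = top_k_top_p_candidates(logits, k, p)
    return rng.choices(kept, weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)           # fixed seed so runs are repeatable
    toy_logits = [2.0, 1.0, 0.5, -1.0]  # hypothetical scores for 4 tokens
    print("greedy:", greedy(toy_logits))
    print("naive:", naive_sample(toy_logits, rng))
    print("top-k top-p:", top_k_top_p_sample(toy_logits, k=3, p=0.7, rng=rng))
```

In a language model-based TTS system, a function like these would be called once per generated audio token. Greedy decoding is deterministic, while the two sampling variants give different token sequences, and therefore different speech, on each run.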