Decoding Strategy with Perceptual Rating Prediction
for Language Model-Based Text-to-Speech Synthesis

Demo page

We present speech samples synthesized with each of the following decoding strategies, together with the ground truth for reference.

Greedy decoding
Naive sampling
Top-k top-p sampling
Sequence-wise BOK-PRP (proposed)
Block-wise BOK-PRP (proposed)
Ground truth