HQ-SVC

Towards High-Quality Zero-Shot Singing Voice Conversion in Low-Resource Scenarios

Bingsong Bai, Yizhong Geng, Fengping Wang, Cong Wang, Puyuan Guo, Yingming Gao, Ya Li*
Beijing University of Posts and Telecommunications, China
*Corresponding author

Abstract

Zero-shot singing voice conversion (SVC) transforms a source singer's timbre to that of an unseen target speaker while preserving the melodic content, without any fine-tuning. Existing methods model speaker timbre and vocal content separately, which discards essential acoustic information and degrades output quality, while also demanding significant computational resources. To overcome these limitations, we propose HQ-SVC, an efficient framework for high-quality zero-shot SVC. HQ-SVC first jointly extracts content and speaker features using a decoupled codec. It then enhances fidelity through pitch and volume modeling, preserving critical acoustic information that is typically lost when content and timbre are modeled separately, and progressively refines the output via differentiable signal processing and diffusion techniques. Evaluations confirm that HQ-SVC significantly outperforms state-of-the-art zero-shot SVC methods in both conversion quality and efficiency. Beyond voice conversion, HQ-SVC natively supports voice super-resolution and achieves better voice naturalness than specialized audio super-resolution methods.
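
The pipeline above conditions conversion on explicit frame-level pitch and volume trajectories. The following is a minimal sketch of extracting such conditioning features; the choice of librosa's pYIN for F0 and RMS energy for volume, as well as the file name and hop size, are illustrative assumptions, not the extractors actually used in the paper.

# Sketch: frame-level pitch (F0) and volume features of the kind
# HQ-SVC conditions on. Extractor choices here are assumptions.
import librosa
import numpy as np

HOP = 512  # hypothetical hop size; the paper's setting may differ

def extract_pitch_and_volume(path: str, sr: int = 44100):
    """Return per-frame F0 (Hz, NaN where unvoiced) and RMS volume."""
    wav, _ = librosa.load(path, sr=sr)
    # Fundamental frequency via probabilistic YIN (pYIN).
    f0, _, _ = librosa.pyin(
        wav,
        fmin=librosa.note_to_hz("C2"),
        fmax=librosa.note_to_hz("C6"),
        sr=sr,
        hop_length=HOP,
    )
    # Frame-level volume as root-mean-square energy.
    rms = librosa.feature.rms(y=wav, hop_length=HOP)[0]
    # Align lengths (pYIN and RMS framing can differ by a frame).
    n = min(len(f0), len(rms))
    return f0[:n], rms[:n]

if __name__ == "__main__":
    f0, vol = extract_pitch_and_volume("source.wav")  # hypothetical input
    print(f"mean F0: {np.nanmean(f0):.1f} Hz over {len(vol)} frames")

Passing such trajectories to the decoder alongside the codec's content and speaker features is what lets the refinement stages recover dynamics that purely decoupled content/timbre representations would drop.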

Model Architecture


Figure 1: Overall architecture of HQ-SVC and the structure of the proposed EVA module.

Zero-shot Singing Voice Conversion (SVC)

Audio samples: Source | Target | FACodec-SVC | SaMoye-SVC | HQ-SVC (Ours)

Zero-shot Singing Voice Super-resolution

Audio samples: GT (44.1 kHz) | Downsampled (16 kHz) | AudioSR | HQ-SVC (Ours)

Zero-shot Speech Super-resolution

Audio samples: GT (44.1 kHz) | Downsampled (16 kHz) | AudioSR | HQ-SVC (Ours)

BibTeX

@article{bai2025hq,
  title={HQ-SVC: Towards High-Quality Zero-Shot Singing Voice Conversion in Low-Resource Scenarios},
  author={Bai, Bingsong and Geng, Yizhong and Wang, Fengping and Wang, Cong and Guo, Puyuan and Gao, Yingming and Li, Ya},
  journal={arXiv preprint arXiv:2511.08496},
  year={2025}
}