Voice Similarity with VoxSim

A voice similarity demo that uses the wavlm-ecapa model trained on the VoxSim dataset. The demo accepts only .wav files and works best with audio sampled at 16 kHz. Inference in this Spaces demo is suboptimal because it runs on a basic CPU. For accurate scores, refer to the "voxsim_trainer" repository and run the code from the CLI.
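Since the demo expects .wav input and works best at 16 kHz, it can help to inspect a file before uploading it. The following is a minimal sketch using only Python's standard `wave` module, which reads uncompressed PCM .wav files. The names `describe_wav`, `write_silence`, and `TARGET_RATE` are hypothetical helpers for illustration and are not part of the demo or the "voxsim_trainer" repository.

```python
import os
import tempfile
import wave

# Assumption: 16 kHz is the preferred rate, as stated in the demo description.
TARGET_RATE = 16000


def describe_wav(path):
    """Read basic header information from a PCM .wav file."""
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        channels = wf.getnchannels()
        duration = wf.getnframes() / rate
    return {
        "sample_rate": rate,
        "channels": channels,
        "duration_s": duration,
        "is_16k": rate == TARGET_RATE,
    }


def write_silence(path, rate, seconds=1):
    """Write mono 16-bit silence; used here only to create a file to inspect."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wf.setframerate(rate)
        wf.writeframes(b"\x00\x00" * rate * seconds)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for rate in (16000, 44100):
            path = os.path.join(tmp, f"silence_{rate}.wav")
            write_silence(path, rate)
            print(rate, describe_wav(path))
```

If `is_16k` is false, resampling the file to 16 kHz with an audio tool of your choice before uploading may give results closer to what the demo expects.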

Paper is available here