Tutorial: Benchmarking Bark text-to-speech on 26 Nvidia GPUs – Reading out 144K recipes

In this project, we benchmarked Bark text-to-speech across 26 different consumer GPUs.

The goal: To get Bark to read 144K food recipes from Food.com's recipe dataset.

You can read the full tutorial here: https://blog.salad.com/bark-benchmark-text-to-speech/

Included: architecture diagram, data preparation, inference server setup, queue worker, container group setup, and compiling the results

Code blocks are included in the tutorial.
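The queue-worker step can be sketched roughly as below. This is a minimal illustration under assumptions, not the tutorial's actual code: the `synthesize` stub stands in for a real call to the Bark inference server (which in practice would be an HTTP request), and the function and variable names are made up for this example.

```python
from queue import Queue, Empty

def synthesize(text: str) -> bytes:
    # Placeholder for the real Bark inference call (e.g. a POST to a
    # local inference server); here we just return fake audio bytes.
    return b"AUDIO:" + text.encode()

def run_worker(jobs: Queue, results: list) -> None:
    # Pull recipe texts off the queue until it is empty, synthesize
    # each one, and record the result. A real worker would also handle
    # retries and upload the audio to object storage.
    while True:
        try:
            text = jobs.get_nowait()
        except Empty:
            break
        results.append((text, synthesize(text)))

jobs = Queue()
for recipe in ["Banana bread", "Tomato soup"]:
    jobs.put(recipe)

results = []
run_worker(jobs, results)
print(len(results))  # one result per recipe
```

In a distributed setup, many such workers run in parallel across the GPU fleet, each draining jobs from a shared queue, which is what lets the benchmark scale to 144K recipes.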

Words per dollar for each GPU:

[Chart: words-per-dollar comparison for each GPU]

Although the latest cards are indeed much faster at inference than older cards, the cost-performance sweet spot sits with the lower-end 30xx-series cards.
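The words-per-dollar metric driving that comparison is straightforward to compute; a sketch is below. The numbers in the example are hypothetical, not taken from the benchmark results.

```python
def words_per_dollar(words_generated: int, hours: float, price_per_hour: float) -> float:
    # Words spoken per dollar of GPU rental cost:
    # total words divided by total rental cost for the run.
    return words_generated / (hours * price_per_hour)

# Hypothetical example: a card renting at $0.10/hour
# that reads 50,000 words in one hour.
print(words_per_dollar(50_000, 1.0, 0.10))  # 500000.0
```

This is why a slower but much cheaper card can win: halving throughput while cutting the hourly price by more than half raises words per dollar.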

Conclusions

  • As is often the case, there’s a clear trade-off here between cost and performance. Higher end cards are faster, but their disproportionate cost makes them more expensive per word spoken.
  • The model’s median speed is surprisingly similar across GPU types, even though the peak performance can be quite different.
  • No matter what GPU you select, you should be prepared for significant variability in performance.
  • Qualitative: While Bark's speech is often impressively natural-sounding, it does have a tendency to go off script sometimes.

We’ve also made available audio from 1000 top-rated recipes, paired with the script it was trying to read.

submitted by /u/SaladChefs