Has anyone released an open-source tool like run.ai to leverage multiple GPUs / distribute the workload a bit more efficiently?
I'm loving some of the single-GPU LLM modifications that have been dropping recently (I've tested a couple that ran well on a 4090 and a 3090 Ti in the lab), but I've got a plethora of 8 & 12 GB 3xxx-series cards I'd love to take advantage of beyond passthrough to individual VMs. Looking for any solutions. Speed isn't as important as being able to run larger models in a distributed fashion across the cards.
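For context, here's the sort of thing I mean, a minimal sketch of one common approach (not run.ai, just an assumption about what might fit my setup): Hugging Face transformers with accelerate installed can shard a model's layers across every visible GPU via `device_map="auto"`. The model name and per-card memory caps below are placeholders for illustration. It's pipeline-style layer splitting, so it's slow, but it lets a model that won't fit on any single 8/12 GB card run at all:

```python
# Minimal sketch: shard one model across several small GPUs.
# Requires: pip install transformers accelerate torch
# Model ID and memory caps are hypothetical; adjust for your cards.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-13b-hf"  # placeholder example model

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",  # accelerate spreads layers across visible GPUs
    # Cap per-GPU usage so 8 GB and 12 GB cards both get a safe share:
    max_memory={0: "10GiB", 1: "7GiB", 2: "7GiB", "cpu": "24GiB"},
)

# Inputs go to the device holding the first model shard.
inputs = tokenizer("Hello, multi-GPU world", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

The trade-off matches my priorities: only one GPU is active at a time as the activations move through the layer shards, so throughput drops, but total VRAM is pooled, which is the part I care about.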