Does PCIe bandwidth matter for running inference in general ?

It's difficult to find a motherboard with more than two PCIe x16 slots. What if I connect GPUs through a PCIe x1 port? Would that only affect loading the model once per boot and then have no impact on performance? Does the model need to be reloaded many times during a session?
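For a rough sense of scale, here's a back-of-envelope sketch of how long a one-time weight load takes over different link widths. The 13 GB model size and the PCIe 4.0 per-link bandwidth figures are illustrative assumptions (theoretical maxima; real-world throughput is lower):

```python
# Back-of-envelope: time to copy model weights to a GPU over different PCIe links.
# All numbers below are assumptions for illustration, not measured values.

model_size_gb = 13.0  # hypothetical model footprint in GB

# Approximate theoretical bandwidth per link configuration (GB/s), PCIe 4.0
bandwidths = {
    "PCIe 4.0 x16": 32.0,
    "PCIe 4.0 x4": 8.0,
    "PCIe 4.0 x1": 2.0,
}

for link, gb_per_s in bandwidths.items():
    seconds = model_size_gb / gb_per_s
    print(f"{link}: ~{seconds:.1f} s to transfer {model_size_gb:.0f} GB")
```

Even at x1, a one-time load is seconds to a minute, not hours; the question is whether anything besides that initial copy crosses the bus during generation.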

I imagine that when you start a new conversation, you need to load a clean copy? So maybe once per conversation, and then you can make many queries without being limited by PCIe bandwidth?

submitted by /u/transdimensionalmeme