How would llm-d [1] compare to distributed-llama? Is the overhead or configuration too much to deal with for simple setups?