Building a Local LLM Router with Cloud Fallback for My Homelab (2026)

In this tutorial, I'll show how I built a local Large Language Model (LLM) router for my homelab. Requests go to my on-premises LLM node first, and the router falls back to a cloud provider when that node goes down or becomes overwhelmed. Keeping most of the work local saves money compared with relying solely on cloud providers.

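What you need

Router host: an old desktop with an Intel Core i9-13900K
Operating system: Ubuntu Server 22.04 LTS
Container runtime: Docker (installed below)
Network: a TP-Link 10GbE Smart Switch
Local model: an on-premises LLM node for the router to send requests to
Cloud fallback: an AWS account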

Step-by-step

Install Ubuntu Server 22.04 LTS on a compatible machine (I used an old desktop).

sudo apt update && sudo apt full-upgrade

Expected output: Your system should now have the latest software packages.

Install Docker and set up the LLM router container.

sudo apt install -y docker.io
sudo docker run -d --name llm-router \
  -p 8080:80 \
  registry.gitlab.com/llm-router/llm-router:latest

Expected output: docker run prints the new container's ID, and sudo docker ps lists llm-router as Up, with host port 8080 mapped to port 80 in the container.
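A container can take a few seconds to come up, so when scripting this step it helps to poll rather than check once. Here is a minimal sketch of a retry helper; wait_for and container_running are hypothetical names, and the only assumptions are bash and the standard docker inspect command.

```shell
#!/usr/bin/env bash
# wait_for TRIES DELAY CMD...: run CMD up to TRIES times, sleeping DELAY
# seconds between attempts. Returns 0 on the first success, 1 if all fail.
# (Hypothetical helper, not part of Docker or the llm-router image.)
wait_for() {
  local tries=$1 delay=$2 attempt
  shift 2
  for (( attempt = 1; attempt <= tries; attempt++ )); do
    "$@" && return 0
    # Only sleep if another attempt is coming.
    (( attempt < tries )) && sleep "$delay"
  done
  return 1
}

# container_running NAME: succeed if Docker reports the container as running.
container_running() {
  [ "$(sudo docker inspect -f '{{.State.Running}}' "$1" 2>/dev/null)" = "true" ]
}

# Usage: poll up to 10 times, 2 seconds apart.
#   wait_for 10 2 container_running llm-router && echo "llm-router is running"
```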

Configure the LLM router to use a cloud fallback strategy.

sudo docker exec -it llm-router /app/configure-cloud-fallback.sh \
  --cloud-provider=aws

Expected output: The router should now forward requests to AWS whenever the local LLM node is down or overwhelmed.
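What configure-cloud-fallback.sh does internally isn't covered here, so to test the idea by hand, here is a minimal sketch of local-first routing with a cloud fallback. It is a stand-in, not the router image's implementation: LOCAL_URL, CLOUD_URL, LOCAL_TIMEOUT and the JSON request shape are all hypothetical, and a real cloud API (AWS included) also needs authentication headers, which are omitted. A timeout on the local call is what turns "overwhelmed" into a fallback.

```shell
#!/usr/bin/env bash
# Hand-rolled sketch of local-first routing with a cloud fallback.
# LOCAL_URL and CLOUD_URL are hypothetical placeholders: point them at the
# endpoints your local model server and cloud provider actually expose.
LOCAL_URL="${LOCAL_URL:-http://127.0.0.1:8000/v1/completions}"
CLOUD_URL="${CLOUD_URL:-https://llm.example.invalid/v1/completions}"
# Seconds to wait before treating the local node as overwhelmed (assumed value).
LOCAL_TIMEOUT="${LOCAL_TIMEOUT:-20}"

# post_json URL BODY [MAX_TIME]: POST a JSON body with curl. Exits non-zero on
# connection errors, HTTP status >= 400 (--fail), or after MAX_TIME seconds.
post_json() {
  curl --silent --show-error --fail --max-time "${3:-60}" \
    -H 'Content-Type: application/json' --data "$2" "$1"
}

# route BODY: send the request to the local node first; if that fails or
# times out, send the same body to the cloud endpoint instead.
route() {
  local out
  if out=$(post_json "$LOCAL_URL" "$1" "$LOCAL_TIMEOUT"); then
    printf '%s\n' "$out"
    return 0
  fi
  echo "route: local backend failed, falling back to cloud" >&2
  post_json "$CLOUD_URL" "$1"
}

# Example (request shape is hypothetical):
#   route '{"prompt": "Summarise my notes", "max_tokens": 64}'
```

Capturing the local response before printing it means a request that times out halfway never mixes partial local output with the cloud reply.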

Confirm that the router host's CPU (an Intel Core i9-13900K in my build) is detected. No vendor-specific package is needed; the router container simply runs on the host CPU.

lscpu | grep 'Model name'
nproc

Expected output: lscpu reports the i9-13900K, and nproc prints the number of logical CPUs the router can use (32 on an i9-13900K with Hyper-Threading enabled).

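Connect the router host to the TP-Link 10GbE Smart Switch, then bridge its network interface so the router is reachable on your homelab network. Replace eth0 with your interface name (ip -br link lists them; Ubuntu usually uses names like enp3s0).

sudo ip link add llm-router type bridge
sudo ip link set eth0 master llm-router
sudo ip link set llm-router up

Expected output: ip -br link shows the llm-router bridge as UP with eth0 attached. The host's IP address must move from eth0 to the bridge, or you may lose connectivity (including SSH), and these ip commands do not survive a reboot.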
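To make the bridge persistent on Ubuntu Server, it can be defined in netplan instead. Here is a minimal sketch that prints such a config, assuming DHCP on your LAN; render_bridge_netplan and the file name in the usage comment are hypothetical choices, so check the existing files in /etc/netplan for another definition of the same interface before applying.

```shell
#!/usr/bin/env bash
# render_bridge_netplan NIC BRIDGE: print a netplan config that attaches NIC
# to BRIDGE and moves DHCP from the NIC onto the bridge.
# (Hypothetical helper; the YAML follows netplan's documented bridge syntax.)
render_bridge_netplan() {
  local nic=$1 bridge=$2
  cat <<EOF
network:
  version: 2
  renderer: networkd
  ethernets:
    ${nic}:
      dhcp4: false
  bridges:
    ${bridge}:
      interfaces: [${nic}]
      dhcp4: true
EOF
}

# Usage (file name is an arbitrary choice):
#   render_bridge_netplan eth0 llm-router | sudo tee /etc/netplan/60-llm-router-bridge.yaml
#   sudo chmod 600 /etc/netplan/60-llm-router-bridge.yaml
#   sudo netplan try    # applies, then rolls back unless you confirm
```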

Troubleshooting

LLM Router Container Not Starting

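Cause: Docker runtime issues or a corrupted container image. Fix: Check the container's logs with sudo docker logs llm-router. Restart Docker with sudo systemctl restart docker, remove the old container with sudo docker rm -f llm-router, and re-run the docker run command (re-running it without removing the old container fails because the name llm-router is already in use). If the image is corrupted, pull it again with sudo docker pull. If that doesn't help, reinstall Docker or ask the Docker community.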

Ubuntu Server Not Booting

Cause: A corrupted boot loader, a damaged filesystem, or a firmware issue. Fix: Boot into recovery mode and run a filesystem check (fsck) on the root partition. If the boot loader is damaged, boot from an Ubuntu live USB and reinstall GRUB; for firmware problems, update or reset the motherboard firmware.

LLM Router Not Routing Requests

Cause: An incorrect routing configuration or misconfigured network interfaces. Fix: Double-check the router's configuration and make sure every network interface is configured and up. ip -br link and ip route show interface and route state, and sudo ss -tlnp shows whether anything is listening on port 8080 (ss replaces the older netstat).

Performance and what to expect

Tokens per second: 500-750
VRAM usage: 12-15 GB (depending on the model)
Power draw: 150-200 W
Temperatures: 60-80 °C (depending on the cooling setup)

Keep in mind that these numbers are rough estimates, not measurements. Throughput and VRAM use depend mostly on the model's size and quantization and on whether inference runs on a GPU or the CPU, so expect your numbers to differ.
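Since the point of running locally is saving money, it helps to turn the power figures into energy per month. Here is a small sketch of that arithmetic using awk. It assumes the machine runs at a constant draw for a 30-day month, which makes it an upper bound, and the $0.15/kWh price in the example is a placeholder for your own tariff. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# monthly_kwh WATTS HOURS_PER_DAY: energy for a 30-day month at constant draw.
monthly_kwh() {
  awk -v w="$1" -v h="$2" 'BEGIN { printf "%.1f\n", w * h * 30 / 1000 }'
}

# monthly_cost WATTS HOURS_PER_DAY PRICE_PER_KWH: electricity cost for that month.
monthly_cost() {
  awk -v w="$1" -v h="$2" -v p="$3" 'BEGIN { printf "%.2f\n", w * h * 30 / 1000 * p }'
}

# tokens_per_kwh TOKENS_PER_SEC WATTS: tokens generated per kWh consumed
# (tokens/s divided by watts gives tokens per joule; 1 kWh = 3,600,000 J).
tokens_per_kwh() {
  awk -v t="$1" -v w="$2" 'BEGIN { printf "%d\n", t / w * 3600000 }'
}

# Example: monthly_cost 200 24 0.15   (placeholder price per kWh)
```

At the 150-200 W range above, running 24/7 works out to 108-144 kWh per month, and 500 tokens per second at 200 W is 9,000,000 tokens per kWh.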

Common questions

How do I scale this setup for larger workloads?

To scale, add more machines to your homelab network and spread requests across them. A desktop motherboard holds only one i9-13900K, so adding nodes (or moving to a more powerful processor) is how this setup grows.
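One simple way to spread requests over several local nodes is round-robin. Here is a minimal sketch, assuming you track a request counter yourself; pick_node and the node addresses in the usage comment are hypothetical, not part of the llm-router image.

```shell
#!/usr/bin/env bash
# pick_node N NODE...: round-robin selection. Prints the node that request
# number N (counting from 0) should go to; returns 1 if no nodes are given.
pick_node() {
  local n=$1
  shift
  local count=$#
  (( count > 0 )) || return 1
  local nodes=("$@")
  # Index wraps around the node list, so requests cycle through every node.
  printf '%s\n' "${nodes[n % count]}"
}

# Usage (node addresses are hypothetical):
#   nodes=(192.168.1.21:8000 192.168.1.22:8000)
#   for i in 0 1 2 3; do pick_node "$i" "${nodes[@]}"; done
#   # alternates between the two nodes
```

Round-robin ignores how busy each node is; combined with a cloud fallback, a node that times out still has its requests served elsewhere.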

Can I use this setup for other tasks besides language processing?

Yes. The hardware is general-purpose, so the same machine can also run other compute-heavy jobs such as scientific simulations, data analytics, or machine-learning training, though those jobs will compete with the router for CPU time.

Is it possible to integrate with my existing cloud infrastructure?

Yes. The LLM router can communicate with your existing cloud infrastructure through APIs and message queues, so your on-premises language processing can work alongside your cloud-based services.

The verdict

Building a local LLM router with cloud fallback keeps most language processing in your homelab, which saves money, while the cloud covers outages and load spikes. If you want a cost-effective setup with a safety net, it is worth considering.
