2026-03-13
Benchmark
Qwen3-Coder-Next: 27 → 129 tok/s
First time running Qwen3-Coder-Next, an 80-billion-parameter Mixture-of-Experts model, on local hardware. Started at 27 tokens per second. Tuned the inference engine (vLLM on NVIDIA Blackwell), turned off CUDA-graph suppression so the engine could capture graphs instead of running eager, and hit 129 tok/s. Research confirmed this is the ceiling for Blackwell cards outside the datacenter right now: datacenter parts get native MoE kernels, while workstation and consumer cards fall back to generic Triton. We're at the wall. Still faster than most cloud APIs. Rough serving config below.
Hardware: RTX Pro 6000 Blackwell  |  Model: 80B MoE (FP8)  |  Context: 131K tokens
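For the record, a minimal sketch of the serving setup, not the exact launch command: the model id, memory utilization, and prompt are assumptions, and the two settings that mattered are FP8 quantization and enforce_eager=False (letting vLLM capture CUDA graphs).

```python
# Minimal vLLM sketch. Model repo id and memory settings are assumptions;
# the load-bearing flags are quantization="fp8" and enforce_eager=False.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen3-Coder-Next",   # hypothetical HF repo id
    quantization="fp8",              # FP8 weights, per the hardware line above
    max_model_len=131072,            # 131K context
    enforce_eager=False,             # False = capture CUDA graphs, don't force eager
    gpu_memory_utilization=0.95,     # assumption: leave a little headroom
)

out = llm.generate(
    ["Write a function that reverses a linked list."],
    SamplingParams(max_tokens=256, temperature=0.2),
)
print(out[0].outputs[0].text)
```

With eager mode forced, every decode step pays per-kernel launch overhead; graph capture amortizes it across the step, which is consistent with the jump reported above.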
2026-03-09
Deployed
Lab online. VLANs up. Beacons deployed.
Replaced the SG-1100 running pfSense with an MS-01 running Proxmox and OPNsense. Full VLAN segmentation went live: management, compute, IoT, guest, all isolated. Deployed two beacon nodes with Keepalived failover as the distributed control plane; a sketch of the health check driving that failover follows the stack line. The network went from a flat consumer setup to a segmented lab architecture in one session.
Stack: Proxmox  |  OPNsense  |  Keepalived  |  VLAN segmentation
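A sketch of the kind of liveness probe the beacon failover rests on, assuming each beacon exposes a local TCP service; the host, port, and service semantics here are hypothetical. Keepalived runs whatever executable a vrrp_script block names and treats exit code 0 as healthy.

```python
#!/usr/bin/env python3
# Hypothetical track script for a Keepalived vrrp_script block.
# Exit 0 -> beacon service healthy, this node keeps (or claims) the VIP.
# Exit 1 -> unhealthy, VRRP priority drops and the peer beacon takes over.
import socket
import sys

BEACON_HOST = "127.0.0.1"   # assumption: service bound locally
BEACON_PORT = 8443          # hypothetical beacon port

def beacon_alive(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connect to the beacon service succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

sys.exit(0 if beacon_alive(BEACON_HOST, BEACON_PORT) else 1)
```

On failure, the vrrp_script's weight lowers the node's VRRP priority and the standby beacon claims the virtual IP; when the check passes again, priority recovers.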
2026-03-06
Milestone
First public post. Lab coming online.
First post from @directive4AI. Electrical rewiring done. Equipment racked. The lab is coming up. Everything that follows builds on this foundation.
2026-02
Milestone
Cross-country move. Temporary lab in Texas.
Moved the entire operation from San Antonio, TX, to West Branch, MI. Wife, toddler, and a second on the way. Built a temporary lab in Texas before we left so work didn't stop during the 45-day transition. Rewired the electrical in the new house. The company didn't go dark; we relocated.
// by the numbers
Models running locally: 1B–120B
Peak inference speed: 129 tok/s
Hardware tested: AMD, NVIDIA, Apple Silicon
Cloud dependencies: 0
Don't rent your intelligence.
Captain Jeffrey M. Selonke, USAF (Ret.)
20 years, 5 months, 10 days
Founder — Directive4