2026-03-13
Benchmark
Qwen3-Coder-Next: 27 → 129 tok/s
First time running Qwen3-Coder-Next, an 80-billion-parameter Mixture-of-Experts model, on local hardware. Started at 27 tokens per second. Tuned the inference engine (vLLM on NVIDIA Blackwell), turned off CUDA-graph suppression so the engine could capture graphs instead of running eager, and hit 129 tok/s. Research confirmed this is the ceiling for Blackwell cards outside the datacenter right now: datacenter parts get native MoE kernels, while workstation and consumer cards fall back to generic Triton. We're at the wall. Still faster than most cloud APIs. Rough serving config below.
Hardware: RTX Pro 6000 Blackwell  |  Model: 80B MoE (FP8)  |  Context: 131K tokens
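For the record, a minimal sketch of the serving setup, not the exact launch command: the model id, memory utilization, and prompt are assumptions, and the two settings that mattered are FP8 quantization and enforce_eager=False (letting vLLM capture CUDA graphs).

```python
# Minimal vLLM sketch. Model repo id and memory settings are assumptions;
# the load-bearing flags are quantization="fp8" and enforce_eager=False.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen3-Coder-Next",   # hypothetical HF repo id
    quantization="fp8",              # FP8 weights, per the hardware line above
    max_model_len=131072,            # 131K context
    enforce_eager=False,             # False = capture CUDA graphs, don't force eager
    gpu_memory_utilization=0.95,     # assumption: leave a little headroom
)

out = llm.generate(
    ["Write a function that reverses a linked list."],
    SamplingParams(max_tokens=256, temperature=0.2),
)
print(out[0].outputs[0].text)
```

With eager mode forced, every decode step pays per-kernel launch overhead; graph capture amortizes it across the step, which is consistent with the jump reported above.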
2026-03-09
Deployed
Lab online. VLANs up. Beacons deployed.
Replaced the SG-1100 running pfSense with an MS-01 running Proxmox and OPNsense. Full VLAN segmentation went live: management, compute, IoT, guest, all isolated. Deployed two beacon nodes with Keepalived failover as the distributed control plane; a sketch of the health check driving that failover follows the stack line. The network went from a flat consumer setup to a segmented lab architecture in one session.
Stack: Proxmox  |  OPNsense  |  Keepalived  |  VLAN segmentation
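A sketch of the kind of liveness probe the beacon failover rests on, assuming each beacon exposes a local TCP service; the host, port, and service semantics here are hypothetical. Keepalived runs whatever executable a vrrp_script block names and treats exit code 0 as healthy.

```python
#!/usr/bin/env python3
# Hypothetical track script for a Keepalived vrrp_script block.
# Exit 0 -> beacon service healthy, this node keeps (or claims) the VIP.
# Exit 1 -> unhealthy, VRRP priority drops and the peer beacon takes over.
import socket
import sys

BEACON_HOST = "127.0.0.1"   # assumption: service bound locally
BEACON_PORT = 8443          # hypothetical beacon port

def beacon_alive(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connect to the beacon service succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

sys.exit(0 if beacon_alive(BEACON_HOST, BEACON_PORT) else 1)
```

On failure, the vrrp_script's weight lowers the node's VRRP priority and the standby beacon claims the virtual IP; when the check passes again, priority recovers.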
2026-03-06
Milestone
First public post. Lab coming online.
First post from @directive4AI. Electrical rewiring done. Equipment racked. The lab is coming up. Everything that follows builds on this foundation.
2026-02
Milestone
Cross-country move. Temporary lab in Texas.
Moved the entire operation from San Antonio, TX, to West Branch, MI. Wife, toddler, and a second on the way. Built a temporary lab in Texas before we left so work didn't stop during the 45-day transition. Rewired the electrical in the new house. The company didn't go dark; we relocated.
// by the numbers
Models running locally: 1B–120B
Peak inference speed: 129 tok/s
Hardware tested: AMD, NVIDIA, Apple Silicon
Cloud dependencies: 0
Don't rent your intelligence.
Captain Jeffrey M. Selonke, USAF (Ret.)
20 years, 5 months, 10 days
Founder — Directive4