Technical

TernaryPhysics-7B: Our Quantized LLM

10 min read · April 2026

TernaryPhysics-7B is the brain behind our agents' conversational capabilities: a 4-bit quantized model that runs entirely on CPU, with no GPU required. This post explains what it is, how we built it, and why we made the choices we did.

Model Specifications

Model Size         7 billion parameters (quantized)
Disk Space         4.7 GB
Context Window     4,096 tokens
Inference Speed    ~17 tok/s on Apple Silicon (M-series); ~10 tok/s on commodity x86 CPU (no GPU)
RAM Required       8 GB minimum
GPU Required       No
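The disk figure follows from simple arithmetic. A sketch of the back-of-the-envelope math (the numbers are illustrative; real quantized files also carry per-block scales and higher-precision layers, which is where the gap between raw weights and the shipped 4.7 GB comes from):

```python
# Back-of-the-envelope memory math for a 4-bit quantized 7B model.
PARAMS = 7_000_000_000
BITS_PER_WEIGHT = 4

raw_bytes = PARAMS * BITS_PER_WEIGHT / 8   # weights alone, no metadata
raw_gb = raw_bytes / 1024**3               # ~3.26 GiB

# Quantized formats also store a scale (and often a zero-point) per block
# of weights, and typically keep embeddings/output layers at higher
# precision -- hence a shipped file larger than the raw weight count.
print(f"raw 4-bit weights: {raw_gb:.2f} GiB")
```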

Choosing the Right Model

We evaluated dozens of models for infrastructure investigation tasks. Our criteria:

  • Instruction following. The model needs to understand complex multi-step queries about infrastructure.
  • Technical knowledge. It needs to understand Kubernetes, databases, networking, Linux internals.
  • Reasoning ability. Root cause analysis requires multi-hop reasoning.
  • Efficiency. Must run well on CPU without excessive resource usage.
  • Permissive license. Must be deployable commercially without restrictions.

TernaryPhysics-7B is the result of extensive evaluation and optimization. It provides strong infrastructure reasoning capabilities while running efficiently on standard hardware.

What is Quantization?

Neural networks typically store weights as 32-bit or 16-bit floating-point numbers. Quantization maps those weights to lower-precision values, here 4 bits per weight, which shrinks the model and makes efficient CPU inference possible.

Why Quantize?

  • Smaller size. Reduces model size dramatically, from tens of GB to just a few GB.
  • Faster inference. Smaller models load faster and process more efficiently.
  • CPU-friendly. Enables real-time inference without GPU acceleration.

TernaryPhysics-7B uses 4-bit quantization tuned to balance size reduction against quality loss, so infrastructure reasoning stays accurate after compression.
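To make the idea concrete, here is a minimal sketch of symmetric block-wise 4-bit quantization, a common scheme in CPU inference engines. This is an illustration of the technique, not the exact method TernaryPhysics-7B ships with:

```python
import numpy as np

def quantize_4bit(weights: np.ndarray, block_size: int = 32):
    """Symmetric block-wise 4-bit quantization (illustrative sketch).

    Each block of 32 weights shares one float scale; the weights
    themselves are rounded to integers in the signed 4-bit range.
    """
    w = weights.reshape(-1, block_size)
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0  # map block max to 7
    scale[scale == 0] = 1.0                             # avoid divide-by-zero
    q = np.clip(np.round(w / scale), -7, 7).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    # Reverse the mapping: integer code times its block's scale.
    return (q.astype(np.float32) * scale).reshape(-1)

w = np.random.randn(1024).astype(np.float32)
q, s = quantize_4bit(w)
err = np.abs(dequantize(q, s) - w).max()
print(f"max reconstruction error: {err:.4f}")
```

The per-block scale is what preserves quality: rounding error is bounded by half a quantization step within each block, rather than across the whole tensor.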

Optimized Inference

TernaryPhysics-7B uses an optimized inference engine designed for CPU execution. This enables real-time conversational responses without GPU acceleration.

Fast CPU Inference

Real-time conversational responses on modern hardware.

Memory Efficient

Optimized to run on systems with 8GB+ RAM.

Cross-Platform

Linux, macOS, Windows. x86_64, ARM64. Works everywhere.

No Dependencies

No GPU, no special drivers. Just standard hardware.

How It Fits the Architecture

TernaryPhysics-7B is the "Tier 2" brain in our two-tier architecture. It works alongside the TNN™ (Tier 1):

Normal Operation
────────────────
TNN™ runs continuously → minimal resource usage
TernaryPhysics-7B sleeps → 0 CPU usage

Anomaly Detected / Human Query
──────────────────────────────
TNN™ detects anomaly → wakes TernaryPhysics-7B
TernaryPhysics-7B analyzes logs/metrics
Returns findings → goes back to sleep

This pattern minimizes resource usage. During normal operation, only the tiny TNN consumes resources. The heavyweight LLM only activates when needed.
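The wake-on-anomaly pattern above can be sketched in a few lines. The class and scoring heuristic here are stand-ins for illustration, not the actual product code:

```python
class TieredAgent:
    """Sketch of the two-tier wake/sleep pattern described above."""

    def __init__(self):
        self.llm_loaded = False           # Tier 2 is asleep by default

    def tnn_score(self, metrics):
        # Tier 1: cheap, always-on anomaly score (placeholder heuristic).
        return max(metrics) if metrics else 0.0

    def handle(self, metrics, threshold=0.9):
        if self.tnn_score(metrics) < threshold:
            return None                   # normal operation: LLM stays idle
        self.llm_loaded = True            # anomaly: wake the heavyweight LLM
        findings = f"anomaly at score {self.tnn_score(metrics):.2f}"
        self.llm_loaded = False           # analysis done: back to sleep
        return findings

agent = TieredAgent()
print(agent.handle([0.2, 0.3]))   # normal traffic
print(agent.handle([0.2, 0.95]))  # anomaly triggers the LLM
```

The key design point is that the expensive resource (LLM memory and compute) is held only for the duration of a single investigation.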

Hardware Requirements

TernaryPhysics-7B is designed to run on commodity hardware:

Component   Minimum               Recommended
RAM         8 GB                  16 GB (multi-agent on one host)
Disk        5 GB                  10 GB (logs + multi-agent)
CPU         Any x86_64 or ARM64   Modern multi-core
GPU         Not required          Not required

On modern hardware, you'll get real-time conversational responses. Older hardware still works, just with slightly longer response times.
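A quick preflight against the table above can be done with the standard library alone. `check_requirements` is a hypothetical helper written for this post; the RAM probe uses POSIX `os.sysconf`, so it reports "unknown" on Windows:

```python
import os
import platform
import shutil

def check_requirements(min_ram_gb=8, min_disk_gb=5):
    # Thresholds mirror the documented minimums above.
    report = {"cpu": platform.machine(), "cores": os.cpu_count()}
    try:
        # POSIX-only: total physical memory = page size * page count.
        ram_gb = (os.sysconf("SC_PAGE_SIZE")
                  * os.sysconf("SC_PHYS_PAGES")) / 1024**3
        report["ram_ok"] = ram_gb >= min_ram_gb
    except (ValueError, OSError, AttributeError):
        report["ram_ok"] = None           # unknown on this platform
    free_gb = shutil.disk_usage(".").free / 1024**3
    report["disk_ok"] = free_gb >= min_disk_gb
    return report

print(check_requirements())
```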

Future Improvements

We're actively working on:

  • TNN™ integration. Using the TNN™ to accelerate LLM inference.
  • Infrastructure fine-tuning. Training on infrastructure-specific data for better technical understanding.
  • Smaller models. Exploring smaller models for resource-constrained environments.
  • Efficiency improvements. Continuous optimization for faster, leaner inference.

For more details on how the model fits into the broader architecture, see our Architecture documentation.