Runtime Visibility

Runtime Visibility is the operator view of what the local model runtime is doing. The Model, Logs, and GPU Monitor drawers surface model state, llama-server output, GPU load, and GPU process activity alongside the chat UI.

Any signed-in user can view runtime status, logs, and GPU telemetry where the UI exposes them. Admins can additionally start and stop models, clear logs, and change runtime settings.

Model Status

Model drawer

The Model drawer shows whether the main model is running, the current model name, runtime mode, GPU layers, CPU threads, temperature, top-k, top-p, repeat penalty, and seed.

Admin controls

Admins can select a model that is enabled and present, set launch options, start or stop the main model, and control the separate title model process.

Benchmark lockout

Main model start and stop controls are blocked while a benchmark is active, because benchmark runs interact with model process control.
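The lockout above can be sketched as a simple guard around the start/stop paths. This is an illustrative sketch, not CE's code; the class and method names are assumptions.

```python
# Illustrative sketch of the benchmark lockout: start/stop requests are
# rejected while a benchmark flag is set. Names are hypothetical.

class ModelControl:
    def __init__(self):
        self.benchmark_active = False  # set by the benchmark runner
        self.running = False

    def _ensure_unlocked(self):
        if self.benchmark_active:
            raise RuntimeError("model control is locked while a benchmark is active")

    def start_main_model(self):
        self._ensure_unlocked()
        self.running = True

    def stop_main_model(self):
        self._ensure_unlocked()
        self.running = False
```

While `benchmark_active` is set, both controls raise instead of touching the model process, mirroring the disabled buttons in the drawer.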

Logs Drawer

Log stream

The Logs drawer shows llama-server log lines from the app's in-memory log buffer. Start Log Stream refreshes the view about every 1.5 seconds while the drawer is open; Stop Log Stream stops that polling.
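An in-memory log buffer of this kind can be sketched with a bounded deque: new llama-server lines push old ones out, each poll takes a snapshot, and an admin clear empties it. The capacity and helper names are assumptions, not CE's implementation.

```python
from collections import deque

class LogBuffer:
    """Illustrative in-memory buffer like the one the Logs drawer reads from."""

    def __init__(self, capacity: int = 2000):
        # Bounded: once full, the oldest line drops off automatically.
        self._lines = deque(maxlen=capacity)

    def append(self, line: str) -> None:
        self._lines.append(line.rstrip("\n"))

    def snapshot(self) -> list[str]:
        # What one 1.5-second poll would return to the drawer.
        return list(self._lines)

    def clear(self) -> None:
        # The admin-only Clear Logs action would do something like this.
        self._lines.clear()
```

The drawer's polling loop then only needs to call `snapshot()` on an interval while it is open and stop when the user clicks Stop Log Stream.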

Admin clear

Signed-in users can view logs. Admins also see Clear Logs, which clears the main and title model log buffers.

What logs are for

Logs help operators see model launch output, runtime readiness, errors, and llama-server messages without leaving the browser.

GPU Monitor

Monitor controls

Start Monitor refreshes telemetry about every 5 seconds while the drawer is open. Stop pauses monitoring. Copy Output copies the raw telemetry output shown in the drawer.

GPU cards

The drawer can show GPU cards with memory usage, utilization, one-minute mini charts, power, temperature, total power, and last-updated time.
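On NVIDIA systems, the fields behind a GPU card can be read with nvidia-smi's query mode. The sketch below uses real nvidia-smi flags, but the parsing helper and the dictionary keys are illustrative assumptions, not CE's internals.

```python
import subprocess

QUERY = "name,utilization.gpu,memory.used,memory.total,power.draw,temperature.gpu"

def parse_gpu_csv(text: str) -> list[dict]:
    """Parse `--format=csv,noheader,nounits` output, one GPU per line."""
    cards = []
    for line in text.strip().splitlines():
        name, util, mem_used, mem_total, power, temp = (
            f.strip() for f in line.split(",")
        )
        cards.append({
            "name": name,
            "utilization_pct": int(util),
            "memory_used_mib": int(mem_used),
            "memory_total_mib": int(mem_total),
            "power_w": float(power),
            "temperature_c": int(temp),
        })
    return cards

def query_gpus() -> list[dict]:
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_csv(out)
```

A monitor loop would call `query_gpus()` roughly every 5 seconds and append each sample to the one-minute mini charts.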

Process rows

When supported, the drawer lists running GPU compute processes with PID, executable, per-GPU usage, and total VRAM.
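On NVIDIA systems, per-process rows like these are available from `nvidia-smi --query-compute-apps`. The flags below are real; the parser and row shape are an illustrative assumption.

```python
def parse_compute_apps(text: str) -> list[dict]:
    """Parse `nvidia-smi --query-compute-apps=pid,process_name,used_gpu_memory
    --format=csv,noheader,nounits` output into process rows."""
    rows = []
    for line in text.strip().splitlines():
        if not line:
            continue
        parts = line.split(",")
        # pid is the first field, VRAM the last; the executable path sits between.
        pid = int(parts[0].strip())
        vram_mib = int(parts[-1].strip())
        process = ",".join(parts[1:-1]).strip()
        rows.append({"pid": pid, "process": process, "vram_mib": vram_mib})
    return rows
```

This is also where the ROCm limitation shows up: when the local tools expose no per-process data, the drawer reports process telemetry as unavailable.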

Raw telemetry

The raw output panel shows the underlying local telemetry command output, such as nvidia-smi or rocm-smi output.
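Producing that raw panel amounts to running the first telemetry tool found on the host and capturing its output. The probing order below is an assumption; CE's actual detection logic may differ.

```python
import shutil
import subprocess

def raw_telemetry() -> str:
    """Return raw output from the first available local telemetry tool.
    Illustrative sketch; the preference order is an assumption."""
    for tool in ("nvidia-smi", "rocm-smi"):
        if shutil.which(tool):
            result = subprocess.run([tool], capture_output=True, text=True)
            return result.stdout or result.stderr
    return "no supported telemetry tool found"
```

Copy Output then just copies this string verbatim, which is useful for pasting into bug reports.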

Hardware Backends

NVIDIA
What CE can show: Telemetry comes from nvidia-smi. CE can show GPU cards, utilization, memory, process rows, and raw output when the command is available.
Important limit: Visibility depends on local NVIDIA drivers and the command being reachable by the app process.

AMD / ROCm
What CE can show: Telemetry uses local ROCm tooling such as rocm-smi and rocminfo where available.
Important limit: Process-level telemetry is reported as unavailable on ROCm systems when the local tools do not provide it.

No supported telemetry
What CE can show: The drawer shows a no-data or no-backend message instead of GPU cards.
Important limit: This does not by itself prove model inference is impossible; it only means CE cannot read telemetry from supported local tools.

Visibility Depends On

Local tools: nvidia-smi, rocm-smi, or rocminfo must be installed and available.
Drivers: GPU drivers and ROCm/CUDA support must match the hardware and operating system.
llama-server build: GPU offload depends on a llama-server build that supports the intended runtime backend.
Saved settings: GPU layers, CPU threads, model path, and ports come from saved runtime settings.
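The "local tools" prerequisite above can be checked directly on the host. This is a hypothetical helper, not part of CE; it only reports whether each tool is on the PATH.

```python
import shutil

def visibility_checklist() -> dict[str, bool]:
    """Report which supported telemetry tools are installed and reachable."""
    return {
        tool: shutil.which(tool) is not None
        for tool in ("nvidia-smi", "rocm-smi", "rocminfo")
    }
```

If every value is False, the GPU Monitor will fall back to its no-backend message even when a GPU is physically present.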