Principal AI Infrastructure Engineer (Triton / NeMo / Ollama) — Careers

Mission

Own the AI inference and training infrastructure that powers Victoria, BrightFlow ML, and the Albright Studios generative pipelines. Make NVIDIA Triton, NeMo, and Ollama reliable and fast on GPU K8s.

Responsibilities

Own the NVIDIA Triton Inference Server cluster running on GPU K8s nodes
Maintain the NeMo Curator data-curation and fine-tuning pipeline for in-house LLMs
Operate Ollama and other lightweight inference engines for dev workflows
Build model-serving APIs with autoscaling, quotas, and SLOs
Lead GPU capacity planning across A6000, H100, and consumer-tier cards
Partner with ML engineers on model packaging, optimization, and deployment
Establish observability for tokens/sec, GPU utilization, and model latency

Required qualifications

8+ years of infrastructure engineering; 3+ years of ML serving experience
Hands-on experience with NVIDIA Triton, TensorRT, or vLLM
Strong K8s background; comfortable writing operators and CRDs
Deep expertise in Python and at least one systems language (Go, Rust, or C++)

Preferred qualifications

Experience with NeMo Curator or modern LLM fine-tuning pipelines
Background managing multi-GPU training (FSDP, DeepSpeed, Megatron)
Open-source contributions to ML infrastructure tooling
Prior Principal or Staff Engineer title at a hyperscaler or AI lab