Albright Laboratories

Engineering & Product

Senior DevOps / SRE

Level: Senior
Department: Engineering
Location: Florida HQ / Remote-OK
Classification: W-2

Mission

Own production reliability across Albright services. Establish the SRE practice — SLOs, on-call, incident response, and the runbooks that let us sleep at night.

Responsibilities

Define and maintain SLOs, error budgets, and on-call rotations
Operate the observability stack — Prometheus, Grafana, Loki, OpenTelemetry
Lead incident response and post-incident review
Build runbooks and automate toil
Partner with platform engineering on K8s, CI/CD, and infra
Establish disaster-recovery and backup posture
Improve mean-time-to-detect (MTTD) and mean-time-to-recover (MTTR) metrics

Required qualifications

5+ years of SRE or DevOps engineering experience
Production K8s and Linux administration experience
Strong observability stack experience (Prom/Grafana/OTel)
Comfort writing Python or Go automation

Preferred qualifications

Experience at a fintech or trading firm with strict reliability requirements
On-call leadership at a 24x7 service
Incident Command System (ICS) or comparable training
Open-source contributions

Albright Laboratories is an Equal Opportunity Employer. We do not discriminate on the basis of race, color, religion, sex, sexual orientation, gender identity, national origin, age, disability, veteran status, or any other protected characteristic. Compensation bands reflect published US benchmarks at the cited source and may be adjusted for experience, location, and total compensation mix. Federal-Cleared roles require US citizenship and an active or eligible security clearance per the role description.