Senior DevOps Engineer, Infrastructure & Reliability
- Description:
- Conduct interviews with engineering teams to identify and remove operational friction in CI/CD, deployments, observability, and cloud environments.
- Design and implement scalable infrastructure-as-code patterns using Terraform to standardize provisioning and reduce configuration drift.
- Own and evolve the Kubernetes platform, including EKS or self-managed environments, so workloads are secure, scalable, and resilient.
- Architect and optimize CI/CD pipelines to improve deployment frequency, reduce lead time, and increase release confidence.
- Lead reliability initiatives such as incident response improvements, root cause analysis, and postmortem practices.
- Design and enforce secure networking, IAM, and secrets management strategies across environments.
- Improve observability through metrics, logs, and tracing using Datadog or similar tooling.
- Optimize cloud costs through rightsizing, autoscaling, and architectural improvements.
- Own disaster recovery planning, backup strategies, and multi-region resilience initiatives.
- Refactor manual or brittle infrastructure into automated, testable, reproducible systems and drive adoption through documentation and hands-on support.
- Requirements:
- 8+ years of experience in DevOps, SRE, or Infrastructure Engineering roles.
- Proven experience designing and operating production Kubernetes environments at scale.
- Deep hands-on expertise with AWS infrastructure and cloud networking.
- Strong experience building and maintaining Terraform modules across large cloud environments.
- Demonstrated ownership of CI/CD systems and measurable improvement of DORA metrics.
- Experience leading incident response processes and driving meaningful postmortem outcomes.
- Strong understanding of distributed systems, event-driven architectures with Kafka, and database performance with PostgreSQL.
- Proven ability to modernize legacy infrastructure and eliminate manual operational toil.
- Experience navigating high-ambiguity environments and translating operational friction into prioritized infrastructure roadmaps.
- Nice to have: experience operating high-throughput Kafka clusters, tuning PostgreSQL or Redis, implementing autoscaling, building internal developer platforms, applying security best practices, working with multi-region systems, using Python for automation, or introducing SLO/error budget/chaos testing frameworks.
- All remote hires must be able to travel to Orlando, Florida at least twice per year, in addition to an initial orientation trip to Orlando.
- Benefits:
- Health care plan including medical, dental, and vision coverage.
- Retirement plan with 401(k) and IRA options.
- Life insurance.
- Flexible vacation.
- Work-from-home option.
- Wellness resources.
- Free food and snacks in the office.
- Hybrid setup in Orlando, Florida.