Senior DevOps Engineer, Infrastructure & Reliability

Remote, USA | Full-time | Posted 2026-05-31
    Description:
  • Conduct interviews with engineering teams to identify and remove operational friction in CI/CD, deployments, observability, and cloud environments.
  • Design and implement scalable infrastructure-as-code patterns using Terraform to standardize provisioning and reduce configuration drift.
  • Own and evolve the Kubernetes platform, including EKS or self-managed environments, so workloads are secure, scalable, and resilient.
  • Architect and optimize CI/CD pipelines to improve deployment frequency, reduce lead time, and increase release confidence.
  • Lead reliability initiatives such as incident response improvements, root cause analysis, and postmortem practices.
  • Design and enforce secure networking, IAM, and secrets management strategies across environments.
  • Improve observability through metrics, logs, and tracing using Datadog or similar tooling.
  • Optimize cloud costs through rightsizing, autoscaling, and architectural improvements.
  • Own disaster recovery planning, backup strategies, and multi-region resilience initiatives.
  • Refactor manual or brittle infrastructure into automated, testable, reproducible systems and drive adoption through documentation and hands-on support.
    Requirements:
  • 8+ years of experience in DevOps, SRE, or Infrastructure Engineering roles.
  • Proven experience designing and operating production Kubernetes environments at scale.
  • Deep hands-on expertise with AWS infrastructure and cloud networking.
  • Strong experience building and maintaining Terraform modules across large cloud environments.
  • Demonstrated ownership of CI/CD systems and measurable improvement of DORA metrics.
  • Experience leading incident response processes and driving meaningful postmortem outcomes.
  • Strong understanding of distributed systems, event-driven architectures with Kafka, and database performance with PostgreSQL.
  • Proven ability to modernize legacy infrastructure and eliminate manual operational toil.
  • Experience navigating high-ambiguity environments and translating operational friction into prioritized infrastructure roadmaps.
  • Nice to have: experience operating high-throughput Kafka clusters, tuning PostgreSQL or Redis, implementing autoscaling, building internal developer platforms, applying security best practices, working with multi-region systems, using Python for automation, or introducing SLO/error budget/chaos testing frameworks.
  • All remote hires must be able to travel to Orlando, Florida at least twice per year, in addition to an initial orientation in Orlando.
    Benefits:
  • Health care plan including medical, dental, and vision coverage.
  • Retirement plan with 401(k) and IRA options.
  • Life insurance.
  • Flexible vacation.
  • Work-from-home option.
  • Wellness resources.
  • Free food and snacks in the office.
  • Hybrid setup in Orlando, Florida.
