Site Reliability Engineering (SRE) at CircleCI
The CircleCI SRE team is responsible for the effective operation of our platform at scale.
This team provides all of the shared services and infrastructure required by CircleCI's software, from deployment and orchestration to monitoring and logging.
The SRE team also aims to provide a consistent environment in which this software can run regardless of whether it is operating in CircleCI's multi-cloud SaaS environment or in the datacenter of one of our enterprise customers.
The SRE team meets these objectives by delivering automation and tooling to ensure that processes are scalable and reliable.
The CircleCI SRE team is globally distributed and remote-friendly.
We take advantage of multiple timezones to manage a platform for our global customer base.
What will make you successful:
Experience managing a container-based microservice architecture, including orchestration, service discovery, monitoring, and debugging
Understanding of standard networking protocols and concepts such as TCP/IP, HTTP, DNS, ICMP, the OSI model, subnetting, and load balancing
In-depth knowledge of operating systems (processes, threads, IPC, concurrency, locks, mutexes, semaphores, etc.)
Proficiency in one or more of: C, C++, Java, Python, Go
Experience managing large-scale database systems in a cloud environment
A systematic approach to problem solving, coupled with a strong sense of ownership and drive
A track record of working cooperatively with software engineering teams
A focus on security at every level of a system
A strong preference for shipping incrementally, and an understanding of the fundamentals of CI/CD
Desire to learn and grow
What you will do:
Design and deliver solutions to improve the availability, scalability, latency, and efficiency of CircleCI’s services
Engage in service capacity planning and demand forecasting, anticipating performance bottlenecks
Diagnose and resolve production issues in conjunction with software engineering teams
Architect and implement shared infrastructure used by all services within the CircleCI platform, for both SaaS and on-prem configurations
Support and advise software engineering teams in the design of scalable services
Build and maintain tools for deployment, monitoring, and debugging
Plan and execute disaster recovery drills
Participate in rotating on-call duties, including incident management
If you’re interested in joining the team at CircleCI, please send a resumé and let us know why you’d be a great fit for our team.
If you contribute to an open source project, write a blog, or have a presence on the web (Twitter, GitHub, LinkedIn, etc.) we would love to hear about it.
CircleCI is a Bay Area Best Places to Work 2016 award winner.
Founded in 2011 and headquartered in beautiful downtown San Francisco with a global remote workforce, CircleCI is venture backed by Scale Venture Partners, DFJ, Baseline Ventures and Harrison Metal Capital.
We care deeply about diversity and inclusivity.
We’re hiring at all experience levels, and seek talented teammates from a wide variety of backgrounds and experiences who are equally committed to cultivating a work environment of respect and kindness.
We carefully consider every applicant who takes the time to apply.