Site Reliability Manager

Job description

Christchurch

Hybrid working - 3 days home working and 2 days in the office

Up to £60,000 DOE

A new opportunity has arisen for a Site Reliability Manager at one of the most exciting tech companies based in Christchurch.

As a Site Reliability Manager, you will have functional knowledge of all areas of SaaS Operations and software delivery enablement, with experience managing large-scale global application infrastructure and software delivery. The successful candidate should have experience working with a Software-as-a-Service offering. Candidates should also have experience designing, planning, implementing, tuning and operating software application technologies including automation code, cloud environments, micro-service architectures, and clustering technology. The right candidate will know and follow all applicable industry best practices for management of a global application platform.

About the team:
As a member of our SaaS Operations team, you will join a highly motivated group of bright, fast-paced engineers. You'll work to migrate datacenter infrastructure to a cutting-edge cloud environment that will power our company's impressive growth. We are smart, innovative, and ambitious, and are looking for great people to join us.

What you'll do:

  • Lead and manage our high-performing site reliability team while remaining hands-on.
  • Mentor, grow, and empower your team by giving them the skills, confidence, space, and motivation to make decisions independently that lead to their personal and professional success, and enable them to become technical leaders. In other words, align the growth of the people around you with business impact.
  • Participate in deep technical design discussions within your team, and across engineering teams, and ensure that we're building the right systems and keeping the quality high.
  • Drive the design, architecture, operability, security, and scaling of our platforms.
  • Help develop and maintain processes, tools, and documentation in a multi-region cloud deployment.
  • Facilitate the evaluation of automation and new software solutions.
  • Collaborate with Architects, Developers, Data Reliability, and platform teams on designing scalable and highly available systems.
  • Ensure proper security, monitoring, alerting and reporting for application platforms.
  • Troubleshoot and resolve production issues.
  • Help drive the capacity planning process.

What you'll bring:

  • You have 5+ years of software support, reliability, or operations engineering experience in a highly customer-focused SaaS environment.
  • Experience in migrating from datacenter infrastructure to the public cloud.
  • Experience in designing for the cloud and utilizing cloud native solutions.
  • Experience with medium-scale to large-scale Windows and Linux production environments, preferably as part of an online service provider.
  • Strong sense of ownership of large projects and complex tasks.
  • You have production experience with multiple cloud vendors.
  • You endorse infrastructure as code.
  • You have a proven track record of managing diverse and distributed teams, ensuring all members can bring their best.
  • You possess strong leadership skills and the ability to motivate teams.
  • You will bring a collaborative partnership mindset, focused on business impact.
  • Ability to solve problems quickly while taking an automation-first approach.
  • Hands-on experience with release, deployment, and environment lifecycle management.
  • Experience with Open Source technologies.
  • Experience with virtualization and container technologies.
  • Hands-on experience with infrastructure-as-code tools and CI/CD concepts (preferably HashiCorp tools like Terraform/Consul/Packer/Nomad and management tools like Kubernetes/Salt/Azure DevOps).
  • Experience with advanced automated monitoring and log aggregation systems (NewRelic, DataDog, SumoLogic, Splunk, Logstash, etc.).
  • Experience with multi-geography and distributed systems.
  • Working knowledge of web, application, database, and OS server systems (Nginx, Tomcat, IIS, SQL Server, ElasticSearch, RabbitMQ, Redis).
  • Ability to manage competing priorities in a complex environment.

Bonus if you have:

  • Previous experience working in SaaS Site Reliability Engineering (Site Reliability Leader, Software Engineering Manager, Operations Manager, etc.).
  • Bachelor's degree or equivalent work experience.

The salary is up to £60k DOE.

Hours: Monday - Friday, 09:00 - 17:30, with 60 minutes for lunch. Hybrid working is available, with 3 days working from home, to be agreed with your line manager.

Annual leave: 22 days per year plus bank holidays (annual leave will increase after 2 years' service).

Benefits: group pension scheme, death in service (4 x salary), incapacity benefit, holiday purchase scheme, Cycle2work scheme, onsite showers and restaurant, free parking, excellent wellbeing support through our EAP and staff access to our medical doctors and clinicians, social events (outside of Covid), enhanced Maternity, Paternity & Adoption leave, flexible working, Birthday day off and long service awards.

To apply for this Site Reliability Manager role, please contact lesleymorgan@spectrumit.co.uk