Cloud Operations Site Reliability Engineer

NATIONWIDE BUILDING SOCIETY,
London, Greater London

Overview

Job Description

We're no strangers to being at the forefront of technological breakthroughs: first in the UK to launch internet banking, joint first with Apple Pay, and leading-edge Cisco technology in our branches. Building on that success, we're now on a hugely exciting and ambitious transformation journey to become a simpler, more agile and more innovative organisation, while remaining relentlessly focused on delivering legendary products and services to our members.

Our adoption of cloud is fundamental to the success of our material investment in technology, now and in the future. Creating first-class cloud products and capabilities that can be harnessed and reused is critical to achieving our strategy, delighting our members and building the foundation of our digital capability. Our ambition is set high, which is why we need experienced cloud talent to drive forward the delivery of our cloud strategy. And that's where you come in.

Want to get involved in cloud-based technical challenges rather than just reading blog articles about them? Do you live and breathe hyperscale public cloud providers? Do you appreciate the similarities and differences between DevOps and SRE?

As part of Nationwide's exciting and growing team of Cloud Operations Site Reliability Engineers (SREs), you'll use your expertise to create technical artefacts that deliver against Cloud Operations SRE objectives aligned to our engineering principles and architectural strategy. Within the IT Operations and Service Delivery community, and working in our Cloud Centre of Excellence, we are looking for talented people who can act like a developer and think like a systems operator.

What you'll be doing

You'll be a core member of the team responsible for the reliability, availability and performance of the cloud platforms and services that underpin Nationwide.
You will:

* Influence architectural and design decisions regarding the operational reliability of our cloud platforms
* Collaborate with architecture, security and engineering teams and the wider Cloud Centre of Excellence to set up best-in-class cloud platforms, and then provide guidance on their consumption
* Drive efficiency and cost reduction by automating manual and repetitive tasks
* Develop features and codified artefacts for platform improvements using automation and infrastructure as code
* Provide technical expertise at some or all stages of the delivery lifecycle
* Enforce standards, provide governance and contribute to process improvement
* Identify and escalate dependencies, risks and exceptions that will affect the implementation of technical artefacts
* Share knowledge with peers to grow expertise within Cloud Operations SRE

About you

* Exposure to and experience in public cloud
* Hands-on experience automating the deployment and monitoring of services in public cloud
* Ability to design and implement cloud platforms for elasticity and scalability
* Hands-on experience with infrastructure as code and post-deployment configuration management using tools such as Terraform, ARM, DSC, Puppet, Chef and Ansible
* Understanding of continuous deployment using CI/CD tools such as Jenkins, source control (Git, SVN) and code reviews
* Experience making changes to production and live systems while ensuring service uptime and adhering to SDLC best practices
* Exposure to Docker, Kubernetes and cloud-native development frameworks (serverless and PaaS)
* Good scripting and development skills
* A strong understanding of core network protocols and services (TCP/IP, DNS)
* Experience in system administration, including configuration and troubleshooting
* A strong understanding of IAM roles and policies
* Ability to analyse network behaviour, performance and application issues using standard tools
* Knowledge of databases and replication methodologies
* Experience with distributed systems design, maintenance and disaster recovery

It would be nice if you also had:

* Ability to make design decisions that minimise and optimise infrastructure cost
* Experience of scripting against APIs
* Developer and/or administrator certifications

The extras you'll get

Our people's success isn't based on how long they spend at their desk. While you'll have contracted hours, we want to offer a flexible environment where possible. That might be working from home, logging on from other offices across the UK, or working part time or compressed hours. We've let you know about the flexibility available for this role at the start of the advert, so you can quickly decide if it suits how you'd like to work.

There are all sorts of employee benefits available at Nationwide, including:

* A personal pension: if you put in 7% of your salary, we'll top up by a further 16%
* Up to 2 days of paid volunteering a year
* Life assurance worth