Site Reliability Engineer

Paddle
London, Greater London

Overview

Job Description

As we have only one SRE team at Paddle, our role is "Everything SRE", with a focus on infrastructure, reliability standards, and practices. By following this model:

* It's easy to spot patterns and draw similarities between services and projects.
* We act as the glue between disparate product teams, creating solutions out of distinct pieces of software.
* We enable product engineers to use DevOps practices to maintain user-facing products without divergence in practice across the business.
* We define production standards as code and work to smooth out any sharp edges, greatly simplifying things for the product engineers running their services.

Our Engineering Department is split into 3 cross-functional Product teams and an SRE team. Our Product teams are made up of an Engineering Team Lead, a Product Manager, and Frontend and Backend engineers, depending on the demands of the team. Each Product team focuses on a specific domain; currently, these are Checkout, Payments and SaaS. Our SRE team's job is to enable Product teams to take on Platform and Ops tasks themselves, and to ensure our platform is reliable.

You are empowered to use the right tech for the job. You'll have the freedom to have input into what technology and tooling are used, and to educate the rest of your colleagues accordingly. As an SRE, we want you to be a driving force in improving and automating how our Product teams develop software at all stages of its lifecycle, which we strive to achieve through strong collaboration and communication with our engineers.
Tech we use

Here's some of the tech we use day to day, but we're not expecting you to have experience in all areas:

* Go for our new services
* PHP and Laravel for our legacy system
* Docker in production and local development
* gRPC for internal services running on AWS Fargate
* AWS Lambda for event-based services
* AWS SQS for our asynchronous message queues
* Aurora MySQL and DynamoDB for persistent data storage
* Redis as a key/value store
* Terraform and CloudFormation for infrastructure management
* React for our new front-end development
* Cloudflare for our DNS

What you'll do

* Be on the on-call rotation to respond to incidents
* Author blameless postmortems
* Handle production incidents
* Create, maintain and test our system recovery process
* Enrich our operational playbooks and runbooks
* Own monitoring, alerting and SLO tracking
* Increase Product teams' velocity without violating SLOs
* Spot patterns and draw similarities between services and projects
* Develop tools to maximise engineering efficiency, such as automating the deployment infrastructure
* Be an advocate of the GitOps methodology
* Collaborate with engineers and enable them to do their job more efficiently
* Seek out processes that can be improved with automation

We'd love to hear from you if you

* Have AWS experience (we use ECS/Fargate, EC2, RDS, S3 and Lambda)
* Have a software engineering background, and ideally experience with Go, which we use for our tooling
* Have knowledge of platform and ops concepts such as networking and Linux administration
* Have experience working with microservices and distributed systems at scale
* Have experience with monitoring tools (we use New Relic, Grafana, ELK, Pingdom and PagerDuty)

Why you'll love working at Paddle

We are a diverse team of around 140 people based near Shoreditch. We care deeply about enabling a great culture which is inclusive no matter your background.
We celebrate our diverse group of talented employees and we pride ourselves on our transparent, collaborative, friendly and respectful culture. We offer a full slate of benefits, including competitive salaries, stock options, pension plans, private healthcare and on-site mental health coaching sessions. We believe in flexible working and offer all team members unlimited holidays and 4 months of paid parental leave regardless of gender.

Plus we offer some not-so-standard, extra-fun benefits, which can include anything from joining the office football team or enjoying a board game night to taking up in-office meditation sessions or in-house massages. We host regular company get-togethers and quarterly socials. We have weekly catered lunches and, of course, fully stocked fridges and cold brew on tap.

We value learning and will help you with your personal development where we can, from constant exposure to new challenges and an annual learning stipend to regular internal and external training.

About us

Our mission is to help software companies succeed, enabling them to focus on creating products the world loves. Hundreds of companies rely on our e-commerce platform to sell their software products globally, as well as our powerful analytics and marketing tools to understand and grow their businesses.

Our vision is to become the platform that all software companies use to run and grow their business. We aim to replace a fragmented ecosystem of specialised tools with a unified platform that removes the complex burden that comes with running a software busine
