Senior Technical Incident Manager

DAZN
London, Greater London

Overview

Job Description

Are you passionate about technology and love the idea of leading technical incident management? If you thrive in a fast-paced, tech-first environment and are ready to take a hands-on technical approach to a 24x7, 365 platform, then this is a great opportunity to support the platform availability and technical monitoring of DAZN's global cloud-based sports streaming platform!

Join DAZN's growing Technical Incident Management team to drive the evolution of the DAZN Critical Incidents Technical Response Group, handling all technical incident types. You will take ownership of Incident Management as the key decision-maker, with the authority to direct the problem resolution path for the fastest restoration of any service. In managing and restoring services impacted by critical incidents, you'll apply the right technical resources and act as technical lead for major incident calls. You'll be involved in dynamic, varied work, from determining the client impact and agreeing on resolution actions to managing the technical communication channel and collaborating with other Incident Managers. You will be passionate about delivering a Major Incident Management process of top quality and integrity, and about acting as the interface to the other Technology and Development stakeholders. Plus, you'll have a unique opportunity to interact with suppliers!

You'll be joining a growing team who are constantly looking for ways to evolve our technology with cutting-edge solutions like AWS, ECS and Lambda, using varied languages from Node.js to Java and PHP. We love innovation and out-of-the-box thinking, so if you are looking for a chance to really push technical boundaries and work with cutting-edge technology, then DAZN is the place to be!
HERE'S A BREAKDOWN OF WHAT YOU'LL DO (NOT ALL OF IT, JUST THE MOST IMPORTANT STUFF)
* Take line management responsibility for a team of Technical Incident Managers and senior engineers
* Technically lead all aspects of critical incidents (S1-S3), focused on the fastest service restoration/recovery: the bridge, team communication channels, and sync points for sub-tech teams leading investigations (including 3rd-party vendors and DAZN engineering teams)
* Be responsible for the quality and integrity of the Major Incident Management process and be the interface with OPS Incident Managers, Support teams, and DAZN Development/Engineering teams
* Support and lead technical incidents requiring the deep technical and problem resolution skills of the team; this may include working across regions with other TIMs, Engineering teams, vendors and suppliers to support 24x7 coverage
* Partner with other Support, Dev and Engineering teams to resolve difficult or unique system issues that team members are not equipped to handle
* Provide recommendations on troubleshooting and other technology improvements to quickly resolve incidents, ensuring infrastructure and application stability
* Assume leadership responsibility during an S1 to direct the TIM team as they work towards service restoration, and lead S1-S3 tech incident bridge calls: determine the SMEs needed, identify the problem and release/de-escalate after diagnosis
* Build strong internal and external relationships with technical teams, customers and third parties
* Ensure the TIM team meets resolution specifications as defined in the SLA while also enabling a reduction of mean time to resolution
* Have an attitude of flexibility and willingness to support a 24x7 global operation via off-hours support or on-call availability

DO YOU HAVE THESE ESSENTIALS?
* Demonstrated leadership and team management abilities from managing a senior technical team
* Strategic and tactical thinking, with quantitative and analytical skills, while under pressure
* Working knowledge of ITIL incident, problem, and change management components
* The ability to co-ordinate technical, incident and supplier-side teams to ensure that all incidents are accurately prioritised and effectively managed
* Experience with systems across cloud-based environments and with applications built on a microservices architecture
* Extensive experience of managing major incidents, especially those that have a significant cross-service impact, including how to influence technical teams not under your direct control
* The ability to identify early indications of major incidents not progressing well and the skill to engage the right teams to get them on track
* Excellent written and oral communication skills, with a special focus on customer/client-level interaction
* Practical experience with incident/outage and crisis management
* Experience with monitoring tools like Checkmk, Nagios, PagerDuty, Datadog, New Relic or similar
* Working knowledge of physical IT infrastructures such as Enterprise Server Platforms and related IT architectures and equipment
* Exposure to working with log aggregation tools like ELK, Logz.io or similar

NOT ESSENTIAL BUT GREAT IF YOU ALSO HAVE
* Broadcast industry experience and understanding of broadcast s