Data Engineer

PredictX
London, Greater London

Overview

Job Description

Do you enjoy being at the forefront of technological innovation? Are you excited to see what new developments in AI, machine learning and predictive analytics will bring? PredictX helps business leaders make better decisions. Trusted by some of the world's biggest brands, our advanced software solution uses AI and machine learning technology to automate tactical tasks and enhance strategic decisions. We are headquartered in London, with offices in Spain, Poland and the USA.

We are looking for a forward-thinking and highly competent Data Engineer to join our London team. As a Data Engineer, you will work on our data pipeline architecture, with the aim of providing clean, usable data to our Business Analysts and Data Scientists. You will help build modular pipeline components and ensure that technical documentation is created and maintained.

Key Responsibilities

* Helping to design, build, maintain and operate the data pipeline.
* Defining and building modular data pipeline components.
* Ensuring that solid development practices, such as proper use of source control, full testing processes and automated deployment mechanisms, are followed.
* Collaborating with our Data Scientists and Business Analysts to discover where business value can be found within the data we have available.
* Maintaining existing systems and supporting migration to our new data pipeline architecture.
* Training our Client Analytics and Implementation teams in how to implement and support our clients.
* Acting as a subject matter expert on all aspects of the data pipeline.
* Identifying potential performance issues, bottlenecks and pain points, and recommending new and creative ways of resolving them.

Who you are

* We want you to come with creativity, expertise, flexibility and drive, but above all a desire to learn and keep learning.
* We want you to seek to understand the big picture and how your work makes a difference.

Experience

* 3+ years of proven experience using Python to build data pipelines, including familiarity with Python's core big data and data science libraries, e.g. pandas, PySpark and scikit-learn.
* Solid understanding of database design and SQL.
* Experience working in cross-functional agile teams, particularly teams including Data Scientists, Software Engineers and Business Analysts.
* The ability to communicate complicated technical solutions to non-technical users.
* Taking ownership of feature development and ongoing maintenance.
* Technical understanding of infrastructure components, their dependencies, and the interactions between servers, virtual systems, networks, databases, web applications, etc.

Skills

* Distributed data processing, for example Spark
* NoSQL databases, such as MongoDB or Couchbase
* Cloud computing platforms, such as Google Cloud Platform or AWS
* Pipeline orchestration, for example Airflow