Cloudera Big Data Engineer

EPAM Systems,
London, Greater London

Overview

Named one of Fortune’s 100 Fastest-Growing Companies for 2019, EPAM is committed to providing our global team of 36,700 EPAMers with inspiring careers from day one. EPAMers lead with passion and honesty, and think creatively. Our people are the source of our success, and we value collaboration, work to understand our customers’ business, and strive for the highest standards of excellence. No matter where you are located, you’ll join a dedicated, diverse community that will help you discover your fullest potential.

DESCRIPTION

EPAM’s Big Data Practice is looking for Cloudera Big Data Engineers. As a Cloudera Data Engineer, you will be responsible for designing and implementing the management, monitoring, security and privacy of data using the full stack of Cloudera Hadoop ecosystem services to satisfy business needs. You are curious, persistent, logical and clever: a true techie at heart. You enjoy living by the code of your craft and developing elegant solutions to complex problems. If this sounds like you, keep reading to learn more about this exciting role. Come and join EPAM, where Engineering is in our DNA.
RESPONSIBILITIES

Implement non-relational data stores:
- Implement a solution that uses Hive, HBase, Impala and HDFS
- Implement data distribution and partitions
- Implement a consistency model in Hive/HBase
- Provision a non-relational data store in HDFS
- Provide access to data to meet security requirements
- Implement for high availability, disaster recovery and global distribution

Manage data security:
- Implement data masking
- Encrypt data at rest and in motion

Develop batch processing solutions:
- Develop batch processing solutions using Hive and Spark transformations
- Ingest data using Sqoop
- Create linked services and datasets
- Create Oozie workflow pipelines and activities
- Create and schedule jobs
- Implement Cloudera Spark clusters, Jupyter notebooks, jobs and autoscaling
- Ingest data into Cloudera HDFS

Develop streaming solutions:
- Configure input and output with Kafka
- Select the appropriate windowing functions
- Implement event processing using Spark Streaming/Kafka
- Ingest and query streaming data using Spark

Monitor Cloudera services:
- Monitor the Cloudera cluster and its services, such as Spark, Oozie workflows, HDFS and Hive

Optimize Cloudera data solutions:
- Troubleshoot data partitioning bottlenecks
- Optimize HDFS storage
- Optimize Spark Streaming analytics
- Optimize Hive/Impala analytics
- Manage the data lifecycle

REQUIREMENTS

- 5-10 years in IT
- 2-3 years with Cloudera and the Hadoop ecosystem
- Experience on projects managed with Agile or PMI methodologies
- Experience with enterprise applications, solutions and data infrastructures
- Experience designing data management solutions
- Experience designing robust CI/CD solutions
- Python/PySpark (highly desired)
- Java/Scala
- Apache Hadoop: HDFS, MapReduce
- Oozie
- Hive
- Cloudera Impala
- Spark and Spark Streaming
- Kafka
- YARN
- Sqoop
- Git, GitLab, Artifactory
- ADO
- Scrum
- Big Data Hadoop Developer certification

WHAT WE OFFER

We offer a range of discretionary benefits from time to time, including:
- Group personal pension plan, life assurance and income protection
- Private medical insurance, private dental care and critical illness cover
- Cycle scheme and season ticket loan
- Employee assistance programme
- Gym discount, Friday lunch, on-site massage and social events
- One day off for your wedding and a baby basket
- Tech purchase scheme
- Unlimited access to LinkedIn Learning solutions

Some of these benefits may be available only after you have passed your probationary period.

This job was originally posted as www.cwjobs.co.uk/job/89906942
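The streaming work described above involves selecting appropriate windowing functions. As a purely illustrative sketch (plain Python standing in for Spark Structured Streaming, with hypothetical event data), a tumbling window assigns each event to exactly one fixed-size, non-overlapping time bucket:

```python
from collections import defaultdict

def tumbling_window_counts(events, window_seconds):
    """Group (timestamp, key) events into fixed, non-overlapping
    time windows and count occurrences of each key per window."""
    windows = defaultdict(lambda: defaultdict(int))
    for ts, key in events:
        # An event at time ts falls into exactly one window, whose start
        # is the largest multiple of window_seconds not exceeding ts.
        window_start = (ts // window_seconds) * window_seconds
        windows[window_start][key] += 1
    return {start: dict(counts) for start, counts in windows.items()}

# Hypothetical click events: (epoch seconds, page)
events = [(0, "home"), (5, "home"), (12, "cart"), (61, "home")]
print(tumbling_window_counts(events, 60))
# {0: {'home': 2, 'cart': 1}, 60: {'home': 1}}
```

In Spark Structured Streaming the same grouping would typically be expressed with the built-in `window()` function on an event-time column; a sliding window, by contrast, would let windows overlap so that one event can count toward several buckets.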