Data Engineer (2)

Main Purpose of the Job:

• Data Engineers build and support data pipelines and the datamarts built from those pipelines. Both must be scalable, repeatable and secure. They help facilitate getting data from a variety of sources, in the correct format, assuring that it conforms to data quality standards and that downstream users can access it timeously. This role functions as a core member of an agile team.

• These professionals are responsible for the infrastructure that provides insights from raw data, handling and integrating diverse sources of data seamlessly. They enable solutions that handle large volumes of data in batch and in real time, leveraging emerging technologies from both the big data and cloud spaces. Additional responsibilities include developing proofs of concept and implementing complex big data solutions with a focus on collecting, parsing, managing, analysing and visualising large datasets. They know how to apply technologies to solve the problems of working with large volumes of data in diverse formats and to deliver innovative solutions.

• Data Engineering is a technical job that requires substantial expertise in a broad range of software development and programming fields. These professionals have knowledge of data analysis, end-user requirements and business requirements analysis, which they use to develop a clear understanding of the business need and to incorporate it into a technical solution. They have a solid understanding of physical database design and the systems development lifecycle. This role must work well in a team environment.


Job Objectives

• Design and develop data feeds from an on-premises environment into a datalake hosted in the AWS cloud

• Design and develop programmatic transformations of the data to correctly partition, format and validate or correct it

• Design and develop programmatic transformations, combinations and calculations to populate complex datamarts based on feeds from the datalake

• Provide operational support for datamart data feeds and datamarts

• Design infrastructure required to develop and operate datalake data feeds

• Design infrastructure required to develop and operate datamarts, their user interfaces and the feeds required to populate them


Task Information (each task is listed with its associated activities below)

• Design and develop data feeds from an on-premises environment into a datalake hosted in the AWS cloud (see the sketch after this list)

- Establish the functional and non-functional requirements for the feed

- Work with the integration team to design the process for managing and monitoring the feeds to the company's standards

- Work with the integration team to build and test the feed components
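
As an illustration only, the feed-landing step in this task might resemble the Python sketch below, which uses boto3 to place an on-premises extract file in an S3 raw zone. The bucket name, key layout, paths and source-system name are hypothetical examples, not details of this role's actual environment.

    # Land a daily on-premises extract file in the datalake's raw zone on S3.
    # Bucket name, key layout and paths are hypothetical examples.
    import datetime
    import boto3

    def land_extract(local_path: str, source_system: str) -> str:
        """Upload an extract to the raw zone, keyed by source system and load date."""
        s3 = boto3.client("s3")
        load_date = datetime.date.today().isoformat()
        filename = local_path.rsplit("/", 1)[-1]
        key = f"raw/{source_system}/load_date={load_date}/{filename}"
        s3.upload_file(Filename=local_path, Bucket="example-datalake-raw", Key=key)
        return key

    # Example usage:
    # land_extract("/data/exports/sales_20240101.csv", "pos_sales")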

• Design and develop programmatic transformations of the data to correctly partition, format and validate or correct it (see the sketch after this list)

- Establish the functional and non-functional requirements for formatting and validating the data feed

- Design processes, appropriate to high-volume data feeds, for managing and monitoring the feeds to the company's standards

- Build and test the formatting and validation transformation components
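
A minimal PySpark sketch of this formatting-and-validation step is shown below, assuming a CSV feed already landed in a raw S3 zone: it enforces column types, separates valid from rejected rows, and writes the valid rows partitioned by date. The paths, column names and validation rules are hypothetical examples.

    # Validate, format and partition a raw feed with PySpark (illustrative names).
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("feed-validation").getOrCreate()

    # Read the raw landed feed (hypothetical path and layout).
    raw = spark.read.option("header", True).csv("s3://example-datalake-raw/pos_sales/")

    # Enforce the expected column types and date format.
    typed = raw.select(
        F.col("txn_id").cast("long"),
        F.to_date("txn_date", "yyyy-MM-dd").alias("txn_date"),
        F.col("store_id").cast("int"),
        F.col("amount").cast("decimal(12,2)"),
    )

    # Separate rows that pass the quality rules from those that do not.
    valid = typed.filter(F.col("txn_id").isNotNull() & F.col("txn_date").isNotNull())
    rejected = typed.filter(F.col("txn_id").isNull() | F.col("txn_date").isNull())

    # Partition valid data by date for the datalake; route rejects to quarantine.
    valid.write.mode("append").partitionBy("txn_date").parquet(
        "s3://example-datalake-clean/pos_sales/")
    rejected.write.mode("append").parquet(
        "s3://example-datalake-quarantine/pos_sales/")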

• Design and develop programmatic transformations, combinations and calculations to populate complex datamarts based on feeds from the datalake (see the sketch after this list)

- Establish the requirements that a datamart should support

- Design the target data model, the transformations and the feeds, appropriate to high volume data flows, required to populate the datamarts

- Build and test the target data model, the transformations and the feed required to populate the datamarts
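
A minimal PySpark sketch of this kind of datamart load is shown below: it combines two cleansed datalake feeds, derives a calculated measure and publishes an aggregated fact table. The table layout, paths and business rules are hypothetical examples.

    # Populate a hypothetical daily-sales datamart from cleansed datalake feeds.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("sales-datamart").getOrCreate()

    sales = spark.read.parquet("s3://example-datalake-clean/pos_sales/")
    stores = spark.read.parquet("s3://example-datalake-clean/store_master/")

    # Combine, aggregate and calculate the measures the datamart should support.
    daily_store_sales = (
        sales.join(stores, "store_id", "left")
        .groupBy("txn_date", "store_id", "region")
        .agg(
            F.sum("amount").alias("gross_sales"),
            F.countDistinct("txn_id").alias("txn_count"),
        )
        .withColumn("avg_basket_value", F.col("gross_sales") / F.col("txn_count"))
    )

    # Publish the fact table, partitioned by date for efficient downstream queries.
    daily_store_sales.write.mode("overwrite").partitionBy("txn_date").parquet(
        "s3://example-datamart/daily_store_sales/")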

• Provide operational support for datamart data feeds and datamarts

- Identify and perform maintenance on the feeds as appropriate

- Work with the front-line support team and operations to support the feed in production

• Design infrastructure required to develop and operate datalake data feeds

- Specify infrastructure requirements for the feeds and work with the operations team to implement those requirements and to deploy the solution and future updates

• Design infrastructure required to develop and operate datamarts, their user interfaces and the feeds required to populate them

- Specify infrastructure required to develop and operate datamarts

- Specify the infrastructure, in terms of front-end tools, required to exploit the datamarts for end users, and work with the front-end team to deploy a complete solution for the user

- Specify and build any feeds required to populate front-end tools, and work with the front-end team to optimise the performance of the overall solution (a minimal sketch of such an extract feed follows)
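
As an illustration of the last activity, the sketch below publishes a recent slice of a datamart as a single CSV extract that a front-end tool could consume. The paths, the 90-day window and the single-file output are hypothetical choices; for larger volumes a direct connection from the front-end tool to the datamart would usually perform better.

    # Publish a recent datamart slice as a CSV extract for a front-end tool.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("frontend-extract").getOrCreate()

    mart = spark.read.parquet("s3://example-datamart/daily_store_sales/")

    # Keep roughly the last 90 days of data (hypothetical window).
    recent = mart.filter(F.col("txn_date") >= F.date_sub(F.current_date(), 90))

    # coalesce(1) produces a single output file, which many desktop BI tools expect.
    recent.coalesce(1).write.mode("overwrite").option("header", True).csv(
        "s3://example-frontend-extracts/daily_store_sales_90d/")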


Report Structure

• Reports to manager for Data Management and Decision Support


Impact of Decision

• Time Span - Operational

• Problem solving - Complex to Highly Complex

• Risk of decisions - High Internal

• Financial impact - Medium

• Influence of work - Operational

• Work proficiency - Professional

• Demands of change - High


Job Related Experience

• Retail operations, 4+ years - Desirable

• Business Intelligence, 4+ years - Essential

• Big Data, 2+ years - Desirable

• Extract, Transform and Load (ETL) processes, 4+ years - Essential

• Cloud (AWS), 2+ years - Essential

• Agile exposure (Kanban or Scrum), 2+ years - Essential


Formal Qualification

• IT-related qualification, 3 years - Essential

• AWS Certification at least to associate level - Essential


Job Related Knowledge

• Creating data feeds from on-premises systems to the AWS Cloud, 24 months - Essential

• Supporting data feeds in production on a break-fix basis, 24 months - Essential

• Creating data marts using Talend or a similar ETL development tool, 48 months - Essential

• Manipulating data using Python and PySpark, 24 months - Essential

• Processing data using the Hadoop paradigm, particularly using EMR (AWS's distribution of Hadoop), 24 months - Essential

• DevOps for Big Data and Business Intelligence, including automated testing and deployment, 24 months - Essential


Job Related Skills

• Talend, 12 months - Essential

• AWS: EMR, EC2, S3, 12 months - Essential

• Python, 12 months - Essential

• PySpark or Spark, 12 months - Desirable

• Business Intelligence data modelling, 36 months - Essential

• SQL, 36 months - Essential


Working Conditions

• Working hours - Normal

• Travel - None

• Time away from home - None

• Posture - Workstation in open plan office

• Physical danger - None

• Physical environment - Brackenfell, Cape Town, South Africa


Competencies

Essential

• Planning & Organising (Structuring tasks)

• Evaluating problems

• Executing assignments

• Achieving success

• Analytical thinking

• Communication


Desirable

• Creative thinking


Kindly regard your application as unsuccessful if you have not heard from the agency within 2 weeks.