Data Pipeline Monitoring and Troubleshooting: Ensuring that all data pipelines are operating correctly and addressing any issues promptly.
Data Quality Assurance: Implementing and overseeing processes to ensure the accuracy and integrity of data.
Query Optimization: Continuously optimizing queries and data retrieval methods for performance and efficiency.
Integration of New Data Sources: Adding and integrating new data sources into the existing data architecture.
Automated Script Execution and Monitoring: Running and monitoring automated scripts for data extraction, transformation, and loading (ETL).
Collaboration with Stakeholders: Communicating with business analysts, data scientists, and other stakeholders to understand data needs and requirements.
Performance Reviews of Data Systems: Evaluating the performance of databases, data lakes, and other storage systems.
Data Cleaning and Transformation: Regularly cleaning and transforming data to maintain its usefulness and relevance.
Report Generation: Creating and updating routine data reports for internal use or for stakeholders.
Code Reviews and Updates: Reviewing and updating ETL scripts and data pipeline code for improvements and efficiency.
Data Modeling and Database Design: Developing and refining data models, and optimizing database designs for performance and scalability.
Documentation and Knowledge Sharing: Updating documentation for data pipelines and databases, and sharing knowledge with the team.
Requirements:
Experience:
Relevant Work Experience: Hands-on experience in a data engineering role, demonstrating skills in managing large datasets and performing complex data processing tasks.
Educational Background:
Degree in Computer Science, Engineering, Mathematics, or a related field
Technical Skills:
Programming Languages: Proficiency in languages such as Python, Java, or Scala.
Database Management: Deep understanding of SQL and experience with relational databases, as well as NoSQL databases such as MongoDB.
Big Data Technologies: Familiarity with big data tools such as Apache Hadoop, Spark, and Kafka.
ETL Tools: Experience with ETL tools and processes.
Data Modeling: Knowledge of data modeling and understanding of different data structures.
Data Warehousing: Familiarity with data warehousing concepts and solutions.
Data Visualization Tools: Skills in data visualization tools such as Tableau or Power BI are beneficial for presenting data insights effectively.