Job Description:
• Work collaboratively with Product Managers, Designers, and Engineers to set up, develop, and maintain critical back-end integrations for a data and analytics platform.
• Create new and maintain existing data pipelines, Extract, Transform, and Load (ETL) processes, and ETL features using Azure cloud services.
• Build, expand, and optimize data and data pipeline architectures.
• Optimize data flow and collection for cross-functional teams of database architects, data analysts, and data scientists.
• Operate large-scale data processing pipelines and resolve business and technical issues pertaining to data processing and data quality.
• Assemble large, complex data sets that meet functional and non-functional business requirements.
• Identify, design, and implement internal process improvements, including redesigning data infrastructure for greater scalability, optimizing data delivery, and automating manual processes.
• Develop and document standard operating procedures (SOPs) for new and existing data pipelines.
• Build analytical tools that utilize the data pipeline, providing actionable insights into key business performance metrics, including operational efficiency and customer acquisition.
• Write unit and integration tests for all data processing code.
• Read data specifications and translate them into code and design documents.
Requirements:
• All candidates must pass a Public Trust clearance through the U.S. Federal Government.
• Bachelor's degree in Computer Science, Software Engineering, Data Science, Statistics, or related technical field.
• 8+ years of experience in software/data engineering, including data pipelines, data modeling, data integration, and data management.
• Expertise in data lakes, data warehouses, data meshes, data modeling, and data schemas (e.g., star, snowflake).
• Extensive experience with Azure cloud-native data services, including Synapse, Data Factory, DevOps, Key Vault, etc.
• Expertise in SQL, T-SQL, and Python, with applied experience in Apache Spark and large-scale processing using PySpark.
• Proficiency with data formats: Parquet, distributed Snappy-compressed Parquet, and CSV.
• Understanding of common connection protocols, such as SFTP.
• Proven ability to work with incomplete or ambiguous data infrastructure and to design integration strategies.
• Excellent analytical, organizational, and problem-solving skills.
• Strong communication skills, with the ability to translate complex concepts across technical and business teams.
• Proven experience working with petabyte-scale data systems.
Benefits:
• Highly competitive salary
• Full healthcare benefits