MUST HAVE:
• A minimum of 10-12 years of hands-on development experience implementing batch and event-driven applications using Java, Kafka, Spark, Scala, PySpark, and Python.
• Experience with Apache Kafka and connectors, Java, and Spring Boot for building event-driven services, and with Python for building ML pipelines.
• Develop data pipelines that ingest large volumes of diverse data from a variety of sources.
• Help evolve the data architecture and work on next-generation real-time pipeline algorithms and architecture, in addition to supporting and maintaining current pipelines and legacy systems.
• Write code and develop worker nodes for business logic, ETL and orchestration processes.
• Develop algorithms for better attribution rules and category classifiers.
• Work with stakeholders throughout the organization to identify opportunities for leveraging company data to drive search, discovery, and recommendations.
• Work closely with architects, engineers, data analysts, data scientists, contractors/consultants, and project managers to assess project requirements and to design, develop, and support data ingestion and API services.
• Work with data scientists to build feature engineering pipelines and integrate machine learning models during the content enrichment process.
• Able to influence priorities while working with various partners, including engineers, the project management office, and leadership.
• Mentor junior team members, define architecture, review code, develop hands-on, and deliver work within sprint cycles.
• Participate in design discussions with Architects and other team members for the design of new systems and re-engineering of components of existing systems.
• Wear an architect's hat when required, bringing new ideas, thought leadership, and forward thinking to the table.
• Take a holistic approach to building solutions by thinking of the big picture and overall solution.
• Work on moving away from legacy systems toward next-generation architecture.
• Take complete ownership from requirements through solution design, development, production launch, and post-launch production support. Participate in code reviews and regular on-call rotations.
• Desire to apply the best solutions in the industry, apply correct design patterns during development, and learn best practices and data engineering tools and technologies.
• Perform any other functions and duties assigned and necessary for smooth and efficient operations.
EDUCATION & EXPERIENCE:
• BS or MS in Computer Science (or a related field) with 12+ years of hands-on software development experience building large-scale data processing pipelines.
• Must-have skills: Apache Spark, Scala, and PySpark, with 2-4 years of experience building production-grade batch pipelines that handle large volumes of data.
• 8+ years of experience in Java and API/microservices development.
• 5+ years of experience in Python.
• 5+ years of experience understanding and writing complex SQL and stored procedures for raw data processing, ETL, and data validation, using databases such as SQL Server, Redis, and other NoSQL databases.
• Knowledge of big data technologies such as Hadoop and HDFS.
• Expertise in building event-driven pipelines with Kafka and Java/Spark.
• Expertise with the AWS stack, including EMR, EC2, and S3.
• Experience working with APIs to collect and ingest data, as well as building APIs for business logic.
• Experience setting up, maintaining, and debugging production systems and infrastructure.
• Experience in building fault-tolerant and resilient systems.
• Experience building worker nodes; knowledge of REST principles and data engineering design patterns.
• In-depth knowledge of Java, Spring Boot, Spark, Scala, PySpark, Python, orchestration tools, ESB, SQL, stored procedures, Docker, RESTful web services, Kubernetes, CI/CD, observability techniques, Kafka, release processes, caching strategies, versioning, B&D, Bitbucket/Git, the AWS cloud ecosystem, NoSQL databases, and Hazelcast.
• Strong software development, architecture diagramming, problem-solving and debugging skills.
• Phenomenal communication and influencing skills.
NICE TO HAVE:
• Exposure to machine learning (ML) and LLMs, and to using AI tools during coding and building with AI.
• Knowledge of Elastic APM, the ELK stack, and search technologies such as Elasticsearch/Solr.
• Some experience with workflow orchestration tools such as Apache Airflow or Apache NiFi.
Apply Now