• Education: Bachelor’s degree in Computer Science or a related field required; Master’s degree in a quantitative discipline highly desirable.
• Proven Execution: 6+ years of engineering experience, including at least 3 years focused specifically on MLOps or LLMOps in a production environment.
• AWS & Azure Mastery: Deep, hands-on proficiency in both ecosystems. You must be able to configure Amazon Bedrock and Azure OpenAI services, including private networking and endpoint security, on day one.
• Technical Stack: Expert-level Python, SQL, and PySpark. Extensive experience with containerization (Docker, Kubernetes) and orchestration tools (Airflow, Kubeflow, or Step Functions).
• LLM Tooling: Professional experience with evaluation and observability frameworks like LangSmith, Arize Phoenix, or WhyLabs.
• Data Science Fluency: A strong understanding of statistical validation and model evaluation metrics, and the ability to partner with Data Scientists to optimize model performance.
• Multi-Cloud Pipeline Execution: Build and maintain automated CI/CD and CT (Continuous Training) pipelines across AWS (SageMaker/Bedrock) and Azure (AI Studio).
• LLMOps Framework Implementation: Design and implement the infrastructure for Retrieval-Augmented Generation (RAG), including vector database management (OpenSearch, Pinecone, or Azure AI Search) and semantic index optimization.
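To make the RAG retrieval responsibility concrete, here is a minimal sketch of the semantic-search core that a vector database provides. The toy corpus, document IDs, and 3-dimensional "embeddings" are illustrative assumptions; a production system would store model-generated vectors in OpenSearch, Pinecone, or Azure AI Search rather than an in-memory dict.

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, index, k=2):
    # Rank indexed documents by similarity to the query embedding
    # and return the k closest document IDs.
    scored = sorted(index.items(),
                    key=lambda item: cosine_similarity(query_vec, item[1]),
                    reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]

# Hypothetical toy index: real embeddings are hundreds of dimensions wide.
index = {
    "doc-aws":   [0.9, 0.1, 0.0],
    "doc-azure": [0.8, 0.2, 0.1],
    "doc-hr":    [0.0, 0.1, 0.9],
}

print(top_k([1.0, 0.0, 0.0], index, k=2))  # → ['doc-aws', 'doc-azure']
```

The ranking step is the part "semantic index optimization" tunes in practice (index type, distance metric, recall/latency trade-offs); the managed services above expose these as configuration rather than code.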
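A Continuous Training pipeline typically gates model promotion on offline evaluation metrics of the kind listed above. The following is a minimal sketch of such a gate; the metric names, thresholds, and `promote` helper are hypothetical, not part of this role's actual pipeline.

```python
def precision_recall(y_true, y_pred):
    # Binary precision and recall from parallel label lists (1 = positive).
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def promote(candidate_metrics, baseline_metrics, min_gain=0.0):
    # Promote the retrained model only if it matches or beats the
    # current baseline on every tracked metric.
    return all(candidate_metrics[m] >= baseline_metrics[m] + min_gain
               for m in baseline_metrics)

p, r = precision_recall([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
print(round(p, 3), round(r, 3))  # → 0.667 0.667
```

In a managed pipeline (SageMaker Pipelines, Azure ML), this check would run as a conditional step between the training and model-registration stages, failing the run instead of registering a regressed model.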