Senior MLOps Platform Architect (AWS | Kubernetes | Terraform)
TheHRchapter is looking for you! The most diverse and speedy headhunting agency is looking for international talent. Work on international projects with the best perks and compensation rates.
🚀 We are hiring a senior MLOps/DevOps/SRE hybrid who can build an entire AI platform infrastructure end-to-end. This is not a research role and not a standard ML Engineer role. If you haven’t designed production-grade MLOps infrastructure, haven’t built CI/CD for ML, or haven’t deployed ML workloads on Kubernetes at scale, this role is not a fit.
You will design, build, and own the AWS-based infrastructure, Kubernetes platform, CI/CD pipelines, and observability stack that supports our AI models (Agentic AI, NLU, ASR, Voice Biometrics, TTS). You will be the technical owner of MLOps infrastructure decisions, patterns, and standards.
Location: Remote - Europe (PL/ES/PT/CZ/CY)
Key Responsibilities:
MLOps Platform Architecture (from scratch)
- Design and build AWS-based AI/ML infrastructure using Terraform (required).
- Define standards for security, automation, cost efficiency, and governance.
- Architect infrastructure for ML workloads, GPU/accelerators, scaling, and high availability.
Kubernetes & Model Deployment
- Architect, build, and operate production Kubernetes clusters.
- Containerize and productize ML models (Docker, Helm).
- Deploy latency-sensitive and high-throughput models (ASR/TTS/NLU/Agentic AI).
- Ensure GPU and accelerator nodes are properly integrated and optimized.
CI/CD for Machine Learning
- Build automated training, validation, and deployment pipelines (GitLab/Jenkins).
- Implement canary, blue-green, and automated rollback strategies.
- Integrate MLOps lifecycle tools (MLflow, Kubeflow, SageMaker Model Registry, etc.).
Observability & Reliability
- Implement full observability (Prometheus + Grafana).
- Own uptime, performance, and reliability for ML production services.
- Establish monitoring for latency, drift, model health, and infrastructure health.
Collaboration & Technical Leadership
- Work closely with ML engineers, researchers, and data scientists.
- Translate experimental models into production-ready deployments.
- Define best practices for MLOps across the company.
Qualifications and Skills:
We’re looking for a senior engineer with a strong DevOps/SRE background who has worked extensively with ML systems in production. The ideal candidate brings a combination of infrastructure, automation, and hands-on MLOps experience.
- 5+ years in a Senior DevOps, SRE, or MLOps Engineering role supporting production environments.
- Strong experience designing, building, and maintaining Kubernetes clusters in production.
- Hands-on expertise with Terraform (or similar IaC tools) to manage cloud infrastructure.
- Solid programming skills in Python or Go for building automation, tooling, and ML workflows.
- Proven experience creating and maintaining CI/CD pipelines (GitLab or Jenkins).
- Practical experience deploying and supporting ML models in production (e.g., ASR, TTS, NLU, LLM/Agentic AI).
- Familiarity with ML workflow orchestration tools such as Kubeflow, Apache Airflow, or similar.
- Experience with experiment tracking and model registry tools (e.g., MLflow, SageMaker Model Registry).
- Exposure to deploying models on GPU or specialized hardware (e.g., Inferentia, Trainium).
- Solid understanding of cloud infrastructure on AWS, including networking, scaling, storage, and security best practices.
- Experience with deployment tooling (Docker, Helm) and observability stacks (Prometheus, Grafana).
Ways to Know You’ll Succeed
- You enjoy building platforms from the ground up and owning technical decisions.
- You’re comfortable collaborating with ML engineers, researchers, and software teams to turn research into stable production systems.
- You like solving performance, automation, and reliability challenges in distributed systems.
- You bring a structured, pragmatic, and scalable approach to infrastructure design.
- You are energetic and proactive, with a natural drive to take initiative and move things forward.
- You enjoy working closely with people: researchers, ML engineers, cloud architects, and product teams.
- You are comfortable sharing ideas openly, challenging assumptions, and contributing to technical discussions.
- You have a collaborative mindset: you like to build together, not work in isolation.
- You have a strong ownership mentality and enjoy taking responsibility for systems end-to-end.
- You are curious, hands-on, and motivated by solving complex technical challenges.
- You are a clear communicator who can translate technical work into practical recommendations.
- You thrive in a fast-paced environment where you can experiment, improve, and shape how things are done.
What we offer
- Competitive fixed compensation based on experience and expertise.
- Work on cutting-edge AI systems used globally.
- Dynamic, multi-disciplinary teams engaged in digital transformation.
- Remote-first work model
- Long-term B2B contract
- 20+ days paid time off
- Apple gear
- Training & development budget

Our core values at TheHRchapter
✔️ Transparency: We believe in transparent and smooth recruitment processes. You will get feedback from us.
✔️ Candidate experience: A blend of automated and human recruitment processes. Don't hesitate to ask us for feedback at any time.
✔️ Talent pool: We bring highly skilled, motivated candidates to our clients. Our candidates match their clients' company values and management style.
✔️ Diversity and inclusion: There is no place for discrimination and intolerance. We care about diversity awareness and respect for any differences.
Department: IT & Tech vacancies
Locations: Multiple locations
Remote status: Fully Remote
About theHRchapter
Your Strategic Partner for HR, Payroll & Headhunting Solutions