Latin America, Remote

Senior Kubernetes Developer – OPS00016

At Dev.Pro, we partner with businesses worldwide, from startups to Fortune 500 companies — across fintech, retail, hospitality and beyond.

With a remote‑first mindset and a team in 55+ countries, we focus on aligning technical expertise with client needs, communicating clearly, and staying adaptable as priorities shift. This commitment to ownership and flexibility helps us create lasting partnerships — so you can focus on what you do best.

About this opportunity

We invite a skilled Kubernetes Developer to join our fully remote, international team. In this role, you’ll build and optimize the Kubernetes orchestration platform and develop custom operators to run HPC/AI workloads efficiently on GPU clusters. You’ll enhance infrastructure performance and reliability, create internal tools to improve the developer experience, and ensure multi-tenant HPC workloads remain secure and compliant.

What’s in it for you:

• Work on cutting-edge GPU infrastructure and next-gen HPC/AI workloads

• Build a Slurm-on-Kubernetes product from scratch and shape its architecture

• Collaborate with a top-tier international team and grow through continuous learning and conference participation

Is that you?

• 3+ years of hands-on Kubernetes experience in production

• Experience with HPC schedulers (Slurm, PBS, LSF, Volcano)

• Strong background in GPU resource management and distributed systems

• Experience with cloud/hybrid cloud architectures (AWS, GCP, Azure, on-prem GPU clusters)

• Knowledge of Kubernetes operators, CRDs, scheduling, networking, and storage

• Deep knowledge of HPC job scheduling and workload orchestration

• Expertise in IaC (Terraform, Helm, or GitOps with Argo CD/Flux) and in monitoring and observability (Prometheus, Grafana, Jaeger, ELK)

• Programming skills in Go, Python, Bash/Shell

• Familiarity with PyTorch, TensorFlow, distributed training, and model serving

• Skills in Linux administration, performance tuning, and advanced networking (RDMA, InfiniBand, TCP/IP, DNS, load balancing)

• Experience in storage management and optimization for large datasets

Key responsibilities and your contribution

In this role, you’ll design, develop, and manage Kubernetes platforms for GPU-intensive AI/HPC workloads.

• Design and build a Slurm-like orchestration layer on Kubernetes for HPC/AI workloads

• Develop custom operators and controllers for GPU job scheduling and execution

• Integrate batch schedulers with Kubernetes to provide a hybrid HPC/Cloud product

• Implement advanced GPU resource management

• Build internal tools and a self-service platform to simplify AI/HPC job deployment and management

• Build a cloud-native platform for AI training, inference, and HPC workloads

• Optimize scheduling to improve GPU utilization and reduce queue times

• Monitor GPU clusters, troubleshoot production issues, and ensure high availability, fault tolerance, and disaster recovery

• Develop CI/CD pipelines for GPU-intensive workloads

• Implement best practices for multi-tenant GPU clusters with AI/HPC workloads

• Ensure compliance with data sovereignty and international regulations

• Maintain secure container, runtime, and workload isolation policies
