AI Solution Engineer – ML Ops and Data Center DevOps

Location: Karadzicova 14, Bratislava

About InoCloud:

InoCloud is a fast-growing company specializing in providing cost-effective and flexible AI training solutions for businesses and researchers. Our GPU-accelerated infrastructure empowers our customers to scale their AI capabilities without any vendor lock-in. By taking care of all the challenges of owning and operating on-site hardware, we enable our clients to focus on what truly matters – advancing their AI projects.

Job Description:

We are seeking a talented and experienced AI Solution Engineer to join our team, primarily focusing on ML Ops, Data Center DevOps, and customer support. In this role, you will play a key part in automating processes, optimizing hardware maintenance, and providing top-notch support to our customers. If you have a passion for AI, cutting-edge technology, and a knack for helping customers succeed, this is the perfect opportunity for you.

Responsibilities:

  • Design, develop, and maintain ML Ops and Data Center DevOps processes to improve efficiency, reliability, and scalability of InoCloud’s infrastructure.
  • Collaborate with the development team to create and implement automated processes for deployment, monitoring, and maintenance of AI training infrastructure.
  • Ensure the continuous availability and performance of GPU accelerators to meet customer demands and service level agreements.
  • Provide expert technical support to customers, addressing their questions and resolving issues in a timely and professional manner.
  • Collaborate with cross-functional teams to identify and implement hardware maintenance and optimization strategies.
  • Stay current with industry trends and best practices in ML Ops and Data Center DevOps to continuously improve InoCloud’s infrastructure and offerings.
  • Create and maintain clear documentation of processes, procedures, and system configurations.

Requirements:

  • Demonstrated ability to create infrastructure, tooling, and end-to-end ML systems that enable rapid turnaround for ML research teams.
  • 3+ years of experience in Data Center/Cloud DevOps or a similar role.
  • Strong knowledge of AI/ML technologies and GPU-accelerated infrastructure.
  • Proficiency in scripting and programming languages such as Python, Bash, or Go.
  • Experience with containerization and orchestration technologies, such as Docker and Kubernetes.
  • Familiarity with cloud platforms, such as AWS, Azure, or GCP.
  • Understanding of networking principles, including routing and reverse proxy configurations.
  • Proficiency with configuration management tools and version control systems, such as Git.
  • Excellent problem-solving skills and a customer-focused mindset.
  • Strong written and verbal communication skills.
  • Ability to work independently and as part of a team in a fast-paced, dynamic environment.
  • Fluency in spoken and written English.

Benefits:

  • Competitive salary and benefits package.
  • Opportunity to work with cutting-edge technology and an innovative team.
  • Professional growth and development opportunities.
  • Flexible working hours and remote work options.
  • A collaborative and supportive company culture.