EPAM

Senior DevOps Engineer

  • Cupertino, CA, USA
Job #: 45449
Named one of Fortune’s 100 Fastest-Growing Companies for 2019, EPAM is committed to providing our global team of 30,100+ EPAMers with inspiring careers from day one. EPAMers lead with passion and honesty and think creatively. Our people are the source of our success, and we value collaboration, always strive to understand our customers’ business, and hold ourselves to the highest standards of excellence. No matter where you are located, you’ll join a dedicated, diverse community that will help you discover your fullest potential.


You are curious, persistent, logical and clever – a true techie at heart. You enjoy living by the code of your craft and developing elegant solutions for complex problems. If this sounds like you, this could be the perfect opportunity to join EPAM as a Senior DevOps Engineer. Scroll down to learn more about the position’s responsibilities and requirements.

As a member of the Data Engineering Operations team, you will tackle highly complex issues in a large-scale, distributed systems environment. You will be empowered to design and develop new solutions heavily focused on system automation. We look for talented engineers in both the Operations and Development space to bring our unique solutions to production at a rapid pace.

The team manages the platform for both ingestion and analytics of worldwide system events. To accomplish this, we build automation tools and services to prevent failures and page out individuals when there really is a problem, not just noise. Our engineers not only work closely with Operations but also with the development and analytics engineers, as well as outside organizations. We build data pipelines for maximum efficiency, scalability and reliability to allow domain-specific engineers to focus on their specialties.


What You’ll Do

  • Monitor production, staging, test and development environments for a myriad of applications in an agile and dynamic organization
  • Provide incident resolution for technical production issues
  • Provide guidance to improve the stability, security, efficiency and scalability of systems
  • Determine future needs for capacity and investigate new products and/or features

What You Have

  • A degree in a related field and/or advanced certification, along with significant experience
  • BS in Computer Science with 7-10 years of experience, MS with 5-7 years of experience, or equivalent related experience
  • Experience with Hadoop-based technologies: Hive, Spark, and HDFS/YARN cluster administration
  • A passion for automation, demonstrated by building tools in Python, Java, or other JVM languages
  • Strong expertise in troubleshooting complex production issues
  • Expert understanding of Unix/Linux-based operating systems
  • Excellent problem solving, critical thinking, and communication skills
  • Experience deploying and managing CI/CD pipelines
  • Expertise in configuration management tools (such as Ansible or Salt) for deploying, configuring, and managing servers and systems
  • Adeptness at prioritizing multiple issues in a high-pressure environment
  • Ability to understand complex architectures and comfort working with multiple teams
  • Ability to conduct performance analysis and troubleshoot large scale distributed systems
  • A highly proactive approach, with a keen focus on improving the uptime and availability of our mission-critical services
  • Comfort working in a fast-paced environment while continuously evaluating emerging technologies
  • Proficiency with Unix, command-line tools, and general system debugging
  • Solid knowledge of secure coding practices and experience with open-source technologies

Nice to have

  • Experience with Kubernetes, Docker Swarm, or other container orchestration frameworks
  • Experience building and operating large-scale Hadoop/Spark data infrastructure used for machine learning in a production environment
  • Experience tuning complex Hive and Spark queries
  • Expertise in debugging Hadoop/Spark/Hive issues using NameNode, DataNode, NodeManager, and Spark executor logs
  • Experience with capacity management on multi-tenant Hadoop clusters
  • Experience with workflow and data pipeline orchestration (Oozie, Jenkins, etc.)
  • Experience with Jupyter-based notebook infrastructure

What We Offer

  • Medical, Dental and Vision Insurance (Subsidized)
  • Health Savings Account
  • Flexible Spending Accounts (Healthcare, Dependent Care, Commuter)
  • Short-Term and Long-Term Disability (Company Provided)
  • Life and AD&D Insurance (Company Provided)
  • Employee Assistance Program
  • Unlimited access to LinkedIn learning solutions
  • Matched 401(k) Retirement Savings Plan
  • Paid Time Off
  • Legal Plan and Identity Theft Protection
  • Accident Insurance
  • Employee Discounts
  • Pet Insurance
  • EPAM welcomes all applicants and will consider qualified candidates with criminal histories, such as arrest and conviction records, in a manner consistent with applicable law, including the San Francisco Fair Chance Ordinance and the Los Angeles Fair Chance Initiative for Hiring
