2024 Infrastructure Site Reliability Engineering Professional Intern



Software Engineering, Other Engineering
Posted on Thursday, January 11, 2024
At IBM, work is more than a job – it’s a calling: To build. To design. To code. To consult. To think along with clients and sell. To make markets. To invent. To collaborate. Not just to do something better, but to attempt things you’ve never thought possible. Are you ready to lead in this new era of technology and solve some of the world’s most challenging problems? If so, let’s talk.

Your Role and Responsibilities
Site Reliability Engineering (SRE) professionals are engineers who specialize in reliability and resiliency, with the right mix of knowledge and skills in software and systems. They are responsible for analyzing business needs, determining problems, and advising on, designing, building, testing, deploying, changing, and maintaining well-engineered information systems and ecosystems.

We’re seeking skilled, automation-focused SREs to maintain and administer the PowerVS Cloud Infrastructure-as-a-Service environment and provide a reliable and secure offering to clients.

As an automation-focused Site Reliability Engineering intern, you will perform the following tasks:
• Develop, test, deploy, and maintain automation code for the procedures and runbooks defined for PowerVS data center build, operations, and support tasks.
• Develop, test, deploy, and maintain automation code for data collection, log processing, infrastructure monitoring, and backup and restore of critical logs and configuration data in PowerVS data centers.
• Develop automation to reduce toil (manual, repetitive tasks) using shell scripts (Bash, etc.), Python, Ansible, and related tools and languages.
• Develop, test, deploy, and maintain automation code to perform code stack updates on infrastructure systems (VIOS, firmware, PowerVC, HMC, NovaLink, NIM servers) as well as cloud supporting systems (jump servers, sobox, network nodes, gateways, TSM servers).
• Develop, test, deploy, and maintain automation code to upload and maintain the stock images offered in PowerVS environments.
• Develop, test, deploy, and maintain automation code to remotely administer AIX and Linux servers, including maintaining user IDs (add/delete) and passwords.

Required Technical and Professional Expertise

  • Significant scripting/coding experience for automating various aspects of IBM Power systems administration.
  • Automation using Python, shell scripting (Bash, etc.), Ansible, and related tools and languages.
  • Experience with AIX and Linux administration, commands, and networking.
  • Experience with Red Hat OpenShift and Kubernetes.
  • Experience with DevOps, CI/CD, and Terraform.
  • Good communication skills, with the ability to communicate effectively.
  • An automation mindset: wherever possible, you should look to use scripting and automation.

Preferred Technical and Professional Expertise

  • Experience with configuring and tuning IBM AIX, VIOS, and PowerVC
  • Experience with the IBM Cloud CLI, APIs, and Terraform
  • Knowledge of IBM Power Systems, storage systems, Cisco ACI, and Juniper vSRX
  • Understanding of system monitoring (Nagios, ELK stack)