SJ Digital || Job Details

28 LPA

SRE GCP - TECH LEAD @ IBM
Hyderabad, Telangana

Job Description

Primary Responsibilities

Site Reliability Engineering (SRE) is an engineering discipline that combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. SREs ensure that managed service offerings and customer deployments have reliability and uptime appropriate to users' needs, along with a fast rate of improvement, while monitoring and validating capacity and performance. The role focuses on reliability, scalability, and the development of automation to manage repetitive tasks at scale.

Knowledge & Skills

In-depth knowledge of SRE practices and concepts such as SLA, SLO, SLI, error budgets, toil elimination, and post-mortems.
Experience with monitoring and observability tools such as Grafana, Splunk, Dynatrace, or the Google Cloud operations suite.
Understanding of APM tools such as AppDynamics or Datadog, preferably AppDynamics.
Experience with infrastructure as code (IaC) tools such as Terraform or Ansible.
Experience managing cloud-native applications effectively in GCP or Azure.
Experience creating CI/CD pipelines with tools such as GitHub Actions, Azure DevOps, or Jenkins.
Knowledge of version control tools such as Git or Bitbucket.
Knowledge of at least one scripting language, such as PowerShell, Python, or Bash.
Coding infrastructure automation across the CI/CD pipeline.
Responsible for ensuring the availability, performance, and scalability of a website or application.
Knowledge of containerization and orchestration: Docker, Kubernetes, Cloud Run (GCP), etc.
Involved in capacity planning and performance tuning so that the site can handle increased traffic without issue.
Work closely with developers to identify and fix potential issues before they cause problems for users.
Deep understanding of how distributed systems work, in order to troubleshoot and optimize them.
Deep understanding of how different types of databases work, in order to troubleshoot effectively any issues that arise.
Ability to communicate clearly and concisely about system alerts or outages to other members of your team.
Please note: beyond the job description, the customer is looking for a candidate who can mature their SRE practice across the division, someone who is comfortable being a champion and leader in the SRE space.

Job Overview

Published on: Aug 23, 2024
Vacancy: 1
Employment Status: Full-Time
Experience: 8 to 10+ years
Job Location: Hyderabad, Telangana
Salary: 28 LPA
Application Deadline: Sep 23, 2024