**Job Title: Site Reliability Engineer (SRE)**

**Overview:**
As a Site Reliability Engineer (SRE), you will be responsible for ensuring the reliability, scalability, and performance of critical services. Your role bridges the gap between development and operations by implementing robust system architecture, automation, and proactive monitoring. You will focus on key SRE practices, including Service Level Objectives (SLOs), Service Level Indicators (SLIs), and reducing operational toil. Collaboration with cross-functional teams will be essential in fostering a culture of continuous improvement and accountability.

**Key Responsibilities:**

  • Design and implement resilient system architectures to support high availability and scalability.
  • Develop automation tools and scripts to enhance operational efficiency and minimize manual effort.
  • Define, monitor, and analyze SLOs and SLIs to maintain system reliability and performance.
  • Conduct in-depth post-mortem analyses to identify root causes and implement long-term solutions.
  • Collaborate with development and operations teams to establish best practices in reliability and incident management.
  • Troubleshoot and resolve issues related to database performance, network connectivity, and deployment failures on Kubernetes and virtual machines.
  • Ensure adherence to Service Level Agreements (SLAs) by maintaining high service delivery standards.
  • Identify and address performance bottlenecks, providing actionable recommendations for system enhancements.
  • Maintain comprehensive documentation of processes, incident responses, and operational workflows.

**Qualifications:**

  • Proficiency in programming languages such as Python, Golang, or Java, with a focus on operational efficiency.
  • Strong experience in system architecture and design, emphasizing reliability and scalability.
  • In-depth understanding of SRE principles, including SLOs, SLIs, toil reduction, and incident post-mortems.
  • Hands-on experience with cloud environments such as AWS, Azure, or Google Cloud.
  • Expertise in Linux system administration and troubleshooting application support issues.
  • Familiarity with networking concepts and effective troubleshooting techniques.
  • Excellent problem-solving skills and a proactive approach to operational challenges.
  • Ability to work independently while effectively collaborating within a team environment.

**Preferred Skills:**

  • Experience with monitoring tools and performance optimization techniques.
  • Strong scripting and automation capabilities for system administration tasks.
  • Hands-on knowledge of cloud platform services (AWS, Azure, Google Cloud).
  • Familiarity with DevOps methodologies, including CI/CD, infrastructure as code, and containerization.

Job Type: Contract

Pay: RM10,000.00 - RM14,000.00 per month

Benefits:

  • Health insurance
  • Opportunities for promotion
  • Professional development

Schedule:

  • Afternoon shift
  • Rotational shift

Supplemental Pay:

  • 13th month salary
  • Performance bonus
  • Yearly bonus

Work Location: In person

Job Overview:

Job Posted: 7 months ago

Job Expires: 8 months 2 weeks

Job Type: Full time

Total Vacancies: 1
