**Job Responsibilities:**
● Automate manual ops tasks to streamline processes and reduce manual effort.
● Identify areas where systems can be improved to increase reliability and reduce incidents.
● Manage and support proactive monitoring solutions across the Production environment
● Look for trends and themes in issues reported in Live Applications and facilitate investigations by Developers to avoid repeated occurrences
● Perform actions on the Product codebase (backend/frontend) for real-time diagnosis of major incidents in Live systems
● Analyze and diagnose difficult or hard-to-reproduce problems
● Perform analysis and reporting on frequently occurring Live problems
● Track the execution of manual or script-based data fixes on the live environment and prioritize deep dives as appropriate
● Help Developers understand the details and user scenarios behind reported bugs to accelerate triage and fixing
● Serve in IT Production on-call rotations supporting infrastructure and services 24x7
**Requirements:**
● Minimum 5 years of real-world, hands-on experience in a related field
● Experience implementing SRE or DevOps practices
● Hands-on experience with AWS, preferably in a large-scale enterprise system
● Understanding of Docker, Kubernetes, and container technology
● Operational knowledge of Elasticsearch, MS SQL and MySQL
● Knowledge of monitoring and alerting tools such as OpenSearch, Datadog, and Zabbix
● Experience writing scripts in a language such as Python, Bash, Java, JavaScript, and/or Node.js
● Able to work in US or UK time zones
● Hybrid work arrangement
Job Type: Full-time
Pay: RM8,000.00 - RM12,000.00 per month
Benefits:
- Cell phone reimbursement
- Dental insurance
- Free parking
- Health insurance
- Opportunities for promotion
- Parental leave
- Professional development
- Vision insurance
Work Location: In person