Site Reliability Engineer
As a member of the Site Reliability Engineering team, you will work on large-scale system design and troubleshooting, and be fluent in systems programming and/or automation. You will have a desire to tackle the complex problems of scale that are unique to Tokopedia.
- Design, write and deliver software to improve the availability, scalability, latency, and efficiency of Tokopedia's services.
- Solve problems related to mission-critical services and build automation to prevent problem recurrence, with the goal of automating the response to all non-exceptional service conditions.
- Influence and create new designs, architectures, standards and methods for large-scale distributed systems.
- Engage in service capacity planning and demand forecasting, software performance analysis and system tuning.
- Perform periodic on-call duties using a follow-the-sun model.
- Bachelor's degree in Computer Science or a related technical field, or equivalent practical experience.
- Experience in one or more of the following: C, C++, Java, Python, or Go; or scripting experience in Shell and Perl.
- Experience working with Unix/Linux systems from kernel to shell and beyond, including system libraries, file systems, and client-server protocols.
- Networking: experience with network theory and protocols (e.g., TCP/IP, UDP, ICMP), MAC addresses, IP packets, DNS, OSI layers, and load balancing.