Site Reliability Engineer

As a member of the Site Reliability Engineering team, you will work on large-scale system design and troubleshooting, and you will be fluent in systems programming, automation, or both. You will be eager to tackle the complex problems of scale that are unique to Tokopedia.


Responsibilities

  • Design, write and deliver software to improve the availability, scalability, latency, and efficiency of Tokopedia's services.
  • Solve problems related to mission-critical services and build automation to prevent their recurrence, with the goal of automating the response to all non-exceptional service conditions.
  • Influence and create new designs, architectures, standards and methods for large-scale distributed systems.
  • Engage in service capacity planning and demand forecasting, software performance analysis and system tuning.
  • Perform periodic on-call duties using a follow-the-sun model.

Requirements

  • Bachelor's degree in Computer Science or a related technical field, or equivalent practical experience.
  • Experience in one or more of C, C++, Java, Python, and Go, or scripting experience in Shell and Perl.
  • Experience working with Unix/Linux systems from kernel to shell and beyond, with experience working with system libraries, file systems, and client-server protocols.
  • Networking: experience with network theory, e.g. TCP/IP, UDP, ICMP, MAC addresses, IP packets, DNS, OSI layers, and load balancing.
