Moniepoint is a financial technology company digitising Africa’s real economy by building a financial ecosystem for businesses, providing them with all the payment, banking, credit and business management tools they need to succeed.
Job Summary
- We are seeking a Site Reliability Engineer (SRE) responsible for ensuring our systems run smoothly and efficiently while engineering solutions to improve visibility, eliminate repetitive tasks, and increase system resilience.
- The ideal candidate will balance real-time on-call responsibilities with strategic engineering work to achieve sustainable and scalable service reliability.
Responsibilities
- Participate in on-call rotations to detect and triage service and reliability issues across all environments. Act as the Incident Commander during major incidents: initiating war room or bridge calls, coordinating cross-functional teams, and providing timely and clear status updates to all stakeholders.
- Create and maintain meaningful dashboards and alerts. Work with development teams to instrument their code to ensure visibility.
- Develop automation to eliminate manual and repetitive operational tasks (toil) related to reliability across both applications and infrastructure.
- Implement and track Service Level Indicators (SLIs) and Service Level Objectives (SLOs) defined by the engineering leadership.
- Investigate and resolve customer complaints escalated beyond L1 and L2 support, especially those involving performance, reliability, or complex system behavior.
Requirements
- Minimum of 3 years of experience supporting enterprise applications as an SRE or in a similar role, with proficiency in writing code in Java, Go, or Python.
- Good understanding of distributed systems concepts, microservices architecture and software design patterns.
- Hands-on experience with Kubernetes. You have managed applications on a major cloud provider (GCP, AWS, or Azure), and can troubleshoot common container issues.
- Experience setting up dashboards in Grafana and using APM tools like Datadog, New Relic, or SigNoz. You have a solid understanding of metrics, logs, and traces.
- Proficiency in SQL (e.g., PostgreSQL, MySQL). Ability to write complex queries to debug data issues and a basic understanding of database performance.