
Senior Site Reliability Engineer at Moniepoint Inc. (Formerly TeamApt Inc.)

Moniepoint Inc. (Formerly TeamApt Inc.) | Nigeria | Networking and Tech Support
Full Time
Moniepoint is a financial technology company digitising Africa’s real economy by building a financial ecosystem for businesses, providing them with all the payment, banking, credit and business management tools they need to succeed.

Job Summary

  • Responsible for ensuring our systems run smoothly and efficiently while engineering solutions to improve visibility, eliminate repetitive tasks, and increase system resilience. The ideal candidate will balance real-time on-call responsibilities with strategic engineering work to achieve sustainable and scalable service reliability.

What You’ll Get To Do

  • Participate in on-call rotations as the primary technical lead for detecting, triaging, and resolving service degradation, outages, or reliability issues across all environments.
  • Act as the Incident Commander during major incidents: initiating war room or bridge calls, coordinating cross-functional teams, providing timely and clear status updates to all stakeholders, and leading and documenting blameless Root Cause Analyses (RCAs) that drive long-term fixes.
  • Develop automation to eliminate manual, repetitive operational tasks (toil) across both applications and infrastructure, improving efficiency and system resilience.
  • Create and maintain dashboards and alerts that monitor application and infrastructure health.
  • Participate in feature development discussions to ensure services are built with observability from the ground up.
  • Define and track Service Level Indicators (SLIs) and Service Level Objectives (SLOs) in collaboration with Product and Engineering teams.
  • Investigate and resolve customer complaints escalated beyond L1 and L2 support, especially those involving performance, reliability, or complex system behavior.

To succeed in this role, we think you should have

  • Minimum of 3 years of experience supporting enterprise applications in an SRE or similar role.
  • Knowledge of distributed systems, microservices architecture and software design patterns.
  • Experience with cloud platforms such as AWS, GCP, or Azure.
  • Strong knowledge of Kubernetes and container orchestration tools.
  • Experience using application performance monitoring tools, OpenTelemetry, and observability platforms such as New Relic, Datadog, ELK, or SigNoz.
  • Excellent problem-solving and troubleshooting skills as an on-call engineer, with the ability to resolve complex infrastructure and application issues.
  • Proficient in setting up and maintaining monitoring dashboards and alerts using Grafana and Prometheus.
  • Working knowledge of a scripting/programming language (e.g., Python, Bash).
  • Proficiency in SQL databases (e.g., MySQL), writing complex SQL queries against large datasets, and hands-on experience in database administration.
