Odixcity Consulting is Nigeria's leading foreign outsourcing firm, specializing in human resources and procurement. We believe in delivering business solutions to groups, entrepreneurs, and SMEs.
Job Summary
- We are looking for a proactive and hands-on Reliability Engineer to join our team. You will be crucial in ensuring our core services are stable, scalable, and efficient.
Responsibilities
- Closely monitor system health, performance, and availability using tools like Grafana, Prometheus, Datadog, or New Relic. Respond to and resolve incidents.
- Lead and document post-incident reviews to identify root causes and preventive actions.
- Write scripts (Python, Bash) and use configuration management tools to automate operational tasks, deployments, and recovery procedures.
- Build the internal platforms and tools that make reliability the default for every engineering team: self-healing systems, automated canary analysis, and performance tracing at scale.
- Work with software teams to define Service Level Objectives (SLOs) and error budgets. Implement improvements to reduce manual toil, improve system resilience, and prevent recurring issues.
- Manage and optimize cloud resources (AWS, Google Cloud, or Azure) to ensure cost-effectiveness and performance. Implement Infrastructure as Code (IaC) principles.
- Lead the design and implementation of chaos engineering practices, disaster recovery automation, and capacity planning.
Requirements
- 3-5 years of experience in a DevOps, SRE, Linux System Administration, or Backend Engineering role.
- Proficiency in Python or Go.
- Solid experience with cloud platforms such as AWS, Google Cloud, or Azure.
- Experience with containerization and orchestration (Docker, Kubernetes).
- Practical knowledge of monitoring and observability tools.
- Familiarity with CI/CD pipelines (GitLab CI, Jenkins, GitHub Actions).
Core Skills
- Excellent problem-solving and troubleshooting skills under pressure.
- Strong understanding of network fundamentals (TCP/IP, DNS, HTTP/S).
- Knowledge of database performance and reliability (PostgreSQL, MySQL, MongoDB).
- A systematic approach to automation and a desire to eliminate manual work.
- Good communication skills to collaborate with both technical and non-technical teams.
- Understanding of security best practices in infrastructure.