AI Evaluation Specialist at Odixcity Consulting

Odixcity Consulting | Nigeria | Design, Graphics and Media

Full Time

Odixcity Consulting is Nigeria's leading foreign outsourcing firm, specializing in human resources and procurement. We believe in delivering business solutions to groups, entrepreneurs, and SMEs.

Job Summary

We are looking for a sharp, detail-oriented senior specialist to architect the systems that measure and improve our generative AI models.
You will work at the intersection of data science, product, and research to ensure our AI systems are not only accurate but also safe, unbiased, and aligned with human preferences.

Key Responsibilities

Design and implement robust automated evaluation frameworks in Python to test LLMs on tasks such as reasoning, coding, and summarization.
Lead the development of annotation rubrics and manage human-evaluator workflows to generate high-context preference data and golden datasets.
Design and execute adversarial testing (red teaming) to identify vulnerabilities, hallucinations, and biases in model outputs before deployment.
Develop and calibrate reliable LLM-based evaluators to replace human raters at scale for specific metrics, validating their correlation with human judgment.
Analyze evaluation results to pinpoint specific model weaknesses (e.g., the model fails at multi-step reasoning in finance contexts) and present actionable insights to the modeling and product teams.
Build and maintain internal evaluation platforms and dashboards to track model performance across versions and use cases.

Requirements

A degree in Computer Science, Information Technology, Data Science, or a related field.
5+ years of experience in machine learning, data science, or AI evaluation.
Proven track record of designing evaluation strategies for NLP or Generative AI products.
Expert-level proficiency in Python for scripting evaluations and analyzing results (pandas, NumPy).
Strong ability to query data (SQL) and perform statistical analysis to compute confidence intervals for evaluation results and measure inter-annotator agreement.
Advanced prompt-crafting skills, both for testing models and for steering LLM-based evaluators.