Author

Nabeel Ghafoor

Senior Engineer

Senior engineer at ByteTuned, leading production AI builds and modernizations.

github.com/nag381

4 articles

How to evaluate a RAG system: the metrics that matter

Evaluate a RAG system in two halves: did retrieval fetch the right context, and did the model answer faithfully from it? Measure retrieval with context precision and recall, and generation with faithfulness and answer relevancy, all against a fixed set of test cases.

Nabeel Ghafoor · May 14, 2026

Engineering Practice

How to reduce LLM hallucinations in production

You cannot fully eliminate hallucinations, but you can drive them down with layers: ground the model in retrieved facts, constrain it with low temperature and structured output, validate with guardrails and an LLM judge, and measure the rate with evals.

Nabeel Ghafoor · May 7, 2026

Case Notes

Shipping a grounded RAG assistant in two weeks

A field report on scoping, retrieval quality, and the evals that let us put a retrieval-grounded assistant in front of real users quickly.

Nabeel Ghafoor · April 22, 2026

Engineering Practice

How to cut LLM costs in production

Most production LLM bills can be cut 60–80% without hurting quality, because most requests are easy and do not need your most expensive model. The big levers: route to smaller models, cache repeated prompts, right-size, and trim context.

Nabeel Ghafoor · March 25, 2026