Factual correctness is a metric that evaluates the factual accuracy of an LLM-generated response by comparing it against a reference.
- The score ranges from 0 to 1, with higher values indicating better performance.
- The metric uses an LLM to first break down the response and the reference into claims (statements), and then uses natural language inference to determine the factual overlap between the two.
The factual overlap can be measured using precision, recall and F1-score.
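As a minimal sketch of how these three scores relate, the snippet below computes them from claim-level counts. It assumes the usual definitions: a true positive is a response claim supported by the reference, a false positive is a response claim not supported by the reference, and a false negative is a reference claim missing from the response. The function name `factual_overlap_scores` and the example counts are hypothetical and not part of the Ragas API.

```python
def factual_overlap_scores(tp: int, fp: int, fn: int) -> dict[str, float]:
    """Precision, recall and F1 from claim-level counts.

    tp: response claims supported by the reference
    fp: response claims not supported by the reference
    fn: reference claims not found in the response
    """
    # Guard each division so an empty claim set yields 0.0 instead of an error.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"precision": precision, "recall": recall, "f1": f1}


# Illustrative counts only: 3 supported claims, 1 unsupported, 2 missed.
scores = factual_overlap_scores(tp=3, fp=1, fn=2)
print(scores)
```

Precision penalizes unsupported claims in the response, recall penalizes reference claims the response leaves out, and F1 balances the two.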
Important
When working with this metric, it is possible to adjust the number of claims (statements) that the LLM generates from a single sentence, for both the response and the reference. This is controlled using the concepts of atomicity and coverage from the Ragas library.
- Atomicity: refers to how much a sentence is broken down into its smallest, meaningful components.
- Coverage: refers to how comprehensively the claims represent the information in the original sentence.
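To illustrate why atomicity matters, here is a toy sketch in which the claims are written by hand and exact string matching stands in for natural language inference. Both are simplifying assumptions: in practice an LLM produces the claims and judges whether each is supported. The names `claim_precision`, `low_atomicity_claims` and `high_atomicity_claims` are hypothetical, and coverage is not modeled.

```python
# Facts taken from the reference (hand-written here; an LLM would extract them).
reference_facts = {
    "Einstein was a German theoretical physicist",
    "Einstein developed the theory of relativity",
}

# Response: "Einstein was a German theoretical physicist who developed
# quantum electrodynamics." The second half is deliberately wrong.
low_atomicity_claims = [
    "Einstein was a German theoretical physicist who developed quantum electrodynamics",
]
high_atomicity_claims = [
    "Einstein was a German theoretical physicist",
    "Einstein developed quantum electrodynamics",
]


def claim_precision(claims: list[str], supported: set[str]) -> float:
    """Fraction of claims found in `supported` (exact match stands in for NLI)."""
    if not claims:
        return 0.0
    return sum(claim in supported for claim in claims) / len(claims)


low_score = claim_precision(low_atomicity_claims, reference_facts)    # 0.0
high_score = claim_precision(high_atomicity_claims, reference_facts)  # 0.5
print(low_score, high_score)
```

With low atomicity the single compound claim is rejected as a whole, so the correct half of the sentence earns no credit. With high atomicity the sentence is split into two claims and the correct one is counted separately.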