
Report: query methods evaluation

Summary Evaluation Report

evaluation_data.json – Contains metadata and user queries for telecom grievance categories.
evaluation_results.csv – Raw evaluation scores per query per method.
recommendations.json – Aggregated performance metrics (Precision, Recall, Response Relevancy, and Average Score) for each category and query method.
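Going from evaluation_results.csv to recommendations.json is essentially a group-by-and-average step. The sketch below is a minimal version of that step. It assumes hypothetical column names (`category`, `method`, `context_precision`, `context_recall`, `response_relevancy`) and also assumes that Avg Score is the unweighted mean of the three metrics. The real layouts of both files are not shown in this report.

```python
import csv
import io
import json
from collections import defaultdict
from statistics import mean

# Hypothetical column names; the real headers of evaluation_results.csv may differ.
METRICS = ("context_precision", "context_recall", "response_relevancy")


def aggregate(csv_text: str) -> dict:
    """Average per-query scores into per-category, per-method metrics."""
    grouped = defaultdict(lambda: defaultdict(list))
    for row in csv.DictReader(io.StringIO(csv_text)):
        grouped[row["category"]][row["method"]].append([float(row[m]) for m in METRICS])

    summary = {}
    for category, methods in grouped.items():
        summary[category] = {}
        for method, rows in methods.items():
            # Column-wise mean over every query scored for this (category, method).
            scores = {m: mean(col) for m, col in zip(METRICS, zip(*rows))}
            # Assumed Avg Score: unweighted mean of the three per-metric means.
            scores["avg_score"] = mean(scores.values())
            summary[category][method] = {k: round(v, 3) for k, v in scores.items()}
    return summary


# Toy rows for illustration only; these are not values from the evaluation.
sample = """category,method,context_precision,context_recall,response_relevancy
Call Drop,Semantic,1.0,1.0,0.8
Call Drop,Semantic,1.0,0.5,0.9
"""
print(json.dumps(aggregate(sample), indent=2))
```

The nested category → method → metric shape only approximates what recommendations.json might hold. Adjust the keys to match the actual file.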

📊 Evaluation Analysis Report: Weaviate Search Method Performance

📌 Overview

The evaluation compares six Weaviate search and retrieval methods (Semantic, Hybrid, Keyword, Vector, Reranking, and Filtered) across seven telecom grievance categories. It measures how well each method retrieves relevant and helpful results, scoring each method with three RAGAS metrics:

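Context Precision – Are the retrieved contexts relevant to the query, and are the relevant ones ranked highest?
Context Recall – What fraction of the claims in the reference answer are supported by the retrieved contexts?
Response Relevancy – How directly does the generated or retrieved response address the user query?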
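The three metrics can be written as small functions to make them more concrete. This sketch follows the formulas published in the RAGAS documentation, with two simplifications. First, it takes the LLM relevance and support verdicts, and the embeddings, as plain inputs instead of calling a model. Second, it omits any additional checks RAGAS applies. The function names are illustrative and are not the RAGAS API.

```python
import math
from typing import Sequence


def context_precision(relevant: Sequence[bool]) -> float:
    """Rank-aware precision: mean of precision@k taken at each relevant rank.

    relevant[k] marks whether the (k+1)-th retrieved context is relevant;
    RAGAS obtains this verdict from an LLM judge, here it is supplied directly.
    """
    hits, weighted = 0, 0.0
    for k, is_relevant in enumerate(relevant, start=1):
        if is_relevant:
            hits += 1
            weighted += hits / k  # precision@k, counted only where v_k = 1
    return weighted / hits if hits else 0.0


def context_recall(claim_supported: Sequence[bool]) -> float:
    """Fraction of reference-answer claims that the retrieved contexts support."""
    return sum(claim_supported) / len(claim_supported) if claim_supported else 0.0


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b)) / (math.hypot(*a) * math.hypot(*b))


def response_relevancy(
    query_emb: Sequence[float], question_embs: Sequence[Sequence[float]]
) -> float:
    """Mean cosine similarity between the user query and questions generated
    back from the response (RAGAS asks an LLM to write those questions)."""
    sims = [cosine(query_emb, q) for q in question_embs]
    return sum(sims) / len(sims) if sims else 0.0


print(context_precision([True, True, False]))  # 1.0
print(context_precision([False, True, True]))  # lower: rank 1 is irrelevant
```

In this rank-aware form, precision reaches 1.0 whenever no irrelevant context outranks a relevant one. It does not require every retrieved context to be relevant.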

Goal: Determine the best-performing query method for each grievance category.

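📂 Categories Evaluated

All seven categories fall under Mobile Related:

Call Drop
Improper Network Coverage
Data Speed Lower Than Committed
Mobile Number Portability (MNP)
UCC Related Complaints
VAS Activation/Deactivation Without Consent
SIM Card Activation/Deactivation/Fault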

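Summary of Best Methods Per Category

| Category | Best Method (Excluding Generative) | Avg Score |
|---|---|---|
| Mobile Related > Call Drop | Semantic / Hybrid / Filtered (tie) | 0.949 |
| Mobile Related > Improper Network Coverage | Keyword / Reranking (tie) | 0.943 |
| Mobile Related > Data Speed Lower Than Committed | Semantic / Hybrid (tie) | 0.956 |
| Mobile Related > Mobile Number Portability (MNP) | Semantic | 0.941 |
| Mobile Related > UCC Related Complaints | Filtered | 0.926 |
| Mobile Related > VAS Activation/Deactivation Without Consent | Reranking | 0.939 |
| Mobile Related > SIM Card Activation/Deactivation/Fault | Semantic | 0.945 |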

Key Metrics Definitions

Avg Score – Unweighted mean of Context Precision, Context Recall, and Response Relevancy.

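Category: Mobile Related > Call Drop

Metric Breakdown

| Method | Context Precision | Context Recall | Response Relevancy | Avg Score |
|---|---|---|---|---|
| Semantic | 1.00 | 1.00 | 0.846 | 0.949 |
| Hybrid | 1.00 | 1.00 | 0.846 | 0.949 |
| Keyword | 1.00 | 1.00 | 0.830 | 0.943 |
| Vector | 1.00 | 1.00 | 0.732 | 0.911 |
| Reranking | 1.00 | 1.00 | 0.833 | 0.944 |
| Filtered | 1.00 | 1.00 | 0.846 | 0.949 |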

Best Method (Excluding Generative):

Semantic / Hybrid / Filtered (Tie at 0.949)
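Each "Best Method" line picks the highest Avg Score in its category and keeps any ties. The sketch below is a minimal version of that selection. It rests on two assumptions. First, Avg Score is taken to be the unweighted mean of the three metrics; rows that list all four values fit this, since 0.96, 1.00 and 0.842 average to 0.934. Second, the excluded generative method is assumed to appear under a hypothetical key, "Generative".

```python
from statistics import mean
from typing import Dict, Iterable, List


def avg_score(precision: float, recall: float, relevancy: float) -> float:
    # Assumed definition: unweighted mean of the three RAGAS metrics, 3 decimals.
    return round(mean((precision, recall, relevancy)), 3)


def best_methods(
    avg_scores: Dict[str, float],
    exclude: Iterable[str] = ("Generative",),  # hypothetical method key
    tol: float = 1e-9,
) -> List[str]:
    """Return every non-excluded method whose Avg Score equals the maximum."""
    eligible = {m: s for m, s in avg_scores.items() if m not in set(exclude)}
    top = max(eligible.values())
    return [m for m, s in eligible.items() if abs(s - top) <= tol]


# Avg Score values for the Call Drop category, copied from the table above.
call_drop = {
    "Semantic": 0.949, "Hybrid": 0.949, "Keyword": 0.943,
    "Vector": 0.911, "Reranking": 0.944, "Filtered": 0.949,
}
print(best_methods(call_drop))       # ['Semantic', 'Hybrid', 'Filtered']
print(avg_score(0.96, 1.00, 0.842))  # 0.934
```

Comparing with a small tolerance keeps ties that are exact at three decimals, which matches how the report lists tied methods.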

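Category: Mobile Related > Improper Network Coverage

| Method | Context Precision | Context Recall | Response Relevancy | Avg Score |
|---|---|---|---|---|
| Semantic | 0.96 | 1.00 | 0.842 | 0.934 |
| Hybrid | 0.96 | 1.00 | 0.829 | 0.930 |
| Keyword | 1.00 | 1.00 | 0.829 | 0.943 |
| Vector | 0.84 | 0.986 | 0.716 | 0.847 |
| Reranking | 1.00 | 1.00 | 0.829 | 0.943 |
| Filtered | 1.00 | 1.00 | 0.812 | 0.937 |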

Best Method (Excluding Generative):

Keyword / Reranking (Tie at 0.943)

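Category: Mobile Related > Data Speed Lower Than Committed

| Method | Context Precision | Context Recall | Response Relevancy | Avg Score |
|---|---|---|---|---|
| Semantic | 1.00 | 1.00 | 0.868 | 0.956 |
| Hybrid | 1.00 | 1.00 | 0.868 | 0.956 |
| Keyword | 0.96 | 1.00 | 0.868 | 0.943 |
| Vector | 0.92 | 0.985 | 0.726 | 0.877 |
| Reranking | 1.00 | 1.00 | 0.817 | 0.939 |
| Filtered | 1.00 | 1.00 | 0.800 | 0.933 |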

Best Method (Excluding Generative):

Semantic / Hybrid (Tie at 0.956)

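Category: Mobile Related > Mobile Number Portability (MNP)

| Method | Context Precision | Context Recall | Response Relevancy | Avg Score |
|---|---|---|---|---|
| Semantic | 1.00 | 1.00 | 0.822 | 0.941 |
| Hybrid | 1.00 | 1.00 | 0.814 | 0.938 |
| Keyword | 1.00 | 1.00 | 0.808 | 0.936 |
| Vector | 1.00 | 1.00 | 0.722 | 0.907 |
| Reranking | 0.80 | 0.80 | 0.659 | 0.753 |
| Filtered | 1.00 | 1.00 | 0.786 | 0.929 |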

Best Method (Excluding Generative):

Semantic (0.941)

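Category: Mobile Related > UCC Related Complaints

| Method | Context Precision | Context Recall | Response Relevancy | Avg Score |
|---|---|---|---|---|
| Semantic | 0.96 | 1.00 | 0.804 | 0.921 |
| Hybrid | 0.96 | 1.00 | 0.804 | 0.921 |
| Keyword | 0.96 | 1.00 | 0.806 | 0.922 |
| Vector | 0.96 | 0.96 | 0.736 | 0.885 |
| Reranking | 0.80 | 0.80 | 0.641 | 0.747 |
| Filtered | 1.00 | 1.00 | 0.779 | 0.926 |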

Best Method (Excluding Generative):

Filtered (0.926)

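Category: Mobile Related > VAS Activation/Deactivation Without Consent

| Method | Context Precision | Context Recall | Response Relevancy | Avg Score |
|---|---|---|---|---|
| Semantic | 0.80 | 0.80 | 0.653 | 0.751 |
| Hybrid | 0.80 | 0.80 | 0.653 | 0.751 |
| Keyword | 0.80 | 0.80 | 0.653 | 0.751 |
| Vector | 0.96 | 1.00 | 0.750 | 0.903 |
| Reranking | 1.00 | 1.00 | 0.816 | 0.939 |
| Filtered | 1.00 | 1.00 | 0.770 | 0.923 |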

Best Method (Excluding Generative):

Reranking (0.939)

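Category: Mobile Related > SIM Card Activation/Deactivation/Fault

| Method | Context Precision | Context Recall | Response Relevancy | Avg Score |
|---|---|---|---|---|
| Semantic | 1.00 | 1.00 | 0.834 | 0.945 |
| Hybrid | 0.80 | 0.80 | 0.663 | 0.754 |
| Keyword | 1.00 | 1.00 | 0.832 | 0.944 |
| Vector | 1.00 | 1.00 | 0.737 | 0.912 |
| Reranking | 1.00 | 1.00 | 0.831 | 0.944 |
| Filtered | 1.00 | 1.00 | 0.775 | 0.925 |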

Best Method (Excluding Generative):

Semantic (0.945)

Overall Summary

📌 Insights

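Semantic search is the best or tied-best method in four of the seven categories (Call Drop, Data Speed, MNP, and SIM Card). This is possibly because those queries are straightforward and share vocabulary with the indexed context. It is not uniformly strong, however: it drops to 0.751 on VAS Activation/Deactivation.
Filtered search performed surprisingly well in UCC (best, 0.926) and Call Drop (tied best, 0.949), possibly due to precise taxonomy and labeled data.
Vector search lags in Response Relevancy, scoring between 0.716 and 0.750 in every category. This holds even in Call Drop, MNP, and SIM Card, where its Context Precision and Context Recall are both 1.00.
Reranking shines in edge cases where the initial search is good but needs finer-grained prioritization. It wins VAS Activation/Deactivation (0.939), where Semantic, Hybrid, and Keyword all fall to 0.751, and ties for best in Improper Network Coverage. It is less reliable elsewhere, scoring 0.753 on MNP and 0.747 on UCC.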
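The Filtered and Reranking observations refer to two pipeline shapes. The first narrows the candidate pool with a category label. The second reorders a first-stage shortlist with a finer scorer. The toy sketch below illustrates both in plain Python. The word-overlap first stage and the `prefix_overlap` scorer are hypothetical stand-ins for the vector search and reranker model used in practice, and none of the function names come from the Weaviate API.

```python
from typing import Callable, Dict, List, Optional

Doc = Dict[str, str]


def _words(text: str) -> List[str]:
    return text.lower().split()


def first_stage(query: str, docs: List[Doc], k: int) -> List[Doc]:
    """Cheap candidate retrieval: rank by exact word overlap with the query."""
    q = set(_words(query))
    return sorted(docs, key=lambda d: len(q & set(_words(d["text"]))), reverse=True)[:k]


def search(
    query: str,
    docs: List[Doc],
    category: Optional[str] = None,
    rerank: Optional[Callable[[str, str], float]] = None,
    candidates: int = 10,
    top_k: int = 3,
) -> List[Doc]:
    # Filtered search: keep only documents carrying the requested category label.
    pool = [d for d in docs if category is None or d["category"] == category]
    hits = first_stage(query, pool, candidates)
    if rerank is not None:
        # Reranking: reorder only the shortlisted candidates with a finer scorer.
        hits = sorted(hits, key=lambda d: rerank(query, d["text"]), reverse=True)
    return hits[:top_k]


def prefix_overlap(query: str, text: str) -> float:
    """Toy reranker: count query words whose first four letters start a text word."""
    words = _words(text)
    return float(sum(any(w.startswith(q[:4]) for w in words) for q in _words(query)))


# Toy documents for illustration only.
docs = [
    {"text": "call drops during roaming", "category": "Call Drop"},
    {"text": "calls drop at home every evening", "category": "Call Drop"},
    {"text": "sim card not activated", "category": "SIM Card"},
]
query = "call drops at home"
print(search(query, docs, category="Call Drop", top_k=1)[0]["text"])
# call drops during roaming
print(search(query, docs, category="Call Drop", rerank=prefix_overlap, top_k=1)[0]["text"])
# calls drop at home every evening
```

The first stage scores both Call Drop complaints equally and keeps their input order. The finer scorer then promotes the complaint that matches all four query words, which is the kind of reprioritization the Reranking insight describes.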


Last updated 5 months ago
