Report: query methods evaluation
Summary Evaluation Report:
evaluation_data.json – Contains metadata and user queries for telecom grievance categories.
evaluation_results.csv – Raw evaluation scores per query per method.
recommendations.json – Aggregated performance metrics (Precision, Recall, Response Relevancy, and Average Score) for each category and query method.
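The step from per-query scores to per-category aggregates can be sketched as below. This is a minimal illustration, not the report's actual pipeline: the column names (`category`, `method`, `precision`, `recall`, `response_relevancy`) and the `aggregate` helper are assumptions, since the real schema of evaluation_results.csv is not shown here, and the sample rows are placeholders rather than real results.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Assumed column names; the real evaluation_results.csv schema may differ.
METRICS = ("precision", "recall", "response_relevancy")


def aggregate(csv_file):
    """Average each metric per (category, method) and add an overall average."""
    grouped = defaultdict(list)
    for row in csv.DictReader(csv_file):
        grouped[(row["category"], row["method"])].append(
            [float(row[m]) for m in METRICS]
        )
    summary = {}
    for key, rows in grouped.items():
        # zip(*rows) turns per-query rows into per-metric columns.
        means = [mean(col) for col in zip(*rows)]
        summary[key] = dict(zip(METRICS, means), average_score=mean(means))
    return summary


# Placeholder rows for illustration only; not taken from the report's data.
sample = io.StringIO(
    "category,method,precision,recall,response_relevancy\n"
    "Call Drop,Keyword,1.0,1.0,0.9\n"
    "Call Drop,Keyword,0.5,1.0,0.7\n"
)
result = aggregate(sample)
```

To run it against a real file, pass an open file handle (for example `open("evaluation_results.csv", newline="")`) instead of the in-memory sample, after adjusting the column names.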
📊 Evaluation Analysis Report: Weaviate Search Method Performance
📌 Overview
The evaluation compares multiple search and retrieval methods applied over grievance categories from the telecommunications sector, focusing on how well each method retrieves relevant and helpful results. Each method is scored using three RAGAS metrics:
Context Precision – How many retrieved contexts were relevant?
Context Recall – Do the contexts cover all factual claims?
Response Relevancy – Is the generated or retrieved response answering the user query?
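As a rough intuition for the first two metrics, the sketch below reads them as simple ratios over yes/no judgments. This is a deliberate simplification: the RAGAS library derives its scores from LLM-based judgments with its own weighting, and the function names and inputs here are hypothetical, not RAGAS APIs.

```python
def context_precision(relevance_flags):
    """Share of retrieved contexts judged relevant (simplified ratio)."""
    if not relevance_flags:
        return 0.0
    return sum(relevance_flags) / len(relevance_flags)


def context_recall(claim_supported_flags):
    """Share of reference claims supported by the retrieved contexts."""
    if not claim_supported_flags:
        return 0.0
    return sum(claim_supported_flags) / len(claim_supported_flags)


# Four retrieved contexts, three judged relevant.
precision = context_precision([True, True, False, True])
# Two factual claims, both covered by the retrieved contexts.
recall = context_recall([True, True])
```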
Goal: Determine the best-performing query method for each grievance category.
📂 Categories Evaluated
| Category | Subcategory | Count |
| --- | --- | --- |
| Mobile Related | Call Drop | 5 |
| Mobile Related | Improper Network Coverage | 5 |
| Mobile Related | Data Speed lower than committed | 5 |
| Mobile Related | Mobile Number Portability (MNP) | 5 |
| Mobile Related | UCC related complaints | 5 |
| Mobile Related | Activation/Deactivation of Value Added Services | 5 |
| Mobile Related | Activation/Deactivation/Fault of SIM Card | 5 |
Summary of Best Methods Per Category
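| Category | Best Method | Avg Score | Precision | Recall | Response Relevancy |
| --- | --- | --- | --- | --- | --- |
| Call Drop | Generative | 0.961 | 1.0 | 1.0 | 0.882 |
| Improper Network Coverage | Generative | 0.946 | 0.96 | 1.0 | 0.878 |
| Data Speed lower than committed | Generative | 0.958 | 1.0 | 1.0 | 0.874 |
| Mobile Number Portability (MNP) | Generative | 0.958 | 1.0 | 1.0 | 0.874 |
| UCC related complaints | Generative | 0.947 | 0.96 | 1.0 | 0.881 |
| Activation/Deactivation of VAS | Generative | 0.965 | 1.0 | 1.0 | 0.896 |
| SIM Card Activation/Deactivation/Fault | Generative | 0.967 | 1.0 | 1.0 | 0.902 |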
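Generative search has the highest average score in every category evaluated. Excluding Generative search, Semantic search is the best or tied-best method in four of the seven categories (Call Drop, Data Speed Lower Than Committed, MNP, and SIM Card Activation/Deactivation/Fault); Keyword, Reranking, and Filtered search lead in the others.
Key Metrics Definitions
Each method's breakdown reports Context Precision, Context Recall, Response Relevancy, and an Avg Score, in that order. The Avg Score is the unweighted mean of the three metrics.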
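As a quick check of the Avg Score arithmetic (the reported values match an unweighted mean of the three metric scores, rounded to three decimals), here is a minimal sketch. The helper name `average_score` is hypothetical; the inputs are the Semantic scores reported for Call Drop and Improper Network Coverage.

```python
def average_score(precision, recall, response_relevancy):
    """Unweighted mean of the three metric scores, rounded to 3 decimals."""
    return round((precision + recall + response_relevancy) / 3, 3)


# Semantic search, Call Drop: 1.00, 1.00, 0.846
call_drop_semantic = average_score(1.00, 1.00, 0.846)

# Semantic search, Improper Network Coverage: 0.96, 1.00, 0.842
coverage_semantic = average_score(0.96, 1.00, 0.842)
```

Both results agree with the Avg Score values reported for those rows (0.949 and 0.934).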
Category: Mobile Related > Call Drop
Metric Breakdown
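| Method | Precision | Recall | Response Relevancy | Avg Score |
| --- | --- | --- | --- | --- |
| Semantic | 1.00 | 1.00 | 0.846 | 0.949 |
| Hybrid | 1.00 | 1.00 | 0.846 | 0.949 |
| Keyword | 1.00 | 1.00 | 0.830 | 0.943 |
| Vector | 1.00 | 1.00 | 0.732 | 0.911 |
| Reranking | 1.00 | 1.00 | 0.833 | 0.944 |
| Filtered | 1.00 | 1.00 | 0.846 | 0.949 |

Best Method (Excluding Generative): Semantic / Hybrid / Filtered (Tie at 0.949)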
Category: Mobile Related > Improper Network Coverage
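| Method | Precision | Recall | Response Relevancy | Avg Score |
| --- | --- | --- | --- | --- |
| Semantic | 0.96 | 1.00 | 0.842 | 0.934 |
| Hybrid | 0.96 | 1.00 | 0.829 | 0.930 |
| Keyword | 1.00 | 1.00 | 0.829 | 0.943 |
| Vector | 0.84 | 0.986 | 0.716 | 0.847 |
| Reranking | 1.00 | 1.00 | 0.829 | 0.943 |
| Filtered | 1.00 | 1.00 | 0.812 | 0.937 |

Best Method (Excluding Generative): Keyword / Reranking (Tie at 0.943)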
Category: Mobile Related > Data Speed Lower Than Committed
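| Method | Precision | Recall | Response Relevancy | Avg Score |
| --- | --- | --- | --- | --- |
| Semantic | 1.00 | 1.00 | 0.868 | 0.956 |
| Hybrid | 1.00 | 1.00 | 0.868 | 0.956 |
| Keyword | 0.96 | 1.00 | 0.868 | 0.943 |
| Vector | 0.92 | 0.985 | 0.726 | 0.877 |
| Reranking | 1.00 | 1.00 | 0.817 | 0.939 |
| Filtered | 1.00 | 1.00 | 0.800 | 0.933 |

Best Method (Excluding Generative): Semantic / Hybrid (Tie at 0.956)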
Category: Mobile Related > Mobile Number Portability (MNP)
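| Method | Precision | Recall | Response Relevancy | Avg Score |
| --- | --- | --- | --- | --- |
| Semantic | 1.00 | 1.00 | 0.822 | 0.941 |
| Hybrid | 1.00 | 1.00 | 0.814 | 0.938 |
| Keyword | 1.00 | 1.00 | 0.808 | 0.936 |
| Vector | 1.00 | 1.00 | 0.722 | 0.907 |
| Reranking | 0.80 | 0.80 | 0.659 | 0.753 |
| Filtered | 1.00 | 1.00 | 0.786 | 0.929 |

Best Method (Excluding Generative): Semantic (0.941)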
Category: Mobile Related > UCC Related Complaints
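| Method | Precision | Recall | Response Relevancy | Avg Score |
| --- | --- | --- | --- | --- |
| Semantic | 0.96 | 1.00 | 0.804 | 0.921 |
| Hybrid | 0.96 | 1.00 | 0.804 | 0.921 |
| Keyword | 0.96 | 1.00 | 0.806 | 0.922 |
| Vector | 0.96 | 0.96 | 0.736 | 0.885 |
| Reranking | 0.80 | 0.80 | 0.641 | 0.747 |
| Filtered | 1.00 | 1.00 | 0.779 | 0.926 |

Best Method (Excluding Generative): Filtered (0.926)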
Category: Mobile Related > VAS Activation/Deactivation Without Consent
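| Method | Precision | Recall | Response Relevancy | Avg Score |
| --- | --- | --- | --- | --- |
| Semantic | 0.80 | 0.80 | 0.653 | 0.751 |
| Hybrid | 0.80 | 0.80 | 0.653 | 0.751 |
| Keyword | 0.80 | 0.80 | 0.653 | 0.751 |
| Vector | 0.96 | 1.00 | 0.750 | 0.903 |
| Reranking | 1.00 | 1.00 | 0.816 | 0.939 |
| Filtered | 1.00 | 1.00 | 0.770 | 0.923 |

Best Method (Excluding Generative): Reranking (0.939)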
Category: Mobile Related > SIM Card Activation/Deactivation/Fault
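| Method | Precision | Recall | Response Relevancy | Avg Score |
| --- | --- | --- | --- | --- |
| Semantic | 1.00 | 1.00 | 0.834 | 0.945 |
| Hybrid | 0.80 | 0.80 | 0.663 | 0.754 |
| Keyword | 1.00 | 1.00 | 0.832 | 0.944 |
| Vector | 1.00 | 1.00 | 0.737 | 0.912 |
| Reranking | 1.00 | 1.00 | 0.831 | 0.944 |
| Filtered | 1.00 | 1.00 | 0.775 | 0.925 |

Best Method (Excluding Generative): Semantic (0.945)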
Overall Summary
| Category | Best Method (Excluding Generative) | Avg Score |
| --- | --- | --- |
| Call Drop | Semantic / Hybrid / Filtered | 0.949 |
| Improper Network Coverage | Keyword / Reranking | 0.943 |
| Data Speed Lower Than Committed | Semantic / Hybrid | 0.956 |
| Mobile Number Portability (MNP) | Semantic | 0.941 |
| UCC Related Complaints | Filtered | 0.926 |
| VAS Activation/Deactivation Without Consent | Reranking | 0.939 |
| SIM Card Activation/Deactivation/Fault | Semantic | 0.945 |
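Picking the best method per category, including ties, can be sketched as follows. The `best_methods` helper and its tolerance parameter are hypothetical; the scores are the Call Drop average scores reported in this document.

```python
def best_methods(scores, tol=1e-9):
    """Return every method whose score ties the maximum, plus that maximum."""
    top = max(scores.values())
    winners = sorted(m for m, s in scores.items() if abs(s - top) <= tol)
    return winners, top


# Average scores for Call Drop, excluding Generative search.
call_drop = {
    "Semantic": 0.949,
    "Hybrid": 0.949,
    "Keyword": 0.943,
    "Vector": 0.911,
    "Reranking": 0.944,
    "Filtered": 0.949,
}

winners, top = best_methods(call_drop)
print(" / ".join(winners), f"(top score {top})")
```

Returning all tied methods rather than a single winner avoids implying a ranking among methods whose rounded scores are identical.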
📌 Insights
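Semantic search is the best or tied-best non-generative method in four of the seven categories (Call Drop, Data Speed, MNP, SIM Card), which may reflect straightforward context and overlapping vocabulary in those queries. It drops to 0.751 on VAS Activation/Deactivation.
Filtered search performed surprisingly well in UCC (best at 0.926) and Call Drop (tied best at 0.949), possibly due to precise taxonomy and labeled data.
Vector search generally lags in response relevancy (0.716 to 0.750 across categories), even where its precision and recall are perfect (Call Drop, MNP, SIM Card).
Reranking appears to help where the initial results need finer-grained prioritization: it leads on VAS Activation/Deactivation (0.939) and ties for best on Improper Network Coverage (0.943). It falls to 0.753 on MNP and 0.747 on UCC, so its benefit depends on the category.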