Benchmarking pre-trained text embedding models in aligning built asset information
Categories: cs.CL, cs.AI, cs.IR
Released Date: November 18, 2024
Authors: Mehrzad Shahinmoghadam, Ali Motamedi
Affiliation (both authors): Department of Construction Engineering, École de technologie supérieure, Montreal, H3C 1K3, Canada
| Clustering tasks | No. of subsets | Unique/total samples | Avg. sample length | Total No. of unique labels | Avg. unique labels per subset |
|---|---|---|---|---|---|
| Clustering-s2s | 18 | 2545/3815 | 28.04 | 31 | 5 |
| Clustering-p2p | 20 | 3067/4577 | 207.91 | 35 | 5 |

| Retrieval tasks | No. of queries | Avg. query length | No. of documents | Avg. document length | Avg. No. of documents per query |
|---|---|---|---|---|---|
| Retrieval-s2p | 977 | 30.35 | 2761 | 312.75 | 8 |
| Retrieval-p2p | 977 | 128.5 | 2761 | 312.75 | 8 |

| Reranking tasks | No. of queries | Avg. query length | No. of positives (unique/total) | No. of negatives (unique/total) | Avg. sample length |
|---|---|---|---|---|---|
| Reranking-s2p | 179 | 27.89 | 1253/1253 | 2281/3759 | 310.15 |
| Reranking-p2p | 179 | 140.44 | 1253/1253 | 2241/3759 | 309.66 |
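
To make the clustering columns above concrete, here is a minimal sketch of how such statistics could be derived from a set of labeled subsets. It is illustrative only and not the authors' code. The function name `clustering_stats`, the `(text, label)` input layout, and the toy samples are all hypothetical. The sketch also assumes that length is measured in characters and averaged over all samples (not only unique ones), neither of which the source specifies.

```python
from statistics import mean


def clustering_stats(subsets):
    """Summarize clustering subsets.

    `subsets` is a list of subsets; each subset is a list of
    (text, label) pairs. This layout is a hypothetical one chosen
    for illustration.
    """
    texts = [text for subset in subsets for text, _ in subset]
    labels = {label for subset in subsets for _, label in subset}
    return {
        "num_subsets": len(subsets),
        "unique_samples": len(set(texts)),
        "total_samples": len(texts),
        # Assumption: length in characters, averaged over all samples.
        "avg_sample_length": mean(len(t) for t in texts),
        "total_unique_labels": len(labels),
        "avg_unique_labels_per_subset": mean(
            len({label for _, label in subset}) for subset in subsets
        ),
    }


# Toy data, invented purely for illustration.
toy_subsets = [
    [("steel beam", "structure"), ("air handling unit", "hvac")],
    [("steel beam", "structure"), ("fire door", "architecture"), ("chiller", "hvac")],
]
stats = clustering_stats(toy_subsets)
print(f'{stats["unique_samples"]}/{stats["total_samples"]}')
```

In this toy input the text "steel beam" appears in both subsets, so the unique count is lower than the total count. That is the relationship the "Unique/total" column reports.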