BIS: NL2SQL Service Evaluation Benchmark for Business Intelligence Scenarios
cs.AI
Released Date: October 30, 2024
Authors: Bora Caglayan¹, Mingxue Wang¹, John D. Kelleher², Shen Fei³, Gui Tong³, Jiandong Ding³, Puchao Zhang¹
Affiliations: ¹Huawei Ireland Research Centre, Dublin, Ireland; ²ADAPT Research Centre, School of Computer Science and Statistics, Trinity College Dublin, Dublin, Ireland; ³Huawei Technologies Co., Ltd.

| Benchmark | Description | BI Question Categories | SQL Query Evaluation Metrics | SQL Result Evaluation Metrics | Number of Questions |
|---|---|---|---|---|---|
| WikiSQL [13] | Generated questions from Wikipedia | ✗ | SQL logical form exact match | Result exact match | 80645 |
| Spider [12] | Student-generated database covering a wide range of domains | ✗ | SQL component exact match | Result exact match | 10181 |
| Yelp & IMDB [11] | Yelp website and the Internet Movie Database | ✗ | ✗ | ✗ | 259 |
| MAS [6] | Microsoft Academic Search database | ✗ | ✗ | ✗ | 196 |
| Advising [3] | Course information | ✗ | SQL component exact match | ✗ | 4579 |
| CSpider [8] | Chinese version of the Spider benchmark | ✗ | SQL component exact match | ✗ | 9691 |
| BIS (our benchmark) | Benchmark with temporal information for BI applications | ✓ | SQL semantic similarity | Result partial similarity | 239 |
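
The table contrasts "result exact match" with "result partial similarity" but does not define either metric. The sketch below is only an illustration of why the distinction matters for BI queries: it assumes exact match means identical row multisets, and it uses a Jaccard overlap of rows as a stand-in for partial similarity. Both function names and the Jaccard choice are assumptions made here, not the definitions used by BIS.

```python
from collections import Counter


def result_exact_match(predicted, expected):
    """Return True only if both results hold the same rows with the same
    multiplicity (row order is ignored). Assumed definition."""
    return Counter(map(tuple, predicted)) == Counter(map(tuple, expected))


def result_partial_similarity(predicted, expected):
    """Jaccard overlap of the two row multisets, in [0, 1].
    Illustrative stand-in only; not the BIS metric."""
    p = Counter(map(tuple, predicted))
    e = Counter(map(tuple, expected))
    union = sum((p | e).values())  # multiset union: max count per row
    if union == 0:
        return 1.0  # two empty results are treated as identical
    return sum((p & e).values()) / union  # multiset intersection: min count


# Hypothetical monthly-sales result: the prediction gets one of three rows wrong.
expected = [("2024-01", 120), ("2024-02", 95), ("2024-03", 130)]
predicted = [("2024-01", 120), ("2024-02", 95), ("2024-03", 131)]

print(result_exact_match(predicted, expected))         # False
print(result_partial_similarity(predicted, expected))  # 0.5
```

Under these assumptions, exact match scores the near-correct prediction the same as a completely wrong one, while the partial score still credits the two matching rows.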