A Survey of Large Language Model-Based Generative AI for Text-to-SQL: Benchmarks, Applications, Use Cases, and Challenges

Subjects: cs.AI, cs.DB
Published: December 6, 2024
Authors: Aditi Singh1, Akash Shetty1, Abul Ehtesham2, Saket Kumar3, Tala Talaei Khoei4
Affiliations: 1Cleveland State University; 2The Davey Tree Expert Company; 3The MathWorks; 4Khoury College of Computer Sciences, Roux Institute at Northeastern University

| Model Name | Dataset | Training Method | Reported Accuracy |
|---|---|---|---|
| Seq2SQL | WikiSQL | Seq-to-Seq with Reinforcement Learning | 59.4% |
| SQLNet | WikiSQL | Sketch-Based with Column Attention | 63.2% |
| TypeSQL | WikiSQL | Type-Aware Neural Network | 82.6% |
| IRNet | Spider | Graph Encoder + Intermediate Representation | 61.9% |
| T5-3B | Spider, CoSQL | Fine-Tuned Transformer | 70.0% |
| PICARD + T5-3B | CoSQL | Constrained Decoding for Dialogue-Based SQL Generation | High |
| RASAT+PICARD | CoSQL | Relation-Aware Self-Attention-augmented T5 with Incremental Parsing | 37.4% (interaction execution accuracy) |
| MedT5SQL | MIMICSQL | BERT-based Encoder with LSTM Decoder for SQL Translation | High Accuracy in Medical Query Translation |
| EDU-T5 | Custom Educational Dataset | Fine-tuned T5 Model with Cross-Attention for SQL Query Generation | Optimized |
| RAT-SQL | WikiSQL, Spider | Relation-Aware Transformer | 69.7% |
| SQLova | WikiSQL | BERT + Column Attention | 95% |
| X-SQL | WikiSQL | BERT-style pre-training with context | 91.8% |
| EHRSQL | EHRSQL Benchmark | Benchmark Model for EHRs | N/A |
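Of the training methods listed above, PICARD's constrained decoding is the easiest to illustrate in isolation: at each generation step, candidate tokens that would make the partial SQL output unparseable are rejected before the highest-scoring survivor is kept. The sketch below is a toy stand-in, not PICARD itself — the validity check is a few hand-written rules rather than a real incremental SQL parser, and `mock_scores` is a hypothetical scoring function standing in for a fine-tuned T5.

```python
KEYWORDS = ["SELECT", "FROM", "WHERE"]

def is_valid_prefix(tokens):
    """Tiny stand-in for PICARD's incremental parser: clause keywords must
    appear in canonical order, at most once each, and never back-to-back."""
    seen = [t for t in tokens if t in KEYWORDS]
    if seen != KEYWORDS[: len(seen)]:
        return False
    return all(not (a in KEYWORDS and b in KEYWORDS)
               for a, b in zip(tokens, tokens[1:]))

def constrained_greedy_decode(score_fn, vocab, max_len=10):
    """Greedy decoding that vetoes any token whose extended prefix fails
    the incremental validity check, then takes the best remaining token."""
    out = []
    for _ in range(max_len):
        for tok in sorted(vocab, key=lambda t: score_fn(out, t), reverse=True):
            if is_valid_prefix(out + [tok]):
                out.append(tok)
                break
        if out and out[-1] == "<eos>":
            break
    return " ".join(out[:-1]) if out and out[-1] == "<eos>" else " ".join(out)

# Hypothetical model scores: "WHERE" is deliberately over-rated so the
# grammar filter has something to veto; otherwise the next token of a
# target query scores highest.
TARGET = ["SELECT", "name", "FROM", "users", "WHERE", "age", "<eos>"]
VOCAB = sorted(set(TARGET))

def mock_scores(prefix, token):
    if token == "WHERE":
        return 1.5
    if len(prefix) < len(TARGET) and token == TARGET[len(prefix)]:
        return 1.0
    return 0.0

print(constrained_greedy_decode(mock_scores, VOCAB))
# → SELECT name FROM users WHERE age
```

Even though the raw model prefers "WHERE" at every step, the validity filter only lets it through at the one position where a WHERE clause is grammatical — the same mechanism, with a real parser and a real language model, that lets PICARD guarantee well-formed SQL in dialogue-based generation.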