An Experimental Study on Data Augmentation Techniques for Named Entity Recognition on Low-Resource Domains
Categories: cs.CL, cs.IR
Release date: November 21, 2024
Authors: Arthur Elwing Torres¹, Edleno Silva de Moura¹, Altigran Soares da Silva¹, Mario A. Nascimento², Filipe Mesquita³
Affiliations: ¹Universidade Federal do Amazonas, Manaus, Amazonas, Brazil; ²Northeastern University, Vancouver, BC, Canada; ³Diffbot Technologies Corp., Menlo Park, California, USA

| Dataset | Train sentences | Train tokens | Train entity types | Validation sentences | Validation tokens | Validation entity types |
|---|---|---|---|---|---|---|
| i2b2-2010 | 13867 | 127151 | 3 | 2448 | 22390 | 3 |
| MaSciP | 2253 | 53718 | 21 | 138 | 3548 | 20 |
| BioCreative V CDR | 1000 | 116913 | 2 | 1000 | 115965 | 2 |
| JusBrasil | 1817 | 123597 | 1 | 321 | 21279 | 1 |
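The sentence and token counts in the table vary widely in scale across the four corpora. One quick way to compare them is the average number of tokens per sentence in each split. The Python sketch below only re-derives that ratio from the figures in the table. The `STATS` dictionary layout and the `tokens_per_sentence` helper are illustrative names introduced here, not something from the paper.

```python
# Figures copied from the dataset table: (sentences, tokens, entities) per split.
STATS = {
    "i2b2-2010": {"train": (13867, 127151, 3), "validation": (2448, 22390, 3)},
    "MaSciP": {"train": (2253, 53718, 21), "validation": (138, 3548, 20)},
    "BioCreative V CDR": {"train": (1000, 116913, 2), "validation": (1000, 115965, 2)},
    "JusBrasil": {"train": (1817, 123597, 1), "validation": (321, 21279, 1)},
}


def tokens_per_sentence(stats):
    """Return the mean number of tokens per sentence for each dataset and split."""
    return {
        name: {
            split: tokens / sentences
            for split, (sentences, tokens, _entities) in splits.items()
        }
        for name, splits in stats.items()
    }


if __name__ == "__main__":
    # Print one row per dataset with the train and validation averages.
    for name, splits in tokens_per_sentence(STATS).items():
        print(f"{name:20s} train={splits['train']:7.2f}  validation={splits['validation']:7.2f}")
```

For example, i2b2-2010 averages roughly 9 tokens per sentence, while BioCreative V CDR averages about 117.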