notesum.ai
Published on November 25
Deciphering genomic codes using advanced NLP techniques: a scoping review
q-bio.GN
cs.AI
Release Date: November 25, 2024
Authors: Shuyan Cheng¹, Yishu Wei¹, Yiliang Zhou¹, Zihan Xu¹, Drew N Wright², Jinze Liu³, Yifan Peng¹
Affiliations: ¹Department of Population Health Sciences, Weill Cornell Medicine; ²Samuel J. Wood Library & C.V. Starr Biomedical Information Center, Weill Cornell Medicine; ³School of Public Health, Virginia Commonwealth University
| Model | Accuracy | F1 | MCC | ROC AUC | Specificity | Precision | Recall |
| --- | --- | --- | --- | --- | --- | --- | --- |
| BERT-5mC [18] | 0.933 | - | 0.656 | 0.966 | 0.938 | - | 0.872 |
| DNABERT [12]ᵃ | 0.965 | 0.965 | - | 0.930 | - | - | - |
| SETOMIC [34] | 0.950 | 0.921 | - | 0.997 | - | 0.945 | - |
| SETQUENCE [34] | 0.475 | 0.359 | - | 0.910 | - | 0.375 | - |
| BERT-CNN [13] | 0.756 | - | 0.514 | - | 0.712 | - | 0.800 |
| TFBERT [15] | 0.880 | 0.880 | 0.762 | 0.947 | - | 0.882 | 0.880 |
| IGnet [35] | 0.838 | 0.824 | - | 0.924 | - | 0.875 | 0.778 |
| MuLan-Methyl [25] | 0.948 | 0.950 | - | 0.968 | - | - | 0.979 |
| moDNA [22] | 0.862 | 0.862 | 0.725 | 0.935 | - | 0.863 | 0.862 |
| DistilBERT+CRF+Attention Mask [11] | 0.965 | 0.735 | - | - | 0.959 | 0.691 | 0.852 |
| BERT+CRF (with/without) [23] | 0.973 | 0.834 | - | - | 0.962 | 0.780 | 0.897 |
| BERT-Promoter [14] | 0.855 | - | - | - | 0.866 | - | 0.843 |
| DeepViFi (pipeline) [16] | 0.960 | - | - | 0.94 | - | 0.996 | 1.000 |
| GENEMASK-based [17] | 0.898 | - | - | 0.962 | - | - | - |
| MSCAN [26] | 0.957 | 0.713 | 0.710 | 0.937 | 0.994 | 0.905 | - |
| MTTLm6 [27] | 0.699 | 0.713 | 0.399 | 0.771 | 0.649 | 0.681 | - |
| BioDGW-CMI [29]ᵃ | 0.885 | 0.885 | - | 0.948 | - | 0.885 | 0.885 |
| BCMCMI [28] | 0.832 | 0.836 | 0.667 | 0.904 | - | 0.808 | 0.868 |
| MiTDS [30] | 0.770 | 0.810 | - | - | - | - | 0.960 |
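
For readers less familiar with the column headers, the threshold-based metrics in the table can all be derived from the four cells of a binary confusion matrix. The following is a minimal sketch using the standard textbook definitions. The function name `binary_metrics` and the example counts are hypothetical and are not taken from any of the reviewed studies, which may use different averaging schemes (for example, macro-averaging over classes). ROC AUC is omitted because it requires ranked prediction scores, not a single confusion matrix.

```python
import math


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute threshold-based metrics from binary confusion-matrix counts.

    Hypothetical helper for illustration; ratios with a zero
    denominator are reported as 0.0.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0        # also called sensitivity
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    # Matthews correlation coefficient (MCC), ranging from -1 to 1.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "f1": f1,
        "mcc": mcc,
        "specificity": specificity,
        "precision": precision,
        "recall": recall,
    }


# Made-up counts, chosen only to exercise the function.
m = binary_metrics(tp=80, fp=10, tn=90, fn=20)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

Because F1 depends only on precision and recall, rows that report all three can be sanity-checked against each other. For IGnet [35], 2 × 0.875 × 0.778 / (0.875 + 0.778) ≈ 0.824, which matches the reported F1.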