A Review on the Applications of Transformer-based Language Models for Nucleotide Sequence Analysis
cs.CL
cs.AI
Release Date: December 10, 2024
Authors: Nimisha Ghosh¹, Daniele Santoni², Indrajit Saha³, Giovanni Felici²
Affiliations: ¹Institute of Technical Education and Research, Siksha 'O' Anusandhan, India; ²Institute for System Analysis and Computer Science 'Antonio Ruberti', National Research Council of Italy; ³National Institute of Technical Teachers' Training and Research, India

| Category | Paper | Main Idea | Data Repository |
| --- | --- | --- | --- |
| Promoter and Enhancer | Le et al. [39] | A pre-trained BERT model encodes DNA sequences, and SHAP-based feature selection picks the top-ranked BERT encodings as input to machine learning algorithms for promoter prediction (see the pipeline sketch after this table) | BERT-Promoter |
| | Wang et al. [41] | A BERT-based model for predicting miRNA promoters directly from gene sequences without using any structural or biological signals | miProBERT |
| | Mai et al. [44] | Comparison of the performance of state-of-the-art NLP models for predicting and analyzing promoters in the cyanobacteria Synechocystis sp. PCC 6803 and Synechococcus elongatus sp. UTEX 2973 | TSSNote |
| | An et al. [45] | A motif-oriented DNA pre-training framework with a self-supervised design that can be fine-tuned for predicting promoters and TFBSs | N/A |
| | Li et al. [46] | iEnhancer-ELM tokenizes DNA sequences with multi-scale k-mers and extracts position-aware contextual information of the different-scale k-mers via a multi-head attention mechanism for enhancer prediction | iEnhancer-ELM |
| | Luo et al. [47] | A deep learning framework for discriminating super-enhancers from typical enhancers using sequence information | SENet |
| Methylation | Zeng et al. [56] | Prediction of DNA methylation sites using five Transformer-based language models | MuLan-Methyl |
| | Jin et al. [54] | Enables interpretable prediction of DNA methylation based on genomic sequences alone | iDNA-ABF |
| | Soylu et al. [65] | Understanding post-transcriptional 2’-O-methylation (Nm) RNA modification using a BERT-based model and a CNN | BERT2OME |
| Reads | Gwak et al. [67] | Classification of species based on whole-genome sequencing reads | N/A |
| | Rajkumar et al. [68] | A Transformer-based pipeline to detect viral reads in short-read whole-genome sequencing data | N/A |
| | Gwak et al. [69] | A hierarchical BERT model to identify eukaryotic viruses from metagenome sequencing data | ViBE |
| | Tang et al. [72] | Identification of plasmid contigs from short-read assemblies using a Transformer | PLASMe |
| | Wichmann et al. [73] | Classifying reads using Transformers | N/A |
| Binding | Yamada et al. [75] | Prediction of RNA–protein interactions using a BERT model pre-trained on a human reference genome | BERT-RBP |
| | Luo et al. [76] | DNA–protein binding prediction based on task-specific pre-training | TFBert |
| | Wang et al. [77] | A self-attention-based neural network to predict RNA–protein binding sites | SA-Net |
| | Wu et al. [78] | Using BERT to predict TCR–antigen binding | TCR-BERT |
| | Zhang et al. [79] | Using a hybrid Transformer network to predict TF–DNA binding specificity | GHTNet |
| Miscellaneous | Zhang et al. [82] | A generalized pre-trained tool for multiple DNA sequence analysis tasks | N/A |
| | Fishman et al. [83] | Open-source foundational models for applications in long DNA sequences | GENA-LM |
| | Dalla-Torre et al. [84] | Building the Nucleotide Transformer and applying it to several downstream tasks | N/A |
| | Zhou et al. [85] | Proposal of a foundation model pre-trained on multi-species genomes | DNABERT-2 |
| | Clauwaert et al. [86] | A deep learning model that determines translation start sites using only the information embedded in the transcript nucleotide sequence | TIS Transformer |
| | Du et al. [87] | A pre-trained deep learning model for estimating cross-immunity between drifted strains of Influenza A/H3N2 | DPCIPI |
| | Bai et al. [88] | Identification of bacteriophage genome sequences with representation learning | INHERIT |
| | Lee et al. [92] | Learning the histone codes with large genomic windows and three-dimensional chromatin interactions using a Transformer | Chromoformer |
| | Raad et al. [93] | A fully end-to-end deep model based on Transformers for the prediction of pre-miRNAs | miRe2e |
| | Zhang et al. [94] | Prediction of multiple types of RNA modifications via a biological language model | MRM-BERT |
| | Jurenaite et al. [95] | Supervised learning on oncology-related tasks | N/A |
| | Avsec et al. [96] | Predicting gene expression and chromatin states in humans and mice from DNA sequences | Enformer |
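
Many of the BERT-style tools surveyed above share the same basic pipeline: a DNA sequence is split into overlapping k-mer "words", a pre-trained Transformer encoder turns those tokens into contextual embeddings, and a pooled embedding is handed to a downstream predictor (as in the BERT-Promoter row). The sketch below illustrates that pattern with Hugging Face `transformers`; the checkpoint name, k-mer size, and mean pooling are illustrative assumptions, not details prescribed by the review.

```python
# Minimal sketch of the k-mer + pre-trained encoder pipeline shared by many
# of the surveyed models. The checkpoint name below is a hypothetical
# placeholder, not a model released by any of the cited papers.
import torch
from transformers import AutoModel, AutoTokenizer


def kmer_tokenize(seq: str, k: int = 6) -> str:
    """Split a DNA sequence into overlapping k-mer 'words',
    the token unit used by DNABERT-style models."""
    return " ".join(seq[i:i + k] for i in range(len(seq) - k + 1))


# Hypothetical checkpoint; substitute the repository of the model under study.
tokenizer = AutoTokenizer.from_pretrained("some-org/dna-bert-checkpoint")
model = AutoModel.from_pretrained("some-org/dna-bert-checkpoint")

seq = "ACGTACGTTAGCCGGATCGATTACG"
inputs = tokenizer(kmer_tokenize(seq, k=6), return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state  # shape: (1, n_tokens, dim)

# Mean-pool the token embeddings into one fixed-length feature vector that a
# classical classifier (SVM, gradient boosting, ...) can consume, as in
# feature-selection pipelines such as BERT-Promoter.
features = hidden.mean(dim=1).squeeze(0)
print(features.shape)
```

Multi-scale variants such as iEnhancer-ELM tokenize at several values of k and let multi-head attention combine the resulting views of the same sequence.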