A Review on the Applications of Transformer-based Language Models for Nucleotide Sequence Analysis
cs.CL
cs.AI
Release Date: December 10, 2024
Authors: Nimisha Ghosh¹, Daniele Santoni², Indrajit Saha³, Giovanni Felici²
Affiliations: ¹Institute of Technical Education and Research, Siksha 'O' Anusandhan, India; ²Institute for System Analysis and Computer Science 'Antonio Ruberti', National Research Council of Italy; ³National Institute of Technical Teachers' Training and Research, India

| Category | Paper | Main Idea | Data Repository |
| --- | --- | --- | --- |
| Promoter and Enhancer | Le et al. [39] | A pre-trained BERT model encodes DNA sequences, and SHAP-based feature selection picks the top-ranked BERT encodings as input to machine learning algorithms for promoter prediction (see the pipeline sketch after this table) | BERT-Promoter |
| | Wang et al. [41] | A BERT-based model for predicting miRNA promoters directly from gene sequences without using any structural or biological signals | miProBERT |
| | Mai et al. [44] | Comparison of the performance of state-of-the-art NLP models for predicting and analyzing promoters in the cyanobacteria Synechocystis sp. PCC 6803 and Synechococcus elongatus sp. UTEX 2973 | TSSNote |
| | An et al. [45] | A motif-oriented DNA pre-training framework with a self-supervised design that can be fine-tuned for predicting promoters and TFBSs | N/A |
| | Li et al. [46] | iEnhancer-ELM tokenizes DNA sequences with multi-scale k-mers and extracts position-aware contextual information of the different-scale k-mers via a multi-head attention mechanism for enhancer prediction | iEnhancer-ELM |
| | Luo et al. [47] | A deep learning framework for discriminating super-enhancers from typical enhancers using sequence information | SENet |
| Methylation | Zeng et al. [56] | Prediction of DNA methylation sites using five Transformer-based language models | MuLan-Methyl |
| | Jin et al. [54] | Enables interpretable prediction of DNA methylation based on genomic sequences alone | iDNA-ABF |
| | Soylu et al. [65] | Understanding post-transcriptional 2’-O-methylation (Nm) RNA modification using a BERT-based model and a CNN | BERT2OME |
| Reads | Gwak et al. [67] | Classification of species based on whole-genome sequencing reads | N/A |
| | Rajkumar et al. [68] | A Transformer-based pipeline to detect viral reads in short-read whole-genome sequencing data | N/A |
| | Gwak et al. [69] | A hierarchical BERT model to identify eukaryotic viruses from metagenome sequencing data | ViBE |
| | Tang et al. [72] | Identification of plasmid contigs from short-read assemblies using a Transformer | PLASMe |
| | Wichmann et al. [73] | Classifying reads using Transformers | N/A |
| Binding | Yamada et al. [75] | Prediction of RNA–protein interactions using a BERT model pre-trained on a human reference genome | BERT-RBP |
| | Luo et al. [76] | DNA–protein binding prediction based on task-specific pre-training | TFBert |
| | Wang et al. [77] | A self-attention-based neural network to predict RNA–protein binding sites | SA-Net |
| | Wu et al. [78] | Using BERT to predict TCR–antigen binding | TCR-BERT |
| | Zhang et al. [79] | Using a hybrid Transformer network to predict TF–DNA binding specificity | GHTNet |
| Miscellaneous | Zhang et al. [82] | A generalized pre-trained tool for multiple DNA sequence analysis tasks | N/A |
| | Fishman et al. [83] | Open-source foundational models for applications in long DNA sequences | GENA-LM |
| | Dalla-Torre et al. [84] | Building the Nucleotide Transformer and applying it to several downstream tasks | N/A |
| | Zhou et al. [85] | Proposal of a foundation model pre-trained on multi-species genomes | DNABERT-2 |
| | Clauwaert et al. [86] | A deep learning model that determines translation start sites using only the information embedded in the transcript nucleotide sequence | TIS Transformer |
| | Du et al. [87] | A pre-trained deep learning model for estimating cross-immunity between drifted strains of Influenza A/H3N2 | DPCIPI |
| | Bai et al. [88] | Identification of bacteriophage genome sequences with representation learning | INHERIT |
| | Lee et al. [92] | Learning the histone codes with large genomic windows and three-dimensional chromatin interactions using a Transformer | Chromoformer |
| | Raad et al. [93] | A fully end-to-end deep model based on Transformers for the prediction of pre-miRNAs | miRe2e |
| | Zhang et al. [94] | Prediction of multiple types of RNA modifications via a biological language model | MRM-BERT |
| | Jurenaite et al. [95] | Supervised learning on oncology-related tasks | N/A |
| | Avsec et al. [96] | Predicting gene expression and chromatin states in humans and mice from DNA sequences | Enformer |
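
Many of the BERT-style tools surveyed above share the same basic pipeline: a DNA sequence is split into overlapping k-mer "words", a pre-trained Transformer encoder turns those tokens into contextual embeddings, and a pooled embedding is handed to a downstream predictor (as in the BERT-Promoter row). The sketch below illustrates that pattern with Hugging Face `transformers`; the checkpoint name, k-mer size, and mean pooling are illustrative assumptions, not details prescribed by the review.

```python
# Minimal sketch of the k-mer + pre-trained encoder pipeline shared by many
# of the surveyed models. The checkpoint name below is a hypothetical
# placeholder, not a model released by any of the cited papers.
import torch
from transformers import AutoModel, AutoTokenizer


def kmer_tokenize(seq: str, k: int = 6) -> str:
    """Split a DNA sequence into overlapping k-mer 'words',
    the token unit used by DNABERT-style models."""
    return " ".join(seq[i:i + k] for i in range(len(seq) - k + 1))


# Hypothetical checkpoint; substitute the repository of the model under study.
tokenizer = AutoTokenizer.from_pretrained("some-org/dna-bert-checkpoint")
model = AutoModel.from_pretrained("some-org/dna-bert-checkpoint")

seq = "ACGTACGTTAGCCGGATCGATTACG"
inputs = tokenizer(kmer_tokenize(seq, k=6), return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state  # shape: (1, n_tokens, dim)

# Mean-pool the token embeddings into one fixed-length feature vector that a
# classical classifier (SVM, gradient boosting, ...) can consume, as in
# feature-selection pipelines such as BERT-Promoter.
features = hidden.mean(dim=1).squeeze(0)
print(features.shape)
```

Multi-scale variants such as iEnhancer-ELM tokenize at several values of k and let multi-head attention combine the resulting views of the same sequence.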