SoyTFtarget

An Integrated Database for Predicting Transcription Factor and Target Gene Regulatory Relationships.


Welcome to SoyTFtarget!


SoyTFtarget is a robust server designed to predict the regulatory relationships between TFs and target genes in soybean. This tool integrates three core components: co-expression analysis between TFs and target genes, TF binding site prediction, and machine learning models trained with DAP-seq data.

This database integrates 562 unique RNA-seq datasets from NCBI, from which 25 representative samples of different soybean tissues were selected. Using a standard RNA-seq analysis pipeline, the expression levels of 3,747 soybean transcription factors and protein-coding genes were calculated across these samples. Pearson correlation coefficient analysis was then used to investigate the relationships between transcription factors and their target genes.

To predict promoter binding, FIMO was used to scan gene promoters for TF motifs, yielding 8,434,838 TF-target promoter pairs.

Additionally, DAP-seq data for 97 soybean transcription factors, covering 25 families, yielded 3,552,306 binding peaks. Based on the predicted binding sequences from promoter analysis, 33 features were extracted to build machine learning models using XGBoost, resulting in 18 family-specific models and 1 global model.

Moreover, we manually curated all 63 available experimentally validated TF-target interactions. SoyTFtarget allows users to input TFs or genes to predict downstream targets or upstream TFs, and it also provides a visualization tool for regulatory networks. By integrating multiple analytical methods, SoyTFtarget provides reliable predictions of TF-target interactions and can serve as a valuable resource for soybean genomics researchers.


Co-expression Analysis of TF and Target Genes

Following the standard RNA-seq workflow (adapter trimming, quality control, alignment, and TPM quantification), the expression levels of TFs and their target genes are obtained. The Pearson correlation coefficient (PCC) is calculated to assess the correlation between their expression levels, with values closer to 1 or -1 indicating stronger correlation.
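The PCC calculation described above can be sketched as follows. This is a minimal illustration in plain Python, not the pipeline used by SoyTFtarget; the function name `pearson` and the example TPM values are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length expression vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length vectors with at least 2 samples")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of co-deviations and sums of squared deviations from the means
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # A gene with constant expression has no defined correlation
        return float("nan")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical TPM values for one TF and one candidate target across four samples
tf_tpm = [1.0, 2.0, 3.0, 4.0]
gene_tpm = [2.0, 4.0, 6.0, 8.0]
pcc = pearson(tf_tpm, gene_tpm)
```

A value of `pcc` near 1 suggests the TF and the gene rise and fall together across samples, while a value near -1 suggests opposite trends.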


Prediction Analysis of TF Binding to Target Gene Promoters

The 2,000 bp immediately upstream of the first CDS of each gene was extracted as the promoter region. FIMO was then used to predict the binding of transcription factor motifs within these promoter regions. Regions with a p-value less than 0.05 are considered predicted binding sites.
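The promoter definition and the p-value cutoff above can be sketched as follows. This is an illustration under stated assumptions, not the code behind SoyTFtarget: coordinates are assumed to be 1-based and inclusive (as in GFF3), genes on the minus strand are assumed to be handled by taking the region after the CDS end and reverse-complementing it, and hits are assumed to be dictionaries keyed by the `p-value` column name used in FIMO's tab-separated output. The function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def reverse_complement(seq):
    """Reverse-complement a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def promoter_sequence(chrom_seq, first_cds_start, first_cds_end, strand, length=2000):
    """Return up to `length` bp immediately upstream of a gene's first CDS.

    Coordinates are 1-based inclusive. The result is clipped at the
    chromosome ends, so it may be shorter than `length`.
    """
    if strand == "+":
        end = first_cds_start - 1              # last base before the CDS
        start = max(1, end - length + 1)
        return chrom_seq[start - 1:end]
    if strand == "-":
        start = first_cds_end + 1              # first base past the CDS
        end = min(len(chrom_seq), start + length - 1)
        return reverse_complement(chrom_seq[start - 1:end])
    raise ValueError("strand must be '+' or '-'")

def significant_hits(hits, p_threshold=0.05):
    """Keep motif hits whose p-value is below the threshold."""
    return [h for h in hits if float(h["p-value"]) < p_threshold]

# Toy chromosome; the default length of 2000 is shortened for illustration
chrom = "AAACCCGGGTTT"
plus_promoter = promoter_sequence(chrom, 7, 9, "+", length=3)
minus_promoter = promoter_sequence(chrom, 4, 6, "-", length=6)
```

Reverse-complementing minus-strand promoters keeps every promoter in the same 5' to 3' orientation relative to its gene, which is one common convention before motif scanning.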



Machine Learning Model for Predicting TF Binding Regions on Target Genes

Using DAP-seq data from 97 transcription factors (TFs) across 25 families, machine learning prediction models were constructed. Binding sites identified by scanning the genome for these TFs were used to extract 33 features for model training. The XGBoost algorithm was employed to build two types of prediction models: 18 family-specific models, one for each of the 18 TF families with more than one member, and a global model created by combining data from 96 of the 97 TFs.
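The rule for deciding which families receive a family-specific model (only families with more than one member) can be sketched as follows. This covers only the grouping step, not feature extraction or XGBoost training; the function name, TF identifiers, and family assignments are hypothetical.

```python
from collections import defaultdict

def plan_family_models(tf_to_family):
    """Group TFs by family and keep families with more than one member.

    Returns a dict mapping each eligible family to its sorted member TFs.
    Single-member families are left to the global model.
    """
    members = defaultdict(list)
    for tf, family in tf_to_family.items():
        members[family].append(tf)
    return {fam: sorted(tfs) for fam, tfs in members.items() if len(tfs) > 1}

# Hypothetical TF-to-family assignments
example = {"TF_A": "MYB", "TF_B": "MYB", "TF_C": "WRKY"}
family_plan = plan_family_models(example)
```

In this toy input, only the family with two members would get its own model.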



Integrated TF Regulatory Network Construction

Based on co-expression analysis, promoter binding predictions, machine learning models, and experimentally validated TF-target interactions, we constructed regulatory networks that show upstream and downstream relationships of transcription factors (TFs). Differently colored lines and arrows indicate the direction of regulation, and users can click on connections to view detailed regulatory information. These networks provide researchers with a clear overview of TF regulatory dynamics, helping to identify key regulatory hubs and pathways.
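One way to picture how several evidence sources combine into a single directed network is sketched below. This is an illustrative data structure, not SoyTFtarget's implementation; the function names, the evidence labels, and the gene identifiers are hypothetical.

```python
from collections import defaultdict

def build_network(evidence):
    """Merge (tf, target, source) records into directed TF -> target edges.

    Each edge keeps the set of evidence sources that support it.
    """
    edges = defaultdict(set)
    for tf, target, source in evidence:
        edges[(tf, target)].add(source)
    return dict(edges)

def downstream_targets(edges, tf):
    """Genes predicted to be regulated by the given TF."""
    return sorted(target for (source_tf, target) in edges if source_tf == tf)

def upstream_tfs(edges, gene):
    """TFs predicted to regulate the given gene."""
    return sorted(source_tf for (source_tf, target) in edges if target == gene)

# Hypothetical evidence records from three of the analysis layers
evidence = [
    ("TF_A", "gene1", "coexpression"),
    ("TF_A", "gene1", "fimo"),
    ("TF_A", "gene2", "ml_model"),
    ("TF_B", "gene1", "fimo"),
]
network = build_network(evidence)
```

Keeping the supporting sources on each edge makes it possible to show, for any connection, which analyses agree on it.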




Contact Us:

Ji Huang: huangji@njau.edu.cn
Xueai Zhu: 2022201015@stu.njau.edu.cn

Article: A Machine Learning-based Tool for Predicting Transcription Factor-Target Gene Interactions in Soybean
DOI: 10.1093/plphys/kiaf318
Journal: Plant Physiology

Crop Bioinformatics Group, College of Agriculture,
Nanjing Agricultural University

