Expression levels of transcription factors (TFs) and their target genes are obtained with the standard RNA-seq workflow: adapter trimming, quality control, alignment, and TPM quantification. The Pearson correlation coefficient (PCC) between the expression profiles of a TF and a target gene is then calculated. Values close to 1 indicate strong positive correlation, values close to -1 indicate strong negative correlation, and values near 0 indicate little linear relationship.
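The PCC calculation described above can be sketched in a few lines of Python. This is a minimal illustration rather than the pipeline's actual code: the function name `pearson` and the TPM values are made up for the example, and the page does not say whether TPM values are transformed (for example, log-scaled) before correlation.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2 samples)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("PCC is undefined for a constant profile")
    return cov / sqrt(var_x * var_y)

# Toy TPM values across four samples (illustrative only, not real data).
tf_tpm = [1.0, 2.0, 3.0, 4.0]
target_tpm = [2.0, 4.0, 6.0, 8.0]
pcc = pearson(tf_tpm, target_tpm)
```

In this toy case the target profile is an exact multiple of the TF profile, so the PCC sits at the positive extreme; reversing one profile would move it to the negative extreme.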
The 2,000 bp immediately upstream of the first CDS of each gene is extracted as the promoter region. The FIMO software is then used to scan these promoter regions for TF binding motifs, and motif matches with a p-value below 0.05 are considered predicted binding sites.
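The promoter extraction step can be sketched as follows. This is an illustrative sketch under stated assumptions, not the tool's own code: the function names, the 1-based inclusive coordinate convention, and the handling of minus-strand genes (taking the sequence to the right of the first CDS and reverse-complementing it) are choices made for the example. The FIMO scan itself is not reproduced here.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]

def promoter_sequence(chrom_seq, cds_start, cds_end, strand, length=2000):
    """Return up to `length` bp upstream of a gene's first CDS.

    cds_start and cds_end are assumed to be 1-based inclusive genomic
    coordinates of the first CDS (GFF-style). The result is reported in
    the gene's own orientation and is truncated at chromosome ends.
    """
    if strand == "+":
        end = cds_start - 1               # 0-based index of the first CDS base
        start = max(0, end - length)
        return chrom_seq[start:end]
    if strand == "-":
        start = cds_end                   # 0-based index just past the CDS
        end = min(len(chrom_seq), start + length)
        return reverse_complement(chrom_seq[start:end])
    raise ValueError("strand must be '+' or '-'")

# Toy chromosome; a 3 bp window is used instead of 2000 bp for readability.
chrom = "ACGTTGCAAT"
plus_promoter = promoter_sequence(chrom, cds_start=5, cds_end=7, strand="+", length=3)
minus_promoter = promoter_sequence(chrom, cds_start=4, cds_end=6, strand="-", length=3)
```

The resulting sequences would typically be written to a FASTA file and passed to FIMO together with the motif file.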
Machine learning prediction models were constructed from DAP-seq data for 97 transcription factors (TFs) across 25 families. Genome-wide binding sites identified for these TFs were used to derive 33 features for model training. The XGBoost algorithm was used to build two types of prediction models: 18 family-specific models, one for each of the 18 TF families with more than one member, and a global model trained on the combined data from 96 of the 97 TFs.
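The split into family-specific and global training sets can be sketched as below. This covers only the bookkeeping, not the XGBoost training or the 33 features. The TF and family names are hypothetical placeholders, and because the text does not say which TF is left out of the global model, the excluded set is passed in as a parameter rather than guessed.

```python
from collections import defaultdict

def family_training_sets(tf_to_family):
    """Group TFs by family and keep only families with more than one member."""
    members = defaultdict(list)
    for tf, family in tf_to_family.items():
        members[family].append(tf)
    return {fam: sorted(tfs) for fam, tfs in members.items() if len(tfs) > 1}

def global_training_set(tf_to_family, excluded=frozenset()):
    """All TFs except those explicitly excluded from the global model."""
    return sorted(tf for tf in tf_to_family if tf not in excluded)

# Hypothetical TF-to-family assignments (placeholders, not real data).
tf_to_family = {
    "TF_A": "FAM1",
    "TF_B": "FAM1",
    "TF_C": "FAM2",   # single-member family: no family-specific model
}
family_sets = family_training_sets(tf_to_family)
global_set = global_training_set(tf_to_family, excluded={"TF_C"})
```

Each entry of `family_sets` would correspond to one family-specific classifier, while `global_set` would feed the single global classifier.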
Based on co-expression analysis, promoter binding predictions, machine learning models, and experimentally validated TF-target interactions, we constructed regulatory networks that show the upstream and downstream relationships of transcription factors (TFs). Differently colored lines and arrows indicate the direction of regulation, and users can click on a connection to view detailed regulatory information. These networks give researchers a clear overview of TF regulatory relationships and help identify key regulatory hubs and pathways.
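One way to picture how the four evidence types combine into a single directed network is the sketch below. It is an assumption-laden illustration, not the site's implementation: the evidence labels, gene names, and the use of out-degree as a simple hub indicator are all choices made for the example.

```python
from collections import Counter, defaultdict

def build_network(evidence):
    """Merge (tf, target, source) records into directed edges.

    Returns a dict mapping (tf, target) to the set of evidence sources
    supporting that regulatory edge.
    """
    edges = defaultdict(set)
    for tf, target, source in evidence:
        edges[(tf, target)].add(source)
    return dict(edges)

def out_degree(edges):
    """Count distinct targets per TF; high counts suggest candidate hubs."""
    return Counter(tf for tf, _target in edges)

# Hypothetical records; the source labels are illustrative names only.
evidence = [
    ("TF_A", "GENE_1", "coexpression"),
    ("TF_A", "GENE_1", "fimo"),
    ("TF_A", "GENE_2", "ml_model"),
    ("TF_B", "TF_A", "validated"),
]
network = build_network(evidence)
hubs = out_degree(network)
```

Keeping the set of sources on each edge mirrors the idea that clicking a connection reveals which lines of evidence support it.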
Contact Us:
Ji Huang: huangji@njau.edu.cn
Xueai Zhu: 2022201015@stu.njau.edu.cn
Article: A Machine Learning-based Tool for Predicting Transcription Factor-Target Gene Interactions in Soybean
DOI: 10.1093/plphys/kiaf318
Journal: Plant Physiology
Crop Bioinformatics Group, College of Agriculture, Nanjing Agricultural University