Differential expression analysis of phasiRNAs at each PHAS gene locus

The differential expression analysis of PHAS genes and phasiRNAs is based on the methods described in the following three articles:

[1] Feng J, Meyer C A, Wang Q, et al. GFOLD: a generalized fold change for ranking differentially expressed genes from RNA-seq data. Bioinformatics, 2012, 28(21):2782.

[2] Love M I, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology, 2014, 15(12):550.

[3] Robinson M D, McCarthy D J, Smyth G K. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics, 2010, 26(1):139.

Because the phasiRNA clusters are generated and verified on each 462-nt (21-nt*11*2) or 528-nt (24-nt*11*2) double-stranded structure (see the help page for details), the clusters can contain redundant phasiRNA sequences. This Python script generates a count matrix of phasiRNAs and merges redundant phasiRNAs according to the analysis result (the result file produced by the analysis function). The script was designed and written by Yuhan Fei and Baoyi Zhang.
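
The merging step described above could look roughly like the following sketch. The input structure (a dict mapping each sample name to a list of (PHAS gene, sequence, count) tuples), the function name build_count_matrix, and the rule of keeping the largest count among duplicates are all assumptions made for illustration; they are not taken from merge_phasiRNA.py.

```python
from collections import defaultdict


def build_count_matrix(samples):
    """Collapse redundant phasiRNAs and build a per-sample count matrix.

    samples: dict of sample name -> list of (phas_gene, sequence, count).
    Records sharing the same (phas_gene, sequence) key within one sample are
    treated as redundant. Keeping the largest count is an ASSUMPTION: it
    supposes overlapping clusters report the same reads more than once.
    """
    sample_names = list(samples)
    matrix = defaultdict(lambda: [0] * len(sample_names))
    for col, name in enumerate(sample_names):
        for phas_gene, sequence, count in samples[name]:
            row = matrix[(phas_gene, sequence)]
            row[col] = max(row[col], count)
    return sample_names, dict(matrix)


if __name__ == "__main__":
    # Hypothetical records; the real input files have their own format.
    demo = {
        "control": [("PHAS1", "ACGT", 5), ("PHAS1", "ACGT", 5), ("PHAS1", "TTGCA", 2)],
        "treated": [("PHAS1", "ACGT", 7)],
    }
    names, counts = build_count_matrix(demo)
    print("\t".join(["PHAS_gene", "sequence"] + names))
    for (gene, seq), row in counts.items():
        print("\t".join([gene, seq] + [str(c) for c in row]))
```

A phasiRNA that is absent from a sample gets a count of 0 in that column, so every row has one value per sample.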

Click here to download the script: merge_phasiRNA.py

Meaning of each parameter:
-r number of replicates (mandatory)
-i names of the input files (with -r 0, a single file; with r>=1, an even number of files, control replicates first and treated replicates second) (mandatory)
-o name of the output count matrix of phasiRNAs (mandatory)
-phas name of the folder for the non-redundant phasiRNAs (optional)

For example:
python merge_phasiRNA.py -r 0 -i control.xls -o count_matrix.txt -phas phasiRNAs
python merge_phasiRNA.py -r 1 -i control_R1.xls treated_R1.xls -o count_matrix.txt -phas phasiRNAs
python merge_phasiRNA.py -r 2 -i control_R1.xls control_R2.xls treated_R1.xls treated_R2.xls -o output.txt -phas phasiRNAs
python merge_phasiRNA.py -r 3 -i control_R1.xls control_R2.xls control_R3.xls treated_R1.xls treated_R2.xls treated_R3.xls -o output.txt -phas phasiRNAs

Tip: The input files are produced by the analysis function. DO NOT change their format.
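
The relationship between -r and the number of -i files in the examples above can be checked before running the script. The sketch below is a hypothetical validator, not the code of merge_phasiRNA.py: the function names are invented, and the rule "one file for -r 0, otherwise 2 x r files" is inferred from the four example commands.

```python
import argparse


def expected_input_count(replicates):
    # Inferred from the examples: a single file when -r is 0, otherwise
    # `replicates` control files followed by `replicates` treated files.
    return 1 if replicates == 0 else 2 * replicates


def parse_args(argv=None):
    parser = argparse.ArgumentParser(
        description="Hypothetical argument check for merge_phasiRNA.py-style options"
    )
    parser.add_argument("-r", type=int, required=True, help="number of replicates")
    parser.add_argument("-i", nargs="+", required=True, help="input files")
    parser.add_argument("-o", required=True, help="output count matrix")
    parser.add_argument("-phas", default=None, help="folder for non-redundant phasiRNAs")
    args = parser.parse_args(argv)

    if args.r < 0:
        parser.error("-r must be 0 or a positive integer")
    expected = expected_input_count(args.r)
    if len(args.i) != expected:
        parser.error(
            "-r %d expects %d input file(s), got %d" % (args.r, expected, len(args.i))
        )
    return args


if __name__ == "__main__":
    parsed = parse_args()
    print(parsed.r, parsed.i, parsed.o, parsed.phas)
```

Called with the arguments of the third example (-r 2 and four input files), the check passes; with -r 2 and only three files it exits with an error message.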


We also provide a one-line shell command to convert the non-redundant phasiRNA matrix into FASTA format. Users can upload the non-redundant phasiRNAs in FASTA format to our web-based server WPMIAS for further analysis of the function of these phasiRNAs.

For example:
awk '{print $2 "\n" $3}' input_file.txt > output_file.txt
grep -w phas_gene input_file.txt | awk '{print $2 "\n" $3}' > output_file.txt

Tip: The input file comes from the result page of the differential expression analysis of PHAS genes. DO NOT change its format. It contains the non-redundant phasiRNAs in six columns: PHAS_gene, phasiRNA_ID, sequence, strand, start_position, count_number. The ID of each phasiRNA consists of >phas + PHAS gene + Strand (Forward/Reverse) + Name of small RNA + Count.
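
For users who prefer Python to awk, the same conversion can be sketched as follows. It assumes whitespace-separated columns (as awk does by default) and that the second column already carries the leading ">"; the function name table_to_fasta is hypothetical. Unlike grep -w, which matches the gene name anywhere on the line, this sketch filters on the first column only.

```python
def table_to_fasta(lines, phas_gene=None):
    """Convert six-column non-redundant phasiRNA rows to FASTA text.

    lines: iterable of text lines (PHAS_gene, phasiRNA_ID, sequence, ...).
    phas_gene: if given, keep only rows whose first column equals it.
    """
    records = []
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or incomplete lines
        if phas_gene is not None and fields[0] != phas_gene:
            continue
        # The ID is expected to start with ">"; add it only if it is missing.
        header = fields[1] if fields[1].startswith(">") else ">" + fields[1]
        records.append(header + "\n" + fields[2])
    return "\n".join(records) + ("\n" if records else "")


if __name__ == "__main__":
    import sys

    # Usage: python this_sketch.py input_file.txt [PHAS_gene] > output_file.txt
    gene = sys.argv[2] if len(sys.argv) > 2 else None
    with open(sys.argv[1]) as handle:
        sys.stdout.write(table_to_fasta(handle, gene))
```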

Tools related to phasiRNA analysis:

Web-based Server:
Dai X, Zhao P X. pssRNAMiner: a plant short small RNA regulatory cascade analysis server. Nucleic Acids Research, 2008.

Stand-alone Program:
Guo Q, Qu X, Jin W. PhaseTank: genome-wide computational identification of phasiRNAs and their regulatory cascades. Bioinformatics, 2015, 31(2):284-286.

Kakrana A, Li P, Patel P, Hammond R, Anand D, Mathioni S M, et al. PHASIS: A computational suite for de novo discovery and characterization of phased, siRNA-generating loci and their miRNA triggers. bioRxiv, 2017, 158832. doi: 10.1101/158832

Gebert D, Hewel C, Rosenkranz D. unitas: the universal tool for annotation of small RNAs. BMC Genomics, 2017, 18(1):644.

Zhang C, Li G, Zhu S, et al. tasiRNAdb: a database of ta-siRNA regulatory pathways. Bioinformatics, 2014, 30(7):1045-1046.

Crop Bioinformatics Group, College of Agriculture, Nanjing Agricultural University
Contact us at huangji@njau.edu.cn (Ji Huang) or 2016201004@njau.edu.cn (Yuhan Fei)