Because the phasiRNA clusters are generated and verified based on each 462-nt (21-nt*11*2) or 528-nt (24-nt*11*2) double-stranded structure (see details on the help page), a phasiRNA cluster can contain redundant phasiRNA sequences. This Python script generates a count matrix of phasiRNAs and merges redundant phasiRNAs according to the analysis results (the result files are generated by the analysis function). The script was designed and written by Yuhan Fei and Baoyi Zhang.
Click here to download the script: merge_phasiRNA.py
Meaning of each parameter:
-r number of replicates (mandatory)
-i names of the input files (the number of input files must be even if r>=1) (mandatory)
-o name of count_matrix of phasiRNA (mandatory)
-phas folder name of non-redundant phasiRNAs (optional)
For example:
python merge_phasiRNA.py -r 0 -i control.xls -o count_matrix.txt -phas phasiRNAs
python merge_phasiRNA.py -r 1 -i control_R1.xls treated_R1.xls -o count_matrix.txt -phas phasiRNAs
python merge_phasiRNA.py -r 2 -i control_R1.xls control_R2.xls treated_R1.xls treated_R2.xls -o output.txt -phas phasiRNAs
python merge_phasiRNA.py -r 3 -i control_R1.xls control_R2.xls control_R3.xls treated_R1.xls treated_R2.xls treated_R3.xls -o output.txt -phas phasiRNAs
...
Tips: The input files are produced by the analysis function. DO NOT change the format of the input files.
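The relationship between -r and -i can be sketched in Python as follows. This is only an illustration, not the code inside merge_phasiRNA.py: the function name split_inputs is hypothetical, and the rule that r>=1 takes exactly 2*r files (r control files followed by r treated files) is an assumption inferred from the examples above, which only state that the number must be even.

```python
def split_inputs(replicates, files):
    """Split the -i file list into (control, treated) groups.

    Assumption (inferred from the usage examples, not from the script):
    with r >= 1 the first r files are controls and the last r are treated.
    """
    files = list(files)
    if replicates < 0:
        raise ValueError("-r must be 0 or a positive integer")
    if replicates == 0:
        # No replicates: treat everything as a single group.
        if not files:
            raise ValueError("-i needs at least one input file")
        return files, []
    if len(files) % 2 != 0:
        raise ValueError("-i needs an even number of files when r >= 1")
    if len(files) != 2 * replicates:
        raise ValueError(
            "expected %d files for r=%d, got %d"
            % (2 * replicates, replicates, len(files))
        )
    return files[:replicates], files[replicates:]


control, treated = split_inputs(
    2, ["control_R1.xls", "control_R2.xls", "treated_R1.xls", "treated_R2.xls"]
)
print(control)
print(treated)
```

Checking the file count before running the script catches a missing or extra replicate file early.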
=========================================================================================================
We also provide a one-line shell script to convert the non-redundant phasiRNA matrix into FASTA format. Users can upload the non-redundant phasiRNAs in FASTA format to our web-based server WPMIAS for further analysis of the function of these phasiRNAs.
For example:
awk '{print $2 "\n" $3}' input_file.txt > output_file.txt
grep -w phas_gene input_file.txt | awk '{print $2 "\n" $3}' > output_file.txt
Tips: The input file is derived from the result page of the differential expression PHAS genes analysis. DO NOT change the format of the input file. The input file lists the non-redundant phasiRNAs in six columns: PHAS_gene, phasiRNA_ID, sequence, strand, start_position, and count_number. The ID of each phasiRNA consists of >phas + PHAS gene + Strand (Forward/Reverse) + Name of small RNA + Count.
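For users who prefer Python to awk, the same conversion can be sketched as below. This is a minimal illustration under stated assumptions: the function name matrix_to_fasta is hypothetical, columns are split on any whitespace (as awk does by default), and the optional gene filter compares the first column exactly, which is slightly stricter than grep -w. The sample rows are invented placeholders, not real results.

```python
def matrix_to_fasta(lines, phas_gene=None):
    """Convert six-column phasiRNA rows into FASTA text.

    Column 2 (phasiRNA_ID) becomes the header and column 3 the sequence.
    If phas_gene is given, only rows whose first column equals it are kept.
    """
    records = []
    for line in lines:
        fields = line.split()  # whitespace split, like awk's default
        if len(fields) < 3:
            continue  # skip blank or incomplete lines
        gene, phasirna_id, sequence = fields[0], fields[1], fields[2]
        if phas_gene is not None and gene != phas_gene:
            continue
        # The ID is described as already starting with '>'; add it only if absent.
        if phasirna_id.startswith(">"):
            header = phasirna_id
        else:
            header = ">" + phasirna_id
        records.append(header + "\n" + sequence)
    return "\n".join(records) + ("\n" if records else "")


# Invented placeholder rows, for illustration only.
sample = [
    "geneA\t>placeholder_id_1\tACGTACGTACGTACGTACGTA\tForward\t100\t5",
    "geneB\t>placeholder_id_2\tTTGCATTGCATTGCATTGCAT\tReverse\t200\t3",
]
print(matrix_to_fasta(sample, phas_gene="geneA"), end="")
```

To process a real file, pass an open file object as lines, for example matrix_to_fasta(open("input_file.txt")), and write the returned text to the output file.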