Usage
You can use mlqtl through the command-line interface (CLI) or the Python API.
mlqtl operates on genotype data in plink binary format. If your data is in VCF format, install plink first so that you can convert it; plink 1.9 is the tested version.
1. Data Preprocessing
1.1 Gene Range File
This file can be generated from GFF or GTF files using the provided tools mlqtl gff2range or mlqtl gtf2range. However, because GFF and GTF files vary in how they are structured, the conversion may not produce exactly what you expect, so check the generated file after conversion.
You can also create or edit this file manually. It must be a tab- or space-delimited text file (tsv or txt) with the following five columns: chromosome, start position, end position, transcript name (this column is not used), and gene name.
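As a quick check of the column layout described above, the following is a minimal sketch, not part of mlqtl. The function name read_gene_ranges is hypothetical, and the sketch assumes the file has no header line and that start and end are integers.

```python
from pathlib import Path


def read_gene_ranges(path):
    """Parse a gene range file into (chrom, start, end, transcript, gene) tuples.

    Assumption: no header line; every non-blank line has exactly five fields.
    """
    records = []
    lines = Path(path).read_text().splitlines()
    for line_no, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split()  # splits on tabs or spaces
        if len(fields) != 5:
            raise ValueError(
                f"line {line_no}: expected 5 columns, got {len(fields)}"
            )
        chrom, start, end, transcript, gene = fields
        start, end = int(start), int(end)  # raises ValueError if not numeric
        if start > end:
            raise ValueError(
                f"line {line_no}: start {start} is greater than end {end}"
            )
        records.append((chrom, start, end, transcript, gene))
    return records
```

Running this on a converted file before analysis surfaces malformed lines early, with the line number in the error message.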
Note
Ensure that the chromosome names in this file exist in the genotype file. Otherwise, SNPs cannot be found due to chromosome name mismatches.
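One way to catch a chromosome name mismatch before running the analysis is to compare the first column of the range file with the first column of the .bim file, which in plink holds the chromosome code. The sketch below is illustrative only; both function names are hypothetical and it assumes neither file has a header line.

```python
def chromosomes_in_column(path, column=0):
    """Collect the distinct values in one whitespace-delimited column."""
    names = set()
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if fields:  # skip blank lines
                names.add(fields[column])
    return names


def missing_chromosomes(range_file, bim_file):
    """Return chromosome names in the range file that the .bim file lacks."""
    in_ranges = chromosomes_in_column(range_file)
    in_genotypes = chromosomes_in_column(bim_file)
    return sorted(in_ranges - in_genotypes)
```

An empty result means every chromosome named in the range file also appears in the genotype data; any names returned (for example "Chr3" when the .bim file uses "3") need to be renamed on one side.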
1.2 Genotype Data
The genotype data must be in plink binary format, which typically includes the following three files:
.bed: Binary genotype data file
.bim: Genotype marker information file
.fam: Sample information file
If your data is in VCF format, you can use the following command to convert it to plink binary format:
plink --vcf ${vcf} --snps-only --allow-extra-chr --make-bed --double-id --vcf-half-call m --out ${out_prefix}
If your data is already in plink format, ensure it contains only SNP variants. You can use the --snps-only parameter to filter out non-SNP variants:
plink --bfile ${bed} --snps-only --make-bed --out ${out_prefix}
To save memory during subsequent calculations, it is recommended to keep only the SNPs that fall within the gene ranges by filtering the genotype data with the range file:
plink --bfile ${bed} --extract range ${range_file} --make-bed --out ${out_prefix}
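If you prefer to drive these preprocessing steps from a script, the commands above can be assembled as argument lists and passed to subprocess. This is a rough sketch rather than anything mlqtl provides: the helper names are hypothetical, the plink flags are copied from the commands in this section, and plink is assumed to be on your PATH.

```python
import shutil
import subprocess


def vcf_to_bed_cmd(vcf, out_prefix):
    """Build the VCF-to-plink-binary conversion command shown above."""
    return [
        "plink", "--vcf", vcf, "--snps-only", "--allow-extra-chr",
        "--make-bed", "--double-id", "--vcf-half-call", "m",
        "--out", out_prefix,
    ]


def extract_range_cmd(bfile, range_file, out_prefix):
    """Build the command that keeps only SNPs inside the gene ranges."""
    return [
        "plink", "--bfile", bfile, "--extract", "range", range_file,
        "--make-bed", "--out", out_prefix,
    ]


def run(cmd):
    """Run a command, failing early if the executable is not installed."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} was not found on PATH")
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure


# Example with hypothetical file names:
# run(vcf_to_bed_cmd("samples.vcf", "geno"))
# run(extract_range_cmd("geno", "genes.tsv", "geno_filtered"))
```

Passing arguments as a list avoids shell quoting problems when file names contain spaces.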
1.3 Phenotype Data
The phenotype file must be a tab- or space-delimited text file (tsv or txt). The first column should be the sample name, and subsequent columns should be phenotype values. The file must include a header, and the first column’s header must be “sample”. For example:
sample trait1 trait2
sample1 1.2 3.4
sample2 2.3 4.5
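The format rules above can be checked with a short script before running the analysis. The following is a minimal sketch, not mlqtl's own loader: read_phenotypes is a hypothetical name, and the sketch assumes every phenotype value is numeric, since how missing values should be encoded is not specified here.

```python
def read_phenotypes(path):
    """Read a phenotype file into {trait: {sample: value}}.

    Assumption: all phenotype values parse as floats.
    """
    with open(path) as handle:
        rows = [line.split() for line in handle if line.strip()]
    if not rows:
        raise ValueError("phenotype file is empty")
    header, body = rows[0], rows[1:]
    if header[0] != "sample":
        raise ValueError('the first column header must be "sample"')
    if len(header) < 2:
        raise ValueError("at least one phenotype column is required")
    traits = {name: {} for name in header[1:]}
    for row_no, row in enumerate(body, start=2):
        if len(row) != len(header):
            raise ValueError(
                f"row {row_no}: expected {len(header)} fields, got {len(row)}"
            )
        sample = row[0]
        for name, value in zip(header[1:], row[1:]):
            traits[name][sample] = float(value)
    return traits
```

Applied to the example file above, this yields one dictionary per trait keyed by sample name, which makes it easy to confirm that the sample names match those in the .fam file.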
2. Command Line Interface
You can use the following command to run the program:
mlqtl run -g ${bed} -p ${trait} -r ${range} -j 16 --padj -o result
Use mlqtl run --help to view all available parameters and options.