mlqtl.data package
Submodules
mlqtl.data.dataset module
- class mlqtl.data.dataset.Dataset(snp_file=None, gene_file=None, trait_file=None)
Bases: object
The Dataset class is used to load and manage the SNP, trait, and gene data
mlqtl.data.gene module
- class mlqtl.data.gene.Gene(file: str)
Bases: object
Gene class for handling gene position information
- property chrom
Return the chromosome
- chunks(p: int) → List[Annotated[ndarray[tuple[int, ...], dtype[bool]], Tuple[int]]]
Split the gene list into p chunks
Parameters
- p (int)
the number of chunks to split the gene list into
Returns
- List[VectorBool]:
a list of boolean arrays that can be used to index the gene name ndarray
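The idea of returning one boolean mask per chunk can be sketched with plain NumPy. This is an illustration only, not the package's implementation: `chunk_masks` is a hypothetical helper, and it assumes `p` is the number of chunks (as the parameter description states) and that chunks are contiguous.

```python
from typing import List

import numpy as np


def chunk_masks(n_genes: int, p: int) -> List[np.ndarray]:
    """Hypothetical helper: build p boolean masks covering n_genes positions."""
    masks = []
    # np.array_split tolerates n_genes not being divisible by p.
    for idx in np.array_split(np.arange(n_genes), p):
        mask = np.zeros(n_genes, dtype=bool)
        mask[idx] = True
        masks.append(mask)
    return masks


gene_names = np.array(["g1", "g2", "g3", "g4", "g5"])
masks = chunk_masks(len(gene_names), 2)

# Each mask selects one chunk of the gene name array.
first_chunk = gene_names[masks[0]]
second_chunk = gene_names[masks[1]]
```

Because every gene falls in exactly one mask, the masks can be handed to separate workers without overlap.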
- filter(genes: List[str]) → None
Filter the gene dataframe to only include the specified genes
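A minimal sketch of the selection logic, using a NumPy array as a stand-in for the gene dataframe. The gene names are made up, and the real method works in place on the object's dataframe (it returns None), which this sketch does not reproduce.

```python
import numpy as np

# Stand-in for the gene name column (hypothetical values).
all_genes = np.array(["geneA", "geneB", "geneC", "geneD"])
wanted = ["geneB", "geneD", "geneX"]  # geneX is absent and is simply ignored

# Boolean mask marking the rows whose name is in the requested list.
keep = np.isin(all_genes, wanted)
filtered = all_genes[keep]
```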
mlqtl.data.plink module
Module that reads binary Plink files.
mlqtl.data.snp module
- class mlqtl.data.snp.Plink(prefix: str)
Bases: object
Reads and stores a set of binary Plink files.
- Args:
prefix (str): The prefix of the binary Plink files.
Reads or writes binary Plink files (BED, BIM and FAM).
- property duplicated_markers
- property nb_markers
- property nb_samples
- class mlqtl.data.snp.SNP(snp_file: str)
Bases: Plink
SNP data class for handling binary plink data
- property chrom
Return the chromosome
- static convert_onehot(X: Annotated[ndarray[tuple[int, ...], dtype[int8]], Tuple[int, int]]) → Annotated[ndarray[tuple[int, ...], dtype[float64]], Tuple[int, int, int]]
- encode(snps: List[Tuple[int, Annotated[ndarray[tuple[int, ...], dtype[int8]], Tuple[int]]]], onehot: bool = False, filter: Annotated[ndarray[tuple[int, ...], dtype[bool]], Tuple[int]] = None) → Annotated[ndarray[tuple[int, ...], dtype[int8]], Tuple[int, int]] | Annotated[ndarray[tuple[int, ...], dtype[float64]], Tuple[int, int, int]]
Encode the SNP data into the array format used for machine learning
Parameters
- snps (List[Tuple[int, VectorInt8]])
the snp data list to encode, each element is a tuple of (index, genotype)
- onehot (bool)
whether to convert to onehot encoding
- filter (VectorBool)
the index ndarray to filter the snp data
Returns
- MatrixInt8 or TensorFloat64
the encoded snp data
  - with onehot: TensorFloat64, shape (n_samples, n_snps, 4)
  - without onehot: MatrixInt8, shape (n_samples, n_snps)
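The shape change between the two return types can be illustrated with a short NumPy sketch. The mapping of genotype codes to the four channels is an assumption made for this example (0, 1, 2 for the genotype classes and -1 for a missing call); the documentation above does not state which channel holds which value, and `onehot_sketch` is a hypothetical name, not the package's `convert_onehot`.

```python
import numpy as np


def onehot_sketch(X: np.ndarray) -> np.ndarray:
    """Hypothetical one-hot conversion: (n_samples, n_snps) -> (n_samples, n_snps, 4)."""
    # Assumed coding: -1 = missing, 0/1/2 = genotype classes.
    # Shift by one so that -1 -> channel 0, 0 -> 1, 1 -> 2, 2 -> 3.
    codes = X.astype(np.int64) + 1
    # Indexing the identity matrix with the codes yields one one-hot row per entry.
    return np.eye(4, dtype=np.float64)[codes]


X = np.array([[0, 1, -1],
              [2, 2, 0]], dtype=np.int8)  # 2 samples, 3 SNPs
T = onehot_sketch(X)
```

Every entry of the int8 matrix becomes a length-4 float64 vector with a single 1, which is why the one-hot result gains a trailing axis of size 4.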
- get(marker: str) → Tuple[int, Annotated[ndarray[tuple[int, ...], dtype[int8]], Tuple[int]]]
Returns the snp data for a given marker
Parameters
- marker (str)
the marker name
Returns
- Tuple[int, VectorInt8]
the index of the marker and the genotype
- property samples
Return the samples in the snp data
mlqtl.data.trait module
- class mlqtl.data.trait.Trait(traits_file: str)
Bases: object
Trait class for handling trait data
- filter_df(fam: DataFrame) → None
Filters the trait data to only include samples in the fam file
- get(name: str) → Tuple[Annotated[ndarray[tuple[int, ...], dtype[float64]], Tuple[int]], Annotated[ndarray[tuple[int, ...], dtype[bool]], Tuple[int]]]
Returns the trait data for a given name
Parameters
- name (str)
The name of the trait
Returns
- Tuple[VectorFloat64, VectorBool]
The trait value data and a boolean mask indicating which samples are not NaN
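A value vector paired with a not-NaN mask is typically used to drop unphenotyped samples from both the trait and the genotype matrix. The sketch below builds such a pair by hand with NumPy; the numbers are invented and no mlqtl call is made.

```python
import numpy as np

# Stand-in for the pair returned for one trait: values plus a not-NaN mask.
values = np.array([1.2, np.nan, 3.4, np.nan, 5.0])
mask = ~np.isnan(values)

# Hypothetical encoded genotype matrix, shape (n_samples, n_snps).
X = np.arange(10, dtype=np.int8).reshape(5, 2)

# Apply the same mask to both so rows stay aligned sample by sample.
X_obs = X[mask]
y_obs = values[mask]
```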