mlqtl.data package

Submodules

mlqtl.data.dataset module

class mlqtl.data.dataset.Dataset(snp_file=None, gene_file=None, trait_file=None)

Bases: object

The Dataset class loads and manages the SNP, trait, and gene data.

check_chrom() → None

Check if the chromosome names match

get(gene: str) → List[Tuple[int, Annotated[ndarray[tuple[int, ...], dtype[int8]], Tuple[int]]]]

Return the SNPs for a given gene

Parameters

gene : str

The gene name

Returns

List[Tuple[int, VectorInt8]]

A list of tuples, each containing the SNP index and the binary SNP data
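To make the return shape described above concrete, the sketch below stacks a list of (index, genotype vector) tuples into a samples-by-SNPs matrix. It does not call mlqtl: the list `snps` is made-up data standing in for what `Dataset.get` might return, and the names `indices` and `matrix` as well as the genotype codes are hypothetical.

```python
import numpy as np

# Hypothetical stand-in for the value returned by Dataset.get("some_gene"):
# a list of (snp_index, genotype_vector) tuples, one int8 vector per SNP.
snps = [
    (10, np.array([0, 1, 2, 0], dtype=np.int8)),
    (11, np.array([1, 1, 0, 2], dtype=np.int8)),
    (12, np.array([2, 0, 0, 1], dtype=np.int8)),
]

# Separate the SNP indices from the genotype vectors.
indices = [idx for idx, _ in snps]

# Stack the per-SNP vectors column-wise: rows are samples, columns are SNPs.
matrix = np.stack([geno for _, geno in snps], axis=1)

print(indices)
print(matrix.shape)
```

Keeping the indices alongside the matrix columns makes it possible to map a column back to its SNP later.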

get_hap(gene: str) → DataFrame

Return the haplotype for a given gene

mlqtl.data.gene module

class mlqtl.data.gene.Gene(file: str)

Bases: object

Gene class for handling gene position information

property chrom

Return the chromosome

chunks(p: int) → List[Annotated[ndarray[tuple[int, ...], dtype[bool]], Tuple[int]]]

Split the gene list into p chunks

Parameters

p : int

the number of chunks to split the gene list into

Returns

List[VectorBool]

A list of boolean arrays that can be used to index the gene name ndarray
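The idea of returning boolean masks rather than the gene names themselves can be sketched as follows. This is not the mlqtl implementation: `chunk_masks` is a hypothetical helper, it assumes `p` is the number of chunks (as the parameter description states), and it uses `numpy.array_split` so that the gene count does not have to be divisible by `p`.

```python
import numpy as np

def chunk_masks(n_genes: int, p: int) -> list:
    """Split positions 0..n_genes-1 into p boolean masks (hypothetical helper)."""
    masks = []
    # np.array_split tolerates n_genes not being divisible by p.
    for part in np.array_split(np.arange(n_genes), p):
        mask = np.zeros(n_genes, dtype=bool)
        mask[part] = True
        masks.append(mask)
    return masks

# Made-up gene names; each mask selects one chunk of this array.
gene_names = np.array(["g1", "g2", "g3", "g4", "g5"])
masks = chunk_masks(len(gene_names), 2)
chunks = [gene_names[m] for m in masks]
```

Each mask has the same length as the gene name array, so any array aligned with the gene names can be indexed with it directly.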

filter(genes: List[str]) → None

Filter the gene dataframe to only include the specified genes

filter_by_chr(chr: List[str]) → None

Filter the gene dataframe to only include genes on the specified chromosomes

get(gene: str) → DataFrame

Retrieve the location information for a specific gene

mlqtl.data.snp module

Bases: object

Reads and stores a set of binary Plink files.

Args:

prefix (str): The prefix of the binary Plink files.

Reads or writes binary Plink files (BED, BIM and FAM).

base(key: int | str, snp: ndarray) → ndarray

Convert binary to nucleobase

property duplicated_markers

idx2marker(idx: int) → str

Returns the marker name from its index

marker2idx(marker: str) → int64

Returns the index of a marker

property nb_markers

property nb_samples

next()

Returns the next marker.

Returns:

tuple: The marker name as a string and its genotypes as a numpy.ndarray.

class mlqtl.data.snp.SNP(snp_file: str)

Bases: Plink

SNP data class for handling binary plink data

property chrom

Return the chromosome

static convert_onehot(X: Annotated[ndarray[tuple[int, ...], dtype[int8]], Tuple[int, int]]) → Annotated[ndarray[tuple[int, ...], dtype[float64]], Tuple[int, int, int]]

encode(snps: List[Tuple[int, Annotated[ndarray[tuple[int, ...], dtype[int8]], Tuple[int]]]], onehot: bool = False, filter: Annotated[ndarray[tuple[int, ...], dtype[bool]], Tuple[int]] = None) → Annotated[ndarray[tuple[int, ...], dtype[int8]], Tuple[int, int]] | Annotated[ndarray[tuple[int, ...], dtype[float64]], Tuple[int, int, int]]

Encode the SNP data into an encoding suitable for machine learning

Parameters

snps : List[Tuple[int, VectorInt8]]

The SNP data list to encode; each element is a tuple of (index, genotype)

onehot : bool

Whether to convert to one-hot encoding

filter : VectorBool

The boolean index ndarray used to filter the SNP data

Returns

MatrixInt8 or TensorFloat64

The encoded SNP data:

- with onehot: TensorFloat64, shape (n_samples, n_snps, 4)
- without onehot: MatrixInt8, shape (n_samples, n_snps)
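The change in shape from (n_samples, n_snps) to (n_samples, n_snps, 4) can be illustrated with a small NumPy sketch. It is not the library's `convert_onehot`: `onehot_sketch` is a hypothetical helper, and it assumes the int8 genotype codes are integers 0 to 3 that each map to their own channel. The actual code-to-channel mapping and any handling of missing genotypes in mlqtl may differ.

```python
import numpy as np

def onehot_sketch(X: np.ndarray, n_classes: int = 4) -> np.ndarray:
    """Map an integer matrix (n_samples, n_snps) to (n_samples, n_snps, n_classes)."""
    # Indexing an identity matrix with X picks one unit row per entry.
    return np.eye(n_classes, dtype=np.float64)[X]

# Made-up genotype codes: 2 samples, 3 SNPs.
X = np.array([[0, 1, 3], [2, 2, 0]], dtype=np.int8)
T = onehot_sketch(X)

print(T.shape)
```

Every (sample, SNP) position in the result holds a length-4 vector with a single 1.0, which is the layout the TensorFloat64 return type describes.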

get(marker: str) → Tuple[int, Annotated[ndarray[tuple[int, ...], dtype[int8]], Tuple[int]]]

Returns the SNP data for a given marker

Parameters

marker : str

the marker name

Returns

Tuple[int, VectorInt8]

the index of the marker and the genotype

property samples

Return the samples in the SNP data

mlqtl.data.trait module

class mlqtl.data.trait.Trait(traits_file: str)

Bases: object

Trait class for handling trait data

filter_df(fam: DataFrame) → None

Filters the trait data to only include samples in the fam file

get(name: str) → Tuple[Annotated[ndarray[tuple[int, ...], dtype[float64]], Tuple[int]], Annotated[ndarray[tuple[int, ...], dtype[bool]], Tuple[int]]]

Returns the trait data for a given name

Parameters

name : str

The name of the trait

Returns

Tuple[VectorFloat64, VectorBool]

The trait value data and a boolean mask indicating which samples are not NaN
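A value vector paired with a not-NaN mask is convenient for dropping samples with missing phenotypes from both the trait and the genotype data in one step. The sketch below shows that pattern with made-up arrays; it does not call mlqtl, and `values`, `genotypes`, `y`, and `X` are hypothetical names chosen for illustration.

```python
import numpy as np

# Hypothetical trait values for 4 samples; one sample has no measurement.
values = np.array([1.2, np.nan, 3.4, 0.7], dtype=np.float64)

# Boolean mask that is True where the trait value is present (not NaN).
mask = ~np.isnan(values)

# Made-up genotype matrix aligned with the same 4 samples (rows).
genotypes = np.array([[0, 1], [1, 1], [2, 0], [0, 2]], dtype=np.int8)

# Apply the same mask to both arrays so rows stay aligned.
y = values[mask]
X = genotypes[mask]

print(y.shape, X.shape)
```

Because the mask is applied to rows of both arrays, sample order is preserved and each remaining trait value still lines up with its genotype row.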

Module contents