Data

Classes and utilities for manipulating genomics data.

Fold Intervals

data.fold_intervals.Subset(value)

Subset of the data.

data.fold_intervals.get_all_folds()

Returns the names of all data folds.

data.fold_intervals.get_fold_names(...)

Returns the data folds used for the model version.

data.fold_intervals.get_fold_intervals(...)

Returns the training intervals for the model version.

Genome

data.genome.Strand(value)

Represents the strand of a DNA sequence.

data.genome.Interval(chromosome, start, end)

Represents a genomic interval.
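As a rough illustration of what an interval type built from `(chromosome, start, end)` might carry, the sketch below defines a small stand-in class. `SketchInterval`, its `width` property, its `overlaps` method, and the 0-based half-open coordinate convention are all assumptions made for this example; they are not taken from `data.genome.Interval`, whose actual attributes and conventions should be checked in its own reference page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchInterval:
    """Hypothetical stand-in for a genomic interval (not the library class)."""

    chromosome: str
    start: int  # assumed 0-based, inclusive
    end: int    # assumed exclusive

    def __post_init__(self):
        # Reject negative starts and intervals that end before they begin.
        if self.start < 0 or self.end < self.start:
            raise ValueError(f"invalid interval bounds: {self.start}-{self.end}")

    @property
    def width(self) -> int:
        # Number of bases covered under the half-open assumption.
        return self.end - self.start

    def overlaps(self, other: "SketchInterval") -> bool:
        # Two intervals overlap only on the same chromosome and when
        # each one starts before the other ends.
        return (
            self.chromosome == other.chromosome
            and self.start < other.end
            and other.start < self.end
        )


a = SketchInterval("chr1", 100, 200)
b = SketchInterval("chr1", 150, 250)
c = SketchInterval("chr2", 150, 250)
```

Under these assumptions, `a.width` is 100, `a.overlaps(b)` is true, and `a.overlaps(c)` is false because the chromosomes differ.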

data.genome.Variant(chromosome, position, ...)

Represents a genomic variant/mutation.

data.genome.Junction(chromosome, start, end)

Represents a splice junction.

Gene annotation

data.gene_annotation.TranscriptType(value)

Valid transcript types available in the GENCODE GTF.

data.gene_annotation.extract_tss(gtf[, feature])

Extract transcription start sites (TSS) from a GTF DataFrame.
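The general idea behind TSS extraction can be sketched without the library: for a feature on the plus strand the TSS is its start coordinate, and on the minus strand it is its end coordinate. The function `sketch_extract_tss` and the record keys used below (`chromosome`, `feature`, `start`, `end`, `strand`, `gene_id`) are hypothetical and chosen for this example only; the real `extract_tss` operates on a DataFrame and its column names and output format may differ.

```python
def sketch_extract_tss(records, feature="transcript"):
    """Return (chromosome, tss_position, strand, gene_id) for each matching record.

    Hypothetical helper: records are plain dicts, not a GTF DataFrame.
    """
    tss = []
    for rec in records:
        if rec["feature"] != feature:
            continue
        # Plus strand: transcription begins at the start coordinate.
        # Any other strand value is treated as minus: it begins at the end.
        position = rec["start"] if rec["strand"] == "+" else rec["end"]
        tss.append((rec["chromosome"], position, rec["strand"], rec["gene_id"]))
    return tss


records = [
    {"chromosome": "chr1", "feature": "transcript", "start": 1000, "end": 5000,
     "strand": "+", "gene_id": "geneA"},
    {"chromosome": "chr1", "feature": "transcript", "start": 7000, "end": 9000,
     "strand": "-", "gene_id": "geneB"},
    {"chromosome": "chr1", "feature": "exon", "start": 1000, "end": 1200,
     "strand": "+", "gene_id": "geneA"},
]
tss_sites = sketch_extract_tss(records)
```

Here the exon record is skipped, `geneA` gets a TSS at 1000, and `geneB` gets a TSS at 9000.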

data.gene_annotation.filter_transcript_type(gtf)

Filter GTF entries by transcript types.

data.gene_annotation.filter_protein_coding(gtf)

Filter GTF entries to only protein-coding genes.

data.gene_annotation.filter_to_longest_transcript(gtf)

Filter GTF entries to only the longest transcript per gene.
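One way to picture a longest-transcript filter is shown below on plain dictionaries. This is a sketch under stated assumptions, not the library's implementation: `sketch_longest_transcript` is a hypothetical helper, "longest" is taken to mean the largest genomic span with 1-based inclusive coordinates, and ties keep the first transcript seen. The actual function may measure length differently.

```python
def sketch_longest_transcript(rows):
    """Keep only rows belonging to the longest transcript of each gene.

    Hypothetical helper: length is the genomic span of the 'transcript' row.
    """
    best = {}  # gene_id -> (length, transcript_id)
    for row in rows:
        if row["feature"] != "transcript":
            continue
        length = row["end"] - row["start"] + 1  # assumed 1-based inclusive
        current = best.get(row["gene_id"])
        # Strict '>' means the first transcript wins on ties.
        if current is None or length > current[0]:
            best[row["gene_id"]] = (length, row["transcript_id"])
    keep = {transcript_id for _, transcript_id in best.values()}
    return [row for row in rows if row["transcript_id"] in keep]


rows = [
    {"gene_id": "geneA", "transcript_id": "t1", "feature": "transcript",
     "start": 100, "end": 500},
    {"gene_id": "geneA", "transcript_id": "t2", "feature": "transcript",
     "start": 100, "end": 900},
    {"gene_id": "geneB", "transcript_id": "t3", "feature": "transcript",
     "start": 50, "end": 60},
]
longest = sketch_longest_transcript(rows)
kept_ids = [row["transcript_id"] for row in longest]
```

With this input, `kept_ids` contains `t2` for `geneA` and `t3` for `geneB`.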

data.gene_annotation.filter_transcript_support_level(...)

Filter GTF entries to only transcripts with specific GENCODE support levels.

Ontology

data.ontology.OntologyType(value)

Supported ontology types.

data.ontology.OntologyTerm(type, id)

A single biological ontology term.

Track data

data.track_data.TrackData(values, metadata)

Container for storing track values and metadata.

data.track_data.concat(track_datas[, ...])

Concatenates multiple TrackData objects along the track dimension.

data.track_data.interleave(track_datas, ...)

Interleaves multiple TrackData objects by alternating rows.
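The difference between concatenating and interleaving can be seen on plain lists of track names. The helpers `sketch_concat` and `sketch_interleave` below are hypothetical and work on lists rather than `TrackData` objects; the requirement that all inputs have the same length for interleaving is an assumption of this sketch, not a documented property of the library functions.

```python
from itertools import chain


def sketch_concat(groups):
    """Place each group's tracks one after another."""
    return list(chain.from_iterable(groups))


def sketch_interleave(groups):
    """Alternate tracks across groups: first of each, then second of each, ..."""
    lengths = {len(group) for group in groups}
    if len(lengths) > 1:
        # Assumption for this sketch: groups must be the same length.
        raise ValueError("all groups must have the same number of tracks")
    return [name for row in zip(*groups) for name in row]


plus_tracks = ["t1+", "t2+"]
minus_tracks = ["t1-", "t2-"]

concatenated = sketch_concat([plus_tracks, minus_tracks])
interleaved = sketch_interleave([plus_tracks, minus_tracks])
```

Concatenation yields `t1+, t2+, t1-, t2-`, while interleaving yields `t1+, t1-, t2+, t2-`, which keeps the two strands of each track adjacent.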

data.track_data.metadata_to_proto(metadata)

Converts track metadata to a TracksMetadata protobuf message.

data.track_data.metadata_from_proto(proto)

Creates track metadata from a TracksMetadata protobuf message.

data.track_data.from_protos(proto[, chunks])

Creates a TrackData object from protobuf messages.

Transcript

data.transcript.Transcript(exons[, cds, ...])

Represents a transcript with attributes from a GTF file.

data.transcript.TranscriptExtractor(gtf_df)

Extracts transcripts from a GTF DataFrame.