Data

Classes and utilities for manipulating genomics data.

Fold intervals

data.fold_intervals.Subset(value)

Subset of the data.

data.fold_intervals.get_all_folds()

Returns the names of all data folds.

data.fold_intervals.get_fold_names(...)

Returns the data folds used for the model version.

data.fold_intervals.get_fold_intervals(...)

Returns the training intervals for the model version.

Genome

data.genome.Strand(value)

Represents the strand of a DNA sequence.

data.genome.Interval(chromosome, start, end)

Represents a genomic interval.
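To make the idea of a genomic interval concrete, here is a minimal sketch. `SketchInterval` is a hypothetical stand-in written for illustration, not the library's `Interval` class; the 0-based, half-open coordinate convention and the `width` and `overlaps` helpers are assumptions of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchInterval:
    """Hypothetical stand-in for a genomic interval (not the library class)."""

    chromosome: str
    start: int  # assumed 0-based, inclusive
    end: int    # assumed exclusive

    def __post_init__(self):
        # Reject negative starts and intervals that end before they begin.
        if self.start < 0 or self.end < self.start:
            raise ValueError(f"invalid interval: {self.start}-{self.end}")

    @property
    def width(self) -> int:
        """Number of bases covered under the half-open convention."""
        return self.end - self.start

    def overlaps(self, other: "SketchInterval") -> bool:
        """True if both intervals share at least one base on the same chromosome."""
        return (
            self.chromosome == other.chromosome
            and self.start < other.end
            and other.start < self.end
        )


a = SketchInterval("chr1", 100, 200)
b = SketchInterval("chr1", 150, 250)
c = SketchInterval("chr2", 150, 250)
```

In this sketch `a` and `b` overlap, while `a` and `c` do not because they sit on different chromosomes.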

data.genome.Variant(chromosome, position, ...)

Represents a genomic variant/mutation.

data.genome.Junction(chromosome, start, end)

Represents a splice junction.

Gene annotation

data.gene_annotation.TranscriptType(value)

Valid transcript types available in the GENCODE GTF.

data.gene_annotation.extract_tss(gtf[, feature])

Extract transcription start sites (TSS) from a GTF DataFrame.
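The rule behind TSS extraction can be sketched in plain Python: on the positive strand the TSS is the feature's start, and on the negative strand it is the feature's end. The function below is a hypothetical illustration, not the library's `extract_tss`. It works on a list of dicts rather than a DataFrame, and the key names and the `feature="transcript"` default are assumptions.

```python
def sketch_extract_tss(rows, feature="transcript"):
    """Return one TSS record per row whose feature type matches.

    `rows` is a list of dicts with assumed keys: chromosome, feature,
    start, end, strand, transcript_id. Coordinates are taken as given.
    """
    tss = []
    for row in rows:
        if row["feature"] != feature:
            continue  # skip exons, CDS, and other feature types
        # Transcription begins at the 5' end, which depends on the strand.
        position = row["start"] if row["strand"] == "+" else row["end"]
        tss.append(
            {
                "chromosome": row["chromosome"],
                "tss": position,
                "strand": row["strand"],
                "transcript_id": row["transcript_id"],
            }
        )
    return tss


example_rows = [
    {"chromosome": "chr1", "feature": "transcript", "start": 100,
     "end": 500, "strand": "+", "transcript_id": "T1"},
    {"chromosome": "chr1", "feature": "exon", "start": 100,
     "end": 200, "strand": "+", "transcript_id": "T1"},
    {"chromosome": "chr2", "feature": "transcript", "start": 1000,
     "end": 2000, "strand": "-", "transcript_id": "T2"},
]
tss_records = sketch_extract_tss(example_rows)
```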

data.gene_annotation.filter_transcript_type(gtf)

Filter GTF entries by transcript types.

data.gene_annotation.filter_protein_coding(gtf)

Filter GTF entries to only protein-coding genes.

data.gene_annotation.filter_to_longest_transcript(gtf)

Filter GTF entries to only the longest transcript per gene.
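As an illustration of what "longest transcript per gene" can mean, the sketch below keeps one transcript per gene from a list of dicts. It is a hypothetical stand-in, not the library function. It measures length as genomic span (`end - start + 1`) and keeps the first transcript on ties; both choices are assumptions, and the library may define length differently.

```python
def sketch_longest_transcript(rows):
    """Keep only rows belonging to the longest transcript of each gene.

    `rows` is a list of dicts with assumed keys: gene_id, transcript_id,
    feature, start, end. Length is the genomic span of the transcript row.
    """
    longest = {}  # gene_id -> (length, transcript_id)
    for row in rows:
        if row["feature"] != "transcript":
            continue
        length = row["end"] - row["start"] + 1
        best = longest.get(row["gene_id"])
        # Strict comparison keeps the first transcript seen on ties.
        if best is None or length > best[0]:
            longest[row["gene_id"]] = (length, row["transcript_id"])

    keep = {transcript_id for _, transcript_id in longest.values()}
    return [row for row in rows if row.get("transcript_id") in keep]


gtf_rows = [
    {"gene_id": "G1", "transcript_id": "T1", "feature": "transcript",
     "start": 100, "end": 500},
    {"gene_id": "G1", "transcript_id": "T2", "feature": "transcript",
     "start": 100, "end": 900},
    {"gene_id": "G2", "transcript_id": "T3", "feature": "transcript",
     "start": 50, "end": 60},
]
filtered = sketch_longest_transcript(gtf_rows)
```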

data.gene_annotation.filter_transcript_support_level(...)

Filter GTF to only transcripts with specific GENCODE support levels.

data.gene_annotation.get_gene_interval(gtf)

Returns a stranded genome.Interval given a gene identifier.

data.gene_annotation.get_gene_intervals(gtf)

Returns a list of stranded `genome.Interval` objects for the given identifiers.

Junction data

data.junction_data.JunctionData(junctions, ...)

Container for storing splice junction data.

data.junction_data.get_junctions_to_plot(*, ...)

Gets a list of junctions to plot.

Ontology

data.ontology.OntologyType(value)

Supported ontology types.

data.ontology.OntologyTerm(type, id)

A single biological ontology term.

Track data

data.track_data.TrackData(values, metadata)

Container for storing track values and metadata.

data.track_data.concat(track_datas[, ...])

Concatenates multiple TrackData objects along the track dimension.

data.track_data.interleave(track_datas, ...)

Interleaves multiple TrackData objects by alternating rows.
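The difference between concatenating and interleaving tracks is easiest to see on a toy example. The sketch below uses plain tuples of `(names, values)` with `values[position][track]` in place of `TrackData`. Both functions are hypothetical illustrations, and reading "alternating rows" as alternating metadata rows (one row per track) is an assumption of this sketch.

```python
def sketch_concat(track_datas):
    """Append the tracks of each input after those of the previous input."""
    names = []
    for track_names, _ in track_datas:
        names.extend(track_names)
    n_positions = len(track_datas[0][1])
    values = []
    for i in range(n_positions):
        row = []
        for _, vals in track_datas:
            row.extend(vals[i])
        values.append(row)
    return names, values


def sketch_interleave(track_datas):
    """Alternate tracks across inputs; assumes equal track counts."""
    n_tracks = len(track_datas[0][0])
    n_positions = len(track_datas[0][1])
    names = []
    for t in range(n_tracks):
        for track_names, _ in track_datas:
            names.append(track_names[t])
    values = []
    for i in range(n_positions):
        row = []
        for t in range(n_tracks):
            for _, vals in track_datas:
                row.append(vals[i][t])
        values.append(row)
    return names, values


# Two inputs, each with two positions and two tracks.
first = (["a1", "a2"], [[1, 2], [3, 4]])
second = (["b1", "b2"], [[10, 20], [30, 40]])

concat_names, concat_values = sketch_concat([first, second])
inter_names, inter_values = sketch_interleave([first, second])
```

Under these assumptions, concatenation yields the track order `a1, a2, b1, b2`, while interleaving yields `a1, b1, a2, b2`.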

Transcript

data.transcript.Transcript(exons[, cds, ...])

Represents a transcript, with attributes from a GTF file.

data.transcript.TranscriptExtractor(gtf_df)

Extracts transcripts from a GTF DataFrame.