alphagenome.data.transcript.TranscriptExtractor

class alphagenome.data.transcript.TranscriptExtractor(gtf_df)

Extracts transcripts from a GTF annotation table (gtf_df).

Methods

cache_transcripts()

Speed up extract() by converting the GTF to a dictionary of Transcripts.

extract(interval)

Extract transcripts overlapping an interval.

TranscriptExtractor.cache_transcripts()

Speed up extract() by converting the GTF to a dictionary of Transcripts.

Caching may take approximately 11 minutes on the full human genome GTF (84k protein-coding transcripts) and about 15 s on chr22 (1.5k transcripts).

Running cache_transcripts() speeds up extract() by roughly 5-10x (11 ms vs 65 ms per call on chr22, or 15 ms vs 160 ms on the whole genome).

Return type:

None
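
Whether the one-time caching cost is worthwhile depends on how many extract() calls follow it. The sketch below estimates the break-even point from the timings quoted for cache_transcripts(). The helper name break_even_queries is hypothetical and not part of the library, and actual timings will vary with hardware and GTF contents.

```python
import math


def break_even_queries(cache_seconds, uncached_ms, cached_ms):
    """Number of extract() calls after which caching has paid for itself.

    Hypothetical helper for illustration; not part of alphagenome.
    """
    saved_seconds_per_call = (uncached_ms - cached_ms) / 1000.0
    return math.ceil(cache_seconds / saved_seconds_per_call)


# Timings taken from the cache_transcripts() description.
whole_genome = break_even_queries(cache_seconds=11 * 60, uncached_ms=160, cached_ms=15)
chr22 = break_even_queries(cache_seconds=15, uncached_ms=65, cached_ms=11)

print(whole_genome)  # 4552
print(chr22)         # 278
```

Under these figures, caching pays off after roughly 4,500 calls on the whole genome, or roughly 280 calls on chr22. For a handful of one-off queries, skipping the cache is likely faster overall.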

TranscriptExtractor.extract(interval)

Extract transcripts overlapping an interval.

Parameters:

interval (Interval) – Interval to test for overlap with transcripts.

Return type:

list[Transcript]

Returns:

List of transcripts overlapping the interval.
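
To make "overlapping" concrete, here is a minimal standalone sketch of an overlap query over a precomputed dictionary of transcripts. It is not the library's implementation. ToyInterval, ToyTranscript, build_cache and the free function extract are hypothetical stand-ins, and the sketch assumes half-open coordinates (start inclusive, end exclusive) and ignores strand. The actual coordinate and strand rules of TranscriptExtractor.extract() are not stated on this page.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyInterval:
    """Hypothetical stand-in for an interval; half-open [start, end)."""
    chromosome: str
    start: int
    end: int


@dataclass(frozen=True)
class ToyTranscript:
    """Hypothetical stand-in for a transcript record."""
    transcript_id: str
    chromosome: str
    start: int
    end: int


def build_cache(rows):
    """Group transcript rows by chromosome once, so queries skip other chromosomes."""
    cache = defaultdict(list)
    for transcript_id, chromosome, start, end in rows:
        cache[chromosome].append(ToyTranscript(transcript_id, chromosome, start, end))
    return cache


def extract(cache, interval):
    """Return transcripts that share at least one base with the interval."""
    return [
        t
        for t in cache.get(interval.chromosome, [])
        if t.start < interval.end and interval.start < t.end
    ]


rows = [
    ("tx_a", "chr22", 100, 200),  # partially overlaps the query
    ("tx_b", "chr22", 180, 300),  # overlaps the query
    ("tx_c", "chr22", 300, 400),  # starts exactly where the query ends: no overlap
    ("tx_d", "chr1", 150, 250),   # different chromosome
]
cache = build_cache(rows)
hits = extract(cache, ToyInterval("chr22", 190, 300))
hit_ids = [t.transcript_id for t in hits]
print(hit_ids)  # ['tx_a', 'tx_b']
```

In this sketch a transcript counts as a hit when it overlaps the query only partially, while one that merely touches the query boundary does not, which follows from the half-open assumption.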