alphagenome.data.gene_annotation.filter_to_longest_transcript

alphagenome.data.gene_annotation.filter_to_longest_transcript(gtf)

Filter GTF entries to only the longest transcript per gene.

Parameters:

gtf (DataFrame) – pd.DataFrame of GENCODE GTF entries. Must contain columns ‘Feature’, ‘End’, ‘Start’, ‘gene_id’, and ‘transcript_id’.

Return type:

DataFrame

Returns:

pd.DataFrame of GENCODE GTF entries subset to rows with the longest transcript per gene.
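Example:

The following is a minimal sketch of the filtering described above, written against plain pandas so it runs without the library installed. It is not the library's implementation. The helper name longest_transcript_sketch and the toy table are hypothetical, and the sketch assumes that transcript length is measured as End - Start on rows whose Feature is 'transcript', which the required columns suggest but this page does not state.

```python
import pandas as pd


def longest_transcript_sketch(gtf: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for illustration; not the library's code."""
    # Assumption: length is End - Start on rows where Feature == 'transcript'.
    transcripts = gtf[gtf['Feature'] == 'transcript']
    lengths = transcripts['End'] - transcripts['Start']
    # Index label of the longest transcript row within each gene.
    longest_idx = lengths.groupby(transcripts['gene_id']).idxmax()
    keep_ids = transcripts.loc[longest_idx, 'transcript_id']
    # Keep every row (transcript, exon, ...) belonging to a kept transcript.
    return gtf[gtf['transcript_id'].isin(keep_ids)]


# Toy table with the five required columns (made-up coordinates).
toy_gtf = pd.DataFrame({
    'Feature': ['transcript', 'exon', 'transcript', 'exon', 'transcript'],
    'Start': [100, 100, 100, 100, 900],
    'End': [500, 500, 300, 300, 1000],
    'gene_id': ['G1', 'G1', 'G1', 'G1', 'G2'],
    'transcript_id': ['T1', 'T1', 'T2', 'T2', 'T3'],
})

filtered = longest_transcript_sketch(toy_gtf)
print(filtered)
```

In the toy table, gene G1 has two transcripts (T1 spanning 400 bases, T2 spanning 200), so only the T1 rows are kept for G1, along with the single transcript T3 of G2. With the library itself, the equivalent call is filter_to_longest_transcript(gtf) from alphagenome.data.gene_annotation, passing a DataFrame of GENCODE GTF entries.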