alphagenome.data.gene_annotation.filter_protein_coding

alphagenome.data.gene_annotation.filter_protein_coding(gtf, include_gene_entries=False)

Filter GTF entries to only protein-coding genes.

Parameters:
  • gtf (DataFrame) – pd.DataFrame of GENCODE GTF entries. This data frame must contain a column named ‘transcript_type’ or ‘transcript_biotype’.

  • include_gene_entries (bool (default: False)) – Whether to include gene entries in addition to transcript entries.

Return type:

DataFrame

Returns:

pd.DataFrame of GENCODE GTF entries subset to rows with protein-coding genes.
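
The behavior described above can be illustrated with a minimal pandas sketch. It does not call the library. `filter_protein_coding_sketch` is a hypothetical stand-in written only from the description on this page, and the actual implementation may differ. The `Feature` and `gene_type` column names, the way gene rows are matched, and the toy rows are all assumptions made for illustration.

```python
import pandas as pd


def filter_protein_coding_sketch(
    gtf: pd.DataFrame, include_gene_entries: bool = False
) -> pd.DataFrame:
    """Hypothetical sketch of the documented filter; not the library code."""
    # The documentation requires one of these two columns to be present.
    if 'transcript_type' in gtf.columns:
        type_col = 'transcript_type'
    elif 'transcript_biotype' in gtf.columns:
        type_col = 'transcript_biotype'
    else:
        raise ValueError(
            "gtf must contain 'transcript_type' or 'transcript_biotype'."
        )

    # Keep transcript-level rows annotated as protein-coding. Missing
    # values (e.g. on gene rows) are treated as non-matching.
    mask = gtf[type_col].eq('protein_coding').fillna(False).astype(bool)

    if include_gene_entries:
        # Assumption: gene rows are marked Feature == 'gene' and carry
        # their biotype in a 'gene_type' column.
        is_gene = gtf['Feature'].eq('gene').fillna(False).astype(bool)
        is_coding = gtf['gene_type'].eq('protein_coding').fillna(False).astype(bool)
        mask = mask | (is_gene & is_coding)

    return gtf[mask]


# Toy table: one protein-coding gene and one lncRNA gene, each with a
# gene row and a transcript row. Values are made up for illustration.
gtf = pd.DataFrame({
    'Feature': ['gene', 'transcript', 'gene', 'transcript'],
    'gene_name': ['GENE_A', 'GENE_A', 'GENE_B', 'GENE_B'],
    'gene_type': ['protein_coding', 'protein_coding', 'lncRNA', 'lncRNA'],
    'transcript_type': [None, 'protein_coding', None, 'lncRNA'],
})

transcripts_only = filter_protein_coding_sketch(gtf)
with_genes = filter_protein_coding_sketch(gtf, include_gene_entries=True)
```

In this sketch, the default call keeps only the protein-coding transcript row, while include_gene_entries=True additionally keeps the gene row of the protein-coding gene. A frame with neither required column raises a ValueError here; how the library itself reports that case is not stated on this page.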