alphagenome.data.gene_annotation.get_gene_interval


alphagenome.data.gene_annotation.get_gene_interval(gtf, gene_symbol=None, gene_id=None)

Returns a stranded genome.Interval given a gene identifier.

Either gene_symbol or gene_id must be set, but not both.

Parameters:
  • gtf (DataFrame) – pd.DataFrame of GENCODE GTF entries. Must contain columns ‘Feature’, ‘gene_name’, ‘gene_id’, ‘Chromosome’, ‘Start’, ‘End’, and ‘Strand’.

  • gene_symbol (Optional[str] (default: None)) – A gene name or gene symbol (e.g., ‘EGFR’, ‘TNF’, ‘TP53’).

  • gene_id (Optional[str] (default: None)) – An Ensembl gene ID, which can be patched, i.e. carrying a version suffix (e.g., ‘ENSG00000141510.17’), or unpatched (e.g., ‘ENSG00000141510’).

Return type:
  Interval

Returns:
  A genome.Interval for the given gene identifier.

Raises:

ValueError – If both or neither of gene_symbol and gene_id are set, or if the given gene identifier matches no interval or more than one interval.
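
Example:

The contract described above can be illustrated with a minimal, standard-library sketch. This is not the library's implementation: `Interval`, `ROWS`, and `get_gene_interval_sketch` are hypothetical stand-ins, the rows are plain dicts rather than a pd.DataFrame, and the coordinates are made up. Restricting the search to rows whose ‘Feature’ is ‘gene’, and matching an unpatched ID by ignoring the version suffix, are both assumptions about how the lookup might work.

```python
from typing import NamedTuple, Optional


class Interval(NamedTuple):
    """Hypothetical stand-in for genome.Interval."""
    chromosome: str
    start: int
    end: int
    strand: str


# Illustrative rows using the documented column names; coordinates are made up.
ROWS = [
    {'Feature': 'gene', 'gene_name': 'TP53', 'gene_id': 'ENSG00000141510.17',
     'Chromosome': 'chr17', 'Start': 100, 'End': 200, 'Strand': '-'},
    {'Feature': 'transcript', 'gene_name': 'TP53', 'gene_id': 'ENSG00000141510.17',
     'Chromosome': 'chr17', 'Start': 110, 'End': 190, 'Strand': '-'},
]


def get_gene_interval_sketch(rows, gene_symbol: Optional[str] = None,
                             gene_id: Optional[str] = None) -> Interval:
    # Exactly one of the two identifiers must be supplied.
    if (gene_symbol is None) == (gene_id is None):
        raise ValueError('Exactly one of gene_symbol or gene_id must be set.')

    # Assumption: only gene-level entries are candidates.
    genes = [r for r in rows if r['Feature'] == 'gene']

    if gene_symbol is not None:
        matches = [r for r in genes if r['gene_name'] == gene_symbol]
    elif '.' in gene_id:
        # Patched ID: compare the full string, version suffix included.
        matches = [r for r in genes if r['gene_id'] == gene_id]
    else:
        # Unpatched ID: compare with the version suffix stripped.
        matches = [r for r in genes if r['gene_id'].split('.')[0] == gene_id]

    # Zero or multiple matches are both treated as errors.
    if len(matches) != 1:
        raise ValueError(f'Expected exactly one interval, found {len(matches)}.')

    row = matches[0]
    return Interval(row['Chromosome'], row['Start'], row['End'], row['Strand'])
```

With the library itself, the corresponding call would follow the signature above, for instance get_gene_interval(gtf, gene_symbol='TP53') or get_gene_interval(gtf, gene_id='ENSG00000141510'), where gtf is a DataFrame of GENCODE GTF entries.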