alphagenome.data.gene_annotation.get_gene_intervals

alphagenome.data.gene_annotation.get_gene_intervals(gtf, gene_symbols=None, gene_ids=None)

Returns a list of stranded `genome.Interval`s for the given identifiers.

Parameters:
  • gtf (DataFrame) – pd.DataFrame of GENCODE GTF entries. Must contain columns ‘Feature’, ‘gene_name’, ‘gene_id’, ‘Chromosome’, ‘Start’, ‘End’, and ‘Strand’.

  • gene_symbols (Optional[Sequence[str]] (default: None)) – A sequence of gene names or gene symbols (e.g., [‘EGFR’, ‘TNF’, ‘TP53’]). Matching is case-insensitive.

  • gene_ids (Optional[Sequence[str]] (default: None)) – A sequence of Ensembl gene IDs, which can be patched (e.g., [‘ENSG00000141510.17’]) or unpatched (e.g., [‘ENSG00000141510’]), that is, with or without the version suffix. Matching is done on unpatched IDs.

Return type:

list[Interval]

Returns:

A list of `genome.Interval`s for the given identifiers. The returned list of intervals is in the same order as the input gene identifiers.

Raises:

ValueError – If neither or both of gene_symbols and gene_ids are set, or if any given gene identifier matches no interval or more than one interval.
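
Example:

The contract described above (exactly one identifier argument, case-insensitive symbol matching, matching on unpatched IDs, output in input order, and a ValueError on zero or multiple matches) can be illustrated with a small pure-Python sketch. This is not the library's implementation: `lookup_gene_intervals`, the `Interval` named tuple, the list-of-dicts input, and the toy rows are all hypothetical stand-ins, and the filter on ‘Feature’ == ‘gene’ is an assumption about how gene-level rows are selected.

```python
from typing import NamedTuple, Optional, Sequence


class Interval(NamedTuple):
    """Hypothetical stand-in for genome.Interval (illustration only)."""
    chromosome: str
    start: int
    end: int
    strand: str


def lookup_gene_intervals(
    rows: Sequence[dict],
    gene_symbols: Optional[Sequence[str]] = None,
    gene_ids: Optional[Sequence[str]] = None,
) -> list[Interval]:
    """Mimics the documented contract on a list of GTF-like dict rows."""
    # Exactly one of the two identifier arguments must be provided.
    if (gene_symbols is None) == (gene_ids is None):
        raise ValueError('Set exactly one of gene_symbols or gene_ids.')

    if gene_symbols is not None:
        queries, column = list(gene_symbols), 'gene_name'
        normalize = str.upper  # case-insensitive symbol matching
    else:
        queries, column = list(gene_ids), 'gene_id'
        normalize = lambda s: s.split('.')[0]  # drop the version suffix

    # Assumption: only rows whose Feature is 'gene' define a gene interval.
    genes = [r for r in rows if r['Feature'] == 'gene']

    intervals = []
    for query in queries:  # iterate over the inputs to keep their order
        key = normalize(query)
        hits = [r for r in genes if normalize(r[column]) == key]
        if len(hits) != 1:
            raise ValueError(
                f'Expected 1 interval for {query!r}, found {len(hits)}.')
        r = hits[0]
        intervals.append(
            Interval(r['Chromosome'], r['Start'], r['End'], r['Strand']))
    return intervals


# Toy rows: coordinates and the second gene ID are placeholders, not real
# annotation data.
rows = [
    {'Feature': 'gene', 'gene_name': 'TP53', 'gene_id': 'ENSG00000141510.17',
     'Chromosome': 'chr17', 'Start': 100, 'End': 200, 'Strand': '-'},
    {'Feature': 'gene', 'gene_name': 'EGFR', 'gene_id': 'ENSG99999999999.1',
     'Chromosome': 'chr7', 'Start': 300, 'End': 400, 'Strand': '+'},
    {'Feature': 'transcript', 'gene_name': 'TP53',
     'gene_id': 'ENSG00000141510.17',
     'Chromosome': 'chr17', 'Start': 110, 'End': 190, 'Strand': '-'},
]

by_symbol = lookup_gene_intervals(rows, gene_symbols=['egfr', 'TP53'])
by_id = lookup_gene_intervals(rows, gene_ids=['ENSG00000141510'])
```

In this sketch, `by_symbol` lists the EGFR interval before the TP53 interval because that is the order of the input symbols, and `by_id` resolves the unpatched ID against the patched ID stored in the rows. Calling the function with both arguments, with neither, or with an identifier that has no match raises ValueError.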