alphagenome.data.gene_annotation.get_gene_intervals
- alphagenome.data.gene_annotation.get_gene_intervals(gtf, gene_symbols=None, gene_ids=None)
Returns a list of stranded `genome.Interval`s for the given identifiers.
- Parameters:
  - gtf (DataFrame) – pd.DataFrame of GENCODE GTF entries. Must contain the columns `Feature`, `gene_name`, `gene_id`, `Chromosome`, `Start`, `End`, and `Strand`.
  - gene_symbols (Optional[Sequence[str]], default: None) – A sequence of gene names or gene symbols (e.g., `['EGFR', 'TNF', 'TP53']`). Matching is case-insensitive.
  - gene_ids (Optional[Sequence[str]], default: None) – A sequence of Ensembl gene IDs, which can be patched (e.g., `['ENSG00000141510.17']`) or unpatched (e.g., `['ENSG00000141510']`). Matching is done on unpatched IDs.
- Return type: list[`genome.Interval`]
- Returns:
A list of `genome.Interval`s for the given identifiers. The returned list of intervals is in the same order as the input gene identifiers.
- Raises:
ValueError – If neither or both of `gene_symbols` and `gene_ids` are set, or if any of the given gene identifiers matches no interval or more than one interval.
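
The lookup rules described above (exactly one identifier argument, case-insensitive symbol matching, ID matching on the unpatched form, output in input order, and an error on zero or multiple matches) can be illustrated with a minimal pure-Python sketch. This is not the library's implementation: `select_gene_intervals`, `strip_version`, and `TOY_ROWS` are hypothetical names, plain dicts and tuples stand in for the DataFrame and for `genome.Interval`, the coordinates and ID versions are made up, and restricting the search to rows whose `Feature` is `gene` is an assumption.

```python
from typing import Optional, Sequence

# Toy stand-in for GTF rows; coordinates and ID versions are made up.
TOY_ROWS = [
    {'Feature': 'gene', 'gene_name': 'TP53', 'gene_id': 'ENSG00000141510.17',
     'Chromosome': 'chr17', 'Start': 100, 'End': 200, 'Strand': '-'},
    {'Feature': 'transcript', 'gene_name': 'TP53', 'gene_id': 'ENSG00000141510.17',
     'Chromosome': 'chr17', 'Start': 120, 'End': 180, 'Strand': '-'},
    {'Feature': 'gene', 'gene_name': 'EGFR', 'gene_id': 'ENSG00000146648.1',
     'Chromosome': 'chr7', 'Start': 300, 'End': 400, 'Strand': '+'},
]


def strip_version(gene_id: str) -> str:
    """Drop the version suffix from an Ensembl gene ID (hypothetical helper)."""
    return gene_id.split('.', 1)[0]


def select_gene_intervals(
    rows: Sequence[dict],
    gene_symbols: Optional[Sequence[str]] = None,
    gene_ids: Optional[Sequence[str]] = None,
) -> list:
    """Return one (chromosome, start, end, strand) tuple per identifier."""
    # Exactly one of the two identifier arguments must be provided.
    if (gene_symbols is None) == (gene_ids is None):
        raise ValueError('Set exactly one of gene_symbols or gene_ids.')

    # Assumption: only gene-level rows are candidates.
    genes = [row for row in rows if row['Feature'] == 'gene']

    if gene_symbols is not None:
        # Case-insensitive comparison on the gene symbol.
        wanted = [symbol.upper() for symbol in gene_symbols]
        keys = [row['gene_name'].upper() for row in genes]
    else:
        # Compare IDs with the version suffix removed on both sides.
        wanted = [strip_version(gene_id) for gene_id in gene_ids]
        keys = [strip_version(row['gene_id']) for row in genes]

    intervals = []
    for identifier in wanted:  # Preserves the order of the input identifiers.
        matches = [row for row, key in zip(genes, keys) if key == identifier]
        if len(matches) != 1:
            raise ValueError(
                f'Expected exactly one interval for {identifier!r}, '
                f'found {len(matches)}.'
            )
        row = matches[0]
        intervals.append(
            (row['Chromosome'], row['Start'], row['End'], row['Strand'])
        )
    return intervals
```

With these toy rows, `select_gene_intervals(TOY_ROWS, gene_symbols=['egfr', 'TP53'])` returns the EGFR tuple first and the TP53 tuple second, and `gene_ids=['ENSG00000141510']` matches the TP53 row despite the missing version suffix. Passing both arguments, neither argument, or a symbol with no gene row (such as `'TNF'` here) raises `ValueError`.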