alphagenome.models.variant_scorers.tidy_anndata

alphagenome.models.variant_scorers.tidy_anndata(adata, match_gene_strand=True, include_extended_metadata=True)

Formats an AnnData score as a tidy DataFrame.

This function converts the score output from an AnnData object into a long-format pandas DataFrame, where each row represents:

  • For non-gene-centric variant scorers: Score for a variant-track pair.

  • For non-gene-centric interval scorers: Score for an interval-track pair.

  • For gene-centric variant/interval scoring: Score for a variant/interval-gene-track combination.
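The row layouts listed above can be pictured with a small standard-library sketch. It is not the library's implementation: the gene and track records, the `gene_name` and `gene_strand` keys, and the `to_long_rows` helper are hypothetical, chosen only to show how a gene-by-track score matrix unrolls into one row per gene-track combination.

```python
from itertools import product

# Hypothetical gene-centric score matrix: one row per gene, one column per track.
genes = [
    {"gene_name": "GENE_A", "gene_strand": "+"},
    {"gene_name": "GENE_B", "gene_strand": "-"},
]
tracks = [
    {"track_name": "track_1", "track_strand": "+"},
    {"track_name": "track_2", "track_strand": "-"},
    {"track_name": "track_3", "track_strand": "."},
]
scores = [
    [0.10, 0.20, 0.30],  # scores for GENE_A across the three tracks
    [0.40, 0.50, 0.60],  # scores for GENE_B across the three tracks
]


def to_long_rows(genes, tracks, scores):
    """Return one dict per (gene, track) pair, carrying the matching score."""
    rows = []
    for (i, gene), (j, track) in product(enumerate(genes), enumerate(tracks)):
        # Merge the gene metadata, the track metadata, and the single score.
        rows.append({**gene, **track, "raw_score": scores[i][j]})
    return rows


long_rows = to_long_rows(genes, tracks, scores)
print(len(long_rows))  # 2 genes x 3 tracks
print(long_rows[0])
```

For a non-gene-centric scorer the same idea applies with a single score row, so each output row would correspond to one variant-track or interval-track pair.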

Parameters:
  • adata (AnnData) – An AnnData object containing scores.

  • match_gene_strand (bool (default: True)) – If True (and using gene-centric scoring), rows with mismatched gene and track strands are removed.

  • include_extended_metadata (bool (default: True)) – If True, includes additional columns derived from metadata specific to the output type, such as biosample name and type, GTEx tissue, transcription factor, and histone mark, if available. If False, includes only the minimal metadata columns required to uniquely identify a track within a given output type: track_name and track_strand.
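The effect of the two boolean parameters can be sketched on plain dictionaries. This is an illustration under stated assumptions, not the library's code: the `tidy_rows` helper, the `gene_strand` and `biosample_name` keys, and the `EXTENDED_COLUMNS` tuple are hypothetical. In particular, the sketch assumes that unstranded tracks (marked ".") are never treated as mismatched, which the parameter description does not specify.

```python
# Hypothetical long-format rows, one per gene-track combination.
rows = [
    {"gene_name": "GENE_A", "gene_strand": "+", "track_name": "track_1",
     "track_strand": "+", "biosample_name": "sample_x", "raw_score": 0.10},
    {"gene_name": "GENE_A", "gene_strand": "+", "track_name": "track_2",
     "track_strand": "-", "biosample_name": "sample_x", "raw_score": 0.20},
    {"gene_name": "GENE_A", "gene_strand": "+", "track_name": "track_3",
     "track_strand": ".", "biosample_name": "sample_y", "raw_score": 0.30},
    {"gene_name": "GENE_B", "gene_strand": "-", "track_name": "track_1",
     "track_strand": "+", "biosample_name": "sample_x", "raw_score": 0.40},
    {"gene_name": "GENE_B", "gene_strand": "-", "track_name": "track_2",
     "track_strand": "-", "biosample_name": "sample_x", "raw_score": 0.50},
]

# Hypothetical stand-in for the extended metadata columns.
EXTENDED_COLUMNS = ("biosample_name",)


def strands_match(row):
    """Assumption: an unstranded track ('.') is compatible with any gene strand."""
    return row["track_strand"] == "." or row["track_strand"] == row["gene_strand"]


def tidy_rows(rows, match_gene_strand=True, include_extended_metadata=True):
    """Filter mismatched strands and optionally drop extended metadata keys."""
    kept = [r for r in rows if strands_match(r)] if match_gene_strand else list(rows)
    if include_extended_metadata:
        return [dict(r) for r in kept]
    return [{k: v for k, v in r.items() if k not in EXTENDED_COLUMNS} for r in kept]


filtered = tidy_rows(rows)
unfiltered = tidy_rows(rows, match_gene_strand=False)
minimal = tidy_rows(rows, include_extended_metadata=False)
print(len(filtered), len(unfiltered), sorted(minimal[0]))
```

With `match_gene_strand=False` every gene-track combination is retained, so downstream code would need to handle opposite-strand rows itself.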

Return type:

DataFrame

Returns:

A pandas DataFrame with one score per row. The DataFrame includes columns for variant ID (if applicable), scored interval, gene information (if applicable), output type, variant/interval scorer, track name, ontology term, assay type, track strand, and raw score. Additional metadata, such as biosample name, biosample type, and GTEx tissue, is also returned where available. See full_path_to.tidy_scores() for more details on the returned columns.

Raises:

ValueError – If the input is not an AnnData object.