AlphaGenome

Navigating data ontologies

Contents

  • Setup and imports
  • Interactively view output metadata
    • How many tracks are there per output type?

Tip

Open this tutorial in Google Colab for interactive viewing.

# @title Install AlphaGenome

# @markdown Run this cell to install AlphaGenome.
from IPython.display import clear_output
! pip install alphagenome
clear_output()

Setup and imports

from alphagenome.models import dna_client
from google.colab import data_table
import pandas as pd
from google.colab import userdata

data_table.enable_dataframe_formatter()

Interactively view output metadata

First, we create the model client and fetch the human output metadata, concatenated across output types into a single table.

dna_model = dna_client.create(userdata.get('ALPHA_GENOME_API_KEY'))

output_metadata = dna_model.output_metadata(
    dna_client.Organism.HOMO_SAPIENS
).concatenate()

Click Filter in the upper right-hand corner of the interactive dataframe, then type a cell or tissue name such as “brain” into the Search by all fields box to find the ontology_curie term for a tissue and output type of interest:

output_metadata
index | name | strand | Assay title | ontology_curie | biosample_name | biosample_type | biosample_life_stage | data_source | endedness | genetically_modified | output_type | gtex_tissue | histone_mark | transcription_factor
0 | CL:0000084 ATAC-seq | . | ATAC-seq | CL:0000084 | T-cell | primary_cell | adult | encode | paired | False | OutputType.ATAC | NaN | NaN | NaN
1 | CL:0000100 ATAC-seq | . | ATAC-seq | CL:0000100 | motor neuron | in_vitro_differentiated_cells | adult | encode | paired | False | OutputType.ATAC | NaN | NaN | NaN
2 | CL:0000236 ATAC-seq | . | ATAC-seq | CL:0000236 | B cell | primary_cell | adult | encode | paired | False | OutputType.ATAC | NaN | NaN | NaN
3 | CL:0000623 ATAC-seq | . | ATAC-seq | CL:0000623 | natural killer cell | primary_cell | adult | encode | paired | False | OutputType.ATAC | NaN | NaN | NaN
4 | CL:0000624 ATAC-seq | . | ATAC-seq | CL:0000624 | CD4-positive, alpha-beta T cell | primary_cell | adult | encode | paired | False | OutputType.ATAC | NaN | NaN | NaN
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ...
7 | ENCSR182QNJ | - | PRO-cap | EFO:0001099 | Caco-2 | cell_line | NaN | encode | NaN | False | OutputType.PROCAP | NaN | NaN | NaN
8 | ENCSR740IPL | - | PRO-cap | EFO:0002067 | K562 | cell_line | NaN | encode | NaN | False | OutputType.PROCAP | NaN | NaN | NaN
9 | ENCSR797DEF | - | PRO-cap | EFO:0002819 | Calu3 | cell_line | NaN | encode | NaN | False | OutputType.PROCAP | NaN | NaN | NaN
10 | ENCSR801ECP | - | PRO-cap | CL:0002618 | endothelial cell of umbilical vein | primary_cell | NaN | encode | NaN | False | OutputType.PROCAP | NaN | NaN | NaN
11 | ENCSR860TYZ | - | PRO-cap | EFO:0001200 | MCF 10A | cell_line | NaN | encode | NaN | False | OutputType.PROCAP | NaN | NaN | NaN

5563 rows × 14 columns
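
The interactive search can also be approximated in code, which may be useful outside Colab. The sketch below assumes the metadata behaves like a regular pandas DataFrame with the columns shown in the output; `toy_metadata` and `search_all_fields` are hypothetical stand-ins written for illustration and are not part of the AlphaGenome API.

```python
import pandas as pd

# Hypothetical stand-in for `output_metadata`, built from a few of the
# rows displayed in the tutorial output (only a subset of columns).
toy_metadata = pd.DataFrame({
    'name': ['CL:0000084 ATAC-seq', 'CL:0000100 ATAC-seq', 'ENCSR740IPL'],
    'ontology_curie': ['CL:0000084', 'CL:0000100', 'EFO:0002067'],
    'biosample_name': ['T-cell', 'motor neuron', 'K562'],
    'output_type': ['OutputType.ATAC', 'OutputType.ATAC', 'OutputType.PROCAP'],
})


def search_all_fields(df, query):
    """Return rows where any column contains `query` (case-insensitive)."""
    as_text = df.astype(str)  # compare every cell as plain text
    mask = as_text.apply(
        lambda col: col.str.contains(query, case=False, regex=False)
    ).any(axis=1)
    return df[mask]


# Find the ontology term for a biosample of interest.
hits = search_all_fields(toy_metadata, 'neuron')
print(hits[['ontology_curie', 'biosample_name', 'output_type']])
```

With the real metadata, the same helper could be called on `output_metadata` directly, for example with the query “brain”.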

How many tracks are there per output type?

# Count human tracks
human_tracks = (
    dna_model.output_metadata(dna_client.Organism.HOMO_SAPIENS)
    .concatenate()
    .groupby('output_type')
    .size()
    .rename('# Human tracks')
)

# Count mouse tracks
mouse_tracks = (
    dna_model.output_metadata(dna_client.Organism.MUS_MUSCULUS)
    .concatenate()
    .groupby('output_type')
    .size()
    .rename('# Mouse tracks')
)

pd.concat([human_tracks, mouse_tracks], axis=1).astype(pd.Int64Dtype())
output_type | # Human tracks | # Mouse tracks
OutputType.ATAC | 167 | 18
OutputType.CAGE | 546 | 188
OutputType.DNASE | 305 | 67
OutputType.RNA_SEQ | 667 | 173
OutputType.CHIP_HISTONE | 1116 | 183
OutputType.CHIP_TF | 1617 | 127
OutputType.SPLICE_SITES | 4 | 4
OutputType.SPLICE_SITE_USAGE | 734 | 180
OutputType.SPLICE_JUNCTIONS | 367 | 90
OutputType.CONTACT_MAPS | 28 | 8
OutputType.PROCAP | 12 | <NA>

Note that PROCAP outputs are not available for mouse, so the mouse count for that output type is missing (<NA>).
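
The missing mouse value also explains the final `.astype(pd.Int64Dtype())` call in the cell that builds the counts. A minimal sketch with toy counts (only two output types, copied from the human and mouse columns) shows what happens: `pd.concat(..., axis=1)` aligns the two Series on their index, the gap for PROCAP becomes NaN, and pandas falls back to floating point for that column. Casting to the nullable integer dtype restores whole-number counts and shows the gap as `<NA>`.

```python
import pandas as pd

# Toy counts: PROCAP exists for human only.
human = pd.Series({'ATAC': 167, 'PROCAP': 12}, name='# Human tracks')
mouse = pd.Series({'ATAC': 18}, name='# Mouse tracks')

# Aligning on the index leaves a hole for mouse PROCAP.
combined = pd.concat([human, mouse], axis=1)
print(combined.dtypes)  # the mouse column is float64 because of the NaN

# The nullable integer dtype keeps integer counts and marks the hole as <NA>.
as_int = combined.astype(pd.Int64Dtype())
print(as_int)
```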

By Google LLC

© Copyright 2024, Google LLC.