alphagenome.interpretation.ism.ism_matrix

alphagenome.interpretation.ism.ism_matrix(variant_scores, variants, interval=None, multiply_by_sequence=True, vocabulary='ACGT')

Construct the ISM (position, base) matrix from individual ISM scores.

This function returns the effect of each variant relative to the per-position average: score[position, base] - mean(score[position, :]).
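The centering formula can be illustrated with plain NumPy. This is a minimal sketch of the arithmetic only, not the library's implementation; the `raw` array and its values are made up for illustration, with columns assumed to follow the default 'ACGT' order.

```python
import numpy as np

# Hypothetical raw scores for 3 positions x 4 bases (columns assumed 'ACGT').
raw = np.array([
    [0.0, 0.2, -0.4, 0.6],
    [1.0, 0.0, 1.0, 2.0],
    [0.0, 0.0, 0.0, 0.0],
])

# score[position, base] - mean(score[position, :])
centered = raw - raw.mean(axis=1, keepdims=True)
```

After centering, each row sums to zero, so a positive entry marks a base that scores above the average for its position.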

Parameters:
  • variant_scores (Sequence[float]) – Variant effect scores corresponding to the variants. These can be obtained from the score_variants() output summarised to a single scalar per variant, for example by selecting a specific variant scorer output and extracting a specific value from the (variant, track) matrix.

  • variants (Sequence[Variant]) – Variants whose scores populate the ISM matrix, in the same order as variant_scores.

  • interval (Optional[Interval] (default: None)) – Interval for which to get the contribution scores. All variants need to be contained within that interval. If None, it will be automatically inferred from variants.

  • multiply_by_sequence (bool (default: True)) – If True, return non-zero values only at the bases of the one-hot-encoded reference genome sequence.

  • vocabulary (str (default: 'ACGT')) – Vocabulary of possible alternative bases contained in sequence. The order determines the column order of the returned matrix.
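The effect of multiply_by_sequence and the role of vocabulary order can be sketched with NumPy. This is an illustration under stated assumptions rather than the library's code: the `reference` string and the `centered` values are hypothetical, and the masking is assumed to be an element-wise product with a one-hot encoding of the reference sequence.

```python
import numpy as np

vocabulary = 'ACGT'   # column order of the matrix
reference = 'GAT'     # hypothetical reference bases, one per position

# Hypothetical mean-centered scores, shape (len(reference), len(vocabulary)).
centered = np.array([
    [0.3, -0.1, -0.4, 0.2],
    [-0.6, 0.1, 0.2, 0.3],
    [0.1, 0.1, 0.3, -0.5],
])

# One-hot encode the reference: 1.0 where the column's base matches.
one_hot = np.array(
    [[float(base == v) for v in vocabulary] for base in reference]
)

# Keep only the entry at the reference base of each position.
masked = centered * one_hot
```

Each row of `masked` then has at most one non-zero entry, in the column of that position's reference base.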

Return type:

ndarray

Returns:

Matrix of shape (interval.width, 4) containing variant scores.