alphagenome.data.genome.Interval#

class alphagenome.data.genome.Interval(chromosome, start, end, strand='.', name='', info=<factory>)[source]#

Represents a genomic interval.

A genomic interval is a region on a chromosome defined by a start and end position. This class provides methods for manipulating and comparing intervals, and for calculating coverage and overlap.

chromosome#

The chromosome name (e.g., ‘chr1’, ‘1’).

start#

The 0-based start position.

end#

The 0-based end position (must be greater than or equal to start).

strand#

The strand of the interval (‘+’, ‘-’, or ‘.’). Defaults to ‘.’ (unstranded).

name#

An optional name for the interval.

info#

An optional dictionary to store additional information.

negative_strand#

True if the interval is on the negative strand, False otherwise.

width#

The width of the interval (end - start).
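The relationship between the stored fields and the two derived attributes can be illustrated with a small stand-in class. This is only a sketch: `IntervalSketch` is a hypothetical name, not the library's implementation, and it assumes that `width` is `end - start` and that `negative_strand` is true exactly when `strand` is '-', as described above.

```python
import dataclasses
from typing import Any


@dataclasses.dataclass
class IntervalSketch:
  """Hypothetical stand-in mirroring the documented fields (not the real class)."""

  chromosome: str
  start: int
  end: int
  strand: str = '.'
  name: str = ''
  info: dict[str, Any] = dataclasses.field(default_factory=dict)

  @property
  def negative_strand(self) -> bool:
    # True only for intervals on the '-' strand.
    return self.strand == '-'

  @property
  def width(self) -> int:
    # Width as documented: end - start.
    return self.end - self.start


example = IntervalSketch('chr1', 100, 250, strand='-')
print(example.width, example.negative_strand)
```

The `info` field uses a default factory so that each instance gets its own dictionary, which matches the `<factory>` default shown in the class signature.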

Attributes#

name

negative_strand

Returns True if interval is on the negative strand, False otherwise.

strand

width

Returns the width of the interval.

chromosome

start

end

info

Interval.name: str = ''#

Interval.negative_strand#

Returns True if interval is on the negative strand, False otherwise.

Interval.strand: str = '.'#

Interval.width#

Returns the width of the interval.

Interval.chromosome: str#

Interval.start: int#

Interval.end: int#

Interval.info: dict[str, Any]#

Methods#

as_unstranded()

Returns an unstranded copy of the interval.

binary_mask(intervals[, bin_size])

Returns a boolean mask that is True for each bin overlapped by at least one interval (coverage > 0).

binary_mask_stranded(intervals[, bin_size])

Returns a boolean mask that is True for each bin overlapped by at least one interval (coverage > 0).

boundary_shift([start_offset, end_offset, ...])

Extends or shrinks the interval by adjusting the positions with padding.

center([use_strand])

Computes the center of the interval.

contains(interval)

Checks if this interval completely contains another interval.

copy()

Returns a deep copy of the interval.

coverage(intervals, *[, bin_size])

Computes a coverage track from a sequence of intervals overlapping this interval.

coverage_stranded(intervals, *[, bin_size])

Computes a coverage track from intervals overlapping this interval.

from_interval_dict(interval)

Creates an Interval from a dictionary.

from_proto(proto)

Creates an Interval from a protobuf message.

from_pyranges_dict(row[, ignore_info])

Creates an Interval from a pyranges-like dictionary.

from_str(string)

Creates an Interval from a string (e.g., 'chr1:100-200:+').

intersect(interval)

Returns the intersection of this interval with another interval.

overlap_ranges(intervals)

Returns overlapping ranges from intervals overlapping this interval.

overlaps(interval)

Checks if this interval overlaps with another interval.

pad(start_pad, end_pad, *[, use_strand])

Pads the interval by adding the specified padding to the start and end.

pad_inplace(start_pad, end_pad, *[, use_strand])

Pads the interval in place by adding padding to the start and end.

resize(width[, use_strand])

Resizes the interval to a new width, centered around the original center.

resize_inplace(width[, use_strand])

Resizes the interval in place, centered around the original center.

shift(offset[, use_strand])

Shifts the interval by the given offset.

swap_strand()

Swaps the strand of the interval.

to_interval_dict()

Converts the interval to a dictionary.

to_proto()

Converts the interval to a protobuf message.

to_pyranges_dict()

Converts the interval to a pyranges-like dictionary.

truncate([reference_length])

Truncates the interval to fit within the valid reference range.

within_reference([reference_length])

Checks if the interval is within the valid reference range.

Interval.as_unstranded()[source]#

Returns an unstranded copy of the interval.

Return type:

Self

Interval.binary_mask(intervals, bin_size=1)[source]#

Returns a boolean mask that is True for each bin overlapped by at least one interval (coverage > 0).

Return type:

ndarray
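The "coverage > 0" rule can be sketched with plain NumPy. The helper below is hypothetical and works on bare `(start, end)` pairs rather than Interval objects; it assumes half-open ranges on a single chromosome and a `bin_size` that divides the width, which are assumptions of this sketch rather than documented behavior of `binary_mask`.

```python
import numpy as np


def binary_mask_sketch(start, end, others, bin_size=1):
  """Hypothetical helper: True for each bin touched by at least one range.

  `others` is a list of (start, end) pairs on the same chromosome, treated
  as half-open ranges (an assumption of this sketch).
  """
  width = end - start
  if bin_size <= 0 or width % bin_size:
    raise ValueError('bin_size must be positive and divide the width.')
  covered = np.zeros(width, dtype=bool)
  for other_start, other_end in others:
    # Clip each range to the query window before marking positions.
    lo = max(other_start, start) - start
    hi = min(other_end, end) - start
    if lo < hi:
      covered[lo:hi] = True
  # A bin is True if any position inside it is covered.
  return covered.reshape(width // bin_size, bin_size).any(axis=1)


mask = binary_mask_sketch(0, 8, [(1, 3), (6, 20)], bin_size=2)
print(mask)
```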

Interval.binary_mask_stranded(intervals, bin_size=1)[source]#

Returns a boolean mask that is True for each bin overlapped by at least one interval (coverage > 0).

Return type:

ndarray

Interval.boundary_shift(start_offset=0, end_offset=0, use_strand=True)[source]#

Extends or shrinks the interval by adjusting the positions with padding.

Parameters:
  • start_offset (int (default: 0)) – The amount to shift the start position.

  • end_offset (int (default: 0)) – The amount to shift the end position.

  • use_strand (bool (default: True)) – If True, the offsets are applied in reverse for negative strand intervals.

Return type:

Self

Returns:

A new interval with adjusted boundaries.

Interval.center(use_strand=True)[source]#

Computes the center of the interval.

For intervals with an odd width, the center is rounded down to the nearest integer.

If use_strand is True and the interval is on the negative strand, the center is calculated differently to maintain consistency when stacking sequences from different intervals oriented in the forward strand direction. This ensures that the relative distance between the interval’s upstream boundary and its center is preserved.

Parameters:

use_strand (bool (default: True)) – If True, the strand of the interval is considered when calculating the center.

Return type:

int

Returns:

The integer representing the center position of the interval.

Examples

>>> Interval('1', 1, 3, '+').center()
2
>>> Interval('1', 1, 3, '-').center()  # Strand doesn't matter.
2
>>> Interval('1', 1, 4, '+').center()
3
>>> Interval('1', 1, 4, '-').center()
2
>>> Interval('1', 1, 4, '+').center(use_strand=False)
3
>>> Interval('1', 1, 2, '-').center()
1

Interval.contains(interval)[source]#

Checks if this interval completely contains another interval.

Return type:

bool

Interval.copy()[source]#

Returns a deep copy of the interval.

Return type:

Self

Interval.coverage(intervals, *, bin_size=1)[source]#

Computes a coverage track from a sequence of intervals overlapping this interval.

This method calculates the coverage of this interval by a set of other intervals. The coverage is defined as the number of intervals that overlap each position within this interval.

The bin_size parameter allows you to bin the coverage into equal-sized windows. This can be useful for summarizing coverage over larger regions. If bin_size is 1, the coverage is calculated at single-base resolution.

Parameters:
  • intervals (Sequence[Self]) – A sequence of Interval objects that may overlap this interval.

  • bin_size (int (default: 1)) – The size of the bins used to calculate coverage. Must be a positive integer that divides the width of the interval.

Return type:

ndarray

Returns:

A 1D numpy array representing the coverage track. The length of the array is self.width // bin_size. Each element in the array represents the summed coverage within the corresponding bin.

Raises:

ValueError – If bin_size is not a positive integer or if the interval width is not divisible by bin_size.
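The per-position counting and per-bin summation described above can be sketched as follows. `coverage_sketch` is a hypothetical helper operating on `(start, end)` pairs, not the library's method; the half-open treatment of ranges and the integer dtype are assumptions of this sketch.

```python
import numpy as np


def coverage_sketch(start, end, others, bin_size=1):
  """Hypothetical helper: summed coverage of [start, end) per bin."""
  width = end - start
  if bin_size <= 0 or width % bin_size:
    raise ValueError('bin_size must be positive and divide the width.')
  per_base = np.zeros(width, dtype=np.int32)
  for other_start, other_end in others:
    # Count only the part of each range that falls inside the window.
    lo = max(other_start, start) - start
    hi = min(other_end, end) - start
    if lo < hi:
      per_base[lo:hi] += 1
  # Sum single-base coverage within each bin.
  return per_base.reshape(width // bin_size, bin_size).sum(axis=1)


track = coverage_sketch(10, 16, [(8, 12), (11, 14)], bin_size=2)
print(track)
```

With `bin_size=2` the six-base window yields three values, matching the documented output length of `self.width // bin_size`.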

Interval.coverage_stranded(intervals, *, bin_size=1)[source]#

Computes a coverage track from intervals overlapping this interval.

This method considers the strand information of both self and intervals.

Parameters:
  • intervals (Sequence[Self]) – Sequence of intervals possibly overlapping self.

  • bin_size (int (default: 1)) – Resolution at which to bin the output coverage track. Coverage within each bin (if larger than 1) will be summarized using sum().

Return type:

ndarray

Returns:

Numpy array of shape (self.width // bin_size, 2) where output[:, 0] represents coverage for intervals on the same strand as self and output[:, 1] represents coverage of intervals on the opposite strand.

classmethod Interval.from_interval_dict(interval)[source]#

Creates an Interval from a dictionary.

Return type:

Self

classmethod Interval.from_proto(proto)[source]#

Creates an Interval from a protobuf message.

Return type:

Self

classmethod Interval.from_pyranges_dict(row, ignore_info=False)[source]#

Creates an Interval from a pyranges-like dictionary.

This method constructs an Interval object from a dictionary that follows the pyranges format, such as a row from a pandas.DataFrame converted to a dict.

The dictionary should have the following keys:

  • ‘Chromosome’: The chromosome name.

  • ‘Start’: The start position.

  • ‘End’: The end position.

  • ‘Strand’: The strand (optional, defaults to unstranded).

  • ‘Name’: The interval name (optional).

Any other keys in the dictionary will be added to the info attribute of the Interval object, unless ignore_info is set to True.

Parameters:
  • row (Mapping[str, Any]) – A dictionary containing interval data.

  • ignore_info (bool (default: False)) – If True, any keys in the dictionary that are not part of the standard pyranges columns (‘Chromosome’, ‘Start’, ‘End’, ‘Strand’, ‘Name’) will not be added to the info attribute.

Return type:

Interval

Returns:

An Interval object created from the input dictionary.
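The key handling described above can be sketched without the library. `split_pyranges_row` is a hypothetical helper that only separates the standard columns from the extras; how the real method validates or converts values is not covered here.

```python
from typing import Any, Mapping

# The standard pyranges columns named in the description above.
STANDARD_KEYS = ('Chromosome', 'Start', 'End', 'Strand', 'Name')


def split_pyranges_row(row: Mapping[str, Any], ignore_info: bool = False):
  """Hypothetical helper: separates standard columns from extra info."""
  fields = {
      'chromosome': row['Chromosome'],
      'start': int(row['Start']),
      'end': int(row['End']),
      'strand': row.get('Strand', '.'),  # Optional, defaults to unstranded.
      'name': row.get('Name', ''),  # Optional.
  }
  if ignore_info:
    info = {}
  else:
    # Every non-standard key is carried over as additional information.
    info = {k: v for k, v in row.items() if k not in STANDARD_KEYS}
  return fields, info


row = {'Chromosome': 'chr1', 'Start': 100, 'End': 200, 'gene_id': 'G1'}
fields, info = split_pyranges_row(row)
print(fields, info)
```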

classmethod Interval.from_str(string)[source]#

Creates an Interval from a string (e.g., ‘chr1:100-200:+’).

Return type:

Self
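A string in the form shown above could be parsed roughly as follows. This is a sketch under assumptions: `parse_interval_string` and its regular expression are hypothetical, the strand suffix is treated as optional, and the real parser may accept other variants that this pattern rejects.

```python
import re

# Assumed format: chromosome:start-end with an optional :strand suffix.
_INTERVAL_PATTERN = re.compile(
    r'^(?P<chromosome>[^:]+):(?P<start>\d+)-(?P<end>\d+)'
    r'(?::(?P<strand>[+.-]))?$'
)


def parse_interval_string(text):
  """Hypothetical helper returning (chromosome, start, end, strand)."""
  match = _INTERVAL_PATTERN.match(text)
  if match is None:
    raise ValueError(f'Cannot parse interval string: {text!r}')
  # A missing strand falls back to '.', the documented default.
  strand = match['strand'] or '.'
  return match['chromosome'], int(match['start']), int(match['end']), strand


print(parse_interval_string('chr1:100-200:+'))
```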

Interval.intersect(interval)[source]#

Returns the intersection of this interval with another interval.

Return type:

Optional[Self]
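The `Optional` return type suggests that no interval is returned when the two inputs do not overlap. A minimal sketch of that logic, assuming half-open ranges and ignoring strand (both assumptions, since the description above does not specify them), with hypothetical helper names:

```python
from typing import Optional, Tuple

Range = Tuple[str, int, int]  # (chromosome, start, end), assumed half-open.


def overlaps_sketch(a: Range, b: Range) -> bool:
  """Hypothetical helper: True if the ranges share at least one position."""
  return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def intersect_sketch(a: Range, b: Range) -> Optional[Range]:
  """Hypothetical helper: the shared range, or None if there is none."""
  if not overlaps_sketch(a, b):
    return None
  return (a[0], max(a[1], b[1]), min(a[2], b[2]))


print(intersect_sketch(('chr1', 0, 10), ('chr1', 5, 20)))
```

Under the half-open assumption, two ranges that merely touch, such as (0, 10) and (10, 20), share no position and produce `None` in this sketch.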

Interval.overlap_ranges(intervals)[source]#

Returns overlapping ranges from intervals overlapping this interval.

Parameters:

intervals (Sequence[Self]) – Sequence of candidate intervals to test for overlap.

Return type:

ndarray

Returns:

2D numpy array indicating the start and end of the overlapping ranges.

Interval.overlaps(interval)[source]#

Checks if this interval overlaps with another interval.

Return type:

bool

Interval.pad(start_pad, end_pad, *, use_strand=True)[source]#

Pads the interval by adding the specified padding to the start and end.

Parameters:
  • start_pad (int) – The amount of padding to add to the start.

  • end_pad (int) – The amount of padding to add to the end.

  • use_strand (bool (default: True)) – If True, padding is applied in reverse for negative strand intervals.

Return type:

Self

Returns:

A new padded interval.
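One reading of "applied in reverse" is that `start_pad` and `end_pad` trade places on the negative strand, so that `start_pad` always extends the upstream side. The helper below sketches that reading on bare coordinates; `pad_sketch` is hypothetical and the interpretation is an assumption, not confirmed library behavior.

```python
def pad_sketch(start, end, strand, start_pad, end_pad, use_strand=True):
  """Hypothetical helper returning the padded (start, end) pair."""
  if use_strand and strand == '-':
    # Assumed behavior: on the negative strand the upstream side is the
    # larger coordinate, so the two pads trade places.
    start_pad, end_pad = end_pad, start_pad
  return start - start_pad, end + end_pad


print(pad_sketch(100, 200, '+', 10, 20))
print(pad_sketch(100, 200, '-', 10, 20))
```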

Interval.pad_inplace(start_pad, end_pad, *, use_strand=True)[source]#

Pads the interval in place by adding padding to the start and end.

Parameters:
  • start_pad (int) – The amount of padding to add to the start.

  • end_pad (int) – The amount of padding to add to the end.

  • use_strand (bool (default: True)) – If True, padding is applied in reverse for negative strand intervals.

Interval.resize(width, use_strand=True)[source]#

Resizes the interval to a new width, centered around the original center.

Parameters:
  • width (int) – The new width of the interval.

  • use_strand (bool (default: True)) – If True, resizing considers the strand orientation.

Return type:

Self

Returns:

A new resized interval.

Interval.resize_inplace(width, use_strand=True)[source]#

Resizes the interval in place, centered around the original center.

Parameters:
  • width (int) – The new width of the interval.

  • use_strand (bool (default: True)) – If True, resizing considers the strand orientation.

Return type:

None

Interval.shift(offset, use_strand=True)[source]#

Shifts the interval by the given offset.

Parameters:
  • offset (int) – The amount to shift the interval.

  • use_strand (bool (default: True)) – If True, the shift direction is reversed for negative strand intervals.

Return type:

Self

Returns:

A new shifted interval.
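The strand-aware direction can be sketched on bare coordinates. `shift_sketch` is a hypothetical helper; it assumes that a positive offset moves a negative-strand interval toward smaller coordinates when `use_strand` is True.

```python
def shift_sketch(start, end, strand, offset, use_strand=True):
  """Hypothetical helper returning the shifted (start, end) pair."""
  if use_strand and strand == '-':
    # Assumed behavior: the direction flips on the negative strand.
    offset = -offset
  return start + offset, end + offset


print(shift_sketch(100, 200, '+', 10))
print(shift_sketch(100, 200, '-', 10))
```

In both cases the width is unchanged, since the same offset is applied to both ends.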

Interval.swap_strand()[source]#

Swaps the strand of the interval.

Return type:

Self

Interval.to_interval_dict()[source]#

Converts the interval to a dictionary.

Return type:

dict[str, str | int]

Interval.to_proto()[source]#

Converts the interval to a protobuf message.

Return type:

Interval

Interval.to_pyranges_dict()[source]#

Converts the interval to a pyranges-like dictionary.

Return type:

dict[str, int | str]

Interval.truncate(reference_length=9223372036854775807)[source]#

Truncates the interval to fit within the valid reference range.

Return type:

Self

Interval.within_reference(reference_length=9223372036854775807)[source]#

Checks if the interval is within the valid reference range.

Return type:

bool
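The default `reference_length` of 9223372036854775807 equals `sys.maxsize` on 64-bit builds of Python, so by default only the lower bound is restrictive in practice. A minimal sketch of both checks, assuming the valid range is 0 to `reference_length` inclusive of the end coordinate; the helper names are hypothetical and the handling of intervals lying entirely outside the range is not modeled.

```python
import sys


def within_reference_sketch(start, end, reference_length=sys.maxsize):
  """Hypothetical helper: True if [start, end) fits in the reference."""
  return start >= 0 and end <= reference_length


def truncate_sketch(start, end, reference_length=sys.maxsize):
  """Hypothetical helper: clamps both coordinates to the reference."""
  return max(start, 0), min(end, reference_length)


print(within_reference_sketch(-5, 10))
print(truncate_sketch(-5, 10))
print(truncate_sketch(90, 120, reference_length=100))
```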