alphagenome.data.track_data.TrackData#

class alphagenome.data.track_data.TrackData(values, metadata, resolution=1, interval=None, uns=None)[source]#

Container for storing track values and metadata.

TrackData stores multiple genomic tracks at the same resolution, stacked into an ND matrix of shape (positional_bins, num_tracks). It also contains metadata information as a pandas DataFrame with num_tracks rows.

The metadata DataFrame has two required columns:

  • name: The name of the track.

  • strand: The strand of the track (‘+’, ‘-’, or ‘.’).

Other columns are optional.

Valid shapes of TrackData.values are:

  • [num_tracks]

  • [positional_bins, num_tracks]

  • [positional_bins, positional_bins, num_tracks]

TrackData can store both model predictions and raw data. It can optionally hold information about the genome.Interval from which the data were derived and .uns for storing additional unstructured data.

In addition to being a container, TrackData provides functionality for common aggregation and slicing operations.

values#

A numpy array of floats or integers representing the track values. Positional axes have the same length. Example valid shapes are: [num_tracks], [positional_bins, num_tracks], and [positional_bins, positional_bins, num_tracks].

metadata#

A pandas DataFrame containing metadata for each track. The DataFrame must have at least two columns: ‘name’ and ‘strand’.

resolution#

The resolution of the track data in base pairs.

interval#

An optional Interval object representing the genomic region.

uns#

An optional dictionary to store additional unstructured data.

Raises:

ValueError – If the number of tracks in values does not match the number of rows in metadata, or if metadata contains duplicate (name, strand) pairs, or if the positional axes have different lengths, or if the interval width does not match the expected width.
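The conditions listed under Raises can be pictured with a small stand-alone check. This is only a sketch, not the library's code: `check_track_inputs` is a hypothetical helper, metadata is modeled as a list of (name, strand) tuples instead of a DataFrame, and the "expected width" is assumed to be positional_bins * resolution.

```python
def check_track_inputs(shape, name_strand_pairs, resolution=1, interval_width=None):
    """Hypothetical helper mirroring the documented ValueError conditions."""
    # The last axis holds tracks; any leading axes are positional.
    *positional, num_tracks = shape
    if num_tracks != len(name_strand_pairs):
        raise ValueError('number of tracks does not match number of metadata rows')
    if len(set(name_strand_pairs)) != len(name_strand_pairs):
        raise ValueError('metadata contains duplicate (name, strand) pairs')
    if len(set(positional)) > 1:
        raise ValueError('positional axes have different lengths')
    # Assumption: the expected width is positional_bins * resolution.
    if interval_width is not None and positional:
        if interval_width != positional[0] * resolution:
            raise ValueError('interval width does not match the expected width')


# Two tracks, four bins at 128 bp resolution, covering a 512 bp interval.
check_track_inputs(
    (4, 2), [('rna', '+'), ('rna', '-')], resolution=128, interval_width=512
)
```

The same name may appear twice as long as the strands differ, which is why names alone are not guaranteed to be unique.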

Attributes#

interval

names

Returns an array of track names (not necessarily unique).

num_tracks

Returns the number of tracks.

ontology_terms

Returns a list of ontology terms (if available).

positional_axes

Returns a list of the positional axes.

resolution

strands

Returns an array of track strands.

uns

width

Returns the interval width covered by the tracks.

values

metadata

TrackData.interval: Interval | None = None#

TrackData.names#

Returns an array of track names (not necessarily unique).

TrackData.num_tracks#

Returns the number of tracks.

TrackData.ontology_terms#

Returns a list of ontology terms (if available).

TrackData.positional_axes#

Returns a list of the positional axes.

TrackData.resolution: int = 1#

TrackData.strands#

Returns an array of track strands.

TrackData.uns: dict[str, Any] | None = None#

TrackData.width#

Returns the interval width covered by the tracks.

TrackData.values: Union[Float32[ndarray, '*positional_bins num_tracks'], Int32[ndarray, '*positional_bins num_tracks'], Bool[ndarray, '*positional_bins num_tracks']]#

TrackData.metadata: DataFrame#

Methods#

bin_index(relative_position)

Returns the bin index for a relative position.

change_resolution(resolution[, aggregation_type])

Changes the resolution of the track data.

copy()

Returns a deep copy of the TrackData object.

downsample(resolution[, aggregation_type])

Downsamples the track data to a lower resolution by pooling values (sum pooling by default).

filter_to_negative_strand()

Filters tracks to the negative DNA strand.

filter_to_nonnegative_strand()

Filters tracks to the non-negative DNA strands (positive and unstranded).

filter_to_nonpositive_strand()

Filters tracks to the non-positive DNA strands (negative and unstranded).

filter_to_positive_strand()

Filters tracks to the positive DNA strand.

filter_to_stranded()

Filters tracks to stranded tracks (excluding unstranded).

filter_to_unstranded()

Filters tracks to unstranded tracks.

filter_tracks(mask)

Filters tracks by a boolean mask.

groupby(column)

Splits tracks into groups based on a metadata column.

pad(start_pad, end_pad)

Pads the track data along positional axes.

resize(width)

Resizes the track data by cropping or padding with a fixed center.

reverse_complement()

Reverse complements the track data and interval if present.

select_tracks_by_index(idx)

Selects tracks by numerical index.

select_tracks_by_name(names)

Selects tracks by name.

slice_by_interval(interval[, match_resolution])

Slices the track data using a genome.Interval.

slice_by_positions(start, end)

Slices the track data along the positional axes.

to_protos(*[, bytes_per_chunk, compression_type])

Serializes TrackData to protobuf messages.

upsample(resolution[, aggregation_type])

Upsamples the track data to a higher resolution by repeating existing values.

TrackData.bin_index(relative_position)[source]#

Returns the bin index for a relative position.

Parameters:

relative_position (int) – The relative position within the interval.

Return type:

int

Returns:

The corresponding bin index.
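As a rough illustration of the mapping from position to bin, the sketch below uses integer division by the resolution. `bin_index_sketch` is a hypothetical stand-alone function, and it assumes that bins tile the interval from its start with no offset; the library's own implementation may differ.

```python
def bin_index_sketch(relative_position: int, resolution: int) -> int:
    """Hypothetical stand-in: maps a base-pair offset to a bin index."""
    # Assumption: bin i covers positions [i * resolution, (i + 1) * resolution).
    return relative_position // resolution


# At 128 bp resolution, position 300 falls in the bin covering 256..383.
print(bin_index_sketch(300, 128))  # 2
```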

TrackData.change_resolution(resolution, aggregation_type=AggregationType.SUM)[source]#

Changes the resolution of the track data.

Parameters:
  • resolution (int) – The desired resolution in base pairs.

  • aggregation_type (AggregationType (default: <AggregationType.SUM: 'sum'>)) – The aggregation method to use for pooling the values.

Return type:

TrackData

Returns:

A new TrackData object with the new resolution.

TrackData.copy()[source]#

Returns a deep copy of the TrackData object.

Return type:

TrackData

TrackData.downsample(resolution, aggregation_type=AggregationType.SUM)[source]#

Downsamples the track data to a lower resolution by pooling values (sum pooling by default).

Parameters:
  • resolution (int) – The desired resolution in base pairs.

  • aggregation_type (AggregationType (default: <AggregationType.SUM: 'sum'>)) – The aggregation method to use for pooling the values.

Return type:

TrackData

Returns:

A new TrackData object with downsampled values.

Raises:

ValueError – If resolution is not greater than the current resolution or not divisible by the current resolution.
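Sum pooling over a (positional_bins, num_tracks) array can be sketched in plain NumPy as follows. `downsample_sum` and its `current_resolution` argument are hypothetical names used for illustration, and the sketch assumes the number of bins divides evenly by the pooling factor; it is not the library's implementation.

```python
import numpy as np


def downsample_sum(values, current_resolution, resolution):
    """Hypothetical helper: sum-pools adjacent bins along the first axis."""
    if resolution <= current_resolution or resolution % current_resolution != 0:
        raise ValueError('resolution must be a larger multiple of the current one')
    factor = resolution // current_resolution
    bins, num_tracks = values.shape
    # Assumption: the bin count divides evenly by the pooling factor.
    if bins % factor != 0:
        raise ValueError('number of bins is not divisible by the pooling factor')
    # Group every `factor` consecutive bins, then sum within each group.
    return values.reshape(bins // factor, factor, num_tracks).sum(axis=1)


values = np.arange(8, dtype=np.float32).reshape(4, 2)
pooled = downsample_sum(values, current_resolution=1, resolution=2)
print(pooled)  # rows: [2, 4] and [10, 12]
```

Sum pooling keeps the total signal per track unchanged, which suits count-like data.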

TrackData.filter_to_negative_strand()[source]#

Filters tracks to the negative DNA strand.

Return type:

TrackData

TrackData.filter_to_nonnegative_strand()[source]#

Filters tracks to the non-negative DNA strands (positive and unstranded).

Return type:

TrackData

TrackData.filter_to_nonpositive_strand()[source]#

Filters tracks to the non-positive DNA strands (negative and unstranded).

Return type:

TrackData

TrackData.filter_to_positive_strand()[source]#

Filters tracks to the positive DNA strand.

Return type:

TrackData

TrackData.filter_to_stranded()[source]#

Filters tracks to stranded tracks (excluding unstranded).

Return type:

TrackData

TrackData.filter_to_unstranded()[source]#

Filters tracks to unstranded tracks.

Return type:

TrackData

TrackData.filter_tracks(mask)[source]#

Filters tracks by a boolean mask.

Parameters:

mask (ndarray | list[bool]) – A boolean mask to select tracks.

Return type:

TrackData

Returns:

A new TrackData object with the filtered tracks.
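Conceptually, a boolean track mask selects entries along the last (track) axis of values and the matching rows of metadata. The NumPy sketch below shows that idea on bare arrays, building the mask from strand labels; it illustrates the concept only and makes no claim about how the method is implemented.

```python
import numpy as np

# Four positional bins, three tracks.
values = np.arange(12, dtype=np.float32).reshape(4, 3)
strands = np.array(['+', '-', '.'])

# Keep positive and unstranded tracks (the "non-negative" strands).
mask = strands != '-'

# The ellipsis leaves any number of positional axes untouched.
filtered_values = values[..., mask]
filtered_strands = strands[mask]

print(filtered_values.shape)  # (4, 2)
print(filtered_strands)       # ['+' '.']
```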

TrackData.groupby(column)[source]#

Splits tracks into groups based on a metadata column.

This method splits the tracks in the TrackData object into separate TrackData objects based on the unique values in the specified metadata column. It returns a dictionary where the keys are the unique values in the column, and the values are new TrackData objects containing the tracks corresponding to each key.

Parameters:

column (str) – The name of the metadata column to split by.

Return type:

dict[str, TrackData]

Returns:

A dictionary mapping unique values in the column to TrackData objects containing the corresponding tracks.
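The grouping described above can be sketched on bare arrays. Here `group_track_indices` is a hypothetical helper, and the `assay` list stands in for an optional metadata column; neither name comes from the library.

```python
import numpy as np


def group_track_indices(column_values):
    """Hypothetical helper: maps each unique value to its track indices."""
    groups = {}
    for idx, key in enumerate(column_values):
        groups.setdefault(key, []).append(idx)
    return groups


values = np.arange(12).reshape(4, 3)     # (positional_bins, num_tracks)
assay = ['RNA_SEQ', 'DNASE', 'RNA_SEQ']  # hypothetical metadata column

# One sub-array per unique column value, selected along the track axis.
grouped = {
    key: values[..., indices]
    for key, indices in group_track_indices(assay).items()
}

print({key: arr.shape for key, arr in grouped.items()})
```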

TrackData.pad(start_pad, end_pad)[source]#

Pads the track data along positional axes.

Parameters:
  • start_pad (int) – The amount of padding to add at the beginning.

  • end_pad (int) – The amount of padding to add at the end.

Return type:

TrackData

Returns:

A new TrackData object with padded values.

Raises:
ValueError – If start_pad or end_pad is not divisible by the resolution.
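A minimal sketch of padding along positional axes, assuming that start_pad and end_pad are given in base pairs and that padded bins are filled with zeros. The fill value is not stated here, so treat it as an assumption. `pad_positional` is a hypothetical helper, not the library's method.

```python
import numpy as np


def pad_positional(values, start_pad, end_pad, resolution):
    """Hypothetical helper: pads every positional axis, not the track axis."""
    if start_pad % resolution != 0 or end_pad % resolution != 0:
        raise ValueError('padding must be divisible by the resolution')
    before = start_pad // resolution
    after = end_pad // resolution
    # Pad all leading (positional) axes; leave the trailing track axis as is.
    pad_width = [(before, after)] * (values.ndim - 1) + [(0, 0)]
    # Assumption: padded bins are filled with zeros.
    return np.pad(values, pad_width, mode='constant', constant_values=0)


values = np.ones((2, 3), dtype=np.float32)
padded = pad_positional(values, start_pad=128, end_pad=256, resolution=128)
print(padded.shape)  # (5, 3)
```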

TrackData.resize(width)[source]#

Resizes the track data by cropping or padding with a fixed center.

Parameters:

width (int) – The desired width in base pairs.

Return type:

TrackData

Returns:

A new TrackData object with resized values.

Raises:

ValueError – If width is not divisible by the resolution.
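Cropping or padding around a fixed center can be sketched for the common (positional_bins, num_tracks) case. `resize_centered` is a hypothetical helper; how an odd difference is split between the two sides, and the zero fill used for padding, are assumptions of this sketch.

```python
import numpy as np


def resize_centered(values, width, resolution):
    """Hypothetical helper for arrays with a single positional axis."""
    if width % resolution != 0:
        raise ValueError('width must be divisible by the resolution')
    target_bins = width // resolution
    diff = target_bins - values.shape[0]
    if diff < 0:
        # Crop: drop roughly half of the excess bins from each side.
        start = (-diff) // 2
        return values[start:start + target_bins]
    # Pad: add roughly half of the missing bins to each side (zeros assumed).
    before = diff // 2
    pad_width = [(before, diff - before)] + [(0, 0)] * (values.ndim - 1)
    return np.pad(values, pad_width)


values = np.arange(6).reshape(6, 1)
cropped = resize_centered(values, width=4, resolution=1)
padded = resize_centered(values, width=8, resolution=1)
print(cropped.ravel())  # [1 2 3 4]
print(padded.ravel())   # [0 0 1 2 3 4 5 0]
```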

TrackData.reverse_complement()[source]#

Reverse complements the track data and interval if present.

Return type:

TrackData

Returns:

A new TrackData object with reverse complemented tracks.

TrackData.select_tracks_by_index(idx)[source]#

Selects tracks by numerical index.

Parameters:

idx (ndarray | Sequence[int]) – A list or array of numerical indices to select tracks.

Return type:

TrackData

Returns:

A new TrackData object with the selected tracks.

TrackData.select_tracks_by_name(names)[source]#

Selects tracks by name.

Parameters:

names (ndarray | Sequence[str]) – A list or array of track names to select.

Return type:

TrackData

Returns:

A new TrackData object with the selected tracks.

TrackData.slice_by_interval(interval, match_resolution=False)[source]#

Slices the track data using a genome.Interval.

Parameters:
  • interval (Interval) – The interval to slice to.

  • match_resolution (bool (default: False)) – If True, the interval will first be extended to make sure the width is divisible by resolution.

Return type:

TrackData

Returns:

A new TrackData object sliced to the interval.

Raises:

ValueError – If .interval is not specified or if the specified interval is not fully contained within the current interval.

TrackData.slice_by_positions(start, end)[source]#

Slices the track data along the positional axes.

The slicing follows Python slicing conventions: positions are 0-indexed and end is exclusive, so elements up to end-1 are included.

Parameters:
  • start (int) – The 1-bp resolution start position for slicing.

  • end (int) – The 1-bp resolution end position for slicing.

Return type:

TrackData

Returns:

A new TrackData object with the sliced values.

Raises:
ValueError – If (end - start) is greater than the width, or if (end - start) is not divisible by the resolution.
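Because start and end are given at 1-bp resolution, they have to be converted to bin indices before indexing the array. The sketch below shows one way to do that for a (positional_bins, num_tracks) array. `slice_positions` is a hypothetical helper, and it assumes start is aligned to a bin boundary.

```python
import numpy as np


def slice_positions(values, start, end, resolution):
    """Hypothetical helper: slices bins covering [start, end) in base pairs."""
    width = values.shape[0] * resolution
    if end - start > width:
        raise ValueError('(end - start) is greater than the width')
    if (end - start) % resolution != 0:
        raise ValueError('(end - start) is not divisible by the resolution')
    # Assumption: start lies on a bin boundary, so floor division is exact.
    return values[start // resolution:end // resolution]


values = np.arange(4).reshape(4, 1)  # four bins at 128 bp resolution
sliced = slice_positions(values, start=128, end=384, resolution=128)
print(sliced.ravel())  # [1 2]
```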

TrackData.to_protos(*, bytes_per_chunk=0, compression_type=0)[source]#

Serializes TrackData to protobuf messages.

Parameters:
  • bytes_per_chunk (int (default: 0)) – The maximum number of bytes per tensor chunk.

  • compression_type (EnumTypeWrapper (default: 0)) – The compression type to use for the tensor chunks.

Return type:

tuple[TrackData, Sequence[TensorChunk]]

Returns:

A tuple containing the TrackData protobuf message and a sequence of TensorChunk protobuf messages.

TrackData.upsample(resolution, aggregation_type=AggregationType.SUM)[source]#

Upsamples the track data to a higher resolution by repeating existing values.

Parameters:
  • resolution (int) – The desired resolution in base pairs.

  • aggregation_type (AggregationType (default: <AggregationType.SUM: 'sum'>)) – The aggregation method to use for pooling the values.

Return type:

TrackData

Returns:

A new TrackData object with upsampled values.

Raises:

ValueError – If resolution is not lower than the current resolution, or if the current resolution is not divisible by resolution.
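Upsampling by repetition can be sketched with `np.repeat`. `upsample_repeat` and its `current_resolution` argument are hypothetical names, and the sketch assumes the current resolution must be a whole multiple of the new one. It repeats values unchanged; whether the library rescales them for a given aggregation_type is not covered here.

```python
import numpy as np


def upsample_repeat(values, current_resolution, resolution):
    """Hypothetical helper: repeats each bin along the first axis."""
    if resolution >= current_resolution or current_resolution % resolution != 0:
        raise ValueError('resolution must evenly divide the current resolution')
    factor = current_resolution // resolution
    # Each coarse bin becomes `factor` identical fine bins.
    return np.repeat(values, factor, axis=0)


values = np.array([[1, 2], [3, 4]])
upsampled = upsample_repeat(values, current_resolution=2, resolution=1)
print(upsampled.tolist())  # [[1, 2], [1, 2], [3, 4], [3, 4]]
```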