alphagenome.data.track_data.TrackData
- class alphagenome.data.track_data.TrackData(values, metadata, resolution=1, interval=None, uns=None)
Container for storing track values and metadata.
TrackData stores multiple genomic tracks at the same resolution, stacked into an ND matrix of shape (positional_bins, num_tracks). It also contains metadata information as a pandas DataFrame with num_tracks rows.
The metadata DataFrame has two required columns:
name: The name of the track.
strand: The strand of the track (‘+’, ‘-’, or ‘.’).
Other columns are optional.
Valid shapes of TrackData.values are:
[positional_bins]
[positional_bins, num_tracks]
[positional_bins, positional_bins, num_tracks]
…
TrackData can store both model predictions and raw data. It can optionally hold information about the genome.Interval from which the data were derived, and .uns for storing additional unstructured data.
In addition to being a container, TrackData provides functionality for common aggregation and slicing operations.
- values
A numpy array of floats or integers representing the track values. Positional axes have the same length. Example valid shapes are: [num_tracks], [positional_bins, num_tracks], and [positional_bins, positional_bins, num_tracks].
- metadata
A pandas DataFrame containing metadata for each track. The DataFrame must have at least two columns: ‘name’ and ‘strand’.
- resolution
The resolution of the track data in base pairs.
- interval
An optional Interval object representing the genomic region.
- uns
An optional dictionary to store additional unstructured data.
- Raises:
ValueError – If the number of tracks in values does not match the number of rows in metadata, or if metadata contains duplicate (name, strand) pairs, or if the positional axes have different lengths, or if the interval width does not match the expected width.
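The four documented error conditions can be illustrated with a small NumPy sketch. This is not the library's implementation: `check_track_inputs` is a hypothetical helper, metadata is passed as plain lists rather than a DataFrame, and the "expected width" is assumed to be the number of positional bins multiplied by the resolution.

```python
import numpy as np


def check_track_inputs(values, names, strands, resolution=1, interval_width=None):
    """Hypothetical helper mirroring the documented ValueError conditions."""
    values = np.asarray(values)
    num_tracks = values.shape[-1]  # the last axis is the track axis
    if num_tracks != len(names) or num_tracks != len(strands):
        raise ValueError('values and metadata disagree on the number of tracks')
    pairs = list(zip(names, strands))
    if len(set(pairs)) != len(pairs):
        raise ValueError('metadata contains duplicate (name, strand) pairs')
    positional = values.shape[:-1]  # every axis except the track axis
    if len(set(positional)) > 1:
        raise ValueError('positional axes have different lengths')
    # Assumption: expected width = number of positional bins * resolution.
    if interval_width is not None and positional:
        if interval_width != positional[0] * resolution:
            raise ValueError('interval width does not match the expected width')
    return positional, num_tracks


# Two tracks with the same name but different strands are accepted.
shape, n = check_track_inputs(
    np.zeros((4, 2)), ['rna', 'rna'], ['+', '-'],
    resolution=128, interval_width=512)
```

The same name may appear on several tracks as long as the (name, strand) pair stays unique, which is why `names` is described as "not necessarily unique".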
Attributes
| Attribute | Description |
| --- | --- |
| names | Returns an array of track names (not necessarily unique). |
| num_tracks | Returns the number of tracks. |
| ontology_terms | Returns a list of ontology terms (if available). |
| positional_axes | Returns a list of the positional axes. |
| strands | Returns an array of track strands. |
| width | Returns the interval width covered by the tracks. |
- TrackData.names
Returns an array of track names (not necessarily unique).
- TrackData.num_tracks
Returns the number of tracks.
- TrackData.ontology_terms
Returns a list of ontology terms (if available).
- TrackData.positional_axes
Returns a list of the positional axes.
- TrackData.strands
Returns an array of track strands.
- TrackData.width
Returns the interval width covered by the tracks.
Methods
| Method | Description |
| --- | --- |
| | Returns the bin index for a relative position. |
| change_resolution | Changes the resolution of the track data. |
| | Returns a deep copy of the TrackData object. |
| downsample | Downsamples the track data to a lower resolution using sum pooling. |
| filter_to_negative_strand | Filters tracks to the negative DNA strand. |
| filter_to_nonnegative_strand | Filters tracks to the non-negative DNA strands (positive and unstranded). |
| filter_to_nonpositive_strand | Filters tracks to the non-positive DNA strands (negative and unstranded). |
| filter_to_positive_strand | Filters tracks to the positive DNA strand. |
| filter_to_stranded | Filters tracks to stranded tracks (excluding unstranded). |
| | Filters tracks to unstranded tracks. |
| | Filters tracks by a boolean mask. |
| groupby | Splits tracks into groups based on a metadata column. |
| pad | Pads the track data along positional axes. |
| resize | Resizes the track data by cropping or padding with a fixed center. |
| reverse_complement | Reverse complements the track data and interval if present. |
| | Selects tracks by numerical index. |
| | Selects tracks by name. |
| slice_by_interval | Slices the track data using a genome.Interval. |
| slice_by_positions | Slices the track data along the positional axes. |
| to_protos | Serializes TrackData to protobuf messages. |
| upsample | Upsamples the track data to a higher resolution by repeating existing values. |
- TrackData.change_resolution(resolution, aggregation_type=AggregationType.SUM)
Changes the resolution of the track data.
- TrackData.downsample(resolution, aggregation_type=AggregationType.SUM)
Downsamples the track data to a lower resolution using sum pooling.
- Parameters:
resolution (int) – The desired resolution in base pairs.
aggregation_type (AggregationType, default: AggregationType.SUM) – The aggregation method to use for pooling the values.
- Return type:
TrackData
- Returns:
A new TrackData object with downsampled values.
- Raises:
ValueError – If resolution is not greater than the current resolution or not divisible by the current resolution.
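As an illustration of sum pooling, the sketch below groups adjacent bins of a [positional_bins, num_tracks] array and sums each group. `sum_pool` is a hypothetical stand-in written with NumPy, not the library code, and it assumes the number of bins is a multiple of the pooling factor.

```python
import numpy as np


def sum_pool(values, current_resolution, new_resolution):
    """Hypothetical sum-pooling downsample for a [bins, tracks] array."""
    if new_resolution <= current_resolution:
        raise ValueError('new resolution must be greater than the current one')
    if new_resolution % current_resolution != 0:
        raise ValueError('new resolution must be divisible by the current one')
    factor = new_resolution // current_resolution
    bins, num_tracks = values.shape
    # Assumption: bins is a multiple of factor, so the reshape is exact.
    return values.reshape(bins // factor, factor, num_tracks).sum(axis=1)


values = np.arange(8).reshape(4, 2)  # 4 bins at 1 bp, 2 tracks
pooled = sum_pool(values, current_resolution=1, new_resolution=2)
# pooled has 2 bins; each is the sum of two adjacent 1 bp bins.
```

Summing preserves the total signal per track, which is why it is a natural default for count-like data.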
- TrackData.filter_to_negative_strand()
Filters tracks to the negative DNA strand.
- Return type:
TrackData
- TrackData.filter_to_nonnegative_strand()
Filters tracks to the non-negative DNA strands (positive and unstranded).
- Return type:
TrackData
- TrackData.filter_to_nonpositive_strand()
Filters tracks to the non-positive DNA strands (negative and unstranded).
- Return type:
TrackData
- TrackData.filter_to_positive_strand()
Filters tracks to the positive DNA strand.
- Return type:
TrackData
- TrackData.filter_to_stranded()
Filters tracks to stranded tracks (excluding unstranded).
- Return type:
TrackData
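The strand filters differ only in which strand labels they keep. A rough NumPy sketch of the selection logic follows; it uses plain arrays in place of a TrackData object, and the dictionary keys are illustrative labels rather than library names.

```python
import numpy as np

# One strand label per track: '+', '-', or '.' (unstranded).
strands = np.array(['+', '-', '.', '+'])
values = np.arange(12).reshape(3, 4)  # 3 bins, 4 tracks

# Boolean masks over the track axis, one per documented filter.
masks = {
    'positive': strands == '+',
    'negative': strands == '-',
    'nonnegative': strands != '-',  # positive and unstranded
    'nonpositive': strands != '+',  # negative and unstranded
    'stranded': strands != '.',     # positive and negative
}

# Apply each mask to the last (track) axis.
filtered = {label: values[:, mask] for label, mask in masks.items()}
```

In a real TrackData the metadata rows would be filtered with the same mask so that values and metadata stay aligned.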
- TrackData.groupby(column)
Splits tracks into groups based on a metadata column.
This method splits the tracks in the TrackData object into separate TrackData objects based on the unique values in the specified metadata column. It returns a dictionary where the keys are the unique values in the column, and the values are new TrackData objects containing the tracks corresponding to each key.
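The grouping behaviour can be pictured with plain NumPy arrays. In this sketch `group_tracks` is a hypothetical helper and the column values ('rna', 'dnase') are made up for illustration; the real method returns TrackData objects rather than bare arrays.

```python
import numpy as np


def group_tracks(values, column_values):
    """Hypothetical helper: split the track axis by a per-track label."""
    indices_by_key = {}
    for index, key in enumerate(column_values):
        indices_by_key.setdefault(key, []).append(index)
    # Index the last axis so any number of positional axes is supported.
    return {key: values[..., idx] for key, idx in indices_by_key.items()}


values = np.arange(6).reshape(2, 3)  # 2 bins, 3 tracks
groups = group_tracks(values, ['rna', 'dnase', 'rna'])
# groups['rna'] holds tracks 0 and 2; groups['dnase'] holds track 1.
```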
- TrackData.pad(start_pad, end_pad)
Pads the track data along positional axes.
- Parameters:
- Return type:
TrackData
- Returns:
A new TrackData object with padded values.
- Raises:
ValueError – If start_pad or end_pad is not divisible by the resolution.
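The divisibility requirement follows from padding being applied in whole bins. The sketch below assumes start_pad and end_pad are given in base pairs and that padded bins are filled with zeros; the fill value is not stated in this reference, and `pad_bins` is a hypothetical helper.

```python
import numpy as np


def pad_bins(values, start_pad, end_pad, resolution, fill_value=0):
    """Hypothetical padding of every positional axis, in whole bins."""
    if start_pad % resolution or end_pad % resolution:
        raise ValueError('padding must be divisible by the resolution')
    before, after = start_pad // resolution, end_pad // resolution
    # Pad each positional axis; leave the trailing track axis untouched.
    pad_width = [(before, after)] * (values.ndim - 1) + [(0, 0)]
    # Assumption: constant fill; the library's fill value may differ.
    return np.pad(values, pad_width, constant_values=fill_value)


values = np.ones((2, 1), dtype=int)  # 2 bins at 128 bp, 1 track
padded = pad_bins(values, start_pad=128, end_pad=256, resolution=128)
# One bin is added at the start and two at the end.
```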
- TrackData.resize(width)
Resizes the track data by cropping or padding with a fixed center.
- Parameters:
width (int) – The desired width in base pairs.
- Return type:
TrackData
- Returns:
A new TrackData object with resized values.
- Raises:
ValueError – If width is not divisible by the resolution.
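A center-preserving resize can be sketched as a symmetric crop or pad. The helper below is hypothetical and handles only a [positional_bins, num_tracks] array; it assumes zero fill and places any odd leftover bin at the end, neither of which is confirmed by this reference.

```python
import numpy as np


def resize_centered(values, width, resolution, fill_value=0):
    """Hypothetical center crop/pad for a [bins, tracks] array."""
    if width % resolution:
        raise ValueError('width must be divisible by the resolution')
    target = width // resolution
    diff = target - values.shape[0]
    if diff >= 0:
        # Pad: split the extra bins between both ends.
        before = diff // 2
        pad_width = [(before, diff - before), (0, 0)]
        return np.pad(values, pad_width, constant_values=fill_value)
    # Crop: drop the same number of bins from both ends.
    start = (-diff) // 2
    return values[start:start + target]


values = np.arange(6).reshape(6, 1)  # 6 bins at 1 bp, 1 track
cropped = resize_centered(values, width=4, resolution=1)
grown = resize_centered(values, width=8, resolution=1)
```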
- TrackData.reverse_complement()
Reverse complements the track data and interval if present.
- Return type:
TrackData
- Returns:
A new TrackData object with reverse complemented tracks.
- TrackData.slice_by_interval(interval, match_resolution=False)
Slices the track data using a genome.Interval.
- Parameters:
- Return type:
TrackData
- Returns:
A new TrackData object sliced to the interval.
- Raises:
ValueError – If .interval is not specified or if the specified interval is not fully contained within the current interval.
- TrackData.slice_by_positions(start, end)
Slices the track data along the positional axes.
The slicing follows Python slicing conventions (0 indexed, and includes elements up to end-1).
- Parameters:
- Return type:
TrackData
- Returns:
A new TrackData object with the sliced values.
- Raises:
ValueError – If (end - start) is greater than the width, or if (end - start) is not divisible by the resolution.
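To make the slicing convention concrete, the sketch below converts base-pair positions to bin indices for a [positional_bins, num_tracks] array. `slice_bins` is hypothetical, and it assumes start and end are base-pair offsets relative to the start of the track data, with start aligned to a bin boundary.

```python
import numpy as np


def slice_bins(values, start, end, resolution):
    """Hypothetical half-open [start, end) slice in base-pair coordinates."""
    width = values.shape[0] * resolution
    if end - start > width:
        raise ValueError('(end - start) is greater than the width')
    if (end - start) % resolution:
        raise ValueError('(end - start) is not divisible by the resolution')
    # Assumption: start falls on a bin boundary.
    return values[start // resolution:end // resolution]


values = np.arange(8).reshape(4, 2)  # 4 bins at 2 bp (width 8), 2 tracks
sliced = slice_bins(values, start=2, end=6, resolution=2)
# Base pairs 2..5 correspond to bins 1 and 2.
```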
- TrackData.to_protos(*, bytes_per_chunk=0, compression_type=0)
Serializes TrackData to protobuf messages.
- Parameters:
bytes_per_chunk (int, default: 0) – The maximum number of bytes per tensor chunk.
compression_type (EnumTypeWrapper, default: 0) – The compression type to use for the tensor chunks.
- Return type:
- Returns:
A tuple containing the TrackData protobuf message and a sequence of TensorChunk protobuf messages.
- TrackData.upsample(resolution, aggregation_type=AggregationType.SUM)
Upsamples the track data to a higher resolution by repeating existing values.
- Parameters:
resolution (int) – The desired resolution in base pairs.
aggregation_type (AggregationType, default: AggregationType.SUM) – The aggregation method to use for pooling the values.
- Return type:
TrackData
- Returns:
A new TrackData object with upsampled values.
- Raises:
ValueError – If resolution is not lower than the current resolution, or if the current resolution is not divisible by resolution.
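Upsampling by repetition can be sketched with `np.repeat`. The helper below is hypothetical and assumes the current resolution must be a whole multiple of the new one. It only repeats values; whether the library also rescales them depending on aggregation_type is not stated in this reference.

```python
import numpy as np


def repeat_bins(values, current_resolution, new_resolution):
    """Hypothetical upsample of a [bins, tracks] array by repetition."""
    if new_resolution >= current_resolution:
        raise ValueError('new resolution must be lower than the current one')
    if current_resolution % new_resolution != 0:
        raise ValueError('current resolution must be a multiple of the new one')
    factor = current_resolution // new_resolution
    # Each coarse bin becomes `factor` identical fine bins.
    return np.repeat(values, factor, axis=0)


values = np.array([[1, 2], [3, 4]])  # 2 bins at 2 bp, 2 tracks
upsampled = repeat_bins(values, current_resolution=2, new_resolution=1)
```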