alphagenome.data.gene_annotation.filter_transcript_support_level
- alphagenome.data.gene_annotation.filter_transcript_support_level(gtf, transcript_support_levels)
Filter GTF to only transcripts with specific GENCODE support levels.
As documented in the [Ensembl glossary](https://www.ensembl.org/Help/Glossary), the transcript support level (TSL) indicates the degree of evidence that was used to construct the transcript.
As taken from the glossary, the levels are:
| Transcript support level | Description |
|---|---|
| 1 | A transcript where all splice junctions are supported by at least one non-suspect mRNA. |
| 2 | A transcript where the best supporting mRNA is flagged as suspect or the support is from multiple ESTs. |
| 3 | A transcript where the only support is from a single EST. |
| 4 | A transcript where the best supporting EST is flagged as suspect. |
| 5 | A transcript where no single transcript supports the model structure. |
| NA | A transcript that was not analysed for TSL. |
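- Parameters:
  - gtf: pd.DataFrame of GTF annotations to filter.
  - transcript_support_levels: the support level(s) to keep.
- Return type:
pd.DataFrame
- Returns:
pd.DataFrame exactly as provided, but subset to rows with the specified support level(s).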
GENCODE scores each transcript according to how well mRNA and EST alignments match over its full length. Valid levels are:
- '1': All splice junctions of the transcript are supported by at least one non-suspect mRNA.
- '2': The best supporting mRNA is flagged as suspect or the support is from multiple ESTs.
- '3': The only support is from a single EST.
- '4': The best supporting EST is flagged as suspect.
- '5': No single transcript supports the model structure.
- 'NA': The transcript was not analyzed (not supported by this filter function).
See GENCODE GTF format documentation for further details: https://www.gencodegenes.org/pages/data_format.html
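
The filtering described above can be illustrated with a minimal pandas-only sketch. It is not the library's implementation: the column name `transcript_support_level`, the toy table, and the helper `filter_by_support_level` are all assumptions made for illustration, and levels are represented as strings to match the quoted values listed above.

```python
import pandas as pd

# Hypothetical miniature GTF table. The column name
# 'transcript_support_level' is an assumption, not confirmed by this page.
gtf = pd.DataFrame({
    'transcript_id': ['T1', 'T2', 'T3', 'T4'],
    'transcript_support_level': ['1', '2', '5', 'NA'],
})


def filter_by_support_level(gtf, levels):
    """Illustrative stand-in: keep rows whose support level is in `levels`."""
    # isin() builds a boolean mask; all columns and the row order are
    # preserved, only non-matching rows are dropped.
    return gtf[gtf['transcript_support_level'].isin(levels)]


# Keep only the two best-supported levels.
high_confidence = filter_by_support_level(gtf, ['1', '2'])
print(high_confidence)
```

With the documented function, the equivalent call would presumably take the form `filter_transcript_support_level(gtf, ['1', '2'])`, following the signature shown at the top of this page; passing the levels as a list of strings is an assumption.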