Title: | Sequence Profiles of OncoKB Genes |
---|---|
Description: | A data package of sequence profiles of OncoKB genes. These profiles are obtained via Ensembl's REST API and derived from the pairwise alignment of the human sequence with its orthologs. |
Authors: | Ramiro Magno [aut, cre], Isabel Duarte [aut], Ana-Teresa Maia [aut], CINTESIS [cph, fnd] |
Maintainer: | Ramiro Magno <[email protected]> |
License: | CC BY 4.0 |
Version: | 0.1.2 |
Built: | 2024-11-21 03:19:49 UTC |
Source: | https://github.com/maialab/protean |
Download OncoKB Cancer Gene List
download_gene_list( path = stop("`path` must be specified"), url = oncokb_dwl_url() )
path |
A character string with the file path where the downloaded file is to be saved. Tilde-expansion is performed. |
url |
The URL of the resource providing the OncoKB cancer gene list. |
A character vector of genes for which sequence profiles were retrieved successfully and which are therefore provided with this package.
exported_genes
A character vector.
fetch_oncokb_genes() retrieves the current set of OncoKB genes from an OncoKB cancer gene list file.
fetch_oncokb_genes(file = oncokb_dwl_url())
file |
A URL or a file path to the cancer gene list file. By default, cancerGeneList.tsv is downloaded automatically from the OncoKB website. |
A character vector of gene names.
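As a rough illustration of how gene names might be pulled out of a tab-separated gene list, the base R sketch below parses a small inline table. The column name `Hugo Symbol` and the sample rows are assumptions made for this example; this is not a description of how fetch_oncokb_genes() is implemented.

```r
# Toy stand-in for a cancer gene list file (tab-separated).
# The "Hugo Symbol" column name is an assumption for illustration.
tsv <- paste(
  "Hugo Symbol\tEntrez Gene ID",
  "TP53\t7157",
  "BRCA1\t672",
  "TP53\t7157",
  sep = "\n"
)

# check.names = FALSE keeps the space in the column name intact.
gene_list <- read.delim(text = tsv, check.names = FALSE,
                        stringsAsFactors = FALSE)

# Unique gene symbols as a character vector.
genes <- unique(gene_list[["Hugo Symbol"]])
print(genes)
```

Using `unique()` guards against a gene being listed more than once in the source file.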
fetch_oncokb_genes()
This function retrieves pairwise alignments between the human sequence queried in symbol and each of its orthologs via the homology/symbol/:species/:symbol endpoint of Ensembl's REST API. Sequence profiles are then derived from these alignments.
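To make the endpoint pattern concrete, the sketch below builds a request URL for a single gene symbol in base R, without sending the request. The server address `https://rest.ensembl.org`, the query parameters (`type`, `sequence`, `content-type`) and the helper name `homology_url()` are assumptions for illustration; the exact request sent by get_profile() is not documented here.

```r
# Hypothetical helper: fills in the :species and :symbol placeholders of
# the homology/symbol/:species/:symbol endpoint pattern.
homology_url <- function(symbol, species = "human",
                         server = "https://rest.ensembl.org") {
  stopifnot(is.character(symbol), length(symbol) == 1L)
  path <- paste("homology", "symbol", species,
                utils::URLencode(symbol, reserved = TRUE), sep = "/")
  # Assumed query: orthologues only, protein sequences, JSON response.
  query <- "type=orthologues&sequence=protein&content-type=application/json"
  paste0(server, "/", path, "?", query)
}

url <- homology_url("TP53")
print(url)
```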
get_profile(symbol, simplify = TRUE)
symbol |
A character vector of HUGO gene symbols. |
simplify |
Should the result be simplified if only one gene symbol is
queried. If |
A list of tibbles, one for each gene symbol queried, with the following columns:
timestamp
Date and time of the download from Ensembl.
human_prot_id
Ensembl identifier of the human protein sequence.
ortho_prot_id
Ensembl identifier of the ortholog protein sequence.
ortho_species
Species name of the ortholog sequence.
human_align_seq
In the context of pairwise alignment between the human sequence and one of its orthologs, this is the aligned human sequence.
ortho_align_seq
In the context of pairwise alignment between the human sequence and one of its orthologs, this is the aligned ortholog sequence.
human_ortho_perc_id
Percentage of the human sequence matching the sequence of the ortholog.
ortho_human_perc_id
Percentage of the orthologous sequence matching the human sequence.
human_profile_id
Human protein sequence.
ortho_profile_seq
Orthologous sequence stripped of the alignment positions that correspond to gaps in the human sequence.
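The relationship between the aligned sequences and the profile sequences can be sketched in base R as follows. The helper name `strip_human_gaps()`, the toy sequences, and the use of `-` as the gap character are assumptions for illustration; this is a minimal sketch of the idea, not the package's own code.

```r
# Drop every alignment column in which the human sequence has a gap, so
# that the remaining ortholog residues line up with human positions.
strip_human_gaps <- function(human_align_seq, ortho_align_seq, gap = "-") {
  human <- strsplit(human_align_seq, "")[[1]]
  ortho <- strsplit(ortho_align_seq, "")[[1]]
  # Both sequences come from the same pairwise alignment.
  stopifnot(length(human) == length(ortho))
  keep <- human != gap
  list(
    human = paste(human[keep], collapse = ""),
    ortho = paste(ortho[keep], collapse = "")
  )
}

# Column 3 is a gap in the human sequence, so it is removed from both.
profile <- strip_human_gaps("MK-TA", "MRLT-")
print(profile)
```

Gaps in the ortholog are kept, so the stripped ortholog sequence always has the same length as the ungapped human sequence.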
A character vector of genes for which the retrieval of sequence profiles failed and which are therefore absent from this package.
missing_genes
A character vector.
A character vector of the OncoKB genes used as queries to retrieve the sequence profiles bundled with this package.
oncokb_genes
A character vector.
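The three bundled vectors read as a query set, its successes and its failures. Assuming that every queried gene ends up in exactly one of the latter two (an assumption, not something this manual states), the relationship can be sketched with toy vectors; the gene names and variable names below are illustrative only.

```r
# Toy stand-ins for the queried and the successfully retrieved genes.
queried  <- c("TP53", "BRCA1", "GENE_X")
exported <- c("TP53", "BRCA1")

# Genes that were queried but for which no profile was obtained.
not_retrieved <- setdiff(queried, exported)
print(not_retrieved)
```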
protean comes bundled with a number of sequence profile files in its inst/profiles directory. This function makes them easy to access by returning their local paths.
profile_path(file = NULL)
file |
Name of file or gene symbol. If |
# Retrieve the path to the sequence profile of the TP53 protein
# Using the gene symbol
profile_path("TP53")
# Using the file name
profile_path("TP53.csv.gz")
# List all profile files
profile_path()
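The lookup behaviour shown in these examples (gene symbol or file name, with no argument listing everything) could be approximated as below. The helper `resolve_profile()` and the temporary directory are hypothetical stand-ins; in an installed package, files under inst/ are typically located with `system.file()`, but how profile_path() does this internally is an assumption here.

```r
# Hypothetical resolver: accepts a gene symbol or a full file name.
resolve_profile <- function(file = NULL, dir) {
  if (is.null(file)) {
    # No file requested: list every profile file in the directory.
    return(list.files(dir, full.names = TRUE))
  }
  # Treat a bare gene symbol as shorthand for "<symbol>.csv.gz".
  if (!grepl("\\.csv\\.gz$", file)) {
    file <- paste0(file, ".csv.gz")
  }
  path <- file.path(dir, file)
  if (!file.exists(path)) stop("No such profile file: ", path)
  path
}

# A temporary directory with one empty file stands in for inst/profiles.
profiles_dir <- file.path(tempdir(), "profiles")
dir.create(profiles_dir, showWarnings = FALSE)
file.create(file.path(profiles_dir, "TP53.csv.gz"))

by_symbol <- resolve_profile("TP53", profiles_dir)
by_name   <- resolve_profile("TP53.csv.gz", profiles_dir)
all_files <- resolve_profile(dir = profiles_dir)
```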
Read a sequence profile
read_profile(file = stop("`file` must be specified"), sort = TRUE)
file |
A path to a sequence profile file. |
sort |
Whether to sort the sequences by the variable
|
A tibble of 10 variables:
timestamp
Date and time of the download from Ensembl.
human_prot_id
Ensembl identifier of the human protein sequence.
ortho_prot_id
Ensembl identifier of the ortholog protein sequence.
ortho_species
Species name of the ortholog sequence.
human_align_seq
In the context of pairwise alignment between the human sequence and one of its orthologs, this is the aligned human sequence.
ortho_align_seq
In the context of pairwise alignment between the human sequence and one of its orthologs, this is the aligned ortholog sequence.
human_ortho_perc_id
Percentage of the human sequence matching the sequence of the ortholog.
ortho_human_perc_id
Percentage of the orthologous sequence matching the human sequence.
human_profile_id
Human protein sequence.
ortho_profile_seq
Orthologous sequence stripped of the alignment positions that correspond to gaps in the human sequence.
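Because the stripped ortholog sequences share the coordinate system of the human sequence, they can be compared position by position. The base R sketch below computes, for each human residue, the fraction of orthologs carrying the same residue. The helper `position_conservation()` and the toy sequences are assumptions for illustration and are not part of the package.

```r
# Fraction of orthologs that match the human residue at each position.
position_conservation <- function(human_seq, ortho_seqs) {
  human <- strsplit(human_seq, "")[[1]]
  # Every ortholog sequence must have the human sequence's length.
  stopifnot(all(nchar(ortho_seqs) == length(human)))
  # One row per ortholog, one column per human position.
  ortho <- do.call(rbind, strsplit(ortho_seqs, ""))
  human_mat <- matrix(human, nrow = nrow(ortho), ncol = length(human),
                      byrow = TRUE)
  colMeans(ortho == human_mat)
}

conservation <- position_conservation("MKTA", c("MRT-", "MKTA", "MKSA"))
print(round(conservation, 2))
```

A gap in an ortholog simply counts as a mismatch in this sketch; other treatments, such as excluding gapped orthologs from the denominator, are equally plausible.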
read_profile(profile_path("TP53"))