| Title: | Sequence Profiles of OncoKB Genes |
|---|---|
| Description: | A data package of sequence profiles of OncoKB genes. These profiles are obtained via Ensembl's REST API and derived from the pairwise alignment of the human sequence with its orthologs. |
| Authors: | Ramiro Magno [aut, cre] (ORCID: <https://orcid.org/0000-0001-5226-3441>), Isabel Duarte [aut] (ORCID: <https://orcid.org/0000-0003-0060-2936>), Ana-Teresa Maia [aut] (ORCID: <https://orcid.org/0000-0002-0454-9207>), CINTESIS [cph, fnd] |
| Maintainer: | Ramiro Magno <[email protected]> |
| License: | CC BY 4.0 |
| Version: | 0.1.2 |
| Built: | 2026-05-22 05:58:27 UTC |
| Source: | https://github.com/maialab/protean |
Download OncoKB Cancer Gene List
download_gene_list(path = stop("`path` must be specified"), url = oncokb_dwl_url())
path: A character string with the file path where the downloaded file is to be saved. Tilde-expansion is performed.
url: The URL of the resource providing the OncoKB cancer gene list.
A character vector of genes whose sequence profiles were retrieved successfully and are therefore provided with this package.
exported_genes
A character vector.
fetch_oncokb_genes() retrieves the current set of OncoKB genes from an OncoKB
cancer gene list file.
fetch_oncokb_genes(file = oncokb_dwl_url())
file: A URL or a file path to the source providing the cancer gene list file. By default, cancerGeneList.tsv is downloaded automatically from the OncoKB website.
A character vector of gene names.
fetch_oncokb_genes()
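To make the idea of extracting gene names from a gene list file concrete, here is a minimal base R sketch. It does not reproduce how fetch_oncokb_genes() is implemented: the column name `Hugo Symbol` and the toy rows are assumptions made for illustration, and the text is parsed from an in-memory string instead of a download.

```r
# Toy tab-separated content standing in for a cancer gene list file.
# The "Hugo Symbol" column name is an assumption, not taken from this manual.
tsv <- paste(
  "Hugo Symbol\tEntrez Gene ID",
  "TP53\t7157",
  "BRCA1\t672",
  "TP53\t7157",
  sep = "\n"
)

# check.names = FALSE keeps the space in the column name intact.
gene_list <- read.delim(text = tsv, check.names = FALSE, stringsAsFactors = FALSE)

# Deduplicate and sort the symbols to obtain a character vector of gene names.
genes <- sort(unique(gene_list[["Hugo Symbol"]]))
print(genes)
```

The result is a plain character vector, which is the same kind of object that fetch_oncokb_genes() is documented to return.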
This function retrieves pairwise alignments between the human sequence
queried in symbol and each of its orthologs via Ensembl's REST API
homology/symbol/:species/:symbol endpoint. Then, from these alignments,
sequence profiles are derived.
get_profile(symbol, simplify = TRUE)
symbol: A character vector of HUGO gene symbols.
simplify: Should the result be simplified if only one gene symbol is queried. If
A list of tibbles, one for each gene symbol queried, with the following columns:
timestamp: Date and time of the download from Ensembl.
human_prot_id: Ensembl identifier of the human protein sequence.
ortho_prot_id: Ensembl identifier of the ortholog protein sequence.
ortho_species: Species name of the ortholog sequence.
human_align_seq: The aligned human sequence from the pairwise alignment between the human sequence and one of its orthologs.
ortho_align_seq: The aligned ortholog sequence from the pairwise alignment between the human sequence and one of its orthologs.
human_ortho_perc_id: Percentage of the human sequence matching the sequence of the ortholog.
ortho_human_perc_id: Percentage of the orthologous sequence matching the human sequence.
human_profile_id: Human protein sequence.
ortho_profile_seq: Orthologous sequence stripped of the alignment positions that correspond to gaps in the human sequence.
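The description of ortho_profile_seq can be illustrated with a short base R sketch. The helper `strip_human_gaps()` is hypothetical and is not part of protean; it assumes gaps are encoded as `-` and that both aligned sequences have the same length. The toy sequences are made up.

```r
# Hypothetical helper: drop every alignment column where the aligned human
# sequence has a gap, in both the human and the ortholog sequence.
strip_human_gaps <- function(human_align_seq, ortho_align_seq, gap = "-") {
  stopifnot(nchar(human_align_seq) == nchar(ortho_align_seq))
  human <- strsplit(human_align_seq, "", fixed = TRUE)[[1]]
  ortho <- strsplit(ortho_align_seq, "", fixed = TRUE)[[1]]
  keep <- human != gap
  list(
    human_seq = paste(human[keep], collapse = ""),
    ortho_profile_seq = paste(ortho[keep], collapse = "")
  )
}

# Column 4 is a gap in the human sequence, so it is removed from both.
# The ortholog gap at column 7 is kept because the human residue exists there.
profile <- strip_human_gaps("MEE-PQSD", "MEEAPQ-D")
print(profile)
```

After stripping, the ortholog sequence has exactly one position per human residue, so position i in one sequence can be compared directly with position i in the other.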
A character vector of genes whose sequence profiles could not be retrieved and are therefore absent from this package.
missing_genes
A character vector.
A character vector of the OncoKB genes used as queries to retrieve the sequence profiles bundled with this package.
oncokb_genes
A character vector.
protean comes bundled with a number of sequence profile files in its
inst/profiles directory. This function makes them easy to access by
returning their local paths.
profile_path(file = NULL)
file: Name of file or gene symbol. If
# Retrieve the path to the sequence profile of the TP53 protein
# Using the gene symbol
profile_path("TP53")
# Using the file name
profile_path("TP53.csv.gz")
# List all profile files
profile_path()
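The examples above accept either a gene symbol or a file name. One way such an argument could be normalised is sketched below; `as_profile_file()` is a hypothetical helper and should not be read as protean's actual implementation. The only detail taken from this page is the `.csv.gz` extension seen in the example file name.

```r
# Hypothetical helper: append ".csv.gz" unless the input already ends with it.
as_profile_file <- function(x) {
  has_ext <- grepl("\\.csv\\.gz$", x)
  ifelse(has_ext, x, paste0(x, ".csv.gz"))
}

files <- as_profile_file(c("TP53", "TP53.csv.gz"))
print(files)

# Files shipped under inst/ end up at the top level of the installed package,
# so base R's system.file() can look them up. It returns "" when the package
# or the file is not found, so this line is safe to run without protean.
path <- system.file("profiles", files[1], package = "protean")
```

Both inputs resolve to the same file name, which mirrors the behaviour shown by the two TP53 calls above.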
Read a sequence profile
read_profile(file = stop("`file` must be specified"), sort = TRUE)
file: A path to a sequence profile file.
sort: Whether to sort the sequences by the variable
A tibble of 10 variables:
timestamp: Date and time of the download from Ensembl.
human_prot_id: Ensembl identifier of the human protein sequence.
ortho_prot_id: Ensembl identifier of the ortholog protein sequence.
ortho_species: Species name of the ortholog sequence.
human_align_seq: The aligned human sequence from the pairwise alignment between the human sequence and one of its orthologs.
ortho_align_seq: The aligned ortholog sequence from the pairwise alignment between the human sequence and one of its orthologs.
human_ortho_perc_id: Percentage of the human sequence matching the sequence of the ortholog.
ortho_human_perc_id: Percentage of the orthologous sequence matching the human sequence.
human_profile_id: Human protein sequence.
ortho_profile_seq: Orthologous sequence stripped of the alignment positions that correspond to gaps in the human sequence.
read_profile(profile_path("TP53"))
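Because the profile files carry a `.csv.gz` extension, they can presumably also be inspected with base R alone. The sketch below writes a toy gzipped CSV to a temporary file and reads it back; it is not how read_profile() works internally. The two rows and their percentage values are made up, only two of the ten columns are included, and ordering by human_ortho_perc_id is an illustrative choice, since the variable used by `sort` is not stated here.

```r
# Toy data with made-up values; real profiles have ten columns.
toy <- data.frame(
  ortho_species = c("mus_musculus", "pan_troglodytes"),
  human_ortho_perc_id = c(77.4, 99.2),
  stringsAsFactors = FALSE
)

# Write a gzip-compressed CSV to a temporary file.
tmp <- tempfile(fileext = ".csv.gz")
con <- gzfile(tmp, "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# Read it back; read.csv() opens and closes the gzfile connection itself.
back <- read.csv(gzfile(tmp), stringsAsFactors = FALSE)

# Illustrative ordering: most similar ortholog first.
sorted <- back[order(-back$human_ortho_perc_id), ]
print(sorted)
```

With protean installed, the same base R call could be pointed at the path returned by profile_path("TP53") in place of the temporary file.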