Title: | R Client for the Ensembl REST API |
---|---|
Description: | R Client for the Ensembl REST API. |
Authors: | Ramiro Magno [aut, cre] , Isabel Duarte [aut] , Ana-Teresa Maia [aut] , CINTESIS [cph, fnd] |
Maintainer: | Ramiro Magno <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.0.1 |
Built: | 2024-11-22 03:05:41 UTC |
Source: | https://github.com/maialab/ensemblr |
This function converts three vectors, chr, start and end, to strings of the form {chr}:{start}..{end}.
genomic_range(chr, start, end, starting_position_index = 1L)
chr |
A character vector of chromosome names. |
start |
An integer vector of start positions. |
end |
An integer vector of end positions. |
starting_position_index |
Use this argument to indicate if the positions
are 0-based ( |
Returns a character vector whose strings are genomic ranges of the form {chr}:{start}..{end}.
genomic_range("1", 10000L, 20000L) # Returns "1:10000..20000"
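The string format itself can be sketched in base R. The helper below, genomic_range_sketch(), is hypothetical and is not the package's implementation: it ignores starting_position_index and takes the positions as given. Positions are coerced to integer so that large values are not printed in scientific notation (in R, as.character(1e5) gives "1e+05").

```r
# Hypothetical helper: a base-R sketch of the {chr}:{start}..{end} format.
# It does not implement ensemblr's starting_position_index handling.
genomic_range_sketch <- function(chr, start, end) {
  stopifnot(is.character(chr), is.numeric(start), is.numeric(end))
  # Coerce to integer so that e.g. 1e5 becomes "100000", not "1e+05"
  start <- as.integer(start)
  end <- as.integer(end)
  # This sketch also rejects ranges whose end precedes their start (assumption)
  stopifnot(all(start <= end))
  # paste0() is vectorised and recycles shorter inputs
  paste0(chr, ":", start, "..", end)
}

genomic_range_sketch(c("1", "X"), c(10000L, 500L), c(20000L, 900L))
# Returns c("1:10000..20000", "X:500..900")
```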
This function retrieves a table of analyses involved in the generation of data for the different Ensembl databases.
get_analyses( species_name, verbose = FALSE, warnings = TRUE, progress_bar = TRUE )
species_name |
The species name, i.e., the scientific name, all letters
lowercase and space replaced by underscore. Examples: |
verbose |
Whether to be verbose about the HTTP requests and the status of their responses. |
warnings |
Whether to show warnings. |
progress_bar |
Whether to show a progress bar. |
A tibble of 3 variables:
species_name
Ensembl species name: this is the name used
internally by Ensembl to uniquely identify a species by name. It is the
scientific name but formatted without capitalisation and spacing converted
with an underscore, e.g., 'homo_sapiens'.
database
Ensembl database. Typically one of 'core', 'rnaseq', 'cdna', 'funcgen' or 'otherfeatures'.
analysis
Analysis.
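The species_name format described above (scientific name in lowercase, spaces replaced by underscores) can be produced from a display-style name with a small base-R helper. to_species_name() is hypothetical and not part of ensemblr; it only normalises case and whitespace and does not check that Ensembl actually hosts the species.

```r
# Hypothetical helper: "Homo sapiens" -> "homo_sapiens"
to_species_name <- function(x) {
  x <- trimws(x)                # drop leading/trailing spaces
  x <- tolower(x)               # Ensembl species names are all lowercase
  gsub("[[:space:]]+", "_", x)  # each run of spaces becomes one underscore
}

to_species_name(c("Homo sapiens", "Drosophila melanogaster"))
# Returns c("homo_sapiens", "drosophila_melanogaster")
```

The result can then be passed as species_name, e.g. get_analyses(to_species_name("Mus musculus")).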
This function retrieves details about the assembly of a queried species.
get_assemblies( species_name = "homo_sapiens", verbose = FALSE, warnings = TRUE, progress_bar = TRUE )
species_name |
The species name, i.e., the scientific name, all letters
lowercase and space replaced by underscore. Examples: |
verbose |
Whether to be chatty. |
warnings |
Whether to print warnings. |
progress_bar |
Whether to show a progress bar. |
A tibble, each row being the assembly of a queried species, of 10 variables:
species_name
Ensembl species name: this is the name used internally
by Ensembl to uniquely identify a species by name. It is the scientific
name but formatted without capitalisation and spacing converted with an
underscore, e.g., 'homo_sapiens'.
assembly_name
Assembly name.
assembly_date
Assembly date.
genebuild_method
Annotation method.
golden_path_length
Golden path length.
genebuild_initial_release_date
Genebuild release date.
default_coord_system_version
Default coordinate system version.
assembly_accession
Assembly accession.
genebuild_start_date
Genebuild start date.
genebuild_last_geneset_update
Genebuild last geneset update.
# Get details about the human assembly
get_assemblies()
# Get details about the Mouse and the Fruit Fly genomes
get_assemblies(c('mus_musculus', 'drosophila_melanogaster'))
This function retrieves cytogenetic bands. If no cytogenetic information is available for a queried species, then that species is omitted from the returned value.
get_cytogenetic_bands( species_name = "homo_sapiens", verbose = FALSE, warnings = TRUE, progress_bar = TRUE )
species_name |
The species name, i.e., the scientific name, all letters
lowercase and space replaced by underscore. Examples: |
verbose |
Whether to be chatty. |
warnings |
Whether to print warnings. |
progress_bar |
Whether to show a progress bar. |
A tibble, each row being a cytogenetic band, of 8 variables:
species_name
Ensembl species name: this is the name used internally
by Ensembl to uniquely identify a species by name. It is the scientific
name but formatted without capitalisation and spacing converted with an
underscore, e.g., 'homo_sapiens'.
assembly_name
Assembly name.
cytogenetic_band
Name of the cytogenetic band.
chromosome
Chromosome name.
start
Genomic start position of the cytogenetic band. Starts at 1.
end
Genomic end position of the cytogenetic band. End position is included in the band interval.
stain
Giemsa stain results: Giemsa negative, 'gneg'; Giemsa positive, of increasing intensities, 'gpos25', 'gpos50', 'gpos75' and 'gpos100'; centromeric region, 'acen'; heterochromatin, either pericentric or telomeric, 'gvar'; and short arm of acrocentric chromosomes, 'stalk'.
strand
Strand.
# Get cytogenetic bands for the human genome (default)
get_cytogenetic_bands()
# Get cytogenetic bands for Mus musculus
get_cytogenetic_bands('mus_musculus')
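Because start is 1-based and end is included in the band interval, a band's length in base pairs is end - start + 1. The sketch below uses two hypothetical helpers, band_length() and stain_description(), to compute that length and to map the documented stain codes to readable labels; it assumes the codes arrive exactly as documented, and any other value maps to NA.

```r
# Hypothetical helper: length of a closed, 1-based interval [start, end]
band_length <- function(start, end) {
  end - start + 1L
}

# Hypothetical lookup built from the Giemsa stain codes documented above
stain_description <- function(stain) {
  labels <- c(
    gneg    = "Giemsa negative",
    gpos25  = "Giemsa positive (intensity 25)",
    gpos50  = "Giemsa positive (intensity 50)",
    gpos75  = "Giemsa positive (intensity 75)",
    gpos100 = "Giemsa positive (intensity 100)",
    acen    = "centromeric region",
    gvar    = "heterochromatin (pericentric or telomeric)",
    stalk   = "short arm of acrocentric chromosome"
  )
  unname(labels[stain])  # codes not in the table give NA
}

band_length(1L, 100L)  # Returns 100
stain_description(c("gneg", "acen"))
```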
Retrieve the data release version(s) available on the Ensembl REST server.
get_data_versions(verbose = FALSE, warnings = TRUE)
verbose |
Whether to be chatty. |
warnings |
Whether to print warnings. |
An integer vector of release version(s).
This function retrieves Ensembl divisions. Ensembl data is split into separate databases, which are loosely based on taxonomic divisions or sub-groups.
get_divisions(verbose = FALSE, warnings = TRUE)
verbose |
Whether to be chatty. |
warnings |
Whether to print warnings. |
A character vector of Ensembl divisions.
# Retrieve a character vector of Ensembl divisions
get_divisions()
Returns the Ensembl Genomes version of the databases backing this service.
get_ensembl_genomes_version(verbose = FALSE, warnings = TRUE)
verbose |
Whether to be chatty. |
warnings |
Whether to print warnings. |
An integer value: the Ensembl Genomes version.
get_ensembl_genomes_version()
This function retrieves eQTLs, along with their genomic positions, beta values and p-values.
get_eqtl_pval_by_gene( ensembl_id, species_name = "homo_sapiens", verbose = FALSE, warnings = TRUE, progress_bar = TRUE )
ensembl_id |
An Ensembl stable identifier, e.g. |
species_name |
The species name, i.e., the scientific name, all letters
lowercase and space replaced by underscore. Currently, only
|
verbose |
Whether to be verbose about the HTTP requests and the status of their responses. |
warnings |
Whether to show warnings. |
progress_bar |
Whether to show a progress bar. |
A tibble of 10 variables:
species_name
Ensembl species name: this is the name used
internally by Ensembl to uniquely identify a species by name. It is the
scientific name but formatted without capitalisation and spacing converted
with an underscore, e.g., 'homo_sapiens'.
ensembl_id
An Ensembl stable identifier, e.g. "ENSG00000248378".
variant_id
Variant identifier, e.g. "rs80100814".
tissue
Tissue.
display_consequence
Variant consequence type.
seq_region_name
Sequence region name (typically a chromosome name) of the variant.
seq_region_start
Genomic start position of the variant.
seq_region_end
Genomic end position of the variant.
beta
The effect size beta of the eQTL association analysis.
pvalue
The p-value of the eQTL association analysis.
get_eqtl_pval_by_gene() makes GET requests to /eqtl/id/:species/:stable_id.
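Endpoint templates such as /eqtl/id/:species/:stable_id mark path parameters with a leading colon. A minimal sketch of filling them in follows; fill_endpoint() is a hypothetical helper, not the code ensemblr uses internally, and the base URL https://rest.ensembl.org (the public Ensembl REST server) is an assumption about where requests go. Values are percent-encoded with utils::URLencode(reserved = TRUE), so a value containing a colon (such as a population name) cannot be mistaken for another placeholder.

```r
# Hypothetical helper: replace each ':name' placeholder in an endpoint template
fill_endpoint <- function(template, ..., base_url = "https://rest.ensembl.org") {
  values <- list(...)
  for (name in names(values)) {
    # Percent-encode the value; reserved = TRUE also encodes ':' and '/'
    value <- utils::URLencode(as.character(values[[name]]), reserved = TRUE)
    # fixed = TRUE matches ':name' literally and replaces its first occurrence
    template <- sub(paste0(":", name), value, template, fixed = TRUE)
  }
  paste0(base_url, template)
}

fill_endpoint("/eqtl/id/:species/:stable_id",
              species = "homo_sapiens", stable_id = "ENSG00000248378")
# Returns "https://rest.ensembl.org/eqtl/id/homo_sapiens/ENSG00000248378"
```

Because matching is by literal prefix, this sketch assumes no placeholder name is a prefix of another one in the same template (e.g. :species and :species_name).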
get_eqtl_pval_by_gene('ENSG00000248378')
This function retrieves genes (Ensembl identifiers) along with beta values and p-values for the eQTL association.
get_eqtl_pval_by_variant( variant_id, species_name = "homo_sapiens", verbose = FALSE, warnings = TRUE, progress_bar = TRUE )
variant_id |
A variant identifier, e.g. |
species_name |
The species name, i.e., the scientific name, all letters
lowercase and space replaced by underscore. Currently, only
|
verbose |
Whether to be verbose about the HTTP requests and the status of their responses. |
warnings |
Whether to show warnings. |
progress_bar |
Whether to show a progress bar. |
A tibble of 6 variables:
species_name
Ensembl species name: this is the name used
internally by Ensembl to uniquely identify a species by name. It is the
scientific name but formatted without capitalisation and spacing converted
with an underscore, e.g., 'homo_sapiens'.
variant_id
Variant identifier, e.g. "rs80100814".
ensembl_id
An Ensembl stable identifier, e.g. "ENSG00000248378".
tissue
Tissue.
beta
The effect size beta of the eQTL association analysis.
pvalue
The p-value of the eQTL association analysis.
get_eqtl_pval_by_variant() makes GET requests to /eqtl/variant_name/:species/:variant_name.
get_eqtl_pval_by_variant('rs80100814')
This function retrieves all tissues in the eQTL database.
get_eqtl_tissues( species_name = "homo_sapiens", verbose = FALSE, warnings = TRUE, progress_bar = TRUE )
species_name |
The species name, i.e., the scientific name, all letters
lowercase and space replaced by underscore. Currently, only human
|
verbose |
Whether to be verbose about the HTTP requests and the status of their responses. |
warnings |
Whether to show warnings. |
progress_bar |
Whether to show a progress bar. |
get_eqtl_tissues() makes GET requests to /eqtl/tissue/:species/.
get_eqtl_tissues()
This function retrieves information about one or more Ensembl identifiers. Ensembl identifiers for which information is available are: genes, exons, transcripts and proteins.
get_id(id, verbose = FALSE, warnings = TRUE, progress_bar = TRUE)
id |
A character vector of Ensembl identifiers. Ensembl identifiers have
the form ENS[species prefix][feature type prefix][a unique eleven digit
number]. |
verbose |
Whether to be verbose about the HTTP requests and the status of their responses. |
warnings |
Whether to show warnings. |
progress_bar |
Whether to show a progress bar. |
A tibble of 9 variables:
id
Ensembl identifier.
id_latest
Ensembl identifier including the version suffix.
type
Entity type: gene ('Gene'), exon ('Exon'), transcript ('Transcript'), and protein ('Translation').
id_version
Ensembl identifier version; it indicates how many times that entity has changed during its time in Ensembl.
release
Ensembl release version.
is_current
Whether this is the latest identifier for the represented entity.
genome_assembly_name
Code name of the genome assembly.
peptide
TODO
possible_replacement
TODO
get_id(c('ENSDARE00000830915', 'ENSG00000248378', 'ENSDART00000033574', 'ENSP00000000233'))
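The identifier layout described under id (ENS, a species prefix, a feature type prefix, then an eleven digit number) can be checked locally before querying. The parser below is a hypothetical sketch and deliberately narrow: it assumes only the four feature prefixes matching the type values above (G for 'Gene', E for 'Exon', T for 'Transcript', P for 'Translation') and accepts an optional version suffix such as .1. Human identifiers have an empty species prefix, as in 'ENSG00000248378'.

```r
# Hypothetical parser for IDs of the form
# ENS[species prefix][feature type prefix][eleven digit number][.version]
parse_ensembl_id <- function(id) {
  pattern <- "^ENS([A-Z]*)([GETP])([0-9]{11})(\\.[0-9]+)?$"
  # Assumed mapping from feature prefix to the 'type' values of get_id()
  types <- c(G = "Gene", E = "Exon", T = "Transcript", P = "Translation")
  m <- regmatches(id, regexec(pattern, id))
  # Extract capture group k, or NA when the ID did not match the pattern
  field <- function(k) {
    vapply(m, function(p) if (length(p) > 0) p[k] else NA_character_, character(1))
  }
  data.frame(
    id = id,
    species_prefix = field(2),
    type = unname(types[field(3)]),
    number = field(4),
    stringsAsFactors = FALSE
  )
}

parse_ensembl_id(c("ENSDARE00000830915", "ENSG00000248378"))
```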
This function retrieves individual-level information. The data is returned as a tibble where each row is an individual of a given species and the columns are metadata about each individual. See below under section Value for details about each column. Use the function get_populations() to discover the available populations for a species.
get_individuals( species_name = "homo_sapiens", population = "1000GENOMES:phase_3:CEU", verbose = FALSE, warnings = TRUE, progress_bar = TRUE )
species_name |
The species name, i.e., the scientific name, all letters
lowercase and space replaced by underscore. Examples: |
population |
Population name. Find the available populations for a given
species with |
verbose |
Whether to be verbose about the HTTP requests and the status of their responses. |
warnings |
Whether to show warnings. |
progress_bar |
Whether to show a progress bar. |
A tibble of 5 variables:
Ensembl species name: this is the name used internally
by Ensembl to uniquely identify a species by name. It is the scientific
name but formatted without capitalisation and spacing converted with an
underscore, e.g., 'homo_sapiens'.
Population.
Description of the population.
Individual identifier.
Individual gender.
get_individuals() makes GET requests to /info/variation/populations/:species/:population_name.
# Get human individuals for population "1000GENOMES:phase_3:CEU" (default)
get_individuals()
# Get Finnish individuals ("1000GENOMES:phase_3:FIN")
get_individuals(population = '1000GENOMES:phase_3:FIN')
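Population names such as "1000GENOMES:phase_3:CEU" and "1000GENOMES:phase_3:FIN" are colon-separated. Assuming that pattern (project, phase, population code) holds for the populations you need, a hypothetical helper can build several names from population codes; the authoritative list remains the one returned by get_populations().

```r
# Hypothetical helper: build population names following the pattern of the
# defaults above; check the real names with get_populations()
population_names <- function(codes, project = "1000GENOMES", phase = "phase_3") {
  paste(project, phase, codes, sep = ":")
}

pops <- population_names(c("CEU", "FIN"))
pops
# Returns c("1000GENOMES:phase_3:CEU", "1000GENOMES:phase_3:FIN")

# Splitting a name back into its colon-separated parts
strsplit(pops[1], ":", fixed = TRUE)[[1]]
# Returns c("1000GENOMES", "phase_3", "CEU")

# With ensemblr loaded and network access, each population could then be queried:
# lapply(pops, function(p) get_individuals(population = p))
```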
This function retrieves the set of chromosomes of a species.
get_karyotypes( species_name = "homo_sapiens", verbose = FALSE, warnings = TRUE, progress_bar = TRUE )
species_name |
The species name, i.e., the scientific name, all letters
lowercase and space replaced by underscore. Examples: |
verbose |
Whether to be chatty. |
warnings |
Whether to print warnings. |
progress_bar |
Whether to show a progress bar. |
A tibble, each row being a chromosome, of 4 variables:
species_name
Ensembl species name: this is the name used internally
by Ensembl to uniquely identify a species by name. It is the scientific
name but formatted without capitalisation and spacing converted with an
underscore, e.g., 'homo_sapiens'.
coord_system
Coordinate system type.
chromosome
Chromosome name.
length
Genomic length of the chromosome in base pairs.
# Get the karyotype of Caenorhabditis elegans
get_karyotypes('caenorhabditis_elegans')
# Get the karyotype of the Giant panda
get_karyotypes('ailuropoda_melanoleuca')
Gets linkage disequilibrium data for variants from the Ensembl REST API. There are four ways to query:
by a genomic window around each variant, get_ld_variants_by_window(variant_id, genomic_window_size, ...);
by pairs of variants, get_ld_variants_by_pair(variant_id1, variant_id2, ...);
by a genomic range, get_ld_variants_by_range(genomic_range, ...);
by all pair combinations of a set of variants, get_ld_variants_by_pair_combn(variant_id, ...).
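get_ld_variants_by_pair_combn() works on all pair combinations of the supplied variants. For n variants there are n(n - 1)/2 unordered pairs; the base-R sketch below enumerates them with utils::combn(). variant_pairs() is a hypothetical helper, and whether ensemblr builds its pairs exactly this way is an assumption.

```r
# Hypothetical helper: every unordered pair of the supplied variants
variant_pairs <- function(variant_id) {
  stopifnot(is.character(variant_id), length(variant_id) >= 2)
  pairs <- utils::combn(variant_id, 2)  # 2 x choose(n, 2) character matrix
  data.frame(variant_id1 = pairs[1, ], variant_id2 = pairs[2, ],
             stringsAsFactors = FALSE)
}

variant_pairs(c("rs6978506", "rs12718102", "rs13307200"))
# 3 pairs: rs6978506/rs12718102, rs6978506/rs13307200, rs12718102/rs13307200
```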
get_ld_variants_by_window( variant_id, genomic_window_size = 500L, species_name = "homo_sapiens", population = "1000GENOMES:phase_3:CEU", d_prime = 0, r_squared = 0.05, verbose = FALSE, warnings = TRUE, progress_bar = TRUE )
get_ld_variants_by_pair( variant_id1, variant_id2, species_name = "homo_sapiens", population = "1000GENOMES:phase_3:CEU", d_prime = 0, r_squared = 0.05, verbose = FALSE, warnings = TRUE, progress_bar = TRUE )
get_ld_variants_by_range( genomic_range, species_name = "homo_sapiens", population = "1000GENOMES:phase_3:CEU", d_prime = 0, r_squared = 0.05, verbose = FALSE, warnings = TRUE, progress_bar = TRUE )
get_ld_variants_by_pair_combn( variant_id, species_name = "homo_sapiens", population = "1000GENOMES:phase_3:CEU", d_prime = 0, r_squared = 0.05, verbose = FALSE, warnings = TRUE, progress_bar = TRUE )
variant_id |
Variant identifiers, e.g., |
genomic_window_size |
An integer vector specifying the genomic window
size in kilobases (kb) around the variant indicated in |
species_name |
The species name, i.e., the scientific name, all letters
lowercase and space replaced by underscore. Examples: |
population |
Population for which to compute linkage disequilibrium. See
|
d_prime |
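Lower cutoff for D': only variant pairs with D' equal to or greater than this value are returned. |
r_squared |
Lower cutoff for r²: only variant pairs with r² equal to or greater than this value are returned. |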
verbose |
Whether to be verbose about the HTTP requests and the status of their responses. |
warnings |
Whether to show warnings. |
progress_bar |
Whether to show a progress bar. |
variant_id1 |
The first variant of a pair of variants. Used with
|
variant_id2 |
The second variant of a pair of variants. Used with
|
genomic_range |
Genomic range formatted as a string
|
A tibble of 6 variables:
species_name
Ensembl species name: this is the name used internally
by Ensembl to uniquely identify a species by name. It is the scientific
name but formatted without capitalisation and spacing converted with an
underscore, e.g., 'homo_sapiens'.
population
Population for which to compute linkage disequilibrium.
variant_id1
First variant identifier.
variant_id2
Second variant identifier.
d_prime
D' (normalised linkage disequilibrium coefficient) between the two variants.
r_squared
r² (squared allelic correlation) between the two variants.
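For reference, the textbook definitions of these two measures for a pair of biallelic variants are sketched below. With allele frequencies p_A and p_B and haplotype frequency p_AB, D = p_AB - p_A p_B; D' divides D by its largest attainable magnitude given the allele frequencies, and r² = D² / (p_A (1 - p_A) p_B (1 - p_B)). ld_measures() is a hypothetical illustration of these formulas, not how Ensembl estimates LD from genotype data, and reporting D' as an absolute value is an assumption about the convention.

```r
# Hypothetical illustration of the textbook LD formulas for two biallelic loci.
# p_A, p_B: frequencies of alleles A and B; p_AB: frequency of haplotype AB.
ld_measures <- function(p_A, p_B, p_AB) {
  # Monomorphic loci would make the denominators below zero
  stopifnot(p_A > 0, p_A < 1, p_B > 0, p_B < 1)
  D <- p_AB - p_A * p_B
  # Largest |D| possible given the allele frequencies (depends on the sign of D)
  D_max <- if (D >= 0) {
    min(p_A * (1 - p_B), (1 - p_A) * p_B)
  } else {
    min(p_A * p_B, (1 - p_A) * (1 - p_B))
  }
  d_prime <- if (D == 0) 0 else abs(D) / D_max
  r_squared <- D^2 / (p_A * (1 - p_A) * p_B * (1 - p_B))
  c(D = D, d_prime = d_prime, r_squared = r_squared)
}

ld_measures(p_A = 0.5, p_B = 0.5, p_AB = 0.5)  # complete LD: d_prime 1, r_squared 1
```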
# Retrieve variants in LD by a window size of 1kb:
# 1kb: 500 bp upstream and 500 bp downstream of variant.
get_ld_variants_by_window('rs123', genomic_window_size = 1L)
# Retrieve LD measures for pairs of variants:
get_ld_variants_by_pair(
  variant_id1 = c('rs123', 'rs35439278'),
  variant_id2 = c('rs122', 'rs35174522')
)
# Retrieve variants in LD within a genomic range
get_ld_variants_by_range('7:100000..100500')
# Retrieve all pair combinations of variants in LD
get_ld_variants_by_pair_combn(c('rs6978506', 'rs12718102', 'rs13307200'))
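The example comment above reads a 1 kb window as 500 bp upstream plus 500 bp downstream of the variant. Under that reading, the interval covered by a given genomic_window_size can be sketched as follows; ld_window_bounds() is hypothetical, takes the variant position as input (ensemblr itself works from variant identifiers), and clipping the start at position 1 is this sketch's own choice.

```r
# Hypothetical helper: interval spanned by a window of `genomic_window_size` kb
# centred on a variant at `position` (half the window on each side)
ld_window_bounds <- function(position, genomic_window_size) {
  half_bp <- genomic_window_size * 1000 / 2
  c(start = max(1, position - half_bp), end = position + half_bp)
}

ld_window_bounds(position = 100000, genomic_window_size = 1)
# start = 99500, end = 100500
```

With the default genomic_window_size = 500L, the same reading gives 250 kb on each side of the variant.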
This function retrieves population-level information. The data is returned as a tibble where each row is a population of a given species and the columns are metadata about each population. See below under section Value for details about each column. Use the parameter ld_only to restrict the populations returned to only those with linkage disequilibrium information.
get_populations( species_name = "homo_sapiens", ld_only = TRUE, verbose = FALSE, warnings = TRUE, progress_bar = TRUE )
species_name |
The species name, i.e., the scientific name, all letters
lowercase and space replaced by underscore. Examples: |
ld_only |
Whether to restrict populations returned to only populations with linkage disequilibrium data. |
verbose |
Whether to be verbose about the HTTP requests and the status of their responses. |
warnings |
Whether to show warnings. |
progress_bar |
Whether to show a progress bar. |
A tibble of 4 variables:
Ensembl species name: this is the name used internally
by Ensembl to uniquely identify a species by name. It is the scientific
name but formatted without capitalisation and spacing converted with an
underscore, e.g., 'homo_sapiens'.
Population.
Description of the population.
Cohort sample size.
get_populations()
makes GET requests to
/info/variation/populations/:species.
# Get all human populations with linkage disequilibrium data
get_populations(species_name = 'homo_sapiens', ld_only = TRUE)
# Get all human populations
get_populations(species_name = 'homo_sapiens', ld_only = FALSE)
Retrieve the current version of the Ensembl REST API.
get_rest_version(verbose = FALSE, warnings = TRUE)
verbose |
Whether to be chatty. |
warnings |
Whether to print warnings. |
A scalar character vector with the Ensembl REST API version.
Retrieve the Perl API version.
get_software_version(verbose = FALSE, warnings = TRUE)
verbose |
Whether to be chatty. |
warnings |
Whether to print warnings. |
A scalar integer vector with the Perl API version.
This function retrieves species-level information. The data is returned as a tibble where each row is a species and the columns are metadata about each species. See below under section Value for details about each column.
get_species( division = get_divisions(), verbose = FALSE, warnings = TRUE, progress_bar = TRUE )
division |
Ensembl division, e.g., |
verbose |
Whether to be verbose about the HTTP requests and the status of their responses. |
warnings |
Whether to show warnings. |
progress_bar |
Whether to show a progress bar. |
A tibble of 12 variables:
Ensembl division: "EnsemblVertebrates", "EnsemblMetazoa", "EnsemblPlants", "EnsemblProtists", "EnsemblFungi" or "EnsemblBacteria".
NCBI taxon identifier.
Ensembl species name: this is the name used internally
by Ensembl to uniquely identify a species by name. It is the scientific
name but formatted without capitalisation and spacing converted with an
underscore, e.g., 'homo_sapiens'.
Species display name: the name used for display on the Ensembl website.
Species common name.
Ensembl release version.
Code name of the genome assembly.
Genbank assembly accession identifier.
Species strain.
Species strain collection.
Other names or acronyms used to refer to the species. Note that this column is of the list type.
Ensembl databases for which data exists for this species. Note that this column is of the list type.
This function retrieves a few extra details about a toplevel sequence. These sequences correspond to genomic regions in the genome assembly that are not a component of another sequence region. Thus, toplevel sequences will be chromosomes and any unlocalised or unplaced scaffolds.
get_toplevel_sequence_info( species_name = "homo_sapiens", toplevel_sequence = c(1:22, "X", "Y", "MT"), verbose = FALSE, warnings = TRUE, progress_bar = TRUE )
species_name |
The species name, i.e., the scientific name, all letters
lowercase and space replaced by underscore. Examples: |
toplevel_sequence |
A toplevel sequence name, e.g. chromosome names such
as |
verbose |
Whether to be chatty. |
warnings |
Whether to print warnings. |
progress_bar |
Whether to show a progress bar. |
A tibble, each row being a toplevel sequence, of 8 variables:
species_name
Ensembl species name: this is the name used internally
by Ensembl to uniquely identify a species by name. It is the scientific name
but formatted without capitalisation and spacing converted with an
underscore, e.g., 'homo_sapiens'.
toplevel_sequence
Name of the toplevel sequence.
is_chromosome
A logical indicating whether the toplevel sequence is a chromosome (TRUE) or not (FALSE).
coord_system
Coordinate system type.
assembly_exception_type
Assembly exception type.
is_circular
A logical indicating whether the toplevel sequence is a circular sequence (TRUE) or not (FALSE).
assembly_name
Assembly name.
length
Genomic length of the toplevel sequence in base pairs.
# Get details about human chromosomes (default)
get_toplevel_sequence_info()
# Get details about a scaffold
# (To find available toplevel sequences to query use the function
# `get_toplevel_sequences()`)
get_toplevel_sequence_info(species_name = 'homo_sapiens', toplevel_sequence = 'KI270757.1')
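The default value of toplevel_sequence, c(1:22, "X", "Y", "MT"), relies on c() coercing the integers 1:22 to character, so the default covers the 22 human autosomes plus X, Y and the mitochondrial sequence as 25 character names. The same idiom can build a smaller selection; the object names in this sketch are illustrative only.

```r
# c() coerces the integers 1:22 to character because "X", "Y", "MT" are strings
default_toplevel <- c(1:22, "X", "Y", "MT")
length(default_toplevel)  # 25
class(default_toplevel)   # "character"

# Illustrative subset: the sex chromosomes and the mitochondrial sequence only
subset_toplevel <- setdiff(default_toplevel, as.character(1:22))
subset_toplevel           # "X" "Y" "MT"

# With ensemblr loaded and network access:
# get_toplevel_sequence_info(toplevel_sequence = subset_toplevel)
```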
This function retrieves toplevel sequences. These sequences correspond to genomic regions in the genome assembly that are not a component of another sequence region. Thus, toplevel sequences will be chromosomes and any unlocalised or unplaced scaffolds.
get_toplevel_sequences( species_name = "homo_sapiens", verbose = FALSE, warnings = TRUE, progress_bar = TRUE )
species_name |
The species name, i.e., the scientific name, all letters
lowercase and space replaced by underscore. Examples: |
verbose |
Whether to be chatty. |
warnings |
Whether to print warnings. |
progress_bar |
Whether to show a progress bar. |
A tibble, each row being a toplevel sequence, of 4 variables:
species_name
Ensembl species name: this is the name used internally
by Ensembl to uniquely identify a species by name. It is the scientific
name but formatted without capitalisation and spacing converted with an
underscore, e.g., 'homo_sapiens'.
coord_system
Coordinate system type.
toplevel_sequence
Name of the toplevel sequence.
length
Genomic length of the toplevel sequence in base pairs.
# Get toplevel sequences for the human genome (default)
get_toplevel_sequences()
# Get toplevel sequences for Caenorhabditis elegans
get_toplevel_sequences('caenorhabditis_elegans')
This function retrieves variant consequence types. For more details, see Ensembl Variation - Calculated variant consequences.
get_variant_consequences(verbose = FALSE, warnings = TRUE)
verbose |
Whether to be chatty about the underlying requests. |
warnings |
Whether to print warnings. |
A rule-based approach is used to predict the effects that each allele of a variant may have on each transcript. These effects are variant consequences, which are catalogued as consequence terms defined by the Sequence Ontology.
See below a diagram showing the location of each display term relative to the transcript structure:
A tibble, each row being a variant consequence, of four variables:
Sequence Ontology accession, e.g., 'SO:0001626'.
Sequence Ontology term, e.g., 'incomplete_terminal_codon_variant'.
Display term.
Sequence Ontology description.
get_variant_consequences() makes GET requests to /info/variation/consequence_types.
# Retrieve variant consequence types
get_variant_consequences()
This function retrieves variant sources, i.e. a list of databases used by Ensembl from which variant information is retrieved.
get_variation_sources( species_name = "human", verbose = FALSE, warnings = TRUE, progress_bar = TRUE )
species_name |
The species name, i.e., the scientific name, all letters
lowercase and space replaced by underscore. Examples: |
verbose |
Whether to be chatty. |
warnings |
Whether to print warnings. |
progress_bar |
Whether to show a progress bar. |
A tibble, each row being a variant database, of 8 variables:
Ensembl species name: this is the name used internally
by Ensembl to uniquely identify a species by name. It is the scientific
name but formatted without capitalisation and spacing converted with an
underscore, e.g., 'homo_sapiens'.
Database name.
Database type, e.g., chip (genotyping chip) or lsdb (locus-specific database).
Database version.
Somatic status.
Database description.
Database's URL.
Data types to be found at database.
get_variation_sources() makes GET requests to /info/variation/:species.
# Retrieve variant sources for human (default)
get_variation_sources()
# Retrieve variant sources for mouse
get_variation_sources(species_name = 'mus_musculus')
This function gets the versions of the different entities involved in the REST API requests. When accessing the Ensembl REST API, you are actually accessing three interconnected entities:
Ensembl databases (data).
Perl API (software).
REST API (rest).
get_versioning(verbose = FALSE, warnings = TRUE)
verbose |
Whether to be chatty. |
warnings |
Whether to print warnings. |
A named list of three elements: data, software and rest.
# Get the versions of the different entities involved in the REST API
# requests.
get_versioning()
This function retrieves cross-references to external databases by Ensembl identifier. The data is returned as a tibble where each row is a cross reference related to the provided Ensembl identifier. See below under section Value for details about each column.
get_xrefs_by_ensembl_id( species_name, ensembl_id, all_levels = FALSE, ensembl_db = "core", external_db = "", feature = "", verbose = FALSE, warnings = TRUE, progress_bar = TRUE )
species_name |
The species name, i.e., the scientific name, all letters
lowercase and space replaced by underscore. Examples: |
ensembl_id |
An Ensembl stable identifier, e.g. |
all_levels |
A |
ensembl_db |
Restrict the search to an Ensembl database: typically one
of |
external_db |
External database to be filtered by. By default no filtering is applied. |
feature |
Restrict search to a feature type: gene ( |
verbose |
Whether to be verbose about the HTTP requests and the status of their responses. |
warnings |
Whether to show warnings. |
progress_bar |
Whether to show a progress bar. |
A tibble of 12 variables:
species_name
Ensembl species name: this is the name used
internally by Ensembl to uniquely identify a species by name. It is the
scientific name but formatted without capitalisation and spacing converted
with an underscore, e.g., 'homo_sapiens'.
ensembl_id
An Ensembl stable identifier, e.g. "ENSG00000248378".
ensembl_db
Ensembl database.
primary_id
Primary identification in external database.
display_id
Display identification in external database.
external_db_name
External database name.
external_db_display_name
External database display name.
version
TODO
info_type
There are two types of external cross references (XRef): direct ('DIRECT') or dependent ('DEPENDENT'). A direct cross reference is one that can be directly linked to a gene, transcript or translation object in Ensembl Genomes by synonymy or sequence similarity. A dependent cross reference is one that is transitively linked to the object via the direct cross reference. The value can also be 'UNMAPPED' for unmapped cross references, or 'PROJECTION' for TODO.
info_text
TODO
synonyms
Other names or acronyms used to refer to the external database entry. Note that this column is of the list type.
description
Brief description of the external database entry.
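Because synonyms is a list-column, each cell can hold zero, one or several names. A minimal base-R sketch of expanding it to one row per synonym is shown below; the data frame is a made-up stand-in for the returned tibble, and only the display_id and synonyms columns are assumed.

```r
# Toy stand-in for the tibble returned by get_xrefs_by_ensembl_id();
# the values are made up for illustration only.
xrefs <- data.frame(display_id = c("A1", "B1"), stringsAsFactors = FALSE)
xrefs$synonyms <- list(c("alpha", "a-1"), character(0))

# Expand the list-column: one row per (display_id, synonym) pair.
# Entries with no synonyms contribute no rows.
n <- lengths(xrefs$synonyms)
long <- data.frame(
  display_id = rep(xrefs$display_id, times = n),
  synonym = unlist(xrefs$synonyms, use.names = FALSE),
  stringsAsFactors = FALSE
)
long
```

The sketch uses only base R, so it makes no assumption about tidyr being installed.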
get_xrefs_by_ensembl_id() makes GET requests to /xrefs/id/:id.
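To make the endpoint concrete, the hypothetical helper `xrefs_id_url()` below assembles, offline, a request URL of the kind this endpoint accepts. It is not part of ensemblr. The query parameter names (`all_levels`, `db_type`, `external_db`, `object_type`) follow the Ensembl REST documentation for /xrefs/id/:id; mapping ensemblr's `ensembl_db` and `feature` arguments onto `db_type` and `object_type` is an assumption.

```r
# Hypothetical helper (not part of ensemblr): build a GET request URL for
# the /xrefs/id/:id endpoint. The argument-to-parameter mapping is assumed.
xrefs_id_url <- function(ensembl_id,
                         all_levels = FALSE,
                         ensembl_db = "core",
                         external_db = "",
                         feature = "",
                         server = "https://rest.ensembl.org") {
  params <- c(
    "content-type" = "application/json",
    all_levels = as.integer(all_levels),  # Ensembl expects 0 or 1
    db_type = ensembl_db,
    external_db = external_db,
    object_type = feature
  )
  # Drop empty filters so they are not sent to the server.
  params <- params[nzchar(params)]
  query <- paste(names(params), params, sep = "=", collapse = "&")
  paste0(server, "/xrefs/id/",
         utils::URLencode(ensembl_id, reserved = TRUE), "?", query)
}

xrefs_id_url("ENSG00000248378", all_levels = TRUE)
# "https://rest.ensembl.org/xrefs/id/ENSG00000248378?content-type=application/json&all_levels=1&db_type=core"
```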
get_xrefs_by_ensembl_id('human', 'ENSG00000248378')
get_xrefs_by_ensembl_id('human', 'ENSG00000248378', all_levels = TRUE)
This function retrieves cross-references by the symbol or display name of a
gene. The data is returned as a tibble where each row is a cross-reference
related to the provided symbol or display name of a gene. See the Value
section below for details about each column.
get_xrefs_by_gene( species_name, gene, ensembl_db = "core", external_db = "", verbose = FALSE, warnings = TRUE, progress_bar = TRUE )
species_name |
The species name, i.e., the scientific name, all letters
lowercase and space replaced by underscore. Examples: |
gene |
Symbol or display name of a gene, e.g., |
ensembl_db |
Restrict the search to a database other than the default.
Ensembl's default database is |
external_db |
Filter by external database, e.g. |
verbose |
Whether to be verbose about the HTTP requests and the status of their responses. |
warnings |
Whether to show warnings. |
progress_bar |
Whether to show a progress bar. |
A tibble of 12 variables:
species_name
Ensembl species name: this is the name used internally by Ensembl to
uniquely identify a species by name. It is the scientific name formatted in
lowercase, with spaces replaced by underscores, e.g., 'homo_sapiens'.
gene
Gene symbol.
ensembl_db
Ensembl database.
primary_id
Primary identifier in the external database.
display_id
Display identifier in the external database.
external_db_name
External database name.
external_db_display_name
External database display name.
version
TODO
info_type
The two main types of external cross-references (xrefs) are direct
('DIRECT') and dependent ('DEPENDENT'). A direct cross-reference is one that
can be linked directly to a gene, transcript or translation object in Ensembl
Genomes by synonymy or sequence similarity. A dependent cross-reference is
one that is linked to the object transitively, via a direct cross-reference.
The value can also be 'UNMAPPED' for unmapped cross-references, or
'PROJECTION' for TODO.
info_text
TODO
synonyms
Other names or acronyms used to refer to the gene. Note that this column is of the list type.
description
Brief description of the external database entry.
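The info_type column is a convenient handle for narrowing results down. A minimal base-R sketch of tabulating the types and keeping only direct cross-references follows; the data frame is a made-up stand-in for the returned tibble, with only the primary_id and info_type columns assumed.

```r
# Toy stand-in for the tibble returned by get_xrefs_by_gene(); the rows are
# made up for illustration only.
xrefs <- data.frame(
  primary_id = c("x1", "x2", "x3", "x4"),
  info_type = c("DIRECT", "DEPENDENT", "DIRECT", "UNMAPPED"),
  stringsAsFactors = FALSE
)

# How many cross-references of each type?
table(xrefs$info_type)

# Keep only the direct cross-references.
direct <- xrefs[xrefs$info_type == "DIRECT", , drop = FALSE]
direct$primary_id
# [1] "x1" "x3"
```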
get_xrefs_by_gene() makes GET requests to /xrefs/name/:species/:name.
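Here both the species and the gene name are path segments rather than query parameters. The hypothetical helper `xrefs_name_url()` below, which is not part of ensemblr, sketches offline how such a URL could be assembled. The `db_type` and `external_db` parameter names follow the Ensembl REST documentation for this endpoint; mapping ensemblr's `ensembl_db` argument onto `db_type` is an assumption.

```r
# Hypothetical helper (not part of ensemblr): build a GET request URL for
# the /xrefs/name/:species/:name endpoint.
xrefs_name_url <- function(species_name, gene,
                           ensembl_db = "core",
                           external_db = "",
                           server = "https://rest.ensembl.org") {
  params <- c("content-type" = "application/json",
              db_type = ensembl_db,
              external_db = external_db)
  # Drop empty filters so they are not sent to the server.
  params <- params[nzchar(params)]
  query <- paste(names(params), params, sep = "=", collapse = "&")
  # Species and gene name are encoded as path segments.
  paste0(server, "/xrefs/name/",
         utils::URLencode(species_name, reserved = TRUE), "/",
         utils::URLencode(gene, reserved = TRUE), "?", query)
}

xrefs_name_url("human", "BRCA2")
# "https://rest.ensembl.org/xrefs/name/human/BRCA2?content-type=application/json&db_type=core"
```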
# Get cross-references that relate to gene BRCA2
get_xrefs_by_gene(species_name = 'human', gene = 'BRCA2')
Check whether the Ensembl server hosting the REST API service is reachable.
This function attempts to connect to https://rest.ensembl.org, returning
TRUE on success and FALSE otherwise. Set verbose = TRUE for a step-by-step
description of the connection attempt.
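As a rough illustration of the idea, the hypothetical function `can_connect()` below checks only whether a TCP connection to a host and port can be opened, using base R's `socketConnection()`. It is not how `is_ensembl_reachable()` is implemented, and a successful connection does not guarantee that the REST service itself is answering requests.

```r
# Hypothetical helper, not ensemblr's implementation: report whether a TCP
# connection to host:port can be opened within `timeout` seconds.
can_connect <- function(host = "rest.ensembl.org", port = 443L, timeout = 5) {
  con <- tryCatch(
    suppressWarnings(
      socketConnection(host = host, port = port, blocking = TRUE,
                       open = "r+", timeout = timeout)
    ),
    error = function(e) NULL
  )
  if (is.null(con)) {
    return(FALSE)  # DNS failure, refused connection, or timeout
  }
  close(con)
  TRUE
}

can_connect()  # TRUE or FALSE, depending on the network
```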
is_ensembl_reachable(url = ensembl_server(), port = 443L, verbose = FALSE)
url |
Ensembl REST API server URL. Default is https://rest.ensembl.org. You should not need to change this parameter. |
port |
Network port on which to ping the server. You should not need to change this parameter. |
verbose |
Whether to be verbose ( |
A logical value: TRUE if the Ensembl server is reachable, FALSE otherwise.
# Check if the Ensembl server is reachable
is_ensembl_reachable() # Returns TRUE or FALSE.

# Check if the Ensembl server is reachable
# and show at which step it fails (if it does)
is_ensembl_reachable(verbose = TRUE)
A dataset containing the Ensembl REST API endpoints, as listed in https://rest.ensembl.org/.
rest_api_endpoints
A data frame with 118 rows and 4 variables:
Section.
Ensembl REST API endpoint.
A short description of the resource.
Timestamp of the last time this dataset was downloaded from Ensembl.