Package 'ensemblr'

Title: R Client for the Ensembl REST API
Description: R Client for the Ensembl REST API.
Authors: Ramiro Magno [aut, cre], Isabel Duarte [aut], Ana-Teresa Maia [aut], CINTESIS [cph, fnd]
Maintainer: Ramiro Magno <[email protected]>
License: MIT + file LICENSE
Version: 0.0.1
Built: 2024-11-22 03:05:41 UTC
Source: https://github.com/maialab/ensemblr


Create genomic range strings

Description

This function converts three vectors (chr, start and end) into strings of the form {chr}:{start}..{end}.

Usage

genomic_range(chr, start, end, starting_position_index = 1L)

Arguments

chr

A character vector of chromosome names.

start

An integer vector of start positions.

end

An integer vector of end positions.

starting_position_index

Use this argument to indicate whether the positions are 0-based (0L) or 1-based (1L). This value is used to check that all positions are equal to or above this number.

Value

Returns a character vector whose strings are genomic ranges of the form {chr}:{start}..{end}.

Examples

genomic_range("1", 10000L, 20000L) # Returns "1:10000..20000"
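The string format described above can be sketched in base R. This is only an illustration under stated assumptions: `make_range()` is a hypothetical helper, not the package's implementation, and the only validation it performs is the documented check against `starting_position_index`.

```r
# Hypothetical re-implementation sketch; not ensemblr's own code.
make_range <- function(chr, start, end, starting_position_index = 1L) {
  # Positions must be equal to or above the starting position index
  stopifnot(
    is.character(chr),
    all(start >= starting_position_index),
    all(end >= starting_position_index)
  )
  # sprintf() is vectorised, so one string is built per element
  sprintf("%s:%d..%d", chr, as.integer(start), as.integer(end))
}

ranges <- make_range(c("1", "X"), c(10000L, 1L), c(20000L, 500L))
print(ranges)
```

A position of 0 is rejected with the default 1-based index and accepted when `starting_position_index = 0L` is passed.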

Get analyses behind Ensembl databases

Description

This function retrieves a table of analyses involved in the generation of data for the different Ensembl databases.

Usage

get_analyses(
  species_name,
  verbose = FALSE,
  warnings = TRUE,
  progress_bar = TRUE
)

Arguments

species_name

The species name, i.e., the scientific name, all letters lowercase and space replaced by underscore. Examples: 'homo_sapiens' (human), 'ovis_aries' (Domestic sheep) or 'capra_hircus' (Goat).

verbose

Whether to be verbose about the http requests and respective responses' status.

warnings

Whether to show warnings.

progress_bar

Whether to show a progress bar.

Value

A tibble of 3 variables:

species_name

Ensembl species name: this is the name used internally by Ensembl to uniquely identify a species by name. It is the scientific name in lowercase, with spaces replaced by underscores, e.g., 'homo_sapiens'.

database

Ensembl database. Typically one of 'core', 'rnaseq', 'cdna', 'funcgen' and 'otherfeatures'.

analysis

Analysis.
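The species_name convention described above (scientific name, lowercase, spaces replaced by underscores) can be produced from a conventional binomial name with a few base R calls. The helper name `to_species_name()` is an assumption made for illustration; it is not exported by the package.

```r
# Hypothetical helper: turn "Homo sapiens" into "homo_sapiens".
to_species_name <- function(scientific_name) {
  cleaned <- tolower(trimws(scientific_name))
  # Collapse any run of whitespace into a single underscore
  gsub("[[:space:]]+", "_", cleaned)
}

species <- to_species_name(c("Homo sapiens", "Ovis aries", " Capra hircus "))
print(species)
```

The resulting strings could then be passed as the species_name argument, for example to get_analyses().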


Get details about the genome assembly of a species

Description

This function retrieves details about the assembly of a queried species.

Usage

get_assemblies(
  species_name = "homo_sapiens",
  verbose = FALSE,
  warnings = TRUE,
  progress_bar = TRUE
)

Arguments

species_name

The species name, i.e., the scientific name, all letters lowercase and space replaced by underscore. Examples: 'homo_sapiens' (human), 'ovis_aries' (Domestic sheep) or 'capra_hircus' (Goat).

verbose

Whether to be chatty.

warnings

Whether to print warnings.

progress_bar

Whether to show a progress bar.

Value

A tibble, each row being the genome assembly of a species, of 10 variables:

species_name

Ensembl species name: this is the name used internally by Ensembl to uniquely identify a species by name. It is the scientific name in lowercase, with spaces replaced by underscores, e.g., 'homo_sapiens'.

assembly_name

Assembly name.

assembly_date

Assembly date.

genebuild_method

Annotation method.

golden_path_length

Golden path length.

genebuild_initial_release_date

Genebuild release date.

default_coord_system_version

Default coordinate system version.

assembly_accession

Assembly accession.

genebuild_start_date

Genebuild start date.

genebuild_last_geneset_update

Genebuild last geneset update.

Examples

# Get details about the human assembly
get_assemblies()

# Get details about the Mouse and the Fruit Fly genomes
get_assemblies(c('mus_musculus', 'drosophila_melanogaster'))

Get cytogenetic bands by species

Description

This function retrieves cytogenetic bands. If no cytogenetic information is available for the queried species then it will be omitted from the returned value.

Usage

get_cytogenetic_bands(
  species_name = "homo_sapiens",
  verbose = FALSE,
  warnings = TRUE,
  progress_bar = TRUE
)

Arguments

species_name

The species name, i.e., the scientific name, all letters lowercase and space replaced by underscore. Examples: 'homo_sapiens' (human), 'ovis_aries' (Domestic sheep) or 'capra_hircus' (Goat).

verbose

Whether to be chatty.

warnings

Whether to print warnings.

progress_bar

Whether to show a progress bar.

Value

A tibble, each row being a cytogenetic band, of 8 variables:

species_name

Ensembl species name: this is the name used internally by Ensembl to uniquely identify a species by name. It is the scientific name in lowercase, with spaces replaced by underscores, e.g., 'homo_sapiens'.

assembly_name

Assembly name.

cytogenetic_band

Name of the cytogenetic band.

chromosome

Chromosome name.

start

Genomic start position of the cytogenetic band. Starts at 1.

end

Genomic end position of the cytogenetic band. End position is included in the band interval.

stain

Giemsa stain results: Giemsa negative, 'gneg'; Giemsa positive, of increasing intensities, 'gpos25', 'gpos50', 'gpos75', and 'gpos100'; centromeric region, 'acen'; heterochromatin, either pericentric or telomeric, 'gvar'; and short arm of acrocentric chromosomes are coded as 'stalk'.

strand

Strand.

Examples

# Get cytogenetic bands for the human genome (default)
get_cytogenetic_bands()

# Get cytogenetic bands for Mus musculus
get_cytogenetic_bands('mus_musculus')
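The stain codes listed under Value can be turned into readable labels with a named character vector. The sketch below is illustrative only: the label wording paraphrases the descriptions above, and `describe_stain()` is a hypothetical helper rather than part of the package.

```r
# Lookup table built from the stain codes documented above.
stain_labels <- c(
  gneg    = "Giemsa negative",
  gpos25  = "Giemsa positive, intensity 25",
  gpos50  = "Giemsa positive, intensity 50",
  gpos75  = "Giemsa positive, intensity 75",
  gpos100 = "Giemsa positive, intensity 100",
  acen    = "centromeric region",
  gvar    = "heterochromatin (pericentric or telomeric)",
  stalk   = "short arm of acrocentric chromosomes"
)

# Hypothetical helper: unknown codes map to NA.
describe_stain <- function(stain) unname(stain_labels[stain])

labels <- describe_stain(c("gneg", "acen"))
print(labels)
```

Applied to the stain column of the returned tibble, this would give one label per band.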

Retrieve the data release version(s) available on the Ensembl REST server.

Description

Retrieve the data release version(s) available on the Ensembl REST server.

Usage

get_data_versions(verbose = FALSE, warnings = TRUE)

Arguments

verbose

Whether to be chatty.

warnings

Whether to print warnings.

Value

An integer vector of release version(s).


Retrieve Ensembl divisions

Description

This function retrieves Ensembl divisions. Ensembl data is split up in separate databases which are loosely based on taxonomic divisions or sub-groups.

Usage

get_divisions(verbose = FALSE, warnings = TRUE)

Arguments

verbose

Whether to be chatty.

warnings

Whether to print warnings.

Value

A character vector of Ensembl divisions.

Examples

# Retrieve a character vector of Ensembl divisions
get_divisions()

Get Ensembl Genomes version

Description

Returns the Ensembl Genomes version of the databases backing this service.

Usage

get_ensembl_genomes_version(verbose = FALSE, warnings = TRUE)

Arguments

verbose

Whether to be chatty.

warnings

Whether to print warnings.

Value

An integer value: the Ensembl Genomes version.

Examples

get_ensembl_genomes_version()

Get eQTL details by gene ensembl identifier

Description

This function retrieves eQTLs, along with their genomic positions, beta values and p-values.

Usage

get_eqtl_pval_by_gene(
  ensembl_id,
  species_name = "homo_sapiens",
  verbose = FALSE,
  warnings = TRUE,
  progress_bar = TRUE
)

Arguments

ensembl_id

An Ensembl stable identifier, e.g. "ENSG00000248378" (or a vector thereof).

species_name

The species name, i.e., the scientific name, all letters lowercase and space replaced by underscore. Currently, only 'homo_sapiens' (human) is supported.

verbose

Whether to be verbose about the http requests and respective responses' status.

warnings

Whether to show warnings.

progress_bar

Whether to show a progress bar.

Value

A tibble of 10 variables:

species_name

Ensembl species name: this is the name used internally by Ensembl to uniquely identify a species by name. It is the scientific name in lowercase, with spaces replaced by underscores, e.g., 'homo_sapiens'.

ensembl_id

An Ensembl stable identifier, e.g. "ENSG00000248378".

variant_id

Variant identifier, e.g. "rs80100814".

tissue

Tissue.

display_consequence

Variant consequence type.

seq_region_name

Sequence region name (typically a chromosome name) of the variant.

seq_region_start

Genomic start position of the variant.

seq_region_end

Genomic end position of the variant.

beta

The effect size beta of the eQTL association analysis.

pvalue

The p-value of the eQTL association analysis.

Ensembl REST API endpoints

get_eqtl_pval_by_gene() makes GET requests to /eqtl/id/:species/:stable_id.

Examples

get_eqtl_pval_by_gene('ENSG00000248378')
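The endpoint templates quoted in the "Ensembl REST API endpoints" sections use `:name` placeholders. As a rough sketch of how such a template maps to a concrete request path, the hypothetical helper `fill_endpoint()` below substitutes each placeholder; it is an assumption for illustration and does not reflect how the package builds its requests internally. The base URL https://rest.ensembl.org is the public Ensembl REST server.

```r
# Hypothetical helper: replace ":placeholder" tokens with actual values.
fill_endpoint <- function(template, ...) {
  params <- list(...)
  for (name in names(params)) {
    # fixed = TRUE treats the placeholder as a literal string
    template <- sub(paste0(":", name), params[[name]], template, fixed = TRUE)
  }
  template
}

path <- fill_endpoint(
  "/eqtl/id/:species/:stable_id",
  species = "homo_sapiens",
  stable_id = "ENSG00000248378"
)
url <- paste0("https://rest.ensembl.org", path)
print(url)
```

No request is sent here; the sketch only shows the string that a GET request would target.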

Get eQTL details by variant identifier

Description

This function retrieves genes (Ensembl identifier) along with beta values and p-values for the eQTL association.

Usage

get_eqtl_pval_by_variant(
  variant_id,
  species_name = "homo_sapiens",
  verbose = FALSE,
  warnings = TRUE,
  progress_bar = TRUE
)

Arguments

variant_id

A variant identifier, e.g. "rs80100814".

species_name

The species name, i.e., the scientific name, all letters lowercase and space replaced by underscore. Currently, only 'homo_sapiens' (human) is supported.

verbose

Whether to be verbose about the http requests and respective responses' status.

warnings

Whether to show warnings.

progress_bar

Whether to show a progress bar.

Value

A tibble of 6 variables:

species_name

Ensembl species name: this is the name used internally by Ensembl to uniquely identify a species by name. It is the scientific name in lowercase, with spaces replaced by underscores, e.g., 'homo_sapiens'.

variant_id

Variant identifier, e.g. "rs80100814".

ensembl_id

An Ensembl stable identifier, e.g. "ENSG00000248378".

tissue

Tissue.

beta

The effect size beta of the eQTL association analysis.

pvalue

The p-value of the eQTL association analysis.

Ensembl REST API endpoints

get_eqtl_pval_by_variant() makes GET requests to /eqtl/variant_name/:species/:variant_name.

Examples

get_eqtl_pval_by_variant('rs80100814')

Get tissues in the eQTL database

Description

This function retrieves all tissues in the eQTL database.

Usage

get_eqtl_tissues(
  species_name = "homo_sapiens",
  verbose = FALSE,
  warnings = TRUE,
  progress_bar = TRUE
)

Arguments

species_name

The species name, i.e., the scientific name, all letters lowercase and space replaced by underscore. Currently, only human 'homo_sapiens' is available.

verbose

Whether to be verbose about the http requests and respective responses' status.

warnings

Whether to show warnings.

progress_bar

Whether to show a progress bar.

Ensembl REST API endpoints

get_eqtl_tissues() makes GET requests to /eqtl/tissue/:species/.

Examples

get_eqtl_tissues()

Get details about an Ensembl identifier

Description

This function retrieves information about one or more Ensembl identifiers. Ensembl identifiers for which information is available are: genes, exons, transcripts and proteins.

Usage

get_id(id, verbose = FALSE, warnings = TRUE, progress_bar = TRUE)

Arguments

id

A character vector of Ensembl identifiers. Ensembl identifiers have the form ENS[species prefix][feature type prefix][a unique eleven digit number]. id should not contain NAs. Please note that while 'ENSG00000157764' is a valid identifier as a query, 'ENSG00000157764.13' is not.

verbose

Whether to be verbose about the http requests and respective responses' status.

warnings

Whether to show warnings.

progress_bar

Whether to show a progress bar.

Value

A tibble of 9 variables:

id

Ensembl identifier.

id_latest

Ensembl identifier including the version suffix.

type

Entity type: gene ('Gene'), exon ('Exon'), transcript ('Transcript'), and protein ('Translation').

id_version

Ensembl identifier version, indicates how many times that entity has changed during its time in Ensembl.

release

Ensembl release version.

is_current

Whether this is the latest identifier for the represented entity.

genome_assembly_name

Code name of the genome assembly.

peptide

TODO

possible_replacement

TODO

Examples

get_id(c('ENSDARE00000830915', 'ENSG00000248378', 'ENSDART00000033574', 'ENSP00000000233'))
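Because versioned identifiers such as 'ENSG00000157764.13' are not valid queries and id should not contain NAs, it can help to clean a vector of identifiers before calling get_id(). The following is a minimal sketch; the regular expression is an assumption based on the identifier form described under Arguments, not a check taken from the package.

```r
raw_ids <- c("ENSG00000157764.13", "ENSG00000248378", NA, "ENSDART00000033574")

# Drop missing values, then strip a trailing ".<version>" suffix
ids <- raw_ids[!is.na(raw_ids)]
clean_ids <- sub("\\.[0-9]+$", "", ids)

# Assumed shape: "ENS", one or more prefix letters, eleven digits
looks_valid <- grepl("^ENS[A-Z]+[0-9]{11}$", clean_ids)

print(clean_ids)
print(looks_valid)
```

The cleaned vector could then be passed as the id argument.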

Get individuals for a population

Description

This function retrieves individual-level information. The data is returned as a tibble where each row is an individual of a given species and the columns are metadata about each individual. See below under section Value for details about each column. Use the function get_populations() to discover the available populations for a species.

Usage

get_individuals(
  species_name = "homo_sapiens",
  population = "1000GENOMES:phase_3:CEU",
  verbose = FALSE,
  warnings = TRUE,
  progress_bar = TRUE
)

Arguments

species_name

The species name, i.e., the scientific name, all letters lowercase and space replaced by underscore. Examples: 'homo_sapiens' (human), 'ovis_aries' (Domestic sheep) or 'capra_hircus' (Goat).

population

Population name. Find the available populations for a given species with get_populations.

verbose

Whether to be verbose about the http requests and respective responses' status.

warnings

Whether to show warnings.

progress_bar

Whether to show a progress bar.

Value

A tibble of 5 variables:

species_name

Ensembl species name: this is the name used internally by Ensembl to uniquely identify a species by name. It is the scientific name in lowercase, with spaces replaced by underscores, e.g., 'homo_sapiens'.

population

Population.

description

Description of the population.

individual

Individual identifier.

gender

Individual gender.

Ensembl REST API endpoints

get_individuals() makes GET requests to /info/variation/populations/:species/:population_name.

Examples

# Get human individuals for population "1000GENOMES:phase_3:CEU" (default)
get_individuals()

# Get Finnish individuals ("1000GENOMES:phase_3:FIN")
get_individuals(population = '1000GENOMES:phase_3:FIN')

Get the karyotype of a species

Description

This function retrieves the set of chromosomes of a species.

Usage

get_karyotypes(
  species_name = "homo_sapiens",
  verbose = FALSE,
  warnings = TRUE,
  progress_bar = TRUE
)

Arguments

species_name

The species name, i.e., the scientific name, all letters lowercase and space replaced by underscore. Examples: 'homo_sapiens' (human), 'ovis_aries' (Domestic sheep) or 'capra_hircus' (Goat).

verbose

Whether to be chatty.

warnings

Whether to print warnings.

progress_bar

Whether to show a progress bar.

Value

A tibble, each row being a chromosome, of 4 variables:

species_name

Ensembl species name: this is the name used internally by Ensembl to uniquely identify a species by name. It is the scientific name in lowercase, with spaces replaced by underscores, e.g., 'homo_sapiens'.

coord_system

Coordinate system type.

chromosome

Chromosome name.

length

Genomic length of the chromosome in base pairs.

Examples

# Get the karyotype of Caenorhabditis elegans
get_karyotypes('caenorhabditis_elegans')

# Get the karyotype of the Giant panda
get_karyotypes('ailuropoda_melanoleuca')

Get linkage disequilibrium data for variants

Description

Gets linkage disequilibrium data for variants from Ensembl REST API. There are four ways to query, either by:

Genomic window centred on variants:

get_ld_variants_by_window(variant_id, genomic_window_size, ...)

Pairs of variants:

get_ld_variants_by_pair(variant_id1, variant_id2, ...)

Genomic range:

get_ld_variants_by_range(genomic_range, ...)

All pair combinations of variants:

get_ld_variants_by_pair_combn(variant_id, ...)

Usage

get_ld_variants_by_window(
  variant_id,
  genomic_window_size = 500L,
  species_name = "homo_sapiens",
  population = "1000GENOMES:phase_3:CEU",
  d_prime = 0,
  r_squared = 0.05,
  verbose = FALSE,
  warnings = TRUE,
  progress_bar = TRUE
)

get_ld_variants_by_pair(
  variant_id1,
  variant_id2,
  species_name = "homo_sapiens",
  population = "1000GENOMES:phase_3:CEU",
  d_prime = 0,
  r_squared = 0.05,
  verbose = FALSE,
  warnings = TRUE,
  progress_bar = TRUE
)

get_ld_variants_by_range(
  genomic_range,
  species_name = "homo_sapiens",
  population = "1000GENOMES:phase_3:CEU",
  d_prime = 0,
  r_squared = 0.05,
  verbose = FALSE,
  warnings = TRUE,
  progress_bar = TRUE
)

get_ld_variants_by_pair_combn(
  variant_id,
  species_name = "homo_sapiens",
  population = "1000GENOMES:phase_3:CEU",
  d_prime = 0,
  r_squared = 0.05,
  verbose = FALSE,
  warnings = TRUE,
  progress_bar = TRUE
)

Arguments

variant_id

Variant identifiers, e.g., 'rs123'. This argument is to be used with either function get_ld_variants_by_window() or get_ld_variants_by_pair_combn(). In the case of get_ld_variants_by_pair_combn() all pairwise combinations of elements of variant_id are used to define pairs of variants for querying. Note that this argument is not the same as variant_id1 or variant_id2, to be used with function get_ld_variants_by_pair.

genomic_window_size

An integer vector specifying the genomic window size in kilobases (kb) around the variant indicated in variant_id. This argument is to be used with function get_ld_variants_by_window(). At the moment, the Ensembl REST API does not allow values greater than 500kb. A window size of 500 means looking 250kb upstream and 250kb downstream of the variant passed as variant_id. The minimum value for this argument is 1L, not 0L.

species_name

The species name, i.e., the scientific name, all letters lowercase and space replaced by underscore. Examples: 'homo_sapiens' (human), 'ovis_aries' (Domestic sheep) or 'capra_hircus' (Goat).

population

Population for which to compute linkage disequilibrium. See get_populations on how to find available populations for a species.

d_prime

D' is a measure of linkage disequilibrium. d_prime defines a cut-off threshold: only variants whose D' ≥ d_prime are returned.

r_squared

r^2 is a measure of linkage disequilibrium. r_squared defines a cut-off threshold: only variants whose r^2 ≥ r_squared are returned. The lower bound for r_squared is 0.05, not 0; the upper bound is 1.

verbose

Whether to be verbose about the http requests and respective responses' status.

warnings

Whether to show warnings.

progress_bar

Whether to show a progress bar.

variant_id1

The first variant of a pair of variants. Used with variant_id2. Note that this argument is not the same as variant_id. This argument is to be used with function get_ld_variants_by_pair().

variant_id2

The second variant of a pair of variants. Used with variant_id1. Note that this argument is not the same as variant_id. This argument is to be used with function get_ld_variants_by_pair().

genomic_range

Genomic range formatted as a string "chr:start..end", e.g., "X:1..10000". Check function genomic_range to easily create these ranges from vectors of start and end positions. This argument is to be used with function get_ld_variants_by_range().
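The bounds stated for genomic_window_size (1 to 500 kb) and r_squared (0.05 to 1) can be checked before querying. The sketch below is a hypothetical pre-flight check written for illustration; `check_ld_args()` and `half_window_bp()` are not functions of the package, and the package may validate its arguments differently.

```r
# Hypothetical check of the documented bounds; returns a vector of problems.
check_ld_args <- function(genomic_window_size = 500L, r_squared = 0.05) {
  problems <- character(0)
  if (any(genomic_window_size < 1L | genomic_window_size > 500L)) {
    problems <- c(problems, "genomic_window_size must be between 1 and 500 (kb)")
  }
  if (any(r_squared < 0.05 | r_squared > 1)) {
    problems <- c(problems, "r_squared must be between 0.05 and 1")
  }
  problems
}

# Distance searched on each side of the variant, in base pairs
half_window_bp <- function(genomic_window_size) {
  (genomic_window_size * 1000L) %/% 2L
}

ok <- check_ld_args(genomic_window_size = 1L, r_squared = 0.8)
bad <- check_ld_args(genomic_window_size = 0L, r_squared = 0.01)
half <- half_window_bp(c(1L, 500L))
print(half)
```

A window of 1 kb corresponds to 500 bp on each side of the variant, and the maximum of 500 kb to 250,000 bp on each side.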

Value

A tibble of 6 variables:

species_name

Ensembl species name: this is the name used internally by Ensembl to uniquely identify a species by name. It is the scientific name in lowercase, with spaces replaced by underscores, e.g., 'homo_sapiens'.

population

Population for which to compute linkage disequilibrium.

variant_id1

First variant identifier.

variant_id2

Second variant identifier.

d_prime

D' between the two variants.

r_squared

r^2 between the two variants.

Examples

# Retrieve variants in LD by a window size of 1kb:
# 1kb: 500 bp upstream and 500 bp downstream of variant.
get_ld_variants_by_window('rs123', genomic_window_size = 1L)

# Retrieve LD measures for pairs of variants:
get_ld_variants_by_pair(
  variant_id1 = c('rs123', 'rs35439278'),
  variant_id2 = c('rs122', 'rs35174522')
)

# Retrieve variants in LD within a genomic range
get_ld_variants_by_range('7:100000..100500')

# Retrieve all pair combinations of variants in LD
get_ld_variants_by_pair_combn(c('rs6978506', 'rs12718102', 'rs13307200'))
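To see which pairs "all pairwise combinations" refers to, the combinations can be enumerated with utils::combn(). This is a sketch of the idea only; the order in which the package issues its queries is an assumption and may differ.

```r
variant_id <- c("rs6978506", "rs12718102", "rs13307200")

# combn() returns one combination per column; transpose to one pair per row
pairs <- t(utils::combn(variant_id, 2))
colnames(pairs) <- c("variant_id1", "variant_id2")

print(pairs)
```

Three variants yield three pairs; in general n variants yield n * (n - 1) / 2 pairs, so the number of queries grows quickly with the length of variant_id.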

Get populations for a species

Description

This function retrieves population-level information. The data is returned as a tibble where each row is a population of a given species and the columns are metadata about each population. See below under section Value for details about each column. Use the parameter ld_only to restrict the populations returned to only those with linkage disequilibrium information.

Usage

get_populations(
  species_name = "homo_sapiens",
  ld_only = TRUE,
  verbose = FALSE,
  warnings = TRUE,
  progress_bar = TRUE
)

Arguments

species_name

The species name, i.e., the scientific name, all letters lowercase and space replaced by underscore. Examples: 'homo_sapiens' (human), 'ovis_aries' (Domestic sheep) or 'capra_hircus' (Goat).

ld_only

Whether to restrict populations returned to only populations with linkage disequilibrium data.

verbose

Whether to be verbose about the http requests and respective responses' status.

warnings

Whether to show warnings.

progress_bar

Whether to show a progress bar.

Value

A tibble of 4 variables:

species_name

Ensembl species name: this is the name used internally by Ensembl to uniquely identify a species by name. It is the scientific name in lowercase, with spaces replaced by underscores, e.g., 'homo_sapiens'.

population

Population.

description

Description of the population.

cohort_size

Cohort sample size.

Ensembl REST API endpoints

get_populations() makes GET requests to /info/variation/populations/:species.

Examples

# Get all human populations with linkage disequilibrium data
get_populations(species_name = 'homo_sapiens', ld_only = TRUE)

# Get all human populations
get_populations(species_name = 'homo_sapiens', ld_only = FALSE)

Retrieve the current version of the Ensembl REST API

Description

Retrieve the current version of the Ensembl REST API.

Usage

get_rest_version(verbose = FALSE, warnings = TRUE)

Arguments

verbose

Whether to be chatty.

warnings

Whether to print warnings.

Value

A scalar character vector with the Ensembl REST API version.


Retrieve the Perl API version

Description

Retrieve the Perl API version.

Usage

get_software_version(verbose = FALSE, warnings = TRUE)

Arguments

verbose

Whether to be chatty.

warnings

Whether to print warnings.

Value

A scalar integer vector with the Perl API version.


Get Ensembl species

Description

This function retrieves species-level information. The data is returned as a tibble where each row is a species and the columns are metadata about each species. See below under section Value for details about each column.

Usage

get_species(
  division = get_divisions(),
  verbose = FALSE,
  warnings = TRUE,
  progress_bar = TRUE
)

Arguments

division

Ensembl division, e.g., "EnsemblVertebrates" or "EnsemblBacteria", or a combination of several divisions. Check function get_divisions to get available Ensembl divisions.

verbose

Whether to be verbose about the http requests and respective responses' status.

warnings

Whether to show warnings.

progress_bar

Whether to show a progress bar.

Value

A tibble of 12 variables:

division

Ensembl division: "EnsemblVertebrates", "EnsemblMetazoa", "EnsemblPlants", "EnsemblProtists", "EnsemblFungi" or "EnsemblBacteria".

taxon_id

NCBI taxon identifier.

species_name

Ensembl species name: this is the name used internally by Ensembl to uniquely identify a species by name. It is the scientific name in lowercase, with spaces replaced by underscores, e.g., 'homo_sapiens'.

species_display_name

Species display name: the name used for display on Ensembl website.

species_common_name

Species common name.

release

Ensembl release version.

genome_assembly_name

Code name of the genome assembly.

genbank_assembly_accession

Genbank assembly accession identifier.

strain

Species strain.

strain_collection

Species strain collection.

species_aliases

Other names or acronyms used to refer to the species. Note that this column is of the list type.

groups

Ensembl databases for which data exists for this species. Note that this column is of the list type.
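Since species_aliases and groups are list-columns, each cell holds a vector rather than a single value. The sketch below shows one way to query such a column in base R. The data frame is a small mock-up with made-up alias values, standing in for the tibble that get_species() would return; only the column names are taken from the documentation above.

```r
# Mock data for illustration only; alias values are invented.
species <- data.frame(
  species_name = c("homo_sapiens", "mus_musculus"),
  stringsAsFactors = FALSE
)
species$species_aliases <- list(c("human", "hsap"), "mouse")

# Each cell is a character vector, so test membership cell by cell
is_human <- vapply(
  species$species_aliases,
  function(aliases) "human" %in% aliases,
  logical(1)
)

matched <- species$species_name[is_human]
n_aliases <- lengths(species$species_aliases)
print(matched)
```

The same vapply() pattern applies to the groups column, for instance to keep species that have a given database.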


Get toplevel sequences details

Description

This function retrieves a few extra details about a toplevel sequence. These sequences correspond to genomic regions in the genome assembly that are not a component of another sequence region. Thus, toplevel sequences will be chromosomes and any unlocalised or unplaced scaffolds.

Usage

get_toplevel_sequence_info(
  species_name = "homo_sapiens",
  toplevel_sequence = c(1:22, "X", "Y", "MT"),
  verbose = FALSE,
  warnings = TRUE,
  progress_bar = TRUE
)

Arguments

species_name

The species name, i.e., the scientific name, all letters lowercase and space replaced by underscore. Examples: 'homo_sapiens' (human), 'ovis_aries' (Domestic sheep) or 'capra_hircus' (Goat).

toplevel_sequence

A toplevel sequence name, e.g. chromosome names such as "1", "X", or "Y", or a non-chromosome sequence, e.g., a scaffold such as "KI270757.1".

verbose

Whether to be chatty.

warnings

Whether to print warnings.

progress_bar

Whether to show a progress bar.

Value

A tibble, each row being a toplevel sequence, of 8 variables:

species_name

Ensembl species name: this is the name used internally by Ensembl to uniquely identify a species by name. It is the scientific name in lowercase, with spaces replaced by underscores, e.g., 'homo_sapiens'.

toplevel_sequence

Name of the toplevel sequence.

is_chromosome

A logical indicating whether the toplevel sequence is a chromosome (TRUE) or not (FALSE).

coord_system

Coordinate system type.

assembly_exception_type

Assembly exception type.

is_circular

A logical indicating whether the toplevel sequence is a circular sequence (TRUE) or not (FALSE).

assembly_name

Assembly name.

length

Genomic length of the toplevel sequence in base pairs.

See Also

get_toplevel_sequences()

Examples

# Get details about human chromosomes (default)
get_toplevel_sequence_info()

# Get details about a scaffold
# (To find available toplevel sequences to query use the function
# `get_toplevel_sequences()`)
get_toplevel_sequence_info(species_name = 'homo_sapiens', toplevel_sequence = 'KI270757.1')
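The default value of toplevel_sequence mixes integers and strings. In R, c() coerces such a mix to a single character vector, so chromosome names are passed as strings. The short sketch below only inspects that default and shows how a scaffold name could be appended; it makes no request.

```r
# The default from the Usage section: integers are coerced to character
toplevel_sequence <- c(1:22, "X", "Y", "MT")

# Append a scaffold name to query it alongside the chromosomes
with_scaffold <- c(toplevel_sequence, "KI270757.1")

print(class(toplevel_sequence))
print(length(with_scaffold))
```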

Get toplevel sequences by species

Description

This function retrieves toplevel sequences. These sequences correspond to genomic regions in the genome assembly that are not a component of another sequence region. Thus, toplevel sequences will be chromosomes and any unlocalised or unplaced scaffolds.

Usage

get_toplevel_sequences(
  species_name = "homo_sapiens",
  verbose = FALSE,
  warnings = TRUE,
  progress_bar = TRUE
)

Arguments

species_name

The species name, i.e., the scientific name, all letters lowercase and space replaced by underscore. Examples: 'homo_sapiens' (human), 'ovis_aries' (Domestic sheep) or 'capra_hircus' (Goat).

verbose

Whether to be chatty.

warnings

Whether to print warnings.

progress_bar

Whether to show a progress bar.

Value

A tibble, each row being a toplevel sequence, of 4 variables:

species_name

Ensembl species name: this is the name used internally by Ensembl to uniquely identify a species by name. It is the scientific name in lowercase, with spaces replaced by underscores, e.g., 'homo_sapiens'.

coord_system

Coordinate system type.

toplevel_sequence

Name of the toplevel sequence.

length

Genomic length of the toplevel sequence in base pairs.

Examples

# Get toplevel sequences for the human genome (default)
get_toplevel_sequences()

# Get toplevel sequences for Caenorhabditis elegans
get_toplevel_sequences('caenorhabditis_elegans')

Retrieve variant consequences

Description

This function retrieves variant consequence types. For more details check Ensembl Variation - Calculated variant consequences.

Usage

get_variant_consequences(verbose = FALSE, warnings = TRUE)

Arguments

verbose

Whether to be chatty about the underlying requests.

warnings

Whether to print warnings.

Details

A rule-based approach is used to predict the effects that each allele of a variant may have on each transcript. These effects are variant consequences, which are catalogued as consequence terms defined by the Sequence Ontology.

See below a diagram showing the location of each display term relative to the transcript structure:

Figure: consequences-fs8.png

Value

A tibble, each row being a variant consequence, of four variables:

SO_accession

Sequence Ontology accession, e.g., 'SO:0001626'.

SO_term

Sequence Ontology term, e.g., 'incomplete_terminal_codon_variant'.

label

Display term.

description

Sequence Ontology description.

Ensembl REST API endpoints

get_variant_consequences() makes GET requests to /info/variation/consequence_types.

Examples

# Retrieve variant consequence types
get_variant_consequences()

Retrieve variant sources

Description

This function retrieves variant sources, i.e. a list of databases used by Ensembl from which variant information is retrieved.

Usage

get_variation_sources(
  species_name = "human",
  verbose = FALSE,
  warnings = TRUE,
  progress_bar = TRUE
)

Arguments

species_name

The species name, i.e., the scientific name, all letters lowercase and space replaced by underscore. Examples: 'homo_sapiens' (human), 'ovis_aries' (Domestic sheep) or 'capra_hircus' (Goat).

verbose

Whether to be chatty.

warnings

Whether to print warnings.

progress_bar

Whether to show a progress bar.

Value

A tibble, each row being a variant database, of 8 variables:

species_name

Ensembl species name: this is the name used internally by Ensembl to uniquely identify a species by name. It is the scientific name in lowercase, with spaces replaced by underscores, e.g., 'homo_sapiens'.

db_name

Database name.

type

Database type, e.g., chip (genotyping chip) or lsdb (locus-specific database).

version

Database version.

somatic_status

Somatic status.

description

Database description.

url

Database's URL.

data_types

Data types to be found at database.

Ensembl REST API endpoints

get_variation_sources() makes GET requests to /info/variation/:species.

Examples

# Retrieve variant sources for human (default)
get_variation_sources()

# Retrieve variant sources for mouse
get_variation_sources(species_name = 'mus_musculus')

Retrieve Ensembl REST versions

Description

This function gets the versions of the different entities involved in the REST API requests. When accessing the Ensembl REST API, you are actually accessing three interconnected entities:

  • Ensembl databases (data).

  • Perl API (software).

  • REST API (rest).

Figure: ensembl_api_versioning_wo_fonts.svg

Usage

get_versioning(verbose = FALSE, warnings = TRUE)

Arguments

verbose

Whether to be chatty.

warnings

Whether to print warnings.

Value

A named list of three elements: data, software and rest.

Examples

# Get the versions of the different entities involved in the REST API
# requests.
get_versioning()

Get cross-references by Ensembl ID

Description

This function retrieves cross-references to external databases by Ensembl identifier. The data is returned as a tibble where each row is a cross reference related to the provided Ensembl identifier. See below under section Value for details about each column.

Usage

get_xrefs_by_ensembl_id(
  species_name,
  ensembl_id,
  all_levels = FALSE,
  ensembl_db = "core",
  external_db = "",
  feature = "",
  verbose = FALSE,
  warnings = TRUE,
  progress_bar = TRUE
)

Arguments

species_name

The species name, i.e., the scientific name, all letters lowercase and space replaced by underscore. Examples: 'homo_sapiens' (human), 'ovis_aries' (Domestic sheep) or 'capra_hircus' (Goat).

ensembl_id

An Ensembl stable identifier, e.g. "ENSG00000248378".

all_levels

A logical vector. Set to TRUE to find all genetic features linked to the stable ID and fetch all external references for them. Specifying this on a gene will also return values from its transcripts and translations.

ensembl_db

Restrict the search to an Ensembl database: typically one of 'core', 'rnaseq', 'cdna', 'funcgen' and 'otherfeatures'.

external_db

External database to be filtered by. By default no filtering is applied.

feature

Restrict search to a feature type: gene ('gene'), exon ('exon'), transcript ('transcript'), and protein ('translation').

verbose

Whether to be verbose about the http requests and respective responses' status.

warnings

Whether to show warnings.

progress_bar

Whether to show a progress bar.

Value

A tibble of 12 variables:

species_name

Ensembl species name: this is the name used internally by Ensembl to uniquely identify a species by name. It is the scientific name but formatted without capitalisation and spacing converted with an underscore, e.g., 'homo_sapiens'.

ensembl_id

An Ensembl stable identifier, e.g. "ENSG00000248378".

ensembl_db

Ensembl database.

primary_id

Primary identification in external database.

display_id

Display identification in external database.

external_db_name

External database name.

external_db_display_name

External database display name.

version

TODO

info_type

There are two types of external cross references (XRef): direct ('DIRECT') or dependent ('DEPENDENT'). A direct cross reference is one that can be directly linked to a gene, transcript or translation object in Ensembl Genomes by synonymy or sequence similarity. A dependent cross reference is one that is transitively linked to the object via the direct cross reference. The value can also be 'UNMAPPED' for unmapped cross references, or 'PROJECTION' for TODO.

info_text

TODO

synonyms

Other names or acronyms used to refer to the external database entry. Note that this column is of the list type.

description

Brief description of the external database entry.

Ensembl REST API endpoints

get_xrefs_by_ensembl_id() makes GET requests to /xrefs/id/:id.
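
To make the endpoint template concrete, the following is a minimal sketch of how a request URL for /xrefs/id/:id could be assembled in base R. The helper build_xrefs_id_url() is hypothetical and is not ensemblr's internal code; the all_levels=1 query string is an assumption about how the all_levels argument is passed on to the REST API.

```r
# Illustrative only: a hypothetical helper, not part of ensemblr.
# It substitutes the identifier into the /xrefs/id/:id template and,
# as an assumption, appends all_levels=1 when all_levels is TRUE.
build_xrefs_id_url <- function(ensembl_id,
                               all_levels = FALSE,
                               server = "https://rest.ensembl.org") {
  query <- if (isTRUE(all_levels)) "?all_levels=1" else ""
  paste0(server, "/xrefs/id/", ensembl_id, query)
}

build_xrefs_id_url("ENSG00000248378")
build_xrefs_id_url("ENSG00000248378", all_levels = TRUE)
```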

Examples

get_xrefs_by_ensembl_id('human', 'ENSG00000248378')

get_xrefs_by_ensembl_id('human', 'ENSG00000248378', all_levels = TRUE)
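
Because synonyms is a list column, it usually needs flattening before printing or exporting. The sketch below shows two base R approaches on a small stand-in table; the values ('entry_a', 'alias_1', and so on) are placeholders, not real Ensembl output, and only the column names display_id and synonyms are taken from the Value section above.

```r
# Stand-in for a result table; all values are made-up placeholders.
xrefs <- data.frame(display_id = c("entry_a", "entry_b"),
                    stringsAsFactors = FALSE)
xrefs$synonyms <- list(c("alias_1", "alias_2"), character(0))

# Option 1: collapse each element into one comma-separated string.
xrefs$synonyms_chr <- vapply(xrefs$synonyms, paste, character(1),
                             collapse = ", ")

# Option 2: long format, one row per (entry, synonym) pair.
# Entries without synonyms contribute no rows.
n_syn <- lengths(xrefs$synonyms)
xrefs_long <- data.frame(
  display_id = rep(xrefs$display_id, times = n_syn),
  synonym = unlist(xrefs$synonyms),
  stringsAsFactors = FALSE
)
```

The same approach should apply to the synonyms column returned by get_xrefs_by_gene(), which is documented as a list column too.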

Get cross references by gene symbol or name

Description

This function retrieves cross references by symbol or display name of a gene. The data is returned as a tibble where each row is a cross reference related to the provided symbol or display name of a gene. See below under section Value for details about each column.

Usage

get_xrefs_by_gene(
  species_name,
  gene,
  ensembl_db = "core",
  external_db = "",
  verbose = FALSE,
  warnings = TRUE,
  progress_bar = TRUE
)

Arguments

species_name

The species name, i.e., the scientific name, all letters lowercase and space replaced by underscore. Examples: 'homo_sapiens' (human), 'ovis_aries' (Domestic sheep) or 'capra_hircus' (Goat).

gene

Symbol or display name of a gene, e.g., 'ACTB' or 'BRCA2'.

ensembl_db

Restrict the search to a database other than the default. Ensembl's default database is 'core'.

external_db

Filter by external database, e.g. 'HGNC'. An empty string indicates no filtering.

verbose

Whether to be verbose about the http requests and respective responses' status.

warnings

Whether to show warnings.

progress_bar

Whether to show a progress bar.

Value

A tibble of 12 variables:

species_name

Ensembl species name: this is the name used internally by Ensembl to uniquely identify a species by name. It is the scientific name but formatted without capitalisation and spacing converted with an underscore, e.g., 'homo_sapiens'.

gene

Gene symbol.

ensembl_db

Ensembl database.

primary_id

Primary identification in external database.

display_id

Display identification in external database.

external_db_name

External database name.

external_db_display_name

External database display name.

version

TODO

info_type

There are two types of external cross references (XRef): direct ('DIRECT') or dependent ('DEPENDENT'). A direct cross reference is one that can be directly linked to a gene, transcript or translation object in Ensembl Genomes by synonymy or sequence similarity. A dependent cross reference is one that is transitively linked to the object via the direct cross reference. The value can also be 'UNMAPPED' for unmapped cross references, or 'PROJECTION' for TODO.

info_text

TODO

synonyms

Other names or acronyms used to refer to the gene. Note that this column is of the list type.

description

Brief description of the external database entry.

Ensembl REST API endpoints

get_xrefs_by_gene() makes GET requests to /xrefs/name/:species/:name.

Examples

# Get cross references that relate to gene BRCA2
get_xrefs_by_gene(species_name = 'human', gene = 'BRCA2')

Is the Ensembl REST API server reachable?

Description

Check if the Ensembl server where the REST API service is running is reachable. This function attempts to connect to https://rest.ensembl.org, returning TRUE on success, and FALSE otherwise. Set verbose = TRUE for a step-by-step description of the connection attempt.

Usage

is_ensembl_reachable(url = ensembl_server(), port = 443L, verbose = FALSE)

Arguments

url

Ensembl REST API server URL. Default is https://rest.ensembl.org. You should not need to change this parameter.

port

Network port on which to ping the server. You should not need to change this parameter.

verbose

Whether to be verbose (TRUE) or not (FALSE).

Value

A logical value: TRUE if the Ensembl server is reachable, FALSE otherwise.

Examples

# Check if the Ensembl Server is reachable
is_ensembl_reachable() # Returns TRUE or FALSE.

# Check if the Ensembl Server is reachable
# and show exactly at which step it is failing (if that is the case)
is_ensembl_reachable(verbose = TRUE)
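
A common use of this check is to guard other requests so that a script degrades gracefully when the server is down. The following is a minimal sketch of that pattern; run_if_reachable() is a hypothetical helper, not part of ensemblr, and the guarded call reuses the get_xrefs_by_gene() example from this manual.

```r
# Hypothetical helper (not part of ensemblr): calls `request` only when
# `reachable()` returns TRUE; otherwise returns `fallback` without
# attempting the request.
run_if_reachable <- function(request, reachable, fallback = NULL) {
  if (isTRUE(reachable())) request() else fallback
}

# Only attempted when ensemblr is installed; the result is NULL when the
# server cannot be reached.
if (requireNamespace("ensemblr", quietly = TRUE)) {
  xrefs <- run_if_reachable(
    request = function() {
      ensemblr::get_xrefs_by_gene(species_name = "human", gene = "BRCA2")
    },
    reachable = ensemblr::is_ensembl_reachable
  )
}
```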

Ensembl REST API Endpoints

Description

A dataset containing the Ensembl REST API endpoints, as listed in https://rest.ensembl.org/.

Usage

rest_api_endpoints

Format

A data frame with 118 rows and 4 variables:

section

Section.

endpoint

Ensembl REST API endpoint.

description

A short description of the resource.

last_update_date

Time stamp of the last time this dataset was downloaded from Ensembl.

Source

https://rest.ensembl.org/
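
As a usage sketch, the dataset can be searched like any other data frame. The helper find_endpoints() below is hypothetical; it relies only on the documented endpoint column, and it assumes the dataset is available as ensemblr::rest_api_endpoints once the package is installed.

```r
# Hypothetical helper (not part of ensemblr): keeps the rows whose
# `endpoint` value contains `pattern` as a literal substring.
find_endpoints <- function(endpoints, pattern) {
  keep <- grepl(pattern, endpoints$endpoint, fixed = TRUE)
  endpoints[keep, , drop = FALSE]
}

# List the cross-reference endpoints, if ensemblr is installed.
if (requireNamespace("ensemblr", quietly = TRUE)) {
  print(find_endpoints(ensemblr::rest_api_endpoints, "xrefs/"))
}
```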