Title: | REST API Client for the 'PGS' Catalog |
---|---|
Description: | Programmatic access to the 'PGS' Catalog. This package provides easy access to 'PGS' Catalog data by accessing the REST API <https://www.pgscatalog.org/rest/>. |
Authors: | Ramiro Magno [aut, cre] , Isabel Duarte [aut] , Ana-Teresa Maia [aut] , CINTESIS [cph, fnd] |
Maintainer: | Ramiro Magno <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.1.5 |
Built: | 2024-11-02 02:56:47 UTC |
Source: | https://github.com/maialab/quincunx |
A dataset containing the ancestry categories defined in the NHGRI-EBI GWAS Catalog framework (Table 1, doi:10.1186/s13059-018-1396-2). Ancestry categories are assigned to samples with distinct and well-defined patterns of genetic variation. You will find these categories in the variable ancestry_category of the following objects: scores, performance_metrics and sample_sets. Ancestry categories (ancestry_category) are further clustered into ancestry classes (ancestry_class).
ancestry_categories
A data frame with 19 ancestry categories (rows) and 6 columns:
Ancestry category.
To reduce the complexity associated with the many ancestry categories, some have been merged into higher-level groupings (ancestry_class). These groupings represent the current breadth of data in the PGS Catalog and are likely to change as more data is added.
3-letter code for the ancestry_class, e.g. "EUR" or "MAE".
Hexadecimal colour code associated with ancestry groupings (ancestry_class). This can be useful when visually communicating about ancestries.
Description of the ancestry category.
Examples of detailed descriptions of sample ancestries included in the category.
ancestry_categories
Binds together PGS Catalog objects of the same class. Note that bind() preserves duplicates whereas union() does not.
bind(x, ...)
x |
An object of either class scores, publications, traits, performance_metrics, sample_sets, cohorts or trait_categories. |
... |
Objects of the same class as |
An object of the same class as x.
# Get some `scores` objects:
my_scores_1 <- get_scores(c('PGS000012', 'PGS000013'))
my_scores_2 <- get_scores(c('PGS000013', 'PGS000014'))

# NB: with `bind()`, PGS000013 is repeated (as opposed to `union()`)
bind(my_scores_1, my_scores_2)@scores
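The difference between keeping and dropping duplicates can be pictured offline with plain data frames. This is only a base R analogy under stated assumptions: the two toy data frames stand in for the `scores` tables, and `rbind()` and `unique()` are used in place of the package's `bind()` and `union()`, whose internals are not shown here.

```r
# Toy stand-ins for the `scores` table of two objects (not real query results).
a <- data.frame(pgs_id = c("PGS000012", "PGS000013"), stringsAsFactors = FALSE)
b <- data.frame(pgs_id = c("PGS000013", "PGS000014"), stringsAsFactors = FALSE)

# Analogue of bind(): rows are stacked, so PGS000013 appears twice.
bound <- rbind(a, b)

# Analogue of union(): duplicated rows are dropped.
unioned <- unique(bound)

nrow(bound)
nrow(unioned)
```

The stacked table has one row per input row, whereas the de-duplicated table has one row per distinct identifier.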
quincunx uses memoised functions for the REST API calls. Use this function to reset the cache.
clear_cache()
Returns a logical value, indicating whether the resetting of the cache was successful (TRUE) or not (FALSE).
clear_cache()
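To illustrate what resetting a memoisation cache means, here is a minimal base R sketch. It is not quincunx's implementation: `slow_lookup()`, `memoised_lookup()` and `reset_cache()` are hypothetical names, and the cache is assumed to be a plain environment.

```r
# Toy memoisation: results are stored in an environment keyed by the argument.
cache <- new.env()
n_calls <- 0L

# Hypothetical stand-in for an expensive REST API call; counts how often it runs.
slow_lookup <- function(id) {
  n_calls <<- n_calls + 1L
  paste0("result for ", id)
}

# Returns the cached value if present, otherwise computes and stores it.
memoised_lookup <- function(id) {
  if (!exists(id, envir = cache, inherits = FALSE)) {
    assign(id, slow_lookup(id), envir = cache)
  }
  get(id, envir = cache, inherits = FALSE)
}

# Hypothetical analogue of clear_cache(): empties the cache and reports success.
reset_cache <- function() {
  rm(list = ls(envir = cache, all.names = TRUE), envir = cache)
  length(ls(envir = cache, all.names = TRUE)) == 0L
}

memoised_lookup("PGS000001")  # computed
memoised_lookup("PGS000001")  # served from the cache
calls_before_reset <- n_calls

cleared <- reset_cache()
memoised_lookup("PGS000001")  # computed again after the reset
calls_after_reset <- n_calls
```

In this toy version, the second call does not trigger a new computation, while the call after the reset does.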
The cohorts object consists of two tables (slots) that combined form a relational database of a subset of cohorts. Each cohort is an observation (row) in the cohorts table (first table).
cohorts
A table of cohorts. Each cohort (row) is identified by its cohort_symbol. Columns:
Cohort symbol. Example: "CECILE".
Cohort full name. Example: "CECILE Breast Cancer Study".
pgs_ids
A table of cohorts and their associated polygenic score identifiers. Columns:
Cohort symbol. Example: "CECILE".
Polygenic Score (PGS) identifier.
Sample stage: either "gwas/dev" or "eval".
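As a rough illustration of how two slots can act as a small relational database, the sketch below builds a toy S4 object and joins its two tables on cohort_symbol. Everything in it is an assumption made for illustration: the class `toy_cohorts`, the second cohort "TOY", the pairing of cohorts with PGS identifiers, and the column names `cohort_name`, `pgs_id` and `stage` (only `cohort_symbol` is named above).

```r
library(methods)

# Hypothetical class mimicking an object with two table slots.
setClass("toy_cohorts",
         representation(cohorts = "data.frame", pgs_ids = "data.frame"))

x <- new(
  "toy_cohorts",
  cohorts = data.frame(
    cohort_symbol = c("CECILE", "TOY"),
    cohort_name = c("CECILE Breast Cancer Study", "Made-up Toy Cohort"),
    stringsAsFactors = FALSE
  ),
  pgs_ids = data.frame(
    cohort_symbol = c("CECILE", "CECILE", "TOY"),
    pgs_id = c("PGS000001", "PGS000002", "PGS000002"),  # invented pairings
    stage = c("gwas/dev", "eval", "eval"),
    stringsAsFactors = FALSE
  )
)

# Slots are accessed with `@`; the shared key links the two tables.
joined <- merge(x@cohorts, x@pgs_ids, by = "cohort_symbol")
cecile_scores <- joined$pgs_id[joined$cohort_symbol == "CECILE"]
```

The joined table has one row per cohort and score pair, carrying the cohort's full name alongside each associated score.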
Retrieves ancestry categories and classes. This function simply returns the object ancestry_categories.
get_ancestry_categories()
A tibble with ancestry categories, classes and associated information. See ancestry_categories for details about each column.
get_ancestry_categories()
Retrieves cohorts via the PGS Catalog REST API. Please note that cohort_symbol is vectorised, thus allowing for batch mode search.
get_cohorts(
  cohort_symbol = NULL,
  verbose = FALSE,
  warnings = TRUE,
  progress_bar = TRUE
)
cohort_symbol |
A cohort symbol or |
verbose |
A |
warnings |
A |
progress_bar |
Whether to show a progress bar indicating download progress from the REST API server. |
A cohorts object.
# Get information about specific cohorts by their symbols (acronyms)
get_cohorts(cohort_symbol = c('23andMe', 'IPOBCS'))

# Get info on all cohorts (may take a few minutes to download)
## Not run: 
get_cohorts()
## End(Not run)
Retrieves performance metrics via the PGS Catalog REST API. The REST API is queried multiple times with the criteria passed as arguments (see below). By default, all performance metrics that match any of the criteria supplied in the arguments are retrieved: this corresponds to the default option set_operation set to 'union'. If you would rather have only the performance metrics that simultaneously match all criteria provided, then set set_operation to 'intersection'.
get_performance_metrics(
  ppm_id = NULL,
  pgs_id = NULL,
  set_operation = "union",
  interactive = TRUE,
  verbose = FALSE,
  warnings = TRUE,
  progress_bar = TRUE
)
ppm_id |
A character vector of PGS Catalog performance metrics accession identifiers. |
pgs_id |
A |
set_operation |
Either |
interactive |
A logical. If all performance metrics are requested, whether to ask interactively if we really want to proceed. |
verbose |
A |
warnings |
A |
progress_bar |
Whether to show a progress bar as the queries are performed. |
Please note that all search criteria are vectorised, thus allowing for batch mode search.
A performance_metrics object.
## Not run: 
# Get performance metrics catalogued with identifier 'PPM000001'
get_performance_metrics(ppm_id = 'PPM000001')

# Get performance metrics associated with polygenic score id 'PGS000001'
get_performance_metrics(pgs_id = 'PGS000001')

# To retrieve all catalogued performance metrics in the PGS Catalog, simply
# leave the parameters `ppm_id` and `pgs_id` as `NULL`.
get_performance_metrics()
## End(Not run)
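The two values of set_operation can be pictured with plain character vectors. In this sketch the vectors stand for the identifiers matched by each search criterion; the identifiers are invented, and base R's `union()` and `intersect()` serve only as an analogy, not as a description of the package's internals.

```r
# Invented results: identifiers matched by each criterion on its own.
matched_by_ppm_id <- c("PPM000001", "PPM000002")
matched_by_pgs_id <- c("PPM000002", "PPM000003")

# set_operation = "union": anything matching at least one criterion.
any_criterion <- union(matched_by_ppm_id, matched_by_pgs_id)

# set_operation = "intersection": only what matches every criterion.
all_criteria <- intersect(matched_by_ppm_id, matched_by_pgs_id)

any_criterion
all_criteria
```

The same reasoning applies to the other retrieval functions that take a set_operation argument.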
Retrieves PGS publications via the PGS Catalog REST API. The REST API is queried multiple times with the criteria passed as arguments (see below). By default, all publications that match any of the criteria supplied in the arguments are retrieved: this corresponds to the default option set_operation set to 'union'. If you would rather have only the publications that simultaneously match all criteria provided, then set set_operation to 'intersection'.
get_publications(
  pgp_id = NULL,
  pgs_id = NULL,
  pubmed_id = NULL,
  author = NULL,
  set_operation = "union",
  interactive = TRUE,
  verbose = FALSE,
  warnings = TRUE,
  progress_bar = TRUE
)
pgp_id |
A character vector of PGS Catalog publication accession identifiers. |
pgs_id |
A |
pubmed_id |
An |
author |
A character vector of author names (any author in the list of authors of a publication), e.g. |
set_operation |
Either |
interactive |
A logical. If all publications are requested, whether to ask interactively if we really want to proceed. |
verbose |
A |
warnings |
A |
progress_bar |
Whether to show a progress bar as the queries are performed. |
Please note that all search criteria are vectorised, thus allowing for batch mode search. For more details see the help vignette: vignette("getting-pgs-publications", package = "quincunx").
A publications object.
## Not run: 
# Get PGS publications by their identifier
get_publications(pgp_id = c('PGP000001', 'PGP000002'))

# By polygenic score identifier
get_publications(pgs_id = 'PGS000003')

# By PubMed identifier
get_publications(pubmed_id = '30554720')

# By author's last name
get_publications(author = 'Natarajan')
## End(Not run)
This function retrieves PGS Catalog release information. Note that each element of the columns pgs_id, ppm_id and pgp_id is itself a vector. These columns can be unnested using unnest_longer (see Examples).
get_releases(
  date = "latest",
  verbose = FALSE,
  warnings = TRUE,
  progress_bar = TRUE
)
date |
One or more dates formatted as |
verbose |
Whether to print information about the underlying requests to the REST API server. |
warnings |
Whether to print warnings about the underlying requests to the REST API server. |
progress_bar |
Whether to show a progress bar indicating download progress from the REST API server. |
A data frame where each row is a release. Columns are:
Release date.
Number of released Polygenic Score (PGS) identifiers (pgs_id).
Number of released Performance Metric (PPM) identifiers (ppm_id).
Number of released PGS Catalog Publication (PGP) identifiers (pgp_id).
Released Polygenic Score (PGS) identifiers.
Released Performance Metric (PPM) identifiers.
Released PGS Catalog Publication (PGP) identifiers.
News about the release.
## Not run: 
# Get the latest release
get_releases()
get_releases(date = 'latest')

# Get all releases
get_releases(date = 'all')

# Get a specific release by date
get_releases(date = '2020-08-19')
## End(Not run)
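Unnesting a list-column such as pgs_id can be tried offline on a toy table. The sketch below assumes the tibble and tidyr packages are installed; the data are invented, and the column name `date` is an assumption, since the name of the release date column is not given above.

```r
library(tibble)
library(tidyr)

# Toy stand-in for the output of get_releases(): `pgs_id` is a list-column
# whose elements are character vectors (invented values).
releases <- tibble(
  date = as.Date(c("2020-08-19", "2020-09-01")),
  pgs_id = list(c("PGS000001", "PGS000002"), "PGS000003")
)

# One row per identifier; `date` is repeated as needed.
long <- unnest_longer(releases, pgs_id)
long
```

The same call should apply to the ppm_id and pgp_id columns, one column at a time.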
Retrieves sample sets via the PGS Catalog REST API. The REST API is queried multiple times with the criteria passed as arguments (see below). By default, all sample sets that match any of the criteria supplied in the arguments are retrieved: this corresponds to the default option set_operation set to 'union'. If you would rather have only the sample sets that simultaneously match all criteria provided, then set set_operation to 'intersection'.
get_sample_sets(
  pss_id = NULL,
  pgs_id = NULL,
  set_operation = "union",
  interactive = TRUE,
  verbose = FALSE,
  warnings = TRUE,
  progress_bar = TRUE
)
pss_id |
A character vector of PGS Catalog sample sets accession identifiers. |
pgs_id |
A |
set_operation |
Either |
interactive |
A logical. If all sample sets are requested, whether to ask interactively if we really want to proceed. |
verbose |
A |
warnings |
A |
progress_bar |
Whether to show a progress bar indicating download progress from the REST API server. |
Please note that all search criteria are vectorised, thus allowing for batch mode search.
A sample_sets object.
## Not run: 
# Search by PGS identifier
get_sample_sets(pgs_id = 'PGS000013')

# Search by the PSS identifier
get_sample_sets(pss_id = 'PSS000068')
## End(Not run)
Retrieves polygenic scores via the PGS Catalog REST API. The REST API is queried multiple times with the criteria passed as arguments (see below). By default, all scores that match any of the criteria supplied in the arguments are retrieved: this corresponds to the default option set_operation set to 'union'. If you would rather have only the scores that simultaneously match all criteria provided, then set set_operation to 'intersection'.
get_scores(
  pgs_id = NULL,
  efo_id = NULL,
  pubmed_id = NULL,
  set_operation = "union",
  interactive = TRUE,
  verbose = FALSE,
  warnings = TRUE,
  progress_bar = TRUE
)
pgs_id |
A |
efo_id |
A character vector of EFO identifiers. |
pubmed_id |
An |
set_operation |
Either |
interactive |
A logical. If all scores are requested, whether to ask interactively if we really want to proceed. |
verbose |
A |
warnings |
A |
progress_bar |
Whether to show a progress bar as the queries are performed. |
Please note that all search criteria are vectorised, thus allowing for batch mode search.
A scores object.
## Not run: 
# By `pgs_id`
get_scores(pgs_id = 'PGS000088')

# By `efo_id`
get_scores(efo_id = 'EFO_0007992')

# By `pubmed_id`
get_scores(pubmed_id = '25748612')
## End(Not run)
Retrieves all trait categories via the PGS Catalog REST API.
get_trait_categories(verbose = FALSE, warnings = TRUE, progress_bar = TRUE)
verbose |
A |
warnings |
A |
progress_bar |
Whether to show a progress bar indicating download progress from the REST API server. |
A trait_categories object.
get_trait_categories(progress_bar = FALSE)
Retrieves traits via the PGS Catalog REST API. The REST API is queried multiple times with the criteria passed as arguments (see below). By default, all traits that match any of the criteria supplied in the arguments are retrieved: this corresponds to the default option set_operation set to 'union'. If you would rather have only the traits that simultaneously match all criteria provided, then set set_operation to 'intersection'.
get_traits(
  efo_id = NULL,
  trait_term = NULL,
  exact_term = TRUE,
  include_children = FALSE,
  set_operation = "union",
  interactive = TRUE,
  verbose = FALSE,
  warnings = TRUE,
  progress_bar = TRUE
)
efo_id |
A character vector of EFO identifiers. |
trait_term |
A character vector of terms to be matched against trait
identifiers ( |
exact_term |
A logical value, indicating whether to match the
|
include_children |
A logical value, indicating whether to include child traits or not. |
set_operation |
Either |
interactive |
A logical. If all traits are requested, whether to ask interactively if we really want to proceed. |
verbose |
A |
warnings |
A |
progress_bar |
Whether to show a progress bar indicating download progress from the REST API server. |
Please note that all search criteria are vectorised, thus allowing for batch mode search.
A traits object.
## Not run: 
# Get a trait by its EFO identifier
get_traits(efo_id = 'EFO_0004631')

# Get a trait by matching a term in EFO identifier (`efo_id`), label,
# description, synonyms, categories, or external mapped terms
get_traits(trait_term = 'stroke', exact_term = FALSE)

# Get a trait matching its name (`trait`) exactly (default)
get_traits(trait_term = 'stroke', exact_term = TRUE)

# Get traits, excluding their child traits (default)
get_traits(trait_term = 'breast cancer')

# Get traits, including their child traits (check column `is_child` for
# child traits)
get_traits(trait_term = 'breast cancer', include_children = TRUE)
## End(Not run)
This function returns the number of entities in a PGS Catalog object. To avoid ambiguity with dplyr::n() use quincunx::n().
n(x, unique = FALSE)

## S4 method for signature 'scores'
n(x, unique = FALSE)

## S4 method for signature 'publications'
n(x, unique = FALSE)

## S4 method for signature 'traits'
n(x, unique = FALSE)

## S4 method for signature 'performance_metrics'
n(x, unique = FALSE)

## S4 method for signature 'sample_sets'
n(x, unique = FALSE)

## S4 method for signature 'cohorts'
n(x, unique = FALSE)

## S4 method for signature 'trait_categories'
n(x, unique = FALSE)

## S4 method for signature 'releases'
n(x, unique = FALSE)
x |
A scores, publications, traits, performance_metrics, sample_sets, cohorts, trait_categories or releases object. |
unique |
Whether to count only unique entries ( |
An integer scalar.
# Return the number of polygenic scores in a scores object:
my_scores <- get_scores(pgs_id = c('PGS000007', 'PGS000007', 'PGS000042'))
n(my_scores)

# If you want to count unique scores only, then use the `unique` parameter:
n(my_scores, unique = TRUE)

# Total number of curated publications in the PGS Catalog:
all_pub <- get_publications(interactive = FALSE, progress_bar = FALSE)
n(all_pub)

# Total number of curated traits in the PGS Catalog:
all_traits <- get_traits(interactive = FALSE, progress_bar = FALSE)
n(all_traits)
This function launches the web browser at dbSNP and opens a tab for each SNP identifier.
open_in_dbsnp(variant_id)
variant_id |
A variant identifier, a character vector. |
Returns TRUE if successful. Note however that this function is run for its side effect.
open_in_dbsnp('rs56261590')
This function launches the web browser and opens a tab for each identifier on the PGS Catalog web graphical user interface: https://www.pgscatalog.org/.
open_in_pgs_catalog(
  identifier = NULL,
  pgs_catalog_entity = c("pgs", "pgp", "pss", "efo")
)
identifier |
A vector of identifiers. The identifiers can be: PGS, PGP, PSS or EFO identifiers. |
pgs_catalog_entity |
Either |
Returns TRUE if successful, or FALSE otherwise. Note however that this function is run for its side effect.
# Open polygenic scores in the PGS Catalog web graphical user interface
open_in_pgs_catalog(c('PGS000001', 'PGS000002'))

# Open PGS Catalog Publications
open_in_pgs_catalog(c('PGP000001', 'PGP000002'), pgs_catalog_entity = 'pgp')

# Open Sample Sets (PSS)
open_in_pgs_catalog(c('PSS000001', 'PSS000002'), pgs_catalog_entity = 'pss')

# Open EFO traits (EFO)
open_in_pgs_catalog(c('EFO_0001645', 'MONDO_0007254'), pgs_catalog_entity = 'efo')
This function launches the web browser and opens a tab for each PubMed citation.
open_in_pubmed(pubmed_id)
pubmed_id |
A PubMed identifier, either a character or an integer vector. |
Returns TRUE if successful. Note however that this function is run for its side effect.
open_in_pubmed(c('26301688', '30595370'))
The performance_metrics object consists of nine tables (slots) that combined form a relational database of a subset of performance metrics. Each performance metric is an observation (row) in the performance_metrics table (first table).
performance_metrics
A table of PGS Performance Metrics (PPM). Each PPM (row) is uniquely identified by the ppm_id column. Columns:
A PGS Performance Metrics identifier. Example: "PPM000001".
Polygenic Score (PGS) identifier.
The author-reported trait that the PGS has been developed to predict. Example: "Breast Cancer".
Comma-separated list of covariates used in the prediction model to evaluate the PGS.
Any other information relevant to the understanding of the performance metrics.
publications
A table of publications. Each publication (row) is uniquely identified by the column pgp_id. Columns:
A PGS Performance Metrics identifier. Example: "PPM000001".
PGS Publication identifier. Example: "PGP000001".
PubMed identifier. Example: "25855707".
Publication date. Example: "2020-09-28". Note that the class of publication_date is Date.
Abbreviated name of the journal. Example: "Am J Hum Genet".
Publication title.
First author of the publication. Example: 'Mavaddat N'.
Digital Object Identifier (DOI). This variable is also curated to allow unpublished work (e.g. preprints) to be added to the catalog. Example: "10.1093/jnci/djv036".
sample_sets
A table of sample sets. Each sample set (row) is uniquely identified by the column pss_id. Columns:
A PGS Performance Metrics identifier. Example: "PPM000001".
A PGS Sample Set identifier. Example: "PSS000042".
samples
A table of samples. Each sample (row) is uniquely identified by the combination of values from the columns: ppm_id, pss_id, and sample_id. Columns:
A PGS Performance Metrics identifier. Example: "PPM000001".
A PGS Sample Set identifier. Example: "PSS000042".
Sample identifier. This is a surrogate key to identify each sample.
Sample stage: should always be Evaluation ("eval").
Number of individuals included in the sample.
Number of cases.
Number of controls.
Percentage of male participants.
Detailed phenotype description.
Author-reported ancestry is mapped to the best matching ancestry category from the NHGRI-EBI GWAS Catalog framework (see ancestry_categories for possible values).
A more detailed description of sample ancestry that usually matches the most specific description described by the authors (e.g. French, Chinese).
Author-reported countries of recruitment (if available).
Any additional description not captured in the other columns (e.g. founder or genetically isolated populations, or further description of admixed samples).
Associated GWAS Catalog study accession identifier, e.g., "GCST002735".
PubMed identifier.
Any additional description about the samples (e.g. sub-cohort information).
demographics
A table of sample demographics variables. Each demographics variable (row) is uniquely identified by the combination of values from the columns: ppm_id, pss_id, sample_id, and variable. Columns:
A PGS Performance Metrics identifier. Example: "PPM000001".
A PGS Sample Set identifier. Example: "PSS000042".
Sample identifier. This is a surrogate identifier to identify each sample.
Demographics variable. Following columns report about the indicated variable.
Type of statistical estimate for variable.
The variable's statistical value.
Unit of the variable.
Measure of statistical dispersion for variable, e.g. standard error (se) or standard deviation (sd).
The value of the measure of dispersion.
Type of statistical interval for variable: range, iqr (interquartile), ci (confidence interval).
Interval lower bound.
Interval upper bound.
cohorts
A table of cohorts. Each cohort (row) is uniquely identified by the combination of values from the columns: ppm_id, sample_id and cohort_symbol. Columns:
A PGS Performance Metrics identifier. Example: "PPM000001".
Sample identifier. This is a surrogate key to identify each sample.
Cohort symbol.
Cohort full name.
pgs_effect_sizes
A table of effect sizes per standard deviation change in PGS. Examples include regression coefficients (betas) for continuous traits, odds ratios (OR) and/or hazard ratios (HR) for dichotomous traits depending on the availability of time-to-event data. Each effect size is uniquely identified by the combination of values from the columns: ppm_id and effect_size_id. Columns:
A PGS Performance Metrics identifier. Example: "PPM000001".
Effect size identifier. This is a surrogate identifier to identify each effect size.
Long notation of the effect size (e.g. Odds Ratio).
Short notation of the effect size (e.g. OR).
The estimate's value.
Unit of the estimate.
Measure of statistical dispersion for variable, e.g. standard error (se) or standard deviation (sd).
The value of the measure of dispersion.
Type of statistical interval for variable: range, iqr (interquartile), ci (confidence interval).
Interval lower bound.
Interval upper bound.
pgs_classification_metrics
A table of classification metrics. Examples include the Area under the Receiver Operating Characteristic (AUROC) or Harrell's C-index (Concordance statistic). Columns:
A PGS Performance Metrics identifier. Example: "PPM000001".
Classification metric identifier. This is a surrogate identifier to identify each classification metric.
Long notation of the classification metric (e.g. Concordance Statistic).
Short notation of the classification metric (e.g. C-index).
The estimate's value.
Unit of the estimate.
Measure of statistical dispersion for variable, e.g. standard error (se) or standard deviation (sd).
The value of the measure of dispersion.
Type of statistical interval for variable: range, iqr (interquartile), ci (confidence interval).
Interval lower bound.
Interval upper bound.
pgs_other_metrics
A table of other metrics that are neither effect sizes nor classification metrics. Examples include: R² (proportion of the variance explained), or reclassification metrics. Columns:
A PGS Performance Metrics identifier. Example: "PPM000001".
Other metric identifier. This is a surrogate identifier to identify each metric.
Long notation of the metric. Example: "Proportion of the variance explained".
Short notation of the metric. Example: "R²".
The estimate's value.
Unit of the estimate.
Measure of statistical dispersion for variable, e.g. standard error (se) or standard deviation (sd).
The value of the measure of dispersion.
Type of statistical interval for variable: range, iqr (interquartile), ci (confidence interval).
Interval lower bound.
Interval upper bound.
Map PGP identifiers to PGS identifiers.
pgp_to_pgs(
  pgp_id = NULL,
  verbose = FALSE,
  warnings = TRUE,
  progress_bar = TRUE
)
pgp_id |
A character vector of PGS Catalog Publication identifiers,
e.g., "PGP000001". If |
verbose |
A |
warnings |
A |
progress_bar |
Whether to show a progress bar as the queries are performed. |
A data frame of two columns: pgp_id and pgs_id.
## Not run: 
pgp_to_pgs('PGP000001')
pgp_to_pgs(c('PGP000017', 'PGP000042'))
## End(Not run)
Map PGP identifiers to PPM identifiers.
pgp_to_ppm(
  pgp_id = NULL,
  verbose = FALSE,
  warnings = TRUE,
  progress_bar = TRUE
)
pgp_id |
A character vector of PGS Catalog Publication identifiers,
e.g., "PGP000001". If |
verbose |
A |
warnings |
A |
progress_bar |
Whether to show a progress bar as the queries are performed. |
A data frame of two columns: pgp_id and ppm_id.
## Not run: 
pgp_to_ppm('PGP000001')
pgp_to_ppm(c('PGP000017', 'PGP000042'))
## End(Not run)
Map PGP identifiers to PSS identifiers.
pgp_to_pss(
  pgp_id = NULL,
  verbose = FALSE,
  warnings = TRUE,
  progress_bar = TRUE
)
pgp_id |
A character vector of PGS Catalog Publication identifiers,
e.g., "PGP000001". If |
verbose |
A |
warnings |
A |
progress_bar |
Whether to show a progress bar as the queries are performed. |
A data frame of two columns: pgp_id and pss_id.
## Not run: 
pgp_to_pss('PGP000001')
pgp_to_pss(c('PGP000017', 'PGP000042'))
## End(Not run)
Map PGS identifiers to PGP identifiers.
pgs_to_pgp(
  pgs_id = NULL,
  verbose = FALSE,
  warnings = TRUE,
  progress_bar = TRUE
)
pgs_id |
A character vector of PGS identifiers,
e.g., "PGS000001". If |
verbose |
A |
warnings |
A |
progress_bar |
Whether to show a progress bar as the queries are performed. |
A data frame of two columns: pgs_id and pgp_id.
## Not run: 
pgs_to_pgp('PGS000001')
pgs_to_pgp(c('PGS000017', 'PGS000042'))
## End(Not run)
Map PGS identifiers to PPM identifiers.
pgs_to_ppm(pgs_id, verbose = FALSE, warnings = TRUE, progress_bar = TRUE)
pgs_id |
A character vector of PGS identifiers, e.g., "PGS000001". |
verbose |
A |
warnings |
A |
progress_bar |
Whether to show a progress bar as the queries are performed. |
A data frame of two columns: pgs_id and ppm_id.
## Not run: 
pgs_to_ppm('PGS000001')
pgs_to_ppm(c('PGS000017', 'PGS000042'))
## End(Not run)
Map PGS identifiers to PSS identifiers.
pgs_to_pss(
  pgs_id = NULL,
  verbose = FALSE,
  warnings = TRUE,
  progress_bar = TRUE
)
pgs_id |
A character vector of PGS identifiers,
e.g., "PGS000001". If |
verbose |
A |
warnings |
A |
progress_bar |
Whether to show a progress bar as the queries are performed. |
A data frame of two columns: pgs_id and pss_id.
## Not run: 
pgs_to_pss('PGS000001')
pgs_to_pss(c('PGS000017', 'PGS000042'))
## End(Not run)
Map PGS identifiers to GWAS study identifiers. Retrieves GWAS study identifiers associated with samples used in the discovery stage of queried PGS identifiers.
pgs_to_study(
  pgs_id = NULL,
  verbose = FALSE,
  warnings = TRUE,
  progress_bar = TRUE
)
pgs_id |
A character vector of PGS Catalog score accession identifiers, e.g., "PGS000001". If |
verbose |
A |
warnings |
A |
progress_bar |
Whether to show a progress bar as the queries are performed. |
A data frame of two columns: pgs_id
and study_id
.
## Not run:
pgs_to_study('PGS000001')

# Unmappable pgs ids will be missing, e.g., PGS000023
pgs_to_study(c('PGS000013', 'PGS000023'))
## End(Not run)
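Because unmappable identifiers are simply absent from the result, it can be useful to check which queried identifiers did not come back. The following is a minimal sketch using a hand-built data frame in place of a real `pgs_to_study()` result, so it runs without network access; the `study_id` value is a placeholder, not a real mapping.

```r
# Identifiers that were queried.
queried <- c("PGS000013", "PGS000023")

# Hypothetical stand-in for the data frame returned by pgs_to_study():
# two columns, pgs_id and study_id. The study_id value is a placeholder.
mapping <- data.frame(
  pgs_id = "PGS000013",
  study_id = "GCST_placeholder",
  stringsAsFactors = FALSE
)

# Queried identifiers that have no row in the result.
unmapped <- queried[!queried %in% mapping$pgs_id]
unmapped
```

The same check should apply to the other mapping functions, assuming they also drop identifiers without a match.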
Map PPM identifiers to PGP identifiers.
ppm_to_pgp(ppm_id, verbose = FALSE, warnings = TRUE, progress_bar = TRUE)
ppm_id |
A character vector of PPM identifiers, e.g., "PPM000001". |
verbose |
A |
warnings |
A |
progress_bar |
Whether to show a progress bar as the queries are performed. |
A data frame of two columns: ppm_id
and pgp_id
.
## Not run:
ppm_to_pgp('PPM000001')
ppm_to_pgp(c('PPM000017', 'PPM000042'))
## End(Not run)
Map PPM identifiers to PGS identifiers.
ppm_to_pgs(ppm_id, verbose = FALSE, warnings = TRUE, progress_bar = TRUE)
ppm_id |
A character vector of PPM identifiers, e.g., "PPM000001". |
verbose |
A |
warnings |
A |
progress_bar |
Whether to show a progress bar as the queries are performed. |
A data frame of two columns: ppm_id
and pgs_id
.
## Not run:
ppm_to_pgs('PPM000001')
ppm_to_pgs(c('PPM000017', 'PPM000042'))
## End(Not run)
Map PPM identifiers to PSS identifiers.
ppm_to_pss(ppm_id, verbose = FALSE, warnings = TRUE, progress_bar = TRUE)
ppm_id |
A character vector of PPM identifiers, e.g., "PPM000001". |
verbose |
A |
warnings |
A |
progress_bar |
Whether to show a progress bar as the queries are performed. |
A data frame of two columns: ppm_id
and pss_id
.
## Not run:
ppm_to_pss('PPM000001')
ppm_to_pss(c('PPM000017', 'PPM000042'))
## End(Not run)
Map PSS identifiers to PGP identifiers. This is a slow function because it first downloads all Performance Metrics, which provide the link between PSS and PGP.
pss_to_pgp(pss_id, verbose = FALSE, warnings = TRUE, progress_bar = TRUE)
pss_id |
A character vector of PSS identifiers, e.g., "PSS000001". |
verbose |
A |
warnings |
A |
progress_bar |
Whether to show a progress bar as the queries are performed. |
A data frame of two columns: pss_id
and pgp_id
.
## Not run:
pss_to_pgp('PSS000001')
pss_to_pgp(c('PSS000017', 'PSS000042'))
## End(Not run)
Map PSS identifiers to PGS identifiers. This is a slow function because it first downloads all Performance Metrics, which provide the link between PSS and PGS.
pss_to_pgs(pss_id, verbose = FALSE, warnings = TRUE, progress_bar = TRUE)
pss_id |
A character vector of PSS identifiers, e.g., "PSS000001". |
verbose |
A |
warnings |
A |
progress_bar |
Whether to show a progress bar as the queries are performed. |
A data frame of two columns: pss_id
and pgs_id
.
## Not run:
pss_to_pgs('PSS000001')
pss_to_pgs(c('PSS000017', 'PSS000042'))
## End(Not run)
Map PSS identifiers to PPM identifiers. This is a slow function because it first downloads all Performance Metrics.
pss_to_ppm(pss_id, verbose = FALSE, warnings = TRUE, progress_bar = TRUE)
pss_id |
A character vector of PSS identifiers, e.g., "PSS000001". |
verbose |
A |
warnings |
A |
progress_bar |
Whether to show a progress bar as the queries are performed. |
A data frame of two columns: pss_id
and ppm_id
.
## Not run:
pss_to_ppm('PSS000001')
pss_to_ppm(c('PSS000017', 'PSS000042'))
## End(Not run)
The publications object consists of two tables (slots) that combined form a
relational database of a subset of PGS Catalog Publications. Each publication
is an observation (row) in the publications table (first table).
publications
A table of publications. Each publication (row) is
uniquely identified by the pgp_id
column. Columns:
PGS Publication identifier. Example: "PGP000001"
.
PubMed
identifier. Example: "25855707"
.
Publication date. Example: "2020-09-28"
. Note
that the class of publication_date
is Date
.
Abbreviated name of the journal. Example: "Am J Hum
Genet"
.
Publication title.
First author of the publication. Example:
'Mavaddat N'
.
Digital Object Identifier (DOI). This variable is also curated to
allow unpublished work (e.g. preprints) to be added to the catalog. Example:
"10.1093/jnci/djv036"
.
Concatenated list of all the publication authors.
pgs_ids
A table of publication and associated PGS identifiers. Columns:
PGS Publication identifier. Example: "PGP000001"
.
Polygenic Score (PGS) identifier.
PGS stage: either "gwas/dev" or "eval".
This function imports a PGS scoring file. For more information about the
scoring file schema check vignette("pgs-scoring-file", package =
"quincunx")
.
read_scoring_file(source, protocol = "http", metadata_only = FALSE)
source |
PGS scoring file. This can be specified in three forms: (i) a
PGS identifier, e.g. |
protocol |
Network protocol for communication with the PGS Catalog FTP
server: either |
metadata_only |
Whether to read only the comment block (header) from the scoring file. |
The returned value is a named list. The names are copied from the
arguments passed in source
. Each element of the list contains
another list of two elements: "metadata"
and "data"
. The
"metadata" element contains data parsed from the header of the PGS scoring
file. The "data" element contains a data frame with as many rows as
variants that constitute the PGS score. The columns can vary. There are
mandatory and optional columns. The mandatory columns are those that
identify the variant, effect allele (effect_allele
), and its
respective weight (effect_weight
) in the score. The columns that
identify the variant can either be the rsID
or the combination of
chr_name
and chr_position
. The "data" element will be
NULL
if argument metadata_only
is TRUE
. For more
information about the scoring file schema check
vignette("pgs-scoring-file", package = "quincunx")
.
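To make the shape of the return value concrete, here is a minimal sketch that builds a stand-in list by hand and then navigates it. Nothing is downloaded: the variant identifiers, alleles and weights are placeholders, the metadata elements are left empty on purpose, and the second entry mimics what a call with `metadata_only = TRUE` is described as returning.

```r
# Hand-built stand-in mirroring the documented return shape of
# read_scoring_file(): a named list, one element per source, each holding
# "metadata" and "data". All values below are placeholders.
mock <- list(
  PGS000655 = list(
    metadata = list(),
    data = data.frame(
      rsID = c("rs_placeholder_1", "rs_placeholder_2"),
      effect_allele = c("A", "G"),
      effect_weight = c(0.12, -0.05),
      stringsAsFactors = FALSE
    )
  ),
  PGS000001 = list(
    metadata = list(),
    data = NULL  # as documented when metadata_only is TRUE
  )
)

# Names are copied from the `source` argument.
names(mock)

# Variants of one score: one row per variant.
variants <- mock[["PGS000655"]][["data"]]
nrow(variants)

# Which elements actually carry a data table?
has_data <- vapply(mock, function(el) !is.null(el[["data"]]), logical(1))
has_data
```

Checking for a `NULL` "data" element before using it avoids errors when some files were read with `metadata_only = TRUE`.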
## Not run:
# Read a PGS scoring file by PGS ID
# (internally, it translates the PGS ID
# to the corresponding FTP URL)
try(read_scoring_file("PGS000655"))

# Equivalent to `read_scoring_file("PGS000655")`
url <- paste0(
  "http://ftp.ebi.ac.uk/",
  "pub/databases/spot/pgs/scores/",
  "PGS000655/ScoringFiles/",
  "PGS000655.txt.gz"
)
read_scoring_file(url)

# Reading from a local file
try(read_scoring_file("~/PGS000655.txt.gz"))
## End(Not run)
The releases object consists of four tables (slots) that combined form a
relational database of a subset of PGS Catalog releases. Each release is an
observation (row) in the releases
table (first table).
releases
A table of PGS Catalog releases. Each release (row) is
uniquely identified by the release date (date
). Columns:
Release date.
Number of newly released Polygenic Scores.
Number of newly released PGS Performance Metrics.
Number of newly released PGS Publications.
pgs_ids
A table of released Polygenic Scores (PGS) identifiers. Columns:
Release date.
Polygenic Score (PGS) identifier. Example: "PGS000001"
.
ppm_ids
A table of the released PGS Performance Metrics identifiers. Columns:
Release date.
A PGS Performance Metrics identifier. Example: "PPM000001"
.
pgp_ids
A table of the released PGS Publication identifiers. Columns:
Release date.
PGS Publication identifier. Example: "PGP000001"
.
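Since the identifier tables repeat the release date on every row, they can be tallied per release. The sketch below does this on a hand-built stand-in for the pgs_ids table; the dates are placeholders, and it is only an assumption that such a tally corresponds to the count column of the releases table.

```r
# Hypothetical stand-in for the pgs_ids table of a releases object:
# one row per released PGS identifier. Dates are placeholders.
pgs_ids <- data.frame(
  date = as.Date(c("2000-01-01", "2000-01-01", "2000-02-01")),
  pgs_id = c("PGS000001", "PGS000002", "PGS000003"),
  stringsAsFactors = FALSE
)

# Number of PGS identifiers per release date (named integer vector).
n_per_release <- vapply(split(pgs_ids$pgs_id, pgs_ids$date), length, integer(1))
n_per_release
```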
The sample_sets object consists of four tables (slots) that combined form a
relational database of a subset of PGS Catalog sample sets. Each sample set
is an observation (row) in the sample_sets
table (first table).
sample_sets
A table of sample sets. Each sample set (row) is uniquely
identified by the column pss_id
. Columns:
A PGS Sample Set identifier. Example: "PSS000042"
.
samples
A table of samples. Each sample (row) is uniquely identified by
the combination of values from the columns: pss_id
and
sample_id
. Columns:
A PGS Sample Set identifier. Example: "PSS000042"
.
Sample identifier. This is a surrogate key to identify each sample.
Sample stage: should always be Evaluation ("eval"
).
Number of individuals included in the sample.
Number of cases.
Number of controls.
Percentage of male participants.
Detailed phenotype description.
Author reported ancestry is mapped to the best matching
ancestry category from the NHGRI-EBI GWAS Catalog framework (see
ancestry_categories
) for possible values.
A more detailed description of sample ancestry that usually matches the most specific description described by the authors (e.g. French, Chinese).
Author reported countries of recruitment (if available).
Any additional description not captured in the other columns (e.g. founder or genetically isolated populations, or further description of admixed samples).
Associated GWAS Catalog study accession identifier, e.g.,
"GCST002735"
.
PubMed identifier.
Any additional description about the samples (e.g. sub-cohort information).
demographics
A table of sample demographics' variables. Each
demographics' variable (row) is uniquely identified by the combination of
values from the columns: pss_id
, sample_id
, and
variable
. Columns:
A PGS Sample Set identifier. Example: "PSS000042"
.
Sample identifier. This is a surrogate identifier to identify each sample.
Demographics variable. The following columns describe the indicated variable.
Type of statistical estimate for variable.
The variable's statistical value.
Unit of the variable.
Measure of statistical dispersion for variable, e.g. standard error (se) or standard deviation (sd).
The value of the measure of dispersion.
Type of statistical interval for variable: range, iqr (interquartile), ci (confidence interval).
Interval lower bound.
Interval upper bound.
cohorts
A table of cohorts. Each cohort (row) is uniquely identified by
the combination of values from the columns: pss_id
, sample_id
and cohort_symbol
. Columns:
A PGS Sample Set identifier. Example: "PSS000042"
.
Sample identifier. This is a surrogate key to identify each sample.
Cohort symbol.
Cohort full name.
The scores object consists of nine tables (slots) that combined form a
relational database of a subset of PGS Catalog polygenic scores. Each score
is an observation (row) in the scores
table (the first table).
scores
A table of polygenic scores. Each polygenic score (row) is
uniquely identified by the pgs_id
column. Columns:
Polygenic Score (PGS) identifier. Example: "PGS000001"
.
This may be the name that the authors describe the PGS with
in the source publication, or a name that a curator of the PGS Catalog has
assigned to identify the score during the curation process (before a PGS
identifier has been given). Example: PRS77_BC
.
URL to the scoring file on the PGS FTP server. Example:
"http://ftp.ebi.ac.uk/pub/databases/spot/pgs/scores/PGS000001/ScoringFiles/PGS000001.txt.gz"
.
Indicate if the PGS data matches the published
polygenic score (TRUE
). If not (FALSE
), the authors have
provided an alternative polygenic score for the Catalog and some other data, such
as performance metrics, may differ from the publication.
The author-reported trait that the PGS has been
developed to predict. Example: "Breast Cancer"
.
Any additional description not captured
in the other columns. Example: "Femoral neck BMD (g/cm2)"
.
The name or description of the method or computational algorithm used to develop the PGS.
A description of the inputs and parameters relevant to the PGS development method/process.
Number of variants used to calculate the PGS.
Number of higher-order variant interactions included in the PGS.
The version of the genome assembly that the variants present
in the PGS are associated with. Example: GRCh37
.
The PGS Catalog distributes its data according to EBI's standard Terms of Use. Some PGS have specific terms, licenses, or restrictions (e.g. non-commercial use) that we highlight in this field, if known.
publications
A table of publications. Each publication (row) is
uniquely identified by the pgp_id
column. Columns:
Polygenic Score (PGS) identifier.
PGS Publication identifier. Example: "PGP000001"
.
PubMed
identifier. Example: "25855707"
.
Publication date. Example: "2020-09-28"
. Note
that the class of publication_date
is Date
.
Abbreviated name of the journal. Example: "Am J Hum
Genet"
.
Publication title.
First author of the publication. Example:
'Mavaddat N'
.
Digital Object Identifier (DOI). This variable is also curated to
allow unpublished work (e.g. preprints) to be added to the catalog. Example:
"10.1093/jnci/djv036"
.
samples
A table of samples. Each sample (row) is uniquely identified by
the combination of values from the columns: pgs_id
and
sample_id
. Columns:
Polygenic score identifier. An identifier that starts with
'PGS'
and is followed by six digits, e.g. 'PGS000001'
.
Sample identifier. This is a surrogate key to identify each sample.
Sample stage: either "discovery"
or "training"
.
Number of individuals included in the sample.
Number of cases.
Number of controls.
Percentage of male participants.
Detailed phenotype description.
Author reported ancestry is mapped to the best matching
ancestry category from the NHGRI-EBI GWAS Catalog framework (see
ancestry_categories
) for possible values.
A more detailed description of sample ancestry that usually matches the most specific description described by the authors (e.g. French, Chinese).
Author reported countries of recruitment (if available).
Any additional description not captured in the other columns (e.g. founder or genetically isolated populations, or further description of admixed samples).
Associated GWAS Catalog study accession identifier, e.g.,
"GCST002735"
.
PubMed identifier.
Any additional description about the samples (e.g. sub-cohort information).
demographics
A table of sample demographics' variables. Each
demographics' variable (row) is uniquely identified by the combination of
values from the columns: pgs_id
, sample_id
and
variable
. Columns:
Polygenic Score (PGS) identifier.
Sample identifier. This is a surrogate identifier to identify each sample.
Demographics variable. The following columns describe the indicated variable.
Type of statistical estimate for variable.
The variable's statistical value.
Unit of the variable.
Measure of statistical dispersion for variable, e.g. standard error (se) or standard deviation (sd).
The value of the measure of dispersion.
Type of statistical interval for variable: range, iqr (interquartile), ci (confidence interval).
Interval lower bound.
Interval upper bound.
cohorts
A table of cohorts. Each cohort (row) is uniquely identified by
the combination of values from the columns: pgs_id
, sample_id
and cohort_symbol
. Columns:
Polygenic Score (PGS) identifier.
Sample identifier. This is a surrogate key to identify each sample.
Cohort symbol.
Cohort full name.
traits
A table of EFO traits. Each trait (row) is uniquely identified
by the combination of the columns pgs_id
and efo_id
. Columns:
Polygenic Score (PGS) identifier.
An EFO identifier.
Trait name.
Detailed description of the trait from EFO.
External link to the EFO entry.
stages_tally
A table of sample sizes and number of sample sets at each stage.
Polygenic Score (PGS) identifier.
Sample stage: either "gwas"
, "dev"
or "eval"
.
Sample size.
Number of sample sets (only meaningful for the evaluation stage "eval"
)
ancestry_frequencies
This table describes the ancestry composition at each stage.
Polygenic Score (PGS) identifier.
Sample stage: either "gwas"
, "dev"
or "eval"
.
Ancestry class symbol.
Ancestry fraction (percentage).
multi_ancestry_composition
A table of a breakdown of the ancestries included in multi-ancestries.
Polygenic Score (PGS) identifier.
Sample stage: either "gwas"
, "dev"
or "eval"
.
Multi-ancestry class symbol.
Ancestry class symbol.
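Because the tables of a scores object share the pgs_id column, they can be combined with an ordinary join. Here is a minimal sketch on hand-built stand-ins for the scores and traits tables. Only `pgs_id` and `efo_id` are column names stated in the text; `pgs_name` and `trait` are assumed names, and the EFO and trait values are placeholders. With a real object the tables would be extracted from its slots with `@`.

```r
# Stand-in for the scores table: one row per polygenic score.
scores_tbl <- data.frame(
  pgs_id = c("PGS000001", "PGS000002"),
  pgs_name = c("PRS77_BC", "name_placeholder"),  # assumed column name
  stringsAsFactors = FALSE
)

# Stand-in for the traits table: one row per (pgs_id, efo_id) pair.
traits_tbl <- data.frame(
  pgs_id = c("PGS000001", "PGS000002", "PGS000002"),
  efo_id = c("EFO_placeholder_1", "EFO_placeholder_1", "EFO_placeholder_2"),
  trait = c("trait A", "trait A", "trait B"),    # assumed column name
  stringsAsFactors = FALSE
)

# Join on the shared key: a score with two traits yields two rows.
scores_with_traits <- merge(scores_tbl, traits_tbl, by = "pgs_id")
scores_with_traits

# Check the documented composite key of the traits table.
key_is_unique <- !any(duplicated(traits_tbl[c("pgs_id", "efo_id")]))
key_is_unique
```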
Performs set union, intersection, and (asymmetric!) difference on two objects
of either class scores, publications, traits,
performance_metrics, sample_sets, cohorts or
trait_categories. Note that union()
removes duplicated
entities, whereas bind()
does not.
union(x, y, ...)
intersect(x, y, ...)
setdiff(x, y, ...)
setequal(x, y, ...)
x, y |
Objects of either class scores, publications, traits, performance_metrics, sample_sets, cohorts or trait_categories. |
... |
other arguments passed on to methods. |
In the case of union()
, intersect()
, or setdiff()
: an object of
the same class as x
and y
. In the case of setequal()
, a
logical scalar.
# Get some `scores` objects:
my_scores_1 <- get_scores(c('PGS000012', 'PGS000013'))
my_scores_2 <- get_scores(c('PGS000013', 'PGS000014'))

#
# union()
#
# NB: with `union()`, PGS000013 is not repeated.
union(my_scores_1, my_scores_2)@scores

#
# intersect()
#
intersect(my_scores_1, my_scores_2)@scores

#
# setdiff()
#
setdiff(my_scores_1, my_scores_2)@scores

#
# setequal()
#
setequal(my_scores_1, my_scores_2)
setequal(my_scores_1, my_scores_1)
setequal(my_scores_2, my_scores_2)
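The same semantics can be previewed offline on plain character vectors of identifiers. This is only an analogy using base R functions, not the package's methods: `c()` stands in for `bind()` (duplicates kept), and the two `setdiff()` calls show why the difference is called asymmetric.

```r
ids_1 <- c("PGS000012", "PGS000013")
ids_2 <- c("PGS000013", "PGS000014")

# Union drops the repeated identifier.
u <- base::union(ids_1, ids_2)

# Plain concatenation keeps it, analogous to bind().
b <- c(ids_1, ids_2)

# Asymmetric difference: the result depends on argument order.
d12 <- base::setdiff(ids_1, ids_2)
d21 <- base::setdiff(ids_2, ids_1)

# Set equality ignores order and duplicates.
same <- base::setequal(ids_1, rev(ids_1))
```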
A dataset containing the various study stages assigned to samples in the PGS Catalog.
stages
A data frame with 5 stages (rows) and 4 columns:
Study stage.
One-letter symbol for the stage, or a comma separated combination thereof.
Stage name.
Stage description.
https://www.pgscatalog.org/docs/ancestry
stages
Map GWAS study identifiers to PGS identifiers.
study_to_pgs(study_id, verbose = FALSE, warnings = TRUE, progress_bar = TRUE)
study_id |
A character vector of GWAS Catalog study accession identifiers, e.g., "GCST001937". |
verbose |
A |
warnings |
A |
progress_bar |
Whether to show a progress bar as the queries are performed. |
A data frame of two columns: study_id
and pgs_id
.
## Not run:
study_to_pgs('GCST001937')
study_to_pgs(c('GCST000998', 'GCST000338'))
## End(Not run)
The trait_categories object consists of two tables (slots) that combined form
a relational database of a subset of PGS Catalog trait categories. Each trait category
is an observation (row) in the trait_categories
table (first table).
trait_categories
A table of trait categories. Columns:
Trait category name.
traits
A table of associated traits. Columns:
Trait category name.
An EFO identifier.
Trait name.
Detailed description of the trait from EFO.
External link to the EFO entry.
The traits object consists of six slots, each a table
(tibble
), that combined form a relational database of a
subset of PGS Catalog traits. Each trait is an observation (row) in
the traits
table (the main table). All tables have the column
efo_id
as primary key.
traits
A table of traits. Columns:
pgs_ids
A table of associated polygenic score identifiers. Columns:
child_pgs_ids
A table of polygenic score identifiers associated with the child traits. Columns:
trait_categories
A table of associated trait categories. Columns:
trait_synonyms
A table of associated trait synonyms. Columns:
trait_mapped_terms
A table of associated external references, identifiers or other terms. Columns:
This function exports a PGS Catalog object to a Microsoft Excel xlsx file. Each table (slot) is saved in its own sheet.
write_xlsx(x, file = stop("`file` must be specified"))
x |
A scores, publications, traits, performance_metrics, sample_sets, cohorts, trait_categories or releases object. |
file |
A file name to write to. |
No return value, called for its side effect.