| Title: | Check Research Outputs for Best Practices |
|---|---|
| Description: | A modular, extendable system for checking research outputs for best practices using text search, R code, and/or (optional) LLM queries. |
| Authors: | Lisa DeBruine [aut, cre] (ORCID: <https://orcid.org/0000-0002-7523-5539>), Cristian Mesquida [aut] (ORCID: <https://orcid.org/0000-0002-1542-8355>), Jakub Werner [aut] (ORCID: <https://orcid.org/0009-0004-0388-0362>), Raphael Merz [ctb] (ORCID: <https://orcid.org/0000-0002-9474-3379>), Lukas Wallrich [ctb] (ORCID: <https://orcid.org/0000-0003-2121-5177>), Daniel Lakens [aut] (ORCID: <https://orcid.org/0000-0002-0247-239X>) |
| Maintainer: | Lisa DeBruine <[email protected]> |
| License: | AGPL (>= 3) |
| Version: | 0.1.0 |
| Built: | 2026-06-25 14:58:44 UTC |
| Source: | https://github.com/scienceverse/metacheck |
Signal detection values for modules that classify papers as having a feature or not
accuracy(expected, observed)
expected |
a vector of logical values for the expected values |
observed |
a vector of logical values for the observed values |
a list of accuracy parameters
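As a rough illustration of what signal detection values look like for a classifier, the sketch below counts hits, misses, false alarms, and correct rejections from two logical vectors. The helper name sdt_counts() and the names of the returned list items are assumptions made for this example; they are not the documented output of accuracy().

```r
# Hypothetical helper for illustration; not the metacheck implementation
sdt_counts <- function(expected, observed) {
  stopifnot(is.logical(expected), is.logical(observed),
            length(expected) == length(observed))

  hits               <- sum(expected & observed)    # feature present, detected
  misses             <- sum(expected & !observed)   # feature present, not detected
  false_alarms       <- sum(!expected & observed)   # feature absent, detected
  correct_rejections <- sum(!expected & !observed)  # feature absent, not detected

  list(
    hits = hits,
    misses = misses,
    false_alarms = false_alarms,
    correct_rejections = correct_rejections,
    accuracy = (hits + correct_rejections) / length(expected),
    sensitivity = hits / (hits + misses),
    specificity = correct_rejections / (correct_rejections + false_alarms)
  )
}

expected <- c(TRUE, TRUE, FALSE, FALSE, TRUE)
observed <- c(TRUE, FALSE, FALSE, TRUE, TRUE)
res <- sdt_counts(expected, observed)
str(res)
```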
Match table from bib table
add_bib_match(paper, min_score = 50)
paper |
a paper or paperlist object |
min_score |
minimal score that is taken to be a reliable match |
the paper or paperlist with bib_match table added
## Not run:
paper <- demopaper()
paper$bib_match <- NULL # remove existing
paper2 <- add_bib_match(paper)
paper2$bib_match
## End(Not run)
Retrieve info from AsPredicted by URL
aspredicted_info(ap_url, id_col = 1, wait = 1)
ap_url |
an AsPredicted URL, or a table containing them (e.g., as created by |
id_col |
the index or name of the column that contains AsPredicted URLs, if ap_url is a table |
wait |
wait time in seconds |
a data frame of information
Find AsPredicted Links in Papers
aspredicted_links(paper)
paper |
a paper object or paperlist object |
a table with the AsPredicted url in the first (href) column
aspredicted_links(psychsci)
Sends one or more input sentences to a public Gradio app hosted on Hugging Face (the lakens-causal-sentences Space), created based on code by Rasoul Norouzi, retrieves the result via Server-Sent Events (SSE), and returns a tidy data frame with one row per detected cause–effect relation.
causal_relations(
  sentence,
  rel_mode = "auto",
  rel_threshold = 0.5,
  cause_decision = "cls+span",
  timeout = 10,
  verbose = FALSE
)
sentence |
A character vector of one or more sentences to analyze for causal relations. |
rel_mode |
Relation extraction mode. Options are |
rel_threshold |
Numeric threshold (default |
cause_decision |
Strategy for cause/effect detection. Options: |
timeout |
Maximum time (in seconds) to wait for the Hugging Face Space to return a result via SSE before aborting. Default is |
verbose |
Logical; if |
The function uses Gradio’s two-step queue API:
(1) a POST request enqueues the job and returns an event_id;
(2) a GET request streams text/event-stream frames until event: complete.
Many Gradio apps emit a double-encoded completion payload of the form
["<final JSON string>"]. This function unwraps that to obtain the final JSON
structure (an array of items containing causal and relations) before parsing.
If a sentence has no relations, the output includes one row with cause = NA,
effect = NA, and the sentence’s causal flag (as returned by the model). For
sentences with multiple relations, the function returns one row per relation.
A base data.frame with columns:
sentence (character): the original input sentence,
causal (logical): whether the sentence is causal per the model,
cause (character): extracted cause span (or NA),
effect (character): extracted effect span (or NA).
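The row layout described above (one row per relation, and a single NA row for sentences without relations) can be sketched in base R. This is only an illustration under stated assumptions: the items list stands in for an already-parsed payload, the field names text, cause, and effect are guesses, and flatten_relations() is a hypothetical helper, not the function's internal code.

```r
# Hypothetical parsed payload; field names other than `causal` and
# `relations` are assumptions made for this sketch
items <- list(
  list(text = "Smoking causes cancer.", causal = TRUE,
       relations = list(list(cause = "Smoking", effect = "cancer"))),
  list(text = "Stress and poor sleep increase blood pressure.", causal = TRUE,
       relations = list(
         list(cause = "Stress", effect = "blood pressure"),
         list(cause = "poor sleep", effect = "blood pressure")
       )),
  list(text = "The sky is blue.", causal = FALSE, relations = list())
)

flatten_relations <- function(items) {
  rows <- lapply(items, function(item) {
    rels <- item$relations
    if (length(rels) == 0) {
      # no relations: keep the sentence with NA cause and effect
      return(data.frame(sentence = item$text, causal = item$causal,
                        cause = NA_character_, effect = NA_character_,
                        stringsAsFactors = FALSE))
    }
    # one row per relation; sentence and causal are recycled
    data.frame(
      sentence = item$text,
      causal = item$causal,
      cause = vapply(rels, function(r) r$cause, character(1)),
      effect = vapply(rels, function(r) r$effect, character(1)),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

df <- flatten_relations(items)
print(df)
```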
Norouzi, R., Kleinberg, B., Vermunt, J. K., & van Lissa, C. J. (2025). Capturing causal claims: A fine-tuned text mining model for extracting causal sentences from social science papers. Research Synthesis Methods, 16(1), 139–156. https://doi.org/10.1017/rsm.2024.13
Hugging Face Model Card: rasoultilburg/SocioCausaNet https://huggingface.co/rasoultilburg/SocioCausaNet
## Not run:
# Single sentence
df1 <- causal_relations("Smoking causes cancer")
print(df1)
# Multiple sentences (batch)
df2 <- causal_relations(c("Insomnia causes depression.", "Rain leads to flooding."))
print(df2)
# Custom parameters and verbose diagnostics
df3 <- causal_relations(
  sentence = "Stress increases blood pressure.",
  rel_mode = "auto",
  rel_threshold = 0.4,
  cause_decision = "cls+span",
  timeout = 10,
  verbose = TRUE
)
print(df3)
## End(Not run)
Check code for the presence of absolute paths
code_abs_path(code_text)
code_text |
the text of the code, excluding comments |
a vector of absolute paths
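One plausible way to detect absolute paths is a regular expression over quoted strings that begin with a drive letter or a leading slash. The sketch below uses only base R; the helper find_abs_paths() and its pattern are assumptions for illustration and may differ from what code_abs_path() actually does.

```r
# Hypothetical helper for illustration; not the metacheck implementation
find_abs_paths <- function(code_text) {
  # a quoted string starting with a drive letter (C:/ or C:\) or with "/"
  pattern <- "[\"']([A-Za-z]:[/\\\\]|/)[^\"']*[\"']"
  quoted <- unlist(regmatches(code_text, gregexpr(pattern, code_text)))
  # drop the surrounding quotes
  gsub("^[\"']|[\"']$", "", quoted)
}

code_text <- c(
  "file <- 'C:/User/lakens/file.R'",
  "tmp <- '/User/lakens/file.html'",
  "convert(file, tmp)"
)
paths <- find_abs_paths(code_text)
print(paths)
```

A pattern this simple will miss paths built with file.path() or paste(), so it should be read as a starting point only.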
code_text <- c(
  "file <- 'C:/User/lakens/file.R'",
  "tmp <- '/User/lakens/file.html'",
  "convert(file, tmp)"
)
code_abs_path(code_text)
Convert Rmd/qmd files to R code only
code_extract_r(
  file_path = NULL,
  save_path = NULL,
  documentation = 0,
  text = NULL
)
file_path |
a vector of file paths to check |
save_path |
if NULL, returns a text vector, else a path to save to |
documentation |
0:2 value to pass to knitr::purl |
text |
alternative to file_path, pass text directly |
a character vector
file_path <- demofile("qmd")
code_text <- code_extract_r(file_path)
Get files referenced in code
code_file_refs(code_text, lang = c("R", "SPSS", "SAS", "Stata"))
code_text |
the code text for a single file |
lang |
the language (we only currently handle R, SPSS, SAS, Stata) |
a vector of files that are referenced in the code
code_text <- c(
  'source("functions.R")',
  'a <- "bread"',
  'b <- read.csv("file.csv")'
)
code_file_refs(code_text, "R")
Detects code language used in files, only for languages metacheck currently processes (R, SAS, SPSS, Stata).
code_lang(file_name)
file_name |
a vector of file names |
a vector of languages
file_name <- "file.R"
code_lang(file_name)
file_name <- c("file.Rmd", "file.SAS", "file.r", "file.qmd", "file.txt")
code_lang(file_name)
Returns the lines on which library/require calls exist. This is a helper function for the code_check module.
code_library_lines(code_text, lang = c("R", "SPSS", "SAS", "Stata"))
code_text |
the code text for a single file |
lang |
the language (we only currently handle R, SPSS, SAS, Stata) |
a data frame with columns code and line (the line numbers on which library calls exist, after removing blank lines and comments)
code_text <- c(
  "library(dplyr)",
  "",
  "# this line won't count",
  "library(tidyr)",
  "renv::install('metacheck')"
)
code_library_lines(code_text, "R")
Get Code Composition Stats
code_line_stats(code_text, lang = c("R", "SPSS", "SAS", "Stata"))
code_text |
the code text for a single file |
lang |
the language (we only currently handle R, SPSS, SAS, Stata) |
list with items total_lines, comment_lines, code_lines, and percent_comment
code_text <- c(
  "library(dplyr)",
  "",
  "# this line is a comment",
  "a <- 1"
)
code_line_stats(code_text, "R")
Parse code to check for errors
code_parse_r(file_path = "", text = NULL)
file_path |
a vector of file paths to check |
text |
alternative to file_path, pass text directly |
a data frame with columns file_path and line
file_path <- demofile("qmd")
code_parse_r(file_path)
Read code from files
code_read(file_path)
file_path |
a file path or url to read in |
a character vector of the file contents
file_path <- demofile("json")
text <- code_read(file_path)
Remove comments from code text
code_remove_comments(code_text, lang = c("R", "SPSS", "SAS", "Stata"))
code_text |
the code text for a single file |
lang |
the language (we only currently handle R, SPSS, SAS, Stata) |
the code_text minus comment lines
code_text <- c(
  "# this is a comment",
  "",
  "x <- 'And this is code'"
)
code_text_nc <- code_remove_comments(code_text, "R")
A helper function for making module reports.
collapse_section(
  text,
  title = "Learn More",
  callout = c("tip", "note", "warning", "important", "caution"),
  collapse = TRUE
)
text |
The text to put in the collapsible section; vectors will be collapsed with line breaks between elements (e.g., into paragraphs) |
title |
The title of the collapse header |
callout |
the type of quarto callout block |
collapse |
whether to collapse the block at the start |
text
text <- c("Paragraph 1...", "Paragraph 2...")
collapse_section(text) |> cat()
Uses grobid or bibr to convert a file to paper format.
convert(
  file_path,
  save_path = ".",
  method = c("auto", "bibr", "grobid", "xml"),
  crossref_lookup = FALSE,
  keep_xml = TRUE,
  ...
)
file_path |
Path to the document file, or a directory of documents |
save_path |
Path to a directory in which to save the JSON file |
method |
whether to use bibr, grobid, or xml (grobid_to_bibr) to convert a file (see Details) |
crossref_lookup |
whether to add the bib_match table from crossref |
keep_xml |
if the method is grobid, whether to keep intermediate XML files |
... |
further arguments to pass to convert_bibr, convert_grobid, or grobid_to_bibr |
Both bibr and grobid can handle PDF files. Only bibr can convert doc or docx files. Already-converted grobid XML files can be converted to bibr format (set crossref_lookup=TRUE to add a bib_match table). If the file_path is a directory, the method will be xml if any XML files are present, and bibr if only doc or docx files are present.
the path to the JSON file
Converts document files (PDF, DOC, DOCX) to structured JSON using the bibr
extraction service. Supports two backends: the Scienceverse platform
("scivrs") which uses a job queue with load balancing, and a
self-hosted bibr instance ("selfhosted") for direct API access.
convert_bibr(
  file_path,
  save_path = ".",
  backend = c("auto", "scivrs", "selfhosted"),
  api_key = NULL,
  api_url = NULL,
  include_figures = FALSE,
  start_page = 1,
  end_page = Inf,
  poll_interval = 2,
  timeout = 600
)
file_path |
Path to the document file, or a directory of documents |
save_path |
Path to a directory in which to save the JSON file |
backend |
Which backend to use: |
api_key |
API key (scivrs backend only). A Bearer token starting with
|
api_url |
Base URL of the API. Defaults to the appropriate URL for the selected backend. |
include_figures |
Whether to include base64-encoded figure images in the output (default FALSE) |
start_page |
First page of the file to extract (default 1) |
end_page |
Last page of the file to extract (default Inf for all pages) |
poll_interval |
Seconds between status polls, scivrs backend only (default 2) |
timeout |
Maximum seconds to wait for processing, scivrs backend only (default 600) |
When backend = "auto" (the default), the "scivrs" backend is
used if api_key is provided or the SCIVRS_API_KEY environment
variable is set. Otherwise, "selfhosted" is used (no authentication
required).
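The backend selection rule described above can be sketched as a small base R function. The name choose_backend() is hypothetical and the code is only an illustration of the stated rule, not the package's internal logic.

```r
# Hypothetical helper for illustration; not the metacheck implementation
choose_backend <- function(backend = c("auto", "scivrs", "selfhosted"),
                           api_key = NULL) {
  backend <- match.arg(backend)
  # an explicit choice is respected as given
  if (backend != "auto") return(backend)
  # "auto": use scivrs when a key is supplied or set in the environment
  has_key <- !is.null(api_key) || nzchar(Sys.getenv("SCIVRS_API_KEY"))
  if (has_key) "scivrs" else "selfhosted"
}

choose_backend(api_key = "my-key")       # key supplied directly
choose_backend(backend = "selfhosted")   # explicit choice
choose_backend()                         # depends on SCIVRS_API_KEY
```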
Path(s) to the saved JSON file(s)
## Not run:
# Auto-detect backend from environment variables
pdf <- demofile("pdf")
convert_bibr(pdf)
# Explicitly use Scienceverse platform
convert_bibr(pdf, backend = "scivrs")
# Use self-hosted bibr instance
convert_bibr(pdf, backend = "selfhosted")
# Extract specific pages
convert_bibr(pdf, start_page = 1, end_page = 10)
# Directory of papers
dir <- system.file("demo", package = "metacheck")
convert_bibr(dir, save_path = "results/")
## End(Not run)
This function uses a GDPR-compliant public grobid server maintained by Eindhoven Technical University. You can set up your own local grobid server following instructions from https://grobid.readthedocs.io/ and set the argument api_url to its path (probably http://localhost:8070). See https://github.com/grobidOrg/grobid#demo for other publicly available servers (we cannot guarantee their privacy).
convert_grobid(
  file_path,
  save_path = ".",
  api_url = "https://grobid.hti.ieis.tue.nl",
  start_page = -1,
  end_page = -1,
  consolidate_citations = 0,
  consolidate_header = 0,
  consolidate_funders = 0
)
file_path |
path to the PDF, a vector of paths, or a directory name that contains PDFs |
save_path |
directory or file path to save to; set to NULL to return the XML directly |
api_url |
the URL to the grobid server |
start_page |
the first page of the PDF to read (defaults to -1 to read all pages) |
end_page |
the last page of the PDF to read (defaults to -1 to read all pages) |
consolidate_citations |
whether to fix/enhance citations |
consolidate_header |
whether to fix/enhance paper info |
consolidate_funders |
whether to fix/enhance funder info |
Consolidation of citations, headers, and funders looks up these items in CrossRef or another database to fix or enhance information (see https://grobid.readthedocs.io/en/latest/Consolidation/). This can slow down conversion. Consolidating headers is only useful for published papers, and can be set to 0 for work in prep. We recommend you leave these defaults at 0 and use crossref_lookup = TRUE when converting from grobid XML to bibr JSON format with the convert() function.
XML object
Valid selects for crossref API are:
crossref_doi(
  doi,
  select = c("DOI", "type", "title", "author", "container-title", "volume",
             "issue", "page", "URL", "abstract", "year", "error")
)
doi |
the DOI of the paper to get info for |
select |
what fields to select from the crossref API |
abstract, URL, resource, member, posted, score, created, degree, update-policy, short-title, license, ISSN, container-title, issued, update-to, issue, prefix, approved, indexed, article-number, clinical-trial-number, accepted, author, group-title, DOI, is-referenced-by-count, updated-by, event, chair, standards-body, original-title, funder, translator, published, archive, published-print, alternative-id, subject, subtitle, published-online, publisher-location, content-domain, reference, title, link, type, publisher, volume, references-count, ISBN, issn-type, assertion, deposited, page, content-created, short-container-title, relation, editor
data frame with DOIs and info
doi <- "10.7717/peerj.4375"
## Not run:
# cr_info <- crossref_doi(doi)
## End(Not run)
Look up Reference in CrossRef
crossref_query(
  ref,
  min_score = 50,
  rows = 1,
  select = c("DOI", "score", "type", "title", "author", "editor", "publisher",
             "container-title", "year", "volume", "issue", "page", "URL")
)
ref |
the full text reference of the paper to get info for, see Details |
min_score |
minimal score that is taken to be a reliable match (default 50) |
rows |
the maximum number of rows to return per reference (default 1) |
select |
what fields to select from the crossref API |
The argument ref can take many formats. Crossref queries only look for authors, title, and container-title (e.g., journal or book), but extra information doesn't seem to hurt.
a text reference or fragment
a bibentry object (authors, title and container will be extracted)
a vector of text or bibentry objects
a paper object (the bib table will be extracted)
Valid selects for this route are: abstract, URL, resource, member, posted, score, created, degree, update-policy, short-title, license, ISSN, container-title, issued, update-to, issue, prefix, approved, indexed, article-number, clinical-trial-number, accepted, author, group-title, DOI, is-referenced-by-count, updated-by, event, chair, standards-body, original-title, funder, translator, published, archive, published-print, alternative-id, subject, subtitle, published-online, publisher-location, content-domain, reference, title, link, type, publisher, volume, references-count, ISBN, issn-type, assertion, deposited, page, content-created, short-container-title, relation, editor
doi
ref <- paste(
  "Lakens, D., Mesquida, C., Rasti, S., & Ditroilo, M. (2024).",
  "The benefits of preregistration and Registered Reports.",
  "Evidence-Based Toxicology, 2(1)."
)
## Not run:
cr <- crossref_query(ref)
## End(Not run)
Doi.org Info from DataCite
datacite_doi(doi)
doi |
the DOI(s) to get info for |
bib_match data frame
doi <- "10.5281/zenodo.2669586"
## Not run:
doi_info <- datacite_doi(doi)
## End(Not run)
Return the file path for various versions of the demo paper. Use demopaper() to directly read it as a paper object from the json file.
demofile(ext = c("json", "pdf", "docx", "doc", "xml", "qmd"))
ext |
the extension of the file |
file path
json <- demofile()
pdf <- demofile("pdf")
Get demo paper
demopaper()
paper object
paper <- demopaper()
Clean DOIs
doi_clean(doi)
doi |
a character vector of one or more DOIs |
a character vector of cleaned DOIs (no https://doi.org or DOI:)
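Stripping a resolver URL or a "DOI:" label can be done with two regular expressions. The sketch below is a base R illustration under the assumption that only these two prefix styles need handling; strip_doi_prefix() is a hypothetical name and may not match what doi_clean() does internally.

```r
# Hypothetical helper for illustration; not the metacheck implementation
strip_doi_prefix <- function(doi) {
  doi <- trimws(doi)
  # remove a leading resolver URL such as https://doi.org/
  doi <- sub("^https?://(dx\\.)?doi\\.org/", "", doi, ignore.case = TRUE)
  # remove a leading "doi:" label and any spaces after it
  sub("^doi:[[:space:]]*", "", doi, ignore.case = TRUE)
}

dois <- c(
  "https://doi.org/10.1038/nphys1170",
  "doi:10.1038/nphys1170",
  "DOI: 10.1038/nphys1170"
)
cleaned <- strip_doi_prefix(dois)
print(cleaned)
```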
doi_clean("https://doi.org/10.1038/nphys1170")
doi_clean("doi:10.1038/nphys1170")
doi_clean("DOI: 10.1038/nphys1170")
Doi.org Info from DOI
doi_lookup(doi)
doi |
the DOI(s) to get info for |
data frame with DOIs and info
doi <- "10.7717/peerj.4375"
## Not run:
doi_info <- doi_lookup(doi)
## End(Not run)
Checks the doi.org API to see if a DOI is registered and has an associated URL
(using https://doi.org/api/handles). Returns TRUE if it does, FALSE if the DOI
does not exist or does not have an associated URL, and NA if the test failed.
Clearly invalid DOIs (i.e. not starting with "10.") will return FALSE without
server requests.
doi_resolves(doi, timeout = 10)
doi |
Character vector. One or more DOIs to check. |
timeout |
Numeric. Request timeout in seconds. Default is |
Logical vector. For each input DOI, returns TRUE if the DOI resolves, FALSE if it does not resolve (or does not start with 10.), and NA if the check failed.
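The three-valued result (TRUE, FALSE, or NA) and the shortcut for clearly invalid DOIs can be illustrated without any network access. In this sketch the check argument is a stand-in for the real request to doi.org, and both resolves_sketch() and fake_check() are hypothetical names introduced only for the example.

```r
# Hypothetical helper for illustration; not the metacheck implementation
resolves_sketch <- function(doi, check) {
  vapply(doi, function(d) {
    # clearly invalid DOIs return FALSE without calling the checker
    if (!startsWith(d, "10.")) return(FALSE)
    # a failed check (e.g., a timeout) becomes NA rather than an error
    tryCatch(check(d), error = function(e) NA)
  }, logical(1), USE.NAMES = FALSE)
}

# stand-in for the server request: only one DOI is "registered"
fake_check <- function(d) d == "10.1038/nphys1170"
ok <- resolves_sketch(
  c("10.1038/nphys1170", "10.1234/invalid.doi", "not-a-doi"),
  fake_check
)
print(ok)

# a checker that always fails yields NA
failed <- resolves_sketch("10.1038/nphys1170", function(d) stop("timeout"))
print(failed)
```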
## Not run:
doi_resolves("10.1038/nphys1170") # Expected: TRUE
doi_resolves("10.1234/invalid.doi") # Expected: FALSE
## End(Not run)
Validate DOI format
doi_valid_format(doi)
doi |
a character vector of one or more DOIs |
a logical vector
doi_valid_format("10.1038/nphys1170")
doi_valid_format("no.no.10.1038")
Set or get email
email(email = NULL)
email |
if a string, sets the email |
the current option value (character)
email()
Useful emojis
emojis
An object of class list of length 32.
General: "check", "star", "warning", "stop", "x", "no", "thumbs_up", "thumbs_down", "info", "question"
Traffic Lights: "tl_green", "tl_yellow", "tl_red", "tl_info", "tl_na", "tl_fail"
Hearts: "red", "orange", "yellow", "green", "blue", "purple", "brown", "black", "white", "pink"
List all equations in the text, returning the matched text (e.g., 't(28) = 2.4', 'p = 0.04') and document location in a table. This is the canonical extractor for reported statistics and effect sizes; modules that need statistics should read from this table rather than re-scanning the text.
extract_eq(paper)
paper |
a paper object or paperlist object |
This will catch most comparators like =<>~ and most versions of scientific notation like 5.0 x 10^-2 or 5.0e-2. If you find any formats that are not correctly handled by this function, please contact the author.
a data frame with one row per equation and the columns lhs (the statistic name, e.g. "t", "F", "p"), df (parenthetical degrees of freedom such as "(28)" or "(2, 57)", otherwise NA), comp (the comparator, e.g. "="), rhs (the reported value as text), grp_id (groups equations in the same sentence), text_id, and paper_id.
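To make the lhs, df, comp, and rhs columns concrete, the sketch below splits two reported statistics out of a sentence with a deliberately simplified regular expression. The pattern is an assumption for illustration; it ignores scientific notation and many formats that extract_eq() is described as handling, and it does not produce the grp_id, text_id, or paper_id columns.

```r
# Simplified pattern for illustration; not the one used by extract_eq()
pattern <- paste0(
  "([A-Za-z]+)\\s*",          # lhs: statistic name
  "(\\([0-9., ]+\\))?\\s*",   # df: optional parenthetical degrees of freedom
  "([=<>~])\\s*",             # comp: comparator
  "(-?[0-9]*\\.?[0-9]+)"      # rhs: reported value, kept as text
)

text <- "The effect was significant, t(28) = 2.4, p = 0.04."

# find every equation, then split each one into its capture groups
matches <- regmatches(text, gregexpr(pattern, text))[[1]]
parts <- regmatches(matches, regexec(pattern, matches))

eq <- do.call(rbind, lapply(parts, function(p) {
  df_missing <- is.na(p[3]) || !nzchar(p[3])
  data.frame(
    lhs = p[2],
    df = if (df_missing) NA_character_ else p[3],
    comp = p[4],
    rhs = p[5],
    stringsAsFactors = FALSE
  )
}))
print(eq)
```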
paper <- demopaper()
equations <- extract_eq(paper)
List all p-values in the text, returning the matched text (e.g., 'p = 0.04') and document location in a table.
extract_p_values(paper)
paper |
a paper object or paperlist object |
Note that this will not catch p-values reported like "the p-value is 0.03" because that results in a ton of false positives when papers discuss p-value thresholds. If you need to detect text like that, use the text_search() function and a custom pattern.
This will catch most comparators like =<>~ and most versions of scientific notation like 5.0 x 10^-2 or 5.0e-2. If you find any formats that are not correctly handled by this function, please contact the author.
a table
paper <- demopaper()
p_values <- extract_p_values(paper)
Get a table of URLs from a paper or paperlist. Matches urls that start with http or doi:
extract_urls(paper)
paper |
a paper object or paperlist object |
a table
paper <- demopaper()
urls <- extract_urls(paper)
View a figure image
fig_image_view(paper, figure_id = 1)
paper |
a paper object |
figure_id |
the id for the figure to show |
plots the figure
paper <- demopaper()
fig_image_view(paper, 1)
fig_image_view(paper, 2)
Categorise files
file_category(contents)
contents |
a table with columns name, path such as from |
the table with new column file_category
contents <- c("script.R", "data.csv", "README", "codebook.csv")
file_category(contents)
Get file Type from Extension
filetype(filename)
filename |
the file name |
a named vector of file types
filetype("script.R")
FLoRA database containing DOIs of original studies and replications. Use FLoRA_date() to find the date it was downloaded, and FLoRA_update() to update it.
FLoRA()
A data frame with 8 columns:
DOI of original study
APA reference of original study
DOI of replication study (may be NA if url_r is provided)
APA reference of replication study
URL of replication study (used when DOI is not available)
replication outcome
quote describing replication outcome
replication or reproduction
a data frame
https://osf.io/9r62x/files/t4j8f
FLoRA()
Get date FLoRA was updated
FLoRA_date()
the date
FLoRA_date()
metacheck comes with a built-in data frame called FLoRA. We update it regularly, but you can use this function to download the newest version. The download is >5MB, but this function will summarise the information into a smaller version and delete the original file.
FLoRA_update()
the path to the data frame (invisibly)
Formats a structured author list (data frame with given/family columns) as a display string.
format_bib_authors(authors)
authors |
a data frame with |
a character string (or vector) of formatted author names
authors <- data.frame(given = c("Alice H.", "Wendy"), family = c("Eagly", "Wood"))
format_bib_authors(authors)
Format a reference for display in a report.
format_ref(bib)
bib |
a bibentry object or list of bibentry objects |
The argument bib should be a bibentry object (e.g., like those made by citation()), but it can also handle a bibtex object or a bibtex-formatted character vector. If these do not read in as valid bibtex, the original text of bib will be returned unformatted.
formatted text
mc <- citation("metacheck")
format_ref(mc)
# handles bibtex
bib_mc <- utils::toBibtex(mc)
format_ref(bib_mc)
paper <- demopaper()
format_ref(paper$bib$ref[1:2])
A helper for creating modules. Checks for previous module outputs in a chain and returns the named list item if it exists in any parent environment.
get_prev_outputs(module, item, parent_n = 2)
module |
the name of a previously run module |
item |
the name of the list item to extract |
parent_n |
the number of parents to traverse up the chain. Normally 2 if you are calling this from a module function, but maybe more if you are calling it from a helper function. |
the extracted list item, or NULL if not found
# .__mc__prev_outputs is usually created by `module_run()`
.__mc__prev_outputs <- list(mod_1 = list(a = 1, b = 2))
f <- function(item) {
  get_prev_outputs("mod_1", item)
}
f("a")
f("d")
Get File List from GitHub
github_files(repo, dir = "", recursive = FALSE)
repo |
The URL of the repository (in the format "username/repo" or "https://github.com/username/repo") |
dir |
an optional directory name to search |
recursive |
whether to search the files recursively |
a data frame of files
## Not run:
github_files("scienceverse/metacheck")
## End(Not run)
Get GitHub Repo Info
github_info(repo, recursive = FALSE)
repo |
The URL of the repository (in the format "username/repo" or "https://github.com/username/repo") |
recursive |
whether to search the files recursively |
a list of information about the repo
## Not run:
github_info("scienceverse/metacheck")
## End(Not run)
Get Languages from GitHub Repo
github_languages(repo)
repo |
The URL of the repository (in the format "username/repo" or "https://github.com/username/repo") |
vector of languages
## Not run:
github_languages("scienceverse/metacheck")
## End(Not run)
GitHub links can be in PDFs in several ways.
github_links(paper)
paper |
a paper object or paperlist object |
a table with the GitHub url in the first (text) column
github_links(psychsci)
Get README from GitHub
github_readme(repo)
repo |
The URL of the repository (in the format "username/repo" or "https://github.com/username/repo") |
a character string of the README contents
## Not run:
github_readme("scienceverse/metacheck")
## End(Not run)
Get Short GitHub Repo Name
github_repo(repo)
repo |
The URL of the repository (in the format "username/repo" or "https://github.com/username/repo") |
character string of short repo name
github_repo("scienceverse/metacheck")
github_repo("https://github.com/scienceverse/metacheck/")
github_repo("https://github.com/scienceverse/metacheck.git")
Convert Grobid TEI XML file to bibr format
grobid_to_bibr(xml_path, save_path = ".", crossref_lookup = FALSE)
xml_path |
the path to the XML file |
save_path |
directory or file path to save to; set to NULL to return a paper object |
crossref_lookup |
whether to look up references in crossref |
a paper object
It is useful to ask an LLM to return data in JSON structured format, but can be frustrating to extract the data, especially where the LLM makes syntax mistakes. This function tries to expand a column with a JSON-formatted response into columns and deals with it gracefully (sets an 'error' column to "parsing error") if there are errors. It also fixes column data types, if possible.
json_expand(table, col = "answer", suffix = c("", ".json"))
table |
the table with a column to expand |
col |
the name or index of the column to expand (defaults to "answer" or the first column) |
suffix |
the suffix for the extracted columns if they conflict with names in the table |
the table plus the expanded columns
table <- data.frame(
  paper_id = 1:5,
  answer = c(
    '{"number": "1", "letter": "A", "bool": true}',
    '{"number": "2", "letter": "B", "bool": "FALSE"}',
    '{"number": "3", "letter": "", "bool": null}',
    "oh no, the LLM misunderstood",
    '{"number": "5", "letter": ["E", "F"], "bool": false}'
  )
)
expanded <- json_expand(table, "answer")
expanded
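The idea of handling malformed responses gracefully can be sketched as follows. This is a minimal illustration rather than the code inside json_expand(); it assumes the jsonlite package is installed, and the object names `answers` and `parsed` are made up for the example.

```r
library(jsonlite)

answers <- c(
  '{"number": "1", "letter": "A"}',
  "oh no, the LLM misunderstood"
)

# Parse each answer; return NULL instead of stopping on malformed JSON
parsed <- lapply(answers, function(x) {
  tryCatch(jsonlite::fromJSON(x), error = function(e) NULL)
})

# Flag the rows that could not be parsed
failed <- vapply(parsed, is.null, logical(1))
data.frame(
  answer = answers,
  error = ifelse(failed, "parsing error", NA)
)
```

Wrapping each row separately means one bad response does not prevent the remaining rows from being expanded.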
Get the last log
lastlog(i = 1, logpath = NULL)
i |
the indices to return |
logpath |
an optional file path to read the log from |
a list of the last log item, or a data frame of multiple items
# set up 2 log items
logger("test", list(msg = "hi"))
logger("test", list(msg = "hi again"))
lastlog()
lastlog(2)
lastlog(1:2)
Make an html link
link(url, text = url, new_window = TRUE, type = "")
url |
the URL to link to |
text |
the text to link |
new_window |
whether to open in a new window |
type |
handle common links, like "doi" () |
string
link("https://scienceverse.org")
Ask a large language model (LLM) any question you want about a vector of
text or the text from a text_search(). When type is provided, uses
ellmer's structured output API to guarantee output conforming to the type
spec; otherwise returns free-text responses in an answer column.
llm(
  text,
  system_prompt,
  type = NULL,
  text_col = "text",
  model = llm_model(),
  params = list()
)
text |
The text to send to the LLM (vector of strings, or data frame with the text in a column) |
system_prompt |
A system prompt to set the behavior of the assistant |
type |
An optional ellmer type specification for structured extraction
(e.g., from |
text_col |
The name of the text column if text is a data frame |
model |
the LLM model name (see |
params |
a named list to pass to |
You will need to get your own API key from https://console.groq.com/keys. To avoid having to type it out, add it to the .Renviron file in the following format (you can use usethis::edit_r_environ() to access the .Renviron file)
GROQ_API_KEY="key_value_asdf"
See https://console.groq.com/docs for more information
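Before sending queries, it can help to confirm that the key is visible to the current R session. The snippet below is a small base-R sketch; `key_is_set()` is a hypothetical helper, not a metacheck function.

```r
# Hypothetical helper: TRUE if the named environment variable is non-empty
key_is_set <- function(var = "GROQ_API_KEY") {
  nzchar(Sys.getenv(var))
}

if (key_is_set()) {
  message("GROQ_API_KEY found; LLM queries can be attempted.")
} else {
  message("GROQ_API_KEY is not set; add it to .Renviron and restart R.")
}
```

Changes to .Renviron only take effect in a new R session, so a restart is needed after editing the file.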
a data frame of results
## Not run:
# Free-text query
text <- c("hello", "number", "ten", 12)
system_prompt <- "Is this a number? Answer only 'TRUE' or 'FALSE'"
is_number <- llm(text, system_prompt)

# Structured extraction
type_spec <- ellmer::type_object(
  is_number = ellmer::type_boolean("Whether the input is a number")
)
result <- llm(c("hello", "42"), "Classify the input.", type = type_spec)
## End(Not run)
Set the maximum number of calls to the LLM
llm_max_calls(n = NULL)
n |
The maximum number of calls that the llm() function can make |
Use llm_model_list() to get a list of available models
llm_model(model = NULL)
model |
the name of the model |
List available LLM models for the specified platform.
llm_model_list(platform = NULL)
platform |
The platform. If NULL, checks all platforms for which you have a valid API_KEY. |
For platforms other than groq, returns the value from the corresponding ellmer::models_platform function.
a data frame of models and info
## Not run:
llm_model_list()
## End(Not run)
Mainly for use in optional LLM workflows in modules
llm_use(llm_use = NULL)
llm_use |
if logical, sets whether to use LLMs |
the current option value (logical)
if (llm_use()) {
  print("We can use LLMs")
} else {
  print("We will not use LLMs")
}
Lists all files in a local directory (optionally recursively) and returns a data frame
compatible with the repo_check output table, for use with code_check.
local_files(path, recursive = FALSE)
path |
path to a local directory or file, or a vector of paths |
recursive |
whether to search the files recursively |
a data frame with columns repo_url, file_name, file_url,
file_location, file_size, file_type
## Not run:
local_files("my_project")
## End(Not run)
Adds a logging message to the log. Keeps the log to a maximum of 1000 rows.
logger(label = "", contents = list(), logpath = NULL)
label |
a string with the context (e.g., module name) |
contents |
a named list of the log contents |
logpath |
an optional file path to save the log in |
called for side effects of writing to log, returns logpath
logpath <- tempfile(fileext = ".log")
logger("test", list(x = 1), logpath)
lastlog()
See the help files for a module by name (get a list of names from module_list())
module_help(module = NULL)
module |
the name of a module or path to a module |
the help text
module_help("marginal")
Get module information
module_info(module)
module |
the name of a module or path to a module |
a list of module info
module_info("all_p_values")
List modules
module_list(module_dir = system.file("modules", package = "metacheck"))
module_dir |
the directory to search for modules (defaults to the built-in modules) |
a data frame of modules
mods <- module_list()
Report from module output
module_report(module_output, header = 3)
module_output |
the output of a |
header |
header level (default 3) |
text
paper <- demopaper()
op <- module_run(paper, "stat_p_exact")
module_report(op) |> cat()
Run a module
module_run(paper, module, ...)
paper |
a paper object or a list of paper objects |
module |
the name of a module or path to a module to run on this object |
... |
further arguments to the module (e.g., arguments for the |
a list of the returned table and report text
module_run(psychsci[[1]], "all_p_values")
Create a Module from a Template
module_template(module_name, path = "./modules")
module_name |
The short name of the module (should contain only letters, numbers, and _) |
path |
The path of the directory to save the module in (defaults to a directory called "modules" in the working directory) |
the file path (invisibly)
See details for a list of root-level fields that can be selected.
openalex_doi(doi, select = NULL)
doi |
the DOI of the paper to get info for |
select |
a vector of fields to return, NULL returns all |
See https://docs.openalex.org/api-entities/works/work-object for explanations of the information you can retrieve about works.
Root-level fields for the select argument:
id
doi
title
display_name
publication_year
publication_date
ids
language
primary_location
type
type_crossref
indexed_in
open_access
authorships
institution_assertions
countries_distinct_count
institutions_distinct_count
corresponding_author_ids
corresponding_institution_ids
apc_list
apc_paid
fwci
has_fulltext
fulltext_origin
cited_by_count
citation_normalized_percentile
cited_by_percentile_year
biblio
is_retracted
is_paratext
primary_topic
topics
keywords
concepts
mesh
locations_count
locations
best_oa_location
sustainable_development_goals
grants
datasets
versions
referenced_works_count
referenced_works
related_works
abstract_inverted_index
abstract_inverted_index_v3
cited_by_api_url
counts_by_year
updated_date
created_date
list with DOIs and info
doi <- "10.7717/peerj.4375"
## Not run:
oa_info <- openalex_doi(doi)
oa_info <- openalex_doi(doi, "title")
## End(Not run)
Look up a reference in OpenAlex
openalex_query(title, source = NA, authors = NA, strict = TRUE)
title |
The title of the work |
source |
The source (journal or book) |
authors |
The authors |
strict |
Whether to return NULL or the best match if there isn't a single match |
A data frame with citation info
## Not run:
openalex_query("Sample Size Justification", "Collabra Psychology")
## End(Not run)
Check the status of the OSF API server.
osf_api_check(
  osf_api = getOption("metacheck.osf.api"),
  on_error = c("stop", "warn", "ignore")
)
osf_api |
the OSF API to use (e.g., "https://api.osf.io/v2") |
on_error |
whether to stop, warn, or ignore errors |
The OSF API server is down a lot, so it's often good to check it before you run a bunch of OSF functions. When the server is down, it can take several seconds to return an error, so scripts where you are checking many URLs can take a long time before you realise they aren't working.
You can only make 100 API requests per hour, unless you authorise your requests, when you can make 10K requests per day. The osf functions in metacheck often make several requests per URL to get all of the info. You can authorise them by creating an OSF token at https://osf.io/settings/tokens and including the following line in your .Renviron file:
OSF_PAT="replace-with-your-token-string"
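A rough estimate shows why authorising requests matters. The calculation below is a sketch: the limits come from the text above, while `requests_per_url` is an assumed figure, since the real number of requests varies by URL.

```r
# Assumption for illustration: each OSF URL needs about 4 API requests
requests_per_url <- 4

unauth_per_hour <- 100    # unauthorised limit stated above
auth_per_day    <- 10000  # authorised limit stated above

urls_per_hour_unauth <- floor(unauth_per_hour / requests_per_url)
urls_per_day_auth    <- floor(auth_per_day / requests_per_url)

urls_per_hour_unauth  # 25
urls_per_day_auth     # 2500

# Check whether a token is available in this session
nzchar(Sys.getenv("OSF_PAT"))
```

Under this assumption, an unauthorised session would run out of requests after about 25 URLs in an hour.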
the OSF status
osf_api_check()
Check if strings are valid OSF IDs, URLs, or waterbutler IDs. Basically an improved wrapper for osfr::as_id() that returns NA for invalid IDs in a vector.
osf_check_id(osf_id)
osf_id |
a vector of OSF IDs or URLs |
a vector of valid IDs, with NA in place of invalid IDs
osf_check_id("pngda")
osf_check_id("osf.io/pngda")
osf_check_id("https://osf.io/pngda")
osf_check_id("https://osf .io/png da") # rogue whitespace
osf_check_id("pnda") # invalid
Sometimes the OSF gets fussy if you make too many calls, so you can set a delay of a few seconds before each call. Use osf_delay() to get or set the OSF delay.
osf_delay(delay = NULL)
delay |
the number of seconds to wait between OSF calls |
osf_delay()
Creates a directory for the OSF ID and downloads all of the files using a folder structure from the OSF project nodes and file storage structure. Returns (invisibly) a data frame with file info.
osf_file_download(
  osf_id,
  download_to = ".",
  max_file_size = 10,
  max_download_size = 100,
  max_folder_length = Inf,
  ignore_folder_structure = FALSE,
  pb = NULL
)
osf_id |
an OSF ID or URL |
download_to |
path to download to |
max_file_size |
maximum file size to download (in MB) - set to NULL for no restrictions |
max_download_size |
maximum total size to download |
max_folder_length |
maximum folder name length (set to make sure paths are <260 characters on some Windows OS) |
ignore_folder_structure |
if TRUE, download all files into a single folder |
pb |
a progress bar passed from another function |
Some differences may exist because the OSF allows longer file names with characters that may not be allowed on a file system, so these are cleaned up when downloading.
You can limit downloads to only files under a specific size (defaults to 10MB) and only a maximum download size (largest files will be omitted until total size is under the limit). Omitted files will be listed as messages in verbose mode, and included in the returned data frame with the downloaded column value set to FALSE.
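The selection rule described above can be sketched in base R. This is an illustration of the stated behaviour, not the package's code; `select_downloads()` and the example file table are hypothetical, and sizes are assumed to be in MB.

```r
# Hypothetical helper: decide which files to download given size limits (MB)
select_downloads <- function(sizes, max_file_size = 10, max_download_size = 100) {
  if (is.null(max_file_size)) max_file_size <- Inf
  if (is.null(max_download_size)) max_download_size <- Inf

  # First exclude any single file over the per-file limit
  keep <- sizes <= max_file_size

  # Then omit the largest remaining file until the total fits
  while (any(keep) && sum(sizes[keep]) > max_download_size) {
    largest <- which(keep)[which.max(sizes[keep])]
    keep[largest] <- FALSE
  }
  keep
}

files <- data.frame(
  name = c("a.csv", "b.zip", "c.R", "d.sav"),
  size = c(8, 50, 0.1, 9)
)
files$downloaded <- select_downloads(
  files$size, max_file_size = 10, max_download_size = 10
)
files
```

In this example "b.zip" exceeds the per-file limit and "d.sav" is dropped to bring the total under 10 MB, so both would have downloaded set to FALSE.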
data frame of file info
## Not run:
osf_file_download("6nt4v")
## End(Not run)
OSF API queries only return up to 10 items per page, so this helper function checks for extra pages and returns all of them.
osf_get_all_pages(url, page_end = Inf)
url |
the OSF API URL |
page_end |
The last page to get |
a table of the returned data
# get the 20 newest preprints
## Not run:
osf_api <- getOption("metacheck.osf.api")
url <- sprintf("%s/preprints/?search=date_created-desc", osf_api)
preprints <- osf_get_all_pages(url, 2)
## End(Not run)
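The general pattern of following pages until none remain can be sketched without any network access. Everything here is hypothetical: `fake_fetch()` stands in for an API call that returns up to 10 items per page, and `get_all_pages()` is an illustrative loop, not the implementation of osf_get_all_pages().

```r
# Hypothetical stand-in for an API call: returns one page of up to 10 items
# and a flag saying whether another page exists.
fake_fetch <- function(page, n_items = 25, per_page = 10) {
  start <- (page - 1) * per_page + 1
  if (start > n_items) {
    return(list(data = integer(0), has_next = FALSE))
  }
  end <- min(page * per_page, n_items)
  list(data = seq(start, end), has_next = end < n_items)
}

# Keep requesting pages until there are no more or page_end is reached
get_all_pages <- function(fetch, page_end = Inf) {
  results <- list()
  page <- 1
  repeat {
    res <- fetch(page)
    results[[page]] <- res$data
    if (!res$has_next || page >= page_end) break
    page <- page + 1
  }
  unlist(results)
}

length(get_all_pages(fake_fetch))               # 25
length(get_all_pages(fake_fetch, page_end = 2)) # 20
```

Setting page_end to 2 stops after two pages of 10, which matches the "20 newest preprints" example above.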
Retrieve info from the OSF by ID
osf_info(osf_url, id_col = 1, recursive = FALSE, pb = NULL)
osf_url |
an OSF ID or URL, or a table containing them |
id_col |
the index or name of the column that contains OSF IDs or URLs, if id is a table |
recursive |
whether to retrieve all children |
pb |
a progress bar passed from another function |
a data frame of information
## Not run:
# get info on one OSF node
osf_info("pngda")
# also get child nodes and files
osf_info("https://osf.io/6nt4v", recursive = TRUE)
## End(Not run)
Get all OSF links.
osf_links(paper)
paper |
a paper object or paperlist object |
a table with the OSF url in the first (href) column
osf_links(psychsci)
Get a list of preprints from the OSF
osf_preprint_list(
  provider = NULL,
  date_created = NULL,
  date_modified = NULL,
  page_start = 1,
  page_end = page_start
)
provider |
a vector of the preprint providers, e.g. psyarxiv, socarxiv, edarxiv (see https://osf.io/preprints/discover) |
date_created |
a single date or a vector of two dates (min and max) |
date_modified |
a single date or a vector of two dates (min and max) |
page_start |
the first page of 10 entries |
page_end |
the last page of 10 entries to read |
a table of preprint info
## Not run:
dc <- c("2025-09-01", "2025-10-01")
pp <- osf_preprint_list("psyarxiv", date_created = dc)
files <- pp$primary_file
## End(Not run)
Get OSF GUID Type
osf_type(guid)
guid |
the 5-letter GUID |
the type
# osf_type("pngda")
Get Paper IDs
paper_id(paper)
paper |
a paper or paperlist |
a vector of paper_ids
paper_id(psychsci)
Return a table from a paper object or concatenate tables across a list of paper objects.
paper_table(paper, table, cols = NULL)
paper |
a paper or paperlist |
table |
a table name |
cols |
the columns to return from the table (default all columns) |
a merged table
biblio <- paper_table(psychsci[1:10], "bib")
xrefs <- paper_table(psychsci[1:10], "xref")
Checks if a paper object conforms to the JSON schema.
paper_validate(paper)
paper |
a paper object |
TRUE or error
paper <- list(paper_id = "Not a paper object")
tryCatch(
  paper_validate(paper),
  error = \(e) print(e$message)
)
paper <- demopaper()
paper_validate(paper)
Save a paper as a JSON file.
paper_write(paper, file_name = NULL, save_path = ".")
paper |
a paper object |
file_name |
the name of the file (if NULL, defaults to the paper_id) |
save_path |
the directory to save the JSON file in |
the path to the JSON file
## Not run:
paper <- demopaper()
paper$info$title <- "New title"
paper_write(paper, "new_paper")
## End(Not run)
Make sure user-input file names are not problematic.
path_sanitize(
  path,
  replacement = "_",
  remove_whitespace = TRUE,
  keep_sep = TRUE
)
path |
the path to sanitize (can be a vector of paths) |
replacement |
the character to replace invalid characters with |
remove_whitespace |
whether to include whitespace as a problem |
keep_sep |
whether to keep the path separator / |
the sanitized vector
path <- "/My Files/x><y.pdf"
path_sanitize(path)
path_sanitize(path, replacement = "~")
path_sanitize(path, remove_whitespace = FALSE)
path_sanitize(path, keep_sep = FALSE)
Helper function for conditional plurals. For example, if you want to return "1 error" or "2 errors", you can use this in a sprintf().
plural(n, singular = "", plural = "s")
n |
the number |
singular |
the word or ending when n = 1 |
plural |
the word or ending when n != 1 |
a string
n <- 0:3
sprintf("I have %d friend%s", n, plural(n))
sprintf("I have %d %s", n, plural(n, "octopus", "octopi"))
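The "1 error" versus "2 errors" case mentioned in the description can be reproduced with a base-R stand-in. `plural_sketch()` is a hypothetical re-implementation for illustration only; it follows the documented rule that the singular form applies when n equals 1.

```r
# Hypothetical re-implementation for illustration only
plural_sketch <- function(n, singular = "", plural = "s") {
  ifelse(n == 1, singular, plural)
}

n <- 0:3
msgs <- sprintf("%d error%s", n, plural_sketch(n))
msgs
# "0 errors" "1 error" "2 errors" "3 errors"
```

Because both sprintf() and ifelse() are vectorised, a whole vector of counts can be formatted in one call.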
250 open access papers from Psychological Science.
psychsci
A list of 250 paper objects
https://journals.sagepub.com/home/pss
Takes a DOI, and retrieves information from pubpeer related to post-publication peer review comments.
pubpeer_comments(doi)
doi |
a vector of paper DOIs |
a dataframe with information from pubpeer
doi <- c(
  "10.1038/s41598-025-24662-9",
  "10.1177/0146167211398138"
)
pubpeer_comments(doi)
Retrieve files from ResearchBox by URL
rbox_file_download(rb_url, pb = NULL)
rb_url |
a vector of ResearchBox URLs |
pb |
a progress bar passed from another function |
a data frame of information
Retrieve info from ResearchBox by URL
rbox_info(rb_url, id_col = 1, pb = NULL)
rb_url |
a ResearchBox URL, or a table containing them (e.g., as created by |
id_col |
the index or name of the column that contains ResearchBox URLs, if id is a table |
pb |
a progress bar passed from another function |
a data frame of information
## Not run:
# get info on one ResearchBox
rbox_info("https://researchbox.org/801")
## End(Not run)
Find ResearchBox Links in Papers
rbox_links(paper)
paper |
a paper object or paperlist object |
a table with the ResearchBox url in the first (href) column
rbox_links(psychsci)
Read in grobid XML or bibr JSON
read(file_path, include_images = FALSE, recursive = FALSE)
file_path |
path to a single directory containing XML and/or JSON files, or a vector of XML/JSON paths |
include_images |
whether to include images in the figures table of the paper object (they make object size larger, only relevant to bibr imports) |
recursive |
whether to read files in subfolders (files should have unique paper_ids, or errors can occur) |
a paper or paperlist
Return a table with fixed DOIs and reference text from a paper object or concatenate tables across a list of paper objects.
ref_table(paper)
paper |
a paper or paperlist |
a merged table
biblio <- ref_table(psychsci[[1]])
Run specified modules on a paper and generate a report in quarto (qmd), html, or pdf format.
report(
  paper,
  modules = c("prereg_check", "funding_check", "coi_check", "power", "repo_check",
    "code_check", "stat_check", "stat_p_exact", "stat_p_nonsig", "stat_effect_size",
    "marginal", "ref_accuracy", "ref_replication", "ref_retraction", "ref_pubpeer",
    "ref_summary"),
  output_file = paste0(paper$paper_id, "_report.", output_format),
  output_format = c("html", "qmd"),
  args = list()
)
paper |
a paper object or a paperlist object |
modules |
a vector of modules to run (names for built-in modules or paths for custom modules) |
output_file |
the name of the output file |
output_format |
the format to create the report in |
args |
a list of arguments to pass to modules (see Details) |
Pass arguments to modules in a named list of lists, using the same names as the modules argument. You only need to specify modules with arguments.
args <- list(power = list(seed = 8675309))
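A quick sanity check on the list structure can catch misspelled module names before a report is generated. The sketch below uses only base R; the choice of modules is arbitrary, and the report() call is wrapped in `if (FALSE)` so that nothing is run or written here.

```r
# Modules to run, and arguments for the subset of modules that need them
modules <- c("stat_p_exact", "power")
args <- list(power = list(seed = 8675309))

# Every name in args should match a module that will actually be run
unmatched <- setdiff(names(args), modules)
stopifnot(length(unmatched) == 0)

if (FALSE) {
  # Not run: requires metacheck and writes a report file
  report(demopaper(), modules = modules, args = args)
}
```

Modules without an entry in args are simply run with their defaults, as described above.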
the file path the report is saved to
## Not run:
paper <- demopaper()
report(paper)
## End(Not run)
Launch the Report app: upload a PDF and generate a report with one click, with privacy options for what is sent to external servers.
report_app(quiet = FALSE, ...)
quiet |
whether to show debugging messages in the console |
... |
arguments to pass to shiny::runApp |
NULL (invisibly)
## Not run:
report_app()
## End(Not run)
Runs modules in order on the paper and orders by section and traffic light.
report_module_run(paper, modules, args = list())
paper |
a paper object |
modules |
a vector of modules to run |
args |
optional list of arguments to pass to modules |
Pass arguments to modules in a named list of lists, using the same names as the modules argument. You only need to specify modules with arguments.
args <- list(power = list(seed = 8675309))
a list of module outputs
paper <- demopaper()
modules <- c("stat_p_exact", "stat_p_nonsig")
module_output <- report_module_run(paper, modules)
Create Report from Module Output
report_qmd(module_output, paper = list())
module_output |
a list of module output (usually from |
paper |
a paper object |
report text
A function to display tables in reports.
report_table(table, colwidths = "auto", maxrows = 2, escape = FALSE)
table |
the data frame to show in a table, or a vector for a list |
colwidths |
set column widths as a vector of px (number > 1) or percent (numbers <= 1) |
maxrows |
if the table has more rows than this, paginate |
escape |
whether or not to escape the DT (necessary if using raw html) |
the datatable
report_table(iris)
DOIs and nature of statements from the RetractionWatch database. Use rw_date() to find the date it was downloaded, and rw_update() to update it.
retractionwatch()
rw()
A data frame with 44784+ rows and 2 columns:
Document ID
Nature of note(s)
a data frame
https://api.labs.crossref.org/data/retractionwatch
retractionwatch()
Get date retractionwatch was updated
rw_date()
the date
rw_date()
metacheck comes with a built-in data frame called retractionwatch. We update it regularly, but you can use this function to download the newest version. The download is >50MB, but this function will summarise the information into a smaller version (~0.5 MB) and delete the original file.
rw_update()
the path to the data frame (invisibly)
A helper function for making module reports.
scroll_table(
  table,
  colwidths = "auto",
  maxrows = 2,
  escape = FALSE,
  column = "body"
)
table |
the data frame to show in a table, or a vector for a list |
colwidths |
set column widths as a vector of px (number > 1) or percent (numbers <= 1) |
maxrows |
if the table has more rows than this, paginate |
escape |
whether or not to escape the DT (necessary if using raw html) |
column |
which quarto column to show tables in |
See quarto article layout for column options. The most common are "body" (centre column), "page" (span all columns), and "margin" (only in right margin).
To set colwidths, use a numeric or character vector. For a numeric vector, numbers greater than 1 will be interpreted as pixels, and numbers of 1 or less as percentages. Character vectors will be passed as is (e.g., "3em"). If you only want to specify some columns, set the others to NA, like c(200, NA, 200, NA).
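The interpretation rule for colwidths can be sketched as a small conversion to CSS-style width strings. This is an assumption-laden illustration: `css_widths()` is hypothetical, and the strings the package actually produces may differ.

```r
# Hypothetical helper: turn a colwidths vector into CSS width strings
css_widths <- function(colwidths) {
  if (is.character(colwidths)) return(colwidths) # passed as is, e.g. "3em"
  ifelse(
    is.na(colwidths),
    NA_character_,                    # unspecified columns stay NA
    ifelse(colwidths > 1,
           paste0(colwidths, "px"),   # greater than 1: pixels
           paste0(colwidths * 100, "%")) # 1 or less: proportion as percent
  )
}

widths <- css_widths(c(200, NA, 0.25, NA))
widths
# "200px" NA "25%" NA
```

Mixing pixel and proportional widths in one vector is possible under this rule because each element is interpreted independently.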
the markdown R chunk to create this table
scroll_table(LETTERS)
Check Stats
stats(text, ...)
text |
the search table (or list of paper objects) |
... |
arguments to pass to statcheck() |
a table of statistics
paper <- demopaper()
stats(paper)
Create a paper object with the specified text (mainly for testing/demos).
test_paper(text = LETTERS, url = character(0))
text |
a vector of text to add |
url |
a vector of URLs to add |
a paper object
# to test a paper with a specific URL
p <- test_paper("https://osf.io/abcde")
If you have a table resulting from text_search() or a module return object, you can expand the text column to the full sentence, paragraph, or section. You can also set plus and minus to append and prepend sentences to the result (only when expand_to is "sentence").
text_expand(
  results_table,
  paper,
  expand_to = c("sentence", "paragraph", "div", "section"),
  plus = 0,
  minus = 0
)

expand_text(
  results_table,
  paper,
  expand_to = c("sentence", "paragraph", "div", "section"),
  plus = 0,
  minus = 0
)
results_table |
the table to expand |
paper |
a metacheck paper object or a list of paper objects to look up the expanded text from |
expand_to |
whether to expand to the sentence, paragraph, div, or section level |
plus |
append additional sentences after the target expansion |
minus |
prepend additional sentences before the target expansion |
a results table with the expanded text
# single paper search
paper <- demopaper()
res_tbl <- text_search(paper, "p =", return = "match")
expanded <- text_expand(res_tbl, paper)

# multiple paper search
papers <- psychsci
res_tbl <- text_search(papers, "replicate")
expanded <- text_expand(res_tbl, papers, plus = 1, minus = 1)
Search the text of a paper or list of paper objects. Also works on the table results of a text_search() call.
text_search(
  paper,
  pattern = ".*",
  return = c("sentence", "paragraph", "section", "header", "match", "paper_id"),
  ignore.case = TRUE,
  fixed = FALSE,
  perl = FALSE,
  exclude = FALSE,
  search_header = FALSE,
  include_refs = FALSE
)

search_text(
  paper,
  pattern = ".*",
  return = c("sentence", "paragraph", "section", "header", "match", "paper_id"),
  ignore.case = TRUE,
  fixed = FALSE,
  perl = FALSE,
  exclude = FALSE,
  search_header = FALSE,
  include_refs = FALSE
)
paper |
a paper object or a list of paper objects |
pattern |
the regex pattern to search for, if a vector with length > 1, the patterns will be searched separately and combined |
return |
the kind of text to return, the full sentence, paragraph, header, or section that the text is in, or just the (regex) match, or all body text for a paper (paper_id) |
ignore.case |
whether to ignore case when text searching |
fixed |
logical. If TRUE, pattern is a string to be matched as is. Overrides all conflicting arguments. |
perl |
logical. Should Perl-compatible regexps be used? |
exclude |
should matches be included or excluded |
search_header |
also search the header |
include_refs |
whether to include the reference section in the search |
The section argument can take a vector of section names, or a PERL regular expression (use ".*" to match all sections). Possible section types are abstract, intro, method, results, discussion, references, acknowledgment, funding, endnote, footnote, table, figure, and unknown. The default includes all sections except references, tables and figures.
a data frame of matches
paper <- demopaper()
all_text <- text_search(paper)
study <- text_search(paper, "study")
equations <- text_search(paper, "\\b\\S+\\s*(=|<)\\s*[0-9\\.]+", return = "match")
no_numbers <- text_search(paper, "\\d", exclude = TRUE)
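To see what the "equations" pattern above extracts, it can be tried on a plain string with base R. This sketch does not call text_search(); the sample sentence is made up, and perl = TRUE is used here (unlike the text_search() default) for predictable handling of \b and \S.

```r
text <- "The effect was significant, t(20) = 2.5, p < .05, as predicted"
pattern <- "\\b\\S+\\s*(=|<)\\s*[0-9\\.]+"

# Extract every non-overlapping match of the pattern from the string
matches <- regmatches(text, gregexpr(pattern, text, perl = TRUE))[[1]]
matches
# "t(20) = 2.5" "p < .05"
```

Note that the character class includes a literal dot, so a match that ends a sentence would also capture the final full stop.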
Validate
validate(gt, module, compare = "table")
gt |
a data frame or vector of text |
module |
the module |
compare |
name of the module output table for comparison |
something
validate("p < .05", "stat_p_exact")
Creates a directory for the Zenodo ID and downloads all of the files using a folder structure from the Zenodo project nodes and file storage structure. Returns (invisibly) a data frame with file info.
zenodo_file_download(
  zenodo_id,
  download_to = ".",
  max_file_size = 10,
  max_download_size = 100,
  pb = NULL
)
zenodo_id |
a Zenodo ID or URL |
download_to |
path to download to |
max_file_size |
maximum file size to download (in MB) - set to NULL or Inf for no restrictions |
max_download_size |
maximum total size to download - set to NULL or Inf for no restrictions |
pb |
a progress bar passed from another function |
You can limit downloads to only files under a specific size (defaults to 10MB) and only a maximum download size (largest files will be omitted until total size is under the limit). Omitted files will be listed as messages in verbose mode, and included in the returned data frame with the downloaded column value set to FALSE.
data frame of file info
## Not run:
zenodo_file_download("2591593")
## End(Not run)
Retrieve info from Zenodo by URL
zenodo_info(zenodo_url, id_col = 1, pb = NULL)
zenodo_url |
a Zenodo URL, or a table containing them (e.g., as created by |
id_col |
the index or name of the column that contains Zenodo URLs, if id is a table |
pb |
a progress bar passed from another function |
a data frame of information
## Not run:
# get info on one zenodo link
zenodo_info("https://doi.org/10.5281/zenodo.18648142")
## End(Not run)
Find Zenodo Links in Papers
zenodo_links(paper)
paper |
a paper object or paperlist object |
a table with the Zenodo url in the first (text) column
zenodo_links(psychsci)