- convert() or convert_grobid() now defaults to the new GDPR-compliant server at TUE
- report_app() to make a report with all default modules in a GUI by just uploading a PDF
- path_sanitize()
- extract_eq now catches "Hedges's g" (formerly just "g") and returns values ordered by paper_id, text_id and group_id
- Updated xml_read_grobid() (an internal helper function for reading Grobid XMLs) to handle some stats better (e.g., "... g z =" is now read as "... gz = ")
- Updated psychsci for the read-in improvements.
- retractionwatch database updated

Our beta release! We've made so many changes, and we're sure there are still many bugs to catch and things to improve, but we need other people to start using metacheck to help us.

- code_*() functions abstracted out from the code_check module. These may eventually move to a new package specifically for codecheck
- llm() and vignette.
- local_path
- local_files() function (thanks @lakens!).
- grobid_to_bibr() conversion, handling URLs in text, xrefs, url, and eq tables better.
- extract_equations() renamed to extract_eq() and now extracts degrees of freedom (df column).
- Updated tei_text() to fix common problems with grobid handling of equations (e.g., "")
- Updated psychsci and demopaper() and demofile() for new schema and read
- Updated file_types to fix a bug that prepended X to all extensions starting with a number.
- paper_id() now returns a vector, not a table, fixing modules that used it that way
- read() no longer errors when reading an empty directory, just messages and returns an empty paperlist
- read() only reads in the .json version if a .json and .xml file with the same name exist
- read() has a new argument recursive (default FALSE) to recursively read a directory. This does not handle it well if individual files have the same paper_id, so don't do that.
- convert() has new arguments crossref_lookup (default FALSE) and keep_xml (default TRUE). It also saves XML and/or JSON files as they are converted, rather than at the end, in case of breaking failure.
- search_text() is now text_search() and expand_text() is now text_expand().
  The old names will exist as aliases.
- {archive}_retrieve() functions now renamed to {archive}_info() and the old {archive}_info() internal functions are now .{archive}_info()
- metacheck_app(), the shiny app, is back!
- grobid_convert() now reads in the url table more accurately
- extract_urls() uses a simplified regex that seems better at catching full URLs
- osf_links(), rb_links(), github_links() and aspredicted_links() simplified to use the more accurate url table instead of a full text search.
- author_table(), as this is just concat_tables() now
- logger() and lastlog() inspired by @levibaruch
- test_paper() for creating paper objects with specific test text
- summarize_contents() changed to file_category() and now works with a vector of file names, as well as a data frame
- compare_tables(), text_features() and distinctive_words() now deprecated
- validate() function simplified
- FLoRA()
- reports() now takes a paperlist and makes a report from each
- report_module_run() and report_qmd() break down the report() function to allow separation of module output lists and creation of QMD report from them (might be changed to internal functions).
- crossref_query()
- llm() allows you to set the model to any provider or provider/model supported by ellmer (must have appropriate *****_API_KEY set in your Renviron)
- llm() arguments have changed to align with ellmer::chat() arguments
- llm_models() now returns models from all platforms for which you have a valid API key set
- github_links() function to find github references in a paper.
- code_check module very much improved - checks SAS and STATA code in OSF, researchbox, and github repos.
- power module much improved
- coi_check, funding_check
- extract_p_values() and extract_urls(), so now no need to use all_p_values and all_urls modules to get their tables.
  These modules remain because they are used in demos, but may be deprecated soon.
- get_doi() has been removed in favour of crossref_query(), to look up crossref info by bibliographic query, and crossref_doi(), to look up crossref info by DOI.
- scroll_table() changed arguments. height is removed and scroll_above changed to maxrows. It now paginates above maxrows (default = 2), rather than scrolling within a fixed height. This is a more accessible solution, since scrolling is hard with touchscreens and it's often hard to copy text in a scroll window. We will continually improve this with further user feedback.
- reference_check is split into ref_doi_check and ref_accuracy.
- json_expand() updated to handle LLM JSON errors more gracefully.
- report() now with the new args argument.
- get_prev_outputs() module helper function
- aspredicted and retractionwatch are removed, as they are superseded by prereg_check and reference_check.
- nonsignificant_pvalue has changed to nonsig_p
- format_ref() for displaying references in bibentry or bibtex formats
- summary is now summary_table
- pubpeer_comments() function (now vectorised)
- scroll_table(), collapse_section(), link(), plural(), pb()
- Fixed a bug in osf_file_download() when multiple files have the same name and ignore_folder_structure = TRUE.
- osf_file_download() should handle errors more gracefully (with warnings, but not fail)
- openalex() results now include abstract, which parses the abstract_inverted_index for you
- miscitation to detect commonly mis-cited papers (a proof-of-concept)
- power to detect and classify power analyses (currently being validated)
- aspredicted to get structured data from AsPredicted preregistrations (mainly for info)
- module_template() creates a module file from a template
- orcid_person() gets details from an ORCiD, such as name, emails, country
- osf_preprint_list() returns a table of preprints from the OSF optionally filtered by archive and dates created or modified
- See inst/plumber/README.md for details.
- module_find() to find potential modules in the working directory and ./modules/
- Updated effectsize module so text of the potential effect size is given in mod_output$table$es (mod_output$summary$ttests_n and mod_output$summary$Ftests_n columns removed, as they are just the sum of *tests_with_es and *tests_without_es)
- pdf2grobid() now gives more useful information in the warning if some files do not convert when converting more than one PDF
- Fixed a bug in osf_check module when there are no OSF links
- Fixed stat_table() function by generating a summary table in case of empty stat table
- If expand_text() doesn't find a text match because sentence location info is missing, it now returns the original text instead of NA
- Re-run read() on XMLs to update any saved paper objects
- psychsci updated for these fixes
- Changed retractionwatch internal data to retractionwatch() function (alias rw()) to support user updating.
- rw_date() so you can find out when retractionwatch was last updated
- rw_update() lets you update retractionwatch yourself
- pdf2grobid() handles save_path better if any path components don't exist yet.
  The argument save_path can now also take a vector of the same length as the number of PDFs to convert, so you can specify the name of each output XML.
- read() now skips any imports with errors and warns you about them after importing all files
- osf_get_all_pages() now has a new argument page_end to limit the number of pages retrieved (mainly for testing purposes), and is external (previously internal)
- Fixed a bug in osf_files() that failed on paths with spaces
- Fixed a bug in read() that duplicated entries in xrefs
- osf_file_download() now also retrieves files from linked storage
- Updated osf_check_id() to return expected IDs from various URLs
- Updated read() to parse more stupid date formats that turn up in the submission string (and added the unparsed submission string back just in case)
- paper$reference table is now paper$bib
- paper$citations table is now paper$xrefs and also contains information for internal cross-references to figures, tables, footnotes, and formulae
- ref_id and bib_id in both tables are now xref_id
- xrefs table also contains location information (section, div, p, s) for the sentence containing the cross-ref, so you can use expand_text()
- read() function now returns paper objects with these new tables, so you will need to re-read any XML files (if you have stored the papercheck list as Rdata)
- psychsci object has been updated for this new format
- Fixed a bug in expand_text() where expanded sentences were duplicated if there are multiple matches from the same sentence in the data frame.
- retractionwatch table
- Fixed a bug in read() that omitted paper DOIs from paper$info
- Updated read() to add correctly parsed "accepted" and "received" dates to paper$info (replaces paper$submission string) (ISO 8601 is the only correct date format!)
- Updated psychsci for new info structure
- osf_file_download()
- osf_file_download() now returns a table of file info, including info for files not downloaded because of file size limits
- read() function, which supersedes read_grobid(), read_cermine() and read_text() (they are still available, but are now just aliases to read()).
  This should work with XML files in TEI (grobid), JATS APA-DTD, NLM-DTD and cermine formats, plus full text-only parsing of .docx and plain text files.
- osf_file_download() function, which downloads all files under a project or node and structures them the same as the project.
- Updated read_grobid() to classify headers as intro, method, results, discussion with better accuracy (to handle garbled headers)
- Updated pdf2grobid() to allow some grobid parameters
- rbox_links() and rbox_retrieve() -- very preliminary
- p_value and the comparator as p_comp, like "exact_p"
- summarize_contents() from best_guess to file_category
- aspredicted_links() and aspredicted_retrieve() functions
- In read_grobid(), the paper$references table now contains new columns for bibtype, title, journal, year, and authors to facilitate reference checks, and more reliably pulls DOIs.
- psychsci set has been updated for the new reference tables
- Fixed a bug in info_table() where adding "id" to the items argument borked the id column
- json_expand() function to expand JSON-formatted LLM responses
- find_project argument to osf_retrieve() to make searching for the parent project optional (it takes 1+ API calls)
- emojis for convenience
- Fixed a bug in validate() that returned incorrect summary stats if the data type of an expected column didn't match the data type of an observed column (e.g., double vs integer)
- Fixed a bug in validate() that returned FALSE for matches if the expected and observed results were both NA
- Updated papercheck_app() to show all modules
- Fixed a bug in pdf2grobid() where a custom grobid_url was not used in batch processing
- psychsci object updated to use XMLs from grobid 0.8.2, which fixes some grobid-related errors in PDF import
- validate() function is updated for the new module structure
- summary table that is appended to a master summary table if you chain modules like psychsci |> module_run("all_p_values") |> module_run("marginal")
- validate() function is temporarily removed to adapt the workflow to the new summary tables.
- module_help() function and some help/examples in modules
- module_info() helper function
- paperlist() function to create paper list objects
- Updated read_grobid() to have fewer false positives for citations
- retractionwatch
- openalex() takes paper objects, paper lists, and vectors of DOIs as input, not just a single DOI
- read_cermine() and associated internal functions for reading cermine-formatted XMLs
- github_repo(), github_readme(), github_languages(), github_files(), github_info()
- read_grobid() now includes figure and table captions, plus footnotes, in the text table
- psychsci paper list object is updated to include the above
- Functions that module_run() delegates to now check and only pass valid arguments
- llm() no longer returns NA when the rate limit is hit, but slows down queries accordingly
- read_grobid() now includes back matter (e.g., acknowledgements, COI statements) in the text, so is searchable with search_text()
- author_table() to get a dataframe of author info from a list of paper objects
- Fixed a bug in search_text() that omitted duplicate matches in the same sentence when using results = "match"
- Fixed a bug in stats() related to the above problem (and filed an issue on statcheck)
- psychsci dataset of 250 open access papers from Psychological Science
- search_text()
- info_table() to get a dataframe of info from a list of paper objects
- distinctive_words() and text_features()
- llm() and associated functions like llm_models()
- search_text()
- tl_accuracy()
- expand_text()
- validate() function and vignette
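
The module_run() entry above (checking and only passing valid arguments) describes a common base R pattern. The sketch below shows one way this kind of filtering could work; it is not the package's actual implementation. The names call_with_valid_args() and toy_module() are hypothetical and exist only for this illustration.

```r
# Hypothetical helper (not part of metacheck): call `fn` with only the
# arguments from `args` that `fn` declares in its signature.
call_with_valid_args <- function(fn, args) {
  accepted <- names(formals(fn))

  # A function that declares `...` can receive any named argument,
  # so nothing needs to be filtered out.
  if ("..." %in% accepted) {
    return(do.call(fn, args))
  }

  # Keep only the arguments whose names match a declared parameter.
  valid <- args[names(args) %in% accepted]
  do.call(fn, valid)
}

# Hypothetical stand-in for a module function; it only understands
# `paper` and `threshold`.
toy_module <- function(paper, threshold = 0.05) {
  list(paper = paper, threshold = threshold)
}

# `unused` is silently dropped instead of causing an "unused argument" error.
result <- call_with_valid_args(
  toy_module,
  list(paper = "demo", threshold = 0.01, unused = TRUE)
)
```

Filtering on names(formals(fn)) keeps the caller from having to know each module's signature. The trade-off is that a misspelled argument name is dropped silently rather than reported.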