---
title: "Modules"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Modules}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = FALSE,
  comment = "#>"
)
```

```{r}
#| label: setup
#| message: false
devtools::load_all(".")
library(dplyr)
```

Metacheck is designed modularly, so you can add modules to check for anything. It comes with a set of pre-defined modules, and we hope people will share more modules.

## Module List

You can see the list of built-in modules with the function below.

```{r}
#| results: 'asis'
module_list()
```

## Module Output

Module designers can include any information in the returned output, but we suggest they structure it in a specific way to facilitate creating reports and summarising many papers in a metascientific workflow. 

Most modules therefore output a list with the following named items: `module`, `title`, `table`, `report`, `traffic_light`, `summary_text`, `summary_table`, and `paper`. You probably don't need to worry about any of this unless you are designing modules or using metacheck for metascience; the `report()` function takes care of displaying everything for you when you need to assess a single paper.

```{r}
paper <- demopaper()
mo <- module_run(paper, "stat_p_exact")
```
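
As a rough sketch of that structure, the chunk below builds a plain list with the same item names. All values are invented for illustration and the `sketch_output` name is hypothetical; real outputs are created by `module_run()` and may carry additional items.

```{r}
# Hypothetical stand-in for a module output: a named list with the
# items described above (every value here is made up)
sketch_output <- list(
  module = "toy_module",
  title = "Toy Module",
  table = data.frame(id = "paper1", text = "p < .05"),
  report = "One imprecise p-value was found.",
  traffic_light = "red", # assumed label, not taken from a real module
  summary_text = "1 imprecise p-value",
  summary_table = data.frame(id = "paper1", n_imprecise = 1),
  paper = NULL # placeholder; a real output stores the paper object here
)

# The item names are what reports and chained modules rely on
names(sketch_output)
```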


The `module`, `title`, and `summary_text` give brief information. 

```{r}
mo$module
mo$title
mo$summary_text
```

### Traffic light

The `traffic_light` helps the reports give a quick visual guide to where there are problems or things to check.

```{r}
mo$traffic_light
```


🟢 no problems detected;  
🟡 something to check;  
🔴 possible problems detected;  
🔵 informational only;  
⚪️ not applicable;  
⚫️ check failed
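
If you collect traffic lights from several modules, a small lookup vector can translate them into the meanings listed above. This is only a sketch: the label strings (`green`, `yellow`, `red`, `info`, `na`, `fail`) are assumptions about how the values are spelled, and `toy_lights` is invented data.

```{r}
# Lookup from an assumed traffic light label to its meaning
# (label strings are hypothetical; meanings follow the legend above)
tl_meaning <- c(
  green  = "no problems detected",
  yellow = "something to check",
  red    = "possible problems detected",
  info   = "informational only",
  na     = "not applicable",
  fail   = "check failed"
)

# Toy vector of traffic lights, e.g. one per module output
toy_lights <- c("green", "red", "info", "red")

# Translate each label, then count how often each label occurs
unname(tl_meaning[toy_lights])
table(toy_lights)
```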

### Table

The `table` is usually a detailed table in the format returned from `text_search()` or `text_expand()`, containing either text relevant to the module, or a classification of the text. This table can be of use to further modules in a chain, or to metascientific users. 

```{r}
mo$table
```


### Summary Table

The `summary_table` contains a single row for each paper, and must have an `id` column that matches the paper IDs. It also has additional columns that summarise the results of the module. This is mainly useful in the metascientific workflow, where each module in a chain appends to this table.

```{r}
mo$summary_table
```
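
To make the "one row per paper, keyed by `id`" idea concrete, here is a minimal sketch with two invented summary tables. The column names `n_p_values` and `n_imprecise` are hypothetical, and the join is just one plausible way to combine such tables, not necessarily how metacheck does it internally.

```{r}
library(dplyr)

# Two toy per-paper summaries, as two different modules might produce
summary_a <- tibble(id = c("paper1", "paper2"), n_p_values = c(12, 3))
summary_b <- tibble(id = c("paper1", "paper2"), n_imprecise = c(4, 0))

# Joining on id keeps one row per paper and adds the new columns
toy_summary <- left_join(summary_a, summary_b, by = "id")
toy_summary
```

Because both tables share the `id` key, the result still has one row per paper, which is what makes it convenient to analyse many papers at once.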


### Report

The `report` contains a vector of markdown and R code to be inserted into a report. The display is usually handled by the `module_report()` function inside the `report()` function.

```{r}
mo$report
```

### Paper

The `paper` is just the paper argument to `module_run()`. This is mainly used when chaining modules.

```{r}
mo$paper
```

### Previous Outputs

If you run modules in a chain or via the `report()` function, the output accumulates the outputs of previous modules in the `prev_outputs` item. This lets modules share resource-intensive parts of checks rather than repeating them.

```{r}
mo <- paper |>
  module_run("stat_p_exact") |>
  module_run("marginal") |>
  module_run("stat_effect_size")

mo$prev_outputs
```


## Built-in Modules

Below, we will demonstrate the use of a few built-in modules, first on a single paper and then on a list of papers: the `psychsci` list of 250 open-access papers from Psychological Science.

```{r}
paper <- psychsci$`0956797620955209`
```


### all_p_values

List all p-values in the text, returning the matched text (e.g., 'p = 0.04') and document location in a table.

```{r}
all_p <- module_run(paper, "all_p_values")

all_p$table # print table
```


If you run this module on all 250 papers, the full table will have more rows than you probably want to print (one row for every p-value in each paper), so you can print the summary table instead, which gives you one row per paper.

```{r}
all_p_ps <- module_run(psychsci, "all_p_values")

all_p_ps$summary_table |> head()
```

You can still access the full table for further processing.

```{r}
all_p_ps$table |>
  count(text, sort = TRUE) |>
  head()
```
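
As one example of further processing, matched text such as "p = 0.04" can be split into a comparator and a numeric value with base R regular expressions. This sketch uses an invented `toy_text` vector and assumes every string contains one of `<`, `>` or `=` followed by a plain number; real matches (for example "p = n.s.") would need extra handling.

```{r}
# Toy matched text in the style of the `text` column (invented values)
toy_text <- c("p = 0.04", "p < .001", "p = .32", "p > .25")

# Comparator: the first <, > or = in each string
toy_comp <- regmatches(toy_text, regexpr("[<>=]", toy_text))

# Numeric value: everything after the comparator, whitespace trimmed
toy_value <- as.numeric(trimws(sub("^.*[<>=]", "", toy_text)))

data.frame(text = toy_text, comparator = toy_comp, value = toy_value)
```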


### all_urls

List all the URLs in the main text. There will, of course, be a few false positives when text in the paper is formatted as a valid URL. 

```{r}
all_urls <- module_run(paper, "all_urls")

all_urls$table
```


```{r}
all_urls_ps <- module_run(psychsci, "all_urls")

all_urls_ps$summary_table
```

### stat_p_exact

List any p-values that may have been reported with insufficient precision (e.g., p < .05 or p = n.s.). 

```{r}
imprecise <- module_run(paper, "stat_p_exact")

imprecise$table$text # print the matched text
```

The `expanded` column has the full sentence for context. Here you can see that "p < .025" was not an imprecisely reported p-value, but a description of the preregistered alpha threshold. 

```{r}
imprecise$table$expanded[[4]] # print expanded text
```

We can investigate the most common imprecise p-values in the PsychSci set. "p < .01" and "p < .05" are probably often describing figures or tables, but what is the deal with "p > .25"?

```{r}
imprecise_ps <- module_run(psychsci, "stat_p_exact")

imprecise_ps$table |>
  count(text, sort = TRUE) |>
  head()
```

We can expand the text to check the context for "p > .25".

```{r}
gt.25 <- imprecise_ps$table |>
  filter(grepl("\\.25", text))

gt.25$expanded[1:3] # look at the first 3
```

### marginal

List all sentences that describe an effect as 'marginally significant'.

```{r, results='asis'}
marginal <- module_run(paper, "marginal")

marginal # print the module output
```

Let's check how many are in the full set.

```{r}
marginal_ps <- module_run(psychsci, "marginal")

marginal_ps$table # print table
```


### stat_check

Check consistency of p-values and test statistics using functions from [statcheck](https://github.com/MicheleNuijten/statcheck).

```{r}
statcheck <- module_run(paper, "stat_check")

statcheck$table
```

Here we see a false positive: the paper reported the results of an equivalence test, which is meant to be one-tailed, but statcheck did not detect that the test was one-tailed.


In the full PsychSci set, there are more than 27K sentences with numbers to check, so this takes about a minute to run. 

```{r, results='asis', eval = FALSE}
statcheck_ps <- module_run(psychsci, "stat_check")
```

```{r, echo = FALSE}
#saveRDS(statcheck_ps, "statcheck_ps.Rds")
statcheck_ps <- readRDS("statcheck_ps.Rds")
```

There will be, of course, some false positives in the full set of `r nrow(statcheck_ps$table)` flagged values. Let's look just at the flagged values where the computed p-value is about double the reported p-value, and this changes the significance decision (at an alpha of 0.05).

```{r}
statcheck_ps$table |>
  filter(decision_error, 
         round(computed_p/reported_p, 1) == 2.0) |>
  select(reported_p, computed_p, raw) |>
  mutate(computed_p = round(computed_p, 4))
```
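
A ratio of about 2 is worth singling out because, for a symmetric test distribution, the two-tailed p-value is twice the one-tailed p-value, so these cases may reflect one-tailed tests that were recomputed as two-tailed. The sketch below illustrates this with an invented t-test result; the values `t_value` and `t_df` are not taken from any paper.

```{r}
# Toy t-test result, t(100) = 1.8 (invented for illustration)
t_value <- 1.8
t_df <- 100

# One-tailed p-value: upper tail area only
p_one <- pt(t_value, t_df, lower.tail = FALSE)

# Two-tailed p-value: both tails of the symmetric t distribution
p_two <- 2 * pt(abs(t_value), t_df, lower.tail = FALSE)

round(c(one_tailed = p_one, two_tailed = p_two), 4)
p_two / p_one # the ratio is 2
```

In this toy case the one-tailed p-value falls below 0.05 while the two-tailed p-value does not, which is the kind of change in the significance decision that the filter above picks out.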

## Chaining Modules

Modules return a `summary_table` as well as the detailed results `table`. If you chain modules, each module's summary is automatically added to the `summary_table`.

```{r}
ps_metascience <- psychsci[1:10] |>
  module_run("all_p_values") |>
  module_run("stat_p_exact") |>
  module_run("marginal")

ps_metascience$summary_table
```