modified: Sat Jan 20 08:18:27 2018 compiled: Wed May 1 16:11:02 2024

1 Introduction

bacon can be used to remove inflation and bias often observed in epigenome- and transcriptome-wide association studies (Iterson, Zwet, and Heijmans 2017).

To this end, bacon constructs an empirical null distribution by fitting a three-component normal mixture to the z-scores with a Gibbs Sampling algorithm. One component is forced, using prior knowledge, to represent the null distribution; its mean and standard deviation represent the bias and inflation, respectively. The other two components capture the true associations present in the data, whose amount we assume to be unknown but small.
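Written out, the three-component normal mixture has the density below. The symbols are our own choice, picked to match the p.0, mu.0, sigma.0 (and so on) column names that estimates() reports in Section 2. Treat this as a restatement of the model described above, not a formula quoted from the package:

\[
f(z) = p_0\,\phi(z;\mu_0,\sigma_0) + p_1\,\phi(z;\mu_1,\sigma_1) + p_2\,\phi(z;\mu_2,\sigma_2), \qquad p_0 + p_1 + p_2 = 1,
\]

where \(\phi(z;\mu,\sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(z-\mu)^2}{2\sigma^2}\right)\) is the normal density. Component 0 is the null component, so \(\mu_0\) is the bias and \(\sigma_0\) the inflation. Without bias or inflation the null reduces to the theoretical \(\mathcal{N}(0, 1)\), i.e. \(\mu_0 = 0\) and \(\sigma_0 = 1\).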

bacon provides functionality to inspect the output of the Gibbs Sampling algorithm, i.e., plots of traces, posterior distributions and the mixture fit. Furthermore, inflation- and bias-corrected test-statistics, effect-sizes and standard errors, or P-values can be extracted easily. In addition, functionality is provided for performing fixed-effect meta-analysis and for obtaining inflation- and bias-corrected statistics with a 95% confidence interval (CI).
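The following sketch shows what "bias- and inflation-corrected" could mean numerically. It assumes that correction standardises each statistic by the null component, \(z^* = (z - \mu_0)/\sigma_0\). For effect-sizes and standard errors, it uses one decomposition that reproduces \(z^*\) as their ratio: \(b^* = b - \mu_0 \cdot se\) and \(se^* = \sigma_0 \cdot se\). We have not checked this decomposition against bacon's source. The code does not call bacon. The inputs are hypothetical, and the values 0.166 and 1.29 are borrowed from the example output in Section 2 purely for illustration.

```r
# Hypothetical effect sizes and standard errors from some association analysis.
b  <- c(0.50, -0.20, 0.10, 1.20)
se <- c(0.10,  0.15, 0.20, 0.25)
z  <- b / se                            # z-scores (test statistics)

# Bias (mu.0) and inflation (sigma.0), borrowed from the Section 2 example output.
mu0    <- 0.166
sigma0 <- 1.29

# Assumed correction: standardise the z-scores with the empirical null.
z.corrected <- (z - mu0) / sigma0

# Effect sizes and standard errors adjusted so that their ratio equals z.corrected.
b.corrected  <- b - mu0 * se
se.corrected <- se * sigma0

# Two-sided P-values under the standard normal, before and after correction.
p.raw       <- 2 * pnorm(-abs(z))
p.corrected <- 2 * pnorm(-abs(z.corrected))
```

Because \(\sigma_0 > 1\) here, every corrected statistic in this toy example moves towards zero and its P-value grows.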

The function bacon requires a vector or a matrix of z-scores and/or of effect-sizes and standard errors, e.g., those obtained from association analyses using linear regression. For fixed-effect meta-analysis, a matrix of effect-sizes and standard errors is required.
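For orientation, here is the standard inverse-variance weighted fixed-effect meta-analysis applied to such a matrix, with rows as features and columns as cohorts. We assume this is the kind of pooling bacon's meta-analysis performs, but the sketch does not call bacon and makes no claim about its output format. The function name fixed_effect_meta and the input values are hypothetical.

```r
# Hypothetical helper: inverse-variance weighted fixed-effect meta-analysis.
# es, se: matrices with one row per feature and one column per cohort.
fixed_effect_meta <- function(es, se) {
  w <- 1 / se^2                            # inverse-variance weights
  b <- rowSums(w * es) / rowSums(w)        # pooled effect size per feature
  s <- sqrt(1 / rowSums(w))                # standard error of the pooled effect
  z <- b / s
  q <- qnorm(0.975)                        # about 1.96 for a 95% CI
  data.frame(es = b, se = s, z = z, p = 2 * pnorm(-abs(z)),
             ci.low = b - q * s, ci.high = b + q * s)
}

# Two hypothetical features measured in three cohorts.
es <- matrix(c(0.30,  0.25, 0.40,
               0.00, -0.05, 0.02), nrow = 2, byrow = TRUE)
se <- matrix(c(0.10,  0.12, 0.20,
               0.10,  0.12, 0.20), nrow = 2, byrow = TRUE)
res <- fixed_effect_meta(es, se)
```

The pooled standard error is always smaller than the smallest per-cohort standard error, which is what makes pooling worthwhile.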

This vignette illustrates the use of bacon with simulated z-scores, effect-sizes and standard errors to avoid long run-times. If multiple sets of test-statistics or effect-sizes and standard errors are provided, the Gibbs Sampler algorithm can be executed in parallel to reduce computation time, using functionality provided by the BiocParallel package.
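The sketch below only illustrates the pattern of fitting several independent sets of test-statistics in parallel, one per matrix column. It uses base R's parallel::mclapply, and BiocParallel's bplapply plays the same role in a Bioconductor setting. The per-set estimator estimate_null is a hypothetical, crude robust stand-in (median and MAD). It is NOT bacon's Gibbs Sampler and is used only to keep the example self-contained.

```r
library(parallel)

# Hypothetical data: three sets of test statistics, one per column.
set.seed(1)
Z <- matrix(rnorm(3 * 1000, mean = 0.1, sd = 1.2), ncol = 3)

# Hypothetical stand-in for an empirical-null fit: robust location and scale.
estimate_null <- function(z) c(bias = median(z), inflation = mad(z))

# Fork-based mclapply cannot use several cores on Windows, so fall back to one there.
n.cores <- if (.Platform$OS.type == "windows") 1L else 2L
fits <- mclapply(seq_len(ncol(Z)), function(j) estimate_null(Z[, j]),
                 mc.cores = n.cores)
est <- do.call(rbind, fits)   # one row per set: bias, inflation
```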

2 A single set of test-statistics

A vector containing \(5000\) z-scores is generated from a normal mixture distribution: \(90\%\) of the z-scores are drawn from a biased and inflated null distribution, \(\mathcal{N}(0.2, 1.3)\), and the remaining z-scores from \(\mathcal{N}(\mu, 1)\), where \(\mu \sim \mathcal{N}(4, 1)\) (here the second parameter of \(\mathcal{N}\) denotes the standard deviation). The rnormmix function provided by bacon generates such a vector of random test-statistics, optionally with different parameters.

library(bacon)  # provides rnormmix() and bacon()
y <- rnormmix(5000, c(0.9, 0.2, 1.3, 1, 4, 1))
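To make the generative model explicit, here is a base-R sketch of the simulation described above. It is a hypothetical re-implementation written from the prose, not the rnormmix source, and the function name simulate_z and its argument names are our own.

```r
# Hypothetical re-implementation of the simulation described in the text.
simulate_z <- function(n, p0 = 0.9, mu0 = 0.2, sigma0 = 1.3,
                       mu.alt = 4, sd.mu = 1, sd.alt = 1) {
  n0 <- round(p0 * n)                           # number of null statistics
  n1 <- n - n0                                  # number of true associations
  z0 <- rnorm(n0, mean = mu0, sd = sigma0)      # biased and inflated null
  mu <- rnorm(n1, mean = mu.alt, sd = sd.mu)    # mu ~ N(4, 1) per true association
  z1 <- rnorm(n1, mean = mu, sd = sd.alt)       # statistics drawn from N(mu, 1)
  sample(c(z0, z1))                             # shuffle nulls and non-nulls
}

set.seed(2018)
y.sim <- simulate_z(5000)
```

The expected mean of y.sim is \(0.9 \cdot 0.2 + 0.1 \cdot 4 = 0.58\).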

The function bacon executes the Gibbs Sampler algorithm and stores all input and output in an object of class Bacon. Several accessor functions are available to access the data contained in the Bacon object, e.g., to obtain the estimated parameters of the mixture fit or, specifically, the bias and inflation. The latter two are simply the mean and standard deviation of the null component (mu.0 and sigma.0).

bc <- bacon(y)
bc
## Bacon-object containing 1 set(s) of 5000 test-statistics.
## ...estimated bias: 0.17.
## ...estimated inflation: 1.3.
## 
## Empirical null estimates are based on 5000 iterations with a burnin-period of 2000.
estimates(bc)
##        p.0    p.1    p.2  mu.0 mu.1  mu.2 sigma.0 sigma.1 sigma.2
## [1,] 0.913 0.0573 0.0299 0.166 3.08 -3.02    1.29    3.01     2.6
inflation(bc)
## sigma.0 
##    1.29
bias(bc)
##  mu.0 
## 0.166

Several methods are provided to inspect the output of the Gibbs Sampler algorithm, such as trace plots of all estimates, plots of posterior distributions (shown as a scatter plot of two parameters), and the fit of the three-component mixture to the histogram of z-scores.
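As a rough illustration of what the mixture-fit plot shows, the sketch below evaluates the three-component normal mixture at the point estimates printed by estimates(bc) above. It draws the full mixture and the weighted null component alone. The helper dmix3 is hypothetical and does not call bacon, so its drawing will not look exactly like the package's own plot.

```r
# Point estimates copied from the estimates(bc) output printed above.
p     <- c(0.913, 0.0573, 0.0299)   # p.0, p.1, p.2
mu    <- c(0.166, 3.08, -3.02)      # mu.0, mu.1, mu.2
sigma <- c(1.29, 3.01, 2.6)         # sigma.0, sigma.1, sigma.2

# Hypothetical helper: normal-mixture density, vectorised over z.
dmix3 <- function(z, p, mu, sigma) {
  dens <- 0
  for (k in seq_along(p)) dens <- dens + p[k] * dnorm(z, mean = mu[k], sd = sigma[k])
  dens
}

# Full mixture (solid line) and the weighted null component alone (dashed line).
curve(dmix3(x, p, mu, sigma), from = -8, to = 8, xlab = "z-score", ylab = "density")
curve(p[1] * dnorm(x, mean = mu[1], sd = sigma[1]), add = TRUE, lty = 2)
```

With y from the rnormmix() call in the workspace, hist(y, breaks = 100, freq = FALSE) followed by curve(dmix3(x, p, mu, sigma), add = TRUE) overlays the same curve on the histogram of z-scores.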

traces(bc, burnin=FALSE)
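With burnin=FALSE the plotted traces presumably include the burn-in iterations (2000 of the 5000 reported in the output above). The toy example below shows why those early draws are discarded before summarising. The AR(1) chain is a hypothetical stand-in for a Gibbs Sampler trace of sigma.0. Summarising by the mean or median of the retained draws is also an assumption about how point estimates are formed, not a statement about bacon's internals.

```r
# Hypothetical trace: an autocorrelated chain that starts far from its target
# (1.3) and drifts towards it, mimicking an unconverged start.
set.seed(42)
n.iter   <- 5000
n.burnin <- 2000
trace <- numeric(n.iter)
trace[1] <- 3
for (i in 2:n.iter) {
  trace[i] <- 1.3 + 0.9 * (trace[i - 1] - 1.3) + rnorm(1, sd = 0.02)
}

kept <- trace[-seq_len(n.burnin)]   # discard the burn-in iterations
summary.kept <- c(mean = mean(kept), median = median(kept))

plot(trace, type = "l", xlab = "iteration", ylab = "sigma.0 (hypothetical)")
abline(v = n.burnin, lty = 2)       # end of the burn-in period
```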