LOND {onlineFDR}    R Documentation

Online FDR control based on number of discoveries

Description

Implements the LOND algorithm for online FDR control, where LOND stands for (significance) Levels based On Number of Discoveries, as presented by Javanmard and Montanari (2015).

Usage

LOND(d, alpha = 0.05, beta, dep = FALSE, random = TRUE,
  date.format = "%Y-%m-%d")

Arguments

d

Dataframe with three columns: an identifier (‘id’), date (‘date’) and p-value (‘pval’). If no column of dates is provided, then the p-values are treated as being ordered sequentially with no batches.

alpha

Overall significance level of the FDR procedure, the default is 0.05.

beta

Optional vector of β_i, a sequence of non-negative numbers that sum to alpha. If omitted, a default is used, as proposed by Javanmard and Montanari (2018), equation 31.

dep

Logical. If TRUE, runs the modified LOND algorithm which guarantees FDR control for dependent p-values. Defaults to FALSE.

random

Logical. If TRUE (the default), then the order of the p-values in each batch (i.e. those that have exactly the same date) is randomised.

date.format

Optional string giving the format that is used for the dates in d; the default is "%Y-%m-%d".
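
As a small illustration of date.format, the snippet below assumes that the string is interpreted in the same way as the format argument of base R's as.Date (this is suggested by the default "%Y-%m-%d", but is not stated above). The input dates are made up for the example.

```r
# Assumption: date.format follows the conversion specifications used by
# base R's as.Date()/strptime(). The date strings below are made up.
raw_dates <- c("21/09/2015", "01/12/2014")

# Day/month/year strings need format = "%d/%m/%Y" rather than the
# default "%Y-%m-%d".
parsed <- as.Date(raw_dates, format = "%d/%m/%Y")
print(parsed)
```

Under that assumption, a dataframe with dates written like this would be passed as LOND(d, date.format = "%d/%m/%Y").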

Details

The function takes as its input a dataframe with three columns: an identifier (‘id’), date (‘date’) and p-value (‘pval’). The case where p-values arrive in batches corresponds to multiple instances of the same date. If no column of dates is provided, then the p-values are treated as being ordered sequentially with no batches.
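
To make the batch convention concrete, here is a minimal sketch of ordering p-values by date while shuffling the order within each batch. It uses only base R and is an illustration of the idea, not necessarily how LOND does it internally; the dataframe d_demo and its values are made up.

```r
# Made-up input: two batches (dates), supplied out of order.
d_demo <- data.frame(
  id   = c("a", "b", "c", "d"),
  date = as.Date(c("2015-09-21", "2014-12-01", "2014-12-01", "2015-09-21")),
  pval = c(0.2, 0.01, 0.03, 0.5)
)

set.seed(1)
# Sort by date; ties (rows in the same batch) are broken by a random key,
# so the order within each batch is randomised.
d_ord <- d_demo[order(d_demo$date, runif(nrow(d_demo))), ]
print(d_ord)
```

Setting a seed beforehand, as in the Examples, makes the within-batch order reproducible.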

The LOND algorithm controls FDR for independent p-values. Given an overall significance level α, we choose a sequence of non-negative numbers β_i such that they sum to α. The values of the adjusted test levels α_i are chosen as follows:

α_i = (D(i-1) + 1)β_i

where D(n) denotes the number of discoveries in the first n hypotheses.
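
The update rule above can be sketched in a few lines of base R. The helper lond_sketch is hypothetical (it is not the package's LOND function), and the β_i sequence used here, β_i = 6α / (π² i²), is an assumption chosen only because it is non-negative and sums to α; it is not the package default.

```r
# Hypothetical helper illustrating alpha_i = (D(i-1) + 1) * beta_i.
lond_sketch <- function(pval, beta) {
  n <- length(pval)
  alphai <- numeric(n)  # adjusted test levels alpha_i
  R <- integer(n)       # discovery indicators
  D <- 0                # D(i-1): number of discoveries so far
  for (i in seq_len(n)) {
    alphai[i] <- (D + 1) * beta[i]
    R[i] <- as.integer(pval[i] <= alphai[i])  # reject if p_i <= alpha_i
    D <- D + R[i]
  }
  data.frame(pval = pval, alphai = alphai, R = R)
}

alpha <- 0.05
pval <- c(2.90e-17, 0.06743, 0.01514, 0.08174, 0.00171)
# Assumed illustrative sequence: beta_i = alpha * 6 / (pi^2 * i^2).
beta <- alpha * 6 / (pi^2 * seq_along(pval)^2)
out <- lond_sketch(pval, beta)
print(out)
```

Each discovery raises the multiplier (D(i-1) + 1), so later hypotheses are tested at a more generous level than β_i alone would give.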

For dependent p-values, LOND controls FDR if it is modified with β_i / H(i) in place of β_i, where H(i) = 1 + 1/2 + ... + 1/i is the i-th harmonic number.
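
As a sketch of this modification, the harmonic numbers can be computed with a cumulative sum and divided into the β_i. The β_i sequence below is an assumed illustrative choice (β_i = 6α / (π² i²)), not the package default.

```r
alpha <- 0.05
i <- seq_len(5)

# Assumed illustrative sequence of beta_i (non-negative, sums to alpha
# over all i = 1, 2, ...).
beta <- alpha * 6 / (pi^2 * i^2)

# Harmonic numbers H(i) = 1 + 1/2 + ... + 1/i.
H <- cumsum(1 / i)

# Modified sequence used in place of beta_i for dependent p-values.
beta_dep <- beta / H
print(data.frame(i = i, H = H, beta = beta, beta_dep = beta_dep))
```

Since H(i) ≥ 1 and grows with i, the modified levels are never larger than the original ones, which is the price paid for allowing dependence.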

Further details of the LOND algorithm can be found in Javanmard and Montanari (2015).

Value

d.out

A dataframe containing the original dataframe d (which will be reordered if there are batches and random = TRUE), the LOND-adjusted test levels α_i, and the indicator function of discoveries R. Hypothesis i is rejected if the i-th p-value is less than or equal to α_i, in which case R[i] = 1 (otherwise R[i] = 0).

References

Javanmard, A. and Montanari, A. (2015) On Online Control of False Discovery Rate. arXiv preprint, https://arxiv.org/abs/1502.06197

Javanmard, A. and Montanari, A. (2018) Online Rules for Control of False Discovery Rate and False Discovery Exceedance. Annals of Statistics, 46(2):526-554.

Examples

sample.df <- data.frame(
id = c('A15432', 'B90969', 'C18705', 'B49731', 'E99902',
    'C38292', 'A30619', 'D46627', 'E29198', 'A41418',
    'D51456', 'C88669', 'E03673', 'A63155', 'B66033'),
date = as.Date(c(rep("2014-12-01",3),
                rep("2015-09-21",5),
                rep("2016-05-19",2),
                "2016-11-12",
                rep("2017-03-27",4))),
pval = c(2.90e-17, 0.06743, 0.01514, 0.08174, 0.00171,
        3.60e-05, 0.79149, 0.27201, 0.28295, 7.59e-08,
        0.69274, 0.30443, 0.00136, 0.72342, 0.54757))

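set.seed(1)                     # make the within-batch randomisation reproducible
LOND(sample.df)
LOND(sample.df, random=FALSE)   # keep the given order within each batch
set.seed(1)
LOND(sample.df, alpha=0.1)      # same procedure at overall level 0.1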



[Package onlineFDR version 1.0.0 Index]