LOND {onlineFDR} | R Documentation |
Implements the LOND algorithm for online FDR control, where LOND stands for (significance) Levels based On Number of Discoveries, as presented by Javanmard and Montanari (2015).
LOND(d, alpha = 0.05, beta, dep = FALSE, random = TRUE, date.format = "%Y-%m-%d")
d |
Dataframe with three columns: an identifier (‘id’), date (‘date’) and p-value (‘pval’). If no column of dates is provided, then the p-values are treated as being ordered sequentially with no batches. |
alpha |
Overall significance level of the FDR procedure, the default is 0.05. |
beta |
Optional vector of β_i. A default is provided as proposed by Javanmard and Montanari (2018), equation 31. |
dep |
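Logical. If TRUE, runs the modified LOND algorithm which guarantees FDR control for dependent p-values. Defaults to FALSE. |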
random |
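Logical. If TRUE (the default), then the order of the p-values in each batch (i.e. those that have exactly the same date) is randomised. |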
date.format |
Optional string giving the format that is used for dates. |
The function takes as its input a dataframe with three columns: an identifier (‘id’), date (‘date’) and p-value (‘pval’). The case where p-values arrive in batches corresponds to multiple instances of the same date. If no column of dates is provided, then the p-values are treated as being ordered sequentially with no batches.
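The ordering described above can be illustrated with a small sketch. This is not the package's internal code: `order_batches` is a hypothetical helper, and it assumes that tests are processed in date order and that rows sharing a date (a batch) may be shuffled among themselves when `random = TRUE`.

```r
# Hypothetical helper (not part of onlineFDR): order tests by date and
# optionally shuffle the rows that share a date, i.e. the rows of each batch.
order_batches <- function(d, random = TRUE) {
  # order() leaves ties in their original order, so each batch keeps
  # its input order at this point
  d <- d[order(d$date), , drop = FALSE]
  if (random) {
    # Permute the row indices separately inside each date group
    idx <- unlist(lapply(split(seq_len(nrow(d)), d$date), function(i) {
      i[sample.int(length(i))]
    }), use.names = FALSE)
    d <- d[idx, , drop = FALSE]
  }
  rownames(d) <- NULL
  d
}

toy <- data.frame(
  id   = c("a", "b", "c", "d"),
  date = as.Date(c("2016-05-19", "2014-12-01", "2016-05-19", "2014-12-01")),
  pval = c(0.20, 0.01, 0.03, 0.50)
)
set.seed(1)
order_batches(toy)                  # batches in date order, shuffled within
order_batches(toy, random = FALSE)  # batches in date order, input order kept
```

Because the shuffle depends on the random number generator, fixing the seed beforehand makes the resulting order reproducible.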
The LOND algorithm controls FDR for independent p-values. Given an overall significance level α, we choose a sequence of non-negative numbers β_i such that they sum to α. The values of the adjusted test levels α_i are chosen as follows:
α_i = (D(i-1) + 1)β_i
where D(n) denotes the number of discoveries in the first n hypotheses.
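The update rule can be traced by hand with a minimal sketch. The function name `lond_sketch`, its output column names, and the choice β_i = 6α / (π² i²) are assumptions made for illustration only; this is not the package default sequence nor its implementation. The sketch also assumes the usual rejection rule, namely that hypothesis i is rejected when its p-value is at most α_i.

```r
# Minimal sketch of the LOND rule; `lond_sketch` is a hypothetical name.
lond_sketch <- function(pval, beta) {
  stopifnot(length(pval) == length(beta))
  n <- length(pval)
  alphai <- numeric(n)  # adjusted test levels alpha_i
  R <- integer(n)       # R[i] = 1 if hypothesis i is rejected, else 0
  D <- 0                # running number of discoveries, D(i-1)
  for (i in seq_len(n)) {
    alphai[i] <- (D + 1) * beta[i]             # alpha_i = (D(i-1) + 1) * beta_i
    R[i] <- as.integer(pval[i] <= alphai[i])   # assumed rejection rule
    D <- D + R[i]
  }
  data.frame(pval = pval, alphai = alphai, R = R)
}

alpha <- 0.05
n <- 5
# Illustrative sequence: the infinite series sums to alpha, so the first
# n terms sum to less than alpha.
beta <- alpha * 6 / (pi^2 * seq_len(n)^2)
lond_sketch(c(1e-4, 0.5, 0.004, 0.2, 0.003), beta)
```

Each discovery raises the multiplier (D(i-1) + 1), so later test levels become more generous after earlier rejections.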
For dependent p-values, LOND controls FDR if it is modified with β_i / H(i) in place of β_i, where H(i) = 1 + 1/2 + ... + 1/i is the i-th harmonic number.
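The harmonic adjustment amounts to one vectorised step, sketched below. `harmonic_adjust` is a hypothetical helper rather than a function exported by the package, and the input values are arbitrary.

```r
# Hypothetical helper: replace beta_i by beta_i / H(i), where H(i) is the
# i-th harmonic number.
harmonic_adjust <- function(beta) {
  H <- cumsum(1 / seq_along(beta))  # H(i) = 1 + 1/2 + ... + 1/i
  beta / H
}

harmonic_adjust(c(0.03, 0.0075, 0.0033))
```

Since H(i) grows with i, the adjusted sequence is never larger than the original one, which makes the procedure more conservative under dependence.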
Further details of the LOND algorithm can be found in Javanmard and Montanari (2015).
d.out |
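A dataframe with the original dataframe d (which will be reordered if there are batches and random = TRUE), the LOND-adjusted significance thresholds α_i and the indicator function of discoveries R, where R[i] = 1 corresponds to hypothesis i being rejected (otherwise R[i] = 0). |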
Javanmard, A. and Montanari, A. (2015) On Online Control of False Discovery Rate. arXiv preprint, https://arxiv.org/abs/1502.06197
Javanmard, A. and Montanari, A. (2018) Online Rules for Control of False Discovery Rate and False Discovery Exceedance. Annals of Statistics, 46(2):526-554.
sample.df <- data.frame(
    id = c('A15432', 'B90969', 'C18705', 'B49731', 'E99902',
           'C38292', 'A30619', 'D46627', 'E29198', 'A41418',
           'D51456', 'C88669', 'E03673', 'A63155', 'B66033'),
    date = as.Date(c(rep("2014-12-01",3),
                     rep("2015-09-21",5),
                     rep("2016-05-19",2),
                     "2016-11-12",
                     rep("2017-03-27",4))),
    pval = c(2.90e-17, 0.06743, 0.01514, 0.08174, 0.00171,
             3.60e-05, 0.79149, 0.27201, 0.28295, 7.59e-08,
             0.69274, 0.30443, 0.00136, 0.72342, 0.54757))

set.seed(1); LOND(sample.df)
LOND(sample.df, random=FALSE)
set.seed(1); LOND(sample.df, alpha=0.1)