Random DAGs

Zuguang Gu ( z.gu@dkfz.de )

2024-02-06

simona provides functions for generating random DAGs. A random tree is generated first; more links can then be added at random to form a more general DAG.

Random trees

dag_random_tree() generates a random tree. By default, it generates a binary tree in which all leaf terms have depth 9.

library(simona)
set.seed(123)
tree1 = dag_random_tree()
tree1
## An ontology_DAG object:
##   Source: dag_random_tree 
##   1023 terms / 1022 relations / a tree
##   Root: 1 
##   Terms: 1, 10, 100, 1000, ...
##   Max depth: 9 
##   Aspect ratio: 56.89:1
dag_circular_viz(tree1)

Strictly speaking, tree1 is not random. The tree grows from the root, and dag_random_tree() has several arguments that control this growth: n_children, p_stop and max.

The tree stops growing once adding more terms would make the total number of terms exceed max.

So the default call of dag_random_tree() is identical to:

dag_random_tree(n_children = 2, p_stop = 0, max = 2^10 - 1)
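The value 2^10 - 1 is the number of terms in a full binary tree whose leaves all have depth 9, since level d holds 2^d terms. A short base R check of that arithmetic (the variable names are only illustrative):

```r
depth_of_leaves = 9
terms_per_level = 2^(0:depth_of_leaves)   # 1, 2, 4, ..., 512
n_terms = sum(terms_per_level)

n_terms               # 1023, the number of terms reported for tree1
n_terms == 2^10 - 1   # TRUE
n_terms - 1           # 1022 relations: every term except the root has one parent
```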

We can change these arguments to some other values, such as:

tree2 = dag_random_tree(n_children = c(2, 6), p_stop = 0.5, max = 2000)
tree2
## An ontology_DAG object:
##   Source: dag_random_tree 
##   1999 terms / 1998 relations / a tree
##   Root: 1 
##   Terms: 1, 10, 100, 1000, ...
##   Max depth: 7 
##   Aspect ratio: 105.71:1
dag_circular_viz(tree2)
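To make the roles of the three arguments concrete, here is a minimal base R sketch of a tree-growing procedure. It is not simona's implementation: the function grow_random_tree() is hypothetical, and it assumes that terms are expanded breadth-first, that p_stop is the probability that a term stops growing, and that a two-element n_children is a range from which the number of children is drawn.

```r
# Hypothetical sketch, not simona's code.
# Returns a vector where parent[k] is the parent of term k (NA for the root).
grow_random_tree = function(n_children = 2, p_stop = 0, max = 2^10 - 1) {
    n_children = range(n_children)   # a single value v becomes c(v, v)
    parent = NA_integer_             # term 1 is the root
    frontier = 1L                    # terms that may still get children

    while(length(frontier) > 0) {
        current = frontier[1]
        frontier = frontier[-1]

        # assumed meaning of p_stop: the term becomes a leaf
        if(runif(1) < p_stop) next

        # number of children, drawn uniformly from the range
        k = n_children[1] + sample.int(n_children[2] - n_children[1] + 1, 1) - 1

        # stop growing if the new terms would push the total above max
        if(length(parent) + k > max) break

        new_terms = length(parent) + seq_len(k)
        parent = c(parent, rep(current, k))
        frontier = c(frontier, new_terms)
    }
    parent
}

set.seed(123)
p1 = grow_random_tree()                                  # default arguments
length(p1)                                               # 1023 terms
p2 = grow_random_tree(n_children = c(2, 6), p_stop = 0.5, max = 2000)
length(p2) <= 2000                                       # TRUE
```

With the default arguments nothing is random in this sketch either: every expanded term gets exactly two children and growth ends only because of max, which is why the result is a full binary tree.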

Random DAGs

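A more general random DAG is generated from a random tree. Starting from tree1, which was generated above, the function dag_add_random_children() adds more random children to terms in tree1. The new children are existing terms, so the number of terms stays the same while the number of relations increases.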

dag1 = dag_add_random_children(tree1)
dag1
## An ontology_DAG object:
##   Source: dag_add_random_children 
##   1023 terms / 1115 relations
##   Root: 1 
##   Terms: 1, 10, 100, 1000, ...
##   Max depth: 9 
##   Avg number of parents: 1.09
##   Avg number of children: 1.03
##   Aspect ratio: 56.89:1 (based on the longest distance from root)
##                 52.78:1 (based on the shortest distance from root)
dag_circular_viz(dag1)

Three arguments control how new child terms are added. We first introduce two of them, p_add and new_children.

Let’s try to generate a denser DAG:

dag2 = dag_add_random_children(tree1, p_add = 0.6, new_children = c(2, 8))
dag2
## An ontology_DAG object:
##   Source: dag_add_random_children 
##   1023 terms / 2550 relations
##   Root: 1 
##   Terms: 1, 10, 100, 1000, ...
##   Max depth: 9 
##   Avg number of parents: 2.50
##   Avg number of children: 1.59
##   Aspect ratio: 56.89:1 (based on the longest distance from root)
##                 32.22:1 (based on the shortest distance from root)
dag_circular_viz(dag2)
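The "Avg number of parents" values printed for dag1 and dag2 are consistent with dividing the number of relations by the number of non-root terms (the root has no parent). This is an observation based on the printed numbers, not a statement about how simona computes the statistic:

```r
n_terms = 1023
relations = c(dag1 = 1115, dag2 = 2550)   # taken from the printed summaries

avg_parents = round(relations / (n_terms - 1), 2)
avg_parents   # 1.09 for dag1, 2.50 for dag2
```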

By default, once a term t is selected to receive more child terms, the new child terms are only selected from the terms that are:

  1. lower than t, i.e. with depths greater than t’s depth in the DAG.
  2. not already child terms of t.

Then, from this subset of candidate child terms, new child terms are randomly picked according to the numbers set in new_children.
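The default selection rule can be illustrated on a toy DAG in base R. This is only a sketch: the edge matrix, the depth vector and the helpers candidate_children() and pick_children() are hypothetical, and "lower than t" is taken to mean "further from the root than t".

```r
# Toy tree with 7 terms, stored as parent -> child pairs (hypothetical data)
edges = rbind(c(1, 2), c(1, 3), c(2, 4), c(2, 5), c(3, 6), c(3, 7))
depth = c(0, 1, 1, 2, 2, 2, 2)   # distance of each term from the root (term 1)

# Candidate new children of term t: further from the root than t,
# and not already a child of t
candidate_children = function(t) {
    lower = which(depth > depth[t])
    existing = edges[edges[, 1] == t, 2]
    setdiff(lower, existing)
}

candidate_children(2)   # terms 6 and 7
candidate_children(4)   # no candidates: term 4 is on the deepest level

# Pick a number of new children within a new_children-style range,
# capped by the number of candidates
pick_children = function(t, new_children = c(1, 8)) {
    cand = candidate_children(t)
    if(length(cand) < new_children[1]) return(integer(0))
    k_range = seq(new_children[1], new_children[2])
    k = min(length(cand), k_range[sample.int(length(k_range), 1)])
    cand[sample.int(length(cand), k)]
}

set.seed(123)
pick_children(1)        # a random subset of terms 4, 5, 6, 7
```

Because every new link in this sketch points from a term to a strictly deeper term, adding it can never create a cycle.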

The way new child terms are randomly picked can also be customized with a user-defined function, passed via the add_random_children_fun argument. This function accepts two arguments: the dag object and the integer index of the “current term”. In the following example, we implement a function which only picks new child terms from term t’s offspring terms.

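add_new_children_from_offspring = function(dag, i, new_children = c(1, 8)) {

    # candidates: offspring of term i, excluding the children it already has
    l = rep(FALSE, dag_n_terms(dag))
    offspring = dag_offspring(dag, i, in_labels = FALSE)
    if(length(offspring)) {
        l[offspring] = TRUE
        l[dag_children(dag, i, in_labels = FALSE)] = FALSE
    }

    candidates = which(l)
    n_candidates = length(candidates)
    if(n_candidates == 0 || n_candidates < new_children[1]) {
        # no candidates, or fewer than the minimal number requested
        return(integer(0))
    }

    # number of new children, drawn from the range and capped by the
    # number of candidates; sample.int() on positions is used because
    # sample(x, ...) treats a length-one x as 1:x
    k_range = seq(new_children[1], new_children[2])
    k = min(n_candidates, k_range[sample.int(length(k_range), 1)])
    candidates[sample.int(n_candidates, k)]
}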

dag3 = dag_add_random_children(tree1, p_add = 0.6,
    add_random_children_fun = add_new_children_from_offspring)
dag3
## An ontology_DAG object:
##   Source: dag_add_random_children 
##   1023 terms / 1583 relations
##   Root: 1 
##   Terms: 1, 10, 100, 1000, ...
##   Max depth: 9 
##   Avg number of parents: 1.55
##   Avg number of children: 1.25
##   Aspect ratio: 56.89:1 (based on the longest distance from root)
##                 32.22:1 (based on the shortest distance from root)
dag_circular_viz(dag3)
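A detail to watch when writing a custom selection function such as the one above: in base R, sample(x, size) treats a length-one numeric x as the range 1:x, so a term with a single candidate could end up with an unrelated child. The sketch below shows the behaviour and one way around it; the helper pick() is hypothetical and only for illustration.

```r
candidates = 10L   # a single candidate term, with index 10

# Pitfall: with a length-one number, sample() draws from 1:10
set.seed(123)
pitfall = replicate(1000, sample(candidates, 1))
table(pitfall)     # values spread over 1 to 10, not only 10

# Safer: sample positions with sample.int(), then index into the vector
pick = function(x, size) x[sample.int(length(x), size)]
safe = replicate(1000, pick(candidates, 1))
unique(safe)       # 10
```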

Session info

sessionInfo()
## R version 4.3.2 Patched (2023-11-13 r85521)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.3 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.18-bioc/R/lib/libRblas.so 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/New_York
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] grid      stats4    stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] ComplexHeatmap_2.18.0 org.Hs.eg.db_3.18.0   AnnotationDbi_1.64.1 
##  [4] IRanges_2.36.0        S4Vectors_0.40.2      Biobase_2.62.0       
##  [7] BiocGenerics_0.48.1   igraph_2.0.1.1        simona_1.0.10        
## [10] knitr_1.45           
## 
## loaded via a namespace (and not attached):
##  [1] KEGGREST_1.42.0         circlize_0.4.15         shape_1.4.6            
##  [4] rjson_0.2.21            xfun_0.41               bslib_0.6.1            
##  [7] visNetwork_2.1.2        htmlwidgets_1.6.4       GlobalOptions_0.1.2    
## [10] bitops_1.0-7            vctrs_0.6.5             tools_4.3.2            
## [13] curl_5.2.0              parallel_4.3.2          Polychrome_1.5.1       
## [16] RSQLite_2.3.5           highr_0.10              cluster_2.1.6          
## [19] blob_1.2.4              pkgconfig_2.0.3         RColorBrewer_1.1-3     
## [22] scatterplot3d_0.3-44    GenomeInfoDbData_1.2.11 lifecycle_1.0.4        
## [25] compiler_4.3.2          textshaping_0.3.7       Biostrings_2.70.2      
## [28] codetools_0.2-19        clue_0.3-65             GenomeInfoDb_1.38.5    
## [31] httpuv_1.6.14           htmltools_0.5.7         sass_0.4.8             
## [34] RCurl_1.98-1.14         yaml_2.3.8              later_1.3.2            
## [37] crayon_1.5.2            jquerylib_0.1.4         GO.db_3.18.0           
## [40] ellipsis_0.3.2          cachem_1.0.8            iterators_1.0.14       
## [43] foreach_1.5.2           mime_0.12               digest_0.6.34          
## [46] fastmap_1.1.1           colorspace_2.1-0        cli_3.6.2              
## [49] DiagrammeR_1.0.11       magrittr_2.0.3          promises_1.2.1         
## [52] bit64_4.0.5             rmarkdown_2.25          XVector_0.42.0         
## [55] httr_1.4.7              matrixStats_1.2.0       bit_4.0.5              
## [58] ragg_1.2.7              png_0.1-8               GetoptLong_1.0.5       
## [61] memoise_2.0.1           shiny_1.8.0             evaluate_0.23          
## [64] doParallel_1.0.17       rlang_1.1.3             Rcpp_1.0.12            
## [67] xtable_1.8-4            glue_1.7.0              DBI_1.2.1              
## [70] xml2_1.3.6              rstudioapi_0.15.0       jsonlite_1.8.8         
## [73] R6_2.5.1                systemfonts_1.0.5       zlibbioc_1.48.0