Visualize DAGs

Zuguang Gu ( z.gu@dkfz.de )

2024-02-06

There are two functions for visualizing DAGs. dag_graphviz() uses the DiagrammeR package to visualize small DAGs as HTML widgets. dag_circular_viz() uses a circular layout for large DAGs.

Small DAGs

Let’s first create a small DAG.

library(simona)
parents  = c("a", "a", "b", "b", "c", "d")
children = c("b", "c", "c", "d", "e", "f")
dag_small = create_ontology_DAG(parents, children)
dag_graphviz(dag_small)

The argument node_param can be set to a list of graphical parameters.

color = 2:7
shape = c("polygon", "box", "oval", "egg", "diamond", "parallelogram")
dag_graphviz(dag_small, node_param = list(color = color, shape = shape))

A graphical parameter does not have to be a vector with one value per node. It can also be a single value that applies to all nodes, or a named vector that customizes only a subset of nodes.

color = c("a" = "red", "d" = "blue")
dag_graphviz(dag_small, node_param = list(color = color))
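One way to picture the three accepted forms (one value per node, a single value, or a named subset) is that each is expanded to one value per node before drawing. The following is a minimal sketch in base R. expand_param() is a hypothetical helper written for illustration, it is not part of simona, and the "black" default is an assumption.

```r
# Hypothetical helper (not part of simona): expand a graphical parameter
# to exactly one value per node.
expand_param = function(value, nodes, default) {
    if (is.null(names(value))) {
        # unnamed: either a single value for all nodes, or one value per node
        if (length(value) == 1) {
            value = rep(value, length(nodes))
        } else if (length(value) != length(nodes)) {
            stop("an unnamed vector must have length 1 or one value per node")
        }
        names(value) = nodes
        return(value)
    }
    # named: start from the default and overwrite only the listed nodes
    out = rep(default, length(nodes))
    names(out) = nodes
    known = intersect(names(value), nodes)
    out[known] = value[known]
    out
}

nodes = c("a", "b", "c", "d", "e", "f")
expand_param("red", nodes, default = "black")
expand_param(c("a" = "red", "d" = "blue"), nodes, default = "black")
```

In the second call only "a" and "d" receive a custom color; all other nodes keep the assumed default.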

The full set of node-level parameters can be found at: https://graphviz.org/docs/nodes/. They can all be set in the same format as color demonstrated above.

The argument edge_param can be set to a list of graphical parameters for configuring edges. There are two ways to control edge colors. In the following code, we additionally add the relation types to the DAG.

parents  = c("a", "a", "b", "b", "c", "d")
children = c("b", "c", "c", "d", "e", "f")
relations = c("is_a", "is_a", "part_of", "part_of", "is_a", "is_a")
dag_small = create_ontology_DAG(parents, children, relations = relations)

Now since each edge is associated with a relation type, the color can be set by a vector with relation types as names:

edge_color = c("is_a" = "red", "part_of" = "blue")
dag_graphviz(dag_small, edge_param = list(color = edge_color))

To highlight specific edges, the parameter can be set to a named vector where the names specify the edges directly, each as a pair of connected terms.

edge_color = c("a -> b" = "red", "c -> e" = "blue")
dag_graphviz(dag_small, edge_param = list(color = edge_color))

The direction in the specification does not matter. The following specifications are all equivalent, but there must be spaces before and after the arrow.

"a -> b"  = "red"
"a <- b"  = "red"
"b -> a"  = "red"
"b <- a"  = "red"
"a <-> b" = "red"
"a - b"   = "red"

The full set of edge-level parameters can be found at https://graphviz.org/docs/edges/. They can all be set in the same format as edge_color demonstrated above.
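To illustrate why the direction in an edge name does not matter while the spaces around the arrow do, the following base R sketch reduces every specification to an unordered pair of terms. normalize_edge() is a hypothetical helper written for this illustration. It is not how simona is known to parse the names, only one way such parsing could work.

```r
# Hypothetical helper (not part of simona): reduce an edge specification
# to a direction-free key. The arrow must be surrounded by spaces.
normalize_edge = function(spec) {
    parts = strsplit(spec, " +(<->|->|<-|-) +")[[1]]
    if (length(parts) != 2 || any(parts == "")) {
        stop("cannot parse edge specification: ", spec)
    }
    # sorting the two terms removes the direction
    paste(sort(parts), collapse = " - ")
}

specs = c("a -> b", "a <- b", "b -> a", "b <- a", "a <-> b", "a - b")
unique(unname(sapply(specs, normalize_edge)))
```

All six specifications collapse to the same key, whereas a name without spaces such as "a->b" is rejected by this sketch.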

Internally, dag_graphviz() generates the “DOT” code for Graphviz visualization. The DOT code can be obtained with dag_as_DOT():

dag_as_DOT(dag_small, node_param = list(color = color, shape = shape)) |> cat()
digraph {
  graph [overlap = true]

  node [fontname = "Helvetical"]
  "a" [
    color = "red",
    shape = "polygon",
    fillcolor = "lightgrey",
    style = "solid",
    fontcolor = "black",
    fontsize = "10",
  ];
  "b" [
    color = "black",
    shape = "box",
    fillcolor = "lightgrey",
    style = "solid",
    fontcolor = "black",
    fontsize = "10",
  ];
  "c" [
    color = "black",
    shape = "oval",
    fillcolor = "lightgrey",
    style = "solid",
    fontcolor = "black",
    fontsize = "10",
  ];
  "d" [
    color = "blue",
    shape = "egg",
    fillcolor = "lightgrey",
    style = "solid",
    fontcolor = "black",
    fontsize = "10",
  ];
  "e" [
    color = "black",
    shape = "diamond",
    fillcolor = "lightgrey",
    style = "solid",
    fontcolor = "black",
    fontsize = "10",
  ];
  "f" [
    color = "black",
    shape = "parallelogram",
    fillcolor = "lightgrey",
    style = "solid",
    fontcolor = "black",
    fontsize = "10",
  ];

  # edges
  "a" -> "b" [
    color = "black",
    style = "solid",
    dir = "back",
    tooltip = "is_a",
  ];
  "a" -> "c" [
    color = "black",
    style = "solid",
    dir = "back",
    tooltip = "is_a",
  ];
  "b" -> "c" [
    color = "black",
    style = "solid",
    dir = "back",
    tooltip = "part_of",
  ];
  "b" -> "d" [
    color = "black",
    style = "solid",
    dir = "back",
    tooltip = "part_of",
  ];
  "c" -> "e" [
    color = "black",
    style = "solid",
    dir = "back",
    tooltip = "is_a",
  ];
  "d" -> "f" [
    color = "black",
    style = "solid",
    dir = "back",
    tooltip = "is_a",
  ];
}

You can paste the DOT code to http://magjac.com/graphviz-visual-editor/ to generate the diagram.
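Instead of copying the DOT code from the console, it can also be written to a file. The following is a minimal sketch that assumes dag_as_DOT() returns the DOT code as a character vector, as the cat() call above suggests. The temporary file name is arbitrary.

```r
library(simona)

parents  = c("a", "a", "b", "b", "c", "d")
children = c("b", "c", "c", "d", "e", "f")
dag_small = create_ontology_DAG(parents, children)

# write the DOT code to a temporary file
dot_file = tempfile(fileext = ".dot")
writeLines(dag_as_DOT(dag_small), dot_file)

# look at the first lines to confirm what was written
head(readLines(dot_file), 3)
```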

dag_graphviz() is very useful for visualizing a sub-DAG derived from the global DAG, for example all upstream terms of a GO term. In the following example, dag[, "GO:0010228"] returns a sub-DAG of all upstream terms of GO:0010228.

dag = create_ontology_DAG_from_GO_db()
dag_graphviz(dag[, "GO:0010228"], 
    node_param = list(
        fillcolor = c("GO:0010228" = "pink"),
        style = c("GO:0010228" = "filled")
    ),
    edge_param = list(
        color = c("is_a" = "purple", "part_of" = "darkgreen"),
        style = c("is_a" = "solid", "part_of" = "dashed")
    ), width = 600, height = 600)
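The same subsetting can be tried on the small DAG without loading GO. The sketch below rebuilds the six-term DAG from the first section and extracts the term "d" together with its upstream terms. From the edges, the result should contain "a", "b" and "d".

```r
library(simona)

parents  = c("a", "a", "b", "b", "c", "d")
children = c("b", "c", "c", "d", "e", "f")
dag_small = create_ontology_DAG(parents, children)

# sub-DAG of "d" and all of its upstream terms
dag_up = dag_small[, "d"]
dag_all_terms(dag_up)

# highlight the term the sub-DAG was derived from
dag_graphviz(dag_up,
    node_param = list(
        fillcolor = c("d" = "pink"),
        style = c("d" = "filled")
    ))
```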

Large DAGs

Visualizing large DAGs is not easy because a term can have more than one parent. dag_circular_viz() uses a circular layout to visualize large DAGs.

dag_circular_viz(dag)

In the circular layout, each circle corresponds to a specific depth (the maximal distance to the root). The distance of a circle from the center is proportional to the logarithm of the number of terms whose depth is equal to or less than the depth of that circle. On each circle, each term is assigned a width (a sector of the circle), and its offspring terms are only drawn within that sector. The width is proportional to the number of leaf terms in the corresponding sub-DAG. Dot size corresponds to the number of child terms.
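As a small worked example of the radius rule, the base R sketch below uses the six-term DAG from the first section. The depths are worked out by hand from its edges (a is the root; c is reached through a -> b -> c, so its depth is 2). Scaling the outermost circle to radius 1 is an assumption made only for this illustration. The actual constants used by dag_circular_viz() are not stated here.

```r
# depth = maximal distance to the root, worked out by hand for the small DAG
depth = c(a = 0, b = 1, c = 2, d = 2, e = 3, f = 3)

# number of terms with depth equal to or less than each depth
n_cum = cumsum(table(depth))

# radius proportional to the logarithm of the cumulative count,
# scaled so that the outermost circle has radius 1 (assumption)
radius = log(n_cum) / log(max(n_cum))
round(radius, 3)
```

Because the cumulative counts grow quickly in a real ontology, the logarithm keeps the outer circles from being pushed far away from the inner ones.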

By default, the DAG is cut after the root term, and each sub-DAG is assigned a different color. Child terms of the root are added to the legend of the plot. If there is a “name” column in the meta data frame, the texts in that column are used as the legend labels; otherwise term IDs are used.

By default, the DAG is split at a certain level, controlled by the argument partition_by_level. The split can also be controlled by the number of terms allowed in each sub-DAG, with the argument partition_by_size.

dag_circular_viz(dag, partition_by_size = 5000)

dag_treelize() converts a DAG to a tree where each term has only one parent. The circular visualization of the reduced tree is as follows:

tree = dag_treelize(dag)
dag_circular_viz(tree)
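To make the idea of reducing a DAG to a tree concrete, the base R sketch below keeps a single parent for every term of the six-term DAG from the first section. Keeping the parent with the largest depth is an assumption chosen for this illustration; it is one possible rule and not necessarily the one dag_treelize() applies.

```r
parents  = c("a", "a", "b", "b", "c", "d")
children = c("b", "c", "c", "d", "e", "f")
# depth = maximal distance to the root, worked out by hand
depth = c(a = 0, b = 1, c = 2, d = 2, e = 3, f = 3)

edges = data.frame(parent = parents, child = children, stringsAsFactors = FALSE)
edges$parent_depth = unname(depth[edges$parent])

# put the deepest parent of each child first, then keep one row per child
edges = edges[order(edges$child, -edges$parent_depth), ]
tree_edges = edges[!duplicated(edges$child), c("parent", "child")]
tree_edges
```

Only term "c" has two parents in this DAG, so the sketch drops the edge from "a" to "c" and keeps the one from "b" to "c".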

One useful application is to map GO terms of interest (e.g. significant GO terms from functional enrichment analysis) to the DAG. In the following example, go_tb contains GO terms from an enrichment analysis.

go_tb = readRDS(system.file("extdata", "sig_go_tb.rds", package = "simona"))
sig_go_ids = go_tb$ID[go_tb$p.adjust < 0.01]
# keep only the IDs that exist in the current GO.db version
sig_go_ids = intersect(sig_go_ids, dag_all_terms(dag))
dag_circular_viz(dag, highlight = sig_go_ids)

In the next example, we will map -log10(p.adjust) to the node size.

p.adjust = go_tb$p.adjust[go_tb$p.adjust < 0.01]

dag_circular_viz() has a node_size argument for setting the node sizes of terms, so we only need to calculate the node sizes from the adjusted p-values.

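In the following code, we define a simple node_size_fun() function that linearly maps values to node sizes within [2, 10]. The upper end of the range is reached at the 95th percentile, and larger values are capped at the maximal size.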

node_size_fun = function(x, range = c(2, 10)) {
    s = (range[2] - range[1])/(quantile(x, 0.95) - min(x)) * (x - min(x)) + range[1]
    s[s > range[2]] = range[2]
    s
}
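A quick check of the mapping on three values helps to see what the function returns. The sketch below repeats the definition so that it runs on its own; the input values are chosen only for illustration.

```r
node_size_fun = function(x, range = c(2, 10)) {
    s = (range[2] - range[1])/(quantile(x, 0.95) - min(x)) * (x - min(x)) + range[1]
    s[s > range[2]] = range[2]
    s
}

x = -log10(c(0.01, 0.001, 0.0001))
s = node_size_fun(x)
s
```

The smallest value is mapped to the lower end of the range and the largest value is capped at the upper end. Note that the scaling depends on min(x) and on the 95th percentile of x, so the same value can receive a different size when the function is applied to a different vector.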

We also generate a legend for the node sizes:

library(ComplexHeatmap)
lgd = Legend(title = "p.adjust", at = -log10(c(0.01, 0.001, 0.0001)), 
    labels = c("0.01", "0.001", "0.0001"), type = "points",
    size = unit(node_size_fun(-log10(c(0.01, 0.001, 0.0001))), "pt"))

Calculate node sizes:

node_size = rep(2, dag_n_terms(dag))
names(node_size) = dag_all_terms(dag)
# align the adjusted p-values with `sig_go_ids`, which may have lost terms in the intersect() step
p.adjust = go_tb$p.adjust[match(sig_go_ids, go_tb$ID)]
node_size[sig_go_ids] = node_size_fun(-log10(p.adjust))

And finally make the circular plot:

dag_circular_viz(dag, 
    highlight = sig_go_ids,
    node_size = node_size,
    edge_transparency = 0.92, 
    other_legends = lgd)

Session Info

sessionInfo()
## R version 4.3.2 Patched (2023-11-13 r85521)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.3 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.18-bioc/R/lib/libRblas.so 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/New_York
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] grid      stats4    stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] ComplexHeatmap_2.18.0 org.Hs.eg.db_3.18.0   AnnotationDbi_1.64.1 
##  [4] IRanges_2.36.0        S4Vectors_0.40.2      Biobase_2.62.0       
##  [7] BiocGenerics_0.48.1   igraph_2.0.1.1        simona_1.0.10        
## [10] knitr_1.45           
## 
## loaded via a namespace (and not attached):
##  [1] KEGGREST_1.42.0         circlize_0.4.15         shape_1.4.6            
##  [4] rjson_0.2.21            xfun_0.41               bslib_0.6.1            
##  [7] visNetwork_2.1.2        htmlwidgets_1.6.4       GlobalOptions_0.1.2    
## [10] bitops_1.0-7            vctrs_0.6.5             tools_4.3.2            
## [13] curl_5.2.0              parallel_4.3.2          Polychrome_1.5.1       
## [16] RSQLite_2.3.5           highr_0.10              cluster_2.1.6          
## [19] blob_1.2.4              pkgconfig_2.0.3         RColorBrewer_1.1-3     
## [22] scatterplot3d_0.3-44    GenomeInfoDbData_1.2.11 lifecycle_1.0.4        
## [25] compiler_4.3.2          textshaping_0.3.7       Biostrings_2.70.2      
## [28] codetools_0.2-19        clue_0.3-65             GenomeInfoDb_1.38.5    
## [31] httpuv_1.6.14           htmltools_0.5.7         sass_0.4.8             
## [34] RCurl_1.98-1.14         yaml_2.3.8              later_1.3.2            
## [37] crayon_1.5.2            jquerylib_0.1.4         GO.db_3.18.0           
## [40] ellipsis_0.3.2          cachem_1.0.8            iterators_1.0.14       
## [43] foreach_1.5.2           mime_0.12               digest_0.6.34          
## [46] fastmap_1.1.1           colorspace_2.1-0        cli_3.6.2              
## [49] DiagrammeR_1.0.11       magrittr_2.0.3          promises_1.2.1         
## [52] bit64_4.0.5             rmarkdown_2.25          XVector_0.42.0         
## [55] httr_1.4.7              matrixStats_1.2.0       bit_4.0.5              
## [58] ragg_1.2.7              png_0.1-8               GetoptLong_1.0.5       
## [61] memoise_2.0.1           shiny_1.8.0             evaluate_0.23          
## [64] doParallel_1.0.17       rlang_1.1.3             Rcpp_1.0.12            
## [67] xtable_1.8-4            glue_1.7.0              DBI_1.2.1              
## [70] xml2_1.3.6              rstudioapi_0.15.0       jsonlite_1.8.8         
## [73] R6_2.5.1                systemfonts_1.0.5       zlibbioc_1.48.0