Make Genome-level Trellis Graph

Author: Zuguang Gu ( z.gu@dkfz.de )

Date: 2024-10-29

Package version: 1.39.0


A Trellis graph splits data by certain conditions and visualizes the subset of data in each category in parallel. In genomic data analysis, the conditional variable is usually the chromosome. The advantage of a Trellis graph is that it can easily reveal relationships among multiple variables in the data. In R, the lattice and ggplot2 packages can make Trellis graphs; however, they have limitations for whole-genome plots.

For a single continuous region, multiple tracks are supported in ggbio and Gviz. But comparing more than one region quickly becomes complex. Because of how ggbio and Gviz are designed, it is not efficient to visualize, e.g., more than 10 regions at the same time.

Here, gtrellis provides a flexible way to arrange genomic categories.

Basic design

gtrellis arranges genomic categories in Trellis style and supports multiple tracks for visualization. In this package, initializing the layout and adding graphics are independent steps. After the layout is initialized, each intersection between a track and a genomic category is called a cell or panel. Each cell is an independent plotting region (actually a viewport in the grid system) to which user-defined graphics can be added afterwards.

gtrellis is implemented in the grid graphics system, so to add graphics to a cell, users need low-level graphics functions (grid.points, grid.lines, grid.rect, …), which are quite similar to those in the classic (base) graphics system.
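
To make the idea of "a cell is a viewport" concrete, here is a minimal sketch that uses only the grid package. The viewport `demo_cell` and its scales are made up for illustration; it stands in for a cell and is not something gtrellis creates.

```r
library(grid)

# Open a null PDF device so the sketch runs without opening a window.
pdf(NULL)
grid.newpage()

# A stand-in for one cell: a viewport with its own x (genomic) and y scales.
# The name and the scale values are illustrative assumptions.
cell <- viewport(x = 0.5, y = 0.5, width = 0.8, height = 0.4,
                 xscale = c(0, 1e6), yscale = c(0, 1), name = "demo_cell")
pushViewport(cell)
grid.rect(gp = gpar(col = "grey"))

# Low-level grid functions draw in the native coordinates of the viewport.
pos <- c(1e5, 5e5, 9e5)
val <- c(0.2, 0.8, 0.5)
grid.points(pos, val, default.units = "native", pch = 16, size = unit(2, "mm"))
grid.lines(pos, val, default.units = "native")

# Record which viewport we were drawing in, then leave it.
current_name <- current.viewport()$name
popViewport()
invisible(dev.off())
```

Passing `default.units = "native"` is what lets genomic positions and data values be used directly as coordinates.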

Initialize the layout

gtrellis_layout() is used to create the global layout. By default, it initializes the layout with hg19 and puts all chromosomes in one row. Each chromosome has only one track and the range on the y-axis is 0 to 1.

library(gtrellis)
gtrellis_layout()

category can be used to select a subset of chromosomes as well as to set their order. gtrellis_show_index() is a helper function that adds index information to each cell; it is used here only for demonstration purposes in this vignette.

gtrellis_layout(category = c("chr3", "chr1"))
gtrellis_show_index()

Other species are also supported as long as the corresponding chromInfo files exist on the UCSC FTP server. E.g. the chromInfo file for mouse (mm10) is http://hgdownload.cse.ucsc.edu/goldenpath/mm10/database/chromInfo.txt.gz. Since there may be many short scaffolds in a chromInfo file, if category is not specified, gtrellis first removes these short scaffolds before making the plot. Non-normal chromosomes (e.g. “chr1_xxxxxx”) are removed as well. This detection is not always correct; if the chromosomes shown on the plot are not what you expect, set category manually.

gtrellis_layout(species = "mm10")
gtrellis_show_index()
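
If the automatic detection picks the wrong chromosomes, one way to build category by hand is sketched below in base R. The sequence names are illustrative, and the regular expression is an assumption chosen for this example; it is not the rule gtrellis applies internally.

```r
# Illustrative sequence names of the kind found in a chromInfo file.
chrom_names <- c("chr1", "chr2", "chrX", "chrY", "chrM",
                 "chr1_GL456210_random", "chrUn_GL456239")

# Keep names of the form "chr" + number, X or Y (assumed pattern,
# for illustration only).
keep <- grepl("^chr([0-9]+|X|Y)$", chrom_names)
category <- chrom_names[keep]
category

# The vector could then be passed on explicitly, e.g.
# gtrellis_layout(species = "mm10", category = category)
```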

You can put chromosomes on multiple rows by specifying nrow and/or ncol. For chromosomes in the same column, the column width is the width of the longest chromosome in that column, and shorter chromosomes are padded with empty areas.

gtrellis_layout(nrow = 3)
gtrellis_show_index()

gtrellis_layout(ncol = 5)
gtrellis_show_index()

You can set the byrow argument to arrange chromosomes either by rows or by columns. As explained before, chromosomes in the same column by default share the length of the longest one, so it is better to put chromosomes of similar length in the same column.

gtrellis_layout(ncol = 5, byrow = FALSE)
gtrellis_show_index()
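
The effect of byrow on wasted space can be estimated with a small base-R sketch. The chromosome lengths are made-up round numbers, `col_width()` is a hypothetical helper, and the sketch assumes chromosomes fill the grid the same way matrix() fills its cells.

```r
# Illustrative chromosome lengths in Mb (made-up round numbers, not real data).
len <- c(chr1 = 250, chr2 = 240, chr3 = 200, chr4 = 190, chr5 = 180,
         chr6 = 170, chr7 = 160, chr8 = 150, chr9 = 140, chr10 = 130)
n_col <- 5

# Hypothetical helper: the width of each column is the length of the
# longest chromosome placed in it.
col_width <- function(len, n_col, byrow) {
  n_row <- ceiling(length(len) / n_col)
  padded <- c(len, rep(NA, n_row * n_col - length(len)))
  m <- matrix(padded, nrow = n_row, ncol = n_col, byrow = byrow)
  apply(m, 2, max, na.rm = TRUE)
}

w_row <- col_width(len, n_col, byrow = TRUE)   # fill row by row
w_col <- col_width(len, n_col, byrow = FALSE)  # fill column by column
sum(w_row)
sum(w_col)
```

With lengths sorted in decreasing order, filling by column puts neighbours of similar length into the same column, so the total width needed is smaller.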

If equal_width is set to TRUE, the layout will be a ‘standard’ Trellis layout. All chromosomes share the same range on the x-axis (the length of the longest chromosome) and shorter chromosomes are padded with empty areas.

gtrellis_layout(equal_width = TRUE)
gtrellis_show_index()

Give all columns equal width and also use multiple rows.

gtrellis_layout(ncol = 5, byrow = FALSE, equal_width = TRUE)
gtrellis_show_index()

There is also a ‘compact’ mode of the layout: when there are multiple rows, chromosomes on the same row are packed together without being aligned to columns. This mode saves a lot of white space, but the drawback is that positions are not easy to compare directly among chromosomes.

gtrellis_layout(nrow = 3, compact = TRUE)
gtrellis_show_index()

Set gaps between chromosomes with gap. Note that if it is set as a numeric value, it can only be 0 (no gap).

gtrellis_layout(gap = 0)

Or gap can be a unit object.

gtrellis_layout(gap = unit(5, "mm"))

When you arrange the layout with multiple rows, gap can also have length two. In this case, the first element corresponds to the gaps between rows and the second to the gaps between columns.

gtrellis_layout(ncol = 5, gap = unit(c(5, 2), "mm"))

Chromosomes may have multiple tracks, which describe multi-dimensional data. The tracks are created with the n_track argument.

gtrellis_layout(n_track = 3)
gtrellis_show_index()

By default, tracks share the same height. The height can be customized with the track_height argument. If it is set as numeric values, they are normalized to fractions of their sum.

gtrellis_layout(n_track = 3, track_height = c(1, 2, 3))
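
The normalization described above amounts to the following arithmetic, shown here as a base-R sketch rather than as the package's internal code.

```r
# Numeric track heights as passed in the example above.
track_height <- c(1, 2, 3)

# Each track gets its share of the total available height.
rel_height <- track_height / sum(track_height)
round(rel_height, 3)
```

So with c(1, 2, 3) the third track takes half of the available height and the first takes one sixth.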

track_height can also be a unit object.

gtrellis_layout(n_track = 3, 
    track_height = unit.c(unit(1, "cm"), unit(1, "null"), grobHeight(textGrob("chr1"))))

track_axis controls whether to show y-axes. If a value is set to FALSE, the y-axis on the corresponding track is not drawn.

gtrellis_layout(n_track = 3, track_axis = c(FALSE, TRUE, FALSE), xaxis = FALSE, xlab = "")

Set y-limits with track_ylim. It should be a two-column matrix, but for convenience it can also be a vector, which is filled into a two-column matrix by rows. If it is a vector of length 2, all tracks share the same y-limits.

gtrellis_layout(n_track = 3, track_ylim = c(0, 3, -4, 4, 0, 1000000))
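
To see which limits end up on which track, the by-row filling can be reproduced in base R. This is a sketch of the described behaviour; the row and column names are added here only for readability.

```r
# The vector from the example above, filled into two columns by rows.
ylim_vec <- c(0, 3, -4, 4, 0, 1000000)
ylim_mat <- matrix(ylim_vec, ncol = 2, byrow = TRUE,
                   dimnames = list(paste0("track", 1:3), c("ymin", "ymax")))
ylim_mat

# A length-2 vector repeated for all three tracks.
shared <- matrix(rep(c(0, 1), 3), ncol = 2, byrow = TRUE)
shared
```

Each row holds the lower and upper limit of one track, so the second track here ranges from -4 to 4.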

Axis ticks are added on one side of rows or columns; asist_ticks controls whether to add axis ticks on the other side as well. (Compare the following figure to the one above.)

gtrellis_layout(n_track = 3, track_ylim = c(0, 3, -4, 4, 0, 1000000), asist_ticks = FALSE)

Set the x-label with xlab, the y-labels with track_ylab and the plot title with title.

gtrellis_layout(n_track = 3, title = "title", track_ylab = c("", "bbbbbb", "ccccccc"), xlab = "xlab")

Since chromosomes can have more than one track, the following shows a layout with multiple columns and multiple tracks.

gtrellis_layout(n_track = 3, ncol = 4)
gtrellis_show_index()

Set border to FALSE to remove borders.

gtrellis_layout(n_track = 3, ncol = 4, border = FALSE, xaxis = FALSE, track_axis = FALSE, xlab = "")
gtrellis_show_index()

Add graphics

After the layout is initialized, each cell can be thought of as an ordinary coordinate system, and graphics can then be added to it.

Pre-defined track

First we will introduce functions which add fixed types of graphics.

add_points_track() directly adds points at the midpoints of the corresponding genomic regions. The genomic region variable can be either a data frame or a GRanges object.

library(circlize)
bed = generateRandomBed()
gtrellis_layout(track_ylim = range(bed[[4]]), nrow = 3, byrow = FALSE)
add_points_track(bed, bed[[4]], gp = gpar(col = ifelse(bed[[4]] > 0, "red", "green")))
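
As a rough illustration of what "midpoints of the regions" means, the x-positions can be computed by hand in base R. The data frame `bed_demo` is made up for this sketch, and the calculation mirrors the description above rather than the package's internals.

```r
# A small, made-up BED-like data frame: chromosome, start, end, value.
bed_demo <- data.frame(
  chr   = c("chr1", "chr1", "chr2"),
  start = c(100, 5000, 200),
  end   = c(300, 7000, 1000),
  value = c(0.5, -1.2, 0.3)
)

# x-position of each point: the middle of the region.
mid <- (bed_demo$start + bed_demo$end) / 2

# Same colour rule as in the example above.
col <- ifelse(bed_demo$value > 0, "red", "green")
data.frame(chr = bed_demo$chr, mid = mid, value = bed_demo$value, col = col)
```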

add_segments_track() adds segments for corresponding regions.

bed = generateRandomBed(nr = 100)
gtrellis_layout(track_ylim = range(bed[[4]]), nrow = 3, byrow = FALSE)
add_segments_track(bed, bed[[4]], gp = gpar(col = ifelse(bed[[4]] > 0, "red", "green"), lwd = 4))

add_lines_track() adds lines. Also it can draw areas below the lines (or above, depending on baseline).

bed = generateRandomBed(200)
gtrellis_layout(n_track = 2, track_ylim = rep(range(bed[[4]]), 2), nrow = 3, byrow = FALSE)
add_lines_track(bed, bed[[4]])
add_lines_track(bed, bed[[4]], area = TRUE, gp = gpar(fill = "grey", col = NA))

add_rect_track() adds rectangles, which is useful for drawing bars.

col_fun = colorRamp2(c(-1, 0, 1), c("green", "black", "red"))
gtrellis_layout(track_ylim = range(bed[[4]]), nrow = 3, byrow = FALSE)
add_rect_track(bed, h1 = bed[[4]], h2 = 0, 
    gp = gpar(col = NA, fill = col_fun(bed[[4]])))

add_heatmap_track() adds a heatmap. The heatmap fills the whole track vertically.

gtrellis_layout(nrow = 3, byrow = FALSE, track_axis = FALSE)
mat = matrix(rnorm(nrow(bed)*4), ncol = 4)
add_heatmap_track(bed, mat, fill = col_fun)

By default, these pre-defined graphics functions draw in the next track. However, different types of graphics can be drawn in the same track by setting track manually.

col_fun = colorRamp2(c(-1, 0, 1), c("green", "black", "red"))
gtrellis_layout(track_ylim = range(bed[[4]]), nrow = 3, byrow = FALSE)
add_rect_track(bed, h1 = bed[[4]], h2 = 0, gp = gpar(col = NA, fill = col_fun(bed[[4]])))
add_lines_track(bed, bed[[4]], track = current_track())
add_points_track(bed, bed[[4]], track = current_track(), size = unit(abs(bed[[4]])*5, "mm"))