1 Overview

TreeAndLeaf is an R package for improved visualization of dendrograms and phylogenetic trees. It changes the way a dendrogram is viewed: using the igraph format and the RedeR package, the nodes are rearranged while the hierarchical relations are kept intact, resulting in an image that is easier to read and can be enhanced with additional layers of information.

The classical dendrogram is a limited format in two ways. First, it displays only one type of information: the hierarchical relation between the observations. Second, it is limited by its size: the larger the dataset, the less readable it becomes. TreeAndLeaf improves the use of space because it spreads the tree in all directions, allowing for better visualization and better images for publications. The RedeR package, used for plotting, applies a force-based relaxation algorithm that helps nodes avoid overlaps. By combining RedeR and the igraph format, the package allows the dendrogram to be customized with multiple layers of information, represented by edge widths and colors, node colors, node sizes, line colors, etc. The package also includes a fast formatting option for quick, exploratory analysis. The package is therefore designed to make plotting dendrograms more useful, less confusing and more productive. The workflow for using this package is depicted in Figure 1.

Figure 1. A brief representation of what TreeAndLeaf functions are capable of. (A,B) The dendrogram in A was used to construct the graph representation shown in B. (C) Workflow summary. The main input data consists of a distance matrix, which is used to generate a dendrogram. The TreeAndLeaf package transforms the dendrogram into a graph representation.

This document guides you through the basics and offers ideas on how to use the functions to their full potential. Although TreeAndLeaf was created for systems biology applications, it is not limited to this use.

2 Quick Start

2.1 Package requirements

This section provides a quick and basic example using the built-in R data frame USArrests. First, the packages required for the analysis are loaded.
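The loading code itself is not shown in this section. A minimal sketch of that step follows, assuming the three packages named in this document (TreeAndLeaf, RedeR and igraph) are the ones needed and are already installed; the variable names `pkgs` and `available` are illustrative only.

```r
# Packages named in this document (assumed to be the ones required here).
pkgs <- c("TreeAndLeaf", "RedeR", "igraph")

# Check which packages are installed, without attaching anything yet.
available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

# Attach the installed packages; startup messages are printed as usual.
for (p in pkgs[available]) {
  library(p, character.only = TRUE)
}

# Report anything that still needs to be installed.
if (any(!available)) {
  message("Not installed: ", paste(pkgs[!available], collapse = ", "))
}
```

If all three packages are present, this is equivalent to calling library() on each of them in turn.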

#> ***This is RedeR 1.38.0! For a quick start, please type 'vignette('RedeR')'.
#>    Supporting information is available at Genome Biology 13:R29, 2012,
#>    (doi:10.1186/gb-2012-13-4-r29).

2.2 A small dendrogram example

As stated above, USArrests is a data frame readily available in R. To learn more about this dataset, run ?USArrests. To use TreeAndLeaf functions to their full potential, it is recommended that your data frame has row names set before building the dendrogram, as this one does.
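The dimensions and first rows of the data frame can be inspected with base R calls such as the following; this is a sketch of one way to do it, not necessarily the exact code used to produce the output in this section.

```r
# Number of rows (observations) and columns (variables).
dim(USArrests)

# First six rows, including the row names (US states).
head(USArrests)

# Confirm that row names are set; they are the state names here.
head(rownames(USArrests))
```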

#> [1] 50  4
#>            Murder Assault UrbanPop Rape
#> Alabama      13.2     236       58 21.2
#> Alaska       10.0     263       48 44.5
#> Arizona       8.1     294       80 31.0
#> Arkansas      8.8     190       50 19.5
#> California    9.0     276       91 40.6
#> Colorado      7.9     204       78 38.7

2.3 Building a dendrogram using R hclust()

To build a dendrogram, you need a distance matrix of the observations. For example, dist() with its default “euclidean” method generates the distance matrix, and hclust() with the “average” method then creates the dendrogram.

# Euclidean distances (the dist() default), then average-linkage clustering
hc <- hclust(dist(USArrests), method = "average")
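The same step can be written out in two stages, which makes the intermediate distance matrix available for inspection. This is a sketch using only base R (the stats package); the name `d` is introduced here for illustration.

```r
# Step 1: pairwise Euclidean distances between the 50 states.
d <- dist(USArrests, method = "euclidean")

# Step 2: agglomerative hierarchical clustering with average linkage.
hc <- hclust(d, method = "average")

# Inspect the result: an "hclust" object with one label per observation.
class(hc)
length(hc$labels)
hc$method
```

Calling plot(hc) on this object draws the classical dendrogram that TreeAndLeaf takes as its starting point.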