Contents

This vignette shows an example workflow for ensemble biclustering analysis with the mosbi package. Every function of the package has a help page with a detailed documentation. To access these type help(package=mosbi) in the R console.

0.0.1 Load packages

Import dependencies.

library(mosbi)

0.0.2 Helper functions

Two additional functions are defined, to calculate z-scores of the data and to visualize the biclusters as a histogram.

z_score <- function(x, margin = 2) {
    z_fun <- function(y) {
        (y - mean(y, na.rm = TRUE)) / sd(y, na.rm = TRUE)
    }

    if (margin == 2) {
        return(apply(x, margin, z_fun))
    } else if (margin == 1) {
        return(t(apply(x, margin, z_fun)))
    }
}

bicluster_histo <- function(biclusters) {
    cols <- mosbi::colhistogram(biclusters)
    rows <- mosbi::rowhistogram(biclusters)

    graphics::par(mfrow = c(1, 2))
    hist(cols, main = "Column size ditribution")
    hist(rows, main = "Row size ditribution")
}

0.0.3 1. Download and prepare data

Biclustering will be done on a data matrix. As an example,
lipidomics dataset from the metabolights database will be used https://www.ebi.ac.uk/metabolights/MTBLS562. The data consists of 40 samples (columns) and 245 lipids (rows).

# get data
data(mouse_data)

mouse_data <- mouse_data[c(
    grep(
        "metabolite_identification",
        colnames(mouse_data)
    ),
    grep("^X", colnames(mouse_data))
)]

# Make data matrix
data_matrix <- z_score(log2(as.matrix(mouse_data[2:ncol(mouse_data)])), 1)

rownames(data_matrix) <- mouse_data$metabolite_identification

stats::heatmap(data_matrix)

The data has a gaussian-like distribution and no missing values, so we can proceed with biclustering.

0.0.4 2. Compute biclusters

The mosbi package is able to work with results of different biclustering algorithms. The approach unites the results from different algorithms. The results of four example algorithms will be computed and converted to mosbi::bicluster objects. For a list of all supported biclustering algorithms/packages type ?mosbi::get_biclusters in the R console.

# Fabia
fb <- mosbi::run_fabia(data_matrix) # In case the algorithms throws an error,
#> Cycle: 0
Cycle: 20
Cycle: 40
Cycle: 60
Cycle: 80
Cycle: 100
Cycle: 120
Cycle: 140
Cycle: 160
Cycle: 180
Cycle: 200
Cycle: 220
Cycle: 240
Cycle: 260
Cycle: 280
Cycle: 300
Cycle: 320
Cycle: 340
Cycle: 360
Cycle: 380
Cycle: 400
Cycle: 420
Cycle: 440
Cycle: 460
Cycle: 480
Cycle: 500
# return an empty list

# isa2
BCisa <- mosbi::run_isa(data_matrix)

# Plaid
BCplaid <- mosbi::run_plaid(data_matrix)
#> layer: 0 
#>  5882.744
#> layer: 1 
#> [1]  0 97 15
#> [1]  1 85 12
#> [1]  2 78 12
#> [1] 30 78 12
#> [1] 31 38 12
#> [1] 32 38  6
#> [1] 33 38  6
#> [1] 60 38  6
#> [1] 3
#> [1] 255.292   0.000   0.000   0.000
#> back fitting 2 times
#> layer: 2 
#> [1]   0 113  11
#> [1]   1 103  11
#> [1]   2 101  11
#> [1]  30 101  11
#> [1] 31 28 11
#> [1] 32 28  8
#> [1] 33 28  8
#> [1] 34 28  7
#> [1] 35 27  7
#> [1] 36 27  7
#> [1] 37 27  7
#> [1] 60 27  7
#> [1] 3
#> [1] 104.4864   0.0000   0.0000   0.0000
#> back fitting 2 times
#> layer: 3 
#> [1]   0 121  18
#> [1]   1 104  15
#> [1]   2 101  12
#> [1]  3 98 11
#> [1]  4 97 10
#> [1] 30 97 10
#> [1] 31 28 10
#> [1] 32 28  8
#> [1] 33 28  8
#> [1] 60 28  8
#> [1] 5
#> [1] 85.51426  0.00000  0.00000  0.00000
#> back fitting 2 times
#> layer: 4 
#> [1]   0 116  18
#> [1]   1 100  16
#> [1]  2 94 15
#> [1]  3 90 15
#> [1]  4 85 15
#> [1]  5 84 15
#> [1]  6 82 15
#> [1]  7 81 15
#> [1]  8 77 15
#> [1]  9 74 15
#> [1] 10 69 14
#> [1] 11 65 14
#> [1] 12 63 14
#> [1] 13 62 14
#> [1] 14 60 14
#> [1] 15 58 14
#> [1] 30 58 14
#> [1] "Zero residual degrees of freedom"
#> [1] 31  1 14
#> [1] "Zero residual degrees of freedom"
#> [1] 32  1 14
#> [1] 33  0 14
#> [1] 34
#> [1] 0 0 0 0
#>      
#> Layer Rows Cols  Df      SS    MS Convergence Rows Released Cols Released
#>     0  245   40 284 5866.51 20.66          NA            NA            NA
#>     1   38    6  43  448.76 10.44           1            40             6
#>     2   27    7  33  168.94  5.12           1            74             4
#>     3   28    8  35  156.11  4.46           1            69             2

# QUBIC
BCqubic <- mosbi::run_qubic(data_matrix)

# Merge results of all algorithms
all_bics <- c(fb, BCisa, BCplaid, BCqubic)

bicluster_histo(all_bics)

The histogram visualizes the distribution of bicluster sizes (separately for the number of rows and columns of each bicluster). The total number of found biclusters are given in the title.

0.0.5 3. Compute network

The next step of the ensemble approach is the computation of a similarity network of biclusters. To filter for for similarities due to random overlaps of biclusters, we apply an error model (For more details refer to our publication). Different similarity metrics are available. For details type mosbi::bicluster_network in the R console.

bic_net <- mosbi::bicluster_network(all_bics, # List of biclusters
    data_matrix, # Data matrix
    n_randomizations = 5,
    # Number of randomizations for the
    # error model
    MARGIN = "both",
    # Use datapoints for metric evaluation
    metric = 4, # Fowlkes–Mallows index
    # For information about the metrics,
    # visit the "Similarity metrics
    # evaluation" vignette
    n_steps = 1000,
    # At how many steps should
    # the cut-of is evaluated
    plot_edge_dist = TRUE
    # Plot the evaluation of cut-off estimation
)
#> Esimated cut-off:  0.03903904

The two resulting plot visualize the process of cut-off estimation. The right plot show the remaining number of edges for the computed bicluster network (red) and for randomizations of biclusters (black). The vertical red line showed the threshold with the highest signal-to-noise ratio (SNR). All evaluated SNRs are again visualized in the left plot.

The next plot shows the bicluster similarity matrix. It reveals highly similar biclusters.

stats::heatmap(get_adjacency(bic_net))