Contents

1 Introduction

epiregulon.extra is a companion package to epiregulon and provides functions to visualize transcription factor activity and network. It also provides statistical tests to identify transcription factors with differential activity and network topology. This tutorial continues from the reprogram-seq example included in epiregulon. We will use the gene regulatory networks constructed by epiregulon and continue with visualization and network analysis.

For full documentation of the epiregulon package, please refer to the epiregulon book.

2 Installation

 if (!require("BiocManager", quietly = TRUE))
     install.packages("BiocManager")

 BiocManager::install("epiregulon.extra")

3 Data preparation

To continue with the visualization functions, we will first need the gene expression matrix. We can download the data from scMultiome.

# load the MAE object
library(scMultiome)
## Loading required package: AnnotationHub
## Loading required package: BiocGenerics
## 
## Attaching package: 'BiocGenerics'
## The following objects are masked from 'package:stats':
## 
##     IQR, mad, sd, var, xtabs
## The following objects are masked from 'package:base':
## 
##     Filter, Find, Map, Position, Reduce, anyDuplicated, aperm, append,
##     as.data.frame, basename, cbind, colnames, dirname, do.call,
##     duplicated, eval, evalq, get, grep, grepl, intersect, is.unsorted,
##     lapply, mapply, match, mget, order, paste, pmax, pmax.int, pmin,
##     pmin.int, rank, rbind, rownames, sapply, setdiff, table, tapply,
##     union, unique, unsplit, which.max, which.min
## Loading required package: BiocFileCache
## Loading required package: dbplyr
## Loading required package: ExperimentHub
## Loading required package: MultiAssayExperiment
## Loading required package: SummarizedExperiment
## Loading required package: MatrixGenerics
## Loading required package: matrixStats
## 
## Attaching package: 'MatrixGenerics'
## The following objects are masked from 'package:matrixStats':
## 
##     colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
##     colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
##     colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
##     colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
##     colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
##     colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
##     colWeightedMeans, colWeightedMedians, colWeightedSds,
##     colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
##     rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
##     rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
##     rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
##     rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
##     rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
##     rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
##     rowWeightedSds, rowWeightedVars
## Loading required package: GenomicRanges
## Loading required package: stats4
## Loading required package: S4Vectors
## 
## Attaching package: 'S4Vectors'
## The following object is masked from 'package:utils':
## 
##     findMatches
## The following objects are masked from 'package:base':
## 
##     I, expand.grid, unname
## Loading required package: IRanges
## Loading required package: GenomeInfoDb
## Loading required package: Biobase
## Welcome to Bioconductor
## 
##     Vignettes contain introductory material; view with
##     'browseVignettes()'. To cite Bioconductor, see
##     'citation("Biobase")', and for packages 'citation("pkgname")'.
## 
## Attaching package: 'Biobase'
## The following object is masked from 'package:MatrixGenerics':
## 
##     rowMedians
## The following objects are masked from 'package:matrixStats':
## 
##     anyMissing, rowMedians
## The following object is masked from 'package:ExperimentHub':
## 
##     cache
## The following object is masked from 'package:AnnotationHub':
## 
##     cache
## Loading required package: SingleCellExperiment
mae <- scMultiome::reprogramSeq()
## see ?scMultiome and browseVignettes('scMultiome') for documentation
## loading from cache
# expression matrix
GeneExpressionMatrix <- mae[["GeneExpressionMatrix"]]
rownames(GeneExpressionMatrix) <- rowData(GeneExpressionMatrix)$name

reducedDim(GeneExpressionMatrix, "UMAP_Combined") <- 
  reducedDim(mae[["TileMatrix500"]], "UMAP_Combined")

# arrange hash_assignment
GeneExpressionMatrix$hash_assignment <- factor(as.character(
  GeneExpressionMatrix$hash_assignment),
  levels = c("HTO10_GATA6_UTR", "HTO2_GATA6_v2", "HTO8_NKX2.1_UTR", "HTO3_NKX2.1_v2",
    "HTO1_FOXA2_v2", "HTO4_mFOXA1_v2", "HTO6_hFOXA1_UTR", "HTO5_NeonG_v2"
  )
)

4 Calculate TF activity

Next we load the regulon object previously calculated by epiregulon. Here we just use the pruned version of regulon object in which only relevant columns are kept. Using epiregulon, we can calculate activity of transcription factors included in the regulon object.

library(epiregulon)
library(epiregulon.extra)

data(regulon)

score.combine <- calculateActivity(
  expMatrix = GeneExpressionMatrix,
  regulon = regulon,
  mode = "weight",
  method = "weightedMean",
  exp_assay = "normalizedCounts",
  normalize = FALSE
)
## calculating TF activity from regulon using weightedmean
## Warning in calculateActivity(expMatrix = GeneExpressionMatrix, regulon =
## regulon, : The weight column contains multiple subcolumns but no cluster
## information was provided. Using first column to compute activity...
## aggregating regulons...
## creating weight matrix...
## calculating activity scores...
## normalize by the number of targets...

5 Perform differential activity

We perform a differential analysis of transcription factor activity across groups of cells. This function is a wrapper around findMarkers from scran.

library(epiregulon.extra)
markers <- findDifferentialActivity(
  activity_matrix = score.combine,
  clusters = GeneExpressionMatrix$hash_assignment,
  pval.type = "some",
  direction = "up",
  test.type = "t"
)

We can specify the top transcription factors of interest

markers.sig <- getSigGenes(markers, topgenes = 5)
## Using a logFC cutoff of 0 for class HTO10_GATA6_UTR for direction equal to any
## Using a logFC cutoff of 0 for class HTO2_GATA6_v2 for direction equal to any
## Using a logFC cutoff of 0 for class HTO8_NKX2.1_UTR for direction equal to any
## Using a logFC cutoff of 0 for class HTO3_NKX2.1_v2 for direction equal to any
## Using a logFC cutoff of 0 for class HTO1_FOXA2_v2 for direction equal to any
## Using a logFC cutoff of 0 for class HTO4_mFOXA1_v2 for direction equal to any
## Using a logFC cutoff of 0 for class HTO6_hFOXA1_UTR for direction equal to any
## Using a logFC cutoff of 0 for class HTO5_NeonG_v2 for direction equal to any

6 Visualize the results

We visualize the known differential transcription factors by bubble plot

plotBubble(
  activity_matrix = score.combine,
  tf = c("NKX2-1", "GATA6", "FOXA1", "FOXA2"),
  clusters = GeneExpressionMatrix$hash_assignment
)

Then visualize the most differential transcription factors by clusters

plotBubble(
  activity_matrix = score.combine,
  tf = markers.sig$tf,
  clusters = GeneExpressionMatrix$hash_assignment
)

Visualize the known differential transcription factors by violin plot.

plotActivityViolin(
  activity_matrix = score.combine,
  tf = c("NKX2-1", "GATA6", "FOXA1", "FOXA2", "AR"),
  clusters = GeneExpressionMatrix$hash_assignment
)