1. Anatomy of an MPSE

MicrobiotaProcess introduces the MPSE S4 class, which inherits from the SummarizedExperiment (Morgan et al. 2021) class. The assays slot stores the rectangular abundance matrices of features from a microbiome experiment. The colData slot stores the sample metadata and the sample-level results of downstream analyses. The rowData slot stores the feature metadata and the feature-level results of downstream analyses. Compared to the SummarizedExperiment object, MPSE introduces additional slots, including the taxatree and otutree slots, as shown in the following figure:

The structure of the MPSE class.


2. Overview of the design of the MicrobiotaProcess package

With this data structure, MicrobiotaProcess is more interoperable with the existing computing ecosystem. For example, the slots inherited from SummarizedExperiment can be extracted with the methods provided by SummarizedExperiment. The taxatree and otutree can be extracted with mp_extract_tree, and they are compatible with the ggtree (Yu et al. 2017), ggtreeExtra (Xu et al. 2021), treeio (Wang et al. 2020) and tidytree (Yu 2021) ecosystem because they are treedata objects, a data structure used directly by these packages.
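The accessors described above can be sketched as follows. This is a minimal illustration, not the authors' own code: it assumes the example dataset mouse.time.mpse bundled with MicrobiotaProcess and the assay name Abundance, both of which appear in the printed outputs later in this document.

```r
library(MicrobiotaProcess)
library(SummarizedExperiment)

# Example MPSE object bundled with MicrobiotaProcess (assumed to be available).
data(mouse.time.mpse)

# Slots inherited from SummarizedExperiment use the usual accessors.
abundance    <- assay(mouse.time.mpse, "Abundance")  # feature x sample matrix
sample.meta  <- colData(mouse.time.mpse)             # sample metadata
feature.meta <- rowData(mouse.time.mpse)             # feature metadata

# The taxonomy tree is extracted with mp_extract_tree.
taxa.tree <- mp_extract_tree(mouse.time.mpse, type = "taxatree")
class(taxa.tree)
```

Because the tree is a treedata object, it can be passed on to ggtree or tidytree without conversion.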

Moreover, the results of upstream microbiome analysis tools, such as qiime2 (Bolyen et al. 2019), dada2 (Callahan et al. 2016) and MetaPhlAn (Beghini et al. 2021), can be loaded into the MPSE class, and other classes used to store microbiome results (SummarizedExperiment (Morgan et al. 2021), phyloseq (McMurdie and Holmes 2013) and TreeSummarizedExperiment (Huang et al. 2021)) can be converted to it.

In addition, MicrobiotaProcess introduces a tidy microbiome data structure paradigm and analysis grammar. It provides a wide variety of microbiome analysis procedures under a unified, tidy-like framework. We believe MicrobiotaProcess can improve the efficiency of related research, and it also bridges microbiome data analysis with the tidyverse (Wickham et al. 2019).

Overview of the design of the MicrobiotaProcess package


3. MicrobiotaProcess profiling

3.1 Bridging other tools

MicrobiotaProcess provides several functions to parse the output of upstream microbiome analysis tools, such as qiime2 (Bolyen et al. 2019), dada2 (Callahan et al. 2016) and MetaPhlAn (Beghini et al. 2021), and return an MPSE object. Some Bioconductor classes, such as phyloseq (McMurdie and Holmes 2013), TreeSummarizedExperiment (Huang et al. 2021) and SummarizedExperiment (Morgan et al. 2021), can also be converted to MPSE via as.MPSE().
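As a hedged sketch of one such import, the snippet below loads dada2 output into an MPSE object with mp_import_dada2. The file names under extdata are assumptions about the example files shipped with the package; substitute your own sequence table, taxonomy table and sample metadata.

```r
library(MicrobiotaProcess)

# Example dada2 output assumed to ship with the package (file names are assumptions).
seqtab   <- readRDS(system.file("extdata", "seqtab.nochim.rds", package = "MicrobiotaProcess"))
taxa     <- readRDS(system.file("extdata", "taxa_tab.rds", package = "MicrobiotaProcess"))
sampleda <- system.file("extdata", "mouse.time.dada2.txt", package = "MicrobiotaProcess")

# Combine the sequence table, taxonomy table and sample metadata into one MPSE object.
mpse <- mp_import_dada2(seqtab = seqtab, taxatab = taxa, sampleda = sampleda)
mpse
```

Printing the object gives a tibble-like summary of the kind shown in the outputs of this section.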

## # A MPSE-tibble (MPSE object) abstraction: 4,408 × 11
## # OTU=232 | Samples=19 | Assays=Abundance | Taxonomy=Kingdom, Phylum, Class,
## #   Order, Family, Genus, Species
##    OTU    Sample Abundance time  Kingdom Phylum Class Order Family Genus Species
##    <chr>  <chr>      <int> <chr> <chr>   <chr>  <chr> <chr> <chr>  <chr> <chr>  
##  1 OTU_1  F3D0         579 Early k__Bac… p__Ba… c__B… o__B… f__Mu… g__u… s__un_…
##  2 OTU_2  F3D0         345 Early k__Bac… p__Ba… c__B… o__B… f__Mu… g__u… s__un_…
##  3 OTU_3  F3D0         449 Early k__Bac… p__Ba… c__B… o__B… f__Mu… g__u… s__un_…
##  4 OTU_4  F3D0         430 Early k__Bac… p__Ba… c__B… o__B… f__Mu… g__u… s__un_…
##  5 OTU_5  F3D0         154 Early k__Bac… p__Ba… c__B… o__B… f__Ba… g__B… s__un_…
##  6 OTU_6  F3D0         470 Early k__Bac… p__Ba… c__B… o__B… f__Mu… g__u… s__un_…
##  7 OTU_7  F3D0         282 Early k__Bac… p__Ba… c__B… o__B… f__Mu… g__u… s__un_…
##  8 OTU_8  F3D0         184 Early k__Bac… p__Ba… c__B… o__B… f__Ri… g__A… s__un_…
##  9 OTU_9  F3D0          45 Early k__Bac… p__Ba… c__B… o__B… f__Mu… g__u… s__un_…
## 10 OTU_10 F3D0         158 Early k__Bac… p__Ba… c__B… o__B… f__Mu… g__u… s__un_…
## # … with 4,398 more rows
## # A MPSE-tibble (MPSE object) abstraction: 12,006 × 32
## # OTU=138 | Samples=87 | Assays=Abundance | Taxonomy=Kingdom, Phylum, Class,
## #   Order, Family, Genus, Speies
##    OTU    Sample     Abund…¹ Sampl…² Barco…³ Linke…⁴ Subject Sex     Age Pitts…⁵
##    <chr>  <chr>        <dbl> <chr>   <chr>   <chr>   <chr>   <chr> <int> <chr>  
##  1 OTU_1  ERR1331856     901 LR53    AGTGTC… TATGGT… Patient Male     63 2      
##  2 OTU_2  ERR1331856     877 LR53    AGTGTC… TATGGT… Patient Male     63 2      
##  3 OTU_3  ERR1331856     239 LR53    AGTGTC… TATGGT… Patient Male     63 2      
##  4 OTU_4  ERR1331856     201 LR53    AGTGTC… TATGGT… Patient Male     63 2      
##  5 OTU_5  ERR1331856     168 LR53    AGTGTC… TATGGT… Patient Male     63 2      
##  6 OTU_6  ERR1331856     115 LR53    AGTGTC… TATGGT… Patient Male     63 2      
##  7 OTU_7  ERR1331856     107 LR53    AGTGTC… TATGGT… Patient Male     63 2      
##  8 OTU_8  ERR1331856      84 LR53    AGTGTC… TATGGT… Patient Male     63 2      
##  9 OTU_9  ERR1331856      67 LR53    AGTGTC… TATGGT… Patient Male     63 2      
## 10 OTU_10 ERR1331856      67 LR53    AGTGTC… TATGGT… Patient Male     63 2      
## # … with 11,996 more rows, 22 more variables: Bell <dbl>, BMI <dbl>,
## #   sCD14ugml <dbl>, LBPugml <dbl>, LPSpgml <dbl>, IFABPpgml <dbl>,
## #   Physical_functioning <dbl>, Role_physical <dbl>, Role_emotional <dbl>,
## #   Energy_fatigue <dbl>, Emotional_well_being <dbl>, Social_functioning <dbl>,
## #   Pain <dbl>, General_health <dbl>, Description <lgl>, Kingdom <chr>,
## #   Phylum <chr>, Class <chr>, Order <chr>, Family <chr>, Genus <chr>,
## #   Speies <chr>, and abbreviated variable names ¹​Abundance, ²​Sample_Name_s, …
## # A MPSE-tibble (MPSE object) abstraction: 5,260 × 11
## # OTU=263 | Samples=20 | Assays=Abundance | Taxonomy=Kingdom, Phylum, Class,
## #   Order, Family, Genus
##    OTU        Sample Abund…¹ group taxid Kingdom Phylum Class Order Family Genus
##    <chr>      <chr>    <dbl> <chr> <chr> <chr>   <chr>  <chr> <chr> <chr>  <chr>
##  1 s__Methan… GupDM…   0.596 testA 2157… k__Arc… p__Eu… c__M… o__M… f__Me… g__M…
##  2 s__Actino… GupDM…   0     testA 2|20… k__Bac… p__Ac… c__A… o__A… f__Ac… g__A…
##  3 s__Actino… GupDM…   0     testA 2|20… k__Bac… p__Ac… c__A… o__A… f__Ac… g__A…
##  4 s__Actino… GupDM…   0     testA 2|20… k__Bac… p__Ac… c__A… o__A… f__Ac… g__A…
##  5 s__Bifido… GupDM…   0.948 testA 2|20… k__Bac… p__Ac… c__A… o__B… f__Bi… g__B…
##  6 s__Bifido… GupDM…   0     testA 2|20… k__Bac… p__Ac… c__A… o__B… f__Bi… g__B…
##  7 s__Bifido… GupDM…   0     testA 2|20… k__Bac… p__Ac… c__A… o__B… f__Bi… g__B…
##  8 s__Bifido… GupDM…   0     testA 2|20… k__Bac… p__Ac… c__A… o__B… f__Bi… g__B…
##  9 s__Bifido… GupDM…   0     testA 2|20… k__Bac… p__Ac… c__A… o__B… f__Bi… g__B…
## 10 s__Bifido… GupDM…   0     testA 2|20… k__Bac… p__Ac… c__A… o__B… f__Bi… g__B…
## # … with 5,250 more rows, and abbreviated variable name ¹​Abundance

3.2 Alpha diversity analysis

Rarefaction, a technique based on subsampling, is used to compensate for the effect of sample size on the number of units observed in a sample (Siegel 2004). MicrobiotaProcess provides mp_cal_rarecurve and mp_plot_rarecurve to calculate and plot rarefaction curves based on rrarefy from vegan (Oksanen et al. 2019).
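A minimal sketch of this workflow, assuming the bundled mouse.time.mpse dataset and the column names RareAbundance, RareAbundanceRarecurve and time seen in the outputs below. The argument values are illustrative, and the 400 subsampling depths per sample take some time to compute.

```r
library(MicrobiotaProcess)
data(mouse.time.mpse)

# Subsample every sample to a common depth; this adds the RareAbundance assay.
mpse <- mp_rrarefy(mouse.time.mpse)

# Compute rarefaction curves from the rarefied counts.
# 'chunks' controls how many subsampling depths are evaluated per sample.
mpse <- mp_cal_rarecurve(mpse, .abundance = RareAbundance, chunks = 400)

# Plot the curves of observed richness, grouped by the 'time' variable.
p <- mp_plot_rarecurve(
  mpse,
  .rare  = RareAbundanceRarecurve,
  .alpha = Observe,
  .group = time
)
p
```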

## # A MPSE-tibble (MPSE object) abstraction: 4,142 × 11
## # OTU=218 | Samples=19 | Assays=Abundance | Taxonomy=Kingdom, Phylum, Class,
## #   Order, Family, Genus, Species
##    OTU    Sample Abundance time  Kingdom Phylum Class Order Family Genus Species
##    <chr>  <chr>      <int> <chr> <chr>   <chr>  <chr> <chr> <chr>  <chr> <chr>  
##  1 OTU_1  F3D0         579 Early k__Bac… p__Ba… c__B… o__B… f__Mu… g__u… s__un_…
##  2 OTU_2  F3D0         345 Early k__Bac… p__Ba… c__B… o__B… f__Mu… g__u… s__un_…
##  3 OTU_3  F3D0         449 Early k__Bac… p__Ba… c__B… o__B… f__Mu… g__u… s__un_…
##  4 OTU_4  F3D0         430 Early k__Bac… p__Ba… c__B… o__B… f__Mu… g__u… s__un_…
##  5 OTU_5  F3D0         154 Early k__Bac… p__Ba… c__B… o__B… f__Ba… g__B… s__un_…
##  6 OTU_6  F3D0         470 Early k__Bac… p__Ba… c__B… o__B… f__Mu… g__u… s__un_…
##  7 OTU_7  F3D0         282 Early k__Bac… p__Ba… c__B… o__B… f__Mu… g__u… s__un_…
##  8 OTU_8  F3D0         184 Early k__Bac… p__Ba… c__B… o__B… f__Ri… g__A… s__un_…
##  9 OTU_9  F3D0          45 Early k__Bac… p__Ba… c__B… o__B… f__Mu… g__u… s__un_…
## 10 OTU_10 F3D0         158 Early k__Bac… p__Ba… c__B… o__B… f__Mu… g__u… s__un_…
## # … with 4,132 more rows
## # A MPSE-tibble (MPSE object) abstraction: 4,142 × 13
## # OTU=218 | Samples=19 | Assays=Abundance, RareAbundance | Taxonomy=Kingdom,
## #   Phylum, Class, Order, Family, Genus, Species
##    OTU    Sample Abundance RareAbundance time  RareAbundanceRarecurve
##    <chr>  <chr>      <int>         <int> <chr> <list>                
##  1 OTU_1  F3D0         579           214 Early <tibble [2,520 × 4]>  
##  2 OTU_2  F3D0         345           116 Early <tibble [2,520 × 4]>  
##  3 OTU_3  F3D0         449           179 Early <tibble [2,520 × 4]>  
##  4 OTU_4  F3D0         430           167 Early <tibble [2,520 × 4]>  
##  5 OTU_5  F3D0         154            54 Early <tibble [2,520 × 4]>  
##  6 OTU_6  F3D0         470           174 Early <tibble [2,520 × 4]>  
##  7 OTU_7  F3D0         282           115 Early <tibble [2,520 × 4]>  
##  8 OTU_8  F3D0         184            74 Early <tibble [2,520 × 4]>  
##  9 OTU_9  F3D0          45            16 Early <tibble [2,520 × 4]>  
## 10 OTU_10 F3D0         158            59 Early <tibble [2,520 × 4]>  
##    Kingdom     Phylum           Class          Order           
##    <chr>       <chr>            <chr>          <chr>           
##  1 k__Bacteria p__Bacteroidetes c__Bacteroidia o__Bacteroidales
##  2 k__Bacteria p__Bacteroidetes c__Bacteroidia o__Bacteroidales
##  3 k__Bacteria p__Bacteroidetes c__Bacteroidia o__Bacteroidales
##  4 k__Bacteria p__Bacteroidetes c__Bacteroidia o__Bacteroidales
##  5 k__Bacteria p__Bacteroidetes c__Bacteroidia o__Bacteroidales
##  6 k__Bacteria p__Bacteroidetes c__Bacteroidia o__Bacteroidales
##  7 k__Bacteria p__Bacteroidetes c__Bacteroidia o__Bacteroidales
##  8 k__Bacteria p__Bacteroidetes c__Bacteroidia o__Bacteroidales
##  9 k__Bacteria p__Bacteroidetes c__Bacteroidia o__Bacteroidales
## 10 k__Bacteria p__Bacteroidetes c__Bacteroidia o__Bacteroidales
##    Family            Genus                   Species                
##    <chr>             <chr>                   <chr>                  
##  1 f__Muribaculaceae g__un_f__Muribaculaceae s__un_f__Muribaculaceae
##  2 f__Muribaculaceae g__un_f__Muribaculaceae s__un_f__Muribaculaceae
##  3 f__Muribaculaceae g__un_f__Muribaculaceae s__un_f__Muribaculaceae
##  4 f__Muribaculaceae g__un_f__Muribaculaceae s__un_f__Muribaculaceae
##  5 f__Bacteroidaceae g__Bacteroides          s__un_g__Bacteroides   
##  6 f__Muribaculaceae g__un_f__Muribaculaceae s__un_f__Muribaculaceae
##  7 f__Muribaculaceae g__un_f__Muribaculaceae s__un_f__Muribaculaceae
##  8 f__Rikenellaceae  g__Alistipes            s__un_g__Alistipes     
##  9 f__Muribaculaceae g__un_f__Muribaculaceae s__un_f__Muribaculaceae
## 10 f__Muribaculaceae g__un_f__Muribaculaceae s__un_f__Muribaculaceae
## # … with 4,132 more rows
The rarefaction of samples or groups


3.3 Calculating and visualizing alpha indices

Alpha diversity estimates the species richness and evenness of a community. MicrobiotaProcess provides mp_cal_alpha to calculate alpha indices (Observe, Chao1, ACE, Shannon, Simpson and J (Pielou’s evenness)) and mp_plot_alpha to visualize the result.
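To make these indices concrete, here is a base R sketch of the conventional definitions of observed richness, Shannon, Simpson and Pielou's evenness for a single sample. The function alpha_indices is a hypothetical helper written for illustration; it is not part of MicrobiotaProcess and may differ in detail from what mp_cal_alpha computes. Chao1 and ACE are omitted.

```r
# Alpha indices for one sample, given a vector of feature counts.
alpha_indices <- function(counts) {
  counts  <- counts[counts > 0]           # drop absent features
  p       <- counts / sum(counts)         # relative abundances
  shannon <- -sum(p * log(p))             # Shannon entropy (natural log)
  c(
    Observe = length(counts),             # observed richness S
    Shannon = shannon,
    Simpson = 1 - sum(p^2),               # Gini-Simpson form
    Pielou  = shannon / log(length(counts))  # evenness J = H / ln(S)
  )
}

# Four equally abundant features: maximal evenness.
alpha_indices(c(10, 10, 10, 10))
```

Under these definitions the first row of the output below is self-consistent: with Observe = 104 and Shannon = 3.88, Pielou is approximately 3.88 / ln(104) ≈ 0.835. Note that Pielou's evenness is undefined for a sample with a single feature, since ln(1) = 0.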

## # A MPSE-tibble (MPSE object) abstraction: 4,142 × 19
## # OTU=218 | Samples=19 | Assays=Abundance, RareAbundance | Taxonomy=Kingdom,
## #   Phylum, Class, Order, Family, Genus, Species
##    OTU    Sample Abundance RareAbun…¹ time  RareAb…² Observe Chao1   ACE Shannon
##    <chr>  <chr>      <int>      <int> <chr> <list>     <dbl> <dbl> <dbl>   <dbl>
##  1 OTU_1  F3D0         579        214 Early <tibble>     104  104.  105.    3.88
##  2 OTU_2  F3D0         345        116 Early <tibble>     104  104.  105.    3.88
##  3 OTU_3  F3D0         449        179 Early <tibble>     104  104.  105.    3.88
##  4 OTU_4  F3D0         430        167 Early <tibble>     104  104.  105.    3.88
##  5 OTU_5  F3D0         154         54 Early <tibble>     104  104.  105.    3.88
##  6 OTU_6  F3D0         470        174 Early <tibble>     104  104.  105.    3.88
##  7 OTU_7  F3D0         282        115 Early <tibble>     104  104.  105.    3.88
##  8 OTU_8  F3D0         184         74 Early <tibble>     104  104.  105.    3.88
##  9 OTU_9  F3D0          45         16 Early <tibble>     104  104.  105.    3.88
## 10 OTU_10 F3D0         158         59 Early <tibble>     104  104.  105.    3.88
## # … with 4,132 more rows, 9 more variables: Simpson <dbl>, Pielou <dbl>,
## #   Kingdom <chr>, Phylum <chr>, Class <chr>, Order <chr>, Family <chr>,
## #   Genus <chr>, Species <chr>, and abbreviated variable names ¹​RareAbundance,
## #   ²​RareAbundanceRarecurve
The alpha diversity comparison


Users can extract the result of mp_cal_alpha with mp_extract_sample() and visualize it manually; see the examples of mp_cal_alpha.
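One possible way to do this is sketched below, assuming the bundled mouse.time.mpse dataset and its time column. The ggplot2 layers are only one of many ways to draw the result.

```r
library(MicrobiotaProcess)
library(ggplot2)
data(mouse.time.mpse)

# Rarefy, then compute the alpha indices on the rarefied counts.
mpse <- mp_rrarefy(mouse.time.mpse)
mpse <- mp_cal_alpha(mpse, .abundance = RareAbundance)

# One row per sample, with the alpha indices as columns.
alpha.tbl <- mp_extract_sample(mpse)

# Manual plot: Shannon index by time group.
p <- ggplot(alpha.tbl, aes(x = time, y = Shannon, fill = time)) +
  geom_boxplot() +
  labs(x = NULL, y = "Shannon index")
p
```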

3.4 The visualization of taxonomy abundance

MicrobiotaProcess provides mp_cal_abundance and mp_plot_abundance to calculate and plot the composition of species communities, and mp_extract_abundance to extract the abundance at a specific taxonomy level. Users can also extract the abundance table for external analysis, such as manual visualization (see the examples of mp_cal_abundance).
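The three functions can be combined as in the following sketch. It assumes the bundled mouse.time.mpse dataset, and the values of topn are arbitrary choices for illustration.

```r
library(MicrobiotaProcess)
data(mouse.time.mpse)

mpse <- mp_rrarefy(mouse.time.mpse)

# Abundance for each sample, then for each 'time' group.
mpse <- mp_cal_abundance(mpse, .abundance = RareAbundance)
mpse <- mp_cal_abundance(mpse, .abundance = RareAbundance, .group = time)

# Relative abundance of the most abundant phyla in each sample.
p <- mp_plot_abundance(
  mpse,
  .abundance = RareAbundance,
  .group     = time,
  taxa.class = Phylum,
  topn       = 20,
  relative   = TRUE
)

# Abundance table at the Phylum level, for external analysis.
phyla.tbl <- mp_extract_abundance(mpse, taxa.class = Phylum, topn = 30)
phyla.tbl
```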

## # A MPSE-tibble (MPSE object) abstraction: 4,142 × 19
## # OTU=218 | Samples=19 | Assays=Abundance, RareAbundance | Taxonomy=Kingdom,
## #   Phylum, Class, Order, Family, Genus, Species
##    OTU    Sample Abundance RareAbun…¹ time  RareAb…² Observe Chao1   ACE Shannon
##    <chr>  <chr>      <int>      <int> <chr> <list>     <dbl> <dbl> <dbl>   <dbl>
##  1 OTU_1  F3D0         579        214 Early <tibble>     104  104.  105.    3.88
##  2 OTU_2  F3D0         345        116 Early <tibble>     104  104.  105.    3.88
##  3 OTU_3  F3D0         449        179 Early <tibble>     104  104.  105.    3.88
##  4 OTU_4  F3D0         430        167 Early <tibble>     104  104.  105.    3.88
##  5 OTU_5  F3D0         154         54 Early <tibble>     104  104.  105.    3.88
##  6 OTU_6  F3D0         470        174 Early <tibble>     104  104.  105.    3.88
##  7 OTU_7  F3D0         282        115 Early <tibble>     104  104.  105.    3.88
##  8 OTU_8  F3D0         184         74 Early <tibble>     104  104.  105.    3.88
##  9 OTU_9  F3D0          45         16 Early <tibble>     104  104.  105.    3.88
## 10 OTU_10 F3D0         158         59 Early <tibble>     104  104.  105.    3.88
## # … with 4,132 more rows, 9 more variables: Simpson <dbl>, Pielou <dbl>,
## #   Kingdom <chr>, Phylum <chr>, Class <chr>, Order <chr>, Family <chr>,
## #   Genus <chr>, Species <chr>, and abbreviated variable names ¹​RareAbundance,
## #   ²​RareAbundanceRarecurve
## # A MPSE-tibble (MPSE object) abstraction: 4,142 × 20
## # OTU=218 | Samples=19 | Assays=Abundance, RareAbundance,
## #   RelRareAbundanceBySample | Taxonomy=Kingdom, Phylum, Class, Order, Family,
## #   Genus, Species
##    OTU    Sample Abundance RareAbun…¹ RelRa…² time  RareAb…³ Observe Chao1   ACE
##    <chr>  <chr>      <int>      <int>   <dbl> <chr> <list>     <dbl> <dbl> <dbl>
##  1 OTU_1  F3D0         579        214   8.50  Early <tibble>     104  104.  105.
##  2 OTU_2  F3D0         345        116   4.61  Early <tibble>     104  104.  105.
##  3 OTU_3  F3D0         449        179   7.11  Early <tibble>     104  104.  105.
##  4 OTU_4  F3D0         430        167   6.63  Early <tibble>     104  104.  105.
##  5 OTU_5  F3D0         154         54   2.14  Early <tibble>     104  104.  105.
##  6 OTU_6  F3D0         470        174   6.91  Early <tibble>     104  104.  105.
##  7 OTU_7  F3D0         282        115   4.57  Early <tibble>     104  104.  105.
##  8 OTU_8  F3D0         184         74   2.94  Early <tibble>     104  104.  105.
##  9 OTU_9  F3D0          45         16   0.635 Early <tibble>     104  104.  105.
## 10 OTU_10 F3D0         158         59   2.34  Early <tibble>     104  104.  105.
## # … with 4,132 more rows, 10 more variables: Shannon <dbl>, Simpson <dbl>,
## #   Pielou <dbl>, Kingdom <chr>, Phylum <chr>, Class <chr>, Order <chr>,
## #   Family <chr>, Genus <chr>, Species <chr>, and abbreviated variable names
## #   ¹​RareAbundance, ²​RelRareAbundanceBySample, ³​RareAbundanceRarecurve
The relative abundance and abundance of phyla of all samples


The abundance of features can also be visualized as a heatmap by setting geom='heatmap' in mp_plot_abundance.

The relative abundance and abundance of phyla of all samples


The relative abundance and abundance of phyla of groups


3.5 Beta diversity analysis

Beta diversity quantifies the dissimilarities between communities (samples). Distance indices such as the Bray-Curtis index, the Jaccard index and the UniFrac (weighted or unweighted) index are popular among community ecologists, and many ordination methods are used to estimate the dissimilarities in community ecology. MicrobiotaProcess implements mp_cal_dist to calculate the common distances and provides mp_plot_dist to visualize the result. It also provides several commonly used ordination methods, such as PCA (mp_cal_pca), PCoA (mp_cal_pcoa), NMDS (mp_cal_nmds), DCA (mp_cal_dca), RDA (mp_cal_rda) and CCA (mp_cal_cca), and a function (mp_envfit) that fits environmental vectors or factors onto an ordination. Moreover, it wraps several statistical analyses for distance matrices, such as adonis (mp_adonis), anosim (mp_anosim), mrpp (mp_mrpp) and mantel (mp_mantel). All these functions are built on the tidy-like framework and share a unified grammar; we believe they will help users perform ordination analysis more conveniently.
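As a concrete illustration of one of these indices, the base R sketch below computes the Bray-Curtis dissimilarity using its standard definition, the sum of absolute differences divided by the total abundance of both samples. The helper bray_curtis and the toy count matrix are hypothetical and serve only to show the arithmetic; they are not part of MicrobiotaProcess.

```r
# Bray-Curtis dissimilarity between two abundance vectors of equal length.
bray_curtis <- function(x, y) {
  stopifnot(length(x) == length(y))
  sum(abs(x - y)) / sum(x + y)
}

# Three toy samples (rows) over four features (columns).
counts <- rbind(
  s1 = c(10, 0, 5, 5),
  s2 = c(10, 0, 5, 5),
  s3 = c(0, 20, 0, 0)
)

# Pairwise dissimilarity matrix.
n <- nrow(counts)
d <- matrix(0, n, n, dimnames = list(rownames(counts), rownames(counts)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    d[i, j] <- bray_curtis(counts[i, ], counts[j, ])
  }
}
d
```

Identical samples (s1 and s2) have a dissimilarity of 0, while samples that share no features (s1 and s3) have a dissimilarity of 1.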

3.5.1 The distance between samples or groups
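The distance calculation can be sketched as follows, assuming the bundled mouse.time.mpse dataset. The assay name hellinger and the sample column bray match the outputs below; the Hellinger standardization step is included because the distances in those outputs are computed on the hellinger assay.

```r
library(MicrobiotaProcess)
data(mouse.time.mpse)

# Hellinger-standardize the counts; this adds the 'hellinger' assay.
mpse <- mp_decostand(mouse.time.mpse, .abundance = Abundance, method = "hellinger")

# Bray-Curtis distances between samples, stored in the 'bray' column.
mpse <- mp_cal_dist(mpse, .abundance = hellinger, distmethod = "bray")

# Visualize the distances between samples.
p <- mp_plot_dist(mpse, .distmethod = bray)
p
```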

## # A MPSE-tibble (MPSE object) abstraction: 4,142 × 21
## # OTU=218 | Samples=19 | Assays=Abundance, RareAbundance,
## #   RelRareAbundanceBySample, hellinger | Taxonomy=Kingdom, Phylum, Class,
## #   Order, Family, Genus, Species
##    OTU    Sample Abundance RareAb…¹ RelRa…² helli…³ time  RareAb…⁴ Observe Chao1
##    <chr>  <chr>      <int>    <int>   <dbl>   <dbl> <chr> <list>     <dbl> <dbl>
##  1 OTU_1  F3D0         579      214   8.50   0.298  Early <tibble>     104  104.
##  2 OTU_2  F3D0         345      116   4.61   0.230  Early <tibble>     104  104.
##  3 OTU_3  F3D0         449      179   7.11   0.262  Early <tibble>     104  104.
##  4 OTU_4  F3D0         430      167   6.63   0.257  Early <tibble>     104  104.
##  5 OTU_5  F3D0         154       54   2.14   0.154  Early <tibble>     104  104.
##  6 OTU_6  F3D0         470      174   6.91   0.268  Early <tibble>     104  104.
##  7 OTU_7  F3D0         282      115   4.57   0.208  Early <tibble>     104  104.
##  8 OTU_8  F3D0         184       74   2.94   0.168  Early <tibble>     104  104.
##  9 OTU_9  F3D0          45       16   0.635  0.0830 Early <tibble>     104  104.
## 10 OTU_10 F3D0         158       59   2.34   0.156  Early <tibble>     104  104.
## # … with 4,132 more rows, 11 more variables: ACE <dbl>, Shannon <dbl>,
## #   Simpson <dbl>, Pielou <dbl>, Kingdom <chr>, Phylum <chr>, Class <chr>,
## #   Order <chr>, Family <chr>, Genus <chr>, Species <chr>, and abbreviated
## #   variable names ¹​RareAbundance, ²​RelRareAbundanceBySample, ³​hellinger,
## #   ⁴​RareAbundanceRarecurve
## # A MPSE-tibble (MPSE object) abstraction: 4,142 × 22
## # OTU=218 | Samples=19 | Assays=Abundance, RareAbundance,
## #   RelRareAbundanceBySample, hellinger | Taxonomy=Kingdom, Phylum, Class,
## #   Order, Family, Genus, Species
##    OTU    Sample Abundance RareAb…¹ RelRa…² helli…³ time  RareAb…⁴ Observe Chao1
##    <chr>  <chr>      <int>    <int>   <dbl>   <dbl> <chr> <list>     <dbl> <dbl>
##  1 OTU_1  F3D0         579      214   8.50   0.298  Early <tibble>     104  104.
##  2 OTU_2  F3D0         345      116   4.61   0.230  Early <tibble>     104  104.
##  3 OTU_3  F3D0         449      179   7.11   0.262  Early <tibble>     104  104.
##  4 OTU_4  F3D0         430      167   6.63   0.257  Early <tibble>     104  104.
##  5 OTU_5  F3D0         154       54   2.14   0.154  Early <tibble>     104  104.
##  6 OTU_6  F3D0         470      174   6.91   0.268  Early <tibble>     104  104.
##  7 OTU_7  F3D0         282      115   4.57   0.208  Early <tibble>     104  104.
##  8 OTU_8  F3D0         184       74   2.94   0.168  Early <tibble>     104  104.
##  9 OTU_9  F3D0          45       16   0.635  0.0830 Early <tibble>     104  104.
## 10 OTU_10 F3D0         158       59   2.34   0.156  Early <tibble>     104  104.
## # … with 4,132 more rows, 12 more variables: ACE <dbl>, Shannon <dbl>,
## #   Simpson <dbl>, Pielou <dbl>, bray <list>, Kingdom <chr>, Phylum <chr>,
## #   Class <chr>, Order <chr>, Family <chr>, Genus <chr>, Species <chr>, and
## #   abbreviated variable names ¹​RareAbundance, ²​RelRareAbundanceBySample,
## #   ³​hellinger, ⁴​RareAbundanceRarecurve
The distance between samples


The distance between samples with group information


The comparison of distance among the groups


3.5.2 The PCoA analysis

The distance can be used for ordination analysis, such as PCoA or NMDS. Here, we only show an example of PCoA analysis; for other ordinations, refer to the examples and usage of the corresponding functions.
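A minimal sketch of the PCoA and the accompanying adonis test, assuming the bundled mouse.time.mpse dataset. The plotting arguments are illustrative choices, and action = "get" is used here so that the test result is returned directly rather than stored in the object.

```r
library(MicrobiotaProcess)
data(mouse.time.mpse)

mpse <- mp_decostand(mouse.time.mpse, .abundance = Abundance, method = "hellinger")

# PCoA on Bray-Curtis distances; the coordinates are added as sample columns.
mpse <- mp_cal_pcoa(mpse, .abundance = hellinger, distmethod = "bray")

# Permutation test of whether community composition differs by 'time'.
adonis.res <- mp_adonis(
  mpse,
  .abundance   = hellinger,
  .formula     = ~time,
  distmethod   = "bray",
  permutations = 9999,
  action       = "get"
)
adonis.res

# Ordination plot coloured by 'time', with group ellipses.
p <- mp_plot_ord(mpse, .ord = pcoa, .group = time, .color = time, ellipse = TRUE)
p
```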

## # A MPSE-tibble (MPSE object) abstraction: 4,142 × 25
## # OTU=218 | Samples=19 | Assays=Abundance, RareAbundance,
## #   RelRareAbundanceBySample, hellinger | Taxonomy=Kingdom, Phylum, Class,
## #   Order, Family, Genus, Species
##    OTU    Sample Abundance RareAb…¹ RelRa…² helli…³ time  RareAb…⁴ Observe Chao1
##    <chr>  <chr>      <int>    <int>   <dbl>   <dbl> <chr> <list>     <dbl> <dbl>
##  1 OTU_1  F3D0         579      214   8.50   0.298  Early <tibble>     104  104.
##  2 OTU_2  F3D0         345      116   4.61   0.230  Early <tibble>     104  104.
##  3 OTU_3  F3D0         449      179   7.11   0.262  Early <tibble>     104  104.
##  4 OTU_4  F3D0         430      167   6.63   0.257  Early <tibble>     104  104.
##  5 OTU_5  F3D0         154       54   2.14   0.154  Early <tibble>     104  104.
##  6 OTU_6  F3D0         470      174   6.91   0.268  Early <tibble>     104  104.
##  7 OTU_7  F3D0         282      115   4.57   0.208  Early <tibble>     104  104.
##  8 OTU_8  F3D0         184       74   2.94   0.168  Early <tibble>     104  104.
##  9 OTU_9  F3D0          45       16   0.635  0.0830 Early <tibble>     104  104.
## 10 OTU_10 F3D0         158       59   2.34   0.156  Early <tibble>     104  104.
## # … with 4,132 more rows, 15 more variables: ACE <dbl>, Shannon <dbl>,
## #   Simpson <dbl>, Pielou <dbl>, bray <list>, `PCo1 (46.6%)` <dbl>,
## #   `PCo2 (13.31%)` <dbl>, `PCo3 (8.22%)` <dbl>, Kingdom <chr>, Phylum <chr>,
## #   Class <chr>, Order <chr>, Family <chr>, Genus <chr>, Species <chr>, and
## #   abbreviated variable names ¹​RareAbundance, ²​RelRareAbundanceBySample,
## #   ³​hellinger, ⁴​RareAbundanceRarecurve
## Permutation test for adonis under reduced model
## Terms added sequentially (first to last)
## Permutation: free
## Number of permutations: 9999
## 
## vegan::adonis2(formula = .formula, data = sampleda, permutations = permutations, method = distmethod)
##          Df SumOfSqs      R2      F Pr(>F)    
## time      1  0.58216 0.44137 13.431  1e-04 ***
## Residual 17  0.73683 0.55863                  
## Total    18  1.31899 1.00000                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The PCoA result


3.5.3 Hierarchical cluster analysis

The distance between samples can also be used for hierarchical cluster analysis to estimate the dissimilarities of samples. MicrobiotaProcess presents mp_cal_clust to perform this analysis. It is also implemented with the tidy-like framework.
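The clustering step might look like the sketch below, assuming the bundled mouse.time.mpse dataset. The choice of average linkage and the use of action = "get" to return the tree directly are assumptions made for illustration; check ?mp_cal_clust for the options available in your version.

```r
library(MicrobiotaProcess)
data(mouse.time.mpse)

mpse <- mp_decostand(mouse.time.mpse, .abundance = Abundance, method = "hellinger")

# Average-linkage (UPGMA) clustering on Bray-Curtis distances.
# action = "get" returns the tree instead of storing it in the object.
sample.clust <- mp_cal_clust(
  mpse,
  .abundance   = hellinger,
  distmethod   = "bray",
  hclustmethod = "average",
  action       = "get"
)
sample.clust
```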

## # A MPSE-tibble (MPSE object) abstraction: 4,142 × 25
## # OTU=218 | Samples=19 | Assays=Abundance, RareAbundance,
## #   RelRareAbundanceBySample, hellinger | Taxonomy=Kingdom, Phylum, Class,
## #   Order, Family, Genus, Species
##    OTU    Sample Abundance RareAb…¹ RelRa…² helli…³ time  RareAb…⁴ Observe Chao1
##    <chr>  <chr>      <int>    <int>   <dbl>   <dbl> <chr> <list>     <dbl> <dbl>
##  1 OTU_1  F3D0         579      214   8.50   0.298  Early <tibble>     104  104.
##  2 OTU_2  F3D0         345      116   4.61   0.230  Early <tibble>     104  104.
##  3 OTU_3  F3D0         449      179   7.11   0.262  Early <tibble>     104  104.
##  4 OTU_4  F3D0         430      167   6.63   0.257  Early <tibble>     104  104.
##  5 OTU_5  F3D0         154       54   2.14   0.154  Early <tibble>     104  104.
##  6 OTU_6  F3D0         470      174   6.91   0.268  Early <tibble>     104  104.
##  7 OTU_7  F3D0         282      115   4.57   0.208  Early <tibble>     104  104.
##  8 OTU_8  F3D0         184       74   2.94   0.168  Early <tibble>     104  104.
##  9 OTU_9  F3D0          45       16   0.635  0.0830 Early <tibble>     104  104.
## 10 OTU_10 F3D0         158       59   2.34   0.156  Early <tibble>     104  104.
## # … with 4,132 more rows, 15 more variables: ACE <dbl>, Shannon <dbl>,
## #   Simpson <dbl>, Pielou <dbl>, bray <list>, `PCo1 (46.6%)` <dbl>,
## #   `PCo2 (13.31%)` <dbl>, `PCo3 (8.22%)` <dbl>, Kingdom <chr>, Phylum <chr>,
## #   Class <chr>, Order <chr>, Family <chr>, Genus <chr>, Species <chr>, and
## #   abbreviated variable names ¹​RareAbundance, ²​RelRareAbundanceBySample,
## #   ³​hellinger, ⁴​RareAbundanceRarecurve
## 'treedata' S4 object'.
## 
## ...@ phylo:
## 
## Phylogenetic tree with 19 tips and 18 internal nodes.
## 
## Tip labels:
##   F3D0, F3D1, F3D141, F3D142, F3D143, F3D144, ...
## 
## Rooted; includes branch lengths.
## 
## with the following features available:
##   '', 'time', 'RareAbundanceRarecurve', 'Observe', 'Chao1', 'ACE', 'Shannon',
## 'Simpson', 'Pielou', 'bray', 'PCo1 (46.6%)', 'PCo2 (13.31%)', 'PCo3 (8.22%)'.
## 
## # The associated data tibble abstraction: 37 × 15
## # The 'node', 'label' and 'isTip' are from the phylo tree.
##     node label  isTip time  RareAbu…¹ Observe Chao1   ACE Shannon Simpson Pielou
##    <int> <chr>  <lgl> <chr> <list>      <dbl> <dbl> <dbl>   <dbl>   <dbl>  <dbl>
##  1     1 F3D0   TRUE  Early <tibble>      104 104.  105.     3.88   0.965  0.835
##  2     2 F3D1   TRUE  Early <tibble>       99 102   101.     3.97   0.971  0.864
##  3     3 F3D141 TRUE  Late  <tibble>       74  74    74.2    3.41   0.950  0.793
##  4     4 F3D142 TRUE  Late  <tibble>       48  48    48      3.12   0.939  0.805
##  5     5 F3D143 TRUE  Late  <tibble>       56  56    56      3.29   0.946  0.818
##  6     6 F3D144 TRUE  Late  <tibble>       47  47    47.2    2.98   0.930  0.774
##  7     7 F3D145 TRUE  Late  <tibble>       71  73.1  74.0    3.12   0.937  0.731
##  8     8 F3D146 TRUE  Late  <tibble>       83  84.5  83.8    3.60   0.956  0.814
##  9     9 F3D147 TRUE  Late  <tibble>       97 107   106.     3.31   0.940  0.723
## 10    10 F3D148 TRUE  Late  <tibble>       92  93.3  94.7    3.44   0.952  0.760
## # … with 27 more rows, 4 more variables: bray <list>, `PCo1 (46.6%)` <dbl>,
## #   `PCo2 (13.31%)` <dbl>, `PCo3 (8.22%)` <dbl>, and abbreviated variable name
## #   ¹​RareAbundanceRarecurve
The hierarchical cluster result of samples


Because the result of the hierarchical clustering is a treedata object, it is easy to display it together with associated data. For example, we can display the clustering result alongside the abundance at a specific taxonomy level to check whether any biological pattern emerges.

## # A tibble: 171 × 7
##    Phyla             nodeClass Sample RareAbundance RelRareAbun…¹ time  RareAb…²
##    <fct>             <chr>     <chr>          <int>         <dbl> <chr> <list>  
##  1 p__Actinobacteria Phylum    F3D0              15         0.596 Early <tibble>
##  2 p__Actinobacteria Phylum    F3D1               0         0     Early <tibble>
##  3 p__Actinobacteria Phylum    F3D141            11         0.437 Late  <tibble>
##  4 p__Actinobacteria Phylum    F3D142            28         1.11  Late  <tibble>
##  5 p__Actinobacteria Phylum    F3D143            10         0.397 Late  <tibble>
##  6 p__Actinobacteria Phylum    F3D144            18         0.715 Late  <tibble>
##  7 p__Actinobacteria Phylum    F3D145             6         0.238 Late  <tibble>
##  8 p__Actinobacteria Phylum    F3D146             4         0.159 Late  <tibble>
##  9 p__Actinobacteria Phylum    F3D147            15         0.596 Late  <tibble>
## 10 p__Actinobacteria Phylum    F3D148            19         0.755 Late  <tibble>
## # … with 161 more rows, and abbreviated variable names
## #   ¹​RelRareAbundanceBySample, ²​RareAbundanceBytime
The hierarchical cluster result of samples and the abundance of Phylum


3.6 Biomarker discovery

MicrobiotaProcess presents mp_diff_analysis for biomarker discovery based on the tidy-like framework. The rule of mp_diff_analysis is similar to that of LEfSe (Nicola Segata and Huttenhower 2011). First, all features are tested for whether their values are differentially distributed among classes. Second, the significantly different features are tested for whether all pairwise comparisons between subclasses in different classes are consistent with the class trend. Finally, the significantly discriminative features are assessed by LDA (linear discriminant analysis) or rf (randomForest). However, mp_diff_analysis is more flexible: the test methods of the first two steps can be set by the user, and we use the generalized fold change (Wirbel et al. 2019) and wilcox.test (default) to test whether all pairwise comparisons between subclasses in different classes are consistent with the class trend. The result is stored in the treedata object, which can be processed and displayed via treeio (Wang et al. 2020), tidytree (Yu 2021), ggtree (Yu et al. 2017) and ggtreeExtra (Xu et al. 2021).
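The three steps described above can be sketched as follows, assuming the bundled mouse.time.mpse dataset. The argument names and values shown for each step (first.test.method, second.test.method, ml.method) are given as an illustration and should be checked against ?mp_diff_analysis; the first.test.alpha threshold is an arbitrary choice.

```r
library(MicrobiotaProcess)
data(mouse.time.mpse)

# Relative abundance of the rarefied counts, used as input for the tests.
mpse <- mp_rrarefy(mouse.time.mpse)
mpse <- mp_cal_abundance(mpse, .abundance = RareAbundance)

# Differential analysis between the 'time' groups.
mpse <- mp_diff_analysis(
  mpse,
  .abundance         = RelRareAbundanceBySample,
  .group             = time,
  first.test.method  = "kruskal.test",  # step 1: test across classes
  first.test.alpha   = 0.01,
  second.test.method = "wilcox.test",   # step 2: pairwise consistency test
  ml.method          = "lda"            # step 3: effect size via LDA
)

# The results are attached to the taxonomy tree (a treedata object).
taxa.tree <- mp_extract_tree(mpse, type = "taxatree")
taxa.tree
```

The extracted tree carries columns such as LDAmean, pvalue and fdr, as listed in the output below, so it can be filtered with tidytree or drawn with ggtree.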

## # A MPSE-tibble (MPSE object) abstraction: 4,142 × 25
## # OTU=218 | Samples=19 | Assays=Abundance, RareAbundance,
## #   RelRareAbundanceBySample, hellinger | Taxonomy=Kingdom, Phylum, Class,
## #   Order, Family, Genus, Species
##    OTU    Sample Abundance RareAb…¹ RelRa…² helli…³ time  RareAb…⁴ Observe Chao1
##    <chr>  <chr>      <int>    <int>   <dbl>   <dbl> <chr> <list>     <dbl> <dbl>
##  1 OTU_1  F3D0         579      214   8.50   0.298  Early <tibble>     104  104.
##  2 OTU_2  F3D0         345      116   4.61   0.230  Early <tibble>     104  104.
##  3 OTU_3  F3D0         449      179   7.11   0.262  Early <tibble>     104  104.
##  4 OTU_4  F3D0         430      167   6.63   0.257  Early <tibble>     104  104.
##  5 OTU_5  F3D0         154       54   2.14   0.154  Early <tibble>     104  104.
##  6 OTU_6  F3D0         470      174   6.91   0.268  Early <tibble>     104  104.
##  7 OTU_7  F3D0         282      115   4.57   0.208  Early <tibble>     104  104.
##  8 OTU_8  F3D0         184       74   2.94   0.168  Early <tibble>     104  104.
##  9 OTU_9  F3D0          45       16   0.635  0.0830 Early <tibble>     104  104.
## 10 OTU_10 F3D0         158       59   2.34   0.156  Early <tibble>     104  104.
##      ACE Shannon Simpson Pielou bray     PCo1 (…⁵ PCo2 …⁶ PCo3 …⁷ Kingdom Phylum
##    <dbl>   <dbl>   <dbl>  <dbl> <list>      <dbl>   <dbl>   <dbl> <chr>   <chr> 
##  1  105.    3.88   0.965  0.835 <tibble>   -0.107   0.118   0.196 k__Bac… p__Ba…
##  2  105.    3.88   0.965  0.835 <tibble>   -0.107   0.118   0.196 k__Bac… p__Ba…
##  3  105.    3.88   0.965  0.835 <tibble>   -0.107   0.118   0.196 k__Bac… p__Ba…
##  4  105.    3.88   0.965  0.835 <tibble>   -0.107   0.118   0.196 k__Bac… p__Ba…
##  5  105.    3.88   0.965  0.835 <tibble>   -0.107   0.118   0.196 k__Bac… p__Ba…
##  6  105.    3.88   0.965  0.835 <tibble>   -0.107   0.118   0.196 k__Bac… p__Ba…
##  7  105.    3.88   0.965  0.835 <tibble>   -0.107   0.118   0.196 k__Bac… p__Ba…
##  8  105.    3.88   0.965  0.835 <tibble>   -0.107   0.118   0.196 k__Bac… p__Ba…
##  9  105.    3.88   0.965  0.835 <tibble>   -0.107   0.118   0.196 k__Bac… p__Ba…
## 10  105.    3.88   0.965  0.835 <tibble>   -0.107   0.118   0.196 k__Bac… p__Ba…
## # … with 4,132 more rows, 5 more variables: Class <chr>, Order <chr>, Family <chr>, Genus <chr>, Species <chr>, and abbreviated variable names
## #   ¹​RareAbundance, ²​RelRareAbundanceBySample, ³​hellinger, ⁴​RareAbundanceRarecurve, ⁵​`PCo1 (46.6%)`, ⁶​`PCo2 (13.31%)`, ⁷​`PCo3 (8.22%)`
## 'treedata' S4 object'.
## 
## ...@ phylo:
## 
## Phylogenetic tree with 218 tips and 186 internal nodes.
## 
## Tip labels:
##   OTU_67, OTU_231, OTU_188, OTU_150, OTU_207, OTU_5, ...
## Node labels:
##   r__root, k__Bacteria, p__Actinobacteria, p__Bacteroidetes, p__Cyanobacteria,
## p__Deinococcus-Thermus, ...
## 
## Rooted; no branch lengths.
## 
## with the following features available:
##   'nodeClass', 'nodeDepth', 'RareAbundanceBySample', 'RareAbundanceBytime',
## 'LDAupper', 'LDAmean', 'LDAlower', 'Sign_time', 'pvalue', 'fdr'.
## 
## # The associated data tibble abstraction: 404 × 13
## # The 'node', 'label' and 'isTip' are from the phylo tree.
##     node label   isTip nodeC…¹ nodeD…² RareAb…³ RareAb…⁴ LDAup…⁵ LDAmean LDAlo…⁶
##    <int> <chr>   <lgl> <chr>     <dbl> <list>   <list>     <dbl>   <dbl>   <dbl>
##  1     1 OTU_67  TRUE  OTU           8 <tibble> <tibble>   NA      NA      NA   
##  2     2 OTU_231 TRUE  OTU           8 <tibble> <tibble>   NA      NA      NA   
##  3     3 OTU_188 TRUE  OTU           8 <tibble> <tibble>   NA      NA      NA   
##  4     4 OTU_150 TRUE  OTU           8 <tibble> <tibble>   NA      NA      NA   
##  5     5 OTU_207 TRUE  OTU           8 <tibble> <tibble>   NA      NA      NA   
##  6     6 OTU_5   TRUE  OTU           8 <tibble> <tibble>   NA      NA      NA   
##  7     7 OTU_1   TRUE  OTU           8 <tibble> <tibble>   NA      NA      NA   
##  8     8 OTU_2   TRUE  OTU           8 <tibble> <tibble>   NA      NA      NA   
##  9     9 OTU_3   TRUE  OTU           8 <tibble> <tibble>   NA      NA      NA   
## 10    10 OTU_4   TRUE  OTU           8 <tibble> <tibble>    4.41    4.38    4.36
## # … with 394 more rows, 3 more variables: Sign_time <chr>, pvalue <dbl>,
## #   fdr <dbl>, and abbreviated variable names ¹​nodeClass, ²​nodeDepth,
## #   ³​RareAbundanceBySample, ⁴​RareAbundanceBytime, ⁵​LDAupper, ⁶​LDAlower
## # A tibble: 399 × 8
##    label   nodeClass LDAupper LDAmean LDAlower Sign_time   pvalue     fdr
##    <chr>   <chr>        <dbl>   <dbl>    <dbl> <chr>        <dbl>   <dbl>
##  1 OTU_67  OTU          NA      NA       NA    <NA>      0.00335  0.0194 
##  2 OTU_231 OTU          NA      NA       NA    <NA>      0.343    0.408  
##  3 OTU_188 OTU          NA      NA       NA    <NA>      0.563    0.634  
##  4 OTU_150 OTU          NA      NA       NA    <NA>      0.235    0.355  
##  5 OTU_207 OTU          NA      NA       NA    <NA>      0.878    0.894  
##  6 OTU_5   OTU          NA      NA       NA    <NA>      0.568    0.634  
##  7 OTU_1   OTU          NA      NA       NA    <NA>      0.744    0.773  
##  8 OTU_2   OTU          NA      NA       NA    <NA>      0.437    0.515  
##  9 OTU_3   OTU          NA      NA       NA    <NA>      0.437    0.515  
## 10 OTU_4   OTU           4.41    4.38     4.36 Late      0.000444 0.00732
## # … with 389 more rows
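
To read the result table above, it can help to keep only the rows where Sign_time is set. The sketch below rebuilds four of the printed rows as a plain data frame (the object name res is hypothetical, and only a subset of the columns is copied) and filters it with base R. In the printed rows the LDA columns are NA for every feature except OTU_4, which suggests that they are filled only for features retained as discriminative.

```r
# Four rows copied from the printed result table (subset of columns).
res <- data.frame(
  label     = c("OTU_67", "OTU_231", "OTU_188", "OTU_4"),
  LDAmean   = c(NA, NA, NA, 4.38),
  Sign_time = c(NA, NA, NA, "Late"),
  pvalue    = c(0.00335, 0.343, 0.563, 0.000444),
  fdr       = c(0.0194, 0.408, 0.634, 0.00732)
)

# Keep the features assigned to a group, largest LDA value first.
biomarkers <- res[!is.na(res$Sign_time), ]
biomarkers <- biomarkers[order(-biomarkers$LDAmean), ]
print(biomarkers)
```

The plotting code that follows applies the same condition, !is.na(Sign_time), through td_filter.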
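# Packages used below (all listed as attached in the session information)
library(ggplot2)
library(ggtree)
library(ggtreeExtra)
library(ggstar)
library(forcats)
# taxa.tree is the taxonomy tree of the MPSE object (it can be extracted via mp_extract_tree).
# Since taxa.tree is a treedata object, it can be visualized with ggtree and ggtreeExtra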
p1 <- ggtree(
        taxa.tree,
        layout="radial",
        size = 0.3
      ) +
      geom_point(
        data = td_filter(!isTip),
        fill="white",
        size=1,
        shape=21
      )
# highlight the phylum clades
p2 <- p1 +
      geom_hilight(
        data = td_filter(nodeClass == "Phylum"),
        mapping = aes(node = node, fill = label)
      )
# display the relative abundance of features(OTU)
p3 <- p2 +
      ggnewscale::new_scale("fill") +
      geom_fruit(
         data = td_unnest(RareAbundanceBySample),
         geom = geom_star,
         mapping = aes(
                       x = fct_reorder(Sample, time, .fun=min),
                       size = RelRareAbundanceBySample,
                       fill = time,
                       subset = RelRareAbundanceBySample > 0
                   ),
         starshape = 13,
         starstroke = 0.25,
         offset = 0.04,
         pwidth = 0.8,
         grid.params = list(linetype=2)
      ) +
      scale_size_continuous(
         name="Relative Abundance (%)",
         range = c(.5, 3)
      ) +
      scale_fill_manual(values=c("#1B9E77", "#D95F02"))
# display the tip labels of taxa tree
p4 <- p3 + geom_tiplab(size=2, offset=7.2)
# display the LDA of the significant OTUs
p5 <- p4 +
      ggnewscale::new_scale("fill") +
      geom_fruit(
         geom = geom_col,
         mapping = aes(
                       x = LDAmean,
                       fill = Sign_time,
                       subset = !is.na(LDAmean)
                       ),
         orientation = "y",
         offset = 0.3,
         pwidth = 0.5,
         axis.params = list(axis = "x",
                            title = "Log10(LDA)",
                            title.height = 0.01,
                            title.size = 2,
                            text.size = 1.8,
                            vjust = 1),
         grid.params = list(linetype = 2)
      )

# display the taxa that are significant (FDR) after kruskal.test (default)
p6 <- p5 +
      ggnewscale::new_scale("size") +
      geom_point(
         data=td_filter(!is.na(Sign_time)),
         mapping = aes(size = -log10(fdr),
                       fill = Sign_time
                       ),
         shape = 21
      ) +
      scale_size_continuous(range=c(1, 3)) +
      scale_fill_manual(values=c("#1B9E77", "#D95F02"))

p6 + theme(
           legend.key.height = unit(0.3, "cm"),
           legend.key.width = unit(0.3, "cm"),
           legend.spacing.y = unit(0.02, "cm"),
           legend.text = element_text(size = 7),
           legend.title = element_text(size = 9)
          )
The different species and the abundance of sample

The result can be visualized in various ways; refer to the articles or book on ggtree (Yu et al. 2017, 2018) and the article on ggtreeExtra (Xu et al. 2021). In addition, we developed mp_plot_diff_res to display the result of mp_diff_analysis, which reduces the coding burden.

## # A MPSE-tibble (MPSE object) abstraction: 4,142 × 31
## # OTU=218 | Samples=19 | Assays=Abundance, RareAbundance,
## #   RelRareAbundanceBySample, hellinger | Taxonomy=Kingdom, Phylum, Class,
## #   Order, Family, Genus, Species
##    OTU    Sample Abundance RareAb…¹ RelRa…² helli…³ time  RareAb…⁴ Observe Chao1
##    <chr>  <chr>      <int>    <int>   <dbl>   <dbl> <chr> <list>     <dbl> <dbl>
##  1 OTU_1  F3D0         579      214   8.50   0.298  Early <tibble>     104  104.
##  2 OTU_2  F3D0         345      116   4.61   0.230  Early <tibble>     104  104.
##  3 OTU_3  F3D0         449      179   7.11   0.262  Early <tibble>     104  104.
##  4 OTU_4  F3D0         430      167   6.63   0.257  Early <tibble>     104  104.
##  5 OTU_5  F3D0         154       54   2.14   0.154  Early <tibble>     104  104.
##  6 OTU_6  F3D0         470      174   6.91   0.268  Early <tibble>     104  104.
##  7 OTU_7  F3D0         282      115   4.57   0.208  Early <tibble>     104  104.
##  8 OTU_8  F3D0         184       74   2.94   0.168  Early <tibble>     104  104.
##  9 OTU_9  F3D0          45       16   0.635  0.0830 Early <tibble>     104  104.
## 10 OTU_10 F3D0         158       59   2.34   0.156  Early <tibble>     104  104.
## # … with 4,132 more rows, 21 more variables: ACE <dbl>, Shannon <dbl>,
## #   Simpson <dbl>, Pielou <dbl>, bray <list>, `PCo1 (46.6%)` <dbl>,
## #   `PCo2 (13.31%)` <dbl>, `PCo3 (8.22%)` <dbl>, Kingdom <chr>, Phylum <chr>,
## #   Class <chr>, Order <chr>, Family <chr>, Genus <chr>, Species <chr>,
## #   LDAupper <dbl>, LDAmean <dbl>, LDAlower <dbl>, Sign_time <chr>,
## #   pvalue <dbl>, fdr <dbl>, and abbreviated variable names ¹​RareAbundance,
## #   ²​RelRareAbundanceBySample, ³​hellinger, ⁴​RareAbundanceRarecurve
The different species and the abundance of group

We also developed mp_plot_diff_cladogram to visualize the result.

The cladogram of differential species

The result can also be visualized with mp_plot_diff_boxplot.

The abundance and LDA effect size of differential taxa

Alternatively, the results can be visualized with mp_plot_diff_manhattan.

The Manhattan plot of differential taxa

4. Need help?

If you have questions or issues, please visit the GitHub issue tracker.

5. Session information

Here is the output of sessionInfo() on the system on which this document was compiled:

## R version 4.2.2 (2022-10-31)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.5 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.16-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.16-bioc/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] forcats_1.0.0            ggstar_1.0.4             MicrobiotaProcess_1.10.3
##  [4] tidytree_0.4.2           treeio_1.22.0            ggtreeExtra_1.8.1       
##  [7] ggtree_3.6.2             shadowtext_0.1.2         phyloseq_1.42.0         
## [10] ggplot2_3.4.0            knitr_1.42              
## 
## loaded via a namespace (and not attached):
##   [1] TH.data_1.1-1               ggnewscale_0.4.8           
##   [3] colorspace_2.1-0            ggsignif_0.6.4             
##   [5] modeltools_0.2-23           XVector_0.38.0             
##   [7] GenomicRanges_1.50.2        aplot_0.1.9                
##   [9] farver_2.1.1                ggrepel_0.9.3              
##  [11] fansi_1.0.4                 mvtnorm_1.1-3              
##  [13] coin_1.4-2                  codetools_0.2-19           
##  [15] splines_4.2.2               ggh4x_0.2.3                
##  [17] cachem_1.0.6                libcoin_1.0-9              
##  [19] ade4_1.7-22                 jsonlite_1.8.4             
##  [21] cluster_2.1.4               png_0.1-8                  
##  [23] compiler_4.2.2              Matrix_1.5-3               
##  [25] fastmap_1.1.0               lazyeval_0.2.2             
##  [27] cli_3.6.0                   htmltools_0.5.4            
##  [29] tools_4.2.2                 igraph_1.3.5               
##  [31] gtable_0.3.1                glue_1.6.2                 
##  [33] GenomeInfoDbData_1.2.9      corrr_0.4.4                
##  [35] reshape2_1.4.4              dplyr_1.1.0                
##  [37] Rcpp_1.0.10                 Biobase_2.58.0             
##  [39] jquerylib_0.1.4             vctrs_0.5.2                
##  [41] Biostrings_2.66.0           rhdf5filters_1.10.0        
##  [43] multtest_2.54.0             ape_5.6-2                  
##  [45] nlme_3.1-162                iterators_1.0.14           
##  [47] gghalves_0.1.4              ggalluvial_0.12.4          
##  [49] xfun_0.37                   stringr_1.5.0              
##  [51] lifecycle_1.0.3             zlibbioc_1.44.0            
##  [53] MASS_7.3-58.2               zoo_1.8-11                 
##  [55] scales_1.2.1                MatrixGenerics_1.10.0      
##  [57] parallel_4.2.2              SummarizedExperiment_1.28.0
##  [59] biomformat_1.26.0           sandwich_3.0-2             
##  [61] rhdf5_2.42.0                yaml_2.3.7                 
##  [63] gridExtra_2.3               ggfun_0.0.9                
##  [65] dtplyr_1.2.2                yulab.utils_0.0.6          
##  [67] sass_0.4.5                  stringi_1.7.12             
##  [69] highr_0.10                  S4Vectors_0.36.1           
##  [71] foreach_1.5.2               permute_0.9-7              
##  [73] BiocGenerics_0.44.0         ggside_0.2.2               
##  [75] GenomeInfoDb_1.34.9         rlang_1.0.6                
##  [77] pkgconfig_2.0.3             matrixStats_0.63.0         
##  [79] bitops_1.0-7                evaluate_0.20              
##  [81] lattice_0.20-45             purrr_1.0.1                
##  [83] Rhdf5lib_1.20.0             patchwork_1.1.2            
##  [85] labeling_0.4.2              tidyselect_1.2.0           
##  [87] plyr_1.8.8                  magrittr_2.0.3             
##  [89] R6_2.5.1                    IRanges_2.32.0             
##  [91] magick_2.7.3                generics_0.1.3             
##  [93] multcomp_1.4-20             DelayedArray_0.24.0        
##  [95] DBI_1.1.3                   pillar_1.8.1               
##  [97] withr_2.5.0                 mgcv_1.8-41                
##  [99] prettydoc_0.4.1             survival_3.5-0             
## [101] RCurl_1.98-1.10             tibble_3.1.8               
## [103] crayon_1.5.2                utf8_1.2.3                 
## [105] rmarkdown_2.20              grid_4.2.2                 
## [107] data.table_1.14.6           vegan_2.6-4                
## [109] digest_0.6.31               tidyr_1.3.0                
## [111] gridGraphics_0.5-1          stats4_4.2.2               
## [113] munsell_0.5.0               viridisLite_0.4.1          
## [115] ggplotify_0.1.0             bslib_0.4.2

6. References

Beghini, Francesco, Lauren J McIver, Aitor Blanco-Míguez, Leonard Dubois, Francesco Asnicar, Sagun Maharjan, Ana Mailyan, et al. 2021. “Integrating Taxonomic, Functional, and Strain-Level Profiling of Diverse Microbial Communities with bioBakery 3.” Elife 10: e65088.

Bolyen, Evan, Jai Ram Rideout, Matthew R Dillon, Nicholas A Bokulich, Christian C Abnet, Gabriel A Al-Ghalith, Harriet Alexander, et al. 2019. “Reproducible, Interactive, Scalable and Extensible Microbiome Data Science Using Qiime 2.” Nature Biotechnology 37 (8): 852–57.

Callahan, Benjamin J, Paul J McMurdie, Michael J Rosen, Andrew W Han, Amy Jo A Johnson, and Susan P Holmes. 2016. “DADA2: High-Resolution Sample Inference from Illumina Amplicon Data.” Nature Methods 13 (7): 581–83.

Huang, Ruizhu, Charlotte Soneson, Felix G.M. Ernst, Kevin C. Rue-Albrecht, Guangchuang Yu, Stephanie C. Hicks, and Mark D. Robinson. 2021. “TreeSummarizedExperiment: A S4 Class for Data with Hierarchical Structure.” F1000Research 9: 1246. https://f1000research.com/articles/9-1246.

McMurdie, Paul J., and Susan Holmes. 2013. “Phyloseq: An R Package for Reproducible Interactive Analysis and Graphics of Microbiome Census Data.” PLoS ONE 8 (4): e61217. https://doi.org/10.1371/journal.pone.0061217.

Morgan, Martin, Valerie Obenchain, Jim Hester, and Hervé Pagès. 2021. SummarizedExperiment: SummarizedExperiment Container. https://bioconductor.org/packages/SummarizedExperiment.

Segata, Nicola, Levi Waldron, Jacques Izard, and Curtis Huttenhower. 2011. “Metagenomic Biomarker Discovery and Explanation.” Genome Biology 12 (6): R60. https://doi.org/10.1186/gb-2011-12-6-r60.

Oksanen, Jari, F. Guillaume Blanchet, Michael Friendly, Roeland Kindt, Pierre Legendre, Dan McGlinn, Peter R. Minchin, et al. 2019. “Vegan: Community Ecology Package.” https://CRAN.R-project.org/package=vegan.

Pagès, H., P. Aboyoun, R. Gentleman, and S. DebRoy. 2021. Biostrings: Efficient Manipulation of Biological Strings. https://bioconductor.org/packages/Biostrings.

Siegel, Andrew F. 2004. “Rarefaction Curves.” Encyclopedia of Statistical Sciences 10. https://doi.org/10.1002/0471667196.ess2195.pub2.

Wang, Li-Gen, Tommy Tsan-Yuk Lam, Shuangbin Xu, Zehan Dai, Lang Zhou, Tingze Feng, Pingfan Guo, et al. 2020. “Treeio: An R Package for Phylogenetic Tree Input and Output with Richly Annotated and Associated Data.” Molecular Biology and Evolution 37 (2): 599–603. https://doi.org/10.1093/molbev/msz240.

Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019. “Welcome to the tidyverse.” Journal of Open Source Software 4 (43): 1686. https://doi.org/10.21105/joss.01686.

Wirbel, Jakob, Paul Theodor Pyl, Ece Kartal, Konrad Zych, Alireza Kashani, Alessio Milanese, Jonas S Fleck, et al. 2019. “Meta-Analysis of Fecal Metagenomes Reveals Global Microbial Signatures That Are Specific for Colorectal Cancer.” Nature Medicine 25 (4): 679. https://doi.org/10.1038/s41591-019-0406-6.

Xu, Shuangbin, Zehan Dai, Pingfan Guo, Xiaocong Fu, Shanshan Liu, Lang Zhou, Wenli Tang, et al. 2021. “GgtreeExtra: Compact Visualization of Richly Annotated Phylogenetic Data.” Molecular Biology and Evolution 38 (9): 4039–42. https://doi.org/10.1093/molbev/msab166.

Yu, Guangchuang. 2021. Tidytree: A Tidy Tool for Phylogenetic Tree Data Manipulation. https://yulab-smu.top/treedata-book/.

Yu, Guangchuang, Tommy Tsan-Yuk Lam, Huachen Zhu, and Yi Guan. 2018. “Two Methods for Mapping and Visualizing Associated Data on Phylogeny Using Ggtree.” Molecular Biology and Evolution 35 (12): 3041–43. https://doi.org/10.1093/molbev/msy194.

Yu, Guangchuang, David Smith, Huachen Zhu, Yi Guan, and Tommy Tsan-Yuk Lam. 2017. “Ggtree: An R Package for Visualization and Annotation of Phylogenetic Trees with Their Covariates and Other Associated Data.” Methods in Ecology and Evolution 8 (1): 28–36. https://doi.org/10.1111/2041-210X.12628.