Chapter 1 First break-out room

Study design data The intestinal microbiota is shaped by many interactions between microorganisms, host, diet, and the environment. Exposure to microorganisms present in the environment, and the exchange of microorganisms between hosts sharing the same environment, can influence the intestinal microbiota of individuals, but how this affects microbiota studies is poorly understood. We investigated the effects of experimental housing circumstances on intestinal microbiota composition in broiler chickens, and how these effects may influence the capacity to determine diet-related effects in a nutrition experiment. A known feed intervention was used to create differences in cecal microbiota between pens within the same housing condition (Feed A or Feed B). After 35 days cecal microbiota composition was assessed by 16S ribosomal RNA gene amplicon sequencing of seven broilers per pen. For this workshop, we use a subset of four pens.

Load packages

library(phyloseq) 
library(microbiome) 
library(data.table)
library(RColorBrewer)
library(ggpubr)
library(readr)
library(picante)
library(nlme)
library(ggplot2)
library(ape)
library(vegan)
library(data.table)
library(DT)
library(png)

1.1 Create Phyloseq object

  • Let’s start with uploading the data, 3 files are needed:
      1. Biom file
      1. Metadata file
      1. Tree file

The data can be downloaded via “Download ZIP”
The 3 files need to be in your current working directory (you can run the command getwd() in the RStudio console to check)

Additional reading Phyloseq

A phyloseq object contains information on: the abundances of the amplicon sequence variant (ASV), the taxonomy of the ASV, the phylogenetic tree, and the metadata of the samples.

pseq <- read_phyloseq(otu.file= "BiomWorkshop.biom1", 
                      taxonomy.file = NULL, 
                      metadata.file = "MetaWorkshop.csv", 
                      type="biom", sep =";"  )

treefile <- read_tree("TreeWorkshop.tree") 
ps <- merge_phyloseq(pseq, treefile)
ps
## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 489 taxa and 28 samples ]
## sample_data() Sample Data:       [ 28 samples by 9 sample variables ]
## tax_table()   Taxonomy Table:    [ 489 taxa by 7 taxonomic ranks ]
## phy_tree()    Phylogenetic Tree: [ 489 tips and 488 internal nodes ]

ps is the phyloseq object and contains 28 chicken samples and 489 ASV.
We can view the taxonomy of the ASV within a taxa table:

datatable(tax_table(ps))

The focus is on the diversities but a mini sequences quality control cannot be missed.
The datatable shows the presences of an unknown Domain (“NA”), we want to remove this.

Do you know why?

ps1 <- subset_taxa(ps, Domain!="NA")
ps1
## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 486 taxa and 28 samples ]
## sample_data() Sample Data:       [ 28 samples by 9 sample variables ]
## tax_table()   Taxonomy Table:    [ 486 taxa by 7 taxonomic ranks ]
## phy_tree()    Phylogenetic Tree: [ 486 tips and 485 internal nodes ]

Summary sequence data
To review the sequence data in we can use the summary function

summarize_phyloseq(ps1) 
## Compositional = NO2
## 1] Min. number of reads = 396052] Max. number of reads = 2612713] Total number of reads = 35815914] Average number of reads = 127913.9642857145] Median number of reads = 1099827] Sparsity = 0.8179012345679016] Any OTU sum to 1 or less? NO8] Number of singletons = 09] Percent of OTUs that are singletons 
##         (i.e. exactly one read detected across all samples)010] Number of sample variables are: 9BarcodeSequenceLibraryNumberLibraryNameProjectNameX16S_regionprimersPenFeedPenNr2
## [[1]]
## [1] "1] Min. number of reads = 39605"
## 
## [[2]]
## [1] "2] Max. number of reads = 261271"
## 
## [[3]]
## [1] "3] Total number of reads = 3581591"
## 
## [[4]]
## [1] "4] Average number of reads = 127913.964285714"
## 
## [[5]]
## [1] "5] Median number of reads = 109982"
## 
## [[6]]
## [1] "7] Sparsity = 0.817901234567901"
## 
## [[7]]
## [1] "6] Any OTU sum to 1 or less? NO"
## 
## [[8]]
## [1] "8] Number of singletons = 0"
## 
## [[9]]
## [1] "9] Percent of OTUs that are singletons \n        (i.e. exactly one read detected across all samples)0"
## 
## [[10]]
## [1] "10] Number of sample variables are: 9"
## 
## [[11]]
## [1] "BarcodeSequence" "LibraryNumber"   "LibraryName"     "ProjectName"     "X16S_region"    
## [6] "primers"         "Pen"             "Feed"            "PenNr"

What is the number of singletons in this dataset?

Check rarefaction
The number of species = ASV (y-as) should reach a plateau if the sample size = reads (x-as) is large enough.

otu_tab <- t(abundances(ps1))
p <- vegan::rarecurve(otu_tab, step = 50, label = FALSE, sample = min(rowSums(otu_tab), Col ="red", cex = 0.5))

1.2 Calculate alpha-diversities

Let’s first visualize the data

plot_richness(ps1, "Pen", "Feed", measures=c("Observed", "Shannon", "Simpson", "InvSimpson")) 

What stands out?

Let’s test the four different (non-phylogenetic) alpha diversities on feed and pen

alpha <- estimate_richness(ps1, measures = c("Observed", "Shannon", "Simpson", "InvSimpson" ))
meta <- meta(ps1)  #extract metadata from phyloseq object

Observed

kruskal.pd <- kruskal.test(alpha$Observed ~ meta$Feed)
print(kruskal.pd)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  alpha$Observed by meta$Feed
## Kruskal-Wallis chi-squared = 16.755, df = 1, p-value = 4.253e-05
pairwise.wilcox.test(alpha$Observed, meta$Pen, p.adjust.method = "BH")
## 
##  Pairwise comparisons using Wilcoxon rank sum test with continuity correction 
## 
## data:  alpha$Observed and meta$Pen 
## 
##   1      2      3     
## 2 0.7478 -      -     
## 3 0.0249 0.0107 -     
## 4 0.0064 0.0064 0.0065
## 
## P value adjustment method: BH

Simpson

kruskal.pd <- kruskal.test(alpha$Simpson ~ meta$Feed)
print(kruskal.pd)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  alpha$Simpson by meta$Feed
## Kruskal-Wallis chi-squared = 14.196, df = 1, p-value = 0.0001648
pairwise.wilcox.test(alpha$Simpson, meta$Pen, p.adjust.method = "BH") 
## 
##  Pairwise comparisons using Wilcoxon rank sum exact test 
## 
## data:  alpha$Simpson and meta$Pen 
## 
##   1      2      3     
## 2 0.0455 -      -     
## 3 0.0035 0.2086 -     
## 4 0.0012 0.0012 0.0012
## 
## P value adjustment method: BH

Shannon

kruskal.pd <- kruskal.test(alpha$Shannon ~ meta$Feed)
print(kruskal.pd)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  alpha$Shannon by meta$Feed
## Kruskal-Wallis chi-squared = 15.98, df = 1, p-value = 6.403e-05
pairwise.wilcox.test(alpha$Shannon, meta$Pen, p.adjust.method = "BH")
## 
##  Pairwise comparisons using Wilcoxon rank sum exact test 
## 
## data:  alpha$Shannon and meta$Pen 
## 
##   1      2      3     
## 2 0.1282 -      -     
## 3 0.0105 0.0315 -     
## 4 0.0012 0.0012 0.0012
## 
## P value adjustment method: BH

Phylogenetic diversity

#library(picante)
ps.otu <- as.data.frame(ps1@otu_table)
ps.tree <- ps1@phy_tree

df.pd <- pd(t(ps.otu), treefile, include.root=T) # rooted tree
meta$Phylogenetic_Diversity <- df.pd$PD

pd.plot <- ggboxplot(meta, x = "Pen",
                     y = "Phylogenetic_Diversity",
                     fill = "Feed")

pd.plot

kruskal.pd <- kruskal.test(meta$Phylogenetic_Diversity ~ meta$Pen)
print(kruskal.pd)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  meta$Phylogenetic_Diversity by meta$Pen
## Kruskal-Wallis chi-squared = 22.33, df = 3, p-value = 5.569e-05
pairwise.wilcox.test(meta$Phylogenetic_Diversity, meta$Pen, p.adjust.method = "BH")
## 
##  Pairwise comparisons using Wilcoxon rank sum exact test 
## 
## data:  meta$Phylogenetic_Diversity and meta$Pen 
## 
##   1      2      3     
## 2 0.2593 -      -     
## 3 0.0017 0.0028 -     
## 4 0.0012 0.0012 0.0012
## 
## P value adjustment method: BH

The data we use in this workshop has been publised

Additional reading Shetty Sudarshan A, Lahti Leo, Hermes Gerben DA, & Hauke Smidt. (2020, April 11) Microbial bioinformatics introductory course material 2018 (Version v3.0). Zenodo http://doi.org/10.5281/zenodo.1436630