

Expanding our understanding of marine viral diversity through metagenomic analyses of biofilms

  • Electronic supplementary material The online version of this article (https://doi.org/10.1007/s42995-020-00078-4) contains supplementary material, which is available to authorized users.
  • Edited by Chengchao Chen.
  • Recent metagenomics surveys have provided insights into the marine virosphere. However, these surveys have focused solely on viruses in seawater, neglecting those associated with biofilms. By analyzing 1.75 terabases of biofilm metagenomic data, 3974 viral sequences were identified from eight locations around the world. Over 90% of these viral sequences were not found in previously reported datasets. Comparisons between biofilm and seawater metagenomes identified viruses that are endemic to the biofilm niche. Analysis of viral sequences integrated within biofilm-derived microbial genomes revealed potential functional genes for trimeric autotransporter adhesin and polysaccharide metabolism, which may contribute to biofilm formation by the bacterial hosts. However, more than 70% of the genes could not be annotated. These findings show marine biofilms to be a reservoir of novel viruses and have enhanced our understanding of natural virus-bacteria ecosystems.
  • Almpanis A, Swain M, Gatherer D, McEwan N (2018) Correlation between bacterial G+C content, genome size and the G+C content of associated plasmids and bacteriophages. Microb Genom 4:000168
    Arvey A, Tempera I, Tsai K, Chen HS, Tikhmyanova N, Klichinsky M, Leslie C, Lieberman PM (2012) An atlas of the Epstein-Barr virus transcriptome and epigenome reveals host-virus regulatory interactions. Cell Host Microbe 12:233-245 doi: 10.1016/j.chom.2012.06.008
    Bettarel Y, Bouvy M, Dumont C, Sime-Ngando T (2006) Virus-bacterium interactions in water and sediment of West African inland aquatic systems. Appl Environ Microbiol 72:5274-5282 doi: 10.1128/AEM.00863-06
    Breitbart M (2012) Marine viruses: truth or dare. Ann Rev Mar Sci 4:425-448 doi: 10.1146/annurev-marine-120709-142805
    Brum JR, Ignacio-Espinoza JC, Roux S, Doulcier G, Acinas SG, Alberti A, Chaffron S, Cruaud C, De Vargas C, Gasol JM, Gorsky G, Gregory AC, Guidi L, Hingamp P, Iudicone D, Not F, Ogata H, Pesant S, Poulos BT, Schwenck SM et al (2015) Patterns and ecological drivers of ocean viral communities. Science 348:1261498 doi: 10.1126/science.1261498
    Bushnell B (2014) BBMap: a fast, accurate, splice-aware aligner. Lawrence Berkeley National Lab (LBNL), Berkeley
    Chung HC, Lee OO, Huang YL, Mok SY, Kolter R, Qian PY (2010) Bacterial community succession and chemical profiles of subtidal biofilms in relation to larval settlement of the polychaete Hydroides elegans. ISME J 4:817-828 doi: 10.1038/ismej.2009.157
    Coutinho FH, Silveira CB, Gregoracci GB, Thompson CC, Edwards RA, Brussaard CP, Dutilh BE, Thompson FL (2017) Marine viruses discovered via metagenomics shed light on viral strategies throughout the oceans. Nat Commun 8:1-2 doi: 10.1038/s41467-016-0009-6
    Coutinho FH, Gregoracci GB, Walter JM, Thompson CC, Thompson FL (2018) Metagenomics sheds light on the ecology of marine microbes and their viruses. Trends Microbiol 26:955-965 doi: 10.1016/j.tim.2018.05.015
    Dang H, Lovell CR (2016) Microbial surface colonization and biofilm development in marine environments. Microbiol Mol Biol Rev 80:91-138 doi: 10.1128/MMBR.00037-15
    Danovaro R, Dell'Anno A, Corinaldesi C, Magagnini M, Noble R, Tamburini C, Weinbauer M (2008) Major viral impact on the functioning of benthic deep-sea ecosystems. Nature 454:1084-1087 doi: 10.1038/nature07268
    Engelhardt T, Kallmeyer J, Cypionka H, Engelen B (2014) High virus-to-cell ratios indicate ongoing production of viruses in deep subsurface sediments. ISME J 8:1503-1509 doi: 10.1038/ismej.2013.245
    Fey P, Stephens S, Titus MA, Chisholm RL (2002) SadA, a novel adhesion receptor in Dictyostelium. J Cell Biol 159:1109-1119 doi: 10.1083/jcb.200206067
    Galperin MY, Makarova KS, Wolf YI, Koonin EV (2015) Expanded microbial genome coverage and improved protein family annotation in the COG database. Nucleic Acids Res 43:261-269 doi: 10.1093/nar/gku1223
    Gödeke J, Paul K, Lassak J, Thormann KM (2011) Phage-induced lysis enhances biofilm formation in Shewanella oneidensis MR-1. ISME J 5:613-626 doi: 10.1038/ismej.2010.153
    Hagan MF, Zandi R (2016) Recent advances in coarse-grained modeling of virus assembly. Curr Opin Virol 18:36-43 doi: 10.1016/j.coviro.2016.02.012
    Høyland-Kroghsbo NM, Mærkedahl RB, Svenningsen SL (2013) A quorum-sensing-induced bacteriophage defense mechanism. mBio 4:00362
    Høyland-Kroghsbo NM, Paczkowski J, Mukherjee S, Broniewski J, Westra E, Bondy-Denomy J, Bassler BL (2017) Quorum sensing controls the Pseudomonas aeruginosa CRISPR-Cas adaptive immune system. Proc Natl Acad Sci USA 114:131-135 doi: 10.1073/pnas.1617415113
    Huson DH, Beier S, Flade I, Górska A, El-Hadidi M, Mitra S, Ruscheweyh HJ, Tappu R (2016) MEGAN community edition-interactive exploration and analysis of large-scale microbiome sequencing data. PLoS Comput Biol 12:e1004957 doi: 10.1371/journal.pcbi.1004957
    Hyatt D, Chen GL, LoCascio PF, Land ML, Larimer FW, Hauser LJ (2010) Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinf 11:119 doi: 10.1186/1471-2105-11-119
    Johnson LS, Eddy SR, Portugaly E (2010) Hidden Markov model speed heuristic and iterative HMM search procedure. BMC Bioinf 11:431 doi: 10.1186/1471-2105-11-431
    Kanehisa M, Furumichi M, Tanabe M, Sato Y, Morishima K (2017) KEGG: new perspectives on genomes, pathways, diseases and drugs. Nucleic Acids Res 45:353-361 doi: 10.1093/nar/gkw1115
    Kristensen DM, Mushegian AR, Koonin EV (2011) Systems biology of bacteriophage proteins and new dimensions of the virus world discovered through metagenomics. Genome Biol 12:9
    Lazar Adler NR, Dean RE, Saint RJ, Stevens MP, Prior JL, Atkins TP, Galyov EE (2013) Identification of a predicted trimeric autotransporter adhesin required for biofilm formation of Burkholderia pseudomallei. PLoS ONE 8:79461 doi: 10.1371/journal.pone.0079461
    Li D, Liu CM, Luo R, Sadakane K, Lam TW (2015) MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics 31:1674-1676 doi: 10.1093/bioinformatics/btv033
    Lombard V, Golaconda Ramulu H, Drula E, Coutinho PM, Henrissat B (2014) The carbohydrate-active enzymes database (CAZy) in 2013. Nucleic Acids Res 42:490-495 doi: 10.1093/nar/gkt1178
    Luqman A, Nega M, Nguyen MT, Ebner P, Götz F (2018) SadA-expressing staphylococci in the human gut show increased cell adherence and internalization. Cell Rep 22:535-545 doi: 10.1016/j.celrep.2017.12.058
    Mann S, Chen YP (2010) Bacterial genomic G+C composition-eliciting environmental adaptation. Genomics 95:7-15 doi: 10.1016/j.ygeno.2009.09.002
    Marshall D, Sample C (1995) Epstein-Barr virus nuclear antigen 3C is a transcriptional regulator. J Virol 69:3624-3630 doi: 10.1128/JVI.69.6.3624-3630.1995
    McCoy AJ, Grosse-Kunstleve RW, Adams PD, Winn MD, Storoni LC, Read RJ (2007) Phaser crystallographic software. J Appl Crystallogr 40:658-674 doi: 10.1107/S0021889807021206
    McMinn A, Liang Y, Wang M (2020) Minireview: the role of viruses in marine photosynthetic biofilms. Mar Life Sci Technol 2:203-208 doi: 10.1007/s42995-020-00042-2
    Mizuno CM, Rodriguez-Valera F, Kimes NE, Ghai R (2013) Expanding the marine virosphere using metagenomics. PLoS Genet 9:1003987 doi: 10.1371/journal.pgen.1003987
    Motlagh AM, Bhattacharjee AS, Coutinho FH, Dutilh BE, Casjens SR, Goel RK (2017) Insights of phage-host interaction in hypersaline ecosystem through metagenomics analyses. Front Microbiol 1:1-15
    Paez-Espino D, Chen IM, Palaniappan K, Ratner A, Chu K, Szeto E, Pillay M, Huang J, Markowitz VM, Nielsen T, Huntemann M, Reddy TBK, Pavlopoulos GA, Sullivan MB, Campbell BJ, Chen F, Mcmahon KD, Hallam SJ, Denef VJ, Cavicchioli R et al (2016) IMG/VR: a database of cultured and uncultured DNA Viruses and retroviruses. Nucleic Acids Res 30:1030
    Paez-Espino D, Pavlopoulos GA, Ivanova NN, Kyrpides NC (2017) Nontargeted virus sequence discovery pipeline and virus clustering for metagenomic data. Nat Protoc 12:1673-1682 doi: 10.1038/nprot.2017.063
    Parks DH, Imelfort M, Skennerton CT, Hugenholtz P, Tyson GW (2015) CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res 25:1043-1055 doi: 10.1101/gr.186072.114
    Patel RK, Jain M (2012) NGS QC Toolkit: a toolkit for quality control of next generation sequencing data. PLoS ONE 7:30619 doi: 10.1371/journal.pone.0030619
    Patterson AG, Jackson SA, Taylor C, Evans GB, Salmond GP, Przybilski R, Staals RH, Fineran PC (2016) Quorum sensing controls adaptive immunity through the regulation of multiple CRISPR-Cas systems. Mol Cell 64:1102-1108 doi: 10.1016/j.molcel.2016.11.012
    Raghunathan D, Wells TJ, Morris FC, Shaw RK, Bobat S, Peters SE, Paterson GK, Jensen KT, Leyton DL, Blair JM, Browning DF, Pravin J, Floreslangarica A, Hitchcock J, Moraes CTP, Piazza RMF, Maskell DJ, Webber M, May RC, Maclennan CA et al (2011) SadA, a trimeric autotransporter from Salmonella enterica serovar Typhimurium, can promote biofilm formation and provides limited protection against infection. Infect Immun 79:4342-4352 doi: 10.1128/IAI.05592-11
    Rao VB, Feiss M (2008) The bacteriophage DNA packaging motor. Annu Rev Genet 42:647-681 doi: 10.1146/annurev.genet.42.110807.091545
    Roux S, Enault F, Hurwitz BL, Sullivan MB (2015) VirSorter: mining viral signal from microbial genomic data. Peer J 3:985 doi: 10.7717/peerj.985
    Salta M, Wharton JA, Blache Y, Stokes KR, Briand JF (2013) Marine biofilms on artificial surfaces: structure and dynamics. Environ Microbiol 15:2879-2893
    Scanlan PD, Buckling A (2012) Co-evolution with lytic phage selects for the mucoid phenotype of Pseudomonas fluorescens SBW25. ISME J 6:1148-1158 doi: 10.1038/ismej.2011.174
    Sullivan MB, Waterbury JB, Chisholm SW (2003) Cyanophages infecting the oceanic cyanobacterium Prochlorococcus. Nature 424:1047-1051 doi: 10.1038/nature01929
    Sullivan MB, Coleman ML, Weigele P, Rohwer F, Chisholm SW (2005) Three Prochlorococcus cyanophage genomes: signature features and ecological interpretations. PLoS Biol 3:790-806
    Sunagawa S, Coelho LP, Chaffron S, Kultima JR, Labadie K, Salazar G, Djahanschiri B, Zeller G, Mende DR, Alberti A, Cornejocastillo FM, Costea PI, Cruaud C, Dovidio F, Engelen S, Ferrera I, Gasol JM, Guidi L, Hildebrand F, Kokoszka F et al (2015) Structure and function of the global ocean microbiome. Science 348:1261359 doi: 10.1126/science.1261359
    Suttle CA (2005) Viruses in the sea. Nature 437:356-361 doi: 10.1038/nature04160
    Tamura K, Stecher G, Peterson D, Filipski A, Kumar S (2013) MEGA6: molecular evolutionary genetics analysis version 6.0. Mol Biol Evol 30:2725-2729 doi: 10.1093/molbev/mst197
    Tan D, Svenningsen SL, Middelboe M (2015) Quorum sensing determines the choice of antiphage defense strategy in Vibrio anguillarum. MBio 6:00627
    Tatusov RL, Galperin MY, Natale DA, Koonin EV (2000) The COG database: a tool for genome-scale analysis of protein functions and evolution. Nucleic Acids Res 28:33-36 doi: 10.1093/nar/28.1.33
    Thompson LR, Zeng Q, Kelly L, Huang KH, Singer AU, Stubbe J, Chisholm SW (2011) Phage auxiliary metabolic genes and the redirection of cyanobacterial host carbon metabolism. Proc Natl Acad Sci USA 108:757-764 doi: 10.1073/pnas.1012199108
    Thoulouze MI, Alcover A (2011) Can viruses form biofilms? Trends Microbiol 19:257-262 doi: 10.1016/j.tim.2011.03.002
    Thurber RV, Payet JP, Thurber AR, Correa AM (2017) Virus-host interactions and their roles in coral reef health and disease. Nat Rev Microbiol 15:205-216 doi: 10.1038/nrmicro.2016.176
    Vidakovic L, Singh PK, Hartmann R, Nadell CD, Drescher K (2018) Dynamic biofilm architecture confers individual and collective mechanisms of viral protection. Nat Microbiol 3:26-31 doi: 10.1038/s41564-017-0050-1
    Wu YW, Simmons BA, Singer SW (2016) MaxBin 2.0: an automated binning algorithm to recover genomes from multiple metagenomic datasets. Bioinformatics 32:605-607 doi: 10.1093/bioinformatics/btv638
    Xu Y, Zhang R, Wang N, Cai L, Tong Y, Sun Q, Chen F, Jiao N (2018) Novel phage-host interactions and evolution as revealed by a cyanomyovirus isolated from an estuarine environment. Environ Microbiol 20:2974-2989 doi: 10.1111/1462-2920.14326
    Yoon SH, Ha SM, Lim J, Kwon S, Chun J (2017) A large-scale evaluation of algorithms to calculate average nucleotide identity. Antonie Van Leeuwenhoek 110:1281-1286 doi: 10.1007/s10482-017-0844-4
    Zhang R, Wei W, Cai L (2014) The fate and biogeochemical cycling of viral elements. Nat Rev Microbiol 12:850-851 doi: 10.1038/nrmicro3384
    Zhang W, Wang Y, Bougouffa S, Tian R, Cao H, Li Y, Cai L, Wong YH, Zhang G, Zhou G, Zhang X, Bajic VB, Al-Suwailem A, Qian PY (2015) Synchronized dynamics of bacterial niche-specific functions during biofilm development in a cold seep brine pool. Environ Microbiol 17:4089-4104 doi: 10.1111/1462-2920.12978
    Zhang W, Ding W, Li YX, Tam C, Bougouffa S, Wang R, Pei B, Chiang H, Leung P, Lu Y, Sun J, Fu H, Bajic VB, Liu H, Webster NS, Qian PY (2019b) Marine biofilms constitute a bank of hidden microbial diversity and functional potential. Nat Commun 10:517 doi: 10.1038/s41467-019-08463-z
    Zhang Z, Chen F, Chu X, Zhang H, Luo H, Qin F, Zhai Z, Yang M, Sun J, Zhao Y (2019a) Diverse, abundant, and novel viruses infecting the marine Roseobacter RCA lineage. mSystems 4:e00494-e519
    Zhao Y, Temperton B, Thrash JC, Schwalbach MS, Vergin KL, Landry ZC, Ellisman M, Deerinck T, Sullivan MB, Giovannoni SJ (2013) Abundant SAR11 viruses in the ocean. Nature 494:357-360 doi: 10.1038/nature11921

Expanding our understanding of marine viral diversity through metagenomic analyses of biofilms

    Corresponding author: Pei-Yuan Qian, boqianpy@ust.hk
    Corresponding author: Weipeng Zhang, zhangweipeng1987@126.com
  • 1. College of Marine Life Sciences, Ocean University of China, Qingdao 266100, China
  • 2. Institute for Advanced Ocean Study, Ocean University of China, Qingdao 266100, China
  • 3. Institute for Advanced Ocean Study, Ocean University of China, Qingdao 266100, China
  • 4. Fok Ying Tung Research Institute, Hong Kong University of Science and Technology, Guangzhou 510000, China
  • 5. Department of Mathematics, Hong Kong University of Science and Technology, Hong Kong, China
  • 6. State Key Laboratory of Marine Environmental Science, Xiamen University, Xiamen 361005, China



  • Viruses make a significant contribution to nutrient and energy conversion processes in marine ecosystems via the modulation of the structure and functions of protistan, bacterial, and archaeal communities (Breitbart 2012; Kristensen et al. 2011; Suttle 2005; Zhang et al. 2014). The viral shunt releases around 10 billion tons of carbon per day and is probably a fundamental step in marine carbon cycling (Breitbart 2012). Viruses that infect bacterial hosts, known as phages, express auxiliary metabolic genes (AMGs) that influence the central metabolic processes of their hosts, such as photosynthesis and nutrient acquisition (Thompson et al. 2011; Xu et al. 2018).

    However, because of their large and highly dynamic populations, the vast majority of marine viruses remain unexplored. Considerable effort has gone into isolating viruses that infect major marine bacterial lineages, such as Prochlorococcus (Sullivan et al. 2003), SAR11 (Zhao et al. 2013), and Roseobacter (Zhang et al. 2019a). Recent advances in culture-free approaches (e.g., metagenomics) have greatly expanded the analysis of the diversity of marine microbes (bacteria and archaea) and viruses (reviewed in Coutinho et al. 2018). The global ocean dsDNA viromic dataset was established by the Tara Oceans project with the goal of exploring ocean virus diversity, both to better understand the ecological and evolutionary drivers behind these viral communities and to reveal new mechanisms by which these viruses affect global oceanic microbial processes (Brum et al. 2015). In addition to the Tara Oceans project, several other projects have revealed viruses to be the most abundant biological entities in marine ecosystems, e.g., in coral reefs (Thurber et al. 2017) and in marine sediments (Danovaro et al. 2008; Engelhardt et al. 2014).

    Most oceanic surveys have focused on the viruses infecting free-living bacterioplankton, while those associated with biofilms have been neglected. Biofilm formation confers several ecological advantages on bacteria and archaea, such as environmental protection, increased access to nutrients, and enhanced interspecies interactions (Dang and Lovell 2016). Biofilm architecture also confers both individual and collective protection against viruses (Vidakovic et al. 2018). Biofilms grown on artificial surfaces have been used as models to study biofilm developmental processes, microbe-invertebrate interactions, and novel microbial diversity and functions in marine environments (Chung et al. 2010; Salta et al. 2013; Zhang et al. 2015). In a recent study, Zhang et al. (2019b) examined 101 biofilm samples formed on man-made panels and natural rocks immersed at eight locations across the Atlantic, Indian, and Pacific Oceans and investigated the microbial (bacterial and archaeal) diversity and functional potential within these microbiomes. In the present study, we analyzed the viral sequences extracted from the same 101 biofilm metagenomes with the aim of obtaining a systematic understanding of viral diversity and function.


    Metagenomic identification of viruses in marine biofilms

  • The locations where the 101 biofilm samples were collected are shown in Fig. 1. The biofilms were developed on eight types of artificial substrates (polystyrene Petri dishes, zinc panels, aluminum, poly(ether-ether-ketone), polytetrafluoroethylene, poly(vinyl chloride), stainless steel, and titanium). The substrates were deployed at a depth of 1-2 m at eight locations around the world (the South Atlantic, the Red Sea, the waters off Hong Kong, Yung Shue O Bay, the East China Sea, and three sites in the South China Sea). Metagenome assembly generated 72,132,494 contigs in total, from which 3974 viral sequences (longer than 5.0 kbp) were predicted. The viral sequences had a maximum length of 351.55 kbp, an average length of 14.57 kbp, and an average guanine-cytosine (GC) content of 47.75%. The novelty of the viruses was evaluated by comparing the viral sequences with the Integrated Microbial Genome/Virus (IMG/VR) database, which contains sequences from almost 8500 isolated viruses and over 700,000 viral contigs from metagenomes. Consistent with the rules used in a previous study (Paez-Espino et al. 2017), biofilm-derived viral sequences that aligned over more than 1 kbp with 90% or higher similarity to sequences in the IMG/VR database were considered to be known viruses; only 358 (9.01%) biofilm-derived viral sequences were found in the IMG/VR database (Fig. 2).

    Figure 1.  Sampling locations of the 101 biofilms. The eight locations include (1) South Atlantic, (2) Red Sea, (3) Hong Kong Water, (4) Yung Shue O Bay, (5) East China Sea, (6) South China Sea 1, (7) South China Sea 2, and (8) South China Sea 3. Tara surface seawater samples used for comparison are also shown

    Figure 2.  Similarity between viral sequences identified from marine biofilms and those documented in the IMG/VR database. The biofilm-derived viral sequences were BLASTn searched against the IMG/VR database. The BLASTn hits with over 90% similarity for more than 1000 bp alignments were considered to be known viruses (dots in red), while the other hits (dots in blue) and those with no significant similarity were considered to be novel viruses
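    The known-versus-novel rule described in the Fig. 2 caption can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes BLASTn hits in a tab-separated layout whose first four columns are query, subject, percent identity, and alignment length, and the function name `classify_viral_sequences`, the contig names, and the sample rows are all hypothetical.

```python
from typing import Dict, Iterable

MIN_IDENTITY = 90.0   # percent identity threshold taken from the text
MIN_ALIGN_LEN = 1000  # alignment must be longer than 1000 bp (from the text)


def classify_viral_sequences(query_ids: Iterable[str],
                             blast_rows: Iterable[str]) -> Dict[str, str]:
    """Label a query 'known' if any hit passes both thresholds, else 'novel'."""
    labels = {qid: "novel" for qid in query_ids}
    for row in blast_rows:
        fields = row.rstrip("\n").split("\t")
        qid, pident, length = fields[0], float(fields[2]), int(fields[3])
        if qid in labels and pident >= MIN_IDENTITY and length > MIN_ALIGN_LEN:
            labels[qid] = "known"
    return labels


# Hypothetical hits; only the first four columns are used here.
rows = [
    "contig_1\tIMGVR_a\t95.2\t2500",  # passes both thresholds
    "contig_2\tIMGVR_b\t97.0\t600",   # alignment too short
    "contig_3\tIMGVR_c\t82.1\t4000",  # identity too low
]
labels = classify_viral_sequences(
    ["contig_1", "contig_2", "contig_3", "contig_4"], rows
)
known_fraction = sum(v == "known" for v in labels.values()) / len(labels)
```

    Queries with no hit at all (here `contig_4`) stay labeled as novel, which mirrors the caption's treatment of sequences with no significant similarity.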

    A total of 74,895 open reading frames (ORFs) were predicted from the biofilm-derived viral sequences. To confirm the VirSorter prediction, HMMER was used to search the ORFs against the virus orthologous groups (VOGs) database. All the viral sequences had ORFs that achieved hits in the VOG database. In total, 31,038 VOG hits (41.44% of the total ORFs) were obtained, of which 2764 were non-redundant. The 30 most abundant VOGs consisted of genes encoding viral structural proteins, such as terminase large subunit (VOG09355), base plate protein J (VOG00195), terminase large subunit gp2 (VOG00080), and probable capsid protein gp17 (VOG02249) (Supplementary Fig. S1). The other abundant VOGs included genes responsible for DNA replication and transcription, such as DNA polymerase (VOG00073) (Supplementary Fig. S1).

    Taxonomic classification indicated that 81.60% of the VOGs were associated with Caudovirales, 0.32% were associated with Maveriviricetes, with the remaining VOGs considered to be unclassified viruses (Supplementary Fig. S2). Phylogenetic analysis using the terminase large subunit VOG9355, identified from the biofilm viral sequences and sequences from the VOG database, revealed three relatively independent branches formed by the biofilm viruses, most likely representing novel viral lineages (Fig. 3).

    Figure 3.  Phylogenetic tree of the terminase large subunit gene VOG9355 identified from the biofilm phage sequences. Closely related terminase large subunit gene sequences documented in the VOG database were revealed by hmmscan and then used as a reference. The protein sequences that could be aligned by ClustalW were used to construct a maximum likelihood tree with 1000 replicates. Bootstrap values (> 50) are shown on the branches. All the gene sequences from biofilms are shown in blue, and branches that represent potentially novel viral lineages are shown in red

  • Endemism of the viruses to marine biofilms

  • To explore the niche specificity of the viruses detected in the biofilms, the abundance of the biofilm-derived viruses was investigated by mapping the metagenomic reads of the 101 biofilm and 91 seawater samples (10 million reads per sample) to the viral sequences. In total, 250 viral sequences with coverage > 1 in at least one biofilm and coverage = 0 in all seawater samples were identified (Fig. 4), suggesting the existence of viruses that are endemic to the biofilm niche. To confirm this result, five phages that were abundant in the Red Sea biofilms were selected, and their distribution in nine Red Sea biofilm samples and nine adjacent seawater samples was investigated. The number of reads mapped to these phages exceeded 100 in almost all of the biofilm samples but was close to zero in the seawater metagenomes (Supplementary Fig. S3).

    Figure 4.  The endemism of biofilm-derived viruses. Metagenomic reads of 101 biofilm and 91 seawater samples were mapped to the biofilm-derived viral contigs to compare their abundance in biofilms and seawater. Metagenomic reads (101 bp and ten million reads per sample) were mapped to gene sequences. Viral sequences with coverage > 1 in biofilms and coverage = 0 in seawater samples are presented
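    The endemism criterion stated above (coverage > 1 in at least one biofilm sample and coverage = 0 in every seawater sample) can be written as a short filter. The sketch below is illustrative only: the function name `endemic_to_biofilm`, the sequence identifiers, and the coverage values are hypothetical, and how coverage itself is computed from mapped reads is not modeled here.

```python
from typing import Dict, List


def endemic_to_biofilm(biofilm_cov: Dict[str, List[float]],
                       seawater_cov: Dict[str, List[float]]) -> List[str]:
    """Return IDs with coverage > 1 in at least one biofilm sample
    and coverage exactly 0 in every seawater sample."""
    endemic = []
    for seq_id, b_cov in biofilm_cov.items():
        # Assumption: a sequence missing from the seawater table has no coverage.
        s_cov = seawater_cov.get(seq_id, [])
        if any(c > 1 for c in b_cov) and all(c == 0 for c in s_cov):
            endemic.append(seq_id)
    return endemic


# Hypothetical per-sample coverage values for three viral sequences.
biofilm = {
    "virus_A": [0.0, 3.2, 1.5],   # abundant in biofilms
    "virus_B": [2.1, 0.0, 0.0],   # abundant in one biofilm
    "virus_C": [0.4, 0.9, 0.0],   # never exceeds coverage 1
}
seawater = {
    "virus_A": [0.0, 0.0],
    "virus_B": [0.0, 0.7],        # detected in one seawater sample
    "virus_C": [0.0, 0.0],
}
endemic_ids = endemic_to_biofilm(biofilm, seawater)
```

    Only `virus_A` satisfies both conditions in this toy input; `virus_B` is excluded by its seawater signal and `virus_C` by its low biofilm coverage.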

  • Viruses in single genomes and their functions

  • To investigate the hosts of the viruses and the potential virus-host interactions, 479 microbial genome bins extracted from the biofilm metagenomes were analyzed. These genome bins belonged to 20 different microbial phyla, including Proteobacteria (272 genomes), six 'Candidatus' phyla (7 genomes), Acidobacteria (6 genomes), Actinobacteria (10 genomes), Bacteroidetes (100 genomes), Cyanobacteria (34 genomes), Deinococcus-Thermus (1 genome), Firmicutes (2 genomes), Lentisphaerae (2 genomes), Parcubacteria (1 genome), Planctomycetes (22 genomes), Rhodothermaeota (1 genome), Verrucomicrobia (19 genomes), and Euryarchaeota (2 genomes) (Supplementary Fig. S4). Viral sequences from the genome bins were identified using the software PHASEER (McCoy et al. 2007), which was designed for mining phage sequences from draft genomes. In total, 149 phage sequences were distributed in 101 bacterial genome bins of Alphaproteobacteria, Gammaproteobacteria, Acidobacteria, Actinobacteria, Bacteroidetes, Candidatus Gracilibacteria, Cyanobacteria, Firmicutes, Lentisphaerae, Oligoflexia, Planctomycetes, and Rhodothermaeota (Fig. 5a). Among these taxa, Gammaproteobacteria (n = 43), followed by Alphaproteobacteria (n = 30), possessed the largest number of phage-containing genome bins (Fig. 5a). The GC content of the phage contigs was compared with that of the bacterial genomes and found to be very similar (Supplementary Fig. S5).

    Figure 5.  Identification of viral sequences from microbial genome bins and viral gene functional annotation. a Viral sequences were identified from 100 microbial genomes distributed across 11 bacterial phyla (Proteobacteria were divided into Alpha- and Gamma-proteobacteria). b The number of viral genes annotated by BLASTp searching against the COG database for functional classification
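    The GC-content comparison between phage contigs and host genome bins mentioned above rests on a simple calculation, sketched here in Python. The function name `gc_content` and the two toy sequences are hypothetical; real contigs would be read from FASTA files, and ambiguous bases are simply ignored in this sketch.

```python
def gc_content(sequence: str) -> float:
    """Percentage of G and C among the unambiguous bases (A, C, G, T)."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    if total == 0:
        raise ValueError("sequence contains no A, C, G, or T bases")
    return 100.0 * gc / total


# Hypothetical toy sequences standing in for a phage contig and its host bin.
phage_contig = "ATGCGCGTTA"
host_genome = "GGCCATATGCNN"  # N bases are skipped

phage_gc = gc_content(phage_contig)
host_gc = gc_content(host_genome)
gc_difference = abs(phage_gc - host_gc)
```

    A small `gc_difference` for a phage-host pair is the kind of similarity the text reports; what counts as "very similar" is not specified in the source.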

    To detect potential functions encoded by these phages, all genes derived from the phage sequences (4121 predicted ORFs) were classified by function using the COG database (Galperin et al. 2015; Tatusov et al. 2000), which resulted in 22 COG categories (Fig. 5b). In total, 1023 ORFs (24.82%) had hits in the COG database; however, 521 of these ORFs were classified as "general function prediction only" [R] or as "function unknown" [S]. Of the remaining 502 ORFs, 40 were classified as being involved in amino acid transport and metabolism [E], nucleotide transport and metabolism [F], or carbohydrate transport and metabolism [G], such as the genes encoding Na+/glutamate symporter [COG0786], deoxynucleotide kinases [COG1428], and chitinase [COG3325] (Fig. 5b).

    The functions of all genes derived from these phage contigs (4121 predicted ORFs) were further analyzed by searching them against the KEGG (Kanehisa et al. 2017) and CAZy databases (Lombard et al. 2014). In total, 1062 ORFs matched annotated sequences in the KEGG database, and the 18 most abundant KEGG matches are shown in Fig. 6. Interestingly, the most abundant KEGG hit was for trimeric autotransporter adhesin (K21449). Genes for viral structure (e.g., K06909), transcriptional regulation (e.g., ParB family transcriptional regulator, chromosome partitioning protein K03497), and DNA replication (e.g., putative DNA primase/helicase K06919) were also annotated. KEGG annotation also revealed uncharacterized but relatively conserved genes (n = 96; 9.0% of all KEGG hits), such as K06903, K06907, and K06904. In parallel, 351 ORFs were annotated by CAZy; these mostly included genes for lysozymes, chitinases, lyases, and peptidoglycan lytic transglycosylases (Supplementary Fig. S6). In total, 1133 ORFs were annotated by the CAZy or KEGG database, while the remaining 72.51% achieved no hits.

    Figure 6.  Potential auxiliary metabolic genes among the phage genes extracted from the bacterial bins. The gene functions were predicted by BLASTp searching against the KEGG database. The top 18 KEGG matches (gene number > 10) are shown
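    The percentages reported in the Results follow directly from the stated counts, and a short Python check makes the arithmetic explicit. The helper name `percent` and the variable names are hypothetical; every number below is taken from the text, and no new data are introduced.

```python
def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100.0 * part / whole, 2)


# Counts as reported in the text.
viral_sequences = 3974     # biofilm-derived viral sequences
known_in_imgvr = 358       # sequences matching the IMG/VR database
viral_orfs = 74895         # ORFs predicted from the viral sequences
vog_hits = 31038           # ORFs with hits in the VOG database
prophage_orfs = 4121       # ORFs from phage sequences in genome bins
cog_hits = 1023            # prophage ORFs with COG hits
kegg_or_cazy_hits = 1133   # prophage ORFs annotated by KEGG or CAZy

known_pct = percent(known_in_imgvr, viral_sequences)
vog_pct = percent(vog_hits, viral_orfs)
cog_pct = percent(cog_hits, prophage_orfs)
unannotated_pct = percent(prophage_orfs - kegg_or_cazy_hits, prophage_orfs)

print(known_pct, vog_pct, cog_pct, unannotated_pct)
```

    The four values reproduce the 9.01%, 41.44%, 24.82%, and 72.51% figures quoted in the text.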

  • The finding that marine biofilms harbor many previously unknown viruses is consistent with the notion that biofilm formation promotes virus accumulation and that biofilms may be a potential reservoir of infectious pathogens (Bettarel et al. 2006). When the biofilm-derived viral sequences were aligned with the VOG database, the most abundant genes were found to be related to structure and replication. More specifically, the base plate is a part of tailed prokaryotic viruses, such as Caudovirales, and its detection suggests the prevalence of tailed viruses in marine biofilms. The terminase large subunit is a viral DNA-packaging motor, which cleaves viral DNA into smaller pieces and inserts them into a procapsid, powered by ATP hydrolysis (Rao and Feiss 2008). Capsid proteins, encoded by relatively short genes, function to protect nucleic acids, and the tertiary structure of capsid proteins contains all the information required for virus assembly (Hagan and Zandi 2016). The annotation of these VOGs validates the conserved structure and function of biofilm-derived viruses; however, phylogenetic analysis of these proteins also indicates the existence of novel viral lineages in marine biofilms.

    Few studies have reported on virus endemism in environmental niches. This study shows that a substantial set of viruses is endemic to biofilms: 250 viral sequences were present in biofilms collected from different oceans but were absent from all the seawater samples. While surface-associated microbes and viruses must be seeded from seawater, many viruses are very scarce in seawater and so are unlikely to be sampled. Extracellular DNA released through cell lysis mediated by phages has been shown to enhance biofilm formation (Gödeke et al. 2011). Certain viruses are capable of forming biofilm-like assemblies for propagation (Thoulouze and Alcover 2011). In addition, phages can select for a mucoid bacterial phenotype to co-evolve and induce biofilm formation (Scanlan and Buckling 2012). One of the underlying mechanisms coordinating this relationship between viruses and biofilms involves quorum-sensing signals, which upregulate the expression of CRISPR-related genes (Høyland-Kroghsbo et al. 2017; Patterson et al. 2016) and decrease the level of phage receptors (Høyland-Kroghsbo et al. 2013; Tan et al. 2015). Another reason why so many novel viruses were discovered in biofilms is the seawater filtering process, which can concentrate low-abundance viruses that are missed during seawater sampling.

    According to previous metagenomic analyses of marine viruses (Coutinho et al. 2017; Mizuno et al. 2013), Cyanobacteria, Actinobacteria, Alphaproteobacteria, Gammaproteobacteria, and Verrucomicrobia are the most prevalent phage hosts. The results presented here are consistent with those reports, with Alpha- and Gamma-proteobacteria being the major hosts of phages in the biofilms. The proportion of guanine and cytosine (GC content) in DNA provides survival advantages in the adaptation to environmental conditions (Almpanis et al. 2018; Mann and Chen 2010). The results presented here show a similar GC content between the phages and their hosts, suggesting that the viruses have adapted to their hosts and that certain environmental factors have had roles in shaping the intimate relationships between the phages and the bacteria in the biofilms (Motlagh et al. 2017). Viral sequences identified from microbial genomes are probably phages; however, due to technical limitations, it is difficult to extract all the genomes from metagenomes and to distinguish all the phages from free viruses.

    With regard to phage function, more than 70% of the ORFs could not be annotated by the COG, KEGG, or CAZy databases, indicating the limited understanding of the function of biofilm-derived viruses and the need for additional experimental research. COG annotation suggested that the phages inhabiting biofilms may encode enzymes involved in central carbon metabolism. No phage genes for photosynthesis were detected, suggesting that the phages contribute little to carbon fixation in the biofilm communities, which is in contrast to previous findings that photosynthetic genes are prevalent in phages infecting subtidal microbial communities (McMinn et al. 2020; Sullivan et al. 2005; Thompson et al. 2011). Notably, 89 genes were found to code for trimeric autotransporter adhesin (K21449), which promotes biofilm formation in bacteria (Fey et al. 2002; Luqman et al. 2018; Raghunathan et al. 2011). Mutation of this gene abolished the ability of biofilms to attach to plastic surfaces (Lazar Adler et al. 2013), and over-expression of this gene in Salmonella enterica increased cell aggregation and adhesion to human intestinal Caco-2 epithelial cells (Raghunathan et al. 2011). Similarly, a recent study showed that SadA-expressing Staphylococci from the human gut exhibited increased cell adherence and internalization (Luqman et al. 2018). The high abundance of K21449 suggests a role for phages in facilitating biofilm formation by the bacterial hosts and thus provides clues to the specificity of the virosphere in marine biofilms. Transcriptional regulators may also have significant mediating effects on the interactions between humans and Epstein-Barr virus (Arvey et al. 2012); however, the function of transcriptional regulators in marine viruses is unclear. Furthermore, the polysaccharide metabolism genes (e.g., chitinases) annotated by CAZy are probably used by phages to lyse hosts and are involved in carbon recycling within the biofilm communities.

  • Here we found that over 90% of the biofilm-derived viruses had no overlap with the IMG/VR database and provided evidence for the existence of viruses endemic to biofilms, suggesting that biofilm formation enables the discovery and reconstruction of viral genomes from marine environments. We identified potential auxiliary metabolic genes for trimeric autotransporter adhesin and polysaccharide metabolism in viral sequences integrated into the biofilm-derived microbial genomes, suggesting that phages may contribute to biofilm formation by the bacterial hosts, yet the functions of more than 70% of the phage genes remain unknown. Taken together, the present study has unveiled a hidden marine virosphere with novel viral diversity and unexplored functions.

Materials and methods


  • The biofilms were developed on eight types of artificial substrates: polystyrene petri dishes (9 × 1.2 cm), zinc panels (11 × 11 cm), aluminum, poly(ether-ether-ketone), polytetrafluoroethylene, poly(vinyl chloride), stainless steel, and titanium (5 × 5 cm). The artificial substrates were deployed at a depth of 1-2 m at eight locations around the world: the South Atlantic, the Red Sea, the waters off Hong Kong, Yung Shue O Bay, the East China Sea, and three sites in the South China Sea. The petri dishes were immersed in seawater for 12 days to allow for biofilm formation; the other artificial substrates were immersed for 30 days to allow for visible bacterial attachment. Biofilms that had formed on natural rocks were also collected. After collection, the biofilms were immediately transferred to the laboratory, and the surface bacterial cells were removed using sterile cotton tips and stored in 5 ml of DNA storage buffer (500 mmol/L NaCl, 50 mmol/L Tris-HCl, 40 mmol/L EDTA, and 50 mmol/L glucose) at −80 ℃. During biofilm development, adjacent seawater samples were collected and successively filtered through 0.1-μm polycarbonate membrane filters (Millipore, Massachusetts, USA). The filters were stored in 5 ml of DNA storage buffer at −80 ℃. In total, 101 biofilms and 24 seawater samples were collected. Additionally, 67 Tara seawater samples collected from the marine surface (Sunagawa et al. 2015) were used for comparisons between the biofilms and seawater (Supplementary Table S1).

  • DNA extraction and sequencing

  • Biofilms from the cotton tips and seawater samples on the filters were re-suspended in Tris-HCl buffer, pelleted by centrifugation at 4000 × g for 10 min, and then lysed with lysozyme (37 ℃ for 30 min) and the lysis buffer provided by the TIANamp Genomic DNA Kit (Tiangen Biotech, Beijing, China). DNA extraction was then performed using the TIANamp Genomic DNA Kit, following the manufacturer's protocol. DNA sequencing for the Red Sea samples was performed at the Beijing Genomics Institute (BGI, Beijing, China), and the other samples were sequenced at the Novogene Bioinformatics Institute (Novogene, Beijing, China). After the construction of 350-bp insert libraries, the DNA was sequenced on the HiSeq X Ten System at Novogene and the HiSeq 2500 System at BGI. Quality control was performed on a local server using the software NGS QC Toolkit (version 2.0) (Patel and Jain 2012) to remove low-quality reads (defined as reads with a quality score < 20 for > 30% of the read length) and unpaired high-quality reads. Information on metagenomic reads is given in Supplementary Table S1.
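    The read-filtering rule stated above (a read is low quality when more than 30% of its bases score below 20, and a high-quality read is dropped if its mate fails) can be sketched as follows. This is an illustrative reimplementation of the rule, not the NGS QC Toolkit code; the function names and the Phred+33 quality encoding are assumptions.

```python
def is_low_quality(quality_string, min_score=20, max_fraction=0.30, phred_offset=33):
    """True if more than max_fraction of bases have a Phred score below min_score.

    phred_offset=33 assumes Phred+33 (Sanger/Illumina 1.8+) encoding.
    """
    if not quality_string:
        return True
    low = sum(1 for ch in quality_string if ord(ch) - phred_offset < min_score)
    return low / len(quality_string) > max_fraction


def keep_pair(quality_1, quality_2):
    """Keep a read pair only if both mates pass, so no unpaired reads remain."""
    return not is_low_quality(quality_1) and not is_low_quality(quality_2)


# 'I' encodes Phred 40 and '#' encodes Phred 2 under Phred+33.
good = "IIIIIIIIII"
borderline = "###IIIIIII"  # exactly 30% low-quality bases: not above the cutoff
bad = "####IIIIII"         # 40% low-quality bases
```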

  • Metagenomic assembly and microbial genome binning

  • Following quality control, reads from the biofilm metagenomes were assembled into contigs using the software MEGAHIT (version 1.0.2) (Li et al. 2015) with k-mer values of 21-121, increasing in steps of 10. Coverage information was generated by mapping metagenomic reads to the contigs using Bowtie2 (fastq as input format, in sensitive-local mode). The contigs and the coverage information were used as input for MaxBin (version 2.0) (Wu et al. 2016) to assign the contigs to single genomes. The single genomes were further analyzed using MetaBAT for purification. The completeness and contamination of the genome bins were analyzed using CheckM (Parks et al. 2015). Duplicated genomes were removed based on the average nucleotide identity (ANI) information provided by the ANI calculator (Yoon et al. 2017), where genome pairs with ANI values exceeding 0.99 were taken as redundant genomes. Information on the assembled metagenomic contigs is given in Supplementary Table S2. Information on the genome bins is provided in Supplementary Table S3.
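    The removal of redundant genomes by ANI can be illustrated with a greedy pass over the bins. The text states only the threshold (ANI above 0.99), so the greedy strategy, the preference ordering of the input list, and the function name `dereplicate` are assumptions of this sketch.

```python
def dereplicate(genomes, ani, threshold=0.99):
    """Greedy dereplication of genome bins.

    genomes: list of genome IDs ordered by preference (best first),
             e.g. by completeness; this ordering is an assumption.
    ani:     dict mapping frozenset({id_a, id_b}) -> ANI as a fraction (0-1).
    A genome is dropped if its ANI to any already-kept genome exceeds threshold.
    """
    kept = []
    for g in genomes:
        redundant = any(ani.get(frozenset((g, k)), 0.0) > threshold for k in kept)
        if not redundant:
            kept.append(g)
    return kept


# Toy ANI values for illustration only.
toy_ani = {
    frozenset(("bin1", "bin2")): 0.995,
    frozenset(("bin1", "bin3")): 0.80,
    frozenset(("bin2", "bin3")): 0.81,
}
kept = dereplicate(["bin1", "bin2", "bin3"], toy_ani)
```

    Pairs missing from the table are treated as non-redundant, which keeps the sketch usable when only some ANI values have been computed.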

  • Viral sequence prediction and annotation

  • The software VirSorter (version 1.0.5) (Roux et al. 2015), installed on a local server, was used to identify viral sequences from the metagenomic contigs and genome bins. The database 'Refseqdb' and the mode 'BLASTp' were used for mining viral sequences, and only viruses in the categories of 'sure' or 'somewhat sure' were retained for the following analyses. Metagenomic reads of 101 biofilms and 91 seawater samples were mapped to the viral sequences using bbmap (version 2) (Bushnell 2014) to indicate viral coverage in biofilms and seawater (minimum alignment identity = 0.76). All the metagenomes for mapping were normalized to 10 million reads per metagenome, and all reads were trimmed to 101 bp in length by NGS QC Toolkit (version 2.0). The viral ORFs were predicted using Prodigal (version 2.0) (Hyatt et al. 2010) in meta mode (only closed ends were allowed). An hmmscan search with HMMER (Johnson et al. 2010) against the VOG database (https://vogdb.org) was performed to classify the ORFs using an e-value cutoff of 1e−7, and the taxonomic affiliation was then examined by MEGAN (Huson et al. 2016). The reference genes were selected from the VOG database with hmmscan, and a phylogenetic tree was constructed with ClustalW and 1000 bootstraps in MEGA 6 (Tamura et al. 2013). For potential function mining, annotation of the phage genes was performed by BLASTp (e-value 1e−7) searching against the COG (Galperin et al. 2015; Tatusov et al. 2000), KEGG (Kanehisa et al. 2017), and CAZy (Lombard et al. 2014) databases. The workflow of the present study is summarized in Supplementary Fig. S7.
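    The normalization step above (a fixed number of reads per metagenome, each trimmed to a fixed length) makes coverage comparable across samples of different sequencing depth. A minimal in-memory sketch follows; the text does not say how reads were chosen, so random subsampling, trimming from the 3' end, the fixed seed, and the function name `normalize_reads` are assumptions.

```python
import random


def normalize_reads(reads, n_reads=10_000_000, read_length=101, seed=0):
    """Randomly subsample up to n_reads reads and trim each to read_length bases.

    reads: list of (sequence, quality) tuples.
    Reads shorter than read_length are kept unchanged.
    """
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    if len(reads) > n_reads:
        reads = rng.sample(reads, n_reads)
    return [(seq[:read_length], qual[:read_length]) for seq, qual in reads]


# Toy input: five 150-bp reads, subsampled to three reads of 101 bp.
toy_reads = [("A" * 150, "I" * 150) for _ in range(5)]
normalized = normalize_reads(toy_reads, n_reads=3, read_length=101)
```

    For real metagenomes, a streaming approach such as reservoir sampling would avoid loading every read into memory.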

  • The authors are grateful for a grant from the National Key Research and Development Program of China (2018YFC0310600) and two grants from Ocean University of China (841912035 and 842041010) to W.Z. The authors are also grateful for a grant from China Ocean Mineral Resources Research and Development Association (DY135-B2-03) and a grant from the Hong Kong Branch of South Marine Science and Engineering Guangdong Laboratory (SMSEGL20SC01) to P.Y.Q.

Author contributions
  • WZ, P-YQ, and ZR designed the project; WD, RW, and ZL performed the analysis; WD, RW, and WZ wrote the manuscript.

Data availability
Compliance with ethical standards

    Conflict of interest

  • The authors declare that they have no conflict of interest.

  • Animal and human rights statement

  • This article does not contain any studies with human participants or animals performed by any of the authors.
