Common structuring principles of the Drosophila melanogaster microbiome on a continental scale and between host and substrate

YunWang, 1,2 Martin Kapun, 1,3,4 LenaWaidele, 1,2 Sven Kuenzel, Alan O. Bergland 1,6 and Fabian Staubach 1,2* The European Drosophila Population Genomics Consortium (DrosEU). Department of Evolutionary Biology and Animal Ecology, Biology I, University of Freiburg, Freiburg im Breisgau, Germany. Department of Evolutionary Biology and Environmental Studies, University of Zürich, Zürich, Switzerland. Department of Cell and Developmental Biology, Medical University of Vienna, Vienna, Austria. Max Planck Institute for Evolutionary Biology, Plön, Germany. Department of Biology, University of Virginia, Charlottesville, VA, USA.


Introduction
Environmental factors and stochastic processes play important roles in shaping microbiomes (Spor et al., 2011;van Opstal and Bordenstein, 2015). Recent data across a variety of hosts, including mammals and insects, suggest that environmental factors like host diet often have a dominant effect on microbiome composition (Chandler et al., 2011;Wu et al., 2011;Staubach et al., 2013;Wang et al., 2014;Waidele et al., 2017;Rothschild et al., 2018). In order to explain variation between microbiomes from the same species, stochastic, ecologically neutral processes have moved into focus. These processes comprise ecological drift, dispersal and colonization history. It appears plausible for diverse species (Sieber et al., 2019), including Drosophila  that neutral processes dominate microbiome assembly.
The ability of D. melanogaster to shape its associated microbiome might be limited (Blum et al., 2013;Wong et al., 2013). Instead, probabilistic processes contribute to gut colonization (Obadia et al., 2017) and community structure in natural populations can be explained to a large extent by neutral ecological mechanisms . As in mammals and other organisms, environmental factors, such as the time of collection Behrman et al., 2018) or diet (Chandler et al., 2011;Staubach et al., 2013;Erkosar et al., 2018;Wang and Staubach, 2018) have a strong effect on the Drosophila microbiome.
On the other hand, D. melanogaster microbiome structure is associated with host genotype (Unckless et al., 2015;Chaston et al., 2016;Behrman et al., 2018), indicating at least low levels of genotype-dependent microbiome structuring. While the midgut and the stool of flies share many microbial taxa, they nonetheless differ significantly as has been shown by 16S rRNA gene sequencing in Fink and colleagues, (2013), suggesting a selection process inside the fly takes place. Digestion of bacteria, persistence and proliferation in the fly gut could play a role in this process (Obadia et al., 2017;Inamine et al., 2018). Wong and colleagues. (2015) showed that flies affect their microbiome and that of their immediate environment. Evidence that D. melanogaster exerts control over its microbiome that goes beyond an effective immune response comes from a recent study, which found that D. melanogaster larvae foster beneficial Lactobacillus plantarum via excretions (Storelli et al., 2018). Pais and colleagues. (2018) showed that Acetobacter thailandicus can persist in the gut of D. melanogaster, can be dispersed by the host and provide a fitness benefit to the host, while closely related Acetobacter strains cannot. In lab-reared flies, dysregulation of antimicrobial effectors leads to highly specific changes in microbiome composition that can affect closely related taxa in opposing ways. The resulting microbiome can be detrimental for the host , suggesting fine-tuned host control.
Taken together, the D. melanogaster microbiome is strongly affected by the environment and by stochastic effects. At the same time, highly specific host genotypedependent interactions with the microbiome seem to be at work. This is consistent with newly developed models that combine environmental effects and host feedback on the metacommunity (Adair and Douglas, 2017) with dispersal and colonization dynamics (Miller et al., 2018).
What is currently missing to understand the role of environmental factors and the host in microbiome composition in D. melanogaster is more information on populations and environmental factors on different ecological scales and under natural conditions. These are conditions where the interaction between the Drosophila host and its microbiome has evolved.
If host genetic factors were important for the natural microbiome associated with D. melanogaster, we would expect co-structure of host genetic variation with the microbiome. Furthermore, we would expect that microbes that are affected by host genetic factors should differ in abundance between the host and its environment because the effects of host genetic variation on the microbiome should be smaller or absent outside the host. Finally, if the genetic variation present in natural populations evolved under selective pressure to preferentially associate with beneficial microbes those microbes enriched in the host should have a propensity to provide fitness benefits as opposed to detrimental bacteria.
In order to test these predictions, we profiled the microbiome in 50 samples across Europe. Because the Drosophila microbiome is dominated by bacteria Kapun et al., 2018), we used 16S rRNA gene sequencing ( Fig. 1 and Table S1). The sampling range covered different climates and allowed us to address the effect of environmental factors on the microbiome. We combined the 16S profiles with population-level allele frequency data for more than 20 000 neutral SNPs to test for the co-structure of the microbiome with host genetic variation (Fig. S1). From this data set, we identified bacterial taxa that correlated with the host population structure. These taxa were analysed in an independent experiment, where we compared fly-associated microbiomes to that of their substrate, to test for potential effects of host filtering at a small scale. Finally, we assessed the fitness effects from the recent literature of those taxa that were filtered by the host.

Results
We analysed a total of 5 217 762 16S rRNA reads after rigorous quality filtering (Table S2). A total of 2 672 402 Wolbachia sequences were removed. We grouped the sequences into 100% identity operational taxonomic units (OTUs) to resolve strain-level differences because the interaction with the fly host may differ for closely related bacteria Pais et al., 2018). Composition and diversity of the bacterial microbiome were comparable to those from previous studies (Fig. S2). Host genetic data are based on allele frequency estimates from 20 000 small intronic SNPs in 48 pooled population samples [ Fig. 1, Fig. S1, ].
The natural D. melanogaster bacterial microbiome is costructured with host genetic differentiation on a continental scale Bray-Curtis Dissimilarity between bacterial communities increases with geographic distance (r = 0.196, P = 0.0015, Mantel test, Fig. 1) indicating geographic structuring of microbial communities associated with D. melanogaster on a continent-wide scale. Jaccard and Morisita-Horn distances provided almost identical results (Fig. S3).
When accounting for geographic distance, host genetic distance still explained a significant proportion of the variance (r = 0.19, P = 0.02, Partial Mantel test, Fig. S3). Beta diversity was rather continuously distributed across the European continent ( Fig. S4A-C). Furthermore, we did not detect any effects of the English Channel nor the Alps on beta diversity.
Host genetic differentiation, temperature and foodsubstrate correlate with microbiome structure We modelled microbiome composition in a redundancy analysis (RDA) framework. We selected temperature, precipitation and substrate as plausible (Chandler et al., 2011;Staubach et al., 2013;Thompson et al., 2017;Wang and Staubach, 2018) candidate external environmental variables that could affect microbiome structure. Because microbial communities might reflect seasonal trends in temperature and precipitation, we also tested monthly means of climatic variables in our model. Finally, we included host genetic differentiation in our model.
The full model explained approximately half (46.3%) of the variance, demonstrating that the model contained factors relevant for microbiome composition. In order to select the most important explanatory variables from the full model, we employed the automated Ordistep forward model selection approach. This identified a model solely based on host genetic differentiation (PC1) as superior according to Akaike's Information Criterion (Table 1). However, (PC2) mean annual temperature [T(y)], and substrate were also significant predictors of microbiome composition. Host genetic variations (PC1 and PC2) were also significant predictors of microbiome composition when they were added to the model as the last variable (Table S3). When we included geographic structure (longitude and latitude) in our model, PC1 and PC2 were also picked as significant predictors for microbiome structure (Table S4).
By applying the same model selection approach at the bacterial family level, we also identified host genetic differentiation (PC1) and annual mean temperature [T(y)] as significant predictors of microbiome composition (Table S5).
The abundance of OTU2 (Commensalibacter), and Enterobacteriaceae co-vary with host population structure In order to identify bacteria underlying the correlation of microbiome composition with host population structure, we tested whether the relative abundance of OTUs correlated with host genetic variation. At the 100% identity OTU level, only the abundance of OTU2 (Commensalibacter, The forward selection approach starts with the null model, adding the best explanatory factors one by one until adding the next factor fails to significantly improve the model. The selected model is in bold. P-values are based on permutation tests. PC1 = Axis 1 of host genetic variation, PC2 = Axis 2 of host genetic variation, substrate = substrate the flies were collected from, T(y) = mean annual temperature, T(m) = mean monthly temperature. No evidence for pronounced dispersal limitation of bacteria that correlate with host population structure The bacterial groups that were structured according to host population structure in Europe (OTU2, Enterobacteriaceae, Acetobacteraceae and Leuconostocaceae) were also found along the East Coast and West Coast of the United States (Fig. S5). Furthermore, these bacterial groups were also previously found in association with wild-caught D. takahashii from Hawaii, D. sechellia collected from morinda fruit on the Seychelles, cactus feeding D. mojavensis and even in mushroom feeding Microdrosophila (Chandler et al., 2011). A representative 16S rRNA gene sequence of OTU2 matched sequences from these diverse locations and species perfectly (Chandler et al., 2011).
Microbes that show patterns of continental co-structure also show structuring between flies and their substrate in an independent experiment We hypothesized that if host factors played a role in the continental co-structure of host populations and the microbiome, we should see the effects of the host environment on the microbiome under natural conditions. To test this, we analysed the abundance of microbes that were co-structured with host populations in flies and their immediate substrate. Specifically, we hypothesized that Enterobacteriaceae would be depleted in the flies because flies might reduce contact with bacteria from this family as it contains many insect pathogens. Indeed, Enterobacteriaceae were more abundant in the substrate than in the flies (P = 0.026, paired Mann-Whitney test, one-sided, Fig. 2). Furthermore, we expected to find OTU2 (C. intestini) at higher abundance in the fly than in the substrate because this OTU is a common member of the D. melanogaster associated community and contributes to healthy gut homeostasis Chandler et al., 2011). Indeed, OTU2 was enriched in flies ( Fig. 2, P = 0.022, paired Mann-Whitney test, one-sided). Conversely, we expected that Acetobacteraceae, in general, would be enriched in flies over substrate because this family contains several members that benefit D. melanogaster (Shin et al., 2011;Pais et al., 2018). Our analysis confirmed this expectation (Fig. 2, P = 0.034, paired Mann-Whitney test, one-sided). However, when OTU2 was excluded from the latter analysis of Acetobacteraceae, Acetobacteraceae were not significantly enriched in flies anymore (P = 0.21 paired Mann-Whitney test, one-sided), indicating that OTU2 contributed to family level differences. We found no difference between flies and substrate for Leuconostocaceae (P = 0.27, paired Mann-Whitney test, two-sided).

Co-structure between host genetic variation and the microbiome
The European D. melanogaster microbiome variation correlates weakly with the genetic differentiation of the host. This is consistent with a model in which genome-wide differentiation of the host populations might also affect loci that interact with the microbiome. Given previous evidence for microbe-interacting loci that vary between natural populations (Corby-Harris and Promislow, 2008; Lazzaro et al., 2008;Behrman et al., 2018;Walters et al., 2020) this seems a reasonable model. Alternatively, the co-structure could be explained by environmental factors that affect both, the microbiome and host genetic variation. However, this explanation appears unlikely for two reasons: First, we accounted for the most plausible environmental factors that could affect microbiomes and the host at the same time in our model (food-substrate, temperature, precipitation). Second, we used SNPs from small introns that are considered least affected by natural selection (Parsch et al., 2010) to assess host demography.
It is similarly difficult to explain co-structure solely on the ground of geographic patterns and simple models of independent colonization history of host and microbes. Host genetic differences remained significant predictors of microbiome composition (beta diversity), when accounting for geography, suggesting that the natural D. melanogaster microbiome is not independent from its host. In principle, this dependence as such could also reflect a dependence of the microbiome on co-dispersal with the host. However, we found no support for dispersal limitation of the relevant microbial taxa. Instead, our data and previous studies suggest that these bacteria are cosmopolitan (Cox and Gilmore, 2007;Chandler et al., 2011). There was also no evidence for the effects of geographic barriers on beta diversity that would be expected if bacterial dispersal dynamics were the main driver of continental beta diversity.
Differences in bacterial abundance between fly and substrate are consistent with host filtering of microbes that are co-structured with the host on a continental scale Three of the four bacterial groups (Acetobacteraceae, Enterobacteriaceae and OTU2) that correlated with host genetic variation on a continental scale (Fig. 2) also differed in abundance between flies and their substrate (Fig. 2). Because D. melanogaster constantly exchanges microbes with the environment (Blum et al., 2013;Inamine et al., 2018) and because we also removed the vertically transmitted Wolbachia from our analysis, models that include host filtering (Adair and Douglas, 2017;Miller et al., 2018) appear more likely to explain differences between host and substrate than a model that assumes vertical transmission or relies on substrate-host colonization dynamics alone.
Fitness effects of microbes that are co-structured with the host populations and differ in abundance between host and substrate The patterns of enrichment and depletion of bacteria between host and substrate suggest fitness benefits from potential filtering in the fly environment. A reduction of Enterobacteriaceae in the fly environment (Fig. 2) is likely beneficial for the flies because this family comprises a range of important D. melanogaster pathogens (Basset et al., 2000;Vodovar et al., 2005;Lazzaro et al., 2006;Galac and Lazzaro, 2011). A reduction of Enterobacteriaceae in the fly gut is in line with results from Ryu and colleagues. (2008), who showed that Enterobacteriaceae, including the highly pathogenic Erwinia carotovora carotovora-15 do not persist in the fly gut.
In principle, it seems also possible that depletion of potential pathogens in the fly over the substrate could result from flies dying quickly and escaping our sampling when they get into contact with pathogens. However, this seems an unlikely explanation because pathogens can reach high titers in flies before the host dies. With an intact gut barrier, many pathogens can reach high titers through oral ingestion with no or limited pathogenic effects (e.g. Kuraishi et al., 2011). Furthermore, in a systemic infection, pathogens can reach high titers of millions of bacteria before the fly dies (Galac and Lazzaro, 2011;Duneau et al., 2017). Finally, flies that survive systemic infections often continue to carry chronic infections with pathogen loads that can be in the range of tens of thousands of live bacteria with little effect on their lifespan (Chambers et al., 2014(Chambers et al., , 2019Duneau et al., 2017).
In contrast, Acetobacteraceae were enriched in the host. This enrichment was mainly driven by OTU2 (C. intestini). OTU2 matches sequences from previous studies on fruit flies in the natural environment (BLAST results, Table S6) (Cox and Gilmore, 2007;Chandler et al., 2011;Wang and Staubach, 2018) and in the laboratory . In particular, it perfectly matches C. intestini strain A911 (Roh et al., 2008) that is favoured by the fly host and has a protective function.
Taken together our data are consistent with low but highly specific levels of filtering in the host environment likely providing an overall fitness benefit to the host. This is also supported by recent work showing in a multispecies framework that there are host determinants of microbiome variation between Drosophilids that increase fly fitness (Adair et al., 2020). Low levels of host filtering that benefit the host meet the expectations of the 'ecosystem on a leash model' (Foster et al., 2017), where the host-associated microbiome is allowed to vary, but the microbes directly affecting host fitness experience filtering. In this context, our observation that potential host filtering of fitness-relevant microbes varies across the European continent would suggest that there are yet unknown evolutionary trade-offs or that bacterial strain variation that was not captured by our 16S rRNA gene sequencing approach might play a role for host fitness.

Environmental factors and the D. melanogaster microbiome
While the effects of substrate on the fly microbiome are well described (Chandler et al., 2011;Staubach et al., 2013;Wang and Staubach, 2018), continental-scale effects of temperature on a host-associated microbiome has, to our knowledge, not been described before. Temperature affects environmental microbiomes on a global scale (Zhou et al., 2016;Thompson et al., 2017) and the effect in Drosophila might reflect the exposure of flies to different environmental microbiomes, supporting a model in which the environmental metacommunities differ and affect composition of the host-associated community (Adair and Douglas, 2017;Miller et al., 2018). The correlation with monthly temperature at the collection date only showed a non-significant trend (P = 0.065). Because our seasonal sampling was relatively limited (nine locations), more data are required to quantitatively address the question of whether seasonal temperature changes affect the microbiome.

Fly and substrate samples
European fly samples were collected as described in Kapun and colleagues. (2018). In short, 50 samples of D. melanogaster were collected from 31 locations across the European continent with a joint effort of European research groups (Fig. 1). Each sample contained a pool of 33-40 wild-caught males. Detailed sampling information can be found in Table S1 and the supplementary methods.
For the survey of the microbiome of flies and their substrate, pairs of pools of five flies and the corresponding substrate for a total of 24 samples were collected. The immediate substrate, on which the flies that we collected were sitting and feeding, was collected with a sterile scalpel and transferred to a sterile microcentrifuge tube. The survey spanned six different substrates (grapes, apples, cherries, plums, cactus fruit and lemons) from four locations (three from the United States and one from Germany). For the remaining samples, please see supplementary methods and Table S1.

PCR and sequencing
Barcoded bacterial broad-range primers, from Caporaso and colleagues. (2011) were used to amplify the V4 region of the bacterial 16S rRNA gene. The resulting amplicons were sequenced on an Illumina MiSeq sequencer reading 2 × 250 bp.
The host population genetic data stem from Kapun and colleagues. (2018). In short, 48 DNA pools, each representing one population sample and consisting of DNA from 33 to 40 flies, were sequenced to~50× coverage on an Illumina sequencer, reading 2 × 150 bp. SNPs were annotated for all samples and allele frequencies estimated from allele-specific coverage in the pools.

Data analysis
We analysed the 16S rRNA gene sequencing data using MOTHUR v1.40.0 (Schloss et al., 2009). A detailed analysis script can be found in the supplementary Script 1.
To identify factors that shape microbial communities, we applied RDA, following Borcard and colleagues., (2018) (supplementary Script 2). In order to reduce the effects of rare species on RDA, we focused the analysis on OTUs with more than 1000 reads across samples. Data of annual and monthly mean temperatures (BIO 1 and tmean) and precipitation (BIO 12 and prec) were downloaded from WorldClim (Fick and Hijmans, 2017) (supplementary Script 3). Host genetic differentiation was represented by the first two principal components of an allele frequency-based Principle Components Analysis performed by Kapun and colleagues. (2018). In short, the data represent allele frequencies from more than 20 000 SNPs in short intronic sequences that evolve putatively neutral and best represent population structure. In order to select the variables that were most important for microbiome structure, we applied forward model selection of additive linear models with the ordistep function from the vegan R package (Oksanen et al., 2019).
In order to test for potential spatial autocorrelation, we followed the protocol by Borcard and colleagues., (2018) using the dbmem function (supplementary Script 2). We found no evidence for significant autocorrelation in our data after removal of the continent-wide trend in species distributions that we analysed here. All algorithms were part of the vegan (Oksanen et al., 2019) and adespatial R packages (Dray et al., 2019). Geographic distances were computed with the gdist function from the Imap R package (Wallace, 2012) (supplementary Script 4).
For the correlation of host genetic differentiation with the relative abundance of individual OTUs and bacterial families, we calculated q-values with the p.adjust function in R to account for multiple testing. Following the recommendation by Efron (2007), only significant correlations (P < 0.05) with bacterial groups with q-values (False Discovery Rates) smaller than 0.2 were considered significant.