The diversity and biogeography of abundant and rare intertidal marine microeukaryotes explained by environment and dispersal limitation

Benthic microeukaryotes are key ecosystem drivers in marine sandy beaches, an important and dynamic environment; however, little is known about their diversity and biogeography on a large spatial scale. Here, we investigated the community composition and geographical distributions of benthic microeukaryotes using high-throughput sequencing of the 18S rRNA gene and quantified the contributions of environmental factors and spatial separation on the distribution patterns of both rare and abundant taxa. We collected 36 intertidal samples at 12 sandy beaches from four regions that spanned distances from 0.001 to 12,000 km. We found 12,890 operational taxonomic units (OTUs; 97% sequence identity level) including members of all eukaryotic super-groups and several phyla of uncertain position. Arthropoda and Diatomeae dominated the sequence reads in abundance, but Ciliophora and Discoba were the most diverse groups across all samples. About one-third of the OTUs could not be definitively classified at a similarity level of 80%, supporting the view that a large number of rare and minute marine species may have escaped previous characterization. We found generally similar geographical patterns for abundant and rare microeukaryotic sub-communities, and both showed a significant distance-decay similarity trend. Variation partitioning showed that both rare and abundant sub-communities exhibited a slightly stronger response to environmental factors than spatial (distance) factors. However, the abundant sub-community was strongly correlated with variations in spatial, environmental and sediment grain size factors (66% of variance explained), but the rare assemblage was not (16%). This suggests that different or more complex mechanisms generate and maintain diversity in the rare biosphere in this habitat.

Benthic microeukaryotes are key ecosystem drivers in marine sandy beaches, an important and dynamic environment; however, little is known about their diversity and biogeography on a large spatial scale. Here, we investigated the community composition and geographical distributions of benthic microeukaryotes using high-throughput sequencing of the 18S rRNA gene and quantified the contributions of environmental factors and spatial separation on the distribution patterns of both rare and abundant taxa. We collected 36 intertidal samples at 12 sandy beaches from four regions that spanned distances from 0.001 to 12,000 km. We found 12,890 operational taxonomic units (OTUs; 97% sequence identity level) including members of all eukaryotic super-groups and several phyla of uncertain position. Arthropoda and Diatomeae dominated the sequence reads in abundance, but Ciliophora and Discoba were the most diverse groups across all samples. About onethird of the OTUs could not be definitively classified at a similarity level of 80%, supporting the view that a large number of rare and minute marine species may have escaped previous characterization. We found generally similar geographical patterns for abundant and rare microeukaryotic sub-communities, and both showed a significant distance-decay similarity trend. Variation partitioning showed that both rare and abundant sub-communities exhibited a slightly stronger response to environmental factors than spatial (distance) factors. However, the abundant subcommunity was strongly correlated with variations in spatial, environmental and sediment grain size factors (66% of variance explained), but the rare assemblage was not (16%). This suggests that different or more complex mechanisms generate and maintain diversity in the rare biosphere in this habitat.

Introduction
Microbial eukaryotes are important components of marine ecosystems, with extremely diverse taxa in multiple trophic modes including autotrophy, mixotrophy, predation, parasitism, symbiosis, osmotrophy and saprotrophy (Zubkov and Tarran, 2008;Worden et al., 2015). Thus, understanding the diversity and biogeographical patterns of microeukaryotes is critical to understanding ecological processes and mechanisms that underlie and maintain ecosystem function at both regional and global scales (Hanson et al., 2012). However, our understanding of the composition of marine microbial communities and the ecological factors that determine community structure at a global scale is very limited (Nunes-Alves, 2015), especially in the rare microbial biosphere of benthic environments Lynch and Neufeld, 2015). Additionally, understanding microbial ecosystems is critical for predicting ecosystem function as global environments change (Margules and Pressey, 2000;Worm et al., 2006). Until very recently, both microbes and marine systems were hardly featured at all in many textbooks on biogeography, which are still mostly concerned with terrestrial and freshwater macroscopic organisms. Thus, the topic of the biodiversity and distribution patterns of microeukaryotes in intertidal sandy beaches is quite a new area in biogeography.
Advances in high-throughput sequencing (HTS) combined with metabarcoding approaches have improved our ability to assess the biodiversity of microeukaryotes and address ecologically relevant questions on an unprecedented scale (Bik et al., 2012). Recently, HTS techniques have been successfully used to examine the composition and dynamics of microbial communities in different ecosystems, including surface and deep marine waters (Edgcomb et al., 2011a;Pernice et al., 2015;Yu et al., 2015), marine sediments (Gong et al., 2015), lakes and reservoirs (Simon et al., 2015;Liu et al., 2017), soils (Bates et al., 2013) and estuaries (Lallias et al., 2015). Unexpectedly, high microeukaryotic diversity has been discovered in all of these studies. For example, de Vargas et al. (2015) estimated the diversity of small eukaryotes (less than about 2 mm in size) in the sunlit ocean to be 150,000 OTUs, much higher than previously thought (de Vargas et al., 2015).
Most microbial communities consist of a large number of species partitioned into a few abundant forms and many that are represented by only a few individuals (Sogin et al., 2006;Logares et al., 2014;Liu et al., 2015). The latter, referred to as the 'rare biosphere', are likely subject to different controlling factors than the more abundant species, and an important effort of a growing body of studies in recent years has been directed toward understanding the dynamics of the rare biosphere because rare taxa can have important ecological roles, serving as nearly limitless reservoirs of genetic and functional diversity (Pedr os-Ali o, 2012; Lynch and Neufeld, 2015). Because microbes are not dispersal-limited in the same way that large organisms are, the geographical distribution of rare microbial taxa is important for understanding overall microeukaryotic diversity (Telford et al., 2006;Logares et al., 2014). Although the existence of microbial biogeographic patterns is now well established in microbial ecology, there is limited understanding of the mechanisms governing the relative influences of environmental deterministic and stochastic processes in driving these patterns (Hanson et al., 2012;Liu et al., 2015). Recent studies showed that the spatial and temporal patterns of rare and abundant taxa in marine bacteria and archea are usually different in coastal ecosystems (Gobet et al., 2012;Hugoni et al., 2013). However, some studies found similar patterns for the abundant and rare assemblages in aquatic ecosystems (Gong et al., 2015;Liu et al., 2015). At this point, it is not known whether the microeukaryotic community composition of the rare biosphere is influenced more by spatial factors (ability to disperse over great distances to new habitats) or environmental factors (ability to grow under a range of environmental conditions). Intertidal sandy beaches provide an ideal system to address these questions at a large geographical scale in a ubiquitous, complex and dynamic habitat.
Sandy beaches comprise 75% of the world's unfrozen shorelines, providing key ecosystem services, such as water purification, storm buffering, nutrient cycling and nursery habitats for resource species, and have also become important for economic development and cultural use . This habitat is very dynamic in space and time, due to the interaction of sand, waves and tides (Short, 1996). A large number of microbes and small animals (meiofauna) have adapted to the changing environmental conditions of sandy beaches (McLachlan and Brown, 2006). In addition to these residents, some species descend onto the beach from inland on the falling tide, while a variety of species move up over the beach from the sub-littoral zone on the rising tide. The open ecosystem of the sandy beach is created by the interaction of these components in a trophic network, which exchanges materials with both sea and land (McLachlan and Brown, 2006). The environmental quality of sandy beaches is important to the health and safety of beach users and the economic prosperity of coastal communities . Over the past decades, however, the microbial species diversity of sandy beaches has only been explored at very low resolution levels (i.e., with ordinary microscopy). Thus, little is known about the diversity patterns and relative importance of spatial and environmental factors for the microeukaryotic community. In fact, the information and hypotheses about global patterns of species richness on sandy beaches have mainly focused on macro-and meio-fauna in recent decades (McArdle and McLachlan, 1992;Defeo and McLachlan, 2005;Harris et al., 2014;Nel et al., 2014), and very few studies have quantified the rare microbial species (Fonseca et al., 2014). New methods using HTS and bioinformatic tools allow us to quantify microeukaryotic diversity of this important and unique habitat in a way that encompasses both the abundant and rare members of the ecosystem at a meaningful sequencing depth.
Here, we present the first comprehensive study of largescale distribution of microeukaryotic benthic communities in sandy beaches along a latitudinal gradient ranging from 24 to 428N at a continental scale (North America and Asia). We determine and compare the patterns in alpha and beta diversities of both abundant and rare microeukaryotic benthic taxa, characterized by Illumina highthroughput sequencing (HTS) of the 18S rRNA gene, across 12 intertidal stations with three replicates at each station. In addition, we assess whether spatial and environmental factors (including sediment grain size) explained the geographical patterns of both abundant and rare microeukaryotic benthic sub-communities.

General patterns of species richness
Our study system consists of 12 sandy beaches across three different regions [Xiamen (XM), Zhoushan (ZS) and Qingdao (QD) cities] of eastern China and one in Connecticut (CT) of northeastern United States ( Fig. 1 and Table  S1 in the Supporting Information). The four regions sampled span large gradients in terms of climate and latitude, spanning a geographical distance up to 12,000 km. The benthic microeukaryotes (0.22-200 lm) in 36 sediment samples, based on HTS of 18S rDNA amplicons, yielded 1,232,604 clean reads and 12,890 OTUs at 97% similarity level after singleton OTUs were removed (  Fig. 2A), but the total number of OTUs in all samples (12,890) was roughly equivalent to the number estimated by abundance-based richness estimators such as Chao 1 (12,895 6 2) and ACE (abundance-based coverage estimator, 12,927 6 56) ( Table  S3 in the Supporting Information). Both the estimated species accumulation curves (Fig. 2C) and extrapolated species richness indices (Chao 1 and ACE) indicated that the majority of the microeukaryotic taxa had been well recovered from the 36 studied sites. The Shannon index, providing an estimate of alpha diversity in each sample, ranged from 3.27 to 6.73 with a mean of 5.50 6 0.13 (s.e.) ( Table S3 in the Supporting Information) and did not differ across four regions (P > 0.05), with 5.51 6 0.31 in CT, 5.61 6 0.13 in QD, 5.31 6 0.34 in ZS and 5.52 6 0.19 in XM. Other alpha diversity indices (including OTU number, ACE, Chao 1, Simpson and Pielou) also did not show a significant difference within and across regions (ANOVA, P > 0.05, Fig. S1 in the Supporting Information).

Geographical patterns
At higher taxonomic level, the microeukaryote communities showed a distinct biogeographical distribution pattern (Fig. 3A). This was true at the OTU level for the whole dataset as well as both the abundant and rare subcommunities (Fig. 4). The relative abundance of OTUs was significantly correlated with their local occupancy (Fig. S3 in the Supporting Information) and only 27.9% microeukaryotic OTUs (3598) were shared among the four regions ( Fig. S4 in the Supporting Information). Additionally, the four sampling regions were significantly separated at P 5 0.001 for the microeukaryotic communities, and the global ANOSIM R statistics among the four groups was 0.671, 0.626 and 0.614 for the entire community, the abundant sub-community and the rare sub-community, respectively, indicating significant differences over this large spatial range (Table 1). It is notable that the Connecticut and Qindao communities were more similar to each other than other regions (Table 1, R values lower than others; Fig. 4), likely due to their similar temperate latitude. Spearman correlograms comparing Bray-Curtis community similarity vs. geographic distances among samples indicated significant negative correlations for the entire community and both the abundant and rare subcommunities, with a correlation coefficient of 20.437 (P < 0.01), 20.444 (P < 0.01) and 20.346 (P < 0.01), respectively (Figs 5 and S5 in the Supporting Information). This means that the similarity in community composition between any two regions or sites decreased with increasing geographical distances. However, we did not find a significant latitudinal diversity gradient over the nearly 17 degree range of our samples (Fig. S1 in the Supporting Information; roughly from subtropical to temperate regions).

Relationships between spatial, environmental factors and microeukaryotic community
The longest gradient lengths of detrended correspondence analysis (DCA) on the abundant and rare sub-community data sets were 2.50 and 4.32, respectively, indicating that redundancy analysis (RDA) was suitable for the abundant sub-community, while canonical correspondence analysis (CCA) was more suitable for the rare sub-community. The RDA ordination showed that six environmental variables [temperature, salinity, pH, total nitrogen (TN), total phosphorus (TP) and phosphate phosphorus (PO 4 -P)], six spatial variables [PCNM1-PCNM6 (principal coordinates of neighbour matrices)] and four grain size variables (S1, S2, S3 and S5) were significantly related to changes in the abundant sub-community (P < 0.05; Fig. S6 in the Supporting Information). The CCA ordination revealed that the rare sub-communities have a slightly different pattern in that only five environmental variables (temperature, salinity, pH, TN and ORG), five spatial variables (PCNM1-PCNM5) and four grain size variables (S1, S2, S3 and S5) were significantly related to changes in the sub-community composition (P < 0.05; Fig. S6 in the Supporting Information).
Variation partitioning showed that distance, environmental factors and grain size together explained 54% of the entire community variation, including 66% for the abundant sub-community, but only 16% of variation for the rare subcommunity (Fig. 6). Among all of these communities, the relative contribution of environmental plus grain size factors was slightly larger than that of spatial (distance) factors. We further evaluated the variation of environmental and grain size factors and found that the variation of environmental factors was larger than that of grain size factors (Fig. 6). When the three kinds of variables were independently evaluated, the pure effects of spatial factors were larger than those of pure environmental or pure grain size factors (Fig. S7 in the Supporting Information). These results indicate that environmental factors were critical to benthic microeukaryotic communities, especially the abundant sub-communities. Because environmental factors were most likely also related to spatial (distance) factors (e.g., due to latitudinal, geological or land use differences in the different regions), they were also major drivers of the Heatmap showing the relative abundance of the 43 lineages within each sample. The OTU numbers (species richness) are given in parentheses for each lineage, while the sequence data were normalized by range-scaling each class (0-1) for each sample. The lineages were classified mainly based on Adl and colleagues (2012); each of the lineages that contained more than 1000 sequence reads within entire dataset was individually presented, while the rare lineages (fewer than 1000 sequence reads) and the undetermined lineages (sequence similarity >80%) merged as 'others'. Sequence reads are 'unclassified' when sequence similarity to a reference sequence is <80%. A single dot in front of the taxon indicates the first rank classification of the protists and multicellular groups, while double dots indicate the second rank classification. C. Bar graph showing the relative abundance of sequence reads in the 43 branching lineages within the entire dataset (metacommunity including 36 local communities). [Colour figure can be viewed at wileyonlinelibrary.com] spatial differentiation observed, although the contribution of each environmental driver differed somewhat between regions.
Despite lower correlation coefficients for rare subcommunities and environmental and distance factors, compared to the abundant sub-communities, Mantel and partial Mantel tests further confirmed that both abundant and rare sub-communities were primarily governed by environmental plus sand grain size variables, rather than distance factors (Table 2).

Diversity and distribution
In our samples from temperate and subtropical intertidal marine sandy sediments, we found members of all eukaryotic super-groups as well as some phyla of uncertain position (Fig. 3). Alveolata, Opisthokonta and Stramenopiles were the most diverse and abundant groups, as observed in surface sediments from other coastal ecosystems (Adl et al., 2012;Gong et al., 2015).
Although a 200-lm pore size sieve was used to remove large metazoans, we found that the most abundant single group was the Opisthokonta, which comprised 44.4% of all sequence reads. This group was dominated by metazoans, which accounted for 40.9% of all sequences. Some large-sized organisms (esp. elongated species) can get through the 200-lm mesh. Also, the presence of smaller life stages (eggs, spores or larvae) of large-sized organisms may have contributed to the assemblage (Liu et al., 2017). The top four metazoan phyla, in terms of both OTUs and sequences, were Arthropoda, Platyhelminthes, Nematoda and Gastrotricha as seen in previous metabarcoding analyses of marine meiofaunal biodiversity (Fonseca et al., 2010;Lallias et al., 2015). Additionally, a great diversity of unicellular Opisthokonts (e.g., choanomonads, ichthyosporeans and fungi; 8.2% of all OTUs) were recovered although their abundances were very low (3.5% of total sequences). For example, the marine fungi presented high diversity (683 OTUs, 5.3% of the total OTUs) but low abundance (2.6% of the total sequences) in all of the sampling sites, similar to other  (Edgcomb et al., 2011a) and coastal surface sediments (3.5%) (Gong et al., 2015). This contrasts with deep subsurface sediments, where fungi appear to dominate the microeukaryotic community (Edgcomb et al., 2011b). The high diversity of marine fungi suggests that they may be associated with the recycling of nutrients in sandy beaches, as recent reports showed that the marine fungi are correlated with total organic carbon, nitrate, sulfide and dissolved inorganic carbon, in anoxic marine pelagic environments and especially at the seafloor (Richards et al., 2012;Orsi et al., 2013;Worden et al., 2015). Non-metazoan and non-fungal opisthokonts were also diverse although a large number of them are considered to be rare taxa, which are poorly known in many natural environments (del Campo et al., 2015). An alternative explanation for the high diversity/low abundance pattern is that opisthokont microeukaryote cells or dispersal stages (e.g., fungal spores) are ubiquitous due to lack of dispersal limitation, but not always active.
The Stramenopiles were the second most dominant group, represented mainly by Diatomea. Diatoms are eukaryotic unicellular microalgae present in all types of aquatic ecosystems (Cermeno and Falkowski, 2009). Over one-third of the total 30 cosmopolitan OTUs in our data set were diatoms, suggesting high dispersal capability and/or ecological success of these species in diverse environments. Diatomea were also the dominant group observed in European coastal sediments . MASTs are a group of heterotrophic marine Stramenopiles that are known mostly from environmental DNA surveys of both oxic and anoxic systems. They appear to be bacterivores and comprise a widespread basal branch of the Stramenopiles  that may have significant role in marine plankton systems (Massana et al., 2014). Our study recovered 84 MAST OTUs with 4744 sequences. They were minor components in all of our samples (0.38% of total sequences).
Alveolata sequences were abundant in our study, comparable to results of clone library analyses of microeukaryotes in other marine habitats, for example, anoxic deep waters (Edgcomb et al., 2011a), the water column (Pernice et al.,  The operational taxonomic units (OTUs) were defined at 97% sequence similarity threshold. The ANOSIM results are generated using the Bray-Curtis similarity data. *P < 0.05. **P < 0.01. 2013) and abyssal seafloor sediments (Scheckenbach et al., 2010). About 34% of eukaryotic ribosomal diversity in our data set could not be classified because it did not closely match any reference barcode in the SILVA 115 database and thus could not be assigned to any of the known eukaryotic super-groups. However, this unclassified diversity represented only a small proportion (about 9.2%) of total sequences because a large proportion of these OTUs were rare. Based on our knowledge, there are two possible explanations for the large number of unclassified OTUs. First, some of these may correspond to divergent rDNA pseudogenes, known to exist in eukaryotes (M arquez et al., 2003;Pan et al., 2016), or sequencing artifacts (Decelle et al., 2014). Second, it is more likely to be due to limited diversity coverage of the current V9 database of 18S rDNA sequences and alignments used to make taxon assignments (de Vargas et al., 2015;Harder et al., 2016). It has been estimated that between one-third and twothirds of all marine species, corresponding primarily to rare and minute taxa, have escaped previous characterization and are thus not represented in public databases (Appeltans et al., 2012;de Vargas et al., 2015).
Rarefaction curves showed that our sequencing effort appeared satisfactory at a global level, indicating that most Values indicate the percentage of variation explained by each fraction, including pure, shared explained and unexplained variability. Forward selection procedures were used to select the best subset of spatial, environmental and sand grain size variables explaining community variation, respectively. ANOVA permutation tests were calculated on the variation explained by each set without the effect of the other. Significance levels are as follows: *P < 0.05 and **P < 0.01. All: all microeukaryotic taxa; abundant: abundant taxa; rare: rare taxa. [Colour figure can be viewed at wileyonlinelibrary. com] of the abundant taxa present in the 12 sandy beaches have been retrieved by our sampling. This gives an estimate of the total number of OTUs on the order of 13,000. We consider this as a relatively low number of OTUs for this globally distributed set of intertidal sediment samples, even more so given that this number might have been somewhat inflated by polymerase chain reaction (PCR) or other methodological errors (Bik et al., 2012;Liu et al., 2017). The generally low OTU number is consistent with the idea that enormous population sizes and unlimited dispersal maintain the global diversity of small eukaryotes to a relatively low level, compared to larger organisms (Fenchel, 2005;Yang et al., 2010). Although the rarefaction curve for the pooled dataset shows saturation at our observed number of OTUs, individual curves for the four separate regions do not (Fig. 2B). This suggests that approximately 13,000 OTUs are well distributed globally, but that we would have needed deeper sampling or sequencing at each site to observe them all everywhere. Venn diagrams of OTU occurrence also indicate this (Fig. S4 in the Supporting Information). For example, in the whole dataset, 3598 OTUs, or 28% of the total, occurred in all four regions and hence are considered cosmopolitan. A smaller number (2835 of 12,890 OTUs), 22%, were found in only one of the four regions. Within regions, however, the pattern is different. For the four stations in Connecticut, only 14% of OTUs (1264) were common and 37% (3337 of 9104 OTUs) were found only once. In the four stations in Xiamen, 17% of OTUs (1530) were common and 38% (3439 of 9012 OTUs) were unique to one station. It is counterintuitive that a higher percent of OTUs should be cosmopolitan in the global dataset than are in the regional datasets. However, this underscores the sampling problem. In a very large dataset (12 stations with three replicates each), sequencing can be deep enough to saturate, whereas the number of sequences from each regional station is not enough to ensure that all OTUs will be encountered even if they are present. Implied by this is that the abundant taxa, which are more easily encountered even at low sequencing depth, can vary among sampling sites within a region, perhaps suggesting sensitivities of abundance to environmental conditions.

Controlling factors for geographical distributions of rare and abundant microeukaryotes
Our results illustrate that both environmental (deterministic) and spatial (stochastic) processes play significant roles in structuring microeukaryote assemblages in the sandy beach metacommunity. An important ecological aspect of biogeography is to understand the mechanisms that generate and maintain microbial diversity (Hanson et al., 2012). However, unlike the extensive studies of biogeographical patterns in terrestrial ecosystems or other marine ecosystems, studies on the spatial scales of benthic microeukaryote diversity of sandy beaches are still limited or in their infancy (Fonseca et al., 2014;Gong et al., 2015). Our study provided an opportunity to evaluate potential controlling factors and to determine whether the relative importance of local environment versus spatial factors differ for the rare and abundant sub-communities. We found that the rare sub-community biogeographic patterns were slightly different compared to those for the abundant subcommunity because the communities were significantly related to different spatial and environmental factors (Fig.  6). However, variation in both sub-communities was significantly related to temperature, salinity, pH and total nitrogen. Previous biogeographic analyses have shown that climate zone is a main predictor of species richness, increasing from temperate to tropical sandy beaches (Rafael Barboza and Defeo, 2015), but we did not find any direct correlation of different diversity measures with latitude over the approximately 178 of latitude our samples covered. Conversely, in a large-scale meta-analysis salinity was suggested as the major determinant across many (including ocean) ecosystems and to exceed the influence of temperature (Lozupone and Knight, 2007;Liu et al., 2011). We found significant relationships between diversity and both temperature and salinity in our data, though the range in both variables was not great among our sampling sites.
We also observed that both abundant and rare subcommunities had a stronger distance-decay relationship (Fig. 5), although the abundant taxa showed a stronger dispersal ability since more of them were found at multiple sites (Fig. S3 in the Supporting Information). The rare taxa were more diverse than the abundant taxa in sediment samples collected from different regions or even from the same region but different sites. Any dispersal limitation of microeukaryotes should lead to a decrease in community similarity as individuals are more likely to colonize nearby habitats (Lear et al., 2014), leading to the significant distance-decay patterns in this study. Moreover, species sorting that is adaptation to local environments can also lead to distance-decay in community similarity (Hanson et al., 2012), as evidenced by the increased environmental variability with geographic distance (Fig. S8 in the Supporting Information). Hanson and colleagues (2012) highlighted that environmental selection is a major process leading to a distance-decay relationship. We found that the environment plus grain-size factors and spatial factors are nearly equal in explaining the variation of the entire-, abundant-and rare-communities (Fig. 6), but partial Mantel tests showed that both abundant-and rarecommunities were mostly governed by environmental and sand (grain size) variables. In addition, grain size had a significant impact on the variation of microeukaryote communities, as evidenced by the explained variation being slightly lower than that of spatial and other environmental factors (Fig. S7 in the Supporting Information). Several previous studies have shown that species richness significantly increased in response to decreasing sediment particle size, especially in the intertidal zones (Lallias et al., 2015;Rafael Barboza and Defeo, 2015). Grain size and shape can influence both physicochemical properties and biological communities of benthic systems in direct or indirect ways. One abundant group, the ciliates, generally are more abundant in medium and fine sand than in silt and mud, presumably due to the restricted pore size and permeability (Patterson et al., 1989). We found quite a bit of variation in grain size among our sample sites, with QD being the coarsest and ZS being the finest (Fig. S9 in the Supporting Information).
Our analysis revealed that the largest explained fraction of the community variation in both total and abundant microeukaryotes was attributed to the joint effect of spatial and environmental factors. Although we set out to examine primarily spatial factors by focusing on a single habitat, sandy beaches, we did see some differences among sites, and geographic distance was significantly correlated with differences in environmental characteristics (Fig. S8 in the Supporting Information). More importantly, the large-scale distribution patterns in both rare and abundant subcommunities were slightly more influenced by environmental conditions than by dispersal limitation (spatial factor). However, it is notable that the unexplained variation of the rare sub-community (84%) was substantially higher than that of the entire (46%) and abundant (34%) communities. The high proportion of unexplained variation may be due to additional biotic and abiotic factors not measured in this study (Hanson et al., 2012). In fact, tide intensity and range, beach slope and inshore productivity are known to affect spatial patterns of macro-faunal communities in sandy beaches (McLachlan and Dorvlo, 2005). Moreover, based on neutral theory, ecological drift (stochastic processes of birth, death, colonization and extinction) and evolutionary drift (stochastic genetic change) could contribute to purely spatial effects and unexplained variation (Hanson et al., 2012;Zhou et al., 2014). Most likely, additive or synergistic effects of multiple factors are important in regulating the distribution of microeukaryotes in intertidal sediments across different space and time scales (Lei et al., 2014). Thus, an alternative explanation is that random processes play a much larger role in the ecology and evolution of rare microorganism than they do for the better studied abundant taxa (Bonner, 2013). These findings highlight the need to understand what governs the relative balance between stochastic and deterministic processes and what conditions would lead to stochastic processes overwhelming deterministic processes (Wang et al., 2013). With increasingly powerful survey tools and new theoretical models, obviously, extending the sampling sites and integrating more environmental factors and conducting manipulative experiments across space and time will be helpful to further advance our understanding of the influence and mechanisms of environmental selection and dispersal processes on the biogeography of benthic intertidal microeukaryotes. Additional investigations based on rare taxa are especially needed in order to better understand benthic microeukaryotic diversity and biogeography.

Conclusions
Sampling sandy beach intertial sediments over a range of spatial scales from 0.001 to 12,000 km, we found DNA sequences representing all of the major eukaryotic clades. Arthropods, diatoms and alveolates (ciliates and dinoflagellates) had the most sequences. At 12,890 OTUs (clustered at 97%), the rarefaction curve for the global dataset showed near saturation and about 30 OTUs were cosmopolitan in all replicate samples. Spatial factors (distance) explained more of the variability among sites for the abundant microeukaryotic OTUs, compared to the rare OTUs, suggesting that many of the rare forms are present at a site because of high dispersal, but do not become abundant under the extant environmental conditions at time of sampling. This was also indicated by the fact that some groups (e.g., fungi and non-metazoan opisthokonts) had high OTU richness represented by relatively few sequences. Significant distance-decay relationships for community similarity were found for both rare and abundant taxa. Our results also provide strong evidence that both environmental (habitat suitability) and spatial (dispersal) factors contributed to the distribution patterns in both rare and abundant sub-communities. However, unlike the well-explained variation of the abundant subcommunity, the high proportion of unexplained variation in the rare sub-community suggested more complex mechanisms that may generate and maintain the rare biosphere diversity in the dynamic habitat of sandy beaches.

Study sites and sampling design
In early September 2014, 36 benthic samples (12 beaches times three replicates each) were collected from the intertidal sandy zone ( Fig. 1 and Table S1 in the Supporting Information). Three replicate samples, each composed of a single core (inner diameter of 6.4 cm, length of 10 cm, volume of about 300 ml), were taken approximately 1 m apart from each other when the sampled area was still just inundated by the falling tide, and the sampled area was approximately half way between the low and high tide. An additional core and pore water sample was taken in each replicate for physicochemical analysis. The samples were transported to the laboratory and processed within 4 h of sampling. For microeukaryotic community analyses, surface (0-5 cm) sediment samples were immersed and shaken in 5 l of sterile seawater over five times and then pre-filtered with a 200-lm pore size sieve in order to mechanically remove sand grains and large metazoans. Finally, the water sample with microeukaryotes (smaller than 200 lm) was filtered onto a 0.22-lm pore size, 47-mm diameter polycarbonate membrane (Millipore, Billerica, MA, USA). The membranes were stored in a 2 ml tube and kept at 2808C until DNA extraction. To limit DNA contamination, the sterile seawater was pre-filtered through a nominal 0.22 lm pore size membrane (Millipore, Billerica, MA, USA). Filtration equipment was rinsed with sterile water before each sample filtering and cleaned after each filtration with a 10% bleach solution.

Physicochemical analyses
Pore-water temperature, salinity and pH were measured in situ with the Hydrolab multi-parameter water quality sondes (HACH, Loveland, CO, USA). The concentrations of TN, total carbon (TC) and total organic carbon (TOC) were determined by a TOC/TN-VCPH analyzer (Shimadzu, Kyoto, Japan). A spectrophotometric method was used to measure TP after digestion. The concentrations of nitrite and nitrate nitrogen (NOx-N) and PO 4 -P were determined with a Lachat QC8500 Flow Injection Autoanalyzer (Lachat Instruments, HACH, Loveland, CO, USA). Sediment properties (sedimentary organic matter content and grain size) were analysed as follows: 10 g of sediment was taken from each replicate sample, the sediment dry weight was determined before and after combustion to a constant weight at 5608C and the fraction lost by combustion was used to estimate the percentage of organic matter. The remaining sediment sample (about 300 g) was used to analyse grain size, by using the dry-sieving method (Bale and Kenny, 2005).

DNA extraction, PCR and high-throughput sequencing
Membranes with microeukaryotes were cut into small pieces, soaked in 1.5 ml of lysis buffer (0.1 M EDTA and 1% SDS) and incubated at room temperature for four hours before DNA extraction. Total DNA was extracted using the FastDNA SPIN Kit and the FastPrep Instrument (MP Biomedicals, Santa Ana, CA, USA) according to the manufacturer's instructions. The universal meta-barcoding primers 1380F and 1510R (Amaral-Zettler et al., 2009) were used to amplify the hyper-variable V9 region of eukaryotic 18S rDNA. Each DNA sample was separately amplified with the PCR, as described by Liu and colleagues (2017). PCR was carried out in 30 ll reactions with 15 ll of PhusionV R High-Fidelity PCR Master Mix (New England Biolabs, Beverly, MA, USA), 0.2 lM of forward and reverse primers and about 10 ng template DNA. Thermal cycling consisted of initial denaturation at 988C for 1 min, followed by 30 cycles of denaturation at 988C for 10 s, annealing at 508C for 30 s and elongation at 728C for 60 s. A final elongation step was done at 728C for 10 min. The PCR products were checked on 2% agarose gel and purified with GeneJET Gel Extraction Kit (Thermo Scientific, Hudson, NH, USA). Sequencing libraries were generated using NextV R Ultra TM DNA Library Prep Kit for Illumina (New England Biolabs, Beverly, MA, USA) following manufacturer's recommendations and barcode indexes were added. The library quality was assessed on the Qubit@ 2.0 Fluorometer (Thermo Scientific) and Agilent Bioanalyzer 2100 system (Agilent Technologies, Santa Clara, CA, USA). Finally, the libraries were subjected to Illumina MiSeq sequencing in a paired-end 250-bp sequence read run (Illumina, San Diego, CA, USA).

Bioinformatics
Paired-end reads were assigned to each of the 36 samples according to their unique barcodes. The paired-end reads were merged with FLASH (Magoč and Salzberg, 2011). Raw sequence reads were analysed and quality filtered in MOTHUR v.1.33.3 (Schloss et al., 2009) and QIIME v.1.8.0 (Caporaso et al. 2010), with procedures as suggested by Bokulich and colleagues (2013). Sequences of length <135 or >152, average quality <30, ambiguous bases >0 or homopolymer length >6 were removed. The remaining high-quality sequence reads were aligned to a reference alignment, and those sequences that did not align to the correct region were eliminated. Chimeras were identified and removed using UCHIME (Edgar et al., 2011). Afterward, a Bayesian classifier was used to classify sequences to the phylum or subphylum level (e.g., Discoba and Ciliophora, respectively) against the SILVA 115 database (Pruesse et al., 2007), using an 80% bootstrap confidence score (Wang et al., 2007). Finally, sequences were split into lower taxonomic ranks using the UPARSE algorithms and assigned to OTUs at a 97% sequence similarity threshold. Singleton OTUs were discarded before the downstream analyses as potential sequencing errors. To enable comparisons between samples, we used a randomly subsampled subset of 34,239 sequences from each sample to standardize sequencing effort across samples. Thus, our final data set retained 1,232,604 sequence reads for the entire metacommunity.

Rare and abundant taxa
Defining the rare biosphere has been arbitrary (Lynch and Neufeld, 2015). In this study, the thresholds for regionally abundant and rare taxa or OTUs were defined based on relative abundance cut-offs Liu et al., 2015). OTUs with a mean relative abundance of >0.1% of the metacommunity were herein defined as abundant taxa, whereas those with a mean relative abundance of <0.001% are defined as rare taxa. Using these values, our data exactly matched with the previous view  that abundant taxa contribute <5% richness and >50% relative abundance, while rare taxa contribute >50% richness and <5% abundance ( Table S2 in the Supporting Information).

Analysis of alpha-and beta-diversities
Rarefaction curves and alpha diversity indices of ACE, Chao1, Shannon, Simpson and Pielou's evenness were calculated in the Vegan package of R (R Core Team, 2015), while the Good's coverage was calculated in MOTHUR. The heatmaps showing relative taxonomic abundance were generated in the pheatmap package of R. OTU networks were constructed and graphically edited in Cytoscape (Shannon et al., 2003) using the layout 'edge-weighted spring embedded' with weights.
For comparing communities, non-metric multidimensional scaling ordination (nMDS) analyses were conducted based on Bray-Curtis similarity. Furthermore, an analysis of similarity (ANOSIM) was used to statistically test for significant differences in microeukaryotic communities among the four regions, based on geographical distance. In this analysis, complete separation is indicted by R 5 1, whereas R 5 0 suggests no separation. Both nMDS and ANOSIM were performed in PRIMER 7.0 (Clarke and Gorley, 2015).

Relationships between community composition, environmental and spatial factors
Before multivariate statistical analyses, all environment variables, with the exception of pH, were log(x 1 1) transformed to improve normality and homoscedasticity. Spearman's rank correlations were used to determine the relationships between the Bray-Curtis similarity of microeukaryotic communities and the geographical distances among sampling sites and the relationship between the Euclidean distances among environmental variables and the geographical distances.
A set of spatial variables was generated using PCNM analysis (Borcard and Legendre, 2002) based on the longitude and latitude of sampling sites. RDA or CCA was performed to explore the relationships between microeukaryotic communities and spatial (based on PCNM variables), environmental and grain size variables, based on the longest gradient lengths of DCA. Before the RDA or CCA analysis, forward selection procedures were used to select subsets of spatial, environmental and grain size variables, respectively, by using the 'ordiR2step' function from the vegan package in R (Blanchet et al., 2008). All non-significant (P > 0.05) variables were eliminated in downstream analyses. The relative importance of the spatial, environmental and grain size factors and their covariations, for explaining differences in community composition was distinguished using variation partitioning with adjusted R 2 coefficients (Wang et al., 2015;Peres-Neto et al., 2006). In this analysis, the residual fraction equals the unexplained variance.
To compare the variance partitioning results, we also conducted both standard and partial Mantel tests for relationship between microeukaryotic community similarity and spatial, environmental and grain size variables (Legendre and Legendre, 2012). All statistics analyses were performed in R (http:// www.r-project.org) unless otherwise indicated (R Core Team, 2015).
In our study, the environmental factors (e.g., temperature) are continuous variables, whereas the grain size (which also is an environmental factor) is a categorical variable and for this reason we analysed it separately from the other factors.

Accession number
All raw sequence data from this study have been deposited in the GenBank's Sequence Read Archive (SRA) database (http://www.ncbi.nlm.nih.gov/) under the accession number SRP099521 (BioProject Accession PRJNA362750 and Bio-Sample accession SAMN06246921-SAMN06246956).

Supporting information
Additional Supporting Information may be found in the online version of this article at the publisher's website. Fig. S1. Latitudinal trends in intertidal benthic microeukaryotic diversity (OTU number, ACE, Chao1, Shannon, Simpson and Pielou evenness). One-way ANOVA was used to compare the differences among four regions, error bars are standard error and P values are given. Subsequently, ANOVA was used among 12 stations (three replicates in each station) and all P values are higher than 0.05. Fig. S2. Frequency distributions of all, abundant and rare OTUs from 36 samples. (A) All OTUs. Networks representing regionally abundant (B) and rare (C) taxa, respectively. Nodes represent OTUs that may connect different samples through edges (lines). Fig. S3. Spearman's rank correlation between medians of microeukaryotic OTUs abundance and number of sites occupied (n is the number of OTUs). Fig. S4. Venn diagram showing the unique and shared OTUs in the regional and local communities. The operational taxonomic units (OTUs) were defined at 97% sequence similarity threshold. Fig. S5. Spearman's rank correlations between the Bray-Curtis similarity of microeukaryotic community and geographical distance between sampling sites (n is the number of comparison and P values are indicated), [QD], [ZS] and [XM] regions located at China and [CT] region located at USA. All, all microeukaryotic taxa; abundant, abundant taxa; rare, rare taxa. Fig. S6. RDA/CCA ordinations showing the microeukaryotic community composition in relation to significant spatial, environmental and sand size variables (P < 0.05). TN, total nitrogen; TP, total phosphorus; PO 4 -P, phosphate phosphorus; ORG, organic matter content; S1-S6, different sand sizes (S1, S2, S3, S4, S5 and S6 indicate >2, 1-2, 0.5-1, 0.25-0.5, 0.1-0.25 and 0.075-0.1 mm, respectively). All: all microeukaryotic taxa; abundant: abundant taxa; rare: rare taxa. Fig. S7. Variation partitioning among spatial, environmental and sand grain size variables for the three sets of data (All: all microeukaryotic taxa community; abundant: abundant sub-community; rare: rare sub-community). Forward selection procedures were used to select the best subset of spatial, environmental and sand size variables explaining community variation, respectively. Fig. S8. Spearman's rank correlation between the Euclidean distance and geographical distance (n is the number of comparison, all 10 environmental variables were used for Euclidean distance, see 'Experimental procedures' section). Fig. S9. Sand size fraction distribution based on weight percentage for the 36 sediment samples. Table S1. Abundant and rare OTUs information for the 36 benthic microeukaryotic communities based on the Illumina dataset. Table S2. General descriptions of all, abundant and rare OTUs data sets at 97% sequence similarity level. Table S3. Diversity, richness and coverage of benthic microeukaryotic communities in each of the individual samples.