Land‐use influences phosphatase gene microdiversity in soils

Phosphorus cycling exerts significant influence upon soil fertility and productivity - processes largely controlled by microbial activity. We adopted phenotypic and metagenomic approaches to investigate phosphatase genes within soils. Microbial communities in bare fallowed soil showed a marked capacity to utilise phytate for growth compared with arable or grassland soil communities. Bare fallowed soil contained lowest concentrations of orthophosphate. Analysis of metagenomes indicated phoA, phoD and phoX, and histidine acid and cysteine phytase genes were most abundant in grassland soil which contained the greatest amount of NaOH-EDTA extractable orthophosphate. Beta-propeller phytase genes were most abundant in bare fallowed soil. Phylogenetic analysis of metagenome sequences indicated the phenotypic shift observed in the capacity to mineralise phytate in bare fallow soil was accompanied by an increase in phoD, phoX and beta-propeller phytase genes coding for exoenzymes. However, there was a remarkable degree of genetic similarity across the soils despite the differences in land-use. Predicted extracellular ecotypes were distributed across a greater range of soil structure than predicted intracellular ecotypes, suggesting that microbial communities subject to the dual stresses of low nutrient availability and reduced access to organic material in bare fallowed soils rely upon the action of exoenzymes.


Introduction
For soil microorganisms, acquiring sufficient phosphorus (P) to support essential cellular processes can be onerous. The high reactivity of the orthophosphate anion (PO₄³⁻) dictates that solution PO₄³⁻ concentrations are typically less than 10 µM (Ozanne, 1980; Mengel and Kirkby, 1987; Raghothama, 1999; Frossard et al., 2000). Not only is the solution concentration maintained at low levels, but sequestered P is effectively retained and difficult to access. P is partitioned between inorganic (Pᵢ) and organic (Pₒ) soil pools. pH is critical for Pᵢ speciation: at low pH, soil solution Pᵢ concentrations are controlled by sorption of PO₄³⁻ ions to iron or aluminium oxides and oxyhydroxides (Parfitt, 1989), where protonated bidentate inner-sphere complexes dominate. At higher pH, Pᵢ chemistry is dominated by precipitation reactions with calcium, forming the increasingly insoluble dicalcium phosphate, octacalcium phosphate and hydroxylapatite, and hydrated minerals with iron and aluminium (Shen et al., 2011). For Pₒ, which may represent between 30 and 80% of total soil P (Harrison, 1987), pH is less critical for chemical speciation, but indirect influences on the soil microbiota may result in varying capabilities to mineralise different organic forms. Labile Pₒ species include mono- and diesters as well as polyphosphate esters; however, inositol hexakisphosphates (phytates, InsP₆) and phosphonates (containing a carbon-phosphorus bond) are less labile (Turner et al., 2002; Condron et al., 2005).
In plants, the main compound for storage of Pᵢ is phytate, which is particularly abundant in cereal grains. Phytate is an extremely stable compound and release of phytate-P requires the activity of specific enzymes, inositol phosphohydrolases, alternatively known as phytases. Phytases hydrolyse phosphate from phytate in a stepwise manner, yielding products that may then be hydrolysed further by phosphatases. The rapid turnover of crop residues in soils, either directly or from the addition of plant materials in, for example, manures, results in significant incorporation of plant-associated P into microbial biomass: as much as 28% of P in Medicago residues has been reported in the soil microbial biomass within 7 days (McLaughlin et al., 1988). Free phytate is rapidly mineralized in both marine and terrestrial environments (Suzumura and Kamatani, 1995; Doolette et al., 2010) and organisms capable of hydrolysing phosphate esters may therefore exert a significant influence upon P cycling in soil. However, phytate can accumulate in soil as a result of interactions with mineral and organic soil components which render it resistant to enzymatic breakdown.
There are several distinct phytase families (Mullaney and Ullah, 2003). Of these, histidine acid phytase (HAPhy), protein tyrosine phosphatase-like cysteine phytase (CPhy) and beta-propeller phytase (bPPhy) have been shown to be active in prokaryotes (Lim et al., 2007). Structural differences between the families dictate that each dephosphorylates a different site. Lower-order products of phytate dephosphorylation may also be hydrolysed by phosphomonoesterases such as PhoA, PhoD (which also exhibits phosphodiesterase activity) and PhoX (Cosgrove, 1980), which also acts upon other phosphate esters. Differences in the environmental distribution of the various phosphatase genes are apparent from sequence databases. The phoD gene distribution is rather cosmopolitan, being dominant in a wide variety of environments from marine systems, including hydrothermal vents, to terrestrial systems in association with plants and animals (Ragot et al., 2015); phoX, however, is less widely distributed but more abundant in terrestrial settings than marine (Ragot et al., 2017). For phytases, there is a clear distinction between HAPhy and CPhy genes, which are present only in terrestrial environments, and bPPhy genes, which are found in both marine and terrestrial environments (Lim et al., 2007).
Microorganisms rely on exoenzymes to degrade complex organic material into simple forms to facilitate nutrient acquisition. In common with many organic compounds in soil, within relatively short timescales Pₒ may become stabilized by incorporation into macro- and micro-aggregates (Lützow et al., 2006), rendering physical contact between cells and Pₒ challenging. The combination of chemical complexity and spatial separation in soils means that understanding the distribution and abundance of exoenzymes is critical to the development of more accurate models of nutrient cycling in soils.
Plants have a limited capacity to access phytate in soil (Brinch-Pedersen et al., 2002). Microbial activity therefore exerts a significant influence upon the ability of plants to access Pₒ (Steffens et al., 2009; Richardson and Simpson, 2011), particularly in low P soils or low input agriculture. Reduced use of inorganic agricultural fertilizers in response to financial pressures or to reduce nutrient loss to groundwater, or increased use of manures, may require efficient microbial re-mining of P to ensure plant health and productivity. It is therefore important to understand the diversity of phosphatase genes in soils and the influence of land-use strategies. To this end, we established a study of phosphatase genes in soil from a long-term experiment to investigate the effect of land-use upon gene abundance, activity and diversity in soil. Since exoenzymes are likely to play an important role in the mineralisation of Pₒ in soil, we interpreted our results in the light of bioinformatics-based predictions of enzyme subcellular localisation. We demonstrate that phytate is utilized as a source of P most readily by cultivable bacteria from soil maintained as bare fallow, and this is associated with greater proportions of PhoD, PhoX and bPPhy exoenzymes in the bare fallow soil than in grassland or arable soils. Notwithstanding this shift in abundance, there is a remarkable degree of genetic similarity across the soils despite the extreme differences in land-use.

Phosphorus in Highfield soils
Soil P was assessed as NaOH-EDTA extractable P from the Highfield land-use treatments (Fig. 1). The greatest concentration of P was identified in grassland soils (661.7 ± 31.3 µg-P g⁻¹). After 50 years of minimal inputs, bare fallow soils still retained measurable P concentrations (235.0 ± 3.8 µg-P g⁻¹). Arable soils contained intermediate levels of extractable P (517.0 ± 12.6 µg-P g⁻¹). For all comparisons, we employed one-factor analysis of variance (ANOVA): where a significant effect (α = 0.05) of land-use was identified, differences between land-use means were determined by Holm-Šídák post-test comparisons. ANOVA indicated a significant effect of land-use upon soil P concentration. Comparisons of land-use means indicated that P concentration in each soil was significantly different from the others. The general trend in P of grassland > arable > bare fallow matches that observed for other soil measures previously reported for Highfield soils, including carbon and nitrogen content (Table 1).
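The statistical procedure described above (a one-factor ANOVA followed by Holm-Šídák post-test comparisons) can be sketched as follows. This is a minimal illustration rather than the analysis actually run for this study: the function names are ours, the plot values and pairwise p-values are hypothetical placeholders, and only the F statistic and the step-down p-value adjustment are shown (obtaining p-values from the F and t distributions would need a statistics library).

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-factor ANOVA."""
    values = [x for g in groups for x in g]
    grand_mean = mean(values)
    group_means = [mean(g) for g in groups]
    # Between-group and within-group sums of squares
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


def holm_sidak(p_values):
    """Step-down Holm-Sidak adjusted p-values, returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value is corrected for the (m - k + 1)
        # hypotheses still under test; monotonicity is then enforced.
        adj = 1.0 - (1.0 - p_values[idx]) ** (m - rank)
        running_max = max(running_max, adj)
        adjusted[idx] = min(1.0, running_max)
    return adjusted


# Hypothetical extractable-P values for three plots per land-use
grassland = [630.0, 660.0, 695.0]
arable = [500.0, 515.0, 536.0]
bare_fallow = [228.0, 236.0, 241.0]
f_stat, df1, df2 = one_way_anova_f([grassland, arable, bare_fallow])
print(f"F({df1},{df2}) = {f_stat:.2f}")

# Hypothetical unadjusted p-values for the three pairwise comparisons
print(holm_sidak([0.004, 0.020, 0.030]))
```

With three land-uses and three replicate plots each, the degrees of freedom are 2 and 6, which matches the F statistics reported throughout the results.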
Quantitative comparisons of NaOH-EDTA extractable orthophosphate and phytate in the soils using suppressed ion conductivity-HPLC showed a significant effect of management upon orthophosphate (Fig. 1A), with greatest concentrations in grassland and arable soils (279.4 ± 17.4 and 252.6 ± 8.4 µg g⁻¹ dry soil, respectively), which were not significantly different, in contrast to 105.7 ± 6.7 µg g⁻¹ extracted from bare fallow soil (significantly less than extractable orthophosphate in either grassland or arable soils). Greater differences were observed in the amounts of extractable phytate, where both myo- and scyllo-IP₆ stereoisomers were identified for all three contrasting soils (Fig. 1B). A significant effect of land-use was observed for total (myo + scyllo) phytate from the soils. The greatest amounts were extracted from grassland soils (77.1 ± 6.6 µg g⁻¹), compared with 38.1 ± 3.2 and 22.9 ± 2.8 µg g⁻¹ extracted from arable and bare fallow soils, all mean estimates being significantly different (p < 0.001). These values are typical of phytate estimates from a range of agricultural soils (Jarosch et al., 2015).

Soil microbial community response to phytate
Growth rates of soil microbial communities were measured in a defined medium with phytate as the only added source of P. Based upon estimates of NaOH-extractable P, the maximum amount of autochthonous P added to the cultures was approximately 66 µg with grassland and 24 µg with bare fallow soils. These amounts are negligible compared with the amount of phytate added (2.94 mg) and are unlikely to have confounded the results. Preliminary experiments indicated that where phytate was added to growth media, orthophosphate, assessed spectrophotometrically, accumulated to a greater extent in medium inoculated with bare fallow soil, apparently in response to cells entering the stationary phase of growth (see Supporting Information Fig. S5). We observed highest microbial growth rates in cultures inoculated with bare fallow soil (Fig. 2A). Grassland soil communities appeared to be particularly ineffective in accessing the added phytate-P. In support of growth rate estimates, maximum likelihood (ML) MPN estimates based on data from the three replicate plots for each land-use type, and using the same defined medium as used for growth estimates, indicated significantly greater numbers of phytase-positive cultivable organisms in bare fallow soil (based upon the distribution of the ML 95% confidence intervals associated with the MPN estimates); arable and grassland soils were not significantly different (Fig. 2B). At the dilutions tested (10⁻² to 10⁻⁵-fold) there was no evidence of growth in dilution series of soils in the absence of added phytate: cells within replicates (n = 8 at each dilution) for each soil did not violate the assumption of random distribution (Grassland, χ² = 2.61, p > 0.05; Arable, χ² = 7.31, p > 0.05; Bare Fallow, χ² = 2.69, p > 0.05). The observation that, under conditions where phytate is the sole source of P, cultivable microbes in bare fallow soil are both more numerous (based upon ML-MPN estimates) and grow more quickly is in marked contrast to other estimates of biological activity reported for these soils. Estimates of extractable ester-linked fatty acids and basal soil respiration (Gregory et al., 2009), extractable DNA (Hirsch et al., 2009) and extractable phospholipid fatty acids, biomass carbon and ATP content (Wu et al., 2012) are all greatest in grassland soils and least in bare fallow soils, with arable soils being intermediate. Thus, despite bare fallow soils having the lowest biomass, there were greater numbers of cultivable phytase-positive organisms which were able to grow more quickly and mineralize phytate more extensively in this study. This suggests a shift in the capacity of the community in bare fallow soil to exploit phytate as a P-source.

Fig. 1. A. There was a significant effect of land-use upon orthophosphate (PO₄³⁻) concentrations in the three soils. There was no significant difference between the mean orthophosphate concentration in grassland and arable soils (t = 2.01; p = 0.056) but both soils had significantly higher concentrations than bare fallow soil (p < 0.001). B. Phytate (IP₆) concentrations: only myo- and scyllo-stereoisomers were detected. There was a significant effect of land-use upon the total phytate concentration in the three soils. All mean phytate concentrations were significantly different from the other means (p < 0.001). Data were generated using suppressed ion conductivity HPLC of NaOH-EDTA soil extracts. In each case the mean of three estimates ± standard error of the mean is shown.
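The ML-MPN estimates used in this section can be illustrated with a short sketch. Under the usual MPN assumption that organisms are randomly (Poisson) distributed among wells, the maximum likelihood density λ satisfies Σ pᵢvᵢ / (1 − exp(−λvᵢ)) = Σ nᵢvᵢ, where vᵢ is the amount of soil per well at dilution i, nᵢ the number of wells and pᵢ the number of positive wells. The function name, the bisection solver and the well counts below are assumptions made for illustration; the software used for the published estimates and confidence intervals is not specified here.

```python
import math


def mpn_mle(amounts, positives, totals, lo=1e-9, hi=1e9, iters=200):
    """Maximum likelihood MPN (organisms per unit of inoculum).

    amounts   -- inoculum amount per well at each dilution
    positives -- number of wells showing growth at each dilution
    totals    -- number of wells inoculated at each dilution
    """
    if sum(positives) == 0 or list(positives) == list(totals):
        raise ValueError("MLE is undefined when no or all wells are positive")

    def score(lam):
        # Derivative of the log-likelihood with respect to lam
        s = 0.0
        for v, p, n in zip(amounts, positives, totals):
            s += p * v / (-math.expm1(-lam * v)) - n * v
        return s

    # The score decreases monotonically in lam, so bisect (in log space)
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


# Hypothetical plate: 8 wells at each of the 10^-2 to 10^-5 dilutions
amounts = [1e-2, 1e-3, 1e-4, 1e-5]
positives = [8, 6, 2, 0]
totals = [8, 8, 8, 8]
print(f"MPN estimate: {mpn_mle(amounts, positives, totals):.0f} per unit")
```

For a single dilution the estimate reduces to the familiar closed form λ = −ln(1 − p/n)/v, which is a convenient check on the solver.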

Phylogenetic annotation of Highfield metagenome datasets
Soil from three replicate plots for each of the three land-uses was used to generate shotgun metagenome sequences. After filtering to remove substandard sequences, the average size of each metagenome was 356.7 million sequences (range 228.0-576.4 million) with an average sequence length of 100 nucleotide bases. Phylogenetic analysis of the complete datasets was performed using MEGAN and comparison against the RefSeq non-redundant protein database held at NCBI. Phylogenetic annotation of the sequences indicated that in each soil Actinobacteria, Alpha-proteobacteria and Bacteroidetes/Chlorobi accounted for approximately 50% of the sequences (Fig. 3A). SIMPER analysis of Bray-Curtis similarity coefficients indicated a 12% dissimilarity across the averaged abundances for the three land-uses. Again using Bray-Curtis similarity coefficients, PERMANOVA analysis indicated a significant effect of land-use (pseudo-F = 4.34; p = 0.004 based upon 9,999 random permutations). A ternary plot of the relative distribution of orders/sub-phyla determined by MEGAN among the three land-uses (Fig. 3B) indicated that the Alpha-proteobacteria, Actinobacteria and Bacteroidetes were dominant in all three land-use soils but that members of the Gemmatimonadetes and Armatimonadetes were particularly associated with bare fallow soil, although at relatively low abundance.
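For readers unfamiliar with the coefficient underlying the SIMPER and PERMANOVA analyses above, the Bray-Curtis dissimilarity between two abundance profiles is the sum of absolute differences divided by the sum of all abundances. The sketch below is illustrative only: the function names and taxon counts are hypothetical, and any data transformation applied before the published analysis is not reproduced.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two abundance vectors (0 to 1)."""
    if len(u) != len(v):
        raise ValueError("profiles must cover the same taxa")
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    return numerator / denominator


def pairwise_dissimilarity(profiles):
    """Dissimilarity for every pair of named profiles."""
    names = sorted(profiles)
    return {
        (a, b): bray_curtis(profiles[a], profiles[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


# Hypothetical averaged abundances of four taxa in each land-use
profiles = {
    "grassland": [320.0, 210.0, 150.0, 20.0],
    "arable": [300.0, 230.0, 140.0, 30.0],
    "bare fallow": [280.0, 200.0, 160.0, 60.0],
}
for pair, d in pairwise_dissimilarity(profiles).items():
    print(pair, f"{100 * d:.1f}% dissimilar")
```

Similarity is simply one minus the dissimilarity, so a 12% dissimilarity corresponds to 88% Bray-Curtis similarity.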

Abundance of phosphatase genes in Highfield soils
The bare fallow soil microbial community appears to be able to grow more efficiently on phytate than the communities of grassland or arable soils. We therefore interrogated metagenome sequence datasets generated from each soil for the presence of functional genes involved in dephosphorylation of organic P compounds. From the sequenced environmental DNA, we first calculated the mean abundance of four single-copy ubiquitous genes as an estimate of the number of genome-equivalents (GE) represented in each metagenome. There was no significant difference between the mean GE of 34,485 for grassland, 30,967 for arable and 33,646 for bare fallow soils (F(2,6) = 0.06; p = 0.946). The abundance of each phosphatase gene was compared against the estimates of GE for each metagenome dataset to determine the frequency (%GE) of each gene in each soil (details are given in SI3).
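The normalisation described above amounts to simple arithmetic, sketched here under stated assumptions. The function names and read counts are hypothetical, and any further corrections applied in the actual workflow (for example for gene length) are described in SI3 and are not reproduced in this sketch.

```python
from statistics import mean


def genome_equivalents(single_copy_counts):
    """Estimate genome-equivalents as the mean single-copy gene abundance."""
    return mean(single_copy_counts)


def percent_ge(gene_count, single_copy_counts):
    """Frequency of a gene as a percentage of genome-equivalents (%GE)."""
    return 100.0 * gene_count / genome_equivalents(single_copy_counts)


# Hypothetical counts of four single-copy genes in one metagenome
single_copy = [30000, 32000, 34000, 28000]
# Hypothetical number of homologues of a phosphatase gene in that metagenome
homologue_count = 15500
print(f"{percent_ge(homologue_count, single_copy):.1f}% GE")
```

Expressing abundance per genome-equivalent rather than per read allows metagenomes of different sizes to be compared directly.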
A comparison of homologue frequency across all genes (Fig. 4) indicates that phosphomonoesterases are more abundant in soil than phytases: phoD in particular is much more abundant than the other phosphatase genes. This is consistent with analysis of phoA, phoD and phoX abundance in publicly available bacterial genomes and metagenome datasets (Tan et al., 2013; Ragot et al., 2015; Ragot et al., 2017). In the case of the phosphomonoesterases (where grassland soil contained the greatest abundance), phoD represented 52% GE compared with 3% GE and 8% GE for phoA and phoX respectively. We could identify no significant effect of land-use upon phoD ecotype abundance (Fig. 4), contradicting previous studies which have shown that fertilisation causes changes in phoD abundance in agricultural soils assessed by quantitative PCR (Tan et al., 2013; Fraser et al., 2015). There was an effect of land-use upon phoA abundance, with significantly fewer homologous sequences identified from bare fallow than either arable or grassland soils.
PhoA is the least abundant alkaline phosphatase gene in soil -a similar distribution is observed in marine systems (Sebastian and Ammerman, 2009;Luo et al., 2011). Nevertheless, phoA homologues were more abundant than phytases, where mean abundances were less than 2% of all prokaryotes. Of the phytases studied here, the HAPhy gene was the least abundant: CPhy and bPPhy genes were equally abundant in the soils, in contrast to marine systems where bPPhy is the dominant phytase gene (Lim et al., 2007). The trend in gene homologue frequencies was for greater numbers in grassland than bare fallow soils, with abundance in arable soils being intermediate. bPPhy-homologous genes showed the reverse trend although differences between land-use were not significant. Thus, the most abundant phytase in soil was dependent upon land-use: CPhy was dominant in grassland soils, bPPhy in bare fallow soil: the genes were equally abundant in arable soil (Fig. 4). Collectively, the genes responded in one of two ways to land-use: phoA, CPhy and HAPhy gene homologue frequencies all responded significantly to land-use; phoD, phoX and bPPhy gene homologues showed no such response.

Phylogenetic distribution of phosphatase genes in Highfield soils
To identify the potential reasons for the two groups of genes' contrasting response to land-use, we used our MApPP workflow (SI3) to assess both abundance and phylogenetic structure of the community associated with each gene. All PhoA enzymes were predicted to be periplasmic by PSORTb; all metagenomes were dominated by sequences with homology to a clade of actinomycete phoA genes (cluster 1, Fig. 5). The proportion of metagenome reads associated with cluster 1 was greatest in grassland and least in bare fallow metagenomes (0.76% GE in grassland, 0.54% in arable soils and 0.28% in bare fallow soil). There was a significant effect of land-use upon the proportion of reads associated with this cluster (F(2,6) = 20.1; p = 0.002) with all land-uses being significantly different from the others (p = 0.024). There was a second group of metagenome reads placed deeply within an Enterobacteriaceae clade (phoA cluster 2). The mean proportion of GE associated with cluster 2 was again greatest in grassland soil (0.36% GE), least in bare fallow soil (0.17%) and intermediate in arable soil (0.31%), with a significant effect of land-use (F(2,6) = 20.5; p = 0.002). Post-test comparisons indicated that the bare fallow soil exhibited a significantly smaller proportion than either the grassland (t = 6.15; p = 0.003) or arable (t = 4.60; p = 0.007) soils, but that there was no difference between the proportions for grassland and arable soils. Other placements show a similar trend in GE but no overall significant effect of land-use.

Fig. 5. Phylogenetic placement for metagenome sequences showing homology to the alkaline phosphatase phoA and histidine acid and PTP-like cysteine phytase genes. In each case there was a significant reduction in the normalized homologue abundance from grassland to bare fallow soils. Genes for which an exoenzyme is predicted are shown as red branches on the ML reference tree. Details of metagenome analysis are provided in the supplementary information. Histidine acid phytase cluster 1 represents gene sequences from Yersinia, Pandoraea, Edwardsiella, Serratia and Burkholderia; cluster 2, Escherichia, Cronobacter and Shigella; cluster 3, a group of sequences showing limited homology to clusters 1 and 2; and cluster 4, Enterobacter sp. strain 638. Cysteine phytase cluster 1 represents gene sequences from Bdellovibrio spp. PhoA cluster 1 represents a clade made up of phoA genes associated with Streptoalloteichus, Amycolatopsis, Cellvibrio and Janibacter species; cluster 2 represents a group of phoA genes from Escherichia, Citrobacter, Providencia and Shigella; cluster 3, Streptoalloteichus hindustanus; and cluster 4, Janibacter sp. HTCC 2649. Clusters for which a significant effect of land-use was observed are indicated with *. Trees were generated using iTOL v3 (Letunic and Bork, 2016).
A number of CPhy proteins were predicted to be secreted: genes coding for these extracellular enzymes are shown as red branches in the reference phylogenetic tree (Fig. 5). Placement of metagenome reads for all soils was dominated by sequences assigned to the gene of the deltaproteobacterium Bdellovibrio bacteriovorus (CPhy cluster 1, Fig. 5), recently demonstrated to have a high degree of specificity for IP₆ (Gruninger et al., 2014). The B. bacteriovorus CPhy enzymes were predicted to be extracellular. The number of metagenome reads with homology to the B. bacteriovorus CPhy was reduced in bare fallow soil (0.17% GE) compared with either arable (0.56%) or grassland (0.83%) soils and there was a significant effect of land-use (F(2,6) = 15.97; p = 0.004). Post-test comparisons indicated that bare fallow soil exhibited a significantly smaller proportion when compared with grassland (t = 5.62; p = 0.004) or arable (t = 3.31; p = 0.032) soils. There was no difference between the proportions for grassland and arable soils. The only other placement of significant numbers of reads was a group showing homology to the CPhy gene associated with the Firmicute Veillonella.
The third gene for which significant reductions in abundance were observed was HAPhy. As with PhoA, HAPhy enzymes were predicted to be periplasmic. In this instance, placement of soil metagenome reads was dominated by two proteobacterial clades (HAPhy clusters 1 and 2, Fig. 5). These phylogenetic associations were similar to those observed for phoA. There was a significant effect of land-use on the summed proportion of GE placed in cluster 1 (grassland 0.268%, arable 0.205%, bare fallow 0.129%; F(2,6) = 8.17; p = 0.019) but post-test comparisons indicated that only the difference between grassland and bare fallow was significant (t = 4.04; p = 0.020). Although a similar trend for greatest GE associated with grassland soil and the least for bare fallow soil was observed for cluster 2, there was no significant effect of land-use.
The largest placement of reads was at a deep branch within the tree (cluster 3, Fig. 5), suggesting dominant environmental HAPhy gene sequences may exhibit poor homology to sequenced genes. In all cases, the proportion of GE associated with placements was greatest in grassland soil and least in bare fallow soil (grassland 0.136%; arable 0.099%; bare fallow 0.026%): there was a significant effect of land-use (F(2,6) = 56.82; p < 0.001). Each land-use was significantly different from the others (smallest difference, grassland versus arable = 0.038%, t = 3.61; p = 0.011). Lastly, some metagenome reads showed high homology to the HAPhy gene sequence of Enterobacter sp. strain 638 (HAPhy cluster 4, Fig. 5), an endophytic gammaproteobacterium with plant-growth promoting activity (Taghavi et al., 2009). In this case, there were 20 metagenome reads showing homology from one sample from bare fallow soil, but none from any other sample.

Fig. 6. Phylogenetic placement for metagenome sequences showing homology to the alkaline phosphatase phoD and phoX and beta-propeller phytase genes. In each case there was no significant effect of land-use upon the normalized homologue abundance from each soil. Genes for which an exoenzyme is predicted are shown as red branches on the ML tree. Details of metagenome analysis are provided in the supplementary information. PhoD cluster 1 contains diverse sequences including Cupriavidus, Ralstonia, Burkholderia, Rhodanobacter and Candidatus Koribacter; phoD cluster 2 represents sequences from Variovorax, Acidovorax and Comamonas; phoD cluster 3 represents sequences from Gemmatimonas, Gemmatirosa, Planctomyces and Rhodopirellula. PhoX cluster 1 represents genes from Variovorax, Alcanivorax, Methylophaga and Candidatus Accumulibacter phosphatis; cluster 2 represents genes from Acidovorax, Sphingomonas, Novosphingobium, Ramlibacter and Janthinobacterium; beta-propeller phytase cluster 1 represents phytase genes from Bacillus, Paenibacillus, Alteromonas and Cyanothece. Clusters for which a significant effect of land-use was observed are indicated with *. Trees were generated using iTOL v3 (Letunic and Bork, 2016).
In the case of the alkaline phosphatase PhoX, a number of enzymes were predicted to be extracellular (Fig. 6) although there is little correlation between phylogeny and extracellular localisation. In grassland soil, phoX homologues were most closely related to a proteobacterial clade (phoX, cluster 1, Fig. 6). The greatest number of reads for all soils within this clade showed high homology to the phoX sequence of the gammaproteobacterium Methylophaga frappieri with 0.499% GE from grassland metagenomes, 0.549% from arable metagenomes and 0.292% from bare fallow metagenomes. There was no significant effect of land-use upon the proportion of GE from each soil.
The distribution of phoX reads differs distinctly from those of phoA, CPhy and HAPhy homologues. The latter showed similar phylogenetic distributions across soils, with a trend of reduced numbers of reads from grassland to bare fallow soil, whereas for phoX a shift in the phylogenetic distribution of reads, from those associated with grassland and arable soils to those associated with bare fallow soil, is evident. For bare fallow soil, the dominant placement (0.730% GE in metagenomes) showed high homology to the betaproteobacterium Ramlibacter tataouinensis TTB310 (phoX cluster 2, Fig. 6), originally isolated from semi-arid sandy soil (De Luca et al., 2011). Only 0.111% of arable GE and 0.168% of grassland GE showed such homology. There was a significant effect of land-use (F(2,6) = 5.95; p = 0.038); however, post-test comparisons did not indicate any significant differences. A second set of reads within this cluster showed high homology to phoX of a second betaproteobacterium, Janthinobacterium lividum. In this case, there were 0.120% GE from bare fallow metagenomes, 0.020% from arable metagenomes and 0.011% from grassland metagenomes. There was a significant effect of land-use (F(2,6) = 8.60; p = 0.017) and post-test comparisons indicated a significantly greater proportion in bare fallow than either grassland (t = 3.73; p = 0.029) or arable (t = 3.43; p = 0.028) metagenomes but no difference between grassland and arable soils. A subcellular localisation for either PhoX protein could not be assigned. However, both are part of a clade predicted to be extracellular, suggesting that bacteria dominating bare fallow soil express exoenzymes. The TTB310 genome possesses type II and III secretion systems, evidence that secreted hydrolases may be important in the desert sands where it originated (De Luca et al., 2011).
For the dominant phosphatase gene, phoD, there is a clear correlation between phylogeny and subcellular localisation. A number of deeply rooting clades code for extracellular enzymes. The greatest number of placements was from arable soil, where 3.145% GE showed high homology to phoD of the gammaproteobacterium Rhodanobacter spathiphylli B39 (phoD cluster 1, Fig. 6). In comparison, this placement represented 2.38% GE of grassland metagenomes and 2.273% GE of bare fallow soil metagenomes, but there was no significant effect of land-use. A closely related phoD homologue from the acidobacterium Ca. Koribacter versatilis was also most abundant in arable soil (2.41% GE) compared with grassland and bare fallow soils (1.60% and 2.00% respectively). Organisms harbouring B39 or Ca. K. versatilis phoD homologues (possibly also the phoX homologue of Methylophaga frappieri) appear well adapted to arable soil; however, whether this was due to P-availability is unclear since phosphate fertilizer is added. B39 was isolated from the rhizosphere of the monocotyledon Spathiphyllum (De Clercq et al., 2006); it is possible that these phoD-expressing organisms are suited to the wheat rhizosphere. Other, more deeply placed sequences were also most abundant in arable soil. The B39 PhoD enzyme could not be assigned a subcellular compartment. The Ca. K. versatilis protein was predicted to be extracellular. These homologues are unique among the genes studied here, as the dominant placements are associated with arable soil. A second phoD clade dominates bare fallow soil (phoD cluster 2, Fig. 6). The largest number of metagenome reads in this clade had high homology to a Burkholderiales bacterium, JOSHI 001, with 2.05% GE from bare fallow soil compared with 1.06% and 1.09% from grassland and arable soils respectively. There was a significant effect of land-use (F(2,6) = 24.17; p = 0.001) and the abundance was significantly greater in bare fallow soil than either grassland (t = 6.11; p = 0.003) or arable (t = 5.93; p = 0.002). Abundances in grassland and arable soils were not significantly different. PSORTb predicted PhoD of JOSHI 001 to be extracellular.
Within the same large clade, a number of metagenome reads showed high homology to phoD of Gemmatirosa kalamazoonensis KBS708, an agricultural soil isolate (De Bruyn et al., 2013), and the related phoD sequence of polyphosphate-accumulating Gemmatimonas aurantiaca T-27 (phoD cluster 3, Fig. 6). The combined abundance of these two closely related sequences was greatest in bare fallow soils (0.961% GE) compared with arable (0.591%) and grassland (0.420%) and there was a significant effect of land-use upon the abundance (F(2,6) = 7.71; p = 0.022), but post-test comparison indicated a significant difference only for grassland and bare fallow soils (t = 3.84; p = 0.025). PhoD proteins from both these strains were predicted to be extracellular. The observed shift in phoD community structure dependent upon land-use is consistent with previous studies based on PCR and DGGE (Tan et al., 2013; Fraser et al., 2015).
For bPPhy, there is a clear correlation between phylogeny and enzyme secretion, with a large phyletically-mixed clade (bPPhy cluster 1, Fig. 6) expressing exoenzymes and a second, more diverse clade with only a limited number expressing outer membrane-associated enzymes, the others being intracellular or of unknown localisation. The metagenomes for all soils were dominated by a set of reads showing homology to bPPhy from the cyanobacterium Cyanothece (bPPhy cluster 1, Fig. 6). For this placement, the greatest number of reads was identified in bare fallow soil (0.665% GE), with arable (0.283%) and grassland (0.264%) soils having a similar proportion of reads. There were also placements within this cluster showing high homology to the gammaproteobacterium Alteromonas macleodii (0.21% GE from bare fallow soil, 0.053% from grassland and 0.054% from arable). Of these two, a significant effect of land-use was only observed for Cyanothece-related abundance (F(2,6) = 8.3; p = 0.019) and post-test comparisons indicated that the bare fallow soil abundance was significantly greater than that for either the grassland (t = 3.61; p = 0.033) or arable (t = 3.44; p = 0.027) metagenomes. The A. macleodii bPPhy protein was predicted to be extracellular. Neither of the Cyanothece sp. bPPhy proteins could be assigned a subcellular compartment; they were the only gene sequences placed within the clade which were not predicted to be extracellular. Other Cyanothece bPPhy proteins are predicted to be extracellular (e.g. B1WZU4_CYAA5 or A3ITE0_9CHRO), possibly indicating that the homologous sequences from these soils may code for extracellular phytase. Although this clade contains bPPhy sequences from Bacillus spp., few metagenomic sequences were detected, suggesting that Bacillus phytases in sequence databases are relatively uncommon in soils.

Response of phosphatase gene expressing organisms in Highfield soils to land-use
Microbial communities in Highfield soils respond to land-use in different ways and display distinct gross phenotypes. Added phytate is mineralized more rapidly in soil fallowed for over fifty years compared with either arable soil receiving regular phosphate addition in fertilizers, or grassland soil. All measures of soil phosphate and P-containing compounds showed a consistent decrease from grassland to arable to bare fallow soils and previous studies indicate that organic matter and microbial abundance follow the same trend (Gregory et al., 2009;Hirsch et al., 2009).
In spite of this, MPN estimates for organisms capable of acquiring P from phytate were highest in bare fallow soil, evidence of a phenotypic shift in soil microbial communities. Analysis of phytase and phosphatase genes in metagenomes provides compelling phylogenetic evidence of this shift. For the genes phoD, phoX and bPPhy there were significant shifts in phylogeny, apparently towards organisms expressing exoenzymes in bare fallow soil.
Microbial secretion of exoenzymes has often been observed in response to low nutrient conditions or complex chemistries (Koch, 1985;Allison and Vitousek, 2005;Delpin and Goodman, 2009). The shift here is possibly a response to reduced P bioavailability in bare fallow soil. Organic matter in this soil is predominantly intra-aggregate (Hirsch et al., 2009): not only do we measure reduced organic P in these soils, but access to it is likely to be more challenging because of occlusion within aggregates. Our observations are consistent with trait microdiversity, the occurrence of ecologically distinct populations within phylogenetically-related groups (Zimmerman et al., 2013), suggesting that microdiversity within phosphatase enzymes may be important in soil microbial communities and for soil fertility. However, not all observed responses of phosphatase gene homologues are necessarily driven by P availability. For instance, CPhy are dominated by sequences with homology to the CPhy gene of Bdellovibrio bacteriovorus, a predator of other bacteria. In this case the significant reduction of homologues from grassland to bare fallow soils is likely to be a direct response to the reduced bacterial biomass observed for these soils (Hirsch et al., 2009). Similarly, it is difficult to determine whether the increase in the cyanobacterium Cyanothece sp. bPPhy homologues in bare fallow soil is due to adaptation to low P availability (or indeed autotrophic metabolism more generally) or a response to reduced competition for light. It is also worth noting that the soils used in this study exhibited a progressive reduction in pH from grassland to bare fallow soil of almost a whole pH unit (Table 1). Soil pH has a clear effect upon both phosphomonoesterase and phosphodiesterase activities (Turner and Haygarth, 2005;Turner, 2010), some of which can be accounted for by the varying pH optima of alkaline and acid phosphatases, and upon the community of phoD-expressing microbes (Ragot et al., 2015).
To address these issues, responses of phoD, phoX and bPPhy ecotypes to soil chemical and structural factors were assessed using canonical correspondence analysis (CCA, Ter Braak, 1986). Counts of individual ecotypes represented in Fig. 6 acted as 'species' data: counts less than 5 were treated as 0 and ecotypes which were present in only 1 of the 9 soils were removed from the analyses. For each gene, ecotypes were divided into one of three categories based upon predictions of sub-cellular localisation by PSORTb: ecotypes which were predicted to be extracellular (shown in Fig. 6); ecotypes which were predicted to be intracellular; and ecotypes for which no localisation could be predicted. The pH, C org and nitrogen data presented in Table 1 as well as orthophosphate concentrations presented in Fig. 1 were used to represent soil chemical environments, and the ratio of intra-aggregate C org to free C org (intra-aggregate ratio), calculated from data presented in Table 1, was used to represent soil physical structure. C org and N were significantly correlated in the soils (r = 0.999, p < 0.001) and so only C org was used in CCA. The variables were included in a separate model for each gene and model significance was estimated based upon 9,999 Monte Carlo permutations.
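The ecotype filtering applied before CCA can be illustrated with a minimal Python sketch. The function name, the dictionary layout and the example counts are hypothetical, and the sketch assumes that presence in a soil is judged after counts below 5 have been set to 0, which the text does not state explicitly.

```python
def filter_ecotypes(counts, min_count=5, min_soils=2):
    """Prepare an ecotype-by-soil count table for ordination.

    counts: dict mapping ecotype name -> list of read counts, one per soil.
    Assumption: presence is evaluated after low counts are zeroed.
    """
    filtered = {}
    for ecotype, row in counts.items():
        # Counts below the threshold are treated as zero.
        cleaned = [c if c >= min_count else 0 for c in row]
        # Drop ecotypes that remain in fewer than `min_soils` soils.
        soils_present = sum(1 for c in cleaned if c > 0)
        if soils_present >= min_soils:
            filtered[ecotype] = cleaned
    return filtered


# Hypothetical counts for three ecotypes across the nine soils.
example = {
    "ecotype_A": [10, 0, 7, 0, 0, 0, 0, 0, 0],
    "ecotype_B": [12, 3, 4, 0, 0, 0, 0, 0, 0],
    "ecotype_C": [4, 4, 4, 0, 0, 0, 0, 0, 0],
}
kept = filter_ecotypes(example)
```

In this illustration only ecotype_A survives: ecotype_B is present in a single soil once its low counts are zeroed, and ecotype_C has no count of 5 or more.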
For phoD, although CCA clearly separated the land-uses (Fig. 7A), eigenvalues (λ) based upon the distribution of ecotypes between the three soil types associated with Axes 1 and 2 were not significant (p > 0.05). However, the median Axis 1 score for predicted extracellular ecotypes (-0.016, 95% confidence interval -0.094 to 0.042) was significantly lower than that for intracellular (0.142, 95% confidence interval 0.044 to 0.216) ecotypes (Mann-Whitney test, U = 8122.5, p < 0.001). For phoX (Fig. 7B), there was a separation of ecotypes on both Axes 1 (λ = 0.111, p = 0.024) and 2 (λ = 0.025, p = 0.035). To examine the environmental parameters responsible for ecotype groupings, scores were correlated with environmental parameters: both pH (r = -0.959, p < 0.001) and orthophosphate (r = -0.882, p < 0.001) were negatively correlated with Axis 1 ecotype scores and intra-aggregate ratio was positively correlated (r = 0.915, p < 0.001). There was no significant correlation of C org with Axis 1 scores. Intra-aggregate ratio exerts a stronger effect than either pH or P for some of the PSORTb predicted extracellular phoX ecotypes; this is also the case for some of the ecotypes for which a subcellular location could not be predicted. There is an apparent distribution on Axis 2 which is associated with plant type, separating ecotypes between grass and arable (wheat) systems. The ordination of bPPhy gene ecotypes (Fig. 7C) is similar, although only Axis 1 was significant (λ = 0.093, p = 0.045). Again, ecotype scores were negatively correlated with both pH (r = -0.964, p < 0.001) and orthophosphate (r = -0.905, p < 0.001), but positively correlated with intra-aggregate ratio (r = 0.916, p < 0.001). Intra-aggregate ratio again appears to exert greater influence upon ecotypes, especially those predicted to be extracellular, than either P or pH. Extracellular ecotypes of each gene predominate in soils with low P and pH but high intra-aggregate ratios.
It is difficult to separate potential chemical effects (pH or orthophosphate) from soil structural effects (intra-aggregate ratio). However, there is a common trend in the ordinations: ecotypes for which an extracellular mode is predicted are distributed across a greater range of intra-aggregate ratios than predicted intracellular ecotypes. This suggests that microbial communities subject to the dual stresses of low nutrient availability and reduced access to organic material in bare fallow soils rely more upon the production of exoenzymes (Allison, 2005;Allison et al., 2011).
Irrespective of whether community shifts observed were driven exclusively by P-availability, soil structural effects, or a combination of both, it is clear that land-use has the potential to induce shifts in the microbial community towards optimal states with benefits for nutrient cycling; this is particularly the case for the expression of exoenzymes by the microbial community. A salient observation from the CCA ordination of gene ecotypes is that the plant community itself (in this instance either mixed grasses and forbs, or wheat) appears to exert little structuring effect upon the distribution of functional genes. It is probably a combination of soil chemical and structural properties which exert the greater influence. A similar response of microbial communities has been observed across a broader range of soils (conventional and organic arable, pasture, natural grassland and deciduous forest) than used here (Kuramae et al., 2011). In this case, pH and phosphate were identified, along with C:N ratio, as exerting significant influence upon total bacterial community structure.

Fig. 7. Triangles represent the centroids of individual replicate plots of grassland (green), arable (yellow) and bare fallow (brown) soils with regard to the ordination of relative frequencies across ecotypes. Environmental factors (pH, P, C org and intra-aggregate ratio) are represented as vectors and increase in the direction of the vector: vector length indicates the degree of correlation of each environmental variable with ecotype relative frequencies. Values on each axis indicate the percentages of total variation explained by the axis, the eigenvalue (λ) and the permutation-based significance of the amount of total variation explained by the axis. Type 2 scaling is used in all cases.

Although the bare fallow soil studied here is extreme, it may be that periods of limited soil inputs maintain active populations of exoenzyme-expressing organisms which are otherwise outcompeted in soils where nutrients are maintained in a high and freely available state. The dominant enzymes in the fallowed soil, which we have shown to overcome microbial P limitation, may also make useful targets for heterologous expression in crop plant roots to improve crop P acquisition (e.g. George et al., 2005;Chan et al., 2006). It is also clear that in some cases, environmental genes associated with P-cycling are poorly represented in sequence collections and that, in the case of bPPhy, well-characterized enzymes such as those of Bacillus spp. do not appear to be well represented in the soils described in this study. Clearly, the wider relevance of changes in functional gene abundance and diversity remains uncertain in the absence of gene expression data. Because regulation of genes associated with the PHO regulon is tightly coupled to the bioavailability of orthophosphate ions, the timing of sampling will be crucial, and we chose not to study gene expression via metatranscriptomes generated from these soils at this juncture. However, the differences in gene abundance, and especially the observed shift to exoenzymes in a land-use dependent manner, warrant further study to link gene expression and phosphatase and phytase enzyme activity in these soils.

Soils
Experimental soils were collected from triplicate permanent grassland, arable and bare fallow plots (each of 50 × 7 m) of the Highfield Ley-Arable experiment (00:21:48W, 51:48:18N) at Rothamsted Research. The soil is a silty clay loam (27% clay) (Chromic Luvisol according to FAO criteria). At the time of sampling, arable plots had been under continuous arable cropping for 62 years (winter wheat, Triticum aestivum L., most recently Hereward seed coated with Redigo® Deter® combination insecticide/fungicide treatment, Bayer CropScience), receiving ammonium nitrate fertilisation to provide approximately 220 kg-N ha⁻¹ annum⁻¹, and an additional 250 kg-K ha⁻¹ and 65 kg-P ha⁻¹ every three years; bare fallow plots had been maintained crop- and weed-free by regular tilling for 52 years; and grassland plots had been maintained as a managed sward of mixed grasses for more than 200 years, mowed twice during summer months. All plots are now considered to be in quasi-equilibrium (Wu et al., 2012). Physical and biological data have already been reported for these soils (Table 1).

Phosphorus chemistry in soils
We employed NaOH-EDTA extraction (Bowman and Moir, 1993) to estimate the amounts of orthophosphate and phytate in each soil. Total P in extracts was assessed by ICP-OES. Phytate and orthophosphate concentrations were assessed by suppressed ion conductivity HPLC (SIC-HPLC). Details are given in Supplementary Information 1 (SI1).

Growth and activity of soil microbial communities in response to added phytate
To assess the ability of microbial communities in the soils to mineralize and grow on phytate as the sole source of P (20 mM), we followed cell growth by optical density at 600 nm in triplicate, laboratory-based cultures in a phosphate-free defined mineral medium to which phytate was added. We also used a microwell-based approach (Rowe et al., 1977) using the same medium to estimate the maximum likelihood most probable number (MPN) of cultivable phytase-expressing microorganisms calculated following a procedure described by Russek and Colwell (1983). We compared the number of wells showing growth at dilutions of 10⁻² to 10⁻⁵-fold (n = 8 for each soil at each dilution) and included dilution series of soils and medium to which no phytate was added as a negative control. An independent experiment was performed for each land-use replicate (n = 3) and the data combined for MPN calculation. Details of culture media are given in SI2.
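For readers unfamiliar with maximum-likelihood MPN estimation, the calculation can be sketched as follows. This is the generic textbook MPN likelihood equation solved by bisection, and is not claimed to reproduce the exact procedure of Russek and Colwell (1983); the function name, argument names and example well counts are hypothetical.

```python
import math


def mpn_mle(amounts, n_wells, n_positive, lo=1e-9, hi=1e12, iters=200):
    """Maximum-likelihood MPN per unit of inoculum.

    amounts:    inoculum per well at each dilution (e.g. g soil).
    n_wells:    number of wells at each dilution.
    n_positive: number of wells showing growth at each dilution.
    Assumes organisms are Poisson-distributed among wells.
    """
    if sum(n_positive) == 0:
        return 0.0
    if all(p == n for p, n in zip(n_positive, n_wells)):
        raise ValueError("all wells positive: the estimate is unbounded")
    total = sum(n * v for n, v in zip(n_wells, amounts))

    def score(lam):
        # Derivative of the log-likelihood (up to sign); decreasing in lam.
        s = 0.0
        for p, v in zip(n_positive, amounts):
            if p > 0:
                s += p * v / -math.expm1(-lam * v)
        return s - total

    # Bisection on a logarithmic scale for the root of score(lam) = 0.
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


# Hypothetical plate: 8 wells at each of four tenfold dilutions.
estimate = mpn_mle([1e-2, 1e-3, 1e-4, 1e-5], [8, 8, 8, 8], [8, 6, 2, 0])
```

With a single dilution the estimate reduces to the closed form -ln(1 - p/n)/v, which provides a simple check on the solver.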

DNA extraction and metagenome sequencing
Soil was collected from triplicate plots for each land-use in October 2011 to a depth of 10 cm using a 3 cm diameter corer. The top 2 cm of soil containing root mats and other plant detritus was discarded. Ten cores per plot were pooled and thoroughly mixed whilst sieving through a 2 mm mesh; samples were then frozen at -80 °C. All implements were cleaned with 70% ethanol between sampling/sieving soil from each plot. Soil community DNA was extracted from a minimum of 2 g soil using the MoBio PowerSoil® DNA isolation kit (Mo Bio Laboratories, Inc. Carlsbad, CA) with three replicates for each land-use soil. When necessary, extracts from separate 2 g aliquots from the same replicate were pooled to provide sufficient material for sequencing. 10 μg of high-quality DNA was provided for sequencing for each of the nine plots. Shotgun metagenomic sequencing of DNA from each soil was provided by Illumina® (Cambridge, UK) using a HiSeq™ 2000, generating 150 bp paired-end reads. For analysis, sequences were limited to a quality threshold of 25 and minimum read length of 100 bp using FASTX-Toolkit (version 0.0.13.2, http://hannonlab.cshl.edu/fastx_toolkit/index.html).

Metagenome analysis
To assess general abundance of organisms in the metagenomes, we mapped individual metagenomic sequences to the non-redundant (nr) protein database of NCBI (downloaded 22 August 2016) using DIAMOND ver 0.8.27 (Buchfink et al., 2015) in BLASTX mode and using a bitscore cut-off of 55. For each sequence, only the match with the highest bitscore was considered. Sequences not matching the nr database were considered currently unclassified. For classification of sequences with more than one nr entry with an equal bitscore, the weighted Lowest Common Ancestor (LCA) algorithm of MEGAN ver 6.5.8 (Huson et al., 2007;Huson et al., 2016) was employed using default parameters. The LCA algorithm assigns species-specific sequences to specific taxa. Sequences that are conserved across different species (e.g. as a consequence of horizontal gene transfer) are only assigned to taxa of higher rank (Huson and Mitra, 2012). Taxonomic comparison between the metagenomes was then made using MEGAN.
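The best-hit and common-ancestor logic can be illustrated with a short Python sketch. It is a simplified, unweighted lowest common ancestor and does not reproduce MEGAN's weighted LCA; the function names, the tuple layout of the hits and the example lineages are hypothetical, and the sketch assumes that a hit with a bitscore exactly equal to the cut-off is retained.

```python
def best_hits(hits, min_bitscore=55.0):
    """Keep, per read, the top-scoring subject(s) at or above the cut-off.

    hits: iterable of (read_id, subject_id, bitscore) tuples.
    Returns {read_id: (top_bitscore, [subject_ids tied at that score])}.
    """
    best = {}
    for read_id, subject_id, bitscore in hits:
        if bitscore < min_bitscore:
            continue
        if read_id not in best or bitscore > best[read_id][0]:
            best[read_id] = (bitscore, [subject_id])
        elif bitscore == best[read_id][0]:
            best[read_id][1].append(subject_id)
    return best


def lowest_common_ancestor(lineages):
    """Deepest taxon shared by all lineages (each listed root to leaf)."""
    shared = "root"
    for ranks in zip(*lineages):
        if len(set(ranks)) != 1:
            break
        shared = ranks[0]
    return shared


# Hypothetical hits: read r1 has two subjects tied at the top bitscore.
hits = [("r1", "A", 60.0), ("r1", "B", 60.0), ("r1", "C", 57.0), ("r2", "D", 40.0)]
top = best_hits(hits)
taxon = lowest_common_ancestor([
    ["Bacteria", "Proteobacteria", "Gammaproteobacteria"],
    ["Bacteria", "Proteobacteria", "Alphaproteobacteria"],
])
```

Here read r2 falls below the cut-off and would be treated as unclassified, while the tied subjects of r1 resolve only to the phylum they share.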
To examine the relative abundance and diversity of specific functional genes associated with organic P cycling, we adopted an assembly-free, gene-centric approach, MApPP (Metagenomics/transcriptomics Assignment pHMM Phylogenetic Placement), to analyse the abundance and phylogenetic diversity of phosphatase genes associated with microbial communities in the different soils. Archetypal proteins for which positive activity has been demonstrated (Table 2) were used as a starting point to generate sets of reference proteins for each phosphatase. Details of MApPP analysis steps are provided in SI3. Subcellular localisation of each reference protein was predicted using PSORTb (Yu et al., 2010); the predictions are available upon request from the corresponding author. Nucleotide sequences corresponding to each protein sequence (available upon request from the corresponding author) were then used to query metagenome sequence datasets. To allow meaningful comparison between metagenomic datasets, gene abundance was expressed as a proportion of the estimated total number of genomes in each dataset, assessed by estimating the abundance of the ubiquitous, single-copy genes rpoB, recA, gyrB (Santos and Ochman, 2004) and atpD (Gaunt et al., 2001). Nucleotide sequence-based pHMMs were developed for each gene as described above. Metagenome-derived homologue counts for each single-copy gene were size-normalized to the length of the shortest gene, recA, to account for differences in length between the genes. To do this, the modal length of recA (1,044 nt) was divided by the modal length of each of the other single-copy genes (1,422 nt for atpD, 2,415 nt for gyrB, 4,029 nt for rpoB), and this value was then multiplied by the corresponding single-copy gene count.
The size-normalized abundance of each target phosphatase gene was then calculated for each soil as [target gene count × read length / (mean normalized counts of single-copy genes)] (Howard et al., 2008). Metagenome sequences identified by our analyses have been deposited at the MG-RAST database (http://metagenomics.anl.gov/).
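The single-copy gene normalization described above can be written out as a short Python sketch. The modal gene lengths are those given in the text; the function names and the example read counts are hypothetical, and the sketch covers only the denominator of the abundance expression (the mean normalized single-copy gene count), not the full calculation.

```python
# Modal gene lengths in nucleotides, as stated in the text.
MODAL_LENGTH_NT = {"recA": 1044, "atpD": 1422, "gyrB": 2415, "rpoB": 4029}


def normalize_single_copy_counts(counts, reference="recA"):
    """Scale each gene's homologue count to the length of the reference gene."""
    ref_len = MODAL_LENGTH_NT[reference]
    return {
        gene: n * ref_len / MODAL_LENGTH_NT[gene]
        for gene, n in counts.items()
    }


def mean_normalized_count(counts):
    """Mean of the length-normalized single-copy gene counts."""
    normalized = normalize_single_copy_counts(counts)
    return sum(normalized.values()) / len(normalized)


# Hypothetical counts chosen in proportion to gene length.
example_counts = {"recA": 1044, "atpD": 1422, "gyrB": 2415, "rpoB": 4029}
genome_estimate = mean_normalized_count(example_counts)
```

Because the example counts are proportional to gene length, every gene normalizes to the same value, which is the behaviour expected of single-copy genes sampled evenly from a metagenome.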

Supporting information
Additional Supporting Information may be found in the online version of this article at the publisher's website:
Fig. S1. Co-elution of myo- and scyllo-insP 6 standards with soil sample components. Fallow soil extract was analysed with or without added 2.5 μM myo- or scyllo-insP 6 .
Fig. S2. Suppressed ion conductivity profiles of soil extracts. NaOH-EDTA extracts of soils managed as bare fallow, grassland and arable, diluted 10-fold with water and analysed by suppressed ion conductivity HPLC.
Fig. S3. Soil extract components are labile to Escherichia coli phytase. A fallow soil extract was analysed: without addition; after addition of 200 μM myo-insP 6 ; after addition of 200 μM myo-insP 6 and incubation with E. coli phytase; or after addition of enzyme alone. Orthophosphate resulting from dephosphorylation of insP 6 is identified by Pi. All samples were diluted 10-fold before HPLC.
Fig. S4. Separation of stereoisomers of inositol hexakisphosphate. Aliquots (20 μL) of 10 μM inositol hexakisphosphate were analysed by suppressed ion conductivity HPLC: D-chiro-, myo-, neo- and scyllo-insP 6 .
Fig. S5. Response of soil microbial communities to phytate. Growth (bottom) and orthophosphate (P i ) accumulation (top) in laboratory batch cultures inoculated with 100 mg of air dried soil from each treatment in which phytate was the only added source of phosphorus. P i was assessed using spectrophotometric detection based upon the P i ColorLock™ Gold phosphate detection system (Innova Biosciences Ltd., Cambridge, UK) following the manufacturer's instructions.