β-Glucan is a major growth substrate for human gut bacteria related to Coprococcus eutactus

A clone encoding carboxymethyl cellulase activity was isolated during functional screening of a human gut metagenomic library using Lactococcus lactis MG1363 as heterologous host. The insert carried a glycoside hydrolase family 9 (GH9) catalytic domain with sequence similarity to a gene from Coprococcus eutactus ART55/1. Genome surveys indicated a limited distribution of GH9 domains among dominant human colonic anaerobes. Genomes of C. eutactus-related strains harboured two GH9-encoding and four GH5-encoding genes, but the strains did not appear to degrade cellulose. Instead, they grew well on β-glucans and one of the strains also grew on galactomannan, galactan, glucomannan and starch. Coprococcus comes and Coprococcus catus strains did not harbour GH9 genes and were not able to grow on β-glucans. Gene expression and proteomic analysis of C. eutactus ART55/1 grown on cellobiose, β-glucan and lichenan revealed similar changes in expression in comparison to glucose. On β-glucan and lichenan only, one of the four GH5 genes was strongly upregulated. Growth on glucomannan led to a transcriptional response of many


Introduction
Dietary fibre originates from plant cell wall polysaccharides including cellulose, hemicellulose and pectins (Flint et al., 2012a). Their molecular structure is highly heterogeneous due to the presence of different monosaccharides, which are bound by a variety of glycosidic bonds. They are recalcitrant to digestion in the small intestine and reach the colon, where they serve as a substrate for microbial fermentation, which leads to the formation of short-chain fatty acids (SCFAs, mainly acetate, propionate and butyrate) and gases (Flint et al., 2012a). Indigestible plant storage polysaccharides and oligosaccharides are also widely regarded as belonging to the fibre fraction of foods, as they reach the large intestine intact (Howlett et al., 2010; Slavin, 2013). An adequate supply of fibre and its efficient degradation is essential in fuelling the numerous health-promoting actions of the gut microbiota, in particular the provision of beneficial SCFAs (Flint et al., 2012b; Louis and Flint, 2017). The fermentability of hemicelluloses, consisting of different types of polysaccharides (arabinoxylans, β-glucans, mannans, xyloglucans, etc.) is relatively high in the human gut and has been reported to rely on hydrolytic action of specific isolates belonging to many genera, including Roseburia and Bacteroides (Flint et al., 2012a; Sheridan et al., 2016; Tuncil et al., 2017). In contrast, the breakdown of cellulose by the human gut microbiota is less efficient and appears to be restricted to few species (Cann et al., 2016). Ruminococcus champanellensis is closely related to Ruminococcus flavefaciens, a major cellulose degrader in the herbivore GI tract (Flint et al., 2008; Chassard et al., 2011). Cellulolytic activity was also reported for a Bacteroides species isolated from the human gut (Robert et al., 2007).
There is evidence for interindividual variation in cellulose-degrading gut microbes, which appears to correlate with whether methanogenic Archaea are present in an individual (Chassard et al., 2010).
The colonic microbiota is highly complex and our knowledge to date of which microbes are instrumental in fibre breakdown is incomplete. Numerous carbohydrate-active enzymes involved in fibre breakdown have been characterized and large human gut-derived data sets are available for genomic and metagenomic mining. However, it is often difficult to deduce function from sequence alone, as many glycoside hydrolase families comprise enzymes with different substrate specificities (Lombard et al., 2014). Functional metagenomics, which relies on functional expression of environmental genes in a heterologous host, can be utilized to identify microbes that are involved in fibre breakdown in the gut and reveal novel enzymatic functions. This approach has successfully been applied to human gut microbiota (Tasse et al., 2010;Cecchini et al., 2013). The heterologous expression host commonly used is Escherichia coli, which has been reported to successfully express genes from a wide range of organisms (Handelsman, 2004). However, an in silico analysis of 32 prokaryotic genome sequences for the presence of expression signals functional in E. coli suggests that only approximately 40% of genes would be successfully expressed in this host, with extensive variation (7%-73%) between different organisms (Gabor et al., 2004). Furthermore, post-translational processes, such as protein folding, insertion into the cell membrane or secretion from the cell may also differ between different microbes. The use of alternative expression hosts for metagenomic libraries may therefore improve the recovery of novel genes from metagenomic libraries, which has been demonstrated for functional metagenomic studies from other environments (McMahon et al., 2012).
Here, we report the comparative analysis of E. coli XL1 Blue and Lactococcus lactis MG1363 as hosts for functional screening of a human faecal metagenomic library on a range of different dietary carbohydrates. L. lactis belongs to low %G + C Firmicutes, is well-characterized and widely used as an alternative host to E. coli for heterologous gene expression, and genetic tools and vectors are available (Pontes et al., 2011). This led to the identification of a clone carrying a glycoside hydrolase (GH) family 9 gene with sequence identity to Coprococcus eutactus ART55/1. As this GH family is usually associated with cellulose breakdown, we investigated the breakdown of beta-linked glucans in this strain and its gene and protein expression response to growth on different substrates.

Results
Screening for glycoside hydrolase activities from a metagenomic library in Escherichia coli and Lactococcus lactis
A human gut microbiome metagenomic library (6146 clones, average insert size estimate 2.5 kb), constructed in shuttle vector pTRKL2 and transformed into E. coli XL1 Blue, was functionally screened for glycoside hydrolase (GH) activities using seven carbohydrate substrates. Enzyme activity was confirmed in 16 clones after re-streaking on the respective media. Positive clones were found for all substrates apart from polygalacturonic acid and rhamnopyranoside, with the highest number of clones on starch and carboxymethyl cellulose (Table 1). The comparative analysis of most sequenced inserts from positive clones showed a high level of identity to sequences from a variety of Bacteroidetes and Firmicutes bacteria of gut origin (Table 1 and Supporting Information Table S1).
Table 1 footnotes: a. Substrates used for screening: AF, 4-methylumbelliferyl α-L-arabinofuranoside; S, potato starch; CMC, carboxymethyl cellulose; L, lichenan; X, oat spelt xylan. No clones with activity on polygalacturonic acid or 4-methylumbelliferyl α-L-rhamnopyranoside were detected. Level of activity detected is based on visual inspection of clearing zones on substrate-containing plates. b. Full details of blast results of clone sequences are given in the Supporting Information Table S1.
Sequence analysis of open reading frames revealed homology with enzymes with the expected substrate specificity for some clones (e.g. P3H22 and P3B15, detected on α-L-arabinofuranoside; P5H21 and P1E14, detected on starch) whereas other clone sequences harboured less well-characterized open reading frames (Table 1 and Supporting Information Table S1). The E. coli XL1 Blue library was pooled and transferred into L. lactis MG1363 (4608 clones, insert frequency estimate 75%, average insert size estimate 2.5 kb) and functionally screened on all seven carbohydrate substrates, which resulted in a total of three positive clones on the β-linked carbohydrates carboxymethyl cellulose (CMC), lichenan and xylan. Sequencing analysis revealed that all three clones contained identical insert sequences but were different from the single positive clone found on the same substrates in E. coli XL1 Blue (Table 1 and  Supporting Information Table S1). One of the clones (P20A8) was transferred into E. coli XL1 Blue, but showed only very weak enzyme activity in this host. The 16 positive clones detected in E. coli XL1 Blue were also transformed into L. lactis MG1363, but none of them displayed enzyme activity in this host (data not shown). Clone P20A8 showed the presence of a truncated ORF with 100% sequence identity to C. eutactus ART55/1 gene CBK83841.1 (Fig. 1A). It contained a GH9 catalytic domain, suggesting that it encodes a β-glucanase.

Distribution of β-glucanase gene families among human colonic bacteria
Following detection of the GH9 catalytic domain containing clone P20A8 from C. eutactus ART55/1, we performed in silico analysis of β-glucanase gene families among human colonic bacteria. Glycoside hydrolases that break down β-(1,4) linkages in glucan chains belong to multiple GH families, notably GH5, GH8, GH9, GH16, GH44 and GH48 (CAZy database at www.cazy.org; Lombard et al., 2014). Figure 1B shows the distribution of the best characterized β-glucanase gene families across genomes available for selected human colonic bacteria from CAZy spanning Firmicutes, Bacteroidetes and Actinobacteria. Only GH5 is widely distributed (16/25 genomes) and this family is known to include enzymes with a very diverse range of specificities. In contrast, GH48 (generally encoding cellobiohydrolases) occurs only in the cellulolytic species R. champanellensis while GH44 (which has been implicated in xyloglucan utilization) is limited to two species of Ruminococcus. The GH9 family, which includes many bacterial cellulases among cellulolytic bacteria from gut and non-gut habitats, also shows a limited distribution among human colonic anaerobes (Fig. 1B). This prompted us to investigate the characteristics of Coprococcus-related isolates and their GH9 genes further.
Fig. 1. A. GH9-domain containing genes (short genes designated GH9/S and long genes GH9/L) in C. eutactus ART55/1 and Coprococcus sp. L2-50. Accession numbers and deduced length in amino acids (aa) are given in brackets. The start codon for L2-50_GH9/L was reassigned to position 18 of WP_008401367.1 based on the presence of a ribosome-binding site (GGAAG, eight nucleotides upstream) and a signal peptide motif. The line below ART_GH9/L indicates the region covered by clone P20A8 from the metagenomic library. Domain structure predictions are based on PFAM, PROSITE, InterPro and SMART database searches. B. Genome carriage of glycoside hydrolase (GH) gene families associated with β-glucanase activity in human gut bacteria.
Phylogenetic analysis, based on multiple amino acid alignments of the catalytic domains, showed that ART_GH9/S and L2-50_GH9/S have >50% sequence identity to GH9 enzymes from other Clostridiales species including Lachnospiraceae bacteria, Butyrivibrio and Ruminococcus species within theme D of GH9 cellulases (Supporting Information Fig. S1). All of these enzymes display modular architecture similar to ART_GH9/S and L2-50_GH9/S with the presence of a carbohydrate-binding domain (CBM4_9, pfam02018), Ig-like domain (cd02850) and glycosyl hydrolase family 9 catalytic domain (pfam00759). The amino acid sequences of the GH9/L catalytic modules from both strains showed a lower level of identity (40%-50%) to other Clostridiales species belonging to Lachnospiraceae and Ruminococcus (Supporting Information Fig. S1).
The genus Coprococcus within the Lachnospiraceae family of Firmicutes contains three species, however, they are not phylogenetically closely related (Fig. 2). Sequence similarity searches of both GH9 genes identified in C. eutactus ART55/1 against reference genomes of C. eutactus, Coprococcus catus and Coprococcus comes revealed that both genes were present in eight C. eutactus strains including the type strain C. eutactus ATCC 27759 (ART_GH9/S, CBK83282.1, query coverage >91%, sequence identity >75%; ART_GH9/L, CBK83841.1, query coverage >97%, sequence identity >68%). No significant similarity was found with either GH9 gene in the C. catus and C. comes genomes. CAZyme analysis of several genomes of different Coprococcus species further confirmed that the GH9 genes were only present in C. eutactus, whereas in the other species, no β-glucanase-related genes were found apart from a single GH5 gene in one of the examined C. comes strains (Fig. 1B).
Catalytic activities of the GH family 9 enzymes from C. eutactus ART55/1 and Coprococcus sp. L2-50
The full length GH9 genes including predicted promoter sites were cloned in the same orientation into shuttle vector pTRKL2 and transformed into E. coli XL1 Blue and L. lactis MG1363. The transformants exhibited enzyme activity on substrate-containing plates using the Congo Red detection method (Fig. 3A). The enzymes from Coprococcus sp. L2-50 were functionally expressed in both hosts and showed activity on CMC- and lichenan-containing plates. In contrast, the enzymes from C. eutactus ART55/1 were functionally expressed only in L. lactis MG1363. The E. coli XL1 Blue transformants were devoid of lichenase activity and showed only limited activity on CMC-containing plates (Fig. 3A). Analysis of codon usage of the two GH9 genes from C. eutactus ART55/1 revealed that differences in codon usage may be responsible for their poor expression in E. coli XL1 Blue, as two codons of infrequent use in E. coli (Zhang et al., 1991) were present at relatively high frequency (ART_GH9/S AGA 14.5/1000, AUA 25.1/1000; ART_GH9/L AGA 9.5/1000, AUA 26.9/1000). The Coprococcus sp. L2-50 genes, on the other hand, had mostly lower frequencies (L2-50_GH9/S AGA 11.8/1000, AUA 0/1000; L2-50_GH9/L AGA 6.6/1000, AUA 7.7/1000) in line with their higher expression levels in E. coli XL1 Blue. The enzyme catalytic activities were also examined in supernatants and cell-free extracts of grown cultures using a reducing sugar assay, which agreed with the plate assay results. Enzyme activities for E. coli XL1 Blue transformants were mainly associated with the cell-free extracts, whereas activities from L. lactis MG1363 cultures were mainly detected in the supernatant fraction (Supporting Information Table S2). Thus, the cloned enzymes appear to be secreted by L. lactis MG1363.
Fig. 2. [...] (Saitou and Nei, 1987). Bootstrap values (Felsenstein, 1985) from 500 replications are shown at branches and evolutionary distances were computed using the Maximum Composite Likelihood method (Tamura et al., 2004) using Mega X (Kumar et al., 2018).
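The per-thousand codon frequencies quoted above (e.g. AGA 14.5/1000) can be illustrated with a minimal Python sketch. It assumes an in-frame coding sequence given as DNA. The function name `codon_frequency_per_thousand` and the toy sequence are hypothetical and are not taken from the study, whose actual analysis tool is not stated here.

```python
from collections import Counter


def codon_frequency_per_thousand(cds: str) -> dict[str, float]:
    """Return each codon's frequency per 1000 codons for an in-frame CDS.

    Assumption: `cds` starts at the first base of the start codon.
    Codons are reported in RNA notation (U instead of T), matching the
    notation used in the text (AGA, AUA).
    """
    seq = cds.upper().replace("T", "U")
    usable = len(seq) - len(seq) % 3          # ignore a trailing partial codon
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    total = len(codons)
    if total == 0:
        return {}
    counts = Counter(codons)
    return {codon: 1000.0 * n / total for codon, n in counts.items()}


# Hypothetical toy sequence of 10 codons: two AGA codons and one ATA codon.
toy_cds = "ATGAGAAAAATAGGTAGACTGGAAGCTTAA"
freq = codon_frequency_per_thousand(toy_cds)
print(freq["AGA"], freq["AUA"])
```

Applied to a real gene, the values for AGA and AUA could then be compared between the ART55/1 and L2-50 genes in the same way as in the paragraph above.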
To investigate the functionality of the different domains present in the shorter GH9 gene that was strongly conserved between both strains (Fig. 1A), different constructs of L2-50_GH9/S that lacked the carbohydrate-binding domain and/or the Ig-like domain were generated (Fig. 3B). The constructs were cloned and overexpressed in E. coli BL21 (DE3), and enzyme activities on various substrates were determined. Enzyme activities of the construct lacking the CBM did not differ from the complete enzyme, but the deletion of the Ig-like domain abolished catalytic activity (Fig. 3B). The highest activity was found on β-glucan, followed by lichenan.

Growth of Coprococcus species on different carbohydrates
We examined the ability of C. eutactus ART55/1 and Coprococcus sp. L2-50 to utilize a wide range of carbohydrate substrates by measuring optical density and a drop in pH. Coprococcus sp. L2-50 exhibited growth on glucose, cellobiose, β-glucan and lichenan as well as very limited growth on potato starch, but no utilization could be detected on laminarin, glucomannan, galactomannan, mannan, galactan, xylan, xyloglucan, arabinoxylan and pullulan (Fig. 4A and Supporting Information Fig. S2). In addition to the substrates utilized by Coprococcus sp. L2-50, C. eutactus ART55/1 was also able to grow on glucomannan, galactomannan, galactan and potato starch and a limited pH drop was also detected on mannan (Fig. 4 and Supporting Information Fig. S2). Medium pH was tracked for up to 11 days on the insoluble substrates Sigmacell type 50, acid-swollen cellulose and filter paper for both C. eutactus-related strains. No decrease in pH was seen (data not shown), indicating a failure to utilize these substrates. Thus, both strains are non-cellulolytic but are able to utilize certain soluble β-glucans, suggesting a possible role for the GH9 enzymes given their activity against these substrates (Fig. 3). Substrate utilization was also examined for further strains from the three different Coprococcus species, which revealed that C. eutactus ATCC 27759 showed a very similar behaviour to C. eutactus ART55/1, but the three C. comes strains showed good growth only on glucose and C. catus GD/7 only showed very limited growth on potato starch (Supporting Information Fig. S2). This is in agreement with both the phylogenetic placement of the species as well as with the absence of the respective glycoside hydrolase genes in those strains.
Fig. 3. A. [...] eutactus ART55/1 and Coprococcus sp. L2-50 GH9 genes on lichenan- or CMC-containing agar plates after overnight incubation of 10 μl of a freshly grown overnight culture and staining with Congo Red. B. Enzyme activity of different constructs of Coprococcus sp. L2-50 L2-50_GH9/S missing either the carbohydrate binding and/or Ig-like domain (for domain designations see Fig. 1). Constructs were cloned and overexpressed in E. coli BL21 (DE3). Release of reducing sugars from lichenan, β-glucan and CMC was determined by Lever assay over 1 h of incubation and background activity of E. coli BL21 (DE3) carrying the empty vector was subtracted. Mean and standard deviation of three replicates.
Induction of β-glucanase activity during growth on β-glucan and transcriptional response to growth on different carbohydrates in C. eutactus ART55/1
Coprococcus eutactus ART55/1 and Coprococcus sp. L2-50 were grown to early stationary phase on either glucose, cellobiose or β-glucan to investigate whether β-glucanase activity was inducible in these strains. β-glucanase activity was much higher for cell extracts and supernatants from cultures grown on β-glucan than on glucose or cellobiose, especially for C. eutactus ART55/1 (Supporting Information Table S3). To identify differentially expressed genes (DEGs) during growth on β-glucan-type carbohydrates, C. eutactus ART55/1 grown on glucose was inoculated into medium containing glucose, cellobiose, β-glucan, lichenan or glucomannan and grown to exponential phase (Supporting Information Table S4) for RNA extraction and transcriptomic analysis by RNA sequencing. Principal component analysis (PCA) of transcript changes showed that C. eutactus ART55/1 cultures grown on glucomannan clustered most distinctly, whereas more similarity was seen between cellobiose, β-glucan and lichenan incubations (Fig. 5A). Significant DEGs [false discovery rate (FDR) <0.05 and log-fold change (LogFC) >1] compared to glucose as baseline were determined for all carbon sources. Glucomannan showed 197 DEGs compared to glucose specific to this substrate, whereas the other three substrates each only exhibited between two and seven specific DEGs (Fig. 5B). For all DEGs per substrate (including those shared with other substrates), on glucomannan almost half of the genes were upregulated (44%) relative to glucose, whereas on the other three substrates almost three quarters of the genes were downregulated (cellobiose 71%, β-glucan 72%, lichenan 68%, Fig. 5B).
This trend was also reflected in the magnitude of the response (maximum fold change of upregulated DEGs 3.6, 20.7, 16.7 and 560.7 on cellobiose, β-glucan, lichenan and glucomannan, respectively, downregulated DEGs 273.8, 295.0, 231.9 and 40.6, respectively, Supporting Information Table S5).
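The DEG selection and the up/down percentages described above can be sketched in a few lines of Python. This is a minimal illustration. It assumes that the LogFC cutoff applies to the absolute value, since both up- and downregulated genes are reported. The names `GeneResult`, `classify_degs` and `percent_down`, and all values in `toy_results`, are hypothetical and do not come from the study's data.

```python
from typing import NamedTuple


class GeneResult(NamedTuple):
    gene_id: str
    log_fc: float   # log-fold change relative to the glucose baseline
    fdr: float      # false discovery rate (adjusted P value)


def classify_degs(results, fdr_cutoff=0.05, log_fc_cutoff=1.0):
    """Split significant DEGs into up- and downregulated gene lists.

    Assumption: a gene is a DEG if FDR < cutoff and |logFC| > cutoff.
    """
    up, down = [], []
    for r in results:
        if r.fdr < fdr_cutoff and abs(r.log_fc) > log_fc_cutoff:
            (up if r.log_fc > 0 else down).append(r.gene_id)
    return up, down


def percent_down(up, down):
    """Percentage of DEGs that are downregulated (0.0 if there are none)."""
    total = len(up) + len(down)
    return 100.0 * len(down) / total if total else 0.0


# Hypothetical results for one substrate versus glucose.
toy_results = [
    GeneResult("gene_a", 4.2, 0.001),
    GeneResult("gene_b", -3.1, 0.010),
    GeneResult("gene_c", -1.5, 0.030),
    GeneResult("gene_d", 0.4, 0.001),   # significant, but below the logFC cutoff
    GeneResult("gene_e", -2.7, 0.200),  # large change, but not significant
    GeneResult("gene_f", -6.0, 0.004),
]
up_genes, down_genes = classify_degs(toy_results)
print(up_genes, down_genes, percent_down(up_genes, down_genes))
```

Running the same classification once per substrate would give the per-substrate DEG sets, whose overlaps are the kind of counts summarized in a Venn-style comparison.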
Thirty-eight genes were significant across all four comparisons, with an additional 25 shared between β-glucan, cellobiose and lichenan (Fig. 5B). Interestingly, the complete data set of all 299 DEGs showed a very strong positive correlation for all pairwise comparisons between cellobiose, β-glucan and lichenan (P < 0.0001) but not for comparisons with glucomannan (Fig. 5C and Supporting Information Fig. S3). Many of the DEGs significant on all four substrates responded in the opposite direction on glucomannan relative to each of the other substrates (Fig. 5C and Supporting Information Fig. S3, data labelled in black). Gene ontology (GO) enrichment analysis identified several GO terms that were significantly over-represented in the DEGs. These were mostly related to localization, transport, carbohydrate metabolic processes and hydrolase activity (Supporting Information Table S6). Three of the four genes strongly downregulated on all four substrates code for a CUT1 family ABC transporter with the fourth gene encoding a GH77 (4-α-glucanotransferase). On β-glucan, lichenan and to a lesser degree cellobiose, another ABC transporter belonging to the CUT2 family was also significantly downregulated. On glucomannan, on the other hand, an ABC transporter annotated as a multidrug transport system was strongly upregulated (Supporting Information Table S5).
The genome of C. eutactus ART55/1 contains 45 glycoside hydrolase genes according to the CAZy database (Supporting Information Table S7), 19 of which showed a significant change in gene expression (Fig. 6). One of the four GH5 genes present in the genome was strongly upregulated on β-glucan and lichenan and to a lesser degree on glucomannan. One of the two GH9 genes (GH9/L) was significantly upregulated on all four substrates, with glucomannan showing the strongest response. For most other GH genes, the response was opposite for glucomannan compared to the other substrates, with the vast majority being upregulated on glucomannan. The strongest upregulation was observed for all five genes assigned to GH families predominantly involved in mannan degradation (GH26, GH113 and GH130; Fig. 6). All ribosomal proteins were significantly reduced (FDR <0.05) on glucomannan and several reached a logFC of over −1, whereas none of the other substrates showed any significant changes relative to glucose (Supporting Information Table S5). This is in agreement with the lower growth rate observed on glucomannan (Supporting Information Table S4).
Fig. 4. A. Medium pH drop after growth in 96-well plates after 48 h of incubation (mean and standard deviation of triplicate cultures). Growth on cellulose was carried out in Hungate tubes to enable longer incubation periods, but the pH did not change after 11 days of incubation. B. Representative growth curves of C. eutactus ART55/1 during growth in 96-well plates. No or very little growth was observed on laminarin, xylan, xyloglucan, arabinoxylan and pullulan (OD <0.07).
Proteomic response to growth on different carbohydrates in C. eutactus ART55/1
Bacterial cells from the cultures used for gene expression analysis were also subjected to proteomics analysis by mass spectrometry. In total, 891 C. eutactus ART55/1 proteins were identified. Label-free quantification (LFQ) values were analysed for differential abundance with a linear model identical to the gene expression analysis with glucose as the baseline and all proteins with adjusted significance values below 0.05 were regarded as differentially abundant. PCA revealed strikingly similar relationships between the different samples as was found for the gene expression data (Fig. 7A). A comparison of LFQ values with reads per kilobase per million mapped reads (RPKM) values from the gene expression analysis showed a strong positive trend between the two datasets (Fig. 7B) despite the fact that they showed only a partial overlap in significantly differentially expressed/abundant genes and proteins (Supporting Information Fig. S4), likely reflecting post-transcriptional networks regulating protein expression (Vogel and Marcotte, 2012). As for the gene expression data, glucomannan resulted in the largest difference in the proteome compared to glucose, and the overall distribution of shared proteins between different substrates was similar as well (Fig. 7C). Several expression changes identified at the transcript level were confirmed at the proteome level, including the upregulated GH5 and some of the proteins comprising the downregulated CUT1 and CUT2 ABC transporters on β-glucan-type substrates, as well as some of the carbohydrate active enzymes upregulated on glucomannan (Supporting Information Table S5).
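For readers unfamiliar with the RPKM unit used in the transcript versus protein comparison, the standard definition (reads per kilobase of gene length per million mapped reads) can be written as a short Python sketch. The function name `rpkm` and the example numbers are hypothetical. The sketch shows the general definition only and is not the study's actual quantification pipeline.

```python
def rpkm(read_count: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of gene per million mapped reads.

    RPKM = reads / (gene length in kb * total mapped reads in millions)
         = reads * 1e9 / (gene length in bp * total mapped reads)
    """
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical example: 500 reads on a 2000 bp gene in a library of
# 10 million mapped reads.
example_rpkm = rpkm(500, 2000, 10_000_000)
print(example_rpkm)
```

Because RPKM normalizes for both gene length and sequencing depth, it gives a per-gene abundance measure that can be set against a per-protein measure such as LFQ intensity.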

Discussion
Functional metagenomic screening was employed here to identify genes involved in carbohydrate breakdown from human faecal microbiota. A comparison of the suitability of two cloning hosts, E. coli XL1 Blue and L. lactis MG1363, revealed large differences in their functional expression of heterologous genes. Several active clones from a wide range of bacteria within both the Bacteroidetes and Firmicutes were recovered from the E. coli XL1 Blue library, whereas only a single clone with β-glucanase activity was recovered after electroporation of the E. coli XL1 Blue library into L. lactis MG1363. Thus, L. lactis MG1363 appears to be quite limited in its ability to successfully express genes from other organisms, and alternative Gram-positive hosts (e.g. Bacillus subtilis, Dobrijevic et al., 2013) may be more suitable for future functional metagenomic studies. This study does, however, also show the limitations in using E. coli as heterologous host, as the clone expressing very high activity in L. lactis MG1363 (P20A8) did not exhibit good activity in E. coli XL1 Blue. E. coli has been estimated to express approximately 40% of enzymatic activities from diverse microbial origins based on the analysis of expression signals in microbial genomes and is strongly biased towards genes from certain groups of organisms (Uchiyama and Miyazaki, 2009). Differences in codon usage between C. eutactus ART55/1 and E. coli XL1 Blue likely contribute to the poor expression of the two GH9 genes in E. coli. Protein export from the cell can also significantly affect successful heterologous expression (Freudl, 2018). Fractionation of grown cultures and enzyme activity measurements showed that E. coli XL1 Blue was not able to export the extracellular enzymes efficiently, whereas the activity was mainly detected in the culture supernatant in L. lactis MG1363. 
Differences in signal peptide recognition, secretion mechanism, the presence of chaperones and architecture of the cell envelope can affect secretion abilities between different organisms (Mingardon et al., 2011;Burdette et al., 2018).
The glycoside hydrolase family 9 mainly consists of cellulases (Wilson and Urbanowicz, 2019) and GH9 genes in the human cellulose degrader R. champanellensis have been shown to be involved in cellulose breakdown (Morais et al., 2016). This prompted us to investigate whether the C. eutactus-related strains ART55/1 and L2-50 are able to grow on cellulose, but this was not the case. Instead, both strains showed excellent growth on barley β-glucan and lichenan. The carriage of GH9 genes is therefore not a clear indicator of cellulose-degrading capacity in human gut bacteria. Based on sequence similarity and domain structure, ART_GH9/S and L2-50_GH9/S belong to GH9 theme D. Enzymes belonging to this group have been shown to initially cleave cellulose in a random mode and then act mainly as cellobiohydrolases (Devillard et al., 2004). The catalytic domains of ART_GH9/L and L2-50_GH9/L show sequence similarity with R. champanellensis GH9A, which was classed as an endoglucanase (Morais et al., 2016). More generally, however, GH9 enzymes have been associated with eight different catalytic specificities (EC numbers), and these include activity against mixed-linkage β-glucans, as detected here (CAZy website www.cazy.org; Lombard et al., 2014). Coprococcus sp. L2-50 did not exhibit good growth on any of the other eight polysaccharides tested, but C. eutactus ART55/1 and ATCC 27759 showed some growth on glucomannan, galactomannan, galactan and starch. Genome analysis revealed that Coprococcus sp. L2-50 contains significantly fewer glycoside hydrolase genes [30 based on database dbCAN2 (Zhang et al., 2018)] than C. eutactus ART55/1 (43 based on dbCAN2, 45 based on CAZy, www.cazy.org) (Supporting Information Table S7). In agreement with the differences in carbohydrate degradation capacity seen for the two strains (Fig. 4), no genes belonging to GH26, GH113 or GH130, which were highly upregulated in C. eutactus ART55/1 on glucomannan, were identified in Coprococcus sp. L2-50 (Supporting Information Table S7). Despite the inability of Coprococcus sp. L2-50 to degrade galactan, the genome carriage of putative β-galactanases (CAZypedia Consortium, 2018) was similar between the two strains (one GH16 in Coprococcus sp. L2-50 and one GH53 in C. eutactus ART55/1, Supporting Information Table S7). Coprococcus sp. L2-50, however, harboured only a single potential β-galactosidase (belonging to GH2), whereas C. eutactus ART55/1 encoded four (two GH1, one GH2 and one GH42). GH42 enzymes have been hypothesized to be involved in plant cell wall degradation and may work in cooperation with GH53 galactanases (Moracci, 2019).
Fig. 6. Carbohydrate-active genes differentially expressed compared to glucose in C. eutactus ART55/1 during growth on cellobiose, β-glucan, lichenan and glucomannan. Putative enzyme substrates based on CAZypedia (CAZypedia Consortium, 2018) are indicated above the GH genes. Significant up- and downregulation for each of the four growth substrates relative to glucose is indicated by blue and orange asterisks, respectively.
Gene expression and proteomic analysis of C. eutactus ART55/1 on β-glucan-type substrates compared to glucose revealed a significant increase in one of the two GH9 genes, which was strongest on glucomannan. GH5 enzymes also hydrolyse β-glucan-type linkages and four GH5 genes are present in this strain. One of these GH5 genes was found here to be strongly upregulated, in particular on β-glucan and lichenan, making it a strong candidate for involvement in their degradation. The bioinformatic analysis showed that this GH5 protein (CCU_08490) belongs to subfamily 37, which are intracellular enzymes of bacterial origin with endo-β-1,-3/4-glycanase (EC 3.2.1.4 and EC 3.2.1.73) and cellodextrinase (EC 3.2.1.74) activities (Aspeborg et al., 2012). Interestingly, most GH enzymes in both strains appear not to be secreted via typical secretion systems, as they contain no predicted signal peptides (signal peptides detected in four of 30 enzymes in Coprococcus sp. L2-50 and seven of 45 enzymes in C. eutactus ART55/1, Supporting Information Table S7). The absence of predicted signal peptides on the majority of GH enzymes was previously observed in other members of the Lachnospiraceae family (Sheridan et al., 2016). It remains to be established whether some of the GH enzymes without predicted signal peptides are membrane-associated or contain atypical secretory signal peptides (Gagic et al., 2016); however, the main ecological niche of the C. eutactus-related strains appears to be in the breakdown of soluble and shorter length β-glucans rather than complex insoluble fibre.
The relative expression levels during growth on glucose (percentage of specific gene relative to all genes, data not shown) did not reveal a big difference between the six GH5 and GH9 genes (0.011%-0.022% for the four GH5 genes (with the lowest one upregulated), 0.026 and 0.037 for the two GH9 genes (with the higher one upregulated), overall range of all genes 0.000035%-2.55%). Therefore, differences in responses to growth on polysaccharides are likely not due to differences in basal gene expression between the GH5 and GH9 genes. The glycoside hydrolases strongly upregulated on glucomannan (five genes belonging to GH26, GH113 and GH130, see Fig. 6) on the other hand had a lower basal gene expression on glucose (0.0006%-0.0093%). In general, the overall gene expression profile was very similar on cellobiose, β-glucan and lichenan and the response in comparison to glucose tended towards downregulation, with particularly strong responses for transporters likely involved in glucose transport. Thus β-glucans may be the preferred substrates for this organism, and oligosaccharides may be transported into the cell rather than glucose during growth on these substrates. The strong upregulation of genes involved in glucomannan degradation on this substrate, on the other hand, indicates that glucomannan constitutes an alternative energy source for C. eutactus ART55/1. Coprococcus sp. L2-50, a close relative of C. eutactus ART55/1, did not show good growth on any of the non-β-glucan-type substrates tested here, suggesting that bacteria related to C. eutactus may be adapted to thrive on β-glucans in the human large intestine. The negative correlation of GH77 on β-glucan, cellobiose, lichenan and particularly glucomannan in our study suggests that availability of these substrates repressed 4-α-glucanotransferase activity in C. eutactus ART55/1 associated with starch utilization (Ze et al., 2015). 
The genus Coprococcus was increased on a high resistant starch diet in pigs and its abundance was positively correlated with starch breakdown products (Sun et al., 2016). Both C. eutactus ART55/1 and Coprococcus sp. L2-50 carry several GH13 genes (Supporting Information Table S7), encoding α-amylases (Ze et al., 2015). However, reasonably good growth on potato starch was only found for C. eutactus ART55/1 (and also for the second C. eutactus strain ATCC 27759), and final optical densities were low compared to β-glucans (Supporting Information Fig. S2). The three C. comes strains and C. catus GD/7, on the other hand, did not show good growth on any of the polysaccharides examined here and thus occupy different ecological niches from C. eutactus. They may degrade polysaccharides not included here or may cross-feed from breakdown products of primary polysaccharide degraders. C. catus grows well on fructose, but also grows on lactate; thus it is able to cross-feed on fermentation products from other bacteria (Reichardt et al., 2014). Both species will have to be assigned new genus names based on their phylogenetic placement as well as physiological characteristics, as C. eutactus is the type species of the genus Coprococcus (Holdeman and Moore, 1974). The evidence provided here should also be taken into consideration for the interpretation of sequence-based studies, which often do not resolve data beyond genus level. Sequence-based studies that find a change in Coprococcus spp. should ideally be followed up with species-specific methods such as qPCR (Reichardt et al., 2018).
Stimulation of C. eutactus may have beneficial effects on human health, as it contributes to the production of the health-promoting metabolite butyrate (Louis et al., 2004). Coprococcus spp. were also consistently associated with higher quality of life across several cohorts and depleted in depression (Valles-Colomer et al., 2019) and C. eutactus showed an increase in abundance with a decrease in atopic dermatitis in an infant cohort (Nylund et al., 2015). Our study will aid in the development of nutritional strategies to stimulate these potentially beneficial microbes in the gut.
Human faecal isolates C. eutactus ART55/1 (Louis et al., 2004; note that its Genbank designation is Coprococcus sp. ART55/1 but for clarity it is named based on its phylogeny here as per Fig. 2), Coprococcus sp. L2-50 (Barcenilla et al., 2000; note that its Genbank designation is Clostridium sp. L2-50 but for clarity it is named based on its phylogeny here as per Fig. 2), C. eutactus ATCC 27759, C. comes ATCC 27758, C. comes A2-232 (Barcenilla, 1999), C. comes SL7/1 (Louis et al., 2004) and C. catus GD/7 (Reichardt et al., 2014) were maintained anaerobically on M2GSC medium at 37 °C (Miyazaki et al., 1997). Growth tests on different carbohydrates were performed in modified yeast extract-casitone-fatty acids (YCFA) medium (Duncan et al., 2002) in 96-well plates in an anaerobic cabinet at 37 °C under an atmosphere of 10% (v/v) carbon dioxide, 10% (v/v) hydrogen and 80% (v/v) nitrogen, with a medium pH of 6.5 ± 0.2 at the start of the experiment. Bacterial cellulase activity was determined using cultures grown in Hungate tubes with different cellulosic substrates. Optical density and pH readings were used as indicators for growth. For transcriptomic and proteomic analysis, overnight cultures grown in YCFA medium containing 0.2% (w/v) glucose were inoculated into YCFA containing 0.2% (w/v) of one of the following carbohydrates: glucose, cellobiose, barley β-glucan, lichenan or glucomannan, and grown to exponential phase. Full details on media and growth conditions are given in supplemental methods.

Plasmid metagenomic library construction, functional screening and bioinformatic analysis
A freshly voided faecal sample was collected from a healthy female volunteer who had not received any antibiotics or other drugs during the 6 months prior to sampling. The detailed method of library construction and functional screening is given in supplemental methods. Briefly, total metagenomic DNA was extracted, size fractionated and cloned into shuttle plasmid pTRKL2 (O'Sullivan and Klaenhammer, 1993). Transformation into E. coli XL1 Blue and selection of white colonies resulted in a library of 6146 viable clones containing an insert, with an average insert size of 2.5 kb based on PCR colony screening of 24 randomly picked colonies. The library was transferred by pooling the E. coli XL1 Blue library and transforming the extracted DNA into L. lactis MG1363. The resulting L. lactis library consisted of 4608 clones with 75% insert frequency and an average insert size estimated at 2.5 kb based on PCR colony screening of 15 random clones. Both libraries were arrayed on agar plates containing BHI (E. coli) or GM17 (L. lactis) medium with potato starch (Sigma-Aldrich S2004, 1% w/v), carboxymethyl cellulose (CMC, Sigma-Aldrich C4888, 0.5% w/v), lichenan (Sigma-Aldrich L6133, 0.05% w/v), oat spelt xylan (Sigma-Aldrich X0627, 0.5% w/v), polygalacturonic acid (Sigma-Aldrich P0853, 0.5% w/v), 4-methylumbelliferyl α-L-arabinofuranoside (Sigma-Aldrich M9519, 50 μg/ml) or 4-methylumbelliferyl α-L-rhamnopyranoside (Sigma-Aldrich M8412, 50 μg/ml) and incubated overnight at 37 °C (E. coli) or 30 °C (L. lactis). Positive clones identified by clearing zones or fluorescence were re-assessed to confirm activities and their inserts were sequenced (for accession numbers see Supporting Information Table S1). Bioinformatic sequence analyses and databases used are detailed in supplemental methods.
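As a rough illustration of what these library statistics imply, the total amount of cloned DNA can be approximated as the number of clones multiplied by the insert frequency and the average insert size. The short sketch below applies this to the L. lactis figures given above; the function name is hypothetical, and because the insert size and frequency were estimated from only 15 clones, the result should be read as an order-of-magnitude estimate rather than a measured value.

```python
def cloned_dna_kb(n_clones, insert_frequency, mean_insert_kb):
    """Estimated total cloned DNA in kb.

    n_clones         -- number of arrayed clones
    insert_frequency -- fraction of clones carrying an insert (0-1)
    mean_insert_kb   -- average insert size in kb
    """
    return n_clones * insert_frequency * mean_insert_kb


# L. lactis MG1363 library as described in the text:
# 4608 clones, 75% insert frequency, average insert size 2.5 kb.
lactis_kb = cloned_dna_kb(4608, 0.75, 2.5)
print(f"L. lactis library: about {lactis_kb / 1000:.2f} Mb of cloned DNA")
```

With these inputs the estimate is 8640 kb, i.e. about 8.6 Mb of metagenomic DNA screened in L. lactis.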

Cloning of GH9-encoding genes from Coprococcus-related species
Cloning procedures are given in detail in supplemental methods. Briefly, four genes encoding putative GH9 enzymes from C. eutactus ART55/1 and Coprococcus sp. L2-50 were cloned into shuttle plasmid pTRKL2 and transformed into E. coli XL1 Blue and L. lactis MG1363. For Coprococcus sp. L2-50 gene GH9/S (WP_008400439), synthetic gene constructs containing different domains (Supporting Information Fig. S5) were overexpressed in E. coli BL21(DE3) and enzyme activities were determined as described below.

Enzyme activity assays
For GH9 constructs cloned into E. coli XL1 Blue and L. lactis MG1363, freshly grown overnight culture (10 μl) was pipetted onto the surface of lichenan-, CMC- and xylan-containing agar plates, incubated overnight and stained with Congo Red. To increase the contrast, the plates were flooded with 50 mM acetic acid, which turns the background blue instead of pale orange.
For determination of enzyme activities in liquid culture and cellular localisation in E. coli XL1 Blue and L. lactis MG1363, freshly grown overnight cultures of recombinant clones were analysed. Enzyme activity was determined by reducing sugar assay following the Lever method (Lever, 1977), using supernatants and cell-free extracts prepared from three independently grown cultures (for details see supplemental methods). The amount of reducing sugar released was measured after incubating the fractions with CMC, lichenan or β-glucan (each at 0.5% w/v) as substrates at 37 °C.
For enzyme activities in E. coli BL21 (DE3) containing recombinant genes, C. eutactus ART55/1 and Coprococcus sp. L2-50, cells were harvested by centrifugation, washed in 20 ml and resuspended in 5 ml of sodium phosphate buffer (50 mM, pH 6.5). Cell extracts were prepared by bead-beating and enzyme activities determined by Lever assay (Lever, 1977). The details are provided in supplemental methods.

Gene expression and proteomic analysis of C. eutactus ART55/1
Full details of gene expression and proteomics methods are provided in supplemental methods. Briefly, triplicate cultures of C. eutactus ART55/1 grown on either glucose, cellobiose, β-glucan, lichenan or glucomannan were harvested during exponential phase (optical density 0.45-0.84, Supporting Information Table S4) and RNA and protein fractions were prepared.
For RNA sequencing, libraries were prepared after ribosomal RNA depletion and sequenced using the High Output 1X75 kit on the Illumina NextSeq 500 platform with v2 chemistry, producing 75 bp single end reads. In total, between 22 118 233 and 40 885 793 reads were produced per sample after quality filtering (99.9% of the raw reads on average, Supporting Information Table S4). Raw sequencing data have been deposited in the ArrayExpress database under accession number E-MTAB-8048. Reads were aligned against the reference genome for C. eutactus ART55/1 (FP929039.1, between 88.78% and 90.40% of the filtered reads, Supporting Information Table S4) and counted at gene locations (57.52%-60.00% counted, Supporting Information Table S4). For differential gene expression analysis, genes that had a CPM (count per million) value of more than one in three or more samples were kept for analysis, and all other genes were removed as low count genes, leaving 2022 genes for analysis. Differential expression analysis was performed using a generalized linear model with contrasts made between glucose (as the baseline) and all other carbon sources, setting significance at false discovery rate (FDR) < 0.05 and Log fold change (LogFC) > 1.
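The low-count filter and the significance thresholds described above can be sketched in a few lines. This is an illustration under stated assumptions, not the pipeline used in the study: the function names and the toy count table are hypothetical, CPM is computed from raw library sizes without any normalisation factors, and the LogFC threshold is applied to the absolute value so that downregulated genes are also retained, which the text does not state explicitly.

```python
def counts_per_million(counts):
    """Convert raw counts to CPM.

    counts -- dict mapping gene id to a list of raw read counts, one per sample.
    Library size is taken as the column sum (an assumption; no normalisation
    factors are applied here).
    """
    n_samples = len(next(iter(counts.values())))
    lib_sizes = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
    return {
        gene: [c / lib_sizes[j] * 1e6 for j, c in enumerate(row)]
        for gene, row in counts.items()
    }


def filter_low_count(counts, min_cpm=1.0, min_samples=3):
    """Keep genes with CPM > min_cpm in at least min_samples samples."""
    cpm = counts_per_million(counts)
    return [g for g, row in cpm.items() if sum(v > min_cpm for v in row) >= min_samples]


def is_differentially_expressed(fdr, log_fc, fdr_cutoff=0.05, lfc_cutoff=1.0):
    """Apply FDR < 0.05 and |LogFC| > 1 (absolute value is an assumption)."""
    return fdr < fdr_cutoff and abs(log_fc) > lfc_cutoff


# Hypothetical toy table: four samples, each with a library size of 2 000 000.
toy_counts = {
    "geneA": [1_999_991, 1_999_991, 1_999_991, 1_999_987],
    "geneB": [8, 8, 8, 0],    # CPM 4, 4, 4, 0 -> above 1 in three samples
    "geneC": [1, 1, 1, 13],   # CPM 0.5, 0.5, 0.5, 6.5 -> above 1 in one sample
}
kept = filter_low_count(toy_counts)
print(kept)
```

In the toy table, geneC is removed as a low count gene because it exceeds a CPM of one in only a single sample.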
For functional analysis, significant differentially expressed gene sequences were isolated from the genome assembly and compared to the NCBI nonredundant protein database and to the InterPro protein signature database. The results of these searches were analysed with Blast2GO (version 5.2.5) (Conesa et al., 2005) where gene ontology (GO) terms were assigned to 1669 genes. GO enrichment analysis was carried out with Blast2GO using a Fisher Exact Test.
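For readers unfamiliar with GO enrichment testing, the principle of the Fisher Exact Test can be sketched as a hypergeometric tail probability: given a background of annotated genes, how likely is it to see at least the observed number of genes carrying a given GO term among the differentially expressed set by chance? The sketch below assumes a one-sided test for over-representation; the function name and the example counts are hypothetical, and the exact settings used within Blast2GO may differ.

```python
from math import comb


def fisher_enrichment_p(k, n, K, N):
    """One-sided Fisher exact P(X >= k) for over-representation.

    k -- genes with the GO term in the test set
    n -- size of the test set (e.g. differentially expressed genes)
    K -- genes with the GO term in the background
    N -- size of the background (e.g. all GO-annotated genes)
    """
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


# Hypothetical example: background of 1669 GO-annotated genes (the number
# reported in the text); the other three counts are invented for illustration.
p_value = fisher_enrichment_p(k=15, n=200, K=40, N=1669)
print(f"enrichment p-value: {p_value:.3g}")
```

The p-values obtained for all tested GO terms would then be corrected for multiple testing before applying a significance cut-off.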
Protein digestion was carried out with porcine trypsin. Peptides were desalted and analysed by LC-MS as previously described (Herrero-de-Dios et al., 2018) using a Q Exactive Plus/Ultimate 3000RSLC nanoLC-MS system (Thermo Fisher Scientific, Hemel Hempstead, UK) to which a 25 cm long PepMap RSLC C18 nano column (internal diameter 75 μm) was fitted. Peak identification and quantification was carried out using MaxQuant (version 1.6.3.4) (Cox and Mann, 2008) with comparisons made to C. eutactus ART55/1 reference protein sequences downloaded from NCBI (https://www.ncbi.nlm.nih.gov/genome/13745?genome_assembly_id=175605). Parameters were set to calculate LFQ, as well as to identify potential contaminant proteins from media. In total, 919 proteins were identified by MaxQuant. Nineteen potential contaminants and 10 reverse sequence control proteins were removed before further analysis, leaving 891 proteins. LFQ values were log2 converted and then analysed for differential expression using LIMMA (version 3.38.3) (Ritchie et al., 2015) with a linear model identical to that used for the RNA sequencing analysis, treating glucose as the baseline for comparisons.
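Two of the steps above, the log2 conversion of LFQ intensities and the adjustment of p-values for multiple testing, can be illustrated with a minimal sketch. It assumes that an LFQ intensity of zero denotes a missing value and that adjusted p-values follow the Benjamini-Hochberg procedure, which is the default adjustment in LIMMA; the function names and example values are hypothetical, and the moderated statistics computed by LIMMA itself are not reproduced here.

```python
from math import log2


def log2_lfq(value):
    """log2-transform an LFQ intensity; zero is treated as missing (assumption)."""
    return log2(value) if value > 0 else None


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four proteins.
raw_p = [0.01, 0.04, 0.03, 0.005]
adj_p = benjamini_hochberg(raw_p)
print([round(p, 4) for p in adj_p])
```

A protein would then be called differentially abundant if its adjusted p-value falls below the chosen threshold in the comparison against glucose.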
The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium via the PRIDE (Perez-Riverol et al., 2019) partner repository with the dataset identifier PXD014174.
The authors wish to thank Gill Campbell and Pauline Young for their assistance with colony picking and screening metagenomic libraries, Brennan Martin at the Centre for Genome Enabled Biology and Medicine, University of Aberdeen, for performing the RNA sequencing, Evelyn Argo and Craig Pattinson for advice on sample preparation for proteomics work, Bekhal Kareem Sharif and Manon Le Merrer for help with growth experiments, Sylvia Duncan for the gift of acid-swollen cellulose and for providing strains C. comes A2-232 and SL7/1, Pat Bain for help with figure formats and Mike Peck and Bruce Pearson at the Institute for Food Research Norwich for helpful discussions.

Supporting Information
Additional Supporting Information may be found in the online version of this article at the publisher's web-site:
Table S1. Sequence analysis of clones recovered by a functional screening of human gut microbiome metagenomic libraries.
Table S2. Enzyme activity of GH9 enzymes from C. eutactus ART55/1 and Coprococcus sp. L2-50 expressed in E. coli XL1 Blue and L. lactis MG1363 supernatant and cell-free extract.
Table S3. Reducing sugar assay of cell extracts and supernatants of cultures of C. eutactus ART55/1 and Coprococcus sp. L2-50 grown on glucose, cellobiose or β-glucan to early stationary phase.
Table S4. Individual culture and sample statistics of gene expression and proteomics analysis.
Table S5. Log fold change (logFC) and adjusted significance values (FDR and adj.P.Val) for RNA sequencing and proteomics analyses of C. eutactus ART55/1 genes.
Table S6. Significantly enriched Gene Ontology (GO) terms identified by Blast2GO (FDR < 0.05) for C. eutactus ART55/1 genes.
Table S7. CAZyome of C. eutactus ART55/1 and Coprococcus sp. L2-50.
Table S8. Primers used for amplification of GH9 genes from Coprococcus-related species.
Fig. S1. Phylogenetic analysis of the family 9 catalytic modules of C. eutactus ART55/1 and Coprococcus sp. L2-50 GH9 genes (highlighted by blue and purple dots, respectively). The phylogenetic trees show the relationship of the catalytic domains of ART_GH9/S and L2-50_GH9/S proteins (A) and ART_GH9/L and L2-50_GH9/L proteins (B). The amino acid sequences of the catalytic domains were retrieved from NCBI following BlastP analysis. The evolutionary history was inferred using the Neighbour-Joining method. The percentage of replicate trees in which the associated taxa clustered together in the bootstrap test (1000 replicates) is shown next to the branches. The tree is drawn to scale, with branch lengths in the same units as those of the evolutionary distances used to infer the phylogenetic tree. The evolutionary distances were computed using the p-distance method and are in the units of the number of amino acid differences per site. The analysis involved 68 (A) and 44 (B) amino acid sequences. All positions containing gaps and missing data were eliminated. There were a total of 376 (A) and 348 (B) positions in the final dataset. Evolutionary analyses were conducted in MEGA7.
Fig. S2. Carbohydrate utilization of C. eutactus ART55/1, Coprococcus sp. L2-50, C. eutactus ATCC 27759, C. comes ATCC 27758, C. comes A2-232, C. comes SL7/1 and C. catus GD/7 on different substrates. A. Medium pH drop after growth in 96-well plates after 48 h of incubation (mean and standard deviation of triplicate cultures). B. Growth curves during growth in 96-well plates. C. catus GD/7 does not grow well on glucose, but the preculture showed an optical density increase during the hour before inoculation and had reached OD 0.25 at inoculation of the growth experiment.
Fig. S3. Relationship of all differentially expressed C. eutactus ART55/1 genes (DEGs, 299 genes in total) between different growth substrates, expressed as logFC. Other comparisons are shown in Fig. 5C.
Fig. S4. Venn diagrams comparing significant differential expression of C. eutactus ART55/1 genes and proteins in glucose vs. β-glucan (A), glucose vs. cellobiose (B), glucose vs. glucomannan (C), and glucose vs. lichenan (D).
Fig. S5. Cloning strategy for generation of synthetic gene constructs of Coprococcus sp. L2-50 GH9/S.
Appendix S1. Supplemental methods.