As mentioned above, library construction is the most costly step for whole-genome sequencing of small-genome samples. We have developed several protocols for genome and transcriptome sequencing. Here, as an example, we introduce a modified library construction protocol using the NEBNext Ultra Ⅱ FS DNA Library Prep kit (NEB Cat. No.: E7805S; 24 samples), which is fast, reliable and has low per-sample costs.
Genomic DNA from 191 lab-evolved Escherichia coli K-12 MG1655 cell lines carrying different SNPs (single nucleotide polymorphisms) was extracted using the Wizard® Genomic DNA Purification Kit. After quality control of all the samples with Qubit® and NanoDrop®, we diluted each DNA sample with nuclease-free water to 7.7 ng/μL and transferred a 1.3 μL aliquot (about 10 ng of DNA) of each sample into a well of a 96-well plate using multi-channel pipettes. The standard workflow was then applied: fragmentation/end preparation, adaptor ligation, cleanup of adaptor-ligated DNA, PCR enrichment of adaptor-ligated DNA, cleanup of the PCR products, and assessment of library quality. The modified reagent volumes for the major steps are listed in Table 1, and a modified NEB protocol for input DNA < 100 ng with step-by-step details is given in Supplemental File S1. Note that the steps are not scaled down equally (although most reagents within the same step are), with ratios ranging from 1/20 to ~ 1/3 following numerous trials (Table 1). Quality control of the final libraries was performed by Qubit® measurement and gel electrophoresis. The library insert size was ~ 350 bp (estimated from the library size distribution of the six libraries on the gel; Fig. 1) and the mean library concentration across all samples was 4.67 (standard deviation: 2.37) ng/μL or 23.39 (11.87) nM. The insert size and concentration meet the basic requirements for downstream Illumina X Ten sequencing. As with many other library kits that use enzymatic DNA fragmentation, such as the Illumina Nextera DNA Preparation Kit, the library insert size for small genomes is too short for paired-end sequencing if the standard protocol is followed. We thus shortened the enzymatic fragmentation time to 5 min at 37 ℃ for the fragmentation reaction step (Supplemental File S1). The magnetic-bead volume ratio for size selection was also adjusted to increase the library size (steps 15 and 31 in Supplemental File S1).
An additional round of size selection for library size > 450 bp on pooled libraries is recommended before sequencing.
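The input and concentration figures above follow from simple unit arithmetic, which can be sketched as below. This is an illustrative helper, not part of the published protocol: the function names, the 77 ng/μL stock in the usage lines and the 660 g/mol per base pair approximation for double-stranded DNA are assumptions. The fragment length passed to the molarity conversion is also an assumption; under this formula the reported pair of 4.67 ng/μL and 23.39 nM corresponds to a length of roughly 300 bp, which is a back-calculation and not a value stated in the text.

```python
# Assumed constant: average molar mass of one double-stranded DNA base pair.
DSDNA_G_PER_MOL_PER_BP = 660.0


def input_mass_ng(conc_ng_per_ul, volume_ul):
    """DNA mass (ng) contained in an aliquot."""
    return conc_ng_per_ul * volume_ul


def dilution_volumes(stock_ng_per_ul, target_ng_per_ul, final_volume_ul):
    """Return (stock volume, water volume) in uL, using C1 * V1 = C2 * V2."""
    if stock_ng_per_ul < target_ng_per_ul:
        raise ValueError("stock is already more dilute than the target")
    stock_ul = target_ng_per_ul * final_volume_ul / stock_ng_per_ul
    return stock_ul, final_volume_ul - stock_ul


def ng_per_ul_to_nM(conc_ng_per_ul, mean_length_bp):
    """Convert a dsDNA mass concentration to molarity (nM).

    mean_length_bp is the assumed mean library fragment length.
    """
    return conc_ng_per_ul * 1e6 / (DSDNA_G_PER_MOL_PER_BP * mean_length_bp)


# 7.7 ng/uL x 1.3 uL is the aliquot described in the text.
print(round(input_mass_ng(7.7, 1.3), 2), "ng per well")
# Hypothetical stock at 77 ng/uL diluted to 7.7 ng/uL in 20 uL.
print(dilution_volumes(77.0, 7.7, 20.0))
# Hypothetical mean fragment length of 302.5 bp.
print(round(ng_per_ul_to_nM(4.67, 302.5), 2), "nM")
```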
| Step | Reagents | A (miniaturized) | B (manufacturer-recommended) |
|---|---|---|---|
| Fragmentation/end prep | NEBNext Ultra Ⅱ FS reaction buffer | 0.4 | 7 |
| | NEBNext Ultra Ⅱ FS enzyme mix | 0.1 | 2 |
| Adaptor ligation | NEBNext Ultra Ⅱ ligation master mix | 1.5 | 30 |
| | NEBNext ligation enhancer | 0.05 | 1 |
| | NEBNext adaptor for Illumina | 0.5 | 2.5 |
| | USER® Enzyme | 0.5 | 3 |
| Cleanup of adaptor-ligated DNA | NEBNext sample purification beads | 2.3 | 45.6 |
| PCR enrichment of adaptor-ligated DNA | Primer i501-Primer i508 | 0.5 | 5 |
| | NEBNext Ultra Ⅱ Q5 master mix | 2.5 | 25 |
| | Primer i701-Primer i712 | 0.5 | 5 |
| | NEBNext Ultra Ⅱ Q5 master mix | 2.5 | 25 |
| Cleanup of PCR | NEBNext sample purification beads | 9 | 40.5 |

All units are in μL
Table 1. Comparison of miniaturized (A) and manufacturer-recommended (B) reagent volumes for major steps of the NEBNext Ultra Ⅱ FS DNA library kit (for ≤ 100 ng genomic DNA)
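To make the unequal scale-down explicit, the ratio of miniaturized to manufacturer-recommended volume can be computed for each reagent listed in Table 1. The sketch below simply transcribes the table values (the Q5 master mix, which appears twice in the table, is entered once); the variable and function names are illustrative, not part of the protocol.

```python
# (step, reagent, miniaturized volume in uL, recommended volume in uL),
# transcribed from Table 1.
TABLE1_UL = [
    ("Fragmentation/end prep", "FS reaction buffer", 0.4, 7.0),
    ("Fragmentation/end prep", "FS enzyme mix", 0.1, 2.0),
    ("Adaptor ligation", "Ligation master mix", 1.5, 30.0),
    ("Adaptor ligation", "Ligation enhancer", 0.05, 1.0),
    ("Adaptor ligation", "Adaptor for Illumina", 0.5, 2.5),
    ("Adaptor ligation", "USER enzyme", 0.5, 3.0),
    ("Cleanup of adaptor-ligated DNA", "Purification beads", 2.3, 45.6),
    ("PCR enrichment", "Primer i501-i508", 0.5, 5.0),
    ("PCR enrichment", "Primer i701-i712", 0.5, 5.0),
    ("PCR enrichment", "Q5 master mix", 2.5, 25.0),
    ("Cleanup of PCR", "Purification beads", 9.0, 40.5),
]


def scale_down(rows):
    """Return (step, reagent, miniaturized / recommended) for every row."""
    return [(step, reagent, mini / full) for step, reagent, mini, full in rows]


for step, reagent, ratio in scale_down(TABLE1_UL):
    # Print each ratio as "1/x" so it is comparable with the text.
    print(f"{step:32s} {reagent:22s} 1/{1 / ratio:.1f}")
```

Printing the ratios per reagent, rather than per step, makes it easy to see which reagents were deliberately kept at a higher proportion, such as the adaptor and the beads used for the final cleanup.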
Many researchers outsource all their genomics experiments to service providers, which frequently choose the Illumina TruSeq DNA nano Library Kit because of its low price (for example, 660 USD for a 24-sample kit; Cat. No.: 20015964), long usage history and reliability. To evaluate the sequencing quality obtained with the miniaturized NEB library protocol, we randomly chose 10 genomic DNA samples from the total of 191 and outsourced them to Berry Genomics, Inc. (Beijing) for TruSeq nano library construction and Illumina NovaSeq sequencing. The sequencing platform therefore differed from the X Ten used for the miniaturized libraries, because the service provider required NovaSeq sequencing for small sample orders. The two sequencing platforms are not known to differ in sequencing quality (Arora et al. 2019).
Service providers commonly require hundreds of nanograms to several micrograms of genomic DNA per sample (2 μg in our case) for Illumina library construction, so that they have sufficient DNA to repeat the protocol if the initial attempt fails. In addition, service providers usually require a minimum sequencing amount (6 Gbp per sample for our service provider) to make the exercise commercially viable. This is overkill for our E. coli samples, which have a genome size of ~ 4.6 Mbp: the sequencing resulted in a mean depth of coverage of 1435 × for the ten outsourced samples, with a coefficient of variation of 11% (Table 2).
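The scale of the overkill can be checked with the usual relation depth ≈ sequenced bases / genome size. The following is a minimal sketch with illustrative function names; it gives a nominal figure only and ignores read filtering, unmapped reads and duplicates, so observed depths will differ somewhat. The 100 × target in the last line is a hypothetical example, not a recommendation from the text.

```python
def expected_depth(yield_bp, genome_size_bp):
    """Nominal depth of coverage for a given sequencing yield."""
    return yield_bp / genome_size_bp


def yield_needed(target_depth, genome_size_bp):
    """Sequencing yield (bp) required to reach a target nominal depth."""
    return target_depth * genome_size_bp


GENOME_BP = 4.6e6  # approximate E. coli genome size quoted in the text

# The provider's 6 Gbp minimum corresponds to roughly 1300x nominal depth.
print(round(expected_depth(6e9, GENOME_BP)))
# A hypothetical 100x target would need only about 0.46 Gbp.
print(yield_needed(100, GENOME_BP) / 1e9, "Gbp")
```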
| Sample | Protocol | Raw reads | Q30 percentageᵃ | Depth (×) | Mapping (%) | Breadth (%) |
|---|---|---|---|---|---|---|
| T40MA005 | Outsourced | 23,813,466 | 95.64; 94.20 | 1229.19 | 99.94 | 99.57 |
| | Miniaturized | 1,048,806 | 98.26; 95.32 | 58.18 | 99.60 | 99.44 |
| T40MA031 | Outsourced | 24,687,601 | 93.49; 89.87 | 1296.11 | 99.23 | 99.58 |
| | Miniaturized | 1,639,505 | 97.77; 95.74 | 84.56 | 99.64 | 99.54 |
| T40MA101 | Outsourced | 26,083,628 | 94.39; 90.50 | 1355.36 | 99.40 | 99.57 |
| | Miniaturized | 2,168,905 | 98.06; 94.90 | 120.56 | 99.72 | 99.68 |
| T40MA145 | Outsourced | 26,258,855 | 95.57; 94.05 | 1375.36 | 99.85 | 99.58 |
| | Miniaturized | 1,598,639 | 98.07; 94.15 | 88.85 | 99.53 | 99.69 |
| T40MA189 | Outsourced | 27,237,886 | 94.99; 92.88 | 1397.34 | 97.64 | 99.58 |
| | Miniaturized | 2,871,652 | 98.08; 94.28 | 152.91 | 95.47 | 99.72 |
| T40MA221 | Outsourced | 33,261,933 | 95.24; 92.91 | 1708.92 | 99.83 | 99.58 |
| | Miniaturized | 1,478,940 | 97.62; 94.10 | 79.92 | 99.41 | 99.67 |
| T40MA265 | Outsourced | 26,225,478 | 94.62; 92.35 | 1374.84 | 99.67 | 99.58 |
| | Miniaturized | 1,592,936 | 98.20; 94.35 | 89.97 | 99.60 | 99.63 |
| T40MA319 | Outsourced | 26,734,731 | 95.06; 93.30 | 1419.72 | 99.71 | 99.58 |
| | Miniaturized | 1,646,512 | 98.21; 94.50 | 91.87 | 99.42 | 99.62 |
| T40MA349 | Outsourced | 28,412,667 | 95.40; 93.67 | 1473.98 | 99.82 | 99.58 |
| | Miniaturized | 1,651,876 | 97.94; 92.71 | 93.76 | 98.88 | 99.72 |
| T40MA379 | Outsourced | 33,511,125 | 95.40; 93.67 | 1724.01 | 99.63 | 99.58 |
| | Miniaturized | 1,903,442 | 97.87; 92.91 | 108.35 | 98.52 | 99.72 |

ᵃ Numbers separated by semicolons are the percentages of forward and reverse reads with > Q30 (sequencing quality 30, i.e., 1/1000 error probability), respectively. Depth: depth of coverage, from mapped reads after GATK filters. Breadth: proportion of the genome sequenced.
Table 2. Illumina PE150 genome sequencing of ten E. coli libraries from the miniaturized and outsourced protocols
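The mean depth and coefficient of variation (CV) quoted in the text can be recomputed from the Depth column of Table 2. The sketch below assumes the CV is the sample standard deviation (n − 1 denominator) divided by the mean; the text does not state which estimator was used, so the result should be read as close to, rather than an exact reproduction of, the reported percentages.

```python
from statistics import mean, stdev

# Depth of coverage per sample, transcribed from Table 2 (same sample order).
OUTSOURCED = [1229.19, 1296.11, 1355.36, 1375.36, 1397.34,
              1708.92, 1374.84, 1419.72, 1473.98, 1724.01]
MINIATURIZED = [58.18, 84.56, 120.56, 88.85, 152.91,
                79.92, 89.97, 91.87, 93.76, 108.35]


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (assumed estimator)."""
    return stdev(values) / mean(values)


for label, depths in (("outsourced", OUTSOURCED),
                      ("miniaturized", MINIATURIZED)):
    print(f"{label:13s} mean depth = {mean(depths):8.1f}x  "
          f"CV = {100 * coefficient_of_variation(depths):.1f}%")
```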
By contrast, the miniaturized protocol needs only 10 ng of genomic DNA (the input could be as low as 100 pg according to the manufacturer's instructions). We also ordered a full X Ten lane (100 Gbp, 1091 USD) for the 191 miniaturized libraries, which were normalized in the laboratory using the Qubit® measurement of each sample; this reduced the sequencing cost per sample to less than 6 USD (not including the library costs). The mean depth of coverage of the ten samples prepared with the miniaturized protocol was about 97 ×, with a coefficient of variation of 27%. Although the miniaturized protocol leads to higher sequencing-depth variance, possibly because it uses enzymatic fragmentation rather than the sonication used in the outsourced TruSeq nano protocol, the coverage distribution along the whole genome is consistent between the two methods (Fig. 2; the coverage distributions of the other nine tested samples are shown in Supplemental Figure S1). Such coverage is usable at least for downstream SNP analysis, and one of our previous studies successfully used a similar enzymatic-fragmentation library protocol to detect large-scale structural variants, which were verified by RT-PCR (Long et al. 2016).
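The per-sample sequencing cost and the nominal per-sample depth for a shared lane follow directly from the lane yield, lane price and number of pooled libraries. A minimal sketch, assuming perfectly equal pooling and no read loss (both idealizations; the function name is illustrative):

```python
def lane_budget(lane_yield_bp, lane_cost_usd, n_samples, genome_size_bp):
    """Return (cost per sample in USD, nominal depth per sample).

    Assumes every library receives an equal share of the lane and that
    all sequenced bases are usable, so the depth is an upper bound.
    """
    cost_per_sample = lane_cost_usd / n_samples
    depth_per_sample = lane_yield_bp / n_samples / genome_size_bp
    return cost_per_sample, depth_per_sample


# Figures from the text: 100 Gbp lane, 1091 USD, 191 libraries, ~4.6 Mbp genome.
cost, depth = lane_budget(100e9, 1091.0, 191, 4.6e6)
print(f"{cost:.2f} USD per sample, about {depth:.0f}x nominal depth")
```

The nominal depth is somewhat higher than the observed mean of about 97 ×, which is expected because filtering, unmapped reads and uneven pooling all reduce the usable yield per sample.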
Figure 2. Normalized coverage across the whole genome of strain T40MA189 using the miniaturized NEB and the outsourced protocols. The whole genome is divided into 1-kb bins with a 500-bp step size; coverage is normalized by dividing the coverage of each bin by the mean coverage of the whole genome
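The binning and normalization described in the caption can be sketched as follows. This is an illustrative implementation, not the authors' plotting code: it assumes a per-base depth list as input, treats the chromosome as linear, and drops any trailing window shorter than the bin size.

```python
def binned_normalized_coverage(depth, bin_size=1000, step=500):
    """Sliding-window coverage normalized by the genome-wide mean.

    depth: per-base depth of coverage, one value per genome position.
    Returns a list of (window start, normalized mean coverage) tuples.
    """
    genome_mean = sum(depth) / len(depth)
    bins = []
    # Windows overlap because step < bin_size; incomplete windows are skipped.
    for start in range(0, len(depth) - bin_size + 1, step):
        window_mean = sum(depth[start:start + bin_size]) / bin_size
        bins.append((start, window_mean / genome_mean))
    return bins


# Toy example: a 2-kb "genome" whose second half is covered three times deeper.
toy_depth = [10] * 1000 + [30] * 1000
print(binned_normalized_coverage(toy_depth))
```

Values near 1 indicate bins covered at the genome-wide average, so curves from libraries sequenced to very different depths can be overlaid on the same axis.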
The sequencing quality of reads from the miniaturized libraries is higher than that from the outsourced protocol (Table 2). Specifically, the proportion of reads with a Phred quality score higher than 30 was, on average, 2.6 percentage points higher for forward reads and 1.6 percentage points higher for reverse reads in the miniaturized libraries (forward reads: 98.0% miniaturized vs. 95.4% outsourced; reverse reads: 94.3% miniaturized vs. 92.7% outsourced; paired t-test, P < 0.05).
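For readers less familiar with Phred scores, the relation between a quality score Q and the error probability p is Q = −10 log10(p), so Q30 corresponds to p = 0.001. The sketch below also shows how a Q30 fraction could be computed from a FASTQ quality string. It assumes Phred+33 encoding and counts scores of at least 30, which is the common convention and may differ slightly from the exact threshold and per-read definition used for Table 2.

```python
def phred_to_error_prob(q):
    """Error probability implied by a Phred quality score."""
    return 10 ** (-q / 10)


def fraction_q30(quality_string, offset=33, threshold=30):
    """Fraction of base calls with quality >= threshold.

    quality_string: the quality line of a FASTQ record.
    offset: ASCII offset of the encoding (33 is assumed here).
    """
    if not quality_string:
        raise ValueError("empty quality string")
    scores = [ord(ch) - offset for ch in quality_string]
    return sum(score >= threshold for score in scores) / len(scores)


print(phred_to_error_prob(30))   # error probability at Q30
print(fraction_q30("II5?"))      # toy string with scores 40, 40, 20, 30
```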
We mapped the cleaned reads from both library protocols to the E. coli MG1655 reference genome (NCBI Accession No.: NC_000913.3) using BWA ver. 0.1.12. Read mapping rates were not significantly different (99.5% outsourced vs. 97.2% miniaturized; paired t-test, P = 0.07) and the breadth of coverage (proportion of the genome sequenced) was identical (99.6%). Reads of all ten samples using both the miniaturized and outsourced library protocols have been submitted to NCBI SRA, with the BioProject number of PRJNA551791.
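The mapping rate and breadth of coverage reported here are simple proportions. A minimal sketch of how they could be derived from read counts and a per-base depth list is given below; the function names and the minimum depth of 1 used to call a position "sequenced" are assumptions, since the text does not specify the threshold.

```python
def mapping_rate(mapped_reads, total_reads):
    """Percentage of reads that map to the reference."""
    return 100.0 * mapped_reads / total_reads


def breadth_of_coverage(depth, min_depth=1):
    """Percentage of reference positions covered by at least min_depth reads."""
    covered = sum(1 for d in depth if d >= min_depth)
    return 100.0 * covered / len(depth)


# Toy example: five positions, two of them uncovered.
toy_depth = [0, 3, 5, 0, 1]
print(breadth_of_coverage(toy_depth))  # percentage of covered positions
print(mapping_rate(995, 1000))         # percentage of mapped reads
```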
We also detected all the SNPs carried by the ten samples with standard hard filtering parameters as per GATK Best Practices recommendations (DePristo et al. 2011; Li and Durbin 2009; McKenna et al. 2010; Van der Auwera et al. 2013). The SNPs were identical regardless of whether the outsourced or the miniaturized method was used (Table 3).
| Sample | Protocol | SNP coordinate | SNPᵃ | Read support at the siteᵇ |
|---|---|---|---|---|
| T40MA005 | Outsourced | NC_000913.3: 4514645 | C → T | 1551T |
| | Miniaturized | NC_000913.3: 4514645 | C → T | 64T |
| T40MA031 | Outsourced | NC_000913.3: 3408514 | T → A | 1551A |
| | Miniaturized | NC_000913.3: 3408514 | T → A | 98A |
| T40MA101 | Outsourced | NC_000913.3: 1848794 | C → A | 1064A, 2C |
| | Miniaturized | NC_000913.3: 1848794 | C → A | 93A, 2C |
| T40MA145 | Outsourced | NC_000913.3: 1819675 | T → A | 1175A, 7T |
| | Miniaturized | NC_000913.3: 1819675 | T → A | 86A, 1T |
| T40MA189 | Outsourced | NC_000913.3: 4369133 | A → C | 1359C |
| | Miniaturized | NC_000913.3: 4369133 | A → C | 140C |
| T40MA265 | Outsourced | NC_000913.3: 3869409 | C → A | 1791A, 1C |
| | Miniaturized | NC_000913.3: 3869409 | C → A | 118A, 2C |
| T40MA349 | Outsourced | NC_000913.3: 1763358 | G → A | 1171A |
| | Miniaturized | NC_000913.3: 1763358 | G → A | 84A, 1G |
| T40MA349 | Outsourced | NC_000913.3: 4375460 | G → T | 1490T |
| | Miniaturized | NC_000913.3: 4375460 | G → T | 81T |

Read support refers to the read composition at the SNP site. ᵃ Ancestral base → mutant base. ᵇ Read depth from the gvcf file generated by GATK HaplotypeCaller after standard hard filtering.
Table 3. Unique SNPs detected using reads from the miniaturized and outsourced protocols
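The read-support strings in Table 3 can be turned into mutant-allele fractions to compare how strongly each protocol supports a call. The sketch below is an illustration rather than part of the GATK workflow: the parser, the function names and the choice of rows are assumptions, and only three of the table's SNPs are transcribed.

```python
import re


def parse_read_support(text):
    """Parse a string such as '1064A, 2C' into {'A': 1064, 'C': 2}."""
    return {base: int(count)
            for count, base in re.findall(r"(\d+)\s*([ACGT])", text)}


def alt_fraction(text, alt_base):
    """Fraction of reads at the site that carry the mutant base."""
    support = parse_read_support(text)
    total = sum(support.values())
    return support.get(alt_base, 0) / total if total else 0.0


# (sample, mutant base, outsourced support, miniaturized support), from Table 3.
ROWS = [
    ("T40MA005", "T", "1551T", "64T"),
    ("T40MA101", "A", "1064A, 2C", "93A, 2C"),
    ("T40MA145", "A", "1175A, 7T", "86A, 1T"),
]

for sample, alt, outsourced, miniaturized in ROWS:
    print(sample,
          f"outsourced {alt_fraction(outsourced, alt):.3f}",
          f"miniaturized {alt_fraction(miniaturized, alt):.3f}")
```

In haploid, clonal samples a true fixed mutation should be supported by nearly all reads at the site, so fractions close to 1 in both protocols are what a concordant call looks like.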
Although the miniaturized NEB library protocol leads to a slight decrease in the overall quality after sequencing compared with the service provider's protocol, the identical SNP results from both protocols and the dramatic cost difference justify the application of the miniaturized protocol, especially if hundreds to thousands of samples need to be processed. Nevertheless, further effort should be invested in improving this protocol so that its quality is no worse than that of the commercial protocol. In addition, this miniaturized protocol has only been tried on E. coli, so it needs to be tested with other organisms to establish its robustness and reliability. Based on previous experience with similar kits that use enzymatic fragmentation, miniaturized protocols usually work better with larger genomes than with smaller ones.