Research Article | GENETICS

# Digital-WGS: Automated, highly efficient whole-genome sequencing of single cells by digital microfluidics


Vol. 6, no. 50, eabd6454

## Abstract

Single-cell whole-genome sequencing (WGS) is critical for characterizing dynamic intercellular changes in DNA. Current sample preparation technologies for single-cell WGS are complex and expensive and suffer from high amplification bias and errors. Here, we describe Digital-WGS, a sample preparation platform that streamlines high-performance single-cell WGS with automatic processing based on digital microfluidics. Using this method, we achieve high single-cell capture efficiency for any number and type of cells by means of a wetted hydrodynamic structure. The digital control of droplets in a closed hydrophobic interface enables the complete removal of exogenous DNA, sufficient cell lysis, and lossless amplicon recovery, achieving a low coefficient of variation and high coverage at multiple scales. Single-cell profiling of genomic variations provides excellent detection of copy number variants with a smallest bin of 150 kb and of single-nucleotide variants with an allele dropout rate of 5.2%, holding great promise for broader applications of single-cell genomics.

## INTRODUCTION

Single-cell genomics, uncovering genomic heterogeneity that is hidden in conventional bulk characterization, has enabled the interrogation for genomic variations of the multifarious biological processes at the single-cell level (1, 2). Currently, the technology of single-cell genomic sequencing has been widely applied in the resolution of early embryonic development (3–5), tumor heterogeneity (6–8), and neural somatic mosaicism (9, 10) and is exceedingly needed in the case of highly valued and rare samples, such as prenatal testing samples (11) and circulating tumor cells (12, 13). However, single-cell genomics has relied on whole-genome amplification (WGA) for amplifying genomic DNA from single cells to generate sufficient replicates for sequencing, possibly introducing amplification bias and loss of coverage (14).

There has been considerable effort to advance WGA performance by molecular or microfluidic strategies. Molecular strategies incorporate high-fidelity DNA replication or linear amplification steps into the process to improve uniformity, enlarge coverage, or reduce error rate, including degenerate oligonucleotide-primed polymerase chain reaction (DOP-PCR) (15), multiple displacement amplification (MDA) (16), multiple annealing and looping-based amplification cycles (MALBAC) (17), and linear amplification via transposon insertion (LIANTI) (18). Among all, MDA is the most widely used method, which exponentially amplifies DNA by random priming and strand displacement under isothermal conditions (16). Compared with other WGA methods, MDA is easy to be performed and offers higher fidelity and coverage (14). Unfortunately, MDA exhibits considerable bias due to exponential amplification and has lower precision and sensitivity in copy number variant (CNV) detection (19).

Microfluidic strategies, which implement nucleic acid amplification of small reaction volumes in microfluidic devices (nanoliters or picoliters), have been used to reduce nonspecific and repeated priming (20–29) and were previously demonstrated useful for PCR (28), MDA (20–27), and MALBAC (29). In particular, droplet microfluidics has recently been demonstrated to improve evenness of amplification while preserving MDA’s high fidelity and has attracted extensive attention due to its scalability for various single-cell studies (23, 26, 30). However, these droplet-based approaches still face various difficulties to completely fulfill WGA requirements. First, the strategy of single-cell isolation based on Poisson statistics causes low cell occupancy and high loss of cells, which is inaccessible for rare samples (31). Moreover, these approaches are hard to manipulate and to control the droplets addressably in parallel, which limit the capability of picking up desirable droplets. Besides, massively monodispersed droplets are usually unstable, which could affect the uniform amplification of DNA fragments per droplet. Overall, droplet-based approaches under the existing technical conditions are unable to perform efficient, automated, and robust single-cell WGA in an integrated microfluidic chip. An ideal single-cell WGA method should integrate all the major steps in sample preparation and offer high cell capture efficiency and throughput while maintaining data quality of high uniformity and accuracy across the whole genome.

Digital microfluidics (DMF) is a burgeoning microfluidic automation technique that manipulates microliter- to nanoliter-sized droplets on an array of electrodes via the electrowetting-on-dielectric (EWOD) phenomenon. By the application of a series of potentials to these electrodes, droplets can be individually controlled to merge, mix, split, and dispense from the reservoir. In comparison to existing fluid handling systems like channel-based devices and pipetting robots, DMF offers a multitude of advantages in terms of contactless and addressable droplet manipulation, flexible and universal chip design, and lossless sample handling and recovery. DMF has recently been used to perform cell-based assays, providing the ability for complicated and multistep experiments of cell culture and analysis, such as the first automated cell culture on a microfluidic platform (32, 33). Here, we develop Digital-WGS, a single-cell sample preparation platform based on DMF that integrates all the major steps of parallel nanoliter-volume MDA from single-cell isolation to WGA with automatic processing. By combining hydrodynamics and surface wettability on a DMF chip, we automatically and efficiently (100%) isolate single cells by droplet manipulation regardless of cell types and inputs. Digital-WGS allows addressable control of droplets during all steps to greatly promote the lysis efficiency and evenness of reaction, which is an important factor for sufficient release of genomic DNA from chromosomes and uniform amplification by increasing randomness of primer binding. The addressable and contactless workflows have reduced competition with contaminant or endogenously generated background, thus increasing the effective concentration of the genome template.

We applied Digital-WGS to perform many single-cell nanoliter-volume MDA reactions and comprehensively compared the performance to other reported MDA methods using both low-depth and higher-depth whole-genome sequencing (WGS). Our results indicate that Digital-WGS outperforms existing MDA methods at multiple scales, greatly reducing amplification bias and errors of exponential amplification. Using the method, we achieve the excellent detection of CNVs with the smallest bin of 150 kb and single-nucleotide variants (SNVs) with allele dropout (ADO) rate of 5.2%. Thus, Digital-WGS offers unique pathways for addressing the current problem of WGA, which provides an efficient and robust method for performing single-cell sequencing. This approach is also scalable and universal for any chemistry of single-cell analysis, holding great promise for broader applications of single-cell genomics.

## RESULTS

### Digital-WGS: Streamlining the single-cell MDA reaction in nanoliter volumes

To establish a single-cell sample preparation platform that integrates all the major steps of parallel nanoliter-volume MDA from efficient single-cell isolation to high-performance WGA with automatic processing, we developed Digital-WGS to address the limitations described above using a DMF chip (Fig. 1A and fig. S1). The DMF chip includes two parallel glass plates separated by spacers. The top plate is used as a ground electrode, and the bottom plate is patterned with an array of electrodes featuring geometrical design and single-cell capture structures. The geometrical design, including electrode size, pattern, and spacing, was optimized to be compatible with MDA in nanoliter volumes. The electrode array pattern consists of multiple reagent-dispensing units, three single-cell isolation units, a single-cell lysis region, a stop region, and a genome amplification region (fig. S2). The single-cell isolation unit was innovatively designed using wettability-based hydrodynamic traps, called butterfly structure (Fig. 1B), which can automatically and efficiently capture single cells. As a cell droplet passes across the butterfly structure, cells in the flow will be focused and funneled into the weir in the middle of the butterfly structure. Once the weir is filled with a single cell, the flow resistance is increased drastically through the weir, redirecting the main flow and carrying subsequent cells to the slits on either side (Fig. 1C). With droplets driven from the butterfly structure, the captured cell is retained because of the formation of the hydrophilic virtual microwells (fig. S3A and movie S1) (32). Thus, the structure realizes one-step automatic single-cell capture during the change of droplet contact angle by addressable manipulation of cell droplets through local-wetted hydrodynamic structure. After single-cell isolation, the single cell immobilized in the weir is backflushed, lysed, and amplified by MDA sequentially. 
There are three units for parallel single-cell capturing in the current design, streamlining the process to obtain nine samples at a time (fig. S3B). After amplification on the designated electrodes, we extract amplicons by actuating droplets from the amplification region to the side edge of the chip for sequencing.

The capture principle of the butterfly structure combines hydrodynamic traps (34) with surface wettability (35), thus differing from the static settling method of conventional single-cell isolation on the DMF chip. Single cells can be funneled into weirs by flow guidance under conditions of laminar flow at the bottom of the droplet and immobilized by the formation of the virtual microwells, which are not affected by disturbances during the deformation and movement of the droplet. We designed the shape of the trap structure in a butterfly configuration to maximize central flow through the weir for better single-cell capture (fig. S4). The captured single cells are retained under the hydrophilic surface energy traps, which are unlike conventional hydrodynamic traps where cells are retained primarily owing to their surface tension and are exposed to damaging stresses (fig. S5). Moreover, because droplets of cells are addressable for free actuation, this technology enables selective isolation of the desired single cells by reversely flushing, if necessary.

To ensure that the distribution of cells in the actuated droplets is not affected by notable recirculation near the drop interface (36), we optimized the chip to lower cellular flow strength with activation voltage set in the range of 100 to 130 V (fig. S6, A and B). The sufficiently low peak voltage offers minimum disturbance when the reverse droplet motion returns the cells near to their initial positions. This is an important characteristic of a well-ordered laminar flow for single-cell capture.

### Digital-WGS provides automated and efficient single-cell isolation

To characterize the device, we imaged and counted single K562 human cells captured on each weir of the butterfly structure to calculate the capture efficiency of the chip (Fig. 2A). To assess possible damage to cells during the cell capture process, we compared the cell viability and the degree of DNA damage of the population before processing with that of the manipulated population. The results excluded the possibility of damage due to butterfly traps, so capture will not affect subsequent WGA (Fig. 2, B and C). Next, regarding capture efficiency, the results showed that it increased significantly with longer settling time and denser cell suspensions (Fig. 2D). The increased probability of single-cell capture at long settling times may result from greater accumulation of suspended cells at the surface of the bottom plate. We therefore selected 30 s as the optimal settling time, providing both high capture efficiency and rapid single-cell isolation. The single-cell capture efficiency was 100% using a settling time of 30 s and a cell suspension concentration of 2.5 × 10³ cells/μl. For samples of only a few dozen cells, additional seeding cycles can be used to isolate all of the cells as single cells, giving a recovery rate of 100%, much better than traditional microfluidic devices (Fig. 2E) (37). Such high-performance single-cell capture is attributed to our exploitation of the capture principle, which combines addressable droplet manipulation and local-wetted hydrodynamic structures on the DMF chip.

To assess the robustness of Digital-WGS, we first quantified capture efficiency for various cell lines with different cell sizes. The results showed excellent coherence in capture efficiency (Fig. 2F). We also measured the distribution of cell diameters before and after loading, and the results indicated that cell trapping did not introduce significant bias in selecting cells of different sizes (fig. S6, C and D). These results affirm that the reliance on hydrodynamics rather than on cell physical properties to isolate single cells makes the Digital-WGS uniquely suited to study a range of cell types. Besides, we determined that the electrode size and spacer height do not affect the capture efficiency of single cells (fig. S6, E and F), thus guaranteeing the flexibility of Digital-WGS for different reaction systems. The butterfly structure was stable for more than 100 single-cell isolation cycles, which was sufficient for consecutive automation control (fig. S7).

### Characterization of the performance of MDA for Digital-WGS

The major technical challenge of MDA is the highly uneven amplification of genomic DNA in a single cell. When performing a single-cell amplification experiment, all variables require careful consideration to minimize technical artifacts and the introduction of noise. DMF is capable of precise and reproducible dispensing of droplets of different viscosities with coefficient of variation (CV) ranging from 0.3 to 0.9% for volumes of 3 to 400 nl (fig. S8). In addition, this streamlined process of reaction assembly in a DMF format ensures automation of all reaction steps and greatly reduces technical variability associated with pipetting and mixing steps in microliter volumes. To improve the amplification evenness of Digital-WGS, we constructed some Digital-WGS experiments under different amplification conditions using MRC-5 cells, a normal human diploid cell line. Previous studies (21, 38) have shown that the implementation of MDA in nanoliter volumes, which increases the effective concentration of the genome template, can reduce amplification bias. We carried out some reactions ranging from 60 to 200 nl in volume to evaluate the effect of different amplification volumes by plotting the average read depths in 1-Mb bins of 0.75× average depth. We observed that the optimal amplification volume for Digital-WGS is 150 nl (fig. S9A). The great differences of CV for various amplification volumes showed that appropriate volume is essential for primer-annealing kinetics, maybe resulting from the balance between the high concentration of template and sufficiently random distribution of DNA in the droplet. Since the effective concentration of template DNA is low in the large amplification volume, there might exist potential iterations of repeated priming, causing high amplification bias. 
On the other hand, too small amplification volume will result in fewer DNA polymerase molecules per DNA template, so that DNA in the droplets could not bind with DNA polymerase in sufficiently random distribution. The effects of amplification time were observed with a trend of reduced bias with increasing single-cell MDA reaction time. However, overlong amplification time resulted in the accumulation of nonspecific products, reducing the effective content of template genome in the sequence library (fig. S9B). We performed all subsequent MDA reactions for 10 hours to maximize the proportion of effective templates. In addition, the DMF environment is more reliable for MDA amplification, considering the randomness of amplification among samples in the tube under the same lysis environment (fig. S9C). These characterizations made it easier for us to observe the mechanism of the MDA reaction, which was facilely implemented by simple program transformation.
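The coefficient of variation used above to compare amplification conditions can be illustrated with a minimal sketch. It assumes the CV is the standard deviation of per-bin read depth divided by the mean depth; the function name and the example depths are hypothetical and are not data from this study.

```python
from statistics import mean, pstdev


def coefficient_of_variation(bin_depths):
    """Return the CV (population standard deviation / mean) of per-bin read depths."""
    mu = mean(bin_depths)
    if mu == 0:
        raise ValueError("mean depth is zero; CV is undefined")
    return pstdev(bin_depths) / mu


# Hypothetical mean read depths in consecutive 1-Mb bins (illustrative only).
even_amplification = [0.75, 0.80, 0.70, 0.76, 0.74]
uneven_amplification = [0.10, 1.90, 0.40, 1.20, 0.15]

print(f"even:   CV = {coefficient_of_variation(even_amplification):.2f}")
print(f"uneven: CV = {coefficient_of_variation(uneven_amplification):.2f}")
```

A lower CV indicates that reads are spread more evenly across bins, which is the sense in which one amplification volume or time is preferred over another.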

Another unwanted characteristic of MDA is the nonspecific synthesis of contaminating DNA from the exogenous environment. Because the single human cells are automatically isolated by programmed control, washed with phosphate-buffered saline (PBS) before lysis, and amplified in nanoliter volumes, contamination from exogenous nonhuman DNA is minimized. Genomic alignment analysis verified that Digital-WGS produced much cleaner data (0.2% nonhuman reads) for single-cell sequencing. In addition, cross-contamination can be avoided because every reaction is distributed to spatially distinct droplets immersed in oil. To ensure that cross-contamination was not occurring, we performed fluorescent monitoring using SYBR Green I to visualize DNA amplification from a high concentration of starting genomic DNA for 24 hours. If a small quantity of DNA diffused out of a droplet, then increased fluorescence would be observed around the droplet. No observable change in fluorescence intensity was found anywhere near the droplets, thus excluding the possibility of cross-contamination through diffusion (fig. S10).

### Digital-WGS amplifies single-cell genome with higher performance in many metrics

We assessed the performance of Digital-WGS relative to the following prepared or publicly available MDA methods for diploid cell lines in terms of mapping rate, coverage, uniformity, and error rate: conventional single-tube MDA prepared by single MRC-5 cells, droplet MDA (24), emulsion WGA (eWGA) (23), and commercial microfluidic MDA (Fluidigm C1) (22). To fairly compare all methods, we analyzed all raw datasets using the same analytical parameters of 0.75× average depth for every single cell to calculate the copy number with a mean size of 52.4 kb using the dynamic binning method (39). Digital-WGS and eWGA shared the most uniform amplification across the entire genome, with a CV of 0.15, which is lower than that of other MDA methods (Fig. 3A). Figure 3B shows the mapping rate and coverage breadth of reads that mapped to the reference genome, and 99.8% of the DNA sequences obtained from samples mapped to a reference human genome for coverage of 35.5%. Both results were the highest observed values of all MDA methods compared, indicating that there was almost no DNA contamination or sample loss in Digital-WGS. This is most probably because EWOD-based Digital-WGS, manipulating discrete droplet immersed in the oil sandwiched by hydrophobic surfaces with an automated and integrated system, provides automated single-cell sample preparation and a contactless droplet amplification environment.

We next performed 10× WGS on a few MRC-5 Digital-WGS samples and downsampled all datasets to the same depth to execute a comparison with other MDA methods. We plotted the Lorenz curves of coverage to validate the evenness (Fig. 3C). Digital-WGS showed the best uniformity across the entire genome, which was closest to the unamplified bulk sample. We also plotted the CV of the read depth versus the bin size (Fig. 3D), which is more informative than Lorenz curves for quantifying amplification bias. The result showed that Digital-WGS achieved low CV values on all scales, offering high accuracy for CNV detection, probably due to the homogeneity of the reaction system in the droplet and lossless sample handling. To characterize the coverage on all scales, we then plotted coverage breadth as a function of sequencing depth (Fig. 3E). The Digital-WGS samples achieved the highest coverage breadth of all samples at any given sequencing depth, covering 88.7% of the reference genome when sequenced to 10× depth.
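As a rough illustration of how a Lorenz curve of coverage can be computed, the sketch below sorts per-bin depths and accumulates the fraction of reads against the fraction of the genome. The function names and the summary statistic (a Gini-style coefficient, which is not reported in this article) are assumptions added for illustration only.

```python
def lorenz_curve(depths):
    """Return (cumulative genome fraction, cumulative read fraction) points."""
    ordered = sorted(depths)
    total = sum(ordered)
    if total == 0:
        raise ValueError("no reads; Lorenz curve is undefined")
    n = len(ordered)
    points = [(0.0, 0.0)]
    running = 0.0
    for i, depth in enumerate(ordered, start=1):
        running += depth
        points.append((i / n, running / total))
    return points


def gini(depths):
    """Return 1 - 2 * (area under the Lorenz curve), using the trapezoid rule."""
    pts = lorenz_curve(depths)
    area = sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))
    return 1 - 2 * area


# Hypothetical per-bin depths: perfectly even versus fully concentrated in one bin.
print(gini([1, 1, 1, 1]))
print(gini([0, 0, 0, 4]))
```

A perfectly uniform sample follows the diagonal, so a curve closer to the diagonal (and a coefficient closer to zero) corresponds to more even amplification.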

We then investigated the accuracy of SNV identification from single MRC-5 cells using Digital-WGS. From the deeply sequenced single-cell data (30×), Digital-WGS exhibits more homozygous and heterozygous SNVs than conventional MDA, yielding a 49% detection efficiency, in contrast to 20% with MDA, in accordance with the higher coverage breadth (Fig. 4A). Next, we examined the false-negative rate, particularly where alleles drop out because of amplification bias. Comparison of single-cell and bulk SNVs showed that 31% false-positive rate of the SNVs genotyped as homozygous mutations by Digital-WGS were actually heterozygous in bulk, which corresponds to a 5.2% ADO rate in Digital-WGS, noticeably less than the false-negative rate and ADO rate, 77 and 65.5%, of conventional MDA (Fig. 4B), making Digital-WGS a great choice for those single-cell applications that cannot be implemented by conventional MDA because of its notoriously high ADO rate. The false-positive rates associated with amplification and sequencing errors were evaluated. Compared to the bulk data, the Digital-WGS data contain 2.0 × 10⁵ false positives out of 3 × 10⁹ bases in the genome. This corresponds to a 6.7 × 10⁻⁵ false-positive rate. Our strategy to reduce the false-positive rate was to sequence two or three kindred cells derived from the same cell. The simultaneous appearance of an SNV in the kindred cells would indicate a true SNV. The false-positive rate due to uncorrelated random errors can be reduced to ~10⁻⁸ with two kindred cells and ~10⁻¹² with three kindred cells (Fig. 4C).
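The false-positive arithmetic above can be checked with a short sketch. It assumes that errors in kindred cells are independent, so that the chance of the same false call appearing in every cell is the per-cell rate raised to the number of cells; this simple model is an assumption for illustration and may differ from the exact calculation behind Fig. 4C.

```python
# Figures quoted in the text.
false_positives = 2.0e5   # false-positive calls in one single-cell dataset
genome_bases = 3e9        # bases in the genome

per_cell_rate = false_positives / genome_bases


def joint_false_positive_rate(rate, kindred_cells):
    """Probability that the same random error appears in every kindred cell,
    assuming errors are independent between cells (illustrative assumption)."""
    return rate ** kindred_cells


two_cells = joint_false_positive_rate(per_cell_rate, 2)
three_cells = joint_false_positive_rate(per_cell_rate, 3)

print(f"one cell:    {per_cell_rate:.1e}")
print(f"two cells:   {two_cells:.1e}")
print(f"three cells: {three_cells:.1e}")
```

Under this independence assumption the two- and three-cell rates fall within an order of magnitude of the quoted ~10⁻⁸ and ~10⁻¹² estimates.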

### Digital-WGS enables high-resolution CNV in single cancer cells

We next applied Digital-WGS to sequence single K562 cancer cells, which is a cell line close to triploid. We observed that the coverage depth pattern of every single cell is similar to that of bulk genomic DNA (Fig. 4D). We called the CNVs from Digital-WGS-amplified single cells at 52.4-kb resolution and found that the CNV pattern of each single cell was almost identical to that of the monoclonal expanded bulk sample. At this resolution, we were able to identify CNVs with the smallest size of 150 kb (Fig. 4E). We also profiled CNV patterns of single cells amplified from conventional MDA at 52.4-kb resolution and found that, compared with conventional MDA, the improved amplification uniformity of Digital-WGS allowed us to obtain a more reliable genome-wide CNV pattern (Fig. 4F), as well as higher specificity and sensitivity of CNV identification in single cells, establishing the advantages of using Digital-WGS for single-cell studies.

## DISCUSSION

Despite the ability of existing platforms to process single-cell WGA, single-cell genomic analysis is underutilized because of the complexity, limited throughput, significant experiment cost, and input cell quantity requirements of available methods. We developed Digital-WGS that enables streamlining parallel nanoliter-volume single-cell MDA, which provides high-efficiency single-cell isolation and improved WGA performance. Figure S11 shows the comparison of our Digital-WGS with conventional MDA by tubes. Current MDA protocols by tube, which take approximately 5 min for single-cell isolation by skilled technicians (40), are limited to low capture efficiency, yield low product per unit volume, cost an estimated $20 per cell for amplification, and suffer from low mapping rate, coverage, and high amplification bias. In contrast, the Digital-WGS protocol, depending on the customizability and automation of the DMF chip, offers many advantages in terms of capture performance (capture efficiency of 100% and capture speed of 3 cells/min), cost (approximately 15 samples per $1), labor (fully automated processing by pipelining), and amplification performance (high coverage and uniformity). These performances strongly indicated that our system has great potential in single-cell analysis. In addition, as a flexible and scalable platform, high throughput of tens to hundreds of single-cell samples can be achieved by increasing the number of controllable electrodes to place more capture structures. To make higher throughput, a single-cell platform based on active matrix EWOD (AM-EWOD) (41) can be developed, which can support 16,800 electrodes and thus process thousands of cells. In the future, “combinatorial indexing” method (42) can be introduced to label each cell in the DMF chip to realize high-throughput genome sequencing for >10,000 single cells.

Our analyses of low-depth and higher-depth WGS data indicate that the performance of Digital-WGS compares favorably to that of other MDA methods considered under the fair comparison of sample datasets having unequal average sequencing depths, greatly reducing amplification bias and errors and improving genome coverage. Digital-WGS also enables accurate identification of both small CNVs and high-confidence SNVs from a single human cell, detecting CNVs at a 150-kb size with 52.4-kb resolution, and SNVs with an ADO rate of 5%.

Digital-WGS provides an automated and efficient method for single-cell nanoliter-volume sample preparation. The implementation overcomes the limitations of conventional microfluidic approaches, which not only realizes efficient and addressable single-cell manipulation but also offers a robust and accurate interrogation of CNVs and SNVs. We expect that Digital-WGS, the technology presented here and subsequent improvements thereof, will have a variety of applications as a robust and flexible platform for single-cell sample preparation, which continues to expand across numerous disciplines in the biological sciences.

## MATERIALS AND METHODS

### DMF chip design and fabrication

The DMF chips comprised a top plate and a bottom plate with an array of electrodes patterned by photolithography and wet etching. Briefly, AZ5214E (Clariant AG) was spun on a glass substrate (70 mm by 75 mm by 1 mm) that had been sputtered with 300-nm-thick chromium and was then exposed to ultraviolet (UV) light through a photomask. The actuation electrodes were formed by developing the exposed substrate in RZX-3338, etching in CR-4, and immersing in RBL-3368 to remove the photoresist. The chip was then coated with a 14-μm-thick layer of SU-8 2015 photoresist (Microchem) as a dielectric layer, followed by the fabrication of a cell trap layer using a 25-μm-high SU-8 2015 photoresist, which was further developed to form the butterfly structure. Last, the bottom plate was prepared by inserting hydrophilic circles into the capture structures for single-cell anchoring using a modification of the Teflon-AF liftoff technique. Fourteen-micrometer AZ4620 (Clariant AG) was used as the intermediate photoresist to form the patterns. Then, 1-μm polytetrafluoroethylene (PTFE) [50% (v/v) in water] was precoated after oxygen plasma treatment. The pattern of hydrophilic spots was revealed by immersing the plate in RBL-3368 to remove the photoresist. A post bake on a hot plate followed, after which the contact angle of the droplet differed significantly between the hydrophilic and hydrophobic sites (θ = 81° and θ = 131°, respectively). The top plate of the DMF chip was indium-tin oxide (75 mm by 25 mm by 1 mm) spin-coated with Teflon-AF [1% (w/w) in FC-40] as a hydrophobic layer.

### Device assembly and operation

The all-in-one DMF automated platform included a homemade instrument called μFluidbox that was used to manage droplet operations controlled by a computer, the top and bottom plates of a DMF chip, a microscope, and a heater. A sequence of voltages (100 to 200 V, 6 kHz, sine wave) was applied between the top plate (ground) and electrodes in the bottom plate to power the closed-EWOD system via a Pogo pin interface. One top/bottom plate pair of the DMF chip was assembled with a polyethylene terephthalate (PET) film to form a spacer with a thickness range from 30 to 120 μm as needed and filled with 2-cSt silicone oil to reduce evaporation of the nanoliter-volume droplets on the substrate. The bottom plate contained an array of 91 actuation electrodes along with five reservoir electrodes from which unit droplets were dispensed, as well as reservoirs for reagent introduction or waste removal. The actuation electrodes, a series of squares with a specified length on each side (from 0.3 to 1.5 mm), endowed the chip with good ability for droplet moving, splitting, and merging. It is worth mentioning that the μFluidbox exhibited an excellent capacity for droplet manipulation in real time or for preprogrammed operation.

### Droplet dispensing on DMF chip

Chip operation performance was evaluated by the uniformity of droplet volume dispensed from a reservoir. By fixing the spacer between the top and bottom plates via a gap of known height, the volume of the droplet was directly related to the area of droplet. After assembling the DMF device, unit droplet dispensing from the reservoir controlled by μFluidbox was imaged in the bright field using a 10× objective on a Leica DM2700 microscope (Leica Microsystems Inc., Concord, ON, Canada). The images were processed by ImageJ to determine the cross-sectional area of each dispensed droplet and, by calculation, the droplet volume. Fifteen dispensed droplets were measured in each case.
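The volume calculation described above can be sketched as follows, assuming each droplet is a flat disk whose volume is its imaged footprint area multiplied by the gap height (meniscus curvature is ignored). The function names and the example measurements are hypothetical.

```python
from statistics import mean, stdev


def droplet_volume_nl(area_mm2, gap_um):
    """Volume of a sandwiched droplet in nanoliters.

    1 mm^2 x 1 um = 1e-3 mm^3 = 1e-3 ul = 1 nl, so the product is already in nl.
    """
    return area_mm2 * gap_um


def cv_percent(values):
    """Coefficient of variation (sample standard deviation / mean) in percent."""
    return 100 * stdev(values) / mean(values)


# Hypothetical footprint areas (mm^2) of dispensed droplets at a 100-um gap.
areas_mm2 = [0.640, 0.643, 0.637, 0.641, 0.639]
volumes_nl = [droplet_volume_nl(a, 100.0) for a in areas_mm2]

print(f"mean volume = {mean(volumes_nl):.1f} nl, CV = {cv_percent(volumes_nl):.2f}%")
```

Because the gap is fixed by the spacer, any variation in the imaged area translates directly into variation in dispensed volume.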

### Simulation of single-cell capture structures

Computational fluid dynamics (CFD) simulation was used to rank different micropillar geometries on the basis of interaction probability and gentleness of shear stress. In addition, CFD coupled with solid mechanics was used to predict parameters to optimize micropillar geometry and visualize hydrodynamic behavior defined by the simulation model. In this model, the fluid flow is described by the Navier-Stokes equation, and the cells (microspheres) obey linear elastodynamics and Newton’s equations of motion. Two-Phase Flow and Level Set interface have been used to explain different surface hydrophilicities. To reduce the computational complexity, we ignored the influence of EWOD and simulated only the effect of structure on fluid behavior in laminar flow. The finite-element solver COMSOL Multiphysics software was used to create the mesh of the simulation domain and to discretize governing equations for a solution. Computations were performed on a desktop computer consisting of 16 cores (2 × eight-core Intel Xeon processor 2.60 GHz; 64-GB total memory).

### Capture characterization and optimization

An orthogonality experiment with various cell concentrations (1 × 10², 2.5 × 10², 5 × 10², 1 × 10³, 2.5 × 10³, and 5 × 10³ cells/μl) and settling time (0, 15, 30, 60, and 180 s), as the primary influencing factors, was set to optimize the capture efficiency. Additional variations, including the gap between the two plates (30, 50, 60, 100, and 150 μm), electrode size (0.44, 0.62, and 0.80 mm), and cell lines (K562 for suspension cells and MRC-5 for adherent cells), showed no significant effect on capture efficiency. As another influencing factor, we characterized the moving path of cells in droplet under various actuation voltages (100, 110, 120, 130, 140, 150, 160, 170, and 180 V). We applied a sequence of preprogrammed voltages to actuate a cell droplet for unmanned single-cell capture, while for manned operation, the droplet was controlled in real time by the operator. All cells were washed with PBS before each test for single-cell capture efficiency.
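The two primary factors listed above can be laid out as a condition grid. The sketch below assumes every concentration is paired with every settling time (a full factorial layout); the article does not state whether a reduced orthogonal array was used instead, and the helper name is hypothetical.

```python
from itertools import product

# Factor levels taken from the text.
concentrations_per_ul = [1e2, 2.5e2, 5e2, 1e3, 2.5e3, 5e3]  # cells/ul
settling_times_s = [0, 15, 30, 60, 180]                      # seconds

# Assumption: full factorial pairing of the two primary factors.
conditions = list(product(concentrations_per_ul, settling_times_s))


def cells_per_ml(cells_per_ul):
    """Convert a concentration from cells/ul to cells/ml (1 ml = 1000 ul)."""
    return cells_per_ul * 1000


print(f"{len(conditions)} conditions")
for conc, t in conditions[:3]:
    print(f"{conc:g} cells/ul ({cells_per_ml(conc):g} cells/ml), settle {t} s")
```

In this layout the condition of 2.5 × 10³ cells/μl with a 30-s settling time is one of 30 grid points.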

### Cell culture

All cell lines used in this study were purchased from the Cell Library of the Chinese Academy of Sciences. The human leukemia cell line K562 was grown in Dulbecco’s modified Eagle medium supplemented with 10% fetal bovine serum (FBS) and 1% penicillin-streptomycin (PS) at 37°C in a humidified incubator with 5% CO₂. The normal diploid human cell line MRC-5, with a 46,XY karyotype, was used to characterize the performance of WGA methods. These cells were cultured in Eagle’s minimum essential medium (MEM) plus 1% sodium pyruvate as an additional source of energy and 1.5 g liter⁻¹ dicarbonate, supplemented with 10% FBS and 1% PS, at 37°C with 5% CO₂. Once the cells became confluent, 0.25% trypsin-EDTA was used for cell dissociation, and cells were resuspended in fresh medium. MRC-5 cells with more than eight passages were discarded.

### Cell preparation and on-chip reagent

The cell suspension was washed with sterilized PBS at least three times and then resuspended in fresh PBS with 0.05% (v/v) Pluronic F68 (Sigma-Aldrich) to reduce biofouling. It was pipetted gently and then passed through a 40-μm filter three times to disperse the cells into single cells. After dilution with PBS containing 0.05% (v/v) Pluronic F68, the cell suspension had a concentration of about 2.5 × 10⁶ cells ml⁻¹.

On-chip single-cell WGA reagents included alkaline lysis buffer [400 mM KOH, 10 mM EDTA, and 100 mM dithiothreitol (pH ~13)], neutralization buffer [1 M tris-HCl (pH ~4)], and REPLI-g Single Cell Master mix (REPLI-g Single Cell Kit, Qiagen). All the above reagents, except for the neutralization buffer, were supplemented with 0.05% (v/v) Pluronic F68.

### Automated single-cell isolation, lysis, and MDA

The automated single-cell WGA was implemented by a modification of a previously reported protocol. Cell suspension, PBS, alkaline lysis buffer, neutralization buffer, and REPLI-g Single Cell Master mix were loaded into their designated reservoirs by pipetting and dispensed as unit droplets, whose volume was set by the size of the actuation electrode connected to the respective reservoir.

Before the experiment, the chip, silicone oil, and all reagents except DNA or enzymes were exposed to UV light for at least 30 min to eliminate all external DNA. After device assembly, a cell droplet was first actuated to the electrode containing the butterfly structure and allowed to settle for 30 s. Then, the droplet was moved through the structure to form a subdroplet and to trap a single cell at the U-shaped dummy. After a PBS droplet was dispensed to remove the undesired cells, the single-cell capture was complete, with a total time of about 1 min. For cell lysis and MDA, the volume ratio of the droplets containing the single cell, alkaline lysis buffer, and neutralization buffer was 1:6:6, and REPLI-g Single Cell Master mix was added at about four times their combined volume. Typically, 13.8 nl of alkaline lysis buffer was added to lyse the cell and incubated at 65°C for the specified time. Then, a 13.8-nl neutralization buffer droplet was dispensed to mix with the lysate at 4°C, followed by the addition of 120 nl of REPLI-g Single Cell Master mix. We actuated the droplet motion for full mixing. The MDA reaction was carried out on a heater set to 30°C for 10 hours with a final total volume of 150 nl, followed by heating at 65°C for 15 min to terminate the amplification.
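The droplet volumes stated above can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the function name is hypothetical, and the single-cell droplet volume (about 2.3 nl) is not stated in the text but inferred from the 1:6:6 ratio and the 13.8-nl lysis buffer droplet.

```python
def mda_volumes(lysis_nl=13.8, ratio=(1, 6, 6), mix_nl=120.0):
    """Return the droplet volumes (nl) implied by the stated ratio and additions."""
    cell_part, lysis_part, neut_part = ratio
    unit_nl = lysis_nl / lysis_part       # volume of one "part" of the ratio
    cell_nl = unit_nl * cell_part         # inferred value, not given in the text
    neut_nl = unit_nl * neut_part
    pre_mix_nl = cell_nl + lysis_nl + neut_nl
    return {
        "cell_nl": cell_nl,
        "neutralization_nl": neut_nl,
        "pre_mix_nl": pre_mix_nl,
        "total_nl": pre_mix_nl + mix_nl,
        "mix_to_pre_mix": mix_nl / pre_mix_nl,
    }


volumes = mda_volumes()
for name, value in volumes.items():
    print(f"{name}: {value:.2f}")
```

Under these assumptions, the lysed and neutralized droplet is about 29.9 nl, the 120 nl of master mix is about four times that volume, and the sum is close to the stated final reaction volume of 150 nl.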

### Amplified sample collection and purification

After amplification, we held the droplet by applying voltage to the relevant electrode and picked up the sample using low-attachment mouth pipets. The sample was diluted with water to 30 μl, and 1.8× AMPure XP beads were used for DNA purification. MDA yield was assessed by measuring the DNA concentration of 1 μl of diluted MDA product using the Qubit dsDNA HS Assay Kit (Thermo Fisher Scientific).
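As a minimal sketch of the bookkeeping behind this step, the bead volume and the total MDA yield follow directly from the 30-μl dilution. The function names and the example concentration are hypothetical, and the calculation assumes that the Qubit reading refers to the 30-μl diluted product.

```python
def bead_volume_ul(sample_ul=30.0, bead_ratio=1.8):
    """Volume of AMPure XP beads for a given sample volume and bead ratio."""
    return sample_ul * bead_ratio


def total_yield_ng(qubit_ng_per_ul, diluted_volume_ul=30.0):
    """Total DNA in the diluted MDA product, from its measured concentration."""
    return qubit_ng_per_ul * diluted_volume_ul


# Hypothetical reading of 20 ng/ul, used only to illustrate the arithmetic.
print(bead_volume_ul())        # bead volume in microliters
print(total_yield_ng(20.0))    # total yield in nanograms
```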

### Whole-genome library preparation and sequencing

For each amplified sample, 100 ng of DNA was used to build the sequencing library with the AnnoLib DNA Library Prep Kit for Illumina (Annoroad). The MRC-5 and K562 bulk samples and the single MRC-5 and K562 cells were sequenced on the Illumina HiSeq 4000 platform using the “rapid run” mode (two lanes per flow cell) with 2 × 150–base pair (bp) paired-end sequencing.

### Alignment and analysis of whole-genome sequencing data

Sequencing reads were trimmed of adaptor and barcode sequences by Illumina software on the sequencing instrument. Sequencing data (fastq files) from other published studies were downloaded from the National Center for Biotechnology Information online database. All sequencing data were aligned to the GRCh37-lite reference genome using BWA-MEM (version 0.7.17) with default settings. Aligned data were sorted, and PCR duplicates were marked, using Picard tools (version 2.18.13). SAMtools (version 1.9) was used to index the aligned and sorted data. Statistics were calculated over the entire reference genome using SAMtools (version 1.9) after downsampling all samples to the same number of total sequenced bases. In samples for which bulk data were available, Control-FREEC (version 10.9) was used to identify regions of the genome containing large-scale CNVs with a 500-bp bin size. These regions were omitted from all subsequent analyses of single-cell samples.
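The alignment steps above can be sketched as a list of command lines. This is an illustration under stated assumptions rather than the exact invocation used in the study: the file names, the thread count, and the `picard` launcher name (as installed by common package managers) are hypothetical, and the Picard arguments use the legacy `KEY=value` syntax accepted by the 2.18 series.

```python
def alignment_commands(sample, ref_fasta, fastq_1, fastq_2, threads=4):
    """Build (but do not run) the command lines for one paired-end sample."""
    sam = f"{sample}.sam"                    # hypothetical intermediate file names
    sorted_bam = f"{sample}.sorted.bam"
    marked_bam = f"{sample}.markdup.bam"
    metrics = f"{sample}.dup_metrics.txt"
    return [
        # BWA-MEM with default alignment parameters; -t only sets the thread count
        ["bwa", "mem", "-t", str(threads), "-o", sam, ref_fasta, fastq_1, fastq_2],
        # Coordinate sort with Picard
        ["picard", "SortSam", f"I={sam}", f"O={sorted_bam}", "SORT_ORDER=coordinate"],
        # Mark (not remove) PCR duplicates
        ["picard", "MarkDuplicates", f"I={sorted_bam}", f"O={marked_bam}", f"M={metrics}"],
        # Index the final BAM file
        ["samtools", "index", marked_bam],
    ]


for command in alignment_commands("cell1", "GRCh37-lite.fa", "r1.fq.gz", "r2.fq.gz"):
    print(" ".join(command))
```

Each list could be passed to `subprocess.run(command, check=True)` to execute the step.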

Comparison of all samples using binned reads was performed as follows. The HMMcopy readcounter function was first used to determine the number of aligned reads, excluding duplicate reads, falling within fixed-width bins across the genome for each sample. The mean number of reads per bin of the sample with the fewest reads was then found. SAMtools (version 1.9) was then used to randomly downsample binned reads of all other samples, resulting in equal mean numbers of reads per bin across all samples. This ensured that the same quantity of aligned data was compared for all samples. For the MRC-5 samples binned into 10- and 100-kb bins, HMMcopy functions in R were then used to correct downsampled binned reads for biases due to the GC content and mappability of each bin.
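The normalization to equal mean reads per bin amounts to computing one retention fraction per sample. The following sketch assumes the downsampling is done with the `-s` option of `samtools view`, whose argument encodes a random seed in the integer part and the retained fraction in the decimal part; the function names and sample values are hypothetical.

```python
def downsample_fractions(mean_reads_per_bin):
    """Fraction of reads to keep per sample so all match the sparsest sample."""
    if not mean_reads_per_bin:
        raise ValueError("no samples given")
    target = min(mean_reads_per_bin.values())
    return {name: target / mean for name, mean in mean_reads_per_bin.items()}


def samtools_subsample_arg(fraction, seed=1, digits=4):
    """Format SEED.FRACTION for `samtools view -s` (fraction strictly in (0, 1))."""
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must be strictly between 0 and 1")
    scaled = int(fraction * 10**digits)      # truncate to the requested precision
    return f"{seed}.{scaled:0{digits}d}"


# Hypothetical mean read counts per bin for three samples.
fractions = downsample_fractions({"cell_A": 400.0, "cell_B": 200.0, "bulk": 800.0})
for name, fraction in fractions.items():
    if fraction < 1.0:                       # the sparsest sample is left untouched
        print(name, samtools_subsample_arg(fraction))
```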

Lorenz curves were generated from the high-depth sequencing data by downsampling all samples to the same depth, defined as the number of aligned bases divided by the reference size [masked by a 75-bp universal mask (um75-hs37d5)]. To generate breadth versus depth curves, each sample was downsampled to between 0.5× and 10× sequencing depth relative to its reference [masked by a 75-bp universal mask (um75-hs37d5)] at increments of 0.5×, and BEDTools (version 2.17.0) was used to calculate coverage breadth at each depth. The CV plot is a better measure of amplification uniformity than the Lorenz curve and power spectrum. To draw the CV curve, we followed the analysis method of LIANTI (18). The calculation formula is

$$
CV(L) = \begin{cases} \sqrt{\dfrac{l}{dL} - \dfrac{l^{2}}{3dL^{2}}}, & L \geq l \\[2ex] \sqrt{\dfrac{1}{d} - \dfrac{L}{3dl}}, & L < l \end{cases}
$$

Here, L is the bin size, and the parameters were set for reads of length l = 150 bp sequenced to a depth of d = 10×.
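A minimal sketch of this calculation is given below. It assumes the piecewise form that results when reads of length l are placed uniformly at random along the genome (Poisson sampling), namely CV² = l/(dL) − l²/(3dL²) for L ≥ l and CV² = 1/d − L/(3dl) for L < l; the function name is hypothetical.

```python
import math


def poisson_cv(bin_size, read_len=150, depth=10):
    """CV of binned read depth expected from random read placement alone."""
    L, l, d = float(bin_size), float(read_len), float(depth)
    if L <= 0 or l <= 0 or d <= 0:
        raise ValueError("bin size, read length, and depth must be positive")
    if L >= l:
        return math.sqrt(l / (d * L) - l**2 / (3 * d * L**2))
    return math.sqrt(1 / d - L / (3 * d * l))


# CV at a few bin sizes (bp) for l = 150 and d = 10.
for size in (10, 150, 1_000, 100_000):
    print(size, round(poisson_cv(size), 4))
```

The two branches agree at L = l, and the value decreases monotonically as the bin size grows, approaching sqrt(l/(dL)) for bins much larger than a read.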

We used SAMtools and BCFtools to process the sequencing data and call SNPs with a root mean square mapping quality greater than 40 and a total read depth greater than 15. We called a nonreference (NR) allele if the NR allele was supported by at least five reads in the single-cell sample. If enough reads covered a heterozygous site, then both alleles had to be present and each had to account for more than 5% of all reads at that position. If not, the site was counted as a loss of heterozygosity, and the ADO rate was calculated as the number of lost heterozygous sites divided by the number of all heterozygous sites with sufficient depth. Error rates were calculated from SNVs on the single copy of the X chromosome in male MRC-5 cells. Heterozygous SNVs identified on the X chromosome were considered errors (all sites with insertions or deletions within 100 bp were filtered out). Each SNV was also compared with the unamplified sample: if the SNV was present in the unamplified sample, then it was considered a true-positive SNV; otherwise, it was considered a false positive.
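The ADO calculation can be illustrated with a short sketch. It assumes that the input is a list of per-site read counts at positions already known to be heterozygous (for example, from the bulk sample) and that "sufficient depth" means the same total depth threshold of more than 15 reads used for SNP calling; both are assumptions, and the function name and example counts are hypothetical.

```python
def ado_rate(het_sites, min_depth=15, min_allele_fraction=0.05):
    """Allele dropout rate over known heterozygous sites.

    het_sites: iterable of (ref_reads, alt_reads) in the single-cell sample.
    """
    evaluated = 0
    dropped = 0
    for ref_reads, alt_reads in het_sites:
        depth = ref_reads + alt_reads
        if depth <= min_depth:               # not enough reads to judge this site
            continue
        evaluated += 1
        # Dropout: the minor allele is absent or at most 5% of the reads
        if min(ref_reads, alt_reads) / depth <= min_allele_fraction:
            dropped += 1
    if evaluated == 0:
        raise ValueError("no heterozygous sites with sufficient depth")
    return dropped / evaluated


# Hypothetical read counts: two balanced sites, two dropouts, one shallow site.
sites = [(10, 10), (20, 0), (30, 1), (5, 5), (12, 8)]
print(ado_rate(sites))
```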

Lorenz curves were generated from the high-depth sequencing data by downsampling all samples to the same depth, defined as the number of aligned bases divided by reference size (taking into consideration omitted genomic regions in each cell type). To generate breadth versus depth curves, each sample was downsampled to between 0.5× and 10× sequencing depth relative to its reference at increments of 0.5×, and BEDTools was used to calculate coverage breadth at each depth.
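For illustration, a Lorenz curve and a coverage breadth value can both be derived from a list of per-position (or per-bin) read depths. The sketch below is a simplified stand-in for the BEDTools-based workflow, with hypothetical function names and toy inputs.

```python
def lorenz_curve(depths):
    """Points (fraction of genome, cumulative fraction of reads), sorted by depth."""
    ordered = sorted(depths)
    total = sum(ordered)
    if not ordered or total == 0:
        raise ValueError("need at least one position with nonzero depth")
    points = [(0.0, 0.0)]
    running = 0
    for index, depth in enumerate(ordered, start=1):
        running += depth
        points.append((index / len(ordered), running / total))
    return points


def coverage_breadth(depths, min_depth=1):
    """Fraction of positions covered by at least min_depth reads."""
    depths = list(depths)
    if not depths:
        raise ValueError("no positions given")
    return sum(1 for depth in depths if depth >= min_depth) / len(depths)


# Perfectly uniform coverage lies on the diagonal; skewed coverage bows below it.
print(lorenz_curve([1, 1, 1, 1]))
print(lorenz_curve([0, 0, 0, 4]))
print(coverage_breadth([0, 0, 0, 4]))
```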

CNVs were called using the HMMcopy software package (24), which takes in normalized binned read depth, groups contiguous bins into segments predicted to have equal copy number, and assigns a copy number to the bins in each segment using a hidden Markov model. For all sample datasets except for the MRC-5 samples binned into 10- and 100-kb bins, the following custom HMMcopy parameters were used for CNV calling. Seven copy number states were used; the m values were set to 0, 0.5, 1.0, 1.5, 2.0, 2.5, and 3.0 for copy number states 0, 1, 2, 3, 4, 5, and 6, respectively; the μ values were set to 0, 0.5, 1.0, 1.5, 2.0, 2.5, and 3.0 for copy number states 0, 1, 2, 3, 4, 5, and 6, respectively; the κ values were set to 25, 50, 800, 50, 25, 25, and 25 for copy number states 0, 1, 2, 3, 4, 5, and 6, respectively; the e value was set to 0.995; and the S value was set to 35. To find the concordance between the copy number states of bins of single-cell samples and bulk DNA in five MRC-5 cells, only bins with a mappability score above 0.85 were considered.
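The custom parameters and the concordance filter can be summarized in a short sketch. The dictionary simply transcribes the values listed above, with key names chosen here for illustration; the concordance function and its toy inputs are hypothetical and assume that concordance is the fraction of retained bins whose single-cell copy number equals the bulk copy number.

```python
# Values transcribed from the text; index i corresponds to copy number state i.
HMMCOPY_PARAMS = {
    "m": [0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0],
    "mu": [0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0],
    "kappa": [25, 50, 800, 50, 25, 25, 25],
    "e": 0.995,
    "S": 35,
}


def cn_concordance(single_cell_cn, bulk_cn, mappability, min_mappability=0.85):
    """Fraction of well-mapped bins with identical single-cell and bulk copy number."""
    kept = [
        (cell, bulk)
        for cell, bulk, score in zip(single_cell_cn, bulk_cn, mappability)
        if score > min_mappability           # keep only bins above the threshold
    ]
    if not kept:
        raise ValueError("no bins pass the mappability filter")
    return sum(1 for cell, bulk in kept if cell == bulk) / len(kept)


# Toy example: the last bin is excluded because its mappability is 0.5.
print(cn_concordance([2, 2, 3, 2], [2, 2, 2, 2], [0.90, 0.95, 0.99, 0.50]))
```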