Computational integration of nanoscale physical biomarkers and cognitive assessments for Alzheimer’s disease diagnosis and prognosis


Science Advances  28 Jul 2017:
Vol. 3, no. 7, e1700669
DOI: 10.1126/sciadv.1700669


With the increasing prevalence of Alzheimer’s disease (AD), significant efforts have been directed toward developing novel diagnostics and biomarkers that can enhance AD detection and management. AD affects the cognition, behavior, function, and physiology of patients through mechanisms that are still being elucidated. Current AD diagnosis is contingent on evaluating which symptoms and signs a patient does or does not display. Concerns have been raised that AD diagnosis may be affected by how those measurements are analyzed. These limitations stem from an incomplete understanding of the dynamic progression of the disease and of its multiple symptoms across scales. Unbiased means of diagnosing AD, using computational algorithms that integrate multidisciplinary inputs ranging from nanoscale biomarkers to cognitive assessments and that capture both biochemical and physical changes, may provide solutions. We show that nanoscale physical properties of protein aggregates from the cerebral spinal fluid and blood of patients are altered during AD pathogenesis and that these properties can be used as a new class of “physical biomarkers.” Using a computational algorithm, developed to integrate these biomarkers and cognitive assessments, we demonstrate an approach to impartially diagnose AD and predict its progression. Real-time diagnostic updates of progression could be made on the basis of the changes in the physical biomarkers and the cognitive assessment scores of patients over time. Additionally, the Nyquist-Shannon sampling theorem was used to determine the minimum number of necessary patient checkups to effectively predict disease progression. This integrated computational approach can generate patient-specific, personalized signatures for AD diagnosis and prognosis.


Alzheimer’s disease (AD) is an age-related neurodegenerative disorder that results in the gradual deterioration of specific brain regions, which hinders the person’s ability to think, recall memories, learn, and perform daily tasks (1). The unknown mechanisms that promote and drive AD pathogenesis have likely resulted in the dearth of effective treatments. The increasing prevalence of AD has become a global concern (2). Currently, AD is diagnosed using the “evaluate and eliminate” approach (3). With this strategy, patient history, physical exams, laboratory tests, imaging scans, and neurophysiological assessments are examined by doctors as a means to diagnose AD and determine its progression (4). This can be problematic, however, because most symptoms of AD are only identifiable once the disease has significantly progressed (5). Thus, to improve disease diagnosis, means for early detection are needed.

Proteins like amyloid-β (Aβ) and tau (6–10) are some of the most well-studied biomarkers to clinically diagnose and monitor AD (11), because their morphological aggregation (12–15), oligomerization (16), and fibrillization (17) are closely related to disease progression. However, despite their utility, these biomarkers have time, financial, and detection limitations (18). Cognitive assessments are also used for early AD diagnosis, because cognitive impairments precede functional impairments (19). Commonly used assessments, including the Mini-Mental State Examination (MMSE) (20), can provide brief screening tests that quantitatively assess the severity of cognitive impairment and document cognitive changes occurring over time (21). However, they have sensitivity limitations due to their reliance on conditional factors, such as patient perceptions, education levels, test familiarity, responses, and performance (22). Although diagnostic approaches that focus solely on biochemical analysis and brain imaging (23) play an important role as complementary tools for cognitive assessments, they do not comprehensively capture the complex nature of macro- to nanolevel changes that occur during disease progression (12–15).

Currently, physical properties such as cell stiffness (Young’s modulus) are being used in oncological research to distinguish malignant cancer cells from benign cells (24–26). Studies have also been conducted with peptide self-assembly that relate structures to molecular activities and mechanical properties (27). Because pathological protein components inside the cerebral spinal fluid (CSF) and blood undergo macro- to nanolevel physical changes that promote physiological changes, such as the formation of protein aggregates that reflect disease advancement (28), these components and their physical changes during the aggregation may be detectable through nanoscale characterization and may have the potential to be used as a new class of “physical biomarkers” for AD diagnosis. This may provide a potential measure to determine how alterations to the nanomechanics and nanomorphology of proteins in patients’ CSF and blood reflect and affect AD onset and pathogenesis.

However, the current evaluate-and-eliminate approach fails to sufficiently assess the combined effects of each measurement, which makes it difficult to develop new therapies or early diagnostics (29–31). Accordingly, there is immense promise in developing unbiased technologies that can computationally integrate cognitive assessments and additional biomarkers to diagnose AD, monitor its progression, and provide preventative recommendations. Considering that (i) the effects of widespread alterations that occur during AD pathogenesis are extensive and benefit from being analyzed using a combination of parameters including cognitive assessments and physical changes of protein biomarkers and that (ii) analyzing all the data at once would better represent a patient’s disease state, we hypothesized that computational integration of behavior-based cognitive assessments and protein physical biomarkers would lead to a more effective method to comprehensively diagnose and predict the course of an individual’s AD.

To achieve the above goal, a Kalman filter (KF)–based computational algorithm was used. The KF is a recursive algorithm that has superior capabilities for predicting dynamic processes because it integrates multiple measurements and predictions to form optimal estimates (32, 33). Unlike the evaluate-and-eliminate approach, we acknowledge that measurements from different types of complementary assessments need to be computationally integrated and analyzed concurrently to best reflect the nano-to-systemic effects of AD (Fig. 1).

Fig. 1 Using a computational approach to diagnose AD using nanocharacterization and cognitive assessments.

An atomic force microscope (AFM) and an HMI system were used to characterize nanomechanical and morphological properties of protein components from human CSF and blood samples. These properties were inputted into the algorithm with cognitive assessments to diagnose AD and predict its progression.

In this research, we first used a KF-based algorithm to process conventional cognitive assessment scores to diagnose disease states. The smoother prediction obtained with the KF allows a better analysis of AD progression by following a consistent trend (fig. S1). This could exclude diagnostic errors caused by variation among doctors’ diagnoses and by inconsistent fluctuations in AD patients’ performance during cognitive assessment. These results show the promise of the KF algorithm in AD diagnosis and prognosis.

Next, physical biomarkers were collected from CSF and blood (serum) samples to characterize AD status at the nanoscale (“Nanoscale characterization” section). To ensure the effectiveness of the nanoscale biomarkers, we collected CSF and serum samples from both AD patients and healthy subjects as controls. The CSF and serum have differential Aβ concentrations, with the serum having a low abundance and the CSF having a relatively high abundance. As a result, we developed different characterization approaches to obtain nanoscale physical properties from each sample type. In the CSF samples, protein aggregates were collected, and their material and morphological properties were measured using atomic force microscopy (AFM) and a high-contrast microparticle imaging (HMI) system. For the serum samples, functionalized AFM tips (34, 35) were used to detect Aβ components and measure their force interactions with other proteins. Synthesized Aβ was also used to determine whether freeze-thaw cycles would affect protein particle aggregation, independent of native Aβ isolation centrifugation (fig. S2).

Finally, we integrated multiple physical properties of the CSF and serum protein components into the KF-based algorithm for AD diagnosis (“Computational integration” section). We developed an unbiased approach to diagnose AD and predict its progression by integrating the cognitive assessment scores and physical biomarkers, which are the nanoscale physical properties of protein aggregates from patients. Patient-specific signatures for AD diagnosis and prognosis were generated through this integrated computational approach (Fig. 1).


Nanoscale characterization

Nanomechanical and morphological characterizations of CSF protein components provide a new class of physical biomarkers for AD diagnosis. To date, few studies have investigated the nanomechanical properties of proteins implicated in AD (27, 36, 37). This is most likely due to challenges associated with quantitatively characterizing 1000+ CSF proteins (38) and then subsequently associating their physical characteristics with AD pathogenesis. We first evaluated the existence, abundance, and localization of the AD hallmarks, Aβ and tau, in the CSF’s protein components (figs. S3 and S4). Then, we used AFM-based nanoindentation to determine whether the elastic modulus (Young’s modulus, the relationship between stress and strain in a material) of CSF protein components was altered at different stages of disease progression. Topographic mappings (Fig. 2B) were performed on all samples and allowed us to calculate Young’s modulus mappings (Fig. 2A); these values were associated with quantitative differences among different groups. On average, Young’s modulus mapping revealed that healthy protein samples were less stiff than diseased ones [analysis of variance (ANOVA), mean ± SD; healthy, 63.542 ± 30.17 MPa; mild AD, 97.38 ± 47.37 MPa, P < 0.01; moderate AD, 127.0 ± 53.4 MPa, P < 0.001; severe AD, 138.3 ± 66.69 MPa, P < 0.001; Fig. 2A]. However, comparison between the three disease states showed no significant differences. The average Young’s modulus was calculated over a whole mapping region, which included both the protein aggregates and the areas without aggregates. These latter areas, which we called irrelevant areas, were rigid and affected our ability to distinguish between disease states. We examined the plausibility of this limitation and found that diseased samples had more aggregates and irrelevant areas.
To more accurately characterize the aggregates’ nanomechanics, we removed the irrelevant areas in our analysis and focused on calculating the Young’s modulus of protein aggregates (particles). In this analysis, aggregates were identified from the topographic images (Fig. 2B) as high points, and their corresponding positions on the Young’s modulus mappings were extrapolated to determine the particles’ stiffness (Young’s modulus). The protein aggregates and particles of healthy subjects were less stiff than those with AD (ANOVA, mean ± SD; healthy, 11.78 ± 11.54 MPa; mild AD, 24.21 ± 16.86 MPa, P < 0.01; moderate AD, 37.38 ± 16.5 MPa, P < 0.001; severe AD, 54.09 ± 23.41 MPa, P < 0.001; Fig. 2C). A significant increase in stiffness was observed from healthy to mild AD and from mild AD to severe AD. These observed increases in Young’s modulus during disease progression could have resulted from alterations in the mechanical properties of Aβ(1–42) or tau aggregates and may be correlated with changes in their molecular structures. This would be in line with former studies showing that during Aβ(1–42) aggregations, the protein transforms from an oligomer to a mature fibril (39), consequently altering its internal structure and mechanical properties as the disease progresses (27). Furthermore, these results indicated that the nanomechanics of CSF protein components could be correlated with AD disease stages. As AD pathogenesis increased, so did aggregate stiffness. This finding is in support of our hypothesis that these properties can be used as physical biomarkers for AD diagnostics.
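The masking step described above can be illustrated with a minimal sketch. It assumes that "high points" are selected by a simple height threshold, which is our simplification and not necessarily the criterion used in the study; the function name `particle_modulus`, the threshold, and the tiny 3 × 3 maps are hypothetical values chosen for illustration only.

```python
from statistics import mean


def particle_modulus(height_map, modulus_map, height_threshold):
    """Average the Young's modulus over pixels whose height exceeds a threshold.

    Assumption: aggregates are the pixels above `height_threshold` in the
    topography map; the study only states that they were "high points".
    """
    values = [
        modulus_map[r][c]
        for r, row in enumerate(height_map)
        for c, h in enumerate(row)
        if h > height_threshold
    ]
    if not values:
        raise ValueError("no pixels above the height threshold")
    return mean(values)


# Hypothetical co-registered maps (heights in nm, moduli in MPa).
height_nm = [
    [2.0, 3.0, 2.5],
    [2.0, 40.0, 35.0],
    [1.5, 2.0, 3.0],
]
modulus_mpa = [
    [150.0, 140.0, 160.0],
    [155.0, 20.0, 30.0],
    [145.0, 150.0, 158.0],
]

# Whole-map average mixes rigid background pixels with the aggregates.
whole_map_mean = mean(v for row in modulus_mpa for v in row)
# Aggregate-only average uses just the two tall pixels.
aggregate_mean = particle_modulus(height_nm, modulus_mpa, height_threshold=10.0)
print(whole_map_mean, aggregate_mean)
```

In this toy example the rigid background dominates the whole-map average, which is the effect the text attributes to the irrelevant areas.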

Fig. 2 Nanocharacterization of CSF protein components.

(A) The average Young’s modulus of an entire mapping area using AFM. Healthy samples showed lower values than disease cases. However, the comparison between moderate and severe cases showed no significant difference. (B) Representative images of a topography mapping and a Young’s modulus mapping using AFM. (C) Young’s modulus of particles was obtained from the corresponding points that had relatively high value in the topography mapping. Young’s modulus was significantly increased along with the disease progression. (D) Representative images of particles of four disease stages using the HMI system (CytoViva) (scale bar, 100 μm). (E) Representative AFM images of protein particles from four disease stages (scale bar, 1 μm). Height analysis of those particles was performed using a profiling line crossing the particles to determine particle height (z axis). (F) Particle concentration, shown as the numbers in a constant region of 675 × 900 μm². There were significantly more particles in moderate and severe cases than in the healthy group. (G) Data of particle height. All disease cases were significantly larger than the healthy group. The particle height showed a gradual increase along with the disease progression. All data are presented as box plots. Sample number n = 34. One-way ANOVA test was performed; α = 0.05. **P < 0.01; ***P < 0.001.

We then asked whether AD pathogenesis resulted in alterations to the nanomorphology of those particles and whether those alterations could serve as additional physical biomarkers. To answer these questions, we determined differences in the physical properties of patients’ CSF protein components using HMI, which allowed us to characterize the particle concentration and distribution of the CSF components. In healthy subjects, minimal aggregates and a uniform particle distribution were observed; however, disease progression was accompanied by detection of more aggregates and larger particles (Fig. 2D). Particle concentration, but not the size (fig. S6), was significantly greater in moderate and severe AD compared to controls (ANOVA, mean ± SD; healthy particles, 110.6 ± 39.22; mild AD particles, 132.9 ± 60.44, P < 0.001; moderate AD particles, 157.1 ± 93.83, P < 0.001; severe AD particles, 303.3 ± 146.5, P < 0.001; Fig. 2F). Differences among mild AD, moderate AD, and healthy cases were not observed. This could have been due to the pathological status within the brain, which undergoes increased rates of altered pathology as disease severity increases. From the nanomorphological data, we determined that CSF particle concentration had applicability as a physical AD biomarker.

Additional nanoscale imaging was performed using AFM to enhance detection of the nanomorphology of protein aggregates. In the AFM images, particles were represented as white spots, and the gray-scaled image pixels served as individual measurements to determine the particle height (Fig. 2E). These images revealed clear differences and showed that moderate and severe AD cases had the highest prevalence of large aggregates. Height characterization of individual particles within the CSF showed that the particle height gradually increased as the severity of the disease progressed (ANOVA, mean ± SD; healthy particle height, 20.72 ± 7.06 nm; mild AD, 36.77 ± 8.97 nm, P < 0.001; moderate AD, 51.53 ± 11.40 nm, P < 0.001; severe AD, 69.69 ± 12.99 nm, P < 0.001; Fig. 2G). This finding was consistent with those from the earlier HMI and nanomechanical experiments, which showed that structural changes occurred in the protein components as disease severity increased, and it supports the use of particle height as a physical AD biomarker.

Nanocharacterization of serum protein components enriches the physical biomarkers from the force perspective. Functionalized AFM tips were used to determine how nanoforce measurements of protein aggregates in serum are associated with disease progression. As shown in Fig. 3A, when the functionalized AFM tip was bound with Aβ(1–42), changes in tip thermal frequency occurred. As binding increased, larger changes in thermal frequency resulted. On the basis of the level of the changes, we determined the relative amount of Aβ(1–42) that was associated with different AD stages. Aβ(1–42) amounts decreased with increasing disease progression (ANOVA, mean ± SD; healthy, 5.9 ± 1.44; mild AD, 3.72 ± 0.59, P < 0.001; moderate AD, 2.69 ± 0.56, P < 0.001; severe AD, 2.51 ± 0.72, P < 0.001; Fig. 3B).

Fig. 3 Nanocharacterization of serum protein components using functionalized nanoprobes with AFM.

(A) The thermal frequency of the tip changed when the tip was bound with Aβ. (B) The relative changes of tip thermal frequency decreased along with the disease progression. (C) The serum and anti-Aβ antibody interaction was measured when the targeted binding sites were detected during the mapping of selected areas. (D) The serum and anti-Aβ antibody interaction decreased along with the disease progression. Lower amount of Aβ inside the serum of AD patients was demonstrated. (E) The binding force between Aβ aggregates and anti-Aβ antibody was measured when the Aβ aggregates and the substrate (coated with anti-Aβ antibody) were separated. (F) The binding force has an increasing trend along with the disease progression. (G) The molecule force between Aβ aggregates and anti-Aβ antibody was extracted from the plateau-shaped region of retract force curves. (H) An increasing trend of molecule force was found, reflecting the increasing fibrillization caused by Aβ components associated with disease progression. All data are presented as box plots. Sample number n = 30. One-way ANOVA test was performed; α = 0.05. *P < 0.05; **P < 0.01; ***P < 0.001.

Serum and anti–Aβ(1–42) antibody interactions were then characterized. Using a serum-coated substrate and a functionalized AFM tip, differences in binding between the serum proteins from different stages of AD and the anti–Aβ(1–42)–coated tip were measured (Fig. 3C). Serum samples with more Aβ(1–42) resulted in more binding sites and thus greater serum and anti–Aβ(1–42) antibody interactions. As AD status progressed, a decreasing serum and anti–Aβ(1–42) antibody interaction was observed (ANOVA, mean ± SD; healthy, 6.11 ± 1.33 nN; mild AD, 4.82 ± 0.68 nN, P < 0.001; moderate AD, 3.33 ± 0.5 nN, P < 0.001; severe AD, 3.06 ± 0.6 nN, P < 0.001; Fig. 3D). This result was consistent with conventional approaches demonstrating that blood Aβ decreases along with disease progression (40).

We hypothesized that the Aβ-embedded proteins of AD patients had differential aggregation and fibrillization capabilities compared to those of healthy people (41). To test this hypothesis, a substrate coated with anti-Aβ antibody was used to measure the binding force between the Aβ aggregates and the anti-Aβ antibody. Here, an AFM tip was bound with Aβ aggregates (as in Fig. 3A); then, the tip was moved to make contact with the substrate so that additional binding between the Aβ aggregates and the substrate could be generated. The tip was then lifted to break the binding, and the binding force was measured (Fig. 3E). The binding force increased gradually with disease progression (ANOVA, mean ± SD; healthy, 3.48 ± 0.59 nN; mild AD, 3.68 ± 0.62 nN, not significant (NS); moderate AD, 4.2 ± 1.38 nN, P < 0.05; severe AD, 6.95 ± 1.37 nN, P < 0.001; Fig. 3F). Furthermore, single molecule force was extracted from the force curves. The plateau-shaped regions showing a constant force that is independent of the extension length in the retract force curves represented the molecule force between Aβ aggregates and anti-Aβ antibody (Fig. 3G), in which an increase was found (ANOVA, mean ± SD; healthy, 56.1 ± 15.1 pN; mild AD, 67.7 ± 17.8 pN, NS; moderate AD, 74.9 ± 16.5 pN, P < 0.01; severe AD, 82.8 ± 14.5 pN, P < 0.001; Fig. 3H). Because different fibrillization may result in various neurotoxicities (41, 42), this trend may reflect the increasing fibrillization capability of the pathological Aβ components during different disease stages.

Computational integration

Framework of the computational algorithm for AD diagnosis and prognosis.
Data input

Having identified nanomechanical, nanomorphological, and force properties that could serve as physical biomarkers, we then integrated these variables with cognitive assessments as inputs for a computational algorithm to quantitatively diagnose and predict AD progression of individual patients (Fig. 4A). AD stages that were characterized by ordinal variables ranging from 1 to 4, representing increasing severity of disease, were included for the disease progression. Decimal numbers, such as 1.8, could be used to represent disease progression.

Fig. 4 AD diagnosis using a KF-based algorithm.

(A) Flowchart of the computational integration of physical biomarkers and cognitive assessments for AD diagnosis and prediction. Nanoscale measurements of CSF and serum samples from a new patient were fused with those measurements of the subjects with known AD stages. Then, the fused data were used to train the KF model. Diagnosis, progression prediction, and optimal checkup frequency were ultimately obtained. (B) Diagnosis of AD stages based on various combinations of physical parameters was conducted to evaluate the diagnostic accuracy. The accumulated diagnosis errors for all cases were calculated by comparing with the doctor’s diagnosis results. AD stages were converted into ordinal variables by assigning ranks of 1 to 4. The healthy condition was labeled 1, and numeric assignments increased to 4, which represented severe AD. Measured data and medical records from doctors of 34 patients were used as the population database for the KF model training. Four female patients at the “age/AD stage” of 73/healthy, 75/mild, 77/moderate, and 81/severe were chosen to formulate a new virtual patient to test the proposed KF-based algorithm for the following AD analysis.

Computational algorithm

The KF predicts dynamic states by making estimates based on an initial state, and then updating its estimates when new measurements are acquired. To do this, the KF has two stages: (i) a “prediction stage,” which generates an a priori estimate about a patient’s disease state from inputted measurements, and (ii) an “update stage,” which readjusts the prediction based on follow-up measurements. In our algorithm, both stages included cognitive assessment scores and nanoscale characterizations of the proteins from patients with varying degrees of AD.
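The two stages can be sketched with a one-dimensional KF. This is only an illustration under strong assumptions: the state is the AD stage itself, it evolves as a random walk with a constant drift, and each measurement is a direct noisy reading of the stage. The model used in this study maps biomarkers and cognitive scores to the stage and is described in Materials and Methods; the names `kf_predict` and `kf_update`, the noise variances, and the measurement values below are hypothetical.

```python
def kf_predict(x, p, drift, q):
    """Prediction stage: form the a priori estimate and its variance.

    Assumed model (not from the study): stage_next = stage + drift + noise,
    where the process noise has variance q.
    """
    x_prior = x + drift
    p_prior = p + q
    return x_prior, p_prior


def kf_update(x_prior, p_prior, z, r):
    """Update stage: correct the a priori estimate with a new measurement z.

    r is the measurement noise variance; k is the Kalman gain.
    """
    k = p_prior / (p_prior + r)
    x_post = x_prior + k * (z - x_prior)
    p_post = (1.0 - k) * p_prior
    return x_post, p_post


# Hypothetical follow-up measurements on the 1-4 stage scale.
measurements = [1.4, 1.9, 2.6, 3.1]
x, p = 1.0, 1.0  # initial state estimate and its variance (assumed)
estimates = []
for z in measurements:
    x, p = kf_predict(x, p, drift=0.5, q=0.05)
    x, p = kf_update(x, p, z, r=0.2)
    estimates.append(x)
print(estimates, p)
```

Each follow-up measurement shrinks the estimate variance `p`, which mirrors the way additional visits tighten the prediction.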

Capability for analysis

The KF-based algorithm was trained to generate personalized predictions as a way to make an accurate prognosis that reflects each individual patient’s dynamic disease state. Three types of analysis were performed for individual patients: current AD status, rate of AD progression, and optimal checkup frequency. The approach used for modeling and training of the KF-based software package is described in detail in Materials and Methods.

Identification of key parameters for AD diagnosis and prognosis. To achieve a robust prognosis, we investigated which of our measurements were more closely correlated with AD status. For this purpose, a correlation matrix (in terms of AD status) was derived from patient data to rank the parameters obtained from nanoscale physical biomarkers. Two sets of parameters, including a CSF parameter set (particle concentration, length of aggregates, particle height, and particles’ Young’s modulus) and a serum parameter set (tip thermal frequency changes, serum and anti-Aβ antibody interaction, binding force, and molecule force), were investigated.

We obtained correlation coefficients (r) for eight parameters: particle concentration, r = 0.81; length of aggregates, r = 0.86; particle height, r = 0.96; particle Young’s modulus, r = 0.91 in the CSF; tip thermal frequency changes, r = 0.94; serum and anti-Aβ antibody interaction, r = 0.92; binding force, r = 0.9; and molecule force, r = 0.97 in the serum. These coefficients were then used as a reference to select parameters for KF inputs. On the basis of the results obtained from the correlation matrix, we selected parameters with the highest correlation to disease progression. For CSF, particle height and particle Young’s modulus were used, and for serum samples, tip thermal frequency changes and molecule forces were selected.
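The selection step can be reproduced from the coefficients reported above. The sketch below ranks each parameter set by r and keeps the two highest; the "top two per sample type" rule and the helper name `top_parameters` are our assumptions, chosen because they recover the four parameters named in the text.

```python
# Correlation coefficients (r) with AD status, as reported in the text.
correlations = {
    "csf": {
        "particle concentration": 0.81,
        "length of aggregates": 0.86,
        "particle height": 0.96,
        "particle Young's modulus": 0.91,
    },
    "serum": {
        "tip thermal frequency changes": 0.94,
        "serum and anti-Aβ antibody interaction": 0.92,
        "binding force": 0.90,
        "molecule force": 0.97,
    },
}


def top_parameters(r_by_name, k=2):
    """Return the k parameter names with the highest correlation coefficient."""
    ranked = sorted(r_by_name, key=r_by_name.get, reverse=True)
    return ranked[:k]


# Assumed rule: keep the two best-correlated parameters per sample type.
selected = {fluid: top_parameters(r) for fluid, r in correlations.items()}
print(selected)
```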

To demonstrate the sensitivity and effectiveness of using these parameters for determining the AD stages through KF-based diagnosis, we compared the diagnostic accuracy of different parameter combinations (Fig. 4B). Compared to the doctor’s diagnosis, the KF diagnostic results using parameters with higher correlation coefficients trended most like the doctor’s diagnosis and had greater stability and predictability for AD diagnosis.

Integrating cognitive assessment and key parameters of physical biomarkers for AD diagnosis and prognosis. The identified key nanoscale physical parameters and cognitive assessment scores, including MMSE and Self-Administered Gerocognitive Examination (SAGE; see details in fig. S1), were then inputted into the KF-based algorithm to provide the diagnosis about each patient’s disease status (fig. S7). By doing this, the patients’ cognitive assessment scores and nanobiophysical measurements could provide complementary information. Therefore, a dynamic model reflecting the relationships between the measurements and AD status was established. This allowed accurate diagnosis of AD status by excluding effects from the changes spanning from the cognitive to the nanophysical scale. This method prevents the KF-based algorithm from making quantitative conclusions regarding AD status using limited measurements (4). As displayed in a proof-of-concept study, this approach was proven to be in closer agreement to the doctor’s assessment of AD status, as compared to specific conditions in which cognitive assessments could not correctly stage a patient (fig. S8).

Effective diagnosis, unbiased progression prediction, and optimized patient checkups for AD achieved through the computational approach. To further demonstrate the clinical effectiveness of the proposed computational algorithm, as well as the differences between using CSF and blood-based data, all previous data covering healthy, mild AD, moderate AD, and severe AD were used for proof-of-concept testing. On the basis of the CSF data (Fig. 5, A and B), blood data (Fig. 5, C and D), and CSF-blood integrated data (Fig. 5, E and F), the computational results validated the effectiveness of the proposed computational approach in making prognoses (Fig. 5, A, C, and E) and determining optimal checkup frequency (Fig. 5, B, D, and F).

Fig. 5 The stage diagnosis, progression prediction, and optimal checkup frequency of AD patients based on CSF data, blood data, and CSF-blood combined data.

(A) The early stages of CSF data showed large errors. (C) On the basis of serum data, the large errors were presented in the later disease stages. (E) When CSF and serum data were both applied for the KF-based prediction, smaller error and better fit were obtained, which showed that more variable inputs for the KF-based algorithm would provide better results. (B) Top: Data distribution in terms of age. Black line is clinical data; green, purple, orange, red, and blue lines are simulated data obtained by linear interpolation of clinical data (in terms of AD status). They correspondingly represent data sets of 19, 7, 6, 5, and 4 visits. Bottom: Prediction results of AD status based on the simulated data as shown in (A). On the basis of CSF data, the corresponding prediction errors were 25.7% (blue line, 4 visits), 15.6% (red line, 5 visits), 8.5% (orange line, 6 visits), 4.3% (purple line, 7 visits), and 2.7% (green line, 19 visits) by comparing with the clinical results. The prediction error can be calculated by [equation not recoverable from the source], where x_pre,j is the predicted AD stage, x_clin,j is the clinically obtained AD stage, and j indicates the jth AD stage. The CSF case had good fit at early stages, but the later part differed too much. (D) The serum case had good fit for the whole changing trend, but the error was large. (F) The results from serum and CSF combined data fit well with doctor’s diagnosis when the visits reached five times in 8 years.

CSF data in Fig. 5A demonstrated the ability of the KF to accurately diagnose AD and to predict its progression. For diagnostic results (solid lines in Fig. 5A), the two-visit data (green solid line) showed an error of 50%. The model diagnosed a patient as stage 3 (moderate) when the doctor determined that she was stage 2 (mild). On the basis of the three-visit data (red solid line), where the diagnosis error was 6.7%, the model predicted that the patient was stage 2.8 (between mild and moderate and closer to moderate), a finding more in line with the doctor’s. By the fourth visit (solid blue line), diagnosis error was reduced to 2.5%, a finding that highlights how instrumental follow-up visits were for the KF-based algorithm to accurately make predictions. The KF was used not only to diagnose current AD status but also to predict its progression (dashed lines in Fig. 5A). Although estimations of AD progression using one to two visits (purple and green dashed lines) were possible, more checkups increased estimation accuracy (three visits, red dashed line). In Fig. 5A, one-visit data showed that stage 2.5 would occur when she was at age 73.5 years, whereas two-visit data predicted the onset of stage 2.5 when she was at age 75 years. In reality, the true onset was at age 76 years. Thus, the results demonstrated the clear effectiveness and potential of the proposed KF-based algorithm for unbiased AD diagnosis and prognosis. Notably, the CSF data did show prediction errors in the early disease stages, but unlike serum data, large errors were not present in the later disease stages (Fig. 5C). Combining the CSF and serum data as inputs for the KF-based prediction resulted in a smaller error and a better fit (Fig. 5E), thus demonstrating that more diverse inputs provide better results.
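The diagnosis errors quoted above are consistent with a relative error taken against the doctor’s stage. The sketch below works through the two cases for which the text gives enough numbers. The formula is our reading of those numbers and not necessarily the authors’ definition, and the doctor’s stage of 3 at the third visit is an assumption, since the text does not state it.

```python
def relative_error_percent(predicted_stage, clinical_stage):
    """Relative diagnosis error in percent, measured against the clinical stage.

    Assumed form: |x_pre - x_clin| / x_clin * 100.
    """
    return abs(predicted_stage - clinical_stage) / clinical_stage * 100.0


# Two-visit case from the text: model says stage 3, doctor says stage 2.
two_visit_error = relative_error_percent(3.0, 2.0)

# Three-visit case: model says stage 2.8; a clinical stage of 3 is ASSUMED here.
three_visit_error = relative_error_percent(2.8, 3.0)

print(round(two_visit_error, 1), round(three_visit_error, 1))
```

Under these assumptions the two values round to the 50% and 6.7% reported for the two-visit and three-visit data.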

Although more checkups improved prediction accuracy, visiting too frequently may cause inconvenience to patients and increase medical costs. To determine the optimal checkup frequency to capture any changes in the KF inputs that would alter its prediction accuracy, we integrated the Nyquist-Shannon sampling (NSS) theorem (detailed in Materials and Methods). The NSS is a method of establishing the appropriate sampling rate to capture all information from a continuous signal with a series of discrete signals (43). In our approach, SAGE scores and physical biomarkers were treated as continuous variables; thus, simulations of future visit frequency were conducted by interpolation of simulated visits, AD status, and measurements (physical properties and cognitive assessment scores). To capture the progression of pathological and clinical events that lead to AD (4), predictions required a sampling rate (medical checkup frequency) that was at least two times higher than the frequency of progression dynamics. Thus, there was a minimum sampling rate (visiting frequency) that had to be met to make possible predictions on the basis of all information available throughout AD progression. The linear relationship between predicted visits and the minimum visiting frequency was evaluated by reviewing and comparing prediction error rates. On the basis of CSF data, simulation results (Fig. 5B) were obtained from representative data sets of 19, 7, 6, 5, and 4 visits. The corresponding prediction errors were 25.7% (blue line, 4 visits), 15.6% (red line, 5 visits), 8.5% (orange line, 6 visits), 4.3% (purple line, 7 visits), and 2.7% (green line, 19 visits). On the basis of these results, a minimum of six visits, that is, one visit every year in the first 4 years, appeared to be best for generating an accurate prediction result (error, <5%). Additionally, to keep this prediction accuracy, patients needed to maintain follow-up visits once every 16 months. 
When we compared the optimization of checkup frequencies based on CSF, serum (Fig. 5D), and both (Fig. 5F), we found that the CSF data provided a good fit at early stages but differed too much in the later stages to make ideal predictions about the checkup frequency. The serum data fit the overall trend well, but the error was large. Using combined serum and CSF data (Fig. 5F), the results fit well and the five-visit curve was almost the same as the doctor’s diagnosis. This means that patients only need to maintain an average of one doctor visit every 20 months for accurate prognosis, compared to the 16-month requirement from the CSF data–based prediction case.
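The sampling constraint described above can be illustrated with a short sketch. It assumes, hypothetically, that the shortest period over which the KF inputs change meaningfully is known; the function name and the 32-month input are illustrative and are not values reported in the study.

```python
def max_visit_interval(shortest_period_months: float) -> float:
    """Longest spacing between checkups that still meets the Nyquist criterion.

    The sampling rate must be at least twice the highest frequency in the
    signal, so the interval can be at most half of the shortest period.
    """
    if shortest_period_months <= 0:
        raise ValueError("period must be positive")
    highest_frequency = 1.0 / shortest_period_months  # cycles per month
    min_sampling_rate = 2.0 * highest_frequency       # visits per month
    return 1.0 / min_sampling_rate                    # months between visits


# Hypothetical input: the fastest progression dynamics repeat every 32 months.
print(max_visit_interval(32.0))  # 16.0
```

Under this reading, a longer recommended interval simply corresponds to slower assumed dynamics in the fused inputs.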

Thus, in an optimal scenario, when a new patient arrives at the clinic, the KF-based algorithm would first diagnose AD status using baseline measurements from their CSF and blood nanocharacterization and cognitive inputs; then, additional computational means would be used to determine the optimal checkup frequency that captures any changes in their physical biomarkers or SAGE scores. The NSS, the KF, and the physical biomarkers from both CSF and blood provide a computational approach to determine the appropriate number and frequency of checkups needed to make real-time and accurate prognosis about a patient’s disease state.


Although much effort has been focused on evaluating cognitive assessment scores using computational methods such as noise filtration and data fusion, only small improvements in detection sensitivity have been achieved (44, 45). This is most likely due in part to (i) drastic differences in cognitive scores between different subjects with the same stage of AD and (ii) the limited number of measurement inputs. To address these concerns, some researchers have attempted to use regression modeling and support vector machine methods (46, 47). However, this has resulted in somewhat reliable short-term predictability and unreliable long-term predictability (48). In this research, we were able to overcome current AD diagnostic limitations in predictive modeling and biomarker utilization by using a KF-based algorithm that could computationally diagnose AD and predict its progression by integrating cognitive assessment scores and nanophysical characteristics of protein aggregates from CSF and serum. In the future, this method of computational integration could be further incorporated into software for automatic AD diagnosis based on various inputs collected during patient medical exams. This computational approach may reduce human error and help establish an unbiased approach to diagnose AD in hospitals, clinics, and research studies. Additionally, our findings support the previously hypothesized notion that AD progression is marked by alterations to the mechanical and morphological properties of pathological proteins in the CSF (17, 27) and blood (41, 42) and that the various fibrillization potentials of these proteins may help explain the pathological progression of AD.
Furthermore, we found that alterations to these physical properties during disease pathogenesis occur in a correlative manner and that the nanomechanics (particle Young’s modulus) and nanomorphology (particle height) of CSF protein components, as well as the molecule binding force of serum Aβ, have applicability as physical AD biomarkers. Overall, our findings have established associations between the altered physical structures of pathological proteins and AD pathogenesis and, more importantly, demonstrate that these physical biomarkers can be computationally integrated with behavior-based cognitive assessments to produce patient-specific diagnostic signatures that can be used to improve means of medically diagnosing AD and predicting its progression.


Study participants

The present study was performed at The Ohio State University and The Ohio State University Wexner Medical Center. We analyzed 34 CSF samples from patients with AD (n = 24) and from healthy individuals serving as controls (n = 10). We analyzed 30 serum samples from patients with AD (n = 22) and from healthy individuals serving as controls (n = 8). Patients who received an AD diagnosis met the Diagnostic and Statistical Manual of Mental Disorders (Third Edition Revised) criteria of dementia (49) and the criteria of probable AD defined by the National Institute of Neurological and Communicative Disorders and Stroke–Alzheimer’s Disease and Related Disorders Association (50). The controls underwent cognitive testing and neurologic examination by a physician, and individuals with objective cognitive or AD symptoms were not included as controls in the present study.

Collection of human CSF samples

CSF was obtained from a total of 34 human subjects ranging from 54 to 83 years of age (10 subjects each from the healthy group, mild AD group, and moderate AD group and 4 subjects from the severe AD group; details are shown in table S1). CSF was obtained by lumbar puncture by board-certified or board-eligible neurologists. The puncture was performed between the lumbar vertebrae L4/5 and L3/4. Numbing medication (Xylocaine) was injected into the skin before inserting the lumbar puncture needle to reduce the pain of the needle stick. A 22-gauge Sprotte spinal needle was used for lumbar puncture. We collected up to 20 ml of CSF from each patient. The samples were then centrifuged (1600g for 10 min at 4°C), and the supernatant was aliquoted into 2-ml plastic polypropylene screw cap tubes and frozen at −80°C for storage and further investigation.

CSF components preexperimental process

Protein aggregates were collected from CSF samples. Samples were first thawed at 4°C, and then 0.5 ml was centrifuged at 1600g for 15 min at room temperature. The precipitate (50 μl) was gently collected. Afterward, the collected portion was diluted 25× in volume using distilled (DI) water and centrifuged at 10,000g for 30 min, and the precipitate (50 μl) was collected. The same dilution procedure was repeated two more times. Eventually, the final precipitate (50 μl) was obtained and used for experiments.

Collection of human serum samples

Serum was obtained from a total of 30 human subjects ranging from 54 to 81 years of age (details are shown in table S2). Up to 10 ml of blood was collected and allowed to clot for 30 min at room temperature in a vertical position. The samples were centrifuged (1300g for 10 min at room temperature), and the supernatant was aliquoted into 2-ml plastic polypropylene screw cap tubes and frozen at −80°C for storage and further investigation.

Immunofluorescence assay of CSF components

Processed CSF samples (10 μl) were combined with rabbit-derived primary antibodies anti–Aβ(1–42) antibody (#ab10148) and anti-tau (#ab64193, Abcam) and goat-derived anti-rabbit secondary antibody (Abcam) in immunofluorescence assays. Normal rabbit IgG antibody (Abcam) was used to label negative control samples. Labeling procedures were as follows: 10 μl of sample was mixed with one of the above three primary antibodies (5 μl, 1 mg/ml), incubated in 100 μl of DI water at room temperature for 2 hours, and centrifuged at 10,000g for 15 min. The precipitate (10 μl) was then collected, and another 100 μl of DI water was added. The previous centrifuge step was repeated, and the precipitate (10 μl) was collected. Next, the previous wash process was repeated. Secondary antibody (5 μl, 1 mg/ml) was mixed with the precipitate and incubated in 100 μl of DI water at room temperature for 30 min. The sample was then centrifuged at 10,000g for 15 min, and the precipitate (10 μl) was collected. The wash process used with the primary antibody labeling was conducted twice, and the final precipitate (10 μl) was used for fluorescence imaging.

Nanomechanics characterization using AFM

A CSF sample (10 μl) was dried on a slide glass surface for AFM (MFP-3D SPM, Asylum Research) nanomechanics characterization. Young’s modulus of the sample was obtained via nanoindentation. As the cantilever tip contacted the sample, the interaction between the sample and the tip produced a force curve (force versus distance). From this curve, the built-in software used the slope to calculate and quantify the stiffness of the sample on the basis of Hertz theory (51). Samples were obtained from four disease groups, including healthy subjects, mild AD, moderate AD, and severe AD. Topographic mappings were first obtained, and then Young’s modulus mappings were calculated. Large particles were identified in topographic images, and their corresponding points in Young’s modulus mappings were extracted as the particle Young’s modulus.
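As a rough illustration of how a Hertz-type fit turns a force curve into a Young’s modulus, the sketch below fits the spherical-indenter form F = (4/3) · E/(1 − ν²) · √R · δ^1.5 by least squares. The tip geometry, tip radius, Poisson ratio, and the synthetic force curve are all assumptions made for this example; the instrument’s built-in software may use a different contact model.

```python
import math


def hertz_youngs_modulus(indentation_m, force_n, tip_radius_m, poisson_ratio):
    """Fit F = (4/3) * E / (1 - nu^2) * sqrt(R) * delta^1.5 for E (in Pa).

    Uses a least-squares line through the origin of F against delta^1.5.
    """
    x = [d ** 1.5 for d in indentation_m]
    slope = sum(f * xi for f, xi in zip(force_n, x)) / sum(xi * xi for xi in x)
    return 0.75 * slope * (1.0 - poisson_ratio ** 2) / math.sqrt(tip_radius_m)


# Synthetic, noise-free curve generated from assumed (hypothetical) parameters.
E_true = 2.0e6   # Pa
R_tip = 20e-9    # m, assumed spherical tip radius
nu = 0.5         # assumed Poisson ratio
depths = [i * 1e-9 for i in range(1, 11)]
forces = [(4.0 / 3.0) * E_true / (1 - nu ** 2) * math.sqrt(R_tip) * d ** 1.5
          for d in depths]

E_fit = hertz_youngs_modulus(depths, forces, R_tip, nu)
print(E_fit)  # recovers E_true up to floating-point error
```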

Nanomorphology characterization using HMI

The HMI (CytoViva; resolution limit, 90 nm) was used for particle visualization and analysis. Using the system, images of micro-/nanoscale structures and micro-/nanoparticles were captured (52). Ten microliters of CSF sample was covered by a thin cover glass and observed by CytoViva. Without fluorescence staining, the particles inside the samples were illuminated and visualized from samples of four disease groups, including healthy subjects, mild, moderate, and severe AD patients. For each subject, 10 images were collected at different positions of the sample (including the margin and the center). The average particle concentration was calculated.

Nanomorphology characterization using AFM

To obtain details of nanomorphology with height for analysis, 10-μl samples were dried on a slide glass, and an AFM was used under tapping mode. In tapping mode, the cantilever oscillated up and down near its resonance frequency, and the forces acting on the cantilever when the tip came close to the sample surface (Van der Waals forces, dipole-dipole interactions, and electrostatic forces) caused the amplitude of oscillation to decrease. Tapping mode was gentle enough to allow for visualization of supported lipid bilayers, adsorbed single polymer molecules (for instance, 0.4 nm thick), and high-resolution topographic imaging of the protein components.

Force measurement of serum samples using functionalized AFM tips

Coated with gold and anti-Aβ(1–42) antibody (#ab10148, Abcam), AFM tips were functionalized to specifically detect the Aβ-embedded proteins inside human serum. Serum or anti-Aβ(1–42) antibody (5 μl) was used to coat the substrates, respectively. The procedures of force measurement are the same as those of nanomechanics characterization. Binding force between the anti-Aβ–coated tip and the substrate was examined when the tip was lifted and the binding was broken (Fig. 3, C and E).

Data acquisition and statistical analysis

Experimental data sets of one subject are listed in tables S3 and S4.

CSF samples from 34 human subjects were used in the nanomechanics characterization of CSF. From AFM force curves, Young’s modulus of characterized areas was calculated on the basis of a Hertz model. Data processing was performed using the Igor software (Igor version 6.34, WaveMetrics Inc.). The final data were classed into four groups according to their AD stages and applied for the KF algorithm.

CSF samples from 34 human subjects were used in the nanomorphology characterization of CSF. Data processing is shown in fig. S9. First, high-contrast particle images obtained via CytoViva were processed by ImageJ (version 1.47, National Institutes of Health) to calculate particle numbers, concentration, and distribution. Second, high-resolution images obtained from AFM were analyzed using ImageJ and Igor software. Sample size, shape, and arrangement were obtained.

Serum samples from 30 subjects were used in the force characterization of serum. The force curves were processed using Igor software (Igor version 6.34, WaveMetrics Inc.). Serum and anti-Aβ antibody interaction and the plateau height in retract lines were calculated.

Statistical analysis was applied to determine the statistical significance of differences between samples from different AD stages. Statistical analysis was conducted using Prism 6 (GraphPad) software. One-way ANOVA with Dunnett’s post hoc test was used. Where there was a significant difference in variance between groups, a nonparametric test was applied. All data are presented as box plots (Fig. 2A). Meanings of labels in graphs are as follows: NS, not significant; *P < 0.05; **P < 0.01; ***P < 0.001. All final data were classed into four groups according to their AD stages and saved as measurement parameters applied for the KF algorithm, including particle Young’s modulus, concentration, size, and shape, as well as tip thermal frequency changes, serum and anti-Aβ antibody interaction, binding force, and molecule force based on serum samples.

KF-based algorithm for AD diagnosis

Dynamics modeling of KF-based algorithm for AD diagnosis. In this research, Xk was used to represent parameters characterizing AD state, such as AD status, patient age, and measurements. Zk was used to represent the measured data such as physical properties of CSF and serum components. k is the progression index. The progression dynamics of AD state can be modeled by the Markov linear system as shown in Eq. 1, considering that the current AD state was only related to the previous one, where A is the transition matrix and Wk is the noise generated during the modeling formulation. Furthermore, AD state (Xk) can be characterized by measurements (Zk) and described by a mapping matrix C and an estimation error Vk, as shown in Eq. 2. Q and R are covariance matrices to characterize the noise generated in the modeling and measurement process.

$$X_k = A X_{k-1} + W_k \tag{1}$$

$$Z_k = C X_k + V_k \tag{2}$$

KF-based algorithm for diagnosis. The KF-based algorithm was proposed for accurate AD analysis performed via recursive calculation as follows.

(1) The priori AD state ($\hat{X}_{k|k-1}$) can be estimated on the basis of the previous state ($\hat{X}_{k-1|k-1}$) by following Eq. 1

$$\hat{X}_{k|k-1} = A \hat{X}_{k-1|k-1}$$

(2) The priori error covariance ($P_{k|k-1}$), which is used to characterize the state estimation error, can be calculated on the basis of the previous error covariance ($P_{k-1|k-1}$)

$$P_{k|k-1} = A P_{k-1|k-1} A^{T} + Q$$

(3) With ($Z_k$), the priori estimate can be updated with an optimal KF gain ($K_k$)

$$\hat{X}_{k|k} = \hat{X}_{k|k-1} + K_k \left( Z_k - C \hat{X}_{k|k-1} \right)$$

where

$$K_k = P_{k|k-1} C^{T} \left( C P_{k|k-1} C^{T} + R \right)^{-1}$$

(4) The estimate error covariance can be updated as

$$P_{k|k} = \left( I - K_k C \right) P_{k|k-1}$$

(5) Proceed to the next cycle (k + 1) by going to step 1.
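The recursion in steps 1 to 5 can be sketched in a few lines of NumPy. This is a generic linear Kalman filter cycle, not the authors’ program: the two-component state (AD stage and its change per visit), the matrices A, C, Q, and R, and the stage readings are hypothetical values chosen only to make the example run.

```python
import numpy as np


def kf_step(x_prev, P_prev, z, A, C, Q, R):
    """One predict/update cycle of a linear Kalman filter."""
    # Steps 1 and 2: a priori state estimate and error covariance.
    x_prior = A @ x_prev
    P_prior = A @ P_prev @ A.T + Q
    # Step 3: optimal gain, then correct the prior with the measurement z.
    S = C @ P_prior @ C.T + R
    K = P_prior @ C.T @ np.linalg.inv(S)
    x_post = x_prior + K @ (z - C @ x_prior)
    # Step 4: a posteriori error covariance.
    P_post = (np.eye(len(x_prev)) - K @ C) @ P_prior
    return x_post, P_post


# Hypothetical toy model: state = [AD stage, stage change per visit].
A = np.array([[1.0, 1.0], [0.0, 1.0]])  # stage advances by its rate each visit
C = np.array([[1.0, 0.0]])              # only the stage is observed
Q = 1e-3 * np.eye(2)                    # assumed modeling noise covariance
R = np.array([[0.05]])                  # assumed measurement noise covariance

x = np.array([1.0, 0.0])                # initial guess
P = np.eye(2)
for z in [1.2, 1.6, 2.1, 2.4]:          # made-up stage readings per visit
    # Step 5: feed the posterior back in as the next cycle's previous state.
    x, P = kf_step(x, P, np.array([z]), A, C, Q, R)

print(x)  # filtered [stage, rate] after four visits
```

In a setting like the one described here, Zk would hold several measurements per visit, so C would have one row per measurement rather than a single row.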

Personalized prediction algorithm derivation for clinical application. The KF-based algorithm for AD analysis as shown in Eqs. 1 and 2 is fully determined by four KF parameters: A, C, Q, and R. To obtain a personalized prediction algorithm for an individual patient, these parameters were optimally tuned using the maximum likelihood and expectation maximization methods (53). The new patient’s data were fused with the population data based on the central limit theorem and Bayes theorem in the KF parameter-tuning process (54), further improving the prediction accuracy for new patients.
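One simple way to picture the fusion of population and patient data is a Gaussian Bayesian update, in which the population supplies the prior and the new patient’s visits supply the likelihood. The sketch below is an assumption made for illustration; the fusion rule actually used is the one described in (54), and the numbers are hypothetical.

```python
def fuse_gaussian(pop_mean, pop_var, patient_values, meas_var):
    """Posterior mean and variance of a parameter under a Gaussian prior.

    pop_mean, pop_var : prior taken from the population data
    patient_values    : the new patient's measurements of the same quantity
    meas_var          : assumed variance of a single patient measurement
    """
    n = len(patient_values)
    if n == 0:
        return pop_mean, pop_var
    patient_mean = sum(patient_values) / n
    # Precisions (inverse variances) add; means are precision weighted.
    post_precision = 1.0 / pop_var + n / meas_var
    post_mean = (pop_mean / pop_var + n * patient_mean / meas_var) / post_precision
    return post_mean, 1.0 / post_precision


# Hypothetical numbers: population prior 10 +/- 2, two patient readings of 14.
mean, var = fuse_gaussian(10.0, 4.0, [14.0, 14.0], 2.0)
print(mean, var)
```

With more visits, the patient term dominates and the fused estimate moves away from the population value.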

AD diagnosis using KF-based algorithm

AD diagnosis for a new patient was conducted by the following procedures (Fig. 4A).

Data acquisition. The nanomechanics and nanomorphology of CSF protein components were obtained through nanocharacterization. The SAGE/MMSE scores were obtained from cognitive assessments. Doctor’s judgments and the demographic information, such as age, gender, education level, race, and historical medical records, were recorded correspondingly for each patient.

Data filtration to remove abnormal data and noise. The above data were processed by the failed data detector to identify the abnormal data. The noise was reduced by using the KF-based fixed-interval smoother.

Data evaluation and selection to obtain the data representing AD progression. Collinear analysis was conducted using Pearson’s method to identify the dependent data. Then, Bayes theorem–based data fusion method was used to redefine these data and ensure that each piece of data independently reflects AD progression. Afterward, correlation of each piece of data to the AD status was identified by calculating the correlation matrix, based on which the optimal data for accurate AD diagnosis were selected.
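The collinearity screen can be sketched as a pairwise Pearson correlation over the candidate inputs. The feature names, the values, and the 0.9 cutoff below are hypothetical and serve only to show the mechanics; the study does not report the threshold it used.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def collinear_pairs(features, threshold=0.9):
    """Return feature pairs whose absolute correlation reaches the threshold."""
    return [(a, b) for a, b in combinations(features, 2)
            if abs(pearson(features[a], features[b])) >= threshold]


# Made-up measurements for five visits.
features = {
    "youngs_modulus": [1.0, 2.0, 3.0, 4.0, 5.0],
    "particle_height": [2.1, 3.9, 6.2, 8.0, 9.9],
    "sage_score": [20.0, 21.0, 15.0, 18.0, 12.0],
}
print(collinear_pairs(features))
```

Pairs flagged this way would be the candidates for the fusion step that follows.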

Combination of nanocharacterization and cognitive assessments to formulate complementary information for AD diagnosis. Nanocharacterization data are more sensitive in the early AD stage, and cognitive assessment data are more likely to change in the later stages. On the basis of this sensitivity difference, each data set was segmented and assigned a different weight to better reflect AD progression.

KF-based stochastic algorithm for the quantitative diagnosis of AD. With the above-obtained data input from one end of the algorithm, it was possible to generate accurate diagnosis information for the patient from the other end, including AD stage, potential risk to proceed to the next AD stage, and schedule for the next follow-up clinical visit.

Advising the optimal checkup frequency using KF-based algorithm

After a new patient came in for the first few times, the recommended clinical checkup frequency was given using the KF-based algorithm by the following procedures.

Data fusion and model training. Because a new patient had only one or two visits, the population-based model was personalized by fusing in the measured data from the first few visits. The KF-based algorithm was then trained on the fused data to obtain a personalized model for this specific patient.

Future data prognosis. On the basis of the fused data and the personalized model, the prognosis was made on the future visits. The age and physical properties of protein components were predicted for different AD status.

Data interpolation and simulation. After the predicted data for different AD status were obtained, data interpolation was conducted over different time intervals. Different visit points were inserted into the time period between each predicted AD status to simulate different visiting frequencies. Afterward, the stochastic model-based algorithm was trained again on all the obtained data, and a further prediction for each visiting sequence was given by the algorithm.
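The simulation of visit schedules can be pictured as interpolating between the predicted anchor points. The sketch below uses linear interpolation, which is an assumption (the interpolation scheme is not specified here), and the ages and stages are hypothetical.

```python
def interpolate_visits(ages, values, interval_years):
    """Linearly interpolate values at evenly spaced ages across the anchors.

    ages   : increasing ages at which a value was predicted
    values : predicted value (for example, AD stage) at each age
    """
    visits = []
    i = 0
    k = 0
    while True:
        age = ages[0] + k * interval_years
        if age > ages[-1] + 1e-9:
            break
        # Advance to the anchor segment that contains this age.
        while i < len(ages) - 2 and age > ages[i + 1]:
            i += 1
        t = (age - ages[i]) / (ages[i + 1] - ages[i])
        visits.append((age, values[i] + t * (values[i + 1] - values[i])))
        k += 1
    return visits


# Hypothetical predictions: stage 1 at age 70, stage 2 at 74, stage 3 at 76.
schedule = interpolate_visits([70.0, 74.0, 76.0], [1.0, 2.0, 3.0], 2.0)
print(schedule)
```

Changing `interval_years` produces the denser or sparser simulated schedules whose prediction errors are then compared.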

Best frequency selection. After predictions were made for different checkup frequencies, the prediction error for each frequency was calculated. Then, under a given error threshold, the best future visiting frequency was selected as the recommended clinical checkup frequency for the patient.
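The selection step amounts to choosing the least demanding schedule whose simulated error stays within the threshold. In the sketch below, the mapping from visit interval to prediction error and the 5% threshold are hypothetical placeholders, not results from the study.

```python
def recommend_interval(error_by_interval, max_error_pct):
    """Longest visit interval (months) whose prediction error meets the threshold.

    error_by_interval maps a candidate interval in months to its simulated
    prediction error in percent. Returns None if no candidate qualifies.
    """
    acceptable = [months for months, err in error_by_interval.items()
                  if err <= max_error_pct]
    return max(acceptable) if acceptable else None


# Hypothetical simulation output: interval in months -> prediction error (%).
simulated = {6: 2.0, 12: 3.1, 18: 4.6, 24: 9.8}
print(recommend_interval(simulated, 5.0))  # 18
```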


Supplementary material for this article is available at

fig. S1. Comparisons between cognitive assessment evaluations and the KF-based predictions of AD progression.

fig. S2. Synthesized Aβ was used to show the influence of a freeze-thaw cycle on the peptide and protein particle aggregation.

fig. S3. Identifying Aβ-embedded protein aggregates within the CSF.

fig. S4. Nanomorphology of tau-embedded proteins within the CSF during AD pathogenesis.

fig. S5. Age- and gender-related analysis.

fig. S6. Particle size, shown as the size of white spots.

fig. S7. Dynamics of nanocharacterization and cognitive assessment data in AD progression.

fig. S8. AD stages of a female patient determined by cognitive assessments and KF-based diagnosis.

fig. S9. Data analysis of nanomorphology using image processing.

table S1. Human subjects for CSF samples.

table S2. Human subjects for serum samples.

table S3. Details of experimental data sets of CSF-based characterization.

table S4. Details of experimental data sets of serum-based characterization.

table S5. MMSE and SAGE scores.

This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial license, which permits use, distribution, and reproduction in any medium, so long as the resultant use is not for commercial advantage and provided the original work is properly cited.


Acknowledgments: We gratefully thank Buckeye Biospecimen Repository of The Ohio State University for providing the human CSF and serum samples used in this research. We gratefully thank D. Wang for helping with the programs used in the computational integration of this paper. Funding: J.K. and J.P. acknowledge funding from the NIH (grant nos. RF1 AG054018 and T32GM068412, respectively). Author contributions: T.Y. and X.J. performed the experiments, analyzed the data, and wrote the paper. J.P. and L.S. analyzed the experimental data and wrote the paper. Z.F. and J.F. designed the experiments and provided reagents. R.D. and D.W.S. selected and processed human CSF and blood samples. S.G. performed computational approach for data analysis. J.K. analyzed experimental data. M.Z. conceived and designed the experiments, participated in discussions, and wrote the paper. Competing interests: The authors declare that they have no competing interests. Data and materials availability: All data needed to evaluate the conclusions in the paper are present in the paper and/or the Supplementary Materials. Additional data related to this paper may be requested from the authors. All data associated with this study is in the body of the article.
