Research Article | CANCER

Proteomic analysis of circulating extracellular vesicles identifies potential markers of breast cancer progression, recurrence, and response


Science Advances  02 Oct 2020:
Vol. 6, no. 40, eaba5714
DOI: 10.1126/sciadv.aba5714


Proteomic profiling of circulating small extracellular vesicles (sEVs) represents a promising, noninvasive approach for early detection and therapeutic monitoring of breast cancer (BC). We describe a relatively low-cost, fast, and reliable method to isolate sEVs from plasma of BC patients and analyze their protein content by semiquantitative proteomics. sEV-enriched fractions were isolated from plasma of healthy controls and BC patients at different disease stages before and after surgery. Proteomic analysis of sEV-enriched fractions using reverse phase protein array revealed a signature of seven proteins that differentiated BC patients from healthy individuals, of which FAK and fibronectin displayed high diagnostic accuracy. The size of sEVs was significantly reduced in advanced disease stage, concomitant with a stage-specific protein signature. Furthermore, we observed protein-based distinct clusters of healthy controls, chemotherapy-treated and untreated postsurgery samples, as well as a predictor of high risk of cancer relapse, suggesting that the applied methods warrant development for advanced diagnostics.


Early detection of breast cancer (BC) has an important clinical impact on cancer therapy and overall survival. Currently, mammography, ultrasound, and magnetic resonance imaging are commonly used for screening, but these methods are not always reliable, safe, and/or cost-effective (1). As a complement to imaging approaches, proteomic profiling of circulating extracellular vesicles (EVs) derived from plasma of patients with BC represents a promising approach for early detection, diagnosis, and prognosis (2). Increasing evidence suggests that protein contents of small EVs of 30 to 100 nm in size, such as exosomes or exosome-like vesicles (ELVs), can be used for assessing tumor prognosis and therapeutic responses. EV proteins are more stable than other serological proteins as they are protected from circulating proteases by a lipid bilayer and thus could be better markers (3).

Unlike apoptotic blebs (50 to 5000 nm) that are released from apoptotic cells, EVs (50- to 1000-nm diameter) are released from multiple cell types including leukocytes, platelets, fibroblasts, adipocytes, and cancer cells (4). Small EVs (sEVs) of ~100-nm diameter are generated from different subcellular compartments including the plasma membrane and multivesicular bodies and can be found in diverse body fluids such as semen, urine, saliva, breast milk, amniotic fluid, cerebrospinal fluid, and blood (5). sEVs have unique morphology and density and, thus, can be isolated by differential centrifugation and identified by electron microscopy (EM). In addition, sEVs contain a restricted set of proteins, microRNA, mRNA, and DNA and play important roles in cell-cell communication by transferring their content to target cells (6). sEVs are robustly produced by cancer cells and markedly affect the primary tumor microenvironment (TME) including the immune ecosystem as well as distant metastatic niches, thereby facilitating tumor growth and metastasis (7).

Tumor biopsies are currently considered the “gold standard” for diagnosis, prognosis, and prediction of therapeutic response. In metastatic patients, tumor biopsy is limited to sampling a single metastatic site among the many present, and longitudinal analysis by repeated biopsy is associated with potential morbidity and patient inconvenience. sEVs, in contrast, may provide unique information about the full metastatic complement of tumors and allow facile longitudinal analysis of tumor evolution in response to therapy. The first challenge is to define and standardize reliable methods for EV isolation for clinical utility, as multiple methods, including differential centrifugation, density gradient centrifugation, size-exclusion chromatography (SEC) and/or affinity chromatography, microfluidic devices, and synthetic polymer-based precipitation reagents, are currently used (8).

The second challenge is to optimize a proteomic profiling approach. Most sEV profiling studies have used mass spectrometry (MS)–based proteomics technologies, which have the potential to provide an unbiased screen of many proteins. However, MS requires complex sample preparation, has limited ability to identify low-abundance signaling proteins, and is expensive and time-consuming (9). Reverse phase protein array (RPPA), on the other hand, is a robust, sensitive, cost-effective approach to analyze a large number of samples but can cover only a limited number of selected proteins for which high-quality antibodies are available (10). However, high-quality antibodies and, in particular, phosphospecific antibodies can increase sensitivity over conventional MS technologies (11). Here, we describe a cost-effective and efficient approach to isolate sEVs from plasma of patients with BC and show that proteomic analysis by RPPA is a powerful approach to identify potential clinically relevant biomarkers and to predict risk of cancer recurrence.


Study design

Plasma from patients with BC (ages 37 to 82) was collected at Sheba Medical Center, Israel. The criteria for inclusion in the study were an early-stage BC and candidacy for total tumor dissection. The use of human blood samples was carried out in accordance with the Declaration of Helsinki (approval no. 2254-15-SMC). All patients provided written informed consent. Participant clinical information is given in table S1. Blood for plasma samples (10 ml) was taken at the time of entry to the study (before the dissection surgery). For 27 patients, an additional blood sample was collected on average 24 weeks after surgery. The median follow-up duration of the patients in this study was 114 weeks. Blood samples were also collected from healthy women (n = 22, ages above 40) to serve as a control group. An independent set of blood samples to be used for validation purposes was obtained from the Sheba Medical Center tissue bank (including healthy age-matched women as controls). Blood was collected into EDTA tubes (0.02%) and centrifuged at 1500g for 15 min. The supernatant (plasma) was collected, aliquoted, and kept at −80°C as the source for sEV purification.

Isolation of sEVs and RPPA

To isolate sEVs from blood plasma, we established a reliable method using a combination of SEC and filtration. Plasma (2 ml) was first centrifuged at 300g (10 min at 4°C) followed by supernatant centrifugation at 10,000g for 10 min. The 2-ml plasma was then concentrated to 0.5 ml using Nanosep Omega 300-kDa filters (PALL Life Science, Canada). Concentrated plasma samples were loaded on a qEV size-exclusion chromatography column, separation size 70 nm (IZON, UK). The columns were washed with phosphate-buffered saline, and four fractions of 1.5 ml were collected from the effluent. One hundred microliters of each fraction was used for particle counting by light scattering using the NanoSight Instrument NS300 (Malvern Panalytical, UK), while the remainder of each sample was concentrated by repeated centrifugation through the 300-kDa filters (four times, 10,000 rpm, 15 min each) to obtain a vesicle pellet. The concentrated vesicles on the filters were lysed in 50 μl of RPPA lysis buffer [1% Triton X-100, 50 mM Hepes (pH 7.4), 150 mM NaCl, 1.5 mM MgCl2, 1 mM EGTA, 100 mM NaF, 10 mM Na pyrophosphate, 1 mM Na3VO4, 10% glycerol, and protease inhibitors]. Protein concentrations were measured by Bradford assay (Bio-Rad). Protein lysates were analyzed by the RPPA core facility of the MD Anderson Cancer Center (Houston, Texas). Results were normalized for protein loading as follows: The median for each antibody across all samples was calculated, and the results were median centered for each antibody. Then, the median of each sample across all antibodies was calculated. Samples with extremely low or high medians were considered to be outliers with either very low or high protein content and were removed from further analysis.
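The loading normalization described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the toy signal values, the function names, and the outlier cutoff are hypothetical, since the text does not state the threshold used to call a sample median "extremely" low or high.

```python
from statistics import median

def median_center(samples):
    """Subtract each antibody's across-sample median from its signal.

    samples: dict of sample id -> dict of antibody -> signal.
    """
    antibodies = sorted(next(iter(samples.values())))
    ab_median = {ab: median(s[ab] for s in samples.values()) for ab in antibodies}
    return {
        sid: {ab: vals[ab] - ab_median[ab] for ab in antibodies}
        for sid, vals in samples.items()
    }

def flag_outliers(centered, cutoff):
    """Return ids of samples whose median across antibodies exceeds the cutoff.

    The cutoff is a hypothetical parameter; the source gives no numeric value.
    """
    sample_median = {sid: median(vals.values()) for sid, vals in centered.items()}
    return sorted(sid for sid, m in sample_median.items() if abs(m) > cutoff)

# Made-up signals for four samples and three antibodies.
toy = {
    "S1": {"FAK": 1.0, "MEK1": 2.0, "fibronectin": 3.0},
    "S2": {"FAK": 1.2, "MEK1": 2.1, "fibronectin": 2.9},
    "S3": {"FAK": 0.9, "MEK1": 1.8, "fibronectin": 3.2},
    "S4": {"FAK": 4.0, "MEK1": 5.0, "fibronectin": 6.0},  # uniformly high loading
}
centered = median_center(toy)
outliers = flag_outliers(centered, cutoff=1.0)
print(outliers)
```

In this toy set, sample S4 is uniformly high across all antibodies, which is the pattern the text attributes to very high protein content.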

Statistical analysis

Statistical analysis of the RPPA results was performed with R using the following packages. Determination of differentially expressed proteins between presurgery patients, postsurgery samples, and healthy controls was performed using the LIMMA package. Comparisons between postsurgery and presurgery samples per patient were analyzed by paired t test. k-Nearest neighbor (kNN) tests were performed using the caret package and optimized by varying several parameters, including the number of neighbors and the number of proteins. Validation was done by the leave-one-out cross validation method. Elastic net regression was performed using the glmnet and caret packages. Receiver operating characteristic (ROC) curves were generated by the plotROC package in R and by the easyROC tool. Area under the curve (AUC), confidence intervals, and P values for all ROC curves are given in table S3. Correlations were performed using the Hmisc package. Hierarchical clustering of the data was performed using the gplots package. Partition clustering was visualized by the factoextra package. Decision trees and random forest models were built using the rpart and ranger packages, respectively. P values less than 0.05 were considered statistically significant.
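As an illustration of the kNN tuning with leave-one-out cross validation, the following is a minimal sketch in plain Python rather than the R caret workflow used in the study. The toy data, the function names, and the use of plain accuracy as the score are assumptions made for brevity. Feature columns are assumed to be ordered by increasing P value, so the first N columns stand for the N top proteins.

```python
import math

def knn_predict(train_x, train_y, query, k=1):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    votes = [train_y[i] for i in order[:k]]
    # sorted() makes tie-breaking deterministic; odd k avoids ties for two classes
    return max(sorted(set(votes)), key=votes.count)

def loocv_accuracy(x, y, k=1):
    """Leave-one-out cross validation: hold out each sample once."""
    hits = 0
    for i in range(len(x)):
        train_x = x[:i] + x[i + 1:]
        train_y = y[:i] + y[i + 1:]
        hits += knn_predict(train_x, train_y, x[i], k) == y[i]
    return hits / len(x)

def scan_top_n(x, y, k=1):
    """Score the classifier using only the first n features, for every n."""
    n_features = len(x[0])
    return {
        n: loocv_accuracy([row[:n] for row in x], y, k)
        for n in range(1, n_features + 1)
    }

# Made-up two-protein expression values for three controls and three patients.
x = [(0.0, 0.1), (0.2, -0.1), (-0.1, 0.0), (2.0, 2.1), (1.9, 1.8), (2.2, 2.0)]
y = ["healthy"] * 3 + ["BC"] * 3
scores = scan_top_n(x, y, k=1)
print(scores)
```

Repeating the scan over several values of k gives the grid of neighbors by proteins from which a best-performing combination can be picked.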


Total proteins were extracted from EVs using a lysis buffer containing 0.2% Triton X-100, 50 mM Hepes (pH 7.5), 100 mM NaCl, 1 mM MgCl2, 50 mM NaF, 0.5 mM Na3VO4, 20 mM β-glycerophosphate, 1 mM phenylmethylsulphonyl fluoride, leupeptin (10 μg/ml), and aprotinin (10 μg/ml). EV lysates were centrifuged at 14,000 rpm for 15 min at 4°C, and protein concentration of the supernatants was measured by Bradford assay (Bio-Rad, Hercules, CA). Equal amounts of proteins were analyzed by SDS–polyacrylamide gel electrophoresis and Western blotting (WB) using the indicated antibodies. Equal volumes of lysates were analyzed using the Coomassie dye Imperial Protein Stain (Thermo Fisher Scientific). The following were the antibodies used in this study: TSG101 (ABCAM, ab30871), HSC70 (Enzo Life Sciences, ADI-SPA-822), ALIX (Santa Cruz, SC-53540), focal adhesion kinase (FAK) (Santa Cruz, SC-932), mitogen-activated protein kinase kinase 1 (MEK1) (Cell Signaling Technology, no. 9124), and fibronectin (DSHB, University of Iowa).

Transmission electron microscopy

Isolated EVs (3 μl) were applied to glow-discharged, 300-mesh formvar/carbon-coated copper transmission electron microscopy (TEM) grids (Electron Microscopy Sciences) for 30 s. Excess liquid was blotted, followed by washing with distilled water, and staining with 2% uranyl acetate. Samples were visualized in an FEI Tecnai T12 TEM operated at 120 kV, equipped with a TVIPS TemCam-XF416.


Isolation of sEVs from human blood plasma

The sEV proteome has been proposed to provide useful clinical information for detection and stratification of BC (6). Thus, an efficient, robust, and reliable method for sEV isolation is a critical need (12). A major challenge of sEV isolation from plasma is to avoid contamination by abundant plasma proteins such as albumin while concurrently collecting sufficient sEV proteins for global proteomic or clinical analysis. To simultaneously accomplish these two requirements, we established an efficient protocol that requires only 2 ml of plasma and results in high yields of purified sEVs. The isolation protocol is illustrated in Fig. 1A and includes two important steps: filtration and SEC. SEC is considered to be a better method for diagnostic assays compared with standard ultracentrifugation as it retains integrity of sEVs (13, 14) and concomitantly decreases plasma protein contaminants (15). Following protocol calibration, the SEC eluent was fractionated into four fractions of 1.5 ml each, and particle numbers were measured by light scattering (NanoSight). As shown, the third and fourth fractions had the highest numbers of particles with an average size of ~110-nm diameter (Fig. 1B), a characteristic size of sEVs (5). Both fractions contained typical exosome-like markers, including TSG101, ALIX, and HSP70 (Fig. 1C). The abundance of albumin in the isolated fractions compared with total plasma was assessed by Coomassie Blue staining of similar lysate volumes and showed high albumin levels in the fourth fraction (Fig. 1D). Fraction purity was calculated as the log ratio between particle number and protein concentration in each fraction (12), with the highest purity in fraction number 3 (Fig. 1E). Accordingly, fraction 3 was used for further analysis and proteomic profiling.
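The purity measure used to choose fraction 3 is a log ratio of particle number to protein concentration. A minimal sketch follows; the base-10 logarithm, the units, and the fraction values are assumptions for illustration, not data from the study.

```python
import math

def purity(particles_per_ml, protein_ug_per_ml):
    """Log ratio of particle number to protein concentration (base 10 assumed)."""
    return math.log10(particles_per_ml / protein_ug_per_ml)

# Made-up measurements: fraction -> (particles per ml, protein in ug per ml).
fractions = {3: (5e10, 50.0), 4: (4e10, 400.0)}
scores = {f: purity(n, p) for f, (n, p) in fractions.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

A fraction with a similar particle count but far more protein, as would be expected from albumin carry-over, scores lower on this measure.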

Fig. 1 sEV extraction from human plasma samples.

(A) Scheme depicting the procedure for EV enrichment and extraction. sEVs were partially purified from the plasma of patients by serial centrifugations, filtration, and passing through SEC. Fractions of 1.5 ml were collected from the SEC eluent. (B) Size distribution of particles in the different fractions as measured by NanoSight. Shown is a representative chart of three independent repeats. (C) sEV markers in the different SEC fractions. WB analysis of protein extracts of the indicated fractions from two plasma samples, as a representative of at least four samples. (D) Coomassie Brilliant Blue staining of protein extracted from the different fractions and the original plasma (diluted 1:40). Shown is a representative of two repeats. Similar volume of indicated fraction was loaded. (E) Fractions purity was calculated as a log ratio of particle number to protein concentration (n = 6 plasma samples), and significant differences between fractions 3 and 4 were determined by t test. A.U., arbitrary units. ***P < 0.001.

Proteomic analysis and diagnostic signature

Protein extracts from the sEV-enriched fractions (50 μg) were analyzed by RPPA (core facility of MDACC, Houston, Texas) to assess total and phosphoprotein levels of ~276 cellular proteins that are primarily associated with cancer-related signaling pathways (16). The significantly differentially expressed proteins in presurgery BC samples compared with healthy controls are summarized in the volcano plot shown in Fig. 2A, and the most prominent proteins are indicated (full results are given in table S2A). Among the up-regulated proteins, FAK, MEK1, and fibronectin were highly enriched in EVs derived from plasma of patients with BC, consistent with previous reports on FAK (17) and fibronectin in EVs (18).

Fig. 2 RPPA analysis of sEV-enriched fractions.

(A to I) Plasma EVs extracted from presurgery patients with BC (n = 52) and healthy controls (n = 22) were analyzed by RPPA. (A) Volcano plot showing differentially expressed proteins between presurgery patients and healthy controls. The top hits are marked in red (up-regulated proteins) or blue (down-regulated proteins). (B) Principal components analysis (PCA) of the patients with BC and healthy controls using expression levels of the 60 top significantly different proteins (yielding the maximum partition in the cohort). (C) Unsupervised clustering of the entire cohort using the 10 proteins selected by the kNN test. Each row indicates one woman, either healthy control (orange) or presurgery patient (red). (D) Logistic regression with elastic net penalty performed on the main cohort. Shown is the importance plot of the proteins in the model (based on their z statistic and normalized on a scale from 0 to 100). Arrows in the bars indicate the proteins that appear in the kNN signature and are up- or down-regulated in BC versus healthy. (E) Unsupervised clustering of the entire cohort using the seven proteins selected both by the kNN test and the logistic regression model. (F) Accuracy parameters of the clustering in (C) and (E). (G) ROC curves of the three up-regulated proteins in the signature. (H) Boxplot depicting the expression and distribution of the three up-regulated proteins in the signature. (I) Pairwise similarity matrix based on Spearman’s correlations of 276 proteins in the patients with BC, clustered into eight partitions. “1” and “2” indicate the partitions that include the three up-regulated proteins and three of the four down-regulated proteins from (E), respectively. (J) RPPA validation by WB. Shown are representative WBs of the three up-regulated proteins of the signature. Densitometry results of at least four healthy and four patients are shown in the right panels. *P < 0.05, ***P < 0.001.

To generate a protein signature that stratifies patients with BC and healthy controls, we performed kNN, a robust method for predicting outcomes based on array data (19). We used leave-one-out cross validation and ROC-AUC as the performance metric (19). Using 1 neighbor (k = 1) and performing the test with the N top significantly differentially expressed proteins (with N starting from 1, being FAK, up to 276, the number of proteins in the array, ordered by increasing P value), we discovered that the best partition is with N = 60 proteins (Fig. 2B and fig. S1A). To generate a signature composed of a small number of proteins that will maximize the classification, we performed kNN models for k = 1..31 and N = 1..100 (fig. S1B). We discovered a local maximum point for AUC at N = 10 (fig. S1B), suggesting that these 10 proteins, of which 5 were up-regulated and 5 were down-regulated in patients with BC, provide a good classifying signature of BC versus healthy women. Unsupervised clustering of the entire cohort of patients and controls using these 10 proteins (Fig. 2C) indeed showed a good separation between patients with BC and healthy controls. The clustering yielded a high sensitivity (true-positive rate) of 96%, with a specificity (true-negative rate) of 64%. Positive and negative predictive values of the signature are 86 and 88%, respectively (Fig. 2F).
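The four accuracy parameters quoted above follow directly from a two-by-two confusion matrix. The sketch below uses counts that are back-calculated from the rounded percentages and the cohort sizes (52 patients, 22 controls); the counts themselves are an assumption, as they are not reported in the text.

```python
def classification_metrics(tp, fn, tn, fp):
    """Standard confusion-matrix summaries for a binary classifier."""
    return {
        "sensitivity": tp / (tp + fn),  # true-positive rate
        "specificity": tn / (tn + fp),  # true-negative rate
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }

# Assumed counts: 50 of 52 patients and 14 of 22 controls correctly clustered.
m = classification_metrics(tp=50, fn=2, tn=14, fp=8)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

With these assumed counts, sensitivity, specificity, PPV, and NPV come to about 0.96, 0.64, 0.86, and 0.88, in line with the reported values, which shows why the low specificity was the parameter targeted for improvement.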

To improve the signature classification accuracy, in particular to increase specificity, we applied another classification method, logistic regression with elastic net penalty, on the main cohort of 52 patients and 22 healthy controls. To obtain the best accuracy, the model was trained on the cohort (using 10-fold cross validation) to tune the parameters λ (which controls the total extent of the penalty) and α (which controls the shift between L1 [lasso] and L2 [ridge] penalties) (fig. S1C). The best accuracy was 92.3%, similar to the kNN model. The 20 most influential predictors in the model (in terms of their coefficient P values) are given in Fig. 2D. Seven of 10 proteins in the kNN signature discovered above are among the most influential predictors of the elastic net model. We, therefore, clustered the cohort using these seven proteins (Fig. 2E), achieving better prediction accuracy, with improved specificity of 82% (Fig. 2F). By using a similarity matrix, showing the correlation between any two proteins in the BC cohort (Fig. 2I), we found that all three up-regulated proteins (FAK, MEK1, and fibronectin) belong to the same cluster (cluster #1), indicating that each of these proteins is correlated with the others. Among the down-regulated proteins in the signature, three of them (β-actin, C-Raf, and N-cadherin) also belong to the same cluster (cluster #2 in the figure).
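For reference, the roles of λ and α can be read off the penalized objective. The form below is the parameterization used by the glmnet package named in the Methods; the symbols n (number of samples), x_i (protein expression vector), y_i (0 for healthy, 1 for BC), β_0, and β are introduced here for illustration only.

```latex
\min_{\beta_0,\,\beta}\;
-\frac{1}{n}\sum_{i=1}^{n}
\Big[\, y_i\,(\beta_0 + x_i^{\top}\beta)
      - \log\!\big(1 + e^{\beta_0 + x_i^{\top}\beta}\big) \Big]
\;+\;
\lambda\Big[\, \alpha\,\lVert\beta\rVert_1
      + \tfrac{1-\alpha}{2}\,\lVert\beta\rVert_2^2 \Big]
```

Setting α = 1 gives the pure lasso (L1) penalty and α = 0 the pure ridge (L2) penalty, while λ scales the whole penalty term.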

To assess the prediction accuracy of each individual protein among the seven proteins in the signature, we performed a ROC curve analysis on the up-regulated proteins in the signature (Fig. 2G and table S3). We took the area under the ROC curve (ROC-AUC) as a measurement of the predictive value of each protein. FAK, MEK1, and fibronectin were found to have high AUC and high fold change between patients with BC and healthy controls (Fig. 2H). We observed a positive correlation between FAK expression in the plasma of the patients with BC and the plasma levels of CA 125 and CA 15-3, two commonly used circulating markers for BC (fig. S1D), further demonstrating the clinical relevance of our identified markers. The strong predictive value of FAK was also shown by building a decision tree model using all available predictors. FAK was found to be the protein that classifies the cohort with the highest accuracy (fig. S1E). To validate the RPPA results, we examined the levels of the up-regulated proteins in the signature (FAK, MEK1, and fibronectin) by WB and found that they were also increased in BC compared with healthy samples (Fig. 2J). ROC-AUC analysis was also performed on the four down-regulated proteins in the signature, and AUCs of 0.728 to 0.838 were obtained (fig. S1, F and G).
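For a single protein, the ROC-AUC equals the probability that a randomly chosen patient has a higher expression value than a randomly chosen control, with ties counted as one half. The sketch below computes it this way in plain Python; the expression values and the function name are made up for illustration and do not reproduce the plotROC or easyROC calculations used in the study.

```python
def roc_auc(pos, neg):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties contribute half a win
    return wins / (len(pos) * len(neg))

# Made-up expression levels of one up-regulated protein.
bc = [2.1, 1.8, 2.5, 0.9]
healthy = [0.5, 1.0, 0.7]
auc = roc_auc(bc, healthy)
print(round(auc, 3))
```

For a down-regulated protein, the same function can be applied with the two groups swapped, so that an AUC above 0.5 again indicates discriminating power.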

Validating the predictive power of the signature

To validate the predictive strength of our analysis, we obtained an independent test set of plasma samples from other resources in the Sheba Medical Center. This set included plasma samples taken from 16 patients with BC obtained during surgery and 8 control samples from healthy women. There was no apparent batch effect in the RPPA results between the main cohort and the test set (fig. S2A). Using the identified signature of seven proteins shown in Fig. 2E, we could cluster the patients with BC and healthy controls with high accuracy of 88% (Fig. 3A and fig. S2B). For the individual predictors, we could obtain high ROC-AUCs for FAK and fibronectin in the test set (Fig. 3B and fig. S2C), thus validating their predictive value and highlighting the power of our prediction approach.

Fig. 3 Validation of the seven-protein signature.

(A to C) Validation of the results on an independent test set. Plasma EVs were extracted from 16 patients with BC (blood taken during surgery) and 8 healthy controls and were analyzed by RPPA. (A) Clustering of the test set samples using the seven-protein signature obtained using kNN and logistic regression on the main cohort. (B) ROC curves and AUC values of several proteins from the signature, done on the test set samples. (C) Machine learning models used to classify the test set samples. All models were performed using 268 proteins appearing both in the main cohort and the test set RPPA. Models were trained on the main cohort to tune model parameters (details in table S4). Models were applied on the test set samples, and confusion matrices were built to calculate accuracy, sensitivity, specificity, and positive and negative predictive values (PPV and NPV, respectively).

To determine whether we can increase the test set prediction accuracy, we tested several other classification models (20) and compared the results to the classification by our seven-protein signature. Of the 276 proteins examined in the main cohort RPPA, 267 were also examined in the test set RPPA. As input, the models were given the expression levels of these 267 proteins. Each model was trained on the main cohort using different cross validation methods (table S4) and then tested on the independent test set. As shown in Fig. 3C, although random forest achieved better sensitivity (true-positive rate), overall our signature had the best specificity (true-negative rate) and, thus, overall accuracy.

Biomarkers for detection of cancer stage

Next, we examined significant differences between BC stages. As particle size distributions revealed no significant difference between presurgery patients and controls, we divided the presurgery patients into two categories on the basis of tumor stage, as classified by the TNM (tumor size, nodes infected, and metastasis) system (fig. S3, A and B). Our cohort was composed mainly of stage I and IIA patients (table S1). While the particle size distribution of patients with stage I BC was not different from that of healthy controls, particles from stage IIA patients exhibited a shift toward smaller particle size, as measured by side scattering analysis (Fig. 4A and table S5). This difference in size distribution was also observed by TEM analysis (fig. S3C). Moreover, the number of small EVs (size, <100 nm), which might include exosomes or ELVs, was significantly increased in stage IIA patients (Fig. 4B) compared with stage I or healthy samples, although the total number of particles remained similar in the three groups (fig. S3D). ROC analysis of sEV (<100 nm) concentration in the plasma of stage II BC versus healthy samples yielded an AUC of 0.75 (fig. S3F). An increased number of sEVs of less than 100 nm was also associated with an increase in body mass index (BMI) (fig. S3E), but not with any other clinical parameter.

Fig. 4 Effects of BC stage on number and protein content of EVs.

(A and B) Light scattering (NanoSight) was used to generate a size histogram of the EVs in the enriched plasma fractions from healthy women or stage I or stage IIA patients. (A) Histogram shown is the average of n = 20 healthy controls, 12 stage I patients, and 6 stage IIA patients. The number of low-size sEVs (smaller than 100 nm) was quantified in (B). (C and D) kNN tests were used to generate a protein signature to classify stage I and stage IIA patients. Unsupervised clustering of presurgery stage I patients (C, left) or stage IIA patients (D, left) (in red) with the healthy controls (in orange) is shown using the generated signatures. Logistic regression with elastic net penalty was built for each classification, and a variable importance plot for the model (based on the z statistic of each variable and normalized on a scale from 0 to 100) is shown on the right. Arrows in the bars indicate the proteins that appear in the kNN signature and are up- or down-regulated in the relevant signature (n = 25 stage I patients, 11 stage IIA patients, and 22 healthy controls). *P < 0.05, **P < 0.01.

Next, we looked for sEV proteins that can distinguish between stage I, stage IIA, and healthy samples. To that end, we first built a protein signature unique for each stage using kNN, as well as a signature to differentiate stage IIA from stage I patients (fig. S4, A and B). Using patient versus healthy signatures, we could cluster stage I (Fig. 4C) and stage IIA (Fig. 4D) patients apart from healthy controls with high accuracy. Logistic regression performed for each stage versus healthy samples corroborated most of the proteins in each kNN signature (Fig. 4, C and D, right panel). While the protein signatures for stage I and stage IIA shared several proteins, including the three most prominent, FAK, MEK1, and fibronectin, each also had unique proteins (fig. S4C), which were analyzed by ROC-AUC to find unique markers for each stage. The best protein markers when comparing stage I patients (fig. S4D) or stage IIA patients (fig. S4E) to healthy controls are epidermal growth factor receptor and P-cadherin, respectively. Furthermore, the signature with the best AUC that differentiates between stage IIA and stage I consists of two proteins, including insulin-like growth factor receptor β subunit (IGFRβ), which has the best ROC-AUC (fig. S4F). The expression level of these markers in our cohort is given in fig. S4H. Several of the prominent markers for stage IIA (including P-cadherin and TAZ) were decreased in plasma sEVs of patients with BC compared with control (fig. S4H). In light of the increased number of smaller EVs (<100 nm) in plasma of stage IIA patients (Fig. 4B), TAZ and P-cadherin are indeed significantly negatively correlated with the number of smaller EVs (fig. S4G). This suggests that both the number of smaller EVs and the expression levels of specific proteins can help to distinguish stage IIA from stage I patients.

Using similar methods, we could also find characteristic proteins for different BC subtypes [estrogen receptor positive (ER+), progesterone receptor positive (PR+), human epidermal growth factor receptor 2 positive (HER2+)]. Fibronectin was found by ROC analysis to differentiate between ER+ and ER−, Wee1 between PR+ and PR−, and Cox2 between HER2+ and HER2− (fig. S5, A to C). Consistent with our findings, Cox2 was previously shown to correlate with HER2 status in BC (21).

Protein signature for risk of relapse

Within the time frame of the study, four patients underwent relapse, and three of these four were analyzed by RPPA (the fourth was a technical outlier in the RPPA). To examine whether these relapsed patients belong to a discrete group, we used partition clustering analysis of presurgery samples. As shown in Fig. 5A, the presurgery samples are highly variable, consisting of a few distinct clusters, including a unique cluster of patients with a risk of relapse (cluster 4 in Fig. 5A). This segregation suggests that a protein signature can potentially be built to predict relapse using our methodology of sEV extraction and RPPA analysis. To explore this possibility, we obtained the Oncotype recurrence score (RS) of 16 patients from our cohort (Fig. 5B). The Oncotype RS is an excellent clinical test to estimate the likelihood of relapse and the benefit of chemotherapy in ER+ patients with BC based on the RNA expression levels of 21 selected genes (fig. S6A) (22). High Oncotype scores (>25) are considered to predict a high risk of relapse. Two of the patients (nos. 14 and 16; Fig. 5B) with a score above 25 indeed relapsed within the duration of our study. One of them, patient no. 14 (RS = 28), had RPPA data and was clustered in the relapse risk cluster (cluster 4; Fig. 5A). An additional patient with a high Oncotype score (RS = 31, patient no. 15; Fig. 5B) was also found to be included in the high relapse potential cluster in our partitioning analysis (Fig. 5A), thus supporting a high likelihood of recurrence in patients of this cluster. We observed an increase in the RS values along the major principal component (Fig. 5A), implying that combining the proteomic analysis described here with RS scoring could substantially improve prediction potential.

Fig. 5 Relapse prediction and associated proteins.

(A) Partition clustering of the patients with BC in the study. Clustering was done by the k-means method. Partitions are shown in a PCA plot using the two highest principal components. Colors distinguish between six partitions (three of them include single points). Red points represent the patients who underwent relapse. Numbers below some of the points are the Oncotype recurrence score (RS) for those patients. (B) Oncotype RSs were measured for 16 of the patients. Red bars mark the patients who underwent relapse. (C) Volcano plot showing differentially expressed proteins between the three patients who underwent relapse plus patient no. 15 (Oncotype RS = 31) versus the other patients.

To gain better insight into the protein expression patterns of relapsed patients, we compared the RPPA data of the three relapsed patients plus patient no. 15 (Fig. 5B), who is part of cluster 4 (Fig. 5A) and has a high RS, with the RPPA data from the other patients. Although many differentially expressed proteins (Fig. 5C) were observed, the highest and most significant was HSP70, which was previously associated with tumor recurrence (23). Furthermore, we found good correlations between the expression of several proteins and the Oncotype score (fig. S6B), suggesting that these proteins can be used to predict relapse in patients who did not undergo Oncotype score evaluation. Together, our findings demonstrate the power of the sEV proteome to predict recurrence risk and highlight its clinical potential pending additional validation studies.

Analysis of EVs postsurgery

Next, we analyzed EVs of 27 patients postsurgery. Plasma samples were collected on average 24 weeks after surgery. Patients undergoing chemotherapy were sampled during or after chemotherapy, and in most cases, before any other treatment. Other patients were sampled before or during radiotherapy and/or hormone therapy (table S6). sEVs were isolated as described in Fig. 1A. Light scattering analysis showed a significant shift in the particle concentration histogram toward larger particle sizes of ~150-nm diameter (Fig. 6A and table S5), concurrent with a significantly reduced number of sEVs in the smaller range (<100 nm), not only compared with the presurgery patients but also compared with healthy controls (Fig. 6B). These effects were not correlated with the time of plasma collection after surgery (fig. S6A) but were most likely due to the applied therapy. Unsupervised clustering separated healthy samples from the postsurgery samples (Fig. 6C and table S2B). Furthermore, the clustering separated, to some degree, patients who received chemotherapy from those who did not (Fig. 6C, chemotherapy regimen is indicated; table S6). Principal components analysis using all significantly different proteins between chemo-treated and nontreated patients (P value <0.01) also reveals this separation between the three groups (Fig. 6D). Together, this suggests that treatment induces substantial differences in EV content in postsurgery patients, mainly due to the chemotherapy treatment. Specifically, EVs from patients who underwent chemotherapy were enriched in metastasis-promoting factors such as transferrin receptor (TFRC) (24), concurrent with a substantial down-regulation of E-cadherin, suggesting that tumors may undergo epithelial-mesenchymal transition (EMT) on therapy, as expected (25) (Fig. 6E and table S7).

Fig. 6 Analysis of postsurgery samples.

(A and B) Plasma samples from 27 patients were collected ~24 weeks after surgery. Particle distribution (A) and number of sEVs smaller than 100-nm diameter (B) were measured in 23 postsurgery samples by NanoSight analysis. (C) Unsupervised clustering of the postsurgery samples and healthy controls. The close-up of the dendrogram zooms in on the postsurgery sample cluster, detailing the adjuvant chemotherapy regimen for the patients who received it before the plasma sample was taken. Gray indicates healthy controls (n = 22), orange indicates postsurgery samples from patients who received adjuvant chemotherapy (n = 8), and red indicates postsurgery samples from patients who did not (n = 19). (D) PCA of postsurgery samples using the proteins that were significantly differentially expressed (P < 0.01) between samples after chemotherapy and samples without chemotherapy. (E) Volcano plot showing the significantly differentially expressed proteins in patients undergoing chemotherapy versus patients not receiving chemotherapy. *P < 0.05, **P < 0.01, ***P < 0.001.

Similarly, we analyzed proteins affected by radiotherapy (fig. S6B and table S8; 14 patients who received radiotherapy versus 13 who did not) and identified superoxide dismutase 2 (SOD2) as the most highly up-regulated protein, possibly as a result of reactive oxygen species generation and oxidative stress induced by radiation (26, 27). Last, we compared the differentially expressed proteins in presurgery (table S2A) and postsurgery (table S2B) versus healthy samples and observed a few proteins, including the biomarkers for BC identified above (FAK, MEK1, and fibronectin), that remained up-regulated in EVs of postsurgery samples (fig. S6C). By a pairwise comparison of post- and presurgery samples from the same patients, we were able to generate a list of proteins that change following surgery and treatment (table S2C).


Identification of biomarkers with sufficient sensitivity and specificity for early detection of BC remains a major challenge. Here, we describe a simple and reliable method to isolate EVs from plasma of patients with BC and analyze their proteome by a semiquantitative method. We show that this approach could have a powerful diagnostic impact for early detection and prediction of recurrence risk.

We established a simple protocol (Fig. 1A) to enrich sEVs ~100 nm in size, likely encompassing exosomes and ELVs among other EVs (Fig. 1, A and C), from plasma of patients with BC. A total of 52 patients with BC and 22 healthy controls were analyzed by the RPPA technology to obtain expression profiles of ~276 cancer-related proteins, which were used to build prediction models. The models were further validated on an independent test set of 16 patients with BC and 8 healthy women. This analysis identified a signature of seven proteins that clusters patients with BC distinctly from healthy women with high accuracy and thus could have important clinical impact.

We identified several proteins with high predictive values compared with other methods of diagnosis. Fibronectin, for example, was proposed to be a diagnostic marker for BC, with a diagnostic accuracy of AUC 0.81 (18), very close to the 0.79 AUC obtained in our study (Fig. 2G). We also found fibronectin to be the best marker for differentiating between ER+ and ER− patients (fig. S5A). FAK was the most significantly enriched protein in sEVs from patients with BC compared with healthy women (Fig. 2). This nonreceptor tyrosine kinase plays an important role in BC progression and metastasis (28) and was identified in previous studies in EVs of patients with BC (17). Here, we show that FAK is not only present in EVs but also exhibits a strong association with early BC, with a ROC-AUC of 0.89, and thus may be considered a potentially useful biomarker for early detection. The second most highly enriched protein was MEK1, which also plays an important role in BC progression (29) but has not been previously implicated in exosome or sEV function.
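For readers less familiar with the ROC-AUC values quoted here, the following Python sketch shows one standard way to compute the AUC of a single marker: as the fraction of patient-control pairs in which the patient has the higher marker level, with ties counted as one half (the Mann-Whitney formulation). The function name and the marker levels are hypothetical and are not data from the study.

```python
def roc_auc(case_scores, control_scores):
    # AUC as the probability that a randomly chosen case scores higher
    # than a randomly chosen control; ties contribute one half.
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical normalized marker levels (illustrative values only).
patients = [0.9, 0.8, 0.4]
healthy = [0.5, 0.3]

auc = roc_auc(patients, healthy)
print(f"AUC = {auc:.2f}")
```

An AUC of 0.5 corresponds to a marker that does not separate the groups at all, and an AUC of 1.0 to a marker that separates them perfectly.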

Validation of the seven-protein signature (Fig. 3) on an independent test set, taken from a different source, further strengthened our findings. The seven-protein signature yielded a high accuracy of 88%, together with a remarkably high sensitivity of 94%. Although the specificity was somewhat lower (75%), high sensitivity is arguably more important for early detection of BC. Nevertheless, future studies relying on the described approach might improve the specificity by increasing the number of samples, in particular by using a larger number of healthy samples to normalize the data.
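The reported test-set figures can be related to one another with a short worked example. The Python sketch below assumes a confusion matrix of 15 true positives, 1 false negative, 6 true negatives, and 2 false positives. These counts are not stated in the text; they are inferred here from the test-set size (16 patients with BC and 8 healthy women) and the reported percentages, and they reproduce the reported values when rounded to whole percentages.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    # Standard definitions based on a 2x2 confusion matrix.
    sensitivity = tp / (tp + fn)          # fraction of patients detected
    specificity = tn / (tn + fp)          # fraction of healthy correctly cleared
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Assumed counts, inferred from the reported percentages (not stated in the text).
sens, spec, acc = diagnostic_metrics(tp=15, fn=1, tn=6, fp=2)
print(f"sensitivity = {sens:.4f}")
print(f"specificity = {spec:.4f}")
print(f"accuracy    = {acc:.4f}")
```

Under these assumed counts, a single additional correctly classified healthy sample would raise the specificity from 75% to 87.5%, which illustrates how strongly the estimate depends on the small number of healthy controls in the test set.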

Further analysis revealed that EVs can also be used to distinguish between stage I and stage IIA patients. Although the most profound difference was the increased number of smaller EVs of <100 nm (Fig. 4, A and B), we could also define signatures and markers specific for these two stages. This finding has an advantage over classical serological markers of BC, such as CA 15-3, which are not useful for differentiating between early cancer stages (30). The most significantly differentially expressed proteins in stage IIA versus healthy women were P-cadherin and TAZ (fig. S4E), while IGFRβ was the best marker to differentiate between stage I and stage IIA (fig. S4F). The levels of TAZ and P-cadherin were reduced in stage IIA compared with healthy controls (fig. S4H) and, as expected, also correlated negatively with the number of small-sized sEVs (fig. S4G). Notably, all three proteins are known to promote cancer progression; for example, P-cadherin is highly expressed in invasive BC and promotes migration and invasion (31), and TAZ has been similarly implicated (32). It is unclear why these proteins decrease in plasma EVs of patients with more advanced tumors with a higher metastatic potential. It is important to note that plasma EVs consist not only of tumor-derived exosomes but also of sEVs from different origins such as platelets (33), leukocytes, or adipose tissues (34). The high correlation between exosome number and BMI, as shown in fig. S3E, implies that plasma exosomes are derived from many different cells. Therefore, it is possible that the overall changes in sEV cargo reflect the “disease state,” which is influenced by tumor cells as well as cells of the TME and circulating cells and does not necessarily recapitulate the expression profile of the tumors or of the EVs that are released from tumor cells.

Prediction of relapse based on the cargo of early-stage sEVs could provide highly important information to guide treatment planning and consideration of possible outcomes. As sEVs (mainly exosomes), even in early stage, can establish communication between the tumor and the premetastatic niche, we can assume that patients with specific sEV cargo will be more prone to relapse. We found that the three relapsed patients in our cohort are partitioned differently from most other patients (Fig. 5A). Furthermore, by combining data of patients with high Oncotype scores (two of whom are among those with relapse; Fig. 5B), we observed a discrete partition of patients with high risk of relapse (Fig. 5, A and B). The 21-gene signature of the Oncotype RS has little overlap with the proteins measured in our RPPA; however, both approaches point to the same cluster of patients, thus strengthening the power of our profiling and clustering approach in predicting relapse risk (Fig. 5A). This is a very important finding, as the Oncotype breast recurrence test is routinely used in the clinic and is considered an excellent tool for treatment decisions. The assay relies on the expression of 16 cancer-related genes and 5 reference genes and is used for personalized medicine to predict risk of distant recurrence and benefit of chemotherapeutic treatment. The Oncotype recurrence test (Oncotype DX) was established through analysis of 10,273 patients with BC whose 9-year survival and relapse outcomes were monitored (35). The results shown here highlight the clinical potential of the applied method of sEV extraction and proteome analysis for establishing a relapse predictive signature in future studies using larger cohorts.

It is clear from the postsurgery plasma analysis that the EVs 24 weeks postsurgery do not revert to the normal state as observed in the healthy controls. These particles were slightly larger than the typical ~100-nm sEVs and, thus, may represent additional types of microvesicles such as plasma membrane blebbing or shedding vesicles. The changes in EV composition may contribute to different cargo profiles observed between postsurgery samples and healthy controls, especially since exosomes and microvesicles have different modes of biogenesis (36). These changes may reflect effects of therapy. Paclitaxel, for example, can induce such a shift in EV size histogram of BC cell lines and murine models (37). Chemotherapy-induced EVs have been suggested to promote metastasis (37, 38). Consistent with this concept, postchemotherapy EVs in our study form a separate cluster and are enriched with metastatic factors such as TFRC (24, 39).

Together, we have discovered several potential markers that could contribute to early detection of BC in a cost- and time-efficient manner. Furthermore, our study highlights the prediction power of sEV proteome profiling and its potential for BC diagnosis and management. Following further evaluation and validation with larger cohorts, the approach may serve as an important new tool for determining both therapeutic options and response to therapy.


Supplementary material for this article is available at

This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial license, which permits use, distribution, and reproduction in any medium, so long as the resultant use is not for commercial advantage and provided the original work is properly cited.


Acknowledgments: We thank N. Elad (Weizmann Institute of Science, Israel) for help with the TEM photos, and D. Siwak from the RPPA core facility at MDACC (Houston, TX, USA) for technical assistance. We thank the Ramón Areces Foundation for providing a postdoctoral fellowship to F.G.O. S.L. is the incumbent of the Joyce and Ben B. Eisenberg Chair of Molecular Biology and Cancer Research. Funding: This work was supported by the Weizmann-Tel Hashomer Medical Center (Sheba) Collaboration, the Israel Science Foundation (ISF) grant no. 1530/17, the ISF-NSFC joint research program (grant no. 2526/16), the MDACC-SINF grant, and a research grant from D. E. Stone. Author contributions: Y.V. performed most of the experiments described in the paper, analyzed the results, and wrote the manuscript. F.G.O. calibrated the sEV purification method and extracted some of the plasma samples. G.B.M. was involved in RPPA, discussion of the results, and editing of the manuscript. Y.L. performed the RPPA. M.J., S.H., and M.A. were involved in contact with the patients, agreement signing, and plasma collections. M.G. was responsible for all the clinical aspects of the project at Sheba hospital, designing the protocol of plasma collection, and operating the study. S.L. designed the study, supervised the team, and wrote the manuscript. All authors read and approved the final manuscript. Competing interests: The authors declare that they have no competing interests. Data and materials availability: All data needed to evaluate the conclusions in the paper are present in the paper and/or the Supplementary Materials. Additional data related to this paper may be requested from the authors.
