Research Article NEUROSCIENCE

Assessing bimanual motor skills with optical neuroimaging


Science Advances  03 Oct 2018:
Vol. 4, no. 10, eaat3807
DOI: 10.1126/sciadv.aat3807

Abstract

Measuring motor skill proficiency is critical for the certification of highly skilled individuals in numerous fields. However, conventional measures use subjective metrics that often cannot distinguish between expertise levels. We present an advanced optical neuroimaging methodology that can objectively and successfully classify subjects with different expertise levels associated with bimanual motor dexterity. The methodology was tested by assessing laparoscopic surgery skills within the framework of the fundamentals of laparoscopic surgery program, which is a prerequisite for certification in general surgery. We demonstrate that optical-based metrics outperformed current metrics for surgical certification in classifying subjects with varying surgical expertise. Moreover, we report that optical neuroimaging allows for the successful classification of subjects during the acquisition of these skills.

INTRODUCTION

Motor skills that involve bimanual motor coordination are essential for performing numerous tasks, ranging from simple daily activities to complex motor actions performed by highly skilled individuals. Hence, metrics to assess motor task performance are critical in numerous fields, including neuropathology and neurological recovery, surgical training and certification, and athletic performance (1–6). In the vast majority of fields, however, current metrics are human-administered and subjective and require significant personnel resources and time. Thus, there is a critical need for more automated, analytical, and objective evaluation methods (3, 7–10). From a neuroscience perspective, bimanual task assessment provides insights into motor skill expertise, motor dysfunctions, interconnectivity between brain regions, and higher cognitive and executive functions, such as motor perception, motor action, and multitasking (6, 11). Therefore, incorporating the underlying neurological responses in bimanual motor skill assessment is a logical step toward providing robust, objective metrics, which ultimately may lead to greatly improving our understanding of motor skill processes and facilitating bimanual-based task certification.

Among all noninvasive functional brain imaging techniques, functional near-infrared spectroscopy (fNIRS) offers the unique ability to monitor and quantify fast functional brain activations over numerous cortical areas without constraining and interfering with bimanual task execution. Hence, fNIRS is a promising neuroimaging modality to study cortical brain activations, but to date, only a very limited number of studies have been reported with regard to assessing fine surgical motor skills (12). These exploratory studies have reported differentiation in functional cortical activations between groups with varying surgical motor skills (12–16). However, they suffer from recognized limitations (12), such as the lack of signal specificity between scalp and cortical hemodynamics (17, 18), the lack of multivariate statistical approaches that leverage changes in functional brain activity across multiple brain regions, and the lack of benchmarking against established metrics. Hence, they have not affected current practice of professional bimanual skill proficiency assessment. Here, we present an fNIRS-based optical neuroimaging methodology that overcomes all these shortcomings at once. We concurrently measure functional activations in the prefrontal cortex (PFC), the primary motor cortex (M1), and the supplementary motor area (SMA) to map the distributed brain functions associated with motor task strategy, motor task planning, and fine motor control in complex bimanual tasks (19–22). Moreover, we increase the specificity of optical measurements to cortical tissue hemodynamics by regressing signals from scalp tissues (17). Furthermore, we leverage changes in intraregional activation and interregional coupling of cerebral regions via multivariate statistical approaches to classify subjects according to motor skill levels.
Finally, we compare our fNIRS-based approach with currently used metrics in surgical certification by assessing bimanual motor tasks that are a part of surgical training accreditation.

The performance of the reported optical neuroimaging methodology enables the objective assessment of complex bimanual motor skills as seen in laparoscopic surgery. Imaging-distributed task-based functional responses demonstrated significant cortical activation differences between subjects with varying surgical expertise. By leveraging connected cerebral regions correlated to fine motor skills, we report increased specificity in discriminating surgical motor skills via fNIRS-based metrics. We show that our approach is significantly more accurate than currently established metrics used for certification in general surgery, as reported via estimated misclassification errors (MCEs). These results demonstrate that the combination of advanced fNIRS imaging with multivariate statistical approaches offers a practical and quantitative method to assess complex bimanual tasks. In particular, the reported optical neuroimaging methodology is well suited to provide quantitative and standardized metrics for bimanual skill–based professional certifications.

RESULTS

Surgical training task performance assessment

To demonstrate the potential of neuroimaging as an objective tool to assess bimanual task expertise, we selected a challenging bimanual pattern cutting (PC) task, which is part of the fundamentals of laparoscopic surgery (FLS) program. The American Board of Surgery now requires demonstrating proficiency in the FLS for certification in general surgery. For our study, we recruited a population with varying laparoscopic surgical expertise as defined via the FLS program and conventional professional nomenclature. The subjects were classified either into established skill levels, namely, Novice surgeons (1st- to 3rd-year surgical residents) and Expert surgeons (4th- to 5th-year residents and attending surgeons), or into trained medical students who are labeled as Skilled or Unskilled trainees (see table S1). The Control group comprised medical students who underwent no training at all. Note that all groups were independent, that is, each subject belonged to only one group. Each subject followed the official FLS PC task protocols. The experimental protocol followed by each cohort is provided in fig. S1. We recorded the FLS performance scores for all subjects and the cumulative sum control chart (CUSUM) computed for the population following a training protocol. Notably, this study is the first to acquire FLS performance scores simultaneously with the neuroimaging data, which we did in all cases. We obtained the FLS scoring methodology with consent under a nondisclosure agreement from the FLS Committee. Thus, this study is also the first to report direct comparisons of neuroimaging metrics and FLS scores for validation.

Figure 1A shows a schematic of the surgical trainer along with the fNIRS setup that is used to measure real-time cortical activation, and Fig. 1B shows the probe placement schematic designed for this study for PFC, M1, and SMA measurements. A physical depiction of the setup is also provided in fig. S2. Figure 2A reports the descriptive statistics of the FLS performance score for the Novice and Expert surgeons, where Experts significantly outperformed Novice surgeons (P < 0.05). Similarly, the descriptive statistics of FLS performance scores over the whole training period are provided for all FLS task training subjects and untrained Control subjects in Fig. 2B. Results indicate that there are no significant differences between the untrained Control subjects and training subjects on day 1 or the pretest (P > 0.05). However, the trained FLS students significantly outperformed the untrained Control students on the final posttest, which follows a 2-week break period after training (P < 0.05). To provide insight at the subject level, Fig. 2C summarizes the CUSUM scores for each of the subjects with respect to trials performed. Trials with an FLS performance score higher than 63 are considered a “success,” and 0.07 is subtracted from the respective CUSUM score (23, 24). Trials with a performance score lower than 63 are considered a “failure,” and 0.93 is added to the CUSUM score (23, 24). Results indicate that three trained subjects (FLS 2, FLS 3, and FLS 5) crossed the H0 threshold for the acceptable failure rate of 0.05, and thus are considered “Skilled” henceforth. The remaining four trained subjects (FLS 1, FLS 4, FLS 6, and FLS 7) are considered “Unskilled,” as they did not meet the FLS criteria for successful completion of the training program.
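The CUSUM update rule described above can be illustrated with a minimal Python sketch. The step sizes (0.07 and 0.93) and the score cutoff of 63 are taken from the text; the function name, the example scores, and the treatment of a score of exactly 63 as a failure are assumptions made for illustration only. The numerical position of the H0 decision line is not stated in the text and is therefore not modeled here.

```python
def cusum_scores(fls_scores, pass_mark=63.0, success_step=0.07, failure_step=0.93):
    """Return the running CUSUM value after each trial.

    A trial scoring above `pass_mark` counts as a success and lowers the
    chart by `success_step`; any other trial counts as a failure and raises
    it by `failure_step`. Treating a score of exactly `pass_mark` as a
    failure is an assumption; the text only covers higher and lower scores.
    """
    total = 0.0
    trajectory = []
    for score in fls_scores:
        if score > pass_mark:
            total -= success_step
        else:
            total += failure_step
        trajectory.append(round(total, 2))
    return trajectory


# Hypothetical scores for one trainee (not data from the study).
example_scores = [40, 55, 70, 80, 90, 95]
print(cusum_scores(example_scores))
```

In this sketch, two early failures push the chart up, and each later success pulls it down by a small amount, which is why a trainee needs a sustained run of successes before the chart can fall to a decision line.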

Fig. 1 fNIRS probe placement design for PFC, M1, and SMA measurements.

(A) Schematic depicting the FLS box simulator where trainees perform the bimanual dexterity task. A continuous-wave spectrometer is used to measure functional brain activation via raw fNIRS signals in real time. (B) Optode positions for coverage over the PFC, M1, and SMA. Red dots indicate infrared sources, blue dots indicate long separation detectors, and light blue dots indicate short separation detectors. The PFC has three sources (1 to 3), three short separation detectors (S1 to S3), and four long separation detectors (1 to 4). The M1 has 4 sources (4 to 7), 4 short separation detectors (S4 to S7), and 10 long detectors (5 to 14). The SMA has one source (8), one short separation detector (S8), and three long separation detectors (9, 15, and 16). Illustration by Nicolás Fernández.

Fig. 2 Bimanual motor task performance scores.

(A) FLS performance scores for Novice surgeons (green) and Expert surgeons (red), where Expert surgeons significantly outperformed Novice surgeons. (B) FLS performance scores for all training subjects (black) with respect to days trained compared to untrained Control subjects (orange). Two-sample t tests were used for statistical differentiation. n.s., not significant. Red + signs indicate outliers. *P < 0.05. (C) CUSUM scores for each trained subject with respect to trials. The H0 threshold indicates that the probability that any given trained subject is mislabeled as a “Skilled trainee” is less than 0.05; a subject who crosses it is subsequently labeled as a “Skilled trainee.” Results indicate that three trained subjects, FLS 2, FLS 3, and FLS 5, are labeled as “Skilled trainees.” The remaining trained subjects, who do not cross the H0 line, are labeled “Unskilled trainees.”

Optical neuroimaging assessment of established surgical skill levels

To ascertain that our neuroimaging methodology can discriminate between established skill levels, we quantified the real-time hemodynamic activation over the PFC, M1, and SMA cortical regions while Novice and Expert surgeons performed the standardized FLS bimanual PC task (23, 25, 26); typical hemodynamic responses are shown in fig. S3. Figure 3A depicts the spatial distribution of average changes in functional brain activation, as reported by Δ[HbO2] for all subjects in the surgical Novice and Expert groups. Significant differences were observed in all PFC regions, the SMA, the left medial M1 (LMM1), and the right lateral M1 regions, as depicted in Fig. 3B. More precisely, Novice surgeons have significantly higher functional activation in the PFC regions (P < 0.05) and significantly lower functional activation in the LMM1 and SMA regions when compared to Expert surgeons.
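For readers unfamiliar with the group comparison used here, the following Python sketch computes a two-sample t statistic for one region. It assumes the pooled-variance (Student) form; the text does not state whether pooled or unequal-variance tests were used. The function name and the activation values are hypothetical. The sketch returns only the statistic and its degrees of freedom; a P value would additionally require the t distribution, for example via `scipy.stats.ttest_ind`.

```python
from math import sqrt
from statistics import mean, variance


def two_sample_t(group_a, group_b):
    """Pooled-variance two-sample t statistic and its degrees of freedom."""
    na, nb = len(group_a), len(group_b)
    dof = na + nb - 2
    # Pooled estimate of the common variance of the two groups.
    pooled = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / dof
    t = (mean(group_a) - mean(group_b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, dof


# Hypothetical regional activation values for two groups (not study data).
novice_pfc = [1.0, 2.0, 3.0]
expert_pfc = [4.0, 5.0, 6.0]
t_value, dof = two_sample_t(novice_pfc, expert_pfc)
print(round(t_value, 3), dof)
```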

Fig. 3 Differentiation and classification of motor skill between Novice and Expert surgeons.

(A) Brain region labels are shown for PFC, M1, and SMA regions. Average functional activation for all subjects in the Novice and Expert surgeon groups is shown as spatial maps while subjects perform the FLS task. (B) Average changes in hemoglobin concentration (conc.) during the FLS task duration with respect to specific brain regions for Novice (green) and Expert (red) surgeons. Two-sample t tests were used for statistical tests. *P < 0.05. (C) LDA classification results for FLS scores and all combinations of fNIRS metrics. (D) Leave-one-out cross-validation results show the ratio of samples that are below MCE rates of 0.05 for FLS scores and all other combinations of fNIRS metrics.

While motor skill discrimination as reported via significant differences in the measurements from different cortical regions is typically central to neuroscience discovery studies, it does not provide insights into the utility of the dataset to achieve robust classification based on quantitative metrics, such as accomplished during certification (that is, successfully pass a performance-based manual skills assessment). To quantify the performance accuracy of neuroimaging-based classification of individuals in preset categories such as Novice surgeons and Expert surgeons, we postcomputed MCEs associated with current accredited FLS performance scores and with our neuroimaging method.

We used a multivariate statistical method, namely, linear discriminant analysis (LDA), to estimate the MCEs associated with the FLS- and fNIRS-based measurements. We also used quadratic support vector machines (SVMs) to classify subject populations to ensure that classification results are not dependent on classification techniques. MCEs are defined as the probability that the first population is classified into the second population (MCE12), and the second population is classified into the first population (MCE21). Perfect classification is indicated by MCE = 0%, and complete misclassification is indicated by MCE = 100%. Figure 3C reports on these two MCEs for FLS performance scores and all combinations of fNIRS metrics for the classification of surgical Experts and Novices. Results indicate that subject classification is relatively poor when considering FLS performance scores only (MCE12 = 61% and MCE21 = 53%). On the other hand, neuroimaging-based quantities provide lower errors (besides SMA only). Specifically, the combination of PFC, LMM1, and SMA leads to the overall lowest MCEs (MCE12 = 4.4% and MCE21 = 4.2%). In addition, we provide the leave-one-out cross-validation results for the LDA classification models used for this dataset, as seen in Fig. 3D. This approach assesses the robustness of the LDA classification model, where each sample is systematically not used to build the LDA model and is treated independently. Results show that the combination of PFC, LMM1, and SMA leads to the most robust and best performing datasets to build the classification model, as demonstrated by the fact that 100% of the samples in the leave-one-out cross-validation have MCEs < 5%. The specific distributions of the classification results are shown in fig. S8 (A and B). Furthermore, we determined weights for each cortical region and their respective contribution to the total LDA model to show the correlation between different cortical regions on motor skill proficiency. 
The weights for left lateral PFC (0.58), medial PFC (0.23), right lateral PFC (0.29), LMM1 (−0.70), and SMA (0.14) contribute to the entire discriminant function, with the norm of all the weights equal to 1.0. Three regions (left lateral PFC, right lateral PFC, and LMM1) account for 96.18% of the discriminant function, indicating the preponderance of these regions for a robust and accurate subject classification.
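To make the classification step more concrete, here is a minimal Python sketch of a two-class Fisher LDA followed by an estimate of MCE12 and MCE21. It is only an illustration under stated assumptions: the text does not specify how the MCEs were estimated, so the sketch assumes a Gaussian approximation of each group's projected scores and a decision threshold at the midpoint of the projected group means. All function names and the two-feature example data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fisher_lda(X1, X2):
    """Unit-norm discriminant weights w proportional to Sw^-1 (m1 - m2)."""
    d = len(X1[0])
    m1 = [mean(col) for col in zip(*X1)]
    m2 = [mean(col) for col in zip(*X2)]
    Sw = [[0.0] * d for _ in range(d)]  # within-class scatter matrix
    for X, m in ((X1, m1), (X2, m2)):
        for x in X:
            for i in range(d):
                for j in range(d):
                    Sw[i][j] += (x[i] - m[i]) * (x[j] - m[j])
    w = solve(Sw, [a - b for a, b in zip(m1, m2)])
    norm = sqrt(sum(v * v for v in w))
    return [v / norm for v in w]


def misclassification_errors(X1, X2, w):
    """Gaussian-approximation MCE12 and MCE21 along the discriminant axis."""
    p1 = [sum(wi * xi for wi, xi in zip(w, x)) for x in X1]
    p2 = [sum(wi * xi for wi, xi in zip(w, x)) for x in X2]
    threshold = (mean(p1) + mean(p2)) / 2  # assumed midpoint decision rule
    # With w as defined above, group 1 projects above group 2.
    mce12 = NormalDist(mean(p1), stdev(p1)).cdf(threshold)
    mce21 = 1 - NormalDist(mean(p2), stdev(p2)).cdf(threshold)
    return mce12, mce21


# Hypothetical [PFC, LMM1] activations (not study data).
group1 = [[1.0, 0.1], [1.2, 0.3], [0.8, 0.2], [1.1, 0.0]]
group2 = [[0.2, 0.9], [0.1, 1.2], [0.4, 1.0], [0.3, 1.1]]
weights = fisher_lda(group1, group2)
print(weights, misclassification_errors(group1, group2, weights))
```

Normalizing the weight vector to unit norm mirrors the convention stated in the text, so that the magnitude and sign of each weight can be read as the relative contribution of a region to the discriminant function.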

Optical neuroimaging assessment of surgical skill level during training

Beyond determining skill levels of individuals compared to established groups, one key challenge in bimanual skill assessment and in laparoscopic surgery is the evaluation of bimanual motor skill acquisition during training. We applied our neuroimaging methodology to the FLS PC task over an 11-day training period for inexperienced medical students. On the basis of the established FLS metrics currently used in the field, we divided the enrolled medical student population into Skilled and Unskilled trainees at the completion of the training program, as previously shown in Fig. 2C. In addition, we recruited five medical students with no previous experience in laparoscopic surgery as the Control group that underwent no training. Figure 4A shows a visual spatial map conveying the average cortical activation of all Skilled trainees or Unskilled trainees while performing the posttest (that is, simulated certification exam). Like Expert versus Novice surgeons as shown in Fig. 3A, Skilled trainees exhibit increased cortical activation in the LMM1 and SMA and decreased PFC activation when compared to Unskilled trainees upon training completion and after a 2-week break.

Fig. 4 Differentiation of motor skill between Control, Skilled, and Unskilled trainees.

(A) Spatial maps of average functional activation for all subjects in each respective group during the FLS training task on the posttest day. (B) Average changes in hemoglobin concentration during stimulus duration with respect to specific brain regions for untrained Control subjects (orange) and all FLS training students (black). Two-sample t tests were used for statistical differentiation. *P < 0.05. Type I error is defined as 0.05 for all cases.

To provide a more global view of the training outcome, we present the descriptive statistics of functional activation between untrained Control students and all trained FLS students for pretest (day 1) and posttest (final day after 2-week break period) with respect to different brain regions in Fig. 4B. Results indicate that there are no significant differences between the Control and all training students (Skilled and Unskilled trainees) at the onset of the training program (P > 0.05). However, at the completion of the training and after a 2-week break period, both Skilled and Unskilled trainees exhibit a significantly lower functional activation in the left lateral and right lateral PFC compared to the untrained Control students (P < 0.05). Furthermore, trained FLS students have significantly higher LMM1 and SMA activation than untrained Control students during the posttest (P < 0.05). These results reinforce the findings of the previous section regarding functional activation differences between Expert and Novice surgeons. To further stress the fact that our neuroimaging modality enables us to provide a more granular view of training outcomes, we computed the MCEs for the three populations involved in this surgical training study (Control, Skilled, and Unskilled trainees).

Figure 5 (A and B) reports on the MCEs for each potential combination of medical student populations at different stages or end points of training. We computed these MCEs using the combined PFC, LMM1, and SMA brain functional optical measurements. The longitudinal MCEs of pretest populations versus odd days of training indicate that at the onset of the training, the populations could not be distinguished as reported by large intergroup MCEs, as shown in Fig. 5A. However, after day 7, the Skilled trainee population demonstrated a significantly different neuroimaging distributed response compared to the first day of training, as demonstrated by very low intragroup MCEs. Conversely, the Unskilled trainee population did not exhibit these marked trends. Even during the final training day (day 11), we observed poor intragroup MCEs for the Unskilled trainee population (MCE12 = 24% and MCE21 = 47%). In contrast, we completely classified Skilled trainees on the final training day from Skilled trainees on the pretest, with MCE12 = 0% and MCE21 = 0%.

Fig. 5 Classification of motor skill between Control, Skilled, and Unskilled trainees.

(A) Inter- and intragroup MCEs for each subject population (Control, Skilled, and Unskilled trainees) with respect to training days. MCE12 and MCE21 values significantly decrease below 5% when classifying pretest Skilled and Unskilled trainees on the final training day. Furthermore, MCEs are also low when classifying Skilled and Unskilled trainees on the final training day, along with Skilled trainees and untrained Control subjects. (B) MCEs are reported for each combination of training groups (Control, Skilled, and Unskilled trainees) with respect to pretest, posttest, and final training days. MCEs are substantially low when classifying Skilled trainees and Control subjects along with inter-Skilled trainee group classification. Unskilled trainees, however, showed high MCEs even when compared to Unskilled trainees and Control subjects during the posttest. As a measure of skill retention, classification models were also applied for all subject groups from the final training day to the posttest.

Similar results were observed when looking at the same intragroup misclassifications between the pretest and posttest conditions, as shown in Fig. 5B. Classification continues to remain poor for Unskilled trainees when comparing this population from the pretest and the posttest, with MCE12 = 58% and MCE21 = 80%. Yet, we successfully classified Skilled trainees during the pretest from Skilled trainees during the posttest, with MCE12 = 10% and MCE21 = 11%. While the Unskilled and Skilled trainee intergroups were successfully classified at the end of the training session compared to the pretest, the two populations did exhibit some intragroup overlap in their associated probability density function during the posttest. Of importance, both trainee populations did not exhibit marked differences between the final training day and posttest measurements, as indicated by relatively high MCEs. Classification of Skilled trainees and Control subjects during the posttest also yielded very low MCEs, whereas classification of Unskilled trainees and Control subjects still yielded high MCEs, as shown in further detail in fig. S8 (C and D). These cross-validated classification methods show that cortical activation has significantly changed for Skilled trainees during the posttest when compared to Skilled trainees on the pretest or untrained Control subjects, whereas Unskilled trainees do not exhibit such a marked trend.

Classification of subjects with varying surgical expertise levels

For our neuroimaging-based approach for motor skill differentiation to be formative, it is important to validate the classification models across all subject populations, especially since the studies associated with assessment of established skill levels and FLS training were performed independently in two different institutions. The subject population represents the full spectrum of laparoscopic surgical expertise, from Novices to certified attending surgeons, including Skilled and Unskilled medical student trainees. Regarding the number of procedures and associated level of expertise (at the completion of the training protocol), it is expected that the distribution in terms of surgical skills levels, from more proficient to less proficient, is distributed as follows at the group level: Expert surgeons, Skilled trainees, Unskilled trainees, Novice surgeons, and Control.

Figure 6 shows the cross-validated classification model results comparing all subject population groups with varying expertise levels. Each box corresponds to a single trial for each expertise group, as shown via different-colored borders. Shaded regions within each box indicate the MCE if that trial is removed from the classification model. For example, the first trial in cross-validation results for classifying Expert surgeons from Skilled trainees shows an MCE of 0%, as indicated by a white shade. However, the 29th sample in the classification model, or the third trial in the Skilled trainee group, shows an MCE of 89% when removed from the classification model. The latter is an indication that the LDA classification model fails to reliably classify Experts and Skilled trainees if the third sample in the Skilled trainee group is removed.
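The leave-one-out procedure behind these per-trial MCEs can be sketched in Python as follows. For brevity, the sketch works on one-dimensional scores (for example, scores already projected onto a discriminant axis) and reuses a Gaussian approximation with a midpoint threshold to estimate the MCEs; both choices are assumptions, as are the function names and example values. Each fold removes one trial, rebuilds the model from the remaining trials, and records the resulting MCE pair; the summary is the number of folds whose MCEs stay below 5%.

```python
from statistics import NormalDist, mean, stdev


def mce_pair(g1, g2):
    """Gaussian-approximation (MCE12, MCE21) for two 1-D score groups."""
    threshold = (mean(g1) + mean(g2)) / 2  # assumed midpoint decision rule
    d1 = NormalDist(mean(g1), stdev(g1))
    d2 = NormalDist(mean(g2), stdev(g2))
    if mean(g1) >= mean(g2):
        return d1.cdf(threshold), 1 - d2.cdf(threshold)
    return 1 - d1.cdf(threshold), d2.cdf(threshold)


def leave_one_out(g1, g2, limit=0.05):
    """Rebuild the model with each trial removed in turn.

    Returns the list of (MCE12, MCE21) pairs, one per removed trial, the
    number of folds with both MCEs below `limit`, and the total fold count.
    """
    folds = []
    for i in range(len(g1)):
        folds.append(mce_pair(g1[:i] + g1[i + 1:], g2))
    for j in range(len(g2)):
        folds.append(mce_pair(g1, g2[:j] + g2[j + 1:]))
    below = sum(1 for m12, m21 in folds if max(m12, m21) < limit)
    return folds, below, len(folds)


# Hypothetical projected scores for two groups (not study data).
experts = [0.30, 0.35, 0.42, 0.38, 0.33]
novices = [-0.80, -0.75, -0.70, -0.82, -0.78]
_, n_below, n_total = leave_one_out(experts, novices)
print(f"{n_below} of {n_total} folds have MCEs below 5%")
```

A fold with a high MCE flags a trial whose removal destabilizes the model, which corresponds to the shaded boxes described above.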

Fig. 6 Cross-validation results for classification across all subjects with varying degree of motor skills.

Each box represents one trial per expertise group during the posttest, where the shaded regions indicate the MCE if that given trial is removed from the classification model. Cross-validation results with their respective ratio of samples that are below MCE rates of 5% for Expert surgeons versus Skilled trainees (28 of 35 samples), Expert surgeons versus Unskilled trainees (29 of 38), Expert versus Novice surgeons (43 of 43), Expert surgeons versus untrained Control subjects (34 of 38), Skilled trainees versus Unskilled trainees (15 of 21), Skilled trainees versus Novice surgeons (24 of 26), Skilled trainees versus untrained Control subjects (18 of 21), Unskilled trainees versus Novice surgeons (16 of 29 samples), Unskilled trainees versus untrained Control subjects (11 of 24), and finally, Novice surgeons versus untrained Control subjects (9 of 29).

First, we compare Expert surgeons with all other subject populations. Results indicate that Expert surgeons can be robustly classified from all subject populations, except for Skilled trainees, where only 28 of 35 samples have MCEs less than 5%. Similarly, Skilled trainees can be successfully classified from Unskilled trainees, Novice surgeons, and Control subjects. Conversely, Unskilled trainees, Novice surgeons, and Control subjects exhibit a poor intergroup classification, as reported by multiple samples leading to high MCEs. Overall, these results indicate that the population with high expertise levels (Expert surgeons and Skilled trainees) can be robustly classified compared to groups that have not yet attained the expertise levels required by the FLS training program. The “Noncertified” group, including Novice surgeons, Unskilled trainees, and Control subjects, however, cannot be robustly classified among themselves.

DISCUSSION

While there have been extensive efforts in the surgical community to confirm training effectiveness and validation of the FLS program (25–29), the surgical skill scoring component has received little attention and has garnered criticisms, such as subjectivity in scoring, inconsistencies in FLS score interpretations, and no demonstrated correlation between FLS certification and reduced patient injury (26, 28, 30–34). Despite the lack of rigorous evaluation of the FLS scoring methodology, the program has become the de facto evaluation method for accreditation of skills required for general surgery (32). Given the high-stakes nature of surgical assessment in the FLS program and its implications on training for future surgeons, there is a current gap in the rigorous validation of FLS scores as a robust and objective methodology (32). In this regard, previous studies have broached the concept of noninvasive brain imaging as a means for objectively assessing surgical skills (12, 13, 16, 35). However, they suffer from methodological limitations that are now well recognized by the fNIRS community, namely, the contamination of the recorded measurements by superficial tissue, such as scalp, dura, or pia mater (17, 18). To highlight this point, results from the Expert and Novice surgeon cohort in this study were reprocessed without the regression of superficial tissue data and are provided in fig. S9 (A to C). These results demonstrate that previously reported fNIRS-based metrics with the inclusion of superficial tissue responses can statistically differentiate surgical novices and experts (12–15, 35) yet fail to classify subjects on the basis of motor skill proficiency and perform as poorly as current surgical skill assessment metrics.
In contrast, regressing shallow tissue hemodynamics from the optical measurements significantly reduces the false omission rate, where a surgical novice is mistakenly classified as an expert, to 0%, whereas previous approaches still maintain false omission rates of 13 to 18% (see table S4). While oxygenated hemoglobin is the primary metric used in this study due to superior contrast-to-noise ratios (36), alternative metrics, such as deoxygenated hemoglobin (Hb), tissue oxygen saturation (StO2), or total hemoglobin (HbT), may provide further insights into optimal measurement sensitivity and surgical skill assessment.

Beyond improving the robustness of optical measurements’ sensitivity to cortical activations, this work is also the first to measure functional activation in a multivariate fashion to determine critical cortical regions that are correlated to surgical motor skill differentiation and classification. More specifically, this is the first report of measuring functional activations in the PFC, M1, and SMA cortical regions that are putatively associated with motor task strategy, motor task planning, and fine motor control (2, 13, 14, 35, 37–42). Since the PFC is associated with decision-making and motor strategy development (2, 13, 14, 35, 37–42), it is expected that PFC activation decreases as motor skill proficiency increases, as seen in surgical experts. Similarly, higher executive functions such as fine motor skill control are also correlated to increased activation in the M1 and SMA, as expected for surgical experts. Our results corroborate these findings regarding activation changes in the PFC, M1, and SMA as motor skill proficiency increases (2, 13, 14, 35, 37–42). Furthermore, our results demonstrate that the inclusion of these cortical regions significantly improves the utility of fNIRS in assessing bimanual skills and can offer improved objective metrics over conventional FLS-based metrics currently used for certification in general surgery. Of importance, while using single regional readouts leads to enhanced population differentiation, the combination of the three abovementioned cortical regions provides excellent classification performances (for completeness, we also provide bivariate classification results using SVMs in figs. S4 to S7). When combining measurements from these three brain regions, optical neuroimaging enables a remarkably robust classification of subjects based on their proven surgical skills levels, including novice, intermediate, and expert skill levels.
More precisely, our methodology allows for (i) highly accurate classification of subjects with well-defined bimanual skills levels with better performance than currently used metrics, (ii) longitudinally assessing the acquisition of surgical skills during the FLS training program, and (iii) performing robust classifications of populations recruited from multiple institutions with varying skill levels.

On a practical side, it is important to note that even if our methods leverage the most recent technical developments in the field of fNIRS, the instrumental and algorithmic platforms used herein are readily available for wide dissemination and use in surgical training facilities. Moreover, as more neuroscience-driven investigations focus on mapping distributed brain function, the positioning of the optodes (source or detector) on the subject scalp becomes increasingly challenging with extended spatial coverage. One key consideration is to ensure that effective coupling is minimally affected by natural movements and not compromised by the subject’s hair. Hence, positioning of the optodes can be a lengthy process that is not suitable for professional environments that are time-constrained either by cost or throughput considerations. In this regard, our study identifies that the PFC, SMA, and LMM1 regions are sufficient for accurately assessing bimanual skill–based task execution. Thus, probe placement can be completed in a short amount of time without any impact on task execution, both critical factors for the acceptance of our surgical skill assessment methodology by the surgical community.

Beyond bimanual skill assessment and objective classification of individuals based on their skill levels, the work herein provides a sound foundation to further investigate the neurophysiology underlying bimanual skill acquisition and retention. Here, we deliberately focused on reading the brain outputs as a means to provide objective and quantitative measures of bimanual task execution without delving into the mechanistic understanding of the underlying physiology and functional connectivity. However, current neurophysiological knowledge supports the overall findings of our studies, namely, increases in LMM1 and SMA activation and significant decreases in PFC activation across all groups with increasing motor task performance (2, 12–14, 35, 37–42). It is also important to note that previous studies use motor tasks that are deliberately designed to decrease variability in studying cortical activation changes, such as finger tapping or simple visual or virtual-based unimanual tasks. Conversely, the FLS task at hand is a complex bimanual task that involves visuospatial coordination, varying degrees of synchronicity among hands, motion frequency and range, and exerted forces on the surgical tools for task completion. Consequently, it is not feasible to ensure that each session replicates the same conditions, and hence, the same cortical responses. Moreover, the cortical activations and interactions associated with task planning and execution are dynamic by nature, from expected explicit control in the early stages of learning to more implicit or automatic control in the later stages of motor learning. Thus, mapping the cortical networks and their dynamical changes associated with task execution and skill acquisition should be the next step.

There is currently great interest in investigating dynamic functional connectivity (DFC) in neuroscience. Typically, DFC studies are conducted using functional magnetic resonance imaging, which requires a supine position and is therefore not appropriate for protocols requiring upright positions and/or nonelicited task execution. Recent studies have demonstrated that fNIRS is well positioned in these scenarios (43–45). We foresee that implementing these approaches in the context of bimanual skill assessment can lead to refined skill level assessment metrics and potentially provide predictive models of skill acquisition. For instance, composite cognitive metrics, possibly obtained by weighting regional cortical measurements using the LDA weights for best classification between Skilled and Unskilled trainees, could be central to developing a tailored surgical training program for optimal skill acquisition and retention assessment (figs. S6 and S7). In particular, this approach may be expanded to robustly identify surgical candidates who are likely to learn complex surgical skills faster and, by extension, achieve surgical skill mastery at a significantly faster rate than other surgical trainees. Furthermore, these methodologies can be easily applied to other fields, including brain-computer interfaces, robotics, and stroke and rehabilitation therapy (46–48). In summary, we believe that this noninvasive imaging approach for objective quantification of complex bimanual motor skills will bring about a paradigm change in broad applications, such as surgical certification and assessment, aviation training, and motor skill rehabilitation and therapy.

MATERIALS AND METHODS

The study was approved by the Institutional Review Board of Massachusetts General Hospital, University of Buffalo, and Rensselaer Polytechnic Institute.

Hardware and equipment

We used a validated continuous-wave, 32-channel, near-infrared spectrometer for this study, which delivered infrared light at 690 and 830 nm (CW6 system, TechEn Inc.). The system used eight long-distance and eight short-distance illumination fibers coupled to 16 detectors. The long-distance channels comprised all the measurements within a 30- to 40-mm distance between the source and the detector, and the short-distance channels comprised all the measurements within a ~8-mm distance between the source and the detector. The short channels were limited to probing the superficial tissue layers, such as skin, bone, dura, and pial surfaces, whereas the long channels probed both the superficial layers and the cortical surface. The probe design was assessed using Monte Carlo simulations and was characterized to have high sensitivity to functional changes in the PFC, M1, and SMA. A schematic of the geometric arrangement of probes is shown in Fig. 1B.

Participants and experimental design

Seventeen surgeons and 13 medical students participated in this study. The minimum number of samples required for this study was determined a priori using power analysis according to the two-sample t test comparing the means between two groups. On the basis of an initial pilot study, a conservative effect size (d = 1.4) was chosen for the prefrontal and motor cortices. Furthermore, with a 95% confidence level and a minimum power of 0.80, it was determined that a minimum of eight samples were required per group, as calculated with the statistical software G*Power (49). The surgeon population was divided into Novices (n = 9, 1st- to 3rd-year residents with mean age 31 ± 2) and Experts (n = 8, 4th- and 5th-year residents and attending surgeons with mean age 35 ± 6). Subject demographics are listed in table S1. We defined surgical novices and experts ad hoc on the basis of the number of laparoscopic procedures completed according to the literature (15, 50). To avoid any issues regarding hemisphere-specific activation, only right-handed participants were selected. All participants were instructed on how to perform the task with standardized verbal instructions indicating the goal of the task and rules for task completion. The optical probes were positioned on the participant with great care to avoid any hair between the source/detector and scalp and to ensure robust coupling with the skin. Neither the cap holding the fibers on the participant nor the fibers themselves hindered the participant’s movement during bimanual tasks. This cap is a standard electroencephalography EASYCAP (www.easycap.de), which has marked anatomical landmarks for placement on the scalp and has been used for numerous fNIRS studies (16, 35, 51). The cap was carefully placed on the scalp by aligning the CZ, FP1, and FP2 landmarks on the head with the marked landmarks on the cap. Simulated anatomical locations (52) for each source and detector channel are shown in table S2.

The participants were asked to perform the FLS PC task using an FLS-certified simulator, where the goal was to use laparoscopic tools to cut a marked piece of gauze as quickly and as accurately as possible. The experiment for each participant followed a block design of alternating rest and stimulus (cutting task) periods. The surgical cutting task was performed until completion or stopped after 5 min. Then, a rest period of 1 min was observed. The cycle of cutting task and rest periods was repeated five times per participant. The following measurements were recorded simultaneously for each participant during each trial: total task time, light intensity (raw NIRS data), and performance scores for the PC task, which were based on the FLS metrics.

NIRS postprocessing

Data processing was completed using the open-source software HOMER2 (53), which is implemented in MATLAB (MathWorks). First, channels with signal quality outside of the range of 80 to 140 dB were excluded. The remaining raw optical signals (intensity at 690 and 830 nm) were converted into optical density using the modified Beer-Lambert law with partial path-length factors of 6.4 (690 nm) and 5.8 (830 nm) (54–56). Motion artifacts and systemic physiology interference were corrected using recursive principal component analysis and low-pass filters (53, 57, 58). The filtered optical density data were used to derive the delta concentrations of oxyhemoglobin and deoxyhemoglobin.
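
The conversion described above can be illustrated with a minimal Python sketch. It is not the HOMER2 implementation: the function names are hypothetical, and the extinction coefficients are illustrative placeholders rather than the tabulated spectra that HOMER2 uses. Only the partial path-length factors (6.4 and 5.8) come from the text. The sketch computes the optical density change as the negative log ratio of intensity to baseline and then solves the two-wavelength modified Beer-Lambert system for the changes in oxyhemoglobin and deoxyhemoglobin.

```python
import math

# Illustrative extinction coefficients (per mM per cm). These are placeholder
# values chosen for the example, NOT the tabulated spectra used by HOMER2.
EXT = {
    690: {"HbO": 0.35, "HbR": 2.10},
    830: {"HbO": 1.00, "HbR": 0.78},
}
# Partial path-length factors quoted in the text.
PPF = {690: 6.4, 830: 5.8}


def optical_density_change(intensity, baseline):
    """Delta OD = -ln(I / I0) for one wavelength."""
    return -math.log(intensity / baseline)


def hemoglobin_changes(d_od_690, d_od_830, separation_cm):
    """Solve the 2x2 modified Beer-Lambert system for (dHbO, dHbR) in mM.

    Each row models one wavelength:
        dOD = (eps_HbO * dHbO + eps_HbR * dHbR) * separation * PPF
    """
    a11 = EXT[690]["HbO"] * separation_cm * PPF[690]
    a12 = EXT[690]["HbR"] * separation_cm * PPF[690]
    a21 = EXT[830]["HbO"] * separation_cm * PPF[830]
    a22 = EXT[830]["HbR"] * separation_cm * PPF[830]
    det = a11 * a22 - a12 * a21
    # Cramer's rule for the two unknown concentration changes.
    d_hbo = (a22 * d_od_690 - a12 * d_od_830) / det
    d_hbr = (a11 * d_od_830 - a21 * d_od_690) / det
    return d_hbo, d_hbr
```

Two wavelengths are needed because each optical density measurement mixes the contributions of both chromophores; with measurements at 690 and 830 nm, the system has a unique solution as long as the two rows are not proportional.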

The short-distance channels were regressed from the long-distance channels to remove any interference originating from superficial layers. This was achieved by modeling the hemodynamic response function (HRF) with a consecutive sequence of Gaussian basis functions and using ordinary least squares to regress out the scalp and dura activation data collected from the short-separation fibers (17, 18, 59). The corresponding source and detector pairs for each source were averaged between the task start and end times for each trial and subject (as shown in fig. S1). For example, if a subject performed five trials, then the HRF would be averaged over the task stimulus period of each trial, ensuring that rest periods were not included in the average. It is important to note that the task stimulus period for a given trial of each subject may vary because of motor skill proficiency, as seen in varying task completion times (table S3). The result is a scalar value for the change in oxyhemoglobin according to different brain regions for all participants. Finally, scalar values for each of the 32 channels were grouped into eight distinct regions of interest according to the anatomical structures as follows: left PFC (source 1, detectors 1 and 2), medial PFC (source 2, detectors 2 and 3), right PFC (source 3, detectors 3 and 4), left lateral M1 (source 4, detectors 5 to 8), left medial M1 (LMM1; source 5, detectors 8 to 10), right medial M1 (source 6, detectors 9 to 12), right lateral M1 (source 7, detectors 11 to 14), and finally, SMA (source 8, detectors 9, 15, and 16).
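
The three steps in this paragraph (superficial-signal regression, averaging over the task period, and grouping into regions of interest) can be sketched in Python as follows. This is a deliberately simplified illustration, not the HOMER2 pipeline: it regresses a single short-channel time series from a long-channel time series with one ordinary least squares coefficient and omits the Gaussian basis functions used to model the HRF. All function names are hypothetical; the region map is transcribed from the text.

```python
# Region-of-interest map transcribed from the text: region -> (source, detectors).
REGIONS = {
    "left PFC": (1, [1, 2]),
    "medial PFC": (2, [2, 3]),
    "right PFC": (3, [3, 4]),
    "left lateral M1": (4, [5, 6, 7, 8]),
    "LMM1": (5, [8, 9, 10]),
    "right medial M1": (6, [9, 10, 11, 12]),
    "right lateral M1": (7, [11, 12, 13, 14]),
    "SMA": (8, [9, 15, 16]),
}


def regress_short_channel(long_signal, short_signal):
    """Subtract the best-fitting scaled short-channel signal (one-regressor OLS)."""
    n = len(long_signal)
    mean_long = sum(long_signal) / n
    mean_short = sum(short_signal) / n
    cov = sum((s - mean_short) * (l - mean_long)
              for s, l in zip(short_signal, long_signal))
    var = sum((s - mean_short) ** 2 for s in short_signal)
    beta = cov / var if var > 0 else 0.0
    return [l - beta * (s - mean_short)
            for l, s in zip(long_signal, short_signal)]


def task_average(signal, sample_rate_hz, task_start_s, task_end_s):
    """Average a time series over the task window only, excluding rest periods."""
    start = int(round(task_start_s * sample_rate_hz))
    end = int(round(task_end_s * sample_rate_hz))
    window = signal[start:end]
    return sum(window) / len(window)


def region_averages(channel_values):
    """Average per-channel scalars, keyed by (source, detector), within each region."""
    result = {}
    for region, (source, detectors) in REGIONS.items():
        values = [channel_values[(source, d)] for d in detectors]
        result[region] = sum(values) / len(values)
    return result
```

Because the task window is passed per trial, trials with different completion times are handled naturally: each trial contributes the mean over its own stimulus period.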

Task performance metrics, statistical, and classification methods

The FLS scores were determined using the standardized FLS scoring metric formulation for the PC task based on time and error. This formulation is intellectual property–protected and was obtained with consent under a nondisclosure agreement from the FLS Committee, and hence, its details cannot be reported in this paper. Descriptive and inferential statistics were performed using SPSS (IBM Inc.). Two-sample t tests were used to determine statistically significant differences in functional activation between two groups. All box plots display median values (red bar) along with SDs. A confidence level of 95% was selected as the minimum required to reject the null hypothesis.
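
As an illustration of the group comparison, the following Python sketch computes a two-sample t statistic. The text does not state whether a pooled-variance (Student) or unequal-variance (Welch) test was used in SPSS; the pooled form is assumed here, and the function name is hypothetical.

```python
import math
import statistics


def pooled_t_statistic(group_a, group_b):
    """Return (t, degrees_of_freedom) for a pooled-variance two-sample t test."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    # statistics.variance uses the unbiased (n - 1) estimator.
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    dof = n_a + n_b - 2
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / dof
    standard_error = math.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))
    return (mean_a - mean_b) / standard_error, dof
```

The resulting statistic would then be compared against the t distribution with the returned degrees of freedom at the 95% confidence level stated above.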

LDA was used to classify the populations on the basis of their FLS scores and functional brain activation metrics. Before LDA, all recorded metrics were first normalized so that each had a sample mean of 0 and a variance of 1. LDA determines the optimal vector v such that the projected metrics of two classes (for example, Novice and Expert surgeons) in the v direction have the highest separation between the classes with the lowest variance for each class (60). The resulting LDA scores were compared for each class, and the degree of separation was objectively quantified as MCEs.
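
A minimal two-feature Python sketch of this procedure is given below. It assumes the standard Fisher formulation, in which the projection vector v is the inverse within-class scatter matrix applied to the difference of class means, and it estimates the misclassification error by resubstitution with a midpoint threshold. Both choices are assumptions made for illustration; the text does not specify how the MCEs were estimated. All function names are hypothetical.

```python
def zscore_columns(rows):
    """Normalize each feature (column) to sample mean 0 and variance 1."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    stds = [(sum((x - m) ** 2 for x in c) / n) ** 0.5
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def class_mean(rows):
    return [sum(r[j] for r in rows) / len(rows) for j in range(2)]


def fisher_direction(class_a, class_b):
    """Return v = Sw^-1 (mean_b - mean_a) for two-feature samples."""
    mean_a, mean_b = class_mean(class_a), class_mean(class_b)
    # Within-class scatter matrix, accumulated over both classes.
    s = [[0.0, 0.0], [0.0, 0.0]]
    for rows, mu in ((class_a, mean_a), (class_b, mean_b)):
        for r in rows:
            d = [r[0] - mu[0], r[1] - mu[1]]
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    diff = [mean_b[0] - mean_a[0], mean_b[1] - mean_a[1]]
    # Multiply diff by the closed-form inverse of the 2x2 scatter matrix.
    return [(s[1][1] * diff[0] - s[0][1] * diff[1]) / det,
            (s[0][0] * diff[1] - s[1][0] * diff[0]) / det]


def misclassification_error(class_a, class_b, v):
    """Fraction of samples on the wrong side of the midpoint threshold."""
    scores_a = [r[0] * v[0] + r[1] * v[1] for r in class_a]
    scores_b = [r[0] * v[0] + r[1] * v[1] for r in class_b]
    threshold = (sum(scores_a) / len(scores_a)
                 + sum(scores_b) / len(scores_b)) / 2.0
    # v points from class A toward class B, so class B scores are higher.
    errors = (sum(1 for s in scores_a if s > threshold)
              + sum(1 for s in scores_b if s <= threshold))
    return errors / (len(scores_a) + len(scores_b))
```

In use, the pooled samples would first be passed through `zscore_columns`, then split by class label before calling `fisher_direction` and `misclassification_error`. With more than two features, the closed-form 2x2 inverse would be replaced by a general linear solve.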

SUPPLEMENTARY MATERIALS

Supplementary material for this article is available at http://advances.sciencemag.org/cgi/content/full/4/10/eaat3807/DC1

Table S1. Subject demographics and descriptive data.

Table S2. Theoretical Montreal Neurological Institute coordinates for each source detector channel.

Table S3. FLS task trial completion times for Novices.

Table S4. Expert versus Novice classification results for fNIRS (with and without short separation regression) and FLS metrics.

Fig. S1. Experimental protocol design.

Fig. S2. Subjects performing FLS PC task with fNIRS measurements.

Fig. S3. Group average HRFs with respect to cortical regions.

Fig. S4. Quadratic SVM classification of Expert and Novice surgeons.

Fig. S5. Weighted quadratic SVM classification of Expert and Novice surgeons.

Fig. S6. Quadratic SVM classification of Skilled versus Unskilled trainees.

Fig. S7. Weighted quadratic SVM classification of Skilled versus Unskilled trainees.

Fig. S8. Probability density functions for projected LDA classification models.

Fig. S9. Differentiation and classification of motor skill between Novice and Expert surgeons without short separation regression.

This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial license, which permits use, distribution, and reproduction in any medium, so long as the resultant use is not for commercial advantage and provided the original work is properly cited.

REFERENCES AND NOTES

Acknowledgments: We would like to acknowledge D. Boas and J. Dubb from the Martinos Imaging Center for their significant support throughout this project. We would also like to acknowledge A. “Buzz” DiMartino and his team at TechEn for their gracious support for the fNIRS hardware components. Finally, we would like to thank N. Fernández for his contributions to the illustrations. Funding: This work was supported by funding provided by NIH/National Institute of Biomedical Imaging and Bioengineering grants 2R01EB005807, 5R01EB010037, 1R01EB009362, 1R01EB014305, and R01EB019443. Author contributions: A.N. conceived the original idea. A.N., X.I., S.D., and M.A.Y. designed the research study. D.W.G., C.C., S.D.S., and M.A.Y. contributed to study logistics, subject recruitment, and clinical expertise. A.N. performed research studies. A.N., M.A.Y., and U.K. performed data processing and analyses of results. A.N., M.A.Y., U.K., X.I., and S.D. interpreted the results. A.N. and X.I. wrote the manuscript, and M.A.Y., U.K., and S.D. edited the manuscript. All authors discussed conclusions and commented on the manuscript. Competing interests: The authors declare that they have no competing interests. Data and materials availability: Data needed to evaluate the conclusions in the paper are present in the paper and/or the Supplementary Materials. Additional data related to this paper are available through a linked repository (https://github.com/arun-nemani/Surgical-skill-classification). Correspondence and requests for materials should be addressed to X.I. Further information regarding data, figures, and other research findings in this study may be requested from corresponding authors.