Biomimetic and flexible piezoelectric mobile acoustic sensors with multiresonant ultrathin structures for machine learning biometrics


Science Advances  12 Feb 2021:
Vol. 7, no. 7, eabe5683
DOI: 10.1126/sciadv.abe5683


Flexible resonant acoustic sensors have attracted substantial attention as an essential component for intuitive human-machine interaction (HMI) in the future voice user interface (VUI). Several studies have mimicked the basilar membrane, but the resulting devices remain large because it is difficult to control a multifrequency band and to broaden the resonant spectrum enough to cover the full range of phonetic frequencies. Here, a highly sensitive piezoelectric mobile acoustic sensor (PMAS) is demonstrated by exploiting an ultrathin membrane for biomimetic frequency band control. Simulation results show that the resonant bandwidth of a piezoelectric film can be broadened by placing a lead-zirconate-titanate (PZT) membrane on an ultrathin polymer to cover the entire voice spectrum. Machine learning–based biometric authentication is demonstrated by integrating the acoustic sensor module with an algorithm processor and a customized Android app. Last, an exceptional error rate reduction in speaker identification is achieved by the PMAS module with a small amount of training data, compared to a conventional microelectromechanical system microphone.


In the coming era of artificial intelligence (AI) and Internet of Things (IoT), the voice user interface (VUI) has been attracting substantial interest for virtual secretaries, smart home appliances, mobile electronics, and biometrics because it offers intuitive human-machine interaction (HMI) in a hyperconnected society (1–4). Acoustic sensors convert an analog sound wave to digital signals, which is essential to enable HMI voice communications via machine learning algorithms (5–7). Commercial condenser and piezoelectric type microphones exhibit a flat but low-sensitivity frequency response because their resonant frequency is located above the audible sound spectrum (8–10). Single-channel capacitive microphones also have drawbacks such as high power consumption and unstable circuit operation because their low-sensitivity output requires preamplification. In contrast, the human ear detects resonant sound using ~15,000 hair cell channels, allowing far-distant and accurate recognition (11).

Recently, several research teams have reported multiresonant piezoelectric acoustic sensors that mimic the basilar membrane of the human cochlea (5, 12–16). Our group has also developed a highly sensitive flexible piezoelectric acoustic sensor (f-PAS) with a multitunable frequency band by using an inorganic perovskite Pb(Zr0.52Ti0.48)O3 [lead-zirconate-titanate (PZT)] thin film (5, 17–19). Resonant electrical signals from the f-PAS presented four to eight times higher sensitivity than that of the reference condenser microphone over the voice frequency range. In this new type of acoustic sensor, the most challenging task is covering the entire voice spectrum by widening the sharp resonant peaks from a limited number of channels. Machine learning–based speaker recognition was also demonstrated, achieving a 75% reduction in error rate compared to a conventional condenser type acoustic sensor. However, the large device size (35 mm by 20 mm) has restricted integration of the f-PAS into a small microchip for mobile and IoT systems. Downsizing both lowers the sensitivity and raises the resonant frequency; these combined negative effects make it more difficult to achieve a highly sensitive resonant mobile acoustic sensor. Downscaling of an acoustic sensor therefore relies on controlling the frequency band of an ultrathin piezoelectric membrane while maintaining high sensitivity in miniaturized dimensions. To increase sensitivity, the mini f-PAS should overcome the inverse relationship between scaling and sensitivity, for example, by mimicking the way human hearing detects resonant sound (20).

In the human cochlea, a tiny trapezoidal basilar membrane (~1-mm width) has an ultrathin structure that averages 10 μm in thickness and changes gradually in thickness along its length (21). The elongated region of the thin basilar membrane resonates at low frequencies, while the shortened, thicker region of the basilar membrane responds at high frequencies (22). The hair cells convert the mechanical membrane vibrations of resonant sound into electrical pulses, similar to the flexible piezoelectric mechanism (21). These ultrathin biomimetic structures for mechanoelectrical bioconversion can be used in scaling of resonant piezoelectric acoustic sensors for mobile applications.

Herein, we report a highly sensitive and flexible piezoelectric mobile acoustic sensor (PMAS) based on biomimetic frequency band control. A multiresonant voice spectrum was achieved by placing the PZT thin film on an ultrathin polymer membrane for a mobile-sized acoustic sensor (130 mm2). Our simulation showed that the ultrathin polymer can broaden the resonant bandwidth of the inorganic piezoelectric film to cover the full range of phonetic frequencies, which is an essential requirement for a flexible acoustic sensor. The biomimetic piezoelectric membrane of the PMAS showed an outstanding sensitivity figure of merit (FOM) of 52 mV/Pa in a 130-mm2 area, which is superior to previous reports. The sensitivity of the PMAS in a miniaturized dimension was improved by adjusting the internal stress and enhancing the lateral dipoles of the PZT membrane. The signal-to-noise ratio (SNR) and the linearity with respect to sound pressure level (SPL) were analyzed to characterize the frequency response of the resonant PMAS. Biometric authentication for mobile smartphones was also demonstrated by integrating the mini PMAS, a machine learning processor, and a wireless transmitter into a mobile acoustic sensor module. The PMAS module achieved a 90% speaker identification rate with a 56% reduction in error rate compared to that of the conventional condenser microphone using a Gaussian mixture model (GMM) algorithm with a small amount of training data. The biometric PMAS module demonstrates a commercial mobile application of the f-PAS for far-distant and accurate voice recognition.


Biomimetic PMAS and mobile biometric authentication

Figure 1A schematically illustrates the overall concept for biomimetic frequency band control and biometric mobile authentication of the miniaturized PMAS. The detailed procedures are as follows: (i) An ultrathin piezoelectric membrane was used to mimic the basilar membrane in the human cochlea. The basilar membrane, with its tiny asymmetric trapezoidal shape, uses an ultrathin membrane of average 10-μm thickness to detect multiresonant frequencies (21). This biomimetic mechanism enables multifrequency band control for scaling of a resonant acoustic sensor. The resonance frequency rises as the PMAS is scaled down, as described by Eq. 1 (23)

$$f_R \propto \frac{t}{l^2}\sqrt{\frac{E}{\rho}} \tag{1}$$

where $f_R$ is the resonance frequency, $l$ and $t$ are the length and thickness of the PMAS, and $E$ and $\rho$ are the elastic modulus and density. The miniaturized PMAS with the multifrequency band was fabricated using a biomimetic membrane of 10-μm thickness, similar in thickness to the basilar membrane. The resonant frequencies of the PMAS were systematically tuned into the voice spectrum, where most of the vocal energy is distributed. (ii) The PMAS was composed of an ultrathin polyethylene terephthalate (PET) substrate, a stress-controlled piezoelectric membrane, and multichannel interdigitated electrodes (IDEs). The 4.8-μm-thick polymer membrane with a low quality factor (Q factor) was able to cover the entire voice frequency range by broadening the resonant bandwidth of the PMAS (24–26). The inorganic-based laser lift-off (ILLO) technique was used to fabricate the PMAS, as shown in Materials and Methods and fig. S1A (19, 27–32). To increase the sensitivity of the PMAS, the internal residual stress of the PZT membrane was adjusted to enhance dipole alignment along the direction of the IDEs (33, 34). The multiresonant band of the PMAS exhibited outstanding sensitivity, superior to that of a nonresonant condenser microphone over the voice spectrum from 100 Hz to 4 kHz. (iii) Machine learning–based biometric authentication was demonstrated by integrating the PMAS with an algorithm processor and signal transmitter. Multichannel data from a PMAS inserted into a smartphone were wirelessly transmitted to a machine learning processor and a customized biometric app. Last, access permission and prohibition to the mobile smartphone were controlled by a GMM algorithm that compares the input multichannel signals of the PMAS module with the pretrained database.
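The scaling in Eq. 1 can be made concrete with a short calculation. The sketch below is only an illustration: it uses the lengths and thicknesses quoted in this article for the previous f-PAS (l = 35 mm, t of about 50 μm) and the PMAS (l = 20 mm, t of about 10 μm), and it assumes that both devices share the same effective elastic modulus and density so that those terms cancel. The function name `relative_resonance` is hypothetical.

```python
import math

def relative_resonance(t, l, E=1.0, rho=1.0):
    """Resonance frequency up to a constant factor: f_R ∝ (t / l**2) * sqrt(E / rho)."""
    return (t / l**2) * math.sqrt(E / rho)

# Dimensions quoted in the article (length in mm, thickness in micrometers).
# Assumption: identical effective E and rho for both devices, so they cancel in the ratio.
f_prev = relative_resonance(t=50.0, l=35.0)        # previous f-PAS
f_mini_thick = relative_resonance(t=50.0, l=20.0)  # shrunk laterally, thickness unchanged
f_mini = relative_resonance(t=10.0, l=20.0)        # shrunk laterally and thinned (PMAS)

ratio_thick = f_mini_thick / f_prev
ratio_thinned = f_mini / f_prev
print(f"shrunk, same thickness: x{ratio_thick:.2f}")
print(f"shrunk and thinned:     x{ratio_thinned:.2f}")
```

Under these assumptions, shrinking the length alone multiplies the resonance frequency by about 3, whereas thinning the membrane at the same time brings it back below the original value, which is the trend the ultrathin design relies on.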

Fig. 1 Overall concept of biomimetic PMAS and mobile biometrics.

(A) Schematic illustration of biomimetic multifrequency band control and mobile biometric authentication of miniaturized PMAS: (i) Biomimetic ultrathin PMAS mimicking the basilar membrane of human cochlea to locate the multiresonant frequency into the voice range from 100 Hz to 4 kHz. (ii) Highly sensitive frequency response of PMAS for full-cover phonetic spectrum by using a low Q factor ultrathin polymer, a stress-controlled piezoelectric membrane, and a multichannel electrode. (iii) Biometric authentication for mobile application using the integrated acoustic module composed of mini PMAS, machine learning processor, and wireless transmitter. (B) Photograph of the ultrathin multichannel PMAS membrane floating on fragile bubbles. The inset shows a cross-sectional scanning electron microscopy image of the PZT thin film and adhesive layer on the ultrathin polymer. Photo credit: Hee Seung Wang, Korea Advanced Institute of Science and Technology. (C) Comparison of sensitivity FOM (FOMsens) between highly sensitive miniaturized PMAS and previously reported resonant piezoelectric acoustic sensors.

Figure 1B shows a photograph of the ultrathin multichannel PMAS, which is flexible enough to conformally contact small bubbles. This ultrathin membrane is extremely important for converting a tiny sound wave input into maximum resonant displacements. As shown in the inset cross-sectional scanning electron microscopy image, a 4.8-μm-thick PET substrate was used to bring the multifrequency band of the PMAS into the voice frequency range. A 3-μm-thick rectangular PZT membrane with a 1-μm-thick adhesive layer was used to fabricate the biomimetic PMAS of 10-μm thickness. Figure 1C displays the comparison of sensitivity FOM (FOMsens) between the highly sensitive mini PMAS and previously reported piezoelectric acoustic sensors (5, 12–17). Sensor performance per unit size is an important factor in acoustic devices because the sensitivity is proportional to the active piezoelectric area (20). The FOMsens, which is defined as sensitivity per area, uses the peak voltage at a standard 94-dB condition, converted by Eq. 2 (15)

$$\text{Sensitivity} = \frac{V}{P} = \frac{V}{P_0 \times 10^{L_p/20}} \tag{2}$$

where $P_0$ is the reference sound pressure of 0.00002 Pa, $L_p$ is the SPL in units of dB, and $V$ is the peak voltage at an SPL of $L_p$. The biomimetic piezoelectric membrane exhibited a superior FOMsens of 40 mV/Pa·cm2, higher than that of other resonant acoustic sensors, as shown in table S1. When compared with the previous f-PAS report (45 mV/Pa in 450-mm2 area; l = 35 mm, w1 = 5 mm, w2 = 20 mm, and t = ~50 μm), the miniaturized PMAS achieved not only a 70% size reduction but also a 20% improvement of sensitivity (52 mV/Pa in 130 mm2 area; l = 20 mm, w1 = 3 mm, w2 = 10 mm, and t = ~10 μm), as shown in fig. S1B (17).
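As a worked illustration of Eq. 2 and of the per-area figure of merit, the following sketch converts an SPL to pascals and divides a sensitivity by the active area. The helper names are hypothetical, and the inputs are the sensitivities and areas quoted in the paragraph above, not new measurements.

```python
P0 = 20e-6  # reference sound pressure in Pa (0.00002 Pa, as in Eq. 2)

def spl_to_pascal(spl_db):
    """Sound pressure in Pa for a given sound pressure level in dB."""
    return P0 * 10 ** (spl_db / 20)

def sensitivity_mv_per_pa(peak_voltage_mv, spl_db):
    """Eq. 2: peak voltage divided by the sound pressure at that SPL."""
    return peak_voltage_mv / spl_to_pascal(spl_db)

def fom_sens(sensitivity_mv_pa, area_mm2):
    """Sensitivity per unit area in mV/(Pa*cm^2); 100 mm^2 equals 1 cm^2."""
    return sensitivity_mv_pa / (area_mm2 / 100.0)

pressure_94 = spl_to_pascal(94.0)   # close to 1 Pa
fom_pmas = fom_sens(52.0, 130.0)    # PMAS values quoted in the text
fom_prev = fom_sens(45.0, 450.0)    # previous f-PAS values quoted in the text
print(f"94 dB SPL = {pressure_94:.4f} Pa")
print(f"FOM PMAS = {fom_pmas:.1f}, FOM f-PAS = {fom_prev:.1f} mV/(Pa*cm^2)")
```

The 94-dB condition is convenient because it corresponds to almost exactly 1 Pa, so a peak voltage in millivolts can be read nearly directly as mV/Pa.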

Multiresonant band and enhanced sensitivity for full-cover phonetic spectrum

Figure 2A displays the quadratic behavior of resonance frequency versus downsizing of the PMAS, obtained from a finite element method (FEM) simulation with the same piezoelectric material and shape as in the experiment. The dimensions of the PMAS were reduced along the x-y direction while maintaining a trapezoidal shape and a constant membrane thickness of 40 μm (same as the f-PAS device in the previous report) (17). The third resonance frequency of the 130-mm2 device lay above the voice frequency range (10 kHz), which is inappropriate for a resonance-based acoustic sensor (35). To position the multiresonant frequencies within the voice spectrum, an ultrathin polymer membrane should be used for the miniaturized PMAS, inspired by the human basilar membrane. Figure 2B shows the linear correlation between the resonance frequency and membrane thickness within a fixed active area of 130 mm2 (same dimensions as the actual PMAS device), calculated by FEM. The first, second, and third resonance frequencies in the 10-μm-thick membrane were distributed within a range from 100 Hz to 4 kHz, which is crucial for allocating multiresonant frequency bands in the voice spectrum. Figure 2C displays the broadening of the PMAS bandwidth obtained by placing the polymer membrane underneath the inorganic thin film. The frequency signal of the PZT thin film alone presented a sharp and discrete resonant peak (∆f ~ 40 Hz) due to its low loss factor of ~0.0005. In contrast, the PZT on the ultrathin polymer with a high loss factor (~0.2) showed a broad resonant bandwidth (∆f ~ 300 Hz). These simulation results suggest that the ultrathin polymer membrane with elastic damping and a low Q factor (~16.6) generated extra oscillation around the resonance frequency of the PMAS (24–26). The Q factor is defined by Eq. 3

$$Q = \frac{f_0}{\Delta f} \tag{3}$$

where $f_0$ is the resonant frequency of the PMAS membrane and $\Delta f$ is the bandwidth measured 3 dB below the peak value. In addition, the PZT thin film alone exhibited an unintended resonance frequency shift from 830 to 3410 Hz because its Young’s modulus (~344 GPa) is much higher than that of the ultrathin polymer (~2 GPa) (23, 35). Together with the resonance frequency control shown in Fig. 2B, bandwidth broadening by the polymer membrane provides a tool to cover the entire human voice spectrum. The relationship between potential and resonant bandwidth as a function of the thickness ratio of polymer to piezoelectric membrane was also simulated to confirm that the ultrathin polymer broadens the frequency response of the 3-μm-thick PZT film, as shown in fig. S2. The trade-off result showed that a polymer-to-piezoelectric ratio from 1 to 20 was adequate to support the 3-μm-thick PZT membrane for covering the target frequency spectrum.
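The link between damping and bandwidth behind Eq. 3 can be illustrated with a textbook single-resonator model. This is not the FEM model used in the article: it is a minimal sketch that assumes one resonance with a structural loss factor `eta`, and the resonance frequency and the two loss factors below are hypothetical values chosen only to show the trend.

```python
import math

def magnitude(f, f0, eta):
    """Displacement magnitude of a single resonator with structural loss factor eta."""
    r = f / f0
    return 1.0 / math.sqrt((1.0 - r * r) ** 2 + eta ** 2)

def half_power_bandwidth(f0, eta, step=0.05):
    """Width of the band where the response stays within 3 dB of its peak."""
    threshold = magnitude(f0, f0, eta) / math.sqrt(2.0)
    n = int(2 * f0 / step)
    inside = [i * step for i in range(1, n) if magnitude(i * step, f0, eta) >= threshold]
    return inside[-1] - inside[0]

f0 = 1000.0  # hypothetical resonance frequency in Hz
bw_low_loss = half_power_bandwidth(f0, eta=0.01)   # stiff, weakly damped film
bw_high_loss = half_power_bandwidth(f0, eta=0.1)   # film on a lossy polymer
q_low_loss = f0 / bw_low_loss                      # Eq. 3
q_high_loss = f0 / bw_high_loss
print(f"eta=0.01: bandwidth {bw_low_loss:.1f} Hz, Q {q_low_loss:.0f}")
print(f"eta=0.10: bandwidth {bw_high_loss:.1f} Hz, Q {q_high_loss:.1f}")
```

In this idealized model the Q factor is close to the inverse of the loss factor, so a tenfold increase in loss widens the resonance roughly tenfold. A layered membrane has an effective loss factor that depends on geometry, so the numbers here should not be read as predictions for the PMAS.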

Fig. 2 Frequency band control and sensitivity improvement for full-cover phonetic spectrum.

(A) FEM calculation for quadratic behavior of resonance frequencies according to the miniaturized dimension of PMAS with 40-μm thickness (l′ < l). (B) Distribution of resonance frequencies in a 130-mm2 active area as a function of PMAS thickness calculated by FEM simulation (t′ < t). (C) Comparison of resonant bandwidth between only PZT thin film and PZT on ultrathin polymer. The simulated results display a broad bandwidth of PZT on ultrathin polymer with low Q factor in voice frequency range, compared to the sharp and discrete resonant spectrum of only PZT thin film. a.u., arbitrary units. (D) Schematics and FEM calculations for dipole alignment and in-plane piezoelectric potential under residual stress (tensile and compressive) in IDE structure. (E) Comparison of compressive stress at each piezoelectric thickness calculated by d-spacing variation on Psi orientation at (110) peak. (F) Saturation and remnant polarization values as a function of PZT membrane thickness by measuring the P-E hysteresis loops. The bottom inset shows the optical microscope image of IDE channel.

Figure 2D presents schematic diagrams and FEM calculations of the piezoelectric potential distribution in the ultrathin 3-μm PZT membrane depending on residual stress. The polarized dipoles can be aligned in the lateral or longitudinal directions by tensile or compressive stresses in the film, respectively (33). Note that the performance of the piezoelectric membrane can be enhanced by a preference for lateral dipole alignment (i.e., minimized compressive stress) because it matches the geometry of the IDE direction. This result suggests that the sensitivity and performance of the PMAS can be optimized by applying the internal stress-control approach to the piezoelectric membrane. Figure 2E presents the investigation of residual stress in the ultrathin PZT membrane using the x-ray diffraction method. The internal stress was calculated at the (110) peak with a 2θ angle of 31° by estimating the d-spacing as a function of sin²ψ, as shown in fig. S3. A compressive stress was observed in the PZT membrane over the thickness range from 1 to 6 μm owing to the difference in thermal expansion coefficient between the sapphire substrate (αsap = 7.5 ppm/K) and the PZT membrane (αPZT = 5.5 ppm/K) (33). In addition, the trend of residual stress with thickness corresponded to the gradual change of the crystallographic orientation at the (110) plane of PZT, as depicted in fig. S4. The 3-μm-thick PZT membrane showed the lowest compressive stress (~587.9 MPa) and the largest preferred orientation of the (110) peak. Note that the (110) orientation of the PZT membrane could be initiated and intensified up to 3-μm thickness to minimize lattice mismatch with the c-plane sapphire substrate (36). This observation suggests that the residual compressive stress can be alleviated by the preferred crystallographic orientation of PZT as a function of membrane thickness. As a result, the maximum polarization value is obtained in 3-μm-thick PZT with the lowest compressive stress because of the preferred lateral dipoles in the direction of the IDE. Beyond 3-μm thickness, the crystallographic orientation at the (110) plane decreased because of the substrate clamping effect during subsequent deposition of the PZT membrane (37). Figure 2F displays the saturation (Ps) and remnant (Pr) polarization used to evaluate the ferroelectric properties of the ultrathin PZT membrane as a function of thickness. The inset of Fig. 2F shows the optical microscope image of the IDE deposited onto the piezoelectric membrane. The polarization–electric field (P-E) hysteresis loops as a function of PZT membrane thickness are displayed in fig. S5, showing a saturation of the polarization value in the 3-μm-thick PZT film. On the basis of these results, we used a 3-μm-thick PZT membrane for fabricating the highly sensitive miniaturized PMAS.
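The x-ray stress evaluation described above follows the general sin²ψ approach, in which the slope of d-spacing versus sin²ψ is proportional to the in-plane stress. The sketch below shows that approach on synthetic data only. The elastic constants, tilt angles, Cu Kα wavelength, and the stress value used to generate the data are all hypothetical placeholders and are not taken from the article or its supplementary figures.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def residual_stress(psi_deg, d_spacings, youngs_modulus, poisson):
    """Biaxial stress from the slope of d versus sin^2(psi); negative means compressive."""
    xs = [math.sin(math.radians(p)) ** 2 for p in psi_deg]
    slope, intercept = fit_line(xs, d_spacings)
    # The intercept (d at psi = 0) stands in for the unstrained spacing d0.
    return youngs_modulus / (1.0 + poisson) * slope / intercept

def synthetic_d(psi_deg, d0, stress, youngs_modulus, poisson):
    """d-spacing of a film under equal biaxial stress, used here only to make test data."""
    s2 = math.sin(math.radians(psi_deg)) ** 2
    return d0 * (1.0 + (1.0 + poisson) / youngs_modulus * stress * s2
                 - 2.0 * poisson / youngs_modulus * stress)

# Hypothetical inputs: Cu K-alpha wavelength, E = 70 GPa, nu = 0.3, -600 MPa stress.
d0 = 1.5406 / (2.0 * math.sin(math.radians(31.0 / 2.0)))  # Bragg's law at 2-theta = 31 deg
E_film, nu_film, true_stress = 70e9, 0.3, -600e6
tilts = [0.0, 15.0, 30.0, 45.0, 60.0]
d_values = [synthetic_d(p, d0, true_stress, E_film, nu_film) for p in tilts]

estimated_stress = residual_stress(tilts, d_values, E_film, nu_film)
print(f"estimated stress: {estimated_stress / 1e6:.0f} MPa")
```

A negative slope of d versus sin²ψ indicates compressive stress. Using the intercept in place of the true unstrained spacing introduces an error well below 1% for these inputs.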

Resonant characterization of PMAS

Figure 3A shows the nanometer-scale, multiresonant displacements of the PMAS membrane measured by a laser Doppler vibrometer (LDV) under white noise of 94-dB SPL. During a frequency sweep from 100 Hz to 4 kHz, LDV laser light with a wavelength of 632.8 nm was directed onto the vibrating ultrathin membrane. The vibration displacement was obtained from the frequency shift between the incident and reflected laser light. The frequency separation of the multichannel PMAS was characterized using resonant oscillations via channel 2 at the apex for low frequencies and channel 6 at the base for high frequencies (38). The displacements of the PMAS membrane were a few tens of nanometers over the entire voice frequency range. The frequency components estimated by the LDV were consistent with the resonance frequencies calculated by the FEM simulation in Fig. 2B. Figure 3B presents the frequency response of the miniaturized PMAS with its multiresonant band over the voice spectrum. Input white noise is defined as a mixture of broadband frequencies with identical intensity. An anechoic chamber with acoustic absorbent was used to suppress external noise and wave reflection when measuring the electrical signals in a free-field condition (39). The frequency response of the PMAS was plotted by selecting the highest relative sensitivity among the seven channels via a dynamic signal acquisition (DSA) system. Relative sensitivity here means the frequency response of the PMAS compared with that of the reference microphone under the same conditions. The detailed frequency responses of the seven PMAS channels from 100 Hz to 4 kHz are displayed in fig. S6. The PMAS with the 3-μm-thick PZT membrane was selected by comparing frequency responses as a function of PZT thickness (fig. S7); the maximum sensitivity followed a trend similar to the polarization curve of Fig. 2F, which supports the aforementioned stress-piezopotential relationship. The maximum relative sensitivity of the PMAS was 50 dB higher than that of a condenser type reference microphone (G.R.A.S. 46BE) at 830 Hz. A linear distribution was observed in the first, second, and third resonant bands due to the curved structure of the PMAS (17). The electromechanical coupling coefficient and Butterworth-van Dyke equivalent circuit model were also obtained with an impedance analyzer to show the electromechanical characteristics of the PMAS, as depicted in fig. S8. The inset of Fig. 3B compares the PMAS dimensions with a 2.4-cm-diameter Li-button cell. A curvilinear sound hole was designed in the printed circuit board (PCB) of the PMAS membrane for a bottom port microphone. As shown in Fig. 3B, the miniaturized PMAS covered the overall phonetic spectrum by broadening the resonant frequency bandwidth thanks to the low Q factor of the biomimetic ultrathin polymer.
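Plotting one frequency response from several channels amounts to taking, at each frequency, the largest value across channels. The helper below is a minimal sketch of that selection with made-up numbers for three channels. The function name `merge_top_k` and the sample values are hypothetical, and the DSA system's own processing is not described in the text.

```python
def merge_top_k(channel_responses, k=1):
    """At each frequency bin, average the k largest values across channels."""
    n_bins = len(channel_responses[0])
    merged = []
    for i in range(n_bins):
        column = sorted((ch[i] for ch in channel_responses), reverse=True)
        merged.append(sum(column[:k]) / k)
    return merged

# Hypothetical relative sensitivities of three channels at four frequencies.
channels = [
    [5.0, 20.0, 3.0, 1.0],
    [2.0, 6.0, 18.0, 4.0],
    [1.0, 3.0, 7.0, 15.0],
]
envelope = merge_top_k(channels, k=1)  # highest channel at each frequency
top_two = merge_top_k(channels, k=2)   # mean of the two highest channels
print(envelope)
print(top_two)
```

With `k=1` the result is the envelope used for a combined frequency-response plot. With `k=2` the same helper resembles the two-channel merging described later for the speaker identification pipeline, although averaging should be done on whatever quantity (linear amplitude or dB) the downstream algorithm expects.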

Fig. 3 Mechanical and electrical characterizations of PMAS.

(A) Multiresonant displacements of ultrathin PMAS membrane measured by an LDV under frequency sweep from 100 Hz to 4 kHz. (B) Frequency response of PMAS plotted by selecting the highest sensitivity among the multiple channels (fR2 = 2.1fR1, fR3 = 3.2fR1). The inset exhibits a size comparison of PMAS with commercial Li-button cell (scale bar, 2 cm). Photo credit: Hee Seung Wang, Korea Advanced Institute of Science and Technology. (C) Piezoelectric voltage outputs of most sensitive channel at first, second, and third resonances and low frequency. The inset shows the magnified electrical signals of peak-to-peak voltage under monochromatic sinusoidal sound, presenting outstanding sensitivity of PMAS superior to a reference microphone. (D) Sensitivities of each resonant frequency converted into the units of dBV under sound wave of single frequency. Red, blue, and cyan boxed lines are fundamental frequencies at first, second, and third resonances, while other peaks are harmonic frequencies. (E) SNRs calculated by subtracting the sensitivity of each resonance frequency and noise baseline. The top inset displays multiresonant locations of curved PMAS membrane simulated by FEM calculation. (F) Linear behavior of channel 2 voltage at first resonance as a function of pressure. The inset exhibits the in-phase characteristic of PMAS signal identical to sinusoidal input.

Figure 3C shows the piezoelectric output voltage of the highly sensitive PMAS under a monochromatic sound wave of 94-dB SPL. The highest electrical signal among the seven PMAS channels was measured at the first, second, and third resonances and at a low frequency. Signals at 200 and 830 Hz were recorded from channel 2, while the outputs at 1840 and 2890 Hz were recorded from channels 5 and 7, respectively. The magnified sinusoidal voltages at the first resonance are compared with those of a commercial G.R.A.S. microphone in the inset of Fig. 3C. The maximum peak-to-peak voltage of the PMAS (~103 mV) at the first resonance was 28 times higher than that of the reference microphone (~3.7 mV). The detailed voltage signals at each resonant frequency are shown in fig. S9. Figure 3D displays the sensitivity of the PMAS in units of decibels relative to 1 V (dBV) at each resonant frequency. The fundamental frequencies of each resonance appeared at 830, 1840, and 2890 Hz, followed by harmonic frequencies at integer multiples. The sensitivity measured under the monochromatic sound wave was converted from the voltage signal according to Eq. 4

$$\text{Sens. (dBV)} = 20\log_{10}\frac{V}{V_0} \tag{4}$$

where Sens. (dBV) is the sensitivity, $V$ is the root mean square value of the voltage, and $V_0$ is the reference voltage of 1 V, defined as 0 dBV. An outstanding maximum sensitivity of −28 dBV without an amplifier was measured from the voltage signal of the multichannel PMAS at the first resonant frequency. The highly sensitive PMAS for far-distant voice detection could resolve the limitation of a low-sensitivity microelectromechanical system (MEMS) microphone with unstable noise amplification (40).
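Eq. 4 can be checked against the peak-to-peak voltage quoted above. The sketch assumes a pure sinusoid, so that the root mean square value equals the peak-to-peak value divided by 2√2. The helper names are hypothetical.

```python
import math

def peak_to_peak_to_rms(v_pp):
    """RMS of a pure sinusoid with the given peak-to-peak amplitude."""
    return v_pp / (2.0 * math.sqrt(2.0))

def to_dbv(v_rms, v_ref=1.0):
    """Eq. 4: level in dBV relative to a 1 V reference."""
    return 20.0 * math.log10(v_rms / v_ref)

v_rms = peak_to_peak_to_rms(0.103)  # ~103 mV peak to peak at the first resonance
pmas_dbv = to_dbv(v_rms)
print(f"{v_rms * 1e3:.1f} mV rms -> {pmas_dbv:.1f} dBV")
```

Under the pure-sinusoid assumption, the 103-mV peak-to-peak signal corresponds to roughly 36 mV rms, which lands within about 1 dB of the −28 dBV sensitivity reported in the text.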

Figure 3E presents the SNRs of the PMAS under sound waves at each resonance frequency. An acoustic sensor with a high SNR is important for clear sound recognition with less noise interference (41). The SNRs at the resonant frequencies were obtained as the difference between the peak sensitivity and the noise baseline. The PMAS exhibited superior SNRs of 92, 85, and 78 dB at the resonance frequencies, indicating noise-robust characteristics of the PMAS compared to 63 dB for the commercial condenser microphone (40). The first, second, and third resonant locations in the curved membrane were analyzed by FEM simulation, as depicted in the upper insets of Fig. 3E. Figure 3F displays the linearity between the pressure and the voltage output of channel 2 at the first resonance frequency. The piezoelectric voltage of the PMAS increased from 1 mV to 0.43 V at pressure levels between 0.03 and 20.4 Pa. This linear behavior shows that the PMAS retains its resonant response over the wide acoustic pressure range needed for voice recognition. The inset of Fig. 3F shows the in-phase relationship between the sinusoidal input and the PMAS output signal. In addition, the sensitivities of the PMAS as functions of distance and incident angle were analyzed to investigate the directional characteristic, as described in fig. S10.
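Because the SNR is the difference between the peak sensitivity and the noise baseline, both expressed in dBV, the reported values also imply a noise floor. The short calculation below is only a consistency sketch using the −28 dBV sensitivity and the 92-unit SNR quoted for the first resonance. The function names are hypothetical.

```python
def snr_db(signal_dbv, noise_floor_dbv):
    """Difference of two dBV levels, which is a plain dB ratio."""
    return signal_dbv - noise_floor_dbv

def dbv_to_volts(level_dbv):
    """Convert a dBV level back to volts (rms)."""
    return 10 ** (level_dbv / 20.0)

signal_dbv = -28.0    # sensitivity at the first resonance, from the text
reported_snr = 92.0   # SNR at the first resonance, from the text
implied_floor_dbv = signal_dbv - reported_snr
implied_floor_volts = dbv_to_volts(implied_floor_dbv)
print(f"implied noise floor: {implied_floor_dbv:.0f} dBV "
      f"({implied_floor_volts * 1e6:.1f} microvolts rms)")
```

The implied baseline of −120 dBV corresponds to about 1 μV rms. This is arithmetic on the quoted figures rather than an independently measured noise level.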

Machine learning–based mobile biometric authentication

Figure 4A schematically describes the process of AI-based biometric authentication for speaker identification using a mobile sensor module. The integrated PMAS module consisted of a mini PMAS, a signal transmitter, and a machine learning processor. The multichannel PMAS was connected to a microcontroller unit (MCU; Raspberry Pi 3 Model B+) for the wireless transfer of input voice information. The transmitted resonant signals were analyzed by a machine learning algorithm in training and testing procedures (42, 43). The GMM algorithm was used for speaker registration and verification with a voice dataset of 60 speeches each from three men and two women (300 data in total) to compare the identification accuracy of the PMAS module with that of a commercial microphone. Of these, 150 data were used for training and 150 for testing in the speaker decision. A customized biometric Android app was combined with a smartphone-integrated PMAS module to demonstrate bilateral HMI for access control, as shown in fig. S11. Figure 4B presents the resonant voice recognition of the PMAS module by comparing the time domain waveform and frequency components with the original sound signal (woman, voice of 0597168). The voltage signals as a function of time were converted to the frequency domain using the fast Fourier transform (FFT) and short-time FT (STFT) methods (44). In the FFT analysis, channel 2 of the PMAS module exhibited frequency components identical to those of the original sound signal. The STFT spectrogram, built from multiple FFT frames, also showed that the frequency range of the PMAS signal matched that of the original data over time. The STFT algorithm was used in GMM-based machine learning processing for the mobile biometric PMAS application.
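The STFT mentioned above splits a signal into short overlapping frames and transforms each frame separately, so that changes in frequency content over time become visible. The following is a minimal standard-library sketch with a naive DFT, intended only to show the idea. The frame length, hop size, sampling rate, and two-tone test signal are hypothetical choices, and a real pipeline would use an FFT library.

```python
import cmath
import math

def hann(n):
    """Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]

def dft_magnitudes(frame):
    """Magnitudes of the non-negative-frequency DFT bins of one frame."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]

def stft(signal, frame_len=64, hop=32):
    """List of magnitude spectra, one per windowed frame."""
    window = hann(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + i] * window[i] for i in range(frame_len)]
        frames.append(dft_magnitudes(frame))
    return frames

# Hypothetical test tone: 830 Hz for the first half, 1840 Hz for the second half.
fs = 8000
tone = [math.sin(2 * math.pi * (830 if i < 512 else 1840) * i / fs) for i in range(1024)]
spectrogram = stft(tone)
bin_hz = fs / 64
peak_first = max(range(33), key=lambda k: spectrogram[0][k]) * bin_hz
peak_last = max(range(33), key=lambda k: spectrogram[-1][k]) * bin_hz
print(f"first frame peaks near {peak_first:.0f} Hz, last frame near {peak_last:.0f} Hz")
```

A single FFT over the whole recording would show both tones at once. The frame-by-frame view is what allows time-varying voice features to be passed to a classifier.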

Fig. 4 Machine learning–based mobile biometric authentication of PMAS module.

(A) Schematic diagram of machine learning (ML)–based mobile biometric authentication using PMAS module. The multichannel signals of PMAS were wirelessly transferred to algorithm database for access control to a smartphone. (B) Comparison of voice feature between original sound and PMAS module signal. The graphs include voltage signal of time domain, FFT response, and STFT spectrogram. (C) Flowchart of GMM algorithm for speaker training and testing procedures composed of signal averaging, feature extraction, and layer formation. The speaker decision was performed by comparing the input voice information with pretrained dataset. (D) Speaker identification error rate of the PMAS module outperforming a commercial MEMS microphone in condition of 150 data training, 150 data testing, and seven mixtures. (E) Real-time mobile biometric authentication demonstrated by PMAS module and customized smartphone app for access permission and prohibition in condition of five training and one testing words. Photo credit: Hee Seung Wang, Korea Advanced Institute of Science and Technology.

Figure 4C displays a flowchart of the GMM-based training and testing procedures for the multichannel PMAS module. After STFT conversion of the seven channel signals, signal averaging was performed by selecting the two most sensitive channels at each frequency and merging them into one frequency-domain dataset for accurate speaker identification, as shown in fig. S12 (18). The input voice signal of the PMAS module was used for feature extraction and layer formation, and the training procedure recorded the voice information of each individual speaker into the database. Subsequently, a speaker decision was made by comparing the pretrained voice information with the testing input. Figure 4D shows that the speaker identification accuracy of the PMAS module was superior to that of a commercial MEMS microphone, based on a comparison of error rates. To calculate the biometric authentication rate, a mixture of Gaussian distributions, referred to as the GMM algorithm, was used (45). An outstanding reduction in error rate of 56% was achieved for seven mixtures (10% for the PMAS module and 18% for the commercial microphone). A small amount of training data is crucial in biometric mobile authentication because it shortens processing time and simplifies module configuration. As demonstrated in our previous report, we believe that the error rate of the PMAS module can be further decreased by increasing the amount of training data (18). Figure 4E presents real-time biometric authentication using a PMAS module inserted into a smartphone. The multichannel voice signals of the PMAS module were wirelessly transmitted to a machine learning processor running the GMM algorithm, as shown in movie S1 and figs. S13 and S14. An extremely small dataset of five training words and one testing word was entered into the mobile app for real-time speaker registration and access verification. Accurate voice biometrics were successfully demonstrated using the multisignal characteristics of the PMAS module with these small amounts of training and testing data.
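The speaker decision step can be sketched as scoring the input features under each registered speaker's Gaussian mixture and picking the best match. The code below covers only that decision step, under stated assumptions: the mixtures have diagonal covariance, the model parameters are hand-written hypothetical values rather than trained ones, and the rejection threshold for unknown speakers is an illustrative choice that the article does not specify.

```python
import math

def log_gaussian_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_log_likelihood(frames, gmm):
    """Average per-frame log-likelihood of feature frames under one speaker's GMM."""
    total = 0.0
    for x in frames:
        terms = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
        top = max(terms)
        total += top + math.log(sum(math.exp(t - top) for t in terms))  # log-sum-exp
    return total / len(frames)

def identify(frames, speaker_models, reject_below):
    """Return the best-scoring registered speaker, or None if no model fits well enough."""
    scores = {name: gmm_log_likelihood(frames, g) for name, g in speaker_models.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= reject_below else None

# Hypothetical two-mixture models over 2-D features: (weight, mean, variance) per mixture.
models = {
    "speaker_a": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [2.0, 2.0], [1.0, 1.0])],
    "speaker_b": [(0.5, [6.0, 6.0], [1.0, 1.0]), (0.5, [8.0, 8.0], [1.0, 1.0])],
}
known = identify([[0.2, -0.1], [1.9, 2.2]], models, reject_below=-6.0)
unknown = identify([[30.0, -30.0]], models, reject_below=-6.0)
print(known, unknown)
```

In practice the weights, means, and variances would be fitted to each speaker's training features, typically with expectation-maximization, and the features would come from the merged STFT data rather than from two-dimensional toy vectors.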


We have advanced the downscaling of the resonant piezoelectric acoustic sensor and the processing efficiency of the speaker identification algorithm for AI-based full device systems. Although the highly sensitive resonant type f-PAS has an excellent SNR because its readout integrated circuit needs no power supply, sensor downscaling has been restricted by the combined negative effects on sensitivity and resonant frequency. To overcome these downscaling limitations, we mimicked the tiny basilar membrane of the human cochlea by fabricating the PMAS from an ultrathin inorganic piezoelectric film with a total thickness of 10 μm. We achieved a smartphone-integrated PMAS by using a biomimetic structure that tunes the resonant band into the phonetic spectrum and a material design that controls the intrinsic stress of the piezoelectric film for the IDEs. In addition, to cover the full phonetic spectrum, we theoretically formulated the relationship between the broadening effect of the Q factor and the ultrathin piezoelectric membrane. We emphasized the important role of the ultrathin polymer underneath the inorganic piezoelectric thin film in lowering the Q factor; by combining multiple channels with low Q factors, we achieved a PMAS that has much higher sensitivity over the entire voice frequency range than conventional microphones. The integrated multiresonant PMAS, with its excellent omnidirectionality and SNR, could be applied to far-distance sound detection without the conventional beam-forming approach of a commercial microphone array. The highly sensitive PMAS with its low noise floor could also be used for high-quality three-dimensional acoustic recording that captures 360-degree spatial voice.

In addition to the downscaling and theoretical design of a highly sensitive acoustic sensor, we demonstrated the resonant PMAS module by integrating the system's integrated circuit chips into a single product and adapting the algorithm to a small amount of training data. It is commonly known that the accuracy of speaker identification increases with the amount of voice training data, yet a heavy program relying on a large server database is not compatible with a compact commercial product. We advanced the processing efficiency of the speaker identification algorithm (150 training words) and added a function for distinguishing a preregistered person from an unknown one, compared to the previous speaker recognition (2800 training words) (18). These innovations enabled a real-time video demonstration of biometric authentication executed autonomously in the MCU with an extremely small amount of training data (five words) for a commercial mobile application. Our miniaturized resonant PMAS could attract great interest in the field of multi-input signal processing for highly accurate AI-based speaker identification because each channel is equivalent to one conventional microphone; the PMAS therefore collects seven times as much voice information as a commercial microphone. The multichannel signals of the PMAS could also be used for voice separation and noise canceling, in which overlapping sounds are captured by the different channels of a single sensor.
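
To make the idea of separating a preregistered speaker from an unknown one concrete, the sketch below scores feature frames against per-speaker Gaussian mixture models and rejects the input when even the best score is too low. It is an illustration under stated assumptions, not the method used in this work: the diagonal covariances, the average log-likelihood threshold, the two-dimensional features, and all names (`gmm_log_likelihood`, `identify`, "alice", "bob") are hypothetical, and the model parameters are hand-set rather than trained.

```python
import math


def gmm_log_likelihood(x, weights, means, variances):
    """Log-likelihood of one feature vector under a diagonal-covariance GMM."""
    log_terms = []
    for w, mu, var in zip(weights, means, variances):
        log_pdf = 0.0
        for xi, mi, vi in zip(x, mu, var):
            log_pdf += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
        log_terms.append(math.log(w) + log_pdf)
    # Log-sum-exp over mixture components for numerical stability.
    m = max(log_terms)
    return m + math.log(sum(math.exp(t - m) for t in log_terms))


def identify(frames, speaker_models, threshold):
    """Return the best-matching registered speaker, or None if below threshold.

    threshold is a hypothetical cutoff on the average per-frame log-likelihood.
    """
    best_name, best_score = None, -math.inf
    for name, (weights, means, variances) in speaker_models.items():
        score = sum(
            gmm_log_likelihood(f, weights, means, variances) for f in frames
        ) / len(frames)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None


# Hand-set toy models: (weights, means, variances) per registered speaker.
models = {
    "alice": ([0.5, 0.5], [[0.0, 0.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]),
    "bob": ([1.0], [[5.0, 5.0]], [[1.0, 1.0]]),
}
print(identify([[0.1, -0.1], [0.9, 1.1]], models, threshold=-5.0))
print(identify([[20.0, 20.0]], models, threshold=-5.0))
```

A fixed threshold is the simplest rejection rule; in practice it would have to be tuned on held-out data for the chosen features.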

In summary, we developed a highly sensitive and miniaturized PMAS for multiresonant frequency band control using a biomimetic ultrathin piezoelectric membrane. The PMAS channel on the ultrathin 4.8-μm polymer enabled a broadened resonant bandwidth with a low Q factor (~16.6) at 830, 1840, and 2890 Hz, covering the entire voice frequency range from 100 Hz to 4 kHz. The biomimetic piezoelectric membrane of the PMAS achieved not only an outstanding FOMsens of 103 mV/Pa in a 130-mm² area but also a 70% size reduction compared to the previous report (90 mV/Pa in a 450-mm² area). The frequency response of the PMAS was improved by minimizing the residual compressive stress of the 3-μm piezoelectric membrane, thus aligning the lateral dipoles along the IDE direction. The excellent sensitivity and SNR were measured to be −28 dBV and 92 dB, respectively, under monochromatic sinusoidal sound at the first resonance frequency of 830 Hz. Machine learning–based biometric authentication was demonstrated using a smartphone-integrated sensor module composed of the PMAS, an algorithm processor, and a wireless transmitter. The speaker identification error rate of the PMAS module was decreased by up to 56% compared to the commercial MEMS microphone under a small-data condition of 150 training data and 150 testing data. Last, real-time voice biometrics for mobile applications was successfully performed with only five training words and one testing word by a customized Android app. We are currently investigating signal processing to flatten the multichannel frequency responses and achieve uniform, undistorted sound output.
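
Two of the figures quoted in this summary can be checked with simple arithmetic. The sketch below assumes the standard definition Q = f0/Δf for the half-power bandwidth and, as a further assumption, applies the quoted Q of about 16.6 to each of the three resonances; the function names are hypothetical.

```python
def bandwidth_hz(f0_hz, q_factor):
    """Half-power bandwidth from the definition Q = f0 / bandwidth."""
    return f0_hz / q_factor


def area_reduction(new_area_mm2, old_area_mm2):
    """Fractional reduction in sensor area relative to the older device."""
    return 1.0 - new_area_mm2 / old_area_mm2


Q = 16.6  # quoted value; assumed here to hold at every resonance
for f0 in (830.0, 1840.0, 2890.0):
    print(f"{f0:.0f} Hz resonance: bandwidth of about {bandwidth_hz(f0, Q):.0f} Hz")

print(f"area reduction: {area_reduction(130.0, 450.0):.0%}")
```

With these inputs the first resonance has a bandwidth of about 50 Hz, and the area ratio 130/450 corresponds to a reduction of roughly 71%, in line with the 70% stated above.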


Fabrication of the miniaturized PMAS

The miniaturized PMAS was fabricated using protocols similar to those in our previous studies. Briefly, a 0.4 M sol-gel PZT solution (QUINTESS Co. Ltd.) was spin-coated on a sapphire substrate (Hi-Solar Co.), followed by pyrolysis and crystallization using rapid thermal annealing. The deposition procedure was repeated to obtain a PZT membrane of the desired thickness. Subsequently, biomimetic ultrathin PET (thickness of 4.8 μm), adhered to a handling substrate through a polydimethylsiloxane layer, was coated with ultraviolet-sensitive polyurethane (Norland Optical Adhesive, no. 89) for attachment to the top surface of the PZT membrane. The sapphire substrate was irradiated with a XeCl excimer laser to transfer the piezoelectric membrane onto the flexible substrate. After the ILLO, a multichannel IDE was patterned with Cr/Au (thicknesses of 15 and 150 nm, respectively) using conventional radio frequency sputtering and photolithography. The van der Waals force between the biomimetic ultrathin membrane and the handling substrate was removed by dissolving the adhesive polydimethylsiloxane layer. The channels of the mini PMAS were electrically interconnected to a PCB using a conductive paste. The rigid PCB serves as a mechanical support for mounting the ultrathin flexible membrane and for processing the electrical signals. Last, the piezoelectric dipoles of the PZT membrane were aligned along the electrode direction using a high-voltage poling process.

Measurement of mechanical and electrical signals

The mechanical displacements of the PMAS were measured using an LDV with a He-Ne laser during a frequency sweep of a mouth simulator (type 4227-A, Bruel & Kjaer) from 100 Hz to 4 kHz. Electrical signals were characterized with a National Instruments Sound Module under white noise and under monochromatic sinusoidal sound waves induced by a function generator. A commercial reference microphone (G.R.A.S. 46BE) was compared with the miniaturized PMAS under the same conditions at 94-dB SPL.
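
For readers less familiar with acoustic units, the 94-dB SPL test level corresponds to a sound pressure of about 1 Pa, which is why sensitivities are commonly quoted per pascal. The short sketch below shows the conversion, assuming the standard 20-μPa reference pressure in air; the function names are hypothetical.

```python
import math

P_REF_PA = 20e-6  # standard reference sound pressure in air (20 micropascals)


def spl_to_pascal(spl_db):
    """Convert a sound pressure level in dB SPL to RMS pressure in pascals."""
    return P_REF_PA * 10.0 ** (spl_db / 20.0)


def pascal_to_spl(pressure_pa):
    """Convert an RMS pressure in pascals to dB SPL."""
    return 20.0 * math.log10(pressure_pa / P_REF_PA)


print(f"94 dB SPL is about {spl_to_pascal(94.0):.3f} Pa")
print(f"1 Pa is about {pascal_to_spl(1.0):.2f} dB SPL")
```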


The crystallographic orientation and internal residual stress of the PZT membrane were investigated using a thin-film x-ray diffractometer (Ultima IV, RIGAKU) and a multipurpose thin-film x-ray diffractometer (D/MAX-2500, RIGAKU). The piezoelectric polarization was characterized by P-E hysteresis loops using a ferroelectric measurement system (Precision Premier II, Radiant Technologies). The morphological images were obtained with an optical microscope (VHX-1000E, Keyence) and a focused ion beam scanning electron microscope (Helios Nanolab 450 F1, FEI Company).

Resonance frequency modeling and simulation

FEM simulation (COMSOL Multiphysics 5.2 software) was used to calculate the resonance frequency, spectrum bandwidth, and piezoelectric potential. The PMAS structure was modeled to resemble the actual curvilinear shape (w1 = 3 mm, w2 = 10 mm, and l = 20 mm). The resonant frequency defined in Eq. 1 was simulated to characterize the resonance distribution and vibrational displacement in the biomimetic PMAS membrane. The resonant bandwidths of a PZT thin film alone and of PZT on the ultrathin polymer were compared in the frequency response of the PMAS by inputting the loss factor of each membrane. The residual stress generated during formation of the flexible piezoelectric film was assigned to the PZT membrane to calculate the piezoelectric potential of the PMAS.
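
The effect of the loss factor on resonant bandwidth can be illustrated with a lumped single-mode model. This is a stand-in for intuition only, not the FEM model described here: it assumes a single resonance with structural (hysteretic) damping, for which the normalized response is 1/sqrt((1 − r²)² + η²) with r = f/f0, and the two loss factor values are hypothetical placeholders rather than material data from this study.

```python
import math


def magnitude(f_hz, f0_hz, loss_factor):
    """Normalized response of a single mode with structural loss factor eta."""
    r = f_hz / f0_hz
    return 1.0 / math.sqrt((1.0 - r * r) ** 2 + loss_factor ** 2)


def half_power_bandwidth_hz(f0_hz, loss_factor):
    """Width between the two frequencies where the response is peak/sqrt(2).

    For this model the half-power points satisfy r**2 = 1 +/- eta, so the
    bandwidth is f0 * (sqrt(1 + eta) - sqrt(1 - eta)), close to eta * f0.
    """
    if not 0.0 < loss_factor < 1.0:
        raise ValueError("loss factor must be between 0 and 1")
    return f0_hz * (math.sqrt(1.0 + loss_factor) - math.sqrt(1.0 - loss_factor))


F0_HZ = 830.0  # first resonance quoted in the text
for label, eta in (("lower-loss membrane", 0.01), ("higher-loss membrane", 0.06)):
    # eta values are hypothetical and chosen only to show the trend
    bw = half_power_bandwidth_hz(F0_HZ, eta)
    print(f"{label}: eta = {eta}, bandwidth of about {bw:.1f} Hz")
```

In this model a larger loss factor lowers the peak (1/η at resonance) and widens the band roughly in proportion to η, which is the qualitative trade-off between peak sensitivity and bandwidth that the comparison of the two membranes explores.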


Supplementary material for this article is available at

This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial license, which permits use, distribution, and reproduction in any medium, so long as the resultant use is not for commercial advantage and provided the original work is properly cited.


Acknowledgments: Funding: This work was supported by Wearable Platform Materials Technology Center (WMC) (NRF-2016R1A5A1009926) and Convergent Technology R&D Program for Human Augmentation (NRF-2020M3C1B8081519) through the National Research Foundation of Korea (NRF) funded by the Ministry of Science and ICT. We would like to thank Fronics Co. LTD. for support. Author contributions: H.S.W. and K.J.L. designed and carried out the experiments and data analysis. S.K.H., J.H.H., and Y.H.J. performed signal acquisition. H.K.J. and G.K. designed the machine learning algorithm. T.H.I. and C.K.J. contributed FEM simulations. B.-Y.L. performed characterization of PZT membrane. H.S.W., C.D.Y., and K.J.L. contributed to writing the manuscript. Competing interests: The authors declare that they have no competing interests. Data and materials availability: All data needed to evaluate the conclusions in the paper are present in the paper and/or the Supplementary Materials. Additional data related to this paper may be requested from the authors.
