A team of researchers has developed an eavesdropping attack for Android devices that can, to varying degrees, recognize the caller's gender and identity, and even discern private speech.
Named EarSpy, the side-channel attack aims to explore new possibilities for eavesdropping by capturing motion sensor data readings caused by speaker reverberations in mobile devices.
EarSpy is an academic effort by researchers from five US universities (Texas A&M University, New Jersey Institute of Technology, Temple University, University of Dayton, and Rutgers University).
Although this type of attack has been explored with smartphone loudspeakers, ear speakers were considered too weak to generate enough vibration to make such a side-channel attack practical.
However, modern smartphones use more powerful stereo speakers than models from a few years ago, producing much better sound quality and stronger vibrations.
Similarly, modern devices use more sensitive motion sensors and gyroscopes that can register even the tiniest speaker resonances.
The researchers illustrate this progress with spectrograms in which the earpiece of a 2016 OnePlus 3T barely registers, while the stereo speakers of a 2019 OnePlus 7T produce significantly more data.
Experiment and results
The researchers used a OnePlus 7T and a OnePlus 9 in their experiments, along with various sets of pre-recorded audio played only through the ear speakers of the two devices.
The team also used the third-party application “Physics Toolbox Sensor Suite” to capture accelerometer data during a simulated call, then passed it to MATLAB for analysis and to extract features from the audio stream.
A machine learning (ML) algorithm was trained on readily available datasets to recognize speech content, caller identity, and gender.
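To make the feature-extraction step more concrete, here is a minimal Python sketch of the kind of time-domain and frequency-domain features that can be computed from one axis of an accelerometer trace. It is not the researchers' MATLAB pipeline: the function names, the chosen features, and the synthetic 120 Hz test signal are all assumptions made for illustration.

```python
import math
import statistics


def time_domain_features(samples):
    """Basic time-domain statistics for one accelerometer axis (illustrative)."""
    mean = statistics.fmean(samples)
    # Remove the constant offset (e.g. gravity) before measuring vibration.
    centered = [s - mean for s in samples]
    rms = math.sqrt(sum(c * c for c in centered) / len(centered))
    # Count sign changes between consecutive mean-removed samples.
    zero_crossings = sum(
        1 for a, b in zip(centered, centered[1:]) if (a < 0) != (b < 0)
    )
    return {"mean": mean, "rms": rms, "zero_crossings": zero_crossings}


def dominant_frequency(samples, sample_rate_hz):
    """Frequency in Hz of the strongest non-DC bin of a naive DFT."""
    n = len(samples)
    mean = statistics.fmean(samples)
    best_bin, best_power = 0, -1.0
    for k in range(1, n // 2 + 1):
        re = sum((s - mean) * math.cos(2 * math.pi * k * i / n)
                 for i, s in enumerate(samples))
        im = sum((s - mean) * math.sin(2 * math.pi * k * i / n)
                 for i, s in enumerate(samples))
        power = re * re + im * im
        if power > best_power:
            best_bin, best_power = k, power
    return best_bin * sample_rate_hz / n


if __name__ == "__main__":
    # Hypothetical trace: gravity offset plus a faint 120 Hz vibration,
    # sampled at 500 Hz for half a second.
    rate = 500
    trace = [9.81 + 0.02 * math.sin(2 * math.pi * 120 * i / rate)
             for i in range(250)]
    print(time_domain_features(trace))
    print(dominant_frequency(trace, rate))
```

In a real pipeline, feature vectors like these would be computed over short windows of the recording and passed to a classifier. The naive DFT is used here only to keep the example free of dependencies; an FFT library would be the usual choice.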
The test data varied by dataset and device, but produced overall promising results for eavesdropping via the ear speaker.
On the OnePlus 7T, caller gender identification ranged between 77.7% and 98.7%, caller ID classification between 63.0% and 91.2%, and speech recognition between 51.8% and 56.4%.
“We evaluate time and frequency domain features with classical ML algorithms, which show the highest accuracy of 56.42%,” the researchers explain in their paper.
On the OnePlus 9, gender identification reached 88.7%, caller identification dropped to an average of 73.6%, and speech recognition ranged between 33.3% and 41.6%.
By comparison, with the loudspeaker and the app that the researchers developed while experimenting with a similar attack in 2020, gender and caller identification accuracy reached 99%, while speech recognition reached 80%.
Limitations and solutions
One factor that could reduce the effectiveness of the EarSpy attack is the volume users choose for their ear speakers. A lower volume could prevent eavesdropping through this side-channel attack, and it is also more comfortable for the ear.
The arrangement of the device's hardware components and the tightness of the assembly also affect how speaker reverberations spread through the device.
Finally, user movements or vibrations from the environment decrease the accuracy of the derived voice data.
Android 13 introduced a restriction on collecting sensor data without permission at sampling rates above 200 Hz. While this prevents speech recognition at the default sampling rate (400 Hz – 500 Hz), it only lowers accuracy by about 10% if the attack is performed at 200 Hz.
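The sampling rate matters because a sensor sampled at a given rate can only represent frequencies up to half that rate unambiguously (the Nyquist limit); higher frequencies fold back to lower ones. The short Python sketch below illustrates this for the rates mentioned above. It assumes ideal sampling with no anti-aliasing filter, and the 150 Hz tone and function names are hypothetical, chosen only for illustration.

```python
def nyquist_hz(sample_rate_hz):
    """Highest frequency representable without aliasing at this rate."""
    return sample_rate_hz / 2


def apparent_frequency_hz(tone_hz, sample_rate_hz):
    """Frequency at which a pure tone appears after ideal sampling."""
    # Fold the tone into the range [0, sample_rate / 2].
    folded = tone_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)


if __name__ == "__main__":
    for rate in (200, 500):
        print(rate, nyquist_hz(rate), apparent_frequency_hz(150, rate))
```

Under these assumptions, a 150 Hz vibration is captured at its true frequency when sampled at 500 Hz, but appears as a 50 Hz component when sampled at 200 Hz. Some information is distorted rather than lost outright, which is consistent with the attack losing accuracy at the lower rate without failing completely.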
The researchers suggest that phone makers ensure sound pressure remains stable during calls and place motion sensors where internally generated vibrations do not affect them, or at least have the minimum possible impact.