Accurate, Fast and Low Computation Cost of Voice Biometrics Performance using Model of CNN Depthwise Separable Convolution and Method of Hybrid DWT-MFCC for Security System

Haris Isyanto
Wahyu Ibrahim
Riza Samsinar

Abstract

Identity theft is a pervasive criminal risk in the digital realm, particularly in online transactions, and it demands innovative security solutions. Voice biometrics has been developed to protect a user's identity. This study develops voice biometrics using deep learning, specifically a CNN with Depthwise Separable Convolution (DSC) and a residual CNN. The two systems were compared on accuracy, performance evaluation, computational load, and training time, with the goal of verifying a user's voice effectively, rapidly, and accurately for banking transaction security. In the first test, the residual CNN reached a high validation accuracy of 98.6345%. However, its large number of parameters resulted in a training time of 7.37 seconds and increased the computational workload. In the second test, the CNN DSC reached a validation accuracy of 98.3542%. The CNN DSC decreased the parameter count, resulting in a reduction of 5.12 seconds in training time. The test results indicate that the CNN DSC offers the better trade-off: it trains faster and consumes less memory, at a validation accuracy about 0.28 percentage points below that of the residual CNN. This addresses the problem of high computational cost and enhances user identity security in banking transactions.
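The parameter saving that the abstract attributes to depthwise separable convolution can be illustrated with a short calculation. The sketch below is not the authors' model: the kernel size and channel counts are hypothetical values chosen only for illustration, and the helper names `conv2d_params` and `depthwise_separable_params` are introduced here. It counts the learnable parameters of one standard convolution layer and of its depthwise separable counterpart (a per-channel depthwise convolution followed by a 1x1 pointwise convolution).

```python
def conv2d_params(k, c_in, c_out, bias=True):
    """Learnable parameters of a standard k x k convolution layer."""
    return k * k * c_in * c_out + (c_out if bias else 0)


def depthwise_separable_params(k, c_in, c_out, bias=True):
    """Learnable parameters of a depthwise separable convolution layer."""
    # Depthwise step: one k x k filter per input channel.
    depthwise = k * k * c_in + (c_in if bias else 0)
    # Pointwise step: a 1 x 1 convolution that mixes channels.
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise


# Hypothetical layer sizes (assumption, not taken from the paper).
k, c_in, c_out = 3, 64, 128

standard = conv2d_params(k, c_in, c_out)
separable = depthwise_separable_params(k, c_in, c_out)

print(f"standard conv params:  {standard}")
print(f"separable conv params: {separable}")
print(f"reduction factor:      {standard / separable:.2f}")
```

For these assumed sizes the separable layer needs roughly one eighth of the parameters of the standard layer. The actual saving in the reported models depends on their real layer configuration, which the abstract does not give.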

Section
Telecommunication
