Accurate, Fast and Low Computation Cost of Voice Biometrics Performance using Model of CNN Depthwise Separable Convolution and Method of Hybrid DWT-MFCC for Security System
Abstract
Identity theft is a pervasive criminal risk in the digital realm, particularly in online transactions, and demands effective security solutions. Voice biometrics have been developed to protect a person's identity. This study develops voice biometrics using deep learning, specifically a CNN with Depthwise Separable Convolution (DSC) and a residual CNN. The two systems were compared on accuracy, performance evaluation, computational load, and training time, with the aim of verifying a user's voice quickly and accurately for banking transaction security. In the first test, the residual CNN reached a high validation accuracy of 98.6345%. However, its large number of parameters resulted in a training time of 7.37 seconds and increased the computational workload. In the second test, the CNN DSC reached a validation accuracy of 98.3542%. The CNN DSC decreased the parameter count, resulting in a reduction of 5.12 seconds in training time. The test results indicate that the CNN DSC offers the better trade-off: its validation accuracy is within about 0.28 percentage points of the residual CNN, while it trains faster and consumes less memory. This addresses the problem of high computational cost and strengthens user identity security in banking transactions.
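The parameter reduction attributed to depthwise separable convolution can be illustrated with a short calculation. The sketch below counts the trainable parameters of one standard 2-D convolution layer and of its depthwise separable equivalent (a per-channel depthwise filter followed by a 1x1 pointwise convolution). The function names and the example layer size (3x3 kernel, 64 input channels, 128 output channels) are assumptions chosen for illustration; they are not taken from the networks evaluated in this study.

```python
def standard_conv_params(kernel_size, in_channels, out_channels, bias=True):
    """Parameters of a standard 2-D convolution with a square kernel."""
    weights = kernel_size * kernel_size * in_channels * out_channels
    return weights + (out_channels if bias else 0)


def depthwise_separable_conv_params(kernel_size, in_channels, out_channels, bias=True):
    """Parameters of a depthwise convolution followed by a 1x1 pointwise convolution."""
    # Depthwise step: one kernel_size x kernel_size filter per input channel.
    depthwise = kernel_size * kernel_size * in_channels + (in_channels if bias else 0)
    # Pointwise step: a 1x1 convolution that mixes channels.
    pointwise = in_channels * out_channels + (out_channels if bias else 0)
    return depthwise + pointwise


if __name__ == "__main__":
    # Hypothetical layer size, for illustration only.
    k, c_in, c_out = 3, 64, 128
    std = standard_conv_params(k, c_in, c_out)
    dsc = depthwise_separable_conv_params(k, c_in, c_out)
    print(f"standard: {std}, depthwise separable: {dsc}, ratio: {dsc / std:.3f}")
```

Ignoring biases, the ratio of the two counts is 1/out_channels + 1/kernel_size^2, so the saving grows with the number of output channels and with the kernel size. Fewer parameters generally mean less memory and less arithmetic per training step, which is consistent with the shorter training time reported for the CNN DSC.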
Article Details
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Copyright on any article is retained by the author(s).
- Authors grant the journal the right of first publication, with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work’s authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal’s published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.