Accurate, Fast and Low Computation Cost of Voice Biometrics Performance using Model of CNN Depthwise Separable Convolution and Method of Hybrid DWT-MFCC for Security System


Haris Isyanto


Identity theft is a substantial criminal threat in the digital world, especially in online transactions. Voice biometrics was developed to address this problem by securing user identity. This research examines voice biometrics systems built on deep learning models, comparing the CNN Depthwise Separable Convolution (DSC) model with the Residual CNN model in order to improve accuracy and performance. In the first test, the Residual CNN achieved a high validation accuracy of 98.6345%. However, its large number of training parameters led to a longer training time of 7.37 seconds and a response time of 2.35 seconds, and therefore a larger computing load. In the second test, the CNN DSC also achieved a high validation accuracy of 98.3542%. The CNN DSC reduced the number of training parameters, thereby shortening the training process by 5.12 seconds, and its fastest response time was 1.54 seconds. These results show the performance advantages of the CNN DSC: it reduces the computing load, improves the user identity security system in banking transactions accurately and quickly, and addresses the problem of high computing cost.
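The parameter reduction attributed to the CNN DSC above follows from how a depthwise separable convolution factors a standard convolution into a per-channel (depthwise) filter and a 1x1 (pointwise) mixing step. The sketch below counts the weights of a single layer both ways. The kernel size and channel counts are illustrative assumptions and are not taken from the article, and bias terms are ignored for simplicity.

```python
def standard_conv_params(kernel_size: int, in_channels: int, out_channels: int) -> int:
    """Weights in a standard 2D convolution layer (bias ignored)."""
    return kernel_size * kernel_size * in_channels * out_channels


def separable_conv_params(kernel_size: int, in_channels: int, out_channels: int) -> int:
    """Weights in a depthwise separable 2D convolution layer (bias ignored)."""
    # Depthwise step: one kernel_size x kernel_size filter per input channel.
    depthwise = kernel_size * kernel_size * in_channels
    # Pointwise step: a 1x1 convolution that mixes channels.
    pointwise = in_channels * out_channels
    return depthwise + pointwise


if __name__ == "__main__":
    # Hypothetical layer shape, chosen only for illustration.
    k, c_in, c_out = 3, 64, 128
    standard = standard_conv_params(k, c_in, c_out)
    separable = separable_conv_params(k, c_in, c_out)
    print(f"standard conv weights:  {standard}")
    print(f"separable conv weights: {separable}")
    print(f"ratio: {separable / standard:.4f}")
```

For this assumed layer shape the separable version needs roughly 12% of the weights of the standard version, since the ratio equals 1/out_channels + 1/kernel_size². The actual savings in the article's models depend on their layer configurations, which the abstract does not state.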


