Digital Signal Processing for The Development of Deep Learning-Based Speech Recognition Technology

Dita Novita Sari; Danang Kusnadi; Ricco Herdiyan Saputra; Mujeeb Ullah Khan

doi:10.24042/ijecs.v4i1.22918

Digital Signal Processing for The Development of Deep Learning-Based Speech Recognition Technology

Dita Novita Sari , Danang Kusnadi , Ricco Herdiyan Saputra , Mujeeb Ullah Khan

Abstract

This research discusses digital signal processing in the context of developing deep learning-based speech recognition technology. Given the increasing demand for accurate and efficient speech recognition systems, digital signal processing techniques are essential. The research method used is an experimental method with a quantitative approach. This research method consists of several stages: introduction, research design, data collection, data preprocessing, Deep Learning Model Development, performance training and evaluation, experiments and testing, and data analysis. These findings are expected to contribute to developing more sophisticated and applicable speech recognition systems in various fields. For example, in virtual assistants such as Siri and Google Assistant, improved speech recognition accuracy will allow for more natural interactions and faster responses, improving the user experience. This technology can be used in security systems for safer and more reliable voice authentication, replacing or supplementing passwords and fingerprints. Additionally, in accessibility technology, more accurate voice recognition will be particularly beneficial for individuals with visual impairments or mobility, allowing them to control devices and access information with just voice commands. Other benefits include improvements in automated phone apps, automatic transcription for meetings or conferences, and the development of smart home devices that can be fully voice-operated.

Keywords

Digital Signal Processing; Deep Learning; Speech Recognition; Technology.

Full Text:

PDF

References

S. Naziya S. and R. R. Deshmukh, “Speech recognition system – a review,” IOSR J. Comput. Eng., vol. 18, no. 04, pp. 01–09, 2016, doi: 10.9790/0661-1804020109.

N. Xue, “Analysis model of spoken english evaluation algorithm based on intelligent algorithm of internet of things,” Comput. Intell. Neurosci., vol. 2022, no. 01, pp. 1–8, 2022, doi: 10.1155/2022/8469945.

A. Goeritno and I. Setyawibawa, “An electronic device reviewed by diagnosing on the modules embodiment,” Int. J. Electron. Commun. Syst., vol. 1, no. 2, pp. 41–55, 2021, doi: 10.24042/ijecs.v1i2.10383.

K. Marzuki, M. I. Kholid, I. P. Hariyadi, and L. Z. A. Mardedi, “Automation of open vswitch-based virtual network configuration using ansible on proxmox virtual environment,” Int. J. Electron. Commun. Syst., vol. 3, no. 1, pp. 11-20, 2023, doi: 10.24042/ijecs.v3i1.16524.

C. A. Cholik, “Perkembangan teknologi informasi komunikasi/ICT dalam berbagai bidang,” J. Fak. Tek. Kuningan, vol. 2, no. 2, pp. 39–46, 2021.

A. B. Abdusalomov, F. Safarov, M. Rakhimov, B. Turaev, and T. K. Whangbo, “Improved feature parameter extraction from speech signals using machine learning algorithm,” Sensors, vol. 22, no. 21, pp. 1-21, 2022, doi: 10.3390/s22218122.

R. Iskandar and M. E. K. Kesuma, “Designing a real-time-based optical character recognition to detect id cards,” Int. J. Electron. Commun. Syst., vol. 2, no. 1, pp. 23–29, 2022, doi: 10.24042/ijecs.v2i1.13108.

M. B. Akçay and K. Oğuz, “Speech emotion recognition: Emotional models, databases, features, preprocessing methods, supporting modalities, and classifiers,” Speech Communication, vol. 116, no. 1, pp. 56–76, doi: 10.1016/j.specom.2019.12.001.

A. Gulati et al., “Conformer: Convolution-augmented transformer for speech recognition,” in Interspeech, 2020, pp. 5036–5040. doi: 10.21437/Interspeech.2020-3015.

S. Kriman et al., “Quartznet: Deep automatic speech recognition with 1d time-channel separable convolutions,” in {ICASSP} 2020 - 2020 {IEEE} International Conference on Acoustics, Speech and Signal Processing ({ICASSP}), IEEE, 2020, pp. 6124–6128. doi: 10.1109/ICASSP40776.2020.9053889.

A. Saepulrohman and A. Ismangil, “Data integrity and security of digital signatures on electronic systems using the digital signature algorithm (DSA),” Int. J. Electron. Commun. Syst, vol. 1, no. 1, pp. 11–15, 2021.

J. A. Smith, K. Holt, R. Dockry, S. Sen, K. Sheppard, P. Turner, P. Czyzyk, and K. Mcguinness, “Performance of a digital signal processing algorithm for the accurate quantification of cough frequency,” European Respipatory Journal Research Letter, vol. 58, no. 2, pp. 1-4, 2021, doi: 10.1183/13993003.04271-2020.

Y. Gao, J. Lin, J. Xie, and Z. Ning, “A real-time defect detection method for digital signal processing of industrial inspection applications,” IEEE Xplore, vol. 17, no. 5, pp. 3450–3459, doi: 10.1109/TII.2020.3013277.

M. Zhang, G. Wang, and Q. Hong, “Using mel-frequency cepstral coefficients in missing data technique,” EURASIP J. Adv. Signal Process., vol. 2004, no. 3, pp. 340-346, 2004, doi: 10.1155/s1110865704309030.

Y. H. Goh, Y.-S. Ko, Y. K. Lee, and Y.-J. Goh, “Fast wavelet-based pitch period detector for speech signals,” Atl. Press Int. Conf. Comput. Eng. Inf. Syst., vol. 52, no. 1, pp. 494–497, 2016, doi: 10.2991/ceis-16.2016.101.

X. Dai, X. Dai, and B. Yu, “An improved signal subspace algorithm for speech enhancement,” IFIP Adv. Inf. Commun. Technol., vol. 445, no. 2004, pp. 104–114, 2014, doi: 10.1007/978-3-662-45526-5_10.

J. Yu and Y. Wei, “Digital signal processing for high‐speed {THz} communications,” Chinese Journal of Electronics, vol. 31, no. 3, pp. 534–546, 2022, doi: 10.1049/cje.2021.00.258.

T. Matsuura, K. Maeda, T. Sasaki, and M. Koashi, “Finite-size security of continuous-variable quantum key distribution with digital signal processing,” Nature Communications, vol. 12, no. 01, pp. 1-13, 2021, doi: 10.1038/s41467-020-19916-1.

Z. Cui, K. Henrickson, R. Ke, and Y. Wang, “Traffic graph convolutional recurrent neural network: A deep learning framework for network-scale traffic learning and forecasting,” IEEE Transactions on Intelligent Transportation Systems, vol. 21, no. 11, pp. 4883–4894, 2018, doi: 10.1109/TITS.2019.2950416.

A. A. Ardakani, A. R. Kanafi, U. R. Acharya, N. Khadem, and A. Mohammadi, “Application of deep learning technique to manage {COVID}-19 in routine clinical practice using {CT} images: Results of 10 convolutional neural networks,” Computers in Biology and Medicine, vol. 121, no. 01, pp. 1-9, 2020, doi: 10.1016/j.compbiomed.2020.103795.

S. Ilahiyah and A. Nilogiri, “Implementasi deep learning pada identifikasi jenis tumbuhan berdasarkan citra daun menggunakan convolutional neural network,” JUSTINDO (Jurnal Sist. Dan Teknol. Inf. Indones., vol. 3, no. 2, pp. 49–56, 2018.

J. Wang, Y. Chen, S. Hao, X. Peng, and L. Hu, “Deep learning for sensor-based activity recognition: A survey,” Pattern Recognit. Lett., vol. 119, no. 01, pp. 3–11, 2017, doi: 10.1016/j.patrec.2018.02.010.

T. Rahman et al., “Transfer learning with deep convolutional neural network ({CNN}) for pneumonia detection using chest x-ray,” Appl. Sci., vol. 10, no. 9, pp. 1-17, 2020, doi: 10.3390/app10093233.

S. Sharma, K. Guleria, S. Tiwari, and S. Kumar, “A deep learning based convolutional neural network model with VGG16 feature extractor for the detection of Alzheimer Disease using MRI scans,” Meas. Sensors, vol. 24, no.1, pp. 1-8, 2022.

D. Issa, M. Fatih Demirci, and A. Yazici, “Speech emotion recognition with deep convolutional neural networks,” Biomedical Signal Processing and Control, vol. 59, no. 1, pp. 1-14, 2020, doi: 10.1016/j.bspc.2020.101894.

M. Mustaqeem, M. Sajjad, and S. Kwon, “Clustering-based speech emotion recognition by incorporating learned features and deep {BiLSTM},” IEEE Access, vol. 8, no.1, pp. 79861–79875, 2020, doi: 10.1109/ACCESS.2020.2990405.

L. A. Neto, J. Maes, P. Larsson-Edefors, J. Nakagawa, K. Onohara, and S. J. Trowbridge, “Considerations on the use of digital signal processing in future optical access networks,” IEEE Xplore, vol. 38, no. 3, pp. 598–607, 2019, doi: 10.1109/JLT.2019.2946687.

I. P. A. Yuda, I. G. N. A. C. Putra, I. K. G. Suhartana, I. K. Ari, M. A. R. Mogi, and L. A. A. R. Putri, “Perancangan sistem keamanan lingkungan pengenal suara kulkul dengan menggunakan metode deep learning,” J. Elektron. Ilmu Komput. Udayana p-ISSN, vol. 11, no. 2, pp. 429-438, 2022.

F. Paath, L. A. Latumakulita, C. Montolalu, and Y. Langi. (2021). Pengenalan suara manusia menggunakan convolutional neural network studi kasus suara dosen program studi sistem informasi universitas sam ratulangi,” Presented Proceeding KONIK (Konferensi Nas. Ilmu Komputer). [Online]. Available: https://prosiding.konik.id/index.php/konik/article/view/53

I. F. Alam, M. I. Sarita, and A. M. Sajiah, “Implementasi deep learning dengan metode convolutional neural network untuk identifikasi objek secara real time berbasis android,” SemanTIK, vol. 5, no. 2, pp. 237–244, 2019.

L. S. Ramba, “Design of a voice controlled home automation system using deep learning convolutional neural network (DL-CNN),” Telekontran J. Ilm. Telekomun. Kendali dan Elektron. Terap., vol. 8, no. 1, pp. 57–73, 2020, doi: 10.34010/telekontran.v8i1.3078.

Y. Yang and Y. Yue, “English speech sound improvement system based on deep learning from signal processing to semantic recognition,” Int. J. Speech Technol., vol. 23, no. 03, pp. 505–515, 2020, doi: 10.1007/s10772-020-09733-8.

Y. W. Chen et al., “CITISEN: A deep learning-based speech signal-processing mobile application,” IEEE Access, vol. 10, no. 01, pp. 46082–46099, 2022, doi: 10.1109/ACCESS.2022.3153469.

Sugiyono, Metode Penelitian Pendidikan Pendekatan Kuantitatif, Kualitatif, dan R&D. Bandung: Alfabeta, 2016.

P. Sivakumar, C. S. Boopathi, M. G. Sumithra, M. Singh, J. Malhotra, and A. Grover, “Ultra-high capacity long-haul {PDM}-16-{QAM}-based {WDM}-{FSO} transmission system using coherent detection and digital signal processing,” Optical and Quantum Electronics, vol. 52, no. 11, pp. 2-18, 2020, doi: 10.1007/s11082-020-02616-x.

M. Sorkhi, M. R. Jahed-Motlagh, B. Minaei-Bidgoli, and M. R. Daliri, “Hybrid fuzzy deep neural network toward temporal-spatial-frequency features learning of motor imagery signals,” Sci. Rep., vol. 12, no. 1, pp. 1–15, 2022, doi: 10.1038/s41598-022-26882-9.

R. F. Caldeira, W. E. Santiago, and B. Teruel, “Identification of cotton leaf lesions using deep learning techniques,” Sensors, vol. 21, no. 9, pp. 1-14, 2021.

Y. Cao, A. Mohammadzadeh, J. Tavoosi, S. Mobayen, R. Safdar, and A. Fekih, “A new predictive energy management system: Deep learned type-2 fuzzy system based on singular value decommission,” Energy Reports, vol. 8, no.1, pp. 722–734, 2022.

J. Oruh, S. Viriri, and A. Adegun, “Long short-term memory recurrent neural network for automatic speech recognition,” IEEE Access, vol. 10, no. 1, pp. 30069–30079, 2022, doi: 10.1109/ACCESS.2022.3159339.

E. E. B. Adam, “Deep learning based nlp techniques in text to speech synthesis for communication recognition,” J. Soft Comput. Paradig., vol. 2, no. 4, pp. 209–215, 2020, doi: 10.36548/jscp.2020.4.002.

M. Kolbæk, D. Yu, Z.-H. Tan, and J. Jensen, “Multitalker speech separation with utterance-level permutation invariant training of deep recurrent neural networks,” IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 25, no. 10, pp. 1901–1913, 2017, doi: 10.1109/TASLP.2017.2726762.

H. Purwins, B. Li, T. Virtanen, J. Schlüter, S.-Y. Chang, and T. Sainath, “Deep Learning for audio signal processing,” IEEE J. Sel. Top. Signal Process., vol. 13, no. 2, pp. 206–219, 2019, doi: 10.1109/JSTSP.2019.2908700.

F. Ye and J. Yang, “A deep neural network model for speaker identification,” Appl. Sci., vol. 11, no. 3603, pp. 1007–1010, 2021, doi: 10.21437/eurospeech.1999-246.

D. Yu, M. Kolbæk, Z.-H. Tan, and J. Jensen, “Permutation invariant training of deep models for speaker-independent multi-talker speech separation,” in 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017, pp. 241–245. doi: 10.1109/ICASSP.2017.7952154.

D. Michelsanti et al., “An Overview of Deep-Learning-Based Audio-Visual Speech Enhancement and Separation,” IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 29, no.1, pp. 1368–1396, 2021, doi: 10.1109/TASLP.2021.3066303.

DOI: http://dx.doi.org/10.24042/ijecs.v4i1.22918

Refbacks

There are currently no refbacks.

International Journal of Electronics and Communications System (IJECS) is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Username
Password
Remember me