A1 Journal article (refereed)
Investigating Novice Developers’ Code Commenting Trends Using Machine Learning Techniques (2023)
Niazi, T., Das, T., Ahmed, G., Waqas, S. M., Khan, S., Khan, S., Abdelatif, A. A., & Wasi, S. (2023). Investigating Novice Developers’ Code Commenting Trends Using Machine Learning Techniques. Algorithms, 16(1), Article 53. https://doi.org/10.3390/a16010053
Publication details
All authors or editors: Niazi, Tahira; Das, Teerath; Ahmed, Ghufran; Waqas, Syed Muhammad; Khan, Sumra; Khan, Suleman; Abdelatif, Ahmed Abdelaziz; Wasi, Shaukat
Journal or series: Algorithms
eISSN: 1999-4893
Publication year: 2023
Publication date: 12 January 2023
Volume: 16
Issue number: 1
Article number: 53
Publisher: MDPI AG
Publication country: Switzerland
Publication language: English
DOI: https://doi.org/10.3390/a16010053
Publication open access: Openly available
Publication channel open access: Open Access channel
Publication is parallel published (JYX): https://jyx.jyu.fi/handle/123456789/85762
Abstract
Code comments are considered an efficient way to document the functionality of a particular block of code. Code commenting is a common practice among developers to explain the purpose of the code in order to improve code comprehension and readability. Researchers have investigated the effect of code comments on software development tasks and demonstrated their usefulness for maintenance, reusability, bug detection, and other activities. Given the importance of code comments, it is vital for novice developers to build strong code commenting skills. In this study, we first investigated what types of comments novice students write in their source code and then categorized those comments using a machine learning approach. The work involves an initial manual classification of code comments, followed by building a machine learning model to classify student code comments automatically. Our findings reveal that novice developers' (students') comments fall mainly into the Literal (26.66%) and Insufficient (26.66%) categories. Further, we extended the taxonomy of such source code comments by adding more categories, i.e., License (5.18%), Profile (4.80%), Irrelevant (4.80%), Commented Code (4.44%), Autogenerated (1.48%), and Improper (1.10%). Moreover, we assessed our approach with three different machine learning classifiers; among them, Decision Tree achieved the highest overall accuracy, 85%. This study helps predict the type of code comments written by a novice developer using a machine learning approach that can be used to generate automated feedback for students, saving teachers the time otherwise spent on manual one-on-one feedback.
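Illustrative sketch (not the authors' implementation): the two-step workflow in the abstract, manual labelling followed by automatic classification, could look roughly like the following. It is a minimal, standard-library-only decision tree in the ID3 style over word-presence features. The category names come from the abstract; the example comments, the tokenizer, the feature choice, and all function names are assumptions made for illustration, since the record does not state which features or library the study used.

```python
import math
import re
from collections import Counter

# Toy "manually labelled" set. Category names are taken from the abstract;
# the comment texts are invented here and are NOT from the study's data.
TRAIN = [
    ("// increment i by one", "Literal"),
    ("// add a to b", "Literal"),
    ("// loop", "Insufficient"),
    ("// stuff", "Insufficient"),
    ("// Copyright 2020, licensed under MIT", "License"),
    ("// licensed under the Apache License", "License"),
    ("// int x = 5;", "Commented Code"),
    ("// return total;", "Commented Code"),
]


def tokens(comment):
    """Set of lower-cased words and a few code symbols found in a comment (assumed features)."""
    return set(re.findall(r"[a-z]+|[;=(){}]", comment.lower()))


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def build_tree(rows):
    """Grow a tree from (token_set, label) rows.

    Returns a label string for a leaf, or (word, subtree_if_present, subtree_if_absent).
    """
    labels = [label for _, label in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1:
        return majority
    best = None
    vocabulary = sorted(set().union(*(toks for toks, _ in rows)))
    for word in vocabulary:
        yes = [r for r in rows if word in r[0]]
        no = [r for r in rows if word not in r[0]]
        if not yes or not no:
            continue  # this word does not split the node
        # Weighted entropy after the split; minimizing it maximizes information gain.
        remainder = (
            len(yes) * entropy([label for _, label in yes])
            + len(no) * entropy([label for _, label in no])
        ) / len(rows)
        if best is None or remainder < best[0]:
            best = (remainder, word, yes, no)
    if best is None:
        return majority  # no word separates the remaining rows
    _, word, yes, no = best
    return (word, build_tree(yes), build_tree(no))


def predict(tree, comment):
    """Walk the tree using the comment's tokens and return the predicted category."""
    toks = tokens(comment)
    while isinstance(tree, tuple):
        word, yes, no = tree
        tree = yes if word in toks else no
    return tree


def accuracy(tree, examples):
    """Fraction of (text, label) examples the tree classifies correctly."""
    hits = sum(predict(tree, text) == label for text, label in examples)
    return hits / len(examples)


TREE = build_tree([(tokens(text), label) for text, label in TRAIN])
```

Calling predict(TREE, "// licensed under GPL") returns one of the four toy categories, and accuracy(TREE, TRAIN) measures fit on the training examples only. That is not comparable to the 85% reported in the abstract, which refers to the study's own data and evaluation setup.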
Keywords: software development; software developers; beginners; programming; source codes; classification; machine learning
Free keywords: source code comments; classification; machine learning techniques
Ministry reporting: Yes
VIRTA submission year: 2023
JUFO rating: 1