A1 Journal article (refereed)
Investigating Novice Developers’ Code Commenting Trends Using Machine Learning Techniques (2023)

Niazi, T., Das, T., Ahmed, G., Waqas, S. M., Khan, S., Khan, S., Abdelatif, A. A., & Wasi, S. (2023). Investigating Novice Developers’ Code Commenting Trends Using Machine Learning Techniques. Algorithms, 16(1), Article 53. https://doi.org/10.3390/a16010053

JYU authors or editors

Publication details

All authors or editors: Niazi, Tahira; Das, Teerath; Ahmed, Ghufran; Waqas, Syed Muhammad; Khan, Sumra; Khan, Suleman; Abdelatif, Ahmed Abdelaziz; Wasi, Shaukat

Journal or series: Algorithms

Publication year: 2023

Publication date: 12/01/2023

Issue number: 1

Article number: 53

Publisher: MDPI AG

Publication country: Switzerland

Publication language: English

Publication open access: Openly available

Publication channel open access: Open Access channel

Publication is parallel published (JYX): https://jyx.jyu.fi/handle/123456789/85762


Code comments are considered an efficient way to document the functionality of a particular block of code. Developers commonly write comments to explain the purpose of code and so improve its comprehension and readability. Researchers have investigated the effect of code comments on software development tasks and demonstrated their usefulness for maintenance, reusability, bug detection, and other purposes. Given this importance, it is vital for novice developers to improve their code commenting skills. In this study, we first investigated what types of comments novice students write in their source code and then categorized those comments using a machine learning approach. The work involves an initial manual classification of code comments, followed by building a machine learning model to classify student code comments automatically. Our findings reveal that novice developers' comments fall mainly into the Literal (26.66%) and Insufficient (26.66%) categories. Further, we extended the taxonomy of source code comments by adding several categories: License (5.18%), Profile (4.80%), Irrelevant (4.80%), Commented Code (4.44%), Autogenerated (1.48%), and Improper (1.10%). Moreover, we assessed our approach with three different machine learning classifiers, of which Decision Tree achieved the highest overall accuracy, 85%. This study helps predict the type of code comments written by a novice developer using a machine learning approach that can be used to generate automated feedback for students, thus saving teachers the time otherwise spent on manual one-on-one feedback.
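
The two-step workflow described in the abstract (manual labelling, then an automatic classifier) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract names Decision Tree as the best-performing classifier but does not state the features or tooling used, so the word-presence features, the Gini-based splitting, the function names, and the example comments below are all assumptions made for illustration. Only the category names are taken from the abstract.

```python
import re
from collections import Counter

# Hypothetical hand-labelled comments. Category names come from the abstract;
# the comment texts are invented for this sketch.
TRAIN = [
    ("increment i by one", "Literal"),
    ("add a and b", "Literal"),
    ("loop", "Insufficient"),
    ("todo", "Insufficient"),
    ("licensed under the MIT license", "License"),
    ("copyright 2022 licensed under apache", "License"),
    ("int x = 5;", "Commented Code"),
    ("print(x);", "Commented Code"),
]

def tokens(text):
    """Return the set of lower-cased word tokens in a comment."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))

def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(rows):
    """Grow a decision tree from (token_set, label) rows.

    Each internal node tests whether one word is present; the word is chosen
    greedily to minimise the weighted Gini impurity of the two branches.
    """
    labels = [label for _, label in rows]
    if len(set(labels)) == 1:
        return labels[0]  # pure node becomes a leaf
    best = None
    for word in sorted(set().union(*(toks for toks, _ in rows))):
        with_word = [r for r in rows if word in r[0]]
        without = [r for r in rows if word not in r[0]]
        if not with_word or not without:
            continue  # this word does not separate the rows
        score = (len(with_word) * gini([l for _, l in with_word])
                 + len(without) * gini([l for _, l in without])) / len(rows)
        if best is None or score < best[0]:
            best = (score, word, with_word, without)
    if best is None:
        # Identical token sets with different labels: fall back to the majority.
        return Counter(labels).most_common(1)[0][0]
    _, word, with_word, without = best
    return {"word": word, "yes": build_tree(with_word), "no": build_tree(without)}

def predict(tree, text):
    """Walk the tree for one comment and return the predicted category."""
    toks = tokens(text)
    while isinstance(tree, dict):
        tree = tree["yes"] if tree["word"] in toks else tree["no"]
    return tree

TREE = build_tree([(tokens(text), label) for text, label in TRAIN])
```

Calling `predict(TREE, "some new student comment")` returns one of the labelled categories, which is the kind of signal that could drive automated feedback. With a training set this small the tree simply memorises its examples; a realistic experiment would need a much larger labelled corpus and a held-out test set to estimate accuracy.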

Keywords: software development; software developers; beginners; programming; source codes; classification; machine learning

Free keywords: source code comments; classification; machine learning techniques

Contributing organizations

Ministry reporting: Yes

Reporting Year: 2023

JUFO rating: 1

Last updated on 2024-06-15 at 21:25