Classification of Stack Overflow Questions Based on Difficulty

dc.contributor.author: Raida, Maliha Noushin
dc.contributor.author: Sristy, Zannatun Naim
dc.contributor.author: Monisha, Sheikh Moonwara Anjum
dc.contributor.author: Ulfat, Nawshin
dc.date.accessioned: 2023-03-23T09:48:12Z
dc.date.available: 2023-03-23T09:48:12Z
dc.date.issued: 2022-05-30
dc.description: Supervised by Mr. Md. Jubair Ibna Mostafa, Lecturer, and Mr. Md. Nazmul Haque, Lecturer, Department of Computer Science and Engineering (CSE), Islamic University of Technology (IUT), Board Bazar, Gazipur-1704, Bangladesh. This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering, 2022.
dc.description.abstract: Technical question-answering sites such as Stack Overflow attract enormous attention from learners and practitioners in specialized fields who want to exchange programming knowledge. Questions on many topics engage programmers at every level. Developers do not all have the same level of expertise, and their questions differ in complexity and context. However, existing Stack Overflow approaches primarily filter questions by tag, which is ineffective for predicting difficulty level. Because of this limitation, a large share of posts fail to attract the attention of appropriate users, so valid questions go unanswered or receive significantly delayed responses. To address these limitations, we propose three supervised models, based on TF-IDF, topic modeling (LDA), and Doc2Vec, that capture more complex relationships by extracting context-dependent features between the user and the question. Each model builds an informative relationship that helps classify the difficulty of a question. Extensive experiments on different variations of the datasets demonstrate the improved efficacy of our proposed models over contemporary models. The experiments show that, even with limited information, the models' performance scores are satisfactory and that the Doc2Vec model outperforms the other models under consideration.
dc.identifier.uri: http://hdl.handle.net/123456789/1780
dc.language.iso: en
dc.publisher: Department of Computer Science and Engineering (CSE), Islamic University of Technology (IUT), Board Bazar, Gazipur, Bangladesh
dc.subject: Stack Overflow, Difficulty Classification
dc.title: Classification of Stack Overflow Questions Based on Difficulty
dc.type: Thesis

Files

Original bundle

Name: Raida Thesis2021_CSE_170042001;170042043;17042057;170042081 - Zannatun Naim Sristy, 170042043.pdf
Size: 1.07 MB
Format: Adobe Portable Document Format
Description: Full text of the Thesis

Name: 7% _Raida_turnitin similarity.pdf
Size: 262.67 KB
Format: Adobe Portable Document Format
Description: Turnitin report_7% similarity

License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission
