Extraction of app problems and their corresponding user actions from user reviews of apps using few-shot learning

Abstract

User reviews provide developers with important feedback on how to fix problems, improve application performance, and enhance the user experience. However, reviews are unstructured by nature, containing typographical errors, grammatical mistakes, and informal language, which makes extracting actionable insights from such a large volume of reviews a significant challenge for developers. This research focuses on extracting and synthesizing user actions and the related application problems from app reviews in the social media domain, using large language model capabilities such as those offered by GPT-3.5 Turbo. Our approach extends previous ones, such as Caspar, by overcoming their major limitations through few-shot learning and advanced prompt engineering techniques. This study demonstrates the need for an effective mechanism to identify and synthesize action-problem pairs from user reviews. We used a dataset of 330 reviews from social media applications to train a fine-tuned model that can handle diverse scenarios, including reviews without explicit key phrases. Unlike previous attempts, our system effectively captures complex interactions, such as many user activities resulting in a single app fault or one action generating multiple issues. The methodology included substantial data preparation, advanced prompt engineering, and performance testing against known baselines. Metrics such as recall and accuracy demonstrate notable improvements over traditional models like Caspar. These results show that the proposed model outperforms state-of-the-art methodologies in correctly classifying action-problem pairs and in dealing with noisy, informal data, a challenge that is often neglected. This work demonstrates the potential of deep natural language processing techniques in software development and maintenance, emphasizing large datasets and automated frameworks to reduce human annotation effort and improve scalability.
Future work will aim to develop a more extensive and generalized set of action-problem pairs, exploring industrial applications such as automated test case generation and proactive maintenance support for developers.
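The core mechanism the abstract describes, few-shot prompting of a chat model to pull action-problem pairs out of a raw review, can be sketched as follows. This is a minimal illustration only: the example reviews, labels, and the "Action: ... | Problem: ..." output format are assumptions for demonstration, not the thesis's actual prompts or data. The assembled messages would be sent to a model such as GPT-3.5 Turbo via a chat-completions API; the parser then recovers one or more pairs from the model's reply, covering the many-actions-to-one-problem and one-action-to-many-problems cases by emitting one pair per line.

```python
# Hypothetical few-shot examples: (raw review, labeled action-problem pair).
# These are illustrative placeholders, not the thesis's annotated data.
FEW_SHOT_EXAMPLES = [
    ("Every time I upload a story the app crashes.",
     "Action: uploading a story | Problem: app crashes"),
    ("After the update, tapping the share button does nothing.",
     "Action: tapping the share button | Problem: button unresponsive"),
]

# Assumed instruction; the real system prompt used in the study may differ.
SYSTEM_PROMPT = (
    "Extract the user action and the app problem it triggers from the review. "
    "Answer as: Action: <action> | Problem: <problem>. "
    "If several pairs occur, output one pair per line."
)

def build_messages(review: str) -> list[dict]:
    """Assemble a chat-completion message list: instruction, few-shot
    demonstrations as alternating user/assistant turns, then the new review."""
    messages = [{"role": "system", "content": SYSTEM_PROMPT}]
    for example_review, labeled_pair in FEW_SHOT_EXAMPLES:
        messages.append({"role": "user", "content": example_review})
        messages.append({"role": "assistant", "content": labeled_pair})
    messages.append({"role": "user", "content": review})
    return messages

def parse_pairs(model_output: str) -> list[tuple[str, str]]:
    """Parse 'Action: ... | Problem: ...' lines into (action, problem) tuples,
    skipping any lines that do not match the expected format."""
    pairs = []
    for line in model_output.splitlines():
        if "Action:" in line and "Problem:" in line and "|" in line:
            action_part, problem_part = line.split("|", 1)
            action = action_part.split("Action:", 1)[1].strip()
            problem = problem_part.split("Problem:", 1)[1].strip()
            pairs.append((action, problem))
    return pairs
```

In use, `build_messages("...")` would be passed as the `messages` argument of a chat-completions call, and `parse_pairs` applied to the returned text; keeping the output format line-oriented makes multi-pair reviews trivial to split.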

Description

Supervised by Dr. Hasan Mahmud, Professor, and co-supervised by Dr. Md. Kamrul Hasan, Professor, Department of Computer Science and Engineering (CSE), Islamic University of Technology (IUT), Board Bazar, Gazipur, Bangladesh. This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Software Engineering, 2024.

Keywords

Natural language processing, app reviews, user actions, app problems, large language models, few-shot learning, prompt engineering, social media applications, act

Citation

[1] D. Pagano and W. Maalej, “User feedback in the appstore: An empirical study,” in 2013 21st IEEE International Requirements Engineering Conference (RE), 2013, pp. 125–134.
[2] A. J. Ko, M. J. Lee, V. Ferrari, S. Ip, and C. Tran, “A case study of post-deployment user feedback triage,” in Proceedings of the 4th International Workshop on Cooperative and Human Aspects of Software Engineering, ser. CHASE ’11. New York, NY, USA: Association for Computing Machinery, 2011, pp. 1–8. [Online]. Available: https://doi.org/10.1145/1984642.1984644
[3] D. Pagano and B. Brügge, “User involvement in software evolution practice: a case study,” in Proceedings of the 2013 International Conference on Software Engineering, ser. ICSE ’13. IEEE Press, 2013, pp. 953–962.
[4] N. Chen, J. Lin, S. C. H. Hoi, X. Xiao, and B. Zhang, “AR-miner: mining informative reviews for developers from mobile app marketplace,” in Proceedings of the 36th International Conference on Software Engineering, ser. ICSE 2014. New York, NY, USA: Association for Computing Machinery, 2014, pp. 767–778. [Online]. Available: https://doi.org/10.1145/2568225.2568263
[5] A. Di Sorbo, S. Panichella, C. V. Alexandru, J. Shimagaki, C. A. Visaggio, G. Canfora, and H. C. Gall, “What would users change in my app? summarizing app reviews for recommending software changes,” in Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, ser. FSE 2016. New York, NY, USA: Association for Computing Machinery, 2016, pp. 499–510. [Online]. Available: https://doi.org/10.1145/2950290.2950299
[6] Z. Kurtanović and W. Maalej, “Mining user rationale from software reviews,” in 2017 IEEE 25th International Requirements Engineering Conference (RE), 2017, pp. 61–70.
[7] W. Maalej and H. Nabil, “Bug report, feature request, or simply praise? on automatically classifying app reviews,” in 2015 IEEE 23rd International Requirements Engineering Conference (RE), 2015, pp. 116–125.
[8] S. Panichella, A. Di Sorbo, E. Guzman, C. A. Visaggio, G. Canfora, and H. C. Gall, “How can i improve my app? classifying user reviews for software maintenance and evolution,” in 2015 IEEE International Conference on Software Maintenance and Evolution (ICSME), 2015, pp. 281–290.
[9] H. Guo and M. P. Singh, “Caspar: Extracting and synthesizing user stories of problems from app reviews,” in 2020 IEEE/ACM 42nd International Conference on Software Engineering (ICSE), 2020, pp. 628–640.
[10] A. F. de Araújo and R. M. Marcacini, “Re-bert: automatic extraction of software requirements from app reviews using bert language model,” in Proceedings of the 36th Annual ACM Symposium on Applied Computing, ser. SAC ’21. New York, NY, USA: Association for Computing Machinery, 2021, pp. 1321–1327. [Online]. Available: https://doi.org/10.1145/3412841.3442006
[11] H. Touvron, L. Martin, K. Stone, P. Albert, A. Almahairi, Y. Babaei, N. Bashlykov, S. Batra, P. Bhargava, S. Bhosale et al., “Llama 2: Open foundation and fine-tuned chat models,” arXiv preprint arXiv:2307.09288, 2023.
[12] J. Achiam, S. Adler, S. Agarwal, L. Ahmad, I. Akkaya, F. L. Aleman, D. Almeida, J. Altenschmidt, S. Altman, S. Anadkat et al., “Gpt-4 technical report,” arXiv preprint arXiv:2303.08774, 2023.
[13] P. A. Laplante and M. Kassab, Requirements engineering for software and systems. Auerbach Publications, 2022.
[14] K. Ronanki, B. Cabrero-Daniel, J. Horkoff, and C. Berger, “Requirements engineering using generative ai: Prompts and prompting patterns,” 2023. [Online]. Available: https://arxiv.org/abs/2311.03832
[15] M. Tavakoli, L. Zhao, A. Heydari, and G. Nenadić, “Extracting useful software development information from mobile application reviews: A survey of intelligent mining techniques and tools,” Expert Systems with Applications, vol. 113, pp. 186–199, 2018. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0957417418303361
[16] R. Kasauli, E. Knauss, J. Horkoff, G. Liebel, and F. G. de Oliveira Neto, “Requirements engineering challenges and practices in large-scale agile system development,” Journal of Systems and Software, vol. 172, p. 110851, 2021. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0164121220302417
[17] N. Chen, J. Lin, S. C. H. Hoi, X. Xiao, and B. Zhang, “AR-miner: mining informative reviews for developers from mobile app marketplace,” in Proceedings of the 36th International Conference on Software Engineering, ser. ICSE 2014. New York, NY, USA: Association for Computing Machinery, 2014, pp. 767–778. [Online]. Available: https://doi.org/10.1145/2568225.2568263
[18] L. Zhao, W. Alhoshan, A. Ferrari, K. J. Letsholo, M. A. Ajagbe, E.-V. Chioasca, and R. T. Batista-Navarro, “Natural language processing (nlp) for requirements engineering: A systematic mapping study,” 2020. [Online]. Available: https://arxiv.org/abs/2004.01099
[19] R. K. Helmeczi, M. Cevik, and S. Yıldırım, “Few-shot learning for sentence pair classification and its applications in software engineering,” 2023. [Online]. Available: https://arxiv.org/abs/2306.08058
[20] M. Dragoni, M. Federici, and A. Rexha, “An unsupervised aspect extraction strategy for monitoring real-time reviews stream,” Information Processing & Management, vol. 56, no. 3, pp. 1103–1118, 2019. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0306457317305174
[21] N. Jha and A. Mahmoud, “Mining user requirements from application store reviews using frame semantics,” in Requirements Engineering: Foundation for Software Quality, P. Grünbacher and A. Perini, Eds. Cham: Springer International Publishing, 2017, pp. 273–287.
[22] K. Ruan, X. Chen, and Z. Jin, “Requirements modeling aided by chatgpt: An experience in embedded systems,” in 2023 IEEE 31st International Requirements Engineering Conference Workshops (REW), 2023, pp. 170–177.
[23] P. Sahoo, A. K. Singh, S. Saha, V. Jain, S. Mondal, and A. Chadha, “A systematic survey of prompt engineering in large language models: Techniques and applications,” 2024. [Online]. Available: https://arxiv.org/abs/2402.07927
[24] C. H. Song, J. Wu, C. Washington, B. M. Sadler, W.-L. Chao, and Y. Su, “Llm-planner: Few-shot grounded planning for embodied agents with large language models,” 2023. [Online]. Available: https://arxiv.org/abs/2212.04088
[25] Y. Lu, H. Lin, J. Xu, X. Han, J. Tang, A. Li, L. Sun, M. Liao, and S. Chen, “Text2event: Controllable sequence-to-structure generation for end-to-end event extraction,” 2021. [Online]. Available: https://arxiv.org/abs/2106.09232
[26] J. Gao, H. Zhao, W. Wang, C. Yu, and R. Xu, “Eventrl: Enhancing event extraction with outcome supervision for large language models,” 2024. [Online]. Available: https://arxiv.org/abs/2402.11430
[27] B. Wang, C. Wei, Z. Liu, G. Lin, and N. F. Chen, “Resilience of large language models for noisy instructions,” 2024. [Online]. Available: https://arxiv.org/abs/2404.09754
[28] N. Jha and A. Mahmoud, “Mining user requirements from application store reviews using frame semantics,” in Requirements Engineering: Foundation for Software Quality, P. Grünbacher and A. Perini, Eds. Cham: Springer International Publishing, 2017, pp. 273–287.
[29] M. Lapata and A. Lascarides, “Inferring sentence-internal temporal relations,” in Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics: HLT-NAACL 2004, 2004, pp. 153–160.
[30] ——, “Learning sentence-internal temporal relations,” Journal of Artificial Intelligence Research, vol. 27, pp. 85–117, 2006.
