Improving Few-Shot Adaptive Learning for Medical Image Classification using Vision Transformer (ViT)

dc.contributor.author: Arif, Nokimul Hasan
dc.contributor.author: Ahbab, Sakif
dc.contributor.author: Aziz, Syem
dc.date.accessioned: 2025-06-02T09:59:58Z
dc.date.available: 2025-06-02T09:59:58Z
dc.date.issued: 2024-06-30
dc.description: Supervised by Dr. Md. Hasanul Kabir, Professor, and co-supervised by Mr. Sabbir Ahmed, Assistant Professor, Department of Computer Science and Engineering (CSE), Islamic University of Technology (IUT), Board Bazar, Gazipur, Bangladesh. This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering, 2024.
dc.description.abstract: Medical image classification plays a pivotal role in automating disease diagnosis and treatment planning. However, the limited availability of annotated medical data poses a significant challenge for training accurate classifiers. This research introduces an enhanced approach to improve Few-Shot Adaptive Learning for medical image classification, employing the capabilities of Vision Transformer (ViT) architectures. Our proposed method uses ViTs to capture the intricate spatial relationships and contextual information inherent in medical images. To address the challenge of limited labeled data, we focus on improving few-shot learning by introducing adaptive learning strategies. The integration of ViT not only enhances the model's ability to learn complex patterns but also facilitates efficient adaptation to new classes with minimal labeled data. The model dynamically adjusts its representation space, allowing for efficient adaptation to diverse medical imaging scenarios with minimal labeled examples. Extensive experiments are conducted on diverse medical image datasets to validate the effectiveness of our approach. The results show notable improvements in classification performance compared to existing state-of-the-art methods. The proposed ViT-based framework holds promise for improving the generalization and adaptability of medical image classifiers, thereby contributing to the advancement of automated medical diagnosis and treatment planning.
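The abstract describes a ViT backbone whose embeddings are adapted to new classes from only a few labeled examples. As a rough illustration only, and not the thesis implementation, the sketch below shows one common way such a few-shot pipeline can be assembled: a prototypical-network-style episode in which a ViT encoder embeds support and query images, class prototypes are averaged from the support embeddings, and queries are scored by distance to the prototypes. The use of torchvision's vit_b_16, the episode sizes, and all function names are assumptions made for this example.

    # Illustrative sketch (not the thesis code): a prototypical-network-style
    # few-shot episode using a ViT backbone as the feature extractor.
    import torch
    import torch.nn.functional as F
    from torchvision.models import vit_b_16

    # ViT backbone used as an embedding function (pretrained weights could be loaded here).
    backbone = vit_b_16(weights=None)
    backbone.heads = torch.nn.Identity()   # keep the 768-d class-token embedding
    backbone.eval()

    @torch.no_grad()
    def classify_episode(support_x, support_y, query_x, n_way):
        """One N-way K-shot episode: build class prototypes from support
        embeddings and assign each query to the nearest prototype."""
        support_emb = backbone(support_x)                        # (N*K, 768)
        query_emb = backbone(query_x)                            # (Q, 768)
        prototypes = torch.stack(
            [support_emb[support_y == c].mean(dim=0) for c in range(n_way)]
        )                                                        # (N, 768)
        logits = -torch.cdist(query_emb, prototypes)             # negative distance as similarity
        return F.softmax(logits, dim=-1)

    # Example: a hypothetical 3-way 5-shot episode on 224x224 scans,
    # replicated to 3 channels to match the ViT input stem.
    support_x = torch.randn(15, 3, 224, 224)
    support_y = torch.arange(3).repeat_interleave(5)
    query_x = torch.randn(6, 3, 224, 224)
    probs = classify_episode(support_x, support_y, query_x, n_way=3)
    print(probs.shape)  # torch.Size([6, 3])

In this style of pipeline, "adaptive" few-shot behavior typically comes from how the backbone embeddings are adjusted per episode (for example, fine-tuning or re-weighting the representation space); the thesis's specific adaptation strategy is not reproduced here.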
dc.identifier.uri: http://hdl.handle.net/123456789/2412
dc.language.iso: en
dc.publisher: Department of Computer Science and Engineering (CSE), Islamic University of Technology (IUT), Board Bazar, Gazipur-1704, Bangladesh
dc.title: Improving Few-Shot Adaptive Learning for Medical Image Classification using Vision Transformer (ViT)
dc.type: Thesis

Files

Original bundle (3 files)

Name: Fulltext_ CSE_190041107_190041212_190041238_Book - Syem Aziz 190041238.pdf
Size: 3.41 MB
Format: Adobe Portable Document Format

Name: Signature Page_ CSE_190041107_190041212_190041238_Signatures - Syem Aziz 190041238.pdf
Size: 1.15 MB
Format: Adobe Portable Document Format

Name: Turnitin Report_ 15%_CSE_190041107_190041212_190041238_PlagiarismReport - Syem Aziz 190041238.pdf
Size: 2.29 MB
Format: Adobe Portable Document Format

License bundle (1 file)

Name: license.txt
Size: 1.71 KB
Description: Item-specific license agreed upon to submission

Collections