Improving Few-Shot Adaptive Learning for Medical Image Classification using Vision Transformer (ViT)

dc.contributor.author: Arif, Nokimul Hasan
dc.contributor.author: Ahbab, Sakif
dc.contributor.author: Aziz, Syem
dc.date.accessioned: 2025-06-02T09:59:58Z
dc.date.available: 2025-06-02T09:59:58Z
dc.date.issued: 2024-06-30
dc.description: Supervised by Dr. Md. Hasanul Kabir, Professor, and co-supervised by Mr. Sabbir Ahmed, Assistant Professor, Department of Computer Science and Engineering (CSE), Islamic University of Technology (IUT), Board Bazar, Gazipur, Bangladesh. This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering, 2024.
dc.description.abstract: Medical image classification plays a pivotal role in automating disease diagnosis and treatment planning. However, the limited availability of annotated medical data poses a significant challenge for training accurate classifiers. This research introduces an enhanced approach to improve Few-Shot Adaptive Learning for medical image classification, employing the capabilities of Vision Transformer (ViT) architectures. Our proposed method uses ViTs to capture the intricate spatial relationships and contextual information inherent in medical images. To address the challenge of limited labeled data, we focus on improving few-shot learning by introducing adaptive learning strategies. The integration of ViT not only enhances the model's ability to learn complex patterns but also facilitates efficient adaptation to new classes with minimal labeled data. The model dynamically adjusts its representation space, allowing for efficient adaptation to diverse medical imaging scenarios with minimal labeled examples. Extensive experiments are conducted on diverse medical image datasets to validate the effectiveness of our approach. The results show notable improvements in classification performance compared to existing state-of-the-art methods. The proposed ViT-based framework holds promise for improving the generalization and adaptability of medical image classifiers, thereby contributing to the advancement of automated medical diagnosis and treatment planning.
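The abstract describes a ViT backbone whose embeddings are adapted to new classes from only a few labeled examples. As a rough illustration only, and not the thesis implementation, the sketch below shows one common way such a few-shot pipeline can be assembled: a prototypical-network-style episode in which a ViT encoder embeds support and query images, class prototypes are averaged from the support embeddings, and queries are scored by distance to the prototypes. The use of torchvision's vit_b_16, the episode sizes, and all function names are assumptions made for this example.

    # Illustrative sketch (not the thesis code): a prototypical-network-style
    # few-shot episode using a ViT backbone as the feature extractor.
    import torch
    import torch.nn.functional as F
    from torchvision.models import vit_b_16

    # ViT backbone used as an embedding function (pretrained weights could be loaded here).
    backbone = vit_b_16(weights=None)
    backbone.heads = torch.nn.Identity()   # keep the 768-d class-token embedding
    backbone.eval()

    @torch.no_grad()
    def classify_episode(support_x, support_y, query_x, n_way):
        """One N-way K-shot episode: build class prototypes from support
        embeddings and assign each query to the nearest prototype."""
        support_emb = backbone(support_x)                        # (N*K, 768)
        query_emb = backbone(query_x)                            # (Q, 768)
        prototypes = torch.stack(
            [support_emb[support_y == c].mean(dim=0) for c in range(n_way)]
        )                                                        # (N, 768)
        logits = -torch.cdist(query_emb, prototypes)             # negative distance as similarity
        return F.softmax(logits, dim=-1)

    # Example: a hypothetical 3-way 5-shot episode on 224x224 scans,
    # replicated to 3 channels to match the ViT input stem.
    support_x = torch.randn(15, 3, 224, 224)
    support_y = torch.arange(3).repeat_interleave(5)
    query_x = torch.randn(6, 3, 224, 224)
    probs = classify_episode(support_x, support_y, query_x, n_way=3)
    print(probs.shape)  # torch.Size([6, 3])

In this style of pipeline, "adaptive" few-shot behavior typically comes from how the backbone embeddings are adjusted per episode (for example, fine-tuning or re-weighting the representation space); the thesis's specific adaptation strategy is not reproduced here.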
dc.identifier.uri: http://hdl.handle.net/123456789/2412
dc.language.iso: en
dc.publisher: Department of Computer Science and Engineering (CSE), Islamic University of Technology (IUT), Board Bazar, Gazipur-1704, Bangladesh
dc.title: Improving Few-Shot Adaptive Learning for Medical Image Classification using Vision Transformer (ViT)
dc.type: Thesis

Files

Original bundle (3 files)

Name: Fulltext_ CSE_190041107_190041212_190041238_Book - Syem Aziz 190041238.pdf
Size: 3.41 MB
Format: Adobe Portable Document Format

Name: Signature Page_ CSE_190041107_190041212_190041238_Signatures - Syem Aziz 190041238.pdf
Size: 1.15 MB
Format: Adobe Portable Document Format

Name: Turnitin Report_ 15%_CSE_190041107_190041212_190041238_PlagiarismReport - Syem Aziz 190041238.pdf
Size: 2.29 MB
Format: Adobe Portable Document Format

License bundle (1 file)

Name: license.txt
Size: 1.71 KB
Description: Item-specific license agreed upon to submission

Collections