GNN and Transformer Fusion Learning for Molecular Classification of BACE1 Inhibitors
Publisher
Department of Computer Science and Engineering (CSE), Islamic University of Technology (IUT), Board Bazar, Gazipur-1704, Bangladesh
Abstract
Alzheimer’s disease (AD) is a progressive and devastating neurodegenerative disorder,
primarily manifested through memory loss and cognitive decline [1], [2]. One of
the central pathological hallmarks of AD is the accumulation of amyloid-beta (A𝛽)
plaques, formed via the sequential cleavage of the amyloid precursor protein (APP)
by 𝛽-secretase (BACE1) and 𝛾-secretase [3]. Inhibiting BACE1 is therefore regarded
as a compelling therapeutic strategy, as it can impede the formation of neurotoxic
A𝛽 aggregates [4], [5]. Nevertheless, the identification of effective BACE1 inhibitors
remains arduous and resource-intensive when approached through conventional
experimental pipelines. In this study, we propose a hybrid deep learning framework
that fuses Graph Neural Networks (GNNs) with ChemBERTa, a transformer model
pretrained on large chemical corpora. While GNNs capture atom-level and bond-level
interactions (local structural dependencies), ChemBERTa encodes long-range
dependencies and semantic patterns from SMILES representations (global chemical
context). By unifying these complementary modalities, our model overcomes the
limitations of prior GNN+CNN approaches, where CNNs process sequential SMILES in
a strictly local fashion and fail to capture non-linear long-range dependencies across
molecular structures. Our GNN–ChemBERTa fusion model achieved an accuracy of
92.77% in classifying active versus inactive BACE1 inhibitors, demonstrating superior
predictive power and generalization. Beyond its performance, the model contributes
to reducing drug discovery costs, accelerating virtual screening, and minimizing the
need for extensive laboratory experimentation. Moreover, a recall value of 93%
indicates that almost all potential active molecules were successfully identified by the
model, minimizing the risk of missing true inhibitors. Similarly, a high precision
value of 93% demonstrates that the model produces very few false positives, thereby
reducing unnecessary laboratory costs associated with testing inactive compounds.
Additionally, the ROC–AUC score of 87.88% confirms that the model can effectively
distinguish between active and inactive molecules, reflecting strong overall
classification performance. By enabling efficient in silico identification of potential inhibitors,
this approach not only streamlines the early stages of Alzheimer’s drug development
but also holds promise for broader application to other therapeutic targets associated
with neurodegenerative diseases.
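The reported metrics relate to screening errors as follows: recall falls with every active compound that is missed, precision falls with every inactive compound that is flagged, and ROC-AUC measures how well the scores rank actives above inactives. The minimal sketch below computes these quantities on toy data. The labels, scores, 0.5 threshold, and function names are illustrative assumptions; they are not the thesis dataset, results, or evaluation code.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, and recall for binary labels (1 = active, 0 = inactive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true) if y_true else 0.0,
        # Precision: fraction of predicted actives that are truly active.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Recall: fraction of true actives that the model recovered.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


def roc_auc(y_true, scores):
    """ROC-AUC as the probability that a random active outscores a random inactive.

    Ties count as half a win. This pairwise form is O(n_pos * n_neg) and is
    meant for illustration on small inputs only.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one active and one inactive label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical values, chosen only to exercise the functions).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed decision threshold of 0.5

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

Accuracy, precision, and recall depend on the chosen decision threshold, whereas ROC-AUC is computed from the raw scores and is threshold-independent, which is one reason the two kinds of figures can differ for the same model.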
Description
Supervised by
Mr. Tareque Mohmud Chowdhury,
Assistant Professor,
Mr. Njayou Youssouf,
Lecturer,
Department of Computer Science and Engineering (CSE)
Islamic University of Technology (IUT)
Board Bazar, Gazipur, Bangladesh
This thesis is submitted in partial fulfillment of the requirement for the degree of Bachelor of Science in Computer Science and Engineering, 2025
