An Explainable Deep Learning Framework for Content-Based Image Retrieval Using Transfer Learning and Zero-Shot CLIP Models

Authors

  • Jehan Kadhim Al-Safi Department of Digital Media, Faculty of Media, University of Thi-Qar
  • Wasan M. Jwaid Department of Banking and Finance, Administration and Economics, University of Thi-Qar

Abstract

Content-Based Image Retrieval (CBIR) plays an important role in computer vision by allowing users
to find images by their visual content rather than by metadata. Despite recent advances, CBIR
systems still struggle to achieve high accuracy, to make their results interpretable, and to
generalize to settings with scarce labeled data or complex semantics. This paper therefore
proposes a unified CBIR framework that combines transfer learning, explainable artificial
intelligence, and zero-shot retrieval. Specifically, the system uses six pretrained convolutional
neural networks (CNNs), namely MobileNet, MobileNetV2, DenseNet121, VGG16, NASNetMobile, and
Xception, to extract informative image features. Xception achieved the highest performance, with a
test accuracy of 98.99% on the COREL-1000 dataset. To make the model’s decisions interpretable,
Grad-CAM is used to visualize the image regions on which the model focuses most strongly. The
framework also incorporates the CLIP model, which lets users retrieve images with natural-language
queries, even without labeled training data. CLIP is shown to return semantically relevant images
and to offer an easy and adaptable way to use the system. The results show that combining
explainable and zero-shot approaches with deep learning yields a reliable CBIR solution that
surpasses previous systems in performance and accuracy. The framework is therefore well suited to
areas such as e-commerce, medical analysis, and smart online media search.
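
The retrieval step common to both the CNN-based and the CLIP-based pipelines can be illustrated with a minimal sketch. The abstract does not state which similarity measure the framework uses, so cosine similarity is an assumption here, and the three-dimensional vectors are hypothetical stand-ins for real feature embeddings (for example, Xception features or CLIP image and text embeddings). The function names `cosine_similarity` and `retrieve` are illustrative only and do not come from the paper.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query_vec, gallery, top_k=3):
    """Rank gallery images by similarity to the query embedding, best match first."""
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in gallery.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy embeddings; a real system would obtain these from a
# pretrained CNN (image query) or from CLIP (image or text query).
gallery = {
    "beach.jpg": [0.9, 0.1, 0.0],
    "bus.jpg": [0.1, 0.9, 0.2],
    "horse.jpg": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]  # assumed embedding of the query image or text prompt

results = retrieve(query, gallery, top_k=2)
for name, score in results:
    print(f"{name}: {score:.3f}")
```

In a zero-shot setting of the kind the abstract describes, the query vector would come from a text encoder and the gallery vectors from the matching image encoder, so the same ranking routine could serve both query types, provided the embeddings share one space.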

Published

2025-09-30

How to Cite

Al-Safi, J. K., & Jwaid, W. M. (2025). An Explainable Deep Learning Framework for Content-Based Image Retrieval Using Transfer Learning and Zero-Shot CLIP Models. University of Thi-Qar Journal, 20(3), 38–58. Retrieved from https://www.jutq.utq.edu.iq/index.php/main/article/view/417