Addressing rare diseases is difficult, given the limited number of reference images and the small patient populations involved. This is especially evident in rare skin diseases, where long-tailed data distributions make it difficult to develop unbiased, broadly effective models. The diverse ways in which image datasets are gathered, and their distinct purposes, add to these challenges. Our study conducts a detailed examination of the benefits and drawbacks of episodic and conventional training methodologies, adopting a few-shot learning approach alongside transfer learning. We evaluated our models on the ISIC2018, Derm7pt, and SD-198 datasets. With minimal labeled examples, our models showed substantial information gains and better performance compared to previously trained models...
Y. Özdemir, H.Y. Keles, Ö. Tanrıöver – (in review, IEEE Journal of Biomedical and Health Informatics), 2024
[Paper] [Arxiv]
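As a concrete illustration of what one episodic (few-shot) training step looks like, here is a minimal prototypical-network-style sketch in PyTorch; the encoder, way/shot counts, and distance metric are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of one episodic training step, prototypical-network style.
# Illustrative only: `encoder` and the episode sizes are placeholders.
import torch
import torch.nn.functional as F

def episode_loss(encoder, support_x, support_y, query_x, query_y, n_way):
    """support_x: [n_way * k_shot, ...], query_x: [n_query, ...]"""
    z_support = encoder(support_x)              # embed support samples
    z_query = encoder(query_x)                  # embed query samples
    # Class prototype = mean embedding of that class's support samples.
    prototypes = torch.stack(
        [z_support[support_y == c].mean(dim=0) for c in range(n_way)])
    # Classify queries by (negative) Euclidean distance to each prototype.
    dists = torch.cdist(z_query, prototypes)    # [n_query, n_way]
    return F.cross_entropy(-dists, query_y)
```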
This study presents a deep learning-based approach to the seismic velocity inversion problem, focusing on both noisy and noiseless training datasets of varying sizes. Our Seismic Velocity Inversion Network (SVInvNet) introduces a novel architecture comprising a multi-connection encoder-decoder structure enhanced with dense blocks. This design is specifically tuned to process the complex information crucial for addressing the challenges of non-linear seismic velocity inversion. For training and testing, we created diverse seismic velocity models, including multi-layered, faulty, and salt dome categories. We also investigated how different kinds of ambient noise, both coherent and stochastic, and the size of the training dataset affect learning outcomes...
M. Khatounabad, H.Y. Keles, S. Kadioglu – (accepted, IEEE Transactions on Geoscience and Remote Sensing), 2024
[Paper] [Arxiv]
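For readers unfamiliar with dense blocks, the sketch below shows the standard dense-connectivity pattern such an encoder-decoder can build on; channel counts and depths are placeholders, not SVInvNet's published configuration.

```python
# Minimal dense block: each layer receives the concatenation of all previous
# feature maps. Sizes are illustrative only.
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    def __init__(self, in_ch, growth, n_layers):
        super().__init__()
        self.layers = nn.ModuleList()
        ch = in_ch
        for _ in range(n_layers):
            self.layers.append(nn.Sequential(
                nn.Conv2d(ch, growth, kernel_size=3, padding=1),
                nn.BatchNorm2d(growth),
                nn.ReLU(inplace=True)))
            ch += growth  # each layer sees all previous feature maps

    def forward(self, x):
        feats = [x]
        for layer in self.layers:
            out = layer(torch.cat(feats, dim=1))  # dense connectivity
            feats.append(out)
        return torch.cat(feats, dim=1)
```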
We introduce Adversarial Sparse Teacher (AST), a robust defense method against distillation-based model stealing attacks. Our approach trains a teacher model on adversarial examples so that it produces sparse logit responses with increased output entropy. Typically, a model generates a peak in its output corresponding to its prediction. By leveraging adversarial examples, AST modifies the teacher model’s original response, embedding a few altered logits into the output while keeping the primary response slightly higher. Concurrently, all remaining logits are elevated to further increase the entropy of the output distribution. These manipulations are performed with an optimization objective built on our proposed Exponential Predictive Divergence (EPD) loss function...
E. Yilmaz, H.Y. Keles – 2024
[Paper] [Arxiv]
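The EPD loss itself is not reproduced here, but the following toy function illustrates the kind of logit shaping the abstract describes: the primary response stays slightly highest, a few altered logits are embedded near it, and the remaining logits are elevated to raise entropy. The helper name and all constants are hypothetical; the paper's EPD-based optimization is not reproduced.

```python
# Toy illustration of the target logit shape: true class (slightly) on top,
# a few "decoy" logits raised near it, the rest lifted to increase entropy.
import torch

def shape_teacher_logits(logits, true_idx, n_decoys=3, margin=0.5, floor=-1.0):
    shaped = torch.full_like(logits, floor)       # elevate all remaining logits
    top = logits.max()
    decoys = torch.randperm(logits.numel())[:n_decoys]
    shaped[decoys] = top - margin                 # embed a few altered logits
    shaped[true_idx] = top                        # primary response stays highest
    return shaped

probs = torch.softmax(shape_teacher_logits(torch.randn(10), true_idx=2), dim=0)
print(probs)  # high-entropy distribution, correct class still the argmax
```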
In this study, we present a novel methodology that enhances hand keypoint extraction in low-resolution sign language datasets, a challenge that has remained largely unexplored in sign language research. By addressing the limitations of existing pose extraction models such as OpenPose and MediaPipe, which frequently struggle to detect hand keypoints accurately in low-resolution footage, our method marks a notable advancement in this specialized field. Our methodology adapts the U-Net and Attention U-Net architectures to improve the resolution of sign language videos while reducing undetected hand presence (UHP)...
S.M. Taşyürek, T. Kızıltepe, H.Y. Keles – 2024
[Paper]
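As background for the Attention U-Net adaptation mentioned above, here is a minimal additive attention gate in PyTorch (after Oktay et al.); it assumes the gating signal and skip features share spatial size, and all channel sizes are illustrative.

```python
# Minimal additive attention gate of the kind used in Attention U-Net.
import torch
import torch.nn as nn

class AttentionGate(nn.Module):
    def __init__(self, g_ch, x_ch, inter_ch):
        super().__init__()
        self.wg = nn.Conv2d(g_ch, inter_ch, kernel_size=1)
        self.wx = nn.Conv2d(x_ch, inter_ch, kernel_size=1)
        self.psi = nn.Sequential(
            nn.Conv2d(inter_ch, 1, kernel_size=1), nn.Sigmoid())

    def forward(self, g, x):
        # g: gating signal from the decoder; x: encoder skip connection.
        # (Assumes g and x already share spatial size, for simplicity.)
        a = self.psi(torch.relu(self.wg(g) + self.wx(x)))  # attention map in [0, 1]
        return x * a  # suppress irrelevant skip features (e.g., background)
```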
Sign language users often face challenges in environments where sign language is not readily available, primarily because producing sign language content requires a human signer, which is time-consuming and labor-intensive. One potential solution for making sign language more widely available is to use artificial intelligence to generate it. While studies worldwide have explored AI-generated sign language videos for various sign languages, research in this field is still evolving, and for Turkish Sign Language (TSL) in particular only a few studies have been conducted. Our study introduces a model that combines pose sequences with reference images of signers to generate authentic sign language videos. Utilizing a conditional generative adversarial network (cGAN), the model generates video sequences with a particular focus on complex finger details and arm positions, while preserving the appearance of the reference image...
F. Özkan, H.K. Tekin, H.Y. Keles – 2024
[Paper]
Numerous sign language datasets exist, yet they typically cover only a limited selection of the thousands of signs used globally. Moreover, creating diverse sign language datasets is expensive and challenging due to the costs of gathering a varied group of signers. Motivated by these limitations, we focused on textually describing body movements from skeleton keypoint sequences, leading to the creation of a new dataset. We structured this dataset...
A.E. Keskin, H.Y. Keles – 2024
This study introduces the continuous Educational Turkish Sign Language (E-TSL) dataset, collected from online Turkish language lessons for the 5th, 6th, and 8th grades. The dataset comprises 1,410 videos totaling nearly 24 hours and includes performances from 11 signers. Turkish, an agglutinative language, poses unique challenges for sign language translation: 64% of the vocabulary are singleton words and 85% are rare words appearing fewer than five times. We developed two baseline models to address these challenges: the Pose to Text Transformer (P2T-T) and the Graph Neural Network based Transformer (GNN-T) models...
Ş. Öztürk, H.Y. Keles – 2024
Image-to-image translation has become an important technique in computer vision; it focuses on preserving the effective content of images while translating them from one domain to another. Among spectral imaging formats, the Short Wave Infrared (SWIR) format stands out with its ability to capture more information than the standard color image format. However, the interpretability of SWIR images poses a significant challenge, as they are not as easily understood by humans as color images... In this study, an original dataset is created in which SWIR and color images can be translated into each other. Using this dataset, the performances of Cycle Consistent Adversarial Networks (CycleGAN), Contrastive Unpaired Translation (CUT) and Fast Contrastive Unpaired Translation (FastCUT) deep generative models, which are known to be successful in translating unpaired images, are compared...
D. Taşbaş, H.Y. Keles – 2024
[Paper]
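For context, the cycle-consistency term that distinguishes CycleGAN-style models from plain adversarial translation is sketched below; `G` and `F` stand for the SWIR-to-color and color-to-SWIR generators, and the weight is a conventional placeholder.

```python
# Cycle-consistency term added to the adversarial losses in CycleGAN-style
# training: translating to the other domain and back should recover the input.
import torch

def cycle_loss(G, F, swir, rgb, lam=10.0):
    loss_swir = torch.mean(torch.abs(F(G(swir)) - swir))  # SWIR -> RGB -> SWIR
    loss_rgb = torch.mean(torch.abs(G(F(rgb)) - rgb))     # RGB -> SWIR -> RGB
    return lam * (loss_swir + loss_rgb)
```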
In this study, the issue of few-shot audio classification in scenarios with limited labeled data is addressed, and experiments conducted on the GSC and ESC-50 audio datasets are presented. The study examines three experimental setups structured around training scenarios with 5, 10, and 15 samples. In all these experiments, accuracy values obtained using 1 and 5 audio recordings per class in 5-class settings are compared. Training was conducted with three different loss optimizations, and the effect of simple feature transformations on classification performance was also assessed for each...
E.F. Çiğdem, H.Y. Keles – 2024
[Paper]
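A minimal sketch of how such N-way K-shot episodes can be sampled from a labeled pool of audio feature vectors is shown below; the function and variable names are hypothetical, and the defaults mirror the 5-class, 1- and 5-shot settings mentioned above.

```python
# Sample one N-way K-shot episode (plus queries) from labeled feature vectors.
import numpy as np

def sample_episode(features, labels, n_way=5, k_shot=1, n_query=5, rng=None):
    rng = rng or np.random.default_rng()
    classes = rng.choice(np.unique(labels), size=n_way, replace=False)
    support, query = [], []
    for episode_label, c in enumerate(classes):
        idx = rng.permutation(np.flatnonzero(labels == c))
        support += [(features[i], episode_label) for i in idx[:k_shot]]
        query += [(features[i], episode_label)
                  for i in idx[k_shot:k_shot + n_query]]
    return support, query  # e.g., a 5-way 1-shot episode with 5 queries/class
```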
In this paper, we propose an efficient solution to the facial image inpainting problem using the Cyclic Reverse Generator (CRG) architecture, which provides an encoder-generator model. We use the encoder to embed a given image in the generator space and incrementally inpaint the masked regions until a plausible image is generated; a discriminator network is utilized to assess the generated images during the iterations. We empirically observed that only a few iterations are sufficient to generate realistic images with the proposed model.
Y. Dogan, H.Y. Keles – Neural Computing and Applications, 2022
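The iterative procedure described above can be sketched as follows, assuming a pretrained CRG encoder `E` and generator `G` (placeholders here) and a binary mask that is 1 on observed pixels; the iteration count is illustrative.

```python
# Sketch of the incremental inpainting loop: embed, regenerate, and keep the
# observed pixels while filling the hole with generated content.
import torch

def inpaint(E, G, image, mask, n_iters=10):
    x = image.clone()
    for _ in range(n_iters):
        x_hat = G(E(x))                        # embed, then regenerate
        x = mask * image + (1 - mask) * x_hat  # composite known and generated
    return x
```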
In this paper, we propose an isolated sign language recognition approach built on a model trained with Motion History Images (MHI) generated from RGB video frames. RGB-MHI images effectively represent the spatio-temporal summary of each sign video in a single RGB image. We propose two different approaches that use this RGB-MHI model. In the first approach, we use the RGB-MHI model as a motion-based spatio-temporal attention module integrated into a 3D-CNN architecture. In the second approach, we use RGB-MHI model features directly with the features of a 3D-CNN model using a late fusion technique.
O.M. Sincan, H.Y. Keles – IEEE Access, 2022
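For reference, the classic single-channel MHI computation is sketched below (the paper's RGB-MHI encodes motion history into a color image, which this sketch does not reproduce); the threshold and decay values are illustrative.

```python
# Motion History Image from grayscale frames: recent motion appears bright,
# older motion fades linearly over time.
import numpy as np

def motion_history_image(frames, thresh=30, decay=255 // 20):
    mhi = np.zeros_like(frames[0], dtype=np.float32)
    prev = frames[0].astype(np.int16)
    for frame in frames[1:]:
        cur = frame.astype(np.int16)
        moving = np.abs(cur - prev) > thresh   # frame-difference motion mask
        mhi = np.maximum(mhi - decay, 0)       # fade old motion
        mhi[moving] = 255                      # stamp new motion at full value
        prev = cur
    return mhi.astype(np.uint8)
```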
Remote sensing scene classification deals with the problem of classifying the land use/cover of a region from images. To predict the development and socioeconomic structures of cities, the status of land use in regions is tracked by the national mapping agencies of countries. Many of these agencies use land use types that are arranged in multiple levels. In this paper, we examined the efficiency of a hierarchically designed CNN-based framework suitable for such arrangements. We used the NWPU-RESISC45 dataset for our experiments and arranged it in a two-level nested hierarchy. We have two cascaded deep CNN models initialized using the DenseNet-121 architecture.
O. Sen, H.Y. Keles – PFG Journal, 2022
In this study, we provide a framework composed of three modules to solve the isolated sign recognition problem using different sequence models. The dimensions of deep features are usually too large to work with HMMs. To address this, we propose two alternative CNN-based architectures as the second module in our framework, which reduce deep feature dimensions effectively.
A. Tur, H.Y. Keles – Multimedia Tools and Applications, 2021
Image attribute editing is a challenging problem that has recently been studied by many researchers using generative networks. The challenge lies in manipulating selected attributes of images while preserving the other details. The method to achieve this goal is to find an accurate latent vector representation of an image and a direction corresponding to the attribute. Almost all works in the literature use labeled datasets in a supervised setting for this purpose. In this study, we introduce an architecture called Cyclic Reverse Generator (CRG), which allows learning the inverse function of the generator accurately, via an encoder, in an unsupervised setting by utilizing cyclic cost minimization.
Y. Dogan, H.Y. Keles – Neurocomputing, 2020
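A minimal sketch of the cyclic cost idea follows, under the assumption of a fixed pretrained generator `G` and an optimizer over the encoder `E`'s parameters only; the latent size and equal loss weighting are placeholders, not the paper's settings.

```python
# One training step of cyclic cost minimization: penalize reconstruction
# error in both directions (latent -> image -> latent, image -> latent -> image).
import torch

def cyclic_step(E, G, opt, x_real, z_dim=128):
    z = torch.randn(x_real.size(0), z_dim)
    loss_latent = torch.mean((E(G(z)) - z) ** 2)            # z -> image -> z
    loss_image = torch.mean((G(E(x_real)) - x_real) ** 2)   # image -> z -> image
    loss = loss_latent + loss_image
    opt.zero_grad()
    loss.backward()
    opt.step()  # `opt` holds only E's parameters; G stays fixed
    return loss.item()
```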
Disease-aware image editing by means of generative adversarial networks (GANs) constitutes a promising avenue for advancing the use of AI in the healthcare sector. Here, we present a proof of concept of this idea. While GAN-based techniques have been successful in generating and manipulating natural images, their application to the medical domain is still in its infancy. Working with the CheXpert dataset, we show that StyleGAN can be trained to generate realistic chest X-rays. Inspired by the Cyclic Reverse Generator (CRG) framework, we train an encoder that allows for faithfully inverting the generator on synthetic X-rays and provides organ-level reconstructions of real ones.
A. Saboo, S.N. Ramachandran, K. Dierkes, H.Y. Keles – MED-NEURIPS 2020
[Paper] [Presentation]
In this study, we present a new large-scale multi-modal Turkish Sign Language dataset (AUTSL) with a benchmark and provide baseline models for performance evaluations. Our dataset consists of 226 signs performed by 43 different signers and 38,336 isolated sign video samples in total. Samples contain a wide variety of backgrounds recorded in indoor and outdoor environments; the spatial positions and postures of signers also vary across the recordings. Each sample is recorded with Microsoft Kinect v2 and contains color (RGB), depth, and skeleton data modalities. We prepared benchmark training and test sets for user-independent assessment of the models. We trained several deep learning-based models and provide empirical evaluations using the benchmark; we used Convolutional Neural Networks (CNNs) to extract features and unidirectional and bidirectional Long Short-Term Memory (LSTM) models to characterize temporal information.
O.M. Sincan, H.Y. Keles – IEEE Access, 2020
[Paper] [Arxiv] [AUTSL Dataset]
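The CNN-plus-LSTM baseline family reads roughly like the sketch below: per-frame CNN features feed a bidirectional LSTM, and classification uses the final timestep. The tiny backbone and layer sizes are stand-ins, not the paper's models; only the 226-class output matches the dataset description above.

```python
# Sketch of a CNN-feature + bidirectional LSTM video classifier.
import torch
import torch.nn as nn

class CNNBiLSTM(nn.Module):
    def __init__(self, feat_dim=512, hidden=256, n_classes=226):
        super().__init__()
        self.cnn = nn.Sequential(                 # stand-in frame encoder
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(32 * 16, feat_dim))
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True,
                            bidirectional=True)
        self.fc = nn.Linear(2 * hidden, n_classes)

    def forward(self, clips):                     # clips: [B, T, 3, H, W]
        b, t = clips.shape[:2]
        feats = self.cnn(clips.flatten(0, 1)).view(b, t, -1)
        out, _ = self.lstm(feats)                 # temporal modeling
        return self.fc(out[:, -1])                # classify from last timestep
```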
Foreground segmentation algorithms aim at segmenting moving objects from the background in a robust way under various challenging scenarios. Encoder-decoder-type deep neural networks used in this domain have recently achieved impressive segmentation results. In this work, we propose a variation of our formerly proposed method that can be trained end-to-end using only a few training examples.
L.A. Lim, H.Y. Keles – Pattern Analysis and Applications, 2020
In this paper, we propose two segmentation networks that mark the face and hands in static images for sign language recognition using only a few training samples. Our networks have an encoder-decoder structure containing convolutional, max pooling, and upsampling layers; the first is a U-Net-based network and the second is a VGG-based network.
O.M. Sincan, S. Gencoglu, M. Bacak, H.Y. Keles – IEEE ISMSIT, 2019
Sign recognition is a challenging problem due to the high variance of signs among different signers and the multiple modalities of the input. In this work, we propose a Siamese Neural Network (SNN) architecture that extracts features from the RGB and depth streams of a sign frame in parallel.
A.O. Tur, H.Y. Keles – IEEE EUROCON, 2019
[Paper]
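A minimal sketch of the parallel, weight-shared (Siamese) feature extraction idea follows; to keep the branch weights shared across modalities, this toy version assumes single-channel inputs (e.g., grayscale-converted RGB and a depth map), which is a simplification rather than the paper's exact setup.

```python
# Two modalities pass through the *same* branch in parallel; the resulting
# embeddings are concatenated for downstream recognition.
import torch
import torch.nn as nn

class SiameseStreams(nn.Module):
    def __init__(self, emb=128):
        super().__init__()
        self.branch = nn.Sequential(     # weights shared by both streams
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(16 * 16, emb))

    def forward(self, rgb_gray, depth):
        return torch.cat([self.branch(rgb_gray), self.branch(depth)], dim=1)
```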
This paper proposes a convolutional neural network (CNN)-based encoder model that compresses and codes the speech signal directly from raw input speech. Although the model can synthesize wideband speech through implicit bandwidth extension, narrowband is preferred for IP telephony and telecommunications purposes.
H.Y. Keles, J. Rozhon, H.G. Ilk, M. Voznak – IEEE Access, 2019
[Paper]
In this paper, two novel deep learning architectures are proposed to solve the scene classification problem using aerial images. In our evaluations, we used one of the largest open-access datasets, the NWPU-RESISC45 dataset, which contains 45 categories and 31,500 samples in total. The 95.7% accuracy we obtain with the developed models is competitive with state-of-the-art methods.
O. Sen, H.Y. Keles – IEEE SIU, 2019
[Paper]
Generative Adversarial Networks (GANs) generate photo-realistic images more successfully than other generative models, but they are notoriously difficult to train. In this study, we empirically examined state-of-the-art cost functions, regularization techniques, and network architectures that have recently been proposed to address these training problems, using the CelebA dataset.
Y. Dogan, H.Y. Keles – IEEE SIU, 2019
[Paper]
Sign language recognition systems automatically convert the signs in video streams to text. In this work, an original isolated sign language recognition model is created using Convolutional Neural Networks (CNNs), a Feature Pooling Module, and Long Short-Term Memory networks (LSTMs).
O.M. Sincan, A.O. Tur, H.Y. Keles – IEEE SIU, 2019
[Paper]
We propose two robust encoder-decoder-type neural networks that generate multi-scale feature encodings in different ways and can be trained end-to-end using only a few training samples. Using the same encoder-decoder configuration, in the first model a triplet of encoders takes the input at three scales to embed an image in a multi-scale feature space.
L.A. Lim, H.Y. Keles – Pattern Recognition Letters, 2018
Action classification from video streams is a challenging problem, especially when the number of training samples for different actions is limited. In this work, we examined the performance of Hidden Markov Model (HMM) and long short-term memory (LSTM)-based recurrent neural network models in the same sequence classification framework on the well-known KTH action dataset.
E.C. Alp, H.Y. Keles – Advances in Intelligent Systems and Computing, 2018
[Paper]
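The HMM side of such a comparison can be sketched with hmmlearn as below: one GaussianHMM per action class, classification by maximum log-likelihood. The state count and covariance type are illustrative, not the paper's settings.

```python
# Fit one GaussianHMM per class; classify a test sequence by the model that
# assigns it the highest log-likelihood.
import numpy as np
from hmmlearn import hmm

def fit_class_hmm(sequences, n_states=4):
    X = np.vstack(sequences)               # stack frames of all sequences
    lengths = [len(s) for s in sequences]  # per-sequence frame counts
    model = hmm.GaussianHMM(n_components=n_states, covariance_type="diag",
                            n_iter=100)
    model.fit(X, lengths)
    return model

def classify(models, sequence):
    # `models` maps class label -> fitted GaussianHMM.
    return max(models, key=lambda label: models[label].score(sequence))
```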
Embedding emergent parts in shape grammars is computationally challenging. The first challenge is the representation of shapes, which needs to enable reinterpretation of parts regardless of the creation history of the shapes. The second challenge is a part-searching algorithm that enables extensive exploration of the design space in a time-efficient manner. In this work, we propose a novel method that addresses both problems.
H.Y. Keles – AI EDAM, 2018
[Paper]
In this paper, we present a novel method to detect and classify moving objects in surveillance videos obtained from a moving camera. In our method, we first estimate the camera motion by interpreting the movement of interest points in the scene. Then, we eliminate the camera motion and find candidate regions that belong to the moving objects.
O.M. Sincan, H.Y. Keles, S. Tosun – Communications Faculty of Sciences University of Ankara Series A2-A3 Physical Sciences and Engineering, 2018
[Paper]
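The camera-motion step described above can be sketched with standard OpenCV calls: track interest points between frames, fit a global homography, warp the previous frame, and difference the result so residual changes come from truly moving objects. All parameter values are illustrative, not the paper's.

```python
# Sketch of camera-motion estimation and compensation with OpenCV.
import cv2

def motion_compensated_diff(prev_gray, cur_gray):
    pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                                  qualityLevel=0.01, minDistance=8)
    nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, cur_gray, pts, None)
    good_prev = pts[status.ravel() == 1]   # keep successfully tracked points
    good_next = nxt[status.ravel() == 1]
    # Global motion model fitted robustly (outliers = moving objects).
    H, _ = cv2.findHomography(good_prev, good_next, cv2.RANSAC, 3.0)
    warped = cv2.warpPerspective(prev_gray, H,
                                 (cur_gray.shape[1], cur_gray.shape[0]))
    return cv2.absdiff(cur_gray, warped)   # candidate moving-object regions
```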
Action recognition from video streams is among the active research topics in computer vision. The challenge is to identify actions robustly despite the variations introduced when different people perform them. This paper proposes a Hidden Markov Model (HMM)-based approach that models actions using Hu moments computed from modified Motion History Images (MHI).
E.C. Alp, H.Y. Keles – IEEE EUROCON, 2017
[Paper]
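The feature step can be sketched as follows: OpenCV's Hu moments computed on an MHI give a compact, translation-, scale-, and rotation-invariant descriptor per time window. The log-scaling shown is a common conditioning step, not necessarily the paper's.

```python
# Hu-moment descriptor of a Motion History Image.
import cv2
import numpy as np

def hu_features(mhi):
    hu = cv2.HuMoments(cv2.moments(mhi)).ravel()   # 7 invariant moments
    # Log-scale the moments, which span many orders of magnitude.
    return -np.sign(hu) * np.log10(np.abs(hu) + 1e-12)
```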
In this study, we investigate the suitability of functional near-infrared spectroscopy (fNIRS) signals for person identification using data visualization and machine learning algorithms. We first applied two linear dimension reduction algorithms, Principal Component Analysis (PCA) and Singular Value Decomposition (SVD), to reduce the dimensionality of the fNIRS data.
O.M. Sincan, H.Y. Keles, Y. Kır, A. Kusman, B. Baskak – Communications Faculty of Sciences University of Ankara Series A2-A3 Physical Sciences and Engineering, 2017
[Paper]
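The dimension-reduction step maps directly onto scikit-learn, as in this sketch; the component count and data shape are placeholders for the real fNIRS feature matrix.

```python
# PCA and truncated SVD applied to an [n_samples, n_features] feature matrix.
import numpy as np
from sklearn.decomposition import PCA, TruncatedSVD

X = np.random.rand(200, 1024)                 # placeholder for real fNIRS data
X_pca = PCA(n_components=20).fit_transform(X)
X_svd = TruncatedSVD(n_components=20).fit_transform(X)
print(X_pca.shape, X_svd.shape)               # (200, 20) (200, 20)
```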