Projects
I love getting my hands dirty with new concepts by trying them out in real-world applications. It helps me learn new things and keeps me active.
Imagine being able to take pictures for your Instagram without worrying about crowds spoiling your photos. We created a tool that removes photobombs, whether they are people or any other object. The best part is that you just click on what you want to remove instead of drawing around it.
Single cell gel electrophoresis (SCGE), or the comet assay, is one of the most commonly used research methods for analysing DNA damage. The output of an SCGE experiment is a set of comet assay images, which are then analysed to measure the degree of DNA damage. This analysis is currently performed with commercial or open-source image analysis software. Commercial software is too expensive for routine research use, and open-source software is limited in how it can analyse the damage. In this work, we propose a data-driven deep learning framework with three modules: detection of valid comets, classification of the detected comets into damaged and undamaged, and quantification of the damaged comets. Valid comets are detected in the comet assay image with the Faster R-CNN object detection algorithm. The detected comets are then classified as damaged or undamaged by a CNN model. Finally, damage is quantified with a keypoint detection model, and the most commonly used comet assay damage parameters are measured from the detected keypoints. The three modules are connected into a single tool for exploring the damage in comet assays.
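To illustrate the quantification step, here is a minimal sketch of turning detected keypoints into length-based comet parameters. It assumes a hypothetical three-keypoint layout (outer edge of the head, head/tail boundary, tail tip) and pixel units. The percentage of DNA in the tail cannot come from keypoints alone; it would come from image intensities, so here it is an optional input. The tail moment uses the common definition of tail length times the fraction of DNA in the tail.

```python
import math
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class CometKeypoints:
    """Hypothetical keypoint layout: three points along the comet axis (pixels)."""
    head_start: Point  # outer edge of the comet head
    head_end: Point    # boundary between head and tail
    tail_end: Point    # far tip of the tail


def _dist(a: Point, b: Point) -> float:
    """Euclidean distance between two points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def comet_parameters(kp: CometKeypoints,
                     tail_dna_percent: Optional[float] = None) -> Dict[str, float]:
    """Compute length-based comet parameters from keypoints.

    tail_dna_percent is assumed to be measured separately from pixel intensities.
    """
    params = {
        "head_diameter": _dist(kp.head_start, kp.head_end),
        "tail_length": _dist(kp.head_end, kp.tail_end),
        "comet_length": _dist(kp.head_start, kp.tail_end),
    }
    if tail_dna_percent is not None:
        # Tail moment = tail length x fraction of total DNA found in the tail
        params["tail_moment"] = params["tail_length"] * tail_dna_percent / 100.0
    return params
```

For example, keypoints at (0, 0), (10, 0) and (40, 0) with 20% tail DNA give a head diameter of 10, a tail length of 30, a comet length of 40 and a tail moment of 6.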
Imagine being able to detect blindness before it happens. Millions of people suffer from diabetic retinopathy, the leading cause of blindness among working-age adults. Aravind Eye Hospital in India hopes to detect and prevent this disease among people living in rural areas, where medical screening is difficult to conduct. Successful entries in this competition will improve the hospital’s ability to identify potential patients.
Two Approaches:
1. Enhanced the retinal details in the DR images with the median subtraction method, trained DenseNet121 from scratch, and got an accuracy of 88%.
2. Experimented with different configurations of EfficientNet-B5, which improved the accuracy to ~91%.
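The median subtraction preprocessing in the first approach can be sketched as subtracting a large-window local median (the slowly varying background illumination) from each channel and shifting the result to mid-grey, so that fine retinal structures stand out. This is a minimal NumPy-only sketch. The window size `k` and the offset of 128 are illustrative assumptions, not the values used in the project.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def median_subtract(img: np.ndarray, k: int = 15, offset: float = 128.0) -> np.ndarray:
    """Remove background illumination by subtracting a local median per channel.

    k (odd window size) and offset (mid-grey shift) are illustrative choices.
    """
    if k % 2 == 0:
        raise ValueError("k must be odd")
    was_2d = img.ndim == 2
    x = img.astype(np.float32)
    if was_2d:
        x = x[..., np.newaxis]  # treat greyscale as a single channel
    pad = k // 2
    out = np.empty_like(x)
    for c in range(x.shape[2]):
        # Reflect-pad so the output keeps the input's height and width
        padded = np.pad(x[..., c], pad, mode="reflect")
        windows = sliding_window_view(padded, (k, k))
        background = np.median(windows, axis=(-2, -1))
        out[..., c] = x[..., c] - background + offset
    out = np.clip(out, 0, 255).astype(np.uint8)
    return out[..., 0] if was_2d else out
```

The pure-NumPy window view is easy to read but memory-hungry on full-resolution fundus images. In practice, a dedicated filter such as `scipy.ndimage.median_filter` or OpenCV's `cv2.medianBlur` would be the more efficient choice.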
The Bacteria Biotope (BB) task is composed of two subtasks. Each subtask has two modalities: one where entities are given as input, and one where entities are not provided. Teams are free to participate in the subtask(s) of their choice.
1. Entity detection and normalization subtask (BB-norm and BB-norm+ner)
2. Entity and relation extraction subtask (BB-rel and BB-rel+ner)
The evaluation measures are the recall and precision of the predicted events against the gold-standard events.
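As a rough illustration of these measures, the sketch below scores predicted events against gold events by exact matching of hashable tuples, and also reports the F1-score derived from them. Treating an event as an exact-match `(relation, arg1, arg2)` tuple is a simplifying assumption. The official BB evaluation may use more lenient matching rules, and the relation names and entity IDs in the example are illustrative only.

```python
from typing import Hashable, Iterable, Tuple


def precision_recall_f1(predicted: Iterable[Hashable],
                        gold: Iterable[Hashable]) -> Tuple[float, float, float]:
    """Exact-match precision, recall and F1 of predicted events vs. gold events."""
    pred, ref = set(predicted), set(gold)
    tp = len(pred & ref)  # predicted events that exactly match a gold event
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Illustrative events: (relation type, first argument ID, second argument ID)
gold = {("Lives_In", "T1", "T3"), ("Lives_In", "T2", "T3"), ("Exhibits", "T1", "T4")}
predicted = {("Lives_In", "T1", "T3"), ("Lives_In", "T2", "T5")}
p, r, f = precision_recall_f1(predicted, gold)
```

Here one of the two predictions matches, so precision is 0.5, recall is 1/3 and F1 is 0.4.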
Skin cancer is the most prevalent type of cancer. Melanoma, specifically, is responsible for 75% of skin cancer deaths, despite being the least common skin cancer. The American Cancer Society estimates over 100,000 new melanoma cases will be diagnosed in 2020. It's also expected that almost 7,000 people will die from the disease. As with other cancers, early and accurate detection—potentially aided by data science—can make treatment more effective.
Approach:
1. The dataset contains images as well as metadata (age, sex, etc.).
2. Developed three models: a CNN-based model using EfficientNet, an SVM model on features extracted from the CNN output, and an XGBoost model on the metadata.
3. Ensembling the predictions of the SVM and XGBoost models resulted in an accuracy of ~92%.
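The ensembling step can be sketched as a weighted average of the two models' positive-class probabilities, thresholded into labels. The approach above does not state how the predictions were combined, so the simple weighted average, the equal weight of 0.5 and the 0.5 threshold are all assumptions. Note also that an SVM only yields probabilities if it is configured to, for example scikit-learn's `SVC(probability=True)`.

```python
import numpy as np


def ensemble_probabilities(prob_svm, prob_xgb, w_svm: float = 0.5, threshold: float = 0.5):
    """Weighted average of two models' positive-class probabilities (assumed scheme)."""
    if not 0.0 <= w_svm <= 1.0:
        raise ValueError("w_svm must be in [0, 1]")
    p = w_svm * np.asarray(prob_svm, dtype=float) + (1.0 - w_svm) * np.asarray(prob_xgb, dtype=float)
    labels = (p >= threshold).astype(int)  # 1 = melanoma, 0 = benign
    return p, labels


def accuracy(y_true, y_pred) -> float:
    """Fraction of labels predicted correctly."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))


probs, labels = ensemble_probabilities([0.9, 0.2, 0.6, 0.4], [0.7, 0.4, 0.3, 0.8])
acc = accuracy([1, 0, 1, 1], labels)
```

In this toy example the averaged probabilities are 0.8, 0.3, 0.45 and 0.6, so three of the four labels are correct.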
COVID-19 cases were increasing day by day at an exponential rate in April 2020. In such a situation, there could be a shortage of devices or machines for detecting the coronavirus. If X-rays could be used as a first level of diagnosis for COVID-19, it would make life easier. Using the available dataset of 73 images, we developed and deployed within 2 days a model that classifies X-ray images as Normal, Bacterial, Viral, or COVID infection, with an F1-score of 0.7767.
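For a four-class problem like this, an F1-score has to be averaged over classes. The sketch below computes the macro-averaged F1 (the unweighted mean of per-class F1). The project does not say which averaging was used, so macro averaging is an assumption here, and the example labels are made up.

```python
from typing import Sequence


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1-scores over all labels seen."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores.append(2 * precision * recall / (precision + recall) if precision + recall else 0.0)
    return sum(scores) / len(scores)


y_true = ["Normal", "COVID", "Viral", "Bacterial"]
y_pred = ["Normal", "COVID", "Bacterial", "Bacterial"]
score = macro_f1(y_true, y_pred)
```

In this example Normal and COVID score 1, Viral scores 0 and Bacterial scores 2/3, giving a macro F1 of 2/3. Macro averaging weights rare classes (such as COVID in a small dataset) as heavily as common ones, which is one reason it is a common choice.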
We have been using Siri and OK Google for years, but building your own chatbot is still fun. This was my fun side project to understand how text can be converted into numerical features. I used a cosine similarity score to match the question against the content of a webpage: the sentence with the highest similarity score was retrieved and sent as the reply.
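The retrieval idea can be sketched in a few lines. Each sentence and the question are turned into word-count vectors, and the sentence with the highest cosine similarity to the question is returned. The project does not say which text features were used, so plain bag-of-words counts are an assumption; TF-IDF weighting (for example scikit-learn's `TfidfVectorizer`) is a common alternative that down-weights frequent words such as "the" and "is".

```python
import math
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def best_reply(question: str, sentences: List[str]) -> str:
    """Return the sentence most similar to the question."""
    if not sentences:
        raise ValueError("no candidate sentences")
    q = Counter(tokenize(question))
    scores = [cosine_similarity(q, Counter(tokenize(s))) for s in sentences]
    return sentences[max(range(len(sentences)), key=scores.__getitem__)]


page = [
    "Python is a programming language.",
    "The Eiffel Tower is in Paris.",
    "Cats are small mammals.",
]
reply = best_reply("Where is the Eiffel Tower?", page)
```

Here the second sentence shares four words with the question and is returned as the reply.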
IEEE-CIS works across a variety of AI and machine learning areas, including deep neural networks, fuzzy systems, evolutionary computation, and swarm intelligence. Today they’re partnering with the world’s leading payment service company, Vesta Corporation, seeking the best solutions for the fraud prevention industry.
Approach:
1. Frequency encoding (the value counts of each feature) is applied to the negatively downsampled online transaction data.
2. A blend of LightGBM and CatBoost models predicted the fraud probability and scored an AUC of 0.927892, placing in the top 16% of teams.
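The two preprocessing ideas in step 1 can be sketched with pandas. Negative downsampling keeps every fraud (positive) row but only a random fraction of the non-fraud rows. Frequency encoding adds, for each chosen column, how often that row's value occurs. The target column name `isFraud`, the 20% default sampling fraction and the helper names are assumptions for illustration, not the project's exact settings.

```python
import pandas as pd


def negative_downsample(df: pd.DataFrame, target: str = "isFraud",
                        neg_frac: float = 0.2, seed: int = 0) -> pd.DataFrame:
    """Keep all positive rows and a random fraction of the negative rows."""
    pos = df[df[target] == 1]
    neg = df[df[target] == 0].sample(frac=neg_frac, random_state=seed)
    # Shuffle so positives and negatives are interleaved
    return pd.concat([pos, neg]).sample(frac=1.0, random_state=seed).reset_index(drop=True)


def add_frequency_encoding(df: pd.DataFrame, cols) -> pd.DataFrame:
    """Add a '<col>_count' column holding how often each value appears in `col`."""
    out = df.copy()
    for col in cols:
        counts = out[col].value_counts()  # missing values are not counted
        out[f"{col}_count"] = out[col].map(counts)
    return out


df = pd.DataFrame({"card1": [1, 1, 2, 3, 3, 3], "isFraud": [1, 0, 0, 0, 0, 1]})
encoded = add_frequency_encoding(df, ["card1"])
sampled = negative_downsample(df, neg_frac=0.5, seed=0)
```

Downsampling changes the class balance, so predicted probabilities are no longer calibrated to the original fraud rate. This does not affect a ranking metric such as AUC.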
Forecasting earthquakes is one of the most important problems in Earth science because of their devastating consequences. Current scientific studies related to earthquake forecasting focus on three key points: when the event will occur, where it will occur, and how large it will be.
Approach:
1. Statistical features such as mean, median, skew, and kurtosis are extracted from each signal segment to predict the time remaining before failure (the earthquake).
2. Blended LightGBM, XGBoost, and a neural network to get a mean absolute error of 2.53767 (top 18%).
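The per-segment feature extraction in step 1 can be sketched with NumPy. The signal is cut into equal, non-overlapping segments, and each segment is summarised by its mean, median, standard deviation, skew and excess kurtosis. The skew and kurtosis formulas match the defaults of `scipy.stats.skew` and `scipy.stats.kurtosis` (biased estimates, Fisher definition). The segment length and the exact feature list are assumptions; each row would then be paired with a time-to-failure target for that segment.

```python
import numpy as np


def segment_features(signal, segment_size: int) -> np.ndarray:
    """Summarise non-overlapping segments of a 1-D signal.

    Columns: mean, median, std, skew, excess kurtosis. Trailing samples that do
    not fill a whole segment are dropped.
    """
    x = np.asarray(signal, dtype=np.float64)
    n_segments = len(x) // segment_size
    if n_segments == 0:
        raise ValueError("signal shorter than one segment")
    segs = x[: n_segments * segment_size].reshape(n_segments, segment_size)
    mean = segs.mean(axis=1)
    centred = segs - mean[:, None]
    std = segs.std(axis=1)  # population std (ddof=0)
    safe_std = np.where(std > 0, std, 1.0)  # avoid division by zero on flat segments
    skew = (centred ** 3).mean(axis=1) / safe_std ** 3
    kurt = (centred ** 4).mean(axis=1) / safe_std ** 4 - 3.0
    # Define skew and kurtosis of a constant segment as 0
    skew = np.where(std > 0, skew, 0.0)
    kurt = np.where(std > 0, kurt, 0.0)
    return np.column_stack([mean, np.median(segs, axis=1), std, skew, kurt])


features = segment_features([1, 2, 3, 4, 5, 5, 5, 5, 9], segment_size=4)
```

With a segment size of 4, the trailing sample is dropped. The symmetric first segment gets zero skew and negative excess kurtosis (-1.36), and the constant second segment gets zeros.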
Cassava, or Manihot esculenta, belongs to the family Euphorbiaceae and is cultivated in tropical and subtropical regions for its edible starchy tuberous root, which is commonly dried into a powder and named tapioca. As the second-largest provider of carbohydrates in Africa, cassava is a key food security crop grown by smallholder farmers because it can withstand harsh conditions. At least 80% of household farms in Sub-Saharan Africa grow this starchy root, but viral diseases are major sources of poor yields. With the help of data science, it may be possible to identify common diseases so they can be treated.
Outlier Detection:
1. The dataset contains 21397 images of 5 different classes.
2. One important EDA step for an image dataset, before model development, is finding outliers. Outliers are images that stand out compared to the other images in a particular class. For example, a dog image would be considered an outlier if it were saved in the cat class.
3. Outliers are detected using two different methods: a k-means clustering technique and a mean vs. standard deviation technique.
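The mean vs. standard deviation technique can be sketched as follows. Each image in a class is reduced to two numbers (the mean and standard deviation of its pixel values), and images whose statistics lie far from the rest of the class are flagged. The z-score rule and the threshold of 3 are assumptions standing in for however the original project drew the line, for example by inspecting a mean vs. standard deviation scatter plot. The k-means method is not shown here.

```python
import numpy as np


def mean_std_outliers(images, z_thresh: float = 3.0) -> np.ndarray:
    """Return indices of images whose (mean, std) pixel statistics are outliers.

    An image is flagged if either statistic is more than z_thresh standard
    deviations away from the class average (an assumed rule).
    """
    stats = np.array([[np.mean(img), np.std(img)] for img in images], dtype=np.float64)
    centre = stats.mean(axis=0)
    spread = stats.std(axis=0)
    spread = np.where(spread > 0, spread, 1.0)  # guard against identical images
    z = np.abs((stats - centre) / spread)
    return np.flatnonzero((z > z_thresh).any(axis=1))


rng = np.random.default_rng(0)
class_images = [rng.normal(100.0, 10.0, size=(8, 8)) for _ in range(20)]
class_images.append(np.full((8, 8), 250.0))  # a very bright, flat image
flagged = mean_std_outliers(class_images)
```

In this synthetic class, only the bright, flat image at index 20 is flagged. Because the check runs per class, an image that looks normal overall but is unusual for its own class can still be caught.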