Projects 


Food computing is a newly emerged research field that applies Natural Language Processing (NLP) techniques to the various stages of food production pipelines. In this project, we introduce a new dataset called FoodReview, which contains more than 18,000 comments on different recipes. The aim of the project is to extract ingredients from these comments; to do so, we implement a Named Entity Recognition (NER) model using the widely used spaCy library. NER is important in this context because it enables the automatic extraction of ingredients from comments, which is useful for food-related applications such as recipe recommendation, ingredient substitution, and food analysis. The FoodReview dataset provides a unique opportunity to explore the effectiveness of NER in this context, and the results of this project could contribute to the development of new tools and techniques for food computing.
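
As a rough illustration, here is a minimal sketch of training a custom spaCy NER component to tag ingredients; the INGREDIENT label and the toy training examples are illustrative placeholders, not the actual FoodReview annotations.

```python
# Minimal sketch: train a blank spaCy pipeline to recognize an INGREDIENT label.
import spacy
from spacy.training import Example

nlp = spacy.blank("en")
ner = nlp.add_pipe("ner")
ner.add_label("INGREDIENT")

TRAIN_DATA = [  # toy examples; spans are (start_char, end_char, label)
    ("I swapped the butter for olive oil.",
     {"entities": [(14, 20, "INGREDIENT"), (25, 34, "INGREDIENT")]}),
    ("Added extra garlic and it was great.",
     {"entities": [(12, 18, "INGREDIENT")]}),
]

optimizer = nlp.initialize()
for _ in range(20):  # a few passes over the toy data
    for text, annotations in TRAIN_DATA:
        example = Example.from_dict(nlp.make_doc(text), annotations)
        nlp.update([example], sgd=optimizer)

doc = nlp("This recipe needs more butter.")
print([(ent.text, ent.label_) for ent in doc.ents])
```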


The study aimed to identify emotions from food recipes, a challenging task because recipes rarely mention emotions explicitly. To address this, we used two datasets: a food review dataset that we created and the Amazon food recipes data. We used different feature extraction techniques to represent the recipes, such as bag-of-words, TF-IDF, word2vec, and one-hot sequences, as input to various machine learning models. The models trained on these features included BERT, fastText, HAN, RNN, BiRNN, AttBiRNN, CNN, and LSTM. These models were chosen because they are popular in sentiment analysis and text classification tasks and have shown good results in previous studies. The results showed that the BERT and BiRNN models performed best, achieving high accuracy scores on both datasets. We conclude that these models can effectively identify emotions from food recipes, and the study provides useful information for researchers and practitioners interested in this area.
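
A minimal Keras sketch of the BiRNN variant, assuming the recipes are already tokenized into padded integer sequences; the vocabulary size and number of emotion classes are placeholders.

```python
# Minimal sketch: bidirectional LSTM classifier over tokenized recipe text.
import tensorflow as tf

VOCAB_SIZE, NUM_CLASSES = 20000, 6  # placeholders

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(VOCAB_SIZE, 128),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(X_train, y_train, validation_split=0.1, epochs=5)
```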


This project addresses the problem of fake news detection through the application of big data analytics. To achieve this, a multi-layered design known as the Lambda architecture is employed, which can process both real-time and historical data. This ensures that the system can detect fake news as soon as it is published while also incorporating historical data to improve the accuracy of predictions. For machine learning prediction, the Logistic Regression algorithm is used. Batch processing is handled by Hadoop, while the speed layer is managed by PySpark, which enables real-time processing. Data ingestion is facilitated by Apache Kafka, which collects data from various sources and prepares it for processing. Finally, the serving layer is provided by Apache HBase, which stores and serves the processed data.
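
A minimal sketch of the speed layer in PySpark Structured Streaming, assuming a Kafka topic named "news" carrying JSON payloads with a "text" field; the topic name, servers, and schema are placeholders, and running it also requires the spark-sql-kafka connector package.

```python
# Minimal sketch: stream articles from Kafka into Spark for real-time scoring.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructType

spark = SparkSession.builder.appName("FakeNewsSpeedLayer").getOrCreate()
schema = StructType().add("text", StringType())

articles = (spark.readStream
            .format("kafka")
            .option("kafka.bootstrap.servers", "localhost:9092")
            .option("subscribe", "news")
            .load()
            .select(from_json(col("value").cast("string"), schema)
                    .alias("article")))

# In the full pipeline, the trained Logistic Regression model would score each
# micro-batch here before results are written to the serving layer (HBase).
query = articles.writeStream.format("console").outputMode("append").start()
query.awaitTermination()
```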


This study aimed to predict the likelihood of purchasing a product based on social network advertising, using age, gender, and salary as factors. The performance of three classification algorithms, K-Nearest Neighbors (KNN), Logistic Regression, and Naive Bayes, was compared. The results indicated that the KNN algorithm was the most accurate and effective for this prediction. These findings could be used to improve the targeting and efficacy of social network advertising by incorporating the KNN algorithm.
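
A minimal sketch of the KNN variant; the CSV name and column layout follow the common Kaggle "Social_Network_Ads" file and are assumptions here.

```python
# Minimal sketch: scale the features, then fit a 5-neighbor classifier.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("Social_Network_Ads.csv")   # assumed file name and layout
X = df[["Age", "EstimatedSalary"]].values    # gender could be one-hot encoded too
y = df["Purchased"].values

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25,
                                                    random_state=0)
scaler = StandardScaler()                    # KNN is distance-based: scale first
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
print("accuracy:", knn.score(X_test, y_test))
```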


A content-based recommendation system for movies was built using a movie dataset taken from Kaggle. Two popular techniques are used in this system: TF-IDF and bag-of-words (BoW) models. The goal of these techniques is to convert movie descriptions, titles, and genres into numerical vectors that can be compared and processed mathematically. Comparing the two, TF-IDF performs better in content-based recommendation systems because it considers both how important a word is in a document and how unique it is across the corpus, allowing the model to better capture the distinctive characteristics of each movie.
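
A minimal sketch of the TF-IDF similarity step, assuming a DataFrame with "title" and "description" columns; the file name and column names are placeholders.

```python
# Minimal sketch: TF-IDF vectors + cosine similarity for content-based ranking.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

movies = pd.read_csv("movies.csv")  # placeholder path
tfidf = TfidfVectorizer(stop_words="english")
matrix = tfidf.fit_transform(movies["description"].fillna(""))

# TF-IDF rows are L2-normalized, so the linear kernel equals cosine similarity.
similarity = linear_kernel(matrix, matrix)

def recommend(title, n=5):
    idx = movies.index[movies["title"] == title][0]
    best = similarity[idx].argsort()[::-1][1:n + 1]  # skip the movie itself
    return movies["title"].iloc[best]
```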


Twitter disclosed the identities of nearly 3,000 accounts believed to be linked to Russia's Internet Research Agency during the House Intelligence Committee's investigation into Russia's potential influence on the 2016 US Election. The agency, known for operating social media troll accounts, had its data removed from Twitter.com and the API after the accounts were suspended. However, a team at NBC News consisting of Ben Popken and EJ Fox compiled a dataset from the deleted information, revealing the actions of these troll accounts during crucial moments of the election. Our project aims to explore the links between these tweets and the 2016 US Election through analysis.


In this study, the 3D protein structures of 1HEW and 1BMF were analyzed. The results provided insights into the unique features and functional implications of these proteins. The analysis helped to understand the structural basis of protein-protein interactions and the role they play in cellular processes. This research contributes to the broader field of structural biology and will aid in the development of new therapeutic strategies.


In this study, we analyzed a dataset of sequences from eight different proteins (insulin, plastin, albumin, alpha-fetoprotein, afamin, vitamin D-binding protein, elastin, prolactin) and seven different animals (Phodopus roborovskii, Ursus americanus, Nannospalax galili, Lynx canadensis, Panthera tigris, Puma concolor, Pan paniscus) obtained from the NCBI BLAST tool. To understand the relationships between these sequences, we applied several clustering methods, including k-means, DBSCAN, hierarchical clustering, and k-modes. We also constructed phylogenetic trees from the sequences to infer their evolutionary relationships. Our results revealed that the different clustering methods produced distinct clusters and trees and allowed us to gain insights into the relationships between the sequences. This study demonstrates the utility of clustering and tree construction in understanding the relationships between sequences and can have applications in fields such as molecular biology and evolution.


In this project, we present the application of the Needleman-Wunsch algorithm for global sequence alignment, starting from its basics. We procured a dataset from the NCBI database in FASTA format, rich in information on homologous gene alignments and insulin sequence alignments. With this data as our foundation, we ran a series of experiments employing two distinct scoring functions, with the goal of uncovering the optimal alignment and its corresponding score.
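
A compact sketch of the algorithm itself; the match/mismatch/gap scores below are one possible scoring function, not necessarily those used in our experiments.

```python
# Minimal Needleman-Wunsch: fill the DP score matrix, then trace back.
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    n, m = len(a), len(b)
    # F[i][j] = best score aligning a[:i] with b[:j]
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = F[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            F[i][j] = max(diag, F[i - 1][j] + gap, F[i][j - 1] + gap)

    # Traceback from the bottom-right corner recovers one optimal alignment.
    out_a, out_b, i, j = [], [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + \
                (match if a[i - 1] == b[j - 1] else mismatch):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return F[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))

print(needleman_wunsch("GATTACA", "GCATGCU"))
```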


A simple web-based detection system built on the AWS cloud, which classifies given input text as hate speech or free speech. It is based on a classification machine learning model created using Amazon SageMaker. The definition and provisioning of the resources on the AWS cloud are done through an AWS CloudFormation template.


The coronavirus disease known as COVID-19 started in the city of Wuhan, China, and spread rapidly all over the world. As of March 2021, according to WHO statistics, approximately 120 million cases and 2.66 million deaths had been reported, which is extremely frightening. It has had a devastating effect on both community health and the worldwide economy. This study proposes a computer vision method that can detect COVID-19 from chest images. For this purpose, we use the Dense Convolutional Network (DenseNet) as the basic building block for the classification of COVID-19 computed tomography (CT) chest images.
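
A minimal Keras sketch of a DenseNet-based classifier, assuming chest images resized to 224x224 RGB; the binary COVID / non-COVID head and the ImageNet initialization are illustrative choices.

```python
# Minimal sketch: DenseNet121 backbone with a small binary classification head.
import tensorflow as tf

base = tf.keras.applications.DenseNet121(include_top=False,
                                         weights="imagenet",
                                         input_shape=(224, 224, 3),
                                         pooling="avg")
model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # COVID vs. non-COVID
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
# model.fit(train_images, train_labels, validation_split=0.1, epochs=10)
```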


Generative Adversarial Networks (GANs) produce high-quality images but are challenging to train. They need careful regularization, vast amounts of computing, and expensive hyper-parameter sweeps. We make significant headway on these issues by projecting generated and real samples into a fixed, pre-trained feature space. Motivated by the finding that the discriminator cannot fully exploit features from deeper layers of the pre-trained model, we propose a more effective strategy that mixes features across channels and resolutions. Our Projected GAN improves image quality, sample efficiency, and convergence speed. It is further compatible with resolutions of up to one megapixel and advances the Fréchet Inception Distance (FID) on twenty-two benchmark datasets. Importantly, Projected GANs match the previously lowest FIDs up to 40 times faster, cutting the wall-clock time from 5 days to less than 3 hours given the same computational resources.


The aim of this project is to introduce methods of dynamic optimization for problems in which additional constraints on control or state variables are imposed. The most important of these are constraints on control variables, as in practical applications control is usually bounded. Similarly, large values of state variables are often not feasible, as they might represent, e.g., the number of items that can be stored in a storage system of limited capacity or the value of electrical current that would burn out the circuit. The framework introduced in this exercise handles various types of constraints by adding appropriate penalty functions to the original function to be minimized.
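
To make the penalty idea concrete, here is a minimal sketch on a toy problem (not one of the exercise's control problems): a quadratic objective with a bound constraint enforced through a quadratic penalty whose weight is gradually increased.

```python
# Minimal sketch: minimize (x - 3)^2 subject to x <= 1 via a penalty term.
import numpy as np
from scipy.optimize import minimize

def f(x):                       # original objective
    return (x[0] - 3.0) ** 2

def penalized(x, rho):          # objective plus penalty for violating x <= 1
    violation = max(0.0, x[0] - 1.0)
    return f(x) + rho * violation ** 2

# Increasing the penalty weight pushes the minimizer toward the feasible set.
for rho in (1.0, 10.0, 100.0):
    res = minimize(penalized, x0=np.array([0.0]), args=(rho,))
    print(f"rho={rho:6.1f}  x*={res.x[0]:.4f}")
```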


The aim of the project was to build a model as a solution to the TensorFlow Speech Recognition Challenge on Kaggle. The competition's goal was to train a model that recognizes which word was said in a 1-second recording. We focused on three approaches: LSTM, CNN with LSTM, and bidirectional RNN with clipped ReLU. We implemented these approaches in different experiments, changing either the network architecture or the data preprocessing. In the end, we obtained 92% accuracy with LSTM.
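
A minimal sketch of the LSTM approach in Keras, assuming each 1-second clip has already been converted to a spectrogram-like matrix of shape (time_steps, n_features); the shapes and label count are placeholders.

```python
# Minimal sketch: stacked LSTM classifier over audio feature sequences.
import tensorflow as tf

TIME_STEPS, N_FEATURES, N_LABELS = 99, 40, 12  # placeholders

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(TIME_STEPS, N_FEATURES)),
    tf.keras.layers.LSTM(128, return_sequences=True),
    tf.keras.layers.LSTM(128),
    tf.keras.layers.Dense(N_LABELS, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_spectrograms, train_labels, epochs=20)
```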


One of the most well-known applications of machine learning is image recognition. Well-trained neural networks can recognize images with near-perfect accuracy. Ordinary neural networks, however, are not ideal for the job, because even a low-resolution image has hundreds, if not thousands, of pixels, each of which is a three-color combination. As a result, image recognition becomes a problem with hundreds of variables. Convolutional Neural Networks and architectures such as DenseNet, VGG, and ResNet have been built to handle such challenges. We built our networks to compete in Kaggle's CIFAR-10 - Object Recognition in Images competition. For our neural networks, we experimented with a variety of parameters, primarily depth and width. We also put a variety of strategies to the test in order to avoid overfitting.


The idea of the project is to create a web app and connect it with a model instance. Users provide information about their desired motorbike, and the website returns a quotation based on those features. The infrastructure workflow can be described by its components.


In this project, we implement five different feature selection techniques: ANOVA F-value between label and feature (SelectKBest), the FPR test (SelectFpr), family-wise error rate (SelectFwe), selection from a Random Forest Classifier model, and the kernel PCA method. For the prediction step, we experiment with several methods: Logistic Regression, Quadratic Discriminant Analysis, Linear Discriminant Analysis, Random Forest Classifier, Support Vector Machine, and XGBoost Classifier.
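
A minimal sketch of the five selectors on synthetic data; the k, alpha, and component counts are illustrative defaults, not our tuned settings.

```python
# Minimal sketch: apply each feature selection technique to the same matrix.
from sklearn.datasets import make_classification
from sklearn.decomposition import KernelPCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import (SelectFpr, SelectFromModel, SelectFwe,
                                       SelectKBest, f_classif)

X, y = make_classification(n_samples=200, n_features=30, random_state=0)

X_kbest = SelectKBest(f_classif, k=10).fit_transform(X, y)    # top-10 ANOVA F
X_fpr = SelectFpr(f_classif, alpha=0.05).fit_transform(X, y)  # FPR test
X_fwe = SelectFwe(f_classif, alpha=0.05).fit_transform(X, y)  # family-wise error

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
X_rf = SelectFromModel(rf, prefit=True).transform(X)          # importance-based

X_kpca = KernelPCA(n_components=10, kernel="rbf").fit_transform(X)
print(X_kbest.shape, X_fpr.shape, X_fwe.shape, X_rf.shape, X_kpca.shape)
```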


In this project, we implement different optimization techniques for logistic regression, namely IWLS (iteratively reweighted least squares), gradient descent (GD), stochastic gradient descent (SGD), and Adam, and experiment on four different datasets. Based on our experiments, gradient descent and stochastic gradient descent gave much better results.
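
A from-scratch sketch of the gradient descent variant on a synthetic binary problem; the learning rate and iteration count are illustrative, not our experimental settings.

```python
# Minimal sketch: gradient descent on the logistic regression log-loss.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logreg_gd(X, y, lr=0.1, n_iter=1000):
    X = np.hstack([np.ones((X.shape[0], 1)), X])    # prepend intercept column
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (sigmoid(X @ w) - y) / len(y)  # gradient of the log-loss
        w -= lr * grad
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)           # linearly separable toy labels
w = fit_logreg_gd(X, y)
preds = sigmoid(np.hstack([np.ones((200, 1)), X]) @ w) > 0.5
print("train accuracy:", (preds == y.astype(bool)).mean())
```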


As the COVID-19 pandemic rapidly spreads across the world, regrettably, misinformation and fake news related to COVID-19 have also spread remarkably, confusing people. To detect such COVID-19 misinformation, an effective detection method is needed to obtain more accurate information and help people and researchers easily differentiate between true and fake news. The goal of this project was to find capable methods and settings to help detect COVID-19 fake news. For this purpose, we used COVID-19 fake news data taken from Mendeley, which is labeled as fake or true news. In this work, we evaluated Multilayer Perceptron and Support Vector Machine techniques for this purpose. The statistical analysis of the results indicates that the Support Vector Machine shows good accuracy.


Classification is one of the most important approaches to machine learning, and the main task of machine learning is data analysis. Various algorithms are available for classification. In this project, we implement different machine learning classifiers to classify images based on their HOG features for tree species recognition, using scikit-learn for implementation. The dataset is very large, so we take only 10 label classes. The Support Vector Machine gave the best result.
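
A minimal sketch of the HOG + SVM pipeline; the random arrays stand in for the real grayscale tree images and species labels, and the HOG parameters are common defaults rather than our tuned values.

```python
# Minimal sketch: extract HOG descriptors, then fit an SVM on them.
import numpy as np
from skimage.feature import hog
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
images = rng.random((40, 64, 64))   # stand-in for real grayscale images
labels = rng.integers(0, 2, 40)     # stand-in for species labels

features = np.array([
    hog(img, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2))
    for img in images
])
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.2, random_state=0)

clf = SVC(kernel="rbf").fit(X_train, y_train)
print("accuracy:", clf.score(X_test, y_test))
```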


Image segmentation is the most significant job in many image processing systems, such as pattern recognition, image retrieval, and surveillance. The outcome of segmentation is mainly used for image content understanding and visual entity recognition through the identification of the region of interest. In this study, we used a leaf dataset for image segmentation. The dataset contains 300 leaf images along with ground-truth images, which we later use to measure accuracy by comparing the results using the intersection-over-union (IoU) method. The objective of this work is to develop efficient methods for the segmentation of leaf images.
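
A minimal sketch of the IoU computation used to score a predicted mask against its ground-truth mask; both masks are boolean arrays of the same shape.

```python
# Minimal sketch: IoU = |intersection| / |union| of the two binary masks.
import numpy as np

def iou(pred, truth):
    intersection = np.logical_and(pred, truth).sum()
    union = np.logical_or(pred, truth).sum()
    return intersection / union if union else 1.0  # both empty -> perfect match

pred = np.zeros((4, 4), dtype=bool);  pred[1:3, 1:3] = True   # 4 pixels
truth = np.zeros((4, 4), dtype=bool); truth[1:4, 1:4] = True  # 9 pixels
print(iou(pred, truth))  # 4 / 9 ≈ 0.44
```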


In this project, implemented entirely in Python, we did data preprocessing on the following data frames: Badges, Comments, PostLinks, Posts, Tags, Users, and Votes, taken from the Travel Stack Exchange site. We explored two alternative ways of data manipulation (SQLite and pandas), which are very different from one another, and found that SQL is easier to implement and understand for data manipulation.
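
A minimal sketch of the same aggregation done both ways (top users by post count); the file paths are placeholders, and the column names follow the Stack Exchange data dump schema.

```python
# Minimal sketch: one query expressed in SQLite and again in pandas.
import sqlite3
import pandas as pd

posts = pd.read_csv("Posts.csv")   # placeholder paths
users = pd.read_csv("Users.csv")

# SQLite version
conn = sqlite3.connect(":memory:")
posts.to_sql("Posts", conn, index=False)
users.to_sql("Users", conn, index=False)
sql_result = pd.read_sql_query("""
    SELECT Users.DisplayName, COUNT(*) AS PostCount
    FROM Posts JOIN Users ON Posts.OwnerUserId = Users.Id
    GROUP BY Users.Id
    ORDER BY PostCount DESC
    LIMIT 5
""", conn)

# Equivalent pandas version (grouping by display name for brevity)
pd_result = (posts.merge(users, left_on="OwnerUserId", right_on="Id")
                  .groupby("DisplayName").size()
                  .nlargest(5)
                  .reset_index(name="PostCount"))
```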


In this project, implemented entirely in R, we did data preprocessing on the following data frames: Badges, Comments, PostLinks, Posts, Tags, Users, and Votes, taken from the Travel Stack Exchange site. We explored four alternative ways of data manipulation (sqldf, base, dplyr, data.table), all of which are very different from one another. We found that dplyr has an advantage over the others due to its richer set of functions and simple language for manipulation, but when comparing execution time, data.table takes the lead.

Conversational AI chatbot development using Artificial Intelligence and Machine Learning techniques is an interesting problem in Natural Language Processing. In many research and development projects, scientists use AI, machine learning algorithms, and NLP techniques to develop conversational AI chatbots. The research and development of automated help desks and customer services through these conversation agents are still in progress and experimentation. Conversational AI chatbots are mostly deployed by financial organizations like banks and credit card companies, businesses like online retail stores, and startups; virtual agents are adopted by businesses ranging from very small startups to large corporations. There are many AI chatbot development frameworks available on the market, both program-based and interface-based, but they lack the accuracy and flexibility needed for developing real dialogues. Popular intelligent personal assistants include Amazon's Alexa, Microsoft's Cortana, and Google's Google Assistant. The functioning of these agents is limited, and retrieval-based agents are not aimed at holding conversations that emulate real human interaction. Many current chatbots are developed using rule-based techniques, simple machine learning algorithms, or retrieval-based techniques that do not generate good results. In this project, we have developed a conversational AI chatbot using modern techniques: an encoder-decoder architecture with an attention mechanism, where the encoder and decoder are Recurrent Neural Networks with LSTM (Long Short-Term Memory) cells.
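
A minimal Keras sketch of the encoder-decoder core (the attention layer is omitted here for brevity), assuming integer-encoded question/answer sequences; the vocabulary size and dimensions are placeholders.

```python
# Minimal sketch: LSTM encoder passes its states to an LSTM decoder.
import tensorflow as tf
from tensorflow.keras import layers

VOCAB, EMB, UNITS = 10000, 128, 256  # placeholders

enc_in = layers.Input(shape=(None,))
enc_emb = layers.Embedding(VOCAB, EMB)(enc_in)
_, state_h, state_c = layers.LSTM(UNITS, return_state=True)(enc_emb)

dec_in = layers.Input(shape=(None,))
dec_emb = layers.Embedding(VOCAB, EMB)(dec_in)
dec_out, _, _ = layers.LSTM(UNITS, return_sequences=True,
                            return_state=True)(dec_emb,
                                               initial_state=[state_h, state_c])
probs = layers.Dense(VOCAB, activation="softmax")(dec_out)

model = tf.keras.Model([enc_in, dec_in], probs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```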


In today's society, online social media sites like YouTube, Twitter, Facebook, and LinkedIn are very popular. People turn to social media to interact with other people, gain knowledge, share ideas, be entertained, and stay informed about events happening in the rest of the world. Among these sites, YouTube has emerged as the most popular website for sharing and viewing video content. However, such success has also attracted malicious users who aim to self-promote their videos or disseminate viruses and malware. These spam videos may be unrelated to their title or may contain pornographic content, so it is very important to find a way to detect and report them. In this project, we evaluated several top-performing classification techniques for this purpose. The statistical analysis of the results indicates that the Multilayer Perceptron and Support Vector Machine show good accuracy.


In this project, we used a deep convolutional neural network trained on the CIFAR-10 (Canadian Institute For Advanced Research) dataset to recognize multiple objects present in various images. The model recognizes 10 different objects: airplanes, cars, birds, cats, deer, dogs, frogs, horses, ships, and trucks. The dataset has 50,000 training images and 10,000 test images. After training for 100 epochs, the model achieved an accuracy of 0.8005 with a loss of 0.5659, and a validation accuracy of 0.8147 with a validation loss of 0.5372. We used Keras with the TensorFlow backend for the code. CIFAR-10 is a common benchmark in machine learning for image recognition, and the code in this directory demonstrates how to use TensorFlow to train and evaluate a convolutional neural network (CNN) on a GPU.
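
A minimal Keras sketch of such a CIFAR-10 CNN; the model we actually trained for 100 epochs had more capacity, so treat this as an outline.

```python
# Minimal sketch: small convolutional network trained on CIFAR-10.
import tensorflow as tf
from tensorflow.keras import layers

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0  # scale pixels to [0, 1]

model = tf.keras.Sequential([
    layers.Input(shape=(32, 32, 3)),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=3,  # a few epochs just to demonstrate
          validation_data=(x_test, y_test))
```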


We used a deep neural network trained on the MNIST dataset to classify images of handwritten digits and generate automatic predictions for images. The MNIST (Modified National Institute of Standards and Technology) dataset has a training set of 60,000 examples and a test set of 10,000 examples. After training for 5 epochs, the model achieved an accuracy of 0.9871 with a loss of 0.0404. We used Keras with the TensorFlow backend, and everything is implemented in a Jupyter notebook, which will hopefully make the code easier to understand.


In this project, we create fake images from real images using Deep Convolutional GANs (Generative Adversarial Networks). For creating the new images, we use the CIFAR-10 dataset. We use the PyTorch library and build the model (Generator and Discriminator) from scratch. Because of the long training time, we ran only one epoch and achieved more than 80% accuracy.


In this project, we made a system that detects objects in real time. We implement Single Shot Detection (SSD) together with OpenCV. SSD is a popular object detection algorithm that is generally faster than Faster R-CNN. We take a random 2-second video and apply our model to detect every object in it, marking each with a rectangle and its name.


In this project, we make a face recognition application using OpenCV libraries. The application detects your face with a green rectangle, your eyes with blue rectangles, and your smile with a red rectangle when you smile. We use different OpenCV Haar cascades: haarcascade_eye for detecting eyes, haarcascade_frontalface_default for face detection, and haarcascade_smile for detecting smiling faces.
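
A minimal sketch of the cascade setup on a single image; the file name is a placeholder, and the scale/neighbor parameters are common values rather than our exact tuning. Eyes and smiles are searched only inside each detected face region.

```python
# Minimal sketch: nested Haar cascade detection with colored rectangles.
import cv2

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
eye_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_eye.xml")
smile_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_smile.xml")

frame = cv2.imread("photo.jpg")             # placeholder file name
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

for (x, y, w, h) in face_cascade.detectMultiScale(gray, 1.3, 5):
    cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)   # green face
    roi_gray, roi = gray[y:y + h, x:x + w], frame[y:y + h, x:x + w]
    for (ex, ey, ew, eh) in eye_cascade.detectMultiScale(roi_gray, 1.1, 22):
        cv2.rectangle(roi, (ex, ey), (ex + ew, ey + eh), (255, 0, 0), 2)  # blue
    for (sx, sy, sw, sh) in smile_cascade.detectMultiScale(roi_gray, 1.7, 22):
        cv2.rectangle(roi, (sx, sy), (sx + sw, sy + sh), (0, 0, 255), 2)  # red

cv2.imwrite("detected.jpg", frame)
```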


Text summarization is the problem of creating a short, accurate, and fluent summary of a longer text document. You may encounter examples of text summarization every single day, and deep learning methods hold great promise for automating it. In this project, we take the full Wikipedia article on the Pakistan Super League and summarize it into a short paragraph using machine learning models.
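
As a rough illustration of the extractive approach, here is a minimal frequency-based sketch; fetching the Wikipedia article is omitted, and `text` is a placeholder for its body.

```python
# Minimal sketch: score sentences by word frequency, keep the top three.
import re
from collections import Counter
from heapq import nlargest

text = "..."  # the full Pakistan Super League article text goes here

sentences = re.split(r"(?<=[.!?])\s+", text)
freq = Counter(re.findall(r"[a-z']+", text.lower()))

def score(sentence):
    # Sum of word frequencies; keyword-rich sentences score higher.
    return sum(freq[w] for w in re.findall(r"[a-z']+", sentence.lower()))

summary = " ".join(nlargest(3, sentences, key=score))  # 3 top-scoring sentences
print(summary)
```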


Sentiment analysis deals with identifying and classifying opinions or sentiments expressed in source text. Social media generates a vast amount of sentiment-rich data in the form of tweets, status updates, blog posts, etc. Sentiment analysis of this user-generated data is very useful for knowing the opinion of the crowd. Twitter sentiment analysis is harder than general sentiment analysis due to the presence of slang words and misspellings, and the maximum number of characters allowed on Twitter is 140. The knowledge-base approach and the machine learning approach are the two strategies used for analyzing sentiment in text. In this project, we analyze Twitter posts about Google reviews using the machine learning approach.


Automated text classification has been considered a vital method for managing and processing the vast number of digital documents that are widespread and continuously increasing. In general, text classification plays an important role in information extraction and summarization, text retrieval, and question answering. In this project, we implement Logistic Regression to predict whether movie review comments are positive or negative, achieving 84.74% accuracy on the test set.


From Amazon product suggestions to Netflix movie recommendations, good recommender systems are very valuable in today's world, and the specialists who can create them are some of the top-paid data scientists on the planet. We work on a dataset that has exactly the same features as the Netflix dataset: plenty of movies and thousands of users who have rated the movies they watched. The ratings go from 1 to 5, exactly as in the Netflix dataset, which makes the recommender system more complex to build than if the ratings were simply "Liked" or "Not Liked". Our final recommender system is able to predict the ratings of the movies the customers didn't watch. Accordingly, by ranking the predictions from 5 down to 1, our deep learning model is able to recommend which movies each user should watch. We implement a Restricted Boltzmann Machine and a Stacked Autoencoder from scratch using PyTorch. The list of movies is explicit, so you simply need to rate the movies you have already watched, input your ratings into the dataset, execute the model, and voilà! The recommender system will tell you exactly which movies you would love the next time you are out of ideas for what to watch on Netflix!
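
A minimal PyTorch sketch of the Stacked Autoencoder half, assuming the ratings are arranged as one vector per user (one entry per movie, 0 for unrated); the layer sizes and the MovieLens-style movie count are illustrative.

```python
# Minimal sketch: autoencoder that reconstructs a user's rating vector.
import torch
import torch.nn as nn

class SAE(nn.Module):
    def __init__(self, n_movies):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_movies, 20), nn.Sigmoid(),
                                     nn.Linear(20, 10), nn.Sigmoid())
        self.decoder = nn.Sequential(nn.Linear(10, 20), nn.Sigmoid(),
                                     nn.Linear(20, n_movies))

    def forward(self, x):
        return self.decoder(self.encoder(x))

n_movies = 1682                        # e.g. the MovieLens 100k movie count
sae = SAE(n_movies)
optimizer = torch.optim.RMSprop(sae.parameters(), lr=0.01, weight_decay=0.5)
criterion = nn.MSELoss()

ratings = torch.rand(1, n_movies)      # stand-in for one user's rating vector
optimizer.zero_grad()
loss = criterion(sae(ratings), ratings)  # reconstruct the known ratings
loss.backward()
optimizer.step()
print(float(loss))
```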


According to a recent report published by Markets & Markets, the fraud detection and prevention market will be worth $33.19 billion USD by 2021. This is a huge industry, and the demand for advanced deep learning skills is only going to grow. We use an unsupervised deep learning model, the Self-Organizing Map (SOM). The business challenge here is detecting fraud in credit card applications. We create a deep learning model for a bank, given a dataset that contains information on customers applying for an advanced credit card, i.e., the data customers provided when filling out the application form. We detect potential fraud within these applications: by the end of the challenge, we build a Self-Organizing Map that produces an explicit list of customers who potentially cheated on their applications.
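
A minimal sketch using the MiniSom library (one common SOM implementation); the grid size, training length, and the random stand-in for the scaled application features are all illustrative.

```python
# Minimal sketch: train a SOM and flag cells far from their neighbors.
import numpy as np
from minisom import MiniSom

rng = np.random.default_rng(0)
X = rng.random((690, 15))           # stand-in for the scaled applications

som = MiniSom(x=10, y=10, input_len=X.shape[1], sigma=1.0, learning_rate=0.5)
som.random_weights_init(X)
som.train_random(X, num_iteration=100)

# Cells with a large mean inter-neuron distance are outliers -> fraud suspects.
distances = som.distance_map()
suspect_cells = np.argwhere(distances > 0.9)
print(suspect_cells)
```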


In this project, we create one of the most powerful deep learning models, one we would even call the closest to "Artificial Intelligence". Why? Because this model has long-term memory, just like us humans. The branch of deep learning which facilitates this is Recurrent Neural Networks. Classic RNNs have short memory and were neither popular nor powerful for this exact reason, but a major improvement, the LSTM (Long Short-Term Memory) RNN, has completely changed the playing field. In this part, we implement this powerful model and take on the challenge of using it to predict the real Google stock price; a similar challenge has already been faced by researchers at Stanford University, and we aim to do at least as well as they did. We trained for 200 epochs and achieved more than 85% accuracy in predicting the Google stock price; when we compare our result with the real price, our predicted line closely follows it.


Convolutional Neural Networks (CNNs) have been successfully applied to image classification problems. Although powerful, they require a large amount of memory. The purpose of this project is to classify CIFAR-10 images. The CIFAR-10 dataset (Canadian Institute for Advanced Research, 10 classes) is a subset of the Tiny Images dataset and consists of 60,000 32x32 color images. For this task, we implement the VGG model, which is an extension of the CNN. Our experimental analysis shows that our model obtains 86.07% image classification accuracy.


The Titanic disaster occurred 110 years ago, on April 15, 1912, killing about 1,500 passengers and crew members. The fateful incident still compels researchers and analysts to understand what could have led to the survival of some passengers and the demise of others. Using machine learning methods and a dataset consisting of 891 rows in the train set and 418 rows in the test set, this research attempts to determine the correlation between factors such as age, sex, passenger class, and fare and the chance of survival of the passengers. These factors may or may not have impacted the survival rates. In this project, Artificial Neural Networks have been implemented to predict the survival of passengers, and this work compares the results of Keras and scikit-learn based on the percentage of accuracy on a test dataset.


Giving too much information to a model is a problem, especially because of overfitting. What do we do? We can give only relevant information to the model, which overcomes this issue. In this project, we use the Wine dataset taken from UCI. The data are the results of a chemical analysis of wines grown in the same region of Italy but derived from three different cultivars; the analysis determined the quantities of 13 constituents found in each of the three types of wine. The initial dataset had around 30 variables, but for some reason we only have the 13-dimensional version. To reduce the dimensionality of this dataset, we implement Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) and obtain exceptionally satisfactory results.
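
A minimal sketch of the two projections on the Wine data, which also ships with scikit-learn; both reduce the 13 features to 2 components.

```python
# Minimal sketch: unsupervised PCA vs. supervised LDA on the same data.
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
X = StandardScaler().fit_transform(X)   # scale before projecting

X_pca = PCA(n_components=2).fit_transform(X)                    # ignores labels
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)  # uses labels

print(X_pca.shape, X_lda.shape)  # (178, 2) (178, 2)
```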


In this project, we implement k-means and hierarchical clustering techniques, choosing the optimal number of clusters in order to predict customer behavior based on spending scores in malls. We take the Mall Customers dataset from Kaggle, which contains mall customer spending scores with five attributes and 200 instances: Customer ID (unique per customer), Gender (male/female), Age (between 19 and 70), Annual Income in k$ (ranging from very low to very high), and Spending Score (how much the customer spends at the mall). We choose five clusters based on the optimal value from the elbow method and the dendrogram. The conclusions per cluster are as follows: the first cluster represents sensible customers, whose income is low and whose spending score is also low; the second cluster represents customers with average income who spend an average amount; the third cluster represents careful customers, whose income is high but who spend little in the mall; the fourth cluster represents careless customers, whose income is low but whose spending score is high; and the last cluster is the target: customers whose income is high and whose spending score is also high. In this way we can easily find our target cluster using clustering techniques.
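
A minimal sketch of the elbow search on the two spending features; the file and column names follow the usual Kaggle Mall Customers layout and are assumptions here.

```python
# Minimal sketch: run k-means for k = 1..10 and inspect the inertia curve.
import pandas as pd
from sklearn.cluster import KMeans

df = pd.read_csv("Mall_Customers.csv")  # assumed file name
X = df[["Annual Income (k$)", "Spending Score (1-100)"]].values

inertias = []
for k in range(1, 11):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias.append(km.inertia_)         # within-cluster sum of squares

# The "elbow" of this curve (here, k = 5) picks the cluster count.
for k, inertia in enumerate(inertias, start=1):
    print(k, round(inertia))
```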


The recommendation system is one of the most popular applications of machine learning, attracting many researchers all over the globe. The advent of the Internet era has brought the wide implementation of recommendation systems into our everyday lives. In this project, we implement an association-rule technique called the FP-Growth algorithm to build a recommendation system for movies.
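
A minimal sketch using mlxtend's FP-Growth implementation, assuming the viewing history is one-hot encoded (one row per user, one boolean column per movie); the toy matrix and thresholds are illustrative.

```python
# Minimal sketch: mine frequent itemsets with FP-Growth, then derive rules.
import pandas as pd
from mlxtend.frequent_patterns import association_rules, fpgrowth

baskets = pd.DataFrame({         # toy stand-in for the real watch matrix
    "Matrix":    [1, 1, 0, 1, 1],
    "Inception": [1, 1, 1, 0, 1],
    "Titanic":   [0, 1, 1, 0, 0],
}).astype(bool)

itemsets = fpgrowth(baskets, min_support=0.4, use_colnames=True)
rules = association_rules(itemsets, metric="confidence", min_threshold=0.7)
print(rules[["antecedents", "consequents", "confidence"]])
```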


With the growth of internet shopping, the amount of user data generated is increasing day by day. In this project, a shopping recommendation system is constructed. For this purpose, we take a Market Basket Optimization dataset from the SuperDataScience website. This dataset contains 7,500 transactions of items such as eggs, pizza, mint, green tea, milk, and soup. To build a recommendation system for customers based on previous purchase history, we implement the Apriori algorithm.


Diabetes is a chronic illness: a group of metabolic diseases marked by a high level of sugar in the blood over a prolonged period. The risk factors and severity of diabetes can be reduced significantly if a precise early prediction is possible. Robust and accurate prediction of diabetes is highly challenging due to the limited number of labeled data and the presence of outliers (or missing values) in diabetes datasets. In this project, we implement a Support Vector Machine to predict whether a patient has diabetes. SVM is a robust model for separating binary classes, and in the end we obtained 82% accuracy.


Bike sharing is an increasingly popular part of urban transportation systems, and accurate demand prediction is the key to supporting timely re-balancing and ensuring service efficiency. The dataset, taken from Kaggle, contains bike-sharing information with 16 attributes and 730 records. In this project, we implement a Support Vector Regression model using scikit-learn to predict bike-sharing demand.


Accurate corrections for ionospheric total electron content (TEC) and early warning information are crucial for global navigation satellite system (GNSS) applications under the influence of space weather. In this project, we implement Logistic Regression to build a short-term ionospheric prediction model. The target variable is a binary class, either "good" or "bad": "good" radar returns are those showing evidence of some type of structure in the ionosphere, while "bad" returns are those that do not, their signals passing through the ionosphere. In the end, we obtained 87% accuracy.


Banknotes are the currency that every country uses to conduct financial activities, and every country wants its banknotes to be genuine. Many counterfeiters put fake banknotes on the market that look identical to the original ones, so there is a need for an efficient authentication system that accurately predicts whether a given note is real or not. In this project, we implement k-Nearest Neighbors (kNN) to predict whether a given note is fake or real. We trained the model with 5 neighbors and achieved 100% accuracy.


In this project, we implement a Random Forest Regression model using scikit-learn to predict usage of the Montreal bike lanes. The dataset contains information about the number of bicycles that used certain bicycle lanes in Montreal in 2015. In the end, we obtained an R-squared score of 0.9646.


In the current era, heart failure (HF) is one of the common diseases that can lead to a dangerous situation; every year almost 26 million patients are affected by this kind of disease. From the heart consultant's and surgeon's point of view, it is difficult to predict heart failure at the right time. Fortunately, classification and prediction models can aid the medical field and illustrate how to use medical data efficiently. In this project, we implement a Random Forest Classifier model to predict heart failure. We trained the model with max depth = 12 and achieved 86% accuracy.


Predictive analytics for healthcare using machine learning is a challenging task that helps doctors decide on the exact treatments for saving lives. Chronic kidney disease prediction is one of the most critical issues in healthcare analytics, and prediction in the medical field is among the most interesting and challenging tasks in day-to-day life. In this project, we implement a Decision Tree Classifier model for predicting chronic kidney disease using clinical data. In the end, we obtained 96% accuracy, which is very competitive for detection and treatment.


The New York City Department of Transportation collects daily data about the number of bicycles going over bridges in New York City, which is used to measure bike utilization as part of transportation planning. This dataset is a daily record of the number of bicycles crossing into or out of Manhattan via one of the East River bridges over a stretch of 9 months. A count of the number of bicycles on each of the bridges in question is provided on a day-by-day basis, along with information on maximum and minimum temperature and precipitation. To solve this problem, we implement a Decision Tree Regression model and obtain an R-squared score of 0.9898.


Cancer is the second leading cause of death in the world; 8.8 million patients died of cancer in 2015, and breast cancer is the leading cause of cancer death among women. Much research has been done on the early detection of breast cancer in order to start treatment early and increase the chance of survival. In this project, we implement a Naïve Bayes classifier for breast cancer prediction and obtain 87% accuracy, which is quite competitive and can support detection and treatment.


This project predicts salary from years of experience as a regression task. The dataset has two attributes: 'X', the years of experience, and 'Y', the salary, which is also our target variable. To predict the salary, we implement a simple Linear Regression model with the help of scikit-learn. The model is trained, and we obtained an R-squared value of 0.9864.
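
A minimal sketch of the whole pipeline; the CSV name and column labels follow the common "Salary_Data.csv" layout and are assumptions here.

```python
# Minimal sketch: fit a one-feature linear regression and report R-squared.
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("Salary_Data.csv")       # assumed file name and layout
X = df[["YearsExperience"]].values
y = df["Salary"].values

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=0)
model = LinearRegression().fit(X_train, y_train)
print("R^2:", r2_score(y_test, model.predict(X_test)))
```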