In this project, we extracted terms closely associated with the term "Pedestrian" and filtered them using various word2vec models. After filtering, we applied a semantic analysis to verify that each remaining term's meaning was close to that of a pedestrian. Finally, we augmented existing datasets with images of these terms to improve the pedestrian detection system.
Barzamini, Hamed, Murtuza Shahzad, Hamed Alhoori, and Mona Rahimi. "A multi-level semantic web for hard-to-specify domain concept, Pedestrian, in ML-based software." Requirements Engineering (2022): 1-22.
Techniques used - Data Preprocessing, Deep Learning, Neural Networks, and Computer Vision.
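The word2vec filtering step above can be sketched as a cosine-similarity check over term embeddings. This is a minimal illustration, not the project's actual pipeline: the toy vectors and the 0.6 threshold below are hypothetical stand-ins for vectors produced by a trained word2vec model.

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity between two embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def filter_related_terms(target_vec, candidates, threshold=0.6):
    # Keep candidate terms whose embedding points in roughly the same
    # direction as the target term's embedding.
    return [term for term, vec in candidates.items()
            if cosine_similarity(target_vec, vec) >= threshold]

# Toy embeddings standing in for vectors from a trained word2vec model.
target = np.array([1.0, 0.0, 0.5])          # "pedestrian"
candidates = {
    "walker":  np.array([0.9, 0.1, 0.4]),   # similar direction -> kept
    "vehicle": np.array([-1.0, 0.2, 0.0]),  # dissimilar -> filtered out
}
print(filter_related_terms(target, candidates))  # ['walker']
```

In practice a library such as gensim exposes this directly (e.g. `most_similar` on a trained model); the point here is only the similarity-threshold idea.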
In this project, the research question was how the perception of a concept ("Pedestrian") changes over a timespan. For this purpose, we collected tweets related to pedestrian accidents caused by autonomous vehicles. We filtered this data and extracted the terms of significant importance to the accident event. Significance was calculated statistically: a term was flagged if its probability of occurrence fell more than three standard deviations from the mean.
Techniques used - Text Processing, Data Analysis, and Statistics.
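The three-standard-deviation rule above can be sketched as a simple outlier test over term counts. The counts and term names below are hypothetical examples, not the project's data:

```python
import statistics

def significant_terms(term_counts, k=3.0):
    # Flag terms whose count lies more than k standard deviations
    # from the mean count across all terms.
    counts = list(term_counts.values())
    mu = statistics.mean(counts)
    sigma = statistics.pstdev(counts)
    return [t for t, c in term_counts.items() if abs(c - mu) > k * sigma]

# Hypothetical counts: most terms appear ~10 times, one spikes after the event.
counts = {f"term_{i}": 10 for i in range(19)}
counts["collision"] = 200
print(significant_terms(counts))  # ['collision']
```

Note that with very few terms no point can exceed three population standard deviations, so the test is only meaningful over a reasonably large vocabulary.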
Online software repositories often introduce bugs that lead to vulnerable code, and the fixed version of the code may, in turn, introduce a new vulnerability. In this project, I extract the vulnerability-introducing code from different GitHub repositories. After extraction, I analyze the code and convert it to vectors using CodeBERT. I then build ML models that predict whether a future vulnerability will occur, given the type of bug-fixed code.
Techniques used - Classification, Natural Language Processing, CodeBERT, Software Engineering.
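The classification stage can be illustrated with a minimal sketch. The two-dimensional vectors below are toy stand-ins for CodeBERT embeddings (which are actually 768-dimensional), and the nearest-centroid rule is just a simple example classifier, not the models used in the project:

```python
import numpy as np

# Toy stand-ins for CodeBERT embeddings of bug-fix commits (hypothetical data);
# label 1 means the fix later introduced a new vulnerability.
train_vecs = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]])
train_labels = np.array([1, 1, 0, 0])

def predict(vec):
    # Nearest-centroid classifier: assign the label whose class mean
    # (centroid) lies closest to the embedding of the new bug fix.
    centroids = {label: train_vecs[train_labels == label].mean(axis=0)
                 for label in np.unique(train_labels)}
    return min(centroids, key=lambda lbl: np.linalg.norm(vec - centroids[lbl]))

print(predict(np.array([0.85, 0.15])))  # 1 -> likely to introduce a new vulnerability
```

In a real pipeline the embeddings would come from running the code tokens through a pretrained CodeBERT model, and the classifier would be trained on many labelled fixes.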
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Altmetrics have been proposed as a complement to scholarly metrics such as citations. They form a growing area of interest that intends to measure the societal impact of research based on the dissemination of research outcomes via multiple social media platforms such as Facebook and Twitter, reference managers such as Mendeley, and information sources such as Wikipedia, online news outlets, blogs, and other peer-review websites.
In this project, we used altmetrics to predict the citations a scholarly publication could receive. I built various classification and regression models and evaluated their performance; tree-based models performed best in classification. We found that Mendeley readership, publication age, post length, maximum followers, and academic status were the most important factors in predicting citations.
Techniques used - Data Preprocessing, Classification, Regression.
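Since tree-based models performed best here, the core mechanism can be sketched with a single decision stump, the building block such trees are grown from. The feature rows and thresholds below are hypothetical altmetric values, not the project's dataset:

```python
def best_stump(rows, labels):
    # Search every (feature, threshold) split and keep the most accurate one:
    # a single decision stump, the unit a tree-based model is built from.
    best = (0.0, None, None)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            preds = [1 if r[f] > t else 0 for r in rows]
            acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
            if acc > best[0]:
                best = (acc, f, t)
    return best  # (accuracy, feature index, threshold)

# Hypothetical rows: [mendeley_readers, publication_age_years]
rows = [[5, 1], [8, 2], [120, 3], [200, 5]]
labels = [0, 0, 1, 1]  # 1 = highly cited
print(best_stump(rows, labels))  # (1.0, 0, 8): split on Mendeley readers > 8
```

Libraries such as scikit-learn compose many such splits (with impurity criteria rather than raw accuracy) into full decision trees and random forests.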
In this project, I used the Altmetrics dataset to build various models that predict the long-term impact of an article on various online platforms. I clustered the research articles by publication year with respect to their citation counts, and on each cluster I built machine learning and deep learning models to predict whether an article received more than the median number of citations. Through a detailed analysis of the results, I found that Mendeley counts are the key factor in determining the long-term online impact of an article, with policy counts also contributing strongly. Random Forest and Bernoulli Naive Bayes classifiers outperformed the other classifiers for this prediction.
Techniques used - Classification, Clustering, and Neural Networks.
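The prediction target described above can be sketched as follows: group articles, then label each one by whether it beats its group's median citation count. This simplification groups only by publication year (the project clustered by year with respect to citation counts), and the articles below are hypothetical:

```python
import statistics
from collections import defaultdict

def label_by_year_median(articles):
    # Group articles by publication year, then label each article 1 if its
    # citation count exceeds the median of its group (the prediction target).
    by_year = defaultdict(list)
    for art in articles:
        by_year[art["year"]].append(art["citations"])
    return {art["id"]: int(art["citations"] > statistics.median(by_year[art["year"]]))
            for art in articles}

# Hypothetical articles from a single publication-year cluster.
arts = [
    {"id": "a", "year": 2015, "citations": 3},
    {"id": "b", "year": 2015, "citations": 40},
    {"id": "c", "year": 2015, "citations": 10},
]
print(label_by_year_median(arts))  # {'a': 0, 'b': 1, 'c': 0}
```

These 0/1 labels would then serve as the training target for the per-cluster classifiers.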
The goal of this project was to provide authors with the sentiment score their research article would receive after publication. For this purpose, I used the Facebook reactions feature and Twitter sentiment analysis. This research was very interesting, as the outcome helps authors predict the community's reaction before formally submitting a paper for peer review or final approval. Random Forest and Naive Bayes classifiers were best at predicting the Facebook reaction ("Like", "Love", "Haha", "Wow", "Sad", or "Angry") an article would receive.
Techniques used - Classification, NLP and Neural Networks.
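Since Naive Bayes performed well here, a from-scratch multinomial Naive Bayes sketch shows the idea: score each reaction class by word likelihoods and pick the best. The training posts and labels below are invented for illustration only:

```python
from collections import Counter
import math

# Hypothetical posts labelled with the dominant Facebook reaction they drew.
train = [
    ("great breakthrough amazing result", "Love"),
    ("amazing wonderful study", "Love"),
    ("tragic finding sad outcome", "Sad"),
    ("sad loss reported", "Sad"),
]

def train_nb(data):
    # Count word frequencies per class for multinomial Naive Bayes.
    word_counts, class_counts = {}, Counter()
    for text, label in data:
        class_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.split())
    return word_counts, class_counts

def predict(text, word_counts, class_counts):
    vocab = {w for c in word_counts.values() for w in c}
    scores = {}
    for label, counts in word_counts.items():
        total = sum(counts.values())
        # Log prior plus Laplace-smoothed log likelihoods of each word.
        score = math.log(class_counts[label] / sum(class_counts.values()))
        for w in text.split():
            score += math.log((counts[w] + 1) / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

model = train_nb(train)
print(predict("amazing result", *model))  # 'Love'
```

A real model would be trained on far more posts and would predict all six reaction classes rather than two.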
Working at Infosys Limited was a great experience. I was exposed to real-world project development and its environment, and the opportunity to interact with people from diverse backgrounds was in itself a huge learning experience.
As a trainee at Infosys, I developed a flight and hotel booking application, taking it from requirements gathering through all phases of the waterfall model and providing users with various booking options (both flights and hotels).
Technologies Used - Python and SQL.
In this project, I developed a shopping web application called "JCart". I analyzed business requirements, added validations, and wrote JUnit test cases covering all phases of test-driven development.
Technologies Used - SQL, JSF, and Hibernate Framework.
Bristow is a helicopter services company supporting operations such as search and rescue (SAR) and oil and gas. The Flight project catered to the operations required to provide these helicopter services: pre-flight, flight, and post-flight. I developed the back-end application for calculating the helicopter's Center of Gravity (CoG), so that baggage and freight could be adjusted accordingly. In addition, I developed the admin part of the system, where the admin has privileges to grant system access to various helicopter crew members based on their designation (pilot, co-pilot, crew).
Technologies Used - HTML, CSS, jQuery, Spring, and Hibernate framework.
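The CoG calculation behind that back-end service is a standard weight-and-balance formula: the total moment (weight times arm from the reference datum) divided by the total weight. The station weights and arms below are hypothetical figures for illustration, not Bristow data, and the sketch is in Python rather than the project's Java stack:

```python
def center_of_gravity(stations):
    # CoG = sum(weight * arm) / sum(weight), where the arm is each station's
    # distance from the reference datum. Stations are (weight, arm) pairs.
    total_weight = sum(w for w, _ in stations)
    total_moment = sum(w * arm for w, arm in stations)
    return total_moment / total_weight

# Hypothetical stations (weight in kg, arm in metres):
# airframe, crew, baggage, freight.
stations = [(2200, 3.0), (240, 2.0), (80, 5.0), (150, 4.5)]
print(round(center_of_gravity(stations), 3))  # 3.054
```

In the real system, the computed CoG is checked against the aircraft's permitted envelope, and baggage or freight is moved between stations until the value falls inside it.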