Our main aim is to analyze visa applicant data, build a predictive model that supports the visa approval process, and, based on the factors that significantly influence visa status, recommend applicant profiles for which a visa should be certified or denied.
Skills & Tools Covered
Skills Covered:
EDA
Data Preprocessing
Customer Profiling
Bagging Classifiers (Bagging and Random Forest)
Boosting Classifiers (AdaBoost, Gradient Boosting, XGBoost)
Stacking Classifier
Hyperparameter Tuning using GridSearchCV
Business insights
Tools Used:
Python: Jupyter Notebook
Libraries: NumPy, Pandas, Matplotlib, Seaborn, scikit-learn.
Executive Summary
U.S. businesses struggle to find skilled workers, and the current visa application process is overloaded. OFLC seeks an efficient way to identify promising foreign worker candidates.
EasyVisa proposes a machine learning model to analyze visa applications and predict approval likelihood. This will expedite processing and help OFLC prioritize applications with higher approval chances.
Benefits:
Faster visa processing for qualified candidates.
Improved talent acquisition for U.S. businesses.
Data-driven recommendations for visa approvals and denials.
EasyVisa's expertise will leverage existing visa application data to build a model that identifies key factors influencing approval decisions. This will inform OFLC's process and optimize U.S. talent acquisition.
Problem statement
In the United States, businesses are having a hard time finding qualified workers to fill their open positions.
To address this challenge, companies are able to hire talented individuals from other countries under the Immigration and Nationality Act.
This law helps to protect US workers' rights and makes sure that they are offered fair wages.
The Office of Foreign Labor Certification (OFLC) is responsible for managing these programs, and they make sure that US workers are given priority while also meeting the needs of businesses.
We aim to train various machine learning models on historical data from the Office of Foreign Labor Certification (OFLC), considering factors like applicant skills, industry demand, and potential wage impact.
By combining predictions from these models, we can create a more robust system for identifying suitable foreign candidates while ensuring US worker rights and fair wages.
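One way to combine predictions from several models is stacking, which the skills list above names. The following is a minimal sketch on synthetic data, not the project's actual pipeline: the base learners, their settings, and the `make_classification` data are all assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the encoded visa data (assumption).
X, y = make_classification(n_samples=300, n_features=8, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1
)

# Base learners produce out-of-fold predictions; a logistic regression
# then learns how to weight them.
stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=50, random_state=1)),
        ("gb", GradientBoostingClassifier(n_estimators=50, random_state=1)),
    ],
    final_estimator=LogisticRegression(),
    cv=3,
)
stack.fit(X_train, y_train)
test_accuracy = stack.score(X_test, y_test)
print(f"Stacked test accuracy: {test_accuracy:.3f}")
```

The final estimator only sees the base models' cross-validated predictions, which limits the risk of it simply memorizing their training-set fit.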
About 77% of all employees hold a Bachelor's or Master's degree.
More than half of the employees have prior job experience.
Only 11.6% of employees need training before starting the job.
The Northeast and South regions together account for more than half of the employees.
Doctorate and Master's candidates are the most likely to be certified.
Europe and Africa have the highest certification rates among the continents.
Prevailing wage does not appear to affect case status: certified and denied applicants have almost the same mean prevailing wage, around 55,000.
The South has a high prevailing wage and likely a high number of certified candidates.
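Observations like the ones above typically come from simple group-wise summaries. The sketch below shows the idea on a tiny hand-made table; the column names (`case_status`, `region_of_employment`, `prevailing_wage`) and all values are assumptions for illustration, not the project's data.

```python
import pandas as pd

# Hypothetical miniature of the visa dataset (names and values are assumed).
df = pd.DataFrame({
    "case_status": ["Certified", "Denied", "Certified", "Denied", "Certified"],
    "region_of_employment": ["South", "Northeast", "South", "West", "Northeast"],
    "prevailing_wage": [60000.0, 52000.0, 58000.0, 54000.0, 50000.0],
})

# Mean prevailing wage for each case status.
mean_wage_by_status = df.groupby("case_status")["prevailing_wage"].mean()

# Share of certified applications within each region.
certified_share_by_region = (
    df["case_status"].eq("Certified").groupby(df["region_of_employment"]).mean()
)

print(mean_wage_by_status)
print(certified_share_by_region)
```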
We want to predict which visa applications will be certified.
Before we proceed to build a model, we'll have to encode the categorical features.
We'll split the data into train and test sets so that the model built on the train data can be evaluated on unseen data.
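The encoding and splitting steps above can be sketched as follows. This is a minimal example under stated assumptions: the column names, the ten sample rows, the 70/30 split ratio, and the choice of one-hot encoding with `drop_first=True` are illustrative and not confirmed by the project.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical sample rows (column names and values are assumed).
df = pd.DataFrame({
    "continent": ["Asia", "Europe", "Africa", "Europe", "Asia",
                  "Asia", "Europe", "Africa", "Asia", "Europe"],
    "has_job_experience": ["Y", "N", "Y", "Y", "N", "Y", "Y", "N", "Y", "N"],
    "prevailing_wage": [59000.0, 41000.0, 72000.0, 65000.0, 38000.0,
                        80000.0, 55000.0, 47000.0, 61000.0, 53000.0],
    "case_status": ["Certified", "Denied", "Certified", "Certified", "Denied",
                    "Certified", "Certified", "Denied", "Certified", "Certified"],
})

# Target: 1 = Certified, 0 = Denied (assumed encoding).
y = (df["case_status"] == "Certified").astype(int)

# One-hot encode categoricals; drop_first avoids a redundant reference column.
X = pd.get_dummies(df.drop(columns="case_status"), drop_first=True, dtype=int)

# Stratify so both sets keep roughly the same certified/denied ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=1
)
print(X.columns.tolist())
print(X_train.shape, X_test.shape)
```

With `drop_first=True`, the dropped category of each feature (for example `continent_Africa` here) becomes the reference group against which the remaining dummy columns are interpreted.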
Model predicts that the visa application will get certified but in reality, the visa application should get denied.
Model predicts that the visa application will not get certified but in reality, the visa application should get certified.
Which case is more important?
Both the cases are important as:
If a visa is certified when it should have been denied, the wrong employee gets the position while U.S. citizens miss the opportunity to work in it.
If a visa is denied when it should have been certified, the U.S. loses a suitable human resource who could contribute to the economy.
How to reduce the losses?
The F1 score can be used as the evaluation metric: the higher the F1 score, the better the model keeps both false negatives and false positives low. We will use balanced class weights so that the model focuses equally on both classes.
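To make the metric concrete, the sketch below computes F1 from a confusion matrix and shows how balanced class weights are derived. The labels are made up for illustration, and the encoding (1 = certified, 0 = denied) is an assumption.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score
from sklearn.utils.class_weight import compute_class_weight

# Made-up labels; 1 = certified, 0 = denied (assumed encoding).
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 1])
y_pred = np.array([1, 1, 0, 1, 1, 0, 0, 1])

# For binary labels, ravel() yields tn, fp, fn, tp in that order.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
precision = tp / (tp + fp)
recall = tp / (tp + fn)

# F1 is the harmonic mean of precision and recall.
f1_manual = 2 * precision * recall / (precision + recall)
print(f1_manual, f1_score(y_true, y_pred))

# "balanced" weights are n_samples / (n_classes * class_count),
# so the rarer class receives the larger weight.
weights = compute_class_weight("balanced", classes=np.array([0, 1]), y=y_true)
print(dict(zip([0, 1], weights)))
```

Many scikit-learn classifiers accept `class_weight="balanced"` directly, which applies the same formula during fitting.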
prevailing_wage:
The odds ratio is approximately 1.000001, indicating that each unit increase in prevailing wage raises the odds of the modeled outcome by a very small amount. The reported percentage change in odds is approximately 0.0085%, a very slight increase per unit of prevailing wage.
Continent_Europe:
The odds ratio is approximately 2.117, indicating that the odds of the modeled outcome are about 2.117 times higher for applicants from Europe than for applicants from the reference continent. The percentage change in odds is approximately 111.71%, a substantial increase relative to the reference group.
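The relationship between a logistic regression coefficient, its odds ratio, and the percentage change in odds can be sketched as below. The coefficient value is hypothetical: it is back-derived from an odds ratio of 2.1171 purely to illustrate the arithmetic, and the helper names are not from the project.

```python
import math


def odds_ratio(coef: float) -> float:
    """Odds ratio for a one-unit increase in the feature."""
    return math.exp(coef)


def pct_change_in_odds(coef: float) -> float:
    """Percentage change in odds for a one-unit increase in the feature."""
    return (math.exp(coef) - 1.0) * 100.0


# Hypothetical coefficient, chosen so the odds ratio is 2.1171.
coef_europe = math.log(2.1171)

print(round(odds_ratio(coef_europe), 4))
print(round(pct_change_in_odds(coef_europe), 2))
```

For a dummy variable such as a continent indicator, the "one-unit increase" is the switch from the reference category to that category.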
XGBoost Classifier and Tuned XGBoost Classifier exhibit the highest F1-scores (0.808 and 0.821, respectively) on the testing set, along with relatively high accuracy, recall, and precision.
The Tuned XGBoost Classifier performs slightly better, which is attributed to hyperparameter tuning.
These models are capable of effectively identifying visa applications likely to be certified or denied, making them suitable for deployment in a real-world scenario.
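Hyperparameter tuning with GridSearchCV, scored on F1, might look like the sketch below. To keep the example dependent on scikit-learn only, it swaps in `GradientBoostingClassifier` as a stand-in for XGBoost; the parameter grid, the synthetic data, and the class balance are assumptions, not the project's settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic, mildly imbalanced stand-in for the encoded visa data (assumption).
X, y = make_classification(
    n_samples=300, n_features=8, weights=[0.33, 0.67], random_state=1
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1
)

# Small illustrative grid; a real search would likely be wider.
param_grid = {
    "n_estimators": [50, 100],
    "learning_rate": [0.05, 0.1],
    "max_depth": [2, 3],
}

search = GridSearchCV(
    GradientBoostingClassifier(random_state=1),
    param_grid,
    scoring="f1",  # select the combination with the best cross-validated F1
    cv=3,
)
search.fit(X_train, y_train)

# Evaluate the refitted best model on the held-out test set.
test_f1 = f1_score(y_test, search.best_estimator_.predict(X_test))
print(search.best_params_, round(test_f1, 3))
```

Scoring the search on F1 rather than accuracy keeps model selection aligned with the evaluation criterion chosen earlier.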
Master's degree is the most important feature, followed by High School.
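Feature rankings of this kind are commonly read from a tree ensemble's `feature_importances_` attribute. The sketch below illustrates the mechanics on synthetic data with a random forest; the feature names are hypothetical, and the target is constructed to depend on one feature only, so the ranking is known in advance.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)

# Hypothetical features: only "informative_feature" drives the target.
X = pd.DataFrame({
    "informative_feature": rng.normal(size=200),
    "noise_a": rng.normal(size=200),
    "noise_b": rng.normal(size=200),
})
y = (X["informative_feature"] > 0).astype(int)

model = RandomForestClassifier(n_estimators=100, random_state=1).fit(X, y)

# Impurity-based importances sum to 1; sort to get a ranking.
importances = pd.Series(
    model.feature_importances_, index=X.columns
).sort_values(ascending=False)
print(importances)
```

Impurity-based importances describe how much the fitted model relies on a feature, not whether the feature causes the outcome.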
This project led us to identify the most predictive model, which EasyVisa can use to predict visa outcomes.
The analysis identified important factors influencing visa certification or denial, such as prevailing wage, education level, job experience, and continent.
Models like XGBoost Classifier and Tuned XGBoost Classifier demonstrated high predictive accuracy in identifying visa outcomes.
Features such as prevailing wage, education level, and job experience were found to be significant predictors of visa certification or denial.
Implement Machine Learning-based solutions like the XGBoost Classifier or Tuned XGBoost Classifier to aid in the OFLC's visa approval process.
Prioritize visa applications based on predicted probabilities of approval to streamline the certification process and reduce processing time.
Continuously monitor and update the model using new data to improve predictive accuracy and adapt to changing trends in visa applications.
Provide feedback mechanisms to employers and applicants to help them understand the factors influencing visa outcomes and improve future applications.
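The recommendation to prioritize applications by predicted approval probability can be sketched as follows. This is an illustration under assumptions: a logistic regression stands in for the tuned model, the data is synthetic, and the 150/50 split between "historical" and "pending" applications is invented.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data (assumption): first 150 rows are historical,
# the remaining 50 play the role of pending applications.
X, y = make_classification(n_samples=200, n_features=5, random_state=1)
model = LogisticRegression().fit(X[:150], y[:150])
pending = X[150:]

# Column 1 of predict_proba is the probability of class 1 (certified, assumed).
approval_probability = model.predict_proba(pending)[:, 1]

# Indices of pending applications, most likely approvals first.
priority_order = np.argsort(-approval_probability)
ranked_probabilities = approval_probability[priority_order]
print(priority_order[:5], ranked_probabilities[:5].round(3))
```

A ranking like this only orders the review queue; the certification decision itself would still rest with the reviewing officer.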