Professional Experience
1. Senior Software Developer (Nov 2021 - Present)
ValueMomentum | IT Services & Software, New Jersey
ValueMomentum provides software and services to Insurance, Healthcare & Financial Services firms.
These include application development and management, systems integration, digital engagement, customer communications management, quality assurance & testing, and cloud-enabled IT operations. We’re known for our customer-first approach, expertise, and industry depth.
Role: Senior Data Engineer
Project: ERIE
Environment: NoSQL, Python, NumPy, pandas, scikit-learn, Machine Learning
Client: ERIE Insurance
Responsibilities:
Designed data ingestion workflows using NoSQL for full and incremental data loads.
Responsible for data transformation and curation; maintained performance and scalability, continuously monitored and enhanced the pipelines, implemented error-handling mechanisms, and safeguarded data privacy.
Streamlined the data ingestion process and enabled real-time querying, leading to a 30% reduction in claims processing time; the data-driven approach cut decision-making cycles by 25%.
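The full and incremental load pattern above can be sketched with a simple watermark check. This is a minimal Python illustration; the record shape and field names are hypothetical, not the actual ERIE schema:

```python
from datetime import datetime

# Hypothetical records as they might arrive from a source system.
SOURCE = [
    {"claim_id": 1, "updated_at": datetime(2023, 1, 1)},
    {"claim_id": 2, "updated_at": datetime(2023, 1, 5)},
    {"claim_id": 3, "updated_at": datetime(2023, 1, 9)},
]

def full_load(source):
    """Full load: take every record from the source."""
    return list(source)

def incremental_load(source, watermark):
    """Incremental load: only records modified after the last watermark."""
    return [r for r in source if r["updated_at"] > watermark]

# Incremental batch since the last run's watermark, then advance the watermark.
batch = incremental_load(SOURCE, datetime(2023, 1, 4))
new_watermark = max(r["updated_at"] for r in batch)
```

The stored watermark is what keeps each incremental run idempotent: only records newer than the previous high-water mark are re-processed.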
2. Software Developer (May 2021 - Nov 2021)
Janssen: Pharmaceutical Companies of Johnson & Johnson
Janssen Research and Development is a leading pharmaceutical company in the United States, focused on innovation for some of the most devastating diseases and the most complex medical challenges of our time. The Janssen portfolio of pharmaceutical products also includes a large number of legacy products that are managed within the Established Products group.
Role: Software Developer
Project: Drug Repurposing
Environment: Kudu, Impala, NiFi, MarkLogic, Kafka, Python
Responsibilities:
Implemented algorithm evaluation methods along with data collection, cleaning, labeling, and modeling.
Designed large databases and used AWS, Impala, SQL, Python libraries, Kafka, NiFi, and MarkLogic to move data from source to target.
Created analysis reports and performed hypothesis testing, quantitative analysis, and data visualization to draw insights from the database.
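Source-to-target data movement of the kind described above follows an extract-transform-load shape. A minimal Python sketch with hypothetical field names (the real pipeline used Kafka, NiFi, and MarkLogic, not these functions):

```python
def extract(source_rows):
    # Pull raw rows from the (hypothetical) source system.
    return source_rows

def transform(rows):
    # Normalize field names/values and drop rows lacking a compound identifier.
    out = []
    for row in rows:
        if row.get("compound_id"):
            out.append({
                "compound_id": row["compound_id"].strip().upper(),
                "indication": row.get("indication", "").lower(),
            })
    return out

def load(rows, target):
    # Upsert into the target store, keyed by compound_id.
    for row in rows:
        target[row["compound_id"]] = row
    return target

raw = [
    {"compound_id": " jnj-001 ", "indication": "Oncology"},
    {"compound_id": None, "indication": "unknown"},
]
target = load(transform(extract(raw)), {})
```

Keying the load on a stable identifier makes repeated runs safe: re-loading the same batch overwrites rather than duplicates.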
3. Data Analyst / Graduate Research Assistant (January 2020 - May 2021)
University of North Texas
At the UNT Data Science lab we focus on the knowledge, theory, and technology dealing with the collection of facts and figures, and the processes and methods involved in their manipulation, storage, dissemination, publication, and retrieval. We build on emerging areas such as information organization, data science, information architecture, information seeking and use, health informatics, social network analysis, knowledge management, digital content, digital curation, and information systems.
Role: Data Analyst
Project: Legal Intelligence using Artificial Intelligence, NLP
Environment: Python 3.7.7, Qlik Sense, Power BI, Cloudera, NLP, Machine Learning
Responsibilities:
Responsible for collecting, cleaning, labeling, and conceptualizing large databases.
Used SQL queries, Python libraries, MS Access, and MS Excel to filter and clean data.
Analyzed datasets and validated with data mining tools to ensure matching descriptions with incidents.
Developed database objects, including tables, views, and materialized views.
Responsible for creating analysis reports and maintaining communication with lab researchers.
Classified and cleaned near-miss records.
Developed a text-mining program to classify, clean, and visualize large datasets; performed hypothesis testing, quantitative analysis, and data visualization to draw insights from the database.
Achieved 89.5% accuracy with the trained model on the sampled dataset.
Validated the model with a confusion matrix and a data-collection agreement report based on inter-rater reliability.
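The validation steps above (confusion matrix, accuracy, inter-rater agreement) can be sketched in a few lines of Python. Cohen's kappa is used here as one common inter-rater reliability statistic; that choice is an assumption, since the exact statistic isn't stated, and the labels are hypothetical:

```python
def confusion_matrix(y_true, y_pred, labels):
    """Counts of (true label, predicted label) pairs."""
    m = {(t, p): 0 for t in labels for p in labels}
    for t, p in zip(y_true, y_pred):
        m[(t, p)] += 1
    return m

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def cohens_kappa(y_true, y_pred, labels):
    """Agreement corrected for chance: (p_o - p_e) / (1 - p_e)."""
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)  # observed agreement
    p_e = sum((y_true.count(l) / n) * (y_pred.count(l) / n) for l in labels)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical near-miss labels from two raters / model vs. ground truth.
y_true = ["miss", "hit", "miss", "hit"]
y_pred = ["miss", "hit", "hit", "hit"]
acc = accuracy(y_true, y_pred)
kappa = cohens_kappa(y_true, y_pred, ["hit", "miss"])
```

Kappa is lower than raw accuracy because it discounts the agreement two raters would reach by chance alone.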
4. Data Science Global Product Management Intern (May 2020 - August 2020)
Janssen: Pharmaceutical Companies of Johnson & Johnson
Role: Data Science Global Product Management Intern
Project: Established Products Artificial Intelligence Project.
Environment: Python 3.7.7 , Qlik Sense, Power BI, Cloudera
Responsibilities:
1. As a Data Science Global Product Management intern, contributed to the Janssen Established Products Strategic Planning framework by engaging with multi-disciplinary teams to identify, define, and develop innovative tools that drive qualitative and quantitative analysis and generate insights for the portfolio of products.
2. Worked closely with members of the Janssen community on projects key to the advancement of global product management processes and systems.
3. Engaged with multi-disciplinary teams to identify, define, and deliver additional brand value in the context of the overall portfolio of products.
4. Performed qualitative and quantitative analysis on a product portfolio dataset.
5. Developed, presented, and defended a business case for a new process improvement initiative.
6. Participated as a team member and contributor in areas of expertise.
7. Gained an organizational understanding of J&J, Janssen, GMO, and Established Products.
8. Gained an understanding of the EP groups in the context of the matrix cross-functional & cross-regional organization.
9. Ensured understanding of key initiatives within the portfolio that would require input, including but not limited to:
Portfolio analysis and insight generation
EPiiN tool development, including visualization and database development
Transition of LMD database into EP and development of new search interface
SharePoint data migration from current to new SharePoint site.
Artificial Intelligence projects for the development of Clinical Overviews.
Execution of the EP Communications strategy (e.g., EP identity, tools, news)
5. Machine Learning Engineer (January 2019 - August 2019)
ValueMomentum | IT Services & Software, India
Role: Machine Learning Engineer
Project: GEICO
Environment: Python, NumPy, pandas, scikit-learn, Machine Learning
Client: GEICO Insurance
Responsibilities:
As part of a machine learning proof of concept, used Python with NumPy, pandas, and scikit-learn to build a decision tree model that predicts whether a claim on a particular policy is fraudulent, with the potential to save millions of dollars in the insurance sector.
Implemented a classification model in R using graphical visualization, statistical packages, and algorithms such as RPART (Recursive Partitioning and Regression Trees) and decision trees; the model captures hidden patterns in Electronic Health Record (EHR) healthcare data, giving doctors a better understanding of the data and a big-picture view of a patient's health.
Designed and implemented machine learning algorithms including Supervised and Unsupervised models.
Implemented Feature Engineering, Feature extraction, and Feature selection for different business problems.
Implemented various algorithms like Support Vector Machines, Random Forests, and Gradient Boosting to obtain better results.
Involved in all stages of development of machine learning-based systems: data wrangling, visualization, data labeling, algorithm selection, training, model development, regression, debugging, and updates.
Extensively used Python libraries such as scikit-learn, NumPy, SciPy, pandas, Matplotlib, and Seaborn.
Used gradient descent optimizers such as Adam, RMSprop, and stochastic gradient descent (SGD).
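The fraud-prediction decision tree described above rests on one core step: choosing the split threshold that minimizes Gini impurity. A minimal pure-Python sketch of that step, with hypothetical claim amounts (the actual model was built with scikit-learn, not this code):

```python
def gini(labels):
    """Gini impurity of a set of class labels: 1 - sum of squared class shares."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(rows, labels, feature_index):
    """Threshold on one feature that minimizes weighted Gini impurity."""
    best = (None, float("inf"))
    for threshold in sorted({r[feature_index] for r in rows}):
        left = [l for r, l in zip(rows, labels) if r[feature_index] <= threshold]
        right = [l for r, l in zip(rows, labels) if r[feature_index] > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
        if score < best[1]:
            best = (threshold, score)
    return best

# Hypothetical claim amounts and fraud labels.
rows = [[120.0], [300.0], [5200.0], [7100.0]]
labels = ["legit", "legit", "fraud", "fraud"]
threshold, impurity = best_split(rows, labels, 0)
```

A full tree applies this split recursively to each partition; here a single threshold already separates the two classes, so the weighted impurity drops to zero.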
6. Software Engineer (March 2018 - January 2019)
ValueMomentum | IT Services & Software, India
Role: MarkLogic Developer
Project: Erie Data Hub
Environment: MarkLogic 8.0, 9.0, Windows
Client: Erie Insurance Group
Project Description: Erie Data Hub is a new data platform built for different portfolios in Erie Insurance Group, such as Claims, Billing, Agency, Customer, and eSignature. MarkLogic is used as the database for creating the data hub via the MarkLogic Data Hub Framework, serving as an operational data store.
Responsibilities:
Involved in Erie Data Hub development on the input flow and harmonization flow.
Developed content ingestion, content transformations, and data loading using XQuery.
Used CoRB for bulk ingestion and bulk transformation of content.
Worked with MarkLogic build and deployment tools such as ml-gradle.
Used XSLT for content transformation.
Created indexes over elements, attributes, paths, lexicons, and fields in the Admin Console.
Used the MarkLogic Content Pump (mlcp) tool for bulk loading of data.
Created web services using REST methodologies.
Experienced in all aspects of the software development life cycle as well as Scrum methodology.
Performed research and development for exploring and modifying tools such as Splunk, Data Analyzer, and the Data Explorer (Ferret) tool, which were used for data analysis in the project.
Developed a MarkLogic testing automation framework for REST service testing, data testing (comparing transformation rules, reconciling document counts, source-to-target mapping), and functional testing.
Reduced testing effort from 18 hours of manual testing to 1 hour of automated testing with 90% accuracy.
Developed an XML Schema Validator service to identify gaps between the mapping document and the target schema, reducing 20 hours of manual validation to a 1-hour automatic report generator that lists issues in Excel.
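The gap analysis behind the XML Schema Validator above amounts to a two-way set difference between the fields promised by the mapping document and the fields defined in the target schema. A Python stand-in for that logic, with hypothetical field names (the real service worked against XML Schemas and wrote its report to Excel):

```python
def schema_gaps(mapping_fields, target_schema_fields):
    """Compare mapping-document fields with the target schema, both ways."""
    mapping = set(mapping_fields)
    schema = set(target_schema_fields)
    return {
        "missing_in_schema": sorted(mapping - schema),   # mapped but not defined
        "unmapped_in_schema": sorted(schema - mapping),  # defined but never mapped
    }

gaps = schema_gaps(
    ["ClaimNumber", "LossDate", "AdjusterName"],
    ["ClaimNumber", "LossDate", "PolicyNumber"],
)
```

Reporting both directions matters: a mapped-but-undefined field breaks ingestion, while a defined-but-unmapped field silently stays empty in the target.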
Machine Learning Responsibilities:
Implemented feature engineering, feature extraction, and feature selection for different business problems and tested over 95% of the real-time data.
Worked with business intelligence tools such as QlikView and Business Objects to develop reports and provide data to stakeholders.
Created and leveraged analytical approaches to analyze domain-specific source data and provide accurate statements and conclusions.
Extensively used Python libraries such as scikit-learn, NumPy, SciPy, pandas, Matplotlib, and Seaborn.
Designed and implemented machine learning algorithms, including supervised and unsupervised models, achieving 96% efficiency and saving $1.2 million by identifying fraud cases in insurance and healthcare.
Redesigned the company’s strategic approach to billing that resulted in boosting revenue by 7% and reducing accounts receivable by 18%.
Trained 10+ business users in data modeling in Excel using PowerPivot for creating ad-hoc reports.
Converted 300+ reports from Oracle Reports into SSRS.
Database Responsibilities:
Performed code migration, database change management, and data management through the various stages of the development life cycle.
Worked with the development team to resolve bug fixes and assisted with testing across various environments.
Monitored SQL Server and database performance and proactively addressed potential performance issues.
Reviewed, designed, and developed data models in conjunction with application development teams.
Provided solutions for database issues including capacity, redundancy, replication, and performance.
Provided technical leadership; mentored junior staff on database technical details and provided technical assistance to junior developers.
Provided technical support for databases, including system management, backup/recovery, account management, configuration management, and trouble calls.
Estimated the effort to complete software development activities and provided input that added value to the management of software development projects.
Worked with application development staff to develop database architectures, coding standards, and quality assurance policies and procedures.
Had full authority for assigning, directing, and evaluating work, and for conducting performance evaluations, progressive counseling, and career development discussions.
Assisted with Windows Server configuration and management as it relates to MS SQL Server installation and performance.
Performed and managed daily database maintenance, monitoring, and performance-tuning tasks/jobs.
Installed necessary triggers, stored procedures, and business logic.
Prepared and implemented database standards and procedures across multiple environments.
Applied analytical and problem-solving skills to evaluate business problems and applications knowledge to identify appropriate solutions.
Prioritized workload and consistently met deadlines.
7. Software Engineer Trainee (August 2017 – March 2018)
ValueMomentum | IT Services & Software, India
Role: Hadoop Developer Trainee
Project: EDW Release 1
Environment: Hortonworks, Windows
Client: Pekin Insurance
Project Description: EDW Release 1 at Pekin Insurance is the data platform used to migrate data from the mainframe to Hortonworks.
Responsibilities:
Developed MapReduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables
Worked on setting up the Hadoop cluster.
Analyzed data using HiveQL, Pig Latin, and Spark.
Created Hive queries to spot emerging trends by comparing fresh data with reference tables and historical metrics.
Automated data loading into the Hadoop Distributed File System (HDFS) and used Pig to pre-process the data.
Managed and reviewed Hadoop log files.
Tested raw data and executed performance scripts.
Used the Spark framework with Scala to create a dashboard showing the number of claims generated per policy, helping the business team see which coverages produce the most claims and supporting future predictions.
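The dashboard's core aggregation (counting claims per policy and per coverage) can be sketched in Python as a stand-in for the original Scala/Spark job; the record shape and values are hypothetical:

```python
from collections import Counter

# Hypothetical claim records as they might land in the data platform.
claims = [
    {"policy": "P-100", "coverage": "collision"},
    {"policy": "P-100", "coverage": "collision"},
    {"policy": "P-200", "coverage": "liability"},
]

# Equivalent in spirit to a groupBy(...).count() in Spark.
claims_per_policy = Counter(c["policy"] for c in claims)
claims_per_coverage = Counter(c["coverage"] for c in claims)
```

In Spark the same shape would be a `groupBy` followed by `count`, distributed over the cluster rather than held in one process.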
8. Big Data Intern (December 2016 – March 2017)
Jawaharlal Nehru Technological University, Hyderabad, IN
Role: Big data Intern
Project: Log file analysis using Big Data (Hadoop, Hive, Pig, Flume, Oozie, Spark, Scala, Kafka)
Environment: Hortonworks, Cloudera, Linux, Windows
Responsibilities:
Debugged big data analytics workloads using SQL.
Proficient in distributed query processing tools such as Hive, Spark, and SQL; deep experience with the NoSQL databases HBase and Cassandra.
Hands-on experience debugging issues to provide the best resolution.
Loaded data from different data sources (Teradata and DB2) into HDFS using Sqoop (version 1.4.3) and loaded it into partitioned Hive tables.
Developed Pig (version 0.11.1) scripts to transform the data into a structured format.
Developed Hive (version 0.11.0) queries for analysis across different banners.
Developed Hive UDFs to bring all customer Email_Id values into a structured format.
Developed Oozie workflows for daily incremental loads, which pull data from Teradata and import it into Hive tables.
Developed bash scripts to fetch log files from the FTP server and process them for loading into Hive tables.
All bash scripts are scheduled using the Resource Manager scheduler.
Moved data from HDFS to Cassandra using the BulkOutputFormat class in MapReduce, and developed MapReduce programs for applying business rules to the data.
Developed and executed Hive queries to denormalize the data; worked on analyzing data with Hive and Pig.
Experience implementing rack topology scripts on the Hadoop cluster.
Very good experience with both MapReduce 1 (JobTracker) and MapReduce 2 (YARN).
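The Hive UDF mentioned above, which brings raw Email_Id values into a structured format, can be illustrated in Python. This is a hedged stand-in: the original was a Hive UDF, and its exact validation rules aren't given, so the regex and output fields here are assumptions:

```python
import re

def normalize_email(raw):
    """Normalize a raw Email_Id value into a structured record, or None if invalid."""
    if raw is None:
        return None
    email = raw.strip().lower()
    # Loose shape check (an assumption, not the original UDF's rule):
    # local-part @ domain . tld
    if not re.fullmatch(r"[\w.+-]+@[\w-]+\.[\w.-]+", email):
        return None
    local, domain = email.split("@", 1)
    return {"email": email, "local": local, "domain": domain}

rec = normalize_email("  John.Doe@Example.COM ")
```

Splitting the address into local part and domain is what makes downstream grouping (e.g., customers per email domain) a plain column aggregation.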