Client: Cisco Systems, CA, USA (Sep. 2022 – Dec. 2024)
Security Researcher
Led an independent research project to build a prototype called the Decentralized and Graph-based Software Supply Chain Security (DG-SSCS) system. The goal was to improve software supply chain security by combining multiple bills of materials (BOMs) with a blockchain-backed proof of concept. The research integrated modules from several open-source tools to find vulnerabilities in each package of a GitHub repository, and final commits were signed on the Ethereum blockchain to strengthen supply chain integrity.
Built a dependency graph of a GitHub repository using the GitHub GraphQL API (see the GraphQL sketch after this list).
Continuously verified and maintained the repository's Software Bill of Materials (SBOM).
Generated SPDX SBOMs from the repository using the Syft tool.
Automatically detected the package managers and build systems used by the software.
Displayed the extracted SBOM data in the dashboard UI.
Scanned repository dependencies for vulnerabilities.
Provided detailed information on each vulnerability (e.g., package ID, CVE ID, severity, description).
Utilized Grype to analyze SBOM files and match them against known vulnerabilities (see the Syft/Grype sketch after this list).
Detected and identified cryptographic functions in the source code.
Generated JSON reports, including the line numbers of cryptographic usage.
Determined whether cryptographic functions are quantum-safe.
Analyzed interdependencies and couplings between Java modules and methods using Jarviz.
Created a directed multigraph of Java project dependencies, revealing intricate method-level couplings.
Interacted with Ethereum using Web3.py (see the signing sketch after this list).
Signed developer identities and public repositories using private/public key pairs.
Performed blockchain functions such as reading block data and interacting with smart contracts.
Scanned Docker container images for SPDX SBOMs and vulnerabilities using the Syft and Grype libraries.
Provided security insights for containerized applications.
Analyzed Linux ELF binaries to generate function-level dependency graphs (see the angr sketch after this list).
Detected defects such as buffer overflows and memory leaks using the angr library.
Performed static and dynamic analysis, including control-flow, data-dependency, and value-flow graphs.
Developed shell scripts to evaluate the back-end modules as microservices of the entire system.
Documented system designs, experimental results, and project requirements to support ongoing research and development.
Handled DevOps tasks on AWS, including Docker containerization, CI/CD pipelines with LocalStack, and Git for version control.
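As referenced in the dependency-graph bullet above, a minimal sketch of the GraphQL call, assuming a personal access token and GitHub's dependencyGraphManifests field (a schema preview at the time); the owner and repository names are placeholders, not the project's actual code.

```python
# Hypothetical sketch: fetch a repository's dependency manifests via
# GitHub's GraphQL API. Token, owner, and name are placeholders.
import requests

QUERY = """
query($owner: String!, $name: String!) {
  repository(owner: $owner, name: $name) {
    dependencyGraphManifests(first: 10) {
      nodes {
        filename
        dependencies(first: 50) {
          nodes { packageName requirements packageManager }
        }
      }
    }
  }
}
"""

def fetch_dependency_graph(token: str, owner: str, name: str) -> dict:
    resp = requests.post(
        "https://api.github.com/graphql",
        json={"query": QUERY, "variables": {"owner": owner, "name": name}},
        headers={
            "Authorization": f"bearer {token}",
            # dependencyGraphManifests required this preview header at the time
            "Accept": "application/vnd.github.hawkgirl-preview+json",
        },
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()
```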
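A minimal sketch of the SBOM-then-scan step (Syft and Grype bullets above), assuming both CLIs are installed on the PATH; the target and output paths are illustrative.

```python
# Illustrative pipeline step: generate an SPDX SBOM with syft, then
# scan it with grype and extract package name, CVE ID, and severity.
import json
import subprocess

def generate_sbom(target: str, sbom_path: str = "sbom.spdx.json") -> str:
    # target can be a source directory or a container image reference
    with open(sbom_path, "w") as fh:
        subprocess.run(["syft", target, "-o", "spdx-json"], stdout=fh, check=True)
    return sbom_path

def scan_sbom(sbom_path: str) -> list:
    result = subprocess.run(["grype", f"sbom:{sbom_path}", "-o", "json"],
                            capture_output=True, text=True, check=True)
    report = json.loads(result.stdout)
    return [
        {
            "package": m["artifact"]["name"],
            "cve": m["vulnerability"]["id"],
            "severity": m["vulnerability"]["severity"],
        }
        for m in report.get("matches", [])
    ]
```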
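The commit-signing flow might look roughly like the following sketch with Web3.py and eth-account, assuming a node reachable at a placeholder RPC URL; the key, address, and commit hash are hypothetical.

```python
# Hedged sketch: sign a commit hash off-chain with a developer's private
# key, verify it by recovering the signer's address, and read block data.
from eth_account import Account
from eth_account.messages import encode_defunct
from web3 import Web3

w3 = Web3(Web3.HTTPProvider("http://127.0.0.1:8545"))  # placeholder RPC URL

def sign_commit(commit_sha: str, private_key: str) -> str:
    message = encode_defunct(text=commit_sha)
    return Account.sign_message(message, private_key=private_key).signature.hex()

def verify_commit(commit_sha: str, signature: str, expected_address: str) -> bool:
    message = encode_defunct(text=commit_sha)
    return Account.recover_message(message, signature=signature) == expected_address

latest = w3.eth.get_block("latest")  # example of reading block data
print(latest["number"], latest["hash"].hex())
```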
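For the ELF analysis bullets, a sketch of function-level dependency extraction with angr; the binary path is a placeholder, and the call-graph walk stands in for the project's fuller graph generation.

```python
# Illustrative sketch: recover a static CFG with angr, then walk the
# knowledge base's call graph to print function-level dependencies.
import angr

proj = angr.Project("./target.elf", auto_load_libs=False)  # placeholder path
proj.analyses.CFGFast()  # populates proj.kb with functions and the call graph

# kb.callgraph is a networkx MultiDiGraph keyed by function addresses
for caller_addr, callee_addr in proj.kb.callgraph.edges():
    caller = proj.kb.functions.function(addr=caller_addr)
    callee = proj.kb.functions.function(addr=callee_addr)
    if caller and callee:
        print(f"{caller.name} -> {callee.name}")
```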
Key Technologies: ReactJS, Python, Flask, MySQL, Blockchain, Linux, GitHub, Semantic-UI React, AG Grid React, GraphQL API, JavaScript, vulnerability scanners, Docker, Ubuntu OS, CMake
Client: Cisco Systems, CA, USA (June 2022 – Aug. 2022)
Data Engineer
PySpark & Spark-SQL Development: Built PySpark and Spark-SQL applications for data extraction, transformation, and aggregation, leveraging AWS EMR, S3, Glue Metastore, and Athena for scalable data applications.
Real-time Data Processing: Implemented Spark Structured Streaming with Kafka to build robust pipelines for real-time analytics (see the streaming sketch after this list).
Lambda & Workflow Automation: Developed Lambda functions for data ingestion and orchestrated workflows using Step Functions, optimizing data flow and automation across AWS services.
Multi-Format Data Processing: Processed complex data in formats like Avro, Parquet, Hudi, and JSON, across sources such as S3 and local systems.
CI/CD & Cost Optimization: Enhanced the CI/CD pipeline with CircleCI for automated deployments, and used LocalStack to simulate AWS APIs locally, cutting AWS billing costs by 60% (see the LocalStack sketch after this list).
Testing & Quality Assurance: Automated unit test cases with pytest, ensuring data accuracy and reliability across ETL processes.
CloudFormation & Performance Optimization: Deployed applications using CloudFormation for efficient provisioning, and optimized Spark job performance for better resource management.
ETL Development & Snowflake Integration: Designed and maintained ETL pipelines, integrated Snowflake with PySpark for data manipulation, and automated data flow from S3 to Snowflake.
Cluster Load Balancing: Optimized Hadoop cluster load balancing and Spark job performance, improving data processing efficiency and reducing resource consumption.
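As referenced in the streaming bullet above, a minimal sketch of a Kafka-to-Parquet Structured Streaming job; the broker, topic, schema, and S3 paths are illustrative assumptions rather than the production pipeline.

```python
# Sketch: read JSON events from Kafka, parse them against a schema,
# and append the result to S3 as Parquet with checkpointing.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("events-stream").getOrCreate()

schema = StructType([                       # hypothetical event schema
    StructField("event_id", StringType()),
    StructField("event_ts", TimestampType()),
    StructField("payload", StringType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
       .option("subscribe", "events")                     # placeholder topic
       .load())

events = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(from_json(col("json"), schema).alias("e"))
          .select("e.*"))

(events.writeStream
 .format("parquet")
 .option("path", "s3://bucket/events/")                   # placeholder paths
 .option("checkpointLocation", "s3://bucket/checkpoints/events/")
 .outputMode("append")
 .start()
 .awaitTermination())
```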
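The LocalStack point above comes down to redirecting boto3 at a local endpoint instead of real AWS; a small sketch, assuming LocalStack's default edge port, with hypothetical bucket and key names.

```python
# Sketch: the same boto3 code paths exercised against LocalStack,
# so CI runs never touch (or bill) a real AWS account.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="http://localhost:4566",  # LocalStack's default edge port
    aws_access_key_id="test",              # dummy credentials accepted locally
    aws_secret_access_key="test",
    region_name="us-east-1",
)
s3.create_bucket(Bucket="pipeline-test-bucket")
s3.put_object(Bucket="pipeline-test-bucket", Key="sample.json", Body=b"{}")
print([b["Name"] for b in s3.list_buckets()["Buckets"]])
```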
Key Technologies: Hadoop, PySpark, Hive, AWS EMR, S3, EC2, Glue, Step Functions, Lambda, Kinesis, CircleCI, LocalStack, AWS Athena, SQS, SNS, Snowflake, Pandas, NumPy, Matplotlib, Plotly
Served as department R&D coordinator and submitted research proposals to various funding agencies.
Delivered lectures to both undergraduate and graduate students, and prepared and graded examination papers.
Developed the department website and integrated Tableau dashboards using the Tableau JavaScript API.
Published research papers in national and international conferences and journals.
Developed and taught undergraduate courses in data science, business analytics, and decision sciences, using real-world case studies and hands-on projects to bridge the gap between theory and practice.
Mentored research projects in machine learning, NLP, and data science, leading to publications in IEEE conferences.
Courses taught:
Undergraduate Courses: Linux Programming, C-Programming, Graph Theory/Analytics, PHP Programming, Pattern Recognition, Machine Learning
Graduate Courses: Data Science tools and techniques, Cloud Computing, Decision Support Systems, Deep Learning techniques
Professional experience as a Data Analyst working on predictive analytics and complex data processing using SQL, stored procedures, and the Hadoop framework across various Andhra Pradesh government schemes.
Designed and developed public distribution scheme applications and a citizen feedback system.
Extracted various metrics from the feedback system and applied regression analysis to predict public distribution reach across the state.
Developed interactive data visualizations using Tableau.
Wrote Linux cron jobs to automatically back up the database.
Involved in building data analytics pipelines.
Created a Hadoop cluster to handle large-volume data storage and processing.
Developed incremental imports using Sqoop and exports from HDFS to the RDBMS.
Created Hive tables and complex Hive jobs to process the data.
Built a real-time streaming application using the PySpark Streaming API, Kafka, and ZooKeeper.
Wrote PySpark Structured Streaming jobs to process complex data.
Consumed JSON messages from Kafka in batch intervals and processed them using PySpark DataFrames (see the batch sketch after this list).
Imported and exported streaming data through Kafka using PySpark Streaming.
Converted all processed JSON to Parquet format using Spark batch jobs.
Handled various semi-structured data formats such as XML, JSON, Avro, and Parquet.
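As referenced in the batch-interval bullet above, a sketch of the batch leg: a bounded Kafka read parsed into a DataFrame and landed as Parquet; the broker, topic, offsets, and output path are assumptions.

```python
# Sketch: read a bounded slice of a Kafka topic in batch mode, parse the
# JSON payloads into a DataFrame, and convert the result to Parquet.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kafka-batch-to-parquet").getOrCreate()

batch = (spark.read
         .format("kafka")
         .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
         .option("subscribe", "feedback")                   # placeholder topic
         .option("startingOffsets", "earliest")
         .option("endingOffsets", "latest")
         .load())

# Kafka values arrive as bytes; cast to strings and let Spark infer the schema.
json_strings = batch.selectExpr("CAST(value AS STRING) AS json")
parsed = spark.read.json(json_strings.rdd.map(lambda row: row["json"]))

parsed.write.mode("append").parquet("s3://bucket/feedback/parquet/")  # placeholder path
```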
Environment: Linux, Hadoop, Python, Sqoop, Hive, PySpark, Spark SQL, Kafka, Tableau, AWS S3, EMR, Kinesis, Lambda, RDS, Athena, Oracle BI
Vedavaag Systems Ltd, Hyderabad, India (Aug. 2015 – July 2018)
Sr. Data Analyst (Contract)
Led the development of an election survey management system using ReactJS for front-end and Java for back-end, integrated with MS SQL Server for managing user data and survey results.
Built custom stored procedures and triggers to handle complex data aggregation and reporting requirements for survey results.
Implemented secure REST APIs using Spring Boot to manage survey workflows, voter registration, and reporting dashboards.
Developed reusable React components to create an intuitive and responsive user interface for election managers and respondents.
Collaborated with a cross-functional team of UX designers, QA engineers, and product owners to deliver features on time in an Agile environment.
Built complex MS SQL Server queries, views, and stored procedures for large-scale data analysis and automated reporting.
Worked on implementing session management and authentication services for secure access to the election survey system.
Optimized stored procedures and SQL queries to handle high traffic loads and large datasets efficiently.
Improved database query performance by 40% through optimization of indexes, stored procedures, and schema design.
Developed various microservices to maintain user data, voter details, government scheme records, and reports for multiple GIS administrator levels.
Wrote complex SQL queries and integrated them into Tableau dashboards.
Key Technologies: Linux, Java, Python, MS SQL Server, Semantic-UI-React, IntelliJ, Git
Languages: C, Shell Scripting, awk, Java, Python, R, HTML, JavaScript, CSS, jQuery, JSON, ReactJS
Data Processing: Hadoop, Sqoop, Hive, PySpark, Apache Cassandra, Apache NiFi (data flow, integration)
Real-Time Streaming: Apache Kafka, PySpark Streaming
Data Visualization: Tableau
Databases: MS SQL Server, MySQL, NoSQL, Google BigQuery
Operating Systems: Linux, Windows
Deep Learning: PyTorch, Keras, TensorFlow
DevOps Tools: Jenkins, Git, Docker, Kubernetes, Terraform, CloudFormation, Jupyter Notebook