Detailed Additional IT Work Information

1. 2019 – present Senior Analyst and Programmer on GCP/MCCP AI/ML, Mayo Clinic MCCP AIF (DES)

· Selected to join the MCCP (Mayo Clinic Cloud Platform) Data Environment Services team of AI Factory Services (2019/9)

· Followed the direction of the Mayo Enterprise Architect and the AIFS Section Head to explore the ML/AI/data science capabilities of native GCP: data storage (Cloud Storage, BigQuery, Bigtable, Cloud Filestore, Cloud Spanner, Cloud SQL – MySQL/PostgreSQL, Cloud Memorystore, Firebase RTDB, Firestore and Datastore – Firestore in Datastore Mode), Cloud App/Compute/Kubernetes Engine, Cloud AI-Platform (ML-Engine: data labelling services, model training services, prediction services – online or batch predictions, on-demand Notebooks – Jupyter Python 2/3 or R Notebooks), and the transformation and deployment of custom-trained ML/AI models on GCP AI-Platform for prediction/inference services (Note: Covering the AI/ML/DS processes across discovery, translation and production). Built working Docker container images of Jupyter Python 3 and R Notebooks configured to work with GCP data stores (Cloud Storage, BigQuery, Bigtable, Firestore, Datastore or Firebase RTDB, etc.), GCP AI-Platform, and Mayo RDBMS and Big Data systems (e.g., DB2, SQL Server, PostgreSQL, etc.)

· Successfully completed 12 Coursera (Google) training courses for data scientists in 1.5 months, receiving a certificate for each course with a grade of 100%. These include two consecutive DS Specializations: (i) Machine Learning with TensorFlow on Google Cloud Platform (Note: Aligned with Mayo Clinic ML/DL/DS Discovery phase); and (ii) Advanced Machine Learning with TensorFlow on Google Cloud Platform (Note: Aligned with Mayo Clinic ML/DL/DS Translation and Production phase(s))

· Participated in Mayo DSP (data science program) training - Building Predictive Models in Health Care, and Translation of the Predictive Model to Production

· Developed a large body of GCP-compatible Python and R code for both generic ML/AI and deep learning, including data analytics (data preparation, transformation, visualization, etc.), model training, hyper-parameter tuning, model optimization, model exporting and deployment, batch or online inference/prediction using trained/saved models, and container construction for ML/AI

· Assigned to work with the MCCP Core Build (IaC) team (to speed up its Phase 1 go-live) and to collaborate with Google GCP (Google Cloud Platform) PSO engineers to develop and test Terraform modules for dynamically provisioning on-demand GCP resources (GCS, BQ, CPU/GPU/TPU, GCE, GKE, Dataflow, Pub/Sub, AI VMs and Notebooks, Containers, …) for Mayo data scientists/researchers/physicians doing ML/DL on MCCP

· Developed automated testing for MCCP AIF services (DLVM automated SQA testing, AutoML-Tables automated functional testing, AutoML-Vision automated functional testing, and large-container building, scanning and model training automated functional testing) for Mayo AIF platform provisioning and customer/production support. Most of the code was written in Python; some was written in R as required by the testing design (especially SQA testing), alongside the base languages of Bash and Markdown. The DLVM SQA automated testing included CNN (convolutional neural network) model training on GPU and TPU as well as CPU, with the matching TensorFlow version, giving customers confidence in Mayo AIF (DES) when every DLVM provisioned using TFE passes the testing

· Helped the AIF team identify and solve issues, large and small, in the development of the AIF platforms (DLVM and Notebooks), and contributed to decisions on technology adoption (GCA instead of Twistlock for container vulnerability scanning in the Phase 1 release), customer-oriented KBs (knowledge base articles), production/customer support policies, etc.

2. 2014 – 2019 Senior Analyst and Programmer on GCP/MCCP AI/ML, Mayo Clinic Big Data (Core team)

· Promoted to senior analyst and programmer to design, implement, test and deploy architectures for a variety of Big Data solution projects, based on solid expertise in lifecycle management from conception to completion. Major technologies adopted: Oracle, DB2, SQL Server, Sybase, PostgreSQL, MySQL, IBM MQ/ESB, IBM WebSphere Application Server/Apache Tomcat Server, Unix, Linux, Windows, ElasticSearch/Kibana, Storm, Flume, HDFS, HBase, WebHDFS, WebHBase, Hive, HCatalog (Templeton), WebHCat, MapReduce, YARN, Zookeeper, Sqoop, Kafka, Solr, Oozie, Phoenix, Spark/Livy (v1/v2), Zeppelin, Ambari, Kerberos, Ranger, Ranger KMS, Knox, F5 Balancer, Traefik, Active Directory, LDAP, Pig, Hue, Nagios, Ganglia, Ambari Metrics, Falcon, DataFlow (NiFi)/HDF/CDF, Talend, DataStage, SmartSense, Grafana, Atlas, Eclipse, Jupyter, R/RStudio, CDSW, RapidSQL, Tableau, AppDynamics, IBM Streams (Studio)/DSI/DSR, SVN, GitHub, Docker, Kubernetes, and TFS/ServiceNow/TAYS/HwxSP ticketing

· Programming languages used daily or frequently for routine tasks: Java, Unix/Linux shell scripting, Python, Scala (Spark), SPL, R, and Markdown (Hugo/Zeppelin-%md)

· Contributed both code and documentation to the Apache Knox and HBase projects, among others, all of which are critical components of the Big Data ecosystem

· Developed and deployed Ambari Alerts (written in Python and JSON), including vital Big Data functional testing alerts on HDFS/WebHDFS, HBase/WebHBase, Hive, and ElasticSearch via the Knox Gateway service, a Node Usage alert, an HDF (NiFi) cluster node disconnecting alert, etc., which run on all Mayo Clinic HDP and HDF clusters, including Normal and DR (Disaster Recovery) clusters

· Developed and deployed critical Ambari mpacks for easily managing Hadoop ecosystem components or their functionalities on Mayo Clinic Big Data clusters (e.g., WebHBase mpack, BDTS Dashboard mpack, Kafka Topics Inter-cluster Mirroring mpack, etc.), which run as Ambari services on the Mayo Clinic Normal (3) and DR (Disaster Recovery, 2) Hadoop clusters

· Developed best-practice example code and documentation on WebHDFS, WebHBase, Hive and Zeppelin-Livy (Note: Livy is the REST API of Spark; v1/v2) working against HDFS, Hive, HBase, RDBMS (PostgreSQL, MS SQL Server, DB2, etc.), ElasticSearch and Kafka, used for onboarding and training new Big Data users/clients

· Developed Machine Learning (ML)/Deep Learning (DL) example code for 10 use cases covering all types and major categories of ML/DL model algorithms in the following 5 language and library combinations: Spark (Scala), PySpark (Python v2/v3), pure Python (Python v2/v3; scikit-learn, TensorFlow, Keras, Matplotlib, Pandas, NumPy, etc.), sparklyr (R), and pure R (R; tensorflow, keras, tidyr, ggplot2, neuralnet, etc.). Explored the ML capabilities of Zeppelin, CDSW, ElasticSearch, R/RStudio, etc.

· Architected and implemented critical Big Data solutions to two issues: (i) the large number of small HDFS files generated by >60 MayoTopology instances running continuously on 3 Mayo Clinic (MC) Hadoop clusters, reducing NameNode heap stress and maintaining cluster health until the retirement of the Storm topologies in 2018; and (ii) data corruption caused by NameNode or JournalNode failures on 5 MC Hadoop clusters, ensuring data integrity on MC Big Data platforms

· Initiated and programmed both unsecured and secured versions of the Automated Enterprise-Readiness Hadoop Certification Testing (A-ERHCT) Suite (in Java). Reduced cost by ~70% and time by ~90% for the new installation, expansion, conversion, upgrading, OS hardening, enterprise-security enabling, SSL & upgrading, and configuration changes of 6 MC Hadoop clusters by running the A-ERHCT test suite to identify and quickly resolve critical issues (>10 P1 and >50 P2 and P3 TAYS issues) and to maintain the full functionality of all >25 usable components of the Hadoop stacks for the first 4 years of the Mayo Big Data program

· Decreased the outage and unavailability time of all 7 past Hadoop clusters by >90% by identifying and resolving >1200 critical Hadoop ecosystem issues; standardizing Hadoop administration methods/protocols and ticketing; training >7 Hadoop administrators in root-cause identification and fixing skills and in keytab and work-account password updating, to keep the Hadoop clusters healthy and highly available; and establishing Ranger policies for HDF, Hive, HBase, Storm, Kafka, Knox, YARN and Solr for bdts, bdsupportedadmin and other specific AD groups or single users

· Improved the usability of Hadoop stack components and expanded their range of enterprise use cases by learning the components and pioneering reusable example code (commands, scripts, and code written in Java, Python, Scala, XML or .NET) and SOPs shared with members of Big Data client teams

· Re-architected the installation and configuration of Storm, Kafka and Spark (v1 and v2) to improve the storage capacity, data processing performance, and the scope and capabilities of data analytics and generic machine learning (Spark + PySpark) available to Mayo Clinic users and NiFi applications on MC Hadoop clusters

· Ensured the success of MayoTopology (Storm) architecture development and deployment on MC Hadoop clusters by collaborating with other programmers across Mayo IT on its design, coding and testing, by programming and deploying MC ESB queue monitoring and draining software, and by creating and performing end-to-end Stress and Trickle testing using test data generated after a comparative analysis of the differences between the Amalga (SQL Server) database and the EDT MCLSS (DB2) database

· Archived ~13 TB of Amalga historical HL7-message data via ETL and ELT from RDBMS into MC Hadoop clusters using Sqoop, shell scripting and Java programs

· Migrated ~20 TB of HDFS, HBase and Hive data from the Version #1 production Hadoop cluster to the Version #2 production Hadoop cluster by implementing specially designed algorithms, in a setting where data could not be moved between the 2 clusters using cp, distcp, Falcon/Oozie or snapshots because of the difference in HDP versions, the use of a local KDC on one side and the enterprise KDC on the other, and MC security's prohibition of cross-realm trust

· Provided production, on-call and customer support for the HDP/HDF/ElasticSearch/IBM Streams platforms for up to 5 years, maintaining good customer relationships

· Collaborated with Teradata, Hortonworks and Cloudera engineers to solve complex data and data-ingestion platform issues, e.g., the HDF-Kafka-SSL two-way authentication and authorization issue, the Oozie-Sqoop-Hive issue, the Zeppelin-Spark2-Livy2-HBase issue, and the ongoing Oozie-Spark2 ClassNotFound issue; collaborated extensively with Mayo Clinic employees in IT and non-IT departments to solve their critical data access, data processing or analytics issues
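
The small-HDFS-files issue mentioned above is commonly mitigated by merging many small files into fewer, larger ones, so that the NameNode tracks fewer objects. The source does not say which approach was used at Mayo Clinic, so the Python sketch below only illustrates one generic planning step. The function name `plan_compaction`, the 128 MB default target, and the file names in the usage note are all hypothetical.

```python
from typing import Dict, List


def plan_compaction(file_sizes: Dict[str, int],
                    target_bytes: int = 128 * 1024 * 1024) -> List[List[str]]:
    """Group small files into batches whose combined size stays within target_bytes.

    Hypothetical helper (not from the source): a single greedy pass over the
    files in name order. Files already at least target_bytes large are skipped.
    """
    if target_bytes < 1:
        raise ValueError("target_bytes must be positive")
    batches: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    for name in sorted(file_sizes):
        size = file_sizes[name]
        if size >= target_bytes:
            continue  # already large enough; no merge needed
        if current and current_size + size > target_bytes:
            batches.append(current)  # close the batch before it overflows
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        batches.append(current)
    return batches
```

For example, `plan_compaction({"a": 60, "b": 50, "c": 30, "d": 200, "e": 10}, target_bytes=100)` groups the files as `[["a"], ["b", "c", "e"]]` and leaves `d` alone. Each returned batch would then be merged into one file by whatever job performs the actual rewrite.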

Additional Experience & Skills

· Showcased the MC Big Data program by leading a team of 7 authors to publish the work in the high-impact IEEE CS professional journal IEEE TII, serving as first and corresponding author; presented the Charter #1 work and the enterprise security work of the Mayo Clinic Big Data platforms at Hadoop Summit 2015 San Jose and DataWorks Summit 2017 San Jose, respectively, as a speaker selected by the organizing committee of both international conferences

· Reviewed manuscripts for professional computer science journals and conferences such as IEEE Transactions on Industrial Informatics, Recent Patents on Computer Science, and the 2018 International Workshop on Computer Science and Technology (2018IWCST)

· Improved leadership, management, communication, collaboration and presentation skills by continuously attending online training classes and studying online materials, and by putting what was learned into practice in daily work

· Successfully led the CDSW Sandbox POC project: managed and coordinated the project, and trained and helped interested users across Mayo Clinic in evaluating CDSW as a secure enterprise ML/data science platform

· Continuously improved ML/AI/data science knowledge and skills by actively attending (as a member) and presenting at the weekly meetings of the Mayo Clinic Machine Learning-Deep Learning Journal Club; by attending, on own initiative, the machine learning/AI/data science webinars hosted by Cloudera, Amazon, Google, Elastic, etc.; and by studying key online ML/AI/data science tutorials and related public code or datasets on GitHub

3. 2012-2013 Analyst and Programmer, Mayo Clinic Enterprise Datawarehousing Team

• Selected as one of the nine founding members of Mayo Clinic's first (pioneering) Big Data project from >2300 company-wide IT professionals, based on the following criteria: outstanding performance history, excellent productivity record, strong programming capability, remarkable technical skills, and strong teamwork skills

• Designed and constructed the first Hadoop cluster (BDDev) at Mayo Clinic by installing, configuring, tuning and troubleshooting CentOS 6.5 inside the Xen hypervisor, the Hortonworks Hadoop stack HDP 1.3.2 (15 components: HDFS, MapReduce, Hive, Pig, HBase, Sqoop, Flume, HCatalog, WebHCat, Zookeeper, Ganglia, Nagios, Oozie, Hue, and Ambari), ElasticSearch 1.0.0, and Storm 0.9.0.1 for development use by the initial (Charter #1) Big Data project

• Initiated and programmed 3 software tools (Fully Automated DataStage Job (Application) Unit Testing program, S2T Guide Error Finding & Correction program, and DataStage SQL Query Plus program) using Java, shell scripting, Visual Basic, and SQL, reducing the time needed for development, defect correction and QA testing of DataStage ETL jobs by >90%

• Achieved the highest EDW developer productivity by developing (designing, implementing, and QC testing) >30 new DataStage jobs (Staging, ADS and Mart) and fixing defects in >130 existing DataStage jobs (ADS and Mart jobs) in less than one year for the SCA and P3 projects

• Authored the CS paper on DataStage testing - "Development of Fully Automated DataStage Application Unit Testing Software Tool" accepted by the 11th International Conference on Fuzzy Systems and Knowledge Discovery

• Published the paper on DataStage performance tuning - "Comparative Analysis of DataStage Types and Parameters on the Performance of a DataStage Application (Job)" in the 9th International Conference on Natural Computation

4. 2011 - 2012 Bioinformatics System Design and Development Consultant (Contract) - (Contracted to Mayo Clinic BORA project)

• Boosted bioinformatics software development efficiency by standardizing the software and service pack installation & upgrading, configuration, SVN setup, and database cataloging of DDQB studio, DB2 server, and Tomcat server for the BORA_DDQB Java project, and those of IBM RAD (v8), DB2 server, and WebSphere application servers for the BORA Control Center projects

• Resolved long-standing software issues by debugging and fixing >10 P1 and >10 P2 issues in the Java program BORA_DDQB, its data warehouse model, and its implementation algorithm

• Sped up the release of the BORA software system (v1.0) by designing, programming and executing the version 1.0 and 1.1 QA test plan and suites using HP QuickTest Professional to identify and fix the 13 failures or incidents

Detailed:

(1) Successfully standardized the software and service pack installation & upgrading, configuration, SVN setup, and database cataloging of DDQB studio, DB2 server, and Tomcat server for BORA_DDQB project. A dated SOP was created.

(2) Successfully standardized the software and service pack installation & upgrading, configuration, SVN setup, and database cataloging of IBM RAD (v8), DB2 server, and WebSphere application servers for BORA Control Center projects. A dated SOP was created.

(3) Successfully resolved and documented a variety of 1st-priority issues of BORA’s DDQB project, including the “Launch BORADDQB Broken Link” issue, “DIM_SUM_SEGMENT” issue, “Chr1Strand+” issue, “DDQB Map Functions Populating and Authorization” issue, “Force User To Use Patient Set for Dynamic Condition Queries” issue, “LANID, RACFID, OWN(E)R_ID Upper or Lower case Issue”, “Gene-Based Dynamic Condition Query Issue”, and “BORA Warehouse Database Switching from BOR1D to BOR2D Issue”, etc.

(4) Successfully identified and documented some issues of BORA’s DDQB to be the framework issues of IBM DDQB per se, including “DDQB Tab Switching JSP” issue and “Query Results Exporting Pre-Termination” issue (Note: The related documents have been forwarded through MCLSS team to IBM – we cannot spend time to modify DDQB framework due to potential legal issue(s) that may be involved).

(5) Successfully analyzed and proposed alternative approaches to resolve the hardest issue of BORA’s DDQB project, the “Dynamic Condition Query Performance” issue: a query with dynamic conditions takes a user a long time to complete. For example, >3 min (for a query with a patient set of 2 or more subjects for genotype, methylation or gene expression data on chromosome 1/ + strand /nt 0-2,000,000), >20 min (for a query with a patient set of 2 or more subjects for genotype, methylation or gene expression data on chromosome 1/ + strand /nt 0-20,000,000), and >4 hours (for a query with a patient set of 2 or more subjects for methylation or gene expression data on chromosome 1/ + strand /nt 0-200,000,000). Some approaches are summarized as follows:

(a) One adopted and implemented approach (Note: Keep current warehouse database model and DDQB relational model both unchanged, correct and modify the implementation algorithm) significantly decreased the execution time for the above 3 dynamic condition queries to ~8, ~14, ~49 seconds, respectively, under the same other conditions;

(b) One approach that was accepted but not yet implemented because of task priorities (Note: Keep current warehouse database model and others unchanged, correct or modify DDQB relational model), which would decrease the execution time of the above 3 queries by 10-20%, under the same other conditions;

(c) Another approach (Note: Change current DDQB relational model and/or warehouse database model) that was not adopted due to historical and other reasons. It would bring the execution time under 2 seconds for all the above 3 queries (Note: Data were obtained by executing my prototype), under the same other conditions.

(6) Generated the first version of the QA (Quality Assurance) test plan (v1.0) for the BORA_DDQB software system and other BORA software systems, and generated all the test suites and their associated test scenarios (v1.0 & v1.1), consisting of test cases for manual QA testing at present and for automated QA testing in the near future.

(7) Successfully performed Phase-I BORA_DDQB software system (v1.0) QA Testing (QA1 Testing), and documented and resolved the 11 types of software problems categorized from the incidents found in QA1 testing (Note: 8 out of 11 types of software problems were successfully resolved while the other 3 were deferred to the next software release).

(8) Successfully created two Java and C++ projects on Next Generation Sequencing data (Note: A type of Big Data with processed .vcf.gz files): (i) automatically unzip a series of .vcf.gz files, extract the data row count and sample ID list of each, and output the results to .txt files for easy human reading; (ii) automatically compare the data content of .vcf files (mostly ~1 – 100 GB/file in size) between each Intermediate Reference VCF file and Query VCF file, and output the results to .txt files for easy human reading.

(9) Successfully implemented some new functional aspects of the BORA_DDQB software system, such as limiting the dynamic condition SQL query range to 2.4 million bases on any chromosome/strand.

(10) Automating tests for all BORA software systems including BORA_DDQB and BORA_Control_Center-related project systems.
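
The first of the two sequencing projects in item (8), extracting the data row count and sample ID list from .vcf.gz files, can be illustrated with a short sketch. The original projects were written in Java and C++. This Python version is only an illustration, and the function names `summarize_vcf_gz` and `write_summary` are hypothetical. It relies on the standard VCF layout, where meta-information lines start with `##` and the `#CHROM` header line lists the sample IDs after the fixed columns and the FORMAT column.

```python
import gzip
from typing import List, Tuple


def summarize_vcf_gz(path: str) -> Tuple[int, List[str]]:
    """Return (number of data rows, sample IDs) for one .vcf.gz file."""
    row_count = 0
    sample_ids: List[str] = []
    with gzip.open(path, "rt") as handle:  # streams the file; no full unzip to disk
        for line in handle:
            if line.startswith("##"):
                continue  # meta-information line
            if line.startswith("#CHROM"):
                # 8 fixed columns, then FORMAT, then one column per sample
                sample_ids = line.rstrip("\n").split("\t")[9:]
                continue
            if line.strip():
                row_count += 1  # any other non-empty line is a data row
    return row_count, sample_ids


def write_summary(vcf_paths: List[str], out_path: str) -> None:
    """Write one human-readable line per input file to a .txt report."""
    with open(out_path, "w") as out:
        for path in vcf_paths:
            rows, samples = summarize_vcf_gz(path)
            out.write(f"{path}\trows={rows}\tsamples={','.join(samples)}\n")
```

Reading line by line keeps memory use flat, which matters for files in the ~1 – 100 GB range mentioned above.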

5. 2011 Software Engineer, IBM (Rochester, MN)

• Revitalized the software development collaboration between IBM and Mayo Clinic IT/Bioinformatics teams by contributing coding, debugging, testing, software analysis, database remodeling, and process standardization. Major technologies used: Java/JavaEE/Servlet/JSP/SQL&PL/SQL/JavaScript/HTML plus IBM RAD, DB2, DDQB, ILOG, WebSphere, Tomcat, DB Solo, Cytoscape

• Strengthened the IBM-Mayo Clinic collaborative software DDQB_BORA by developing a data transformation plugin, the "Gene Expression Data Format Transformation" plugin, which transforms gene expression data based on Entrez Gene ID or Gene Symbol and saves the results to a CSV file or displays them in the web browser, where they can be further processed by Cytoscape or other graph analysis software

• Expanded the IBM DDQB software use cases by creating a generic graphing plugin that incorporates free and/or commercial graph and analysis software programs such as Cytoscape, JFreeChart, DOJO JS, ManyEyes, COGNOS and ILOG for easily generating graphs from numeric data retrieved by DDQB from an RDBMS database or data warehouse

Detailed:

(1) Collaboration work with Mayo Clinic IT/Bioinformatics teams - Scrum-driven development of database warehouse related software (BORA) using Java/JavaEE/Servlet/JSP/SQL&PL/SQL/JavaScript/HTML plus IBM RAD, DB2, DDQB, ILOG, WebSphere, Tomcat, DB Solo, Cytoscape, etc. Major satisfactory accomplishments:

§ Optimized and standardized BORA Model database scripts + 2 SOPs;

§ Compared BORA Model with real time or static BORA Dev database and generated scripts + 1 SOP for synchronizing two different BORA databases, accompanied by generation of a software program for automatic creation of simple human-readable database comparison report files; and

§ Developed a BORA DDQB plugin named “Gene Expression Data Format Transformation”, which works from Entrez Gene ID or Gene Symbol and saves the results to a CSV file or displays them in the web browser, where they can be further processed by Cytoscape or other graph analysis software.

(2) IBM work: Plan-driven development of a generic plugin software program of DDQB - Creating graphs using numeric data retrieved by DDQB from a database or data warehouse such as BORA using free and/or commercial graph and analysis software programs such as Cytoscape, JFreeChart, DOJO, COGNOS and ILOG. Major satisfactory accomplishments: Version 1.0 using JFreeChart for 3D/regular charting and Cytoscape for network analysis; Version 2.0, more charting using IBM ManyEyes and Dojo JS libraries.

6. Selected Pre-2011 full-score/excellent computer science projects at Winona State University (Demos and presentation .ppt files of some projects listed here may be provided upon request/appointment)

(1) DC Automatic BVT Testing Tool (A software program to automatically generate test cases by different BVT methods, and execute the generated test cases (>500 TCs) against the well-known Triangle, NextDate and Commission programs respectively) (Note: With SRS, UML design, test plan/analysis).

(2) DC MedLab Web Application (A database-based Web Application using advanced database design & development - ER modeling and standard industrial 3rd normal form, advanced complex SQL query, Java/JavaEE/Servlet/JSP, JDBC, DB2 / Oracle / MsSQL / MySQL DBMS, IBM RSA, WebSphere Application Server, etc.).

(3) DARF Mortgage System (An enterprise database-web application that manages the mortgage issuing, user (loan officer, borrower, and co-borrower) information, and loan/payment business of a mortgage company or a bank, built with a plan-driven software engineering approach with security and reliability as the critical quality priorities, using UML (UCD, AD, SD, CD and STD) for software design, RSA, DB2 DBMS and WebSphere Application Server for implementation, and JUnit for unit testing with servlets-JSP for component and system testing, etc.).

(4) DNT AutoParts project (An enterprise application running on a web server such as IBM WAS, developed using Java and JavaEE (IBM RSA) with the model-view-controller (MVC) design pattern; it contains 2 Java projects for setting up or accessing/manipulating data in a DB2 database for the business using JDBC, and 1 dynamic web project using servlets/JSPs/JavaBeans for the routine activities of the business, including database management).

(5) DNT Dentistry project (Object-oriented analysis, design and implementation that uses Java and JavaEE (IBM RSA), DB2 and JDBC, Servlets/JSPs/JavaBeans for all the routine business activities of the Dentistry clinic, which performs dental care and orthodontic care business).

(6) CPU Scheduler (A program to correctly schedule processes between JobQueue, ReadyQueue, CPU, DiskQueue and I/OWaitingQueue using SJF, FCFS and Round Robin algorithms, coded in Java using Eclipse).

(7) Fast Food Restaurant Web Application (A web application using a database implemented on each of the DB2, Oracle, MsSQL, and MySQL RDBMS with Checks, Stored Procedures & Triggers to manage the routine business of a fast food restaurant, including displaying menus, taking orders and recording various orders, etc.).

(8) Readers-Writers Problem Project (A program to solve the first, second and third Readers-Writers concurrency problems using semaphores, coded in Java using Eclipse).

(9) Paging Project (A program to find page faults under LRU or FIFO algorithms using model mainstore, page table, frame list and page reference string, coded in Java using Eclipse).

(10) Disk Scheduler (A program to correctly schedule disk access and traversals using algorithms of FCFS, SSTF and C-LOOK, coded in Java using Eclipse).

(11) Sorting Project (Read data from an input .txt file into an array or a doubly linked list, and sort the array or doubly linked list in ascending order by merge sort, quick sort and other sorting algorithms before exporting the sorted data to a different .txt file; coded in C++ using both CodeLite and Eclipse with the C/C++ plugin).

(12) PHP and XML project (3 XML parsers were programmed using PHP/HTML/Apache Server – SimpleXML, XML Dom Parser and SAX parser).

(13) Animal Guess-and-Learn project (An independent game developed using Java and the data structure GameTree, which extends BinaryTreeBasis and implements GameTreeInterface. The game asks a player to think of an animal, and the computer guesses the animal by asking yes/no questions; the questions and animals in the game tree are expanded and saved for the next player after a user adds some information and decides to leave the game).

(14) Polynomial project (A project developed using Java and the data structures of array and linked list for the addition, subtraction, multiplication and correct display of any kind of polynomial).

(15) Sino-USA Bank project (A relational database was developed on 4 DBMS systems - MySQL / MsSQL / DB2 / Oracle server, ER modeling, Boyce-Codd Normal Form (BCNF) and multivalued dependency, relational algebra, SQL, and PHP/HTML).

(16) Shreveport Loan and Pay Project (A stand-alone software program developed using Visual Basic, which takes factors such as different taxes, house rent, real-time payment variance, loan amount, down-payment amount, loan types and interest rates into consideration to give a customer a full view of each decision he/she makes regarding a specific loan and pay paradigm).

(17) Gene Transcription Motif Finding project (An independent software program using Java and the data structures of 4-ary Full Search Tree and DNA Matrix for finding the motif(s) that regulates gene transcription using brute-force, simplified and branch-and-bound median-string algorithms).

(18) Next-Generation Sequencing project (Analyze the algorithms of NGS short-read mapping software such as ZOOM, which uses Spaced Seeds Technology to increase alignment speed and decrease memory consumption) [Note: An ongoing project - Develop an algorithm and stand-alone program using Java for aligning NGS short reads against a reference sequence or assembling NGS reads de novo].
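
As an illustration of the Paging Project in item (9), counting page faults under FIFO and LRU can be sketched as below. The original was coded in Java with a model main store, page table and frame list. This Python version is a simplified stand-in that tracks only the resident pages, and the function names are hypothetical.

```python
from collections import OrderedDict, deque
from typing import Sequence


def fifo_page_faults(references: Sequence[int], frame_count: int) -> int:
    """Count page faults when the oldest loaded page is evicted first."""
    if frame_count < 1:
        raise ValueError("frame_count must be at least 1")
    load_order = deque()  # pages in the order they were loaded
    resident = set()      # pages currently in a frame
    faults = 0
    for page in references:
        if page in resident:
            continue  # hit: FIFO order is not affected by use
        faults += 1
        if len(load_order) == frame_count:
            resident.discard(load_order.popleft())  # evict oldest page
        load_order.append(page)
        resident.add(page)
    return faults


def lru_page_faults(references: Sequence[int], frame_count: int) -> int:
    """Count page faults when the least recently used page is evicted."""
    if frame_count < 1:
        raise ValueError("frame_count must be at least 1")
    frames = OrderedDict()  # keys ordered from least to most recently used
    faults = 0
    for page in references:
        if page in frames:
            frames.move_to_end(page)  # hit: mark as most recently used
            continue
        faults += 1
        if len(frames) == frame_count:
            frames.popitem(last=False)  # evict least recently used page
        frames[page] = True
    return faults
```

With the common textbook reference string `7,0,1,2,0,3,0,4,2,3,0,3,2,1,2,0,1,7,0,1` and 3 frames, FIFO gives 15 faults and LRU gives 12.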
