Gordon Rios's CV

Gordon Rios (Palo Alto, CA)




Email: gpar...@gmail.com
Academic Email: g.r...@4c.ucc.ie

Objective:

To make a significant and ongoing contribution to deep scientific challenges requiring innovations in computing.

Specialties:

  • Software Development: C++, Ruby, Java, and Python
  • Scientific Computing: R / S+, Matlab (Octave), Maple, Mathematica, LaTeX
  • Machine Learning: Reinforcement Learning, Pattern Recognition, Classification, Clustering
  • Data Mining: Induction Trees, Logistic Regression, Artificial Neural Networks, and Support Vector Machines (SVM)
  • Optimization: Mathematical Programming including Linear and Nonlinear programming; Dynamic programming; Constraint programming
  • Mathematics: Probability, Stochastic Processes, and Algorithms
  • Data Systems: Hypertable, Solr, Hadoop, Nutch, MySQL, Aster Data (SQL/MR)
I have written software for the following platforms:
  • Linux, a strong favorite since 1996
  • Mac OS X (Tiger, Leopard)
  • Several flavors of Unix, including Sun, SGI, Digital, and IBM
  • FreeBSD
  • Windows (Win32 API)
  • DOS

Employment History

05/2009 - Present
Cork Constraint Computation Centre (4C) at Univ. College Cork (UCC), Cork, Ireland
Staff Scientist: Marie Curie Research Fellowship working with Rick Wallace on the Constraint Reasoning Extended to Enhance Decision Support project.

My objectives are to apply software engineering principles and research in machine learning and optimization to the development of environmental, energy management, or ecological protection systems.

01/2009 - 07/2009
ShareThis.com, Mountain View, CA
Research Architect: Responsible for rapidly developing a world-class data mining capability. Moved to technical advisory and research consulting as of August 2009.
  • Provided vision and strategy to move the company to Hadoop + Cascading on AWS for log processing and data mining. The company's first project (log processing) was written up as a case study by Chris Wensel in Hadoop: The Definitive Guide.
  • Recruited three new team members: the team lead for the Insights group (an expert in AWS and Hadoop), the team lead for web applications, and a senior web developer / architect.
  • Prototyped crawl, index, and search capability for shared URLs using Nutch.
  • Created a production prototype of recommendation technology using Aster Data's SQL/MR framework.
  • Developed a wide variety of user and content clustering utilities based on the Boost C++ Graph Library.

11/2006-01/2009
Zvents, Inc., San Mateo, CA

CTO: broad technical leadership and comprehensive vision for an industry-leading local search SaaS company (C++, Ruby, Java, and R)

Performed duties as CTO, helping to build the company from fewer than ten employees to over fifty and into the leading SaaS provider of local "things to do" search. During my first year, I was responsible for leading and growing the engineering team from three to twelve engineers; in my second year, I was responsible for growing the search engineering team from three to six engineers and for integrating search technology throughout engineering. Transitioned to Scientific Adviser in January 2009.

Major accomplishments included:

  • Presented Zvents search technology for Series A/B funding (including the recent investment round led by Nokia) and to key customers including MSN (cityguides.msn.com), AOL (www.when.com, as recently announced in Search Engine Watch), AT&T (yellowpages.com for mobile), and Nokia.
  • Led design and implementation of the content analysis pipeline, including personal development of text mining and fuzzy logic libraries in Ruby.
  • Led design and development of search capability based on Solr (Search on Lucene with Replication).
  • Led design of a comprehensive relevance ranking framework, including "federated" search, and supervised development and deployment of search ranking, spell checking, and de-duplication systems.
  • Led design and helped build a distributed web classification system based on Hadoop; the work led to a filed patent.
  • Sponsored the Hypertable project (an open source, high performance, scalable database) from its inception as a founding team member and organized a technical strategy for Zvents around it.
  • Worked with Zvents' Chief Architect Tyler Kovacs to fully integrate Hypertable into production by developing a Ruby extension using Rice (Ruby Interface for C++ Extensions). The package allows Ruby developers to call a wide array of methods in the Hypertable Client API. The extension is also being used to create HyperRecord, a drop-in replacement for Ruby on Rails ActiveRecord.
  • Worked with Doug Judd to write a tutorial on the Hypertable Query Language (HQL) for the alpha release.
  • Directly hired nine engineers and assisted in recruiting and hiring a VP/Eng, Director/Content, VP/Product, PM/Search, and four senior engineers.
  • Evangelized the adoption of Hadoop and sponsored (and attended) a beta version of Chris Wensel's two-day Hadoop Bootcamp.
  • Prototyped several classes of traffic and content analytics reports in Ruby.
  • Wrote technical design outlines for search relevance, sponsored search listings, and paid listing models for 2009 projects.

2/2005-11/2006
Yahoo!, Sunnyvale, CA

Technical Yahoo (principal): international document and query classification (Python, C++, and R [open source version of S+])

Worked with a broad set of engineering teams to implement classification technologies for international search markets. Conducted research into various aspects of web search relevance measurement and improvement.

  • Worked with a team to transfer machine learning based content classification from the US market to all major international markets. My role was to write tools and design processes that allowed international engineers to build, maintain, and analyze several types of content classification models, including spam and adult content, across markets including Yahoo! China, Yahoo! Japan, Yahoo! Korea, Yahoo! France, and Yahoo! Germany, utilizing Yahoo! US technical resources. My tools were adopted by the US team and are currently used to build and maintain those models as well.
  • Worked with the director of research (Gordon Sun) to develop a novel query classification framework for use by all international markets. Designed and implemented a general framework and library for query classification, currently in production as a C++ library for Yahoo! Japan and as an internal CGI demo across four languages.
  • Designed a novel algorithm for automatically learning binary classification models from two lists of queries or short texts in any language (UTF-8 encoded) or language mix. The algorithm is implemented in C++ as a standalone utility used to generate query classification models that are in production or planned for production.
  • Conducted research on analyzing and training models using subjective ordinal data. Wrote and submitted a paper which, though rejected, was a good first approach to a complex problem. Proposed a direct and clear approach to this problem and presented it to Yahoo! Statistical Advisers Profs. Jerome Friedman and Trevor Hastie of Stanford University.

5/2003-12/2004
Proofpoint, Cupertino, CA

Principal Scientist: Spam Detection and Content Classification (Python, C++, R [open source version of S+], and Perl) 5/2003-12/2004
Technical Adviser: 12/2004-4/2006

Fast-paced research and development of machine learning applications for content classification and spam detection. Responsibilities included the overall architecture of Proofpoint's MLX (machine learning) technology for anti-spam classification and content filtering based on classifier ensembles and multistage models.

  • Collaborated closely with one senior engineer to develop the spam scoring engine for several major releases of the Proofpoint Protection Server, including 1.01, 1.20/1.21, 1.50/1.51, and 2.0/2.01. Made contributions to all releases from 1.01 up to and including 3.0. See "Time Line of Technology Development at Proofpoint" (Wikipedia).
  • Wrote a complete, high-performance, large-scale logistic regression package in C++. Helped design and test the implementation of the phrase feature extraction engine.
  • Developed working prototypes for naive Bayes, induction trees, and self-organizing maps in Python, used for comparative accuracy studies.
  • Developed a complete ensemble classifier framework in Python, used in testing various ensemble strategies and the multistage modeling implementation. Published and presented the paper "Exploring SVM and Random Forests for Spam Detection" at the first annual Conference on Email and Anti-Spam.
  • Wrote numerous utilities for model validation, robustness testing, and reporting, and a wide array of data analytics tools, including eigenvector analysis of feature sets, in C++ and Python. Developed innovative text analysis code (C++) for assessing the probability that text fragments or email addresses were generated by spammers.
  • Performed weekly model updates and data mining with custom-built modeling and validation tools to support several dozen customers representing hundreds of thousands of end users. Led the design and hiring of three positions (including the team lead) responsible for daily model updates and maintenance.
  • Applied for a patent on the update process, entitled "Updating Large Scale Logistic Regression Models" (US Patent 7,461,063).
  • Carefully documented all work in over 25 HTML documents, a large wiki site, and almost 90 graphs and charts posted on the intranet for engineering and marketing use. Regularly conducted technical presentations to support pre-sales and technical sales activity. Contributed technical sections for product white papers, demos, and presentations. Wrote a comprehensive MLX FAQ for sales and marketing.

8/2002-4/2003
Certive, Redwood City, CA

Senior Software Engineer: Analytics and Modeling (Java + EJB, XML, Python)

Developed two Java packages providing high-level services for computing and updating convolutions of Key Performance Indicators (KPIs) used in the company's Business Intelligence platform (JBoss, Tomcat, and JMS). The framework allows application engineers to rapidly create and deploy new stochastic KPIs for the platform (e.g., in 30 minutes or less) by composing them from elemental building blocks. All my development was done on Linux using XEmacs JDE, Ant, and Perforce.

Prototyped resource scheduling algorithms for use in analytics framework using Python. Wrote several small papers for the design and justification of small subcomponents and future enhancements.

Hired as a Principal Software Engineer; the company currently verifies the title as Senior Software Engineer.

2/1999-8/2002
Inktomi, San Mateo, CA

Software Engineer/Scientist (C++, Python, Perl, S+) Web Search Development and Directory Engine groups.

Responsible for implementing software components and utilities for machine learning, statistical computing, and data mining to improve document relevance scoring in Inktomi's Web Search product.

  • Wrote an integrated pipeline in C++ to support the release of the Directory Engine product in 1999. Code was used to provide Directory Engine service to more than 20 large customer sites.
  • Wrote a C++ application for building phrase lists from hundreds of millions of candidates (wrapped around Inxight LinguistX libraries) that was used to improve the accuracy of the Naive Bayes classification component of the Directory Engine product release.
  • Wrote reporting packages in Perl for analyzing the relevance and accuracy of the Directory Engine and Search Engine. The reporting served as one of the key prototypes used to justify building and staffing a Relevance group at Inktomi Websearch.
  • Designed a complete rescoring solution to precompute results for the most popular queries, deployed as a prototype web search product at Snap and Hotbot.com. Wrote a key system component in C++ that integrated user click-through data, output from link analysis software, and the search engine rank to process hundreds of millions of URLs for final ranking. The project included building tree-based classification and logistic regression models in S+ and writing utilities in Perl for mapping S+ model output to C++ include files. Internal testing showed that the top-ranked result had higher relevance than all other search competitors, including Google.
  • Wrote several thousand lines of Python code and modules for performing a wide range of click-through analysis and annotations for use in the development and aggressive growth of Inktomi's paid inclusion products. The code is regularly used to support decisions about the quality and effectiveness of ongoing content submissions ranging from tens of thousands to millions of URLs.

10/1998-2/1999
Documentum's Relevance Product Group, San Francisco, CA

Consulting Staff Scientist (Java, XML, Mathematica). Wrote four mathematical papers analyzing Relevance's information retrieval system. Conducted a detailed evaluation and comparison of the text processing capabilities of the Inxight and INSO software libraries.

11/1996-10/1998
Cogit Corporation, San Francisco

Lead Software Designer (C++, Perl, Tcl/Tk). Developed a complete induction tree (machine learning) engine for Cogit's primary data mining capability, used in production runs with over six million records and 350+ data fields on a Digital AlphaServer 8400 with eight processors. Set direction for predictive modeling software development across a distributed team of four PhDs: Jean Marc Langlois (physics), Jeffrey Davitz (statistics), Pieter Hazewindus (computer science), and Jonathan Dickinson (economics). Designed and developed the following server-side components:

  • Robust induction tree engine derived from CART that incorporated multiple treatments directly into the splitting criteria of the algorithm.
  • Adaptive refinement for tree models based on response data.
  • Automated model-based experimental design (patent filed 1999).
  • Neural network for combining and updating scores from multiple models.
All software was portable across more than a dozen combinations of platform and compiler. During this time I also prototyped analysis utilities in Perl (including the Tk extension) and Tcl/Tk. These utilities were used by company analysts and scientists to support engagements with several clients, including Wells Fargo, Capital One, Nabisco, and BellSouth.

1/1996-11/1996
Scudder, Stevens & Clark, San Francisco, CA

Senior Quantitative Analyst (C++, Perl, and S+). Consulted with the managing director on applications of advanced computing technologies and data visualization for mutual funds and acquisitions of related businesses. Modeled analyst rating systems, built a stock screening system, and developed a Perl-based reporting package. The rating and screening applications were implemented in FactSet and used by Scudder's Large Cap Value fund.

6/1992-1/1996
Allstate Research and Planning Center (ARPC), Menlo Park, CA

Research Associate (C++ and S+). Supervised a research team in the application of Machine Learning Technologies (Neural Networks, Genetic Algorithms, Fuzzy Logic, and Agent Based Simulation) and Complex Systems Theory to investment problems. Applications included time series analysis and forecasting, data visualization, industry simulation, decision analysis, and optimization. Winner of the ARPC Cutting Edge Research Award.

Staff Research Analyst (C and S+). Developed a fuzzy logic expert system to assist in the asset allocation decision for Allstate Pension Plans of more than two billion dollars. Published results in AI/Finance. Developed Localized Linear Regression Models and Genetic Algorithm (automated variable selection) systems for time series analysis and forecasting of BARRA risk components used in portfolio optimizations for a multi-billion dollar common stock portfolio. Developed Nonlinear Regression models (prepayments) for derivative pricing and analysis, used by Allstate investment managers to build and maintain a multi-billion dollar MBS and CMO portfolio.

Senior Research Analyst (IMSL/IDL and S+). Developed a multi-asset capital market return simulator for use in a large-scale stochastic optimization program developed jointly with IBM's TJ Watson Lab. Developed Monte Carlo simulations and nonlinear regression models for forecasting and risk management. The overall program was used to aid in the asset allocation decision for Allstate's $60+ billion portfolio.

1990-92
University of California, Berkeley
Haas School of Business

Graduate Student Researcher. Free trade agreements. Fall 1990.
Graduate Student Instructor: International Business. Material included financial risk analysis including currency hedging strategies. Spring 1991.
Graduate Student Instructor: Microeconomics (w/Calculus). Material included linear programming and game theory. Responsibilities included writing and grading exams, leading discussion sections and case studies, and conducting review sessions. Fall 1991 and Spring 1992. Earl F. Cheit Student Outstanding Teaching Award Winner, Spring 1992.

1989-1990
University of Maryland, University College

Database Programmer (dBase III+).
Managed PC lab and wrote database applications for administration staff. Wrote research application for history department.

1983-89
United States Air Force

Intelligence Specialist and Supervisor
Performed intelligence and imagery analysis for strategic and tactical operations and training. US duty locations: Texas, Colorado, Utah, Nevada, and Guam. Foreign duty locations: England and Norway. Honor graduate: Intelligence Technical School, Tactical Operational Intelligence Course (Nellis AFB), and NCO Academy. Security clearance: TS-SCI.

Additional Software Projects

I've worked on many additional software projects for a variety of research and commercial applications. This work has allowed me to build expertise as a component specialist: essentially, packaging sophisticated analytic and scientific functionality as high-level software services.

Academic Degrees and Coursework

1999-2002
University of California at Berkeley, College of Engineering

M.S., Operations Research

Still conducting intermittent research with Prof. Stuart Dreyfus in Reinforcement Learning and Dynamic Programming with applications to adaptive decision systems, scheduling, agent learning, and games.

1990-1992
University of California at Berkeley, Haas Graduate School of Business

M.B.A., Management Science and Computational Finance
Master's Thesis: Institutional Portfolio Insurance with Two Color Rainbow Options. Developed simulation and pricing software for exotic options under the supervision of Prof. Mark Rubinstein. Work included implementing a numerical approximation for the bivariate normal distribution. Other courses included probability theory with Prof. Jim Pitman.

1987-1990
University of Maryland, University College

B.S., Business Management, Mathematics, and Computer Technology
University of Maryland President's Fellowship. Summa Cum Laude and Phi Kappa Phi.


Additional Coursework and Professional Seminars

Continuing Studies

  • Advanced Perl Programming, Sriram Srinivasan, U.C. Berkeley Extension 1996.
  • Soft Computing Technologies, U.C. Berkeley Extension 1994.
  • Neural Networks for Pattern Recognition, David Stork, U.C. Berkeley Extension 1994.
  • Advanced Data Structures in C, U.C. Berkeley Extension 1993.
  • C Language Programming, U.C. Santa Cruz Extension 1993.
  • Introduction to Chaos Theory, U.C. Berkeley Extension 1992.

Selected Professional Seminars and Courses

  • Hadoop Boot Camp, Scale Unlimited, 2008.
  • Frontiers of Derivatives Research, John Hull and Alan White, University of Toronto 1995.
  • Soft Computing Days in San Francisco, Berenji, Zadeh, Widrow, and Stork 1995.
  • Modern Regression Methods with S-PLUS, Statistical Sciences 1993.

Selected Talks, Publications, and Programs

  • Patterns (and Anti-Patterns) for Developing Machine Learning Systems. Invited talk at SysML '08, San Diego, CA, December 11, 2008.
  • Subjective Ordinal Labels: Empirical Study. Presented to Yahoo! Statistical Advisers Profs. Jerome Friedman and Trevor Hastie at a company-level mini-conference; based on research following a paper submitted to NIPS (rejected), 2006.
  • Federated Search Applications. INRIX Corporation, 2006.
  • First Yahoo! Open Hack Day. Presentation on work with two engineers to create a search engine based on a novel click-through modeling approach.
  • Machine Learning and Statistics for Internet Services. Book outline and introduction written for Brett McLaughlin of O'Reilly in response to my book proposal.
  • Third Conference on Email and Anti-Spam (CEAS 2006). Program committee member.
  • Second Conference on Email and Anti-Spam (CEAS 2005). Program committee member and session chair for machine learning.
  • Exploring Support Vector Machines and Random Forests for Spam Detection. First Annual Conference on Email and Anti-Spam (CEAS 2004). Mountain View, CA.
  • Market Mechanisms for Community Web Search. Yahoo Research Labs. Pasadena, CA.
  • Machine Learning for Spam Detection. Proofpoint webinar. Cupertino, CA.
  • Clustering for Image Data. NVIDIA performance group meeting. Santa Clara, CA.
  • Competitive Measures of Search Engine Relevance. LookSmart. San Francisco, CA.
  • Usefulness: A New Framework for Search Engine Relevance. Inktomi Seminar. Foster City, CA.
  • Applications of Peer-to-Peer Computing for Information Retrieval. Centrata. Menlo Park, CA.
  • Complexity Theory: New Paradigms for Quantitative Analysis. Annual Investment Meeting, Allstate Insurance. Kohler, WI.
  • Quantitative Investment Analysis: Perspectives from Complexity Theory. Annual Investment Officers' Meeting, Allstate Insurance. Menlo Park, CA.
  • Surveying Decision Maker Preferences for Linear Objectives. Stanford University Tenth Annual Operations Research Symposium. Palo Alto, CA.
  • Modern Applied Statistics with S-PLUS. Gave an S-PLUS demonstration at Applied Materials, Inc. for StatSci, Inc. Santa Clara, CA.
  • Asset Allocation for Pension Plans. Advanced Technologies for Trading and Asset Management. New York, NY.
  • Asset Allocation for Pension Plans. AI/Finance, January 1994. Miller-Freeman, San Francisco.
  • Three Feature Articles. HaasWeek: Student Newspaper for Haas School of Business, UC Berkeley.
  • Back to the Issues. San Francisco Chronicle, Letter to the Editor, October 1998.

Professional Associations and Memberships

Languages (other than English)

  • Familiarity with spoken and written Spanish.
  • Familiarity with spoken Russian.