Meta-Mathematics and Meta-Economics: In Defense of Adam Smith and the Invisible Hand

Nimai Mehta, American University

Yong Yoon, American University

04/21/2022

Abstract: The recent AU Math/Stat colloquium by Gaël Giraud (03/22) has inspired us to consider more critically the application of mathematics to economics. Neoclassical economics, when seen as applied mathematics, has tended to assume the form of a self-contained, closed system of propositions and proofs. Progress in explaining real-world markets and institutions, however, has more often been the result of insights that have emerged from outside the discipline and, in turn, have led to a reworking of existing theoretical propositions and models. We highlight here the debt that economic science owes to Adam Smith, whose early insights on the nature of exchange, the division of labor, and the “invisible hand” continue to help modern economists push the boundaries of the science. We will illustrate the value of Smith’s ideas to economics by showing how they help overcome some of the game-theoretic dilemmas of multiple equilibria, instability, and non-cooperative outcomes highlighted by Giraud.

Computational Modeling in Data Science: Applications and Education

Kateryna Nesvit, American University

04/19/2022

Abstract: The world around us is full of data, and it is interesting to explore, learn, teach, and use the data efficiently. The first part of this presentation focuses on several of the most productive numerical approaches in real-life web/mobile applications to predict and recommend objects. The second part of the talk focuses on the necessary skills/courses of data science techniques to build these computational models.

Generalizing a Mysterious Pattern

Dan Kalman, American University

04/05/2022

Abstract: In his book, Mathematics: Rhyme and Reason, Mel Currie discusses what he calls a mysterious pattern involving the sequence a_n = 2^n √(2 − √(2 + √(2 + ⋯ + √2))), where n is the number of nested radicals. The mystery hinges on the fact that a_n → π as n → ∞. In this talk, we explore a variety of related results. It is somewhat surprising how many interesting extensions, insights, or generalizations arise. Here are a few examples:

2^n √(2 − √(2 + √(2 + ⋯ + √3))) → 2π/3 ;  2^n √(2 − √(2 + √(2 + ⋯ + √(1 + φ)))) → 4π/5 ;  2^n √(−2 + √(2 + √(2 + ⋯ + √(16/3)))) → 2 ln 3.

(Note that φ is the golden mean, (1 + √5)/2.) The basis for this talk is ongoing joint work with Currie.
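A quick numerical check of these limits is easy to run; the following Python sketch (the choice of n and the helper names are ours, not from the talk) evaluates a_n for several choices of the deepest radical:

import math

PHI = (1 + math.sqrt(5)) / 2

def a(n, deepest=2.0, hyperbolic=False):
    """a_n with n nested radicals: one outer radical around (2 - chain) or (chain - 2),
    and an inner chain of n-1 '+' radicals ending in sqrt(deepest)."""
    x = math.sqrt(deepest)
    for _ in range(n - 2):
        x = math.sqrt(2 + x)
    return 2**n * math.sqrt(x - 2 if hyperbolic else 2 - x)

n = 15
print(a(n), math.pi)                                           # -> pi
print(a(n, deepest=3), 2 * math.pi / 3)                        # -> 2*pi/3
print(a(n, deepest=1 + PHI), 4 * math.pi / 5)                  # -> 4*pi/5
print(a(n, deepest=16 / 3, hyperbolic=True), 2 * math.log(3))  # -> 2*ln(3)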

Some Applications of Mathematics in Economics

Gaël Giraud, Georgetown University

03/22/2022

Abstract: We will cover a few applications of mathematics in today's economic modelling. Three topics will be explored: algebraic topology in game theory; continuous-time dynamical systems in macroeconomics; stochastic calculus in econometrics. Each topic will be illustrated with several examples.

Multiscale mechanistic modelling of the host defense in invasive aspergillosis

Henrique de Assis Lopes Ribeiro, Laboratory for Systems Medicine, UF-Health, University of Florida

10/26/2021

Abstract: Fungal infections of the respiratory system are a life-threatening complication for immunocompromised patients. Invasive pulmonary aspergillosis, caused by the airborne mold Aspergillus fumigatus, has a mortality rate of up to 50% in this patient population. The lack of neutrophils, a common immunodeficiency caused by, e.g., chemotherapy, disables a mechanism of sequestering iron from the pathogen, an important virulence factor. This paper shows that a key reason why macrophages are unable to control the infection in the absence of neutrophils is the onset of hemorrhaging, as the fungus punctures the alveolar wall. The result is that the fungus gains access to heme-bound iron. At the same time, the macrophage response to the fungus is impaired. We show that these two phenomena together enable the infection to be successful. A key technology used in this work is a novel dynamic computational model used as a virtual laboratory to guide the discovery process. The paper shows how it can be used further to explore potential therapeutics to strengthen the macrophage response.

Double reduction estimation and equilibrium tests in natural autopolyploid populations

David Gerard, American University

10/19/2021

Abstract: Many bioinformatics pipelines include tests for equilibrium. Tests for diploids are well studied and widely available but extending these approaches to autopolyploids is hampered by the presence of double reduction, the co-migration of sister chromatid segments into the same gamete during meiosis. Though a hindrance for equilibrium tests, double reduction rates are quantities of interest in their own right, as they provide insights about the meiotic behavior of autopolyploid organisms. Here, we develop procedures to (i) test for equilibrium while accounting for double reduction, and (ii) estimate double reduction given equilibrium. To do so, we take two approaches: a likelihood approach, and a novel U-statistic minimization approach that we show generalizes the classical equilibrium χ² test in diploids. Our methods are implemented in the hwep R package on the Comprehensive R Archive Network: https://cran.r-project.org/package=hwep.

The talk will be based on the author’s new preprint: https://doi.org/10.1101/2021.09.24.461731
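For context, here is a minimal Python sketch of the classical one-degree-of-freedom chi-square test of equilibrium at a diploid biallelic locus, the test that the U-statistic approach above generalizes (the genotype counts are made up; the hwep package itself is implemented in R and is not reproduced here):

import numpy as np
from scipy.stats import chi2

def hwe_chisq_diploid(n_AA, n_Aa, n_aa):
    """Classical chi-square test of Hardy-Weinberg equilibrium at one biallelic diploid locus."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)                       # estimated frequency of allele A
    expected = np.array([p**2, 2 * p * (1 - p), (1 - p)**2]) * n
    observed = np.array([n_AA, n_Aa, n_aa])
    stat = ((observed - expected) ** 2 / expected).sum()
    return stat, chi2.sf(stat, df=1)                      # 1 df: 3 categories - 1 - 1 estimated parameter

stat, pval = hwe_chisq_diploid(298, 489, 213)             # hypothetical genotype counts
print(round(stat, 3), round(pval, 3))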

If you talk to these materials, will they talk back?

Max Gaultieri, Wilson Senior HS

10/5/2021

Abstract: This talk will cover the process behind building and writing code for a sonar system. Next, the strength and characteristics of sound reflecting off different materials will be discussed using data collected by the sonar system.

Mentor: Dr. Michael Robinson
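As a point of reference for how such a system turns an echo into a measurement, the basic ranging arithmetic is just the following (this snippet is illustrative and is not the student's code):

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 degrees C

def echo_distance(round_trip_seconds):
    """Distance to a reflector: the pulse travels out and back, so halve the round trip."""
    return SPEED_OF_SOUND * round_trip_seconds / 2.0

print(echo_distance(0.012))  # a 12 ms echo corresponds to roughly 2.06 m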





Investigation of Affordable Rental Housing across Prince George’s County, Maryland

Zelene Desiré, Georgetown Visitation Preparatory School

10/5/2021

Abstract: Prince George’s County residents experience a shortage of affordable rental housing which varies across ZIP codes. This research investigates whether data from the US Census Bureau’s American Community Survey can help in explaining the differences across the county.

Mentor: Dr. Richard Ressler

Investigation of COVID-19 Vaccination Rates across Prince George’s County, Maryland

Nicolas McClure, Georgetown Day School

10/5/2021

Abstract: Prince George’s County reports varying rates of vaccinations for COVID-19 across the county’s ZIP codes. This research investigates whether data from the US Census Bureau’s American Community Survey can help in explaining the differences in vaccination rates across the county.

Mentor: Dr. Richard Ressler

Differential Privacy and the 2020 Census in the United States

John Maron Abowd, Associate Director and Chief Scientist, Research and Methodology Directorate, U.S. Census Bureau

4/14/2021

Abstract: The talk will focus on the implementation of differential privacy used to protect the data products in the 2020 Census of Population and Housing. I will present a high-level overview of the design used for the majority of the data products, known as the TopDown Algorithm. I will focus on the high-level policy and technical challenges that the U.S. Census Bureau faced during the implementation including the original science embodied in that algorithm, implementation challenges arising from the production constraints, formalizing policies about privacy-loss budgets, communicating the effects of the algorithms on the final data products, and balancing competing data users' interests against the inherent privacy loss associated with detailed data publications.
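As a toy illustration of the basic idea only (the production TopDown Algorithm uses a discrete noise distribution and extensive post-processing, none of which is shown here), the textbook Laplace mechanism for a single count looks like this in Python:

import numpy as np

def noisy_count(true_count, epsilon, rng):
    """Laplace mechanism for a count query: the sensitivity of a count is 1."""
    return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)

rng = np.random.default_rng(0)
true_block_count = 137                          # hypothetical block-level count
for eps in (0.1, 1.0, 10.0):
    draws = [round(noisy_count(true_block_count, eps, rng), 1) for _ in range(3)]
    print(eps, draws)                           # smaller epsilon -> more noise -> stronger privacy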

INTRODUCTION TO THE ∂̄-NEUMANN PROBLEM

Der-Chen Chang, Georgetown University

3/30/2021

Abstract: In this talk, we first introduce the ∂̄-Neumann problem and the geometric background of analysis in several complex variables. Next, we use the method developed by Greiner-Stein and Chang-Nagel-Stein to construct the fundamental solution for the ∂̄-Neumann problem. Then we will discuss possible sharp estimates for the operator N. To achieve this goal, we need to deal with new classes of singular integral operators, which have become a central topic in harmonic analysis.

The Analysis of Periodic Point Processes


Stephen D. Casey, American University

Thomas J. Casey

3/16/2021

Abstract: Point processes are an important component of data analysis, from the queuing theory used to analyze customer arrival times in business to the occurrences of radar pulses in signal analysis to the analysis of neuron firing rates in computational neuroscience. Our talk addresses the problems of extracting information from periodic point processes. We divide our analysis into two cases – periodic processes created by a single source, and those processes created by several sources. We wish to extract the fundamental period of the generators, and, in the second case, to deinterleave the processes. We first present a very efficient algorithm for extracting the fundamental period from a set of sparse and noisy observations of a single-source periodic process. The procedure is straightforward and converges quickly. Its use is justified by a probabilistic interpretation of the Riemann zeta function. We then build upon this procedure to deinterleave and then analyze data from multiple-source periodic processes. This second procedure relies on the probabilistic interpretation of the Riemann zeta function, the equidistribution theorem of Weyl, and Wiener’s periodogram. Both procedures are general and very efficient and can be applied to the analysis of all periodic processes. The focus of this talk is to describe the algorithms and provide analysis as to why they work. Both algorithms rely on the structure of randomness, which tells us that random data can settle into a structure based on the set from which the data is extracted. We close by demonstrating simulations of the procedures.
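The speakers' zeta-function-based algorithm is not reproduced here, but the flavor of the single-source problem can be seen in a naive grid-search (periodogram-style) sketch in Python, with a made-up period and sparse, jittered observations:

import numpy as np

rng = np.random.default_rng(0)
T0 = 2.5                                            # true fundamental period (toy value)
k = np.arange(80)
keep = rng.random(k.size) < 0.4                     # most pulses go unobserved (sparse data)
times = k[keep] * T0 + rng.normal(0, 0.02, keep.sum())

taus = np.arange(0.5, 5.0, 0.002)                   # candidate periods
# Phase coherence: |mean of exp(2*pi*i*t/tau)| is near 1 when tau lines the events up on a common phase.
score = np.abs(np.exp(2j * np.pi * times[None, :] / taus[:, None]).mean(axis=1))

# Divisors of the true period also score highly, so report the LARGEST candidate
# whose coherence is close to the maximum.
good = taus[score > 0.9 * score.max()]
print(good.max())                                   # close to 2.5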

Isometric Data Embeddings: Visualizations and Lifting Signal Processing to Arbitrary Datasets


Nate Strawn, Georgetown University Dept. of Mathematics and Statistics

2/23/2021

Abstract: We propose several extensions of lossless PCA to isometrically embed arbitrary Euclidean datasets into high-dimensional spaces of images and spaces of paths in 2D and 3D. In particular, these procedures produce "visually coherent" images or paths for each data point. Such embeddings provide an interesting tool for Exploratory Data Analysis and allow us to apply mature techniques from Image Processing, Computer Vision, and Signal Processing to arbitrary datasets. We discuss theory, algorithms, and applications to Dictionary Learning, Deep Learning, Topological Data Analysis, and Data Imputation. We also discuss relevant open problems arising from optimization over spaces of Fusion Frames, and generalizations of interlacing inequalities and majorization for Fusion Frames.
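The image- and path-valued constructions from the talk are not reproduced here, but the starting point, that PCA with all components retained is an isometry (a centering followed by a rotation), is easy to check numerically in Python:

import numpy as np
from sklearn.decomposition import PCA
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))

# Keeping every principal component only centers and rotates the data,
# so all pairwise Euclidean distances are preserved ("lossless" PCA).
Z = PCA(n_components=5).fit_transform(X)
print(np.allclose(pdist(X), pdist(Z)))   # True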

Weight calibration to improve efficiency for estimating pure absolute risks from the proportional and additive hazards model in the nested case-control design

Yei Eun Shin, PhD, Biostatistics Branch | Division of Cancer Epidemiology and Genetics, National Cancer Institute | National Institutes of Health

2/16/2021

Abstract: Cohort studies provide information on the risks of disease. For rare outcomes, large cohorts are needed to have sufficient numbers of events, making it costly to obtain covariate information on all cohort members. The nested case-control (NCC) design is one of the most popular cost-effective subsampling designs in such epidemiological studies. Standard NCC studies only use case-control subsamples in which information is complete. Recent studies incorporate covariate information available in the entire cohort using weight calibration techniques to improve the estimation of the covariate effects of hazard models. My objective is to extend the weight calibration approaches to improve the estimation of pure absolute risks by additionally incorporating survival information such as follow-up times. Two model frameworks, the Cox proportional hazards model and Aalen’s additive hazards model, are considered. Simulations show how much precision is improved by calibration and confirm the validity of inference based on asymptotic normality. Examples are provided using data from the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial (PLCO) Study.
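The survival-augmented calibration proposed in the talk is not shown here, but the underlying weight-calibration step can be sketched generically in Python: linearly adjust the design weights of a subsample so that weighted totals of auxiliary variables known on the whole cohort are reproduced exactly (toy data, GREG-type linear calibration):

import numpy as np

rng = np.random.default_rng(0)
N, n = 5000, 400                                   # cohort size and subsample size (toy numbers)
X = np.column_stack([np.ones(N), rng.normal(size=N), rng.binomial(1, 0.4, N)])  # cohort-wide auxiliaries
idx = rng.choice(N, n, replace=False)              # stands in for the case-control subsample
x, d = X[idx], np.full(n, N / n)                   # subsample auxiliaries and base design weights

T = X.sum(axis=0)                                  # known cohort totals
M = (x * d[:, None]).T @ x
lam = np.linalg.solve(M, T - d @ x)
w = d * (1 + x @ lam)                              # linearly calibrated weights

print(np.allclose(w @ x, T))                       # True: calibrated weights reproduce the cohort totals exactly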

A Bilevel Optimization Method for an Exact Solution to Equilibrium Problems with Binary Variables

Sauleh Siddiqui, Dept. of Environmental Science, Dept. of Math & Stat, American University

Research Fellow at the German Institute for Economic Research (DIW Berlin).

2/9/2021

Abstract: Bilevel optimization problems are used to model decisions that are constrained by other decisions, such as setting policies in energy markets, making long-term infrastructure decisions, and optimizing hyperparameters in machine learning models. We provide an introduction to bilevel optimization and propose a new method to find exact Nash equilibria in games with binary decision variables. We include compensation payments and incentive-compatibility constraints from non-cooperative game theory directly into an optimization framework in lieu of using first-order conditions of a linearization, or relaxation of integrality conditions. The reformulation offers a new approach to obtain and interpret dual variables to binary constraints using the benefit or loss from deviation rather than marginal relaxations. The method endogenizes the trade-off between overall (societal) efficiency and compensation payments necessary to align incentives of individual players. We provide existence results and conditions under which this problem can be solved as a mixed-binary linear program. We apply the solution approach to a stylized nodal power-market equilibrium problem with binary on-off decisions. This illustrative example shows that our approach yields an exact solution to the binary Nash game with compensation. We compare different implementations of actual market rules within our model, in particular constraints ensuring non-negative profits (no-loss rule) and restrictions on the compensation payments to non-dispatched generators. We discuss the resulting equilibria in terms of overall welfare, efficiency, and allocational equity.
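For readers new to the setting, a binary Nash equilibrium can be checked by brute force on a tiny example (the payoffs below are invented for illustration; the talk's bilevel reformulation and compensation payments are not shown):

from itertools import product

# payoff[i][(a1, a2)] = payoff to player i when player 1 plays a1 and player 2 plays a2
payoff = [
    {(0, 0): 0, (0, 1): 0, (1, 0): 3, (1, 1): 1},   # player 1
    {(0, 0): 0, (0, 1): 3, (1, 0): 0, (1, 1): 1},   # player 2
]

def is_nash(profile):
    """A profile is a Nash equilibrium if no player gains by a unilateral deviation."""
    for i in (0, 1):
        for dev in (0, 1):
            alt = list(profile)
            alt[i] = dev
            if payoff[i][tuple(alt)] > payoff[i][profile]:
                return False
    return True

print([p for p in product((0, 1), repeat=2) if is_nash(p)])   # -> [(1, 1)]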

An Analysis of IQ-Link (TM)

Donna Dietz, American University

11/17/2020

Abstract: From the moment I first opened my new IQ-Link puzzle (created by Raf Peeters), I felt a very strong attraction to the toy. As Raf says, "It almost seems like the puzzle pieces are jewels or are made of candy... The object of the game is to make all puzzle pieces fit on the game board.'' The pieces themselves are colorful, shiny, translucent, and smooth to the touch. They even resonate with a gently musical percussive sound on their board! The toy is portable and inexpensive, so it makes a great gift.

But what is it, really? Should you bother? You get a sense that you're trying to solve a problem that has connections to molecular chemistry, but what if you're just being fooled by its superficial beauty? Perhaps you, like me, feel the Sudoku "strategy'' of trial-and-error-until-you-drop is not really strategy at all. According to computer security expert Ben Laurie, "Sudoku is a denial of service attack on the human intellect''. Peter Norvig, in an attempt to cure his wife of this "virus'', wrote an article on his website, "Solving Every Sudoku Puzzle''. His code fits on a single page and can solve the hardest Sudoku puzzle he could find in only 0.01 seconds. What if IQ-Link is a game of that sort?

Puzzles of yesteryear were created by hand and intended to be solved by hand. Now, computers can generate endless "puzzles'' for humans, but are they really anything other than clerical speed and accuracy drills? When we, as recreational mathematicians, speak to this point, we can give puzzlists a general idea about new games when they come out on the market. This gives a sense of who would like the toy. In this discussion, I shed light on this complexity question for IQ-Link, a game for which I have found no published theoretical research.

Control of inward solidification in Cryobiology

Anthony Kearsley, National Institute of Standards and Technology (NIST)

11/10/2020

Abstract: For many years mathematical models that predict a cell’s response to encroaching ice have played an important role in developing cryopreservation protocols. It is clear that information about the cellular state as a function of cooling rate can improve the design of cryopreservation protocols and explain reasons for cell damage during freezing. However, previous work has ignored the interaction between the important solutes, the effects on the state of the cell being frozen, and encroaching ice fronts. In this talk, I will survey our work on this problem and examine the cryobiologically relevant setting of a spherically-symmetric model of a biological cell separated by a ternary fluid mixture from an encroaching solid–liquid interface, and will illustrate our work on a simplified 1-D problem. In particular, I will demonstrate how the thermal and chemical states inside the cell are influenced and can potentially be controlled by altering cooling protocols at the external boundary.

Harnessing Dataset Complexity in Classification Tasks

Nathalie Japkowicz, American University

10/27/2020

Abstract: The purpose of this talk is to discuss two particular and related aspects of dataset complexity—Multi-Modality and Subclass Mix—in classification tasks, observe their effects, and show how they can be harnessed using relatively simple principles to improve classification performance. We will present four studies that show the difficulties these aspects cause for three types of learning paradigms: Binary, Multi-Class and One-Class Classification. Particular attention will be given to situations where the data presents additional challenges such as class imbalances, small disjuncts and general data scarcity. A number of approaches for harnessing these problems will be presented, all relying on the same or a similar principle.

Machine learning improves estimates of environmental exposures

Yuri Levin-Schwartz, Icahn School of Medicine at Mount Sinai

10/20/2020

Abstract: In environmental studies, the true level of exposure to environmental chemicals (e.g., lead, mercury, etc.) is unknown and must be estimated. The most common way to estimate exposure is with the use of "exposure biomarkers," measures of the chemical of interest or its metabolites in a biological sample (e.g., blood, urine) from the subject. The inherent assumption is that higher levels of exposure should generally translate into higher levels in the exposure biomarkers. However, different biomarkers have variable utility as surrogate measures of exposure and no single biomarker is ideal for all chemicals. A natural question is: can multiple biomarkers be combined to improve exposure estimates? In this talk, I will describe the use of multiple machine learning methods to address this question and show how their successful application can improve our ability to highlight the effects of environmental exposures on human health.
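A simplified version of the question can be simulated in a few lines of Python: generate a latent exposure, several noisy biomarkers of it, and compare the best single biomarker with a learned combination (purely synthetic data; in real studies the model is trained against a reference or validation measure rather than the unknown truth):

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 3000
exposure = rng.normal(size=n)                                   # latent true exposure
biomarkers = np.column_stack([exposure + rng.normal(0, s, n) for s in (1.0, 1.2, 1.5)])

X_tr, X_te, e_tr, e_te = train_test_split(biomarkers, exposure, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, e_tr)

def r2(a, b):
    return np.corrcoef(a, b)[0, 1] ** 2

print("best single biomarker R2:", round(max(r2(X_te[:, j], e_te) for j in range(3)), 2))
print("combined estimate R2:    ", round(r2(model.predict(X_te), e_te), 2))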

Hitting objects with random walks

John P. Nolan, American University

10/13/2020

Abstract: Random walks are a powerful tool for studying the structure of complicated objects. This talk describes the key ideas for classical Brownian motion and its relationship to potential theory and capacity. Then we describe a generalization to alpha-stable motion and the computation of alpha-capacity.
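A crude Monte Carlo version of the hitting question, with Gaussian-increment walks approximating Brownian motion, illustrates the connection (the radii and step size below are arbitrary choices):

import numpy as np

rng = np.random.default_rng(0)
r_start, r_target, r_outer = 3.0, 1.0, 20.0        # start radius, target disk radius, escape radius
step, n_paths, max_steps = 0.1, 1000, 200_000

pos = np.zeros((n_paths, 2))
pos[:, 0] = r_start
hit = np.zeros(n_paths, dtype=bool)
active = np.ones(n_paths, dtype=bool)

for _ in range(max_steps):
    if not active.any():
        break
    pos[active] += rng.normal(0.0, step, (active.sum(), 2))
    r = np.linalg.norm(pos, axis=1)
    hit |= active & (r < r_target)                 # walk reached the unit disk
    active &= (r >= r_target) & (r <= r_outer)     # stop paths that hit either boundary

# For planar Brownian motion the exact value is ln(r_outer/r_start)/ln(r_outer/r_target), about 0.63 here.
print(hit.mean())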

Incorporating survival data to case-control studies with incident and prevalent cases

Soutrik Mandal, Division of Cancer Epidemiology and Genetics (National Cancer Institute | National Institutes of Health)

09/29/2020

Abstract: Typically, case-control studies only include incident cases to estimate odds-ratios for the association of risk factors with outcome from logistic regression models. Incorporating prevalent cases requires adjustment of the logistic model for the time between disease diagnosis and sampling, the backward time, to ensure unbiased odds-ratio estimates. To accommodate this survival bias in prevalent cases via backward time adjustment, one needs to estimate the distribution of time from disease onset to death. To relax parametric assumptions on this distribution (needed when only backward times are available), we propose a computationally simple two-step procedure to incorporate additionally observed prospective survival time from all cases into the analysis of case-control studies with prevalent cases. We illustrate the proposed method through simulation studies and analyze the United States Radiologic Technologists Study to assess the association of SNPs in candidate genes with risk of breast cancer. Work done in collaboration with Jing Qin and Ruth Pfeiffer.

Statistical analysis of epidemic counts data: modeling, detection of outbreaks, and recovery of missing data

Michael Baron, American University

09/22/2020

Abstract: Official counts from the seasonal flu, 2009 H1N1 flu, and the COVID-19 epidemics are analyzed with the goals of (1) detecting epidemic outbreaks and any anomalous deviations from the expected trends; and (2) estimating the counts that are not included in official reports.

An influenza outbreak becomes an epidemic when the mortality it causes exceeds the epidemic threshold. It is possible, though, to use statistical change-point detection tools to identify an outbreak earlier and to predict an epidemic. Construction of a change-point algorithm for the popular SIR epidemic model brings us to a more general class of binomial thinning processes. The standard CUSUM stopping rule is no longer optimal in this case. We show how it can be improved with a dynamically adaptive threshold. The resulting scheme attains a shorter detection delay under asymptotically the same rate of false alarms.
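A minimal version of the classical Poisson CUSUM (fixed threshold and made-up rates; the dynamically adaptive threshold and binomial thinning discussed above are not shown) looks like this in Python:

import numpy as np

rng = np.random.default_rng(0)
lam0, lam1, change_day = 100.0, 130.0, 60          # baseline rate, outbreak rate, true change day (toy values)
counts = np.concatenate([rng.poisson(lam0, change_day), rng.poisson(lam1, 40)])

# Classical CUSUM: accumulate the log-likelihood ratio of each count, reset at zero, alarm above h.
llr = counts * np.log(lam1 / lam0) - (lam1 - lam0)
S, h = 0.0, 5.0
for day, z in enumerate(llr):
    S = max(0.0, S + z)
    if S > h:
        print(f"alarm on day {day} (true change at day {change_day})")
        break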

As we learn from the current COVID-19 pandemic, the officially reported daily counts of infected, recovered, and perished people are underestimated. A substantial portion of infected people is not tested, many recovered cases are not reported, and the proportion of unobserved and under-observed counts varies by territory and changes in time, because of different and changing diagnostics and reporting standards.

We develop a stochastic model that includes untested individuals and unobserved COVID-19 recoveries and casualties, extending the SIR model to include additional compartments. Its parameters such as the infection rate, the testing rate, the recovery rate, the mortality rate, and the reporting rate may vary continuously in time. The proposed Bayesian algorithm uses observed counts to estimate the model parameters and unobserved counts, continuously updating the estimates with new data.

Further Visualizations of the Census: ACS, Redistricting, Names Files; & HMDA using Shapefiles, BISG, Venn Diagrams, Quantification, 3-D rotation, & Animation

Martha Dusenberry Pohl, American University

09/15/2020

Abstract: Data visualization examples, including Venn Diagrams, 3-D rotation and animation, will be provided using public Census data at the ZCTA, County and Block levels. Census data sources include Shapefiles, the Redistricting file, the American Community Survey (ACS), and the Decennial Census. The Federal Financial Institutions Examination Council (FFIEC) is the data source for the public HMDA (Home Mortgage Disclosure Act) data. The BISG (Bayesian Improved Surname Geocoding) proxy method will be explored using data visualization. This methodology uses Bayes’ Theorem, Names Files, and the racial/ethnic composition of the population of the geographic area from the Decennial Census. This presentation will look at this proxy for selected surnames at the County and Zip Code (ZCTA) levels nationwide to examine changes by geographic location and concentration.
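The Bayes update at the heart of BISG can be written in a few lines of Python; the numbers below are invented solely to show the mechanics and are not actual Census figures:

import numpy as np

groups = ["white", "black", "hispanic", "asian"]
p_race_given_surname = np.array([0.62, 0.05, 0.08, 0.25])      # from a surname list (toy values)
p_geo_given_race = np.array([0.0004, 0.0001, 0.0009, 0.0015])  # share of each group living in this ZCTA (toy values)

# BISG: P(race | surname, geography) is proportional to P(race | surname) * P(geography | race)
posterior = p_race_given_surname * p_geo_given_race
posterior /= posterior.sum()
print(dict(zip(groups, posterior.round(3))))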

Data Scientists continually look to enhance the capability of analyzing data. This can be done by dynamic mapping and animation over time to visualize changes. Geographic visualizations will include animations of the racial/ethnic composition of the population by county over time using the ACS 5-year Estimates. An example using the Census Redistricting file and Shapefiles at the Block level will be presented. We will take a preliminary look at 4 of the 5 Census Surname lists (Names Files) using “animated” Venn diagrams:

#1. 2010 Census;

#2. Census 2000;

#3. 1990 Census Heavily Hispanic; and

#4. 1980 Census “Passel-Word Spanish surname” List.

Examples of Data Quantification using visualizations and the HMDA data will be provided. Other tools such as 3-D rotation will also be provided. All calculations and visualizations were conducted using SAS 9.4.

* Disclaimer: Any opinions expressed in this presentation are those of the author and do not constitute policy or opinion of the U.S. Department of Justice or any of its subcomponents.

A New Architecture for Cell Phones: Sampling via Projection for UWB and AFB Systems

Stephen D. Casey, American University; Norbert Wiener Center University of Maryland

9/8/2020

Abstract: This talk develops a new approach to how signals are transmitted through cell phones. We will describe the current model for cell phone architectures (sample and hold) and a new approach to this architecture (signal projection) developed by the speaker. This architecture is a series of “mathematical gadgets" which have resulted in two United States Patents. We will describe these “gadgets," and explain why they make cell phone communication more efficient. We develop a new approach to signal sampling, designed to deal with ultra-wide band (UWB) and adaptive frequency band (AFB) communication systems. The overarching goal of the theory developed in this talk is to develop a computable atomic decomposition of time-frequency space. The idea is to come up with a way of non-uniformly tiling time and frequency so that if the signal has a burst of high-frequency information, we tile quickly and efficiently in time and broadly in frequency, whereas if the signal has a relatively low-frequency segment, we can tile broadly in time and efficiently in frequency. Computability is key; systems are designed so that they can be implemented in circuitry.

Data Science Skills for Delivering Mission Outcome: An interactive discussion with the Director of the National Technical Information Service (NTIS), US Department of Commerce.

Avi Bender, Chakib Chraibi, and Patrick Lee, National Technical Information Service (NTIS)

2/25/2020

Abstract: Artificial Intelligence, Machine Learning, Robotic Process Automation and Cybersecurity are driving global technological and process breakthroughs. The Federal Government is at the forefront of working with industry and academia to seek ways for leveraging data as a strategic asset to achieve mission outcome and to improve citizen services. Data Scientists will play a key role in delivering innovation across all market segments and must be equipped with both technical and soft skills. In this interactive session, Mr. Avi Bender, Director of NTIS, US Department of Commerce, and his associates, Dr. Chakib Chraibi and Dr. Patrick Lee, will discuss the required skills for applied data science based on actual use cases. Both Dr. Chraibi and Dr. Lee recently joined NTIS from academia.

A Text Mining and Machine Learning Platform to Classify Businesses into NAICS codes

Sudip Bhattacharjee, Professor, School of Business, University of Connecticut, Senior Research Fellow, US Census Bureau

2/18/2020

Abstract: Classification of business establishments into their correct NAICS (North American Industry Classification System) codes is a fundamental building block for sampling, measuring and monitoring the $19 trillion US economy. NAICS codes form the basis of the Economic Census (held every 5 years), calculation of GDP, and other monthly, quarterly and annual economic reports. The US Census Bureau is the custodian of NAICS for the US, and receives NAICS codes for businesses from internal surveys, and other sources such as IRS (Internal Revenue Service), SSA (Social Security Administration) and BLS (Bureau of Labor Statistics). These NAICS codes do not match in several cases due to different underlying data and processes. Further, the process to generate these codes relies heavily on response to surveys as well as hours of analyst effort. This results in expense and errors on the part of the establishments and statistical agencies.

In this research, we develop a natural language processing and machine learning based methodology to predict full 6-digit NAICS codes. It uses a novel mix of publicly available, commercial, and official data. Specifically, we use publicly available textual information, (business names and company website text), commercial information (Google reviews and Place Types), and official NAICS codes at the establishment level from the Business Register (BR). Our sample consists of approximately 130,000 establishments across all 20 NAICS sectors (2-digit), and across approximately 500 national industry codes (6-digit). We implement four different machine learning models. We also implement a stacked generalization model using all four base models. We show that publicly available data is typically most informative in predicting the correct NAICS code at the 2, 4 and 6-digit levels, with commercial data providing additional information. Model accuracies range from 70% to 95%, depending on the level of specification. Accuracies increase further with other feature engineering additions. We show model stability results, and provide insights with top keywords that predict each NAICS sector. Our research can reduce both respondent and analyst burden while improving data quality of business classification. Our model can be used by various statistical agencies and private entities to streamline and standardize a core task of business classifications.
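The production models described above are not reproduced here, but the general shape of the approach (text features feeding a classifier that predicts a sector code) can be sketched on toy data with scikit-learn; the business descriptions and labels below are invented:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "family restaurant serving pizza and pasta",
    "coffee shop espresso and bakery",
    "general contractor residential remodeling",
    "commercial roofing and construction services",
    "software development and cloud consulting",
    "managed IT services and data hosting",
    "dental clinic and oral surgery",
    "family medicine and pediatric care",
]
sectors = ["72", "72", "23", "23", "54", "54", "62", "62"]   # hypothetical 2-digit NAICS labels

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression(max_iter=1000))
model.fit(texts, sectors)
print(model.predict(["pizza delivery restaurant",            # should favor sector "72"
                     "residential remodeling contractor"]))  # should favor sector "23"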

Sufficient dimension reduction for high dimensional longitudinally measured biomarkers

Ruth Pfeiffer, Biostatistics Branch, National Cancer Institute, NIH

1/28/2020

Abstract: In many practical applications one encounters predictors that are matrix valued. For example, in cohort studies conducted to study diseases, multiple biomarkers are often measured longitudinally during follow up. It is of interest to assess the associations of these repeated multivariate marker measurements with outcome to aid understanding of biological underpinnings of disease, and to use marker combinations for diagnosis and disease classification. Sufficient dimension reduction (SDR) aims to find a low dimensional transformation of predictors that preserves all or most of their information about a particular outcome. We propose least squares and maximum likelihood based SDR approaches to estimate optimal combinations for longitudinally measured markers, i.e. matrix valued predictors. We assume a linear model for the inverse regression of the predictors as a function of the outcome variable, and model the mean using a matrix that is the Kronecker-product (two-dimensional tensor) of two sub-matrices, one that captures the association of markers and the outcome over time and one that captures the associations of the outcome with the different markers. These model-based approaches improve efficiency compared to nonparametric methods. We derive computationally fast least squares algorithms building on results of Van Loan and Pitsianis (1993) and show in simulations that they lead to estimates close in efficiency to maximum likelihood estimates for practically relevant sample sizes. The methods are illustrated using biomarker and imaging data. This is joint work with Wei Wang and Efstathia Bura.
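To make the Kronecker-product structure concrete, here is a small schematic in numpy (dimensions and values are arbitrary; the estimation procedures from the talk are not shown):

import numpy as np

# p markers measured at t time points
p, t = 4, 6
B_time = np.array([[1.0], [0.8], [0.6], [0.5], [0.3], [0.2]])   # t x 1: outcome association over time
B_marker = np.array([[2.0, -1.0, 0.5, 0.0]])                    # 1 x p: association across markers

# Kronecker-structured coefficient matrix for the t x p matrix-valued predictor:
# only t + p parameters instead of t * p unrestricted entries.
B = np.kron(B_time, B_marker)
print(B.shape, B_time.size + B_marker.size, B.size)             # (6, 4) 10 24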

Examining the Decline in U.S. Manufacturing Employment

Justin Pierce, Federal Reserve Washington DC

01/21/2020

Abstract: This presentation describes the evolution of manufacturing employment and the reallocation of economic activity toward services over the past fifty years, emphasizing variation across U.S. industries, firms, and regions. I highlight ways in which a period of steep decline in manufacturing employment after 2000 differs from the more modest downward trend observed in previous decades and describe evidence for the role of increased import competition after 2000. I present research examining some of the socioeconomic implications of the post-2000 decline in manufacturing employment and close with suggestions for future research on improving the re-employment prospects of workers displaced by trade or technology.

The Arithmetic of Newton's Method

Xander Faber, Researcher, Institute for Defense Analyses, Center for Computing Sciences, Bowie, MD

11/19/2019

Abstract: Fix a square-free polynomial f with rational coefficients. Iteration of the familiar Newton map

N(z) = z - f(z) / f'(z)

allows you to rapidly locate roots of f; indeed, N is a rational function with super-attracting fixed points at precisely those roots. Newton's method works both in the real/complex setting (as in calculus) and in the p-adic domain (as in Hensel's lemma).

What happens if we consider convergence in all metrics simultaneously? Fix a general rational starting point x and consider the set of primes p for which the Newton iteration sequence x, N(x), N^2(x) ... converges p-adically to a root of f.

  • Felipe Voloch and I conjectured in 2010 that this set of primes has density zero among all primes, though all we could prove was that the set of such primes is infinite and co-infinite.

  • More recently, Rob Benedetto, Ben Hutz, Jamie Juul, Yu Yasufuku, and I found a way to use explicit computations in arboreal Galois theory to settle the conjecture for the simplest nontrivial case: f(z) = z^3 - z.

(And no, you don't get a trophy if you can find the roots of this polynomial *without* Newton's method.)
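For a concrete feel for the real-number side of the story (the p-adic analysis is the interesting part of the talk and is not shown here), exact rational Newton iteration for f(z) = z^3 - z takes only a few lines of Python:

from fractions import Fraction

def f(z):
    return z**3 - z

def df(z):
    return 3 * z**2 - 1

def newton(x, steps):
    """Iterate N(z) = z - f(z)/f'(z) in exact rational arithmetic."""
    for _ in range(steps):
        x = x - f(x) / df(x)
    return x

x = newton(Fraction(7, 10), steps=7)     # a rational starting point
print(float(x), float(f(x)))             # rapidly approaching the root z = 1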

Transparency and Reproducibility in the Design, Testing, Implementation and Maintenance of Procedures for the Integration of Multiple Data Sources

John L. Eltinge, United States Census Bureau

11/12/2019

Abstract: Many government statistical agencies are exploring the increased use of data from multiple sources, including administrative records, commercial transaction information and image data, as well as traditional sample surveys. Practical decisions about production of estimates through the integration of such data require carefully nuanced evaluation of data quality (e.g., properties of the resulting point estimation and inference methods); and realistic assessment of risks and costs incurred in the design, testing, implementation and maintenance of the resulting production-level statistical procedures.

This paper explores these issues, with emphasis on three areas:

1. Conceptual and operational connections among transparency, reproducibility, replicability and sensitivity analysis for practical assessment of data quality, risk and cost

2. Consistency of (1) with fundamental organizing principles that are broadly accepted by practitioners in a given methodological or substantive area

3. Communication of (1)-(2) in ways that resonate with a wide range of data users and other stakeholders

These general ideas are motivated and illustrated with two examples based on, respectively, (a) use of administrative record data to supplement standard sample surveys; and (b) design of a specialized sample survey to “fill in the gaps” frequently encountered in the integration of non-survey data sources.

Census 2020 - Clear Vision for the Country

Michael T. Thieme, Assistant Director for Decennial Census Programs, Systems and Contracts, U.S. Census Bureau

10/29/2019

Abstract: I will discuss 2020 Census design innovations including new technology, updated operations, and an integrated communications campaign. I will also touch on some of the unique challenges involved with taking the Census in an environment highly influenced by social media and increased concern about cybersecurity.

Steering qubits, cats and cars via gradient ascent in function space

Dennis Lucarelli, American University

10/22/2019

Abstract: A number of mechanical control systems can be modeled using matrix Lie groups. Notably, finite dimensional quantum mechanical systems, as encountered in quantum information science, can be modeled as control systems evolving on unitary groups. In this talk, I will introduce Lie group models and derive a simple functional gradient of the controlled dynamics that can be used to numerically optimize controls for a variety of applications. Examples from quantum control, satellite attitude motion planning, and geometric phases in robotics will be presented.
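As a toy stand-in for the kind of numerical control optimization described above (this is a generic finite-difference gradient ascent on gate fidelity, not the functional gradient derived in the talk, and the target gate and Hamiltonian are arbitrary choices), consider steering a single qubit toward an X gate:

import numpy as np
from scipy.linalg import expm

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
target = sx                      # aim for an X gate (illustrative choice)
N, dt = 16, 0.2                  # piecewise-constant control: N segments of length dt

def propagate(u):
    U = np.eye(2, dtype=complex)
    for uk in u:                 # H(t) = sz + u(t) * sx on each segment
        U = expm(-1j * dt * (sz + uk * sx)) @ U
    return U

def fidelity(u):
    return abs(np.trace(target.conj().T @ propagate(u))) / 2.0

rng = np.random.default_rng(0)
u = rng.uniform(-1.0, 1.0, N)
eps, lr = 1e-6, 1.0
for _ in range(200):             # crude finite-difference gradient ascent
    base = fidelity(u)
    grad = np.array([(fidelity(u + eps * np.eye(N)[k]) - base) / eps for k in range(N)])
    u += lr * grad
print(round(fidelity(u), 3))     # fidelity should climb toward 1 as the ascent converges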

Entity Resolution Techniques to Significantly Improve Healthcare Data Quality

LATIF KHALIL, JBS INTERNATIONAL, INC.

10/15/2019

Abstract: The U.S. healthcare systems use digitized patient health records or Electronic Health Records (EHRs) to manage patients’ medical data. With the continued digitization, the ability to correctly link a patient’s medical data into a longitudinal record has become progressively challenging. As the use of a unique patient ID is not permitted in U.S. healthcare systems, patient demographic information is the only reliable data to link the records generated with each patient visit to a doctor or medical lab. In addition to clerical data entry errors, patients move, marry, divorce, and change names throughout their lifetimes, rendering the patient demographic data unreliable for accurately linking patient records and resulting in poor healthcare data quality. Entity Resolution methodology is well-suited to solve these types of problems, and its customized application to healthcare data has generated promising results. During the talk, we will explore the use of Entity Resolution techniques relevant to healthcare data quality improvements.
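A stripped-down illustration of the pairwise-comparison step (string similarity on a few demographic fields with ad hoc weights; real entity resolution systems use blocking, trained models, and richer comparators) can be written with the Python standard library:

from difflib import SequenceMatcher

WEIGHTS = {"name": 0.5, "dob": 0.3, "zip": 0.2}   # illustrative weights

def sim(a, b):
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def match_score(r1, r2):
    """Weighted field-by-field similarity between two patient records."""
    return sum(w * sim(r1[f], r2[f]) for f, w in WEIGHTS.items())

a = {"name": "Jonathan Q. Smith", "dob": "1984-07-12", "zip": "20016"}
b = {"name": "Jon Smith",         "dob": "1984-07-12", "zip": "20016"}
c = {"name": "Joan Smythe",       "dob": "1990-01-30", "zip": "21201"}

print(round(match_score(a, b), 2))   # higher score: plausibly the same patient
print(round(match_score(a, c), 2))   # lower score: plausibly different patients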

Assessing and enhancing the generalizability of randomized trials to target populations

Dr. ELIZABETH STUART, Johns Hopkins Bloomberg School of Public Health

10/8/2019

Abstract: With increasing attention being paid to the relevance of studies for real-world practice (such as in education, international development, and comparative effectiveness research), there is also growing interest in external validity and assessing whether the results seen in randomized trials would hold in target populations. While randomized trials yield unbiased estimates of the effects of interventions in the sample of individuals (or physician practices or hospitals) in the trial, they do not necessarily inform about what the effects would be in some other, potentially somewhat different, population. While there has been increasing discussion of this limitation of traditional trials, relatively little statistical work has been done developing methods to assess or enhance the external validity of randomized trial results. This talk will discuss design and analysis methods for assessing and increasing external validity, as well as general issues that need to be considered when thinking about external validity. The primary analysis approach discussed will be a reweighting approach that equates the sample and target population on a set of observed characteristics. Underlying assumptions, performance in simulations, and limitations will be discussed. Implications for how future studies should be designed in order to enhance the ability to assess generalizability will also be discussed.
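One common reweighting estimator can be sketched in Python: model trial membership given covariates, then weight trial units by the inverse odds of participation (the data are simulated, with a treatment effect that grows with age so that trial and target-population effects differ):

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n_trial, n_pop = 500, 2000
age_trial = rng.normal(40, 8, n_trial)              # trial over-represents younger people
age_pop = rng.normal(55, 10, n_pop)                 # target population is older

treat = rng.integers(0, 2, n_trial)
y = 1.0 + 0.05 * age_trial * treat + rng.normal(0, 1, n_trial)   # effect grows with age

X = np.concatenate([age_trial, age_pop]).reshape(-1, 1)
S = np.concatenate([np.ones(n_trial), np.zeros(n_pop)])
p_trial = LogisticRegression().fit(X, S).predict_proba(age_trial.reshape(-1, 1))[:, 1]
w = (1 - p_trial) / p_trial                         # inverse odds of being in the trial

def wmean(v, wt):
    return np.sum(v * wt) / np.sum(wt)

naive = y[treat == 1].mean() - y[treat == 0].mean()
reweighted = wmean(y[treat == 1], w[treat == 1]) - wmean(y[treat == 0], w[treat == 0])
print(round(naive, 2), round(reweighted, 2))        # trial-only contrast vs. population-reweighted contrast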

Data Fusion in the Age of Data: Recent Theoretical Advances and Applications

Dr. ZOIS BOUKOUVALAS, American University

9/24/2019

Abstract: Due to the wide use of multi-sensor technology, analysis of multiple sets of data that describe the same phenomenon becomes a challenge in many engineering and social science applications. One of the main goals in such studies is the generation of a low dimensional latent representation space that would enable the extraction of meaningful information about the underlying structure of the data. This information can be used for knowledge discovery and to enhance the prediction ability of a machine learning model. Data fusion methods based on blind source separation enable simultaneous study of multiple datasets with few assumptions placed on the data. Independent vector analysis (IVA), a recent generalization of independent component analysis (ICA), enables the joint analysis of datasets and achieves improved performance over performing ICA on each dataset separately. In this talk, we will start by presenting the theoretical foundations of ICA and IVA and the mathematical framework of our recently developed ICA and IVA algorithms. We will then demonstrate how those algorithms can be effectively used for the analysis of fMRI data, prediction of molecular properties and extraction of chemical insights from text corpora, as well as for knowledge discovery and early detection of misinformation on social media.
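The standard ICA setting is easy to demonstrate with scikit-learn (a generic FastICA example on synthetic mixtures, not the algorithms developed by the speaker):

import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
t = np.linspace(0, 8, 2000)
S = np.column_stack([np.sin(2 * t),                  # three independent sources
                     np.sign(np.sin(3 * t)),
                     rng.laplace(size=t.size)])
A = rng.normal(size=(3, 3))                          # unknown mixing matrix
X = S @ A.T                                          # observed mixtures

S_hat = FastICA(n_components=3, random_state=0).fit_transform(X)
# Each recovered component should correlate strongly with exactly one source
# (up to order, sign, and scale).
print(np.round(np.abs(np.corrcoef(S.T, S_hat.T)[:3, 3:]), 2))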

Radio Fox Hunting using Sheaves

Dr. Michael Robinson, American University

9/17/2019

Abstract: Radio foxhunting is a sport in which amateur radio operators -- "hams" -- race each other to locate a hidden radio transmitter on a known frequency. Since hams are encouraged to design and build their own equipment, estimates of the hidden transmitter's position vary in accuracy and in precision depending on terrain, environmental conditions, equipment quality, and the skill of the operator.

Locating the fox transmitter is a model-based data fusion problem: one combines disparate local observations into a global inference. Over the past few years, I have developed a methodology for using sheaves -- mathematical models of local-to-global inference -- to solve data fusion problems. Although we can perform data fusion for any sheaf, I will focus on developing a sheaf for the fox hunting problem. The resulting fox hunting sheaf is modular; different sensors or models of their performance can be substituted easily without changing the methodology.

Some useful information about the Wilcoxon-Mann Whitney test and effect size measurement

Dr. SCOTT PARKER, American University

9/10/2019

Abstract: The conditions that make the Wilcoxon-Mann-Whitney test valid prove to be less restrictive than is generally thought. And that fact makes the list of appropriate effect-size measures more restrictive than is generally thought.
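For readers who want to experiment, the test and two commonly reported effect sizes take only a few lines with SciPy (synthetic data; the talk's specific recommendations about which effect-size measures remain appropriate are not reproduced here):

import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)
x = rng.normal(0.5, 1.0, 40)                 # group 1
y = rng.normal(0.0, 1.0, 35)                 # group 2

U, p = mannwhitneyu(x, y, alternative="two-sided")
cles = U / (len(x) * len(y))                 # common-language effect size, an estimate of P(X > Y)
rank_biserial = 2 * cles - 1                 # rank-biserial correlation
print(f"U = {U:.1f}, p = {p:.4f}, P(X>Y) = {cles:.2f}, r = {rank_biserial:.2f}")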