We consider mixed-integer quadratic optimization problems with banded matrices and indicator variables. These problems arise pervasively in statistical inference with time-series data, where the banded matrix captures the temporal relationship of the underlying process. In particular, the problem studied arises in monitoring settings, where the decision-maker wants to detect changes or anomalies. We propose to solve these problems using decision diagrams. In particular, we show how to exploit the temporal dependencies to construct diagrams whose size is polynomial in the number of decision variables. We also describe how to construct the convex hull of the set under study from the decision diagrams, and how to deploy the method online to solve the problems in milliseconds via a shortest path algorithm.

Gómez A, Han S and Lozano L.
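
The online deployment step above amounts to a shortest-path computation over a layered directed acyclic graph. Below is a minimal sketch of that step, assuming a hypothetical diagram stored as a list of layers, each mapping a node to its outgoing (child, arc cost) pairs; the data layout and names are illustrative, not the paper's implementation.

```python
import math

def diagram_shortest_path(layers, root, terminal):
    """Shortest root-terminal path in a layered decision diagram (a DAG)."""
    dist = {root: 0.0}
    pred = {}
    for layer in layers:                         # layers are already in topological order
        for node, arcs in layer.items():
            d = dist.get(node, math.inf)
            if d == math.inf:                    # node unreachable from the root
                continue
            for child, cost in arcs:             # relax every outgoing arc
                if d + cost < dist.get(child, math.inf):
                    dist[child] = d + cost
                    pred[child] = node
    path = [terminal]                            # recover one decision per layer
    while path[-1] != root:
        path.append(pred[path[-1]])
    return dist[terminal], path[::-1]

# Toy two-layer diagram: r -> {a, b} -> t; the cheapest path is r-b-t with cost 1.5.
layers = [{"r": [("a", 1.0), ("b", 0.5)]}, {"a": [("t", 2.0)], "b": [("t", 1.0)]}]
print(diagram_shortest_path(layers, "r", "t"))   # (1.5, ['r', 'b', 't'])
```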

This paper investigates convex quadratic optimization problems involving indicator variables, each associated with a continuous variable, particularly focusing on scenarios where the matrix defining the quadratic term is positive definite and its sparsity pattern corresponds to the adjacency matrix of a tree graph. We introduce a graph-based dynamic programming algorithm that solves this problem in quadratic time and memory. Central to our algorithm is a precise parametric characterization of the cost function across various nodes of the graph corresponding to distinct variables. Our computational experiments conducted on both synthetic and real-world datasets demonstrate the superior performance of our proposed algorithm compared to existing algorithms and state-of-the-art mixed-integer optimization solvers. An important application of our algorithm is in the real-time inference of Gaussian hidden Markov models from data affected by outlier noise. Using a real on-body accelerometer dataset, we solve instances of this problem with over 30,000 variables in under a minute, and its online variant within milliseconds, on a standard computer.

Bhathena P, Fattahi S, Gómez A and Küçükyavuz S.
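
Schematically (with symbols chosen here for illustration, not taken from the paper), the problem class is

```latex
\min_{x \in \mathbb{R}^n,\; z \in \{0,1\}^n} \; x^\top Q x + a^\top x + c^\top z
\quad \text{s.t.} \quad x_i (1 - z_i) = 0, \qquad i = 1, \dots, n,
```

where Q is positive definite and Q_{ij} ≠ 0 for i ≠ j only if {i, j} is an edge of the underlying tree; the complementarity constraints state that x_i can be nonzero only when its indicator z_i is on.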

In this paper, we consider convex quadratic optimization problems with indicators on the continuous variables. In particular, we assume that the Hessian of the quadratic term is a Stieltjes matrix, which naturally appears in sparse graphical inference problems, among others. We describe an explicit convex formulation for the problem by studying the Stieltjes polyhedron arising as part of an extended formulation and exploiting the supermodularity of a set function defined on its extreme points. Our computational results confirm that the proposed convex relaxation provides an exact optimal solution and may be an effective alternative, especially for instances with large integrality gaps that are challenging for the standard approaches.

Liu P, Atamtürk A, Gómez A and Küçükyavuz S.
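
For reference, a Stieltjes matrix is a symmetric positive definite matrix with nonpositive off-diagonal entries:

```latex
Q = Q^\top \succ 0, \qquad Q_{ij} \le 0 \quad \text{for all } i \ne j.
```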

We consider the problem of learning support vector machines robust to uncertainty. It has been established in the literature that typical loss functions, including the hinge loss, are sensitive to data perturbations and outliers, thus performing poorly in the setting considered. In contrast, using the 0-1 loss or a suitable non-convex approximation results in robust estimators, at the expense of large computational costs. In this paper we use mixed-integer optimization techniques to derive a new loss function that better approximates the 0-1 loss compared with existing alternatives, while preserving the convexity of the learning problem. In our computational results, we show that the proposed estimator is competitive with the standard SVMs with the hinge loss in outlier-free regimes and outperforms them in the presence of outliers.

Cepeda V, Gómez A and Han S.
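
For a training point with margin m = y(w^T x + b), the two losses contrasted above are, in standard form,

```latex
\ell_{0\text{-}1}(m) = \mathbb{1}[\, m \le 0 \,], \qquad \ell_{\mathrm{hinge}}(m) = \max(0,\, 1 - m);
```

the hinge loss grows linearly as a point is pushed further onto the wrong side, which is precisely what makes it vulnerable to gross outliers, whereas the 0-1 loss caps the penalty at one.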

We consider the problem of learning classification trees that are robust to distribution shifts between training and testing/deployment data. This problem arises frequently in high-stakes settings such as public health and social work, where data is often collected using self-reported surveys that are highly sensitive to, e.g., the framing of the questions, the time when and place where the survey is conducted, and the level of comfort the interviewee has in sharing information with the interviewer. We propose a method for learning optimal robust classification trees based on mixed-integer robust optimization technology. In particular, we demonstrate that the problem of learning an optimal robust tree can be cast as a single-stage mixed-integer robust optimization problem with a highly nonlinear and discontinuous objective. We reformulate this problem equivalently as a two-stage linear robust optimization problem, for which we devise a tailored solution procedure based on constraint generation. We evaluate the performance of our approach on numerous publicly available datasets, and compare it to that of a regularized, non-robust optimal tree. Across several datasets and distribution shifts, our robust solution yields an increase of up to 12.48% in worst-case accuracy and up to 4.85% in average-case accuracy compared to the non-robust one.

Justin N, Aghaei S, Gómez A and Vayanos P.

ODTLearn is an open-source Python package that provides methods for learning optimal decision trees for high-stakes predictive and prescriptive tasks based on the mixed-integer optimization (MIO) framework proposed in Aghaei et al. (2019) and several of its extensions. The current version of the package provides implementations for learning optimal classification trees, optimal fair classification trees, optimal classification trees robust to distribution shifts, and optimal prescriptive trees from observational data. We have designed the package to be easy to maintain and extend as new optimal decision tree problem classes, reformulation strategies, and solution algorithms are introduced. To this end, the package follows object-oriented design principles and supports both commercial (Gurobi) and open-source (COIN-OR branch-and-cut) solvers.

Vossler P, Aghaei S, Justin N, Jo N, Gómez A and Vayanos P.
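
A minimal usage sketch in the scikit-learn style the package follows. The module path, class name FlowOCT, and parameter names below are assumptions based on the package's documented design and may differ across versions; consult the ODTLearn documentation for the exact API.

```python
import numpy as np
from odtlearn.flow_oct import FlowOCT  # assumed module path and class name

# Tiny binary-encoded dataset (the MIO formulations expect binary features).
X_train = np.array([[1, 0], [0, 1], [1, 1], [0, 0]])
y_train = np.array([1, 0, 1, 0])

# "cbc" = open-source COIN-OR branch-and-cut; "gurobi" is the commercial option.
model = FlowOCT(solver="cbc", depth=2)  # parameter names are assumptions
model.fit(X_train, y_train)
print(model.predict(X_train))
```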

We study the problem of inferring sparse time-varying Markov random fields (MRFs) with different discrete and temporal regularizations on the parameters. Due to the intractability of discrete regularization, most approaches for solving this problem rely on the so-called maximum likelihood estimation (MLE) with relaxed regularization, which neither results in ideal statistical properties nor scales to the dimensions encountered in realistic settings. In this paper, we address these challenges by departing from the MLE paradigm and resorting to a new class of constrained optimization problems with exact, discrete regularization to promote sparsity in the estimated parameters. Despite the nonconvex and discrete nature of our formulation, we show that it can be solved efficiently and parametrically for all sparsity levels. More specifically, we show that the entire solution path of the time-varying MRF for all sparsity levels can be obtained in O(pT^3), where T is the number of time steps and p is the number of unknown parameters at any given time. The efficient and parametric characterization of the solution path renders our approach highly suitable for cross-validation, where parameter estimation is required for varying regularization values. Despite its simplicity and efficiency, we show that our proposed approach achieves provably small estimation error for different classes of time-varying MRFs, namely Gaussian and discrete MRFs, with as few as one sample per time step. Utilizing our algorithm, we can recover the complete solution path for instances of time-varying MRFs featuring over 30 million variables in less than 12 minutes on a standard laptop computer.

Fattahi S and Gómez A.

Ridge regularized sparse linear regression involves selecting a subset of features that explains the relationship between a high-dimensional design matrix and an output vector in an interpretable manner. To select the sparsity and robustness of linear regressors, techniques like leave-one-out cross-validation are commonly used for hyperparameter tuning. However, cross-validation typically increases the cost of sparse regression by several orders of magnitude, because it requires solving multiple mixed-integer optimization problems (MIOs) for each hyperparameter combination. Additionally, validation metrics are noisy estimators of the test-set error, with different hyperparameter combinations leading to models with different amounts of noise. Therefore, optimizing over these metrics is vulnerable to out-of-sample disappointment, especially in underdetermined settings. To address this state of affairs, we make two contributions. First, we leverage the generalization theory literature to propose confidence-adjusted variants of the leave-one-out error that display less propensity to out-of-sample disappointment. Second, we leverage ideas from the mixed-integer optimization literature to obtain computationally tractable relaxations of the confidence-adjusted leave-one-out error, thereby minimizing it without solving as many MIOs. Our relaxations give rise to an efficient cyclic coordinate descent scheme which allows us to obtain significantly lower leave-one-out errors than via other methods in the literature. We validate our theory by demonstrating that we obtain significantly sparser and comparably accurate solutions than via popular methods like GLMNet and suffer from less out-of-sample disappointment. On synthetic datasets, our confidence adjustment procedure generates significantly fewer false discoveries, and improves out-of-sample performance by 2%–5% compared to cross-validating without confidence adjustment. Across a suite of 13 real datasets, a calibrated version of our confidence adjustment improves the test set error by an average of 4% compared to cross-validating without confidence adjustment.    

Cory-Wright R, Gómez A.

The increasing use of machine learning in high-stakes domains – where people’s livelihoods are impacted – creates an urgent need for interpretable, fair, and highly accurate algorithms. With these needs in mind, we propose a mixed-integer optimization (MIO) framework for learning optimal classification trees – one of the most interpretable models – that can be augmented with arbitrary fairness constraints. In order to better quantify the “price of interpretability”, we also propose a new measure of model interpretability called decision complexity that allows for comparisons across different classes of machine learning models. We benchmark our method against state-of-the-art approaches for fair classification on popular datasets; in doing so, we conduct one of the first comprehensive analyses of the trade-offs between interpretability, fairness, and predictive accuracy. Given a fixed disparity threshold, our method has a price of interpretability of about 4.2 percentage points in terms of out-of-sample accuracy compared to the best performing, complex models. However, our method consistently finds decisions with almost full parity, while other methods rarely do.

Jo N, Aghaei S, Benson J, Gómez A and Vayanos P (2023). Proceedings of the 2023 AAAI/ACM Conference on AI, Ethics and Society.

In many applications, when building linear regression models, it is important to account for the presence of outliers, i.e., corrupted input data points. Such problems can be formulated as mixed-integer optimization problems involving cubic terms, each given by the product of a binary variable and a quadratic term of the continuous variables. Existing approaches in the literature, typically relying on the linearization of the cubic terms using big-M constraints, suffer from weak relaxations and poor performance in practice. In this work we derive stronger second-order conic relaxations that do not involve big-M constraints. Our computational experiments indicate that the proposed formulations are several orders of magnitude faster than existing big-M formulations in the literature for this problem.

Gómez A and Neto J.
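
Schematically (notation chosen here for illustration), the regression-with-outliers problem reads

```latex
\min_{\beta \in \mathbb{R}^p,\; z \in \{0,1\}^n} \; \sum_{i=1}^{n} (1 - z_i) \left( y_i - a_i^\top \beta \right)^2
\quad \text{s.t.} \quad \sum_{i=1}^{n} z_i \le k,
```

where z_i = 1 flags observation i as an outlier and removes its residual from the loss; each term (1 - z_i)(y_i - a_i^T β)^2 is exactly a product of a binary variable and a quadratic term of the continuous variables.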

The problem of inferring Markov random fields (MRFs) with a sparsity or robustness prior can be naturally modeled as a mixed-integer program. This motivates us to study a general class of convex submodular optimization problems with indicator variables, which we show to be polynomially solvable in this paper. The key insight is that, possibly after a suitable reformulation, indicator constraints preserve submodularity. Fast computations of the associated Lovász extensions are also discussed under certain smoothness conditions, and can be implemented using only linear-algebraic operations in the case of quadratic objectives.

Han S, Gómez A and Pang JS.
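
For context, the Lovász extension of a set function f on {1, …, n} with f(∅) = 0, evaluated at a point x, is given by the standard formula

```latex
f_L(x) = \sum_{i=1}^{n} x_{\sigma(i)} \big( f(S_i) - f(S_{i-1}) \big),
```

where σ sorts the entries so that x_{σ(1)} ≥ ⋯ ≥ x_{σ(n)}, S_i = {σ(1), …, σ(i)}, and S_0 = ∅; f is submodular exactly when f_L is convex, which is what makes fast evaluation useful for convex minimization.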

In this paper, we study the mixed-integer nonlinear set given by a separable quadratic constraint on continuous variables, where each continuous variable is controlled by an additional indicator. This set occurs pervasively in optimization problems with uncertainty and in machine learning. We show that optimization over this set is NP-hard. Despite this negative result, we characterize the structure of the convex hull and show that it can be formally studied using polyhedral theory. Moreover, we show that although the perspective relaxation proposed in the literature for this set fails to match the structure of its convex hull, it is guaranteed to be a close approximation.

Gómez A and Xie W (2024). Operations Research Letters, 52:107059.
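
Concretely (with the constraint right-hand side b and symbols chosen here for illustration), the set and its perspective relaxation can be written as

```latex
X = \Big\{ (x, z) \in \mathbb{R}^n \times \{0,1\}^n \;:\; \sum_{i=1}^{n} x_i^2 \le b, \;\; x_i (1 - z_i) = 0 \Big\},
\qquad \sum_{i=1}^{n} \frac{x_i^2}{z_i} \le b, \;\; z \in [0,1]^n,
```

with the usual conventions x_i^2 / z_i = 0 if x_i = z_i = 0 and +∞ if x_i ≠ 0 = z_i.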

We consider the convex quadratic optimization problem with indicator variables and arbitrary constraints on the indicators. We show that a convex hull description of the associated mixed-integer set in an extended space with a quadratic number of additional variables consists of a single positive semidefinite constraint (explicitly stated) and linear constraints. In particular, convexification of this class of problems reduces to describing a polyhedral set in an extended formulation. We also give descriptions in the original space of variables: we provide a description based on an infinite number of conic-quadratic inequalities, which are "finitely generated." In particular, it is possible to characterize whether a given inequality is necessary to describe the convex hull. The new theory presented here unifies several previously established results, and paves the way toward utilizing polyhedral methods to analyze the convex hull of mixed-integer nonlinear sets.

Wei L, Atamtürk A, Gómez A and Küçükyavuz S (2024). Mathematical Programming, 204:703-737.

We study the mixed-integer epigraph of a low-rank convex function with non-convex indicator constraints, which are often used to impose logical constraints on the support of the solutions. Extended formulations describing the convex hull of such sets can easily be constructed via disjunctive programming, although a direct application of this method often yields prohibitively large formulations, whose size is exponential in the number of variables. In this paper, we propose a new disjunctive representation of the sets under study, which leads to compact formulations with size exponential in the rank of the function, but polynomial in the number of variables. Moreover, we show how to project out the additional variables for the case of rank-one functions, recovering or generalizing known results for the convex hulls of such sets (in the original space of variables).

Han S and Gómez A (2024). Forthcoming in Mathematics of Operations Research.

2023, INFORMS JFIG Paper Competition, 3rd place.

In this paper, we consider convex quadratic optimization problems with indicator variables when the matrix Q defining the quadratic term in the objective is sparse. We use a graphical representation of the support of Q, and show that if this graph is a path, then we can solve the associated problem in polynomial time. This enables us to construct a compact extended formulation for the closure of the convex hull of the epigraph of the mixed-integer convex problem. Furthermore, we propose a novel decomposition method for general (sparse) Q, which leverages the efficient algorithm for the path case. Our computational experiments demonstrate the effectiveness of the proposed method compared to state-of-the-art mixed-integer optimization solvers.

Liu P, Fattahi S, Gómez A and Küçükyavuz S (2023). Mathematical Programming, 200:669-701.

2022, INFORMS Computing Society Student Paper Award, 2nd place.

This paper studies several solution paths of sparse quadratic minimization problems as a function of the weighting parameter of the bi-objective of estimation loss versus solution sparsity. Three such paths are considered: the "L0-path" where the discontinuous L0-function provides the exact sparsity count; the "L1-path" where the L1-function provides a convex surrogate of the sparsity count; and the "capped L1-path" where the nonconvex nondifferentiable capped L1-function aims to enhance the L1-approximation. Serving different purposes, these three formulations differ from one another both analytically and computationally. Our results deepen the understanding of (old and new) properties of the associated paths, highlight the pros, cons, and tradeoffs of these sparse optimization models, and provide numerical evidence to support the practical superiority of the capped L1-path. Our study of the capped L1-path is the first of its kind, as the path pertains to computable directionally stationary (that is, strongly locally minimizing in this context, as opposed to globally optimal) solutions of a parametric nonconvex nondifferentiable optimization problem. Motivated by classical parametric quadratic programming theory and reinforced by modern statistical learning studies, both casting an exponential perspective in fully describing such solution paths, we also aim to address the question of whether some of them can be fully traced in strongly polynomial time in the problem dimensions. A major conclusion of this paper is that a path of directional stationary solutions of the capped L1-regularized problem offers interesting theoretical properties and a practical compromise between the L0-path and the L1-path. Indeed, while the L0-path is computationally prohibitive and greatly handicapped by the repeated solution of mixed-integer nonlinear programs, the quality of the L1-path, in terms of the two criteria (loss and sparsity) in the estimation objective, is inferior to the capped L1-path; the latter can be obtained efficiently by a parametric pivoting-like scheme supplemented by an algorithm that takes advantage of the Z-matrix structure of the loss function.

He Z, Han S, Gómez A, Cui Y and Pang JS (2024). Mathematical Programming: 517-566.
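
For a scalar t and a breakpoint δ > 0, the three sparsity functions traced along these paths are, in one common parameterization,

```latex
\ell_0(t) = \mathbb{1}[\, t \ne 0 \,], \qquad \ell_1(t) = |t|, \qquad \ell_1^{\mathrm{cap}}(t) = \min\!\Big( \frac{|t|}{\delta},\, 1 \Big),
```

so the capped L1-function matches the (scaled) L1-function near the origin and the L0-function once |t| ≥ δ.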

We consider the problem of learning an optimal prescriptive tree (i.e., a personalized treatment assignment policy in the form of a binary tree) of moderate depth, from observational data. This problem arises in numerous socially important domains such as public health and personalized medicine, where interpretable and data-driven interventions are sought based on data gathered in deployment, through passive collection of data, rather than from randomized trials. We propose a method for learning optimal prescriptive trees using mixed-integer optimization (MIO) technology. We show that under mild conditions our method is asymptotically exact in the sense that it converges to an optimal out-of-sample treatment assignment policy as the number of historical data samples tends to infinity. This sets us apart from existing literature on the topic which either requires data to be randomized or imposes stringent assumptions on the trees. Based on extensive computational experiments on both synthetic and real data, we demonstrate that our asymptotic guarantees translate to significant out-of-sample performance improvements even in finite samples.

Jo N, Aghaei S, Gómez A and Vayanos P.

Binarized neural networks are an important class of neural networks in deep learning due to their computational efficiency. This paper contributes towards a better understanding of the structure of binarized neural networks, specifically, ideal convex representations of the activation functions used. We describe the convex hull of the graph of the signum activation function associated with a single neuron, deriving closed forms for the convex and concave envelopes that improve upon those used in the literature. The new formulations lead to improved methods to verify the robustness of a binarized neural network via convex optimization.

Han S and Gómez A.

This paper studies several versions of the single-parameter sparse optimization problem in statistical estimation defined by a pairwise separation objective. The sparsity (i.e., L0) function is approximated by a folded concave function; the pairwise separation gives rise to an objective of the Z-type. After presenting several realistic estimation problems to illustrate the Z-structure, we introduce a linear-step inner-outer loop algorithm for computing a directional stationary solution of the nonconvex nondifferentiable folded concave sparsity problem. When specialized to a quadratic loss function with a Z-matrix and a piecewise affine folded concave sparsity function, the overall complexity of the algorithm is a low-order polynomial in the number of variables of the problem; thus the algorithm is strongly polynomial in this quadratic case. We also consider the parametric version of the problem that has a weighted L1-regularizer and a quadratic loss function with a (hidden) Z-matrix. We present a linear-step algorithm in two cases, depending on whether the variables have prescribed or unknown signs. In both cases, a parametric algorithm is presented and its strong polynomiality is established under suitable conditions on the weights. Such a parametric algorithm can be combined with an interval search scheme for choosing the parameter to optimize a secondary objective function in a bilevel setting. The analysis makes use of a least-element property of a Z-function and, for the case of a quadratic loss function, the strongly polynomial solvability of a linear complementarity problem with a hidden Z-matrix. The origin of the latter class of matrices can be traced to an inspirational paper of Olvi Mangasarian, to whom we dedicate our present work.

Gómez A, He Z and Pang JS (2021). Mathematical Programming, 198:1339-1380.

In this paper, we study the problem of inferring time-varying Markov random fields (MRF), where the underlying graphical model is both sparse and changes sparsely over time. Most of the existing methods for the inference of time-varying MRFs rely on regularized maximum likelihood estimation (MLE), which typically suffers from weak statistical guarantees and high computational time. Instead, we introduce a new class of constrained optimization problems for the inference of sparsely-changing MRFs. The proposed optimization problem is formulated based on exact L0 regularization, and can be solved in near-linear time and memory. Moreover, we show that the proposed estimator enjoys a provably small estimation error. As a special case, we derive sharp statistical guarantees for the inference of sparsely-changing Gaussian MRFs (GMRF) in the high-dimensional regime, showing that such problems can be learned with as few as one sample per time step. Our proposed method is extremely efficient in practice: it can accurately estimate sparsely-changing graphical models with more than 500 million variables in less than one hour.

Fattahi S and Gómez A (2021). Advances in Neural Information Processing Systems, 34:6529-6541.
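
A schematic of the constrained formulation (symbols chosen here for illustration; the paper's exact formulation may differ in how the L0 terms enter): with ℓ_t a loss for the parameters Θ_t at time t,

```latex
\min_{\Theta_1, \dots, \Theta_T} \; \sum_{t=1}^{T} \ell_t(\Theta_t)
\quad \text{s.t.} \quad \|\Theta_t\|_0 \le k_1, \qquad \|\Theta_t - \Theta_{t-1}\|_0 \le k_2,
```

where the first constraint enforces sparsity of each graphical model and the second enforces sparse changes between consecutive time steps.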

We study the minimization of a rank-one quadratic with indicators and show that the underlying set function obtained by projecting out the continuous variables is supermodular. Although supermodular minimization is, in general, difficult, the specific set function for the rank-one quadratic can be minimized in linear time. We show that the convex hull of the epigraph of the quadratic can be obtained from inequalities for the underlying supermodular set function by lifting them into nonlinear inequalities in the original space of variables. Explicit forms of the convex-hull description are given, both in the original space of variables and in an extended formulation via conic quadratic-representable inequalities, along with a polynomial separation algorithm. Computational experiments indicate that the lifted supermodular inequalities in conic quadratic form are quite effective in reducing the integrality gap for quadratic optimization with indicators.

Atamtürk A and Gómez A (2022). Mathematical Programming, 201:295-338.

Motivated by modern regression applications, in this paper, we study the convexification of a class of convex optimization problems with indicator variables and combinatorial constraints on the indicators. Unlike most of the previous work on convexification of sparse regression problems, we simultaneously consider the nonlinear non-separable objective, indicator variables, and combinatorial constraints. Specifically, we give the convex hull description of the epigraph of the composition of a one-dimensional convex function and an affine function under arbitrary combinatorial constraints. As special cases of this result, we derive ideal convexifications for problems with hierarchy, multi-collinearity, and sparsity constraints. Moreover, we also give a short proof that for a separable objective function, the perspective reformulation is ideal independently of the constraints of the problem. Our computational experiments with regression problems under hierarchy constraints on real datasets demonstrate the potential of the proposed approach in improving the relaxation quality without significant computational overhead.

Wei L, Gómez A and Küçükyavuz S (2022). Mathematical Programming, 192:57-88.

Conference version: https://link.springer.com/chapter/10.1007/978-3-030-45771-6_33.

2021, Northwestern, Nemhauser Prize for Best Student Paper. 

In this paper, we compare the strength of the optimal perspective reformulation and Shor's SDP relaxation. We prove these two formulations are equivalent for quadratic optimization problems with indicator variables.

Han S, Gómez A and Atamtürk A (2022). Operations Research Letters, 50:195-198.
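
For context, the perspective reformulation strengthens a separable quadratic with indicators (where x_i = 0 whenever z_i = 0 and q_i > 0) by replacing each term q_i x_i^2 with its perspective, or equivalently with an auxiliary epigraph variable and a rotated-cone constraint:

```latex
\sum_i q_i \frac{x_i^2}{z_i}
\quad \Longleftrightarrow \quad
\sum_i q_i s_i \;\; \text{with} \;\; x_i^2 \le s_i z_i, \; s_i \ge 0,
```

which is what makes the reformulation conic-quadratic-representable.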

In this paper, we study the convex quadratic optimization problem with indicator variables. For the bivariate case, we describe the convex hull of the epigraph in the original space of variables, and also give a conic quadratic extended formulation. Then, using the convex hull description for the bivariate case as a building block, we derive an extended SDP relaxation for the general case. This new formulation is stronger than other SDP relaxations proposed in the literature for the problem, including Shor's SDP relaxation, the optimal perspective relaxation, and the optimal rank-one relaxation. Computational experiments indicate that the proposed formulations are quite effective in reducing the integrality gap of the optimization problems.

Han S, Gómez A and Atamtürk A (2023). Mathematical Programming, 202:95-134.

We consider the problem of learning optimal binary classification trees. Literature on the topic has burgeoned in recent years, motivated both by the empirical suboptimality of heuristic approaches and the tremendous improvements in mixed-integer programming (MIP) technology. Yet, existing approaches from the literature do not leverage the power of MIP to its full extent. Indeed, they rely on weak formulations, resulting in slow convergence and large optimality gaps. To fill this gap in the literature, we propose a flow-based MIP formulation for optimal binary classification trees that has a stronger linear programming relaxation. Our formulation presents an attractive decomposable structure. We exploit this structure and max-flow/min-cut duality to derive a Benders' decomposition method, which scales to larger instances. We conduct extensive computational experiments on standard benchmark datasets, on which we show that our proposed approaches are 50 times faster than state-of-the-art MIP-based techniques and improve out-of-sample performance by up to 13.8%.

Aghaei S, Gómez A and Vayanos P (2023). Forthcoming in Operations Research.

We give safe screening rules to eliminate variables from regression with L0 regularization or a cardinality constraint. These rules are based on guarantees that a feature may or may not be selected in an optimal solution. The screening rules can be computed from a convex relaxation solution in linear time, without solving the L0 optimization problem. Thus, they can be used in a preprocessing step to safely remove variables from consideration a priori. Numerical experiments on real and synthetic data indicate that, on average, 76% of the variables can be fixed to their optimal values, hence reducing the computational burden for optimization substantially. Therefore, the proposed fast and effective screening rules extend the scope of algorithms for L0-regression to larger data sets.

Atamtürk A and Gómez A (2020). Proceedings of the 37th International Conference on Machine Learning, 119:421-430.
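
The underlying logic follows the generic safe-screening principle, stated here schematically rather than as the paper's exact rule: let L(z_i = v) be the optimal value of the convex relaxation with the indicator of feature i fixed to v, and let U be the objective value of any feasible solution. Then

```latex
L(z_i = 0) > U \;\Longrightarrow\; z_i = 1 \text{ in every optimal solution},
\qquad
L(z_i = 1) > U \;\Longrightarrow\; z_i = 0 \text{ in every optimal solution},
```

so each feature can be safely fixed whenever the corresponding relaxation bound exceeds the incumbent value.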

We consider the problem of estimating the true values of a Wiener process given noisy observations corrupted by outliers. The problem considered is closely related to the Trimmed Least Squares estimation problem, a robust estimation procedure well-studied from a statistical standpoint but poorly understood from an optimization perspective. In this paper we show how to improve existing mixed-integer quadratic optimization formulations for this problem. Specifically, we convexify the existing formulations via lifting, deriving new mixed-integer conic quadratic reformulations. The proposed reformulations are stronger and substantially faster when used with current mixed-integer optimization solvers. In our experiments, solution times are improved by at least two orders of magnitude.

Gómez A (2021). SIAM Journal on Optimization 31:1897-1925. 

Feature selection is a fundamental preprocessing step for many machine learning and pattern recognition systems. Notably, some correlation-based and mutual-information-based feature selection problems can be formulated as fractional programs with a single ratio of polynomial 0-1 functions. We study approaches that ensure globally optimal solutions for these feature selection problems. We conduct computational experiments with several real data sets and report encouraging results. The considered solution methods perform well for medium- and reasonably large-sized instances, where the existing mixed-integer linear programs from the literature fail.

Mehmanchi E, Gómez A and Prokopyev O (2021). Annals of Operations Research 303:265-295. 

In this note we study multiple-ratio fractional 0-1 programs, a broad class of NP-hard combinatorial optimization problems. In particular, under some relatively mild assumptions, we provide a complete characterization of the conditions that ensure that a single-ratio function is submodular. Then we illustrate our theoretical results with the assortment optimization and facility location problems, and discuss practical situations that guarantee submodularity in the considered application settings. In such cases, near-optimal solutions for multiple-ratio fractional 0-1 programs can be found via simple greedy algorithms.

Han S, Gómez A and Prokopyev O (2022). Journal of Global Optimization 84:77-93.

2023, JOGO Best Paper Competition, Honorable Mention.
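
The greedy scheme mentioned in the abstract above, in its generic form for maximizing a monotone submodular set function under a cardinality constraint (a sketch with an illustrative coverage objective, not the paper's assortment or facility-location model); for such functions the greedy solution is within a factor 1 - 1/e of optimal.

```python
def greedy(f, ground_set, k):
    """Greedily build S maximizing f(S) subject to |S| <= k."""
    S = set()
    for _ in range(k):
        candidates = ground_set - S
        if not candidates:
            break
        # Pick the element with the largest marginal gain f(S + e) - f(S).
        gains = {e: f(S | {e}) - f(S) for e in candidates}
        best = max(gains, key=gains.get)
        if gains[best] <= 0:
            break
        S.add(best)
    return S

# Illustrative monotone submodular objective: set coverage.
coverage = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"c", "d", "e"}}
f = lambda S: len(set().union(*(coverage[e] for e in S))) if S else 0
print(greedy(f, set(coverage), k=2))   # {1, 3}: covers all five elements
```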

This paper focuses on methods that improve the performance of solution approaches for multiple-ratio fractional 0-1 programs (FPs) in their general structure. In particular, we explore the links between equivalent mixed-integer linear programming and conic quadratic programming reformulations of FPs. Thereby, we show that integrating the ideas behind these two types of reformulations of FPs allows us to push further the limits of the current state-of-the-art results and tackle larger-size problems. We perform extensive computational experiments to compare the proposed approaches against the current reformulations from the literature. 

Mehmanchi E, Gómez A and Prokopyev O (2019). Journal of Global Optimization 75:273-339. 

Sparse regression models are increasingly prevalent due to their ease of interpretability and superior out-of-sample performance. However, the exact model of sparse regression with an L0 constraint restricting the support of the estimators is a challenging non-convex optimization problem. In this paper, we derive new strong convex relaxations for sparse regression. These relaxations are based on the ideal (convex-hull) formulations for rank-one quadratic terms with indicator variables. The new relaxations can be formulated as semidefinite optimization problems in an extended space and are stronger and more general than the state-of-the-art formulations, including the perspective reformulation and formulations with the reverse Huber penalty and the minimax concave penalty functions. Furthermore, the proposed rank-one strengthening can be interpreted as a non-separable, non-convex sparsity-inducing regularizer, which dynamically adjusts its penalty according to the shape of the error function. In our computational experiments with benchmark datasets, the proposed conic formulations are solved within seconds and result in near-optimal solutions (with 0.4% optimality gap) for non-convex L0 problems. Moreover, the resulting estimators also outperform alternative convex approaches from a statistical viewpoint, achieving high prediction accuracy and good interpretability. 

Atamtürk A and Gómez A. 

We propose a new class of valid inequalities for mixed-integer nonlinear optimization problems with indicator variables. The inequalities are obtained by lifting polymatroid inequalities in the space of the 0-1 variables into conic inequalities in the original space of variables. The proposed inequalities are shown to describe the convex hull of the set under study under appropriate submodularity conditions. Moreover, the inequalities include the perspective reformulation and pairwise inequalities for quadratic optimization with indicator variables as special cases. Finally, we use the proposed methodology to derive ideal formulations of other mixed-integer sets with indicator variables. 

Gómez A (2019). Technical report. 

Signal estimation problems with smoothness and sparsity priors can be naturally modeled as quadratic optimization with l_0-"norm" constraints. Since such problems are non-convex and hard to solve, the standard approach is, instead, to tackle their convex surrogates based on l_1-norm relaxations. In this paper, we propose new iterative conic quadratic relaxations that exploit not only the l_0-"norm" terms, but also the fitness and smoothness functions. The iterative convexification approach substantially closes the gap between the l_0-"norm" and its l_1 surrogate. Experiments using an off-the-shelf conic quadratic solver on synthetic as well as real datasets indicate that the proposed iterative convex relaxations lead to significantly better estimators than the l_1-norm approach while preserving computational efficiency. In addition, the parameters of the model and the resulting estimators are easily interpretable.

Atamtürk A, Gómez A and Han S (2021). Journal of Machine Learning Research 22:1-43.
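
A schematic of the model class (notation chosen here for illustration): given noisy observations y of an underlying signal x, with fitness, smoothness, and sparsity terms,

```latex
\min_{x \in \mathbb{R}^n} \; \|y - x\|_2^2 + \lambda \sum_{i=1}^{n-1} (x_{i+1} - x_i)^2 + \mu \, \|x\|_0,
```

where the l_0-"norm" counts the nonzero entries of x and is the term that the standard surrogate relaxes to the l_1-norm.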

We study single- and multiple-ratio robust fractional 0-1 programming problems (RFPs). In particular, this work considers RFPs under a wide range of disjoint and joint uncertainty sets, where the former implies separate uncertainty sets for each numerator and denominator, and the latter accounts for different forms of inter-relatedness between them. First, we demonstrate that, unlike the deterministic case, the single-ratio RFP is NP-hard under general polyhedral uncertainty sets. However, if the uncertainty sets are imbued with a certain structure (variants of the well-known budgeted uncertainty), the disjoint and joint single-ratio RFPs are polynomially solvable when the deterministic counterpart is. We also propose mixed-integer linear programming (MILP) formulations for multiple-ratio RFPs. We conduct extensive computational experiments to evaluate the performance of our MILP reformulations, as well as to compare the disjoint and joint uncertainty sets. Finally, we demonstrate the value of the robust approach by examining the robust solution in a deterministic setting and vice versa.

Mehmanchi E, Gillen C, Gómez A and Prokopyev O (2019). INFORMS Journal on Optimization 2:96-133. 

We consider the best subset selection problem in linear regression, i.e., finding a parsimonious subset of the regression variables that provides the best fit to the data according to some predefined criterion. We show that, for a broad range of criteria used in the statistics literature, the best subset selection problem can be modeled as a mixed-integer fractional optimization problem. Then we show how to exploit underlying submodularity in the problem to strengthen the formulations, and propose to tackle the problem by solving a sequence of mixed-integer quadratic optimization problems. The proposed approach can be implemented with off-the-shelf solvers and, in our computations, it outperforms existing alternatives by orders of magnitude. 

Gómez A and Prokopyev O (2021). INFORMS Journal on Computing 33:551-565.

We study the convex hull of the mixed-integer set given by a conic quadratic inequality and indicator variables. Conic quadratic terms are often used to encode uncertainties, while the indicator variables are used to model fixed costs or enforce sparsity in the solutions. We provide the convex hull description of the set under consideration when the continuous variables are unbounded. We propose valid nonlinear inequalities for the bounded case, and show that they describe the convex hull for the two-variable case. All the proposed inequalities are described in the original space of variables and are SOCP-representable. We present computational experiments demonstrating the strength of the proposed formulations. 

Gómez A (2021). Mathematical Programming, 188:193-226.

2018, INFORMS JFIG Paper Competition, Finalist.

We study quadratic optimization with indicator variables and an M-matrix, i.e., a PSD matrix with non-positive off-diagonal entries, which arises directly in image segmentation and portfolio optimization with transaction costs, and as a substructure of general quadratic optimization problems. We prove, under mild assumptions, that the minimization problem is solvable in polynomial time by showing its equivalence to a submodular minimization problem. To strengthen the formulation, we decompose the quadratic function into a sum of simple quadratic functions with at most two indicator variables each, and provide the convex-hull descriptions of these sets. We also describe strong conic quadratic valid inequalities. Preliminary computational experiments indicate that the proposed inequalities can substantially improve the strength of the continuous relaxations with respect to the standard perspective reformulation.

Atamtürk A and Gómez A (2018). Mathematical Programming 170:141-176. 
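
One natural decomposition of the kind described above exploits the sign structure: for symmetric Q with Q_{ij} ≤ 0 whenever i ≠ j, the identity

```latex
x^\top Q x = \sum_{i} \Big( Q_{ii} - \sum_{j \ne i} |Q_{ij}| \Big) x_i^2 + \sum_{i < j} |Q_{ij}| \, (x_i - x_j)^2
```

writes the quadratic as a sum of one- and two-variable terms, each involving at most two indicator variables and hence amenable to term-by-term convex-hull descriptions.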

We consider minimizing a conic quadratic objective over a polyhedron. Such problems arise in parametric value-at-risk minimization, portfolio optimization, and robust optimization with ellipsoidal objective uncertainty; and they can be solved by polynomial interior point algorithms for conic quadratic optimization. However, interior point algorithms are not well-suited for branch-and-bound algorithms for the discrete counterparts of these problems due to the lack of effective warm starts necessary for the efficient solution of convex relaxations repeatedly at the nodes of the search tree. In order to overcome this shortcoming, we reformulate the problem using the perspective of its objective. The perspective reformulation lends itself to simple coordinate descent and bisection algorithms utilizing the simplex method for quadratic programming, which makes the solution methods amenable to warm starts and suitable for branch-and-bound algorithms. We test the simplex-based quadratic programming algorithms to solve convex as well as discrete instances and compare them with the state-of-the-art approaches. The computational experiments indicate that the proposed algorithms scale much better than interior point algorithms and return higher precision solutions. In our experiments, for large convex instances, they provide up to 22x speed-up. For smaller discrete instances, the speed-up is about 13x over a barrier-based branch-and-bound algorithm and 6x over the LP-based branch-and-bound algorithm with extended formulations. 

Atamtürk A and Gómez A (2019). Mathematical Programming Computation 11:311-340. 

We describe strong convex valid inequalities for conic quadratic mixed 0-1 optimization. The inequalities exploit the submodularity of the binary restrictions and are based on the polymatroid inequalities over the binary variables for the diagonal case. We prove that the convex inequalities completely describe the convex hull of a single conic quadratic constraint, as well as the rotated cone constraint, over binary variables and unbounded continuous variables. We then generalize and strengthen the inequalities by incorporating additional constraints of the optimization problem. Computational experiments on mean-risk optimization with correlations, assortment optimization, and robust conic quadratic optimization indicate that the new inequalities strengthen the convex relaxations substantially and lead to significant performance improvements.

Atamtürk A and Gómez A (2019). Operations Research 68:609-630. 

Given a polytope X, a monotone concave univariate function g, and two vectors c and d, we study the discrete optimization problem of finding a vertex of X that maximizes the utility function c'x + g(d'x). This problem has numerous applications in combinatorial optimization with a probabilistic objective, including the estimation of project duration with stochastic times, reliability models, multinomial logit models, and robust optimization. We show that the problem is NP-hard for any strictly concave function g, even for simple polytopes such as the uniform matroid, assignment, and path polytopes, and propose a 1/2-approximation algorithm for it. We discuss improvements for special cases where g is the square root, the log utility, the negative exponential utility, or the multinomial logit probability function. In particular, for the square root function, the approximation ratio is 4/5. We also propose a 1.25-approximation algorithm for a class of minimization problems in which the maximization of the utility function appears as a subproblem. Although the worst-case bounds are tight, computational experiments indicate that the suggested approach finds solutions within a 1-2% optimality gap for most of the instances, and can be considerably faster than the existing alternatives.

Atamtürk A and Gómez A (2017). Operations Research 65:433-445. 

2017, UC Berkeley, Katta Murty Best Paper Prize. 

Flow cover inequalities are among the most effective valid inequalities for capacitated fixed-charge network flow problems. These valid inequalities are based on implications for the flow quantity on the cut arcs of a two-partitioning of the network, depending on whether some of the cut arcs are open or closed. As the implications are only on the cut arcs, flow cover inequalities can be obtained by collapsing a subset of nodes into a single node. In this article, we derive new valid inequalities for the capacitated fixed-charge network flow problem by exploiting additional information from the network. In particular, the new inequalities are based on a three-partitioning of the nodes. The new three-partition flow cover inequalities include the flow cover inequalities as a special case. We discuss the constant-capacity case and give a polynomial separation algorithm for the inequalities. Finally, we report computational results with the new inequalities for networks with different characteristics.

Atamtürk A, Gómez A and Küçükyavuz S (2016). Networks 67:299-315.
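
For reference, the classic single-node flow cover inequality that these results generalize: for the set {(x, y) : Σ_{j∈N} x_j ≤ b, 0 ≤ x_j ≤ m_j y_j, y ∈ {0,1}^N} and a cover C ⊆ N with λ = Σ_{j∈C} m_j − b > 0,

```latex
\sum_{j \in C} x_j + \sum_{j \in C} (m_j - \lambda)^+ \, (1 - y_j) \le b
```

is valid, where (·)^+ denotes the positive part; the three-partition inequalities extend this template using additional network information.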

Vehicle routing problems with stochastic travel and service times (VRPSTT) consist of designing transportation routes of minimal expected cost over a network where travel and service times are represented by random variables. Most of the existing approaches for VRPSTT are conceived to exploit the properties of the distributions assumed for the random variables. Therefore, these methods are tied to a given family of distributions and subject to strong modeling assumptions. We propose an alternative way to model travel and service times in VRPSTT without making many assumptions regarding such distributions. To illustrate our approach, we embed it into a state-of-the-art routing engine and use it to conduct experiments on instances with different travel and service time distributions.

Gómez A, Mariño R, Akhavan-Tabatabaei R, Medaglia A and Mendoza J (2016). Transportation Science 50:627-641.