# Optimization and Control (math.OC)

• Neural networks provide a rich class of high-dimensional, non-convex optimization problems. Despite their non-convexity, gradient-descent methods often successfully optimize these models. This has motivated a recent spur in research attempting to characterize properties of their loss surface that may be responsible for such success. In particular, several authors have noted that \emphover-parametrization appears to act as a remedy against non-convexity. In this paper, we address this phenomenon by studying key topological properties of the loss, such as the presence or absence of "spurious valleys", defined as connected components of sub-level sets that do not include a global minimum. Focusing on a class of two-layer neural networks defined by smooth (but generally non-linear) activation functions, our main contribution is to prove that as soon as the hidden layer size matches the \emphintrinsic dimension of the reproducing space, defined as the linear functional space generated by the activations, no spurious valleys exist, thus allowing the existence of descent directions. Our setup includes smooth activations such as polynomials, both in the empirical and population risk, and generic activations in the empirical risk case.
• In this paper, we consider the (global and sum) energy efficiency optimization problem in downlink multi-input multi-output multi-cell systems, where all users suffer from multi-user interference. This is a challenging problem due to several reasons: 1) it is a nonconvex fractional programming problem, 2) the transmission rate functions are characterized by (complex-valued) transmit covariance matrices, and 3) the processing-related power consumption may depend on the transmission rate. We tackle this problem by the successive pseudoconvex approximation approach, and we argue that pseudoconvex optimization plays a fundamental role in designing novel iterative algorithms, not only because every locally optimal point of a pseudoconvex optimization problem is also globally optimal, but also because a descent direction is easily obtained from every optimal point of a pseudoconvex optimization problem. The proposed algorithms have the following advantages: 1) fast convergence as the structure of the original optimization problem is preserved as much as possible in the approximate problem solved in each iteration, 2) easy implementation as each approximate problem is suitable for parallel computation and its solution has a closed-form expression, and 3) guaranteed convergence to a stationary point or a Karush-Kuhn-Tucker point. The advantages of the proposed algorithm are also illustrated numerically.
• Small inhibitory neuronal circuits have long been identified as key neuronal motifs to generate and modulate the coexisting rhythms of various motor functions. Our paper highlights the role of a cellular switching mechanism to orchestrate such circuits. The cellular switch makes the circuits reconfigurable, robust, adaptable, and externally controllable. Without this cellular mechanism, the circuits rhythms entirely rely on specific tunings of the synaptic connectivity, which makes them rigid, fragile, and difficult to control externally. We illustrate those properties on the much studied architecture of a small network controlling both the pyloric and gastric rhythms of crabs. The cellular switch is provided by a slow negative conductance often neglected in mathematical modeling of central pattern generators. We propose that this conductance is simple to model and key to computational studies of rhythmic circuit neuromodulation.
• We consider a supremal functional of the form $$F(u)= \supess_x ∈\Omega f(x,Du(x))$$ where $\Omega\subseteq \R^N$ is a regular bounded open set, $u\in \wi$ and $f$ is a Borel function. Under a mild assumption on the sublevel sets of $F$, we show that the lower semicontinuous envelopes of $F$ with respect to the weak$^*$ topology, the weak$^*$ convergence and the uniform convergence are level convex (i.e. they have convex sub-level sets).
• As the energy transition transforms power grids across the globe it poses several challenges regarding grid design and control. In particular, high levels of intermittent renewable generation complicate the job of continuously balancing power supply and demand, which is necessary for the grid's stability. Although there exist several proposals to control the grid, most of them have not demonstrated to be cost efficient in terms of optimal control theory. Here, we mathematically formulate the control problem for stable operation of power grids, determining the minimal amount of control in the active power needed to achieve the constraints, and minimizing a suitable cost function at the same time. We investigate the performance of the optimal control method with respect to the uncontrolled scenario and we compare it to a simple linear control case, for two types of external disturbances. Considering case studies with two and five nodes respectively, we find that the linear control can improve the synchronization and transient stability of the power grid. However, if the synchronized angular velocity after a disturbance is allowed to differ from its initial steady state value, the linear control performs inefficiently in comparison to the optimal one.
• Feb 20 2018 math.OC arXiv:1802.06637v1
Mean field games (MFG) are dynamic games with infinitely many infinitesimal agents. In this context, we study the efficiency of Nash MFG equilibria: Namely, we compare the social cost of a MFG equilibrium with the minimal cost a global planner can achieve. We find a structure condition on the game under which there exists efficient MFG equilibria and, in case this condition is not fulfilled, quantify how inefficient MFG equilibria are.
• We give sufficient conditions for the expected excess and the upper semideviation of recourse functions to be strongly convex. This is done in the setting of two-stage stochastic programs with complete linear recourse and random right-hand side. This work extends results on strong convexity of risk-neutral models.
• We consider the decidability of state-to-state reachability in linear time-invariant control systems, with control sets defined by boolean combinations of linear inequalities. Decidability of the sub-problem in which control sets are linear subspaces is a fundamental result in control theory. We first show that reachability is undecidable if the set of controls is a finite union of affine subspaces. We then consider two simple subclasses of control sets---unions of two affine subspaces and bounded convex polytopes respectively---and show that in these two cases the reachability problem for LTI systems is as hard as certain longstanding open decision problems concerning linear recurrence sequences. Finally we present some spectral assumptions on the transition matrix of an LTI system under which reachability becomes decidable with bounded convex polytopes as control sets.
• This paper studies a 4-approximation algorithm for k-prize collecting Steiner tree problems. This problem generalizes both k-minimum spanning tree problems and prize collecting Steiner tree problems. Our proposed algorithm employs two 2-approximation algorithms for k-minimum spanning tree problems and prize collecting Steiner tree problems. Also our algorithm framework can be applied to a special case of k-prize collecting traveling salesman problems.
• We develop a framework for goal oriented optimal design of experiments (GOODE) for large-scale Bayesian linear inverse problems governed by PDEs. This framework differs from classical Bayesian optimal design of experiments (ODE) in the following sense: we seek experimental designs that minimize the posterior uncertainty in a predicted quantity of interest (QoI) rather than the estimated parameter itself. This is suitable for scenarios in which the solution of an inverse problem is an intermediate step and the estimated parameter is then used to compute a prediction QoI. In such problems, a GOODE approach has two benefits: the designs can avoid wastage of experimental resources by a targeted collection of data, and the resulting design criteria are computationally easier to evaluate due to the often low dimensionality of prediction QoIs. We present two modified design criteria, A-GOODE and D-GOODE, which are natural analogues of classical Bayesian A- and D-optimal criteria. We analyze the connections to other ODE criteria, and provide interpretations for the GOODE criteria by using tools from information theory. Then, we develop an efficient gradient-based optimization framework for solving the GOODE optimization problems. Additionally, we present comprehensive numerical experiments testing the various aspects of the presented approach. The driving application is the optimal placement of sensors to identify the source of contaminants in a diffusion and transport problem. We enforce sparsity of the sensor placements using an $\ell_1$-norm penalty approach, and propose a practical strategy for specifying the associated penalty parameter.
• Feb 20 2018 math.OC arXiv:1802.06482v1
This paper provides construction methods of the nearest graph Laplacian to a matrix identified from measurement data of graph Laplacian dynamics that include biochemical systems, synchronization systems, and multi-agent systems. We consider the following three cases. 1) The network structure, i.e., the connection relationship of edges of a given graph, is known and the network is directed. 2) The network structure is known and the network is undirected. 3) The network structure is unknown. For all the cases, problems of finding the nearest graph Laplacian are formulated as convex optimization problems. To efficiently solve the problems, we propose alternating direction method of multiplies (ADMM) algorithms by reformulating our problems into ADMM-applicable forms. Simulation experiments demonstrate that our proposed methods are useful to perform data-driven modeling of graph Laplacian dynamics.
• We consider leader-follower multi-agent systems that have many leaders, defined on any connected weighted undirected graphs, and address the leader selection and demotion problems. The leader selection problem is formulated as a minimization problem for the $H^2$ norm of the difference between the transfer functions of the original and new agent systems, under the assumption that the leader agents to be demoted are fixed. The leader demotion problem is that of finding optimal leader agents to be demoted, and is formulated using the global optimal solution to the leader selection problem. We prove that a global optimal solution to the leader selection problem is the set of the original leader agents except for those that are demoted to followers. To this end, we relax the original problem into a differentiable problem. Then, by calculating the gradient and Hessian of the objective function of the relaxed problem, we prove that the function is convex. It is shown that zero points of the gradient are global optimal solutions to the leader selection problem, which is a finite combinatorial optimization problem. Furthermore, we prove that any set of leader agents to be demoted subject to a fixed number of elements is a solution to the leader demotion problem. By combining the solutions to the leader selection and demotion problems, we prove that if we choose new leader agents from the original ones except for those specified by the set of leader agents to be demoted, then the relative $H^2$ error between the transfer functions of the original and new agent systems is completely determined by the numbers of original leader agents and leader agents that are demoted to follower agents. That is, we reveal that the relative $H^2$ error does not depend on the number of agents on the graph. Finally, we verify the solutions using a simple example.
• The BFGS quasi-Newton methodology, popular for smooth minimization, has also proved surprisingly effective in nonsmooth optimization. Through a variety of simple examples and computational experiments, we explore how the BFGS matrix update improves the local metric associated with a convex function even in the absence of smoothness and without using a line search. We compare the behavior of the BFGS and Shor r-algorithm updates.
• We study the detailed path-wise behavior of the discrete-time Langevin algorithm for non-convex Empirical Risk Minimization (ERM) through the lens of metastability, adopting some techniques from Berglund and Gentz. For a particular local optimum of the empirical risk, with an arbitrary initialization, we show that, with high probability, one of the two mutually exclusive events will occur: either the Langevin trajectory ends up somewhere outside the $\varepsilon$-neighborhood of this particular optimum within a short recurrence time; or it enters this $\varepsilon$-neighborhood by the recurrence time and stays there until an exponentially long escape time. We call this phenomenon empirical metastability. This two-timescale characterization aligns nicely with the existing literature in the following two senses. First, the recurrence time is dimension-independent, and resembles the convergence time of deterministic Gradient Descent (GD). However unlike GD, the Langevin algorithm does not require strong conditions on local initialization, and has the possibility of eventually visiting all optima. Second, the scaling of the escape time is consistent with the Eyring-Kramers law, which states that the Langevin scheme will eventually visit all local minima, but it will take an exponentially long time to transit among them. We apply this path-wise concentration result in the context of statistical learning to examine local notions of generalization and optimality.
• Pushing a little forward an approach proposed by Villani, we are going to prove that in the Riemannian setting the condition $\nabla^2 f< g$ implies that $f$ is $c$-concave with respect to the quadratic cost as soon as it has a sufficiently small $C^1$-norm. From this, we deduce a sufficient condition for the optimality of transport maps.
• In this paper we consider online mirror descent (OMD) algorithms, a class of scalable online learning algorithms exploiting data geometric structures through mirror maps. Necessary and sufficient conditions are presented in terms of the step size sequence $\{\eta_t\}_{t}$ for the convergence of an OMD algorithm with respect to the expected Bregman distance induced by the mirror map. The condition is $\lim_{t\to\infty}\eta_t=0, \sum_{t=1}^{\infty}\eta_t=\infty$ in the case of positive variances. It is reduced to $\sum_{t=1}^{\infty}\eta_t=\infty$ in the case of zero variances for which the linear convergence may be achieved by taking a constant step size sequence. A sufficient condition on the almost sure convergence is also given. We establish tight error bounds under mild conditions on the mirror map, the loss function, and the regularizer. Our results are achieved by some novel analysis on the one-step progress of the OMD algorithm using smoothness and strong convexity of the mirror map and the loss function.
• The purpose of this paper is two-fold: We extend the well-known relation between optimal stopping and randomized stopping of a given stochastic process to a situation where the available information flow is a sub-filtration of the filtration of the process. We call these problems optimal stopping and randomized stopping with partial information. Following an idea of Krylov [K] we introduce a special singular stochastic control problem with partial information and show that this is also equivalent to the partial information optimal stopping and randomized stopping problems. Then we show that the solution of this singular control problem can be expressed in terms of (partial information) variational inequalities, which in turn can be rewritten as a reflected backward stochastic differential equation (RBSDE) with partial information.
• A function in a class $\mathcal{F}(X)$ is said to be subdifferentially determined in $\mathcal{F}(X)$ if it is equal up to an additive constant to any function in $\mathcal{F}(X)$ with the same subdifferential. A function is said to be subdifferentially representable if it can be recovered from a subdifferential. We identify large classes of lower semicontinuous functions that possess these properties.
• We introduce several new black-box reductions that significantly improve the design of adaptive and parameter-free online learning algorithms by simplifying analysis, improving regret guarantees, and sometimes even improving runtime. We reduce parameter-free online learning to online exp-concave optimization, we reduce optimization in a Banach space to one-dimensional optimization, and we reduce optimization over a constrained domain to unconstrained optimization. All of our reductions run as fast as online gradient descent. We use our new techniques to improve upon the previously best regret bounds for parameter-free learning, and do so for arbitrary norms.
• There is a recent surge of interest in nonconvex reformulations via low-rank factorization for stochastic convex semidefinite optimization problem in the purpose of efficiency and scalability. Compared with the original convex formulations, the nonconvex ones typically involve much fewer variables, allowing them to scale to scenarios with millions of variables. However, it opens a new challenge that under what conditions the nonconvex stochastic algorithms may find the global optima effectively despite their empirical success in applications. In this paper, we provide an answer that a stochastic gradient descent method with variance reduction, can be adapted to solve the nonconvex reformulation of the original convex problem, with a \textitglobal linear convergence, i.e., converging to a global optimum exponentially fast, at a proper initial choice in the restricted strongly convex case. Experimental studies on both simulation and real-world applications on ordinal embedding are provided to show the effectiveness of the proposed algorithms.
• Low-thrust orbital transfers are difficult to optimize by indirect methods. The main issues come from the costate guess and from the numerical propagation accuracy required by the shooting method. In the case of a coplanar minimum-time low-thrust transfer with eclipses, an analytical costate guess is proposed. The optimal control problem reduces to an unconstrained minimization problem with two unknowns. A derivative-free algorithm yields a quasi-optimal solution from scratch in a few minutes. No specific guess is necessary and the algorithm properties ensure finding the global minimum of the unconstrained problem. The method is applicable to thrust levels from large to very low and to any eclipse configuration, as exemplified on a transfer towards the geostationary orbit.
• The problem of simultaneous trajectography of several dynamical objects is formulated as an optimization problem. The available observations consist in a series of photographs showing undiscriminated objects. The goal is to find the object initial states so that the resulting trajectories match as well as possible the set of observations. An assignment problem is solved at each observation date by the Hungarian method, yielding a deviation cost between the simulated trajectories and the measurements. A fitness function summing the deviation costs is minimized by a particle swarm algorithm. The method is illustrated on a space orbitography application.
• We consider a dynamic resource allocation problem. There are multiple resources and each resource has multiple units. Customers arrive over a finite horizon according to some known distribution, and each customer requests a combination of resources and pays a price. The decision maker needs to irrevocably accept or reject these customers to maximize expected revenue. The exact solution to the problem is often impossible to obtain due to the curse of dimensionality. We study a class of re-solving heuristics. These heuristics periodically re-solve the deterministic linear program (DLP), where random customer arrivals are replace by their expectations. We find that frequently re-solving the DLP produces the same order of revenue loss as one would get without re-solving. However, by re-solving the DLP at a few selected points in time, we design a new algorithm that has a revenue loss bounded by a constant that is independent of the horizon length.
• Controllable building loads have the potential to increase the flexibility of power systems. A key step in developing effective and attainable load control policies is modeling the set of feasible building load profiles. In this paper, we consider buildings whose source of flexibility is their HVAC loads. We propose a data-driven method to empirically estimate a robust feasible region of the load using coarse data, that is, using only total building load and average indoor temperatures. The proposed method uses easy-to-gather coarse data and can be adapted to buildings of any type. The resulting feasible region model is robust to temperature prediction errors and is described by linear constraints. The mathematical simplicity of these constraints makes the proposed model adaptable to many power system applications, for example, economic dispatch, and optimal power flow. We validate our model using data from EnergyPlus and demonstrate its usefulness through a case study in which flexible building loads are used to balance errors in wind power forecasts.
• Using elementary arguments, we derive $\L_{p}$-error bounds for the approximation of frictionless wealth process in markets with proportional transaction costs. For utilities with bounded risk aversion, these estimates yield lower bounds for the frictional value function, which pave the way for its asymptotic analysis using stability results for viscosity solutions. Using tools from Malliavin calculus, we also derive simple sufficient conditions for the regularity of frictionless optimal trading strategies, the second main ingredient for the asymptotic analysis of small transaction costs.
• We consider an optimal control problem where the state equations are a coupled hyperbolic-elliptic system. This system arises in elastodynamics with piezoelectric effects -- the elastic stress tensor is a function of elastic displacement and electric potential. The electric flux acts as the control variable and, in addition to the state constraints, the bound constraints on the control are considered. We develop a complete analysis for the state equations and the control problem. The requisite regularity on the control, to show the well-posedness of state equations, is enforced using the cost functional. We rigorously derive the first order necessary and sufficient conditions using adjoint equations and further study their well-posedness. For spatially discrete (time continuous) problem, we show the convergence of our numerical scheme. Three dimensional numerical experiments are provided showing convergence properties of a fully discrete method and the practical applicability of our approach.
• In the power system, state estimation (SE) is important monitoring task for the reliable operation of the system. The optimal estimate from the SE is delivered to all EMS application such as fault analysis, automatic generation control. Hence, it is crucial to have good estimation before taking any critical actions. However, the SE problem is challenging problem due to nonconvexity of power flow equations in the nonlinear AC power flow model, which give us a usually local solution. To deal with this nonconvexity, some recent literatures applied the convex semi-definite (SDP) relaxation technique to relax the SE problem attaining or approximating a global solution. In this paper, we investigate the rank of this technique, which is critical to yield a physically meaningful solution with the five-bus test system and propose new approach to possibly reduce the rank by complementing the traditional set of measurement with PMU data. Numerical tests on the standard IEEE 14, 30, 57, 118-bus test system are presented for the demonstration.
• We analyze algorithms for approximating a function $f(x) = \Phi x$ mapping $\Re^d$ to $\Re^d$ using deep linear neural networks, i.e. that learn a function $h$ parameterized by matrices $\Theta_1,...,\Theta_L$ and defined by $h(x) = \Theta_L \Theta_{L-1} ... \Theta_1 x$. We focus on algorithms that learn through gradient descent on the population quadratic loss in the case that the distribution over the inputs is isotropic. We provide polynomial bounds on the number of iterations for gradient descent to approximate the optimum, in the case where the initial hypothesis $\Theta_1 = ... = \Theta_L = I$ has loss bounded by a small enough constant. On the other hand, we show that gradient descent fails to converge for $\Phi$ whose distance from the identity is a larger constant, and we show that some forms of regularization toward the identity in each layer do not help. If $\Phi$ is symmetric positive definite, we show that an algorithm that initializes $\Theta_i = I$ learns an $\epsilon$-approximation of $f$ using a number of updates polynomial in $L$, the condition number of $\Phi$, and $\log(d/\epsilon)$. In contrast, we show that if the target $\Phi$ is symmetric and has a negative eigenvalue, then all members of a class of algorithms that perform gradient descent with identity initialization, and optionally regularize toward the identity in each layer, fail to converge. We analyze an algorithm for the case that $\Phi$ satisfies $u^{\top} \Phi u > 0$ for all $u$, but may not be symmetric. This algorithm uses two regularizers: one that maintains the invariant $u^{\top} \Theta_L \Theta_{L-1} ... \Theta_1 u > 0$ for all $u$, and another that "balances" $\Theta_1 ... \Theta_L$ so that they have the same singular values.
• This paper considers an unsignalized intersection used by two traffic streams. A stream of cars is using a primary road, and has priority over the other stream. Cars belonging to the latter stream cross the primary road if the gaps between two subsequent cars on the primary road are larger than their critical headways. A question that naturally arises relates to the capacity of the secondary road: given the arrival pattern of cars on the primary road, what is the maximum arrival rate of low-priority cars that can be sustained? This paper addresses this issue by considering a compact model that sheds light on the dynamics of the considered unsignalized intersection. The model, which is of a queueing-theoretic nature, reveals interesting insights into the impact of the user behavior on stability. The contributions of this paper are threefold. First, we obtain new results for the aforementioned model that includes driver impatience. Secondly, we reveal some surprising aspects that have remained unobserved in the existing literature so far, many of which are caused by the fact that the capacity of the minor road cannot be expressed in terms of the \emphmean gap size; instead more detailed characteristics of the critical headway distribution play a crucial role. The third contribution is the introduction of a new form of bunching on the main road, called Markov platooning. The tractability of this model allows us to study the impact of various platoon formations on the main road on the capacity of the minor road.
• We consider learning-based variants of the $c \mu$ rule -- a classic and well-studied scheduling policy -- in single and multi-server settings for multi-class queueing systems. In the single server setting, the $c \mu$ rule is known to minimize the expected holding-cost (weighted queue-lengths summed both over classes and time). We focus on the setting where the service rates $\mu$ are unknown, and are interested in the holding-cost regret -- the difference in the expected holding-costs between that induced by a learning-based rule (that learns $\mu$) and that from the $c \mu$ rule (which has knowledge of the service rates) over any fixed time horizon. We first show that empirically learning the service rates and then scheduling using these learned values results in a regret of holding-cost that does not depend on the time horizon. The key insight that allows such a constant regret bound is that a work-conserving scheduling policy in this setting allows explore-free learning, where no penalty is incurred for exploring and learning server rates. We next consider the multi-server setting. We show that in general, the $c \mu$ rule is not stabilizing (i.e. there are stabilizable arrival and service rate parameters for which the multi-server $c \mu$ rule results in unstable queues). We then characterize sufficient conditions for stability (and also concentrations on busy periods). Using these results, we show that learning-based variants of the $c\mu$ rule again result in a constant regret (i.e. does not depend on the time horizon). This result hinges on (i) the busy period concentrations of the multi-server $c \mu$ rule, and that (ii) our learning-based rule is designed to dynamically explore server rates, but in such a manner that it eventually satisfies an explore-free condition.