# Optimization and Control (math.OC)

• We consider how to connect a set of disjoint networks to optimize the performance of the resulting composite network. We quantify this performance by the coherence of the composite network, which is defined by an H2 norm of the system. Two dynamics are considered: noisy consensus dynamics with and without stubborn agents. For noisy consensus dynamics without stubborn agents, we derive analytical expressions for the coherence of composite networks in terms of the coherence of the individual networks and the structure of their interconnections. We also identify optimal interconnection topologies and give bounds on coherence for general composite graphs. For noisy consensus dynamics with stubborn agents, we show that the coherence of a composite network is a submodular function over the set of potential edges between the disjoint networks. We leverage this submodularity to develop a non-combinatorial algorithm that identifies connecting edges such that the composite network coherence is within a provable bound of optimal.
• Under Markovian assumptions we leverage a Central Limit Theorem (CLT) related to the test statistic in the composite hypothesis Hoeffding test so as to derive a new estimator for the threshold needed by the test. We first show the advantages of our estimator over an existing estimator by conducting extensive numerical experiments. We then apply the Hoeffding test with our threshold estimator to detecting anomalies in both communication and transportation networks. The former application seeks to enhance cyber security and the latter aims at building smarter transportation systems in cities.
• Natural disasters, such as hurricanes, earthquakes and large wind or ice storms, typically require the repair of a large number of components in electricity distribution networks. Since power cannot be restored before these repairs have been completed, optimally scheduling the available crews to minimize the cumulative duration of the customer interruptions reduces the harm done to the affected community. Considering the radial network structure of the distribution system, this repair and restoration process can be modeled as a job scheduling problem with soft precedence constraints. As a benchmark, we first formulate this problem as a time-indexed ILP with valid inequalities. Three practical methods are then proposed to solve the problem: (i) an LP-based list scheduling algorithm, (ii) a single to multi-crew repair schedule conversion algorithm, and (iii) a scheduling algorithm based on $\rho$-factors which can be interpreted as Component Importance Measures. We show that the first two algorithms are $4$ and $\left(2 - \frac{1}{m}\right)$ approximations respectively. We also prove that the latter two algorithms are equivalent. Numerical results validate the effectiveness of the proposed methods.
• Alternating minimization, or Fienup methods, have a long history in phase retrieval. We provide new insights related to the empirical and theoretical analysis of these algorithms when used with Fourier measurements and combined with convex priors. In particular, we show that Fienup methods can be viewed as performing alternating minimization on a regularized nonconvex least-squares problem with respect to amplitude measurements. We then prove that under mild additional structural assumptions on the prior (semi-algebraicity), the sequence of signal estimates has a smooth convergent behaviour towards a critical point of the nonconvex regularized least-squares objective. Finally, we propose an extension to Fienup techniques, based on a projected gradient descent interpretation and acceleration using inertial terms. We demonstrate experimentally that this modification combined with an $\ell_1$ prior constitutes a competitive approach for sparse phase retrieval.
• Stochastic network optimization problems entail finding resource allocation policies that are optimal on average but must be designed in an online fashion. Such problems are ubiquitous in communication networks, where resources such as energy and bandwidth are divided among nodes to satisfy certain long-term objectives. This paper proposes an asynchronous incremental dual descent resource allocation algorithm that uses delayed stochastic gradients to carry out its updates. The proposed algorithm is well suited to heterogeneous networks, as it allows computationally challenged or energy-starved nodes to postpone their updates at times. An asymptotic analysis of the proposed algorithm establishes dual convergence under both constant and diminishing step sizes. It is also shown that, with a constant step size, the proposed resource allocation policy is asymptotically near-optimal. An application involving multi-cell coordinated beamforming is detailed, demonstrating the usefulness of the proposed algorithm.
• Under the strong convexity assumption, several recent works have studied the global linear convergence rate of the proximal incremental aggregated gradient (PIAG) method for minimizing the sum of a large number of smooth component functions and a non-smooth convex function. In this paper, under the quadratic growth condition, which is strictly weaker than strong convexity, we derive a new global linear convergence rate result, which implies that the PIAG method attains global linear convergence rates in both the function value and iterate point errors. The main idea is to construct a certain Lyapunov function.
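To make the PIAG iteration in the abstract above concrete, here is a minimal sketch on a toy scalar problem. The problem instance (quadratic components $\tfrac{1}{2}(x-a_i)^2$ with an $\ell_1$ term), the cyclic refresh order, the step size, and the names `soft_threshold` and `piag_scalar` are all assumptions made for illustration; they are not taken from the paper.

```python
def soft_threshold(v, t):
    """Proximal operator of t * |.| evaluated at v."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def piag_scalar(a, lam, step, iters):
    """PIAG-style loop for min_x sum_i 0.5 * (x - a[i])**2 + lam * |x|.

    Illustrative assumption: one component gradient is refreshed per
    iteration in cyclic order, so the other stored gradients are stale.
    """
    n = len(a)
    x = 0.0
    grads = [x - ai for ai in a]  # table of stored component gradients
    agg = sum(grads)              # aggregated (partly outdated) gradient
    for k in range(iters):
        i = k % n
        fresh = x - a[i]          # gradient of component i at the current x
        agg += fresh - grads[i]   # swap the stale entry for the fresh one
        grads[i] = fresh
        # proximal step on the non-smooth term using the aggregated gradient
        x = soft_threshold(x - step * agg, step * lam)
    return x


x_hat = piag_scalar([1.0, 2.0, 3.0, 6.0], lam=2.0, step=0.02, iters=2000)
```

For this instance the minimizer can be found by hand: the optimality condition $4x - 12 + 2 = 0$ gives $x = 2.5$, which `x_hat` should approach.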
• Multiview representation learning is very popular for latent factor analysis. It naturally arises in many data analysis, machine learning, and information retrieval applications to model dependent structures between a pair of data matrices. For computational convenience, existing approaches usually formulate multiview representation learning as a convex optimization problem, whose global optima can be obtained by certain algorithms in polynomial time. However, considerable empirical evidence indicates that heuristic nonconvex approaches also perform well computationally and converge to the global optima, although theoretical justification is lacking. Such a gap between theory and practice motivates us to study a nonconvex formulation for multiview representation learning, which can be efficiently solved by two stochastic gradient descent (SGD) methods. Theoretically, by analyzing the dynamics of the algorithms based on diffusion processes, we establish global rates of convergence to the global optima with high probability. Numerical experiments are provided to support our theory.
• We present exact copositive relaxation and global optimality conditions for an extended trust-region problem under suitable conditions by way of studying its semi-Lagrangian duality. We then establish conditions under which exactness of semi-Lagrangian relaxations, or of the usual Lagrangian relaxation, holds for an extended CDT (two-ball trust-region) problem.
• We consider using a battery storage system simultaneously for peak shaving and frequency regulation through a joint optimization framework which captures battery degradation, operational constraints and uncertainties in customer load and regulation signals. Under this framework, using real data from a Microsoft data center, a University of Washington building and PJM, we show that the electricity bill of users can be reduced by up to 15%. Furthermore, we demonstrate that the saving from joint optimization is often larger than the sum of the optimal savings when the battery is used for the two individual applications. A simple threshold real-time algorithm is proposed and achieves this super-linear gain. Compared to prior works that focused on using battery storage systems for single applications, our results suggest that batteries can produce much larger economic benefits than previously thought if they jointly provide multiple services.
• We discuss exact observability of boundary and internal observation of a one-dimensional Schrödinger equation on a time dependent domain.
• This paper considers the constant step-size stochastic subgradient descent (SSD) method, which is commonly used for solving resource allocation problems arising in wireless networks, cognitive radio networks, smart-grid systems, and task scheduling. It is well known that with a step size of $\epsilon$, SSD converges to an $\mathcal{O}(\epsilon)$-sized neighborhood of the optimum. In practice, however, there exists a trade-off between the rate of convergence and the choice of $\epsilon$. This paper establishes a convergence rate result for the SSD algorithm that precisely characterizes this trade-off. Towards this end, a novel stochastic bound on the gap between the objective function and the optimum is developed. The asymptotic behavior of the stochastic term is characterized in an almost sure sense, thereby generalizing the existing results for the SSD methods. When applied to the stochastic resource allocation problem, the result explicates the rate with which the allocated resources become near-optimum. As an example, the power and user-allocation problem in device-to-device networks is formulated and solved using the SSD algorithm. Further intuition on the rate results is obtained from the verification of the regularity conditions and accompanying simulation results.
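The $\mathcal{O}(\epsilon)$ neighborhood mentioned in the abstract above can be seen on a toy problem. The sketch below runs constant step-size SSD on $f(x) = |x|$ with a noisy subgradient. The objective, the uniform noise model, the seed, and the name `ssd_abs` are assumptions chosen for illustration, not the resource allocation setting of the paper.

```python
import random


def ssd_abs(x0, eps, iters, seed=0):
    """Constant step-size stochastic subgradient descent on f(x) = |x|.

    Assumed noise model: the subgradient sign(x) is corrupted by zero-mean
    noise drawn uniformly from [-1, 1].
    """
    rng = random.Random(seed)
    x = x0
    for _ in range(iters):
        sign = (x > 0) - (x < 0)                  # a subgradient of |x|
        noisy_subgrad = sign + rng.uniform(-1.0, 1.0)
        x -= eps * noisy_subgrad                  # step size eps never decays
    return x


x_coarse = ssd_abs(1.0, eps=0.01, iters=5000)
x_fine = ssd_abs(1.0, eps=0.001, iters=5000)
```

In this toy setting each step moves the iterate by at most $2\epsilon$ and never away from the origin, so once the iterate reaches the neighborhood it stays within $2\epsilon$ of the optimum. A smaller $\epsilon$ tightens the neighborhood but takes more iterations to reach it, which is the trade-off the abstract describes.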
• Short-Term Load Forecasting (STLF) is a fundamental component in the efficient management of power systems, which has been studied intensively over the past 50 years. The emerging development of smart grid technologies is posing new challenges as well as opportunities to STLF. Load data, collected at higher geographical granularity and frequency through thousands of smart meters, allows us to build a more accurate local load forecasting model, which is essential for local optimization of power load through demand side management. In this paper, we show that several existing approaches for STLF are not applicable to local load forecasting, either because of long training time, an unstable optimization process, or sensitivity to hyper-parameters. Accordingly, we select five models suitable for local STLF, which can be trained on different time-series with limited intervention from the user. The experiment, which consists of 40 time-series collected at different locations and aggregation levels, revealed that yearly pattern and temperature information are only useful for high aggregation level STLF. On the local STLF task, the modified version of double seasonal Holt-Winters proposed in this paper performs relatively well with only 3 months of training data, compared to more complex methods.
• Deep learning models are often successfully trained using gradient descent, despite the worst case hardness of the underlying non-convex optimization problem. The key question is then under what conditions can one prove that optimization will succeed. Here we provide a strong result of this kind. We consider a neural net with one hidden layer and a convolutional structure with no overlap and a ReLU activation function. For this architecture we show that learning is NP-complete in the general case, but that when the input distribution is Gaussian, gradient descent converges to the global optimum in polynomial time. To the best of our knowledge, this is the first global optimality guarantee of gradient descent on a convolutional neural network with ReLU activations.
• This paper considers the minimization of a general objective function $f(X)$ over the set of non-square $n\times m$ matrices where the optimal solution $X^\star$ is low-rank. To reduce the computational burden, we factorize the variable $X$ into a product of two smaller matrices and optimize over these two matrices instead of $X$. Despite the resulting nonconvexity, recent studies in matrix completion and sensing have shown that the factored problem has no spurious local minima and obeys the so-called strict saddle property (the function has a direction of negative curvature at all critical points except local minima). We analyze the global geometry for a general and yet well-conditioned objective function $f(X)$ whose restricted strong convexity and restricted strong smoothness constants are comparable. In particular, we show that the reformulated objective function has no spurious local minima and obeys the strict saddle property. These geometric properties imply that a number of iterative optimization algorithms (such as gradient descent) can provably solve the factored problem with global convergence.
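As a small illustration of the factored approach in the abstract above, the sketch below runs plain gradient descent on $\tfrac{1}{2}\|uv^\top - M\|_F^2$ for a rank-one target $M$. The least-squares objective, the rank-one restriction, the initialization, the step size, and the name `factored_gd` are assumptions made for this example; the paper treats a general objective $f(X)$.

```python
def factored_gd(M, step=0.02, iters=3000):
    """Gradient descent on 0.5 * ||u v^T - M||_F^2 over the factors u and v.

    Illustrative assumptions: rank-one factorization, constant step size,
    and a fixed initialization of 0.5 in every entry.
    """
    rows, cols = len(M), len(M[0])
    u = [0.5] * rows
    v = [0.5] * cols
    for _ in range(iters):
        # residual R = u v^T - M
        R = [[u[i] * v[j] - M[i][j] for j in range(cols)] for i in range(rows)]
        # gradients of the objective with respect to each factor
        grad_u = [sum(R[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        grad_v = [sum(R[i][j] * u[i] for i in range(rows)) for j in range(cols)]
        u = [u[i] - step * grad_u[i] for i in range(rows)]
        v = [v[j] - step * grad_v[j] for j in range(cols)]
    return u, v


# Rank-one 3 x 2 target: outer product of [2, 1, 3] and [1, 2].
M = [[2.0, 4.0], [1.0, 2.0], [3.0, 6.0]]
u, v = factored_gd(M)
max_err = max(abs(u[i] * v[j] - M[i][j]) for i in range(3) for j in range(2))
```

The factors themselves are not unique (scaling $u$ up and $v$ down leaves $uv^\top$ unchanged), so the check is made on the product rather than on $u$ and $v$ individually.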
• Policy evaluation is a crucial step in many reinforcement-learning procedures, which estimates a value function that predicts states' long-term value under a given policy. In this paper, we focus on policy evaluation with linear function approximation over a fixed dataset. We first transform the empirical policy evaluation problem into a (quadratic) convex-concave saddle point problem, and then present a primal-dual batch gradient method, as well as two stochastic variance reduction methods for solving the problem. These algorithms scale linearly in both sample size and feature dimension. Moreover, they achieve linear convergence even when the saddle-point problem has only strong concavity in the dual variables but no strong convexity in the primal variables. Numerical experiments on benchmark problems demonstrate the effectiveness of our methods.
• Consider reconstructing a signal $x$ by minimizing a weighted sum of a convex differentiable negative log-likelihood (NLL) (data-fidelity) term and a convex regularization term that imposes a convex-set constraint on $x$ and enforces its sparsity using $\ell_1$-norm analysis regularization. We compute upper bounds on the regularization tuning constant beyond which the regularization term overwhelmingly dominates the NLL term so that the set of minimum points of the objective function does not change. Necessary and sufficient conditions for irrelevance of sparse signal regularization and a condition for the existence of finite upper bounds are established. We formulate an optimization problem for finding these bounds when the regularization term can be globally minimized by a feasible $x$ and also develop an alternating direction method of multipliers (ADMM) type method for their computation. Simulation examples show that the derived and empirical bounds match.
• A split feasibility formulation for the inverse problem of intensity-modulated radiation therapy (IMRT) treatment planning with dose-volume constraints (DVCs) included in the planning algorithm is presented. It involves a new type of sparsity constraint that enables the inclusion of a percentage-violation constraint in the model problem and its handling by continuous (as opposed to integer) methods. We propose an iterative algorithmic framework for solving such a problem by applying the feasibility-seeking CQ-algorithm of Byrne combined with the automatic relaxation method (ARM) that uses cyclic projections. Detailed implementation instructions are furnished. Functionality of the algorithm was demonstrated through the creation of an intensity-modulated proton therapy plan for a simple 2D C-shaped geometry and also for a realistic base-of-skull chordoma treatment site. Monte Carlo simulations of proton pencil beams of varying energy were conducted to obtain dose distributions for the 2D test case. A research release of the Pinnacle3 proton treatment planning system was used to extract pencil beam doses for a clinical base-of-skull chordoma case. In both cases the beamlet doses were calculated to satisfy dose-volume constraints according to our new algorithm. Examination of the dose-volume histograms following inverse planning with our algorithm demonstrated that it performed as intended. The application of our proposed algorithm to dose-volume constraint inverse planning was successfully demonstrated. Comparison with optimized dose distributions from the research release of the Pinnacle3 treatment planning system showed the algorithm could achieve equivalent or superior results.
• In the optimization community, coordinate descent algorithms are often viewed as coordinate-wise variants of $\ell_2$-norm gradient descent. In this paper, we improve coordinate descent algorithms from another viewpoint: greedy coordinate descent (GCD) is equivalent to finding the solution with the least coordinate update of $\ell_1$-norm gradient descent. Then, from the perspective of solving an $\ell_1$-regularized $\ell_1$-norm gradient descent ($\ell_1$-$\ell_1$-GD) problem, GCD is generalized to solve $\ell_1$-regularized convex problems. It is nontrivial to solve the $\ell_1$-$\ell_1$-GD problem, and thus an efficient greedy algorithm called $\ell_1$-greedy is proposed, which is proved to solve it exactly. Meanwhile, by combining $\ell_1$-greedy with a specific mirror descent step and the Katyusha framework, we accelerate and randomize GCD to obtain the optimal convergence rate $O(1/\sqrt{\epsilon})$ and reduce the complexity of greedy selection by a factor of up to $n$. When the regularization parameter is relatively large and the samples are high-dimensional and dense, the proposed AGCD and BASGCD algorithms are better than the state-of-the-art algorithms for $\ell_1$-regularized empirical risk minimization.
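For readers unfamiliar with greedy coordinate descent, the following is a minimal sketch of the classical greedy (Gauss-Southwell) rule on a smooth quadratic. It is not the $\ell_1$-greedy, AGCD, or BASGCD method of the abstract above. The test problem, the exact coordinate minimization, and the name `greedy_cd` are assumptions chosen for illustration.

```python
def greedy_cd(A, b, iters=200):
    """Greedy coordinate descent for min_x 0.5 * x^T A x - b^T x.

    Assumes A is symmetric positive definite. Each iteration updates the
    coordinate whose partial derivative has the largest magnitude.
    """
    n = len(b)
    x = [0.0] * n
    for _ in range(iters):
        # full gradient A x - b
        grad = [sum(A[i][j] * x[j] for j in range(n)) - b[i] for i in range(n)]
        # greedy rule: pick the coordinate with the largest |gradient| entry
        i = max(range(n), key=lambda k: abs(grad[k]))
        # exact minimization along coordinate i
        x[i] -= grad[i] / A[i][i]
    return x


A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = greedy_cd(A, b)
```

The minimizer solves $Ax = b$, which for this instance is $x = (1/11, 7/11)$. Computing the full gradient at every step costs $O(n^2)$ here, which is the selection overhead that the abstract's randomized variant aims to reduce.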
• Packing rings into a minimum number of rectangles is an optimization problem which appears naturally in the logistics operations of the tube industry. It encompasses two major difficulties, namely the positioning of rings in rectangles and the recursive packing of rings into other rings. This problem is known as the Recursive Circle Packing Problem (RCPP). We present the first exact method for solving RCPP, based on a Dantzig-Wolfe decomposition of a nonconvex mixed-integer nonlinear programming formulation. The key idea of this reformulation is to break symmetry on each recursion level by enumerating all so-called one-level packings, i.e., packings of circles into other circles, and by dynamically generating packings of circles into rectangles. We propose a branch-and-price algorithm to solve the reformulation to global optimality. Extensive computational experiments on a large test set show that our method not only computes exact dual bounds, but often produces primal solutions better than those computed by heuristics from the literature.
• We consider the optimal allocation of generic resources among multiple generic entities of interest over a finite planning horizon, where each entity generates stochastic returns as a function of its resource allocation during each period. The main objective is to maximize the expected return while at the same time managing risk to an acceptable level for each period. We devise a general solution framework and establish how to obtain the optimal dynamic resource allocation.
• Epidemic processes are used commonly for modeling and analysis of biological networks, computer networks, and human contact networks. The idea of competing viruses has been explored recently, motivated by the spread of different ideas along different social networks. Previous studies of competitive viruses have focused only on two viruses and on static graph structures. In this paper, we consider multiple competing viruses over static and dynamic graph structures, and investigate the eradication and propagation of diseases in these systems. Stability analysis for the class of models we consider is performed and an antidote control technique is proposed.