results for au:Moura_J in:cs

- We introduce the first goal-driven training for visual question answering and dialog agents. Specifically, we pose a cooperative 'image guessing' game between two agents -- Qbot and Abot -- who communicate in natural language dialog so that Qbot can select an unseen image from a lineup of images. We use deep reinforcement learning (RL) to learn the policies of these agents end-to-end -- from pixels to multi-agent multi-round dialog to game reward. We demonstrate two experimental results. First, as a 'sanity check' demonstration of pure RL (from scratch), we show results on a synthetic world, where the agents communicate in ungrounded vocabulary, i.e., symbols with no pre-specified meanings (X, Y, Z). We find that two bots invent their own communication protocol and start using certain symbols to ask/answer about certain visual attributes (shape/color/style). Thus, we demonstrate the emergence of grounded language and communication among 'visual' dialog agents with no human supervision. Second, we conduct large-scale real-image experiments on the VisDial dataset, where we pretrain with supervised dialog data and show that the RL 'fine-tuned' agents significantly outperform SL agents. Interestingly, the RL Qbot learns to ask questions that Abot is good at, ultimately resulting in more informative dialog and a better team.
- Mar 20 2017 cs.CV arXiv:1703.05868v1 In this paper, we estimate traffic density from low-quality videos captured by city web cameras (webcams). Webcam videos have low resolution, low frame rate, high occlusion, and large perspective distortion, which renders most existing methods ineffective. To understand traffic density in depth, we explore both deep-learning-based and optimization-based methods. To avoid detecting and tracking individual vehicles, both methods map the image to a vehicle density map, one based on rank-constrained regression and the other on fully convolutional networks (FCN). The regression-based method learns different weights for different blocks of the image to increase the degrees of freedom of the weights and to embed perspective information. The FCN-based method jointly estimates the vehicle density map and vehicle count with a residual learning framework to perform end-to-end dense prediction, allowing arbitrary image resolution and adapting to different vehicle scales and perspectives. We analyze and compare both methods, and draw insights from the optimization-based method to improve the deep model. Since existing datasets do not cover all the challenges in our work, we collected and labelled a large-scale traffic video dataset containing 60 million frames from 212 webcams. Both methods are extensively evaluated and compared on different counting tasks and three datasets, with experimental results demonstrating their effectiveness and robustness. In particular, the FCN-based method significantly reduces the mean absolute error from 10.99 to 5.31 on the public dataset TRANCOS compared with the state-of-the-art baseline.
- Feb 10 2017 cs.MA arXiv:1702.02597v1 In many problems, agents cooperate locally so that a leader or fusion center can infer the state of every agent from probing the state of only a small number of agents. Versions of this problem arise when a fusion center reconstructs an extended physical field by accessing the state of just a few of the sensors measuring the field, or a leader monitors the formation of a team of robots. Given a link cost, the paper presents a polynomial time algorithm to design a minimum cost coordinated network dynamics followed by the agents, under an observability constraint. The problem is placed in the context of structural observability and solved even when up to k agents in the coordinated network dynamics fail.
- Jan 12 2017 cs.SI arXiv:1701.02864v1 We define and discuss the utility of two equivalence graph classes over which a spectral projector-based graph Fourier transform is equivalent: isomorphic equivalence classes and Jordan equivalence classes. Isomorphic equivalence classes show that the transform is equivalent up to a permutation on the node labels. Jordan equivalence classes permit identical transforms over graphs of nonidentical topologies and allow a basis-invariant characterization of total variation orderings of the spectral components. Methods to exploit these classes to reduce the computation time of the transform, as well as their limitations, are discussed.
- Jan 12 2017 cs.SI arXiv:1701.02851v1 We propose an inexact method for the graph Fourier transform of a graph signal, as defined by the signal decomposition over the Jordan subspaces of the graph adjacency matrix. This method projects the signal over the generalized eigenspaces of the adjacency matrix, which accelerates the transform computation over large, sparse, and directed adjacency matrices. The trade-off between execution time and fidelity to the original graph structure is discussed. In addition, properties such as a generalized Parseval's identity and total variation ordering of the generalized eigenspaces are discussed. The method is applied to 2010-2013 NYC taxi trip data to identify traffic hotspots on the Manhattan grid. Our results show that identical highly expressed geolocations can be identified with the inexact method and the method based on eigenvector projections, while reducing computation time by a factor of 26,000 and reducing energy dispersal among the spectral components corresponding to the multiple zero eigenvalue.
- Jan 11 2017 cs.SI arXiv:1701.02690v1 The paper presents the graph Fourier transform (GFT) of a signal in terms of its spectral decomposition over the Jordan subspaces of the graph adjacency matrix $A$. This representation is unique and coordinate free, and it leads to an unambiguous definition of the spectral components ("harmonics") of a graph signal. This is particularly meaningful when $A$ has repeated eigenvalues, and it is very useful when $A$ is defective or not diagonalizable (as may be the case with directed graphs). Many real-world large sparse graphs have defective adjacency matrices. We present properties of the GFT and show it to satisfy a generalized Parseval inequality and to admit a total variation ordering of the spectral components. We express the GFT in terms of spectral projectors and present an illustrative example for a real-world large urban traffic dataset.
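The projector-based GFT described in the last three entries can be illustrated, for the simpler diagonalizable case in which the Jordan subspaces reduce to ordinary eigenspaces, with a short sketch. All names below are illustrative, not the authors' code:

```python
import numpy as np

# Sketch of a projector-based graph Fourier transform for a
# diagonalizable adjacency matrix (the Jordan subspaces then reduce
# to ordinary eigenspaces).

def gft_components(A, s):
    """Decompose graph signal s over the eigenspaces of A.

    Returns the spectral components s_i with s = sum_i s_i,
    one component per distinct eigenvalue of A.
    """
    eigvals, V = np.linalg.eig(A)
    s_hat = np.linalg.inv(V) @ s      # coordinates in the eigenbasis
    components = []
    for lam in np.unique(np.round(eigvals, 8)):
        mask = np.isclose(eigvals, lam)
        # Projection of s onto the eigenspace of lam; the component is
        # coordinate free (independent of the basis chosen inside V).
        components.append(V[:, mask] @ s_hat[mask])
    return components

# Directed cycle graph: its adjacency matrix yields the classical DFT.
N = 8
A = np.roll(np.eye(N), 1, axis=1)
s = np.random.randn(N)
parts = gft_components(A, s)
assert np.allclose(sum(parts).real, s)   # components sum back to s
```

For the directed cycle all eigenvalues are distinct, so there is one component per node; for matrices with repeated eigenvalues the components group accordingly.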
- This work presents distributed algorithms for estimation of time-varying random fields over multi-agent/sensor networks. A network of sensors makes sparse and noisy local measurements of the dynamic field. Each sensor aims to obtain unbiased distributed estimates of the entire field with bounded mean-squared error (MSE) based on its own local observations and its neighbors' estimates. This work develops three novel distributed estimators: Pseudo-Innovations Kalman Filter (PIKF), Distributed Information Kalman Filter (DIKF) and Consensus+Innovations Kalman Filter (CIKF). We design the gain matrices such that the estimators achieve unbiased estimates with bounded MSE under minimal assumptions on the local observation and network communication models. This work establishes trade-offs between these three distributed estimators and demonstrates how they outperform existing solutions. We validate our results through extensive numerical evaluations.
- Jan 10 2017 cs.NA arXiv:1701.01780v1 Design of filters for graph signal processing benefits from knowledge of the spectral decomposition of matrices that encode graphs, such as the adjacency matrix and the Laplacian matrix, used to define the shift operator. For shift matrices with real eigenvalues, which arise for symmetric graphs, the empirical spectral distribution captures the eigenvalue locations. Under realistic circumstances, stochastic influences often affect the network structure and, consequently, the shift matrix empirical spectral distribution. Nevertheless, deterministic functions may often be found to approximate the asymptotic behavior of empirical spectral distributions of random matrices. This paper uses stochastic canonical equation methods developed by Girko to derive such deterministic equivalent distributions for the empirical spectral distributions of random graphs formed by structured, non-uniform percolation of a D-dimensional lattice supergraph. Included simulations demonstrate the results for sample parameters.
- We introduce the task of Visual Dialog, which requires an AI agent to hold a meaningful dialog with humans in natural, conversational language about visual content. Specifically, given an image, a dialog history, and a question about the image, the agent has to ground the question in the image, infer context from history, and answer the question accurately. Visual Dialog is disentangled enough from a specific downstream task so as to serve as a general test of machine intelligence, while being grounded in vision enough to allow objective evaluation of individual responses and benchmark progress. We develop a novel two-person chat data-collection protocol to curate a large-scale Visual Dialog dataset (VisDial). Data collection is underway and, on completion, VisDial will contain 1 dialog with 10 question-answer pairs on all ~200k images from COCO, for a total of 2M dialog question-answer pairs. We introduce a family of neural encoder-decoder models for Visual Dialog with 3 encoders -- Late Fusion, Hierarchical Recurrent Encoder and Memory Network -- and 2 decoders (generative and discriminative), which outperform a number of sophisticated baselines. We propose a retrieval-based evaluation protocol for Visual Dialog where the AI agent is asked to sort a set of candidate answers and is evaluated on metrics such as the mean reciprocal rank of the human response. We quantify the gap between machine and human performance on the Visual Dialog task via human studies. Our dataset, code, and trained models will be released publicly. Putting it all together, we demonstrate the first 'visual chatbot'!
- In graph signal processing, the graph adjacency matrix or the graph Laplacian commonly define the shift operator. The spectral decomposition of the shift operator plays an important role in that the eigenvalues represent frequencies and the eigenvectors provide a spectral basis. This is useful, for example, in the design of filters. However, the graph or network may be uncertain due to stochastic influences in construction and maintenance, and, under such conditions, the eigenvalues of the shift matrix become random variables. This paper examines the spectral distribution of the eigenvalues of random networks formed by including each link of a D-dimensional lattice supergraph independently with identical probability, a percolation model. Using the stochastic canonical equation methods developed by Girko for symmetric matrices with independent upper triangular entries, a deterministic distribution is found that asymptotically approximates the empirical spectral distribution of the scaled adjacency matrix for a model with arbitrary parameters. The main results characterize the form of the solution to an important system of equations that leads to this deterministic distribution function and significantly reduce the number of equations that must be solved to find the solution for a given set of model parameters. Simulations comparing the expected empirical spectral distributions and the computed deterministic distributions are provided for sample parameters.
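A minimal sketch of the percolation model studied in this entry: build a randomly percolated 2-D lattice and histogram the empirical spectral distribution of its adjacency matrix. The parameters, and the omission of the paper's scaling and deterministic equivalent, are arbitrary choices for this toy example:

```python
import numpy as np

# Toy empirical spectral distribution (ESD) of a bond-percolated 2-D
# lattice: each edge of the supergraph is kept independently with
# probability p, as in the percolation model described above.

def percolated_lattice_eigs(n, p, rng):
    """Eigenvalues of the adjacency matrix of an n-by-n grid whose
    edges are retained independently with probability p."""
    N = n * n
    A = np.zeros((N, N))
    idx = lambda i, j: i * n + j
    for i in range(n):
        for j in range(n):
            for di, dj in ((0, 1), (1, 0)):        # right and down neighbors
                i2, j2 = i + di, j + dj
                if i2 < n and j2 < n and rng.random() < p:
                    A[idx(i, j), idx(i2, j2)] = 1
                    A[idx(i2, j2), idx(i, j)] = 1  # undirected edge
    return np.linalg.eigvalsh(A)

rng = np.random.default_rng(0)
eigs = percolated_lattice_eigs(20, 0.6, rng)
# The histogram approximates the ESD that a deterministic equivalent
# distribution would be compared against.
hist, edges = np.histogram(eigs, bins=30, density=True)
```

The paper's contribution is the deterministic function approximating this histogram asymptotically; the simulation side of its comparison looks like the above.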
- In networks such as the smart grid, communication networks, and social networks, local measurements/observations are scattered over a wide geographical area. Centralized inference algorithms are based on gathering all the observations at a central processing unit. However, with data explosion and ever-increasing network sizes, centralized inference suffers from large communication overhead, a heavy computation burden at the center, and susceptibility to central node failure. This paper considers inference over networks using factor graphs and a distributed inference algorithm based on Gaussian belief propagation. The distributed inference involves only local computation of the information matrix and of the mean vector, and message passing between neighbors. We discover and show analytically that the message information matrix converges exponentially fast to a unique positive definite limit matrix for arbitrary positive semidefinite initialization. We provide the necessary and sufficient convergence condition for the belief mean vector to converge to the optimal centralized estimator. An easily verifiable sufficient convergence condition on the topology of a factor graph is further provided.
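The Gaussian belief propagation iteration this entry analyzes can be sketched in its generic textbook form for a pairwise Gaussian model (solving $Jx = h$ by local message passing); this is the standard scheme, not the paper's exact algorithm, and the stopping rule is illustrative:

```python
import numpy as np

# Gaussian belief propagation (GaBP): solve J x = h by passing
# (precision, precision-mean) messages between neighboring variables.

def gabp(J, h, iters=50):
    n = len(h)
    alpha = np.zeros((n, n))   # precision of message i -> j
    beta = np.zeros((n, n))    # precision-mean of message i -> j
    nbrs = [[j for j in range(n) if j != i and J[i, j] != 0] for i in range(n)]
    for _ in range(iters):
        for i in range(n):
            for j in nbrs[i]:
                # Aggregate everything node i knows except j's message.
                a = J[i, i] + sum(alpha[k, i] for k in nbrs[i] if k != j)
                b = h[i] + sum(beta[k, i] for k in nbrs[i] if k != j)
                alpha[i, j] = -J[i, j] ** 2 / a   # message information
                beta[i, j] = -J[i, j] * b / a
    prec = np.array([J[i, i] + sum(alpha[k, i] for k in nbrs[i]) for i in range(n)])
    mean = np.array([(h[i] + sum(beta[k, i] for k in nbrs[i])) / prec[i] for i in range(n)])
    return mean, prec

# A diagonally dominant J guarantees convergence; GaBP means are then exact.
J = np.array([[4., 1., 0.], [1., 4., 1.], [0., 1., 4.]])
h = np.array([1., 2., 3.])
mean, prec = gabp(J, h)
assert np.allclose(mean, np.linalg.solve(J, h))
```

The `alpha` matrix plays the role of the message information matrix whose convergence to a positive definite limit the paper establishes.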
- This paper studies an attacker against a cyber-physical system (CPS) whose goal is to move the state of a CPS to a target state while ensuring that his or her probability of being detected does not exceed a given bound. The attacker's probability of being detected is related to the nonnegative bias induced by his or her attack on the CPS' detection statistic. We formulate a linear quadratic cost function that captures the attacker's control goal and establish constraints on the induced bias that reflect the attacker's detection-avoidance objectives. When the attacker is constrained to be detected at the false-alarm rate of the detector, we show that the optimal attack strategy reduces to a linear feedback of the attacker's state estimate. In the case that the attacker's bias is upper bounded by a positive constant, we provide two algorithms -- an optimal algorithm and a sub-optimal, less computationally intensive algorithm -- to find suitable attack sequences. Finally, we illustrate our attack strategies in numerical examples based on a remotely-controlled helicopter under attack.
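To illustrate the flavor of the linear-feedback result, here is a hedged sketch of computing a linear-quadratic feedback gain by Riccati iteration; the matrices A, B, Q, R are illustrative stand-ins, not the paper's CPS or helicopter model:

```python
import numpy as np

# Schematic only: the optimal attack (at the detector's false-alarm
# rate) reduces to a linear feedback u = -K xhat of the attacker's
# state estimate. A gain of this linear-quadratic type can be found
# by iterating the discrete-time Riccati recursion to a fixed point.

def lq_feedback_gain(A, B, Q, R, iters=500):
    P = Q.copy()
    for _ in range(iters):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)      # Riccati update
    return K

A = np.array([[1.0, 0.1], [0.0, 1.0]])     # illustrative dynamics
B = np.array([[0.0], [0.1]])
Q = np.eye(2)                              # illustrative cost weights
R = np.array([[1.0]])
K = lq_feedback_gain(A, B, Q, R)
# The closed loop A - B K is stable: spectral radius below one.
assert max(abs(np.linalg.eigvals(A - B @ K))) < 1.0
```

The paper's actual gain additionally encodes the detection-avoidance constraint; this sketch only shows the unconstrained linear-quadratic mechanics.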
- Jul 06 2016 cs.SI arXiv:1607.01100v2 Motivated by the need to extract meaning from large amounts of complex structured data, we consider three critical problems on graphs: localization, decomposition, and dictionary learning of piecewise-constant signals. These graph-based problems are related to many real-world applications, such as localizing stimulus in brain connectivity networks, and mining traffic events in city street networks, where the key issue is to find the supports of localized activated patterns. Counterparts of these problems in classical signal/image processing, such as impulse detection and foreground detection, have been studied over the past few decades. We use piecewise-constant graph signals to model localized patterns, where each piece indicates a localized pattern that exhibits homogeneous internal behavior and the number of pieces indicates the number of localized patterns. For such signals, we show that decomposition and dictionary learning are natural extensions of localization, the goal of which is not only to efficiently approximate graph signals, but also to accurately find supports of localized patterns. For each of the three problems, i.e., localization, decomposition, and dictionary learning, we propose a specific graph signal model, an optimization problem, and a computationally efficient solver. The proposed solvers directly find the supports of arbitrary localized activated patterns without tuning any thresholds. We then conduct an extensive empirical study to validate the proposed methods on both simulated and real data including the analysis of a large volume of spatio-temporal Manhattan urban data. The analysis validates the effectiveness of the approach and suggests that graph signal processing tools may aid in urban planning and traffic forecasting.
- In this paper, we address the distributed filtering and prediction of time-varying random fields represented by linear time-invariant (LTI) dynamical systems. The field is observed by a sparsely connected network of agents/sensors collaborating among themselves. We develop a Kalman-filter-type consensus+innovations distributed linear estimator of the dynamic field, termed the Consensus+Innovations Kalman Filter. We analyze the convergence properties of this distributed estimator. We prove that the mean-squared error of the estimator asymptotically converges if the degree of instability of the field dynamics is within a pre-specified threshold, defined as the tracking capacity of the estimator. The tracking capacity is a function of the local observation models and the agent communication network. We design the optimal consensus and innovation gain matrices yielding distributed estimates with minimized mean-squared error. Through numerical evaluations, we show that the distributed estimator with optimal gains converges faster and with approximately 3 dB better mean-squared error performance than previous distributed estimators.
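A toy, static special case of the consensus+innovations idea behind these estimators can be sketched as follows. The gains are constant and hand-picked, the 3-agent network and observation matrices are hypothetical, and the field is static and noiseless; the paper's setting is dynamic and its gains are optimized:

```python
import numpy as np

# Consensus+innovations sketch: each agent combines a local innovation
# (its own measurement residual) with a consensus pull toward its
# neighbors' estimates. No single agent can observe theta alone, but
# the network is globally observable.

theta = np.array([1.0, -2.0])              # unknown field
H = [np.array([[1.0, 0.0]]),               # agent 0 sees component 1
     np.array([[0.0, 1.0]]),               # agent 1 sees component 2
     np.array([[1.0, 1.0]])]               # agent 2 sees the sum
nbrs = {0: [1, 2], 1: [0, 2], 2: [0, 1]}   # complete 3-agent network
a_gain, b_gain = 0.1, 0.2                  # innovation / consensus gains

x = [np.zeros(2) for _ in range(3)]        # local estimates
for _ in range(3000):
    y = [H[i] @ theta for i in range(3)]   # noiseless local observations
    x_new = []
    for i in range(3):
        innov = H[i].T @ (y[i] - H[i] @ x[i])       # local innovation
        cons = sum(x[j] - x[i] for j in nbrs[i])    # neighbor pull
        x_new.append(x[i] + a_gain * innov.ravel() + b_gain * cons)
    x = x_new

# All agents recover the full field despite partial local observability.
assert all(np.allclose(xi, theta, atol=1e-3) for xi in x)
```

With small constant gains the stacked error dynamics are a contraction whenever the network is connected and the stacked observation model is globally observable, which is the static analogue of the tracking-capacity condition above.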
- Feb 08 2016 cs.NI arXiv:1602.02144v3 Future wireless networks need to offer orders of magnitude more capacity to address the predicted growth in mobile traffic demand. To enhance the capacity of cellular networks, operators are increasingly using WiFi to offload traffic from their core networks. This paper deals with the efficient and flexible management of a heterogeneous networking environment offering wireless access to multimode terminals. This wireless access is evaluated under disruptive usage scenarios, such as flash crowds, which can cause severe congestion on a specific operator network while the remaining available capacity from other access technologies goes unused. To address these issues, we propose a scalable network-assisted distributed solution that is administered by centralized policies and an embedded reputation system, by which initially selfish operators are encouraged to cooperate under the threat of churn. After detecting a congested technology, including within its wired backhaul, our solution automatically offloads and balances the flows among the access resources of all the existing technologies, following quality metrics. Our results show that the smart integration of access networks can yield additional wireless quality for mobile flows of up to thirty-eight percent beyond that feasible from the best-effort standalone operation of each wireless access technology. It is also evidenced that backhaul constraints are conveniently reflected in the way flow access to the wireless media is granted. Finally, we have analyzed the sensitivity of the handover decision algorithm running in each terminal agent to consecutive flash crowds, as well as its centralized feature that controls the connection quality offered by a heterogeneous access infrastructure owned by distinct operators.
- This paper focuses on the problem of recursive nonlinear least squares parameter estimation in multi-agent networks, in which the individual agents observe sequentially over time an independent and identically distributed (i.i.d.) time-series consisting of a nonlinear function of the true but unknown parameter corrupted by noise. A distributed recursive estimator of the consensus + innovations type, namely $\mathcal{CIWNLS}$, is proposed, in which the agents update their parameter estimates at each observation sampling epoch in a collaborative way by simultaneously processing the latest locally sensed information (innovations) and the parameter estimates from other agents (consensus) in the local neighborhood conforming to a pre-specified inter-agent communication topology. Under rather weak conditions on the connectivity of the inter-agent communication and a global observability criterion, it is shown that at every network agent, the proposed algorithm leads to consistent parameter estimates. Furthermore, under standard smoothness assumptions on the local observation functions, the distributed estimator is shown to yield order-optimal convergence rates, i.e., as far as the order of pathwise convergence is concerned, the local parameter estimates at each agent are as good as the optimal centralized nonlinear least squares estimator, which would require access to all the observations across all the agents at all times. In order to benchmark the performance of the proposed distributed $\mathcal{CIWNLS}$ estimator with that of the centralized nonlinear least squares estimator, the asymptotic normality of the estimate sequence is established and the asymptotic covariance of the distributed estimator is evaluated. Finally, simulation results are presented which illustrate and verify the analytical findings.
- Jan 26 2016 cs.NI arXiv:1601.06202v1 Traffic characteristics such as real-time, location-based, and community-inspired usage, together with the exponential increase in data traffic in mobile networks, are challenging academia and the standardization communities to manage these networks in completely novel and intelligent ways; otherwise, current network infrastructures cannot offer a connection service of acceptable quality for both the emergent traffic demand and application requirements. A very relevant research problem, therefore, is how a heterogeneous wireless access infrastructure should be controlled to offer network access with a proper level of quality to diverse flows terminating at multi-mode devices in mobile scenarios. The current chapter reviews recent research and standardization work on the most widely used wireless access technologies and mobile access proposals. It comprehensively outlines the impact of deploying those technologies in future networking environments, not only on network performance but also on how the most important requirements of several relevant players, such as content providers, network operators, and users/terminals, can be addressed. Finally, the chapter concludes with the most notable aspects of how the environment of future networks is expected to evolve, such as technology convergence, service convergence, terminal convergence, market convergence, environmental awareness, energy efficiency, and self-organized and intelligent infrastructure, as well as the most important functional requirements to be addressed by that infrastructure, such as flow mobility, data offloading, load balancing, and vertical multihoming.
- Jan 26 2016 cs.NI arXiv:1601.06203v1 This chapter details how Big Data can be used and implemented in networking and computing infrastructures. Specifically, it addresses three main aspects: the timely extraction of relevant knowledge from heterogeneous, and very often unstructured, large data sources; the enhancement of the performance of processing and networking (cloud) infrastructures, which are the most important foundational pillars of Big Data applications or services; and novel ways to efficiently manage network infrastructures with high-level composed policies for supporting the transmission of large amounts of data with distinct requirements (video vs. non-video). A case study involving an intelligent management solution to route data traffic with diverse requirements in a wide area Internet Exchange Point is presented, discussed in the context of Big Data, and evaluated.
- Jan 26 2016 cs.CR arXiv:1601.06206v2 This chapter reviews the most important aspects of how computing infrastructures should be configured and intelligently managed to fulfill the most notable security requirements of Big Data applications. One of them is privacy. It is a pertinent aspect to address because users share more and more personal data and content through their devices and computers with social networks and public clouds. A secure framework for social networks is therefore a very hot research topic. This topic is addressed in one of the two sections of the current chapter, with case studies. In addition, traditional mechanisms to support security, such as firewalls and demilitarized zones, are not suitable for computing systems that support Big Data. SDN is an emergent management solution that could become a convenient mechanism to implement security in Big Data systems, as we show through a second case study at the end of the chapter. The chapter also discusses current relevant work and identifies open issues.
- Jan 21 2016 cs.NI arXiv:1601.05329v2 Cloud Computing offers virtualized computing, storage, and networking resources, over the Internet, to organizations and individual users in a completely dynamic way. These cloud resources are cheaper, easier to manage, and more elastic than sets of local, physical ones. This encourages customers to outsource their applications and services to the cloud. The migration of both data and applications outside the administrative domain of customers into a shared environment imposes transversal, functional problems across distinct platforms and technologies. This article provides a contemporary discussion of the most relevant functional problems associated with the current evolution of Cloud Computing, mainly from the network perspective. The paper also gives a concise description of Cloud Computing concepts and technologies. It starts with a brief history of cloud computing, tracing its roots. Then, architectural models of cloud services are described, and the most relevant products for Cloud Computing are briefly discussed along with a comprehensive literature review. The paper highlights and analyzes the most pertinent and practical network issues of relevance to the provision of high-assurance cloud services through the Internet, including security. Finally, trends and future research directions are also presented.
- Jan 13 2016 physics.soc-ph cs.SI arXiv:1601.02923v1 This paper considers the dynamics of edges in a network. The Dynamic Bond Percolation (DBP) process models, through stochastic local rules, the dependence of an edge $(a,b)$ in a network on the states of its neighboring edges. Unlike previous models, DBP does not assume statistical independence between different edges. In applications, this means for example that failures of transmission lines in a power grid are not statistically independent, or alternatively, relationships between individuals (dyads) can lead to changes in other dyads in a social network. We consider the time evolution of the probability distribution of the network state, the collective states of all the edges (bonds), and show that it converges to a stationary distribution. We use this distribution to study the emergence of global behaviors like consensus (i.e., catastrophic failure or full recovery of the entire grid) or coexistence (i.e., some failed and some operating substructures in the grid). In particular, we show that, depending on the local dynamical rule, different network substructures, such as hub or triangle subgraphs, are more prone to failure.
- We propose a model to learn visually grounded word embeddings (vis-w2v) to capture visual notions of semantic relatedness. While word embeddings trained using text have been extremely successful, they cannot uncover notions of semantic relatedness implicit in our visual world. For instance, although "eats" and "stares at" seem unrelated in text, they share semantics visually. When people are eating something, they also tend to stare at the food. Grounding diverse relations like "eats" and "stares at" into vision remains challenging, despite recent progress in vision. We note that the visual grounding of words depends on semantics, and not the literal pixels. We thus use abstract scenes created from clipart to provide the visual grounding. We find that the embeddings we learn capture fine-grained, visually grounded notions of semantic relatedness. We show improvements over text-only word embeddings (word2vec) on three tasks: common-sense assertion classification, visual paraphrasing and text-based image retrieval. Our code and datasets are available online.
- Jul 15 2015 cs.CY arXiv:1507.03682v1 Researchers are looking for new virtual instruments that can improve and extend traditional forms of teaching and learning. In this paper, we present the ARG system, a virtual tool developed to support the teaching/learning process in argumentation theory, especially in the field of Law. ARG was developed based on Araucaria by Reed and Rowe and Room 5 by Ronald P. Loui, as well as on systems such as Argue!-System and ArguMed by Bart Verheij. ARG is a platform for online collaboration and applies the theory of Stephen Toulmin to produce arguments that are more concise, precise, minimally structured, and more resistant to criticism.
- Jul 03 2015 physics.soc-ph cs.SI arXiv:1507.00396v1 Propagation of contagion in networks depends on the graph topology. This paper is concerned with studying the time-asymptotic behavior of the extended contact process on static, undirected, finite-size networks. This is a contact process with nonzero exogenous infection rate (also known as the $\epsilon$-SIS, $\epsilon$ susceptible-infected-susceptible, model [1]). The only known analytical characterization of the equilibrium distribution of this process is for complete networks. For large networks with arbitrary topology, it is infeasible to numerically solve for the equilibrium distribution since it requires solving the eigenvalue-eigenvector problem of a matrix that is exponential in $N$, the size of the network. We show that, for a certain range of the network process parameters, the equilibrium distribution of the extended contact process on arbitrary, finite-size networks is well approximated by the equilibrium distribution of the scaled SIS process, which we derived in closed form in prior work. We confirm this result with numerical simulations comparing the equilibrium distribution of the extended contact process with that of a scaled SIS process. We use this approximation to decide, in polynomial time, which agents and network substructures are more susceptible to infection by the extended contact process.
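For intuition, the extended contact process on a small graph can be simulated directly with a Gillespie-style loop. All parameters and the test graph are illustrative; the paper's point is precisely that such brute-force computation does not scale, motivating the scaled SIS approximation:

```python
import numpy as np

# Gillespie-style simulation of the epsilon-SIS process: a susceptible
# node i becomes infected at rate beta * (#infected neighbors) + eps;
# an infected node heals at rate delta.

rng = np.random.default_rng(1)
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]   # small test graph
n, beta, delta, eps = 4, 0.5, 1.0, 0.05
nbrs = [[] for _ in range(n)]
for u, v in edges:
    nbrs[u].append(v)
    nbrs[v].append(u)

state = np.zeros(n, dtype=int)       # 0 susceptible, 1 infected
t, t_end = 0.0, 2000.0
time_infected = np.zeros(n)          # occupation time per node
while t < t_end:
    rates = np.where(state == 1, delta,
                     beta * np.array([state[nbrs[i]].sum() for i in range(n)]) + eps)
    total = rates.sum()
    dt = rng.exponential(1.0 / total)           # time to next event
    time_infected += state * min(dt, t_end - t)
    t += dt
    i = rng.choice(n, p=rates / total)          # which node transitions
    state[i] ^= 1                               # flip the chosen node

p_inf = time_infected / t_end        # time-average infection probabilities
```

For $N$ nodes the exact equilibrium lives on $2^N$ network states; the closed-form scaled SIS distribution replaces both the simulation and that exponential computation in the regime the paper identifies.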
- We find large deviations rates for consensus-based distributed inference for directed networks. When the topology is deterministic, we establish the large deviations principle and find exactly the corresponding rate function, equal at all nodes. We show that the dependence of the rate function on the stochastic weight matrix associated with the network is fully captured by its left eigenvector corresponding to the unit eigenvalue. Further, when the sensors' observations are Gaussian, the rate function admits a closed-form expression. Motivated by these observations, we formulate the optimal network design problem of finding the left eigenvector which achieves the highest value of the rate function, for a given target accuracy. This eigenvector therefore minimizes the time that the inference algorithm needs to reach the desired accuracy. For Gaussian observations, we show that the network design problem can be formulated as a semidefinite (convex) program, and hence can be solved efficiently. When observations are identically distributed across agents, the system exhibits an interesting property: the graph of the rate function always lies between the graphs of the rate function of an isolated node and the rate function of a fusion center that has access to all observations. We prove that this fundamental property holds even when the topology and the associated system matrices change randomly over time, with arbitrary distribution. Due to the generality of its assumptions, the latter result requires more subtle techniques than the standard large deviations tools, contributing to the general theory of large deviations.
- This paper studies the impact of side initial state information on the detectability of data deception attacks against cyber-physical systems. We assume the attack detector has access to a linear function of the initial system state that cannot be altered by an attacker. First, we provide a necessary and sufficient condition for an attack to be undetectable by any dynamic attack detector under each specific side information pattern. Second, we characterize attacks that can be sustained for arbitrarily long periods without being detected. Third, we define the zero state inducing attack, the only type of attack that remains dynamically undetectable regardless of the side initial state information available to the attack detector. Finally, we design a dynamic attack detector that detects detectable attacks.
- Many applications collect a large number of time series, for example, the financial data of companies quoted in a stock exchange, the health care data of all patients that visit the emergency room of a hospital, or the temperature sequences continuously measured by weather stations across the US. These data are often referred to as unstructured. A first task in their analysis is to derive a low dimensional representation, a graph or discrete manifold, that describes well the interrelations among the time series and their intrarelations across time. This paper presents a computationally tractable algorithm for estimating this graph that structures the data. The resulting graph is directed and weighted, possibly capturing causal relations, not just reciprocal correlations as in many existing approaches in the literature. A convergence analysis is carried out. The algorithm is demonstrated on random graph datasets and real network time series datasets, and its performance is compared to that of related methods. The adjacency matrices estimated with the new method are close to the true graph in the simulated data and consistent with prior physical knowledge in the real dataset tested.
- We consider the problem of signal recovery on graphs, where graphs model data with complex structure as signals on their nodes. Graph signal recovery implies recovery of one or multiple smooth graph signals from noisy, corrupted, or incomplete measurements. We propose a graph signal model and formulate signal recovery as a corresponding optimization problem. We provide a general solution by using the alternating direction method of multipliers. We next show how signal inpainting, matrix completion, robust principal component analysis, and anomaly detection all relate to graph signal recovery, and provide corresponding specific solutions and theoretical analysis. Finally, we validate the proposed methods on real-world recovery problems, including online blog classification, bridge condition identification, temperature estimation, recommender systems, and expert opinion combination for online blog classification.
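As a concrete instance of the recovery problems listed above, graph signal inpainting can be posed as a small quadratic program. A minimal numpy sketch, assuming a path graph, a Laplacian smoothness regularizer, and hypothetical noiseless measurements (not the paper's exact model), is:

```python
import numpy as np

# Path graph on 5 nodes: adjacency matrix and combinatorial Laplacian.
n = 5
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0
L = np.diag(A.sum(axis=1)) - A

# True smooth signal, observed only on a subset M of nodes.
x_true = np.linspace(0.0, 1.0, n)
M = np.array([0, 2, 4])            # observed node indices
t = x_true[M]                      # (noiseless) measurements

# Inpainting: min_x ||x_M - t||^2 + alpha * x^T L x.
# Stationarity gives (S^T S + alpha L) x = S^T t, with S the sampling matrix.
S = np.zeros((len(M), n))
S[np.arange(len(M)), M] = 1.0
alpha = 0.1
x_hat = np.linalg.solve(S.T @ S + alpha * L, S.T @ t)
print(np.round(x_hat, 3))          # close to the true smooth signal
```

The unobserved nodes are filled in by the smoothness prior; smaller `alpha` trusts the measurements more, larger `alpha` smooths harder.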
- Nov 03 2014 cs.CR arXiv:1410.8747v1 Botnets represent a global problem and are responsible for causing large financial and operational damage to their victims. They are implemented with evasion in mind, and aim at hiding their architecture and authors, making them difficult to detect in general. These kinds of networks are mainly used for identity theft, virtual extortion, spam campaigns and malware dissemination. Botnets have a great potential in warfare and terrorist activities, making it of utmost importance to take action against them. We present CONDENSER, a method for identifying data generated by botnet activity. We start by selecting the appropriate features from several data feeds, namely DNS non-existent domain responses and live communication packets directed to command and control servers that we previously sinkholed. Using machine learning algorithms and a graph-based representation of the data then allows one to identify botnet activity, flag anomalous traffic, quickly detect new botnets, and improve the tracking of known botnets. Our main contributions are threefold: first, the use of a machine learning classifier for classifying domain names as being generated by domain generation algorithms (DGA); second, a clustering algorithm using the set of selected features that groups network communication with similar patterns; third, a graph based knowledge representation framework where we store processed data, allowing us to perform queries.
- Oct 09 2014 cs.SI physics.soc-ph arXiv:1410.2196v2 In previous work, we developed the scaled SIS process, which models the dynamics of SIS epidemics over networks. With the scaled SIS process, we can consider networks that are finite-sized and of arbitrary topology (i.e., we are not restricted to specific classes of networks). We derived for the scaled SIS process a closed-form expression for the time-asymptotic probability distribution of the states of all the agents in the network. This closed-form solution of the equilibrium distribution explicitly exhibits the underlying network topology through its adjacency matrix. This paper determines which network configuration is the most probable. We prove that, for a range of epidemics parameters, this combinatorial problem leads to a submodular optimization problem, which is exactly solvable in polynomial time. We relate the most-probable configuration to the network structure, in particular, to the existence of high density subgraphs. Depending on the epidemics parameters, subsets of agents may be more likely to be infected than others; these more-vulnerable agents form subgraphs that are denser than the overall network. We illustrate our results with a 193 node social network and the 4941 node Western US power grid under different epidemics parameters.
- Even though modularity has been studied extensively in conventional logic programming, there are few approaches on how to incorporate modularity into Answer Set Programming, a prominent rule-based declarative programming paradigm. A major approach is Oikarinen and Janhunen's Gaifman-Shapiro-style architecture of program modules, which provides the composition of program modules. Their module theorem properly strengthens Lifschitz and Turner's splitting set theorem for normal logic programs. However, this approach is limited by module conditions that are imposed in order to ensure the compatibility of their module system with the stable model semantics, namely forcing output signatures of composing modules to be disjoint and disallowing positive cyclic dependencies between different modules. These conditions turn out to be too restrictive in practice, and in this paper we discuss alternative ways of lifting both restrictions independently, effectively solving the first and widening the applicability of this framework and the scope of the module theorem.
- This paper studies the convergence of the estimation error process and the characterization of the corresponding invariant measure in distributed Kalman filtering for potentially unstable and large linear dynamic systems. A gossip network protocol termed Modified Gossip Interactive Kalman Filtering (M-GIKF) is proposed, where sensors exchange their filtered states (estimates and error covariances) and propagate their observations via inter-sensor communications of rate $\overline{\gamma}$; $\overline{\gamma}$ is defined as the averaged number of inter-sensor message passages per signal evolution epoch. The filtered states are interpreted as stochastic particles swapped through local interaction. The paper shows that the conditional estimation error covariance sequence at each sensor under M-GIKF evolves as a random Riccati equation (RRE) with Markov modulated switching. By formulating the RRE as a random dynamical system, it is shown that the network achieves weak consensus, i.e., the conditional estimation error covariance at a randomly selected sensor converges weakly (in distribution) to a unique invariant measure. Further, it is proved that as $\overline{\gamma} \rightarrow \infty$ this invariant measure satisfies the Large Deviation (LD) upper and lower bounds, implying that this measure converges exponentially fast (in probability) to the Dirac measure $\delta_{P^*}$, where $P^*$ is the stable error covariance of the centralized (Kalman) filtering setup. The LD results answer a fundamental question on how to quantify the rate at which the distributed scheme approaches the centralized performance as the inter-sensor communication rate increases.
- We consider distributed optimization in random networks where N nodes cooperatively minimize the sum $\sum_{i=1}^N f_i(x)$ of their individual convex costs. Existing literature proposes distributed gradient-like methods that are computationally cheap and resilient to link failures, but have slow convergence rates. In this paper, we propose accelerated distributed gradient methods that: 1) are resilient to link failures; 2) are computationally cheap; and 3) improve convergence rates over other gradient methods. We model the network by a sequence of independent, identically distributed random matrices W(k) drawn from the set of symmetric, stochastic matrices with positive diagonals. The network is connected on average and the cost functions are convex, differentiable, with Lipschitz continuous and bounded gradients. We design two distributed Nesterov-like gradient methods that modify the D-NG and D-NC methods that we proposed for static networks. We prove their convergence rates in terms of the expected optimality gap at the cost function. Let $k$ and $K$ be the number of per-node gradient evaluations and per-node communications, respectively. Then the modified D-NG achieves rates $O(\log k/k)$ and $O(\log K/K)$, and the modified D-NC rates $O(1/k^2)$ and $O(1/K^{2-\xi})$, where $\xi>0$ is arbitrarily small. For comparison, the standard distributed gradient method cannot do better than $\Omega(1/k^{2/3})$ and $\Omega(1/K^{2/3})$ on the same class of cost functions (even for static networks). Simulation examples illustrate our analytical findings.
- We study distributed optimization where nodes cooperatively minimize the sum of their individual, locally known, convex costs $f_i(x)$, where $x \in {\mathbb R}^d$ is a global variable. Distributed augmented Lagrangian (AL) methods have good empirical performance on several signal processing and learning applications, but there is limited understanding of their convergence rates and how these depend on the underlying network. This paper establishes globally linear (geometric) convergence rates of a class of deterministic and randomized distributed AL methods, when the $f_i$'s are twice continuously differentiable and have a bounded Hessian. We give explicit dependence of the convergence rates on the underlying network parameters. Simulations illustrate our analytical findings.
- Signals and datasets that arise in physical and engineering applications, as well as social, genetics, biomolecular, and many other domains, are becoming increasingly larger and more complex. In contrast to traditional time and image signals, data in these domains are supported by arbitrary graphs. Signal processing on graphs extends concepts and techniques from traditional signal processing to data indexed by generic graphs. This paper studies the concepts of low and high frequencies on graphs, and low-, high-, and band-pass graph filters. In traditional signal processing, these concepts are easily defined because of a natural frequency ordering that has a physical interpretation. For signals residing on graphs, in general, there is no obvious frequency ordering. We propose a definition of total variation for graph signals that naturally leads to a frequency ordering on graphs and defines low-, high-, and band-pass graph signals and filters. We study the design of graph filters with specified frequency response, and illustrate our approach with applications to sensor malfunction detection and data classification.
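A sketch of the total-variation idea, assuming the common normalization of the graph shift by its largest-magnitude eigenvalue (the paper's exact definition may differ): a constant signal on a cycle has zero variation, while a sign-alternating signal varies maximally.

```python
import numpy as np

def graph_total_variation(s, A):
    """Total variation of graph signal s w.r.t. the adjacency (shift) matrix A,
    normalized by the largest-magnitude eigenvalue so the shift is non-expanding."""
    lam_max = np.max(np.abs(np.linalg.eigvals(A)))
    return np.sum(np.abs(s - A @ s / lam_max))

# Cycle graph on 6 nodes.
n = 6
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[i, (i - 1) % n] = 1.0

s_const = np.ones(n)                      # "low frequency": constant signal
s_alt = np.array([1.0, -1.0] * (n // 2))  # "high frequency": sign-alternating

print(graph_total_variation(s_const, A))  # ~0: the constant signal is smoothest
print(graph_total_variation(s_alt, A))    # large: the alternating signal varies maximally
```

Sorting eigenvectors of the shift by this quantity is one way to obtain the low-to-high frequency ordering the abstract describes.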
- Single virus epidemics over complete networks are widely explored in the literature as the fraction of infected nodes is, under appropriate microscopic modeling of the virus infection, a Markov process. With non-complete networks, this macroscopic variable is no longer Markov. In this paper, we study virus diffusion, in particular, multi-virus epidemics, over non-complete stochastic networks. We focus on multipartite networks. In companion work, http://arxiv.org/abs/1306.6198, we show that the peer-to-peer local random rules of virus infection lead, in the limit of large multipartite networks, to the emergence of structured dynamics at the macroscale. The exact fluid limit evolution of the fraction of nodes infected by each virus strain across islands obeys a set of nonlinear coupled differential equations, see http://arxiv.org/abs/1306.6198. In this paper, we develop methods to analyze the qualitative behavior of these limiting dynamics, establishing conditions on the virus micro characteristics and network structure under which a virus persists or a natural selection phenomenon is observed.
- Jun 27 2013 cs.SI physics.soc-ph arXiv:1306.6198v1 The study of epidemics in large complete networks is well established. In contrast, we consider epidemics in non-complete networks. We establish the fluid limit macroscopic dynamics of a multi-virus spread over a multipartite network as the number of nodes at each partite or island grows large. The virus spread follows a peer-to-peer random rule of infection in line with the Harris contact process. The model conforms to an SIS (susceptible-infected-susceptible) type, where a node is either infected or it is healthy and prone to be infected. The local (at node level) random infection model induces the emergence of structured dynamics at the macroscale. Namely, we prove that, as the multipartite network grows large, the normalized Markov jump vector process $\left(\bar{\mathbf{Y}}^\mathbf{N}(t)\right) = \left(\bar{Y}_1^\mathbf{N}(t),\ldots, \bar{Y}_M^\mathbf{N}(t)\right)$ collecting the fraction of infected nodes at each island $i=1,\ldots,M$, converges weakly (with respect to the Skorokhod topology on the space of càdlàg sample paths) to the solution of an $M$-dimensional vector nonlinear coupled ordinary differential equation. In the case of multi-virus diffusion with $K\in\mathbb{N}$ distinct strains of virus, the Markov jump matrix process $\left(\bar{\mathbf{Y}}^\mathbf{N}(t)\right)$, stacking the fraction of nodes infected with virus type $j$, $j=1,\ldots,K$, at each island $i=1,\ldots,M$, converges weakly as well to the solution of a $\left(K\times M\right)$-dimensional vector differential equation that is also characterized.
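To illustrate the flavor of such fluid limits, here is a sketch integrating a single-virus, single-population SIS equation with forward Euler (hypothetical infection and healing rates; the paper's system is a coupled $M$-dimensional multipartite version):

```python
import numpy as np

# Fluid-limit SIS dynamics for the fraction y(t) of infected nodes:
#   dy/dt = beta * y * (1 - y) - delta * y
# beta = infection rate, delta = healing rate (hypothetical values).
beta, delta = 2.0, 1.0
y, dt = 0.01, 1e-3
for _ in range(20000):                 # forward-Euler integration to t = 20
    y += dt * (beta * y * (1 - y) - delta * y)

# For beta > delta the infection persists at the endemic equilibrium
# y* = 1 - delta/beta = 0.5; for beta <= delta it dies out (y* = 0).
print(round(y, 4))
```

The persistence-versus-extinction dichotomy visible in this scalar case is the kind of qualitative behavior the paper characterizes for the full multipartite, multi-virus dynamics.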
- The paper studies the problem of distributed parameter estimation in multi-agent networks with exponential family observation statistics. A certainty-equivalence type distributed estimator of the consensus + innovations form is proposed in which, at each observation sampling epoch, agents update their local parameter estimates by appropriately combining the data received from their neighbors and the locally sensed new information (innovation). Under global observability of the networked sensing model, i.e., the ability to distinguish between different instances of the parameter value based on the joint observation statistics, and mean connectivity of the inter-agent communication network, the proposed estimator is shown to yield consistent parameter estimates at each network agent. Further, it is shown that the distributed estimator is asymptotically efficient, in that the asymptotic covariances of the agent estimates coincide with that of the optimal centralized estimator, i.e., the inverse of the centralized Fisher information rate. From a technical viewpoint, the proposed distributed estimator leads to non-Markovian mixed timescale stochastic recursions and the analytical methods developed in the paper contribute to the general theory of distributed stochastic approximation.
- Oct 18 2012 cs.SI physics.soc-ph arXiv:1210.4752v2 In social settings, individuals interact through webs of relationships. Each individual is a node in a complex network (or graph) of interdependencies and generates data, lots of data. We label the data by its source, or formally stated, we index the data by the nodes of the graph. The resulting signals (data indexed by the nodes) are far removed from time or image signals indexed by well ordered time samples or pixels. DSP, discrete signal processing, provides a comprehensive, elegant, and efficient methodology to describe, represent, transform, analyze, process, or synthesize these well ordered time or image signals. This paper extends DSP and its basic tenets to signals on graphs, including filters, convolution, z-transform, impulse response, spectral representation, Fourier transform, and frequency response, and illustrates DSP on graphs by classifying blogs, linearly predicting and compressing data from irregularly located weather stations, and predicting behavior of customers of a mobile service provider.
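The core idea, filters as polynomials in the graph shift (adjacency) matrix, can be sketched as follows; on a directed cycle the graph shift reduces to the classical time delay, so the example recovers an ordinary moving average:

```python
import numpy as np

# In DSP on graphs, the adjacency matrix A plays the role of the shift
# (the graph analogue of the time delay z^{-1}), and a linear
# shift-invariant filter is a polynomial in A:
#   y = h(A) s = h0*s + h1*A s + h2*A^2 s + ...
def graph_filter(s, A, taps):
    y = np.zeros_like(s)
    Ak_s = s.copy()
    for h in taps:
        y += h * Ak_s        # add tap h_k * (A^k s)
        Ak_s = A @ Ak_s
    return y

# Directed cycle: A s shifts s cyclically by one position, so this
# filter is an ordinary moving average on periodic time signals.
n = 4
A = np.roll(np.eye(n), 1, axis=0)
s = np.array([1.0, 0.0, 0.0, 0.0])
y = graph_filter(s, A, taps=[0.5, 0.5])  # two-tap moving average
print(y)
```

For a generic graph the same polynomial-in-A construction applies unchanged; only the shift matrix differs.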
- The paper considers a class of multi-agent Markov decision processes (MDPs), in which the network agents respond differently (as manifested by the instantaneous one-stage random costs) to a global controlled state and the control actions of a remote controller. The paper investigates a distributed reinforcement learning setup with no prior information on the global state transition and local agent cost statistics. Specifically, with the agents' objective consisting of minimizing a network-averaged infinite horizon discounted cost, the paper proposes a distributed version of $Q$-learning, $\mathcal{QD}$-learning, in which the network agents collaborate by means of local processing and mutual information exchange over a sparse (possibly stochastic) communication network to achieve the network goal. Under the assumption that each agent is only aware of its local online cost data and the inter-agent communication network is weakly connected, the proposed distributed scheme is shown to yield, almost surely (a.s.) and asymptotically, the desired value function and the optimal stationary control policy at each network agent. The analytical techniques developed in the paper to address the mixed time-scale stochastic dynamics of the consensus + innovations form, which arise as a result of the proposed interactive distributed scheme, are of independent interest.
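A toy sketch of a QD-learning-style update: two agents, a single state, two actions, deterministic costs, and hypothetical step-size sequences with the consensus weight decaying slower than the innovation weight. This is an illustration of the consensus + innovations structure, not the paper's exact scheme.

```python
import numpy as np

gamma = 0.5
costs = np.array([[1.0, 2.0],    # agent 0's one-stage costs for actions 0, 1
                  [2.0, 4.0]])   # agent 1's costs
Q = np.zeros((2, 2))             # Q[i, a]: agent i's Q-value for action a

for k in range(100000):
    alpha = 1.0 / (k + 1)            # innovation step size
    beta = 0.5 / (k + 1) ** 0.6      # consensus step size (decays slower)
    Q_new = Q.copy()
    for i in range(2):
        j = 1 - i                    # the other agent is the only neighbor
        Q_new[i] += beta * (Q[j] - Q[i])                         # consensus pull
        Q_new[i] += alpha * (costs[i] + gamma * Q[i].min() - Q[i])  # local Q update
    Q = Q_new

# Both agents should approach the Q* of the network-average cost (1.5, 3.0):
# Q*(a0) = 1.5 + 0.5*Q*(a0) = 3.0,  Q*(a1) = 3.0 + 0.5*3.0 = 4.5
print(np.round(Q, 2))
```

The consensus term drives the agents' Q-tables together, while the innovation term drives their common value toward the optimum of the averaged cost, mirroring the two time scales in the abstract.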
- Distributed consensus and other linear systems with system stochastic matrices $W_k$ emerge in various settings, like opinion formation in social networks, rendezvous of robots, and distributed inference in sensor networks. The matrices $W_k$ are often random, due to, e.g., random packet dropouts in wireless sensor networks. Key in analyzing the performance of such systems is studying convergence of matrix products $W_kW_{k-1}... W_1$. In this paper, we find the exact exponential rate $I$ for the convergence in probability of the product of such matrices when time $k$ grows large, under the assumption that the $W_k$'s are symmetric and independent identically distributed in time. Further, for commonly used random models, such as gossip and link failure, we show that the rate $I$ is found by solving a min-cut problem and, hence, is easily computable. Finally, we apply our results to optimally allocate the sensors' transmission power in consensus+innovations distributed detection.
- We study distributed optimization problems when $N$ nodes minimize the sum of their individual costs subject to a common vector variable. The costs are convex, have Lipschitz continuous gradient (with constant $L$), and bounded gradient. We propose two fast distributed gradient algorithms based on the centralized Nesterov gradient algorithm and establish their convergence rates in terms of the per-node communications $\mathcal{K}$ and the per-node gradient evaluations $k$. Our first method, Distributed Nesterov Gradient, achieves rates $O\left({\log \mathcal{K}}/{\mathcal{K}}\right)$ and $O\left({\log k}/{k}\right)$. Our second method, Distributed Nesterov gradient with Consensus iterations, assumes at all nodes knowledge of $L$ and $\mu(W)$ -- the second largest singular value of the $N \times N$ doubly stochastic weight matrix $W$. It achieves rates $O\left({1}/{\mathcal{K}^{2-\xi}}\right)$ and $O\left({1}/{k^2}\right)$ ($\xi>0$ arbitrarily small). Further, we give with both methods explicit dependence of the convergence constants on $N$ and $W$. Simulation examples illustrate our findings.
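A sketch of a distributed-Nesterov-style iteration of this kind on scalar quadratic costs, with hypothetical step and momentum sequences $\alpha_k = 1/(k+1)$ and $\beta_k = k/(k+3)$; this is an illustration of the structure, not the exact D-NG specification:

```python
import numpy as np

# Each node i holds f_i(x) = 0.5*(x - a_i)^2; the network optimum is mean(a) = 4.
a = np.array([1.0, 3.0, 8.0])
W = np.array([[0.5, 0.25, 0.25],   # doubly stochastic mixing matrix (triangle)
              [0.25, 0.5, 0.25],
              [0.25, 0.25, 0.5]])

x = np.zeros(3)
y = np.zeros(3)
for k in range(200000):
    grad = y - a                            # local gradients evaluated at y_i
    x_new = W @ y - (1.0 / (k + 1)) * grad  # consensus mixing + gradient step
    y = x_new + (k / (k + 3.0)) * (x_new - x)  # Nesterov momentum correction
    x = x_new

print(np.round(x, 3))  # every node approaches x* = 4
```

Each iteration costs one round of neighbor communication (the multiplication by W) and one local gradient evaluation, matching the $K$ and $k$ counts in the abstract.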
- We establish the large deviations asymptotic performance (error exponent) of consensus+innovations distributed detection over random networks with generic (non-Gaussian) sensor observations. At each time instant, sensors 1) combine their decision variables with those of their neighbors (consensus) and 2) assimilate their new observations (innovations). This paper shows for general non-Gaussian distributions that consensus+innovations distributed detection exhibits a phase transition behavior with respect to the network degree of connectivity. Above a threshold, distributed detection is as good as centralized detection, with the same optimal asymptotic detection performance, but, below the threshold, distributed detection is suboptimal with respect to centralized detection. We determine this threshold and quantify the performance loss below threshold. Finally, we show the dependence of the threshold and performance on the distribution of the observations: distributed detectors over the same random network, but with different observations' distributions, for example, Gaussian, Laplace, or quantized, may have different asymptotic performance, even when the corresponding centralized detectors have the same asymptotic performance.
- The paper considers the problem of distributed adaptive linear parameter estimation in multi-agent inference networks. Local sensing model information is only partially available at the agents and inter-agent communication is assumed to be unpredictable. The paper develops a generic mixed time-scale stochastic procedure consisting of simultaneous distributed learning and estimation, in which the agents adaptively assess their relative observation quality over time and fuse the innovations accordingly. Under rather weak assumptions on the statistical model and the inter-agent communication, it is shown that, by properly tuning the consensus potential with respect to the innovation potential, the asymptotic information rate loss incurred in the learning process may be made negligible. As such, it is shown that the agent estimates are asymptotically efficient, in that their asymptotic covariance coincides with that of a centralized estimator (the inverse of the centralized Fisher information rate for Gaussian systems) with perfect global model information and having access to all observations at all times. The proof techniques are mainly based on convergence arguments for non-Markovian mixed time scale stochastic approximation procedures. Several approximation results developed in the process are of independent interest.
- We study the large deviations performance of consensus+innovations distributed detection over noisy networks, where sensors at a time step k cooperate with immediate neighbors (consensus) and assimilate their new observations (innovation). We show that, even under noisy communication, all sensors can achieve exponential decay $e^{-k C_{\mathrm{dis}}}$ of the detection error probability, even when certain (or most) sensors cannot detect the event of interest in isolation. We achieve this by designing a single time scale stochastic approximation type distributed detector with the optimal weight sequence $\{\alpha_k\}$, by which sensors weigh their neighbors' messages. The optimal design of $\alpha_k$ balances the opposing effects of communication noise and information flow from neighbors: larger, slowly decaying $\alpha_k$ improves information flow but injects more communication noise. Further, we quantify the best achievable $C_{\mathrm{dis}}$ as a function of the sensing signal and noise, communication noise, and network connectivity. Finally, we find a threshold on the communication noise power below which a sensor that can detect the event in isolation still improves its detection by cooperation through noisy links.
- Graphical models use graphs to compactly capture stochastic dependencies amongst a collection of random variables. Inference over graphical models corresponds to finding marginal probability distributions given joint probability distributions. In general, this is computationally intractable, which has led to a quest for finding efficient approximate inference algorithms. We propose a framework for generalized inference over graphical models that can be used as a wrapper for improving the estimates of approximate inference algorithms. Instead of applying an inference algorithm to the original graph, we apply the inference algorithm to a block-graph, defined as a graph in which the nodes are non-overlapping clusters of nodes from the original graph. This results in marginal estimates of a cluster of nodes, which we further marginalize to get the marginal estimates of each node. Our proposed block-graph construction algorithm is simple, efficient, and motivated by the observation that approximate inference is more accurate on graphs with longer cycles. We present extensive numerical simulations that illustrate our block-graph framework with a variety of inference algorithms (e.g., those in the libDAI software package). These simulations show the improvements provided by our framework.
- We study the large deviations performance, i.e., the exponential decay rate of the error probability, of distributed detection algorithms over random networks. At each time step $k$ each sensor: 1) averages its decision variable with the neighbors' decision variables; and 2) accounts on-the-fly for its new observation. We show that distributed detection exhibits a "phase change" behavior. When the rate of network information flow (the speed of averaging) is above a threshold, then distributed detection is asymptotically equivalent to the optimal centralized detection, i.e., the exponential decay rate of the error probability for distributed detection equals the Chernoff information. When the rate of information flow is below a threshold, distributed detection achieves only a fraction of the Chernoff information rate; we quantify this achievable rate as a function of the network rate of information flow. Simulation examples demonstrate our theoretical findings on the behavior of distributed detection over random networks.
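The consensus-plus-innovation step can be sketched for a Gaussian shift-in-mean test (hypothetical three-sensor network and parameters): each sensor mixes its statistic with neighbors', then folds in the log-likelihood ratio of its newest observation.

```python
import numpy as np

# H1: x ~ N(m, 1) vs H0: x ~ N(0, 1); per-sample LLR is m*x - m^2/2.
rng = np.random.default_rng(0)
m = 1.0
W = np.array([[0.5, 0.5, 0.0],   # doubly stochastic weights, path graph 0-1-2
              [0.5, 0.0, 0.5],
              [0.0, 0.5, 0.5]])

z = np.zeros(3)
for k in range(2000):
    x = rng.normal(m, 1.0, size=3)       # observations generated under H1
    llr = m * x - m**2 / 2
    z = (k * (W @ z) + llr) / (k + 1)    # consensus averaging + innovation

print(np.round(z, 3))  # positive at every sensor -> all decide H1 (E[LLR] = 0.5)
```

Since W is doubly stochastic, the network average of z is the running average of all LLRs, so every sensor's statistic concentrates around the centralized one.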
- The paper considers gossip distributed estimation of a (static) distributed random field (a.k.a., large scale unknown parameter vector) observed by sparsely interconnected sensors, each of which only observes a small fraction of the field. We consider linear distributed estimators whose structure combines the information flow among sensors (the consensus term resulting from the local gossiping exchange among sensors when they are able to communicate) and the information gathering measured by the sensors (the sensing or innovations term). This leads to mixed time scale algorithms--one time scale associated with the consensus and the other with the innovations. The paper establishes a distributed observability condition (global observability plus mean connectedness) under which the distributed estimates are consistent and asymptotically normal. We introduce the distributed notion equivalent to the (centralized) Fisher information rate, which is a bound on the mean square error reduction rate of any distributed estimator; we show that, under the appropriate modeling and structural network communication conditions (gossip protocol), the distributed gossip estimator attains this distributed Fisher information rate, asymptotically achieving the performance of the optimal centralized estimator. Finally, we study the behavior of the distributed gossip estimator when the measurements fade (noise variance grows) with time; in particular, we consider the maximum rate at which the noise variance can grow while the distributed estimator remains consistent, and show that, as long as the centralized estimator is consistent, the distributed estimator is also consistent.
- We apply large deviations theory to study the asymptotic performance of running consensus distributed detection in sensor networks. Running consensus is a recently proposed stochastic approximation type algorithm. At each time step k, the state at each sensor is updated by a local averaging of the sensor's own state and the states of its neighbors (consensus) and by accounting for the new observations (innovation). We assume Gaussian, spatially correlated observations. We allow the underlying network to be time varying, provided that the graph that collects the union of links that are online at least once over a finite time window is connected. This paper shows through large deviations that, under stated assumptions on the network connectivity and sensors' observations, the running consensus detection asymptotically approaches in performance the optimal centralized detection. That is, the Bayes probability of detection error (with the running consensus detector) decays exponentially to zero as k goes to infinity at the Chernoff information rate, the best achievable rate of the asymptotically optimal centralized detector.
- Withdrawn.
- We study distributed optimization in networked systems, where nodes cooperate to find the optimal quantity of common interest, x=x^⋆. The objective function of the corresponding optimization problem is the sum of private (known only to a node) convex objectives, and each node imposes a private convex constraint on the allowed values of x. We solve this problem for generic connected network topologies with asymmetric random link failures with a novel distributed, decentralized algorithm. We refer to this algorithm as AL-G (augmented Lagrangian gossiping), and to its variants as AL-MG (augmented Lagrangian multi neighbor gossiping) and AL-BG (augmented Lagrangian broadcast gossiping). The AL-G algorithm is based on the augmented Lagrangian dual function. Dual variables are updated by the standard method of multipliers, at a slow time scale. To update the primal variables, we propose a novel, Gauss-Seidel type, randomized algorithm, at a fast time scale. AL-G uses unidirectional gossip communication, only between immediate neighbors in the network, and is resilient to random link failures. For networks with reliable communication (i.e., no failures), the simplified AL-BG (augmented Lagrangian broadcast gossiping) algorithm reduces communication, computation and data storage cost. We prove convergence for all proposed algorithms and demonstrate by simulations the effectiveness on two applications: l_1-regularized logistic regression for classification and cooperative spectrum sensing for cognitive radio networks.
- We introduce block-tree graphs as a framework for deriving efficient algorithms on graphical models. We define block-tree graphs as a tree-structured graph where each node is a cluster of nodes such that the clusters in the graph are disjoint. This differs from junction-trees, where two clusters connected by an edge always have at least one common node. When compared to junction-trees, we show that constructing block-tree graphs is faster, and finding optimal block-tree graphs has a much smaller search space. Applying our block-tree graph framework to graphical models, we show that, for some graphs, e.g., grid graphs, using block-tree graphs for inference is computationally more efficient than using junction-trees. For graphical models with boundary conditions, the block-tree graph framework transforms the boundary valued problem into an initial value problem. For Gaussian graphical models, the block-tree graph framework leads to a linear state-space representation. Since exact inference in graphical models can be computationally intractable, we propose to use spanning block-trees to derive approximate inference algorithms. Experimental results show the improved performance in using spanning block-trees versus using spanning trees for approximate estimation over Gaussian graphical models.
- The paper presents the gossip interactive Kalman filter (GIKF) for distributed Kalman filtering for networked systems and sensor networks, where inter-sensor communication and observations occur at the same time-scale. The communication among sensors is random; each sensor occasionally exchanges its filtering state information with a neighbor depending on the availability of the appropriate network link. We show that under a weak distributed detectability condition: 1. the GIKF error process remains stochastically bounded, irrespective of the instability properties of the random process dynamics; and 2. the network achieves weak consensus, i.e., the conditional estimation error covariance at a (uniformly) randomly selected sensor converges in distribution to a unique invariant measure on the space of positive semi-definite matrices (independent of the initial state). To prove these results, we interpret the filtered states (estimates and error covariances) at each node in the GIKF as stochastic particles with local interactions. We analyze the asymptotic properties of the error process by studying as a random dynamical system the associated switched (random) Riccati equation, the switching being dictated by a non-stationary Markov chain on the network graph.
- Gossip algorithms are attractive for in-network processing in sensor networks because they do not require any specialized routing, there is no bottleneck or single point of failure, and they are robust to unreliable wireless network conditions. Recently, there has been a surge of activity in the computer science, control, signal processing, and information theory communities, developing faster and more robust gossip algorithms and deriving theoretical performance guarantees. This article presents an overview of recent work in the area. We describe convergence rate results, which are related to the number of transmitted messages and thus the amount of energy consumed in the network for gossiping. We discuss issues related to gossiping over wireless links, including the effects of quantization and noise, and we illustrate the use of gossip algorithms for canonical signal processing tasks including distributed estimation, source localization, and compression.
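A minimal sketch of randomized pairwise gossip averaging, the canonical example behind these algorithms; the ring topology, node states, and iteration budget are illustrative choices:

```python
import random

def pairwise_gossip(values, edges, iters, seed=0):
    """Randomized gossip: at each tick one random edge (i, j) activates and
    both endpoints replace their states by the pairwise average, so the
    network sum (hence the average) is conserved at every step."""
    rng = random.Random(seed)
    x = list(values)
    for _ in range(iters):
        i, j = rng.choice(edges)
        x[i] = x[j] = (x[i] + x[j]) / 2
    return x

# Ring of 5 nodes: no routing tables, no fusion center, no single point
# of failure -- any link can fire in any order.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
x0 = [10.0, 0.0, 5.0, 3.0, 2.0]
x = pairwise_gossip(x0, edges, iters=2000)
mean = sum(x0) / len(x0)  # 4.0
print(max(abs(v - mean) for v in x))  # small: all states near the average
```

The number of activated edges before the states agree is exactly the message count the convergence-rate results in the article bound, which is why it doubles as an energy proxy.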
- We characterize the invariant filtering measures resulting from Kalman filtering with intermittent observations \cite{Bruno}, where the observation arrival is modeled as a Bernoulli process. In \cite{Riccati-weakconv}, it was shown that there exists a $\overline{\gamma}^{\mathrm{sb}}>0$ such that, for every observation packet arrival probability $\overline{\gamma}$ with $\overline{\gamma}>\overline{\gamma}^{\mathrm{sb}}$, the sequence of random conditional error covariance matrices converges in distribution to a unique invariant distribution $\mu^{\overline{\gamma}}$ (independent of the filter initialization). In this paper, we prove that, for controllable and observable systems, $\overline{\gamma}^{\mathrm{sb}}=0$ and that, as $\overline{\gamma}\uparrow 1$, the family $\{\mu^{\overline{\gamma}}\}_{\overline{\gamma}>0}$ of invariant distributions satisfies a moderate deviations principle (MDP) with a good rate function $I$. The rate function $I$ is explicitly identified. In particular, our results show: ...
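The random Riccati recursion underlying this setup is easy to simulate in the scalar case; the sketch below is not from the paper, and the system parameters are illustrative (an unstable system, $|a| > 1$, so the covariance diverges without observations):

```python
import random

def random_riccati(p_arrival, a=1.2, c=1.0, q=1.0, r=1.0, steps=200, seed=1):
    """Scalar prediction-error covariance recursion with Bernoulli
    observation arrivals gamma_k (1 = packet received, 0 = dropped)."""
    rng = random.Random(seed)
    P = 1.0
    for _ in range(steps):
        gamma = 1 if rng.random() < p_arrival else 0
        # Lyapunov growth every step; the Riccati correction is applied
        # only when the observation packet arrives.
        P = a * a * P + q - gamma * (a * a * c * c * P * P) / (c * c * P + r)
    return P

print(random_riccati(0.0))  # no packets ever arrive: P diverges since |a| > 1
print(random_riccati(0.9))  # frequent arrivals: P remains bounded
```

For intermediate arrival probabilities the sample path hops among values determined by the random run lengths of missed packets, which is the random sequence whose invariant distribution these papers characterize.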
- In this correspondence, we present DILAND, an algorithm for distributed sensor localization with noisy distance measurements that extends DLRE and makes it more robust. DLRE is a distributed sensor localization algorithm in $\mathbb{R}^m$ $(m\geq1)$ introduced in \cite{usman_loctsp:08}. DILAND operates when (i) the communication among the sensors is noisy; (ii) the communication links in the network may fail with a non-zero probability; and (iii) the measurements performed to compute distances among the sensors are corrupted with noise. The sensors (which do not know their locations) lie in the convex hull of at least $m+1$ anchors (nodes that know their own locations). Under minimal assumptions on the connectivity and triangulation of each sensor in the network, this correspondence shows that, under the broad random phenomena described above, DILAND converges almost surely (a.s.) to the exact sensor locations.
- The paper studies the problem of filtering a discrete-time linear system observed by a network of sensors. The sensors share a common communication medium to the estimator, and transmission is bit and power budgeted. Under the assumption of conditional Gaussianity of the signal process at the estimator (which may be ensured by observation packet acknowledgements), the conditional prediction error covariance of the optimum mean-squared error filter is shown to evolve according to a random dynamical system (RDS) on the space of non-negative definite matrices. Our RDS formalism does not depend on the particular (randomized) medium access protocol and, under a minimal distributed observability assumption, we show that the sequence of random conditional prediction error covariance matrices converges in distribution to a unique invariant distribution (independent of the initial filter state), i.e., the conditional error process is shown to be ergodic. Under broad assumptions on the medium access protocol, we show that the conditional error covariance sequence satisfies a Markov-Feller property, leading to an explicit characterization of the support of its invariant measure. The methodology adopted in this work is sufficiently general to envision its application to sample path analysis of more general hybrid or switched systems, where existing analysis is mostly moment-based.
- We consider the weight design problem for the consensus algorithm under a finite time horizon. We assume that the underlying network is random, where the links fail at each iteration with certain probability, and the link failures can be spatially correlated. We formulate a family of weight design criteria (objective functions) that minimize the n, n = 1,...,N (out of N possible), largest (slowest) eigenvalues of the matrix that describes the mean squared consensus error dynamics. We show that the objective functions are convex; hence, globally optimal weights (with respect to the design criteria) can be efficiently obtained. Numerical examples on large scale, sparse random networks with spatially correlated link failures show that: 1) weights obtained according to our criteria lead to significantly faster convergence than the choices available in the literature; 2) different design criteria, corresponding to different n, exhibit very interesting tradeoffs: faster transient performance leads to slower long-run performance and vice versa. Thus, n is a valuable degree of freedom and can be appropriately selected for the given time horizon.
- We present *telescoping* recursive representations for both continuous and discrete indexed noncausal Gauss-Markov random fields. Our recursions start at the boundary (a hypersurface in $\mathbb{R}^d$, $d \ge 1$) and telescope inwards. For example, for images, the telescoping representation reduces recursions from $d = 2$ to $d = 1$, i.e., to recursions on a single dimension. Under appropriate conditions, the recursions for the random field are linear stochastic differential/difference equations driven by white noise, for which we derive recursive estimation algorithms that extend standard algorithms, such as the Kalman-Bucy filter and the Rauch-Tung-Striebel smoother, to noncausal Markov random fields.
- We design the weights in consensus algorithms with spatially correlated random topologies. These arise with: 1) networks with spatially correlated random link failures and 2) networks with randomized averaging protocols. We show that the weight optimization problem is convex for both symmetric and asymmetric random graphs. With symmetric random networks, we choose the consensus mean squared error (MSE) convergence rate as optimization criterion and explicitly express this rate as a function of the link formation probabilities, the link formation spatial correlations, and the consensus weights. We prove that the MSE convergence rate is a convex, nonsmooth function of the weights, enabling global optimization of the weights for arbitrary link formation probabilities and link correlation structures. We extend our results to the case of asymmetric random links. We adopt as optimization criterion the mean squared deviation (MSdev) of the nodes states from the current average state. We prove that MSdev is a convex function of the weights. Simulations show that significant performance gain is achieved with our weight design method when compared with methods available in the literature.
- The paper presents higher dimension consensus (HDC) for large-scale networks. HDC generalizes the well-known average-consensus algorithm. It divides the nodes of the large-scale network into anchors and sensors. Anchors are nodes whose states are fixed over the HDC iterations, whereas sensors are nodes that update their states as a linear combination of the neighboring states. Under appropriate conditions, we show that the sensor states converge to a linear combination of the anchor states. Through the concept of anchors, HDC captures in a unified framework several interesting network tasks, including distributed sensor localization, leader-follower, distributed Jacobi to solve linear systems of algebraic equations, and, of course, average-consensus. In many network applications, it is of interest to learn the weights of the distributed linear algorithm so that the sensors converge to a desired state. We term this inverse problem the HDC learning problem. We pose learning in HDC as a constrained non-convex optimization problem, which we cast in the framework of multi-objective optimization (MOP) and to which we apply Pareto optimality. We prove analytically relevant properties of the MOP solutions and of the Pareto front from which we derive the solution to learning in HDC. Finally, the paper shows how the MOP approach resolves interesting tradeoffs (speed of convergence versus quality of the final state) arising in learning in HDC in resource constrained networks.
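A minimal HDC-style sketch on a path graph: the two end nodes are anchors with fixed states, and each sensor repeatedly averages its two neighbors (a distributed Jacobi sweep); the topology and equal weights are illustrative, not a solution of the HDC learning problem:

```python
def hdc_path(anchor_left, anchor_right, n_sensors, iters=2000):
    """HDC sketch on a path graph: anchors at the ends hold fixed states;
    each interior sensor updates to a linear combination (here the plain
    average) of its neighbors' states."""
    x = [anchor_left] + [0.0] * n_sensors + [anchor_right]
    for _ in range(iters):
        # Anchors x[0] and x[-1] never change; sensors average neighbors.
        x = ([x[0]]
             + [(x[i - 1] + x[i + 1]) / 2 for i in range(1, len(x) - 1)]
             + [x[-1]])
    return x

x = hdc_path(0.0, 4.0, n_sensors=3)
# The fixed point is a linear combination of the anchor states: the sensors
# converge to the linear interpolation between the anchors.
print([round(v, 3) for v in x])  # [0.0, 1.0, 2.0, 3.0, 4.0]
```

With equal neighbor weights this is exactly distributed Jacobi for a discrete Laplace equation, one of the special cases the abstract says HDC unifies; learning in HDC would instead choose the weights so the sensors converge to a prescribed state.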
- The paper studies the asymptotic behavior of Random Algebraic Riccati Equations (RARE) arising in Kalman filtering when the arrival of the observations is described by a Bernoulli i.i.d. process. We model the RARE as an order-preserving, strongly sublinear random dynamical system (RDS). Under a sufficient condition, stochastic boundedness, and using a limit-set dichotomy result for order-preserving, strongly sublinear RDS, we establish the asymptotic properties of the RARE: the sequence of random prediction error covariance matrices converges weakly to a unique invariant distribution, whose support exhibits fractal behavior. In particular, this weak convergence holds under broad conditions, even when the observation arrival rate is below the critical probability for mean stability. We apply the weak-Feller property of the Markov process governing the RARE to characterize the support of the limiting invariant distribution as the topological closure of a countable set of points, which, in general, is not dense in the set of positive semi-definite matrices. We use the explicit characterization of the support of the invariant distribution and the almost sure ergodicity of the sample paths to easily compute the moments of the invariant distribution. A one-dimensional example illustrates that the support is a fractured subset of the non-negative reals with self-similarity properties.
- In this paper, we propose a linear complexity encoding method for arbitrary LDPC codes. We start from a simple graph-based encoding method, "label-and-decide." We prove that the "label-and-decide" method is applicable to Tanner graphs with a hierarchical structure, pseudo-trees, and that the resulting encoding complexity is linear in the code block length. Next, we define a second type of Tanner graph: the encoding stopping set. An encoding stopping set is encoded in linear complexity by a revised label-and-decide algorithm, the "label-decide-recompute." Finally, we prove that any Tanner graph can be partitioned into encoding stopping sets and pseudo-trees. By encoding each encoding stopping set or pseudo-tree sequentially, we develop a linear complexity encoding method for general LDPC codes where the encoding complexity is proved to be less than $4 \cdot M \cdot (\overline{k} - 1)$, where $M$ is the number of independent rows in the parity check matrix and $\overline{k}$ represents the mean row weight of the parity check matrix.
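A hedged sketch of the leaf-elimination idea behind "label-and-decide" on a toy parity-check matrix; the matrix and the choice of information bits are invented for illustration, and the error branch only marks where "label-decide-recompute" would have to take over:

```python
def label_and_decide(H, info_bits, info_positions):
    """Leaf-elimination encoding sketch: repeatedly find a parity check with
    exactly one undetermined bit and set that bit to the XOR (mod-2 sum) of
    the determined ones. Succeeds when the Tanner graph admits a
    pseudo-tree ordering. H is a list of rows over GF(2)."""
    n = len(H[0])
    bits = [None] * n
    for pos, b in zip(info_positions, info_bits):
        bits[pos] = b                     # "label" the information bits
    undecided = set(range(n)) - set(info_positions)
    while undecided:
        progress = False
        for row in H:
            unknown = [j for j in range(n) if row[j] and bits[j] is None]
            if len(unknown) == 1:         # "decide" the lone unknown bit
                j = unknown[0]
                bits[j] = sum(bits[k] for k in range(n)
                              if row[k] and k != j) % 2
                undecided.discard(j)
                progress = True
        if not progress:
            raise ValueError("not a pseudo-tree; needs label-decide-recompute")
    return bits

# Toy parity-check matrix; bits 0 and 1 are the information bits.
H = [[1, 1, 1, 0],
     [0, 1, 1, 1]]
codeword = label_and_decide(H, [1, 0], [0, 1])
print(codeword)  # [1, 0, 1, 1]
```

Each check fires at most once and costs one XOR chain over its row, which is where the linear-in-block-length complexity comes from.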
- The paper studies distributed static parameter (vector) estimation in sensor networks with nonlinear observation models and noisy inter-sensor communication. It introduces *separably estimable* observation models that generalize the observability condition in linear centralized estimation to nonlinear distributed estimation. It studies three distributed estimation algorithms for separably estimable models: the $\mathcal{NU}$, its linear counterpart $\mathcal{LU}$, and the $\mathcal{NLU}$. Their update rule combines a *consensus* step (where each sensor updates its state by weight averaging it with its neighbors' states) and an *innovation* step (where each sensor processes its local current observation). This makes the three algorithms of the *consensus + innovations* type, very different from traditional consensus. The paper proves consistency (all sensors reach consensus almost surely and converge to the true parameter value), efficiency, and asymptotic unbiasedness. For $\mathcal{LU}$ and $\mathcal{NU}$, it proves asymptotic normality and provides convergence rate guarantees. The three algorithms are characterized by appropriately chosen decaying weight sequences. Algorithms $\mathcal{LU}$ and $\mathcal{NU}$ are analyzed in the framework of stochastic approximation theory; algorithm $\mathcal{NLU}$ exhibits mixed time-scale behavior and biased perturbations, and its analysis requires a different approach that is developed in the paper.
- The paper develops DILOC, a *distributed*, *iterative* algorithm that locates $M$ sensors in $\mathbb{R}^m$, $m\geq 1$, with respect to a minimal number of $m+1$ anchors with known locations. The sensors exchange data only with their neighbors; no centralized data processing or communication occurs, nor is there centralized knowledge about the sensors' locations. DILOC uses the barycentric coordinates of a sensor with respect to its neighbors, computed using the Cayley-Menger determinants, i.e., the determinants of matrices of inter-sensor distances. We show convergence of DILOC by associating with it an absorbing Markov chain whose absorbing states are the anchors. We introduce a stochastic approximation version that extends DILOC to random environments, where the knowledge about the intercommunications among sensors and the inter-sensor distances is noisy and the communication links among neighbors fail at random times. We show a.s. convergence of the modified DILOC and characterize the error between the final estimates and the true values of the sensors' locations. Numerical studies illustrate DILOC under a variety of deterministic and random operating conditions.
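The barycentric-coordinate building block can be sketched in the plane, where the $m = 2$ Cayley-Menger determinant reduces to Heron's formula for a triangle's area from its side lengths; the anchor and sensor positions below are hypothetical, and this shows only the distance-based coordinate computation, not DILOC's iterative update:

```python
import math

def tri_area_from_dists(dab, dbc, dca):
    """Triangle area from pairwise distances (Heron's formula, i.e., the
    m = 2 instance of the Cayley-Menger determinant)."""
    s = (dab + dbc + dca) / 2
    return math.sqrt(max(s * (s - dab) * (s - dbc) * (s - dca), 0.0))

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

# Hypothetical setup: one sensor inside the convex hull of m + 1 = 3 anchors.
anchors = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
sensor = (1.0, 1.0)

# Barycentric coordinate w.r.t. anchor k = (area of the sub-triangle opposite
# anchor k) / (area of the anchor triangle), computed from distances only.
total = tri_area_from_dists(dist(anchors[0], anchors[1]),
                            dist(anchors[1], anchors[2]),
                            dist(anchors[2], anchors[0]))
bary = []
for k in range(3):
    others = [a for i, a in enumerate(anchors) if i != k]
    bary.append(tri_area_from_dists(dist(sensor, others[0]),
                                    dist(others[0], others[1]),
                                    dist(others[1], sensor)) / total)

est = (sum(b * a[0] for b, a in zip(bary, anchors)),
       sum(b * a[1] for b, a in zip(bary, anchors)))
print(bary, est)  # coordinates sum to 1; est recovers (1.0, 1.0) up to rounding
```

In DILOC proper, each sensor applies this computation with respect to a triangulating set of *neighbors* (not the global anchors) and iterates the resulting convex combination, which is what the absorbing Markov chain argument analyzes.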
- The paper studies the problem of distributed average consensus in sensor networks with quantized data and random link failures. To achieve consensus, dither (small noise) is added to the sensor states before quantization. When the quantizer range is unbounded (countable number of quantizer levels), stochastic approximation shows that consensus is asymptotically achieved with probability one and in mean square to a finite random variable. We show that the mean-squared error (m.s.e.) can be made arbitrarily small by tuning the link weight sequence, at the cost of a slower convergence rate of the algorithm. To study dithered consensus with random links when the range of the quantizer is bounded, we establish uniform boundedness of the sample paths of the unbounded quantizer. This requires characterizing the statistical properties of the supremum taken over the sample paths of the state of the quantizer, which we accomplish by splitting the quantizer state vector into two components: one along the consensus subspace and the other along the subspace orthogonal to the consensus subspace. The proofs use maximal inequalities for submartingale and supermartingale sequences, from which we derive probability bounds on the excursions of the two components and, in turn, on the excursions of the quantizer state vector. The paper shows how to use these probability bounds to design the quantizer parameters and to explore tradeoffs among the number of quantizer levels, the size of the quantization steps, the desired probability of saturation, and the desired level of accuracy $\epsilon$ away from consensus. Finally, the paper illustrates the quantizer design with a numerical study.
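A small sketch of why dither helps: adding uniform noise before a uniform quantizer makes the quantization error zero-mean, so averaging washes out the bias that plain quantization locks in (the step size, test value, and trial count below are arbitrary illustrative choices):

```python
import math
import random

def quantize(x, step):
    """Uniform quantizer with step `step` (round to the nearest level)."""
    return step * math.floor(x / step + 0.5)

def dithered(x, step, rng):
    """Add uniform dither on [-step/2, step/2) before quantizing; the
    resulting quantization error is zero-mean, which is what lets dithered
    consensus converge to the average rather than to a biased point."""
    return quantize(x + rng.uniform(-step / 2, step / 2), step)

rng = random.Random(0)
x, step, trials = 0.3, 1.0, 200000
plain = quantize(x, step)   # always 0.0: a fixed bias of -0.3
avg = sum(dithered(x, step, rng) for _ in range(trials)) / trials
print(plain, avg)           # the dithered average is near 0.3
```

In the consensus algorithm the same effect is exploited per link: each dithered exchange is an unbiased (if noisy) report of the neighbor's state, and the decaying weight sequence trades that extra variance against convergence speed.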
- The paper studies average consensus with random topologies (intermittent links) *and* noisy channels. Consensus with noise in the network links leads to the bias-variance dilemma: running consensus for long reduces the bias of the final average estimate but increases its variance. We present two different compromises to this tradeoff: the $\mathcal{A-ND}$ algorithm modifies conventional consensus by forcing the weights to satisfy a *persistence* condition (slowly decaying to zero); the $\mathcal{A-NC}$ algorithm keeps the weights constant but runs consensus for a fixed number of iterations $\hat{\imath}$, then restarts and reruns it for a total of $\hat{p}$ runs, and at the end averages the final states of the $\hat{p}$ runs (Monte Carlo averaging). We use controlled Markov processes and stochastic approximation arguments to prove almost sure convergence of $\mathcal{A-ND}$ to the desired average (asymptotic unbiasedness) and compute explicitly the m.s.e. (variance) of the consensus limit. We show that $\mathcal{A-ND}$ represents the best of both worlds, low bias and low variance, at the cost of a slow convergence rate; rescaling the weights...
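A rough sketch of the $\mathcal{A-NC}$ recipe on a hypothetical fully connected 3-node network with additive link noise: fixed-horizon, constant-weight consensus, rerun many times, with the final states Monte Carlo averaged; every parameter below is an illustrative choice, not from the paper:

```python
import random

def noisy_consensus_run(x0, weight, iters, noise_std, rng):
    """One constant-weight consensus run where every received neighbor
    state is corrupted by additive Gaussian channel noise."""
    x = list(x0)
    n = len(x)
    for _ in range(iters):
        new = []
        for i in range(n):
            recv = [x[j] + rng.gauss(0, noise_std) for j in range(n) if j != i]
            new.append((1 - weight * (n - 1)) * x[i] + weight * sum(recv))
        x = new
    return x

def a_nc(x0, weight, iters, runs, noise_std, seed=0):
    """A-NC sketch: rerun the fixed-horizon consensus `runs` times and
    Monte Carlo average the final states to shrink the noise variance."""
    rng = random.Random(seed)
    finals = [noisy_consensus_run(x0, weight, iters, noise_std, rng)
              for _ in range(runs)]
    return [sum(f[i] for f in finals) / len(finals) for i in range(len(x0))]

x0 = [6.0, 0.0, 3.0]   # true average: 3.0
est = a_nc(x0, weight=0.3, iters=15, runs=400, noise_std=0.5)
print([round(v, 2) for v in est])  # each entry near 3.0
```

The short horizon keeps the per-run noise accumulation bounded (at the price of residual bias from incomplete mixing), and averaging over the $\hat{p}$ independent runs divides the variance by $\hat{p}$, which is the compromise the abstract describes.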
- This paper derives a *distributed* Kalman filter to estimate a sparsely connected, large-scale, $n$-dimensional dynamical system monitored by a network of $N$ sensors. Local Kalman filters are implemented on the $n_l$-dimensional (where $n_l\ll n$) sub-systems obtained after spatially decomposing the large-scale system. The resulting sub-systems overlap, which, along with an assimilation procedure on the local Kalman filters, preserves an $L$th order Gauss-Markovian structure of the centralized error processes. The information loss due to the $L$th order Gauss-Markovian approximation is controllable, as it can be characterized by a divergence that decreases as $L$ increases. The order of the approximation, $L$, leads to a lower bound on the dimension of the sub-systems, hence providing a criterion for sub-system selection. The assimilation procedure is carried out on the local error covariances with a distributed iterate collapse inversion (DICI) algorithm that we introduce. The DICI algorithm computes the (approximated) centralized Riccati and Lyapunov equations iteratively with only local communication and low-order computation. We fuse the observations that are common among the local Kalman filters using bipartite fusion graphs and consensus averaging algorithms. The proposed algorithm achieves full distribution of the Kalman filter and is coherent with the centralized Kalman filter with an $L$th order Gauss-Markovian structure on the centralized error processes. No storage, communication, or computation of $n$-dimensional vectors and matrices is needed anywhere; only $n_l \ll n$ dimensional vectors and matrices are communicated or used in the computations at the sensors.
- In a sensor network, in practice, the communication among sensors is subject to: (1) errors or failures at random times; (2) costs; and (3) constraints, since sensors and networks operate under scarce resources, such as power, data rate, or communication. The signal-to-noise ratio (SNR) is usually a main factor in determining the probability of error (or of communication failure) in a link. These probabilities are then a proxy for the SNR under which the links operate. The paper studies the problem of designing the topology, i.e., assigning the probabilities of reliable communication among sensors (or of link failures), to maximize the rate of convergence of average consensus when the link communication costs are taken into account and there is an overall communication budget constraint. To consider this problem, we address a number of preliminary issues: (1) model the network as a random topology; (2) establish necessary and sufficient conditions for mean square sense (mss) and almost sure (a.s.) convergence of average consensus when network links fail; and, in particular, (3) show that a necessary and sufficient condition for both mss and a.s. convergence is that the algebraic connectivity of the mean graph describing the network topology be strictly positive. With these results, we formulate topology design, subject to random link failures and to a communication cost constraint, as a constrained convex optimization problem to which we apply semidefinite programming techniques. We show by an extensive numerical study that the optimal design significantly improves the convergence speed of the consensus algorithm and can achieve the asymptotic performance of a non-random network at a fraction of the communication cost.
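The necessary and sufficient condition above, strictly positive algebraic connectivity (Fiedler value) of the mean graph, is, for nonnegative edge weights, equivalent to the mean graph being connected, which a plain BFS can check without computing eigenvalues; the link probabilities below are hypothetical:

```python
from collections import deque

def mean_graph_connected(n, link_probs):
    """Check whether the mean graph (edges weighted by their formation
    probabilities) is connected, i.e., has strictly positive algebraic
    connectivity: BFS over the edges with nonzero probability."""
    adj = {i: [] for i in range(n)}
    for (i, j), p in link_probs.items():
        if p > 0:
            adj[i].append(j)
            adj[j].append(i)
    seen, queue = {0}, deque([0])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) == n

# Hypothetical 4-node topologies: link -> probability of forming per iteration.
print(mean_graph_connected(4, {(0, 1): 0.5, (1, 2): 0.2, (2, 3): 0.9}))  # True
print(mean_graph_connected(4, {(0, 1): 0.5, (2, 3): 0.9}))               # False
```

The actual design problem then distributes the communication budget over these probabilities to maximize the convergence rate, for which the paper's semidefinite programming formulation (not sketched here) is needed.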
- This paper presents a systematic methodology based on the algebraic theory of signal processing to classify and derive fast algorithms for linear transforms. Instead of manipulating the entries of transform matrices, our approach derives the algorithms by stepwise decomposition of the associated signal models, or polynomial algebras. This decomposition is based on two generic methods or algebraic principles that generalize the well-known Cooley-Tukey FFT and make the algorithms' derivations concise and transparent. Application to the 16 discrete cosine and sine transforms yields a large class of fast algorithms, many of which have not been found before.
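The classical radix-2 Cooley-Tukey FFT that these algebraic principles generalize can be sketched as a recursive even/odd decomposition, checked here against a direct DFT (the input signal is an arbitrary length-8 example):

```python
import cmath

def fft(x):
    """Radix-2 Cooley-Tukey: split the signal into even- and odd-indexed
    polyphase components, recurse, and recombine with twiddle factors.
    Requires len(x) to be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out

def dft(x):
    """Direct O(n^2) DFT, for comparison."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

x = [1.0, 2.0, 0.0, -1.0, 1.5, 0.0, 2.0, 3.0]
assert all(abs(a - b) < 1e-9 for a, b in zip(fft(x), dft(x)))
```

In the algebraic view, the even/odd split is a stepwise decomposition of the polynomial algebra $\mathbb{C}[z]/(z^n - 1)$; applying analogous decompositions to the algebras behind the 16 DCTs and DSTs is what yields the paper's fast algorithms.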