results for au:Tan_V in:cs

- We revisit the stochastic limited-memory BFGS (L-BFGS) algorithm. By proposing a new framework for analyzing convergence, we theoretically improve the (linear) convergence rates and computational complexities of the stochastic L-BFGS algorithms in previous works. In addition, we propose several practical acceleration strategies to speed up the empirical performance of such algorithms. We also provide theoretical analyses for most of the strategies. Experiments on large-scale logistic and ridge regression problems demonstrate that our proposed strategies yield significant improvements vis-à-vis competing state-of-the-art algorithms.
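For background, the deterministic two-loop recursion that underlies L-BFGS-type methods can be sketched as follows; this is a textbook building block, not the paper's stochastic variant or its acceleration strategies, and the function name and history format are illustrative:

```python
import numpy as np

def lbfgs_two_loop(grad, s_hist, y_hist):
    """Standard L-BFGS two-loop recursion: returns an approximation to
    H^{-1} @ grad built from curvature pairs (s_k, y_k)."""
    q = grad.astype(float).copy()
    rhos = [1.0 / float(y @ s) for s, y in zip(s_hist, y_hist)]
    alphas = []
    # First loop: traverse history from newest to oldest.
    for s, y, rho in reversed(list(zip(s_hist, y_hist, rhos))):
        a = rho * float(s @ q)
        alphas.append(a)
        q = q - a * y
    # Initial Hessian scaling gamma = s'y / y'y (identity if no history).
    if s_hist:
        gamma = float(s_hist[-1] @ y_hist[-1]) / float(y_hist[-1] @ y_hist[-1])
    else:
        gamma = 1.0
    r = gamma * q
    # Second loop: traverse history from oldest to newest.
    for (s, y, rho), a in zip(zip(s_hist, y_hist, rhos), reversed(alphas)):
        b = rho * float(y @ r)
        r = r + (a - b) * s
    return r
```

For a quadratic with Hessian $2I$ and a single curvature pair along the gradient direction, the recursion returns exactly half the gradient, matching $H^{-1}g$.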
- The learning of Gaussian mixture models (GMMs) is a classical problem in machine learning and applied statistics. This can also be interpreted as a clustering problem. Indeed, given data samples independently generated from a GMM, we would like to find the correct target clustering of the samples according to which Gaussian they were generated from. Despite the large number of algorithms designed to find the correct target clustering, many practitioners prefer to use the k-means algorithm because of its simplicity. k-means tries to find an optimal clustering which minimizes the sum of squared distances between each point and its cluster center. In this paper, we provide sufficient conditions for the closeness of any optimal clustering and the correct target clustering of the samples which are independently generated from a GMM. Moreover, to achieve significantly faster running time and reduced memory usage, we show that under weaker conditions on the GMM, any optimal clustering for the samples with reduced dimensionality is also close to the correct target clustering. These results provide intuition for the informativeness of k-means as an algorithm for learning a GMM, further substantiating the conclusions in Kumar and Kannan [2010]. We verify the correctness of our theorems using numerical experiments and show, using datasets with reduced dimensionality, significant speed ups for the time required to perform clustering.
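As an illustration of the k-means objective discussed above, a minimal Lloyd-style sketch (not the paper's dimensionality-reduction procedure; initialization and names are illustrative):

```python
import numpy as np

def lloyd_kmeans(X, k, iters=50, seed=0):
    """Plain Lloyd's algorithm: alternately assign each point to its
    nearest center and recompute centers as cluster means."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        # Squared distances from every point to every center.
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # keep the old center if a cluster empties
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers
```

On samples from a well-separated two-component GMM, the optimal clustering found this way coincides with the target clustering, in line with the sufficient conditions described above.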
- This paper investigates the asymptotic expansion for the maximum coding rate of a parallel Gaussian channel with feedback under the following setting: A peak power constraint is imposed on every transmitted codeword, and the average error probability of decoding the transmitted message is non-vanishing as the blocklength increases. It is well known that the presence of feedback does not increase the first-order asymptotics of the channel, i.e., capacity, in the asymptotic expansion, and the closed-form expression of the capacity can be obtained by the well-known water-filling algorithm. The main contribution of this paper is a self-contained proof of an upper bound on the second-order asymptotics of the parallel Gaussian channel with feedback. The proof techniques involve developing an information spectrum bound followed by using Curtiss' theorem to show that a sum of dependent random variables associated with the information spectrum bound converges in distribution to a sum of independent random variables, thus facilitating the use of the usual central limit theorem. Combined with existing achievability results, our result implies that the presence of feedback does not improve the second-order asymptotics.
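The water-filling solution mentioned above can be computed by bisection on the water level; this is the generic textbook computation for parallel Gaussian channels, not the paper's second-order analysis, and the function name is illustrative:

```python
import numpy as np

def water_filling(noise, power, tol=1e-10):
    """Capacity-achieving power allocation for parallel Gaussian channels:
    p_i = max(0, nu - n_i), with the water level nu chosen so sum(p) = power."""
    noise = np.asarray(noise, dtype=float)
    lo, hi = noise.min(), noise.max() + power  # alloc(lo) <= power <= alloc(hi)
    while hi - lo > tol:
        nu = 0.5 * (lo + hi)
        if np.maximum(nu - noise, 0.0).sum() > power:
            hi = nu
        else:
            lo = nu
    p = np.maximum(lo - noise, 0.0)
    capacity = 0.5 * np.sum(np.log1p(p / noise))  # in nats per channel use
    return p, capacity
```

For noise levels $(1, 2)$ and total power $3$, the water level settles at $3$, giving the allocation $(2, 1)$.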
- In this paper, we revisit the high-dimensional content identification with lossy recovery problem (Tuncel and Gündüz, 2014). We first present a non-asymptotic converse bound. Invoking the non-asymptotic converse bound, we derive a lower bound on the exponent of the probability of correct decoding (the strong converse exponent) and show the lower bound is strictly positive if the rate-distortion tuple falls outside the rate-distortion region by Tuncel and Gündüz. Hence, we establish the exponential strong converse theorem for the content identification problem with lossy recovery. As corollaries of the exponential strong converse theorem, we derive an upper bound on the joint excess-distortion exponent for the problem. Our main results can be specialized to the biometrical identification problem~(Willems, 2003) and the content identification problem~(Tuncel, 2009) since these two problems are both special cases of the content identification problem with lossy recovery. We leverage the information spectrum method introduced by Oohama (2015, 2016). We adapt the strong converse techniques therein to be applicable to the problem at hand and we unify the analysis carefully to obtain the desired results.
- The problem of publishing privacy-guaranteed data for hypothesis testing is studied using the maximal leakage (ML) as a metric for privacy and the type-II error exponent as the utility metric. The optimal mechanism (random mapping) that maximizes utility for a bounded leakage guarantee is determined for the entire leakage range for binary datasets. For non-binary datasets, approximations in the high privacy and high utility regimes are developed. The results show that, for any desired leakage level, maximizing utility forces the ML privacy mechanism to reveal partial to complete knowledge about a subset of the source alphabet. The results developed on maximizing a convex function over a polytope may also be of independent interest.
- We analyse families of codes for classical data transmission over quantum channels that have both a vanishing probability of error and a code rate approaching capacity as the code length increases. To characterise the fundamental tradeoff between decoding error, code rate and code length for such codes we introduce a quantum generalisation of the moderate deviation analysis proposed by Altug and Wagner as well as Polyanskiy and Verdu. We derive such a tradeoff for classical-quantum (as well as image-additive) channels in terms of the channel capacity and the channel dispersion, giving further evidence that the latter quantity characterises the necessary backoff from capacity when transmitting finite blocks of classical data. To derive these results we also study asymmetric binary quantum hypothesis testing in the moderate deviations regime. Due to the central importance of the latter task, we expect that our techniques will find further applications in the analysis of other quantum information processing tasks.
- We study the application of polar codes in deletion channels by analyzing the cascade of a binary erasure channel (BEC) and a deletion channel. We show how polar codes can be used effectively on a BEC with a single deletion, and propose a list decoding algorithm with a cyclic redundancy check for this case. The decoding complexity is $O(N^2\log N)$, where $N$ is the blocklength of the code. An important contribution is an optimization of the amount of redundancy added to minimize the overall error probability. Our theoretical results are corroborated by numerical simulations which show that the list size can be reduced to one and the original message can be recovered with high probability as the length of the code grows.
- This paper investigates the achievable rates of an additive white Gaussian noise (AWGN) energy-harvesting (EH) channel with an infinite battery under the assumption that the error probabilities do not vanish as the blocklength increases. The EH process is characterized by a sequence of blocks of harvested energy. The harvested energy remains constant within a block while the harvested energy across different blocks is characterized by a sequence of independent and identically distributed (i.i.d.) random variables. The blocks have length $L$, which can be interpreted as the coherence time of the energy arrival process. If $L$ is a constant or grows sublinearly in the blocklength $n$, we fully characterize the first-order coding rate. In addition, we obtain lower and upper bounds on the second-order coding rate, which are proportional to $-\sqrt{\frac{L}{n}}$ for any fixed error probability $<1/2$. If $L$ grows linearly in $n$, we obtain lower and upper bounds on the first-order coding rate, which coincide whenever the EH random variable is continuous. Our results suggest that correlation in the energy-arrival process decreases the effective blocklength by a factor of~$L$.
- This work investigates the limits of communication over a noisy channel that wears out, in the sense of signal-dependent catastrophic failure. In particular, we consider a channel that starts as a memoryless binary-input channel and when the number of transmitted ones causes a sufficient amount of damage, the channel ceases to convey signals. We restrict attention to constant composition codes. Since infinite blocklength codes will always wear out the channel for any finite threshold of failure and therefore convey no information, we make use of finite blocklength codes to determine the maximum expected transmission volume at a given level of average error probability. We show that this maximization problem has a recursive form and can be solved by dynamic programming. A discussion of damage state feedback in channels that wear out is also provided. Numerical results show that a sequence of block codes is preferred to a single block code for streaming sources.
- We derive upper and lower bounds on the reliability function for the common-message discrete memoryless broadcast channel with variable-length feedback. We show that the bounds are tight when the broadcast channel is stochastically degraded. For the achievability part, we adapt Yamamoto and Itoh's coding scheme by controlling the expectation of the maximum of a set of stopping times. For the converse part, we adapt Burnashev's proof techniques for establishing the reliability functions for (point-to-point) discrete memoryless channels with variable-length feedback and sequential hypothesis testing.
- This paper is motivated by the error-control problem in communication channels in which the transmitted sequences are subjected to random permutations, in addition to being impaired with insertions, deletions, substitutions, and erasures of symbols. Bounds on the size of optimal codes in this setting are derived, and their asymptotic behavior examined in the fixed-minimum-distance regime. A family of codes correcting these types of errors is described and is shown to be asymptotically optimal for some sets of parameters. The corresponding error-detection problem is also analyzed. Applications to data transmission over packet networks based on multipath routing are discussed.
- Given a sufficient statistic for a parametric family of distributions, one can estimate the parameter without access to the data itself. However, the memory or code size for storing the sufficient statistic may nonetheless still be prohibitive. Indeed, for $n$ independent data samples drawn from a $k$-nomial distribution with $d=k-1$ degrees of freedom, the length of the code scales as $d\log n+O(1)$. In many applications though, we may not have a useful notion of sufficient statistics (e.g., when the parametric family is not an exponential family) and also may not need to reconstruct the generating distribution exactly. By adopting a Shannon-theoretic approach in which we allow a small error in estimating the generating distribution, we construct various notions of approximate sufficient statistics and show that the code length can be reduced to $\frac{d}{2}\log n+O(1)$. We also note that the locality assumption that is used to describe the notion of local approximate sufficient statistics when the parametric family is not an exponential family can be dispensed with. We consider errors measured according to the relative entropy and variational distance criteria. For the code construction parts, we leverage Rissanen's minimum description length principle, which yields a non-vanishing error measured using the relative entropy. For the converse parts, we use Clarke and Barron's asymptotic expansion for the relative entropy of a parametrized distribution and the corresponding mixture distribution. The limitation of this method is that only a weak converse for the variational distance can be shown. We develop new techniques to achieve vanishing errors and we also prove strong converses for all our statements. The latter means that even if the code is allowed to have a non-vanishing error, its length must still be at least $\frac{d}{2}\log n$.
- A $ B_h $ set (or Sidon set of order $ h $) in an Abelian group $ G $ is any subset $ \{b_0, b_1, \ldots,b_{n}\} \subset G $ with the property that all the sums $ b_{i_1} + \cdots + b_{i_h} $ are different up to the order of the summands. Let $ \phi(h,n) $ denote the order of the smallest Abelian group containing a $ B_h $ set of cardinality $ n + 1 $. It is shown that, as $ h \to \infty $ and $ n $ is kept fixed, \[ \phi(h,n) \sim \frac{1}{n!} \, \delta_L(\triangle^n) \, h^n , \] where $ \delta_L(\triangle^n) $ is the lattice packing density of an $ n $-simplex in the Euclidean space. This determines the asymptotics exactly in cases where this density is known ($ n \leq 3 $) and gives improved bounds on $ \phi(h,n) $ in the remaining cases. The corresponding geometric characterization of bases of order $ h $ in finite Abelian groups in terms of lattice coverings by simplices is also given.
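As a concrete illustration of the $B_h$ property defined above, a brute-force membership check over $\mathbb{Z}_m$ (an illustrative helper, not part of the paper's construction):

```python
from itertools import combinations_with_replacement

def is_bh_set(elements, h, m):
    """Check whether `elements` form a B_h set in Z_m: all h-fold sums
    (with repetition allowed, up to reordering of summands) are distinct mod m."""
    sums = [sum(c) % m for c in combinations_with_replacement(elements, h)]
    return len(sums) == len(set(sums))
```

For example, $\{0, 1, 3\}$ is a $B_2$ (Sidon) set in $\mathbb{Z}_7$ since its pairwise sums $0,1,2,3,4,6$ are all distinct, whereas $\{0, 1, 2\}$ is not ($0+2 = 1+1$).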
- The multiplicative update (MU) algorithm has been used extensively to estimate the basis and coefficient matrices in nonnegative matrix factorization (NMF) problems under a wide range of divergences and regularizations. However, theoretical convergence guarantees have only been derived for a few special divergences. In this work, we provide a conceptually simple, self-contained, and unified proof for the convergence of the MU algorithm applied on NMF with a wide range of divergences and regularizations. Our result shows the sequence of iterates (i.e., pairs of basis and coefficient matrices) produced by the MU algorithm converges to the set of stationary points of the NMF (optimization) problem. Our proof strategy has the potential to open up new avenues for analyzing similar problems.
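For reference, the classical multiplicative updates for Frobenius-norm NMF, one of the divergences covered by such convergence analyses; this is a textbook sketch (with an illustrative small-constant guard in the denominators), not the paper's unified framework:

```python
import numpy as np

def nmf_mu(V, r, iters=200, eps=1e-9, seed=0):
    """Multiplicative updates (Lee-Seung style) for V ~= W @ H under the
    Frobenius norm; W, H stay elementwise nonnegative by construction."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, r)) + eps
    H = rng.random((r, n)) + eps
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)  # coefficient update
        W *= (V @ H.T) / (W @ H @ H.T + eps)  # basis update
    return W, H
```

On a strictly positive rank-one matrix the iterates drive the reconstruction error to essentially zero, consistent with convergence to a stationary point of the NMF objective.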
- We characterize the information-theoretic limits of the additive white Gaussian noise (AWGN) channel and the Gaussian multiple access channel (MAC) when variable-length feedback is available at the encoder and a non-vanishing error probability is permitted. For the AWGN channel, we establish the $\varepsilon$-capacity (for $0<\varepsilon<1$) and show that it is larger than the corresponding $\varepsilon$-capacity when fixed-length feedback is available. Due to the continuous nature of the channel and the presence of expected power constraints, we need to develop new achievability and converse techniques. In addition, we show that a variable-length feedback with termination (VLFT) code outperforms a stop-feedback code in terms of the second-order asymptotic behavior. Finally, we extend our analyses to the Gaussian MAC with the two types of variable-length feedback where we establish the $\varepsilon$-capacity region. Due to the multi-terminal nature of the channel model, we are faced with the need to bound the asymptotic behavior of the expected value of the maximum of several stopping times. We do so by leveraging tools from renewal theory developed by Lai and Siegmund.
- Binary hypothesis testing under the Neyman-Pearson formalism is a statistical inference framework for distinguishing data generated by two different source distributions. Privacy restrictions may require the curator of the data or the data respondents themselves to share data with the test only after applying a randomizing privacy mechanism. Using mutual information as the privacy metric and the relative entropy between the two distributions of the output (postrandomization) source classes as the utility metric (motivated by the Chernoff-Stein Lemma), this work focuses on finding an optimal mechanism that maximizes the chosen utility function while ensuring that the mutual information based leakage for both source distributions is bounded. Focusing on the high privacy regime, a Euclidean information-theoretic (E-IT) approximation to the tradeoff problem is presented. It is shown that the solution to the E-IT approximation is independent of the alphabet size and clarifies that a mutual information based privacy metric preserves the privacy of the source symbols in inverse proportion to their likelihood.
- This paper considers a multimessage network where each node may send a message to any other node in the network. Under the discrete memoryless model, we prove the strong converse theorem for any network with tight cut-set bound, i.e., whose cut-set bound is achievable. Our result implies that for any network with tight cut-set bound and any fixed rate vector that resides outside the capacity region, the average error probabilities of any sequence of length-$n$ codes operated at the rate vector must tend to $1$ as $n$ grows. The proof is based on the method of types. The proof techniques are inspired by the work of Csiszár and Körner in 1982 which fully characterized the reliability function of any discrete memoryless channel (DMC) with feedback for rates above capacity. In addition, we generalize the strong converse theorem to the Gaussian model where each node is subject to a peak power constraint. Important consequences of our results are new strong converses for the Gaussian multiple access channel (MAC) with feedback and the following relay channels under both models: The degraded relay channel (RC), the RC with orthogonal sender components, and the general RC with feedback.
- In this paper, we analyze the asymptotics of the normalized remaining uncertainty of a source when a compressed or hashed version of it and correlated side-information are observed. For this system, commonly known as Slepian-Wolf source coding, we establish the optimal (minimum) rate of compression of the source to ensure that the remaining uncertainties vanish. We also study the exponential rate of decay of the remaining uncertainty to zero when the rate is above the optimal rate of compression. In our study, we consider various classes of random universal hash functions. Instead of measuring remaining uncertainties using traditional Shannon information measures, we do so using two forms of the conditional Rényi entropy. Among other techniques, we employ new one-shot bounds and the moments of type class enumerator method for these evaluations. We show that these asymptotic results are generalizations of the strong converse exponent and the error exponent of the Slepian-Wolf problem under maximum a posteriori (MAP) decoding.
- Motivated by communication scenarios such as timing channels (in queuing systems, molecular communications, etc.) and bit-shift channels (in magnetic recording systems), we study the error control problem in cases where the dominant type of noise is symbol shifts. In particular, two channel models are introduced and their zero-error capacities determined by an explicit construction of optimal zero-error codes. Model A can be informally described as follows: 1) The information is stored in an $ n $-cell register, where each cell can either be left empty, or can contain a particle of one of $ P $ possible types, and 2) due to the imperfections of the device every particle is shifted $ k $ cells away from its original position over time, where $ k $ is drawn from a certain range of integers, without the possibility of reordering particles. Model B is an abstraction of a single-server queue: 1) The transmitter sends symbols/packets from a $ P $-ary alphabet through a queuing system with an infinite buffer, and 2) each packet is being processed by the server for a number of time slots $ k \in \{0, 1, \ldots, K \} $. Several variations of the above models are also discussed, e.g., with multiple particles per cell, with additional types of noise, and the continuous-time case. The models are somewhat atypical due to the fact that the length of the channel output in general differs from that of the corresponding input, and that this length depends on the noise (shift) pattern as well as on the input itself. This will require the notions of a zero-error code and the zero-error capacity, as introduced by Shannon, to be generalized.
- In this paper, we consider the problem of blockwise streaming compression of a pair of correlated sources, which we term streaming Slepian-Wolf coding. We study the moderate deviations regime in which the rate pairs of a sequence of codes converge, along a straight line, to various points on the boundary of the Slepian-Wolf region at a speed slower than the inverse square root of the blocklength $n$, while the error probability decays subexponentially fast in $n$. Our main result focuses on directions of approaches to corner points of the Slepian-Wolf region. It states that for each correlated source and all corner points, there exists a non-empty subset of directions of approaches such that the moderate deviations constant (the constant of proportionality for the subexponential decay of the error probability) is enhanced (over the non-streaming case) by at least a factor of $T$, the block delay of decoding symbol pairs. We specialize our main result to the setting of lossless streaming source coding and generalize this result to the setting where we have different delay requirements for each of the two source blocks. The proof of our main result involves the use of various analytical tools and amalgamates several ideas from the recent information-theoretic streaming literature. We adapt the so-called truncated memory idea from Draper and Khisti (2011) and Lee, Tan and Khisti (2015) to ensure that the effect of error accumulation is nullified in the limit of large blocklengths. We also adapt the use of the so-called minimum empirical suffix entropy decoder which was used by Draper, Chang and Sahai (2014) to derive achievable error exponents for streaming Slepian-Wolf coding.
- In this paper, a streaming transmission setup is considered where an encoder observes a new message in the beginning of each block and a decoder sequentially decodes each message after a delay of $T$ blocks. In this streaming setup, the fundamental interplay between the coding rate, the error probability, and the blocklength in the moderate deviations regime is studied. For output symmetric channels, the moderate deviations constant is shown to improve over the block coding or non-streaming setup by exactly a factor of $T$ for a certain range of moderate deviations scalings. For the converse proof, a more powerful decoder to which some extra information is fed forward is assumed. The error probability is bounded first for an auxiliary channel and this result is translated back to the original channel by using a newly developed change-of-measure lemma, where the speed of decay of the remainder term in the exponent is carefully characterized. For the achievability proof, a known coding technique that involves a joint encoding and decoding of fresh and past messages is applied with some manipulations in the error analysis.
- We propose a unified and systematic framework for performing online nonnegative matrix factorization in the presence of outliers. Our framework is particularly suited to large-scale data. We propose two solvers based on projected gradient descent and the alternating direction method of multipliers. We prove that the sequence of objective values converges almost surely by appealing to the quasi-martingale convergence theorem. We also show that the sequence of learned dictionaries converges to the set of stationary points of the expected loss function almost surely. In addition, we extend our basic problem formulation to various settings with different constraints and regularizers. We also adapt the solvers and analyses to each setting. We perform extensive experiments on both synthetic and real datasets. These experiments demonstrate the computational efficiency and efficacy of our algorithms on tasks such as (parts-based) basis learning, image denoising, shadow removal and foreground-background separation.
- This paper revisits the Gaussian degraded relay channel, where the link that carries information from the source to the destination is a physically degraded version of the link that carries information from the source to the relay. The source and the relay are subject to expected power constraints. The $\varepsilon$-capacity of the channel is characterized and it is strictly larger than the capacity for $\varepsilon>0$, which implies that the channel does not possess the strong converse property. The proof of the achievability part is based on several key ideas: block Markov coding which is used in the classical decode-forward strategy, power control for Gaussian channels under expected power constraints, and a careful scaling between the block size and the total number of block uses. The converse part is proved by first establishing two non-asymptotic lower bounds on the error probability, which are derived from the type-II errors of some binary hypothesis tests. Subsequently, each lower bound is simplified by conditioning on an event related to the power of some linear combination of the codewords transmitted by the source and the relay. Lower and upper bounds on the second-order term of the optimal coding rate in terms of blocklength and error probability are also obtained.
- We study the top-$K$ ranking problem where the goal is to recover the set of top-$K$ ranked items out of a large collection of items based on partially revealed preferences. We consider an adversarial crowdsourced setting where there are two population sets, and pairwise comparison samples drawn from one of the populations follow the standard Bradley-Terry-Luce model (i.e., the chance of item $i$ beating item $j$ is proportional to the relative score of item $i$ to item $j$), while in the other population, the corresponding chance is inversely proportional to the relative score. When the relative size of the two populations is known, we characterize the minimax limit on the sample size required (up to a constant) for reliably identifying the top-$K$ items, and demonstrate how it scales with the relative size. Moreover, by leveraging a tensor decomposition method for disambiguating mixture distributions, we extend our result to the more realistic scenario in which the relative population size is unknown, thus establishing an upper bound on the fundamental limit of the sample size for recovering the top-$K$ set.
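The Bradley-Terry-Luce comparison model described above can be sketched as follows; the `adversarial` flag mirrors the inverted population, and all names are illustrative rather than the paper's notation:

```python
import numpy as np

def btl_win_prob(scores, i, j):
    """P(item i beats item j) under the Bradley-Terry-Luce model:
    proportional to the relative score of item i to item j."""
    return scores[i] / (scores[i] + scores[j])

def sample_comparisons(scores, pairs, n, adversarial=False, seed=0):
    """Draw n pairwise comparison outcomes per pair; in the adversarial
    population the win probability is inverted."""
    rng = np.random.default_rng(seed)
    outcomes = {}
    for (i, j) in pairs:
        p = btl_win_prob(scores, i, j)
        if adversarial:
            p = 1.0 - p
        outcomes[(i, j)] = rng.random(n) < p  # True means i beat j
    return outcomes
```

For instance, with scores $(3, 1)$ the probability that item 0 beats item 1 is $3/(3+1) = 0.75$, and $0.25$ in the adversarial population.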
- We derive the optimal second-order coding region and moderate deviations constant for successive refinement source coding with a joint excess-distortion probability constraint. We consider two scenarios: (i) a discrete memoryless source (DMS) and arbitrary distortion measures at the decoders and (ii) a Gaussian memoryless source (GMS) and quadratic distortion measures at the decoders. For a DMS with arbitrary distortion measures, we prove an achievable second-order coding region, using type covering lemmas by Kanlis and Narayan and by No, Ingber and Weissman. We prove the converse using the perturbation approach by Gu and Effros. When the DMS is successively refinable, the expressions for the second-order coding region and the moderate deviations constant are simplified and easily computable. For this case, we also obtain new insights on the second-order behavior compared to the scenario where separate excess-distortion probabilities are considered. For example, we describe a DMS, for which the optimal second-order region transitions from being characterizable by a bivariate Gaussian to a univariate Gaussian, as the distortion levels are varied. We then consider a GMS with quadratic distortion measures. To prove the direct part, we make use of the sphere covering theorem by Verger-Gaugry, together with appropriately-defined Gaussian type classes. To prove the converse, we generalize Kostina and Verdú's one-shot converse bound for point-to-point lossy source coding. We remark that this proof is applicable to general successively refinable sources. In the proofs of the moderate deviations results for both scenarios, we follow a strategy similar to that for the second-order asymptotics and use the moderate deviations principle.
- This paper investigates the scaling exponent of polar codes for binary-input energy-harvesting (EH) channels with infinite-capacity batteries. The EH process is characterized by a sequence of i.i.d. random variables with finite variances. The scaling exponent $\mu$ of polar codes for a binary-input memoryless channel (BMC) characterizes the closest gap between the capacity and non-asymptotic rates achieved by polar codes with error probabilities no larger than some non-vanishing $\varepsilon\in(0,1)$. It has been shown that for any $\varepsilon\in(0,1)$, the scaling exponent $\mu$ for any binary-input memoryless symmetric channel (BMSC) with $I(q_{Y|X})\in(0,1)$ lies between $3.579$ and $4.714$, where the upper bound $4.714$ was shown by an explicit construction of polar codes. Our main result shows that $4.714$ remains a valid upper bound on the scaling exponent for any binary-input EH channel, i.e., a BMC subject to additional EH constraints. Our result thus implies that the EH constraints do not worsen the rate of convergence to capacity if polar codes are employed. The main result is proved by leveraging the following three existing results: scaling exponent analyses for BMSCs, construction of polar codes designed for binary-input memoryless asymmetric channels, and the save-and-transmit strategy for EH channels.
- We consider streaming data transmission over a discrete memoryless channel. A new message is given to the encoder at the beginning of each block and the decoder decodes each message sequentially, after a delay of $T$ blocks. In this streaming setup, we study the fundamental interplay between the rate and error probability in the central limit and moderate deviations regimes and show that i) in the moderate deviations regime, the moderate deviations constant improves over the block coding or non-streaming setup by a factor of $T$ and ii) in the central limit regime, the second-order coding rate improves by a factor of approximately $\sqrt{T}$ for a wide range of channel parameters. For both regimes, we propose coding techniques that incorporate a joint encoding of fresh and previous messages. In particular, for the central limit regime, we propose a coding technique with truncated memory to ensure that a summation of constants, which arises as a result of applications of the central limit theorem, does not diverge in the error analysis. Furthermore, we explore interesting variants of the basic streaming setup in the moderate deviations regime. We first consider a scenario with an erasure option at the decoder and show that both the exponents of the total error and the undetected error probabilities improve by factors of $T$. Next, by utilizing the erasure option, we show that the exponent of the total error probability can be improved to that of the undetected error probability (in the order sense) at the expense of a variable decoding delay. Finally, we also extend our results to the case where the message rate is not fixed but alternates between two values.
- We study the second-order asymptotics of information transmission using random Gaussian codebooks and nearest neighbor (NN) decoding over a power-limited stationary memoryless additive non-Gaussian noise channel. We show that the dispersion term depends on the non-Gaussian noise only through its second and fourth moments, thus complementing the capacity result (Lapidoth, 1996), which depends only on the second moment. Furthermore, we characterize the second-order asymptotics of point-to-point codes over $K$-sender interference networks with non-Gaussian additive noise. Specifically, we assume that each user's codebook is Gaussian and that NN decoding is employed, i.e., that interference from the $K-1$ unintended users (Gaussian interfering signals) is treated as noise at each decoder. We show that while the first-order term in the asymptotic expansion of the maximum number of messages depends on the power of the interfering codewords only through their sum, this does not hold for the second-order term.
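Nearest neighbor decoding as used above amounts to minimum Euclidean distance decoding over the codebook; a minimal sketch with illustrative names:

```python
import numpy as np

def nn_decode(y, codebook):
    """Return the index of the codeword closest to the received vector y
    in Euclidean distance (equivalently, maximum correlation for
    equal-power codewords)."""
    d = ((codebook - y) ** 2).sum(axis=1)
    return int(d.argmin())
```

Note that this rule ignores the noise statistics entirely, which is why it remains well defined (if mismatched) for non-Gaussian noise and for interference treated as noise.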
- In this paper, we consider single- and multi-user Gaussian channels with feedback under expected power constraints and with non-vanishing error probabilities. In the first of two contributions, we study asymptotic expansions for the additive white Gaussian noise (AWGN) channel with feedback under the average error probability formalism. By drawing ideas from Gallager and Nakiboğlu's work for the direct part and the meta-converse for the converse part, we establish the $\varepsilon$-capacity and show that it depends on $\varepsilon$ in general and so the strong converse fails to hold. Furthermore, we provide bounds on the second-order term in the asymptotic expansion. We show that for any positive integer $L$, the second-order term is bounded between a term proportional to $-\ln_{(L)} n$ (where $\ln_{(L)}(\cdot)$ is the $L$-fold nested logarithm function) and a term proportional to $+\sqrt{n\ln n}$ where $n$ is the blocklength. The lower bound on the second-order term shows that feedback does provide an improvement in the maximal achievable rate over the case where no feedback is available. In our second contribution, we establish the $\varepsilon$-capacity region for the AWGN multiple access channel (MAC) with feedback under the expected power constraint by combining ideas from hypothesis testing, information spectrum analysis, Ozarow's coding scheme, and power control.
- In this paper, we revisit the discrete lossy Gray-Wyner problem. In particular, we derive its optimal second-order coding rate region, its error exponent (reliability function) and its moderate deviations constant under mild conditions on the source. To obtain the second-order asymptotics, we extend some ideas from Watanabe's work (2015). In particular, we leverage the properties of an appropriate generalization of the conditional distortion-tilted information density, which was first introduced by Kostina and Verdú (2012). The converse part uses a perturbation argument by Gu and Effros (2009) in their strong converse proof of the discrete Gray-Wyner problem. The achievability part uses two novel elements: (i) a generalization of various type covering lemmas; and (ii) the uniform continuity of the conditional rate-distortion function in both the source (joint) distribution and the distortion level. To obtain the error exponent, for the achievability part, we use the same generalized type covering lemma and for the converse, we use the strong converse together with a change-of-measure technique. Finally, to obtain the moderate deviations constant, we apply the moderate deviations theorem to probabilities defined in terms of information spectrum quantities.
- This paper considers delay-limited communication over quasi-static fading channels under a long-term power constraint. A sequence of length-$n$ delay-limited codes for a quasi-static fading channel is said to be capacity-achieving if the codes achieve the delay-limited capacity, which is defined to be the maximum rate achievable by delay-limited codes. The delay-limited capacity, sometimes referred to as the zero-outage capacity in wireless communications, is the appropriate performance measure for delay-sensitive applications such as voice and video over fading channels. It is shown that for any sequence of capacity-achieving delay-limited codes with vanishing error probabilities, the normalized relative entropy between the output distribution induced by the length-$n$ code and the $n$-fold product of the capacity-achieving output distribution, denoted by $\frac{1}{n}D(p_{Y^n}\|p_{Y^n}^*)$, converges to zero. Additionally, we extend our convergence result to capacity-achieving delay-limited codes with non-vanishing error probabilities.
- We prove that 2-user Gaussian broadcast channels admit the strong converse. This implies that every sequence of block codes with an asymptotic average error probability smaller than one is such that all the limit points of the sequence of rate pairs must lie within the capacity region derived by Cover and Bergmans. The main mathematical tool required for our analysis is a logarithmic Sobolev inequality known as the Gaussian Poincaré inequality.
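For reference, the Gaussian Poincaré inequality used as the main tool can be stated as follows (standard form):

```latex
% For Z \sim \mathcal{N}(0, I_n) and any continuously differentiable
% f : \mathbb{R}^n \to \mathbb{R} with \mathbb{E}\bigl[\|\nabla f(Z)\|^2\bigr] < \infty,
\operatorname{Var}\bigl[ f(Z) \bigr] \;\le\; \mathbb{E}\bigl[ \|\nabla f(Z)\|^2 \bigr].
```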
- We derive upper and lower bounds for the error exponents of lossless streaming compression of two correlated sources under the blockwise and symbolwise settings. We consider the linear scaling regime in which the delay is a scalar multiple of the number of symbol pairs of interest. We show that for rate pairs satisfying certain constraints, the upper and lower bounds for the error exponent of blockwise codes coincide. For symbolwise codes, the bounds coincide for rate pairs satisfying the aforementioned constraints and a certain condition on the symbol pairs we wish to decode---namely, that their indices are asymptotically comparable to the blocklength. We also derive moderate deviations constants for blockwise and symbolwise codes, leveraging the error exponent results, and using appropriate Taylor series expansions. In particular, for blockwise codes, we derive an information spectrum-type strong converse, giving the complete characterization of the moderate deviations constants. For symbolwise codes, under an additional requirement on the backoff from the first-order fundamental limit, we can show that the moderate deviations constants are the same as the blockwise setting.
- This paper investigates the information-theoretic limits of energy-harvesting (EH) channels in the finite blocklength regime. The EH process is characterized by a sequence of i.i.d. random variables with finite variances. We use the save-and-transmit strategy proposed by Ozel and Ulukus (2012) together with Shannon's non-asymptotic achievability bound to obtain lower bounds on the achievable rates for both additive white Gaussian noise channels and discrete memoryless channels under EH constraints. The first-order terms of the lower bounds of the achievable rates are equal to $C$ and the second-order (backoff from capacity) terms are proportional to $-\sqrt{ \frac{\log n}{n}}$, where $n$ denotes the blocklength and $C$ denotes the capacity of the EH channel, which is the same as the capacity without the EH constraints. The constant of proportionality of the backoff term is found and qualitative interpretations are provided.
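The save-and-transmit idea can be illustrated with a toy simulation (not the paper's analysis): the encoder stays silent for an initial saving phase to accumulate harvested energy, then transmits at a power slightly below the mean harvest rate, so that an energy outage becomes unlikely. The exponential harvest distribution and all parameter values below are illustrative assumptions.

```python
import math
import random

def save_and_transmit_feasible(n, save_frac=0.05, mean_harvest=1.0,
                               tx_power=0.9, seed=0):
    """Simulate one run of save-and-transmit over n slots.

    Energy is harvested i.i.d. each slot (exponential with the given
    mean, an illustrative choice); the encoder is silent for the first
    m = save_frac * n slots, then spends tx_power per slot.  Returns
    True if the energy buffer never goes negative, i.e. the EH
    constraint is satisfied throughout the block.
    """
    rng = random.Random(seed)
    m = int(save_frac * n)
    buffer = 0.0
    for i in range(n):
        buffer += rng.expovariate(1.0 / mean_harvest)  # harvest this slot
        if i >= m:                                     # transmission phase
            buffer -= tx_power
            if buffer < 0:
                return False
    return True
```

With `tx_power` below `mean_harvest` and a modest saving phase, feasibility holds with high probability for large `n`; the shrinking length of the saving phase is what drives the $-\sqrt{\frac{\log n}{n}}$ backoff quoted above.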
- In this paper, we study a security problem on a simple wiretap network, consisting of a source node S, a destination node D, and an intermediate node R. The intermediate node connects the source and the destination nodes via a set of noiseless parallel channels, with sizes $n_1$ and $n_2$, respectively. A message $M$ is to be sent from S to D. The information in the network may be eavesdropped by a set of wiretappers. The wiretappers cannot communicate with one another. Each wiretapper can access a subset of channels, called a wiretap set. All the chosen wiretap sets form a wiretap pattern. A random key $K$ is generated at S and a coding scheme on $(M, K)$ is employed to protect $M$. We define two decoding classes at D: In Class-I, only $M$ is required to be recovered and in Class-II, both $M$ and $K$ are required to be recovered. The objective is to minimize $H(K)/H(M)$ for a given wiretap pattern under the perfect secrecy constraint. The first question we address is whether routing is optimal on this simple network. By enumerating all the wiretap patterns on the Class-I/II $(3,3)$ networks and harnessing the power of Shannon-type inequalities, we find that gaps exist between the bounds implied by routing and the bounds implied by Shannon-type inequalities for a small fraction~($<2\%$) of all the wiretap patterns. The second question we investigate is the following: What is $\min H(K)/H(M)$ for the remaining wiretap patterns where gaps exist? We study some simple wiretap patterns and find that their Shannon bounds (i.e., the lower bound induced by Shannon-type inequalities) can be achieved by linear codes, which means routing is not sufficient even for the ($3$, $3$) network. For some complicated wiretap patterns, we study the structures of linear coding schemes under the assumption that they can achieve the corresponding Shannon bounds....
- We evaluate the asymptotics of equivocations, their exponents as well as their second-order coding rates under various Rényi information measures. Specifically, we consider the effect of applying a hash function on a source and we quantify the level of non-uniformity and dependence of the compressed source from another correlated source when the number of copies of the sources is large. Unlike previous works that use Shannon information measures to quantify randomness, information or uniformity, we define our security measures in terms of a more general class of information measures--the Rényi information measures and their Gallager-type counterparts. The Shannon information measures arise as a special case of these Rényi information measures. We prove tight asymptotic results for the security measures and their exponential rates of decay. We also prove bounds on the second-order asymptotics and show that these bounds match when the magnitudes of the second-order coding rates are large. We do so by establishing new classes of non-asymptotic bounds on the equivocation and evaluating these bounds asymptotically using various probabilistic limit theorems.
- This monograph presents a unified treatment of single- and multi-user problems in Shannon's information theory where we depart from the requirement that the error probability decays asymptotically in the blocklength. Instead, the error probabilities for various problems are bounded above by a non-vanishing constant and the spotlight is shone on achievable coding rates as functions of the growing blocklengths. This represents the study of asymptotic estimates with non-vanishing error probabilities. In Part I, after reviewing the fundamentals of information theory, we discuss Strassen's seminal result for binary hypothesis testing where the type-I error probability is non-vanishing and the rate of decay of the type-II error probability with growing number of independent observations is characterized. In Part II, we use this basic hypothesis testing result to develop second- and sometimes, even third-order asymptotic expansions for point-to-point communication. In Part III, we consider network information theory problems for which the second-order asymptotics are known. These problems include some classes of channels with random state, the multiple-encoder distributed lossless source coding (Slepian-Wolf) problem and special cases of the Gaussian interference and multiple-access channels. Finally, we discuss avenues for further research.
- We prove the strong converse for the $N$-source Gaussian multiple access channel (MAC). In particular, we show that any rate tuple that can be supported by a sequence of codes with asymptotic average error probability less than one must lie in the Cover-Wyner capacity region. Our proof consists of the following. First, we perform an expurgation step to convert any given sequence of codes with asymptotic average error probability less than one to codes with asymptotic maximal error probability less than one. Second, we quantize the input alphabets with an appropriately chosen resolution. Upon quantization, we apply the wringing technique (by Ahlswede) on the quantized inputs to obtain further subcodes from the subcodes obtained in the expurgation step so that the resultant correlations among the symbols transmitted by the different sources vanish as the blocklength grows. Finally, we derive upper bounds on achievable sum-rates of the subcodes in terms of the type-II error of a binary hypothesis test. These upper bounds are then simplified through judicious choices of auxiliary output distributions. Our strong converse result carries over to the Gaussian interference channel under strong interference as long as the sum of the two asymptotic average error probabilities is less than one.
- Error and erasure exponents for the broadcast channel with degraded message sets are analyzed. The focus of our error probability analysis is on the main receiver where, nominally, both messages are to be decoded. A two-step decoding algorithm is proposed and analyzed. This receiver first attempts to decode both messages, failing which, it attempts to decode only the message representing the coarser information, i.e., the cloud center. This algorithm reflects the intuition that we should decode both messages only if we have confidence in the estimates; otherwise one should only decode the coarser information. The resulting error and erasure exponents, derived using the method of types, are expressed in terms of a penalized form of the modified random coding error exponent.
- This paper characterizes the second-order coding rates for lossy source coding with side information available at both the encoder and the decoder. We first provide non-asymptotic bounds for this problem and then specialize the non-asymptotic bounds for three different scenarios: discrete memoryless sources, Gaussian sources, and Markov sources. We obtain the second-order coding rates for these settings. It is interesting to observe that the second-order coding rate for Gaussian source coding with Gaussian side information available at both the encoder and the decoder is the same as that for Gaussian source coding without side information. Furthermore, regardless of the variance of the side information, the dispersion is $1/2$ nats squared per source symbol.
- This paper investigates the asymptotic expansion for the size of block codes defined for the additive white Gaussian noise (AWGN) channel with feedback under the following setting: A peak power constraint is imposed on every transmitted codeword, and the average error probability of decoding the transmitted message is non-vanishing as the blocklength increases. It is well-known that the presence of feedback does not increase the first-order asymptotics (i.e., capacity) in the asymptotic expansion for the AWGN channel. The main contribution of this paper is a self-contained proof of an upper bound on the asymptotic expansion for the AWGN channel with feedback. Combined with existing achievability results for the AWGN channel, our result implies that the presence of feedback does not improve the second- and third-order asymptotics. An auxiliary contribution is a proof of the strong converse for the parallel Gaussian channels with feedback under a peak power constraint.
- This paper studies the second-order asymptotics of the discrete memoryless multiple-access channel with degraded message sets. For a fixed average error probability $\epsilon\in(0,1)$ and an arbitrary point on the boundary of the capacity region, we characterize the speed of convergence of rate pairs that converge to that point for codes that have asymptotic error probability no larger than $\epsilon$, thus complementing an analogous result given previously for the Gaussian setting.
- This paper establishes that the strong converse holds for some classes of discrete memoryless multimessage multicast networks (DM-MMNs) whose corresponding cut-set bounds are tight, i.e., coincide with the set of achievable rate tuples. The strong converse for these classes of DM-MMNs implies that all sequences of codes with rate tuples belonging to the exterior of the cut-set bound have average error probabilities that necessarily tend to one (and are not simply bounded away from zero). Examples in the classes of DM-MMNs include wireless erasure networks, DM-MMNs consisting of independent discrete memoryless channels (DMCs) as well as single-destination DM-MMNs consisting of independent DMCs with destination feedback. Our elementary proof technique leverages properties of the Rényi divergence.
- The problem of channel coding with the erasure option is revisited for discrete memoryless channels. The interplay between the code rate, the undetected and total error probabilities is characterized. Using the information spectrum method, a sequence of codes of increasing blocklengths $n$ is designed to illustrate this tradeoff. Furthermore, for additive discrete memoryless channels with uniform input distribution, we establish that our analysis is tight with respect to the ensemble average. This is done by analyzing the ensemble performance in terms of a tradeoff between the code rate, the undetected and the total errors. This tradeoff is parametrized by the threshold in a generalized likelihood ratio test. Two asymptotic regimes are studied. First, the code rate tends to the capacity of the channel at a rate slower than $n^{-1/2}$ corresponding to the moderate deviations regime. In this case, both error probabilities decay subexponentially and asymmetrically. The precise decay rates are characterized. Second, the code rate tends to capacity at a rate of $n^{-1/2}$. In this case, the total error probability is asymptotically a positive constant while the undetected error probability decays as $\exp(- b n^{ 1/2})$ for some $b>0$. The proof techniques involve applications of a modified (or "shifted") version of the Gärtner-Ellis theorem and the type class enumerator method to characterize the asymptotic behavior of a sequence of cumulant generating functions.
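A threshold-based decoder with an erasure option can be sketched as follows. This is a simplified Forney-style rule for a BSC with brute-force codebook search, illustrative only and not the information-spectrum construction analyzed above: the decoder commits to the maximum-likelihood codeword only when its log-likelihood beats the runner-up by a margin, and erases otherwise.

```python
import math

def erasure_decode(y, codebook, p, threshold):
    """Decoding with an erasure option on a BSC(p).

    Returns the index of the ML codeword if its log-likelihood exceeds
    the runner-up's by at least `threshold` nats (a generalized
    likelihood ratio test); otherwise returns None (erasure).
    """
    def loglik(c):
        d = sum(a != b for a, b in zip(c, y))           # Hamming distance
        return d * math.log(p) + (len(y) - d) * math.log(1 - p)
    scores = sorted(((loglik(c), i) for i, c in enumerate(codebook)),
                    reverse=True)
    (best, i_best), (second, _) = scores[0], scores[1]
    return i_best if best - second >= threshold else None
```

Raising the threshold trades undetected errors for more erasures, which is exactly the tradeoff parametrization described in the abstract.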
- We consider block codes for degraded wiretap channels in which the legitimate receiver decodes the message with an asymptotic error probability no larger than $\varepsilon$ but the leakage to the eavesdropper vanishes. For discrete memoryless and Gaussian wiretap channels, we show that the maximum rate of transmission does not depend on $\varepsilon\in [0,1)$, i.e., such channels possess the partial strong converse property. Furthermore, we derive sufficient conditions for the partial strong converse property to hold for memoryless but non-stationary symmetric and degraded wiretap channels. Our proof techniques leverage the information spectrum method, which allows us to establish a necessary and sufficient condition for the partial strong converse to hold for general wiretap channels without any information stability assumptions.
- We study a form of unequal error protection that we term "unequal message protection" (UMP). The message set of a UMP code is a union of $m$ disjoint message classes. Each class has its own error protection requirement, with some classes needing better error protection than others. We analyze the tradeoff between rates of message classes and the levels of error protection of these codes. We demonstrate that there is a clear performance loss compared to homogeneous (classical) codes with equivalent parameters. This is in sharp contrast to previous literature that considers UMP codes. To obtain our results we generalize finite block length achievability and converse bounds due to Polyanskiy-Poor-Verdú. We evaluate our bounds for the binary symmetric and binary erasure channels, and analyze the asymptotic characteristic of the bounds in the fixed error and moderate deviations regimes. In addition, we consider two questions related to the practical construction of UMP codes. First, we study a "header" construction that prefixes the message class into a header followed by data protection using a standard homogeneous code. We show that, in general, this construction is not optimal at finite block lengths. We further demonstrate that our main UMP achievability bound can be obtained using coset codes, which suggests a path to implementation of tractable UMP codes.
- In 1975, Carleial presented a special case of an interference channel in which the interference does not reduce the capacity of the constituent point-to-point Gaussian channels. In this work, we show that if the inequalities in the conditions that Carleial stated are strict, the dispersions are similarly unaffected. More precisely, in this work, we characterize the second-order coding rates of the Gaussian interference channel in the strictly very strong interference regime. In other words, we characterize the speed of convergence of rates of optimal block codes towards a boundary point of the (rectangular) capacity region. These second-order rates are expressed in terms of the average probability of error and variances of some modified information densities which coincide with the dispersion of the (single-user) Gaussian channel. We thus conclude that the dispersions are unaffected by interference in this channel model.
- We derive the optimum second-order coding rates, known as second-order capacities, for erasure and list decoding. For erasure decoding for discrete memoryless channels, we show that second-order capacity is $\sqrt{V}\Phi^{-1}(\epsilon_t)$ where $V$ is the channel dispersion and $\epsilon_t$ is the total error probability, i.e., the sum of the erasure and undetected errors. We show numerically that the expected rate at finite blocklength for erasure decoding can exceed the finite blocklength channel coding rate. We also show that the analogous result also holds for lossless source coding with decoder side information, i.e., Slepian-Wolf coding. For list decoding, we consider list codes of deterministic size that scales as $\exp(\sqrt{n}l)$ and show that the second-order capacity is $l+\sqrt{V}\Phi^{-1}(\epsilon)$ where $\epsilon$ is the permissible error probability. We also consider lists of polynomial size $n^\alpha$ and derive bounds on the third-order coding rate in terms of the order of the polynomial $\alpha$. These bounds are tight for symmetric and singular channels. The direct parts of the coding theorems leverage the simple threshold decoder and converses are proved using variants of the hypothesis testing converse.
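To make the quantity $\sqrt{V}\Phi^{-1}(\epsilon_t)$ concrete, here is a stdlib-only computation for a binary symmetric channel, whose dispersion formula $V = p(1-p)\ln^2\frac{1-p}{p}$ (in nats$^2$) is standard:

```python
import math
from statistics import NormalDist

def bsc_dispersion(p):
    """Channel dispersion V of a BSC(p), in nats^2 per channel use."""
    return p * (1 - p) * math.log((1 - p) / p) ** 2

def erasure_second_order(p, eps_total):
    """sqrt(V) * Phi^{-1}(eps_t) for a BSC(p): the second-order capacity
    under erasure decoding quoted above.  Negative (a backoff from
    capacity) whenever eps_t < 1/2."""
    return math.sqrt(bsc_dispersion(p)) * NormalDist().inv_cdf(eps_total)
```

Note that for small total error probabilities the second-order term is negative, i.e., it quantifies the backoff from capacity at finite blocklength.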
- This paper shows that, under the average error probability formalism, the third-order term in the normal approximation for the additive white Gaussian noise channel with a maximal or equal power constraint is at least $\frac{1}{2} \log n + O(1)$. This matches the upper bound derived by Polyanskiy-Poor-Verdú (2010).
- This paper studies the second-order asymptotics of the Gaussian multiple-access channel with degraded message sets. For a fixed average error probability $\varepsilon \in (0,1)$ and an arbitrary point on the boundary of the capacity region, we characterize the speed of convergence of rate pairs that converge to that boundary point for codes that have asymptotic error probability no larger than $\varepsilon$. As a stepping stone to this local notion of second-order asymptotics, we study a global notion, and establish relationships between the two. We provide a numerical example to illustrate how the angle of approach to a boundary point affects the second-order coding rate. This is the first conclusive characterization of the second-order asymptotics of a network information theory problem in which the capacity region is not a polygon.
- We study non-asymptotic fundamental limits for transmitting classical information over memoryless quantum channels, i.e., we investigate the amount of classical information that can be transmitted when a quantum channel is used a finite number of times and a fixed, non-vanishing average error is permissible. We consider the classical capacity of quantum channels that are image-additive, including all classical to quantum channels, as well as the product state capacity of arbitrary quantum channels. In both cases we show that the non-asymptotic fundamental limit admits a second-order approximation that illustrates the speed at which the rate of optimal codes converges to the Holevo capacity as the blocklength tends to infinity. The behavior is governed by a new channel parameter, called channel dispersion, for which we provide a geometrical interpretation.
- We study the performance limits of state-dependent discrete memoryless channels with a discrete state available at both the encoder and the decoder. We establish the epsilon-capacity as well as necessary and sufficient conditions for the strong converse property for such channels when the sequence of channel states is not necessarily stationary, memoryless or ergodic. We then seek a finer characterization of these capacities in terms of second-order coding rates. The general results are supplemented by several examples including i.i.d. and Markov states and mixed channels.
- Bounds on the reliability function for the discrete memoryless relay channel are derived using the method of types. Two achievable error exponents are derived based on partial decode-forward and compress-forward which are well-known superposition block-Markov coding schemes. The derivations require combinations of the techniques involved in the proofs of Csiszár-Körner-Marton's packing lemma for the error exponent of channel coding and Marton's type covering lemma for the error exponent of source coding with a fidelity criterion. The decode-forward error exponent is evaluated on Sato's relay channel. From this example, it is noted that to obtain the fastest possible decay in the error probability for a fixed effective coding rate, one ought to optimize the number of blocks in the block-Markov coding scheme assuming the blocklength within each block is large. An upper bound on the reliability function is also derived using ideas from Haroutunian's lower bound on the error probability for point-to-point channel coding with feedback.
- We present novel non-asymptotic or finite blocklength achievability bounds for three side-information problems in network information theory. These include (i) the Wyner-Ahlswede-Korner (WAK) problem of almost-lossless source coding with rate-limited side-information, (ii) the Wyner-Ziv (WZ) problem of lossy source coding with side-information at the decoder and (iii) the Gel'fand-Pinsker (GP) problem of channel coding with noncausal state information available at the encoder. The bounds are proved using ideas from channel simulation and channel resolvability. Our bounds for all three problems improve on all previous non-asymptotic bounds on the error probability of the WAK, WZ and GP problems--in particular those derived by Verdú. Using our novel non-asymptotic bounds, we recover the general formulas for the optimal rates of these side-information problems. Finally, we also present achievable second-order coding rates by applying the multidimensional Berry-Esseen theorem to our new non-asymptotic bounds. Numerical results show that the second-order coding rates obtained using our non-asymptotic achievability bounds are superior to those obtained using existing finite blocklength bounds.
- This paper shows that the logarithm of the epsilon-error capacity (average error probability) for n uses of a discrete memoryless channel is upper bounded by the normal approximation plus a third-order term that does not exceed 1/2 log n + O(1) if the epsilon-dispersion of the channel is positive. This matches a lower bound by Y. Polyanskiy (2010) for discrete memoryless channels with positive reverse dispersion. If the epsilon-dispersion vanishes, the logarithm of the epsilon-error capacity is upper bounded by n times the capacity plus a constant term except for a small class of DMCs and epsilon >= 1/2.
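A sketch of the resulting normal approximation for a binary symmetric channel, using the standard BSC capacity and dispersion formulas; the O(1) term is dropped, so this is illustrative rather than an exact bound:

```python
import math
from statistics import NormalDist

def normal_approx_bsc(n, p, eps):
    """Normal approximation (in nats) for the log of the eps-error
    capacity of n uses of a BSC(p), including the 1/2 log n third-order
    term discussed above:

        n*C - sqrt(n*V) * Q^{-1}(eps) + 0.5 * log(n)
    """
    C = math.log(2) + p * math.log(p) + (1 - p) * math.log(1 - p)  # capacity
    V = p * (1 - p) * math.log((1 - p) / p) ** 2                   # dispersion
    q_inv = NormalDist().inv_cdf(1 - eps)                          # Q^{-1}(eps)
    return n * C - math.sqrt(n * V) * q_inv + 0.5 * math.log(n)
```

For eps < 1/2 the dispersion term dominates the 1/2 log n correction at moderate n, so the approximation sits strictly below n times the capacity.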
- We consider the scenario in which multiple sensors send spatially correlated data to a fusion center (FC) via independent Rayleigh-fading channels with additive noise. Assuming that the sensor data is sparse in some basis, we show that the recovery of this sparse signal can be formulated as a compressive sensing (CS) problem. To model the scenario in which the sensors operate with intermittently available energy that is harvested from the environment, we propose that each sensor transmits independently with some probability, and adapts the transmit power to its harvested energy. Due to the probabilistic transmissions, the elements of the equivalent sensing matrix are not Gaussian. Besides, since the sensors have different energy harvesting rates and different sensor-to-FC distances, the FC has different receive signal-to-noise ratios (SNRs) for each sensor. This is referred to as the inhomogeneity of SNRs. Thus, the elements of the sensing matrix are also not identically distributed. For this unconventional setting, we provide theoretical guarantees on the number of measurements for reliable and computationally efficient recovery, by showing that the sensing matrix satisfies the restricted isometry property (RIP), under reasonable conditions. We then compute an achievable system delay under an allowable mean-squared-error (MSE). Furthermore, using techniques from large deviations theory, we analyze the impact of inhomogeneity of SNRs on the so-called k-restricted eigenvalues, which governs the number of measurements required for the RIP to hold. We conclude that the number of measurements required for the RIP is not sensitive to the inhomogeneity of SNRs, when the number of sensors n is large and the sparsity of the sensor data (signal) k grows slower than the square root of n. Our analysis is corroborated by extensive numerical results.
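A toy model of the random sensing matrix can illustrate the near-isometry at play. The Bernoulli-masked Gaussian entries, the 1/sqrt(q) normalization, and all parameter names below are simplifying assumptions for illustration, not the paper's exact construction:

```python
import math
import random

def sensing_row(n, q, snrs, rng):
    """One measurement row: sensor j transmits with probability q
    (modeling intermittent harvested energy) over a fading channel
    modeled as a Gaussian gain; snrs[j] scales its contribution
    (inhomogeneous receive SNRs).  The 1/sqrt(q) factor normalizes
    each entry to unit second moment when snrs[j] == 1."""
    return [snrs[j] * rng.gauss(0.0, 1.0) / math.sqrt(q)
            if rng.random() < q else 0.0
            for j in range(n)]

def restricted_isometry_ratio(m, n, q, x, seed=0):
    """Empirical ||Ax||^2 / (m * ||x||^2) for an m-row random sensing
    matrix A; a value near 1 indicates the matrix acts almost
    isometrically on the sparse vector x, in the spirit of the RIP."""
    rng = random.Random(seed)
    snrs = [1.0] * n                  # homogeneous SNRs for simplicity
    energy = 0.0
    for _ in range(m):
        row = sensing_row(n, q, snrs, rng)
        y = sum(r * xi for r, xi in zip(row, x))
        energy += y * y
    norm2 = sum(xi * xi for xi in x)
    return energy / (m * norm2)
```

With enough measurements the ratio concentrates around 1, mirroring the regime in which the paper shows the RIP holds despite non-Gaussian, non-identically-distributed entries.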
- We consider the Gel'fand-Pinsker problem in which the channel and state are general, i.e., possibly non-stationary, non-memoryless and non-ergodic. Using the information spectrum method and a non-trivial modification of the piggyback coding lemma by Wyner, we prove that the capacity can be expressed as an optimization over the difference of a spectral inf- and a spectral sup-mutual information rate. We consider various specializations including the case where the channel and state are memoryless but not necessarily stationary.
- In this paper, we study the finite blocklength limits of state-dependent discrete memoryless channels where the discrete memoryless state is known noncausally at the encoder. For the point-to-point case, this is known as the Gel'fand-Pinsker channel model. We define the $(n,\epsilon)$-capacity of the Gel'fand-Pinsker channel as the maximal rate of transmission of a message subject to the condition that the length of the block-code is $n$ and the average error probability is no larger than $\epsilon$. This paper provides a lower bound for the $(n,\epsilon)$-capacity of the Gel'fand-Pinsker channel model, and hence an upper bound on the dispersion, a fundamental second-order quantity in the study of the performance limits of discrete memoryless channels. In addition, we extend the work of Y. Steinberg (2005), in which the (degraded) broadcast channel extension of the Gel'fand-Pinsker model was studied. We provide an inner bound to the $(n,\epsilon)$-capacity region for this broadcast channel model using a combination of ideas of Gel'fand-Pinsker coding, superposition coding and dispersion (finite blocklength) analysis.
- We analyze the dispersions of distributed lossless source coding (the Slepian-Wolf problem), the multiple-access channel and the asymmetric broadcast channel. For the two-encoder Slepian-Wolf problem, we introduce a quantity known as the entropy dispersion matrix, which is analogous to the scalar dispersions that have gained interest recently. We prove a global dispersion result that can be expressed in terms of this entropy dispersion matrix and provides intuition on the approximate rate losses at a given blocklength and error probability. To gain better intuition about the rate at which the non-asymptotic rate region converges to the Slepian-Wolf boundary, we define and characterize two operational dispersions: the local dispersion and the weighted sum-rate dispersion. The former represents the rate of convergence to a point on the Slepian-Wolf boundary while the latter represents the fastest rate for which a weighted sum of the two rates converges to its asymptotic fundamental limit. Interestingly, when we approach either of the two corner points, the local dispersion is characterized not by a univariate Gaussian but a bivariate one as well as a subset of off-diagonal elements of the aforementioned entropy dispersion matrix. Finally, we demonstrate the versatility of our achievability proof technique by providing inner bounds for the multiple-access channel and the asymmetric broadcast channel in terms of dispersion matrices. All our proofs are unified by a so-called vector rate redundancy theorem which is proved using the multidimensional Berry-Esseen theorem.
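For small alphabets, the entropy dispersion matrix can be computed exactly by enumeration: it is the covariance matrix of the entropy-density vector whose mean is (H(X|Y), H(Y|X), H(X,Y)). A stdlib-only sketch (in nats; the dict encoding of the pmf is an implementation choice):

```python
import math

def entropy_dispersion_matrix(pmf):
    """Mean and covariance of the entropy-density vector
        ( -log p(x|y), -log p(y|x), -log p(x,y) )
    under the joint pmf {(x, y): p}, computed by exact enumeration.
    The covariance is the 3x3 entropy dispersion matrix described above."""
    px, py = {}, {}
    for (x, y), p in pmf.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    def density(x, y):
        p = pmf[(x, y)]
        return (-math.log(p / py[y]), -math.log(p / px[x]), -math.log(p))
    mean = [0.0, 0.0, 0.0]
    for (x, y), p in pmf.items():
        d = density(x, y)
        mean = [m + p * di for m, di in zip(mean, d)]
    cov = [[0.0] * 3 for _ in range(3)]
    for (x, y), p in pmf.items():
        d = density(x, y)
        for i in range(3):
            for j in range(3):
                cov[i][j] += p * (d[i] - mean[i]) * (d[j] - mean[j])
    return mean, cov
```

For a doubly symmetric binary source, symmetry forces the two conditional-entropy variances on the diagonal to coincide, and the off-diagonal entries are the ones that enter the corner-point behavior noted above.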
- We study the moderate-deviations (MD) setting for lossy source coding of stationary memoryless sources. More specifically, we derive fundamental compression limits of source codes whose rates are $R(D) \pm \epsilon_n$, where $R(D)$ is the rate-distortion function and $\epsilon_n$ is a sequence that dominates $\sqrt{1/n}$. This MD setting is complementary to the large-deviations and central limit settings and was studied by Altug and Wagner for the channel coding setting. We show, for finite-alphabet and Gaussian sources, that as in the central limit-type results, the so-called dispersion for lossy source coding plays a fundamental role in the MD setting as well.
- We propose a general methodology for performing statistical inference within a "rare-events regime" that was recently suggested by Wagner, Viswanath and Kulkarni. Our approach allows one to easily establish consistent estimators for a very large class of canonical estimation problems, in a large alphabet setting. These include the problems studied in the original paper, such as entropy and probability estimation, in addition to many other interesting ones. We particularly illustrate this approach by consistently estimating the size of the alphabet and the range of the probabilities. We start by proposing an abstract methodology based on constructing a probability measure with the desired asymptotic properties. We then demonstrate two concrete constructions by casting the Good-Turing estimator as a pseudo-empirical measure, and by using the theory of mixture model estimation.
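The Good-Turing estimator at the heart of the pseudo-empirical-measure construction is simple to state: the total probability of unseen symbols is estimated by the fraction of samples that appear exactly once. A minimal sketch (ours, for illustration):

```python
from collections import Counter

def good_turing_missing_mass(samples):
    """Good-Turing estimate of the probability mass of unseen symbols:
    N1 / n, where N1 is the number of symbols observed exactly once
    among the n samples."""
    counts = Counter(samples)
    n1 = sum(1 for c in counts.values() if c == 1)
    return n1 / len(samples)

sample = ["a", "b", "a", "c", "d", "a", "b", "e"]
m0 = good_turing_missing_mass(sample)  # "c", "d", "e" seen once -> 3/8
```

In a large-alphabet, rare-events regime most observed symbols are singletons, so this estimate carries substantial mass, which is precisely what makes it a useful building block for the measures constructed in the paper.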
- We consider the secret key generation problem when sources are randomly excited by the sender and there is a noiseless public discussion channel. Our setting is thus similar to recent works on channels with action-dependent states where the channel state may be influenced by some of the parties involved. We derive single-letter expressions for the secret key capacity through a type of source emulation analysis. We also derive lower bounds on the achievable reliability and secrecy exponents, i.e., the exponential rates of decay of the probability of decoding error and of the information leakage. These exponents allow us to determine a set of strongly-achievable secret key rates. For degraded eavesdroppers the maximum strongly-achievable rate equals the secret key capacity; our exponents can also be specialized to previously known results. In deriving our strong achievability results we introduce a coding scheme that combines wiretap coding (to excite the channel) and key extraction (to distill keys from residual randomness). The secret key capacity is naturally seen to be a combination of both source- and channel-type randomness. Through examples we illustrate a fundamental interplay between the portion of the secret key rate due to each type of randomness. We also illustrate inherent tradeoffs between the achievable reliability and secrecy exponents. Our new scheme also naturally accommodates rate limits on the public discussion. We show that under rate constraints we are able to achieve larger rates than those that can be attained through a pure source emulation strategy.
- We consider the problem of high-dimensional Ising (graphical) model selection. We propose a simple algorithm for structure estimation based on the thresholding of the empirical conditional variation distances. We introduce a novel criterion for tractable graph families, where this method is efficient, based on the presence of sparse local separators between node pairs in the underlying graph. For such graphs, the proposed algorithm has a sample complexity of $n=\Omega(J_{\min}^{-2}\log p)$, where $p$ is the number of variables, and $J_{\min}$ is the minimum (absolute) edge potential in the model. We also establish nonasymptotic necessary and sufficient conditions for structure estimation.
- We consider the problem of high-dimensional Gaussian graphical model selection. We identify a set of graphs for which an efficient estimation algorithm exists, and this algorithm is based on thresholding of empirical conditional covariances. Under a set of transparent conditions, we establish structural consistency (or sparsistency) for the proposed algorithm, when the number of samples $n = \Omega(J_{\min}^{-2} \log p)$, where $p$ is the number of variables and $J_{\min}$ is the minimum (absolute) edge potential of the graphical model. The sufficient conditions for sparsistency are based on the notion of walk-summability of the model and the presence of sparse local vertex separators in the underlying graph. We also derive novel non-asymptotic necessary conditions on the number of samples required for sparsistency.
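The thresholding idea can be sketched in a few lines: declare an edge between i and j only if the conditional covariance of (i, j) stays large for every candidate separator. The toy below (our simplification, restricted to separators of size at most one and using an exact covariance matrix in place of an empirical one) recovers a 3-node Markov chain.

```python
def conditional_covariance(S, i, j, k=None):
    """Sigma_{ij|k} = Sigma_ij - Sigma_ik * Sigma_kj / Sigma_kk;
    k=None returns the marginal covariance Sigma_ij."""
    if k is None:
        return S[i][j]
    return S[i][j] - S[i][k] * S[k][j] / S[k][k]

def estimate_edges(S, threshold):
    """Declare edge (i, j) when the conditional covariance exceeds the
    threshold for every conditioning choice (separators of size <= 1)."""
    p = len(S)
    edges = set()
    for i in range(p):
        for j in range(i + 1, p):
            seps = [None] + [k for k in range(p) if k not in (i, j)]
            if min(abs(conditional_covariance(S, i, j, k))
                   for k in seps) > threshold:
                edges.add((i, j))
    return edges

# Exact covariance of a Gaussian Markov chain 0 - 1 - 2 with rho = 0.5.
rho = 0.5
S = [[1, rho, rho ** 2], [rho, 1, rho], [rho ** 2, rho, 1]]
edges = estimate_edges(S, threshold=0.1)  # recovers {(0, 1), (1, 2)}
```

Conditioning on node 1 drives the (0, 2) conditional covariance to zero, since node 1 separates 0 from 2 in the chain, while the true edges survive every conditioning set; sparse local separators make this work with small conditioning sets even in large graphs.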
- This paper establishes information-theoretic limits in estimating a finite field low-rank matrix given random linear measurements of it. These linear measurements are obtained by taking inner products of the low-rank matrix with random sensing matrices. Necessary and sufficient conditions on the number of measurements required are provided. It is shown that these conditions are sharp and the minimum-rank decoder is asymptotically optimal. The reliability function of this decoder is also derived by appealing to de Caen's lower bound on the probability of a union. The sufficient condition also holds when the sensing matrices are sparse - a scenario that may be amenable to efficient decoding. More precisely, it is shown that if the $n \times n$ sensing matrices contain, on average, $\Omega(n \log n)$ nonzero entries, the number of measurements required is the same as that when the sensing matrices are dense and contain entries drawn uniformly at random from the field. Analogies are drawn between the above results and rank-metric codes in the coding theory literature. In fact, we are also strongly motivated by understanding when minimum rank distance decoding of random rank-metric codes succeeds. To this end, we derive distance properties of equiprobable and sparse rank-metric codes. These distance properties provide a precise geometric interpretation of the fact that the sparse ensemble requires as few measurements as the dense one. Finally, we provide a non-exhaustive procedure to search for the unknown low-rank matrix.
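A toy instance of the minimum-rank decoder (our brute-force illustration over GF(2), not the paper's procedure): among all matrices consistent with the linear measurements, return one of smallest rank. With three well-chosen measurements, a rank-one 2x2 matrix is recovered from fewer measurements than it has entries.

```python
from itertools import product

def rank_gf2_2x2(M):
    """Rank over GF(2) of a 2x2 binary matrix."""
    if all(v == 0 for row in M for v in row):
        return 0
    det = (M[0][0] * M[1][1] + M[0][1] * M[1][0]) % 2
    return 2 if det else 1

def measure(A, X):
    """Linear measurement <A, X> = sum_ij A_ij X_ij over GF(2)."""
    return sum(A[i][j] * X[i][j] for i in range(2) for j in range(2)) % 2

def min_rank_decode(sensing, b):
    """Exhaustive minimum-rank decoding over all 2x2 GF(2) matrices."""
    candidates = [[[a, c], [d, e]]
                  for a, c, d, e in product((0, 1), repeat=4)]
    feasible = [X for X in candidates
                if all(measure(A, X) == bi for A, bi in zip(sensing, b))]
    return min(feasible, key=rank_gf2_2x2)

X_true = [[1, 0], [0, 0]]                      # rank-1 unknown matrix
sensing = [[[1, 0], [0, 0]],                   # three fixed sensing matrices
           [[0, 1], [1, 0]],
           [[0, 0], [0, 1]]]
b = [measure(A, X_true) for A in sensing]      # 3 measurements < 4 entries
X_hat = min_rank_decode(sensing, b)
```

Here the only other feasible candidate has rank 2, so the minimum-rank rule singles out the truth; the paper quantifies how many such measurements suffice at scale, for both dense and sparse sensing ensembles.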
- We study the problem of learning a latent tree graphical model where samples are available only from a subset of variables. We propose two consistent and computationally efficient algorithms for learning minimal latent trees, that is, trees without any redundant hidden nodes. Unlike many existing methods, the observed nodes (or variables) are not constrained to be leaf nodes. Our first algorithm, recursive grouping, builds the latent tree recursively by identifying sibling groups using so-called information distances. One of the main contributions of this work is our second algorithm, which we refer to as CLGrouping. CLGrouping starts with a pre-processing procedure in which a tree over the observed variables is constructed. This global step groups the observed nodes that are likely to be close to each other in the true latent tree, thereby guiding subsequent recursive grouping (or equivalent procedures) on much smaller subsets of variables. This results in more accurate and efficient learning of latent trees. We also present regularized versions of our algorithms that learn latent tree approximations of arbitrary distributions. We compare the proposed algorithms to other methods by performing extensive numerical experiments on various latent tree graphical models such as hidden Markov models and star graphs. In addition, we demonstrate the applicability of our methods on real-world datasets by modeling the dependency structure of monthly stock returns in the S&P index and of the words in the 20 newsgroups dataset.
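The sibling-identification step of recursive grouping rests on additivity of information distances along the tree: for leaves i and j attached to the same hidden node, the difference d(i,k) - d(j,k) is the same for every other node k. A small sketch (our hypothetical 4-leaf, 2-hidden-node tree with exact distances; empirical distances would add noise and require a tolerance):

```python
def phi(d, i, j, k):
    """Phi_ijk = d(i,k) - d(j,k); constant over k iff i and j hang off
    the same hidden node of the latent tree (additive tree metric)."""
    return d[(i, k)] - d[(j, k)]

def are_siblings(d, i, j, nodes, tol=1e-9):
    others = [k for k in nodes if k not in (i, j)]
    values = [phi(d, i, j, k) for k in others]
    return max(values) - min(values) < tol

# Exact additive distances: leaves 1, 2 under hidden node h1 (edge lengths
# 1 and 2), leaves 3, 4 under hidden node h2 (lengths 1 and 3), h1-h2 = 2.
raw = {(1, 2): 3, (1, 3): 4, (1, 4): 6, (2, 3): 5, (2, 4): 7, (3, 4): 4}
dist = {}
for (i, j), v in raw.items():
    dist[(i, j)] = dist[(j, i)] = v

nodes = [1, 2, 3, 4]
```

Pairs (1, 2) and (3, 4) pass the constancy test while cross-pairs such as (1, 3) fail it, which is exactly the signal recursive grouping uses to build the latent tree bottom-up; CLGrouping applies such tests only within small neighborhoods of an observed-variable tree.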
- The problem of learning forest-structured discrete graphical models from i.i.d. samples is considered. An algorithm based on pruning of the Chow-Liu tree through adaptive thresholding is proposed. It is shown that this algorithm is both structurally consistent and risk consistent and the error probability of structure learning decays faster than any polynomial in the number of samples under fixed model size. For the high-dimensional scenario where the size of the model d and the number of edges k scale with the number of samples n, sufficient conditions on (n,d,k) are given for the algorithm to satisfy structural and risk consistencies. In addition, the extremal structures for learning are identified; we prove that the independent (resp. tree) model is the hardest (resp. easiest) to learn using the proposed algorithm in terms of error rates for structure learning.
- The problem of learning tree-structured Gaussian graphical models from independent and identically distributed (i.i.d.) samples is considered. The influence of the tree structure and the parameters of the Gaussian distribution on the learning rate as the number of samples increases is discussed. Specifically, the error exponent corresponding to the event that the estimated tree structure differs from the actual unknown tree structure of the distribution is analyzed. Finding the error exponent reduces to a least-squares problem in the very noisy learning regime. In this regime, it is shown that the extremal tree structure that minimizes the error exponent is the star for any fixed set of correlation coefficients on the edges of the tree. If the magnitudes of all the correlation coefficients are less than 0.63, it is also shown that the tree structure that maximizes the error exponent is the Markov chain. In other words, the star and the chain graphs represent the hardest and the easiest structures to learn in the class of tree-structured Gaussian graphical models. This result can also be intuitively explained by correlation decay: pairs of nodes which are far apart, in terms of graph distance, are unlikely to be mistaken as edges by the maximum-likelihood estimator in the asymptotic regime.
- The problem of maximum-likelihood (ML) estimation of discrete tree-structured distributions is considered. Chow and Liu established that ML-estimation reduces to the construction of a maximum-weight spanning tree using the empirical mutual information quantities as the edge weights. Using the theory of large-deviations, we analyze the exponent associated with the error probability of the event that the ML-estimate of the Markov tree structure differs from the true tree structure, given a set of independently drawn samples. By exploiting the fact that the output of ML-estimation is a tree, we establish that the error exponent is equal to the exponential rate of decay of a single dominant crossover event. We prove that in this dominant crossover event, a non-neighbor node pair replaces a true edge of the distribution that is along the path of edges in the true tree graph connecting the nodes in the non-neighbor pair. Using ideas from Euclidean information theory, we then analyze the scenario of ML-estimation in the very noisy learning regime and show that the error exponent can be approximated as a ratio, which is interpreted as the signal-to-noise ratio (SNR) for learning tree distributions. We show via numerical experiments that in this regime, our SNR approximation is accurate.
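The Chow-Liu construction referenced above is short to implement: compute pairwise mutual informations and take a maximum-weight spanning tree. The sketch below (our illustration; it uses exact pairwise marginals of a hypothetical binary Markov chain in place of empirical ones) recovers the chain structure.

```python
import math

def mutual_information(joint):
    """Mutual information (bits) from a joint pmf dict {(x, y): p}."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return sum(p * math.log2(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)

def chow_liu_tree(pairwise):
    """Maximum-weight spanning tree (Kruskal with union-find) using
    mutual informations as edge weights -- the Chow-Liu construction."""
    weights = {e: mutual_information(j) for e, j in pairwise.items()}
    parent = {}
    def find(u):
        while parent.get(u, u) != u:
            u = parent[u]
        return u
    tree = set()
    for (i, j) in sorted(weights, key=weights.get, reverse=True):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.add((i, j))
    return tree

# Pairwise marginals of a binary Markov chain 0 - 1 - 2, flip prob 0.2.
one_step = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
two_step = {(0, 0): 0.34, (0, 1): 0.16, (1, 0): 0.16, (1, 1): 0.34}
pairwise = {(0, 1): one_step, (1, 2): one_step, (0, 2): two_step}
tree = chow_liu_tree(pairwise)  # recovers {(0, 1), (1, 2)}
```

The non-edge (0, 2) has strictly smaller mutual information than the true edges (correlation decays over the two-step path), so it is the candidate for the dominant crossover event analyzed above: an error occurs precisely when empirical fluctuations reverse such an ordering.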
- As an example of the recently-introduced concept of rate of innovation, signals that are linear combinations of a finite number of Diracs per unit time can be acquired by linear filtering followed by uniform sampling. However, in reality, samples are rarely noiseless. In this paper, we introduce a novel stochastic algorithm to reconstruct a signal with finite rate of innovation from its noisy samples. Even though variants of this problem have been approached previously, satisfactory solutions are only available for certain classes of sampling kernels, for example kernels which satisfy the Strang-Fix condition. In this paper, we consider the infinite-support Gaussian kernel, which does not satisfy the Strang-Fix condition; other classes of kernels can also be accommodated. Our algorithm is based on Gibbs sampling, a Markov chain Monte Carlo (MCMC) method. Extensive numerical simulations demonstrate the accuracy and robustness of our algorithm.
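The measurement model that the Gibbs sampler inverts can be sketched directly: a stream of Diracs with amplitudes c_m at locations t_m is filtered by a Gaussian kernel and sampled uniformly with additive noise, y[k] = \sum_m c_m \phi(kT - t_m) + w[k]. A minimal forward-model sketch (ours; all parameter values are illustrative, and the reconstruction step itself is omitted):

```python
import math
import random

def gaussian_kernel(t, sigma=0.3):
    """Infinite-support Gaussian sampling kernel."""
    return math.exp(-t * t / (2 * sigma * sigma))

def noisy_fri_samples(amplitudes, locations, n_samples, T=0.1,
                      noise_std=0.01, seed=0):
    """Uniform noisy samples y[k] = sum_m c_m * phi(kT - t_m) + w[k]
    of a Dirac stream filtered by the Gaussian kernel."""
    rng = random.Random(seed)
    return [sum(c * gaussian_kernel(k * T - t)
                for c, t in zip(amplitudes, locations))
            + rng.gauss(0, noise_std)
            for k in range(n_samples)]

# Two Diracs per unit interval: rate of innovation 4
# (two amplitudes plus two locations per unit time).
y = noisy_fri_samples(amplitudes=[1.0, -0.5],
                      locations=[0.3, 0.7], n_samples=11)
```

The unknowns (c_m, t_m) are few even though the kernel has infinite support; the Gibbs sampler draws them alternately from their conditional posteriors given y, which is what allows reconstruction without the Strang-Fix machinery.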