results for au:Dai_B in:cs

- Mar 20 2017 cs.CV arXiv:1703.06029v1Despite the substantial progress in recent years, the image captioning techniques are still far from being perfect.Sentences produced by existing methods, e.g. those based on RNNs, are often overly rigid and lacking in variability. This issue is related to a learning principle widely used in practice, that is, to maximize the likelihood of training samples. This principle encourages high resemblance to the "ground-truth" captions while suppressing other reasonable descriptions. Conventional evaluation metrics, e.g. BLEU and METEOR, also favor such restrictive methods. In this paper, we explore an alternative approach, with the aim to improve the naturalness and diversity -- two essential properties of human expression. Specifically, we propose a new framework based on Conditional Generative Adversarial Networks (CGAN), which jointly learns a generator to produce descriptions conditioned on images and an evaluator to assess how well a description fits the visual content. It is noteworthy that training a sequence generator is nontrivial. We overcome the difficulty by Policy Gradient, a strategy stemming from Reinforcement Learning, which allows the generator to receive early feedback along the way. We tested our method on two large datasets, where it performed competitively against real people in our user study and outperformed other methods on various tasks.
- The physical layer security in the up-link of the wireless communication systems is often modeled as the multiple access wiretap channel (MAC-WT), and recently it has received a lot attention. In this paper, the MAC-WT has been re-visited by considering the situation that the legitimate receiver feeds his received channel output back to the transmitters via two noiseless channels, respectively. This model is called the MAC-WT with noiseless feedback. Inner and outer bounds on the secrecy capacity region of this feedback model are provided. To be specific, we first present a decode-and-forward (DF) inner bound on the secrecy capacity region of this feedback model, and this bound is constructed by allowing each transmitter to decode the other one's transmitted message from the feedback, and then each transmitter uses the decoded message to re-encode his own messages, i.e., this DF inner bound allows the independent transmitters to co-operate with each other. Then, we provide a hybrid inner bound which is strictly larger than the DF inner bound, and it is constructed by using the feedback as a tool not only to allow the independent transmitters to co-operate with each other, but also to generate two secret keys respectively shared between the legitimate receiver and the two transmitters. Finally, we give a sato-type outer bound on the secrecy capacity region of this feedback model. The results of this paper are further explained via a Gaussian example.
- Learning to hash plays a fundamentally important role in the efficient image and video retrieval and many other computer vision problems. However, due to the binary outputs of the hash functions, the learning of hash functions is very challenging. In this paper, we propose a novel approach to learn stochastic hash functions such that the learned hashing codes can be used to regenerate the inputs. We develop an efficient stochastic gradient learning algorithm which can avoid the notorious difficulty caused by binary output constraint, and directly optimize the parameters of the hash functions and the associated generative model jointly. The proposed method can be applied to both $L2$ approximate nearest neighbor search (L2NNS) and maximum inner product search (MIPS). Extensive experiments on a variety of large-scale datasets show that the proposed method achieves significantly better retrieval results than previous state-of-the-arts.
- Many machine learning tasks, such as learning with invariance and policy evaluation in reinforcement learning, can be characterized as problems of learning from conditional distributions. In such problems, each sample $x$ itself is associated with a conditional distribution $p(z|x)$ represented by samples $\{z_i\}_{i=1}^M$, and the goal is to learn a function $f$ that links these conditional distributions to target values $y$. These learning problems become very challenging when we only have limited samples or in the extreme case only one sample from each conditional distribution. Commonly used approaches either assume that $z$ is independent of $x$, or require an overwhelmingly large samples from each conditional distribution. To address these challenges, we propose a novel approach which employs a new min-max reformulation of the learning from conditional distribution problem. With such new reformulation, we only need to deal with the joint distribution $p(z,x)$. We also design an efficient learning algorithm, Embedding-SGD, and establish theoretical sample complexity for such problems. Finally, our numerical experiments on both synthetic and real-world datasets show that the proposed approach can significantly improve over the existing algorithms.
- The finite state Markov channel (FSMC), where the channel transition probability is controlled by a state undergoing a Markov process, is a useful model for the mobile wireless communication channel. In this paper, we investigate the security issue in the mobile wireless communication systems by considering the FSMC with an eavesdropper, which we call the finite state Markov wiretap channel (FSM-WC). We assume that the state is perfectly known by the legitimate receiver and the eavesdropper, and through a noiseless feedback channel, the legitimate receiver sends his received channel output and the state back to the transmitter after some time delay. Inner and outer bounds on the capacity-equivocation regions of the FSM-WC with delayed state feedback and with or without delayed channel output feedback are provided in this paper, and we show that these bounds meet if the eavesdropper's received symbol is a degraded version of the legitimate receiver's. The above results are further explained via degraded Gaussian and Gaussian fading examples.
- Mar 18 2016 cs.LG arXiv:1603.05629v4Kernel classifiers and regressors designed for structured data, such as sequences, trees and graphs, have significantly advanced a number of interdisciplinary areas such as computational biology and drug design. Typically, kernels are designed beforehand for a data type which either exploit statistics of the structures or make use of probabilistic generative models, and then a discriminative classifier is learned based on the kernels via convex optimization. However, such an elegant two-stage approach also limited kernel methods from scaling up to millions of data points, and exploiting discriminative information to learn feature representations. We propose, structure2vec, an effective and scalable approach for structured data representation based on the idea of embedding latent variable models into feature spaces, and learning such feature spaces using discriminative information. Interestingly, structure2vec extracts features by performing a sequence of function mappings in a way similar to graphical model inference procedures, such as mean field and belief propagation. In applications involving millions of data points, we showed that structure2vec runs 2 times faster, produces models which are $10,000$ times smaller, while at the same time achieving the state-of-the-art predictive performance.
- This paper studies the energy efficiency of the cloud radio access network (C-RAN), specifically focusing on two fundamental and different downlink transmission strategies, namely the data-sharing strategy and the compression strategy. In the data-sharing strategy, the backhaul links connecting the central processor (CP) and the base-stations (BSs) are used to carry user messages -- each user's messages are sent to multiple BSs; the BSs locally form the beamforming vectors then cooperatively transmit the messages to the user. In the compression strategy, the user messages are precoded centrally at the CP, which forwards a compressed version of the analog beamformed signals to the BSs for cooperative transmission. This paper compares the energy efficiencies of the two strategies by formulating an optimization problem of minimizing the total network power consumption subject to user target rate constraints, where the total network power includes the BS transmission power, BS activation power, and load-dependent backhaul power. To tackle the discrete and nonconvex nature of the optimization problems, we utilize the techniques of reweighted $\ell_1$ minimization and successive convex approximation to devise provably convergent algorithms. Our main finding is that both the optimized data-sharing and compression strategies in C-RAN achieve much higher energy efficiency as compared to the non-optimized coordinated multi-point transmission, but their comparative effectiveness in energy saving depends on the user target rate. At low user target rate, data-sharing consumes less total power than compression, however, as the user target rate increases, the backhaul power consumption for data-sharing increases significantly leading to better energy efficiency of compression at the high user rate regime.
- Bayesian methods are appealing in their flexibility in modeling complex data and ability in capturing uncertainty in parameters. However, when Bayes' rule does not result in tractable closed-form, most approximate inference algorithms lack either scalability or rigorous guarantees. To tackle this challenge, we propose a simple yet provable algorithm, \emphParticle Mirror Descent (PMD), to iteratively approximate the posterior density. PMD is inspired by stochastic functional mirror descent where one descends in the density space using a small batch of data points at each iteration, and by particle filtering where one uses samples to approximate a function. We prove result of the first kind that, with $m$ particles, PMD provides a posterior density estimator that converges in terms of $KL$-divergence to the true posterior in rate $O(1/\sqrt{m})$. We demonstrate competitive empirical performances of PMD compared to several approximate inference algorithms in mixture models, logistic regression, sparse Gaussian processes and latent Dirichlet allocation on large scale datasets.
- This paper considers a downlink cloud radio access network (C-RAN) in which all the base-stations (BSs) are connected to a central computing cloud via digital backhaul links with finite capacities. Each user is associated with a user-centric cluster of BSs; the central processor shares the user's data with the BSs in the cluster, which then cooperatively serve the user through joint beamforming. Under this setup, this paper investigates the user scheduling, BS clustering and beamforming design problem from a network utility maximization perspective. Differing from previous works, this paper explicitly considers the per-BS backhaul capacity constraints. We formulate the network utility maximization problem for the downlink C-RAN under two different models depending on whether the BS clustering for each user is dynamic or static over different user scheduling time slots. In the former case, the user-centric BS cluster is dynamically optimized for each scheduled user along with the beamforming vector in each time-frequency slot, while in the latter case the user-centric BS cluster is fixed for each user and we jointly optimize the user scheduling and the beamforming vector to account for the backhaul constraints. In both cases, the nonconvex per-BS backhaul constraints are approximated using the reweighted l1-norm technique. This approximation allows us to reformulate the per-BS backhaul constraints into weighted per-BS power constraints and solve the weighted sum rate maximization problem through a generalized weighted minimum mean square error approach. This paper shows that the proposed dynamic clustering algorithm can achieve significant performance gain over existing naive clustering schemes. This paper also proposes two heuristic static clustering schemes that can already achieve a substantial portion of the gain.
- The general perception is that kernel methods are not scalable, and neural nets are the methods of choice for nonlinear learning problems. Or have we simply not tried hard enough for kernel methods? Here we propose an approach that scales up kernel methods using a novel concept called "doubly stochastic functional gradients". Our approach relies on the fact that many kernel methods can be expressed as convex optimization problems, and we solve the problems by making two unbiased stochastic approximations to the functional gradient, one using random training points and another using random functions associated with the kernel, and then descending using this noisy functional gradient. We show that a function produced by this procedure after $t$ iterations converges to the optimal function in the reproducing kernel Hilbert space in rate $O(1/t)$, and achieves a generalization performance of $O(1/\sqrt{t})$. This doubly stochasticity also allows us to avoid keeping the support vectors and to implement the algorithm in a small memory footprint, which is linear in number of iterations and independent of data dimension. Our approach can readily scale kernel methods up to the regimes which are dominated by neural nets. We show that our method can achieve competitive performance to neural nets in datasets such as 8 million handwritten digits from MNIST, 2.3 million energy materials from MolecularSpace, and 1 million photos from ImageNet.
- In this paper, we investigate the effects of an additional trusted relay node on the secrecy of multiple-access wiretap channel (MAC-WT) by considering the model of multiple-access relay wiretap channel (MARC-WT). More specifically, first, we investigate the discrete memoryless MARC-WT. Three inner bounds (with respect to decode-forward (DF), noise-forward (NF) and compress-forward (CF) strategies) on the secrecy capacity region are provided. Second, we investigate the degraded discrete memoryless MARC-WT, and present an outer bound on the secrecy capacity region of this degraded model. Finally, we investigate the Gaussian MARC-WT, and find that the NF and CF strategies help to enhance Tekin-Yener's achievable secrecy rate region of Gaussian MAC-WT. Moreover, we find that if the noise variance of the transmitters-relay channel is smaller than that of the transmitters-receiver channel, the DF strategy may also enhance Tekin-Yener's achievable secrecy rate region of Gaussian MAC-WT, and it may perform even better than the NF and CF strategies.
- The overhead of internal network monitoring motivates techniques of network tomography. Network coding (NC) presents a new opportunity for network tomography as NC introduces topology-dependent correlation that can be further exploited in topology estimation. Compared with traditional methods, network tomography with NC has many advantages such as the improvement of tomography accuracy and the reduction of complexity in choosing monitoring paths. In this paper we first introduce the problem of tomography with NC and then propose the taxonomy criteria to classify various methods. We also present existing solutions and future trend. We expect that our comprehensive review on network tomography with NC can serve as a good reference for researchers and practitioners working in the area.
- Software Defined Networking (SDN) is a revolutionary network architecture that separates out network control functions from the underlying equipment and is an increasingly trend to help enterprises build more manageable data centers where big data processing emerges as an important part of applications. To concurrently process large-scale data, MapReduce with an open source implementation named Hadoop is proposed. In practical Hadoop systems one kind of issue that vitally impacts the overall performance is know as the NP-complete minimum make span problem. One main solution is to assign tasks on data local nodes to avoid link occupation since network bandwidth is a scarce resource. Many methodologies for enhancing data locality are proposed such as the HDS and state-of-the-art scheduler BAR. However, all of them either ignore allocating tasks in a global view or disregard available bandwidth as the basis for scheduling. In this paper we propose a heuristic bandwidth-aware task scheduler BASS to combine Hadoop with SDN. It is not only able to guarantee data locality in a global view but also can efficiently assign tasks in an optimized way. Both examples and experiments demonstrate that BASS has the best performance in terms of job completion time. To our knowledge, BASS is the first to exploit talent of SDN for big data processing and we believe it points out a new trend for large-scale data processing.
- Given a hypothesis space, the large volume principle by Vladimir Vapnik prioritizes equivalence classes according to their volume in the hypothesis space. The volume approximation has hitherto been successfully applied to binary learning problems. In this paper, we extend it naturally to a more general definition which can be applied to several transductive problem settings, such as multi-class, multi-label and serendipitous learning. Even though the resultant learning method involves a non-convex optimization problem, the globally optimal solution is almost surely unique and can be obtained in O(n^3) time. We theoretically provide stability and error analyses for the proposed method, and then experimentally show that it is promising.
- We investigate the effects of an additional relay node on the secrecy of broadcast channels by considering the model of relay broadcast channels with confidential messages. We show that this additional relay node can increase the achievable secrecy rate region of the broadcast channels with confidential messages. More specifically, first, we investigate the discrete memoryless relay broadcast channels with two confidential messages and one common message. Three inner bounds (with respect to decode-forward, generalized noise-forward and compress-forward strategies) and an outer bound on the capacity-equivocation region are provided. Second, we investigate the discrete memoryless relay broadcast channels with two confidential messages. Inner and outer bounds on the capacity-equivocation region are provided. Finally, we investigate the discrete memoryless relay broadcast channels with one confidential message and one common message. Inner and outer bounds on the capacity-equivocation region are provided, and the results are further explained via a Gaussian example. Compared with Csiszar-Korner's work on broadcast channels with confidential messages (BCC), we find that with the help of the relay node, the secrecy capacity region of the Gaussian BCC is enhanced.
- Spectral methods have greatly advanced the estimation of latent variable models, generating a sequence of novel and efficient algorithms with strong theoretical guarantees. However, current spectral algorithms are largely restricted to mixtures of discrete or Gaussian distributions. In this paper, we propose a kernel method for learning multi-view latent variable models, allowing each mixture component to be nonparametric. The key idea of the method is to embed the joint distribution of a multi-view latent variable into a reproducing kernel Hilbert space, and then the latent parameters are recovered using a robust tensor power method. We establish that the sample complexity for the proposed method is quadratic in the number of latent components and is a low order polynomial in the other relevant parameters. Thus, our non-parametric tensor approach to learning latent variable models enjoys good sample and computational efficiencies. Moreover, the non-parametric tensor power method compares favorably to EM algorithm and other existing spectral algorithms in our experiments.
- Aug 12 2013 cs.NI arXiv:1308.2002v2A Peer-to-Peer (P2P) network can boost its performance if peers are provided with underlying network-layer routing topology. The task of inferring the network-layer routing topology and link performance from an end host to a set of other hosts is termed as network tomography, and it normally requires host computers to send probing messages. We design a passive network tomography method that does not requires any probing messages and takes a free ride over data flows in P2P networks. It infers routing topology based on end-to-end delay correlation estimation (DCE) without requiring any synchronization or cooperation from the intermediate routers. We implement and test our method in the real world Internet environment and achieved the accuracy of 92% in topology recovery. We also perform extensive simulation in OMNet++ to evaluate its performance over large scale networks, showing that its topology recovery accuracy is about 95% for large networks.
- Jul 20 2013 cs.NI arXiv:1307.5085v1Tomography is important for network design and routing optimization. Prior approaches require either precise time synchronization or complex cooperation. Furthermore, active tomography consumes explicit probeing resulting in limited scalability. To address the first issue we propose a novel Delay Correlation Estimation methodology named DCE with no need of synchronization and special cooperation. For the second issue we develop a passive realization mechanism merely using regular data flow without explicit bandwidth consumption. Extensive simulations in OMNeT++ are made to evaluate its accuracy where we show that DCE measured delay correlation is highly identical with the true value. Also from test result we find that mechanism of passive realization is able to achieve both regular data transmission and purpose of tomography with excellent robustness versus different background traffic and package size.
- May 01 2013 cs.CG arXiv:1304.7833v1We consider the problem of computing the time-convex hull of a point set under the general $L_p$ metric in the presence of a straight-line highway in the plane. The traveling speed along the highway is assumed to be faster than that off the highway, and the shortest time-path between a distant pair may involve traveling along the highway. The time-convex hull ${TCH}(P)$ of a point set $P$ is the smallest set containing both $P$ and \emphall shortest time-paths between any two points in ${TCH}(P)$. In this paper we give an algorithm that computes the time-convex hull under the $L_p$ metric in optimal $O(n\log n)$ time for a given set of $n$ points and a real number $p$ with $1\le p \le \infty$.
- Several security models of multiple-access channel (MAC) are investigated. First, we study the degraded MAC with confidential messages, where two users transmit their confidential messages (no common message) to a destination, and each user obtains a degraded version of the output of the MAC. Each user views the other user as a eavesdropper, and wishes to keep its confidential message as secret as possible from the other user. Measuring each user's uncertainty about the other user's confidential message by equivocation, the inner and outer bounds on the capacity-equivocation region for this model have been provided. The result is further explained via the binary and Gaussian examples. Second, the discrete memoryless multiple-access wiretap channel (MAC-WT) is studied, where two users transmit their corresponding confidential messages (no common message) to a legitimate receiver, while an additional wiretapper wishes to obtain the messages via a wiretap channel. This new model is considered into two cases: the general MAC-WT with cooperative encoders, and the degraded MAC-WT with non-cooperative encoders. The capacity-equivocation region is totally determined for the cooperative case, and inner and outer bounds on the capacity-equivocation region are provided for the non-cooperative case. For both cases, the results are further explained via the binary examples.
- Recent technological advances in wireless communications offer new opportunities and challenges for relay network.To enhance system performance, Demodulate-Network Coding (Dm-NC) scheme has been examined at relay node; it works directly to De-map the received signals and after that forward the mixture to the destination. Simulation analysis has been proven that the performance of Dm-NC has superiority over analog-NC. In addition, the Quantize-Decode-NC scheme (QDF-NC) has been introduced. The presented simulation results clearly provide that the QDF-NC perform better than analog-NC. The toggle between analogNC and QDF-NC is simulated in order to investigate delay and power consumption reduction at relay node.
- We propose a general information-theoretic approach called Seraph (SEmi-supervised metRic leArning Paradigm with Hyper-sparsity) for metric learning that does not rely upon the manifold assumption. Given the probability parameterized by a Mahalanobis distance, we maximize the entropy of that probability on labeled data and minimize it on unlabeled data following entropy regularization, which allows the supervised and unsupervised parts to be integrated in a natural and meaningful way. Furthermore, Seraph is regularized by encouraging a low-rank projection induced from the metric. The optimization of Seraph is solved efficiently and stably by an EM-like scheme with the analytical E-Step and convex M-Step. Experiments demonstrate that Seraph compares favorably with many well-known global and local metric learning methods.
- Gaussian processes (GPs) provide a nonparametric representation of functions. However, classical GP inference suffers from high computational cost and it is difficult to design nonstationary GP priors in practice. In this paper, we propose a sparse Gaussian process model, EigenGP, based on the Karhunen-Loeve (KL) expansion of a GP prior. We use the Nystrom approximation to obtain data dependent eigenfunctions and select these eigenfunctions by evidence maximization. This selection reduces the number of eigenfunctions in our model and provides a nonstationary covariance function. To handle nonlinear likelihoods, we develop an efficient expectation propagation (EP) inference algorithm, and couple it with expectation maximization for eigenfunction selection. Because the eigenfunctions of a Gaussian kernel are associated with clusters of samples - including both the labeled and unlabeled - selecting relevant eigenfunctions enables EigenGP to conduct semi-supervised learning. Our experimental results demonstrate improved predictive performance of EigenGP over alternative state-of-the-art sparse GP and semisupervised learning methods for regression, classification, and semisupervised classification.
- This paper presents a relay selection for decode-and-forward based on network coding (DF-NC) and analog-NC protocols in general scheme of cellular network system. In the propose scheme the two source node simultaneously transmit their own information to all the relays as well as the destination node, and then, a single relay i.e. best with a minimum symbol error rate (SER) will be selected to forward the new version of the received signal. Simulation results show that, the DF-NC scheme with considerable performance has exactness over analog-NC scheme. To improve the system performance, optimal power allocation between the two sources and the best relay is determined based on the asymptotic SER. By increasing the number of relays node, the optimum power allocation achieve better performance than asymptotic SER.
- In this paper, first, we investigate the model of degraded broadcast channel with side information and confidential messages. This work is from Steinberg's work on the degraded broadcast channel with causal and noncausal side information, and Csisz$\acute{a}$r-Körner's work on broadcast channel with confidential messages. Inner and outer bounds on the capacity-equivocation regions are provided for the noncausal and causal cases. Superposition coding and double-binning technique are used in the corresponding achievability proofs. Then, we investigate the degraded broadcast channel with side information, confidential messages and noiseless feedback. The noiseless feedback is from the non-degraded receiver to the channel encoder. Inner and outer bounds on the capacity-equivocation region are provided for the noncausal case, and the capacity-equivocation region is determined for the causal case. Compared with the model without feedback, we find that the noiseless feedback helps to enlarge the inner bounds for both causal and noncausal cases. In the achievability proof of the feedback model, the noiseless feedback is used as a secret key shared by the non-degraded receiver and the transmitter, and therefore, the code construction for the feedback model is a combination of superposition coding, Gel'fand-Pinsker's binning, block Markov coding and Ahlswede-Cai's secret key on the feedback system.
- We propose a general information-theoretic approach called Seraph (SEmi-supervised metRic leArning Paradigm with Hyper-sparsity) for metric learning that does not rely upon the manifold assumption. Given the probability parameterized by a Mahalanobis distance, we maximize the entropy of that probability on labeled data and minimize it on unlabeled data following entropy regularization, which allows the supervised and unsupervised parts to be integrated in a natural and meaningful way. Furthermore, Seraph is regularized by encouraging a low-rank projection induced from the metric. The optimization of Seraph is solved efficiently and stably by an EM-like scheme with the analytical E-Step and convex M-Step. Experiments demonstrate that Seraph compares favorably with many well-known global and local metric learning methods.
- This submission has been withdrawn by the author.