results for au:Xu_J in:cs

- Mar 20 2018 cs.CV arXiv:1803.06598v1Cascaded Regression (CR) based methods have been proposed to solve facial landmarks detection problem, which learn a series of descent directions by multiple cascaded regressors separately trained in coarse and fine stages. They outperform the traditional gradient descent based methods in both accuracy and running speed. However, cascaded regression is not robust enough because each regressor's training data comes from the output of previous regressor. Moreover, training multiple regressors requires lots of computing resources, especially for deep learning based methods. In this paper, we develop a Self-Iterative Regression (SIR) framework to improve the model efficiency. Only one self-iterative regressor is trained to learn the descent directions for samples from coarse stages to fine stages, and parameters are iteratively updated by the same regressor. Specifically, we proposed Landmarks-Attention Network (LAN) as our regressor, which concurrently learns features around each landmark and obtains the holistic location increment. By doing so, not only the rest of regressors are removed to simplify the training process, but the number of model parameters is significantly decreased. The experiments demonstrate that with only 3.72M model parameters, our proposed method achieves the state-of-the-art performance.
- Mar 16 2018 cs.CV arXiv:1803.05471v1Aim: Early detection and correct diagnosis of lung cancer are the most important steps in improving patient outcome. This study aims to assess which deep learning models perform best in lung cancer diagnosis. Methods: Non-small cell lung carcinoma and small cell lung carcinoma biopsy specimens were consecutively obtained and stained. The specimen slides were diagnosed by two experienced pathologists (over 20 years). Several deep learning models were trained to discriminate cancer and non-cancer biopsies. Result: Deep learning models give reasonable AUC from 0.8810 to 0.9119. Conclusion: The deep learning analysis could help to speed up the detection process for the whole-slide image (WSI) and keep the comparable detection rate with human observer.
- As one of most fascinating machine learning techniques, deep neural network (DNN) has demonstrated excellent performance in various intelligent tasks such as image classification. DNN achieves such performance, to a large extent, by performing expensive training over huge volumes of training data. To reduce the data storage and transfer overhead in smart resource-limited Internet-of-Thing (IoT) systems, effective data compression is a "must-have" feature before transferring real-time produced dataset for training or classification. While there have been many well-known image compression approaches (such as JPEG), we for the first time find that a human-visual based image compression approach such as JPEG compression is not an optimized solution for DNN systems, especially with high compression ratios. To this end, we develop an image compression framework tailored for DNN applications, named "DeepN-JPEG", to embrace the nature of deep cascaded information process mechanism of DNN architecture. Extensive experiments, based on "ImageNet" dataset with various state-of-the-art DNNs, show that "DeepN-JPEG" can achieve ~3.5x higher compression rate over the popular JPEG solution while maintaining the same accuracy level for image recognition, demonstrating its great potential of storage and power efficiency in DNN-based smart IoT system design.
- This paper studies a new mobile edge computing (MEC) setup where an unmanned aerial vehicle (UAV) is served by cellular ground base stations (GBSs) for computation offloading. The UAV flies between a give pair of initial and final locations, during which it needs to accomplish certain computation tasks by offloading them to some selected GBSs along its trajectory for parallel execution. Under this setup, we aim to minimize the UAV's mission completion time by optimizing its trajectory jointly with the computation offloading scheduling, subject to the maximum speed constraint of the UAV, and the computation capacity constraints at GBSs. The joint UAV trajectory and computation offloading optimization problem is, however, non-convex and thus difficult to be solved optimally. To tackle this problem, we propose an efficient algorithm to obtain a high-quality suboptimal solution. Numerical results show that the proposed design significantly reduces the UAV's mission completion time, as compared to benchmark schemes.
- Laser power has become a viable solution to provide convenient and sustainable energy supply to unmanned aerial vehicles (UAVs). In this paper, we study a laser-powered UAV wireless communication system, where a laser transmitter sends laser beams to charge a fixed-wing UAV in flight, and the UAV uses the harvested laser energy to communicate with a ground station. To maintain the UAV's sustainable operation, its total energy consumption cannot exceed that harvested from the laser transmitter. Under such a laser energy harvesting constraint, we maximize the downlink communication throughput from the UAV to the ground station over a finite time duration, by jointly optimizing the UAV's trajectory and its transmit power allocation. However, due to the complicated UAV energy consumption model, this problem is non-convex and difficult to be solved. To tackle the problem, we first consider a special case with a double-circular UAV trajectory which balances the tradeoff between maximizing the performance of laser energy harvesting versus wireless communication at the UAV. Next, based on the obtained double-circular trajectory, we propose an efficient solution to the general problem, by applying the techniques of alternating optimization and sequential convex programming (SCP). Finally, numerical results are provided to validate the communication throughput performance of the proposed design.
- Feb 27 2018 cs.CL arXiv:1802.08970v1Generating plausible and fluent sentence with desired properties has long been a challenge. Most of the recent works use recurrent neural networks (RNNs) and their variants to predict following words given previous sequence and target label. In this paper, we propose a novel framework to generate constrained sentences via Gibbs Sampling. The candidate sentences are revised and updated iteratively, with sampled new words replacing old ones. Our experiments show the effectiveness of the proposed method to generate plausible and diverse sentences.
- We propose a novel locally adaptive learning estimator for enhancing the inter- and intra- discriminative capabilities of Deep Neural Networks, which can be used as improved loss layer for semantic image segmentation tasks. Most loss layers compute pixel-wise cost between feature maps and ground truths, ignoring spatial layouts and interactions between neighboring pixels with same object category, and thus networks cannot be effectively sensitive to intra-class connections. Stride by stride, our method firstly conducts adaptive pooling filter operating over predicted feature maps, aiming to merge predicted distributions over a small group of neighboring pixels with same category, and then computes cost between the merged distribution vector and their category label. Such design can make groups of neighboring predictions from same category involved into estimations on predicting correctness with respect to their category, and hence train networks to be more sensitive to regional connections between adjacent pixels based on their categories. In the experiments on Pascal VOC 2012 segmentation datasets, the consistently improved results show that our proposed approach achieves better segmentation masks against previous counterparts.
- This paper studies a multi-user cooperative mobile-edge computing (MEC) system, in which a local mobile user can offload intensive computation tasks to multiple nearby edge devices serving as helpers for remote execution. We focus on the scenario where the local user has a number of independent tasks that can be executed in parallel but cannot be further partitioned. We consider a time division multiple access (TDMA) communication protocol, in which the local user can offload computation tasks to the helpers and download results from them over pre-scheduled time slots. Under this setup, we minimize the local user's computation latency by optimizing the task assignment jointly with the time and power allocations, subject to individual energy constraints at the local user and the helpers. However, the joint task assignment and wireless resource allocation problem is a mixed-integer non-linear program (MINLP) that is hard to solve optimally. To tackle this challenge, we first relax it into a convex problem, and then propose an efficient suboptimal solution based on the optimal solution to the relaxed convex problem. Finally, numerical results show that our proposed joint design significantly reduces the local user's computation latency, as compared against other benchmark schemes that design the task assignment separately from the offloading/downloading resource allocations and local execution.
- Low-rank matrix completion (MC) has achieved great success in many real-world data applications. A latent feature model formulation is usually employed and, to improve prediction performance, the similarities between latent variables can be exploited by pairwise learning, e.g., the graph regularized matrix factorization (GRMF) method. However, existing GRMF approaches often use a squared L2 norm to measure the pairwise difference, which may be overly influenced by dissimilar pairs and lead to inferior prediction. To fully empower pairwise learning for matrix completion, we propose a general optimization framework that allows a rich class of (non-)convex pairwise penalty functions. A new and efficient algorithm is further developed to uniformly solve the optimization problem, with a theoretical convergence guarantee. In an important situation where the latent variables form a small number of subgroups, its statistical guarantee is also fully characterized. In particular, we theoretically characterize the complexity-regularized maximum likelihood estimator, as a special case of our framework. It has a better error bound when compared to the standard trace-norm regularized matrix completion. We conduct extensive experiments on both synthetic and real datasets to demonstrate the superior performance of this general framework.
- Redundancy is a fundamental characteristic of many biological processes such as those in the genetic, visual, muscular and nervous system; yet its function has not been fully understood. The conventional interpretation of redundancy is that it serves as a fault-tolerance mechanism, which leads to redundancy's de facto application in man-made systems for reliability enhancement. On the contrary, our previous works have demonstrated an example where redundancy can be engineered solely for enhancing other aspects of the system, namely accuracy and precision. This design was inspired by the binocular structure of the human vision which we believe may share a similar operation. In this paper, we present a unified theory describing how such utilization of redundancy is feasible through two complementary mechanisms: representational redundancy (RPR) and entangled redundancy (ETR). Besides the previous works, we point out two additional examples where our new understanding of redundancy can be applied to justify a system's superior performance. One is the human musculoskeletal system (HMS) - a biological instance, and one is the deep residual neural network (ResNet) - an artificial counterpart. We envision that our theory would provide a framework for the future development of bio-inspired redundant artificial systems as well as assist the studies of the fundamental mechanisms governing various biological processes.
- Sensing is the process of deriving signals from the environment that allows artificial systems to interact with the physical world. The Shannon theorem specifies the maximum rate at which information can be acquired. However, this upper bound is hard to achieve in many man-made systems. The biological visual systems, on the other hand, have highly efficient signal representation and processing mechanisms that allow precise sensing. In this work, we argue that redundancy is one of the critical characteristics for such superior performance. We show architectural advantages by utilizing redundant sensing, including correction of mismatch error and significant precision enhancement. For a proof-of-concept demonstration, we have designed a heuristic-based analog-to-digital converter - a zero-dimensional quantizer. Through Monte Carlo simulation with the error probabilistic distribution as a priori, the performance approaching the Shannon limit is feasible. In actual measurements without knowing the error distribution, we observe at least 2-bit extra precision. The results may also help explain biological processes including the dominance of binocular vision, the functional roles of the fixational eye movements, and the structural mechanisms allowing hyperacuity.
- In this paper we study the differentially private Empirical Risk Minimization (ERM) problem in different settings. For smooth (strongly) convex loss function with or without (non)-smooth regularization, we give algorithms that achieve either optimal or near optimal utility bounds with less gradient complexity compared with previous work. For ERM with smooth convex loss function in high-dimensional ($p\gg n$) setting, we give an algorithm which achieves the upper bound with less gradient complexity than previous ones. At last, we generalize the expected excess empirical risk from convex loss functions to non-convex ones satisfying the Polyak-Lojasiewicz condition and give a tighter upper bound on the utility than the one in \citeijcai2017-548.
- In this paper, we study the Empirical Risk Minimization problem in the non-interactive local model of differential privacy. We first show that if the ERM loss function is $(\infty, T)$-smooth, then we can avoid a dependence of the sample complexity, to achieve error $\alpha$, on the exponential of the dimensionality $p$ with base $1/\alpha$ (\em i.e., $\alpha^{-p}$), which answers a question in \citesmith2017interaction. Our approach is based on Bernstein polynomial approximation. Then, we propose player-efficient algorithms with $1$-bit communication complexity and $O(1)$ computation cost for each player. The error bound is asymptotically the same as the original one. Also with additional assumptions we show a server efficient algorithm with polynomial running time. At last, we propose (efficient) non-interactive locally differential private algorithms, based on different types of polynomial approximations, for learning the set of k-way marginal queries and the set of smooth queries.
- Feb 13 2018 cs.CV arXiv:1802.03617v1Bronchoscopy inspection as a follow-up procedure from the radiological imaging plays a key role in lung disease diagnosis and determining treatment plans for the patients. Doctors needs to make a decision whether to biopsy the patients timely when performing bronchoscopy. However, the doctors also needs to be very selective with biopsies as biopsies may cause uncontrollable bleeding of the lung tissue which is life-threaten. To help doctors to be more selective on biopsies and provide a second opinion on diagnosis, in this work, we propose a computer-aided diagnosis (CAD) system for lung diseases including cancers and tuberculosis (TB). The system is developed based on transfer learning. We propose a novel transfer learning method: sentential fine-tuning . Compared to traditional fine-tuning methods, our methods achieves the best performance. We obtained a overall accuracy of 77.0% a dataset of 81 normal cases, 76 tuberculosis cases and 277 lung cancer cases while the other traditional transfer learning methods achieve an accuracy of 73% and 68%. . The detection accuracy of our method for cancers, TB and normal cases are 87%, 54% and 91% respectively. This indicates that the CAD system has potential to improve lung disease diagnosis accuracy in bronchoscopy and it also might be used to be more selective with biopsies.
- In this paper, we revisit the large-scale constrained linear regression problem and propose faster methods based on some recent developments in sketching and optimization. Our algorithms combine (accelerated) mini-batch SGD with a new method called two-step preconditioning to achieve an approximate solution with a time complexity lower than that of the state-of-the-art techniques for the low precision case. Our idea can also be extended to the high precision case, which gives an alternative implementation to the Iterative Hessian Sketch (IHS) method with significantly improved time complexity. Experiments on benchmark and synthetic datasets suggest that our methods indeed outperform existing ones considerably in both the low and high precision cases.
- Feb 06 2018 cs.CL arXiv:1802.01345v2Existing text generation methods tend to produce repeated and "boring" expressions. To tackle this problem, we propose a new text generation model, called Diversity-Promoting Generative Adversarial Network (DP-GAN). The proposed model assigns low reward for repeated text and high reward for "novel" text, encouraging the generator to produce diverse and informative text. Moreover, we propose a novel language-model based discriminator, which can better distinguish novel text from repeated text without the saturation problem compared with existing classifier-based discriminators. The experimental results on review generation and dialogue generation tasks demonstrate that our method can generate substantially more diverse and informative text than existing baseline methods. The code is available at https://github.com/lancopku/DPGAN
- Jan 23 2018 cs.CV arXiv:1801.06742v2Sufficient training data is normally required to train deeply learned models. However, the number of pedestrian images per ID in person re-identification (re-ID) datasets is usually limited, since manually annotations are required for multiple camera views. To produce more data for training deeply learned models, generative adversarial network (GAN) can be leveraged to generate samples for person re-ID. However, the samples generated by vanilla GAN usually do not have labels. So in this paper, we propose a virtual label called Multi-pseudo Regularized Label (MpRL) and assign it to the generated images. With MpRL, the generated samples will be used as supplementary of real training data to train a deep model in a semi-supervised learning fashion. Considering data bias between generated and real samples, MpRL utilizes different contributions from predefined training classes. The contribution-based virtual labels are automatically assigned to generated samples to reduce ambiguous prediction in training. Meanwhile, MpRL only relies on predefined training classes without using extra classes. Furthermore, to reduce over-fitting, a regularized manner is applied to MpRL to regularize the learning process. To verify the effectiveness of MpRL, two state-of-the-art convolutional neural networks (CNNs) are adopted in our experiments. Experiments demonstrate that by assigning MpRL to generated samples, we can further improve the person re-ID performance on three datasets i.e., Market-1501, DukeMTMCreID, and CUHK03. The proposed method obtains +6.29%, +6.30% and +5.58% improvements in rank-1 accuracy over a strong CNN baseline respectively, and outperforms the state-of-the- art methods.
- Jan 19 2018 cs.DC arXiv:1801.05868v1Mobile Edge Computing (MEC) pushes computing functionalities away from the centralized cloud to the network edge, thereby meeting the latency requirements of many emerging mobile applications and saving backhaul network bandwidth. Although many existing works have studied computation offloading policies, service caching is an equally, if not more important, design topic of MEC, yet receives much less attention. Service caching refers to caching application services and their related databases/libraries in the edge server (e.g. MEC-enabled BS), thereby enabling corresponding computation tasks to be executed. Because only a small number of application services can be cached in resource-limited edge server at the same time, which services to cache has to be judiciously decided to maximize the edge computing performance. In this paper, we investigate the extremely compelling but much less studied problem of dynamic service caching in MEC-enabled dense cellular networks. We propose an efficient online algorithm, called OREO, which jointly optimizes dynamic service caching and task offloading to address a number of key challenges in MEC systems, including service heterogeneity, unknown system dynamics, spatial demand coupling and decentralized coordination. Our algorithm is developed based on Lyapunov optimization and Gibbs sampling, works online without requiring future information, and achieves provable close-to-optimal performance. Simulation results show that our algorithm can effectively reduce computation latency for end users while keeping energy consumption low.
- Graph embedding has been proven to be efficient and effective in facilitating graph analysis. In this paper, we present a novel spectral framework called NOn-Backtracking Embedding (NOBE), which offers a new perspective that organizes graph data at a deep level by tracking the flow traversing on the edges with backtracking prohibited. Further, by analyzing the non-backtracking process, a technique called graph approximation is devised, which provides a channel to transform the spectral decomposition on an edge-to-edge matrix to that on a node-to-node matrix. Theoretical guarantees are provided by bounding the difference between the corresponding eigenvalues of the original graph and its graph approximation. Extensive experiments conducted on various real-world networks demonstrate the efficacy of our methods on both macroscopic and microscopic levels, including clustering and structural hole spanner detection.
- This paper studies an unmanned aerial vehicle (UAV)-enabled wireless powered communication network (WPCN), in which a UAV is dispatched as a mobile access point (AP) to serve a set of ground users periodically. The UAV employs the radio frequency (RF) wireless power transfer (WPT) to charge the users in the downlink, and the users use the harvested RF energy to send independent information to the UAV in the uplink. Unlike the conventional WPCN with fixed APs, the UAV-enabled WPCN can exploit the mobility of the UAV via periodic trajectory design, jointly with the transmission resource allocation optimization, to improve the system performance. In particular, we aim to maximize the uplink common (minimum) throughput among all ground users over a finite UAV's flight period, subject to its maximum speed constraint and the users' energy neutrality constraints. The resulting problem is non-convex and thus difficult to be solved optimally. To tackle this challenge, we first consider an ideal case without the maximum UAV speed constraint, and obtain the optimal solution to the relaxed problem. The optimal solution shows that the UAV should successively hover above a finite number of ground locations for downlink WPT, as well as above each of the ground users for uplink communication. Next, based on the above multi-location-hovering solution, we propose a successive hover-and-fly trajectory, jointly with the downlink and uplink power allocations, to find an efficient suboptimal solution to the problem with the maximum UAV speed constraint. Numerical results show that the proposed UAV-enabled WPCN achieves significant common throughput gain over the conventional WPCN with a fixed-location AP.
- Suppose a database containing $M$ records is replicated across $N$ servers, and a user wants to privately retrieve one record by accessing the servers such that identity of the retrieved record is secret against any up to $T$ servers. A scheme designed for this purpose is called a $T$-private information retrieval ($T$-PIR) scheme. Three indexes are concerned for PIR schemes: (1)rate, indicating the amount of retrieved information per unit of downloaded data. The highest achievable rate is characterized by the capacity; (2) sub-packetization, reflexing the implementation complexity for linear schemes; (3) field size. We consider linear schemes over a finite field. In this paper, a general $T$-PIR scheme simultaneously attaining the optimality of almost all of the three indexes is presented. Specifically, we design a linear capacity-achieving $T$-PIR scheme with sub-packetization $\!dn^{M-1}\!$ over a finite field $\mathbb{F}_q$, $q\geq N$. The sub-packetization $\!dn^{M-1}\!$, where $\!d\!=\!{\rm gcd}(N,T)\!$ and $\!n\!=\!N/d$, has been proved to be optimal in our previous work. The field size of all existing capacity-achieving $T$-PIR schemes must be larger than $Nt^{M-2}$ where $t=T/d$, while our scheme reduces the field size by an exponential factor.
- Deep learning has achieved impressive results on many problems. However, it requires high degree of expertise or a lot of experience to tune well the hyperparameters, and such manual tuning process is likely to be biased. Moreover, it is not practical to try out as many different hyperparameter configurations in deep learning as in other machine learning scenarios, because evaluating each single hyperparameter configuration in deep learning would mean training a deep neural network, which usually takes quite long time. Hyperband algorithm achieves state-of-the-art performance on various hyperparameter optimization problems in the field of deep learning. However, Hyperband algorithm does not utilize history information of previous explored hyperparameter configurations, thus the solution found is suboptimal. We propose to combine Hyperband algorithm with Bayesian optimization (which does not ignore history when sampling next trial configuration). Experimental results show that our combination approach is superior to other hyperparameter optimization approaches including Hyperband algorithm.
- A fundamental challenge in wireless multicast has been how to simultaneously achieve high-throughput and low-delay for reliably serving a large number of users. In this paper, we show how to harness substantial throughput and delay gains by exploiting multi-channel resources. We develop a new scheme called Multi-Channel Moving Window Codes (MC-MWC) for multi-channel multi-session wireless multicast. The salient features of MC-MWC are three-fold. (i) High throughput: we show that MC-MWC achieves order-optimal throughput in the many-user many-channel asymptotic regime. Moreover, the number of channels required by a conventional channel-allocation based scheme is shown to be doubly-exponentially larger than that required by MC-MWC. (ii) Low delay: using large deviations theory, we show that the delay of MC-MWC decreases linearly with the number of channels, while the delay reduction of conventional schemes is no more than a finite constant. (iii) Low feedback overhead: the feedback overhead of MC-MWC is a constant that is independent of both the number of receivers in each session and the number of sessions in the network. Finally, our trace-driven simulation and numerical results validate the analytical results and show that the implementation complexity of MC-MWC is low.
- We consider an $\ell_2$-regularized non-convex optimization problem for recovering signals from their noisy phaseless observations. We design and study the performance of a message passing algorithm that aims to solve this optimization problem. We consider the asymptotic setting $m,n \rightarrow \infty$, $m/n \rightarrow \delta$ and obtain sharp performance bounds, where $m$ is the number of measurements and $n$ is the signal dimension. We show that for complex signals the algorithm can perform accurate recovery with only $m= \left(\frac{64}{\pi^2}-4\right)n \approx 2.5n$ measurements. Also, we provide sharp analysis on the sensitivity of the algorithm to noise. We highlight the following facts about our message passing algorithm: (i) Adding $\ell_2$ regularization to the non-convex loss function can be beneficial. (ii) Spectral initialization has marginal impact on the performance of the algorithm. The sharp analyses in this paper, not only enable us to compare the performance of our method with other phase recovery schemes, but also shed light on designing better iterative algorithms for other non-convex optimization problems.
- Although prior works have exploited the UAV's mobility to enhance the wireless communication performance under different setups, the fundamental capacity limits of UAV-enabled/aided multiuser communication systems have not yet been characterized. To fill this gap, we consider in this paper a UAV-enabled two-user broadcast channel (BC), where a UAV flying at a constant altitude is deployed to send independent information to two users at different fixed locations on the ground. We aim to characterize the capacity region of this new type of BC over a given UAV flight duration, by jointly optimizing the UAV's trajectory and transmit power/rate allocations over time, subject to the UAV's maximum speed and maximum transmit power constraints. First, to draw essential insights, we consider two special cases with asymptotically large/low UAV flight duration/speed, respectively. For the former case, it is shown that a simple hover-fly-hover (HFH) UAV trajectory with time division multiple access (TDMA) based orthogonal multiuser transmission is capacity-achieving, while in the latter case, the UAV should hover at a fixed location that is nearer to the user with larger achievable rate and in general superposition coding (SC) based non-orthogonal transmission with interference cancellation at the receiver of the nearer user is required. Next, we consider the general case with finite UAV speed and flight duration. We show that the optimal UAV trajectory should follow a general HFH structure, i.e., the UAV successively hovers at a pair of initial and final locations above the line segment of the two users each with a certain amount of time and flies unidirectionally between them at the maximum speed, and SC is generally needed.
- Dec 29 2017 cs.SI physics.soc-ph arXiv:1712.09648v1Our world produces massive data every day; they exist in diverse forms, from pairwise data and matrix to time series and trajectories. Meanwhile, we have access to the versatile toolkit of network analysis. Networks also have different forms; from simple networks to higher-order network, each representation has different capabilities in carrying information. For researchers who want to leverage the power of the network toolkit, and apply it beyond networks data to sequential data, diffusion data, and many more, the question is: how to represent big data and networks? This dissertation makes a first step to answering the question. It proposes the higher-order network, which is a critical piece for representing higher-order interaction data; it introduces a scalable algorithm for building the network, and visualization tools for interactive exploration. Finally, it presents broad applications of the higher-order network in the real-world.
- Dec 29 2017 cs.SI physics.soc-ph arXiv:1712.09658v1A major branch of anomaly detection methods rely on dynamic networks: raw sequential data is first converted to a series of networks, then critical change points are identified in the evolving network structure. However, existing approaches use the first-order network (FON) to represent the underlying raw data, which may lose important higher-order sequence patterns, making higher-order anomalies undetectable in subsequent analysis. By replacing FON with higher-order network (HONs), we show that existing anomaly detection algorithms can better capture higher-order anomalies that may otherwise be ignored. We show that the existing HON construction algorithm cannot be used for the anomaly detection task due to the extra parameters and poor scalability; we introduce a parameter-free algorithm that constructs HON in big data sets. Using a large-scale synthetic data set with 11 billion web clickstreams, we demonstrate how the proposed method can capture variable orders of anomalies. Using a real-world taxi trajectory data, we show how the proposed method amplifies higher-order anomaly signals. Finally, we provide complexity analysis and benchmarking to show how one can incorporating higher-order dependencies with a small overhead.
- Hybrid circuit and packet switching for data center networking (DCN) has received considerable research attention recently. A hybrid-switched DCN employs a much faster circuit switch that is reconfigurable with a nontrivial cost, and a much slower packet switch that is reconfigurable with no cost, to interconnect its racks of servers. The research problem is, given a traffic demand matrix (between the racks), how to compute a good circuit switch configuration schedule so that the vast majority of the traffic demand is removed by the circuit switch, leaving a remaining demand matrix that contains only small elements for the packet switch to handle. In this paper, we propose two new hybrid switch scheduling algorithms under two different scheduling constraints. Our first algorithm, called 2-hop Eclipse, strikes a much better tradeoff between the resulting performance (of the hybrid switch) and the computational complexity (of the algorithm) than the state of the art solution Eclipse/Eclipse++. Our second algorithm, called BFF (best first fit), is the first hybrid switching solution that exploits the potential partial reconfiguration capability of the circuit switch for performance gains.
- Dec 13 2017 cs.CV arXiv:1712.04119v1Positron emission tomography (PET) is widely used in various clinical applications, including cancer diagnosis, heart disease and neuro disorders. The use of radioactive tracer in PET imaging raises concerns due to the risk of radiation exposure. To minimize this potential risk in PET imaging, efforts have been made to reduce the amount of radio-tracer usage. However, lowing dose results in low Signal-to-Noise-Ratio (SNR) and loss of information, both of which will heavily affect clinical diagnosis. Besides, the ill-conditioning of low-dose PET image reconstruction makes it a difficult problem for iterative reconstruction algorithms. Previous methods proposed are typically complicated and slow, yet still cannot yield satisfactory results at significantly low dose. Here, we propose a deep learning method to resolve this issue with an encoder-decoder residual deep network with concatenate skip connections. Experiments shows the proposed method can reconstruct low-dose PET image to a standard-dose quality with only two-hundredth dose. Different cost functions for training model are explored. Multi-slice input strategy is introduced to provide the network with more structural information and make it more robust to noise. Evaluation on ultra-low-dose clinical data shows that the proposed method can achieve better result than the state-of-the-art methods and reconstruct images with comparable quality using only 0.5% of the original regular dose.
- Consider the problem of private information retrieval (PIR) over a distributed storage system where $M$ records are stored across $N$ servers by using an $[N,K]$ MDS code. For simplicity, this problem is usually referred as the coded-PIR problem. The capacity of coded-PIR with privacy against any individual server was determined by Banawan and Ulukus in 2016, i.e., $\mathcal{C}_{\tiny C-PIR}=(1+\frac{K}{N}+\dots+\frac{K^{M-1}}{N^{M-1}})^{-1}$. They also presented a linear capacity-achieving scheme with sub-packetization $KN^{M}$. In this paper we focus on minimizing the sub-packetization for linear capacity-achieving coded-PIR schemes. We prove that the sub-packetization for all linear capacity-achieving coded-PIR schemes in the nontrivial cases (i.e. $N>K\geq 1$ and $M>1$) must be no less than $Kn^{M-1}$, where $n=N/{\rm gcd}(N,K)$. Moreover, we design a linear capacity-achieving coded-PIR scheme with sub-packetization $Kn^{M-1}$ for all $N>K\geq 1$ and $M>1$. Therefore, $Kn^{M-1}$ is the optimal sub-packetization for linear capacity-achieving coded-PIR schemes.
- In the first part of this paper, inspired by the geometric method of Jean-Pierre Marec, we consider the two-impulse Hohmann transfer problem between two coplanar circular orbits as a constrained nonlinear programming problem. By using the Kuhn-Tucker theorem, we analytically prove the global optimality of the Hohmann transfer. Two sets of feasible solutions are found, one of which corresponding to the Hohmann transfer is the global minimum, and the other is a local minimum. In the second part, we formulate the Hohmann transfer problem as two-point and multi-point boundary-value problems by using the calculus of variations. With the help of the Matlab solver bvp4c, two numerical examples are solved successfully, which verifies that the Hohmann transfer is indeed the solution of these boundary-value problems. Via static and dynamic constrained optimization, the solution to the orbit transfer problem proposed by W. Hohmann ninety-two years ago and its global optimality are re-discovered.
- This paper develops a novel methodology for using symbolic knowledge in deep learning. From first principles, we derive a semantic loss function that bridges between neural output vectors and logical constraints. This loss function captures how close the neural network is to satisfying the constraints on its output. An experimental evaluation shows that our semantic loss function effectively guides the learner to achieve (near-)state-of-the-art results on semi-supervised multi-class classification. Moreover, it significantly increases the ability of the neural network to predict structured objects, such as rankings and paths. These discrete concepts are tremendously difficult to learn, and benefit from a tight integration of deep learning and symbolic reasoning methods.
- We study the min-cost seed selection problem in online social networks, where the goal is to select a set of seed nodes with the minimum total cost such that the expected number of influenced nodes in the network exceeds a predefined threshold. We propose several algorithms that outperform the previous studies both on the theoretical approximation ratios and on the experimental performance. Under the case where the nodes have heterogeneous costs, our algorithms are the first bi- criteria approximation algorithms with polynomial running time and provable logarithmic performance bounds using a general contagion model. Under the case where the users have uniform costs, our algorithms achieve logarithmic approximation ratio and provable time complexity which is smaller than that of existing algorithms in orders of magnitude. We conduct extensive experiments using real social networks. The experimental results show that, our algorithms significantly outperform the existing algorithms both on the total cost and on the running time, and also scale well to billion-scale networks.
- This paper proposes the first end-to-end deep framework for high dynamic range (HDR) imaging of dynamic scenes with large-scale foreground motions. In state-of-the-art deep HDR imaging such as [13], the problem is formulated as an image composition problem, by first aligning input images using optical flows which are still error-prone due to occlusion and large motions. In our end-to-end approach, HDR imaging is formulated as an image translation problem and no optical flows are used. Moreover, our simple translation network can automatically hallucinate plausible HDR details in the presence of total occlusion, saturation and under-exposure, which are otherwise almost impossible to recover by conventional optimization approaches. We perform extensive qualitative and quantitative comparisons to show that our end-to-end HDR approach produces excellent results where color artifacts and geometry distortion are significantly reduced compared with existing state-ofthe-art methods.
- Convolutional Neural Networks (CNN) and the locally connected layer are limited in capturing the importance and relations of different local receptive fields, which are often crucial for tasks such as face verification, visual question answering, and word sequence prediction. To tackle the issue, we propose a novel locally smoothed neural network (LSNN) in this paper. The main idea is to represent the weight matrix of the locally connected layer as the product of the kernel and the smoother, where the kernel is shared over different local receptive fields, and the smoother is for determining the importance and relations of different local receptive fields. Specifically, a multi-variate Gaussian function is utilized to generate the smoother, for modeling the location relations among different local receptive fields. Furthermore, the content information can also be leveraged by setting the mean and precision of the Gaussian function according to the content. Experiments on some variant of MNIST clearly show our advantages over CNN and locally connected layer.
- We consider the problem of estimating means of two Gaussians in a 2-Gaussian mixture, which is not balanced and is corrupted by noise of an arbitrary distribution. We present a robust algorithm to estimate the parameters, together with upper bounds on the numbers of samples required for the estimate to be correct, where the bounds are parametrised by the dimension, ratio of the mixing coefficients, a measure of the separation of the two Gaussians, related to Mahalanobis distance, and a condition number of the covariance matrix. In theory, this is the first sample-complexity result for imbalanced mixtures corrupted by adversarial noise. In practice, our algorithm outperforms the vanilla Expectation-Maximisation (EM) algorithm in terms of estimation error.
- Nov 21 2017 cs.CL arXiv:1711.07010v3Named Entity Recognition and Relation Extraction for Chinese literature text is regarded as the highly difficult problem, partially because of the lack of tagging sets. In this paper, we build a discourse-level dataset from hundreds of Chinese literature articles for improving this task. To build a high quality dataset, we propose two tagging methods to solve the problem of data inconsistency, including a heuristic tagging method and a machine auxiliary tagging method. Based on this corpus, we also introduce several widely used models to conduct experiments. Experimental results not only show the usefulness of the proposed dataset, but also provide baselines for further research. The dataset is available at https://github.com/lancopku/Chinese-Literature-NER-RE-Dataset.
- Automatic conflict detection has grown in relevance with the advent of body-worn technology, but existing metrics such as turn-taking and overlap are poor indicators of conflict in police-public interactions. Moreover, standard techniques to compute them fall short when applied to such diversified and noisy contexts. We develop a pipeline catered to this task combining adaptive noise removal, non-speech filtering and new measures of conflict based on the repetition and intensity of phrases in speech. We demonstrate the effectiveness of our approach on body-worn audio data collected by the Los Angeles Police Department.
- This paper studies an unmanned aerial vehicle (UAV)-enabled multicast channel, in which a UAV serves as a mobile transmitter to deliver common information to a set of $K$ ground users. We aim to characterize the capacity of this channel over a finite UAV communication period, subject to its maximum speed constraint and an average transmit power constraint. To achieve the capacity, the UAV should use a sufficiently long code that spans over its whole communication period. Accordingly, the multicast channel capacity is achieved via maximizing the minimum achievable time-averaged rates of the $K$ users, by jointly optimizing the UAV's trajectory and transmit power allocation over time. However, this problem is non-convex and difficult to be solved optimally. To tackle this problem, we first consider a relaxed problem by ignoring the maximum UAV speed constraint, and obtain its globally optimal solution via the Lagrange dual method. The optimal solution reveals that the UAV should hover above a finite number of ground locations, with the optimal hovering duration and transmit power at each location. Next, based on such a multi-location-hovering solution, we present a successive hover-and-fly trajectory design and obtain the corresponding optimal transmit power allocation for the case with the maximum UAV speed constraint. Numerical results show that our proposed joint UAV trajectory and transmit power optimization significantly improves the achievable rate of the UAV-enabled multicast channel, and also greatly outperforms the conventional multicast channel with a fixed-location transmitter.
- In this addendum, we show that the switching algorithm QPS-SERENA can be converted R(QPS-SERENA), an algorithm for computing approximate Maximum Weight Matching (MWM). Empirically, R(QPS-SERENA) computes $(1-\epsilon)$-MWM within linear time (with respect to the number of edges $N^2$) for any fixed $\epsilon\in (0,1)$, for complete bipartite graphs with \it i.i.d. uniform edge weight distributions. This efficacy matches that of the state-of-art solution, although we so far cannot prove any theoretical guarantees on the time complexities needed to attain a certain approximation ratio. Then, we have similarly converted QPS-SERENADE to R(QPS-SERENADE), which empirically should output $(1-\epsilon)$-MWM within only $O(N \log N)$ time for the same type of complete bipartite graphs as described above.
- Nov 07 2017 cs.CL arXiv:1711.01427v1In recent years, neural networks have proven to be effective in Chinese word segmentation. However, this promising performance relies on large-scale training data. Neural networks with conventional architectures cannot achieve the desired results in low-resource datasets due to the lack of labelled training data. In this paper, we propose a deep stacking framework to improve the performance on word segmentation tasks with insufficient data by integrating datasets from diverse domains. Our framework consists of two parts, domain-based models and deep stacking networks. The domain-based models are used to learn knowledge from different datasets. The deep stacking networks are designed to integrate domain-based models. To reduce model conflicts, we innovatively add communication paths among models and design various structures of deep stacking networks, including Gaussian-based Stacking Networks, Concatenate-based Stacking Networks, Sequence-based Stacking Networks and Tree-based Stacking Networks. We conduct experiments on six low-resource datasets from various domains. Our proposed framework shows significant performance improvements on all datasets compared with several strong baselines.
- Suppose a database containing $M$ records is replicated across $N$ servers, and a user wants to privately retrieve one record by accessing the servers such that identity of the retrieved record is secret against any up to $T$ servers. A scheme designed for this purpose is called a private information retrieval (PIR) scheme. In practice, capacity-achieving and small sub-packetization are both desired for PIR schemes, because the former implies the highest download rate and the latter usually means simple realization. For general values of $N,T,M$, the only known capacity-achieving PIR scheme was designed by Sun and Jafar in 2016 with sub-packetization $N^M$. In this paper, we design a linear capacity-achieving PIR scheme with much smaller sub-packetization $dn^{M-1}$, where $d={\rm gcd}(N,T)$ and $n=N/d$. Furthermore, we prove that for any linear capacity-achieving PIR scheme it must have sub-packetization no less than $dn^{M-1}$, implying our scheme has the optimal sub-packetization. Moreover, comparing with Sun and Jafar's scheme, our scheme reduces the field size by a factor of $\frac{1}{Nd^{M-2}}$.
- Nov 01 2017 cs.CL arXiv:1710.11332v1Recently, encoder-decoder models are widely used in social media text summarization. However, these models sometimes select noise words in irrelevant sentences as part of a summary by error, thus declining the performance. In order to inhibit irrelevant sentences and focus on key information, we propose an effective approach by learning sentence weight distribution. In our model, we build a multi-layer perceptron to predict sentence weights. During training, we use the ROUGE score as an alternative to the estimated sentence weight, and try to minimize the gap between estimated weights and predicted weights. In this way, we encourage our model to focus on the key sentences, which have high relevance with the summary. Experimental results show that our approach outperforms baselines on a large-scale social media corpus.
- Nov 01 2017 cs.CL arXiv:1710.11334v1In recent years, more research has been devoted to studying the subtask of the complete shallow discourse parsing, such as indentifying discourse connective and arguments of connective. There is a need to design a full discourse parser to pull these subtasks together. So we develop a discourse parser turning the free text into discourse relations. The parser includes connective identifier, arguments identifier, sense classifier and non-explicit identifier, which connects with each other in pipeline. Each component applies the maximum entropy model with abundant lexical and syntax features extracted from the Penn Discourse Tree-bank. The head-based representation of the PDTB is adopted in the arguments identifier, which turns the problem of indentifying the arguments of discourse connective into finding the head and end of the arguments. In the non-explicit identifier, the contextual type features like words which have high frequency and can reflect the discourse relation are introduced to improve the performance of non-explicit identifier. Compared with other methods, experimental results achieve the considerable performance.
- Most of today's high-speed switches and routers adopt an input-queued crossbar switch architecture. Such a switch needs to compute a matching (crossbar schedule) between the input ports and output ports during each switching cycle (time slot). A key research challenge in designing large (in number of input/output ports $N$) input-queued crossbar switches is to develop crossbar scheduling algorithms that can compute "high quality" matchings - i.e. those that result in high switch throughput (ideally $100\%$) and low queueing delays for packets - at line rates. SERENA is arguably the best algorithm in that regard: It outputs excellent matching decisions that result in $100\%$ switch throughput and near-optimal queueing delays. However, since SERENA is a centralized algorithm with $O(N)$ computational complexity, it cannot support switches that both are large (in terms of $N$) and have a very high line rate per port. In this work, we propose SERENADE (SERENA, the Distributed Edition), a parallel algorithm suite that emulates SERENA in only $O(\log N)$ %(distributed) iterations between input ports and output ports, and hence has a time complexity of only $O(\log N)$ per port. Through extensive simulations, we show that all three variants in the SERENADE suite can, either provably or empirically, achieve 100\% throughput, and that they have similar delay performances as SERENA under heavy traffic loads.
- Oct 20 2017 cs.IR arXiv:1710.06997v1When applying learning to rank algorithms to Web search, a large number of features are usually designed to capture the relevance signals. Most of these features are computed based on the extracted textual elements, link analysis, and user logs. However, Web pages are not solely linked texts, but have structured layout organizing a large variety of elements in different styles. Such layout itself can convey useful visual information, indicating the relevance of a Web page. For example, the query-independent layout (i.e., raw page layout) can help identify the page quality, while the query-dependent layout (i.e., page rendered with matched query words) can further tell rich structural information (e.g., size, position and proximity) of the matching signals. However, such visual information of layout has been seldom utilized in Web search in the past. In this work, we propose to learn rich visual features automatically from the layout of Web pages (i.e., Web page snapshots) for relevance ranking. Both query-independent and query-dependent snapshots are considered as the new inputs. We then propose a novel visual perception model inspired by human's visual search behaviors on page viewing to extract the visual features. This model can be learned end-to-end together with traditional human-crafted features. We also show that such visual features can be efficiently acquired in the online setting with an extended inverted indexing scheme. Experiments on benchmark collections demonstrate that learning visual features from Web page snapshots can significantly improve the performance of relevance ranking in ad-hoc Web retrieval tasks.
- Oct 18 2017 cs.NI arXiv:1710.06121v1Detecting the anomaly behaviors such as network failure or Internet intentional attack in the large-scale Internet is a vital but challenging task. While numerous techniques have been developed based on Internet traffic in past years, anomaly detection for structured datasets by complex network have just been of focus recently. In this paper, a anomaly detection method for large-scale Internet topology is proposed by considering the changes of network crashes. In order to quantify the dynamic changes of Internet topology, the network path changes coefficient(NPCC) is put forward which will highlight the Internet abnormal state after it is attacked continuously. Furthermore we proposed the decision function which is inspired by Fibonacci Sequence to determine whether the Internet is abnormal or not. That is the current Internet is abnormal if its NPCC is beyond the normal domain which structured by the previous k NPCCs of Internet topology. Finally the new Internet anomaly detection method was tested over the topology data of three Internet anomaly events. The results show that the detection accuracy of all events are over 97%, the detection precision of each event are 90.24%, 83.33% and 66.67%, when k = 36. According to the experimental values of the index F_1, we found the the better the detection performance is, the bigger the k is, and our method has better performance for the anomaly behaviors caused by network failure than that caused by intentional attack. Compared with traditional anomaly detection, our work may be more simple and powerful for the government or organization in items of detecting large-scale abnormal events.
- Oct 17 2017 cs.IR arXiv:1710.05649v1This paper concerns a deep learning approach to relevance ranking in information retrieval (IR). Existing deep IR models such as DSSM and CDSSM directly apply neural networks to generate ranking scores, without explicit understandings of the relevance. According to the human judgement process, a relevance label is generated by the following three steps: 1) relevant locations are detected, 2) local relevances are determined, 3) local relevances are aggregated to output the relevance label. In this paper we propose a new deep learning architecture, namely DeepRank, to simulate the above human judgment process. Firstly, a detection strategy is designed to extract the relevant contexts. Then, a measure network is applied to determine the local relevances by utilizing a convolutional neural network (CNN) or two-dimensional gated recurrent units (2D-GRU). Finally, an aggregation network with sequential integration and term gating mechanism is used to produce a global relevance score. DeepRank well captures important IR characteristics, including exact/semantic matching signals, proximity heuristics, query term importance, and diverse relevance requirement. Experiments on both benchmark LETOR dataset and a large scale clickthrough data show that DeepRank can significantly outperform learning to ranking methods, and existing deep learning methods.
- Mobile Edge Computing (MEC) pushes computing functionalities away from the centralized cloud to the proximity of data sources, thereby reducing service provision latency and saving backhaul network bandwidth. Although computation offloading has been extensively studied in the literature, service caching is an equally, if not more, important design topic of MEC, yet receives much less attention. Service caching refers to caching application services and their related data (libraries/databases) in the edge server, e.g. MEC-enabled Base Station (BS), enabling corresponding computation tasks to be executed. Since only a small number of services can be cached in resource-limited edge server at the same time, which services to cache has to be judiciously decided to maximize the system performance. In this paper, we investigate collaborative service caching in MEC-enabled dense small cell (SC) networks. We propose an efficient decentralized algorithm, called CSC (Collaborative Service Caching), where a network of small cell BSs optimize service caching collaboratively to address a number of key challenges in MEC systems, including service heterogeneity, spatial demand coupling, and decentralized coordination. Our algorithm is developed based on parallel Gibbs sampling by exploiting the special structure of the considered problem using graphing coloring. The algorithm significantly improves the time efficiency compared to conventional Gibbs sampling, yet guarantees provable convergence and optimality. CSC is further extended to the SC network with selfish BSs, where a coalitional game is formulated to incentivize collaboration. A coalition formation algorithm is developed by employing the merge-and-split rules and ensures the stability of the SC coalitions.
- Imitation learning holds the promise to address challenging robotic tasks such as autonomous navigation. It however requires a human supervisor to oversee the training process and send correct control commands to robots without feedback, which is always prone to error and expensive. To minimize human involvement and avoid manual labeling of data in the robotic autonomous navigation with imitation learning, this paper proposes a novel semi-supervised imitation learning solution based on a multi-sensory design. This solution includes a suboptimal sensor policy based on sensor fusion to automatically label states encountered by a robot to avoid human supervision during training. In addition, a recording policy is developed to throttle the adversarial affect of learning too much from the suboptimal sensor policy. This solution allows the robot to learn a navigation policy in a self-supervised manner. With extensive experiments in indoor environments, this solution can achieve near human performance in most of the tasks and even surpasses human performance in case of unexpected events such as hardware failures or human operation errors. To best of our knowledge, this is the first work that synthesizes sensor fusion and imitation learning to enable robotic autonomous navigation in the real world without human supervision.
- Graph based semi-supervised learning (GSSL) has intuitive representation and can be improved by exploiting the matrix calculation. However, it has to perform iterative optimization to achieve a preset objective, which usually leads to low efficiency. Another inconvenience lying in GSSL is that when new data come, the graph construction and the optimization have to be conducted all over again. We propose a sound assumption, arguing that: the neighboring data points are not in peer-to-peer relation, but in a partial-ordered relation induced by the local density and distance between the data; and the label of a center can be regarded as the contribution of its followers. Starting from the assumption, we develop a highly efficient non-iterative label propagation algorithm based on a novel data structure named as optimal leading forest (LaPOLeaF). The major weaknesses of the traditional GSSL are addressed by this study. We further scale LaPOLeaF to accommodate big data by utilizing block distance matrix technique, parallel computing, and Locality-Sensitive Hashing (LSH). Experiments on large datasets have shown the promising results of the proposed methods.
- In this paper, we use the house price data ranging from January 2004 to October 2016 to predict the average house price of November and December in 2016 for each district in Beijing, Shanghai, Guangzhou and Shenzhen. We apply Autoregressive Integrated Moving Average model to generate the baseline while LSTM networks to build prediction model. These algorithms are compared in terms of Mean Squared Error. The result shows that the LSTM model has excellent properties with respect to predict time series. Also, stateful LSTM networks and stack LSTM networks are employed to further study the improvement of accuracy of the house prediction model.
- This paper studies a new unmanned aerial vehicle (UAV)-enabled wireless power transfer (WPT) system, where a UAV-mounted mobile energy transmitter (ET) is dispatched to deliver wireless energy to a set of on-ground energy receivers (ERs). We investigate how the UAV should optimally exploit its mobility via trajectory design to maximize the energy transferred to all ERs during a finite period. First, we consider the maximization of the sum energy received by all ERs by optimizing the UAV's trajectory subject to its maximum speed constraint. We obtain its optimal solution, which shows that the UAV should hover at one single fixed location during the whole period. However, the sum-energy maximization incurs a "near-far" fairness issue. To overcome this issue, we consider a different problem to maximize the minimum received energy among all ERs. We first consider an ideal case by ignoring the UAV's maximum speed constraint, and show that the relaxed problem can be optimally solved via the Lagrange dual method. Then, for the general case with the UAV's maximum speed constraint considered, we propose a new successive hover-and-fly trajectory motivated by the optimal trajectory in the ideal case, and obtain efficient trajectory designs by applying the successive convex programing (SCP).
- Sep 20 2017 cs.NE arXiv:1709.06114v2Research on the performance of recycled concrete as building material in the current world is an important subject. Given the complex composition of recycled concrete, conventional methods for forecasting slump scarcely obtain satisfactory results. Based on theory of nonlinear prediction method, we propose a recycled concrete slump prediction model based on geometric semantic genetic programming (GSGP) and combined it with recycled concrete features. Tests show that the model can accurately predict the recycled concrete slump by using the established prediction model to calculate the recycled concrete slump with different mixing ratios in practical projects and by comparing the predicted values with the experimental values. By comparing the model with several other nonlinear prediction models, we can conclude that GSGP has higher accuracy and reliability than conventional methods.
- As traditional neural network consumes a significant amount of computing resources during back propagation, \citetSun2017mePropSB propose a simple yet effective technique to alleviate this problem. In this technique, only a small subset of the full gradients are computed to update the model parameters. In this paper we extend this technique into the Convolutional Neural Network(CNN) to reduce calculation in back propagation, and the surprising results verify its validity in CNN: only 5\% of the gradients are passed back but the model still achieves the same effect as the traditional CNN, or even better. We also show that the top-$k$ selection of gradients leads to a sparse calculation in back propagation, which may bring significant computational benefits for high computational complexity of convolution operation in CNN.
- There is a special type of text which the order of the rows makes no difference (e.g., a word list). To compress these special texts, the traditional lossless compression method is not the ideal choice. A new method that can achieve better compression results for this type of texts is proposed. The texts are pre-processed by a method named SSE and are then compressed through the traditional lossless compression method. Comparison shows that an improved compression result is achieved.
- The recent development of CNN-based image dehazing has revealed the effectiveness of end-to-end modeling. However, extending the idea to end-to-end video dehazing has not been explored yet. In this paper, we propose an End-to-End Video Dehazing Network (EVD-Net), to exploit the temporal consistency between consecutive video frames. A thorough study has been conducted over a number of structure options, to identify the best temporal fusion strategy. Furthermore, we build an End-to-End United Video Dehazing and Detection Network(EVDD-Net), which concatenates and jointly trains EVD-Net with a video object detection model. The resulting augmented end-to-end pipeline has demonstrated much more stable and accurate detection results in hazy video.
- This paper studies the problem of estimating the grahpon model - the underlying generating mechanism of a network. Graphon estimation arises in many applications such as predicting missing links in networks and learning user preferences in recommender systems. The graphon model deals with a random graph of $n$ vertices such that each pair of two vertices $i$ and $j$ are connected independently with probability $\rho \times f(x_i,x_j)$, where $x_i$ is the unknown $d$-dimensional label of vertex $i$, $f$ is an unknown symmetric function, and $\rho$ is a scaling parameter characterizing the graph sparsity. Recent studies have identified the minimax error rate of estimating the graphon from a single realization of the random graph. However, there exists a wide gap between the known error rates of computationally efficient estimation procedures and the minimax optimal error rate. Here we analyze a spectral method, namely universal singular value thresholding (USVT) algorithm, in the relatively sparse regime with the average vertex degree $n\rho=\Omega(\log n)$. When $f$ belongs to HÃ¶lder or Sobolev space with smoothness index $\alpha$, we show the error rate of USVT is at most $(n\rho)^{ -2 \alpha / (2\alpha+d)}$, approaching the minimax optimal error rate $\log (n\rho)/(n\rho)$ for $d=1$ as $\alpha$ increases. Furthermore, when $f$ is analytic, we show the error rate of USVT is at most $\log^d (n\rho)/(n\rho)$. In the special case of stochastic block model with $k$ blocks, the error rate of USVT is at most $k/(n\rho)$, which is larger than the minimax optimal error rate by at most a multiplicative factor $k/\log k$. This coincides with the computational gap observed for community detection. A key step of our analysis is to derive the eigenvalue decaying rate of the edge probability matrix using piecewise polynomial approximations of the graphon function $f$.
- Merging mobile edge computing (MEC) functionality with the dense deployment of base stations (BSs) provides enormous benefits such as a real proximity, low latency access to computing resources. However, the envisioned integration creates many new challenges, among which mobility management (MM) is a critical one. Simply applying existing radio access oriented MM schemes leads to poor performance mainly due to the co-provisioning of radio access and computing services of the MEC-enabled BSs. In this paper, we develop a novel user-centric energy-aware mobility management (EMM) scheme, in order to optimize the delay due to both radio access and computation, under the long-term energy consumption constraint of the user. Based on Lyapunov optimization and multi-armed bandit theories, EMM works in an online fashion without future system state information, and effectively handles the imperfect system state information. Theoretical analysis explicitly takes radio handover and computation migration cost into consideration and proves a bounded deviation on both the delay performance and energy consumption compared to the oracle solution with exact and complete future system information. The proposed algorithm also effectively handles the scenario in which candidate BSs randomly switch on/off during the offloading process of a task. Simulations show that the proposed algorithms can achieve close-to-optimal delay performance while satisfying the user energy consumption constraint.
- We present an approach to accelerating a wide variety of image processing operators. Our approach uses a fully-convolutional network that is trained on input-output pairs that demonstrate the operator's action. After training, the original operator need not be run at all. The trained network operates at full resolution and runs in constant time. We investigate the effect of network architecture on approximation accuracy, runtime, and memory footprint, and identify a specific architecture that balances these considerations. We evaluate the presented approach on ten advanced image processing operators, including multiple variational models, multiscale tone and detail manipulation, photographic style transfer, nonlocal dehazing, and nonphotorealistic stylization. All operators are approximated by the same model. Experiments demonstrate that the presented approach is significantly more accurate than prior approximation schemes. It increases approximation accuracy as measured by PSNR across the evaluated operators by 8.5 dB on the MIT-Adobe dataset (from 27.5 to 36 dB) and reduces DSSIM by a multiplicative factor of 3 compared to the most accurate prior approximation scheme, while being the fastest. We show that our models generalize across datasets and across resolutions, and investigate a number of extensions of the presented approach. The results are shown in the supplementary video at https://youtu.be/eQyfHgLx8Dc
- Designing an e-commerce recommender system that serves hundreds of millions of active users is a daunting challenge. From a human vision perspective, there're two key factors that affect users' behaviors: items' attractiveness and their matching degree with users' interests. This paper proposes Telepath, a vision-based bionic recommender system model, which understands users from such perspective. Telepath is a combination of a convolutional neural network (CNN), a recurrent neural network (RNN) and deep neural networks (DNNs). Its CNN subnetwork simulates the human vision system to extract key visual signals of items' attractiveness and generate corresponding activations. Its RNN and DNN subnetworks simulate cerebral cortex to understand users' interest based on the activations generated from browsed items. In practice, the Telepath model has been launched to JD's recommender system and advertising system. For one of the major item recommendation blocks on the JD app, click-through rate (CTR), gross merchandise value (GMV) and orders have increased 1.59%, 8.16% and 8.71% respectively. For several major ads publishers of JD demand-side platform, CTR, GMV and return on investment have increased 6.58%, 61.72% and 65.57% respectively by the first launch, and further increased 2.95%, 41.75% and 41.37% respectively by the second launch.
- This paper studies the wireless surveillance of a hybrid automatic repeat request (HARQ) based suspicious communication link over Rayleigh fading channels. We propose a proactive eavesdropping approach, where a half-duplex monitor can opportunistically jam the suspicious link to exploit its potential retransmissions for overhearing more efficiently. In particular, we consider that the suspicious link uses at most two HARQ rounds for transmitting the same data packet, and we focus on two cases without and with HARQ combining at the monitor receiver. In both cases, we aim to maximize the successful eavesdropping probability at the monitor, by adaptively allocating the jamming power in the first HARQ round according to fading channel conditions, subject to an average jamming power constraint. For both cases, we show that the optimal jamming power allocation follows a threshold-based policy, and the monitor jams with constant power when the eavesdropping channel gain is less than the threshold. Numerical results show that the proposed proactive eavesdropping scheme achieves higher successful eavesdropping probability than the conventional passive eavesdropping, and HARQ combining can help further improve the eavesdropping performance.
- Aug 31 2017 cs.NE arXiv:1708.09116v2Genetic programming has been widely used in the engineering field. Compared with the conventional genetic programming and artificial neural network, geometric semantic genetic programming (GSGP) is superior in astringency and computing efficiency. In this paper, GSGP is adopted for the classification and regression analysis of a sample dataset. Furthermore, a model for slope stability analysis is established on the basis of geometric semantics. According to the results of the study based on GSGP, the method can analyze slope stability objectively and is highly precise in predicting slope stability and safety factors. Hence, the predicted results can be used as a reference for slope safety design.
- Computational elucidation of membrane protein (MP) structures is challenging partially due to lack of sufficient solved structures for homology modeling. Here we describe a high-throughput deep transfer learning method that first predicts MP contacts by learning from non-membrane proteins (non-MPs) and then predicting three-dimensional structure models using the predicted contacts as distance restraints. Tested on 510 non-redundant MPs, our method has contact prediction accuracy at least 0.18 better than existing methods, predicts correct folds for 218 MPs (TMscore at least 0.6), and generates three-dimensional models with RMSD less than 4 Angstrom and 5 Angstrom for 57 and 108 MPs, respectively. A rigorous blind test in the continuous automated model evaluation (CAMEO) project shows that our method predicted high-resolution three-dimensional models for two recent test MPs of 210 residues with RMSD close to 2 Angstrom. We estimated that our method could predict correct folds for between 1,345 and 1,871 reviewed human multi-pass MPs including a few hundred new folds, which shall facilitate the discovery of drugs targeting at membrane proteins.
- Long Short-Term Memory (LSTM) is the primary recurrent neural networks architecture for acoustic modeling in automatic speech recognition systems. Residual learning is an efficient method to help neural networks converge easier and faster. In this paper, we propose several types of residual LSTM methods for our acoustic modeling. Our experiments indicate that, compared with classic LSTM, our architecture shows more than 8% relative reduction in Phone Error Rate (PER) on TIMIT tasks. At the same time, our residual fast LSTM approach shows 4% relative reduction in PER on the same task. Besides, we find that all this architecture could have good results on THCHS-30, Librispeech and Switchboard corpora.
- Mobile robots are cyber-physical systems where the cyberspace and the physical world are strongly coupled. Attacks against mobile robots can transcend cyber defenses and escalate into disastrous consequences in the physical world. In this paper, we focus on the detection of active attacks that are capable of directly influencing robot mission operation. Through leveraging physical dynamics of mobile robots, we develop RIDS, a novel robot intrusion detection system that can detect actuator attacks as well as sensor attacks for nonlinear mobile robots subject to stochastic noises. We implement and evaluate a RIDS on Khepera mobile robot against concrete attack scenarios via various attack channels including signal interference, sensor spoofing, logic bomb, and physical damage. Evaluation of 20 experiments shows that the averages of false positive rates and false negative rates are both below 1%. Average detection delay for each attack remains within 0.40s.
- Aug 01 2017 cs.DC arXiv:1707.09455v2The amount of data moved over dedicated and non-dedicated network links increases much faster than the increase in the network capacity, but the current solutions fail to guarantee even the promised achievable transfer throughputs. In this paper, we propose a novel dynamic throughput optimization model based on mathematical modeling with offline knowledge discovery/analysis and adaptive online decision making. In offline analysis, we mine historical transfer logs to perform knowledge discovery about the transfer characteristics. Online phase uses the discovered knowledge from the offline analysis along with real-time investigation of the network condition to optimize the protocol parameters. As real-time investigation is expensive and provides partial knowledge about the current network status, our model uses historical knowledge about the network and data to reduce the real-time investigation overhead while ensuring near optimal throughput for each transfer. Our network and data agnostic solution is tested over different networks and achieved up to 93% accuracy compared with the optimal achievable throughput possible on those networks.
- Aug 01 2017 cs.CV arXiv:1707.09862v1Existing manifold learning methods are not appropriate for image retrieval task, because most of them are unable to process query image and they have much additional computational cost especially for large scale database. Therefore, we propose the iterative manifold embedding (IME) layer, of which the weights are learned off-line by unsupervised strategy, to explore the intrinsic manifolds by incomplete data. On the large scale database that contains 27000 images, IME layer is more than 120 times faster than other manifold learning methods to embed the original representations at query time. We embed the original descriptors of database images which lie on manifold in a high dimensional space into manifold-based representations iteratively to generate the IME representations in off-line learning stage. According to the original descriptors and the IME representations of database images, we estimate the weights of IME layer by ridge regression. In on-line retrieval stage, we employ the IME layer to map the original representation of query image with ignorable time cost (2 milliseconds). We experiment on five public standard datasets for image retrieval. The proposed IME layer significantly outperforms related dimension reduction methods and manifold learning methods. Without post-processing, Our IME layer achieves a boost in performance of state-of-the-art image retrieval methods with post-processing on most datasets, and needs less computational cost.
- Jul 31 2017 cs.CL arXiv:1707.09168v1The charge prediction task is to determine appropriate charges for a given case, which is helpful for legal assistant systems where the user input is fact description. We argue that relevant law articles play an important role in this task, and therefore propose an attention-based neural network method to jointly model the charge prediction task and the relevant article extraction task in a unified framework. The experimental results show that, besides providing legal basis, the relevant articles can also clearly improve the charge prediction results, and our full model can effectively predict appropriate charges for cases with different expression styles.
- Jul 26 2017 cs.IR arXiv:1707.07700v1The effective of information retrieval (IR) systems have become more important than ever. Deep IR models have gained increasing attention for its ability to automatically learning features from raw text; thus, many deep IR models have been proposed recently. However, the learning process of these deep IR models resemble a black box. Therefore, it is necessary to identify the difference between automatically learned features by deep IR models and hand-crafted features used in traditional learning to rank approaches. Furthermore, it is valuable to investigate the differences between these deep IR models. This paper aims to conduct a deep investigation on deep IR models. Specifically, we conduct an extensive empirical study on two different datasets, including Robust and LETOR4.0. We first compared the automatically learned features and hand-crafted features on the respects of query term coverage, document length, embeddings and robustness. It reveals a number of disadvantages compared with hand-crafted features. Therefore, we establish guidelines for improving existing deep IR models. Furthermore, we compare two different categories of deep IR models, i.e. representation-focused models and interaction-focused models. It is shown that two types of deep IR models focus on different categories of words, including topic-related words and query-related words.