results for au:Yang_X in:cs

- Dec 13 2017 cs.NI arXiv:1712.04201v1We investigate network performance of ultra-dense heterogeneous networks (HetNets) and study the maximum energy-efficient base station (BS) deployment incorporating probabilistic non-line-of-sight (NLoS) and line-of-sight (LoS) transmissions. First, we develop an analytical framework with the maximum instantaneous received power (MIRP) and the maximum average received power (MARP) association schemes to model the coverage probability and related performance metrics, e.g., the potential throughput (PT) and the energy efficiency (EE). Second, we formulate two optimization problems to achieve the maximum energy-efficient deployment solution with specific service criteria. Simulation results show that there are tradeoffs among the coverage probability, the total power consumption, and the EE. To be specific, the maximum coverage probability with ideal power consumption is superior to that with practical power consumption when the total power constraint is small and inferior to that with practical power consumption when the total power constraint becomes large. Moreover, the maximum EE is a decreasing function with respect to the coverage probability constraint.
- Dec 04 2017 cs.CV arXiv:1712.00097v1In this paper, we address the challenging problem of effi- cient temporal activity detection in untrimmed long videos. While most recent work has focused and advanced the de- tection accuracy, the inference time can take seconds to minutes in processing one video, which is computationally prohibitive for many applications with tight runtime con- straints. This motivates our proposed budget-aware frame- work, which learns to perform activity detection by intel- ligently selecting a small subset of frames according to a specified time or computational budget. We formulate this problem as a Markov decision process, and adopt a recurrent network to model a policy for the frame selec- tion. To train the network, we employ a recurrent policy gradient approach to approximate the gradient of the non- decomposable and non-differentiable objective defined in our problem. In the extensive experiments on two bench- mark datasets, we achieve competitive detection accuracy, and more importantly, our approach is able to substantially reduce computational time and detect multiple activities in only 348ms for each untrimmed long video of THUMOS14 and ActivityNet.
- Computer poetry generation is our first step towards computer writing. Writing must have a theme. The current approaches of using sequence-to-sequence models with attention often produce non-thematic poems. We present a conditional variational autoencoder with augmented word2vec architecture that explicitly represents the topic or theme information. This approach significantly improves the relevance of the generated poems by representing each line of the poem not only in a context-sensitive manner but also in a holistic way that is highly related to the given keyword and the learned topic. The proposed augmented word2vec model further improves the rhythm and symmetry. We also present a straightforward evaluation metric RHYTHM score to automatically measure the rule-consistency of generated poems. Tests show that 45.24% generated poems by our model are judged by humans to be written by real people.
- Nov 17 2017 cs.CL arXiv:1711.06061v1Machine translation is going through a radical revolution, driven by the explosive development of deep learning techniques using Convolutional Neural Network (CNN) and Recurrent Neural Network (RNN). In this paper, we consider a special case in machine translation problems, targeting to translate natural language into Structural Query Language (SQL) for data retrieval over relational database. Although generic CNN and RNN learn the grammar structure of SQL when trained with sufficient samples, the accuracy and training efficiency of the model could be dramatically improved, when the translation model is deeply integrated with the grammar rules of SQL. We present a new encoder-decoder framework, with a suite of new approaches, including new semantic features fed into the encoder as well as new grammar-aware states injected into the memory of decoder. These techniques help the neural network focus on understanding semantics of the operations in natural language and save the efforts on SQL grammar learning. The empirical evaluation on real world database and queries show that our approach outperform state-of-the-art solution by a significant margin.
- We present a robust and precise localization system that achieves centimeter-level localization accuracy in disparate city scenes. Our system adaptively uses information from complementary sensors such as GNSS, LiDAR, and IMU to achieve high localization accuracy and resilience in challenging scenes, such as urban downtown, highways, and tunnels. Rather than relying only on LiDAR intensity or 3D geometry, we make innovative use of LiDAR intensity and altitude cues to significantly improve localization system accuracy and robustness. Our GNSS RTK module utilizes the help of the multi-sensor fusion framework and achieves a better ambiguity resolution success rate. An error-state Kalman filter is applied to fuse the localization measurements from different sources with novel uncertainty estimation. We validate, in detail, the effectiveness of our approaches, achieving 5-10cm RMS accuracy and outperforming previous state-of-the-art systems. Importantly, our system, while deployed in a large autonomous driving fleet, made our vehicles fully autonomous in crowded city streets despite road construction that occurred from time to time. A dataset including more than 60 km real traffic driving in various urban roads is used to comprehensively test our system.
- Deformable image registration and regression are important tasks in medical image analysis. However, they are computationally expensive, especially when analyzing large-scale datasets that contain thousands of images. Hence, cluster computing is typically used, making the approaches dependent on such computational infrastructure. Even larger computational resources are required as study sizes increase. This limits the use of deformable image registration and regression for clinical applications and as component algorithms for other image analysis approaches. We therefore propose using a fast predictive approach to perform image registrations. In particular, we employ these fast registration predictions to approximate a simplified geodesic regression model to capture longitudinal brain changes. The resulting method is orders of magnitude faster than the standard optimization-based regression model and hence facilitates large-scale analysis on a single graphics processing unit (GPU). We evaluate our results on 3D brain magnetic resonance images (MRI) from the ADNI datasets.
- Nov 16 2017 cs.MM arXiv:1711.04916v2The distortion in steganography that usually comes from the modification or recoding on the cover image during the embedding process leaves the steganalyzer with possibility of discriminating. Faced with such a risk, we propose generative steganography with Kerckhoffs' principle (GSK) in this letter. In GSK, the secret messages are generated by a cover image using a generator rather than embedded into the cover, thus resulting in no modifications in the cover. To ensure the security, the generators are trained to meet Kerckhoffs' principle based on generative adversarial networks (GAN). Everything about the GSK system, except the extraction key, is public knowledge for the receivers. The secret messages can be outputted by the generator if and only if the extraction key and the cover image are both inputted. In the generator training procedures, there are two GANs, Message- GAN and Cover-GAN, designed to work jointly making the generated results under the control of the extraction key and the cover image. We provide experimental results on the training process and give an example of the working process by adopting a generator trained on MNIST, which demonstrate that GSK can use a cover image without any modification to generate messages, and without the extraction key or the cover image, only meaningless results would be obtained.
- Most mainstream Automatic Speech Recognition (ASR) systems consider all feature frames equally important. However, acoustic landmark theory is based on a contradictory idea, that some frames are more important than others. Acoustic landmark theory exploits the quantal nonlinear articulatory-acoustic relationships from human speech perception experiments, and provides theoretical support for extracting acoustic features in the vicinity of landmark regions where an abrupt change occurs in the spectrum of speech signals. In this work, we conduct experiments on the TIMIT corpus, with both GMM and DNN based ASR systems and found that frames containing landmarks are more informative than others. We found that altering the level of emphasis on landmarks through accordingly re-weighting acoustic likelihood in frames, tends to reduce the phone error rate (PER). Furthermore, by leveraging the landmark as a heuristic, one of our hybrid DNN frame dropping strategies maintained a PER within 0.44% of optimal when scoring less than half (41.2% to be precise) of the frames. This hybrid strategy out-performs other non-heuristicbased methods and demonstrates the potential of landmarks for reducing computation.
- Oct 03 2017 physics.chem-ph cs.CE arXiv:1710.00616v1Automatic design of organic materials requires black-box optimization in a vast chemical space. In conventional molecular design algorithms, a molecule is built as a combination of predetermined fragments. Recently, deep neural network models such as variational auto encoders (VAEs) and recurrent neural networks (RNNs) are shown to be effective in de novo design of molecules without any predetermined fragments. This paper presents a novel python library ChemTS that explores the chemical space by combining Monte Carlo tree search (MCTS) and an RNN. In a benchmarking problem of optimizing the octanol-water partition coefficient and synthesizability, our algorithm showed superior efficiency in finding high-scoring molecules. ChemTS is available at https://github.com/tsudalab/ChemTS.
- Sep 28 2017 cs.CV arXiv:1709.09550v1We propose the segmentation of noisy datasets into Multiple Inlier Structures with a new Robust Estimator (MISRE). The scale of each individual structure is estimated adaptively from the input data and refined by mean shift, without tuning any parameter in the process, or manually specifying thresholds for different estimation problems. Once all the data points were classified into separate structures, these structures are sorted by their densities with the strongest inlier structures coming out first. Several 2D and 3D synthetic and real examples are presented to illustrate the efficiency, robustness and the limitations of the MISRE algorithm.
- Decentralized control of robots has attracted huge research interests. However, some of the research used unrealistic assumptions without collision avoidance. This report focuses on the collision-free control for multiple robots in both complete coverage and search tasks in 2D and 3D areas which are arbitrary unknown. All algorithms are decentralized as robots have limited abilities and they are mathematically proved. The report starts with the grid selection in the two tasks. Grid patterns simplify the representation of the area and robots only need to move straightly between neighbor vertices. For the 100% complete 2D coverage, the equilateral triangular grid is proposed. For the complete coverage ignoring the boundary effect, the grid with the fewest vertices is calculated in every situation for both 2D and 3D areas. The second part is for the complete coverage in 2D and 3D areas. A decentralized collision-free algorithm with the above selected grid is presented driving robots to sections which are furthest from the reference point. The area can be static or expanding, and the algorithm is simulated in MATLAB. Thirdly, three grid-based decentralized random algorithms with collision avoidance are provided to search targets in 2D or 3D areas. The number of targets can be known or unknown. In the first algorithm, robots choose vacant neighbors randomly with priorities on unvisited ones while the second one adds the repulsive force to disperse robots if they are close. In the third algorithm, if surrounded by visited vertices, the robot will use the breadth-first search algorithm to go to one of the nearest unvisited vertices via the grid. The second search algorithm is verified on Pioneer 3-DX robots. The general way to generate the formula to estimate the search time is demonstrated. Algorithms are compared with five other algorithms in MATLAB to show their effectiveness.
- The new cyber attack pattern of advanced persistent threats (APTs) poses a serious threat to cyberspace. This paper addresses the issue of defending against APTs in a cost-effective way. First, the APT-based cyber attack-defense processes are modeled as a type of differential dynamical systems. Then, the cyber defense problem is modeled as an optimal control problem. The optimal control problem is shown to have an optimal control, and the optimality system for the problem is presented. Therefore, a cost-effective cyber defense strategy can be figured out by solving the optimality system. Finally, the influences of some factors, including the bounds on the admissible controls and the network topology, on the cost-effective defense strategies are examined. To our knowledge, this is the first time the APT-based cyber defense problem is treated this way.
- Sep 11 2017 cs.SI arXiv:1709.02767v1This paper addresses the issue of suppressing a rumor using the truth in a cost-effective way. First, an individual-level dynamical model capturing the rumor-truth mixed spreading processes is proposed. On this basis, the cost-effective rumor-containing problem is modeled as an optimization problem. Extensive experiments show that finding a cost-effective rumor-containing strategy boils down to enhancing the first truth-spreading rate until the cost effectiveness of the rumor-containing strategy reaches the first turning point. This finding greatly reduces the time spent for solving the optimization problem. The influences of different factors on the optimal cost effectiveness of a rumor-containing strategy are examined through computer simulations. We believe our findings help suppress rumors in a cost-effective way. To our knowledge, this is the first time the rumor-containing problem is treated this way.
- Sep 08 2017 cs.CV arXiv:1709.02371v2We present a compact but effective CNN model for optical flow, called PWC-Net. PWC-Net has been designed according to simple and well-established principles: pyramidal processing, warping, and the use of a cost volume. Cast in a learnable feature pyramid, PWC-Net uses the current optical flow estimate to warp the CNN features of the second image. It then uses the warped features and features of the first image to construct the cost volume, which is processed by a CNN to estimate the optical flow. PWCNet is 17 times smaller in size and easier to train than the recent FlowNet2 model. Moreover, it outperforms all published methods on the MPI Sintel final pass and KITTI 2015 benchmarks, running at about 35 fps on Sintel resolution (1024x436) images. Our model will be publicly available.
- Aug 30 2017 cs.CV arXiv:1708.08687v1Input binarization has shown to be an effective way for network acceleration. However, previous binarization scheme could be regarded as simple pixel-wise thresholding operations (i.e., order-one approximation) and suffers a big accuracy loss. In this paper, we propose a highorder binarization scheme, which achieves more accurate approximation while still possesses the advantage of binary operation. In particular, the proposed scheme recursively performs residual quantization and yields a series of binary input images with decreasing magnitude scales. Accordingly, we propose high-order binary filtering and gradient propagation operations for both forward and backward computations. Theoretical analysis shows approximation error guarantee property of proposed method. Extensive experimental results demonstrate that the proposed scheme yields great recognition accuracy while being accelerated.
- Aug 11 2017 cs.DC arXiv:1708.03184v2Big data analytics on geographically distributed datasets (across data centers or clusters) has been attracting increasing interests from both academia and industry, but also significantly complicates the system and algorithm designs. In this article, we systematically investigate the geo-distributed big-data analytics framework by analyzing the fine-grained paradigm and the key design principles. We present a dynamic global manager selection algorithm (GMSA) to minimize energy consumption cost by fully exploiting the system diversities in geography and variation over time. The algorithm makes real-time decisions based on the measurable system parameters through stochastic optimization methods, while achieving the performance balances between energy cost and latency. Extensive trace-driven simulations verify the effectiveness and efficiency of the proposed algorithm. We also highlight several potential research directions that remain open and require future elaborations in analyzing geo-distributed big data.
- We develop an extension of recently developed methods for obtaining time-space tradeoff lower bounds for problems of learning from random test samples to handle the situation where the space of tests is signficantly smaller than the space of inputs, a class of learning problems that is not handled by prior work. This extension is based on a measure of how matrices amplify the 2-norms of probability distributions that is more refined than the 2-norms of these matrices. As applications that follow from our new technique, we show that any algorithm that learns $m$-variate homogeneous polynomial functions of degree at most $d$ over $\mathbb{F}_2$ from evaluations on randomly chosen inputs either requires space $\Omega(mn)$ or $2^{\Omega(m)}$ time where $n=m^{\Theta(d)}$ is the dimension of the space of such functions. These bounds are asymptotically optimal since they match the tradeoffs achieved by natural learning algorithms for the problems.
- We propose a method for learning Markov network structures for continuous data without invoking any assumptions about the distribution of the variables. The method makes use of previous work on a non-parametric estimator for mutual information which is used to create a non-parametric test for multivariate conditional independence. This independence test is then combined with an efficient constraint-based algorithm for learning the graph structure. The performance of the method is evaluated on several synthetic data sets and it is shown to learn considerably more accurate structures than competing methods when the dependencies between the variables involve non-linearities.
- Aug 03 2017 cs.CV arXiv:1708.00813v1Sustainability of the global environment is dependent on the accurate land cover information over large areas. Even with the increased number of satellite systems and sensors acquiring data with improved spectral, spatial, radiometric and temporal characteristics and the new data distribution policy, most existing land cover datasets were derived from a pixel-based single-date multi-spectral remotely sensed image with low accuracy. To improve the accuracy, the bottleneck is how to develop an accurate and effective image classification technique. By incorporating and utilizing the complete multi-spectral, multi-temporal and spatial information in remote sensing images and considering their inherit spatial and sequential interdependence, we propose a new patch-based RNN (PB-RNN) system tailored for multi-temporal remote sensing data. The system is designed by incorporating distinctive characteristics in multi-temporal remote sensing data. In particular, it uses multi-temporal-spectral-spatial samples and deals with pixels contaminated by clouds/shadow present in the multi-temporal data series. Using a Florida Everglades ecosystem study site covering an area of 771 square kilo-meters, the proposed PB-RNN system has achieved a significant improvement in the classification accuracy over pixel-based RNN system, pixel-based single-imagery NN system, pixel-based multi-images NN system, patch-based single-imagery NN system and patch-based multi-images NN system. For example, the proposed system achieves 97.21% classification accuracy while a pixel-based single-imagery NN system achieves 64.74%. By utilizing methods like the proposed PB-RNN one, we believe that much more accurate land cover datasets can be produced over large areas efficiently.
- Mobile edge computing (MEC) is expected to be an effective solution to deliver 360-degree virtual reality (VR) videos over wireless networks. In contrast to previous computation-constrained MEC framework, which reduces the computation-resource consumption at the mobile VR device by increasing the communication-resource consumption, we develop a communications-constrained MEC framework to reduce communication-resource consumption by increasing the computation-resource consumption and exploiting the caching resources at the mobile VR device in this paper. Specifically, according to the task modularization, the MEC server can only deliver the components which have not been stored in the VR device, and then the VR device uses the received components and the corresponding cached components to construct the task, resulting in low communication-resource consumption but high delay. The MEC server can also compute the task by itself to reduce the delay, however, it consumes more communication-resource due to the delivery of entire task. Therefore, we then propose a task scheduling strategy to decide which computation model should the MEC server operates, in order to minimize the communication-resource consumption under the delay constraint. Finally, we discuss the tradeoffs between communications, computing, and caching in the proposed system.
- Aug 03 2017 cs.CV arXiv:1708.00573v1Automatic and accurate whole-heart and great vessel segmentation from 3D cardiac magnetic resonance (MR) images plays an important role in the computer-assisted diagnosis and treatment of cardiovascular disease. However, this task is very challenging due to ambiguous cardiac borders and large anatomical variations among different subjects. In this paper, we propose a novel densely-connected volumetric convolutional neural network, referred as DenseVoxNet, to automatically segment the cardiac and vascular structures from 3D cardiac MR images. The DenseVoxNet adopts the 3D fully convolutional architecture for effective volume-to-volume prediction. From the learning perspective, our DenseVoxNet has three compelling advantages. First, it preserves the maximum information flow between layers by a densely-connected mechanism and hence eases the network training. Second, it avoids learning redundant feature maps by encouraging feature reuse and hence requires fewer parameters to achieve high performance, which is essential for medical applications with limited training data. Third, we add auxiliary side paths to strengthen the gradient propagation and stabilize the learning process. We demonstrate the effectiveness of DenseVoxNet by comparing it with the state-of-the-art approaches from HVSMR 2016 challenge in conjunction with MICCAI, and our network achieves the best dice coefficient. We also show that our network can achieve better performance than other 3D ConvNets but with fewer parameters.
- It has been an active research issue for many years to construct new bent functions. For $k$ odd with $\gcd(n, k)=1$, and $a\in\mathbb{F}_{3^n}^{*}$, the function $f(x)=Tr(ax^{\frac{3^k+1}{2}})$ is weakly regular bent over $\mathbb{F}_{3^n}$, where $Tr(\cdot):\mathbb{F}_{3^n}\rightarrow\mathbb{F}_3$ is the trace function. This is the well-known Coulter-Matthews bent function. In this paper, we determine the dual function of $f(x)$ completely. As a consequence, we find many classes of ternary bent functions not reported in the literature previously. Such bent functions are not quadratic if $k>1$, and have $\left(\left(\frac{1+\sqrt{5}}{2}\right)^{w+1}-\right.$ $\left.\left(\frac{1-\sqrt{5}}{2}\right)^{w+1}\right)/\sqrt{5}$ or $\left(\left(\frac{1+\sqrt{5}}{2}\right)^{n-w+1}-\right.$ $\left.\left(\frac{1-\sqrt{5}}{2}\right)^{n-w+1}\right)/\sqrt{5}$ trace terms, where $0<w<n$ and $wk\equiv 1\ (\bmod\;n)$. Among them, five special cases are especially interesting: for the case of $k=(n+1)/2$, the number of trace terms is $\left(\left(\frac{1+\sqrt{5}}{2}\right)^{n-1}-\right.$ $\left.\left(\frac{1-\sqrt{5}}{2}\right)^{n-1}\right)/\sqrt{5}$; for the case of $k=n-1$, the number of trace terms is $\left(\left(\frac{1+\sqrt{5}}{2}\right)^n-\right.$ $\left.\left(\frac{1-\sqrt{5}}{2}\right)^n\right)/\sqrt{5}$; for the case of $k=(n-1)/2$, the number of trace terms is $\left(\left(\frac{1+\sqrt{5}}{2}\right)^{n-1}-\right.$ $\left.\left(\frac{1-\sqrt{5}}{2}\right)^{n-1}\right)/\sqrt{5}$; for the case of $(n, k)=(5t+4, 4t+3)$ or $(5t+1, 4t+1)$ with $t\geq 1$, the number of trace terms is 8; and for the case of $(n, k)=(7t+6, 6t+5)$ or $(7t+1, 6t+1)$ with $t\geq 1$, the number of trace terms is 21. As a byproduct, we find new classes of ternary bent functions with only 8 or 21 trace terms.
- Jul 18 2017 cs.CV arXiv:1707.04993v2Visual signals in a video can be divided into content and motion. While content specifies which objects are in the video, motion describes their dynamics. Based on this prior, we propose the Motion and Content decomposed Generative Adversarial Network (MoCoGAN) framework for video generation. The proposed framework generates a video by mapping a sequence of random vectors to a sequence of video frames. Each random vector consists of a content part and a motion part. While the content part is kept fixed, the motion part is realized as a stochastic process. To learn motion and content decomposition in an unsupervised manner, we introduce a novel adversarial learning scheme utilizing both image and video discriminators. Extensive experimental results on several challenging datasets with qualitative and quantitative comparison to the state-of-the-art approaches, verify effectiveness of the proposed framework. In addition, we show that MoCoGAN allows one to generate videos with same content but different motion as well as videos with different content and same motion.
- Jul 13 2017 cs.CV arXiv:1707.03574v2To provide the possibility of developing objective image quality assessment (IQA) algorithms for THz security images, we constructed the THz security image database (THSID) including a total of 181 THz security images with the resolution of 127*380. The main distortion types in THz security images were first analyzed for the design of subjective evaluation criteria to acquire the mean opinion scores. Subsequently, the existing no-reference IQA algorithms, which were 5 opinion-aware approaches viz., NFERM, GMLF, DIIVINE, BRISQUE and BLIINDS2, and 8 opinion-unaware approaches viz., QAC, SISBLIM, NIQE, FISBLIM, CPBD, S3 and Fish_bb, were executed for the evaluation of the THz security image quality. The statistical results demonstrated the superiority of Fish_bb over the other testing IQA approaches for assessing the THz image quality with PLCC (SROCC) values of 0.8925 (-0.8706), and with RMSE value of 0.3993. The linear regression analysis and Bland-Altman plot further verified that the Fish__bb could substitute for the subjective IQA. Nonetheless, for the classification of THz security images, we tended to use S3 as a criterion for ranking THz security image grades because of the relatively low false positive rate in classifying bad THz image quality into acceptable category (24.69%). Interestingly, due to the specific property of THz image, the average pixel intensity gave the best performance than the above complicated IQA algorithms, with the PLCC, SROCC and RMSE of 0.9001, -0.8800 and 0.3857, respectively. This study will help the users such as researchers or security staffs to obtain the THz security images of good quality. Currently, our research group is attempting to make this research more comprehensive.
- Jul 13 2017 cs.CV arXiv:1707.03816v1Visual tracking is challenging as target objects often undergo significant appearance changes caused by deformation, abrupt motion, background clutter and occlusion. In this paper, we propose to exploit the rich hierarchical features of deep convolutional neural networks to improve the accuracy and robustness of visual tracking. Deep neural networks trained on object recognition datasets consist of multiple convolutional layers. These layers encode target appearance with different levels of abstraction. For example, the outputs of the last convolutional layers encode the semantic information of targets and such representations are invariant to significant appearance variations. However, their spatial resolutions are too coarse to precisely localize the target. In contrast, features from earlier convolutional layers provide more precise localization but are less invariant to appearance changes. We interpret the hierarchical features of convolutional layers as a nonlinear counterpart of an image pyramid representation and explicitly exploit these multiple levels of abstraction to represent target objects. Specifically, we learn adaptive correlation filters on the outputs from each convolutional layer to encode the target appearance. We infer the maximum response of each layer to locate targets in a coarse-to-fine manner. To further handle the issues with scale estimation and target re-detection from tracking failures caused by heavy occlusion or moving out of the view, we conservatively learn another correlation filter that maintains a long-term memory of target appearance as a discriminative classifier. Extensive experimental results on large-scale benchmark datasets show that the proposed algorithm performs favorably against the state-of-the-art tracking methods.
- Jul 13 2017 cs.CR arXiv:1707.03611v1This paper is devoted to measuring the security of cyber networks under advanced persistent threats (APTs). First, an APT-based cyber attack-defense process is modeled as an individual-level dynamical system. Second, the dynamic model is shown to exhibit the global stability. On this basis, a new security metric of cyber networks, which is known as the limit security, is defined as the limit expected fraction of compromised nodes in the networks. Next, the influence of different factors on the limit security is illuminated through theoretical analysis and computer simulation. This work helps understand the security of cyber networks under APTs.
- Jul 11 2017 cs.CR arXiv:1707.02437v1To achieve an intended objective, an adversary may conduct an advanced persistent threat (APT) campaign against a targeted cyber network. Before an APT attack is launched, the attacker must maximize the effectiveness of the attack by properly allocating available APT resource. This paper addresses the APT effectiveness maximization problem. First, an APT-related cyber attack-defense process is modeled as an individual-level dynamical system, and the APT effectiveness maximization problem is modeled as a constrained optimization problem. Second, a type of good APT resource allocation schemes, which are known as Genetic-Algorithm-Based (GAB) schemes, are derived by solving the established optimization problem with a well-designed genetic algorithm. Next, the influences of different factors, including the available APT resource per unit time, the attack duration and the network heterogeneity, on the cost effectiveness of a GAB scheme are concluded through computer simulations. Finally, five types of heuristic APT resource allocation schemes are considered, and an experimental comparison among the cost effectiveness of these schemes and GAB schemes is conducted. This work helps understand the pros and cons of APTs.
- Jul 11 2017 cs.CV arXiv:1707.02309v1Object tracking is challenging as target objects often undergo drastic appearance changes over time. Recently, adaptive correlation filters have been successfully applied to object tracking. However, tracking algorithms relying on highly adaptive correlation filters are prone to drift due to noisy updates. Moreover, as these algorithms do not maintain long-term memory of target appearance, they cannot recover from tracking failures caused by heavy occlusion or target disappearance in the camera view. In this paper, we propose to learn multiple adaptive correlation filters with both long-term and short-term memory of target appearance for robust object tracking. First, we learn a kernelized correlation filter with an aggressive learning rate for locating target objects precisely. We take into account the appropriate size of surrounding context and the feature representations. Second, we learn a correlation filter over a feature pyramid centered at the estimated target position for predicting scale changes. Third, we learn a complementary correlation filter with a conservative learning rate to maintain long-term memory of target appearance. We use the output responses of this long-term filter to determine if tracking failure occurs. In the case of tracking failures, we apply an incrementally learned detector to recover the target position in a sliding window fashion. Extensive experimental results on large-scale benchmark datasets demonstrate that the proposed algorithm performs favorably against the state-of-the-art methods in terms of efficiency, accuracy, and robustness.
- We study a class of linear network coding (LNC) schemes, called \emphcircular-shift LNC, whose encoding operations at intermediate nodes consist of only circular-shifts and bit-wise addition (XOR). Departing from existing literature, we systematically formulate circular-shift LNC as a special type of vector LNC, where the local encoding kernels of an $L$-dimensional circular-shift linear code of degree $\delta$ are summation of at most $\delta$ cyclic-permutation matrices of size $L$. Under this framework, an intrinsic connection between scalar LNC and circular-shift LNC is established. In particular, on a general network, for some block lengths $L$, every scalar linear solution over GF($2^{L-1}$) can induce an $(L-1, L)$-fractional circular-shift linear solution of degree $(L-1)/2$. Specific to multicast networks, an $(L-1, L)$-fractional circular-shift linear solution of arbitrary degree $\delta$ can be efficiently constructed. With different $\delta$, the constructed solution has an interesting encoding-decoding complexity tradeoff, and when $\delta = (L-1)/2$, it requires fewer binary operations for both encoding and decoding processes compared with scalar LNC. While the constructed solution has one-bit redundancy per edge transmission, we show that this is inevitable, and that circular-shift LNC is insufficient to achieve the exact capacity of some multicast networks. Last, both theoretical and numerical analysis imply that with increasing $L$, a randomly constructed circular-shift linear code has comparable linear solvability behavior to a randomly constructed permutation-based linear code, but has much shorter overheads for random coding.
- Jul 05 2017 cs.CV arXiv:1707.01058v2This work make the first attempt to generate articulated human motion sequence from a single image. On the one hand, we utilize paired inputs including human skeleton information as motion embedding and a single human image as appearance reference, to generate novel motion frames, based on the conditional GAN infrastructure. On the other hand, a triplet loss is employed to pursue appearance-smoothness between consecutive frames. As the proposed framework is capable of jointly exploiting the image appearance space and articulated/kinematic motion space, it generates realistic articulated motion sequence, in contrast to most previous video generation methods which yield blurred motion effects. We test our model on two human action datasets including KTH and Human3.6M, and the proposed framework generates very promising results on both datasets.
- Most existing matching algorithms are one-off algorithms, i.e., they usually measure the distance between the two image feature representation vectors for only one time. In contrast, human's vision system achieves this task, i.e., image matching, by recursively looking at specific/related parts of both images and then making the final judgement. Towards this end, we propose a novel loopy recurrent neural network (Loopy RNN), which is capable of aggregating relationship information of two input images in a progressive/iterative manner and outputting the consolidated matching score in the final iteration. A Loopy RNN features two uniqueness. First, built on conventional long short-term memory (LSTM) nodes, it links the output gate of the tail node to the input gate of the head node, thus it brings up symmetry property required for matching. Second, a monotonous loss designed for the proposed network guarantees increasing confidence during the recursive matching process. Extensive experiments on several image matching benchmarks demonstrate the great potential of the proposed method.
- We present an end-to-end, multimodal, fully convolutional network for extracting semantic structures from document images. We consider document semantic structure extraction as a pixel-wise segmentation task, and propose a unified model that classifies pixels based not only on their visual appearance, as in the traditional page segmentation task, but also on the content of underlying text. Moreover, we propose an efficient synthetic document generation process that we use to generate pretraining data for our network. Once the network is trained on a large set of synthetic documents, we fine-tune the network on unlabeled real documents using a semi-supervised approach. We systematically study the optimum network architecture and show that both our multimodal approach and the synthetic data pretraining significantly boost the performance.
- Jun 08 2017 cs.SI physics.soc-ph arXiv:1706.02035v1This paper addressed the issue of estimating the damage caused by a computer virus. First, an individual-level delayed SIR model capturing the spreading process of a digital virus is derived. Second, the damage inflicted by the virus is modeled as the sum of the economic losses and the cost for developing the antivirus. Next, the impact of different factors, including the delay and the network structure, on the damage is explored by means of computer simulations. Thereby some measures of reducing the damage of a virus are recommended. To our knowledge, this is the first time the antivirus-developing cost is taken into account when estimating the damage of a virus.
- Jun 02 2017 cs.CV arXiv:1706.00212v2Key to automatically generate natural scene images is to properly arrange among various spatial elements, especially in the depth direction. To this end, we introduce a novel depth structure preserving scene image generation network (DSP-GAN), which favors a hierarchical and heterogeneous architecture, for the purpose of depth structure preserving scene generation. The main trunk of the proposed infrastructure is built on a Hawkes point process that models the spatial dependency between different depth layers. Within each layer generative adversarial sub-networks are trained collaboratively to generate realistic scene components, conditioned on the layer information produced by the point process. We experiment our model on a sub-set of SUNdataset with annotated scene images and demonstrate that our models are capable of generating depth-realistic natural scene image.
- May 31 2017 cs.SI arXiv:1705.10618v1Spreading truths and blocking rumors are two typical strategies for inhibiting rumors. In practice, a tradeoff between the two strategies, which is known as the TSRB strategy, may achieve a better cost-effectiveness. This paper is devoted to assessing the effectiveness of the TSRB strategy. For that purpose, an individual-level spreading model (the generic URQT model) capturing the interaction between a rumor and the truth is established. Under the model, a set of criteria for the dying out of a rumor is presented. These criteria capture the combined influence of the basic parameters and the network structures on the effectiveness of the TSRB strategy. Experimental results show that, when the rumor dies out, the dynamics of a simplified URQT model (the linear URQT model) fits well with the actual rumor-truth interacting process. Therefore, the generic URQT model and sometimes the linear URQT model provide a proper basis for assessing the effectiveness of the TSRB strategy.
- May 29 2017 cs.CV arXiv:1705.09467v1Predicting human interaction is challenging as the on-going activity has to be inferred based on a partially observed video. Essentially, a good algorithm should effectively model the mutual influence between the two interacting subjects. Also, only a small region in the scene is discriminative for identifying the on-going interaction. In this work, we propose a relative attention model to explicitly address these difficulties. Built on a tri-coupled deep recurrent structure representing both interacting subjects and global interaction status, the proposed network collects spatio-temporal information from each subject, rectified with global interaction information, yielding effective interaction representation. Moreover, the proposed network also unifies an attention module to assign higher importance to the regions which are relevant to the on-going action. Extensive experiments have been conducted on two public datasets, and the results demonstrate that the proposed relative attention network successfully predicts informative regions between interacting subjects, which in turn yields superior human interaction prediction accuracy.
- Event sequence, asynchronously generated with random timestamp, is ubiquitous among applications. The precise and arbitrary timestamp can carry important clues about the underlying dynamics, and has lent the event data fundamentally different from the time-series whereby series is indexed with fixed and equal time interval. One expressive mathematical tool for modeling event is point process. The intensity functions of many point processes involve two components: the background and the effect by the history. Due to its inherent spontaneousness, the background can be treated as a time series while the other need to handle the history events. In this paper, we model the background by a Recurrent Neural Network (RNN) with its units aligned with time series indexes while the history effect is modeled by another RNN whose units are aligned with asynchronous events to capture the long-range dynamics. The whole model with event type and timestamp prediction output layers can be trained end-to-end. Our approach takes an RNN perspective to point process, and models its background and history effect. For utility, our method allows a black-box treatment for modeling the intensity which is often a pre-defined parametric form in point processes. Meanwhile end-to-end training opens the venue for reusing existing rich techniques in deep network for point process modeling. We apply our model to the predictive maintenance problem using a log dataset by more than 1000 ATMs from a global bank headquartered in North America.
- May 19 2017 cs.SI arXiv:1705.06604v1Spreading truths is recognized as a feasible strategy for inhibiting rumors. This paper is devoted to assessing the effectiveness of the truth-spreading strategy. An individual-level rumor-truth spreading model (the generic URTU model) is derived. Under the model, two criteria for the termination of a rumor are presented. These criteria capture the influence of the network structures on the effectiveness of the truth-spreading strategy. Extensive simulations show that, when the rumor or the truth terminates, the dynamics of a simplified URTU model (the linear URTU model) fits well with the actual rumor-truth interplay process. Therefore, the generic URTU model forms a theoretical basis for assessing the effectiveness of the truth-spreading strategy for restraining rumors.
- May 16 2017 cs.SI arXiv:1705.04818v1The decentralized patch distribution mechanism holds significant promise as an alternative to its centralized counterpart. For the purpose of accurately evaluating the performance of the decentralized patch distribution mechanism and based on the exact SIPS model that accurately captures the average dynamics of the interaction between viruses and patches, a new virus-patch interacting model, which is known as the generic SIPS model, is proposed. This model subsumes the linear SIPS model. The dynamics of the generic SIPS model is studied comprehensively. In particular, a set of criteria for the final extinction or/and long-term survival of viruses or/and patches are presented. Some conditions for the linear SIPS model to accurately capture the average dynamics of the virus-patch interaction are empirically found. As a consequence, the linear SIPS model can be adopted as a standard model for assessing the performance of the distributed patch distribution mechanism, provided the proper conditions are satisfied.
- Apr 25 2017 cs.SI arXiv:1704.06920v1As compared to the traditional advertising, the word-of-mouth (WOM) communications have striking advantages such as significantly lower cost and rapid delivery; this is especially the case with the popularity of online social networks. This paper addresses the issue of maximizing the overall profit of a WOM marketing campaign. A marketing process with both positive and negative WOM is modeled as a dynamical model knwn as the SIPNS model, and the profit maximization problem is modeled as a constrained optimization problem. The influence of different factors on the dynamics of the SIPNS model is revealed experimentally. Also, the impact of different factors on the expected overall profit of a WOM marketing campaign is uncovered experimentally. On this basis, some promotion strategies are suggested. To our knowledge, this is the first time a WOM marketing campaign is treated this way.
- Apr 25 2017 cs.SI arXiv:1704.06910v1This paper addresses the discount pricing in word-of-mouth (WOM) marketing. A new discount strategy known as the Infection-Based Discount (IBD) strategy is proposed. The basic idea of the IBD strategy lies in that each customer enjoys a discount that is linearly proportional to his/her influence in the WOM network. To evaluate the performance of the IBD strategy, the WOM spreading process is modeled as a dynamic model known as the DPA model, and the performance of the IBD strategy is modeled as a function of the basic discount. Next, the influence of different factors, including the basic discount and the WOM network, on the dynamics of the DPA model is revealed experimentally. Finally, the influence of different factors on the performance of the IBD strategy is uncovered experimentally. On this basis, some promotional measures are recommended.
- Apr 21 2017 cs.CV arXiv:1704.06020v2Despite the promising progress made in recent years, person re-identification (re-ID) remains a challenging task due to the complex variations in human appearances from different camera views. For this challenging problem, a large variety of algorithms have been developed in the fully-supervised setting, requiring access to a large amount of labeled training data. However, the main bottleneck for fully-supervised re-ID is the limited availability of labeled training samples. To address this problem, in this paper, we propose a self-trained subspace learning paradigm for person re-ID which effectively utilizes both labeled and unlabeled data to learn a discriminative subspace where person images across disjoint camera views can be easily matched. The proposed approach first constructs pseudo pairwise relationships among unlabeled persons using the k-nearest neighbors algorithm. Then, with the pseudo pairwise relationships, the unlabeled samples can be easily combined with the labeled samples to learn a discriminative projection by solving an eigenvalue problem. In addition, we refine the pseudo pairwise relationships iteratively, which further improves the learning performance. A multi-kernel embedding strategy is also incorporated into the proposed approach to cope with the non-linearity in person's appearance and explore the complementation of multiple kernels. In this way, the performance of person re-ID can be greatly enhanced when training data are insufficient. Experimental results on six widely-used datasets demonstrate the effectiveness of our approach and its performance can be comparable to the reported results of most state-of-the-art fully-supervised methods while using much fewer labeled data.
- Apr 19 2017 cs.NE arXiv:1704.05174v1Optimization techniques play an important role in several scientific and real-world applications, thus becoming of great interest for the community. As a consequence, a number of open-source libraries are available in the literature, which ends up fostering the research and development of new techniques and applications. In this work, we present a new library for the implementation and fast prototyping of nature-inspired techniques called LibOPT. Currently, the library implements 15 techniques and 112 benchmarking functions, as well as it also supports 11 hypercomplex-based optimization approaches, which makes it one of the first of its kind. We showed how one can easily use and also implement new techniques in LibOPT under the C paradigm. Examples are provided with samples of source-code using benchmarking functions.
- We propose a novel fifth-generation (5G) rapid prototyping (RaPro) system architecture by combining FPGA-privileged modules from a software defined radio (or FPGA-coprocessor) and high-level programming language for advanced algorithms from multi-core general purpose processors. The proposed system architecture exhibits excellent flexibility and scalability in the development of a 5G prototyping system. As a proof of concept, a multi-user full-dimension multiple-input and multiple-output system is established based on the proposed architecture. Experimental results demonstrate the superiority of the proposed architecture in large-scale antenna and wideband communication systems.
- Apr 18 2017 cs.RO arXiv:1704.05016v1Loop closure detection (LCD) is an indispensable part of simultaneous localization and mapping systems (SLAM); it enables robots to produce a consistent map by recognizing previously visited places. When robots operate over extended periods, robustness to viewpoint and condition changes as well as satisfactory real-time performance become essential requirements for a practical LCD system. This paper presents an approach to directly utilize the outputs at the intermediate layer of a pre-trained convolutional neural network (CNN) as image descriptors. The matching location is determined by matching the image sequences through a method called SeqCNNSLAM. The utility of SeqCNNSLAM is comprehensively evaluated in terms of viewpoint and condition invariance. Experiments show that SeqCNNSLAM outperforms state-of-the-art LCD systems, such as SeqSLAM and Change Removal, in most cases. To allow for the real-time performance of SeqCNNSLAM, an acceleration method, A-SeqCNNSLAM, is established. This method exploits the location relationship between the matching images of adjacent images to reduce the matching range of the current image. Results demonstrate that acceleration of 4-6 is achieved with minimal accuracy degradation, and the method's runtime satisfies the real-time demand. To extend the applicability of A-SeqCNNSLAM to new environments, a method called O-SeqCNNSLAM is established for the online adjustment of the parameters of A-SeqCNNSLAM.
- Apr 12 2017 cs.CV arXiv:1704.03152v1In recent years, Deep Learning has been successfully applied to multimodal learning problems, with the aim of learning useful joint representations in data fusion applications. When the available modalities consist of time series data such as video, audio and sensor signals, it becomes imperative to consider their temporal structure during the fusion process. In this paper, we propose the Correlational Recurrent Neural Network (CorrRNN), a novel temporal fusion model for fusing multiple input modalities that are inherently temporal in nature. Key features of our proposed model include: (i) simultaneous learning of the joint representation and temporal dependencies between modalities, (ii) use of multiple loss terms in the objective function, including a maximum correlation loss term to enhance learning of cross-modal information, and (iii) the use of an attention model to dynamically adjust the contribution of different input modalities to the joint representation. We validate our model via experimentation on two different tasks: video- and sensor-based activity classification, and audio-visual speech recognition. We empirically analyze the contributions of different components of the proposed CorrRNN model, and demonstrate its robustness, effectiveness and state-of-the-art performance on multiple datasets.
- Apr 04 2017 cs.CV arXiv:1704.00036v1Registration involving one or more images containing pathologies is challenging, as standard image similarity measures and spatial transforms cannot account for common changes due to pathologies. Low-rank/Sparse (LRS) decomposition removes pathologies prior to registration; however, LRS is memory-demanding and slow, which limits its use on larger data sets. Additionally, LRS blurs normal tissue regions, which may degrade registration performance. This paper proposes an efficient alternative to LRS: (1) normal tissue appearance is captured by principal component analysis (PCA) and (2) blurring is avoided by an integrated model for pathology removal and image reconstruction. Results on synthetic and BRATS 2015 data demonstrate its utility.
- Apr 03 2017 cs.CV arXiv:1703.10902v1We introduce a deep encoder-decoder architecture for image deformation prediction from multimodal images. Specifically, we design an image-patch-based deep network that jointly (i) learns an image similarity measure and (ii) the relationship between image patches and deformation parameters. While our method can be applied to general image registration formulations, we focus on the Large Deformation Diffeomorphic Metric Mapping (LDDMM) registration model. By predicting the initial momentum of the shooting formulation of LDDMM, we preserve its mathematical properties and drastically reduce the computation time, compared to optimization-based approaches. Furthermore, we create a Bayesian probabilistic version of the network that allows evaluation of registration uncertainty via sampling of the network at test time. We evaluate our method on a 3D brain MRI dataset using both T1- and T2-weighted images. Our experiments show that our method generates accurate predictions and that learning the similarity measure leads to more consistent registrations than relying on generic multimodal image similarity measures, such as mutual information. Our approach is an order of magnitude faster than optimization-based LDDMM.
- Apr 03 2017 cs.CV arXiv:1703.10908v4This paper introduces Quicksilver, a fast deformable image registration method. Quicksilver registration for image-pairs works by patch-wise prediction of a deformation model based directly on image appearance. A deep encoder-decoder network is used as the prediction model. While the prediction strategy is general, we focus on predictions for the Large Deformation Diffeomorphic Metric Mapping (LDDMM) model. Specifically, we predict the momentum-parameterization of LDDMM, which facilitates a patch-wise prediction strategy while maintaining the theoretical properties of LDDMM, such as guaranteed diffeomorphic mappings for sufficiently strong regularization. We also provide a probabilistic version of our prediction network which can be sampled during the testing time to calculate uncertainties in the predicted deformations. Finally, we introduce a new correction network which greatly increases the prediction accuracy of an already existing prediction network. We show experimental results for uni-modal atlas-to-image as well as uni- / multi- modal image-to-image registrations. These experiments demonstrate that our method accurately predicts registrations obtained by numerical optimization, is very fast, achieves state-of-the-art registration results on four standard validation datasets, and can jointly learn an image similarity measure. Quicksilver is freely available as an open-source software.
- Mar 27 2017 cs.LG arXiv:1703.08524v1A variety of real-world processes (over networks) produce sequences of data whose complex temporal dynamics need to be studied. More especially, the event timestamps can carry important information about the underlying network dynamics, which otherwise are not available from the time-series evenly sampled from continuous signals. Moreover, in most complex processes, event sequences and evenly-sampled times series data can interact with each other, which renders joint modeling of those two sources of data necessary. To tackle the above problems, in this paper, we utilize the rich framework of (temporal) point processes to model event data and timely update its intensity function by the synergic twin Recurrent Neural Networks (RNNs). In the proposed architecture, the intensity function is synergistically modulated by one RNN with asynchronous events as input and another RNN with time series as input. Furthermore, to enhance the interpretability of the model, the attention mechanism for the neural point process is introduced. The whole model with event type and timestamp prediction output layers can be trained end-to-end and allows a black-box treatment for modeling the intensity. We substantiate the superiority of our model in synthetic data and three real-world benchmark datasets.
- Mar 06 2017 cs.CV arXiv:1703.01025v1In this study, a multi-task deep neural network is proposed for skin lesion analysis. The proposed multi-task learning model solves different tasks (e.g., lesion segmentation and two independent binary lesion classifications) at the same time by exploiting commonalities and differences across tasks. This results in improved learning efficiency and potential prediction accuracy for the task-specific models, when compared to training the individual models separately. The proposed multi-task deep learning model is trained and evaluated on the dermoscopic image sets from the International Skin Imaging Collaboration (ISIC) 2017 Challenge - Skin Lesion Analysis towards Melanoma Detection, which consists of 2000 training samples and 150 evaluation samples. The experimental results show that the proposed multi-task deep learning model achieves promising performances on skin lesion segmentation and classification. The average value of Jaccard index for lesion segmentation is 0.724, while the average values of area under the receiver operating characteristic curve (AUC) on two individual lesion classifications are 0.880 and 0.972, respectively.
- Robot awareness of human actions is an essential research problem in robotics with many important real-world applications, including human-robot collaboration and teaming. Over the past few years, depth sensors have become a standard device widely used by intelligent robots for 3D perception, which can also offer human skeletal data in 3D space. Several methods based on skeletal data were designed to enable robot awareness of human actions with satisfactory accuracy. However, previous methods treated all body parts and features equally important, without the capability to identify discriminative body parts and features. In this paper, we propose a novel simultaneous Feature And Body-part Learning (FABL) approach that simultaneously identifies discriminative body parts and features, and efficiently integrates all available information together to enable real-time robot awareness of human behaviors. We formulate FABL as a regression-like optimization problem with structured sparsity-inducing norms to model interrelationships of body parts and features. We also develop an optimization algorithm to solve the formulated problem, which possesses a theoretical guarantee to find the optimal solution. To evaluate FABL, three experiments were performed using public benchmark datasets, including the MSR Action3D and CAD-60 datasets, as well as a Baxter robot in practical assistive living applications. Experimental results show that our FABL approach obtains a high recognition accuracy with a processing speed of the order-of-magnitude of 10e4 Hz, which makes FABL a promising method to enable real-time robot awareness of human behaviors in practical robotics applications.
- Apprenticeship learning has recently attracted a wide attention due to its capability of allowing robots to learn physical tasks directly from demonstrations provided by human experts. Most previous techniques assumed that the state space is known a priori or employed simple state representations that usually suffer from perceptual aliasing. Different from previous research, we propose a novel approach named Sequence-based Multimodal Apprenticeship Learning (SMAL), which is capable to simultaneously fusing temporal information and multimodal data, and to integrate robot perception with decision making. To evaluate the SMAL approach, experiments are performed using both simulations and real-world robots in the challenging search and rescue scenarios. The empirical study has validated that our SMAL approach can effectively learn plans for robots to make decisions using sequence of multimodal observations. Experimental results have also showed that SMAL outperforms the baseline methods using individual images.
- Jan 24 2017 cs.PL arXiv:1701.06104v1The verification of linearizability -- a key correctness criterion for concurrent objects -- is based on trace refinement whose checking is PSPACE-complete. This paper suggests to use \emphbranching bisimulation instead. Our approach is based on comparing an abstract specification in which object methods are executed atomically to a real object program. Exploiting divergence sensitivity, this also applies to progress properties such as lock-freedom. These results enable the use of \emphpolynomial-time divergence-sensitive branching bisimulation checking techniques for verifying linearizability and progress. We conducted the experiment on concurrent lock-free stacks to validate the efficiency and effectiveness of our methods.
- Jan 24 2017 cs.CV arXiv:1701.06351v1We address the person re-identification problem by effectively exploiting a globally discriminative feature representation from a sequence of tracked human regions/patches. This is in contrast to previous person re-id works, which rely on either single frame based person to person patch matching, or graph based sequence to sequence matching. We show that a progressive/sequential fusion framework based on long short term memory (LSTM) network aggregates the frame-wise human region representation at each time stamp and yields a sequence level human feature representation. Since LSTM nodes can remember and propagate previously accumulated good features and forget newly input inferior ones, even with simple hand-crafted features, the proposed recurrent feature aggregation network (RFA-Net) is effective in generating highly discriminative sequence level human representations. Extensive experimental results on two person re-identification benchmarks demonstrate that the proposed method performs favorably against state-of-the-art person re-identification methods.
- In this paper, considering multiple interference regions simultaneously, an optimal antenna deployment problem for distributed Multi-Input Multi-Output (MIMO) radar is investigated. The optimal antenna deployment problem is solved by proposing an antenna deployment method based on Multi-Objective Particle Swarm Optimization (MOPSO). Firstly, we construct a multi-objective optimization problem for MIMO radar antenna deployment by choosing the interference power densities of different regions as objective functions. Then, to obtain the optimal deployment result without wasting time and computational resources, an iteration convergence criterion based on interval distance is proposed. The iteration convergence criterion can be used to stop the MOPSO optimization process efficiently when the optimal antenna deployment algorithm reaches the desired convergence level. Finally, numerical results are provided to verify the validity of the proposed algorithm.
- Mobile Crowdsourcing is a promising service paradigm utilizing ubiquitous mobile devices to facilitate largescale crowdsourcing tasks (e.g. urban sensing and collaborative computing). Many applications in this domain require Device-to-Device (D2D) communications between participating devices for interactive operations such as task collaborations and file transmissions. Considering the private participating devices and their opportunistic encountering behaviors, it is highly desired to establish secure and trustworthy D2D connections in a fast and autonomous way, which is vital for implementing practical Mobile Crowdsourcing Systems (MCSs). In this paper, we develop an efficient scheme, Trustworthy Device Pairing (TDP), which achieves user-transparent secure D2D connections and reliable peer device selections for trustworthy D2D communications. Through rigorous analysis, we demonstrate the effectiveness and security intensity of TDP in theory. The performance of TDP is evaluated based on both real-world prototype experiments and extensive trace-driven simulations. Evaluation results verify our theoretical analysis and show that TDP significantly outperforms existing approaches in terms of pairing speed, stability, and security.
- Mobile Crowdsensing is a promising paradigm for ubiquitous sensing, which explores the tremendous data collected by mobile smart devices with prominent spatial-temporal coverage. As a fundamental property of Mobile Crowdsensing Systems, temporally recruited mobile users can provide agile, fine-grained, and economical sensing labors, however their self-interest cannot guarantee the quality of the sensing data, even when there is a fair return. Therefore, a mechanism is required for the system server to recruit well-behaving users for credible sensing, and to stimulate and reward more contributive users based on sensing truth discovery to further increase credible reporting. In this paper, we develop a novel Cheating-Resilient Incentive (CRI) scheme for Mobile Crowdsensing Systems, which achieves credibility-driven user recruitment and payback maximization for honest users with quality data. Via theoretical analysis, we demonstrate the correctness of our design. The performance of our scheme is evaluated based on extensive realworld trace-driven simulations. Our evaluation results show that our scheme is proven to be effective in terms of both guaranteeing sensing accuracy and resisting potential cheating behaviors, as demonstrated in practical scenarios, as well as those that are intentionally harsher.
- Dec 20 2016 cs.CV arXiv:1612.05890v1Numerous single-image super-resolution algorithms have been proposed in the literature, but few studies address the problem of performance evaluation based on visual perception. While most super-resolution images are evaluated by fullreference metrics, the effectiveness is not clear and the required ground-truth images are not always available in practice. To address these problems, we conduct human subject studies using a large set of super-resolution images and propose a no-reference metric learned from visual perceptual scores. Specifically, we design three types of low-level statistical features in both spatial and frequency domains to quantify super-resolved artifacts, and learn a two-stage regression model to predict the quality scores of super-resolution images without referring to ground-truth images. Extensive experimental results show that the proposed metric is effective and efficient to assess the quality of super-resolution images based on human perception.
- Dec 14 2016 cs.CR arXiv:1612.04350v2High-dimensional crowdsourced data collected from a large number of users produces rich knowledge for our society. However, it also brings unprecedented privacy threats to participants. Local privacy, a variant of differential privacy, is proposed as a means to eliminate the privacy concern. Unfortunately, achieving local privacy on high-dimensional crowdsourced data raises great challenges on both efficiency and effectiveness. Here, based on EM and Lasso regression, we propose efficient multi-dimensional joint distribution estimation algorithms with local privacy. Then, we develop a Locally privacy-preserving high-dimensional data Publication algorithm, LoPub, by taking advantage of our distribution estimation techniques. In particular, both correlations and joint distribution among multiple attributes can be identified to reduce the dimension of crowdsourced data, thus achieving both efficiency and effectiveness in locally private high-dimensional data publication. Extensive experiments on real-world datasets demonstrated that the efficiency of our multivariate distribution estimation scheme and confirm the effectiveness of our LoPub scheme in generating approximate datasets with local privacy.
- Dec 07 2016 cs.CV arXiv:1612.01655v1Boundary incompleteness raises great challenges to automatic prostate segmentation in ultrasound images. Shape prior can provide strong guidance in estimating the missing boundary, but traditional shape models often suffer from hand-crafted descriptors and local information loss in the fitting procedure. In this paper, we attempt to address those issues with a novel framework. The proposed framework can seamlessly integrate feature extraction and shape prior exploring, and estimate the complete boundary with a sequential manner. Our framework is composed of three key modules. Firstly, we serialize the static 2D prostate ultrasound images into dynamic sequences and then predict prostate shapes by sequentially exploring shape priors. Intuitively, we propose to learn the shape prior with the biologically plausible Recurrent Neural Networks (RNNs). This module is corroborated to be effective in dealing with the boundary incompleteness. Secondly, to alleviate the bias caused by different serialization manners, we propose a multi-view fusion strategy to merge shape predictions obtained from different perspectives. Thirdly, we further implant the RNN core into a multiscale Auto-Context scheme to successively refine the details of the shape prediction map. With extensive validation on challenging prostate ultrasound images, our framework bridges severe boundary incompleteness and achieves the best performance in prostate boundary delineation when compared with several advanced methods. Additionally, our approach is general and can be extended to other medical image segmentation tasks, where boundary incompleteness is one of the main challenges.
- Natural language understanding and dialogue policy learning are both essential in conversational systems that predict the next system actions in response to a current user utterance. Conventional approaches aggregate separate models of natural language understanding (NLU) and system action prediction (SAP) as a pipeline that is sensitive to noisy outputs of error-prone NLU. To address the issues, we propose an end-to-end deep recurrent neural network with limited contextual dialogue memory by jointly training NLU and SAP on DSTC4 multi-domain human-human dialogues. Experiments show that our proposed model significantly outperforms the state-of-the-art pipeline models for both NLU and SAP, which indicates that our joint model is capable of mitigating the affects of noisy NLU outputs, and NLU model can be refined by error flows backpropagating from the extra supervised signals of system actions.
- The number balancing (NBP) problem is the following: given real numbers $a_1,\ldots,a_n \in [0,1]$, find two disjoint subsets $I_1,I_2 \subseteq [n]$ so that the difference $|\sum_{i \in I_1}a_i - \sum_{i \in I_2}a_i|$ of their sums is minimized. An application of the pigeonhole principle shows that there is always a solution where the difference is at most $O(\frac{\sqrt{n}}{2^n})$. Finding the minimum, however, is NP-hard. In polynomial time,the differencing algorithm by Karmarkar and Karp from 1982 can produce a solution with difference at most $n^{-\Theta(\log n)}$, but no further improvement has been made since then. In this paper, we show a relationship between NBP and Minkowski's Theorem. First we show that an approximate oracle for Minkowski's Theorem gives an approximate NBP oracle. Perhaps more surprisingly, we show that an approximate NBP oracle gives an approximate Minkowski oracle. In particular, we prove that any polynomial time algorithm that guarantees a solution of difference at most $2^{\sqrt{n}} / 2^{n}$ would give a polynomial approximation for Minkowski as well as a polynomial factor approximation algorithm for the Shortest Vector Problem.
- Nov 23 2016 cs.CV arXiv:1611.07385v1Physical library collections are valuable and long standing resources for knowledge and learning. However, managing books in a large bookshelf and finding books on it often leads to tedious manual work, especially for large book collections where books might be missing or misplaced. Recently, deep neural models, such as Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN) have achieved great success for scene text detection and recognition. Motivated by these recent successes, we aim to investigate their viability in facilitating book management, a task that introduces further challenges including large amounts of cluttered scene text, distortion, and varied lighting conditions. In this paper, we present a library inventory building and retrieval system based on scene text reading methods. We specifically design our scene text recognition model using rich supervision to accelerate training and achieve state-of-the-art performance on several benchmark datasets. Our proposed system has the potential to greatly reduce the amount of human labor required in managing book inventories as well as the space needed to store book information.
- This paper tests the hypothesis that distinctive feature classifiers anchored at phonetic landmarks can be transferred cross-lingually without loss of accuracy. Three consonant voicing classifiers were developed: (1) manually selected acoustic features anchored at a phonetic landmark, (2) MFCCs (either averaged across the segment or anchored at the landmark), and(3) acoustic features computed using a convolutional neural network (CNN). All detectors are trained on English data (TIMIT),and tested on English, Turkish, and Spanish (performance measured using F1 and accuracy). Experiments demonstrate that manual features outperform all MFCC classifiers, while CNNfeatures outperform both. MFCC-based classifiers suffer an F1reduction of 16% absolute when generalized from English to other languages. Manual features suffer only a 5% F1 reduction,and CNN features actually perform better in Turkish and Span-ish than in the training language, demonstrating that features capable of representing long-term spectral dynamics (CNN and landmark-based features) are able to generalize cross-lingually with little or no loss of accuracy
- Nov 01 2016 cs.SE arXiv:1610.09405v1Specialized image processing accelerators are necessary to deliver the performance and energy efficiency required by important applications in computer vision, computational photography, and augmented reality. But creating, "programming,"and integrating this hardware into a hardware/software system is difficult. We address this problem by extending the image processing language, Halide, so users can specify which portions of their applications should become hardware accelerators, and then we provide a compiler that uses this code to automatically create the accelerator along with the "glue" code needed for the user's application to access this hardware. Starting with Halide not only provides a very high-level functional description of the hardware, but also allows our compiler to generate the complete software program including the sequential part of the workload, which accesses the hardware for acceleration. Our system also provides high-level semantics to explore different mappings of applications to a heterogeneous system, with the added flexibility of being able to map at various throughput rates. We demonstrate our approach by mapping applications to a Xilinx Zynq system. Using its FPGA with two low-power ARM cores, our design achieves up to 6x higher performance and 8x lower energy compared to the quad-core ARM CPU on an NVIDIA Tegra K1, and 3.5x higher performance with 12x lower energy compared to the K1's 192-core GPU.
- Linearizability and progress properties are key correctness notions for concurrent objects. However, model checking linearizability has suffered from the PSPACE-hardness of the trace inclusion problem. This paper proposes to exploit branching bisimulation, a fundamental semantic equivalence relation developed for process algebras which can be computed efficiently, in checking these properties. A quotient construction is provided which results in huge state space reductions. We confirm the advantages of the proposed approach on more than a dozen benchmark problems.
- Sep 22 2016 cs.CV arXiv:1609.06371v3The robust estimator presented in this paper processes each structure independently. The scales of the structures are estimated adaptively and no threshold is involved in spite of different objective functions. The user has to specify only the number of elemental subsets for random sampling. After classifying all the input data, the segmented structures are sorted by their strengths and the strongest inlier structures come out at the top. Like any robust estimators, this algorithm also has limitations which are described in detail. Several synthetic and real examples are presented to illustrate every aspect of the algorithm.
- Spurred by the dramatic mobile IP growth and the emerging Internet of Things (IoT) and cloud-based applications, wireless networking is witnessing a paradigm shift. By fully exploiting the spatial degrees of freedom, the massive multipleinput- multiple-output (MIMO) technology promises significant gains in both data rates and link reliability. This paper presents a time-division duplex (TDD)-based 128-antenna massive MIMO prototyping system designed to operate on a 20 MHz bandwidth. Up to twelve single-antenna users can be served by the designed system at the same time. System model is provided and link-level simulation corresponding to our practical TDDbased massive MIMO prototyping system is conducted to validate our design and performance of the algorithms. Based on the system hardware design demonstrated in this paper, both uplink real-time video and downlink data transmissions are realized, and the experiment results show that 268.8 Mbps rate was achieved for eight single-antenna users using QPSK modulation. The maximum spectral efficiency of the designed system will be 80.64 bit/s/Hz by twelve single-antenna users with 256-QAM modulation.
- Jul 26 2016 cs.SI physics.soc-ph arXiv:1607.06819v1Social media play an increasingly important role in political communication. Various studies investigated how individuals adopt social media for political discussion, to share their views about politics and policy, or to mobilize and protest against social issues. Yet, little attention has been devoted to the main actors of political discussions: the politicians. In this paper, we explore the topics of discussion of U.S. President Obama and the 50 U.S. State Governors using Twitter data and agenda-setting theory as a tool to describe the patterns of daily political discussion, uncovering the main topics of attention and interest of these actors. We examine over one hundred thousand tweets produced by these politicians and identify seven macro-topics of conversation, finding that Twitter represents a particularly appealing vehicle of conversation for American opposition politicians. We highlight the main motifs of political conversation of the two parties, discovering that Republican and Democrat Governors are more or less similarly active on Twitter but exhibit different styles of communication. Finally, by reconstructing the networks of occurrences of Governors' hashtags and keywords related to political issues, we observe that Republicans form a tight core, with a stronger shared agenda on many issues of discussion.