results for au:Li_W in:cs

- Apr 24 2018 cs.DS arXiv:1804.08178v1We design new algorithms for maximizing a monotone non-negative submodular function under various constraints, which improve the state-of-the-art in time complexity and/or performance guarantee. We first investigate the cardinality constrained submodular maximization problem that has been widely studied for about four decades. We design an $(1-\frac{1}{e}-\varepsilon)$-approximation algorithm that makes $O(n\cdot \max \{\varepsilon^{-1},\log\log k \})$ queries. To the best of our knowledge, this is the fastest known algorithm. We further answer the open problem on finding a lower bound on the number of queries. We show that, no (randomized) algorithm can achieve a ratio better than $(\frac{1}{2}+\Theta(1))$ with $o(\frac{n}{\log n})$ queries. The acceleration above is achieved by our \emphAdaptive Decreasing Threshold (ADT) algorithm. Based on ADT, we study the $p$-system and $d$ knapsack constrained maximization problem. We show that an $(1/(p+\frac{7}{4}d+1)-\varepsilon)$-approximate solution can be computed via $O(\frac{n}{\varepsilon}\log \frac{n}{\varepsilon}\max\{\log \frac{1}{\varepsilon},\log\log n\})$ queries. Note that it improves the state of the art in both time complexity and approximation ratio. We also show how to improve the ratio for a single knapsack constraint via $O(n\cdot \max \{\varepsilon^{-1},\log\log k \})$ queries. For maximizing a submodular function with curvature $\kappa$ under matroid constraint, we show an $(1-\frac{\kappa}{e}-\varepsilon)$-approximate algorithm that uses $\tilde{O}(nk)$ value oracle queries. Our ADT could be utilized to obtain faster algorithms in other problems. To prove our results, we introduce a general characterization between randomized complexity and deterministic complexity of approximation algorithms that could be used in other problems and may be interesting in its own right.
- Blockchain stores information into a chain of blocks, whose integrity is usually guaranteed by Proof of Work (PoW). In many blockchain applications (including cryptocurrencies), users compete with each other to win the ownership of the blocks, a process commonly referred as mining. Mining activities consume huge amount of power, while the outcome appears to be useless besides validating a block. Here we discuss the requirements of designing a new PoW algorithm. We also propose a PoW scheme to help solve high-dimension, non-linear optimization problems. The revised scheme enables us to address difficult scientific questions as a byproduct of mining.
- Apr 24 2018 cs.RO arXiv:1804.07906v1Monocular camera systems are prevailing in intelligent transportation systems, but by far they have rarely been used for dimensional purposes such as to accurately estimate the localization information of a vehicle. In this paper, we show that this capability can be realized. By integrating a series of advanced computer vision techniques including foreground extraction, edge and line detection, etc., and by utilizing deep learning networks for fine-grained vehicle model classification, we developed an algorithm which can estimate vehicles location (position, orientation and boundaries) within the environment down to 3.79 percent position accuracy and 2.5 degrees orientation accuracy. With this enhancement, current massive surveillance camera systems can potentially play the role of e-traffic police and trigger many new intelligent transportation applications, for example, to guide vehicles for parking or even for autonomous driving.
- We propose a dynamic boosted ensemble learning method based on random forest (DBRF), a novel ensemble algorithm that incorporates the notion of hard example mining into Random Forest (RF) and thus combines the high accuracy of Boosting algorithm with the strong generalization of Bagging algorithm. Specifically, we propose to measure the quality of each leaf node of every decision tree in the random forest to determine hard examples. By iteratively training and then removing easy examples and noise examples from training data, we evolve the random forest to focus on hard examples dynamically so as to learn decision boundaries better. Data can be cascaded through these random forests learned in each iteration in sequence to generate predictions, thus making RF deep. We also propose to use evolution mechanism and smart iteration mechanism to improve the performance of the model. DBRF outperforms RF on three UCI datasets and achieved state-of-the-art results compared to other deep models. Moreover, we show that DBRF is also a new way of sampling and can be very useful when learning from unbalanced data.
- Apr 19 2018 cs.CV arXiv:1804.06423v1This work presents a deep object co-segmentation (DOCS) approach for segmenting common objects of the same class within a pair of images. This means that the method learns to ignore common, or uncommon, background stuff and focuses on objects. If multiple object classes are presented in the image pair, they are jointly extracted as foreground. To address this task, we propose a CNN-based Siamese encoder-decoder architecture. The encoder extracts high-level semantic features of the foreground objects, a mutual correlation layer detects the common objects, and finally, the decoder generates the output foreground masks for each image. To train our model, we compile a large object co-segmentation dataset consisting of image pairs from the PASCAL VOC dataset with common objects masks. We evaluate our approach on commonly used datasets for co-segmentation tasks and observe that our approach consistently outperforms competing methods, for both seen and unseen object classes.
- Apr 18 2018 cs.CV arXiv:1804.05984v1Detecting camouflaged moving foreground objects has been known to be difficult due to the similarity between the foreground objects and the background. Conventional methods cannot distinguish the foreground from background due to the small differences between them and thus suffer from under-detection of the camouflaged foreground objects. In this paper, we present a fusion framework to address this problem in the wavelet domain. We first show that the small differences in the image domain can be highlighted in certain wavelet bands. Then the likelihood of each wavelet coefficient being foreground is estimated by formulating foreground and background models for each wavelet band. The proposed framework effectively aggregates the likelihoods from different wavelet bands based on the characteristics of the wavelet transform. Experimental results demonstrated that the proposed method significantly outperformed existing methods in detecting camouflaged foreground objects. Specifically, the average F-measure for the proposed algorithm was 0.87, compared to 0.71 to 0.8 for the other state-of-the-art methods.
- Apr 17 2018 cs.SI arXiv:1804.05143v1Emojis have been widely used in textual communications as a new way to convey nonverbal cues. An interesting observation is the various emoji usage patterns among different users. In this paper, we investigate the correlation between user personality traits and their emoji usage patterns, particularly on overall amounts and specific preferences. To achieve this goal, we build a large Twitter dataset which includes 352,245 users and over 1.13 billion tweets associated with calculated personality traits and emoji usage patterns. Our correlation and emoji prediction results provide insights into the power of diverse personalities that lead to varies emoji usage patterns as well as its potential in emoji recommendation tasks.
- Apr 12 2018 cs.LO arXiv:1804.03922v2In complex analysis, the winding number measures the number of times a path (counter-clockwise) winds around a point, while the Cauchy index can approximate how the path winds. We formalise this approximation in the Isabelle theorem prover, and provide a tactic to evaluate winding numbers through Cauchy indices. By further combining this approximation with the argument principle, we are able to make use of remainder sequences to effectively count the number of complex roots of a polynomial within some domains, such as a rectangle box and a half-plane.
- Apr 10 2018 cs.DS arXiv:1804.02801v1The $P_2$-packing problem asks for whether a graph contains $k$ vertex-disjoint paths each of length two. We continue the study of its kernelization algorithms, and develop a $5k$-vertex kernel.
- Apr 09 2018 cs.CV arXiv:1804.02201v1Manifold theory has been the central concept of many learning methods. However, learning modern CNNs with manifold structures has not raised due attention, mainly because of the inconvenience of imposing manifold structures onto the architecture of the CNNs. In this paper we present ManifoldNet, a novel method to encourage learning of manifold-aware representations. Our approach segments the input manifold into a set of fragments. By assigning the corresponding segmentation id as a pseudo label to every sample, we convert the problem of preserving the local manifold structure into a point-wise classification task. Due to its unsupervised nature, the segmentation tends to be noisy. We mitigate this by introducing ensemble manifold segmentation (EMS). EMS accounts for the manifold structure by dividing the training data into an ensemble of classification training sets that contain samples of local proximity. CNNs are trained on these ensembles under a multi-task learning framework to conform to the manifold. ManifoldNet can be trained with only the pseudo labels or together with task-specific labels. We evaluate ManifoldNet on two different tasks: network imitation (distillation) and semi-supervised learning. Our experiments show that the manifold structures are effectively utilized for both unsupervised and semi-supervised learning.
- In recent years, China, the United States and other countries, Google and other high-tech companies have increased investment in artificial intelligence. Deep learning is one of the current artificial intelligence research's key areas. This paper analyzes and summarizes the latest progress and future research directions of deep learning. Firstly, three basic models of deep learning are outlined, including multilayer perceptrons, convolutional neural networks, and recurrent neural networks. On this basis, we further analyze the emerging new models of convolution neural networks and recurrent neural networks. This paper then summarizes deep learning's applications in many areas of artificial intelligence, including voice, computer vision, natural language processing and so on. Finally, this paper discusses the existing problems of deep learning and gives the corresponding possible solutions.
- Apr 05 2018 cs.CV arXiv:1804.01194v2This paper proposes three simple, compact yet effective representations of depth sequences, referred to respectively as Dynamic Depth Images (DDI), Dynamic Depth Normal Images (DDNI) and Dynamic Depth Motion Normal Images (DDMNI), for both isolated and continuous action recognition. These dynamic images are constructed from a segmented sequence of depth maps using hierarchical bidirectional rank pooling to effectively capture the spatial-temporal information. Specifically, DDI exploits the dynamics of postures over time and DDNI and DDMNI exploit the 3D structural information captured by depth maps. Upon the proposed representations, a ConvNet based method is developed for action recognition. The image-based representations enable us to fine-tune the existing Convolutional Neural Network (ConvNet) models trained on image data without training a large number of parameters from scratch. The proposed method achieved the state-of-art results on three large datasets, namely, the Large-scale Continuous Gesture Recognition Dataset (means Jaccard index 0.4109), the Large-scale Isolated Gesture Recognition Dataset (59.21%), and the NTU RGB+D Dataset (87.08% cross-subject and 84.22% cross-view) even though only the depth modality was used.
- Apr 02 2018 cs.SC arXiv:1803.11301v1We show a new algorithm and its implementation for multiplying bit-polynomials of large degrees. The algorithm is based on evaluating polynomials at a specific set comprising a natural set for evaluation with additive FFT and a high order element under Frobenius map of $\mathbb{F}_{2}$. With the high order element, we can derive more values of the polynomials under Frobenius map. Besides, we also adapt the additive FFT to efficiently evaluate polynomials at the set with an encoding process. For the implementation, we reorder the computations in the additive FFT for reducing the number of memory writes and hiding the latency for reads. The algebraic operations, including field multiplication, bit-matrix transpose, and bit-matrix multiplication, are implemented with efficient SIMD instructions. As a result, we effect a software of best known efficiency, shown in our experiments.
- Mar 28 2018 cs.CV arXiv:1803.09786v1Most existing person re-identification (re-id) methods require supervised model learning from a separate large set of pairwise labelled training data for every single camera pair. This significantly limits their scalability and usability in real-world large scale deployments with the need for performing re-id across many camera views. To address this scalability problem, we develop a novel deep learning method for transferring the labelled information of an existing dataset to a new unseen (unlabelled) target domain for person re-id without any supervised learning in the target domain. Specifically, we introduce an Transferable Joint Attribute-Identity Deep Learning (TJ-AIDL) for simultaneously learning an attribute-semantic and identitydiscriminative feature representation space transferrable to any new (unseen) target domain for re-id tasks without the need for collecting new labelled training data from the target domain (i.e. unsupervised learning in the target domain). Extensive comparative evaluations validate the superiority of this new TJ-AIDL model for unsupervised person re-id over a wide range of state-of-the-art methods on four challenging benchmarks including VIPeR, PRID, Market-1501, and DukeMTMC-ReID.
- Mar 28 2018 cs.CV arXiv:1803.09210v2This paper proposes an importance weighted adversarial nets-based method for unsupervised domain adaptation, specific for partial domain adaptation where the target domain has less number of classes compared to the source domain. Previous domain adaptation methods generally assume the identical label spaces, such that reducing the distribution divergence leads to feasible knowledge transfer. However, such an assumption is no longer valid in a more realistic scenario that requires adaptation from a larger and more diverse source domain to a smaller target domain with less number of classes. This paper extends the adversarial nets-based domain adaptation and proposes a novel adversarial nets-based partial domain adaptation method to identify the source samples that are potentially from the outlier classes and, at the same time, reduce the shift of shared classes between domains.
- Mar 28 2018 cs.CV arXiv:1803.09208v1This paper presents a novel multi-task learning-based method for unsupervised domain adaptation. Specifically, the source and target domain classifiers are jointly learned by considering the geometry of target domain and the divergence between the source and target domains based on the concept of multi-task learning. Two novel algorithms are proposed upon the method using Regularized Least Squares and Support Vector Machines respectively. Experiments on both synthetic and real world cross domain recognition tasks have shown that the proposed methods outperform several state-of-the-art domain adaptation methods.
- Mar 23 2018 cs.RO arXiv:1803.08478v1This paper presents a versatile robotic system for sewing 3D structured object. Leveraging on using a customized robotic sewing device and closed-loop visual servoing control, an all-in-one solution for sewing personalized stent graft is demonstrated. Stitch size planning and automatic knot tying are proposed as the two key functions of the system. By using effective stitch size planning, sub-millimetre sewing accuracy is achieved for stitch sizes ranging from 2mm to 5mm. In addition, a thread manipulator for thread management and tension control is also proposed to perform successive knot tying to secure each stitch. Detailed laboratory experiments have been performed to access the proposed instruments and allied algorithms. The proposed framework can be generalised to a wide range of applications including 3D industrial sewing, as well as transferred to other clinical areas such as surgical suturing.
- Mar 20 2018 cs.CV arXiv:1803.06524v2Deep convolutional neural networks (CNNs) have greatly improved the Face Recognition (FR) performance in recent years. Almost all CNNs in FR are trained on the carefully labeled datasets containing plenty of identities. However, such high-quality datasets are very expensive to collect, which restricts many researchers to achieve state-of-the-art performance. In this paper, we propose a framework, called SeqFace, for learning discriminative face features. Besides a traditional identity training dataset, the designed SeqFace can train CNNs by using an additional dataset which includes a large number of face sequences collected from videos. Moreover, the label smoothing regularization (LSR) and a new proposed discriminative sequence agent (DSA) loss are employed to enhance discrimination power of deep face features via making full use of the sequence data. Our method achieves excellent performance on Labeled Faces in the Wild (LFW), YouTube Faces (YTF), only with a single ResNet. The code and models are publicly available on-line (https://github.com/huangyangyu/SeqFace).
- We study a natural Wasserstein gradient flow on manifolds of probability distributions with discrete sample spaces. We derive the Riemannian structure for the probability simplex from the dynamical formulation of the Wasserstein distance on a weighted graph. We pull back the geometric structure to the parameter space of any given probability model, which allows us to define a natural gradient flow there. In contrast to the natural Fisher-Rao gradient, the natural Wasserstein gradient incorporates a ground metric on sample space. We discuss implementations following the forward and backward Euler methods. We illustrate the analysis on elementary exponential family examples.
- Mar 15 2018 cs.NE arXiv:1803.05006v1In this paper, we propose a new scheme for modelling the diverse behavior of neurons. We introduce the conditional activation, in which a neurons activation function is dynamically modified by a control signal. We apply this method to recreate behavior of special neurons existing in the human auditory and visual system. A heterogeneous multilayered perceptron (MLP) incorporating the developed models demonstrates simultaneous improvement in learning speed and performance across a various number of hidden units and layers, compared to a homogeneous network composed of the conventional neuron model. For similar performance, the proposed model lowers the memory for storing network parameters significantly.
- Mar 14 2018 cs.CV arXiv:1803.04831v1Recurrent neural networks (RNNs) have been widely used for processing sequential data. However, RNNs are commonly difficult to train due to the well-known gradient vanishing and exploding problems and hard to learn long-term patterns. Long short-term memory (LSTM) and gated recurrent unit (GRU) were developed to address these problems, but the use of hyperbolic tangent and the sigmoid action functions results in gradient decay over layers. Consequently, construction of an efficiently trainable deep network is challenging. In addition, all the neurons in an RNN layer are entangled together and their behaviour is hard to interpret. To address these problems, a new type of RNN, referred to as independently recurrent neural network (IndRNN), is proposed in this paper, where neurons in the same layer are independent of each other and they are connected across layers. We have shown that an IndRNN can be easily regulated to prevent the gradient exploding and vanishing problems while allowing the network to learn long-term dependencies. Moreover, an IndRNN can work with non-saturated activation functions such as relu (rectified linear unit) and be still trained robustly. Multiple IndRNNs can be stacked to construct a network that is deeper than the existing RNNs. Experimental results have shown that the proposed IndRNN is able to process very long sequences (over 5000 time steps), can be used to construct very deep networks (21 layers used in the experiment) and still be trained robustly. Better performances have been achieved on various tasks by using IndRNNs compared with the traditional RNN and LSTM.
- To address the sparsity and cold start problem of collaborative filtering, researchers usually make use of side information, such as social networks or item attributes, to improve recommendation performance. This paper considers the knowledge graph as the source of side information. To address the limitations of existing embedding-based and path-based methods for knowledge-graph-aware recommendation, we propose Ripple Network, an end-to-end framework that naturally incorporates the knowledge graph into recommender systems. Similar to actual ripples propagating on the surface of water, Ripple Network stimulates the propagation of user preferences over the set of knowledge entities by automatically and iteratively extending a user's potential interests along links in the knowledge graph. The multiple "ripples" activated by a user's historically clicked items are thus superposed to form the preference distribution of the user with respect to a candidate item, which could be used for predicting the final clicking probability. Through extensive experiments on real-world datasets, we demonstrate that Ripple Network achieves substantial gains in a variety of scenarios, including movie, book and news recommendation, over several state-of-the-art baselines.
- Mar 09 2018 cs.CV arXiv:1803.03243v1Object detection typically assumes that training and test data are drawn from an identical distribution, which, however, does not always hold in practice. Such a distribution mismatch will lead to a significant performance drop. In this work, we aim to improve the cross-domain robustness of object detection. We tackle the domain shift on two levels: 1) the image-level shift, such as image style, illumination, etc, and 2) the instance-level shift, such as object appearance, size, etc. We build our approach based on the recent state-of-the-art Faster R-CNN model, and design two domain adaptation components, on image level and instance level, to reduce the domain discrepancy. The two domain adaptation components are based on H-divergence theory, and are implemented by learning a domain classifier in adversarial training manner. The domain classifiers on different levels are further reinforced with a consistency regularization to learn a domain-invariant region proposal network (RPN) in the Faster R-CNN model. We evaluate our newly proposed approach using multiple datasets including Cityscapes, KITTI, SIM10K, etc. The results demonstrate the effectiveness of our proposed approach for robust object detection in various domain shift scenarios.
- Based on the transfer learning, we design a bird species identification model that uses the VGG-16 model (pretrained on ImageNet) for feature extraction, then a classifier consisting of two fully-connected hidden layers and a Softmax layer is attached. We compare the performance of the proposed model with the original VGG16 model. The results show that the former has higher train efficiency, but lower mean average precisions(MAP). To improve the MAP of the proposed model, we investigate the result fusion mode to form multi-channel identification model, the best MAP reaches 0.9998. The number of model parameters is 13110, which is only 0.0082% of the VGG16 model. Also, the size demand of sample is decreased.
- Most recent approaches use the sequence-to-sequence model for paraphrase generation. The existing sequence-to-sequence model tends to memorize the words and the patterns in the training dataset instead of learning the meaning of the words. Therefore, the generated sentences are often grammatically correct but semantically improper. In this work, we introduce a novel model based on the encoder-decoder framework, called Word Embedding Attention Network (WEAN). Our proposed model generates the words by querying distributed word representations (i.e. neural word embeddings), hoping to capturing the meaning of the according words. Following previous work, we evaluate our model on two paraphrase-oriented tasks, namely text simplification and short text abstractive summarization. Experimental results show that our model outperforms the sequence-to-sequence baseline by the BLEU score of 6.3 and 5.5 on two English text simplification datasets, and the ROUGE-2 F1 score of 5.7 on a Chinese summarization dataset. Moreover, our model achieves state-of-the-art performances on these three benchmark datasets.
- Mar 06 2018 cs.CL arXiv:1803.01557v1During the long time of development, Chinese language has evolved a great deal. Native speakers now have difficulty in reading sentences written in ancient Chinese. In this paper, we propose an unsupervised algorithm that constructs sentence-aligned ancient-contemporary pairs out of the abundant passage-aligned corpus. With this method, we build a large parallel corpus. We propose to apply the sequence to sequence model to automatically transfer between ancient and contemporary Chinese sentences. Experiments show that both our alignment and transfer method can produce very good result except for some circumstances that even human translators can make mistakes without background knowledge.
- Geometric matrix completion~(GMC) has been proposed for recommendation by integrating the relationship~(link) graphs among users/items into matrix completion~(MC) . Traditional \mboxGMC methods typically adopt graph regularization to impose smoothness priors for \mboxMC. Recently, geometric deep learning on graphs~(\mboxGDLG) is proposed to solve the GMC problem, showing better performance than existing GMC methods including traditional graph regularization based methods. To the best of our knowledge, there exists only one GDLG method for GMC, which is called \mboxRMGCNN. RMGCNN combines graph convolutional network~(GCN) and recurrent neural network~(RNN) together for GMC. In the original work of RMGCNN, RMGCNN demonstrates better performance than pure GCN-based method. In this paper, we propose a new \mboxGMC method, called \underlineconvolutional \underlinegeometric \underlinematrix \underlinecompletion~(CGMC), for recommendation with graphs among users/items. CGMC is a pure GCN-based method with a newly designed graph convolutional network. Experimental results on real datasets show that CGMC can outperform other state-of-the-art methods including RMGCNN.
- Mar 01 2018 cs.CV arXiv:1802.10316v1The process of determining which disease or condition explains a person's symptoms and signs can be very complicated and may be inaccurate in some cases. The general belief is that diagnosing diseases relies on doctors' keen intuition, rich experience and professional equipment. In this work, we employ ideas from recent advances in plantar pressure research and from the powerful capacity of the convolutional neural network for learning representations. Here, we propose a model using convolutional neural network based on plantar pressure for medical diagnosis. Our model learns a network that maps plantar pressure data to its corresponding medical diagnostic label. We then apply our model to make the medical diagnosis on datasets we collected from cooperative hospital and achieve an accuracy of 98.36%. We demonstrate that the model base on the convolutional neural network is competitive in medical diagnosis.
- Feb 23 2018 cs.CV arXiv:1802.08122v1Existing person re-identification (re-id) methods either assume the availability of well-aligned person bounding box images as model input or rely on constrained attention selection mechanisms to calibrate misaligned images. They are therefore sub-optimal for re-id matching in arbitrarily aligned person images potentially with large human pose variations and unconstrained auto-detection errors. In this work, we show the advantages of jointly learning attention selection and feature representation in a Convolutional Neural Network (CNN) by maximising the complementary information of different levels of visual attention subject to re-id discriminative learning constraints. Specifically, we formulate a novel Harmonious Attention CNN (HA-CNN) model for joint learning of soft pixel attention and hard regional attention along with simultaneous optimisation of feature representations, dedicated to optimise person re-id in uncontrolled (misaligned) images. Extensive comparative evaluations validate the superiority of this new HA-CNN model for person re-id over a wide variety of state-of-the-art methods on three large-scale benchmarks including CUHK03, Market-1501, and DukeMTMC-ReID.
- We consider a distributed resource allocation problem in a multicarrier multi-user MIMO network where multiple transmitter-receiver links interfere among each other. Each user aims to maximize its own energy efficiency by adjusting its signal covariance matrix under a predefined power constraint. This problem has been addressed recently by applying a matrix exponential learning (MXL) algorithm which has a very appealing convergence rate. In this learning algorithm, however, each transmitter must know an estimate of the gradient matrix of the user utility. The knowledge of the gradient matrix at the transmitters incurs a high signaling overhead especially that this matrix size increases with the number of antennas and subcarriers. In this paper, we therefore investigate two strategies in order to decrease the informational exchange per iteration of the algorithm. In the first strategy, each user sends at each iteration part of the elements of the gradient matrix with respect to a certain probability. In the second strategy, each user feeds back sporadically the whole gradient matrix. We focus on the analysis of the convergence of the MXL algorithm to Nash Equilibrium (NE) under these two strategies. Upper bounds of the average convergence rate are obtained in both situations with general step-size setting, from which we can clearly see the impact of the incompleteness of the feedback information. We prove that the algorithm can still converge to NE and the convergence rate are not seriously affected. Simulation results further corroborate our claim and show that, in terms of convergence rate, MXL performs better under the second proposed strategy.
- Feb 16 2018 cs.GT arXiv:1802.05294v2In this paper, we focus on how to dynamically allocate a divisible resource fairly among n players who arrive and depart over time. The players may have general heterogeneous valuations over the resource. It is known that the exact envy-free and proportional allocations may not exist in the dynamic setting [Walsh, 2011]. Thus, we will study to what extent we can guarantee the fairness in the dynamic setting. We first design two algorithms which are O(log n)-proportional and O(n)-envy-free for the setting with general valuations, and by constructing the adversary instances such that all dynamic algorithms must be at least Omega(1)-proportional and Omega(n/log n)-envy-free, we show that the bounds are tight up to a logarithmic factor. Moreover, we introduce the setting where the players' valuations are uniform on the resource but with different demands, which generalize the setting of [Friedman et al., 2015]. We prove an O(log n) upper bound and a tight lower bound for this case.
- Feb 13 2018 cs.NE arXiv:1802.03608v1This paper proposes a novel constraint-handling mechanism named angle-based constrained dominance principle (ACDP) embedded in a decomposition-based multi-objective evolutionary algorithm (MOEA/D) to solve constrained multi-objective optimization problems (CMOPs). To maintain the diversity of the working population, ACDP utilizes the information of the angle of solutions to adjust the dominance relation of solutions during the evolutionary process. This paper uses 14 benchmark instances to evaluate the performance of the MOEA/D with ACDP (MOEA/D-ACDP). Additionally, an engineering optimization problem (which is I-beam optimization problem) is optimized. The proposed MOEA/D-ACDP, and four other decomposition-based CMOEAs, including C-MOEA/D, MOEA/D-CDP, MOEA/D-Epsilon and MOEA/D-SR are tested by the above benchmarks and the engineering application. The experimental results manifest that MOEA/D-ACDP is significantly better than the other four CMOEAs on these test instances and the real-world case, which indicates that ACDP is more effective for solving CMOPs.
- In ISSAC 2017, van der Hoeven and Larrieu showed that evaluating a polynomial P in GF(q)[x] of degree <n at all n-th roots of unity in GF($q^d$) can essentially be computed d-time faster than evaluating Q in GF($q^d$)[x] at all these roots, assuming GF($q^d$) contains a primitive n-th root of unity. Termed the Frobenius FFT, this discovery has a profound impact on polynomial multiplication, especially for multiplying binary polynomials, which finds ample application in coding theory and cryptography. In this paper, we show that the theory of Frobenius FFT beautifully generalizes to a class of additive FFT developed by Cantor and Gao-Mateer. Furthermore, we demonstrate the power of Frobenius additive FFT for q=2: to multiply two binary polynomials whose product is of degree <256, the new technique requires only 29,005 bit operations, while the best result previously reported was 33,397. To the best of our knowledge, this is the first time that FFT-based multiplication outperforms Karatsuba and the like at such a low degree in terms of bit-operation count.
- Linear classification has been widely used in many high-dimensional applications like text classification. To perform linear classification for large-scale tasks, we often need to design distributed learning methods on a cluster of multiple machines. In this paper, we propose a new distributed learning method, called feature-distributed stochastic variance reduced gradient (FD-SVRG) for high-dimensional linear classification. Unlike most existing distributed learning methods which are instance-distributed, FD-SVRG is feature-distributed. FD-SVRG has lower communication cost than other instance-distributed methods when the data dimensionality is larger than the number of data instances. Experimental results on real data demonstrate that FD-SVRG can outperform other state-of-the-art distributed methods for high-dimensional linear classification in terms of both communication cost and wall-clock time, when the dimensionality is larger than the number of instances in training data.
- Feb 08 2018 cs.CV arXiv:1802.02208v1Automated pavement crack detection is a challenging task that has been researched for decades due to the complicated pavement conditions in real world. In this paper, a supervised method based on deep learning is proposed, which has the capability of dealing with different pavement conditions. Specifically, a convolutional neural network (CNN) is used to learn the structure of the cracks from raw images, without any preprocessing. Small patches are extracted from crack images as inputs to generate a large training database, a CNN is trained and crack detection is modeled as a multi-label classification problem. Typically, crack pixels are much fewer than non-crack pixels. To deal with the problem with severely imbalanced data, a strategy with modifying the ratio of positive to negative samples is proposed. The method is tested on two public databases and compared with five existing methods. Experimental results show that it outperforms the other methods.
- Feb 06 2018 cs.CL arXiv:1802.01345v2Existing text generation methods tend to produce repeated and "boring" expressions. To tackle this problem, we propose a new text generation model, called Diversity-Promoting Generative Adversarial Network (DP-GAN). The proposed model assigns low reward for repeated text and high reward for "novel" text, encouraging the generator to produce diverse and informative text. Moreover, we propose a novel language-model based discriminator, which can better distinguish novel text from repeated text without the saturation problem compared with existing classifier-based discriminators. The experimental results on review generation and dialogue generation tasks demonstrate that our method can generate substantially more diverse and informative text than existing baseline methods. The code is available at https://github.com/lancopku/DPGAN
- Feb 06 2018 cs.SI arXiv:1802.01288v1With invaluable theoretical and practical benefits, the problem of partitioning networks for community structures has attracted significant research attention in scientific and engineering disciplines. In literature, Newman's modularity measure is routinely applied to quantify the quality of a given partition, and thereby maximizing the measure provides a principled way of detecting communities in networks. Unfortunately, the exact optimization of the measure is computationally NP-complete and only applicable to very small networks. Approximation approaches have to be sought to scale to large networks. To address the computational issue, we proposed a new method to identify the partition decisions. Coupled with an iterative rounding strategy and a fast constrained power method, our work achieves tight and effective spectral relaxations. The proposed method was evaluated thoroughly on both real and synthetic networks. Compared with state-of-the-art approaches, the method obtained comparable, if not better, qualities. Meanwhile, it is highly suitable for parallel execution and reported a nearly linear improvement in running speed when increasing the number of computing nodes, which thereby provides a practical tool for partitioning very large networks.
- Object recognition and sorting plays a key role in robotic systems, especially for the autonomous robots to implement object sorting tasks in a warehouse. In this paper, we present a global texture-shape 3D feature descriptor which can be utilized in a sorting system, and this system can perform object sorting tasks well. Our proposed descriptor stems from the clustered viewpoint feature histogram (CVFH). As the CVFH feature descriptor relies on the geometrical information of the whole 3D object surface only, it can not perform well on the objects with similar geometrical information. Therefore, we extend the CVFH descriptor with texture information to generate a new global 3D feature descriptor. Then this proposed descriptor is tested for sorting 3D objects by using multi-class support vector machines (SVM). It is also evaluated by a public 3D image dataset and real scenes. The results of evaluation show that our proposed descriptor have a good performance for object recognition compared to the CVFH. Then we leverage this proposed descriptor in the proposed sorting system, showing that the proposed descriptor helps the sorting system implement the object detection, the object recognition and object grasping tasks well.
- Feb 01 2018 cs.RO arXiv:1801.10599v1A new kind of six degree-of-freedom teaching manipulator without actuators is designed, for recording and conveniently setting a trajectory of an industrial robot. The device requires good gravity balance and operating force performance to ensure being controlled easily and fluently. In this paper, we propose a process for modeling the manipulator and then the model is used to formulate a multi-objective optimization problem to optimize the design of the testing manipulator. Three objectives, including total mass of the device, gravity balancing and operating force performance are analyzed and defined. A popular non-dominated sorting genetic algorithm (NSGA-II-CDP) is used to solve the optimization problem. The obtained solutions all outperform the design of a human expert. To extract design knowledge, an innovization study is performed to establish meaningful implicit relationship between the objective space and the decision space, which can be reused by the designer in future design process.
- Jan 31 2018 cs.CV arXiv:1801.10111v1Millions of hearing impaired people around the world routinely use some variants of sign languages to communicate, thus the automatic translation of a sign language is meaningful and important. Currently, there are two sub-problems in Sign Language Recognition (SLR), i.e., isolated SLR that recognizes word by word and continuous SLR that translates entire sentences. Existing continuous SLR methods typically utilize isolated SLRs as building blocks, with an extra layer of preprocessing (temporal segmentation) and another layer of post-processing (sentence synthesis). Unfortunately, temporal segmentation itself is non-trivial and inevitably propagates errors into subsequent steps. Worse still, isolated SLR methods typically require strenuous labeling of each word separately in a sentence, severely limiting the amount of attainable training data. To address these challenges, we propose a novel continuous sign recognition framework, the Hierarchical Attention Network with Latent Space (LS-HAN), which eliminates the preprocessing of temporal segmentation. The proposed LS-HAN consists of three components: a two-stream Convolutional Neural Network (CNN) for video feature representation generation, a Latent Space (LS) for semantic gap bridging, and a Hierarchical Attention Network (HAN) for latent space based recognition. Experiments are carried out on two large scale datasets. Experimental results demonstrate the effectiveness of the proposed framework.
- Jan 30 2018 cs.CL arXiv:1801.09031v1Using low dimensional vector space to represent words has been very effective in many NLP tasks. However, it doesn't work well when faced with the problem of rare and unseen words. In this paper, we propose to leverage the knowledge in semantic dictionary in combination with some morphological information to build an enhanced vector space. We get an improvement of 2.3% over the state-of-the-art Heidel Time system in temporal expression recognition, and obtain a large gain in other name entity recognition (NER) tasks. The semantic dictionary Hownet alone also shows promising results in computing lexical similarity.
- Jan 30 2018 cs.CL arXiv:1801.09030v1Traditional Chinese Medicine (TCM) is an influential form of medical treatment in China and surrounding areas. In this paper, we propose a TCM prescription generation task that aims to automatically generate a herbal medicine prescription based on textual symptom descriptions. Sequence-to-sequence (seq2seq) model has been successful in dealing with conditional sequence generation tasks like dialogue generation. We explore a potential end-to-end solution to the TCM prescription generation task using seq2seq models. However, experiments show that directly applying seq2seq model leads to unfruitful results due to the severe repetition problem. To solve the problem, we propose a novel architecture for the decoder with masking and coverage mechanism. The experimental results demonstrate that the proposed method is effective in diversifying the outputs, which significantly improves the F1 score by nearly 10 points (8.34 on test set 1 and 10.23 on test set 2).
- Jan 25 2018 cs.CR arXiv:1801.07983v1Many millions of users routinely use their Google, Facebook and Microsoft accounts to log in to websites supporting OAuth 2.0 and/or OpenID Connect-based single sign on. The security of OAuth 2.0 and OpenID Connect is therefore of critical importance, and it has been widely examined both in theory and in practice. Unfortunately, as these studies have shown, real-world implementations of both schemes are often vulnerable to attack, and in particular to cross-site request forgery (CSRF) attacks. In this paper we propose a new technique which can be used to mitigate CSRF attacks against both OAuth 2.0 and OpenID Connect.
- Jan 16 2018 cs.HC arXiv:1801.04829v2This paper presents a pilot study on developing an instrument to predict the quality of e-commerce websites. The 8C model was adopted as the reference model of the heuristic evaluation. Each dimension of the 8C was mapped into a set of quantitative website elements, selected websites were scraped to get the quantitative website elements, and the score of each dimension was calculated. A software was developed in PHP for the experiments. In the training process, 10 experiments were conducted and quantitative analyses were regressively conducted between the experiments. The conversion rate was used to verify the heuristic evaluation of an e-commerce website after each experiment. The results showed that the mapping revisions between the experiments improved the performance of the evaluation instrument, therefore the experiment process and the quantitative mapping revision guideline proposed was on the right track. The software resulted from the experiment 10 can serve as the aimed e-commerce website evaluation instrument. The experiment results and the future work have been discussed.
- Jan 11 2018 cs.DC arXiv:1801.03314v1Stragglers are commonly believed to have a great impact on the performance of big data system. However, the reason to cause straggler is complicated. Previous works mostly focus on straggler detection, schedule level optimization and coarse-grained cause analysis. These methods cannot provide valuable insights to help users optimize their programs. In this paper, we propose BigRoots, a general method incorporating both framework and system features for root-cause analysis of stragglers in big data system. BigRoots considers features from big data framework such as shuffle read/write bytes and JVM garbage collection time, as well as system resource utilization such as CPU, I/O and network, which is able to detect both internal and external root causes of stragglers. We verify BigRoots by injecting high resource utilization across different system components and perform case studies to analyze different workloads in Hibench. The experimental results demonstrate that BigRoots is effective to identify the root cause of stragglers and provide useful guidance for performance optimization.
- In recent years, more and more machine learning algorithms have been applied to odor recognition. These odor recognition algorithms usually assume that the training dataset is static. However, for some odor recognition tasks, the odor dataset is dynamically growing where not only the training samples but also the number of classes increase over time. Motivated by this concern, we proposed a deep nearest class mean (DNCM) model which combines the deep learning framework and nearest class mean (NCM) method. DNCM not only can leverage deep neural network to extract deep features, but also well suited for integrating new classes. Experiments demonstrate that the proposed DNCM model is effective and efficient for incremental odor classification, especially for new classes with only a small number of training examples.
- Jan 08 2018 cs.CV arXiv:1801.01262v1In recent years, finger vein recognition has become an important sub-field in biometrics and been applied to real-world applications. The development of finger vein recognition algorithms heavily depends on large-scale real-world data sets. In order to motivate research on finger vein recognition, we released the largest finger vein data set up to now and hold finger vein recognition competitions based on our data set every year. In 2017, International Competition on Finger Vein Recognition(ICFVR) is held jointly with IJCB 2017. 11 teams registered and 10 of them joined the final evaluation. The winner of this year dramatically improved the EER from 2.64% to 0.483% compared to the winner of last year. In this paper, we introduce the process and results of ICFVR 2017 and give insights on development of state-of-art finger vein recognition algorithms.
- Jan 04 2018 cs.CV arXiv:1801.01080v1A novel deep neural network training paradigm that exploits the conjoint information in multiple heterogeneous sources is proposed. Specifically, in a RGB-D based action recognition task, it cooperatively trains a single convolutional neural network (named c-ConvNet) on both RGB visual features and depth features, and deeply aggregates the two kinds of features for action recognition. Differently from the conventional ConvNet that learns the deep separable features for homogeneous modality-based classification with only one softmax loss function, the c-ConvNet enhances the discriminative power of the deeply learned features and weakens the undesired modality discrepancy by jointly optimizing a ranking loss and a softmax loss for both homogeneous and heterogeneous modalities. The ranking loss consists of intra-modality and cross-modality triplet losses, and it reduces both the intra-modality and cross-modality feature variations. Furthermore, the correlations between RGB and depth data are embedded in the c-ConvNet, and can be retrieved by either of the modalities and contribute to the recognition in the case even only one of the modalities is available. The proposed method was extensively evaluated on two large RGB-D action recognition datasets, ChaLearn LAP IsoGD and NTU RGB+D datasets, and one small dataset, SYSU 3D HOI, and achieved state-of-the-art results.
- In many applications such as color image processing, data has more than one piece of information associated with each spatial coordinate, and in such cases the classical optimal mass transport (OMT) must be generalized to handle vector-valued or matrix-valued densities. In this paper, we discuss the vector and matrix optimal mass transport and present three contributions. We first present a rigorous mathematical formulation for these setups and provide analytical results including existence of solutions and strong duality. Next, we present a simple, scalable, and parallelizable methods to solve the vector and matrix-OMT problems. Finally, we implement the proposed methods on a CUDA GPU and present experiments and applications.
- In this paper, a novel blind multi-user detection(MUD) framework for autonomous grant-free high-overloading non-orthogonal multiple access is introduced in detail aimed at fulfilling the requirements of fifth-generation massive Machine Type Communications. From the perspective of the transmitter side, pros and cons regarding diverse types of emerging grant-free transmission, particularly autonomous grant-free, are elaborated and presented in a comparative manner. In the receiver end,code word-level successive interference cancellation (CL-SIC) is revealed as the main framework to perform MUD. In addition, underpinning state-of-art blind ideas such as blind activation detection taking advantage of the statistical metric of the aggregate signals, blind equalization based on the constellation's simple geometric character of low order modulation symbols, and blind channel estimation employing solely the successfully decoded code words are explained.
- Dec 01 2017 cs.CV arXiv:1711.11556v2Exploiting synthetic data to learn deep models has attracted increasing attention in recent years. However, the intrinsic domain difference between synthetic and real images usually causes a significant performance drop when applying the learned model to real world scenarios. This is mainly due to two reasons: 1) the model overfits to synthetic images, making the convolutional filters incompetent to extract informative representation for real images; 2) there is a distribution difference between synthetic and real data, which is also known as the domain adaptation problem. To this end, we propose a new reality oriented adaptation approach for urban scene semantic segmentation by learning from synthetic data. First, we propose a target guided distillation approach to learn the real image style, which is achieved by training the segmentation model to imitate a pretrained real style model using real images. Second, we further take advantage of the intrinsic spatial structure presented in urban scene images, and propose a spatial-aware adaptation scheme to effectively align the distribution of two domains. These two modules can be readily integrated with existing state-of-the-art semantic segmentation networks to improve their generalizability when adapting from synthetic to real urban scenes. We evaluate the proposed method on Cityscapes dataset by adapting from GTAV and SYNTHIA datasets, where the results demonstrate the effectiveness of our method.
- Recent systems on structured prediction focus on increasing the level of structural dependencies within the model. However, our study suggests that complex structures entail high overfitting risks. To control the structure-based overfitting, we propose to conduct structure regularization decoding (SR decoding). The decoding of the complex structure model is regularized by the additionally trained simple structure model. We theoretically analyze the quantitative relations between the structural complexity and the overfitting risk. The analysis shows that complex structure models are prone to the structure-based overfitting. Empirical evaluations show that the proposed method improves the performance of the complex structure models by reducing the structure-based overfitting. On the sequence labeling tasks, the proposed method substantially improves the performance of the complex neural network models. The maximum F1 error rate reduction is 36.4% for the third-order model. The proposed method also works for the parsing task. The maximum UAS improvement is 5.5% for the tri-sibling model. The results are competitive with or better than the state-of-the-art results.
- Nov 28 2017 cs.CV arXiv:1711.09125v1Spatiotemporal feature learning in videos is a fundamental and difficult problem in computer vision. This paper presents a new architecture, termed as Appearance-and-Relation Network (ARTNet), to learn video representation in an end-to-end manner. ARTNets are constructed by stacking multiple generic building blocks, called as SMART, whose goal is to simultaneously model appearance and relation from RGB input in a separate and explicit manner. Specifically, SMART blocks decouple the spatiotemporal learning module into an appearance branch for spatial modeling and a relation branch for temporal modeling. The appearance branch is implemented based on the linear combination of pixels or filter responses in each frame, while the relation branch is designed based on the multiplicative interactions between pixels or filter responses across multiple frames. We perform experiments on three action recognition benchmarks: Kinetics, UCF101, and HMDB51, demonstrating that SMART blocks obtain an evident improvement over 3D convolutions for spatiotemporal feature learning. Under the same training setting, ARTNets achieve superior performance on these three datasets to the existing state-of-the-art methods.
- To improve the efficiency of elderly assessments, an influence-based fast preceding questionnaire model (FPQM) is proposed. Compared with traditional assessments, the FPQM optimizes questionnaires by reordering their attributes. The values of low-ranking attributes can be predicted by the values of the high-ranking attributes. Therefore, the number of attributes can be reduced without redesigning the questionnaires. A new function for calculating the influence of the attributes is proposed based on probability theory. Reordering and reducing algorithms are given based on the attributes' influences. The model is verified through a practical application. The practice in an elderly-care company shows that the FPQM can reduce the number of attributes by 90.56% with a prediction accuracy of 98.39%. Compared with other methods, such as the Expert Knowledge, Rough Set and C4.5 methods, the FPQM achieves the best performance. In addition, the FPQM can also be applied to other questionnaires.
- Nov 23 2017 cs.CV arXiv:1711.08362v1Human motion recognition is one of the most important branches of human-centered research activities. In recent years, motion recognition based on RGB-D data has attracted much attention. Along with the development in artificial intelligence, deep learning techniques have gained remarkable success in computer vision. In particular, convolutional neural networks (CNN) have achieved great success for image-based tasks, and recurrent neural networks (RNN) are renowned for sequence-based problems. Specifically, deep learning methods based on the CNN and RNN architectures have been adopted for motion recognition using RGB-D data. In this paper, a detailed overview of recent advances in RGB-D-based motion recognition is presented. The reviewed methods are broadly categorized into four groups, depending on the modality adopted for recognition: RGB-based, depth-based, skeleton-based and multi-modal-based. As a survey focused on the application of deep learning to RGB-D-based motion recognition, we explicitly discuss the advantages and limitations of existing techniques. Particularly, we highlighted the methods of encoding spatial-temporal-structural information inherent in video sequence, and discuss potential directions for future research.
- Nov 21 2017 cs.CR arXiv:1711.07120v2The widespread use of mobile electronic devices increases the complexities of mobile security. This paper aims to provide a secure communication environment for smartphone users. Some research proves that the one-time pad is one of the securest encryption methods, and the key distribution problem can be solved by using the QKD (quantum key distribution). The objective of this project is to design an Android APP (application) to exchange several random keys between mobile phones. Inspired by QKD, the developed APP uses the quick response (QR) code as a carrier to dispatch large amounts of one-time keys. After evaluating the performance of APP, it allows the mobile phone to capture and decode 1800 bytes of random data in 600ms. The continuous scanning mode of APP is designed to improve the overall transmission performance and user experience, and the maximum transmission rate of this mode is around 2200 bytes/s. The omnidirectional readability and error correction capability of QR code gives it better real-life application, and the features of adequate storage capacity and quick response optimize overall transmission efficiency. The security of this APP is guaranteed since QR code is exchanged face-to-face, eliminating the risk of being eavesdropped. Also, the id of QR code is the only message that would be transmitted through the whole communication. The experimental results show this project can achieve superior transmission performance, and the correlation between the transmission rate of the system and several parameters, such as the QR code size, has been analyzed. In addition, some existing technologies and the main findings in the context of the project are summarized and critically compared in detail.
- Nov 20 2017 physics.soc-ph cs.SI arXiv:1711.06535v1An adaptive label propagation algorithm (ALPA) is proposed to detect and monitor communities in dynamic networks. Unlike the traditional methods by re-computing the whole community decomposition after each modification of the network, ALPA takes into account the information of historical communities and updates its solution according to the network modifications via a local label propagation process, which generally affects only a small portion of the network. This makes it respond to network changes at low computational cost. The effectiveness of ALPA has been tested on both synthetic and real-world networks, which shows that it can successfully identify and track dynamic communities. Moreover, ALPA could detect communities with high quality and accuracy compared to other methods. Therefore, being low-complexity and parameter-free, ALPA is a scalable and promising solution for some real-world applications of community detection in dynamic networks.
- We propose a simple yet effective technique to simplify the training and the resulting model of neural networks. In back propagation, only a small subset of the full gradient is computed to update the model parameters. The gradient vectors are sparsified in such a way that only the top-$k$ elements (in terms of magnitude) are kept. As a result, only $k$ rows or columns (depending on the layout) of the weight matrix are modified, leading to a linear reduction in the computational cost. Based on the sparsified gradients, we further simplify the model by eliminating the rows or columns that are seldom updated, which will reduce the computational cost both in the training and decoding, and potentially accelerate decoding in real-world applications. Surprisingly, experimental results demonstrate that most of time we only need to update fewer than 5% of the weights at each back propagation pass. More interestingly, the accuracy of the resulting models is actually improved rather than degraded, and a detailed analysis is given. The model simplification results show that we could adaptively simplify the model which could often be reduced by around 9x, without any loss on accuracy or even with improved accuracy.
- Nov 16 2017 cs.CV arXiv:1711.05431v1With exploiting contextual information over large image regions in an efficient way, the deep convolutional neural network has shown an impressive performance for single image super-resolution (SR). In this paper, we propose a deep convolutional network by cascading the well-designed inception-residual blocks within the deep Laplacian pyramid framework to progressively restore the missing high-frequency details of high-resolution (HR) images. By optimizing our network structure, the trainable depth of the proposed network gains a significant improvement, which in turn improves super-resolving accuracy. With our network depth increasing, however, the saturation and degradation of training accuracy continues to be a critical problem. As regard to this, we propose an effective two-stage training strategy, in which we firstly use images downsampled from the ground-truth HR images as the optimal objective to train the inception-residual blocks in each pyramid level with an extremely high learning rate enabled by gradient clipping, and then the ground-truth HR images are used to fine-tune all the pre-trained inception-residual blocks for obtaining the final SR model. Furthermore, we present a new loss function operating in both image space and local rank space to optimize our network for exploiting the contextual information among different output components. Extensive experiments on benchmark datasets validate that the proposed method outperforms existing state-of-the-art SR methods in terms of the objective evaluation as well as the visual quality.
- Nov 16 2017 physics.soc-ph cs.SY arXiv:1711.05254v1In this paper, the influence of fan-shaped buffer zone on the performance of the toll plaza is researched. A two-dimensional traffic flow model and a comprehensive evaluation model based on mechanical model and psychological field are established. The traffic flow model is simulated by creating coordinate system. We first establish queue theory model to analyze vehicles when entering toll plaza. Then, a two-dimensional steadily car-following model is established based on psychological field for the analysis of vehicles when leaving toll plaza. According to psychological field theory, we analyze the force condition of each vehicle. The force of each vehicle is contributed by the vehicles in its observation area and obstacles. By projecting these vehicles and obstacles via the equipotential line in the psychological field, the influence on the value and direction acceleration of following vehicles is obtained. Consequently, the changes of each vehicle's speed and position are obtained as well. Next, we establish simulation based on the states of vehicles and make the rules of vehicle state-changing. By simulating the system, we obtain the throughput of the toll plaza's input and output. Then we obtained the bearing pressure on the road by the max throughput and the demand of the roads. Using the number of cars in per unit area as the safety factor. Then a comprehensive evaluation model is established based on bearing pressure on the road, cost and safety factor.
- Nov 15 2017 cs.CV arXiv:1711.04147v1Scene text detection is a challenging problem in computer vision. In this paper, we propose a novel text detection network based on prevalent object detection frameworks. In order to obtain stronger semantic feature, we adopt ResNet as feature extraction layers and exploit multi-level feature by combining hierarchical convolutional networks. A vertical proposal mechanism is utilized to avoid proposal classification, while regression layer remains working to improve localization accuracy. Our approach evaluated on ICDAR2013 dataset achieves F-measure of 0.91, which outperforms previous state-of-the-art results in scene text detection.
- Unlike extractive summarization, abstractive summarization has to fuse different parts of the source text, which inclines to create fake facts. Our preliminary study reveals nearly 30% of the outputs from a state-of-the-art neural summarization system suffer from this problem. While previous abstractive summarization approaches usually focus on the improvement of informativeness, we argue that faithfulness is also a vital prerequisite for a practical abstractive summarization system. To avoid generating fake facts in a summary, we leverage open information extraction and dependency parse technologies to extract actual fact descriptions from the source text. The dual-attention sequence-to-sequence framework is then proposed to force the generation conditioned on both the source text and the extracted fact descriptions. Experiments on the Gigaword benchmark dataset demonstrate that our model can greatly reduce fake summaries by 80%. Notably, the fact descriptions also bring significant improvement on informativeness since they often condense the meaning of the source text.
- Nov 10 2017 cs.CV arXiv:1711.03368v1Person re-identification (re-id) is to match people across disjoint camera views in a multi-camera system, and re-id has been an important technology applied in smart city in recent years. However, the majority of existing person re-id methods are not designed for processing sequential data in an online way. This ignores the real-world scenario that person images detected from multi-cameras system are coming sequentially. While there is a few work on discussing online re-id, most of them require considerable storage of all passed data samples that have been ever observed, and this could be unrealistic for processing data from a large camera network. In this work, we present an onepass person re-id model that adapts the re-id model based on each newly observed data and no passed data are directly used for each update. More specifically, we develop an Sketch online Discriminant Analysis (SoDA) by embedding sketch processing into Fisher discriminant analysis (FDA). SoDA can efficiently keep the main data variations of all passed samples in a low rank matrix when processing sequential data samples, and estimate the approximate within-class variance (i.e. within-class covariance matrix) from the sketch data information. We provide theoretical analysis on the effect of the estimated approximate within-class covariance matrix. In particular, we derive upper and lower bounds on the Fisher discriminant score (i.e. the quotient between between-class variation and within-class variation after feature transformation) in order to investigate how the optimal feature transformation learned by SoDA sequentially approximates the offline FDA that is learned on all observed data. Extensive experimental results have shown the effectiveness of our SoDA and empirically support our theoretical analysis.
- Nov 09 2017 cs.DC arXiv:1711.02976v1RPYFMM is a software package for the efficient evaluation of the potential field governed by the Rotne-Prager-Yamakawa (RPY) tensor interactions in biomolecular hydrodynamics simulations. In our algorithm, the RPY tensor is decomposed as a linear combination of four Laplace interactions, each of which is evaluated using the adaptive fast multipole method (FMM) [1] where the exponential expansions are applied to diagonalize the multipole-to-local translation operators. RPYFMM offers a unified execution on both shared and distributed memory computers by leveraging the DASHMM library [2, 3]. Preliminary numerical results show that the interactions for a molecular system of 15 million particles (beads) can be computed within one second on a Cray XC30 cluster using 12, 288 cores, while achieving approximately 54% strong-scaling efficiency.
- Nov 07 2017 cs.CV arXiv:1711.01984v1Always, some individuals in images are more important/attractive than others in some events such as presentation, basketball game or speech. However, it is challenging to find important people among all individuals in images directly based on their spatial or appearance information due to the existence of diverse variations of pose, action, appearance of persons and various changes of occasions. We overcome this difficulty by constructing a multiple Hyper-Interaction Graph to treat each individual in an image as a node and inferring the most active node referring to interactions estimated by various types of clews. We model pairwise interactions between persons as the edge message communicated between nodes, resulting in a bidirectional pairwise-interaction graph. To enrich the personperson interaction estimation, we further introduce a unidirectional hyper-interaction graph that models the consensus of interaction between a focal person and any person in a local region around. Finally, we modify the PageRank algorithm to infer the activeness of persons on the multiple Hybrid-Interaction Graph (HIG), the union of the pairwise-interaction and hyperinteraction graphs, and we call our algorithm the PersonRank. In order to provide publicable datasets for evaluation, we have contributed a new dataset called Multi-scene Important People Image Dataset and gathered a NCAA Basketball Image Dataset from sports game sequences. We have demonstrated that the proposed PersonRank outperforms related methods clearly and substantially.
- Nov 07 2017 cs.CL arXiv:1711.01701v1Traditional Chinese Medicine (TCM) has accumulated a big amount of precious resource in the long history of development. TCM prescriptions that consist of TCM herbs are an important form of TCM treatment, which are similar to natural language documents, but in a weakly ordered fashion. Directly adapting language modeling style methods to learn the embeddings of the herbs can be problematic as the herbs are not strictly in order, the herbs in the front of the prescription can be connected to the very last ones. In this paper, we propose to represent TCM herbs with distributed representations via Prescription Level Language Modeling (PLLM). In one of our experiments, the correlation between our calculated similarity between medicines and the judgment of professionals achieves a Spearman score of 55.35 indicating a strong correlation, which surpasses human beginners (TCM related field bachelor student) by a big margin (over 10%).
- Vertex coloring is one of the classic symmetry breaking problems studied in distributed computing. In this paper we present a new algorithm for $(\Delta+1)$-list coloring in the randomized $\textsf{LOCAL}$ model running in $O(\log^\ast n + \textsf{Det}_d(\text{poly} \log n))$ time, where $\textsf{Det}_d(n')$ is the deterministic complexity of $(\text{deg}+1)$-list coloring ($v$'s palette has size $\text{deg}(v)+1$) on $n'$-vertex graphs. This improves upon a previous randomized algorithm of Harris, Schneider, and Su (STOC 2016). with complexity $O(\sqrt{\log \Delta} + \log\log n + \textsf{Det}_d(\text{poly} \log n))$, and is dramatically faster than the best known deterministic algorithm of Fraigniaud, Heinrich, and Kosowski (FOCS 2016), with complexity $O(\sqrt{\Delta}\log^{2.5}\Delta + \log^* n)$. Our algorithm appears to be optimal. It matches the $\Omega(\log^\ast n)$ randomized lower bound, due to Naor (SIDMA 1991) and sort of matches the $\Omega(\textsf{Det}(\text{poly} \log n))$ randomized lower bound due to Chang, Kopelowitz, and Pettie (FOCS 2016), where $\textsf{Det}$ is the deterministic complexity of $(\Delta+1)$-list coloring. The best known upper bounds on $\textsf{Det}_d(n')$ and $\textsf{Det}(n')$ are both $2^{O(\sqrt{\log n'})}$ by Panconesi and Srinivasan (Journal of Algorithms 1996), and it is quite plausible that the complexities of both problems are the same, asymptotically.
- Understanding physical phenomena is a key component of human intelligence and enables physical interaction with previously unseen environments. In this paper, we study how an artificial agent can autonomously acquire this intuition through interaction with the environment. We created a synthetic block stacking environment with physics simulation in which the agent can learn a policy end-to-end through trial and error. Thereby, we bypass to explicitly model physical knowledge within the policy. We are specifically interested in tasks that require the agent to reach a given goal state that may be different for every new trial. To this end, we propose a deep reinforcement learning framework that learns policies which are parametrized by a goal. We validated the model on a toy example navigating in a grid world with different target positions and in a block stacking task with different target structures of the final tower. In contrast to prior work, our policies show better generalization across different goals.
- Uniformity testing and the more general identity testing are well studied problems in distributional property testing. Most previous work focuses on testing under $L_1$-distance. However, when the support is very large or even continuous, testing under $L_1$-distance may require a huge (even infinite) number of samples. Motivated by such issues, we consider the identity testing in Wasserstein distance (a.k.a. transportation distance and earthmover distance) on a metric space (discrete or continuous). In this paper, we propose the Wasserstein identity testing problem (Identity Testing in Wasserstein distance). We obtain nearly optimal worst-case sample complexity for the problem. Moreover, for a large class of probability distributions satisfying the so-called "Doubling Condition", we provide nearly instance-optimal sample complexity.
- Oct 27 2017 cs.CL arXiv:1710.09617v1We develop streaming keyword spotting systems using a recurrent neural network transducer (RNN-T) model: an all-neural, end-to-end trained, sequence-to-sequence model which jointly learns acoustic and language model components. Our models are trained to predict either phonemes or graphemes as subword units, thus allowing us to detect arbitrary keyword phrases, without any out-of-vocabulary words. In order to adapt the models to the requirements of keyword spotting, we propose a novel technique which biases the RNN-T system towards a specific keyword of interest. Our systems are compared against a strong sequence-trained, connectionist temporal classification (CTC) based "keyword-filler" baseline, which is augmented with a separate phoneme language model. Overall, our RNN-T system with the proposed biasing technique significantly improves performance over the baseline system.