results for au:Jin_L in:cs

- Mar 28 2018 cs.CV arXiv:1803.09256v1 This paper presents a method that accurately detects heads, especially small heads, in indoor scenes. To achieve this, we propose a novel Feature Refine Net (FRN) and a cascaded multi-scale architecture. FRN exploits the multi-scale hierarchical features created by deep convolutional neural networks. The proposed channel weighting method enables FRN to make use of features alternatively and effectively. To improve the performance of small head detection, we propose a cascaded multi-scale architecture with two detectors. One, called the global detector, is responsible for detecting large objects and acquiring global distribution information. The other, called the local detector, is specialized for small object detection and makes use of the information provided by the global detector. Due to the lack of head detection datasets, we have collected and labeled a new large dataset named SCUT-HEAD that includes 4405 images with 111251 annotated heads. Experiments show that our method achieves state-of-the-art performance on SCUT-HEAD.
- Mar 23 2018 cs.GT arXiv:1803.08415v1 Vehicle-to-Infrastructure (V2I) communications are increasingly supporting highway operations such as electronic toll collection, carpooling, and vehicle platooning. In this paper we study the incentives of strategic misbehavior by individual vehicles who can exploit the security vulnerabilities in V2I communications and impact the highway operations. We consider a V2I-enabled highway segment facing two classes of vehicles (agent populations), each with an authorized access to one server (subset of lanes). Vehicles are strategic in that they can misreport their class (type) to the system operator and get unauthorized access to the server dedicated to the other class. This misbehavior causes a congestion externality on the compliant vehicles, and thus, needs to be deterred. We focus on an environment where the operator is able to inspect the vehicles for misbehavior based on their reported types. The inspection is costly and successful detection incurs a fine on the misbehaving vehicle. We formulate a signaling game to study the strategic interaction between the vehicle classes and the operator. Our equilibrium analysis provides conditions on the cost parameters that govern the vehicles' incentive to misbehave, and determine the operator's optimal inspection strategy.
- There has been recent interest in applying cognitively or empirically motivated bounds on recursion depth to limit the search space of grammar induction models (Ponvert et al., 2011; Noji and Johnson, 2016; Shain et al., 2016). This work extends this depth-bounding approach to probabilistic context-free grammar induction (DB-PCFG), which has a smaller parameter space than hierarchical sequence models, and therefore more fully exploits the space reductions of depth-bounding. Results for this model on grammar acquisition from transcribed child-directed speech and newswire text exceed or are competitive with those of other models when evaluated on parse accuracy. Moreover, grammars acquired from this model demonstrate a consistent use of category labels, something which has not been demonstrated by other acquisition models.
- Jan 22 2018 cs.CV arXiv:1801.06345v1 Facial beauty prediction (FBP) is a significant visual recognition problem that aims to assess facial attractiveness in a way consistent with human perception. To tackle this problem, various data-driven models, especially state-of-the-art deep learning techniques, have been introduced, and benchmark datasets have become one of the essential elements for achieving FBP. Previous works have formulated the recognition of facial beauty as a specific supervised learning problem of classification, regression or ranking, which indicates that FBP is intrinsically a computation problem with multiple paradigms. However, most FBP benchmark datasets were built under specific computation constraints, which limits the performance and flexibility of the computational models trained on them. In this paper, we argue that FBP is a multi-paradigm computation problem, and propose a new diverse benchmark dataset, called SCUT-FBP5500, to achieve multi-paradigm facial beauty prediction. The SCUT-FBP5500 dataset contains 5500 frontal faces in total, with diverse properties (male/female, Asian/Caucasian, ages) and diverse labels (face landmarks, beauty scores within [1, 5], beauty score distribution), which allows different computational models with different FBP paradigms, such as appearance-based/shape-based facial beauty classification/regression models for male/female of Asian/Caucasian. We evaluated the SCUT-FBP5500 dataset for FBP using different combinations of feature and predictor, and various deep learning methods. The results indicate the improvement of FBP and the potential applications based on the SCUT-FBP5500.
- Dec 14 2017 cs.AR arXiv:1712.04771v1 With the emergence of big data applications in Machine Learning, Speech Recognition, Artificial Intelligence, and DNA Sequencing in recent years, computer architecture research communities are facing explosive growth in data scale. To achieve high efficiency in data-intensive computing, heterogeneous accelerators targeting these latest applications have become a hot topic in the computer architecture domain. At present, the implementation of heterogeneous accelerators mainly relies on heterogeneous computing units such as Application-Specific Integrated Circuits (ASICs), Graphics Processing Units (GPUs), and Field Programmable Gate Arrays (FPGAs). Among these typical heterogeneous architectures, FPGA-based reconfigurable accelerators have two merits. First, the FPGA architecture contains a large number of reconfigurable circuits, which satisfy the requirements of high performance and low power consumption when specific applications are running. Second, FPGA-based reconfigurable architectures allow prototype systems to be built rapidly and feature excellent customizability and reconfigurability. Recently, a batch of acceleration works based on FPGAs or other reconfigurable architectures has emerged in top-tier computer architecture conferences. To better review recent work on reconfigurable computing accelerators, this survey takes the latest high-level research on reconfigurable accelerator architectures and algorithm applications as its basis. We compare hot research issues and domains of concern, and analyze the advantages, disadvantages, and challenges of reconfigurable accelerators. Finally, we discuss prospective development trends of accelerator architectures, hoping to provide a reference for computer architecture researchers.
- Nov 15 2017 cs.CV arXiv:1711.04249v1 In this paper, we propose a refined scene text detector with a novel Feature Enhancement Network (FEN) for Region Proposal and Text Detection Refinement. Retrospectively, both region proposal with only a $3\times 3$ sliding-window feature and text detection refinement with a single-scale high-level feature are insufficient, especially for smaller scene text. Therefore, we design a new FEN network with task-specific fusion of low- and high-level semantic features to improve the performance of text detection. Besides, since unitary position-sensitive RoI pooling in general object detection is unreasonable for variable text regions, an adaptively weighted position-sensitive RoI pooling layer is devised to further enhance detection accuracy. To tackle the sample-imbalance problem during the refinement stage, we also propose an effective positives mining strategy for efficiently training our network. Experiments on ICDAR 2011 and 2013 robust text detection benchmarks demonstrate that our method can achieve state-of-the-art results, outperforming all reported methods in terms of F-measure.
- Locally repairable codes, or locally recoverable codes (LRC for short), are designed for application in distributed and cloud storage systems. Similar to classical block codes, there is an important bound called the Singleton-type bound for locally repairable codes. In this paper, an optimal locally repairable code refers to a block code achieving this Singleton-type bound. Like classical MDS codes, optimal locally repairable codes carry some very nice combinatorial structures. Since the introduction of the Singleton-type bound for locally repairable codes, people have put tremendous effort into constructing optimal locally repairable codes. Due to the hardness of this problem, there are few constructions of optimal locally repairable codes in the literature. Most of these constructions are realized via either combinatorial or algebraic structures. In this paper, we employ automorphism groups of rational function fields to construct optimal locally repairable codes by considering the group action on the projective lines over finite fields. It turns out that we are able to construct optimal locally repairable codes with flexibility of locality as well as smaller alphabet size comparable to the code length. In particular, we produce new families of $q$-ary locally repairable codes, including codes of length $q+1$ via cyclic groups and codes via dihedral groups.
- In 1964, Massey introduced a class of codes with complementary duals which are called Linear Complementary Dual (LCD for short) codes. He showed that LCD codes have applications in communication systems, side-channel attacks (SCA) and so on. LCD codes have been extensively studied in the literature. On the other hand, MDS codes form an optimal family of classical codes which have wide applications in both theory and practice. The main purpose of this paper is to give an explicit construction of several classes of LCD MDS codes, using tools from algebraic function fields. We exemplify this construction and obtain several classes of explicit LCD MDS codes for the odd characteristic case.
- Minimum storage regenerating codes have minimum storage of data in each node and therefore are maximum distance separable (MDS for short) codes. Thus, the number of nodes is upper bounded by $2^{b}$, where $b$ is the number of bits of data stored in each node. From both theoretical and practical points of view (see the details in Section 1), it is natural to consider regenerating codes that nearly have minimum storage of data, and meanwhile the number of nodes is unbounded. One of the candidates for such regenerating codes is an algebraic geometry code. In this paper, we generalize the repairing algorithm of Reed-Solomon codes given in [GW16, STOC 2016] to algebraic geometry codes and present an efficient repairing algorithm for arbitrary one-point algebraic geometry codes. By applying our repairing algorithm to the one-point algebraic geometry codes based on the Garcia-Stichtenoth tower, one can repair a code of rate $1-\epsilon$ and length $n$ over $\mathbb{F}_{q}$ with bandwidth $(n-1)(1-\tau)\log q$ for any $\epsilon=2^{(\tau-1/2)\log q}$ with a real $\tau\in(0,1/2)$. In addition, storage in each node for an algebraic geometry code is close to the minimum storage. Due to nice structures of Hermitian curves, repairing of Hermitian codes is also investigated. As a result, we are able to show that algebraic geometry codes are regenerating codes with good parameters. An example reveals that Hermitian codes outperform Reed-Solomon codes for certain parameters.
- Jul 14 2017 cs.CV arXiv:1707.03993v1 Human action recognition in videos is one of the most challenging tasks in computer vision. One important issue is how to design discriminative features for representing spatial context and temporal dynamics. Here, we introduce a path signature feature to encode information from intra-frame and inter-frame contexts. A key step towards leveraging this feature is to construct the proper trajectories (paths) for the data stream. In each frame, the correlated constraints of human joints are treated as small paths, then the spatial path signature features are extracted from them. In video data, the evolution of these spatial features over time can also be regarded as paths from which the temporal path signature features are extracted. Eventually, all these features are concatenated to constitute the input vector of a fully connected neural network for action classification. Experimental results on four standard benchmark action datasets, J-HMDB, SBU Dataset, Berkeley MHAD, and NTURGB+D, demonstrate that the proposed approach achieves state-of-the-art accuracy even in comparison with recent deep learning based models.
- May 22 2017 cs.CV arXiv:1705.06849v1 Inspired by the great success of recurrent neural networks (RNNs) in sequential modeling, we introduce a novel RNN system to improve the performance of online signature verification. The training objective is to directly minimize intra-class variations and to push the distances between skilled forgeries and genuine samples above a given threshold. By back-propagating the training signals, our RNN network produced discriminative features with desired metrics. Additionally, we propose a novel descriptor, called the length-normalized path signature (LNPS), and apply it to online signature verification. LNPS has interesting properties, such as scale invariance and rotation invariance after linear combination, and shows promising results in online signature verification. Experiments on the publicly available SVC-2004 dataset yielded state-of-the-art performance of 2.37% equal error rate (EER).
- May 16 2017 cs.CV arXiv:1705.05207v1 Currently, owing to the ubiquity of mobile devices, online handwritten Chinese character recognition (HCCR) has become one of the suitable choices for feeding input to cell phones and tablet devices. Over the past few years, larger and deeper convolutional neural networks (CNNs) have extensively been employed for improving character recognition performance. However, their substantial storage requirements are a significant obstacle to deploying such networks on portable electronic devices. To circumvent this problem, we propose a novel technique called DropWeight for pruning redundant connections in the CNN architecture. It is revealed that the proposed method not only treats streamlined architectures such as AlexNet and VGGNet well but also exhibits remarkable performance for deep residual network and inception network. We also demonstrate that global pooling is a better choice for building very compact online HCCR systems. Experiments were performed on the ICDAR-2013 online HCCR competition dataset using our proposed network, and it is found that the proposed approach requires only 0.57 MB for storage, whereas state-of-the-art CNN-based methods require up to 135 MB; meanwhile the performance is decreased only by 0.91%.
- We study unsupervised learning by developing introspective generative modeling (IGM) that attains a generator using progressively learned deep convolutional neural networks. The generator is itself a discriminator, capable of introspection: being able to self-evaluate the difference between its generated samples and the given training data. When followed by repeated discriminative learning, desirable properties of modern discriminative classifiers are directly inherited by the generator. IGM learns a cascade of CNN classifiers using a synthesis-by-classification algorithm. In the experiments, we observe encouraging results on a number of applications including texture modeling, artistic style transferring, face modeling, and semi-supervised learning.
- We propose introspective convolutional networks (ICN) that emphasize the importance of having convolutional neural networks empowered with generative capabilities. We employ a reclassification-by-synthesis algorithm to perform training using a formulation stemming from Bayes theory. Our ICN tries to iteratively: (1) synthesize pseudo-negative samples; and (2) enhance itself by improving the classification. The single CNN classifier learned is at the same time generative, being able to directly synthesize new samples within its own discriminative model. We conduct experiments on benchmark datasets including MNIST, CIFAR-10, and SVHN using state-of-the-art CNN architectures, and observe improved classification results.
- Mar 20 2017 cs.CV arXiv:1703.05870v2 Chinese font recognition (CFR) has gained significant attention in recent years. However, due to the sparsity of labeled font samples and the structural complexity of Chinese characters, CFR is still a challenging task. In this paper, a DropRegion method is proposed to generate a large number of stochastic variant font samples whose local regions are selectively disrupted and an inception font network (IFN) with two additional convolutional neural network (CNN) structure elements, i.e., a cascaded cross-channel parametric pooling (CCCP) and global average pooling, is designed. Because the distribution of strokes in a font image is non-stationary, an elastic meshing technique that adaptively constructs a set of local regions with equalized information is developed. Thus, DropRegion is seamlessly embedded in the IFN, which enables end-to-end training; the proposed DropRegion-IFN can be used for high performance CFR. Experimental results have confirmed the effectiveness of our new approach for CFR.
- It was shown by Massey that linear complementary dual (LCD for short) codes are asymptotically good. In 2004, Sendrier proved that LCD codes meet the asymptotic Gilbert-Varshamov (GV for short) bound. Until now, the GV bound has remained the best asymptotic lower bound for LCD codes. In this paper, we show that an algebraic geometry code over a finite field of even characteristic is equivalent to an LCD code and consequently there exists a family of LCD codes that are equivalent to algebraic geometry codes and exceed the asymptotic GV bound.
- Mar 07 2017 cs.CV arXiv:1703.01425v1 Detecting incidental scene text is a challenging task because of multi-orientation, perspective distortion, and variation of text size, color and scale. Retrospective research has only focused on using rectangular bounding boxes or horizontal sliding windows to localize text, which may result in redundant background noise, unnecessary overlap or even information loss. To address these issues, we propose a new Convolutional Neural Network (CNN) based method, named Deep Matching Prior Network (DMPNet), to detect text with tighter quadrangles. First, we use quadrilateral sliding windows in several specific intermediate convolutional layers to roughly recall the text with higher overlapping area, and then a shared Monte-Carlo method is proposed for fast and accurate computing of the polygonal areas. After that, we designed a sequential protocol for relative regression which can exactly predict text with compact quadrangles. Moreover, an auxiliary smooth Ln loss is also proposed for further regressing the position of text, which has better overall performance than L2 loss and smooth L1 loss in terms of robustness and stability. The effectiveness of our approach is evaluated on a public word-level, multi-oriented scene text database, ICDAR 2015 Robust Reading Competition Challenge 4 "Incidental scene text localization". The performance of our method is evaluated by using F-measure and found to be 70.64%, outperforming the existing state-of-the-art method with F-measure 63.76%.
- Feb 28 2017 cs.CV arXiv:1702.07975v1 Like other problems in computer vision, offline handwritten Chinese character recognition (HCCR) has achieved impressive results using convolutional neural network (CNN)-based methods. However, larger and deeper networks are needed to deliver state-of-the-art results in this domain. Such networks intuitively appear to incur high computational cost, and require the storage of a large number of parameters, which renders them unfeasible for deployment in portable devices. To solve this problem, we propose a Global Supervised Low-rank Expansion (GSLRE) method and an Adaptive Drop-weight (ADW) technique to solve the problems of speed and storage capacity. We design a nine-layer CNN for HCCR consisting of 3,755 classes, and devise an algorithm that can reduce the network's computational cost by nine times and compress the network to 1/18 of the original size of the baseline model, with only a 0.21% drop in accuracy. In tests, the proposed algorithm surpassed the best single-network performance reported thus far in the literature while requiring only 2.3 MB for storage. Furthermore, when integrated with our effective forward implementation, the recognition of an offline character image took only 9.7 ms on a CPU. Compared with the state-of-the-art CNN model for HCCR, our approach is approximately 30 times faster, yet 10 times more cost efficient.
- Feb 27 2017 cs.CV arXiv:1702.07508v1 This paper presents an investigation of several techniques that increase the accuracy of online handwritten Chinese character recognition (HCCR). We propose a new training strategy named DropDistortion to train a deep convolutional neural network (DCNN) with distorted samples. DropDistortion gradually lowers the degree of character distortion during training, which allows the DCNN to better generalize. Path signature is used to extract effective features for online characters. Further improvement is achieved by employing spatial stochastic max-pooling as a method of feature map distortion and model averaging. Experiments were carried out on three publicly available datasets, namely CASIA-OLHWDB 1.0, CASIA-OLHWDB 1.1, and the ICDAR2013 online HCCR competition dataset. The proposed techniques yield state-of-the-art recognition accuracies of 97.67%, 97.30%, and 97.99%, respectively.
- Nov 29 2016 cs.CV arXiv:1611.08991v2 Instance segmentation has attracted recent attention in computer vision and existing methods in this domain mostly have an object detection stage. In this paper, we study the intrinsic challenge of the instance segmentation problem, the presence of a quotient space (swapping the labels of different instances leads to the same result), and propose new methods that are object-proposal- and object-detection-free. We propose three alternative methods, namely pixel-based affinity mapping, superpixel-based affinity learning, and boundary-based component segmentation, all focusing on performing labeling transformations to cope with the quotient space problem. By adopting models similar to fully convolutional neural networks (FCN), our framework attains competitive results on both the PASCAL dataset (object-centric) and the Gland dataset (texture-centric), which the existing methods are not able to do. Our work also has advantages in its transparency, simplicity, and being entirely segmentation based.
- In 2011, Guruswami-Håstad-Kopparty [Gru] showed that the list-decodability of random linear codes is as good as that of general random codes. In the present paper, we further strengthen the result by showing that the list-decodability of random Euclidean self-orthogonal codes is as good as that of general random codes as well, i.e., achieves the classical Gilbert-Varshamov bound. Specifically, we show that, for any fixed finite field $\mathbb{F}_q$, error fraction $\delta\in (0,1-1/q)$ satisfying $1-H_q(\delta)\le \frac12$ and small $\epsilon>0$, with high probability a random Euclidean self-orthogonal code over $\mathbb{F}_q$ of rate $1-H_q(\delta)-\epsilon$ is $(\delta, O(1/\epsilon))$-list-decodable. This generalizes the result of linear codes to Euclidean self-orthogonal codes. In addition, we extend the result to list decoding symplectic dual-containing codes by showing that the list-decodability of random symplectic dual-containing codes achieves the quantum Gilbert-Varshamov bound as well. This implies that list-decodability of quantum stabilizer codes can achieve the quantum Gilbert-Varshamov bound. The counting argument on self-orthogonal codes is an important ingredient to prove our result.
- Oct 11 2016 cs.CV arXiv:1610.02616v2 Online handwritten Chinese text recognition (OHCTR) is a challenging problem as it involves a large-scale character set, ambiguous segmentation, and variable-length input sequences. In this paper, we exploit the outstanding capability of path signature to translate online pen-tip trajectories into informative signature feature maps using a sliding window-based method, successfully capturing the analytic and geometric properties of pen strokes with strong local invariance and robustness. A multi-spatial-context fully convolutional recurrent network (MCFCRN) is proposed to exploit the multiple spatial contexts from the signature feature maps and generate a prediction sequence while completely avoiding the difficult segmentation problem. Furthermore, an implicit language model is developed to make predictions based on semantic context within a predicting feature sequence, providing a new perspective for incorporating lexicon constraints and prior knowledge about a certain language in the recognition procedure. Experiments on two standard benchmarks, Dataset-CASIA and Dataset-ICDAR, yielded outstanding results, with correct rates of 97.10% and 97.15%, respectively, which are significantly better than the best result reported thus far in the literature.
- May 25 2016 cs.CV arXiv:1605.07314v1 In this paper, we develop a novel unified framework called DeepText for text region proposal generation and text detection in natural images via a fully convolutional neural network (CNN). First, we propose the inception region proposal network (Inception-RPN) and design a set of text characteristic prior bounding boxes to achieve high word recall with only hundred-level candidate proposals. Next, we present a powerful text detection network that embeds ambiguous text category (ATC) information and multilevel region-of-interest pooling (MLRP) for text and non-text classification and accurate localization. Finally, we apply an iterative bounding box voting scheme to pursue high recall in a complementary manner and introduce a filtering algorithm to retain the most suitable bounding box, while removing redundant inner and outer boxes for each text instance. Our approach achieves an F-measure of 0.83 and 0.85 on the ICDAR 2011 and 2013 robust text detection benchmarks, outperforming previous state-of-the-art results.
- Apr 19 2016 cs.CV arXiv:1604.04953v1 This paper proposes an end-to-end framework, namely fully convolutional recurrent network (FCRN) for handwritten Chinese text recognition (HCTR). Unlike traditional methods that rely heavily on segmentation, our FCRN is trained with online text data directly and learns to associate the pen-tip trajectory with a sequence of characters. FCRN consists of four parts: a path-signature layer to extract signature features from the input pen-tip trajectory, a fully convolutional network to learn informative representation, a sequence modeling layer to make per-frame predictions on the input sequence and a transcription layer to translate the predictions into a label sequence. The FCRN is end-to-end trainable in contrast to conventional methods whose components are separately trained and tuned. We also present a refined beam search method that efficiently integrates the language model to decode the FCRN and significantly improve the recognition results. We evaluate the performance of the proposed method on the test sets from the databases CASIA-OLHWDB and ICDAR 2013 Chinese handwriting recognition competition, and both achieve state-of-the-art performance with correct rates of 96.40% and 95.00%, respectively.
- This note introduces a piecewise-deterministic queueing (PDQ) model to study the stability of traffic queues in parallel-link transportation systems facing stochastic capacity fluctuations. The saturation rate (capacity) of the PDQ model switches between a finite set of modes according to a Markov chain, and link inflows are controlled by a state-feedback policy. A PDQ system is stable only if a lower bound on the time-average link inflows does not exceed the corresponding time-average saturation rate. Furthermore, a PDQ system is stable if the following two conditions hold: the nominal mode's saturation rate is high enough that all queues vanish in this mode, and a bilinear matrix inequality (BMI) involving an underestimate of the discharge rates of the PDQ in individual modes is feasible. The stability conditions can be strengthened for two-mode PDQs. These results can be used for design of routing policies that guarantee stability of traffic queues under stochastic capacity fluctuations.
- Feb 16 2016 cs.CV arXiv:1602.04348v1 Maximally stable extremal regions (MSER), which is a popular method to generate character proposals/candidates, has shown superior performance in scene text detection. However, the pixel-level operation limits its capability for handling some challenging cases (e.g., multiple connected characters, separated parts of one character and non-uniform illumination). To better tackle these cases, we design a character proposal network (CPN) by taking advantage of the high capacity and fast computing of fully convolutional network (FCN). Specifically, the network simultaneously predicts characterness scores and refines the corresponding locations. The characterness scores can be used for proposal ranking to reject non-character proposals and the refining process aims to obtain more accurate locations. Furthermore, considering that different characters have different aspect ratios, we propose a multi-template strategy, designing a refiner for each aspect ratio. Extensive experiments indicate that our method achieves recall rates of 93.88%, 93.60% and 96.46% on the ICDAR 2013, SVT and Chinese2k datasets, respectively, using fewer than 1000 proposals, demonstrating the promising performance of our character proposal network.
- Both MDS and Euclidean self-dual codes have theoretical and practical importance and the study of MDS self-dual codes has attracted lots of attention in recent years. In particular, determining the existence of $q$-ary MDS self-dual codes for various lengths has been investigated extensively. The problem is completely solved for the case where $q$ is even. The current paper focuses on the case where $q$ is odd. We construct a few classes of new MDS self-dual codes through generalized Reed-Solomon codes. More precisely, we show that for any given even length $n$ we have a $q$-ary MDS code as long as $q\equiv 1\bmod{4}$ and $q$ is sufficiently large (say $q\ge 2^n\times n^2$). Furthermore, we prove that there exists a $q$-ary MDS self-dual code of length $n$ if $q=r^2$ and $n$ satisfies one of the three conditions: (i) $n\le r$ and $n$ is even; (ii) $q$ is odd and $n-1$ is an odd divisor of $q-1$; (iii) $r\equiv 3\bmod{4}$ and $n=2tr$ for any $t\le (r-1)/2$.
- A pure quantum state is called $k$-uniform if all its reductions to $k$ qudits are maximally mixed. We investigate the general constructions of $k$-uniform pure quantum states of $n$ subsystems with $d$ levels. We provide one construction via symmetric matrices and the second one through classical error-correcting codes. There are three main results arising from our constructions. Firstly, we show that for any given even $n\ge 2$, there always exists an $n/2$-uniform $n$-qudit quantum state of level $p$ for sufficiently large prime $p$. Secondly, both constructions show that there exist $k$-uniform $n$-qudit pure quantum states such that $k$ is proportional to $n$, i.e., $k=\Omega(n)$, although the construction from symmetric matrices outperforms the one by error-correcting codes. Thirdly, our symmetric matrix construction provides a positive answer to the open question in [DA] on whether there exists a $3$-uniform $n$-qudit pure quantum state for all $n\ge 8$. In fact, we can further prove that, for every $k$, there exists a constant $M_k$ such that there exists a $k$-uniform $n$-qudit quantum state for all $n\ge M_k$. In addition, by using concatenation of algebraic geometry codes, we give an explicit construction of $k$-uniform quantum states when $k$ tends to infinity.
- Nov 10 2015 cs.CV arXiv:1511.02465v1This paper proposes a deep learning method to address the challenging facial attractiveness prediction problem. The method constructs a convolutional neural network for facial beauty prediction using a new deep cascaded fine-tuning scheme with various face input channels, such as the original RGB face image, the detail layer image, and the lighting layer image. With a carefully designed CNN model of deep structure, large input size and small convolutional kernels, we have achieved a high prediction correlation of 0.88. This result convinces us that the problem of facial attractiveness prediction can be solved by a deep learning approach, and it also shows the important roles of the facial smoothness, lightness, and color information involved in facial beauty perception, which is consistent with the results of recent psychology studies. Furthermore, we analyze the high-level features learnt by the CNN through visualization of its hidden layers, and some interesting phenomena were observed. It is found that the contours and appearance of facial features, especially the eyes and mouth, are the most significant facial attributes for facial attractiveness prediction, which is also consistent with human visual perception intuition.
- Nov 10 2015 cs.CV arXiv:1511.02282v1We introduce a new pipeline for hand localization and fingertip detection. For RGB images captured from an egocentric vision mobile camera, hand and fingertip detection remains a challenging problem due to factors like background complexity and hand shape variety. To address these issues accurately and robustly, we build a large scale dataset named Ego-Fingertip and propose a bi-level cascaded pipeline of convolutional neural networks, namely, an Attention-based Hand Detector and a Multi-point Fingertip Detector. The proposed method tackles these challenges effectively and achieves satisfactorily accurate prediction and real-time performance compared with previous hand and fingertip detection methods.
- Nov 10 2015 cs.CV arXiv:1511.02459v1In this paper, a novel face dataset with attractiveness ratings, namely, the SCUT-FBP dataset, is developed for automatic facial beauty perception. This dataset provides a benchmark to evaluate the performance of different methods for facial attractiveness prediction, including the state-of-the-art deep learning method. The SCUT-FBP dataset contains face portraits of 500 Asian female subjects with attractiveness ratings, all of which have been verified in terms of rating distribution, standard deviation, consistency, and self-consistency. Benchmark evaluations for facial attractiveness prediction were performed with different combinations of facial geometrical features and texture features using classical statistical learning methods and the deep learning method. The best Pearson correlation (0.8187) was achieved by the CNN model. Thus, the results of our experiments indicate that the SCUT-FBP dataset provides a reliable benchmark for facial beauty perception.
- Owing to the rapid growth of touchscreen mobile terminals and pen-based interfaces, handwriting-based writer identification systems are attracting increasing attention for personal authentication, digital forensics, and other applications. However, most studies on writer identification have not been satisfactory because of the insufficiency of data and the difficulty of designing good features under various handwriting conditions. Hence, we introduce an end-to-end system, namely DeepWriterID, which employs a deep convolutional neural network (CNN) to address these problems. A key feature of DeepWriterID is a new method we propose, called DropSegment. It is designed to achieve data augmentation and improve the generalized applicability of the CNN. For sufficient feature representation, we further introduce path signature feature maps to improve performance. Experiments were conducted on the NLPR handwriting database. Even though we only use pen-position information in the pen-down state of the given handwriting samples, we achieved new state-of-the-art identification rates of 95.72% for Chinese text and 98.51% for English text.
- The Reed-Muller (RM) code encoding $n$-variate degree-$d$ polynomials over ${\mathbb F}_q$ for $d < q$, with its evaluation on ${\mathbb F}_q^n$, has relative distance $1-d/q$ and can be list decoded from a $1-O(\sqrt{d/q})$ fraction of errors. In this work, for $d \ll q$, we give a length-efficient puncturing of such codes which (almost) retains the distance and list decodability properties of the Reed-Muller code, but has much better rate. Specifically, when $q =\Omega( d^2/\epsilon^2)$, we give an explicit rate $\Omega\left(\frac{\epsilon}{d!}\right)$ puncturing of Reed-Muller codes which have relative distance at least $(1-\epsilon)$ and efficient list decoding up to $(1-\sqrt{\epsilon})$ error fraction. This almost matches the performance of random puncturings which work with the weaker field size requirement $q= \Omega( d/\epsilon^2)$. We can also improve the field size requirement to the optimal (up to constant factors) $q =\Omega( d/\epsilon)$, at the expense of a worse list decoding radius of $1-\epsilon^{1/3}$ and rate $\Omega\left(\frac{\epsilon^2}{d!}\right)$. The first of the above trade-offs is obtained by substituting for the variables functions with carefully chosen pole orders from an algebraic function field; this leads to a puncturing for which the RM code is a subcode of a certain algebraic-geometric code (which is known to be efficiently list decodable). The second trade-off is obtained by concatenating this construction with a Reed-Solomon based multiplication friendly pair, and using the list recovery property of algebraic-geometric codes.
- This paper presents an open source tool for testing the recognition accuracy of Chinese handwriting input methods. The tool consists of two modules, namely the PC client and the Android mobile client. The PC client reads handwritten samples stored on the computer, and transfers them individually to the Android client in accordance with the socket communication protocol. After the Android client receives the data, it simulates the handwriting on the screen of the client device, and triggers the corresponding handwriting recognition method. The recognition accuracy is recorded by the Android client. We present the design principles and describe the implementation of the test platform. We construct several test datasets for evaluating different handwriting recognition systems, and conduct an objective and comprehensive test using six Chinese handwriting input methods with five datasets. The test results for the recognition accuracy are then compared and analyzed.
- May 29 2015 cs.CV arXiv:1505.07675v1Deep convolutional neural networks (DCNNs) have achieved great success in various computer vision and pattern recognition applications, including those for handwritten Chinese character recognition (HCCR). However, most current DCNN-based HCCR approaches treat the handwritten sample simply as an image bitmap, ignoring some vital domain-specific information that may be useful but that cannot be learnt by traditional networks. In this paper, we propose an enhancement of the DCNN approach to online HCCR by incorporating a variety of domain-specific knowledge, including deformation, non-linear normalization, imaginary strokes, path signature, and 8-directional features. Our contribution is twofold. First, these domain-specific technologies are investigated and integrated with a DCNN to form a composite network to achieve improved performance. Second, the resulting DCNNs with diversity in their domain knowledge are combined using a hybrid serial-parallel (HSP) strategy. Consequently, we achieve a promising accuracy of 97.20% and 96.87% on CASIA-OLHWDB1.0 and CASIA-OLHWDB1.1, respectively, outperforming the best results previously reported in the literature.
- May 26 2015 cs.CV arXiv:1505.06623v1In this paper, we present an effective method to analyze the recognition confidence of handwritten Chinese characters, based on the softmax regression score of a high-performance convolutional neural network (CNN). Through careful and thorough statistics of 827,685 testing samples that were randomly selected from a total of 8836 different classes of Chinese characters, we find that the confidence measurement based on the CNN is a useful metric to know how reliable the recognition results are. Furthermore, we find by experiments that the recognition confidence can be used to find similar and confusable character pairs, to check wrongly or cursively written samples, and even to discover and correct mislabelled samples. Many interesting observations and statistics are given and analyzed in this study.
- May 21 2015 cs.CV arXiv:1505.05354v1Inspired by the theory of Leitner's learning box from the field of psychology, we propose DropSample, a new method for training deep convolutional neural networks (DCNNs), and apply it to large-scale online handwritten Chinese character recognition (HCCR). According to the principle of DropSample, each training sample is associated with a quota function that is dynamically adjusted on the basis of the classification confidence given by the DCNN softmax output. After a learning iteration, samples with low confidence will have a higher probability of being selected as training data in the next iteration; in contrast, well-trained and well-recognized samples with very high confidence will have a lower probability of being involved in the next training iteration and can be gradually eliminated. As a result, the learning process becomes more efficient as it progresses. Furthermore, we investigate the use of domain-specific knowledge to enhance the performance of the DCNN by adding a domain knowledge layer before the traditional CNN. By adopting DropSample together with different types of domain-specific knowledge, the accuracy of HCCR can be improved efficiently. Experiments on the CASIA-OLHWDB 1.0, CASIA-OLHWDB 1.1, and ICDAR 2013 online HCCR competition datasets yield outstanding recognition rates of 97.33%, 97.06%, and 97.51% respectively, all of which are significantly better than the previous best results reported in the literature.
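The selection rule described in the DropSample entry above can be illustrated with a minimal sketch. The quota rule used here (`1 - confidence`, with a hypothetical `drop_threshold` beyond which a sample is eliminated) is an assumption chosen for illustration and is not the quota function from the paper; the function names are likewise hypothetical.

```python
import random


def update_quota(confidence, drop_threshold=0.99):
    """Map a softmax confidence in [0, 1] to a sampling quota.

    Hypothetical rule for illustration only: samples already recognized
    with very high confidence get quota 0 (eliminated); the rest get a
    quota that grows as confidence falls.
    """
    if confidence >= drop_threshold:
        return 0.0
    return 1.0 - confidence


def select_batch(quotas, batch_size, rng):
    """Draw sample indices with probability proportional to their quota."""
    candidates = [i for i, q in enumerate(quotas) if q > 0.0]
    if not candidates:
        return []
    weights = [quotas[i] for i in candidates]
    return rng.choices(candidates, weights=weights, k=batch_size)


# Confidences reported by the network after one (imaginary) iteration.
confidences = [0.10, 0.50, 0.995]
quotas = [update_quota(c) for c in confidences]
batch = select_batch(quotas, batch_size=1000, rng=random.Random(0))
```

In this toy run the third sample is never drawn again, and the low-confidence first sample is drawn more often than the second, which mirrors the behaviour the abstract describes.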
- May 21 2015 cs.CV arXiv:1505.04925v1Following its great success in solving many computer vision problems, the convolutional neural network (CNN) has provided a new end-to-end approach to handwritten Chinese character recognition (HCCR) with very promising results in recent years. However, the CNNs proposed so far for HCCR were neither deep enough nor slim enough. We show in this paper that a deeper architecture can substantially benefit HCCR and achieve higher performance, while being designed with fewer parameters. We also show that the traditional feature extraction methods, such as Gabor or gradient feature maps, are still useful for enhancing the performance of the CNN. We design a streamlined version of GoogLeNet [13], which was originally proposed for image classification with a very deep architecture, for HCCR (denoted as HCCR-GoogLeNet). The HCCR-GoogLeNet we used is 19 layers deep but involves only 7.26 million parameters. Experiments were conducted using the ICDAR 2013 offline HCCR competition dataset. With the proper incorporation of traditional directional feature maps, the proposed single and ensemble HCCR-GoogLeNet models achieve new state-of-the-art recognition accuracies of 96.35% and 96.74%, respectively, outperforming the previous best result by a significant gap.
- May 21 2015 cs.CV arXiv:1505.04922v1Most existing online writer-identification systems require that the text content is supplied in advance and rely on separately designed features and classifiers. The identifications are based on lines of text, entire paragraphs, or entire documents; however, these materials are not always available. In this paper, we introduce a path-signature feature to an end-to-end text-independent writer-identification system with a deep convolutional neural network (DCNN). Because deep models require a considerable amount of data to achieve good performance, we propose a data-augmentation method named DropStroke to enrich personal handwriting. Experiments were conducted on online handwritten Chinese characters from the CASIA-OLHWDB1.0 dataset, which consists of 3,866 classes from 420 writers. For each writer, we only used 200 samples for training and the remaining 3,666 for testing. The results reveal that the path-signature feature is useful for writer identification, and the proposed DropStroke technique enhances the generalization and significantly improves performance.
- To improve the accuracy and speed of regression and classification, we present a data-based prediction method, Random Bits Regression (RBR). This method first generates a large number of random binary intermediate/derived features based on the original input matrix, and then performs regularized linear/logistic regression on those intermediate/derived features to predict the outcome. Benchmark analyses on a simulated dataset, UCI machine learning repository datasets and a GWAS dataset showed that RBR outperforms other popular methods in accuracy and robustness. RBR (available at https://sourceforge.net/projects/rbr/) is very fast and requires a reasonable amount of memory; it therefore provides a strong, robust and fast predictor in the big data era.
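The two stages described in the RBR entry above (random binary features, then regularized linear regression) can be sketched as follows. This is a toy illustration under stated assumptions, not the released RBR implementation: each bit thresholds a random weighted sum of a random column subset, the threshold is taken from a random training row, and the L2-regularized fit uses plain gradient descent. All function names and parameters are hypothetical.

```python
import random


def make_random_bits(X, n_bits, rng):
    """Create n_bits random threshold rules over random column subsets."""
    n_cols = len(X[0])
    rules = []
    for _ in range(n_bits):
        cols = rng.sample(range(n_cols), rng.randint(1, n_cols))
        weights = [rng.gauss(0.0, 1.0) for _ in cols]
        anchor = rng.choice(X)  # threshold = projection of a random training row
        threshold = sum(w * anchor[c] for w, c in zip(weights, cols))
        rules.append((cols, weights, threshold))
    return rules


def to_bits(X, rules):
    """Apply every rule to every row, giving a 0/1 feature matrix."""
    return [
        [1.0 if sum(w * row[c] for w, c in zip(ws, cols)) > t else 0.0
         for cols, ws, t in rules]
        for row in X
    ]


def fit_ridge(B, y, lam=0.1, steps=200):
    """L2-regularized least squares on the bits, fitted by gradient descent."""
    n, k = len(B), len(B[0])
    intercept = sum(y) / n
    w = [0.0] * k
    lr = 1.0 / (k + lam)  # conservative step size for 0/1 features
    for _ in range(steps):
        resid = [intercept + sum(wj * bj for wj, bj in zip(w, row)) - yi
                 for row, yi in zip(B, y)]
        grad = [sum(r * row[j] for r, row in zip(resid, B)) / n + lam * w[j]
                for j in range(k)]
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return intercept, w


def predict(B, model):
    """Linear prediction from the bit matrix."""
    intercept, w = model
    return [intercept + sum(wj * bj for wj, bj in zip(w, row)) for row in B]


rng = random.Random(0)
X = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(80)]
y = [1.0 if a * b > 0 else 0.0 for a, b in X]  # XOR-like toy target
rules = make_random_bits(X, n_bits=40, rng=rng)
B = to_bits(X, rules)
model = fit_ridge(B, y)
mse = sum((p - t) ** 2 for p, t in zip(predict(B, model), y)) / len(y)
```

The toy target is not linear in the raw inputs, so any reduction of the training error below the variance of `y` comes from the random bits; a practical implementation would use a proper solver and held-out data.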
- Bayesian optimization is a powerful global optimization technique for expensive black-box functions. One of its shortcomings is that it requires auxiliary optimization of an acquisition function at each iteration. This auxiliary optimization can be costly and very hard to carry out in practice. Moreover, it creates serious theoretical concerns, as most of the convergence results assume that the exact optimum of the acquisition function can be found. In this paper, we introduce a new technique for efficient global optimization that combines Gaussian process confidence bounds and treed simultaneous optimistic optimization to eliminate the need for auxiliary optimization of acquisition functions. The experiments with global optimization benchmarks and a novel application to automatic information extraction demonstrate that the resulting technique is more efficient than the two approaches from which it draws inspiration. Unlike most theoretical analyses of Bayesian optimization with Gaussian processes, our finite-time convergence rate proofs do not require exact optimization of an acquisition function. That is, our approach eliminates the unsatisfactory assumption that a difficult, potentially NP-hard, problem has to be solved in order to obtain vanishing regret rates.
- Erasure list decoding was introduced to correct a larger number of erasures with output of a list of possible candidates. In the present paper, we consider both random linear codes and algebraic geometry codes for list decoding erasure errors. The contributions of this paper are two-fold. Firstly, we show that, for arbitrary $0<R<1$ and $\epsilon>0$ ($R$ and $\epsilon$ are independent), with high probability a random linear code is an erasure list decodable code with constant list size $2^{O(1/\epsilon)}$ that can correct a fraction $1-R-\epsilon$ of erasures, i.e., a random linear code achieves the information-theoretic optimal trade-off between information rate and fraction of erasure errors. Secondly, we show that algebraic geometry codes are good erasure list-decodable codes. Precisely speaking, for any $0<R<1$ and $\epsilon>0$, a $q$-ary algebraic geometry code of rate $R$ from the Garcia-Stichtenoth tower can correct $1-R-\frac{1}{\sqrt{q}-1}+\frac{1}{q}-\epsilon$ fraction of erasure errors with list size $O(1/\epsilon)$. This improves the Johnson bound applied to algebraic geometry codes. Furthermore, list decoding of these algebraic geometry codes can be implemented in polynomial time.
- It has been a great challenge to construct new quantum MDS codes. In particular, it is very hard to construct quantum MDS codes with relatively large minimum distance. So far, except for some sparse lengths, all known $q$-ary quantum MDS codes have minimum distance less than or equal to $q/2+1$. In the present paper, we provide a construction of quantum MDS codes with minimum distance bigger than $q/2+1$. In particular, we show the existence of $q$-ary quantum MDS codes with length $n=q^2+1$ and minimum distance $d$ for any $d\le q+1$ (this result extends those given in \cite{Gu11,Jin1,KZ12}); and with length $(q^2+2)/3$ and minimum distance $d$ for any $d\le (2q+2)/3$ if $3|(q+1)$. Our method is through Hermitian self-orthogonal codes. The main idea of constructing Hermitian self-orthogonal codes is based on the solvability in $\mathbb{F}_q$ of a system of homogeneous equations over $\mathbb{F}_{q^2}$.
- A curve attaining the Hasse-Weil bound is called a maximal curve. Usually classical error-correcting codes obtained from a maximal curve have good parameters. However, the quantum stabilizer codes obtained from such classical error-correcting codes via Euclidean or Hermitian self-orthogonality do not always possess good parameters. In this paper, the Hermitian self-orthogonality of algebraic geometry codes obtained from two maximal curves is investigated. It turns out that the stabilizer quantum codes produced from such Hermitian self-orthogonal classical codes have good parameters.
- In an undirected social graph, a friendship link involves two users and the friendship is visible in both users' friend lists. Such dual visibility of the friendship may raise privacy threats. This is because both users can separately control the visibility of a friendship link to other users and their privacy policies for the link may not be consistent. Even if one of them conceals the link from a third user, the third user may find the friendship link in the other user's friend list. In addition, as most users allow their friends to see their friend lists in most social network systems, an adversary can exploit the inconsistent policies to launch privacy attacks to identify and infer many of a targeted user's friends. In this paper, we propose, analyze and evaluate such an attack, which is called the Friendship Identification and Inference (FII) attack. In an FII attack scenario, we assume that an adversary can only see his friend list and the friend lists of his friends who do not hide their friend lists from him. An FII attack then consists of two steps: 1) friend identification and 2) friend inference. In the friend identification step, the adversary tries to identify a target's friends based on his friend list and those of his friends. In the friend inference step, the adversary attempts to infer the target's friends by using the proposed random walk with restart approach. We present experimental results using three real social network datasets and show that FII attacks are generally efficient and effective when adversaries and targets are friends or 2-distant neighbors. We also comprehensively analyze the attack results in order to find what values of parameters and network features could promote FII attacks. Currently, most popular social network systems with an undirected friendship graph, such as Facebook, LinkedIn and Foursquare, are susceptible to FII attacks.
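The friend inference step in the entry above relies on a random walk with restart. The sketch below shows the generic textbook form of that technique on a toy graph; it is not the paper's specific variant, and the graph, the `restart` probability, and the function name are assumptions made for illustration.

```python
def random_walk_with_restart(adjacency, seed, restart=0.15, iterations=100):
    """Approximate the visit probabilities of a walk that restarts at seed.

    adjacency maps each node to a list of neighbours (undirected graph,
    every node has at least one neighbour).
    """
    scores = {node: 0.0 for node in adjacency}
    scores[seed] = 1.0
    for _ in range(iterations):
        nxt = {node: 0.0 for node in adjacency}
        for node, neighbours in adjacency.items():
            # Spread the non-restarting mass evenly over the neighbours.
            share = (1.0 - restart) * scores[node] / len(neighbours)
            for nb in neighbours:
                nxt[nb] += share
        nxt[seed] += restart  # the restarting mass jumps back to the seed
        scores = nxt
    return scores


# Toy friendship graph: a path alice - bob - carol - dave.
graph = {
    "alice": ["bob"],
    "bob": ["alice", "carol"],
    "carol": ["bob", "dave"],
    "dave": ["carol"],
}
scores = random_walk_with_restart(graph, seed="alice")
```

Nodes close to the seed receive higher scores than distant ones, so ranking non-neighbours by score gives a simple proximity-based guess at likely hidden links.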
- In the present paper, we show that if the dimension of an arbitrary algebraic geometry code over a finite field of even characteristic is slightly less than half of its length, then it is equivalent to a Euclidean self-orthogonal code. However, in the literature, a strong condition about the existence of a certain differential is required to obtain such a result. We also show a similar result on Hermitian self-orthogonal algebraic geometry codes. As a consequence, we can apply our result to quantum codes and obtain quantum codes with good asymptotic bounds.
- There have been various constructions of classical codes from polynomial valuations in the literature \cite{ARC04,LNX01,LX04,XF04,XL00}. In this paper, we again present a construction of classical codes based on polynomial constructions. One of the features of this construction is that not only do the classical codes arising from the construction have good parameters, but quantum codes with reasonably good parameters can also be produced from these classical codes. In particular, some new quantum codes are constructed (see Examples \ref{5.5} and \ref{5.6}).
- It is well known that quantum codes can be constructed through classical symplectic self-orthogonal codes. In this paper, we give a kind of Gilbert-Varshamov bound for symplectic self-orthogonal codes first and then obtain the Gilbert-Varshamov bound for quantum codes. The idea of obtaining the Gilbert-Varshamov bound for symplectic self-orthogonal codes follows from counting arguments.
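To make the notion of symplectic self-orthogonality in the last entry concrete, here is a minimal check for the binary case only (an assumption; the abstract does not fix the field). Vectors are written as (a | b) of length 2n, and the symplectic product over $\mathbb{F}_2$ of (a | b) and (a' | b') is a·b' + b·a' mod 2. The function names are hypothetical.

```python
def symplectic_product(u, v):
    """Symplectic inner product over F_2 of u = (a|b) and v = (a'|b')."""
    n = len(u) // 2
    a, b = u[:n], u[n:]
    a2, b2 = v[:n], v[n:]
    cross1 = sum(x * y for x, y in zip(a, b2))
    cross2 = sum(x * y for x, y in zip(b, a2))
    return (cross1 + cross2) % 2


def is_symplectic_self_orthogonal(generators):
    """True if every pair of generators has symplectic product 0.

    Checking generators is enough because the product is bilinear.
    """
    return all(symplectic_product(u, v) == 0
               for u in generators for v in generators)


# Generators XXXX and ZZZZ of the [[4,2,2]] code, written as (X part | Z part).
xxxx = [1, 1, 1, 1, 0, 0, 0, 0]
zzzz = [0, 0, 0, 0, 1, 1, 1, 1]
# X and Z on the same single qubit anticommute, so this pair fails the check.
x1 = [1, 0, 0, 0, 0, 0, 0, 0]
z1 = [0, 0, 0, 0, 1, 0, 0, 0]

good = is_symplectic_self_orthogonal([xxxx, zzzz])
bad = is_symplectic_self_orthogonal([x1, z1])
```

Here `good` is true because XXXX and ZZZZ overlap on an even number of positions, while `bad` is false because the single-qubit pair overlaps on exactly one.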