results for au:Fan_F in:cs

- For manifold learning, it is assumed that high-dimensional sample/data points are on an embedded low-dimensional manifold. Usually, distances among samples are computed to represent the underlying data structure, for a specified distance measure such as the Euclidean distance or geodesic distance. For manifold learning, here we propose a metric according to the angular change along a geodesic line, thereby reflecting the underlying shape-oriented information or the similarity between high- and low-dimensional representations of a data cloud. Our numerical results are described to demonstrate the feasibility and merits of the proposed dimensionality reduction scheme
- Feb 13 2018 cs.OH arXiv:1802.03500v1Electricity users are the major players of the electric systems, and electricity consumption is growing at an extraordinary rate. The research on electricity consumption behaviors is becoming increasingly important to design and deployment of the electric systems. Unfortunately, electricity load profiles are difficult to acquire. Data synthesis is one of the best approaches to solving the lack of data, and the key is the model that preserves the real electricity consumption behaviors. In this paper, we propose a hierarchical multi-matrices Markov Chain (HMMC) model to synthesize scalable electricity load profiles that preserve the real consumption behavior on three time scales: per day, per week, and per year. To promote the research on the electricity consumption behavior, we use the HMMC approach to model two distinctive raw electricity load profiles. One is collected from the resident sector, and the other is collected from the non-resident sectors, including different industries such as education, finance, and manufacturing. The experiments show our model performs much better than the classical Markov Chain model. We publish two trained models online, and researchers can directly use these trained models to synthesize scalable electricity load profiles for further researches.
- Inspired by the fact that the neural network, as the mainstream for machine learning, has brought successes in many application areas, here we propose to use this approach for decoding hidden correlation among pseudo-random data and predicting events accordingly. With a simple neural network structure and a typical training procedure, we demonstrate the learning and prediction power of the neural network in extremely random environment. Finally, we postulate that the high sensitivity and efficiency of the neural network may allow to critically test if there could be any fundamental difference between quantum randomness and pseudo randomness, which is equivalent to the question: Does God play dice?
- We present Wasserstein introspective neural networks (WINN) that are both a generator and a discriminator within a single model. WINN provides a significant improvement over the recent introspective neural networks (INN) method by enhancing INN's generative modeling capability. WINN has three interesting properties: (1) A mathematical connection between the formulation of Wasserstein generative adversarial networks (WGAN) and the INN algorithm is made; (2) The explicit adoption of the Wasserstein distance into INN results in a large enhancement to INN, achieving compelling results even with a single classifier --- e.g., providing a 20 times reduction in model size over INN within texture modeling; (3) When applied to supervised classification, WINN also gives rise to greater robustness with an $88\%$ reduction of errors against adversarial examples --- improved over the result of $39\%$ by an INN-family algorithm. In the experiments, we report encouraging results on unsupervised learning problems including texture, face, and object modeling, as well as a supervised classification task against adversarial attack.
- Oct 03 2017 cs.DB arXiv:1710.00204v1Even though many machine algorithms have been proposed for entity resolution, it remains very challenging to find a solution with quality guarantees. In this paper, we propose a novel HUman and Machine cOoperative (HUMO) framework for entity resolution (ER), which divides an ER workload between machine and human. HUMO enables a mechanism for quality control that can flexibly enforce both precision and recall levels. We introduce the optimization problem of HUMO, minimizing human cost given a quality requirement, and then present three optimization approaches: a conservative baseline one purely based on the monotonicity assumption of precision, a more aggressive one based on sampling and a hybrid one that can take advantage of the strengths of both previous approaches. Finally, we demonstrate by extensive experiments on real and synthetic datasets that HUMO can achieve high-quality results with reasonable return on investment (ROI) in terms of human cost, and it performs considerably better than the state-of-the-art alternative in quality control.
- The artificial neural network is a popular framework in machine learning. To empower individual neurons, we recently suggested that the current type of neurons could be upgraded to 2nd order counterparts, in which the linear operation between inputs to a neuron and the associated weights is replaced with a nonlinear quadratic operation. A single 2nd order neurons already has a strong nonlinear modeling ability, such as implementing basic fuzzy logic operations. In this paper, we develop a general backpropagation (BP) algorithm to train the network consisting of 2nd-order neurons. The numerical studies are performed to verify of the generalized BP algorithm.
- In machine learning, the use of an artificial neural network is the mainstream approach. Such a network consists of layers of neurons. These neurons are of the same type characterized by the two features: (1) an inner product of an input vector and a matching weighting vector of trainable parameters and (2) a nonlinear excitation function. Here we investigate the possibility of replacing the inner product with a quadratic function of the input vector, thereby upgrading the 1st order neuron to the 2nd order neuron, empowering individual neurons, and facilitating the optimization of neural networks. Also, numerical examples are provided to illustrate the feasibility and merits of the 2nd order neurons. Finally, further topics are discussed.
- The recently proposed generalized epidemic modeling framework (GEMF) \citesahneh2013generalized lays the groundwork for systematically constructing a broad spectrum of stochastic spreading processes over complex networks. This article builds an algorithm for exact, continuous-time numerical simulation of GEMF-based processes. Moreover the implementation of this algorithm, GEMFsim, is available in popular scientific programming platforms such as MATLAB, R, Python, and C; GEMFsim facilitates simulating stochastic spreading models that fit in GEMF framework. Using these simulations one can examine the accuracy of mean-field-type approximations that are commonly used for analytical study of spreading processes on complex networks.
- Mar 16 2015 cs.IR arXiv:1503.03961v1Since the length of microblog texts, such as tweets, is strictly limited to 140 characters, traditional Information Retrieval techniques suffer from the vocabulary mismatch problem severely and cannot yield good performance in the context of microblogosphere. To address this critical challenge, in this paper, we propose a new language modeling approach for microblog retrieval by inferring various types of context information. In particular, we expand the query using knowledge terms derived from Freebase so that the expanded one can better reflect users' search intent. Besides, in order to further satisfy users' real-time information need, we incorporate temporal evidences into the expansion method, which can boost recent tweets in the retrieval results with respect to a given topic. Experimental results on two official TREC Twitter corpora demonstrate the significant superiority of our approach over baseline methods.
- Dec 05 2014 cs.CG arXiv:1412.1680v3Given a real-valued function $f$ defined over a manifold $M$ embedded in $\mathbb{R}^d$, we are interested in recovering structural information about $f$ from the sole information of its values on a finite sample $P$. Existing methods provide approximation to the persistence diagram of $f$ when geometric noise and functional noise are bounded. However, they fail in the presence of aberrant values, also called outliers, both in theory and practice. We propose a new algorithm that deals with outliers. We handle aberrant functional values with a method inspired from the k-nearest neighbors regression and the local median filtering, while the geometric outliers are handled using the distance to a measure. Combined with topological results on nested filtrations, our algorithm performs robust topological analysis of scalar fields in a wider range of noise models than handled by current methods. We provide theoretical guarantees and experimental results on the quality of our approximation of the sampled scalar field.
- Detecting the dimension of a hidden manifold from a point sample has become an important problem in the current data-driven era. Indeed, estimating the shape dimension is often the first step in studying the processes or phenomena associated to the data. Among the many dimension detection algorithms proposed in various fields, a few can provide theoretical guarantee on the correctness of the estimated dimension. However, the correctness usually requires certain regularity of the input: the input points are either uniformly randomly sampled in a statistical setting, or they form the so-called $(\varepsilon,\delta)$-sample which can be neither too dense nor too sparse. Here, we propose a purely topological technique to detect dimensions. Our algorithm is provably correct and works under a more relaxed sampling condition: we do not require uniformity, and we also allow Hausdorff noise. Our approach detects dimension by determining local homology. The computation of this topological structure is much less sensitive to the local distribution of points, which leads to the relaxation of the sampling conditions. Furthermore, by leveraging various developments in computational topology, we show that this local homology at a point $z$ can be computed \emphexactly for manifolds using Vietoris-Rips complexes whose vertices are confined within a local neighborhood of $z$. We implement our algorithm and demonstrate the accuracy and robustness of our method using both synthetic and real data sets.
- The efficiency of extracting topological information from point data depends largely on the complex that is built on top of the data points. From a computational viewpoint, the most favored complexes for this purpose have so far been Vietoris-Rips and witness complexes. While the Vietoris-Rips complex is simple to compute and is a good vehicle for extracting topology of sampled spaces, its size is huge--particularly in high dimensions. The witness complex on the other hand enjoys a smaller size because of a subsampling, but fails to capture the topology in high dimensions unless imposed with extra structures. We investigate a complex called the \em graph induced complex that, to some extent, enjoys the advantages of both. It works on a subsample but still retains the power of capturing the topology as the Vietoris-Rips complex. It only needs a graph connecting the original sample points from which it builds a complex on the subsample thus taming the size considerably. We show that, using the graph induced complex one can (i) infer the one dimensional homology of a manifold from a very lean subsample, (ii) reconstruct a surface in three dimension from a sparse subsample without computing Delaunay triangulations, (iii) infer the persistent homology groups of compact sets from a sufficiently dense sample. We provide experimental evidences in support of our theory.
- Algorithms for persistent homology and zigzag persistent homology are well-studied for persistence modules where homomorphisms are induced by inclusion maps. In this paper, we propose a practical algorithm for computing persistence under $\mathbb{Z}_2$ coefficients for a sequence of general simplicial maps and show how these maps arise naturally in some applications of topological data analysis. First, we observe that it is not hard to simulate simplicial maps by inclusion maps but not necessarily in a monotone direction. This, combined with the known algorithms for zigzag persistence, provides an algorithm for computing the persistence induced by simplicial maps. Our main result is that the above simple minded approach can be improved for a sequence of simplicial maps given in a monotone direction. A simplicial map can be decomposed into a set of elementary inclusions and vertex collapses--two atomic operations that can be supported efficiently with the notion of simplex annotations for computing persistent homology. A consistent annotation through these atomic operations implies the maintenance of a consistent cohomology basis, hence a homology basis by duality. While the idea of maintaining a cohomology basis through an inclusion is not new, maintaining them through a vertex collapse is new, which constitutes an important atomic operation for simulating simplicial maps. Annotations support the vertex collapse in addition to the usual inclusion quite naturally. Finally, we exhibit an application of this new tool in which we approximate the persistence diagram of a filtration of Rips complexes where vertex collapses are used to tame the blow-up in size.