Jun 22 2017 cs.CL
JaTeCS is an open source Java library that supports research on automatic text categorization and other related problems, such as ordinal regression and quantification, which are of special interest in opinion mining applications. It covers all the steps of an experimental activity, from reading the corpus to the evaluation of the experimental results. As JaTeCS is focused on text as the main input data, it provides the user with many text-dedicated tools, e.g.: data readers for many formats, including the most commonly used text corpora and lexical resources, natural language processing tools, multi-language support, methods for feature selection and weighting, the implementation of many machine learning algorithms as well as wrappers for well-known external software (e.g., SVM_light) which enable their full control from code. JaTeCS support its expansion by abstracting through interfaces many of the typical tools and procedures used in text processing tasks. The library also provides a number of "template" implementations of typical experimental setups (e.g., train-test, k-fold validation, grid-search optimization, randomized runs) which enable fast realization of experiments just by connecting the templates with data readers, learning algorithms and evaluation measures.
Apr 21 2017 cs.CV
The recently proposed stochastic residual networks selectively activate or bypass the layers during training, based on independent stochastic choices, each of which following a probability distribution that is fixed in advance. In this paper we present a first exploration on the use of an epoch-dependent distribution, starting with a higher probability of bypassing deeper layers and then activating them more frequently as training progresses. Preliminary results are mixed, yet they show some potential of adding an epoch-dependent management of distributions, worth of further investigation.
We propose a framework for the visualization of directed networks relying on the eigenfunctions of the magnetic Laplacian, called here Magnetic Eigenmaps. The magnetic Laplacian is a complex deformation of the well-known combinatorial Laplacian. Features such as density of links and directionality patterns are revealed by plotting the phases of the first magnetic eigenvectors. An interpretation of the magnetic eigenvectors is given in connection with the angular synchronization problem. Illustrations of our method are given for both artificial and real networks.
In this paper we tackle the problem of image search when the query is a short textual description of the image the user is looking for. We choose to implement the actual search process as a similarity search in a visual feature space, by learning to translate a textual query into a visual representation. Searching in the visual feature space has the advantage that any update to the translation model does not require to reprocess the, typically huge, image collection on which the search is performed. We propose Text2Vis, a neural network that generates a visual representation, in the visual feature space of the fc6-fc7 layers of ImageNet, from a short descriptive text. Text2Vis optimizes two loss functions, using a stochastic loss-selection method. A visual-focused loss is aimed at learning the actual text-to-visual feature mapping, while a text-focused loss is aimed at modeling the higher-level semantic concepts expressed in language and countering the overfit on non-relevant visual components of the visual loss. We report preliminary results on the MS-COCO dataset.
Feb 10 2016 cs.CE
Multiscale entropy (MSE) has been a prevalent algorithm to quantify the complexity of fluctuations in the local mean value of biomedical time series. Recent developments in the field have tried to improve the MSE by reducing its variability in large scale factors. On the other hand, there has been recent interest in using other statistical moments than the mean, i.e. variance, in the coarse-graining step of the MSE. Building on these trends, here we introduce the so-called refined composite multiscale fuzzy entropy based on the standard deviation (RCMFE\sigma) to quantify the dynamical properties of spread over multiple time scales. We demonstrate the dependency of the RCMFE\sigma, in comparison with other multiscale approaches, on several straightforward signal processing concepts using a set of synthetic signals. We also investigate the complementarity of using the standard deviation instead of the mean in the coarse-graining process using magnetoencephalograms in Alzheimer disease and publicly available electroencephalograms recorded from focal and non-focal areas in epilepsy. Our results indicate that RCMFE\sigma offers complementary information to that revealed by classical coarse-graining approaches and that it has superior performance to distinguish different types of physiological activity.
Many learning tasks, such as cross-validation, parameter search, or leave-one-out analysis, involve multiple instances of similar problems, each instance sharing a large part of learning data with the others. We introduce a robust framework for solving multiple square-root LASSO problems, based on a sketch of the learning data that uses low-rank approximations. Our approach allows a dramatic reduction in computational effort, in effect reducing the number of observations from $m$ (the number of observations to start with) to $k$ (the number of singular values retained in the low-rank model), while not sacrificing---sometimes even improving---the statistical performance. Theoretical analysis, as well as numerical experiments on both synthetic and real data, illustrate the efficiency of the method in large scale applications.
Feb 19 2014 cs.SE
Recently, the awareness of the importance of distributed software development has been growing in the software engineering community. Economic constraints, more and more outsourcing of development activities, and the increasing spatial distribution of companies come along with challenges of how to organize distributed development. In this article, we reason that a common process understanding is mandatory for successful distributed development. Integrated process planning, guidance and enactment are seen as enabling technologies to reach a unique process view. We sketch a synthesis of the software process modeling environment SPEARMINT and the XCHIPS system for web-based process support. Hereby, planners and developers are provided with collaborative planning and enactment support and advanced process guidance via electronic process guides (EPGs). We describe the usage of this integrated environment by using a case study for the development of a learning system.
Jan 16 2014 cs.AI
A weighted constraint satisfaction problem (WCSP) is a constraint satisfaction problem in which preferences among solutions can be expressed. Bucket elimination is a complete technique commonly used to solve this kind of constraint satisfaction problem. When the memory required to apply bucket elimination is too high, a heuristic method based on it (denominated mini-buckets) can be used to calculate bounds for the optimal solution. Nevertheless, the curse of dimensionality makes these techniques impractical on large scale problems. In response to this situation, we present a memetic algorithm for WCSPs in which bucket elimination is used as a mechanism for recombining solutions, providing the best possible child from the parental set. Subsequently, a multi-level model in which this exact/metaheuristic hybrid is further hybridized with branch-and-bound techniques and mini-buckets is studied. As a case study, we have applied these algorithms to the resolution of the maximum density still life problem, a hard constraint optimization problem based on Conways game of life. The resulting algorithm consistently finds optimal patterns for up to date solved instances in less time than current approaches. Moreover, it is shown that this proposal provides new best known solutions for very large instances.
Complex systems are usually represented as an intricate set of relations between their components forming a complex graph or network. The understanding of their functioning and emergent properties are strongly related to their structural properties. The finding of structural patterns is of utmost importance to reduce the problem of understanding the structure-function relationships. Here we propose the analysis of similarity measures between nodes using hierarchical clustering methods. The discrete nature of the networks usually leads to a small set of different similarity values, making standard hierarchical clustering algorithms ambiguous. We propose the use of "multidendrograms", an algorithm that computes agglomerative hierarchical clusterings implementing a variable-group technique that solves the non-uniqueness problem found in the standard pair-group algorithm. This problem arises when there are more than two clusters separated by the same maximum similarity (or minimum distance) during the agglomerative process. Forcing binary trees in this case means breaking ties in some way, thus giving rise to different output clusterings depending on the criterion used. Multidendrograms solves this problem grouping more than two clusters at the same time when ties occur.
Non-linear dimensionality reduction techniques such as manifold learning algorithms have become a common way for processing and analyzing high-dimensional patterns that often have attached a target that corresponds to the value of an unknown function. Their application to new points consists in two steps: first, embedding the new data point into the low dimensional space and then, estimating the function value on the test point from its neighbors in the embedded space. However, finding the low dimension representation of a test point, while easy for simple but often not powerful enough procedures such as PCA, can be much more complicated for methods that rely on some kind of eigenanalysis, such as Spectral Clustering (SC) or Diffusion Maps (DM). Similarly, when a target function is to be evaluated, averaging methods like nearest neighbors may give unstable results if the function is noisy. Thus, the smoothing of the target function with respect to the intrinsic, low-dimensional representation that describes the geometric structure of the examined data is a challenging task. In this paper we propose Auto-adaptive Laplacian Pyramids (ALP), an extension of the standard Laplacian Pyramids model that incorporates a modified LOOCV procedure that avoids the large cost of the standard one and offers the following advantages: (i) it selects automatically the optimal function resolution (stopping time) adapted to the data and its noise, (ii) it is easy to apply as it does not require parameterization, (iii) it does not overfit the training set and (iv) it adds no extra cost compared to other classical interpolation methods. We illustrate numerically ALP's behavior on a synthetic problem and apply it to the computation of the DM projection of new patterns and to the extension to them of target function values on a radiation forecasting problem over very high dimensional patterns.
The use of alternative measures to evaluate classifier performance is gaining attention, specially for imbalanced problems. However, the use of these measures in the classifier design process is still unsolved. In this work we propose a classifier designed specifically to optimize one of these alternative measures, namely, the so-called F-measure. Nevertheless, the technique is general, and it can be used to optimize other evaluation measures. An algorithm to train the novel classifier is proposed, and the numerical scheme is tested with several databases, showing the optimality and robustness of the presented classifier.
We show that the well-known Konig's Min-Max Theorem (KMM), a fundamental result in combinatorial matrix theory, can be proven in the first order theory $\LA$ with induction restricted to $\Sigma_1^B$ formulas. This is an improvement over the standard textbook proof of KMM which requires $\Pi_2^B$ induction, and hence does not yield feasible proofs --- while our new approach does. $\LA$ is a weak theory that essentially captures the ring properties of matrices; however, equipped with $\Sigma_1^B$ induction $\LA$ is capable of proving KMM, and a host of other combinatorial properties such as Menger's, Hall's and Dilworth's Theorems. Therefore, our result formalizes Min-Max type of reasoning within a feasible framework.
We study a network formation game where nodes wish to send traffic to other nodes. Nodes can contract bilaterally other nodes to form bidirectional links as well as nodes can break unilaterally contracts to eliminate the corresponding links. Our model is an extension of the model considered in Arcaute et al. The novelty is that we do no require the traffic to be uniform all-to-all. Each node specifies the amount of traffic that it wants to send to any other node. We characterize stable topologies under a static point of view and we also study the game under a myopic dynamics. We show its convergence to stable networks under some natural assumptions on the contracting functions. Finally we consider the efficiency of pairwise Nash topologies from a social point of view and we show that the problem of deciding the existence stable topologies of a given price is $\NP$-complete.
MultiDendrograms is a Java-written application that computes agglomerative hierarchical clusterings of data. Starting from a distances (or weights) matrix, MultiDendrograms is able to calculate its dendrograms using the most common agglomerative hierarchical clustering methods. The application implements a variable-group algorithm that solves the non-uniqueness problem found in the standard pair-group algorithm. This problem arises when two or more minimum distances between different clusters are equal during the agglomerative process, because then different output clusterings are possible depending on the criterion used to break ties between distances. MultiDendrograms solves this problem implementing a variable-group algorithm that groups more than two clusters at the same time when ties occur.
May 13 2011 cs.SE
Software is among the most complex endeavors of the human mind; large scale systems can have tens of millions of lines of source code. However, seldom is complexity measured above the lowest level of code, and sometimes source code files or low level modules. In this paper a hierarchical approach is explored in order to find a set of metrics that can measure higher levels of organization. These metrics are then used on a few popular free software packages (totaling more than 25 million lines of code) to check their efficiency and coherency.
Mar 30 2011 cs.RO
Current progresses in home automation and service robotic environment have highlighted the need to develop interoperability mechanisms that allow a standard communication between the two systems. During the development of the DHCompliant protocol, the problem of locating mobile devices in an indoor environment has been investigated. The communication of the device with the location service has been carried out to study the time delay that web services offer in front of the sockets. The importance of obtaining data from real-time location systems portends that a basic tool for interoperability, such as web services, can be ineffective in this scenario because of the delays added in the invocation of services. This paper is focused on introducing a web service to resolve a coordinates request without any significant delay in comparison with the sockets.
This paper presents a computational model for the cooperation of constraint domains and an implementation for a particular case of practical importance. The computational model supports declarative programming with lazy and possibly higher-order functions, predicates, and the cooperation of different constraint domains equipped with their respective solvers, relying on a so-called Constraint Functional Logic Programming (CFLP) scheme. The implementation has been developed on top of the CFLP system TOY, supporting the cooperation of the three domains H, R and FD, which supply equality and disequality constraints over symbolic terms, arithmetic constraints over the real numbers, and finite domain constraints over the integers, respectively. The computational model has been proved sound and complete w.r.t. the declarative semantics provided by the $CFLP$ scheme, while the implemented system has been tested with a set of benchmarks and shown to behave quite efficiently in comparison to the closest related approach we are aware of. To appear in Theory and Practice of Logic Programming (TPLP)
The maximum density still life problem (MDSLP) is a hard constraint optimization problem based on Conway's game of life. It is a prime example of weighted constrained optimization problem that has been recently tackled in the constraint-programming community. Bucket elimination (BE) is a complete technique commonly used to solve this kind of constraint satisfaction problem. When the memory required to apply BE is too high, a heuristic method based on it (denominated mini-buckets) can be used to calculate bounds for the optimal solution. Nevertheless, the curse of dimensionality makes these techniques unpractical for large size problems. In response to this situation, we present a memetic algorithm for the MDSLP in which BE is used as a mechanism for recombining solutions, providing the best possible child from the parental set. Subsequently, a multi-level model in which this exact/metaheuristic hybrid is further hybridized with branch-and-bound techniques and mini-buckets is studied. Extensive experimental results analyze the performance of these models and multi-parent recombination. The resulting algorithm consistently finds optimal patterns for up to date solved instances in less time than current approaches. Moreover, it is shown that this proposal provides new best known solutions for very large instances.
Dec 27 2007 cs.DC
Peer to peer (P2P) systems are moving from application specific architectures to a generic service oriented design philosophy. This raises interesting problems in connection with providing useful P2P middleware services capable of dealing with resource assignment and management in a large-scale, heterogeneous and unreliable environment. The slicing service, has been proposed to allow for an automatic partitioning of P2P networks into groups (slices) that represent a controllable amount of some resource and that are also relatively homogeneous with respect to that resource. In this paper we propose two gossip-based algorithms to solve the distributed slicing problem. The first algorithm speeds up an existing algorithm sorting a set of uniform random numbers. The second algorithm statistically approximates the rank of nodes in the ordering. The scalability, efficiency and resilience to dynamics of both algorithms rely on their gossip-based models. These algorithms are proved viable theoretically and experimentally.
Modular structure is ubiquitous in real-world complex networks, and its detection is important because it gives insights in the structure-functionality Modular structure is ubiquitous in real-world complex networks, and its detection is important because it gives insights in the structure-functionality relationship. The standard approach is based on the optimization of a quality function, modularity, which is a relative quality measure for a partition of a network into modules. Recently some authors [1,2] have pointed out that the optimization of modularity has a fundamental drawback: the existence of a resolution limit beyond which no modular structure can be detected even though these modules might have own entity. The reason is that several topological descriptions of the network coexist at different scales, which is, in general, a fingerprint of complex systems. Here we propose a method that allows for multiple resolution screening of the modular structure. The method has been validated using synthetic networks, discovering the predefined structures at all scales. Its application to two real social networks allows to find the exact splits reported in the literature, as well as the substructure beyond the actual split.
The ubiquity of modular structure in real-world complex networks is being the focus of attention in many trials to understand the interplay between network topology and functionality. The best approaches to the identification of modular structure are based on the optimization of a quality function known as modularity. However this optimization is a hard task provided that the computational complexity of the problem is in the NP-hard class. Here we propose an exact method for reducing the size of weighted (directed and undirected) complex networks while maintaining invariant its modularity. This size reduction allows the heuristic algorithms that optimize modularity for a better exploration of the modularity landscape. We compare the modularity obtained in several real complex-networks by using the Extremal Optimization algorithm, before and after the size reduction, showing the improvement obtained. We speculate that the proposed analytical size reduction could be extended to an exact coarse graining of the network in the scope of real-space renormalization.
Dec 07 2006 cs.DC
Peer to peer (P2P) systems are moving from application specific architectures to a generic service oriented design philosophy. This raises interesting problems in connection with providing useful P2P middleware services that are capable of dealing with resource assignment and management in a large-scale, heterogeneous and unreliable environment. One such service, the slicing service, has been proposed to allow for an automatic partitioning of P2P networks into groups (slices) that represent a controllable amount of some resource and that are also relatively homogeneous with respect to that resource, in the face of churn and other failures. In this report we propose two algorithms to solve the distributed slicing problem. The first algorithm improves upon an existing algorithm that is based on gossip-based sorting of a set of uniform random numbers. We speed up convergence via a heuristic for gossip peer selection. The second algorithm is based on a different approach: statistical approximation of the rank of nodes in the ordering. The scalability, efficiency and resilience to dynamics of both algorithms relies on their gossip-based models. We present theoretical and experimental results to prove the viability of these algorithms.
In agglomerative hierarchical clustering, pair-group methods suffer from a problem of non-uniqueness when two or more distances between different clusters coincide during the amalgamation process. The traditional approach for solving this drawback has been to take any arbitrary criterion in order to break ties between distances, which results in different hierarchical classifications depending on the criterion followed. In this article we propose a variable-group algorithm that consists in grouping more than two clusters at the same time when ties occur. We give a tree representation for the results of the algorithm, which we call a multidendrogram, as well as a generalization of the Lance and Williams' formula which enables the implementation of the algorithm in a recursive way.
Jan 17 2006 cs.PL
In this paper, we present our proposal to Constraint Functional Logic Programming over Finite Domains (CFLP(FD)) with a lazy functional logic programming language which seamlessly embodies finite domain (FD) constraints. This proposal increases the expressiveness and power of constraint logic programming over finite domains (CLP(FD)) by combining functional and relational notation, curried expressions, higher-order functions, patterns, partial applications, non-determinism, lazy evaluation, logical variables, types, domain variables, constraint composition, and finite domain constraints. We describe the syntax of the language, its type discipline, and its declarative and operational semantics. We also describe TOY(FD), an implementation for CFLPFD(FD), and a comparison of our approach with respect to CLP(FD) from a programming point of view, showing the new features we introduce. And, finally, we show a performance analysis which demonstrates that our implementation is competitive with respect to existing CLP(FD) systems and that clearly outperforms the closer approach to CFLP(FD).
Jan 04 2005 cs.NE
In this paper we apply a heuristic method based on artificial neural networks in order to trace out the efficient frontier associated to the portfolio selection problem. We consider a generalization of the standard Markowitz mean-variance model which includes cardinality and bounding constraints. These constraints ensure the investment in a given number of different assets and limit the amount of capital to be invested in each asset. We present some experimental results obtained with the neural network heuristic and we compare them to those obtained with three previous heuristic methods.
We study \em routing and \em scheduling in packet-switched networks. We assume an adversary that controls the injection time, source, and destination for each packet injected. A set of paths for these packets is \em admissible if no link in the network is overloaded. We present the first on-line routing algorithm that finds a set of admissible paths whenever this is feasible. Our algorithm calculates a path for each packet as soon as it is injected at its source using a simple shortest path computation. The length of a link reflects its current congestion. We also show how our algorithm can be implemented under today's Internet routing paradigms. When the paths are known (either given by the adversary or computed as above) our goal is to schedule the packets along the given paths so that the packets experience small end-to-end delays. The best previous delay bounds for deterministic and distributed scheduling protocols were exponential in the path length. In this paper we present the first deterministic and distributed scheduling protocol that guarantees a polynomial end-to-end delay for every packet. Finally, we discuss the effects of combining routing with scheduling. We first show that some unstable scheduling protocols remain unstable no matter how the paths are chosen. However, the freedom to choose paths can make a difference. For example, we show that a ring with parallel links is stable for all greedy scheduling protocols if paths are chosen intelligently, whereas this is not the case if the adversary specifies the paths.
Sep 25 2001 cs.PL
This paper focuses on the branching process for solving any constraint satisfaction problem (CSP). A parametrised schema is proposed that (with suitable instantiations of the parameters) can solve CSP's on both finite and infinite domains. The paper presents a formal specification of the schema and a statement of a number of interesting properties that, subject to certain conditions, are satisfied by any instances of the schema. It is also shown that the operational procedures of many constraint systems including cooperative systems) satisfy these conditions. Moreover, the schema is also used to solve the same CSP in different ways by means of different instantiations of its parameters.