Biclustering techniques have been widely used to identify homogeneous subgroups within large data matrices, such as subsets of genes similarly expressed across subsets of patients. Mining a max-sum sub-matrix is a related but distinct problem for which one looks for a (non-necessarily contiguous) rectangular sub-matrix with a maximal sum of its entries. Le Van et al. (Ranked Tiling, 2014) already illustrated its applicability to gene expression analysis and addressed it with a constraint programming (CP) approach combined with large neighborhood search (CP-LNS). In this work, we exhibit some key properties of this NP-hard problem and define a bounding function such that larger problems can be solved in reasonable time. Two different algorithms are proposed in order to exploit the highlighted characteristics of the problem: a CP approach with a global constraint (CPGC) and mixed integer linear programming (MILP). Practical experiments conducted both on synthetic and real gene expression data exhibit the characteristics of these approaches and their relative benefits over the original CP-LNS method. Overall, the CPGC approach tends to be the fastest to produce a good solution. Yet, the MILP formulation is arguably the easiest to formulate and can also be competitive.
In Operation Research, practical evaluation is essential to validate the efficacy of optimization approaches. This paper promotes the usage of performance profiles as a standard practice to visualize and analyze experimental results. It introduces a Web tool to construct and export performance profiles as SVG or HTML files. In addition, the application relies on a methodology to estimate the benefit of hypothetical solver improvements. Therefore, the tool allows one to employ what-if analysis to screen possible research directions, and identify those having the best potential. The approach is showcased on two Operation Research technologies: Constraint Programming and Mixed Integer Linear Programming.
May 10 2016 cs.SE
In the railway domain, an interlocking is the system ensuring safe train traffic inside a station by controlling its active elements such as the signals or points. Modern interlockings are configured using particular data, called application data, reflecting the track layout and defining the actions that the interlocking can take. The safety of the train traffic relies thereby on application data correctness, errors inside them can cause safety issues such as derailments or collisions. Given the high level of safety required by such a system, its verification is a critical concern. In addition to the safety, an interlocking must also ensure that availability properties, stating that no train would be stopped forever in a station, are satisfied. Most of the research dealing with this verification relies on model checking. However, due to the state space explosion problem, this approach does not scale for large stations. More recently, a discrete event simulation approach limiting the verification to a set of likely scenarios, was proposed. The simulation enables the verification of larger stations, but with no proof that all the interesting scenarios are covered by the simulation. In this paper, we apply an intermediate statistical model checking approach, offering both the advantages of model checking and simulation. Even if exhaustiveness is not obtained, statistical model checking evaluates with a parametrizable confidence the reliability and the availability of the entire system.
Apr 25 2016 cs.AI
We consider the problem of selecting the best variable-value strategy for solving a given problem in constraint programming. We show that the recent Embarrassingly Parallel Search method (EPS) can be used for this purpose. EPS proposes to solve a problem by decomposing it in a lot of subproblems and to give them on-demand to workers which run in parallel. Our method uses a part of these subproblems as a simple sample as defined in statistics for comparing some strategies in order to select the most promising one that will be used for solving the remaining subproblems. For each subproblem of the sample, the parallelism helps us to control the running time of the strategies because it gives us the possibility to introduce timeouts by stopping a strategy when it requires more than twice the time of the best one. Thus, we can deal with the great disparity in solving times for the strategies. The selections we made are based on the Wilcoxon signed rank tests because no assumption has to be made on the distribution of the solving times and because these tests can deal with the censored data that we obtain after introducing timeouts. The experiments we performed on a set of classical benchmarks for satisfaction and optimization problems show that our method obtain good performance by selecting almost all the time the best variable-value strategy and by almost never choosing a variable-value strategy which is dramatically slower than the best one. Our method also outperforms the portfolio approach consisting in running some strategies in parallel and is competitive with the multi armed bandit framework.
Apr 25 2016 cs.AI
In this paper, we describe Compact-Table (CT), a bitwise algorithm to enforce Generalized Arc Consistency (GAC) on table con- straints. Although this algorithm is the default propagator for table constraints in or-tools and OscaR, two publicly available CP solvers, it has never been described so far. Importantly, CT has been recently improved further with the introduction of residues, resetting operations and a data-structure called reversible sparse bit-set, used to maintain tables of supports (following the idea of tabular reduction): tuples are invalidated incrementally on value removals by means of bit-set operations. The experimentation that we have conducted with OscaR shows that CT outperforms state-of-the-art algorithms STR2, STR3, GAC4R, MDD4R and AC5-TC on standard benchmarks.
The main advantage of Constraint Programming (CP) approaches for sequential pattern mining (SPM) is their modularity, which includes the ability to add new constraints (regular expressions, length restrictions, etc). The current best CP approach for SPM uses a global constraint (module) that computes the projected database and enforces the minimum frequency; it does this with a filtering algorithm similar to the PrefixSpan method. However, the resulting system is not as scalable as some of the most advanced mining systems like Zaki's cSPADE. We show how, using techniques from both data mining and CP, one can use a generic constraint solver and yet outperform existing specialized systems. This is mainly due to two improvements in the module that computes the projected frequencies: first, computing the projected database can be sped up by pre-computing the positions at which an symbol can become unsupported by a sequence, thereby avoiding to scan the full sequence each time; and second by taking inspiration from the trailing used in CP solvers to devise a backtracking-aware data structure that allows fast incremental storing and restoring of the projected database. Detailed experiments show how this approach outperforms existing CP as well as specialized systems for SPM, and that the gain in efficiency translates directly into increased efficiency for other settings such as mining with regular expressions.
In the railway domain, an interlocking is a computerised system that controls the railway signalling objects in order to allow a safe operation of the train traffic. Each interlocking makes use of particular data, called application data, that reflects the track layout of the station under control. The verification and validation of the application data are performed manually and is thus error-prone and costly. In this paper, we explain how we built an executable model in NuSMV of a railway interlocking based on the application data. We also detail the tool that we have developed in order to translate the application data into our model automatically. Finally we show how we could verify a realistic set of safety properties on a real-size station model by customizing the existing model-checking algorithm with PyNuSMV a Python library based on NuSMV.