Databases (cs.DB)

  • PDF
    Complex Event Recognition applications exhibit various types of uncertainty, ranging from incomplete and erroneous data streams to imperfect complex event patterns. We review Complex Event Recognition techniques that handle, to some extent, uncertainty. We examine techniques based on automata, probabilistic graphical models and first-order logic, which are the most common ones, and approaches based on Petri Nets and Grammars, which are less frequently used. A number of limitations are identified with respect to the employed languages, their probabilistic models and their performance, as compared to the purely deterministic cases. Based on those limitations, we highlight promising directions for future work.
  • PDF
    We consider the task of enumerating and counting answers to $k$-ary conjunctive queries against relational databases that may be updated by inserting or deleting tuples. We exhibit a new notion of q-hierarchical conjunctive queries and show that these can be maintained efficiently in the following sense. During a linear time preprocessing phase, we can build a data structure that enables constant delay enumeration of the query results; and when the database is updated, we can update the data structure and restart the enumeration phase within constant time. For the special case of self-join free conjunctive queries we obtain a dichotomy: if a query is not q-hierarchical, then query enumeration with sublinear$^\ast$ delay and sublinear update time (and arbitrary preprocessing time) is impossible. For answering Boolean conjunctive queries and for the more general problem of counting the number of solutions of k-ary queries we obtain complete dichotomies: if the query's homomorphic core is q-hierarchical, then size of the the query result can be computed in linear time and maintained with constant update time. Otherwise, the size of the query result cannot be maintained with sublinear update time. All our lower bounds rely on the OMv-conjecture, a conjecture on the hardness of online matrix-vector multiplication that has recently emerged in the field of fine-grained complexity to characterise the hardness of dynamic problems. The lower bound for the counting problem additionally relies on the orthogonal vectors conjecture, which in turn is implied by the strong exponential time hypothesis. $^\ast)$ By sublinear we mean $O(n^{1-\varepsilon})$ for some $\varepsilon>0$, where $n$ is the size of the active domain of the current database.
  • PDF
    Understanding the influence of a product is crucially important for making informed business decisions. This paper introduces a new type of skyline queries, called uncertain reverse skyline, for measuring the influence of a probabilistic product in uncertain data settings. More specifically, given a dataset of probabilistic products P and a set of customers C, an uncertain reverse skyline of a probabilistic product q retrieves all customers c in C which include q as one of their preferred products. We present efficient pruning ideas and techniques for processing the uncertain reverse skyline query of a probabilistic product using R-Tree data index. We also present an efficient parallel approach to compute the uncertain reverse skyline and influence score of a probabilistic product. Our approach significantly outperforms the baseline approach derived from the existing literature. The efficiency of our approach is demonstrated by conducting extensive experiments with both real and synthetic datasets.
  • PDF
    The Apriori algorithm that mines frequent itemsets is one of the most popular and widely used data mining algorithms. Now days many algorithms have been proposed on parallel and distributed platforms to enhance the performance of Apriori algorithm. They differ from each other on the basis of load balancing technique, memory system, data decomposition technique and data layout used to implement them. The problems with most of the distributed framework are overheads of managing distributed system and lack of high level parallel programming language. Also with grid computing there is always potential chances of node failures which cause multiple re-executions of tasks. These problems can be overcome by the MapReduce framework introduced by Google. MapReduce is an efficient, scalable and simplified programming model for large scale distributed data processing on a large cluster of commodity computers and also used in cloud computing. In this paper, we present the overview of parallel Apriori algorithm implemented on MapReduce framework. They are categorized on the basis of Map and Reduce functions used to implement them e.g. 1-phase vs. k-phase, I/O of Mapper, Combiner and Reducer, using functionality of Combiner inside Mapper etc. This survey discusses and analyzes the various implementations of Apriori on MapReduce framework on the basis of their distinguishing characteristics. Moreover, it also includes the advantages and limitations of MapReduce framework.