Databases (cs.DB)

  • PDF
    Spreadsheet software is the tool of choice for interactive ad-hoc data management, with adoption by billions of users. Unlike database systems, however, spreadsheets do not scale. Database systems, while highly scalable, do not support interactivity as a first-class primitive. We are developing DataSpread to holistically integrate spreadsheets as a front-end interface with databases as a back-end datastore, providing scalability to spreadsheets and interactivity to databases, an integration we term presentational data management (PDM). In this paper, we take a first step towards this vision: developing a storage engine for PDM, studying how to flexibly represent spreadsheet data within a database and how to support and maintain access by position. We first conduct an extensive survey of spreadsheet use to motivate our functional requirements for a storage engine for PDM. We develop a natural set of mechanisms for flexibly representing spreadsheet data and demonstrate that identifying the optimal representation is NP-hard; however, we develop an efficient approach to identify the optimal representation from an important and intuitive subclass of representations. We extend our mechanisms with positional access mechanisms that do not suffer from cascading update issues, leading to constant-time access and modification performance. We evaluate these representations on a workload of typical spreadsheets and spreadsheet operations, achieving up to a 20% reduction in storage and up to a 50% reduction in formula evaluation time.
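To make the cascading-update problem concrete, here is a minimal sketch. It assumes a design in which stored rows are keyed by stable identifiers and a separate ordered index maps display positions to those identifiers. The class `PositionalSheet` and its methods are hypothetical names chosen for illustration; this is not the DataSpread storage engine, and the plain Python list used for the index does not give constant-time insertion.

```python
from itertools import count


class PositionalSheet:
    """Illustrative only: rows are stored under stable ids, and a separate
    ordered index maps 1-based display positions to those ids."""

    def __init__(self):
        self._next_id = count(1)
        self._order = []   # stable row ids, in display order
        self._rows = {}    # stable row id -> list of cell values

    def insert_row(self, position, values):
        """Insert a row so that it appears at the given 1-based position."""
        if not 1 <= position <= len(self._order) + 1:
            raise IndexError("position out of range")
        row_id = next(self._next_id)
        self._rows[row_id] = list(values)
        # Only the positional index shifts; no stored row is re-keyed.
        self._order.insert(position - 1, row_id)
        return row_id

    def get_row(self, position):
        """Return the cell values of the row at the given 1-based position."""
        if not 1 <= position <= len(self._order):
            raise IndexError("position out of range")
        return self._rows[self._order[position - 1]]

    def delete_row(self, position):
        """Delete the row at the given 1-based position."""
        if not 1 <= position <= len(self._order):
            raise IndexError("position out of range")
        row_id = self._order.pop(position - 1)
        del self._rows[row_id]


sheet = PositionalSheet()
id_a = sheet.insert_row(1, ["a"])
id_c = sheet.insert_row(2, ["c"])
id_b = sheet.insert_row(2, ["b"])  # pushes "c" to position 3
```

If the display position were used directly as the storage key, inserting a row near the top would force every later row to be re-keyed. Separating the two confines that shift to the index. A tree-structured index with subtree counts is one common way to make the shift itself cheaper.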
  • PDF
    Outsourcing data to the cloud has become popular thanks to the pay-as-you-go paradigm. However, this practice raises privacy concerns. The conventional way to achieve data privacy is to encrypt sensitive data before outsourcing. When data are encrypted, a trade-off must be made between security and efficient query processing. Existing solutions that adopt multiple encryption schemes induce a heavy overhead in terms of data storage and query performance, and are not suited to cloud data warehouses. In this paper, we propose an efficient additive encryption scheme (S4) based on Shamir's secret sharing for securing data warehouses in the cloud. S4 addresses the shortcomings of existing approaches by reducing overhead while still enforcing good data privacy. Experimental results show the efficiency of S4 in terms of computation and storage overhead with respect to existing solutions.
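The additive property that makes Shamir's secret sharing attractive for aggregation queries can be sketched as follows. This is textbook Shamir sharing over a prime field, not the S4 scheme itself, whose details the abstract does not give. The function names, the field modulus `PRIME`, and the sample values are assumptions made for illustration.

```python
import secrets

PRIME = 2**61 - 1  # a Mersenne prime, chosen here as the field modulus


def make_shares(secret, threshold, num_shares):
    """Split `secret` into `num_shares` points on a random polynomial of
    degree `threshold - 1` whose constant term is the secret."""
    coeffs = [secret % PRIME]
    coeffs += [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, num_shares + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation mod PRIME
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def reconstruct(shares):
    """Recover the secret by Lagrange interpolation at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


def add_shares(shares_a, shares_b):
    """Add two sharings point-wise; the result is a sharing of the sum."""
    summed = []
    for (xa, ya), (xb, yb) in zip(shares_a, shares_b):
        assert xa == xb, "shares must be aligned on the same x coordinate"
        summed.append((xa, (ya + yb) % PRIME))
    return summed


sales_q1 = make_shares(1200, threshold=3, num_shares=5)
sales_q2 = make_shares(345, threshold=3, num_shares=5)
total = reconstruct(add_shares(sales_q1, sales_q2)[1:4])  # any 3 shares suffice
```

Because each share holder can add its shares locally, a SUM over shared values can be computed without any party seeing the individual values. Only the final reconstruction reveals the total.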
  • PDF
    Reasoning over semantically annotated data is an emerging trend in stream processing, aiming to produce sound and complete answers to a set of continuous queries. It usually requires a trade-off between data throughput and the cost of expressive inference. Strider-lsa proposes such a trade-off by combining a scalable RDF stream processing engine with an efficient reasoning system. The main reasoning tasks are based on a query rewriting approach for SPARQL that benefits from an intelligent encoding of RDFS+ (RDFS + owl:sameAs) ontology elements. Strider-lsa runs in production at a major international water management company to detect anomalies in sensor streams. The system is evaluated along different dimensions and over multiple datasets to demonstrate its performance.
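One generic way to handle owl:sameAs at query time is to map every individual to a canonical representative of its equivalence class and rewrite query terms the same way. The sketch below uses a union-find structure for this. It is an assumption chosen for illustration, not the encoding used by Strider-lsa, which the abstract does not describe. `SameAsIndex`, `match_subject`, and the `ex:` identifiers are hypothetical.

```python
class SameAsIndex:
    """Union-find over terms: sameAs-equivalent terms share one root."""

    def __init__(self):
        self._parent = {}

    def find(self, term):
        """Return the canonical representative of `term`."""
        self._parent.setdefault(term, term)
        root = term
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every visited term directly at the root.
        while self._parent[term] != root:
            next_term = self._parent[term]
            self._parent[term] = root
            term = next_term
        return root

    def add_same_as(self, a, b):
        """Record that `a` owl:sameAs `b` by merging their classes."""
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            # Keep the lexicographically smaller root for determinism.
            if root_b < root_a:
                root_a, root_b = root_b, root_a
            self._parent[root_b] = root_a


def match_subject(stream, subject, index):
    """Return the triples whose subject is sameAs-equivalent to `subject`."""
    target = index.find(subject)
    return [t for t in stream if index.find(t[0]) == target]


index = SameAsIndex()
index.add_same_as("ex:sensor1", "ex:s-001")

stream = [
    ("ex:sensor1", "ex:pressure", 3.2),
    ("ex:s-001", "ex:pressure", 3.4),
    ("ex:sensor2", "ex:pressure", 2.9),
]
matches = match_subject(stream, "ex:s-001", index)
```

Comparing canonical representatives avoids materialising every sameAs-implied triple in the stream, which is the usual motivation for rewriting the query rather than expanding the data.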