results for au:Ahmed_A in:cs

- Apr 18 2018 cs.CV arXiv:1804.06094v1 We show that unsupervised training of latent capsule layers using only the reconstruction loss, without masking to select the correct output class, causes a loss of equivariance and other desirable capsule qualities. This implies that supervised capsule networks cannot be very deep. Unsupervised sparsening of latent capsule-layer activity both restores these qualities and appears to generalize better than supervised masking, while potentially enabling deeper capsule networks. We train a sparse, unsupervised capsule network of similar geometry to Sabour et al. (2017) on MNIST, and then test classification accuracy on affNIST using an SVM layer. Accuracy is improved from the 79% benchmark to 90%.
- Feb 13 2018 cs.CV arXiv:1802.04073v2 This paper proposes a new framework to regularize the ill-posed and non-linear blind image deconvolution problem using deep generative priors. We employ two separate deep generative models: one trained to produce sharp images and the other to generate blur kernels from lower-dimensional parameters. The regularized problem is efficiently solved by a simple alternating gradient descent algorithm operating in the latent lower-dimensional space of each generative model. We empirically show that excellent image deblurring results are achieved even under extravagantly large blurs and heavy noise. Our proposed method is in stark contrast to conventional end-to-end approaches, where a deep neural network is trained on blurred inputs and the corresponding sharp output images while completely ignoring the knowledge of the underlying forward map (the convolution operator) in image blurring.
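The alternating descent over the two latent spaces can be sketched in a few lines. This is a toy stand-in rather than the paper's method: the deep generative models are replaced by random linear maps (`G_x` and `G_k` are assumptions of this sketch), the "image" is one-dimensional, and the gradients are numerical rather than analytic.

```python
import numpy as np

rng = np.random.default_rng(0)
L_sig, K_dim, N_dim = 16, 3, 2              # signal length and latent dimensions

# Hypothetical "generators": random linear maps standing in for trained
# deep generative models (an assumption of this sketch).
G_x = rng.standard_normal((L_sig, K_dim))   # latent -> "sharp image" (1-D here)
G_k = rng.standard_normal((L_sig, N_dim))   # latent -> blur kernel

def circ_conv(x, k):
    # Circular convolution via the FFT: the forward map of image blurring
    return np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(k)))

z_x_true = rng.standard_normal(K_dim)
z_k_true = rng.standard_normal(N_dim)
y = circ_conv(G_x @ z_x_true, G_k @ z_k_true)   # observed blurred signal

def loss(z_x, z_k):
    return np.sum((circ_conv(G_x @ z_x, G_k @ z_k) - y) ** 2)

def num_grad(f, z, eps=1e-6):
    # Central-difference gradient; a real implementation would differentiate
    # through the generators analytically or by autodiff.
    g = np.zeros_like(z)
    for i in range(z.size):
        e = np.zeros_like(z); e[i] = eps
        g[i] = (f(z + e) - f(z - e)) / (2 * eps)
    return g

def descent_step(f, z, lr=1.0):
    # One gradient step with backtracking so the loss does not increase
    g = num_grad(f, z)
    while f(z - lr * g) >= f(z) and lr > 1e-12:
        lr *= 0.5
    return z - lr * g

# Alternate between the two latent spaces
z_x = rng.standard_normal(K_dim)
z_k = rng.standard_normal(N_dim)
losses = [loss(z_x, z_k)]
for _ in range(100):
    z_x = descent_step(lambda z: loss(z, z_k), z_x)
    z_k = descent_step(lambda z: loss(z_x, z), z_k)
    losses.append(loss(z_x, z_k))
```

With both generators linear, the loss is a smooth bilinear-squared objective in the two latent codes, so the alternating scheme steadily drives the data-fit term down.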
- Dec 01 2017 cs.LG arXiv:1711.11179v1 Long Short-Term Memory (LSTM) is one of the most powerful sequence models. Despite its strong performance, however, it lacks the interpretability of state space models. In this paper, we present a way to combine the best of both worlds by introducing State Space LSTM (SSL) models that generalize the earlier work of Zaheer et al. (2017) on combining topic models with LSTMs. However, unlike that work, we do not make any factorization assumptions in our inference algorithm. We present an efficient sampler based on the sequential Monte Carlo (SMC) method that draws from the joint posterior directly. Experimental results confirm the superiority and stability of this SMC inference algorithm on a variety of domains.
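The SMC idea of drawing from the joint posterior can be illustrated with a bootstrap particle filter on a toy linear-Gaussian state space model. This is a generic stand-in for the paper's sampler (the SSL model's LSTM and topic-model components are omitted, and the model parameters below are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
T, P = 50, 500           # time steps, particles
a, q, r = 0.9, 0.5, 1.0  # transition coefficient, process and observation noise stds

# Simulate a toy linear-Gaussian state space model
x = np.zeros(T)
for t in range(1, T):
    x[t] = a * x[t - 1] + q * rng.standard_normal()
y = x + r * rng.standard_normal(T)           # noisy observations

# Bootstrap particle filter (sequential Monte Carlo)
particles = rng.standard_normal(P)
est = np.zeros(T)
for t in range(T):
    if t > 0:
        particles = a * particles + q * rng.standard_normal(P)  # propagate
    logw = -0.5 * ((y[t] - particles) / r) ** 2                 # Gaussian log-likelihood
    w = np.exp(logw - logw.max())
    w /= w.sum()
    est[t] = np.sum(w * particles)                              # posterior-mean estimate
    idx = rng.choice(P, size=P, p=w)                            # multinomial resampling
    particles = particles[idx]

rmse_pf = np.sqrt(np.mean((est - x) ** 2))
rmse_obs = np.sqrt(np.mean((y - x) ** 2))
```

Because the filter fuses the transition prior with each observation, its posterior-mean estimate tracks the latent state more closely than the raw observations do.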
- Nov 15 2017 cs.PL arXiv:1711.04559v1 Software developers compose systems from components written in many different languages. A business-logic component may be written in Java or OCaml, a resource-intensive component in C or Rust, and a high-assurance component in Coq. In this multi-language world, program execution sends values from one linguistic context to another. This boundary-crossing exposes values to contexts with unforeseen behavior---that is, behavior that could not arise in the source language of the value. For example, a Rust function may end up being applied in an ML context that violates the memory usage policy enforced by Rust's type system. This leads to the question of how developers ought to reason about code in such a multi-language world where behavior inexpressible in one language is easily realized in another. This paper proposes the novel idea of linking types to address the problem of reasoning about single-language components in a multi-lingual setting. Specifically, linking types allow programmers to annotate where in a program they can link with components inexpressible in their unadulterated language. This enables developers to reason about (behavioral) equality using only their own language and the annotations, even though their code may be linked with code written in a language with more expressive power.
- Nov 13 2017 cs.PL arXiv:1711.03871v1 We present FunTAL, the first multi-language system to formalize safe interoperability between a high-level functional language and low-level assembly code while supporting compositional reasoning about the mix. A central challenge in developing such a multi-language system is bridging the gap between assembly, which is staged into jumps to continuations, and high-level code, where subterms return a result. We present a compositional stack-based typed assembly language that supports components, comprised of one or more basic blocks, that may be embedded in high-level contexts. We also present a logical relation for FunTAL that supports reasoning about equivalence of high-level components and their assembly replacements, mixed-language programs with callbacks between languages, and assembly components comprised of different numbers of basic blocks.
- Nov 09 2017 cs.PL arXiv:1711.03050v2 High-performance dynamic language implementations make heavy use of speculative optimizations to achieve speeds close to statically compiled languages. These optimizations are typically performed by a just-in-time compiler that generates code under a set of assumptions about the state of the program and its environment. In certain cases, a program may execute code compiled under assumptions that are no longer valid. The implementation must then deoptimize the program on-the-fly; this entails finding semantically equivalent code that does not rely on invalid assumptions, translating program state to that expected by the target code, and transferring control. This paper looks at the interaction between optimization and deoptimization, and shows that reasoning about speculation is surprisingly easy when assumptions are made explicit in the program representation. This insight is demonstrated on a compiler intermediate representation, named sourir, modeled after the high-level representation for a dynamic language. Traditional compiler optimizations such as constant folding, dead code elimination, and function inlining are shown to be correct in the presence of assumptions. Furthermore, the paper establishes the correctness of compiler transformations specific to deoptimization: namely unrestricted deoptimization, predicate hoisting, and assume composition.
- Sep 18 2017 cs.CV arXiv:1709.05311v1 Millions of surveillance cameras operate 24x7, generating huge amounts of visual data for processing. However, retrieval of important activities from such a large volume of data can be time consuming. Thus, researchers are working on solutions to present hours of visual data in a compressed but meaningful way. Video synopsis is one way to represent activities using relatively shorter-duration clips. So far, two main approaches have been used by researchers to address this problem, namely synopsis by tracking moving objects and synopsis by clustering moving objects. Synopsis outputs mainly depend on tracking, segmenting, and shifting moving objects temporally as well as spatially. In many situations tracking fails, producing multiple trajectories of the same object. Due to this, the object may appear and disappear multiple times within the same synopsis output, which is misleading. This also leads to discontinuity and can confuse the viewer of the synopsis. In this paper, we present a new approach for generating a compressed video synopsis by grouping tracklets of moving objects. Grouping helps to generate a synopsis where chronologically related objects appear together with a meaningful spatio-temporal relation. Our proposed method produces continuous and less confusing synopses when tested on both publicly available and in-house dataset videos.
- Aug 29 2017 cs.CV arXiv:1708.08141v1 Object recognition has recently become a crucial part of machine learning and computer vision. The current approach to object recognition involves deep learning and uses Convolutional Neural Networks (CNNs) to learn the pixel patterns of objects implicitly through backpropagation. However, CNNs require thousands of examples in order to generalize successfully and often require heavy computing resources for training. This is rather sluggish compared to the human ability to generalize and learn new categories from just a single example. Additionally, the learned representations of CNNs are difficult to modify programmatically or interpret intuitively. We propose a computational model that can successfully learn an object category from as few as one example and allows its learning style to be tailored explicitly to a scenario. Our model decomposes each image into two attributes: shape and color distribution. We then use a Bayesian criterion to probabilistically determine the likelihood of each category, taking each factor into account based on its importance and calculating the conditional probability of the object belonging to each learned category. Because the only requirement our model imposes is the ability to retrieve and construct individual attributes such as shape and color, it is applicable to essentially any class of visual objects, and can also be implemented in broader settings, such as Natural Language Processing, where individual attributes can be retrieved and constructed.
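The attribute-combination step can be sketched as a weighted naive-Bayes-style score over one-shot prototypes. The categories, attribute values, and likelihood constants below are illustrative assumptions of this sketch, not the paper's learned quantities:

```python
# One training example per category: (shape, dominant color) attribute pairs.
# These prototypes are hypothetical stand-ins for the paper's extracted features.
train = {"apple": ("round", "red"),
         "banana": ("elongated", "yellow"),
         "lime": ("round", "green")}

def likelihood(attr_val, proto_val, match=0.8, mismatch=0.1):
    # Simple per-attribute likelihood: high when the attribute matches
    # the category's one-shot prototype, low otherwise.
    return match if attr_val == proto_val else mismatch

def classify(shape, color, weights=(0.5, 0.5)):
    # Posterior over categories: uniform prior times a weighted product of
    # attribute likelihoods; the weights encode attribute importance.
    scores = {}
    for cat, (p_shape, p_color) in train.items():
        scores[cat] = (likelihood(shape, p_shape) ** weights[0]
                       * likelihood(color, p_color) ** weights[1])
    z = sum(scores.values())
    return {c: s / z for c, s in scores.items()}

post = classify("round", "red")
```

Raising an attribute's weight makes the decision lean more heavily on that attribute, which is how the learning style can be "tailored explicitly to a scenario."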
- Jul 18 2017 cs.PL arXiv:1707.04984v3Instead of a monolithic programming language trying to cover all features of interest, some programming systems are designed by combining together simpler languages that cooperate to cover the same feature space. This can improve usability by making each part simpler than the whole, but there is a risk of abstraction leaks from one language to another that would break expectations of the users familiar with only one or some of the involved languages. We propose a formal specification for what it means for a given language in a multi-language system to be usable without leaks: it should embed into the multi-language in a fully abstract way, that is, its contextual equivalence should be unchanged in the larger system. To demonstrate our proposed design principle and formal specification criterion, we design a multi-language programming system that combines an ML-like statically typed functional language and another language with linear types and linear state. Our goal is to cover a good part of the expressiveness of languages that mix functional programming and linear state (ownership), at only a fraction of the complexity. We prove that the embedding of ML into the multi-language system is fully abstract: functional programmers should not fear abstraction leaks. We show examples of combined programs demonstrating in-place memory updates and safe resource handling, and an implementation extending OCaml with our linear language.
- Jun 14 2017 cs.CV arXiv:1706.03867v3 Plant aliveness is usually demonstrated through laboratory experiments with special scientific instruments. In this paper, we aim to detect the degree of animation of plants by magnifying the small color changes in a plant's green leaves using Eulerian video magnification. Capturing the video under a controlled environment, e.g., using a tripod and direct current (DC) light sources, reduces camera movement and minimizes light fluctuations; we aim to reduce external factors as much as possible. The acquired video is then stabilized, and a proposed algorithm is used to reduce the illumination variations. Lastly, Eulerian magnification is utilized to magnify the color changes in the light-invariant video. The proposed system does not require any special-purpose instruments, as it uses a digital camera with a regular frame rate. The results of magnified color changes on both natural and plastic leaves show that live green leaves exhibit color changes, in contrast to plastic leaves. Hence, we can argue that the color changes of the leaves are due to biological processes such as photosynthesis. To date, this is possibly the first work that focuses on visually interpreting some biological processes of plants without any special-purpose instruments.
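The core Eulerian step, temporally bandpassing each pixel's intensity and adding back an amplified copy of the band, can be sketched on a synthetic one-pixel time series. The frequencies, amplitudes, and magnification factor below are illustrative assumptions, not values from the paper:

```python
import numpy as np

fs = 30.0                          # frame rate (Hz)
t = np.arange(0, 10, 1 / fs)
# Synthetic pixel intensity: a slow illumination drift (0.1 Hz) plus a tiny
# periodic "physiological" color change at 1 Hz with amplitude 0.01.
signal = 0.2 * np.sin(2 * np.pi * 0.1 * t) + 0.01 * np.sin(2 * np.pi * 1.0 * t)

def temporal_bandpass(x, fs, lo, hi):
    # Ideal temporal bandpass filter applied in the frequency domain
    X = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(x.size, d=1 / fs)
    X[(freqs < lo) | (freqs > hi)] = 0
    return np.fft.irfft(X, n=x.size)

alpha = 20.0                       # magnification factor
band = temporal_bandpass(signal, fs, 0.5, 2.0)   # isolate the subtle change
magnified = signal + alpha * band                 # Eulerian magnification
```

The bandpass rejects the large slow drift and keeps only the 0.01-amplitude oscillation, which the magnification step then amplifies roughly 20-fold, making an invisible change visible.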
- Mar 07 2017 cs.CG arXiv:1703.01544v1 In an $\mathsf{L}$-embedding of a graph, each vertex is represented by an $\mathsf{L}$-segment, and two segments intersect each other if and only if the corresponding vertices are adjacent in the graph. If the corner of each $\mathsf{L}$-segment in an $\mathsf{L}$-embedding lies on a straight line, we call it a monotone $\mathsf{L}$-embedding. In this paper we give a full characterization of monotone $\mathsf{L}$-embeddings by introducing a new class of graphs which we call "non-jumping" graphs. We show that a graph admits a monotone $\mathsf{L}$-embedding if and only if the graph is a non-jumping graph. Further, we show that outerplanar graphs, convex bipartite graphs, interval graphs, 3-leaf power graphs, and complete graphs are subclasses of non-jumping graphs. Finally, we show that distance-hereditary graphs and $k$-leaf power graphs ($k\le 4$) admit $\mathsf{L}$-embeddings.
- We consider the bilinear inverse problem of recovering two vectors, $x$ and $w$, in $\mathbb{R}^L$ from their entrywise product. For the case where the vectors have known signs and belong to known subspaces, we introduce the convex program BranchHull, which is posed in the natural parameter space and does not require an approximate solution or initialization in order to be stated or solved. Under the structural assumption that $x$ and $w$ are members of known $K$- and $N$-dimensional random subspaces, we prove that BranchHull recovers $x$ and $w$ up to the inherent scaling ambiguity with high probability whenever $L \gtrsim K+N$. This program is motivated by applications in blind deconvolution and self-calibration.
- Jan 24 2017 cs.CY arXiv:1701.06429v1 Effective monitoring and management of environmental pollution is key to the development of modern metropolitan cities. To sustain and cope with the exponential growth of highly industrialized cities, expert decision making is essential in this process. A good governance system must be supported by an actively participating population. In participatory sensing, individuals and groups engage actively in data collection and help the city governance make proper decisions. In this paper, we propose a participatory-sensing-based three-tier framework to fight environmental pollution in urban areas of Bangladesh. The framework includes an Android application named `My City, My Environment', a server for storage and computation, and a web server for the authority to monitor and maintain environmental issues through expert decision making. We have developed a prototype system, deployed it at a small scale, and demonstrated the effectiveness of the framework.
- This paper considers recovering $L$-dimensional vectors $\boldsymbol{w}$, and $\boldsymbol{x}_1,\boldsymbol{x}_2, \ldots, \boldsymbol{x}_N$ from their circular convolutions $\boldsymbol{y}_n = \boldsymbol{w}*\boldsymbol{x}_n, \ n = 1,2,3, \ldots, N$. The vector $\boldsymbol{w}$ is assumed to be $S$-sparse in a known basis that is spread out in the Fourier domain, and each input $\boldsymbol{x}_n$ is a member of a known $K$-dimensional random subspace. We prove that whenever $K + S\log^2 S \lesssim L/\log^4(LN)$, the problem can be solved effectively using nuclear-norm minimization as the convex relaxation, as long as the inputs are sufficiently diverse and obey $N \gtrsim \log^2(LN)$. By "diverse inputs", we mean that the $\boldsymbol{x}_n$'s belong to different, generic subspaces. To our knowledge, this is the first theoretical result on blind deconvolution where the subspace to which $\boldsymbol{w}$ belongs is not fixed, but needs to be determined. We discuss the result in the context of multipath channel estimation in wireless communications, where both the fading coefficients and the delays in the channel impulse response $\boldsymbol{w}$ are unknown. The encoder codes the $K$-dimensional message vectors randomly and then transmits the coded messages $\boldsymbol{x}_n$ over a fixed channel one after the other. The decoder then discovers all of the messages and the channel response when the number of samples taken for each received message is roughly greater than $(K+S\log^2 S)\log^4(LN)$, and the number of messages is roughly at least $\log^2(LN)$.
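The role of nuclear-norm minimization as a convex relaxation can be illustrated on a generic low-rank recovery toy problem, solved by proximal gradient descent with singular value thresholding. This is a simplified stand-in: it omits the sparsity structure, the convolutional measurements, and the unknown subspace that are central to the paper; the dimensions and step sizes are assumptions of the sketch:

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 8, 56                                  # matrix size, number of measurements
M = np.outer(rng.standard_normal(n), rng.standard_normal(n))  # rank-1 target

# Random Gaussian measurement operator: y = A vec(M)
A = rng.standard_normal((m, n * n)) / np.sqrt(m)
y = A @ M.ravel()

def svt(X, tau):
    # Singular value thresholding: the proximal operator of the nuclear norm
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - tau, 0)) @ Vt

# Proximal gradient descent on 0.5*||A vec(X) - y||^2 + lam*||X||_*
X = np.zeros((n, n))
eta, lam = 0.2, 0.05
for _ in range(500):
    grad = (A.T @ (A @ X.ravel() - y)).reshape(n, n)
    X = svt(X - eta * grad, eta * lam)

rel_err = np.linalg.norm(X - M) / np.linalg.norm(M)
```

Even though only 56 linear measurements of the 64-dimensional matrix are observed, the nuclear-norm penalty steers the iterates toward the low-rank solution, which is the mechanism the theorem above makes precise for the blind-deconvolution setting.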
- Orthogonal Frequency Division Multiplexing (OFDM) has gained considerable popularity over the years and has been adopted as a standard in cellular and Wireless Local Area Network (WLAN) communication systems. To improve bit error rate (BER) performance, forward error correction (FEC) codes are often utilized to protect signals against unknown interference and channel degradations. In this paper, we apply soft-decision FEC, specifically polar codes and a convolutional code, to an OFDM system in a quasi-static multipath fading channel, and compare BER performance in various channels. We also investigate the effect of interleaving bits within a polar codeword. Simulation results for each case are presented in the paper.
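The effect of interleaving bits within a codeword can be sketched with a simple block interleaver (the FEC encoding and OFDM modulation themselves are omitted, and the interleaver shape is an assumption of this sketch). The point is that a burst of adjacent channel errors becomes well-separated errors after deinterleaving, which soft-decision decoders handle far better:

```python
import numpy as np

def interleave(bits, rows, cols):
    # Block interleaver: write row-wise, read column-wise
    assert len(bits) == rows * cols
    return np.asarray(bits).reshape(rows, cols).T.ravel()

def deinterleave(bits, rows, cols):
    # Inverse of the block interleaver above
    return np.asarray(bits).reshape(cols, rows).T.ravel()

rng = np.random.default_rng(3)
codeword = rng.integers(0, 2, 24)       # stand-in for an encoded codeword
tx = interleave(codeword, 4, 6)

# A burst error in the channel flips four adjacent transmitted bits...
rx = tx.copy()
rx[5:9] ^= 1

# ...but after deinterleaving, the flipped positions are spread out
# across the codeword instead of clustered together.
received_codeword = deinterleave(rx, 4, 6)
err_pos = np.flatnonzero(received_codeword != codeword)
```

With a 4x6 interleaver, bits that were adjacent on the channel are at least `rows` positions apart in the codeword, so the burst is dispersed.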
- This paper investigates conditions under which certain kinds of systems of bilinear equations have a unique structured solution. In particular, we look at when we can recover vectors $\boldsymbol{w},\boldsymbol{q}$ from observations of the form \[ y_\ell = \langle\boldsymbol{w},\boldsymbol{b}_\ell\rangle\langle\boldsymbol{c}_\ell,\boldsymbol{q}\rangle, \quad \ell = 1,\ldots,L, \] where $\boldsymbol{b}_\ell,\boldsymbol{c}_\ell$ are known. We show that if $\boldsymbol{w}\in\mathbb{C}^{M_1}$ and $\boldsymbol{q}\in\mathbb{C}^{M_2}$ are sparse, with no more than $K$ and $N$ nonzero entries, respectively, and the $\boldsymbol{b}_\ell,\boldsymbol{c}_\ell$ are generic, selected as independent Gaussian random vectors, then $\boldsymbol{w},\boldsymbol{q}$ are uniquely determined from \[ L \geq \mathrm{Const}\cdot(K+N)\log^5(M_1 M_2) \] such equations with high probability. The key ingredient in our analysis is a uniform probabilistic bound on how far a random process of the form \[ Z(\boldsymbol{X}) = \sum_{\ell=1}^L |\boldsymbol{b}_\ell^*\boldsymbol{X}\boldsymbol{c}_\ell|^2 \] deviates from its mean over a set of structured matrices $\boldsymbol{X}\in\mathcal{X}$. As both $\boldsymbol{b}_\ell$ and $\boldsymbol{c}_\ell$ are random, this is a specialized type of 4th-order chaos; we refer to $Z(\boldsymbol{X})$ as an empirical chaos process. Bounding this process yields a set of general conditions for when the map $\boldsymbol{X}\rightarrow \{\boldsymbol{b}_\ell^*\boldsymbol{X}\boldsymbol{c}_\ell\}_{\ell=1}^L$ is a restricted isometry over the set of matrices $\mathcal{X}$. The conditions are stated in terms of general geometric properties of the set $\mathcal{X}$, and are explicitly computed for the case where $\mathcal{X}$ is the set of matrices that are simultaneously sparse and low rank.
- Understanding a user's motivations provides valuable information beyond the ability to recommend items. Quite often this can be accomplished by perusing both ratings and review texts, since it is in the latter that the reasoning for specific preferences is explicitly expressed. Unfortunately, matrix factorization approaches to recommendation result in large, complex models that are difficult to interpret and give recommendations that are hard to explain clearly to users. In contrast, in this paper we attack this problem through succinct additive co-clustering. We devise a novel Bayesian technique for summing co-clusterings of Poisson distributions, and with it we propose a new Bayesian model for joint collaborative filtering of ratings and text reviews through a sum of simple co-clusterings. The simple structure of our model yields easily interpretable recommendations. Even with a simple, succinct structure, our model outperforms competitors in predicting ratings with reviews.
- Latent variable models have attracted considerable interest from industry and academia for their versatility in a wide range of applications. A large amount of effort has been made to develop systems that scale these models up, in the hope of applying them to industry-scale data. In this paper, we describe a system that operates at a scale orders of magnitude higher than previous work, and an order of magnitude faster than the state-of-the-art system at the same scale, while showing more robustness and more accurate results. Our system uses a number of advances in distributed inference: high-performance synchronization of sufficient statistics with a relaxed consistency model; fast sampling, using the Metropolis-Hastings-Walker method to handle dense generative models; statistical modeling, moving beyond Latent Dirichlet Allocation (LDA) to Pitman-Yor distributions (PDP) and Hierarchical Dirichlet Process (HDP) models; and sophisticated parameter projection schemes to resolve conflicts between parameters arising from the relaxed consistency model. This work significantly extends the domain of applicability of what is commonly known as the Parameter Server. We obtain results with up to hundreds of billions of tokens, thousands of topics, and a vocabulary of a few million token types, using up to 60,000 processor cores operating on a production cluster of a large Internet company. This demonstrates the feasibility of scaling to problems orders of magnitude larger than any previously published work.
- We propose several sampling architectures for the efficient acquisition of an ensemble of correlated signals. We show that without prior knowledge of the correlation structure, each of our architectures (under different sets of assumptions) can acquire the ensemble at a sub-Nyquist rate. Prior to sampling, the analog signals are diversified using simple, implementable components. The diversification is achieved by injecting types of "structured randomness" into the ensemble, the result of which is subsampled. For reconstruction, the ensemble is modeled as a low-rank matrix that we have observed through an (underdetermined) set of linear equations. Our main results show that this matrix can be recovered using standard convex programming techniques when the total number of samples is on the order of the intrinsic degrees of freedom of the ensemble --- the more heavily correlated the ensemble, the fewer samples are needed. To motivate this study, we discuss how such ensembles arise in the context of array processing.
- Matrix completion and approximation are popular tools to capture a user's preferences for recommendation and to approximate missing data. Instead of using low-rank factorization we take a drastically different approach, based on the simple insight that an additive model of co-clusterings allows one to approximate matrices efficiently. This allows us to build a concise model that, per bit of model learned, significantly beats all factorization approaches to matrix approximation. Even more surprisingly, we find that summing over small co-clusterings is more effective in modeling matrices than classic co-clustering, which uses just one large partitioning of the matrix. Following Occam's razor principle suggests that the simple structure induced by our model better captures the latent preferences and decision making processes present in the real world than classic co-clustering or matrix factorization. We provide an iterative minimization algorithm, a collapsed Gibbs sampler, theoretical guarantees for matrix approximation, and excellent empirical evidence for the efficacy of our approach. We achieve state-of-the-art results on the Netflix problem with a fraction of the model complexity.
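The additive-co-clustering insight can be sketched as follows: each simple co-clustering (a block-constant matrix) is fitted to the residual left by the previous ones, and the fits are summed. This is a plain least-squares sketch, not the paper's Bayesian model or collapsed Gibbs sampler; the matrix sizes and cluster counts are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)
n_rows, n_cols, k = 30, 30, 2        # matrix size, clusters per side

# Toy "ratings" matrix: the sum of two planted co-clusterings plus noise
def planted():
    r = rng.integers(0, k, n_rows)
    c = rng.integers(0, k, n_cols)
    vals = rng.standard_normal((k, k))
    return vals[r][:, c]

M = planted() + planted() + 0.1 * rng.standard_normal((n_rows, n_cols))

def fit_coclustering(R, k, iters=10):
    # Fit one simple co-clustering of the residual R: alternate between
    # computing block means and reassigning rows/columns to clusters.
    rows = rng.integers(0, k, R.shape[0])
    cols = rng.integers(0, k, R.shape[1])
    for _ in range(iters):
        B = np.zeros((k, k))
        for a in range(k):
            for b in range(k):
                block = R[np.ix_(rows == a, cols == b)]
                B[a, b] = block.mean() if block.size else 0.0
        # Reassign each row/column to the cluster minimizing squared error
        rows = np.argmin([((R - B[a][cols]) ** 2).sum(1) for a in range(k)], 0)
        cols = np.argmin([((R - B[rows, a][:, None]) ** 2).sum(0) for a in range(k)], 0)
    return B[rows][:, cols]

# Additive model: each co-clustering fits what the previous ones left over
approx = np.zeros_like(M)
errs = [np.linalg.norm(M)]
for _ in range(4):
    approx = approx + fit_coclustering(M - approx, k)
    errs.append(np.linalg.norm(M - approx))
```

Each added co-clustering contributes only `k*k` block values plus two assignment vectors, which is the sense in which the model is concise "per bit of model learned."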
- Dec 05 2014 cs.CV arXiv:1412.1506v1 Mass abnormality segmentation is a vital step in the medical diagnostic process and is attracting increasing interest from many research groups. Currently, most work in this area has used the Gray Level Co-occurrence Matrix (GLCM) as texture features within a region-based approach. These features are computed in a phase preceding the segmentation stage or are used as inputs to a classification stage. The work discussed in this paper experiments with the GLCM method under a contour-based approach. In addition, we evaluate the proposed approach on various tissue densities to obtain more significant results. To this end, we explored some challenging breast images from the BIRADS medical database. Our first experiments showed promising results relative to edge-based mass segmentation methods. This paper first discusses the main work achieved in this area. Sections 2 and 3 present materials and our methodology. The main results are shown and evaluated before concluding the paper.
- The pedagogy of teaching and learning has changed with the proliferation of communication technology, and it is necessary to develop interactive learning materials for children that may improve their learning, grasping, and memorizing capabilities. Perhaps one of the most important innovations of the technology age is multimedia and its applications. It is imperative to create a high-quality and realistic learning environment for children, and interactive learning materials can make their first learning easier to understand and engage with. We developed interactive learning materials in the form of a video for playgroups using multimedia application tools. This study investigated the impact on students' abilities to acquire new knowledge or skills through interactive learning materials. We visited one kindergarten (nursery school) and interviewed class teachers about their teaching methods and the level of students' ability to recognize English alphabets, pictures, etc. The course teachers were provided with interactive learning materials to show to their playgroups over a number of sessions. The video included English alphabets with related words and pictures, and motivational fun. We noticed that almost all children were very interested in interacting with their learning video. The students were assessed individually and asked to recognize the alphabets and pictures. The students adapted to their first alphabets very quickly; however, there were individual differences in their cognitive development. This interactive multimedia can be an alternative to traditional pedagogy for teaching playgroups.
- Teachers have traditionally taught their students using textbooks along with verbal instructions. However, teaching and learning methods can change with developments in Information and Communication Technology. It is time to familiarize students with interactive learning systems so that they can improve their learning, grasping, and memorizing capabilities. It is indispensable to create a high-quality and realistic learning environment for students, and visual learning can be easier to understand and engage with. We developed visual learning materials in the form of videos for students at the primary level using different multimedia application tools. The objective of this paper is to examine the impact on students' abilities to acquire new knowledge or skills through visual learning materials and through blended learning, that is, the integration of visual learning materials with teachers' instructions. We visited a primary school in Dhaka city for this study and conducted teaching with three different groups of students: (i) a teacher taught students by the traditional system on the same materials and marked the level of students' ability to adapt using a set of questions; (ii) another group was taught with only visual learning material, and assessment was done with 15 questionnaires; (iii) the third group was taught with the video of the solar system combined with teachers' instructions and assessed with the same questionnaires. This integration of visual materials with verbal instructions is a blended approach to learning. The interactive blended approach greatly promoted students' ability to acquire knowledge and skills. Students' responses and perceptions were much more positive towards the blended technique than the other two methods. This interactive blended learning system may be an appropriate method, especially for school children.
- Provenance for database queries or scientific workflows is often motivated as providing explanation, increasing understanding of the underlying data sources and processes used to compute the query, and reproducibility, the capability to recompute the results on different inputs, possibly specialized to a part of the output. Many provenance systems claim to provide such capabilities; however, most lack formal definitions or guarantees of these properties, while others provide formal guarantees only for relatively limited classes of changes. Building on recent work on provenance traces and slicing for functional programming languages, we introduce a detailed tracing model of provenance for multiset-valued Nested Relational Calculus, define trace slicing algorithms that extract subtraces needed to explain or recompute specific parts of the output, and define query slicing and differencing techniques that support explanation. We state and prove correctness properties for these techniques and present a proof-of-concept implementation in Haskell.
- Evolution of cooperation is a widely studied problem in biology, social science, economics, and artificial intelligence. Most existing approaches that explain cooperation rely on some notion of direct or indirect reciprocity. These reciprocity-based models assume agents recognize their partner and know their previous interactions, which requires advanced cognitive abilities. In this paper we are interested in developing a model that produces cooperation without requiring any explicit memory of previous game plays. Our model is based on the notion of inequity aversion, a concept introduced within behavioral economics, whereby individuals care about payoff equality in outcomes. Here we explore the effect of using income inequality to guide partner selection and interaction. We study our model by considering both well-mixed and spatially structured populations and present the conditions under which cooperation becomes dominant. Our results support the hypothesis that inequity aversion promotes cooperative relationships among non-kin.
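The inequity-aversion idea can be sketched with a Fehr-Schmidt-style utility. The abstract does not fix a functional form, so the payoffs and parameters below are illustrative assumptions, not the paper's model:

```python
# Fehr-Schmidt-style inequity-averse utility (illustrative parameters):
# material payoff, penalized for disadvantageous inequality (alpha, "envy")
# and for advantageous inequality (beta, "guilt"), with beta <= alpha.
def inequity_averse_utility(own, other, alpha=1.0, beta=0.5):
    return (own
            - alpha * max(other - own, 0)   # envy: the other earned more
            - beta * max(own - other, 0))   # guilt: I earned more

# Prisoner's dilemma payoffs: R (mutual cooperation), T (temptation),
# S (sucker's payoff), P (mutual defection)
R, T, S, P = 3, 5, 0, 1
u_coop = inequity_averse_utility(R, R)    # both cooperate
u_defect = inequity_averse_utility(T, S)  # defect against a cooperator
```

With these parameters the guilt term makes mutual cooperation (utility 3) preferable to exploiting a cooperator (utility 2.5), illustrating how caring about payoff equality can sustain cooperation without any memory of past play.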
- Jan 13 2014 cs.DB arXiv:1401.2250v1 Fast and efficient data management is one of today's most demanded technologies. This paper proposes a system that automates the present manual procedures for storing and retrieving the information of huge numbers of citizens of Bangladesh and increases their effectiveness. The implemented search methodology is user friendly and efficient enough for high-speed data retrieval, tolerating spelling errors in the input keywords used for searching for a particular citizen. The main concern in this research is minimizing the total search time for a given keyword. This can be done by pre-establishing an idea of which data belong to the search keyword. The primary and secondary key-codes generated by the Double Metaphone algorithm for each word are used to establish that idea about the word. This algorithm is used to create a map of the original database, through which the keyword is matched against the data.
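The map-based phonetic lookup can be sketched as follows. A simplified Soundex stands in for Double Metaphone here (whose full rule set is too long to reproduce, and which additionally produces a secondary key); the names and records are hypothetical:

```python
def soundex(word):
    # Simplified Soundex: keep the first letter, encode the remaining
    # consonants as digits, collapse repeats, and pad to four characters.
    # A stand-in for the more accurate Double Metaphone key.
    codes = {**dict.fromkeys("bfpv", "1"), **dict.fromkeys("cgjkqsxz", "2"),
             **dict.fromkeys("dt", "3"), "l": "4",
             **dict.fromkeys("mn", "5"), "r": "6"}
    word = word.lower()
    out, prev = word[0].upper(), codes.get(word[0], "")
    for ch in word[1:]:
        code = codes.get(ch, "")
        if code and code != prev:
            out += code
        prev = code
    return (out + "000")[:4]

# Build a phonetic index (the "map" of the original database) once...
citizens = {"Rahim": {"id": 101}, "Karim": {"id": 102}, "Rahman": {"id": 103}}
index = {}
for name in citizens:
    index.setdefault(soundex(name), []).append(name)

# ...then a misspelled query still lands in the right bucket in O(1) time.
def search(query):
    return [citizens[n] for n in index.get(soundex(query), [])]
```

Because "Rahim" and the misspelling "Raheem" share the same phonetic key, the lookup succeeds despite the spelling error, without scanning the whole database.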
- Due to the rapid growth of broadband internet access and the proliferation of modern mobile devices, various types of multimedia (e.g. text, images, audio, and video) have become ubiquitously available. Mobile device users usually store and use multimedia content based on their personal interests and preferences. However, mobile device challenges such as storage limitations have introduced the problem of mobile multimedia overload for users. In order to tackle this problem, researchers have developed various techniques that recommend multimedia for mobile users. In this survey paper, we examine the importance of mobile multimedia recommendation systems from the perspective of three smart communities, namely mobile social learning, mobile event guides, and context-aware services. A careful analysis of existing research reveals that the implementation of proactive, sensor-based, and hybrid recommender systems can improve mobile multimedia recommendations. Nevertheless, there are still challenges and open issues, such as the incorporation of context and social properties, which need to be tackled in order to generate accurate and trustworthy mobile multimedia recommendations.
- Oct 24 2013 cs.PL arXiv:1310.6299v2Provenance is an increasing concern due to the ongoing revolution in sharing and processing scientific data on the Web and in other computer systems. It is proposed that many computer systems will need to become provenance-aware in order to provide satisfactory accountability, reproducibility, and trust for scientific or other high-value data. To date, there is not a consensus concerning appropriate formal models or security properties for provenance. In previous work, we introduced a formal framework for provenance security and proposed formal definitions of properties called disclosure and obfuscation. In this article, we study refined notions of positive and negative disclosure and obfuscation in a concrete setting, that of a general-purpose programming language. Previous models of provenance have focused on special-purpose languages such as workflows and database queries. We consider a higher-order, functional language with sums, products, and recursive types and functions, and equip it with a tracing semantics in which traces themselves can be replayed as computations. We present an annotation-propagation framework that supports many provenance views over traces, including standard forms of provenance studied previously. We investigate some relationships among provenance views and develop some partial solutions to the disclosure and obfuscation problems, including correct algorithms for disclosure and positive obfuscation based on trace slicing.
- We present a general architecture for the acquisition of ensembles of correlated signals. The signals are multiplexed onto a single line by mixing each one against a different code and then adding them together, and the resulting signal is sampled at a high rate. We show that if the $M$ signals, each bandlimited to $W/2$ Hz, can be approximated by a superposition of $R < M$ underlying signals, then the ensemble can be recovered by sampling at a rate within a logarithmic factor of $RW$ (as compared to the Nyquist rate of $MW$). This sampling theorem shows that the correlation structure of the signal ensemble can be exploited in the acquisition process even though it is unknown a priori. The reconstruction of the ensemble is recast as a low-rank matrix recovery problem from linear measurements. The architectures we are considering impose a certain type of structure on the linear operators. Although our results depend on the mixing forms being random, this imposed structure results in a very different type of random projection than those analyzed in the low-rank recovery literature to date.
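The acquisition architecture described above, mixing each signal against its own random code and summing onto a single line, can be sketched in a few lines of numpy; the dimensions and rank below are illustrative choices, not taken from the paper:

```python
# Sketch of the acquisition architecture: each of M signals is mixed
# against its own random +/-1 code sequence, the products are summed
# onto a single line, and that sum is what gets sampled.
import numpy as np

rng = np.random.default_rng(0)
M, n = 4, 256                      # number of signals, samples per signal

# A correlated ensemble: R underlying signals mixed into M observed ones,
# so the stacked ensemble has rank R < M.
R = 2
underlying = rng.standard_normal((R, n))
signals = rng.standard_normal((M, R)) @ underlying   # rank-R ensemble

codes = rng.choice([-1.0, 1.0], size=(M, n))         # one chipping code per signal
multiplexed = np.sum(codes * signals, axis=0)        # single-line samples

# The measurement is linear in the ensemble: doubling the signals doubles it.
assert np.allclose(np.sum(codes * (2 * signals), axis=0), 2 * multiplexed)
print(multiplexed.shape)  # (256,)
```

The reconstruction side, recovering the rank-R ensemble from these linear measurements, is the low-rank matrix recovery problem the abstract refers to and is not shown here.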
- Classical deterministic simulations of epidemiological processes, such as those based on System Dynamics, produce a single result from a fixed set of input parameters, with no variance between simulations. Input parameters are subsequently modified in these simulations using Monte-Carlo methods to understand how changes in the input parameters affect the spread of results. Agent-based simulations, by contrast, produce different results on each run based on the local interactions of the underlying agents, without any changes to the input parameters. In this paper we compare the influence and effect of variation within these two distinct simulation paradigms and show that an agent-based simulation of the epidemiological SIR (Susceptible, Infectious, and Recovered) model is more effective at capturing the natural variation within SIR than an equivalent model using System Dynamics with Monte-Carlo simulation. To demonstrate this effect, the SIR model is implemented using both System Dynamics (with Monte-Carlo simulation) and agent-based modelling based on previously published empirical data.
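The run-to-run variation that distinguishes the agent-based paradigm can be seen in a minimal agent-based SIR sketch; the population size, rates, and contact rule below are illustrative assumptions, not the published models:

```python
# Minimal agent-based SIR sketch: each run draws its own random contacts
# and recoveries, so repeated runs vary even with identical inputs, unlike
# a deterministic System Dynamics run. Parameters are illustrative only.
import random

def run_sir(n_agents=500, n_infected=5, beta=0.3, gamma=0.1, steps=100, seed=None):
    rng = random.Random(seed)
    # One state per agent: 'S', 'I', or 'R'
    state = ['I'] * n_infected + ['S'] * (n_agents - n_infected)
    for _ in range(steps):
        new_state = state[:]
        for i, s in enumerate(state):
            if s == 'S':
                # contact one random agent; infection with probability beta
                if state[rng.randrange(n_agents)] == 'I' and rng.random() < beta:
                    new_state[i] = 'I'
            elif s == 'I' and rng.random() < gamma:
                new_state[i] = 'R'   # recovery with probability gamma
        state = new_state
    return state.count('R')  # final number recovered

# Identical input parameters, different random outcomes per run.
sizes = [run_sir(seed=s) for s in range(5)]
print(sizes)
```

The spread of `sizes` across runs is exactly the natural variation the abstract argues a deterministic model with Monte-Carlo parameter perturbation does not capture in the same way.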
- We consider the problem of recovering two unknown vectors, $w$ and $x$, of length $L$ from their circular convolution. We make the structural assumption that the two vectors are members of known subspaces, one with dimension $N$ and the other with dimension $K$. Although the observed convolution is nonlinear in both $w$ and $x$, it is linear in the rank-1 matrix formed by their outer product $wx^*$. This observation allows us to recast the deconvolution problem as a low-rank matrix recovery problem from linear measurements, whose natural convex relaxation is a nuclear norm minimization program. We prove the effectiveness of this relaxation by showing that for "generic" signals, the program can deconvolve $w$ and $x$ exactly when the maximum of $N$ and $K$ is almost on the order of $L$. That is, we show that if $x$ is drawn from a random subspace of dimension $N$, and $w$ is a vector in a subspace of dimension $K$ whose basis vectors are "spread out" in the frequency domain, then nuclear norm minimization recovers $wx^*$ without error. We discuss this result in the context of blind channel estimation in communications. If we have a message of length $N$ which we code using a random $L\times N$ coding matrix, and the encoded message travels through an unknown linear time-invariant channel of maximum length $K$, then the receiver can recover both the channel response and the message when $L \gtrsim N+K$, to within constant and log factors.
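The key observation, that the convolution is linear in the outer product of the two unknowns, can be checked numerically (the length here is an illustrative choice):

```python
# Numerical check that circular convolution, nonlinear in (w, x) jointly,
# is linear in their outer product: in the Fourier domain, each coefficient
# of the convolution is one diagonal entry of fft(w) fft(x)^T.
import numpy as np

rng = np.random.default_rng(1)
L = 32
w, x = rng.standard_normal(L), rng.standard_normal(L)

# Circular convolution via the FFT
y = np.fft.ifft(np.fft.fft(w) * np.fft.fft(x)).real

# The same quantity written as a linear function of the outer product:
# read the diagonal of the rank-1 matrix fft(w) fft(x)^T.
outer = np.outer(np.fft.fft(w), np.fft.fft(x))
y_from_outer = np.fft.ifft(np.diag(outer)).real

assert np.allclose(y, y_from_outer)
```

The full method then relaxes "find a rank-1 matrix consistent with these linear measurements" to nuclear norm minimization, which requires a convex solver and is not shown here.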
- Congestion games model a wide variety of real-world resource congestion problems, such as selfish network routing, traffic route guidance in congested areas, taxi fleet optimization, and crowd movement in busy areas. However, existing research on congestion games assumes: (a) deterministic movement of agents between resources; and (b) perfect rationality (i.e. maximizing their own expected value) of all agents. Such assumptions are not reasonable in dynamic domains where decision support has to be provided to humans. For instance, in optimizing the performance of a taxi fleet serving a city, the movement of taxis can be involuntary or nondeterministic (decided by the specific customer who hires the taxi) and, more importantly, taxi drivers may not follow the advice provided by the decision support system (due to the bounded rationality of humans). To that end, we contribute: (a) a general framework for representing congestion games under uncertainty for populations with assorted notions of rationality; (b) a scalable approach for solving the decision problem for perfectly rational agents in the mix with boundedly rational agents; and (c) a detailed evaluation on synthetic and real-world data sets illustrating the usefulness of our new approach with respect to key social welfare metrics in the context of an assorted human-agent population. An interesting result from our experiments on a real-world taxi fleet optimization problem is that it is better (in terms of revenue and operational efficiency) for taxi drivers to follow perfectly rational strategies irrespective of the percentage of drivers not following the advice.
- We describe Hokusai, a real-time system that captures frequency information for streams of arbitrary sequences of symbols. The algorithm uses the Count-Min sketch as its basis and exploits the fact that sketching is linear. It provides real-time statistics of arbitrary events, e.g. streams of queries as a function of time. We use a factorizing approximation to provide point estimates at arbitrary (time, item) combinations. Queries can be answered in constant time.
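The linearity that Hokusai exploits can be illustrated with a minimal Count-Min sketch; the width, depth, and hash construction below are assumptions for illustration, not Hokusai's actual implementation:

```python
# Minimal Count-Min sketch. Because sketching is linear, the sketch of a
# sum of streams is the sum of their sketches, so per-interval sketches
# can simply be added to aggregate counts over time.
import hashlib

class CountMin:
    def __init__(self, width=1024, depth=4):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _cells(self, item):
        # One independent-ish hash per row, derived from a keyed blake2b
        for row in range(self.depth):
            h = hashlib.blake2b(f"{row}:{item}".encode(), digest_size=8)
            yield row, int.from_bytes(h.digest(), "big") % self.width

    def add(self, item, count=1):
        for row, col in self._cells(item):
            self.table[row][col] += count

    def query(self, item):
        # Min over rows: an overestimate of the true count, never an underestimate
        return min(self.table[row][col] for row, col in self._cells(item))

    def merge(self, other):
        # Linearity: add the tables cell-wise to sketch the combined stream
        for r in range(self.depth):
            for c in range(self.width):
                self.table[r][c] += other.table[r][c]

hour1, hour2 = CountMin(), CountMin()
hour1.add("query:weather", 3)
hour2.add("query:weather", 2)
hour1.merge(hour2)
print(hour1.query("query:weather"))  # 5
```

Both `add` and `query` touch a fixed number of cells, which is why point queries run in constant time.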
- Sep 25 2012 cs.OH arXiv:1209.5420v1A fully controlled digital home has long been considered a luxury of the rich because of the excessive cost of installing such a system. It is now within the reach of ordinary people, with many inexpensive features. In this paper we design and develop a very low-cost, efficient, and reliable Digital Home System (DHS); a fully controlled digital home is no longer a luxury, and our proposed system makes it affordable. The DHS combines several automated services, namely an electronic device controller, an IR security system, a web desktop, a remote video surveillance system, and a virtual mobile, which together let us control a home without the old manual processes; physical presence at home becomes optional. The system provides modern luxury and security features: lights, fans, air conditioning, or any electronic device can be controlled by voice command, Bluetooth, GPRS, or a website. GPRS connectivity is added so the system can be controlled remotely, and the remote video surveillance system lets us monitor the home from afar by streaming live video to a mobile device. Moreover, the web desktop, built specially for this purpose, gives access to the home PC for necessary tasks from any internet-enabled computer in the world. Furthermore, entry by an unauthorized person triggers an SMS notification, stores an image of the person, and sounds a voice alarm, ensuring the security of our valuables; we can also identify and monitor the location of valuable assets, e.g. precious metals, remotely. Finally, the virtual mobile application is a universal mobile driver through which tasks such as remote calls, phone book access, and SMS read-write can be performed on our mobile device from the computer.
- Jul 03 2012 cs.DB arXiv:1207.0136v1Recommender systems based on latent factor models have been used effectively for understanding user interests and predicting future actions. Such models work by projecting users and items into a smaller dimensional space, thereby clustering similar users and items together, and subsequently computing similarity between unknown user-item pairs. When user-item interactions are sparse (the sparsity problem) or when new items continuously appear (the cold-start problem), these models perform poorly. In this paper, we exploit the combination of taxonomies and latent factor models to mitigate these issues and improve recommendation accuracy. We observe that a taxonomy provides structure similar to that of a latent factor model: namely, it imposes human-labeled categories (clusters) over items. This leads to our proposed taxonomy-aware latent factor model (TF), which combines taxonomies and latent factors using additive models. We develop efficient algorithms to train the TF models, which scale to large numbers of users/items, and develop scalable inference/recommendation algorithms by exploiting the structure of the taxonomy. In addition, we extend the TF model to account for the temporal dynamics of user interests using high-order Markov chains. To deal with large-scale data, we develop a parallel multi-core implementation of our TF model. We empirically evaluate the TF model on the task of predicting user purchases using a real-world shopping dataset spanning more than a million users and products. Our experiments demonstrate the benefits of our TF models over existing approaches, in terms of both prediction accuracy and running time.
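The additive idea, item factors plus category factors, can be sketched with made-up (untrained) vectors; the names, dimensions, and scoring function are illustrative assumptions, not the paper's TF training procedure:

```python
# Sketch of the additive combination behind a taxonomy-aware latent factor
# model: an item's effective factor is its own latent vector plus its
# taxonomy category's latent vector, so sparse or brand-new items borrow
# strength from their category. All factors here are illustrative, not trained.
import numpy as np

rng = np.random.default_rng(2)
d = 8                                    # latent dimension
user = rng.standard_normal(d)
category = {"electronics": rng.standard_normal(d)}
item = {"new_camera": np.zeros(d)}       # cold-start item: no learned factor yet

def score(user_vec, item_name, cat_name):
    # Additive model: user affinity for (item factor + category factor)
    return user_vec @ (item[item_name] + category[cat_name])

# A cold-start item still gets a meaningful score via its category factor.
s = score(user, "new_camera", "electronics")
assert np.isclose(s, user @ category["electronics"])
```

As purchase data for `new_camera` accumulates, its own factor would move away from zero and refine the category-level prediction.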
- Jun 05 2012 cs.DL arXiv:1206.0570v1The eXtensible Markup Language (XML) can be used as a data exchange format in different domains. It allows different parties to exchange data by providing a common understanding of the basic concepts in the domain. XML covers the syntactic level but lacks support for reasoning. An ontology can provide a semantic representation of domain knowledge that offers expressive power and supports efficient reasoning. One of the most popular ontology languages is the Web Ontology Language (OWL). It can represent domain knowledge using classes, properties, axioms, and instances for use in a distributed environment such as the World Wide Web. This paper presents a new method for the automatic generation of OWL ontologies from XML data sources.
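One plausible mapping rule, sketched here as an assumption about the general approach rather than the paper's exact method: each XML element name becomes an OWL class, with element nesting noted alongside. The sample document and emitted OWL fragments are invented for illustration:

```python
# Toy XML -> OWL sketch (an assumed mapping, not the paper's algorithm):
# every element tag yields an owl:Class declaration, and nesting is
# recorded as a comment where a fuller mapping would emit properties.
import xml.etree.ElementTree as ET

xml_data = "<library><book><title>Dune</title></book></library>"

def xml_to_owl_classes(xml_text):
    root = ET.fromstring(xml_text)
    lines = []
    def walk(elem, parent=None):
        lines.append(f'<owl:Class rdf:ID="{elem.tag.capitalize()}"/>')
        if parent is not None:
            lines.append(
                f'<!-- {elem.tag.capitalize()} nested under {parent.capitalize()} -->')
        for child in elem:
            walk(child, elem.tag)
    walk(root)
    return lines

for line in xml_to_owl_classes(xml_data):
    print(line)
```

A real generator would also map attributes to datatype properties and nesting to object properties; this sketch only shows the class-extraction step.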
- Topic models have proven to be a useful tool for discovering latent structures in document collections. However, most document collections come as temporal streams, and thus several aspects of the latent structure, such as the number of topics and the topics' distributions and popularity, are time-evolving. Several models exist that capture the evolution of some, but not all, of these aspects. In this paper we introduce infinite dynamic topic models, iDTM, which can accommodate the evolution of all the aforementioned aspects. Our model assumes that documents are organized into epochs, where the documents within each epoch are exchangeable but the order between documents is maintained across epochs. iDTM allows an unbounded number of topics: topics can die or be born at any epoch, and the representation of each topic can evolve according to Markovian dynamics. We use iDTM to analyze the birth and evolution of topics in the NIPS community and evaluate the efficacy of our model on both simulated and real datasets with favorable outcomes.
- Appel and McAllester's "step-indexed" logical relations have proven to be a simple and effective technique for reasoning about programs in languages with semantically interesting types, such as general recursive types and general reference types. However, proofs using step-indexed models typically involve tedious, error-prone, and proof-obscuring step-index arithmetic, so it is important to develop clean, high-level, equational proof principles that avoid mention of step indices. In this paper, we show how to reason about binary step-indexed logical relations in an abstract and elegant way. Specifically, we define a logic LSLR, which is inspired by Plotkin and Abadi's logic for parametricity, but also supports recursively defined relations by means of the modal "later" operator from Appel, Melliès, Richards, and Vouillon's "very modal model" paper. We encode in LSLR a logical relation for reasoning relationally about programs in call-by-value System F extended with general recursive types. Using this logical relation, we derive a set of useful rules with which we can prove contextual equivalence and approximation results without counting steps.
- Provenance is information about the origin, derivation, ownership, or history of an object. It has recently been studied extensively in scientific databases and other settings due to its importance in helping scientists judge data validity, quality and integrity. However, most models of provenance have been stated as ad hoc definitions motivated by informal concepts such as "comes from", "influences", "produces", or "depends on". These models lack clear formalizations describing in what sense the definitions capture these intuitive concepts. This makes it difficult to compare approaches, evaluate their effectiveness, or argue about their validity. We introduce provenance traces, a general form of provenance for the nested relational calculus (NRC), a core database query language. Provenance traces can be thought of as concrete data structures representing the operational semantics derivation of a computation; they are related to the traces that have been used in self-adjusting computation, but differ in important respects. We define a tracing operational semantics for NRC queries that produces both an ordinary result and a trace of the execution. We show that three pre-existing forms of provenance for the NRC can be extracted from provenance traces. Moreover, traces satisfy two semantic guarantees: consistency, meaning that the traces describe what actually happened during execution, and fidelity, meaning that the traces "explain" how the expression would behave if the input were changed. These guarantees are much stronger than those contemplated for previous approaches to provenance; thus, provenance traces provide a general semantic foundation for comparing and unifying models of provenance in databases.
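The idea of an evaluation that returns both an ordinary result and a trace of what happened can be sketched for a toy arithmetic language; this is a drastic simplification of the NRC setting in the abstract, with an invented expression encoding:

```python
# Toy tracing evaluator: evaluation yields the ordinary result AND a trace
# recording each operation actually performed, in the spirit of a tracing
# operational semantics. Expressions are nested tuples:
#   ('lit', n) | ('add', e1, e2) | ('mul', e1, e2)

def eval_trace(expr, trace):
    op = expr[0]
    if op == 'lit':
        return expr[1]
    v1 = eval_trace(expr[1], trace)
    v2 = eval_trace(expr[2], trace)
    result = v1 + v2 if op == 'add' else v1 * v2
    trace.append((op, v1, v2, result))   # record what actually happened
    return result

trace = []
value = eval_trace(('add', ('lit', 2), ('mul', ('lit', 3), ('lit', 4))), trace)
print(value)   # 14
print(trace)   # [('mul', 3, 4, 12), ('add', 2, 12, 14)]
```

The trace is "consistent" in the abstract's sense, it records exactly the steps taken, and a fuller system would also support replaying it under changed inputs.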
- Provenance is information recording the source, derivation, or history of some data. Provenance tracking has been studied in a variety of settings; however, although many design points have been explored, the mathematical or semantic foundations of data provenance have received comparatively little attention. In this paper, we argue that dependency analysis techniques familiar from program analysis and program slicing provide a formal foundation for forms of provenance that are intended to show how (part of) the output of a query depends on (parts of) its input. We introduce a semantic characterization of such dependency provenance, show that this form of provenance is not computable, and provide dynamic and static approximation techniques.
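The dynamic approximation idea, propagating input labels through operations so each output carries an over-approximation of what it depends on, can be sketched as follows (labels and operations are invented for illustration):

```python
# Toy dynamic dependency-provenance sketch: every value carries the set of
# input labels it was computed from, and each operation propagates the
# union of its arguments' label sets, over-approximating "depends on".

class Tagged:
    def __init__(self, value, labels):
        self.value, self.labels = value, frozenset(labels)

    def __add__(self, other):
        return Tagged(self.value + other.value, self.labels | other.labels)

    def __mul__(self, other):
        return Tagged(self.value * other.value, self.labels | other.labels)

a = Tagged(3, {"input:a"})
b = Tagged(4, {"input:b"})
c = Tagged(10, {"input:c"})

out = a * b + c
print(out.value)            # 22
print(sorted(out.labels))   # ['input:a', 'input:b', 'input:c']
```

Since exact dependency provenance is not computable, union propagation like this is sound but conservative: it may report dependencies the output does not truly have, but never misses one.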