results for au:Zhou_W in:cs

- Feb 12 2018 cs.CR arXiv:1802.03110v1The future of Internet of Things (IoT) is already upon us. IoT applications have been widely used in many field of social production and social living such as healthcare, energy and industrial automation. While enjoying the convenience and efficiency that IoT brings to us, new threats from IoT also have emerged. There are increasing research works to ease these threats, but many problems remain open. To better understand the essential reasons of new threats and the challenges in current research, this survey first proposes the concept of "IoT features". Then, the security and privacy effects of eight IoT new features were discussed including the threats they cause, existing solutions and challenges yet to be solved. To help researchers follow the up-to-date works in this field, this paper finally illustrates the developing trend of IoT security research and reveals how IoT features affect existing security research by investigating most existing research works related to IoT security from 2013 to 2017.
- Jan 31 2018 cs.CV arXiv:1801.10111v1Millions of hearing impaired people around the world routinely use some variants of sign languages to communicate, thus the automatic translation of a sign language is meaningful and important. Currently, there are two sub-problems in Sign Language Recognition (SLR), i.e., isolated SLR that recognizes word by word and continuous SLR that translates entire sentences. Existing continuous SLR methods typically utilize isolated SLRs as building blocks, with an extra layer of preprocessing (temporal segmentation) and another layer of post-processing (sentence synthesis). Unfortunately, temporal segmentation itself is non-trivial and inevitably propagates errors into subsequent steps. Worse still, isolated SLR methods typically require strenuous labeling of each word separately in a sentence, severely limiting the amount of attainable training data. To address these challenges, we propose a novel continuous sign recognition framework, the Hierarchical Attention Network with Latent Space (LS-HAN), which eliminates the preprocessing of temporal segmentation. The proposed LS-HAN consists of three components: a two-stream Convolutional Neural Network (CNN) for video feature representation generation, a Latent Space (LS) for semantic gap bridging, and a Hierarchical Attention Network (HAN) for latent space based recognition. Experiments are carried out on two large scale datasets. Experimental results demonstrate the effectiveness of the proposed framework.
- Nov 28 2017 cs.DC arXiv:1711.09123v1The Cloud computing paradigm has revolutionized the computer science horizon during the past decade and has enabled the emergence of computing as the fifth utility. It has captured significant attention of academia, industries and government bodies. Now, it has emerged as the backbone of modern economy by offering subscription-based services anytime, anywhere following a pay-as-you-go model. This has instigated (1) shorter establishment times for start-ups, (2) creation of scalable global enterprise applications, (3) better cost-to-value associativity for scientific and high performance computing applications, and (4) different invocation/execution models for pervasive and ubiquitous applications. The recent technological developments and paradigms such as serverless computing, software-defined networking, Internet of Things, and processing at network edge are creating new opportunities for Cloud computing. However, they are also posing several new challenges and creating the need for new approaches and research strategies, as well as the re-evaluation of the models that were developed to address issues such as scalability, elasticity, reliability, security, sustainability, and application models. The proposed manifesto addresses them by identifying the major open challenges in Cloud computing, emerging trends and impact areas. It then offers research directions for the next decade, thus helping in the realisation of Future Generation Cloud Computing.
- Oct 24 2017 cs.AI arXiv:1710.07983v3Apprenticeship learning (AL) is a class of "learning from demonstrations" techniques where the reward function of a Markov Decision Process (MDP) is unknown to the learning agent and the agent has to derive a good policy by observing an expert's demonstrations. In this paper, we study the problem of how to make AL algorithms inherently safe while still meeting its learning objective. We consider a setting where the unknown reward function is assumed to be a linear combination of a set of state features, and the safety property is specified in Probabilistic Computation Tree Logic (PCTL). By embedding probabilistic model checking inside AL, we propose a novel counterexample-guided approach that can ensure both safety and performance of the learnt policy. We demonstrate the effectiveness of our approach on several challenging AL scenarios where safety is essential.
- Oct 17 2017 cs.CR arXiv:1710.05095v1With the development of Big Data and cloud data sharing, privacy preserving data publishing becomes one of the most important topics in the past decade. As one of the most influential privacy definitions, differential privacy provides a rigorous and provable privacy guarantee for data publishing. Differentially private interactive publishing achieves good performance in many applications; however, the curator has to release a large number of queries in a batch or a synthetic dataset in the Big Data era. To provide accurate non-interactive publishing results in the constraint of differential privacy, two challenges need to be tackled: one is how to decrease the correlation between large sets of queries, while the other is how to predict on fresh queries. Neither is easy to solve by the traditional differential privacy mechanism. This paper transfers the data publishing problem to a machine learning problem, in which queries are considered as training samples and a prediction model will be released rather than query results or synthetic datasets. When the model is published, it can be used to answer current submitted queries and predict results for fresh queries from the public. Compared with the traditional method, the proposed prediction model enhances the accuracy of query results for non-interactive publishing. Experimental results show that the proposed solution outperforms traditional differential privacy in terms of Mean Absolute Value on a large group of queries. This also suggests the learning model can successfully retain the utility of published queries while preserving privacy.
- Network slicing has been advocated by both academia and industry as a cost-efficient way to enable operators to provide networks on an as-a-service basis and meet the wide range of use cases that the fifth generation wireless network will serve. The existing works on network slicing are mainly targeted at the partition of the core network, and the prospect of network slicing in radio access networks should be jointly exploited. To solve this challenge, an enhanced network slicing in fog radio access networks (F-RANs), termed as access slicing, is proposed. This article comprehensively presents a novel architecture and related key techniques for access slicing in F-RANs. The proposed hierarchical architecture of access slicing consists of centralized orchestration layer and slice instance layer, which makes the access slicing adaptively implement in an convenient way. Meanwhile, key techniques and their corresponding solutions, including the radio and cache resource management, as well as the social-aware slicing, are presented. Open issues in terms of standardization developments and field trials are identified.
- Online interactive recommender systems strive to promptly suggest to consumers appropriate items (e.g., movies, news articles) according to the current context including both the consumer and item content information. However, such context information is often unavailable in practice for the recommendation, where only the users' interaction data on items can be utilized. Moreover, the lack of interaction records, especially for new users and items, worsens the performance of recommendation further. To address these issues, collaborative filtering (CF), one of the recommendation techniques relying on the interaction data only, as well as the online multi-armed bandit mechanisms, capable of achieving the balance between exploitation and exploration, are adopted in the online interactive recommendation settings, by assuming independent items (i.e., arms). Nonetheless, the assumption rarely holds in reality, since the real-world items tend to be correlated with each other (e.g., two articles with similar topics). In this paper, we study online interactive collaborative filtering problems by considering the dependencies among items. We explicitly formulate the item dependencies as the clusters on arms, where the arms within a single cluster share the similar latent topics. In light of the topic modeling techniques, we come up with a generative model to generate the items from their underlying topics. Furthermore, an efficient online algorithm based on particle learning is developed for inferring both latent parameters and states of our model. Additionally, our inferred model can be naturally integrated with existing multi-armed selection strategies in the online interactive collaborating setting. Empirical studies on two real-world applications, online recommendations of movies and news, demonstrate both the effectiveness and efficiency of the proposed approach.
- Jun 20 2017 cs.GR arXiv:1706.05864v13D feature descriptor provide information between corresponding models and scenes. 3D objection recognition in cluttered scenes, however, remains a largely unsolved problem. Practical applications impose several challenges which are not fully addressed by existing methods. Especially in cluttered scenes there are many feature mismatches between scenes and models. We therefore propose Histograms of Gaussian Normal Distribution (HGND) for extracting salient features on a local reference frame (LRF) that enables us to solve this problem. We propose a LRF on each local surface patches using the scatter matrix's eigenvectors. Then the HGND information of each salient point is calculated on the LRF, for which we use both the mesh and point data of the depth image. Experiments on 45 cluttered scenes of the Bologna Dataset and 50 cluttered scenes of the UWA Dataset are made to evaluate the robustness and descriptiveness of our HGND. Experiments carried out by us demonstrate that HGND obtains a more reliable matching rate than state-of-the-art approaches in cluttered situations.
- The explosive increase and ubiquitous accessibility of visual data on the Web have led to the prosperity of research activity in image search or retrieval. With the ignorance of visual content as a ranking clue, methods with text search techniques for visual retrieval may suffer inconsistency between the text words and visual content. Content-based image retrieval (CBIR), which makes use of the representation of visual content to identify relevant images, has attracted sustained attention in recent two decades. Such a problem is challenging due to the intention gap and the semantic gap problems. Numerous techniques have been developed for content-based image retrieval in the last decade. The purpose of this paper is to categorize and evaluate those algorithms proposed during the period of 2003 to 2016. We conclude with several promising directions for future research.
- Jun 16 2017 physics.soc-ph cs.SI arXiv:1706.04730v1The availability of big data recorded from massively multiplayer online role-playing games (MMORPGs) allows us to gain a deeper understanding of the potential connection between individuals' network positions and their economic outputs. We use a statistical filtering method to construct dependence networks from weighted friendship networks of individuals. We investigate the 30 distinct motif positions in the 13 directed triadic motifs which represent microscopic dependences among individuals. Based on the structural similarity of motif positions, we further classify individuals into different groups. The node position diversity of individuals is found to be positively correlated with their economic outputs. We also find that the economic outputs of leaf nodes are significantly lower than that of the other nodes in the same motif. Our findings shed light on understanding the influence of network structure on economic activities and outputs in socioeconomic system.
- Jun 13 2017 cs.CV arXiv:1706.03424v2Remote sensing image retrieval(RSIR), which aims to efficiently retrieve data of interest from large collections of remote sensing data, is a fundamental task in remote sensing. Over the past several decades, there has been significant effort to extract powerful feature representations for this task since the retrieval performance depends on the representative strength of the features. Benchmark datasets are also critical for developing, evaluating, and comparing RSIR approaches. Current benchmark datasets are deficient in that 1) they were originally collected for land use/land cover classification and not image retrieval, 2) they are relatively small in terms of the number of classes as well the number of sample images per class, and 3) the retrieval performance has saturated. These limitations have severely restricted the development of novel feature representations for RSIR, particularly the recent deep-learning based features which require large amounts of training data. We therefore present in this paper, a new large-scale remote sensing dataset termed "PatternNet" that was collected specifically for RSIR. PatternNet was collected from high-resolution imagery and contains 38 classes with 800 images per class. We also provide a thorough review of RSIR approaches ranging from traditional handcrafted feature based methods to recent deep learning based ones. We evaluate over 35 methods to establish extensive baseline results for future RSIR research using the PatternNet benchmark.
- Mar 07 2017 cs.CV arXiv:1703.01702v1This paper studies the problem of how to choose good viewpoints for taking photographs of architectures. We achieve this by learning from professional photographs of world famous landmarks that are available on the Internet. Unlike previous efforts devoted to photo quality assessment which mainly rely on 2D image features, we show in this paper combining 2D image features extracted from images with 3D geometric features computed on the 3D models can result in more reliable evaluation of viewpoint quality. Specifically, we collect a set of photographs for each of 15 world famous architectures as well as their 3D models from the Internet. Viewpoint recovery for images is carried out through an image-model registration process, after which a newly proposed viewpoint clustering strategy is exploited to validate users' viewpoint preferences when photographing landmarks. Finally, we extract a number of 2D and 3D features for each image based on multiple visual and geometric cues and perform viewpoint recommendation by learning from both 2D and 3D features using a specifically designed SVM-2K multi-view learner, achieving superior performance over using solely 2D or 3D features. We show the effectiveness of the proposed approach through extensive experiments. The experiments also demonstrate that our system can be used to recommend viewpoints for rendering textured 3D models of buildings for the use of architectural design, in addition to viewpoint evaluation of photographs and recommendation of viewpoints for photographing architectures in practice.
- We study the coexistence problem in a two-tier heterogeneous network (HetNet) with cognitive small cells. In particular, we consider an underlay HetNet, where the cognitive small base station (C-SBS) is allowed to use the frequency bands of the macro cell with an access probability (AP) as long as the C-SBS satisfies a preset interference probability (IP) constraint at macro users (MUs). To enhance the AP (or transmission opportunity) of the C-SBS, we propose a learning-based algorithm for the C-SBS and exploit the distance information between the macro base station (MBS) and MUs. Generally, the signal from the MBS to a specific MU contains the distance information between the MBS to the MU. We enable the C-SBS to analyze the MBS signal on a target frequency band, and learn the distance information between the MBS and the corresponding MU. With the learnt distance information, we calculate the upper bound of the probability that the C-SBS may interfere with the MU, and design an AP with a closed-form expression under the IP constraint. Numerical results indicate that the proposed algorithm outperforms the existing methods up to $60\%$ AP (or transmission opportunity).
- Oct 11 2016 cs.CV arXiv:1610.03023v2Learning powerful feature representations for image retrieval has always been a challenging task in the field of remote sensing. Traditional methods focus on extracting low-level hand-crafted features which are not only time-consuming but also tend to achieve unsatisfactory performance due to the content complexity of remote sensing images. In this paper, we investigate how to extract deep feature representations based on convolutional neural networks (CNN) for high-resolution remote sensing image retrieval (HRRSIR). To this end, two effective schemes are proposed to generate powerful feature representations for HRRSIR. In the first scheme, the deep features are extracted from the fully-connected and convolutional layers of the pre-trained CNN models, respectively; in the second scheme, we propose a novel CNN architecture based on conventional convolution layers and a three-layer perceptron. The novel CNN model is then trained on a large remote sensing dataset to learn low dimensional features. The two schemes are evaluated on several public and challenging datasets, and the results indicate that the proposed schemes and in particular the novel CNN are able to achieve state-of-the-art performance.
- In underlay heterogeneous networks (HetNets), the distance between a macro base station (MBS) and a macro user (MU) is crucial for a small-cell based station (SBS) to control the interference to the MU and achieve the coexistence. To obtain the distance between the MBS and the MU, the SBS needs a backhaul link from the macro system, such that the macro system is able to transmit the information of the distance to the SBS through the backhaul link. However, there may not exist any backhaul link from the macro system to the SBS in practical situations. Thus, it is challenging for the SBS to obtain the distance. To deal with this issue, we propose a median based (MB) estimator for the SBS to obtain the distance between the MBS and the MU without any backhaul link. Numerical results show that the estimation error of the MB estimator can be as small as $4\%$.
- Jul 15 2016 cs.SC arXiv:1607.04176v2Given a nonsingular $n \times n$ matrix of univariate polynomials over a field $\mathbb{K}$, we give fast and deterministic algorithms to compute its determinant and its Hermite normal form. Our algorithms use $\widetilde{\mathcal{O}}(n^\omega \lceil s \rceil)$ operations in $\mathbb{K}$, where $s$ is bounded from above by both the average of the degrees of the rows and that of the columns of the matrix and $\omega$ is the exponent of matrix multiplication. The soft-$O$ notation indicates that logarithmic factors in the big-$O$ are omitted while the ceiling function indicates that the cost is $\widetilde{\mathcal{O}}(n^\omega)$ when $s = o(1)$. Our algorithms are based on a fast and deterministic triangularization method for computing the diagonal entries of the Hermite form of a nonsingular matrix.
- Jul 13 2016 cs.DS arXiv:1607.03292v1Building concurrent spatial trees is more complicated than binary search trees since a space hierarchy should be preserved during modifications. We present a non-blocking quadtree-quadboost-that supports concurrent insert, remove, move, and contain operations. To increase its concurrency, we propose a decoupling approach that separates physical adjustment from logical removal within the remove operation. In addition, we design a continuous find mechanism to reduce its search cost. The move operation combines the searches for different keys together and modifies different positions with atomicity. The experimental results show that quadboost scales well on a multi-core system with 32 hardware threads. More than that, it outperforms existing concurrent trees in retrieving two-dimensional keys with up to 109% improvement when the number of threads is large. The move operation proved to perform better than the best-known algorithm, with up to 47%.
- In cognitive radio networks, the channel gain between primary transceivers, namely, primary channel gain, is crucial for a cognitive transmitter (CT) to control the transmit power and achieve spectrum sharing. Conventionally, the primary channel gain is estimated in the primary system and thus unavailable at the CT. To deal with this issue, two estimators are proposed by enabling the CT to sense primary signals. In particular, by adopting the maximum likelihood (ML) criterion to analyze the received primary signals, a ML estimator is first developed. After demonstrating the high computational complexity of the ML estimator, a median based (MB) estimator with proved low complexity is then proposed. Furthermore, the estimation accuracy of the MB estimation is theoretically characterized. By comparing the ML estimator and the MB estimator from the aspects of the computational complexity as well as the estimation accuracy, both advantages and disadvantages of two estimators are revealed. Numerical results show that the estimation errors of the ML estimator and the MB estimator can be as small as $0.6$ dB and $0.7$ dB, respectively.
- Feb 08 2016 cs.SC arXiv:1602.02049v1Given a square, nonsingular matrix of univariate polynomials $\mathbf{F} \in \mathbb{K}[x]^{n \times n}$ over a field $\mathbb{K}$, we give a fast, deterministic algorithm for finding the Hermite normal form of $\mathbf{F}$ with complexity $O^{\sim}\left(n^{\omega}d\right)$ where $d$ is the degree of $\mathbf{F}$. Here soft-$O$ notation is Big-$O$ with log factors removed and $\omega$ is the exponent of matrix multiplication. The method relies of a fast algorithm for determining the diagonal entries of its Hermite normal form, having as cost $O^{\sim}\left(n^{\omega}s\right)$ operations with $s$ the average of the column degrees of $\mathbf{F}$.
- Oct 14 2015 cs.LO arXiv:1510.03531v2The Internet, as it stands today, is highly vulnerable to attacks. However, little has been done to understand and verify the formal security guarantees of proposed secure inter-domain routing protocols, such as Secure BGP (S-BGP). In this paper, we develop a sound program logic for SANDLog-a declarative specification language for secure routing protocols for verifying properties of these protocols. We prove invariant properties of SANDLog programs that run in an adversarial environment. As a step towards automated verification, we implement a verification condition generator (VCGen) to automatically extract proof obligations. VCGen is integrated into a compiler for SANDLog that can generate executable protocol implementations; and thus, both verification and empirical evaluation of secure routing protocols can be carried out in this unified framework. To validate our framework, we encoded several proposed secure routing mechanisms in SANDLog, verified variants of path authenticity properties by manually discharging the generated verification conditions in Coq, and generated executable code based on SANDLog specification and ran the code in simulation.
- Sep 22 2015 physics.soc-ph cs.SI arXiv:1509.06197v2People in modern societies form different social networks through numerous means of communication. These communication networks reflect different aspects of human's societal structure. The billing records of calls among mobile phone users enable us to construct a directed calling network (DCN) and its Bonferroni network (SVDCN) in which the preferential communications are statistically validated. Here we perform a comparative investigation of the cliques of the original DCN and its SVDCN constructed from the calling records of more than nine million individuals in Shanghai over a period of 110 days. We find that the statistical properties of the cliques of the two calling networks are qualitatively similar and the clique members in the DCN and the SVDCN exhibit idiosyncratic behaviors quantitatively. Members in large cliques are found to be spatially close to each other. Based on the clique degree profile of each mobile phone user, the most active users in the two calling networks can be classified in to several groups. The users in different groups are found to have different calling behaviors. Our study unveils interesting communication behaviors among mobile phone users that are densely connected to each other.
- Sep 01 2015 physics.soc-ph cs.SI arXiv:1508.07503v2Humans are heterogenous and the behaviors of individuals could be different from that at the population level. We conduct an in-depth study of the temporal patterns of cellphone conversation activities of 73'339 anonymous cellphone users with the same truncated Weibull distribution of inter-call durations. We find that the individual call events exhibit a pattern of bursts, in which high activity periods are alternated with low activity periods. Surprisingly, the number of events in high activity periods are found to conform to a power-law distribution at the population level, but follow an exponential distribution at the individual level, which is a hallmark of absence of memory in individual call activities. Such exponential distribution is also observed for the number of events in low activity periods. Together with the exponential distributions of inter-call durations within bursts and of the intervals between consecutive bursts, we demonstrate that the individual call activities are driven by two independent Poisson processes, which can be combined within a minimal model in terms of a two-state first-order Markov chain giving very good agreement with the empirical distributions using the parameters estimated from real data for about half of the individuals in our sample. By measuring directly the distributions of call rates across the population, which exhibit power-law tails, we explain the difference with previous population level studies, purporting the existence of power-law distributions, via the "Superposition of Distributions" mechanism: The superposition of many exponential distributions of activities with a power-law distribution of their characteristic scales leads to a power-law distribution of the activities at the population level.
- Mar 13 2015 physics.soc-ph cs.SI arXiv:1503.03746v1Constituents of complex systems interact with each other and self-organize to form complex networks. Empirical results show that the link formation process of many real networks follows either the global principle of popularity or the local principle of similarity or a tradeoff between the two. In particular, it has been shown that in social networks individuals exhibit significant homophily when choosing their collaborators. We demonstrate, however, that in populations in which there is a division of labor, skill complementarity is an important factor in the formation of socioeconomic networks and an individual's choice of collaborators is strongly affected by heterophily. We analyze 124 evolving virtual worlds of a popular "massively multiplayer online role-playing game" (MMORPG) in which people belong to three different professions and are allowed to work and interact with each other in a somewhat realistic manner. We find evidence of heterophily in the formation of collaboration networks, where people prefer to forge social ties with people who have professions different from their own. We then construct an economic model to quantify the heterophily by assuming that individuals in socioeconomic systems choose collaborators that are of maximum utility. The results of model calibration confirm the presence of heterophily. Both empirical analysis and model calibration show that the heterophilous feature is persistent along the evolution of virtual worlds. We also find that the degree of complementarity in virtual societies is positively correlated with their economic output. Our work sheds new light on the scientific research utility of virtual worlds for studying human behaviors in complex socioeconomic systems.
- Handover rate is one of the most import metrics to instruct mobility management and resource management in wireless cellular networks. In the literature, the mathematical expression of handover rate has been derived for homogeneous cellular network by both regular hexagon coverage model and stochastic geometry model, but there has not been any reliable result for heterogeneous cellular networks (HCNs). Recently, stochastic geometry modeling has been shown to model well the real deployment of HCNs and has been extensively used to analyze HCNs. In this paper, we give an effective handover analysis for HCNs by stochastic geometry modeling, derive the mathematical expression of handover rate by employing an infinitesimal method for a generalized multi-tier scenario, discuss the result by deriving some meaningful corollaries, and validate the analysis by computer simulation with multiple walking models. By our analysis, we find that in HCNs the handover rate is related to many factors like the base stations' densities and transmitting powers, user's velocity distribution, bias factor, pass loss factor and etc. Although our analysis focuses on the scenario of multi-tier HCNs, the analytical framework can be easily extended for more complex scenarios, and may shed some light for future study.
- Sep 22 2014 cs.SC arXiv:1409.5462v1Given a square, nonsingular matrix of univariate polynomials $\mathbf{F}\in\mathbb{K}[x]^{n\times n}$ over a field $\mathbb{K}$, we give a deterministic algorithm for finding the determinant of $\mathbf{F}$. The complexity of the algorithm is $\bigO \left(n^{\omega}s\right)$ field operations where $s$ is the average column degree or the average row degree of $\mathbf{F}$. Here $\bigO$ notation is Big-$O$ with log factors omitted and $\omega$ is the exponent of matrix multiplication.
- Apr 01 2014 physics.soc-ph cs.SI arXiv:1403.7879v1In friendship networks, individuals have different numbers of friends, and the closeness or intimacy between an individual and her friends is heterogeneous. Using a statistical filtering method to identify relationships about who depends on whom, we construct dependence networks (which are directed) from weighted friendship networks of avatars in more than two hundred virtual societies of a massively multiplayer online role-playing game (MMORPG). We investigate the evolution of triadic motifs in dependence networks. Several metrics show that the virtual societies evolved through a transient stage in the first two to three weeks and reached a relatively stable stage. We find that the unidirectional loop motif (${\rm{M}}_9$) is underrepresented and does not appear, open motifs are also underrepresented, while other close motifs are overrepresented. We also find that, for most motifs, the overall level difference of the three avatars in the same motif is significantly lower than average, whereas the sum of ranks is only slightly larger than average. Our findings show that avatars' social status plays an important role in the formation of triadic motifs.
- Mar 18 2014 physics.soc-ph cs.SI arXiv:1403.3785v1Big data open up unprecedented opportunities to investigate complex systems including the society. In particular, communication data serve as major sources for computational social sciences but they have to be cleaned and filtered as they may contain spurious information due to recording errors as well as interactions, like commercial and marketing activities, not directly related to the social network. The network constructed from communication data can only be considered as a proxy for the network of social relationships. Here we apply a systematic method, based on multiple hypothesis testing, to statistically validate the links and then construct the corresponding Bonferroni network, generalized to the directed case. We study two large datasets of mobile phone records, one from Europe and the other from China. For both datasets we compare the raw data networks with the corresponding Bonferroni networks and point out significant differences in the structures and in the basic network measures. We show evidence that the Bonferroni network provides a better proxy for the network of social interactions than the original one. By using the filtered networks we investigated the statistics and temporal evolution of small directed 3-motifs and conclude that closed communication triads have a formation time-scale, which is quite fast and typically intraday. We also find that open communication triads preferentially evolve to other open triads with a higher fraction of reciprocated calls. These stylized facts were observed for both datasets.
- Mar 04 2014 cs.CV arXiv:1403.0284v2The Bag-of-Words (BoW) representation is well applied to recent state-of-the-art image retrieval works. Typically, multiple vocabularies are generated to correct quantization artifacts and improve recall. However, this routine is corrupted by vocabulary correlation, i.e., overlapping among different vocabularies. Vocabulary correlation leads to an over-counting of the indexed features in the overlapped area, or the intersection set, thus compromising the retrieval accuracy. In order to address the correlation problem while preserve the benefit of high recall, this paper proposes a Bayes merging approach to down-weight the indexed features in the intersection set. Through explicitly modeling the correlation problem in a probabilistic view, a joint similarity on both image- and feature-level is estimated for the indexed features in the intersection set. We evaluate our method through extensive experiments on three benchmark datasets. Albeit simple, Bayes merging can be well applied in various merging tasks, and consistently improves the baselines on multi-vocabulary merging. Moreover, Bayes merging is efficient in terms of both time and memory cost, and yields competitive performance compared with the state-of-the-art methods.
- Feb 26 2014 cs.SI physics.soc-ph arXiv:1402.6573v2Mobile phone calling is one of the most widely used communication methods in modern society. The records of calls among mobile phone users provide us a valuable proxy for the understanding of human communication patterns embedded in social networks. Mobile phone users call each other forming a directed calling network. If only reciprocal calls are considered, we obtain an undirected mutual calling network. The preferential communication behavior between two connected users can be statistically tested and it results in two Bonferroni networks with statistically validated edges. We perform a comparative analysis of the statistical properties of these four networks, which are constructed from the calling records of more than nine million individuals in Shanghai over a period of 110 days. We find that these networks share many common structural properties and also exhibit idiosyncratic features when compared with previously studied large mobile calling networks. The empirical findings provide us an intriguing picture of a representative large social network that might shed new lights on the modelling of large social networks.
- The spatial structure of transmitters in wireless networks plays a key role in evaluating the mutual interference and hence the performance. Although the Poisson point process (PPP) has been widely used to model the spatial configuration of wireless networks, it is not suitable for networks with repulsion. The Ginibre point process (GPP) is one of the main examples of determinantal point processes that can be used to model random phenomena where repulsion is observed. Considering the accuracy, tractability and practicability tradeoffs, we introduce and promote the $\beta$-GPP, an intermediate class between the PPP and the GPP, as a model for wireless networks when the nodes exhibit repulsion. To show that the model leads to analytically tractable results in several cases of interest, we derive the mean and variance of the interference using two different approaches: the Palm measure approach and the reduced second moment approach, and then provide approximations of the interference distribution by three known probability density functions. Besides, to show that the model is relevant for cellular systems, we derive the coverage probability of the typical user and also find that the fitted $\beta$-GPP can closely model the deployment of actual base stations in terms of the coverage probability and other statistics.
- Apr 12 2013 physics.ins-det cs.OH arXiv:1304.3242v1A large number of data need to be transmitted in high-speed between Field Programmable Gate Array (FPGA) and Advanced RISC Machines 11 micro-controller (ARM11) when we design a small data acquisition (DAQ) system for nuclear experiments. However, it is a complex problem to beat the target. In this paper, we will introduce a method which can realize the high-speed data transmission. By this way, FPGA is designed to acquire massive data from Front-end electronics (FEE) and send it to ARM11, which will transmit the data to other computer through the TCP/IP protocol. This paper mainly introduces the interface design of the high-speed transmission between FPGA and ARM11, the transmission logic of FPGA and the driver program of ARM11. The research shows that the maximal transmission speed between FPGA and ARM11 by this way can reach 50MB/s theoretically, while in nuclear physics experiment, the system can acquire data with the speed of 2.2MB/s.
- Matrix completion has been well studied under the uniform sampling model and the trace-norm regularized methods perform well both theoretically and numerically in such a setting. However, the uniform sampling model is unrealistic for a range of applications and the standard trace-norm relaxation can behave very poorly when the underlying sampling scheme is non-uniform. In this paper we propose and analyze a max-norm constrained empirical risk minimization method for noisy matrix completion under a general sampling model. The optimal rate of convergence is established under the Frobenius norm loss in the context of approximately low-rank matrix reconstruction. It is shown that the max-norm constrained method is minimax rate-optimal and yields a unified and robust approximate recovery guarantee, with respect to the sampling distributions. The computational effectiveness of this method is also discussed, based on first-order algorithms for solving convex optimizations involving max-norm regularization.
- Modern technologies not only provide a variety of communication modes, e.g., texting, cellphone conversation, and online instant messaging, but they also provide detailed electronic traces of these communications between individuals. These electronic traces indicate that the interactions occur in temporal bursts. Here, we study the inter-call durations of the 100,000 most-active cellphone users of a Chinese mobile phone operator. We confirm that the inter-call durations follow a power-law distribution with an exponential cutoff at the population level but find differences when focusing on individual users. We apply statistical tests at the individual level and find that the inter-call durations follow a power-law distribution for only 3460 individuals (3.46%). The inter-call durations for the majority (73.34%) follow a Weibull distribution. We quantify individual users using three measures: out-degree, percentage of outgoing calls, and communication diversity. We find that the cellphone users with a power-law duration distribution fall into three anomalous clusters: robot-based callers, telecom frauds, and telephone sales. This information is of interest to both academics and practitioners, mobile telecom operator in particular. In contrast, the individual users with a Weibull duration distribution form the fourth cluster of ordinary cellphone users. We also discover more information about the calling patterns of these four clusters, e.g., the probability that a user will call the $c_r$-th most contact and the probability distribution of burst sizes. Our findings may enable a more detailed analysis of the huge body of data contained in the logs of massive users.
- Nov 15 2012 cs.CR arXiv:1211.3191v2Open communication over the Internet poses a serious threat to countries with repressive regimes, leading them to develop and deploy censorship mechanisms within their networks. Unfortunately, existing censorship circumvention systems do not provide high availability guarantees to their users, as censors can identify, hence disrupt, the traffic belonging to these systems using today's advanced censorship technologies. In this paper we propose SWEET, a highly available censorship-resistant infrastructure. SWEET works by encapsulating a censored user's traffic to a proxy server inside email messages that are carried over by public email service providers, like Gmail and Yahoo Mail. As the operation of SWEET is not bound to specific email providers we argue that a censor will need to block all email communications in order to disrupt SWEET, which is infeasible as email constitutes an important part of today's Internet. Through experiments with a prototype of our system we find that SWEET's performance is sufficient for web traffic. In particular, regular websites are downloaded within couple of seconds.
- Oct 23 2012 cs.NI arXiv:1210.5614v1Though cooperative relaying is believed to be a promising technology to improve the energy efficiency of cellular networks, the relays' static power consumption might worsen the energy efficiency therefore can not be neglected. In this paper, we focus on whether and how the energy efficiency of cellular networks can be improved via relays. Based on the spatial Poisson point process, an analytical model is proposed to evaluate the energy efficiency of relay-assisted cellular networks. With the aid of the technical tools of stochastic geometry, we derive the distributions of signal-to-interference-plus-noise ratios (SINRs) and mean achievable rates of both non-cooperative users and cooperative users. The energy efficiency measured by "bps/Hz/W" is expressed subsequently. These established expressions are amenable to numerical evaluation and corroborated by simulation results.
- Oct 16 2012 cs.NI arXiv:1210.3872v2In this letter, we consider a joint macro-relay network with densely deployed relay stations (RSs) and dynamically varied traffic load measured by the number of users. An energy-efficient strategy is proposed by intelligently adjusting the RS working modes (active or sleeping) according to the traffic variation. Explicit expressions related to the network energy efficiency are derived based on stochastic geometry theory. Simulation results demonstrate that the derived analytic results are reasonable and the proposed strategy can significantly improve the network energy efficiency.
- Mar 23 2012 cs.NI arXiv:1203.4913v2In cognitive radio networks, channel aggregation (CA) and channel fragmentation (CF) techniques have been proposed to enhance the spectrum utilization. While most of the literature studies CA and CF independently, in this paper we combine CA and CF innovatively and present a new spectrum sharing strategy named CAF (Channel Aggregation and Fragmentation). We elaborate on the proposed CAF strategy and derive the balance equation by a continuous time Markov chain (CTMC) model. Then various system performance metrics including blocking probability, dropping probability, spectrum utilization and throughput of the secondary network are evaluated. Both analytical and simulation results show that our strategy lowers the blocking and dropping probabilities and enhances the spectrum utilization and throughput effectively. Moreover, by tuning the bandwidth requirement of each secondary user, different system performance can be achieved.
- May 19 2010 physics.soc-ph cs.IR arXiv:1005.3124v2Network-based recommendation algorithms for user-object link predictions have achieved significant developments in recent years. For bipartite graphs, the reallocation of resource in such algorithms is analogous to heat spreading (HeatS) or probability spreading (ProbS) processes. The best algorithm to date is a hybrid of the HeatS and ProbS techniques with homogenous initial resource configurations, which fulfills simultaneously high accuracy and large diversity. We investigate the effect of heterogeneity in initial configurations on the HeatS+ProbS hybrid algorithm and find that both recommendation accuracy and diversity can be further improved in this new setting. Numerical experiments show that the improvement is robust.
- Mar 31 2010 cs.DC arXiv:1003.5794v3To save cost, recently more and more users choose to provision virtual machine resources in cluster systems, especially in data centres. Maintaining a consistent member view is the foundation of reliable cluster managements, and it also raises several challenge issues for large scale cluster systems deployed with virtual machines (which we call virtualized clusters). In this paper, we introduce our experiences in design and implementation of scalable member view management on large-scale virtual clusters. Our research contributions are three-fold: 1) we propose a scalable and reliable management infrastructure that combines a peer-to-peer structure and a hierarchy structure to maintain a consistent member view in virtual clusters; 2) we present a light-weighted group membership algorithm that can reach the consistent member view within a single round of message exchange; and 3) we design and implement a scalable membership service that can provision virtual machines and maintain a consistent member view in virtual clusters. Our work is verified on Dawning 5000A, which ranked No.10 of Top 500 super computers in November, 2008.
- Mar 05 2010 cs.DC arXiv:1003.0951v2This paper presents a methodology and a system, named LogMaster, for mining correlations of events that have multiple attributions, i.e., node ID, application ID, event type, and event severity, in logs of large-scale cluster systems. Different from traditional transactional data, e.g., supermarket purchases, system logs have their unique characteristic, and hence we propose several innovative approaches to mine their correlations. We present a simple metrics to measure correlations of events that may happen interleavedly. On the basis of the measurement of correlations, we propose two approaches to mine event correlations; meanwhile, we propose an innovative abstraction: event correlation graphs (ECGs) to represent event correlations, and present an ECGs based algorithm for predicting events. For two system logs of a production Hadoop-based cloud computing system at Research Institution of China Mobile and a production HPC cluster system at Los Alamos National Lab (LANL), we evaluate our approaches in three scenarios: (a) predicting all events on the basis of both failure and non-failure events; (b) predicting only failure events on the basis of both failure and non-failure events; (c) predicting failure events after removing non-failure events.
- In recent years, there has been a proliferation of declarative logic-based trust management languages and systems proposed to ease the description, configuration, and enforcement of security policies. These systems have different tradeoffs in expressiveness and complexity, depending on the security constructs (e.g. authentication, delegation, secrecy, etc.) that are supported, and the assumed trust level and scale of the execution environment. In this paper, we present LBTrust, a unified declarative system for reconfigurable trust management, where various security constructs can be customized and composed in a declarative fashion. We present an initial proof-of-concept implementation of LBTrust using LogicBlox, an emerging commercial Datalog-based platform for enterprise software systems. The LogicBlox language enhances Datalog in a variety of ways, including constraints and meta-programming, as well as support for programmer defined constraints which on the meta-model itself ? meta-constraints ? which act to restrict the set of allowable programs. LBTrust utilizes LogicBlox?s meta-programming and meta-constraints to enable customizable cryptographic, partitioning and distribution strategies based on the execution environment. We present uses cases of LBTrust based on three trust management systems (Binder, D1LP, and Secure Network Datalog), and provide a preliminary evaluation of a Binder-based trust management system.
- Massive multiplayer online role-playing games (MMORPGs) are very popular in China, which provides a potential platform for scientific research. We study the online-offline activities of avatars in an MMORPG to understand their game-playing behavior. The statistical analysis unveils that the active avatars can be classified into three types. The avatars of the first type are owned by game cheaters who go online and offline in preset time intervals with the online duration distributions dominated by pulses. The second type of avatars is characterized by a Weibull distribution in the online durations, which is confirmed by statistical tests. The distributions of online durations of the remaining individual avatars differ from the above two types and cannot be described by a simple form. These findings have potential applications in the game industry.
- Jun 09 2009 cs.DC arXiv:0906.1346v6Different departments of a large organization often run dedicated cluster systems for different computing loads, like HPC (high performance computing) jobs or Web service applications. In this paper, we have designed and implemented a cloud management system software Phoenix Cloud to consolidate heterogeneous workloads from different departments affiliated to the same organization on the shared cluster system. We have also proposed cooperative resource provisioning and management policies for a large organization and its affiliated departments, running HPC jobs and Web service applications, to share the consolidated cluster system. The experiments show that in comparison with the case that each department operates its dedicated cluster system, Phoenix Cloud significantly decreases the scale of the required cluster system for a large organization, improves the benefit of the scientific computing department, and at the same time provisions enough resources to the other department running Web services with varying loads.
- Jun 09 2009 cs.DC arXiv:0906.1328v1It is effective to improve the reliability and availability of large-scale cluster systems through the analysis of failures. Existed failure analysis methods understand and analyze failures from one or few dimension. The analysis results are partial and with less precision because of the limitation of data source. This paper presents multidimensional analysis based on graph mining to analyze multi-source system logs, which is a promising failure analysis method to get more complete and precise failure knowledge.
- A \it dynamic $k$-coloring of a graph $G$ is a proper $k$-coloring of the vertices of $G$ such that every vertex of degree at least 2 in $G$ will be adjacent to vertices with at least 2 different colors. The smallest number $k$ for which a graph $G$ can have a dynamic $k$-coloring is the \it dynamic chromatic number, denoted by $\chi_d(G)$. In this paper, we investigate the dynamic 3-colorings of claw-free graphs. First, we prove that it is $NP$-complete to determine if a claw-free graph with maximum degree 3 is dynamically 3-colorable. Second, by forbidding a kind of subgraphs, we find a reasonable subclass of claw-free graphs with maximum degree 3, for which the dynamically 3-colorable problem can be solved in linear time. Third, we give a linear time algorithm to recognize this subclass of graphs, and a linear time algorithm to determine whether it is dynamically 3-colorable. We also give a linear time algorithm to color the graphs in the subclass by 3 colors.
- For an integer $r>0$, a conditional $(k,r)$-coloring of a graph $G$ is a proper $k$-coloring of the vertices of $G$ such that every vertex $v$ of degree $d(v)$ in $G$ is adjacent to vertices with at least $min\{r, d(v)\}$ different colors. The smallest integer $k$ for which a graph $G$ has a conditional $(k,r)$-coloring is called the $r$th order conditional chromatic number, denoted by $\chi_r(G)$. It is easy to see that the conditional coloring is a generalization of the traditional vertex coloring for which $r=1$. In this paper, we consider the complexity of the conditional colorings of graphs. The main result is that the conditional $(3,2)$-colorability is $NP$-complete for triangle-free graphs with maximum degree at most 3, which is different from the old result that the traditional 3-colorability is polynomial solvable for graphs with maximum degree at most 3. This also implies that it is $NP$-complete to determine if a graph of maximum degree 3 is $(3,2)$- or $(4,2)$-colorable. Also we have proved that some old complexity results for traditional colorings still hold for the conditional colorings.
- Jul 17 2007 cs.MS arXiv:0707.2347v5We propose several new schedules for Strassen-Winograd's matrix multiplication algorithm, they reduce the extra memory allocation requirements by three different means: by introducing a few pre-additions, by overwriting the input matrices, or by using a first recursive level of classical multiplication. In particular, we show two fully in-place schedules: one having the same number of operations, if the input matrices can be overwritten; the other one, slightly increasing the constant of the leading term of the complexity, if the input matrices are read-only. Many of these schedules have been found by an implementation of an exhaustive search algorithm based on a pebble game.