results for au:Xu_Y in:cs

- Feb 19 2018 cs.MM arXiv:1802.06057v1Immersive video offers the freedom to navigate inside virtualized environment. Instead of streaming the bulky immersive videos entirely, a viewport (also referred to as field of view, FoV) adaptive streaming is preferred. We often stream the high-quality content within current viewport, while reducing the quality of representation elsewhere to save the network bandwidth consumption. Consider that we could refine the quality when focusing on a new FoV, in this paper, we model the perceptual impact of the quality variations (through adapting the quantization stepsize and spatial resolution) with respect to the refinement duration, and yield a product of two closed-form exponential functions that well explain the joint quantization and resolution induced quality impact. Analytical model is cross-validated using another set of data, where both Pearson and Spearman's rank correlation coefficients are close to 0.98. Our work is devised to optimize the adaptive FoV streaming of the immersive video under limited network resource. Numerical results show that our proposed model significantly improves the quality of experience of users, with about 9.36\% BD-Rate (Bjontegaard Delta Rate) improvement on average as compared to other representative methods, particularly under the limited bandwidth.
- This paper proposes an accurate approach to estimate vehicles' 3D pose and shape from multi-view monocular images with a small overlap. This approach utilizes a state-of-the-art convolutional neural network (CNN) to detect vehicles' semantic keypoint in images and then introduces a Cross Projection Optimization (CPO) method to estimate the 3D pose accurately. During the iterative CPO process, it implements a new vehicle shape adjustment method named Hierarchical Wireframe Constraint (HWC). The approach is tested under both simulated and real-world scenes for performance verification. It's shown that this approach outperforms other existing monocular and stereo visual methods for vehicles' 3D pose and shape estimation. This approach provides new and robust solutions for accurate visual vehicle localization and it can be applied to the massive surveillance camera networks for intelligent transportation applications such as automatic driving assistance.
- Jan 29 2018 cs.CR arXiv:1801.08737v1In this work, we provide the first lattice-based group signature that offers full dynamicity (i.e., users have the flexibility in joining and leaving the group), and thus, resolve a prominent open problem posed by previous works. Moreover, we achieve this non-trivial feat in a relatively simple manner. Starting with Libert et al.'s fully static construction (Eurocrypt 2016) - which is arguably the most efficient lattice-based group signature to date, we introduce simple-but-insightful tweaks that allow to upgrade it directly into the fully dynamic setting. More startlingly, our scheme even produces slightly shorter signatures than the former, thanks to an adaptation of a technique proposed by Ling et al. (PKC 2013), allowing to prove inequalities in zero-knowledge. Our design approach consists of upgrading Libert et al.'s static construction (EUROCRYPT 2016) - which is arguably the most efficient lattice-based group signature to date - into the fully dynamic setting. Somewhat surprisingly, our scheme produces slightly shorter signatures than the former, thanks to a new technique for proving inequality in zero-knowledge without relying on any inequality check. The scheme satisfies the strong security requirements of Bootle et al.'s model (ACNS 2016), under the Short Integer Solution (SIS) and the Learning With Errors (LWE) assumptions. Furthermore, we demonstrate how to equip the obtained group signature scheme with the deniability functionality in a simple way. This attractive functionality, put forward by Ishida et al. (CANS 2016), enables the tracing authority to provide an evidence that a given user is not the owner of a signature in question. In the process, we design a zero-knowledge protocol for proving that a given LWE ciphertext does not decrypt to a particular message.
- Jan 26 2018 cs.CR arXiv:1801.08323v1Group signature is a fundamental cryptographic primitive, aiming to protect anonymity and ensure accountability of users. It allows group members to anonymously sign messages on behalf of the whole group, while incorporating a tracing mechanism to identify the signer of any suspected signature. Most of the existing group signature schemes, however, do not guarantee security once users' secret keys are exposed. To reduce potential damages caused by key exposure attacks, Song (CCS 2001) put forward the concept of forward-secure group signatures (FSGS). For the time being, all known secure FSGS schemes are based on number-theoretic assumptions, and are vulnerable against quantum computers. In this work, we construct the first lattice-based FSGS scheme. In Nakanishi et al.'s model, our scheme achieves forward-secure traceability under the Short Integer Solution (SIS) assumption, and offers full anonymity under the Learning With Errors (LWE) assumption. At the heart of our construction is a scalable lattice-based key-evolving mechanism, allowing users to periodically update their secret keys and to efficiently prove in zero-knowledge that the key-evolution process is done correctly. To realize this essential building block, we first employ the Bonsai-tree structure by Cash et al. (EUROCRYPT 2010) to handle the key evolution process, and then develop Langlois et al.'s construction (PKC 2014) to design its supporting zero-knowledge protocol. In comparison to all known lattice-based group signatures (that are \emphnot forward-secure), our scheme only incurs a very reasonable overhead: the bit-sizes of keys and signatures are at most O(log N), where N is the number of group users; and at most O(log^3 T), where T is the number of time periods.
- Jan 25 2018 cs.CL arXiv:1801.07746v2The science of happiness is an area of positive psychology concerned with understanding what behaviors make people happy in a sustainable fashion. Recently, there has been interest in developing technologies that help incorporate the findings of the science of happiness into users' daily lives by steering them towards behaviors that increase happiness. With the goal of building technology that can understand how people express their happy moments in text, we crowd-sourced HappyDB, a corpus of 100,000 happy moments that we make publicly available. This paper describes HappyDB and its properties, and outlines several important NLP problems that can be studied with the help of the corpus. We also apply several state-of-the-art analysis techniques to analyze HappyDB. Our results demonstrate the need for deeper NLP techniques to be developed which makes HappyDB an exciting resource for follow-on research.
- Jan 24 2018 cs.DC astro-ph.IM arXiv:1801.07548v1With many large science equipment constructing and putting into use, astronomy has stepped into the big data era. The new method and infrastructure of big data processing has become a new requirement of many astronomers. Cloud computing, Map/Reduce, Hadoop, Spark, etc. many new technology has sprung up in recent years. Comparing to the high performance computing(HPC), Data is the center of these new technology. So, a new computing architecture infrastructure is necessary, which can be shared by both HPC and big data processing. Based on Astronomy Cloud project of Chinese Virtual Observatory (China-VO), we have made much efforts to optimize the designation of the hybrid computing platform. which include the hardware architecture, cluster management, Job and Resource scheduling.
- Jan 23 2018 cs.CV arXiv:1801.06940v1We develop a novel cross-modality generation framework that learns to generate predicted modalities from given modalities in MR images without real acquisition. Our proposed method performs image-to-image translation by means of a deep learning model that leverages conditional generative adversarial networks (cGANs). Our framework jointly exploits the low-level features (pixel-wise information) and high-level representations (e.g. brain tumors, brain structure like gray matter, etc.) between cross modalities which are important for resolving the challenging complexity in brain structures. Based on our proposed framework, we first propose a method for cross-modality registration by fusing the deformation fields to adopt the cross-modality information from predicted modalities. Second, we propose an approach for MRI segmentation, translated multichannel segmentation (TMS), where given modalities, along with predicted modalities, are segmented by fully convolutional networks (FCN) in a multi-channel manner. Both these two methods successfully adopt the cross-modality information to improve the performance without adding any extra data. Experiments demonstrate that our proposed framework advances the state-of-the-art on five MRI datasets. We also observe encouraging results in cross-modality registration and segmentation on some widely adopted datasets. Overall, our work can serve as an auxiliary method in clinical diagnosis and be applied to various tasks in medical fields. Keywords: Image-to-image, cross-modality, registration, segmentation, MRI
- Cell movement in the early phase of C. elegans development is regulated by a highly complex process in which a set of rules and connections are formulated at distinct scales. Previous efforts have demonstrated that agent-based, multi-scale modeling systems can integrate physical and biological rules and provide new avenues to study developmental systems. However, the application of these systems to model cell movement is still challenging and requires a comprehensive understanding of regulation networks at the right scales. Recent developments in deep learning and reinforcement learning provide an unprecedented opportunity to explore cell movement using 3D time-lapse microscopy images. We presented a deep reinforcement learning approach within an agent-based modeling system to characterize cell movement in the embryonic development of C. elegans. We tested our model through two scenarios within real developmental processes: the anterior movement of the Cpaaa cell via intercalation and the restoration of the superficial left-right symmetry. Our modeling system overcame the local optimization problems encountered by traditional rule-based, agent-based modeling by using greedy algorithms. It also overcame the computational challenges in the action selection which has been plagued by the traditional tabular-based reinforcement learning approach. Our system can automatically explore the cell movement path by using live microscopy images and it can provide a unique capability to model cell movement scenarios where regulatory mechanisms are not well studied. In addition, our system can be used to explore potential paths of a cell under different regulatory mechanisms or to facilitate new hypotheses for explaining certain cell movement behaviors.
- Jan 09 2018 cs.NI arXiv:1801.02077v1In this paper, we propose a two-layer framework to learn the optimal handover (HO) controllers in possibly large-scale wireless systems supporting mobile Internet-of-Things (IoT) users or traditional cellular users, where the user mobility patterns could be heterogeneous. In particular, our proposed framework first partitions the user equipments (UEs) with different mobility patterns into clusters, where the mobility patterns are similar in the same cluster. Then, within each cluster, an asynchronous multi-user deep reinforcement learning scheme is developed to control the HO processes across the UEs in each cluster, in the goal of lowering the HO rate while ensuring certain system throughput. In this scheme, we use a deep neural network (DNN) as an HO controller learned by each UE via reinforcement learning in a collaborative fashion. Moreover, we use supervised learning in initializing the DNN controller before the execution of reinforcement learning to exploit what we already know with traditional HO schemes and to mitigate the negative effects of random exploration at the initial stage. Furthermore, we show that the adopted global-parameter-based asynchronous framework enables us to train faster with more UEs, which could nicely address the scalability issue to support large systems. Finally, simulation results demonstrate that the proposed framework can achieve better performance than the state-of-art on-line schemes, in terms of HO rates.
- Jan 09 2018 cs.CV arXiv:1801.02475v2In this paper, we proposed a effective but extensible residual one-dimensional convolution neural network as base network, based on the this network, we proposed four subnets to explore the features of skeleton sequences from each aspect. Given a skeleton sequences, the spatial information are encoded into the skeleton joints coordinate in a frame and the temporal information are present by multiple frames. Limited by the skeleton sequence representations, two-dimensional convolution neural network cannot be used directly, we chose one-dimensional convolution layer as the basic layer. Each sub network could extract discriminative features from different aspects. Our first subnet is a two-stream network which could explore both temporal and spatial information. The second is a body-parted network, which could gain micro spatial features and macro temporal features. The third one is an attention network, the main contribution of which is to focus the key frames and feature channels which high related with the action classes in a skeleton sequence. One frame-difference network, as the last subnet, mainly processes the joints changes between the consecutive frames. Four subnets ensemble together by late fusion, the key problem of ensemble method is each subnet should have a certain performance and between the subnets, there are diversity existing. Each subnet shares a wellperformance basenet and differences between subnets guaranteed the diversity. Experimental results show that the ensemble network gets a state-of-the-art performance on three widely used datasets.
- Decoupled fractional Laplacian wave equation can describe the seismic wave propagation in attenuating media. Fourier pseudospectral implementations, which solve the equation in spatial frequency domain, are the only existing methods for solving the equation. For the earth media with curved boundaries, the pseudospectral methods could be less attractive to handle the irregular computational domains. In the paper, we propose a radial basis function collocation method that can easily tackle the irregular domain problems. Unlike the pseudospectral methods, the proposed method solves the equation in physical variable domain. The directional fractional Laplacian is chosen from varied definitions of fractional Laplacian. Particularly, the vector Grünwald-Letnikov formula is employed to approximate fractional directional derivative of radial basis function. The convergence and stability of the method are numerically investigated by using the synthetic solution and the long-time simulations, respectively. The method's flexibility is studied by considering homogeneous and multi-layer media having regular and irregular geometric boundaries.
- Jan 04 2018 cs.CV arXiv:1801.00926v3Glaucoma is a chronic eye disease that leads to irreversible vision loss. The cup to disc ratio (CDR) plays an important role in the screening and diagnosis of glaucoma. Thus, the accurate and automatic segmentation of optic disc (OD) and optic cup (OC) from fundus images is a fundamental task. Most existing methods segment them separately, and rely on hand-crafted visual feature from fundus images. In this paper, we propose a deep learning architecture, named M-Net, which solves the OD and OC segmentation jointly in a one-stage multi-label system. The proposed M-Net mainly consists of multi-scale input layer, U-shape convolutional network, side-output layer, and multi-label loss function. The multi-scale input layer constructs an image pyramid to achieve multiple level receptive field sizes. The U-shape convolutional network is employed as the main body network structure to learn the rich hierarchical representation, while the side-output layer acts as an early classifier that produces a companion local prediction map for different scale layers. Finally, a multi-label loss function is proposed to generate the final segmentation map. For improving the segmentation performance further, we also introduce the polar transformation, which provides the representation of the original image in the polar coordinate system. The experiments show that our M-Net system achieves state-of-the-art OD and OC segmentation result on ORIGA dataset. Simultaneously, the proposed method also obtains the satisfactory glaucoma screening performances with calculated CDR value on both ORIGA and SCES datasets.
- Jan 03 2018 cs.NI arXiv:1801.00095v1With the rapid growth of congestion-sensitive and data-intensive applications, traditional settlement-free peering agreements with best-effort delivery often do not meet the QoS requirements of content providers (CPs). Meanwhile, Internet access providers (IAPs) feel that revenues from end-users are not sufficient to recoup the upgrade costs of network infrastructures. Consequently, some IAPs have begun to offer CPs a new type of peering agreement, called paid peering, under which they provide CPs with better data delivery quality for a fee. In this paper, we model a network platform where an IAP makes decisions on the peering types offered to CPs and the prices charged to CPs and end-users. We study the optimal peering schemes for the IAP, i.e., to offer CPs both the paid and settlement-free peering to choose from or only one of them, as the objective is profit or welfare maximization. Our results show that 1) the IAP should always offer the paid and settlement-free peering under the profit-optimal and welfare-optimal schemes, respectively, 2) whether to simultaneously offer the other peering type is largely driven by the type of data traffic, e.g., text or video, and 3) regulators might want to encourage the IAP to allocate more network capacity to the settlement-free peering for increasing user welfare.
- Social awareness and social ties are becoming increasingly fashionable with emerging mobile and handheld devices. Social trust degree describing the strength of the social ties has drawn lots of research interests in many fields including secure cooperative communications. Such trust degree reflects the users' willingness for cooperation, which impacts the selection of the cooperative users in the practical networks. In this paper, we propose a cooperative relay and jamming selection scheme to secure communication based on the social trust degree under a stochastic geometry framework. We aim to analyze the involved secrecy outage probability (SOP) of the system's performance. To achieve this target, we propose a double Gamma ratio (DGR) approach through Gamma approximation. Based on this, the SOP is tractably obtained in closed form. The simulation results verify our theoretical findings, and validate that the social trust degree has dramatic influences on the network's secrecy performance.
- This paper proposes a distributed multiple relay selection scheme to maximize the satisfaction experiences of unmanned aerial vehicles (UAV) communication networks. The multi-radio and multi-channel (MRMC) UAV communication system is considered in this paper. One source UAV can select one or more relay radios, and each relay radio can be shared by multiple source UAVs equally. Without the center controller, source UAVs with heterogeneous requirements compete for channels dominated by relay radios. In order to optimize the global satisfaction performance, we model the UAV communication network as a many-to-many matching market without substitutability. We design a potential matching approach to address the optimization problem, in which the optimizing of local matching process will lead to the improvement of global matching results. Simulation results show that the proposed distributed matching approach yields good matching performance of satisfaction, which is close to the global optimum result. Moreover, the many-to-many potential matching approach outperforms existing schemes sufficiently in terms of global satisfaction within a reasonable convergence time.
- This paper investigates an energy-efficient non-orthogonal transmission design problem for two downlink receivers that have strict reliability and finite blocklength (latency) constraints. The Shannon capacity formula widely used in traditional designs needs the assumption of infinite blocklength and thus is no longer appropriate. We adopt the newly finite blocklength coding capacity formula for explicitly specifying the trade-off between reliability and code blocklength. However, conventional successive interference cancellation (SIC) may become infeasible due to heterogeneous blocklengths. We thus consider several scenarios with different channel conditions and with/without SIC. By carefully examining the problem structure, we present in closed-form the optimal power and code blocklength for energy-efficient transmissions. Simulation results provide interesting insights into conditions for which non-orthogonal transmission is more energy efficient than the orthogonal transmission such as TDMA.
- Dec 01 2017 cs.CV arXiv:1711.11317v2The visual attributes of cells, such as the nuclear morphology and chromatin openness, are critical for histopathology image analysis. By learning cell-level visual representation, we can obtain a rich mix of features that are highly reusable for various tasks, such as cell-level classification, nuclei segmentation, and cell counting. In this paper, we propose a unified generative adversarial networks architecture with a new formulation of loss to perform robust cell-level visual representation learning in an unsupervised setting. Our model is not only label-free and easily trained but also capable of cell-level unsupervised classification with interpretable visualization, which achieves promising results in the unsupervised classification of bone marrow cellular components. Based on the proposed cell-level visual representation learning, we further develop a pipeline that exploits the varieties of cellular elements to perform histopathology image classification, the advantages of which are demonstrated on bone marrow datasets.
- Nov 30 2017 cs.CV arXiv:1711.10658v2Recently, many methods of person re-identification (Re-ID) rely on part-based feature representation to learn a discriminative pedestrian descriptor. However, the spatial context between these parts is ignored for the independent extractor to each separate part. In this paper, we propose to apply Long Short-Term Memory (LSTM) in an end-to-end way to model the pedestrian, seen as a sequence of body parts from head to foot. Integrating the contextual information strengthens the discriminative ability of local representation. We also leverage the complementary information between local and global feature. Furthermore, we integrate both identification task and ranking task in one network, where a discriminative embedding and a similarity measurement are learned concurrently. This results in a novel three-branch framework named Deep-Person, which learns highly discriminative features for person Re-ID. Experimental results demonstrate that Deep-Person outperforms the state-of-the-art methods by a large margin on three challenging datasets including Market-1501, CUHK03, and DukeMTMC-reID. Specifically, combining with a re-ranking approach, we achieve a 90.84% mAP on Market-1501 under single query setting.
- Nov 27 2017 cs.CV arXiv:1711.08608v2We propose a registration algorithm for 2D CT/MRI medical images with a new unsupervised end-to-end strategy using convolutional neural networks. The contributions of our algorithm are threefold: (1) We transplant traditional image registration algorithms to an end-to-end convolutional neural network framework, while maintaining the unsupervised nature of image registration problems. The image-to-image integrated framework can simultaneously learn both image features and transformation matrix for registration. (2) Training with additional data without any label can further improve the registration performance by approximately 10 %. (3) The registration speed is 100x faster than traditional methods. The proposed network is easy to implement and can be trained efficiently. Experiments demonstrate that our system achieves state-of-the-art results on 2D brain registration and achieves comparable results on 2D liver registration. It can be extended to register other organs beyond liver and brain such as kidney, lung, and heart.
- First-order methods have been popularly used for solving large-scale problems. However, many existing works only consider unconstrained problems or those with simple constraint. In this paper, we develop two first-order methods for constrained convex programs, for which the constraint set is represented by affine equations and smooth nonlinear inequalities. Both methods are based on the classic augmented Lagrangian function. They update the multipliers in the same way as the augmented Lagrangian method (ALM) but employ different primal variable updates. The first method, at each iteration, performs a single proximal gradient step to the primal variable, and the second method is a block update version of the first one. For the first method, we establish its global iterate convergence as well as global sublinear and local linear convergence, and for the second method, we show a global sublinear convergence result in expectation. Numerical experiments are carried out on the basis pursuit denoising and a convex quadratically constrained quadratic program to show the empirical performance of the proposed methods. Their numerical behaviors closely match the established theoretical results.
- Minimizing job scheduling time is a fundamental issue in data center networks that has been extensively studied in recent years. The incoming jobs require different CPU and memory units, and span different number of time slots. The traditional solution is to design efficient heuristic algorithms with performance guarantee under certain assumptions. In this paper, we improve a recently proposed job scheduling algorithm using deep reinforcement learning and extend it to multiple server clusters. Our study reveals that deep reinforcement learning method has the potential to outperform traditional resource allocation algorithms in a variety of complicated environments.
- Augmented Lagrangian method (ALM) has been popularly used for solving constrained optimization problems. Its convergence and local convergence speed have been extensively studied. However, its global convergence rate is still open for problems with nonlinear inequality constraints. In this paper, we work on general constrained convex programs. For these problems, we establish the global convergence rate of ALM and its inexact variants. We first assume exact solution to each subproblem in the ALM framework and establish an $O(1/k)$ ergodic convergence result, where $k$ is the number of iterations. Then we analyze an inexact ALM that approximately solves the subproblems. Assuming summable errors, we prove that the inexact ALM also enjoys $O(1/k)$ convergence if smaller stepsizes are used in the multiplier updates. Furthermore, we apply the inexact ALM to a constrained composite convex problem with each subproblem solved by Nesterov's optimal first-order method. We show that $O(\varepsilon^{-\frac{3}{2}-\delta})$ gradient evaluations are sufficient to guarantee an $\varepsilon$-optimal solution in terms of both primal objective and feasibility violation, where $\delta$ is an arbitrary positive number. Finally, for constrained smooth problems, we modify the inexact ALM by adding a proximal term to each subproblem and improve the iteration complexity to $O(\varepsilon^{-1}|\log\varepsilon|)$.
- Nov 16 2017 cs.CL arXiv:1711.04964v1This paper presents a new MRC model that is capable of three key comprehension skills: 1) handling rich variations in question types; 2) understanding potential answer choices; and 3) drawing inference through multiple sentences. The model is based on the proposed MUlti-Strategy Inference for Comprehension (MUSIC) architecture, which is able to dynamically apply different attention strategies to different types of questions on the fly. By incorporating a multi-step inference engine analogous to ReasoNet (Shen et al., 2017), MUSIC can also effectively perform multi-sentence inference in generating answers. Evaluation on the RACE dataset shows that the proposed method significantly outperforms previous state-of-the-art models by 7.5% in relative accuracy.
- Source separation (SS) aims to separate individual sources from an audio recording. Sound event detection (SED) aims to detect sound events from an audio recording. We propose a joint separation-classification (JSC) model trained only on weakly labelled audio data, that is, only the tags of an audio recording are known but the time of the events are unknown. First, we propose a separation mapping from the time-frequency (T-F) representation of an audio to the T-F segmentation masks of the audio events. Second, a classification mapping is built from each T-F segmentation mask to the presence probability of each audio event. In the source separation stage, sources of audio events and time of sound events can be obtained from the T-F segmentation masks. The proposed method achieves an equal error rate (EER) of 0.14 in SED, outperforming deep neural network baseline of 0.29. Source separation SDR of 8.08 dB is obtained by using global weighted rank pooling (GWRP) as probability mapping, outperforming the global max pooling (GMP) based probability mapping giving SDR at 0.03 dB. Source code of our work is published.
- We analyse a linear regression problem with nonconvex regularization called smoothly clipped absolute deviation (SCAD) under an overcomplete Gaussian basis for Gaussian random data. We propose an approximate message passing (AMP) algorithm considering nonconvex regularization, namely SCAD-AMP, and analytically show that the stability condition corresponds to the de Almeida--Thouless condition in spin glass literature. Through asymptotic analysis, we show the correspondence between the density evolution of SCAD-AMP and the replica symmetric solution. Numerical experiments confirm that for a sufficiently large system size, SCAD-AMP achieves the optimal performance predicted by the replica method. Through replica analysis, a phase transition between replica symmetric (RS) and replica symmetry breaking (RSB) region is found in the parameter space of SCAD. The appearance of the RS region for a nonconvex penalty is a significant advantage that indicates the region of smooth landscape of the optimization problem. Furthermore, we analytically show that the statistical representation performance of the SCAD penalty is better than that of L1-based methods, and the minimum representation error under RS assumption is obtained at the edge of the RS/RSB phase. The correspondence between the convergence of the existing coordinate descent algorithm and RS/RSB transition is also indicated.
- This paper investigates the classification of the Audio Set dataset. Audio Set is a large scale weakly labelled dataset of sound clips. Previous work used multiple instance learning (MIL) to classify weakly labelled data. In MIL, a bag consists of several instances, and a bag is labelled positive if at least one instances in the audio clip is positive. A bag is labelled negative if all the instances in the bag are negative. We propose an attention model to tackle the MIL problem and explain this attention model from a novel probabilistic perspective. We define a probability space on each bag, where each instance in the bag has a trainable probability measure for each class. Then the classification of a bag is the expectation of the classification output of the instances in the bag with respect to the learned probability measure. Experimental results show that our proposed attention model modeled by fully connected deep neural network obtains mAP of 0.327 on Audio Set dataset, outperforming the Google's baseline of 0.314 and recurrent neural network of 0.325.
- This paper proposes a practical approach for automatic sleep stage classification based on a multi-level feature learning framework and Recurrent Neural Network (RNN) classifier using heart rate and wrist actigraphy derived from a wearable device. The feature learning framework is designed to extract low- and mid-level features. Low-level features capture temporal and frequency domain properties and mid-level features learn compositions and structural information of signals. Since sleep staging is a sequential problem with long-term dependencies, we take advantage of RNNs with Bidirectional Long Short-Term Memory (BLSTM) architectures for sequence data learning. To simulate the actual situation of daily sleep, experiments are conducted with a resting group in which sleep is recorded in resting state, and a comprehensive group in which both resting sleep and non-resting sleep are included.We evaluate the algorithm based on an eight-fold cross validation to classify five sleep stages (W, N1, N2, N3, and REM). The proposed algorithm achieves weighted precision, recall and F1 score of 58.0%, 60.3%, and 58.2% in the resting group and 58.5%, 61.1%, and 58.5% in the comprehensive group, respectively. Various comparison experiments demonstrate the effectiveness of feature learning and BLSTM. We further explore the influence of depth and width of RNNs on performance. Our method is specially proposed for wearable devices and is expected to be applicable for long-term sleep monitoring at home. Without using too much prior domain knowledge, our method has the potential to generalize sleep disorder detection.
- Connections between nodes of fully connected neural networks are usually represented by weight matrices. In this article, functional transfer matrices are introduced as alternatives to the weight matrices: Instead of using real weights, a functional transfer matrix uses real functions with trainable parameters to represent connections between nodes. Multiple functional transfer matrices are then stacked together with bias vectors and activations to form deep functional transfer neural networks. These neural networks can be trained within the framework of back-propagation, based on a revision of the delta rules and the error transmission rule for functional connections. In experiments, it is demonstrated that the revised rules can be used to train a range of functional connections: 20 different functions are applied to neural networks with up to 10 hidden layers, and most of them gain high test accuracies on the MNIST database. It is also demonstrated that a functional transfer matrix with a memory function can roughly memorise a non-cyclical sequence of 400 digits.
- Power plant is a complex and nonstationary system for which the traditional machine learning modeling approaches fall short of expectations. The ensemble-based online learning methods provide an effective way to continuously learn from the dynamic environment and autonomously update models to respond to environmental changes. This paper proposes such an online ensemble regression approach to model power plant performance, which is critically important for operation optimization. The experimental results on both simulated and real data show that the proposed method can achieve performance with less than 1% mean average percentage error, which meets the general expectations in field operations.
- Data-driven predictive analytics are in use today across a number of industrial applications, but further integration is hindered by the requirement of similarity among model training and test data distributions. This paper addresses the need of learning from possibly nonstationary data streams, or under concept drift, a commonly seen phenomenon in practical applications. A simple dual-learner ensemble strategy, alternating learners framework, is proposed. A long-memory model learns stable concepts from a long relevant time window, while a short-memory model learns transient concepts from a small recent window. The difference in prediction performance of these two models is monitored and induces an alternating policy to select, update and reset the two models. The method features an online updating mechanism to maintain the ensemble accuracy, and a concept-dependent trigger to focus on relevant data. Through empirical studies the method demonstrates effective tracking and prediction when the steaming data carry abrupt and/or gradual changes.
- In this paper, we propose a pose grammar to tackle the problem of 3D human pose estimation. Our model directly takes 2D pose as input and learns a generalized 2D-3D mapping function. The proposed model consists of a base network which efficiently captures pose-aligned features and a hierarchy of Bi-directional RNNs (BRNN) on the top to explicitly incorporate a set of knowledge regarding human body configuration (i.e., kinematics, symmetry, motor coordination). The proposed model thus enforces high-level constraints over human poses. In learning, we develop a pose sample simulator to augment training samples in virtual camera views, which further improves our model generalizability. We validate our method on public 3D human pose benchmarks and propose a new evaluation protocol working on cross-view setting to verify the generalization capability of different methods. We empirically observe that most state-of-the-art methods encounter difficulty under such setting while our method can well handle such challenges.
- This letter investigates the problem of anti-jamming communications in dynamic and unknown environment through on-line learning. Different from existing studies which need to know (estimate) the jamming patterns and parameters, we use the spectrum waterfall, i.e., the raw spectrum environment, directly. Firstly, to cope with the challenge of infinite state of raw spectrum information, a deep anti-jamming Q-network is constructed. Then, a deep anti-jamming reinforcement learning algorithm is proposed to obtain the optimal anti-jamming strategies. Finally, simulation results validate the the proposed approach. The proposed approach is relying only on the local observed information and does not need to estimate the jamming patterns and parameters, which implies that it can be widely used various anti-jamming scenarios.
- Oct 10 2017 cs.CV arXiv:1710.03011v1Almost all existing visual saliency models focus on predicting a universal saliency map across all observers. Yet psychology studies suggest that visual attention of different observers can vary a lot under some specific circumstances, especially when they view scenes with multiple salient objects. However, few work explores this visual attention difference probably because of lacking a proper dataset, as well as complex correlation between visual attention, personal preferences, and image contents. In this paper, we set out to study this heterogenous visual attention pattern between different observers and build the first dataset for personalized saliency detection. Further, we propose to decompose a personalized saliency map (referred to as PSM) into a universal saliency map (referred to as USM) which can be predicted by any existing saliency detection models and a discrepancy between them. Then personalized saliency detection is casted as the task of discrepancy estimation between PSM and USM. To tackle this task we propose two solutions: i) The discrepancy estimation for different observers are casted as different but related tasks. Then we feed the image and its USM into a multi-task convolutional neural network framework to estimate the discrepancy between PSM and USM for each observer; ii) As the discrepancy is related to both image contents and the observers' person-specific information, we feed the image and its associated USM into a convolutional neural network with person-specific information encoded filters to estimate the discrepancy. Extensive experimental results demonstrate the effectiveness of our models for PSM prediction as well their generalization capability for unseen observers.
- Oct 06 2017 cs.GT arXiv:1710.01918v1This paper investigates the incentive mechanism design from a novel and practically important perspective in which mobile users as contributors do not join simultaneously and a requester desires large efforts from early contributors. A two-stage Tullock contest framework is constructed:at the second stage the potential contributors compete for splittable reward by exerting efforts, and at the first stage the requester can orchestrate the incentive mechanism to maximize his crowdsensing efficiency given the rewarding budget. A general reward discrimination mechanism is developed for timeliness sensitive crowdsensing where an earlier contributor usually has a larger maximum achievable reward and thus allocates more efforts. Owning to the lack of joining time information, two practical implementations, namely earliest-n and termination time, are announced to the contributors. For each of them, we formulate a Stackelberg Bayesian game in which the joining time of a contributor is his type and not available to his opponents. The uniqueness of Bayesian Nash equilibrium (BNE) is proved in each strategy. To maximize the requester's efficiency, we compute the optimal number of rewarded contributors in the earliest-n scheme and the optimal deadline in the termination time scheme. Our contest framework is applicable not only to the closed crowdsensing with fixed number of contributors, but also to the open crowdsensing that the arrival of contributors is governed by a stochastic process. Extensive simulations manifest that with appropriate reward discriminations, the requester is able to achieve a much higher efficiency with the optimal selection of the number of rewarded contributiors and the termination time.
- In this paper, we present a gated convolutional neural network and a temporal attention-based localization method for audio classification, which won the 1st place in the large-scale weakly supervised sound event detection task of Detection and Classification of Acoustic Scenes and Events (DCASE) 2017 challenge. The audio clips in this task, which are extracted from YouTube videos, are manually labeled with one or a few audio tags but without timestamps of the audio events, which is called as weakly labeled data. Two sub-tasks are defined in this challenge including audio tagging and sound event detection using this weakly labeled data. A convolutional recurrent neural network (CRNN) with learnable gated linear units (GLUs) non-linearity applied on the log Mel spectrogram is proposed. In addition, a temporal attention method is proposed along the frames to predicate the locations of each audio event in a chunk from the weakly labeled data. We ranked the 1st and the 2nd as a team in these two sub-tasks of DCASE 2017 challenge with F value 55.6\% and Equal error 0.73, respectively.
- Sep 20 2017 cs.GT arXiv:1709.06367v1We consider the well-studied game-theoretic version of machine scheduling in which jobs correspond to self-interested users and machines correspond to resources. Here each user chooses a machine trying to minimize her own cost, and such selfish behavior typically results in some equilibrium which is not globally optimal: An equilibrium is an allocation where no user can reduce her own cost by moving to another machine, which in general need not minimize the makespan, i.e., the maximum load over the machines. We provide tight bounds on two well-studied notions in algorithmic game theory, namely, the price of anarchy and the strong price of anarchy on machine scheduling setting which lies in between the related and the unrelated machine case. Both notions study the social cost (makespan) of the worst equilibrium compared to the optimum, with the strong price of anarchy restricting to a stronger form of equilibria. Our results extend a prior study comparing the price of anarchy to the strong price of anarchy for two related machines (Epstein, Acta Informatica 2010), thus providing further insights on the relation between these concepts. Our exact bounds give a qualitative and quantitative comparison between the two models. The bounds also show that the setting is indeed easier than the two unrelated machines: In the latter, the strong price of anarchy is $2$, while in ours it is strictly smaller.
- Tracking humans that are interacting with the other subjects or environment remains unsolved in visual tracking, because the visibility of the human of interests in videos is unknown and might vary over times. In particular, it is still difficult for state-of-the-art human trackers to recover complete human trajectories in crowded scenes with frequent human interactions. In this work, we consider the visibility status of a subject as a fluent variable, whose changes are mostly attributed to the subject's interactions with the surrounding, e.g., crossing behind another objects, entering a building, or getting into a vehicle, etc. We introduce a Causal And-Or Graph (C-AOG) to represent the causal-effect relations between an object's visibility fluents and its activities, and develop a probabilistic graph model to jointly reason the visibility fluent change (e.g., from visible to invisible) and track humans in videos. We formulate the above joint task as an iterative search of feasible causal graph structure that enables fast search algorithm, e.g., dynamic programming method. We apply the proposed method on challenging video sequences to evaluate its capabilities of estimating visibility fluent changes of subjects and tracking subjects of interests over time. Results with comparisons demonstrated that our method clearly outperforms the alternative trackers and can recover complete trajectories of humans in complicated scenarios with frequent human interactions.
- Cross-view video understanding is an important yet under-explored area in computer vision. In this paper, we introduce a joint parsing framework that integrates view-centric proposals into scene-centric parse graphs that represent a coherent scene-centric understanding of cross-view scenes. Our key observations are that overlapping fields of views embed rich appearance and geometry correlations and that knowledge fragments corresponding to individual vision tasks are governed by consistency constraints available in commonsense knowledge. The proposed joint parsing framework represents such correlations and constraints explicitly and generates semantic scene-centric parse graphs. Quantitative experiments show that scene-centric predictions in the parse graph outperform view-centric predictions.
- Sep 14 2017 cs.CV arXiv:1709.04344v3How to effectively approximate real-valued parameters with binary codes plays a central role in neural network binarization. In this work, we reveal an important fact that binarizing different layers has a widely-varied effect on the compression ratio of network and the loss of performance. Based on this fact, we propose a novel and flexible neural network binarization method by introducing the concept of layer-wise priority which binarizes parameters in inverse order of their layer depth. In each training step, our method selects a specific network layer, minimizes the discrepancy between the original real-valued weights and its binary approximations, and fine-tunes the whole network accordingly. During the iteration of the above process, it is significant that we can flexibly decide whether to binarize the remaining floating layers or not and explore a trade-off between the loss of performance and the compression ratio of model. The resulting binary network is applied for efficient pedestrian detection. Extensive experimental results on several benchmarks show that under the same compression ratio, our method achieves much lower miss rate and faster detection speed than the state-of-the-art neural network binarization method.
- Sep 12 2017 cs.CV arXiv:1709.03272v3In this paper, we introduce a novel end-end framework for multi-oriented scene text detection from an instance-aware semantic segmentation perspective. We present Fused Text Segmentation Networks, which combine multi-level features during the feature extracting as text instance may rely on finer feature expression compared to general objects. It detects and segments the text instance jointly and simultaneously, leveraging merits from both semantic segmentation task and region proposal based object detection task. Not involving any extra pipelines, our approach surpasses the current state of the art on multi-oriented scene text detection benchmarks: ICDAR2015 Incidental Scene Text and MSRA-TD500 reaching Hmean 84.1% and 82.0% respectively. Morever, we report a baseline on total-text containing curved text which suggests effectiveness of the proposed approach.
- Sep 08 2017 cs.CV arXiv:1709.02054v3Scene text recognition has been a hot research topic in computer vision due to its various applications. The state of the art is the attention-based encoder-decoder framework that learns the mapping between input images and output sequences in a purely data-driven way. However, we observe that existing attention-based methods perform poorly on complicated and/or low-quality images. One major reason is that existing methods cannot get accurate alignments between feature areas and targets for such images. We call this phenomenon "attention drift". To tackle this problem, in this paper we propose the FAN (the abbreviation of Focusing Attention Network) method that employs a focusing attention mechanism to automatically draw back the drifted attention. FAN consists of two major components: an attention network (AN) that is responsible for recognizing character targets as in the existing methods, and a focusing network (FN) that is responsible for adjusting attention by evaluating whether AN pays attention properly on the target areas in the images. Furthermore, different from the existing methods, we adopt a ResNet-based network to enrich deep representations of scene text images. Extensive experiments on various benchmarks, including the IIIT5k, SVT and ICDAR datasets, show that the FAN method substantially outperforms the existing methods.
- Sep 05 2017 cs.SD arXiv:1709.00551v2In this technique report, we present a bunch of methods for the task 4 of Detection and Classification of Acoustic Scenes and Events 2017 (DCASE2017) challenge. This task evaluates systems for the large-scale detection of sound events using weakly labeled training data. The data are YouTube video excerpts focusing on transportation and warnings due to their industry applications. There are two tasks, audio tagging and sound event detection from weakly labeled data. Convolutional neural network (CNN) and gated recurrent unit (GRU) based recurrent neural network (RNN) are adopted as our basic framework. We proposed a learnable gating activation function for selecting informative local features. Attention-based scheme is used for localizing the specific events in a weakly-supervised mode. A new batch-level balancing strategy is also proposed to tackle the data unbalancing problem. Fusion of posteriors from different systems are found effective to improve the performance. In a summary, we get 61% F-value for the audio tagging subtask and 0.73 error rate (ER) for the sound event detection subtask on the development set. While the official multilayer perceptron (MLP) based baseline just obtained 13.1% F-value for the audio tagging and 1.02 for the sound event detection.
- Aug 22 2017 cs.AI arXiv:1708.05930v1In this paper, a new type of 3D bin packing problem (BPP) is proposed, in which a number of cuboid-shaped items must be put into a bin one by one orthogonally. The objective is to find a way to place these items that can minimize the surface area of the bin. This problem is based on the fact that there is no fixed-sized bin in many real business scenarios and the cost of a bin is proportional to its surface area. Our research shows that this problem is NP-hard. Based on previous research on 3D BPP, the surface area is determined by the sequence, spatial locations and orientations of items. Among these factors, the sequence of items plays a key role in minimizing the surface area. Inspired by recent achievements of deep reinforcement learning (DRL) techniques, especially Pointer Network, on combinatorial optimization problems such as TSP, a DRL-based method is applied to optimize the sequence of items to be packed into the bin. Numerical results show that the method proposed in this paper achieve about 5% improvement than heuristic method.
- Existing block-diagonal representation researches mainly focuses on casting block-diagonal regularization on training data, while only little attention is dedicated to concurrently learning both block-diagonal representations of training and test data. In this paper, we propose a discriminative block-diagonal low-rank representation (BDLRR) method for recognition. In particular, the elaborate BDLRR is formulated as a joint optimization problem of shrinking the unfavorable representation from off-block-diagonal elements and strengthening the compact block-diagonal representation under the semi-supervised framework of low-rank representation. To this end, we first impose penalty constraints on the negative representation to eliminate the correlation between different classes such that the incoherence criterion of the extra-class representation is boosted. Moreover, a constructed subspace model is developed to enhance the self-expressive power of training samples and further build the representation bridge between the training and test samples, such that the coherence of the learned intra-class representation is consistently heightened. Finally, the resulting optimization problem is solved elegantly by employing an alternative optimization strategy, and a simple recognition algorithm on the learned representation is utilized for final prediction. Extensive experimental results demonstrate that the proposed method achieves superb recognition results on four face image datasets, three character datasets, and the fifteen scene multi-categories dataset. It not only shows superior potential on image recognition but also outperforms state-of-the-art methods.
- This paper investigates the application of simultaneous wireless information and power transfer (SWIPT) to cooperative non-orthogonal multiple access (NOMA). A new cooperative multiple-input single-output (MISO) SWIPT NOMA protocol is proposed, where a user with a strong channel condition acts as an energy-harvesting (EH) relay to help a user with a poor channel condition. The power splitting (PS) scheme is adopted at the EH relay. By jointly optimizing the PS ratio and the beamforming vectors, the design objective is to maximize the data rate of the "strong user" while satisfying the QoS requirement of the "weak user". It boils down to a challenging nonconvex problem. To resolve this issue, the semidefinite relaxation (SDR) technique is applied to relax the quadratic terms related with the beamformers, and then it is solved to its global optimality by two-dimensional exhaustive search. We prove the rank-one optimality, which establishes the equivalence between the relaxed problem and the original one. To further reduce the high complexity due to the exhaustive search, an iterative algorithm based on successive convex approximation (SCA) is proposed, which can at least attain its stationary point efficiently. In view of the potential application scenarios, e.g., IoT, the single-input single-output (SISO) case of the cooperative SWIPT NOMA system is also studied. The formulated problem is proved to be strictly unimodal with respect to the PS ratio. Hence, a golden section search (GSS) based algorithm with closed-form solution at each step is proposed to find the unique global optimal solution. It is worth pointing out that the SCA method can also converge to the optimal solution in SISO cases. In the numerical simulation, the proposed algorithm is numerically shown to converge within a few iterations, and the SWIPT-aided NOMA protocol outperforms the existing transmission protocols.
- This paper provides a unified account of two schools of thinking in information retrieval modelling: the generative retrieval focusing on predicting relevant documents given a query, and the discriminative retrieval focusing on predicting relevancy given a query-document pair. We propose a game theoretical minimax game to iteratively optimise both models. On one hand, the discriminative model, aiming to mine signals from labelled and unlabelled data, provides guidance to train the generative model towards fitting the underlying relevance distribution over documents given the query. On the other hand, the generative model, acting as an attacker to the current discriminative model, generates difficult examples for the discriminative model in an adversarial way by minimising its discrimination objective. With the competition between these two models, we show that the unified framework takes advantage of both schools of thinking: (i) the generative model learns to fit the relevance distribution over documents via the signals from the discriminative model, and (ii) the discriminative model is able to exploit the unlabelled data selected by the generative model to achieve a better estimation for document ranking. Our experimental results have demonstrated significant performance gains as much as 23.96% on Precision@5 and 15.50% on MAP over strong baselines in a variety of applications including web search, item recommendation, and question answering.
- May 30 2017 cs.NI arXiv:1705.09999v1Network Function Virtualization (NFV) shed new light for the design, deployment, and management of cloud networks. Many network functions such as firewalls, load balancers, and intrusion detection systems can be virtualized by servers. However, network operators often have to sacrifice programmability in order to achieve high throughput, especially at networks' edge where complex network functions are required. Here, we design, implement, and evaluate Hybrid Modular Switch (HyMoS). The hybrid hardware/software switch is designed to meet requirements for modern-day NFV applications in providing high-throughput, with a high degree of programmability. HyMoS utilizes P4-compatible Network Interface Cards (NICs), PCI Express interface and CPU to act as line cards, switch fabric, and fabric controller respectively. In our implementation of HyMos, PCI Express interface is turned into a non-blocking switch fabric with a throughput of hundreds of Gigabits per second. Compared to existing NFV infrastructure, HyMoS offers modularity in hardware and software as well as a higher degree of programmability by supporting a superset of P4 language.
- Determining deep holes is an important topic in decoding Reed-Solomon codes. Let $l\ge 1$ be an integer and $a_1,\ldots,a_l$ be arbitrarily given $l$ distinct elements of the finite field ${\bf F}_q$ of $q$ elements with the odd prime number $p$ as its characteristic. Let $D={\bf F}_q\backslash\{a_1,\ldots,a_l\}$ and $k$ be an integer such that $2\le k\le q-l-1$. In this paper, we study the deep holes of generalized projective Reed-Solomon code ${\rm GPRS}_q(D, k)$ of length $q-l+1$ and dimension $k$ over ${\bf F}_q$. For any $f(x)\in {\bf F}_q[x]$, we let $f(D)=(f(y_1),\ldots,f(y_{q-l}))$ if $D=\{y_1, ..., y_{q-l}\}$ and $c_{k-1}(f(x))$ be the coefficient of $x^{k-1}$ of $f(x)$. By using Dür's theorem on the relation between the covering radius and minimum distance of ${\rm GPRS}_q(D, k)$, we show that if $u(x)\in {\bf F}_q[x]$ with $\deg (u(x))=k$, then the received codeword $(u(D), c_{k-1}(u(x)))$ is a deep hole of ${\rm GPRS}_q(D, k)$ if and only if the sum $\sum\limits_{y\in I}y$ is nonzero for any subset $I\subseteq D$ with $\#(I)=k$. We show also that if $j$ is an integer with $1\leq j\leq l$ and $u_j(x):= \lambda_j(x-a_j)^{q-2}+\nu_j x^{k-1}+f_{\leq k-2}^{(j)}(x)$ with $\lambda_j\in {\bf F}_q^*$, $\nu_j\in {\bf F}_q$ and $f_{\leq{k-2}}^{(j)}(x)\in{\bf F}_q[x]$ being a polynomial of degree at most $k-2$, then $(u_j(D), c_{k-1}(u_j(x)))$ is a deep hole of ${\rm GPRS}_q(D, k)$ if and only if the sum $\binom{q-2}{k-1}(-a_j)^{q-1-k}\prod\limits_{y\in I}(a_j-y)+e$ is nonzero for any subset $I\subseteq D$ with $\#(I)=k$, where $e$ is the identity of the group ${\bf F}_q^*$. This implies that $(u_j(D), c_{k-1}(u_j(x)))$ is a deep hole of ${\rm GPRS}_q(D, k)$ if $p|k$.
- Recent several years have witnessed the surge of asynchronous (async-) parallel computing methods due to the extremely big data involved in many modern applications and also the advancement of multi-core machines and computer clusters. In optimization, most works about async-parallel methods are on unconstrained problems or those with block separable constraints. In this paper, we propose an async-parallel method based on block coordinate update (BCU) for solving convex problems with nonseparable linear constraint. Running on a single node, the method becomes a novel randomized primal-dual BCU with adaptive stepsize for multi-block affinely constrained problems. For these problems, Gauss-Seidel cyclic primal-dual BCU needs strong convexity to have convergence. On the contrary, merely assuming convexity, we show that the objective value sequence generated by the proposed algorithm converges in probability to the optimal value and also the constraint residual to zero. In addition, we establish an ergodic $O(1/k)$ convergence result, where $k$ is the number of iterations. Numerical experiments are performed to demonstrate the efficiency of the proposed method and significantly better speed-up performance than its sync-parallel counterpart.
- The minimum feedback arc set problem asks to delete a minimum number of arcs (directed edges) from a digraph (directed graph) to make it free of any directed cycles. In this work we approach this fundamental cycle-constrained optimization problem by considering a generalized task of dividing the digraph into D layers of equal size. We solve the D-segmentation problem by the replica-symmetric mean field theory and belief-propagation heuristic algorithms. The minimum feedback arc density of a given random digraph ensemble is then obtained by extrapolating the theoretical results to the limit of large D. A divide-and-conquer algorithm (nested-BPR) is devised to solve the minimum feedback arc set problem with very good performance and high efficiency.
- May 16 2017 cs.AR arXiv:1705.04981v1In this paper, we investigate the challenges to apply Statistical Static Timing Analysis (SSTA) in hierarchical design flow, where modules supplied by IP vendors are used to hide design details for IP protection and to reduce the complexity of design and verification. For the three basic circuit types, combinational, flip-flop-based and latch-controlled, we propose methods to extract timing models which contain interfacing as well as compressed internal constraints. Using these compact timing models the runtime of full-chip timing analysis can be reduced, while circuit details from IP vendors are not exposed. We also propose a method to reconstruct the correlation between modules during full-chip timing analysis. This correlation can not be incorporated into timing models because it depends on the layout of the corresponding modules in the chip. In addition, we investigate how to apply the extracted timing models with the reconstructed correlation to evaluate the performance of the complete design. Experiments demonstrate that using the extracted timing models and reconstructed correlation full-chip timing analysis can be several times faster than applying the flattened circuit directly, while the accuracy of statistical timing analysis is still well maintained.
- Background: Mining gene modules from genomic data is an important step to detect gene members of pathways or other relations such as protein-protein interactions. In this work, we explore the plausibility of detecting gene modules by factorizing gene-phenotype associations from a phenotype ontology rather than the conventionally used gene expression data. In particular, the hierarchical structure of ontology has not been sufficiently utilized in clustering genes while functionally related genes are consistently associated with phenotypes on the same path in the phenotype ontology. Results: We propose a hierarchal Nonnegative Matrix Factorization (NMF)-based method, called Consistent Multiple Nonnegative Matrix Factorization (CMNMF), to factorize genome-phenome association matrix at two levels of the hierarchical structure in phenotype ontology for mining gene functional modules. CMNMF constrains the gene clusters from the association matrices at two consecutive levels to be consistent since the genes are annotated with both the child phenotype and the parent phenotype in the consecutive levels. CMNMF also restricts the identified phenotype clusters to be densely connected in the phenotype ontology hierarchy. In the experiments on mining functionally related genes from mouse phenotype ontology and human phenotype ontology, CMNMF effectively improved clustering performance over the baseline methods. Gene ontology enrichment analysis was also conducted to reveal interesting gene modules. Conclusions: Utilizing the information in the hierarchical structure of phenotype ontology, CMNMF can identify functional gene modules with more biological significance than the conventional methods. CMNMF could also be a better tool for predicting members of gene pathways and protein-protein interactions. Availability: https://github.com/nkiip/CMNMF
- May 02 2017 cs.CV arXiv:1705.00609v1In domain adaptation, maximum mean discrepancy (MMD) has been widely adopted as a discrepancy metric between the distributions of source and target domains. However, existing MMD-based domain adaptation methods generally ignore the changes of class prior distributions, i.e., class weight bias across domains. This remains an open problem but ubiquitous for domain adaptation, which can be caused by changes in sample selection criteria and application scenarios. We show that MMD cannot account for class weight bias and results in degraded domain adaptation performance. To address this issue, a weighted MMD model is proposed in this paper. Specifically, we introduce class-specific auxiliary weights into the original MMD for exploiting the class prior probability on source and target domains, whose challenge lies in the fact that the class label in target domain is unavailable. To account for it, our proposed weighted MMD model is defined by introducing an auxiliary weight for each class in the source domain, and a classification EM algorithm is suggested by alternating between assigning the pseudo-labels, estimating auxiliary weights and updating model parameters. Extensive experiments demonstrate the superiority of our weighted MMD over conventional MMD for domain adaptation.
- There is a wide gap between symbolic reasoning and deep learning. In this research, we explore the possibility of using deep learning to improve symbolic reasoning. Briefly, in a reasoning system, a deep feedforward neural network is used to guide rewriting processes after learning from algebraic reasoning examples produced by humans. To enable the neural network to recognise patterns of algebraic expressions with non-deterministic sizes, reduced partial trees are used to represent the expressions. Also, to represent both top-down and bottom-up information of the expressions, a centralisation technique is used to improve the reduced partial trees. Besides, symbolic association vectors and rule application records are used to improve the rewriting processes. Experimental results reveal that the algebraic reasoning examples can be accurately learnt only if the feedforward neural network has enough hidden layers. Also, the centralisation technique, the symbolic association vectors and the rule application records can reduce error rates of reasoning. In particular, the above approaches have led to 4.6% error rate of reasoning on a dataset of linear equations, differentials and integrals.
- We collect and analyze the darkweb (a.k.a. the "onionweb") hyperlink graph. We find properties highly dissimilar to the well-studied world wide web hyperlink graph; for example, our analysis finds that >87% of darkweb sites never link to another site. We compare our results to prior work on world-wide-web and speculate about reasons for their differences. We conclude that in the term "darkweb", the word "web" is a connectivity misnomer. Instead, it is more accurate to view the darkweb as a set of largely isolated dark silos.
- Apr 19 2017 cs.SY arXiv:1704.05411v2Microgrids are resources that can be used to restore critical loads after a natural disaster, enhancing resilience of a distribution network. To deal with the stochastic nature of intermittent energy resources, such as wind turbines (WTs) and photovoltaics (PVs), many methods rely on forecast information. However, some microgrids may not be equipped with power forecasting tools. To fill this gap, a risk-limiting strategy based on measurements is proposed. Gaussian mixture model (GMM) is used to represent a prior joint probability density function (PDF) of power outputs of WTs and PVs over multiple periods. As time rolls forward, the distribution of WT/PV generation is updated based the latest measurement data in a recursive manner. The updated distribution is used as an input for the risk-limiting load restoration problem, enabling an equivalent transformation of the original chance constrained problem into a mixed integer linear programming (MILP). Simulation cases on a distribution system with three microgrids demonstrate the effectiveness of the proposed method. Results also indicate that networked microgrids have better uncertainty management capabilities than stand-alone microgrids.
- For quantitative structure-property relationship (QSPR) studies in chemoinformatics, it is important to get interpretable relationship between chemical properties and chemical features. However, the predictive power and interpretability of QSPR models are usually two different objectives that are difficult to achieve simultaneously. A deep learning architecture using molecular graph encoding convolutional neural networks (MGE-CNN) provided a universal strategy to construct interpretable QSPR models with high predictive power. Instead of using application-specific preset molecular descriptors or fingerprints, the models can be resolved using raw and pertinent features without manual intervention or selection. In this study, we developed acute oral toxicity (AOT) models of compounds using the MGE-CNN architecture as a case study. Three types of high-level predictive models: regression model (deepAOT-R), multi-classification model (deepAOT-C) and multi-task model (deepAOT-CR) for AOT evaluation were constructed. These models highly outperformed previously reported models. For the two external datasets containing 1673 (test set I) and 375 (test set II) compounds, the R2 and mean absolute error (MAE) of deepAOT-R on the test set I were 0.864 and 0.195, and the prediction accuracy of deepAOT-C was 95.5% and 96.3% on the test set I and II, respectively. The two external prediction accuracy of deepAOT-CR is 95.0% and 94.1%, while the R2 and MAE are 0.861 and 0.204 for test set I, respectively.
- Apr 18 2017 cs.CV arXiv:1704.04613v2Text in natural images contains rich semantics that are often highly relevant to objects or scene. In this paper, we focus on the problem of fully exploiting scene text for visual understanding. The main idea is combining word representations and deep visual features into a globally trainable deep convolutional neural network. First, the recognized words are obtained by a scene text reading system. Then, we combine the word embedding of the recognized words and the deep visual features into a single representation, which is optimized by a convolutional neural network for fine-grained image classification. In our framework, the attention mechanism is adopted to reveal the relevance between each recognized word and the given image, which further enhances the recognition performance. We have performed experiments on two datasets: Con-Text dataset and Drink Bottle dataset, that are proposed for fine-grained classification of business places and drink bottles, respectively. The experimental results consistently demonstrate that the proposed method combining textual and visual cues significantly outperforms classification with only visual representations. Moreover, we have shown that the learned representation improves the retrieval performance on the drink bottle images by a large margin, making it potentially useful in product search.
- Apr 18 2017 cs.DS arXiv:1704.04615v2Motivation: As a fundamental task in bioinformatics, searching for massive short patterns over a long text is widely accelerated by various compressed full-text indexes. These indexes are able to provide similar searching functionalities to classical indexes, e.g., suffix trees and suffix arrays, while requiring less space. For genomic data, a well-known family of compressed full-text index, called FM-indexes, presents unmatched performance in practice. One major drawback of FM-indexes is that their locating operations, which report all occurrence positions of patterns in a given text, are particularly slow, especially for the patterns with many occurrences. Results: In this paper, we introduce a novel locating algorithm, FMtree, to fast retrieve all occurrence positions of any pattern via FM-indexes. When searching for a pattern over a given text, FMtree organizes the search space of the locating operation into a conceptual quadtree. As a result, multiple occurrence positions of this pattern can be retrieved simultaneously by traversing the quadtree. Compared with the existing locating algorithms, our tree-based algorithm reduces large numbers of redundant operations and presents better data locality. Experimental results show that FMtree is usually one order of magnitude faster than the state-of-the-art algorithms, and still memory-efficient.
- Recently manifold learning algorithm for dimensionality reduction attracts more and more interests, and various linear and nonlinear, global and local algorithms are proposed. The key step of manifold learning algorithm is the neighboring region selection. However, so far for the references we know, few of which propose a generally accepted algorithm to well select the neighboring region. So in this paper, we propose an adaptive neighboring selection algorithm, which successfully applies the LLE and ISOMAP algorithms in the test. It is an algorithm that can find the optimal K nearest neighbors of the data points on the manifold. And the theoretical basis of the algorithm is the approximated curvature of the data point on the manifold. Based on Riemann Geometry, Jacob matrix is a proper mathematical concept to predict the approximated curvature. By verifying the proposed algorithm on embedding Swiss roll from R3 to R2 based on LLE and ISOMAP algorithm, the simulation results show that the proposed adaptive neighboring selection algorithm is feasible and able to find the optimal value of K, making the residual variance relatively small and better visualization of the results. By quantitative analysis, the embedding quality measured by residual variance is increased 45.45% after using the proposed algorithm in LLE.
- Traditionally, Internet Access Providers (APs) only charge end-users for Internet access services; however, to recoup infrastructure costs and increase revenues, some APs have recently adopted two-sided pricing schemes under which both end-users and content providers are charged. Meanwhile, with the rapid growth of traffic, network congestion could seriously degrade user experiences and influence providers' utility. To optimize profit and social welfare, APs and regulators need to design appropriate pricing strategies and regulatory policies that take the effects of network congestion into consideration. In this paper, we model two-sided networks under which users' traffic demands are influenced by exogenous pricing and endogenous congestion parameters and derive the system congestion under an equilibrium. We characterize the structures and sensitivities of profit- and welfare-optimal two-sided pricing schemes and reveal that 1) the elasticity of system throughput plays a crucial role in determining the structures of optimal pricing, 2) the changes of optimal pricing under varying AP's capacity and users' congestion sensitivity are largely driven by the type of data traffic, e.g., text or video, and 3) APs and regulators will be incentivized to shift from one-sided to two-sided pricing when APs' capacities and user demand for video traffic grow. Our results can help APs design optimal two-sided pricing and guide regulators to legislate desirable policies.
- In this paper, we propose a compositional nonparametric method in which a model is expressed as a labeled binary tree of $2k+1$ nodes, where each node is either a summation, a multiplication, or the application of one of the $q$ basis functions to one of the $p$ covariates. We show that in order to recover a labeled binary tree from a given dataset, the sufficient number of samples is $O(k\log(pq)+\log(k!))$, and the necessary number of samples is $\Omega(k\log (pq)-\log(k!))$. We further propose a greedy algorithm for regression in order to validate our theoretical findings through synthetic experiments.
- Apr 04 2017 cs.CV arXiv:1704.00438v2Unconstrained face recognition performance evaluations have traditionally focused on Labeled Faces in the Wild (LFW) dataset for imagery and the YouTubeFaces (YTF) dataset for videos in the last couple of years. Spectacular progress in this field has resulted in saturation on verification and identification accuracies for those benchmark datasets. In this paper, we propose a unified learning framework named Transferred Deep Feature Fusion (TDFF) targeting at the new IARPA Janus Benchmark A (IJB-A) face recognition dataset released by NIST face challenge. The IJB-A dataset includes real-world unconstrained faces from 500 subjects with full pose and illumination variations which are much harder than the LFW and YTF datasets. Inspired by transfer learning, we train two advanced deep convolutional neural networks (DCNN) with two different large datasets in source domain, respectively. By exploring the complementarity of two distinct DCNNs, deep feature fusion is utilized after feature extraction in target domain. Then, template specific linear SVMs is adopted to enhance the discrimination of framework. Finally, multiple matching scores corresponding different templates are merged as the final results. This simple unified framework exhibits excellent performance on IJB-A dataset. Based on the proposed approach, we have submitted our IJB-A results to National Institute of Standards and Technology (NIST) for official evaluation. Moreover, by introducing new data and advanced neural architecture, our method outperforms the state-of-the-art by a wide margin on IJB-A dataset.
- Mar 30 2017 cs.LG arXiv:1703.10094v2The inverse mapping of GANs'(Generative Adversarial Nets) generator has a great potential value.Hence, some works have been developed to construct the inverse function of generator by directly learning or adversarial learning.While the results are encouraging, the problem is highly challenging and the existing ways of training inverse models of GANs have many disadvantages, such as hard to train or poor performance.Due to these reasons, we propose a new approach based on using inverse generator ($IG$) model as encoder and pre-trained generator ($G$) as decoder of an AutoEncoder network to train the $IG$ model. In the proposed model, the difference between the input and output, which are both the generated image of pre-trained GAN's generator, of AutoEncoder is directly minimized. The optimizing method can overcome the difficulty in training and inverse model of an non one-to-one function.We also applied the inverse model of GANs' generators to image searching and translation.The experimental results prove that the proposed approach works better than the traditional approaches in image searching.
- Mar 22 2017 cs.SD arXiv:1703.07172v1We propose a multi-objective framework to learn both secondary targets not directly related to the intended task of speech enhancement (SE) and the primary target of the clean log-power spectra (LPS) features to be used directly for constructing the enhanced speech signals. In deep neural network (DNN) based SE we introduce an auxiliary structure to learn secondary continuous features, such as mel-frequency cepstral coefficients (MFCCs), and categorical information, such as the ideal binary mask (IBM), and integrate it into the original DNN architecture for joint optimization of all the parameters. This joint estimation scheme imposes additional constraints not available in the direct prediction of LPS, and potentially improves the learning of the primary target. Furthermore, the learned secondary information as a byproduct can be used for other purposes, e.g., the IBM-based post-processing in this work. A series of experiments show that joint LPS and MFCC learning improves the SE performance, and IBM-based post-processing further enhances listening quality of the reconstructed speech.
- Mar 20 2017 cs.SD arXiv:1703.06052v1Audio tagging aims to perform multi-label classification on audio chunks and it is a newly proposed task in the Detection and Classification of Acoustic Scenes and Events 2016 (DCASE 2016) challenge. This task encourages research efforts to better analyze and understand the content of the huge amounts of audio data on the web. The difficulty in audio tagging is that it only has a chunk-level label without a frame-level label. This paper presents a weakly supervised method to not only predict the tags but also indicate the temporal locations of the occurred acoustic events. The attention scheme is found to be effective in identifying the important frames while ignoring the unrelated frames. The proposed framework is a deep convolutional recurrent model with two auxiliary modules: an attention module and a localization module. The proposed algorithm was evaluated on the Task 4 of DCASE 2016 challenge. State-of-the-art performance was achieved on the evaluation set with equal error rate (EER) reduced from 0.13 to 0.11, compared with the convolutional recurrent baseline system.
- Mar 09 2017 cs.DS arXiv:1703.02693v2This is paper introduces a new single-pass reservoir weighted-sampling stream aggregation algorithm, Priority-Based Aggregation (PBA). While order sampling is a powerful and e cient method for weighted sampling from a stream of uniquely keyed items, there is no current algorithm that realizes the benefits of order sampling in the context of stream aggregation over non-unique keys. A naive approach to order sample regardless of key then aggregate the results is hopelessly inefficient. In distinction, our proposed algorithm uses a single persistent random variable across the lifetime of each key in the cache, and maintains unbiased estimates of the key aggregates that can be queried at any point in the stream. The basic approach can be supplemented with a Sample and Hold pre-sampling stage with a sampling rate adaptation controlled by PBA. This approach represents a considerable reduction in computational complexity compared with the state of the art in adapting Sample and Hold to operate with a fixed cache size. Concerning statistical properties, we prove that PBA provides unbiased estimates of the true aggregates. We analyze the computational complexity of PBA and its variants, and provide a detailed evaluation of its accuracy on synthetic and trace data. Weighted relative error is reduced by 40% to 65% at sampling rates of 5% to 17%, relative to Adaptive Sample and Hold; there is also substantial improvement for rank queries
- Mar 01 2017 cs.GT arXiv:1702.08794v1The recent online platforms propose multiple items for bidding. The state of the art, however, is limited to the analysis of one item auction without resubmission. In this paper we study multi-item lowest unique bid auctions (LUBA) with resubmission in discrete bid spaces under budget constraints. We show that the game does not have pure Bayes-Nash equilibria (except in very special cases). However, at least one mixed Bayes-Nash equilibria exists for arbitrary number of bidders and items. The equilibrium is explicitly computed for two-bidder setup with resubmission possibilities. In the general setting we propose a distributed strategic learning algorithm to approximate equilibria. Computer simulations indicate that the error quickly decays in few number of steps. When the number of bidders per item follows a Poisson distribution, it is shown that the seller can get a non-negligible revenue on several items, and hence making a partial revelation of the true value of the items. Finally, the attitude of the bidders towards the risk is considered. In contrast to risk-neutral agents who bids very small values, the cumulative distribution and the bidding support of risk-sensitive agents are more distributed.
- Environmental audio tagging is a newly proposed task to predict the presence or absence of a specific audio event in a chunk. Deep neural network (DNN) based methods have been successfully adopted for predicting the audio tags in the domestic audio scene. In this paper, we propose to use a convolutional neural network (CNN) to extract robust features from mel-filter banks (MFBs), spectrograms or even raw waveforms for audio tagging. Gated recurrent unit (GRU) based recurrent neural networks (RNNs) are then cascaded to model the long-term temporal structure of the audio signal. To complement the input information, an auxiliary CNN is designed to learn on the spatial features of stereo recordings. We evaluate our proposed methods on Task 4 (audio tagging) of the Detection and Classification of Acoustic Scenes and Events 2016 (DCASE 2016) challenge. Compared with our recent DNN-based method, the proposed structure can reduce the equal error rate (EER) from 0.13 to 0.11 on the development set. The spatial features can further reduce the EER to 0.10. The performance of the end-to-end learning on raw waveforms is also comparable. Finally, on the evaluation set, we get the state-of-the-art performance with 0.12 EER while the performance of the best existing system is 0.15 EER.
- Feb 22 2017 cs.DS arXiv:1702.06256v2The maximum duo-preservation string mapping (Max-Duo) problem is the complement of the well studied minimum common string partition (MCSP) problem, both of which have applications in many fields including text compression and bioinformatics. $k$-Max-Duo is the restricted version of Max-Duo, where every letter of the alphabet occurs at most $k$ times in each of the strings, which is readily reduced into the well known maximum independent set (MIS) problem on a graph of maximum degree $\Delta \le 6(k-1)$. In particular, $2$-Max-Duo can then be approximated arbitrarily close to $1.8$ using the state-of-the-art approximation algorithm for the MIS problem. $2$-Max-Duo was proved APX-hard and very recently a $(1.6 + \epsilon)$-approximation was claimed, for any $\epsilon > 0$. In this paper, we present a vertex-degree reduction technique, based on which, we show that $2$-Max-Duo can be approximated arbitrarily close to $1.4$.