We study domain-specific video streaming. Specifically, we target a streaming setting where the videos to be streamed from a server to a client are all in the same domain and they have to be compressed to a small size for low-latency transmission. Several popular video streaming services, such as the video game streaming services of GeForce Now and Twitch, fall in this category. While conventional video compression standards such as H.264 are commonly used for this task, we hypothesize that one can leverage the property that the videos are all in the same domain to achieve better video quality. Based on this hypothesis, we propose a novel video compression pipeline. Specifically, we first apply H.264 to compress domain-specific videos. We then train a novel binary autoencoder to encode the leftover domain-specific residual information frame-by-frame into binary representations. These binary representations are then compressed and sent to the client together with the H.264 stream. In our experiments, we show that our pipeline yields consistent gains over standard H.264 compression across several benchmark datasets while using the same channel bandwidth.
Nov 10 2017 cs.LG
The assumption that data samples are independently identically distributed is the backbone of many learning algorithms. Nevertheless, datasets often exhibit rich structures in practice, and we argue that there exist some unknown orders within the data instances. Aiming to find such orders, we introduce a novel Generative Markov Network (GMN) which we use to extract the order of data instances automatically. Specifically, we assume that the instances are sampled from a Markov chain. Our goal is to learn the transitional operator of the chain as well as the generation order by maximizing the generation probability under all possible data permutations. One of our key ideas is to use neural networks as a soft lookup table for approximating the possibly huge, but discrete transition matrix. This strategy allows us to amortize the space complexity with a single model and make the transitional operator generalizable to unseen instances. To ensure the learned Markov chain is ergodic, we propose a greedy batch-wise permutation scheme that allows fast training. Empirically, we evaluate the learned Markov chain by showing that GMNs are able to discover orders among data instances and also perform comparably well to state-of-the-art methods on the one-shot recognition benchmark task.
Large scale applications are increasingly built by composing sets of microservices. In this model the functionality for a single application might be split across 100s or 1000s of microservices. Resource provisioning for these applications is complex, requiring administrators to understand both the functioning of each microservice, and dependencies between microservices in an application. In this paper we present ThrottleBot, a system that automates the process of determining what resource when allocated to which microservice is likely to have the greatest impact on application performance. We demonstrate the efficacy of our approach by applying ThrottleBot to both synthetic and real world applications. We believe that ThrottleBot when combined with existing microservice orchestrators, e.g., Kubernetes, enables push-button deployment of web scale applications.
Oct 24 2017 cs.LG
Deep Neural Networks (DNNs) often struggle with one-shot learning where we have only one or a few labeled training examples per category. In this paper, we argue that by using side information, we may compensate the missing information across classes. We introduce two statistical approaches for fusing side information into data representation learning to improve one-shot learning. First, we propose to enforce the statistical dependency between data representations and multiple types of side information. Second, we introduce an attention mechanism to efficiently treat examples belonging to the 'lots-of-examples' classes as quasi-samples (additional training samples) for 'one-example' classes. We empirically show that our learning architecture improves over traditional softmax regression networks as well as state-of-the-art attentional regression networks on one-shot recognition tasks.
Oct 19 2017 cs.CV
We present a scene parsing method that utilizes global context information based on both the parametric and non- parametric models. Compared to previous methods that only exploit the local relationship between objects, we train a context network based on scene similarities to generate feature representations for global contexts. In addition, these learned features are utilized to generate global and spatial priors for explicit classes inference. We then design modules to embed the feature representations and the priors into the segmentation network as additional global context cues. We show that the proposed method can eliminate false positives that are not compatible with the global context representations. Experiments on both the MIT ADE20K and PASCAL Context datasets show that the proposed method performs favorably against existing methods.
Oct 04 2017 cs.CV
We propose a new neural network architecture for automatic generation of missing characters in a Chinese font set. We call the neural network architecture the Variational Grid Setting Network which is based on the variational autoencoder (VAE) with some tweaks. The neural network model is able to generate missing characters relatively large in size ($256 \times 256$ pixels). Moreover, we show that one can use very few samples for training data set, and get a satisfied result.
Sep 21 2017 cs.CV
This paper proposes an end-to-end trainable network, SegFlow, for simultaneously predicting pixel-wise object segmentation and optical flow in videos. The proposed SegFlow has two branches where useful information of object segmentation and optical flow is propagated bidirectionally in a unified framework. The segmentation branch is based on a fully convolutional network, which has been proved effective in image segmentation task, and the optical flow branch takes advantage of the FlowNet model. The unified framework is trained iteratively offline to learn a generic notion, and fine-tuned online for specific objects. Extensive experiments on both the video object segmentation and optical flow datasets demonstrate that introducing optical flow improves the performance of segmentation and vice versa, against the state-of-the-art algorithms.
Sep 15 2017 cs.CV
We propose a deep learning-based framework for instance-level object segmentation. Our method mainly consists of three steps. First, We train a generic model based on ResNet-101 for foreground/background segmentations. Second, based on this generic model, we fine-tune it to learn instance-level models and segment individual objects by using augmented object annotations in first frames of test videos. To distinguish different instances in the same video, we compute a pixel-level score map for each object from these instance-level models. Each score map indicates the objectness likelihood and is only computed within the foreground mask obtained in the first step. To further refine this per frame score map, we learn a spatial propagation network. This network aims to learn how to propagate a coarse segmentation mask spatially based on the pairwise similarities in each frame. In addition, we apply a filter on the refined score map that aims to recognize the best connected region using spatial and temporal consistencies in the video. Finally, we decide the instance-level object segmentation in each video by comparing score maps of different instances.
Sep 13 2017 cs.SY
The millimeter-wave (mmWave) full-dimensional (FD) MIMO system employs planar arrays at both the base station and user equipment and can simultaneously support both azimuth and elevation beamforming. In this paper, we propose atomic-norm-based methods for mmWave FD-MIMO channel estimation under both uniform planar arrays (UPA) and non-uniform planar arrays (NUPA). Unlike existing algorithms such as compressive sensing (CS) or subspace methods, the atomic-norm-based algorithms do not require to discretize the angle spaces of the angle of arrival (AoA) and angle of departure (AoD) into grids, thus provide much better accuracy in estimation. In the UPA case, to reduce the computational complexity, the original large-scale 4D atomic norm minimization problem is approximately reformulated as a semi-definite program (SDP) containing two decoupled two-level Toeplitz matrices. The SDP is then solved via the alternating direction method of multipliers (ADMM) where each iteration involves only closed-form computations. In the NUPA case, the atomic-norm-based formulation for channel estimation becomes nonconvex and a gradient-decent-based algorithm is proposed to solve the problem. Simulation results show that the proposed algorithms achieve better performance than the CS-based and subspace-based algorithms.
Jun 09 2017 cs.LG
The paradigm shift from shallow classifiers with hand-crafted features to end-to-end trainable deep learning models has shown significant improvements on supervised learning tasks. Despite the promising power of deep neural networks (DNN), how to alleviate overfitting during training has been a research topic of interest. In this paper, we present a Generative-Discriminative Variational Model (GDVM) for visual classification, in which we introduce a latent variable inferred from inputs for exhibiting generative abilities towards prediction. In other words, our GDVM casts the supervised learning task as a generative learning process, with data discrimination to be jointly exploited for improved classification. In our experiments, we consider the tasks of multi-class classification, multi-label classification, and zero-shot learning. We show that our GDVM performs favorably against the baselines or recent generative DNN models.
Many of the existing methods for learning joint embedding of images and text use only supervised information from paired images and its textual attributes. Taking advantage of the recent success of unsupervised learning in deep neural networks, we propose an end-to-end learning framework that is able to extract more robust multi-modal representations across domains. The proposed method combines representation learning models (i.e., auto-encoders) together with cross-domain learning criteria (i.e., Maximum Mean Discrepancy loss) to learn joint embeddings for semantic and visual features. A novel technique of unsupervised-data adaptation inference is introduced to construct more comprehensive embeddings for both labeled and unlabeled data. We evaluate our method on Animals with Attributes and Caltech-UCSD Birds 200-2011 dataset with a wide range of applications, including zero and few-shot image recognition and retrieval, from inductive to transductive settings. Empirically, we show that our framework improves over the current state of the art on many of the considered tasks.
Mar 02 2017 cs.CV
Compositing is one of the most common operations in photo editing. To generate realistic composites, the appearances of foreground and background need to be adjusted to make them compatible. Previous approaches to harmonize composites have focused on learning statistical relationships between hand-crafted appearance features of the foreground and background, which is unreliable especially when the contents in the two layers are vastly different. In this work, we propose an end-to-end deep convolutional neural network for image harmonization, which can capture both the context and semantic information of the composite images during harmonization. We also introduce an efficient way to collect large-scale and high-quality training data that can facilitate the training process. Experiments on the synthesized dataset and real composite images show that the proposed network outperforms previous state-of-the-art methods.
Jan 09 2017 cs.CV
Automatic photo cropping is an important tool for improving visual quality of digital photos without resorting to tedious manual selection. Traditionally, photo cropping is accomplished by determining the best proposal window through visual quality assessment or saliency detection. In essence, the performance of an image cropper highly depends on the ability to correctly rank a number of visually similar proposal windows. Despite the ranking nature of automatic photo cropping, little attention has been paid to learning-to-rank algorithms in tackling such a problem. In this work, we conduct an extensive study on traditional approaches as well as ranking-based croppers trained on various image features. In addition, a new dataset consisting of high quality cropping and pairwise ranking annotations is presented to evaluate the performance of various baselines. The experimental results on the new dataset provide useful insights into the design of better photo cropping algorithms.
Nov 24 2016 cs.CV
We present Fast Fourier Color Constancy (FFCC), a color constancy algorithm which solves illuminant estimation by reducing it to a spatial localization task on a torus. By operating in the frequency domain, FFCC produces lower error rates than the previous state-of-the-art by 13-20% while being 250-3000 times faster. This unconventional approach introduces challenges regarding aliasing, directional statistics, and preconditioning, which we address. By producing a complete posterior distribution over illuminants instead of a single illuminant estimate, FFCC enables better training techniques, an effective temporal smoothing technique, and richer methods for error analysis. Our implementation of FFCC runs at ~700 frames per second on a mobile device, allowing it to be used as an accurate, real-time, temporally-coherent automatic white balance algorithm.
In this study, a new subspace-constrained diagonal loading (SSC-DL) method is presented for robust beamforming against the issue of a mismatched direction of arrival (DoA), based on an extension to the well known diagonal loading (DL) technique. One important difference of the proposed SSC-DL from conventional DL is that it imposes an additional constraint to restrict the optimal weight vector within a subspace whose basis vectors are determined by a number of angles neighboring to the estimated DoA. Unlike many existing methods which resort to a beamwidth expansion, the weight vector produced by SSC-DL has a relatively small beamwidth around the DoA of the target signal. Yet, the SSC-DL beamformer has a great interference suppression level, thereby achieving an improved overall SINR performance. Simulation results suggest the proposed method has a near-to-optimal SINR performance.
Nov 17 2015 cs.PL
This paper presents incremental verification-validation, a novel approach for checking rich data structure invariants expressed as separation logic assertions. Incremental verification-validation combines static verification of separation properties with efficient, short-circuiting dynamic validation of arbitrarily rich data constraints. A data structure invariant checker is an inductive predicate in separation logic with an executable interpretation; a short-circuiting checker is an invariant checker that stops checking whenever it detects at run time that an assertion for some sub-structure has been fully proven statically. At a high level, our approach does two things: it statically proves the separation properties of data structure invariants using a static shape analysis in a standard way but then leverages this proof in a novel manner to synthesize short-circuiting dynamic validation of the data properties. As a consequence, we enable dynamic validation to make up for imprecision in sound static analysis while simultaneously leveraging the static verification to make the remaining dynamic validation efficient. We show empirically that short-circuiting can yield asymptotic improvements in dynamic validation, with low overhead over no validation, even in cases where static verification is incomplete.
May 09 2013 cs.CV
Colorectal polyps are important precursors to colon cancer, a major health problem. Colon capsule endoscopy (CCE) is a safe and minimally invasive examination procedure, in which the images of the intestine are obtained via digital cameras on board of a small capsule ingested by a patient. The video sequence is then analyzed for the presence of polyps. We propose an algorithm that relieves the labor of a human operator analyzing the frames in the video sequence. The algorithm acts as a binary classifier, which labels the frame as either containing polyps or not, based on the geometrical analysis and the texture content of the frame. The geometrical analysis is based on a segmentation of an image with the help of a mid-pass filter. The features extracted by the segmentation procedure are classified according to an assumption that the polyps are characterized as protrusions that are mostly round in shape. Thus, we use a best fit ball radius as a decision parameter of a binary classifier. We present a statistical study of the performance of our approach on a data set containing over 18,900 frames from the endoscopic video sequences of five adult patients. The algorithm demonstrates a solid performance, achieving 47% sensitivity per frame and over 81% sensitivity per polyp at a specificity level of 90%. On average, with a video sequence length of 3747 frames, only 367 false positive frames need to be inspected by a human operator.
For the multiuser multiple-input multiple-output (MIMO) downlink channel, the users feedback their channel state information (CSI) to help the base station (BS) schedule users and improve the system sum rate. However, this incurs a large aggregate feedback bandwidth which grows linearly with the number of users. In this paper, we propose a novel scheme to reduce the feedback load in a downlink orthogonal space division multiple access (SDMA) system with zero-forcing receivers by allowing the users to dynamically determine the number of feedback bits to use according to multiple decision thresholds. Through theoretical analysis, we show that, while keeping the aggregate feedback load of the entire system constant regardless of the number of users, the proposed scheme almost achieves the optimal asymptotic sum rate scaling with respect to the number of users (also known as the multiuser diversity). Specifically, given the number of thresholds, the proposed scheme can achieve a constant portion of the optimal sum rate achievable only by the system where all the users always feedback, and the remaining portion (referred to as the sum rate loss) decreases exponentially to zero as the number of thresholds increases. By deriving a tight upper bound for the sum rate loss, the minimum number of thresholds for a given tolerable sum rate loss is determined. In addition, a fast bit allocation method is discussed for the proposed scheme, and the simulation results show that the sum rate performances with the complex optimal bit allocation method and with the fast algorithm are almost the same. We compare our multi-threshold scheme to some previously proposed feedback schemes. Through simulation, we demonstrate that the proposed scheme can reduce the feedback load and utilize the limited feedback bandwidth more effectively than the existing feedback methods.
Oct 26 2007 cs.AR
On-chip networks have been proposed as the interconnect fabric for future systems-on-chip and multi-processors on chip. Power is one of the main constraints of these systems and interconnect consumes a significant portion of the power budget. In this paper, we propose four leakage-aware interconnect schemes. Our schemes achieve 10.13%~63.57% active leakage savings and 12.35%~95.96% standby leakage savings across schemes while the delay penalty ranges from 0% to 4.69%.