Many existing methods for learning joint embeddings of images and text use only the supervised information from images paired with their textual attributes. Taking advantage of the recent success of unsupervised learning in deep neural networks, we propose an end-to-end learning framework that extracts more robust multi-modal representations across domains. The proposed method combines representation learning models (i.e., auto-encoders) with cross-domain learning criteria (i.e., a Maximum Mean Discrepancy loss) to learn joint embeddings of semantic and visual features. A novel unsupervised-data adaptation inference technique is introduced to construct more comprehensive embeddings for both labeled and unlabeled data. We evaluate our method on the Animals with Attributes and Caltech-UCSD Birds 200-2011 datasets across a wide range of applications, including zero- and few-shot image recognition and retrieval, in both inductive and transductive settings. Empirically, we show that our framework improves over the current state of the art on many of the considered tasks.
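As a rough illustration of the cross-domain criterion mentioned above, the following is a minimal sketch of a Maximum Mean Discrepancy (MMD) loss between a batch of visual embeddings and a batch of semantic embeddings. The Gaussian-kernel bandwidth, batch sizes, and embedding dimension are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    # Pairwise squared Euclidean distances between rows of x and y.
    d2 = np.sum(x**2, axis=1)[:, None] + np.sum(y**2, axis=1)[None, :] - 2.0 * x @ y.T
    return np.exp(-d2 / (2.0 * sigma**2))

def mmd_loss(visual_emb, semantic_emb, sigma=1.0):
    """Biased estimate of squared MMD between two sets of embeddings."""
    k_vv = gaussian_kernel(visual_emb, visual_emb, sigma)
    k_ss = gaussian_kernel(semantic_emb, semantic_emb, sigma)
    k_vs = gaussian_kernel(visual_emb, semantic_emb, sigma)
    return k_vv.mean() + k_ss.mean() - 2.0 * k_vs.mean()

# Example: two batches of 64 embeddings living in a shared 128-d space.
rng = np.random.default_rng(0)
visual = rng.normal(size=(64, 128))
semantic = rng.normal(size=(64, 128))
print(mmd_loss(visual, semantic))
```

Driving this quantity toward zero encourages the visual and semantic embedding distributions to match in the shared space.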
Mar 02 2017 cs.CV
Compositing is one of the most common operations in photo editing. To generate realistic composites, the appearances of the foreground and background must be adjusted to make them compatible. Previous approaches to harmonizing composites have focused on learning statistical relationships between hand-crafted appearance features of the foreground and background, which is unreliable, especially when the contents of the two layers are vastly different. In this work, we propose an end-to-end deep convolutional neural network for image harmonization that captures both the context and the semantic information of the composite image during harmonization. We also introduce an efficient way to collect large-scale, high-quality training data that facilitates the training process. Experiments on the synthesized dataset and on real composite images show that the proposed network outperforms previous state-of-the-art methods.
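To make the setup concrete, here is a minimal sketch (not the authors' architecture) of an encoder-decoder CNN that takes a composite image together with its foreground mask and predicts a harmonized image. The layer widths and kernel sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class HarmonizationNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Input: RGB composite concatenated with a 1-channel foreground mask.
        self.encoder = nn.Sequential(
            nn.Conv2d(4, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, composite, mask):
        x = torch.cat([composite, mask], dim=1)
        return self.decoder(self.encoder(x))

# Example forward pass on a single 256x256 composite.
net = HarmonizationNet()
out = net(torch.rand(1, 3, 256, 256), torch.rand(1, 1, 256, 256))
print(out.shape)  # torch.Size([1, 3, 256, 256])
```

The mask tells the network which pixels belong to the pasted foreground, so the adjustment can be conditioned on the surrounding background context.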
Jan 09 2017 cs.CV
Automatic photo cropping is an important tool for improving the visual quality of digital photos without resorting to tedious manual selection. Traditionally, photo cropping is accomplished by determining the best proposal window through visual quality assessment or saliency detection. In essence, the performance of an image cropper depends heavily on its ability to correctly rank a number of visually similar proposal windows. Despite the ranking nature of automatic photo cropping, little attention has been paid to learning-to-rank algorithms for this problem. In this work, we conduct an extensive study of traditional approaches as well as ranking-based croppers trained on various image features. In addition, a new dataset consisting of high-quality cropping and pairwise ranking annotations is presented to evaluate the performance of various baselines. The experimental results on the new dataset provide useful insights into the design of better photo cropping algorithms.
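The pairwise learning-to-rank idea behind such a ranking-based cropper can be sketched as follows: a linear scorer is trained so that, for each annotated pair, the preferred crop window scores higher than the rejected one. The hinge margin, learning rate, and toy features are assumptions for illustration only.

```python
import numpy as np

def train_pairwise_ranker(pairs, dim, lr=0.01, margin=1.0, epochs=20):
    """pairs: list of (better_features, worse_features) numpy vectors."""
    w = np.zeros(dim)
    for _ in range(epochs):
        for better, worse in pairs:
            # Hinge loss on the score difference of the two candidate windows.
            if margin - w @ (better - worse) > 0:
                w += lr * (better - worse)
    return w

def rank_windows(w, window_features):
    """Return candidate-window indices sorted from best to worst score."""
    scores = window_features @ w
    return np.argsort(-scores)

# Toy example with 5-d features for three candidate crop windows.
rng = np.random.default_rng(0)
pairs = [(rng.normal(size=5) + 1.0, rng.normal(size=5)) for _ in range(50)]
w = train_pairwise_ranker(pairs, dim=5)
print(rank_windows(w, rng.normal(size=(3, 5))))
```

At test time, the cropper simply scores every proposal window and returns the highest-ranked one.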
Nov 24 2016 cs.CV
We present Fast Fourier Color Constancy (FFCC), a color constancy algorithm that solves illuminant estimation by reducing it to a spatial localization task on a torus. By operating in the frequency domain, FFCC reduces error rates by 13-20% relative to the previous state of the art while being 250-3000 times faster. This unconventional approach introduces challenges regarding aliasing, directional statistics, and preconditioning, which we address. By producing a complete posterior distribution over illuminants instead of a single illuminant estimate, FFCC enables better training techniques, an effective temporal smoothing technique, and richer methods for error analysis. Our implementation of FFCC runs at ~700 frames per second on a mobile device, allowing it to be used as an accurate, real-time, temporally coherent automatic white balance algorithm.
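A rough sketch of the frequency-domain localization step: per-pixel log-chrominance values are wrapped onto a toroidal histogram, the histogram is circularly convolved with a learned filter via the FFT, and the peak of the response indicates the illuminant bin. The histogram size, bin width, and random filter below are stand-in assumptions, not the trained FFCC model.

```python
import numpy as np

def log_chroma_histogram(rgb, n=64, bin_width=0.05):
    """Wrap per-pixel log-chroma (u, v) onto an n x n toroidal histogram."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    eps = 1e-6
    u = np.log((g + eps) / (r + eps))
    v = np.log((g + eps) / (b + eps))
    iu = np.round(u / bin_width).astype(int) % n   # toroidal wrap-around
    iv = np.round(v / bin_width).astype(int) % n
    hist = np.zeros((n, n))
    np.add.at(hist, (iu.ravel(), iv.ravel()), 1.0)
    return hist / hist.sum()

def estimate_illuminant_bin(hist, filt):
    # Circular convolution on the torus is a pointwise product in the Fourier domain.
    response = np.real(np.fft.ifft2(np.fft.fft2(hist) * np.fft.fft2(filt)))
    return np.unravel_index(np.argmax(response), response.shape)

rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))
filt = rng.normal(size=(64, 64))   # stands in for a learned filter
print(estimate_illuminant_bin(log_chroma_histogram(image), filt))
```

The wrap-around is what makes the localization live on a torus, and it is also the source of the aliasing issues the abstract mentions.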
In this study, a new subspace-constrained diagonal loading (SSC-DL) method is presented for beamforming that is robust to a mismatched direction of arrival (DoA), based on an extension of the well-known diagonal loading (DL) technique. One important difference between the proposed SSC-DL and conventional DL is that it imposes an additional constraint restricting the optimal weight vector to a subspace whose basis vectors are determined by a number of angles neighboring the estimated DoA. Unlike many existing methods that resort to beamwidth expansion, the weight vector produced by SSC-DL has a relatively small beamwidth around the DoA of the target signal. At the same time, the SSC-DL beamformer achieves strong interference suppression, and thereby an improved overall SINR. Simulation results suggest the proposed method attains near-optimal SINR performance.
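One way to read the construction is sketched below: a diagonally loaded MVDR-style weight is computed after restricting the problem to the subspace spanned by steering vectors at angles neighboring the estimated DoA. The array geometry, loading level, and neighbor spacing are assumptions made for illustration and do not reproduce the paper's exact formulation.

```python
import numpy as np

def steering_vector(theta_deg, n_antennas, spacing=0.5):
    # Uniform linear array steering vector at the given angle (half-wavelength spacing).
    k = 2 * np.pi * spacing * np.sin(np.deg2rad(theta_deg))
    return np.exp(1j * k * np.arange(n_antennas))

def ssc_dl_weights(R, est_doa_deg, n_antennas,
                   neighbors_deg=(-4, -2, 0, 2, 4), loading=1.0):
    # Basis of the constraint subspace: steering vectors around the estimated DoA.
    B = np.column_stack([steering_vector(est_doa_deg + d, n_antennas)
                         for d in neighbors_deg])
    Q, _ = np.linalg.qr(B)                       # orthonormal basis of the subspace
    R_sub = Q.conj().T @ (R + loading * np.eye(n_antennas)) @ Q
    a_sub = Q.conj().T @ steering_vector(est_doa_deg, n_antennas)
    w_sub = np.linalg.solve(R_sub, a_sub)
    w_sub /= (a_sub.conj() @ w_sub)              # unit response toward the estimated DoA
    return Q @ w_sub                             # map back to element space

# Toy example: 8-element ULA with a noise-only sample covariance.
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 200)) + 1j * rng.normal(size=(8, 200))
R = X @ X.conj().T / 200
print(ssc_dl_weights(R, est_doa_deg=10.0, n_antennas=8).shape)
```

Because the weight vector is forced to live in the span of steering vectors near the estimated DoA, small pointing errors still fall inside the protected subspace while degrees of freedom outside it remain available for interference suppression.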
Nov 17 2015 cs.PL
This paper presents incremental verification-validation, a novel approach for checking rich data structure invariants expressed as separation logic assertions. Incremental verification-validation combines static verification of separation properties with efficient, short-circuiting dynamic validation of arbitrarily rich data constraints. A data structure invariant checker is an inductive predicate in separation logic with an executable interpretation; a short-circuiting checker is an invariant checker that stops checking whenever it detects at run time that an assertion for some sub-structure has been fully proven statically. At a high level, our approach does two things: it statically proves the separation properties of data structure invariants using a static shape analysis in a standard way, but then leverages this proof in a novel manner to synthesize short-circuiting dynamic validation of the data properties. As a consequence, we enable dynamic validation to make up for imprecision in sound static analysis while simultaneously leveraging the static verification to make the remaining dynamic validation efficient. We show empirically that short-circuiting can yield asymptotic improvements in dynamic validation, with low overhead relative to performing no validation, even in cases where static verification is incomplete.
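The short-circuiting behavior can be illustrated with a small sketch: a dynamic checker walks a data structure but stops as soon as it reaches a region whose invariant the static analysis already established. The `proven` flag standing in for the static analyzer's verdict, and the sorted-list invariant itself, are assumptions chosen only to illustrate the idea.

```python
class Node:
    def __init__(self, value, nxt=None, proven=False):
        self.value = value
        self.next = nxt
        self.proven = proven   # set where static shape analysis proved the invariant

def check_sorted(node, lower=float("-inf")):
    """Dynamically validate the 'sorted list' invariant, short-circuiting."""
    while node is not None:
        if node.proven:
            return True        # rest of the list was verified statically
        if node.value < lower:
            return False       # data invariant violated at run time
        lower, node = node.value, node.next
    return True

# Only the first two nodes are actually traversed at run time.
tail = Node(5, Node(7), proven=True)
print(check_sorted(Node(1, Node(3, tail))))  # True
```

When the statically proven region covers most of the structure, the dynamic check touches only a constant-size frontier, which is where the asymptotic savings come from.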
May 09 2013 cs.CV
Colorectal polyps are important precursors to colon cancer, a major health problem. Colon capsule endoscopy (CCE) is a safe and minimally invasive examination procedure in which images of the intestine are obtained via digital cameras on board a small capsule ingested by the patient. The video sequence is then analyzed for the presence of polyps. We propose an algorithm that relieves the labor of a human operator analyzing the frames of the video sequence. The algorithm acts as a binary classifier, labeling each frame as either containing polyps or not based on a geometrical analysis and the texture content of the frame. The geometrical analysis relies on segmenting the image with the help of a mid-pass filter. The features extracted by the segmentation procedure are classified under the assumption that polyps are protrusions that are mostly round in shape; accordingly, we use the best-fit ball radius as the decision parameter of the binary classifier. We present a statistical study of the performance of our approach on a data set containing over 18,900 frames from the endoscopic video sequences of five adult patients. The algorithm demonstrates solid performance, achieving 47% per-frame sensitivity and over 81% per-polyp sensitivity at a specificity level of 90%. On average, for a video sequence of 3747 frames, only 367 false-positive frames need to be inspected by a human operator.
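The geometric decision rule can be sketched roughly as follows: fit a circle ("ball") to the boundary of a segmented protrusion and label the frame as containing a polyp when the best-fit radius falls in a plausible range. The algebraic least-squares fit and the radius thresholds below are illustrative assumptions, not the paper's calibrated values.

```python
import numpy as np

def fit_circle(points):
    """Algebraic least-squares circle fit; points is an (n, 2) array."""
    x, y = points[:, 0], points[:, 1]
    A = np.column_stack([2 * x, 2 * y, np.ones(len(x))])
    b = x**2 + y**2
    cx, cy, c = np.linalg.lstsq(A, b, rcond=None)[0]
    radius = np.sqrt(c + cx**2 + cy**2)
    return (cx, cy), radius

def classify_frame(boundary_points, r_min=8.0, r_max=60.0):
    """Binary decision: does the protrusion look like a roughly round polyp?"""
    _, radius = fit_circle(boundary_points)
    return r_min <= radius <= r_max

# Toy example: noisy points on a circle of radius 20 pixels.
rng = np.random.default_rng(0)
t = rng.uniform(0, 2 * np.pi, 100)
pts = np.column_stack([50 + 20 * np.cos(t), 50 + 20 * np.sin(t)])
pts += rng.normal(0, 0.5, (100, 2))
print(classify_frame(pts))  # True
```

In the full pipeline this geometric test is combined with the texture content of the frame before the final per-frame label is produced.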
For the multiuser multiple-input multiple-output (MIMO) downlink channel, the users feed back their channel state information (CSI) to help the base station (BS) schedule users and improve the system sum rate. However, this incurs a large aggregate feedback bandwidth that grows linearly with the number of users. In this paper, we propose a novel scheme to reduce the feedback load in a downlink orthogonal space division multiple access (SDMA) system with zero-forcing receivers by allowing the users to dynamically determine the number of feedback bits to use according to multiple decision thresholds. Through theoretical analysis, we show that, while keeping the aggregate feedback load of the entire system constant regardless of the number of users, the proposed scheme almost achieves the optimal asymptotic sum rate scaling with the number of users (also known as multiuser diversity). Specifically, for a given number of thresholds, the proposed scheme achieves a constant portion of the optimal sum rate achievable only by a system in which all users always feed back, and the remaining portion (referred to as the sum rate loss) decreases exponentially to zero as the number of thresholds increases. By deriving a tight upper bound on the sum rate loss, the minimum number of thresholds for a given tolerable sum rate loss is determined. In addition, a fast bit allocation method is discussed for the proposed scheme, and simulation results show that the sum rate performance with the complex optimal bit allocation method and with the fast algorithm is almost the same. We compare our multi-threshold scheme to several previously proposed feedback schemes and demonstrate through simulation that the proposed scheme reduces the feedback load and utilizes the limited feedback bandwidth more effectively than the existing feedback methods.
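The multi-threshold idea can be sketched as follows: each user compares its instantaneous channel quality against a set of decision thresholds and spends a number of feedback bits determined by the highest threshold it exceeds, staying silent when the channel is poor. The threshold values and the bits-per-level mapping below are assumptions for illustration, not the optimized values from the analysis.

```python
import numpy as np

def feedback_bits(channel_gain, thresholds=(1.0, 2.0, 4.0),
                  bits_per_level=(0, 2, 4, 6)):
    """Return how many feedback bits this user spends for its current gain."""
    level = np.searchsorted(thresholds, channel_gain)  # number of thresholds exceeded
    return bits_per_level[level]

# Toy system: 1000 users with i.i.d. exponential channel gains.
rng = np.random.default_rng(0)
gains = rng.exponential(scale=1.0, size=1000)
bits = np.array([feedback_bits(g) for g in gains])
print("aggregate feedback load:", bits.sum(), "bits;",
      "users staying silent:", int((bits == 0).sum()))
```

Because only users with strong channels spend many bits, the aggregate feedback load can be held roughly constant as the user population grows, which is the property the analysis quantifies.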
Oct 26 2007 cs.AR
On-chip networks have been proposed as the interconnect fabric for future systems-on-chip and chip multi-processors. Power is one of the main constraints in these systems, and the interconnect consumes a significant portion of the power budget. In this paper, we propose four leakage-aware interconnect schemes. Across the schemes, we achieve 10.13%-63.57% active leakage savings and 12.35%-95.96% standby leakage savings, while the delay penalty ranges from 0% to 4.69%.