Computer Vision and Pattern Recognition (cs.CV)

  • PDF
    We propose to leverage concept-level representations for complex event recognition in photographs given limited training examples. We introduce a novel framework to discover event concept attributes from the web and use that to extract semantic features from images and classify them into social event categories with few training examples. Discovered concepts include a variety of objects, scenes, actions and event sub-types, leading to a discriminative and compact representation for event images. Web images are obtained for each discovered event concept and we use (pretrained) CNN features to train concept classifiers. Extensive experiments on challenging event datasets demonstrate that our proposed method outperforms several baselines using deep CNN features directly in classifying images into events with limited training examples. We also demonstrate that our method achieves the best overall accuracy on a dataset with unseen event categories using a single training example.
  • PDF
    While recent deep neural networks have achieved promising results for 3D reconstruction from a single-view image, these rely on the availability of RGB textures in images and extra information as supervision. In this work, we propose novel stacked hierarchical networks and an end to end training strategy to tackle a more challenging task for the first time, 3D reconstruction from a single-view 2D silhouette image. We demonstrate that our model is able to conduct 3D reconstruction from a single-view silhouette image both qualitatively and quantitatively. Evaluation is performed using Shapenet for the single-view reconstruction and results are presented in comparison with a single network, to highlight the improvements obtained with the proposed stacked networks and the end to end training strategy. Furthermore, 3D re- construction in forms of IoU is compared with the state of art 3D reconstruction from a single-view RGB image, and the proposed model achieves higher IoU than the state of art of reconstruction from a single view RGB image.
  • PDF
    Computer vision has made remarkable progress in recent years. Deep neural network (DNN) models optimized to identify objects in images exhibit unprecedented task-trained accuracy and, remarkably, some generalization ability: new visual problems can now be solved more easily based on previous learning. Biological vision (learned in life and through evolution) is also accurate and general-purpose. Is it possible that these different learning regimes converge to similar problem-dependent optimal computations? We therefore asked whether the human system-level computation of visual perception has DNN correlates and considered several anecdotal test cases. We found that perceptual sensitivity to image changes has DNN mid-computation correlates, while sensitivity to segmentation, crowding and shape has DNN end-computation correlates. Our results quantify the applicability of using DNN computation to estimate perceptual loss, and are consistent with the fascinating theoretical view that properties of human perception are a consequence of architecture-independent visual learning.
  • PDF
    We present Convolutional Oriented Boundaries (COB), which produces multiscale oriented contours and region hierarchies starting from generic image classification Convolutional Neural Networks (CNNs). COB is computationally efficient, because it requires a single CNN forward pass for multi-scale contour detection and it uses a novel sparse boundary representation for hierarchical segmentation; it gives a significant leap in performance over the state-of-the-art, and it generalizes very well to unseen categories and datasets. Particularly, we show that learning to estimate not only contour strength but also orientation provides more accurate results. We perform extensive experiments for low-level applications on BSDS, PASCAL Context, PASCAL Segmentation, and NYUD to evaluate boundary detection performance, showing that COB provides state-of-the-art contours and region hierarchies in all datasets. We also evaluate COB on high-level tasks when coupled with multiple pipelines for object proposals, semantic contours, semantic segmentation, and object detection on various databases (MS-COCO, SBD, PASCAL VOC'07), showing that COB also improves the results for all tasks.
  • PDF
    Recently there has been an enormous interest in generative models for images in deep learning. In pursuit of this, Generative Adversarial Networks (GAN) and Variational Auto-Encoder (VAE) have surfaced as two most prominent and popular models. While VAEs tend to produce excellent reconstructions but blurry samples, GANs generate sharp but slightly distorted images. In this paper we propose a new model called Variational InfoGAN (ViGAN). Our aim is two fold: (i) To generated new images conditioned on visual descriptions, and (ii) modify the image, by fixing the latent representation of image and varying the visual description. We evaluate our model on Labeled Faces in the Wild (LFW), celebA and a modified version of MNIST datasets and demonstrate the ability of our model to generate new images as well as to modify a given image by changing attributes.
  • PDF
    Automatic continuous time, continuous value assessment of a patient's pain from face video is highly sought after by the medical profession. Despite the recent advances in deep learning that attain impressive results in many domains, pain estimation risks not being able to benefit from this due to the difficulty in obtaining data sets of considerable size. In this work we propose a combination of hand-crafted and deep-learned features that makes the most of deep learning techniques in small sample settings. Encoding shape, appearance, and dynamics, our method significantly outperforms the current state of the art, attaining a RMSE error of less than 1 point on a 16-level pain scale, whilst simultaneously scoring a 67.3% Pearson correlation coefficient between our predicted pain level time series and the ground truth.
  • PDF
    Finding the camera pose is an important step in many egocentric video applications. It has been widely reported that, state of the art SLAM algorithms fail on egocentric videos. In this paper, we propose a robust method for camera pose estimation, designed specifically for egocentric videos. In an egocentric video, the camera views the same scene point multiple times as the wearer's head sweeps back and forth. We use this specific motion profile to perform short loop closures aligned with wearer's footsteps. For egocentric videos, depth estimation is usually noisy. In an important departure, we use 2D computations for rotation averaging which do not rely upon depth estimates. The two modification results in much more stable algorithm as is evident from our experiments on various egocentric video datasets for different egocentric applications. The proposed algorithm resolves a long standing problem in egocentric vision and unlocks new usage scenarios for future applications.
  • PDF
    Sky/cloud imaging using ground-based Whole Sky Imagers (WSI) is a cost-effective means to understanding cloud cover and weather patterns. The accurate segmentation of clouds in these images is a challenging task, as clouds do not possess any clear structure. Several algorithms using different color models have been proposed in the literature. This paper presents a systematic approach for the selection of color spaces and components for optimal segmentation of sky/cloud images. Using mainly principal component analysis (PCA) and fuzzy clustering for evaluation, we identify the most suitable color components for this task.

Recent comments

Jaiden Mispy Jul 09 2015 10:12 UTC

There's also a [docker image](http://ryankennedy.io/running-the-deep-dream/) if you want to play with it, though if you're on Linux or OS X you might want to install everything natively in order to get GPU acceleration (the gradient ascent can be quite slow on higher layers in the network)

Jaiden Mispy Jul 09 2015 08:13 UTC

The image recognition model described here is the one responsible for [deepdream](http://github.com/google/deepdream).

![deepdream nebula][1]

[1]: https://pbs.twimg.com/media/CI_EASXWcAAGXnK.jpg