Multimedia (cs.MM)

  • PDF
    Researchers often summarize their work in the form of scientific posters. Posters provide a coherent and efficient way to convey core ideas expressed in scientific papers. Generating a good scientific poster, however, is a complex and time-consuming cognitive task, since such posters need to be readable, informative, and visually appealing. In this paper, for the first time, we study the challenging problem of learning to generate posters from scientific papers. To this end, we propose a data-driven framework that utilizes graphical models. Specifically, given content to display, the key elements of a good poster, including the attributes of each panel and the arrangement of graphical elements, are learned and inferred from data. During the inference stage, an MAP inference framework is employed to incorporate some design principles. To bridge the gap between panel attributes and the composition within each panel, we also propose a recursive page splitting algorithm to generate the panel layout for a poster. To learn and validate our model, we collect and release a new benchmark dataset, called the NJU-Fudan Paper-Poster dataset, which consists of scientific papers and corresponding posters with exhaustively labelled panels and attributes. Qualitative and quantitative results indicate the effectiveness of our approach.
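The abstract does not spell out the recursive page splitting algorithm, so the following is only a minimal sketch of the general idea, under stated assumptions: each panel carries a relative size, the page is cut in two along its longer side, and each half receives area in proportion to the sizes of its panels. The function name `split_page` and the panel names are hypothetical and do not come from the paper.

```python
def split_page(panels, x, y, w, h):
    """Recursively split the rectangle (x, y, w, h) among panels.

    panels: list of (name, relative_size) tuples, in reading order.
    Returns a dict mapping each panel name to its (x, y, w, h) box.
    Assumption: splitting the panel list in half and cutting along the
    longer side is an illustrative rule, not the paper's learned one.
    """
    if len(panels) == 1:
        return {panels[0][0]: (x, y, w, h)}

    mid = len(panels) // 2
    first, second = panels[:mid], panels[mid:]
    # Share of the area that goes to the first group of panels.
    ratio = sum(size for _, size in first) / sum(size for _, size in panels)

    if w >= h:
        # Wide region: place the two groups side by side.
        layout = split_page(first, x, y, w * ratio, h)
        layout.update(split_page(second, x + w * ratio, y, w * (1 - ratio), h))
    else:
        # Tall region: stack the two groups vertically.
        layout = split_page(first, x, y, w, h * ratio)
        layout.update(split_page(second, x, y + h * ratio, w, h * (1 - ratio)))
    return layout


panels = [("Intro", 1), ("Method", 2), ("Results", 2), ("Conclusion", 1)]
layout = split_page(panels, 0.0, 0.0, 4.0, 3.0)
for name, box in layout.items():
    print(name, box)
```

With this rule every panel ends up with an area proportional to its relative size, and the boxes tile the page without gaps. A learned model would replace the fixed halving and cut direction with inferred panel attributes.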
  • PDF
    Feature extraction is a critical component of many applied data science workflows. In recent years, rapid advances in artificial intelligence and machine learning have led to an explosion of feature extraction tools and services that allow data scientists to cheaply and effectively annotate their data along a vast array of dimensions, ranging from detecting faces in images to analyzing the sentiment expressed in coherent text. Unfortunately, the proliferation of powerful feature extraction services has been mirrored by a corresponding expansion in the number of distinct interfaces to feature extraction services. In a world where nearly every new service has its own API, documentation, and/or client library, data scientists who need to combine diverse features obtained from multiple sources are often forced to write and maintain ever more elaborate feature extraction pipelines. To address this challenge, we introduce a new open-source framework for comprehensive multimodal feature extraction. Pliers is an open-source Python package that supports standardized annotation of diverse data types (video, images, audio, and text), and is expressly designed with both ease of use and extensibility in mind. Users can apply a wide range of pre-existing feature extraction tools to their data in just a few lines of Python code, and can also easily add their own custom extractors by writing modular classes. A graph-based API enables rapid development of complex feature extraction pipelines that output results in a single, standardized format. We describe the package's architecture, detail its major advantages over previous feature extraction toolboxes, and use a sample application to a large functional MRI dataset to illustrate how pliers can significantly reduce the time and effort required to construct sophisticated feature extraction workflows while increasing code clarity and maintainability.
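To make the idea of a graph-based pipeline with a single output format concrete, here is a small standard-library sketch. It is not the pliers API: `Node`, `run_graph`, and the three step functions are hypothetical names invented for illustration, and plain functions stand in for the modular extractor classes the abstract mentions.

```python
class Node:
    """One step in a pipeline graph (hypothetical, not a pliers class).

    Inner nodes transform the stimulus and pass it to their children.
    Leaf nodes are extractors that return a dict of feature values.
    """

    def __init__(self, step, children=()):
        self.step = step
        self.children = list(children)


def lowercase(text):
    # Transformation step: normalise the text before extraction.
    return text.lower()


def word_count(text):
    # Extractor: returns named features as a dict.
    return {"n_words": len(text.split())}


def char_count(text):
    # Extractor: returns named features as a dict.
    return {"n_chars": len(text)}


def run_graph(node, stimulus, source=None):
    """Run the graph and return rows in one standardized format."""
    source = stimulus if source is None else source
    result = node.step(stimulus)
    if not node.children:
        # Leaf: flatten the extractor's dict into uniform rows.
        return [
            {"stimulus": source, "extractor": node.step.__name__,
             "feature": feature, "value": value}
            for feature, value in result.items()
        ]
    rows = []
    for child in node.children:
        rows.extend(run_graph(child, result, source))
    return rows


graph = Node(lowercase, [Node(word_count), Node(char_count)])
for row in run_graph(graph, "Hello World"):
    print(row)
```

Because every extractor's output is flattened into rows with the same keys, results from unrelated extractors can be concatenated or loaded into one table without per-tool glue code.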
  • PDF
    In this paper, we study a coding framework based on a simplified affine motion model, which overcomes the limitations of the translational motion model while maintaining low computational complexity. The proposed framework makes three key contributions. First, we propose to reduce the number of affine motion parameters from 6 to 4. The proposed four-parameter affine motion model can not only handle most of the complex motions in natural videos but also save the bits for two parameters. Second, to efficiently encode the affine motion parameters, we propose two motion prediction modes, i.e., advanced affine motion vector prediction combined with a gradient-based fast affine motion estimation algorithm, and affine model merge, where the latter attempts to reuse the affine motion parameters (instead of the motion vectors) of neighboring blocks. Third, we propose two fast affine motion compensation algorithms. One is one-step sub-pixel interpolation, which reduces the computations of each interpolation. The other is interpolation-precision-based adaptive block size motion compensation, which performs motion compensation at the block level rather than the pixel level to reduce the number of interpolations. Our proposed techniques have been implemented on top of the state-of-the-art High Efficiency Video Coding standard, and the experimental results show that the proposed techniques together achieve average bit savings of 11.1% and 19.3% for the random access and low delay configurations, respectively, on typical video sequences that have rich rotation or zooming motions. Meanwhile, the increases in computational complexity for both the encoder and the decoder are within an acceptable range.
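The abstract does not give the parameterization of the four-parameter model, so the sketch below assumes one common form: the motion field of a block is defined by the motion vectors at its top-left and top-right corners, which is enough to express translation, rotation, and zooming. The function name `affine_mv` and the sample values are illustrative assumptions.

```python
def affine_mv(x, y, v0, v1, width):
    """Motion vector at pixel (x, y) inside a block of the given width.

    v0: motion vector (mvx, mvy) at the top-left corner (0, 0).
    v1: motion vector (mvx, mvy) at the top-right corner (width, 0).
    Assumed four-parameter form: two parameters (a, b) capture the
    combined rotation and zoom, and v0 supplies the translation.
    """
    a = (v1[0] - v0[0]) / width
    b = (v1[1] - v0[1]) / width
    mvx = a * x - b * y + v0[0]
    mvy = b * x + a * y + v0[1]
    return (mvx, mvy)


v0 = (1.0, 2.0)   # illustrative control-point motion vectors
v1 = (3.0, 4.0)
width = 16

# The model reproduces the control-point vectors at the two corners.
print(affine_mv(0, 0, v0, v1, width))
print(affine_mv(width, 0, v0, v1, width))

# Equal control-point vectors reduce the model to pure translation.
print(affine_mv(5, 9, v0, v0, width))
```

Compared with a six-parameter model, which needs a third control point, this form ties the horizontal and vertical scaling together, so only two motion vectors have to be signalled per block.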
  • PDF
    This paper proposes a novel advanced motion model to handle the irregular motion in the cube map projection of 360-degree video. Since the irregular motion is mainly caused by the projection from the sphere to the cube map, we first project the pixels in both the current picture and the reference picture from the unfolded cube back to the sphere. Then, exploiting the fact that most motions on the sphere are uniform, we derive the relationship between the motion vectors of different pixels in the unfolded cube. The proposed advanced motion model is implemented in the High Efficiency Video Coding reference software. Experimental results demonstrate that a clear performance improvement can be achieved for sequences with significant motion.
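The geometry behind this can be sketched in a few lines. The code below is an illustration under assumptions, not the paper's model: the face orientation table `FACES` is a hypothetical convention, face coordinates (u, v) are taken to lie in [-1, 1], and the uniform motion on the sphere is modelled as a rotation about the vertical axis.

```python
import math

# Assumed face convention (hypothetical; real layouts define their own).
FACES = {
    "front":  lambda u, v: (u, v, 1.0),
    "back":   lambda u, v: (-u, v, -1.0),
    "right":  lambda u, v: (1.0, v, -u),
    "left":   lambda u, v: (-1.0, v, u),
    "top":    lambda u, v: (u, 1.0, -v),
    "bottom": lambda u, v: (u, -1.0, v),
}


def cube_to_sphere(face, u, v):
    """Project face coordinates (u, v) in [-1, 1] onto the unit sphere."""
    x, y, z = FACES[face](u, v)
    norm = math.sqrt(x * x + y * y + z * z)
    return (x / norm, y / norm, z / norm)


def sphere_to_front_face(point):
    """Project a sphere point with z > 0 back onto the front face."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point does not project onto the front face")
    return (x / z, y / z)


def front_face_shift(theta, delta):
    """Horizontal shift on the front face when a point at horizontal
    angle theta rotates by delta about the vertical axis (u = tan(theta))."""
    return math.tan(theta + delta) - math.tan(theta)


# The same rotation on the sphere moves pixels by different amounts
# depending on where they sit on the face.
print(front_face_shift(0.0, 0.1))  # near the face centre
print(front_face_shift(0.6, 0.1))  # closer to the face edge
```

The shift grows toward the face edge because u = tan(theta) is nonlinear, which is why a single translational motion vector fits the unfolded cube poorly even when the motion on the sphere is uniform.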