Conditional Neural Language Models for Multimodal Learning and Natural Language Understanding
Abstract
In this thesis we introduce conditional neural language models based on log-bilinear and recurrent neural networks, with applications to multimodal learning and natural language understanding. We first introduce an LSTM encoder for learning visual-semantic embeddings that rank the relevance of text to images in a joint embedding space. Next, we introduce three log-bilinear models for generating image descriptions that integrate both additive and multiplicative interactions. Beyond image conditioning, we describe a multiplicative conditional neural language model for learning distributed representations of attributes and metadata. This model allows for contextual word-relatedness comparisons through decompositions of a word embedding tensor. Finally, we show how the skip-gram model for learning word representations can be abstracted to a conditional recurrent neural language model for unsupervised learning of sentence representations. We introduce a family of models called contextual encoder-decoders and demonstrate how they can be used to induce generic sentence representations as well as unaligned generation of short stories conditioned on images. The thesis closes by highlighting several open areas of future work.
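Visual-semantic embeddings of the kind described above are commonly trained by ranking matched image-sentence pairs above mismatched ones in the joint space. The sketch below is illustrative only and not taken from the thesis: it assumes a margin-based pairwise ranking objective with cosine similarity and in-batch negatives, and the encoder outputs, the margin value, and the toy dimensions are stand-ins.

    import numpy as np

    def l2_normalize(x, axis=-1, eps=1e-8):
        """Scale vectors to unit length so dot products become cosine similarities."""
        return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

    def ranking_loss(image_emb, sent_emb, margin=0.2):
        """Pairwise margin ranking loss over a batch of matched image/sentence pairs.

        image_emb, sent_emb: (batch, dim) arrays whose i-th rows form a matched pair.
        Mismatched rows within the batch serve as contrastive (negative) examples.
        """
        im = l2_normalize(image_emb)
        se = l2_normalize(sent_emb)
        scores = im @ se.T                       # (batch, batch) cosine similarities
        pos = np.diag(scores)                    # similarity of each matched pair
        # Hinge terms: penalize a negative sentence ranked too close to an image,
        # and a negative image ranked too close to a sentence.
        cost_s = np.maximum(0.0, margin - pos[:, None] + scores)   # fix image, vary sentence
        cost_im = np.maximum(0.0, margin - pos[None, :] + scores)  # fix sentence, vary image
        # Do not penalize the matched pairs themselves.
        n = scores.shape[0]
        cost_s[np.arange(n), np.arange(n)] = 0.0
        cost_im[np.arange(n), np.arange(n)] = 0.0
        return cost_s.sum() + cost_im.sum()

    # Toy usage: 4 matched pairs of 300-dimensional embeddings (random stand-ins
    # for the outputs of an image encoder and a sentence encoder).
    rng = np.random.default_rng(0)
    print(ranking_loss(rng.normal(size=(4, 300)), rng.normal(size=(4, 300))))

In practice the two embeddings would come from the image encoder and the sentence encoder being trained, and the loss would be minimized with respect to both encoders' parameters.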