Conditional Neural Language Models for Multimodal Learning and Natural Language Understanding

Date

2018-06

Abstract

In this thesis we introduce conditional neural language models based on log-bilinear and recurrent neural networks, with applications to multimodal learning and natural language understanding. We first introduce an LSTM encoder for learning visual-semantic embeddings that rank the relevance of text to images in a joint embedding space. Next, we introduce three log-bilinear models for generating image descriptions that integrate both additive and multiplicative interactions. Beyond image conditioning, we describe a multiplicative conditional neural language model for learning distributed representations of attributes and metadata. Our model allows for contextual word-relatedness comparisons through decompositions of a word-embedding tensor. Finally, we show how the skip-gram model for learning word representations can be generalized to a conditional recurrent neural language model for unsupervised learning of sentence representations. We introduce a family of models called contextual encoder-decoders and demonstrate how they can be used to induce generic sentence representations as well as unaligned generation of short stories conditioned on images. The thesis closes by highlighting several open areas of future work.
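To make the conditioning mechanisms concrete, the additive log-bilinear formulation can be sketched in its standard form (a sketch only; the thesis's exact variants may differ): given context words w_1, ..., w_{n-1} with representations r_{w_i} and a conditioning vector x (e.g. an image feature), the model predicts a next-word representation

    \hat{r} = \sum_{i=1}^{n-1} C^{(i)} r_{w_i} + C^{(x)} x

and scores each candidate word w by \exp(\hat{r}^\top r_w + b_w), normalized over the vocabulary. Multiplicative variants instead gate the word representations by x through a factored tensor.

For the visual-semantic embedding contribution, the sketch below shows a pairwise ranking objective of the kind commonly used for joint image-text spaces: an LSTM encodes captions, a linear map projects precomputed CNN image features into the same space, and matched pairs are pulled together while mismatched pairs within a batch are pushed outside a margin. It is written in PyTorch for illustration; all names, dimensions, and the margin value are assumptions, not the thesis implementation.

import torch
import torch.nn as nn

class VisualSemanticEmbedding(nn.Module):
    """Joint image-caption embedding: LSTM sentence encoder + linear image projection."""
    def __init__(self, vocab_size, word_dim=300, embed_dim=1024, image_dim=4096):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, word_dim)
        self.lstm = nn.LSTM(word_dim, embed_dim, batch_first=True)
        self.image_proj = nn.Linear(image_dim, embed_dim)

    def forward(self, captions, image_feats):
        # captions: (batch, time) word indices; image_feats: (batch, image_dim) CNN features.
        _, (h, _) = self.lstm(self.embed(captions))
        s = nn.functional.normalize(h[-1], dim=1)                          # unit-norm sentence vectors
        v = nn.functional.normalize(self.image_proj(image_feats), dim=1)   # unit-norm image vectors
        return s, v

def pairwise_ranking_loss(s, v, margin=0.2):
    # Cosine similarities between every sentence and image in the batch;
    # the diagonal holds the matched pairs, off-diagonals serve as contrastive terms.
    scores = s @ v.t()
    pos = scores.diag().unsqueeze(1)
    cost_s = (margin - pos + scores).clamp(min=0)      # rank the true image above others for each sentence
    cost_v = (margin - pos.t() + scores).clamp(min=0)  # rank the true sentence above others for each image
    mask = torch.eye(scores.size(0), dtype=torch.bool, device=scores.device)
    return cost_s.masked_fill(mask, 0).sum() + cost_v.masked_fill(mask, 0).sum()

At retrieval time, relevance of text to an image reduces to cosine similarity in the joint space, so ranking a set of candidates against a query is a single matrix multiply.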

Keywords

Computer Vision, Deep Learning, Language Models, Machine Learning, Natural Language Processing, Neural Networks
