Hirst, Graeme
Wang, Tong
2017-06-04
2017-06-04
2016-11
http://hdl.handle.net/1807/77437

A fundamental principle in distributional semantic models is to use similarity in linguistic environment as a proxy for similarity in meaning. Known as the distributional hypothesis, the principle has been successfully applied to many areas in natural language processing. As a proxy, however, it also suffers from critical limitations and exceptions, many of which result from overly simplistic definitions of the linguistic environment. In this thesis, I hypothesize that the quality of distributional models can be improved by carefully incorporating linguistic knowledge into the definition of linguistic environment. The hypothesis is validated in the context of three closely related semantic relations, namely near-synonymy, lexical similarity, and polysemy. On the lexical level, the definition of linguistic environment is examined under three distributional frameworks: lexical co-occurrence, syntactic and lexicographic dependency, and taxonomic structures. Firstly, combining kernel methods and lexical-level co-occurrence with matrix factorization is shown to be highly effective in capturing the fine-grained nuances among near-synonyms. Secondly, syntactic and lexicographic information is shown to result in notable improvement in lexical embedding learning when evaluated on lexical similarity benchmarks. Thirdly, for taxonomy-based measures of lexical similarity, the intuitions for using structural features such as depth and density are examined and challenged, and the refined definitions are shown to improve both the correlation between the features and human judgements of similarity and the performance of similarity measures using these features. On the compositional level, distributional models of multi-word contexts are also shown to benefit from incorporating syntactic and lexicographic knowledge.
Analytically, the use of syntactic teacher-forcing is motivated by derivations of full gradients in long short-term memory units in recurrent neural networks. Empirically, syntactic knowledge helps achieve statistically significant improvement in language modelling and state-of-the-art accuracy in near-synonym lexical choice. Finally, a compositional similarity function is developed to measure the similarity between two sets of random events. Applying this function to polysemy with lexicographic knowledge produces state-of-the-art performance in unsupervised word sense disambiguation.

Artificial Intelligence
Compositional Semantics
Computational Linguistics
Distributional Semantics
Lexical Semantics
0984
Exploiting Linguistic Knowledge in Lexical and Compositional Semantic Models
Thesis