AutoExtend

AutoExtend - Extending Word Embeddings to Embeddings for Synsets and Lexemes

Introduction

AutoExtend extends existing word embeddings to embeddings for lexemes and synsets. It is flexible: it accepts any word embeddings as input and needs no additional training corpus. The resulting synset and lexeme embeddings live in the same vector space as the word embeddings.
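Because words and synsets share one vector space, a word vector can be compared directly with synset vectors, for example to rank candidate senses. The following is a minimal sketch with made-up 3-dimensional vectors; the labels `suit-clothing` and `suit-lawsuit` and the helper `rank_synsets` are hypothetical and are not part of AutoExtend.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy vectors for illustration only; real embeddings have hundreds of dimensions.
word_vec = {"suit": [0.9, 0.1, 0.3]}
synset_vec = {
    "suit-clothing": [0.8, 0.0, 0.1],  # hypothetical synset label
    "suit-lawsuit": [0.1, 0.9, 0.2],   # hypothetical synset label
}


def rank_synsets(word):
    """Return synset labels sorted by decreasing similarity to the word vector."""
    return sorted(
        synset_vec,
        key=lambda s: cosine(word_vec[word], synset_vec[s]),
        reverse=True,
    )


ranking = rank_synsets("suit")
print(ranking)
```

With real embeddings, the same comparison works without any projection step, since no mapping between separate spaces is needed.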

To learn more about AutoExtend, read the following paper:
http://www.aclweb.org/anthology/P15-1173
Or watch the talk (24 min):
http://techtalks.tv/talks/extending-word-embedddings-to-embeddings-for-synsets-and-lexemes/61854/

Get started

git clone https://github.com/casaro/AutoExtend.git

Extract words, synsets, etc. from WordNet

1a) open file WordNetExtractor.java
1b) set path to JWNL (jwnl-1.4_rc3 or newer)
1c) set path to your input word embeddings (can be binary or text)
1d) set path to desired output folder
1e) compile and run it

Run AutoExtend

2a) open AutoExtend.m
2b) set folder in else clause to folder of 1d)
2c) get it running
2e) play around with parameters
2f) run writeVectors.m to get word and synset embeddings in one file
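Once the combined file exists, it can be loaded for downstream use. The sketch below assumes a word2vec-style text layout (an optional "count dimension" header line, then one token followed by its vector components per line); the actual output format of writeVectors.m is not described here, so check the file before relying on this. The function name `read_text_vectors` and the sample tokens are hypothetical.

```python
import io


def read_text_vectors(handle):
    """Parse whitespace-separated "token v1 v2 ..." lines into a dict.

    Assumed layout (word2vec text style); an optional first line holding
    two integers (count and dimension) is skipped.
    """
    vectors = {}
    for index, line in enumerate(handle):
        parts = line.split()
        if not parts:
            continue  # ignore blank lines
        if index == 0 and len(parts) == 2 and all(p.isdigit() for p in parts):
            continue  # header line: "<count> <dimension>"
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


# In-memory sample standing in for the real file; tokens are made up.
sample = io.StringIO(
    "3 2\n"
    "bank 0.1 0.2\n"
    "bank-synset-a 0.3 0.4\n"
    "bank-synset-b 0.5 0.6\n"
)
vectors = read_text_vectors(sample)
print(len(vectors), vectors["bank"])
```

To read a file on disk instead, pass an opened handle, for example `open(path, encoding="utf-8")`, where `path` points to the file produced in 2f).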

Run IMS with synset features

3a) add all three files to the IMS system
3b) search for "path to word and synset vectors"
3c) set path to file of 2f)
3d) get it running

Pre-trained embeddings

You can download pre-trained synset and lexeme embeddings here (163 MB). They live in the same vector space as the pre-trained word embeddings by Mikolov (see https://code.google.com/p/word2vec/). The synset ids correspond to the ids in WordNet 1.7.1. A mapping from each synset id to the lexemes in that synset is included in the zip file. Note: An updated version of the lexeme vectors was uploaded on 2015/11/25.

Cite

If you use AutoExtend, please cite the following paper:

@InProceedings{rothe-schutze:2015:P15-1,
  author    = {Rothe, Sascha  and  Sch\"{u}tze, Hinrich},
  title     = {AutoExtend: Extending Word Embeddings to Embeddings for Synsets and Lexemes},
  booktitle = {Proceedings of the ACL},
  year      = {2015}
}

Contact: Sascha Rothe (cis page)