AutoExtend - Extending Word Embeddings to Embeddings for Synsets and Lexemes
Introduction
AutoExtend extends existing word embeddings to embeddings for lexemes and synsets. It is flexible in that it can take
any word embeddings as input and does not need an additional
training corpus. The resulting synset and lexeme embeddings live
in the same vector space as the word embeddings.
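Because words, lexemes and synsets share one vector space, a word vector can be compared directly with a synset vector, for example by cosine similarity. The following is a minimal sketch in Python, not part of AutoExtend itself. The three-dimensional vectors and the labels "bank-financial" and "bank-river" are made up for illustration and are not real WordNet synset ids.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_synsets(word, word_vectors, synset_vectors):
    """Return synset labels sorted by similarity to the word, best first."""
    target = word_vectors[word]
    return sorted(
        synset_vectors,
        key=lambda label: cosine(target, synset_vectors[label]),
        reverse=True,
    )


# Toy data (assumption): real embeddings typically have hundreds of dimensions.
word_vectors = {"bank": [0.9, 0.1, 0.4]}
synset_vectors = {
    "bank-financial": [0.8, 0.0, 0.6],  # hypothetical label
    "bank-river": [0.1, 0.9, 0.2],      # hypothetical label
}

ranking = rank_synsets("bank", word_vectors, synset_vectors)
print(ranking)
```

With real embeddings the same comparison works unchanged, since no mapping between spaces is required.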
To learn more about AutoExtend, read the following paper:
http://www.aclweb.org/anthology/P15-1173
Or watch the talk (24min):
http://techtalks.tv/talks/extending-word-embedddings-to-embeddings-for-synsets-and-lexemes/61854/
Get started
git clone https://github.com/casaro/AutoExtend.git
Extract words, synsets, etc. from WordNet
1a) open file WordNetExtractor.java
1b) set path to JWNL (jwnl-1.4_rc3 or newer)
1c) set path to your input word embeddings (can be binary or text)
1d) set path to desired output folder
1e) get it running
Run AutoExtend
2a) open AutoExtend.m
2b) set folder in else clause to folder of 1d)
2c) get it running
2e) play around with parameters
2f) run writeVectors.m to get word and synset embeddings in one file
Run IMS with synset features
3a) add all three files to the IMS system
3b) search for "path to word and synset vectors"
3c) set path to file of 2f)
3d) get it running
Pre-trained embeddings
You can download pre-trained synset and lexeme embeddings
here (163 MB). They live in the same vector space as the pre-trained word embeddings by Mikolov (see
https://code.google.com/p/word2vec/). The synset ids correspond to the ids in WordNet 1.7.1. A mapping of synset ids to lexemes in the synset is included in the zip file.
Note: An updated version of the lexeme vectors was uploaded on 2015/11/25.
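To work with the vectors outside of MATLAB, a small reader is usually enough. The sketch below assumes a word2vec-style text layout (an optional "count dimension" header line, then one entry per line as a token followed by its values). That layout is an assumption; check the files in the zip before relying on it. The function name load_text_vectors and the sample entries are hypothetical.

```python
import io


def load_text_vectors(handle):
    """Read 'token v1 v2 ...' lines into a dict of token -> list of floats.

    Assumes whitespace-separated fields and tokens without spaces.
    A first line made of exactly two integers is treated as a header.
    """
    vectors = {}
    for line_no, line in enumerate(handle):
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        if line_no == 0 and len(parts) == 2:
            try:
                int(parts[0])
                int(parts[1])
                continue  # looks like a 'count dimension' header
            except ValueError:
                pass  # not a header, fall through and parse as an entry
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


# Made-up sample standing in for a real file (assumption).
sample = io.StringIO(
    "2 3\n"
    "bank 0.9 0.1 0.4\n"
    "hypothetical-synset-id 0.8 0.0 0.6\n"
)
vectors = load_text_vectors(sample)
print(len(vectors), "vectors loaded")
```

For a file on disk, pass an open file object instead of the StringIO sample, for example `open(path, encoding="utf-8")`.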
Cite
If you use AutoExtend, please cite the following paper:
@InProceedings{rothe-schutze:2015:P15-1,
author = {Rothe, Sascha and Sch\"{u}tze, Hinrich},
title = {AutoExtend: Extending Word Embeddings to Embeddings for Synsets and Lexemes},
booktitle = {Proceedings of the ACL},
year = {2015}
}
Contact: Sascha Rothe (cis page)