Finishing off the more theory-focused lectures for my course at Harvard Extension, we're focusing on the more state-of-the-art elements of NLP model design. In particular, we go through some basic uses of attention and transformers.
This week, we do some basic exploration of the more recent advancements in NLP modelling. The main elements we discuss are contextual embeddings (via ELMo and BERT) and the attention mechanism.
We use ELMo and BERT for their contextual representations of individual tokens. The idea here is to show how simple it is to make use of the state of the art, and how much value these models can add on top of typical "shallow" word embeddings (e.g. GloVe).
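As a rough illustration (not the exact notebook code), here's a minimal sketch of pulling contextual token embeddings from BERT, assuming the Hugging Face `transformers` library and the `bert-base-uncased` checkpoint; the class notebook may use a different toolkit or model.

```python
# Minimal sketch: contextual token embeddings from BERT.
# Assumes PyTorch and the Hugging Face `transformers` library are installed.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

sentence = "The bank raised interest rates."
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# One vector per (sub)word token, each conditioned on the whole sentence --
# unlike a static GloVe lookup, "bank" here gets a finance-flavoured vector.
token_embeddings = outputs.last_hidden_state  # shape: (1, num_tokens, 768)
print(token_embeddings.shape)
```

From here, those per-token vectors can be dropped into a downstream model in the same place a GloVe embedding layer would normally sit.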
For attention, we actually implement an "attention block" in our sentiment LSTM model from the previous class. The implementation is extremely simplified: my goal was to get the "core" of the mechanism in there, which meant leaving out steps like reprojecting the query and key vectors, along with the parameters associated with those reprojections (e.g. dropout). So the output of the block is not particularly informative, but I feel it conveys the idea of the mechanism. A sketch of that core idea follows below.
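To give a feel for what such a stripped-down block looks like, here's a minimal sketch assuming a PyTorch LSTM sentiment model. The names (`SimpleAttention`, `hidden_dim`) are illustrative, and the class notebook's version may differ in detail.

```python
# A rough sketch of the "core" attention idea described above:
# score each LSTM output against a learned query vector and return an
# attention-weighted sum. No key/query reprojection, no dropout.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleAttention(nn.Module):
    def __init__(self, hidden_dim: int):
        super().__init__()
        # A single learned query vector plays the role of "what to look for".
        self.query = nn.Parameter(torch.randn(hidden_dim))

    def forward(self, lstm_outputs: torch.Tensor) -> torch.Tensor:
        # lstm_outputs: (batch, seq_len, hidden_dim)
        scores = lstm_outputs @ self.query             # (batch, seq_len)
        weights = F.softmax(scores, dim=1)             # attention distribution
        context = (weights.unsqueeze(-1) * lstm_outputs).sum(dim=1)
        return context                                  # (batch, hidden_dim)

# Usage: feed the LSTM's per-timestep outputs through the block and pass
# the pooled context vector on to the final sentiment classifier layer.
lstm_out = torch.randn(4, 20, 128)    # dummy batch: 4 sentences, 20 steps
pooled = SimpleAttention(128)(lstm_out)
print(pooled.shape)                    # torch.Size([4, 128])
```

The point of keeping it this bare is that the whole mechanism reduces to "score, softmax, weighted sum"; everything a full transformer adds (projections, multiple heads, dropout) is layered on top of that.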
This notebook finishes off the more theory-focused part of the class. For the next four sessions, we'll be focusing on applications. My thinking right now is to select some good examples of soup-to-nuts projects and take the class through how to go from idea, to data processing, to model training and output.
Stay tuned for more! As always, let me know what you think (Twitter or Github right there on the left!)