NLP course Week 4 - Attention and transformers

Ben · May 04, 2020 · 1 min read

Finishing off the more theory-focused lectures for my course at Harvard Extension, we’re focusing on more state-of-the-art NLP model design elements. In particular, we’re going through some basic use of attention and transformers.

Week 4 notebook.

This week, we do some basic exploration of the more recent advancements in NLP modelling. The main elements we discuss are:

  • Bi-directional LSTMs, with a focus on ELMo
  • Attention
  • Transformers, with a focus on BERT

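We use ELMo and BERT for their contextual representations of individual tokens: the vector for a token depends on the sentence around it, whereas a “shallow” embedding such as GloVe assigns each word a single fixed vector. The idea here is to show how simple it is to make use of the state of the art and how much value these models can add on top of the typical “shallow” word embeddings.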

For attention, we implement an “attention block” in our sentiment LSTM model from the previous class. This implementation is extremely simplified. My goal was to get the “core” of the mechanism in there. That meant leaving out steps like reprojecting the query and key vectors, along with the parameters associated with those reprojections (e.g. dropout). So the output of the block is not particularly informative, but I feel it gives the idea of the mechanism.
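To make the “core” of the mechanism concrete, here is a minimal sketch in plain Python. It is not the code from the notebook. It assumes the per-token hidden states of the LSTM act as both keys and values, that the final hidden state is the query, and that the scores are scaled by the square root of the vector size (borrowed from scaled dot-product attention). The names `simple_attention`, `hidden_states` and `query`, and the toy numbers, are made up for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def simple_attention(hidden_states, query):
    """Dot-product attention with no learned projections and no dropout.

    hidden_states: one vector per token, used as both keys and values
                   (assumption: these would be the LSTM outputs).
    query:         a single vector (assumption: the final hidden state).
    Returns (context_vector, attention_weights).
    """
    dim = len(query)
    # 1. Score every token against the query.
    scores = [dot(h, query) / math.sqrt(dim) for h in hidden_states]
    # 2. Turn the scores into weights that sum to 1.
    weights = softmax(scores)
    # 3. The context vector is the weighted average of the hidden states.
    context = [
        sum(w * h[i] for w, h in zip(weights, hidden_states))
        for i in range(dim)
    ]
    return context, weights


# Toy hidden states for a 3-token sentence (made-up values).
hidden_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = hidden_states[-1]
context, weights = simple_attention(hidden_states, query)
print("weights:", weights)
print("context:", context)
```

A fuller attention block would first pass the query and keys through learned linear layers before step 1, which is exactly the part left out here. In a sentiment model, the resulting context vector would be what gets fed to the classifier.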

This notebook finishes off the more theory-focused part of the class. For the next four sessions, we’ll be focusing on applications. My thinking right now is to select some good examples of soup-to-nuts projects and take the class through how to go from idea, to data processing, to model training and output.

Stay tuned for more! As always, let me know what you think (Twitter or GitHub, right there on the left!).

Written by Ben
I am the type of person who appreciates the telling of a great story. As a data scientist, I am interested in using AI to understand, explore and communicate across borders and boundaries. My focus is on Natural Language Processing and the amazing ways it can help us understand each other and our world. The material on this blog is meant to share my experiences and understandings of complex technologies in a simple way, focused on application.