
Bagging to BERT tutorial

Ben · Jan 24, 2023

Well, it’s official. I’m bad at blogging.

But! It’s never too late for a fresh start, so I’m getting back to it.

I wanted to share some materials from a tutorial I gave at PyData NYC this past year. I’m very proud of how it came out, and the response from the audience was excellent.

Tutorial GitHub repository

Video of the talk

The focus of this tutorial was providing an overview of NLP methods. I aimed to create a set of exercises that built on one another. I started with fundamentals like tokenization and word counts, added complexity in the form of learned weights (TF-IDF and topic models), and then worked up to the transfer learning of modern NLP.
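To give a flavor of that first step up in complexity, here’s a minimal sketch (not the tutorial code itself) of going from raw counts to TF-IDF weights with scikit-learn on a toy corpus:

```python
# A minimal sketch of the count -> TF-IDF progression on a toy corpus.
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs are great",
]

# Step 1: plain bag-of-words token counts.
counts = CountVectorizer().fit_transform(docs)

# Step 2: the same counts, reweighted by inverse document frequency,
# so terms that appear everywhere contribute less.
tfidf = TfidfVectorizer().fit_transform(docs)

print(counts.toarray())
print(tfidf.toarray().round(2))
```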

I’m planning to do another version of this tutorial this year at ODSC East. In that updated version I’ll be bringing in spaCy a bit more. Rather than spend a bunch of time looking at PyTorch code, I figured spaCy gives a good setup for all types of deep models, and folks equipped with that one tool set will do better than juggling a separate tool for each type of NLP model.
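As a rough illustration of what I mean, here’s a minimal spaCy sketch, assuming the small English pipeline (en_core_web_sm) has been downloaded. Swapping in a transformer pipeline like en_core_web_trf leaves the same Doc interface in place, which is exactly the appeal:

```python
# A minimal spaCy sketch (assumes: python -m spacy download en_core_web_sm).
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Bagging to BERT was a tutorial at PyData NYC.")

# The same Doc object exposes tokens, part-of-speech tags, and entities,
# whether the pipeline underneath is statistical or transformer-based.
for token in doc:
    print(token.text, token.pos_, token.dep_)

for ent in doc.ents:
    print(ent.text, ent.label_)
```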

Anyway - more to come on that.

In the meantime, I’m really going to try to put more up here. Maybe shorter-form musings, rather than more complex posts.

Ben