Whew! A bit of a delay here, brought on by the fact that teaching a course is hard! Who knew (besides everyone who’s ever done it)!
In week 6 of my course at Harvard Extension we build on the previous week’s scoping discussion by applying it to some specific use-cases in NLP. Those use-cases are:
1) Unsupervised clustering/grouping of text 2) Named-entity recognition (NER) 3) Natural language generation (NLG)
For 1 and 2, I provide some of my own experience working with these use-cases. When I worked at ThriveHive, the team spent a lot of time thinking about ways to group our customer population, which was a particularly challenging task since small businesses don’t fit neatly into existing categories. We arrived at an NLP pipeline that could ingest website text (i.e. the business’ own description of itself) and create a business “representation” vector. This was useful for grouping and for similarity calculation.
For NER, I walked through an example training a model from scratch with SpaCy. All this material will be made available as slides once the course is over.
For NLG, I don’t really have a real-world use-case to model. But I found it fascinating that we could take the sentiment prediction model we implemented in week 3 and turn it into a generative model with just a few tweaks. So I put in a simple implementation in the notebook
The week 6 notebook walks through some simple examples of the three use-cases. I thought it might be useful for students to walk through these examples to get an idea how they might implement solutions similar to what I spoke about in class.
Definitely let me know if you have feedback!