Posts by Ben

Ben Needs a Friend tutorial release

It’s ready! In this repository you will find all the materials for my ODSC tutorial “Ben Needs a Friend”.

In LLM, tutorials, May 09, 2024

ODSC East 2024 Recap

It’s a wrap! ODSC East 2024 has come to an end, and what a ride it was. I’ll have another post next week on the materials from my LLM application workshop, but for now I wanted to ge...

In Conference recap, Apr 26, 2024

Building a Command ARR+ agent

This month superstar LLM startup Cohere released their Command-R+ model. I experimented with this model to create a functional (and free!) agent workflow to look up local events.

In tutorial, LLM, Apr 11, 2024

Teaching text representation techniques

This is an update of some materials from my Harvard NLP course. I focused here on three methods for representation of text data that are simple, quick and transparent. This is part ...

In tutorial, NLP, Mar 29, 2024

EU's AI Act - What it is, what it means

On Wednesday, European Union leaders approved the AI Act, which proposes a “risk categorization” framework for the use of AI. Some uses will be banned altogether, while others will...

In AI policy, articles, Mar 15, 2024

LLM Apps Part 7 - Looking ahead

We’ve come to the end of my LLM app series (for now). I really enjoyed putting this together and I will be working in the coming weeks on expanding some of these sections. I’m looki...

In LLM App Series, LLM, tutorials, Mar 08, 2024

LLM Apps Part 6 - Call my LLM agent!

We’ve explored how to make our LLM applications conversational, relevant and personalized, but now it’s time to put them to work.

In LLM App Series, LLM, tutorials, Feb 26, 2024

Insight Lane at MIT

I’m still cranking away at part 6 of my LLM series, but I didn’t want you all to think I’d gone dormant again! This week I got the old band together to promote our Insight Lane crash ...

In updates, Feb 09, 2024

Fine-tuning ChatGPT for Dermatology

The main goal of my project with dermatology tech startup Melatech is to build a prototype clinical note generation engine. The notes generated need to have the following qualities:

In Dermatology, tutorial, LLM, Jan 24, 2024

LLM Apps Part 5 - Even finer tuning - Quantization and LoRA

In this post, we continue exploring how to “tune” the output of our LLM application to be more…Friends-ly.

In LLM App Series, LLM, tutorials, Jan 17, 2024

LLM Apps Part 4 - Only the finest tuning

In this post, we explore how to make our LLM application “talk” the way we want by fine-tuning OpenAI’s GPT-3.5 model.

In LLM App Series, LLM, tutorials, Jan 10, 2024

New Year, New Blog

Happy new year! You may have noticed a few…changes here.

In updates, Jan 03, 2024

LLM Apps Part 3 - RAGs to riches

This is part 3 of my series on LLM application development, see the previous two parts:

In LLM App Series, LLM, tutorials, Dec 17, 2023

LLM technology for dermatology

One of the projects I’ve been working on recently has been with a startup named Melatech, which aims to develop AI-assisted workflows for clinical training and management. One of the...

In Dermatology, research, Dec 12, 2023

LLM Apps Part 2 - Ollama, LlamaBot and Oobabooga - Oh my!

This is part 2 of my series on LLM applications. Check out part 1.

In LLM App Series, LLM, tutorials, Dec 01, 2023

LLM Apps Part 1 - Coding from scratch

Good news everyone! I’m going to be presenting again at ODSC East! This time I’m going to shift focus and really dig into Large Language Models (LLMs). Partly because everyone else i...

In LLM App Series, LLM, tutorials, Nov 21, 2023

Tokens to Transformers: An update to Bagging to BERT

I swear, I’m still here! The past few weeks have been a bit of madness with job transition, travel, etc.

In Bagging to BERT, NLP, tutorials, Oct 04, 2023

Llama-2: Judgement Day

This is another paper summary, this time focusing on the Llama-2 paper.

In papers, NLP, LLM, Sep 06, 2023

Platypus - Quick, Cheap, and Powerful Refinement of LLMs

Hi all! In the interest of keeping up with the latest and greatest in LLMs, I’ve been collecting some articles and will hopefully have time to work through each. Typically my process...

In LLM, papers, Aug 22, 2023

The Embiggening of NLP - Part 3 - Fine-tuning

In the last two posts, I talked about language models and some reasons people are saying that size matters. The last part of this I want to cover is something I feel is often missing...

In LLM Intro Series, LLM, tutorials, Aug 10, 2023

The Embiggening of NLP - Part 2 - LARGE Language Models

Continuing where we left off last week, let’s get into some BIG topics. We’re going to be large and in charge. This will be no small task.

In LLM Intro Series, LLM, tutorials, Jul 27, 2023

The Embiggening of NLP - Part 1 - Language Models

I’m not sure if you’ve heard, but there’s a new kid on the NLP block.

In LLM Intro Series, LLM, tutorials, Jul 20, 2023

Update to Bagging to BERT Tutorial

Since the last time I presented on NLP there’s been some…developments.

In May 18, 2023

Geoffrey Hinton Interview with On the Media

I recently listened to a really interesting interview with Geoffrey Hinton, one of the major figures in the deep learning space. The main focus of the conversation was around ChatGPT...

In Feb 16, 2023

Bagging to BERT tutorial

Well, it’s official. I’m bad at blogging.

In Bagging to BERT, NLP, tutorials, Jan 24, 2023

Bias and Ethics in NLP Talk

Still working on getting more content up here, but thought in the meantime I’d share a talk I gave at the Spark NLP Summit earlier this month.

In ethics, NLP, talks, Oct 18, 2021

PyData Global 2020 recap

Holy moly am I the king of dropping the ball here. I had put this recap together based on some of the really awesome talks I attended during PyData Global and then promptly forgot ab...

In Conference recap, Jan 05, 2021

GPT-Who? Exploring the history of GPT

I’m not sure if this confused anyone else, but each time I hear of the “new” GPT, I wonder, what exactly is “new” about it? I know, generally, it utilizes some version of the transfor...

In LLM, articles, Oct 28, 2020

NLP course slides

Whew, a bit of a delay here, kind of a crazy time (for everyone, I’m sure).

In NLP Course, NLP, tutorials, Oct 21, 2020

NLP course Week 7 - Ethics and bias in NLP

It’s the last week! For as much as we covered, this all seems to have gone very fast. I have some reflections on the course I’ll dig into next week. This week, though, I’m releasing...

In NLP Course, NLP, tutorials, Aug 13, 2020

NLP course Week 6 - NLP use-cases

Whew! A bit of a delay here, brought on by the fact that teaching a course is hard! Who knew (besides everyone who’s ever done it)!

In NLP Course, NLP, tutorials, Aug 04, 2020

NLP course Week 5 - Scoping an NLP project

The second half of my course at Harvard Extension will focus on applications of the theory we cover in the first half. That means the notebooks are mainly going to be walkthrough...

In NLP Course, NLP, tutorials, Jun 08, 2020

New website!

Just wanted to put a small announcement up here about my new, shiny website. Shiny as in lustrous, not RShiny :P

In Jun 03, 2020

PyData NLP Workshop

Tomorrow I’ll be hosting a virtual meetup with PyData demonstrating some basic techniques in NLP!

In May 26, 2020

NLP course Week 4 - Attention and transformers

Finishing off the more theory-focused lectures for my course at Harvard Extension, we’re focusing on the more state-of-the-art NLP model design elements. Particularly, we’re going...

In NLP Course, NLP, tutorials, May 04, 2020

NLP course Week 3 - Context

More notebooks from my upcoming course at Harvard Extension, here is week 3, focusing on context.

In NLP Course, NLP, tutorials, Apr 14, 2020

NLP course Week 2 - Vectorization

Continuing with the release of some of the notebooks from my upcoming course at Harvard Extension, here is week 2, focusing on going from tokens to vectors.

In NLP Course, NLP, tutorials, Mar 31, 2020

NLP course Week 1 - Tokenization

As I mentioned in a previous post, I’m going to be teaching a course at Harvard Extension this summer. I figured I’d begin posting some of the material here on the blog in case peopl...

In NLP Course, NLP, tutorials, Mar 10, 2020

Terraform tutorials - Set up with GCP

Something a little different here. I’ve found that I often am setting up a lot of infrastructure to serve applications and models and usually my approach is a bit ad-hoc. I have som...

In Mar 04, 2020

Strata AI talk and Harvard class

Happy new year!

In Feb 10, 2020

MIT Statistical Learning Lecture 3 Features and Kernels

All apologies for the long delay here. Lots of stuff going on including, but not limited to: a 3-week journey to China; the revival of the Boston chapter of PyData; various work-rel...

In Dec 06, 2019

MIT Statistical Learning Lecture 2 - Regularized Least Squares

So we’re back with more Statistical Learning. In this class, we introduce the idea of regularization and some of the statistical underpinnings there.

In Oct 04, 2019

MIT Statistical Learning Lecture 1 - Overview

Recently I started attending a class at MIT on Statistical Learning Theory and Applications. The course is basically a deep dive into the statistical underpinnings of Machine Learnin...

In Sep 25, 2019

Sentiment analysis with spaCy-PyTorch Transformers

Trying another new thing here: There’s a really interesting example making use of the shiny new spaCy wrapper for PyTorch transformer models that I was excited to dive into. I figure...

In Sep 18, 2019

Sustainability certification data release

Sustainability certification brand inventory

In Sep 05, 2019

Matching the Blanks: Distributional Similarity for Relation Learning

I heard from a colleague about this paper recently and thought it was really interesting and relevant to some of the work I’m doing. I figure I’ll aim to do a paper summary here ever...

In Aug 28, 2019

What's missing in NLP

What’s missing in Natural Language Processing

In Aug 19, 2019

Entity-Linking in spaCy

Annnnd we’re back with more overviews of talks from the spaCy IRL conference. This time Sofie Van Landeghem takes us through the work-in-progress Entity-Linking model in spaCy.

In Aug 12, 2019

Transfer Learning in Open-source NLP

As I mentioned last time, I’m going to start digging into some topics of interest to me and put together my notes in blog form. My main purpose here is to keep an inventory of the to...

In Aug 06, 2019

Stanford CS230 Lecture 5 - AI and Healthcare

All apologies for taking so long to get back to this, lots of development in life has kind of derailed my routines. But I want to keep to a schedule of updating on a weekly basis. M...

In Jul 29, 2019

ODSC 2019 recap

Sorry, all, for not updating for a while. Big job change happened back in June so I’m just now getting my head above water. That’ll have to be my (insufficient) excuse for taking so...

In Conference recap, Jul 22, 2019

Stanford CS230 Lecture 4 - Adversarial Attacks

Kian Katanforoosh dives deep again in this episode, presenting inherent vulnerabilities in Neural Network architectures and how investigation into these vulnerabilities has spawned a ...

In May 27, 2019

Stanford CS230 Lecture 3 - Full-cycle deep learning projects

So we’re back with Andrew Ng in this lecture to go over, at a pretty high level, the steps and considerations of a complete Deep Learning project. Again, all of the materials come fr...

In May 16, 2019

Stanford CS230 Lecture 2 - Deep Learning Intuition

Now we’re getting into the meat of things. This lecture was taught by Kian Katanforoosh and dove into a set of examples of Deep Learning projects to discuss design considerations. I’l...

In May 08, 2019

Stanford CS230 Lecture 1 - Introduction

Hi folks! I’m going to start a series documenting my progress through the Stanford CS230 course. The course is meant to go alongside the Coursera Deeplearning.ai materials, but right...

In May 01, 2019

Using ROC curves for Classification

I’m thinking it might be good to start producing some kind of content aside from infrequent GitHub updates. To be honest, I use a lot of these concepts on a daily basis, but really d...

In Apr 01, 2019

NLP Workshop at Data Science Salon

I’ll be running a workshop tomorrow (Feb 21st) at the Data Science Salon in Austin. It should be really interesting, and I’m excited to chat with everyone during and after.

In Feb 20, 2019

PyData Probabilistic Modelling Workshop video

Very late on the update, but the video for the workshop Matt and I gave was posted on the PyData YouTube channel. Check out the rest of the materials on this post.

In Feb 10, 2019

Recaps of 2018 conferences

I put together some summaries of some of the interesting talks I went to at PyData NYC and ODSC Europe. This was mostly to share with the engineering team at my office, but I thought...

In Dec 08, 2018

Video of ODSC presentation on hierarchical modeling

Great news! I just uploaded a recording of my talk at ODSC Europe. Again, slides are available here.

In Nov 25, 2018

Hierarchical Modelling in Real Life at PyData NYC

It’s conference season, huzzah! My friend Matt Moocarme and I will be giving a workshop on hierarchical probabilistic models tomorrow at PyData NYC. It’s based on the methods I descri...

In Oct 18, 2018

Bayesian Hierarchical modeling talk

I’ll be giving a talk at ODSC Europe tomorrow (2018-09-22) on Bayesian Hierarchical modeling and an application at my work. Should be really interesting. I’m putting the slides up h...

In Sep 21, 2018

Progress on D4D project

I wanted to share some recognition the D4D Crash model project (now referred to as Insight Lane) has been receiving in recent months. First, we released version 1 of our project,...

In Sep 05, 2018

Series on interpretability in Machine Learning

Hi internets,

In Aug 13, 2018

Updates

Sorry for the radio silence, I recently started a job as Data Scientist for ThriveHive, a SaaS company providing managed marketing for small businesses. It’s super interesting, and h...

In Jun 14, 2017

Results of the D4D hackathon

Thanks to all the awesome people at Data 4 Democracy, MIT and data.world we pulled off a great hackathon. If you want to see some of the participants showing off their work (and my t...

In Apr 03, 2017

D4D traffic modelling hackathon.

It’s happening, people! The Data 4 Democracy (with a little help from City of Boston) hackathon. If you’re in the Boston area or a fan of the Boston area or, heck, if you just got no...

In Mar 23, 2017

Adventures in spatial analysis

Related to my project with Data 4 Democracy, I’ve started compiling some scripts for spatial analysis. Particularly interesting (at least to me) is a bit of TSPLIB I adapted for use w...

In Mar 17, 2017

Data 4 Democracy project

Hello internet fans! Just wanted to add a quick update that I’m working on an interesting project with the Data 4 Democracy folks. Check it out.

In Mar 06, 2017

VoterKarma app now with more boost

The team and I from Debug Politics are still working on updating the VoterKarma app.

In Feb 05, 2017

New tutorial

Hi internet friends! I put together a small tutorial using pandas and some API wizardry to do simple analysis. I use open data from Boston and the Wunderground API and deal with data...

In Feb 04, 2017

VoterKarma app

Hello adoring public. This past weekend I attended the Debug Politics Hackathon, a hackathon dedicated to addressing issues with politics, voting and the rhetoric around both. It wa...

In Jan 18, 2017

Basic python lesson

There’s a small chance I may be doing some part-time data science instructoring, so I put together a very basic tutorial for those just getting started with Python (or programming gen...

In Jan 06, 2017

Reddit climate writeup

The Case For Climate on Reddit, by Ben Batorsky

In Jan 02, 2017

Reddit climate project

Just posted an exercise I did with reddit data and characterizing the conversation around climate change. What was super interesting was using the networkx library, which allows you ...

In Jan 02, 2017

Update to Wellness Tool

Bad news: I’ve been held up on doing my election project,

In Nov 27, 2016

Data challenge code

Some bonus good news: I put up a data challenge I did for an application. And whether or not they think it’s worth hiring me over, you all get to peruse the code. I have some neatly...

In Nov 27, 2016

Mini update for mini-project

Wow, what a mess, huh?

In Nov 11, 2016

Doing a mini-project

I’m updating my poorly organized/documented twittertools. Turns out Tweepy isn’t being maintained anymore, so I will be switching to python twitter tools. Get excited, I have some e...

In Nov 07, 2016

Testing 1 2 3

Hello world. Going to start adding to this page, stay tuned, internet.

In Nov 02, 2016