Posts by Ben
Some thoughts on job searches
I recently had a very successful end to a particularly stressful job search and I thought it might be useful to share some of my experiences.
In updates, articles, Jun 10, 2024
Ben Needs a Friend tutorial release
It’s ready! In this repository you will find all the materials for my ODSC tutorial “Ben Needs a Friend”.
In LLM, tutorials, May 09, 2024
ODSC East 2024 Recap
It’s a wrap! ODSC East 2024 has come to an end and what a ride it was. I’ll have another post next week on the materials from my LLM application workshop, but for now I wanted to ge...
In Conference recap, Apr 26, 2024
Building a Command ARR+ agent
This month superstar LLM startup Cohere released their Command-R+ model. I experimented with this model to create a functional (and free!) agent workflow to look up local events.
In tutorial, LLM, Apr 11, 2024
Teaching text representation techniques
This is an update of some materials from my Harvard NLP course. I focused here on three methods for representation of text data that are simple, quick and transparent. This is part ...
In tutorial, NLP, Mar 29, 2024
EU's AI Act - What it is, what it means
On Wednesday, European Union leaders approved the AI Act, which proposes a “risk categorization” framework for the use of AI. Some uses will be banned altogether, while others will...
In AI policy, articles, Mar 15, 2024
LLM Apps Part 7 - Looking ahead
We’ve come to the end of my LLM app series (for now). I really enjoyed putting this together and I will be working in the coming weeks on expanding some of these sections. I’m looki...
In LLM App Series, LLM, tutorials, Mar 08, 2024
LLM Apps Part 6 - Call my LLM agent!
We’ve explored how to make our LLM applications conversational, relevant and personalized, but now it’s time to put them to work.
In LLM App Series, LLM, tutorials, Feb 26, 2024
Insight Lane at MIT
I’m still cranking away at part 6 of my LLM series, but I didn’t want you all to think I’d gone dormant again! This week I got the old band together to promote our Insight Lane crash ...
In updates, Feb 09, 2024
Fine-tuning ChatGPT for Dermatology
The main goal of my project with dermatology tech startup Melatech is to build a prototype clinical note generation engine. The notes generated need to have the following qualities:
In Dermatology, tutorial, LLM, Jan 24, 2024
LLM Apps Part 5 - Even finer tuning - Quantization and LoRA
In this post, we continue exploring how to “tune” the output of our LLM application to be more…Friends-ly.
In LLM App Series, LLM, tutorials, Jan 17, 2024
LLM Apps Part 4 - Only the finest tuning
In this post, we explore how to make our LLM application “talk” the way we want by fine-tuning OpenAI’s GPT-3.5 model.
In LLM App Series, LLM, tutorials, Jan 10, 2024
New Year, New Blog
Happy new year! You may have noticed a few…changes here.
In updates, Jan 03, 2024
LLM Apps Part 3 - RAGs to riches
This is part 3 of my series on LLM application development, see the previous two parts:
In LLM App Series, LLM, tutorials, Dec 17, 2023
LLM technology for dermatology
One of the projects I’ve been working on recently has been with a startup named Melatech, which aims to develop AI-assisted workflows for clinical training and management. One of the...
In Dermatology, research, Dec 12, 2023
LLM Apps Part 2 - Ollama, LlamaBot and Oobabooga - Oh my!
This is part 2 of my series on LLM applications. Check out part 1
In LLM App Series, LLM, tutorials, Dec 01, 2023
LLM Apps Part 1 - Coding from scratch
Good news everyone! I’m going to be presenting again at ODSC East! This time I’m going to shift focus and really dig into Large Language Models (LLMs). Partly because everyone else i...
In LLM App Series, LLM, tutorials, Nov 21, 2023
Tokens to Transformers: An update to Bagging to BERT
I swear, I’m still here! The past few weeks have been a bit of madness with job transition, travel, etc.
In Bagging to BERT, NLP, tutorials, Oct 04, 2023
Llama-2: Judgement Day
This is another paper summary, this time focusing on the Llama-2 paper
In papers, NLP, LLM, Sep 06, 2023
Platypus - Quick, Cheap, and Powerful Refinement of LLMs
Hi all! In the interest of keeping up with the latest and greatest in LLMs, I’ve been collecting some articles and will hopefully have time to work through each. Typically my process...
In LLM, papers, Aug 22, 2023
The Embiggening of NLP - Part 3 - Fine-tuning
In the last two posts, I talked about language models and some reasons people are saying that size matters. The last part of this I want to cover is something I feel is often missing...
In LLM Intro Series, LLM, tutorials, Aug 10, 2023
The Embiggening of NLP - Part 2 - LARGE Language Models
Continuing where we left off last week, let’s get into some BIG topics. We’re going to be large and in charge. This will be no small task.
In LLM Intro Series, LLM, tutorials, Jul 27, 2023
The Embiggening of NLP - Part 1 - Language Models
I’m not sure if you’ve heard, but there’s a new kid on the NLP block.
In LLM Intro Series, LLM, tutorials, Jul 20, 2023
Update to Bagging to BERT Tutorial
Since the last time I presented on NLP there’s been some…developments.
In May 18, 2023
Geoffrey Hinton Interview with On the Media
I recently listened to a really interesting interview with Geoffrey Hinton, one of the major figures in the deep learning space. The main focus of the conversation was around ChatGPT...
In Feb 16, 2023
Bagging to BERT tutorial
Well, it’s official. I’m bad at blogging.
In Bagging to BERT, NLP, tutorials, Jan 24, 2023
Bias and Ethics in NLP Talk
Still working on getting more content up here, but thought in the meantime I’d share a talk I gave at the Spark NLP Summit earlier this month.
In ethics, NLP, talks, Oct 18, 2021
PyData Global 2020 recap
Holy moly am I the king of dropping the ball here. I had put this recap together based on some of the really awesome talks I attended during PyData Global and then promptly forgot ab...
In Conference recap, Jan 05, 2021
GPT-Who? Exploring the history of GPT
I’m not sure if this confused anyone else, but each time I hear of the “new” GPT, I wonder, what exactly is “new” about it? I know, generally, it utilizes some version of the transfor...
In LLM, articles, Oct 28, 2020
NLP course slides
Whew, a bit of a delay here, kind of a crazy time (for everyone, I’m sure).
In NLP Course, NLP, tutorials, Oct 21, 2020
NLP course Week 7 - Ethics and bias in NLP
It’s the last week! For as much as we covered, this all seems to have gone very fast. I have some reflections on the course I’ll dig into next week. This week, though, I’m releasing...
In NLP Course, NLP, tutorials, Aug 13, 2020
NLP course Week 6 - NLP use-cases
Whew! A bit of a delay here, brought on by the fact that teaching a course is hard! Who knew (besides everyone who’s ever done it)!
In NLP Course, NLP, tutorials, Aug 04, 2020
NLP course Week 5 - Scoping an NLP project
The second half of my course at Harvard Extension will focus on applications of the theory we cover in the first half. That means the notebooks are mainly going to be walkthrough...
In NLP Course, NLP, tutorials, Jun 08, 2020
New website!
Just wanted to put a small announcement up here about my new, shiny website. Shiny as in lustrous, not RShiny :P
In Jun 03, 2020
PyData NLP Workshop
Tomorrow I’ll be hosting a virtual meetup with PyData demonstrating some basic techniques in NLP!
In May 26, 2020
NLP course Week 4 - Attention and transformers
Finishing off the more theory-focused lectures for my course at Harvard Extension, we’re focusing on the more state-of-the-art NLP model design elements. Particularly, we’re going...
In NLP Course, NLP, tutorials, May 04, 2020
NLP course Week 3 - Context
More notebooks from my upcoming course at Harvard Extension, here is week 2, focusing on going from tokens to vectors.
In NLP Course, NLP, tutorials, Apr 14, 2020
NLP course Week 2 - Vectorization
Continuing with the release of some of the notebooks from my upcoming course at Harvard Extension, here is week 2, focusing on going from tokens to vectors.
In NLP Course, NLP, tutorials, Mar 31, 2020
NLP course Week 1 - Tokenization
As I mentioned in a previous post, I’m going to be teaching a course at Harvard Extension this summer. I figured I’d begin posting some of the material here on the blog in case peopl...
In NLP Course, NLP, tutorials, Mar 10, 2020
Terraform tutorials - Set up with GCP
Something a little different here. I’ve found that I often am setting up a lot of infrastructure to serve applications and models and usually my approach is a bit ad-hoc. I have som...
In Mar 04, 2020
MIT Statistical Learning Lecture 3 Features and Kernels
All apologies for the long delay here. Lots of stuff going on including, but not limited to: A 3-week journey to China The revival of the Boston chapter of PyData Various work-rel...
In Dec 06, 2019
MIT Statistical Learning Lecture 2 - Regularized Least Squares
So we’re back with more Statistical Learning. In this class, we introduce the idea of regularization and some of the statistical underpinnings there.
In Oct 04, 2019
MIT Statistical Learning Lecture 1 - Overview
Recently I started attending a class at MIT on Statistical Learning Theory and Applications. The course is basically a deep dive into the statistical underpinnings of Machine Learnin...
In Sep 25, 2019
Sentiment analysis with spaCy-PyTorch Transformers
Trying another new thing here: There’s a really interesting example making use of the shiny new spaCy wrapper for PyTorch transformer models that I was excited to dive into. I figure...
In Sep 18, 2019
Sustainability certification data release
Sustainability certification brand inventory
In Sep 05, 2019
Matching the Blanks: Distributional Similarity for Relation Learning
I heard from a colleague about this paper recently and thought it was really interesting and relevant to some of the work I’m doing. I figure I’ll aim to do a paper summary here ever...
In Aug 28, 2019
Entity-Linking in spaCy
Annnnd we’re back with more overviews of talks from the spaCy IRL conference. This time Sofie Van Landeghem takes us through the work-in-progress Entity-Linking model in spaCy.
In Aug 12, 2019
Transfer Learning in Open-source NLP
As I mentioned last time, I’m going to start digging into some topics of interest to me and put together my notes in blog form. My main purpose here is to keep an inventory of the to...
In Aug 06, 2019
Stanford CS230 Lecture 5 - AI and Healthcare
All apologies for taking so long to get back to this, lots of development in life has kind of derailed my routines. But I want to keep to a schedule of updating on a weekly basis. M...
In Jul 29, 2019
ODSC 2019 recap
Sorry, all, for not updating for a while. Big job change happened back in June so I’m just now getting my head above water. That’ll have to be my (insufficient) excuse for taking so...
In Conference recap, Jul 22, 2019
Stanford CS230 Lecture 4 - Adversarial Attacks
Kian Katanforoosh dives deep again in this episode, presenting inherent vulnerabilities in Neural Network architectures and how investigation into these vulnerabilities has spawned a ...
In May 27, 2019
Stanford CS230 Lecture 3 - Full-cycle deep learning projects
So we’re back with Andrew Ng in this lecture to go over, at a pretty high level, the steps and considerations of a complete Deep Learning project. Again, all of the materials come fr...
In May 16, 2019
Stanford CS230 Lecture 2 - Deep Learning Intuition
Now we’re getting into the meat of things. This lecture was taught by Kian Katanforoosh and dove into a set of examples of Deep Learning projects to discuss design considerations. I’l...
In May 08, 2019
Stanford CS230 Lecture 1 - Introduction
Hi folks! I’m going to start a series documenting my progress through the Stanford CS230 course. The course is meant to go alongside the Coursera Deeplearning.ai materials, but right...
In May 01, 2019
Using ROC curves for Classification
I’m thinking it might be good to start producing some kind of content aside from infrequent Github updates. To be honest, I use a lot of these concepts on a daily basis, but really d...
In Apr 01, 2019
NLP Workshop at Data Science Salon
I’ll be running a workshop tomorrow (Feb 21st) at the Data Science Salon in Austin. I’m really excited to chat with everyone during and after.
In Feb 20, 2019
PyData Probabilistic Modelling Workshop video
Very late on the update, but the video for Matt and my workshop was posted on the PyData Youtube channel. Check out the rest of the materials on this post.
In Feb 10, 2019
Recaps of 2018 conferences
I put together some summaries of some of the interesting talks I went to at PyData NYC and ODSC Europe. This was mostly to share with the engineering team at my office, but I thought...
In Dec 08, 2018
Video of ODSC presentation on hierarchical modeling
Great news! I just uploaded a recording of my talk at ODSC Europe. Again, slides are available here.
In Nov 25, 2018
Hierarchical Modelling in Real Life at PyData NYC
It’s conference season huzzah! My friend Matt Moocarme and I will be giving a workshop on hierarchical probabilistic models tomorrow at PyData NYC. It’s based on the methods I descri...
In Oct 18, 2018
Bayesian Hierarchical modeling talk
I’ll be giving a talk at ODSC Europe tomorrow (2018-09-22) on Bayesian Hierarchical modeling and an application at my work. Should be really interesting. I’m putting the slides up h...
In Sep 21, 2018
Progress on D4D project
I wanted to share some recent recognition the D4D Crash model project (now referred to as Insight Lane) has been receiving. First, we released version 1 of our project,...
In Sep 05, 2018
Updates
Sorry for the radio silence, I recently started a job as Data Scientist for ThriveHive, a SaaS company providing managed marketing for small businesses. It’s super interesting, and h...
In Jun 14, 2017
Results of the D4D hackathon
Thanks to all the awesome people at Data 4 Democracy, MIT and data.world we pulled off a great hackathon. If you want to see some of the participants showing off their work (and my t...
In Apr 03, 2017
D4D traffic modelling hackathon.
It’s happening, people! The Data 4 Democracy (with a little help from City of Boston) hackathon. If you’re in the Boston area or a fan of the Boston area or, heck, if you just got no...
In Mar 23, 2017
Adventures in spatial analysis
Related to my project with Data 4 Democracy, I’ve started compiling some scripts for spatial analysis. Particularly interesting (at least to me) is a bit of TSPLIB I adapted for use w...
In Mar 17, 2017
Data 4 Democracy project
Hello internet fans! Just wanted to add a quick update that I’m working on an interesting project with the Data 4 Democracy folks. Check it out.
In Mar 06, 2017
VoterKarma app now with more boost
The team and I from Debug Politics are still working on updating the VoterKarma app.
In Feb 05, 2017
New tutorial
Hi internet friends! I put together a small tutorial using pandas and some API wizardry to do simple analysis. I use open data from Boston and the Wunderground API and deal with data...
In Feb 04, 2017
VoterKarma app
Hello adoring public. This past weekend I attended the Debug Politics Hackathon, a hackathon dedicated to addressing issues with politics, voting and the rhetoric around both. It wa...
In Jan 18, 2017
Basic python lesson
There’s a small chance I may be doing some part-time data science instructoring, so I put together a very basic tutorial for those just getting started with Python (or programming gen...
In Jan 06, 2017
Reddit climate project
Just posted an exercise I did with reddit data and characterizing the conversation around climate change. What was super interesting was using the networkx library, which allows you ...
In Jan 02, 2017
Data challenge code
Some bonus good news: I put up a data challenge I did for an application. And whether or not they think it’s worth hiring me over, you all get to peruse the code. I have some neatly...
In Nov 27, 2016
Doing a mini-project
I’m updating my poorly organized/documented twittertools. Turns out Tweepy isn’t being maintained anymore, so I will be switching to python twitter tools. Get excited, I have some e...
In Nov 07, 2016
Testing 1 2 3
Hello world. Going to start adding to this page, stay tuned, internet.
In Nov 02, 2016