Transcript

This transcript was generated automatically and may contain errors.

So welcome to the Data Science Lab, everybody. My name is Libby. I'm a data community manager here at Posit, and I am joined by my co-host, Isabella Velazquez. Isabella, please say hello.

Hi, everyone. Thanks for joining us.

We are so excited to be joined today by Edgar Ruiz. Edgar is the maintainer of the mall package, which we will be talking about today. Edgar, would you like to say hello?

Hi, everyone. Thank you for joining. Also, thank you for being here in such cold weather, and hopefully we can show some cool stuff that y'all can use.

If you have not used the mall package, we are hopefully sticking links to everything in the Discord chat. We are talking about a package that allows you to use ellmer to connect to a variety of LLMs and then apply them to your data in a programmatic way. That means: hey, I have this column of text, and I want to transform it somehow with an LLM, classify it, translate it, whatever, and then have a column that represents the output from the LLM. This is amazing for me, because when I first started using LLMs, I was like, okay, but how do I use this? To me, it was just a chat thing. How do I use this programmatically if I want to apply it to my data? The mall package abstracted away all the difficulty for me, and I think that's great.

Edgar's intro to mall

Yeah, so this is an introduction to an idea that occurred to me when I was looking at some output from LLMs a while back. I saw that asking it, is this text positive or not, was actually pretty straightforward. Even a locally installed LLM would do a pretty decent job at it, and that's kind of what got it started. At the time I started writing mall, a year, year and a half ago, because of how fast everything has evolved, companies were hesitant about sending data into the cloud if the LLM came from a cloud provider. Now that's not as big a concern; more companies are doing it.

So at the time, most companies wanted to keep it local, and it looked like a locally installed LLM was actually going to work well. So one thing I can ask folks here today: if you start seeing things we could do better or add, let us know. I'll talk through several different functions, like sentiment, classify, and extract. You can also write your own custom one, and Libby will explain more about what that does. If you see other ones we could possibly add, that would be great. Also, the code is open, and it really boils down to a simple prompt that I'm running recursively over your dataset, so improvements to the prompts and things like that are always welcome.

The link to the GitHub repo is on the website, so please feel free to reach out.

All right, perfect. So, what Edgar was mentioning: sentiment analysis, text summarization, classification of text, extraction of text, translation, and binary verification (true/false). These are all built into mall, and then there is also a custom option. What I'm going to do is run through some code where I show you what it looks like to get this connected through ellmer to an actual LLM, and then what it looks like to use some of these on your data.

NLP versus LLMs for text tasks

Can I say something real quick about the NLP thing? That's the other thing I noticed: the local LLM was actually doing really well, even though it may take a bit longer to recursively go through a dataset. It's kind of like you're paying for the time that it would take you to develop your own NLP. Yeah, it's a tax, right? It takes longer to run, yes, but you didn't have to spend the time doing the NLP yourself, tokenizing your words and then doing all the other analysis you would have to do.


NLP, I feel like, is much more predictable. In the chat, someone asked: what's the difference between using mall for these NLP tasks versus traditional or classical NLP? The difference is really just that you are using an LLM instead of traditional NLP methods. Those might be named entity recognition, or sentiment analysis using a defined lexicon, or something I love to do in NLP, which is a kind of power analysis. That means you create a lexicon or database of words or phrases that indicate certain things, and then you use code and math to compare the text you are working on against your database of what counts as a positive word, a negative word, a neutral word, or a powerful word.

Live coding: setting up ellmer and Ollama

So what I will say at the top of the script is: if you have not already installed mall and ellmer, those are the two you are really, really going to need. I have a little install.packages() for you if you would like, and I also have this repo for you.

So I'm going to go ahead and run this chunk, which is just going to load ellmer, mall, and the tidyverse. If you have access to a commercial LLM, you need to make sure your API key is in your .Renviron file or your .env file so that ellmer can recognize it and use it. I am going to be using an Anthropic model today, but not through an API key in the usual way; I'm using it through AWS Bedrock. I'm also going to show you Ollama, which is so much more accessible. It runs LLMs locally on your machine, and you can go to ollama.com/download.
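As a rough sketch, that setup chunk might look like the following. The package names are real; the ANTHROPIC_API_KEY variable is just an example, since the exact environment variable depends on your provider:

```r
# One-time install of the two packages you really need (plus tidyverse):
# install.packages(c("mall", "ellmer", "tidyverse"))

library(ellmer)
library(mall)
library(tidyverse)

# For a commercial provider, ellmer picks the key up from an
# environment variable set in .Renviron (or a .env file), e.g.:
# ANTHROPIC_API_KEY=sk-...
```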

And Corey had just asked, can mall be used with a local LLM? Yep, absolutely. You can use it with Ollama, and what I'm going to do is show you that I already have Ollama installed and running. On my Mac menu bar, there is a little symbol that looks like Jar Jar Binks to me, but it's definitely, definitely Ollama. And here it is. I just asked it a question. I said, tell me a joke, and it said: here's one. What do you call a fake noodle? An impasta.

So Ollama is up and running on my system. I'm going to let Edgar hop in here and just mention the size caveat on some of these, because with local LLMs, you're pulling them down onto your machine to run them.

Yes, exactly. That's exactly the point. The issue is that these have to run in memory. So you may have a hard drive that's big enough, but you still need enough RAM to run these things. Llama 3.2 is wonderful because it's very generalized. Going back to the NLP comparison: you're limited in the amount of training data you'd have for your own NLP, but with Llama you have so many more billions of tokens of text. The trade-off is that it takes a lot of space. Llama 3.2 is about two gigabytes, while the latest Llama, version 4, is actually 67 gigabytes on the smaller side.

Yeah, 3.2 is what I recommend. Also, as time goes by, more folks publish models that are more specialized. You may want to use this in a very specific field, and there may be models like that. So definitely follow the ollama.com website; you can search for the type of model you may want to use. But at this point, Llama, to me, is the better one.

So I'm going to close this and talk about the ollamar package really quickly. The ollamar package allows you to pull down Ollama models and have them recognized on your machine very quickly. For example, I was able to run ollamar's pull() and just give it llama3.2 as my model. Once that's done, and I'm not going to run it because it already is, I can run this line of code right here that tests the connection to Ollama. I've got my status 200, which means all good.
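The ollamar steps described here look roughly like this (a sketch; it assumes Ollama is already installed and running locally):

```r
library(ollamar)

pull("llama3.2")    # download the model so Ollama can serve it
test_connection()   # inspect the response; status 200 means all good
```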

Okay, so let's head over to do something very important here, which is something I actually forgot to do the first time I did this, but it is now in the repo: set a seed so that these random samples are reproducible. What I'm doing here is collecting a few data sets that are full of tweets. This is from one of the Kaggle tweet data sets, and it's linked in the readme of the repo. So if you would like, go download that tweets.csv; it is also in the repo as data right here.

And what I've done is take a few samples. I have a tweet sample that's just 25 random English-language tweets. Then I have a random sampling of Jimmy Fallon tweets. I like Jimmy Fallon tweets because they're a mix of really neutral announcement tweets, like announcing a guest on a show, and also personal tweets from him. And then I have a random sampling of Katy Perry tweets, because I'm going to talk about safety and curse words, and Katy Perry sure does like to curse on Twitter.

Connecting ellmer to mall

What I'm going to do next is set up ellmer. This is not mall yet; this is still ellmer. It's very important to realize that ellmer and mall go together: you don't use mall without ellmer's support. Ellmer is going to give you the ability to access the models that you are then going to use programmatically.

I'm going to create two different chat objects. I'm going to use ellmer's chat_aws_bedrock() to set up a chat with Anthropic so I can run this, and then I'm going to create a chat with chat_ollama() and tell it the model is llama3.2. When I run both of those, I have two chat objects, and I can then use a mall function to switch between those different ellmer chats.
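A sketch of those two chat objects, plus the mall call that points at one of them. The Bedrock model ID and the object names here are illustrative assumptions, not the demo's exact code:

```r
library(ellmer)
library(mall)

# Chat with Anthropic via AWS Bedrock (model ID is an example):
chat_claude <- chat_aws_bedrock(model = "anthropic.claude-3-5-sonnet-20240620-v1:0")

# Chat with a local Llama 3.2 served by Ollama:
chat_llama <- chat_ollama(model = "llama3.2")

# Tell mall which ellmer chat the llm_* verbs should use:
llm_use(chat_llama)
```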

I just want to mention real quickly, since we're talking a lot about ellmer and not mall: I went ahead and created the ellmer backend inside mall because initially mall didn't have it; it was Ollama only. That makes it easier for you to connect to different kinds of providers, because mall didn't have to carry its own code to connect to each one. Ellmer is my gateway, so I can just focus on the main stuff that mall does. So that's why we're talking about two packages, and ellmer does a great job at this kind of integration, as Libby's going to show.

No, yeah. And I want to hop in and answer David Onder's question here, which was: what kind of specs should I realistically have to make good use of these models? He said he has 64 gigs of RAM. That's more than enough; you would be fine with 16 gigs of RAM, I think, for 3.2. Also, when you go to the Ollama downloads and look at the different models, each model usually has two or three different size versions you can install.

Running LLM sentiment analysis

So I have created my chat objects for both Anthropic and Ollama. Yours might look more like chat_openai() with a model argument. Okay. Now I need to tell mall that I have this ellmer chat object and I want to use it, and I want to use Ollama, so I'm going to run this use-Ollama chunk.

This next bit is me trying to stop a cache problem from building up, because I'm trying to break the ellmer package here. So I will do this and tell mall I want to use ellmer on the back end, and then my LLM session shows my model is Llama 3.2. We're all good to go. Now I'm going to apply the llm_sentiment() function to my content variable.

So let's go look at my tweets sample. This is the data set I'm going to use right here. If we take a look at it in the console, I only have two columns: author and content. Author is a Twitter handle, and content is just the text of the tweet as a string.

So I'm passing the content variable right here to llm_sentiment(), and then I'm giving it some extra optional arguments. Let's go over to the mall docs really quickly and look at the sentiment function. We've already given it the .data argument because I'm piping it in. Remember, when you pipe something, the pipe takes everything on its left-hand side and feeds it to the next function as the very first argument, so that one is satisfied. Then I'm giving it my column, that content column. And then you can give it options, which is like: hey, these are the options I want to accept as output from the LLM. Mall is very smart, and if it gets output that doesn't fit, it will coerce it to NA for you.

And then pred_name, the most important one to me. I don't really love the default .sentiment or .pred column names; I like naming the output something specific, especially because I like to compare different models. And then you also have the option of an additional prompt. I wanted to ask you, Edgar: is this like a system prompt, or is it an additional individual prompt?

This gets attached to the main prompt that's being sent. So you can, for example, say: if the person uses these specific words, then consider it neutral, or whatever. You can add some extra instructions to make it more fine-tuned and better.

But I am not going to use that. What I am going to do is just say: hey, here are your options, positive, negative, and neutral, and name my output column sentiment_llama. So I can run this, and if everyone has lit their candles correctly, my code will run. A little progress bar at the bottom here. Love that. Aha, there we go. It was created.

And here we are. I now have an extra column over here of sentiment. This is only 25 rows, y'all; if you have an enormous data set, please be prepared to wait. I had a question in Discord: are you limited to those three options, positive, negative, neutral? No, I could give it whatever I wanted. You could do something like very positive, positive, neutral, negative. I would suggest giving it instructions about what each of those things means, right?
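Put together, the call being described looks roughly like this (a sketch; the column and output names follow this demo's conventions):

```r
tweets_sample_sentiment <- tweets_sample |>
  llm_sentiment(
    col       = content,                               # the text column
    options   = c("positive", "negative", "neutral"),  # accepted outputs; anything else becomes NA
    pred_name = "sentiment_llama"                      # name of the new prediction column
  )
```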

Libby, yeah. I think Javier is also asking if it can return the prediction probabilities. It doesn't, because that's not part of the prompt. With ellmer, if you go into the articles, there's a way you can build a prompt like that, where you say, give me a percentage of how confident you are, so you can kind of make it do that. But I don't think you'll be able to get probabilities properly, as if it were an actual NLP model.

Also, while I have the floor, I just want to mention real quickly, because of how long these runs take, that we have the cache option. If you're running the same QMD or the same script, just refining it, and you're going to get the exact same result, then rerunning, let's say, llm_sentiment works much faster the second time, because mall is automatically caching results into a temporary folder. So you don't have to rerun everything.
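If I recall the mall docs correctly, the cache location is controlled through llm_use(); treat the argument name here as something to double-check in ?llm_use:

```r
# Point mall at a chat and at a cache folder; cached results are
# reused on reruns so only new rows hit the LLM.
llm_use(chat_ollama(model = "llama3.2"), .cache = "_my_mall_cache")
```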

Comparing Ollama and Anthropic outputs

So now, with all of these many things I have open, we will go open tweets_sample_sentiment one more time. This lets us look at a comparison between the two. The one on the left is Ollama; the one on the right is Anthropic. And we can see that they don't agree on all of these, right? So this is Ollama, and Ollama is coding, for example, this one as negative when all it says is Mondays and fur babies. Mondays could be negative for sure, so Ollama, you might be right, but Anthropic coded that as positive. We have another disagreement on the second one, which just says tomorrow, Toronto, and a URL. I've noticed that Ollama frequently codes something with a URL in it as negative, and I don't know why, but that seems to happen a lot. I think Anthropic is more correct here that this is a neutral tweet.

What I'm doing here is looking at this sample of sentiments, which I just showed you, so that I can render the document and you can look at it as a PDF if you want to. Then I calculate the agreement: 0.68, so 68% of the time they agreed; otherwise they did not match. This is something where I really recommend you go through and review. We all know that LLMs aren't perfect. Neither is NLP, by the way.
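The agreement number can be computed with a one-liner like this (a sketch; the two column names are assumptions based on how the demo names its outputs):

```r
library(dplyr)

tweets_sample_sentiment |>
  summarise(agreement = mean(sentiment_llama == sentiment_anthropic, na.rm = TRUE))
# e.g. 0.68 means the two models agreed on 68% of the tweets
```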

Summarization and custom prompts

Let's head over to summarization, because we are about halfway through. I'm switching over to Llama 3.2 again, and I'm going to use the Jimmy Fallon tweets this time. What I'm going to do is create an extra column, called summary_llama, in a new tweets_jimmy_fallon_summary data set. The extra thing I've given it this time is max_words = 10. Now, LLMs never listen to us; it's frequently not going to stop at 10 or give me fewer than 10, but a lot of times this can help stop it from being too wordy.
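That summarization call, sketched out (names follow the demo's narration):

```r
tweets_jimmy_fallon_summary <- tweets_jimmy_fallon |>
  llm_summarize(
    col       = content,
    max_words = 10,               # a target, not a guarantee
    pred_name = "summary_llama"   # name of the new summary column
  )
```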

My friend slept in our walk-in pantry. When he laid down, his feet would stick out the door. Hashtag my first apartment. Okay, Llama's on the left: crazy and quite concerning living situation for a first apartment. Not wrong, technically. Let's see what the Anthropic summary is: friend slept in tiny pantry, feet stuck out door. That is much more accurate to me, that Anthropic summary, right?

Met's bucket hat guy spotted at event is probably more correct. So again, Ollama is not technically wrong, but is maybe not super helpful, and Anthropic is slightly more helpful. You're going to have to go through and review whether or not this is good enough for you, right? If you're doing a summary task with Ollama locally, is this good enough? If not, you might want to lean on a more commercial LLM, like Anthropic's models.

Now I'm going to go through this really quickly, because we have 20 minutes left and I want to stop talking and hand this over to show Python code. What if you don't like those standard functions? What if you want to do your own thing? Let me show you how to make a multi-step prompt, using the Anthropic model specifically. I'm going to look at Katy Perry's tweets and have it classify for me whether or not they are safe or dangerous. I'm going to define safe and dangerous really simply here, and I'm also going to tell it to ignore URLs, because I've noticed that LLMs just get confused by URLs sometimes.

I'm also going to tell it that safe text contains no slurs or curse words, to assess whether the text is safe or dangerous, and then to return one word, either safe or dangerous. I am saving this in a prompt object. It's just a string; it's not going anywhere yet. It's just a string object saved as prompt.

Now I'll go through my sampling of Katy Perry tweets. For the prompt option in mall's llm_custom() function, I'm going to give it my prompt, and then I'm also going to give it my valid responses: only safe and dangerous. I want it to coerce everything else to NA. Then I'm going to name the new column it's creating safety_anthropic. So let's run this and see how long it takes to go through with that custom prompt.

There is one question about why I used paste() if it's all text. Ah, that was me creating my prompt up here. I used paste() because that's exactly what's in the mall docs, and that's all I did. You could just put it all together in one string. What paste() does is join the pieces with a space between each one, but having it like this is nice because it lets you look at the prompt line by line and understand exactly what you are telling your LLM to do.
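Reconstructed from the description above, the prompt and the llm_custom() call might look like this. The exact wording lives in the demo repo, so treat this as a sketch:

```r
# Build the multi-line prompt; paste() joins the pieces with spaces.
prompt <- paste(
  "Ignore any URLs in the text.",
  "Safe text contains no slurs or curse words.",
  "Assess whether the text is safe or dangerous,",
  "then return one word: either safe or dangerous."
)

katy_perry_safety <- tweets_katy_perry |>
  llm_custom(
    col         = content,
    prompt      = prompt,
    valid_resps = c("safe", "dangerous"),  # anything else is coerced to NA
    pred_name   = "safety_anthropic"
  )
```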

Isabella lit a candle for us and it finished coincidence you decide. Okay, let's go look at our Katy Perry safety. Do we dare look at this live on the internet? We're gonna do it.

Okay, Katy Perry's tweets have been classified as safe or dangerous. This first one is safe: a visual explanation of what people in Florida and surrounding areas are experiencing, send them prayers. Looks pretty safe. Row three, it has decided, is dangerous, and it does have a curse word in it. I think that's great. Good job, Anthropic. You did a good job.

Okay, so before I move on, I want to say that this prompt took iteration. It did not work the first, like, eight times I did it. It didn't do what I wanted it to do. I had to iterate and iterate and iterate until my prompt would do what I wanted reliably. Also, LLMs are not deterministic: I could run it one time and run it again, and it could give me a different answer. So use LLMs for things like this at your own risk.


Before I hand this over to Edgar for the last 15 minutes, I also wanted to talk about an ellmer function, not a mall one, which is pretty nice: the token_usage() function. I think this is pretty new, right, Edgar? It tells you how many tokens you've used: here are the models you've used, and here are the number of input and output tokens. Notice that I don't have anything cached, because I told it not to cache anything. As for the price, I have seen it show an actual price for a commercial model. I'm using it through Bedrock, so it's not the same here, but I have seen the price show when it's hooked up directly to a commercial model.
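The ellmer call in question is just the following (the exact columns in the output may vary by ellmer version):

```r
library(ellmer)

token_usage()
# One row per model used this session: input and output token counts,
# cached tokens, and a price column when ellmer knows the model's pricing.
```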

Python demo: mall with polars

All right, here's the code. There we go. So, I'm assuming that most of us here are R users, so I'm going to add some additional explanation of how the mall workflow actually translates into Python, so that, as an R user, you can also see how it differs in Python.

Unlike R, where mall can be a standalone package that works on data frames, in Python, and specifically in the PyData community, we have pandas and the pandas data frame. But there's a newer one called polars. Polars is great: it's very fast, it was written in Rust, and a lot of folks are starting to use it now. In fact, it's becoming the recommended way of working with data frames going forward. So that's what mall builds on, on the Python side.

So what mall is, is an extension of polars, and I'll clarify what I mean by extension in a second. I have this critics data frame loaded. These are reviews of the Nintendo game Animal Crossing: New Horizons, so they can be pretty favorable. It's one of the Tidy Tuesday data sets; we'll have a link here. You can see, essentially, the critic's name and the actual text of the review.

The next thing I'm going to do is import mall. Notice that if I go to critics and type dot, which is kind of like the dollar sign in R, and start typing llm, you see that it's not available. As I mentioned earlier, llm.sentiment only becomes available when you actually load the mall package. Now you should be able to see it.

That was something really interesting to me, because I'm kind of new to Python. That's basically what extension means: it becomes part of that object. Your data frame is an object that is now extended with this functionality. So now that it's there, we have to assign, per data frame, which backend we're going to use as a provider, and I'm going to use Ollama here.

In this case I'm using the out-of-the-box integration with Ollama, which is the ollama library from PyPI. When you install mall, it actually installs this package too, so I can just use it. And it's very similar to how it works in R: instead of piping into a function, you basically call the extension and run it. Now it's going to run.

Yeah, so his little call there on line 34 is the data frame name, then .llm.sentiment, so we're working on that data frame class, and then we're passing the text column name as a string. Correct. For anybody who's not used to looking at Python, that's something that took me a while to get used to: I always have to quote the names of the actual columns I'm using. So it's going to be quoted. So it's running right now.

In the meantime, we have a question from Dan: the documentation says that in Python, mall is a library extension to polars. Does that mean it does not work with pandas? Correct. If it's an actual pandas data frame and you don't have polars loaded, it won't recognize it; you won't be able to use it. You have to convert it.

I want to show here, as Libby mentioned, that you can select R or Python in the docs, and they can walk you through how to set that up on your machine. All the same examples are available, as well as the reference. Once you select Python on any of the pages in the site, it switches everything to Python, so the documentation is also available for it.

The package itself, on both the R and Python sides, comes with a small data set that has three reviews, and you can use it to test things the first time.

Yeah, I put it on my other screen while it was running. So you can see it ran, and if I were to run this again, you'd notice that it runs almost immediately because of the cache. That's the big advantage of having the cache, especially if you're trying to re-render things and rerun them: it's not going to take as long the second time.

And then I'm going to run this to see if there's anything that is not positive, because everybody liked it. This is the only one that kind of didn't like it. So we can see there's some variance there; not everything was classified as positive.

For this one I'm going to use the user reviews. I'm going to read those in, and instead of 100 I'm just going to do the top 10 for right now. What I'm going to do here is extract the language the review was written in. So we're using llm.extract, or llm_extract from R. We are passing it the text column, and the prompt we're giving it basically says: I want you to extract the language, be it English, Spanish, etc. That little part at the end, the second argument, is what the LLM gets as its prompt.

And hopefully it does okay, because we do have reviews in all different languages. Yeah, good question: Edgar, where is this data from? This is from Tidy Tuesday, 2020, May 5th.

One thing I wanted to show is the translations. There are a few reviews here that are not in English, which can showcase translate; it does refer to human language, so you can translate from one language to another. One thing I found with LLMs that is so cool is that you don't need to specify the origin language, just the target language, and it adapts to whatever language the text is in. You can have Italian, Spanish, and Russian all in the original text and want them all translated to English; you don't need to specify each origin, you just say you want English. That's very different from other translation algorithms out there. The LLM just picks it up and does it automatically. And this is true on both the R and Python sides.

Using chatlas as the Python provider

Before we go, I just want to show you how mall works with an external LLM provider. Instead of ellmer, we're using chatlas, which is essentially the same package but for Python. This is not my package, or ellmer for that matter; ellmer is a Posit package.

I'm basically doing the same thing. Let me do it on the other side. We're setting up a chat object, and under user reviews I'm going to use that chat object, so now it's set up for me to use directly here. Now it is calling AWS, going to Anthropic, and running just like it ran in R. The only thing we don't have is the nice progress bar, but it is working. Oh wow, on this one it said it's all negatives. This is from the user reviews, but it worked, it looks like. Yeah, some of them are right.

All right, everybody, we have a minute left. I wanted to re-post in the chat the repo that contains both of these files. This is just my personal repo, so if it's a mess, don't judge me. This was really, really fun. I'm so glad that you hung out with us at the Data Science Lab. I wanted to let you know that we have Sarah Altman joining us next week, and we just happen to have another LLM session. The lab is not always about LLMs, I promise, but we are going to be doing data analysis with the assistance of AI, which might mean Databot, might mean Claude Code. What does that look like in February of 2026? That's the question for Sarah. I hope you'll come and join us. And on Thursday at the Data Science Lab we have Alexander Schacht from Sitel, who is also the Effective Statistician podcaster. Come with your data science career questions, especially if you are a stats-flavored data science person; it's going to be transformative and wonderful. I hope you have a fantastic rest of your week. Please hang out on the Discord server with us if you have more questions; we are going to try to answer them. We love you, and we'll see you next time. Bye, everybody.