AI SDK: Ultimate Guide to Building AI-Powered Next.js Applications

Introduction

In a recent livestream session, Nico from Vercel demonstrated the power and flexibility of the AI SDK, showing how developers can build sophisticated AI-powered applications with surprisingly little code. This guide walks through how the SDK simplifies AI integration while maintaining type safety and provides the features needed for production-ready applications.

Understanding the AI SDK Foundation

The AI SDK addresses three critical challenges in AI development:
  • Unified API access across multiple AI providers
  • Type-safe interactions with AI models
  • Structured handling of streaming and non-deterministic responses

Key Features:

  • Provider abstraction for easy model switching (see the sketch after this list)
  • TypeScript-first development experience
  • Built-in streaming support
  • Zod schema integration for type safety
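
To make the provider abstraction concrete, here is a minimal sketch of the kind of call involved, assuming the `ai` and `@ai-sdk/openai` packages; the prompt and the commented-out Anthropic line are illustrative, and exact model IDs may differ:

```ts
import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';
// import { anthropic } from '@ai-sdk/anthropic';

async function main() {
  const { text } = await generateText({
    // Swapping providers or models is a one-line change:
    model: openai('gpt-4o-mini'),
    // model: anthropic('claude-3-5-sonnet-latest'),
    prompt: 'Explain what a unified provider API buys you, in one sentence.',
  });
  console.log(text);
}

main();
```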

Structured Data Extraction

Making AI Responses Predictable

One of the most powerful features demonstrated was the SDK's ability to enforce structured outputs from AI models:
  • Use of Zod schemas to define expected response formats
  • Type-safe handling of AI-generated content
  • Support for complex nested data structures
  • Multi-language capability out of the box

```ts
const schema = z.object({
  category: z.enum(['billing', 'support', 'sales']),
  urgency: z.enum(['low', 'medium', 'high']),
  summary: z.string()
});
```
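
Feeding a schema like the one above to the SDK's generateObject function yields a parsed, validated object. The snippet below is a sketch; the prompt and model choice are illustrative:

```ts
import { generateObject } from 'ai';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';

const schema = z.object({
  category: z.enum(['billing', 'support', 'sales']),
  urgency: z.enum(['low', 'medium', 'high']),
  summary: z.string(),
});

async function main() {
  // result.object is validated against the schema, so `object.category`
  // is typed as 'billing' | 'support' | 'sales'.
  const { object } = await generateObject({
    model: openai('gpt-4o-mini'),
    schema,
    prompt: 'Classify this support request: "I was charged twice this month."',
  });
  console.log(object);
}

main();
```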

Building Interactive AI Interfaces

Chat Interfaces and Streaming

The SDK provides robust support for building chat interfaces (a minimal example follows this list):
  • Built-in streaming support
  • State management through the useChat hook
  • Real-time response rendering
  • Multi-step conversation handling
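
A minimal chat wiring might look like the following sketch. Package paths and helper names have shifted between AI SDK versions (for example, `useChat` moved from `ai/react` to `@ai-sdk/react`, and the response helper is `toDataStreamResponse` in v4-era releases), so treat the exact identifiers as assumptions:

```ts
// app/api/chat/route.ts
import { streamText } from 'ai';
import { openai } from '@ai-sdk/openai';

export async function POST(req: Request) {
  const { messages } = await req.json();
  // Stream the model's response back to the useChat hook on the client.
  const result = streamText({ model: openai('gpt-4o-mini'), messages });
  return result.toDataStreamResponse(); // helper name varies by SDK version
}
```

```tsx
// app/chat/page.tsx
'use client';
import { useChat } from '@ai-sdk/react'; // 'ai/react' in older versions

export default function Chat() {
  // Hook return shape differs across major versions; this follows the v4-era API.
  const { messages, input, handleInputChange, handleSubmit } = useChat();
  return (
    <form onSubmit={handleSubmit}>
      {messages.map((m) => (
        <p key={m.id}>
          {m.role}: {m.content}
        </p>
      ))}
      <input value={input} onChange={handleInputChange} placeholder="Say something..." />
    </form>
  );
}
```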

Generative UI Components

A distinctive approach to AI-driven interfaces, where the model's output determines which React components are rendered (sketched after this list):
  • Dynamic component generation
  • React component integration
  • Interactive element support
  • Stateful component management
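
One common pattern is to let the model call a tool and then render a dedicated React component for that tool's result on the client. The sketch below assumes the v4-era `useChat` message shape with a `toolInvocations` array (newer releases expose message `parts` instead), and `WeatherCard` is a hypothetical component:

```tsx
'use client';
import { useChat } from '@ai-sdk/react';
import { WeatherCard } from '@/components/weather-card'; // hypothetical component

export default function Chat() {
  const { messages } = useChat();
  return (
    <div>
      {messages.map((m) => (
        <div key={m.id}>
          <p>{m.content}</p>
          {/* Render a component instead of raw JSON when the model called a tool. */}
          {m.toolInvocations?.map((t) =>
            t.toolName === 'getWeather' && t.state === 'result' ? (
              <WeatherCard key={t.toolCallId} weather={t.result} />
            ) : null
          )}
        </div>
      ))}
    </div>
  );
}
```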

Function Calling and External Tools

Extending AI Capabilities

The SDK introduces a tool system for external integrations (see the example after this list):
  • API integration support
  • Weather data retrieval demonstration
  • Type-safe function definitions
  • Multi-step processing capabilities
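
The weather example roughly follows this shape. This is a sketch: `parameters` is the v4-era name (newer releases call it `inputSchema`), the weather lookup is a stand-in rather than a real API, and `maxSteps` availability depends on the SDK version:

```ts
import { generateText, tool } from 'ai';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';

async function main() {
  const { text } = await generateText({
    model: openai('gpt-4o'),
    prompt: 'What is the weather in San Francisco right now?',
    tools: {
      getWeather: tool({
        description: 'Get the current weather for a city',
        parameters: z.object({ city: z.string() }),
        execute: async ({ city }) => {
          // Stand-in implementation; call a real weather API here.
          return { city, temperatureF: 68, condition: 'partly cloudy' };
        },
      }),
    },
    // Allow a follow-up step so the model can turn the tool result into prose.
    maxSteps: 2,
  });

  console.log(text);
}

main();
```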

Key Takeaways

  1. The AI SDK significantly reduces the complexity of building AI-powered applications
  2. Type safety and structured outputs make AI interactions more reliable
  3. Built-in streaming and state management simplify complex features
  4. Function calling enables powerful external integrations
  5. The SDK supports rapid development while maintaining production readiness

Final Thoughts

The AI SDK represents a significant step forward in making AI development more accessible and reliable. Its thoughtful design and powerful features enable developers to focus on building great experiences rather than wrestling with AI integration challenges.

Transcript

I have a legendary show today with a member of Vercel. And we're going to talk about the AI SDK and how this can actually unlock really amazing and exceptional user experiences for your AI apps. So today I have with me, Nico. Hey, Nico, how's it going? Welcome to the show. Hey, man, it's going very well. I think a little bit colder here in London that it is for you in Hawaii. But I'll be just fine. Yes. No, I'm so delighted to have you. We're literally covering so many time zones here. You know, I have to wake up super early. It's about 7 a.m. And I'm just really happy to have you here because one of the things that really kind of blew me away while I was at the Nextjs conference was the fact that you did this really great presentation about the AI SDK. And it's open source. And it's a way to actually make your AI apps responsive in these different ways. and part of my background is having work with UI and UX experiences on mobile. It's I'm learning more about web frameworks. And what you were showing was a level of detail that I haven't really seen in other frameworks. And this is just getting started and is continuing to mature. And some of those things included some structured execution or structured extraction, some generative UI stuff and some Zod schemas and validations. And so if this sounds French or if this sounds French, or if this sounds another language to you, don't be put off by this because we're going to try to approach this a little bit from a, , sort of beginner or whatever. I am still early and still learning Next.js as a programming language, but I come from the client-side world. And so what I want to do is try to maybe help you as a user explain some of these processes. And I'm actually kind of coming in from the top down. And when I say that, I'm coming in more from the, I pretend to be a user, how would I use these devices and what are those types of things? and experiences that I can kind of tighten up to make my apps just that much better. And the way that we can do that, you know, is by looking through the documentation. But really, who has time to look through all this documentation? It's too much, right? So we can speed that up by having Nico here. So, yeah, now I'm really excited that we're going to be able to kind of cover some of these topics here about, you know, how do you talk to a language model? How do you get that information from a language model? And then once you have that, then how do you combine that into? with some type of UI experience. And so we're going to walk through a series of code and some slides. And this can be really highly interactive. If you have any questions, feel free to drop them into the actual comments and reactions. So that way, I'll slide you right in, just Pulp ML here. And also Hertzfeld Labs, you know, that's awesome. Thank you, Hertzfeld as well. So, yeah, without further ado, thanks, Nico. Awesome. Well, yeah, thank you for having me here. This feels really fun as well. It was really fun to finally meet because I think we've known each other for almost a year now on the internet. That's right. And so it was really cool to finally meet you and meet Mickey and Dhravya and all the other guys at NextConf. But yeah, thank you for having me. What we want to do today is go through a bit of an introduction to the AI SDK. I don't have a strict agenda for this, but we were going to take, as you, had said a bit of my session, my workshop that I had given at NextConf, which was talking through how to build an AI app. 
And so we're going to go through and hopefully learn a bit of this through actually coding and building some cool stuff. And with that in mind, I have about four or five small little projects that are split across a few different areas that we'll we'll use, we'll start by building a few terminal programs just to understand how to call the model programmatically or call models programmatically. Then we'll look at something I really called Invisible AI. And these are features that leverage AI to improve existing applications that aren't just chatbots. So these can be things linear automatically generating the title of an issue from a Slack comment or something that. Wow. Or even Figma allowing users to search for assets based on what they're looking for, not necessarily what the file is named. So these are quality of life improvements that aren't AI applications. You wouldn't even really know it was AI if they didn't tell you. But certainly make using the application as the developers intended as as simple as possible. And then last but not least, as I think you mentioned, we're going to be looking into actually building a chatbot and looking at generative UI as well and how we can give the model the ability to effectively go beyond text and actually generate interactive components. That's excellent. Yeah, that sounds good. I'm ready for it. If you're ready for it, make sure you drop a . Drop a comment as well while we're watching this live. And also, if you leave any comments on YouTube, I'll work through afterwards. If you're just watching this in the stream after, you know, I'll be up in the comments, you know, answering questions as well. And then if you're on X, you know, feel free to interact, share this with some friends, let people know that this is happening because this is going to be a great working session. It's kind of rare that we can actually, you know, work with someone who's on the inside working on this type of stuff and has just a breadth of knowledge to share with us. So I'm super delighted. So, yeah, if you want to start with the screen share, and then I'll be able to. pop that into the screen here so we could do this thing and so classic thing you're supposed to do beforehand right it's okay it's all it's all pretty fast here so we'll get this set up so that's how you know this is live my friends all we do is yeah we only push to prod that's all we do on friday on friday yeah that's right yeah yeah so yeah before we go into things i do want to make sure that people are aware of our documentation if you do love reading documentation our docs are at sdk.vercel.ai. And in here we've got a bunch of awesome guides alongside templates, which we can talk about a little bit later. But what we're going to do today, as I mentioned, was we're going to focus first on building a few little projects to understand the SDK. Now, you may know Vercel from Next.js, a popular web framework built on top of React. The AI SDK, is something that we built as we were building v0. I'm sure viewers of the Ray Fernando stream would be familiar with v0, as I know you've used it quite frequently, but if you haven't, it's the fast and fun way to build applications just with natural language. But in building, in trying to build this product, we ran into some issues around a nascent tooling ecosystem. And they really focused around three primary areas. One was that interacting with models was typically done using the library of the model provider. 
So if you're using Open AI, you're using Open AI's library. If you're using Anthropic, you're using theirs. And this is great. that library is tailor made for that model. but with the speed at which these models are coming out, if we wanted to test out how Sonnet 3.5V2 worked on our application that was using 4-0, we would have to rewrite the entire application, which isn't really feasible. Additionally, as well, we started to see great use cases internally for using multiple models from different providers for different tasks. So you might use Gemini's Flash for classification tasks because it's so cheap and so fast. And you may want to use Sonnet because it's great at React or it's great at writing. And so obviously maintaining two different libraries for these different tasks just isn't a great scalable solution. So right off the bat, one of the big things that the AI SDK provides is a unified API to be able to call any of these language models. So we have these concepts of providers. These are language model providers. So you can think of Open AI, Azure, Anthropic, Bedrock. And the idea throughout this unified API, if we go to just the overview here, is that we provide this way of calling the language model where if you want to switch between models, all you have to do is switch out a single line of code, a single line of code, not counting the import, as these are ballooning out. And this is really powerful because it allows you to really easily experiment across different models, but really easily build an application that can scale. And scale not only in terms of size, but also in terms of time. As you go two, three, four months down the line, you have an API that is intended to work across all of these providers. So we support a ton of providers at the moment and also provide this, the entire library is free and open source beyond just this provider model. But as a result, we have people from the community building community providers for whatever provider that they want outside of whatever might be. Our first party core team supported providers that you see up here on the left. So sorry, that was problem number one. And I'll go through this as quickly as possible, because building is a lot more fun. But it's helpful to have this in mind when you see some of the design decisions that we took on the API. The second point was around just wanting something that was TypeScript first. most of the tooling was Python first because that is where the research for the models was being done. And obviously knowing that we were going to build v0 with Ness, we wanted something that was built tailored from the ground up to work with Next.js and work with TypeScript. And importantly, work also in any no jazz environment, as we'll see in a bit. And the last thing, which is a nuanced but important point, is that AI brings a ton of new challenges from a UI and product perspective. And those are namely, we don't know exactly what the model is going to give us. It's non-deterministic. And also, when we're dealing with streaming, we don't know how much of what we want it's going to give us in every chunk or every piece that it's giving us. So we have this novel difficulty of how do we render the information that we want the model to generate within the UI. And so those were the three main things that led us to building the SDK and the three main problems that we've, I would say, really been focusing on heavily from a DX and a product perspective with this. 
So with that out the way, I think I can finally start. Yeah, that's very handy context for everyone who's listening, especially if you're going to be where you're watching this later on. Because I kind of want to have this serve as something that you can go back in reference. You know, for me, I have sometimes a lot of difficulty scanning through documents and just kind of having that perspective. That was kind of one of the first things I have to kind of digest around. It's , okay, what does it look when I want to test a different model? Another thing that I've been kind of realizing is that when I'm starting to prompt models, you know, basically taking the prompt approach first. So just prompt my code and then later go back, try to understand it. And I didn't really understand what the AI was walking through. So this perspective is going to actually help you as a user prompt better. Because once you understand the structure of how some of these frameworks work this, the power is then unleashed because you can use those right keywords you're saying, models one of them, providers are one of them. In this specific context, the language model. is going to know that you want to deliver something with the AI SDK, and you're , I want a Google provider. It's going to go grab those bits of information. So that way, when it crawls through those documents, it's going to make it easier for you. So it has both purposes. You're going to need to do this if you're just going to code yourself. But if you're also using AI to assist you with code, having this context, just with the way Nico was explaining, is going to be very helpful. So if you want to go back later into the stream, you're going to be able to just go back to the YouTube part and just kind of relisten to that section is going to be super handy. for you. So just kind of keep that in mind if you're a person who's just prompting. It may sound word salad to you, but just kind of picking up on those terms is going to be helpful. If you're a little more advanced, you know, you're going to be , oh, okay, cool. This is a really cool framework. I want to check this out and have some fun because I do come from the Python programming background as well for writing tools and various things. And so this is part of the reason why I, you know, said, hey, let's go play around with NextJ.S. So this is great. Yeah. Cool. And yeah, I, I'm going to keep this really high level and not dive into one of the things that I quite enjoyed doing for the session was a very very very brief introduction to language models and end to prompting because I talk from experience of kind of ignoring that for the first few months of building applications prompt engineering that's that's a meme right people are getting they're just you're just talking if you can communicate that's fine and it kind of highlights a funny thing in that most of us struggle with communication. And so when you put this into the context of working with language models, it's one of the reasons why you hear a lot of people say, oh, chat GPT is a fad. This AI stuff is never going to change. And other people saying, , it's really changed my life. And it's changed the way that I work. And because it's a tool. And any tool, you need to be intentional with how you're using it with the strategies. And this sounds kind of mumbo gumbo a bit, it, but it is, I highly suggest going through our foundations section, just so that you have this as a foundation. 
The last thing I'll say with prompt engineering, which I to think about is it's a little bit a chef approaching trying to cook a meal with bad ingredients. if you start off with bad ingredients, it's more than likely you're going to end up with a bad meal, no matter what kind of pampering you do to the meal, both presentation, And it's the same with being considered communicating clearly. Yeah. So I think that's good. I think that's enough of me rambling with just a blank website in front of me. And we can maybe get to actually. Let us cook, let's go. If you're ready for this, drop some likes right now in the stream. So hit the likes in the stream. If you're really watching this, just hit the button. Share it with a friend. Tell him , yo, these guys are about to cook. And it's about to go down right. now. So of course we have to pop in the code editors and just let us know kind of what you're using right now. Sure. Yeah. Code editor up here. This is that you can use anything you want. I really I really said. But what we've got here in this project is a Next js project and we've got a few basic packages configured right off the bat. So we do have the AI SDK setup, which is this package. here. They did, or we did manage to get the AI package on MPM. So installing is as simple as MPM. Oh, that's, that was just my, my, my API key, which I'll have to come in out later. Let's see if I can open that without throwing that away. Good job, Nico. That's a good start. YOLO. So the AI package is installed just with MPMI AI. Then we've got here our first provider and the provider we're going to be using in this project, which is the OpenAI provider. We've got a bunch of Radix UI dependencies, which we'll see later, and this is for Shatzien, which we have set up already. If you don't know Shatzien, it's a very configurable component library that uses RADX UI primitives under the hood to give you. an awesome, awesome building experience with Next Chess. And what's really cool is V0 knows shadcn really well. So you can prompt for buttons and things that and immediately pull that out and put it into your code. But that's me getting distracted. A bunch of other things for NextGS and Tailwind. And then finally Zod, which you had mentioned briefly before. And Zod is what we're going to be using for constraining or forcing the model to to give us back what we want it to give back. So cool. Those are the dependencies. The first two things that we're going to be looking into is actually some terminal programs. And this is for me, one, to show you how you can start thinking about calling a language model programmatically rather than in a UI from chat GPT or Cloud or whatever that may be. But it's also to stress that the AI SDK runs in purely. your no-j-s environment. So you could run this in GitHub actions. You could run this literally on servers wherever you want and treat it any other asynchronous API that you would be calling. So the first file we're going to, or the first use case that we're really going to be looking into is a really cool one, quite a basic one, but that is giving the model a bunch of some kind of material and then asking a question on it. Pretty straightforward. And throughout this to kind of save me from having to type everything over and over again and be able to explain all of it to you. I'm going to be copying and pasting the code that we're going to add from a little companion site that I pulled together. 
But hopefully this gives an opportunity for less typing and more actually talking and understanding. If you want to just pop up the font maybe plus one or two or see if you're going to. Yeah. Okay, perfect. Yeah, I think there's some folks or on their phones trying to read this as well. So it's just, yeah, that's perfect. I get you. It's a little bit small light. So, cool. Let's start off with, , what we're actually passing the model. And this is Silicon Valley. We are talking about AI. So what better than the founder mode, the infamous founder mode essay from Paul Graham? So we are going to be giving the model. The intention is to give the model that essay, which we're pulling out just with the file system package from Node. And then right now all this script is doing if we were to run it with PMPM extraction, which is just going to run the script. It's going to pull out the first 100 characters of that essay. So let's look at what we're going to do. The AI SDK has four main functions, and we're going to focus on the first one. One of the great things about the SDK is that the naming has been very simple. So to generate text, you use the generate text. function. So we're going to import that from the AI package. Now, we talked about providers. The first thing that you need to do when you're calling any language model with or using any AI SDK function is specifying the model that you want to do, that you want to use. So in this case, we're going to want to use OpenAI's GPT4O, which if I can hide this out the way, GPD 4-0. Great. I skipped this step. GPD 4-0 mini, sorry. We can update that for the next one. The next thing that we're going to do is pass a prompt. And a prompt is just anything else that you've been using on chat GPD. That's just telling the model what you actually want to do. In this case, what we're going to do is we're going to pass a prompt saying extract all the names mentioned in this essay. We're going to add two blank lines and then we're literally just going to pass in the essay as a string. The final thing that we're going to do is we are going to log out the result that it generates. So I can now copy this over, save the file, run the script again. And in a second, we should see the four names that are mentioned in the essay. This is obviously really simple, but imagine trying to write an algorithm for pulling out names. I don't even know if there's a regex that can pull that kind of thing out. That's amazing. This is something that with something GPT4O Mini is literally negligible from a cost perspective is pretty fast. And gets us exactly what we want. And to, I mean, to prove that, we can go into the essay and look up. We've got Brian Chesky here. Who are the other ones we have? Ron Conway. Ron Conway. Right? There he is. Yeah. Steve Jobs Yeah, there he is And then John Scully Yeah, so just that It's kind of There's really simple But this already communicates me Okay, you could just play around with this And ask it what to do So for example, let's Let's maybe change up the prompt And have something maybe a little bit less defined and something a bit more arbitrary what's the key takeaway in 50 words. Now this is interesting because a key takeaway in 50 words is something that likely would take me quite a while because you're going to have to read through it first, maybe read through it again and then think about, I wouldn't really be thinking in 50 words. I'd probably break it down into a couple. But we can literally run this one command. 
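For reference, the script being walked through here comes out to roughly the following (a sketch reconstructed from the walkthrough; the file path and script name are assumptions):

```ts
// extraction.ts — run with something like `pnpm extraction`
import fs from 'node:fs';
import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';

async function main() {
  // The founder-mode essay, read from disk (file name is illustrative).
  const essay = fs.readFileSync('founder-mode.txt', 'utf-8');

  const { text } = await generateText({
    model: openai('gpt-4o-mini'), // swap to openai('gpt-4o') to compare outputs
    prompt: `What's the key takeaway in 50 words?\n\n${essay}`,
  });

  console.log(text);
}

main();
```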
And then we'll have out this summary in 50 words. Brian Chesky's talk highlighted the distinction between founder mode and manager mode and running companies. Founders often received misguided advice to adopt a managerial approach which can hinder their success. Understanding and embracing founder mode could lead to better outcomes as evidenced by successful founders navigating this challenge. So cool. That's amazing. I mean, again, I to pause. this may seem silly to pause at something that seems relatively simple in what's happened in the last 12 months, let alone the last two months. But this is something that we're doing just simple node environment passing. It's an essay. I'm just asking what to do and getting exactly what we want. Yeah. So somebody asks, would that be viable for , you know, a huge 100,000 line type of text? Or would Rag be better? Rag would probably anything, it depends. The answer is not great. And this actually highlights a great thing is that there's no right answer to any of this stuff. If anybody says that they know, they're wrong because this stuff is one, every model is different, has different characteristics. Obviously, with different parameters is going to respond differently to different inputs. But I think, depending on your use case, yes, and it's worth experimenting with. anything, giving a whole huge amount of text is going to, and forcing the model to point and find something in a pinpoint location, it's going to be less effective than necessarily giving it just the relevant information and asking it to synthesize an example. there is that it will be obfuscated a little bit by being surrounded in that. That's true. And part of the reasons why I kind of wanted to have this talk today about the AI SDK is because of that provider's thing that you exactly said, how quick that you can change out a model or maybe run two calls at the same time that can then run the extraction and then you can compare the results yourself. I think those are having a framework that can do that for you, that you don't have to rewrite the structures of your prompts and then type in, you know, oh, this is model this. Oh, this one always needs temperature or else it's going to fail. You know, just those types of things are just kind of the vein of your existence. So we're going to probably get into a little bit more of how that provider thing looks . And this is just a really good example. That's a great question. Yeah, because now we start getting into the technicals of the rag technicians and people who are trying to study this as an actual serious field. Yeah. That's amazing. Yeah, yeah. And I think when in doubt, this is probably the most overblown Shay analogy now, but thinking of these models very, very smart interns with zero context on a certain task. And so they're very capable. But equally, if you don't give them instructions or you make things somewhat obfuscated, as I was saying, you're reducing the likelihood of getting the outcome that you want. And at the end of the day, this was something that I mentioned in the session, which sounds a bit obvious when you say it and you think about it, but language models are trained on insane quantities of text to be able to generalize patterns in language, to be able to take a certain input and basically predict what is the most likely from that. 
And so we can use that understanding to guide how we're prompting to say, okay, if we want the model to do something, give it an example of what it is that we want, saying key takeaway, it's likely that, or summary, it's likely that the model has this concept of summary in its training data in many places. And so it will, it's a way of, the model. sinuating or implying what the task it has to do is. Very cool. Very, very cool. Yeah, some great questions. And yeah, keep the comments coming with people. And as we move through different things, we're going to kind of cover some of those. That's so cool. And so on that vein of testing and experimenting, I do suggest, , as we're doing in this next step, switching up to maybe a bigger model. So before we were using 4O Mini, if we instead look at how 4O would respond, we can see a thing difference. And in this case, we get the key takeaway is that conventional wisdom of scaling companies, blah, blah, blah. The interesting thing I want to highlight here is that for O picked up that this was akin to Steve Jobs methods. So really picking up on the subject that wasn't necessarily mentioned by name in each of the examples throughout the essay, but was the subject introduced at the beginning of the essay and then alluded to or implied that these were things that Steve Jobs did throughout it. So this is more of a difficult, a more accurate takeaway, both in terms of details as well as kind of cues that weren't necessarily stated. So just another reason to test, play around, experiment with all of your your functions and all of your use cases because you never know which model is going to give you the output that you're looking for you may not want the you may have preferred the 4-0 mini and that but you you just need to experiment as much as you possibly can that's cool that's cool cool so that's the first one i hope to to show you how simple this is i mean this is 18 lines of code where i would really say that only what is it, eight of them are really the AI SDK logic, but to have the model give us what we wanted to get, which I just want to emphasis how this API has been made to be as simple as possible and as akin to programming patterns that you're already used to. So this is awaited any other fetch call that you would have to an asynchronous API. and it's intuitive. Okay, I tell you the model that I want to use. I give you a prompt, and I'm going to get out a text generation. There's no, I'm thinking about this in AI because it's completely different. You rely on primitives that you and programming patterns that you already understand. So cool. The next use case, I'm really excited to go through. And I'm excited to go through this because this is something that we actually have in production at Riesel. And this is taking a bunch of unstructured information and being able to classify it to be able to do something with it. And our use case here is that if folks are familiar with Next.js, it's a very popular web framework. And in being open source, it gets a lot of issues in GitHub every single day. And there's actually there's a relatively small team of people that are managing those issues. and those can pile up very quickly. And so what the team built, and shout out to Sam Coe, who built this, was a way of running each support request through a language model to be able to pull out a few structured categories. Okay, is this a feature request, or is this a bug or something this? 
And then be able to imply things severity and other things that to then trigger automations. So sending a Slack message, directly to the person or triggering a zap or whatever it may be. This kind of thing leads to us having a much shorter time between these issues being presented and being resolved. So what I want to do here is show you this but in a very controlled environment. And this is going to introduce us to a few of those structured extraction things we were talking. And you were mentioning at the very beginning. Let's go, folks. if you're ready for this, hit that thumbs up right now. If you're watching on YouTube, if you're watching on X, definitely retweet and tell a friend because this is about to get, we're about to get deep right now. So stay tuned, my friends. Yes, let's go. This is the stuff I've been waiting for you. This is the stuff you enjoy. This is the stuff that really blows me away. So I'm excited to share this with you guys. So what we've got here is a very authentic JSON list of issues for a fake SaaS company. So each issue has an ID and some text. In this case, it's , I'm having trouble logging into my account. Can you please assist? The export feature isn't working correctly. I need help integrating the API, things that. So we've got seven of these. And so in our TypeScript file, we're just importing in those support requests. And then as it is right now, we're logging out the first two items. So if I run this script with, as so, you can see we're pulling out those first two support requests. So last time, we are going to first try using generate text. And the idea of here being very simple, we were able to before take an essay, throw it to a language model and ask it to do something. Let's kind of try and do the same thing and see what happens. So what we're doing here is we're importing generate text. and we're importing OpenAI, the provider. We're calling generate text again, specifying we want to use GPT40 Mini, and then we're passing a prompt that says, classify the following support requests. We're giving the model the categories that we want. So we're leaving very little up for the model to insinuate or imply from the response. So we're saying these categories are limited to billing product issues, enterprise sales, account issues, product feedback, two lines, two new lines, and then we just stringify those support requests and pass it to the model. The very last thing that we're doing is we're going to lock out the resulting text generation. So if I copy this over, paste it in, run the script again. In a second, we should see, depending on my Wi-Fi. There we go. Let me zoom out a little bit because I think the formatting is a bit messed up. But you can see that the model responded here are the classified support requests. Number one, account issues. I'm having trouble logging into my account. Can you please assist? It also passes in an ID and so on. It goes through all of these. And what's cool is that these are correct, , or how I would approach categorizing each of these issues. But this isn't really usable as a format. So what could we do here? let's keep in mind that the model has also done the classic thing of telling us what it's actually going to do, which I think we all know the the all caps, please only respond with JSON. Please, please, please. So we could do that. We could do that and then maybe write a parser or something that. But I don't really want to do that and I think it would take a little while. 
And instead, this is where we can introduce the second AI SDK function called generate object. So again, with the simple naming conventions, this function literally generates structured objects. So if we, what we're going to do is we're going to copy over this text. All that's happening here is that we change from generate text to generate object, and we also change the function we are invoking. And then finally, instead of logging out the resulting text generation, we're logging out the resulting object. And I can bump this font up again. But you'll see we have an error. And the reason we have an error is that we haven't told the model what we want it to give us. So we need it to generate an object, but we haven't told it. So what we're going to have to do is define what's called a schema. And if you're familiar with Zod, you'd be familiar with this idea of schemas. But basically, Zod is this incredible, I'm the biggest Zod fanboy. It is a typescript library for effectively defining these. these schemas with a really nice ergonomic syntax. So to define an object you use the Z class and then you use the object function And then within there you can define any key you want In this example if we want to think about what we want the model to return we want it to return an array of have the request, which is what the person had asked for, and then a category. And this should be the constrained category that we had provided in the prompt before. So let's copy this code over. And I think our, yep, our errors go away. And if we run this in the console, let's see what happens. Okay. Kind of progress, I guess. We got one instead of the seven that we were supposed to get. Yeah. But we did get the correct thing for that one. And we did get the, you know, we got the correct category, which is really cool and it's constraint. But this isn't what we want. How can we fix this? Well, we could go in and update this schema to be an array of these objects, but the AI SDK actually has an output mode that we can set on the generate object function that allows us to tell this generate object function whether we want just a single object or we want an array of the object schema that we've provided. So by adding this one key, it keeps jumping, but you'll see that we've added just this one, key right here, this output array, if we save the file, head back to the console and run it again. And I'll take a second because it's doing seven now. You'll see that we have those seven requests in this structured JSON, this structured object, which is actually an array, nicely to go. And to prove that it is nicely typed and ready to go, we could go through here. here and say, let's check the first object that's in, let's check the first element that's in this object. So if you say result, that object and take out the first one, and we see that it's typed, it has a category and a request. And if we look at the category, we can see that category has literally constrained to exactly those types. So if we wanted to surround this in parentheses and make some kind of condition here. Say we want to check if category equals, bang, we've got Intellicense all lined up here, which is this alone is such a game changer for working with models. And what's going on behind the scenes here is not only is the AI SDK providing that schema to the model to say, please, please, please return it in this format. 
it's also parsing the output and sending it through Zod so that if the model's output doesn't meet the schema that you pass in, it throws an error. And so, , you don't have to do any part. You know that we can be guaranteed that this type, that this category is one of these five options because it's already gone through a Zod parser to ensure that it actually has this option and it's one of these potential. values. So this is , this already, we've solved that kind of that problem number three of how do we know what exactly is going to come out. And we haven't gone into streaming yet, but this is hugely, hugely, this is a game changer for working in these kind of smaller, smaller use cases. So you could imagine, for example, mapping through each of these elements in the object and saying if category equals this, send it to Slack or do something else. But what I'd love to convey here is that it's kind of the same way you would think about doing this if you just sent it to an API that was doing this in a discrete manner. Sorry. So, yeah, this is really, really, really cool. Yeah, that's really cool. I just kind of want to see if we can kind of digest that a little bit. I can take a pause there. Yeah. Yeah, so to just digest that a little bit for those who are maybe watching kind of on the earlier stage and maybe not completely understand, , especially Zod, if Zod is kind of new to you, at a very high level, what you're saying is that I have an app and I give it these prompts and I give it these really long prompts to try to generate stuff and try to say, oh, don't say this, but say this. Or here's an example of a structured output. Can you just give this to me? So instead of just trying to tell the model that you should do things, I should just kind of let the model be free. Let it do its analysis and let it spit out. Certainly, I'll be happy to help you and whatever X, Y, Z things. wants to tell me before the output comes out, I could take this and say, okay, now I want the data structure in a very certain way. So I'm going to just import Zod and select my little schema array that I want the data to be presented in. For example, YouTube timestamps, you know, I want a schema that's going to say, here's this first thing I wanted to say as far as a string, and then here's this nested array to look for these specific timestamps looking things with this other text next to it. So if it spits out anything that, that. And if it says this extra stuff, Zod is just going to go ahead and, you know, , get the output from the AI and then put that in and format it. And if it doesn't come out with that formatting, it's going to throw an error saying, hey, you know, your time sent to resolve or something that. And then kind of just at a high level, that that's the way that you want to think about it. So I was just thinking about this more of , if you're architecting an app, the reason why you would want to use this is because you want this type of structure to be available. so in your user interface experience, we're kind of building that base here. It's make sure you have the data, make sure that consistency is going to be there, and that you're going to be able to handle that on the way up in the stack so that when you present something to the user, it looks more complete, not just kind of , you know, these extra fluffer, or sometimes you don't know what's going to come back. So I think this is a huge unlock for me, and I think that's kind of where I had that, , you know, moments and stuff. 
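Put together, the classification step described above looks roughly like this (a sketch; the JSON file name and the exact category slugs are assumptions based on the walkthrough):

```ts
// classification.ts
import { generateObject } from 'ai';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';
import supportRequests from './support_requests.json'; // placeholder file name

async function main() {
  const { object } = await generateObject({
    model: openai('gpt-4o-mini'),
    // 'array' asks for an array of elements matching the schema below.
    output: 'array',
    schema: z.object({
      request: z.string(),
      category: z.enum([
        'billing',
        'product_issues',
        'enterprise_sales',
        'account_issues',
        'product_feedback',
      ]),
    }),
    prompt:
      'Classify the following support requests.\n\n' +
      JSON.stringify(supportRequests),
  });

  console.log(object); // typed as { request: string; category: ... }[]
}

main();
```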
So that's cool, cool. Yeah, it's really, It's pretty unreal. And we're kind of getting into where it starts getting even cooler for me, which is as you're iterating. I will sound a broken record as I continue to repeat this. But with these, when working with AI, you're never done. You've never got the perfect prompt. You've never got the perfect logic. You're always doing more. You're always trying to see how you can improve and modify to get a better output. And one of the really fascinating things or really exciting things of work in this way, particularly with Zod, is when I want to add a new feature, I mentioned before how we have attached, had the model try and infer urgency from these issues. I think previously the way you'd go about it is you'd probably go back to your prompt and you'd start to, you'd want to add in some way, okay, we're also. wanting to infer urgency, this is how you should think about inferring it, and so on and so forth. Instead, when you're iterating with this structured object format with Zod, it's really as simple as just adding another key to your schema. So in this case, if we want to pull out the urgency, we say, please return another key for urgency, and we're using this enum type to, again, constrain the possible output options that the model can give us. So if we were to update this and say we now want the urgency as well, save this file. You can see we've just changed that right there. If I clear this up and run it again. And part of the reason why we're wanting to extract the urgency is because in our prompt earlier, we told it to give us some urgency values in its output, right? No, no, and that's kind of the crazy thing. Wait, hold on what? We just classify the following support requests. And that's what's really cool about this, is that we're not, we're not, this could probably get a lot better by providing a system instruction and system instructions if you don't know are character instructions for the model. you would give to an actor to say, this is who you should be. So you'd maybe add a system instruction you are an expert in customer support. , yeah, co-pilots already. So you'd add something that. So it has an idea of how it should behave. Whoa. No, we didn't have to change the prompt at all. All we did is we say we now want this urgency key as well. And then we want the model. We tell the model you can respond in three ways, low, medium, and high. And if we look at the output, we can start to see what the model was doing. So we could say, , in this case, the dashboard's not displaying real-time data. How can I fix this? And what's interesting is that it's cited this as high. Now, taking this a step. further, you could add something maybe urgency reason and say that this is, sorry, this is a a string. And in fact, we're going to put this a little bit, we'll put this before. And let's see what happens here. In this case, we want the model to describe why something is urgent before actually tagging it as whatever urgent thing is. should be. And so in this case, the dashboard is not displaying real-time data. It says that this is medium urgency because the user is experiencing a malfunction in a product feature which may affect their workflow. Instead, a user inquiring about support options would be more low because it's not to do with their current use of the product. This you could obviously improve with adding to the prompt. But it's a really, , it's a really cool way of working with this, where you're just adding keys to get more out of it. 
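As a sketch, the schema at this point, with the reason key deliberately placed before the urgency label, is roughly:

```ts
import { z } from 'zod';

const schema = z.object({
  request: z.string(),
  category: z.enum([
    'billing',
    'product_issues',
    'enterprise_sales',
    'account_issues',
    'product_feedback',
  ]),
  // Asking for the reasoning *before* the label nudges the model to
  // justify the urgency first and only then decide on a value.
  urgencyReason: z.string(),
  urgency: z.enum(['low', 'medium', 'high']),
});
```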
And in this case, extending your ability to use this information massively with just one additional line of code. Wow. Wow. So just to summarize, we didn't change anything in the actual text prompt saying, I want you to pretend to be a magical wizard who understands this really obscure thing. We just said, here's a schema of kind of what I want. And then before we have the urgency process. think of everything kind of as a line by line processing step. So in that next step, which is going to be urgency, if we add something just above it to give it a reason, then that has a better sense of how it's going to output the next step below it. So think about these kind of as change of information that's kind of flowing through. but it just kind of all happens at once with these AI systems. That is awesome. And that's a really important point that you mentioned, which is I did it without mentioning it and is why you saw the model's urgency reason change. And that is you can improve classification this by forcing the model to first actually write out the reasoning behind it. As a result, it's deducing this urgency reason or this actual urgency valve the reason that it it writes out above it so you can think of it the way that humans would approach in general thinking about something complex why something is urgent you'd have you'd be more likely to get to the right outcome if you thought about it first and then you decided rather than just deciding straight up very cool very cool wow we have, I think, one more. Yeah, yeah. So this is really cool. A lot of this stuff makes me giggle a lot because I just kind of can't believe that it works in a lot of cases. And that is a lot of these models, you'll have to test this out depending on the model you're using, but are multilingual by default. And so this is another step forward for how useful this can be. I've prepared beforehand a version of these support requests in a bunch of different languages. So we've got German, Spanish, I think this is Chinese and Japanese, and then Italian and German again. So we can literally change just the JSON file that we're importing, keep everything else the same. And the only other thing that we'll do is add a key to the Zod schema saying language. We're not telling it actually much beyond just saying you should return a language. And we're going to try and see if the model can infer, one, the language that's being spoken, but two, also return us accurate information across all of those support requests. So if I save this file again, we'll clear the console and run the script again. we should see in a second exactly the same output that we had before in all of these different languages with the correct language being pulled out so Japanese being pulled out German being pulled out Spanish but this highlights an interesting issue which is when I say language I'm thinking Spanish not yes I'm thinking Chinese not ZH how can we we how can we tell the model what we want it to do on this level of a key? We could maybe go into the prompt and add to the end of the prompt when you return language, make sure it's the full thing written out, but we can actually use one of Zod's core functions that it provides, which is chaining a describe function. And this is really cool. It allows you to describe these keys in line what they should be. And this is extra context that's going to be provided to the model on the level of the key. 
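Concretely, the chained describe() on a language key might look like this (a sketch; the description wording is paraphrased from the walkthrough):

```ts
import { z } from 'zod';

const schema = z.object({
  request: z.string(),
  category: z.enum([
    'billing',
    'product_issues',
    'enterprise_sales',
    'account_issues',
    'product_feedback',
  ]),
  urgency: z.enum(['low', 'medium', 'high']),
  // .describe() attaches per-key guidance that the SDK passes along to the model.
  language: z
    .string()
    .describe(
      'The language the support request is written in, e.g. "Spanish" or "Chinese" (full names, not codes like "es" or "zh").'
    ),
});
```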
So in this example, we could say for the language, the description of this key is that this is the language the support request is in. And then we give it examples. And this goes back to what I was saying around language models, generalizing patterns in language and figuring out most likely output based on a given input. If we give it examples of what we're looking for, we're going to give it the best chance of giving us what we want. So if we go back to the file, copy all of this in, again, I've just added this describe function here. I'm going to run the script again. We should now see the language is returned exactly in the format that we were looking for. So really, really. Again, really, really, really cool. I think we lost Ray for a second. He's probably answering the door. Let's see where we're at next. I think that's all we wanted to go through on Zod. I may stop for questions here, if anyone has any questions, but I don't think I can see the questions. Well, we'll crack on with the next section. And this next section introduces, brings us into something people, and I've liked to call Invisible AI. And this is, as a concept I introduced earlier, these features that are backed by or powered by AI that just make your application better for your users to use without them even knowing it's really AI. So these are things generating titles for linear issues, as I was mentioning. By way, Ray, I don't know if you want to, if there were any questions about the last section. I kind of just stopped over it. I'm happy to answer any questions if there were because there was a lot there. Oh, no. I think there was, yeah, if anyone has them, please drop them in. I think people were just really excited about the, you know, this is GPT4 Mini. It's really good with all languages. is, yes, that's why it's really kind of nice to experiment with these different models and kind of take a look at these different things. And as far as other people that were saying things, the people are pretty excited about just building with the frameworks and people who come with a Python background. You know, the water is warm over here. And it's kind of one of the reasons why, you know, I picked Next.js. It's , oh, this is a really detailed framework. There's a lot of stuff you can do. I want to build better experiences on the web. And, you know, I kind of want to, you know, really check. to start to deliver these types of things and think about things differently than I did before. And so how nice to have a really simple framework to describe, you know, data structures and then give it just some basic descriptions to extract out these little pieces of data. And this are just some of the beginning nuggets to, you know, start really exploring as a developer and kind of getting into those documents. And there's a lot of rich documentation there. And it's really, I'm really excited to have, you know, Nico on the stream today because it's for me to learn something, I learning, you know, in person or going to someone and just sitting in their office. And it's just the way that I've learned growing up. And so by joining the stream and doing this here, it's kind of we're hanging out together. You get to ask us questions. You know, we're live. It's so cool. And it kind of felt that way at the conference I can talk to people, share my experiences, and kind of, you know, talk back and forth. So just super honored to, you know, have you here sharing some of this stuff. 
It's literally what, interrupting you during the keynote live. You know, it's, it's so cool. I love it, man. Yeah, yeah. This is, this stuff, I said, it makes me, most of the time it makes me giggle. When you're playing around with this, no, that's not going to work. That's okay. Wow, all right. Move on. That was powerful. So fast. I, , these models, this will, this will go to if hopefully we have time to chat through the use case that you were talking about, but one of the, and that use case being working with, with models where, okay, we're giving, in these cases, seven support requests, but what happens when you're giving it seven million and then all of this kind of stuff and or seven billion lines of, whatever these things may be. And it helps to think back of these models aren't, we're not, we haven't hit AGI yet. and keep keep that in mind when you're working with these things. These models have their limitations. They're incredibly good when you use them intentionally. And that's what I'm trying to showcase in a lot of these scenarios is be realistic with what you're asking the model to do. If it's great at understanding patterns, give it something with the pattern that you're looking for it to replicate, and you'll most likely get out what you want to know. And we'll see as we're going through this idea of breaking down big tasks into smaller tasks that are likely, that the model is likely able to complete as we want, rather than saying one shot, can you build Facebook for me and ad stripe and payments and ad ad sense and all that kind of stuff that is not very reasonable that even I would, I wouldn't be able to do that. So asking a model to do that and being disappointed when you've got to approach a lot of these use cases, knowing where you can leverage the best part of it. And that's what I want to show, funny enough, in these next few little projects. Let's cook. Yeah, so if anyone's watching the stream, definitely give the thumbs up, share this with some folks, you know, and prepare any questions that you may have because this is your chance to ask them live as we go through them. So yeah, I'm really excited for this next section here. Cool. So this next section is going to be focused. I think we might just do one of these, just knowing time, we'll do one of these invisitions, and then we can move into the chatbot because I think a lot of people the chat bot a lot, and then we can leave time for questions. But I do think this invisible AI stuff is really cool. So the feature that we're going to focus on right now is something that I think everybody wishes, was in every application, every social application they use. And that is, I think everyone has once had that scenario where you come back to your laptop, you look in Slack and there are 6,000 messages that have come in since you left your machine. And you'd love for that one thread that is 300 comments long to get an idea of what happened, what do I need to know, and what's going to happen. And I saw Brian over at Campsite build this feature into their platform and do an awesome demo on Twitter showing how this wasn't marketed at all as an AI feature. It was just , oh great, now you get summaries of comment threads if you want. And so we're going to try and replicate that feature as well. So the first thing that we're going to do is run the dev server. And we can use that with PMPM and run dev. Head back to the browser. I'm going to run these side by side, so hopefully we can get things looking big enough. 
Oops, that's not what we wanted to do slash summarization. It's this one right here. So what we've got here, naturally, is a comment thread that has been generated by AI, because why not? And in this comment thread, it is a bunch of folks working at an agency that are giving or discussing an update on a certain client project. I think there are 20 comments in here, and I don't want to read that. But naturally, given I've done this demo, I can tell you exactly what's going on. They're talking about the client being happy with the changes that they're making, and some people are going to be working on some part of the project so that they can prep for a meeting that they're going to have. But again, we don't have time to read through all this. We'd rather just have one button that we can click right here. And when we click that one button, we should be able to just push out a summary. And what's cool here is that this is going to build upon what we've learned before. And that is, okay, we've got a bunch of kind of unstructured information. Let's send it to a language model and ask it to summarize what happened. So that's exactly what we're going to do. And because we're using XHS here, we're going to, let me see if I can pull this over. Yeah, that looks good. We're going to be using server actions here as our server-side environment. So before we are using just Node.js, now we're going to use server actions as our pseudo-ABI routes that we can call directly from our front end. And if you've never worked with Next.js or never worked with server actions, think of them as just this is server-side code that you can call directly from your front end. So they're really, really cool. I just play around with them. if you can. So the first thing we're going to do is create this actions.ts file in this summarizations folder. And in here, the first thing is we are going to market, we're going to use the use server directive to communicate to the next compiler that this is indeed a server action. We're going to then import the generate object. We want to generate some structured information here, and the reason why we're going to do that rather than just text is because of a problem that we saw before when working with No.js, when the model sent us back, , here is what you asked for and, , text asked. We don't want that. We just want exactly what we want. So we're going to define an asynchronous function, just any other JavaScript asynchronous function. It's going to take in an array of comments. We've typed this as any, just because. The type, we don't really know, but in your application, you probably want to type that. And then we call the generate object function and return the resulting object. So if you remember from before, from our previous things, this looks a lot the main function, literally the function called main in our other files where it's asynchronous. It calls whatever AI SDK function, and then it returns resulting generation. So I'm sounding a broken record here. We're using an AI SDK function. We've got to tell it what model we want to use and then what we want it to do. So that means giving it a model, this case GPD40 and then a prompt. In this case, we're saying please summarize the following comments. And then we have a separator. And then we have a label saying these are the comments and we stringify those comments. 
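Pieced together, the server action being described here (including the schema that is explained next) looks roughly like the following sketch; the file path and exact prompt wording are assumptions:

```ts
// app/summarization/actions.ts
'use server';

import { generateObject } from 'ai';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';

export async function generateSummary(comments: any[]) {
  const { object } = await generateObject({
    model: openai('gpt-4o'),
    prompt:
      'Please summarize the following comments.\n\n---\n\nComments:\n' +
      JSON.stringify(comments),
    schema: z.object({
      headline: z.string(),
      context: z.string(),
      discussionPoints: z.string(),
      takeaways: z.string(),
    }),
  });
  return object;
}
```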
Finally, because this is generateObject, we actually have to give it a schema. We've got to tell it what we want. For our summary, we want it to have a headline: okay, top level, what happened. Context: what prompted the conversation. Discussion points: what did they actually talk about. And takeaways: who's doing what. So I think all we have to do now is copy this code, paste it in, and then head over to our root page. Our server action is done; our AI code is done here. So we open the file browser, open the root page, and look at what we have to do. The first thing we're going to do is pull in that new action we defined in the file right next to this. And then we're using a really cool trick for typing the state. Assume you're working with an asynchronous API: you're fetching data and then you want to do something with it in your UI. You're going to fetch it and probably set some local state so you can send it as props into another component. We're going to do the same thing here. And for the typing, even though I kind of sounded like I don't, I love TypeScript so, so much, and this is one of the coolest little TypeScript tips: there are two utility types you can use called Awaited and ReturnType. If we pass in the type of the generateSummary action we just wrote, it will literally type this state as exactly whatever you return from the action. So if we were to change our schema in the generateObject call, that would obviously change what is returned in the result object, and that would update the type of our state automatically. This is just really handy as you're building your applications, so you don't have to manually type things everywhere. And I'll show you. I love that explanation. I think there are people who are new to TypeScript and don't quite understand what that means, and you explaining this was super helpful, because otherwise they would probably go create, or have the AI create, another type file and put it in their types. And then you're managing a Zod schema alongside separate types. And it's like, no, that's already built into your Zod schema, and then into the server action when you're pulling it in. So if you see here, in this action, the resulting object is of this type: an object with a headline string, a context string, a discussion points string, and a takeaways string. And if we go into our page, when we pass this in, we're using a generic. Generics, long story short, allow you to modify the type that will effectively be returned. They can be used under the hood, but you can think of them as a way to decorate what is going to come out. In this case, we're passing in the return type of that generateSummary action, and what that gives us is this summary state, locally in our component, that is typed to exactly match what we had in the object.
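For reference, here is a minimal sketch of the server action walked through above. The file path and key names follow the transcript; the exact prompt wording is approximate.

```ts
// app/summarization/actions.ts
'use server';

import { generateObject } from 'ai';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';

export async function generateSummary(comments: any[]) {
  const { object } = await generateObject({
    model: openai('gpt-4o'),
    prompt: `Please summarize the following comments.
---
Comments:
${JSON.stringify(comments)}`,
    schema: z.object({
      headline: z.string(),
      context: z.string(),
      discussionPoints: z.string(),
      takeaways: z.string(),
    }),
  });

  // Return just the typed object so the client can use it directly.
  return object;
}
```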
Just to prove this: if we get rid of that takeaways key and go back here, you can see that summary now doesn't have that takeaways type in it. So this is really helpful as you're building and iterating, obviously changing this schema a lot, to not have to change it in a lot of places. And you don't have to completely understand generics; you can just copy that code and reuse it. Don't be too intimidated by it. It takes a while, trust me. I'm years into trying to figure it out, and I now feel like I get it and I love it. That's amazing. Yeah, sorry, that's a tangent. Yeah, this is so cool. This is why we do these streams. It's fun to get into more detail and talk about this extra stuff. So yeah, I love geeking out on this. So what we've done here, all we've done, is create a new state variable to hold the summary that we expect the action to create. The next step is to actually call the action. What's pretty cool here is, again, this is an onClick handler like you'd use anywhere in React: it's asynchronous, we set the loading state to true, we generate the summary with the action, we await the result, and we send that result into the setSummary call to update our state to the result of the generateSummary action. This could have been three lines, but it's on one line for brevity. Importantly, we're using it like any other asynchronous action. We're not thinking about streaming yet; this is just a normal asynchronous function. So cool, we've now generated that summary and updated the state. The last thing to do is render it in the UI in some way. We have a SummaryCard component, and after we click the summarize button, it should take the summary that's been generated and pass it in as props to this SummaryCard component to show our summary. This SummaryCard is something I created on V0 very quickly, and if we look at what it accepts, it takes in a headline, discussion points, takeaways, and context, the same things we defined in our action. So I think that's all we need to do here. We can save both of those files, head back to the browser, click the summarize button, and see what happens. Moment of truth. Uh-oh. Yolo, everyone. This is happening. This is going to work. Yeah. Do you have internet? Is your key still working? Okay. Oh, wow. There we go. So now we've got a summary. But it's not very summary-ish. It's pretty long. It kind of feels as long as the thread itself.
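Before refining the prompt, here is roughly what the client page built so far looks like. This is a sketch: the comments array, markup, and the SummaryCard import are illustrative stand-ins for the demo's actual code.

```tsx
// app/summarization/page.tsx
'use client';

import { useState } from 'react';
import { generateSummary } from './actions';
import { SummaryCard } from './summary-card'; // the V0-generated component from the demo (assumed path)

export default function SummarizationPage() {
  // Placeholder for the AI-generated comment thread shown in the demo.
  const comments: any[] = [/* ...comment thread... */];

  // Awaited<ReturnType<...>> keeps this state in sync with whatever the action's Zod schema returns.
  const [summary, setSummary] = useState<Awaited<ReturnType<typeof generateSummary>> | null>(null);
  const [loading, setLoading] = useState(false);

  return (
    <div>
      {/* comment thread rendering omitted */}
      <button
        onClick={async () => {
          setLoading(true);
          setSummary(await generateSummary(comments));
          setLoading(false);
        }}
      >
        {loading ? 'Summarizing...' : 'Summarize'}
      </button>
      {summary && <SummaryCard {...summary} />}
    </div>
  );
}
```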
So this is where we head back to our best friend Zod, and our best friend the describe function in Zod, and we give the model a little more context on exactly what we want for each of those keys. Let's head back down and see what we're doing. We're going to add descriptions to each of our keys, both telling the model what we want and giving it an indication of how much of it we want. Now, Zod has some constraining functions like .length, so we could constrain the number of characters we want. But these are guidelines rather than actual constraints. I just want the model to interpret "max five words" as: okay, this isn't a lot. Whereas when I go below and say, what is the relevant context that prompted the discussion, max two sentences, it knows: okay, that's a bit longer. And that's the theme throughout. We give it some more context, but we also give it an indication of the length of what we want. So I think that's all we need to do to update this. That'd be funny. So again, this is the change we've made: we've just said exactly what we want and "constrained" it, in quotation marks, because it's not actually going to fail if it doesn't meet those constraints. It's just an indication. Oh, okay. So if I go back and click the summarize button again, you can see the resulting summary is a lot more what we were looking for, right? Wow. The headline: client feedback and next steps. The first point: the team discussed the client's positive feedback on the latest design. They addressed the need for an updated timeline and software license. And this is the thing that blows my mind: in the describe for the takeaways we say "what are the key takeaways," and I literally added two words, "include names," and what it's pulled out is exactly who is going to be doing what. And that, to me, is just air horns. Yeah, right. That's what you want to see when you see all of this. And yeah, we could even have it add an emoji, as Matichek suggested. Yeah, Matichek, yeah. We could have it... let's do that. That'd be so cool. Can we do that? Let's have some fun. Why not? I mean, this is the whole point. Let's do it live. I don't know, I'm going to describe this and say: the emoji that best represents the summary. I don't know. Copilot, take the wheel, right? Yeah. And now the only thing we have to do is go back into our summary card, and we're going to take in an emoji, which is a string. Nice. Emoji. This could be easier if I did that same trick here. Oh, I see. And then let's add that emoji here. This might fail right now because we don't have... insert air horn sound. Let's go. Yeah. Oh. There you go. We got a phone. We got a phone emoji. I don't know, it's not really the best. But you can see how easy it is to iterate on this kind of thing. An emoji, and we added it in two seconds. So iterating on these things is really, really powerful. Yeah, I think that's pretty fun in terms of how you can think about things and actually use these types of ideas. And you have free rein. I mean, just watching you do this, there's so much knowledge distilled down into it, even just from your demonstration. It's like, okay, how would you add the emoji? Where do you go to add that? And that, for me, is the big advantage of going from building Python apps to Next.js apps: you get to think about user experiences, right? So you have all this data, you want to do something cool, and then how do you get that data and actually extract it out into a user experience? Then it makes it more fun. You want to have your app branded in that sense. If your brand is emojis, or certain types of things, you can do that, and provide those experiences that make it more feature-rich, which is super awesome. And we're literally just scratching the surface here, I feel. This is just amazingly cool. Wow.
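For reference, the refined schema with the describe hints and the added emoji key ends up looking something like this. The exact wording of each description is approximate, but it mirrors the pattern from the demo: say what you want, plus roughly how long you want it.

```ts
const schema = z.object({
  headline: z.string().describe('The main takeaway of the thread. Max five words.'),
  context: z.string().describe('What is the relevant context that prompted the discussion? Max two sentences.'),
  discussionPoints: z.string().describe('What was actually discussed? Max two sentences.'),
  takeaways: z.string().describe('What are the key takeaways and next steps? Include names.'),
  emoji: z.string().describe('The emoji that best represents the summary.'),
});
```

Note that .describe() is guidance, not validation: unlike a .length() constraint, the generation will not fail if the model exceeds "max five words."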
Yeah. And what I want to emphasize, now that I think about it... I've said I want to emphasize everything, because there's a lot to think about. But the fact here is that AI can be very scary from an app development perspective, because it's this black box where you don't really know what's coming out. But with this workflow, you're able to bring some level of determinism both to the expectation of what's coming out and to the building process. We wanted to add an emoji, so you add it to your Zod schema, and then you pass it in as a prop just like you'd pass in discrete data from an API. Once that clicks, your iteration process as you're building this application becomes really fun. I can't think of a better word than fun. And you can build some really cool things that you never would have thought were possible just by adding simple keys. So anyway, I'm waffling on this because I really like it. Well, I think where it leads us is the next step, with the chat app, and then maybe some generative UI stuff in the future. Okay, so now you have something: you can define a system where you send it off to an AI process of some sort, it does its little magical things, it comes back, and then you have some deterministic thing that draws UI for you. And so just think about all the different things. Imagine if it came back with a fire emoji, or three fires, or five stars; you could build rating systems as things go. How challenging is this problem? Oh, this is three stars, or three fire emojis. Instead of ranking from low, medium, high, you could literally do it emoji-based: give me three fire emojis if it's really high priority. You can just have fun with it. And I feel that's where you're going. It's beyond the language, but in a structured way where it's repeatable. And going back to the first part of this, it starts with those providers. In the AI SDK, you can select multiple providers. You could do two different runs at the same time, go through that same process, and return two different results, and then compare them on the screen. You may have seen that if you're ever chatting with ChatGPT: it'll ask, this result or this result? They're probably doing the same query and showing you the results side by side so you can compare. So it's about getting used to thinking about these things as they're being built. It's cool. And keep in mind, we've only looked at two functions in the AI SDK so far: generateText and generateObject. There are actually only two other core functions, and the only difference is that those two are streamText and streamObject. You probably guessed it: they allow you to stream the information, the data, back, which is really, really powerful, and it's what we're going to look at next.
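As an aside on the point about comparing providers: because every provider sits behind the same core functions, running the same prompt against two models side by side is only a few lines. This is a sketch, not from the demo, and it assumes the @ai-sdk/openai and @ai-sdk/anthropic provider packages.

```ts
import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';
import { anthropic } from '@ai-sdk/anthropic';

const prompt = 'Summarize this thread in one sentence: ...';

// Run the same prompt against two providers in parallel.
const [a, b] = await Promise.all([
  generateText({ model: openai('gpt-4o'), prompt }),
  generateText({ model: anthropic('claude-3-5-sonnet-20240620'), prompt }),
]);

// Render these side by side and let the user pick the one they prefer.
console.log({ openai: a.text, anthropic: b.text });
```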
So I think we're going to skip over the one other invisible AI demo I had, and we're going to go right to the chatbot so we can start building one. And then, probably the most powerful primitive, which taps into, I think, Matichek's question about agents: we're going to look at tools, or function calling as a lot of providers call it, where things really start to get pretty nuts, because the model interacts with the outside world. Very cool. Sound good? Sounds good. Let's go. Let's cook. All right, let's cook. So, excuse me, we're going to build a chatbot now. Let's head back to the editor. We can close all of these files, and I'm going to head into this fifth one here. What we've got so far: the dev server is still running, it's just a Next.js app, and we've got a route at /chat. In this route, let's see what we've got, if I can type /chat. All right. A very, very compelling start to our chatbot. I think this is an h1, maybe... no, it's just a div. It's a div that says chatbot. And that's my TED talk, everybody. That's how you build a chatbot. No, so we're going to move away from server actions for this part of the demo, and we're going to look at route handlers. The AI SDK ships a few libraries, and it has a powerful primitive for building chat interfaces on the front end called useChat. useChat is a hook that manages... well, actually, I don't want to get into that complexity yet. Let's first think about what a chatbot is. You've probably used ChatGPT, you've used Claude. The rough process is: you've got an interface with a text box and a list of messages above it. You send a message. Claude is thinking, or V0 is thinking, and then a response is streamed in incrementally, and you see these characters, these tokens, appear on the screen as they come in. So this introduces a new primitive for us to look at, and that is streaming. Now, streaming sounds kind of simple on the face of it, but it's really complex under the hood, in the sense that you're opening a request with a server and then just starting to send a response back. And when you add language models to the picture, it's not the most straightforward thing in the world. But we've built this useChat hook that abstracts a lot of the complexity of handling the streaming, and it's built from the ground up to work with route handlers, or API routes. One thing to stress here is that useChat works across all the major UI frameworks. We're using Next.js today, but it works with React, Svelte, SvelteKit, Vue, Nuxt, Solid; all the logic and the way we build this is applicable across all of those frameworks. But what we're going to start with is the route handler, or API route. This is our server-side environment where we call the model, ask it for something, and then send that response back. So let's create our first file. This file is going to live in the app directory at api/chat/route.ts. And in here... we've done this because of how Vercel works: Vercel, as a serverless platform, has functions that run for a specific period of time as they're invoked. By default, on the hobby plan, the duration they can run for is 10 seconds, and a lot of the time AI models, language models, need more than 10 seconds to generate a response. On the hobby tier with Vercel, you get up to 60 seconds for free.
And you can set that just by using this export const maxDuration. I tend to set it to 30; it's a nice middle ground. So that's what we're doing here. Then, with Next.js, you can define a POST API route at a given path, in this case /api/chat, by exporting the HTTP method you want in all caps. So here we're exporting an asynchronous POST handler. Then, remember, when we think about building a chatbot, we're going to have messages, and those messages are going to be coming over the wire. So we pull those off of the request body. And then we see the streamText function for the first time. streamText will look very familiar after generateText, in that we first define the model we want to use. But now, instead of defining a prompt, we're seeing a messages key. You can use either messages or prompt depending on the use case; in this case, because it's a chatbot, we want to send in a list of messages rather than one long string with all of the messages concatenated, or just a single request. We're also using this convertToCoreMessages function. This is no longer necessary, which is really cool; I think this project is using older dependencies. If you're building with the latest version of the SDK right now, you won't need this convertToCoreMessages utility, but we'll leave it in for now. The very last thing, which is usually quite a complicated consideration, is: okay, great, we've got our streaming text response from the model; how do we prepare a route handler to actually send all of that back as a streaming response? This is where an awesome function on the result comes in, toDataStreamResponse, which will literally convert it into a data stream response that we can pull in on the front end. So a lot of complexity is abstracted away, but you still have the power to do with it as you please. We add that to our route handler and save the file. Now, next up is actually using this useChat hook. Because we're using a hook, we need to mark the page as a client component. We do that with the 'use client' directive. We then import useChat from the AI SDK's React package. Remember I said that if you're using Vue or Solid or Svelte, you just import it from the corresponding package for whichever UI framework you're using. One of the really cool things useChat does is manage the state of the conversation for you. In this case, we pull out this messages object, which is just a stateful variable: as we send a new message, it's appended to that messages array, and as a new message is received from the server, it's incrementally streamed into that messages state, which is really, really cool. So the way we use it is that we destructure it from the hook, map over those messages, and render certain markup depending on whether it's the AI saying it or the user saying it. That's rendering our messages. And then the last thing: we need a form. We need a way to actually send something to the model. And what's really cool is that useChat abstracts all of this away as well.
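Pulling the route handler part together, it looks roughly like this. A sketch of the API route described above; the optional system prompt discussed next would also live in this streamText call.

```ts
// app/api/chat/route.ts
import { streamText, convertToCoreMessages } from 'ai';
import { openai } from '@ai-sdk/openai';

// Allow streaming responses to run for up to 30 seconds on Vercel.
export const maxDuration = 30;

export async function POST(req: Request) {
  const { messages } = await req.json();

  const result = streamText({
    model: openai('gpt-4o'),
    // An optional `system` key can be added here to steer the model's persona.
    messages: convertToCoreMessages(messages), // not required on the latest SDK versions
  });

  // Convert the streaming result into a response the useChat hook can consume.
  return result.toDataStreamResponse();
}
```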
So you can destructure, let me give this more space, the input value, the handleInputChange function, and the handleSubmit function. We just attach these to the form's onSubmit handler, and then create a new input where we also attach the onChange handler and the input value. And it will manage updating that state, triggering the form submission, and appending new messages as well. So if I copy this over, back to our page, our beautiful chatbot, and head back, we should see something that looks kind of like a chatbot. So we could say, "hello from Next.js." Wow. And just like that, we've got our ChatGPT moment of a response streaming in. Really, really quick, really easy. But obviously this is not what you're shipping to production. I want to take a bit of a pause, though, to look at something we haven't covered yet, which we've only mentioned briefly, and those are system instructions. These are the character instructions you can give to the model to influence the way it behaves. And they're really powerful. A lot of the time you might use them to keep the chatbot on brand, to say you can only talk to people like this, you can only do that. But there's a really cool example here that I want to show, so I'm going to skip over to it. I'm a bit of an Apple fanboy and an early computing history buff, let's say, and I was just playing around with the system instructions. Let's copy it into the editor so we can see it. Yeah, here we go. So: you're Steve Jobs. Assume his character, both strengths and flaws. Respond exactly how he would, in exactly his tone. It is 1984 and you've just created the Macintosh. A three-line system prompt. And let's try asking the model: who is Bill? And I like this one, because if we asked ChatGPT who is Bill... this is actually Claude, but... okay, it did pick up Bill Gates. I wonder what it would do if you asked it a lot of times. A better test is if we remove this system prompt and ask who is Bill: "Bill can refer to a number of people," right? The model kind of doesn't know. But the moment we add the system prompt back in and ask, who is Bill? Or we can also ask, who's John, John Sculley? "Bill Gates. He's the guy running Microsoft and he's been a bit of a competitor, to say the least. We've had our differences, but he's undeniably smart." It's crazy, right? It's in his tone. It's in his voice. That's right. I'm sure he said a lot of this kind of stuff. And so you can have a conversation with Steve Jobs. Tell me about John. Let's ask something. John Sculley. What about, who's John? Yeah. Yeah, who's John? How do you spell Sculley again? I think that's it. "Let's just say our relationship has been complicated. We have different ideas about where Apple should be heading." And this is crazy, because this would be before the ousting, right? Right. So you get this character you can talk to. And this is more of a fun anecdote about what you can do with these models, but you should play around with it, because, I mean, imagine you've got a Steve Jobs that you can just ask questions to.
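For reference, here is a rough sketch of the chat page wired up with useChat as described a moment ago. The markup is illustrative, and the Steve Jobs system prompt from the demo belongs in the route handler's system option, not in this page.

```tsx
// app/chat/page.tsx
'use client';

import { useChat } from 'ai/react';

export default function Chat() {
  // useChat manages the conversation state, the input value, and the form submission for us.
  const { messages, input, handleInputChange, handleSubmit } = useChat();

  return (
    <div>
      {messages.map((message) => (
        <div key={message.id}>
          <strong>{message.role === 'user' ? 'User: ' : 'AI: '}</strong>
          {message.content}
        </div>
      ))}
      <form onSubmit={handleSubmit}>
        <input value={input} onChange={handleInputChange} placeholder="Say something..." />
      </form>
    </div>
  );
}
```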
Obviously, it's not really Steve Jobs, but I think it's incredibly powerful, and it shows you the power of system instructions, which in this case were just three very clear instructions about what the model should be doing. But anyway, that was a tangent off to the side. We now get to move on to something a bit more interesting. We're going to remove that system instruction, and we're going to ask the model to do something I don't think it's going to be able to do, and that is ask it something about the world. So let's say, what is the weather in... where are you in Hawaii again, remind me? Kailua. In Kailua. Did I spell it right? Yeah. No. Kind of? Yeah. Yeah, that's perfect. Okay, it's still... I didn't save my route, that's the problem. Oops. What is the weather in Kailua? "I'm unable to provide real-time weather updates. For the current weather in Kailua, I recommend checking a reliable weather website or app like the Weather Channel." Not super helpful. And this is where we're going to introduce probably the most powerful primitive in AI right now, and that's this idea of function calling, of tools, as we call them in the AI SDK. I want to take a second to explain what tools are in very simple terms. The problem we have here is that the model has an incredible amount of training data, but it's unable to access the outside world, or any information from outside its training data. But the model is very good at pattern recognition and at this ability to, quote unquote, reason, big quote unquote there. What we do with tools is effectively give the model a list of capabilities. You can call them functions, tools, whatever you want. And we describe what each of those tools does. So in the case of weather, we'd say: okay, you have a getWeather tool, and you should use it if the user ever asks for the weather in a certain location; to use it, make sure you have the location and the unit of temperature you want. And what happens is that as the model goes through the conversation, if it interprets or thinks that the user wants the weather, rather than returning a text response like before, it will return a function call, a tool call. It identifies the tool it wants to use, and then pulls out the parameters it needs to run it: the location, or whatever you defined beforehand. Behind the scenes, the AI SDK pulls out that information and runs that code in your environment. And then, boom, the model has run code, has executed a function, to achieve some kind of task. That's how tools work in a nutshell. And we're going to define our first tool here to get the weather in Kailua. Sweet. Let's go. It's going to be fun. Yeah. This is kind of what somebody was asking for; they were thinking about agentic things and other stuff. So start with some of these tools to see how they work in the SDK. And for me, this is a nice way of packaging all the information in a livestream. If you're digging this, definitely hit the thumbs up if you're watching on YouTube; if you're on X, share the stream with a friend, because this is going to be a great resource to look back on. After the stream, I'm actually going to have an AI process go through and generate timestamps for us, and those will be in the description below as well.
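To make the idea concrete before the code: when the model decides to call a tool instead of answering in text, what eventually flows back to the UI is a tool invocation object along these lines. This is an illustrative sketch, not exact SDK output; the id and weather code are hypothetical, and the numbers echo the result seen later in the demo.

```ts
const exampleToolInvocation = {
  state: 'result',        // 'call' while the tool is running, 'result' once execute() has returned
  toolCallId: 'call_123',  // hypothetical id
  toolName: 'getWeather',
  args: { city: 'Kailua', latitude: 21.4, longitude: -157.74 }, // coordinates inferred by the model
  result: { city: 'Kailua', temperature: 26, humidity: 74, weatherCode: 1 },
};
```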
So, yeah, thank you. And let's go and get some weather up in here with some tools. Let's get some weather. So this showcases a really cool part of the AI SDK and its very considered API, in that every level of complexity we move down builds upon the previous primitives and concepts we're already used to working with. Adding a tool to streamText is just adding a new key to the streamText call. It doesn't change the way you call the function; it just provides these tools to the model. The way we do that is we define this tools key. It's a record, or an object, where you define the name of the tool, and then inside that tool we define a few things that we'll see in a second. Now, two important things to note here. One, the name of your tool is very important. Think of it as part of the description of the tool. getWeather is pretty clear about what this tool is going to be used for, so if the user asks, what's the weather in Kailua, even without a description of what the tool does, there's a fairly good likelihood the model is going to pick getWeather to achieve the task. The second thing is this tool utility function we're importing from ai. We'll see it in a second. It's not strictly necessary, but when we define exactly what the function requires, it means that when we define the function that gets called, we have full type safety between the two. So, not necessary, but it helps; it gives you a better DX. So we've defined our first tool. We have to give it a description. In this case, we want it to get the current weather at a location. This is what the model uses, alongside the name of the tool, to say: okay, yeah, I need to call this tool right now to solve this task. Then we have to define what the model actually needs in order to run this function. And this introduces, again, our best friend Zod, so we can define it in a type-safe format. But also a cool little tip: when you look at this off the bat, you'll notice that, based on how we've already used this, we're not asking for the weather in Kailua by providing coordinates. We're saying: I want it in Kailua. But something cool is that we can use the model's inference ability. If we're using a sufficiently large model that likely has coordinates in its training data, it can infer things like latitude and longitude from the city, without us ever having to provide some kind of conversion function or anything like that. Now, if you use this, I'm going to say: you need evals. We're not even going to talk about evals here because we don't have the time, and you'll want to be careful with how you do this. But because this is a demo, I want to inspire you to play around with these parameters, because a parameter doesn't necessarily have to explicitly exist in the context the model has; the model can infer it from the context available to it. So remember, these parameters are what the model needs in order to run what comes next, which is the function that actually gets executed when the tool is called. So if we scroll down, the last thing we have to do is define that function. And what we have here is an actual call to a weather API. This is the Open-Meteo API, where we're pulling in the latitude, longitude, and city.
We then call this API, passing in the latitude and longitude, getting out the current weather and weather code, and then we unwrap it and, finally, return the values we want. Really cool. Remember, this execute function is, again, just a Node.js environment; you can run any code you want here. We're literally calling a weather API and returning the result. With these, whatever it is, 10 or 12 lines of code, the model now has access, in a fairly, quote unquote, deterministic way, to the weather in a certain location. Really, really powerful. Wow. So I'm going to copy this and update our route. Great, no errors here, thankfully. That's the beauty of kind-of live coding, but kind-of not. And if we ask the same question again, what's the weather in Kailua? We should see the weather. But we don't. And I've kind of alluded to why: it's because the model isn't returning content, a text response. It's generating a tool call. So we need to render that tool call in the UI, and the way we do that is by accessing what are called tool invocations on the front end. I'm going to replace this code here and we can walk through it very quickly. Remember, before, we were just rendering the message content. Now we check if there are any tool invocations, and if there are, we render them in a stringified format in a pre tag; otherwise, we just render the message content. And if we save, and I'm going to do it side by side so we can see it pop up, we should see that tool call show up. So we've got it: we can see the state it's in right now is a tool result. We called the getWeather tool. As you can see, it successfully pulled Kailua out of the question. It also inferred the latitude and longitude. And then it got the temperature. I don't know, Ray, is it 26 degrees there? Oh, yeah, yeah. It's two degrees in London right now. I'm so jealous. I am so jealous. For those in Fahrenheit, that should be around 78. Sorry? I think that's around 78 degrees Fahrenheit, right? Yes. Let's see, 26 Celsius. Yeah, yeah. 78.8. It's cold when it gets under 70. So the model now has access to the weather in Kailua as part of its message history. It's kind of simple, what we just did, but it's a huge, huge shift: our model has moved from being the do-everything machine to really being this, again I'm putting it in massive quotes, reasoning machine. You say: okay, these are a bunch of tasks you're able to do, here's deterministic code for those actual tasks, these tools, these execute functions. The results get passed back into the context of the conversation, and the model pulls out whatever it needs to complete the next action. And so, going back to, I think it was Matichek's question before on agents, this is really the format, the infrastructure, that makes building agents with the SDK doable and really powerful. But I think we've surfaced an issue, which is that the model hasn't actually told us what the weather is in Kailua. It just blurted this information out. And that's not a great user experience. Come on, I'm at Vercel here. That doesn't work. We're trying to embody the Steve Jobs, you know. Yeah, yeah, exactly. We're product obsessed.
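Putting the tool together, the updated route handler looks roughly like this. The Open-Meteo query parameters and response field names here are approximate rather than the exact demo code, but the shape (tool name, description, Zod parameters, execute) follows what was described above.

```ts
// app/api/chat/route.ts, now with the getWeather tool.
import { streamText, convertToCoreMessages, tool } from 'ai';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';

export const maxDuration = 30;

export async function POST(req: Request) {
  const { messages } = await req.json();

  const result = streamText({
    model: openai('gpt-4o'),
    messages: convertToCoreMessages(messages),
    tools: {
      getWeather: tool({
        description: 'Get the current weather at a location',
        parameters: z.object({
          latitude: z.number(),
          longitude: z.number(),
          city: z.string(),
        }),
        execute: async ({ latitude, longitude, city }) => {
          // Plain Node.js code: call the Open-Meteo API and return only the values we care about.
          const response = await fetch(
            `https://api.open-meteo.com/v1/forecast?latitude=${latitude}&longitude=${longitude}` +
              `&current=temperature_2m,weathercode,relativehumidity_2m`,
          );
          const data = await response.json();
          return {
            city,
            temperature: data.current.temperature_2m,
            weatherCode: data.current.weathercode,
            humidity: data.current.relativehumidity_2m,
          };
        },
      }),
    },
  });

  return result.toDataStreamResponse();
}
```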
And so, to take this to the next level, what we'd want is some logic on the back end or the front end that says: okay, you've received the tool result, it's done, send it back to the model alongside all of the previous context of the conversation to trigger another generation. And hopefully the model looks at the history of the conversation, at the last few messages: okay, I got a question, then a tool call happened, then there was a tool result, and so I can infer from there that I can answer the original question using whatever results came out. The idea here is creating these multi-step flows where the model is able to decide: okay, I need to do more, I haven't fulfilled the initial request. And this, as we were just talking about, is kind of complicated to think about: when would you do it, how would you do it, what prompts would you send back? What's awesome with the SDK is that enabling these multiple steps is as easy as adding a single configuration key to your useChat call. So if we head back to useChat, we're going to replace everything here. As you can see up here, we've just added this maxSteps. If I zoom in, you can see we just added maxSteps. If I save, refresh the page, pull things out a little bit, and say: what is the weather in Kailua? We should see the tool result come in, and then, it was so fast, the model took that information and said: okay, the weather in Kailua is 26 degrees Celsius with humidity of 74%. Really cool. Now the model is using that to complete the conversation and complete the task. Now, a lot of people ask, and I'm not going to get too into the weeds here, what do you set maxSteps to? Why 5? Why any of this? This is something to play around with, but don't be overwhelmed or overthink this value. It ties into how you're architecting which tools you're providing to complete which tasks. In this case, we literally just needed two steps to get the outcome we wanted. If our model also had a tool to get the itinerary for a place based on the weather, that would be three steps. You could have collections of tools all doing different things and still not need 50 steps to get somewhere, because you only need up to three or four steps max to complete a task. So the way to think about it is: the maximum number of steps you would need to complete any given task, based on the tools you've provided.
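On the client, that single configuration key looks like this. It is the same useChat call as before, with the value chosen as in the demo; the right number is a judgment call based on your tools.

```tsx
// Enabling the multi-step flow: same useChat hook, one extra key.
const { messages, input, handleInputChange, handleSubmit } = useChat({
  maxSteps: 5, // enough headroom for a tool call plus the follow-up text answer
});
```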
That's fantastic. So the last and most exciting thing is: how can we make this look better? And we're going to lean on something that sounds a little bit hacky, but it's really cool, and it's how a lot of this works behind the scenes. At the end of the day, we're using tools to interact with the outside world. What are tools? They are JavaScript functions: functions that take an input and return an output. What are React components? They are functions that take an input and return an output. And so that's exactly what we're going to do here: we're effectively going to map tool calls, and the input they take, to React components and the props they take. We're going to define a React component that simply takes in, as props, temperature, weather code, humidity, and city, and we're going to say: if the state of the tool call is result, render that component. It really is as simple as that. So the way we're going to do that in our UI: I've generated, beforehand, surprise surprise, with V0, a weather component, and we're going to map through those tool invocations and say: if the tool name is getWeather, and the state is a result state, render the weather component and pass in the result as the weather data; otherwise, don't render anything. So let's copy this over. And again, we want the cool effect of seeing this change in real time. Wait, I'm not going to save yet. I'm going to zoom out a little bit so we can see this change before our eyes. If I save, we should now see that tool result rendered into a React component. And that's how you can approach generative UI in your applications: really thinking about, okay, what information do I need to get? How do I get the model to get it? And then how do I pipe it through and map it to the component in my UI? So really cool. I have one more part to this that I'm happy to go through. I don't know if we want to do that first; it's a very small part, but it's adding interactivity to these components so that they can then... Yeah. Why don't we just go through it, and then we can talk through what it looks like after you put in the code. I know you move pretty fast. Yeah. Let's do it. Let's do it. So the idea here is, and you can see this in many applications, whether you're using Claude or V0 or any of these things: you have these components that are generated within messages, and they can have interactive elements in them. In this case, this is Kailua, this is about weather, so we could probably add something here, maybe a button that gets the weather in another location. Because this is a weather component, the action is likely to do with either Kailua or weather. But this lets us add interactivity to components, because at the end of the day, it's just a React component. There's nothing special about it. Let's actually open it in this tab, and I'm going to copy in this code so we can see what's happening. I'll zoom in. We have our weather component. In this weather component, the interface for the incoming props takes in those props, and then it literally renders different elements based on them. There's nothing AI-related about this component whatsoever. It's just that the data coming in was sourced and called by a model. So I'm going to copy in this updated code and we'll see... oopsie. Wow. That's right. So if someone is using V0, they can maybe copy in their Zod schema and say: hey, I have this data that comes back from my AI model, can you build a weather component for it? That's exactly what I did for this component: I literally said I want something like the stock Apple Weather app, and I copied in the TypeScript type I had on the state. And in one shot, I got this component. Then I installed it, and I had it as part of my chatbot. Let's go. This is the way we think about generative UI: the model, while not necessarily generating the exact pieces of the interface itself, is deciding which components to show, without explicit guidance.
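A sketch of that mapping on the client. The Weather component itself is the V0-generated one from the demo; its import path and the weatherData prop name are assumptions here.

```tsx
'use client';

import { useChat } from 'ai/react';
import { Weather } from './weather'; // the V0-generated component (assumed path)

export default function Chat() {
  const { messages, input, handleInputChange, handleSubmit } = useChat({ maxSteps: 5 });

  return (
    <div>
      {messages.map((message) => (
        <div key={message.id}>
          <strong>{message.role === 'user' ? 'User: ' : 'AI: '}</strong>
          {message.content}
          {message.toolInvocations?.map((toolInvocation) =>
            toolInvocation.toolName === 'getWeather' && toolInvocation.state === 'result' ? (
              // Render the tool result through a normal React component instead of raw JSON.
              <Weather key={toolInvocation.toolCallId} weatherData={toolInvocation.result} />
            ) : null,
          )}
        </div>
      ))}
      <form onSubmit={handleSubmit}>
        <input value={input} onChange={handleInputChange} placeholder="Say something..." />
      </form>
    </div>
  );
}
```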
It's doing it in this agentic fashion. And what's really fascinating about this from a product design perspective: chatbots are really scary, because when the model is generating text, you don't know what it's going to give you. But as soon as you say, okay, if the user wants the weather, this is how it should be presented in the UI, not only is it more interactive, but you've limited the risk of what the model is going to say back to the user, and it can be branded exactly how you want it. Wow. So that's cool. That's really cool. And the last point I'll mention on this, which is really cool, is that because these are just React components, you can use your existing application components. That means if you have an existing design system, an existing registry of components, and you want to build a chatbot that can leverage all of it, you just make those components available to the model via tools. And now you have a chatbot that uses your existing application interface to perform tasks that are coded directly into the components, but are presented on demand to the user as they ask for them. This is a lot of where we're thinking about how generative UI can exist within applications: an almost unified interface for using an application when you know exactly what you want to do, change my credit card, book a flight, whatever it may be. You don't want to be navigating around everywhere; you just say where you want to go, and you have that all sorted for you. That's amazing. Before I forget, I'll add this very last part to the component. What we're doing to add interactivity is that we first add an id to the useChat hook. This allows us, similar to how you tag queries with React Query so you can invalidate them elsewhere in your application, to trigger subsequent messages in the conversation from other parts of our application, without having to prop-drill or share a context around the app. So the very last thing we're going to do is head into that weather component. I'm just going to replace it all here and show you what we've done. We've imported useChat, called the useChat hook passing in the same id, and then we're destructuring the append method. Append is really cool because it lets you literally just add a new message programmatically to the conversation. In this case, I said we want the model to generate the weather in a random location. We can add a button with an onClick handler, have some component state so we know whether it's been clicked or not, just like any React component, and then use this append function to add a new message to the conversation saying: the user said, get the weather in a random location. So if I were