Essay, Apr 1, 2026

Why the AI You Use Every Day Works (Or Doesn't)

A practical guide to context, memory, and setup: the real skills that separate people who get work done with AI from the ones who stay frustrated.

Originally published on LinkedIn

I want to start this with a story that's a little embarrassing. A while back I started a project with an AI agent. I was excited about it, had a rough idea of what I wanted to build, and I just jumped in. Started prompting, started asking it to write things, started iterating. A few hours in, the whole thing was a mess. The agent was giving me answers that didn't fit together. It kept forgetting decisions we'd made. It was suggesting approaches that contradicted things I'd told it an hour earlier. I found myself re-explaining the same stuff over and over. Eventually I closed it, frustrated, thinking the AI just wasn't smart enough to do what I wanted.

The AI was fine. I was the problem.

I never actually sat down with the agent before I started working. I never told it what I was building, how I wanted it to think, what I wanted it to remember, what kind of help I was looking for. I just started typing. And the result was exactly what you'd expect from a smart assistant who'd been dropped into the middle of a project with no background on what was going on.

This is the single biggest thing I've learned from using AI seriously over the past year or two. The difference between AI that works and AI that doesn't is almost never about how smart the model is. It's about whether the person using it understands two concepts most people never get taught: context and memory. And once you understand those, a lot of the weird stuff that happens when you use AI starts making sense.

That's what this paper is about. It's for anyone who uses AI in their job or their personal life and sometimes wonders why it works well one day and goes sideways the next. You don't need to be a developer. You don't need to know how transformers work. You just need to be curious about why the tools you're using behave the way they do, because that curiosity is what's going to separate the people who get real value out of AI from the people who stay frustrated.

AI, LLMs, and agents: what are these things?

People use these three words like they mean the same thing. They don't. And the distinction actually matters a lot for the rest of this paper, so I want to spend a minute on it.

AI is the broad term. It covers anything that tries to mimic intelligent behavior. Spam filters are AI. Netflix recommendations are AI. The spellcheck in your email is AI. When someone says "AI," they could mean almost anything, from a decades-old spam filter to the newest chatbot.

LLMs are a specific kind of AI. The letters stand for Large Language Models, and they're the engine behind most of what people mean when they say "AI" today. Think of an LLM as the raw, bare-metal intelligence. It's been trained on a huge amount of text, and its one job is to take in some input and produce output based on what it learned during training. That's it. An LLM on its own has no memory between conversations. It doesn't have tools. It can't do anything except generate text. It's the brain, sitting there, waiting to process whatever you give it.

Agents are the wrapper around the LLM. An agent is what takes that raw LLM and turns it into something that can actually do work. Agents give the LLM tools (so it can search the web, read files, send emails, edit code). Agents manage memory (so the LLM remembers what happened last session). Agents load skills and extra context (so the LLM has the right background for the job). Agents run the whole thing in a loop, where the LLM can decide what to do, do it, check the result, and keep going until the task is done.
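If it helps to see that loop written down, here's a toy sketch in Python. It isn't how any real product is built. The names `fake_llm`, `run_agent`, and `TOOLS` are ones I made up for illustration, and the "model" is a hard-coded stand-in so the example runs without any API or account.

```python
from typing import Callable

# Hypothetical tools: plain functions the agent is allowed to call.
TOOLS: dict[str, Callable[[str], str]] = {
    "word_count": lambda text: str(len(text.split())),
    "shout": lambda text: text.upper(),
}

def fake_llm(context: list[str]) -> dict:
    """Stand-in for a real model: picks the next step from what it can see."""
    if not any(line.startswith("tool result:") for line in context):
        # Nothing has been done yet, so ask for a tool.
        return {"action": "tool", "name": "word_count", "input": context[0]}
    # A tool result is in the context, so finish with an answer.
    return {"action": "finish", "answer": f"Done. {context[-1]}"}

def run_agent(task: str, max_steps: int = 5) -> str:
    context = [task]                                 # everything the "LLM" can see
    for _ in range(max_steps):                       # the loop: decide, act, check
        step = fake_llm(context)
        if step["action"] == "finish":
            return step["answer"]
        result = TOOLS[step["name"]](step["input"])  # the agent runs the tool
        context.append(f"tool result: {result}")     # the result goes back into context
    return "Stopped: step limit reached."

print(run_agent("count the words in this sentence"))
```

The thing to notice is the division of labor: the model only chooses the next step, while the surrounding code runs the tool, feeds the result back in, and decides when to stop.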

The clean way to think about this: the LLM is the engine. The agent is the car built around it. The engine on its own is useful, but it's not a car. The car is what you actually drive.

Here's the part most people don't realize. When you use ChatGPT with web browsing turned on, you're using an agent. When you use Claude with the memory feature enabled, you're using an agent. When you use Copilot, Cursor, Claude Code, Devin, or OpenClaw, you're using an agent. Every one of those products is an LLM wrapped in something that manages context, memory, and tools on your behalf. A lot of people have been using agents for months without realizing it, because the products don't advertise "you're using an agent now." They just quietly do agent things in the background.

The flip side is that if you open the raw API for Claude or GPT and send a single message, you're just talking to an LLM. No tools. No memory. No loop. That's the bare-metal experience. You're unlikely to ever use AI that way unless you're a developer building your own systems.

This distinction matters for everything that comes next, because context and memory and skills and tools, all the stuff this paper is about, happen at the agent layer, not the LLM layer. The LLM doesn't manage its own context. It doesn't remember things between sessions. It can ask for a tool, but it can't run one by itself. The agent handles all of that. So when we talk about getting better at AI, we're really talking about getting better at working with agents.

One more thing while we're here. Not all LLMs are equal. In my experience, Claude is good at following complicated instructions buried inside long documents. GPT tends to be better at quick pattern-matching. Gemini can swallow huge amounts of information but has its own quirks. If you switch between different tools and something that used to work stops working, it's probably not you. Different LLMs behave differently, and different agents wrap them in different ways. Getting good at AI partly means learning the personalities of the tools you use.

Prompting: useful, but not the whole story

You've probably heard of prompt engineering. It's been one of the biggest topics in AI for the last two years. There are courses, books, Twitter threads, the whole thing. And prompting is useful. It absolutely matters. But I want to make a case that it's not the most important skill anymore, and that understanding context is.

The basic idea of prompting is that how you ask the AI affects what you get back. That's still true. A vague prompt gets a vague answer. A clear prompt with examples gets a better answer. There's a simple framework that covers most of what you need:

Role. Tell the agent what hat to wear. "You're a marketing strategist helping me plan a launch." "You're a careful editor reviewing this draft." "You're a research assistant summarizing these papers."
Goal. Tell it what success looks like. "I need five headline options for a Facebook ad." "I need you to flag anything in this contract that's unusual."
Constraints. Tell it what not to do. "Don't use jargon." "Don't suggest changes to the pricing section." "Keep each answer under 100 words."
Output. Tell it how to format what it gives you. "Give me the answer as a bulleted list." "Put your reasoning first, then the recommendation."

If you use that framework, you'll be ahead of most people. It works across basically every AI tool out there.
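For anyone who likes to see things as a template, the four parts above can be sketched as a small Python helper. The function name `build_prompt` and the exact layout are my own choices, not a standard. The example values are taken from the quotes above.

```python
def build_prompt(role: str, goal: str, constraints: list[str], output: str) -> str:
    """Assemble the four parts into one block of text to paste into a chat."""
    lines = [
        f"Role: {role}",
        f"Goal: {goal}",
        "Constraints:",
        *[f"- {c}" for c in constraints],   # one bullet per constraint
        f"Output: {output}",
    ]
    return "\n".join(lines)

prompt = build_prompt(
    role="You're a marketing strategist helping me plan a launch.",
    goal="I need five headline options for a Facebook ad.",
    constraints=["Don't use jargon.", "Keep each answer under 100 words."],
    output="Give me the answer as a bulleted list.",
)
print(prompt)
```

You don't need code to use the framework, of course. The point is only that the four parts are separate slots, and a prompt is easier to reuse when each slot is filled on purpose.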

But here's the thing. The newer LLMs are smart enough that you can be sloppy with your wording and they'll still figure out what you meant. What they can't do is read your mind about the situation you're in. They don't know what project you're working on. They don't know what you already tried. They don't know what you care about. They don't know what "good" looks like in your specific context.

That's the problem prompt engineering doesn't solve. And it's the problem context and memory do.

Context: what the agent can actually see

Here's the concept that unlocked everything for me.

When you're talking to an agent, the LLM inside it has no idea what's going on outside of what the agent has passed to it. It doesn't know your company. It doesn't know your project. It doesn't know what you talked about yesterday, unless the agent explicitly loaded that into the conversation. The only thing the LLM knows, in that moment, is what's in front of it right now.

That's context. It's the total set of stuff the LLM can see at the moment you ask it something. The conversation so far, the files you've uploaded, the instructions you gave it, the tool outputs the agent has pulled in, any memory the agent has loaded. All of that is context. Nothing else exists to the LLM.

Two things about context matter a lot, and most people don't know either of them.

First, context has a limit. Every LLM has a maximum amount of stuff it can hold in its head at once. This is called the context window, and it's measured in tokens, which are basically chunks of words. The newer models can hold a lot. Some Claude models can handle about a million tokens, which at roughly three-quarters of a word per token is something like 750,000 words of input. That sounds like a ton, and it is. But it's still finite. If you've been working with an agent for a long time, or if you've dumped a huge document into it, you can run out of room. And when you do, stuff starts getting dropped.
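The arithmetic behind that word count is simple enough to sketch. The 0.75 words-per-token ratio is a common rule of thumb for English text, not an exact figure, and real tokenizers vary by model and language. The function names here are hypothetical.

```python
WORDS_PER_TOKEN = 0.75  # rough rule of thumb for English; an assumption, not a spec

def tokens_to_words(tokens: int) -> int:
    """Approximate how many words fit in a given number of tokens."""
    return round(tokens * WORDS_PER_TOKEN)

def estimate_tokens(text: str) -> int:
    """Very rough token estimate based on a simple word count."""
    return round(len(text.split()) / WORDS_PER_TOKEN)

def fraction_used(text: str, window_tokens: int) -> float:
    """Share of a context window that a piece of text would take up."""
    return estimate_tokens(text) / window_tokens

print(tokens_to_words(1_000_000))  # 750000
```

Treat any number that comes out of this as a ballpark. It's good enough to tell you whether a document is a small slice of the window or most of it.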

Second, and this is the weird one, the LLM gets worse as the context gets bigger even before you hit the limit. A model with a million-token capacity tends to be less reliable at 200,000 tokens than it is at 50,000 tokens, and less reliable at 50,000 than at 5,000. This has a name. People in the field call it context rot. Researchers have tested this across many of the major models and found the same pattern. The more stuff you cram into the context, the less reliable the output.

This is probably the single most surprising fact about using AI, and once you know it, a lot of things make sense. The long conversation that started great and then got confused? Context rot. The agent that used to be able to follow your project and now seems to be ignoring half of what you said? Context rot. The sense that the AI was "smarter" when you first started talking to it? Yeah. Partly that.

The practical lesson from this is that keeping your context clean and focused matters more than cramming everything into it. If you're working on something important, don't just keep piling on. Start fresh conversations when the current one is getting heavy. Give the agent the minimum it needs to do the task, not everything that might possibly be relevant. Trust that less context, well-chosen, beats more context, poorly-chosen. Almost every time.

This is also where a concept called compaction comes in, and it's worth knowing about because it's happening behind the scenes in tools you're probably already using. Compaction is when an agent takes a long conversation, summarizes it, and effectively starts a new session with just the summary plus whatever's most recent. A good summary keeps the decisions that matter, though fine details can get dropped along the way. A huge chunk of tokens gets freed up, and the quality of the output picks back up. Claude Code, Cursor, and ChatGPT all do this, usually without you even noticing. It's one of the most important things modern agents do, and it's a great example of how the agent layer is quietly managing context rot on your behalf.
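Here's a minimal sketch of the idea in Python. I don't know how any specific product implements compaction, so take this as the general shape only. The names `compact` and `naive_summary` are hypothetical, and the summary function is a placeholder for what would really be a request to the LLM itself.

```python
from typing import Callable

def compact(messages: list[str], keep_recent: int,
            summarize: Callable[[list[str]], str]) -> list[str]:
    """Replace older messages with one summary; keep the newest ones verbatim."""
    if len(messages) <= keep_recent:
        return messages[:]                     # short enough, nothing to do
    split = len(messages) - keep_recent        # index where "recent" begins
    older, recent = messages[:split], messages[split:]
    return [f"Summary of earlier conversation: {summarize(older)}"] + recent

def naive_summary(older: list[str]) -> str:
    # Placeholder: a real agent would ask the LLM to write this summary.
    return f"{len(older)} earlier messages, starting with: {older[0]}"

history = [
    "We chose Postgres.",
    "Use snake_case names.",
    "Draft the schema.",
    "Add an index on email.",
]
compacted = compact(history, keep_recent=2, summarize=naive_summary)
print(compacted)
```

Four messages go in and three come out: one summary line plus the two most recent messages, untouched. On a real conversation the saving is much larger, because hundreds of messages collapse into a paragraph.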

Memory: what the agent keeps between conversations

Context is what the agent is seeing right now. Memory is what it keeps across conversations, sessions, or projects. These are related but separate, and both matter.

There are a few different kinds of memory worth knowing about.

Working memory. This is just the current conversation. When you close the window, it's gone. Unless the agent has been set up to save something, this is all the LLM has to work with. That's why a chatbot with its memory feature turned off doesn't remember your name between sessions, and why a coding agent that understood your codebase yesterday doesn't remember it today unless something was saved.
Persistent memory. This is stuff the agent can look up again later. It lives outside the conversation, in a file, a database, a notes system, somewhere. When you start a new conversation, the agent pulls it back in. This is how Claude's memory feature works. This is how ChatGPT's "memories" feature works. This is how Claude Code can remember your project across sessions. The key thing is that someone, either you or the agent itself, has to decide what's worth storing and then pull it back in when it becomes relevant.
Built-in knowledge. The stuff the LLM already knows from being trained. It knows a huge amount about the world, how programming languages work, how to write a memo, how to structure an argument. But it's frozen at whatever date the training ended. It doesn't know what happened last week. It doesn't know the specific details of your company. It can be confidently wrong about recent things.
Tool memory. This one's a little abstract, but it matters for agents. The agent remembers how to do things through its tools. It doesn't need to remember how to search your files, because it has a search tool. It doesn't need to remember how to send an email, because it has an email tool. The tools themselves are a kind of memory, but it's memory about capabilities, not facts.

When people say "I wish the AI could just remember stuff," they're usually talking about the gap between working memory (which vanishes) and persistent memory (which requires setup). A lot of modern agents are filling that gap in different ways. Claude has a memory feature. ChatGPT has memories. Cursor and Claude Code read project files automatically. Custom GPTs can be given persistent instructions. The specifics differ, but the direction is the same: persistent memory is becoming standard in agent products.

The practical lesson is that if you want the agent to remember something, you need to put it somewhere it'll be brought back in. Hoping the agent will magically recall your preferences doesn't work. Giving it clear, re-usable setup instructions does.
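To make "put it somewhere it'll be brought back in" concrete, here's a small Python sketch of the pattern: notes saved to a file outside the conversation, then loaded at the top of the next session. This is an illustration, not how any particular product stores memory. The function names, the `memory.json` file, and the example notes are all made up.

```python
import json
import tempfile
from pathlib import Path

def remember(path: Path, note: str) -> None:
    """Append a note to a JSON file that lives outside any conversation."""
    notes = json.loads(path.read_text()) if path.exists() else []
    notes.append(note)
    path.write_text(json.dumps(notes, indent=2))

def start_session(path: Path, first_message: str) -> str:
    """Build the opening context for a new session: saved notes, then the request."""
    notes = json.loads(path.read_text()) if path.exists() else []
    memory = "\n".join(f"- {n}" for n in notes) or "- (nothing saved yet)"
    return f"Things to remember:\n{memory}\n\nRequest: {first_message}"

# Demo in a temporary folder so nothing is left behind on disk.
with tempfile.TemporaryDirectory() as folder:
    memory_file = Path(folder) / "memory.json"
    remember(memory_file, "Prefers short answers.")
    remember(memory_file, "Project deadline is June 1.")
    opening = start_session(memory_file, "Draft the status update.")

print(opening)
```

Both halves matter. Saving the note is only useful because `start_session` puts it back in front of the model, which is the step people tend to forget.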

The real skill: setting up your workspace

Here's where everything comes together.

The best AI work I've seen, and the best AI work I've personally done, doesn't come from typing clever prompts. It comes from sitting down before the real work starts and setting up a workspace the agent can operate in. You give the agent a role, a goal, constraints, and the stuff it needs to know about your specific situation. You decide how you want it to remember things as you go. You think about what you'll need it to still know in a week or a month.

This is the step most people skip, and it's the step that makes everything else work.

Let me show you what this looks like across a few different kinds of work.

A marketer planning a product launch might set up their agent by giving it the product one-pager, the target audience description, the brand voice guide, and the launch timeline before asking for any help. Once that's in place, every request after that (write me a landing page headline, draft a social post, critique this email) lands in an agent that actually understands the product and the voice. No re-explaining. No generic output.

A lawyer doing case research might give the agent the client's situation summary, the relevant case files, a list of statutes in play, and explicit instructions on how they want sources cited. From that point forward, the agent is working inside the frame of that specific case, not giving generic legal answers.

A researcher doing a literature review might hand the agent the research question, the methodology they're using, a list of papers already reviewed, and a template for how findings should be structured. Then every conversation produces output that actually fits into the bigger project.

A developer working on a codebase might drop a markdown file at the root of the project that explains the architecture, the conventions, what to check before making changes, and what to avoid. Coding agents like Claude Code and Cursor will read that file automatically at the start of every session, as long as it uses the file name and location the tool looks for.
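As a rough picture of what such a file might contain, here's a short Python sketch that renders one from a handful of bullet points. The section titles mirror the four things listed above. The function name and the example bullets are invented for illustration, and the file name you save it under depends on your tool (Claude Code, for instance, looks for a file called CLAUDE.md), so check the documentation for whatever you use.

```python
def render_project_notes(sections: dict[str, list[str]]) -> str:
    """Turn a dict of section title -> bullet points into markdown text."""
    parts = []
    for title, bullets in sections.items():
        parts.append(f"## {title}")
        parts.extend(f"- {b}" for b in bullets)
        parts.append("")                      # blank line between sections
    return "\n".join(parts).rstrip() + "\n"

notes = render_project_notes({
    "Architecture": ["API in src/api, background jobs in src/jobs"],
    "Conventions": ["Type hints on all public functions"],
    "Check before changing": ["Run the test suite"],
    "Avoid": ["Editing generated files by hand"],
})
print(notes)
```

In practice most people just write this file by hand. What matters is that it's short, specific to the project, and sitting where the agent will find it.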

Different jobs, same pattern. In every case, the person did the setup work first. The prompts they write afterward are simple. "Fix this." "Draft that." "Compare these two options." They don't need to be clever prompts because the agent already has everything it needs to give a good answer.

Here's the mental shift I'd encourage you to make: stop thinking about prompts as the place where your skill shows up. Start thinking about setup as the place where your skill shows up. The prompt is just the request. The setup is what makes the request land in an agent that can actually help you.

One more thing worth saying. Good setup also includes thinking about what you want the agent to keep track of as it works. If you're on a project that's going to span more than one session, tell the agent to keep notes. Ask it to summarize where things stand at the end of each session. Have it maintain a running file of decisions you've made. This is how you get an agent that can pick up a project after you've been away for a week without losing the plot. Without that, every new session starts from zero.

Tools, Skills, and MCP: briefly

Three terms you're going to hear more often, and I want to give you the quick version so they don't feel like jargon next time. All three are things that live at the agent layer, not the LLM layer. They're how agents extend what the LLM can do.

Tools are what turn a raw LLM into an agent in the first place. A tool is anything the agent can invoke to do something in the real world. Search the web. Read a file. Send an email. Run a calculation. When ChatGPT browses the internet for you, the agent is using a tool. When Claude Code edits a file, the agent is using a tool. The LLM decides when to use a tool based on the task, and the agent actually runs it.

MCP stands for Model Context Protocol. It's basically a universal plug for connecting agents to outside systems. Before MCP, if you wanted your agent to connect to your company's Slack, or your Google Drive, or your internal database, someone had to build a custom integration. MCP standardizes that. Now anyone can build an MCP server for a system, and any agent that speaks MCP can use it. Most major AI companies support MCP now, including Anthropic, OpenAI, Google, and Microsoft. It's becoming the way agents talk to the rest of your software.
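If you're curious what "speaking MCP" looks like on the wire, the messages are JSON-RPC 2.0, and asking a server to run a tool uses a method called `tools/call`. The sketch below builds one such request in Python. It reflects my reading of the public specification, so check the spec before relying on it. The tool name `search_files` and its arguments are hypothetical.

```python
import json

def mcp_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Build the JSON text of an MCP-style 'tools/call' request."""
    message = {
        "jsonrpc": "2.0",            # MCP messages follow JSON-RPC 2.0
        "id": request_id,            # lets the reply be matched to the request
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)

# Hypothetical tool and arguments, for illustration only.
request = mcp_tool_call(1, "search_files", {"query": "launch plan"})
print(request)
```

The shape is the same no matter what system sits on the other end, which is the whole point of a standard. Only the tool name and the arguments change.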

Skills are a newer thing. A Skill is a folder of instructions and scripts that teach an agent how to do a specific kind of task. Anthropic launched them in late 2025 and made them an open standard shortly after. The idea is that instead of cramming everything into your prompt every time, you write the knowledge down in a file once, and the agent loads it when it's relevant. There's a Skill for editing Excel files. There's a Skill for making PowerPoint decks. There's a Skill for writing in a specific brand voice. And you can write your own.
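To give a feel for how small a Skill can be, here's a Python sketch that renders the main file of one. As I understand the format, a Skill folder contains a file named SKILL.md that opens with a short header giving a `name` and a `description`, followed by plain instructions. The helper name `render_skill` and the brand-voice example are invented, so treat this as the general shape, not a template to copy blindly.

```python
def render_skill(name: str, description: str, instructions: list[str]) -> str:
    """Render the text of a SKILL.md file: a short header, then numbered steps."""
    header = f"---\nname: {name}\ndescription: {description}\n---\n"
    body = "\n".join(f"{i}. {step}" for i, step in enumerate(instructions, start=1))
    return f"{header}\n# {name}\n\n{body}\n"

# Hypothetical skill, for illustration only.
skill_text = render_skill(
    name="brand-voice",
    description="Rewrite drafts in our brand voice. Use when asked to edit marketing copy.",
    instructions=[
        "Use short sentences.",
        "Avoid jargon.",
        "End with one clear call to action.",
    ],
)
print(skill_text)
```

The description line does a lot of work. It's what tells the agent when this Skill is relevant, so the rest of the file only gets loaded when it's actually needed.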

The relationship between these three is actually pretty clean. Tools are the raw capabilities. MCP is how tools get connected. Skills are how you teach the agent to use those tools well for specific jobs.

You don't need to master any of this today. But knowing the words means you'll understand what's happening when your AI tool adds a new integration, or when your company rolls out an AI system that mentions MCP, or when a colleague sends you a Skill to try.

What to do with this

I want to leave you with something practical. Not a summary. Just a short list of things you can actually do starting now that I think will meaningfully change how you work with AI.

And here's something I want to say up front, because it's the thing most people miss. You don't have to figure all of this out on your own. The agent is the thing you're trying to set up, but the AI inside it is also the thing that can help you set it up. You can literally ask it to help you build your own context framework. You can ask it how it thinks the memory should be structured for your project. You can ask it to write the first draft of your setup instructions. This is probably the single most useful thing I've learned. The tool helps you use the tool.

With that in mind, here's what I'd actually do.

Set up before you start. Before your next real project with an agent, spend a few minutes on setup. Tell it who it is, what the goal is, what to avoid, and what context it should know. Don't just start typing.
Use the AI to build your setup. Describe your project to the agent and ask it how it thinks you should frame the context. Ask it what it would want to know to help you well. Ask it to propose a structure for how you'll keep track of decisions as you go. The first draft won't be perfect, but you'll get to a good setup way faster than you would from scratch.
Let the agent summarize its own work. Before you close a session, ask the agent to summarize what got done, what decisions were made, and what's left. Save that somewhere. Next session, paste it back in. That's how you get continuity across sessions without having to remember everything yourself.
Keep context clean. If a conversation is getting long or feels confused, start a new one. Give the new one a short summary of where things stood. You'll get better results than you would by pushing through.
Be explicit about memory. If you want the agent to remember something, write it down somewhere the system will pull it back in. A memory note. A project file. A pinned instruction. Don't assume it'll remember.
Pay attention to which model you're using. If something stops working when you switch tools, the underlying LLM might be the problem. Different LLMs behave differently. Try the same setup somewhere else. See what changes.
Treat setup as the skill. The clever prompt matters less than people think. The workspace you build for the agent matters more. And the good news is you don't have to build that workspace alone. The AI is right there, ready to help.

Here's why I think this matters, and I'll keep it to two sentences.

Understanding context and memory is what separates people who use AI as a fancy search bar from people who actually get work done with it. And when you're working on something real, something that matters, something you'll need to come back to in a month and keep building on, the difference between those two kinds of users is going to be the difference between AI helping you and AI frustrating you.