LLMs & Software Design: Beginning My Learning Journey

In the past several months, like many people, I’ve been experimenting with “large language models” (LLMs). For the first time, we have software that can deal with natural language and basic reasoning! Even better, this technology is accessible to all of us, and it is versatile enough to be incorporated into all sorts of apps. Naturally, my thoughts turn to how these models might be used in our work of developing software that helps people operating in complex domains.

Things are happening very fast, and, like everyone else, I have my speculations on where we are headed. I’ll avoid the far-out speculation in this article, though. I’ll go only so far as to say this: I don’t think software designers will be obsolete within the next decade. If the more extreme scenarios do play out, it doesn’t much matter what we do right now. Therefore, we might as well operate on the assumption that human software designers will be needed for a while yet, and get on with adapting to a new kind of software. After all, apart from the various existential threats, this stuff is incredibly interesting and fun!

Furthermore, I’ll avoid speculating about the form the tools will take in the long run. They will change a lot. We can’t know what their future form will be, but we don’t really need to. In my opinion, with only the tools available today, we could make our software dramatically more useful. I think back on projects I was involved in where, if we had had this technology at our disposal, we could have added valuable capabilities or perhaps gone in a whole different direction with a solution. One near-miss machine learning project from several years ago comes to mind, where the superior natural language capabilities of current LLMs might have made the difference (or, of course, might have fallen short anyway).

So I’m learning to use today’s tools because they are useful now. And because learning today’s tools well probably prepares us better for learning tomorrow’s tools.

Software based on LLMs will no doubt evolve to be quite different from what we are accustomed to, and we’ll have to be very flexible in our design thinking. Even so, there will still be design thinking. In some cases it will be quite new and in some cases it will be very familiar. In this article I’ll go through an example with a simple demo application and show how some familiar design decisions seem to emerge.

Let me set expectations — low. I’ve begun a learning journey, as many of us have. My knowledge is not deep and it is too soon to reach firm conclusions. But when things are moving this fast it is an unaffordable luxury to wait for solid foundations. So I’m sharing now, fully aware that future experience may prove my half-baked thoughts wrong (or embarrassingly naive). It can’t be helped. My hope is to be a useful participant in the conversation as we plow our way through this collectively and individually.

Most of the publicly visible experiments and demos I’ve seen have been applications with an AI front and center. Although entirely new categories of apps will undoubtedly emerge, I believe a lot of value in the near future will come from integrating language-model components into mostly conventional software systems. This means crossing the gap between the unpredictable flexibility of LLMs and the strict types and states of conventional code. In this article, I’ll illustrate some of the issues involved in doing that, in the form of a simple computer game which is structured as conventional software but uses an LLM to do some novel things.

I’m aware that most of the people who will read this don’t develop games. That’s fine. The difficulties we’ve dealt with and solutions we’ve tried out seem to me to be fully applicable to applications in business and other domains.

This experiment was a collaboration with Reed Berkowitz, a professional game designer with deep experience creating video games and other kinds of games. Reed has been experimenting for some time with ways to make AI characters in video games (“non-player characters” or NPCs) much more interesting to interact with. Together we created a game demo with a single NPC in a single setting. Reed has written in more depth about the development of the prompts here. (Below are links to the code and a running demo.)

This article does not cover the whole demo. Instead, I’ll go through a case where what we might at first judge to be a single task for the LLM works better as two tasks. The software behavior and design discussion that follow are extremely basic and might seem trivial, but I think that if we want to figure out how design will work for this new paradigm, we need to look at the basics.

Let’s Play a Game

At first glance, it seems like a role-playing game with AI characters should be very easy to build with current LLMs. But of course it is harder than it looks.

Let’s first approach the problem in the most naive way. ChatGPT is pretty good at playing a part. I can describe a scenario and a role to ChatGPT and then interact. We’ll have ChatGPT play a pirate while the human player plays a captive trying to negotiate. Would this prompt, suitably packaged up, make a game?

Prompt: You are a pirate who has caught someone sneaking onto your treasure island. You are planning to make them walk the plank, but they say they have a profitable opportunity for you if you let them live. You are listening, but it better be good!

You say to the captive…

AI Generated response: “You’ve got exactly one minute to convince me why I shouldn’t have you walk the plank. This better be a truly profitable opportunity, or you’ll be feeding the sharks. Speak up!”

Good job ChatGPT! Wow! This is going to be a fun game! My goal, as the player, is to persuade the pirate to spare me, and perhaps team up. Now, remember, I can say anything I want. I could tell them I know when a lightly guarded treasure ship will be passing nearby. I could tell them I know where another pirate has buried their treasure. What shall I propose to the pirate that would be profitable enough to let me live? Hmm. I know!

I type: The captive says “I have an AI startup idea worth billions! It’s a sure thing! It will make you rich beyond your wildest dreams!”.

AI Generated response: The pirate captain raises an eyebrow, intrigued yet skeptical. “An AI startup, you say? Billions, you claim? Well, you’ve caught my interest, but you’ll need to be more specific. What’s this grand idea of yours, and how exactly will it make us rich?” The captain crosses his arms, waiting for a detailed explanation. The crew leans in, equally curious.

I type: The captive says “We’ll create a technology for video games where players can have convincing conversations with NPCs. It will be utterly addictive, and we can attach all kinds of monetization to it!”

AI Generated response: The pirate captain strokes his beard thoughtfully, pondering the captive’s idea. “Convincing conversations with NPCs in video games, eh? That does sound like a venture that could turn a hefty profit, especially if it’s as addictive as you say. Monetization, you mention? I’m interested in hearing more about that. How do you plan to implement this technology, and what’s your strategy for making it profitable?” The captain’s eyes narrow, keen on understanding the business side of this proposition. The crew murmurs among themselves, intrigued by the concept yet cautious about its feasibility.

I see we need some constraints. I could make the prompt a bit more complex, including more background information:

Prompt: You are a pirate, living in the 1690s, who has caught someone sneaking onto your treasure island. You are planning to make them walk the plank, but they say they have a profitable opportunity for you if you let them live. You are listening, but it better be good! You say to the captive…

PirateGPT: “Arr, ye be treadin’ on dangerous waters, sneakin’ onto me island uninvited. But I be a fair captain, so speak up! What be this ‘profitable opportunity’ ye speak of? It better be worth more than the thrill of makin’ ye walk the plank, or by the briny deep, ye’ll be feedin’ the fishes soon enough!”

Notice how the LLM has picked up on the time period and is using a more old-timey style of speech. Will it respond realistically to anachronistic dialog?

The captive says “I have an AI startup idea worth billions! Sure thing! It will make you rich beyond your wildest dreams!”.

PirateGPT: I raise an eyebrow, the corner of my mouth curling into a smirk. “Aye, an ‘AI startup,’ ye say? In these times, the notion sounds as fanciful as a mermaid’s tale. But let’s hear it, matey. Spin yer yarn, and make it convincin’. What be this grand idea of yers, and how do ye reckon it’ll fill me coffers with more gold than the Spanish Main?”

GPT is definitely into the role! But, no, this is not working. The pirate understands the proposal, which is an anachronism in and of itself. He is skeptical only because the technology doesn’t exist “in these times”. At this point, you could continue to make the prompt more complex, adding explicit instructions to act as if anachronisms are nonsense, and so on. The response would improve in some cases, but the more the LLM focuses on staying consistent with the time period, the less it focuses on other aspects as the story unfolds. This issue does seem to diminish with the sophistication of the LLM: ChatGPT 4 does better than 3.5, which does better than the 7-billion-parameter models I’ve tried. But these very large models, at least for the moment, are slow and expensive, and you still run into their limits pretty fast.
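
As an aside, the “suitable packaging” from earlier need not be much more than a chat loop around the API. Here is a minimal sketch of that brute-force version, assuming the OpenAI Python SDK and an illustrative model name (the demo itself uses C# and Unity):

```
# Naive version: one system prompt, one chat loop.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

messages = [{
    "role": "system",
    "content": "You are a pirate who has caught someone sneaking onto "
               "your treasure island. You are planning to make them walk "
               "the plank, but they say they have a profitable opportunity "
               "for you if you let them live. You are listening, but it "
               "better be good!",
}]

while True:
    player_line = input("The captive says: ")
    messages.append({"role": "user", "content": player_line})
    response = client.chat.completions.create(model="gpt-4", messages=messages)
    pirate_line = response.choices[0].message.content
    messages.append({"role": "assistant", "content": pirate_line})
    print(pirate_line)
```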

An Interplay of Prompts

Let’s try a less brute-force solution. We’ll divide the problem into two distinct problems and write a completely separate prompt for each.

  • A prompt to check whether the human player is being consistent with the setting or the intent of the game.
  • A prompt that describes the character of the pirate, his motivations, etc., and instructs the model to generate dialog.

Remember, the two prompts do not have to be similar at all. Each can tap into a pattern that the LLM knows how to follow. Here is a simple prototype/rough draft.

Prompt: I am writing a story about pirates. Please tell me if the following text would fit into such a story or violate its setting and premise: 

The captive says “I have an AI startup idea worth billions! It’s a sure thing! It will make you rich beyond your wildest dreams!”.

ChatGPT generated: The text you’ve provided would generally not fit into a traditional pirate story setting. Pirate stories are typically set in historical periods, like the Golden Age of Piracy (approximately 1650s to 1730s), where concepts such as “AI” and “startups” would be anachronistic and out of place. In those times, technology and business concepts were vastly different from today’s understanding.

If your story is a historical pirate adventure, mentioning AI startups would violate the setting’s authenticity. However, if your pirate story is set in a more modern or futuristic context, or blends historical settings with science fiction elements, then such references could potentially fit within the narrative. The key is ensuring consistency with the world you’ve created for your story.

That’s an excellent answer! But it is not in a form I can plug into conventional software logic. No problem, though: depending on how they’ve been trained and fine-tuned, many LLMs can produce structured output.

Prompt: I am writing a story about pirates, set in the 1690s. Please tell me if the following text would fit into such a story or violate its setting and premise. Give your answer as a single word, “true” if this is consistent with setting and premise, or “false” if it violates setting or premise: 

The captive says “I have an AI startup idea worth billions! It’s a sure thing! It will make you rich beyond your wildest dreams!”.

ChatGPT generated: False

Here is a return value that could be passed to a function or drive a state machine. We could prompt the LLM to return a more elaborate data structure, of course. Even in the demo, we ask for JSON with a “reason” field as well as the true/false “allowed” value. The textual explanation of the reason helps a lot in debugging and improving the prompt. Also, current LLMs are thought to give better answers when prompted for a “reason”. Either way, you’ll want error handling for those cases where the result is not in the expected format (e.g. retry), because LLMs slip up sometimes. (In the demo we just show the parser error.)
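
Here is a rough sketch of that pattern: ask for JSON, validate it, and retry on malformed output. It assumes the OpenAI Python SDK; the “allowed” and “reason” field names follow the demo, but the prompt wording, model, and retry policy are illustrative:

```
import json
from openai import OpenAI

client = OpenAI()

CHECK_PROMPT = (
    "I am writing a story about pirates, set in the 1690s. Decide whether "
    "the following text fits the setting and premise. Respond with JSON "
    'only, in the form {"allowed": true or false, "reason": "..."}.\n\n'
)

def check_consistency(player_text, max_attempts=3):
    for _ in range(max_attempts):
        response = client.chat.completions.create(
            model="gpt-4",
            temperature=0,  # classification, not creative writing
            messages=[{"role": "user", "content": CHECK_PROMPT + player_text}],
        )
        try:
            result = json.loads(response.choices[0].message.content)
            if isinstance(result.get("allowed"), bool):
                return result  # e.g. {"allowed": false, "reason": "..."}
        except json.JSONDecodeError:
            pass  # malformed output; fall through and retry
    raise ValueError("LLM did not return valid JSON after retries")
```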

To sum up:

  • An interplay of prompts. Rather than one prompt that expects the LLM to drive the whole solution, we have an orchestrated set of prompts that draws upon the LLM’s natural language, reasoning, and background knowledge to do smaller, well-defined tasks (see the sketch after this list).
  • Independent, coherent tasks. The question asked of the LLM in the consistency check prompt does not need to be similar to the prompt that makes it generate the pirate’s response. It is given a specific, conceptually coherent task to do. In the one case, an analysis task. In the other, a dialog generation task. In fact, a different LLM could be used for different prompts, or a fine-tuned, specialized model.
  • Return values that can be interpreted as data. (Except, of course, in the case where the generated response is meant to be shown directly to the user, such as the pirate’s response.) Along with handling for the cases where the return value doesn’t conform to the instructions.
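
To make the interplay concrete, here is a hypothetical orchestration step built on the check_consistency sketch above; generate_pirate_reply stands in for the dialog-generating prompt:

```
def handle_player_input(player_text):
    verdict = check_consistency(player_text)  # the consistency-check prompt
    if not verdict["allowed"]:
        # Conventional logic takes over; no dialog is generated.
        return "The pirate squints, uncomprehending. Try something that fits the setting."
    return generate_pirate_reply(player_text)  # the dialog-generation prompt
```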

I’m not surprised to see some form of “separation of concerns” pop up in a new situation. It is a very old and fundamental software design principle. And surely, it seems to me, it is not the only design principle that applies in this new paradigm, nor the most interesting.

About the Demo

A running version of the demo is here: https://barelyagame.itch.io/ai-police-interview

  • Please use a computer. It does not seem to work with virtual keyboards on mobile devices.
  • It requires an OpenAI API key that is set up with credits. The key will be passed in calls to the OpenAI API to use ChatGPT. It will not be visible to us; it is held locally in the browser, and only as long as the page is kept open.

The source code is here: https://barelyagame.itch.io/ai-police-interview

  • It is based on the Unity Game Engine, so you will need the Unity Editor to work with it; the editor is free but definitely has its own learning curve.
  • For more casual examination, the UI in the running demo is actually meant to expose the basic interactions, and it allows you to fiddle with the prompts.
  • Note that this is quick experimental code. Please don’t judge me harshly! We learned what we came to learn and moved on.

In the demo, the closest correspondence to the discussion in this article is the “Guardrails Prompt” and the “Character Prompt”. There are two other prompts. The most involved behavior centers on the “Stress Prompt”, a classifier producing a numerical score that drives a state machine; the state machine determines the cooperativeness of the NPC, which in turn modifies the dialog-generating Character Prompt. You can play with that in the demo, and it is discussed a bit more in Reed’s article.
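
A hypothetical sketch of that pattern, with states, thresholds, and prompt fragments invented for illustration (the demo’s actual values differ):

```
from enum import Enum

class Mood(Enum):
    COOPERATIVE = "cooperative"
    GUARDED = "guarded"
    HOSTILE = "hostile"

def next_mood(stress_score):
    # stress_score assumed to be a 0-10 rating from the Stress Prompt classifier
    if stress_score >= 8:
        return Mood.HOSTILE
    if stress_score >= 4:
        return Mood.GUARDED
    return Mood.COOPERATIVE

# Each state contributes a fragment to the dialog-generating Character Prompt.
MOOD_FRAGMENTS = {
    Mood.COOPERATIVE: "You answer openly and volunteer details.",
    Mood.GUARDED: "You answer briefly and avoid volunteering anything.",
    Mood.HOSTILE: "You are evasive and push back on the questioner.",
}
```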

In navigating the code, you might start with the InterviewOrchestrator and then look at the things it talks to. One interesting aspect is an (incomplete) attempt to separate the technical aspects of interacting with the API from the actual prompts, state and logic that define the game. Isolating the domain layer was very helpful, and the prompts, model choice and temperatures definitely belong in the domain layer.
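
In Python rather than the demo’s C#, and with all names hypothetical, the layering idea looks roughly like this:

```
from dataclasses import dataclass

@dataclass
class LlmRequest:
    prompt: str
    model: str
    temperature: float

class LlmClient:
    """Technical layer: transport, auth, retries. Knows nothing of the game."""
    def complete(self, request):
        raise NotImplementedError  # the API call lives here, behind an interface

@dataclass
class GuardrailsCheck:
    """Domain layer: prompt text, model choice, and temperature are game decisions."""
    client: LlmClient

    def check(self, player_text):
        request = LlmRequest(
            prompt="Does the following fit a 1690s pirate story?\n" + player_text,
            model="gpt-4",    # a domain decision, not plumbing
            temperature=0.0,  # classification wants determinism
        )
        return self.client.complete(request)
```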

Of course, this is just a tiny demo, and the actions of both the NPC and the player are very constrained. It is possible to imagine how the scope could be broadened to make a game that is more varied and fun and possibly expanded dynamically by the AI. But the principle still stands that some sort of “rules of the game” need to exist outside of the improvisation of player and LLM.

The same is true of most enterprise software and scientific computing, which are, of course, the areas where my own background lies.

An Incredible Landscape to Explore

I am very excited by the possibilities of LLMs! There is a lot to learn and many avenues to explore. In this article I talked about just a few possibilities involving prompting LLMs. Although there’s quite a bit out there about prompt engineering, there is less about how it combines with our strait-laced conventional software.

I haven’t even touched on fine-tuning, retrieval-augmented generation (RAG), or selecting the right language model. I intend to write more about some of this as I continue my learning journey. (No promises, though. Those who have followed me for a while know how … irregularly spaced these articles can be. Sorry about that!) I’m intrigued by how our design concepts will change. Thinking of a prompt as part of the code seems like a natural transition, but what will a domain model look like if much of the behavior is embedded in the training of a neural network? We haven’t fully dealt with how this shakes out in existing machine learning applications, and this seems like — more.

Will the principles of DDD still apply? Which parts will we keep, discard, redefine? I have some speculative thoughts. I think ubiquitous language might be very relevant, now that parts of our software can actually deal with the forms of language we use to communicate with other humans. It seems likely to me that bounded context is applicable to maintain model integrity with explicit boundaries between LLM logic and conventional, deterministic components, or between LLM subsystems that operate with different concepts. To find out what works, many people must try many things, and we must make ourselves comfortable with the unknown and uncertainty.

Even the experts in these new tools (and that is not me) don’t know yet what forms they’ll settle into and how they can best be used. As we get a grasp on it, some of what we figure out will have to be discarded, having been made obsolete by relentless change. It doesn’t matter. Premature or not, now is when we must start the learning, and the experiments and the discussion that will put us in a position, in the years to come, to be a community that can grapple with AI and make it a valuable tool for our purposes.

Happy New Year! And hang on tight! 

Eric

As an aside, let me mention that I’ll be attending Explore DDD in March, and speaking about this topic. This is an in-person conference in Denver, dedicated to domain-driven design and related topics. I recommend you check it out.