Based on my previous post, Using LLMs in Gameplay, I started developing a prototype. The goal is to build a simple game environment where LLMs can interact with the world and each other.
The first goal was to get an LLM connected to a Unity NavMesh agent so it could tell the agent where to walk. To do that, I started with the Synty Polygon Prototype and Synty Animation packs I had left over from a previous Unity project; they let me quickly set up an animated NavMesh agent.

Next, I needed to integrate the OpenAI APIs to let an LLM control the character. I initially tried the official OpenAI C# library, but that didn't work well. One issue is that Unity has no native support for NuGet, the .NET package manager. With NuGetForUnity, a Unity plugin, I could install the OpenAI library, but it wasn't recognized correctly in the project; various sources suggested this is because Unity uses an older .NET Framework version. I tried a few workarounds, then switched to an unofficial OpenAI API library made specifically for Unity. It works perfectly and appears to support the latest OpenAI features, which matters to me because I don't want to be stuck on, for example, the now-deprecated Completions API.
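For context, the Unity side of "tell the agent where to walk" is small: once a destination is known, the NavMeshAgent does the pathfinding. A minimal sketch of that piece (the component and method names are illustrative, not the actual project code):

```csharp
using UnityEngine;
using UnityEngine.AI;

// Minimal movement component: whatever the LLM decides, it ultimately
// boils down to handing the NavMeshAgent a destination.
[RequireComponent(typeof(NavMeshAgent))]
public class AgentMotor : MonoBehaviour
{
    private NavMeshAgent _agent;

    private void Awake() => _agent = GetComponent<NavMeshAgent>();

    // Called by the LLM-facing layer once a target has been chosen.
    public void WalkTo(Transform target) => _agent.SetDestination(target.position);
}
```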
Banana 🍌 vs. Gem 💎
Together with Chatty, I designed simple AgentHarness and AgentIntelligence scripts to let the LLM control the character. AgentHarness is the layer through which AgentIntelligence interacts with the world; for this first demo, I implemented a walkTo tool that can target any annotated game object in the scene. AgentIntelligence is where the OpenAI API calls happen. In this small scene, it tells the LLM there are two objects, a banana and a gem, and asks it to decide which one to walk to and then call the walkTo tool.
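To make the flow concrete, here is a rough sketch of how such a walkTo tool could be wired up. I'm deliberately not reproducing the OpenAI Unity library's own types: the tool definition below is the standard OpenAI function-calling schema, and the dispatch assumes the library hands back the tool name and its JSON arguments as strings. AgentMotor is the hypothetical movement component from the sketch above; the rest is illustrative as well.

```csharp
using System.Collections.Generic;
using UnityEngine;

// Sketch of the split: AgentHarness owns the scene-facing tools,
// AgentIntelligence turns the model's tool call into a harness call.
public class AgentHarness : MonoBehaviour
{
    // Annotated objects the LLM is allowed to target (banana, gem, ...).
    [SerializeField] private List<GameObject> annotatedObjects = new List<GameObject>();
    [SerializeField] private AgentMotor motor; // movement wrapper from the earlier sketch

    // The walkTo tool: resolve the name the model picked and start walking.
    public bool WalkTo(string targetName)
    {
        var target = annotatedObjects.Find(o => o.name == targetName);
        if (target == null) return false;
        motor.WalkTo(target.transform);
        return true;
    }
}

public class AgentIntelligence : MonoBehaviour
{
    [SerializeField] private AgentHarness harness;

    // Tool definition sent along with the chat request
    // (standard OpenAI function-calling schema).
    private const string WalkToTool = @"{
      ""type"": ""function"",
      ""function"": {
        ""name"": ""walkTo"",
        ""description"": ""Walk the character to one of the listed objects."",
        ""parameters"": {
          ""type"": ""object"",
          ""properties"": {
            ""target"": { ""type"": ""string"", ""description"": ""Name of the object to walk to."" }
          },
          ""required"": [""target""]
        }
      }
    }";

    [System.Serializable]
    private class WalkToArgs { public string target; }

    // Invoked once the model's response contains a tool call.
    public void HandleToolCall(string toolName, string argumentsJson)
    {
        if (toolName != "walkTo") return;
        var args = JsonUtility.FromJson<WalkToArgs>(argumentsJson);
        harness.WalkTo(args.target);
    }
}
```

The prompt then simply lists the annotated objects, and the model answers with a walkTo tool call naming its pick.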
Interestingly, the LLM initially always chose to walk to the gem, assuming it was more valuable and would grant some kind of power-up. Only after I changed the banana’s description to a “golden banana” did its choices become more evenly split.
Approximating vision
Next, I wanted to give the LLM a basic vision capability. It should only know about game objects it can currently see or has just seen, ideally with some decay: while moving, the agent would spot more objects and remember them for a short while.
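The decay part can be as simple as a timestamp per sighting. A small sketch under that assumption (AgentMemory, its fields, and the retention window are hypothetical, not the actual script):

```csharp
using System.Collections.Generic;
using System.Linq;
using UnityEngine;

// Short-term memory: each sighting refreshes a timestamp, and only objects
// seen within the retention window count as currently known.
public class AgentMemory : MonoBehaviour
{
    [SerializeField] private float retentionSeconds = 10f; // assumed decay window

    private readonly Dictionary<GameObject, float> _lastSeen = new Dictionary<GameObject, float>();

    // Called by the vision component whenever an object is seen.
    public void NoticeObject(GameObject obj) => _lastSeen[obj] = Time.time;

    // Everything the agent still "knows about" right now.
    public IEnumerable<GameObject> KnownObjects()
    {
        return _lastSeen
            .Where(kv => kv.Key != null && Time.time - kv.Value <= retentionSeconds)
            .Select(kv => kv.Key);
    }
}
```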
For the seeing part, I introduced an AgentVision script that gathers all annotated game objects within a sphere around the character and then raycasts to determine whether each one is visible or occluded by environment geometry. For this, I separated game objects into "Interactable" and "Environment" layers. It worked fine for a coffeemaker on a table but failed for a large desk standing in the environment. After adding some debugging gizmos, I realized I'd hit a classic game-dev issue: the visibility ray originated at the character's pivot at its feet (0/0/0), so the floor occluded it, and it targeted the object's pivot (also 0/0/0), which often lies below the floor. The quick fix was to cast rays from eye height and to use multiple rays targeting the object's corners and center, so even a partly occluded object can still be seen by the agent.
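In sketch form, the fixed visibility check could look like this. The "Interactable" and "Environment" layers are the ones described above; the eye height, view radius, and class layout are assumptions for illustration:

```csharp
using UnityEngine;

// Visibility check: rays start at eye height (not at the pivot on the floor)
// and target several points on the object's bounds, so a partly occluded
// object still counts as visible.
public class AgentVision : MonoBehaviour
{
    [SerializeField] private float eyeHeight = 1.7f;      // assumed value
    [SerializeField] private float viewRadius = 10f;      // assumed value
    [SerializeField] private LayerMask interactableMask;  // "Interactable" layer
    [SerializeField] private LayerMask environmentMask;   // "Environment" layer

    public bool CanSee(Collider target)
    {
        Vector3 eye = transform.position + Vector3.up * eyeHeight;
        Bounds b = target.bounds;

        // Sample the center plus the corners of the bounding box.
        Vector3[] samplePoints =
        {
            b.center,
            new Vector3(b.min.x, b.min.y, b.min.z), new Vector3(b.min.x, b.min.y, b.max.z),
            new Vector3(b.min.x, b.max.y, b.min.z), new Vector3(b.min.x, b.max.y, b.max.z),
            new Vector3(b.max.x, b.min.y, b.min.z), new Vector3(b.max.x, b.min.y, b.max.z),
            new Vector3(b.max.x, b.max.y, b.min.z), new Vector3(b.max.x, b.max.y, b.max.z),
        };

        foreach (var point in samplePoints)
        {
            // Linecast against environment geometry only: if nothing blocks
            // the line from eye to sample point, the object is visible.
            if (!Physics.Linecast(eye, point, environmentMask))
                return true;
        }
        return false;
    }

    // Gather annotated objects in range, then filter by line of sight.
    public Collider[] VisibleInteractables()
    {
        var inRange = Physics.OverlapSphere(transform.position, viewRadius, interactableMask);
        return System.Array.FindAll(inRange, CanSee);
    }
}
```

A single unobstructed line to any of the sample points is enough to count the object as visible, which is what lets partly occluded objects through.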