llm.rb is the most capable runtime for building AI systems in Ruby.

llm.rb is designed for Ruby, and although it works great in Rails, it is not tightly
coupled to it. It runs on the standard library by default (zero dependencies) and
loads optional pieces only when needed. It includes built-in ActiveRecord support
through acts_as_llm and acts_as_agent, plus built-in Sequel support through
plugin :llm and plugin :agent. It is built for engineers who want control over
long-lived, tool-capable, stateful AI workflows rather than just
request/response helpers.

It provides one runtime for providers, agents, tools, skills, MCP servers, streaming,
schemas, files, and persisted state, so real systems can be built out of one coherent
execution model instead of a pile of adapters.

Want to see some code? Jump to the examples section.
Want to see an agentic framework built on top of llm.rb? Check out general-intelligence-systems/brute.
Want a taste of what llm.rb can build? See the screencast.

LLM::Context is the execution boundary in llm.rb.

It holds:

  • message history
  • tool state
  • schemas
  • streaming configuration
  • usage and cost tracking

Instead of switching abstractions for each feature, everything builds on the
same context object.

The following list is not exhaustive, but it covers a lot of ground.

Skills are reusable, directory-backed capabilities loaded from SKILL.md.
They run through the same runtime as tools, agents, and MCP. They do not
require a second orchestration layer or a parallel abstraction. If you've
used Claude or Codex, you know the general idea of skills, and llm.rb
supports that same concept with the same execution model as the rest of the
system.

In llm.rb, a skill has frontmatter and instructions. The frontmatter can
define name, description, and tools. The tools entries are tool names,
and each name must resolve to a subclass of
LLM::Tool that is already
loaded in the runtime.

If you want Claude/Codex-like skills that can drive scripts or shell
commands, you would typically pair the skill with a tool that can execute
system commands.
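The command-execution core such a tool would wrap can be sketched with the
stdlib alone. The LLM::Tool subclass itself is omitted here, and the return
shape is an illustration, not llm.rb's contract:

```ruby
require "open3"

# Stdlib sketch of the execution core a shell-command tool would wrap.
# The surrounding LLM::Tool subclass is omitted; the return shape here
# is an illustration, not llm.rb's contract.
def run_command(*argv)
  stdout, stderr, status = Open3.capture3(*argv)
  { ok: status.success?, stdout: stdout, stderr: stderr }
end
```

Passing argv as separate arguments avoids shell interpolation of
model-provided input.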

```
---
name: release
description: Prepare a release
tools:
  - search_docs
  - git
---

Review the release state, summarize what changed, and prepare the release.
```
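Loading a SKILL.md comes down to splitting YAML frontmatter from the
instruction body. A minimal stdlib sketch of that split (illustrative only,
not llm.rb's actual loader):

```ruby
require "yaml"

# Split "---\n<yaml>\n---\n<body>" into frontmatter and instructions.
# Illustrative parser, not llm.rb's actual SKILL.md loader.
def parse_skill(text)
  if text =~ /\A---\s*\n(.*?)\n---\s*\n(.*)\z/m
    [YAML.safe_load($1), $2.strip]
  else
    [{}, text.strip]
  end
end
```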

```
class Agent < LLM::Agent
  model "gpt-5.4-mini"
  skills "./skills/release"
end

llm = LLM.openai(key: ENV["KEY"])
Agent.new(llm, stream: $stdout).talk("Let's prepare the release!")
```

Any ActiveRecord model or Sequel model can become an agent-capable model,
including existing business and domain models, without forcing you into a
separate agent table or a second persistence layer.

acts_as_agent extends a model with agent capabilities: it wraps an
LLM::Agent, so you get the same runtime surface as
LLM::Agent, plus persistence through a text,
JSON, or JSONB-backed column on the same table.

```
class Ticket < ApplicationRecord
  acts_as_agent provider: :set_provider
  model "gpt-5.4-mini"
  instructions "You are a support assistant."

  private

  def set_provider
    { key: ENV["#{provider.upcase}_SECRET"], persistent: true }
  end
end
```

llm.rb is especially strong when you want to build agentic systems in a Ruby
way. Agents can be ordinary application models with state, associations,
tools, skills, and persistence, which makes it much easier to build systems
where users have their own specialized agents instead of treating agents as
something outside the app.

That pattern works so well in llm.rb because
LLM::Agent,
acts_as_agent, plugin :agent, skills, tools, and persisted runtime state
all fit the same execution model. The runtime stays small enough that the
main design work becomes application design, not orchestration glue.

For a concrete example, see
How to build a platform of agents.

The same runtime can be serialized to disk, restored later, persisted in JSON
or JSONB-backed ORM columns, resumed across process boundaries, or shared
across long-lived workflows.
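The underlying idea is a plain serialize/deserialize round trip of runtime
state. A stdlib sketch of that round trip (the state shape here is
illustrative, not llm.rb's on-disk format):

```ruby
require "json"

# Illustrative round trip; the real context.json format is llm.rb's own.
state = { model: "gpt-5.4-mini", messages: [{ role: "user", content: "hi" }] }
File.write("context.json", JSON.generate(state))
restored = JSON.parse(File.read("context.json"), symbolize_names: true)
```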

```
ctx = LLM::Context.new(llm)
ctx.talk("Remember that my favorite language is Ruby.")
ctx.save(path: "context.json")
```

Long-lived contexts can compact older history into a summary instead of
growing forever. Compaction is built into LLM::Context
through LLM::Compactor,
and when a stream is present it emits on_compaction and
on_compaction_finish through LLM::Stream.
The compactor can also use a different model from the main context, which is
useful when you want summarization to run on a cheaper or faster model.

```
ctx = LLM::Context.new(
  llm,
  compactor: {
    message_threshold: 200,
    retention_window: 8,
    model: "gpt-5.4-mini"
  }
)
```
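What message_threshold and retention_window mean can be sketched in plain
Ruby. This illustrates the semantics only, not LLM::Compactor's
implementation:

```ruby
# Once history grows past message_threshold, older messages collapse
# into one summary and only retention_window recent messages survive.
# Illustrative sketch, not LLM::Compactor's actual code.
def compact(messages, message_threshold:, retention_window:, summarize:)
  return messages if messages.size <= message_threshold
  older, recent = messages[0...-retention_window], messages[-retention_window..]
  [summarize.call(older), *recent]
end
```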

LLM::Stream is not just for printing tokens. It supports on_content,
on_reasoning_content, on_tool_call, on_tool_return, on_compaction,
and on_compaction_finish, which means visible output, reasoning output, tool
execution, and context compaction can all be driven through the same
execution path.

```
class Stream < LLM::Stream
  def on_tool_call(tool, error)
    queue << tool.spawn(:thread)
  end

  def on_tool_return(tool, result)
    puts(result.value)
  end
end
```

Tool execution can run sequentially with :call or concurrently through
:thread, :task, :fiber, and experimental :ractor, without rewriting
your tool layer.
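The difference between :call and a concurrent mode like :thread can be
illustrated with plain Ruby lambdas standing in for tools (the lambdas and
timings are stand-ins, not llm.rb internals):

```ruby
# Two I/O-bound "tools"; :call-style runs them back to back (~0.2s),
# :thread-style overlaps them (~0.1s). Stand-ins, not llm.rb internals.
tools = [-> { sleep 0.1; :weather }, -> { sleep 0.1; :news }]

sequential = tools.map(&:call)
threaded   = tools.map { |t| Thread.new(&t) }.map(&:value)
```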

```
class Agent < LLM::Agent
  model "gpt-5.4-mini"
  tools FetchWeather, FetchNews, FetchStock
  concurrency :thread
end
```

Remote MCP tools and prompts are not bolted on as a separate integration
stack. They adapt into the same tool and prompt path used by local tools,
skills, contexts, and agents.

```
mcp = LLM::MCP.http(url: "https://api.githubcopilot.com/mcp/").persistent
begin
  mcp.start
  ctx = LLM::Context.new(llm, tools: mcp.tools)
ensure
  mcp.stop
end
```

Cancellation is one of the harder problems to get right, and while llm.rb
makes it possible, it still requires careful engineering to use effectively.
The point is that in-flight provider work can be stopped cleanly through the
same runtime, and the model llm.rb uses is directly inspired by Go's context
package. In fact, llm.rb is heavily inspired by Go, but with a Ruby twist.

```
ctx = LLM::Context.new(llm, stream: $stdout)

worker = Thread.new do
  ctx.talk("Write a very long essay about network protocols.")
rescue LLM::Interrupt
  puts "Request was interrupted!"
end

STDIN.getch
ctx.interrupt!
worker.join
```

  • A system layer, not just an API wrapper
    Put providers, tools, MCP servers, and application APIs behind one runtime
    model instead of stitching them together by hand.
  • Contexts are central
    Keep history, tools, schema, usage, persistence, and execution state in one
    place instead of spreading them across your app.
  • Contexts can be serialized
    Save and restore live state for jobs, databases, retries, or long-running
    workflows.

  • Streaming and tool execution work together
    Start tool work while output is still streaming so you can hide latency
    instead of waiting for turns to finish.

  • Agents auto-manage tool execution
    Use LLM::Agent when you want the same stateful runtime surface as
    LLM::Context, but with tool loops executed automatically according to a
    configured concurrency mode such as :call, :thread, :task, :fiber,
    or experimental :ractor support for class-based tools. MCP tools are not
    supported by the current :ractor mode, but mixed tool sets can still
    route MCP tools and local tools through different strategies at runtime.
  • Tool calls have an explicit lifecycle
    A tool call can be executed, cancelled through
    LLM::Function#cancel,
    or left unresolved for manual handling, but the normal runtime contract is
    still that a model-issued tool request is answered with a tool return.
  • Requests can be interrupted cleanly
    Stop in-flight provider work through the same runtime instead of treating
    cancellation as a separate concern.
    LLM::Context#interrupt!
    is inspired by Go's context cancellation model.
  • Concurrency is a first-class feature
    Use threads, fibers, async tasks, or experimental ractors without
    rewriting your tool layer. The current :ractor mode is for class-based
    tools and does not support MCP tools, but mixed workloads can branch on
    tool.mcp? and choose a supported strategy per tool. :ractor is
    especially useful for CPU-bound tools, while :task, :fiber, or
    :thread may be a better fit for I/O-bound work.
  • Advanced workloads are built in, not bolted on
    Streaming, concurrent tool execution, persistence, tracing, and MCP support
    all fit the same runtime model.

  • MCP is built in
    Connect to MCP servers over stdio or HTTP without bolting on a separate
    integration stack.

  • ActiveRecord and Sequel persistence are built in
    llm.rb includes built-in ActiveRecord support through acts_as_llm and
    acts_as_agent, plus built-in Sequel support through plugin :llm and
    plugin :agent.
    Use acts_as_llm when you want to wrap LLM::Context, acts_as_agent
    when you want to wrap LLM::Agent, plugin :llm when you want a
    LLM::Context on a Sequel model, or plugin :agent when you want an
    LLM::Agent. These integrations support provider: and context: hooks,
    plus format: :string for text columns or format: :jsonb for native
    PostgreSQL JSON storage when ORM JSON typecasting support is enabled.
  • ORM models can become persistent agents
    Turn an ActiveRecord or Sequel model into an agent-capable model with
    built-in persistence, stored on the same table, with jsonb support when
    your ORM and database support native JSON columns.
  • Persistent HTTP pooling is shared process-wide
    When enabled, separate
    LLM::Provider
    instances with the same endpoint settings can share one persistent
    pool, and separate HTTP
    LLM::MCP
    instances can do the same, instead of each object creating its own
    isolated per-instance transport.
  • OpenAI-compatible gateways are supported
    Target OpenAI-compatible services such as DeepInfra and OpenRouter, as well
    as proxies and self-hosted servers, with host: and base_path: when they
    preserve OpenAI request shapes but change the API root path.
  • Provider support is broad
    Work with OpenAI, OpenAI-compatible endpoints, Anthropic, Google, DeepSeek,
    Z.ai, xAI, llama.cpp, and Ollama through the same runtime.
  • Tools are explicit
    Run local tools, provider-native tools, and MCP tools through the same path
    with fewer special cases.
  • Skills become bounded runtime capabilities
    Point llm.rb at directories with a SKILL.md, resolve named tools through
    the registry, and adapt each skill into its own callable capability through
    the normal runtime. Unlike a generic skill-discovery tool, each skill runs
    with its own bounded tool subset and behaves like a task-scoped sub-agent.
  • Providers are normalized, not flattened
    Share one API surface across providers without losing access to provider-
    specific capabilities where they matter.
  • Responses keep a uniform shape
    Provider calls return
    LLM::Response
    objects as a common base shape, then extend them with endpoint- or
    provider-specific behavior when needed.
  • Low-level access is still there
    Normalized responses still keep the raw Net::HTTPResponse available when
    you need headers, status, or other HTTP details.
  • Local model metadata is included
    Model capabilities, pricing, and limits are available locally without extra
    API calls.

  • Runs on the stdlib
    Start with Ruby's standard library and add extra dependencies only when you
    need them.

  • It is highly pluggable
    Add tools, swap providers, change JSON backends, plug in tracing, or layer
    internal APIs and MCP servers into the same execution path.
  • It scales from scripts to long-lived systems
    The same primitives work for one-off scripts, background jobs, and more
    demanding application workloads with streaming, persistence, and tracing.
  • Thread boundaries are clear
    Providers are shareable. Contexts are stateful and should stay thread-local.
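The mixed-workload branching described above, where MCP tools and local tools
route through different strategies, amounts to choosing an execution strategy
per tool. A plain-Ruby sketch of the dispatch idea (the Tool struct and both
branches here are stand-ins; in llm.rb the predicate would be tool.mcp?):

```ruby
# Stand-in tools; in llm.rb the predicate would be tool.mcp?.
Tool = Struct.new(:mcp, :run)
tools = [Tool.new(true, -> { :remote_result }), Tool.new(false, -> { :local_result })]

# MCP tools stay on threads; local class-based tools could use another
# strategy (e.g. ractors). Here both branches are plain Ruby.
results = tools.map do |tool|
  tool.mcp ? Thread.new(&tool.run).value : tool.run.call
end
```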

Execution:

  • Chat & Contexts — stateless and stateful interactions with persistence
  • Context Serialization — save and restore state across processes or time
  • Streaming — visible output, reasoning output, tool-call events
  • Request Interruption — stop in-flight provider work cleanly
  • Concurrent Execution — threads, async tasks, and fibers

Runtime Building Blocks:

  • Tool Calling — class-based tools and closure-based functions
  • Run Tools While Streaming — overlap model output with tool latency
  • Agents — reusable assistants with tool auto-execution
  • Skills — directory-backed capabilities loaded from SKILL.md
  • MCP Support — stdio and HTTP MCP clients with prompt and tool support
  • Context Compaction — summarize older history in long-lived contexts

Data and Structure:

  • Structured Outputs — JSON Schema-based responses
  • Responses API — stateful response workflows where providers support them
  • Multimodal Inputs — text, images, audio, documents, URLs
  • Audio — speech generation, transcription, translation
  • Images — generation and editing
  • Files API — upload and reference files in prompts
  • Embeddings — vector generation for search and RAG
  • Vector Stores — retrieval workflows

Operations:

  • Cost Tracking — local cost estimation without extra API calls
  • Observability — tracing, logging, telemetry
  • Model Registry — local metadata for capabilities, limits, pricing
  • Persistent HTTP — optional connection pooling for providers and MCP

This example uses LLM::Context directly for an interactive REPL.
See the deepdive (web) or deepdive (markdown) for more examples.

```
require "llm"

llm = LLM.openai(key: ENV["KEY"])
ctx = LLM::Context.new(llm, stream: $stdout)

loop do
  print "> "
  ctx.talk(STDIN.gets || break)
  puts
end
```

This example uses LLM::Agent directly and lets the agent manage tool execution.
See the deepdive (web) or deepdive (markdown) for more examples.

```
require "llm"

class ShellAgent < LLM::Agent
  model "gpt-5.4-mini"
  instructions "You are a Linux system assistant."
  tools Shell
  concurrency :thread
end

llm = LLM.openai(key: ENV["KEY"])
agent = ShellAgent.new(llm)
puts agent.talk("What time is it on this system?").content
```

This example uses LLM::Agent with directory-backed skills so SKILL.md capabilities run through the normal tool path. In llm.rb, a skill is exposed as a tool in the runtime. When that tool is called, it spawns a sub-agent with relevant context plus the instructions and tool subset declared in its own SKILL.md.
See the deepdive (web) or deepdive (markdown) for more examples.

Each skill runs only with the tools declared in its own frontmatter.

```
require "llm"

class Agent < LLM::Agent
  model "gpt-5.4-mini"
  instructions "You are a concise release assistant."
  skills "./skills/release", "./skills/review"
end

llm = LLM.openai(key: ENV["KEY"])
puts Agent.new(llm).talk("Use the review skill.").content
```

This example uses LLM::Stream directly so visible output and tool execution can happen together.
See the deepdive (web) or deepdive (markdown) for more examples.

```
require "llm"

class Stream < LLM::Stream
  def on_content(content)
    $stdout << content
  end

  def on_tool_call(tool, error)
    return queue << error if error
    $stdout << "\nRunning tool #{tool.name}...\n"
    queue << tool.spawn(:thread)
  end

  def on_tool_return(tool, result)
    if result.error?
      $stdout << "Tool #{tool.name} failed\n"
    else
      $stdout << "Finished tool #{tool.name}\n"
    end
  end
end

llm = LLM.openai(key: ENV["KEY"])
ctx = LLM::Context.new(llm, stream: Stream.new, tools: [System])

ctx.talk("Run date and uname -a.")
ctx.talk(ctx.wait(:thread)) while ctx.functions.any?
```

This example uses LLM::Context,
LLM::Compactor, and
LLM::Stream together so
long-lived contexts can summarize older history and expose the lifecycle
through stream hooks. This approach is inspired by General Intelligence
Systems' Brute. The
compactor can also use its own model: if you want summarization to run on a
different model from the main context.
See the deepdive (web) or deepdive (markdown) for more examples.

```
require "llm"

class Stream < LLM::Stream
  def on_compaction(ctx, compactor)
    puts "Compacting #{ctx.messages.size} messages..."
  end

  def on_compaction_finish(ctx, compactor)
    puts "Compacted to #{ctx.messages.size} messages."
  end
end

llm = LLM.openai(key: ENV["KEY"])
ctx = LLM::Context.new(
  llm,
  stream: Stream.new,
  compactor: {
    message_threshold: 200,
    retention_window: 8,
    model: "gpt-5.4-mini"
  }
)
```

This example uses LLM::Stream with the OpenAI Responses API so reasoning output is streamed separately from visible assistant output. See the deepdive (web) or deepdive (markdown) for more examples.

```
require "llm"

class Stream < LLM::Stream
  def on_content(content)
    $stdout << content
  end

  def on_reasoning_content(content)
    $stderr << content
  end
end

llm = LLM.openai(key: ENV["KEY"])
ctx = LLM::Context.new(
  llm,
  model: "gpt-5.4-mini",
  mode: :responses,
  reasoning: {effort: "medium"},
  stream: Stream.new
)
ctx.talk("Solve 17 * 19 and show your work.")
```

Need to cancel a stream? llm.rb has you covered through LLM::Context#interrupt!.
See the deepdive (web) or deepdive (markdown) for more examples.

```
require "llm"
require "io/console"

llm = LLM.openai(key: ENV["KEY"])
ctx = LLM::Context.new(llm, stream: $stdout)

worker = Thread.new do
  ctx.talk("Write a very long essay about network protocols.")
rescue LLM::Interrupt
  puts "\nRequest was interrupted!"
end

STDIN.getch
ctx.interrupt!
worker.join
```

The plugin :llm integration wraps LLM::Context on a Sequel::Model and keeps tool execution explicit.
See the deepdive (web) or deepdive (markdown) for more examples.

```
require "llm"
require "net/http/persistent"
require "sequel"
require "sequel/plugins/llm"

class Context < Sequel::Model
  plugin :llm, provider: -> { { key: ENV["#{provider.upcase}_SECRET"], persistent: true } }
end

ctx = Context.create(provider: "openai", model: "gpt-5.4-mini")
ctx.talk("Remember that my favorite language is Ruby")
puts ctx.talk("What is my favorite language?").content
```

ActiveRecord (ORM): acts_as_llm

The acts_as_llm method wraps LLM::Context and
provides full control over tool execution.
See the deepdive (web) or deepdive (markdown) for more examples.

```
require "llm"
require "net/http/persistent"
require "active_record"
require "llm/active_record"

class Context < ApplicationRecord
  acts_as_llm provider: -> { { key: ENV["#{provider.upcase}_SECRET"], persistent: true } }
end

ctx = Context.create!(provider: "openai", model: "gpt-5.4-mini")
ctx.talk("Remember that my favorite language is Ruby")
puts ctx.talk("What is my favorite language?").content
```

ActiveRecord (ORM): acts_as_agent

The acts_as_agent method wraps LLM::Agent and
manages tool execution for you.
See the deepdive (web) or deepdive (markdown) for more examples.

```
require "llm"
require "net/http/persistent"
require "active_record"
require "llm/active_record"

class Ticket < ApplicationRecord
  acts_as_agent provider: :set_provider
  model "gpt-5.4-mini"
  instructions "You are a concise support assistant."
  tools SearchDocs, Escalate
  concurrency :thread

  private

  def set_provider
    { key: ENV["#{provider.upcase}_SECRET"], persistent: true }
  end
end

ticket = Ticket.create!(provider: "openai", model: "gpt-5.4-mini")
puts ticket.talk("How do I rotate my API key?").content
```

This example uses LLM::MCP over HTTP so remote GitHub MCP tools run through the same LLM::Context tool path as local tools. See the deepdive (web) or deepdive (markdown) for more examples.

```
require "llm"
require "net/http/persistent"

llm = LLM.openai(key: ENV["KEY"])
mcp = LLM::MCP.http(
  url: "https://api.githubcopilot.com/mcp/",
  headers: {"Authorization" => "Bearer #{ENV.fetch("GITHUB_PAT")}"}
).persistent

begin
  mcp.start
  ctx = LLM::Context.new(llm, stream: $stdout, tools: mcp.tools)
  ctx.talk("Pull information about my GitHub account.")
  ctx.talk(ctx.call(:functions)) while ctx.functions.any?
ensure
  mcp.stop
end
```

This screencast was built on an older version of llm.rb, but it still shows
how capable the runtime can be in a real application:

BSD Zero Clause

See LICENSE