RubyLLM: Concurrent Execution Of Multiple Tool Calls
This article walks through a proposal for adding concurrent tool execution to RubyLLM. It covers the problem with today's sequential execution, a proposed design that aligns with the library's core principles, and the benefits the feature unlocks.
The Problem: Sequential Tool Execution
Currently, RubyLLM processes tool calls sequentially. When a Large Language Model (LLM) returns multiple tool calls in a single response, RubyLLM executes them one after another. This approach is inefficient, especially for I/O-bound tools such as API calls, database queries, and file operations. Imagine the model requesting three weather API calls: the system waits for each call to complete before initiating the next, wasting wall-clock time on idle network waits.
```ruby
def handle_tool_calls(response, &)
  response.tool_calls.each_value do |tool_call|
    result = execute_tool tool_call # Sequential execution
    # ... add message
  end
  complete(&)
end
```
This sequential execution becomes a bottleneck, hindering the performance and responsiveness of applications that rely on multiple tool calls. Therefore, concurrent execution emerges as a crucial optimization strategy.
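To see why this matters, here is a toy, self-contained timing sketch (not RubyLLM code) that simulates three one-second I/O waits, first sequentially and then under the `async` gem's fiber scheduler (assumes Ruby 3.x, where `sleep` inside a task is non-blocking):

```ruby
require 'async' # gem 'async'

# Stand-in for an I/O-bound tool call (e.g., an HTTP request taking ~1s)
def slow_call = sleep(1)

t = Time.now
3.times { slow_call } # one after another
puts "sequential: #{(Time.now - t).round(1)}s" # ~3.0s

t = Time.now
Async do
  3.times.map { Async { slow_call } }.each(&:wait) # waits overlap on the reactor
end.wait
puts "concurrent: #{(Time.now - t).round(1)}s" # ~1.0s
```

The same shape applies to real tool calls: three concurrent HTTP requests cost roughly one request's latency instead of three.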
Naming Considerations: Avoiding Confusion
It's worth clarifying terminology up front. LLM providers use the term "parallel tool calls" to describe multiple tool calls appearing in a single response. This proposal is about executing those multiple calls concurrently, which is a separate concern; to avoid conflating the two, the suggested name for this feature is `concurrent_tool_execution` or `concurrent_tools`.
Proposed Design: A Concurrency Solution
The proposed design for concurrent tool execution in RubyLLM adheres to the library's core philosophy, ensuring a seamless and efficient integration. Let's examine how the design aligns with these principles:
- Beautiful API: The design integrates cleanly with the existing `with_tools` pattern, ensuring a natural and intuitive user experience.
- Minimal Configuration: The feature is enabled with a simple keyword argument, minimizing complexity for developers.
- Extensible: The registry pattern allows new concurrency modes to be added, providing flexibility and scalability.
- No Heavy Classes: The design avoids introducing bulky handler classes, opting for a lightweight keyword-argument approach.
- Focused Scope: The solution stays focused on the core concern of LLM communication, ensuring a targeted and effective implementation.
API Design: Streamlining Concurrency
The API design leverages a keyword argument on the `with_tools` method to enable concurrent execution. This approach maintains consistency with existing patterns and minimizes the learning curve for developers.
```ruby
chat = RubyLLM.chat
  .with_tools(Weather, StockPrice, CurrencyConverter, concurrency: :async)
  .ask("What's the weather in NYC, AAPL stock price, and EUR/USD rate?")

# Different concurrency modes
chat.with_tools(W, S, C, concurrency: :async)   # Async (fiber-based)
chat.with_tools(W, S, C, concurrency: :threads) # Threads (future)
chat.with_tools(W, S, C)                        # Sequential (current behavior)
```
Why this design?
- Single Concern: Tool configuration remains centralized, simplifying management and maintenance.
- Minimal API Surface: Only a keyword argument is added, minimizing the impact on the existing API.
- Clear Naming: The `concurrency:` keyword clearly conveys the feature's purpose.
- Backward Compatibility: Omitting the argument defaults to sequential execution, preserving existing behavior.
- Fits Pattern: The design mirrors the existing `replace:` option in `with_tools(*tools, replace: false)`.
Built-in Strategies: Async and Threads
To provide immediate value, the proposal includes two built-in concurrency strategies:
- `:async` - Fiber-based concurrency:
  - Leverages the `Async` gem, already included as a development dependency.
  - Ideal for I/O-bound operations, which constitute the majority of tool calls.
  - Lightweight and efficient, minimizing overhead.
- `:threads` - Thread-based concurrency:
  - Utilizes Ruby's built-in threads.
  - Provides broad compatibility without external dependencies.
  - Serves as a reliable fallback for environments lacking `Async`.
The `:async` strategy is the primary recommendation for I/O-bound operations due to its efficiency and lightweight nature. The `:threads` strategy offers a robust alternative for environments where `Async` is not available.
Extensible Concurrency Registry: Customization and Flexibility
To accommodate diverse needs and environments, the proposal includes an extensible concurrency registry. This allows users to register custom concurrency modes, tailoring the execution strategy to specific requirements.
```ruby
# Register a custom concurrency mode (e.g., thread-based)
RubyLLM.register_tool_executor(:threads) do |tool_calls, &execute|
  tool_calls.map do |_id, tool_call|
    Thread.new { [tool_call, execute.call(tool_call)] }
  end.map(&:value)
end

# Use it
chat.with_tools(W, S, C, concurrency: :threads)
```
This flexibility empowers developers to optimize tool execution for their unique use cases, ensuring peak performance and efficiency.
Why This Design Fits RubyLLM: Consistency and Elegance
The proposed design seamlessly integrates with RubyLLM's existing patterns and principles, ensuring a cohesive and intuitive experience.
1. **Follows Existing Patterns:**
```ruby
.with_tool(Weather)                        # Add single tool
.with_tools(W, S, C)                       # Add multiple tools
.with_tools(W, S, C, concurrency: :async)  # Execute concurrently
```
2. **Minimal API Surface:**
```ruby
chat.with_tools(Weather, StockPrice, concurrency: :async)
```
3. **Extensible via Registry:**
```ruby
RubyLLM.register_tool_executor(:custom) { |calls, &exec| ... }
chat.with_tools(W, concurrency: :custom)
```
4. **Leverages Existing Async Infrastructure:**
```ruby
# Works great inside Async blocks
Async do
  chat.with_tools(W, concurrency: :async).ask("...")
end
```
5. **Backward Compatible:** Omitting the argument defaults to sequential execution, preserving existing behavior.
The design's elegance and consistency with RubyLLM's core principles make it a natural and intuitive extension of the library's capabilities.
Implementation Approach: Building the Concurrency Engine
The implementation of concurrent tool execution involves three key components:
- Concurrency Registry
- Chat Configuration
- Enhanced Tool Execution
Let's delve into the details of each component.
1. Concurrency Registry (in RubyLLM module)
The concurrency registry, housed within the RubyLLM module, manages the available concurrency executors. This registry allows users to register custom executors and provides a central point for managing concurrency strategies.
```ruby
# lib/ruby_llm.rb - Add to class << self block
module RubyLLM
  class << self
    def tool_executors
      @tool_executors ||= {}
    end

    def register_tool_executor(name, &block)
      tool_executors[name.to_sym] = block
    end
  end
end

# After module definition (like Provider.register calls)
RubyLLM.register_tool_executor(:async) do |tool_calls, &execute|
  begin
    require 'async' unless defined?(Async)
  rescue LoadError
    raise LoadError, "The 'async' gem is required for concurrent tool execution. " \
                     "Add `gem 'async'` to your Gemfile."
  end

  Async do
    tool_calls.map do |_id, tool_call|
      Async { [tool_call, execute.call(tool_call)] }
    end.map(&:wait)
  end.wait
end
```
The registry stores executors in a hash, keyed by executor name, with the execution block as the value. The `:async` executor, built on the `Async` gem, is the one concurrent executor registered out of the box.
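Because the registry is a plain hash keyed by symbol, it could be inspected directly; a small sketch of how that would look under this proposal:

```ruby
RubyLLM.tool_executors.keys         # => [:async] (plus any custom modes registered later)
RubyLLM.tool_executors.key?(:async) # => true
```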
2. Chat Configuration
The `Chat` class is extended with a `concurrency` attribute, allowing users to specify the desired concurrency mode. The `with_tools` method is modified to set this attribute.
```ruby
class Chat
  attr_reader :model, :messages, :tools, :params, :headers, :schema, :concurrency

  def with_tools(*tools, replace: false, concurrency: nil)
    @tools.clear if replace
    tools.compact.each { |tool| with_tool tool }
    @concurrency = concurrency
    self
  end

  private

  def concurrent_tools?
    return false unless @concurrency

    unless RubyLLM.tool_executors.key?(@concurrency)
      raise ArgumentError, "Unknown concurrency mode: #{@concurrency}. " \
                           "Available: #{RubyLLM.tool_executors.keys.join(', ')}"
    end

    true
  end
end
```
The `concurrent_tools?` method checks whether a valid concurrency mode is set, ensuring that the specified mode exists in the registry.
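Note that under this proposal the check runs when tool calls are handled, not when `with_tools` is called. A hypothetical example of the failure mode (the tool and the misspelled mode are made up):

```ruby
chat = RubyLLM.chat.with_tools(Weather, concurrency: :fibers) # typo: no such mode

# Raises once the model's response actually includes tool calls:
chat.ask("What's the weather in NYC?")
# => ArgumentError: Unknown concurrency mode: fibers. Available: async
```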
3. Enhanced Tool Execution
The `handle_tool_calls` method is enhanced to support concurrent execution. This involves executing tool calls concurrently and handling callbacks appropriately.
```ruby
def handle_tool_calls(response, &)
  halt_result = nil

  if concurrent_tools?
    # Execute all tools concurrently with callbacks firing during execution
    results = execute_tools_concurrently(response.tool_calls)

    # Add messages sequentially after concurrent execution completes
    results.each do |tool_call, result|
      @on[:new_message]&.call
      tool_payload = result.is_a?(Tool::Halt) ? result.content : result
      content = content_like?(tool_payload) ? tool_payload : tool_payload.to_s
      message = add_message role: :tool, content:, tool_call_id: tool_call.id
      @on[:end_message]&.call(message)
      halt_result = result if result.is_a?(Tool::Halt)
    end
  else
    # Current sequential behavior
    response.tool_calls.each_value do |tool_call|
      @on[:new_message]&.call
      @on[:tool_call]&.call(tool_call)
      result = execute_tool(tool_call)
      @on[:tool_result]&.call(result)
      tool_payload = result.is_a?(Tool::Halt) ? result.content : result
      content = content_like?(tool_payload) ? tool_payload : tool_payload.to_s
      message = add_message role: :tool, content:, tool_call_id: tool_call.id
      @on[:end_message]&.call(message)
      halt_result = result if result.is_a?(Tool::Halt)
    end
  end

  halt_result || complete(&)
end

def execute_tools_concurrently(tool_calls)
  executor = RubyLLM.tool_executors[@concurrency]

  # Callbacks fire INSIDE concurrent execution (before/after each tool)
  executor.call(tool_calls) do |tool_call|
    @on[:tool_call]&.call(tool_call)
    result = execute_tool(tool_call)
    @on[:tool_result]&.call(result)
    result
  end
end
```
Important note on callbacks:
- `on_tool_call` and `on_tool_result` fire during concurrent execution (and may interleave in thread-based adapters).
- `on_new_message` and `on_end_message` fire sequentially after all tools complete (message ordering is deterministic).
- In `:async` mode, callbacks are fiber-safe (cooperative scheduling).
- In `:threads` mode, callbacks must be thread-safe (user responsibility).
This enhanced execution flow ensures that tool calls are executed concurrently while maintaining the integrity of the callback system.
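For example, a callback that appends to shared state would need a lock under a thread-based adapter. A minimal sketch, assuming the `on_tool_result` registration implied by the `@on[:tool_result]` hook above:

```ruby
results_log = []
log_mutex = Mutex.new

chat.on_tool_result do |result|
  # Array#<< is not atomic across threads; synchronize under :threads.
  # (Unnecessary under :async, where fibers yield cooperatively.)
  log_mutex.synchronize { results_log << result }
end
```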
User-Defined Adapters: Tailoring Concurrency to Your Needs
The extensible concurrency registry allows users to define their own concurrency adapters, enabling fine-grained control over tool execution. This flexibility is crucial for adapting RubyLLM to diverse environments and use cases.
```ruby
# In an initializer or before using RubyLLM
RubyLLM.register_tool_executor(:my_custom_executor) do |tool_calls, &execute|
  # tool_calls is a Hash: { id => ToolCall, id => ToolCall, ... }
  # execute is a block that runs callbacks + executes the tool
  # Must return: Array of [tool_call, result] tuples
  results = []
  tool_calls.each_value do |tool_call|
    result = execute.call(tool_call)
    results << [tool_call, result]
  end
  results
end

# Use it
chat.with_tools(MyTool, concurrency: :my_custom_executor)
```
Adapter Contract: Ensuring Compatibility
Each adapter block adheres to a specific contract, ensuring seamless integration with the RubyLLM framework.
The adapter block receives:
- `tool_calls` - Hash of `{ id => ToolCall }` objects (ordered by insertion).
- `&execute` - Block that:
  - Fires the `on_tool_call` callback.
  - Executes the tool.
  - Fires the `on_tool_result` callback.
  - Returns the result.
The adapter must return:
- An array of `[tool_call, result]` tuples (in the same order as the input).
Important: Results should be returned in the same order as `tool_calls` to ensure deterministic message ordering. Ruby hashes maintain insertion order, so iterating `tool_calls` preserves order.
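If a custom adapter's primitives can complete out of order (say, a results queue), one way to honor the ordering contract is to collect completions keyed by tool-call id and re-emit them in input order. A hypothetical sketch, assuming the hash keys match each `ToolCall#id` as the `{ id => ToolCall }` shape implies (error handling omitted for brevity):

```ruby
RubyLLM.register_tool_executor(:ordered_queue) do |tool_calls, &execute|
  queue = Queue.new
  tool_calls.each_value do |tool_call|
    Thread.new { queue << [tool_call, execute.call(tool_call)] }
  end

  # Drain one completion per tool call (arrival order is arbitrary)...
  completed = Array.new(tool_calls.size) { queue.pop }
  by_id = completed.to_h { |tool_call, result| [tool_call.id, [tool_call, result]] }

  # ...then restore input order using the hash's insertion-ordered keys.
  tool_calls.keys.map { |id| by_id[id] }
end
```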
Example: Thread Pool with Limit
To illustrate the power of custom adapters, let's examine an example that implements a thread pool with a concurrency limit.
```ruby
require 'concurrent' # provided by the concurrent-ruby gem

RubyLLM.register_tool_executor(:thread_pool) do |tool_calls, &execute|
  pool = Concurrent::FixedThreadPool.new(5)
  futures = tool_calls.map do |_id, tool_call|
    Concurrent::Future.execute(executor: pool) do
      [tool_call, execute.call(tool_call)]
    end
  end
  futures.map(&:value) # blocks until every future resolves
ensure
  pool.shutdown
end
```
This adapter utilizes the concurrent-ruby gem to create a fixed-size thread pool, limiting the number of concurrent tool executions. This is particularly useful for managing resource consumption and preventing overload.
Example: Custom Error Handling
Another compelling use case for custom adapters is implementing custom error handling. The following example demonstrates an adapter that gracefully handles tool execution errors.
```ruby
RubyLLM.register_tool_executor(:resilient_async) do |tool_calls, &execute|
  require 'async' unless defined?(Async)

  Async do
    tool_calls.map do |_id, tool_call|
      Async do
        result = begin
          execute.call(tool_call)
        rescue StandardError => e
          { error: e.message, tool: tool_call.name }
        end
        [tool_call, result]
      end
    end.map(&:wait)
  end.wait
end
```
This adapter wraps each tool execution in a `begin`/`rescue` block, catching any exceptions and returning an error payload in place of the result. This ensures that tool execution failures do not disrupt the overall process.
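In use, a failure then degrades gracefully rather than aborting the turn; a sketch of the expected behavior under the proposed `handle_tool_calls`:

```ruby
chat.with_tools(Weather, concurrency: :resilient_async)

# If one Weather call raises, its tool message carries the error payload
# (stringified via tool_payload.to_s in the proposed handle_tool_calls), e.g.:
#   {error: "connection timeout", tool: "weather"}
# so the model can acknowledge the failure while other results still arrive.
```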
Advanced: CPU-Bound Tools and Ractor Considerations
While the `:async` adapter is ideal for I/O-bound operations, CPU-bound tools may benefit from thread-based concurrency (note that on CRuby the GVL limits this to work that releases it, such as native extensions; JRuby and TruffleRuby run threads in parallel). Users can register a `:threads` adapter for such cases.
```ruby
RubyLLM.register_tool_executor(:threads) do |tool_calls, &execute|
  tool_calls.map do |_id, tool_call|
    Thread.new { [tool_call, execute.call(tool_call)] }
  end.map(&:value)
end

chat.with_tools(CPUIntensiveTool, concurrency: :threads)
```
Note on Ractors: Ractors, Ruby's actor-based concurrency model, have strict isolation requirements that are incompatible with the block-based registry pattern. If Ractor support is needed, it would require a different architecture. For now, `:async` (I/O-bound) and `:threads` (CPU-bound) cover the common cases.
Recommendation: Ship with `:async`. Add `:threads` if there's demonstrated need; the registry makes this trivial.
Benefits: Unleashing Performance and Scalability
The introduction of concurrent tool execution in RubyLLM brings a multitude of benefits.
- Performance: Execute multiple API calls in approximately the time of one, significantly reducing latency.
- Simple API: Enable the feature with a single method call, minimizing complexity.
- Extensible: The registry pattern makes adding new concurrency strategies effortless.
- No Breaking Changes: Sequential execution remains the default, ensuring backward compatibility.
- Fits Philosophy: The design aligns with RubyLLM's principles of beautiful, minimal APIs and sensible defaults.
Example Usage: Weather Tool Scenario
Consider a scenario involving a weather tool that makes HTTP requests to retrieve weather information for a given city.
```ruby
# Weather tool that makes HTTP requests
class Weather < RubyLLM::Tool
  description "Get weather for a city"
  param :city

  def execute(city:)
    # I/O bound - perfect for async
    response = Faraday.get("https://api.weather.com/#{city}")
    JSON.parse(response.body)
  end
end

# Multiple cities requested - concurrent execution
chat = RubyLLM.chat
  .with_tools(Weather, concurrency: :async)

chat.ask "What's the weather in NYC, London, and Tokyo?"
# Model returns 3 tool calls
# All 3 HTTP requests execute concurrently
# Total time: ~1 request instead of ~3
```
In this example, the model requests weather information for three cities. With concurrent execution enabled, the three HTTP requests are executed concurrently, significantly reducing the overall response time.
Questions for Discussion: Addressing Nuances and Edge Cases
To ensure a robust and well-rounded implementation, several questions merit discussion.
1. **Callback Interleaving:** In concurrent mode, `on_tool_call` and `on_tool_result` may fire concurrently (interleaved). Is this acceptable?
   - **Current Behavior:** `on_tool_call` → execution → `on_tool_result` (sequential per tool).
   - **Concurrent Behavior:** Same semantics per tool, but multiple tools run simultaneously.
   - **Implication:** Users with thread-unsafe callbacks need to handle synchronization.
   - **Recommendation:** Document this clearly; `:async` is fiber-safe (cooperative), `:threads` requires care.
2. **Error Handling:** If one tool fails, should others continue?
   - **Recommendation:** Yes, continue all tools. The adapter can handle errors (see the resilient example above). Exceptions propagate naturally.
3. **Rate Limiting:** Should we add semaphore support in the future?
   ```ruby
   chat.with_tools(W, concurrency: :async, max_concurrency: 5)
   ```
   - **Recommendation:** Not in v1; users can implement this via custom adapters (see the thread pool example above and the semaphore sketch after this list).
4. **Multiple `with_tools` Calls:** How should the concurrency setting be handled?
   ```ruby
   chat.with_tools(A, concurrency: :async).with_tools(B) # B resets to nil or inherits?
   ```
   - **Recommendation:** Each call overrides the setting (last one wins, consistent with `replace:`).
5. **Async Gem as Dependency:** Should we add `async` as a runtime dependency or keep it optional?
   - **Recommendation:** Keep it optional; require it when `:async` is used (current proposal).
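Even without built-in `max_concurrency:` support, the registry already accommodates rate limiting in user land. A sketch using `Async::Semaphore` from the `async` gem (the `:async_limited` name and the limit of 5 are made up here):

```ruby
RubyLLM.register_tool_executor(:async_limited) do |tool_calls, &execute|
  require 'async' unless defined?(Async)
  require 'async/semaphore'

  Async do
    semaphore = Async::Semaphore.new(5) # at most 5 tools in flight at once
    tool_calls.map do |_id, tool_call|
      # semaphore.async waits for a free slot before starting the child task
      semaphore.async { [tool_call, execute.call(tool_call)] }
    end.map(&:wait)
  end.wait
end

chat.with_tools(W, S, C, concurrency: :async_limited)
```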
Implementation Plan: A Phased Approach
The implementation of concurrent tool execution will follow a phased approach, ensuring a smooth and well-tested integration.
- Phase 1: Core Implementation
  - Add the `concurrency:` argument to `with_tools`.
  - Add the registry with `RubyLLM.register_tool_executor`.
  - Ship the `:async` adapter.
  - Add tests.
  - Update documentation.
- Phase 2: Refinements (if needed)
  - Add error handling patterns.
  - Add the `:threads` adapter if requested.
  - Performance benchmarks.
- Phase 3: Advanced Features (if requested)
  - Rate limiting (semaphore support).
  - Ractor adapter for CPU-bound tools.
  - Per-tool concurrency control.
Conclusion: Embracing Concurrency for Enhanced RubyLLM Applications
This proposal introduces concurrent tool execution to RubyLLM, aligning with the library's philosophy of beautiful, minimal APIs. With a single keyword argument, `concurrency: :async`, developers can unlock significant performance improvements. The extensible registry pattern allows future adapters to be added, and the included `:async` adapter covers I/O-bound tools. Backward compatibility is preserved, ensuring a smooth transition for existing applications.
The design adheres to crmne's preference for options like Async and Ractors without necessitating heavy handler classes. The registry pattern simplifies the addition of new concurrency modes, making RubyLLM a versatile and powerful tool for building intelligent applications.
Suggested Issue Title: "Support concurrent execution of multiple tool calls"
Suggested Label: [FEATURE]
For more information on concurrent programming in Ruby, visit the official Ruby Concurrency Documentation.