RubyLLM: Concurrent Execution Of Multiple Tool Calls

by Alex Johnson

Enhance your RubyLLM applications by understanding how to implement concurrent tool execution. This article delves into the proposal for adding concurrent tool execution to RubyLLM, addressing the limitations of sequential execution and presenting a design that aligns with the library's core principles. Let's explore the problem, the proposed solutions, and the benefits of this powerful feature.

The Problem: Sequential Tool Execution

Currently, RubyLLM processes tool calls sequentially. When a large language model (LLM) returns multiple tool calls in a single response, RubyLLM executes them one after the other. This is inefficient for I/O-bound tools such as API calls, database queries, and file operations: if the model requests three weather API calls, the system waits for each call to complete before initiating the next, roughly tripling the wall-clock time.

def handle_tool_calls(response, &)
  response.tool_calls.each_value do |tool_call|
    result = execute_tool tool_call # Sequential execution
    # ... add message
  end
  complete(&)
end

This sequential execution becomes a bottleneck, hindering the performance and responsiveness of applications that rely on multiple tool calls. Therefore, concurrent execution emerges as a crucial optimization strategy.
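
To see the cost concretely, here is a standalone simulation (independent of RubyLLM) of three 1-second I/O-bound tool calls, run sequentially and then concurrently:

require 'benchmark'

# A stand-in for an I/O-bound tool call, e.g. an HTTP request
slow_call = -> { sleep 1 }

Benchmark.realtime { 3.times { slow_call.call } }
# => ~3.0s: each call waits for the previous one to finish

Benchmark.realtime { 3.times.map { Thread.new { slow_call.call } }.each(&:join) }
# => ~1.0s: the waits overlap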

Naming Considerations: Avoiding Confusion

It's important to get the terminology right. LLM providers use the term "parallel tool calls" for a model returning multiple tool calls in a single response; this proposal is about executing those calls concurrently, which is a separate concern. To keep the distinction clear, the suggested name for the feature is concurrent_tool_execution or concurrent_tools.

Proposed Design: A Concurrency Solution

The proposed design for concurrent tool execution in RubyLLM adheres to the library's core philosophy, ensuring a seamless and efficient integration. Let's examine how the design aligns with these principles:

  1. Beautiful API: The design seamlessly integrates with the existing with_tools pattern, ensuring a natural and intuitive user experience.
  2. Minimal Configuration: The feature is enabled with a simple keyword argument, minimizing the complexity for developers.
  3. Extensible: The registry pattern allows for the addition of new concurrency modes, providing flexibility and scalability.
  4. No Heavy Classes: The design avoids introducing bulky handler classes, opting for a lightweight keyword argument approach.
  5. Focused Scope: The solution addresses the core concern of LLM communication, ensuring a targeted and effective implementation.

API Design: Streamlining Concurrency

The API design leverages a keyword argument within the with_tools method to enable concurrent execution. This approach maintains consistency with existing patterns and minimizes the learning curve for developers.

chat = RubyLLM.chat
  .with_tools(Weather, StockPrice, CurrencyConverter, concurrency: :async)

chat.ask("What's the weather in NYC, AAPL stock price, and EUR/USD rate?")

# Different concurrency modes
chat.with_tools(W, S, C, concurrency: :async)   # Async (fiber-based)
chat.with_tools(W, S, C, concurrency: :threads) # Threads (future)
chat.with_tools(W, S, C)                        # Sequential (current behavior)

Why this design?

  • Single Concern: Tool configuration remains centralized, simplifying management and maintenance.
  • Minimal API Surface: Only a keyword argument is added, minimizing the impact on the existing API.
  • Clear Naming: The concurrency: keyword clearly conveys the feature's purpose.
  • Backward Compatibility: The absence of the argument defaults to sequential execution, preserving existing behavior.
  • Fits Pattern: The design aligns with the existing replace: option in with_tools(*tools, replace: false); combined usage is shown below.
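
Because concurrency: is just another keyword argument, it composes with the existing options in a single call. A brief usage sketch (concurrency: is the proposed addition; replace: already exists):

chat.with_tools(Weather, StockPrice, replace: true, concurrency: :async)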

Built-in Strategies: Async and Threads

To provide immediate value, the proposal includes two built-in concurrency strategies:

  1. :async - Fiber-based Concurrency:
    • Leverages the Async gem, already included as a development dependency.
    • Ideal for I/O-bound operations, which constitute the majority of tool calls.
    • Lightweight and efficient, minimizing overhead.
  2. :threads - Thread-based Concurrency:
    • Utilizes Ruby's built-in threads.
    • Provides broad compatibility without external dependencies.
    • Serves as a reliable fallback for environments lacking Async.

The :async strategy is the primary recommendation for I/O-bound operations due to its efficiency and lightweight nature. The :threads strategy offers a robust alternative for environments where Async is not available.
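
For intuition about why fibers suit I/O-bound work, here is a standalone sketch using the Async gem (independent of RubyLLM): tasks yield to the scheduler while waiting, so the waits overlap on a single thread.

require 'async'

Async do |task|
  # Three 1-second waits run as cooperative fiber tasks
  3.times.map { task.async { sleep 1 } }.each(&:wait)
end
# Completes in ~1 second on a single thread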

Extensible Concurrency Registry: Customization and Flexibility

To accommodate diverse needs and environments, the proposal includes an extensible concurrency registry. This allows users to register custom concurrency modes, tailoring the execution strategy to specific requirements.

# Register a custom concurrency mode (e.g., thread-based)
RubyLLM.register_tool_executor(:threads) do |tool_calls, &execute|
  tool_calls.map do |_id, tool_call|
    Thread.new { [tool_call, execute.call(tool_call)] }
  end.map(&:value)
end

# Use it
chat.with_tools(W, S, C, concurrency: :threads)

This flexibility empowers developers to optimize tool execution for their unique use cases, ensuring peak performance and efficiency.

Why This Design Fits RubyLLM: Consistency and Elegance

The proposed design seamlessly integrates with RubyLLM's existing patterns and principles, ensuring a cohesive and intuitive experience.

  1. Follows Existing Patterns:

.with_tool(Weather)                        # Add single tool
.with_tools(W, S, C)                       # Add multiple tools
.with_tools(W, S, C, concurrency: :async)  # Execute concurrently

  2. Minimal API Surface:

chat.with_tools(Weather, StockPrice, concurrency: :async)

  3. Extensible via Registry:

RubyLLM.register_tool_executor(:custom) { |calls, &exec| ... }
chat.with_tools(W, concurrency: :custom)

  4. Leverages Existing Async Infrastructure:

# Works great inside Async blocks
Async do
  chat.with_tools(W, concurrency: :async).ask("...")
end

  5. Backward Compatible:

    • No argument defaults to sequential execution, preserving existing behavior.

The design's elegance and consistency with RubyLLM's core principles make it a natural and intuitive extension of the library's capabilities.

Implementation Approach: Building the Concurrency Engine

The implementation of concurrent tool execution involves three key components:

  1. Concurrency Registry
  2. Chat Configuration
  3. Enhanced Tool Execution

Let's delve into the details of each component.

1. Concurrency Registry (in RubyLLM module)

The concurrency registry, housed within the RubyLLM module, manages the available concurrency executors. This registry allows users to register custom executors and provides a central point for managing concurrency strategies.

# lib/ruby_llm.rb - Add to class << self block
module RubyLLM
  class << self
    def tool_executors
      @tool_executors ||= {}
    end

    def register_tool_executor(name, &block)
      tool_executors[name.to_sym] = block
    end
  end
end

# After module definition (like Provider.register calls)
RubyLLM.register_tool_executor(:async) do |tool_calls, &execute|
  begin
    require 'async' unless defined?(Async)
  rescue LoadError
    raise LoadError, "The 'async' gem is required for concurrent tool execution. " \
                     "Add `gem 'async'` to your Gemfile."
  end

  Async do
    tool_calls.map do |_id, tool_call|
      Async { [tool_call, execute.call(tool_call)] }
    end.map(&:wait)
  end.wait
end

The registry stores executors as a hash, with the executor name as the key and the execution block as the value. The :async executor, utilizing the Async gem, is registered as the default concurrent executor.
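
Once registered, executors can be inspected at runtime, which is exactly what the validation in the Chat configuration below relies on. A hypothetical session, assuming only the built-in :async registration has run:

RubyLLM.tool_executors.keys           # => [:async]
RubyLLM.tool_executors.key?(:threads) # => false, until a :threads executor is registered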

2. Chat Configuration

The Chat class is extended to include a concurrency attribute, allowing users to specify the desired concurrency mode. The with_tools method is modified to set this attribute.

class Chat
  attr_reader :model, :messages, :tools, :params, :headers, :schema, :concurrency

  def with_tools(*tools, replace: false, concurrency: nil)
    @tools.clear if replace
    tools.compact.each { |tool| with_tool tool }
    @concurrency = concurrency
    self
  end

  private

  def concurrent_tools?
    return false unless @concurrency

    unless RubyLLM.tool_executors.key?(@concurrency)
      raise ArgumentError, "Unknown concurrency mode: #{@concurrency}. " \
                           "Available: #{RubyLLM.tool_executors.keys.join(', ')}"
    end

    true
  end
end

The concurrent_tools? method checks if a valid concurrency mode is set, ensuring that the specified mode exists in the registry.
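
For example, asking for an unregistered mode surfaces as an ArgumentError once the model returns tool calls (a hypothetical session; :fibers is a made-up mode name):

chat = RubyLLM.chat.with_tools(Weather, concurrency: :fibers)
chat.ask("What's the weather in NYC?")
# => ArgumentError: Unknown concurrency mode: fibers. Available: async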

3. Enhanced Tool Execution

The handle_tool_calls method is enhanced to support concurrent execution. This involves executing tool calls concurrently and handling callbacks appropriately.

def handle_tool_calls(response, &)
  halt_result = nil

  if concurrent_tools?
    # Execute all tools concurrently with callbacks firing during execution
    results = execute_tools_concurrently(response.tool_calls)

    # Add messages sequentially after concurrent execution completes
    results.each do |tool_call, result|
      @on[:new_message]&.call
      tool_payload = result.is_a?(Tool::Halt) ? result.content : result
      content = content_like?(tool_payload) ? tool_payload : tool_payload.to_s
      message = add_message role: :tool, content:, tool_call_id: tool_call.id
      @on[:end_message]&.call(message)

      halt_result = result if result.is_a?(Tool::Halt)
    end
  else
    # Current sequential behavior
    response.tool_calls.each_value do |tool_call|
      @on[:new_message]&.call
      @on[:tool_call]&.call(tool_call)
      result = execute_tool(tool_call)
      @on[:tool_result]&.call(result)

      tool_payload = result.is_a?(Tool::Halt) ? result.content : result
      content = content_like?(tool_payload) ? tool_payload : tool_payload.to_s
      message = add_message role: :tool, content:, tool_call_id: tool_call.id
      @on[:end_message]&.call(message)

      halt_result = result if result.is_a?(Tool::Halt)
    end
  end

  halt_result || complete(&)
end

def execute_tools_concurrently(tool_calls)
  executor = RubyLLM.tool_executors[@concurrency]

  # Callbacks fire INSIDE concurrent execution (before/after each tool)
  executor.call(tool_calls) do |tool_call|
    @on[:tool_call]&.call(tool_call)
    result = execute_tool(tool_call)
    @on[:tool_result]&.call(result)
    result
  end
end

Important note on callbacks:

  • on_tool_call and on_tool_result fire during concurrent execution (may interleave in thread-based adapters).
  • on_new_message and on_end_message fire sequentially after all tools complete (message ordering is deterministic).
  • In :async mode, callbacks are fiber-safe (cooperative scheduling).
  • In :threads mode, callbacks must be thread-safe (user responsibility).

This enhanced execution flow ensures that tool calls are executed concurrently while maintaining the integrity of the callback system.
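
For thread-based adapters, wrapping shared state in a Mutex is the simplest way to make a callback safe. A sketch, assuming callbacks are registered via an on_tool_call method as the callback names in this proposal suggest:

log_mutex = Mutex.new

chat.on_tool_call do |tool_call|
  # stdout is shared state; synchronize since callbacks may interleave
  log_mutex.synchronize { puts "Calling #{tool_call.name}..." }
end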

User-Defined Adapters: Tailoring Concurrency to Your Needs

The extensible concurrency registry allows users to define their own concurrency adapters, enabling fine-grained control over tool execution. This flexibility is crucial for adapting RubyLLM to diverse environments and use cases.

# In an initializer or before using RubyLLM
RubyLLM.register_tool_executor(:my_custom_executor) do |tool_calls, &execute|
  # tool_calls is a Hash: { id => ToolCall, id => ToolCall, ... }
  # execute is a block that runs callbacks + executes the tool
  # Must return: Array of [tool_call, result] tuples

  results = []
  tool_calls.each_value do |tool_call|
    result = execute.call(tool_call)
    results << [tool_call, result]
  end
  results
end

# Use it
chat.with_tools(MyTool, concurrency: :my_custom_executor)

Adapter Contract: Ensuring Compatibility

Each adapter block adheres to a specific contract, ensuring seamless integration with the RubyLLM framework.

The adapter block receives:

  • tool_calls - Hash of { id => ToolCall } objects (ordered by insertion).
  • &execute - Block that:
    1. Fires on_tool_call callback.
    2. Executes the tool.
    3. Fires on_tool_result callback.
    4. Returns the result.

The adapter must return:

  • Array of [tool_call, result] tuples (in the same order as input).

Important: Results should be returned in the same order as tool_calls to ensure deterministic message ordering. Ruby Hashes maintain insertion order, so iterating tool_calls preserves order.

Example: Thread Pool with Limit

To illustrate the power of custom adapters, let's examine an example that implements a thread pool with a concurrency limit.

require 'concurrent' # the concurrent-ruby gem is required as 'concurrent'

RubyLLM.register_tool_executor(:thread_pool) do |tool_calls, &execute|
  pool = Concurrent::FixedThreadPool.new(5)
  futures = tool_calls.map do |_id, tool_call|
    Concurrent::Future.execute(executor: pool) do
      [tool_call, execute.call(tool_call)]
    end
  end
  futures.map(&:value)
ensure
  pool.shutdown
end

This adapter utilizes the concurrent-ruby gem to create a fixed-size thread pool, limiting the number of concurrent tool executions. This is particularly useful for managing resource consumption and preventing overload.
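
Usage mirrors the built-in modes, referencing the :thread_pool name registered above:

chat.with_tools(Weather, StockPrice, CurrencyConverter, concurrency: :thread_pool)
# At most 5 tool calls run at once; the rest queue on the pool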

Example: Custom Error Handling

Another compelling use case for custom adapters is implementing custom error handling. The following example demonstrates an adapter that gracefully handles tool execution errors.

RubyLLM.register_tool_executor(:resilient_async) do |tool_calls, &execute|
  require 'async' unless defined?(Async)

  Async do
    tool_calls.map do |_id, tool_call|
      Async do
        result = begin
          execute.call(tool_call)
        rescue StandardError => e
          { error: e.message, tool: tool_call.name }
        end
        [tool_call, result]
      end
    end.map(&:wait)
  end.wait
end

This adapter wraps each tool execution in a begin...rescue block, catching any exceptions and returning an error payload. This ensures that tool execution failures do not disrupt the overall process.
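
Hypothetical usage of the :resilient_async mode registered above: a failing tool yields an error payload that becomes the tool message content, so the model can see the failure and respond to it instead of the whole turn aborting.

chat.with_tools(Weather, concurrency: :resilient_async)
chat.ask("What's the weather in Atlantis?")
# A failed lookup produces a tool message like {:error=>"404 Not Found", :tool=>"weather"}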

Advanced: CPU-Bound Tools and Ractor Considerations

While the :async adapter is ideal for I/O-bound operations, CPU-bound tools may benefit from thread-based concurrency on runtimes without a global VM lock (JRuby, TruffleRuby); under MRI's GVL, threads still help when the underlying work releases the lock, as many C extensions do. Users can register a :threads adapter for such cases.

RubyLLM.register_tool_executor(:threads) do |tool_calls, &execute|
  tool_calls.map do |_id, tool_call|
    Thread.new { [tool_call, execute.call(tool_call)] }
  end.map(&:value)
end

chat.with_tools(CPUIntensiveTool, concurrency: :threads)

Note on Ractors: Ractors, Ruby's actor-based concurrency model, have strict isolation requirements that are incompatible with the block-based registry pattern. If Ractor support is needed, it would require a different architecture. For now, :async (I/O-bound) and :threads (CPU-bound) cover the common cases.

Recommendation: Ship with :async. Add :threads if there's demonstrated need - the registry makes this trivial.

Benefits: Unleashing Performance and Scalability

The introduction of concurrent tool execution in RubyLLM brings a multitude of benefits.

  1. Performance: Execute multiple API calls in approximately the time of one, significantly reducing latency.
  2. Simple API: Enable the feature with a single method call, minimizing complexity.
  3. Extensible: The registry pattern makes adding new concurrency strategies effortless.
  4. No Breaking Changes: Sequential execution remains the default, ensuring backward compatibility.
  5. Fits Philosophy: The design aligns with RubyLLM's principles of beautiful, minimal APIs and sensible defaults.

Example Usage: Weather Tool Scenario

Consider a scenario involving a weather tool that makes HTTP requests to retrieve weather information for a given city.

require 'faraday'
require 'json'

# Weather tool that makes HTTP requests
class Weather < RubyLLM::Tool
  description "Get weather for a city"
  param :city

  def execute(city:)
    # I/O bound - perfect for async
    response = Faraday.get("https://api.weather.com/#{city}")
    JSON.parse(response.body)
  end
end

# Multiple cities requested - concurrent execution
chat = RubyLLM.chat
 .with_tools(Weather, concurrency: :async)

chat.ask "What's the weather in NYC, London, and Tokyo?"
# Model returns 3 tool calls
# All 3 HTTP requests execute concurrently
# Total time: ~1 request instead of ~3

In this example, the model requests weather information for three cities. With concurrent execution enabled, the three HTTP requests are executed concurrently, significantly reducing the overall response time.
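
A rough way to confirm the speedup is to time the same question with and without the flag. A hypothetical measurement; real numbers depend on model and API latency:

require 'benchmark'

sequential = RubyLLM.chat.with_tools(Weather)
concurrent = RubyLLM.chat.with_tools(Weather, concurrency: :async)

Benchmark.realtime { sequential.ask("Weather in NYC, London, and Tokyo?") } # ≈ model time + 3 requests
Benchmark.realtime { concurrent.ask("Weather in NYC, London, and Tokyo?") } # ≈ model time + 1 request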

Questions for Discussion: Addressing Nuances and Edge Cases

To ensure a robust and well-rounded implementation, several questions merit discussion.

  1. Callback Interleaving: In concurrent mode, on_tool_call and on_tool_result may fire concurrently (interleaved). Is this acceptable?

    • Current Behavior: on_tool_call → execution → on_tool_result (sequential per tool).
    • Concurrent Behavior: Same semantics per tool, but multiple tools run simultaneously.
    • Implication: Users with thread-unsafe callbacks need to handle synchronization.
    • Recommendation: Document this clearly; :async is fiber-safe (cooperative), :threads requires care.
  2. Error Handling: If one tool fails, should others continue?

    • Recommendation: Yes, continue all tools. The adapter can handle errors (see resilient example). Exceptions propagate naturally.
  3. Rate Limiting: Should we add semaphore support in the future?

chat.with_tools(W, concurrency: :async, max_concurrency: 5)

    • Recommendation: Not in v1; users can implement this via custom adapters (see the thread pool example above).
  4. Multiple with_tools Calls: How should repeated calls handle the concurrency setting?

chat.with_tools(A, concurrency: :async).with_tools(B) # B resets to nil or inherits?

    • Recommendation: Each call overrides the setting (last one wins, consistent with replace:).
  5. Async Gem as Dependency: Should we add async as a runtime dependency or keep it optional?

    • Recommendation: Keep it optional and require it when :async is used (current proposal; see the Gemfile sketch after this list).
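
Under the keep-it-optional recommendation, applications opt in explicitly. A sketch of a consuming application's Gemfile, not a change to RubyLLM's gemspec:

# Gemfile
gem 'ruby_llm'
gem 'async' # needed only when passing concurrency: :async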

Implementation Plan: A Phased Approach

The implementation of concurrent tool execution will follow a phased approach, ensuring a smooth and well-tested integration.

  1. Phase 1: Core Implementation
    • Add concurrency: argument to with_tools.
    • Add registry with RubyLLM.register_tool_executor.
    • Ship :async adapter.
    • Add tests.
    • Update documentation.
  2. Phase 2: Refinements (if needed)
    • Add error handling patterns.
    • Add :threads adapter if requested.
    • Performance benchmarks.
  3. Phase 3: Advanced Features (if requested)
    • Rate limiting (semaphore support).
    • Ractor adapter for CPU-bound tools.
    • Per-tool concurrency control.

Conclusion: Embracing Concurrency for Enhanced RubyLLM Applications

This proposal introduces concurrent tool execution to RubyLLM, aligning with the library's philosophy of beautiful, minimal APIs. By using the simple keyword argument concurrency: :async, developers can unlock significant performance improvements. The extensible registry pattern allows for future adapter additions, and the included :async adapter caters to I/O-bound tools. Backward compatibility is preserved, ensuring a smooth transition for existing applications.

The design adheres to crmne's preference for options like Async and Ractors without necessitating heavy handler classes. The registry pattern simplifies the addition of new concurrency modes, making RubyLLM a versatile and powerful tool for building intelligent applications.

Suggested Issue Title: "Support concurrent execution of multiple tool calls"

Suggested Label: [FEATURE]

For more information on concurrent programming in Ruby, visit the official Ruby Concurrency Documentation.