Justin Paulson · 2025-04-30 · ruby, rails

Streaming LLM Responses with Rails: SSE vs. Turbo Streams

In the world of Rails development, integrating large language models (LLMs) like OpenAI's GPT has become increasingly common. One challenge developers face is streaming these responses efficiently to provide a smooth user experience.

This post will explore some different techniques for streaming LLM responses in Rails applications. We'll look at using server-sent events (SSE) and Turbo Streams as two different options for delivering streaming interfaces in Rails applications. We'll also provide some code examples for a demo chat application we made — it has three different bot personalities you can interact with through SSE or Turbo Streams.

Implementing LLM streaming with SSEs

SSEs are a simple yet effective way to push data from a server to a webpage in real time. They're particularly well suited to streaming LLM responses thanks to their simplicity and wide browser support.

How SSEs work

With SSE, the server holds a connection open and sends events to the client, while the client listens for those events and updates the UI accordingly. Event names must be coordinated between server and client: the name and data for each event are application-defined, and there is no standard set of events to expect in an SSE lifecycle. This provides flexibility, but you can run into issues if the backend and client fall out of sync.
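Under the hood, the event stream is just text: each event is a block of `field: value` lines terminated by a blank line. Here's a minimal sketch of that framing (the `sse_frame` helper is illustrative, not part of Rails):

```ruby
require "json"

# Illustrative helper: formats one SSE frame the way a server writes it
# to a text/event-stream response. Not the Rails implementation.
def sse_frame(data, event: nil)
  frame = +""
  frame << "event: #{event}\n" if event
  frame << "data: #{data.to_json}\n\n" # blank line terminates the event
  frame
end

puts sse_frame({ text: "Hello" }, event: "text")
```

The client's `EventSource` parses these frames and dispatches them to the listener registered for each `event:` name.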

Implementing SSEs in Rails

To create an SSE response from a Rails controller, you first need to include the ActionController::Live mix-in.

include ActionController::Live

After that, the SSE class provides all interactions with the event stream. For our case, we'll also set the Content-Type to text/event-stream. Here is how we write events for an SSE stream:

response.headers['Content-Type'] = 'text/event-stream'
sse = SSE.new(response.stream)

begin
  ChatBot.response do |chunk, _|
    message = chunk.dig('choices', 0, 'delta', 'content') || ''
    sse.write({ text: message }, event: 'text')
  end
  sse.write({}, event: 'done')
ensure
  # Always close the stream, even if the LLM call raises,
  # so the connection isn't left open.
  sse.close
end

Here, we have created a text event and a done event that will stream to the client. On the client side, you initiate the connection and then create listeners for each of the relevant events. Here is how that might look in a Stimulus controller:

connect() {
  const botType = this.data.get("bot") || "ChatBot";
  const messagesJSON = JSON.stringify([]);

  this.eventSource = new EventSource(`/chat?bot=${botType}&messages=${encodeURIComponent(messagesJSON)}`);
  this.assistantMessage = "";

  this.eventSource.addEventListener('text', this.handleMessageEvent.bind(this));
  this.eventSource.addEventListener('done', this.handleDoneEvent.bind(this));
}

handleMessageEvent(event) {
  const data = JSON.parse(event.data);
  this.assistantMessage += data.text;
  this.outputTarget.textContent = this.assistantMessage;
}

handleDoneEvent(event) {
  this.eventSource.close();
}

Adding the streaming functionality

You can implement the streaming functionality in two ways, depending on your preferred approach.

For SSE streaming, the implementation focuses on writing chunks directly to the event stream:

def stream_with_sse(chunk)
  message = chunk.dig('choices', 0, 'delta', 'content') || ''
  sse.write({ text: message }, event: 'text')
end

For Turbo Streams, the implementation updates a database record that triggers broadcasts:

def stream_with_turbo(chunk)
  content = chunk.dig('choices', 0, 'delta', 'content') || ''
  message.update(content: message.content + content) unless content.empty?
end

This dual approach allows for flexibility in how you deliver streaming content to users. With SSE, you get direct streaming through an event source connection. With Turbo Streams, you leverage Rails' broadcasting capabilities to update the UI in real time as message records also update.
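Both helpers dig the incremental text out of each streaming chunk. A runnable sketch of that extraction, with illustrative OpenAI-style chunk shapes (the hashes below are sample data, not real API output):

```ruby
# Illustrative OpenAI-style streaming chunks: each carries a small delta.
chunk = { 'choices' => [{ 'index' => 0, 'delta' => { 'content' => 'Hello' } }] }
final_chunk = { 'choices' => [{ 'index' => 0, 'delta' => {} }] }

# dig walks the nested structure. The final chunk of a stream has an
# empty delta, so the || '' fallback avoids passing nil along.
text = chunk.dig('choices', 0, 'delta', 'content') || ''
final_text = final_chunk.dig('choices', 0, 'delta', 'content') || ''

text       # => "Hello"
final_text # => ""
```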

HTTP method flexibility in SSE

It's important to note that while the browser's native EventSource API only supports GET requests, it's also possible to initiate SSE-style connections using POST requests via a fetch-based client. This flexibility can be particularly useful when dealing with large amounts of data or sensitive information.

GET requests

These are typically used for simple SSE connections where minimal data needs to be sent to the server.

Advantages: Simple to implement

Disadvantages: Limited by URL length (typically around 2,000 characters), less secure for sensitive data

POST requests

These can be used when you need to send larger amounts of data to initiate the SSE connection.

Advantages: No practical limit on data size, can send complex data structures, more secure for sensitive information

Disadvantages: Slightly more complex to implement, not cacheable or bookmarkable, and requires a front-end dependency (such as @microsoft/fetch-event-source) to handle the POST, since the native EventSource API doesn't support it.

Here is how you would use POST with SSE in Rails:

class ChatController < ApplicationController
  include ActionController::Live

  def chat
    response.headers['Content-Type'] = 'text/event-stream'
    @sse = SSE.new(response.stream)

    # Access POST parameters
    bot_type = params['bot'] || 'ChatBot'
    @messages = JSON.parse(params['messages'])

    begin
      chat_bot = create_chat_bot(bot_type)
      chat_bot.response
    ensure
      @sse.write({}, event: 'done')
      @sse.close
    end
  end

  private

  def create_chat_bot(bot_type)
    # Note: calling constantize on user input is risky in production;
    # restrict bot_type to a known allowlist of bot classes.
    bot_type.constantize.new(sse: @sse, messages: @messages)
  end
end

On the client side, you'd initiate the SSE connection with a POST request:

import { fetchEventSource } from '@microsoft/fetch-event-source';

const messagesJSON = JSON.stringify(yourConversationHistory);

// using POST (the native EventSource API only supports GET, so this
// relies on a fetch-based client such as @microsoft/fetch-event-source)
fetchEventSource('/chat', {
  method: 'POST',
  body: JSON.stringify({ messages: messagesJSON }),
  headers: {
    'Content-Type': 'application/json',
  },
  onmessage(event) {
    const data = JSON.parse(event.data);
    // Update your UI with the new data
  },
});

// using GET
const url = '/chat?' + new URLSearchParams({ messages: messagesJSON }).toString();
const eventSource = new EventSource(url);

eventSource.addEventListener('message', (event) => {
  const data = JSON.parse(event.data);
  // Update your UI with the new data
});

By using POST requests with SSE, you can overcome the limitations of URL length and more securely handle sensitive or large amounts of data, such as conversation history for LLM interactions.
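To make the URL-length concern concrete, here's a quick sketch that serializes a modest conversation history and measures the URL-encoded query string against the rough 2,000-character budget mentioned earlier (the sample history is made up):

```ruby
require "json"
require "cgi"

# A made-up 40-message history of modest length.
history = Array.new(40) do |i|
  { role: i.even? ? "user" : "assistant",
    content: "A fairly typical chat message of moderate length." }
end

encoded = CGI.escape(history.to_json)
encoded.length > 2_000 # true: already too long for a safe GET URL
```

Even a short conversation blows past the budget once JSON quoting is percent-encoded, which is why the POST approach matters for LLM chat histories.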

Implementing LLM streaming with Hotwire and Turbo Streams

Now, let's explore how to implement LLM response streaming using Hotwire and Turbo Streams in a vanilla Rails 7+ setup. We'll create a message model that updates in real time as the LLM generates content.

Setting up your Rails project

First, ensure your Rails 7+ project has Hotwire installed. It should be included by default in new Rails 7 applications.

Creating the message model

Generate a message model with role, content, and bot attributes (the model below validates all three):

rails generate model Message role:string content:text bot:string
rails db:migrate

Update the app/models/message.rb file to include broadcasting:

class Message < ApplicationRecord
  BOT_NAMES = {
    "ChatBot" => "Default",
    "BelleBot" => "Southern Belle",
    "PirateBot" => "Pirate"
  }

  validates :role, presence: true, inclusion: { in: %w[user assistant] }
  validates :content, presence: true
  validates :bot, presence: true, inclusion: { in: BOT_NAMES.keys }

  scope :for_bot, ->(bot) { where(bot: bot) }

  broadcasts_to ->(message) { [message.bot, "messages"] }, inserts_by: :prepend
end

This setup will automatically broadcast updates to the bot-specific [bot, "messages"] stream whenever a message is created, updated, or destroyed.
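Conceptually, broadcasts_to registers commit callbacks that push a freshly rendered partial to the named stream on every change. A plain-Ruby simulation of that trigger flow (FakeMessage and its subscriber list are illustrative stand-ins, not Turbo internals):

```ruby
# Illustrative stand-in for a broadcasting model: every update notifies
# stream subscribers, mimicking the after_update_commit callback that
# broadcasts_to installs.
class FakeMessage
  attr_reader :bot, :content

  def initialize(bot:, content: "")
    @bot = bot
    @content = content
    @subscribers = Hash.new { |h, k| h[k] = [] }
  end

  def subscribe(stream, &block)
    @subscribers[stream] << block
  end

  def update(content:)
    @content = content
    # Mirrors: broadcasts_to ->(message) { [message.bot, "messages"] }
    @subscribers[[bot, "messages"]].each { |cb| cb.call(self) }
  end
end

message = FakeMessage.new(bot: "PirateBot")
received = []
message.subscribe(["PirateBot", "messages"]) { |m| received << m.content }

message.update(content: "Ahoy")
message.update(content: "Ahoy there")

received # => ["Ahoy", "Ahoy there"]
```

In the real app, the "subscriber" is the browser holding a turbo_stream_from connection, and the payload is rendered HTML rather than raw content.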

Updating the controller

Update your MessagesController to handle message creation and updates:

class MessagesController < ApplicationController
  def create
    Message.create(message_params)

    @messages = Message.for_bot(@bot)
    @message = Message.create(role: 'assistant', content: '', bot: @bot)

    chat_bot = create_chat_bot(@bot)
    chat_bot.response
  end

  def index
    @messages = Message.for_bot(params[:bot]).reverse
    @current_bot = params[:bot]

    respond_to do |format|
      format.html
      format.turbo_stream do
        render turbo_stream: turbo_stream.replace("messages", partial: "messages/index", locals: { messages: @messages, current_bot: @current_bot })
      end
    end
  end

  private

  def message_params
    @bot = params.require(:bot)
    { content: params.require(:content), bot: @bot, role: 'user' }
  end

  def create_chat_bot(bot)
    bot.constantize.new(messages: @messages, message: @message)
  end
end

Generating a view

Create a view to display messages and handle real-time updates:

<!-- app/views/messages/index.html.erb -->
<div id="chat">
  <%= turbo_stream_from [@current_bot, "messages"] %>

  <div id="messages">
    <%= render partial: "messages/index", locals: { messages: @messages, current_bot: @current_bot } %>
  </div>

  <%= form_with(url: messages_path, method: :post, local: false) do |f| %>
    <%= f.hidden_field :role, value: 'user' %>
    <%= f.text_field :content %>
    <%= f.hidden_field :bot, value: @current_bot %>
    <%= f.submit "Send" %>
  <% end %>

  <div class="bot-selector">
    <% Message::BOT_NAMES.each do |bot, name| %>
      <%= link_to name, messages_path(bot: bot), data: { turbo_frame: "_top" }, class: @current_bot == bot ? 'active' : '' %>
    <% end %>
  </div>
</div>

Create a partial to render individual messages:

<!-- app/views/messages/_index.html.erb -->
<%= turbo_frame_tag "messages" do %>
  <% messages.each do |message| %>
    <%# dom_id gives each message the element ID that broadcast updates target %>
    <div id="<%= dom_id(message) %>" class="message <%= message.role %>">
      <%= message.content %>
    </div>
  <% end %>
<% end %>

Starting background jobs for LLM processing

Create a background job to handle the LLM response generation:

class GenerateLLMResponseJob < ApplicationJob
  queue_as :default

  def perform(message_id, bot_type)
    user_message = Message.find(message_id)
    assistant_message = Message.create(role: 'assistant', content: '', bot: bot_type)

    messages = Message.for_bot(bot_type)

    # Create the appropriate bot instance for streaming
    chat_bot = bot_type.constantize.new(message: assistant_message, messages: messages)

    chat_bot.response
  end
end

How streaming works with Rails broadcasting

  1. When a user submits a message in the Turbo Streams implementation, a new message record is created.
  2. An empty "assistant" message record is immediately created and will fill gradually.
  3. The LLM integration is triggered to generate content.
  4. As content is generated chunk by chunk, two potential streaming paths occur:
    • SSE path: Each chunk is written directly to the event stream with sse.write({ text: message }, event: 'text').
    • Turbo Streams path: Each chunk updates the message record with message.update(content: message.content + content).
  5. With Turbo Streams, Rails automatically broadcasts each update via the broadcasts_to setup in the message model.
  6. The client receives these real-time updates and progressively renders the response.
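Steps 2 and 4 of the Turbo path can be sketched in plain Ruby (FakeMessage is an illustrative stand-in for the ActiveRecord model):

```ruby
# Illustrative stand-in for the persisted assistant message.
FakeMessage = Struct.new(:content) do
  def update(attrs)
    self.content = attrs[:content]
  end
end

# Step 2: an empty assistant message is created up front.
message = FakeMessage.new("")

# Step 4: each generated chunk appends its delta to the record;
# in the real app, every update also triggers a broadcast (step 5).
["Ahoy", " there", ", matey!"].each do |content|
  message.update(content: message.content + content)
end

message.content # => "Ahoy there, matey!"
```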

This dual approach provides flexibility in how streaming content is delivered. SSE offers direct streaming without database overhead, and Turbo Streams leverages Rails' broadcasting for a more integrated approach with persistence.

Both methods achieve the same user experience goal: showing the LLM response generating in real time rather than making users wait for the complete response.

SSE vs. Turbo Streams: A comparison

When choosing between SSEs and Turbo Streams for streaming LLM responses, it's important to consider their respective advantages and disadvantages.

SSEs

Advantages include:

  1. Simplicity: SSE is straightforward to implement and understand.
  2. Wide browser support: Most modern browsers support SSE natively.
  3. Real-time updates: They provide a true streaming experience with minimal latency.
  4. Lightweight: They use a single HTTP connection, which is more efficient than polling.
  5. Automatic reconnection: Browsers automatically attempt to reconnect if the connection is lost.

Disadvantages include:

  1. One-way communication: SSE is unidirectional; the server can only send data to the client.
  2. Limited to same-origin requests by default: Cross-origin requests require additional setup.
  3. Maximum number of open connections: Browsers typically limit the number of SSE connections.

Turbo Streams

Advantages include:

  1. Rails integration: There is a deep integration with the Rails ecosystem and Hotwire.
  2. Flexibility: Turbo Streams can work over WebSocket, SSE, or in response to form submissions.
  3. HTML-based: Turbo Streams sends HTML fragments, which can be easier to work with in some cases.
  4. Partial page updates: It efficiently updates only the necessary parts of the page.
  5. Progressive enhancement: This approach works with or without JavaScript, improving accessibility.
  6. Persistence: Messages are persisted in the database, making it easier to maintain conversation history.
  7. Scalability: Background job processing allows for better handling of concurrent requests.

Disadvantages include:

  1. Learning curve: Turbo Streams requires an understanding of Hotwire and Turbo concepts.
  2. Overhead: For simple streaming tasks, there might be more overhead than SSE.
  3. Less granular control: The abstraction layer might limit fine-grained control in some scenarios (for instance, manual intervention to stop a process can be difficult or delayed with background jobs).
  4. Potential for larger payload: Sending HTML fragments can be more verbose than sending raw data.

Choosing the right approach

The choice between SSE and Turbo Streams depends on your specific use case.

  • Use SSE when:
    • You need a simple, lightweight streaming solution.
    • You want fine-grained control over the streaming process.
    • Your application doesn't heavily rely on other Hotwire features.
  • Use Turbo Streams when:
    • You're already using Hotwire in your Rails application.
    • You need to update multiple parts of the page simultaneously.
    • You want to leverage Rails' built-in view-rendering capabilities.
    • You need a solution that gracefully degrades for clients without JavaScript.
    • You want to take advantage of Rails' built-in broadcasting and job processing features.

For streaming LLM responses, both approaches can provide an excellent user experience. SSE offers a more straightforward, low-level approach, while Turbo Streams provides a more integrated solution within the Rails ecosystem. The Turbo Streams approach we've detailed in this post offers the additional benefits of easy bot personality-switching and seamless integration with active record and background jobs.

Ultimately, the best choice depends on your application's specific requirements, your team's familiarity with Hotwire, and how the LLM interaction fits into your broader application architecture.

Learn more

To see a complete working example of these approaches, check out our sample repository on GitHub. It demonstrates both SSE and Turbo Streams implementations for streaming LLM responses in Rails.



Justin Paulson

Justin is excited to work on scaling engineering organizations. He is a director of software engineering at Aha! — the world’s #1 product development software. When he isn't building software, Justin enjoys triathlons, music, and spending time with his family of six.
