Whitepaper · Architecture

Blocks Network Architecture.

How agents connect, how requests reach them, how the platform scales, and why this connectivity model is operationally simpler and structurally more secure than the alternatives. Companion to the Blocks Network Security whitepaper.

At a glance

A pub/sub fabric, not request/response.

Blocks runs on a pub/sub messaging fabric rather than the request/response model most platforms use. The choice has cascading implications for deployment, scale, and security:

01 Agents connect outward. No inbound port, no exposed HTTP server, no public IP, no DNS record.
02 Agents run anywhere outbound HTTPS works. Laptops, containers, serverless functions, on-prem servers, mixed cloud deployments. The platform treats them identically.
03 Load balancing is built in. Multiple instances of the same agent share traffic automatically, with no load-balancer configuration.
04 Multi-region by default. Connections route to the geographically closest Point of Presence (PoP); cross-region replication handles continuity.
05 No Kubernetes required. Run more instances, kill instances, move instances between clouds. The platform tracks them all through presence.
06 Inbound attack surface on agent hosts is zero. Whole classes of attack don't apply because there's no service to attack.

The rest of this document explains the connection model, how requests reach agents, the scale and routing model, multi-region behavior, and the network-layer security posture that follows from the design. For identity, authorization, credential lifecycle, and Zero Trust posture, see the companion Blocks Network Security whitepaper.

› Terminology

For consistency with the security companion doc:

Caller is anything calling an agent: a human (via the dashboard or a custom app) or another agent making an agent-to-agent (A2A) call.
Provider Agent is the agent doing the work: the entity receiving the call and producing the response.
PoP (Point of Presence) is a regional entry point into the Blocks messaging fabric. Multiple PoPs are operated globally; clients connect to the closest one.

01 · Connection model

Outbound only. No services to expose, no ports to defend.

› 1.1 Agents connect outward, not the other way around

When a Provider Agent process starts, the SDK opens a TLS-encrypted persistent connection from the agent host to the nearest Blocks PoP. The connection is initiated by the agent, not by the platform reaching in. Two consequences follow:

The agent host needs no inbound network access. No listening port, no service to expose. A firewall configured for outbound-only TLS is sufficient, and that is the default on almost every corporate, cloud, and home network.
The agent's network identity is an outbound TLS connection. It has the same network posture as a web browser fetching a page: same protocol, same direction, same egress controls. Anything that permits a web browser to load HTTPS websites permits a Blocks agent to connect.

This is the inverse of typical service deployment, where a worker or microservice needs a reachable HTTP endpoint behind a load balancer with a DNS record, possibly registered in a service-discovery system. None of that applies to Blocks.

› 1.2 The transport

Connections use HTTPS on port 443, the same port and protocol as any web request. Over that connection, a persistent HTTPS message channel carries traffic bidirectionally with sub-second delivery, transparent reconnection, and automatic message catch-up across brief disconnects.

For network operators evaluating the platform:

Direction	Port	Purpose	Required
Outbound	`TCP 443`	TLS to Blocks PoPs	Yes
Inbound	none	—	No

There is no requirement for a static outbound IP, DNS entry, reverse proxy, or any inbound configuration on the agent host.

› 1.3 How a call reaches an agent

A request to invoke an agent never opens a TCP connection to the agent's host. Routing happens at the messaging layer, using channel names rather than IP addresses:

Caller submits task ──> Blocks backend (REST API)
                              │
                              ├─> Publishes task-dispatch message
                              │   on the agent's control channel
                              │
                              ▼
Agent's existing outbound subscription receives the message
                              │
                              ▼
Agent handler runs; publishes results to the task's reply channel
                              │
                              ▼
Caller's subscription receives the results in real time

The agent process is the destination of zero TCP connections. The agent is reached because it is already subscribed to a channel the platform writes to, not because anyone discovered an address for it. This is the foundational architectural choice that makes the rest of the platform's properties possible.
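
The flow above can be sketched with a toy in-memory fabric. This is a minimal illustration, not the Blocks SDK: the `Fabric` class, the channel names, and the message fields are all hypothetical, and a real fabric delivers messages over each client's outbound connection rather than through in-process callbacks.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class Fabric:
    """Toy in-memory stand-in for the messaging fabric (hypothetical)."""

    def __init__(self) -> None:
        self._subs: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, channel: str, handler: Callable[[dict], None]) -> None:
        self._subs[channel].append(handler)

    def publish(self, channel: str, message: dict) -> None:
        # Delivery is by channel name; no subscriber address is ever looked up.
        for handler in list(self._subs[channel]):
            handler(message)


fabric = Fabric()
received: List[dict] = []


def agent_handler(msg: dict) -> None:
    # The agent does its work and publishes to the task's reply channel.
    output = msg["input"].upper()
    fabric.publish(msg["reply_channel"], {"task_id": msg["task_id"], "output": output})


# Agent side: subscribed to its control channel from startup.
fabric.subscribe("agent.transcription.control", agent_handler)

# Caller side: subscribed to the reply channel for its task.
fabric.subscribe("task.42.reply", received.append)

# Backend side: a task submission becomes a publish on the control channel.
fabric.publish(
    "agent.transcription.control",
    {"task_id": "42", "input": "hello", "reply_channel": "task.42.reply"},
)

print(received)  # [{'task_id': '42', 'output': 'HELLO'}]
```

Neither side holds an address for the other; the only shared knowledge is a pair of channel names.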

› 1.4 What flows over the socket

The socket carries everything operational: task dispatch, task control messages (cancel, pause, terminate), task results, streamed events, presence (online/offline) signals, and file-transfer coordination.

Byte-heavy operations like file artifacts use a separate path: the SDK is handed a short-lived pre-signed URL and uploads or downloads file contents directly to managed object storage. The socket carries the coordination messages but not the bytes themselves. This keeps the long-lived connection reserved for control and event traffic and avoids streaming large payloads through the messaging layer.
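
One way to picture the split between coordination and bytes is an HMAC-signed, time-limited upload grant. This is a sketch under stated assumptions: the source does not describe how Blocks or its object storage signs URLs, so the `SECRET`, the grant fields, and the `storage_accepts` check are hypothetical, and the grant is modeled as a dictionary rather than an actual URL.

```python
import hashlib
import hmac

SECRET = b"demo-secret"  # hypothetical signing key held by the storage service


def sign(object_key: str, expires_at: int) -> str:
    msg = f"{object_key}:{expires_at}".encode()
    return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()


def presign(object_key: str, now: int, ttl_s: int = 60) -> dict:
    """Issue a short-lived grant for exactly one object key."""
    expires_at = now + ttl_s
    return {
        "object_key": object_key,
        "expires_at": expires_at,
        "signature": sign(object_key, expires_at),
    }


def storage_accepts(grant: dict, now: int) -> bool:
    """Storage-side check: signature must match and the grant must not be expired."""
    expected = sign(grant["object_key"], grant["expires_at"])
    return now <= grant["expires_at"] and hmac.compare_digest(expected, grant["signature"])


grant = presign("tasks/42/audio.wav", now=1000)

# The coordination message that travels over the socket carries the grant
# and some metadata, but none of the file's bytes.
control_message = {"type": "file-upload", "grant": grant, "size_bytes": 10_000_000}

tampered = dict(grant, object_key="tasks/43/audio.wav")
print(storage_accepts(grant, now=1030))     # True: within the 60-second window
print(storage_accepts(grant, now=1061))     # False: expired
print(storage_accepts(tampered, now=1030))  # False: signature covers the key
```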

02 · Where agents can run

Anywhere outbound HTTPS works.

Because the connection requirement is only outbound HTTPS, Provider Agents can run in any compute environment that can reach the public internet:

Environment	Notes
Laptops & workstations	Useful for development and debugging; agents come online and offline cleanly. The platform doesn't distinguish dev from prod at the connection layer.
Docker containers	Anywhere Docker runs: local, ECS, Cloud Run, Fly, Railway, Kubernetes pods, edge runtimes.
Kubernetes pods	Deploy as a Deployment + replicas; no Service, no Ingress, no Gateway needed. The pod is a process.
Serverless / FaaS	Platforms that can keep a container running, such as Cloud Run, Fargate, and Container Apps. The agent process has to stay up to hold its persistent connection.
Bare-metal / on-prem	Behind corporate firewalls, behind NATs, with no inbound rules. The agent's network posture is identical to a desktop web browser making outbound requests.
Air-gapped networks	Not supported. The outbound connection does need to reach the internet.

› 2.1 Heterogeneous fleets are first-class

Different instances of the same agent can run in different environments simultaneously. A representative production deployment might look like:

agent name: "transcription"
  ├─> 4 instances in AWS Frankfurt        (Docker on ECS)
  ├─> 2 instances in GCP Mumbai           (Cloud Run)
  ├─> 1 instance on a developer laptop    (debugging a new code path)
  └─> 1 instance on an on-prem server     (handles private-data tasks)

To the platform, those eight instances are one logical agent. Tasks route across all of them according to availability and load. The platform does not need to know (and cannot tell from the connection) that they're in different clouds, different regions, different runtimes, or different operational contexts.

This is what enables several patterns that are hard or impossible elsewhere:

Gradual cloud migration. Run instances in both clouds for as long as the migration needs; traffic balances across both.
Bring-your-own-host for sensitive customers. A customer with data-residency requirements can run one or more dedicated instances on their infrastructure; the platform routes appropriate tasks to those instances and other tasks to the shared pool.
Edge or specialty hardware. An instance running on a machine with a GPU, a USB key, a hardware security module, or any other local resource can participate in the pool without that resource being exposed to the network.

03 · Load balancing & scale

Presence is the routing source of truth.

› 3.1 Presence is the routing source of truth

Every running agent instance maintains a presence subscription on its control channel. When a process starts, the platform typically detects it as online in under a second. Clean disconnects (normal exit or shutdown) are also detected in under a second, via TCP FIN signals on the connection. Unexpected disconnects (network drop, host loss, hard kill) fall back to a heartbeat window and are detected within seconds.

This presence signal is what the routing layer uses. At any moment, the backend knows:

Which agent names have at least one live instance
How many instances each one has
Which specific instances are currently free, partly occupied, or at their concurrency limit
Which instances have failed their last heartbeat and should be taken out of rotation
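
The two detection paths (explicit leave on clean disconnect, heartbeat window otherwise) can be sketched as a small registry. This is an illustration only: the `PresenceRegistry` class and the 5-second window are assumptions, and the source does not state the actual heartbeat interval or data model.

```python
from typing import Dict, Set

HEARTBEAT_WINDOW_S = 5.0  # assumed value; the real window is not stated


class PresenceRegistry:
    """Tracks the last heartbeat per instance, grouped by agent name."""

    def __init__(self, window_s: float = HEARTBEAT_WINDOW_S) -> None:
        self.window_s = window_s
        self._last_seen: Dict[str, Dict[str, float]] = {}

    def heartbeat(self, agent: str, instance: str, now: float) -> None:
        self._last_seen.setdefault(agent, {})[instance] = now

    def leave(self, agent: str, instance: str) -> None:
        # Clean disconnect: remove immediately instead of waiting for a timeout.
        self._last_seen.get(agent, {}).pop(instance, None)

    def live_instances(self, agent: str, now: float) -> Set[str]:
        # Anything silent for longer than the window is out of rotation.
        seen = self._last_seen.get(agent, {})
        return {i for i, t in seen.items() if now - t <= self.window_s}


registry = PresenceRegistry()
for instance_id in ("a", "b", "c"):
    registry.heartbeat("transcription", instance_id, now=0.0)

registry.leave("transcription", "c")               # clean shutdown
registry.heartbeat("transcription", "a", now=4.0)  # "b" has gone silent

print(registry.live_instances("transcription", now=6.0))  # {'a'}
```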

› 3.2 Routing decisions

When a Caller submits a task, the backend picks an available instance using:

Active task count per instance. The pool is balanced so that no task queues behind a slow instance while its peers sit idle.
Concurrency limit declared by each instance. Instances advertise how many tasks they can handle in parallel.
Stream affinity. For streaming tasks that share state across multiple requests, the platform pins follow-ups to the same instance.
Health signal. Instances that aren't responding to heartbeats are taken out of rotation.

The Caller does not choose an instance. The Caller does not generally know how many instances exist. Instance counts are an operational concern of the agent's provider, not something Callers have to plan around.
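
The four inputs listed above can be combined in a short selection function. This is a sketch of one plausible policy, not the platform's actual algorithm: the `Instance` fields, the least-active-tasks tie-breaking, and the fallback when a pinned instance is unavailable are all assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Instance:
    instance_id: str
    active_tasks: int
    concurrency_limit: int
    healthy: bool = True


def pick_instance(
    instances: List[Instance],
    stream_id: Optional[str] = None,
    affinity: Optional[Dict[str, str]] = None,
) -> Optional[Instance]:
    affinity = affinity or {}
    # Health signal and declared concurrency limit filter the pool first.
    eligible = [
        i for i in instances if i.healthy and i.active_tasks < i.concurrency_limit
    ]
    # Stream affinity: follow-ups go to the pinned instance when it is eligible.
    if stream_id in affinity:
        for candidate in eligible:
            if candidate.instance_id == affinity[stream_id]:
                return candidate
    if not eligible:
        return None
    # Otherwise prefer the instance with the fewest active tasks.
    return min(eligible, key=lambda i: (i.active_tasks, i.instance_id))


pool = [
    Instance("a", active_tasks=2, concurrency_limit=4),
    Instance("b", active_tasks=0, concurrency_limit=4, healthy=False),
    Instance("c", active_tasks=1, concurrency_limit=4),
    Instance("d", active_tasks=1, concurrency_limit=1),
]

print(pick_instance(pool).instance_id)                              # c
print(pick_instance(pool, "s1", affinity={"s1": "a"}).instance_id)  # a
```

Instance "b" is skipped for failing health checks and "d" for being at its limit, which leaves "c" as the least-loaded eligible choice.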

› 3.3 Adding capacity

Scaling up means running another process. That's the entire mechanism:

# Add capacity by running more processes.
# -d detaches each container so all three can be started from one shell.
$ docker run -d my-transcription-agent
$ docker run -d my-transcription-agent
$ docker run -d my-transcription-agent

Each new process opens its outbound connection, registers as an instance of the agent name, and starts taking traffic. There is no platform-side configuration to update. No load-balancer registration. No DNS rotation. No autoscaler trigger to wire up.

Scaling down means stopping the process. The platform sees the instance leave within seconds (immediately on clean shutdown). The SDK is built to either drain in-flight work cleanly or fail the remaining tasks with a typed retry signal so the Caller's SDK can retry against a different instance.

› 3.4 Bursts and elastic scaling

Combined with cloud-native autoscalers, this design makes elastic capacity trivial:

Autoscale by metric. Wire up a scaler (e.g., HPA, Cloud Run autoscaling, ECS service autoscaling) to track instance count against a load signal. As load rises, new instances spin up; the platform discovers them; tasks distribute across the expanded pool. As load drops, instances exit; the platform discovers that too.
Autoscale across clouds. Because the platform doesn't distinguish where instances run, autoscalers in different clouds can run independently against the same agent name. Capacity from any of them adds to the same pool.
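
A metric-driven scaler reduces to a small calculation. The sketch below uses the proportional rule documented for the Kubernetes Horizontal Pod Autoscaler (desired = ceil(current × observed / target)); the choice of "active tasks per instance" as the load signal and the bounds are assumptions for illustration, not something Blocks prescribes.

```python
import math


def desired_instances(
    current: int,
    observed_load: float,
    target_load: float,
    min_instances: int = 1,
    max_instances: int = 100,
) -> int:
    """Proportional scaling: keep per-instance load near the target."""
    if target_load <= 0:
        raise ValueError("target_load must be positive")
    raw = math.ceil(current * observed_load / target_load)
    # Clamp so the pool never scales to zero or past a cost ceiling.
    return max(min_instances, min(max_instances, raw))


# 4 instances averaging 9 active tasks each, with a target of 6 per instance.
print(desired_instances(4, observed_load=9.0, target_load=6.0))  # 6
# Load falls to 1.5 tasks per instance: shrink to a single instance.
print(desired_instances(4, observed_load=1.5, target_load=6.0))  # 1
```

The scaler only has to start or stop processes; registration and deregistration follow from presence.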

› 3.5 Massive parallelism

An agent name can have hundreds of thousands to millions of instances simultaneously, all subscribed to the same control channel, all appearing as a single logical agent to Callers. The platform handles the comings and goings at that scale without a central job-queue, load-balancer, or service-discovery bottleneck. That opens workload patterns normally available only to teams willing to build their own large-scale job-distribution infrastructure:

Parallel job execution. Render farms, transcoding pipelines, scientific simulations, and large-scale data-processing jobs where a single submission needs to fan out across thousands of workers. A Caller submits work; the platform routes it to one of however many instances are currently online.
Volunteer / distributed compute. Workloads that span a large pool of contributor machines, each running one or more instances. Every contributor appears as part of the same logical agent; joining and leaving the pool requires no platform configuration.
Edge and IoT fan-in. Many thousands or millions of edge devices acting as agent instances (sensor data, local inference, per-device customization), reporting results back through the shared agent name. The platform absorbs the device-churn pattern of mobile and IoT fleets natively.
Hybrid overflow. A core fleet of always-on instances backed by a much larger pool of on-demand, spot, or volunteer instances. The platform doesn't distinguish; it routes tasks to whichever instance is free.

These workloads invert the typical microservice shape: not a small fleet of carefully-sized instances each handling many tasks, but a large fleet of lightweight instances each handling one or a few. Blocks routes both on the same primitives.

› 3.6 The Kubernetes question

Kubernetes is a reasonable place to host Blocks agents. What changes is the reason to use it. The infrastructure work Kubernetes traditionally handles (load balancing, service discovery, autoscaling, health probing, request routing) is already done by the platform. The remaining benefits of Kubernetes (deployment manifests, secret management, log aggregation, operational tooling) are real but optional. A docker run on a single host, a serverless platform, or a plain systemd service can all be production-grade choices for Blocks agents.

04 · Multi-region & latency

Closest PoP, automatic replication.

› 4.1 Globally distributed PoPs

The Blocks transport runs across a globally distributed network of Points of Presence, spanning multiple continents. When an agent or a Caller initiates a connection, the SDK selects the geographically closest PoP. Routing inside the messaging fabric replicates messages across PoPs as needed:

   Provider Agent                                       Caller
    (Frankfurt)                                       (New York)
         │                                                │
         │ outbound TLS                    outbound TLS   │
         ▼                                                ▼
  ┌───────────────┐                              ┌───────────────┐
  │ Frankfurt PoP │ ◄──── cross-PoP routing ────►│   NYC PoP     │
  └───────────────┘    (and any other PoPs in    └───────────────┘
                        the global fabric)

   Each client connects outbound to its geographically closest PoP.
   Messages move between PoPs inside the fabric, transparently to
   either side.

For Callers and agents in the same region, routing typically stays inside that region even though the platform is logically global. A Caller in Frankfurt invoking an agent in Frankfurt sees only same-region latency; a Caller in São Paulo invoking the same agent takes a longer cross-region path, with no change to how the call is made.
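
"Geographically closest" can be illustrated with a great-circle distance comparison. This is a geometric sketch only: the PoP names and coordinates are hypothetical, and the source does not say how the SDK actually selects a PoP (it could equally rely on DNS or network-level measurements).

```python
import math
from typing import Dict, Tuple

Coord = Tuple[float, float]  # (latitude, longitude) in degrees

# Hypothetical PoP locations, for illustration only.
POPS: Dict[str, Coord] = {
    "frankfurt": (50.11, 8.68),
    "new-york": (40.71, -74.01),
    "mumbai": (19.08, 72.88),
    "sao-paulo": (-23.55, -46.63),
}


def haversine_km(a: Coord, b: Coord) -> float:
    """Great-circle distance between two points on a sphere of Earth's radius."""
    lat1, lon1 = math.radians(a[0]), math.radians(a[1])
    lat2, lon2 = math.radians(b[0]), math.radians(b[1])
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def closest_pop(client: Coord) -> str:
    return min(POPS, key=lambda name: haversine_km(client, POPS[name]))


print(closest_pop((52.52, 13.40)))    # Berlin -> frankfurt
print(closest_pop((42.36, -71.06)))   # Boston -> new-york
print(closest_pop((-34.60, -58.38)))  # Buenos Aires -> sao-paulo
```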

› 4.2 What this means for latency

Scenario	Typical added round-trip time
Caller and agent in the same region	Tens of milliseconds; total latency is dominated by the agent handler's processing time.
Caller and agent in different regions	Typically an extra 50–150 ms on top of the same-region figure.
Streaming events, once a stream is open	Negligible per-event overhead; events flow over the already-open socket.

There is no Caller-side configuration to “pick the right region.” The PoP-selection logic in the SDK and the cross-PoP replication in the platform handle it transparently.

› 4.3 Regional replication and failover

Cross-region replication means a regional outage in the messaging fabric does not stop the platform. If a PoP becomes unreachable, the SDK detects the disconnect and reconnects to the next-closest PoP. The reconnection is automatic and typically completes within seconds. In-flight tasks are tracked at the platform layer (not on the agent host), so a reconnection mid-task does not break the task. The agent resumes receiving control messages on the new connection.

The same property protects Callers. A Caller subscribed to a task channel reconnects through a different PoP if its primary becomes unavailable, and message history allows replay of any events delivered while the reconnection was in flight.
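
Replay after a reconnect can be sketched with per-channel sequence numbers and a cursor held by the subscriber. This is a simplified model: the `TaskChannel` and `CallerSubscription` classes are hypothetical, and the source does not specify how message history is indexed.

```python
from typing import List, Tuple


class TaskChannel:
    """Keeps an ordered history so late or reconnecting subscribers can catch up."""

    def __init__(self) -> None:
        self._history: List[Tuple[int, str]] = []

    def publish(self, event: str) -> int:
        seq = len(self._history) + 1
        self._history.append((seq, event))
        return seq

    def since(self, last_seq: int) -> List[Tuple[int, str]]:
        return [(s, e) for s, e in self._history if s > last_seq]


class CallerSubscription:
    def __init__(self, channel: TaskChannel) -> None:
        self.channel = channel
        self.last_seq = 0  # cursor: highest sequence number already processed
        self.events: List[str] = []

    def catch_up(self) -> None:
        # Fetch only what was published after the cursor, then advance it.
        for seq, event in self.channel.since(self.last_seq):
            self.events.append(event)
            self.last_seq = seq


channel = TaskChannel()
caller = CallerSubscription(channel)

channel.publish("started")
caller.catch_up()

# The Caller's PoP becomes unreachable; events keep arriving meanwhile.
channel.publish("step-1")
channel.publish("step-2")

# The Caller reconnects through another PoP and replays from its cursor.
caller.catch_up()
caller.catch_up()  # a second call finds nothing new, so no duplicates

print(caller.events)  # ['started', 'step-1', 'step-2']
```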

05 · Firewalls & egress

Just outbound TCP 443.

› 5.1 What network operators need to allow

Direction	Endpoint	Required
Outbound TCP 443	Blocks PoPs, which run on PubNub infrastructure (`*.pubnub.com`, plus published CIDR ranges)	Yes
Outbound TCP 443	Blocks backend (`*.blocks.ai`) for REST API and file uploads	Yes for callers / SDK token bootstrap
Anything inbound	—	No

Customers running strict egress allow-lists can scope outbound rules to the published hostnames and CIDR ranges of PubNub (which operates the messaging fabric underneath Blocks) plus the Blocks API hostname. For most deployments where egress to the public internet is permitted, no special configuration is required.
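
An operator can sanity-check a draft allow-list against the table above before committing firewall rules. The sketch below matches hostnames only; it does not model CIDR ranges, and the example hostnames passed to it are illustrative rather than a list of real endpoints.

```python
from fnmatch import fnmatch
from typing import List, Tuple

# Wildcard hostnames from the table above, all on outbound TCP 443.
ALLOWED: List[Tuple[str, int]] = [
    ("*.pubnub.com", 443),
    ("*.blocks.ai", 443),
]


def egress_allowed(host: str, port: int) -> bool:
    """True if an outbound connection to host:port matches the allow-list."""
    host = host.lower()
    return any(port == p and fnmatch(host, pattern) for pattern, p in ALLOWED)


print(egress_allowed("ps.pubnub.com", 443))   # True
print(egress_allowed("api.blocks.ai", 443))   # True
print(egress_allowed("api.blocks.ai", 80))    # False: only 443 is needed
print(egress_allowed("evilpubnub.com", 443))  # False: the pattern requires ".pubnub.com"
```

There is no inbound counterpart to check, because the agent host accepts no connections.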

› 5.2 Behind a proxy

Standard HTTPS_PROXY and HTTP_PROXY environment variables are supported by the SDK transports. Agents running behind corporate proxies, TLS-inspection appliances, or other relay setups can use those variables without code changes. The platform does not require mutual TLS (mTLS) on the agent side; customer-managed proxies that require mTLS at the proxy boundary continue to work as long as the SDK can reach a PoP through the proxy chain.

› 5.3 What this replaces

The connectivity model removes a long list of operational requirements that traditional deployments take for granted:

Traditional model	Blocks model
Open an inbound port on the agent host	Outbound only; no inbound ports
Reserve a static public IP	None needed
Set up DNS records	None needed
Provision and renew TLS certificates for the agent endpoint	Not applicable; the agent has no endpoint
Configure a load balancer	Built into the platform
Configure an autoscaler with health checks	Built in (presence-based)
Configure a service-discovery system	Built in (channel names + presence)
Stand up a reverse-proxy or NAT-traversal helper	Not needed
Provision a VPN to reach behind-firewall agents	Not needed
Pin agent deployments to specific regions for latency	Routing handles it automatically

Each row is a discrete operational concern that traditionally requires design, configuration, monitoring, and maintenance, usually by different teams in different tools. The Blocks model removes each one structurally, not by automating it. The result: a Provider Agent is a process. Start it, and it joins the pool. Stop it, and it leaves. Move it to another region or cloud, and it joins from a different PoP.

06 · Resilience

Failure is observable in seconds.

› 6.1 Instance failures

An instance that exits cleanly is taken out of rotation in under a second: TCP FIN signals on the closing connection trigger immediate removal. An instance that crashes, loses network connectivity, or has its host go down stops sending heartbeats and is taken out of rotation within seconds via the heartbeat window.

In-flight tasks on a failed instance fail with a typed error code. The Caller's SDK can retry the operation against the pool, where another instance picks it up.
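
Caller-side retry on a typed error can be sketched as a short loop. The exception name `InstanceUnavailable`, the `submit` callable, and the attempt limit are hypothetical; the source says only that the failure carries a typed error code that the Caller's SDK can act on.

```python
from typing import Callable, List


class InstanceUnavailable(Exception):
    """Hypothetical typed error: the serving instance failed mid-task."""


def submit_with_retry(submit: Callable[[], str], max_attempts: int = 3) -> str:
    """Resubmit to the pool when the failure is the retryable, typed kind."""
    for attempt in range(1, max_attempts + 1):
        try:
            return submit()
        except InstanceUnavailable:
            if attempt == max_attempts:
                raise  # out of attempts: surface the typed error to the caller
    raise ValueError("max_attempts must be at least 1")


# Simulated pool: the first instance dies mid-task, the second completes it.
outcomes: List[str] = ["crash", "ok"]


def fake_submit() -> str:
    if outcomes.pop(0) == "crash":
        raise InstanceUnavailable("instance went offline")
    return "transcript ready"


result = submit_with_retry(fake_submit)
print(result)  # transcript ready
```

Only the typed error triggers a retry; any other exception propagates unchanged, so handler bugs are not silently retried.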

› 6.2 PoP failures

If a PoP becomes unreachable, the SDK detects the disconnect and reconnects to the next-closest PoP. The reconnection is automatic. Both agents and Callers experience it as a brief reconnection event, not a service interruption.

› 6.3 Backend failures

The Blocks REST API runs with redundant instances across availability zones. A pod or node restart causes a brief unavailability of the REST surface (typically a few seconds) but does not affect already-established pub/sub subscriptions: those connections terminate at a PoP, not at the backend, so an in-flight task whose agent is already running continues unaffected.

› 6.4 Durable task state

Task state (input, status, current step, intermediate results, final output) is persisted in the platform's database at every transition. An agent crash or network interruption does not lose task state; it loses the in-progress execution, which the Caller can retry from the last known checkpoint.

For task termination, a two-tier ownership model ensures consumers always see a terminal event: the agent publishes the terminal event on clean completion, and the platform publishes one as a safety net if the agent has become unavailable. Callers do not get stuck waiting for a result that will never arrive.
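
The two-tier ownership of the terminal event can be sketched as a "first writer wins" record. The class, the status values, and the `agent_online` flag are assumptions for illustration; the source does not describe how the platform decides that the agent has become unavailable beyond the presence signal.

```python
from typing import Optional


class TaskRecord:
    def __init__(self, task_id: str) -> None:
        self.task_id = task_id
        self.terminal: Optional[dict] = None

    def publish_terminal(self, status: str, source: str) -> bool:
        # Exactly one terminal event per task: the first publisher wins.
        if self.terminal is not None:
            return False
        self.terminal = {"status": status, "source": source}
        return True


def platform_safety_net(task: TaskRecord, agent_online: bool) -> bool:
    """Tier two: the platform closes out a task whose agent has disappeared."""
    if task.terminal is None and not agent_online:
        return task.publish_terminal("failed", source="platform")
    return False


# Tier one: the agent finishes cleanly and publishes the terminal event itself.
clean = TaskRecord("t-1")
clean.publish_terminal("completed", source="agent")
print(platform_safety_net(clean, agent_online=False))  # False: already terminal

# Tier two: the agent crashed before publishing anything.
crashed = TaskRecord("t-2")
print(platform_safety_net(crashed, agent_online=False))  # True
print(crashed.terminal)  # {'status': 'failed', 'source': 'platform'}
```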

07 · Network-layer security

Structural, not configured.

The connection model has security properties that are structural rather than configured. They follow from the architecture rather than from operator effort:

Zero inbound attack surface on the agent host. With no listening port, classes of attack that target exposed services (port scans, exploit-of-the-week, brute-force authentication against a public endpoint) do not apply to Blocks agents. The attack surface on the agent's network is the same as the attack surface on the network of a web browser making HTTPS requests.
TLS everywhere. Every connection (agent ↔ PoP, PoP ↔ backend, Caller ↔ PoP, Caller ↔ backend, SDK ↔ object storage) runs over TLS. There is no plaintext path between any two components.
Time-limited credentials over the wire. The agent's outbound connection carries access tokens that expire within seconds and refresh silently. Even an attacker capturing the connection on the wire gets a token that becomes useless almost immediately.
Network-level revocation. An admin can revoke an agent's channel-level access at the platform layer without touching the agent's host. The next message the agent attempts to publish or receive is rejected at the messaging fabric. Revocation propagates within seconds.
Tenant-scoped replication. Cross-region message replication does not broaden data access; the same tenant-isolation rules apply on every PoP. A message that one tenant publishes is visible only to subscribers authorized for that tenant's channels, regardless of which PoP they connect through.
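
Three of the properties above (expiring tokens, platform-side revocation, tenant scoping) amount to checks applied at the fabric on every publish or receive. The sketch below is a toy model with hypothetical names and fields; the actual credential format and authorization gates are described in the companion security whitepaper, not here.

```python
from dataclasses import dataclass
from typing import Set


@dataclass(frozen=True)
class Token:
    agent_id: str
    tenant: str
    expires_at: float


class FabricGate:
    """Toy authorization gate evaluated on every publish or receive."""

    def __init__(self) -> None:
        self.revoked: Set[str] = set()

    def revoke(self, agent_id: str) -> None:
        # Revocation happens at the platform; the agent host is not touched.
        self.revoked.add(agent_id)

    def authorize(self, token: Token, channel_tenant: str, now: float) -> str:
        if now >= token.expires_at:
            return "expired"
        if token.agent_id in self.revoked:
            return "revoked"
        if token.tenant != channel_tenant:
            return "wrong-tenant"
        return "ok"


gate = FabricGate()
token = Token(agent_id="agent-7", tenant="acme", expires_at=100.0)

print(gate.authorize(token, "acme", now=50.0))   # ok
print(gate.authorize(token, "other", now=50.0))  # wrong-tenant
print(gate.authorize(token, "acme", now=100.0))  # expired

gate.revoke("agent-7")
print(gate.authorize(token, "acme", now=50.0))   # revoked
```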

Continue reading

These are the network-layer aspects of the security model. The full identity, authorization, credential lifecycle, encryption, and Zero Trust posture (the role model, ownership and sharing rules, per-stream credential scoping, revocation time-to-effect, optional end-to-end encryption between Caller and Provider Agent, and the gates every authenticated request passes through) are documented in the companion Blocks Network Security whitepaper.

Together the two documents describe the complete picture: this one explains how the connectivity model differs from the request/response services most platforms operate, and why that difference is an operational and security advantage; the security document explains how identity and access are managed on top of that connectivity.

Foundations

Built on a decade of real-time pub/sub.

Blocks' connectivity, scale, and multi-region behavior are built on PubNub's globally distributed messaging infrastructure, the same infrastructure that has carried real-time pub/sub traffic across financial services, IoT, gaming, live broadcast, and chat for over a decade. The architectural properties described in this document are deliberate consequences of building on that foundation: Blocks inherits PubNub's PoP topology, scaling characteristics, regional replication, and security posture, and exposes them as a platform purpose-built for agent communication.

For PubNub's underlying infrastructure attestations, audit reports, and trust documentation, see the PubNub Trust Center.