Algorand Node API Acceleration

Accelerating Performance of the Algorand Node API


It’s been a few months since we launched our Algorand API as a Service. In that time, we have gotten a lot of great feedback on what is working well and what still needs more work.

One thing we have both heard from users and observed ourselves is that certain operations time out and don’t return any data. Digging into this, we found that one of the queries we provide has highly variable performance. Our API service fronts pools of Algorand nodes, and calls are serviced by the Algorand node REST API on those nodes. The performance issues we are seeing stem from a specific query to the node REST API itself.

The /v1/account/{address}/transactions Endpoint

The REST endpoint that has the performance issues is /v1/account/{address}/transactions. This query is only available if you run a full archival indexer node.

The purpose of this endpoint is to query the transaction history for a given account. It is a useful query from a developer’s point of view: you could use it, for example, to populate a transaction history view when looking at an account in a wallet or other application.

Sometimes this query returns quickly, and sometimes it times out. By default, the query has a maximum result set of 100. Given an account, the indexer node starts walking backwards from the head of the chain, looking for transactions that meet the query constraints until it fills its result bucket.

This becomes problematic for accounts with relatively low transaction volume. If there has been a lot of transaction activity on the account, the query will reach the result set limit quickly and return. If there has not been a lot of transaction activity, it will keep walking backwards — all the way to genesis — looking for transactions that meet the query criteria. In this second, low-transaction activity scenario, the query generally times out and also has the side effect of making the node unresponsive to other queries.

This second scenario is problematic, since users are actively hitting this endpoint, which starts taking nodes in our node pools offline for periods of time. While we have plenty of capacity in our node pools, if enough of these queries were received in rapid succession, we could suffer an outage due to a kind of denial of service situation.

Even for accounts with a large transaction volume, the query becomes non-performant when parameters restricting the scope of the search, such as fromDate and toDate, are used, and the service suffers.
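To make the shape of such a call concrete, here is a minimal sketch in Python using the requests library. The host name, the X-API-Key header, and the max result-limit parameter are assumptions for illustration only; fromDate and toDate are the parameters discussed above.

# Minimal sketch of calling the endpoint with scope-restricting parameters.
# The host, the X-API-Key header, and the "max" parameter name are assumptions
# for illustration; fromDate and toDate are the parameters discussed above.
import requests

BASE_URL = "https://example-algod-host"  # placeholder host
ADDRESS = "YOUR-ACCOUNT-ADDRESS"

resp = requests.get(
    f"{BASE_URL}/v1/account/{ADDRESS}/transactions",
    params={"fromDate": "2019-07-01", "toDate": "2019-07-31", "max": 100},
    headers={"X-API-Key": "your-api-key"},  # placeholder credential header
    timeout=30,  # fail fast on the client rather than waiting indefinitely
)
resp.raise_for_status()
transactions = resp.json().get("transactions", [])
print(len(transactions), "transactions returned")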

Improving the Endpoint with a Backend Datastore

First, it is important to recognize that the node can’t be optimized for every situation. It already performs a variety of different roles including supporting consensus, relaying, etc. However, given the current behavior of this query, we must provide an improved path to our customers so they can reliably retrieve transaction data.

We have decided to replace the backend handler for this query with a datastore that is optimized to return results much more quickly and efficiently.

This backend datastore is based on AWS Aurora and includes a set of AWS Lambda data management routines that keep it reliably in sync with the Algorand TestNet, MainNet, and BetaNet. Queries to this particular endpoint will be serviced by the new datastore; all other queries will continue to be serviced by our node pools.
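As a rough sketch of how such a handler might work (illustrative only, not PureStake’s actual implementation), a Lambda function could look up an address’s transactions in an Aurora table indexed by account and round. The table name, column names, and the psycopg2 driver are all assumptions.

# Illustrative only - not PureStake's actual implementation. Assumes an Aurora
# PostgreSQL cluster with a hypothetical "account_transactions" table indexed
# by (address, round DESC), and a standard AWS Lambda handler signature.
import json
import os
import psycopg2

def handler(event, context):
    address = event["pathParameters"]["address"]  # account to look up
    params = event.get("queryStringParameters") or {}
    limit = int(params.get("max", 100))

    conn = psycopg2.connect(
        host=os.environ["DB_HOST"],
        dbname=os.environ["DB_NAME"],
        user=os.environ["DB_USER"],
        password=os.environ["DB_PASSWORD"],
    )
    try:
        with conn.cursor() as cur:
            # With an index on (address, round DESC), this lookup stays fast
            # no matter how long the chain is or how quiet the account is.
            cur.execute(
                """
                SELECT tx
                FROM account_transactions
                WHERE address = %s
                ORDER BY round DESC
                LIMIT %s
                """,
                (address, limit),
            )
            txs = [row[0] for row in cur.fetchall()]
    finally:
        conn.close()

    return {"statusCode": 200, "body": json.dumps({"transactions": txs})}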


Philosophical Questions

The direct node API represents the truth in terms of the state of the Algorand blockchain. The downside to servicing this query with an alternate, third-party data store is that there is a chance the third party will return the wrong data due to a bug or other problem with the infrastructure.

At PureStake, we spent some time debating this situation. We could have just turned off the endpoint, but that didn’t seem like a good solution, and certainly not helpful to developers trying to build on Algorand. After all, this is a useful query to have available for a variety of use cases.

Additionally, we had to consider that the way the query works against the node isn’t necessarily what most developers want. Even for the scenarios where the query returns — accounts with high transaction activity — you only get the last 100 transactions, not all of them.

In the spirit of serving the intent behind this endpoint, we decided to move ahead with offering a more performant version of this query, while keeping it compatible with the available SDKs. However, we will clearly distinguish between queries that are serviced by the node and ones that are serviced by other infrastructure we run. We will mark this particular query, in the response headers and in our portal, as being serviced differently from the other node API queries. We feel this approach strikes a good compromise and helps developers building on Algorand.
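For example, a client could inspect the response headers to tell how a call was serviced. The header name below is purely hypothetical, since the actual marker is not specified here; the snippet is only meant to show the idea.

# Hypothetical sketch: the real header name used to mark datastore-backed
# responses is not specified in this article, so "X-Backend" is a placeholder.
import requests

resp = requests.get(
    "https://example-api-host/v1/account/YOUR-ACCOUNT-ADDRESS/transactions",
    headers={"X-API-Key": "your-api-key"},  # placeholder credential header
    timeout=30,
)
backend = resp.headers.get("X-Backend", "node")  # placeholder header name
if backend != "node":
    print("This response was serviced by the query-optimized datastore")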

Comparative Performance of the New Endpoint

To take a concrete example, the following query, when run against a TestNet indexer node, will time out (on a reasonably spec’d machine):

GET /v1/account/EWZYOHWLR2C44MDIPNZMOGZMAAY66BWI2ALGJB2NE22TWO7YGNCS7NTFVQ/transactions

Account EWZYOHWLR2C44MDIPNZMOGZMAAY66BWI2ALGJB2NE22TWO7YGNCS7NTFVQ has only 2 transactions in its history, fewer than the 100-result query limit, so the node starts walking back towards genesis looking for transactions. The query times out after 30 seconds for API users, but continues running on the node until complete; in a test for this article, the query was manually stopped after 25 minutes. I/O and IOPS on the node spike while the query runs, consuming the allotted burst capacity and eventually locking up the machine.

Backed by the new datastore, queries to this endpoint generally return in under 1 second, or in 3 seconds or less from a cold Lambda start.
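A quick way to reproduce this kind of comparison is to time the same request against each backend. The host name and credential header below are placeholders; everything else is standard Python.

# Simple latency check: time the same endpoint against a given host.
# The host name and credential header are placeholders.
import time
import requests

def time_request(base_url, address):
    start = time.monotonic()
    requests.get(
        f"{base_url}/v1/account/{address}/transactions",
        headers={"X-API-Key": "your-api-key"},
        timeout=60,
    )
    return time.monotonic() - start

address = "EWZYOHWLR2C44MDIPNZMOGZMAAY66BWI2ALGJB2NE22TWO7YGNCS7NTFVQ"
print("datastore-backed:", round(time_request("https://example-api-host", address), 2), "s")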

Next Steps and Future Direction

It isn’t possible for the node to achieve high performance for all queries.

PureStake’s new datastore will be the basis of a new query-optimized set of endpoints that will be offered alongside the node-based APIs. These APIs would be well-suited to certain types of applications, such as explorers, wallets, etc. However, users will have the choice to opt for the node-backed APIs or the query-optimized APIs, depending on their preferences.

PureStake will likely continue to create different kinds of optimized data stores over time to support different types of queries and use cases. A de-normalized data warehouse is another obvious optimization for aggregate and time-series queries that aren’t possible with the current node API.

We may remove our query-optimized endpoint for this specific node API in the future, if the performance and behavior of the underlying node API changes.

Are there other APIs or features you would like to see added to the PureStake API services application? Reach out to us and let us know.
