
DevOps Practices for Crypto Infrastructure, Part I: Version Control, Full Stack Automation, and Secrets Management

When standing up services that will have cryptographic interactions with a blockchain, the DevOps infrastructure and practices you employ will dictate a lot about the security and reliability of those services. In this two-part series of posts, I will introduce core DevOps principles that will help guide crypto infrastructure creation. I’ll also share different aspects of DevOps infrastructure that have worked well for me and could be helpful to other teams looking to stand up crypto-as-a-service offerings.

Cloud vs Roll-Your-Own

For many infrastructure elements, you must choose whether to go with a cloud provider such as AWS, Azure, or Google, or to roll-your-own in a colocated data center with self-managed software. In crypto and blockchain, there are some specific requirements, particularly relating to key security, which may factor into requirements around hardware security modules (HSMs), physical servers, and tiers of colocated data centers (more on this later).

But in general, all other things being equal, if one of the three major cloud providers offers an as-a-service option, there are a lot of reasons to choose it over rolling your own with purchased hardware and self-managed software.

In my experience, it is very easy to underestimate the investment and labor required to self-manage infrastructure elements in a high-quality way over time. Especially when the software is open source, the temptation is always to just pull down the software and start running it. The comparison then tends to focus on the cost of the hardware versus the cost of the cloud service. The DevOps staff required to manage, upgrade, performance-tune, patch, and evolve this infrastructure over time is almost always underestimated by startup teams and becomes baggage as the team and the company grow. For any piece of infrastructure, you really have to ask yourself if this is the best use of your team’s time.

In most cases, you will want to focus your energy on things that only you can do, and purchase services where possible from a reputable cloud provider. The three major cloud providers (AWS, Azure, Google) all have large and highly specialized teams surrounding each of their as-a-service offerings. For smaller companies, there is no way you are going to do a better job with management and security than these cloud provider teams for base/commodity offerings.

My take: go with a cloud provider or (better yet) more than one cloud provider, so you can focus on building and running things that you can’t purchase as a service and that are unique to your offering.


Version Control

In recent years, the idea of infrastructure-as-code has become a leading principle in DevOps. This is part of a larger evolution of DevOps that continues to shift the discipline towards looking more and more like a software development practice. A core part of any software development practice is storing all your software artifacts in a version control repository. Artifacts can include source code, configuration files, data files, and in general any of the inputs needed to build your software and your infrastructure environment. It seems like a given, but I have seen operational environments where not all of the artifacts necessary to build the environments were stored in source control.

The benefit of storing everything under version control is that you have a unique version for a given state of the artifacts used to build your environments. This allows for the repeatable build of environments, the implementation of processes around change to these artifacts, and the ability to roll back to any previous known good state in case there are issues. High-quality and cost-effective cloud-based services such as GitHub make this an easy choice to serve as a foundation for DevOps activity.

Full Stack Automation

One of the best things about using the cloud for your infrastructure is the programmability and APIs that the cloud vendors provide. These APIs can be used to automate the entire application stack from base layer network, DNS, storage, compute, up to operating systems and serverless functions, and all the way through to the custom code in your application. Taking an infrastructure-as-code approach means having software artifacts in your source code repository and a build process that can create an entire application environment in a fully automated way. This automation can be used to drive the initial build and incremental change to development, test, and production environments.

There are good tooling options these days to achieve this kind of infrastructure automation. At the base infrastructure level, there are solutions native to cloud provider environments such as AWS CloudFormation or Google Cloud Deployment Manager. We are fans of Terraform as it allows for the management of infrastructure in AWS, Azure, and Google from the same codebase with provider-specific modules and extensions. Once the base level infrastructure has been provisioned, Packer images combined with configuration management tools like Ansible, Chef, or Puppet can be used to configure host-based services.

There are a lot of benefits to be had from automating the full application stack. Automation eliminates the chance of manual errors and allows for a repeatable process. It also can drive the same stack into dev, test, and prod, thus minimizing the chances of environmental differences leading to surprises. Automation can also be used to support blue/green production deploys in which an entire new environment is built with updated code and then traffic is cut over from the existing to the new environment in a controlled fashion. In addition, it is easy to roll back in this model if there is a problem with the new environment.
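The blue/green cutover described above can be sketched in a few lines. This is a toy model rather than a real deployment tool: the `Router` class, the environment names, and the `is_healthy` callback are all hypothetical stand-ins for whatever load balancer or DNS API and health check you actually use.

```python
from typing import Callable, Optional


class Router:
    """Toy stand-in for a load balancer or DNS record pointing at one environment."""

    def __init__(self, active: str) -> None:
        self.active = active                   # environment currently receiving traffic
        self.previous: Optional[str] = None    # remembered so rollback is a single flip

    def cut_over(self, new_env: str, is_healthy: Callable[[str], bool]) -> bool:
        """Switch traffic to new_env only if it passes the health check."""
        if not is_healthy(new_env):
            return False                       # traffic stays on the existing environment
        self.previous, self.active = self.active, new_env
        return True

    def roll_back(self) -> None:
        """Point traffic back at the environment that was active before the cutover."""
        if self.previous is not None:
            self.active, self.previous = self.previous, self.active


router = Router(active="blue")
router.cut_over("green", is_healthy=lambda env: True)  # hypothetical always-healthy check
print(router.active)  # prints green
router.roll_back()
print(router.active)  # prints blue
```

The point of the sketch is that the old environment is left untouched during the cutover, which is what makes the rollback cheap.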

Full stack automation also lends itself to the switch from thinking about servers as unique elements with individual character to managing servers as interchangeable elements. It becomes a straightforward proposition to rip and replace troublesome infrastructure and to use tightly-focused servers rather than sprawling snowflakes that acquire dozens of responsibilities and take on a life of their own.

Secrets Management

When you have an automated environment it is very important that the secrets that are part of your application are managed carefully. Secrets could include service passwords, API tokens, database passwords, and cryptographic keys. The management of crypto keys is particularly critical for crypto infrastructure where private keys are present, such as exchange infrastructure and validators on proof of stake networks. Read my recent blog to learn more about crypto key management using multisig accounts and offline keys.

However, a lot of the same principles apply to infrastructure, application, and crypto secrets. You want to make sure that these secrets are not in your source code repo, but rather that they are obtained at build or, better yet, at runtime in the different environments in which your application is running.
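As a minimal sketch of obtaining a secret at runtime instead of committing it to the repo, the helper below reads it from the process environment and refuses to continue if it is missing. The variable name `DB_PASSWORD` and the `load_secret` helper are hypothetical; in practice the environment would be populated by your deployment tooling or a secrets manager rather than by hand.

```python
import os
from typing import Mapping


class MissingSecretError(RuntimeError):
    """Raised when a required secret is not present in the environment."""


def load_secret(name: str, env: Mapping[str, str] = os.environ) -> str:
    """Return the secret called `name`, failing loudly if it is absent or empty."""
    value = env.get(name, "")
    if not value:
        raise MissingSecretError(f"secret {name!r} is not set")
    return value


if __name__ == "__main__":
    # DB_PASSWORD is a made-up name used only for illustration.
    try:
        password = load_secret("DB_PASSWORD")
        print("loaded a secret of length", len(password))
    except MissingSecretError as err:
        print("refusing to start:", err)
```

Failing at startup when a secret is absent is a deliberate choice here: it avoids falling back to a default value that might end up hard-coded in source control.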

Software and platform-native tools that help protect secrets in production environments include AWS KMS/CloudHSM, Azure Key Vault, and HashiCorp Vault if you are looking for something cross-platform. Some very sensitive secrets such as crypto private keys can benefit from hardware key management systems such as YubiHSM2 and Azure Dedicated HSM based on SafeNet Luna hardware. The downside is that hardware solutions are generally less cloud-friendly than software ones and, while they may improve key security, some aspects of security are worsened by taking a hardware approach over a more automatable cloud-native software approach. The infrastructure costs and surface area that needs to be managed can also be far higher when taking a hardware-centric approach.

Intel SGX is a promising hardware technology that allows processes to run in secure enclaves. A process running in a secure enclave is isolated from the host operating system. This means that even with root privileges on that operating system, you cannot read the memory of the process running in the SGX enclave. I am excited by the use of SGX enclaves combined with e.g. HashiCorp Vault to improve the security of software and cloud-native secrets management. SGX is available today via Azure Trusted Compute but has the downside of requiring coding to the SGX APIs. We eagerly await further developments of the AWS Nitro architecture, which we believe will greatly improve the security of software and cloud-native secrets management. Nitro is the AWS approach to providing hardware support for isolation of customer workloads on shared infrastructure.

Topics to Cover in Part II

There are many aspects to consider when thinking about secure and reliable infrastructure for crypto-based applications. We’ve only touched on a handful of areas in this article. Here are some additional areas I cover in Part II:

  • Authentication
  • Authorization and Roles
  • Networking
  • Monitoring
  • Logging



Participation Keys in Algorand

What Are Algorand Participation Keys?

In Algorand, there are 2 types of nodes: relay nodes and participation nodes. Relay nodes serve as network hubs in Algorand, relaying protocol messages very quickly and efficiently between participation nodes. Participation nodes support the consensus mechanism in Algorand by proposing and validating new blocks. Participation keys live on participation nodes and are used to sign consensus protocol messages.

A participation key in Algorand is distinct and totally separate from a spending key. When you have an account in Algorand there is an associated spending key (or multiple keys in the case of a multi-sig account). The spending key is needed to spend funds in the account. A participation key, on the other hand, is associated with an account and is used to bring stake online on the network. Importantly, participation keys cannot be used to spend funds in the associated account; they can only be used to help support the consensus protocol.

Participation Keys Are Good

Having distinct keys for spending the Algo in an account, and staking the Algo in an account, results in several key security improvements.

In any crypto network, protecting the spending keys is of the utmost importance. Situations that require having spending keys on an internet connected computer are inherently dangerous and always contain the risk of loss of funds.

In Algorand, the spending key never has to be online. The spending key can be kept on an airgapped computer or other offline setup and only used for signing transactions offline. The participation key, in contrast, lives on the participation node and signs protocol messages, but the participation key cannot spend any funds in the account.

This separation of duties in 2 different keys improves the security of Algorand infrastructure substantially. Spending keys can always be kept totally offline and an attacker, if they are able to compromise an internet connected participation node, cannot spend or steal any of the funds in the associated account.

Of course, this doesn’t mean that participation keys shouldn’t be highly protected and secured. If an attacker does compromise a participation key, they can stand up a second participation node with the same participation key. This will result in protocol messages being double-signed, which the network will see as malicious behavior and will treat the node / associated stake as offline.

There is no bonding or slashing in Algorand, and staking rewards are still coming in the future, but regardless: being forced offline due to double signing is undesirable and means that the stake in question will no longer be supporting the consensus mechanism.

Participation Key Mechanics

My examples assume Algorand Node v1 software is installed and running in a participation node configuration on the Algorand MainNet. The software is installed using the Debian package on Ubuntu 18.04, with a standard non-multi-sig Algorand account with some Algo in it, and a separate offline computer with the spending key for the account.

To create a participation key you will need to use the “goal account addpartkey” command and specify the account that you want to create the part key for and a validity range:

goal account addpartkey -a WHNXGKYOVIQADYS4VTYBG6SGWFIG6235C5LMXM76J3LHE475QJLIHUC5KY --roundFirstValid 789014 --roundLastValid 4283414

A few things to note. The account specified in the -a flag in the command above (WHNXGKYOVIQADYS4VTYBG6SGWFIG6235C5LMXM76J3LHE475QJLIHUC5KY) is made up and you would need to replace it with your account. Do not use this account as it, and the associated spending key, are not real. Any funds sent to this address will be permanently lost.

The validity range is specified in rounds. Rounds are equivalent to blocks in Algorand. So if you, for example, want to have a key that is valid from now until a point in the future, you need to find the current block height for the roundFirstValid and a future block height for the roundLastValid flag corresponding to the validity range you want.

To find the current block height you can use the “goal node status” command:

derek@algo-node:~$ goal node status
Last committed block: 789014
Time since last block: 2.4s
Sync Time: 0.0s
Last consensus protocol: https://github.com/algorandfoundation/specs/tree/5615adc36bad610c7f165fa2967f4ecfa75125f0
Next consensus protocol: https://github.com/algorandfoundation/specs/tree/5615adc36bad610c7f165fa2967f4ecfa75125f0
Round for next consensus protocol: 789015
Next consensus protocol supported: true
Genesis ID: mainnet-v1.0
Genesis hash: wGHE2Pwdvd7S12BL5FaOP20EGYesN73ktiC1qzkkit8=

The last committed block, which is the same as the current block height, is reported as 789014, so we use that for our roundFirstValid. Figuring out the right value for the roundLastValid is a little more involved.
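If you want to script this step, the round can be pulled out of the status text. The sketch below assumes the output contains a “Last committed block:” line formatted as in the sample above; the exact format may differ between node versions, and `last_committed_round` is a hypothetical helper name.

```python
import re


def last_committed_round(status_output: str) -> int:
    """Extract the round number from the 'Last committed block:' field."""
    match = re.search(r"Last committed block:\s*(\d+)", status_output)
    if match is None:
        raise ValueError("no 'Last committed block' field found")
    return int(match.group(1))


# Sample text copied from the status output shown in this article.
sample = """Last committed block: 789014
Time since last block: 2.4s
Sync Time: 0.0s
"""
print(last_committed_round(sample))  # prints 789014
```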

First, you have to determine what time range you want. It is a good practice to rotate participation keys and not to create a key with a really long validity range. In our example, we will use a time range of 6 months. What round corresponds to 6 months from now?

To figure that out, we have to do a little math. 6 months is approximately 182 days. So 182 days x 24 hours / day x 60 min / hour x 60 sec / min = 15724800 seconds. At the time of writing, each round in Algorand takes about 4.5 sec. So 15724800 seconds / 4.5 seconds per block = 3494400 blocks. Now we need to add 3494400 to the current block height to get the height 6 months from now: 3494400 + 789014 = 4283414. This is where the 4283414 in the command above comes from for the roundLastValid.

As the network grows, the 4.5 second block time may not be a safe assumption, so the actual validity range may turn out slightly different from 6 months. You need to monitor key validity and make sure to put a new key in place before the old one expires.
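The arithmetic above can be written as a small helper. This is only a sketch: the function name is hypothetical, and the 4.5 second default is the approximate round time quoted in this article, which you should replace with a current measurement.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def last_valid_round(first_valid: int, days: int, seconds_per_round: float = 4.5) -> int:
    """Estimate the round reached `days` after `first_valid` at a fixed round time."""
    rounds = round(days * SECONDS_PER_DAY / seconds_per_round)
    return first_valid + rounds


# 182 days starting from round 789014, as in the worked example.
print(last_valid_round(789014, days=182))  # prints 4283414
```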

Once the addpartkey command has executed, you can find the participation key at:

/var/lib/algorand/mainnet-v1.0/WHNXGKYOVIQADYS4VTYBG6SGWFIG6235C5LMXM76J3LHE475QJLIHUC5KY.789014.4283414.partkey

It’s beyond the scope of this article, but this file is actually a SQLite database containing a number of keys, which are rotated through automatically during the validity window. This is an additional security measure that is part of Algorand, where the keys used to sign protocol messages are rotated as rounds progress.
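Because key expiry has to be monitored, it can help to read the validity window back out of the file name. The sketch below assumes the name always follows the `<account>.<first round>.<last round>.partkey` pattern seen in the example path above; both helper names are hypothetical, and the 4.5 second round time is again only the approximation used in this article.

```python
from pathlib import PurePosixPath
from typing import Tuple


def parse_partkey_name(path: str) -> Tuple[str, int, int]:
    """Split '<account>.<first>.<last>.partkey' into account, first and last round."""
    name = PurePosixPath(path).name
    account, first, last, suffix = name.rsplit(".", 3)
    if suffix != "partkey":
        raise ValueError(f"not a partkey file: {name}")
    return account, int(first), int(last)


def days_until_expiry(last_valid: int, current_round: int,
                      seconds_per_round: float = 4.5) -> float:
    """Rough number of days left before the key's last valid round."""
    remaining_rounds = max(last_valid - current_round, 0)
    return remaining_rounds * seconds_per_round / 86400


# EXAMPLEACCOUNT is a placeholder, not a real address.
_, first, last = parse_partkey_name(
    "/var/lib/algorand/mainnet-v1.0/EXAMPLEACCOUNT.789014.4283414.partkey"
)
print(days_until_expiry(last, current_round=first))  # prints 182.0
```

A monitoring job could compare this estimate against a threshold and raise an alert when a replacement key needs to be generated.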

With the participation key created, the next step is to bring the account online. An account being online in Algorand means that the Algo in the account is supporting the consensus mechanism. We bring an account online by using the “goal account changeonlinestatus” command. Note that this action requires that you have a small amount of Algo in the account to pay for the transaction. If you have the spending key for the account directly on the participation node, you can simply run this command:

goal account changeonlinestatus -a WHNXGKYOVIQADYS4VTYBG6SGWFIG6235C5LMXM76J3LHE475QJLIHUC5KY -o=1

However, having the spending key on the participation node is not recommended and kind of defeats the whole purpose of having participation keys in the first place. It is much better to have an airgapped and totally offline computer that has the spending key on it. The process is a little more involved with this setup, but it is much more secure. With this setup you would issue the following command instead:

goal account changeonlinestatus -a WHNXGKYOVIQADYS4VTYBG6SGWFIG6235C5LMXM76J3LHE475QJLIHUC5KY -o=1 -t online.tx

This will produce a transaction file called online.tx in the current directory which has an unsigned transaction to bring the account online. This transaction file then needs to be securely moved to the airgapped computer with the spending key on it. Once on the airgapped computer you can use the algokey utility to sign the transaction file. The command would be:

algokey sign -k spendingkeyfile -t online.tx -o online.tx.signed

Note that algokey is standalone and does not need a running Algorand node. Also, the spendingkeyfile is the file that has the spending key for the account. This file can be created by algokey when you first set up your account.

There is also an option to specify the spending key mnemonic instead of a file, but I find this option worse as it leaves the mnemonic in the shell history, etc. The result of this command is that online.tx.signed will be created in the current directory. This file contains the signed online transaction and it needs to be securely moved back to the running participation node.

Once you have online.tx.signed back on the participation node you can send it to the network with the following command:

goal clerk rawsend -f online.tx.signed

Wait a little bit for the transaction to be processed, and your account should now be online. The creation of a transaction file, movement to the airgapped machine to sign the transaction, movement of the signed transaction back to the online node, and then sending the signed transaction to the network is a general pattern for sending transactions in Algorand without ever putting your spending key online.
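The three-step pattern can be summarized as data. The sketch below only builds the command lines used in this article and prints them; it does not run anything, since the signing step has to happen on a different, airgapped machine. The function name and the placeholder account are hypothetical.

```python
from typing import List


def online_tx_commands(account: str, key_file: str,
                       tx_file: str = "online.tx") -> List[List[str]]:
    """Return the argv lists for the create, sign, and send steps, in order."""
    signed = tx_file + ".signed"
    return [
        # 1. On the participation node: write the unsigned transaction to a file.
        ["goal", "account", "changeonlinestatus", "-a", account, "-o=1", "-t", tx_file],
        # 2. On the airgapped machine: sign it with the spending key file.
        ["algokey", "sign", "-k", key_file, "-t", tx_file, "-o", signed],
        # 3. Back on the participation node: send the signed transaction.
        ["goal", "clerk", "rawsend", "-f", signed],
    ]


for argv in online_tx_commands("YOUR_ACCOUNT_ADDRESS", "spendingkeyfile"):
    print(" ".join(argv))
```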

Final Thoughts on Participation Keys in Algorand

The design of Algorand using separate keys for spending funds and for participating in network consensus improves the security of nodes running on the Algorand network substantially by protecting spending keys and removing the need for them to ever be online. I think this was a good design choice and wouldn’t be surprised if other protocols adopt this approach.

 


Why We Started PureStake

Many of us at PureStake were just starting our careers in the mid-to-late 90s, during the first internet wave. Since then, we have spent the last 20-plus years building infrastructure, software, and cloud companies based on the possibilities opened up by the internet. I recall the atmosphere and feeling of those early internet days, and I hadn’t experienced that feeling again until I started getting involved with crypto.

The crypto genie is out of the bottle, and it has unleashed forces which cannot be stopped or contained. We believe that using blockchains to move value in an open, low friction, low-cost way will have as large an impact on all of us as the internet has had in moving information in an open, low friction, low-cost way. We are only at the beginning of a historical shift where crypto networks and applications will disintermediate many existing companies, structures, and practices, replacing them with code.

While the strategic direction of this shift is clear, the particulars of how this shift will play out are harder to call. That said, we have several beliefs that we stand behind:

  1. The future will be a multi-chain future vs one-chain-to-rule-them-all. In this future, bitcoin will continue to have a foundational place in the ecosystem, but there will also be many other blockchains, each of them good at different things.
  2. Public and permissionless blockchains will lead the way in terms of innovation and interesting applications vs private and permissioned ones.
  3. Proof of Stake consensus protocols are a more scalable, more efficient, and ultimately more secure consensus mechanism versus more traditional Proof of Work consensus protocols.

As decentralized currencies, networks, and applications continue to mature and get traction, we believe there is a large opportunity to provide infrastructure as a service to support participation in and development on these decentralized networks.

We are taking all of our experience building and running cloud services and applying it to crypto infrastructure. Given that this infrastructure will be directly handling value, the security and reliability of our services must come first (and features will sometimes have to come second).

We use a software-first approach to solving problems. Treating our infrastructure as code and using software engineering best practices to deliver change to our infrastructure is one example of this. We aim to hide infrastructural complexity from our users and customers. We want to provide them with services that are simple to consume, freeing them to focus on the reasons they want to interact with the blockchain vs the details and mechanics of how to interact with the blockchain.

We will engage closely with a select number of networks that we believe in. We want to focus our energy on fewer vs more networks to be able to go deep on them to understand how they work, their nuances, their APIs, and their infrastructure needs. As we build expertise on specific networks we will be giving back to those networks in the form of services, tools, and information that help the community. Our goal is to provide secure and reliable blockchain infrastructure that participants can depend on and that developers can build upon.

The first network we are focused on is Algorand. Algorand is currently in TestNet and will be launching their MainNet soon.

Why Algorand? We personally know many of the people on the Algorand team. They have extremely talented engineering, research, and business operations teams. We believe in Silvio Micali, Steve Kokinos, and the team they have assembled. We think they can execute on a complicated and difficult roadmap in a way that other projects have historically struggled to.

Our experience with the Algorand software and network has been similarly positive. The quality of the code, the security and design innovations, and the rich set of financial primitives have all made a big impression. We believe the performance we have seen on the TestNet, achieved without significant sacrifices to security or decentralization, will move the needle among public blockchains and in blockchain design in general.

We are excited to be one of the companies helping to support the upcoming Algorand MainNet network launch and look forward to engaging with participants and developers in the Algorand community.

Stay tuned for updates on our journey by signing up for our newsletter, or feel free to contact us if you are developing an Algorand application or need help with blockchain infrastructure.