Avalanche Consensus - Does it Perform as Promised? Part 3

Philipp Schneider with contributions by Ignacio Amores-Sesar and Christian Cachin

The third part of the series deals with the more advanced consensus protocols introduced in the whitepaper, called “Snowflake” and “Snowball”, and in particular with their mechanisms for finalizing a node's consensus decision. Further, we outline a weakness in these mechanisms and propose a change (“Blizzard”) that resolves it.

Part 3: Snowflake, Snowball and Blizzard

Snowflake, and for the most part also Snowball, build on the basic randomized-sampling principle of Slush (with Snowball introducing somewhat deeper changes that we will not dive into here). The main difference from Slush that we want to highlight is the introduction of a mechanism with which nodes can finalize, i.e., irrevocably decide on, an emerging majority opinion once it becomes apparent from the samples they have obtained.

The goal is to provide the nodes with a condition to settle on an opinion that safely ensures consensus. The condition of finalization provided by the whitepaper uses a parameter \(\beta\) to control the probability that this mechanism fails (larger \(\beta\) implies a lower probability of failure). More concretely, in Snowflake a node finalizes its opinion as soon as it observes \(\beta\) samples in a row that all have an \(\alpha\) majority of one and the same opinion.
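To make this condition concrete, here is a minimal Python sketch of the finalization loop described above. It is an illustration, not the whitepaper's pseudocode: the function `query_sample` is a placeholder for querying \(k\) random nodes and reporting which opinion (if any) reached an \(\alpha\) majority in that sample.

```python
# Minimal sketch of the Snowflake finalization rule: finalize as soon as
# beta consecutive samples show an alpha-majority for one and the same opinion.
# `query_sample` is a placeholder: it should query k random nodes and return
# the opinion (0 or 1) that reached an alpha-majority, or None if none did.

def snowflake_decide(query_sample, beta):
    opinion = None   # currently preferred opinion
    streak = 0       # consecutive alpha-majorities for `opinion`
    while True:
        majority = query_sample()
        if majority is None:
            streak = 0                       # no alpha-majority: streak is broken
        elif majority == opinion:
            streak += 1                      # same opinion again: extend the streak
        else:
            opinion, streak = majority, 1    # switch opinion, restart the streak
        if streak >= beta:
            return opinion                   # beta consecutive majorities: finalize

# Example usage with a toy sampler that always reports an alpha-majority of 1:
decided = snowflake_decide(lambda: 1, beta=15)   # returns 1 after 15 samples
```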

The intuition behind this is as follows. For a node to end up finalizing the (wrong) minority opinion, it must sample an \(\alpha\) majority of that minority opinion in \(\beta\) consecutive samples. The probability of this decreases exponentially as \(\beta\) increases (it is at most \(1/2^\beta\)), which all but ensures that no node decides as long as the network has not yet converged to a strong majority. Only once the network has established such a majority (after \(O(\log n)\) rounds) will all nodes decide on this majority opinion, because it then becomes sufficiently likely that they observe \(\beta\) such majorities in a row in their samples.
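For readers who want the bound spelled out, here is a sketch of one way to see it, under the assumptions that the \(\beta\) samples are independent and that \(\alpha \ge \lceil \tfrac{k+1}{2} \rceil\), so that an \(\alpha\) majority of the minority opinion appears in any single sample with probability at most \(1/2\):

\[
\Pr[\text{finalize the minority opinion}] \;\le\; \prod_{i=1}^{\beta} \Pr\big[\alpha\text{-majority of the minority in sample } i\big] \;\le\; \Big(\tfrac{1}{2}\Big)^{\beta}.
\]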

A Weakness of Snow Consensus in Terms of Liveness

The explanation above does not yet take into account a Byzantine adversary that may take control of a few nodes. The finalization condition requires \(\beta\) consecutive queries with one and the same \(\alpha\) majority. However, even if there is only a small chance of sampling an \(\alpha\) majority of Byzantine nodes in a single query, the chance that this happens at least once in a sequence of \(\beta\) queries grows relatively fast in \(\beta\).

This gives a Byzantine adversary a high chance to invalidate such a sequence by “owning” at least one sample (i.e., obtaining an \(\alpha\) majority of Byzantine nodes) in any given sequence of \(\beta\) consecutive samples. This forces the nodes to start over many times, potentially stalling them for a long time. Ultimately, this effect implies a bad trade-off between safety and latency: requiring the probability of a safety failure to decrease sufficiently fast in \(\beta\) (inverse-polynomially in \(\beta\)) implies that the number of rounds by which a node can be delayed grows very fast as a function of \(\beta\) (super-polynomially in \(\beta\), in technical terms).
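To illustrate this effect numerically, the short Python computation below shows how quickly the probability of at least one adversary-owned sample within a window of \(\beta\) consecutive samples grows. The per-sample probability \(p\) is an arbitrary value chosen for illustration, not a quantity from the whitepaper or our paper.

```python
# Probability that at least one of beta consecutive samples is "owned" by the
# adversary (i.e., shows an alpha-majority of Byzantine nodes), assuming each
# sample is owned independently with probability p. The value p = 0.05 is an
# arbitrary illustration.

def interruption_probability(p: float, beta: int) -> float:
    return 1.0 - (1.0 - p) ** beta

p = 0.05
for beta in (10, 20, 50, 100):
    print(beta, round(interruption_probability(p, beta), 3))
```

Even with \(p = 0.05\), an interruption within a window of \(\beta = 20\) samples occurs with probability of roughly \(64\%\), and each interruption forces the streak of \(\beta\) consecutive majorities to restart.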

Blizzard: An Improved Finalization Condition

This liveness risk can be avoided by slightly changing the finalization condition; we call the resulting protocol “Blizzard” to distinguish it. Instead of requiring \(\beta\) majorities of one and the same opinion in a row, each node maintains two counters that track the total number of rounds in which the node observes an \(\alpha\) majority of opinion 0 or 1, respectively. The node finalizes, i.e., decides on, the opinion with the larger counter as soon as the difference between the two counters surpasses a calculated threshold that depends on \(\beta\) and \(n\). We show that

  1. Blizzard ensures safety with a low failure probability (decreasing exponentially in \(\beta\)),
  2. tolerates a certain number of Byzantine nodes, and
  3. requires only \(O(\log n + \beta)\) rounds until finalization.
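As an illustration of the modified condition, here is a minimal Python sketch of the Blizzard finalization rule. It makes the same assumptions as the earlier sketch: `query_sample` is again a placeholder, and `decision_threshold` stands in for the calculated threshold depending on \(\beta\) and \(n\), whose exact form is given in the research paper.

```python
# Sketch of the Blizzard finalization rule: instead of a streak, keep one
# counter per opinion for the rounds in which that opinion had an
# alpha-majority, and decide once the gap between the counters is large
# enough. `query_sample` returns the opinion (0 or 1) with an alpha-majority
# in a sample of k nodes, or None; `decision_threshold` is a placeholder for
# the threshold derived in the paper.

def blizzard_decide(query_sample, decision_threshold, beta, n):
    counters = {0: 0, 1: 0}   # rounds with an alpha-majority for opinion 0 / 1
    while True:
        majority = query_sample()
        if majority is not None:
            counters[majority] += 1
        if abs(counters[0] - counters[1]) >= decision_threshold(beta, n):
            # Finalize the opinion with the larger counter.
            return 0 if counters[0] > counters[1] else 1
```

In contrast to the streak-based rule, a single adversary-owned sample shifts the counters by at most one and no longer resets all progress made so far.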

This provides a significantly better trade-off between performance and safety compared to Snowflake and Snowball. Our bottom line is that, although safety against the limited adversary considered in the whitepaper is not at risk, there is a loophole allowing a (relatively weak) adversary to stall finalization. This loophole can be closed with the relatively simple modification of the finalization condition introduced by Blizzard.

Conclusion and Further Reading

We answer the question of whether the Avalanche consensus protocol performs as promised in the affirmative, albeit with some caveats. As far as latency is concerned, Slush and, in most cases, also the Snowflake and Snowball protocols achieve fast consensus as postulated in the whitepaper, even for large \(n\).

Furthermore, we demonstrate the effects of the parameters \(k\) and \(\alpha\) on latency. Here we discover the first caveat: the tuning of \(k\) and \(\alpha\) that the whitepaper suggests for optimizing latency is either of very limited benefit (when increasing \(k\) too much) or outright detrimental (when choosing any value of \(\alpha\) other than \(\lceil \tfrac{k+1}{2} \rceil\)).

We remark that the Snowball protocol in the whitepaper envisions a dual role for the parameter \(\alpha\), where values close to \(k\) can be beneficial for safety. With this in mind, it is interesting to point to a more recent paper by AvaLabs, summarized in this blogpost, that “splits” the parameter \(\alpha\) into independent values for the different roles it plays in the protocol, in order to optimize for both safety and latency.

The second caveat concerns an unfavorable trade-off in the finalization condition: in order to maintain a high level of safety, an adversary is implicitly given the opportunity to stall the protocol. We provide a simple change and prove that it avoids this effect. The recent paper by AvaLabs proposes a different change that mitigates cases of unexpectedly high latency of Snowball more generally, by temporarily switching to a more robust protocol that, however, comes with worse performance guarantees.

All details, more background, protocols in pseudocode notation, and the detailed analyses are available in the complete research paper.

Written on May 14, 2024