Graphene Interim Report

This document summarizes the current progress on Graphene version 2 as described in BUIP 093. BUIP 093 is divided into two top-level objectives: phase 1 and phase 2. Phase 1 is nearing completion and phase 2 is expected to begin within the next one or two months. We begin with a brief summary of the tasks in phase 1 and their status.

Status

  1. [IN PROGRESS] Protocol specification for current IBLT implementation
  2. [COMPLETE] Use SipHash plus salt for generating Graphene cheap hashes: Implemented as part of PR 1551 and merged to the Bitcoin Unlimited (BU) dev branch on March 18, 2019.
  3. [COMPLETE] Allow arbitrary random seeds to be used for IBLT hash functions: Implemented as part of PR 1582 and merged to BU dev branch on March 19, 2019.
  4. [REVISED] Use filterInventoryKnown Bloom filter: Based on our research documented here, it was decided that using filterInventoryKnown to estimate the contents of the receiver mempool is not an effective strategy for reducing IBLT decode failures. Instead, the research suggested a particular IBLT padding scheme is a better approach. Based on those findings, we implemented this IBLT padding scheme as part of PR 1426 and merged it to BU dev on December 17, 2018.
  5. [COMPLETE] Improve Graphene optimization algorithm: Implemented as part of PR 1525 and merged to BU dev on January 4, 2019.
  6. [DROPPED] Re-tune Graphene parameters: We conducted new research that determined the reduction in Bloom filter variance that can be achieved by retuning parameters as suggested in previous research would lead to exceptionally large IBLTs. As a result, we decided not to pursue this change.
  7. [DEFERRED] Implement Expedited Graphene: After a discussion with the BU dev team, it was decided that implementation of expedited Graphene blocks would be deferred to phase 2 because the changes necessary for its implementation will involve similar portions of the code.
  8. [COMPLETE] Use CFastFilter: Implemented as part of PR 1603 and merged to the BU dev branch on April 9, 2019.
  9. [COMPLETE] Reduce IBLT checksum: Implemented with all tests passing on a branch of BU dev. However, because of the complexity of the change, the BU dev team decided to defer reviewing / merging this code until after the upcoming release as part of phase 2.
  10. [IN PROGRESS] Update protocol specification for Graphene

During the course of phase 1 we also had an opportunity to leverage the new canonical transaction ordering (CTOR) in Graphene as part of PR 1532.

Current Performance

Decode failure rate

Prior to phase 1 work, a significant fraction of Graphene blocks failed because of decode failures in the IBLT. In our own testing, the failure rate could reach as high as 10% during periods of high transaction throughput (such as stress tests). However, since PR 1426 was merged, we have consistently observed decode failure rates below 0.5%, which is slightly lower than the desired theoretical failure rate as defined in the IBLT optimization repository and encoded in the IBLT parameters set here. The exact failure rate from a recent test is reported below.

Compute optimization

Two PRs were responsible for improvements in Graphene block compute time. The first was PR 1525, which uses an O(1) heuristic for determining optimal Bloom filter and IBLT sizes for sufficiently large blocks. Prior to this improvement, the optimal sizes were determine by brute-force parameter search, an O(n) operation (where n is the number of transactions in the receiver's mempool). This heuristic trades up to 10% of Graphene's compression on smaller blocks (those containing fewer than 500 transactions) for O(1) optimization time.

The second compute improvement was PR 1603, which adapts the existing CFastFilter code to be usable by Graphene. The idea behind CFastFilter is to use the entropy from the transaction hash for the entropy in the Bloom filter. This allows for the Bloom filter to avoid computing any hashes, a big performance improvement. Overall, this change was able to reduce the time required to assemble, and subsequently reconstruct, a Graphene set by approximately 30% (test results are reported in the output of the Graphene unit test).

Block Compression

The only work merged during phase 1 to the BU dev branch directly targeted at improving Graphene block compression was PR 1532, which allows Graphene to leverage CTOR. Soon, we hope to also merge our work on reducing the IBLT checksum size. Below are the results of a 440 block trial run on mainnet with a binary built from commit d52c79e0 on the BU dev branch. The results were generated using this script, which parses the raw output log from our test node. The test commenced with block 000000000000000000f908a28ba23790adfed60720086cb84ccec7eaada20f95. Note: these test results do not include our recent checksum changes.

We experienced 2 decode failures out of 541 blocks, which amounts to a failure rate of 0.37% (fewer than one failed block per day). Of those 541 Graphene blocks, only 4 required an additional round-trip to fetch missing transactions (the sizes of these fetched transactions are included in the compression rate calculated below). Compression rate was calculated with the equation $1 - g/f$, where $g$ and $f$ are the sizes (in bytes) of the Graphene and full blocks, respectively. Compression was significantly better for larger blocks. The best compression rate achieved during the test was 0.999, which came from the block having the most transactions (2545). In the table below, we report compression statistics for all blocks and those with more than 1K transactions.

block txs number of blocks mean compression median compression
$\geq 0$ 539 0.953 0.989
> 1K 17 0.998 0.998

The following boxplot gives a better idea of how compression rates scale with the number of transactions in a block. The raw data for the plot can be found here. Blocks are grouped according to transaction count, and each group (marked along the x-axis) represents a multiple of 50 transactions, e.g. every block in the group marked "150" has between 150 and 200 transactions. Each box in the plot depicts the inter-quartile range of compression rates during our test for that particular group of blocks. As can be seen from the plot, nearly all blocks with more than 250 transactions had a compression rate exceeding 0.99. There are two technical things to note about the plot. First, because it is the only block in the group marked "2500", the largest block is not obvious in the plot --- it is rendered as a single dashed line very close to 1.0. Second, in order to more clearly see compression for larger blocks, the y-limit on this plot is truncated at 0.95. But it is important to point out that there are numerous small blocks with compression rate below the mean.