Transaction Fee Estimations: How To Save On Gas? Part 2
We are excited to announce our improved Fees API which uses machine learning techniques to improve gas price stability and overall prediction performance. This project is supported by the European Union Regional Development Fund as an open-source initiative to benefit the blockchain community.
Part 1 of this blog series covered transaction fee pricing on Proof-of-Work blockchains (Ethereum & Bitcoin), transfer demand surges, fee volatility, and the resulting need for precise gas fee predictions. This second part discusses the machine learning techniques we implemented to create a significantly better fee prediction mechanism, the challenges we faced in the process, and the current state and future of transaction fees in the Ethereum ecosystem within the context of asset tokenization. Let’s get into it!
Data Aggregation: Collecting transaction confirmation times
As an initial hypothesis to build a regression model for the prediction of gas prices for the Ethereum network, we aimed to determine the distributions of gas prices across a continuous spectrum of confirmation speeds and their corresponding confirmation times. The idea was to look at the fees paid historically for a given confirmation time and estimate what needs to be paid at the time of a new transaction submission. Transaction senders would simply provide their desired time for confirmation, and would receive the gas price that needs to be paid.
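The naive lookup described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name, the sample records, and the rule of returning the cheapest historical price that met the target are hypothetical and not taken from the Fees API.

```python
from typing import List, Optional, Tuple

# Hypothetical historical records: (gas price in Gwei, seconds until confirmation).
HISTORY: List[Tuple[float, float]] = [
    (1.0, 900.0), (2.0, 420.0), (3.0, 240.0),
    (5.0, 95.0), (8.0, 40.0), (12.0, 15.0),
]


def naive_gas_price(history: List[Tuple[float, float]],
                    desired_seconds: float) -> Optional[float]:
    """Return the cheapest historical gas price that confirmed in time.

    Returns None when no recorded transaction confirmed within the
    desired number of seconds.
    """
    qualifying = [price for price, waited in history if waited <= desired_seconds]
    return min(qualifying) if qualifying else None


if __name__ == "__main__":
    # A sender asking for confirmation within 100 seconds.
    print(naive_gas_price(HISTORY, 100.0))
```

A lookup like this has no notion of changing demand, which is the limitation the following sections address.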
The number of transactions (with the exact same gas usage) that can be included per mined block stays the same on the Ethereum network due to the block gas limit. Hence, the naive hypothesis applies only to scenarios with steady demand and high network stability. A further complication is that the time at which a given transaction enters the mempool (the unconfirmed transaction pool from which miners select transactions to mine), prior to getting confirmed, is not stored on the blockchain and has to be obtained some other way.
Generating transaction confirmation data using automated transfers
So, we developed an automated Ethereum transaction sender, checker, and validator that cycles through 7 different gas prices and 20 wallet addresses and auto-sends transactions to an Infura node, keeping all other transaction variables constant. For the transaction sender, the time at which a transaction is sent is taken to be the timestamp of the last Ethereum block confirmed at the time of sending (the latest block). A previously unselected gas price is chosen for an available wallet, and a transaction is sent to another available wallet (both wallets are owned by the same party). At broadcast time, the transaction hash is logged so that newly mined blocks can be checked for confirmation. Once the transaction is included in a mined block, we take that block’s timestamp as the actual transaction confirmation time. The difference between sent time (latest block timestamp at sending time) and confirmation time (timestamp of the block containing the transaction) is recorded as seconds (or blocks) waited. Finally, all the gas prices used to send the transactions are stored with the transaction hash as the database key.
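The bookkeeping behind this sender-checker-validator can be sketched as follows. This is a simplified illustration, not the actual tool: the class and function names, the in-memory dictionary standing in for the database, and the seven example gas prices are all assumptions, and no node is contacted.

```python
from dataclasses import dataclass
from itertools import cycle
from typing import Dict, Optional

# Hypothetical set of 7 gas prices (in Gwei) to cycle through.
GAS_PRICES_GWEI = [1.0, 2.0, 3.0, 5.0, 8.0, 13.0, 21.0]


@dataclass
class TrackedTransaction:
    tx_hash: str
    gas_price_gwei: float
    sent_block_number: int
    sent_block_timestamp: int  # timestamp of the latest block at send time
    confirmed_block_number: Optional[int] = None
    confirmed_block_timestamp: Optional[int] = None

    def seconds_waited(self) -> Optional[int]:
        """Confirmation block timestamp minus latest block timestamp at send time."""
        if self.confirmed_block_timestamp is None:
            return None  # still unconfirmed
        return self.confirmed_block_timestamp - self.sent_block_timestamp

    def blocks_waited(self) -> Optional[int]:
        """Number of blocks between sending and confirmation."""
        if self.confirmed_block_number is None:
            return None  # still unconfirmed
        return self.confirmed_block_number - self.sent_block_number


def record_sent(store: Dict[str, TrackedTransaction], tx_hash: str,
                gas_price_gwei: float, latest_block_number: int,
                latest_block_timestamp: int) -> TrackedTransaction:
    """Log a broadcast transaction, keyed by its hash."""
    tx = TrackedTransaction(tx_hash, gas_price_gwei,
                            latest_block_number, latest_block_timestamp)
    store[tx_hash] = tx
    return tx


def record_confirmation(store: Dict[str, TrackedTransaction], tx_hash: str,
                        block_number: int, block_timestamp: int) -> TrackedTransaction:
    """Attach the mined block's number and timestamp to a logged transaction."""
    tx = store[tx_hash]
    tx.confirmed_block_number = block_number
    tx.confirmed_block_timestamp = block_timestamp
    return tx


if __name__ == "__main__":
    store: Dict[str, TrackedTransaction] = {}
    prices = cycle(GAS_PRICES_GWEI)  # pick the next gas price on each send
    record_sent(store, "0xabc", next(prices), 100, 1_600_000_000)
    record_confirmation(store, "0xabc", 103, 1_600_000_042)
    print(store["0xabc"].seconds_waited(), store["0xabc"].blocks_waited())
```

Using the latest block timestamp as the sent time keeps both ends of the measurement on the same clock, at the cost of a resolution of roughly one block time.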
Data Modeling: Estimating gas prices empirically
The distribution of sent gas prices revealed a variety of transactions going into the network with embedded Gaussian, Poisson, and Delta-like distributions. The plot of gas prices against blocks waited revealed an exponential decay (Poisson process). We fitted this curve to a sufficiently large number of sent mainnet transactions (5K+). As a “static” model, however, the fit could not account for the time-dependent variations of gas prices. After reviewing the literature on time-series prediction (including multivariate ARIMA, GARIMA, and VAR models), we selected a Long Short-Term Memory (LSTM) Recurrent Neural Network (RNN) to train and test on the constructed database. Using the parameters from the Poisson fit for transaction confirmation time (blocks waited) as a starting point for the RNN, we could then account for long-term gas price correlations.
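As an illustration of the static fit, an exponential decay of the form price = a * exp(-b * blocks_waited) can be fitted with ordinary least squares on the logarithm of the gas price. The functional form, the function names, and the log-linear fitting method are assumptions made for this sketch; the post does not specify how the original fit was computed.

```python
import math
from typing import Sequence, Tuple


def fit_exponential_decay(blocks_waited: Sequence[float],
                          gas_prices: Sequence[float]) -> Tuple[float, float]:
    """Fit price = a * exp(-b * blocks) and return (a, b).

    Taking logs turns the model into a straight line,
    log(price) = log(a) - b * blocks, which is solved in closed form.
    All gas prices must be strictly positive.
    """
    n = len(blocks_waited)
    if n < 2 or n != len(gas_prices):
        raise ValueError("need at least two (blocks, price) pairs of equal length")
    log_prices = [math.log(p) for p in gas_prices]
    mean_x = sum(blocks_waited) / n
    mean_y = sum(log_prices) / n
    sxx = sum((x - mean_x) ** 2 for x in blocks_waited)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(blocks_waited, log_prices))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


def predict_price(a: float, b: float, blocks: float) -> float:
    """Gas price suggested by the fitted curve for a given wait in blocks."""
    return a * math.exp(-b * blocks)


if __name__ == "__main__":
    # Synthetic, noise-free sample generated from a = 20 Gwei, b = 0.25.
    blocks = [1, 2, 4, 8, 16]
    prices = [20.0 * math.exp(-0.25 * x) for x in blocks]
    a, b = fit_exponential_decay(blocks, prices)
    print(round(a, 3), round(b, 3))
```

A fit like this yields one curve for all time, which is why it cannot follow shifts in demand on its own.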
Challenges faced with model stabilization
Tests on the Ropsten testnet were encouraging, but under real-time conditions on the mainnet the gas price predictions fluctuated wildly, at times by an order of magnitude, i.e. if the gas price was 1 Gwei, the models estimated 10 Gwei. The model was extremely sensitive to mainnet conditions. Reinforcing the data with continuous real-time feeds of freshly sent transactions to the RNN (from the automated sender-checker-validator) could not re-stabilize the model. The effect of every new data point in training the network was proportional only to the number of points that came before it. Choosing the tradeoff between “current” (freshly sent) and “historical” (previously sent) transactions for model training depended strongly on the fluctuating demand, and we could not know this ratio in advance. For example, a contract associated with a popular Ethereum game like CryptoKitties is deployed to the network and users rush to upgrade their kittens within a short span of time to avoid missing out, or markets dip due to a global calamity and everyone frantically tries to close out their collateralized debt positions to prevent financial losses.
How many sent transactions could we log, and how much of that data could we ingest at any given time? These questions were discouraging, as more data meant longer training (waiting) times for the LSTM RNN models. Even though we got good accuracy (90%+) while testing on Ropsten, live mainnet predictions using LSTM RNNs destabilized repeatedly. We needed more than an empirically estimated formula retrofitted to a univariate time-series prediction model.
Advances with Data Modeling: Incorporating demand network signals
To enable a more advanced model in light of the above challenges, we added factors affecting gas price volatility and network properties signalling demand, aggregating and storing them on a minute-by-minute basis for model re-training. We collected over 45K distinct signal points across 8 different signals before training any new models. These signals are:
- Number of unconfirmed transactions
- Gas prices provided from various Ethereum public oracles
- Ethereum mining pool reported hashing rates
- ETH vs USD prices, ETH vs BTC prices
- Number of active miners (& workers) in a given pool
- Ethereum total block difficulties & block times
Seeing that the distribution of historical gas prices was far from Normal (Gaussian), we decided to split the model into three, one per confirmation speed category: Fast, Medium, and Slow. Compared with making continuous predictions from a fitting function, categorizing transactions into these buckets prevented the estimations of one speed’s model from bleeding into another. It also preserved some of the Normal/Gaussian-like distribution qualities within each speed category.
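The bucketing step can be sketched like this. The block thresholds that separate the categories are hypothetical values chosen for the example; the post does not state where the actual boundaries between Fast, Medium, and Slow were drawn.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical category boundaries, expressed in blocks waited.
FAST_MAX_BLOCKS = 2
MEDIUM_MAX_BLOCKS = 10


def speed_category(blocks_waited: int) -> str:
    """Map a confirmation wait (in blocks) to a speed category."""
    if blocks_waited <= FAST_MAX_BLOCKS:
        return "fast"
    if blocks_waited <= MEDIUM_MAX_BLOCKS:
        return "medium"
    return "slow"


def split_by_speed(samples: Iterable[Tuple[float, int]]) -> Dict[str, List[float]]:
    """Group (gas price, blocks waited) samples into per-category price lists.

    Each list can then be used to train that category's own model.
    """
    buckets: Dict[str, List[float]] = defaultdict(list)
    for gas_price, blocks_waited in samples:
        buckets[speed_category(blocks_waited)].append(gas_price)
    return dict(buckets)


if __name__ == "__main__":
    samples = [(12.0, 1), (5.0, 4), (1.0, 30), (9.0, 2)]
    print(split_by_speed(samples))
```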
Improving neural networks with hyperparameter grid search
With over a month of time-stamped network signals collected and merged with Ethereum block info, we trained multivariate regression models employing (deep) Neural Networks. At training time, these models first dynamically pick the best type of regression to use (linear, logistic, polynomial, etc.) and then optimize the regression coefficients of the input variables for the target, the gas price. We then tuned the trained models by hyperparameter grid search using K-fold cross-validation, with the objective of minimizing the mean absolute error (MAE), which brought strong stability to the estimations. MAE is the average absolute difference between the labels (recorded gas prices) and the predicted values (predicted gas prices) in the training data. It ranges from zero to infinity, where a lower value indicates a higher quality model. The Neural Network’s hyperparameters were drawn from a list of predetermined value ranges and tested for minimizing MAE:
- Number of NN Nodes - 64 to 256
- Number of Layers - 2 to 6
- Hidden layers - up to 4
- Cross layers - up to 2
- Dropout rates - 0 to 0.375
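To make the tuning loop concrete, here is a minimal, framework-free sketch of MAE, K-fold splitting, and an exhaustive grid search. It is not the FeNN training code: the model is abstracted behind a caller-supplied `fit_predict` function, the names are hypothetical, and the individual grid values inside `FENN_LIKE_GRID` are assumptions picked from within the ranges listed above.

```python
from itertools import product
from statistics import mean
from typing import Any, Callable, Dict, Iterator, List, Sequence, Tuple


def mean_absolute_error(labels: Sequence[float], predictions: Sequence[float]) -> float:
    """Average absolute difference between labels and predictions."""
    return mean(abs(l - p) for l, p in zip(labels, predictions))


def k_fold_indices(n_samples: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train indices, validation indices) for k contiguous folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        stop = start + size
        validation = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, validation
        start = stop


def grid_search(param_grid: Dict[str, List[Any]],
                fit_predict: Callable[[Dict[str, Any], list, list, list], List[float]],
                features: list, labels: list, k: int = 5) -> Tuple[Dict[str, Any], float]:
    """Try every parameter combination and keep the one with the lowest mean MAE.

    fit_predict(params, train_x, train_y, val_x) must train a model and
    return its predictions for val_x.
    """
    names = list(param_grid)
    best_params, best_mae = {}, float("inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        fold_maes = []
        for train_idx, val_idx in k_fold_indices(len(labels), k):
            predictions = fit_predict(
                params,
                [features[i] for i in train_idx],
                [labels[i] for i in train_idx],
                [features[i] for i in val_idx],
            )
            fold_maes.append(
                mean_absolute_error([labels[i] for i in val_idx], predictions))
        score = mean(fold_maes)
        if score < best_mae:
            best_params, best_mae = params, score
    return best_params, best_mae


# Hypothetical grid inside the ranges listed above (60 combinations).
FENN_LIKE_GRID = {
    "nodes": [64, 128, 256],
    "layers": [2, 3, 4, 5, 6],
    "dropout": [0.0, 0.125, 0.25, 0.375],
}
```

In practice `fit_predict` would wrap whatever neural network library is in use; keeping it as a parameter means the search logic stays the same when the model changes.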
These models, which we started referring to as “FeNN” (Fee estimations using Neural Networks), resulted in MAEs of ~0.34 or below. During training, we avoided potential manipulation vectors related to input data by using feeds from 6 separate providers, logged every minute on an internal server with expanding storage. Simultaneous manipulation of all the sources is highly improbable. With credible sources and reliable signals, we achieved accuracy scores of 0.95 and above on the testing set for all model speed categories, along with the lowest MAEs we had seen up to this point.
We deployed FeNN models for all three confirmation speeds (fast, medium, and slow) into production to infer from real-time network stats. First, we found that gas price predictions from these models no longer varied by an order of magnitude or more. Second, the confirmation time in each speed category got lower as the models stabilized with more data. These were encouraging signs! As we aimed for cheaper transactions, we went ahead and applied weights to the above-mentioned parameters in the model for inference, ranging from 85% to 95% depending on their sensitivity towards estimating gas price.
Over time we developed significant confidence in the FeNN models as they continued to consume more than a month’s worth of network signals and Ethereum block information. The inferences from FeNN were logged every minute and then compared to the gas prices suggested by ethgasstation.info (Egs, a very widely used public source for Ethereum gas prices) for an overview of operational performance. The real test, however, was in using these inferences directly to send transactions.
Benchmarking Performance: FeNN Vs EGS on the Ethereum Mainnet
FeNN’s stability and prediction characteristics enabled sending mainnet transactions continuously over extended periods of time. We measured latencies of 103.69 milliseconds for 95% of the calls to the FeNN API, which infers from highly accurate (95%+) models. We refactored the auto transaction sender-checker-validator (described in the Data Aggregation section) for mainnet transfers with gas prices suggested by the FeNN API and the Egs API. We ran the auto sender on three parallel threads (fast, medium, and slow), feeding the resulting transaction confirmation times and gas prices into a unified database and flagging each transfer as using either an Egs fee estimate (Fast, Average, SafeLow) or a FeNN estimate (fast, medium, slow).
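A latency figure covering 95% of calls is a 95th percentile. The sketch below shows one common way to compute it, the nearest-rank method; the function name and the sample values are hypothetical, and the post does not say which percentile definition was used for the reported number.

```python
from typing import Sequence


def percentile_nearest_rank(samples: Sequence[float], pct: int) -> float:
    """Return the smallest sample such that at least pct% of samples are <= it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be an integer in 1..100")
    ordered = sorted(samples)
    # Integer form of ceil(pct / 100 * n), avoiding floating-point rounding.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[rank - 1]


if __name__ == "__main__":
    # Hypothetical API call latencies in milliseconds.
    latencies_ms = [88.1, 92.4, 95.0, 97.3, 99.9, 101.2, 103.7, 110.5, 140.8, 250.0]
    print(percentile_nearest_rank(latencies_ms, 95))
```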
We confirmed a total of 1403 transactions with the FeNN API and another 1403 transactions with the Egs API. Both APIs were queried at approximately the same time. Results from the benchmarking:
- FeNN-led transactions were confirmed cheaper than Egs by 1.862%
- FeNN-led transactions were confirmed faster than Egs by 10.21%
- FeNN-led transactions performed better than Egs (cheaper/faster) by 18.23%
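One plausible way to arrive at "cheaper by X%" or "faster by X%" figures is to compare the mean of each measurement across the two groups. The sketch below does that; the function name and sample numbers are hypothetical, and the exact calculation behind the percentages reported above is not described in this post.

```python
from statistics import mean
from typing import Sequence


def percent_lower(candidate: Sequence[float], baseline: Sequence[float]) -> float:
    """Percentage by which the candidate mean is lower than the baseline mean.

    A positive result means the candidate is cheaper (for gas prices)
    or faster (for confirmation times) than the baseline.
    """
    baseline_mean = mean(baseline)
    if baseline_mean == 0:
        raise ValueError("baseline mean must be non-zero")
    return (baseline_mean - mean(candidate)) / baseline_mean * 100


if __name__ == "__main__":
    # Hypothetical gas prices (Gwei) paid by transactions in each group.
    fenn_prices = [9.0, 9.5, 10.0]
    egs_prices = [10.0, 10.0, 10.0]
    print(round(percent_lower(fenn_prices, egs_prices), 2))
```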
As of 11/05/2020, FeNN gas price inferences are overall cheaper by 13.35% compared to Egs. FeNN is also individually cheaper in all categories: fast, medium, and slow.
Full integration into the Upvest Asset Tokenization Engine
The advanced Fees API blends well into the overall Upvest product suite. As a market leader in the field of asset tokenization, we know that transaction fees have a high impact and should not be underestimated. For asset issuers, custodians, or exchanges managing large volumes of high-value transfers, sending transactions with optimal fees becomes crucial. It is very easy to overpay for a transaction that is not time-critical, or to underpay for a time-critical one so that it doesn’t get cleared as soon as desired. Reducing the overall fees paid while optimizing the time window of transaction execution benefits these situations immediately. Knowing this, we’ve fully integrated the FeNN fee prediction model into our asset tokenization engine. Clients of Upvest receive this feature for auto-calculating cost-efficient fees for slow/medium/fast transaction execution times right out of the box.
A Glimpse into the future of Ethereum fees
Upvest abstracts away a lot of complexity for users dealing with Ethereum network transaction fees. Meanwhile, the Ethereum protocol is constantly evolving, and its rather cumbersome fee process is getting a closer look. The current gas price mechanism built into the protocol at Ethereum layer 1 prevents a simpler approach. EIP-1559 is the currently proposed approach for bringing simplicity, predictability, and stability to transaction fees. It was originally proposed by Vitalik Buterin (Ethereum co-founder) in his Blockchain Resource Pricing paper as a correction to the existing auction-based gas price mechanism. It is also purported to reduce needless confirmation delays and increase fairness for the users of the network. A few fundamental changes are planned at the Ethereum consensus layer as a result. It is still under heavy development for the next Ethereum hard fork and is one of the most anticipated upgrades to the economic fee incentive model of Ethereum.
Want to try it out or contribute? Get in touch!
If you’d like to test-run our advanced Fees API live on the mainnet for your transactions, we gladly invite you to sign up and trial the Beta. Curious to know more about how this tool works? Open-sourcing the entire codebase is among our top priorities, and we’re happy to answer any questions about the gathered data, code contributions, and features to be added in the future. We welcome you to participate in our ongoing discussions with the Ethereum community on Reddit or to reach out directly to the developers (and authors) with any questions or for further information - pranay#@#upvest.co / paul#@#upvest.co (remove ‘#’). To hear as soon as possible about the open-sourcing of the codebase for this project, subscribe to our announcements. We’d be keen to hear your input on further developments and future collaborations.