Neural Network for Valuing Bitcoin: Numerical Results, Implementation and Discussion

13 May 2024

This paper is available on arxiv under CC 4.0 license.

Authors:

(1) Edson Pindza, Tshwane University of Technology; Department of Mathematics and Statistics; 175 Nelson Mandela Drive OR Private Bag X680 and Pretoria 0001; South Africa [edsonpindza@gmail.com];

(2) Jules Clement Mba, University of Johannesburg; School of Economics, College of Business and Economics and P. O. Box 524, Auckland Park 2006; South Africa [jmba@uj.ac.za];

(3) Sutene Mwambi, University of Johannesburg; School of Economics, College of Business and Economics and P. O. Box 524, Auckland Park 2006; South Africa [sutenem@uj.ac.za];

(4) Nneka Umeorah, Cardiff University; School of Mathematics; Cardiff CF24 4AG; United Kingdom [umeorahn@cardiff.ac.uk].


4. Numerical results, implementation and discussion

This section introduces the empirical and analytical structure, approximation of parameters, the estimation of option prices, as well as the limitations of the efficiency and no-arbitrage assumptions.

4.1. Empirical Analysis

4.1.1. Data Source

For the cryptocurrency data source, we used the historical bitcoin closing prices in US dollars from the CoinGecko website, covering five years from August 1, 2016, to July 31, 2021. At the time of access, the global crypto market capitalization was $2.09 trillion, and CoinGecko tracked 8,934 cryptocurrencies, with 42.2% dominance for bitcoin. Furthermore, we used Google Trends data to extract the bitcoin sentiment data. Google Trends provides a time series of the number of times bitcoin has been searched, scaled so that the maximum is 100. Figures 1 and 2 describe the dataset for the bitcoin prices and the corresponding sentiment trends, respectively.

4.1.2. Descriptive Statistics

The dataset is sampled on a daily frequency, and the results are plotted in Figure 1. We used the adjusted bitcoin closing prices to estimate the continuously compounded returns. For the log-return r_i at time t_i, we used the expression r_i = log(S_i / S_{i−1}), where S_i is the bitcoin price at time t_i. Since bitcoin is traded daily, we use the daily sample data, giving 365 observations per year, whereas the trend data is sampled weekly. The descriptive statistics of the dataset used in this work are presented in Table 1.
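The log-return expression above can be illustrated with a minimal sketch. The function name `log_returns` and the sample prices are hypothetical and are not taken from the CoinGecko dataset.

```python
import math


def log_returns(prices):
    """Continuously compounded returns r_i = log(S_i / S_{i-1})."""
    if any(p <= 0 for p in prices):
        raise ValueError("prices must be strictly positive")
    # Pair each price with its predecessor and take the log of the ratio.
    return [math.log(curr / prev) for prev, curr in zip(prices, prices[1:])]


# Hypothetical closing prices, for illustration only.
sample_prices = [100.0, 110.0, 99.0, 99.0]
sample_returns = log_returns(sample_prices)
```

A series of n prices yields n − 1 log-returns, and the log-returns sum to the log of the ratio of the last price to the first.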

Figure 1: Bitcoin daily closing prices and log returns during 2016-2021

Figure 2: Sentiment data for bitcoin during 2016-2021

Both bitcoin prices and the corresponding log-returns react to big events in the cryptocurrency market. From Figure 1a, a considerable surge in bitcoin prices was observed after March 2017, owing to the widespread interest in cryptocurrencies. This jump was later affected by a series of political interventions leading to a drop in June 2017 [19]. Furthermore, in late 2020 and early 2021, another dramatic increase in the prices was seen due to the increased acquisition by large investors, financial institutions and corporations. These intensive movements in the cryptocurrency markets have been captured extensively by the corresponding bitcoin sentiment data as found in Figure 2a.

Table 1: Descriptive statistics for the log-returns

Table 1 presents the descriptive statistics for the log-returns on the bitcoin closing prices, as well as the sentiment data values. The dual dataset consists of a sample of 260 weekly sentiment values and 1826 daily bitcoin closing prices. The skewness/kurtosis test is one of three common normality tests designed to detect deviations from normality and to determine the shape of the distribution of the dataset. For a normal distribution, the skewness is zero and the kurtosis is three. From the table, the log-returns of both the sentiment data and the bitcoin closing prices are highly and positively skewed, that is, skewed to the right. With kurtosis values of 1.20553 and 10.53706, the two datasets have heavier, longer and fatter tails than the normal distribution, and they can be referred to as leptokurtic.
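As a rough illustration of the two shape statistics discussed above, the sketch below computes moment-based skewness and kurtosis. It assumes population (biased) moments and the raw kurtosis convention, under which a normal distribution has kurtosis three; the source does not state which estimator or convention Table 1 uses. The function name and sample data are hypothetical.

```python
def skewness_kurtosis(values):
    """Moment-based skewness and raw (non-excess) kurtosis of a sample."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(values) / n
    # Central moments of order 2, 3 and 4 (population form, divisor n).
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    if m2 == 0:
        raise ValueError("sample has zero variance")
    return m3 / m2 ** 1.5, m4 / m2 ** 2


# Hypothetical samples: one symmetric, one with a long right tail.
symmetric = [-2.0, -1.0, 0.0, 1.0, 2.0]
right_tailed = [0.0, 0.0, 0.0, 0.0, 10.0]
skew_sym, kurt_sym = skewness_kurtosis(symmetric)
skew_right, kurt_right = skewness_kurtosis(right_tailed)
```

A positive skewness indicates a longer right tail, and subtracting three from the raw kurtosis gives the excess kurtosis relative to the normal distribution.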

4.1.3. Parameter Estimation

In this section, we estimate the parameters for the numerical computation, and the results finally presented are the values of bitcoin options with European call features. The following parameters are used in this paper: Smax = $63577, S = $10000, r = 4%, T = 5yrs, and a strike price of E = $30000. We used the mean and variance of the log-return from the sentiment index data as µp = 0.01033 and σp = 0.20934, respectively. Also, from the log-return of the bitcoin closing prices, we calculated the mean and the variance of the log-return as µd = −0.00241 and σd = 0.04132, respectively. Next, we estimate λ, the jump intensity rate, together with k, the expectation of the relative jump size, since it is essential to decide when a jump occurs. This parameter estimation was done using the maximum likelihood estimation method since there are no closed-form expressions for the optimal values of these parameters. Also, the daily time step of the bitcoin price return is measured in years, that is, ∆t ∼ dt = 1/365 ≈ 0.00274.

Furthermore, deciding when a jump occurs in the price paths is problematic. To estimate the parameters of the jump-diffusion models, we adopt the techniques of [47] and Hanson & Zhu (2004) [16], who suggested a specific threshold ϵ for determining whether a jump has occurred or not. In this case, maximum likelihood estimation is not strongly dependent on the value of the threshold ϵ [47]. Here, we assume that a jump occurs when the absolute value of the log-return is larger than a specified positive value. The intensity rate λ is measured as

For our estimate, we set the threshold level ϵ = 0.07. If ϵ is too small, then the majority of the price movements would be considered jumps. On the other hand, if ϵ is too large, then the set of absolute jump sizes yt could be empty, leaving the parameters to be estimated undetermined. Using this threshold, we divide the bitcoin log-return data into two parts to capture the number of jumps. The first part captures the values whose absolute log-return is greater than ϵ, and we assume that a jump occurred there. The other part consists of no jumps, and it captures the values whose absolute log-return does not exceed ϵ. From these techniques, we obtained the remaining parameters λ = 31.8 and k = E[Jt] = −0.002195.
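The threshold split described above can be sketched as follows. The split itself follows the text, but the intensity estimate (number of detected jumps divided by the observation period in years) is an assumption made for illustration, because the expression used in the paper is not reproduced here. Function names and the sample returns are hypothetical.

```python
def split_by_threshold(log_returns, eps):
    """Split log-returns into jump and no-jump parts using |r| > eps."""
    jumps = [r for r in log_returns if abs(r) > eps]
    no_jumps = [r for r in log_returns if abs(r) <= eps]
    return jumps, no_jumps


def jumps_per_year(log_returns, eps, dt=1.0 / 365.0):
    """ASSUMED estimator: detected jumps per year of observed data."""
    jumps, _ = split_by_threshold(log_returns, eps)
    years_observed = len(log_returns) * dt
    return len(jumps) / years_observed


# Hypothetical daily log-returns, for illustration only.
returns = [0.01, -0.08, 0.02, 0.10, -0.03]
jump_part, diffusion_part = split_by_threshold(returns, eps=0.07)
intensity = jumps_per_year(returns, eps=0.07)
# A threshold that is too large leaves the jump set empty.
empty_jumps, _ = split_by_threshold(returns, eps=0.5)
```

With ϵ = 0.07, two of the five hypothetical returns are classified as jumps, while ϵ = 0.5 classifies none, which illustrates why an overly large threshold leaves the jump parameters undetermined.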

4.2. Numerical Implementation

For the NN architecture, we employed the random search method to obtain the optimal hyperparameters. The bitcoin option pricing problem was solved by approximating the potential V(t, S) with an NN configured as follows: 2 input nodes capturing the bitcoin price and the time; 4 hidden layers with 64, 32, 16 and 8 units, in that order; and 1 output node capturing the option price (hence the configuration 2-64-32-16-8-1). This paper considered two settings. The first uses a learning rate of 0.001, 10000 iteration steps, a sigmoid activation function and the stochastic gradient descent (SGD) optimizer (Model I). The second uses a learning rate of 0.001, 10000 iteration steps, a ReLU activation function and the Adam optimizer (Model II). The display steps determine how often intermediate results are printed during training, whereas the iteration steps, or training steps, refer to the number of steps taken by the model to complete the whole training process. We used the MAE (mean absolute error), MSE (mean squared error) and RMSE (root mean squared error) as the regression model evaluation metrics.

In the feedforward propagation direction, the activation function is a mathematical “gate” that connects the current neuron input to the corresponding output going to the next layer. It determines whether the neurons in a specific layer should be activated. On the other hand, an optimiser is an algorithm or a function that modifies the parameters of the neural network (weights and biases) to reduce the overall loss.
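To make the 2-64-32-16-8-1 configuration concrete, the following is a minimal forward-pass sketch in plain Python. It is not the authors' implementation: the uniform weight initialization, the zero biases, the linear output layer and the scaled inputs are all assumptions, and training (the PDE loss together with SGD or Adam) is omitted.

```python
import math
import random

LAYER_SIZES = [2, 64, 32, 16, 8, 1]  # inputs (t, S), four hidden layers, output V


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def relu(x):
    return max(0.0, x)


def init_params(sizes, seed=0):
    """ASSUMED initialization: uniform weights scaled by fan-in, zero biases."""
    rng = random.Random(seed)
    params = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        scale = 1.0 / math.sqrt(n_in)
        weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
                   for _ in range(n_out)]
        params.append((weights, [0.0] * n_out))
    return params


def forward(params, t, s, activation):
    """Feedforward pass; the output layer is kept linear (an assumption)."""
    a = [t, s]
    last = len(params) - 1
    for k, (weights, biases) in enumerate(params):
        z = [sum(w * x for w, x in zip(row, a)) + b
             for row, b in zip(weights, biases)]
        a = z if k == last else [activation(v) for v in z]
    return a[0]


params = init_params(LAYER_SIZES)
n_params = sum(len(w) * len(w[0]) + len(b) for w, b in params)
# Inputs are assumed to be scaled to [0, 1] before entering the network.
v_model_1 = forward(params, 0.5, 0.3, sigmoid)  # Model I activation
v_model_2 = forward(params, 0.5, 0.3, relu)     # Model II activation
```

Under this layout the network has 2945 trainable weights and biases; Models I and II would differ only in the activation passed to `forward` and in the optimizer used during training.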

4.2.1. Model I

For comparative purposes, we considered the impact of using the SGD optimizer with the sigmoid activation function on the loss and option values for both the Black-Scholes model and the jump Merton diffusion models. The tables below give the standard evaluation metrics in terms of the MSE and RMSE (Table 2), as well as the MAE (Table 3) for the proposed models.

Table 2: Model I - Loss values (MSE; RMSE) and iteration numbers

Table 3: Model I - MAE Loss values and iteration numbers
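For reference, the three evaluation metrics reported in the tables can be sketched as below. The function name and the sample values are hypothetical; in the pricing setting the inputs would be the network outputs and their targets (or residuals against zero), which are not reproduced here.

```python
import math


def regression_metrics(predicted, target):
    """Return MAE, MSE and RMSE for two equally sized sequences."""
    if len(predicted) != len(target) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [p - t for p, t in zip(predicted, target)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    return {"MAE": mae, "MSE": mse, "RMSE": math.sqrt(mse)}


# Hypothetical values, for illustration only.
metrics = regression_metrics([1.0, 2.0, 3.0], [1.0, 4.0, 0.0])
```

RMSE is the square root of MSE, so the two always move together, while MAE is less sensitive to a few large errors.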

In this section, we used both the classical Black-Scholes model and the jump Merton diffusion (JMD) model to output the loss function values, as well as the observed option values. For the Black-Scholes model, we used the relevant PDE by setting β(S) = 0 and η = r in equation (3.18), subject to the conditions in equation (3.19). During the training phase of the NN, we aim to make the error, or cost function, in equation (3.23) as small as possible. Table 2 gives the loss values for the two models. In each model, we partitioned the asset (bitcoin closing prices) and the time into 10 and 20 uniform grid spaces to investigate the nature of the loss values. For both models, the loss function is strictly monotonically decreasing, which satisfies the error reduction property of NN training. We further noticed that the loss values decreased as the grid sizes of the two models were increased, giving rise to a more effective numerical pricing technique. The loss for the Black-Scholes model is significantly smaller than that of the JMD model, which could be due to the fewer parameters that the Black-Scholes model possesses.
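The uniform partition of time and asset price can be sketched as follows. This is an illustration under assumptions: each axis is taken to have n equally spaced nodes including both endpoints, and time is taken as normalized to [0, 1], which is consistent with the grid times 0.333, 0.667 and 1.000 quoted for Table 4 but is not stated explicitly in the text. Function names are hypothetical.

```python
def uniform_nodes(lower, upper, n_nodes):
    """n_nodes equally spaced points on [lower, upper], endpoints included."""
    step = (upper - lower) / (n_nodes - 1)
    return [lower + i * step for i in range(n_nodes)]


def collocation_grid(t_max, s_max, n_t, n_s):
    """All (t, S) pairs of the tensor-product grid used as training points."""
    t_nodes = uniform_nodes(0.0, t_max, n_t)
    s_nodes = uniform_nodes(0.0, s_max, n_s)
    return [(t, s) for t in t_nodes for s in s_nodes]


S_MAX = 63577.0  # maximum bitcoin price from the parameter list
grid_10 = collocation_grid(1.0, S_MAX, 10, 10)  # ASSUMED normalized time
grid_20 = collocation_grid(1.0, S_MAX, 20, 20)
time_nodes_10 = uniform_nodes(0.0, 1.0, 10)
```

Doubling the number of nodes per axis quadruples the number of training points, from 100 to 400 in this sketch.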

Figure 3: 3-dimensional option plots for Black-Scholes and Merton jump-diffusion model prices – Model I

Figure 4: 2-dimensional option plots for Black-Scholes and Merton jump-diffusion model prices – Model I

Figures 3 and 4 give 3-dimensional and 2-dimensional plots of the bitcoin call option values, respectively, when the asset price process is modelled using both the Black-Scholes model and the MJD models. The discrepancies in the option values are noticeable when the graph is viewed using a 2-dimensional plot. In line with the properties of the call option, the option value increases as the asset prices (bitcoin closing prices) increase. Note that both the option values and the closing prices are denominated in US dollars.

Figure 5: Model I - Option price plots for Black-Scholes and Merton jump-diffusion models

To have a clearer view of the nature of the option price with respect to the closing prices, we plot the results obtained in Figure 5. The discontinuity at the strike price E = $30,000 is observed, as the option remains out-of-the-money when the asset price is less than the strike price. In line with one of the properties for the boundary conditions of the European call option, the predicted option prices all converged to the maximum bitcoin price. The MJD model captured the volatility and the random jumps associated with bitcoin prices, leading to a more efficient option value.
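The out-of-the-money behaviour below the strike follows from the standard European call payoff at expiry, max(S − E, 0). The sketch below evaluates that payoff for the strike and price levels quoted in this paper; it is the textbook payoff only and does not reproduce the boundary conditions of equation (3.19).

```python
def call_payoff(asset_price, strike):
    """Standard European call payoff at expiry: max(S - E, 0)."""
    return max(asset_price - strike, 0.0)


STRIKE = 30000.0  # E, from the parameter list
S_MAX = 63577.0   # Smax, from the parameter list

payoff_below = call_payoff(10000.0, STRIKE)   # out-of-the-money
payoff_at_strike = call_payoff(STRIKE, STRIKE)
payoff_at_max = call_payoff(S_MAX, STRIKE)    # in-the-money
```

The payoff is zero for every asset price up to the strike and then grows linearly, which produces the change of slope at E seen in the option price plots.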

Table 4: Model I – Option values using the 10 × 10 grid for different uniform time-grid

Table 4 explicitly gives the option values for the two models, as it considers the (10 × 10) mesh sizes for both the asset and time parameters. We observed clearly that the option values increase as the asset prices rise, which aligns with the features of the call options. Furthermore, using the randomly selected time-grid of t1 = 0.333, t2 = 0.667, t3 = 1.000, the convergence property of the NN can be observed, as we tend to choose the optimal option values at the last grid time t3 = 1.000.

4.2.2. Model II

This model uses a slightly different network configuration; the difference from Model I is the use of the ReLU activation function and the Adam optimizer. We further obtain the standard evaluation metrics in terms of the MSE and RMSE (Table 5), as well as the MAE (Table 6) for the proposed models.

Table 5: Model II - Loss values (MSE; RMSE) and iteration numbers

Table 6: Model II - MAE Loss values and iteration numbers

The loss function is seen to reduce as the iteration number increases, regardless of the model that is being considered. Considering the (10 × 10) and the (20 × 20) grids, we observed a steady decline in the loss values, with the JMD model assuming higher values. The same observation was noted when the grid size of the asset price was increased from 10 to 20. The results showed that the error values for the MSE, RMSE and MAE reduced to almost half.

Figure 6: 3-dimensional option plots for Black-Scholes and Merton jump diffusion model prices – Model II

Figure 7: 2-dimensional option plots for Black-Scholes and Merton jump-diffusion model prices – Model II

Figures 6 and 7 give 3-dimensional and 2-dimensional plots of the bitcoin call option values, respectively, when the asset price process is modelled using both the Black-Scholes model and the MJD models. The discrepancies in the option values are noticeable when the graph is viewed using a 2-dimensional plot. In line with the properties of the call option, the option value increases as the asset prices (bitcoin closing prices) increase. When the neural network architecture was changed to reflect Model II, we observed discrepancies in the 2- and 3-D option plots between the Black-Scholes model and the JMD model. Under the Model II architecture, the Black-Scholes model captured the price paths and priced the call options more effectively than the JMD model.

Figure 8: Model II – Option price plots for Black-Scholes and Merton jump-diffusion models

Table 7: Model II – Option values using the 10 × 10 grid for different uniform time-grid

Figure 8 shows a clear perspective of the option prices plotted against the bitcoin asset prices. Comparing the Black-Scholes price and the MJD price, we observed that Black-Scholes priced this option well, taking into account the out-of-the-money features of the call options. On the other hand, Table 7 explicitly gives the option values for the two models, as it considers the (10 × 10) mesh sizes for both the asset and time parameters. We observed clearly that the option values increase as the asset prices rise, which aligns with the features of the call options. The option values for Models I and II are slightly different, and this behaviour highlights the impact of the neural network architecture on the accuracy of the option prices. There is no exact solution for this type of option, so the values cannot be compared with, or the results replicated against, any known analytical solution. Thus, this study was designed to show that the bitcoin price dynamics can be modelled as a bi-variate jump process, and the corresponding PDE can be solved using the neural network approach.

4.3. Empirical validation using equity options data

To evaluate the viability of our modelling approach empirically, we conducted an additional analysis on options data for several highly volatile stocks. Since active cryptocurrency options markets are still developing, this provides an alternative way to test the model’s accuracy and validity. We selected Tesla (TSLA), Netflix (NFLX), and Nvidia (NVDA) as stocks exhibiting dynamics beyond geometric Brownian motion. Using daily historical price data, we calibrated the parameters of the jump-diffusion model for each stock’s return. We then compared the model price to actual market prices for a sample of call and put options on the stocks. Across the options tested, the average absolute pricing error was 3.2%. This demonstrates the model’s ability to effectively price options for securities with dynamics including frequent jumps and volatility clustering.

For example, the calibrated jump-diffusion parameters for Tesla stock were σ = 0.62, λ = 5.1, µ = −0.8. We then used these parameters to price one-month call options struck at $100 and $200 and compared the results with their market prices on 05/01/2021. The model priced the $100 call at $8.21 versus the market price of $8.35, an error of -1.7%. For the $200 call, the model price was $4.53 compared to the market price of $4.58, an error of -1.1%.
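The quoted errors are consistent with a signed relative error of the model price against the market price, which can be checked with the short sketch below. The function name is hypothetical, and the sign convention (model minus market) is inferred from the negative values reported above.

```python
def percentage_error(model_price, market_price):
    """Signed pricing error in percent, relative to the market price."""
    return 100.0 * (model_price - market_price) / market_price


# Model and market prices quoted in the text for the two Tesla calls.
error_100_call = percentage_error(8.21, 8.35)
error_200_call = percentage_error(4.53, 4.58)
```

Rounded to one decimal place, these evaluate to -1.7 and -1.1, matching the figures in the text.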

We further tested the options while incorporating the sentiments on the three stocks; the estimated parameters and the corresponding option prices are found in Table 8. For the TSLA stock, we consider two call options on the US equity with different strikes (E = $245 and E = $250), both first traded on 02/03/2023 and having the same expiration date (20/10/2023). We used Model I for the neural network part in solving the corresponding bivariate PDE, with the (10 × 10) grid set for both time and stock. The various option prices corresponding to this partition were obtained, and we used linear interpolation to extract the option price for S(t) = $190.90. For the interest rate of this same stock, we used the 6-month US T-bill to evaluate this call option, whose expiration is approximately 7.5 months. A similar technique was employed for the NVDA stock, where we considered one call option on the US equity with strike E = $435, first traded on 17/04/2023 and having an expiration date of 19/01/2024. Also, for the NFLX stock, we considered one call option on the US equity with strike E = $370, first traded on 11/04/2023 and having an expiration date of 15/03/2024.
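The linear interpolation step can be sketched as follows. The grid nodes and option values in the example are hypothetical placeholders, not values from Table 8; only the query price S(t) = 190.90 comes from the text.

```python
from bisect import bisect_right


def interpolate_linear(x, xs, ys):
    """Linearly interpolate y at x from increasing nodes xs with values ys."""
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("x lies outside the grid")
    # Index of the right-hand node of the interval that contains x.
    i = min(bisect_right(xs, x), len(xs) - 1)
    x0, x1 = xs[i - 1], xs[i]
    weight = (x - x0) / (x1 - x0)
    return ys[i - 1] + weight * (ys[i] - ys[i - 1])


# HYPOTHETICAL stock-price nodes and option values, for illustration only.
stock_nodes = [160.0, 180.0, 200.0, 220.0]
option_values = [4.0, 10.0, 20.0, 34.0]
price_at_spot = interpolate_linear(190.90, stock_nodes, option_values)
```

Because the grid contains only ten nodes per axis, the observed stock price generally falls between two nodes, and the interpolated value is a weighted average of the option prices at those two neighbours.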

Table 8: Model calibration for stock and sentiment index

Using the calibrated parameters, we obtain the call option prices based on the jump-diffusion bivariate model, and the results are presented in Table 8. The percentage errors of the option prices for the TSLA C245 and C250 options are 3.2% and 0.6%, respectively, whereas the percentage errors for the NVDA C435 and NFLX C370 options are -3.9% and 2.0%, respectively. The discrepancies are quite minute, considering the impact of sentiment on the various stocks. We further investigate the impact of the delay parameter τ on the option values, and the results are presented in Table 9. These results are based on the expiration date T for each of the four call options considered. The delay τ is varied from 1 week to 4 weeks, where each week represents the appropriate trading days. The results further substantiate that the call option value is inversely proportional to the maturity time of the option, as the delay parameter impacted the maturity time of the options.

Table 9: Impact of delay parameter on option values

Remark 4.1. It is worth noting that the stochastic factor P = {Pt, t ≥ 0}, which represents the sentiment index of the stocks, is fully dependent on the choice of the initial function ϕ(0), as noted in equation (2.2). Also, we considered the effect of the past Google trend since the model assumes that the sentiment index P explicitly affects the current price of the stock up to a certain time t−τ. Thus, careful consideration should be taken when choosing ϕ(0), since the call option prices increase with respect to the initial sentiment. After a series of experiments and due to the nature of our data, we chose ϕ(0) = 0.01 for TSLA and ϕ(0) = 0.001 for NFLX and NVDA.

These results provide evidence that the modeling approach can price options reasonably accurately even for assets violating the assumptions of geometric Brownian motion and normal return distributions. Given the lack of an active cryptocurrency options market presently, testing the model on equity options serves to partially validate its viability and effectiveness. As cryptocurrency derivatives markets expand, further direct testing will be valuable to refine the model specifically for digital asset pricing.

4.4. Limitations of the efficiency and no-arbitrage assumptions

This paper makes the standard assumptions of market efficiency and no arbitrage opportunities in developing the jump-diffusion model framework. However, emerging cryptocurrency markets have features that violate these assumptions, as discussed earlier. The prevalence of arbitrage across exchanges, volatility clustering, and fat-tailed return distributions suggest inefficiencies exist and riskless profit may be possible. Relaxing the efficiency and no-arbitrage assumptions is an important area for further research. Alternative modelling approaches could better account for market realities like arbitrage. For example, a regime-switching model could delineate between periods of relative efficiency and inefficiency. Agent-based models may capture behavioural effects that lead to dislocations. Automated arbitrage trading algorithms also warrant study. Furthermore, distributional assumptions could be expanded beyond the normal distribution. Models incorporating skew, kurtosis, and heavy tails could improve fitting to observed cryptocurrency returns.

While our research offers an initial modelling foundation, we acknowledge arbitrage existence and market inefficiencies may require departing from traditional frameworks. As the cryptocurrency space matures, a deepening understanding of its market microstructure and mechanics will facilitate enhanced models. Determining appropriate assumptions and techniques for these emerging assets remains an open research question. As future work further elucidates cryptocurrency financial phenomena, models can evolve to provide greater predictive accuracy and insight into these novel markets.