Optimising cryptocurrency portfolios through stable clustering of price
correlation networks
Ruixue Jinga , Ryota Kobayashib,c , Luis E. C. Rochaa,d
a Department of Economics, Ghent University, Ghent, 9000, Belgium
arXiv:2505.24831v1 [physics.pop-ph] 30 May 2025
b Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, 277-8561, Japan
c Mathematics and Informatics Center, The University of Tokyo, Tokyo, 113-8656, Japan
d Department of Physics and Astronomy, Ghent University, Ghent, 9000, Belgium
Abstract
The emerging cryptocurrency market presents unique challenges for investment due to its unregulated nature and
inherent volatility. However, collective price movements can be explored to maximise profits with minimal risk using
investment portfolios. In this paper, we develop a technical framework that uses historical data on daily closing prices
and integrates network analysis, price forecasting, and portfolio theory to identify cryptocurrencies to build profitable
portfolios under uncertainty. Our method uses the Louvain network community algorithm and consensus clustering
to detect robust and temporally stable clusters of highly correlated cryptocurrencies from which cryptocurrencies are
chosen. A price prediction step using the ARIMA model guarantees that the portfolio performs well for up to 14 days
in the investment horizon. Empirical analysis over a 5-year period shows that despite the high volatility in the crypto
market, hidden price patterns can be effectively used to generate consistently profitable, time-agnostic cryptocurrency
portfolios.
Keywords: Cryptocurrency, Time series Prediction, Price Correlation, Network Modelling, Community Detection,
Portfolio, Optimisation
1. Introduction
The global cryptocurrency market has experienced a surge in growth. Cryptocurrencies, decentralised digital
assets, have reshaped traditional financial landscapes [31]. The cryptocurrency market has undergone dynamic evolution since the introduction of Bitcoin, the pioneering cryptocurrency, in 2009 [16]. Initially viewed with scepticism,
cryptocurrencies have gradually gained widespread acceptance and recognition as a distinct asset class. The market
is characterised by decentralised digital assets, enabling peer-to-peer transactions without the need for intermediaries such as banks [5]. Cryptocurrencies are increasingly recognised as financial assets, offering benefits such as
diversification, potentially high returns, and access to novel investment opportunities beyond traditional markets [12].
Bitcoin’s success paved the way for numerous cryptocurrencies, each with unique features, such as technologies and
Preprint submitted to Elsevier
June 2, 2025
market sentiment [7]. Even less popular cryptocurrencies can be considered to form portfolios generating profitable
returns [25]. Their decentralised nature facilitates borderless transactions, fostering financial inclusion and reducing
reliance on intermediaries like banks. However, cryptocurrencies also pose challenges, including price volatility, regulatory uncertainty, and security risks such as hacking and fraud [6, 34]. Despite these limitations, institutional adoption
is growing, underscoring the evolving role of cryptocurrencies in the global financial landscape. Understanding their
potential benefits and risks is essential for investors navigating this dynamic market.
Cryptocurrency price prediction is a challenging task, due to its large fluctuations, unpredictable behaviour and
complex nature [37]. Yang et al. [43] highlights the increased volatility and unpredictability of cryptocurrency markets compared to traditional stocks by examining the entropy and conditional entropy of price movements, revealing
a significant degree of unpredictability, or ”randomness,” in cryptocurrency price changes. This insight highlights
the limitations of conventional predictive models, which often fail to remain robust across varying market conditions.
Consequently, these findings suggest the necessity for diversified methods in modelling to grasp the intricate dynamics of cryptocurrency prices. Time series prediction models such as Auto-Regressive Integrated Moving Average
(ARIMA) and Long Short-Term Memory (LSTM) leverage historical data to forecast cryptocurrency price movements and trends, offering essential tools in this dynamic environment [27]. Patel et al. [36] investigated a deep
learning-based cryptocurrency price prediction system specifically designed for financial institutions, demonstrating
that such models could adaptively capture market trends and assist in risk management. Similarly, Vikram et al. [42]
developed a machine learning model that not only predicted prices but also identified optimal buying and selling
points, thereby enhancing trading strategies. Deprez et al. [15] examined the effectiveness of simple technical trading
rules which, while not providing additional returns during market bubbles, significantly mitigated downside risks and
managed large drawdowns. These studies collectively indicate that investors should employ adaptive, model-based
strategies that integrate both machine learning and traditional financial analytics to better predict and react to rapid
price changes in cryptocurrency markets.
Price prediction poses significant challenges, underscoring the importance of portfolios as risk management tools
in investment strategies. Portfolio management in the stock market encompasses various strategies aimed at achieving
investment goals while managing risk. Diversification spreads investments across different assets to mitigate individual asset volatility [19]. Asset allocation involves distributing investments among different asset classes based on risk
tolerance and financial objectives [8]. Value investing seeks undervalued stocks for potential price corrections, while
growth investing targets companies with high growth potential [14]. Sector rotation shifts investments among sectors
based on economic cycles, and market timing adjusts allocations based on price predictions [23]. Factor investing
focuses on specific characteristics that historically influence stock returns [4], while quantitative strategies use math2
ematical models for trading decisions [17]. Effective portfolio management requires a blend of financial analysis,
market research, and risk management techniques to navigate market dynamics and optimise returns. Portfolio theory, particularly Modern Portfolio Theory (MPT), is instrumental in shaping investment strategies within the financial
market and cryptocurrency domain. Brauneis et al. [10] demonstrate that MPT portfolios can yield higher Sharpe
ratios compared to individual cryptocurrencies. MPT emphasises diversification and risk management, aiming to
construct portfolios that balance optimal returns with acceptable levels of risk. The application of MPT to cryptocurrency portfolios involves estimating optimal weights for different assets based on historical returns and correlations,
considering their costs and potential risks [18]. Beyond MPT, other portfolio construction methodologies have been
proposed for the cryptocurrency market. Risk parity strategies allocate portfolio weights based on the risk contribution of each asset, rather than their expected returns, thereby achieving a more balanced risk exposure across assets
[11]. Furthermore, the emergence of algorithmic trading and machine learning techniques has revolutionised portfolio
management in the cryptocurrency space [3]. These advanced methods utilise computational algorithms to analyse
large datasets and identify profitable trading opportunities. For example, reinforcement learning algorithms can adapt
to learn optimal trading strategies in dynamic market environments [24]. Portfolio management in the cryptocurrency
market is a multifaceted field that combines traditional portfolio theory with cutting-edge technologies and methodologies. Overall, MPT provides the theoretical foundation for portfolio construction, guiding investors in creating
diversified portfolios along the Efficient Frontier.
The crypto portfolios can be built to involve sentiment analysis [26], by analysing social media and news sentiments related to cryptocurrencies, which can provide valuable insights into market sentiment and investor behaviour
[28], influencing the selection of assets for inclusion in portfolios. Furthermore, a machine learning-based technique can be employed to categorise cryptocurrencies based on various features, such as market capitalisation, trading
volume, and historical price trends [24]. This method seeks to identify inherent patterns and groupings within the
cryptocurrency landscape, guiding the construction of portfolios with assets exhibiting certain characteristics [38].
Another avenue to explore is the utilisation of blockchain analytics to assess on-chain data [44]. By analysing transactional patterns, wallet movements, and token flows, we can gain insights into liquidity patterns, identify major
stakeholders and significant transactions, and detect abnormal behaviours that could indicate market manipulation or
forecast significant price changes, which are crucial for developing informed investment strategies and enhancing risk
management. Integrating analytics into the portfolio formation process adds a fundamental layer to decision-making,
capturing the intrinsic qualities of each asset [29]. Principal Component Analysis (PCA) is specifically used in Bitcoin
portfolio construction to assist in simplifying and providing actionable insights from complicated market data. PCA
achieves this by reducing the dimensionality of the dataset, transforming the original variables into a new set of or3
thogonal components that represent the most significant market movements. This allows investors to identify the key
drivers of cryptocurrency price changes, facilitating more informed decisions on asset allocation and risk management
in the inherently volatile cryptocurrency market [20].
The price correlation between assets is crucial for diversification because it allows investors to mitigate risk by
spreading investments across assets that demonstrate varying behaviours. Utilising network modelling in this context is particularly beneficial for mapping these correlations, providing both a visual and quantitative analysis of
how different cryptocurrencies interact within the market. This approach is effective because it uncovers not only
pairwise correlations but also complex interdependencies that can influence the behaviour of asset collections. Such
complex systems, like the cryptocurrency market, are expected to contain underlying mechanisms driving the dynamics of prices that can be exploited to extract information to understand market behaviour [21]. Network analysis,
which involves constructing a network based on price correlations between cryptocurrency assets, is based on the
assumption that correlations between currencies provide more informative insights into market dynamics compared
to focusing on information deriving from single currencies. In the stock market, research has examined the efficacy of
various network-based portfolio selection methods—including hierarchical clustering trees, minimum spanning trees,
and neighbour-Nets—compared to random and industry group selection strategies [39]. Network analysis methods
offered unique insights into the stock market’s correlation structure, which is crucial for optimising portfolio diversification. This method helps to understand not only individual stock behaviours but also the broader market dynamics
by visualising the interconnections and distances between stocks, thereby facilitating more informed investment decisions. The application of rolling correlation yield trees, which involves analysing correlations over moving time
windows, has been employed to capture the evolving behaviours and clustering patterns in the stock market. This
standard approach helps to identify periods of higher density and industry clustering by revealing how relationships
between stocks intensify or diminish over time [32]. The minimum spanning tree (MST) of the correlation networks
contains the least correlated stocks in its periphery in contrast to highly correlated stocks in the centre of the tree
[25, 35]. This structure is beneficial for identifying potential investment opportunities by focusing on the periphery,
where the least correlated cryptocurrencies are located, offering insights into constructing diversified portfolios that
potentially yield higher profits. However, transforming correlation weights into distances and pruning the network’s
edges, as the MST method requires, may result in the loss of useful structural information from the original correlation network. Such information loss can obscure interdependencies between cryptocurrencies essential for assembling
portfolios with minimally correlated assets [22].
Identifying clusters of highly correlated cryptocurrencies is essential for effective portfolio diversification. To
achieve this, network community detection methods like the Louvain algorithm are employed. This algorithm opti4
mises modularity by assessing the density of edges within communities compared to what would be expected in a
randomised network [9]. Unlike MST, which focuses on the connectivity of the least correlated nodes, the Louvain
method aims to reveal densely connected groups of nodes, providing a different perspective on the underlying market structure. However, the ever-changing nature of cryptocurrency markets can render these communities transient
and the detection algorithms, due to their stochastic nature, may become sensitive to structural changes if the community structure is not sufficiently robust. The primary challenge is that the dependencies identified by traditional
community detection methods can be too temporal, reflecting fleeting market conditions rather than stable intrinsic
interdependencies. Lancichinetti et al. [30] propose consensus clustering to combine multiple stochastic partitions
into a reliable consensus, enhancing both the stability and accuracy of the clustering outcomes. In this study, we
present a clustering algorithm designed to extract static clusters from the dynamic community structures typically
observed in cryptocurrency markets. This method focuses on detecting more enduring and less time-sensitive relationships among cryptocurrencies. By integrating time series analysis and portfolio theory, the proposed technique
not only improves the reliability of conventional community detection methods but also enhances the development of
investment strategies that are less impacted by short-term market volatility. This comprehensive framework supports
the construction of diversified cryptocurrency portfolios using clusters of highly correlated cryptocurrencies that balance historical insights with predictive accuracy, maximising risk-adjusted returns over various investment periods. In
this way, investors can leverage stable interdependencies to make more informed decisions about asset allocation in
rapidly evolving markets.
2. Materials and Methods
Our method integrates time series prediction, network construction, community detection, clustering detection
and cryptocurrency selection. Figure 1 outlines the sequential steps and different components of our framework for
portfolio construction.
2.1. Cryptocurrency price return
We denote Pi,t as the price (in USD) of cryptocurrency i at time t. To remove trends from the time series, we
transform the original time series of prices to a time series of log returns ri,t as follows:
ri,t = ln
Pi,t
Pi,t−1
(1)
The study period is composed of a training and a test period for each cryptocurrency. The training period is set
to [t0 − ∆T , t0 ], where ∆T = 351 days. The performance of the portfolios is evaluated over an investment horizon
5
Step 1 Log return prediction
Step 3 Robust clustering detection
Step 2 Community detection
crypto a
crypto b
0.7
network G1
network Gn
training
0.6
0.3
θM = 0.5
crypto d
test
crypto e
0.6
crypto f
0.8
t0
t0+Δh
cluster 2
0.7
0.7
t0-ΔT
asset 1
crypto c
...
ri,t
cluster 1
random selection
0.6
Δc
Step 4 Cryptocurrency selection
random selection
asset 2
0.8
t
crypto g
Figure 1: The workflow illustrates each step of our method, including price log return prediction (step 1), community detection (step 2), robust
clustering detection (step 3), and cryptocurrency selection (step 4). The full blue lines in step 1 represent the empirical price log return data of the
cryptocurrencies, while the dashed blue lines indicate the predicted value. Total n networks (obtained in step 2) are aggregated in step 3. The blue
circles in step 4 indicate the robust clusters detected using our network clustering method.
spanning from [t0 , t0 + ∆h ], with ∆h varying from 1 to 14 days (Fig.1, step 1).
The log return at a future time th , expressed as:
ri,th = ln
Pi,th
Pi,th−1
(2)
which is predicted using the historical log return data from the study period.
2.2. Cryptocurrency price prediction
We conducted a comparative analysis of price predictive models, including Auto-Regressive Integrated Moving
Average (ARIMA), Long Short-Term Memory (LSTM), and a Naı̈ve method for each cryptocurrency. The AutoRegressive Integrated Moving Average (ARIMA) model is a generalised model of Auto-Regressive Moving Average
(ARMA) that combines Auto-Regressive (AR) and Moving Average (MA) model and builds a composite model of the
time series, which would be a standard model for forecasting a time series. The class of an ARIMA model is defined as
ARIMA(p, d, q), where: p is the order of the auto-regressive model, d is the the order of differencing, and q is the order
of moving average model. We fit the ARIMA model to achieve the minimum Akaike information criterion (AIC),
making the model’s predictive performance better, where AIC quantifies the relative quality of statistical models for a
given set of data [2].
The Long Short-Term Memory (LSTM) is a Recurrent Neural Network (RNN) designed to retain information
6
from previous time steps for future predictions. To forecast the value of each cryptocurrency in a certain study period,
we fit the historical data using the TensorFlow LSTM model [1] and tune the essential hyper-parameters, using the
mean squared error (MSE) loss function. These include the number of units, defining the dimension of hidden states
in the LSTM layer; the dropout rate, which determines the probability of training each node in a layer; the learning
rate, controlling the adaptation speed of the model adjusted, and the sequence length, representing the number of time
steps input into the LSTM network.
The Naı̈ve method serves as a reference model in our study, employing a straightforward prediction method where
the last observed value within the training period [t0 − ∆T , t0 ] is simply used for the prediction. This approach assumes
that the behaviour of the cryptocurrency price remains, that is, the predicted log return for a future time th , denoted as
r̂i,th , is set equal to the log return at the last time in the training period t0 , i.e., r̂i,th = ri,t0 .
The Mean Squared Error (MSE) of each cryptocurrency i at ∆h day is used to evaluate the performance of the
prediction model.
MS Ei,th = (ri,th − r̂i,th )2
(3)
where ri,th is the observed log return values, and r̂i,th is the predicted log return values at time th = t0 + ∆h .
2.3. Cryptocurrency correlation network
A systematic approach is used to evaluate the price relationships between different cryptocurrencies (Fig.1, step
2). We used a fixed period of ∆c = 30 days to calculate correlations between pairs of cryptocurrencies, which allows
us to track the interactions between different cryptocurrencies over time [40]. We employ the correlation network to
examine the dynamic relationships between cryptocurrencies, which provides a framework for analysing the properties
and characteristics of interconnected nodes.
We constructed a weighted network with nodes i = {1, 2, 3, ..., N} representing cryptocurrencies, and edges representing the correlation ρi j between cryptocurrencies. Note that we neglect the correlation if it is smaller than a
threshold θρ . The correlation ρi j between the two cryptocurrencies i and j is used to measure the level of interdependency between them and is represented by a weighted edge (i, j) within the network, where ρi j serves as the weight of
the edge [13]. We use the Pearson correlation coefficient ρi j between log returns of cryptocurrencies i and j over the
fixed period ∆c :
ri,t − r̄i r j,t − r̄ j
ρi j = q
2
P
2 P t ri,t − r̄i
t r j,t − r̄ j
P
t
(4)
where r̄i and r̄ j are the mean log returns of cryptocurrencies i and j respectively. The Pearson correlation coefficient
ρi j ranges from -1 to +1, indicating the strength and direction of the linear relationship between the log returns of
7
cryptocurrencies i and j. A threshold θρ = 0.5 is applied to the original correlation network to filter the network
information by removing the least correlated edges. That is, only edges characterised by ρi j > θρ = 0.5 are retained.
2.4. Network clustering
Networks often consist of several groups of nodes. These groups of nodes form the so-called network communities. Each community within the network consists of a subset of nodes where the pair of nodes in the same community
are more densely connected than those in different communities.
Modularity, represented as Q, quantifies the extent to which a network can be divided into these distinct communities (eq.5). The value of Q measures the density of edges inside communities relative to what would be expected at
random. A higher Q value indicates a network with a modular structure.
Q=
"
#
ki k j
1 X
ρi j −
δ(ci , c j )
2m i, j
2m
(5)
where ρi j represents the weight of the edge between nodes i and j. The weighted degree ki and k j are the sum of
the weights of the edges connected to nodes i and j respectively. The total sum of all edge weights in the network is
denoted by m. The function δ(ci , c j ) equals 1 if nodes i and j belong to the same community, and 0 otherwise, where
ci and c j denote the community assignments for nodes i and j, respectively. The Louvain algorithm is a method for
detecting communities in networks by maximising modularity [9]. We apply it to identify the communities in the
correlation network of cryptocurrencies. The community structure was detected for each network in the time interval
∆c .
The outcome of the Louvain algorithm is not unique, which depends on the initial condition. Thus, we designed
a method for detecting temporally robust clusters of highly correlated cryptocurrencies. The community detection
procedure was repeated n = 30 times, where each time, the detecting interval ∆c was shifted backwards by one
day to obtain total n networks (Fig.1, step 2). When predicted information is incorporated, n networks spanning
[t0 + ∆h − ∆c − n + 1, t0 + ∆h − n + 1], [t0 + ∆h − ∆c − n + 2, t0 + ∆h − n + 2], . . . , [t0 + ∆h − ∆c , t0 + ∆h ]. Otherwise, spanning
[t0 − ∆c − n + 1, t0 − n + 1], [t0 − ∆c − n + 2, t0 − n + 2], . . . , [t0 − ∆c , t0 ]. We aggregated community detection outcomes
across n time intervals to build a N × N similarity matrix S . All entries si j of S start with 0 and are incremented by 1
each time cryptocurrencies i and j are identified within the same community during a given detection interval ∆c . The
s
normalised similarity score s̃i j between any pair of cryptocurrencies is calculated by s̃i j = ni j , where si j is an element
of the aggregated matrix, and n is the total number of time intervals. s̃i j ranges from 0 to 1, where 0 indicates that
the pair of cryptocurrencies i and j never appear in the same community, and 1 indicates consistent co-membership
across all investigated periods. This normalised matrix S̃ serves as the basis for constructing a similarity network Γ
8
(Fig.1, step 3), where nodes represent cryptocurrencies and edges now reflect the frequency of co-occurrence in the
same communities.
The final robust clustering structure is obtained through a hierarchical decomposition of the matrix S̃ . We apply a
threshold θ M = 0.5 to the normalised network S̃ to ensure that two cryptocurrencies are considered part of the same
robust cluster only if they have appeared in the same community in at least 50% of the times, removing all edges with
s̃i j < θ M . The connected components in network Γ are isolated through this process, with each component representing
a robust cluster of interdependent cryptocurrencies (see clusters of step 4 in Fig.1). This step increased the reliability
of the final clusters by averaging the detected communities over different realisations of the stochastic algorithms and
by shifting study periods, thus removing spurious memberships and focusing on strong inter-dependencies.
2.5. Portfolio forming strategy
A portfolio is a combination of financial assets held by an individual or entity to achieve specific investment
objectives. In the traditional market, it can encompass various assets like stocks, bonds, and commodities, aiming to
balance risk and return. For the portfolio construction process, the robust clustering structure can be used to identify
groups of interdependent cryptocurrencies. To ensure that the portfolio incorporates a diverse set of cryptocurrencies,
each representing a different aspect of market behaviour and potential investment value, one cryptocurrency can be
selected uniformly at random from each cluster to be included in the portfolio (Fig.1, step 4).
No Shifting (Baseline)
No price prediction
With Shifting (S)
Historical data
No Shifting (P)
With price prediction
With Shifting (P-S)
Figure 2: Flowchart illustrating the strategies employed for detecting robust clusters of highly correlated cryptocurrencies. The strategies are
divided based on whether they utilise predicted data and whether they include shifting in the training periods.
We proposed four strategies that utilise both historical and predicted price data to identify robust clusters within
the price correlation network. These strategies are categorised based on the use of predicted data and the procedure of
network clustering (Fig.2). The baseline strategy employs purely historical price data to identify community structures
without any alteration to the detecting interval ∆c . It repeatedly applies community detection within the same interval
across n different initial conditions to identify robust clusters. The S strategy also relies solely on historical price
9
data but incorporates a shifting mechanism where the detecting interval ∆c is systematically shifted backwards by one
day for each realisation of the community detection algorithm, over n intervals. This mechanism exploits historical
patterns reflected on average over varied periods. The P strategy integrates predicted price data into the network
construction process, employing the prediction model to forecast future individual prices. Community detection is
performed within the same interval ∆c including the test period and repeated n times, which is the same as the baseline
method. The P-S strategy uses predicted prices obtained from the prediction model and includes the shifting for
clustering detection. This method leverages both future price predictions and historical price movements to identify
persistent correlations between cryptocurrencies and consequently more robust intrinsic inter-dependencies between
them.
Each strategy is designed to capture different aspects of market behaviour, from static historical patterns to dynamic changes influenced by predicted movements. This approach aims to enhance the accuracy and robustness of
the cryptocurrency clustering structure, thereby refining the cryptocurrency selection process for diversified portfolio
construction.
2.6. Cryptocurrency portfolio
Optimal asset allocation remains a central issue in investment portfolio management [41]. Modern Portfolio
Theory (MPT) provides a robust framework for distributing asset weights by evaluating the variance and correlations
among assets. This framework is effective not only for traditional asset classes but also for non-traditional ones, such
as cryptocurrencies. Cryptocurrencies present a unique interdependency structure based on price correlations, which
can be used to substantially enhance portfolio diversification and improve the overall risk-return performance.
To navigate the trade-off between risk and return, our study incorporates the Sharpe Ratio (SR) in the portfolio
optimisation (eq.6). The Sharpe Ratio measures the excess return per unit of risk relative to a risk-free rate (r f ). The
method is designed to maximise the efficiency of the portfolio by aiming for the highest possible returns adjusted for
risk.
h
i
h
i
E rportfolio − r f
E rportfolio − r f
SR = q
h
i=
σportfolio
Var rportfolio − r f
(6)
h
i
where E rportfolio is the expected portfolio return, and σportfolio is the standard deviation of the portfolio return, representing the total risk of the portfolio. r f is the risk-free rate, set here to be a constant value of 0.02, which represents
the expected average return of 10-year U.S. Treasury bonds (considered as a benchmark for a risk-free investment).
The inclusion of the risk-free rate provides a more comprehensive evaluation metric by assessing the performance of
an investment compared to a risk-free asset, after adjusting for its risk.
10
h
i
The expected return of the portfolio, E rportfolio , is calculated by taking the weighted average of the expected
returns of individual assets in the portfolio, where wi represents the weight assigned to each asset i and µi represents
the expected return of asset i:
h
i X
E rportfolio =
wi µi
(7)
The portfolio’s volatility σportfolio is computed using the covariance matrix of the cryptocurrencies’ returns Σ and
the vectors of weights for each cryptocurrency w:
√
σportfolio =
wT Σw
(8)
By integrating the SR maximisation approach within the framework of MPT, our method not only targets higher
returns but also ensures that these returns are achieved with a calibrated risk level, suitable for the volatile nature of
cryptocurrency markets. The optimisation problem for determining optimal asset weights in the portfolio, given a
selected set of cryptocurrencies based on clustering, can be expressed as:
µT w − r f
max √
w
wT Σw
(9)
subject to eT w = 1, where µ = (µ1 , µ2 , . . . ) is the vector of expected returns for each cryptocurrency and e = (1, 1, . . . )
denotes the unity vector.
To conduct these calculations, we employ the PyPortfolioOpt library, which utilises MPT principles to optimise the
SR. This Python-based tool performs complex calculations of expected returns and associated risks to strategically
balance potential returns against risk. Utilising the EfficientFrontier module from PyPortfolioOpt, the process of
assigning weights to various assets is streamlined, aiming to maximise risk-adjusted returns [33].
2.7. Portfolio performance metric
Three key metrics are used to assess the trading performance of portfolios constructed using the proposed strategies. The Average Trade (AT) quantifies the mean profit or loss per trade:
PM
AT =
itr =1 Pitr
M
(10)
where Pitr represents the profit or loss for trade itr , equivalent to the return for that trade when the initial investment is
normalized to 1. Note that Pitr can take a negative value for losing trades. M denotes the total number of trades. A
positive AT value means that, on average, the strategy produced a profit.
11
The Win Rate (WR) measures the ability of a trading strategy to generate profitable trades, which is the percentage
of trades with positive returns out of all trades:
WR =
Mwin
M
(11)
where Mwin is the number of winning trades (i.e., trades with a positive return). A win rate larger than 0.5 indicates
that more than half of the trades were profitable, implying a generally successful strategy.
The Profit Factor (PF) measures the ratio of the total profits to the total losses from all trades:
+
i =1 Pitr
PM
PF = P Mtr
−
itr =1 Pitr
(12)
where P+itr is the profit from trade itr , and P−itr is the loss from trade itr . P+itr and P−itr are non-negative numbers. A
profit factor above 1.0 indicates that the strategy’s gross profits surpass its gross losses, highlighting its financial
effectiveness.
2.8. Cryptocurrency Price Data
The dataset comprises daily closing market prices (at midnight) of 5,450 cryptocurrencies collected from various
online sources. We extracted the top 1,000 currencies with the highest market cap on 22.02.2022 and then those
persistently active (being traded) from 15.11.2017 to 15.04.2022 (T total = 1, 613 days). Cryptocurrencies with missing
prices for more than ten days are excluded. The final data set contains a time series of prices (in USD) for N = 157
cryptocurrencies. The full list of cryptocurrencies used in our study is given in the supplementary information.
3. Results and Discussion
In this section, we present the descriptive statistics of the cryptocurrencies’ prices during the study periods. We
then examine the performance of the future price prediction, followed by an analysis for the community structure of the
correlation network, and the properties of the detected cryptocurrencies clusters. Finally, we assess the performance
of the proposed portfolios.
3.1. Descriptive statistics
To assess the dynamics of the cryptocurrency market, we first examine descriptive statistics of the price return data
across a selection of cryptocurrencies. This initial exploration aims to provide an understanding of their inherent properties, including volatility, skewness, and kurtosis, which are crucial in guiding subsequent future price forecasting
and community detection analysis.
12
T
Mean (×10−3 )
Std deviation
Minimum
Median (×10−3 )
Maximum
Skewness
Kurtosis
ADF test statistic
ADF test p-value
BTC
ETH
LTC
USDT
XRP
OCN
DLT
ENG
ETP
FUEL
1612
1.065
0.042
-0.480
1.691
0.203
-0.922
12.743
-28.080
<0.001
1612
1.378
0.053
-0.570
1.752
0.246
-0.927
10.496
-27.494
<0.001
1612
0.349
0.058
-0.466
-0.254
0.426
-0.059
9.223
-15.159
<0.001
1612
-0.003
0.004
-0.057
0.000
0.053
0.066
75.109
-10.320
<0.001
1612
0.823
0.067
-0.541
0.178
0.618
0.917
15.453
-27.452
<0.001
1612
0.770
0.176
-0.581
-4.167
6.099
25.969
898.229
-42.422
<0.001
1612
-2.732
0.094
-0.960
-1.321
0.809
-0.374
15.624
-11.396
<0.001
1612
-1.800
0.090
-0.769
-2.219
0.889
0.364
13.972
-43.292
<0.001
1612
-2.110
0.078
-0.829
-3.151
0.591
-0.417
15.725
-27.504
<0.001
1612
-2.708
0.375
-2.943
-3.774
2.920
-0.070
32.004
-7.631
<0.001
Table 1: Descriptive statistics and augmented Dickey–Fuller (ADF) test without trend of log returns (ri,t ) taken at T times (days) for the top 5 and
bottom 5 cryptocurrencies according to their average market capitalisation over the whole study period. The p-values of the ADF test reject the
null hypothesis, which indicates that the time series are stationary. Time T is given in days and the Mean, Std deviation, Minimum, Median, and
Maximum of the log returns are calculated in USD.
Table 1 shows the log return ri,t for the top 5 (BTC: Bitcoin; ETH: Ethereum; LTC: Litecoin; USDT: Tether; XRP:
Ripple) and bottom 5 (OCN: Odyssey; DLT: Agrello; ENG: Enigma; ETP: Metaverse; FUEL: Etherparty) cryptocurrencies according to their average market capitalisation during the study periods. The descriptive statistics reveal
differences in the behaviour of these assets. Bitcoin (BTC) and Ethereum (ETH) show positive mean returns, while
lower market-cap cryptocurrencies such as DLT and FUEL exhibit negative mean returns. The standard deviation
indicates higher volatility for lower market cap cryptocurrencies, with FUEL having the highest standard deviation.
A Gaussian random variable would have skewness of 0 and kurtosis of 0, therefore, the skewness and kurtosis values
indicate that the returns exhibit non-normal distribution properties, characterised by heavy tails and frequent extreme
values for cryptocurrencies like OCN. Augmented Dickey-Fuller (ADF) test results confirm that the log returns ri,t of
all examined cryptocurrencies are stationary, rejecting the null hypothesis at a significance level below 0.001. This
implies that statistical properties such as mean and variance do not change over time, i.e., the stationarity of the times
series.
BTC
ETH
LTC
USDT
XRP
OCN
DLT
ENG
ETP
FUEL
BTC
ETH
LTC
USDT
XRP
OCN
DLT
ENG
ETP
FUEL
1
0.57
0.77
0.02
0.56
0.18
0.30
0.45
0.53
0.12
0.57
1
0.58
0.03
0.49
0.13
0.37
0.38
0.42
0.12
0.77
0.58
1
0.06
0.64
0.19
0.32
0.44
0.52
0.11
0.02
0.03
0.06
1
-0.02
-0.06
-0.03
-0.02
-0.05
0.00
0.56
0.49
0.64
-0.02
1
0.13
0.27
0.35
0.44
0.12
0.18
0.13
0.19
-0.06
0.13
1
0.08
0.11
0.10
0.03
0.30
0.37
0.32
-0.03
0.27
0.08
1
0.26
0.32
0.08
0.45
0.38
0.44
-0.02
0.35
0.11
0.26
1
0.36
0.08
0.53
0.42
0.52
-0.05
0.44
0.10
0.32
0.36
1
0.08
0.12
0.12
0.11
0.00
0.12
0.03
0.07
0.08
0.08
1
Table 2: Correlation matrix (ρi j ) of log returns for the top 5 and bottom 5 cryptocurrencies ranked according to their average market capitalisation
over the whole study period. Correlations larger than 0.5 are in bold.
13
Table 2 shows the price correlation matrix between the pairs of the cryptocurrencies. The highest correlations are
observed between the high-cap cryptocurrencies such as BTC, ETH, and LTC, indicating that these cryptocurrencies
often move together, which can be a factor that should be beneficial for predictive models aiming to capture shared
market movements. Conversely, Tether (USDT), in particular, shows relatively low correlations with other cryptocurrencies, reflecting its function as a stablecoin pegged to the U.S. dollar. Rather than exhibiting a perfectly constant
price, USDT is designed to minimize volatility relative to traditional cryptocurrencies, which can make it an attractive
asset during turbulent markets. When traders and investors move into stablecoins like USDT during upheavals, correlations across the broader cryptocurrency market can shift abruptly. These sudden changes in correlation and investor
behaviour can pose substantial challenges for predictive models, which may struggle to capture the rapidly evolving
dynamics of the market.
3.2. Performance price prediction
We evaluated the performance of three price prediction methods — ARIMA, LSTM, and Naı̈ve method (see
Sec.2.2 for details) — across n s = 42 study periods, each including a training period ∆T = 351 days and up to 14-day
test period.
Figure 3 shows the log return prediction for investment horizons ranging from 1 to 14 days for cryptocurrencies
BTC and ETH, during periods with and without external shocks. In periods of market stability, day-by-day MSE
values remain relatively low for both cryptocurrencies. For BTC (Fig. 3A), averaging the MSE over the 1–14 day
horizon indicates that LSTM achieves the lowest mean MSE at 5.80×10−4 , followed by ARIMA at 6.34×10−4 , while
the Naı̈ve method exhibits the highest mean MSE of 29.77×10−4 . For ETH (Fig. 3B), the MSEs are generally higher
than those for BTC but still manageable. Over the same 14-day horizon, LSTM maintains a slightly lower mean MSE
of 15.04×10−4 , compared to ARIMA at 16.53×10−4 and the Naı̈ve method at 40.21×10−4 .
The performance changes during turbulent scenarios, such as the time when the World Health Organization
(WHO) declared the novel coronavirus (COVID-19) outbreak a global pandemic on March 11, 2020. This declaration led to heightened uncertainty in global markets, including cryptocurrencies, which subsequently saw increased
volatility. In this context, average MSEs over 1 to 14 days prediction period for both cryptocurrencies rise, indicating
the increased prediction challenge. For BTC (Fig. 3C), the mean MSE over the 1–14 day horizon rises to 2.14×10−2
for ARIMA and LSTM, with the Naı̈ve method performing comparably, also averaging at 2.16×10−2 . Similarly, ETH
(Fig. 3D) experiences heightened forecasting challenges, as ARIMA and LSTM approach mean MSE at 3.07×10−2
and 3.06×10−2 , while the Naı̈ve method records a slightly lower, though still elevated, mean MSE of 3.00×10−2 .
While ARIMA and LSTM generally outperform the Naı̈ve method in stable environments by maintaining lower
average MSE levels, their advantage diminishes during periods of heightened volatility. Under such turbulent sce14
0.05
0.00
-15
-08
20
20
-12
-01
20
20
20
20
-12
-22
-11
-15
20
20
-11
-08
20
-22
-03
-15
20
-03
-08
20
20
Date
20
20
20
-03
-01
-03
20
-22
20
-02
20
20
-02
-15
ETH
-08
20
20
-02
-22
-03
-15
20
-03
-08
20
D
-0.60
20
Date
20
20
20
-03
-01
-03
20
-22
20
-02
-15
20
20
20
-02
-08
-02
20
20
-0.40
BTC
Date
-0.20
20
C
-11
-01
0.00
-0.60
20
20
20
0.00
ri, t
0.20
-0.20
ETH
-11
-15
-08
20
0.20
-0.40
B
-0.10
20
20
-12
-01
Date
20
20
20
-12
-22
-11
-15
20
20
-11
-08
20
-11
20
20
20
20
20
-11
-01
BTC
20
A
-0.10
-12
-0.05
-12
-0.05
20
ri, t
0.00
20
ri, t
0.05
ri, t
0.10
Empirical
ARIMA
LSTM
Naïve
20
0.10
Figure 3: Time series of log returns for two cryptocurrencies (A) BTC and (B) ETH for each model of predicting horizon from 1 to 14 days, using
price data during a period without market disruptions, and (C) BTC and (D) ETH during an external shock (COVID-19 outbreak). The vertical
black dashed line shows time t0 , which separates the training and prediction periods. The dashed lines represent price predictions according to
different models.
narios, the Naı̈ve method becomes comparatively more resilient. The overall increase in average MSE indicates that
sudden market changes are challenging for the log return prediction of cryptocurrencies. This result highlights the
need for robust predictive strategies that can adapt to both normal and extreme market conditions.
∆h (days)
LSTM
ARIMA
Naı̈ve
1
1.58
1.35
2.88
2
1.96
1.92
3.21
3
1.97
1.84
3.06
4
1.70
1.54
2.96
5
1.94
1.71
3.26
6
2.00
2.09
3.78
7
1.94
1.90
2.80
8
1.30
1.29
2.43
9
2.27
2.25
4.52
10
2.04
1.94
3.91
11
8.72
8.60
10.10
12
2.11
2.13
3.76
13
2.27
2.15
3.68
14
1.57
1.48
2.97
Table 3: The average of the median MSE (×10−3 ) for prediction horizons ∆h = 1, 2, ...14 days. This metric is obtained by first taking the median
MSE across all N = 157 cryptocurrencies, followed by averaging these medians across all n s = 42 study periods. The best prediction model with
the smallest MSE for each day prediction is labelled in bold.
We evaluated the prediction accuracy of the three methods across all study periods by calculating the median of
the Mean Squared Error (MSE) for each cryptocurrency over prediction horizons ∆h (Tab.3). To assess the overall
predictive power, the calculated median MSEs for each method were averaged over all n s = 42 study periods. The
15
median MSE was chosen because it is less sensitive to extreme values, providing a more stable evaluation across the
diverse behaviours of cryptocurrencies. Table 3 indicates that the ARIMA overall outperforms the LSTM and Naı̈ve
method, achieving the lowest MSE across nearly all prediction horizons.
2.0
Median MSE
1.5
1.0
0.5
0.0
0.5
1.0
DI
ON ME
I
FUON
E
XCL
SYP
BIT S
C
MDNY
QRT
KEL
VE Y
OMRI
ZANI
SO P
U
PA L
UB C
NEQ
BCU
CND
EDD
G
PR
E
1.5
Cryptocurrency
Figure 4: The top 20 cryptocurrencies with the largest average of the median MSE from the ARIMA method and its standard deviation over all
n s = 42 study periods. The median MSE for each cryptocurrency is computed for all prediction horizons (∆h = 1, 2, ..., 14 days) using the ARIMA
method. This median is then averaged over all n s = 42 study periods to get the final ranking. The dashed horizontal line shows median MSE = 0.
Figure 4 shows the top 20 cryptocurrencies with the largest average of the median MSE for ARIMA model. The
results emphasise that while the majority (98.73%) of cryptocurrencies have an average of the median MSE lower than
0.056, DIME and ONION are characterised by higher unpredictability. This indicates that these cryptocurrencies are
notably difficult to predict accurately, possibly due to their volatile market behaviours, lower liquidity, or susceptibility
to market manipulations and external shocks. The distinction calls for tailored predictive strategies or reconsideration
of their inclusion in standard prediction methods due to their complex price movements.
3.3. Cryptocurrency cluster identification
⟨Q⟩
σ
Baseline
0.124
0.087
P(ARIMA)
0.123
0.084
S
0.121
0.071
P(ARIMA)-S
0.122
0.069
Table 4: Average modularity (⟨Q⟩) and the corresponding standard deviation (σ) over all n s = 42 study periods. This metric is obtained by first
computing the mean Q across the n = 30 realisations for each study period, and then averaging those means over all n s = 42 study periods.
The analysis of the community structure from one giant connected component after applying the threshold θρ = 0.5
from the price correlation networks shows that the average modularity ⟨Q⟩ values for each strategy across n = 30
realisations for each period are relatively low (Tab.4), with values around 0.12. Modularity values range from −1
16
to 1, where values closer to 1 indicate a strong and well-defined community structure. Thus, the observed low
modularity values suggest a weak modular or clustered structure. Among all the strategies, those without sliding
windows, specifically the Baseline and P(ARIMA), resulted in slightly higher average modularity ⟨Q⟩. Conversely,
while sliding window strategies, such as S and P(ARIMA)-S, demonstrate lower standard deviations σ, reflecting
their higher consistency in cluster definitions across different periods, at the cost of reduced modularity.
0.8
0.8
without external shock
with external shock
0.6
Proportion
Proportion
0.6
0.4
0.2
0.0
0.4
0.2
a
0
5
10
15
20
25
Cluster Size
30
35
40
0.0
b
0
5
10
15
20
25
Cluster Size
30
35
40
Figure 5: Size distribution of the final robust cryptocurrency clusters after applying the (A) Baseline strategy and (B) P(ARIMA) strategy, during
the same study periods, comparing scenarios without external shocks and with external shocks (COVID-19 outbreak).
Figure 5 shows the size distribution of final clusters constructed using the Baseline and P(ARIMA) strategies,
during the same study period with and without external shocks. Under both scenarios, the cluster size distributions
remain skewed toward single-node clusters, with the P(ARIMA) strategy yielding a higher number of such clusters.
Although single-node clusters dominate, larger clusters containing up to 38 nodes also appear. The predominance
of single-node clusters results in a more consistent selection of cryptocurrencies for portfolio construction, as fewer
cryptocurrencies are available per cluster, typically leading to repeated selections of the same cryptocurrency. However, larger clusters are also formed, introducing variability in the randomised selection process due to the increased
number of cryptocurrencies available in those clusters. This variability not only influences the overall diversity of
the portfolio but can also affect the resilience in returns over time, thereby introducing a layer of unpredictability in
portfolio performance.
Figure 6 shows the distribution of the top 20 cryptocurrencies, ranked by the proportion of occurrence within the
largest cluster identified in each study period for both the Baseline and P(ARIMA) strategies. Among these top 20, 14
cryptocurrencies, are common to both strategies. Some top identified cryptocurrencies, though not as widely recognised as major cryptocurrencies like BTC, also consistently appear in the largest clusters across different clustering
17
Proportion
0.4
0.3
0.2
B
0.5
0.4
0.3
0.2
0.1
0.0
0.0
PAGR
ZCSC
FA
VAIR
L
T
SCOA
R
FL BTCT
AS
PNH
D
SYGRC
PNHX
WR
B
FATC
VAIR
T
FL OAL
AS
PPH
C
SCBCN
RT
SPYND
GNRX
NMC
BT C
FCM
CT
PHVC
ZCR
ML
BCTL
FTHC
0.1
Cryptocurrency
P
MAKC
FR
DGCT
MA B
ID
0.5
Proportion
0.6
A
PL
NMPC
C
0.6
Cryptocurrency
Figure 6: Top 20 cryptocurrencies ranked by the proportion of occurrence within the largest cluster identified across all n s = 42 study periods for
the (A) Baseline and (B) P(ARIMA) strategies. Cryptocurrencies appearing in the top 20 for both strategies are highlighted in bold.
strategies. Their presence alongside a dominant cryptocurrency suggests that they correlate with BTC’s movements.
Such correlations are valuable in portfolio construction, as these cryptocurrencies could be complementary or counterbalance assets relative to BTC. The repeated presence of these cryptocurrencies in large clusters also indicates that
they have more significant interactions with a broad set of other cryptocurrencies, not just BTC. This can be useful
for investors aiming to diversify their portfolios. By investing in these cryptocurrencies, they might be indirectly
exposed to a wider network of cryptocurrencies through the interdependencies captured in these clusters. This can
help in spreading risk and potentially capturing gains from multiple sources within the cryptocurrency market, which
are driven by similar underlying trends or collective market sentiments.
3.4. Portfolio performance
We evaluate the performance of the proposed portfolio strategies: Baseline, S, P(ARIMA), and P(ARIMA)-S(see
Sec.2.5 for details). The performance of each portfolio over the period [t0 , t0 + ∆h ] is evaluated by calculating three
metrics: Average Trade (AT), Win Rate (WR), and Profit Factor (PF) (see Sec.2.5 for details). According to all
indicators (Tab.5), all strategies generally perform well over shorter investment horizons. As the investment horizon
increases, there is a decreasing performance of the portfolios (independently of the chosen strategy). First, the AT
values across all strategies are positively strong on the first day. These positive performances continue, though at a
diminishing rate, up to the 9-day holding period where all cluster-structured portfolios still exhibit positive average
trades. However, a decline is observed as the investment horizon extends. By the 14-day holding, only the P(ARIMA)
strategy shows a positive AT, indicating potential wins. The standard deviation of AT tends to decrease with longer
holding periods, suggesting that the variability in returns is reduced over longer investment horizons, which indicates
that while returns diminish, the variability also becomes more stable over time. Second, the WR starts strong, with
18
∆h (days)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
Average Trade (AT) (x100%)
Baseline P(ARIMA)
S
6.32
4.86
5.45
4.29
3.15
3.52
2.39
0.84
0.75
2.12
1.60
1.72
2.17
2.08
1.10
1.12
1.51
0.63
0.68
1.20
0.61
1.17
1.41
0.91
0.32
0.54
0.12
0.15
0.95
-0.22
0.03
0.56
-0.62
0.09
0.40
-0.63
0.02
0.26
-0.71
-0.21
0.50
-0.94
P(ARIMA)-S
5.90
4.18
1.47
2.38
1.79
1.08
0.74
0.98
0.09
-0.23
-0.70
-0.84
-0.91
-0.81
Standard Deviation of Average Trade (AT)
Baseline P(ARIMA)
S
P(ARIMA)-S
13.54
11.86
15.33
13.41
11.14
10.87
10.64
9.58
8.73
8.15
8.87
7.88
8.45
8.89
9.94
9.65
7.42
6.62
7.49
6.95
7.29
6.72
6.72
6.20
6.98
6.03
6.43
6.36
6.34
5.46
5.99
6.11
5.75
5.00
5.33
5.38
5.40
5.99
5.08
4.78
6.24
5.06
6.60
6.56
5.73
5.50
6.41
6.09
5.70
5.61
6.39
6.22
5.11
5.78
5.42
5.06
∆h (days)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
Win Rate (WR)
Baseline P(ARIMA)
0.69
0.71
0.71
0.62
0.62
0.64
0.62
0.67
0.57
0.71
0.55
0.64
0.55
0.64
0.52
0.62
0.50
0.52
0.48
0.55
0.48
0.52
0.45
0.52
0.48
0.50
0.50
0.60
P(ARIMA)-S
0.74
0.69
0.64
0.62
0.60
0.67
0.55
0.57
0.48
0.57
0.50
0.48
0.60
0.57
Baseline
4.73
3.14
2.17
2.02
2.12
1.47
1.27
1.57
1.15
1.07
1.01
1.04
1.01
0.90
S
0.74
0.71
0.60
0.62
0.60
0.62
0.55
0.57
0.45
0.57
0.57
0.55
0.60
0.55
Profit Factor (PF)
P(ARIMA)
S
P(ARIMA)-S
3.19
3.30
4.23
2.17
2.64
3.50
1.32
1.27
1.65
1.74
1.62
1.98
2.41
1.45
1.91
1.91
1.29
1.60
1.67
1.27
1.34
1.90
1.46
1.50
1.30
1.06
1.04
1.55
0.89
0.88
1.36
0.75
0.71
1.21
0.75
0.66
1.13
0.73
0.64
1.30
0.64
0.65
Table 5: Statistics of four performance indicators of Average Trade (AT) (x100%), the Standard Deviation of Average Trade (AT), Win Rate (WR)
and Profit Factor (PF) of each strategy portfolio for investment horizon holding from 1 to 14 days over all n s = 42 studied periods. Performances
above the break-even values for the Average Trade (0), the Win Rate (0.5), and the Profit Factor (1) are in bold. The columns of strategy P(ARIMA)
that performed well across all holding periods are shaded in grey.
all strategies exceeding a 0.69 probability of profitable trades on day 1. The P(ARIMA) strategy maintains a WR
no less than 0.50 up to the 14-day horizon, suggesting a consistent probability of profitable outcomes. Finally, the
PF values are consistently above 1 in the early days for all strategies, reinforcing the notion of initial profitability.
This metric also decreases over time, aligning with the patterns observed in AT and WR, and indicating a common
trend of reduced profitability over longer investment periods. Despite this, the P(ARIMA) strategy maintains stronger
performance metrics throughout the investment horizon. The Baseline strategy serves as a reference approach. Its
consistently strong performance across shorter periods and comparative resilience in longer durations suggest that
our clustering strategy can be sufficiently effective even without the use of predictive models or sliding window
adjustments.
Figure 7 shows the performance of the Baseline and P(ARIMA) investment strategies over increasing holding
19
0.75
Baseline
P(ARIMA)
5
0.70
4
0.65
3
0.60
PF
WR
AT (x100%)
7
6
5
4
3
2
1
0
0.55
0.50
0.45
A
1
3
5
7
9
h (days)
11 13
1
B
0.40 1
2
3
5
7
9
h (days)
11 13
C
0 1
3
5
7
9
h (days)
11 13
Figure 7: Performance evaluation indicators for portfolio construction strategies Baseline and P(ARIMA) over an investment horizon (∆h ) from
1 to 14 days: A) Average Trade (AT), B) Win Rate (WR), and C) Profit Factor (PF). The dashed lines represent the reference levels of AT = 0,
WR = 0.5, and PF = 1, respectively.
periods. Both strategies maintain robust performance up to a 9-day investment horizon, with the P(ARIMA) strategy remaining profitable for all three indicators throughout the studied period. The Baseline strategy demonstrates a
92.86% probability of achieving positive returns within the investment horizons, while the P(ARIMA) strategy maintains a 100% probability of obtaining positive AT(Fig.7A). Our P(ARIMA) strategy, which uses the predicted prices
based on the ARIMA model instead of sliding the window of historical data, achieves an average expected return of
486.21% for a 1-day investment horizon. The WR fluctuates but remains above 0.5 for both strategies up to the 9day holding period (Fig.7B). Strategies incorporating a predictive model (P(ARIMA)) exhibit higher WR over longer
investment horizons compared to the Baseline strategy. This reflects the advantage of using predictive analytics to
adapt to market changes and enhance trading decisions. The PF for both strategies also declines over longer durations
but remains above 1 for the P(ARIMA) strategy throughout the studied period, indicating consistent profitability and
effective risk management by this strategy (Fig.7C). In contrast, the Baseline strategy, while strong initially, shows
a decrease in profitability over time, which indicates that the predictive capability of the strategy, like P(ARIMA),
captures price movements and trend shifts in the data, enhancing performance and providing a more consistent return
on investment.
Our results demonstrate that the portfolio strategies with time series prediction of log return price yield superior performance over longer investment horizons to the baseline strategy that relies solely on historical data. The
P(ARIMA) strategy leverages predicted price movement to enhance the cluster detection based on current market
data. This approach enables us to adapt to market trends more effectively than strategies that utilise shifting windows. It is worth mentioning that the ARIMA model is more computationally efficient than the LSTM model in this
study. However, it is important to note that predictive models demand substantial computational resources. This added
complexity might not result in commensurate gains in portfolio performance relative to the Baseline strategy. Conse20
quently, the return on investment in predictive approaches may diminish as computational costs rise. While predictive
and shifting strategies offer advantages in terms of adapting to market dynamics and potentially improving returns,
the Baseline strategy remains a viable option due to its lower computational demands and robust performance. Our
findings suggest that direct price predictions are less critical than accurately incorporating the correlation and interdependencies among cryptocurrencies. Understanding these relationships helps in constructing diversified portfolios
that are capable of mitigating risks and enhancing returns without the need for precise price predictions.
3.5. Crypto-market financial context
Between 2017 and 2022, the cryptocurrency market progressed through several distinct phases. The period from
late 2017 to early 2018 witnessed a boom fuelled by exponential price increases and numerous Initial Coin Offerings
(ICOs). This surge was followed by a market correction in 2018–2019, characterized by a prolonged bear market and
heightened regulatory scrutiny. Subsequently, renewed growth emerged in 2019, driven by rising institutional interest.
From 2019 to 2021, institutional adoption intensified, accompanied by significant regulatory developments and the
emergence of new use cases for cryptocurrencies. These trends collectively contributed to a more mature, diversified,
and resilient ecosystem. However, different periods within this timeline contributed differently to the performance
of cryptocurrency forming and investing strategies. The speculative frenzy of 2017 and the subsequent correction
phase had distinct impacts on market dynamics compared to the more stable and mature period of 2019 to 2021.
External shocks, such as the COVID-19 pandemic and periods of high inflation, also affected cryptocurrency prices
and influenced the performance of proposed strategies for portfolio investments. While the performance of indicators
performs well overall, the performance of investment strategies might vary across different market conditions. For instance, strategies that perform well during periods of market stability might not be as effective during times of extreme
volatility caused by external shocks. This variability of exogenous effects emphasises the need for adaptive investment
approaches that can respond to both anticipated and unforeseen market changes, ensuring robust performance across
varying market environments.
Our consensus clustering frequently identified Bitcoin’s (BTC) presence in the largest clusters, reflecting its role
as a key asset perceived by investors as a lead indicator. This frequent inclusion aligns with herd behaviour patterns,
particularly during periods of market uncertainty, where BTC’s influence drives high correlations across multiple
cryptocurrencies. Herd behaviour affects major assets like BTC to have a cascading effect on correlated assets, increasing systemic risk across the market. The robustness of our approach can distinguish persistent interdependencies
from short-lived, sentiment-driven correlations, suggesting that some relationships are more fundamental and resilient
to transient investor sentiment. For instance, cryptocurrencies that consistently move together can be considered as
belonging to the same risk class. Investors can leverage the patterns of clustering to proceed with hedging strategies
21
by combining assets from different risk classes to mitigate exposure to market-specific shocks. The discrepancies in
prediction accuracy during abrupt market changes can be interpreted through the perspective of behavioural finance,
such as investor reactions to regulatory news or macroeconomic events. These disruptions, captured by our portfolio
strategy, provide traders with opportunities to exploit temporary mispricing. For instance, if one cryptocurrency lag
behind one of its highly correlated cryptocurrencies during a sentiment-driven market rise, historical clustering data
allows traders to anticipate its subsequent adjustment, facilitating arbitrage.
We adopted a risk-free rate (r f ) of 0.02, which reflects the average long-term yield of stable government securities like 10-year U.S. Treasury bonds. This standard choice simplifies calculations and aligns with both academic
norms and industry practices, accommodating expected moderate inflation and economic growth projections. Incorporating r f instead of ignoring it, into our analysis, plays a crucial role, particularly in the context of the Sharpe
Ratio, which measures the excess return per unit of risk relative to the risk-free rate. A risk-free rate of 0.02, reflective of safe investment benchmarks like government bonds, establishes a foundational metric for evaluating the
performance of cryptocurrency portfolios. The adjustment of r f could affect the benchmark for achieving competitive
risk-adjusted returns. In the Capital Market Line (CML) framework, r f represents the point where risk-free investments are combined with risky assets to construct an optimal portfolio. A higher r f raises the threshold necessary for
achieving competitive risk-adjusted returns, as portfolios need to exhibit stronger performance to surpass this elevated
benchmark, effectively increasing the slope of the CML and pushing portfolios towards riskier allocations to optimise
returns. The consideration of r f ensures our assessment aligns with real-world investment conditions, providing a
robust framework for comparing the risk and return profiles of cryptocurrency investments. This also aligns with the
Efficient Frontier concept, as the Sharpe Ratio is used to determine the tangent portfolio on the Efficient Frontier, representing the most efficient trade-off between risk and return. It’s important to note that the selection of r f primarily
influences the relative performance metrics and not the absolute returns computed from our portfolios r port f olio . Since
our portfolios are assessed based on their performance relative to this baseline, the actual returns of the portfolios
themselves are not directly impacted by changes in the risk-free rate. This ensures that our evaluation of portfolio
returns remains focused on their ability to outperform the risk-free benchmark, thus providing a clear perspective on
the real investment value they offer within the dynamic cryptocurrency market environment.
4. Conclusion
Our study introduces a comprehensive approach that integrates predictive analytics, portfolio theory, and network
analysis to enhance the selection process of cryptocurrencies to maximise the asset diversification of cryptocurrency
portfolios. We introduce a method to identify clusters of highly correlated cryptocurrencies. Robust clusters capture
22
intrinsic interdependencies between cryptocurrencies, providing means to diversify portfolios and increase potential
returns independently of the market conditions. By integrating predictive models based on historical price data, our
methodology strikes a balance between foresight and retrospective insight, allowing for the anticipation of future market movements. This dual approach uncovers significant correlations between cryptocurrencies, which are critical for
constructing robust investment portfolios. By examining these relationships through network analysis, we were able
to identify clusters of cryptocurrencies likely to perform well together and also provide insights into the underlying
dynamics driving those correlations.
Our primary objective is not merely to detect communities within the cryptocurrency market, but to design a
methodology to identify groups of persistently highly inter-dependent cryptocurrencies, i.e. to detect intrinsic dependencies between cryptocurrencies. This clustering serves the practical purpose of facilitating the automated selection
of diverse coins for investment portfolios, thereby optimising risk management and potential returns. The patterns,
revealed through various forming strategies, are instrumental in understanding how groups of cryptocurrencies are
likely to behave in relation to one another under different market conditions.
The evaluation of the portfolio performance relied on key metric to assess the effectiveness and risk-adjusted
returns of our strategies. Strategies integrating predictive price information outperform those that rely solely on historical data or those applying the shifting techniques to capture dynamic inter-dependencies. This highlights the
importance of predictive models in capturing dynamic market trends and effectively forecasting price movements,
which are essential for strategic portfolio adjustments. The complexity and resource demands of implementing such
predictive models may not always justify the marginal improvements in portfolio performance, especially when simpler models without a price prediction component provide a relatively good performance with considerably lower
resource requirements.
Our study focused on the daily closing prices. Given the dynamic nature of the market, future research should
consider trading strategies based on higher temporal resolution data to exploit intra-day price fluctuations. However,
high frequency trading leads to a larger weight of trading costs in profitability assessments. Intra-day investment
should thus also include the trading cost overhead in the profit calculations. Risk assessment tools were limited to
standard deviation and Sharpe Ratio. Integrating additional risk assessment tools such as Value-at-Risk (VaR) and
Conditional Value-at-Risk (CVaR) would enhance risk assessment and practical applicability. Such enhancements
would refine the strategies for real-world application and compliance with evolving regulatory standards.
The core of our methodology is a robust detection of highly correlated cluster of cryptocurrencies via network
analysis. While machine learning offers alternative clustering techniques that could be integrated into our approach,
our network-based method has distinct advantages. It captures evolving interdependencies among cryptocurrencies,
23
offering interpretable insights into dynamic relationships. Compared to machine learning clustering methods, which
may rely on feature selection and exhibit high variance, our approach uses intrinsic network properties for more stable
and economically meaningful clusters. The consensus clustering further enhances robustness by aggregating multiple
outcomes, reducing stochastic variability, which is beneficial in the complex cryptocurrency market.
By identifying consistently co-moving cryptocurrencies and robust clusters, our approach helps to understand the
market stability and potential vulnerability, enhancing the comprehension of systemic risks. The hedging can proceed,
as identifying correlated assets enables investors to implement targeted risk mitigation strategies that reduce exposure
to specific market shocks. Our method distinguishes between risk classes, and identifies arbitrage opportunities,
addressing both market inefficiencies and investor behavioural biases. Recognizing strong, recurring correlations
allows for the prediction of price co-movements, enabling traders to capitalise on temporary mispricing across the
market. Traders can leverage this to profit from temporary pricing discrepancies, thus exploiting inefficiencies within
the market. The use of consensus clustering methods ensures resilience against transient market sentiments, making
the portfolio strategy more effective in the volatile cryptocurrency environment, which tends to rapid and fleeting
sentiment-driven shifts. Cryptocurrencies, as a unique asset class, exhibit characteristics of speculative stocks while
also serving as potential stores of value. By recognizing them as a distinct, highly inter-correlated asset category,
our method allows for their strategic integration with traditional assets (e.g., equities, bonds, commodities) to hedge
against economic shifts, inflation, and other macroeconomic factors.
In conclusion, our study highlights that cryptocurrencies are sufficiently mature and have a realistic potential to
be included as part of aggressive portfolios for investors willing to take acceptable risks. Employing advanced clustering techniques alongside network community analysis demonstrates a transformative potential in cryptocurrency
investment strategies. By focusing on the identification of clusters of highly correlated cryptocurrencies, our approach
offers a comprehensive understanding of asset relationships that can significantly enhance portfolio diversification and
resilience. Investors and regulatory bodies could leverage the strategic value of these clustering techniques, which not
only facilitate informed asset allocation but also optimise risk management across various market conditions. We
recommend the broader adoption of such methodologies to improve investment outcomes in an adaptive investment
environment. As the market evolves, integrating advanced asset clustering and price prediction techniques will be
essential for investors seeking to capitalise on emerging opportunities while navigating the complexities of regulatory
frameworks. Embracing this advanced analytical approach will empower stakeholders to achieve superior financial
performance and maintain competitiveness in the dynamic cryptocurrency landscape.
24
Acknowledgements
The authors declare that they have no conflict of interest. R.J. is funded by the China Scholarship Council (CSC)
from the Ministry of Education of P. R. China. R.K. is partially funded by JSPS KAKENHI (Nos. JP18K11560,
JP21H03559, JP21H04571, JP22H03695, and JP23K24950), JST PRESTO (No. JPMJPR1925), and AMED (No.
JP223fa627001). L.E.C.R. is partially funded by the FWO Scientific Research Network (W001625N), and Bijzonder
Onderzoeksfonds (BOF/STA/201909/022) from Ghent University, Belgium.
References
[1] Martı́n Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving,
Michael Isard, et al. {TensorFlow}: a system for {Large-Scale} machine learning. In 12th USENIX symposium on operating systems design
and implementation (OSDI 16), pages 265–283, 2016.
[2] Hirotugu Akaike. A new look at the statistical model identification. IEEE transactions on automatic control, 19(6):716–723, 1974.
[3] Rasoul Amirzadeh, Asef Nazari, and Dhananjay Thiruvady. Applying artificial intelligence in cryptocurrency markets: A survey. Algorithms,
15(11):428, 2022.
[4] Andrew Ang. Asset management: A systematic approach to factor investing. Oxford University Press, 2014.
[5] Nikolaos Antonakakis, Ioannis Chatziantoniou, and David Gabauer. Cryptocurrency market contagion: Market uncertainty, market complexity, and dynamic portfolios. Journal of International Financial Markets, Institutions and Money, 61:37–51, 2019.
[6] Sonia Arsi, Soumaya Ben Khelifa, Yosra Ghabri, and Hela Mzoughi. Cryptocurrencies: Key risks and challenges. In Cryptofinance: A New
Currency for a New Economy, pages 121–145. World Scientific, 2022.
[7] Henri Arslanian. The emergence of new blockchains and crypto-assets. In The Book of Crypto: The Complete Guide to Understanding
Bitcoin, Cryptocurrencies and Digital Assets, pages 99–119. Springer, 2022.
[8] Suleyman Basak, Anna Pavlova, and Alexander Shapiro. Optimal asset allocation and risk shifting in money management. The Review of
Financial Studies, 20(5):1583–1621, 2007.
[9] Vincent D Blondel, Jean-Loup Guillaume, Renaud Lambiotte, and Etienne Lefebvre. Fast unfolding of communities in large networks.
Journal of statistical mechanics: theory and experiment, 2008(10):P10008, 2008.
[10] Alexander Brauneis and Roland Mestel. Cryptocurrency-portfolios in a mean-variance framework. Finance Research Letters, 28:259–264,
2019.
[11] Denis Chaves, Jason Hsu, Feifei Li, and Omid Shakernia. Risk parity portfolio vs. other asset allocation heuristic portfolios. Journal of
Investing, 20(1):108, 2011.
[12] Shaen Corbet, Brian Lucey, Andrew Urquhart, and Larisa Yarovaya. Cryptocurrencies as a financial asset: A systematic analysis. International Review of Financial Analysis, 62:182–199, 2019.
[13] Luciano da Fontoura Costa, Osvaldo N Oliveira Jr, Gonzalo Travieso, Francisco Aparecido Rodrigues, Paulino Ribeiro Villas Boas, Lucas Antiqueira, Matheus Palhares Viana, and Luis Enrique Correa Rocha. Analyzing and modeling real-world phenomena with complex
networks: a survey of applications. Advances in Physics, 60(3):329–412, 2011.
[14] Aswath Damodaran. Value investing: investing for grown ups? Available at SSRN 2042657, 2012.
[15] Niek Deprez and Michael Frömmel. Are simple technical trading rules profitable in bitcoin markets? International Review of Economics &
Finance, 93:858–874, 2024.
25
[16] Abeer ElBahrawy, Laura Alessandretti, Anne Kandler, Romualdo Pastor-Satorras, and Andrea Baronchelli. Evolutionary dynamics of the
cryptocurrency market. Royal Society open science, 4(11):170623, 2017.
[17] Frank J Fabozzi, Sergio M Focardi, and Petter N Kolm. Quantitative equity investing: Techniques and strategies. John Wiley & Sons, 2010.
[18] Frank J Fabozzi, Harry M Markowitz, and Francis Gupta. Portfolio selection. Handbook of Finance, 2, 2008.
[19] William N Goetzmann and Alok Kumar. Equity portfolio diversification. Review of Finance, 12(3):433–463, 2008.
[20] Marı́a Guinda and Ritabrata Bhattacharyya. Using principal component analysis on crypto correlations to build a diversified portfolio.
Available at SSRN 3918398, 2021.
[21] Mi Yeon Hong and Ji Won Yoon. The impact of covid-19 on cryptocurrency markets: A network analysis based on mutual information. Plos
ONE, 17(2):e0259869, 2022.
[22] Evangelos Ioannidis, Iordanis Sarikeisoglou, and Georgios Angelidis.
Portfolio construction: A network approach.
Mathematics,
11(22):4670, 2023.
[23] George J Jiang, Tong Yao, and Tong Yu. Do mutual funds time the market? evidence from portfolio holdings. Journal of Financial Economics,
86(3):724–758, 2007.
[24] Zhengyao Jiang and Jinjun Liang. Cryptocurrency portfolio management with deep reinforcement learning. In 2017 Intelligent systems
conference (IntelliSys), pages 905–913. IEEE, 2017.
[25] Ruixue Jing and Luis EC Rocha. A network-based strategy of price correlations for optimal cryptocurrency portfolios. Finance Research
Letters, 58:104503, 2023.
[26] Vytautas Karalevicius, Niels Degrande, and Jochen De Weerdt. Using sentiment analysis to predict interday bitcoin price movements. The
Journal of Risk Finance, 19(1):56–75, 2018.
[27] Ahmed M Khedr, Ifra Arif, Magdi El-Bannany, Saadat M Alhashmi, and Meenu Sreedharan. Cryptocurrency price prediction using traditional
statistical and machine-learning techniques: A survey. Intelligent Systems in Accounting, Finance and Management, 28(1):3–34, 2021.
[28] Kwansoo Kim, Sang-Yong Tom Lee, and Said Assar. The dynamics of cryptocurrency market behavior: sentiment analysis using markov
chains. Industrial Management & Data Systems, 122(2):365–395, 2022.
[29] Monica Lam. Neural network techniques for financial performance prediction: integrating fundamental and technical analysis. Decision
support systems, 37(4):567–581, 2004.
[30] Andrea Lancichinetti and Santo Fortunato. Consensus clustering in complex networks. Scientific reports, 2(1):336, 2012.
[31] Fedor Ya Legotin, Ainura A Kocherbaeva, and Viktor E Savin. Prospects for crypto-currency and blockchain technologies in financial
markets. Revista Espacios, 39(19), 2018.
[32] Štefan Lyócsa, Tomáš Vỳrost, and Eduard Baumöhl. Stock market networks: The dynamic conditional correlation approach. Physica A:
Statistical Mechanics and its Applications, 391(16):4147–4158, 2012.
[33] Robert Andrew Martin. Pyportfolioopt: portfolio optimization in python. Journal of Open Source Software, 6(61):3066, 2021.
[34] Griffin Msefula, Tony Chieh-Tse Hou, and Tina Lemesi. Financial and market risks of bitcoin adoption as legal tender: evidence from el
salvador. Humanities and Social Sciences Communications, 11(1):1–15, 2024.
[35] J-P Onnela, Anirban Chakraborti, Kimmo Kaski, Janos Kertesz, and Antti Kanto. Dynamics of market correlations: Taxonomy and portfolio
analysis. Physical Review E, 68(5):056110, 2003.
[36] Mohil Maheshkumar Patel, Sudeep Tanwar, Rajesh Gupta, and Neeraj Kumar. A deep learning-based cryptocurrency price prediction scheme
for financial institutions. Journal of information security and applications, 55:102583, 2020.
[37] Emmanuel Pintelas, Ioannis E Livieris, Stavros Stavroyiannis, Theodore Kotsilieris, and Panagiotis Pintelas. Investigating the problem of
cryptocurrency price prediction: a deep learning approach. In Artificial Intelligence Applications and Innovations: 16th IFIP WG 12.5
26
International Conference, AIAI 2020, Neos Marmaras, Greece, June 5–7, 2020, Proceedings, Part II 16, pages 99–110. Springer, 2020.
[38] Pradhyumna Rao, Nishit Bhasin, Puneet Goswami, and Lakshita Aggarwal. Crypto currency portfolio allocation using machine learning. In
2021 3rd International Conference on Advances in Computing, Communication Control and Networking (ICAC3N), pages 1522–1527. IEEE,
2021.
[39] Alethea Rea and William Rea. Visualization of a stock market correlation matrix. Physica A: Statistical Mechanics and its Applications,
400:109–123, 2014.
[40] Luis EC Rocha, Naoki Masuda, and Petter Holme. Sampling of temporal networks: Methods and biases. Physical Review E, 96(5):052302,
2017.
[41] Dedhy Sulistiawan and Felizia Arni Rudiawarni. Do accrual minimise (maximise) stock risk (return)?: Evidence from Indonesia. International Journal of Globalisation and Small Business, 9(1):20–28, 2017.
[42] Karthik Vikram, Nikhil Sivaraman, and P Balamurugan. Crypto currency market price prediction using data science process. International
Journal for Research in Applied Science and Engineering Technology, 10(2):1451–1454, 2022.
[43] Liuqing Yang, Xiao-Yang Liu, Xinyi Li, and Yinchuan Li. Price prediction of cryptocurrency: an empirical study. In Smart Blockchain:
Second International Conference, SmartBlock 2019, Birmingham, UK, October 11–13, 2019, Proceedings 2, pages 130–139. Springer, 2019.
[44] Esteban Wilfredo Vilca Zuniga, Caetano Mazzoni Ranieri, Liang Zhao, Jó Ueyama, Yu-tao Zhu, and Donghong Ji. Maximizing portfolio
profitability during a cryptocurrency downtrend: A bitcoin blockchain transaction-based approach. Procedia Computer Science, 222:539–
548, 2023.
27
Supplementary Information
Data source
The daily price data of the cryptocurrencies used in our study was obtained from the following websites: www.investing.com,
coinmarketcap.com, www.coindesk.com, www.coincodex.com and www.marketwatch.com. Table S1 shows the code
of all the cryptocurrencies used in our study.
ADA
ADX
AE
AION
AMB
ANT
AOAR
ARDR
ARK
AST
BAT
BCD
BCH
BCN
BITCNY
BLK
BLOCK
BNB
BNT
BTC
BTM
BTS
BTU
NEBL
NEO
NEU
NLG
NMC
NMR
NXS
NXT
OAX
OCEANp
OCN
OK
OMG
OMNI
ONION
PAC
PART
PASC
PAY
PHR
PHX
PIVX
PLR
PND
POT
PPC
CND
CVC
DASH
DCN
DCR
DENT
DGB
DIME
DLT
DNT
DOGE
EDG
EMC2
ENG
ETC
ETH
ETP
EVX
FAIR
FCT
FLASH
FLO
FTC
FUEL
FUN
GAME
GAS
PPT
PRE
PRO
QRL
QTUM
RCN
RDD
RDN
REV
RLC
SALT
SBDR
SC
SCRT
SMART
SNC
SNM
SNT
SOUL
STEEM
STORJ
STRAX
SWFTC
SYNX
SYS
GBYTE
GEO
GLM
GRC
ICX
IGNIS
IOC
JNT
KEY
KMD
KNC
LBC
LINK
LRC
LSK
LTC
LUNA
MAID
MANA
MDA
MDT
MHC
IOTA
MKR
MLN
MONA
MTH
Table S1: The list of the N = 157 cryptocurrencies used in our study.
28
MTL
MYST
NAS
TOA
TRX
TUBE
UBQ
USDT
VAL
VERI
VET
VIA
VIB
WAVES
WGR
WINGS
XCP
XDN
XEM
XLM
XRP
XST
XTZ
XVG
ZAP
ZCL
ZEC
ZEN
ZRX
0
You can add this document to your study collection(s)
Sign in Available only to authorized usersYou can add this document to your saved list
Sign in Available only to authorized users(For complaints, use another form )