IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS, VOL. 9, NO. 1, FEBRUARY 2022

Dynamic Games for Social Model Training Service Market via Federated Learning Approach

Wenqing Cheng, Yuze Zou, Jing Xu, and Wei Liu, Member, IEEE

Abstract— In recent years, an increasing number of new social applications have been emerging and developing with the profound success of deep learning technologies, and they have been significantly reshaping our daily life, e.g., interactive games and virtual reality. Deep learning applications are generally driven by a huge amount of training samples collected from users' personal devices, e.g., smartphones and watches. However, users' data privacy and security concerns have become one of the main restrictions on a broader distribution of these applications. To preserve privacy while still benefiting from deep learning applications, federated learning has become one of the most promising solutions and is gaining growing attention from both academia and industry. It can provide high-quality model training by distributing the training tasks to individual users, relying on on-device local data. To this end, we model the users' participation in social model training as a training service market. The market consists of model owners (MOs) as consumers (e.g., social applications) who purchase the training service and a large number of mobile device groups (MDGs) as service providers who contribute local data in federated learning. A two-layer hierarchical dynamic game is formulated to analyze the dynamics of this market. The service selection processes of the MOs are modeled as a lower level evolutionary game, while the pricing strategies of the MDGs are modeled as a higher level differential game. The uniqueness and stability of the equilibrium are analyzed theoretically and verified via extensive numerical evaluations.

Index Terms— Differential game, evolutionary game, federated learning, machine learning as a service (MLaaS).

NOMENCLATURE

$\mathcal{K}$, $K$ — Set of MDGs and number of MDGs, respectively.
$\mathcal{N}$, $N$ — Set of MOs and number of MOs, respectively.
$\omega_m^t$ — Weights of mobile device m before the t-th iteration.
$d_{i,k}$ — Size of the dataset MDG k owns for MO i ($i\in\mathcal{N}$ and $k\in\mathcal{K}$).
$p_{i,k}$ — Unit price that MDG k offers for MO i.
$\mathbf{p}_k$ — $[p_{1,k},\dots,p_{N,k}]^T$, list of prices of MDG k for all MOs.
$x_{i,k}$ — Probability that MO i requests training service from MDG k.
$\mathbf{x}_i$ — $[x_{i,1},\dots,x_{i,K}]^T$, list of probabilities of MO i to request training services from the MDGs.
$c_{i,k}$ — Unit cost for MDG k to provide training service for MO i.
$u_{i,k}$ — Utility of MO i when requesting training service from MDG k.
$\Pi_k$ — Expected profit of MDG k.
$\Phi_{i,k}$ — Dynamics of MO i's selection of MDG k.
$\|\cdot\|$ — Euclidean norm.

Manuscript received December 4, 2020; revised April 10, 2021; accepted May 20, 2021. Date of publication June 23, 2021; date of current version January 31, 2022. This work was supported by the National Science Foundation of Hubei Province under Grant 2020CFB794. This article was presented in part at the IEEE Pacific Rim Conference on Communications, Computers and Signal Processing (PACRIM), 2019 [1]. (Corresponding author: Jing Xu.)

The authors are with the Hubei Key Laboratory of Smart Internet Technology, School of Electronic Information and Communications, Huazhong University of Science and Technology, Wuhan 430074, China (e-mail: xujing@hust.edu.cn).

Digital Object Identifier 10.1109/TCSS.2021.3086100
I. INTRODUCTION

With the great success of machine learning and deep learning technologies, numerous machine learning applications, such as image recognition, natural language processing, autonomous driving, and medical diagnosis, are emerging and have profoundly improved our daily life [2]. According to a report by Stratistics MRC, the global machine learning as a service (MLaaS) market is expected to grow from U.S. $2.96 billion in 2019 to U.S. $49.13 billion by 2027, a compound annual growth rate (CAGR) of 42.1% [3]. Machine learning and deep learning applications are generally driven by huge training datasets collected from personal mobile devices, e.g., smartphones and watches, which raises thorny issues related to data privacy, such as data abuse and leakage. These issues are receiving more and more attention from the public and from administrations across the globe, and they have become barriers for the MLaaS market to access large amounts of data from personal mobile devices. In particular, regulations and laws protecting data privacy and security have recently been promulgated. For example, the general data protection regulation (GDPR) has been enforced by the European Union since May 2018 to protect users' privacy and data security [4].

To tackle these issues, federated learning, proposed by Google, has become one of the most promising solutions and has gained great attention from both academia and industry. In a nutshell, federated learning is a distributed and collaborative machine learning framework that fully utilizes powerful mobile devices and takes advantage of the intelligence at the end users with their on-device data [5], [6]. Different from conventional machine learning frameworks, federated learning keeps users' private data on their devices instead of uploading them to a central data center. In this way, federated learning efficiently preserves users' data privacy and prevents data abuse and leakage.

To make federated learning more applicable in practice, however, several open challenges remain to be tackled. The current literature mainly focuses on solving related problems that typically include the low communication efficiency caused by the cumbersome weights of the machine learning model [7], [8], possible privacy leakage [9], and security issues such as model poisoning [10]–[12]. Nevertheless, a wide range of applications based on the federated learning framework have been envisioned in the literature, including medicine and health, such as disease diagnosis [13], natural language processing, e.g., next-word prediction [14], [15], traffic monitoring [16], and V2X communications [17]. Moreover, the emergence of 5G and the maturing of mobile edge computing (MEC) techniques [18]–[20] are powering up the applications of federated learning, as it is intrinsically suitable for the MEC scenario.
MLaaS providers, such as Google Cloud AI (https://cloud.google.com/ai-platform), Microsoft Azure (https://azure.microsoft.com/services/machine-learning), and Amazon ML (https://aws.amazon.com/ai), can embrace the federated learning framework to provide privacy-preserving model training services to model owners (MOs), such as small companies whose applications rely on machine learning models but that lack massive datasets from end users, or institutions that possess sensitive data, such as hospitals with medical images. As the pioneer of this concept, Google has announced a scalable production system for federated learning in the domain of mobile devices [21], which contributes to the formation of a federated learning-based market. In this kind of market, we refer to the MLaaS providers as mobile device groups (MDGs), since they have access to a federation of enormous numbers of mobile devices; e.g., Apple and Samsung have hundreds of millions of iOS and Android devices, respectively. On the other hand, we refer to the consumers in the market as MOs, as they have machine learning models for which they purchase training services from the MDGs.

In this article, we study the incentive mechanism for the participants in this federated learning market. First, we aim to design the incentive for an MDG to train a machine learning model for the MOs. Typically, the MDG can be rewarded for providing its on-device training services to the MOs. Second, it is also a critical design problem for an MO to select appropriate MDGs in an open market so as to maximize its profit, as the quality of service varies across MDGs. We address these problems by building a price-based market framework to model the interactions between MOs and MDGs. In particular, MDGs can set different prices for their training services, according to their users' preferences or willingness to participate. Each MDG's target is to maximize the cumulative profits of all its users. Each MO can select serving MDGs from the set of available MDGs according to their prices and quality of service, aiming to maximize its own benefit (i.e., utility or payoff). The matching problem becomes more challenging when the MOs' selections of MDGs change dynamically according to time-varying performance satisfaction and cost. As such, the MDGs' pricing strategies need to be adjusted accordingly to meet the MOs' dynamic demands.

To study this problem, we propose a two-layer dynamic game framework to model the dynamic behaviors of both MOs and MDGs in the model training service market. The game framework is fully distributed and is practically applicable for federated learning involving a large number of participants. The contributions of this article are threefold.

1) We propose a price-based training service market model for federated learning to study the MDGs' selections of training tasks and the MOs' selections of service providers with different qualities of service. The optimal strategies of the MOs and MDGs are obtained by maximizing individuals' payoffs. The proposed market model allows participating users to set different prices for their training services, according to individuals' risk preferences regarding privacy breach, providing a more flexible privacy-preserving mechanism for emerging MLaaS applications.

2) A two-layer hierarchical dynamic game is proposed to model the interactions between MOs and MDGs in the above model training service market.
The MOs' service selections are studied in a lower level evolutionary game, while the MDGs' pricing strategies are optimized in a higher level differential game. The solutions of the proposed game, i.e., the dynamic equilibrium, are derived theoretically and verified via extensive numerical evaluations.

3) We characterize the quality of training service in terms of the dataset size and the non-independent identically distributed (non-i.i.d.) property of the participants in federated learning. Extensive experiments reveal that the relationships between the quality of service and these properties can be fitted by exponential functions. Similar results also apply to other federated learning scenarios.

The remainder of this article is organized as follows. Section II summarizes the related work, and a preliminary of federated learning is given in Section III. We describe the system model and propose the training service market in Section IV. We then formulate the two-layer dynamic game and analyze the uniqueness and stability of the equilibrium theoretically in Section V. Finally, numerical evaluations and conclusions are presented in Sections VI and VII, respectively. The major notations used in this article are given in the Nomenclature.

II. RELATED WORK

Federated learning originated at Google back in 2016 [5], [6]. It aims to train a machine learning model in a highly distributed manner while preserving users' privacy. McMahan et al. [5] proposed the FederatedAveraging algorithm to drive the federated learning system, which allows a server to collect local stochastic gradient descent (SGD) updates from each client and then perform model averaging. Konečnỳ et al. [6] introduced the concept of federated optimization, a new and practical setting for distributed optimization in machine learning. Several algorithms, including stochastic variance reduced gradient (SVRG) [22], distributed approximate Newton (DANE) [23], and federated SVRG, are analyzed theoretically for the federated setting.

However, several technical challenges have to be solved before practical deployment. These include low communication efficiency, potential privacy leakage [24], and security issues [9], [25]. The weights of the deep neural networks in a machine learning model typically form a very large parameter set, which implies a significant cost in the information exchange between servers and clients. Apart from that, data privacy and security issues may deter users' participation in federated learning. Melis et al. [24] showed that an adversarial participant can infer the presence of exact data points in others' training data. Furthermore, attackers may poison the shared model; e.g., Fung et al. [9] demonstrated that federated learning is vulnerable to sybil-based label-flipping poisoning.

At present, many works are emerging to improve the robustness and enhance the practicality of federated learning. To reduce the communication overhead, the most straightforward method is to design weight compression algorithms for the machine learning models [7], [8]. Konečnỳ et al. [7] designed structured updates and sketched updates to compress the weights, which can reduce the communication cost by two orders of magnitude. Furthermore, Lin et al.
[8] proposed deep gradient compression (DGC) to compress the weights even further. In particular, the DGC algorithm can reduce the size of ResNet-50 updates from 97 to 0.35 MB and of DeepSpeech updates from 488 to 0.74 MB, which makes the communication cost in a federated learning system almost negligible.

The privacy and security issues have also been studied extensively in the literature [10], [11], [26], [27]. Bonawitz et al. [10] proposed a secure aggregation scheme to protect the privacy of each user's gradient estimate. A randomized mechanism is proposed in [11] to hide a single client's contribution in the weight aggregation and thus ensure data privacy in the learning process. Fung et al. [9] proposed the FoolsGold algorithm to identify poisoning sybils based on the diversity of clients' updates in the distributed learning process.

Several works concentrate on incentive mechanisms in federated learning systems. Kang et al. [28] adopted contract theory to design an effective incentive mechanism for mobile devices with high-quality data to participate in federated learning. Jiao et al. [29] proposed an auction-based market model to incentivize data owners to participate in federated learning; they design two auction mechanisms to maximize the social welfare of the federated learning services market.

III. PRELIMINARY OF FEDERATED LEARNING

The federated learning system generally works in four phases, as shown in Fig. 1. First, the central coordinator or model aggregator distributes its machine learning model to a group of mobile devices selected from the federation. More specifically, the coordinator can deliberately choose a subset of mobile devices with preferred quality of service. Second, the selected mobile devices train the machine learning model based on their local datasets. Third, after training for several epochs, each mobile device uploads the updated model weights back to the model aggregator. To motivate the user's participation, the mobile device typically receives a payoff for its model training service. Fourth, the coordinator aggregates (e.g., by averaging [5]) all the weights of the machine learning model uploaded by the mobile devices. This four-phase procedure repeats periodically and is expected to improve the model accuracy.

Fig. 1. Interactions between mobile devices and model aggregator in federated learning.

A. Local Update and Weights Aggregation

The two core operations of federated learning are the local update performed on the mobile devices and the weights aggregation performed on the coordinator. Specifically, the local update of the model weights on mobile device m at the t-th iteration is given as follows:

$$\omega_m^{t+1} = \omega_m^t - \eta \nabla \ell\left(\omega_m^t\right) \tag{1}$$

where $\omega_m^t$ denotes the weights of mobile device m before the t-th iteration, and the learning rate and the loss function of the machine learning model are represented by $\eta$ and $\ell(\cdot)$, respectively. The new weights $\omega_m^{t+1}$ aim to reduce the loss function $\ell(\cdot)$; hence, they are updated by a gradient descent rule based on the dataset stored locally on mobile device m.

The weights aggregation performed on the coordinator is typically implemented via the FederatedAveraging algorithm proposed in [5], which is defined as follows:

$$\omega_0^{t+1} = \sum_{m\in\mathcal{M}^t} \frac{n_m}{n}\,\omega_m^{t+1} \tag{2}$$

where $\mathcal{M}^t$ denotes the set of selected mobile devices and $\omega_0^{t+1}$ denotes the averaged weights of the learning model after the t-th iteration. Let $n_m$ denote the size of the training dataset on mobile device m and $n = \sum_{m\in\mathcal{M}^t} n_m$ be the total size of the training datasets of all participating mobile devices. Hence, the weighting parameter $n_m/n$ represents the significance of an individual mobile device: a mobile device contributes more to the aggregated model if it provides more training data to the selected training task.
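To make the two operations in (1) and (2) concrete, the following minimal NumPy sketch runs a few federated rounds on a toy quadratic loss. The function names, the toy loss, and all parameter values are illustrative assumptions of ours, not artifacts from the paper's experiments.

```python
import numpy as np

def local_update(w, grad_fn, lr=0.01, epochs=5):
    """Local SGD on one device, Eq. (1): w <- w - eta * grad(loss)(w)."""
    for _ in range(epochs):
        w = w - lr * grad_fn(w)
    return w

def federated_averaging(local_weights, local_sizes):
    """Coordinator aggregation, Eq. (2): average weighted by n_m / n."""
    n = sum(local_sizes)
    return sum((n_m / n) * w_m for w_m, n_m in zip(local_weights, local_sizes))

# Toy example: each device m holds a target t_m and minimizes ||w - t_m||^2.
rng = np.random.default_rng(0)
targets = [rng.normal(size=3) for _ in range(4)]   # one target per device
sizes = [100, 200, 50, 150]                        # n_m, samples per device
w_global = np.zeros(3)
for _ in range(50):                                # communication rounds
    updated = [local_update(w_global, lambda w, t=t: 2 * (w - t)) for t in targets]
    w_global = federated_averaging(updated, sizes)
print(w_global)  # approaches the size-weighted average of the targets
```

Note that the aggregation weights n_m / n in federated_averaging are exactly the per-device significance discussed above.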
B. Quality of Federated Learning Training Service

Similar to conventional centralized machine learning tasks, the model accuracy under the federated learning fashion is intuitively higher with a larger training dataset size, which means that more mobile devices are preferred in the training service. Empirically, the accuracy of a machine learning model is a nondecreasing concave function of the training dataset size. Different from the centralized fashion, however, the training dataset under federated learning is distributed sparsely over devices, generally with non-i.i.d. distributions varying from device to device. This intrinsic non-i.i.d. property of the dataset is also critical to the quality of the model training. Zhao et al. [30] showed that the accuracy of a machine learning model trained on highly skewed non-i.i.d. data drops dramatically, by up to 55%, compared to that trained with i.i.d. data. To quantify this impact on the quality of the federated learning training service, we adopt the metric defined in [30] to measure the non-i.i.d. property of training data, namely the earth mover's distance (EMD). Concretely, the EMD of mobile device m, denoted by $\sigma_m$, is given as follows:

$$\sigma_m \triangleq \sum_{l\in\mathcal{L}} \left|P_m(l) - P(l)\right| \tag{3}$$

where $P(l)$ and $P_m(l)$ denote the probabilities of category l for the whole MDG and for device m, respectively, and $\mathcal{L}$ is the set of all possible categories in the dataset. Take the MNIST [31] handwritten digits database as an example: its training set is divided into ten categories uniformly, so $P(l) = 0.1$ for $l\in\{0,\dots,9\}$. Suppose that device 1 holds only samples of digit 1 and device 2 holds 50% samples of digit 1 and 50% of digit 2. Then, $\sigma_1$ and $\sigma_2$ are calculated as $\sigma_1 = 0.1\times 9 + 0.9 = 1.8$ and $\sigma_2 = 0.1\times 8 + 0.4\times 2 = 1.6$, respectively.

To illustrate the impact of the dataset size and the non-i.i.d. property of training participants on the overall training performance, i.e., model accuracy, we conduct a group of experiments, and the results are shown in Fig. 2. The machine learning model is trained on the MNIST database with a federation of mobile devices involved. Each mobile device owns 100 samples randomly selected from the training set of MNIST with a predefined skewness. We vary the number of participating devices in federated learning from 1 to 60 (accordingly, the total size of the dataset involved in the training ranges from 100 to 6000). Each mobile device's EMD is identical and chosen from the set {0.0, 0.2, 0.4}. Fig. 2 shows the accuracy of the aggregated model on the test dataset after five rounds of on-device training. As we can see, the model's accuracy increases with the dataset size for the same EMD value; however, for a fixed dataset size, the accuracy drops by about 5% as the EMD increases by 0.2.

Fig. 2. Demonstration of the impacts of dataset size and EMD on the training accuracy in a federated learning system.
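The EMD of (3) reduces to a one-line computation. The sketch below (ours, assuming the ten-class MNIST setting just described) reproduces the two-device example above and confirms σ1 = 1.8 and σ2 = 1.6.

```python
import numpy as np

def emd(device_dist, global_dist):
    """Earth mover's distance of Eq. (3): sum_l |P_m(l) - P(l)|."""
    return float(np.abs(np.asarray(device_dist) - np.asarray(global_dist)).sum())

P = np.full(10, 0.1)                            # ten digit classes, uniform global distribution

P1 = np.zeros(10); P1[1] = 1.0                  # device 1: all samples are digit 1
P2 = np.zeros(10); P2[1] = 0.5; P2[2] = 0.5     # device 2: half digit 1, half digit 2

print(emd(P1, P))  # 1.8 = 0.9 + 9 * 0.1
print(emd(P2, P))  # 1.6 = 2 * 0.4 + 8 * 0.1
```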
IV. TRAINING SERVICE MARKET MODEL

We consider a model training service market for federated learning that consists of N MOs and K MDGs, as shown in Fig. 3. The sets of MOs and MDGs are denoted by $\mathcal{N} = \{1, 2, \dots, N\}$ and $\mathcal{K} = \{1, 2, \dots, K\}$, respectively. In this market, each MO has a specific machine learning model to train, which can have a different model structure depending on its application, such as convolutional neural networks (CNNs), long short-term memory (LSTM), and recurrent neural networks (RNNs). Generally, MOs do not have sufficiently large datasets to train their models. As such, the MDGs act as service providers who can help train the MOs' machine learning models in a federated learning manner. Specifically, let $d_{i,k}$ denote the total data size that MDG k owns for MO i. Each MDG is governed by an operator who is responsible for the training service aggregation. For example, the operator could be a smartphone company, such as Apple or Samsung, that commands an enormous number of mobile devices. The MDGs can set different prices for their training services in order to maximize their cumulative profits, while the MOs aim to select proper MDGs for their model training tasks, aiming to gain high model accuracy at a relatively low cost.

Fig. 3. Social model training service market via federated learning that consists of MOs as service consumers and MDGs as service providers.

A. Provider: MDGs

In the model training service market, each MDG is governed by an operator who coordinates a group of mobile devices. For example, the training happens on a federation of end devices, such as mobile phones, tablets, and watches, and the updated weights are uploaded to the operator of the federation. These weights are then aggregated by some algorithm, e.g., FederatedAveraging. The detailed interplay between an individual mobile device and the operator within one MDG is beyond the scope of this article. Nevertheless, as shown in Fig. 2, the quality of the federated learning training service mainly relates to the dataset size and the non-i.i.d. property. Let $x_{i,k}$ denote the probability that MO i requests the training service of MDG k; thus, the average data size that MDG k provides for MO i is given by $x_{i,k}d_{i,k}$. On the other hand, to characterize the non-i.i.d. property in model training, we define the average EMD of all mobile devices in the same group (e.g., MDG k) for the same training task of MO i as $\sigma_{i,k}$, and let $\boldsymbol{\sigma}_i \triangleq [\sigma_{i,1},\dots,\sigma_{i,K}]$ denote the EMD vector of the different MDGs for MO i.

The MDGs may adjust their pricing strategies over time in order to maximize their cumulative profits. Specifically, the profit of MDG k consists of two parts.

1) Payment From the MOs: The MOs pay the MDGs for providing the model training services. Let $p_{i,k}$ denote the price that MDG k charges MO i per unit of data size. Then, the expected payment that MDG k receives from MO i is given by $p_{i,k}x_{i,k}d_{i,k}$.

2) Cost of Model Training: The MDGs' training services also incur a cost that accounts for the energy consumption and the users' preferences. Clearly, a longer training time or more participating users in the training incur a higher energy consumption.
We assume that the cost function increases linearly with the size of the dataset involved in the training process, and we verify this assumption via extensive experiments in Section VI. Let $c_{i,k}$ denote the unit cost of MDG k when it provides the training service for MO i. The expected dataset size of MDG k used for training the machine learning model of MO i is $x_{i,k}d_{i,k}$, where $d_{i,k}$ is the size of MDG k's dataset allocated to MO i. Then, the cost of training is given by $c_{i,k}x_{i,k}d_{i,k}$.

Therefore, the expected profit of MDG k can be written as follows:

$$\Pi_k\left(\mathbf{p}_k\right) = \sum_{i\in\mathcal{N}} \left(p_{i,k}x_{i,k}d_{i,k} - c_{i,k}x_{i,k}d_{i,k}\right) \tag{4}$$

where $\mathbf{p}_k = [p_{1,k},\dots,p_{N,k}]^T$ denotes MDG k's pricing strategy for training the different MOs' machine learning models.

B. Consumer: MOs

According to the MDGs' pricing strategies, the MOs can select different service providers, i.e., MDGs, to maximize their own utilities in terms of model accuracy and cost. We assume that each MO selects one MDG at a time, since model aggregation across different MDGs is generally not available due to the extra coordination cost. We require $x_{i,k}\in[0,1]$ and $\sum_{k\in\mathcal{K}} x_{i,k} = 1$. In practice, the probability $x_{i,k}$ can be interpreted as the portion of training time allocated to MDG k; e.g., $x_{i,k} = 0.5$ means that MO i selects the training service of MDG k for half of the overall training time. The selection of MDGs may change dynamically over time in order to improve the MOs' utilities, which are determined by the following metrics.

1) Model Accuracy: It characterizes the quality of service provided by the selected MDGs. Let $f_k(d, \sigma)$ denote the expected accuracy of an MO's machine learning model contributed by the model training service of MDG k, which relates to the overall size d and the average EMD σ of the datasets involved in the training process.

2) Payment to MDGs: When MO i selects the training service of MDG k, the expected payment to MDG k is simply given by $p_{i,k}x_{i,k}d_{i,k}$, which is linear in the size of the dataset contributed by MDG k.

3) Penalty for Congestion: The MOs' utilities are also affected by potential congestion in the selection of MDGs. An MDG's scheduling of training services becomes more complicated as more MOs simultaneously select the same MDG. Such congestion in return incurs performance degradation (e.g., an increasing delay) at the MOs. To capture this effect, we define the penalty for congestion at MDG k as $\left(\sum_{j\in\mathcal{N}} x_{j,k}d_{j,k}\right)^2$.

Combining the above three terms, the utility of MO i at MDG k, denoted by $u_{i,k}$, is given as follows:

$$u_{i,k} = \zeta_{i,k} f_k\left(x_{i,k}d_{i,k}, \sigma_{i,k}\right) - p_{i,k}x_{i,k}d_{i,k} - \frac{\alpha_k}{2}\left(\sum_{j\in\mathcal{N}} x_{j,k}d_{j,k}\right)^2 \tag{5}$$

where the coefficient $\zeta_{i,k}$ controls MO i's preference for model accuracy, and the constant $\alpha_k$ denotes MDG k's sensitivity to congestion, which represents the MDG's capability to deal with concurrent training requests.

We require that the payoff function $f_k(d, \sigma)$ related to the model's accuracy have the following properties:

1) nondecreasing in d with a fixed σ and decreasing in σ with a fixed size d;¹

2) first- and second-order differentiable in terms of d.

The first property indicates that the MDG can provide a higher accuracy with more data² involved in the model training task.

¹The model's accuracy may also decrease with the size of the dataset when the dataset is highly skewed, i.e., when the EMD σ is relatively large. In this case, the operator can alleviate the performance degradation, e.g., by sharing a small portion of a common dataset among mobile devices [30]. Besides, such MDGs can also be excluded by MOs due to their unsatisfactory model accuracy.

²Here, "more data" means a larger dataset size and more training time.
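As a concrete reading of (4) and (5), the sketch below evaluates an MDG's expected profit and an MO's utility for given selection probabilities and prices. The N × K array layout and the placeholder accuracy function are our assumptions; the fitted form of Section VI is one possible choice for f.

```python
import numpy as np

def mdg_profit(p_k, x_k, d_k, c_k):
    """Eq. (4): expected profit of MDG k, sum_i (p_ik - c_ik) * x_ik * d_ik."""
    return float(np.sum((p_k - c_k) * x_k * d_k))

def mo_utility(i, k, x, d, p, sigma, zeta, alpha, f):
    """Eq. (5): accuracy reward minus payment minus congestion penalty at MDG k."""
    load_k = np.sum(x[:, k] * d[:, k])            # total expected data at MDG k
    return (zeta[i, k] * f(x[i, k] * d[i, k], sigma[i, k])
            - p[i, k] * x[i, k] * d[i, k]
            - 0.5 * alpha[k] * load_k ** 2)

# Example: N = 2 MOs, K = 3 MDGs, uniform selections (all values illustrative).
N, K = 2, 3
x = np.full((N, K), 1.0 / K); d = np.full((N, K), 4.0)
p = np.full((N, K), 0.4);     c = np.full((N, K), 0.2)
sigma = np.tile([0.1, 0.15, 0.2], (N, 1))
zeta = np.full((N, K), 6.0);  alpha = np.full(K, 0.1)
acc = lambda dsize, s: 1.0 - 0.36 * np.exp(-4.0 * dsize)  # toy accuracy stand-in (ignores sigma)
print(mdg_profit(p[:, 0], x[:, 0], d[:, 0], c[:, 0]))
print(mo_utility(0, 0, x, d, p, sigma, zeta, alpha, acc))
```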
V. DYNAMIC GAME AND EQUILIBRIUM ANALYSIS

To depict the dynamics between the MDGs and MOs in this training market under a federated learning framework, we devise a two-layer hierarchical game whose structure is summarized in Fig. 4. In particular, the dynamics of the MOs' selections are formulated as an evolutionary game due to their bounded rationality, while the pricing strategies of the MDGs are modeled as a differential game. Specifically, the MOs' selections over time can be described by a group of ordinary differential equations (ODEs), which constitute the dynamic states of the MDGs' differential game. A similar game structure is also adopted in [32] and [33].

Fig. 4. Two-layer dynamic game framework for the social model training service market. The lower level is an evolutionary game in which MOs adapt their training service selections, and the upper level is a differential game in which MDGs adjust their service prices.

A. Lower Level Evolutionary Game

The MOs, with bounded rationality, select MDGs from the K candidate providers. Initially, each MO chooses an MDG randomly. To obtain better utility, each MO then adjusts its selection according to the price and the time-varying observed model accuracy. With incomplete information, each MO gradually learns by imitating selections with higher payoffs during the selection adaptation process. As such, the selection process of MO i can be formulated via replicator dynamics as follows:

$$\dot{x}_{i,k}(t) = \Phi_{i,k}(t) = \delta x_{i,k}(t)\left(u_{i,k}(t) - \bar{u}_i(t)\right), \quad k\in\mathcal{K} \tag{6}$$

where δ is the learning rate that controls the selection adaptation frequency and $\bar{u}_i = \sum_{k\in\mathcal{K}} x_{i,k}u_{i,k}$ is the average utility of MO i. The initial selection probability $\mathbf{x}_i(0)$ is randomly generated and denoted by $\mathbf{x}_i^{(0)}$. According to the replicator dynamics, the probability of MO i selecting MDG k increases if the corresponding utility is higher than MO i's average utility [i.e., $u_{i,k}(t) > \bar{u}_i(t)$] and vice versa. The growth rate $\Phi_{i,k}(t)$ is proportional to the difference between the utility of the selection and MO i's average utility, as well as to the current probability of the selection, $x_{i,k}(t)$.
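Numerically, (6) can be integrated with a simple forward Euler scheme. In the sketch below, the step size, learning rate, and fixed toy utilities are our assumptions; it shows the selection mass of a single MO drifting toward the MDG with the highest utility, as the replicator dynamics predicts.

```python
import numpy as np

def replicator_step(x_i, u_i, delta=1.0, dt=0.01):
    """One Euler step of Eq. (6): dx_ik/dt = delta * x_ik * (u_ik - u_bar_i)."""
    u_bar = np.dot(x_i, u_i)                  # average utility of MO i
    x_next = x_i + dt * delta * x_i * (u_i - u_bar)
    return x_next / x_next.sum()              # numerically keep x_i on the simplex

# Toy run: three MDGs with fixed utilities; mass drifts to the best MDG.
x = np.array([0.2, 0.3, 0.5])                 # initial selection of one MO
u = np.array([1.0, 0.8, 0.6])                 # held fixed here; time-varying in the paper
for _ in range(2000):
    x = replicator_step(x, u)
print(x)  # -> close to [1, 0, 0]
```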
Next, we prove the uniqueness and stability of the equilibrium of the lower level evolutionary game. To proceed, we first give the definition of the evolutionary equilibrium and then provide the solution to the lower level game.

Definition 1 (Evolutionary Equilibrium): The solution of the game defined in (6) is defined as the evolutionary equilibrium.

Proposition 1: Let $\Phi_{i,k}(\mathbf{x}_i(t), \mathbf{p}_i(t)) \triangleq \delta x_{i,k}(t)(u_{i,k}(t) - \bar{u}_i(t))$. Then, the first-order derivative of $\Phi_{i,k}$ with respect to $x_{j,l}(t)$ is bounded for all $(j,l)\in\mathcal{N}\times\mathcal{K}$.

Proof: The proof is given in Appendix A.

Based on this proposition, the uniqueness of the evolutionary equilibrium is guaranteed by Theorem 1.

Theorem 1: The evolutionary game defined in (6) is uniquely solvable and hence admits a unique evolutionary equilibrium.

Proof: Proposition 1 guarantees that $\Phi_{i,k}$ satisfies the Lipschitz condition with respect to $x_{j,l}$ for all $(j,l)\in\mathcal{N}\times\mathcal{K}$. Hence, the evolutionary game defined in (6) is uniquely solvable according to the Cauchy–Lipschitz theorem [34].

Second, according to Lyapunov's second method for stability [35], we establish the stability of the evolutionary equilibrium of the evolutionary game defined in (6), as presented in Theorem 2.

Theorem 2: The evolutionary game defined in (6) admits a stable evolutionary equilibrium.

Proof: Following Lyapunov's second method for stability, we design a Lyapunov function as follows:

$$V\left(\mathbf{x}(t)\right) = \sum_{i\in\mathcal{N}}\sum_{k\in\mathcal{K}} x_{i,k}^2(t) \tag{7}$$

which satisfies

$$V\left(\mathbf{x}(t)\right) \begin{cases} = 0, & \text{if } x_{i,k} = 0,\ \forall i\in\mathcal{N},\, k\in\mathcal{K} \\ > 0, & \text{otherwise.} \end{cases} \tag{8}$$

Using the Lyapunov function defined in (7), the evolutionary equilibrium of the evolutionary game defined in (6) can be proved to be stable. Please refer to Appendix B for details.

B. Upper Level Differential Game

The MDGs need to decide on their pricing for the MOs while taking the dynamics of the MOs' selections into consideration. For example, if an MDG sets a higher price, it can increase its instant revenue; on the other hand, it may also drive MOs to select other MDGs that offer cheaper services. As such, we formulate a K-player (each player representing an MDG) differential game to analyze this kind of dynamic decision-making problem. Different from the MOs with bounded rationality, the MDGs are supposed to be rational in that they are able to make decisions as best responses to others' decisions. Specifically, the MDGs offer their prices for all MOs initially. After that, the MOs' selections adapt over time according to the replicator dynamics. Then, each MDG updates its prices as a best response to the dynamics of the MOs' selections as well as to the other MDGs' pricing strategies.

All the MDGs aim to maximize their cumulative profits over a time horizon [0, T], where the profit $\Pi_k$ of MDG k is defined in (4). In the sequel, the problem of maximizing the cumulative profit of MDG k, given the other MDGs' strategies and the dynamics of the MOs' selections, can be transformed into an optimal control problem (OCP), which is given by

$$\max_{\mathbf{p}_k(t)} \int_0^T \sum_{i\in\mathcal{N}} \left[p_{i,k}(t)x_{i,k}(t)d_{i,k} - c_{i,k}x_{i,k}(t)d_{i,k}\right]dt \tag{9a}$$

$$\text{s.t.}\quad \dot{x}_{i,k}(t) = \delta x_{i,k}(t)\left(u_{i,k}(t) - \bar{u}_i(t)\right), \quad (k,i)\in\mathcal{K}\times\mathcal{N} \tag{9b}$$

$$\mathbf{x}_i(0) = \mathbf{x}_i^{(0)}, \quad i\in\mathcal{N}. \tag{9c}$$

Next, we give the equilibrium analysis for the above upper level differential game, which can be reformulated into K OCPs. In the sequel, solving each OCP is equivalent to maximizing its corresponding Hamiltonian [36], which can be done efficiently via an iterative algorithm [37].
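For concreteness, the Hamiltonian of the OCP in (9) can be written in the standard Pontryagin form shown below. The costate variables $\lambda_{i,l}^k(t)$ are our notation and do not appear in the original text, so this is a sketch of the construction rather than the authors' exact formulation:

$$H_k\left(\mathbf{x}(t), \mathbf{p}_k(t), \boldsymbol{\lambda}^k(t)\right) = \sum_{i\in\mathcal{N}} \left(p_{i,k}(t) - c_{i,k}\right)x_{i,k}(t)\,d_{i,k} + \sum_{i\in\mathcal{N}}\sum_{l\in\mathcal{K}} \lambda_{i,l}^k(t)\,\delta\,x_{i,l}(t)\left(u_{i,l}(t) - \bar{u}_i(t)\right)$$

where the first term is the running profit of (9a) and the second term couples the costates to the replicator dynamics (9b) acting as state equations. Maximizing $H_k$ pointwise in $\mathbf{p}_k(t)$, together with the costate equations $\dot{\lambda}_{i,l}^k = -\partial H_k/\partial x_{i,l}$, yields candidate open-loop strategies of the kind that the iterative algorithm of [37] searches for.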
Specifically, the existence of the equilibrium of the upper level game is proven in Theorem 3, which is based on Lemma 1. For notational convenience, we write the integrand of the objective in (9) as

$$\Pi_k\left(\mathbf{x}_k(t), \mathbf{p}_k(t)\right) \triangleq \sum_{i\in\mathcal{N}} p_{i,k}(t)x_{i,k}(t)d_{i,k} - c_{i,k}x_{i,k}(t)d_{i,k}$$

and consider a maximizing sequence satisfying

$$\int_0^T \Pi_k\left(\mathbf{x}_k^{\vartheta}(t), \mathbf{p}_k^{\vartheta}(t)\right)dt \to \iota_k = \sup_{(\mathbf{x}_k(\cdot),\,\mathbf{p}_k(\cdot))\in\mathcal{D}_{\mathbf{x}_k}\times\mathcal{D}_{\mathbf{p}_k}} \int_0^T \Pi_k\left(\mathbf{x}_k(t), \mathbf{p}_k(t)\right)dt. \tag{10}$$

Lemma 1: For all $i\in\mathcal{N}$, assume that there exists a sequence $\mathbf{p}_i^{\vartheta}(\cdot)\to_w \mathbf{p}_i(\cdot)$, $\forall\vartheta\ge 1$, and let $\mathbf{x}_i^{\vartheta}(\cdot)$ be the solution to the lower level evolutionary game corresponding to $\mathbf{p}_i^{\vartheta}(\cdot)$. Then, there exists a subsequence $\{\mathbf{x}_i^{\vartheta_h}(\cdot)\}_{h\ge 1}$ of $\{\mathbf{x}_i^{\vartheta}(\cdot)\}_{\vartheta\ge 1}$ such that $\{\mathbf{x}_i^{\vartheta_h}(\cdot)\}$ converges pointwise to $\mathbf{x}_i(\cdot)$ for all $i\in\mathcal{N}$, denoted by $\mathbf{x}_k^{\vartheta}(\cdot)\to_s \mathbf{x}_k^{*}(\cdot)$ for all $\vartheta\ge 1$, provided that $\Phi_{i,k}(\cdot,\cdot)$ is Lipschitz continuous with respect to $\mathbf{x}_i(\cdot)$ and $\mathbf{p}_i(\cdot)$, i.e.,

$$\left|\Phi_{i,k}\left(\mathbf{x}_i(\cdot), \mathbf{p}_i(\cdot)\right) - \Phi_{i,k}\left(\hat{\mathbf{x}}_i(\cdot), \mathbf{p}_i(\cdot)\right)\right| \le \kappa_{i,k}^{x}\left\|\mathbf{x}_i(\cdot) - \hat{\mathbf{x}}_i(\cdot)\right\|$$

$$\left|\Phi_{i,k}\left(\mathbf{x}_i(\cdot), \mathbf{p}_i(\cdot)\right) - \Phi_{i,k}\left(\mathbf{x}_i(\cdot), \hat{\mathbf{p}}_i(\cdot)\right)\right| \le \kappa_{i,k}^{p}\left\|\mathbf{p}_i(\cdot) - \hat{\mathbf{p}}_i(\cdot)\right\|$$

where $\mathbf{x}_i(\cdot)$ is the solution to the lower level evolutionary game corresponding to $\mathbf{p}_i(\cdot)$.

Proof: Please refer to Appendix C for the proof.

Based on Lemma 1, which guarantees the pointwise convergence of the MOs' strategies under any weakly convergent pricing sequence of the MDGs, we can further prove the existence of the equilibrium of the upper level differential game in Theorem 3.

Theorem 3: Suppose that $\Pi_k: \mathcal{T}\times\mathcal{D}_{\mathbf{x}_k}\times\mathcal{D}_{\mathbf{p}_k}\to\mathbb{R}$ is a mapping such that the following conditions hold:

1) $\Pi_k: \mathcal{T}\times\mathcal{D}_{\mathbf{x}_k}\times\mathcal{D}_{\mathbf{p}_k}\to\mathbb{R}_+$ is approximately lower semicontinuous;

2) $\Pi_k(\mathbf{x}_k(t), \mathbf{p}_k(t)): \mathcal{D}_{\mathbf{x}_k}\times\mathcal{D}_{\mathbf{p}_k}\to\mathbb{R}_+$ is lower semicontinuous for all $t\in\mathcal{T}$;

3) $\Pi_k(\mathbf{x}_k(t), \mathbf{p}_k(t))$ is convex with respect to $\mathbf{p}_k(t)$ for all $(t, \mathbf{x}_k(t))\in\mathcal{T}\times\mathcal{D}_{\mathbf{x}_k}$.

Here, $\mathcal{D}_{\mathbf{x}_k} = \times_{i\in\mathcal{N}}\mathcal{D}_{x_{i,k}}$ and $\mathcal{D}_{\mathbf{p}_k} = \times_{i\in\mathcal{N}}\mathcal{D}_{p_{i,k}}$. Then, the upper level differential game admits an optimal solution pair $(x_{i,k}^{*}(\cdot), p_{i,k}^{*}(\cdot))_{i\in\mathcal{N},k\in\mathcal{K}}$.

Proof: Let $\{(\mathbf{x}_k^{\vartheta}(\cdot), \mathbf{p}_k^{\vartheta}(\cdot))\}_{\vartheta\ge 1}$ be a maximizing sequence that follows (10). Passing to a subsequence if necessary, we assume that $\mathbf{p}_k^{\vartheta}(\cdot)\to_w \mathbf{p}_k^{*}(\cdot)$ for all $k\in\mathcal{K}$. In this case, we have $\mathbf{x}_k^{\vartheta}(\cdot)\to_s \mathbf{x}_k^{*}(\cdot)$ for all $\vartheta\ge 1$ and $k\in\mathcal{K}$ based on Lemma 1, where $\mathbf{x}_k^{*}(\cdot)$ is the solution to the lower level evolutionary game corresponding to $\mathbf{p}_k^{*}(\cdot)$. It follows that

$$\lim_{\vartheta\to+\infty}\int_0^T \Pi_k\left(\mathbf{x}_k^{\vartheta}(t), \mathbf{p}_k^{\vartheta}(t)\right)dt \le \int_0^T \Pi_k\left(\mathbf{x}_k^{*}(t), \mathbf{p}_k^{*}(t)\right)dt$$

from which it can be further deduced that

$$\iota_k \le \int_0^T \Pi_k\left(\mathbf{x}_k^{*}(t), \mathbf{p}_k^{*}(t)\right)dt \le \sup_{(\mathbf{x}_k(\cdot),\,\mathbf{p}_k(\cdot))\in\mathcal{D}_{\mathbf{x}_k}\times\mathcal{D}_{\mathbf{p}_k}}\int_0^T \Pi_k\left(\mathbf{x}_k(t), \mathbf{p}_k(t)\right)dt = \iota_k.$$

Consequently, we have $\int_0^T \Pi_k(\mathbf{x}_k^{*}(t), \mathbf{p}_k^{*}(t))dt = \iota_k$, which implies that $(\mathbf{x}_k^{*}(\cdot), \mathbf{p}_k^{*}(\cdot))$ is the optimal solution pair to the upper level differential game. This completes the proof.
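Before turning to the experiments, the sketch below illustrates one naive way to treat the upper level game numerically: discretize [0, T], simulate the lower level replicator dynamics (6) under candidate prices, and let each MDG best-respond on a coarse price grid. This is an illustrative stand-in for the Hamiltonian-based iterative solution referenced above, not the authors' algorithm; it restricts each MDG to a single price for all MOs, and the congestion coefficient, price grid, and time step are our assumptions, while K, N, σ, ζ, the dataset sizes, the unit cost, and the accuracy function f echo the Section VI settings and (11).

```python
import numpy as np

N, K, T, dt, delta = 2, 3, 10.0, 0.02, 1.0
d = np.full((N, K), 4.0)                      # dataset sizes (thousands of samples)
c = np.full((N, K), 0.2148)                   # unit training cost (J per 10^3 samples)
sigma = np.tile([0.10, 0.15, 0.20], (N, 1))   # EMD of each MDG
zeta, alpha = np.full((N, K), 6.0), np.full(K, 0.1)   # alpha is an assumption

def f(dsize, s):                              # fitted accuracy function, Eq. (11)
    a = 0.999 * np.exp(-((s + 0.3331) / 1.767) ** 2)
    return a - 0.3578 * np.exp(-4.3720 * dsize * a)

def utilities(x, p):
    """Eq. (5) for all (i, k) at once."""
    load = (x * d).sum(axis=0)                # total expected data per MDG
    return zeta * f(x * d, sigma) - p * x * d - 0.5 * alpha * load ** 2

def profits(p, steps=int(T / dt)):
    """Simulate replicator dynamics (6) under prices p; return cumulative profits (9a)."""
    x, acc = np.full((N, K), 1.0 / K), np.zeros(K)
    for _ in range(steps):
        u = utilities(x, p)
        u_bar = (x * u).sum(axis=1, keepdims=True)
        x = np.clip(x + dt * delta * x * (u - u_bar), 1e-9, None)
        x /= x.sum(axis=1, keepdims=True)     # keep each MO's selection on the simplex
        acc += dt * ((p - c) * x * d).sum(axis=0)
    return acc

p = np.full((N, K), 0.4)                      # initial prices
grid = np.linspace(0.1, 1.0, 10)
for _ in range(10):                           # best-response iteration across MDGs
    for k in range(K):
        best = max(grid, key=lambda q: profits(np.where(np.arange(K) == k, q, p))[k])
        p[:, k] = best
print(p)
```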
VI. NUMERICAL EXPERIMENTS

We conduct comprehensive numerical experiments to evaluate the dynamics of the federated learning training service market, in which we consider K = 3 MDGs serving N = 2 MOs. The non-i.i.d. property, i.e., the EMD of these MDGs for MO 1 and MO 2, is given by $\boldsymbol{\sigma}_i = [0.1, 0.15, 0.2]$, $i\in\{1,2\}$, which means that MDG 1 has the most balanced dataset among all MDGs. The maximum dataset size that each MDG can provide for each MO is identical and given as 4000. Besides, the weights of the accuracy term in (5) are also identical for a fair comparison and set to 6.

A. Accuracy Fitting and Energy Consumption Measurement

First, in order to obtain a proper empirical function f(d, σ), we devise a group of experiments to evaluate the accuracy of a deep neural network trained by a federation of mobile devices under different dataset sizes and EMD settings. Apart from that, we also measure the energy consumption of model training on a Raspberry Pi. Specifically, we train a three-layer neural network with one 512-unit fully connected hidden layer on a group of mobile devices in the federated learning fashion. The actual and corresponding fitted model accuracies under different EMD and dataset size settings are shown in Fig. 5.

Fig. 5. Model accuracy against dataset size under different EMD settings. The dataset size ranges from 100 to 6000, and the EMD σ ranges from 0 to 1.

As we can see in the figure, for any given EMD setting, the accuracy achieved by the federation of mobile devices increases dramatically with the training dataset size when the size is relatively small. However, the accuracy converges to a certain level when the dataset size is relatively large and shows no further improvement as the size increases. On the other hand, for a given dataset size, the model achieves higher accuracy with a smaller EMD σ, which shows the impact of the non-i.i.d. property of the participants. To capture the marginal effect of the training dataset size and the impact of the EMD on accuracy, we adopt an exponential function to fit the experimental results. The fitted function is given in (11), in which the EMD σ can be considered as a parameter that affects the performance ceiling:

$$f(d, \sigma) = a(\sigma) - b\exp\left(-c\,d\,a(\sigma)\right) \tag{11}$$

where $a(\sigma) = 0.999\exp\left(-\left(\frac{\sigma + 0.3331}{1.767}\right)^2\right)$, $b = 0.3578$, and $c = 4.3720$. The function fits well when $\sigma\le 1.0$; in particular, the coefficients of determination for all given $\sigma\le 1.0$ are higher than 0.90, i.e., $R^2 > 0.90$.
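The fitted function (11) is easy to probe directly. In the sketch below, we assume d is measured in thousands of samples (the paper does not state the unit explicitly), which makes the fitted curve rise over the 100–6000-sample range used in the experiments.

```python
import numpy as np

def fitted_accuracy(d, sigma):
    """Eq. (11); d assumed to be in thousands of samples."""
    a = 0.999 * np.exp(-((sigma + 0.3331) / 1.767) ** 2)  # EMD-dependent ceiling a(sigma)
    return a - 0.3578 * np.exp(-4.3720 * d * a)

for sigma in (0.0, 0.2, 0.4):
    accs = [float(fitted_accuracy(d, sigma)) for d in (0.1, 1.0, 6.0)]
    print(sigma, [round(v, 3) for v in accs])
# Accuracy climbs quickly with d and saturates at the ceiling a(sigma),
# which shrinks as the EMD sigma grows.
```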
On the other hand, to measure the energy consumption of model training on mobile devices, we deploy the neural network on the Raspberry Pi 3 Model B to mimic a mobile device training scenario.³ To gather more general results, we deploy three neural networks with different numbers of neurons in the hidden layer, namely 0, 64, and 512. The energy consumption results against the training data size are shown in Fig. 6, together with the fittings for each of the three neural network structures.

Fig. 6. Mobile device's energy consumption measurement. Three neural networks with different numbers of neurons in the hidden layer are considered: 0, 64, and 512.

As we can conclude from the figure, the energy consumption of a mobile device is linear in the training dataset size, which is intuitive. Besides, this linearity is universal across the different settings of the neural networks, which makes it applicable to different model settings. Furthermore, the slope of the line reflects the energy consumption of the machine learning model: the steeper the slope, the more complicated the model and thus the higher the energy consumption. Without loss of generality, we adopt the model with 512 units in the hidden layer for the later simulations; its energy consumption coefficient is given by $c_{i,k} = 0.2148$ J/10³ samples, which indicates that the MDG consumes 0.2148 J for the training of every thousand samples.

With the accuracy function and the energy consumption coefficient determined, we are ready to evaluate the dynamics of the presented games and the impacts of different parameters. Specifically, we first give the evolutionary equilibrium of the MOs' selections at the lower level game; then, the impact of the congestion coefficient on the MOs' strategies is investigated; finally, the equilibrium of the MDGs' pricing strategies at the upper level differential game is presented, together with a comparison to the static noncooperative equilibrium.

³The Raspberry Pi 3 Model B [38] can be charged via a USB port on a PC and has a computational capability similar to that of a modern smartphone, which makes it suitable to represent a mobile device.

B. Evolutionary Equilibrium of MOs' Selections

For the lower level evolutionary game, we plot the evolutionary trajectories of the MOs' selections over time in Fig. 7. As we can see in the figure, both MO 1's and MO 2's selections evolve with time and gradually converge to stable states. For example, MO 1 initializes its training strategy as $\mathbf{x}_1^{(0)} = [0.2, 0.3, 0.5]$. As the selection strategy of MO 1 evolves with time, the probability that MO 1 selects MDG 3 decreases, while the probabilities that MO 1 selects MDG 1 and MDG 2 increase, driven by the replicator dynamics defined in (6). Eventually, the selection of MO 1 converges to $\mathbf{x}_1(T) = [0.3826, 0.3337, 0.2837]$, which indicates that MO 1 prefers MDG 1 over the other two MDGs. According to the EMD settings of the MDGs, i.e., $\boldsymbol{\sigma}_1 = [0.10, 0.15, 0.20]$, it is reasonable that MO 1 selects the MDG with the more balanced dataset, i.e., a smaller EMD, with higher probability, since all other parameters are identical. Similarly, MO 2's selection strategy converges to a stable state symmetric to that of MO 1, as shown in Fig. 7(b), although it starts from an initial strategy different from $\mathbf{x}_1^{(0)}$.

Fig. 7. Evolutionary trajectories of the training strategy adaptation over time for (a) MO 1 and (b) MO 2.

Furthermore, we verify the asymptotic stability of the MOs' evolutionary equilibrium via a direction field of the replicator dynamics. Without loss of generality, we give the direction field of MO 1 in Fig. 8. The arrow at each point indicates the direction of the adaptation process for MO 1 at that state; it is governed by the replicator dynamics in (6) and defines how MO 1's selection evolves at the next step. As depicted in the figure, the probabilities of MO 1 choosing MDG 1 and MDG 2, i.e., $x_{1,1}$ and $x_{1,2}$, always converge from any initial probabilities, which verifies that the adaptation leads MO 1 to the evolutionary equilibrium.

Fig. 8. Direction field of the replicator dynamics, indicating the stability of the evolutionary equilibrium.

Next, we evaluate the impact of the learning rate, i.e., δ in (6), on the convergence speed of the replicator dynamics; the results are shown in Fig. 9. The learning rate indicates the frequency of the MOs' selection adaptations, which controls the speed of strategy adaptation. As a comparison, we also implement a static noncooperative game for the MDGs, in which the MDGs' pricing decisions only take the evolutionary equilibrium of the lower level into account, instead of the replicator dynamics considered in the differential game. The solution of this static noncooperative game uses a backward induction method, which is widely addressed in the literature and thus omitted here. We observe in Fig. 9 that the convergence speed increases with the learning rate. Specifically, when the MOs fully utilize their observations (i.e., δ = 1), the MOs achieve the evolutionary equilibrium at the fastest rate. Without loss of generality, in the following numerical analysis, we set the learning rate δ = 1. Besides, as shown in Fig. 9, the differential game outperforms the static noncooperative game in terms of the convergence speed.

Fig. 9. Impact of the learning rate on the convergence time.
C. Impact of Congestion Coefficient

Fig. 10 shows the impact of the congestion coefficient on MO 1's training strategy. We vary the congestion coefficient α for both MOs from 0.05 to 0.5 and plot MO 1's training strategies at the evolutionary equilibrium. As we can see in Fig. 10, the probabilities with which MO 1 selects the different MDGs move closer to one another as the congestion coefficient increases. With larger congestion coefficients, the congestion-induced degradation of the MOs' utilities is more severe. Consequently, the MOs tend to spread their selection probabilities over the MDGs in order to lower the chance of congestion.

Fig. 10. Impact of the congestion coefficient on the training strategies of MO 1. The congestion coefficient α_k in (5) varies from 0.05 to 0.5.

To investigate the scalability of the proposed game framework, we then vary the number of MOs from 2 to 10 and extend the number of MDGs to 6. The MOs' average utilities against the number of MOs are shown in Fig. 11. As we can see in the figure, for a given K, i.e., number of MDGs, the average utilities of the MOs decrease with N, i.e., the number of MOs. The reason is that an increase in the number of MOs in a market with a given number of MDGs results in a more crowded network, i.e., it increases the penalty introduced by the congestion term in (5). For the same reason, for any given number of MOs, more MDGs result in higher average utilities for the MOs.

Fig. 11. Average utility of the MOs against the number of MOs, which ranges from 2 to 10.

D. Equilibrium of MDGs' Pricing Strategies

As for the upper level differential game, the equilibrium of the dynamic pricing strategies of the MDGs is shown in Fig. 12. As we can observe in Fig. 12(a) and (b), all MDGs adjust their prices for both MO 1 and MO 2 gradually over time, and the prices eventually converge to static values. This pricing dynamics unfolds simultaneously with the MOs' selection adaptation, since the replicator dynamics of the lower level evolutionary game is taken into account in the solution of the upper level differential game. Among these MDGs, MDG 1 offers the highest static price for both MO 1 and MO 2. The reason is that MDG 1's data quality in terms of EMD is better than the others', i.e., it has the smallest σ; a better service commands a higher price. Furthermore, each MDG offers the same static price to MO 1 and MO 2; e.g., the prices that MDG 1 offers for MO 1 and MO 2 are both 0.4365. Since all MDGs have the same data quality for MO 1 and MO 2, it is reasonable that they charge the same price for the same service.

Fig. 12. Pricing strategies of the MDGs over time for (a) MO 1 and (b) MO 2.

On the other hand, Fig. 13 shows the cumulative profits of the MDGs. As shown in Fig. 13, MDG 1 has the highest cumulative profit among all MDGs due to its best data quality, i.e., the lowest EMD. Besides, compared with the static noncooperative game, the differential game approach helps the MDGs achieve higher cumulative profits due to its greater flexibility.

Fig. 13. Cumulative profit of the MDGs under dynamic and static equilibrium strategy controls.
VII. CONCLUSION

With emerging social applications based on machine learning technologies, our daily life has been profoundly improved. Meanwhile, the data privacy issues surrounding these applications, which may acquire our personal data as training samples for their machine learning models, are also gaining more and more attention. To tackle this issue, federated learning has become one of the most promising solutions due to its privacy-preserving property via on-device model training. To this end, in this article, we devised a two-layer dynamic game model, consisting of a lower level evolutionary game of the MOs and an upper level differential game of the MDGs, to study the incentive mechanism. The solutions of the proposed two-layer dynamic game were analyzed theoretically. The quality of the federated learning service, which relates to the size and the non-i.i.d. property of the dataset, was formulated based on extensive experiments, and the energy consumption of on-device training was measured. On this basis, the solutions of the proposed two-layer dynamic game were verified via numerical evaluations.

APPENDIX A
PROOF OF PROPOSITION 1

To prove Proposition 1, we give the derivative of $\Phi_{i,k}$ with respect to $x_{j,l}$ as follows:

$$\frac{d\Phi_{i,k}}{dx_{j,l}} = \delta\left[\frac{dx_{i,k}}{dx_{j,l}}\left(u_{i,k} - \bar{u}_i\right) + x_{i,k}\left(\frac{du_{i,k}}{dx_{j,l}} - \frac{d\bar{u}_i}{dx_{j,l}}\right)\right]$$

where $du_{i,k}/dx_{j,l}$ is

$$\frac{du_{i,k}}{dx_{j,l}} = \gamma_k f'\left(h(\mathbf{x}_k)\right)M_j^d\,\frac{x_{i,k}M_i^d}{h(\mathbf{x}_k)} + \left[\frac{dx_{i,k}}{dx_{j,l}}\frac{M_i^d}{h(\mathbf{x}_k)} - \frac{x_{i,k}M_i^d M_j^d}{h^2(\mathbf{x}_k)}\right]f\left(h(\mathbf{x}_k)\right) - \eta_{i,k}M_i^d\,\frac{dx_{i,k}}{dx_{j,l}}$$

and $h(\mathbf{x}_k) \triangleq \sum_{i\in\mathcal{N}} x_{i,k}M_i^d$; the time index t is omitted here for convenience. In the sequel, for all $(i,k)\in\mathcal{N}\times\mathcal{K}$, $|du_{i,k}/dx_{j,l}|$ is bounded for all $(j,l)\in\mathcal{N}\times\mathcal{K}$ due to the continuity of $f(\cdot)$ and $f'(\cdot)$. Similarly, $|d\bar{u}_i/dx_{j,l}|$ is also bounded. These facts imply that $|d\Phi_{i,k}/dx_{j,l}|$ is bounded, which completes the proof.

APPENDIX B
PROOF OF THEOREM 2

To prove that the function defined in (7) meets the Lyapunov conditions, we need to verify that $\nabla_t(V(t))\le 0$ for all values of $V(t)\ne 0$. $\nabla_t(V(t))$ is given as follows:

$$\nabla_t(V(t)) = 2\sum_{i\in\mathcal{N}}\sum_{k\in\mathcal{K}} x_{i,k}(t)\,\Phi_{i,k}(t) = 2N\delta\sum_{i\in\mathcal{N}}\sum_{k\in\mathcal{K}} x_{i,k}(t)\left(u_{i,k}(t) - \bar{u}_i(t)\right) = 2N\delta\left[\sum_{i\in\mathcal{N}}\sum_{k\in\mathcal{K}} x_{i,k}(t)u_{i,k}(t) - \sum_{i\in\mathcal{N}}\bar{u}_i(t)\right] = 0.$$

This shows that the evolutionary game defined in (6) satisfies the Lyapunov stability condition, i.e., the equilibrium is stable.

APPENDIX C
PROOF OF LEMMA 1

Proof: We first have the chain of inequalities in (12), where $\|z\|_T \triangleq \int_0^T |z(t)|_{L^1}\,dt$:

$$\begin{aligned}
\left\|\mathbf{x}^{\vartheta} - \mathbf{x}\right\|_T &= \sum_{i\in\mathcal{N},k\in\mathcal{K}}\left\|x_{i,k}^{\vartheta} - x_{i,k}\right\|_T \\
&= \sum_{i\in\mathcal{N},k\in\mathcal{K}}\int_0^T e^{-\mu t}\left|\int_0^t \Phi_{i,k}\left(\mathbf{x}_i^{\vartheta}(\tau), \mathbf{p}_i^{\vartheta}(\tau)\right) - \Phi_{i,k}\left(\mathbf{x}_i(\tau), \mathbf{p}_i(\tau)\right)d\tau\right|dt \\
&\le \sum_{i\in\mathcal{N},k\in\mathcal{K}}\int_0^T\int_0^t e^{-\mu t}\left|\Phi_{i,k}\left(\mathbf{x}_i^{\vartheta}(\tau), \mathbf{p}_i^{\vartheta}(\tau)\right) - \Phi_{i,k}\left(\mathbf{x}_i(\tau), \mathbf{p}_i(\tau)\right)\right|d\tau\,dt \\
&= \sum_{i\in\mathcal{N},k\in\mathcal{K}}\int_0^T e^{-\mu\tau}\left|\Phi_{i,k}\left(\mathbf{x}_i^{\vartheta}(\tau), \mathbf{p}_i^{\vartheta}(\tau)\right) - \Phi_{i,k}\left(\mathbf{x}_i(\tau), \mathbf{p}_i(\tau)\right)\right|\int_\tau^T e^{-\mu(t-\tau)}dt\,d\tau \\
&\le \frac{1}{\mu}\left[\sum_{i\in\mathcal{N},k\in\mathcal{K}}\int_0^T e^{-\mu\tau}\left|\Phi_{i,k}\left(\mathbf{x}_i^{\vartheta}(\tau), \mathbf{p}_i^{\vartheta}(\tau)\right) - \Phi_{i,k}\left(\mathbf{x}_i(\tau), \mathbf{p}_i^{\vartheta}(\tau)\right)\right|d\tau \right. \\
&\qquad\left. + \sum_{i\in\mathcal{N},k\in\mathcal{K}}\int_0^T e^{-\mu\tau}\left|\Phi_{i,k}\left(\mathbf{x}_i(\tau), \mathbf{p}_i^{\vartheta}(\tau)\right) - \Phi_{i,k}\left(\mathbf{x}_i(\tau), \mathbf{p}_i(\tau)\right)\right|d\tau\right]
\end{aligned} \tag{12}$$

Since $\Phi_{i,k}(\cdot,\cdot)$ is Lipschitz continuous with respect to $\mathbf{x}_i(\cdot)$ and $\mathbf{p}_i(\cdot)$, we have

$$\sum_{i\in\mathcal{N},k\in\mathcal{K}}\left\|x_{i,k}^{\vartheta} - x_{i,k}\right\|_T \le \frac{1}{\mu}\left[\sum_{i\in\mathcal{N},k\in\mathcal{K}}\kappa_{i,k}^{x}\int_0^T e^{-\mu\tau}\left\|\mathbf{x}_i^{\vartheta}(\tau) - \mathbf{x}_i(\tau)\right\|d\tau + \sum_{i\in\mathcal{N},k\in\mathcal{K}}\kappa_{i,k}^{p}\int_0^T e^{-\mu\tau}\left\|\mathbf{p}_i^{\vartheta}(\tau) - \mathbf{p}_i(\tau)\right\|d\tau\right].$$

In the sequel, by letting $\kappa_{\max}^{x} = \max\{\kappa_{i,k}^{x}\}_{\forall i\in\mathcal{N},k\in\mathcal{K}}$ and $\kappa_{\max}^{p} = \max\{\kappa_{i,k}^{p}\}_{\forall i\in\mathcal{N},k\in\mathcal{K}}$, we accordingly have

$$\left\|\mathbf{x}^{\vartheta} - \mathbf{x}\right\|_T \le \frac{1}{\mu}\left(\kappa_{\max}^{x}K\left\|\mathbf{x}^{\vartheta} - \mathbf{x}\right\|_T + \kappa_{\max}^{p}K\left\|\mathbf{p}^{\vartheta} - \mathbf{p}\right\|_T\right) \;\Leftrightarrow\; \left(1 - \frac{\kappa_{\max}^{x}K}{\mu}\right)\left\|\mathbf{x}^{\vartheta} - \mathbf{x}\right\|_T \le \frac{\kappa_{\max}^{p}K}{\mu}\left\|\mathbf{p}^{\vartheta} - \mathbf{p}\right\|_T.$$

As long as $\kappa_{\max}^{x}K < \mu$, we have $\mathbf{x}^{\vartheta}(\cdot)\to_s \mathbf{x}(\cdot)$ when $\mathbf{p}^{\vartheta}(\cdot)\to_w \mathbf{p}(\cdot)$. This completes the proof.

REFERENCES

[1] Y. Zou, S. Feng, J. Xu, S. Gong, D. Niyato, and W.
Cheng, "Dynamic games in federated learning training service market," in Proc. IEEE Pacific Rim Conf. Commun., Comput. Signal Process. (PACRIM), Aug. 2019, pp. 1–6.
[2] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, no. 7553, pp. 436–444, May 2015.
[3] Stratistics MRC, "Machine learning—Global market outlook (2019–2027)," Annapolis, MD, USA, Tech. Rep. SMRC19398, 2020.
[4] Regulation (EU) 2016/679 of the European Parliament and of the Council on the Protection of Natural Persons With Regard to the Processing of Personal Data and on the Free Movement of Such Data, and Repealing Directive 95/46/EC (General Data Protection Regulation), EU, Brussels, Belgium, 2016.
[5] H. B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas, "Communication-efficient learning of deep networks from decentralized data," in Proc. 20th Int. Conf. Artif. Intell. Statist. (AISTATS), 2017, pp. 1273–1282.
[6] J. Konečnỳ, H. B. McMahan, D. Ramage, and P. Richtárik, "Federated optimization: Distributed machine learning for on-device intelligence," 2016, arXiv:1610.02527. [Online]. Available: http://arxiv.org/abs/1610.02527
[7] J. Konečnỳ, H. B. McMahan, F. X. Yu, P. Richtárik, A. T. Suresh, and D. Bacon, "Federated learning: Strategies for improving communication efficiency," 2016, arXiv:1610.05492. [Online]. Available: http://arxiv.org/abs/1610.05492
[8] Y. Lin, S. Han, H. Mao, Y. Wang, and W. J. Dally, "Deep gradient compression: Reducing the communication bandwidth for distributed training," in Proc. ICLR, 2018, pp. 1–14.
[9] C. Fung, C. J. M. Yoon, and I. Beschastnikh, "Mitigating sybils in federated learning poisoning," 2018, arXiv:1808.04866. [Online]. Available: http://arxiv.org/abs/1808.04866
[10] K. Bonawitz et al., "Practical secure aggregation for privacy-preserving machine learning," in Proc. ACM SIGSAC Conf. Comput. Commun. Secur., Oct. 2017, pp. 1175–1191.
[11] R. C. Geyer, T. Klein, and M. Nabi, "Differentially private federated learning: A client level perspective," in Proc. NIPS Workshop Mach. Learn. Phone Other Consum. Devices, 2017, pp. 1–7.
[12] J. Kang, Z. Xiong, D. Niyato, Y. Zou, Y. Zhang, and M. Guizani, "Reliable federated learning for mobile networks," IEEE Wireless Commun., vol. 27, no. 2, pp. 72–80, Apr. 2020.
[13] T. S. Brisimi, R. Chen, T. Mela, A. Olshevsky, I. C. Paschalidis, and W. Shi, "Federated learning of predictive models from federated electronic health records," Int. J. Med. Informat., vol. 112, pp. 59–67, Apr. 2018.
[14] A. Hard et al., "Federated learning for mobile keyboard prediction," 2018, arXiv:1811.03604. [Online]. Available: http://arxiv.org/abs/1811.03604
[15] T. Yang et al., "Applied federated learning: Improving Google keyboard query suggestions," 2018, arXiv:1812.02903. [Online]. Available: http://arxiv.org/abs/1812.02903
[16] L. Liang, H. Ye, and G. Y. Li, "Toward intelligent vehicular networks: A machine learning framework," IEEE Internet Things J., vol. 6, no. 1, pp. 124–135, Feb. 2019.
[17] S. Samarakoon, M. Bennis, W. Saad, and M. Debbah, "Distributed federated learning for ultra-reliable low-latency vehicular communications," IEEE Trans. Commun., vol. 68, no. 2, pp. 1146–1159, Feb. 2020.
[18] Z. Ning et al., "Mobile edge computing enabled 5G health monitoring for Internet of medical things: A decentralized game theoretic approach," IEEE J. Sel. Areas Commun., vol. 39, no. 2, pp. 463–478, Feb. 2021.
[19] Z.
Ning et al., "Partial computation offloading and adaptive task scheduling for 5G-enabled vehicular networks," IEEE Trans. Mobile Comput., early access, Sep. 18, 2020, doi: 10.1109/TMC.2020.3025116.
[20] Z. Ning et al., "Intelligent edge computing in Internet of vehicles: A joint computation offloading and caching solution," IEEE Trans. Intell. Transp. Syst., vol. 22, no. 4, pp. 2212–2225, Apr. 2021.
[21] K. A. Bonawitz et al., "Towards federated learning at scale: System design," in Proc. SysML, 2019, pp. 1–15. [Online]. Available: https://arxiv.org/abs/1902.01046
[22] R. Johnson and T. Zhang, "Accelerating stochastic gradient descent using predictive variance reduction," in Proc. Adv. Neural Inf. Process. Syst., vol. 26, 2013, pp. 315–323.
[23] O. Shamir, N. Srebro, and T. Zhang, "Communication-efficient distributed optimization using an approximate Newton-type method," in Proc. Int. Conf. Mach. Learn., 2014, pp. 1000–1008.
[24] L. Melis, C. Song, E. D. Cristofaro, and V. Shmatikov, "Inference attacks against collaborative learning," 2017, arXiv:1805.04049. [Online]. Available: https://arxiv.org/abs/1805.04049
[25] B. Hitaj, G. Ateniese, and F. Perez-Cruz, "Deep models under the GAN: Information leakage from collaborative deep learning," in Proc. ACM SIGSAC Conf. Comput. Commun. Secur., Oct. 2017, pp. 603–618.
[26] A. Gascón et al., "Privacy-preserving distributed linear regression on high-dimensional data," Proc. Privacy Enhancing Technol., vol. 2017, no. 4, pp. 345–364, 2017.
[27] S. Hardy et al., "Private federated learning on vertically partitioned data via entity resolution and additively homomorphic encryption," 2017, arXiv:1711.10677. [Online]. Available: http://arxiv.org/abs/1711.10677
[28] J. Kang, Z. Xiong, D. Niyato, H. Yu, Y.-C. Liang, and D. I. Kim, "Incentive design for efficient federated learning in mobile networks: A contract theory approach," in Proc. IEEE VTS Asia Pacific Wireless Commun. Symp. (APWCS), Aug. 2019, pp. 1–5.
[29] Y. Jiao, P. Wang, D. Niyato, B. Lin, and D. I. Kim, "Toward an automated auction framework for wireless federated learning services market," IEEE Trans. Mobile Comput., early access, May 14, 2020, doi: 10.1109/TMC.2020.2994639.
[30] Y. Zhao, M. Li, L. Lai, N. Suda, D. Civin, and V. Chandra, "Federated learning with non-IID data," 2018, arXiv:1806.00582. [Online]. Available: http://arxiv.org/abs/1806.00582
[31] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proc. IEEE, vol. 86, no. 11, pp. 2278–2324, Nov. 1998.
[32] K. Zhu, E. Hossain, and D. Niyato, "Pricing, spectrum sharing, and service selection in two-tier small cell networks: A hierarchical dynamic game approach," IEEE Trans. Mobile Comput., vol. 13, no. 8, pp. 1843–1856, Aug. 2014.
[33] K. Zhu and E. Hossain, "Joint mode selection and spectrum partitioning for device-to-device communication: A dynamic Stackelberg game," IEEE Trans. Wireless Commun., vol. 14, no. 3, pp. 1406–1420, Mar. 2015.
[34] Cauchy–Lipschitz Theorem. Accessed: Feb. 25, 2021. [Online]. Available: https://www.encyclopediaofmath.org/index.php/Cauchy-Lipschitz_theorem
[35] S. Sastry, "Lyapunov stability theory," in Nonlinear Systems (Interdisciplinary Applied Mathematics), vol. 10.
New York, NY, USA: Springer, 1999, doi: 10.1007/978-1-4757-3108-8_5.
[36] E. J. Dockner, S. Jørgensen, N. Van Long, and G. Sorger, Differential Games in Economics and Management Science. Cambridge, U.K.: Cambridge Univ. Press, 2000.
[37] D. Tabak, "Numerical solutions of differential game problems," Int. J. Syst. Sci., vol. 6, no. 6, pp. 591–599, 1975.
[38] Raspberry Pi Foundation. Raspberry Pi 3 Model B. Accessed: Feb. 25, 2021. [Online]. Available: https://www.raspberrypi.org/products/raspberry-pi-3-model-b/

Wenqing Cheng received the B.E. degree in telecommunication engineering and the Ph.D. degree in electronics and information engineering from the Huazhong University of Science and Technology, Wuhan, China, in 1985 and 2005, respectively. She is currently a Professor with the School of Electronic Information and Communications, Huazhong University of Science and Technology. Her research interests include mobile communications and wireless sensor networks, information systems, and e-learning applications.

Yuze Zou received the B.E. degree in electronic information communications (EIC) from the Huazhong University of Science and Technology, Wuhan, China, in 2015, where he is currently pursuing the Ph.D. degree with the School of Electronic Information and Communications. His research interests include federated learning, intelligent reflecting surfaces, and game theory and its applications in networked systems.

Jing Xu received the B.E. degree in telecommunication engineering and the Ph.D. degree in electronics and information engineering from the Huazhong University of Science and Technology, Wuhan, China, in 2001 and 2011, respectively. He is currently an Associate Professor with the School of Electronic Information and Communications, Huazhong University of Science and Technology. His research interests include wireless networks and network security, with an emphasis on performance optimization, game theory, and reinforcement learning and their applications in networked systems.

Wei Liu (Member, IEEE) received the B.E. degree in telecommunication engineering and the Ph.D. degree in electronics and information engineering from the Huazhong University of Science and Technology, Wuhan, China, in 1999 and 2004, respectively. He is currently an Associate Professor with the School of Electronic Information and Communications, Huazhong University of Science and Technology. His research interests include wireless networks, Internet measurement, and e-learning applications.