Zoom in iOS Clones: Examining the Impact of Copycat Apps on Original App Downloads

Beibei Li, Param Vir Singh, Quan Wang
Carnegie Mellon University
beibeili@andrew.cmu.edu, psidhu@andrew.cmu.edu, quanw@andrew.cmu.edu

Nov 2014

Abstract

With the rapid growth of the mobile app market, a large number of app developers are encouraged to invest in app innovation. However, this inevitably invites other developers to imitate the design and appearance of innovative, original apps. In this paper, we examine the prevailing copycat app phenomenon and its impact. Using a combination of machine learning techniques, including natural language processing, latent semantic analysis, network-based clustering, and image analysis, we detect two types of copycats: deceptive and non-deceptive. Based on the detection results, we conduct econometric analyses to understand the major impacts of copycats on the downloads of original apps. Our analysis is validated on a unique dataset containing detailed information about 10,100 action game apps by 5,141 developers released on the iOS App Store over five years. Our final results indicate significant heterogeneity in the interactions between copycats and original apps over time. In particular, our findings suggest that copycat apps can be either friends or foes of the original apps. Specifically, high-quality copycats tend to compete with the original app, especially when the copycats are non-deceptive. Interestingly, for low-quality copycats, we find a significant and positive effect of deceptive copycats on original app downloads, suggesting a potential positive spillover effect.

1 Introduction

Mobile commerce has grown immensely around the world in recent years. Much of this industry revolution is driven by mobile software applications, or apps. According to app analytics provider Flurry, the average US consumer spends 2 hours and 42 minutes per day on mobile devices, 86% of which is spent on mobile apps (Flurry 2014). In terms of monetary expenditure, Apple announced that its users spent $10 billion in the App Store in 2013, with $1 billion coming in December alone (Bloomberg Businessweek 2014). The tremendous demand for mobile apps has created significant financial incentives and business opportunities for app developers. As a result, over 1,000,000 mobile apps have been created and offered on each of the two leading platforms, Apple iOS and Google Android. A top-10 grossing iOS game can easily make $47,000 in daily revenue and over $17 million a year (Distimo 2013).

While this enormous economic opportunity attracts an increasing number of mobile developers to invest in app innovation, it inevitably invites copycat developers to imitate the design and appearance of original apps. A study by The Guardian in 2012 showed that Apple's App Store was flooded with copycat gaming applications (TheGuardian 2012). Only days after the original game Flappy Bird hit the market, "Flappy Bird" copycats rocketed up the charts and occupied four of the five top free app spots in the iOS App Store. Today one can easily find tens of "Flappy Bird" knock-offs on any major mobile platform (NYTimes 2014). Flappy Bird is not alone in being cloned; many successful apps are closely followed by copycats (Technobuffalo 2014). The copycat phenomenon has become so ubiquitous that the "derivative" works do not come only from smaller developers; they also come from large firms imitating smaller ones.
Large gaming studios like Zynga have used their ample engineering resources to imitate smaller games like "Tiny Tower" (Huffington Post 2013). Large companies also love to parrot other large companies to maintain feature parity (for an example, see "Instagram" vs. "Vine," Huffington Post 2013).

Several unique features of the mobile app market have contributed to the prevalence of copycat apps. First, the barrier to copycat entry is almost zero. Compared with other creative industries such as film and TV, the cost of copying is substantially lower and intellectual property protection is weaker in the mobile app market. Second, the success of innovation is unpredictable, so the cost of innovation is high relative to the expected revenue. Even large companies and superstar app developers are not guaranteed to generate successful innovations continuously. Third, original app developers usually lack the resources to educate consumers about which app is authentic. As a result, brand loyalty in the mobile app market is weak. In addition to these factors, which potentially displace demand and supply from original apps to copycats, other notable features of the mobile app market imply shifts in the opposite direction.

The uniqueness of the copycat app phenomenon provides a great opportunity for researchers to examine issues such as the economic impact of copycat apps on original apps, optimal combating strategies, the trade-off between innovating and imitating, innovation and growth, and optimal copyright protection policy. However, very little knowledge has been generated so far, due to a few notable challenges. The first challenge is the definition of copycat apps. Imitation can happen along multiple dimensions of an app's functions and appearance: a copycat may mimic the authentic app along functional dimensions, in its packaging, or both, and different degrees of imitation can imply different types of original-copycat interactions. Second, due to the large number of existing apps and the budget constraints of small firms, tracking copycat behavior tends to be too costly for many app developers. As a result, the market has accumulated very little knowledge about the actual behavior of copycat apps, which makes copycat detection highly exploratory and unsupervised. The third challenge is the lack of theoretical support. Imitation remains understudied in the economics and management literature, and theories of counterfeiting are often context dependent. For example, Cho et al. (2013) contrast the combating strategies for different types of counterfeits and highlight the trade-offs among strategies in different contexts. It is therefore crucial to take the unique features of the mobile app market into consideration when examining the copycat app phenomenon.

Keeping these important managerial and policy questions in mind, we have two major objectives in this paper. The first objective is to introduce a functionality-based copycat detection method that achieves accurate results using publicly available data. The method should be fast and scalable to accommodate the large number and great variety of available apps. The copycat and original identifiers it produces are highly beneficial to both practitioners and researchers.
Practitioners can better monitor the behavior of followers or of the original innovator and respond accordingly. For researchers, identifying copycat apps and original apps enables much interesting work and advances the understanding of this booming market. Our second objective is to empirically analyze the interplay between original apps and copycat apps in terms of economic outcomes. In particular, this paper focuses on the potential sales cannibalization of copycat apps on the original app. We are especially interested in:

- What is the impact of the demand for copycat apps on the demand for original apps? Specifically, under what conditions does a copycat app impede the sales of the original app, and when does it facilitate those sales?

To achieve these goals, we first draw on the literature on counterfeiting and imitation to formally define copycat apps and original apps. We specify two types of copycat apps, deceptive and non-deceptive, based on differences in appearance. We then collect panel data for all action game apps from the iOS App Store and AppAnnie.com over more than five years. The data contain both structured information, such as download ranks and app characteristics, and unstructured information, such as textual and graphical descriptions of the apps. We combine various machine learning techniques to analyze the unstructured textual and graphical data and detect the copycat apps. Using surveys on Amazon Mechanical Turk as the benchmark, we evaluate the accuracy of the proposed detection approach. We then examine the economic outcomes of copycat apps with panel data and econometric models.

Our results provide solid empirical evidence that the interactions between original apps and copycat apps are highly heterogeneous. Copycat apps can be either friends or foes of authentic apps, depending on the type of copycat. Non-deceptive copycats are a special kind of competitor and cannibalize authentic sales; in particular, high-quality non-deceptive copycats can steal sales that would have gone to the original apps. However, if the copycat is deceptive, it can exert both a positive spillover effect and negative cannibalization on original sales. For high-quality deceptive copycats, these two forces appear to largely offset each other. Interestingly, for low-quality deceptive copycats, we find a significant and positive effect on original app downloads, suggesting a positive spillover effect.

Our answers to these research questions provide useful guidance to industry managers and policy makers, and contribute to the growing academic literature on mobile commerce. To the best of our knowledge, our paper is the first study to focus on the copycat phenomenon in the mobile app setting. Our proposed copycat detection approach allows practitioners as well as researchers to obtain copycat identifiers and investigate this emerging area. Applying econometric methodologies, we are able to examine the major economic impacts of the copycat phenomenon from a causal perspective. Our analysis can be a first step toward understanding the drivers of technological innovation in the mobile app market.

Related Literature

Only a handful of studies on the economic and social aspects of the mobile app market have emerged in recent years. Carare (2012) measures the effect of an app's popularity ranking on consumers' willingness to pay.
Garg and Telang (2013) present a method to calibrate the sales-rank relationship for the app market using public data. Liu et al. (2012) examine the impact of the freemium pricing model on the sales volume and revenue of paid apps. Ghose and Han (2014) estimate consumer preferences toward attributes of mobile apps in the iOS and Android markets. However, the extant literature has left the issue of copycat apps largely unexplored.

Our study is related to work on imitation and innovation. Nelson and Winter (1982) define innovation, in particular technical innovation, as the implementation of a design for a new product or of a new way to produce a product. In other words, an innovator works with an extremely sparse set of clues about the details and solves a revolutionary problem independently. In contrast, imitators borrow heavily from what has already been produced, although they may offer incremental improvements on other aspects of the product, such as the user interface and detailed competences. Previous studies have examined whether imitation works as a driving force of innovation or a hindrance to it, and the results are mixed. Conventional wisdom holds that imitating an innovation breaks the monopoly power of the original producer, so that competition drives the price down to marginal cost. However, a more recent study (Bessen and Maskin 2009) argues that imitation promotes innovation in certain industries such as software and computers, because innovation in such high-tech industries is both sequential and complementary: each successive invention builds on the preceding one, and each potential innovator takes a different research line, thereby enhancing the overall probability that a particular goal is reached. Our definitions of copycat apps and original apps are built upon this stream of literature.

Our project is also related to the literature on counterfeits, that is, unauthorized copies that infringe the trademark or copyright of the original product. Several analytical studies in this stream focus on optimal combating strategies, but the predictions and implications vary from context to context. Grossman and Shapiro (1988a) propose a vertical differentiation model to describe the interaction between a domestic brand-name company and a foreign deceptive counterfeiter. They show that the domestic company may raise or lower quality to battle counterfeits, depending on whether the counterfeiters' entry can be effectively deterred. In another early paper, Grossman and Shapiro (1988b) study the status-goods market, where consumers are not deceived by the foreign counterfeiter, and show that government enforcement and increased tariffs can be effective combating strategies. Cho et al. (2013) contrast the optimal combating strategies for deceptive and non-deceptive counterfeits and conclude that the effectiveness of a strategy depends on the type of counterfeiter the brand-name company faces. A few empirical studies in marketing have tested the market outcomes of counterfeiting. For example, Qian (2008) observes that Chinese shoe manufacturers improved quality, raised prices, and integrated with downstream distributors in response to counterfeits. In a follow-up study, Qian (2011) finds that counterfeits have both advertising effects and substitution effects on original products.
She concludes that the advertising effect dominates for high-end original product sales, while the substitution effect prevails for low-end product sales. Nevertheless, the copycat issue in the mobile app market differs from the counterfeit issue in several ways, and these differences make it hard to generalize the insights on counterfeits to the mobile app copycat problem.

Finally, our study is related to the piracy of information goods, particularly how piracy affects legitimate sales. The empirical findings are mixed. For instance, Oberholzer-Gee and Strumpf (2007) find that file sharing has an effect on legal music sales that is statistically indistinguishable from zero. Smith and Telang (2009) find that the availability of pirated content has no significant effect on post-broadcast DVD sales. While those studies conclude that piracy and file sharing have no effect, Danaher et al. (2014) and several other empirical studies find that piracy has a significant negative impact on authentic sales. However, piracy differs from copycatting in at least the following respects. First, the content of a pirated good is almost identical to that of the authentic good, which need not hold in the copycat setting. Second, consumers are very likely to be aware of the authentic good, often even before the product is released, whereas mobile users may not have noticed the original app at all. Third, traditional digital goods are released through multiple channels, while both original and copycat apps are accessible through the same app store. Therefore, the insights from piracy studies may not generalize to the mobile app setting.

Research Context and Data

The research context of this study is the iOS App Store, a digital distribution platform for mobile applications developed and maintained by Apple, Inc. Launched as the first mobile app store, the iOS App Store has grown explosively since it opened in July 2008. Starting with 500 apps in 2008, it offered over 1 million apps in 24 categories by December 2013. More than 30 billion apps had been downloaded, generating a total of $10 billion in developer revenue over the year 2013 (Apple Press Info 2013).

Apps can be priced either as paid or free. For paid apps, developers can freely choose a download price at any multiple of $1 minus one cent (e.g., $0.99, $1.99). Free apps cost nothing at download. The store provides three top charts to help users browse apps: the top free download chart, the top paid download chart, and the top grossing revenue chart. The top charts help consumers discover the latest popular apps; they also help developers monitor other apps in the marketplace.

The dataset used in this study consists of publicly available app information from the U.S. iOS store for iPhone. It covers a random sample of 10,100 action game apps by 5,141 developers released between July 2008 and December 2013. We focus on the action game category for two reasons. First, games contribute the largest share of revenue in the mobile app market (approximately 75% of revenue on iOS and around 90% on Google Play, according to appannie.com 2014), and action games are among the largest game genres. Second, the action genre is among the most innovative on the platform. Many novel and famous mobile games such as "Angry Birds", "Fruit Ninja", and "Clash of Clans" belong to this genre.
Compared with more traditional genres such as card games and casino games, action games exhibit large variation in originality. We randomly choose 10,100 apps from the population of 31,159 action game apps. Accounting for roughly one third of the population, this sample should be unbiased and representative.

Our data contain cross-sectional descriptions of the app landing pages on the iOS website in December 2013. In addition to numerical characteristics such as price, file size, and consumer rating, unstructured information such as the app's images, textual description, and user reviews is also carefully collected. Our data also include a panel of daily download ranks, daily grossing ranks, download prices, and version updates for these apps since each app's release. For the rank tables, we observe the top 1,500 apps on the top charts in the action game genre. We calibrate the daily download quantity from the daily rank data using the method proposed by Garg and Telang (2013); our estimated shape parameter is -0.9996, close to their reported parameter of -0.944. For version updates, we observe the date of each update. For download prices, we observe when each price change occurs and between which values. We find 16,757 version updates and 32,523 price changes in our dataset. Finally, our data contain a panel of Google Search Trend indices for the app titles on the web. This series is used as a control variable in our main analysis.

Table 1 presents summary statistics for the major variables in the cross-sectional data. We see that 48% of the apps in our sample are paid apps; the remaining 52% are free. The average download price is $0.78. The consumer rating, on an ordinal scale from 1 to 5, has an average of 3.43. App age, measured as months since release, averages 26.60. The game center dummy indicates whether the app is connected to the iOS Game Center, which lets users play and share games with their friends; 38% of the apps are connected. 6,918 apps have sibling apps that belong to the same developer, while 3,184 apps are the only app published by their developer.

Variable                      # Obs     Mean       Std. dev.   Min   Max
Download price                10,100    0.7815     3.8655      0     349.99
Paid dummy                    10,100    0.4810     0.4997      0     1
App age                       10,100    26.5970    13.5489     1     65
Rating                        10,100    3.4268     1.0041      1     5
Game center dummy             10,100    0.3823     0.4860      0     1
# Apps by the developer       10,100    7.7479     16.4381     1     113
# Characters in description   10,100    872.2314   676.4216    0     3994
# Screenshots                 10,100    3.6557     1.7782      0     6
Table 1. Summary Statistics

Copycat Detection

To distinguish copycat apps from original apps, we propose and validate a functionality-based detection architecture that provides accurate app identification. Before introducing the detection strategy, we formally define copycat apps and original apps based on the literature on innovation and imitation. Original apps are apps whose developers have devoted significant resources to implementing original ideas and creating innovative products; these apps offer functionality and gameplay that are fundamentally different from existing apps. In contrast, copycat apps borrow heavily from one or more existing apps in terms of functionality and gameplay. Copycat apps may nonetheless make modest adaptations to the originals.
A copycat may even improve the user experience and app competence rather than simply replicating the original idea. To investigate the heterogeneity of copycats more deeply, and consistent with theory (Grossman and Shapiro 1988a, 1988b), we define two types of copycats: deceptive and non-deceptive. Deceptive copycats are apps designed to deceive consumers through their appearance; consumers who purchase a deceptive copycat are likely to believe they have purchased the original. In contrast, non-deceptive copycats mimic the original app's functionality but maintain a distinctive appearance. The developers of non-deceptive copycats make an effort to differentiate themselves from the original app, and consumers can easily tell the two apart.

In our proposed copycat detection framework, a mobile app is modeled as a collection of functionalities and appearances. We first partition all apps into collections of apps with similar functionality. We then label the original app and the copycat apps within each collection. We finally determine whether each copycat is deceptive or non-deceptive.

We achieve this by answering three questions. First, given an app, what functionality does it provide? Although the functionality is often explicitly stated in the app's textual description on the landing page, the description is usually a short paragraph that may not give a comprehensive overview of the gameplay. We noticed, however, that functionality is repeatedly mentioned in consumer reviews. Therefore, we conduct text mining on both descriptions and consumer reviews to extract app functionality. Second, how should apps be partitioned based on functionality? Although there is a limited set of functionality types in the app market, the number of possible combinations of functionality aspects can be huge. For example, one app may be characterized as "endless running" plus "in-app purchase" plus "tilt," while a similar app has features such as "endless running" plus "touch screen." To reduce the dimensionality of the functionality space, we conduct latent semantic analysis and then cluster the apps based on their pairwise similarity. Third, what is the level of imitation? Copycat apps whose name or icon resembles the original app's are identified as deceptive; copycat apps that are clearly differentiated from the original in appearance are identified as non-deceptive. In the remainder of this section, we describe in detail how we process the publicly available but highly unstructured data to detect the different types of copycat apps.

Step 1: Detecting Similarity in App Textual Descriptions and Reviews Using NLP

The main purpose of this step is to map each app to a collection of functional features and then measure app similarity at the feature level. To achieve this, we first obtain each app's features by processing textual information. We follow Hu and Liu (2004) in combining user reviews with textual descriptions. Because user reviews contain noisy information that is not directly related to app features, we filter the review content with the following strategy. First, we combine all app descriptions into a bag of words and perform text preprocessing, including tokenization, stop-word removal, and Part-of-Speech tagging.
Second, we keep the unique nouns and verbs in the bag of words to create the dictionary of app features. We keep only nouns and verbs because we believe they are more relevant to app features than other word categories. Third, for each app, we compute term weights for the app features contained in its preprocessed textual description and most useful user reviews. The term weights are calculated using the standard TF-IDF scheme (Salton and McGill 1983). By doing so, we map each app to a vector of weighted frequencies of app features.

However, using TF-IDF alone can be problematic (Aggarwal and Zhai 2012). First, the dimensionality of the text representation is very large (in our case there are 26,642 unique stem words for 10,100 documents), while the underlying data are sparse. Second, the TF-IDF representation assumes words are independent of each other; it ignores synonymy, polysemy, and the underlying correlations between words. To address these issues, we conduct latent semantic analysis (LSA), specifically via singular value decomposition (SVD) (Landauer et al. 1998). SVD is widely used in large-scale data mining to reduce the dimensionality of feature vectors while preserving the similarity structure among them. We therefore apply SVD to the TF-IDF vectors. Finally, we apply the cosine similarity function to calculate the pairwise app similarities. The cosine similarity is a value between 0 and 1; a larger value indicates that the pair of apps shares a stronger functional similarity based on their textual descriptions.

Step 2: Network-Based Clustering Using the Markov Clustering Algorithm

The output of Step 1 can be viewed as an undirected probabilistic graph. In this graph, a node represents an app, and an undirected arc between two apps carries their pairwise similarity, which reflects how likely the two connected apps are to offer the same functionality. Our goal is to cluster this network based on its structure. The expected output of Step 2 is a set of clusters in which apps within the same cluster are very similar in functionality and gameplay, while apps in different clusters are divergent. To achieve this, we apply a network-based clustering method, an unsupervised learning approach that leverages the network structure to extract groups of similar items. In particular, we use the Markov clustering algorithm (MCL) to cluster the app network (Dongen 2000). Compared with distance-based clustering algorithms such as k-means (MacQueen 1967) and hierarchical clustering (Eisen et al. 1998), MCL has several merits (Satuluri et al. 2010). First, unlike k-means-based algorithms, which converge to one of numerous local minima, MCL is insensitive to initial starting conditions. Second, it does not require the number of clusters as an input; instead, the internal structure of the network determines the granularity of the clustering. Third, compared with many state-of-the-art network clustering algorithms, it is more noise-tolerant and more effective at discovering cluster structure (Brohee and Helden 2006). The method has been widely applied in bioinformatics (Satuluri et al. 2010) and converges at a speed linear in the size of the matrix (Dongen 2000). The basic intuition of the algorithm is a random walk on the graph: the probability of moving to a connected node is proportional to the weight on the arc.
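To make Steps 1 and 2 concrete, the following minimal sketch illustrates the pipeline on toy data. It assumes scikit-learn and numpy; the example documents, the number of SVD components, and the bare-bones MCL loop are illustrative simplifications rather than the exact implementation used in the paper.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical "description + reviews" documents, one per app.
docs = ["endless running tilt bird level power up",
        "endless running touch screen zombie level coin",
        "card deck casino chip bet dealer"]

# Step 1: TF-IDF term weights, LSA via truncated SVD, then cosine similarity.
tfidf = TfidfVectorizer().fit_transform(docs)
lsa = TruncatedSVD(n_components=2, random_state=0).fit_transform(tfidf)
sim = np.clip(cosine_similarity(lsa), 0.0, 1.0)   # pairwise app similarity in [0, 1]

# Step 2: a bare-bones Markov clustering loop (expansion + inflation) on the
# similarity graph; dedicated MCL packages exist, but the loop shows the idea.
def mcl(adjacency, inflation=2.0, iterations=50):
    m = adjacency / adjacency.sum(axis=0, keepdims=True)   # column-stochastic matrix
    for _ in range(iterations):
        m = m @ m                                          # expansion: two random-walk steps
        m = m ** inflation                                 # inflation: sharpen dense regions
        m = m / m.sum(axis=0, keepdims=True)               # re-normalize columns
    return m

adjacency = sim + np.eye(len(docs))        # add self-loops, standard for MCL
steady = mcl(adjacency)
# Rows that retain mass act as cluster "attractors"; the columns attached to the
# same attractor row belong to the same functional cluster.
clusters = [tuple(np.nonzero(row > 1e-6)[0]) for row in steady if row.max() > 1e-6]
print(clusters)
```

In the full data, the same flow runs over the 26,642-dimensional TF-IDF matrix for all 10,100 apps, with an appropriately chosen number of SVD components.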
After many steps, such a random walk stabilizes inside the dense regions of the network; these stabilized regions form the clusters and reflect the intrinsic structure of the network. Once we have extracted the clusters of similar apps, the next step is to distinguish the original apps from their followers. We use the app release date as the standard in this study: if an app is the first app released in its cluster, it is labeled original; otherwise it is labeled a copycat. However, if the original developer releases several apps in the same cluster, e.g., "Angry Birds" and "Angry Birds Space," these differentiated apps are also labeled as original.

Step 3a: Detecting Similarity in App Titles Using String Soft Matching

According to theory (e.g., Grossman and Shapiro 1988a, 1988b), there are two main types of copycats, deceptive and non-deceptive, depending on the nature of the imitation. In our case, we identify an app as deceptive if either its title is similar to the original app's title or its icon looks similar to the original app's icon. We identify an app as non-deceptive if it is identified as a copycat in the previous step (i.e., similar in its textual description) but neither its title nor its icon is similar to the original's. Under this definition, we can empirically separate deceptive copycats from non-deceptive ones by extracting the similarity in app titles and icon images. We do so through two separate analyses: string soft matching (Step 3a) and image matching analysis (Step 3b).

To extract the similarity in app titles, we apply string soft matching based on edit distance to compare app names. The edit distance between two strings is the minimum number of edit operations needed to convert one string into the other (Elmagarmid et al. 2007). There are three kinds of edit operations, insertion, deletion, and replacement, and each has cost 1. For each copycat app in a cluster, we compute the pairwise distance between the copycat's name and the original app's name; a smaller distance indicates higher similarity. We then normalize the distance into a similarity score between 0 and 1, where 1 indicates identical titles. Using a rule-of-thumb cutoff of 0.7 (Kim and Lee 2012), we label a copycat as deceptive if its normalized title similarity to the original app in its cluster exceeds 0.7.

Step 3b: Detecting Similarity in App Icons Using Image Matching Analysis

To detect imitation of app icons, we need an image matching algorithm that is invariant to image scale, rotation, changes in illumination, and so on. Copycat developers rarely use exactly the same image as the original; more likely, they rescale, rotate, or add noise to the original icon. To address this challenge, we employ the Scale-Invariant Feature Transform (SIFT) algorithm proposed by Lowe (1999), one of the most robust and widely used local-feature-based image matching algorithms in computer vision (Mikolajczyk and Schmid 2005). It extracts a core set of features from an image that capture the most important and distinctive information in local regions of the image.
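A minimal sketch of the title matching in Step 3a and the icon matching in Step 3b follows. It assumes a hand-rolled Levenshtein distance for titles and OpenCV's SIFT implementation (cv2.SIFT_create, available in OpenCV 4.4+) with Lowe's ratio test for icons; the file paths, the ratio threshold, and the scoring rule are illustrative assumptions, not the paper's exact procedure.

```python
import cv2  # pip install opencv-python (>= 4.4 for SIFT in the main package)

def edit_distance(a: str, b: str) -> int:
    """Classic dynamic-programming Levenshtein distance (insert/delete/replace, cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                   # deletion
                           cur[j - 1] + 1,                # insertion
                           prev[j - 1] + (ca != cb)))     # replacement
        prev = cur
    return prev[-1]

def title_similarity(a: str, b: str) -> float:
    """Step 3a: normalized title similarity in [0, 1]; values above 0.7 flag a deceptive title."""
    longest = max(len(a), len(b)) or 1
    return 1.0 - edit_distance(a.lower(), b.lower()) / longest

def icon_match_score(icon_path_a: str, icon_path_b: str) -> float:
    """Step 3b: share of SIFT keypoints in icon A that find a good match in icon B."""
    sift = cv2.SIFT_create()
    _, des_a = sift.detectAndCompute(cv2.imread(icon_path_a, cv2.IMREAD_GRAYSCALE), None)
    _, des_b = sift.detectAndCompute(cv2.imread(icon_path_b, cv2.IMREAD_GRAYSCALE), None)
    pairs = cv2.BFMatcher().knnMatch(des_a, des_b, k=2)
    good = [p for p in pairs if len(p) == 2 and p[0].distance < 0.75 * p[1].distance]  # Lowe's ratio test
    return len(good) / max(len(des_a), 1)

print(title_similarity("Angry Birds", "Angry Birds Pro"))   # ~0.73 -> flagged as a deceptive title
```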
Once an image is represented by this core set of features, it can be matched against another image, a part of that image, or a subset of the features extracted from that image. SIFT can therefore detect similar graphical patterns between images even when the images have undergone structural transformations. We match every copycat's icon against the original app's icon in each cluster. The SIFT method computes a matching score that captures the level of similarity between each copycat and the original app in a cluster, and we label a copycat as deceptive if its image matching score exceeds a threshold.

Figure 1 shows two examples of the matching results. The first image is the icon of "Angry Birds," a famous original game. The second image is the icon of "Cut the Birds," which has an appearance very similar to "Angry Birds" but is produced by an unrelated developer; the SIFT algorithm recognizes the two as similar images. The third image is from "Plants vs. Zombies," also a featured original game, and the last image is from "Cut the Zombies," which looks like the original game but is offered by a different producer; the image matching algorithm also recognizes these as similar images. Overall, the image matching process reports 473 authentic-copycat pairs of similar images.

Figure 1. Examples of Original Icon vs. Deceptive Icon

In summary, we detect the different types of copycat apps automatically by combining several machine learning techniques: natural language processing, latent semantic analysis, network-based clustering, string soft matching, and image matching analysis. Table 2 summarizes the goal, data, and methods used in each step.

Goal: Extract app functional similarity from textual descriptions and reviews. Data: app textual descriptions, user reviews. Methods: Part-of-Speech tagging; TF-IDF; Latent Semantic Analysis (Singular Value Decomposition); cosine similarity.
Goal: Cluster apps based on functional similarity. Data: derived textual similarity scores. Method: Markov Cluster Algorithm.
Goal: Identify original apps vs. copycats. Data: release date, developer ID. Method: release-date ordering within each cluster.
Goal: Identify deceptive vs. non-deceptive copycats. Data: app title, app icon image. Methods: String Soft Matching (edit distance); Image Matching Analysis (Scale-Invariant Feature Transform).
Table 2. Summary of Different Methods for Copycat Detection

Main Findings

The feature extraction process is based on 35,996 unique nouns and verbs from the descriptions of the 10,100 apps. Table 3 reports the extracted features for five popular apps. Most of the extracted features are meaningful and specific to the app content; the good quality of the feature extraction lays the foundation for effective copycat detection.

App Name                      Features
Angry Birds                   bird, angry, crash, level, push, power, new, up, love
Contract Killer               weapon, contract, mission, killer, gun, crash, graphic, iPad
Despicable Me: Minion Rush    minion, gameloft, Christmas, rush, run, keep, cute, addict
Doodle Jump                   doodle, jump, monster, tilt, worth, multiplay, score, rock
Fruit Ninja                   fruit, mode, halfbrick, juice, blade, hit, arcade, kid, unlock
Table 3. Examples of Extracted App Features

(1) Evolution of Mobile App Copycats. From the network-based clustering in Step 2, we obtain 4,080 clusters, of which 1,791 contain more than one app. To explore the evolution of the copycats, we take snapshots of the 50 largest clusters over time, shown in Figure 2.
From left to right, the subgraphs represent the structures of the clusters in (1a) December 2009, (1b) December 2011, and (1c) December 2013, respectively. In these subgraphs, a node represents an app, and an arc represents a nonzero similarity score between the two connected apps. Figure 2 indicates that (i) the density of the clusters grew rapidly from 2009 to 2013; (ii) different clusters grow at different paces; (iii) many recently released apps are not original; and (iv) some original apps are more likely to attract copycats than others.

(2) Release of New (Original) Apps. To examine the first observation more closely, we plot the number of apps released per month and the time trend of the percentage of original apps. Our findings are intriguing: although the mobile app market keeps growing, with more new apps released over time, the proportion of original apps is in fact dropping rapidly. Figure 3 shows that the number of action games released per month increased more than sevenfold, from fewer than 50 in 2008 to about 400 in 2013. Nevertheless, Figure 4 shows that the percentage of original apps among newly released apps fell steadily from over 90% in December 2008 to around 45% in late 2013. In other words, of every two apps released in late 2013, one was a clone of an existing app.

Figure 2. Clusters of Mobile App Copycats Over Time (panels: (1a) Dec 2009, (1b) Dec 2011, (1c) Dec 2013)
Figure 3. Count of New App Releases Over Time
Figure 4. Rate of Original App Releases Over Time

(3) Comparison between Types of Apps. After the copycat detection process, we obtain the sets of original apps, deceptive copycats, and non-deceptive copycats. There are significantly more non-deceptive copycats than deceptive ones: in our dataset of over 10,000 apps, 4.84% are deceptive copycats, 36.17% are non-deceptive copycats, and the remaining 58.98% are original.

Moreover, Table 4 compares the means of several cross-sectional variables for the three types of apps; joint tests for equality of the group means are reported in the last column. The table shows that original apps and copycats differ significantly along various dimensions. First, original apps have the highest estimated daily downloads and prices on average. Second, the proportion of paid apps is higher for original apps than for copycats. Third, the user rating and the number of screenshots tend to be higher for copycats than for original apps.

For the original apps, characteristics also vary with the number of copycats they have. Table 5 reports variable means for two groups of original apps: the first group has no copycat apps at all, and the second group has at least one copycat. The download quantity is higher on average for the second group, as are the percentage of paid apps, the number of apps by the same developer, app age, and the number of ratings per month. Interestingly, the average user rating is lower for the second group than for the first.
Variable                      Original    Deceptive   Non-deceptive   F-test / Chi-square test
Est. daily download quantity  39.0546     38.6382     38.8713         0.0022
Paid dummy                    0.4954      0.4867      0.4566          0.0000
# Apps by developer           11.8096     5.4412      6.3134          0.0000
Price                         0.8616      0.7231      0.6357          0.0000
Rating                        3.3847      3.3916      3.4611          0.0000
App age                       30.7677     23.2638     23.3022         0.0000
# Ratings per month           56.0764     13.3219     40.9430         0.3095
# Screenshots                 3.5041      3.5910      3.6844          0.0001
# Characters in description   969.8589    726.7996    869.0194        0.0000
Table 4. Summary Statistics by App Type

Variable                      Original w/o copycats   Original w/ copycats   T-test / Chi-square test
Est. daily download quantity  38.6640                 39.2928                0.0001
Paid dummy                    0.4696                  0.5116                 0.0000
Cluster size                  1                       5.4383                 0.0000
# Apps by developer           3.6522                  11.8096                0.0000
Price                         0.7458                  0.8616                 0.0000
Rating                        3.4465                  3.3847                 0.0000
App age                       25.8847                 30.7677                0.0000
# Ratings per month           20.6849                 56.0764                0.0047
# Screenshots                 3.8685                  3.5041                 0.0000
# Characters in description   751.1429                969.8589               0.0000
Table 5. Comparison within Original Apps

External Evaluation of Clustering

The accuracy of the proposed copycat detection method is of vital importance, as its output is the input to the subsequent economic analysis. We therefore carefully evaluate its accuracy by conducting surveys on Amazon Mechanical Turk. The survey results serve as an external benchmark, a gold standard against which we assess how closely our approach matches the benchmark classification. Our results show that the proposed copycat detection approach correctly identifies whether apps are similar over 91.9% of the time.

Amazon Mechanical Turk is a crowdsourcing web service that coordinates the supply and demand of tasks requiring human intelligence. It is an online labor market where requesters recruit workers to execute well-defined simple tasks such as image tagging, sentiment judgment, and survey completion. In machine learning and related areas, it has been used heavily for evaluating the performance of unsupervised methods, producing judgments that rival the quality of work by highly paid, domain-specific experts (Heer and Bostock 2010).

Specifically, we structure the external evaluation in four steps, which decompose the complicated evaluation task into a series of smaller tasks. First, 1,250 pairs of apps are sampled from all possible combinations of apps. Because similar pairs are sparse, these 1,250 pairs are carefully sampled in a two-step manner, as described below. Second, 250 questionnaires are created and published on Amazon MTurk; each questionnaire asks workers to compare five distinct pairs of apps based on name, icon image, and gameplay, and each questionnaire is answered by three independent MTurk workers. Third, a majority vote determines whether each pair of apps is similar and in which aspects. The results of the majority vote serve as the gold standard for the external evaluation. Finally, the quality of the copycat detection method is measured by the commonly used Rand measure and F-measure (e.g., Larsen and Aone 1999).

As briefly mentioned, the app pairs for the MTurk evaluation are sampled in two steps. This sampling strategy is needed because similar app pairs are sparse, which makes naive random sampling unattractive: very few pairs in a naive random sample would be similar.
In such a sample, accuracy measures would be dominated by the overwhelming number of dissimilar pairs and would be barely sensitive to false positives and false negatives; the measures could look excellent even if the algorithm performed poorly on similar pairs. To address the sparsity of similar pairs, we sample the 1,250 pairs so that the proportion of similar pairs is substantially higher. We do so by temporarily treating the machine learning results as if they were ground truth: for each app we generate one random similar app and one random unrelated app (according to the machine-learned similarity), and we then draw random samples from these two pools to form the 1,250 pairs.

The primary results of the MTurk survey are four counts: true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). In our context, TP refers to the case in which both the human evaluation and the copycat detection method report the pair of apps to be similar; TN refers to the case in which both evaluations determine the pair to be very different; FP refers to the case in which the human evaluation reports the apps to be different in all aspects while the machine learning results show them to be similar; and FN refers to the case in which the human evaluation reports the apps to be similar while the algorithm reports them as different.

The external evaluation supports the accuracy of our copycat detection method. Among the 1,250 tested pairs, there are 501 true positive pairs, 648 true negative pairs, 67 false positive pairs, and 34 false negative pairs. Therefore, the Rand measure of our method, (TP + TN) / (TP + TN + FP + FN), is 0.919; the precision, TP / (TP + FP), is 0.882; the recall, TP / (TP + FN), is 0.936; and the F-measure, the harmonic mean of precision and recall, is 0.908. Regarding the quality of the survey data, we find that (1) 96.9% of the workers answered our pre-test questionnaire correctly, (2) the demographics of the workers are quite diverse and representative, and (3) for 90.16% of the pairs, the three independent workers gave identical answers. These results dovetail well with our machine learning output, suggesting that our copycat detection method captures the true imitation relationships between apps.

Modeling the Consequences of Copycats

In this section, we exploit the panel structure of the data to analyze whether copycat apps cannibalize the sales of original apps and, if so, under which conditions the cannibalization is statistically and economically significant.

Sales cannibalization from copycat apps may be particularly concerning in the mobile app industry, as copycats compete with original apps by providing a similar product with nearly the same functionality, often at a lower price and with a decent user experience. Consumers who would otherwise purchase the original product may be drawn away by the copycat competitors. In this view, the original app, as the sole seller, would have obtained a higher monopoly profit than in the competitive world; therefore, to stimulate innovation, policies that protect original apps may be justified.

However, this intuition might be incomplete, because it is possible that copycat apps have no effect on original sales, or even stimulate them. In the "stimulate sales" view, being imitated can be a strategic advantage for the original firm under certain conditions. One such condition is the presence of a network externality in the technology (Conner 1995). In Conner's framework, imitators bring lower-valuing buyers into the market.
These buyers collectively increase the user base of the technology. Due to the positive network externality, the expanded user base increases the perceived quality of the product, and the innovator's product consequently becomes more attractive to high- and medium-valuing buyers. When the benefit of the added user base surpasses the sales lost to the clones, the innovator earns higher payoffs than it would as a monopolist; in this case, imitation works as a reward for innovation. Moreover, Conner finds that the returns from imitation depend centrally on the magnitude of the network externality and the degree of consumer-perceived quality difference between the innovator's and the imitator's products: when the network externality is large, or when the perceived quality gap is large, the benefit from imitation increases. In other independent studies, researchers have found that copying can increase a firm's profit, lead to better-quality products, and increase social welfare even when there are no network effects and the market is saturated (Jain 2008), because copying can soften price competition by allowing price-sensitive consumers to buy copies (Gu and Mahajan 2005, Jain 2008).

In the mobile app context, especially for action games, the network externality may not be dominant. However, there can still be a significant spillover effect from copycat apps, especially if the copycat is associated with the original app through a similar appearance. A typical mobile app shopper faces two decisions: which apps to consider, and which app to buy. We argue that the enormous number of available apps, nontrivial search costs, and the limited brand recognition of developers, among other factors, make it very difficult for an individual app to stand out at the consideration stage. However, if a mobile user observes many similar apps of one type, she may infer that this type of app is in high demand among early buyers, and the perceived quality of this type of app rises. Consequently, a potential consumer who would have ignored the entire group of apps becomes aware of the copycat apps as well as the original one. In the purchase stage, she can then compare the group of apps and choose the best one. As the group receives more awareness due to imitation, the demand for the original app may go up. In particular, such a spillover effect should be more salient when the copycat app is more easily associated with the original app, for example through a similar look; deceptive copycats are therefore more likely to generate spillovers than non-deceptive copycats. The spillover should also be more pronounced when the quality of the original app is much better than the quality of the copycat, so low-quality copycats are more likely to generate spillovers than high-quality ones.

To test the competition and stimulation points of view, we conduct the following panel data analysis. For each original app, we take a snapshot of its list of copycat apps, its download rank, and its time-varying characteristics at the end of each day. We conduct the analysis at the monthly level, because we have a long panel of five years and the month is arguably the most salient unit of measurement for developers' decisions; the main results are robust to a weekly level of analysis.
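For concreteness, a minimal sketch of how such a monthly panel could be assembled is shown below. It assumes a daily table with the columns named in the code and a hypothetical mapping from each copycat to its original app; the rank-to-download conversion uses a Garg-Telang-style power law with the shape parameter estimated above (-0.9996) and a purely illustrative scale constant.

```python
import numpy as np
import pandas as pd

# Hypothetical daily observations: one row per app per day.
daily = pd.DataFrame({
    "app_id":  [1, 1, 2, 2],
    "date":    pd.to_datetime(["2013-01-05", "2013-01-06", "2013-01-05", "2013-01-06"]),
    "rank":    [12, 20, 300, 280],
    "price":   [0.99, 0.99, 0.0, 0.0],
    "version_update": [0, 1, 0, 0],
})
copycat_to_original = {2: 1}        # app 2 is a copycat of original app 1 (illustrative)

# Back out daily downloads from ranks: ln(downloads) = ln(scale) + shape * ln(rank).
SHAPE, SCALE = -0.9996, 2.0e4       # shape from the paper's calibration; scale is illustrative
daily["downloads"] = SCALE * daily["rank"] ** SHAPE
daily["month"] = daily["date"].dt.to_period("M")

# Aggregate to the app-month level.
monthly = (daily.groupby(["app_id", "month"], as_index=False)
                .agg(downloads=("downloads", "sum"),
                     avg_price=("price", "mean"),
                     version_updates=("version_update", "sum")))
monthly["log_download"] = np.log1p(monthly["downloads"])

# For each original app and month, attach the (log) total downloads of its copycats.
copy_dl = (monthly.assign(original_id=monthly["app_id"].map(copycat_to_original))
                  .dropna(subset=["original_id"])
                  .groupby(["original_id", "month"], as_index=False)["downloads"].sum()
                  .rename(columns={"downloads": "copycat_downloads"}))
copy_dl["original_id"] = copy_dl["original_id"].astype(int)
panel = monthly.merge(copy_dl, left_on=["app_id", "month"],
                      right_on=["original_id", "month"], how="left")
panel["log_copycat"] = np.log1p(panel["copycat_downloads"].fillna(0.0))
print(panel[["app_id", "month", "log_download", "log_copycat"]])
```

An analogous aggregation would populate the remaining controls, such as the Google search index, sibling-app downloads, and the developer's other version-update counts.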
Let y_{it} denote the log-transformed total downloads that original app i receives in month t, t = 1, ..., T, and let x_{it} denote the log-transformed total downloads of app i's copycats in month t. A naive test of sales cannibalization would look for a correlation between the download performance of the original app and that of its copycats. This test translates into a regression in which the dependent variable is y_{it} and the independent variables include x_{it}, the time-varying attributes of the original app D_{it}, time-invariant characteristics λ_i, and the market time trend φ_t:

y_{it} = α x_{it} + D_{it} β_1 + λ_i + φ_t + ε_{it},   t = 2, ..., T,        (1)

where the time-varying characteristics D_{it} include the following variables: Log Original Price, the log-transformed monthly average download price of the original app; Original Version, the monthly count of new version releases; App Age, the age of the app in the current month; App Age², the square of the app age; Log Developer Download, the log-transformed sum of monthly downloads of other apps by the same developer; and Dev Version, the count of version updates of other apps by the same developer. The scalar α and the vector β_1 are parameters to be estimated.

The available data are unlikely to capture every source of heterogeneity across apps. For example, original apps differ in imitability, functionality, popularity, and other characteristics that are likely to be correlated with both the demand for copycat apps and the demand for the original apps. Statistically, omitting such variables would bias the estimate of α. Fortunately, the panel structure of the data allows us to absorb correlations between unobserved app features and market demand: we remove time-invariant unobserved heterogeneity across apps by separating λ_i from the error term with app fixed effects. The identification assumption is that the unobservable heterogeneity of original apps, λ_i, is time invariant. This assumption is plausible in the mobile app setting because the major functional characteristics of an app are unlikely to change over its life cycle. Based on this assumption, Column (1) of Table 6 reports the ordinary least squares (OLS) estimates of Equation (1) with standard errors clustered at the developer level. The coefficient on Log Copycat Apps is positive but statistically insignificant. However, this result can reflect several mechanisms: time-varying demand trends, selection of which apps attract copycats, and the offsetting net effect of competition and sales stimulation. Below we discuss the empirical strategy employed to disentangle these mechanisms.

Identification of the Consequence Model

Correlated purchase decisions may also result from time-varying factors on the consumer side. In particular, mobile app users are influenced by the marketing mix of the products, the word of mouth of their friends, and so on. Consider two otherwise identical original apps: the one that receives more marketing support may still be more desirable. The marketing mix is likely to change over time, and it is correlated with both the demand for copycat apps and the demand for original apps.
To exploit such variation, we decompose the unobserved time trends into two parts: trends that affect all apps in the store, and trends that affect the specific type of copycat and original apps. The time trends that affect all apps (such as the holiday season, handset price changes, etc.) are already captured by the time dummies φ_t in the fixed-effect model. For the time trends that affect a specific type of app, we combine two strategies: using Google Search Trend as a proxy for the unobserved trend, and finding appropriate instrumental variables for copycat sales. In particular, Google Search Trend is an index measuring the web search volume of the original app's title. Although it may not be a perfect measure of time-varying demand shocks, it should be highly correlated with them. To operationalize this idea, we augment the model by explicitly including Google Search Trend, and we expect the coefficient α to be less biased after controlling for it.

To further address time-varying unobserved heterogeneity that drives changes in both copycat downloads and original app downloads, we introduce two different types of instrumental variables for copycat downloads. A valid instrument should be correlated with the sales of copycat apps but uncorrelated with the time-varying unobserved error term (which might be correlated with the original app's downloads); in a panel setting, a valid instrument must also vary over time. Following the literature that uses lagged price as an instrument for current-period price, we use lagged copycat downloads as instruments for current-period copycat downloads; specifically, we use the lags from the three preceding months as the set of instruments. The underlying assumption is that lagged copycat sales are uncorrelated with current-period common shocks. If this assumption is violated, the coefficient will be under- or overestimated, depending on the correlation between the unobserved time trend and the original downloads as well as between the unobserved time trend and the copycat downloads. To address this concern, we propose a second type of instrument: the average file size of the copycat apps in the cluster. The file size of copycat apps is significantly correlated with copycat downloads, which we verify in our data; intuitively, file size is associated with copycat demand because it reflects the richness of the content. Moreover, it is plausible to assume that the file size of copycat apps is uncorrelated with the time-varying unobserved shocks to original downloads (such as marketing mix, word of mouth, etc.). With two different types of instruments, we can also strengthen the analysis by conducting an over-identification test.
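A minimal estimation sketch follows, assuming the linearmodels package for the two-way fixed-effects specification with developer-clustered standard errors. The synthetic data, the subset of controls, and the variable names are illustrative; the IV columns would replace the copycat regressor with its lagged (or file-size) instruments in an analogous two-stage setup.

```python
import numpy as np
import pandas as pd
from linearmodels.panel import PanelOLS  # pip install linearmodels

# Synthetic app-month panel standing in for the real data (illustrative only).
rng = np.random.default_rng(0)
n_apps, n_months = 50, 24
idx = pd.MultiIndex.from_product(
    [range(n_apps), pd.date_range("2012-01-01", periods=n_months, freq="MS")],
    names=["app_id", "month"])
df = pd.DataFrame({
    "log_copycat":  rng.normal(size=len(idx)),
    "log_price":    rng.normal(size=len(idx)),
    "log_search":   rng.normal(size=len(idx)),
    "developer_id": np.repeat(rng.integers(0, 20, n_apps), n_months),
}, index=idx)
df["log_download"] = (0.05 * df["log_copycat"] + 0.10 * df["log_search"]
                      + rng.normal(size=len(idx)))

# Two-way fixed effects (app and month) with developer-clustered standard errors;
# the remaining controls (version counts, app age, sibling downloads) enter analogously.
fe = PanelOLS.from_formula(
    "log_download ~ log_copycat + log_price + log_search"
    " + EntityEffects + TimeEffects",
    data=df,
).fit(cov_type="clustered", clusters=df["developer_id"])
print(fe.summary)

# First step toward the IV columns: instrument current copycat downloads with their lags.
df["log_copycat_l1"] = df.groupby(level="app_id")["log_copycat"].shift(1)
```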
Variable                     (1) Fixed effect      (2) With Google       (3) With lagged       (4) With lagged terms
                                                   search trend          terms as IV           and file size as IV
Log Copycat Apps             0.0029 (0.0135)       0.0027 (0.0135)       -0.0165 (0.0166)      -0.0173 (0.0166)
Log Original Price           0.0139 (0.0110)       0.0135 (0.0110)       0.0141 (0.0106)       0.0141 (0.0106)
Original Version             0.6624*** (0.0408)    0.6609*** (0.0406)    0.5705*** (0.0456)    0.5705*** (0.0456)
Log Developer Download       0.2251*** (0.0193)    0.2248*** (0.0193)    0.2091*** (0.0201)    0.2092*** (0.0201)
Dev Version                  -0.0221*** (0.0064)   -0.0223*** (0.0064)   -0.0196*** (0.0061)   -0.0196*** (0.0061)
App Age                      -0.0277*** (0.0034)   -0.0279*** (0.0034)   -0.0244*** (0.0034)   -0.0244*** (0.0034)
App Age²                     0.0002*** (0.0000)    0.0002*** (0.0000)    0.0002** (0.0000)     0.0002** (0.0000)
Log Search                   ---                   0.1103*** (0.0492)    0.1036** (0.0409)     0.1037** (0.0409)
Individual Fixed Effect      Yes                   Yes                   Yes                   Yes
Time Fixed Effect            Yes                   Yes                   Yes                   Yes
Adjusted/pseudo-R²           0.2209                0.2306                0.1302                0.1301
Weak Instrument F test       ---                   ---                   6239.4                4106.8
Over-identification J test   ---                   ---                   3.159                 7.622
Number of individuals        3,667                 3,667                 3,667                 3,667
Number of observations       109,166               109,166               109,166               109,166
Note: Std. err. in parentheses. * p < 0.1, ** p < 0.05, *** p < 0.01
Table 6. Overall Effect in the Consequence Model

The results in Models (1) to (4) consistently show that the overall effect of copycat app sales on original sales is statistically insignificant. This implies that the competition point of view alone is an incomplete description of the interplay between copycat apps and original apps. The next question is: what is the complete view? There are two possibilities: copycat apps genuinely do not cannibalize sales, or the insignificant result is the net outcome of countervailing effects. We explore these possibilities by splitting Log Copycat Apps into Log Highly-rated Copycat and Log Lowly-rated Copycat as two separate independent variables in the main regression equation. Highly rated copycat apps are imitators with higher aggregate consumer ratings than the original app, whereas lowly rated copycat apps are imitators with lower aggregate ratings than the original. If the "no effect" view holds, the heterogeneity in copycat ratings should not change the regression coefficients; if the insignificant main result instead pools conflicting effects, the coefficients of the two subgroups should differ. As consumer ratings approximate the perceived quality of apps, we expect copycats with lower ratings to be more likely to help the sales of original apps, while copycats with higher ratings are more likely to eat into the sales of original apps. We also hypothesize that similarity in appearance contributes positively to the helpfulness of copycat apps, because similarity in appearance makes it easy to associate the copycat with the original: mobile users arriving at a copycat app are more likely to notice and engage with similar apps (which may include the original) at the exploration stage. Therefore, it is possible that non-deceptive copycats have a dominant substitution effect, while deceptive copycats have a dominant positive spillover effect. To examine this heterogeneity in copycat appearance, we break down copycat sales into the sales of deceptive copycats and the sales of non-deceptive copycats. Lastly, we examine the interaction between deceptiveness and consumer rating.
To operationalize the interaction between deceptiveness and consumer rating, we split the copycat apps into four subgroups: highly rated deceptive copycats, lowly rated deceptive copycats, highly rated non-deceptive copycats, and lowly rated non-deceptive copycats.

Results

Table 7 Column 1 reports the estimation results when the effects of highly rated and lowly rated copycats are examined separately. Log High Ratings Copycat has a significant and negative main effect, which confirms the competition view: copycat apps with higher perceived quality than the original do hurt demand for the original app. This cannibalization result is analogous to the findings of Danaher et al. (2014) on music piracy and sales. The point estimate of -0.0486 implies that a 10% increase in the downloads of relatively high quality copycats results in an average 0.486% decrease in the original app's downloads. More interestingly, Log Low Ratings Copycat has a significant and positive effect, which supports our hypothesis that copycat apps can help the original ones under certain conditions. It appears that low quality copycat apps increase awareness of the group of apps with similar functionality; because the copycat is of relatively poor quality, consumers prefer the original to the copycat. In particular, a 10% increase in the downloads of relatively low quality copycats results in an average 0.953% increase in the original app's downloads. The R² statistic increases to 14.79% after we split copycat apps by consumer ratings, compared with 13.01% in Table 6 column 4. The coefficients for the control variables remain similar to those in Table 6. Intuitively, a version update of the original app is positively associated with higher original downloads, while version updates of sibling apps by the same developer are negatively associated with original downloads. Downloads of sibling apps are positively associated with the original app's downloads, perhaps because cross-promotion occurs between sibling apps. Downloads decrease as the app ages, but the rate of decline slows as the app gets older. Finally, the model shows that the Google Search Trend index has a positive and significant association with download performance.

Table 7 column 2 presents the estimated effects for copycat apps with different levels of deceptiveness. As expected, non-deceptive copycats tend to compete directly with the original apps for demand. The point estimate of -0.0561 for Log Non-deceptive Copycat indicates that a 10% increase in non-deceptive copycats' downloads results in an average 0.56% decrease in the original app's downloads. The effect of deceptive copycat apps, however, does not fully support the spillover hypothesis: the coefficient is statistically indistinguishable from zero. This result once again supports our hypothesis that different types of copycats have different effects on the sales of the original app. In particular, if the copycat app is non-deceptive, it tends to compete with the original app in all dimensions, so the substitution effect dominates; deceptive copycat apps, by contrast, may have mixed effects or no net effect. To further explore the heterogeneity within deceptive and non-deceptive copycats separately, we consider the interactions between copycat quality and copycat type. We thus specify four sub-categories: high quality deceptive copycats, low quality deceptive copycats, high quality non-deceptive copycats, and low quality non-deceptive copycats.
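For reference, the percentage interpretations quoted above follow the standard log-log (elasticity) reading of these coefficients; the shorthand symbols below are ours, using the Table 7 column 1 estimate as a worked example:

\[
\%\Delta D^{orig} \;\approx\; \alpha \times \%\Delta D^{copy},
\qquad
\alpha = -0.0486:\quad (-0.0486)\times 10\% \;\approx\; -0.486\%.
\]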
Consistent with the model in Table 7 column 1, we define a high quality copycat app as one whose consumer ratings are higher than those of the corresponding original app; a low quality copycat app is one whose consumer ratings are lower than those of the corresponding original app. The estimation results are reported in Table 7 Column 3. Interestingly, we find a significant negative effect, a significant positive effect, and insignificant effects for different subgroups of copycat apps. This once again suggests that conflicting forces shape the consequences of different copycats. In particular, high quality non-deceptive copycats have a negative effect on the original app's sales, mainly due to the substitution effect. This result is consistent with the hypothesis that high quality apps are more competitive and that non-deceptive copycat apps compete more directly. However, the negative effect of high quality deceptive copycats is not statistically different from zero. Again, this is consistent with the hypothesis that similarity in appearance can generate spillover demand for the entire group of similar apps: when the substitution effect is offset by the spillover effect, we would expect the main effect of high quality deceptive copycats to be close to zero. In contrast, for low quality copycat apps, deceptive copycats result in an increase in the original app's sales. The point estimate suggests that a 10% increase in low quality deceptive copycat downloads results in an average 0.948% increase in the original app's downloads. For low quality non-deceptive copycat apps, however, the spillover effect is statistically insignificant. This result is also consistent with our earlier hypothesis that original apps are less likely to benefit from copycat apps with a distinct appearance (e.g., title and icon). The above analysis highlights the importance of quality differences and deceptiveness in determining the impact of copycat sales. How can original apps respond to copycat apps? If the developers of original apps want to maximize the benefit from copycats and avoid competition, the original apps should lead the market not only by providing the first app of a given type, but also by providing high quality products. This may require continuous product improvement, especially when many high quality copycats enter the market. On the other hand, original apps may not need to worry much about deceptive copycats, as appearance similarity tends to benefit the original apps rather than hurt them.
                                        (1)               (2)               (3)
                                        Rating            Deceptiveness     Rating and
                                        differences                         deceptiveness
Log High Ratings Copycat                -0.0486***
                                        (0.0156)
Log Low Ratings Copycat                 0.0953***
                                        (0.0337)
Log Deceptive Copycat                                     0.0564
                                                          (0.0480)
Log Non-deceptive Copycat                                 -0.0561***
                                                          (0.0167)
Log High Ratings Deceptive Copycat                                          -0.0641
                                                                            (0.0815)
Log Low Ratings Deceptive Copycat                                           0.0948***
                                                                            (0.0345)
Log High Ratings Non-deceptive Copycat                                      -0.0892***
                                                                            (0.0156)
Log Low Ratings Non-deceptive Copycat                                       0.0125
                                                                            (0.0607)
Log Original Price                      0.0160            0.0160            0.0161
                                        (0.0111)          (0.0113)          (0.0111)
Original Version                        0.6518***         0.6696***         0.6516***
                                        (0.0407)          (0.0435)          (0.0407)
Log Developer Download                  0.2140***         0.2353***         0.2140***
                                        (0.0192)          (0.0207)          (0.0191)
Dev Version                             -0.0233***        -0.0211***        -0.0233***
                                        (0.0062)          (0.0065)          (0.0062)
App Age                                 -0.0333***        -0.0301***        -0.0334***
                                        (0.0031)          (0.0034)          (0.0031)
App Age²                                0.0003***         0.0002***         0.0003**
                                        (0.0000)          (0.0000)          (0.0000)
Log Search                              0.1060**          0.0982**          0.1060**
                                        (0.0436)          (0.0471)          (0.0436)
Individual Fixed Effect                 Yes               Yes               Yes
Time Fixed Effect                       Yes               Yes               Yes
Adjusted/pseudo-R²                      0.1479            0.1217            0.1480
Weak Instrument F test                  5632.4            6532.1            2651.0
Over-identification J test              7.023             4.586             11.159
Number of individuals                   3,667             3,667             3,667
N                                       109,166           109,166           109,166
Note: Std. Err. in parentheses. * p < 0.1, ** p < 0.05, *** p < 0.01.

Table 7. Results of Consequence Model

Robustness Checks

One restriction of the panel analysis is that the panel cannot include original apps that have never been copied by any imitator, because the independent variable would then have no variation. Ideally, we would like to randomly assign whether an original app is followed by no copycat or by at least one copycat app, and then compare the demand of original apps under the copycat treatment. However, such an experiment is too expensive and almost infeasible for researchers. A compromise strategy is to conduct propensity score matching in a separate cross-sectional analysis. The goal of the matching process is to generate a matched sample that mimics a randomized experiment in which whether an original app is followed by copycat apps is effectively random. We then compare the performance of original apps in the treatment group and the control group. Matching original apps in this way should substantially reduce any remaining selection bias. To conduct propensity score matching, we need to identify a set of observable covariates X that simultaneously influence treatment status and app performance. The goal is to balance the distribution of covariates so that the difference between the two groups is attributed to the treatment only. The covariates must (1) affect both treatment and outcome, and (2) not be affected by the treatment or by anticipation of it. In our context, the qualifying covariates include most of the observable app features: download price, length of description, number of supported devices, number of screenshots, file size, whether the app is connected to Game Center, the level of content advisory, and the number of clusters the developer belongs to. We use a standard logit regression and radius matching to match original apps, where the treatment is defined as whether the original app is followed by at least one copycat imitator in the first two years since release. By doing so, we create an artificial treatment group and an artificial control group with balanced propensity scores (Figure 5).
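A minimal sketch of this matching exercise, assuming a cross-sectional table of original apps with the covariates listed above, might look as follows; all file and column names are hypothetical placeholders, and the caliper value is purely illustrative.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Hypothetical cross-section: one row per original app; `treated` is 1 if the app
# was followed by at least one copycat within two years of release, else 0.
apps = pd.read_csv("original_apps.csv")
covariates = ["price", "description_length", "n_devices", "n_screenshots", "file_size",
              "game_center", "content_rating", "n_developer_clusters"]

# Step 1: propensity score from a logit of the treatment indicator on the covariates.
X = sm.add_constant(apps[covariates])
apps["pscore"] = sm.Logit(apps["treated"], X).fit(disp=0).predict(X)

# Step 2: radius matching -- each treated app is compared with every untreated app
# whose propensity score falls within a caliper of its own score.
caliper = 0.01                       # illustrative radius
treated = apps[apps["treated"] == 1]
control = apps[apps["treated"] == 0]

diffs = []
for _, row in treated.iterrows():
    matches = control[(control["pscore"] - row["pscore"]).abs() <= caliper]
    if len(matches):
        diffs.append(row["monthly_downloads"] - matches["monthly_downloads"].mean())

att = float(np.mean(diffs))          # average treatment effect on the treated
print(f"Estimated effect of having at least one copycat: {att:.1f} downloads per month")
```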
The treatment effect is then estimated as the mean difference between the treatment group and the control group. We expect the propensity score matching analysis to provide insights similar to the panel data results.

Figure 5. Propensity Score Matching for the Existence of Copycat Apps (propensity score distributions, treated vs. untreated)

The propensity score matching analysis shows that the existence of a copycat app reduces an original app's downloads by 367 per month on average. When highly rated copycat apps exist, the average treatment effect is a decrease of 605 downloads per month; for lowly rated copycat apps, the average treatment effect is an increase of 571 downloads per month. Comparing the propensity score matching results with those of the main model, we find consistent evidence that the heterogeneity of copycat apps matters for the direction of the impact.

Finally, in a series of robustness checks, we verify whether the finding of heterogeneous copycat impact is robust to a set of alternative specifications. First, we alter the shape and scale parameters in the rank-to-download projection; the results, reported in Table 8 Column 1, are qualitatively the same as those in Table 7 Column 3. Second, we exclude reviews from the copycat detection process and redo both the detection and the econometric analysis. Interestingly, the level of clustering granularity changes slightly, while the direction of the effects remains similar (Table 8 Column 2). Third, we aggregate the data biweekly instead of monthly; the results are reported in Table 8 Column 3.

                                        (1)                   (2)                   (3)
                                        Different download    No reviews in         Biweekly panel
                                        projection            copycat detection
Log High Ratings Deceptive Copycat      -0.0233               -0.0957***            -0.0226
                                        (0.0230)              (0.0246)              (0.0228)
Log Low Ratings Deceptive Copycat       -0.0296               0.0444*               -0.0273
                                        (0.0249)              (0.0213)              (0.0291)
Log High Ratings Non-deceptive Copycat  -0.0590***            -0.0391*              -0.0514***
                                        (0.0067)              (0.0162)              (0.0065)
Log Low Ratings Non-deceptive Copycat   0.0239***             0.0351                0.0448***
                                        (0.0077)              (0.0303)              (0.0077)
Log Original Price                      0.0523***             0.0191                0.0344
                                        (0.0125)              (0.0118)              (0.0209)
Original Version                        0.9320***             0.1800***             0.6598***
                                        (0.0445)              (0.0425)              (0.0430)
Log Developer Download                  0.2396***             0.2050***             0.2294***
                                        (0.0199)              (0.0206)              (0.0204)
Dev Version                             -0.0243***            -0.0126***            -0.0207***
                                        (0.0064)              (0.0039)              (0.0062)
App Age                                 -0.0492***            -0.0043***            -0.0334***
                                        (0.0032)              (0.0012)              (0.0025)
App Age²                                0.0005***             0.0000                0.0003**
                                        (0.0000)              (0.0000)              (0.0000)
Log Search                              0.0371                0.0199***             0.0425**
                                        (0.0257)              (0.0031)              (0.0156)
Individual Fixed Effect                 Yes                   Yes                   Yes
Time Fixed Effect                       Yes                   Yes                   Yes
Adjusted/pseudo-R²                      0.1229                0.2284                0.1340
Weak Instrument F test                  11051.8               144.20                11793.5
Over-identification J test              10.8                  8.28                  5.6
Number of individuals                   3,667                 3,667                 3,667
N                                       109,166               109,166               109,166
Note: Std. Err. in parentheses. * p < 0.1, ** p < 0.05, *** p < 0.01.

Table 8. Robustness Checks

Conclusion

Although mobile apps now penetrate many aspects of our lives, research on this field remains scarce. The copycat issue has been emerging for a while, yet little is known about its causes and effects. How are copycat apps defined and identified? What economic impact do copycats have on the original apps? From the original app's point of view, understanding the causes and effects of copycats can help identify better combating or accommodating strategies.
From the platform's point of view, the sustainable and healthy development of the app store depends on an appropriate policy on whether to regulate copycat followers. This motivates the need for a deeper understanding of the copycat phenomenon in the setting of mobile apps. One interesting observation is that the degree of imitation varies across different types of copycats. While some copycat apps retain distinctive appearances, others have names and looks very similar to the original apps. Therefore, non-deceptive copycats are likely to be differentiated versions of the original product, whereas deceptive copycats are likely to deceive consumers and free-ride on the popularity of the original app. For a given original app, some copycat apps mimic the functionality very well or even improve the quality and user experience, whereas other copycat apps are shady imitators. In this paper, we propose an automatic machine learning approach to identify copycats and original apps using unstructured textual and graphical information. We verify the accuracy of the proposed method using Amazon Mechanical Turk as the external gold-standard benchmark. We find strong evidence that the proportion of copycat apps among new app releases has increased dramatically in the action game genre over the past five years. When we further divide the copycat apps into deceptive and non-deceptive, we find that deceptive and non-deceptive copycat apps differ systematically in multiple dimensions. For example, non-deceptive apps have, on average, higher downloads, higher consumer ratings, more consumer ratings, a higher proportion of free apps, lower prices, longer textual descriptions, and more screenshots than deceptive ones. We then conduct an economic analysis of the consequences of copycat apps by analyzing potential sales cannibalization. In particular, we examine two countervailing effects that copycat apps might have. On the one hand, copycat apps compete with the original apps by providing similar functionality, perhaps at a lower price. On the other hand, copycat apps can increase awareness of the group of similar apps, so that the original apps are considered more often. Interestingly, we find that both the relative quality and the appearance similarity determine which effect dominates. Highly rated copycat apps tend to compete with the original ones and switch consumers to their own products, while lowly rated copycat apps tend to stimulate the sales of the original apps. Similarly, comparing deceptive and non-deceptive copycat apps, we find that non-deceptive ones have a dominant and significant cannibalization effect on original sales, which comes mainly from high quality non-deceptive copycats rather than low quality ones. For deceptive copycat apps, the effect runs in both directions: high quality deceptive copycats tend to substitute for downloads of the original apps, but for low quality deceptive copycats we find a positive association between copycat downloads and original downloads, which once again supports the possibility of a positive spillover effect from copycat sales. Our paper also has some limitations, which could act as fruitful areas for future research. First, although we have considered the causes and effects of copycat apps, we have not explored the optimal combating strategy or the overall welfare impact of copycat apps.
These questions are also important to the original developers and the platform owners. Second, we have not answered whether imitation hinders or encourages subsequent innovation in this market; it would be valuable to explore the dynamic relationship between imitation and innovation. Third, ideally we would use actual download quantities in the analysis, but because actual download data are unavailable, we calibrate download quantities from download ranks. This requires us to impose assumptions, such as a constant overall market size for the action game market over time, which may not hold in reality. Despite these limitations, our paper is the first study to combine machine learning with econometric analysis to examine the behavior of copycats and original apps in the context of mobile apps, helping both practitioners and researchers better understand this rapidly growing industry. We hope our work paves the way for future research in this important area.

References

Aggarwal, C., Zhai, C., 2012. A survey of text clustering algorithms. Mining Text Data. 77-128.
Appannie.com, 2014. App Annie index – market Q1 2014: revenue soars in the United States and China. http://blog.appannie.com/app-annie-index-market-q1-2014/
Appfreak, 2014. The ultimate guide to App Store optimization in 2014. http://www.appfreak.net/ultimate-guide-app-store-optimization-2014/
Apple Press Info, 2013. App Store sales top $10 billion in 2013. http://www.apple.com/pr/library/2014/01/07App-Store-Sales-Top-10-Billion-in-2013.html?sr=hotnews.rss
Apptamin, 2013. App Store Optimization (ASO): improve your app description. http://www.apptamin.com/blog/app-store-optimization-app-description/
Angrist, J., Krueger, A., 2001. Instrumental variables and the search for identification: from supply and demand to natural experiments. Journal of Economic Perspectives. 15(4). 69-85.
Archak, N., Ghose, A., Ipeirotis, P., 2011. Deriving the pricing power of product features by mining consumer reviews. Management Science. 57(8). 1485-1509.
Bessen, J., Maskin, E., 2009. Sequential innovation, patents, and imitation. RAND Journal of Economics. 40(4). 611-635.
Biais, B., Perotti, E., 2008. Entrepreneurs and new ideas. RAND Journal of Economics. 39(4). 1105-1125.
Bloomberg Business, 2014. Apple users spent $10 billion on apps in 2013. http://www.businessweek.com/articles/2014-01-07/apple-users-spent-10-billion-on-apps-in2013
Bluecloud, 2013. App Store Optimization 101: the ultimate checklist. http://www.bluecloudsolutions.com/articles/app-store-optimization-101-ultimate-checklist/
Brohee, S., Helden, J., 2006. Evaluation of clustering algorithms for protein-protein interaction networks. BMC Bioinformatics. 7.
Burtch, G., Ghose, A., Wattal, S., 2013. An empirical examination of the antecedents and consequences of contribution patterns in crowd-funded markets. Information Systems Research. 24(3). 499-519.
Carare, O., 2012. The impact of bestseller rank on demand: evidence from the app market. International Economic Review. 53(3). 717-742.
Castro, J., Balkin, D., Shepherd, D., 2008. Can entrepreneurial firms benefit from product piracy? Journal of Business Venturing. 23(1). 75-90.
Cho, S., Fang, X., Tayur, S., 2013. Combating strategic counterfeiters in licit and illicit supply chains. Working paper.
Conner, K., 1995. Obtaining strategic advantage from being imitated: when can encouraging "clones" pay? Management Science. 41(2). 209-225.
Danaher, B., Smith, M., Telang, R., Chen, S., 2014. The effect of graduated response anti-piracy laws on music sales: evidence from an event study in France. Journal of Industrial Economics. Forthcoming.
Distimo, 2013. What is needed for top positions in the app stores? http://www.distimo.com/publications
Dongen, S., 2000. Graph clustering by flow simulation. PhD thesis. University of Utrecht.
Eisen, M. B., Spellman, P. T., Brown, P. O., Botstein, D., 1998. Cluster analysis and display of genome-wide expression patterns. Proceedings of the National Academy of Sciences. 95. 14863–14868.
Elmagarmid, A., Ipeirotis, P., Verykios, V., 2007. Duplicate record detection: a survey. IEEE Transactions on Knowledge and Data Engineering. 19(1).
Flurry, 2014. Apps solidify leadership six years into the mobile revolution. http://www.flurry.com/bid/109749/Apps-Solidify-Leadership-Six-Years-into-the-MobileRevolution#.VCuYffkapjc
Garg, R., Telang, R., 2013. Estimating app demand from publicly available data. MIS Quarterly. 37(4). 1253-1264.
Gartner, 2013. Gartner says mobile app stores will see annual downloads reach 102 billion in 2013. http://www.gartner.com/newsroom/id/2592315
Ghose, A., Ipeirotis, P., Li, B., 2012. Designing ranking systems for hotels on travel search engines by mining user-generated and crowdsourced content. Marketing Science. 31(3). 493-520.
Ghose, A., Han, S., 2014. Estimating demand for mobile applications in the new economy. Management Science. Forthcoming.
Gosline, R., 2010. Counterfeit labels: good for luxury brands? Forbes. 2/12/2010.
Grossman, G., Shapiro, C., 1988a. Counterfeit-product trade. American Economic Review. 78(1). 59-75.
Grossman, G., Shapiro, C., 1988b. Foreign counterfeiting of status goods. The Quarterly Journal of Economics. 103(1). 79-100.
Gu, B., Mahajan, V., 2005. How much anti-piracy effort is too much? A study of the global software industry. Working paper.
Hausman, J. A., 1996. Valuation of new goods under perfect and imperfect competition. In T. F. Bresnahan and R. Gordon, eds., The Economics of New Goods, Studies in Income and Wealth. 58. NBER.
Heer, J., Bostock, M., 2010. Crowdsourcing graphical perception: using Mechanical Turk to assess visualization design. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. 203-212.
Hu, M., Liu, B., 2004. Mining and summarizing customer reviews. Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 168-177.
Huffington Post, 2013. The evolution of an underground copycat app environment. http://www.huffingtonpost.com/himanshu-sareen/post_5236_b_3647228.html
Jain, S., 2008. Digital piracy: a competitive analysis. Marketing Science. 27(4). 610-626.
Kim, J., Lee, H., 2012. Efficient exact similarity searches using multiple token orderings. IEEE.
Landauer, T., Laham, D., Foltz, P., 1998. Learning human-like knowledge by singular value decomposition: a progress report. Advances in Neural Information Processing Systems.
Larsen, B., Aone, C., 1999. Fast and effective text mining using linear-time document clustering. Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 16-22.
Liu, C., Au, Y., Choi, H., 2012. An empirical study of the freemium strategy for mobile apps: evidence from the Google Play market. ICIS.
Lowe, D., 1999. Object recognition from local scale-invariant features. Proceedings of the Seventh IEEE International Conference on Computer Vision. Vol. 2. 1150-1157.
MacQueen, J., 1967. Some methods for classification and analysis of multivariate observations. Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability. University of California Press. 281-297.
Mikolajczyk, K., Schmid, C., 2005. A performance evaluation of local descriptors. IEEE Transactions on Pattern Analysis and Machine Intelligence. 27. 1615-1630.
Nelson, R., Winter, S., 2009. An Evolutionary Theory of Economic Change. Harvard University Press.
Nguyen, 2013. Evolving competitive dynamics of the global mobile telecommunication industry in 2012 and beyond. Stanford business case.
NYTimes, 2014. Flappy Bird copycats keep on flapping. http://bits.blogs.nytimes.com/2014/02/24/flappy-bird-copycats-keep-on-flapping/?_php=true&_type=blogs&_r=0
Oberholzer-Gee, F., Strumpf, K., 2007. The effect of file sharing on record sales: an empirical analysis. Journal of Political Economy. 115(1). 1-42.
PandoDaily, 2014. Why copycats are the best thing to happen to your company. http://pando.com/2014/02/19/why-copycats-are-the-best-thing-to-happen-to-your-company
Qian, Y., 2008. Impacts of entry by counterfeiters. The Quarterly Journal of Economics. 123(4). 1577-1609.
Qian, Y., 2011. Counterfeiters: foes or friends? How do counterfeits affect different product quality tiers? Working paper.
Satuluri, V., Parthasarathy, S., Ucar, D., 2010. Markov clustering of protein interaction networks with improved balance and scalability. ACM-BCB. 247-256.
Siegfried, J., Evans, L., 1994. Empirical studies of entry and exit: a survey of the evidence. Review of Industrial Organization. 9(2). 121-155.
Smith, M., Telang, R., 2009. Competing with free: the impact of movie broadcasts on DVD sales and internet piracy. MIS Quarterly. 33(2). 321-338.
TheGuardian, 2012. Should Apple take more action against march of the iOS clones? http://www.theguardian.com/technology/appsblog/2012/feb/03/apps-apple?newsfeed=true
Technobuffalo, 2014. This is the biggest problem with mobile gaming today. http://www.technobuffalo.com/2014/04/23/biggest-problem-with-mobile-gaming/
Villas-Boas, J., Winer, R., 1999. Endogeneity in brand choice models. Management Science. 45(10). 1324-1338.