Version Control Systems

What is a "version control system"?
Version control systems are a category of software tools that record changes made to files by keeping track of every modification made to the code.

Why is a version control system so important?
A software product is usually developed collaboratively by a group of developers who may be located in different places, each contributing some specific functionality or feature. To contribute to the product, they modify the source code, either by adding to it or by removing from it. A version control system helps the development team communicate efficiently and manage (track) all the changes made to the source code, along with information such as who made each change and what was changed.

Typically, each contributor's changes are made on a separate branch, and they are not merged into the original source code until they have been reviewed; as soon as the changes get the green signal, they are merged into the main source code. This not only keeps the source code organized but also improves productivity by making the development process smooth.

In short, a version control system keeps track of the changes made to a particular piece of software and takes a snapshot of every modification. Suppose a team of developers adds some new functionality to an application and the updated version does not work properly. Because the version control system has kept track of the work, the team can discard the new changes and continue from the previous version.
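The snapshot-and-rollback idea described above can be illustrated with a toy, in-memory history. This is a minimal sketch under simplifying assumptions: the names `ToyRepository` and `Snapshot` are hypothetical, every commit stores a full copy of the files, and rolling back is modeled as committing an old snapshot again so that no history is lost. Real systems add content addressing, compression, branching, and merging on top of this basic idea.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List


@dataclass(frozen=True)
class Snapshot:
    """One recorded version of the project: who, why, when, and the full file contents."""
    author: str
    message: str
    timestamp: datetime
    files: Dict[str, str]


@dataclass
class ToyRepository:
    """Hypothetical in-memory history; each commit appends a complete snapshot."""
    history: List[Snapshot] = field(default_factory=list)

    def commit(self, files: Dict[str, str], author: str, message: str) -> int:
        # Copy the dict so later edits by the caller cannot alter recorded history.
        snapshot = Snapshot(author, message, datetime.now(timezone.utc), dict(files))
        self.history.append(snapshot)
        return len(self.history) - 1  # version number of the new snapshot

    def checkout(self, version: int) -> Dict[str, str]:
        # Return a copy of the files exactly as they were at `version`.
        return dict(self.history[version].files)

    def rollback(self, version: int, author: str) -> int:
        # Undo later changes by re-committing an old snapshot; nothing is deleted.
        return self.commit(self.checkout(version), author, f"Roll back to version {version}")

    def log(self) -> List[str]:
        # Who changed what, and why, in commit order.
        return [f"{i}: {s.author}: {s.message}" for i, s in enumerate(self.history)]


if __name__ == "__main__":
    repo = ToyRepository()
    good = repo.commit({"app.py": "print('v1')"}, "alice", "Initial version")
    repo.commit({"app.py": "print('v2 with a bug')"}, "bob", "Add new feature")
    repo.rollback(good, "alice")
    print("\n".join(repo.log()))
    print(repo.checkout(len(repo.history) - 1)["app.py"])
```

Because the rollback is recorded as a new commit, the faulty version stays in the log, so the record of who changed what remains intact.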
Benefits of the version control system:
Enhances project development speed by enabling efficient collaboration.
Boosts productivity, speeds up product delivery, and develops employees' skills through better communication and assistance.
Reduces the chance of errors and conflicts during development, because every small change can be traced.
Lets employees and contributors work on the project from anywhere, regardless of their geographical location.
Maintains a separate working copy for each contributor, which is not merged into the main file until it has been validated.
Helps in recovery in case of a disaster or other contingency.
Tells us who made each change, what was changed, when, and why.
The most popular examples are Git, Helix Core, and Microsoft TFS.

Use of Version Control System:
A repository: It can be thought of as a database of changes. It contains all the edits and historical versions (snapshots) of the project.
Copy of work (sometimes called a checkout): It is your personal copy of all the files in a project. You can edit this copy without affecting the work of others, and when you are done making your changes, you commit them to the repository.
Working in a group: Imagine you work at a company and are asked to work on a live project. You can't change the main code directly because it is in production, and any change may inconvenience users. You are also working in a team, so you need to collaborate with your teammates and adopt their changes. Version control helps you merge different requests into the main repository without introducing undesirable changes. You can test new functionality without putting it live, and you don't need to download and set up the project each time: just pull the latest changes, make your own changes, test them, and merge them back.
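Traceability of every small change and the idea of a personal working copy come together when you compare your working copy with the last committed version before committing. The sketch below uses Python's standard `difflib.unified_diff`; the function name `working_copy_diff` and the `a/`/`b/` header prefixes are illustrative choices for this example, not the exact behavior of any particular version control tool.

```python
import difflib
from typing import List


def working_copy_diff(committed: str, working: str, path: str) -> List[str]:
    """Return a unified diff of the working copy against the last committed version.

    `path` only labels the diff headers; the a/ and b/ prefixes are a cosmetic
    convention chosen for this sketch.
    """
    return list(
        difflib.unified_diff(
            committed.splitlines(),
            working.splitlines(),
            fromfile=f"a/{path}",
            tofile=f"b/{path}",
            lineterm="",  # splitlines() already removed the newline characters
        )
    )


if __name__ == "__main__":
    committed = "def greet():\n    print('hi')\n"
    working = "def greet(name):\n    print('hi', name)\n"
    for line in working_copy_diff(committed, working, "greet.py"):
        print(line)
```

Lines prefixed with `-` exist only in the committed version and lines prefixed with `+` only in the working copy; an empty result means there is nothing new to commit.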
Types of Version Control Systems:
Local Version Control Systems
Centralized Version Control Systems
Distributed Version Control Systems

Local Version Control Systems: This is one of the simplest forms; it has a database that keeps all the changes to files under revision control. RCS is one of the most common tools of this kind. It keeps patch sets (the differences between versions of a file) in a special format on disk. By applying all the patches in sequence, it can re-create what any file looked like at any point in time.

Centralized Version Control Systems: A centralized version control system has just one global repository, and every user needs to commit to it for their changes to be reflected in the repository. Others can see your changes by updating. Two things are required to make your changes visible to others:
You commit.
They update.

The benefit of a CVCS (Centralized Version Control System) is that it enables collaboration among developers and gives some insight into what everyone else is doing on the project. It also gives administrators fine-grained control over who can do what.

It has some downsides as well, which led to the development of DVCS. The most obvious is the single point of failure that the central repository represents: if it goes down, collaboration and saving versioned changes are impossible for that period. And what if the hard disk of the central database becomes corrupted and proper backups haven't been kept? You lose absolutely everything.

Distributed Version Control Systems: A distributed version control system has multiple repositories. Each user has their own repository and working copy. Committing your changes alone does not give others access to them, because a commit records the changes only in your local repository; you need to push them to make them visible in the central repository.
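The difference between committing and pushing, together with the pull and update steps listed next, can be modeled with a toy example. `CentralRepo` and `LocalRepo` are hypothetical names, commits are plain strings, and the sketch only allows fast-forward pushes and pulls instead of real merging. The separate pull and update steps mirror Mercurial's `hg pull` and `hg update`; Git's `git pull` combines fetching with a merge or rebase.

```python
from typing import List


class CentralRepo:
    """The shared repository that developers push to and pull from (hypothetical)."""

    def __init__(self) -> None:
        self.history: List[str] = []


class LocalRepo:
    """One developer's full repository plus working copy (toy model, commits are strings)."""

    def __init__(self, central: CentralRepo) -> None:
        self.central = central
        self.history: List[str] = list(central.history)  # clone copies the whole history
        self.working_copy: List[str] = list(self.history)  # what the developer's files reflect

    def commit(self, change: str) -> None:
        # 1. You commit: recorded in YOUR repository only; the central repo is untouched.
        self.history.append(change)
        self.working_copy.append(change)

    def push(self) -> None:
        # 2. You push: publish local commits (fast-forward only in this sketch).
        if self.history[: len(self.central.history)] != self.central.history:
            raise RuntimeError("central repository has commits you lack: pull first")
        self.central.history = list(self.history)

    def pull(self) -> None:
        # 3. They pull: copy new central commits into their repository, not their files.
        if self.central.history[: len(self.history)] != self.history:
            raise RuntimeError("histories diverged: a real DVCS would need a merge")
        self.history = list(self.central.history)

    def update(self) -> None:
        # 4. They update: refresh the working copy from their own repository.
        self.working_copy = list(self.history)


if __name__ == "__main__":
    central = CentralRepo()
    alice, bob = LocalRepo(central), LocalRepo(central)
    alice.commit("add login page")
    print(central.history)   # []: a commit alone is not visible to others
    alice.push()
    bob.pull()
    print(bob.working_copy)  # []: pulled, but not yet updated
    bob.update()
    print(bob.working_copy)  # ['add login page']
```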
When you update, you do not get others' changes unless you have first pulled those changes into your repository. To make your changes visible to others, four things are required:
You commit.
You push.
They pull.
They update.

The most popular distributed version control systems are Git and Mercurial. They help overcome the single-point-of-failure problem, because every user's repository holds a full copy of the project history.

Purpose of Version Control:
Multiple people can work simultaneously on a single project. Everyone edits their own copy of the files and decides when to share their changes with the rest of the team.
It also lets one person work on a project from multiple computers, so it is valuable even if you are working alone.
It integrates the work done simultaneously by different team members. In rare cases, when two people make conflicting edits to the same line of a file, the version control system asks a human to decide what should be done.
Version control provides access to the historical versions of a project. This is insurance against computer crashes or data loss. If a mistake is made, you can easily roll back to a previous version. It is also possible to undo specific edits without losing the work done in the meantime.
It shows when, why, and by whom any part of a file was edited.

Centralized vs Distributed Version Control: Which One Should We Choose?
Last Updated: 13 Sep, 2021

Many of us know version control as the way to work on a single project with multiple developers and collaborate with them. There is no doubt that version control makes developers' work easier and faster.
In most organizations, developers use either a Centralized Version Control System (CVCS) such as Subversion (SVN) or Concurrent Versions System (CVS), or a Distributed Version Control System (DVCS) such as Git (written mostly in C), Mercurial (written mostly in Python), or Bazaar (written in Python). So which one is best, and which one should we choose? We will compare their workflow, learning curve, security, popularity, and other aspects.

First, we need to dispel a myth that many beginners have about DVCS: "There is no central version of the code, or no master branch." That's not true. In a DVCS there is also a master branch or central version of the code, but it works differently from centralized source control. Let's go through an overview of both kinds of version control system.

Centralized Version Control System
In centralized source control, there is a server and a client. The server holds the master repository, which contains all versions of the code. To work on a project, the user (client) first needs to get the code from the master repository on the server: the client communicates with the server and pulls the current version of the code onto their local machine. In other words, you take an update from the master repository and get a local copy of the code on your system. Once you have the latest version, you make your own changes to the code and then commit those changes directly to the master repository. Committing a change simply means merging your own code into the master repository, creating a new version of the source code. Everything is centralized in this model: there is just one repository, and it contains the entire history of the code and all of its branches.
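The update, edit, and commit cycle just described, and the conflicting-edit case mentioned in the Purpose section earlier, can be sketched with a single server object. The names `CentralServer` and `merge_lines` are hypothetical, a file is modeled as a list of lines, and two simplifications are assumed: the server rejects any commit based on an outdated revision (real systems such as Subversion check more selectively, per changed item), and the merge handles only in-place line edits, not inserted or deleted lines.

```python
from typing import List, Tuple


class CentralServer:
    """Hypothetical central repository holding every revision of one file."""

    def __init__(self, lines: List[str]) -> None:
        self.revisions: List[List[str]] = [list(lines)]

    @property
    def head(self) -> int:
        return len(self.revisions) - 1

    def update(self) -> Tuple[int, List[str]]:
        # A client receives only the latest revision, not the whole history.
        return self.head, list(self.revisions[self.head])

    def commit(self, base: int, lines: List[str]) -> int:
        # Simplified rule: a commit must be based on the latest revision.
        if base != self.head:
            raise RuntimeError(f"out of date: based on r{base}, server is at r{self.head}")
        self.revisions.append(list(lines))
        return self.head


def merge_lines(base: List[str], mine: List[str], theirs: List[str]) -> Tuple[List[str], List[int]]:
    """Three-way merge of same-length versions; returns merged lines and conflicting line indexes."""
    merged: List[str] = []
    conflicts: List[int] = []
    for i, (b, m, t) in enumerate(zip(base, mine, theirs)):
        if m == t or t == b:
            merged.append(m)  # identical edits, or only I changed this line
        elif m == b:
            merged.append(t)  # only the other developer changed this line
        else:
            merged.append(m)  # both changed it differently: keep mine, ask a human
            conflicts.append(i)
    return merged, conflicts


if __name__ == "__main__":
    server = CentralServer(["title", "body", "footer"])
    alice_rev, alice = server.update()
    bob_rev, bob = server.update()
    alice[0] = "New title"
    server.commit(alice_rev, alice)    # accepted as r1
    bob[2] = "New footer"
    try:
        server.commit(bob_rev, bob)    # rejected: Bob is still on r0
    except RuntimeError as err:
        print(err)
    latest_rev, latest = server.update()
    merged, conflicts = merge_lines(server.revisions[bob_rev], bob, latest)
    print(merged, conflicts)           # ['New title', 'body', 'New footer'] []
    server.commit(latest_rev, merged)  # accepted as r2
```

Only a line changed differently on both sides ends up in `conflicts`; that is the point at which a person has to decide which version wins.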
In short, the basic centralized workflow is: get the latest version of the code from the central repository (which also contains other people's code), make your own changes, and then commit (merge) those changes into the central repository.

Distributed Version Control System
In distributed version control, most of the mechanism is the same as in the centralized model. The major difference is that instead of a single repository on a server, every developer (client) has their own repository, holding a copy of the entire history of the code and all of its branches on their local machine. Every user can therefore work locally and while disconnected, which is more convenient than centralized source control; that's why it is called distributed. You don't need to rely on the central server, because you can clone the entire history of the code to your hard drive.

So when you start working on a project, you clone the code from the master repository to your own hard drive, make your changes, and commit them to your local repository. At this point your local repository has its own 'change sets', but it is still disconnected from the master repository (which will hold different sets of changes from each individual developer's repository). To communicate with the master repository, you push your local change sets to it. Getting new changes from a repository is called "pulling", and sending your local repository's set of changes to the master repository is called "pushing". Changes are not communicated or merged directly into the master repository as soon as you make them.
In other words, you first commit all the changes to your own repository, and then that set of changes is merged into the master repository. Below is a diagram to help understand the difference between the two:

Basic Difference with Pros and Cons
Centralized version control is easier to learn than distributed. As a beginner, you have to remember the commands for all the operations in a DVCS, and working with a DVCS might be confusing at first. A CVCS is easy to learn and easy to set up.
The biggest advantage of a DVCS is that it lets you work offline and gives you flexibility. You have the entire history of the code on your own hard drive, so all the changes you make go to your own repository, which doesn't require an internet connection. This is not the case with a CVCS.
A DVCS is faster than a CVCS because you don't need to communicate with the remote server for every command. You do everything locally, which lets you work faster.
Working with branches is easy in a DVCS. Every developer has the entire history of the code, so developers can share their changes with each other before merging all the sets of changes into the remote server. In a CVCS, working with branches is difficult and time-consuming because it requires communicating with the server directly.
If the project has a long history or contains large binary files, downloading the entire project in a DVCS can take more time and space than usual. In a CVCS you only need to fetch the current version of the code, because you don't store the entire history or complete project on your own machine, so no additional space is required.
If the main server goes down or crashes in a DVCS, you can still recover the entire history of the code from your local repository, where the full revision history is already saved.
This is not the case with a CVCS, where a single remote server holds the entire code history.
Merge conflicts with other developers' code are less frequent in a DVCS, because every developer works on their own copy of the code. Merge conflicts are more common in a CVCS.
In a DVCS, developers sometimes take advantage of having the entire history of the code and work in isolation for too long, which is not a good thing. This is not the case with a CVCS.

Conclusion:
Let's look at the popularity of DVCS and CVCS across the world.
Image Source: Google Trends
From Google Trends and all the points above, it's clear that DVCS has more advantages and is more popular than CVCS. When it comes to choosing a version control system, however, it also depends on which one is more convenient for you to learn as a beginner. You can choose either, but a DVCS gives more benefits once you get comfortable with its commands.

Comparison – Centralized, Decentralized and Distributed Systems
Last Updated: 02 Dec, 2022

In this article, we will try to understand and compare different aspects of centralized, decentralized, and distributed systems.

1. CENTRALIZED SYSTEMS:
We start with centralized systems because they are the most intuitive and the easiest to understand and define. Centralized systems use a client/server architecture in which one or more client nodes are directly connected to a central server. This is the most commonly used type of system in many organizations: a client sends a request to a company server and receives the response.
Figure – Centralized system visualization
Example – Wikipedia. Consider a massive server to which we send our requests and which responds with the article we requested. Suppose we enter the search term 'junk food' in the Wikipedia search bar.
This search term is sent as a request to the Wikipedia servers (mostly located in Virginia, U.S.A.), which respond with articles ranked by relevance. In this situation, we are the client node and the Wikipedia servers are the central server.

Characteristics of Centralized System:
Presence of a global clock: The entire system consists of a central node (a server/master) and many client nodes (computers/slaves), and all client nodes sync up with the global clock (the clock of the central node).
One single central unit: A single central unit serves and coordinates all the other nodes in the system.
Dependent failure of components: Failure of the central node causes the entire system to fail. This makes sense, because when the server is down, no other entity is there to send responses or receive requests.
Scaling: Only vertical scaling of the central server is possible. Horizontal scaling would contradict the system's defining characteristic of a single central unit.

Components of Centralized System:
Node (computer, mobile, etc.)
Server
Communication link (cables, Wi-Fi, etc.)

Architecture of Centralized System:
Client-server architecture. The central node that serves the other nodes in the system is the server node, and all the other nodes are client nodes.

Limitations of Centralized System:
Can't scale up vertically beyond a certain limit: After a point, even if you increase the hardware and software capabilities of the server node, performance does not improve appreciably, so the benefit/cost ratio drops below 1.
Bottlenecks can appear when traffic spikes: The server can handle only a finite number of open connections from client nodes. So, during high traffic such as a shopping sale, the server can effectively suffer a Denial-of-Service (DoS) or Distributed Denial-of-Service (DDoS) attack.

Advantages of Centralized System:
Easy to physically secure.
It is easy to secure and service the server and client nodes because of where they are located.
Smooth and elegant personal experience: A client has a dedicated system that they use (for example, a personal computer), and the company has a similar system that can be modified to suit custom needs.
Dedicated resources (memory, CPU cores, etc.).
More cost-efficient for small systems, up to a certain limit: Because centralized systems take less money to set up, they have an edge when small systems have to be built.
Quick updates are possible: There is only one machine to update.
Easy detachment of a node from the system: Just remove the client node's connection to the server, and the node is detached.

Disadvantages of Centralized System:
Highly dependent on network connectivity: The system can fail if the nodes lose connectivity, as there is only one central node.
No graceful degradation: The entire system fails abruptly.
Less possibility of data backup: If the server node fails and there is no backup, you lose the data straight away.
Difficult server maintenance: There is only one server node, and for availability reasons it is inefficient and unprofessional to take the server down for maintenance. So updates have to be done on the fly (hot updates), which is difficult and could break the system.

Applications of Centralized System:
Application development: It is very easy to set up a central server and send client requests to it. Many modern frameworks come with development servers that can be launched with a couple of commands, for example, an Express server or a Django server.
Data analysis: Data analysis is easy when all the data is in one place.
Personal computing.

Use Cases:
Centralized databases: all the data is kept on one server for use.
Single-player games such as Need for Speed and GTA Vice City: the entire game runs on one system (commonly, a personal computer).
Application development by deploying test servers, which makes debugging, deployment, and simulation easy.
Personal computers.

Organizations Using: National Informatics Centre (India), IBM

2. DECENTRALIZED SYSTEMS:
These are another type of system that has been gaining a lot of popularity, primarily because of the massive hype around Bitcoin. Many organizations are now trying to find applications for such systems. In decentralized systems, every node makes its own decisions, and the final behavior of the system is the aggregate of the decisions of the individual nodes. Note that there is no single entity that receives and responds to requests.
Figure – Decentralized system visualization
Example – Bitcoin. Let's take Bitcoin as an example because it is the most popular use case of decentralized systems. No single entity or organization owns the Bitcoin network. The network is the sum of all the nodes, which talk to each other to maintain the amount of bitcoin every account holder has.

Characteristics of Decentralized System:
Lack of a global clock: Every node is independent of the others and hence runs and follows its own clock.
Multiple central units (computers/nodes/servers): More than one central unit can listen for connections from other nodes.
Dependent failure of components: Failure of one central node causes part of the system to fail, not the whole system.
Scaling: Vertical scaling is possible. Each node can add resources (hardware, software) to itself to increase its performance, which increases the performance of the entire system.

Components of Decentralized System:
Node (computer, mobile, etc.)
Communication link (cables, Wi-Fi, etc.)

Architecture of Decentralized System:
Peer-to-peer architecture: All nodes are peers of each other.
No one node has supremacy over the other nodes.
Master-slave architecture: One node can become a master by voting and help coordinate a part of the system, but this does not mean it has supremacy over the nodes it coordinates.

Limitations of Decentralized System:
May lead to coordination problems at the enterprise level: When every node owns its own behavior, it is difficult to achieve collective tasks.
Not suitable for small systems: It is not worthwhile to build and operate small decentralized systems because of the low benefit/cost ratio.
No way to regulate a node in the system: There is no superior node overseeing the behavior of subordinate nodes.

Advantages of Decentralized System:
Minimal performance bottlenecks: The entire load is balanced across all the nodes, leading to few or no bottleneck situations.
High availability: Some nodes (computers, mobiles, servers) are always available/online for work.
More autonomy and control over resources: As each node controls its own behavior, it has more autonomy and therefore more control over its resources.

Disadvantages of Decentralized System:
Difficult to achieve big global tasks: There is no chain of command to direct other nodes to perform certain tasks.
No regulatory oversight.
Difficult to know which node failed: Each node must be pinged to check its availability, and the work has to be partitioned so that the failed node can be found by comparing each node's output with the expected output.
Difficult to know which node responded: When a decentralized system serves a request, one of its nodes actually serves it, but it is difficult to find out which one.

Applications of Decentralized System:
Private networks: peer nodes join with each other to form a private network.
Cryptocurrency: nodes join to become part of a system in which digital currency is exchanged without directly revealing who sent what to whom, or from where. In Bitcoin, however, the public addresses and the amounts transferred are visible to anyone; they are difficult to trace because users can create new addresses at will, so an address is hard to link to a real identity.

Use Cases:
Blockchain
Decentralized databases: An entire database is split into parts that are distributed to different nodes for storage and use. For example, records with names starting with 'A' to 'K' are stored on one node, 'L' to 'N' on a second node, and 'O' to 'Z' on a third node.
Cryptocurrency

Organizations Using: Bitcoin, Tor network

3. DISTRIBUTED SYSTEMS:
A distributed system is a collection of computer programs that use computational resources across multiple, separate computation nodes to achieve a common, shared goal. Also known as distributed computing or distributed databases, it relies on separate nodes that communicate and synchronize over a common network. These nodes typically represent separate physical hardware devices, but they can also represent separate software processes or other recursively encapsulated systems. Distributed systems aim to remove bottlenecks or central points of failure from a system.
Example: Google search system. Each search request is handled by hundreds of computers working together, drawing on an index built by machines that crawl the web, to return the relevant results. To the user, Google appears to be one system, but it is actually many computers working together to accomplish a single task (returning the results for the search query).
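The decentralized-database example above (records split by first letter across three nodes) and the distributed-system idea of many machines answering one query while appearing as a single system can be combined in a small sketch. `Node`, `Cluster`, and `RANGES` are hypothetical names; threads inside one process stand in for separate machines, and replication, networking, and failure handling are deliberately left out.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

# Hypothetical key ranges mirroring the 'A'-'K', 'L'-'N', 'O'-'Z' example.
RANGES = [("A", "K"), ("L", "N"), ("O", "Z")]


class Node:
    """One storage node that holds only the records whose names fall in its range."""

    def __init__(self, first: str, last: str) -> None:
        self.first, self.last = first, last
        self.records: Dict[str, str] = {}

    def owns(self, name: str) -> bool:
        return self.first <= name[0].upper() <= self.last

    def search(self, text: str) -> List[str]:
        # Each node scans only its own partition.
        return [name for name, value in self.records.items() if text.lower() in value.lower()]


class Cluster:
    """Makes the partitioned nodes look like a single database to the caller."""

    def __init__(self) -> None:
        self.nodes = [Node(first, last) for first, last in RANGES]

    def put(self, name: str, value: str) -> None:
        # Route each record to the one node that owns its first letter
        # (names must start with a letter A-Z in this sketch).
        owner = next(node for node in self.nodes if node.owns(name))
        owner.records[name] = value

    def search(self, text: str) -> List[str]:
        # Scatter the query to all nodes in parallel, then gather and merge the partial results.
        with ThreadPoolExecutor(max_workers=len(self.nodes)) as pool:
            partial_results = list(pool.map(lambda node: node.search(text), self.nodes))
        return sorted(name for part in partial_results for name in part)


if __name__ == "__main__":
    cluster = Cluster()
    cluster.put("Alice", "likes junk food")
    cluster.put("Mohan", "prefers salads")
    cluster.put("Zara", "reviews junk food")
    print([len(node.records) for node in cluster.nodes])  # [1, 1, 1]
    print(cluster.search("junk food"))                    # ['Alice', 'Zara']
```

The caller never needs to know which node stored a record; `Cluster.search` hides the partitioning, which is the "appears to be one system" property described above.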
Characteristics of Distributed System:
Resource sharing: A distributed system can share hardware, software, or data.
Simultaneous processing: Multiple machines can process the same function simultaneously.
Scalability: Computing and processing capacity can scale up as needed by adding more machines.
Error detection: Failures can be detected more easily.
Transparency: A node can access and communicate with other nodes in the system.

Components of Distributed System:
Node (computer, mobile, etc.)
Communication link (cables, Wi-Fi, etc.)

Architecture of Distributed System:
Peer-to-peer: All nodes are peers of each other and work towards a common goal.
Client-server: Some nodes become server nodes that take on roles such as coordinator or arbiter.
n-tier architecture: Different parts of an application are distributed across different nodes, and these nodes work together to function as one application for the user/client.

Limitations of Distributed System:
Difficult to design and debug algorithms for the system: These algorithms are difficult because there is no common clock, so commands and logs cannot be ordered by a single shared time. Nodes can have different latencies, which must be kept in mind while designing such algorithms. The complexity grows as the number of nodes increases.
No common clock: this makes it difficult to order events/transactions in time.
Difficult for a node to get a global view of the system, and hence to make informed decisions based on the state of other nodes in the system.

Advantages of Distributed System:
Lower latency than a centralized system: Distributed systems are spread over a wide geographical area, so a request can be served by a nearby node and it takes less time to get a response.

Disadvantages of Distributed System:
Difficult to achieve consensus.
The conventional way of logging events by the absolute time at which they occur is not possible here.

Applications of Distributed System:
Cluster computing: a technique in which many computers are coupled together to work towards global goals. The computer cluster acts as if it were a single computer.
Grid computing: all the resources are pooled together for sharing, essentially turning the systems into a powerful supercomputer.

Use Cases:
SOA-based systems
Multiplayer online games

Organizations Using: Apple, Google, Facebook.
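A common way to cope with the missing common clock listed among the limitations above is a logical clock. The sketch below implements Lamport clocks, a well-known technique that the text itself does not name: each process increments a counter before every local event, attaches the counter to outgoing messages, and on receipt sets its counter to one more than the larger of its own value and the message's timestamp. Sorting events by (timestamp, process id) then yields a single order in which every send comes before its matching receive. `Process` and `merged_order` are hypothetical names, and messages are passed by direct method calls rather than over a network.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Process:
    """A node that timestamps its events with a Lamport logical clock."""
    pid: int
    clock: int = 0
    log: List[Tuple[int, int, str]] = field(default_factory=list)  # (timestamp, pid, event)

    def local_event(self, what: str) -> int:
        # Rule 1: increment the clock before every local event.
        self.clock += 1
        self.log.append((self.clock, self.pid, what))
        return self.clock

    def send(self, what: str) -> int:
        # Sending counts as a local event; the returned timestamp travels with the message.
        return self.local_event(f"send {what}")

    def receive(self, what: str, msg_timestamp: int) -> int:
        # Rule 2: move past the sender's timestamp so the receive is ordered after the send.
        self.clock = max(self.clock, msg_timestamp) + 1
        self.log.append((self.clock, self.pid, f"receive {what}"))
        return self.clock


def merged_order(*processes: Process) -> List[str]:
    # Break timestamp ties by process id to obtain one total order for all events.
    events = sorted(event for p in processes for event in p.log)
    return [f"P{pid}@{ts}: {what}" for ts, pid, what in events]


if __name__ == "__main__":
    a, b = Process(pid=1), Process(pid=2)
    for name in ("b1", "b2", "b3"):
        b.local_event(name)    # B's clock reaches 3
    stamp = a.send("hello")    # A's clock is 1
    b.receive("hello", stamp)  # B jumps to max(3, 1) + 1 = 4
    a.local_event("a2")        # A's clock is 2
    print("\n".join(merged_order(a, b)))
```

Lamport timestamps guarantee that if one event causally precedes another, it gets a smaller timestamp; the converse does not hold, so concurrent events end up ordered arbitrarily (here, by process id).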