Uploaded by Shame Bope


Version Control Systems
What is a “version control system”?
Version control systems are a category of software tools that record the changes made to files by
keeping track of modifications to the code.
Why is a Version Control System So Important?
A software product is usually developed collaboratively by a group of developers who may
be located in different places, each contributing some specific
functionality or feature. To contribute to the product, they modify the source
code (by adding or removing code). A version control system is software that helps the
development team communicate efficiently and manage (track) all the changes made to
the source code, along with information such as who made each change and what was changed. Typically, a
separate branch is created for each contributor’s changes, and the changes are not merged
into the original source code until they have been reviewed. Once the changes are approved, they
are merged into the main source code. Version control not only keeps the source code organized but also improves productivity
by making the development process smooth.
Basically, a version control system keeps track of the changes made to a piece of software and takes a
snapshot of every modification. Suppose a team of developers adds some new functionality to an
application and the updated version does not work properly. Because the version control system keeps track
of the work, the team can discard the new changes and continue with
the previous version.
Benefits of the version control system:

Speeds up project development by enabling efficient collaboration,

Improves the productivity and skills of the employees and expedites product delivery through
better communication and assistance,

Reduces the possibility of errors and conflicts during project development through traceability
of every small change,

Employees or contributors can contribute to the project from anywhere, irrespective of
their geographical location,

A separate working copy is maintained for each contributor and is not
merged into the main files until it has been validated. Popular examples of such systems are Git,
Helix Core, and Microsoft TFS,

Helps in recovery in case of any disaster or contingency,

Tells us who made a change, what was changed, when, and why.
Use of Version Control System:

A repository: It can be thought of as a database of changes. It contains all the edits and
historical versions (snapshots) of the project.

Working copy (sometimes called a checkout): It is your personal copy of all the files in a project.
You can edit this copy without affecting the work of others, and you can commit your
changes to the repository when you are done.

Working in a group: Suppose you work at a company and are asked to work on
a live project. You can’t change the main code directly because it is in production and any change may
inconvenience users. You are also working in a team, so you need to collaborate with
your teammates and incorporate their changes. Version control helps you merge different
requests into the main repository without introducing undesirable changes. You can test
new functionality without putting it live, and you don’t need to download and set up the project each time: just
pull the latest changes, make your own changes, test them, and merge them back.
Types of Version Control Systems:

Local Version Control Systems

Centralized Version Control Systems

Distributed Version Control Systems
Local Version Control Systems: This is one of the simplest forms of version control. It has a database that keeps all the
changes to files under revision control. RCS is one of the most common local VCS tools. It keeps patch sets
(differences between files) in a special format on disk. By adding up all the patches, it can re-create
what any file looked like at any point in time.
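The idea of re-creating a file by adding up patches can be sketched in Python. This is only an illustration under assumptions: `make_patch`, `apply_patch`, and `PatchStore` are hypothetical names, files are lists of lines, and the patch layout is invented here and is not the on-disk format that RCS uses.

```python
from difflib import SequenceMatcher


def make_patch(old, new):
    """Describe how to turn `old` into `new` as a list of (start, end, lines).

    Each entry means: replace old[start:end] with `lines`. Unchanged regions
    are not stored, so the patch holds only the differences.
    """
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)
    return [
        (i1, i2, new[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def apply_patch(old, patch):
    """Apply a patch produced by make_patch and return the new list of lines."""
    result, pos = [], 0
    for start, end, lines in patch:
        result.extend(old[pos:start])  # unchanged lines before this edit
        result.extend(lines)           # replacement lines
        pos = end                      # skip the lines this edit removed
    result.extend(old[pos:])
    return result


class PatchStore:
    """Hypothetical local store: one patch per revision, starting from an empty file."""

    def __init__(self):
        self.patches = []
        self._latest = []

    def commit(self, lines):
        self.patches.append(make_patch(self._latest, lines))
        self._latest = list(lines)
        return len(self.patches)  # revision numbers start at 1

    def checkout(self, revision):
        lines = []
        for patch in self.patches[:revision]:  # add up the patches in order
            lines = apply_patch(lines, patch)
        return lines


store = PatchStore()
store.commit(["hello", "world"])
store.commit(["hello", "brave", "world"])
store.commit(["brave", "world"])
```

Calling `store.checkout(2)` replays the first two patches and returns the file as it looked at revision 2. Only the differences are stored, at the cost of replaying patches on every checkout.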
Centralized Version Control Systems: A centralized version control system has just one shared repository,
and every user needs to commit to it for their changes to be recorded. Others can then
see your changes by updating their working copies.
Two things are required to make your changes visible to others which are:

You commit

They update
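The two steps can be illustrated with a small Python model. It is a sketch under assumptions, not how any real CVCS is implemented: `CentralRepository` and `WorkingCopy` are hypothetical classes, a project is a dictionary of file names to contents, and `update` simply overwrites the working copy instead of merging local edits.

```python
class CentralRepository:
    """Hypothetical central server holding the single shared history."""

    def __init__(self):
        self.revisions = [{}]  # revision 0 is an empty project

    def commit(self, files):
        self.revisions.append(dict(files))
        return len(self.revisions) - 1  # new revision number

    def latest(self):
        return dict(self.revisions[-1])


class WorkingCopy:
    """A user's checkout; it talks to the central server directly."""

    def __init__(self, server):
        self.server = server
        self.files = server.latest()

    def commit(self):
        return self.server.commit(self.files)

    def update(self):
        # Simplification: replaces local files rather than merging them.
        self.files = self.server.latest()


server = CentralRepository()
alice = WorkingCopy(server)
bob = WorkingCopy(server)

alice.files["README"] = "hello"
alice.commit()                             # step 1: you commit
seen_before_update = "README" in bob.files
bob.update()                               # step 2: they update
seen_after_update = "README" in bob.files
```

Here `seen_before_update` is `False` and `seen_after_update` is `True`: the change reaches Bob only after both steps have happened.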
The benefit of a CVCS (Centralized Version Control System) is that it enables collaboration among developers
and gives everyone some insight into what the others are doing on the project. It
also gives administrators fine-grained control over who can do what.
It has some downsides as well, which led to the development of DVCS. The most obvious is the single
point of failure that the centralized repository represents: if it goes down, then during that period
collaboration and saving versioned changes are not possible. What if the hard disk of the central database
becomes corrupted and proper backups haven’t been kept? You lose absolutely everything.
Distributed Version Control Systems: Distributed version control systems contain multiple repositories.
Each user has their own repository and working copy. Just committing your changes will not give others
access to them, because a commit only records those changes in your local repository;
you need to push them to make them visible in the central repository. Similarly, when you
update, you do not get others’ changes unless you have first pulled those changes into your repository.
To make your changes visible to others, four things are required:

You commit

You push

They pull

They update
The most popular distributed version control systems are Git and Mercurial. They help us overcome the
problem of a single point of failure.
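The four steps (commit, push, pull, update) can be traced with a toy Python model. Everything here is an assumption made for illustration: `Repository` and `Developer` are hypothetical classes, history is a plain list of change descriptions, and the sketch assumes histories never diverge, so no merging is needed.

```python
class Repository:
    """Hypothetical repository: a list of change descriptions stands in for history."""

    def __init__(self, history=()):
        self.history = list(history)


class Developer:
    """A developer with a private repository cloned from a shared one."""

    def __init__(self, shared):
        self.shared = shared
        self.local = Repository(shared.history)       # clone: full copy of history
        self.working_copy = list(self.local.history)

    def commit(self, change):
        self.local.history.append(change)             # recorded locally only

    def push(self):
        # Assumption: the shared history is a prefix of ours (no divergence).
        self.shared.history.extend(self.local.history[len(self.shared.history):])

    def pull(self):
        # Assumption: our history is a prefix of the shared one.
        self.local.history.extend(self.shared.history[len(self.local.history):])

    def update(self):
        self.working_copy = list(self.local.history)  # refresh the files we see


shared = Repository()
you, they = Developer(shared), Developer(shared)

you.commit("add login page")          # 1. you commit
after_commit = list(shared.history)   # shared repository is still empty
you.push()                            # 2. you push
after_push = list(shared.history)
they.pull()                           # 3. they pull
after_pull = list(they.working_copy)  # their working copy has not changed yet
they.update()                         # 4. they update
after_update = list(they.working_copy)
```

The change appears in `after_push` but not in `after_commit`, and it shows up in the other developer's working copy only in `after_update`. Each of the four steps moves the change exactly one hop.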
Purpose of Version Control:

Multiple people can work simultaneously on a single project. Everyone works on and edits their
own copy of the files and it is up to them when they wish to share the changes made by them
with the rest of the team.

It also enables one person to use multiple computers to work on a project, so it is valuable even
if you are working by yourself.

It integrates the work that is done simultaneously by different members of the team. In some
rare cases, when conflicting edits are made by two people to the same line of a file, then human
assistance is requested by the version control system in deciding what should be done.

Version control provides access to the historical versions of a project. This is insurance against
computer crashes or data loss. If a mistake is made, you can easily roll back to a previous
version. It is also possible to undo specific edits without losing the work done in the
meantime. You can easily find out when, why, and by whom any part of a file was edited.
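How a tool might integrate simultaneous edits, and when it has to ask a human, can be sketched as a line-by-line three-way merge. This is a deliberately simplified illustration: `merge_lines` is a hypothetical function, and it assumes that both people only edited lines in place, so all three versions have the same number of lines. Real tools also handle inserted and deleted lines.

```python
def merge_lines(base, ours, theirs):
    """Three-way merge of equal-length line lists.

    Returns (merged, conflicts), where `conflicts` lists the indices of lines
    that both sides changed differently and that need a human decision.
    """
    if not (len(base) == len(ours) == len(theirs)):
        raise ValueError("this sketch only handles in-place line edits")
    merged, conflicts = [], []
    for index, (b, o, t) in enumerate(zip(base, ours, theirs)):
        if o == t:        # both sides agree (or neither changed the line)
            merged.append(o)
        elif o == b:      # only they changed the line
            merged.append(t)
        elif t == b:      # only we changed the line
            merged.append(o)
        else:             # both changed the same line differently
            merged.append(b)
            conflicts.append(index)
    return merged, conflicts


base = ["a = 1", "b = 2", "c = 3"]
clean = merge_lines(base, ["a = 10", "b = 2", "c = 3"], ["a = 1", "b = 2", "c = 30"])
clash = merge_lines(base, ["a = 10", "b = 2", "c = 3"], ["a = 99", "b = 2", "c = 3"])
```

In `clean`, the two edits touch different lines and are combined automatically. In `clash`, both people changed line 0, so the function keeps the original line and reports index 0 as a conflict.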
Centralized vs Distributed Version Control: Which One Should We Choose?

Difficulty Level : Easy

Last Updated : 13 Sep, 2021
Many of us become aware of version control when we work with multiple developers on a single
project and need to collaborate with them. There is no doubt that version control makes developers’ work
easier and faster. In most organizations, developers use either a Centralized Version Control
System (CVCS) such as Subversion (SVN) or Concurrent Versions System (CVS), or a Distributed Version Control
System (DVCS) such as Git (written in C), Mercurial (written in Python), or Bazaar (written in Python).
Now to the point: which one is best, and which one should we choose? We will compare each one’s
workflow, learning curve, security, popularity, and other aspects.
First, we need to dispel a myth that many beginners believe about DVCS: “There is no central version
of the code or no master branch.” That’s not true. A DVCS also has a master branch or central
version of the code, but it works differently than in centralized source control.
Let’s go through an overview of both version control systems.
Centralized Version Control System
In centralized source control, there is a server and a client. The server is the master repository that
contains all the versions of the code. To work on a project, the user or client first needs to get the
code from the master repository on the server. The client communicates with the server and pulls
the current version of the code from the server to their local machine. In other words,
you take an update from the master repository and get a local copy of the code on
your system. Once you have the latest version of the code, you make your own changes to the
code and then commit those changes directly to the master
repository. Committing a change simply means merging your own code into the master repository,
creating a new version of the source code. Everything is centralized in this model.
There is just one repository, and it contains the entire history of the code and the different
branches of the code. The basic workflow in centralized source control is: get the
latest version of the code from the central repository (which contains other people’s code as well), make
your own changes to the code, and then commit or merge those changes into the central
repository.
Distributed Version Control System
In distributed version control, most of the model is the same as in centralized version control. The
major difference is that, instead of one single repository on the server, every
developer or client has their own repository containing a copy of the entire history
of the code and all of its branches on their local machine. Every client or user can therefore
work locally and disconnected, which is more convenient than centralized source control, and that’s why
it is called distributed.
You don’t need to rely on the central server; you can clone the entire history of the code to your
hard drive. When you start working on a project, you clone the code from the master repository to
your own hard drive and check out the code from your own repository to make changes. After making
changes, you commit them to your local repository. At this point, your local repository
has ‘change sets’ but is still disconnected from the master repository (the master repository will have
different ‘sets of changes’ from each individual developer’s repository). To communicate
with it, you issue a request to the master repository and push your local repository’s changes to it.
Getting new changes from a repository is called “pulling”, and sending your local
repository’s ‘set of changes’ to another repository is called “pushing”.
Unlike the centralized model, you do not merge code directly into the master
repository after making changes. First you commit all the changes in your own repository, and
then the ‘set of changes’ is merged into the master repository.
Below is the diagram to understand the difference between these two in a better way:
Basic Difference with Pros and Cons

Centralized version control is easier to learn than distributed. As a beginner, you’ll have to
remember more commands for the various operations in DVCS, and working with DVCS might be
confusing initially. CVCS is easy to learn and easy to set up.

The biggest advantage of DVCS is that it allows you to work offline and gives you flexibility. You
have the entire history of the code on your own hard drive, so all the changes you make
go to your own repository, which doesn’t require an internet connection. This
is not the case with CVCS.

DVCS is faster than CVCS because you don’t need to communicate with the remote server for
each and every command. You do everything locally, which lets you work faster
than with CVCS.

Working on branches is easy in DVCS. Every developer has the entire history of the code,
so developers can share their changes with each other before merging all the ‘sets of changes’ into the remote
server. In CVCS, working on branches is difficult and time-consuming because it requires
communicating with the server directly.

If the project has a long history or contains large binary files,
downloading the entire project in DVCS can take more time and space than usual, whereas in
CVCS you only need to fetch the version of the code you are working on; you don’t store the entire history
on your own machine, so no additional space is required.

If the main server goes down or crashes in DVCS, you can still recover the entire history
of the code from your local repository, where the full revision history is already
saved. This is not the case with CVCS, where a single remote server holds the entire code
history.

Merge conflicts with other developers’ code are less frequent in DVCS, because every developer works on
their own piece of code. Merge conflicts are more frequent in CVCS.

In DVCS, developers sometimes take advantage of having the entire history of the code and
work for too long in isolation, which is not a good thing. This is not the case with CVCS.
Conclusion: Let’s see the popularity of DVCS and CVCS across the world.
Image Source: Google Trends
From Google Trends and all the above points, it’s clear that DVCS has more advantages and is more
popular than CVCS. When it comes to choosing a version control system, however, it also depends on which
one is more convenient for you to learn as a beginner. You can choose either of them, but DVCS gives
more benefits once you get used to its commands.
Comparison – Centralized, Decentralized and Distributed Systems

Difficulty Level : Easy

Last Updated : 02 Dec, 2022
In this article, we will try to understand and compare different aspects of centralized, decentralized, and
distributed systems.
1. CENTRALIZED SYSTEMS:
We start with centralized systems because they are the most intuitive and easy to understand and
define.
Centralized systems are systems that use client/server architecture where one or more client nodes are
directly connected to a central server. This is the most commonly used type of system in many
organizations where a client sends a request to a company server and receives the response.
Figure – Centralized system visualization
Example –
Wikipedia. Consider a massive server to which we send our requests and which responds with the
article that we requested. Suppose we enter the search term ‘junk food’ in the Wikipedia search bar.
This search term is sent as a request to the Wikipedia servers (mostly located in Virginia, U.S.A.), which
then respond with articles ranked by relevance. In this situation, we are the client node and the
Wikipedia servers are the central server.
Characteristics of Centralized System –

Presence of a global clock: As the entire system consists of a central node(a server/ a master)
and many client nodes(a computer/ a slave), all client nodes sync up with the global clock(the
clock of the central node).

One single central unit: One single central unit which serves/coordinates all the other nodes in
the system.

Dependent failure of components: Central node failure causes the entire system to fail. This
makes sense because when the server is down, no other entity is there to send/receive
responses/requests.
Scaling –
Only vertical scaling of the central server is possible. Horizontal scaling would contradict the
system’s defining characteristic of a single central unit.
Components of Centralized System –
Components of Centralized System are,

Node (Computer, Mobile, etc.).

Server.

Communication link (Cables, Wi-Fi, etc.).
Architecture of Centralized System –
Client-Server architecture. The central node that serves the other nodes in the system is the server node
and all the other nodes are the client nodes.
Limitations of Centralized System –

Can’t scale up vertically after a certain limit – After a limit, even if you increase the hardware
and software capabilities of the server node, the performance will not increase appreciably,
so the benefit gained no longer justifies the cost (a benefit/cost ratio below 1).

Bottlenecks can appear when the traffic spikes – The server can only handle a finite number of
open connections from client nodes. So, when high traffic occurs, as during
a shopping sale, the server can be overwhelmed in much the same way as under a Denial-of-Service or Distributed Denial-of-Service attack.
Advantages of Centralized System –

Easy to physically secure. It is easy to secure and service the server and client nodes by virtue of
their location

Smooth and elegant personal experience – A client has a dedicated system that they use (for
example, a personal computer), and the company has a similar system that can be modified to
suit custom needs

Dedicated resources (memory, CPU cores, etc)

More cost-efficient for small systems up to a certain limit – As the central systems take fewer
funds to set up, they have an edge when small systems have to be built

Quick updates are possible – Only one machine to update.

Easy detachment of a node from the system. Just remove the connection of the client node
from the server and voila! Node detached.
Disadvantages of Centralized System –

Highly dependent on the network connectivity – The system can fail if the nodes lose
connectivity as there is only one central node.

No graceful degradation of the system – abrupt failure of the entire system

Less possibility of data backup. If the server node fails and there is no backup, you lose the data
straight away

Difficult server maintenance – There is only one server node and due to availability reasons, it is
inefficient and unprofessional to take the server down for maintenance. So, updates have to be
done on-the-fly(hot updates) which is difficult and the system could break.
Applications of Centralized System –

Application development – It is very easy to set up a central server and send client requests to it.
Modern frameworks come with built-in development servers that can be launched with a
couple of commands, for example an Express server or a Django server.

Data analysis – Easy to do data analysis when all the data is in one place and available for
analysis

Personal computing
Use Cases –

Centralized databases – all the data in one server for use.

Single-player games like Need For Speed, GTA Vice City – an entire game in one
system(commonly, a Personal Computer)

Application development by deploying test servers leading to easy debugging, easy deployment,
easy simulation

Personal Computers
Organizations Using –
National Informatics Center (India), IBM
2. DECENTRALIZED SYSTEMS:
This is another type of system that has been gaining a lot of popularity, primarily because of the
massive hype around Bitcoin. Many organizations are now trying to find applications for such systems.
In decentralized systems, every node makes its own decision. The final behavior of the system is the
aggregate of the decisions of the individual nodes. Note that there is no single entity that receives and
responds to the request.
Figure – Decentralized system visualization
Example –
Bitcoin. Let’s take Bitcoin as an example because it is the most popular use case of decentralized systems.
No single entity or organization owns the Bitcoin network. The network is the sum of all the nodes that talk
to each other to maintain the amount of bitcoin every account holder has.
Characteristics of Decentralized System –

Lack of a global clock: Every node is independent of each other and hence, has different clocks
that they run and follow.

Multiple central units (Computers/Nodes/Servers): More than one central unit which can listen
for connections from other nodes

Dependent failure of components: one central node failure causes a part of the system to fail;
not the whole system
Scaling –
Vertical scaling is possible. Each node can add resources(hardware, software) to itself to increase the
performance leading to an increase in the performance of the entire system.
Components –
Components of Decentralized System are,

Node (Computer, Mobile, etc.)

Communication link (Cables, Wi-Fi, etc.)
Architecture of Decentralized System –

peer-to-peer architecture – all nodes are peers of each other. No one node has supremacy over
other nodes

master-slave architecture – One node can become a master by voting and help in coordinating
of a part of the system but this does not mean the node has supremacy over the other node
which it is coordinating
Limitations of Decentralized System –

May lead to the problem of coordination at the enterprise level – When every node is the owner
of its own behavior, it’s difficult to achieve collective tasks

Not suitable for small systems – It is not beneficial to build and operate small decentralized systems
because the benefit is low relative to the cost

No way to regulate a node on the system – no superior node overseeing the behavior of
subordinate nodes
Advantages of Decentralized System –

Minimal problem of performance bottlenecks occurring – The entire load gets balanced on all
the nodes; leading to minimal to no bottleneck situations

High availability – Some nodes(computers, mobiles, servers) are always available/online for
work, leading to high availability

More autonomy and control over resources – As each node controls its own behavior, it has
better autonomy leading to more control over resources
Disadvantages of Decentralized System –

Difficult to achieve global big tasks – No chain of command to command others to perform
certain tasks

No regulatory oversight

Difficult to know which node failed – Each node must be pinged for availability checking and
partitioning of work has to be done to actually find out which node failed by checking the
expected output with what the node generated

Difficult to know which node responded – When a request is served by a decentralized system,
the request is actually served by one of the nodes in the system but it is actually difficult to find
out which node indeed served the request.
Applications of Decentralized System –

Private networks – peer nodes joined with each other to make a private network.

Cryptocurrency – Nodes join a system in which digital currency is
exchanged without revealing the real-world identity or location of who sent what to whom. In Bitcoin, the
public addresses and the amounts transferred are visible, but users can generate new
addresses at will, which makes the owners difficult to trace.
Use Cases –

Blockchain

Decentralized databases – Entire databases split into parts and distributed to different nodes for
storage and use. For example, records with names starting from ‘A’ to ‘K’ in one node, ‘L’ to ‘N’
in the second node, and ‘O’ to ‘Z’ in the third node

Cryptocurrency
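The range split described in the decentralized databases use case (‘A’ to ‘K’, ‘L’ to ‘N’, ‘O’ to ‘Z’) can be sketched as a lookup that routes a record to a node by the first letter of its name. The node names and the `node_for` function are hypothetical, and the sketch assumes names start with an ASCII letter.

```python
import bisect

# Inclusive upper bound of each node's letter range: A-K, L-N, O-Z.
UPPER_BOUNDS = ["K", "N", "Z"]
NODES = ["node-1", "node-2", "node-3"]  # hypothetical node names


def node_for(name):
    """Return the node that stores records whose name starts with this letter."""
    first = name[:1].upper()
    if not ("A" <= first <= "Z"):
        raise ValueError(f"cannot route name {name!r}")
    # bisect_left finds the first range whose upper bound is >= the letter.
    return NODES[bisect.bisect_left(UPPER_BOUNDS, first)]


placement = {name: node_for(name) for name in ["Alice", "kumar", "Liam", "Nina", "Omar", "Zoe"]}
```

With this split, `placement` maps "Alice" and "kumar" to `node-1`, "Liam" and "Nina" to `node-2`, and "Omar" and "Zoe" to `node-3`. Any node can compute the owner of a record locally, without asking a central server.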
Organizations Using –
Bitcoin, Tor network
3. DISTRIBUTED SYSTEMS:
A distributed system is a collection of computer programs that utilize computational resources across
multiple, separate computation nodes to achieve a common, shared goal. Also known as distributed
computing or distributed databases, it relies on separate nodes to communicate and synchronize over a
common network. These nodes typically represent separate physical hardware devices but can also
represent separate software processes, or other recursive encapsulated systems. Distributed systems
aim to remove bottlenecks or central points of failure from a system.
Example :
Google search system. Each request is worked upon by hundreds of computers that crawl the web and
return the relevant results. To the user, Google appears to be one system, but it actually is multiple
computers working together to accomplish one single task (return the results to the search query).
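The pattern of many machines cooperating on one query can be sketched as a scatter-gather search. This is a toy model and not a description of how Google works: the `SHARDS` data, scores, and function names are invented for illustration, and threads on one machine stand in for separate computers.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical index shards: each maps a page title to a relevance score.
SHARDS = [
    {"git basics": 0.9, "svn guide": 0.4},
    {"git branching": 0.8},
    {"mercurial intro": 0.7, "git internals": 0.6},
]


def search_shard(shard, term):
    """Work done by one node: return matching (title, score) pairs from its shard."""
    return [(title, score) for title, score in shard.items() if term in title]


def search(term):
    """Scatter the query to every shard in parallel, then gather and rank the hits."""
    with ThreadPoolExecutor(max_workers=len(SHARDS)) as pool:
        partials = pool.map(lambda shard: search_shard(shard, term), SHARDS)
        hits = [hit for partial in partials for hit in partial]
    ranked = sorted(hits, key=lambda hit: hit[1], reverse=True)
    return [title for title, _ in ranked]


results = search("git")
```

The caller sees a single `search` function and a single ranked list, even though three workers each handled part of the data.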
Characteristics of Distributed System:

Resource sharing: A distributed system can share hardware, software, or data

Simultaneous processing: Multiple machines can process the same function simultaneously

Scalability: The computing and processing capacity can scale up as needed when extended to
additional machines

Error detection: Failures can be more easily detected

Transparency: A node can access and communicate with other nodes in the system
Components of Distributed System :
The components of Distributed System are,

Node (Computer, Mobile, etc.)

A communication link (Cables, Wi-Fi, etc.)
The architecture of Distributed System –

peer-to-peer – all nodes are peers of each other and work towards a common goal

client-server – some nodes become server nodes for the role of coordinator, arbiter, etc.

n-tier architecture – different parts of an application are distributed in different nodes of the
systems and these nodes work together to function as an application for the user/client
Limitations of Distributed System –

Difficult to design and debug algorithms for the system. These algorithms are difficult because of
the absence of a common clock, so no temporal ordering of commands/logs based on a shared time can take place.
Nodes can have different latencies, which have to be kept in mind while designing such
algorithms. The complexity increases with the number of nodes.

No common clock causes difficulty in the temporal ordering of events/transactions

Difficult for a node to get the global view of the system and hence take informed decisions
based on the state of other nodes in the system
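One well-known way to order events without a common clock is a logical clock. The sketch below shows a Lamport clock, a technique that the text above does not name and that is brought in here only to illustrate the ordering problem. The class and method names are chosen for this example.

```python
class LamportClock:
    """Logical clock: a counter that orders events without a shared wall clock."""

    def __init__(self):
        self.time = 0

    def tick(self):
        """Record a local event."""
        self.time += 1
        return self.time

    def send(self):
        """Timestamp an outgoing message."""
        return self.tick()

    def receive(self, message_time):
        """Advance past the sender's timestamp when a message arrives."""
        self.time = max(self.time, message_time) + 1
        return self.time


node_a, node_b = LamportClock(), LamportClock()

node_a.tick()                      # local event on A, time 1
stamp = node_a.send()              # A sends a message stamped 2
node_b.tick()                      # unrelated local event on B, time 1
received_at = node_b.receive(stamp)
```

Here `received_at` is 3, which is greater than the send stamp 2, so the receive event is always ordered after the send that caused it. Two events that did not influence each other may still get timestamps in either order, which is why this gives only a partial view of time.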
Advantages of Distributed System –

Lower latency than a centralized system – Because the nodes are geographically spread out,
a request can be served by a nearby node, leading to less time to get a response
Disadvantages of Distributed System –

Difficult to achieve consensus

The conventional way of logging events by the absolute time at which they occur is not possible here
Applications of Distributed System –

Cluster computing – a technique in which many computers are coupled together so that
they achieve global goals. The computer cluster acts as if it were a single computer

Grid computing – All the resources are pooled together for sharing in this kind of computing,
essentially turning the systems into a powerful supercomputer.
Use Cases –

SOA-based systems

Multiplayer online games
Organizations Using –
Apple, Google, Facebook.