Introduction to Data Structures
Dr. Joydeep Chandra
Associate Professor
Department of Computer Science and Engineering
Indian Institute of Technology Patna

Data Structures, Dr. Joydeep Chandra, IIT Patna, Spring Semester 2025

About The Course
Evaluation:
• Theory 65%
  • Quizzes 20%
  • Midsem 30%
  • Endsem 50%
• Lab 35%
  • Class Assignments 20%
  • Midsem 30%
  • Endsem 50%
Course link: find it at www.iitp.ac.in/~joydeep/courses.html or go directly to http://172.16.1.252/~joydeep/CS1201_DataStruct/

Lead Teaching Assistants
• Medhashree Ghosh (medhasree_2121cs05@iitp.ac.in)
• Kartik Kaushik (kartik_2221cs32@iitp.ac.in)
• Sandip Kumar (sandeep_2121cs29@iitp.ac.in)

TextBooks
• Data Structures using C and C++ by Yedidyah Langsam, Moshe J. Augenstein and Aaron M. Tenenbaum
• Data Structures and Algorithms by Aho, Ullman, and Hopcroft

Introduction
What is data?
• Raw, unprocessed facts and figures without context or meaning.
• Often unorganized and random.
• Collected directly from various sources (e.g., sensors, surveys, databases).
• May consist of numbers, text, symbols, or measurements.
• Examples
  • 23, "John", 5.5, "Blue"
  • Sensor readings: Temperature: 28°C, Pressure: 1013 hPa

Information
• Information is processed, organized, or structured data that has some meaning or context
• Examples
  • "John is 23 years old, 5.5 feet tall, and likes the color blue."
  • "The current weather is sunny with a temperature of 28°C and normal pressure."

Computers and Information
• A computer is a machine that manipulates information
• An entire branch of computer science is dedicated to the study of
  • How information is organized in a computer
  • How it can be manipulated
  • How it can be utilized
• But fundamentally, information has
  • No formal definition, only a notional one
  • However, we can measure quantities of information

Measures
• The basic unit of information is the bit
  • One of 2 mutually exclusive possibilities, zero or one, but not both
  • Analogous to a switch with off and on states
• With $n$ bits, we can represent $2^n$ possibilities
  • E.g., 3 bits can be used to represent the integers from 0 to 7
  • However, no strict mapping exists from one possibility to a specific integer
  • Any assignment is fine as long as each possibility represents a unique integer

Interpreting bit settings to represent integers
• Binary number system and mapping to decimal
  • Each position represents a power of 2
  • The rightmost bit represents $2^0 = 1$
  • The next bit represents $2^1 = 2$, and so on
  • 11001 is $2^0 + 2^3 + 2^4 = 25$
• How are negative numbers represented?
  • Signed bit representation
  • One's complement
  • Two's complement
• Other ways of interpreting bit settings as integers
  • Binary Coded Decimal (BCD)

Interpreting bit settings to represent real numbers
• Floating point notation
  • Represented as $\text{mantissa} \times \text{base}^{\text{exponent}}$
  • The base is usually fixed; the mantissa and exponent vary to represent different real numbers
  • E.g., 198.25 can be represented as $19825 \times 10^{-2}$
• In the 32-bit representation considered here, a floating point number has a 24-bit mantissa and an 8-bit exponent
  • Both mantissa and exponent are two's complement numbers
• What are the largest and smallest real numbers that can be represented using 32 bits?

Representing non-numerical data (Character Strings)
• A different method of interpreting bit strings is necessary
• Bit patterns may be mapped to characters in different ways
  • ASCII codes (7 bits)
  • Unicode (UTF-8, UTF-16, and UTF-32 use 8-, 16-, and 32-bit code units respectively; UTF-8 and UTF-16 may need more than one unit per character)
• Strings are represented by concatenating the bit patterns of each character in the string

Data Types
• Thus data itself has no meaning
• Any meaning can be assigned to a particular bit pattern to generate information, as long as it is done consistently
• Interpretation of the data (bit patterns) gives it a meaning
  • E.g., 100011 can be interpreted as the number 35 (binary), the number 23 (BCD), or the character '#'
• A method for interpreting bit patterns is called a data type
  • Different data types: integer, float, character, etc.
• Every computer has a set of native data types
  • Each is constructed with a mechanism for manipulating bit patterns that is consistent with the objects they represent
  • E.g., adding 2 floats requires a different mechanism than adding 2 integers

Implementing data types
• One perspective on data types
  • The set of native data types that a particular computer can support is determined by the functions supported by the hardware
  • E.g., integer addition and subtraction, float addition and subtraction
• The other perspective on data types
  • In terms of what the user wants to be done
  • Often by manipulating the mathematical concepts of the supported data types
  • In this case a data type is an abstract concept defined by a set of logical properties
• Example
  • A computer system supports an instruction MOVE(src, dest, len) that copies a character string of len bytes from location src to location dest
  • Suppose we want to copy a string of arbitrary length that is not known a priori (MOVEVAR)
  • We can decide on a new representation in memory and on how that representation is to be manipulated

Implementing MOVEVAR
The string is stored as a length byte followed by its characters:

    5 H E L L O

    /* Implementing MOVEVAR(src, dest) */
    MOVE(src, dest, 1);              /* copy the length byte first */
    for (i = 1; i <= dest; i++)      /* dest now holds the length */
        MOVE(src[i], dest[i], 1);    /* copy one character at a time */

Implementing CONCATVAR

    c1:  5 H E L L O
    c2:  9 E V E R Y B O D Y
    c3: 14 H E L L O E V E R Y B O D Y

    /* Implementing CONCATVAR(c1, c2, c3) using MOVE */
    z = c1 + c2;                     /* combined length */
    for (i = 1; i <= c1; i++)
        MOVE(c1[i], c3[i], 1);       /* copy the first string */
    for (i = 1; i <= c2; i++) {
        x = c1 + i;
        MOVE(c2[i], c3[x], 1);       /* copy the second string after it */
    }
    MOVE(z, c3, 1);                  /* store the combined length */

    /* Implementing CONCATVAR(c1, c2, c3) using MOVEVAR */
    z = c1 + c2;
    MOVEVAR(c2, c3[c1]);             /* second string; its length byte lands on c3[c1] */
    MOVEVAR(c1, c3);                 /* first string; overwrites c3[c1] with its last character */
    MOVE(z, c3, 1);                  /* finally store the combined length */

Data types
• In a programming context, a data type
  • Is a classification that specifies the type of data that a variable or object can hold
• It is defined by the following characteristics
  • Value Range: specifies the range of values that a variable can take (e.g., integers, floating-point numbers).
    • Termed the DOMAIN
  • Memory Layout: determines how the data is stored in memory.
  • Operations: defines the operations that can be performed (e.g., arithmetic, logical, or relational).

Common data types
• Primitive Data Types: basic types provided by a programming language (e.g., int, float, char in C).
• Composite Data Types: formed by combining other types (e.g., arrays, structures, classes in C++).
• Abstract Data Types (ADTs): logical descriptions of a data model (e.g., stack, queue) without specifying implementation details.

Abstract Data Types
• A useful tool for specifying the logical properties of a data type
• An ADT refers to the basic mathematical concept that defines a data type
  • A collection of values and the set of operations on those values
• An ADT is not concerned with implementation details at all
  • Only with the domain of a data type and the set of operations
• Abstraction vs. implementation: any specific implementation of an operation is not an ADT
• Example
  • Arrays in C are not ADTs, because they enforce how the operations are implemented, such as directly accessing a position using the array index.
  • Strings are ADTs: their operations are defined, but these operations can be implemented by various means

Specifying ADTs
• Can be specified using a number of methods, including programming languages such as C, C++, and Java
• Example: ADT of RATIONAL numbers
  • A rational number can be expressed as the quotient of 2 integers, $\frac{n}{d}$, where $d \neq 0$
  • Defined operations can be
    • Create a rational number
    • Add 2 rational numbers
    • Multiply 2 rational numbers
    • Check for equality of 2 rational numbers
• ADT definitions consist of 2 parts
  • Value definition
  • Operator definition

Specification of RATIONAL
[Figure: specification of the RATIONAL ADT]

Implementation in C
• Structures (struct)
  • C structures allow you to group data elements together, which can represent the data for an ADT.
• Functions
  • Functions operate on the data encapsulated within the structure, representing the operations or behaviors of the ADT.
• Encapsulation (via pointers and modular design)
  • While C lacks true encapsulation, one can achieve it by hiding the implementation details in .c files and exposing only the necessary interface in .h header files.

RATIONAL in C

    #include <stdio.h>
    #include <stdlib.h>

    typedef struct rational {
        int numerator;
        int denominator;
    } RATIONAL;

    /* Create the rational number n/d; returns NULL if d is zero
       or if memory cannot be allocated. */
    RATIONAL *makeRational(int n, int d)
    {
        if (d == 0) {
            printf("Denominator cannot be zero\n");
            return NULL;
        }
        RATIONAL *r = (RATIONAL *)malloc(sizeof(RATIONAL));
        if (r == NULL)
            return NULL;
        r->numerator = n;
        r->denominator = d;
        return r;
    }

    int main(void)
    {
        RATIONAL *r1 = makeRational(2, 3);
        if (r1 != NULL) {
            printf("Num=%d, Den=%d\n", r1->numerator, r1->denominator);
            free(r1);
        }
        return 0;
    }

RATIONAL in C
• Checking for equality of 2 rational numbers

    /* a/b and c/d are equal exactly when a*d == c*b */
    int checkEqual(RATIONAL *r1, RATIONAL *r2)
    {
        if (r1->numerator * r2->denominator == r2->numerator * r1->denominator)
            return 1;
        return 0;
    }

Digital Computer
• Computers have tremendous data processing capability, far greater than that of humans.
• It is worth using such a device to solve different problems.
• Computers can solve many problems. They are programmable (they can be instructed to do a task).
• Computers are equipped with:
  • Input device: used to provide input instances and input programs. Ex: keyboard
  • Output device: notifies the user about the result of computation.
    Ex: screen, printer
  • Processing Unit: performs the actual computation
    • Arithmetic and Logic Unit (ALU): provides the basic operational units of the computer
    • Control Unit: controls the flow of data and instructions
    • Registers: scratch locations used to store intermediate results and values
  • External Memory: memory outside the processor that provides storage space
    • Main memory: memory closest to the CPU, into which programs are loaded for execution
    • Secondary memory: larger memory meant for offline storage

How Do Programs Run?
• Program = Algorithm (set of instructions) + Data
• High-level code → assembly code → machine code
• [Figure: memory layout of a C program]

Need for Data Organization
• Programs involve data access and manipulation following a certain logic

    /* Compute the sum of two integers */
    #include <stdio.h>

    int main(void)
    {
        int a, b, c;        /* data */
        a = 10;
        b = 20;
        c = a + b;          /* logic */
        printf("\n The sum of %d and %d is %d\n", a, b, c);
        return 0;
    }

• Requirement: an organized approach to data access and manipulation
  1. The data model should be rich enough to mirror the actual relationships of the data in the real world
  2. The structure should be simple enough that the data can be processed easily when necessary

Example of Data Organization
• Problem: search for a number k in a given set of N numbers
• Solution:
  1. Store the numbers in an array: 3 7 6 1 10 8
  2. Linearly scan the array until k is found or the array is exhausted
• Organization: array
• Number of checks
  1. Best case: 1
  2. Worst case: N

Alternate Way of Data Organization
• Problem: search for a number k in a given set of N numbers
• Solution:
  1. Store the numbers as a tree [Figure: tree containing the keys 7, 10, 3, 1, 6, 8]
  2. Search the tree until k is found
• Number of checks
  1. Best case: 1
  2. Worst case: about $\log_2 N$, provided the tree is balanced

Analyzing Data Organizations (Structure)
• Assume N = 1,000,000,000 and a 1 GHz processor ($10^9$ cycles per second), with 1 cycle per transaction
• The array-based implementation takes $N$ steps, requiring 1 billion clock cycles (about 1 second)
• The tree-based implementation takes $\log_2 N$ steps, requiring about 30 clock cycles (about 30 nanoseconds)
• Inference: Data Organization 2 (tree) is better than Data Organization 1 (array) in terms of the number of checks performed to search for a number within a set of numbers
• The choice of data structure affects the performance of an algorithm

Data Structure
• A container for data that allows organized access and manipulation
• Examples:
  1. Array
  2. Linked List
  3. Tree
  4. Graph
  5. Stack
  6. Queue
• And more

Data Structure Applications: Example 1
• Scheduling jobs in a printer
• Data structure: printer queue
• Functions to support: insert, delete
• Special accommodations needed for: priority scheduling, dynamic update

Data Structure Applications: Example 2
• Exploring the Facebook connection network
• Tell who is connected to your profile, directly and indirectly
• Data structure: network graph
• Functions to explore: degree of separation

Asymptotic Complexity
• Running time of an algorithm as a function of input size $n$, for large $n$.
• Expressed using only the highest-order term in the expression for the exact running time.
  • Instead of the exact running time, say $\Theta(n^2)$.
• Describes the behavior of the function in the limit.
• Written using asymptotic notation.

Asymptotic Notation
• $\Theta$, $O$, $\Omega$, $o$, $\omega$
• Defined for functions over the natural numbers.
  • Ex: $f(n) = \Theta(n^2)$.
  • Describes how $f(n)$ grows in comparison to $n^2$.
• Each notation defines a set of functions; in practice it is used to compare the sizes of two functions.
• The notations describe different rate-of-growth relations between the defining function and the defined set of functions.
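
Returning to the search example above, the following C sketch counts the checks made by each organization, which is the quantity the asymptotic notation is meant to describe. It is only an illustration: the names `linear_search`, `tree_search`, and `struct node` are hypothetical, and the tree is assumed to be a binary search tree (smaller keys to the left, larger keys to the right), which the slides do not state explicitly.

```c
#include <stddef.h>

/* Linear scan of an array: returns the index of k, or -1 if absent.
   *checks receives the number of elements examined. */
int linear_search(const int *a, size_t n, int k, size_t *checks)
{
    *checks = 0;
    for (size_t i = 0; i < n; i++) {
        (*checks)++;
        if (a[i] == k)
            return (int)i;
    }
    return -1;
}

/* Node of a binary search tree (assumed layout, not from the slides). */
struct node {
    int key;
    const struct node *left, *right;
};

/* Walks down from the root: returns 1 if k is found, 0 otherwise.
   *checks receives the number of nodes examined. */
int tree_search(const struct node *root, int k, size_t *checks)
{
    *checks = 0;
    while (root != NULL) {
        (*checks)++;
        if (k == root->key)
            return 1;
        /* Smaller keys live in the left subtree, larger in the right. */
        root = (k < root->key) ? root->left : root->right;
    }
    return 0;
}
```

With the array 3 7 6 1 10 8, finding 8 takes 6 checks. If the same keys are arranged as a search tree with root 7, children 3 and 10, and leaves 1, 6, and 8, finding 8 takes 3 checks (7, then 10, then 8).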

$\Theta$-notation
For a function $g(n)$, we define $\Theta(g(n))$, big-Theta of $g(n)$, as the set:
$\Theta(g(n)) = \{ f(n) : \exists \text{ positive constants } c_1, c_2, \text{ and } n_0 \text{ such that } \forall n \ge n_0,\ 0 \le c_1 g(n) \le f(n) \le c_2 g(n) \}$
• Intuitively: the set of all functions that have the same rate of growth as $g(n)$.
• $g(n)$ is an asymptotically tight bound for $f(n)$.
• Technically, $f(n) \in \Theta(g(n))$. Older usage: $f(n) = \Theta(g(n))$. I'll accept either.
• $f(n)$ and $g(n)$ are nonnegative for large $n$.

Example
• $10n^2 - 3n = \Theta(n^2)$
• What constants for $n_0$, $c_1$, and $c_2$ will work?
• Make $c_1$ a little smaller than the leading coefficient, and $c_2$ a little bigger.
• To compare orders of growth, look at the leading term.
• Exercise: prove that $n^2/2 - 3n = \Theta(n^2)$

Example
• Is $3n^3 \in \Theta(n^4)$?
• How about $2^{2n} \in \Theta(2^n)$?

$O$-notation
For a function $g(n)$, we define $O(g(n))$, big-O of $g(n)$, as the set:
$O(g(n)) = \{ f(n) : \exists \text{ positive constants } c \text{ and } n_0 \text{ such that } \forall n \ge n_0,\ 0 \le f(n) \le c\,g(n) \}$
• Intuitively: the set of all functions whose rate of growth is the same as or lower than that of $g(n)$.
• $g(n)$ is an asymptotic upper bound for $f(n)$.
• $f(n) = \Theta(g(n)) \Rightarrow f(n) = O(g(n))$.
• $\Theta(g(n)) \subset O(g(n))$.

Examples
• Any linear function $an + b$ is in $O(n^2)$. How?
• Show that $3n^3 = O(n^4)$ for appropriate $c$ and $n_0$.
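
As a worked illustration of choosing constants, the derivation below picks one valid set for two of the examples above. The particular values ($c_1 = 7$, $c_2 = 10$, $c = 3$, $n_0 = 1$) are just one choice among many that satisfy the definitions.

```latex
% Theta example: 10n^2 - 3n = \Theta(n^2) with c_1 = 7, c_2 = 10, n_0 = 1.
% For n >= 1 we have 3n <= 3n^2, which yields the lower bound.
\begin{align*}
7n^2 \;=\; 10n^2 - 3n^2 \;\le\; 10n^2 - 3n \;\le\; 10n^2
  \qquad \text{for all } n \ge 1.
\end{align*}

% O example: 3n^3 = O(n^4) with c = 3, n_0 = 1.
% For n >= 1, multiplying by n cannot make the value smaller.
\begin{align*}
0 \;\le\; 3n^3 \;\le\; 3n^3 \cdot n \;=\; 3n^4
  \qquad \text{for all } n \ge 1.
\end{align*}
```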

$\Omega$-notation
For a function $g(n)$, we define $\Omega(g(n))$, big-Omega of $g(n)$, as the set:
$\Omega(g(n)) = \{ f(n) : \exists \text{ positive constants } c \text{ and } n_0 \text{ such that } \forall n \ge n_0,\ 0 \le c\,g(n) \le f(n) \}$
• Intuitively: the set of all functions whose rate of growth is the same as or higher than that of $g(n)$.
• $g(n)$ is an asymptotic lower bound for $f(n)$.
• $f(n) = \Theta(g(n)) \Rightarrow f(n) = \Omega(g(n))$.
• $\Theta(g(n)) \subset \Omega(g(n))$.

Example
• $\sqrt{n} = \Omega(\lg n)$. Choose $c$ and $n_0$.

Relations Between $\Theta$, $O$, $\Omega$
• Theorem: for any two functions $g(n)$ and $f(n)$, $f(n) = \Theta(g(n))$ iff $f(n) = O(g(n))$ and $f(n) = \Omega(g(n))$.
• I.e., $\Theta(g(n)) = O(g(n)) \cap \Omega(g(n))$
• In practice, asymptotically tight bounds are obtained from asymptotic upper and lower bounds.

Running Times
• "Running time is $O(f(n))$" $\Rightarrow$ worst case is $O(f(n))$
• An $O(f(n))$ bound on the worst-case running time $\Rightarrow$ an $O(f(n))$ bound on the running time of every input.
• A $\Theta(f(n))$ bound on the worst-case running time $\not\Rightarrow$ a $\Theta(f(n))$ bound on the running time of every input.
• "Running time is $\Omega(f(n))$" $\Rightarrow$ best case is $\Omega(f(n))$
• We can still say "worst-case running time is $\Omega(f(n))$"
  • This means the worst-case running time is given by some unspecified function $g(n) \in \Omega(f(n))$.

Example
• Insertion sort takes $\Theta(n^2)$ in the worst case, so sorting (as a problem) is $O(n^2)$. Why?
• Any sorting algorithm must look at each item, so sorting is $\Omega(n)$.
• In fact, using (e.g.) merge sort, sorting is $\Theta(n \lg n)$ in the worst case.
• Later, we will prove that no comparison sort can do better in the worst case.

Asymptotic Notation in Equations
• We can use asymptotic notation in equations to replace expressions containing lower-order terms.
• For example,
  $4n^3 + 3n^2 + 2n + 1 = 4n^3 + 3n^2 + \Theta(n) = 4n^3 + \Theta(n^2) = \Theta(n^3)$.
  How should this be interpreted?
• In equations, $\Theta(f(n))$ always stands for an anonymous function $g(n) \in \Theta(f(n))$
  • In the example above, $\Theta(n^2)$ stands for $3n^2 + 2n + 1$.
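
The running-time statements about insertion sort above can be observed directly by counting key comparisons. The following C sketch is one straightforward way to write insertion sort (the function name `insertion_sort` and the choice to count comparisons are assumptions made for illustration). On an already sorted input it makes $n - 1$ comparisons, and on a reverse-sorted input it makes $n(n-1)/2$.

```c
#include <stddef.h>

/* Sorts a[0..n-1] in ascending order, in place.
   Returns the number of key comparisons performed. */
size_t insertion_sort(int *a, size_t n)
{
    size_t comparisons = 0;
    for (size_t i = 1; i < n; i++) {
        int key = a[i];
        size_t j = i;
        /* Shift elements larger than key one slot to the right. */
        while (j > 0) {
            comparisons++;
            if (a[j - 1] <= key)
                break;          /* key's position has been found */
            a[j] = a[j - 1];
            j--;
        }
        a[j] = key;
    }
    return comparisons;
}
```

For five elements, a sorted array costs 4 comparisons while a reverse-sorted array costs 10, so the best case grows linearly and the worst case quadratically, which is why a $\Theta(n^2)$ worst-case bound does not carry over to every input.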

$o$-notation
For a given function $g(n)$, the set little-o:
$o(g(n)) = \{ f(n) : \forall c > 0,\ \exists n_0 > 0 \text{ such that } \forall n \ge n_0,\ 0 \le f(n) < c\,g(n) \}$
• $f(n)$ becomes insignificant relative to $g(n)$ as $n$ approaches infinity:
  $\lim_{n \to \infty} \frac{f(n)}{g(n)} = 0$
• $g(n)$ is an upper bound for $f(n)$ that is not asymptotically tight.
• Observe the difference between this definition and the previous ones. Why?

$\omega$-notation
For a given function $g(n)$, the set little-omega:
$\omega(g(n)) = \{ f(n) : \forall c > 0,\ \exists n_0 > 0 \text{ such that } \forall n \ge n_0,\ 0 \le c\,g(n) < f(n) \}$
• $f(n)$ becomes arbitrarily large relative to $g(n)$ as $n$ approaches infinity:
  $\lim_{n \to \infty} \frac{f(n)}{g(n)} = \infty$
• $g(n)$ is a lower bound for $f(n)$ that is not asymptotically tight.

Comparison of Functions
Comparing $f$ with $g$ is analogous to comparing two numbers $a$ and $b$:
• $f(n) = O(g(n)) \approx a \le b$
• $f(n) = \Omega(g(n)) \approx a \ge b$
• $f(n) = \Theta(g(n)) \approx a = b$
• $f(n) = o(g(n)) \approx a < b$
• $f(n) = \omega(g(n)) \approx a > b$

Limits
• $\lim_{n \to \infty} \frac{f(n)}{g(n)} = 0 \Rightarrow f(n) \in o(g(n))$
• $\lim_{n \to \infty} \frac{f(n)}{g(n)} < \infty \Rightarrow f(n) \in O(g(n))$
• $0 < \lim_{n \to \infty} \frac{f(n)}{g(n)} < \infty \Rightarrow f(n) \in \Theta(g(n))$
• $0 < \lim_{n \to \infty} \frac{f(n)}{g(n)} \Rightarrow f(n) \in \Omega(g(n))$
• $\lim_{n \to \infty} \frac{f(n)}{g(n)} = \infty \Rightarrow f(n) \in \omega(g(n))$
• $\lim_{n \to \infty} \frac{f(n)}{g(n)}$ undefined $\Rightarrow$ can't say

Properties
• Transitivity
  • $f(n) = \Theta(g(n))$ and $g(n) = \Theta(h(n)) \Rightarrow f(n) = \Theta(h(n))$
  • $f(n) = O(g(n))$ and $g(n) = O(h(n)) \Rightarrow f(n) = O(h(n))$
  • $f(n) = \Omega(g(n))$ and $g(n) = \Omega(h(n)) \Rightarrow f(n) = \Omega(h(n))$
  • $f(n) = o(g(n))$ and $g(n) = o(h(n)) \Rightarrow f(n) = o(h(n))$
  • $f(n) = \omega(g(n))$ and $g(n) = \omega(h(n)) \Rightarrow f(n) = \omega(h(n))$
• Reflexivity
  • $f(n) = \Theta(f(n))$
  • $f(n) = O(f(n))$
  • $f(n) = \Omega(f(n))$

Properties
• Symmetry
  • $f(n) = \Theta(g(n))$ iff $g(n) = \Theta(f(n))$
• Complementarity
  • $f(n) = O(g(n))$ iff $g(n) = \Omega(f(n))$
  • $f(n) = o(g(n))$ iff $g(n) = \omega(f(n))$

Common Functions
Monotonicity
• $f(n)$ is
  • monotonically increasing if $m \le n \Rightarrow f(m) \le f(n)$.
  • monotonically decreasing if $m \le n \Rightarrow f(m) \ge f(n)$.
  • strictly increasing if $m < n \Rightarrow f(m) < f(n)$.
  • strictly decreasing if $m < n \Rightarrow f(m) > f(n)$.
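
The limit rules listed above can be applied mechanically. As a small worked sketch (the three function pairs are chosen here purely for illustration and do not come from the slides):

```latex
\begin{align*}
\lim_{n\to\infty}\frac{3n^2+2n}{n^2}
  &= \lim_{n\to\infty}\left(3+\frac{2}{n}\right) = 3
  &&\Rightarrow\; 3n^2+2n \in \Theta(n^2), \\
\lim_{n\to\infty}\frac{n}{n^2}
  &= \lim_{n\to\infty}\frac{1}{n} = 0
  &&\Rightarrow\; n \in o(n^2), \\
\lim_{n\to\infty}\frac{n^2}{n}
  &= \lim_{n\to\infty} n = \infty
  &&\Rightarrow\; n^2 \in \omega(n).
\end{align*}
```

The last two lines also illustrate complementarity: $n = o(n^2)$ holds exactly when $n^2 = \omega(n)$.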

Exponentials
• Useful identities:
  • $a^{-1} = \frac{1}{a}$
  • $(a^m)^n = a^{mn}$
  • $a^m a^n = a^{m+n}$
• Exponentials and polynomials (for $a > 1$):
  $\lim_{n \to \infty} \frac{n^b}{a^n} = 0 \Rightarrow n^b = o(a^n)$

Logarithms
• $x = \log_b a$ is the exponent for $a = b^x$.
• Natural log: $\ln a = \log_e a$
• Binary log: $\lg a = \log_2 a$
• $\lg^2 a = (\lg a)^2$
• $\lg \lg a = \lg(\lg a)$
• Identities:
  • $a = b^{\log_b a}$
  • $\log_c(ab) = \log_c a + \log_c b$
  • $\log_b a^n = n \log_b a$
  • $\log_b a = \frac{\log_c a}{\log_c b}$
  • $\log_b(1/a) = -\log_b a$
  • $\log_b a = \frac{1}{\log_a b}$
  • $a^{\log_b c} = c^{\log_b a}$

Logarithms and exponentials: Bases
• If the base of a logarithm is changed from one constant to another, the value is altered by a constant factor.
  • Ex: $\log_{10} n \cdot \log_2 10 = \log_2 n$.
  • The base of the logarithm is not an issue in asymptotic notation.
• Exponentials with different bases differ by an exponential factor (not a constant factor).
  • Ex: $2^n = (2/3)^n \cdot 3^n$.

Polylogarithms
• For $a \ge 0$, $b > 0$, $\lim_{n \to \infty} \frac{\lg^a n}{n^b} = 0$, so $\lg^a n = o(n^b)$ and $n^b = \omega(\lg^a n)$
  • Prove using L'Hopital's rule repeatedly
• $\lg(n!) = \Theta(n \lg n)$
  • Prove using Stirling's approximation (in the text) for $\lg(n!)$.

Exercise
Express the functions in A in asymptotic notation using the functions in B.
1. $A = 5n^2 + 100n$, $B = 3n^2 + 2$: $A \in \Theta(B)$
   • $A \in \Theta(n^2)$ and $n^2 \in \Theta(B) \Rightarrow A \in \Theta(B)$
2. $A = \log_3(n^2)$, $B = \log_2(n^3)$: $A \in \Theta(B)$
   • $\log_b a = \log_c a / \log_c b$; $A = 2 \lg n / \lg 3$, $B = 3 \lg n$, $A/B = 2/(3 \lg 3)$
3. $A = n^{\lg 4}$, $B = 3^{\lg n}$: $A \in \omega(B)$
   • $a^{\log b} = b^{\log a}$; $B = 3^{\lg n} = n^{\lg 3}$; $A/B = n^{\lg(4/3)} \to \infty$ as $n \to \infty$
4. $A = \lg^2 n$, $B = n^{1/2}$: $A \in o(B)$
   • $\lim_{n \to \infty} \frac{\lg^a n}{n^b} = 0$ (here $a = 2$ and $b = 1/2$) $\Rightarrow A \in o(B)$
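
The slides prove $\lg(n!) = \Theta(n \lg n)$ via Stirling's approximation. As an alternative sketch (a different route from the one the slides take), elementary bounds on the sum $\sum \lg i$ give the same result, with the constants $c_1 = 1/4$, $c_2 = 1$, and $n_0 = 4$ chosen here for illustration:

```latex
% Upper bound: each of the n terms is at most lg n.
% Lower bound: keep only the terms with i >= n/2; there are at least n/2
% of them and each is at least lg(n/2).
\begin{align*}
\lg(n!) &= \sum_{i=1}^{n} \lg i \;\le\; n \lg n, \\
\lg(n!) &\ge \sum_{i=\lceil n/2 \rceil}^{n} \lg i
         \;\ge\; \frac{n}{2}\,\lg\frac{n}{2}
         \;=\; \frac{n}{2}\,(\lg n - 1)
         \;\ge\; \frac{n}{4}\,\lg n \qquad (n \ge 4).
\end{align*}
% The final step uses lg n - 1 >= (lg n)/2, which holds once lg n >= 2.
```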