Uploaded by nachiketdeshmukh123

notes chap 1

The importance of databases in both enterprise and daily life:
1. Role of Databases: Databases are essential not only for enterprises but also
for personal daily activities. They store and manage data that is accessed and
utilized in various ways.
2. Evolution of Database Interaction:
- Early databases were mainly back-office systems, and users interacted with
them through printed reports and paper forms.
- Over time, databases became more sophisticated, with improved programming
languages and user interfaces for querying and updating data.
- Advancements in hardware and software led to the development of more direct
user interactions with databases through applications.
3. User Interaction with Databases:
- Today, people interact with databases through web and mobile applications
without being aware of the underlying database structure.
- Examples include social media posts, online shopping, banking transactions,
and personalized advertisements, all of which rely on database access.
- Smartphones also frequently access databases for various functions.
4. Two Modes of Database Use:
- Online Transaction Processing (OLTP): This mode involves a large number of
users who retrieve small amounts of data and perform small updates. It is the
primary mode for most database applications used by individuals and businesses.
- Data Analytics: In this mode, data is processed to draw conclusions, infer
rules, or make decisions. Examples include banks determining loan eligibility
and online advertisers targeting ads to users based on data analysis.
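The difference between the two modes can be sketched with Python's built-in sqlite3 module. The `orders` table and its rows are hypothetical, invented only for illustration: the OLTP-style statements touch one row by key, while the analytics-style query scans and summarizes many rows.

```python
import sqlite3

# Hypothetical schema and rows, invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("alice", 20.0), ("bob", 35.0), ("alice", 45.0)],
)

# OLTP-style work: read and update a single row identified by its key.
conn.execute("UPDATE orders SET amount = 25.0 WHERE order_id = 1")
one_order = conn.execute(
    "SELECT customer, amount FROM orders WHERE order_id = 1"
).fetchone()

# Analytics-style work: scan many rows and summarize them.
totals = conn.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()
conn.commit()
```

In a real deployment the two workloads often run on separate systems; a single in-memory database is used here only to keep the sketch short.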
5. Data Analysis and Predictive Models:
- Businesses use data analysis techniques to discover patterns and create
predictive models.
- Predictive models take input attributes (features) and generate
predictions, which are used for making business decisions.
- Examples include loan approval decisions and ad targeting decisions.
6. Importance of Data Mining:
- Data mining is a field that combines knowledge-discovery techniques from
artificial intelligence and statistics with efficient implementation on large
databases.
- It plays a crucial role in analyzing vast datasets to extract valuable
insights and patterns for decision-making.
In summary, databases play a central role in modern life, enabling various
interactions and decision-making processes. They support both transactional
operations and data analytics, and data mining techniques are used to extract
meaningful information from large datasets for informed decision-making.
Purpose of database systems: this part explains why database systems exist by
highlighting the shortcomings of conventional file-processing systems in
managing data. Here are the key points:
1. Challenges with File-Processing Systems:
- In a university organization (used as an example), data about instructors,
students, departments, and courses are stored in operating system files.
- Application programs are developed to manipulate these files, including
adding new records, registering students, assigning grades, and more.
- As the university's needs change, new files and application programs are
added to the system.
2. Disadvantages of File-Processing Systems:
- Data Redundancy and Inconsistency: Different programmers may create files
and programs with different structures, leading to data redundancy and
inconsistency.
- Difficulty in Accessing Data: Retrieving specific data can be challenging
because there may not be existing application programs to meet certain queries.
- Data Isolation: Data is scattered across various files in different
formats, making it difficult to create new programs to retrieve data.
- Integrity Problems: Ensuring data consistency and enforcing constraints
(e.g., account balances must not go below zero) can be complex.
- Atomicity Problems: Ensuring that complex transactions are atomic (either
fully completed or not at all) is difficult.
- Concurrent-Access Anomalies: Multiple users updating data simultaneously
can lead to inconsistent data states.
- Security Problems: Enforcing security constraints, such as limiting access
to specific data based on user roles, is challenging in file-based systems.
3. Purpose of Database Systems:
- Database systems are introduced to overcome the limitations of
file-processing systems.
- They provide a centralized and organized way to store, manage, and retrieve
data.
- Concepts and algorithms within database systems aim to solve the problems
associated with file-based data processing.
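As one illustration of how a database system addresses the integrity problem, the rule "account balances must not go below zero" can be declared once in the schema instead of being re-implemented in every application program. This is a minimal sketch using Python's sqlite3 module; the `account` table and the `withdraw` helper are hypothetical names chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The integrity rule is declared once, in the schema (hypothetical table).
conn.execute(
    "CREATE TABLE account ("
    " account_id INTEGER PRIMARY KEY,"
    " balance REAL NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account (account_id, balance) VALUES (1, 100.0)")
conn.commit()


def withdraw(amount):
    """Try a withdrawal; return True if the database accepted it."""
    try:
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE account_id = 1",
            (amount,),
        )
        conn.commit()
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected the update; discard the transaction.
        conn.rollback()
        return False


accepted = withdraw(30.0)   # leaves a balance of 70.0
rejected = withdraw(500.0)  # would make the balance negative
balance = conn.execute(
    "SELECT balance FROM account WHERE account_id = 1"
).fetchone()[0]
```

Because the database enforces the rule, any program that updates `account` is covered, including programs written later.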
Data models:
- Data models are conceptual tools used to describe data, data relationships,
data semantics, and consistency constraints in a database.
- There are different data models covered in the text, classified into four
categories:
- Relational Model: Uses tables (relations) to represent data and
relationships among them. Each table has columns with unique names, and it is
widely used in modern database systems.
- Entity-Relationship Model: Uses entities and relationships to model data.
Entities represent real-world objects, and relationships describe connections
between them.
- Semi-Structured Data Model: Allows data items of the same type to have
different sets of attributes, unlike fixed-format records in other models. JSON
and XML are examples of semi-structured data representations.
- Object-Based Data Model: Integrates object-oriented concepts into
databases, allowing objects to be stored in relational tables and enabling the
execution of procedures within the database.
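The contrast between fixed-format relational records and semi-structured data can be sketched in Python. The instructor values below are invented for illustration, and plain tuples and JSON text stand in for what a database would store.

```python
import json

# Relational style: every row has the same fixed set of columns.
instructor_columns = ("ID", "name", "dept_name")
instructor_rows = [
    ("101", "Ada", "History"),
    ("102", "Lin", "Physics"),
]

# Semi-structured style: items of the same type may carry different attributes.
instructor_docs = json.loads("""
[
  {"ID": "101", "name": "Ada", "dept_name": "History"},
  {"ID": "102", "name": "Lin", "dept_name": "Physics",
   "office": "B-12", "phones": ["555-0100"]}
]
""")

# Every relational row matches the column list exactly.
same_shape = all(len(row) == len(instructor_columns) for row in instructor_rows)
# The two JSON objects do not share the same attribute set.
attribute_sets = [sorted(doc.keys()) for doc in instructor_docs]
```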
1. Data Abstraction Levels:
- Database systems use complex data structures to store and manage data
efficiently.
- To simplify interactions for users, multiple levels of data abstraction are
used, hiding the complexity of data storage and retrieval.
- The three primary levels of data abstraction are:
- Physical Level: Describes how data is physically stored, including
details of low-level data structures. It deals with the actual storage
mechanisms.
- Logical Level: Describes what data is stored in the database, the
relationships between data, and the overall database structure. It provides a
high-level, simplified view of the data.
- View Level: Describes specific parts of the database that users need to
access. It simplifies user interactions by providing tailored views of the data.
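A view is one common way to realize the view level. The sketch below, using Python's sqlite3 module, assumes a hypothetical `instructor` table with a `salary` column and defines a view (`instructor_public`, also a made-up name) that exposes everything except the salary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Logical level: the full table, including a sensitive column.
conn.execute(
    "CREATE TABLE instructor (ID TEXT PRIMARY KEY, name TEXT, dept_name TEXT, salary REAL)"
)
conn.execute("INSERT INTO instructor VALUES ('101', 'Ada', 'History', 60000)")

# View level: a tailored view that leaves the salary column out.
conn.execute(
    "CREATE VIEW instructor_public AS SELECT ID, name, dept_name FROM instructor"
)

cursor = conn.execute("SELECT * FROM instructor_public")
view_columns = [col[0] for col in cursor.description]
view_rows = cursor.fetchall()
```

Users who query only the view never see how or where the salary is stored.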
2. Data Models:
- Data models, such as the relational model, play a crucial role in data
abstraction. They hide low-level implementation details from users and
application developers.
- Application developers work with the abstractions provided by the data
model, and the database system handles the conversion of these operations into
low-level implementations.
3. Instances and Schemas:
- Databases change over time as data is added or removed. An instance of a
database represents the data stored in the database at a specific moment.
- The overall design of the database is referred to as the database schema.
- Database schemas can be categorized into:
- Physical Schema: Describes the physical storage and organization of data.
It is hidden beneath the logical schema and can be changed without affecting
application programs.
- Logical Schema: Describes the database design at the logical level,
including data structures and relationships. Programmers construct applications
based on the logical schema.
- View Schemas (Subschemas): Describe different views of the database at
the view level. They provide tailored views for specific users or applications.
4. Physical Data Independence:
- Application programs exhibit physical data independence if they do not
depend on the physical schema. This means they do not need to be rewritten if
the physical schema changes.
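Physical data independence can be demonstrated in a small way with an index: adding one changes how the data is accessed but not what the application code asks for or receives. This is a sketch with Python's sqlite3 module; the table contents and the index name `idx_instructor_dept` are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instructor (ID TEXT, name TEXT, dept_name TEXT)")
conn.executemany(
    "INSERT INTO instructor VALUES (?, ?, ?)",
    [("101", "Ada", "History"), ("102", "Lin", "Physics"), ("103", "Cy", "History")],
)


def history_names():
    """Application code: written against the logical schema only."""
    rows = conn.execute(
        "SELECT name FROM instructor WHERE dept_name = 'History' ORDER BY name"
    )
    return [row[0] for row in rows]


before = history_names()
# Change the physical organization only: add an index on dept_name.
conn.execute("CREATE INDEX idx_instructor_dept ON instructor (dept_name)")
after = history_names()  # same function, no rewrite needed
```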
Database languages: the Data-Definition Language (DDL) and the
Data-Manipulation Language (DML).
1. Database Language Integration:
- A database system offers a DDL for specifying the database schema and a
DML for expressing queries and updates.
- In practice, these languages are often integrated into a single database
language, such as SQL (Structured Query Language).
2. SQL in Relational Databases:
- SQL is widely used in relational database systems and is covered
extensively in later chapters.
3. Data-Definition Language (DDL):
- DDL is used to define the structure of the database schema.
- It is also used to specify additional properties and constraints on the
data.
- DDL statements can define the storage structure and access methods used by
the database system, though these details are typically hidden from users.
- DDL allows for the specification of consistency constraints to ensure data
integrity.
- Constraints can include:
- Domain Constraints: Specifying the valid domain of values for attributes.
- Referential Integrity: Ensuring that values in one relation's attributes
correspond to values in another relation's attributes.
- Authorization: Controlling user access to data, specifying read, insert,
update, or delete permissions.
4. Data Dictionary:
- Information generated by DDL statements is stored in a data dictionary.
- The data dictionary serves as metadata, containing information about the
database's structure, constraints, and other details.
- It is a special type of table accessible and updated only by the database
system itself, not regular users.
- The database system consults the data dictionary to enforce constraints and
access control.
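The DDL ideas above can be tried out with Python's sqlite3 module. This is a sketch under stated assumptions: the column types and sample rows are invented, SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`, and SQLite's catalog table `sqlite_master` plays the role of the data dictionary here. Other database systems expose their dictionaries differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific switch

conn.execute(
    "CREATE TABLE department ("
    " dept_name TEXT PRIMARY KEY,"
    " budget REAL CHECK (budget >= 0))"  # domain constraint on budget
)
conn.execute(
    "CREATE TABLE instructor ("
    " ID TEXT PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " dept_name TEXT REFERENCES department (dept_name))"  # referential integrity
)
conn.execute("INSERT INTO department VALUES ('History', 50000)")
conn.execute("INSERT INTO instructor VALUES ('101', 'Ada', 'History')")

# An instructor in a department that does not exist violates referential integrity.
try:
    conn.execute("INSERT INTO instructor VALUES ('102', 'Lin', 'Astrology')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# The definitions created by the DDL statements are recorded as metadata.
catalog = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
).fetchall()
table_names = [row[0] for row in catalog]
```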
2. Procedural vs. Declarative DML:
- Procedural DML requires users to specify both what data is needed and how
to obtain it.
- Declarative DML (nonprocedural DML) requires users to specify only what
data is needed, without specifying the steps to retrieve it.
- Declarative DMLs are typically easier to use but require the database
system to optimize the query execution.
3. Query Languages:
- A query is a statement that retrieves information from the database.
- A query language is used to express these queries.
- SQL (Structured Query Language) is a widely used nonprocedural query
language commonly used in relational database systems.
4. SQL Example:
- SQL queries take input from one or more tables and return a single table as
output.
- An example SQL query retrieves the names of instructors in the History
department:
```sql
SELECT instructor.name
FROM instructor
WHERE instructor.dept_name = 'History';
```
5. Queries Involving Multiple Tables:
- SQL queries can involve information from multiple tables by specifying how
tables are related.
- An example SQL query retrieves instructor IDs and department names for
instructors associated with departments having a budget greater than $95,000.
```sql
SELECT instructor.ID, department.dept_name
FROM instructor, department
WHERE instructor.dept_name = department.dept_name
  AND department.budget > 95000;
```
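Both queries can be run against a small in-memory database with Python's sqlite3 module. The rows and column types below are made up for illustration, the department column is assumed to be named `dept_name`, and an `ORDER BY` is added to the first query only so that the result order is predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE department (dept_name TEXT PRIMARY KEY, budget REAL)")
conn.execute("CREATE TABLE instructor (ID TEXT PRIMARY KEY, name TEXT, dept_name TEXT)")
conn.executemany(
    "INSERT INTO department VALUES (?, ?)",
    [("History", 50000), ("Physics", 120000)],
)
conn.executemany(
    "INSERT INTO instructor VALUES (?, ?, ?)",
    [("101", "Ada", "History"), ("102", "Lin", "Physics"), ("103", "Cy", "History")],
)

# Single-table query: names of instructors in the History department.
history_names = conn.execute(
    "SELECT instructor.name FROM instructor "
    "WHERE instructor.dept_name = 'History' ORDER BY instructor.name"
).fetchall()

# Two-table query: instructors whose department has a budget above 95000.
well_funded = conn.execute(
    "SELECT instructor.ID, department.dept_name "
    "FROM instructor, department "
    "WHERE instructor.dept_name = department.dept_name "
    "AND department.budget > 95000"
).fetchall()
```

Each result is itself a table: a list of rows, here with one and two columns respectively.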
6. Database Access from Application Programs:
- SQL is not as powerful as general-purpose programming languages, and
certain computations and actions (e.g., user input, display output) are not
supported by SQL.
- Application programs are written in languages like C/C++, Java, or Python
and use embedded SQL queries to access and manipulate the data.
- Application-program interfaces (APIs) like ODBC and JDBC facilitate
communication between the application and the database.
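Python's standard DB-API plays a role similar to ODBC and JDBC. The sketch below uses the sqlite3 module; the function `instructors_in` and the sample rows are hypothetical. The host language supplies the input and formats the output, while SQL does the retrieval through a bound parameter.

```python
import sqlite3


def instructors_in(conn, dept_name):
    """Return instructor names for one department, using a bound parameter."""
    rows = conn.execute(
        "SELECT name FROM instructor WHERE dept_name = ? ORDER BY name",
        (dept_name,),
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instructor (ID TEXT, name TEXT, dept_name TEXT)")
conn.executemany(
    "INSERT INTO instructor VALUES (?, ?, ?)",
    [("101", "Ada", "History"), ("102", "Lin", "Physics"), ("103", "Cy", "History")],
)

# The host language handles what SQL does not: input, formatting, output.
requested = "History"  # stands in for user input
names = instructors_in(conn, requested)
report = f"{requested}: {', '.join(names)}"
print(report)
```

Passing the value as a parameter, instead of pasting it into the SQL string, keeps user input from being interpreted as SQL.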
Database design is a crucial aspect of creating efficient and effective database
systems. Here are the key points related to database design:
1. Scope of Database Design:
- Database design primarily focuses on designing the database schema, which
involves defining the structure and organization of data in the database.
- While schema design is a significant part of the process, designing a
complete database application environment involves broader considerations, such
as application design.
2. Understanding User Requirements:
- The initial phase of database design involves understanding and
characterizing the data requirements of the users who will interact with the
database.
- Database designers need to collaborate with domain experts and users to
gather these requirements effectively.
3. Choosing a Data Model:
- Once user requirements are documented, the database designer selects an
appropriate data model.
- The chosen data model serves as a conceptual framework for specifying how
the data will be structured in the database.
4. Conceptual Schema:
- The designer translates user requirements into a conceptual schema for the
database. This schema provides a high-level overview of the enterprise or system
being modeled.
- During this phase, the focus is on describing the data and their
relationships, rather than specifying physical storage details.
5. Functional Requirements:
- A fully developed conceptual schema also includes functional requirements,
which describe the types of operations or transactions that will be performed on
the data.
- Examples of operations include data modification, retrieval, and deletion.
- The designer ensures that the schema can accommodate these functional
requirements.
6. Logical and Physical Design:
- The design process proceeds with two additional phases: logical design and
physical design.
- In the logical-design phase, the high-level conceptual schema is mapped
onto the implementation data model of the chosen database system.
- The physical-design phase focuses on specifying physical features, such as
file organization and storage structures.
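As a rough illustration of the last two phases, suppose the conceptual schema contains two entities, student and course, linked by a many-to-many relationship "takes". These names and rows are assumptions chosen for the example. One possible logical design maps each entity to a table and the relationship to a third table; an index is then added as a physical-design choice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific switch
conn.executescript("""
-- Logical design: one table per entity.
CREATE TABLE student (ID TEXT PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE course (course_id TEXT PRIMARY KEY, title TEXT NOT NULL);
-- The many-to-many relationship becomes its own table.
CREATE TABLE takes (
    ID TEXT REFERENCES student (ID),
    course_id TEXT REFERENCES course (course_id),
    PRIMARY KEY (ID, course_id)
);
-- Physical design: an index to speed up lookups by course.
CREATE INDEX idx_takes_course ON takes (course_id);

INSERT INTO student VALUES ('s1', 'Maya'), ('s2', 'Omar');
INSERT INTO course VALUES ('c1', 'Databases');
INSERT INTO takes VALUES ('s1', 'c1'), ('s2', 'c1');
""")

# A functional requirement the schema must support: list who takes a course.
enrolled = conn.execute(
    "SELECT student.name FROM student JOIN takes ON student.ID = takes.ID "
    "WHERE takes.course_id = 'c1' ORDER BY student.name"
).fetchall()
```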
The database engine is a crucial component of any database system. It is
responsible for managing and controlling various aspects of the database,
including storage, data retrieval, query processing, and transaction management.
Here's an overview of the key components and responsibilities of a database
engine:
1. Storage Manager:
- The storage manager is responsible for handling the physical storage of
data in the database. Since databases can be quite large, they are typically
stored on disk drives.
- Data is moved between disk storage and main memory (RAM) as needed for
processing.
- Key responsibilities of the storage manager include:
- Storing, retrieving, and updating data in the database.
- Managing file allocation and storage structures on disk.
- Implementing caching strategies to minimize data movement between disk
and memory.
2. Query Processor:
- The query processor is responsible for processing and optimizing queries
and data manipulation operations.
- It interprets high-level database queries and translates them into a
sequence of low-level operations that the DBMS can execute efficiently.
- The query processor consists of several components, including:
- DDL Interpreter: Interprets DDL statements for schema definition and
records the definitions in the data dictionary.
- DML Compiler: Translates DML statements into query execution plans,
considering query optimization to choose the most efficient plan.
- Query Evaluation Engine: Executes the low-level instructions generated by
the DML compiler.
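Many systems let you inspect the plan the query processor chose. In SQLite this is done with `EXPLAIN QUERY PLAN`, shown below through Python's sqlite3 module. The table and index names are hypothetical, and the exact wording of the plan text varies between SQLite versions, so the sketch only prints it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instructor (ID TEXT, name TEXT, dept_name TEXT)")
conn.execute("CREATE INDEX idx_instructor_dept ON instructor (dept_name)")

# A declarative query: it says what is wanted, not how to find it.
query = "SELECT name FROM instructor WHERE dept_name = 'History'"

# Ask the query processor which plan it picked for this query.
plan_rows = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
# The last column of each row holds a human-readable description of a plan step.
plan_text = " ".join(str(row[-1]) for row in plan_rows)
print(plan_text)
```

With the index in place, the plan would be expected to mention `idx_instructor_dept` instead of a full scan of the table.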
3. Transaction Management:
- Transaction management is responsible for ensuring the consistency and
durability of the database in the face of concurrent operations and system
failures.
- Key properties of transactions include atomicity (all-or-nothing),
consistency (preserving database consistency), and durability (changes survive
system failures).
- The transaction manager includes:
- Concurrency Control Manager: Ensures that concurrent transactions do not
interfere with each other, maintaining data consistency.
- Recovery Manager: Manages database recovery in case of system failures,
ensuring that the database can be restored to a consistent state.
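Atomicity can be sketched with a funds transfer in Python's sqlite3 module. The `account` table and the `transfer` function are hypothetical. The connection is used as a context manager, which commits if the block succeeds and rolls back if an exception escapes it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, balance REAL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100.0), ("B", 50.0)])
conn.commit()


def transfer(src, dst, amount):
    """Move funds as one transaction: both updates apply, or neither does."""
    try:
        with conn:  # commit on success, roll back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer("A", "B", 30.0)       # succeeds: A has 70.0, B has 80.0
failed = transfer("A", "B", 500.0)  # debit fails, so the credit is undone too
balances = dict(conn.execute("SELECT name, balance FROM account"))
```

The credit is deliberately issued before the debit so that the failed transfer shows a completed first step being rolled back.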
4. Storage Structures:
- The storage manager implements various data structures to organize and
access data efficiently.
- These structures include data files (storing the actual data), a data
dictionary (metadata about the database schema), and indices (providing fast
access to data items).
5. Parallel Processing:
- Modern database engines often employ parallel processing techniques to
efficiently handle large volumes of data.
- This includes parallel data storage and parallel query processing, allowing
for faster data retrieval and analysis.
6. Solid-State Disks (SSDs):
- With the increasing use of SSDs for database storage, the storage manager
may need to adapt to take advantage of the speed and reliability of these
devices.
Database engines are complex software systems designed to manage data
efficiently while providing high levels of reliability and performance. They
play a critical role in the functioning of modern database systems, enabling
users to interact with and extract valuable information from large datasets.
This part describes the architecture of a database system and how it interacts
with various types of users and components. The key elements of the database
and application architecture, as presented in Figure 1.3, are:
1. Users:
- The architecture distinguishes between different types of users, including
naive users (such as tellers and web users), application programmers, and
sophisticated users (such as analysts).
2. Application Interfaces:
- Naive users interact with the database system through application
interfaces tailored to their needs.
- Application programmers write application programs that utilize the
capabilities of the database system.
- Sophisticated users may use query tools and administration tools to
interact with the database.
3. Query Processor:
- The query processor is responsible for processing and optimizing database
queries.
- It includes components like the DDL interpreter, DML compiler, and query
evaluation engine.
4. Database Engine Components:
- These components include the DML compiler and organizer, query evaluation
engine, buffer manager, file manager, transaction manager, storage manager, and
authorization and integrity manager.
- They work together to manage data storage, execute queries, ensure data
integrity, and handle transactions.
This architecture provides a high-level overview of how different users interact
with the database system and how the various components of the database engine
collaborate to manage data and queries efficiently.
Additionally, the text mentions the evolution from a centralized server
architecture to distributed and parallel database systems, highlighting the
scalability and performance enhancements achieved by these advanced
architectures.
The architecture of database applications is also discussed, with a shift from
two-tier to three-tier architectures, where the client machine serves as a front
end, the application server handles business logic, and the database system
stores and manages data. This three-tier architecture offers improved security
and performance.
A Database Administrator (DBA) plays a crucial role in managing a database
system and ensuring its smooth operation. Here are some of the key functions and
responsibilities of a DBA:
1. Schema Definition:
- The DBA is responsible for defining the initial database schema by
executing data definition statements (DDL). This involves specifying the
structure of the database, including tables, columns, data types, constraints,
and relationships.
2. Storage Structure and Access-Method Definition:
- The DBA may be involved in determining the physical organization of data
and specifying access methods, including the creation of indices. These
decisions can impact the database's performance and storage efficiency.
3. Schema and Physical Organization Modification:
- As the needs of the organization evolve, the DBA must be prepared to modify
the database schema and physical organization to accommodate these changes. This
could involve adding new tables, modifying existing ones, or optimizing data
storage.
4. Authorization Management:
- The DBA controls user access to the database by granting and managing
authorizations. Different users may have different levels of access based on
their roles and responsibilities. This helps enforce security and data privacy.
5. Routine Maintenance:
- The DBA is responsible for performing routine maintenance tasks to ensure
the health and performance of the database. This includes:
- Regularly backing up the database to prevent data loss in case of
disasters.
- Monitoring and managing disk space to ensure there is enough storage
available.
- Monitoring database performance and optimizing queries or configurations
to maintain efficient operation.
- Managing user accounts and access privileges.
6. Security Management:
- DBAs are responsible for implementing security measures to protect the
database from unauthorized access and security threats. This includes setting up
user authentication, encryption, and auditing.
7. Performance Tuning:
- DBAs monitor database performance and work to optimize it. They may
identify and resolve bottlenecks, tune queries, and make adjustments to the
database configuration to improve overall performance.
8. Data Integrity and Backup:
- Ensuring data integrity is a critical responsibility. DBAs implement
constraints and validation rules to maintain data accuracy. They also establish
backup and recovery procedures to safeguard data in case of failures.
9. Capacity Planning:
- DBAs forecast future storage and performance needs based on the growth of
data and user activity. This helps in provisioning adequate resources for the
database system.
10. Disaster Recovery Planning:
- Preparing for and mitigating potential disasters, such as system failures
or data corruption, is a vital DBA function. This includes creating disaster
recovery plans and performing regular drills.
Overall, a skilled DBA is essential for maintaining the reliability, security,
and performance of a database system, ensuring that it meets the organization's
data management needs effectively.
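The backup task listed under routine maintenance can be sketched with the `backup` method of Python's sqlite3 connections. Both databases are in-memory here to keep the example self-contained; in practice the target would be a file on separate storage, and other database systems provide their own backup tools.

```python
import sqlite3

source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE instructor (ID TEXT PRIMARY KEY, name TEXT)")
source.execute("INSERT INTO instructor VALUES ('101', 'Ada')")
source.commit()

# Copy the whole database into a second connection (normally a file path).
backup_copy = sqlite3.connect(":memory:")
source.backup(backup_copy)

# Simulate data loss in the source database.
source.execute("DELETE FROM instructor")
source.commit()

# The backup still holds the rows as they were when it was taken.
recovered = backup_copy.execute("SELECT ID, name FROM instructor").fetchall()
```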