Uploaded by Duc D uni

data management short questions

Given a database with (at least) a table service_request_xs.
The first 10 rows of the service_request_xs table:
req_id    created_date         agency_name  complaint_type                 incident_zip  req_status  nyc_borough
--------  -------------------  -----------  -----------------------------  ------------  ----------  ------------
13206317  2015-10-01 00:00:00  DOHMH
14291494  2015-10-01 00:00:00  DOHMH
15749776  2010-01-17 22:57:59  NYPD         Noise - Residential                          Closed      BRONX
16178965  2010-03-10 10:22:42  DOHMH
16350683  2010-03-31 11:21:00  DOT          Street Light Condition                       Closed      NULL
16359281  2010-04-01 11:48:59  DOB          General Construction/Plumbing  10002
16864252  2010-06-14 08:32:19  DOB          General Construction/Plumbing  10032
17602477  2010-07-07 19:15:00  DEP          Water System
18812685  2010-10-02 18:14:47  NYPD         Blocked Driveway
19426376  2010-12-27 19:03:36  NYPD         Noise - Residential
Right now, the Open311 database is used only in San Francisco and Washington, DC,
and it encompasses just basic quality-of-life complaints: potholes, garbage, vandalism,
and so on. But Open311 intends to eventually serve as a national, universal 311 that,
unlike New York's current system, can be added to and accessed by anyone. That means
outside parties can develop new interfaces, both for reporting problems and for
visualizing the data. "It's designed to be a write-once, run-everywhere platform," says
OpenPlans program manager Philip Ashlock, using software terminology conventionally
applied to operating systems. In the current 311 paradigm, each new city is the
equivalent of a different OS, because the data is structured differently from place to
place. But with Open311, an app built for San Francisco can be ported instantly to work
in DC.
Which of the DAMA Wheel components (data management areas) is most suited for laying the
foundations for the initiative described above, and why?
To satisfy the ‘write-once, run-everywhere’ goal, a standardised data model is crucial, so Data
Modelling and Design is the most suitable component. By applying data modelling principles, the
initiative can define a common data model for the various complaint types, such as trash, noise,
rodents, and more. Having a consistent model to follow improves data quality and usability and
reduces redundancy. The standardised data model would also ensure that data from different
cities and regions can be stored, accessed, and processed in a uniform manner, enabling easy
integration and interoperability between different systems and applications.
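As a rough illustration of what a common data model buys, the sketch below maps two differently structured source records onto one shared record type. The shared field names follow the service_request_xs columns; the two source layouts (from_city_a, from_city_b and their key names) are hypothetical and are not taken from the Open311 specification.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class ServiceRequest:
    """Common record shape; field names follow service_request_xs."""
    req_id: str
    created_date: datetime
    complaint_type: str
    incident_zip: Optional[str]
    req_status: str


def from_city_a(raw: dict) -> ServiceRequest:
    # Hypothetical layout A: space-separated timestamp, lowercase status.
    return ServiceRequest(
        req_id=str(raw["id"]),
        created_date=datetime.strptime(raw["opened"], "%Y-%m-%d %H:%M:%S"),
        complaint_type=raw["category"],
        incident_zip=raw.get("zip"),
        req_status=raw["status"].capitalize(),
    )


def from_city_b(raw: dict) -> ServiceRequest:
    # Hypothetical layout B: ISO 8601 timestamp, different key names.
    return ServiceRequest(
        req_id=str(raw["case_number"]),
        created_date=datetime.fromisoformat(raw["created_at"]),
        complaint_type=raw["type"],
        incident_zip=raw.get("postal_code"),
        req_status=raw["state"].capitalize(),
    )


def count_closed(requests) -> int:
    # Written once against the common model; works for any city's data.
    return sum(1 for r in requests if r.req_status == "Closed")


# Illustrative records only; the values are borrowed from the sample rows.
a = from_city_a({"id": 15749776, "opened": "2010-01-17 22:57:59",
                 "category": "Noise - Residential", "status": "closed"})
b = from_city_b({"case_number": "16350683", "created_at": "2010-03-31T11:21:00",
                 "type": "Street Light Condition", "state": "CLOSED"})
closed_total = count_closed([a, b])
```

Only the two small adapter functions differ per city; anything written against ServiceRequest, such as count_closed, is reused unchanged.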
State 3 metadata descriptions for the ‘created_date’ column that can be enforced/implemented
by the DBMS managing actual data
-Data type: The created_date column should be of the datetime data type. This will ensure
that the values stored in the column are dates and times in a consistent format.
-Not null: The created_date column should be marked as not null. This will prevent users
from creating rows in the table without specifying a value for the created_date column.
-Default value: The created_date column should have a default value
of CURRENT_TIMESTAMP. This will ensure that all rows in the table have a value for
the created_date column, even if the user does not specify one.
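The three constraints above can be tried out with Python's standard-library sqlite3 module. This is a minimal sketch rather than a reference answer: SQLite has no dedicated datetime type, so the sketch assumes a TEXT column plus a CHECK constraint to enforce the 'YYYY-MM-DD HH:MM:SS' format, whereas many other DBMSs would declare the column as DATETIME or TIMESTAMP directly. The reduced two-column table and the helper name try_insert are assumptions made for illustration.

```python
import sqlite3

# Reduced version of service_request_xs: only the columns needed here.
DDL = """
CREATE TABLE service_request_xs (
    req_id       INTEGER PRIMARY KEY,
    created_date TEXT
                 NOT NULL                                        -- not null
                 DEFAULT CURRENT_TIMESTAMP                       -- default value
                 CHECK (created_date IS datetime(created_date))  -- data type / format
)
"""


def try_insert(conn, sql, params=()):
    """Return True if the DBMS accepts the row, False if a constraint rejects it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


conn = sqlite3.connect(":memory:")
conn.execute(DDL)

# No created_date supplied: the default fills it in.
accepted_default = try_insert(
    conn, "INSERT INTO service_request_xs (req_id) VALUES (?)", (1,))
# Explicit NULL: rejected by NOT NULL.
accepted_null = try_insert(
    conn, "INSERT INTO service_request_xs (req_id, created_date) VALUES (?, NULL)", (2,))
# Not a date and time: rejected by the CHECK constraint.
accepted_garbage = try_insert(
    conn, "INSERT INTO service_request_xs (req_id, created_date) VALUES (?, ?)",
    (3, "not a date"))
# Well-formed value: accepted.
accepted_valid = try_insert(
    conn, "INSERT INTO service_request_xs (req_id, created_date) VALUES (?, ?)",
    (4, "2010-01-17 22:57:59"))

default_value = conn.execute(
    "SELECT created_date FROM service_request_xs WHERE req_id = 1").fetchone()[0]
```

In each case the DBMS itself, not the application code, accepts or rejects the row, which is what makes these metadata descriptions enforceable.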
1. Volume: Big data is characterized by its large volume. This means that it can be
difficult to store and process using traditional methods.
2. Velocity: Big data is often generated in real time or near real time. This means
that it can be difficult to keep up with the data flow.
3. Variety: Big data can come from a variety of sources, including structured,
semi-structured, and unstructured data. This makes it difficult to unify and analyze the data.
4. Veracity: Big data can be inaccurate or incomplete. This can make it difficult to
trust the data and draw accurate conclusions from it.
5. Value: Big data can be used to gain insights into customer behavior, improve
decision-making, and identify new opportunities. This can lead to increased
revenue, improved efficiency, and reduced costs.
When the city’s Taxi and Limousine Commission installed television screens and credit card
machines… Which of the Big Data dimensions is most relevant to this part of the case study and
why?
Variety is the most relevant. Television screens can display various types of content, such
as advertisements, news, entertainment, or informational messages. This introduces
multimedia data, including images, videos, and potentially interactive elements, into the
taxi system. The content displayed on the screens may come from different sources, such
as media companies, advertisers, or city authorities.
Credit card machines enable electronic payment transactions within taxis, generating
financial transaction data. This includes customer payment information, transaction
amounts, payment methods, and potentially additional metadata related to the
transaction.
The taxi system now deals with diverse data types, including multimedia content and financial
transaction data. This increases the variety of data that needs to be collected, processed,
and analyzed.
By making all complaints and queries public, these services let ordinary people detect emergent
patterns... Which of the Big Data dimensions is most relevant to this part of the case study and why?
The most relevant big data dimension to this part of the case study is Velocity. Velocity
refers to the speed at which data is generated and processed. In the case of the public
complaints and queries, the data is generated very quickly, as people are constantly
submitting new complaints and queries. This high velocity of data generation allows
ordinary people to detect emergent patterns, such as trends or correlations, that might not
be visible if the data were generated more slowly.
For example, if a large number of people are complaining about the same issue, this could
be an indication of a systemic problem that needs to be addressed. By making all
complaints and queries public, these services allow ordinary people to identify these
problems and bring them to the attention of the authorities.
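The idea of spotting an issue that many people complain about can be sketched as a simple frequency count. The complaint types below are the ones that survive in the sample rows of service_request_xs; the function name frequent_complaints and the threshold of 2 are assumptions chosen for illustration, and a real analysis would also group by time window and location.

```python
from collections import Counter


def frequent_complaints(complaint_types, min_count):
    """Return the complaint types reported at least min_count times, sorted by name."""
    counts = Counter(complaint_types)
    return sorted(t for t, n in counts.items() if n >= min_count)


# Complaint types taken from the sample rows shown earlier.
sample = [
    "Noise - Residential",
    "Street Light Condition",
    "General Construction/Plumbing",
    "General Construction/Plumbing",
    "Water System",
    "Blocked Driveway",
    "Noise - Residential",
]

# Hypothetical threshold: flag anything reported two or more times.
flagged = frequent_complaints(sample, min_count=2)
```

With public data, anyone can run this kind of count and notice recurring issues without waiting for an official report.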