Uploaded by vagagn4

Data Profiling in Dataedo - Dataedo Documentation

advertisement
Data Profiling in Dataedo - Dataedo Documentation
24.08.22, 13:53
sales@dataedo.com
Dataedo
Documentation
Version 10.x (current)
Filter by title...
Building Data Catalog
Metadata
Management
Business Glossary
+1 704-387-5078
Data Profiling in
Dataedo
Mac Lewandowski
! 4th November, 2021
Dataedo 10 allows you to discover data stored in the database
and review its contents and quality. Data Profiling module is a
combination of useful metrics with friendly User Interface. On
Data Classification
top of it, Profiling in Dataedo allows you to peak into most
Data Profiling
common, or random data from your tables.
in Dataedo
Data Profiling in
Dataedo Web
Catalog
Supported data
types
Supported
sources
! Search
In this article
What is Data Profiling?
Data Profiling in Dataedo
Table row count
Column distribution
Column values profile
String length profile
Column top/all values/random
values
Sample data
How it works
Data Profiling
Data Profiling in
Dataedo
Desktop
My account
Saving
What is Data Profiling?
Where data is saved?
In general Data Profiling means the process of inspecting the
data and presenting statistics and metrics about it.
This is usually done in order to:
find out what's the quality of data and can it be reused,
better understand data structure,
discover potential data challenges and improvements,
review data before building software based on it,
Configuration
(enabling/disabling)
Security
Considerations
Troubleshooting
Data Profiling in Dataedo
Table row count
Dataedo scans table and counts number of rows in that table.
Each time you save profiling the values are updated, and later
presented in both Dataedo Desktop and Dataedo Web
Catalog.
Column distribution
Column distribution scans different types of values in the
column in terms of nullability and uniqueness:
https://dataedo.com/docs/data-profiling-in-dataedo
Page 1 of 5
Data Profiling in Dataedo - Dataedo Documentation
24.08.22, 13:53
Distinct values - Rows with values that are unique in the
column (think of ID or Order number)
Non distinct values - Rows that are non-unique and nonempty (think of First name)
Empty - Rows non-null values but with empty strings (for
instance '' or ' ')
NULL - Rows with null values
Column values profile
Dataedo performs basic profiling of numeric values in the
column. Results depends on a data type. Learn more about
Supported data types.
Numerical
String
Date
Min
minimum
value
first
alphabetically
sorted string
earliest found
date
Max
maximum
value
last
alphabetically
sorted string
latest found
date
Avg
average
value
-
-
Variance
variance
counted
for values
-
-
Standard
deviation
standard
deviation
for values
-
-
Span
difference
between
Max and
Min
values
-
difference
between Min
and Max dates
(formatted, ie.
2 months, 2.5
years)
Distinct
number of
distinct
values
number of
distinct
strings
number of
distinct dates
https://dataedo.com/docs/data-profiling-in-dataedo
Page 2 of 5
Data Profiling in Dataedo - Dataedo Documentation
24.08.22, 13:53
String length profile
Dataedo performs basic profiling of column string length:
Min – Minimum length of non null string in the column,
Max – Maximum length of string in the column,
Avg – average stringh length,
Variance – counted for string length,
Standard deviation – counted for string length,
Column top/all values/random values
Dataedo can scan columns for top or random values. For each
value it calculates how many rows have that value:
Top 10/100/1000 values – By default Dataedo scans top
10 or 100 most popular values from the column.
All values – if the number of distinct values is less than
1000 you can ask Dataedo to fetch all the values from the
column.
Random values – you can ask Dataedo to sample random
10 values from entire table. This can be useful for unique
values such as order_number.
Sample data
Dataedo fetches 10 random rows from the table and presents
it in the tabluar form. This data cannot be saved to the
repository.
https://dataedo.com/docs/data-profiling-in-dataedo
Page 3 of 5
Data Profiling in Dataedo - Dataedo Documentation
24.08.22, 13:53
How it works
On user request, Dataedo scans tables and columns and
gathers statistics and top data. Worth mentioning is that
preparing statistics are calculated on a database level, so we
are not downloading more data from the database than
necessary.
Prepared statistics are presented to a user in Dataedo
Desktop.
https://dataedo.com/docs/data-profiling-in-dataedo
Page 4 of 5
Data Profiling in Dataedo - Dataedo Documentation
24.08.22, 13:53
Saving
Saving is optional. Moreover saving can be disabled by
configuration. Read more about configuration in Data Profiling
configuration article. By default saving data is disabled.
Where data is saved?
Profiling data can be saved in the repository right next to the
data model metadata (tables, columns).
Ask our community
Contact support
Report issue
Found issue with this article? Comment below
There are no comments. Click here to write the first comment.
Product
Company
Newsletter
Document your data and gather tribal
Features
About us
Subscribe to our newsletter
knowledge with Data Dictionary & Data
Tutorials
Customers
Download
Blog
Support
Contact us
Catalog, Business Glossary, and ERDs.
Contact us
sales@dataedo.com
+1 704-387-5078
and receive the latest tips,
cartoons & webinars straight
to your inbox.
Your email *
Careers
Partners
Email address
Subscribe
© 2022 Dataedo Sp. z o.o.
https://dataedo.com/docs/data-profiling-in-dataedo
Privacy Policy
License Agreement
Page 5 of 5
Download