Data Profiling in Dataedo - Dataedo Documentation 24.08.22, 13:53
https://dataedo.com/docs/data-profiling-in-dataedo

Data Profiling in Dataedo

Mac Lewandowski, 4th November, 2021

Dataedo 10 allows you to discover data stored in the database and review its contents and quality. The Data Profiling module combines useful metrics with a friendly user interface. On top of that, Profiling in Dataedo lets you peek into the most common or random data from your tables.

What is Data Profiling?

In general, Data Profiling means the process of inspecting the data and presenting statistics and metrics about it. This is usually done in order to:

- find out what the quality of the data is and whether it can be reused,
- better understand the data structure,
- discover potential data challenges and improvements,
- review data before building software based on it.

Data Profiling in Dataedo

Table row count

Dataedo scans the table and counts the number of rows in it. Each time you save profiling, the values are updated and later presented in both Dataedo Desktop and Dataedo Web Catalog.
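
As a rough illustration of what a table row count involves, the sketch below counts rows for every table in a database. It is not Dataedo's implementation: it assumes a SQLite database accessed through Python's standard sqlite3 module, and the function name table_row_counts and the orders table are hypothetical.

```python
import sqlite3


def table_row_counts(conn):
    """Return {table_name: row_count} for every user table in a SQLite database."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
        )
    ]
    counts = {}
    for name in tables:
        # Quote the identifier; COUNT(*) is evaluated by the database engine,
        # so only a single number per table is transferred to the client.
        quoted = '"' + name.replace('"', '""') + '"'
        counts[name] = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
    return counts


# Hypothetical sample data, used only to make the sketch runnable.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT)")
conn.executemany(
    "INSERT INTO orders (customer) VALUES (?)", [("Ann",), ("Bob",), ("Ann",)]
)
print(table_row_counts(conn))  # {'orders': 3}
```
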

Column distribution

Column distribution classifies the rows of a column in terms of nullability and uniqueness:

- Distinct values - rows with values that are unique in the column (think of an ID or order number)
- Non distinct values - rows with values that are non-unique and non-empty (think of first name)
- Empty - rows with non-null values that are empty strings (for instance '' or ' ')
- NULL - rows with null values

Column values profile

Dataedo performs basic profiling of values in the column. Results depend on the data type. Learn more in the Supported data types article.

| Metric | Numerical | String | Date |
|---|---|---|---|
| Min | minimum value | first alphabetically sorted string | earliest found date |
| Max | maximum value | last alphabetically sorted string | latest found date |
| Avg | average value | - | - |
| Variance | variance counted for values | - | - |
| Standard deviation | standard deviation for values | - | - |
| Span | difference between Max and Min values | - | difference between Min and Max dates (formatted, e.g. 2 months, 2.5 years) |
| Distinct | number of distinct values | number of distinct strings | number of distinct dates |

String length profile

Dataedo performs basic profiling of column string length:

- Min - minimum length of a non-null string in the column,
- Max - maximum length of a string in the column,
- Avg - average string length,
- Variance - counted for string length,
- Standard deviation - counted for string length.

Column top/all values/random values

Dataedo can scan columns for top or random values. For each value it calculates how many rows have that value:

- Top 10/100/1000 values - by default Dataedo scans the top 10 or 100 most popular values from the column.
- All values - if the number of distinct values is less than 1000, you can ask Dataedo to fetch all the values from the column.
- Random values - you can ask Dataedo to sample 10 random values from the entire table. This can be useful for unique values such as order_number.

Sample data

Dataedo fetches 10 random rows from the table and presents them in tabular form. This data cannot be saved to the repository.

How it works

On user request, Dataedo scans tables and columns and gathers statistics and top data. It is worth mentioning that the statistics are calculated at the database level, so Dataedo does not download more data from the database than necessary. The prepared statistics are presented to the user in Dataedo Desktop.

Saving

Saving is optional, and it can be disabled by configuration. By default, saving data is disabled. Read more in the Data Profiling configuration article.

Where data is saved?

Profiling data can be saved in the repository, right next to the data model metadata (tables, columns).
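
To make the "calculated at the database level" point from How it works more concrete, here is a minimal sketch of how the four column distribution categories could be computed with a single aggregate query, so that only four numbers leave the database. This is not Dataedo's actual query. It assumes SQLite via Python's sqlite3 module, treats "distinct" as "the value occurs exactly once in the column", and treats strings that are blank after trimming as "empty"; the names column_distribution, customers and first_name are hypothetical.

```python
import sqlite3


def _quote(identifier):
    """Quote a SQL identifier for SQLite."""
    return '"' + identifier.replace('"', '""') + '"'


def column_distribution(conn, table, column):
    """Count rows per category: distinct, non-distinct, empty, NULL.

    Assumptions (not taken from Dataedo): a row is 'distinct' when its value
    occurs exactly once in the column, and 'empty' when the value is a
    non-null string that is blank after trimming spaces.
    """
    t, c = _quote(table), _quote(column)
    sql = f"""
        WITH freq AS (
            SELECT {c} AS v, COUNT(*) AS n FROM {t} GROUP BY {c}
        )
        SELECT
            COALESCE(SUM(CASE WHEN v IS NOT NULL AND TRIM(v) <> '' AND n = 1 THEN n END), 0),
            COALESCE(SUM(CASE WHEN v IS NOT NULL AND TRIM(v) <> '' AND n > 1 THEN n END), 0),
            COALESCE(SUM(CASE WHEN v IS NOT NULL AND TRIM(v) = '' THEN n END), 0),
            COALESCE(SUM(CASE WHEN v IS NULL THEN n END), 0)
        FROM freq
    """
    distinct, non_distinct, empty, nulls = conn.execute(sql).fetchone()
    return {
        "distinct": distinct,
        "non_distinct": non_distinct,
        "empty": empty,
        "null": nulls,
    }


# Hypothetical sample data, used only to make the sketch runnable.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (first_name TEXT)")
conn.executemany(
    "INSERT INTO customers (first_name) VALUES (?)",
    [("Ann",), ("Bob",), ("Ann",), ("",), (" ",), (None,), ("Cid",)],
)
print(column_distribution(conn, "customers", "first_name"))
# {'distinct': 2, 'non_distinct': 2, 'empty': 2, 'null': 1}
```

The four counts add up to the table row count (7 in this sample), which gives a quick sanity check for the chosen category definitions.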