Feature Enhancements
New Check – Is Address
- Ensures that address fields contain the required components in a correctly formatted form. Users can require a Road, City, State, Country, Post Code, or any combination thereof via a multi-select picklist.
- This check leverages machine learning to support multilingual, global street address parsing and normalization. It was trained on over 1.2 billion records of real address data from every inhabited country on earth, in over 100 languages.
- Internal testing yielded 99.45% full-parse accuracy on held-out addresses (i.e., addresses purposefully withheld from the training data so we could evaluate the parser's performance on addresses it had not seen before).
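To illustrate the component requirement described above, here is a minimal sketch in Python. It assumes the address has already been parsed into a dictionary of components; the component names, `missing_components`, and `is_address` are hypothetical and are not the product's actual API.

```python
from typing import Iterable, List, Mapping

# Hypothetical component names, mirroring the picklist options listed above.
COMPONENTS = ("road", "city", "state", "country", "postcode")


def missing_components(parsed: Mapping[str, str], required: Iterable[str]) -> List[str]:
    """Return the required components that are absent or blank in a parsed address."""
    required = list(required)
    unknown = [name for name in required if name not in COMPONENTS]
    if unknown:
        raise ValueError(f"unknown components: {unknown}")
    # A component counts as present only if it has a non-blank value.
    return [name for name in required if not (parsed.get(name) or "").strip()]


def is_address(parsed: Mapping[str, str], required: Iterable[str]) -> bool:
    """Pass only when every required component is present (assumed semantics)."""
    return not missing_components(parsed, required)
```

For example, `is_address({"road": "1 Main St", "city": "Springfield"}, ["road", "city"])` passes, while requiring `"postcode"` for the same record would fail.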
Enhanced Activity Reporting
- Selecting a date in the heatmap now filters the list of operations to include only those that were conducted on that date, allowing for faster, easier investigation of historical data quality events.
- Operation details have been significantly expanded to include comprehensive information about the profiles scanned, with the ability to easily drill down to individual tables/files and associated Anomalies.
- Operations now display and directly link to the Schedules that triggered them, improving traceability and visibility while streamlining navigation.
Table-Level And File-Level Insights
- Previously, the Insights Dashboard was only filterable using Datastore-level tags. We now support this same functionality for tagged tables and files in order to provide more granular, targeted data quality reporting.
Support For Incremental Scanning Of Partitioned Files
- Optimized the incremental scanning process by tracking changes at the record level rather than relying on the folder's last-modified timestamp. This avoids scanning unnecessary rows, so that only new or changed records are included for analysis.
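The release does not describe how record-level tracking is implemented. One common approach is a high-water mark on a per-record field, sketched below in Python. The `updated_at` field and the `select_new_records` helper are assumptions for illustration only.

```python
from typing import Any, Dict, Iterable, List, Optional, Tuple

Record = Dict[str, Any]


def select_new_records(
    records: Iterable[Record],
    watermark: Optional[Any],
    field: str = "updated_at",  # hypothetical per-record tracking field
) -> Tuple[List[Record], Optional[Any]]:
    """Return the records beyond the watermark, plus the advanced watermark."""
    # With no watermark yet (first scan), every record is considered new.
    fresh = [r for r in records if watermark is None or r[field] > watermark]
    # Advance the watermark only if something new was seen.
    new_watermark = max((r[field] for r in fresh), default=watermark)
    return fresh, new_watermark
```

Persisting the returned watermark between scans means a later scan skips rows it has already processed, even when the containing folder's timestamp has changed.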
As usual, our User Guide and accompanying Change Log capture more details about this release.