Data Wrangler's self-service model makes data pre-processing easy.
Through data exploration and loading, Data Wrangler provides data-processing functions that handle data correction and conversion.
Data profiling helps identify information such as the distribution of the data, data validation results, and summary statistics. A comprehensive view of a given data set helps decide which data to cleanse and how to process it. By checking the distribution of each column and flagging invalid data, Data Wrangler suggests corrections while returning the maximum, minimum, mode, and mean values.
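As a rough illustration, the sketch below computes this kind of per-column profile with PySpark. The engine choice, the input file `sales.csv`, and the column name `amount` are assumptions made for the example, not details of the product.

```python
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.appName("profile-sketch").getOrCreate()

# Hypothetical input file and column; replace with your own data set
df = spark.read.csv("sales.csv", header=True, inferSchema=True)
col = "amount"

# Maximum / minimum / mean in a single aggregation pass
stats = df.agg(
    F.min(col).alias("min"),
    F.max(col).alias("max"),
    F.avg(col).alias("mean"),
).first()

# Mode: the most frequent value in the column
mode = df.groupBy(col).count().orderBy(F.desc("count")).first()[0]

# A simple validity check: count null entries as invalid data
invalid = df.filter(F.col(col).isNull()).count()

print(stats["min"], stats["max"], stats["mean"], mode, invalid)
```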
Data Wrangler also supports collaboration on jobs. For instance, it saves and manages the data conversion history so the converted script can be reviewed, and it manages preloaded data and the column conversion history.
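One hypothetical way to picture such a conversion history is as an ordered recipe of steps whose replay yields the converted script. The data structure below is a sketch for illustration only, not Data Wrangler's actual internal format.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Step:
    name: str   # e.g. "trim_whitespace" or "cast_to_int"
    expr: str   # the script fragment this step contributes

@dataclass
class Recipe:
    steps: List[Step] = field(default_factory=list)

    def add(self, name: str, expr: str) -> None:
        # Each conversion appends to the history rather than mutating it
        self.steps.append(Step(name, expr))

    def script(self) -> str:
        # Replaying the history yields the full converted script for review
        return "\n".join(s.expr for s in self.steps)

r = Recipe()
r.add("trim", "df['name'] = df['name'].str.strip()")
r.add("cast", "df['age'] = df['age'].astype(int)")
print(r.script())
```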
Data Wrangler can merge data from different sources, and its lineage diagram provides a graphical representation of the relationship between a recipe and the sources of the converted data.
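Conceptually, such a lineage diagram is a directed graph from source data sets through a recipe to its output. The node names below are invented for illustration; this is only a sketch of the idea, not the product's diagram format.

```python
# Hypothetical lineage: edges from each source through the recipe to its output
lineage = {
    "orders.csv": ["merge_recipe"],
    "customers_db.table": ["merge_recipe"],
    "merge_recipe": ["merged_output"],
}

def upstream(node, graph):
    """Walk edges backwards to list everything a node was derived from."""
    parents = [src for src, dests in graph.items() if node in dests]
    result = []
    for p in parents:
        result.extend(upstream(p, graph) + [p])
    return result

print(upstream("merged_output", lineage))
# ['orders.csv', 'customers_db.table', 'merge_recipe']
```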
- Apply Data Wrangler and the Kubernetes engine all at once
- Kubernetes engine resources can be configured to exceed the resources requested in Data Wrangler, preventing errors caused by user mistakes
- Uses schema information from connected data sources (Hive schema, RDB schema)
- Load data using SQL (see the loading sketch after this list)
- Upload target data using the local file feature
- Group functions: count, sum, avg, min, max, first, last, countDistinct, sumDistinct, collect_list, collect_set, etc. (group and window functions are illustrated in the sketch after this list)
- Window functions: lag, lead, rank, dense_rank, row_number, etc.
- Use various built-in scalar functions, as well as functions needed for data pre-processing and math functions
- Manage jobs, which apply a recipe to the entire data set, and monitor execution status
- View jobs by status and name (see the job-view sketch after this list)
- Monitor details such as the job list, status, and execution time
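Loading data using SQL could look like the following in PySpark, assuming a JDBC source; the connection URL, credentials, and query are placeholders for the example, not product defaults.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sql-load-sketch").getOrCreate()

# Hypothetical connection details; a tool like Data Wrangler would
# collect these through its UI rather than in code
df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:mysql://db-host:3306/sales")
    .option("query", "SELECT id, amount, created_at FROM orders WHERE amount > 0")
    .option("user", "reader")
    .option("password", "secret")
    .load()
)
df.show(5)
```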
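The group and window functions listed above match Spark SQL's function names, so a PySpark sketch can show how they behave; the toy DataFrame is invented for illustration.

```python
from pyspark.sql import SparkSession, Window
import pyspark.sql.functions as F

spark = SparkSession.builder.appName("functions-sketch").getOrCreate()
df = spark.createDataFrame(
    [("a", 1), ("a", 3), ("b", 2), ("b", 5)], ["grp", "val"]
)

# Group functions: one result row per group
df.groupBy("grp").agg(
    F.count("val").alias("count"),
    F.sum("val").alias("sum"),
    F.countDistinct("val").alias("distinct"),
    F.collect_list("val").alias("values"),
).show()

# Window functions: rank and look back within each group
# without collapsing the rows
w = Window.partitionBy("grp").orderBy("val")
df.withColumn("rank", F.rank().over(w)) \
  .withColumn("prev", F.lag("val").over(w)) \
  .show()
```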
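A view of jobs by status and name can be pictured as a small filter over job records. The `Job` fields and status values below are hypothetical, not Data Wrangler's actual API.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Job:
    name: str
    status: str              # e.g. "RUNNING", "SUCCEEDED", "FAILED"
    execution_time_s: float

jobs = [
    Job("clean_orders", "SUCCEEDED", 42.0),
    Job("merge_customers", "RUNNING", 13.5),
    Job("clean_orders", "FAILED", 7.2),
]

def view(jobs: List[Job], status: Optional[str] = None,
         name: Optional[str] = None) -> List[Job]:
    """Filter the job list the way a status/name view would."""
    return [
        j for j in jobs
        if (status is None or j.status == status)
        and (name is None or j.name == name)
    ]

print(view(jobs, status="SUCCEEDED"))
```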