Data Factory

Data Factory is a cloud-based data workflow scheduling service with built-in orchestration and scheduling for data synchronization, data processing and analysis tasks. It helps users quickly build data processing and analysis jobs as workflows and run them on a periodic basis.

Benefits

Support Diversified Data Sources
Data Factory natively supports connection to, and data collection from, multiple data sources, including services on the cloud and data sources in the user's local environment, which accelerates data integration.
Reliable Data Synchronization
The built-in data transport capability allocates resources according to the tasks assigned, providing efficient and reliable synchronization.
Seamless Integration
Seamlessly integrated with services on the cloud, Data Factory can quickly connect to users' Data Compute, JMR, Stream Hub, Stream Compute and other services, and perform data-driven service orchestration.
Simple and User-friendly
Wizard-based configuration, graphical service orchestration and flexible job scheduling let users quickly build data pipelines for data collection, ETL processing and analysis.
Enterprise Operation and Maintenance
Job run monitoring, exception handling, alarms, job log search and other features support self-service O&M by enterprises.

Features

Service Connection Management

Data Source Connection

Data Factory natively supports multiple common data sources, enabling connection to and data collection from services on the cloud and data sources in the user's local environment, which accelerates data integration. Currently supported sources include cloud storage, cloud database, Data Compute, SQL Server, Oracle, MySQL, DB2 and FTP.

Computing Resource Connection

As a data integration service on the cloud, Data Factory needs to perform ETL processing on the data accessed from data sources. By connecting to different analysis services, Data Factory cleans, transforms and analyzes the accessed data in the form of workflows. Currently, Data Factory supports access to Data Compute; services such as JD MapReduce, Stream Hub, Stream Compute and the machine learning platform are to be added later.

Data Synchronization

Data Access and Distribution

Data Factory synchronizes data from multiple local and cloud data sources and offers different synchronization policies, such as full synchronization and incremental synchronization. The same data synchronization feature works in both directions: it can collect multi-source data into the enterprise data warehouse, and it can distribute processed data from the data warehouse to production systems, such as database systems, to support online services.
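
To illustrate the difference between the two synchronization policies, the following is a minimal sketch and not Data Factory's actual implementation. It assumes a hypothetical orders table with an updated_at column used as a watermark, and uses in-memory SQLite databases as stand-ins for the source and the target.

```python
import sqlite3

def make_db():
    """Create an in-memory database with the hypothetical orders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, name TEXT, updated_at INTEGER)"
    )
    return conn

def full_sync(src, dst):
    """Full synchronization: replace the target with a complete copy of the source."""
    rows = src.execute("SELECT id, name, updated_at FROM orders").fetchall()
    dst.execute("DELETE FROM orders")
    dst.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    dst.commit()
    return len(rows)

def incremental_sync(src, dst, watermark):
    """Incremental synchronization: copy only rows changed after the watermark.

    Returns the new watermark (the largest updated_at value that was copied).
    """
    rows = src.execute(
        "SELECT id, name, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (watermark,),
    ).fetchall()
    # Upsert so that rows updated in the source overwrite their older copies.
    dst.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?, ?)", rows)
    dst.commit()
    return rows[-1][2] if rows else watermark

source, target = make_db(), make_db()
source.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)", [(1, "a", 100), (2, "b", 200)]
)
source.commit()

copied = full_sync(source, target)  # initial load copies everything
watermark = 200                     # highest updated_at seen so far

# Later, the source gets one new row and one updated row.
source.execute("INSERT INTO orders VALUES (3, 'c', 300)")
source.execute("UPDATE orders SET name = 'a2', updated_at = 310 WHERE id = 1")
source.commit()

watermark = incremental_sync(source, target, watermark)
target_rows = target.execute(
    "SELECT id, name, updated_at FROM orders ORDER BY id"
).fetchall()
```

Full synchronization is simpler but moves every row on each run. Incremental synchronization moves only the changes, at the cost of needing a reliable change marker in the source.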

Data Workflow

Orchestration and Scheduling of Data Synchronization and Processing

Tasks such as data access, data cleaning and data aggregation are orchestrated and organized through a unified Workflow Management Module, so that users can define scheduling policies with monthly, weekly, daily or hourly cycles as the service requires.
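
As a rough sketch of what orchestration and cycle-based scheduling involve, the example below orders a small workflow by its dependencies and computes the next run time for each cycle. The task names, the workflow dictionary and the next_run helper are hypothetical illustrations, not part of the Data Factory interface.

```python
import calendar
from datetime import datetime, timedelta
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical workflow: each task maps to the set of tasks it depends on.
workflow = {
    "access": set(),
    "clean": {"access"},
    "aggregate": {"clean"},
    "export": {"aggregate"},
}

def execution_order(tasks):
    """Return the tasks in an order that respects every dependency."""
    return list(TopologicalSorter(tasks).static_order())

def next_run(last_run, cycle):
    """Compute the next run time for an hourly, daily, weekly or monthly cycle."""
    if cycle == "hour":
        return last_run + timedelta(hours=1)
    if cycle == "day":
        return last_run + timedelta(days=1)
    if cycle == "week":
        return last_run + timedelta(weeks=1)
    if cycle == "month":
        year = last_run.year + last_run.month // 12
        month = last_run.month % 12 + 1
        # Clamp the day so that, for example, Jan 31 rolls over to the last day of Feb.
        day = min(last_run.day, calendar.monthrange(year, month)[1])
        return last_run.replace(year=year, month=month, day=day)
    raise ValueError(f"unknown cycle: {cycle}")

order = execution_order(workflow)
monthly = next_run(datetime(2024, 1, 31, 2, 0), "month")
hourly = next_run(datetime(2024, 1, 31, 23, 0), "hour")
```

Modeling the workflow as a dependency graph means a task only starts once everything it depends on has finished, regardless of the cycle on which the workflow is triggered.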

Job Operation and Maintenance

Job Alarm Warning Rule

Multiple alarm policies can be configured on a Data Factory workflow so that users are promptly informed of key task running statuses.
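
One way to picture several alarm policies attached to a single workflow is sketched below. The AlarmRule class, its fields, the status strings and the job record layout are all assumptions made for illustration; the actual rule types offered by Data Factory are not described here.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class AlarmRule:
    """Hypothetical alarm rule: fires on a given status or on a long runtime."""
    name: str
    on_status: Optional[str] = None      # e.g. "FAILED" (assumed status name)
    max_runtime_s: Optional[int] = None  # runtime threshold in seconds

    def matches(self, job):
        if self.on_status is not None and job["status"] == self.on_status:
            return True
        if self.max_runtime_s is not None and job["runtime_s"] > self.max_runtime_s:
            return True
        return False

def triggered_alarms(rules, job):
    """Return the names of all rules that fire for the given job record."""
    return [rule.name for rule in rules if rule.matches(job)]

rules = [
    AlarmRule(name="job-failed", on_status="FAILED"),
    AlarmRule(name="job-too-slow", max_runtime_s=3600),
]

failed_job = {"status": "FAILED", "runtime_s": 120}
slow_job = {"status": "RUNNING", "runtime_s": 4000}

failed_alarms = triggered_alarms(rules, failed_job)
slow_alarms = triggered_alarms(rules, slow_job)
```

Keeping each rule independent lets a single job state trigger more than one alarm, for example a job that both overruns its time limit and then fails.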

Job Operation Monitoring

Data Factory records the execution status and history of each workflow. Users can view the execution results and detailed logs of each job and trace the logs of each execution step in a workflow, which helps operation and maintenance personnel diagnose and analyze problems.

Scenarios

Data Access to Cloud Data Warehouse

With Data Factory, enterprises can quickly and at low cost run data synchronization tasks targeting cloud databases, Object Storage Service and standard data interface services (JDBC-compatible databases, FTP services, etc.). With scheduled task management, enterprises can easily set up periodic data access from different data sources into the data warehouse.

Local Data Migration to Cloud

Data Integration provides a tool (Client and SDK) that lets users upload local files and local database data (MySQL, Oracle, SQL Server and DB2) from the local environment to the cloud data warehouse (Data Compute) via the command line or the SDK. The tool can also download data from the cloud data warehouse to the local environment, substantially lowering the barrier to moving local enterprise data to the cloud.