Question: Explain in detail about ETL?
Asked by Velkanti Varshitha, 4 years ago
1 Answer posted

Answer posted by Vamshi, 4 years ago

ETL is the set of Extraction, Transformation and Loading processes performed in the data staging environment. It converts raw data extracted from several operational systems into relevant, useful information, which is then stored in the data warehouse environment. The warehouse cannot be implemented efficiently if the data is not collected, cleansed and integrated appropriately. Hence, the ETL process must be designed and implemented efficiently for an optimized data warehouse.

The major steps involved in the ETL process are listed below; a minimal end-to-end sketch follows the list.

1. Identifying the data tables that need to be loaded into the data warehouse.

2. Identifying the operational systems from which the data needs to be extracted.

3. Mapping the raw data from the operational systems to the processed data stored in the data warehouse.

4. Identifying rules/standards for extracting data.

5. Identifying rules for cleansing and transforming the data.

6. Designing the aggregate tables.

7. Storing the data tables in the data staging environment.

8. Creating procedures/approaches for loading data.

9. Performing ETL for dimension tables.

10. Performing ETL for fact tables.
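
To make these steps concrete, here is a minimal Python sketch of the three stages wired together. The record fields and hard-coded source rows are illustrative assumptions, not part of any real system.

```python
# Minimal end-to-end ETL sketch. The source data and field names are
# illustrative assumptions; a real pipeline reads from operational systems.

def extract():
    # Pull raw rows from an operational source (hard-coded here).
    return [{"name": " Alice ", "sales": "120"}, {"name": "bob", "sales": "80"}]

def transform(rows):
    # Cleanse and standardize the data before the warehouse sees it.
    return [
        {"name": r["name"].strip().title(), "sales": int(r["sales"])}
        for r in rows
    ]

def load(rows, warehouse):
    # Append the cleansed rows to the warehouse table.
    warehouse.extend(rows)

warehouse = []  # stand-in for the warehouse repository
load(transform(extract()), warehouse)
print(warehouse)  # -> [{'name': 'Alice', 'sales': 120}, {'name': 'Bob', 'sales': 80}]
```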

Data Extraction: The function of data extraction is to find and extract data from various relational and non-relational data sources for further use. The extraction process includes determining the operational systems from which the data can be collected, capturing the appropriate data from those sources for the required analysis, and identifying the extraction frequency (daily, weekly or monthly) needed to keep the data warehouse up to date.

Extraction can be carried out in two ways:

(i) Full Extraction: The whole data set is extracted from the source systems every time, and the extracted data is compared with the previous extraction to find the changes. This approach does not require tracking changes at the source systems since the last extraction point, and therefore needs no additional information from the source systems.

(ii) Incremental Extraction: Only the data changed since the last successful extraction is extracted from the data source. This method requires an additional mechanism such as a time scheduler or triggering events. This process is also referred to as change data capture.
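
As an illustration, here is a minimal sketch of timestamp-based incremental extraction. The table and column names (orders, updated_at) and the in-memory SQLite source are illustrative assumptions.

```python
import sqlite3

# Minimal sketch of incremental extraction using a timestamp watermark.
# Table and column names (orders, updated_at) are illustrative assumptions.

def extract_incremental(conn, last_extracted_at):
    """Pull only rows changed since the last successful extraction."""
    cur = conn.execute(
        "SELECT id, amount, updated_at FROM orders WHERE updated_at > ?",
        (last_extracted_at,),
    )
    return cur.fetchall()

# Demo with an in-memory stand-in for a source system.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL, updated_at TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 10.0, "2024-01-01"), (2, 25.0, "2024-03-15")],
)
# Only the row updated after the watermark is extracted.
print(extract_incremental(conn, "2024-02-01"))  # -> [(2, 25.0, '2024-03-15')]
```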

Data Transformation: Data transformation is the process in which data collected from many dissimilar source systems is converted into a standardized format. The main purpose of data transformation is to enhance the quality of the data before it is fed into the data warehouse. Before transformation, the reliability, consistency and validity of the data are checked.

Data transformation is carried out through the following steps:

• Identify the data elements that are to be moved to the data warehouse.

• Identify calculated and derived data values.

• Aggregate the data according to the granularity decided during the design phase of the data warehouse; this improves query performance.

• Perform data cleaning.

• Define the rules for transformation, the data mapping criteria and the metadata updates.

• Restructure the data.

• Combine data fields from various sources into one entity.

• Standardize the data values, data types and field lengths of the captured data.

• Maintain data integrity within the data warehouse.

• Convert the data to the structure and format of the data warehouse.

• Simplify and re-arrange individual fields to make analysis of the data warehouse easier.

Some of the transformation processes are aggregation, normalization, union, table lookup, join, routing of data, sequence generators, enforcing keys, etc. A brief sketch of a few of these steps follows.
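
Here is a minimal sketch of a few of these transformation steps (standardizing values, types and field lengths, then aggregating). The field names and rows are illustrative assumptions.

```python
from collections import defaultdict

# Minimal sketch of transformation: standardize field lengths, data types
# and data values, then aggregate to the chosen granularity.
# Field names (cust_id, amount, region) are illustrative assumptions.

raw_rows = [
    {"cust_id": " 007 ", "amount": "10.50", "region": "south"},
    {"cust_id": "8",     "amount": "4.25",  "region": "SOUTH"},
]

def transform(row):
    return {
        "cust_id": row["cust_id"].strip().zfill(5),  # standardize field length
        "amount": float(row["amount"]),              # standardize data type
        "region": row["region"].upper(),             # standardize data values
    }

clean = [transform(r) for r in raw_rows]

# Aggregate at the granularity chosen at design time (here: per region).
totals = defaultdict(float)
for row in clean:
    totals[row["region"]] += row["amount"]
print(dict(totals))  # -> {'SOUTH': 14.75}
```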

Data Loading: Data loading is the process of loading the extracted and transformed data into the data warehouse repository. It is typically described using the following three phases.

(i) Initial Load: This phase covers the initial loading of all the tables in the data warehouse repository, i.e., the tables loaded for the first time into the warehouse.

The initial load moves large volumes of data and typically runs for 8 to 10 hours; the time required depends on the amount of historical data needed for effective analysis.

(ii) Incremental Load: This phase covers the frequent, incremental data changes applied to the warehouse. Incremental loading can be performed in two ways: based on a timestamp, or based on a FLAG column, picking up only those rows marked 'Y' (see the sketch after phase (iii)).

(iii) Full Refresh: This phase describes deleting the contents of one or more tables and reloading them with new data.
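
Here is a minimal sketch of the flag-based incremental load described in phase (ii). The staging and warehouse table names are illustrative assumptions.

```python
import sqlite3

# Minimal sketch of flag-based incremental loading: pick up staged rows
# marked 'Y', load them into the warehouse table, then clear the flag.
# Table names (staging_sales, dw_sales) are illustrative assumptions.

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staging_sales (id INTEGER, amount REAL, load_flag TEXT)")
conn.execute("CREATE TABLE dw_sales (id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO staging_sales VALUES (?, ?, ?)",
    [(1, 99.0, "Y"), (2, 15.0, "N"), (3, 42.0, "Y")],
)

# Incremental load: only rows flagged 'Y' move into the warehouse.
conn.execute(
    "INSERT INTO dw_sales SELECT id, amount FROM staging_sales WHERE load_flag = 'Y'"
)
# Reset the flag so the same rows are not loaded again in the next cycle.
conn.execute("UPDATE staging_sales SET load_flag = 'N' WHERE load_flag = 'Y'")

print(conn.execute("SELECT * FROM dw_sales").fetchall())  # -> [(1, 99.0), (3, 42.0)]
```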

Data loading usually requires the data warehouse to be kept offline, so that no users can access the warehouse while loading is in progress. Thus, a specific time window must be assigned for loading the data. The load can be fragmented into a number of smaller portions so that only a few tables are loaded concurrently. Such fragmentation has the following advantages (a sketch follows the list):

(a) It enables smaller loads to run concurrently.

(b) It allows certain portions of the data warehouse to remain online while data loads are processed in the other portions.
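
A minimal sketch of such fragmentation, assuming hypothetical table groups and a placeholder per-table load routine:

```python
from concurrent.futures import ThreadPoolExecutor

# Minimal sketch of fragmenting the load: each table group is a smaller
# load that can run concurrently while other portions stay online.
# The table groups below are illustrative assumptions.

def load_tables(group_name, tables):
    # Placeholder for the real per-table load logic.
    for table in tables:
        print(f"[{group_name}] loading {table}")
    return group_name

groups = {
    "dimensions": ["dim_customer", "dim_product"],
    "facts": ["fact_sales"],
}

# Only a few tables are loaded at a time; the rest of the warehouse
# can remain available to users.
with ThreadPoolExecutor(max_workers=2) as pool:
    futures = [pool.submit(load_tables, name, tbls) for name, tbls in groups.items()]
    for f in futures:
        f.result()
```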

Once the loading process has been completed, the data loads need to be tested to ensure that they have been applied correctly to the data warehouse repository. In addition, a plan is designed to test the quality of the loaded records, and any data that could not be loaded during the loading process can be handled by the same procedures used for data loading. Such loading trials are carried out during the testing and start-up phases of the data warehouse.

