Question: Explain in detail about ETL?
Asked by Velkanti Varshitha, 4 years ago
1 Answer posted

Answer posted by Vamshi, 4 years ago

ETL is the set of Extraction, Transformation and Loading processes performed in the data staging environment. It converts raw data extracted from several operational systems into relevant, useful information, which is then stored in the data warehouse environment. The warehouse cannot be implemented efficiently if the data is not collected, cleansed and integrated appropriately. Hence, the ETL process must be designed and implemented efficiently for an optimized data warehouse.

The major steps involved in the ETL process are listed below; a minimal end-to-end sketch follows the list.

1. Identifying the data tables that need to be loaded into the data warehouse.

2. Identifying the operational systems from which the data needs to be extracted.

3. Mapping the raw data from the operational systems to the processed data stored in the data warehouse.

4. Identifying rules/standards for extracting data.

5. Identifying rules for cleansing and transforming the data.

6. Designing the aggregate tables.

7. Storing the data tables in the data staging environment.

8. Creating procedures/approaches for loading data.

9. Performing ETL for dimension tables.

10. Performing ETL for fact tables.
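
To make these steps concrete, here is a minimal Python sketch of the three stages wired together. The record fields and hard-coded source rows are illustrative assumptions, not part of any real system.

```python
# Minimal end-to-end ETL sketch. The source data and field names are
# illustrative assumptions; a real pipeline reads from operational systems.

def extract():
    # Pull raw rows from an operational source (hard-coded here).
    return [{"name": " Alice ", "sales": "120"}, {"name": "bob", "sales": "80"}]

def transform(rows):
    # Cleanse and standardize the data before the warehouse sees it.
    return [
        {"name": r["name"].strip().title(), "sales": int(r["sales"])}
        for r in rows
    ]

def load(rows, warehouse):
    # Append the cleansed rows to the warehouse table.
    warehouse.extend(rows)

warehouse = []  # stand-in for the warehouse repository
load(transform(extract()), warehouse)
print(warehouse)  # -> [{'name': 'Alice', 'sales': 120}, {'name': 'Bob', 'sales': 80}]
```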

Data Extraction: The function of data extraction is to find and extract data from various relational and non-relational data sources for further use. The extraction process includes determining the operational systems from which the data can be collected, capturing the appropriate data from those sources for the required analysis, and identifying the extraction frequency (daily, weekly or monthly) needed to keep the data warehouse up to date.

Extraction can be carried out in two ways:

(i) Full Extraction: The whole data set is extracted from the source systems every time, and the extracted data is compared with the previous extraction to find the changes. This approach does not require tracking changes at the source systems since the last extraction point, and therefore needs no additional information from the source systems.

(ii) Incremental Extraction: Only the data changed since the last successful extraction is extracted from the data source. This method requires an additional mechanism such as a time scheduler or triggering events. This process is also referred to as change data capture.
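
As an illustration, here is a minimal sketch of timestamp-based incremental extraction. The table and column names (orders, updated_at) and the in-memory SQLite source are illustrative assumptions.

```python
import sqlite3

# Minimal sketch of incremental extraction using a timestamp watermark.
# Table and column names (orders, updated_at) are illustrative assumptions.

def extract_incremental(conn, last_extracted_at):
    """Pull only rows changed since the last successful extraction."""
    cur = conn.execute(
        "SELECT id, amount, updated_at FROM orders WHERE updated_at > ?",
        (last_extracted_at,),
    )
    return cur.fetchall()

# Demo with an in-memory stand-in for a source system.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL, updated_at TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 10.0, "2024-01-01"), (2, 25.0, "2024-03-15")],
)
# Only the row updated after the watermark is extracted.
print(extract_incremental(conn, "2024-02-01"))  # -> [(2, 25.0, '2024-03-15')]
```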

Data Transformation: Data transformation is the process in which data collected from many dissimilar source systems is converted into a standardized format. The main purpose of data transformation is to enhance the quality of the data before it is fed into the data warehouse. Before transformation, the reliability, consistency and validity of the data are checked.

Data transformation is carried out through the following steps:

• Identify the data elements that are to be moved to the data warehouse.

• Identify calculated and derived data values.

• Aggregate the data according to the granularity decided during the design phase of the data warehouse; this improves query performance.

• Perform data cleaning.

• Define the rules for transformation, the data mapping criteria and the metadata updates.

• Restructure the data.

• Combine data fields from various sources into one entity.

• Standardize the data values, data types and field lengths of the captured data.

• Maintain data integrity within the data warehouse.

• Convert the data to the structure and format of the data warehouse.

• Simplify and re-arrange individual fields to make analysis of the data warehouse easier.

Some of the transformation processes are aggregation, normalization, union, table lookup, join, routing of data, sequence generators, enforcing keys, etc. A brief sketch of a few of these steps follows.
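
Here is a minimal sketch of a few of these transformation steps (standardizing values, types and field lengths, then aggregating). The field names and rows are illustrative assumptions.

```python
from collections import defaultdict

# Minimal sketch of transformation: standardize field lengths, data types
# and data values, then aggregate to the chosen granularity.
# Field names (cust_id, amount, region) are illustrative assumptions.

raw_rows = [
    {"cust_id": " 007 ", "amount": "10.50", "region": "south"},
    {"cust_id": "8",     "amount": "4.25",  "region": "SOUTH"},
]

def transform(row):
    return {
        "cust_id": row["cust_id"].strip().zfill(5),  # standardize field length
        "amount": float(row["amount"]),              # standardize data type
        "region": row["region"].upper(),             # standardize data values
    }

clean = [transform(r) for r in raw_rows]

# Aggregate at the granularity chosen at design time (here: per region).
totals = defaultdict(float)
for row in clean:
    totals[row["region"]] += row["amount"]
print(dict(totals))  # -> {'SOUTH': 14.75}
```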

Data Loading: Data loading is the process of loading the extracted and transformed data into the data warehouse repository. It is typically described using the following three phases.

(i) Initial Load: This phase covers the initial loading of all the tables in the data warehouse repository, i.e., the tables loaded for the first time into the warehouse.

The initial load moves large volumes of data and typically runs for 8 to 10 hours; the time required depends on the amount of historical data needed for effective analysis.

(ii) Incremental Load: This phase covers the frequent, incremental data changes applied to the warehouse. Incremental loading can be performed in two ways: based on a timestamp, or based on a FLAG column, picking up only those rows marked 'Y' (see the sketch after phase (iii)).

(iii) Full Refresh: This phase describes deleting the contents of one or more tables and reloading them with new data.
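
Here is a minimal sketch of the flag-based incremental load described in phase (ii). The staging and warehouse table names are illustrative assumptions.

```python
import sqlite3

# Minimal sketch of flag-based incremental loading: pick up staged rows
# marked 'Y', load them into the warehouse table, then clear the flag.
# Table names (staging_sales, dw_sales) are illustrative assumptions.

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staging_sales (id INTEGER, amount REAL, load_flag TEXT)")
conn.execute("CREATE TABLE dw_sales (id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO staging_sales VALUES (?, ?, ?)",
    [(1, 99.0, "Y"), (2, 15.0, "N"), (3, 42.0, "Y")],
)

# Incremental load: only rows flagged 'Y' move into the warehouse.
conn.execute(
    "INSERT INTO dw_sales SELECT id, amount FROM staging_sales WHERE load_flag = 'Y'"
)
# Reset the flag so the same rows are not loaded again in the next cycle.
conn.execute("UPDATE staging_sales SET load_flag = 'N' WHERE load_flag = 'Y'")

print(conn.execute("SELECT * FROM dw_sales").fetchall())  # -> [(1, 99.0), (3, 42.0)]
```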

Data loading usually requires the data warehouse to be kept offline, so that no users can access the warehouse while loading is in progress. Thus, a specific time window must be assigned for loading the data. The load can be fragmented into a number of smaller portions so that only a few tables are loaded concurrently. Such fragmentation has the following advantages (a sketch follows the list):

(a) It enables smaller loads to run concurrently.

(b) It allows certain portions of the data warehouse to remain online while data loads are processed in the other portions.
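
A minimal sketch of such fragmentation, assuming hypothetical table groups and a placeholder per-table load routine:

```python
from concurrent.futures import ThreadPoolExecutor

# Minimal sketch of fragmenting the load: each table group is a smaller
# load that can run concurrently while other portions stay online.
# The table groups below are illustrative assumptions.

def load_tables(group_name, tables):
    # Placeholder for the real per-table load logic.
    for table in tables:
        print(f"[{group_name}] loading {table}")
    return group_name

groups = {
    "dimensions": ["dim_customer", "dim_product"],
    "facts": ["fact_sales"],
}

# Only a few tables are loaded at a time; the rest of the warehouse
# can remain available to users.
with ThreadPoolExecutor(max_workers=2) as pool:
    futures = [pool.submit(load_tables, name, tbls) for name, tbls in groups.items()]
    for f in futures:
        f.result()
```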

Once the loading process has been completed, the data loads need to be tested to ensure that they have been applied correctly to the data warehouse repository. In addition, a plan is designed to test the quality of the loaded records, and any data that could not be loaded during the loading process can be handled by the same procedures used for data loading. Such loading trials are carried out during the testing and start-up phases of the data warehouse.

