DATA WAREHOUSE IMPLEMENTATION

Efficient Data Cube Computation
- A data cube can be viewed as a lattice of cuboids.
  - The bottom-most cuboid is the base cuboid.
  - The top-most cuboid (apex) contains only one cell.
  - How many cuboids are there in an n-dimensional cube with L levels?
- Materialization of the data cube
  - Materialize every cuboid (full materialization), none (no materialization), or some (partial materialization).
- Selection of which cuboids to materialize
  - Based on size, sharing, access frequency, etc.
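The question about the number of cuboids can be answered with the standard counting formula: if dimension i has L_i hierarchy levels (not counting the virtual top level "all"), the cube has T = (L_1 + 1) x (L_2 + 1) x ... x (L_n + 1) cuboids, which reduces to 2^n when there are no hierarchies. The sketch below is only an illustration; the function names and the example dimensions are assumptions, not part of the slides.

```python
from itertools import combinations
from math import prod


def count_cuboids(levels_per_dimension):
    """Total cuboids T = product over all dimensions of (L_i + 1).

    L_i is the number of hierarchy levels of dimension i, excluding the
    virtual top level "all"; the +1 accounts for that level.
    """
    return prod(levels + 1 for levels in levels_per_dimension)


def cuboid_lattice(dimensions):
    """List every cuboid of a cube without hierarchies, base cuboid first."""
    n = len(dimensions)
    return [combo
            for k in range(n, -1, -1)
            for combo in combinations(dimensions, k)]


# Hypothetical 3-D cube: the lattice runs from the base cuboid to the apex.
lattice = cuboid_lattice(["item", "city", "year"])
print(len(lattice))              # 8 cuboids, i.e. 2^3
print(lattice[0], lattice[-1])   # base cuboid and the empty apex cuboid
print(count_cuboids([4] * 10))   # 10 dimensions with 4 levels each
```

The count grows exponentially with the number of dimensions, which is why partial materialization is usually the practical choice.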
Cube Operation Cube definition and computation in DMQLdefine cube sales[item, city, year]: sum(sales_in_dollars)compute cube sales Transform it into a SQL-like language (with a new operatorcube by, introduced by Gray et al.’96)SELECT item, city, year, SUM (amount)FROM SALESCUBE BY item, city, year Need compute the following Group-Bys(date, product, customer),(date,product),(date, customer), (product, customer),(date), (product), (customer)
Cube Computation: ROLAP-Based Method
- Efficient cube computation methods
  - ROLAP-based cubing algorithms (Agarwal et al.’96)
  - Array-based cubing algorithm (Zhao et al.’97)
  - Bottom-up computation method (Beyer & Ramakrishnan’99)
  - H-cubing technique (Han, Pei, Dong & Wang: SIGMOD’01)
- ROLAP-based cubing algorithms
  - Sorting, hashing, and grouping operations are applied to the dimension attributes in order to reorder and cluster related tuples.
  - Grouping is performed on some sub-aggregates as a “partial grouping step”.
  - Aggregates may be computed from previously computed aggregates, rather than from the base fact table.

Cube Computation: ROLAP-Based Method (2)
- This is not in the textbook but in a research paper.
- Hash/sort-based methods (Agarwal et al., VLDB’96)
  - Smallest-parent: computing a cuboid from the smallest previously computed cuboid
  - Cache-results: caching the results of a cuboid from which other cuboids are computed, to reduce disk I/Os
  - Amortize-scans: computing as many cuboids as possible at the same time, to amortize disk reads
  - Share-sorts: sharing sorting costs across multiple cuboids when a sort-based method is used
  - Share-partitions: sharing the partitioning cost across multiple cuboids when hash-based algorithms are used

Multi-way Array Aggregation for Cube Computation
- Partition arrays into chunks (a chunk is a small subcube which fits in memory).
- Compressed sparse array addressing: (chunk_id, offset)
- Compute aggregates in “multiway” by visiting cube cells in the order which minimizes the number of times each cell is visited, and reduces memory access and storage cost.
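The chunking and (chunk_id, offset) addressing described above can be sketched as follows. This is a minimal illustration under several assumptions: a 3-D array whose side lengths are exact multiples of the chunk sides, row-major numbering of both chunks and cells, and hypothetical function names. It aggregates the AB, AC, and BC planes simultaneously in a single scan over the chunks, but it does not model the chunk ordering that minimizes memory use.

```python
def chunk_address(cell, chunk_shape, chunks_per_dim):
    """Map a cell coordinate to (chunk_id, offset), both in row-major order."""
    chunk_id = 0
    offset = 0
    for coord, size, count in zip(cell, chunk_shape, chunks_per_dim):
        chunk_id = chunk_id * count + coord // size
        offset = offset * size + coord % size
    return chunk_id, offset


def cell_from_address(chunk_id, offset, chunk_shape, chunks_per_dim):
    """Inverse of chunk_address: recover the cell coordinate."""
    coords = []
    for size, count in zip(reversed(chunk_shape), reversed(chunks_per_dim)):
        chunk_id, chunk_coord = divmod(chunk_id, count)
        offset, inner = divmod(offset, size)
        coords.append(chunk_coord * size + inner)
    return tuple(reversed(coords))


def multiway_aggregate(sparse_cells, shape, chunk_shape):
    """Sum a sparse 3-D array into its AB, AC and BC planes in one pass."""
    chunks_per_dim = [dim // size for dim, size in zip(shape, chunk_shape)]
    # Store only non-empty cells, grouped by chunk, as (offset, value) pairs.
    chunks = {}
    for cell, value in sparse_cells.items():
        chunk_id, offset = chunk_address(cell, chunk_shape, chunks_per_dim)
        chunks.setdefault(chunk_id, []).append((offset, value))
    ab, ac, bc = {}, {}, {}
    for chunk_id in sorted(chunks):  # each chunk is read exactly once
        for offset, value in chunks[chunk_id]:
            a, b, c = cell_from_address(chunk_id, offset,
                                        chunk_shape, chunks_per_dim)
            # Every cell contributes to all three 2-D planes at once.
            ab[(a, b)] = ab.get((a, b), 0) + value
            ac[(a, c)] = ac.get((a, c), 0) + value
            bc[(b, c)] = bc.get((b, c), 0) + value
    return ab, ac, bc


# Hypothetical 4 x 4 x 4 array split into 2 x 2 x 2 chunks.
cells = {(0, 0, 0): 1, (0, 0, 3): 2, (3, 0, 1): 4, (3, 2, 1): 8}
ab, ac, bc = multiway_aggregate(cells, (4, 4, 4), (2, 2, 2))
print(ab)
print(ac)
print(bc)
```

Because each cell is visited once and feeds all three planes, no cell has to be reread to compute a second or third aggregate.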
Multi-Way Array Aggregation for Cube Computation (Cont.)
- Method: the planes should be sorted and computed according to their size in ascending order.
  - See the details of Example 2.12 (pp. 75-78).
  - Idea: keep the smallest plane in main memory; fetch and compute only one chunk at a time for the largest plane.
- Limitation of the method: it computes well only for a small number of dimensions.
  - If there are a large number of dimensions, “bottom-up computation” and iceberg cube computation methods can be explored.

Indexing OLAP Data: Bitmap Index
- Index on a particular column
- Each value in the column has a bit vector: bit operations are fast.
- The length of the bit vector: number of records in the base table
- The i-th bit is set if the i-th row of the base table has the value for the indexed column.
- Not suitable for high-cardinality domains
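A bitmap index of the kind described above can be sketched with Python integers standing in for bit vectors. The column values and function names are hypothetical, and a real index would use fixed-length vectors with one bit per base-table record, whereas Python integers simply grow as needed.

```python
def build_bitmap_index(column):
    """Return {value: bit vector}; bit i is set when row i holds that value."""
    index = {}
    for row_id, value in enumerate(column):
        index[value] = index.get(value, 0) | (1 << row_id)
    return index


def matching_rows(bits):
    """Decode a bit vector into the list of row ids whose bit is set."""
    return [i for i in range(bits.bit_length()) if (bits >> i) & 1]


# Two hypothetical low-cardinality columns of the same base table.
region = ["Asia", "Europe", "Asia", "America", "Europe"]
customer_type = ["Retail", "Dealer", "Dealer", "Retail", "Dealer"]

region_index = build_bitmap_index(region)
type_index = build_bitmap_index(customer_type)

# Selections on several columns become bitwise operations on the vectors.
asia_dealers = region_index["Asia"] & type_index["Dealer"]
asia_or_europe = region_index["Asia"] | region_index["Europe"]
print(matching_rows(asia_dealers))     # rows that are both Asia and Dealer
print(matching_rows(asia_or_europe))   # rows in either region
```

The index holds one vector per distinct value, so a column with many distinct values needs many mostly-zero vectors, which is why the technique suits low-cardinality domains.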
Indexing OLAP Data: Join Indices
- Join index: JI(R-id, S-id) where R(R-id, …) ⋈ S(S-id, …)
- Traditional indices map the values to a list of record ids.
  - A join index materializes the relational join in the JI file and speeds up the relational join, a rather costly operation.
- In data warehouses, a join index relates the values of the dimensions of a star schema to rows in the fact table.
  - E.g., fact table Sales and two dimensions, city and product
    - A join index on city maintains for each distinct city a list of R-IDs of the tuples recording the Sales in the city.
  - Join indices can span multiple dimensions.
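The Sales example above can be sketched as a mapping from dimension values to fact-table row ids. This is a simplified illustration: the fact rows carry the dimension values directly instead of foreign keys into separate dimension tables, and the function name and sample data are assumptions.

```python
from collections import defaultdict


def build_join_index(fact_rows, keys):
    """Map each combination of dimension values to the fact-table row ids.

    keys is a tuple of dimension names, so one function covers both
    single-dimension and multi-dimension join indices.
    """
    index = defaultdict(list)
    for rid, row in enumerate(fact_rows):
        index[tuple(row[k] for k in keys)].append(rid)
    return dict(index)


# Hypothetical Sales fact table; the list position plays the role of R-ID.
sales = [
    {"city": "Vancouver", "product": "TV", "amount": 100},
    {"city": "Toronto", "product": "TV", "amount": 150},
    {"city": "Vancouver", "product": "PC", "amount": 200},
    {"city": "Toronto", "product": "PC", "amount": 50},
]

city_index = build_join_index(sales, ("city",))
city_product_index = build_join_index(sales, ("city", "product"))

# Answer "total sales in Vancouver" by touching only the indexed rows.
vancouver_rids = city_index[("Vancouver",)]
vancouver_total = sum(sales[rid]["amount"] for rid in vancouver_rids)
print(vancouver_rids, vancouver_total)
print(city_product_index[("Toronto", "PC")])
```

The join result is computed once when the index is built, so later queries skip the join and read the matching fact rows directly.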
Efficient Processing of OLAP Queries
- Determine which operations should be performed on the available cuboids:
  - Transform drill, roll, etc. into corresponding SQL and/or OLAP operations, e.g., dice = selection + projection.
- Determine to which materialized cuboid(s) the relevant operations should be applied.
- Explore indexing structures and compressed vs. dense array structures in MOLAP.

Metadata Repository
- Metadata is the data defining warehouse objects. It has the following kinds:
  - Description of the structure of the warehouse
    - schema, view, dimensions, hierarchies, derived data definitions, data mart locations and contents
  - Operational metadata
    - data lineage (history of migrated data and transformation path), currency of data (active, archived, or purged), monitoring information (warehouse usage statistics, error reports, audit trails)
  - The algorithms used for summarization
  - The mapping from the operational environment to the data warehouse
  - Data related to system performance
    - warehouse schema, view and derived data definitions
  - Business data
    - business terms and definitions, ownership of data, charging policies

Data Warehouse Back-End Tools and Utilities
- Data extraction: get data from multiple, heterogeneous, and external sources
- Data cleaning: detect errors in the data and rectify them when possible
- Data transformation: convert data from legacy or host format to warehouse format
- Load: sort, summarize, consolidate, compute views, check integrity, and build indices and partitions
- Refresh: propagate the updates from the data sources to the warehouse
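For the query-processing step that determines which materialized cuboid a query should run against, one common heuristic is to pick the smallest materialized cuboid that still contains every dimension the query needs. The sketch below assumes that heuristic, ignores concept hierarchies, and uses hypothetical cuboids and row-count estimates.

```python
def choose_cuboid(query_dims, materialized):
    """Pick the smallest materialized cuboid that can answer the query.

    materialized maps a frozenset of dimension names to an estimated row
    count. A cuboid qualifies when it contains every dimension the query
    groups or filters on. Returns None if no cuboid qualifies.
    """
    needed = frozenset(query_dims)
    candidates = [dims for dims in materialized if needed <= dims]
    return min(candidates, key=lambda dims: materialized[dims], default=None)


# Hypothetical set of materialized cuboids with estimated sizes.
materialized = {
    frozenset({"item", "city", "year"}): 1_000_000,  # base cuboid
    frozenset({"item", "year"}): 20_000,
    frozenset({"city", "year"}): 5_000,
    frozenset({"year"}): 10,
}

print(choose_cuboid({"item"}, materialized))          # the item-year cuboid
print(choose_cuboid({"item", "city"}, materialized))  # only the base cuboid fits
print(choose_cuboid({"customer"}, materialized))      # no cuboid qualifies
```

A fuller cost model would also weigh the indexes available on each candidate cuboid, not only its size.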
