Efficient Data Cube Computation

A data cube can be viewed as a lattice of cuboids. The bottom-most cuboid is the base cuboid. The top-most cuboid (the apex) contains only one cell. How many cuboids are there in an n-dimensional cube with L levels?

Materialization of the data cube: materialize every cuboid (full materialization), none (no materialization), or some (partial materialization). The selection of which cuboids to materialize is based on size, sharing, access frequency, etc.
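For the cuboid-count question above, the standard answer is T = (L_1 + 1) × (L_2 + 1) × … × (L_n + 1), where L_i is the number of hierarchy levels of dimension i and the extra 1 accounts for the virtual top level "all". When every dimension has a single level (L_i = 1), this reduces to 2^n. The Python sketch below checks the formula against brute-force enumeration of the lattice. The three concept hierarchies are hypothetical and chosen only for illustration.

```python
from itertools import product
from math import prod


def count_cuboids(levels_per_dim):
    """Closed form: each dimension contributes its L_i levels plus the virtual top level 'all'."""
    return prod(levels + 1 for levels in levels_per_dim)


def enumerate_cuboids(hierarchies):
    """Brute force: a cuboid picks one level per dimension; 'all' means the dimension is aggregated away."""
    choices = [levels + ["all"] for levels in hierarchies.values()]
    return [dict(zip(hierarchies, combo)) for combo in product(*choices)]


# Hypothetical concept hierarchies, finest level first (illustrative only).
hierarchies = {
    "time": ["day", "month", "quarter", "year"],  # L = 4
    "item": ["item_name", "brand", "type"],       # L = 3
    "location": ["city", "country"],              # L = 2
}

cuboids = enumerate_cuboids(hierarchies)
print(len(cuboids))                                           # 60
print(count_cuboids([len(v) for v in hierarchies.values()]))  # 60
print(cuboids[0])   # base cuboid: finest level of every dimension
print(cuboids[-1])  # apex cuboid: every dimension rolled up to 'all'
```

With 5 × 4 × 3 = 60 cuboids even for this tiny schema, the growth of the lattice is what makes the choice between full, no, and partial materialization matter.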
Cube Operation

Cube definition and computation in DMQL:

    define cube sales[item, city, year]: sum(sales_in_dollars)
    compute cube sales

This can be transformed into a SQL-like language with a new operator, cube by, introduced by Gray et al. ’96:

    SELECT item, city, year, SUM(amount)
    FROM SALES
    CUBE BY item, city, year

This requires computing the following group-bys: (item, city, year), (item, city), (item, year), (city, year), (item), (city), (year), and the empty group-by ().
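The CUBE BY semantics can be sketched as computing one SUM group-by for every subset of the cube dimensions: 2^3 = 8 group-bys for (item, city, year), including the empty group-by (). This naive Python version scans the base table once per group-by. The function name cube_by and the sample rows are hypothetical, and avoiding exactly this repeated scanning is what the computation methods discussed next aim for.

```python
from collections import defaultdict
from itertools import combinations


def cube_by(rows, dims, measure):
    """Return {group-by dims: {group key: SUM(measure)}} for every subset of dims, including ()."""
    result = {}
    for k in range(len(dims), -1, -1):          # from the base cuboid up to the apex
        for group in combinations(dims, k):
            totals = defaultdict(int)
            for row in rows:                     # naive: one full scan per group-by
                key = tuple(row[d] for d in group)
                totals[key] += row[measure]
            result[group] = dict(totals)
    return result


# Hypothetical sales fact rows.
sales = [
    {"item": "TV", "city": "Vancouver", "year": 2023, "amount": 500},
    {"item": "TV", "city": "Toronto", "year": 2023, "amount": 300},
    {"item": "PC", "city": "Vancouver", "year": 2024, "amount": 800},
]

cube = cube_by(sales, ("item", "city", "year"), "amount")
print(len(cube))        # 8
print(cube[()])         # {(): 1600}
print(cube[("item",)])  # {('TV',): 800, ('PC',): 800}
```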
Cube Computation: ROLAP-Based Method

Efficient cube computation methods include ROLAP-based cubing algorithms (Agarwal et al. ’96), the array-based cubing algorithm (Zhao et al. ’97), the bottom-up computation method (Beyer & Ramakrishnan ’99), and the H-cubing technique (Han, Pei, Dong & Wang, SIGMOD ’01).

ROLAP-based cubing algorithms: sorting, hashing, and grouping operations are applied to the dimension attributes in order to reorder and cluster related tuples. Grouping is performed on some sub-aggregates as a "partial grouping step". Aggregates may be computed from previously computed aggregates, rather than from the base fact table.

Cube Computation: ROLAP-Based Method (2)

This material is not in the textbook but in a research paper: hash/sort-based methods (Agarwal et al., VLDB ’96).
Smallest-parent: compute a cuboid from the smallest previously computed cuboid.
Cache-results: cache the results of a cuboid from which other cuboids are computed, to reduce disk I/Os.
Amortize-scans: compute as many cuboids as possible at the same time to amortize disk reads.
Share-sorts: share sorting costs across multiple cuboids when a sort-based method is used.
Share-partitions: share the partitioning cost across multiple cuboids when hash-based algorithms are used.

Multi-way Array Aggregation for Cube Computation

Partition arrays into chunks (a chunk is a small subcube that fits in memory). Use compressed sparse array addressing: (chunk_id, offset). Compute aggregates in a "multiway" fashion by visiting cube cells in the order that minimizes the number of times each cell is visited, which reduces memory access and storage cost.
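Compressed sparse array addressing can be sketched as follows: a cell's global coordinates are split into the chunk that contains it and its position within that chunk, and only non-empty cells are stored. The row-major linearization, the function names, and the 40 × 400 × 4000 array split into four chunks per dimension are assumptions for illustration, not details taken from the slides.

```python
def chunk_address(coords, dim_sizes, chunk_sizes):
    """Map global cell coordinates to (chunk_id, offset).

    Assumes row-major order both across chunks and within a chunk.
    """
    chunk_id, offset = 0, 0
    for x, size, c in zip(coords, dim_sizes, chunk_sizes):
        n_chunks = -(-size // c)                 # ceil(size / c) chunks along this dimension
        chunk_id = chunk_id * n_chunks + x // c  # which chunk along this dimension
        offset = offset * c + x % c              # position inside the chunk
    return chunk_id, offset


def to_sparse_chunks(cells, dim_sizes, chunk_sizes):
    """Store only non-empty cells as {chunk_id: {offset: value}}."""
    chunks = {}
    for coords, value in cells.items():
        chunk_id, offset = chunk_address(coords, dim_sizes, chunk_sizes)
        chunks.setdefault(chunk_id, {})[offset] = value
    return chunks


dims, chunk = (40, 400, 4000), (10, 100, 1000)      # hypothetical: 4 x 4 x 4 = 64 chunks
print(chunk_address((15, 250, 3999), dims, chunk))  # (27, 550999)

cells = {(0, 0, 0): 5, (15, 250, 3999): 7, (1, 2, 3): 2}  # hypothetical non-empty cells
print(to_sparse_chunks(cells, dims, chunk))  # {0: {0: 5, 102003: 2}, 27: {550999: 7}}
```

Loading one chunk then means fetching one small dictionary, which is the unit the multiway method visits when it updates several aggregate planes from the same cells.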
Multi-Way Array Aggregation for Cube Computation (Cont.)

Method: the planes should be sorted and computed according to their size in ascending order. See the details of Example 2.12 (pp. 75-78).
Idea: keep the smallest plane in main memory, and fetch and compute only one chunk at a time for the largest plane.
Limitation of the method: it works well only for a small number of dimensions. If there are a large number of dimensions, "bottom-up computation" and iceberg cube computation methods can be explored.

Indexing OLAP Data: Bitmap Index

A bitmap index is built on a particular column. Each value in the column has a bit vector, so bit operations are fast. The length of each bit vector equals the number of records in the base table. The i-th bit is set if the i-th row of the base table has that value for the indexed column. Because one bit vector is needed per distinct value, the bitmap index is not suitable for high-cardinality domains.
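A bitmap index, and a conjunctive selection answered with a single bitwise AND, can be sketched in Python using arbitrary-precision integers as bit vectors (bit i corresponds to row i of the base table). The column values and function names are hypothetical.

```python
def build_bitmap_index(column):
    """One bit vector per distinct value; bit i is set iff row i holds that value."""
    index = {}
    for i, value in enumerate(column):
        index[value] = index.get(value, 0) | (1 << i)
    return index


def matching_rows(bits, n_rows):
    """Decode a bit vector back into the list of row positions whose bit is set."""
    return [i for i in range(n_rows) if (bits >> i) & 1]


# Hypothetical base table with two low-cardinality columns.
city = ["Vancouver", "Toronto", "Vancouver", "Toronto", "Vancouver"]
item = ["TV", "TV", "PC", "PC", "TV"]

city_idx = build_bitmap_index(city)
item_idx = build_bitmap_index(item)

# city = 'Vancouver' AND item = 'TV' becomes one bitwise AND of two vectors.
hits = city_idx["Vancouver"] & item_idx["TV"]
print(format(hits, "05b"))             # 10001 (bit 0 is the rightmost)
print(matching_rows(hits, len(city)))  # [0, 4]
```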
Indexing OLAP Data: Join Indices

Join index: JI(R-id, S-id), where R(R-id, …) and S(S-id, …). Traditional indices map values to a list of record ids. A join index materializes the relational join in the JI file and speeds up the relational join, which is a rather costly operation. In data warehouses, a join index relates the values of the dimensions of a star schema to rows in the fact table. E.g., for the fact table Sales and two dimensions city and product, a join index on city maintains, for each distinct city, a list of R-IDs of the tuples recording the sales in that city. Join indices can span multiple dimensions.
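A minimal sketch of a join index, assuming an in-memory fact table and one dimension table whose rows are addressed by position (standing in for R-ids and S-ids). The table contents, the location_key column, and the function names are hypothetical. The sketch materializes the JI(R-id, S-id) pairs once and then groups them by city, so a query on city reaches the matching fact rows without re-running the join.

```python
from collections import defaultdict


def join_index_pairs(fact, dim, fk, pk):
    """Materialize JI(R-id, S-id): fact row position paired with the matching dimension row position."""
    s_id_of = {row[pk]: s_id for s_id, row in enumerate(dim)}
    return [(r_id, s_id_of[row[fk]]) for r_id, row in enumerate(fact) if row[fk] in s_id_of]


def join_index_on(attr, pairs, dim):
    """For each distinct value of a dimension attribute, list the R-ids of the matching fact rows."""
    index = defaultdict(list)
    for r_id, s_id in pairs:
        index[dim[s_id][attr]].append(r_id)
    return dict(index)


# Hypothetical star-schema fragment: a location dimension and a Sales fact table.
location = [
    {"location_key": "L1", "city": "Vancouver"},
    {"location_key": "L2", "city": "Toronto"},
]
sales = [
    {"location_key": "L1", "amount": 100},
    {"location_key": "L2", "amount": 50},
    {"location_key": "L1", "amount": 70},
]

pairs = join_index_pairs(sales, location, "location_key", "location_key")
by_city = join_index_on("city", pairs, location)
print(pairs)    # [(0, 0), (1, 1), (2, 0)]
print(by_city)  # {'Vancouver': [0, 2], 'Toronto': [1]}
print(sum(sales[r]["amount"] for r in by_city["Vancouver"]))  # 170
```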
Efficient Processing of OLAP Queries

Determine which operations should be performed on the available cuboids: transform drill, roll, etc. into the corresponding SQL and/or OLAP operations, e.g., dice = selection + projection.
Determine to which materialized cuboid(s) the relevant operations should be applied.
Explore indexing structures and compressed vs. dense array structures in MOLAP.

Metadata Repository

Metadata is the data defining warehouse objects. It includes the following kinds:
Description of the structure of the warehouse: schema, views, dimensions, hierarchies, derived data definitions, data mart locations and contents.
Operational metadata: data lineage (history of migrated data and transformation path), currency of data (active, archived, or purged), and monitoring information (warehouse usage statistics, error reports, audit trails).
The algorithms used for summarization.
The mapping from the operational environment to the data warehouse.
Data related to system performance: warehouse schema, view and derived data definitions.
Business data: business terms and definitions, ownership of data, charging policies.

Data Warehouse Back-End Tools and Utilities

Data extraction: get data from multiple, heterogeneous, and external sources.
Data cleaning: detect errors in the data and rectify them when possible.
Data transformation: convert data from legacy or host format to warehouse format.
Load: sort, summarize, consolidate, compute views, check integrity, and build indices and partitions.
Refresh: propagate the updates from the data sources to the warehouse.
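The query-processing step of deciding which materialized cuboid should answer a query can be sketched as: keep only the cuboids that hold every queried dimension at the queried level or a finer one, then pick the cheapest of those. This Python sketch assumes hypothetical concept hierarchies and cuboids, uses the number of cells as the cost estimate, and ignores indexing; it illustrates the idea rather than reproducing any specific system's optimizer.

```python
# Hypothetical concept hierarchies, finest level first.
HIERARCHIES = {
    "item": ["item_name", "brand", "type"],
    "location": ["street", "city", "province_or_state", "country"],
    "time": ["day", "month", "quarter", "year"],
}


def can_answer(cuboid_levels, query_levels):
    """True if the cuboid keeps every queried dimension at the queried level or finer."""
    for dim, wanted in query_levels.items():
        if dim not in cuboid_levels:
            return False  # dimension already aggregated away to 'all'
        order = HIERARCHIES[dim]
        if order.index(cuboid_levels[dim]) > order.index(wanted):
            return False  # cuboid is coarser than needed, so the detail is lost
    return True


def choose_cuboid(materialized, query_levels):
    """Return the name of the smallest cuboid that can answer the query (None: use the base table)."""
    candidates = [
        (n_cells, name)
        for name, (levels, n_cells) in materialized.items()
        if can_answer(levels, query_levels)
    ]
    return min(candidates)[1] if candidates else None


# Hypothetical materialized cuboids: (levels kept, estimated number of cells).
materialized = {
    "cuboid1": ({"item": "item_name", "location": "city", "time": "year"}, 1_000_000),
    "cuboid2": ({"item": "brand", "location": "country", "time": "year"}, 5_000),
    "cuboid3": ({"item": "brand", "location": "province_or_state", "time": "year"}, 50_000),
}

# Group by brand and province_or_state, with a selection on year.
query = {"item": "brand", "location": "province_or_state", "time": "year"}
print(choose_cuboid(materialized, query))  # cuboid3
```

Here cuboid2 is the smallest but stores location only at the country level, so it cannot be drilled back down to provinces; cuboid1 could answer the query but is far larger than cuboid3.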