Storage indexes are an Exadata feature designed to enhance performance by reducing disk I/O.
Unlike traditional B-tree or bitmap indexes, storage indexes do not store specific values to help find records; instead, they track the minimum, maximum, and null values of data in 1MB chunks of storage.
Here’s how they work:
- Pre-filtering Mechanism: Storage indexes act as a pre-filter. When a query is executed, Exadata checks the predicates (e.g., WHERE clause conditions) against the metadata stored in the storage indexes (min, max, and null values). If the storage index shows that no data in a 1MB storage chunk matches the query’s conditions, that chunk is skipped, eliminating the need to read it from disk.
- Memory-Only Feature: Storage indexes are held entirely in memory on the storage cells and never written to disk. This makes them volatile, meaning they must be rebuilt after a storage server is restarted.
- Eight Columns: A storage index can track up to eight columns from a table. It does not store every column’s details but dynamically maintains those that are most frequently queried.
- Query Optimization: Storage indexes are particularly beneficial when used with Smart Scans, a feature in Exadata that allows query processing to happen on the storage server. By pre-filtering irrelevant data, storage indexes help Smart Scans reduce the amount of data processed and returned to the database layer.
- Handling of Nulls: Storage indexes also provide a unique optimization for queries that involve null values, allowing for faster searches on columns that contain nulls by eliminating non-null regions from being read.
Table of Contents
In practice, storage indexes can deliver dramatic performance improvements.
They are most effective when the data in the storage regions is well-clustered or sorted.
However, there are limitations: they do not work with certain data types (e.g., CLOBs) or comparison operators (e.g., “!=”).
In summary, storage indexes help reduce disk I/O by allowing the Exadata system to skip reading storage regions that cannot contain relevant data, which results in significant query performance gains.
Example of Exadata storage index
Here’s an example to illustrate how storage indexes work in Exadata:
Scenario
Let’s assume you have a table orders
that contains the following columns:
order_id
order_date
customer_id
total_amount
The table has 1 million rows, and the data is distributed across multiple storage regions on an Exadata system.
Query
You run the following query to retrieve orders placed in a specific date range:
SELECT *
FROM orders
WHERE order_date BETWEEN '2024-01-01' AND '2024-01-31';
Storage Index Behavior
- Storage Index Creation: Exadata automatically creates storage indexes when the table is accessed frequently.
For each 1MB chunk of data stored on the storage cells, Exadata records the minimum and maximum values for theorder_date
column.
Example storage index for a storage region:- Region 1:
MIN(order_date) = '2023-11-15', MAX(order_date) = '2024-02-10'
- Region 2:
MIN(order_date) = '2024-03-01', MAX(order_date) = '2024-03-31'
- Region 3:
MIN(order_date) = '2023-10-05', MAX(order_date) = '2023-11-01'
- Region 1:
- Query Execution: When you run the query to find orders from January 2024, Exadata looks at the storage indexes first.
- Region 1 contains dates between
'2023-11-15'
and'2024-02-10'
, so this region will be read since it includes part of January 2024. - Region 2 contains dates between
'2024-03-01'
and'2024-03-31'
, so Exadata skips this region entirely because it falls outside the query range. - Region 3 contains dates between
'2023-10-05'
and'2023-11-01'
, so Exadata skips this region too, as it is irrelevant to the query.
- Region 1 contains dates between
- Data Reduction: Instead of scanning the entire table, Exadata only reads the relevant regions from disk.
This reduces I/O, speeds up the query, and optimizes system performance.
Benefit
In this case, because Exadata skipped irrelevant storage regions (Region 2 and Region 3), the query finishes faster, resulting in significantly lower I/O costs.
This is an example of how storage indexes help Exadata bypass unnecessary data and improve query performance.
DDL of table and index creation:
Here’s an example of the Data Definition Language (DDL) to create the orders
table and how an index would be created on it (if necessary).
Note that storage indexes in Exadata are not explicitly created by a DBA.
They are automatically managed by Exadata’s storage software based on the data being accessed.
However, I’ll also show how you would typically create an index if you’re dealing with a non-Exadata environment.
Example DDL for the orders
Table
CREATE TABLE orders (
order_id NUMBER PRIMARY KEY,
order_date DATE,
customer_id NUMBER,
total_amount NUMBER(10, 2)
);
This creates a basic orders
table with a primary key
on order_id
.
The order_date
column will be the one involved in query filtering for the example.
Note on Storage Indexes in Exadata
In Exadata, you do not manually create storage indexes.
Exadata automatically manages them in memory based on queries that frequently access certain columns.
You can’t control the storage index creation like you do with traditional indexes (e.g., B-tree or bitmap).
Example of Traditional Index Creation (if using a non-Exadata system)
If you wanted to create a traditional B-tree index on order_date
, you would write something like this:
CREATE INDEX idx_order_date
ON orders (order_date);
This traditional index helps optimize queries that filter on the order_date
, but it operates differently from storage indexes.
How Storage Indexes Work in Exadata
With Exadata, there’s no need to create an explicit index on order_date
to benefit from storage indexes.
The Exadata system dynamically creates and uses storage indexes at the storage cell level whenever it detects that a certain column (like order_date
) is frequently queried.
It tracks the minimum and maximum values in each storage block, allowing the system to skip irrelevant blocks during a query.
To summarize:
- Traditional Index: You create this manually (as shown in the example above).
- Storage Index (Exadata): Managed automatically by the system, no DDL required.
If you’re running your database on an Exadata machine, you just need to run your queries, and Exadata will automatically use storage indexes to optimize them based on query patterns.
How to check if a storage index has been used
In Oracle Exadata, storage indexes are managed automatically by the system, but you can monitor and check whether storage indexes are being utilized during query execution using various monitoring tools and views.
Here are a few ways to verify if a storage index has been created or utilized during query execution:
1. V$SESSTAT View
You can check the statistics of storage index usage per session using the V$SESSTAT
view, which contains session-level statistics.
Specifically, the statistic cell physical IO bytes saved by storage index
will indicate if storage indexes have been used.
Here is a query you can use to check if storage indexes were used by a particular session:
SELECT s.sid,
s.value AS "Bytes Saved by Storage Index"
FROM v$sesstat s
JOIN v$statname n ON s.statistic# = n.statistic#
WHERE n.name = 'cell physical IO bytes saved by storage index';
If the query returns a positive value in the Bytes Saved by Storage Index
column, it indicates that storage indexes were used during query execution for that session.
2. Exadata-Specific Wait Events
Another indicator of storage index usage is the reduction in I/O-related wait events.
You can monitor specific wait events in Exadata, such as:
cell smart table scan
cell smart index scan
These events indicate that Smart Scans are being performed and, by extension, storage indexes may have been involved in reducing the data to be scanned.
To check for Exadata-specific wait events:
SELECT event, total_waits, time_waited
FROM v$session_event
WHERE event LIKE 'cell smart%';
A high number of waits for cell smart table scan
may indicate that storage indexes have been used to skip irrelevant data during the scan.
3. CellCLI Command (on the Storage Cells)
If you have access to the storage cells themselves, you can use the CellCLI (Cell Command-Line Interface) to monitor and analyze the performance of the storage cells. This includes checking if storage indexes are being utilized.
To list the current statistics for the storage indexes on a storage cell, you can run the following CellCLI command:
CellCLI> list metriccurrent where objectType = 'CELLIODATA' and name = 'DB_IO_BY_STORIDX'
This will show metrics related to storage index usage on that particular storage cell.
Look for values in the DB_IO_BY_STORIDX
column, which indicates how much data has been saved by the use of storage indexes.
4. SQL Monitoring Report
When a query that is eligible for Smart Scan (and potentially storage index usage) is executed, you can generate a SQL Monitoring Report to get detailed information about the execution.
The SQL Monitoring report will show you:
- Whether Smart Scan was used.
- How much data was filtered out using the storage indexes.
To generate the SQL Monitoring report, you can use the following SQL command:
SELECT dbms_sqltune.report_sql_monitor(sql_id => 'your_sql_id')
FROM dual;
This report will provide insight into whether storage indexes were used, by looking at the reduction in data processed during Smart Scans.
5. DBA_HIST_SYSSTAT View (Historical Monitoring)
To check the historical usage of storage indexes over time, you can query the DBA_HIST_SYSSTAT
view, which records system-wide statistics over a period of time.
SELECT s.snap_id,
s.instance_number,
ss.value AS "Storage Index IO Bytes Saved"
FROM dba_hist_sysstat ss
JOIN dba_hist_snapshot s ON ss.snap_id = s.snap_id
WHERE ss.stat_name = 'cell physical IO bytes saved by storage index';
This query will provide an overview of how many bytes of physical I/O were saved by storage indexes over time, as recorded in the AWR (Automatic Workload Repository) snapshots.
Summary
You cannot manually create storage indexes in Exadata, but you can monitor and verify their usage by:
- Checking session-level statistics (
V$SESSTAT
). - Analyzing Exadata wait events (
cell smart table scan
). - Using CellCLI commands on the storage cells.
- Generating SQL Monitoring Reports.
- Looking at historical system statistics (
DBA_HIST_SYSSTAT
).
These tools will allow you to see if and when storage indexes were created and utilized to improve query performance.
Do you need to create indexes in Exadata
Not necessarily.
While storage indexes in Exadata are automatically managed and dynamically created by the system to optimize performance, traditional indexes (like B-tree or bitmap indexes) can still be beneficial in some cases.
Here’s an explanation of when you might and might not need to create traditional indexes on Exadata:
1. When Storage Indexes are Sufficient
- Sequential Data Access: If most of your queries involve scanning large amounts of data sequentially (e.g., data warehouse queries with full table scans), Exadata’s Smart Scans and storage indexes can be extremely effective in reducing I/O and improving performance.
- Range Queries: For range queries (like date ranges), storage indexes can help skip irrelevant blocks and minimize disk I/O without the need for traditional indexes.
- Dynamic Nature: Storage indexes are dynamically created and maintained in memory, which means they can adjust based on workload patterns without requiring DBA intervention.
2. When Traditional Indexes are Still Needed
- OLTP (Online Transaction Processing) Workloads: In OLTP environments, where queries often access individual rows or small sets of data (e.g., looking up a customer by ID), traditional B-tree indexes may still be necessary.
Exadata’s storage indexes are most effective in reducing I/O during full scans, but they are not used in single-row lookups. - Join Conditions: If your queries involve joins on non-primary key columns, traditional indexes (e.g., a B-tree or bitmap index) can help improve join performance by reducing the number of rows scanned.
- Data Updates: In environments with frequent updates or deletes, traditional indexes might help to ensure efficient row access and maintenance operations.
- Non-Eligible Smart Scan Queries: Not all queries can use Smart Scan (e.g., queries against small tables or those involving complex operators).
For those queries, traditional indexes can still help.
3. Special Cases Where You May Still Want Traditional Indexes
- Unique Constraints: Creating indexes on columns with unique constraints (e.g., unique keys, primary keys) is still necessary to enforce data integrity.
- Foreign Key Relationships: Indexes on foreign keys can still be useful to avoid full table scans during updates or deletes on parent tables.
- Non-Exadata Systems: If you have mixed environments where data might be accessed on both Exadata and non-Exadata systems, traditional indexes might still be necessary to optimize performance on non-Exadata platforms.
4. Mixed Workloads
For systems that have mixed workloads (a combination of OLTP and reporting), traditional indexes can complement Exadata’s storage indexes.
For example:
- OLTP workloads benefit from traditional indexes for quick row lookups.
- Analytical queries benefit from Exadata’s storage indexes for efficient large-scale scans.
Conclusion:
While storage indexes can reduce the need for traditional indexes in Exadata, they are not a complete replacement. You should still consider creating traditional indexes for:
- OLTP environments or mixed workloads.
- Queries that require precise row lookups.
- Enforcing unique constraints and foreign key relationships.
In a data warehouse environment where full table scans are common, storage indexes, combined with Smart Scan and Exadata’s optimizations, may greatly reduce the need for traditional indexing.
However, for OLTP workloads or cases involving selective queries or join conditions, traditional indexes may still be essential for optimal performance.