Wednesday, April 24, 2013

Vertica Analytics Database:Session III ---> Learning Projections


The Power of Projections – Part 3

In Part I and Part II of this post, we introduced you to Vertica’s projections, and described how easy it is to interface with them directly via SQL or through our Database Designer™ tool.

We will now end this series by comparing and contrasting Vertica’s projections with traditional indexes and materialized views.
Row-store databases often use Btree indexes as a performance enhancement.  Btree indexes are designed for highly concurrent single-record inserts and updates, e.g. an OLTP scenario. Most data warehouse practitioners would agree that index rebuilds after a batch load are preferable to taking the hit of maintaining them record by record given the logging overhead. Bitmap indexes are designed for bulk loads and are better than btrees for data warehousing but only suitable for low cardinality columns and a certain class of queries.  Even though you have these indexes to help find data, you still have to go to the base table to get the actual data, which brings with it all the disadvantages of a row store.
In a highly simplified view, you can think of a Vertica projection as a single level, densely packed, clustered index which stores the actual data values, is never updated in place, and has no logging. Any “maintenance” such as merging sorted chunks or purging deleted records is done as automatic and background activity, not in the path of real-time loads.  So yes, projections are a type of native index if you will, but they are very different from traditional indexes like Bitmap and Btrees.
Vertica also offers a unique feature known as “pre-join projections”. Pre-join projections denormalize tables at the physical layer under the covers providing a significant performance advantage over joining tables at run-time.  Pre-join projections automatically store the results of a join ahead of time, yet the logical schema is maintained – again, flexibility of the storage structure without having to rewrite your ETL or application.  Vertica can get away with this because it excels at sparse data storage, and in particular isn’t penalized at all for null values nor for wide fact tables. Since Vertica does not charge extra for additional projections, this is a great way to reap the benefits of denormalization without the need to purchase a larger capacity license.
So to sum up, here’s how Vertica projections stack up versus materialized views and conventional indexes.

Vertica’s Projections

Traditional Materialized Views

Traditional Indexes

  • Are primary storage – no base tables are required
  • Are secondary storage
  • Are secondary storage pointing to base table data
  • Can be segmented, partitioned, sorted, compressed and encoded to suit your needs
  • Are rigid: Practically limited to columns and query needs, more columns = more I/O
  • Support one clustered index at most – tough to scale out
  • Have a simple physical design
  • Use Aggregation losing valuable detail
  • Require complex design choices
  • Are efficient to load & maintain
  • Are mostly batch updated
  • Are expensive to update
  • Are versatile – they can support any data model
  • Provide high data latency
  • Provide high data latency
  • Allow you to work with the detailed data
  • Provide near-real time low data latency
  • Combine high availability with special optimizations for query performance
.
That’s pretty much all there is to it.  Whether you are running ad-hoc queries or canned operational-BI workloads, you will find projections to be a very powerful backbone for getting the job done!

No comments:

Post a Comment