Wednesday, April 24, 2013

Vertica Analytics Database:Session I ---> Learning Projections

The Power of Projections – Part 1

Many of us here at Vertica have been amazed and frankly flattered at how much FUD our competitors are putting out there regarding Vertica’s “projections”.  Having heard some incredibly inaccurate statements about them, we’ve decided to clarify what they are, how and why we have them, and the advantages they bring.
  Actually, projections are a pivotal component of our platform, and a major area of differentiation from the competition.  Most importantly, Vertica’s customers love the benefits projections bring! In an effort to provide you with as much detail as possible, this blog is broken up into three posts with Parts II and III being more technical.
First, some background. In traditional database architectures, data is primarily stored in tables. Additionally, secondary tuning structures such as indexes and materialized views are created for improved query performance.  Secondary structures like MVs and indexes have drawbacks – for instance they are expensive to maintain during data load (more detail on this in Part III).  Hence best practices often require rebuilding them during nightly batch windows, which prevents the ability to do real-time analytics.  Also, it isn’t uncommon to find data warehouse implementations that balloon to 3-6x base table sizes due to these structures. As a result, customers are often forced to remove valuable detailed data and replace it with aggregated data to solve this problem. However you can’t monetize what you lost!
Vertica created a superior solution by optimizing around performance, storage footprint, flexibility and simplicity. We removed the trade-off between performance and data size by using projections as the lynchpin of our purpose-built architecture.  Physical storage consists of optimized collections of table columns, which we call “projections”. In the traditional sense, Vertica has no raw uncompressed base tables, no materialized views, and no indexes. As a result there are no complex choices – everything is a projection!  Of course, your logical schema (we support any) remains the same as with any other database so that importing data is a cinch.  Furthermore, you still work with standard SQL/DDL (i.e. Create Table statements, etc).  The magic of projections and Vertica are what we do under the covers for you with the physical storage objects.  We provide the same benefits as indexes without all of the baggage.  We also provide an automatic tool, the Database Designer (more on this in Part II) to create projections automatically.
Projections store data in formats that optimize query execution. They share one similarity to materialized views in that they store data sets on disk rather than compute them each time they are used in a query (e.g. physical storage).  However, projections aren’t aggregated but rather store every row in a table, e.g. the full atomic detail. The data sets are automatically refreshed whenever data values are inserted, appended, or changed – again, all of this happens beneath the covers without user intervention – unlike materialized views. Projections provide the following benefits:
  • • Projections are transparent to end-users and SQL. The Vertica query optimizer automatically picks the best projections to use for any query.
  • • Projections allow for the sorting of data in any order (even if different from the source tables). This enhances query performance and compression.
  • • Projections deliver high availability optimized for performance, since the redundant copies of data are always actively used in analytics.  We have the ability to automatically store the redundant copy using a different sort order.  This provides the same benefits as a secondary index in a more efficient manner.
  • • Projections do not require a batch update window.  Data is automatically available upon loads.
  • • Projections are dynamic and can be added/changed on the fly without stopping the database.
In summary, Vertica’s projections represent collections of columns (okay so it is a table!), but are optimized for analytics at the physical storage structure level and are not constrained by the logical schema.  This allows for much more freedom and optimization without having to change the actual schema that certain applications are built upon.

Hopefully this gave you an overview of what projections are and how they work.  Please read Part II and Part III of this post to drill down into projections even further.

No comments:

Post a Comment