Wednesday, April 24, 2013

Vertica Analytics Database:Session II ---> Learning Projections


The Power of Projections – Part 2

In Part I of this post, we introduced you to the simple concept of Vertica’s projections.  Now that you have an understanding of what they are, we wanted to go into more detail on how users interface with them, and introduce you to Vertica’s unique Database Designer tool.
For each table in the database, Vertica requires a minimum of one projection, called a “superprojection”. A superprojection is a projection for a single table that contains all the columns and rows in the table.  Although the data may be the same as a traditional base table, it has the advantages of segmentation (spreading the data evenly across the nodes in the cluster), sorting, and encoding (compressing the size of the data on a per column basis).  This leads to significant footprint reduction as well as load and query performance enhancements.  To give you a sense of the impact that Vertica’s projections have on size, most Vertica customers have at least a 50% reduction in footprint thanks to our compression.  This includes the high availability copy and on average 3-5 projections.  Again, contrast this to traditional row-store databases ballooning upwards of 5x their original size and that is a 10:1 difference in Vertica.
To get your database up and running quickly, Vertica automatically creates a default superprojection for each table created through the CREATE TABLE and CREATE TEMPORARY TABLE statements. This means that if database admins and users never want to know about a projection, they don’t have to – Vertica will automatically handle it under the covers. To further illustrate this point, users can simply pass in projection parameters such as Order By, Encodings, Segmentation, High Availability, and Partitioning right after the CREATE TABLE statement, never interfacing directly with a projection under the hood.
By creating a superprojection for each table in the database, Vertica ensures that all SQL queries can be answered. Default superprojections alone will do far better than a row-store, however, by themselves they may not fully optimize database performance and Vertica’s full potential.  Vertica recommends that you start with the default projections and then use Vertica’s nifty Database Designer™  to optimize your database.  Database Designer creates new projections that optimize your database based on its data statistics and the queries you use. Database Designer:
1. Analyzes your logical schema, sample data, and sample queries (optional).
2. Creates a physical schema design (projections) in the form of a SQL script that can be deployed automatically or manually.
3. Can be used by anyone without specialized database knowledge (even business users can run Database Designer).
4. Can be run and re-run anytime for additional optimization without stopping the database.
Designs created by the Database Designer provide exceptional query performance. The Database Designer uses sophisticated strategies to provide excellent ad-hoc query performance while using disk space efficiently. Of course, a proficient human may do even better than the Database Designer with more intimate knowledge of the data and the use-case – a small minority of our customers prefer to do manual projection design and can usually get a good feel for it after working with the product for a few weeks.
We’ve heard people ask if we need a projection for each query in Vertica, which we absolutely do not! Typically our customers use 3-5 projections and several are using the single superprojection only. A typical customer would have the superprojection along with a few smaller projections (often comprised of only a few columns each).  Unlike MVs and indexes, projections are cheap to maintain during load and due to Vertica’s compression, the resulting data size tends to be 5-25x smaller than the base data. Depending on your data latency needs (seconds to minutes) and storage availability you could choose to add more projections to further optimize the database.  Also important to note is that Vertica does not charge extra for projections, regardless of how many are deployed.  So whether a customer has 1 or 50 projections, their license fees are the same – entirely based on raw data.
As you can see, projections are very easy to work with, and if you are a business analyst who doesn’t know SQL/DDL, that’s okay, we created a tool that designs, deploys and optimizes the database automatically for you.  Our objective from day one has always been to enable customers to ask more questions and get faster answers from their data without having to constantly tune the underlying database.  Part III of this post goes into more detail on projections versus indexes and materialized views.

No comments:

Post a Comment