Amazon Redshift: a fully managed, petabyte-scale data warehouse service

Redshift is a fully managed petabyte-scale data warehouse service from Amazon. The Amazon Redshift service manages all of the work of setting up, operating, and scaling a data warehouse. These tasks include provisioning capacity, monitoring and backing up the cluster, and applying patches and upgrades to the Amazon Redshift engine.

It is designed for analytics workloads and integrates with existing SQL clients and BI tools. Built on columnar storage technology, it processes queries in parallel across distributed nodes to deliver high performance at scale. From an administration and control perspective, it also provides a number of automation features and tools: provisioning, configuring, monitoring, backing up, and securing a data warehouse are automated.

Benefits:

  1. Fast: Amazon Redshift delivers fast query performance by using columnar storage technology to improve I/O efficiency and by parallelizing queries across multiple nodes.
  2. Simple: Amazon Redshift automates most of the common administrative tasks needed to manage, monitor, and scale a data warehouse.
  3. Extensible: Redshift Spectrum lets you run queries against exabytes of data in Amazon S3 as well as petabytes of data stored on local disks in Amazon Redshift, using the same SQL syntax and BI tools you use today. You can store highly structured, frequently accessed data on Redshift local disks, keep vast amounts of unstructured data in an Amazon S3 “data lake”, and query seamlessly across both.
  4. Scalable: You can resize a cluster up or down as performance and capacity needs change, with just a few clicks in the console or a simple API call.
  5. Secure: Security is built in. You can encrypt data at rest and in transit using hardware-accelerated AES-256 and SSL, isolate clusters using Amazon VPC, and manage keys using AWS Key Management Service (KMS) and hardware security modules (HSMs).
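The "simple API call" mentioned under Scalable could look roughly like the sketch below, which uses the `resize_cluster` operation of the AWS SDK for Python (boto3). The cluster identifier and node count are placeholders, the helper names are hypothetical, and the call assumes AWS credentials and a default region are already configured.

```python
def build_resize_request(cluster_id, node_count):
    """Build the keyword arguments for a Redshift resize request."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return {"ClusterIdentifier": cluster_id, "NumberOfNodes": node_count}


def resize_cluster(cluster_id, node_count):
    """Ask Amazon Redshift to resize an existing cluster (needs boto3 and credentials)."""
    # Imported lazily so build_resize_request can be used without the AWS SDK installed.
    import boto3

    client = boto3.client("redshift")
    return client.resize_cluster(**build_resize_request(cluster_id, node_count))


# "example-cluster" is a placeholder identifier, not a real cluster.
request = build_resize_request("example-cluster", 4)
print(request)
```

Only `build_resize_request` runs here; calling `resize_cluster` would start a real resize on the named cluster.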

How does it compare to traditional data warehousing and analytics:

Amazon Redshift uses a variety of innovations to achieve up to ten times higher performance than traditional databases for data warehousing and analytics workloads:

Columnar Data Storage: Instead of storing data as a series of rows, Amazon Redshift organizes the data by column. Unlike row-based systems, which are ideal for transaction processing, column-based systems are ideal for data warehousing and analytics, where queries often involve aggregates performed over large data sets. Since only the columns involved in the queries are processed and columnar data is stored sequentially on the storage media, column-based systems require far fewer I/Os, greatly improving query performance.
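To make the I/O argument concrete, here is a small sketch that counts how many values an aggregate has to touch in each layout. The table, its column names, and the "values read" counter are illustrative assumptions; this is a toy model, not how Redshift stores blocks on disk.

```python
# Toy table: each row is (order_id, customer, region, amount).
COLUMNS = ("order_id", "customer", "region", "amount")
rows = [
    (1, "alice", "us-east", 120.0),
    (2, "bob", "eu-west", 75.5),
    (3, "carol", "us-east", 210.0),
    (4, "dave", "ap-south", 42.5),
]


def to_columnar(rows, column_names):
    """Pivot a list of row tuples into one list per column."""
    return {name: [row[i] for row in rows] for i, name in enumerate(column_names)}


def sum_amount_row_store(rows):
    """Row layout: whole rows are read, so every field counts as touched."""
    total, values_read = 0.0, 0
    for row in rows:
        values_read += len(row)
        total += row[3]
    return total, values_read


def sum_amount_column_store(columns):
    """Column layout: only the 'amount' column is read."""
    amounts = columns["amount"]
    return sum(amounts), len(amounts)


columns = to_columnar(rows, COLUMNS)
row_total, row_reads = sum_amount_row_store(rows)
col_total, col_reads = sum_amount_column_store(columns)
print(row_total, row_reads)
print(col_total, col_reads)
```

Both layouts return the same sum, but the row layout touches every field of every row while the column layout touches only the one column the query needs.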

Advanced Compression: Columnar data stores can be compressed much more than row-based data stores because similar data is stored sequentially on disk. Amazon Redshift employs multiple compression techniques and can often achieve significant compression relative to traditional relational data stores. When loading data into an empty table, Amazon Redshift automatically samples your data and selects the most appropriate compression scheme.
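As a rough illustration of why sequentially stored, similar values compress well, the sketch below applies run-length encoding to a single column. Run-length encoding is only one possible scheme, and the sample column is made up; the code does not reproduce Redshift's own encodings or its automatic selection logic.

```python
from itertools import groupby


def run_length_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


def run_length_decode(pairs):
    """Expand (value, run_length) pairs back into the original sequence."""
    return [value for value, count in pairs for _ in range(count)]


# A column whose values arrive in long runs, as often happens with sorted data.
region_column = ["us-east"] * 4 + ["eu-west"] * 3 + ["ap-south"] * 2

encoded = run_length_encode(region_column)
print(encoded)
print(len(region_column), "values stored as", len(encoded), "pairs")
```

In a row layout the same region values would be interleaved with order IDs, names, and amounts, so runs like these would not appear.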

Massively Parallel Processing (MPP): Amazon Redshift automatically distributes data and query load across all nodes. Amazon Redshift makes it easy to add nodes to your data warehouse and enables you to maintain fast query performance as your data warehouse grows.
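The idea of spreading data and query work across nodes can be sketched as a scatter-and-gather aggregate. Everything here is an assumption made for illustration: the CRC32-based distribution, the thread pool standing in for compute nodes, and the function names are not Redshift's implementation.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def node_for(key, node_count):
    """Pick a node for a row by hashing its distribution key."""
    return zlib.crc32(key.encode("utf-8")) % node_count


def distribute(rows, node_count):
    """Split rows into one slice per node based on the customer key."""
    slices = [[] for _ in range(node_count)]
    for customer, amount in rows:
        slices[node_for(customer, node_count)].append((customer, amount))
    return slices


def partial_sum(node_rows):
    """Work done independently on each node: sum its local rows."""
    return sum(amount for _, amount in node_rows)


def parallel_total(rows, node_count):
    """Scatter rows to nodes, aggregate locally, then combine the partial results."""
    slices = distribute(rows, node_count)
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        partials = list(pool.map(partial_sum, slices))
    return sum(partials)


sales = [("alice", 120), ("bob", 75), ("carol", 210), ("dave", 42), ("erin", 8)]
print(parallel_total(sales, 3))
```

Because each node sums only its own slice, adding nodes shrinks the work per node while the final combining step stays small.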

Redshift Spectrum: Redshift Spectrum enables you to run queries against exabytes of data in Amazon S3. There is no loading or ETL required. Even if you don’t store any of your data in Amazon Redshift, you can still use Redshift Spectrum to query datasets as large as an exabyte in Amazon S3. When you issue a query, it goes to the Amazon Redshift SQL endpoint, which generates the query plan. Amazon Redshift determines what data is local and what is in Amazon S3, generates a plan to minimize the amount of Amazon S3 data that needs to be read, requests Redshift Spectrum workers out of a shared resource pool to read and process data from Amazon S3, and pulls results back into your Amazon Redshift cluster for any remaining processing.
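The flow described above (plan, read as little of S3 as possible, process in workers, combine in the cluster) can be imitated with a toy model. The object keys, the `year=` partition naming, and the function names below are hypothetical, and partition pruning is used only as one example of how a plan might reduce the data read; this is not Redshift Spectrum's actual planner.

```python
# Hypothetical S3 layout: object key -> rows of (region, amount).
s3_objects = {
    "sales/year=2016/part-0": [("us-east", 100), ("eu-west", 50)],
    "sales/year=2017/part-0": [("us-east", 70), ("eu-west", 30)],
    "sales/year=2017/part-1": [("us-east", 25)],
}


def plan_scan(object_keys, year):
    """Planning step: keep only objects whose partition matches the filter."""
    wanted = f"/year={year}/"
    return [key for key in object_keys if wanted in key]


def worker_scan(rows, region):
    """Stand-in for a worker: filter and pre-aggregate a single object."""
    return sum(amount for row_region, amount in rows if row_region == region)


def query_total(objects, year, region):
    """Scan only the planned objects, then combine the workers' partial sums."""
    keys = plan_scan(sorted(objects), year)
    partials = [worker_scan(objects[key], region) for key in keys]
    return sum(partials), keys


total, scanned = query_total(s3_objects, 2017, "us-east")
print(total, scanned)
```

The 2016 object is never read, and only small partial sums travel back for the final step.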

Pricing:

As with all Amazon Web Services, there are no up-front investments required, and you pay only for the resources you use. Amazon Redshift lets you pay as you go. You can even try Amazon Redshift for free.
