Running Cloudera on AWS

Written by Matthias Vallaey | 7 Mar 2018 14:16:01

Customers of Cloudera and Amazon Web Services (AWS) now have the ability to run the enterprise data hub in the AWS public cloud, leveraging the power of the Cloudera Enterprise platform and the flexibility of the AWS cloud together.

This joint solution offers benefits, including:

Flexible Deployment, Faster Time to Insight

Running Cloudera Enterprise on AWS gives customers the greatest flexibility in how they deploy Hadoop. They can bypass prolonged infrastructure selection and procurement processes and rapidly put Cloudera’s Platform for Big Data to work, realizing tangible business value from their data sooner. Hadoop excels at large-scale data management, and the AWS cloud focuses on providing infrastructure services on demand. Combining the two lets customers leverage the power of Hadoop much faster and on demand.

Scalable Data Management

At many large organizations, it can take weeks or even months to add new nodes into a traditional data cluster. By deploying Cloudera Enterprise in AWS, enterprises can effectively shorten rest-to-growth cycles to scale their data hubs as their business grows.

On-demand Processing Power

While Hadoop focuses on colocating compute with disk, many processes benefit from increased compute power. Deploying Hadoop on Amazon allows fast ramp-up and ramp-down based on the needs of specific workloads, a flexibility that does not come easily with on-premises deployment.
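
As a rough illustration of sizing a cluster to the needs of a specific workload, the sketch below estimates how many nodes a job would need to finish within a deadline. It assumes the work scales linearly across nodes, which real Hadoop jobs only approximate. The function name `nodes_needed` and the workload figures are hypothetical and are not taken from Cloudera or AWS tooling.

```python
import math


def nodes_needed(work_node_hours: float, deadline_hours: float) -> int:
    """Smallest node count that finishes the work within the deadline.

    Assumes perfectly linear scaling (an idealization): a job that needs
    120 node-hours finishes in 4 hours on 30 nodes.
    """
    if work_node_hours < 0 or deadline_hours <= 0:
        raise ValueError("work must be >= 0 and deadline must be > 0")
    # Always keep at least one node, even for tiny jobs.
    return max(1, math.ceil(work_node_hours / deadline_hours))


# Hypothetical workloads: (name, node-hours of work, deadline in hours).
workloads = [
    ("nightly_etl", 120, 4),
    ("ad_hoc_report", 6, 2),
    ("monthly_model", 900, 12),
]

for name, work, deadline in workloads:
    print(f"{name}: {nodes_needed(work, deadline)} nodes")
```

With on-demand infrastructure, each of these workloads could in principle run on a cluster of its own size, rather than on one fixed cluster sized for the largest job.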

Improved Efficiency, Increased Cost Savings

Deploying in AWS eliminates the need for organizations to dedicate resources toward maintaining a traditional data center, enabling them to focus instead on core competencies. As annual data growth for the average enterprise continues to skyrocket, even relatively new data management systems may experience strain under the demands of modern high-performance workloads. By moving their data management platform to the cloud, enterprises can offset or avoid costly annual investments in their on-premises data infrastructure to support new enterprise data growth, applications, and workloads.

Solution Highlights

Unified Platform

Cloudera's platform covers the full data pipeline, tightly integrated with common storage, schema, metadata, security, governance, and operations. Cloudera Altus makes it easy to deploy and manage any cloud workload and build higher-level applications in the cloud. This allows customers to focus on data and innovation rather than cloud infrastructure.

Built for the Enterprise 

Cloudera's platform works against Amazon S3 (Simple Storage Service), so batch workloads such as Data Engineering/Science or Analytic DB can be quickly spun up, sized appropriately, scaled, and terminated (if necessary) to manage cost. Cloudera Altus does all of this on a platform that delivers enterprise-grade security, high-performance analytic engines, and data governance.

Altus works within the cloud service provider architecture. It creates clusters in a VPC in your AWS account, and Altus jobs read input from and write output to Amazon S3. Altus offers a command line interface (CLI) as well as a web user interface. You use the Altus console or the CLI to perform tasks such as creating clusters and running jobs on the cluster. The Altus console also provides tools to facilitate administrative tasks, such as environment and account setup. Altus provides a Data Engineering service that enables you to create clusters and run jobs specifically for data science and engineering workloads, including batch processing jobs.
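
To make the cost argument for spinning clusters up and terminating them concrete, here is a minimal back-of-the-envelope sketch comparing an always-on cluster with a transient one that exists only while a batch job runs. The hourly price, node count, and run schedule are hypothetical placeholders, not AWS or Cloudera pricing, and the model ignores Amazon S3 storage, data transfer, and licensing costs.

```python
HOURS_PER_MONTH = 730  # average number of hours in a month


def always_on_cost(nodes: int, price_per_node_hour: float) -> float:
    """Monthly compute cost of a cluster that is never shut down."""
    return nodes * price_per_node_hour * HOURS_PER_MONTH


def transient_cost(nodes: int, hours_per_run: float, runs_per_month: int,
                   price_per_node_hour: float) -> float:
    """Monthly compute cost of a cluster created per job and terminated after."""
    return nodes * hours_per_run * runs_per_month * price_per_node_hour


# Hypothetical figures: 10 nodes at 0.50 per node-hour,
# one 3-hour batch job per day for 30 days.
price = 0.50
permanent = always_on_cost(10, price)
transient = transient_cost(10, 3, 30, price)

print(f"always-on: {permanent:.2f}")
print(f"transient: {transient:.2f}")
print(f"savings:   {permanent - transient:.2f}")
```

Because the job's input and output live in Amazon S3 rather than on the cluster's disks, terminating the cluster between runs does not lose data, which is what makes the transient model workable.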
 

Simplified and Optimized

Cloudera on AWS customers can take advantage of elastic infrastructure to grow and shrink their clusters, provision compute on demand, and run against cloud-native object storage. Users can configure instances optimized for the specific workload they want to run, and terminate the job when it's complete. Additionally, Cloudera Altus is easy to use and cost-effective, and it delivers extreme agility to the business by removing the burden of cluster operations. The PaaS offering includes intelligent defaults, limited up-front configuration, and a jobs-first orientation. That means users can focus solely on their work without having to concern themselves with cloud infrastructure management.