Now that Snowflake is available, you shouldn’t be building your grandmother’s data warehouse anymore (or your father’s for that matter....). It is an entirely new world now with data warehouse technology that has been specifically created for the cloud and low maintenance. Snowflake is the first “Cloud Data Warehouse” that was designed with cloud-based backend technology in mind. It is not a re-used older version of an open-source database or technology retrofitted from 10-20 years ago.
There is no other solution out there that comes close to Snowflake’s price, scale, and features.
Frank’s Top 5 Reasons to Modernize Your Data Warehouse:
✔︎ Only Pay for What You Use
- No more buying capacity in advance and paying for what you do not use.
✔︎ Elastic. Separate Compute from Compute and Storage
- No concurrency issues anymore. No ETL bottlenecks with reporting!
- You can actually switch warehouse size (which is your compute) from 1 SQL command to the next.
✔︎ Zero Copy Clone and Time Travel
- Never have to deal with backups again :)
- DBA job/data saving. You can always go back to correct data.
✔︎ JSON Integration
✔︎ LOW Maintenance (no indexing, vacuuming, optimizing, etc.)
- Gone are the days where there was TONS of work and risk with designing data
warehouses to deal with aggregates.
- The cost of maintaining Snowflake reduces your service costs tremendously.
When I was building “Big Data” data warehouse systems in the ‘90s and early 2000s, it was an entirely different world with a TON of optimization and tuning work. Data warehouses typically took years to roll out. And there was no agile development. With Snowflake, we actually avoid all of that work. There is really only one optimization with cluster keys and that can be automated. ALL of the other existing on-prem systems and cloud data warehouse solutions besides BigQuery blackbox require tuning and maintenance expertise. They are less agile and more costly to maintain. They slow your business down. In this blog, I’ll talk about why it might be time to modernize your data warehouse and upgrade to Snowflake if you haven’t already.
When starting your data warehouse modernization, begin by writing out what your business and technical objectives are. This will help align your system with your goals. After you’ve thought through that, it’s time to get into the nitty gritty. Below, we have outlined steps to think about when modernizing your data warehouse. You definitely want to make sure the entire pipelines—in and out—are secured and monitored. Also, you want to think about how you can actually have a truly agile data warehouse and data warehouse development along with continuous integration. This is only fully achievable on Snowflake with the combination of Zero-copy Cloning and integration with Jenkins, Git, etc.
You need to plan for details such as:
- List processes to migrate and/or modernize and re-engineer
- Create a future state architecture diagram
- List deployment processes used for the migration. Document how you will do the one-time data migration of existing tables. Plan on how you will do incremental releases
- Document any process dependencies for data sets and objects
When you are planning, begin with the “end in mind.” By that I mean, think through how data will enable your business objectives. Also, how can you make the system flexible? When you modernize your data warehouse, it is best to initially do a straightforward lift and shift of data from the legacy database to Snowflake to have minimal impact on the business. Then, you can rethink and modernize your data model. That is what we typically recommend to get immediate results and cost savings, unless your current data model doesn’t work. In the future, you can even look at moving to schema on read concepts with JSON which gives you much greater flexibility. If you have any questions about how to modernize your data pipeline and model, we are happy to provide you with a complimentary data assessment. Whether you’ve already begun migrating or are just getting started, we can provide you with detailed migration checklists at no cost.
Here are some other considerations to take into account when modernizing your data warehouse.
In the modern, well-designed data warehouse...Security is King. Snowflake has thought through security in depth, but for your modernization work, you do need to think through the security of your entire data pipeline. You need to make sure that all data pipelines and work on EC2 or other cloud or on-prem pipelines are encrypted workloads from end to end.
AGILE DATA WAREHOUSING.
Snowflake is truly game changing with its unique features of Zero-copy Cloning and Time Travel. These features are data professionals’ best friends. I mean, even now, when we first tell people about how these features work, they really do not believe us.
Snowflake Zero-copy Cloning makes it easy to test and QA your database without physically copying your data. You can learn more about the feature here. Time Travel is also an incredible tool. It enables you to access historical data (i.e. data that has been changed or deleted) at any point within a defined period. Here’s some helpful documentation on that!
One of the biggest benefits, though, is “truly” agile data warehousing. Before Snowflake, I really thought this terminology was a misnomer. I mean, all on-prem and most cloud data warehouses are anything but agile. DDL-type agility and data movement with “large” data sets had to be copied, and copying and I/O takes real time.
CONTINUOUS INTEGRATION AND CONTINUOUS DEVELOPMENT (CI/CD) on SNOWFLAKE.
Over the years, the concept of including database releases as part of CI/CD has been discussed and partially implemented. Now, with Snowflake you can make it a reality. Using a combination of Git, Jenkins, and Snowflake you actually can build solid, reliable CI/CD pipelines. The key is really in the Zero-copy Clone functionality which allows the protection of data before, during, and after deployment. DDL and DML can be stored in Git and actually tested and verified through test harnesses. All database changes can be subjected to source control as the application changes. For data, you can actually implement auditing and versioning to track changes with different clones.
Modernizing Governance. In most older or even semi-recent data warehouses we come across, there aren’t well updated catalogs or governance of who can access or download what data. With the new power of Snowflake, EVERY SINGLE interaction is within an audit table. You know every query rendered and every user that did it, along with the time it took. There are new tools or improvements in existing tools from Collibra, Informatica, Talend, and others on data cataloging and lineage. You need to at least look at these and think through your strategy in this area. Also, consider many of the new regulations with GDPR, CCCA, and others.
Every time we have modernized a data warehouse and moved it from any on-prem system to Snowflake, we have seen large reductions in ETL/ELT times and reporting times. I’ve worked on jobs that took 30 hours and seen them reduce to under one hour with just moving to a large warehouse/compute.
While migrating and after migration, you do need to put in Resource Monitors to make sure you understand the costs of the workloads. Even though, normally, the costs are so much lower, you still want to monitor and understand your workloads. If you are interested in monitoring down to even the business unit, then there are some tips and expertise we can share regarding that too.
COMPLIMENTARY DATA ASSESSMENT.
Reach out to Fairway to answer any questions or get started on a complimentary data assessment. As a trusted partner, we can help you with Snowflake implementations or any other data warehouse modernization projects you are working on. Ensure your business has access to best-in-breed tools for your success.