This article was published as a part of the Data Science Blogathon
Snowflake is a cloud data platform solution with unique features offered. In this tutorial, you will see an overview of Snowflake, its architecture, how to create a free trial account, and how to create a database, schema, and tables inside Snowflake.
Image source: Link
Snowflake is a completely managed cloud data platform which means there is no need to install any hardware or software that is required to work with it.
If you are from a SQL Developer background, then most of the time, your job includes how to write a faster query and also how to do performance tuning. W.r.t Snowflake even the performance tuning of the query is also taken care of by Snowflake itself. Maintainance or up-gradation is also managed by Snowflake itself. Snowflake cannot be run in an on-premise environment.
The workloads that are supported by Snowflake are listed below
The internal architecture of Snowflake comprises three layers as shown below.
Image credit: Link
2a.Database Storage Layer
This layer utilizes the existing cloud storage capabilities of AWS, GCP, or Azure which is internal to Snowflake. For example, if it is AWS, then S3 is used. As an end-user of the product, this is not revealed and it is completely managed by Snowflake itself.
For example, if we create a table within snowflake, then we can see the table by querying the table from the user interface but we can’t access the table from the cloud storage provider.
2b. Query Processing Layer
This layer consists of several virtual warehouses which are used to process the queries. Each virtual warehouse is not dependent on the other. These represent EC2 compute instances if you are from an AWS background. As an end-user, we can only select the size of the warehouse and also the minimum and a maximum number of clusters that come within each virtual warehouse unlike setting up an EC2 instance in AWS.
2c.Cloud Services Layer
Cloud services layer is a coordinator service that comprises of below
Login to URL and provide all the necessary details like your first name, last name, email, company name and click on continue and shown below
Image source: Author
Then the below screen is popped up where you can see the editions that snowflake provides. Currently, it provides Standard, Enterprise, Business Critical. You are free to select anyone.
You need to provide the cloud provider name as shown below. This is for the storage provider that we discussed above in the Architecture layer. Here I have provided AWS, Once you select then you should also provide the region, Here I have provided Asia Pacific Mumbai as shown below.
The reason for allowing the end-user to select the cloud provider and region is if in case the organization is already hosted on the cloud then we can select the same cloud for snowflake as well. We can also select the different clouds but we need to bear the data transfer cost. Typically we need to pay for the storage and compute capacity that we have used.
Image source: Author
Once you are done, you will receive an email for the registered email id to activate the account. Click to activate and set up your user name and password for the same. You will also receive a link to log in to your snowflake account like below.
This is the unique login link that we need to use every time to log in to the snowflake. Here AZ90921 is the unique account id generated for you. The rest of the URL contains cloud provider details like AWS,south-1 since we have selected the Mumbai region while logging in. Once you log in you will get the options to know quickly about the snowflake as shown below. Check the screenshot below on what you can do on the web-based user interface.
Image source: Author
Let’s quickly see the hands-on now.
Open the worksheet and paste the commands below. Execute the commands using the RUN option.
use role accountadmin
#Creating a warehouse create warehouse if not exists test_warehouse warehouse_size ='SMALL' auto_suspend=300 initially_suspended=true; USE WAREHOUSE TEST_WAREHOUSE # Create a test database create database testdb; #Check if the database is created successfully show databases like 'test%'; USE DATABASE testdb # Create a test schema create schema testschema; #Check if the schema is created successfully show schemas; USE SCHEMA testschema #Creating a sample employee table with two columns empname,empid create or replace table test_emptable (empid number,empname varchar); #Inserting sample data into an employee table insert into test_emptable values(1,'Micheal'); insert into test_emptable values(2,'Nick'); insert into test_emptable values(3,'George'); #Check if the table is created successfully show tables like 'test_emptable'; #Display the content of the table select * from test_emptable
4a.Explanation of the code
In the above code, we have used the role of account admin which is a high-level role available in the snowflake that has access to all objects.
Then I have created a warehouse called “test_warehouse”. Using this warehouse we created a database called “testdb”.In this DB a sample schema “testschema” is created. Then a sample employee table is created and I inserted data in it.
4b.Observation
Have you noticed that I have not given any commit command as we give explicitly in OracleDB? This is because the session has default parameters turned on. Here the parameter auto-commit is turned on. So that is the reason there is no need to give auto-commit explicitly again.
4c.Output
Image source: Author
In this blog post, we have seen what Snowflake is, its architecture, creating a free trial account, and hands-on. Using the free trial version you can also explore most of the prominent features that snowflake provides.
I hope this has given you an idea to get started with the Snowflake data platform. Please take the time to comment below and let me know your feedback.
The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.