Creating Snowflake External Stage Using Google Cloud Storage
If you are a Google Cloud Platform (GCP) user and want to use Google Cloud Storage (GCS) as the external stage to load or unload data to Snowflake, this blog will walk you through the procedure step by step.
Before going any further on loading and unloading data from GCP, let us understand what loading and unloading mean, and which cloud providers Snowflake supports for loading, unloading, and querying your data.
Loading: Loading refers to moving data into a Snowflake table through staging, where we ingest data from the cloud provider's storage bucket or from an internal stage.
Read to know more about loading.
Unloading: Unloading refers to moving data from Snowflake to the cloud provider's storage bucket using data staging.
Cloud providers supported by Snowflake
- Amazon Web Services
- Microsoft Azure
- Google Cloud Platform (Preview Mode)
For loading/unloading data to/from Snowflake, we need to create a storage integration object that connects GCP to Snowflake, and an external stage to temporarily hold data during migration.
Creating Integration and External Stage:
- Log into the Snowflake web console and switch your role to ACCOUNTADMIN
- Create the storage integration object in a Snowflake worksheet by specifying parameters such as the stage type (i.e. external), the storage provider (i.e. GCS), and the blocked and allowed storage locations (here all locations are allowed)
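As a sketch, the integration object described above can be created with a statement like the following (the bucket path 'gcs://my_bucket/' is a hypothetical placeholder; substitute your own bucket):

```sql
-- Create a storage integration named 'gcs_int' for a GCS external stage.
-- STORAGE_ALLOWED_LOCATIONS = ('*') permits all locations, as in this walkthrough;
-- in production you would typically restrict it to specific bucket paths.
CREATE STORAGE INTEGRATION gcs_int
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = GCS
  ENABLED = TRUE
  STORAGE_ALLOWED_LOCATIONS = ('*');
```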
- The query below is used to describe the integration object (here 'gcs_int')
DESC STORAGE INTEGRATION <INTEGRATION NAME>;
- From the results pane, note down the service account name (this is required while adding a member to the cloud bucket permissions; every Snowflake account has its own unique service account)
- Log into the Google Cloud Platform console, click the 'Navigation Menu' -> 'IAM & admin', and then 'Roles'
- Create a new role, select the required permissions, and click 'Create'. Let us give 'Snowflake_role' as the role name.
- Now go to the 'Navigation Menu', select 'Storage', click 'Browser', and choose the bucket you want to integrate with the role
- After choosing the bucket, click 'Show info panel' at the top right corner and then 'Add member'
- In the 'Add member' pane, enter the service account name obtained while describing the integration in the Snowflake console, and choose the role you created earlier (it can be found in the custom roles menu; in this case it is 'Snowflake_role')
- You can also grant other roles permission to use the integration and to create stages for loading/unloading data, using the queries below
1. GRANT USAGE ON INTEGRATION gcs_int TO ROLE sysadmin;
2. GRANT CREATE STAGE ON SCHEMA public TO ROLE sysadmin;
- Now create an external stage
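A minimal sketch of the stage creation, assuming the integration 'gcs_int' from earlier; the stage name 'my_gcs_stage' and the bucket path 'gcs://my_bucket/data/' are hypothetical placeholders:

```sql
-- Create an external stage pointing at the GCS bucket,
-- authenticated through the storage integration created above.
CREATE STAGE my_gcs_stage
  URL = 'gcs://my_bucket/data/'
  STORAGE_INTEGRATION = gcs_int;
```

Once created, `LIST @my_gcs_stage;` shows the files the stage can see in the bucket.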
- Use 'LIST @stage_name' to list all the files located in your Google Cloud bucket in the results pane
We are all set to load/unload data to/from GCP using Snowflake.
LOADING DATA INTO TABLE:
- Create a 'File Format' in Snowflake, using the WebUI or a native SQL command, matching the format of the files stored in the bucket
Note: In our case, DB Name is 'Shivam_data', schema name is 'Public' and file format is 'CsvFile'
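Using the names from the note above, the file format could be created with a statement like this (the delimiter and header settings are assumptions for a typical CSV file):

```sql
-- Create a CSV file format in the Shivam_data.Public schema.
-- FIELD_DELIMITER and SKIP_HEADER are assumed values; adjust to match your files.
CREATE OR REPLACE FILE FORMAT Shivam_data.Public.CsvFile
  TYPE = CSV
  FIELD_DELIMITER = ','
  SKIP_HEADER = 1;
```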
- Finally, use the 'COPY INTO' command to load the data from the GCS bucket into the destination table (here it is 'SAMPLETABLE')
- You can verify the data in the destination table using a simple SELECT statement
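The load and verification steps above can be sketched as follows, assuming the hypothetical stage 'my_gcs_stage' and the file format from the note earlier:

```sql
-- Load all staged files from the GCS bucket into the destination table.
COPY INTO SAMPLETABLE
  FROM @my_gcs_stage
  FILE_FORMAT = (FORMAT_NAME = 'Shivam_data.Public.CsvFile');

-- Verify the loaded rows.
SELECT * FROM SAMPLETABLE LIMIT 10;
```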
Now, let us unload the data from 'SampleTable' into GCS Bucket.
UNLOADING DATA TO GCP BUCKET
- Snowflake automatically compresses the data being unloaded
- If compression is not required, set compression to false; you can unload your data into various file formats. Note the change in output_bytes.
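A hedged sketch of the unload, again assuming the hypothetical stage 'my_gcs_stage'; the 'unload/' prefix is an arbitrary target folder in the bucket:

```sql
-- Unload the table to the GCS bucket via the external stage.
-- COMPRESSION = NONE disables the default auto-compression, as described above.
COPY INTO @my_gcs_stage/unload/
  FROM SampleTable
  FILE_FORMAT = (TYPE = CSV COMPRESSION = NONE)
  OVERWRITE = TRUE;
```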
The unloaded files can now be found in your Google Cloud Bucket
To know more about how we can help you with Snowflake and data management, write to us at [email protected]