For a recent proof of concept, I needed to transfer five or six relatively small tables from SQL Server to Snowflake. Normally, I would go through all the steps to set up an ingestion pipeline in Snowflake; however, for a POC, that seemed like overkill. I simply needed to copy the data across. Python and Snowpark to the rescue!
Prerequisites
This article assumes the reader has a working knowledge of Python, Snowflake, and SQL Server. Additional prerequisites for the code samples shown include:
- Python Development Environment
  - I use VSCode, but any Python IDE or environment will work
- Python Libraries
  - snowflake-connector-python
  - snowflake-snowpark-python
  - sqlalchemy
  - pandas
  - pyodbc (required by the mssql+pyodbc SQLAlchemy dialect used below)
- Snowflake Account
- SQL Server access
Import Python Libraries
Like most Python scripts, the first step is to import all the needed libraries.
#IMPORT LIBRARIES
import urllib.parse
import pandas
from sqlalchemy import create_engine
from snowflake.snowpark import Session
Build SQL Server Connection
Using SQLAlchemy, build a connection to SQL Server. In my instance, I am using Windows Authentication, so no username/password is specified. Modify the connection string as needed to fit your SQL Server instance. Note that the ODBC portion of the connection string must be URL-encoded before it is embedded in the SQLAlchemy URL.
#SET SQL SERVER OPTIONS
sql_server_options = {
    "server": "SQLSERVERNAME",
    "database": "DBNAME",
    "table_name": "TABLENAME"
}
#BUILD SQL CONNECTION STRING (THE ODBC PORTION MUST BE URL-ENCODED)
odbc_connect_str = urllib.parse.quote_plus(
    'DRIVER={ODBC Driver 17 for SQL Server};'
    f'SERVER={sql_server_options["server"]};'
    f'DATABASE={sql_server_options["database"]};'
    'Trusted_Connection=yes;'
)
sql_server_conn_str = f'mssql+pyodbc:///?odbc_connect={odbc_connect_str}'
# Create an SQLAlchemy engine and connect to SQL Server
sql_server_engine = create_engine(sql_server_conn_str)
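Before reading any tables, it can be worth a quick sanity check that the engine actually connects. A minimal check might look like this:
#OPTIONAL: VERIFY THE ENGINE CONNECTS BEFORE READING ANY TABLES
from sqlalchemy import text

with sql_server_engine.connect() as conn:
    conn.execute(text("SELECT 1"))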
Load SQL Server Data into Pandas
Use the read_sql_table function to read the SQL Server table into a Pandas dataframe. This process can be very resource-intensive, so I recommend using this method only for “smaller” tables. In my example, I am using a date dimension table that is 32 columns wide by 15,000 rows deep. This method may not be a good candidate for larger (i.e., wide and deep) tables.
Additionally, to assist with the table build in Snowflake, I’ll convert all the column names to upper case and set all the datatypes to string. The string datatype will be sufficient for a “quick and dirty” load to Snowflake. Once the data is loaded, I can copy it into another table and assign proper datatypes. This exercise is simply to get the data moved from SQL Server to Snowflake quickly.
#READ DATA
sql_df = pandas.read_sql_table(sql_server_options["table_name"], sql_server_engine)
#CONVERT TO STRING AND RENAME COLUMNS TO UPPER
sql_df = sql_df.astype("string")
sql_df.rename(columns=str.upper, inplace=True)
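If a table is somewhat bigger than you want to read in one shot, pandas can stream it in chunks. A rough sketch follows; the chunk size is arbitrary, and note that concatenating at the end still builds the full dataframe in memory, so for genuinely large tables you would write each chunk to Snowflake as it arrives instead.
#SKETCH: STREAM A LARGER TABLE IN CHUNKS INSTEAD OF A SINGLE READ
chunks = pandas.read_sql_table(
    sql_server_options["table_name"],
    sql_server_engine,
    chunksize=50_000  #ARBITRARY CHUNK SIZE, TUNE AS NEEDED
)
sql_df = pandas.concat(chunk.astype("string") for chunk in chunks)
sql_df.rename(columns=str.upper, inplace=True)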
Build Snowflake Connection
Use Snowpark for Python to establish a connection to Snowflake, making sure to specify the proper database, schema, role, and warehouse for creating the target table.
#SET SNOWFLAKE CREDS
snowflake_creds = {
    "account": "ACCOUNTLOCATOR",
    "user": "USERNAME",
    "password": "PASSWORD",
    "database": "DATABASE",
    "schema": "SCHEMA",
    "role": "ROLE",
    "warehouse": "WAREHOUSE"
}
#CREATE THE SNOWPARK SESSION
sf_session = Session.builder.configs(snowflake_creds).create()
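If you want to confirm the session landed in the right context before writing anything, a quick query does the trick:
#OPTIONAL: CONFIRM THE SESSION CONTEXT BEFORE WRITING
print(sf_session.sql(
    "SELECT CURRENT_DATABASE(), CURRENT_SCHEMA(), CURRENT_ROLE(), CURRENT_WAREHOUSE()"
).collect())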
Create the Table and Write Data to Snowflake
Using the write_pandas command along with the auto_create_table setting, write the SQL Server data contained in the Pandas dataframe to Snowflake. With auto_create_table set to True, write_pandas creates a table with the same structure as the dataframe; overwrite set to True drops and recreates the table if it already exists.
#CREATE TABLE AND WRITE DATA, DROP TABLE FIRST IF ALREADY EXISTS
sf_session.write_pandas(
    sql_df,
    str.upper(sql_server_options["table_name"]),
    auto_create_table=True,
    overwrite=True,
    table_type="transient"
)
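As mentioned earlier, once the all-string table is loaded, you can copy it into a table with proper datatypes. Here is a minimal sketch, assuming two hypothetical date dimension columns, DATE_KEY and CALENDAR_DATE; substitute your own columns and types:
#SKETCH: COPY THE ALL-STRING TABLE INTO A PROPERLY TYPED TABLE
#DATE_KEY AND CALENDAR_DATE ARE HYPOTHETICAL COLUMNS, SUBSTITUTE YOUR OWN
target_table = str.upper(sql_server_options["table_name"])
sf_session.sql(f"""
    CREATE OR REPLACE TABLE {target_table}_TYPED AS
    SELECT
        CAST(DATE_KEY AS INTEGER) AS DATE_KEY,
        CAST(CALENDAR_DATE AS DATE) AS CALENDAR_DATE
    FROM {target_table}
""").collect()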
Conclusion
Python and Snowpark make “quick and dirty” tasks like this quite simple, while hinting at just how powerful they can be for larger-scale work. It took just under 50 lines of code to quickly and easily move data from SQL Server to Snowflake. I don’t recommend this method for large datasets or for production-grade work; however, to quickly move something for POCs or “tinkering,” it’s hard to beat a small reusable script like this.
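As a closing illustration, the same pattern extends naturally to the handful of tables mentioned at the start. A rough sketch, with placeholder table names:
#SKETCH: MOVE SEVERAL SMALL TABLES IN ONE PASS (TABLE NAMES ARE PLACEHOLDERS)
for table_name in ["DimDate", "DimProduct", "DimCustomer"]:
    df = pandas.read_sql_table(table_name, sql_server_engine).astype("string")
    df.rename(columns=str.upper, inplace=True)
    sf_session.write_pandas(
        df,
        table_name.upper(),
        auto_create_table=True,
        overwrite=True,
        table_type="transient"
    )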
The full script is available on my GitHub repository.