This article is inspired by the acronym TL;DR (too long; didn’t read). As I dive more into technology topics such as Snowflake, Streamlit, AI, data management, and more, I find a lot of content that can be rather lengthy. Not all the content has a nice intro or summary. I’m doing a little finger-pointing at myself here because I’ve skipped that crucial intro on several articles.
TL;DR
In this demo, I’ll show a Streamlit app that uses Snowflake Cortex to extract the text from a URL and produce a summary of it. Prerequisites include a Snowflake account with Cortex enabled and a Python dev environment with the Streamlit, Snowflake, and BeautifulSoup libraries installed.
Use AI to Summarize Web Content
With some of the “tinkering” I’ve been doing with Cortex, Snowflake’s integrated AI toolset, I wondered how I could leverage it to summarize web content without copying and pasting the full content into an app. That spawned a simple Streamlit app to do just that.
Getting Started
You’ll need a Snowflake account with the appropriate permissions granted to access the Cortex functions. I covered adding this permission set in a previous article.
If not already done, install the Snowflake Snowpark, Connector, and ML libraries, along with Streamlit and BeautifulSoup4, into your Python environment. I use VSCode, so I installed all of these from the command line using pip.
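For reference, assuming the standard PyPI package names for each of these (your environment may differ), the install looks something like this. I’ve included requests since the app imports it as well:
pip install snowflake-snowpark-python snowflake-connector-python snowflake-ml-python streamlit beautifulsoup4 requests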
Import Libraries and Snowflake Credentials
Now that all the prerequisites are handled, we can start building the app in our Python dev environment. First, we’ll import all our needed libraries and create our Snowflake credentials dictionary. Enter your Snowflake account information where specified and be sure to use the user and role with Cortex access granted in the first step.
#IMPORT SNOWFLAKE LIBRARIES
from snowflake.snowpark import Session as sp
from snowflake import cortex as cx
import streamlit as st
#IMPORT "EXTRA" LIBRARIES
import requests
from bs4 import BeautifulSoup
#SET SNOWFLAKE CREDS
snowflake_creds = {
    "account": "YOUR ACCOUNT",
    "user": "YOUR USERNAME",
    "password": "YOUR PASSWORD",
    "database": "YOUR DB",
    "schema": "YOUR SCHEMA",
    "role": "YOUR ROLE",
    "warehouse": "YOUR WAREHOUSE"
}
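The Summarize call later in the app expects a Snowpark session (sf_session), so we open one from the credentials dictionary. Here’s a minimal sketch using the standard Snowpark session builder (note that Session was imported above under the alias sp):
#OPEN A SNOWPARK SESSION WITH THE CREDENTIALS DICTIONARY
sf_session = sp.builder.configs(snowflake_creds).create()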
URL Text Input
For this app, I’m going to keep it pretty simple: a header and a single input box for the URL of the content to summarize. Around that input, I’ll add some conditional logic to check that a URL was actually submitted, that it’s valid, that the content can be read, and so on (a sketch of those checks appears after the parsing step below).
st.set_page_config(page_title="Snowflake Cortex URL Summarizer")
st.header("Snowflake Cortex URL Summarizer")
#EXTRACT TEXT FROM PUBLIC WEBSITE WITH "GET"
summary_url = st.text_input(label="Enter URL to summarize (must be public)")
Making Soup
BeautifulSoup is a Python library that helps with parsing HTML and XML content. For this app, we’ll use it to parse the HTML returned by a GET request and extract just the text. We don’t need the markup, scripts, or tags, just the readable body of the article or page, and BeautifulSoup handles that in a few lines of code.
response = requests.get(summary_url)
#MAKE SOUP
soup = BeautifulSoup(response.content, features='html.parser')
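The two lines above run unconditionally. A minimal sketch of the conditional checks mentioned earlier might wrap them like this (my own hypothetical guard using st.error and st.stop, not necessarily the exact logic in the repo):
#ONLY PROCESS ONCE A URL HAS BEEN ENTERED
if summary_url:
    response = requests.get(summary_url)
    #STOP IF THE PAGE COULD NOT BE READ
    if response.status_code != 200:
        st.error("Unable to read that URL. Check that it is public and try again.")
        st.stop()
    #MAKE SOUP FROM THE RETURNED HTML
    soup = BeautifulSoup(response.content, features='html.parser')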
Summarize the Content
Now that we’ve made soup, we’ll send that soup to Snowflake to run the Cortex Summarize function on the text and then write that summary to a chatbot window. I used the chatbot window for fun, but it could be a standard write function, a text area, or whatever you choose.
#SUMMARIZE THE EXTRACTED TEXT WITH CORTEX
content_summary = cx.Summarize(soup.get_text(), sf_session)
#WRITE THE SUMMARY TO A CHAT-STYLE MESSAGE
with st.chat_message(name="ai"):
    st.write(content_summary)
Running the App
Now that the app is complete, run it by calling the streamlit run utility from the command line (or the terminal in VSCode). Once the app loads in a browser window, paste a URL into the box and press Enter. Within a few seconds, the app will read the page and display a summary of its content. The animation below shows the app in action on one of my recent blog articles.
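Assuming the script is saved as url_summarizer.py (a placeholder name; use whatever you called your file), the command is:
streamlit run url_summarizer.py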
Conclusion
In this article, I showed how to use Snowflake Cortex Summarize in a Streamlit app to provide a summary of web content, content that’s not even in Snowflake! Imagine using Snowflake’s AI functions with Python or Streamlit for rapid POCs on datasets not yet loaded into Snowflake; the possibilities are endless.
The complete demo code is available in my GitHub repository. Follow me on LinkedIn and Medium for more Snowflake, Streamlit, and Data Management content.
