I was recently attempting to cache the results of a long-running SQL query to a local parquet file using SQL via a workflow like this:
import os import pandas as pd import sqlalchemy env = os.environ engine = sqlalchemy.create_engine(f"mysql+pymysql://{env['SQL_USER']}:{env['SQL_PASSWORD']}@{env['SQL_HOST']}/{env['SQL_DB']}") connection = engine.connect() with engine.connect() as conn: df = pd.read_sql("SELECT * FROM articles", connection) df.to_parquet("articles.parquet")Read more...