Bulk insert rows to SQL

SQLAlchemy supports inserting multiple rows into a database in a single statement.

The example below uses pandas to insert a whole DataFrame into a Microsoft SQL Server table.

Note that if the DataFrame is too big, you can also specify chunksize to split the insert into batches.

Note that the default insert method is one row per INSERT statement, which is very slow for large data; use method='multi' to insert multiple rows per statement.


from sqlalchemy import create_engine
import pandas as pd

# Sample data to insert
df = pd.DataFrame(
    [['A', 'B', '2022-01-01', 1], ['C', 'D', '2022-01-01', 2]],
    columns=['Cat', 'Name', 'Date', 'Value'])

# Connection string: the space in the driver name must be URL-encoded as '+'
engine = create_engine(
    'mssql+pyodbc://user_name:password@sql_server_address/database_name?driver=SQL+Server')

# Append the whole DataFrame to dbo.TableName, 100 rows per INSERT statement
df.to_sql(schema='dbo', name='TableName', con=engine, if_exists='append',
          index=False, chunksize=100, method='multi')



Under the hood, with method='multi' SQLAlchemy runs one large parameterized INSERT statement per chunk:

INSERT INTO table (col1, col2, ... colk) VALUES (?, ?, ?, ?, ?), (?, ?, ?, ?, ?), (?, ?, ?, ?, ?), (?, ?, ?, ?, ?), ....

When a chunk contains too many rows, the insert tends to fail: the statement gets very long, and SQL Server caps a single statement at 2,100 bound parameters (rows per chunk times number of columns must stay below that).

Limit chunksize to a small enough value, e.g. 200.
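A defensive way to pick chunksize is to derive it from the parameter cap and the number of columns. A minimal sketch, reusing df and engine from the example above (the MAX_PARAMS constant and the headroom of one row are my own choices, not part of the pandas API):

# Derive a safe chunksize from SQL Server's 2,100-parameter-per-statement limit,
# assuming every column is bound as a parameter.
MAX_PARAMS = 2100
safe_chunksize = max(1, (MAX_PARAMS // len(df.columns)) - 1)  # leave a little headroom

df.to_sql(schema='dbo', name='TableName', con=engine, if_exists='append',
          index=False, chunksize=safe_chunksize, method='multi')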


Native SQLAlchemy can also do bulk inserts through the ORM (e.g. Session.bulk_insert_mappings() or bulk_save_objects()), but that requires managing your own session, table definition, etc.
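For reference, a minimal sketch of the ORM route, assuming a hypothetical mapped class TableName whose columns mirror the DataFrame from the example above (the table definition, column types, and connection string are illustrative, not taken from the original post):

from sqlalchemy import create_engine, Column, Integer, String, Date
from sqlalchemy.orm import declarative_base, Session

Base = declarative_base()

# Hypothetical table definition mirroring the DataFrame columns above
class TableName(Base):
    __tablename__ = 'TableName'
    __table_args__ = {'schema': 'dbo'}
    Id = Column(Integer, primary_key=True)   # assumed surrogate key
    Cat = Column(String(10))
    Name = Column(String(50))
    Date = Column(Date)
    Value = Column(Integer)

engine = create_engine(
    'mssql+pyodbc://user_name:password@sql_server_address/database_name?driver=SQL+Server')

# bulk_insert_mappings() takes plain dicts, so a DataFrame converts directly
with Session(engine) as session:
    session.bulk_insert_mappings(TableName, df.to_dict(orient='records'))
    session.commit()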