BY: Bhagesh Kumar.
Hello Everyone, Hope you are doing well. In this blog I will tell you about another useful and important functionality of PANDAS library. As you Know Excel or csv files are mostly used for storing the data for different uses, in many sectors such as Education, Business and in firms. In these files the data is quite large. To store this big data we need Database, by storing the data in Database data became shareable, consistent and secure, in Database data is stored in a good manner and we can perform any query on it to get the data of our choice within couple of seconds.
Today’s topic:
Today I will explain you about importing/migrating data from EXCEL or CSV to Oracle table using Python using only some lines of Python Code.
First we look how to read data from EXCEL or CSV file, then we load the data into oracle table using Python.
Python Libraries:
Hope you are familiar with Python and Pandas which is most popular and important library used for data manipulation and analysis. You can check and get help from the official Pandas documentation link given below.
https://pandas.pydata.org/docs/getting_started/index.html#getting-started
Before moving further first install these libraries to your environment.
Environment:
Python3
cx_Oracle Library
Pandas Library
Jupyter Notebook
Oracle Database
Installing Libraries:
You can install by just simple one command using python pip as above.
pip install pandas
pip install cx_Oracle
Working Example:
Let’s move further and understand by example.
Firstly, you need to import Python libraries which you have installed earlier using PIP. Make connection of with database, which is Oracle in this case.
import cx_Oracle
import pandas as pd
Python library cx_Oracle is used to make Database connection with Oracle.
connection =cx_Oracle.connect("useruser", "yourpassword", "connection scring",encoding='UTF-8',nencoding='UTF-8')
cursor = connection.cursor()
Use Pandas to read the Excel file data in Python.
df = pd.read_excel('filename.xlsx')
Finally you will get data in DataFrame (Tabular) format. Now you have to make List of elements List, with the help of Pandas method to_list(), this method will take Excel rows one by one and put them into a List just like shown in below example List. The outer List contains multiple Lists each List is a one row of Excel Data. The List is easily comprehensible while data insertion and you can insert List of element Lists with executemany statement. You can use another Pandas function which is fillna(), this function is used to replace null values with any value, the replacement values will be written within parenthesis followed by fillna. You cansee the line of code below.
Example List:
[[‘a’,’b’,’c’,’d’,’e’],[‘x’,’y’,’z’,’aa’,’ab’]]
df_list = df.fillna('').values.tolist()
To follow my example, create the table in the database, name it as testdata as I did.
Create table testdata
(Name varchar2(55),
City varchar2(50),
Dob date,
Address varchar2(150),
Phonenumber varchar2(13));
As you see in table declaration above, you have created table with 5 columns. Secondly you need Excel file containing 5 column data which you want to insert, Now you are going to insert Excel data into table using Oracle insert statement. Let’s see how.
Oracle insert statement:
Insert into <tablename>
(column names separated by comma)
Values (ordered values separated by comma);
Write the Oracle insert statement and store it in a variable. We will pass this variable to Python executemany function, this function takes to positional arguments
- The variable which contains the insert statement
- The data list
insert_table = "INSERT INTO test (col1, col2, col3, col4, col5)" + \
" VALUES (:1, :2, :3, :4, :5)"
cursor.executemany(insert_table,df_list)
Last three lines of code:
Cursor.close()
Connection.commit()
Connection.close()
Full combined code:
import cx_Oracle
import pandas as pd
connection =cx_Oracle.connect("useruser", "yourpassword", "connection scring",encoding='UTF-8',nencoding='UTF-8')
cursor = connection.cursor()
insert_table = "INSERT INTO test (col1, col2, col3, col4, col5)" + \
" VALUES (:1, :2, :3, :4, :5)"
df = pd.read_excel('filename.xlsx')
df_list = df.fillna('').values.tolist()
cursor.executemany(insert_table,df_list)
cursor.close()
connection.commit()
connection.close()