Working with Excel files using Pandas

Last Updated : 28 Apr, 2026

Excel files store data in rows and columns, making them useful for managing structured datasets.

  • To work with Excel files, we use the Pandas library, which allows us to read, modify and analyze Excel data in a DataFrame format.
  • First, we install and import Pandas, then use the read_excel() function to load Excel data into Python for processing.

In the code below, we work with an Excel file named students.xlsx, which contains student data.

Python
import pandas as pd
df = pd.read_excel('students.xlsx')
print(df)

Output

   Roll No.  English  Maths  Science
0         1       19     13       17
1         2       14     20       18
2         3       15     18       19
3         4       13     14       14
4         5       17     16       20
5         6       19     13       17
6         7       14     20       18
7         8       15     18       19
8         9       13     14       14
9        10       17     16       20

Note: You may need to install openpyxl using pip install openpyxl to read Excel files.
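
To follow along without the original workbook, one option is to create a small file with the same columns first. The sketch below is a minimal example, assuming openpyxl is installed. It writes the first five rows from the output above into a temporary folder and reads them back. The `students` and `path` names and the temporary folder are illustrative choices, not part of the original example.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Sample data taken from the first five rows of the output above
students = pd.DataFrame({
    "Roll No.": [1, 2, 3, 4, 5],
    "English": [19, 14, 15, 13, 17],
    "Maths": [13, 20, 18, 14, 16],
    "Science": [17, 18, 19, 14, 20],
})

# Write to a temporary folder so that no existing file is overwritten
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "students.xlsx"
    # index=False leaves out the automatic 0..4 row labels
    students.to_excel(path, index=False)
    df = pd.read_excel(path)

print(df)
```

Writing with index=False keeps the default row labels out of the sheet, so the file read back has the same four columns as the one that was written.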

Loading Multiple Sheets using concat()

By default, read_excel() loads only the first sheet of an Excel workbook. If your file contains multiple sheets, you can read each sheet separately and then combine them into a single DataFrame using pd.concat(). The read_excel() function provides useful arguments to control how data is loaded:

  • sheet_name: Specifies the sheet to load, either by name or by zero-based position (0 is the first sheet).
  • index_col: Defines the column to be used as the index.

Example: Here we read the first two sheets, combine them into a single DataFrame using the concat() function, and print the combined DataFrame:

Python
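file = 'students.xlsx'

# index_col=0 uses the first column (Roll No.) as the index
sheet1 = pd.read_excel(file, sheet_name=0, index_col=0)
sheet2 = pd.read_excel(file, sheet_name=1, index_col=0)

# Stack the rows of both sheets into one DataFrame
newData = pd.concat([sheet1, sheet2])
print(newData)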

Output

Roll No.  English  Maths  Science
1              19     13       17
2              14     20       18
3              15     18       19
4              13     14       14
5              17     16       20
6              19     13       17
7              14     20       18
8              15     18       19
9              13     14       14
10             17     16       20
1              14     18       20
2              11     19       18
3              12     18       16
4              15     18       19
5              13     14       14
6              14     18       20
7              11     19       18
8              12     18       16
9              15     18       19
10             13     14       14
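
Passing sheet_name=None to read_excel() returns a dictionary that maps each sheet name to its DataFrame, which avoids one call per sheet. The sketch below first builds a small two-sheet workbook so that it runs on its own. The sheet names Sheet1 and Sheet2, the temporary path and the variable names are assumptions made for illustration, and openpyxl is assumed to be installed.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Two small sheets with the same columns (values from the outputs above)
first = pd.DataFrame({"Roll No.": [1, 2], "English": [19, 14],
                      "Maths": [13, 20], "Science": [17, 18]})
second = pd.DataFrame({"Roll No.": [1, 2], "English": [14, 11],
                       "Maths": [18, 19], "Science": [20, 18]})

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "students.xlsx"
    # ExcelWriter allows several sheets to be written into one workbook
    with pd.ExcelWriter(path) as writer:
        first.to_excel(writer, sheet_name="Sheet1", index=False)
        second.to_excel(writer, sheet_name="Sheet2", index=False)

    # sheet_name=None loads every sheet into a dict: {sheet name: DataFrame}
    sheets = pd.read_excel(path, sheet_name=None, index_col=0)

# The dict keys (sheet names) become the outer level of the index
combined = pd.concat(sheets, names=["Sheet", "Roll No."])
print(combined)
```

Because pd.concat() accepts a dictionary, each row keeps a label for the sheet it came from, so the repeated roll numbers stay distinguishable.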

head() and tail() methods

The head() and tail() methods are used to quickly preview data in a DataFrame. They let you inspect the top or bottom rows without printing the entire dataset. You can pass a number inside the parentheses to specify how many rows you want to see.

  • head(): Displays the first 5 rows by default.
  • tail(): Displays the last 5 rows by default.
Python
print(newData.head())
print(newData.tail())

Output

Roll No.  English  Maths  Science
1              19     13       17
2              14     20       18
3              15     18       19
4              13     14       14
5              17     16       20
Roll No.  English  Maths  Science
6              14     18       20
7              11     19       18
8              12     18       16
9              15     18       19
10             13     14       14
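
As a minimal sketch of passing a row count, the example below uses a small stand-in DataFrame named `marks` (a hypothetical name, built in memory rather than read from the workbook) and asks for a specific number of rows.

```python
import pandas as pd

# Small stand-in for newData, indexed by roll number
marks = pd.DataFrame(
    {"English": [19, 14, 15, 13, 17, 19],
     "Maths": [13, 20, 18, 14, 16, 13]},
    index=pd.Index([1, 2, 3, 4, 5, 6], name="Roll No."),
)

first_three = marks.head(3)     # rows for roll numbers 1, 2, 3
last_two = marks.tail(2)        # rows for roll numbers 5, 6
all_but_last = marks.head(-1)   # a negative n drops rows from the end instead

print(first_three)
print(last_two)
```

Both methods return a new DataFrame, so the original data is left unchanged.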

shape attribute

The shape attribute is used to check the dimensions of a DataFrame. Because it is an attribute and not a method, it is written without parentheses. It returns a tuple showing the total number of rows and columns.

  • The first value represents the number of rows.
  • The second value represents the number of columns.
Python
newData.shape

Output

(20, 3)
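
Since shape is a plain tuple, it can be unpacked into two variables. The following sketch assumes a small in-memory DataFrame named `marks` (a hypothetical stand-in for newData).

```python
import pandas as pd

marks = pd.DataFrame({
    "English": [19, 14, 15],
    "Maths": [13, 20, 18],
    "Science": [17, 18, 19],
})

# shape is an attribute, so it is accessed without parentheses
n_rows, n_cols = marks.shape
print(f"{n_rows} rows, {n_cols} columns")

# len() is a shortcut for the row count alone
print(len(marks))
```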

sort_values() method

The sort_values() method sorts a DataFrame by the values of one or more columns. It is especially useful when working with numerical data, but it can also sort text data.

  • By default, it sorts values in ascending order.
  • To sort in descending order, use ascending=False.
Python
sorted_column = newData.sort_values(['English'], ascending = False)

Now, suppose we want the top 5 rows of the sorted DataFrame. We can use the head() method here:

Python
sorted_column.head(5)

Output

Roll No.  English  Maths  Science
1              19     13       17
6              19     13       17
5              17     16       20
10             17     16       20
3              15     18       19
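
When several rows share the same value, a second column can be used to break ties. The sketch below is one way to do this, assuming a small in-memory DataFrame named `marks` (hypothetical data, not the workbook above). sort_values() accepts a list of columns in `by` and a matching list in `ascending`.

```python
import pandas as pd

marks = pd.DataFrame(
    {"English": [19, 14, 15, 19, 17],
     "Maths": [13, 20, 18, 16, 16]},
    index=pd.Index([1, 2, 3, 4, 5], name="Roll No."),
)

# Sort by English (highest first); break ties by Maths (highest first)
ranked = marks.sort_values(by=["English", "Maths"], ascending=[False, False])
print(ranked)
```

sort_values() returns a new DataFrame by default, so `marks` keeps its original order.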

We can also select a single column of the DataFrame, such as Maths, and preview its first rows as shown below:

Python
newData['Maths'].head()

Output

Roll No.
1 13
2 20
3 18
4 14
5 16
Name: Maths, dtype: int64

describe() method

When your dataset contains numerical data, the describe() method provides a quick statistical summary of the DataFrame. It includes the count (number of non-null values), mean, standard deviation, minimum and maximum values, and percentiles (25%, 50%, 75%).

Python
newData.describe()

Output

        English      Maths    Science
count  20.00000  20.000000  20.000000
mean   14.30000  16.800000  17.500000
std     2.29645   2.330575   2.164304
min    11.00000  13.000000  14.000000
25%    13.00000  14.000000  16.000000
50%    14.00000  18.000000  18.000000
75%    15.00000  18.000000  19.000000
max    19.00000  20.000000  20.000000
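
The result of describe() is itself a DataFrame, so individual statistics can be looked up by row and column label. The following is a minimal sketch using a small in-memory DataFrame named `marks` (a hypothetical stand-in for newData).

```python
import pandas as pd

marks = pd.DataFrame({
    "English": [19, 14, 15, 13, 17],
    "Maths": [13, 20, 18, 14, 16],
})

summary = marks.describe()   # rows: count, mean, std, min, 25%, 50%, 75%, max

english_mean = summary.loc["mean", "English"]
maths_median = summary.loc["50%", "Maths"]   # the 50% row is the median
print(english_mean, maths_median)
```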

Pandas also provides individual statistical methods such as mean(), sum(), min() and max() to calculate specific values. These can be applied to a single column, as in the following command:

Python
newData['English'].mean()

Output

np.float64(14.3)
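
The same methods can also be called on the whole DataFrame to get one value per column. The sketch below assumes a small in-memory DataFrame named `marks` (hypothetical, standing in for newData) and also shows agg(), which computes several statistics in one call.

```python
import pandas as pd

marks = pd.DataFrame({
    "English": [19, 14, 15, 13, 17],
    "Maths": [13, 20, 18, 14, 16],
    "Science": [17, 18, 19, 14, 20],
})

# One mean per numeric column, returned as a Series
column_means = marks.mean(numeric_only=True)

# Several statistics at once, returned as a DataFrame (one row per statistic)
stats = marks.agg(["min", "max", "sum"])

print(column_means)
print(stats)
```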

You can also create calculated columns, just like Excel formulas, by performing operations on existing columns.

Python
newData['Total Marks'] = newData["English"] + newData["Maths"] + newData["Science"]
newData['Total Marks'].head()

Output

Roll No.
1 49
2 52
3 52
4 41
5 53
Name: Total Marks, dtype: int64
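
An alternative to adding the columns one by one is to sum across a list of columns with sum(axis=1). The sketch below is one possible variant on a small in-memory DataFrame named `marks`. The Percentage column and the MAX_PER_SUBJECT value of 20 are assumptions made for illustration, since the source does not state the maximum marks per subject.

```python
import pandas as pd

marks = pd.DataFrame(
    {"English": [19, 14, 15, 13, 17],
     "Maths": [13, 20, 18, 14, 16],
     "Science": [17, 18, 19, 14, 20]},
    index=pd.Index([1, 2, 3, 4, 5], name="Roll No."),
)

subjects = ["English", "Maths", "Science"]

# axis=1 adds across the columns of each row
marks["Total Marks"] = marks[subjects].sum(axis=1)

# MAX_PER_SUBJECT is a hypothetical maximum; the source does not state one
MAX_PER_SUBJECT = 20
marks["Percentage"] = marks["Total Marks"] / (MAX_PER_SUBJECT * len(subjects)) * 100

print(marks)
```

Keeping the subject names in a list means a new subject only has to be added in one place.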

After operating on the data in the DataFrame, we can export it back to an Excel file using the to_excel() method. We need to specify the output Excel file where the transformed data is to be written, as shown below:

Python
newData.to_excel('Output File.xlsx')

Output

Final Updated Sheet
  • to_excel() creates a new Excel file if it doesn’t exist.
  • It overwrites the file if a file with the same name already exists.
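
To check what was written, the file can be read back into a DataFrame. The following is a minimal round-trip sketch, assuming openpyxl is installed. The file name output.xlsx, the sheet name Results and the temporary folder are hypothetical choices for illustration.

```python
import tempfile
from pathlib import Path

import pandas as pd

marks = pd.DataFrame(
    {"English": [19, 14], "Maths": [13, 20], "Science": [17, 18]},
    index=pd.Index([1, 2], name="Roll No."),
)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "output.xlsx"   # hypothetical file name
    # The index (Roll No.) is written as the first column by default
    marks.to_excel(path, sheet_name="Results")
    # Read the sheet back, restoring Roll No. as the index
    restored = pd.read_excel(path, sheet_name="Results", index_col=0)

print(restored)
```

If the index holds only default row numbers, passing index=False to to_excel() leaves it out of the sheet.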