Excel files store data in rows and columns, making them useful for managing structured datasets.
- To work with Excel files, we use the Pandas library, which allows us to read, modify and analyze Excel data in a DataFrame format.
- First, we install and import Pandas, then use the read_excel() function to load Excel data into Python for processing.
In the code below, we work with an Excel file named students.xlsx, which contains student data.
import pandas as pd
df = pd.read_excel('students.xlsx')
print(df)
Output
Roll No. English Maths Science
0 1 19 13 17
1 2 14 20 18
2 3 15 18 19
3 4 13 14 14
4 5 17 16 20
5 6 19 13 17
6 7 14 20 18
7 8 15 18 19
8 9 13 14 14
9 10 17 16 20
Note: Reading .xlsx files requires the openpyxl package. If it is missing, install it with pip install openpyxl.
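To follow along without an existing workbook, the sketch below first writes a small students.xlsx into a temporary directory and then reads it back. The sample rows are copied from the first five rows of the output above. The temporary location and the variable names sample and path are assumptions made for illustration, and openpyxl is assumed to be installed.

```python
import os
import tempfile

import pandas as pd

# Sample rows copied from the first five rows of the output above.
sample = pd.DataFrame({
    "Roll No.": [1, 2, 3, 4, 5],
    "English": [19, 14, 15, 13, 17],
    "Maths": [13, 20, 18, 14, 16],
    "Science": [17, 18, 19, 14, 20],
})

# Write the workbook to a temporary directory (illustrative location).
path = os.path.join(tempfile.mkdtemp(), "students.xlsx")
sample.to_excel(path, index=False)  # index=False: do not write the row labels

# Read it back with read_excel(), as in the example above.
df = pd.read_excel(path)
print(df)
```

Passing index=False when writing keeps the automatically generated row labels (0, 1, 2, ...) out of the file, so the workbook contains only the four data columns.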
Loading Multiple Sheets using concat()
By default, read_excel() loads only the first sheet of an Excel workbook. If your file contains multiple sheets, you can read each sheet separately and then combine them into a single DataFrame using pd.concat(). The read_excel() function provides useful arguments to control how data is loaded:
- sheet_name: Specifies the sheet to read, either by name or by zero-based position (0 is the first sheet).
- index_col: Specifies the column to use as the index (row labels).
Example: Here we read the first two sheets by position and combine them into a single DataFrame using the concat() function, then print the combined DataFrame:
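file = 'students.xlsx'
# Read the first and second sheets, using the first column (Roll No.) as the index
sheet1 = pd.read_excel(file, sheet_name=0, index_col=0)
sheet2 = pd.read_excel(file, sheet_name=1, index_col=0)
# Stack the rows of both sheets into one DataFrame
newData = pd.concat([sheet1, sheet2])
print(newData)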
Output
Roll No. English Maths Science
1 19 13 17
2 14 20 18
3 15 18 19
4 13 14 14
5 17 16 20
6 19 13 17
7 14 20 18
8 15 18 19
9 13 14 14
10 17 16 20
1 14 18 20
2 11 19 18
3 12 18 16
4 15 18 19
5 13 14 14
6 14 18 20
7 11 19 18
8 12 18 16
9 15 18 19
10 13 14 14
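When a workbook has many sheets, reading them one by one becomes repetitive. As a minimal sketch, passing sheet_name=None to read_excel() returns a dictionary that maps each sheet name to its DataFrame, and the values can then be concatenated. The two-sheet workbook is created inside the example so that it runs on its own; the sheet names "Sheet1" and "Sheet2" and the shortened data are assumptions for illustration.

```python
import os
import tempfile

import pandas as pd

# Build a small two-sheet workbook so the example is self-contained.
# The sheet names and the shortened data are illustrative assumptions.
first = pd.DataFrame({"Roll No.": [1, 2], "English": [19, 14],
                      "Maths": [13, 20], "Science": [17, 18]})
second = pd.DataFrame({"Roll No.": [1, 2], "English": [14, 11],
                       "Maths": [18, 19], "Science": [20, 18]})

path = os.path.join(tempfile.mkdtemp(), "students.xlsx")
with pd.ExcelWriter(path) as writer:
    first.to_excel(writer, sheet_name="Sheet1", index=False)
    second.to_excel(writer, sheet_name="Sheet2", index=False)

# sheet_name=None returns a dict mapping each sheet name to its DataFrame.
sheets = pd.read_excel(path, sheet_name=None, index_col=0)

# Concatenate every sheet in workbook order.
combined = pd.concat(list(sheets.values()))
print(combined)
```

This approach does not require knowing the number of sheets in advance, which may be convenient when the workbook layout changes over time.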
head() and tail() methods
The head() and tail() methods are used to quickly preview data in a DataFrame. They help you inspect the top or bottom rows without printing the entire dataset. You can pass a number inside the parentheses to specify how many rows you want to see.
- head(): Displays the first 5 rows by default.
- tail(): Displays the last 5 rows by default.
print(newData.head())
print(newData.tail())
Output
Roll No. English Maths Science
1 19 13 17
2 14 20 18
3 15 18 19
4 13 14 14
5 17 16 20
Roll No. English Maths Science
6 14 18 20
7 11 19 18
8 12 18 16
9 15 18 19
10 13 14 14
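The row count passed to head() and tail() can be illustrated with a small in-memory DataFrame. This is a sketch only: the DataFrame named data stands in for newData and reuses a few values from the first sheet.

```python
import pandas as pd

# Small in-memory stand-in for the combined data (values from the first sheet).
data = pd.DataFrame(
    {"English": [19, 14, 15, 13, 17, 19], "Maths": [13, 20, 18, 14, 16, 13]},
    index=pd.Index([1, 2, 3, 4, 5, 6], name="Roll No."),
)

first_two = data.head(2)      # first 2 rows
last_three = data.tail(3)     # last 3 rows
all_but_last = data.head(-1)  # negative n: every row except the last one

print(first_two)
print(last_three)
print(all_but_last)
```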
shape attribute
The shape attribute is used to check the dimensions of a DataFrame. It is an attribute, not a method, so it is accessed without parentheses. It returns a tuple showing the total number of rows and columns.
- The first value is the number of rows.
- The second value is the number of columns.
newData.shape
Output
(20, 3)
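The tuple can be unpacked directly into two variables. The sketch below uses a small in-memory DataFrame named data (an illustrative stand-in for newData) and also shows why the result above reports 3 columns rather than 4: the Roll No. column was set as the index, and the index is not counted.

```python
import pandas as pd

# Illustrative stand-in for newData, with Roll No. as the index.
data = pd.DataFrame(
    {"English": [19, 14, 15], "Maths": [13, 20, 18], "Science": [17, 18, 19]},
    index=pd.Index([1, 2, 3], name="Roll No."),
)

rows, cols = data.shape  # unpack the (rows, columns) tuple
print(f"{rows} rows, {cols} columns")

print(len(data))   # number of rows only
print(data.size)   # total number of cells (rows * columns)
```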
sort_values() method
The sort_values() method is used to sort a DataFrame based on the values of one or more columns. It is especially useful when working with numerical data, but it can also sort text data.
- By default, it sorts values in ascending order.
- To sort in descending order, use ascending=False.
sorted_column = newData.sort_values(['English'], ascending = False)
Now suppose we want the 5 rows with the highest English marks. We can call the head() method on the sorted DataFrame:
sorted_column.head(5)
Output
Roll No. English Maths Science
1 19 13 17
6 19 13 17
5 17 16 20
10 17 16 20
3 15 18 19
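Ties such as the two rows with 19 in English can be ordered by a second column. As a sketch, sort_values() accepts a list of columns and a matching list of sort directions. The data values below are illustrative, not taken from students.xlsx.

```python
import pandas as pd

# Illustrative marks with ties in the English column.
data = pd.DataFrame(
    {"English": [19, 14, 15, 14, 19], "Maths": [13, 20, 18, 14, 16]},
    index=pd.Index([1, 2, 3, 4, 5], name="Roll No."),
)

# Sort by English (highest first); break ties by Maths (lowest first).
ranked = data.sort_values(["English", "Maths"], ascending=[False, True])
print(ranked)
```

sort_values() returns a new DataFrame and leaves data unchanged, so the result must be assigned to a variable to be kept.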
We can also select a single column of the DataFrame and preview it with head(), as shown below:
newData['Maths'].head()
Output
Roll No.
1 13
2 20
3 18
4 14
5 16
Name: Maths, dtype: int64
describe() method
When your dataset contains numerical data, the describe() method provides a quick statistical summary of the DataFrame. It includes the count (number of non-null values), mean, standard deviation, minimum and maximum values, and percentiles (25%, 50%, 75%).
newData.describe()
Output
English Maths Science
count 20.00000 20.000000 20.000000
mean 14.30000 16.800000 17.500000
std 2.29645 2.330575 2.164304
min 11.00000 13.000000 14.000000
25% 13.00000 14.000000 16.000000
50% 14.00000 18.000000 18.000000
75% 15.00000 18.000000 19.000000
max 19.00000 20.000000 20.000000
Pandas also provides individual statistical methods like mean(), sum(), min() and max() to calculate specific values. These can be applied to a single column, for example to calculate the mean of the English column:
newData['English'].mean()
Output
np.float64(14.3)
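The same methods can be applied to the whole DataFrame to get one value per column. The sketch below uses the first five rows of the first sheet as an in-memory stand-in named data, and agg() to compute several statistics in one call.

```python
import pandas as pd

# First five rows of the first sheet, used as an illustrative stand-in.
data = pd.DataFrame({
    "English": [19, 14, 15, 13, 17],
    "Maths": [13, 20, 18, 14, 16],
    "Science": [17, 18, 19, 14, 20],
})

column_means = data.mean()                  # Series: one mean per column
summary = data.agg(["mean", "min", "max"])  # DataFrame: one row per statistic

print(column_means)
print(summary)
```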
You can also create calculated columns, just like Excel formulas, by performing operations on existing columns.
newData['Total Marks'] = newData["English"] + newData["Maths"] + newData["Science"]
newData['Total Marks'].head()
Output
Roll No.
1 49
2 52
3 52
4 41
5 53
Name: Total Marks, dtype: int64
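When there are many subject columns, adding them one by one gets long. As a sketch, the same total can be computed with sum(axis=1) over a list of columns, and further columns can be derived from it. The Percentage column and the maximum of 20 marks per subject are assumptions made for illustration; the source does not state the maximum marks.

```python
import pandas as pd

# First five rows of the first sheet, used as an illustrative stand-in.
data = pd.DataFrame({
    "English": [19, 14, 15, 13, 17],
    "Maths": [13, 20, 18, 14, 16],
    "Science": [17, 18, 19, 14, 20],
})

subjects = ["English", "Maths", "Science"]

# Row-wise sum across the subject columns.
data["Total Marks"] = data[subjects].sum(axis=1)

# Assumption: each subject is marked out of 20, so the maximum total is 60.
max_total = 20 * len(subjects)
data["Percentage"] = (data["Total Marks"] / max_total * 100).round(2)

print(data)
```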
After processing the data in the DataFrame, we can export it back to an Excel file using the to_excel() method. We pass the name of the output Excel file to which the transformed data is written, as shown below:
newData.to_excel('Output File.xlsx')
The to_excel() method prints nothing; it writes the file to disk:
- It creates a new Excel file if it doesn’t exist.
- It overwrites the file if a file with the same name already exists.
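To write more than one sheet to the same workbook, pd.ExcelWriter can be used as a context manager. The sketch below writes the data and its describe() summary to two sheets and reads the first one back to check the result. The sheet names "Marks" and "Summary", the temporary output location and the shortened data are assumptions for illustration.

```python
import os
import tempfile

import pandas as pd

# Illustrative stand-in for newData, with Roll No. as the index.
data = pd.DataFrame(
    {"English": [19, 14], "Maths": [13, 20], "Science": [17, 18]},
    index=pd.Index([1, 2], name="Roll No."),
)

out_path = os.path.join(tempfile.mkdtemp(), "Output File.xlsx")

with pd.ExcelWriter(out_path) as writer:
    # The "Roll No." index is written as the first column of this sheet.
    data.to_excel(writer, sheet_name="Marks")
    # Write the summary statistics to a second sheet.
    data.describe().to_excel(writer, sheet_name="Summary")

# List the sheets that were written.
with pd.ExcelFile(out_path) as xls:
    sheet_names = xls.sheet_names
print(sheet_names)

# Read the first sheet back to confirm the round trip.
check = pd.read_excel(out_path, sheet_name="Marks", index_col=0)
print(check)
```

If the index holds only automatically generated row numbers rather than meaningful labels, passing index=False to to_excel() leaves it out of the file.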