TCS Python Interview Questions For Data Engineer 2025

This post covers essential Python interview questions for Data Engineers, focusing on pandas, NumPy, and data manipulation techniques.

1. How to Perform a GroupBy Operation and Aggregate Data in Pandas?

import pandas as pd

# Sample DataFrame
data = {'Category': ['A', 'B', 'A', 'B'], 'Values': [10, 20, 30, 40]}
df = pd.DataFrame(data)

# GroupBy and aggregate
grouped = df.groupby('Category').agg({'Values': ['sum', 'mean', 'count']})
print(grouped)

You Should Know:

– Use `groupby()` followed by `agg()` for custom aggregations.
– Common functions: `sum()`, `mean()`, `max()`, `min()`, `count()`.
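
Passing a dict of lists to `agg()` produces a two-level column header. As a minimal sketch of one alternative, pandas also accepts named aggregation, which yields flat column names. The output names `total`, `average`, and `n_rows` below are chosen purely for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Category': ['A', 'B', 'A', 'B'], 'Values': [10, 20, 30, 40]})

# Named aggregation: each keyword becomes an output column,
# written as (source column, aggregation function).
summary = df.groupby('Category').agg(
    total=('Values', 'sum'),
    average=('Values', 'mean'),
    n_rows=('Values', 'count'),
)
print(summary)
```

Flat column names like these are usually easier to work with when the result feeds a later pipeline step.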

2. Difference Between `loc[]` and `iloc[]` in Pandas

– `loc[]` → Label-based indexing, e.g. `df.loc[0, 'Column']`.
– `iloc[]` → Integer position-based indexing, e.g. `df.iloc[0, 1]`.

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
print(df.loc[0, 'A'])   # Output: 1
print(df.iloc[0, 1])    # Output: 3
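
With a default 0..n-1 index, labels and positions coincide, which can hide the difference. The sketch below uses a hypothetical frame named `scores` with index labels 10, 20, 30 (not part of the original example) to make the distinction visible, including the different slicing rules.

```python
import pandas as pd

# Hypothetical frame whose index labels are not 0..n-1
scores = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=[10, 20, 30])

by_label = scores.loc[10, 'A']       # row labelled 10, column 'A'
by_position = scores.iloc[0, 0]      # first row, first column

# Slicing differs too: loc includes the end label, iloc excludes the end position
loc_slice = scores.loc[10:20, 'A']   # rows labelled 10 and 20
iloc_slice = scores.iloc[0:1, 0]     # only the first row

print(by_label, by_position)
print(list(loc_slice), list(iloc_slice))
```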

3. Convert Column to Datetime and Extract Year, Month, Day

df['Date'] = pd.to_datetime(df['Date']) 
df['Year'] = df['Date'].dt.year 
df['Month'] = df['Date'].dt.month 
df['Day'] = df['Date'].dt.day
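
The snippet above assumes `df` already has a `Date` column. A self-contained sketch follows, using a hypothetical `events` frame with made-up date strings, one of them deliberately invalid, to show how `errors='coerce'` behaves.

```python
import pandas as pd

# Hypothetical sample data; the 'Date' column starts out as plain strings
events = pd.DataFrame({'Date': ['2025-01-15', '2025-03-02', 'not a date']})

# errors='coerce' turns unparseable values into NaT instead of raising
events['Date'] = pd.to_datetime(events['Date'], format='%Y-%m-%d', errors='coerce')

events['Year'] = events['Date'].dt.year
events['Month'] = events['Date'].dt.month
events['Day'] = events['Date'].dt.day
print(events)
```

Rows that fail to parse end up with missing values in the derived columns, so it may be worth checking for `NaT` before extracting the parts.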

4. Create a NumPy Array (0-99) and Reshape to 10×10

import numpy as np 
arr = np.arange(0, 100).reshape(10, 10) 
print(arr)

You Should Know:

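– `np.arange(0, 100)` creates the integers 0 through 99; the stop value is excluded.
– `reshape()` changes the dimensions without changing the data; the new shape must hold the same number of elements.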
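
As a small illustration of both points, the sketch below checks the element count and uses `-1` so that NumPy infers one dimension. The variable name `grid` is only for this example.

```python
import numpy as np

arr = np.arange(0, 100)       # 100 values: 0, 1, ..., 99 (stop is exclusive)
grid = arr.reshape(10, -1)    # -1 lets NumPy infer the second dimension

print(arr.size)
print(grid.shape)
print(grid[0, 0], grid[-1, -1])
```

An array with 101 elements, such as `np.arange(0, 101)`, cannot be reshaped to 10×10 and raises a `ValueError`.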

5. Broadcasting in NumPy

a = np.array([1, 2, 3]) 
b = 2 
print(a * b)  # Output: [2 4 6]

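– Broadcasting stretches the scalar `b` across every element of `a`, so the multiplication is applied element-wise without an explicit loop.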
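
Broadcasting is not limited to scalars. The sketch below, with illustrative names `matrix`, `row`, and `col`, shows a 1-D array and a column vector each being stretched to match a 2-D array.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])     # shape (2, 3)
row = np.array([10, 20, 30])       # shape (3,)
col = np.array([[100], [200]])     # shape (2, 1)

plus_row = matrix + row    # row is reused for each of the 2 rows
plus_col = matrix + col    # col is reused for each of the 3 columns

print(plus_row)
print(plus_col)
```

Shapes are compared from the trailing dimension backwards; each pair of dimensions must be equal or one of them must be 1.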

6. Find Index of Maximum Value in NumPy Array

arr = np.array([1, 5, 3, 9, 2]) 
max_index = np.argmax(arr) 
print(max_index)  # Output: 3
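
For multi-dimensional arrays, `np.argmax()` returns an index into the flattened array by default. A minimal sketch of recovering row and column positions, using a made-up 2×3 array, might look like this:

```python
import numpy as np

grid = np.array([[1, 5, 3],
                 [9, 2, 9]])

flat_index = np.argmax(grid)                         # index into the flattened array
row, col = np.unravel_index(flat_index, grid.shape)  # convert to (row, column)
per_row = np.argmax(grid, axis=1)                    # column of the max in each row

print(flat_index, row, col)
print(per_row)
```

When the maximum occurs more than once, as with the two 9s here, `argmax` returns the first occurrence.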

7. Deep Copy vs. Shallow Copy in NumPy

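– Shallow copy (`view()`) → Shares memory with the original, so a change to the data is visible through both.
– Deep copy (`copy()`) → Independent memory, so changes to the original do not affect it.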

arr = np.array([1, 2, 3]) 
shallow = arr.view() 
deep = arr.copy()
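
To see the difference in practice, the sketch below modifies the original array after creating both objects and then checks which one reflects the change. `np.shares_memory()` is used here only as a convenient check.

```python
import numpy as np

arr = np.array([1, 2, 3])
shallow = arr.view()   # new array object, same underlying buffer
deep = arr.copy()      # new array object with its own buffer

arr[0] = 99            # modify the original

shares_view = np.shares_memory(arr, shallow)
shares_copy = np.shares_memory(arr, deep)

print(shallow, shares_view)   # the view sees the change
print(deep, shares_copy)      # the copy does not
```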

8. Handling NaN Values in NumPy

arr = np.array([1, np.nan, 3]) 
cleaned = np.nan_to_num(arr, nan=0)  # Replace NaN with 0
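
Replacing NaN with 0 is not always appropriate, since it changes statistics such as the mean. As a sketch of two other common options, the example below drops NaN values with a boolean mask and uses a NaN-aware reduction; the variable names are illustrative.

```python
import numpy as np

arr = np.array([1.0, np.nan, 3.0])

mask = np.isnan(arr)                   # True where the value is NaN
dropped = arr[~mask]                   # keep only the non-NaN values
mean_ignoring_nan = np.nanmean(arr)    # mean of 1.0 and 3.0
plain_mean = np.mean(arr)              # NaN propagates through ordinary reductions

print(dropped)
print(mean_ignoring_nan, plain_mean)
```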

What Undercode Say

Mastering these Python concepts is crucial for Data Engineering interviews. Practice these commands and understand their real-world applications in ETL pipelines, data cleaning, and analytics.

Expected Output (for the GroupBy example in the first question):

         Values
            sum  mean count
Category
A            40  20.0     2
B            60  30.0     2

Prediction:

Increased demand for PySpark + Pandas skills in 2025.
AI-driven data pipelines will integrate more automated ETL tools.

Reported By: Surbhi Walecha – Hackers Feeds