This post covers essential Python interview questions for Data Engineers, focusing on pandas, NumPy, and data manipulation techniques.
1. How to Perform a GroupBy Operation and Aggregate Data in Pandas?
import pandas as pd
# Sample DataFrame
data = {'Category': ['A', 'B', 'A', 'B'], 'Values': [10, 20, 30, 40]}
df = pd.DataFrame(data)
# GroupBy and aggregate
grouped = df.groupby('Category').agg({'Values': ['sum', 'mean', 'count']})
print(grouped)
You Should Know:
- Use `groupby()` followed by `agg()` for custom aggregations.
- Common functions: `sum()`, `mean()`, `max()`, `min()`, `count()`.
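The dictionary form of `agg()` produces two-level column names. As a minimal sketch of an alternative, the same aggregation can be written with pandas named aggregation, which yields flat column names. The output names `total`, `average`, and `n_rows` are illustrative choices, not part of the original example.

```python
import pandas as pd

# Same sample data as the GroupBy example
df = pd.DataFrame({'Category': ['A', 'B', 'A', 'B'],
                   'Values': [10, 20, 30, 40]})

# Named aggregation: each keyword becomes a flat output column.
# The names total/average/n_rows are hypothetical, chosen for illustration.
summary = df.groupby('Category').agg(
    total=('Values', 'sum'),
    average=('Values', 'mean'),
    n_rows=('Values', 'count'),
)
print(summary)
```

Flat column names are usually easier to work with when the result feeds a later pipeline step such as a join or an export.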
2. Difference Between `loc[]` and `iloc[]` in Pandas
– `loc[]` → Label-based indexing (`df.loc[0, 'Column']`).
– `iloc[]` → Integer position-based indexing (`df.iloc[0, 1]`).
df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
print(df.loc[0, 'A'])  # Output: 1
print(df.iloc[0, 1])  # Output: 3
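With the default index, labels and positions coincide, so the difference is easy to miss. The sketch below uses a hypothetical frame `df2` with index labels 10, 20, 30 (an assumption made only for illustration) to show where the two accessors diverge, including their different slicing rules.

```python
import pandas as pd

# Hypothetical frame with a non-default integer index (labels 10, 20, 30)
df2 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=[10, 20, 30])

by_label = df2.loc[10, 'A']    # row whose label is 10
by_position = df2.iloc[0, 0]   # first row, first column

label_slice = df2.loc[10:20]   # label slices include the end label
pos_slice = df2.iloc[0:1]      # position slices exclude the end position

print(by_label, by_position, len(label_slice), len(pos_slice))
```

Here `df2.loc[0, 'A']` would raise a `KeyError`, because no row carries the label 0.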
3. Convert Column to Datetime and Extract Year, Month, Day
# Assumes df has a 'Date' column holding date strings
df['Date'] = pd.to_datetime(df['Date'])
df['Year'] = df['Date'].dt.year
df['Month'] = df['Date'].dt.month
df['Day'] = df['Date'].dt.day
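As a self-contained sketch of the same conversion, the example below builds a hypothetical `events` frame (the name and the sample dates are assumptions) and also shows `errors='coerce'`, which turns unparseable values into `NaT` instead of raising.

```python
import pandas as pd

# Hypothetical sample data; the 'Date' column starts out as strings
events = pd.DataFrame({'Date': ['2024-01-15', '2024-06-30']})
events['Date'] = pd.to_datetime(events['Date'])
events['Year'] = events['Date'].dt.year
events['Month'] = events['Date'].dt.month
events['Day'] = events['Date'].dt.day
print(events)

# errors='coerce' converts values that cannot be parsed into NaT
raw = pd.Series(['2024-01-15', 'not a date'])
parsed = pd.to_datetime(raw, errors='coerce')
print(parsed.isna().tolist())
```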
4. Create a NumPy Array (0-99) and Reshape to 10×10
import numpy as np
arr = np.arange(0, 100).reshape(10, 10)
print(arr)
You Should Know:
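– `np.arange(start, stop)` creates evenly spaced values and excludes `stop`, so `np.arange(0, 100)` yields 0 through 99.
– `reshape()` changes dimensions; the new shape must hold the same number of elements.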
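A short sketch of two details that often come up as follow-up questions: the half-open range of `np.arange()` and the `-1` placeholder that lets NumPy infer one dimension. The variable name `grid` is illustrative.

```python
import numpy as np

arr = np.arange(0, 100)      # 100 values: 0 through 99 (stop is excluded)
grid = arr.reshape(10, -1)   # -1 asks NumPy to infer the second dimension

print(arr.size)                   # number of elements
print(grid.shape)                 # inferred shape
print(grid[0, 0], grid[-1, -1])   # first and last values
```

Reshaping to a shape whose element count differs from 100, such as `(10, 11)`, raises a `ValueError`.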
5. Broadcasting in NumPy
a = np.array([1, 2, 3])
b = 2
print(a * b)  # Output: [2 4 6]
– Broadcasting stretches the scalar `b` to match the shape of `a`, so the multiplication is applied element-wise.
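Broadcasting also applies between arrays of different shapes, not only between an array and a scalar. The following is a minimal sketch with hypothetical arrays `matrix`, `row`, and `col`: a dimension of size 1, or a missing leading dimension, is stretched to match the other operand.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])     # shape (2, 3)
row = np.array([10, 20, 30])       # shape (3,)
col = np.array([[100], [200]])     # shape (2, 1)

added_row = matrix + row   # row is applied to each of the two rows
added_col = matrix + col   # col is applied to each of the three columns

print(added_row)
print(added_col)
```

Shapes that cannot be aligned this way, for example `(2, 3)` with `(2,)`, raise a `ValueError`.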
6. Find Index of Maximum Value in NumPy Array
arr = np.array([1, 5, 3, 9, 2])
max_index = np.argmax(arr)
print(max_index)  # Output: 3
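For a 2-D array, `np.argmax()` without an `axis` argument returns an index into the flattened array. The sketch below, using a hypothetical array `grid`, converts that flat index back to a (row, column) pair with `np.unravel_index()` and also shows the per-column form.

```python
import numpy as np

grid = np.array([[1, 5, 3],
                 [9, 2, 7]])

flat_index = np.argmax(grid)                         # index into the flattened array
row, col = np.unravel_index(flat_index, grid.shape)  # back to 2-D coordinates
col_max = np.argmax(grid, axis=0)                    # row index of the max in each column

print(flat_index, (row, col), col_max)
```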
7. Deep Copy vs. Shallow Copy in NumPy
- Shallow Copy (`view()`) → Shares memory with the original array.
- Deep Copy (`copy()`) → Independent memory.
arr = np.array([1, 2, 3])
shallow = arr.view()
deep = arr.copy()
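The practical consequence shows up when the original array is modified in place. A minimal sketch, reusing the names from the snippet above and adding `np.shares_memory()` as a check:

```python
import numpy as np

arr = np.array([1, 2, 3])
shallow = arr.view()   # new array object over the same data buffer
deep = arr.copy()      # new array object with its own data buffer

arr[0] = 99            # modify the original in place

print(shallow[0])      # the view sees the change
print(deep[0])         # the copy keeps the old value
print(np.shares_memory(arr, shallow), np.shares_memory(arr, deep))
```

Basic slices such as `arr[1:]` also return views, so an in-place change to a slice affects the original array as well.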
8. Handling NaN Values in NumPy
arr = np.array([1, np.nan, 3])
cleaned = np.nan_to_num(arr, nan=0)  # Replace NaN with 0
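Replacing NaN with 0 is not always appropriate, since it changes statistics such as the mean. As a sketch of two common alternatives, the example below drops NaN values with a boolean mask and uses `np.nanmean()`, which ignores NaN entries.

```python
import numpy as np

arr = np.array([1, np.nan, 3])

mask = np.isnan(arr)                  # True where the value is NaN
dropped = arr[~mask]                  # keep only the non-NaN values
mean_ignoring_nan = np.nanmean(arr)   # mean over the non-NaN values
plain_mean = np.mean(arr)             # NaN propagates through a plain mean

print(dropped, mean_ignoring_nan, plain_mean)
```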
What Undercode Say
Mastering these Python concepts is crucial for Data Engineering interviews. Practice these commands and understand their real-world applications in ETL pipelines, data cleaning, and analytics.
Expected Output (GroupBy example):
         Values
            sum  mean count
Category
A            40  20.0     2
B            60  30.0     2
Prediction:
- Increased demand for PySpark + Pandas skills in 2025.
- AI-driven data pipelines will integrate more automated ETL tools.
Reported By: Surbhi Walecha – Hackers Feeds