TCS Python Interview Questions for Data Engineer 2025

This post covers essential Python interview questions for Data Engineers, focusing on pandas, NumPy, and data manipulation techniques.

1. How to Perform a GroupBy Operation and Aggregate Data in Pandas?
    import pandas as pd
    
    # Sample DataFrame
    data = {'Category': ['A', 'B', 'A', 'B'], 'Values': [10, 20, 30, 40]}
    df = pd.DataFrame(data)
    
    # Group by Category and compute several aggregates of Values
    grouped = df.groupby('Category').agg({'Values': ['sum', 'mean', 'count']})
    print(grouped)
    

You Should Know:

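  • Use `groupby()` followed by `agg()` to apply one or more aggregation functions per column; `agg()` accepts a dict mapping each column to a function name or a list of names.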
  • Common functions: sum(), mean(), max(), min(), count().
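If flat column names are preferred over the two-level header produced above, pandas also supports named aggregation (pandas 0.25 and later). A minimal sketch reusing the same sample data; the output column names `total`, `average`, and `n` are arbitrary choices for illustration:

```python
import pandas as pd

# Same sample data as in the example above
data = {'Category': ['A', 'B', 'A', 'B'], 'Values': [10, 20, 30, 40]}
df = pd.DataFrame(data)

# Named aggregation: output_column=(source_column, function_name)
summary = df.groupby('Category').agg(
    total=('Values', 'sum'),
    average=('Values', 'mean'),
    n=('Values', 'count'),
)
print(summary)
```

Each keyword becomes one flat column, which is usually easier to write to a table or file in an ETL step than a MultiIndex header.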

2. Difference Between `loc[]` and `iloc[]` in Pandas

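– `loc[]` → Label-based indexing: selects by index and column labels (df.loc[0, 'Column'], where 0 is an index label, not a position).
– `iloc[]` → Integer position-based indexing: selects by zero-based row and column positions (df.iloc[0, 1]).

    import pandas as pd

    df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
    print(df.loc[0, 'A'])   # Output: 1 (row label 0, column label 'A')
    print(df.iloc[0, 1])    # Output: 3 (first row, second column)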
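With the default index the two look interchangeable, because labels and positions coincide. A short sketch with a hypothetical string index `['x', 'y', 'z']` makes the difference visible; it also shows that label slices with `loc[]` include the end label, while position slices with `iloc[]` exclude the end position:

```python
import pandas as pd

# Hypothetical index labels that differ from row positions
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=['x', 'y', 'z'])

by_label = df.loc['y', 'A']      # row labelled 'y', column 'A'
by_position = df.iloc[1, 0]      # second row, first column

# Slicing: loc includes the end label, iloc excludes the end position
label_slice = df.loc['x':'y', 'A']    # rows 'x' and 'y'
position_slice = df.iloc[0:1, 0]      # row at position 0 only

print(by_label, by_position)
print(len(label_slice), len(position_slice))
```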
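3. Convert a Column to Datetime and Extract Year, Month, and Day
    import pandas as pd

    # Sample DataFrame with dates stored as strings
    df = pd.DataFrame({'Date': ['2025-01-15', '2025-03-02']})
    df['Date'] = pd.to_datetime(df['Date'])
    df['Year'] = df['Date'].dt.year
    df['Month'] = df['Date'].dt.month
    df['Day'] = df['Date'].dt.day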
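Raw date columns in pipelines often contain malformed entries. A minimal sketch, assuming ISO-formatted strings plus one invalid value: passing `errors='coerce'` to `pd.to_datetime` turns unparseable values into `NaT` instead of raising, and the `.dt` accessors then return missing values for those rows.

```python
import pandas as pd

# Hypothetical raw data with one malformed date
df = pd.DataFrame({'Date': ['2025-01-15', 'not a date', '2025-03-02']})

# errors='coerce' converts unparseable strings to NaT instead of raising
df['Date'] = pd.to_datetime(df['Date'], format='%Y-%m-%d', errors='coerce')
df['Year'] = df['Date'].dt.year
df['Month'] = df['Date'].dt.month
df['Day'] = df['Date'].dt.day

print(df)
```

Passing an explicit `format` also avoids ambiguous day/month guessing when the input format is known.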
    

4. Create a NumPy Array of 0 to 99 and Reshape It to 10×10

    import numpy as np 
    arr = np.arange(0, 100).reshape(10, 10) 
    print(arr) 
    

You Should Know:

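– `np.arange(start, stop)` returns evenly spaced values from start up to, but not including, stop.
– `reshape()` returns the same data with new dimensions; the total number of elements must stay the same (10 × 10 = 100).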
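Two related details come up often: `reshape()` accepts `-1` for one dimension and infers it from the array size, and it raises a `ValueError` when the requested shape does not hold exactly the same number of elements. A quick sketch:

```python
import numpy as np

arr = np.arange(0, 100)

# -1 asks NumPy to infer that dimension from the total size (100 / 10 = 10)
grid = arr.reshape(-1, 10)
print(grid.shape)

# A third argument sets the step: 0, 10, 20, ..., 90
tens = np.arange(0, 100, 10)
print(tens)

# Reshaping to an incompatible size (3 * 30 = 90 != 100) raises ValueError
try:
    arr.reshape(3, 30)
except ValueError as exc:
    print("Cannot reshape:", exc)
```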

5. Broadcasting in NumPy

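    import numpy as np

    a = np.array([1, 2, 3])
    b = 2
    print(a * b)  # Output: [2 4 6]

– The scalar b is broadcast (stretched) to the shape of a, and the multiplication is then applied element-wise, without writing a loop.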
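Broadcasting also works between two arrays of different shapes, as long as each pair of trailing dimensions is equal or one of them is 1. A small sketch with arbitrary values that subtracts each column's mean from a 3×2 matrix, then adds a per-row offset:

```python
import numpy as np

# 3 rows x 2 columns of arbitrary sample values
m = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0]])

col_means = m.mean(axis=0)    # shape (2,): one mean per column
centered = m - col_means      # (3, 2) - (2,) is applied to every row

# A (3, 1) column vector is applied across every column instead
row_offsets = np.array([[100.0], [200.0], [300.0]])
shifted = m + row_offsets

print(centered)
print(shifted)
```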

6. Find the Index of the Maximum Value in a NumPy Array
    import numpy as np

    arr = np.array([1, 5, 3, 9, 2])
    max_index = np.argmax(arr)  # index of the first occurrence of the maximum
    print(max_index)  # Output: 3
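On a multi-dimensional array, `np.argmax()` without an `axis` returns an index into the flattened array; `np.unravel_index()` converts it back to row and column coordinates, and `axis=` gives per-column or per-row results. A brief sketch with arbitrary values:

```python
import numpy as np

m = np.array([[1, 8, 3],
              [9, 2, 4]])

flat_idx = np.argmax(m)                          # position in the flattened array
row, col = np.unravel_index(flat_idx, m.shape)   # back to (row, column)

per_column = np.argmax(m, axis=0)   # row index of the max in each column
per_row = np.argmax(m, axis=1)      # column index of the max in each row

print(flat_idx, (row, col))
print(per_column, per_row)
```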
    

7. Deep Copy vs. Shallow Copy in NumPy

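  • View (`arr.view()`, often called a shallow copy) → A new array object that shares the original's data buffer, so changes to one are visible in the other.
  • Deep copy (`arr.copy()`) → A new array with its own data buffer, so changes do not affect the original.

    import numpy as np

    arr = np.array([1, 2, 3])
    shallow = arr.view()  # view: shares memory with arr
    deep = arr.copy()     # copy: independent memory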
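The difference only shows up after a write. A minimal sketch that modifies the original array and then checks both objects; `np.shares_memory()` confirms which ones point at the same buffer:

```python
import numpy as np

arr = np.array([1, 2, 3])
shallow = arr.view()   # shares arr's data buffer
deep = arr.copy()      # owns a separate data buffer

arr[0] = 100           # modify the original

print(shallow)         # the view reflects the change
print(deep)            # the copy keeps the old value
print(np.shares_memory(arr, shallow), np.shares_memory(arr, deep))
```

Basic slicing such as `arr[1:]` also returns a view, which is a common source of accidental in-place changes.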

8. Handling NaN Values in NumPy

    import numpy as np
    arr = np.array([1, np.nan, 3])
    cleaned = np.nan_to_num(arr, nan=0)  # Replace NaN with 0
    print(cleaned)  # Output: [1. 0. 3.]
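Replacing NaN with a constant is only one option. A sketch of common alternatives using standard NumPy functions: masking with `np.isnan()` to drop missing values, NaN-aware reductions such as `np.nanmean()` that skip them, and filling with a statistic via `np.where()`:

```python
import numpy as np

arr = np.array([1, np.nan, 3])

mask = np.isnan(arr)          # True where the value is NaN
dropped = arr[~mask]          # keep only the non-NaN values

mean_plain = arr.mean()       # NaN: a single NaN propagates through the mean
mean_skip = np.nanmean(arr)   # NaN values are ignored

# Fill NaN with the mean of the valid values instead of 0
filled = np.where(mask, mean_skip, arr)

print(dropped, mean_plain, mean_skip, filled)
```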

What Undercode Says

Mastering these Python concepts is crucial for Data Engineering interviews. Practice these commands and understand their real-world applications in ETL pipelines, data cleaning, and analytics.

Expected Output (GroupBy example from question 1):

             Values
                sum  mean count
    Category
    A            40  20.0     2
    B            60  30.0     2

Prediction:

  • Increased demand for PySpark + Pandas skills in 2025.
  • AI-driven data pipelines will integrate more automated ETL tools.

Reported By: Surbhi Walecha – Hackers Feeds