Q1. What is data cleaning?
Process of fixing or removing incorrect, incomplete, or duplicate data.
Q2. Why is data cleaning important?
Improves data quality and accuracy of analysis.
Q3. What are missing values?
Data points that are not stored or recorded.
Q4. How to detect missing values?
df.isnull() returns a Boolean mask; df.isnull().sum() counts missing values per column.
Q5. How to remove missing values?
df.dropna()
Q6. How to fill missing values?
df.fillna(0)
Q7. Fill with mean?
df['col'].fillna(df['col'].mean())
Q8. Fill with median?
df['col'].fillna(df['col'].median())
Q9. Fill with mode?
df['col'].fillna(df['col'].mode()[0])
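The missing-value answers above (Q4 to Q9) can be sketched together as follows. The column names and sample values are made up for illustration; note that fillna returns a new object, so the result has to be assigned back.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one missing value per column
df = pd.DataFrame({
    "age": [25.0, np.nan, 31.0, 40.0],
    "city": ["Pune", "Delhi", None, "Delhi"],
})

# Detect: count missing values in each column
missing_per_column = df.isnull().sum()

# Remove: drop every row that has at least one missing value
dropped = df.dropna()

# Fill: median for the numeric column, mode for the categorical one
filled = df.copy()
filled["age"] = filled["age"].fillna(filled["age"].median())
filled["city"] = filled["city"].fillna(filled["city"].mode()[0])
```

Median is often preferred over mean when the column has outliers, since a single extreme value does not shift it.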
Q10. What are duplicates?
Repeated rows in dataset.
Q11. Detect duplicates?
df.duplicated()
Q12. Remove duplicates?
df.drop_duplicates()
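A small sketch of detecting and removing duplicates, using made-up rows. The subset and keep arguments shown are optional parameters of drop_duplicates; the column names are assumptions.

```python
import pandas as pd

# Hypothetical data: the third row repeats the second
df = pd.DataFrame({"id": [1, 2, 2, 3], "name": ["a", "b", "b", "c"]})

# True for each row that repeats an earlier row
dup_mask = df.duplicated()

# Drop fully identical rows, keeping the first occurrence
deduped = df.drop_duplicates()

# Treat rows as duplicates when only "id" matches, keeping the last one
deduped_by_id = df.drop_duplicates(subset="id", keep="last")
```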
Q13. What is inconsistent data?
Values that mean the same thing but are written differently, such as mixed case, stray spaces, or mixed date formats.
Q14. Fix inconsistent text?
df['col'].str.lower()
Q15. Remove extra spaces?
df['col'].str.strip()
Q16. Replace values?
df.replace('old','new')
Q17. Change data types?
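df['col'].astype(int) (raises an error if the column contains NaN or non-numeric text; pd.to_numeric(df['col'], errors='coerce') turns such values into NaN instead)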
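The text-cleaning and type-conversion answers (Q14 to Q17) might be combined like this. The data and the replacement mapping are invented for the example.

```python
import pandas as pd

# Hypothetical messy input: stray spaces, mixed case, one bad number
df = pd.DataFrame({
    "city": [" Pune", "pune ", "DELHI"],
    "qty": ["1", "2", "x"],
})

# Strip whitespace, then lowercase so equal values compare equal
df["city"] = df["city"].str.strip().str.lower()

# Replace specific values using a mapping
df["city"] = df["city"].replace({"delhi": "new delhi"})

# Convert to numbers; unparseable entries become NaN instead of raising
df["qty"] = pd.to_numeric(df["qty"], errors="coerce")

# Once the bad rows are gone, a plain integer dtype is safe
clean = df.dropna(subset=["qty"]).astype({"qty": int})
```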
Q18. What is an outlier?
A value that lies far from the rest of the data.
Q19. Detect outliers?
Using the IQR rule (values below Q1 - 1.5*IQR or above Q3 + 1.5*IQR), z-scores, or a visualization such as a box plot.
Q20. Remove outliers?
Keep only the rows inside chosen bounds, e.g. df[(df['col'] >= lower) & (df['col'] <= upper)].
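One common way to apply the IQR approach is sketched below. The 1.5 multiplier is the conventional choice rather than a fixed rule, and the sample values are made up.

```python
import pandas as pd

# Hypothetical measurements with one obvious extreme value
s = pd.Series([10, 12, 11, 13, 12, 95])

# Interquartile range
q1 = s.quantile(0.25)
q3 = s.quantile(0.75)
iqr = q3 - q1

# Conventional fences at 1.5 * IQR beyond the quartiles
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

# Detect, then filter
is_outlier = (s < lower) | (s > upper)
filtered = s[~is_outlier]
```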
Q21. What is null vs NaN?
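NaN ("Not a Number") is the floating-point marker pandas uses for missing numeric data; "null" is the general term covering None, NaN, and NaT, all of which df.isnull() detects.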
Q22. Rename columns?
df.rename(columns={'a':'b'})
Q23. Standardize data?
Convert values to common format.
Q24. What is encoding?
Convert categorical data to numeric.
Q25. Label encoding?
Assign an integer code to each category, e.g. df['col'].astype('category').cat.codes.
Q26. One-hot encoding?
pd.get_dummies(df)
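Label encoding and one-hot encoding can be compared side by side. This sketch uses pandas only; the column name and prefix are assumptions.

```python
import pandas as pd

# Hypothetical categorical column
df = pd.DataFrame({"color": ["red", "green", "blue", "green"]})

# Label encoding: categories are sorted, then numbered from 0
df["color_code"] = df["color"].astype("category").cat.codes

# One-hot encoding: one indicator column per category
one_hot = pd.get_dummies(df["color"], prefix="color")
```

Label codes imply an ordering that may not exist in the data, which is why one-hot encoding is usually preferred for unordered categories.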
Q27. Remove columns?
df.drop('col', axis=1)
Q28. Remove rows?
df.drop(index=[0, 1]) (drops rows by index label)
Q29. What is normalization?
Scaling values to the range 0 to 1 (min-max scaling): (x - min) / (max - min).
Q30. What is standardization?
Rescaling values to mean 0 and standard deviation 1: (x - mean) / std.
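Both scalings can be written directly in pandas, as in this sketch with made-up values. Using ddof=0 (population standard deviation) is a choice made here so the result has a standard deviation of exactly 1; pandas defaults to ddof=1.

```python
import pandas as pd

# Hypothetical numeric column
s = pd.Series([10.0, 20.0, 30.0, 40.0])

# Normalization (min-max): result lies in [0, 1]
normalized = (s - s.min()) / (s.max() - s.min())

# Standardization (z-score): result has mean 0 and std 1
standardized = (s - s.mean()) / s.std(ddof=0)
```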
Q31. Detect data type?
df.dtypes
Q32. Convert string to datetime?
pd.to_datetime(df['col'])
Q33. Handle inconsistent dates?
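Parse every value with pd.to_datetime(df['col'], errors='coerce'), then write the dates back out in a single format with .dt.strftime().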
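A sketch of parsing strings into datetimes and re-emitting them in one format. The input format string and the sample values are assumptions; errors="coerce" turns anything unparseable into NaT so it can be inspected afterwards.

```python
import pandas as pd

# Hypothetical raw strings: one valid date, one impossible date, one junk value
raw = pd.Series(["2024-01-15", "2024-02-30", "not a date"])

# Parse with an explicit format; invalid entries become NaT
dates = pd.to_datetime(raw, format="%Y-%m-%d", errors="coerce")

# Write valid dates back out in a single, consistent format
formatted = dates.dt.strftime("%d/%m/%Y")
```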
Q34. Remove special characters?
df['col'].str.replace(r'[^\w\s]', '', regex=True)
Q35. Handle large datasets?
Read the file in chunks, e.g. pd.read_csv('file.csv', chunksize=100000), and process one chunk at a time.
Q36. Memory optimization?
Downcast numeric columns to smaller dtypes and convert low-cardinality string columns to category.
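Chunked reading and dtype downcasting might look like the sketch below. An in-memory string stands in for a large CSV file, and the tiny chunksize is only for demonstration.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a large file on disk
csv_text = "id,amount\n1,10\n2,20\n3,30\n4,40\n5,50\n"

# Chunking: aggregate without loading the whole file at once
total = 0
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2):
    total += chunk["amount"].sum()

# Memory optimization: downcast int64 columns to the smallest integer type
df = pd.read_csv(io.StringIO(csv_text))
before = df.memory_usage(deep=True).sum()
df["id"] = pd.to_numeric(df["id"], downcast="integer")
df["amount"] = pd.to_numeric(df["amount"], downcast="integer")
after = df.memory_usage(deep=True).sum()
```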
Q37. What is data validation?
Checking that data meets expected rules such as types, ranges, uniqueness, and allowed values.
Q38. What is imputation?
Filling missing values with substitutes such as the mean, median, or mode.
Q39. Drop columns with many nulls?
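df.dropna(axis=1, thresh=n) keeps only columns with at least n non-null values. Plain df.dropna(axis=1) drops every column that contains any null.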
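To drop only the columns that are mostly empty, one option is to filter on the fraction of nulls per column. The 0.5 cutoff and the sample data are assumptions for this sketch.

```python
import numpy as np
import pandas as pd

# Hypothetical data: "b" is 75% null, "c" is 25% null
df = pd.DataFrame({
    "a": [1, 2, 3, 4],
    "b": [np.nan, np.nan, np.nan, 1.0],
    "c": [1.0, np.nan, 3.0, 4.0],
})

# Keep columns whose share of nulls is at or below the cutoff
max_null_fraction = 0.5
keep = df.columns[df.isnull().mean() <= max_null_fraction]
trimmed = df[keep]

# For comparison: dropna(axis=1) removes any column with a single null
strict = df.dropna(axis=1)
```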
Q40. Sort data?
df.sort_values('col')
Q41. Detect inconsistent categories?
df['col'].unique()
Q42. Replace multiple values?
df.replace({'a':'x','b':'y'})
Q43. What is trimming?
Removing unwanted leading or trailing characters, typically whitespace (df['col'].str.strip()).
Q44. Handle missing categorical?
Fill with mode.
Q45. What is binning?
Grouping values into bins.
Q46. Example binning?
pd.cut(df['col'], bins=3)
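A short binning sketch with explicit bin edges and labels. The edges and label names are invented; by default pd.cut treats each bin as closed on the right, so 17 falls into the first bin here.

```python
import pandas as pd

# Hypothetical ages
ages = pd.Series([5, 17, 18, 42, 70])

# Bins are (0, 17], (17, 64], (64, 120]
groups = pd.cut(
    ages,
    bins=[0, 17, 64, 120],
    labels=["child", "adult", "senior"],
)
```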
Q47. Detect skewness?
df.skew()
Q48. Handle skewed data?
Apply a log transformation, e.g. np.log1p(df['col']) for right-skewed, non-negative data.
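The effect of a log transform on skewness can be checked directly. This sketch assumes non-negative, right-skewed values; log1p computes log(1 + x), which also handles zeros.

```python
import numpy as np
import pandas as pd

# Hypothetical right-skewed values spanning several orders of magnitude
s = pd.Series([1, 10, 100, 1000, 10000])

# Strongly positive skew before the transform
skew_before = s.skew()

# After log1p the values are roughly evenly spaced, so skew is near zero
log_s = np.log1p(s)
skew_after = log_s.skew()
```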
Q49. Remove constant columns?
Drop columns that hold a single distinct value, e.g. df.loc[:, df.nunique() > 1].
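One way to find and drop constant columns is to count distinct values per column, as in this sketch with made-up data. Note that nunique ignores NaN by default, so a column of one value plus NaNs also counts as constant here.

```python
import pandas as pd

# Hypothetical data: column "b" never changes
df = pd.DataFrame({
    "a": [1, 2, 3],
    "b": [7, 7, 7],
    "c": ["x", "x", "y"],
})

# Keep only columns with more than one distinct value
non_constant = df.loc[:, df.nunique() > 1]
```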
Q50. Check summary?
df.describe()
Q51. What is data manipulation?
Process of transforming and organizing data.
Q52. Add column?
df['new']=value
Q53. Delete column?
df.drop('col',axis=1)
Q54. Filter rows?
df[df['col']>10]
Q55. Sort data?
df.sort_values('col', ascending=False)
Q56. Group data?
df.groupby('col')
Q57. Aggregate functions?
sum(), mean(), count()
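Grouping and aggregation usually go together. A minimal sketch, assuming a department and salary column that are not part of the original answers:

```python
import pandas as pd

# Hypothetical data
df = pd.DataFrame({
    "dept": ["a", "a", "b"],
    "salary": [10, 20, 30],
})

# One row per group, one column per aggregate
summary = df.groupby("dept")["salary"].agg(["sum", "mean", "count"])
```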
Q58. Apply function?
df.apply(func) applies func to each column; df.apply(func, axis=1) applies it to each row.
Q59. Lambda function?
lambda x: x+1
Q60. Merge data?
pd.merge(df1, df2, on='key', how='inner')
Q61. Join data?
df1.join(df2) (joins on the index by default)
Q62. Concatenate?
pd.concat([df1, df2])
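The three ways of combining frames differ mainly in what they align on. This sketch uses two small made-up tables that share an id column.

```python
import pandas as pd

# Hypothetical tables sharing an "id" key
left = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})
right = pd.DataFrame({"id": [2, 3, 4], "score": [80, 90, 70]})

# merge: align on a column; inner keeps only ids present in both
inner = pd.merge(left, right, on="id", how="inner")

# left merge keeps every row of "left", filling gaps with NaN
left_merge = pd.merge(left, right, on="id", how="left")

# join: align on the index instead of a column
joined = left.set_index("id").join(right.set_index("id"), how="left")

# concat: stack frames on top of each other
stacked = pd.concat([left, left], ignore_index=True)
```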
Q63. Pivot table?
pd.pivot_table(df, index='a', columns='b', values='c', aggfunc='sum')
Q64. Melt function?
pd.melt(df, id_vars=['a'])
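Pivoting turns long data into wide data, and melt reverses it. A sketch with invented sales figures:

```python
import pandas as pd

# Hypothetical long-format data: one row per month and product
long = pd.DataFrame({
    "month": ["jan", "jan", "feb", "feb"],
    "product": ["x", "y", "x", "y"],
    "sales": [1, 2, 3, 4],
})

# Long to wide: one row per month, one column per product
wide = pd.pivot_table(
    long, index="month", columns="product", values="sales", aggfunc="sum"
)

# Wide back to long
back = wide.reset_index().melt(
    id_vars="month", var_name="product", value_name="sales"
)
```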
Q65. Set index?
df.set_index('col')
Q66. Reset index?
df.reset_index()
Q67. Map function?
df['col'].map(func_or_dict)
Q68. Replace values?
df.replace()
Q69. Rank data?
df.rank()
Q70. Sample data?
df.sample()
Q71. Window function?
df.rolling(window=3).mean()
Q72. Shift data?
df.shift(1)
Q73. Expanding?
df.expanding().sum()
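Rolling, shift, and expanding are easiest to tell apart on a short series. The window size of 3 is an arbitrary choice for this sketch.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5])

# Rolling: mean over a sliding window of 3; first two results are NaN
rolling_mean = s.rolling(window=3).mean()

# Shift: move values down one position, e.g. to compare with the previous row
previous = s.shift(1)

# Expanding: window grows from the start, giving a running total here
running_total = s.expanding().sum()
```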
Q74. Query?
df.query('col > 10')
Q75. Eval?
df.eval('c = a + b')
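query filters rows with a string expression, and eval computes a new column from one. A sketch with made-up columns a and b:

```python
import pandas as pd

# Hypothetical data
df = pd.DataFrame({"a": [1, 5, 10], "b": [2, 2, 2]})

# query: same result as df[(df["a"] > 3) & (df["b"] == 2)]
big = df.query("a > 3 and b == 2")

# eval with an assignment returns a new frame with the extra column
with_total = df.eval("total = a + b")
```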
Q76. Sort index?
df.sort_index()
Q77. Duplicate rows?
df.duplicated()
Q78. Unique values?
df['col'].unique()
Q79. Value counts?
df['col'].value_counts()
Q80. Select columns?
df[['a','b']]
Q81. Select rows?
df.loc[row_labels] (label-based selection)
Q82. Integer location?
df.iloc[0:5] (position-based selection)
Q83. Slice data?
df[0:5]
Q84. Boolean indexing?
df[df['col']>5]
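The difference between loc and iloc only shows up when the index is not the default 0, 1, 2. This sketch uses string labels for that reason; the data is made up.

```python
import pandas as pd

# Hypothetical frame with string index labels
df = pd.DataFrame({"col": [3, 8, 6]}, index=["x", "y", "z"])

# loc: select by label
by_label = df.loc["y"]

# iloc: select by integer position
by_position = df.iloc[0]

# Positional slice of the first two rows
first_two = df.iloc[0:2]

# Boolean indexing: keep rows where the condition is True
over_five = df[df["col"] > 5]
```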
Q85. Add multiple columns?
df.assign(x=df['a'] + 1, y=df['a'] * 2)
Q86. Drop multiple columns?
df.drop(['a','b'],axis=1)
Q87. Rename multiple columns?
df.rename(columns={'a':'x','b':'y'})
Q88. Reorder columns?
Index with the columns in the desired order, e.g. df[['b','a']].
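The column operations in Q85 to Q88 can be chained. A sketch with invented column names; each call returns a new DataFrame, so the result is reassigned.

```python
import pandas as pd

# Hypothetical starting frame
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# Add multiple columns in one call
df = df.assign(total=df["a"] + df["b"], double_a=df["a"] * 2)

# Rename multiple columns with a mapping
df = df.rename(columns={"a": "first", "b": "second"})

# Drop columns by passing a list
df = df.drop(["double_a"], axis=1)

# Reorder by indexing with the desired column order
df = df[["total", "first", "second"]]
```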
Q89. Check memory?
df.memory_usage()
Q90. Copy dataframe?
df.copy()
Q91. Append rows?
pd.concat([df, new_rows], ignore_index=True) (DataFrame.append was removed in pandas 2.0)
Q92. Remove index?
df.reset_index(drop=True) (drop=True discards the old index instead of keeping it as a column)
Q93. Describe data?
df.describe()
Q94. Info?
df.info()
Q95. Shape?
df.shape
Q96. Columns?
df.columns
Q97. Index?
df.index
Q98. Head?
df.head()
Q99. Tail?
df.tail()
Q100. Final step in data process?
Data analysis and visualization.

 
