TRAILBLAZERS
Graph Theory (DFS, BFS, MST, Topological Sort)
1. Given a directed graph, implement Topological Sorting using both DFS and Kahn’s Algorithm.
2. Given an undirected weighted graph, implement Kruskal’s Algorithm to find the Minimum
Spanning Tree (MST).
3. Write a program to perform BFS and DFS traversal of a graph. Explain the differences in their
output for different types of graphs.
4. What are the key differences between Prim’s and Kruskal’s Algorithm for MST? When is
Kruskal’s preferred over Prim’s?
5. Given a graph with cycles, write a program to detect cycles using DFS.
6. Implement Dijkstra’s Algorithm using a priority queue (Min-Heap) to find the shortest path from a
source node to all other nodes.
7. Explain how Graph Theory is used in Big Data applications like Google PageRank or Social
Network Analysis.
8. Write a program to find the number of strongly connected components (SCCs) in a directed graph
using Kosaraju’s Algorithm.
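As a possible starting point for question 1, the Kahn's Algorithm half can be sketched as below. This is a minimal sketch, not a reference solution: it assumes the graph is a Python dict mapping each node to a list of its successors, and the function name `kahn_topological_sort` is hypothetical. The DFS variant is left to the reader.

```python
from collections import deque


def kahn_topological_sort(graph):
    """Return one topological order of `graph`, or raise ValueError on a cycle.

    Assumption: `graph` maps each node to a list of its successor nodes.
    """
    # Count incoming edges for every node, including nodes that only
    # appear as successors and have no key of their own.
    indegree = {node: 0 for node in graph}
    for successors in graph.values():
        for succ in successors:
            indegree[succ] = indegree.get(succ, 0) + 1

    # Start from the nodes that have no prerequisites.
    queue = deque(node for node, deg in indegree.items() if deg == 0)
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        # "Remove" the node by decrementing the in-degree of its successors.
        for succ in graph.get(node, []):
            indegree[succ] -= 1
            if indegree[succ] == 0:
                queue.append(succ)

    # Nodes left with a positive in-degree lie on or behind a cycle.
    if len(order) != len(indegree):
        raise ValueError("graph contains a cycle; no topological order exists")
    return order
```

Because the function raises when some nodes are never emitted, the same sketch doubles as a cycle check for directed graphs.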
Big Data Tools (Hadoop, Kafka, Spark)
9. Explain the Hadoop ecosystem and the role of HDFS, MapReduce, YARN, and Hive.
10. Write a MapReduce program in Python or Java to count the occurrences of each word in a large
dataset.
11. What are the key differences between Apache Kafka and RabbitMQ? When would you prefer
Kafka over RabbitMQ?
12. Using Apache Spark, write a PySpark program to process a large dataset and compute the
average salary per department from a CSV file.
Big Data: Streaming Pipeline with Kafka & Spark
13. Explain how Kafka and Spark Streaming can be integrated to process real-time data.
14. Write a Kafka Producer & Consumer program in Python to send and receive messages.
15. Implement a Spark Streaming job that reads real-time data from Kafka, processes it (e.g., filters
logs with errors), and writes the output to a file.
PATHFINDERS
Graph Theory (DFS, BFS, MST)
1. Implement BFS and DFS traversal for a given graph represented as an adjacency list.
2. Write a program to detect cycles in a directed graph using DFS.
3. Implement Kruskal’s Algorithm to find the Minimum Spanning Tree (MST) of a weighted
undirected graph.
4. Compare and contrast Prim’s and Kruskal’s Algorithms. In what scenarios is one preferred over
the other?
5. Given a directed acyclic graph (DAG), implement Topological Sorting using Kahn’s Algorithm.
6. Explain how Dijkstra’s Algorithm works. How is it different from the Bellman-Ford Algorithm?
7. Implement Dijkstra’s shortest path algorithm using a Min-Heap (Priority Queue).
8. Discuss real-world applications of Graph Theory in Big Data, such as in recommendation
systems, fraud detection, and Google PageRank.
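For question 7, one way to approach Dijkstra's Algorithm with a min-heap is sketched below. It is a sketch under stated assumptions rather than a definitive answer: the graph is assumed to be a dict mapping each node to a list of `(neighbor, weight)` pairs with non-negative weights, and the name `dijkstra` is hypothetical. Python's standard `heapq` module serves as the priority queue.

```python
import heapq


def dijkstra(graph, source):
    """Return a dict of shortest distances from `source` to each reachable node.

    Assumptions: `graph` maps node -> list of (neighbor, weight) pairs,
    all weights are non-negative, and nodes are mutually comparable
    (needed only to break ties between equal distances in the heap).
    """
    dist = {source: 0}
    heap = [(0, source)]  # entries are (distance so far, node)
    while heap:
        d, node = heapq.heappop(heap)
        # Skip stale entries: a shorter path to `node` was already found.
        if d > dist.get(node, float("inf")):
            continue
        for neighbor, weight in graph.get(node, []):
            new_d = d + weight
            if new_d < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_d
                heapq.heappush(heap, (new_d, neighbor))
    return dist
```

The sketch pushes a fresh heap entry instead of decreasing a key in place, since `heapq` has no decrease-key operation; outdated entries are discarded when popped. Unreachable nodes are simply absent from the result.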
Big Data Tools (Hadoop, Kafka, Spark)
9. Explain the Hadoop ecosystem. What are the core components of Hadoop, and how do they work
together?
10. Implement a simple MapReduce job to count word occurrences in a given large text file.
11. How does Kafka ensure fault tolerance and scalability? Explain Kafka partitions, replication, and
consumer groups.
12. Write a PySpark program to process a large dataset containing customer transactions and find
the total sales per region.
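Before writing the MapReduce job in question 10 against a real cluster, it may help to trace the map, shuffle, and reduce phases locally. The sketch below simulates them in plain Python; it does not use Hadoop or any Hadoop API, and all function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical. Splitting on whitespace and lowercasing are simplifying assumptions.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word on the line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reducer: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence over an iterable of text lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())
```

In an actual job the shuffle is performed by the framework across machines; here it is an in-memory dictionary, so the sketch only illustrates the data flow, not the scalability.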
Big Data: Streaming Pipeline with Kafka & Spark
13. Explain how Kafka and Spark Streaming can be integrated. What are some real-world
use cases?
14. Implement a Kafka Producer and Consumer in Python to simulate a real-time
event-driven system.
15. Using Spark Streaming, write a program that consumes data from Kafka, filters records
containing a specific keyword (e.g., “ERROR” logs), and writes the results to a file.
ENDANGERED
DSA + ML Supervised Learning
1. Given a graph represented as an adjacency list, find all possible paths from a source to a
destination node using DFS.
2. Implement a shortest path algorithm for an unweighted graph using BFS and compare its
efficiency with Dijkstra’s Algorithm for a weighted graph.
3. You have a social network graph where users are represented as nodes and connections as
edges. Design an algorithm to recommend friends based on common connections using BFS.
4. Given a graph with N nodes and M edges, detect whether it contains a cycle using DFS.
5. Explain how graph traversal (BFS/DFS) can be used in web crawlers. What challenges arise
when crawling large-scale web graphs?
6. Given a dataset with missing values, explain various strategies for imputation and their impact on
supervised learning models.
7. Given a large dataset for a classification problem, explain how class imbalance affects model
performance and techniques to handle it.
Linear & Logistic Regression
8. Given a dataset with multiple features, explain how feature selection improves the performance of
a Linear Regression model.
9. You are building a predictive model for customer churn. Explain how you would use Logistic
Regression, and interpret the meaning of the coefficients.
10. How would you detect and handle multicollinearity in a Linear Regression model?
11. Explain how regularization (L1 and L2) prevents overfitting in regression models.
12. Given a dataset where the target variable is binary, how do you interpret the probabilities output
by Logistic Regression? Explain the role of the sigmoid function.
13. Discuss the effect of outliers on a Linear Regression model and how to mitigate their impact.
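For question 12, the role of the sigmoid function can be made concrete with a small sketch. Logistic Regression computes a linear score z = bias + w·x and maps it to a probability with sigmoid(z) = 1 / (1 + e^(-z)). The helper names `sigmoid` and `predict_probability` below are hypothetical, and the weights are assumed to be already fitted.

```python
import math


def sigmoid(z):
    """Map a real-valued score to the open interval (0, 1).

    The two branches avoid overflow in math.exp for large |z|.
    """
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_probability(weights, bias, features):
    """Return the modeled P(y = 1 | features) for already-fitted parameters."""
    # Linear score (the log-odds of the positive class).
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)
```

A score of zero corresponds to a probability of 0.5, and each unit increase in a feature shifts the log-odds by that feature's coefficient, which is the usual basis for interpreting the coefficients.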
ML: House Price Prediction
14. You are given a dataset with house prices, number of bedrooms, square footage, and location
features. Explain the steps to train a regression model to predict house prices.
15. How would you improve the performance of a house price prediction model if it underfits the
data? What techniques would you apply to optimize feature engineering and model selection?