
NLP & AI System Design Challenges: Educational Projects & Solutions

Questions
1. One of the first challenges in turning handwritten notes into digital text is the quality of the
captured image. Students or field officers often take photos of notes with their phones, and
these images usually suffer from skewed angles, shadows, poor lighting, or unwanted
background clutter. A system must therefore be able to automatically clean up the photo
by cropping it to the note’s boundaries, correcting the skew, and removing shadows to
make it suitable for text recognition. The second challenge lies in recognizing the
handwriting itself. Unlike printed text, handwriting varies greatly between individuals in
terms of style, size, and consistency. An effective system must detect where the text is
located on the page, convert it into accurate digital text, preserve the original line breaks,
and indicate the confidence of recognition so users know which parts may need manual
correction. A third challenge is multilingual support. Many handwritten notes, especially
in African contexts, include a mix of languages—commonly English and Kiswahili. A robust
solution must be able to handle such code-switching, correctly capture diacritics and other
language-specific features, and produce text that reflects the original multilingual content
without errors.
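One classical starting point for the deskewing step is projection-profile analysis: once the skew is corrected, the horizontal lines of handwriting produce sharp peaks in the row sums of a binarized image. The sketch below is a minimal illustration of that idea in pure NumPy; the function names are illustrative, and a vertical shear stands in for true rotation, which is a reasonable approximation only at small skew angles.

```python
import numpy as np

def shear_image(img: np.ndarray, shift_per_col: float) -> np.ndarray:
    """Vertically shear a binary image: column x moves down by round(x * shift_per_col) pixels."""
    out = np.zeros_like(img)
    h, w = img.shape
    for x in range(w):
        out[:, x] = np.roll(img[:, x], int(round(x * shift_per_col)))
    return out

def estimate_skew(img: np.ndarray, max_shift: float = 0.2, steps: int = 41) -> float:
    """Estimate the per-column vertical drift (roughly tan of the skew angle)
    by searching for the correction that makes the row-sum profile peakiest."""
    best, best_score = 0.0, -1.0
    for s in np.linspace(-max_shift, max_shift, steps):
        corrected = shear_image(img, -s)
        score = float(np.var(corrected.sum(axis=1)))
        if score > best_score:
            best, best_score = s, score
    return float(best)
```

A real pipeline would binarize first (e.g. adaptive thresholding), crop to the note's boundary, and normalize illumination before this step; the search grid here trades precision for simplicity.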
2. A growing number of learners and professionals prefer speaking their answers rather than
writing them, especially in situations where speed, accessibility, or convenience is
important. However, transforming spoken responses into usable digital text is far from
straightforward. The first challenge lies in speech capture and transcription. A system must
reliably record answers through voice input and convert them into accurate text. This
becomes more complex when dealing with different accents, background noise, and
multilingual contexts where speakers may shift between English, Kiswahili, and local
languages. The second challenge involves post-processing and editing. Automatic speech
recognition often introduces errors—misheard words, misplaced punctuation, or missed
diacritics. To ensure high-quality output, the system must allow a human-in-the-loop
editing stage, where the user can review the transcription, see confidence scores, and
quickly correct mistakes through an intuitive interface. The third challenge is integration
with downstream tasks. Once answers are transcribed and edited, the system should make
them available for search, grading, or further natural language processing applications.
Balancing automation with human oversight is key to building a trustworthy and practical
solution. Challenge for Students: Design and implement an end-to-end NLP system that
captures spoken answers, transcribes them into text, and supports human-in-the-loop
editing. The solution should handle multilingual speech, provide confidence scores for
transcription, and ensure that the final edited text is both accurate and usable in educational
or professional settings.
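One way to surface recognition confidence for the human-in-the-loop editing stage: Whisper-style transcribers return a per-segment average token log-probability, which can be mapped to a rough confidence score and used to mark spans for review. The sketch below assumes segments shaped like Whisper's output (a "text" and an "avg_logprob" field per segment); the 0.6 threshold is an illustrative choice, not a calibrated one.

```python
import math

def flag_segments(segments, threshold=0.6):
    """Attach a rough confidence (exp of the mean token log-prob) to each
    transcription segment and mark low-confidence spans for human review."""
    reviewed = []
    for seg in segments:
        conf = math.exp(seg["avg_logprob"])  # crude proxy for mean token probability
        reviewed.append({
            "text": seg["text"],
            "confidence": round(conf, 2),
            "needs_review": conf < threshold,
        })
    return reviewed
```

An editing interface could then highlight only the `needs_review` segments, so the user corrects a few flagged spans instead of proofreading the whole transcript.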
3. One of the pressing needs in modern education is the development of intelligent systems
that can support teachers in grading and providing feedback to students. Manually
evaluating open-text answers is often time-consuming, and while automation promises
efficiency, the task is complex because it requires a deep understanding of meaning rather
than simple keyword matching. The system under consideration would receive a student’s
written answer to a question and use a large language model to evaluate its accuracy,
relevance, and completeness. This grading process must go beyond surface-level matching
and instead demonstrate the ability to capture semantic meaning, recognize partially correct
responses, and adapt to different ways of expressing the same idea. After grading, the
system would generate constructive feedback for the student. This feedback should not
only highlight what the student has done correctly but also point out mistakes and provide
suggestions for improvement. To be effective, the feedback must be clear, specific, and
aligned with the learning objectives of the question, avoiding vague or overly generic
comments. Finally, the system should deliver the feedback in both written and spoken
form. While text feedback is useful for many learners, some benefit more from hearing the
guidance in a natural, human-like voice. By converting the feedback into speech, the
system ensures inclusivity and accommodates diverse learning preferences, making the
process more engaging and accessible. The challenge for NLP students is to design and
implement such an end-to-end system: one that accepts student answers as input, grades
them using an LLM, produces personalized feedback as text, and then delivers the same
feedback through voice output. The solution must strike a balance between automation and
fairness, providing accurate evaluations while maintaining transparency and offering
motivating, student-centered guidance.
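A minimal sketch of the grading step, assuming the LLM is asked to return structured JSON so the score and feedback can be checked and reused downstream. The prompt template, field names, and 0-10 scale here are illustrative assumptions, and the actual model call is omitted.

```python
import json

# Illustrative rubric prompt; a production prompt would also carry the
# learning objectives and marking guide for the specific question.
RUBRIC_PROMPT = """You are grading a student answer.
Question: {question}
Reference answer: {reference}
Student answer: {answer}
Return JSON: {{"score": 0-10, "correct_points": [...], "mistakes": [...], "suggestions": [...]}}"""

def build_grading_prompt(question: str, reference: str, answer: str) -> str:
    return RUBRIC_PROMPT.format(question=question, reference=reference, answer=answer)

def parse_grade(reply: str) -> dict:
    """Parse the model's JSON reply, rejecting out-of-range scores so a
    malformed response fails loudly instead of silently mis-grading."""
    grade = json.loads(reply)
    if not 0 <= grade["score"] <= 10:
        raise ValueError("score out of range")
    return grade
```

The parsed `suggestions` field is what would then be rendered as text feedback and passed to a text-to-speech engine for the spoken version.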
4. Design an end-to-end learning assistant that ingests class materials in multiple formats—
Word, PDF, plain text, web links, and lecture videos—and turns them into a trustworthy,
searchable knowledge base for objective revision. The system should automatically parse
documents (including OCR for scanned PDFs), crawl and clean linked pages, and
transcribe videos to text with timestamps. It then needs to segment content into coherent
chunks, normalize headings and figures, de-duplicate overlapping material, and attach rich
metadata (topic, source, page/timecode) to support precise retrieval. Using this curated
knowledge base, the assistant will leverage a large language model to generate revision
questions that are grounded in the source content rather than general world knowledge,
with explicit citations back to the exact spans used. Question styles should include well-formed MCQs with plausible, non-trivial distractors; short-answer prompts that target key
definitions, formulas, or steps; and higher-order items that require comparison,
explanation, or error analysis—optionally mapped to Bloom’s levels and tagged by
difficulty. For each item, the system must produce an answer key, a concise explanation
that references the source, and—in the case of MCQs—a rationale for why each distractor
is incorrect. Because many learners operate in multilingual settings, the pipeline should
handle English and Kiswahili content, preserve diacritics, and allow question/answer
generation in either language while still grounding to the same underlying sources. To
ensure reliability, the system should minimize hallucinations via retrieval-augmented
prompting, expose confidence estimates, and include a human-in-the-loop review mode
where instructors can accept, edit, or reject items before release. Finally, the assistant
should support adaptive revision by sampling questions to balance coverage across topics
and difficulty, track learner performance over time, and regenerate targeted practice on
weak areas—while offering an offline mode for privacy and low-connectivity contexts,
with an optional cloud model for heavier tasks. The core challenge is to integrate robust
multimodal ingestion, careful knowledge curation, and controllable LLM-based question
generation into a transparent, fair, and pedagogically sound workflow that measurably
improves students’ mastery of their notes.
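The chunking-with-metadata step might be sketched as a simple overlapping word window, with each chunk carrying its source and span so generated questions can cite the exact passage. The 120-word size and 20-word overlap are arbitrary illustrative defaults; real systems often chunk on headings or sentence boundaries instead.

```python
def chunk_text(text: str, source: str, max_words: int = 120, overlap: int = 20):
    """Split a document into overlapping word-window chunks, each tagged
    with its source and word offsets for precise citation later."""
    words = text.split()
    chunks, start = [], 0
    while start < len(words):
        end = min(start + max_words, len(words))
        chunks.append({
            "text": " ".join(words[start:end]),
            "source": source,
            "span": (start, end),
        })
        if end == len(words):
            break
        start = end - overlap  # overlap so no fact is cut in half at a boundary
    return chunks
```

For PDFs and videos the `source` field would also carry the page number or timestamp, which is what makes the "citations back to the exact spans" requirement checkable.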
5. In many classrooms and group projects, important discussions happen in meetings, yet
students often miss details if they are absent or cannot take full notes. A useful solution
would be a system that can automatically record meetings, convert the spoken conversation
into accurate written text, and distribute the notes to the relevant students. The challenge
for students is to design and implement such a system powered by Generative AI. The
system should first capture audio from a meeting and transcribe it into text, handling
multiple speakers, overlapping voices, and background noise. It should then summarize the
discussion in a structured way—highlighting key points, action items, and decisions—so
that the notes are clear and useful rather than a raw transcript. Once the summary is ready,
the system must identify which students need to receive the notes. This could be based on
a class list, a meeting invitation, or role assignments, and the delivery should be done
automatically via email, learning platforms, or other appropriate channels. To enhance
usefulness, the system may also allow personalization, such as sending a detailed transcript
to one student and a concise summary with action points to another. The task is not only
technical but also ethical: the system must ensure privacy, obtain consent for recording,
and give users the option to review or edit the notes before they are shared. By balancing
automation, accuracy, and responsibility, students are challenged to create a GenAI-powered meeting assistant that improves communication and reduces the burden of note-taking in academic settings.
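The personalization step (a full transcript for some recipients, a concise summary with action points for others) could start as simple role-based routing; the role names and record fields below are assumptions for illustration.

```python
def route_notes(recipients, transcript: str, summary: str):
    """Decide what each recipient receives: meeting leads and note-takers
    get the full transcript, everyone else the concise summary."""
    deliveries = []
    for person in recipients:
        body = transcript if person.get("role") in {"lead", "secretary"} else summary
        deliveries.append({"to": person["email"], "body": body})
    return deliveries
```

In a full system the recipient list would come from the class roster or meeting invitation, and delivery would go through email or the learning platform, after the consent and review steps described above.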
6. Dictionaries are essential tools for language learning and research, yet traditional ones are
often limited by rigid keyword lookups. Users must enter the exact word form, spelling, or
language to get a result. This creates a barrier for learners who may use paraphrases,
synonyms, or even code-switch between English and Swahili. The challenge is to design
and implement a retrieval-augmented generation (RAG) system powered by a vector
database that makes dictionary entries more intelligent and user-friendly. The dictionary
will be built primarily in Swahili, with entries covering meanings, examples, and usage
notes. Users should be able to ask questions in either Swahili or English, and the system
must handle paraphrased queries gracefully. For example, a user might ask “What does
‘rafiki’ mean?” in English, or “Maana ya rafiki ni nini?” in Swahili, and still receive the
same clear and specific definition from the Swahili dictionary. The system should first
embed all dictionary entries into a vector space and store them in a vector database. When
a user query arrives, the query should likewise be embedded and compared against the database
to retrieve the most relevant entry, even if the query wording does not exactly match the
stored definition. The RAG pipeline should then use a generative model to produce a
natural, context-aware answer that directly cites the dictionary entry rather than inventing
content. To make the tool effective, students must also address challenges such as
multilingual handling (switching between English and Swahili), paraphrase detection, and
providing transparent answers that point back to the original dictionary entry. The final
system should support seamless interaction, returning accurate and reliable definitions
regardless of how the question is phrased.
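A toy version of the embed-and-retrieve loop is sketched below. It uses a bag-of-words vector built from the entries themselves purely to make the cosine-similarity mechanics concrete; a real system would swap in a multilingual sentence-embedding model, which is what actually makes paraphrase and cross-language matching work.

```python
import numpy as np

def build_vocab(entries):
    """Index every token that appears in the dictionary entries."""
    vocab = {}
    for e in entries:
        for tok in (e["word"] + " " + e["definition"]).lower().split():
            vocab.setdefault(tok, len(vocab))
    return vocab

def embed(text: str, vocab) -> np.ndarray:
    """L2-normalized bag-of-words vector; unknown tokens are ignored."""
    v = np.zeros(len(vocab))
    for tok in text.lower().split():
        if tok in vocab:
            v[vocab[tok]] += 1.0
    n = np.linalg.norm(v)
    return v / n if n else v

def retrieve(query: str, entries, vocab):
    """Return the entry whose vector has the highest cosine similarity to the query."""
    q = embed(query, vocab)
    scores = [float(q @ embed(e["word"] + " " + e["definition"], vocab)) for e in entries]
    return entries[int(np.argmax(scores))]
```

The generation stage would then answer from the retrieved entry only, quoting it directly, rather than letting the model invent a definition.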
7. Design a mobile-first expense assistant for mama mboga, bodaboda riders, and mochi that
is powered by a Large Language Model (LLM) and can ingest M-Pesa transactions
securely. The assistant should automatically pull spending data from the user’s phone—
either by parsing M-Pesa SMS receipts on-device or, where available and with explicit
consent, via a secure API connection—to build a continuously updated ledger. Users must
also be able to add cash expenses quickly by voice (Kiswahili/English) or simple text taps,
with the LLM classifying each item into clear categories (e.g., stock, fuel, repairs, meals,
float, airtime) and correcting for duplicates, refunds, or reversals. From this unified ledger,
the system should generate an end-of-month sheet that totals spend by category, shows
daily/weekly trends, and flags anomalies. Using the LLM plus lightweight rules, it should
deliver personalized, plain-language advice on where to cut costs and how much the user
could save next month—grounded in their actual patterns (e.g., “Fuel is 31% above your
3-month average; combining trips or refueling at Station X on off-peak days could save
~KES 1,200”). Advice must be transparent: every suggestion should link back to specific
transactions or trends so users can trust and verify the recommendations. Because
connectivity can be unreliable, core features (SMS ingestion, expense capture, summaries,
and basic advice) should work offline with on-device processing; when online, the app may
use a stronger cloud LLM for richer insights or multilingual explanations. The interface
must be low-friction and inclusive: big buttons, minimal typing, voice input in
Kiswahili/English, and simple visuals. Include a human-in-the-loop edit mode so users can
fix miscategorized items, split shared costs, and tag business vs. personal spend; these
corrections should retrain the categorizer over time. Privacy and safety are essential. The
system must request explicit consent before reading SMS or connecting to any wallet API,
store data encrypted on-device, and provide clear data controls (view/export/delete). No
sensitive data should leave the phone without opt-in. Finally, tailor small persona tweaks:
for mama mboga, track perishable-stock losses; for bodaboda, separate fuel vs.
maintenance vs. loans; for mochi, track materials vs. repairs vs. custom orders. The
challenge is to blend trustworthy M-Pesa-aware data ingestion, LLM-driven understanding
and advice, and a humane UX that actually helps micro-entrepreneurs spend smarter and
save more.
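On-device SMS parsing could start from a pattern like the one below. The receipt shape is an assumption based on common M-Pesa confirmation messages; real formats vary by transaction type (send, pay bill, buy goods, withdrawal), so production code needs one pattern per type plus a fallback for unrecognized messages.

```python
import re

# Assumed receipt shape (illustrative, not an official format):
# "<REF> Confirmed. Ksh<amount> sent to <party> on <date> ..."
RECEIPT_RE = re.compile(
    r"(?P<ref>[A-Z0-9]{10}) Confirmed\. ?"
    r"Ksh(?P<amount>[\d,]+\.\d{2}) (?P<direction>sent to|paid to|received from) "
    r"(?P<party>.+?) on"
)

def parse_receipt(sms: str):
    """Extract reference, amount, direction, and counterparty from a
    confirmation SMS; returns None when the message doesn't match."""
    m = RECEIPT_RE.search(sms)
    if not m:
        return None
    rec = m.groupdict()
    rec["amount"] = float(rec["amount"].replace(",", ""))
    return rec
```

The parsed record would then be handed to the LLM classifier for categorization (stock, fuel, float, etc.), with unmatched messages queued for the human-in-the-loop edit mode.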
8. Many learners in African contexts want to strengthen their grammar, vocabulary, and
sentence construction in their local languages, but there are very few digital tools available
compared to English or other global languages. Traditional language learning apps often
neglect indigenous languages, leaving a gap for students, young people, and even adults
who want to improve literacy and communication skills in their mother tongue. The
challenge is to design and implement an AI-powered language tutor that focuses on
grammar, vocabulary, and sentence structure in a chosen local language (for example,
Kiswahili, Kikuyu, Luo, or Kalenjin). The tutor should allow a learner to input a word,
phrase, or sentence—either typed or spoken—and the system should provide explanations
in simple, accessible language. For vocabulary, it should offer the meaning, part of speech,
and example sentences. For grammar, it should explain how the word or phrase fits into
sentence structure and highlight any errors if the learner makes a mistake. For sentences, it
should suggest corrections, give alternative constructions, and explain why those
alternatives are valid. The system should be interactive and conversational, powered by a
large language model fine-tuned or adapted with retrieval from a local-language knowledge
base (grammar rules, dictionaries, example corpora). It should support multilingual
scaffolding, meaning that learners can ask questions in English or Kiswahili and get
explanations in their target local language or vice versa. To make the tool more engaging,
it can also include quizzes, practice exercises, or short dialogue roleplays generated
dynamically by the model. Key challenges include handling code-switching (where
learners mix English with local languages), ensuring the AI does not “hallucinate” incorrect
rules, and designing a feedback loop where learners can flag mistakes and improve the
tutor over time. The tutor must also run offline or with low bandwidth for rural contexts,
with an optional cloud model for richer interactions. The goal is to create a trustworthy AI
tutor that not only answers questions but also builds learner confidence, promotes the use
of local languages in digital spaces, and bridges the resource gap in AI for African
languages.
9. Kenyan court judgments, which are freely available through the Kenya Law website,
provide authoritative decisions on critical issues ranging from constitutional questions to
criminal appeals. While these documents are comprehensive, they are also lengthy and
complex, making it difficult for ordinary citizens, students, and researchers to quickly find
clear answers to their questions. For instance, someone may want to know what the
Supreme Court decided in the 2022 presidential election petition or how the courts have
interpreted the Muruatetu case on the death penalty. Searching through hundreds of pages
to locate such specific information is time-consuming and requires legal expertise. The
challenge for students is to design and build an AI-powered legal assistant that can take
plain-language questions, in either English or Kiswahili, and return concise and accurate
answers drawn directly from Kenyan judgments. The system should be able to retrieve the
most relevant passages from the official case texts and present them clearly, with proper
citations so that users can verify the source. To ensure accessibility, the tool should be
bilingual, handling queries in both languages while keeping the original legal citations
intact. Finally, the assistant must carry a clear disclaimer stating that it provides legal
information for educational purposes only, and not professional legal advice. By
completing this challenge, students will demonstrate how natural language processing can
be applied to make legal information more accessible and usable for wider
society. Judgments from the Kenyan courts, especially from the Supreme Court and the
Court of Appeal, are often very detailed, sometimes spanning over a hundred pages. While
such depth is crucial for legal practitioners, it poses a challenge for students, journalists,
and the general public who may need to quickly understand the core facts and outcome of
a case. Important details such as the facts of the case, the legal issues under
consideration, the court’s reasoning, and the final decision are often buried in dense
legal language. This makes it difficult for non-experts to access justice-related information
in a meaningful way.
10. The challenge for students is to create an automatic summarization system that processes
Kenyan court judgments and generates clear and structured case briefs. The system should
capture and organize the content into a logical flow, highlighting the main facts, the issues
at stake, the court’s reasoning, and the final orders. In addition to this structured summary,
the system should also produce a very short abstract—one or two sentences—that provides
the essence of the judgment for quick reading. Where possible, the summaries should retain
references to legal provisions and case citations to preserve the integrity of the original
ruling. By tackling this challenge, students will gain hands-on experience with
summarization techniques in NLP while addressing a real societal need: making justice
more accessible to everyone through technology.
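The structured output might be enforced with a simple schema plus validation, so a generated brief that is missing a section or has an overlong abstract is caught before release rather than shipped to readers. The field names below are illustrative.

```python
from dataclasses import dataclass

@dataclass
class CaseBrief:
    facts: str
    issues: str
    reasoning: str
    orders: str
    abstract: str  # one- or two-sentence essence of the judgment

def validate_brief(brief: CaseBrief, max_abstract_sentences: int = 2) -> bool:
    """Check that every section is non-empty and the abstract stays short."""
    sections = [brief.facts, brief.issues, brief.reasoning, brief.orders, brief.abstract]
    if not all(s.strip() for s in sections):
        return False
    sentences = [s for s in brief.abstract.split(".") if s.strip()]
    return len(sentences) <= max_abstract_sentences
```

A summarizer that emits this schema also makes it easy to check that statutory references and case citations from the original ruling survive into the `reasoning` section.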
11. Financial institutions such as the Central Bank of Kenya (CBK) rely on accurate, timely,
and context-aware analysis of financial data to make critical policy decisions. However,
much of this information is scattered across structured reports, unstructured documents,
and real-time market feeds. The challenge is to design an NLP-powered advisory system
that can process diverse financial data sources, extract key insights, and generate actionable
advice tailored to the needs of the CBK. The system should be capable of ingesting
multiple forms of data, including official reports (PDF, Word, spreadsheets), market news
articles, policy briefs, and even social media feeds that influence public sentiment. Using
Natural Language Processing techniques, the system must automatically extract relevant
indicators such as inflation trends, interest rate changes, currency fluctuations, loan
performance, and emerging risks in the banking sector. Once the data is analyzed, the
system should generate plain-language summaries and policy insights. For example, it
might highlight that “inflationary pressure is rising due to food and fuel prices, suggesting
a possible need to adjust the central bank rate,” or that “loan defaults in the agricultural
sector have spiked by 12% this quarter, indicating credit risk exposure.” The system should
also present evidence transparently, linking every recommendation to the underlying data
or source document. To improve usability, the assistant must be interactive: CBK officials
should be able to ask questions in English such as, “What are the main risks to the shilling
this month?” or “Summarize recent loan default trends in microfinance institutions.” The
system should respond with concise, evidence-grounded answers while also providing
access to the supporting raw data. Privacy and accuracy are crucial. The system should
avoid hallucinations by grounding its responses in a verified financial knowledge base, use
a retrieval-augmented generation (RAG) approach for transparency, and provide
confidence scores on its outputs. Ideally, it should also work in a multilingual mode,
offering summaries or explanations in Kiswahili when needed. The ultimate challenge for
NLP students is to create a prototype policy advisory tool that blends structured data
analytics with unstructured text understanding, supports interactive queries, and delivers
trustworthy financial insights that could guide CBK decision-making on monetary policy,
financial stability, and regulatory oversight.
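The anomaly-flagging step (for example, spotting that quarterly loan defaults jumped 12%) can begin as a plain threshold on period-over-period change before any LLM is involved; the LLM's role is then to explain the flagged movement in context. The 10% default threshold below is illustrative.

```python
def flag_spikes(series, threshold=0.10):
    """Flag period-over-period changes above `threshold` (0.10 = 10%),
    returning (index, fractional_change) pairs for an analyst to review."""
    flags = []
    for i in range(1, len(series)):
        prev, cur = series[i - 1], series[i]
        if prev == 0:
            continue  # avoid division by zero on empty periods
        change = (cur - prev) / prev
        if abs(change) > threshold:
            flags.append((i, round(change, 3)))
    return flags
```

Each flag would be passed to the retrieval-augmented layer together with the underlying report passages, so the generated insight stays linked to verifiable data.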
12. Kenya National Bureau of Statistics (KNBS) reports such as the Economic Survey and the
Statistical Abstract are essential resources for understanding the country’s economy and
development. They contain large volumes of data on GDP, inflation, employment, trade,
agriculture, and county-level indicators. However, these reports are usually distributed in
PDF format, filled with hundreds of pages of text, tables, and charts. While rich in content,
the reports are not machine-friendly, and extracting insights from them often requires
laborious manual effort. Policymakers, researchers, and even students frequently struggle
to quickly locate specific numbers or trends, slowing down decision-making and learning.
The challenge is to design a system that can automatically process KNBS reports and
transform them into usable data and analytics. The system should be able to read PDF
documents, extract both text and statistical tables, and then clean and organize the data into
a structured format such as CSV files or database tables. Once the data is available in a
clean form, the system should support analysis by detecting key indicators, generating
trends, and producing visualizations. For example, it should be able to compare GDP
growth across years, show employment trends for different counties, or highlight inflation
changes over time. To make the system more accessible, students should add a natural
language component that allows users to ask questions such as “What was Kenya’s GDP
growth in 2024?” or “Show the unemployment rate in Nairobi County over the last five
years.” The system should respond with accurate information, supported by visualizations
or concise summaries, and must reference the exact KNBS report and page number where
the data was found. This will ensure transparency and allow users to verify the information.
By completing this challenge, students will learn how to combine natural language
processing, information extraction, and data analytics to solve a real-world problem. They
will demonstrate how AI can bridge the gap between unstructured reports and actionable
insights, making national statistics more accessible for decision-makers, researchers, and
the public.
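Once a table has been pulled out of a PDF (for example with pdfplumber's `extract_table`, which returns a list of rows with `None` for empty cells), serializing it into the clean CSV form described above is straightforward. The sketch below assumes that row-list shape.

```python
import csv
import io

def table_to_csv(table, header=None) -> str:
    """Serialize an extracted table (list of rows; cells may be None) to
    CSV text, dropping fully empty rows left over from PDF layout."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    if header:
        writer.writerow(header)
    for row in table:
        if any(cell not in (None, "") for cell in row):
            writer.writerow(["" if c is None else c for c in row])
    return buf.getvalue()
```

The resulting CSV rows would then be loaded into a database alongside metadata (report title, year, page number) so that every answer from the question-answering layer can cite exactly where the figure came from.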