
NLP & AI System Design Challenges: Educational Projects & Solutions

Questions
1. One of the first challenges in turning handwritten notes into digital text is the quality of the
captured image. Students or field officers often take photos of notes with their phones, and
these images usually suffer from skewed angles, shadows, poor lighting, or unwanted
background clutter. A system must therefore be able to automatically clean up the photo
by cropping it to the note’s boundaries, correcting the skew, and removing shadows to
make it suitable for text recognition. The second challenge lies in recognizing the
handwriting itself. Unlike printed text, handwriting varies greatly between individuals in
terms of style, size, and consistency. An effective system must detect where the text is
located on the page, convert it into accurate digital text, preserve the original line breaks,
and indicate the confidence of recognition so users know which parts may need manual
correction. A third challenge is multilingual support. Many handwritten notes, especially
in African contexts, include a mix of languages—commonly English and Kiswahili. A robust
solution must be able to handle such code-switching, correctly capture diacritics and other
language-specific features, and produce text that reflects the original multilingual content
without errors.
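One classical starting point for the deskewing step is projection-profile analysis: once the skew is corrected, the horizontal lines of handwriting produce sharp peaks in the row sums of a binarized image. The sketch below is a minimal illustration of that idea in pure NumPy; the function names are illustrative, and a vertical shear stands in for true rotation, which is a reasonable approximation only at small skew angles.

```python
import numpy as np

def shear_image(img: np.ndarray, shift_per_col: float) -> np.ndarray:
    """Vertically shear a binary image: column x moves down by round(x * shift_per_col) pixels."""
    out = np.zeros_like(img)
    h, w = img.shape
    for x in range(w):
        out[:, x] = np.roll(img[:, x], int(round(x * shift_per_col)))
    return out

def estimate_skew(img: np.ndarray, max_shift: float = 0.2, steps: int = 41) -> float:
    """Estimate the per-column vertical drift (roughly tan of the skew angle)
    by searching for the correction that makes the row-sum profile peakiest."""
    best, best_score = 0.0, -1.0
    for s in np.linspace(-max_shift, max_shift, steps):
        corrected = shear_image(img, -s)
        score = float(np.var(corrected.sum(axis=1)))
        if score > best_score:
            best, best_score = s, score
    return float(best)
```

A real pipeline would binarize first (e.g. adaptive thresholding), crop to the note's boundary, and normalize illumination before this step; the search grid here trades precision for simplicity.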
2. A growing number of learners and professionals prefer speaking their answers rather than
writing them, especially in situations where speed, accessibility, or convenience is
important. However, transforming spoken responses into usable digital text is far from
straightforward. The first challenge lies in speech capture and transcription. A system must
reliably record answers through voice input and convert them into accurate text. This
becomes more complex when dealing with different accents, background noise, and
multilingual contexts where speakers may shift between English, Kiswahili, and local
languages. The second challenge involves post-processing and editing. Automatic speech
recognition often introduces errors—misheard words, misplaced punctuation, or missed
diacritics. To ensure high-quality output, the system must allow a human-in-the-loop
editing stage, where the user can review the transcription, see confidence scores, and
quickly correct mistakes through an intuitive interface. The third challenge is integration
with downstream tasks. Once answers are transcribed and edited, the system should make
them available for search, grading, or further natural language processing applications.
Balancing automation with human oversight is key to building a trustworthy and practical
solution. Challenge for Students: Design and implement an end-to-end NLP system that
captures spoken answers, transcribes them into text, and supports human-in-the-loop
editing. The solution should handle multilingual speech, provide confidence scores for
transcription, and ensure that the final edited text is both accurate and usable in educational
or professional settings.
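One way to surface recognition confidence for the human-in-the-loop editing stage: Whisper-style transcribers return a per-segment average token log-probability, which can be mapped to a rough confidence score and used to mark spans for review. The sketch below assumes segments shaped like Whisper's output (a "text" and an "avg_logprob" field per segment); the 0.6 threshold is an illustrative choice, not a calibrated one.

```python
import math

def flag_segments(segments, threshold=0.6):
    """Attach a rough confidence (exp of the mean token log-prob) to each
    transcription segment and mark low-confidence spans for human review."""
    reviewed = []
    for seg in segments:
        conf = math.exp(seg["avg_logprob"])  # crude proxy for mean token probability
        reviewed.append({
            "text": seg["text"],
            "confidence": round(conf, 2),
            "needs_review": conf < threshold,
        })
    return reviewed
```

An editing interface could then highlight only the `needs_review` segments, so the user corrects a few flagged spans instead of proofreading the whole transcript.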
3. One of the pressing needs in modern education is the development of intelligent systems
that can support teachers in grading and providing feedback to students. Manually
evaluating open-text answers is often time-consuming, and while automation promises
efficiency, the task is complex because it requires a deep understanding of meaning rather
than simple keyword matching. The system under consideration would receive a student’s
written answer to a question and use a large language model to evaluate its accuracy,
relevance, and completeness. This grading process must go beyond surface-level matching
and instead demonstrate the ability to capture semantic meaning, recognize partially correct
responses, and adapt to different ways of expressing the same idea. After grading, the
system would generate constructive feedback for the student. This feedback should not
only highlight what the student has done correctly but also point out mistakes and provide
suggestions for improvement. To be effective, the feedback must be clear, specific, and
aligned with the learning objectives of the question, avoiding vague or overly generic
comments. Finally, the system should deliver the feedback in both written and spoken
form. While text feedback is useful for many learners, some benefit more from hearing the
guidance in a natural, human-like voice. By converting the feedback into speech, the
system ensures inclusivity and accommodates diverse learning preferences, making the
process more engaging and accessible. The challenge for NLP students is to design and
implement such an end-to-end system: one that accepts student answers as input, grades
them using an LLM, produces personalized feedback as text, and then delivers the same
feedback through voice output. The solution must strike a balance between automation and
fairness, providing accurate evaluations while maintaining transparency and offering
motivating, student-centered guidance.
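A minimal sketch of the grading step, assuming the LLM is asked to return structured JSON so the score and feedback can be checked and reused downstream. The prompt template, field names, and 0-10 scale here are illustrative assumptions, and the actual model call is omitted.

```python
import json

# Illustrative rubric prompt; a production prompt would also carry the
# learning objectives and marking guide for the specific question.
RUBRIC_PROMPT = """You are grading a student answer.
Question: {question}
Reference answer: {reference}
Student answer: {answer}
Return JSON: {{"score": 0-10, "correct_points": [...], "mistakes": [...], "suggestions": [...]}}"""

def build_grading_prompt(question: str, reference: str, answer: str) -> str:
    return RUBRIC_PROMPT.format(question=question, reference=reference, answer=answer)

def parse_grade(reply: str) -> dict:
    """Parse the model's JSON reply, rejecting out-of-range scores so a
    malformed response fails loudly instead of silently mis-grading."""
    grade = json.loads(reply)
    if not 0 <= grade["score"] <= 10:
        raise ValueError("score out of range")
    return grade
```

The parsed `suggestions` field is what would then be rendered as text feedback and passed to a text-to-speech engine for the spoken version.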
4. Design an end-to-end learning assistant that ingests class materials in multiple formats—
Word, PDF, plain text, web links, and lecture videos—and turns them into a trustworthy,
searchable knowledge base for objective revision. The system should automatically parse
documents (including OCR for scanned PDFs), crawl and clean linked pages, and
transcribe videos to text with timestamps. It then needs to segment content into coherent
chunks, normalize headings and figures, de-duplicate overlapping material, and attach rich
metadata (topic, source, page/timecode) to support precise retrieval. Using this curated
knowledge base, the assistant will leverage a large language model to generate revision
questions that are grounded in the source content rather than general world knowledge,
with explicit citations back to the exact spans used. Question styles should include well-formed MCQs with plausible, non-trivial distractors; short-answer prompts that target key
definitions, formulas, or steps; and higher-order items that require comparison,
explanation, or error analysis—optionally mapped to Bloom’s levels and tagged by
difficulty. For each item, the system must produce an answer key, a concise explanation
that references the source, and—in the case of MCQs—a rationale for why each distractor
is incorrect. Because many learners operate in multilingual settings, the pipeline should
handle English and Kiswahili content, preserve diacritics, and allow question/answer
generation in either language while still grounding to the same underlying sources. To
ensure reliability, the system should minimize hallucinations via retrieval-augmented
prompting, expose confidence estimates, and include a human-in-the-loop review mode
where instructors can accept, edit, or reject items before release. Finally, the assistant
should support adaptive revision by sampling questions to balance coverage across topics
and difficulty, track learner performance over time, and regenerate targeted practice on
weak areas—while offering an offline mode for privacy and low-connectivity contexts,
with an optional cloud model for heavier tasks. The core challenge is to integrate robust
multimodal ingestion, careful knowledge curation, and controllable LLM-based question
generation into a transparent, fair, and pedagogically sound workflow that measurably
improves students’ mastery of their notes.
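The chunking-with-metadata step might be sketched as a simple overlapping word window, with each chunk carrying its source and span so generated questions can cite the exact passage. The 120-word size and 20-word overlap are arbitrary illustrative defaults; real systems often chunk on headings or sentence boundaries instead.

```python
def chunk_text(text: str, source: str, max_words: int = 120, overlap: int = 20):
    """Split a document into overlapping word-window chunks, each tagged
    with its source and word offsets for precise citation later."""
    words = text.split()
    chunks, start = [], 0
    while start < len(words):
        end = min(start + max_words, len(words))
        chunks.append({
            "text": " ".join(words[start:end]),
            "source": source,
            "span": (start, end),
        })
        if end == len(words):
            break
        start = end - overlap  # overlap so no fact is cut in half at a boundary
    return chunks
```

For PDFs and videos the `source` field would also carry the page number or timestamp, which is what makes the "citations back to the exact spans" requirement checkable.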
5. In many classrooms and group projects, important discussions happen in meetings, yet
students often miss details if they are absent or cannot take full notes. A useful solution
would be a system that can automatically record meetings, convert the spoken conversation
into accurate written text, and distribute the notes to the relevant students. The challenge
for students is to design and implement such a system powered by Generative AI. The
system should first capture audio from a meeting and transcribe it into text, handling
multiple speakers, overlapping voices, and background noise. It should then summarize the
discussion in a structured way—highlighting key points, action items, and decisions—so
that the notes are clear and useful rather than a raw transcript. Once the summary is ready,
the system must identify which students need to receive the notes. This could be based on
a class list, a meeting invitation, or role assignments, and the delivery should be done
automatically via email, learning platforms, or other appropriate channels. To enhance
usefulness, the system may also allow personalization, such as sending a detailed transcript
to one student and a concise summary with action points to another. The task is not only
technical but also ethical: the system must ensure privacy, obtain consent for recording,
and give users the option to review or edit the notes before they are shared. By balancing
automation, accuracy, and responsibility, students are challenged to create a GenAI-powered meeting assistant that improves communication and reduces the burden of note-taking in academic settings.
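The personalization step (a full transcript for some recipients, a concise summary with action points for others) could start as simple role-based routing; the role names and record fields below are assumptions for illustration.

```python
def route_notes(recipients, transcript: str, summary: str):
    """Decide what each recipient receives: meeting leads and note-takers
    get the full transcript, everyone else the concise summary."""
    deliveries = []
    for person in recipients:
        body = transcript if person.get("role") in {"lead", "secretary"} else summary
        deliveries.append({"to": person["email"], "body": body})
    return deliveries
```

In a full system the recipient list would come from the class roster or meeting invitation, and delivery would go through email or the learning platform, after the consent and review steps described above.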
6. Dictionaries are essential tools for language learning and research, yet traditional ones are
often limited by rigid keyword lookups. Users must enter the exact word form, spelling, or
language to get a result. This creates a barrier for learners who may use paraphrases,
synonyms, or even code-switch between English and Swahili. The challenge is to design
and implement a retrieval-augmented generation (RAG) system powered by a vector
database that makes dictionary entries more intelligent and user-friendly. The dictionary
will be built primarily in Swahili, with entries covering meanings, examples, and usage
notes. Users should be able to ask questions in either Swahili or English, and the system
must handle paraphrased queries gracefully. For example, a user might ask “What does
‘rafiki’ mean?” in English, or “Maana ya rafiki ni nini?” in Swahili, and still receive the
same clear and specific definition from the Swahili dictionary. The system should first
embed all dictionary entries into a vector space and store them in a vector database. When
a user query arrives, the query should likewise be embedded and compared against the database
to retrieve the most relevant entry, even if the query wording does not exactly match the
stored definition. The RAG pipeline should then use a generative model to produce a
natural, context-aware answer that directly cites the dictionary entry rather than inventing
content. To make the tool effective, students must also address challenges such as
multilingual handling (switching between English and Swahili), paraphrase detection, and
providing transparent answers that point back to the original dictionary entry. The final
system should support seamless interaction, returning accurate and reliable definitions
regardless of how the question is phrased.
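A toy version of the embed-and-retrieve loop is sketched below. It uses a bag-of-words vector built from the entries themselves purely to make the cosine-similarity mechanics concrete; a real system would swap in a multilingual sentence-embedding model, which is what actually makes paraphrase and cross-language matching work.

```python
import numpy as np

def build_vocab(entries):
    """Index every token that appears in the dictionary entries."""
    vocab = {}
    for e in entries:
        for tok in (e["word"] + " " + e["definition"]).lower().split():
            vocab.setdefault(tok, len(vocab))
    return vocab

def embed(text: str, vocab) -> np.ndarray:
    """L2-normalized bag-of-words vector; unknown tokens are ignored."""
    v = np.zeros(len(vocab))
    for tok in text.lower().split():
        if tok in vocab:
            v[vocab[tok]] += 1.0
    n = np.linalg.norm(v)
    return v / n if n else v

def retrieve(query: str, entries, vocab):
    """Return the entry whose vector has the highest cosine similarity to the query."""
    q = embed(query, vocab)
    scores = [float(q @ embed(e["word"] + " " + e["definition"], vocab)) for e in entries]
    return entries[int(np.argmax(scores))]
```

The generation stage would then answer from the retrieved entry only, quoting it directly, rather than letting the model invent a definition.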
7. Design a mobile-first expense assistant for mama mboga, bodaboda riders, and mochi that
is powered by a Large Language Model (LLM) and can ingest M-Pesa transactions
securely. The assistant should automatically pull spending data from the user’s phone—
either by parsing M-Pesa SMS receipts on-device or, where available and with explicit
consent, via a secure API connection—to build a continuously updated ledger. Users must
also be able to add cash expenses quickly by voice (Kiswahili/English) or simple text taps,
with the LLM classifying each item into clear categories (e.g., stock, fuel, repairs, meals,
float, airtime) and correcting for duplicates, refunds, or reversals. From this unified ledger,
the system should generate an end-of-month sheet that totals spend by category, shows
daily/weekly trends, and flags anomalies. Using the LLM plus lightweight rules, it should
deliver personalized, plain-language advice on where to cut costs and how much the user
could save next month—grounded in their actual patterns (e.g., “Fuel is 31% above your
3-month average; combining trips or refueling at Station X on off-peak days could save
~KES 1,200”). Advice must be transparent: every suggestion should link back to specific
transactions or trends so users can trust and verify the recommendations. Because
connectivity can be unreliable, core features (SMS ingestion, expense capture, summaries,
and basic advice) should work offline with on-device processing; when online, the app may
use a stronger cloud LLM for richer insights or multilingual explanations. The interface
must be low-friction and inclusive: big buttons, minimal typing, voice input in
Kiswahili/English, and simple visuals. Include a human-in-the-loop edit mode so users can
fix miscategorized items, split shared costs, and tag business vs. personal spend; these
corrections should retrain the categorizer over time. Privacy and safety are essential. The
system must request explicit consent before reading SMS or connecting to any wallet API,
store data encrypted on-device, and provide clear data controls (view/export/delete). No
sensitive data should leave the phone without opt-in. Finally, tailor small persona tweaks:
for mama mboga, track perishable-stock losses; for bodaboda, separate fuel vs.
maintenance vs. loans; for mochi, track materials vs. repairs vs. custom orders. The
challenge is to blend trustworthy M-Pesa-aware data ingestion, LLM-driven understanding
and advice, and a humane UX that actually helps micro-entrepreneurs spend smarter and
save more.
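On-device SMS parsing could start from a pattern like the one below. The receipt shape is an assumption based on common M-Pesa confirmation messages; real formats vary by transaction type (send, pay bill, buy goods, withdrawal), so production code needs one pattern per type plus a fallback for unrecognized messages.

```python
import re

# Assumed receipt shape (illustrative, not an official format):
# "<REF> Confirmed. Ksh<amount> sent to <party> on <date> ..."
RECEIPT_RE = re.compile(
    r"(?P<ref>[A-Z0-9]{10}) Confirmed\. ?"
    r"Ksh(?P<amount>[\d,]+\.\d{2}) (?P<direction>sent to|paid to|received from) "
    r"(?P<party>.+?) on"
)

def parse_receipt(sms: str):
    """Extract reference, amount, direction, and counterparty from a
    confirmation SMS; returns None when the message doesn't match."""
    m = RECEIPT_RE.search(sms)
    if not m:
        return None
    rec = m.groupdict()
    rec["amount"] = float(rec["amount"].replace(",", ""))
    return rec
```

The parsed record would then be handed to the LLM classifier for categorization (stock, fuel, float, etc.), with unmatched messages queued for the human-in-the-loop edit mode.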
8. Many learners in African contexts want to strengthen their grammar, vocabulary, and
sentence construction in their local languages, but there are very few digital tools available
compared to English or other global languages. Traditional language learning apps often
neglect indigenous languages, leaving a gap for students, young people, and even adults
who want to improve literacy and communication skills in their mother tongue. The
challenge is to design and implement an AI-powered language tutor that focuses on
grammar, vocabulary, and sentence structure in a chosen local language (for example,
Kiswahili, Kikuyu, Luo, or Kalenjin). The tutor should allow a learner to input a word,
phrase, or sentence—either typed or spoken—and the system should provide explanations
in simple, accessible language. For vocabulary, it should offer the meaning, part of speech,
and example sentences. For grammar, it should explain how the word or phrase fits into
sentence structure and highlight any errors if the learner makes a mistake. For sentences, it
should suggest corrections, give alternative constructions, and explain why those
alternatives are valid. The system should be interactive and conversational, powered by a
large language model fine-tuned or adapted with retrieval from a local-language knowledge
base (grammar rules, dictionaries, example corpora). It should support multilingual
scaffolding, meaning that learners can ask questions in English or Kiswahili and get
explanations in their target local language or vice versa. To make the tool more engaging,
it can also include quizzes, practice exercises, or short dialogue roleplays generated
dynamically by the model. Key challenges include handling code-switching (where
learners mix English with local languages), ensuring the AI does not “hallucinate” incorrect
rules, and designing a feedback loop where learners can flag mistakes and improve the
tutor over time. The tutor must also run offline or with low bandwidth for rural contexts,
with an optional cloud model for richer interactions. The goal is to create a trustworthy AI
tutor that not only answers questions but also builds learner confidence, promotes the use
of local languages in digital spaces, and bridges the resource gap in AI for African
languages.
9. Kenyan court judgments, which are freely available through the Kenya Law website,
provide authoritative decisions on critical issues ranging from constitutional questions to
criminal appeals. While these documents are comprehensive, they are also lengthy and
complex, making it difficult for ordinary citizens, students, and researchers to quickly find
clear answers to their questions. For instance, someone may want to know what the
Supreme Court decided in the 2022 presidential election petition or how the courts have
interpreted the Muruatetu case on the death penalty. Searching through hundreds of pages
to locate such specific information is time-consuming and requires legal expertise. The
challenge for students is to design and build an AI-powered legal assistant that can take
plain-language questions, in either English or Kiswahili, and return concise and accurate
answers drawn directly from Kenyan judgments. The system should be able to retrieve the
most relevant passages from the official case texts and present them clearly, with proper
citations so that users can verify the source. To ensure accessibility, the tool should be
bilingual, handling queries in both languages while keeping the original legal citations
intact. Finally, the assistant must carry a clear disclaimer stating that it provides legal
information for educational purposes only, and not professional legal advice. By
completing this challenge, students will demonstrate how natural language processing can
be applied to make legal information more accessible and usable for wider
society. Judgments from the Kenyan courts, especially from the Supreme Court and the
Court of Appeal, are often very detailed, sometimes spanning over a hundred pages. While
such depth is crucial for legal practitioners, it poses a challenge for students, journalists,
and the general public who may need to quickly understand the core facts and outcome of
a case. Important details such as the facts of the case, the legal issues under
consideration, the court’s reasoning, and the final decision are often buried in dense
legal language. This makes it difficult for non-experts to access justice-related information
in a meaningful way.
10. The challenge for students is to create an automatic summarization system that processes
Kenyan court judgments and generates clear and structured case briefs. The system should
capture and organize the content into a logical flow, highlighting the main facts, the issues
at stake, the court’s reasoning, and the final orders. In addition to this structured summary,
the system should also produce a very short abstract—one or two sentences—that provides
the essence of the judgment for quick reading. Where possible, the summaries should retain
references to legal provisions and case citations to preserve the integrity of the original
ruling. By tackling this challenge, students will gain hands-on experience with
summarization techniques in NLP while addressing a real societal need: making justice
more accessible to everyone through technology.
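The structured output might be enforced with a simple schema plus validation, so a generated brief that is missing a section or has an overlong abstract is caught before release rather than shipped to readers. The field names below are illustrative.

```python
from dataclasses import dataclass

@dataclass
class CaseBrief:
    facts: str
    issues: str
    reasoning: str
    orders: str
    abstract: str  # one- or two-sentence essence of the judgment

def validate_brief(brief: CaseBrief, max_abstract_sentences: int = 2) -> bool:
    """Check that every section is non-empty and the abstract stays short."""
    sections = [brief.facts, brief.issues, brief.reasoning, brief.orders, brief.abstract]
    if not all(s.strip() for s in sections):
        return False
    sentences = [s for s in brief.abstract.split(".") if s.strip()]
    return len(sentences) <= max_abstract_sentences
```

A summarizer that emits this schema also makes it easy to check that statutory references and case citations from the original ruling survive into the `reasoning` section.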
11. Financial institutions such as the Central Bank of Kenya (CBK) rely on accurate, timely,
and context-aware analysis of financial data to make critical policy decisions. However,
much of this information is scattered across structured reports, unstructured documents,
and real-time market feeds. The challenge is to design an NLP-powered advisory system
that can process diverse financial data sources, extract key insights, and generate actionable
advice tailored to the needs of the CBK. The system should be capable of ingesting
multiple forms of data, including official reports (PDF, Word, spreadsheets), market news
articles, policy briefs, and even social media feeds that influence public sentiment. Using
Natural Language Processing techniques, the system must automatically extract relevant
indicators such as inflation trends, interest rate changes, currency fluctuations, loan
performance, and emerging risks in the banking sector. Once the data is analyzed, the
system should generate plain-language summaries and policy insights. For example, it
might highlight that “inflationary pressure is rising due to food and fuel prices, suggesting
a possible need to adjust the central bank rate,” or that “loan defaults in the agricultural
sector have spiked by 12% this quarter, indicating credit risk exposure.” The system should
also present evidence transparently, linking every recommendation to the underlying data
or source document. To improve usability, the assistant must be interactive: CBK officials
should be able to ask questions in English such as, “What are the main risks to the shilling
this month?” or “Summarize recent loan default trends in microfinance institutions.” The
system should respond with concise, evidence-grounded answers while also providing
access to the supporting raw data. Privacy and accuracy are crucial. The system should
avoid hallucinations by grounding its responses in a verified financial knowledge base, use
a retrieval-augmented generation (RAG) approach for transparency, and provide
confidence scores on its outputs. Ideally, it should also work in a multilingual mode,
offering summaries or explanations in Kiswahili when needed. The ultimate challenge for
NLP students is to create a prototype policy advisory tool that blends structured data
analytics with unstructured text understanding, supports interactive queries, and delivers
trustworthy financial insights that could guide CBK decision-making on monetary policy,
financial stability, and regulatory oversight.
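The anomaly-flagging step (for example, spotting that quarterly loan defaults jumped 12%) can begin as a plain threshold on period-over-period change before any LLM is involved; the LLM's role is then to explain the flagged movement in context. The 10% default threshold below is illustrative.

```python
def flag_spikes(series, threshold=0.10):
    """Flag period-over-period changes above `threshold` (0.10 = 10%),
    returning (index, fractional_change) pairs for an analyst to review."""
    flags = []
    for i in range(1, len(series)):
        prev, cur = series[i - 1], series[i]
        if prev == 0:
            continue  # avoid division by zero on empty periods
        change = (cur - prev) / prev
        if abs(change) > threshold:
            flags.append((i, round(change, 3)))
    return flags
```

Each flag would be passed to the retrieval-augmented layer together with the underlying report passages, so the generated insight stays linked to verifiable data.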
12. Kenya National Bureau of Statistics (KNBS) reports such as the Economic Survey and the
Statistical Abstract are essential resources for understanding the country’s economy and
development. They contain large volumes of data on GDP, inflation, employment, trade,
agriculture, and county-level indicators. However, these reports are usually distributed in
PDF format, filled with hundreds of pages of text, tables, and charts. While rich in content,
the reports are not machine-friendly, and extracting insights from them often requires
laborious manual effort. Policymakers, researchers, and even students frequently struggle
to quickly locate specific numbers or trends, slowing down decision-making and learning.
The challenge is to design a system that can automatically process KNBS reports and
transform them into usable data and analytics. The system should be able to read PDF
documents, extract both text and statistical tables, and then clean and organize the data into
a structured format such as CSV files or database tables. Once the data is available in a
clean form, the system should support analysis by detecting key indicators, generating
trends, and producing visualizations. For example, it should be able to compare GDP
growth across years, show employment trends for different counties, or highlight inflation
changes over time. To make the system more accessible, students should add a natural
language component that allows users to ask questions such as “What was Kenya’s GDP
growth in 2024?” or “Show the unemployment rate in Nairobi County over the last five
years.” The system should respond with accurate information, supported by visualizations
or concise summaries, and must reference the exact KNBS report and page number where
the data was found. This will ensure transparency and allow users to verify the information.
By completing this challenge, students will learn how to combine natural language
processing, information extraction, and data analytics to solve a real-world problem. They
will demonstrate how AI can bridge the gap between unstructured reports and actionable
insights, making national statistics more accessible for decision-makers, researchers, and
the public.
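Once a table has been pulled out of a PDF (for example with pdfplumber's `extract_table`, which returns a list of rows with `None` for empty cells), serializing it into the clean CSV form described above is straightforward. The sketch below assumes that row-list shape.

```python
import csv
import io

def table_to_csv(table, header=None) -> str:
    """Serialize an extracted table (list of rows; cells may be None) to
    CSV text, dropping fully empty rows left over from PDF layout."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    if header:
        writer.writerow(header)
    for row in table:
        if any(cell not in (None, "") for cell in row):
            writer.writerow(["" if c is None else c for c in row])
    return buf.getvalue()
```

The resulting CSV rows would then be loaded into a database alongside metadata (report title, year, page number) so that every answer from the question-answering layer can cite exactly where the figure came from.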