The following NAIRR Pilot project request writeup is shared with permission from PI Yun Huang from University of Illinois Urbana-Champaign:
A. Scientific/Technical Goal
Extensive research has been dedicated to creating a more inclusive video-based learning environment for Deaf and Hard of Hearing (DHH) learners. Efforts such as utilizing text captions [1, 2, 3] and integrating sign language captioning [4] have been explored. However, these methods alone have proven insufficient for serving the inclusive learning needs of DHH learners. The effectiveness of captions is limited by inaccuracies [1, 2, 3] and the diverse backgrounds of DHH learners, which impact how they learn with captions [5, 6]. This discrepancy underscores the need for improved accessibility in video-based learning, ensuring materials are not only reachable but also effective in supporting the unique learning experiences of DHH individuals.
The exploration of new technologies is essential to address the distinct ways DHH and hearing students access and learn from video-based materials. For example, recent studies emphasize the importance of enhancing learners' emotional experiences in online learning environments, which could significantly boost motivation and academic performance [7, 8, 9, 10]. Prior work has found it promising to recognize, interpret, and categorize human emotions through multimodal cues [11, 12, 13] and to tailor learning analytics tools to the needs of DHH learners, facilitating a more engaging and effective learning process. However, the integration of such technologies must be approached with caution, considering the unique challenges and ethical implications of using emotions in learning environments tailored to underserved populations [14, 15]. Our work aims to promote positive learning emotions among DHH learners in online settings and to examine the potential of generative AI to enhance people's curiosity for learning in an inclusive manner.
Artificial intelligence (AI) systems have demonstrated their ability to boost performance by summarizing content [16], recommending related work [17], and generating new hypotheses [18]; they can likewise be used to promote students' curiosity and learning efficiency. This project therefore focuses on facilitating the discovery of insightful questions from existing video content to guide and educate learners in scaffolding and creating learning ideas. We aim to introduce new systems/methods that augment learners' discovery process using natural language generation models. We have collected and annotated an initial dataset of open-access video lectures, and our interaction design for learners to explore content with AI has shown promising results in a human evaluation study [19]. For example, learners could explore similar videos with a depth-first approach or compare different videos in a breadth-first mode.
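As a minimal illustration of these two exploration modes, assuming a hypothetical similarity graph over videos (the graph, video IDs, and `explore` function below are ours for illustration, not the deployed system's implementation):

```python
# Sketch of depth-first vs. breadth-first exploration over a
# hypothetical video-similarity graph (illustrative data only).
from collections import deque

similar = {  # each video maps to its most similar videos (assumed)
    "v1": ["v2", "v3"],
    "v2": ["v4"],
    "v3": ["v5"],
    "v4": [],
    "v5": [],
}

def explore(start, mode="depth"):
    """Yield videos depth-first (drill into one topic) or
    breadth-first (compare sibling videos side by side)."""
    frontier, seen = deque([start]), {start}
    while frontier:
        # Stack behavior gives depth-first; queue behavior gives breadth-first.
        video = frontier.pop() if mode == "depth" else frontier.popleft()
        yield video
        for nxt in similar.get(video, []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)

print(list(explore("v1", "depth")))    # ['v1', 'v3', 'v5', 'v2', 'v4']
print(list(explore("v1", "breadth")))  # ['v1', 'v2', 'v3', 'v4', 'v5']
```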
Larger-scale experiments are needed to fully understand the advantages of the system and to guide further improvements. Such experiments require extensive GPU computational resources, which the NAIRR Pilot program can enable.
We estimate a project duration of 6 months. The investigation will be conducted in two phases:
- Phase 1 (2 months): Development of question generation models and pipelines. This stage includes collecting and curating a diverse dataset of research publications across various disciplines for training and testing the models. We will then design and develop deep learning models that generate questions from video content such as captions and images, using different architectures (e.g., transformer-based models) and fine-tuning them for the task (a minimal fine-tuning sketch follows this list). We will also define and implement evaluation metrics to assess the quality of generated questions along dimensions such as relevance, novelty, and clarity.
- Phase 2 (4 months): Large-scale user study for comprehensive human evaluation. We will recruit participants from diverse backgrounds for the study. Interaction log data will be collected and analyzed to assess the system's performance and identify areas for improvement. We aim to complete a large-scale study (at least 100 participants) to comprehensively understand the effect of generative AI (including LLMs and multimodal language models).
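As a minimal sketch of the Phase 1 fine-tuning step, using the PyTorch and Hugging Face Transformers stack named in Section B; the base model, prompt prefix, example data, and hyperparameters below are illustrative placeholders, not our final configuration:

```python
# Hypothetical caption -> question fine-tuning sketch; all data and
# hyperparameters are placeholders for illustration.
from datasets import Dataset
from transformers import (AutoModelForSeq2SeqLM, AutoTokenizer,
                          DataCollatorForSeq2Seq, Seq2SeqTrainer,
                          Seq2SeqTrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("t5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-base")

# Placeholder example pairing a video caption with an insightful question.
raw = Dataset.from_dict({
    "caption": ["The lecture explains backpropagation in a two-layer network."],
    "question": ["How would the gradient computation change with a skip connection?"],
})

def preprocess(batch):
    # Prefix captions so the seq2seq model learns the generation task.
    inputs = tokenizer(["generate question: " + c for c in batch["caption"]],
                       truncation=True, max_length=512)
    labels = tokenizer(text_target=batch["question"], truncation=True, max_length=64)
    inputs["labels"] = labels["input_ids"]
    return inputs

train_ds = raw.map(preprocess, batched=True, remove_columns=raw.column_names)

trainer = Seq2SeqTrainer(
    model=model,
    args=Seq2SeqTrainingArguments(output_dir="qgen-checkpoints",
                                  per_device_train_batch_size=8,
                                  num_train_epochs=3),
    train_dataset=train_ds,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```

In practice we would swap in the curated video-lecture dataset and score checkpoints against the relevance, novelty, and clarity metrics defined in Phase 1.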
The results of the study will be published at HCI conferences as one or multiple publications. We initially estimate submitting one or two conference papers to SIGCHI 2025 (due in September).
B. Estimate
The experiments in our next steps mainly involve the training, evaluation, and hosting of deep learning models (more specifically, language models for text generation). Our current code and framework for fine-tuning and developing local LLMs are built using the PyTorch and Hugging Face Transformers libraries. We expect to fine-tune and host local large/multimodal language models, so we request support covering the provisioning of GPU-enabled servers with computational power equivalent to an AWS g5.12xlarge instance (4x A10 GPUs). To incorporate multimodal language models into our project, we can extend our focus beyond text-based interactions to those involving voice and other modalities, which would significantly broaden the scope of our research, particularly in the area of persona-based brainstorming. We estimate using one such instance for 8 hours per day; based on AWS's cost estimator, the total GPU server cost adds up to $8,281.14 (a worked check of this figure follows the table below). Additionally, we request resources to support use of the OpenAI API through Microsoft Azure for LLMs such as GPT-4 in data augmentation and evaluation experiments; we estimate these costs at approximately $1,050 and $600, respectively. A cost estimation table follows:
| Item | Description | Estimated Cost (USD) |
| --- | --- | --- |
| GPU Server (AWS g5.12xlarge) | 4x A10 GPUs, 8 hours/day for 6 months | $8,281.14 |
| Data Augmentation (Azure OpenAI API) | Usage of LLMs like GPT-4 for data augmentation | $1,050 |
| Evaluation (Azure OpenAI API) | Usage of LLMs like GPT-4 for evaluation experiments | $600 |
| Total | | $9,931.14 |
Table 1: Cost Estimation Table
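As a sanity check on the GPU line item, assuming AWS's public us-east-1 on-demand rate for g5.12xlarge of roughly $5.672/hour (an assumption; rates vary by region and pricing model):

```python
# Reconstructing the GPU server estimate; the hourly rate is an
# assumption based on public on-demand pricing, not a quoted figure.
hourly_rate = 5.672                    # USD/hour for g5.12xlarge (assumed)
hours = 8 * 182.5                      # 8 hours/day over ~6 months (182.5 days)
print(f"${hourly_rate * hours:,.2f}")  # $8,281.12, matching the quoted
                                       # $8,281.14 up to estimator rounding
```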
Our project will also require storage for the research publications dataset, trained models, and intermediate outputs; we estimate a need for at least 1 TB of storage space to accommodate the data and models. In addition, open-access video lectures will serve as the primary dataset for training and evaluating the models.
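For the Azure OpenAI usage budgeted above, a minimal sketch of a data augmentation call; the endpoint, deployment name, and prompt below are placeholders rather than our production configuration:

```python
# Hedged sketch of GPT-4 data augmentation via Azure OpenAI; endpoint,
# key, deployment name, and prompt are placeholders.
import os
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],  # placeholder
    api_key=os.environ["AZURE_OPENAI_API_KEY"],          # placeholder
    api_version="2024-02-01",
)

def augment(caption: str) -> str:
    """Ask GPT-4 to paraphrase a caption, expanding the training set."""
    response = client.chat.completions.create(
        model="gpt-4",  # Azure deployment name (assumed)
        messages=[
            {"role": "system",
             "content": "Paraphrase the lecture caption, preserving its meaning."},
            {"role": "user", "content": caption},
        ],
    )
    return response.choices[0].message.content

print(augment("The lecture explains backpropagation in a two-layer network."))
```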
C. Support Needs
We already have experience managing and scaling these applications on cloud platforms such as AWS and Microsoft Azure through a previous CloudBank allocation. Standard helpdesk support should be sufficient.
D. Team And Team Preparedness
Our team is uniquely qualified and ready to execute this project, leveraging our expertise in Human-Computer Interaction, Human-AI Collaboration, Natural Language Processing, and AI-augmented learning technologies.
- Dr. Yun Huang (P.I.): an Associate Professor at the School of Information Sciences at the University of Illinois at Urbana-Champaign, specializes in Human-Computer Interaction and Human-AI Collaboration. Her work focuses on creating inclusive technologies that enhance educational opportunities, promote accessibility, and support social inclusion. Dr. Huang has a proven track record of leading interdisciplinary research projects and has received support from industry leaders such as OpenAI and Google, as well as government agencies like the National Science Foundation and the National Institute on Disability, Independent Living, and Rehabilitation Research.
- Yiren Liu: a PhD candidate in Informatics at the University of Illinois at Urbana-Champaign, has a strong background in natural language processing and interdisciplinary research. His work on conversational AI systems, empathetic dialogue generation, and AI-augmented learning technologies demonstrates his ability to leverage AI for societal benefit. Yiren has collaborated with industrial partners such as IBM and OpenAI and has been recognized for his research through numerous publications and awards.
Lead Time and System Usage. We expect a lead time of approximately 2-4 weeks before we can begin using the requested resources. This time will be used to set up accounts, configure the necessary software, and prepare the datasets for the project. Our team has recently used cloud-based GPU servers for training and evaluating deep learning models in natural language processing and AI-augmented learning applications. As noted above, we have experience managing and scaling these applications on cloud platforms such as AWS and Microsoft Azure through a previous CloudBank allocation, and we also have experience working with Slurm-based HPC resources.
Team Members and Citizenship. The following team members will require accounts on the requested resources:
- Dr. Yun Huang - Citizenship: XXXX
- Yiren Liu - Citizenship: XXXX