publications
2023
- EMNLP. From Heuristic to Analytic: Cognitively Motivated Strategies for Coherent Physical Commonsense Reasoning. Zheyuan Zhang, Shane Storks, Fengyuan Hu, Sungryull Sohn, Moontae Lee, Honglak Lee, and Joyce Chai. In Empirical Methods in Natural Language Processing (EMNLP), 2023
Pre-trained language models (PLMs) have shown impressive performance in various language tasks. However, they are prone to spurious correlations, and often generate illusory information. In real-world applications, PLMs should justify decisions with formalized, coherent reasoning chains, but this challenge remains under-explored. Cognitive psychology theorizes that humans are capable of utilizing fast and intuitive heuristic thinking to make decisions based on past experience, then rationalizing the decisions through slower and deliberative analytic reasoning. We incorporate these interlinked dual processes in fine-tuning and in-context learning with PLMs, applying them to two language understanding tasks that require coherent physical commonsense reasoning. We show that our proposed Heuristic-Analytic Reasoning (HAR) strategies drastically improve the coherence of rationalizations for model decisions, yielding state-of-the-art results on Tiered Reasoning for Intuitive Physics (TRIP). We also find that this improved coherence is a direct result of more faithful attention to relevant language context in each step of reasoning. Our findings suggest that human-like reasoning strategies can effectively improve the coherence and reliability of PLM reasoning.
- arXiv. Efficient In-Context Learning in Vision-Language Models for Egocentric Videos. Keunwoo Peter Yu, Zheyuan Zhang, Fengyuan Hu, and Joyce Chai. In arXiv, 2023
Recent advancements in text-only large language models (LLMs) have highlighted the benefit of in-context learning for adapting to new tasks with a few demonstrations. However, extending in-context learning to large vision-language models (VLMs) using a huge amount of naturalistic vision-language data has shown limited success, particularly for egocentric videos, due to high data collection costs. We propose a novel training method 𝔼fficient 𝕀n-context 𝕃earning on 𝔼gocentric 𝕍ideos (𝔼𝕀𝕃𝔼𝕍), which elicits in-context learning in VLMs for egocentric videos without requiring massive, naturalistic egocentric video datasets. 𝔼𝕀𝕃𝔼𝕍 involves architectural and training data adaptations to allow the model to process contexts interleaved with video clips and narrations, sampling of in-context examples with clusters of similar verbs and nouns, use of data with skewed marginal distributions with a long tail of infrequent verbs and nouns, as well as homonyms and synonyms. Our evaluations show that 𝔼𝕀𝕃𝔼𝕍-trained models outperform larger VLMs trained on a huge amount of naturalistic data in in-context learning. Furthermore, they can generalize to not only out-of-distribution, but also novel, rare egocentric videos and texts via in-context learning, demonstrating potential for applications requiring cost-effective training, and rapid post-deployment adaptability.
2022
- ONLINE. Bot Lab: Autonomous Ground Vehicle from Low-level Control, SLAM to Planning and Exploration. Zheyuan Zhang, Yu Zhu, Manu Aatitya Raajan Priyadharshini, and Thirumalaesh Ashokkumar. 2022
The MBot mobile robotics project aims to develop an autonomous ground vehicle that can navigate unknown environments. The project has four key components. Low-level control executes commands from the high-level system to drive the robot, using velocity models and kinematics with a PID controller. Simultaneous Localization and Mapping (SLAM), the core of the MBot project, allows the robot to use LiDAR to build a map of the environment and localize itself in that map at the same time. We developed a mapping module and a particle filter with an action model and a sensor model. Additionally, we implemented A* (AStar) heuristic search with a pruning algorithm for path planning and a frontier-guided algorithm for exploration. In this report, we present the theory and detailed implementation, along with experimental results.
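As a rough illustration of the path-planning component named in this abstract, here is a minimal A* sketch on a 4-connected occupancy grid. It is not the project's implementation: the grid encoding (0 for free, 1 for obstacle), the unit step cost, the Manhattan heuristic, and the function name `astar` are all assumptions made for this example, and the pruning step mentioned in the abstract is not reproduced.

```python
import heapq


def astar(grid, start, goal):
    """Return a list of (row, col) cells from start to goal, or None if unreachable.

    grid is a list of lists where 0 marks free space and 1 marks an obstacle
    (an assumed encoding for this sketch).
    """
    rows, cols = len(grid), len(grid[0])

    def heuristic(cell):
        # Manhattan distance never overestimates on a 4-connected, unit-cost grid.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    # Heap entries are (estimated total cost, cost so far, cell).
    open_heap = [(heuristic(start), 0, start)]
    came_from = {start: None}
    best_cost = {start: 0}

    while open_heap:
        _, cost, cell = heapq.heappop(open_heap)
        if cell == goal:
            # Walk the parent links back to the start.
            path = []
            while cell is not None:
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        if cost > best_cost[cell]:
            continue  # a cheaper route to this cell was already expanded
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                new_cost = cost + 1
                if new_cost < best_cost.get(nxt, float("inf")):
                    best_cost[nxt] = new_cost
                    came_from[nxt] = cell
                    heapq.heappush(open_heap, (new_cost + heuristic(nxt), new_cost, nxt))
    return None
```

Calling `astar(grid, (0, 0), (2, 0))` on a small grid returns the cell sequence of a shortest path, or `None` when obstacles separate the two cells.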
- ONLINE. A Computational Cognitive Model of Human Memory Based on Invertible Neural Networks. Zheyuan Zhang. 2022
Cognitive modeling is a prerequisite for an intelligent agent to be more like humans or other animals, and memory is the basis for higher mental activities such as thinking, generating emotions, and imagining. This paper presents a computational model of memory that simulates how we remember and recall things. It demonstrates the feasibility of using neural networks to encode information stored in memory. The computational model is based on the multi-store model of memory, which divides memory into a sensory register, a short-term store, and a long-term store. The model recovers memory traces using invertible neural networks (INNs). Furthermore, the paper establishes a bridge between artificial intelligence and psychology, in which psychology can inspire and advance AI while AI can be used to explain psychology in a computational manner.
2021
- INSAI. Low-cost Solution for Vision-based Robotic Grasping. Zheyuan Zhang and Huiliang Shang. In International Conference on Networking Systems of AI (IEEE INSAI), 2021
Robotic grasping is a fundamental task through which many robots interact with the outside world, and it remains challenging. Robotic grasping involves at least three tasks: object localization, grasp pose estimation, and motion planning. This paper presents a low-cost machine vision solution for robotic grasping based on template matching, and compares it with other approaches, including state-of-the-art YOLOv4 object detection and edge-based geometric shape detection. The solution achieves a high pick-and-place success rate. The paper also implements an improvement to template matching and provides detailed analysis, algorithms, and experiments.
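To make the idea of template matching for object localization concrete, here is a minimal sketch that slides a template over a grayscale image and scores each window by the sum of squared differences. It is an assumed stand-in, not the paper's method: the scoring function, the nested-list image format, and the name `match_template` are choices made for this example, and the improvement the paper describes is not reproduced.

```python
def match_template(image, template):
    """Return ((row, col), score) for the best-matching window.

    image and template are lists of lists of numbers (grayscale intensities).
    The score is the sum of squared differences, so lower is better and 0
    means an exact match.
    """
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    best_pos, best_score = None, float("inf")

    # Try every top-left position where the template fits inside the image.
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            score = sum(
                (image[r + i][c + j] - template[i][j]) ** 2
                for i in range(th)
                for j in range(tw)
            )
            if score < best_score:
                best_pos, best_score = (r, c), score
    return best_pos, best_score
```

The returned position is the top-left corner of the best window. A practical system would likely use a vectorized or library implementation and a normalized score to tolerate lighting changes.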