Machine Learning

Meta Learning

Meta-learning is a type of machine learning that focuses on learning how to learn: a model is trained on a given set of tasks so that it can apply the acquired knowledge to new tasks and domains. The long-tailed problem in deep learning refers to imbalanced class distributions in datasets, where minority classes have few samples. This imbalance can hinder model performance and fairness. Our lab researches techniques to address this issue, aiming to improve accuracy on underrepresented classes. We explore data augmentation, adaptive loss functions, and advanced learning algorithms.
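As a rough illustration of how a loss function can be adapted to a long-tailed dataset, the sketch below weights each class by the inverse of its sample count. Inverse-frequency weighting is only one common choice and is an assumption here, not necessarily the method used in our lab; the class counts and predicted probabilities are hypothetical.

```python
import math


def inverse_frequency_weights(class_counts):
    """Return one weight per class, proportional to 1 / count, averaging to 1."""
    inverse = [1.0 / count for count in class_counts]
    scale = len(class_counts) / sum(inverse)
    return [value * scale for value in inverse]


def weighted_cross_entropy(probabilities, label, weights):
    """Cross-entropy for one sample, scaled by the weight of its true class."""
    return -weights[label] * math.log(probabilities[label])


# Hypothetical long-tailed dataset: one head class and two tail classes.
counts = [900, 90, 10]
weights = inverse_frequency_weights(counts)

# Both samples are predicted with the same confidence (0.6) in the true class,
# but the sample from the rare class contributes a larger loss.
head_loss = weighted_cross_entropy([0.6, 0.3, 0.1], 0, weights)
tail_loss = weighted_cross_entropy([0.1, 0.3, 0.6], 2, weights)
```

Because the weights average to 1, the overall loss scale stays comparable to the unweighted case while errors on minority classes are penalized more heavily.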


Vision-Language Model

A Vision-Language Model is an AI model that can process both images and written text. It can "see" and "read" simultaneously, analyzing images to recognize objects and scenes while understanding and generating natural language descriptions or answers based on the visual input. These models are trained on paired image and text data, learning the relationships between visuals and their corresponding textual descriptions. Vision-Language Models have applications in image captioning, visual question answering, and assistance for individuals with visual impairments, enabling computers to better comprehend and communicate about the visual world.

AI-Agent with Large Language Models (LLMs)

Current Large Language Models (LLMs), such as GPT-4o and Gemini-1.5, process user queries with high accuracy. However, these LLMs have several drawbacks: they are too large to be deployed and used on local devices (e.g., cell phones and laptops); because they run in the cloud, they raise privacy and security issues such as personal information leakage; and they always require a cloud/Wi-Fi connection. To solve these problems, we are conducting research to develop Small Language Models (SLMs) with performance comparable to existing LLMs, using techniques such as high-quality training data generation and quantization, so that they can be used on local devices.
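To give a sense of what quantization does, the following minimal sketch maps floating-point weights to 8-bit integers with a single shared scale (symmetric per-tensor quantization). This scheme is assumed for illustration only; the function names and the weight values are hypothetical, and practical SLM quantization pipelines are considerably more involved.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate floats from the integer codes."""
    return [q * scale for q in quantized]


# Hypothetical weights from one layer of a model.
layer_weights = [0.82, -1.27, 0.05, 0.0, 0.4]
codes, scale = quantize_int8(layer_weights)
restored = dequantize(codes, scale)

# Rounding error per weight is bounded by half of the quantization step.
max_error = max(abs(a - b) for a, b in zip(layer_weights, restored))
```

Each weight is then stored in one byte instead of four (for 32-bit floats), which is the kind of size reduction that makes on-device deployment more feasible, at the cost of a small rounding error.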

Image & Video Super-Resolution

Image and Video Super-Resolution are computer vision techniques for enhancing the resolution of images and videos, respectively. Image Super-Resolution (ISR) focuses on generating high-resolution images from low-resolution inputs. Video Super-Resolution (VSR) extends this concept to video sequences, enhancing frame resolution while maintaining temporal consistency. Both can be achieved with learning-based approaches that use deep learning models such as Convolutional Neural Networks (CNNs) and Transformers. Applications span fields such as medical imaging, satellite imaging, security surveillance, and entertainment, where improved image and video clarity supports better analysis, identification, and viewing experiences.

Virtual Try-on

Virtual try-on is a narrow research area that builds on generative models such as Generative Adversarial Networks (GANs) and Diffusion Models. Solving the virtual try-on problem also requires working on several subproblems, such as image segmentation and pose estimation, among others.

Sensor Fusion

Sensor fusion combines data from multiple sensors to improve the accuracy and robustness of information beyond what individual sensors provide. In a multimodal deep learning model that uses RGB and infrared (IR) images, the model integrates data from both modalities to enhance perception and produce a more accurate understanding of the scene. This fusion approach is particularly effective in scenarios such as maritime vessel and aircraft detection, where combining visible and thermal imagery improves detection accuracy under various environmental conditions.
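The claim that fused data can be more accurate than any single sensor can be illustrated with a classical example: an inverse-variance weighted average of two independent estimates. This is a textbook fusion rule chosen here as an assumption for illustration, not the deep learning model described above; the readings and variances are hypothetical.

```python
def fuse(estimates, variances):
    """Inverse-variance weighted average of independent estimates.

    Returns the fused estimate and its variance.
    """
    precisions = [1.0 / v for v in variances]
    total = sum(precisions)
    fused = sum(p * e for p, e in zip(precisions, estimates)) / total
    return fused, 1.0 / total


# Hypothetical estimates of the same quantity (e.g., a target's bearing in
# degrees) from a visible-light sensor and a thermal sensor.
fused_value, fused_variance = fuse([102.0, 98.0], [4.0, 1.0])
```

The fused variance is smaller than the variance of either input, and the fused estimate lies closer to the more reliable sensor.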

Medical Image Processing

Our team studies methods that make use of biomedical image datasets at all spatial scales, from molecular and cellular imaging to tissue and organ imaging. The datasets of interest typically include, but are not restricted to, those obtained from magnetic resonance, ultrasound, computed tomography, nuclear medicine, X-ray, optical and confocal microscopy video, and range data images. We concentrate on 3D medical detection and segmentation, as well as classification, detection, segmentation, instance segmentation, and panoptic segmentation of various biomarkers in medical images.