Shao-Yu (Becca) Chang
email: shaoyuc3 -at- iis.sinica.edu.tw

| CV | Github |

I am a full-time Research Assistant in the Computer Vision & Machine Learning Lab at Academia Sinica, working with Professor Tyng-Luh Liu. My research interests lie at the intersection of Computer Vision and Machine Learning. I am especially interested in the synthesis and analysis capabilities of visual systems for images and videos. I aim to build robust and innovative models that help interpret the visual world and provide user-friendly experiences in complex scenarios.

Prior to this, I earned my Master's degree in Applied Mathematics with a specialization in Computational Science and Engineering (CSE) at the University of Illinois Urbana-Champaign, where I worked with Professor David Forsyth. I received my Bachelor's degree in Applied Mathematics from National Chiao Tung University (NCTU), where I served as a research assistant under the guidance of Professor Chin-Tien Wu.


Academia Sinica
Research Assistant
Jul. 23 - Present


UIUC
MS in Applied Math
Aug. 21 - May 23


NCTU MMSC
Research Assistant
Sep. 20 - Jun. 21


NCTU
BS in Applied Math
Sep. 16 - Jun. 20



  Preprints

DiffusionAtlas: High-fidelity Consistent Diffusion Video Editing
Shao-Yu Chang, Hwann-Tzong Chen, Tyng-Luh Liu

| abstract | project page | arXiv |

We present a diffusion-based video editing framework, namely DiffusionAtlas, which can achieve both frame consistency and high fidelity in editing video object appearance. Despite their success in image editing, diffusion models still encounter significant hindrances in video editing due to the challenge of maintaining spatiotemporal consistency in an object's appearance across frames. On the other hand, atlas-based techniques can propagate edits on layered representations consistently back to frames. However, they often struggle to create editing effects that adhere correctly to user-provided textual or visual conditions because they edit the texture atlas on a fixed UV mapping field. Our method leverages a visual-textual diffusion model to edit objects directly on the diffusion atlases, ensuring coherent object identity across frames. We design a loss term with atlas-based constraints and use a pretrained text-driven diffusion model as pixel-wise guidance for refining shape distortions and correcting texture deviations. Qualitative and quantitative experiments show that our method outperforms state-of-the-art methods in achieving consistent high-fidelity video-object editing.

  Publications

Preserving Image Properties Through Initializations in Diffusion Models
Jeffrey Zhang, Shao-Yu Chang, Kedan Li, David Forsyth
WACV 2024

| abstract | project page | paper |

Retail photography imposes specific requirements on images. For instance, images may need uniform background colors, consistent model poses, centered products, and consistent lighting. Minor deviations from these standards impact a site's aesthetic appeal, making the images unsuitable for use. We show that Stable Diffusion methods, as currently applied, do not respect these requirements. The usual practice of training the denoiser with a very noisy image and starting inference with a sample of pure noise leads to inconsistent generated images during inference. This inconsistency occurs because it is easy to tell the difference between samples of the training and inference distributions. As a result, a network trained with centered retail product images with uniform backgrounds generates images with erratic backgrounds. The problem is easily fixed by initializing inference with samples from an approximation of noisy images. However, in using such an approximation, the joint distribution of text and noisy image at inference time still slightly differs from that at training time. This discrepancy is corrected by training the network with samples from the approximate noisy image distribution. Extensive experiments on real application data show significant qualitative and quantitative improvements in performance from adopting these procedures. Finally, our procedure can interact well with other control-based methods to further enhance the controllability of diffusion-based methods.
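The initialization fix can be sketched in a few lines. This is an illustrative reconstruction assuming a standard DDPM forward process with cumulative schedule alpha_bar; the function name `noised_init` and the flat light-gray `x0_approx` standing in for the average training image are my own choices, not the paper's code.

```python
import numpy as np

def noised_init(mean_image, alpha_bar_T, rng=None):
    """Sample an inference-time initialization from an approximate
    noisy-image distribution instead of pure N(0, I) noise."""
    rng = np.random.default_rng(0) if rng is None else rng
    noise = rng.standard_normal(mean_image.shape)
    # Matches the DDPM forward-process marginal at the final step T:
    #   x_T = sqrt(alpha_bar_T) * x_0 + sqrt(1 - alpha_bar_T) * eps
    return np.sqrt(alpha_bar_T) * mean_image + np.sqrt(1.0 - alpha_bar_T) * noise

# A crude approximation of the clean-image mean: a uniform light-gray
# background in [-1, 1] range, as in centered retail product images.
x0_approx = np.full((64, 64, 3), 0.8)
x_T = noised_init(x0_approx, alpha_bar_T=0.01)
```

Training the denoiser on samples drawn the same way then closes the remaining train/inference gap the abstract describes.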


Controlling Virtual Try-on Pipeline Through Rendering Policies
Kedan Li, Jeffrey Zhang, Shao-Yu Chang, David Forsyth
WACV 2024

| abstract | paper |

This paper shows how to impose rendering policies on a virtual try-on (VTON) pipeline. Our rendering policies are lightweight procedural descriptions of how the pipeline should render outfits or render particular types of garments. Our policies are procedural expressions describing offsets to the control points for each set of garment types. The policies are easily authored and are generalizable to any outfit composed of garments of similar types. We describe a VTON pipeline that accepts our policies to modify garment drapes and produce high-quality try-on images with garment attributes preserved.
Layered outfits are a particular challenge to VTON systems because learning to coordinate warps between multiple garments so that nothing sticks out is difficult. Our rendering policies offer a lightweight and effective procedure to achieve this coordination, while also allowing precise manipulation of drape. Drape describes the way in which a garment is worn (for example, a shirt could be tucked or untucked).
Quantitative and qualitative evaluations demonstrate that our method allows effective manipulation of drape and produces significant measurable improvements in rendering quality for complicated layering interactions.
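As described, a rendering policy is a lightweight procedural mapping from garment type to control-point offsets. The sketch below is a hypothetical illustration of that idea; the garment names, offsets, and control-point format are invented for this example and are not from the paper.

```python
def apply_policy(control_points, policy, garment_type):
    """Shift each named warp control point by the policy's offset
    for this garment type (unlisted points are left unchanged)."""
    offsets = policy.get(garment_type, {})
    return {name: (x + offsets.get(name, (0.0, 0.0))[0],
                   y + offsets.get(name, (0.0, 0.0))[1])
            for name, (x, y) in control_points.items()}

# A "tucked shirt" drape policy: pull the hem control points upward
# (normalized image coordinates, y grows downward).
tucked = {"shirt": {"hem_left": (0.0, -0.05), "hem_right": (0.0, -0.05)}}
shirt_points = {"hem_left": (0.3, 0.6), "hem_right": (0.7, 0.6),
                "collar": (0.5, 0.1)}
draped = apply_policy(shirt_points, tucked, "shirt")
```

The same policy applies to any outfit containing a garment of the same type, which is the generalizability the abstract highlights.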


Comparing Handwriting Fluency in English Language Teaching Using Computer Vision Techniques
Chuan-Wei Syu, Shao-Yu Chang, Chi-Cheng Chang
ICITL 2023

| abstract | paper |

Educational materials play a vital role in effectively conveying information to learners, with the readability and legibility of written text serving as crucial factors. This study investigates the influence of font selection on educational materials and explores the relationship between handwriting fluency and cognitive load. By identifying challenges in written expression, such as reduced working memory capacity, text organization difficulties, and content recall issues, the study sheds light on the significance of neat handwriting. The research emphasizes the relevance of neat handwriting in critical examinations, including college entrance exams, academic English exams, and job interviews, where the fluency of one’s handwriting can impact the decision-making process of interviewers. This highlights the value of handwriting fluency beyond educational contexts. Advancements in computer science and machine vision present new opportunities for automating font evaluation and selection. By employing machine vision algorithms to objectively analyze visual features of fonts, such as serifs, stroke width, and character spacing, the legibility and readability of fonts used in English language teaching materials are assessed. In this study, machine vision techniques are applied to score fonts used in educational materials. The OpenCV computer vision library is utilized to extract visual features of fonts from images, enabling the analysis of their legibility and readability. The primary objective is to provide educators with an automated and objective tool for scoring handwriting, reducing visual fatigue, and ensuring impartial evaluations. This research contributes to enhancing the quality of educational materials and provides valuable insights for educators, researchers, and font designers.

  Posters

Classification of Satellite Images with Spectral Indices
Shao-Yu Chang, Chin-Tien Wu
SIAM CSE 2021 (Poster Presentation)

| abstract | poster |

Spectral indices are combinations of the pixel values from two or more spectral bands in a multispectral image. They are used to highlight pixels showing the relative abundance or lack of a land-cover type of interest in an image. This study aims to build a feature space from several spectral indices and assess whether those indices are useful and efficient for the classification of satellite images. The training data is extracted from the NAIP program with six classes (building, barren land, trees, grassland, road, and water). Each image has four spectral bands (RED, GREEN, BLUE, and NIR). After comparison and analysis, the spectral indices used are the Modified Soil-Adjusted Vegetation Index (MSAVI), Atmospherically Resistant Vegetation Index (ARVI), Normalized Difference Water Index (NDWI), Difference Spectral Building Index (DSBI), and Road Extraction Index (REI). With these spectral indices and the distributions of the four spectral bands (here, we use the mean and standard deviation to represent each distribution), an 18-dimensional feature space is formed. The dimensionality of the feature space is then reduced using Unsupervised Feature Selection with Ordinal Locality (UFSOL) to improve the classification accuracy. The spectral indices performed well as features, attaining an accuracy of 92.58%.
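The feature construction can be sketched with the standard definitions of three of the indices; DiffusionDSBI and REI are less common, so they are omitted here, and the band data and exact feature layout are my assumptions rather than the poster's code. Summarizing those two remaining indices by mean and standard deviation as well would give the 18-dimensional space described.

```python
import numpy as np

def ndwi(green, nir):
    # Normalized Difference Water Index (McFeeters): highlights water.
    return (green - nir) / (green + nir + 1e-8)

def msavi(red, nir):
    # Modified Soil-Adjusted Vegetation Index.
    return (2 * nir + 1 - np.sqrt((2 * nir + 1) ** 2 - 8 * (nir - red))) / 2

def arvi(red, blue, nir):
    # Atmospherically Resistant Vegetation Index.
    rb = 2 * red - blue
    return (nir - rb) / (nir + rb + 1e-8)

def dist_features(arr):
    # Summarize a band or index map by its mean and standard deviation.
    return [float(arr.mean()), float(arr.std())]

# Hypothetical 32x32 band arrays standing in for a NAIP image patch.
rng = np.random.default_rng(0)
red, green, blue, nir = (rng.random((32, 32)) for _ in range(4))

features = (dist_features(red) + dist_features(green)
            + dist_features(blue) + dist_features(nir)
            + dist_features(ndwi(green, nir))
            + dist_features(msavi(red, nir))
            + dist_features(arvi(red, blue, nir)))  # 14-D with 3 indices
```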

  Projects

Learning Virtual Try-on for Image-based Tasks
An implementation of image-based virtual try-on using DiOr as the base pipeline and Parser-Based Appearance Flow Style as the flow field estimator.
| code |


Python-RayTracing
A Python implementation of Ray Tracing based on C++ Ray Tracing Book Series by Peter Shirley.
| code |


Blind Deconvolution with Image Statistics
Built a model to recover blurred satellite images by simulating histograms of motion-blur kernels at different angles and lengths.
| poster | pdf |

  Awards
  • Mathematics Presidential Award for Outstanding Performance in Discrete Mathematics, NCTU

Thanks to the template from here