RTCSA 2025 Tutorial
Deep Software Stack Optimization for AI-Enabled Embedded Systems

Seongsoo Hong (Seoul National University)
Namcheol Lee (Seoul National University)
Geonha Park (Seoul National University)
Taehyun Kim (Seoul National University)

▶ Overview

Objective
This tutorial provides lectures and hands-on exercises on optimizing and deploying LiteRT (formerly TFLite) models on the RUBIK Pi platform, focusing on pipeline parallelism for efficient on-device inference.

Key Topics

Inference driver and inference runtime
Model slicing and conversion
Throughput enhancement via pipelining on heterogeneous accelerators
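
As a rough illustration of the pipelining topic above, the following minimal sketch runs a stream of inputs through two stages, each on its own worker thread, so that the second stage can process one input while the first stage is already working on the next. It is not the tutorial's implementation: `run_pipeline`, `first_half`, and `second_half` are hypothetical names, and the two stage functions are plain arithmetic stand-ins for model slices that would run on different processors.

```python
import queue
import threading

_STOP = object()  # sentinel marking the end of the input stream


def run_pipeline(inputs, stages):
    """Pass every input through `stages` in order, one worker thread per stage."""
    # A small bounded queue feeds each stage; the last queue is unbounded
    # so the final stage never blocks while results are being collected.
    queues = [queue.Queue(maxsize=2) for _ in stages] + [queue.Queue()]

    def worker(stage, q_in, q_out):
        while True:
            item = q_in.get()
            if item is _STOP:
                q_out.put(_STOP)  # forward the sentinel downstream
                return
            q_out.put(stage(item))

    threads = [
        threading.Thread(target=worker, args=(stage, queues[i], queues[i + 1]))
        for i, stage in enumerate(stages)
    ]
    for t in threads:
        t.start()

    # Feed the first stage, then signal the end of the stream.
    for x in inputs:
        queues[0].put(x)
    queues[0].put(_STOP)

    # Collect results; FIFO queues with one worker per stage preserve order.
    results = []
    while True:
        item = queues[-1].get()
        if item is _STOP:
            break
        results.append(item)

    for t in threads:
        t.join()
    return results


# Hypothetical stand-ins for two slices of a model (assumption, not real inference).
def first_half(x):
    return x + 1


def second_half(x):
    return x * 2


if __name__ == "__main__":
    print(run_pipeline(range(5), [first_half, second_half]))
```

In steady state, the throughput of such a pipeline is bounded by its slowest stage rather than by the sum of all stage latencies, which is why the placement of slice boundaries matters. The sketch omits error handling: an exception inside a stage would stall the pipeline.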

Target Audience

This tutorial is designed for students, engineers, and researchers interested in on-device AI.
It is particularly well-suited for beginners and intermediate-level participants who want to gain practical experience in on-device DNN inference and its optimization.

▶ Notice

The number of participants is limited to 20 on a first-come, first-served basis.
RUBIK Pi boards will be provided for use during the tutorial.

▶ Prerequisites

Please bring a personal laptop with the following software installed in advance:

Visual Studio Code (VS Code) with the Remote - SSH extension installed
ADB (Android Debug Bridge)

▶ Tentative Schedule

Lecture 1: From Inference Driver to Inference Runtime (50 min)
Lecturer	Seongsoo Hong
Topics	Step-by-Step Inference Driver Walkthrough
	Internals of Lite Runtime (LiteRT)
Brief Break (10 min)
Lecture 2: Model Slicer (50 min)
Lecturer	Seongsoo Hong
Topics	Model Slicer: Slicing and Conversion Tool for LiteRT
Brief Break (10 min)
Lecture 3: Throughput Enhancement on Heterogeneous Accelerators (1 h 30 min)
Lecturer	Namcheol Lee
Topics	Implementing a Pipelined Inference Driver for Heterogeneous Processors

Slide GitHub
