Tri Nguyen

Expertise

Research & Projects

Code Review AI

Built a multi-agent CLI tool that automates code review using NVIDIA Nemotron, reducing review turnaround from hours to minutes while maintaining senior-engineer-level thoroughness.

Manual code review is often one of the biggest bottlenecks in development pipelines. Automating it with multi-agent LLM orchestration cuts cycle time, enforces consistency across teams, and frees senior engineers to focus on architecture instead of line-by-line review.

// applications

Automated pull request review for distributed engineering teams

Continuous code quality enforcement in CI/CD pipelines

Security vulnerability scanning before production deployment

Python, NVIDIA NIM, Nemotron, LangGraph, CLI

ML Serving Optimization in Edge-Cloud

Pioneered resource optimization techniques for ML model serving on heterogeneous edge devices, achieving 25-30% efficiency improvement in multi-device deployments.

Edge ML inference is bottlenecked by heterogeneous hardware and limited resources. Optimizing serving across diverse edge nodes unlocks real-time AI at the network edge where latency matters most.

// applications

Autonomous vehicle perception on embedded GPUs

Smart retail real-time inventory tracking

Industrial quality inspection on edge cameras

PyTorch, ONNX, Kubernetes, Docker

Federated Learning Marketplace

Designed a quality-aware, cost-conscious federated ML marketplace framework enabling ML-as-a-Service across distributed edge nodes.

Data privacy regulations prevent centralized training in many domains. A federated marketplace enables collaborative model improvement while keeping sensitive data local to each participant.

// applications

Cross-hospital medical AI without sharing patient data

Financial fraud detection across banking networks

Distributed smart grid energy optimization

TensorFlow, Ray, Docker, Python

Real-time Manufacturing Process Optimization

Designed and deployed a production ML pipeline integrating IoT sensor data with predictive analytics to reduce manufacturing operation costs.

Manufacturing downtime can cost millions per hour. ML-driven predictive maintenance and process optimization flag failures before they happen and continuously tune operations for peak efficiency.

// applications

Predictive maintenance on CNC machines

Real-time defect detection on assembly lines

Energy consumption optimization in factories

MLflow, Kubernetes, Airflow, Prometheus

Explainable AI for Digital Twin Systems

Developed a security-aware ML serving architecture integrating explainability for real-time decision auditing in critical infrastructure digital twins.

Critical infrastructure decisions cannot be black boxes. Explainable AI provides transparent reasoning for automated decisions, enabling human oversight and regulatory compliance in high-stakes environments.

// applications

Power grid anomaly detection with audit trails

Smart building security threat analysis

Water treatment plant process monitoring

SHAP, LIME, Python, Docker

Opportunistic Data Operations for HPC

Built a framework enabling developers and scientists to execute data operations during idle periods of long-running HPC applications, improving resource utilization in data-intensive computational workflows.

HPC jobs leave resources idle during computation phases. Opportunistic scheduling reclaims these gaps for data tasks like analysis and ML, maximizing expensive supercomputer utilization without interfering with primary workloads.

// applications

In-situ data analysis during large-scale simulations

Opportunistic ML training on idle HPC nodes

Automated data preprocessing in scientific computing pipelines

C++, Python, CMake, Docker

Chainer-XP: Deep Learning on Intel Xeon Phi

Developed a flexible framework enabling neural network training and inference on Intel Xeon Phi coprocessors, bridging the gap between deep learning workloads and many-core architectures.

Deep learning frameworks were not optimized for many-core coprocessors like Intel Xeon Phi. Adapting ANN workloads to these architectures unlocks cost-effective parallel computation for research labs without GPU clusters.

// applications

Neural network training on HPC clusters with Xeon Phi nodes

Cost-effective deep learning for research institutions

Parallel ANN inference on many-core architectures

Python, C++, Chainer, Intel Xeon Phi

pyMIC-DL: Deep Learning Library for Xeon Phi

Built a Python library that enables existing deep learning frameworks to offload computation to Intel Xeon Phi coprocessors, providing a seamless bridge between high-level DL APIs and many-core hardware acceleration.

Deep learning libraries lacked native support for Intel Xeon Phi offloading. A transparent library layer lets researchers leverage many-core coprocessors without rewriting models, lowering the barrier to HPC-accelerated training.

// applications

Offloading DL training to Xeon Phi coprocessors in hybrid clusters

Accelerating scientific deep learning on non-GPU HPC hardware

Benchmarking neural network performance across compute architectures

Python, C, Intel Xeon Phi, pyMIC

 

Publications

2025

Quality-driven Inference Orchestration for Multi-modality Machine Learning Systems at the Edge

T Nguyen, AD Nguyen, L Truong

Under Submission

2025

On Optimizing Resources for Real-Time End-to-End Machine Learning in Heterogeneous Edges

MT Nguyen, HL Truong

Software: Practice and Experience 55(3), 541-558

2025

EADRAN: An Edge Marketplace for Federated Learning

TD Cao, HT Nguyen, MT Nguyen, T Truong-Huu, HL Truong

Future Generation Computer Systems, 108046

2024

Security Orchestration with Explainability for Digital Twins-based Smart Systems

MT Nguyen, AN Lam, P Nguyen, HL Truong

IEEE 48th Annual Computers, Software, and Applications Conference (COMPSAC)

2024

Novel Contract-based Runtime Explainability Framework for End-to-End Ensemble Machine Learning Serving

MT Nguyen, HL Truong, T Truong-Huu

IEEE/ACM 3rd International Conference on AI Engineering (CAIN)

2024

Optimizing Multiple Consumer-specific Objectives in End-to-End Ensemble Machine Learning Serving

MT Nguyen, HL Truong, P Arcaini, F Ishikawa

IEEE/ACM 17th International Conference on Utility and Cloud Computing (UCC)

2024

Supporting Opportunistic Data Operations for Data-intensive Computational Applications

MT Nguyen, AD Nguyen, J Rantaharju, T Puro, M Rheinhardt, et al.

IEEE International Conference on Big Data (BigData), 3735-3744

2021

QoA4ML - A Framework for Supporting Contracts in Machine Learning Services

L Truong, T Nguyen

IEEE International Conference on Web Services, 465-475

2021

Demonstration Paper: Monitoring Machine Learning Contracts with QoA4ML

MT Nguyen, HL Truong

ACM/SPEC International Conference on Performance Engineering (ICPE)

2020

Chainer-XP: A Flexible Framework for ANNs Run on the Intel Xeon Phi

TD Diep, MT Nguyen, NY Nguyen-Huynh, MT Chung, MT Nguyen, et al.

Modeling, Simulation and Optimization of Complex Processes HPSC 2018

2019

Attention-based Neural Network: A Novel Approach for Predicting the Popularity of Online Content

MT Nguyen, DH Le, T Nakajima, M Yoshimi, N Thoai

IEEE 21st International Conference on High Performance Computing and Communications (HPCC)

2019

Analyzing and Predicting the Popularity of Online Contents

MT Nguyen, T Nakajima, M Yoshimi, N Thoai

21st International Conference on Information Integration and Web Intelligence (iiWAS)

2019

Optimizing Color-based Cooperative Caching in Telco-CDNs by Using Real Datasets

ATN Tran, MT Nguyen, TD Diep, T Nakajima, N Thoai

International Conference on Ubiquitous Information Management and Communication (IMCOM)

2018

Analyzing and Visualizing Web Server Access Log File

MT Nguyen, TD Diep, T Hoang Vinh, T Nakajima, N Thoai

International Conference on Future Data and Security Engineering, 349-367

2018

A Performance Study of Color-based Cooperative Caching in Telco-CDNs by Using Real Datasets

ATN Tran, MT Nguyen, TD Diep, T Nakajima, N Thoai

9th International Symposium on Information and Communication Technology (SoICT)

2018

pyMIC-DL: A Library for Deep Learning Frameworks Run on the Intel Xeon Phi Coprocessor

ATN Tran, HP Nguyen, MT Nguyen, TD Diep, N Quang-Hung, N Thoai

IEEE 20th International Conference on High Performance Computing and Communications (HPCC)

Experience

Honours & Awards

AWS SimuLearn: Cloud Practitioner

Securing and Protecting Your Data in Amazon S3

AWS SimuLearn: Highly Available Web Applications

AWS SimuLearn: Connecting VPCs

AWS SimuLearn: Auto-Healing and Scaling Applications

AWS SimuLearn: Databases in Practice

AWS SimuLearn: Core Security Concepts

AWS Free Tier: Introduction to Monitoring Services

AWS SimuLearn: File Systems in the Cloud

AWS SimuLearn: First NoSQL Database

AWS SimuLearn: Networking Concepts

AWS SimuLearn: Computing Solutions

AWS SimuLearn: Cloud Computing Essentials

AWS SimuLearn: Cloud First Steps

Getting Started with Compute

Getting Started with Networking

Getting Started with Storage

Machine Learning Foundations

Introduction to Cloud 101

Introduction to Generative AI

 

Let's Connect