From Black Box to Blueprint:
Trustworthy AI for Materials Discovery

I am a PhD researcher at Johns Hopkins University developing frameworks that bridge large-scale molecular dynamics and first-principles simulations through machine-learned force fields. My work spans explainable AI (AAAI 2026 XAI4Science), generalization benchmarking (NeurIPS 2025 AI4Mat spotlight), and causal reasoning (submitted to KDD 2026). I have a strong foundation in both AI/ML methodology and computational materials science, with demonstrated experience translating research into industry applications.

Recent Work

RepliCan

RepliCan: Evaluating LLM Agents on Scientific Reproducibility in Computational Materials Science

Ziyang Huang, Ji Cao, Ali K. Shargh, Jing Luo, Ruidong Mei, Mohd Zaki, Zhan Liu, William Jurayj, Somdatta Goswami, Michael Shields, Jaafar El-Awady, Paulette Clancy, William Gantt Walden, Nicholas Andrews, Benjamin Van Durme, Daniel Khashabi

Submitted to COLM Conference (2026)

[PDF (Coming Soon)]

ARIA

ARIA: A Causal-Aware Framework for Rescuing LLM Reasoning in Trustworthy Materials Discovery

Yi Cao, Liaoyaqi Wang, Jieneng Chen, Benjamin Van Durme, Alan Yuille, Paulette Clancy*
Submitted to KDD Conference, AI4Sciences Track (2026)

Causal-aware KG-LLM integration framework that mitigates "contextual tunneling" and improves scientific reasoning.

[PDF (Coming Soon)]

Dual-Level Explainability

What is Your Force Field Really Learning? Gaining Scientific Intuition with a Dual-Level Explainability Framework

Yi Cao, Peter Mastracco, Jieneng Chen, Alan Yuille, Paulette Clancy*
AAAI Conference Workshop (XAI4Science), Spotlight (2026)

Dual-level explainability framework bridging model reasoning with human understanding in scientific AI.

[PDF (Coming Soon)] [Code (Coming Soon)]

NeurIPS 2025

Migration as a Probe: A Generalizable Benchmark Framework for Specialist vs. Generalist Machine-Learned Force Fields

Yi Cao and Paulette Clancy*
NeurIPS Conference Workshop (AI4Mat), Spotlight (2025)

Comprehensive benchmarking framework for evaluating MLFF generalization in materials science.

[PDF]

Atomic Switch

Atomic Switch Control via Two-Mode Intercalation for Tunable 2D Materials

Yi Cao, Victor Wu, and Paulette Clancy*
npj 2D Materials and Applications, Under Review (2025)

Investigating two-mode intercalation for tunable electronic properties in 2D materials.

[Project Website]

Self-Healing

Low-energy pathways lead to self-healing defects in CsPbBr₃

Kumar Miskin, Yi Cao, Madaline Marland, Farhan Shaikh, David T. Moore, John Marohn, Paulette Clancy*
Phys. Chem. Chem. Phys., 27(29), 15446-15459 (2025)

Computational discovery of self-healing mechanisms with implications for rational material design.

[PDF] [Post]

2025 Yi Cao. This work is licensed under CC BY-NC-SA 4.0.

Background & Expertise

Technical Portfolio

Machine Learning & AI

  • Deep Learning (PyTorch, TensorFlow)
  • Large Language Models (fine-tuning, prompt engineering)
  • Causal Inference, Graph Neural Networks
  • Transfer Learning, Explainable AI

Scientific Computing

  • Molecular Dynamics (LAMMPS, GROMACS)
  • Density Functional Theory (Quantum ESPRESSO)
  • High-Performance Computing (MPI, CUDA)
  • Materials Informatics, Scientific AI

Programming & Tools

  • Python, MATLAB, R, Git
  • Docker, Linux/Unix
  • SQL, Database Management
  • Distributed Computing, Large-scale Data Processing

Education & Research Timeline

Aug 2026 - May 2028 (Expected)
MS in Computer Science
Johns Hopkins University

Sep 2023 - May 2028 (Expected)
PhD in ChemBE
Johns Hopkins University

Nov 2023 - Present
Graduate Researcher
Clancy Lab, JHU

Jun 2024 - Jul 2024
CADD Intern
Viva Biotech

2019 - 2023
B.S. Pharmaceutical Sci.
Fudan University

Dec 2022 - Feb 2023
Quality Culture Intern
Boehringer Ingelheim

Feb 2022 - Jun 2023
Undergrad Researcher
ISTBI, Fudan University

Aug 2021 - Dec 2021
Visiting Scholar
UC Berkeley

Work Experience

Qualcomm

Engineering Intern (GPU High-Level Modeling)

Qualcomm | San Diego, CA, USA | May - Aug 2026

Developing an agentic AI workflow to automate complex debugging processes in GPU high-level modeling (HLM), integrating LLM capabilities with existing diagnostic tools to streamline operations.

Viva Biotech

Computational Drug Design Intern

Viva Biotech | Shanghai, China | Jun - Jul 2024

Conducted computer-aided drug design (CADD) research using co-solvent MD simulations, analyzing protein-ligand interactions to support drug discovery.

iGEM Competition

Scientific Advisor

Fudan iGEM Team | Shanghai, China | Dec 2022 - Nov 2023

Guided experimental design and scientific documentation. Led brainstorming sessions that resulted in a Gold Medal and the Best Environmental Project award.

Boehringer Ingelheim

Quality Culture Intern

Boehringer Ingelheim | Shanghai, China | Dec 2022 - Feb 2023

Led a team in developing a white paper on quality culture through research and interviews, resulting in improved company-wide quality guidelines.

Teaching

Science Communication & Teaching

Conference Talks, Posters, and More

Selected for the PHM Society Doctoral Symposium 2025, presented at the MRS Fall Meeting 2024, and completed the JHU Teaching Institute certification.


Awards & Honors

  • Best Poster Award - Women in Data Science and AI Symposium (2026)
  • Top 1 Poster Presentation Award - Women of Whiting STEM Symposium (2026)
  • 2025 Visionary Award - LLM Hackathon for Materials Science (ranked 6th of 120 teams)
  • Spotlight Talk (Top-tier recognition) - AAAI XAI4Science Workshop (2026)
  • Spotlight Talk & Travel Grant - NeurIPS AI4Mat Workshop (2025)
  • Poster Presentation Award Winner - Women in AI 2025
  • Doctoral Symposium Selectee - PHM Society (1 of 10 PhD students globally, 2025)
  • Empower Your Pitch Finalist - JHU (1 of 12 PhD students university-wide, 2025)
  • "Graduate Star" Nomination Award - Fudan University (2023)
  • Excellent Graduates of Shanghai Colleges (2023)
  • 1st Class Scholarship - Fudan University (2021)

Vision for AI-Accelerated Materials Discovery


What is your long-term goal?
Build a closed-loop system linking AI + simulations + experiments.

My long-term goal is to bridge the gap between computational simulations and experimental materials science, enabling a closed-loop design process. With prior wet-lab training in biomaterials, I’ve seen how tedious trial-and-error methods are.

My vision is to build systems that minimize experiments by learning from past data and simulations—so materials discovery becomes faster, deeper, and smarter.

What is your mission as a simulation researcher?
Use ML to extract maximum insight from minimal experiments.

I aim to combine simulation data with historical experimental data, using ML techniques such as active learning and transfer learning to uncover hidden patterns. This makes it possible to optimize material design with fewer experiments while learning more from each one, accelerating our understanding of atomic-level interactions and yielding better materials in fewer cycles.

How do you understand Machine Learning?
ML is not a black box—it’s a transparent, evolving partner in science.

To me, ML is not magic—it’s a dynamic tool that gains strength when guided by domain knowledge. With strong grounding in materials science, I see ML as a transparent, explainable collaborator. CPUs and GPUs are extensions of human thought. AI and humans co-evolve, inspiring each other.

As Marie Curie once said, "Nothing in life is to be feared, it is only to be understood." Through interdisciplinary research in ML and materials, I hope to help people understand—and therefore face—the world with greater confidence and curiosity.


Get In Touch

I'm always interested in discussing computational materials science, machine learning applications in materials discovery, and potential collaborations. Feel free to reach out if you'd like to connect!

  • Address

    3400 N. Charles Street
    Baltimore, MD 21218
    United States
  • Phone

    +1 (443) 278-3766
  • Email

    ycao73@jh.edu