From Black Box to Blueprint:
Trustworthy AI for Materials Discovery

I am a PhD candidate at Johns Hopkins University building autonomous reasoning frameworks for materials discovery. By bridging molecular simulation (DFT/MD) with machine learning, I aim to move beyond "black-box" models toward trustworthy AI that supports reliable, physics-informed materials design.

Recent Work

Dual-Level Explainability

What is Your Force Field Really Learning? Gaining Scientific Intuition with a Dual-Level Explainability Framework

Yi Cao, Peter Mastracco, Jieneng Chen, Alan Yuille, Paulette Clancy*
AAAI Conference Workshop (XAI4Science), Spotlight (2026)

Dual-level explainability framework bridging model reasoning with human understanding in scientific AI.

[PDF (Coming Soon)] [Code (Coming Soon)]

NeurIPS 2025

Migration as a Probe: A Generalizable Benchmark Framework for Specialist vs. Generalist Machine-Learned Force Fields

Yi Cao and Paulette Clancy*
NeurIPS Conference Workshop (AI4Mat), Spotlight (2025)

Comprehensive benchmarking framework for evaluating MLFF generalization in materials science.

[PDF]

Atomic Switch

Atomic Switch Control via Two-Mode Intercalation for Tunable 2D Materials

Yi Cao, Victor Wu, and Paulette Clancy*
npj 2D Materials and Applications, Under Review (2025)

Investigating two-mode intercalation for tunable electronic properties in 2D materials.

[Project Website]

Self-Healing

Low-energy pathways lead to self-healing defects in CsPbBr₃

Kumar Miskin, Yi Cao, Madaline Marland, Farhan Shaikh, David T. Moore, John Marohn, Paulette Clancy*
Phys. Chem. Chem. Phys., 27(29), 15446-15459 (2025)

Computational discovery of self-healing mechanisms with implications for rational material design.

[PDF] [Post]

© 2025 Yi Cao. This work is licensed under CC BY-NC-SA 4.0.

Background & Expertise

Technical Portfolio

Machine Learning & AI

  • Deep Learning (PyTorch)
  • LLMs & Prompt Engineering
  • Causal Inference & GNNs
  • Explainable AI (XAI)

Scientific Computing

  • Molecular Dynamics (LAMMPS)
  • DFT (Quantum ESPRESSO)
  • HPC (MPI, CUDA)
  • Materials Informatics

Programming & Tools

  • Python, MATLAB, R
  • Git, Docker, Linux
  • SQL, Distributed Computing

Education & Research Timeline

Sept 2023 - Present | PhD in ChemBE | Johns Hopkins University
Nov 2023 - Present | Graduate Researcher | Clancy Lab, JHU
Jun 2024 - Jul 2024 | CADD Intern | Viva Biotech
2019 - 2023 | B.S. Pharmaceutical Sci. | Fudan University
Dec 2022 - Feb 2023 | Quality Culture Intern | Boehringer Ingelheim
Feb 2022 - Jun 2023 | Undergrad Researcher | ISTBI, Fudan University
Aug 2021 - Dec 2021 | Visiting Scholar | UC Berkeley

Work Experience

Viva Biotech

Computational Drug Design Intern

Viva Biotech | Shanghai, China | Jun - Jul 2024

Conducted Computer-Aided Drug Design (CADD) research using co-solvent MD simulations, optimizing drug discovery through protein-ligand interaction analysis.

iGEM Competition

Scientific Advisor

Fudan iGEM Team | Shanghai, China | Dec 2022 - Nov 2023

Guided experimental design and scientific documentation. Led brainstorming sessions that resulted in a Gold Medal and the Best Environmental Project award.

Boehringer Ingelheim

Quality Culture Intern

Boehringer Ingelheim | Shanghai, China | Dec 2022 - Feb 2023

Led a team in developing a white paper on quality culture through research and interviews, resulting in improved company-wide quality guidelines.

Teaching

Science Communication & Teaching

Conference Talks, Posters, and More

Selected for the PHM Society Doctoral Symposium 2025, presented at the MRS Fall Meeting 2024, and completed the JHU Teaching Institute certification.

Vision for AI-Accelerated Materials Discovery

What is your long-term goal?
Build a closed-loop system linking AI + simulations + experiments.

My long-term goal is to bridge the gap between computational simulations and experimental materials science, enabling a closed-loop design process. With prior wet-lab training in biomaterials, I’ve seen how tedious trial-and-error methods are.

My vision is to build systems that minimize experiments by learning from past data and simulations, so that materials discovery becomes faster, deeper, and smarter.

What is your mission as a simulation researcher?
Use ML to extract maximum insight from minimal experiments.

I aim to merge simulation data and historical experiments using ML techniques such as active learning and transfer learning to uncover hidden patterns. This lets us optimize material design with fewer experiments while learning more from each one, accelerating our understanding of atomic-level interactions and yielding better materials in fewer cycles.

How do you understand Machine Learning?
ML is not a black box—it’s a transparent, evolving partner in science.

To me, ML is not magic—it’s a dynamic tool that gains strength when guided by domain knowledge. With strong grounding in materials science, I see ML as a transparent, explainable collaborator. CPUs and GPUs are extensions of human thought. AI and humans co-evolve, inspiring each other.

As Marie Curie once said, "Nothing in life is to be feared, it is only to be understood." Through interdisciplinary research in ML and materials, I hope to help people understand—and therefore face—the world with greater confidence and curiosity.

Get In Touch

I'm always interested in discussing computational materials science, machine learning applications in materials discovery, and potential collaborations. Feel free to reach out if you'd like to connect!

  • Address

    3400 N. Charles Street
    Baltimore, MD 21218
    United States
  • Phone

    +1 (443) 278-3766
  • Email

    ycao73@jh.edu