Zehao is an incoming third year undergraduate student pursuing a computer science specialist, statistics major, and math minor. Born and raised in Beijing, he moved to Richmond, British Columbia in high school before joining UofT in 2023. Zehao is interested in studying machine learning techniques and how they enable discoveries from large amounts of data. Among other interests, he enjoys exploring the backcountry, often by bicycle, and reading. He is very fond of Emily Carr’s artwork, maintains an absurdly obsessive relationship with oranges, and cares quite dearly about environmental causes.
What made you decide to participate in SURP?
Last summer, I worked on a project under the supervision of Alicia Savelli and Prof. Speagle, studying the star formation histories of galaxies in cosmological simulations. It was a great learning experience, and it introduced me to the fields of statistics and machine learning. It is quite interesting to me how these computational methods are able to derive meaningful insights from such a large corpus of data, which is especially powerful in a physical science like astronomy, a field intrinsically driven by enormous (astronomical!) amounts of data. Over the school year, I further pursued this interest, working part time under the supervision of Prof. Keith Vanderlinde to study radio frequency interference (RFI) with data collected at the Algonquin Radio Observatory. I thoroughly enjoyed the general environment fostered at the Dunlap Institute, which has allowed me to expand my academic horizons and hone my professional skills in areas such as scientific communication. When I learned that I would have the opportunity to continue this fascinating research through SURP, I applied without hesitation.
What is your favourite thing about SURP?
It certainly is hard to narrow it down to one thing! Besides the great opportunities for learning and growth, my favourite thing about SURP is the community. Immersed in a group of people who have similar academic interests and life goals, I made many great friends through the program – it is just easy for us to get along! Additionally, my supervisors, Biprateep and Josh, regularly go above and beyond to support and encourage my learning. In general, my experiences in this regard have been lovely.
Can you tell us about your research project?
I am working towards building an astrophysical foundation model. This summer, we begin the journey with optical spectra. Due to the expansion of the universe, when light is emitted from distant objects, its spectrum is shifted towards longer wavelengths by the time it is detected. With surveys like the Dark Energy Spectroscopic Instrument (DESI), which observe over a fixed wavelength range, objects at different redshifts therefore offer observation windows into different parts of the rest-frame spectrum. We can imagine that similar objects emit similarly, so these objects at different redshifts present realizations of a latent "canonical" spectrum. By ingesting many observational samples, we can learn a complete distribution of spectral features over a broad region in the rest frame. In other words, instead of discarding spectral data due to redshift cuts, we now use the variability to our advantage – objects at different redshifts offer pieces which we may pool together to complete a full puzzle.
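The key relation here is that a fixed observed window maps to a different rest-frame window for each redshift, via lambda_rest = lambda_obs / (1 + z). A minimal sketch of that bookkeeping is below; the specific observed-wavelength numbers are illustrative assumptions, not the survey's exact instrument range.

```python
def rest_frame_window(obs_min, obs_max, z):
    """Map a fixed observed-wavelength window (Angstroms) to the rest
    frame of an object at redshift z: lambda_rest = lambda_obs / (1 + z)."""
    return obs_min / (1 + z), obs_max / (1 + z)

# Illustrative observed window (assumed numbers, roughly optical).
obs_min, obs_max = 3600.0, 9800.0

for z in (0.0, 0.5, 1.0):
    lo, hi = rest_frame_window(obs_min, obs_max, z)
    print(f"z = {z}: rest-frame coverage {lo:.0f}-{hi:.0f} A")
```

Pooling the three objects above covers roughly 1800–9800 Angstroms in the rest frame, wider than any single object's window – the "puzzle pieces" intuition in numbers.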
The model that I am developing is an implementation of the masked autoencoder (MAE), which is fundamentally a neural network that hides random parts of its input and trains itself to reconstruct the full spectrum. Heuristically, this masking regime is quite similar to the limits that instrumentation windows impose on spectroscopic observations, where the unobserved portions can simply be seen as “masked”. By training the model to generate over masked portions, it can learn to infill and extend spectra beyond their observed wavelengths in an astrophysically realistic manner.
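The masking-and-reconstruction objective can be sketched in a few lines. This is a toy illustration of the MAE training regime only, not the actual model: the patch size, mask fraction, and zero-valued "mask token" are all assumptions for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

def mask_patches(spectrum, patch_size, mask_frac, rng):
    """Hide a random fraction of fixed-size patches, as a masked
    autoencoder does: return the masked spectrum and a boolean mask
    marking the hidden positions."""
    n_patches = len(spectrum) // patch_size
    n_masked = int(round(mask_frac * n_patches))
    chosen = rng.choice(n_patches, size=n_masked, replace=False)
    mask = np.zeros(len(spectrum), dtype=bool)
    for p in chosen:
        mask[p * patch_size:(p + 1) * patch_size] = True
    masked = spectrum.copy()
    masked[mask] = 0.0  # a simple stand-in for a learned mask token
    return masked, mask

def masked_reconstruction_loss(pred, target, mask):
    """MAE-style objective: mean squared error computed only over the
    positions that were hidden from the model."""
    return float(np.mean((pred[mask] - target[mask]) ** 2))

# Toy "spectrum": a smooth continuum plus one emission line.
wave = np.linspace(0.0, 1.0, 128)
spectrum = 1.0 + 0.5 * np.exp(-((wave - 0.4) / 0.02) ** 2)

masked, mask = mask_patches(spectrum, patch_size=8, mask_frac=0.5, rng=rng)
# A trained decoder would predict the hidden values; perfect
# reconstruction drives the masked loss to zero.
print(masked_reconstruction_loss(spectrum, spectrum, mask))  # 0.0
```

An unobserved instrumental window would simply enter as extra `True` entries in `mask`, which is why the two situations look identical to the model.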
Specifically, we investigate changing the masking strategy throughout training so that the model learns structures at different scales. To do this, we implement a "multi-head patching" variation of the formidable transformer architecture. If the transformer sounds familiar, that is because ChatGPT is a Generative Pre-trained Transformer! Mechanically, the model we are developing is quite similar to LLMs.
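One way to see why patch size matters: before a transformer can attend over a spectrum, the spectrum is chopped into patches that become input tokens, and the patch size sets the scale of structure each token can resolve. The sketch below shows only this tokenization step; the actual multi-head patching design is more involved, and the patch sizes here are arbitrary choices for illustration.

```python
import numpy as np

def patchify(spectrum, patch_size):
    """Split a 1-D spectrum into non-overlapping patches (tokens),
    as a transformer's input embedding stage would."""
    n = len(spectrum) // patch_size
    return spectrum[:n * patch_size].reshape(n, patch_size)

spectrum = np.linspace(0.0, 1.0, 96)

# One "head" per patch size: coarse patches see broad continuum shape,
# fine patches resolve narrow features such as emission lines.
for patch_size in (4, 16):
    tokens = patchify(spectrum, patch_size)
    print(patch_size, tokens.shape)
```

Varying which patch scale is masked over the course of training is one way to push the model to learn both fine and coarse spectral structure.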
Ultimately, we hope to add in other modalities of data such as images, resulting in a multi-modal, multi-task model that develops a “foundational” understanding of galaxies and astrophysical processes in general. Such a model will have a plethora of downstream scientific applications. It is a fascinating project, and I am very excited to see where it can go. If you are curious, please reach out!
Can you explain how SURP has been different from your undergrad work?
As I am working on a project that is of strong personal interest, my work is endowed with a clear purpose and motivation, which is sometimes lacking in regular coursework. For that reason, I find it easier to stay consistently engaged, and the learning processes that emerge are always intellectually stimulating. Additionally, as SURP supports me to work on my project full time, I can commit in a more involved fashion, with more autonomy over the direction and shape of my project. Overall, I think that SURP is an incredible opportunity to supplement my undergrad studies and grow my career as a researcher.
What are your plans for the future?
In the short term, after completing my bachelor’s degree, I plan to follow my passion for statistics and machine learning, and eventually obtain a PhD in the field. In the long run, I envisage myself applying my expertise to benefit the world and help as many people as possible. In addition to making discoveries to expand scientific frontiers like space exploration, I want to tackle important issues such as technological governance, climate change, and human rights—particularly in a world that will only become more profoundly transformed by artificial intelligence.
Tell us something fun about yourself unrelated to SURP!
- I am a huge fan of science fiction written by Cixin Liu (劉慈欣). His early works, which often centred on physicists or computer scientists at work, inspired me in primary school to become a scientist.
- All of my plants have names.
- I have been learning to swing dance!
The Dunlap Institute for Astronomy and Astrophysics at the University of Toronto is an endowed research institute with over 80 faculty, postdocs, students, and staff, dedicated to innovative technology, ground-breaking research, world-class training, and public engagement.
The research themes of its faculty and Dunlap Fellows span the Universe and include: optical, infrared and radio instrumentation, Dark Energy, large-scale structure, the Cosmic Microwave Background, the interstellar medium, galaxy evolution, cosmic magnetism, and time-domain science.
The Dunlap Institute, the David A. Dunlap Department of Astronomy and Astrophysics, and other researchers across the University of Toronto’s three campuses together comprise the leading concentration of astronomers in Canada, at the leading research university in the country.
The Dunlap Institute is committed to making its science, training, and public outreach activities productive and enjoyable for everyone of all backgrounds and identities.