Harry Mayne

PhD researcher at the University of Oxford

I'm currently researching language model interpretability.


Research

I'm a PhD researcher working on language model interpretability and representation engineering. At the moment I'm interested in methods for extracting representations of safety-related concepts, and in how we can use those representations to control model behaviour. More generally, I believe interpretability has a wide range of applications, from helping to understand and fix known limitations of models to improving explainability and safety.

Outside of my main PhD research, I'm also interested in understanding the limitations of alignment algorithms and designing evals for advanced AI. I was recently part of the team that created the LINGOLY reasoning benchmark, which we will be presenting as an oral presentation at NeurIPS in Vancouver.

I work in the Oxford Internet Institute's language modelling group, supervised by Dr Adam Mahdi, and am also a member of OxNLP.

Publications
I'm now in the second year of my PhD. I've had a slightly unusual path to get here, having originally studied economics!
Education
Oxford Internet Institute, University of Oxford
DPhil Social Data Science
Researching language model interpretability
2023 - 2026
Oxford Internet Institute, University of Oxford
MSc Social Data Science
Distinction, 77%
Oxford Internet Institute Thesis Prize for best dissertation (88%)
2022 - 2023
Selwyn College, University of Cambridge
BA Economics
Double First Class, top 10% of cohort
Awarded the Patrick Cross Prize for exceptional performance in the Economics Tripos
2019 - 2022
Grants
Grand Union DPT, Economic and Social Research Council
Full PhD Scholarship
2022 - 2026

Blog

Coming soon...

Contact

If you would like to discuss collaborations, talks or teaching then please get in touch. I'm very open to discussing research ideas! You can also contact me through X or LinkedIn.