Muyu He
  • Grad student in Penn Engineering, focusing on machine learning.
  • Philadelphia, Pennsylvania
  • Email

About Me

I’m a computer science grad student at the University of Pennsylvania's School of Engineering. I am currently working as a machine learning engineer intern at SenseTime, developing the next generation of multi-modal smartphone assistants.

Before Penn, I graduated summa cum laude in philosophy from the University of California, Los Angeles, where I was ranked among the school's top three philosophy undergraduates in 2022.

My main interests are deep learning and artificial intelligence, particularly visual question answering with multi-modal large language models and image generation with diffusion models.

In my free time, I run two accounts on RED, China's biggest social media platform: one as a philosophy educator (25k followers) and one as a musician (8k).

Work Experience


SenseTime

Apr 2024 - Present

Machine Learning Engineer Intern | Research and development of a GUI agent
  • Fine-tuned the LLM base of two multi-modal large language models (7B and 1B parameters) with 60k data samples on 32 V100 GPUs for phone-screen navigation tasks, achieving 75% accuracy in multi-step prediction of music-app actions
  • Built an automatic data-generation pipeline that created 2k UI screen samples for fine-tuning, leveraging OpenCV to synthesize Android screens and in-app designs, 20x faster than traditional web scraping and manual labeling
  • Iterated the fine-tuning procedure from generating bare action predictions to also producing Chain-of-Thought data including observations, action logic, and predicted results, validating the idea against 5+ similar pipelines such as ScreenAgent and MobileAgent
  • Took full ownership of the automatic generation of Chain-of-Thought training data for a 20B MLLM, running GPT-4V across 20+ parallel subprocesses to generate 10k instances (see the sketch after this list) and improving model accuracy on multiple benchmarks by 15%
  • Implemented another data pipeline that randomly combined 15k authentic screenshots into 3k simulated screen-operation samples, using UI Automator to navigate Android apps and GPT-4V to generate the corresponding metadata
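
For illustration, here is a minimal sketch of the parallel Chain-of-Thought generation step referenced above, assuming the OpenAI Python client; the prompt, file paths, model name, and output format are hypothetical stand-ins for the internal pipeline.

    import base64
    import json
    from multiprocessing import Pool
    from pathlib import Path

    from openai import OpenAI

    PROMPT = ("Given this phone screenshot and the action taken on it, write an "
              "observation, the reasoning behind the action, and the predicted result.")

    def annotate(screenshot_path: str) -> dict:
        client = OpenAI()  # create a client inside each worker
        image_b64 = base64.b64encode(Path(screenshot_path).read_bytes()).decode()
        response = client.chat.completions.create(
            model="gpt-4-vision-preview",  # assumed model name
            messages=[{"role": "user", "content": [
                {"type": "text", "text": PROMPT},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ]}],
        )
        return {"image": screenshot_path,
                "chain_of_thought": response.choices[0].message.content}

    if __name__ == "__main__":
        screenshots = sorted(str(p) for p in Path("screens").glob("*.png"))
        with Pool(processes=20) as pool:  # 20+ parallel workers, as above
            records = pool.map(annotate, screenshots)
        Path("cot_data.jsonl").write_text(
            "\n".join(json.dumps(r, ensure_ascii=False) for r in records))
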
Cognitive Computation Group @ UPenn

July 2023 - Mar 2024

Research Assistant | Research on Text-to-Image Generation
  • Led a team of three to compile the first benchmark testing diffusion models' ability to generate images grounded in common sense
  • Designed a dataset of 200 image-prompt pairs across 6 common-sense categories, followed by an automatic evaluation that uses CLIP to compute cosine similarity between generated images and ground truth (see the sketch after this list)
  • Constructed an automatic data pipeline that few-shot prompts GPT-4 to generate 200+ prompts per batch, serializes and transfers the prompts to remote GPU pools as model inputs, and generates and evaluates 4 variations of each prompt in parallel
  • Devised 2 automatic scoring metrics, evaluated on 4 SOTA models, that diverge from human evaluations by only 3%, revealing major shortcomings in the latest DALL-E and Stable Diffusion models, including a lack of basic physical and spatial common sense
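
As an illustration, here is a minimal sketch of the CLIP scoring step, assuming Hugging Face's transformers library; the checkpoint and file names are assumptions, not the benchmark's exact setup.

    import torch
    from PIL import Image
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

    def image_similarity(generated_path: str, ground_truth_path: str) -> float:
        """Cosine similarity between the CLIP embeddings of two images."""
        images = [Image.open(generated_path), Image.open(ground_truth_path)]
        inputs = processor(images=images, return_tensors="pt")
        with torch.no_grad():
            embeds = model.get_image_features(**inputs)
        embeds = embeds / embeds.norm(dim=-1, keepdim=True)  # L2-normalize
        return float(embeds[0] @ embeds[1])

    print(image_similarity("generated.png", "ground_truth.png"))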

Projects


Mindscape
  • Deployed a philosophy chatbot platform with Next.js and Vercel, enabling users to create and chat with AI philosophers by storing the chatbot profiles in a SQL database through Prisma and routing users' queries to the GPT-4 API and a vector database
  • Created a rich external knowledge base by recursively scraping 500+ web pages from the Stanford Encyclopedia of Philosophy with requests and BeautifulSoup, splitting the text into 46,000+ chunks and upserting them into a Pinecone vector database
  • Improved the accuracy of chatbots' responses by using similarity search to return the top 3 text chunks most relevant to each query in under a second and embedding them in GPT prompts (see the sketch after this list), surpassing native GPT in user satisfaction as reported by 10+ users
  • Gave chatbots long-term memory of user conversations by storing all chat messages in Redis and retrieving the 30+ most recent messages in real time, preventing the memory loss that recurrent, stateless API calls would otherwise cause
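
A minimal sketch of that retrieval step, assuming the Pinecone and OpenAI Python clients; the index name, embedding model, and metadata schema are illustrative assumptions rather than the deployed setup.

    from openai import OpenAI
    from pinecone import Pinecone

    openai_client = OpenAI()
    index = Pinecone(api_key="YOUR_PINECONE_KEY").Index("sep-philosophy")  # assumed index name

    def retrieve_context(query: str, top_k: int = 3) -> str:
        """Embed the query and return the top-k most similar text chunks."""
        embedding = openai_client.embeddings.create(
            model="text-embedding-ada-002", input=query
        ).data[0].embedding
        results = index.query(vector=embedding, top_k=top_k, include_metadata=True)
        return "\n\n".join(m["metadata"]["text"] for m in results["matches"])

    question = "What does Kant mean by the categorical imperative?"
    prompt = (f"Answer using this context:\n{retrieve_context(question)}"
              f"\n\nQuestion: {question}")
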
Tiny Transformer
  • Created a decoder-only transformer with 16 million parameters capable of generating text in a given author's writing style
  • Preprocessed the complete corpus of Shakespeare, 1.2 million text tokens, chunking the text into sequences of 32 character-level tokens and transforming them into embeddings of size 256 to serve as input to the transformer
  • Manually implemented 6 decoder blocks of multi-head attention and feed-forward layers so the model can learn semantic relations between tokens, adding residual connections and pre-layer normalization to prevent vanishing gradients (see the sketch after this list)
  • Achieved a cross-entropy loss of 0.9 on the training set and 1.5 on the validation set after 4k iterations of mini-batch gradient descent with the Adam optimizer, approaching the authentic writing style of the target author
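
A minimal sketch of one such pre-layer-norm decoder block, assuming PyTorch; the embedding size and sequence length follow the bullets above, while the head count and feed-forward expansion factor are illustrative.

    import torch
    import torch.nn as nn

    class DecoderBlock(nn.Module):
        def __init__(self, embed_dim: int = 256, num_heads: int = 8):
            super().__init__()
            self.ln1 = nn.LayerNorm(embed_dim)
            self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
            self.ln2 = nn.LayerNorm(embed_dim)
            self.ffwd = nn.Sequential(
                nn.Linear(embed_dim, 4 * embed_dim),
                nn.ReLU(),
                nn.Linear(4 * embed_dim, embed_dim),
            )

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # Causal mask: each position may attend only to earlier positions
            T = x.size(1)
            mask = torch.triu(torch.ones(T, T, dtype=torch.bool, device=x.device), 1)
            h = self.ln1(x)                        # pre-layer normalization
            attn_out, _ = self.attn(h, h, h, attn_mask=mask)
            x = x + attn_out                       # residual connection
            x = x + self.ffwd(self.ln2(x))         # residual connection
            return x

    x = torch.randn(4, 32, 256)  # a batch of 4 sequences of 32 tokens
    blocks = nn.Sequential(*[DecoderBlock() for _ in range(6)])
    print(blocks(x).shape)  # torch.Size([4, 32, 256])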

Extracurricular Activities


RED | Philosophy content creator
  • Created popular philosophy content for the general public, gaining 25,000 followers within 5 months and surpassing 93% of channels in growth rate
  • Produced animations, articles, and cartoons every two days, averaging over 1,000 interactions per post across social media