CV

Feel free to reach out if you have any questions!

Basics

Name Siddharth Parekh
Email spparekh@andrew.cmu.edu
Phone (332) 248-8513
Location Pittsburgh, PA

Education

  • 2021.08 - 2025.05

    Pittsburgh, PA

    B.S. in Computer Science
    Carnegie Mellon University, Pittsburgh, PA
    SCS Concentration in Machine Learning
    Machine Learning and AI:
    • 10-315 Introduction to Machine Learning
    • 11-411 Natural Language Processing
    • 11-485 Introduction to Deep Learning
    • 10-422 Foundations of Learning, Game Theory, and Their Connections
    • 10-708 Probabilistic Graphical Models *
    • 10-720 Convex Optimization *
    Computer Science:
    • 15-213 Introduction to Computer Systems
    • 15-210 Parallel and Sequential Algorithms and Data Structures
    • 15-251 Great Theoretical Ideas in Computer Science
    • 15-451 Algorithm Design and Analysis
    • 15-445 Database Systems
    Mathematics:
    • 15-151 Mathematical Foundations of Computer Science
    • 21-241 Matrices and Linear Transformations
    • 21-259 Calculus in Three Dimensions
    • 36-218 Probability Theory for Computer Scientists
    Computational Finance:
    • 21-270 Introduction to Mathematical Finance
    • 21-378 Mathematics of Fixed Income Markets

    * - graduate course

Work

  • 2022.06 - 2022.08

    Mumbai, India

    AI Intern - NLP
    MikoAI
    Streamlined the multilingual personality module of the Miko robot, enhancing its language processing capabilities across 8 languages.
    • Benchmarked open-source machine translation models to optimize cost-efficiency without compromising performance.
    • Developed a neural classifier for question answering, achieving linear speedup over traditional vector search.

Awards

Publications

  • 2024
    AliGATr: Graph-based layout generation for form understanding
    EMNLP 2024 Findings
    Forms constitute a large portion of layout-rich documents that convey information through key-value pairs. In this paper, we present AliGATr, a graph-based model that uses a generative objective to represent the complex grid-like layouts often found in forms. Using a grid-based graph topology, our model learns to generate the layout of each page token by token in a data-efficient manner, performing on par with state-of-the-art models.

Skills

Programming Languages
C/C++
Python
Java
JavaScript
Julia
OCaml
Machine Learning
PyTorch
TensorFlow
Transformers
scikit-learn
NumPy
Pandas
Cloud
AWS
Google Cloud Platform
Miscellaneous
Git
Docker

Languages

English
Native
Gujarati
Native
Hindi
Fluent
Spanish
Beginner