Arshi Arora

Research Biostatistician

Memorial Sloan Kettering Cancer Center

About me

I am a thoughtful genomics data scientist and statistician with strong programming skills in R and Python and formal training in both Computational Biology and Biostatistics. Currently, I work as a Principal Scientist at Incyte, deploying ML models to mine for novel drug targets and making my work more accessible through data visualization and Shiny and FlexDashboard apps. Previously, I worked at Memorial Sloan Kettering Cancer Center, where I focused on methodological work in cancer genomics.

The right hemisphere of my brain is into ceramics, painting, DIY crafts and biking. I am a minimalist on a personal mission to reduce what goes into my trash bin. I also co-host a podcast on Computational Biology called Computationally Yours!


  • MS Biostatistics, 2017

    Columbia University

  • MS Computational Biology, 2010

    Carnegie Mellon University

  • B.Tech Biotechnology, 2008

    Amity University





Potter (wizarding and muggle)






Package to wrangle and visualize genomic data in R

iCluster and TCGA

Integrative clustering of TCGA datasets


Visualization tool for clustered groups


An outcome weighted supervised clustering algorithm

Recent & Upcoming Talks

AMSTAT feature, 2021
panelmap at WSDS, 2020
BIRSBIO 2020 Hackathon
ISMCO (2019)

Recent Posts

A brief primer on scientific and mathematical notations

As I finished writing the final draft of my first first-author paper, survClust, there were a lot of other firsts! In my opinion, writing the methods and a crisp conclusion and discussion were the difficult parts.

Academic Hugo Theme via Blogdown: Few more details and deployment (part 2)

This continues an earlier post I wrote, Academic Hugo Theme via Blogdown: Where to start? After setting up a basic website with About, Skills and Experience pages.

Academic Hugo Theme via Blogdown: Where to start?

Setting up a personal website is fun and a great way to gain visibility. Whether it's your work, skills, or other hobbies, they can all see the light of day on one platform!

Journey so far


Principal Investigator


Feb 2022 – Present Wilmington, DE
Responsibilities include: My role at Incyte is highly cross-functional: I work with scientists from Discovery, Pharmacology and Biology to power their biological hypotheses with data. This has exposed me to varied viewpoints, sometimes on the same problem, and to identifying data-supported answers to key questions in drug discovery and translation. I also lead Target Identification and Validation efforts using machine learning, including deep learning models.

Research Biostatistician

Memorial Sloan Kettering Cancer Center

May 2012 – Feb 2022 New York
Responsibilities include:

  • Developed survClust, a semi-supervised classification algorithm that stratifies patients into cohorts driven by their genetic background and survival.
  • survClust was then applied to a pan-cancer cohort of patients treated with immune checkpoint blockade therapies to stratify patients and identify those with the worst prognosis.
  • Served as lead genomics analyst for the International consortium of Melanoma (InterMEL) and built a framework for identifying false positives from a tumor-only somatic mutation calling pipeline (Glitter).
  • Performed integrated analysis of several cancer types as part of The Cancer Genome Atlas (TCGA) consortium, including Liver Hepatocellular Carcinoma (LIHC), Prostate Adenocarcinoma (PRAD) and Skin Cutaneous Melanoma (SKCM), using the joint latent variable model implemented in iCluster to arrive at molecularly distinct subtypes.
  • Provided genomics and analytical support to faculty members of the Epidemiology and Biostatistics Department at Memorial Sloan Kettering Cancer Center on a broad range of analyses, including copy number and clonal evolution, mutational signature analysis, and statistical models that identify prognostic molecular features in exome sequencing and mutation panel testing datasets.
  • Investigated etiological tumor heterogeneity across molecular assays such as gene expression, mutation, copy number, and epigenetic data, using known clinical risk factors to characterize distinct risk groups.
  • Developed a validated prognostic gene risk score for patients with colorectal cancer liver metastases.