Jacob Schreiber

I am an Assistant Professor in the Genomics and Computational Biology Department at UMass Chan Medical School in Worcester, MA. In previous lives, I was a visiting scientist at the Institute of Molecular Pathology (IMP) in Vienna, a postdoctoral researcher at Stanford University with Dr. Anshul Kundaje, and a graduate student at the University of Washington with Dr. William Noble.

My goal is to understand the regulatory role of each nucleotide in the genome and how this role changes across all cells in our body. With infinite money and time this could be done through experiments alone; until then, my group will develop computational methods that work toward this goal. To this end, I have developed Ledidi, a method for editing biosequences to exhibit desired characteristics; Avocado, a deep tensor factorization approach for jointly modeling thousands of genome-wide regulatory experiments and imputing those that have not yet been performed; and a method that uses submodular optimization to guide future experimental efforts. Because these projects sometimes involve machine learning methods that are not mainstream, I routinely contribute to the Python open source community with packages that implement general-purpose versions of the algorithms I apply to genomics. I am the core developer of pomegranate, a package for flexible probabilistic modeling, and apricot, a package for submodular optimization, and I was previously a core developer of scikit-learn.

In addition to my research, I am an editor at the Stanford AI Lab Blog and at the Journal of Open Source Software, serve on the editorial board of reviewers for the Journal of Machine Learning Research, and occasionally co-host episodes of The Bioinformatics Chat podcast. When I don't get much done in a week, I pretend these are the reasons why.

Research Software: tangermeme, tfmodisco-lite, bpnet-lite, yuzu, Ledidi, Avocado, Rambutan, PyPore

General Software: apricot, pomegranate, scikit-learn

Selected Publications

Accelerating in-silico saturation mutagenesis using compressed sensing, Submitted to RECOMB 2021, 2021
J. Schreiber*, S. Nair, A. Balsubramani, and A. Kundaje
[paper] [tweetorial]

Navigating the pitfalls of applying machine learning in genomics, Nature Reviews Genetics, 2021
S. Whalen*, J. Schreiber*, W.S. Noble, and K.S. Pollard (*co-first author)
[paper] [tweetorial]

Machine learning for profile prediction in genomics, Current Opinion in Chemical Biology, 2021
J. Schreiber and R. Singh
[paper] [tweetorial]

Ledidi: Designing genomic edits that induce functional activity, ICML Workshop on Computational Biology, 2020
J. Schreiber, Y. Y. Lu, and W.S. Noble
[paper] [poster] [tweetorial]

Zero-shot imputations across species are enabled through joint modeling of human and mouse epigenomics, Proceedings of the 11th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics, 2020 (Best Paper)
J. Schreiber, D. Hegde, and W.S. Noble
[paper] [tweetorial]

A pitfall for machine learning methods aiming to predict across cell types, Genome Biology, 2020
J. Schreiber, R. Singh, J. Bilmes, and W.S. Noble
[paper] [tweetorial]

Prioritizing transcriptomic and epigenomic experiments by using an optimization strategy that leverages imputed data, Bioinformatics, 2020
J. Schreiber, J. Bilmes, and W.S. Noble
[paper] [code] [tweetorial]

apricot: Submodular selection for data summarization in Python, Journal of Machine Learning Research, 2020
J. Schreiber, J. Bilmes, and W.S. Noble
[paper] [tweetorial]

Completing the ENCODE3 compendium yields accurate imputations across a variety of assays and human biosamples, Genome Biology, 2020
J. Schreiber, J. Bilmes, and W.S. Noble
[paper] [code]

Multi-scale Deep Tensor Factorization Learns a Latent Representation of the Human Epigenome, Genome Biology, 2020
J. Schreiber, T. Durham, J. Bilmes, and W.S. Noble
[paper] [code] [talk] [slides] [tutorial]

A Genome-wide Framework for Mapping Gene Regulation via Cellular Genetic Screens, Cell, 2019
M. Gasperini, A.J. Hill, J.L. McFaline-Figueroa, B. Martin, S. Kim, M.D. Zhang, D. Jackson, A. Leith, J. Schreiber, W.S. Noble, C. Trapnell, N. Ahituv, and J. Shendure
[paper]

pomegranate: Fast and Flexible Probabilistic Modeling in Python, Journal of Machine Learning Research, 2017
J. Schreiber
[paper] [code] [talk] [slides] [tutorial]

Finding the optimal Bayesian network given a constraint graph, PeerJ Computer Science, 2017
J. Schreiber and W.S. Noble
[paper] [code] [tutorial]

Analysis of Nanopore Data Using Hidden Markov Models, Bioinformatics, 2015
J. Schreiber and K. Karplus
[paper] [code] [tutorial]

Discrimination among Protein Variants Using an Unfoldase-Coupled Nanopore, ACS Nano, 2014
J. Nivala, L. Mulroney, G. Li, J. Schreiber, and M. Akeson
[paper]

Error rates for nanopore discrimination among cytosine, methylcytosine, and hydroxymethylcytosine along individual DNA strands, Proceedings of the National Academy of Sciences, 2013
J. Schreiber, Z. L. Wescoe, R. Abu-shumays, J. T. Vivian, B. Baatar, K. Karplus, and M. Akeson
[paper]
