Computing Systems for Data-Driven Science
Foundations of Data Science
Dr. Knowles studied Natural Sciences and Information Engineering at the University of Cambridge before obtaining an MSc in Bioinformatics and Systems Biology at Imperial College London. During his PhD studies in the Cambridge University Engineering Department Machine Learning Group under Zoubin Ghahramani, he worked on Bayesian nonparametric models for factor analysis, hierarchical clustering and network analysis, as well as on (stochastic) variational inference. He was a post-doctoral researcher at Stanford University with Sylvia Plevritis (Center for Computational Systems Biology/Radiology) and Jonathan Pritchard (Genetics/Biology), having previously worked with Daphne Koller (Computer Science). He is a Core Faculty Member at the New York Genome Center, an Assistant Professor of Computer Science, and an Interdisciplinary Appointee in Systems Biology at Columbia University. His lab develops and applies statistical machine learning methods for computational genomics.
Advances in DNA and RNA sequencing technology are driving exponential growth in the volume of genomic data generated in both basic and translational biomedical research. Meanwhile, practices for managing and analysing such datasets are suboptimal. Analysis pipelines are file-based, with metadata often encoded in file names, which is both error-prone and makes application to new datasets cumbersome. Thus there is a substantial opportunity in genomics to bring modern data management software and hardware tools to bear.