Researchers across disciplines use statistics to measure, to model, and to mine for insights. But the field of statistics has been changed by the advent of huge data sets, and by new technologies such as machine learning and deep neural networks.
MIT’s recent Statistics and Data Science Conference, or SDSCon, celebrated those changes — and the new opportunities and breakthroughs they make possible.
“The foundations of 21st century statistics include not only classic statistics and probability, but also computation and data analysis,” explained Devavrat Shah, professor in the Department of Electrical Engineering and Computer Science (EECS). Shah is also director of the MIT Statistics and Data Science Center (SDSC), which hosted the event and is itself a part of the MIT Institute for Data, Systems, and Society (IDSS).
The annual conference was held on campus on April 20 and attracted almost 200 participants from diverse fields in academia and industry, including science, engineering, economics, and mathematics. Talk topics crossed disciplines as well, ranging from genomics to economics, from astronomy and artificial intelligence to social media. Videos of all talks are available online.
To kick off the event, Shah announced two new academic programs for SDSC: an interdisciplinary PhD in statistics, and an online MicroMasters in statistics and data science. Michael Sipser, dean of MIT’s School of Science, gave opening remarks reflecting on the interdisciplinary nature of statistics at MIT.
“Statistics is a mathematical science, but it has somewhat different goals and a different culture than does mathematics itself,” Sipser said. “Now, with the Statistics and Data Science Center [program], we bring together statisticians and data scientists from across disciplines. That broad-based approach has been key to its success.”
SDSCon’s first session demonstrated that range of disciplines within MIT. First, Leonid Mirny, a physics professor with the MIT Institute for Medical Engineering and Science, discussed his research on understanding the human genome as a molecule folded in 3-D. Mirny was followed by Joshua Tenenbaum of the Department of Brain and Cognitive Sciences, whose artificial intelligence research seeks to understand and replicate how the human mind processes data and forms concepts. Last, Piotr Indyk of EECS explored the complexity of big data problems from the perspective of an algorithm designer.
“We live in interesting times,” Indyk said. “The amount of data that is being produced grows pretty quickly. This means the input to our problems is growing.”
Three plenary talks from academics outside MIT showed applications in the social sciences. First, Sendhil Mullainathan of Harvard University’s Department of Economics talked about the role machine learning can play in assessing and minimizing human bias and error. A chief example: physicians balancing care with cost when deciding whether to order expensive tests for patients whose symptoms may indicate a heart attack.
Next, Kathleen McKeown, a computer science professor at Columbia University, explored ways in which salient information can be extracted from large, dynamic data sources such as Twitter feeds. One potential application of her research: using the tweets of gang members to predict violence in communities.
Lastly, Stanford University engineer Stephen Boyd addressed applications of statistics to finance, where trades can be modeled and simulated in an attempt to forecast risks and returns.
In a panel on data visualization, speakers explored the opportunities and challenges of representing information, from new ways to map the stars to journalists telling a clear story with numbers. Matthew Kay, assistant professor at the University of Michigan’s School of Information, emphasized that data visualizations should clearly represent uncertainty, citing the confusion many felt around models predicting the outcome of the 2016 U.S. presidential election.
SDSCon also included an industry session with representatives from Facebook and Google Brain, which likewise highlighted the interdisciplinary nature of statistics and data science. “Figuring out how we bring people together from different disciplines to collaborate around visualization is critical,” said Martin Wattenberg, senior staff research scientist at Google.
Shah underscored that SDSC has representation “across the Institute, and across the academic units.”
“That is the core of how we believe statistics and data science should be done, both at MIT and beyond,” he said. “It is fundamentally interdisciplinary.”