Article summary I
Summary
David Donoho’s “50 Years of Data Science” offers a thoughtful retrospective on data science and its relationship to statistics. He draws on landmark papers by four statistical heroes: John Tukey, “The Future of Data Analysis” (1962); John Chambers, “Greater or Lesser Statistics: A Choice for Future Research” (1993); Leo Breiman, “Statistical Modeling: The Two Cultures” (2001); and William S. Cleveland, “Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics” (2001). He reviews the evolution of data science as a “science of learning from data” and proposes a fuller version of the field, called “Greater Data Science” (GDS), with six components: data gathering, preparation, and exploration; data representation and transformation; computing; modeling; visualization and presentation; and science about data science. He observes that modeling dominates how data science is represented in today’s academic departments, then offers insights on how to teach GDS effectively and presents some examples of scientific research in data science. He concludes that the next 50 years of data science should be shaped by three themes: open science takes over, science as data, and scientific data analysis, tested empirically.
Reaction
Data science is a fast-evolving field, and there is great demand for it in the real world. The debate about statistics vs. data science has been raging in academic statistics departments over the past couple of years, and opinions about what data science is differ widely: some say data science is just a “rebranding” of statistics, others that statistics is the least important part of data science, and still others that data science is statistics. I agree with Dr. Donoho’s definition of data science. Data science is not just a rebranding of statistics; it is larger than that, including, for example, GDS6: Science about Data Science. As far as I’m concerned, theory is important, but many other issues need to be considered when solving practical problems. I am optimistic that GDS will become a reality soon. But I’m not sure how to learn these skills, or which courses best prepare students to become great data scientists. Donoho’s paper proposes some ideas, but I don’t think they suit everyone. Also, it seems that the major contributions to data science come from people who work in industry or have industry experience, e.g. Wickham and Xie. But I think more people from academia will contribute to data science, since there are now many new data science programs.
Questions
- How should effort be allocated across the components of GDS?
- Does industry have a greater demand for data scientists than academia? What is the difference between data scientists in academia and those in industry?
- What is the long-term direction of data science?