Data Science – then and now!

  • Data Science = Statistics + Computer Science
  • emerges as a designation for stores of big data

The following timeline traces the evolution of the term “Data Science”, along with its use, attempts to define it, and related terms:


“The Future of Data Analysis” – by John W. Tukey, 1962

  • More emphasis was placed on using data to suggest hypotheses to test
  • Exploratory Data Analysis and Confirmatory Data Analysis work in parallel

“Concise Survey of Computer Methods”, a book surveying contemporary data processing methods – by Peter Naur, 1974

  • Data is a representation of facts or ideas in a formalized manner
  • It is capable of being communicated or manipulated by some process
  • The rise of “Datalogy”, the science of data and data processes, and its place in education
  • Data science is defined here as the science of dealing with data once they have been established, while the relation of the data to what they represent is delegated to other fields and sciences


The International Association for Statistical Computing (IASC), a Section of the ISI, 1977

  • The mission is to link traditional statistical methodology, modern computer technology and the knowledge of domain experts in order to convert data into information and knowledge

Gregory Piatetsky-Shapiro, 1989

  • Arrival of the first Knowledge Discovery in Databases (KDD) workshop
  • It became the annual ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD) in 1995

“Database Marketing” – cover story by BusinessWeek, 1994

  • Companies collect mountains of information about you
  • Then crunch it to predict how likely you are to buy a product
  • Finally, use that knowledge to craft a marketing message precisely calibrated to get you to do so
  • Many companies were too overwhelmed by the sheer quantity of data to do anything useful with the information
  • However, many companies believe they have no choice but to brave the database-marketing frontier


Members of the International Federation of Classification Societies (IFCS), 1996

  • Data science is included in the title of the conference (“Data science, classification, and related methods”)

“From Data Mining to Knowledge Discovery in Databases” – by Usama Fayyad, Gregory Piatetsky-Shapiro and Padhraic Smyth, 1996

  • Historically, the notion of finding useful patterns in data has been given a variety of names
  • Some of the names are data mining, knowledge extraction, information discovery, information harvesting, data archaeology, and data pattern processing
  • KDD [Knowledge Discovery in Databases] refers to the overall process of discovering useful knowledge from data
  • Data mining refers to a particular step in this process
  • Data mining is the application of specific algorithms for extracting patterns from data
  • Data preparation, data selection, data cleaning, incorporation of appropriate prior knowledge, and proper interpretation of the results of mining, are essential to ensure that useful knowledge is derived from the data

Professor C. F. Jeff Wu, H. C. Carver Chair in Statistics at the University of Michigan, 1997

  • Called for statistics to be renamed data science, and statisticians to be renamed data scientists

The journal Data Mining and Knowledge Discovery, 1997

  • “Data mining” is described as “extracting information from large databases”

“Mining Data for Nuggets of Knowledge” – Jacob Zahavi quoted, 1999

  • Conventional statistical methods work well with small data sets
  • Today’s databases, however, involve millions of rows and scores of columns of data
  • Scalability is a huge issue in data mining
  • Another technical challenge is developing models that can do a better job analysing data, detecting non-linear relationships and interaction between elements
  • Special data mining tools may have to be developed to address web-site decisions


“Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics.” – by William S. Cleveland, 2001

  • Plan to enlarge the major areas of technical work of the field of statistics
  • The benefit to the data analyst has been limited, because the knowledge among computer scientists about how to think of and approach the analysis of data is limited, just as the knowledge of computing environments by statisticians is limited
  • A merger of knowledge bases would produce a powerful force for innovation
  • Statisticians should look to computing for knowledge today just as data science looked to mathematics in the past
  • Departments of data science should contain faculty members who devote their careers to advances in computing with data and who form partnerships with computer scientists

“Statistical Modeling: The Two Cultures” – by Leo Breiman, 2001

  • There are two cultures in the use of statistical modeling to reach conclusions from data
  • One assumes that the data are generated by a given stochastic data model, while the other uses algorithmic models and treats the data mechanism as unknown
  • Algorithmic modeling, both in theory and practice, has developed rapidly in fields outside statistics
  • It can be used both on large complex data sets and as a more accurate and informative alternative to data modeling on smaller data sets.
  • If our goal as a field is to use data to solve problems, then we need to move away from exclusive dependence on data models and adopt a more diverse set of tools

Launch of Journal of Data Science, 2003

  • Data Science means almost everything that has something to do with data: Collecting, analyzing, modeling
  • The most important part is its applications–all sorts of applications

“Competing on Analytics”, a Babson College Working Knowledge Research Center report – by Thomas H. Davenport, Don Cohen, and Al Jacobson, 2005

  • The emergence of a new form of competition based on the extensive use of analytics, data, and fact-based decision making
  • Besides competing on traditional factors, companies start to employ statistical and quantitative analysis and predictive modeling as primary elements of competition

The National Science Board publishes “Long-lived Digital Data Collections”, 2005

  • Data scientists are “the information and computer scientists, database and software engineers and programmers, disciplinary experts, curators and expert annotators, librarians, archivists, and others, who are crucial to the successful management of a digital data collection”
  • In simple terms, they are the people who work where the research is carried out (or, in the case of data centre personnel, in close collaboration with the creators of the data) and may be involved in creative enquiry and analysis, enabling others to work with digital data, and developments in database technology


“Harnessing the Power of Digital Data for Science and Society”, 2009

  • The nation needs to identify and promote the emergence of new disciplines and specialists expert in addressing the complex and dynamic challenges of digital preservation, sustained access, reuse and repurposing of data
  • Many disciplines are seeing the emergence of a new type of data science and management expert, accomplished in the computer, information, and data sciences arenas and in another domain science
  • These individuals are key to the current and future success of the scientific enterprise
  • However, these individuals often receive little recognition for their contributions and have limited career paths.

Hal Varian, Google’s Chief Economist, tells the McKinsey Quarterly, 2009

  • Quote – “I keep saying the sexy job in the next ten years will be statisticians. People think I’m joking, but who would’ve guessed that computer engineers would’ve been the sexy job of the 1990s?”
  • The ability to take data (to understand it, process it, extract value from it, visualize it, and communicate it) is going to be a hugely important skill in the coming decades
  • Managers need to be able to access and understand the data themselves.

“The Revolution in Astronomy Education: Data Science for the Masses” – Kirk D. Borne, 2009

  • Understanding the data is crucial for the success of sciences, communities, projects, agencies, businesses, and economies
  • It is true for both specialists (scientists) and non-specialists (everyone else: the public, educators and students, workforce)
  • Specialists must learn and apply new data science research techniques
  • Non-specialists require information literacy skills

“Rise of the Data Scientist” – Nathan Yau, 2009

  • Recalls Hal Varian’s comment that “the next sexy job in the next 10 years would be statisticians”
  • By statisticians, Varian actually meant a general title for someone who is able to extract information from large datasets and then present something of use to non-data experts
  • Ben Fry argues for an entirely new field, which will combine the skills and talents from disjointed areas of expertise… [Computer science; mathematics, statistics, and data mining; graphic design and human-computer interaction].


Troy Sadkowsky, 2009

  • Created the data scientists group on LinkedIn, complementing his website, (which later became

“Data, Data Everywhere” – The Economist Special Report – Kenneth Cukier, 2010

  • A new kind of professional has emerged, the data scientist, who combines the skills of software programmer, statistician and storyteller/artist to extract the nuggets of gold hidden under mountains of data

“What is Data Science?”- Mike Loukides, 2010

  • Data scientists combine entrepreneurship with patience, along with the willingness to build data products incrementally, the ability to explore, and the ability to iterate over a solution
  • They are inherently interdisciplinary
  • They can tackle all aspects of a problem, from initial data collection and data conditioning to drawing conclusions
  • They can think outside the box to come up with new ways to view the problem, or to work with very broadly defined problems: ‘here’s a lot of data, what can you make from it?’


“A Taxonomy of Data Science” – Hilary Mason and Chris Wiggins – 2010

  • What a data scientist does, in roughly chronological order: Obtain, Scrub, Explore, Model, and Interpret
  • Data science is clearly a blend of the hackers’ arts, statistics and machine learning, and the expertise in mathematics and the domain of the data for the analysis to be interpretable
  • It requires creative decisions and open-mindedness in a scientific context

“The Data Science Venn Diagram”- Drew Conway, 2010

  • Simply enumerating texts and tutorials does not untangle the knots
  • Data Science Venn Diagram – hacking skills, math and stats knowledge, and substantive expertise


“Why the term ‘data science’ is flawed but useful” – Pete Warden, 2011

  • Data scientists tend to work beyond the narrow specialties that dominate the corporate and institutional world, handling everything from finding the data, processing it at scale, visualizing it and writing it up as a story
  • They also seem to start by looking at what the data can tell them, and then pick interesting threads to follow rather than the traditional scientist’s approach of choosing the problem first and then finding data to shed light on it

“‘Data Science’: What’s in a name?” – David Smith, 2011

  • Many companies are now hiring ‘data scientists’, and entire conferences are run under the name of ‘data science’
  • Yet some have resisted the change from the more traditional terms like ‘statistician’ or ‘quant’ or ‘data analyst’
  • However, ‘Data Science’ better describes what we actually do, which is a combination of computer hacking, data analysis, and problem solving


“The Art of Data Science” – Matthew J. Graham, 2011

  • To flourish in the new data-intensive environment of the 21st century, we need to evolve new skills
  • We need to understand what rules [data] obey, how it is symbolized and communicated, and what its relationship to physical space and time is.

“Data Science, Moore’s Law, and Moneyball” – Harlan Harris, 2011

  • What Data Scientists do runs the gamut from data collection and munging, through the application of statistics, machine learning and related techniques, to interpretation, communication, and visualization of the results
  • Data Science is defined by its practitioners, as a career path rather than a category of activities
  • People who consider themselves Data Scientists typically have eclectic career paths, that might in some ways seem not to make much sense.

“Building Data Science Teams”- D.J. Patil, 2011

  • Patil and Jeff Hammerbacher shared their experiences of building the data and analytics groups at LinkedIn and Facebook
  • They realized that as their organizations grew, they needed to figure out what to call the people on their teams
  • ‘Business analyst’ seemed too limiting
  • ‘Data analyst’ was a contender, but they felt that title might limit what people could do. After all, many of the people on their teams had deep engineering expertise
  • ‘Research scientist’ was a reasonable job title used by companies like Sun, HP, Xerox, Yahoo, and IBM
  • However, they felt that most research scientists worked on projects that were futuristic and abstract, and the work was done in labs that were isolated from the product development teams
  • Instead, the focus of the teams was to work on data applications that would have an immediate and massive impact on the business
  • The term that seemed to fit best was data scientist: those who use both data and science to create something new

“Data Scientist: The Sexiest Job of the 21st Century” in the Harvard Business Review – Tom Davenport and D.J. Patil, 2012

