This session covered the connections between Big Data, analytics and the evidence-based paradigm. Goldenberg’s article on the evidence-based approach in medicine was discussed. In particular, the debate centred on the paper's claim that the appeal to the authority of evidence, which characterises evidence-based practices, does not increase objectivity but rather obscures the subjective elements that inescapably enter all forms of human enquiry. Evidence was defined as ‘some conceptual warrant for belief or action’, and the centrality of evidence in science was accepted.
Goldenberg – On evidence and EBM – lessons from the philosophy of science
It is the practice of basing all beliefs and practices strictly on evidence that allegedly separates science from other activities. The evidence-based medicine (EBM) movement purports to eschew the unsystematic and intuitive methods of individual clinical practice in favour of a more scientifically rigorous approach. This rigour is achieved through methodical clinical decision making based on examination of evidence derived from the latest clinical research. Evidence-based techniques are an extension of the philosophical system of logical positivism, which recognises only scientifically verifiable propositions as meaningful. This school of thought originated in Vienna in the 1920s, and the emigration of a number of Vienna Circle members to the UK and the US led to the strong influence of logical positivism on Anglo-American analytic philosophy.
The EBM movement centres around five linked ideas:
- Clinical decisions should be based on the best available scientific evidence
- The clinical problem, rather than habits or protocols, should determine the type of evidence to be sought
- Identifying the best evidence means using epidemiological and biostatistical ways of thinking
- Conclusions derived from identifying and critically appraising evidence are useful only if put into action in managing patients or making health care decisions
- Performance should be constantly evaluated
The synthesis of large amounts of clinical trial data into manageable “clinical summaries” or “meta-analyses” in EBM projects like the Cochrane Collaboration is a first step towards the Big Data concept.
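To make the idea of a meta-analysis concrete, here is a minimal sketch of fixed-effect, inverse-variance pooling, one common textbook way of combining trial results. The notes do not say which pooling method any particular review uses, so the method choice, the function name and the trial numbers below are all assumptions made for illustration.

```python
import math

def pool_fixed_effect(effects, standard_errors):
    """Combine per-trial effect estimates using inverse-variance weights.

    Hypothetical helper for illustration: more precise trials (smaller
    standard error) receive proportionally more weight.
    """
    weights = [1 / se ** 2 for se in standard_errors]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return pooled, pooled_se

# Made-up effect sizes and standard errors for three imaginary trials.
trial_effects = [0.20, 0.40, 0.30]
trial_ses = [0.10, 0.10, 0.20]

pooled_effect, pooled_se = pool_fixed_effect(trial_effects, trial_ses)
print(f"Pooled effect: {pooled_effect:.3f} (standard error {pooled_se:.3f})")
```

The pooled standard error is smaller than that of any single trial, which is the sense in which a summary of many trials is more "manageable" than the trials taken one at a time.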
EBM is critiqued on two grounds. First, Hanson (1958), Kuhn (1970, 1996) and Feyerabend (1978) claimed that observation is theory-laden: it is coloured by our background beliefs and assumptions and can therefore never be an unmediated perception of the nature of things. Second, Duhem (1982) and Quine (1960) argued that our theory choices are never determined exclusively by evidence; instead, a given body of evidence may support numerous, even contradictory, theories.
Phenomenological approaches to science and medicine further challenge notions of evidence in EBM by questioning why relevant evidence is assumed to come primarily from clinical trials and other objective measures. They argue instead that patients’ self-understanding and experience of illness also offer a legitimate source of relevant medical knowledge. This theoretical approach is grounded in the philosophy of Edmund Husserl and his followers, who questioned the philosophical completeness of the natural sciences. They argued that Cartesian dualism, which splits the world into minds and bodies, fails to explain human understanding, leading to a crisis of meaning.
Next, the dictum “You can’t manage what you don’t measure”, attributed to Deming and Drucker, was explored through McAfee & Brynjolfsson’s article on Big Data.
McAfee & Brynjolfsson – Big Data – The Management Revolution
The claim that the more data we measure, the better we can manage can be supported statistically by showing that data-driven companies are more profitable than others. But this is not a given: it depends on how the data is analysed and how committed senior management is to data analytics. The article describes how Big Data differs from the older field of analytics and why it has recently become important. It outlines three key differences:
Volume: As of 2012, about 2.5 exabytes of data are created each day, and that number is doubling every 40 months or so. More data cross the internet every second than were stored in the entire internet just 20 years ago. This gives companies an opportunity to work with many petabytes of data in a single data set, and not just from the internet.
Velocity: For many applications, the speed of data creation is even more important than the volume. Real-time or nearly real-time information makes it possible for a company to be much more agile than its competitors.
Variety: Big data takes the form of messages, updates, and images posted to social networks; readings from sensors; GPS signals from cell phones, and more. Many of the most important sources of big data are relatively new.
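The Volume figure above can be turned into simple arithmetic. The sketch below assumes the 2012 baseline (2.5 exabytes per day) and the 40-month doubling period hold steadily, which the article does not guarantee; the function and constant names are hypothetical and the result is an extrapolation, not a measurement.

```python
# Illustrative arithmetic only. The baseline and doubling period are the
# article's 2012 figures; steady exponential growth is an assumption.
BASE_EXABYTES_PER_DAY = 2.5
DOUBLING_MONTHS = 40

def projected_daily_volume(months_after_2012):
    """Daily data volume in exabytes under steady doubling every 40 months."""
    return BASE_EXABYTES_PER_DAY * 2 ** (months_after_2012 / DOUBLING_MONTHS)

# Doubling every 40 months implies this fractional growth per 12 months.
annual_growth = 2 ** (12 / DOUBLING_MONTHS) - 1

print(f"Implied annual growth: {annual_growth:.1%}")
for months in (0, 40, 80, 120):
    volume = projected_daily_volume(months)
    print(f"{months:>3} months after 2012: {volume:.1f} EB/day")
```

Under these assumptions, doubling every 40 months corresponds to growth of roughly 23% a year.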
The article describes five management challenges with Big Data:
Leadership: Companies succeed in the big data era not simply because they have more or better data, but because they have leadership teams that set clear goals, define what success looks like, and ask the right questions. Big data’s power does not erase the need for vision or human insight.
Talent management: As data become cheaper, the complements to data become more valuable. Some of the most crucial of these are data scientists and other professionals skilled at working with large quantities of information. Along with the data scientists, a new generation of computer scientists are bringing to bear techniques for working with very large data sets. The best data scientists are also comfortable speaking the language of business and helping leaders reformulate their challenges in ways that big data can tackle. Not surprisingly, people with these skills are hard to find and in great demand.
Technology: The tools available to handle the volume, velocity, and variety of big data have improved greatly in recent years. In general, these technologies are not prohibitively expensive, and much of the software is open source. Hadoop, the most commonly used framework, combines commodity hardware with open-source software. However, these technologies do require a skill set that is new to most IT departments, which will need to work hard to integrate all the relevant internal and external sources of data.
Decision making: An effective organization puts information and the relevant decision rights in the same location. In the big data era, information is created and transferred, and expertise is often not where it used to be. The artful leader will create an organization flexible enough to minimize the “not invented here” syndrome and maximize cross-functional cooperation.
Company culture: The first question a data-driven organization asks itself is not “What do we think?” but “What do we know?” This requires a move away from acting solely on hunches and instinct. It also requires breaking a bad habit we’ve noticed in many organizations: pretending to be more data-driven than they actually are.
Marcus’s article in The New Yorker shares the same themes.
Marcus – Steamrolled by Big Data (The New Yorker)
The article mentions how Google improved spell checkers using Big Data, along with the case of Oren Etzioni, who created Farecast (eventually sold to Microsoft, and now part of Bing Travel), which scraped data from the Web to make good guesses about whether airline fares would rise or fall.
Numenta, founded by Jeff Hawkins of Palm Pilot fame, is discussed in some detail. According to Numenta’s Web site, their software, Grok, “finds complex patterns in data streams and generates actionable predictions in real time…. Feed Grok data, and it returns predictions that generate action. Grok learns and adapts automatically.” Numenta boasts that “As the age of the digital nervous system dawns, Grok represents the type of technology that will convert massive data flows into value.”
Marcus argues, however, that every problem is different and that there are no universally applicable solutions. An algorithm that is good at chess isn’t going to be much help parsing sentences, and one that parses sentences isn’t going to be much help playing chess. A faster computer will be better than a slower computer at both, but solving problems will often (though not always) require a fair amount of what some researchers call “domain knowledge”: specific information about particular problems, often gathered painstakingly by experts. Big Data is a powerful tool for inferring correlations, not a magic wand for inferring causality.
The article also presents a critique of Big Data by recounting a chat the author had with Anthony Nyström, of the Web software company Intridea, in which Nyström claimed that selling Big Data is a great gig for charlatans, because they never have to admit to being wrong. “If their system fails to provide predictive insight, it’s not their models, it’s an issue with your data.” You didn’t have enough data, there was too much noise, you measured the wrong things. The list of excuses can be long.
Morozov’s article on the “planning machine” was discussed next.
The article traces the origins of the Big Data concept to the story of Stafford Beer, a leading theorist of cybernetics, who envisaged Project Cybersyn to help Chile’s socialist government manage the country and its economy with the help of computers. Beer helped design systems for the project’s operations room, such as Datafeed, which had four screens that could show hundreds of pictures and figures giving historical and statistical information on the state of production in the country. Another screen simulated the future state of the Chilean economy under various conditions.
One wall was reserved for Project Cyberfolk, an ambitious effort to track the real-time happiness of the entire Chilean nation in response to decisions made in the op room. Beer built a device that would enable the country’s citizens, from their living rooms, to move a pointer on a voltmeter-like dial that indicated moods ranging from extreme unhappiness to complete bliss. The plan was to connect these devices to a network—it would ride on the existing TV networks—so that the total national happiness at any moment in time could be determined. The algedonic meter, as the device was called (from the Greek algos, “pain,” and hedone, “pleasure”), would measure only raw pleasure-or-pain reactions to show whether government policies were working.
As Eden Medina shows in “Cybernetic Revolutionaries,” her entertaining history of Project Cybersyn, Beer set out to solve an acute dilemma that Allende faced. How was he to nationalize hundreds of companies, reorient their production toward social needs, and replace the price system with central planning, all while fostering the worker participation that he had promised? Beer realized that the planning problems of business managers—how much inventory to hold, what production targets to adopt, how to redeploy idle equipment—were similar to those of central planners. Computers that merely enabled factory automation were of little use; what Beer called the “cussedness of things” required human involvement. It’s here that computers could help—flagging problems in need of immediate attention, say, or helping to simulate the long-term consequences of each decision. By analyzing troves of enterprise data, computers could warn managers of any “incipient instability.” In short, management cybernetics would allow for the reëngineering of socialism—the command-line economy.
Yet central planning had been powerfully criticized for being unresponsive to shifting realities, notably by the free-market champion Friedrich Hayek. The efforts of socialist planners, he argued, were bound to fail, because they could not do what the free market’s price system could: aggregate the poorly codified knowledge that implicitly guides the behavior of market participants. Beer and Hayek knew each other; as Beer noted in his diary, Hayek even complimented him on his vision for the cybernetic factory, after Beer presented it at a 1960 conference in Illinois. (Hayek, too, ended up in Chile, advising Augusto Pinochet.) But they never agreed about planning. Beer believed that technology could help integrate workers’ informal knowledge into the national planning process while lessening information overload.
Next, Harford’s article on Big Data was discussed, in which he details what might be going wrong with the whole concept.
Harford, T. (2014). Big Data: Are we making a big mistake? The Financial Times.
The article refers to Google Flu Trends, which was quick, cheap and theory-free, yet still produced fairly accurate estimates of flu trends across America. Google’s engineers didn’t bother to develop a hypothesis about what search terms – “flu symptoms” or “pharmacies near me” – might be correlated with the spread of the disease itself. The Google team just took their top 50 million search terms and let the algorithms do the work.
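One reason for caution with a theory-free search over many candidate terms can be shown with a toy simulation. The sketch below is not Google's method: it invents a short "flu" series and thousands of pure-noise "search term" series (all names and sizes are assumptions) and reports the strongest correlation found by chance alone.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def best_spurious_correlation(n_weeks=20, n_terms=2000, seed=0):
    """Strongest |correlation| between a noise target and noise candidates."""
    rng = random.Random(seed)
    # Hypothetical weekly flu counts: random noise, no real signal.
    flu_cases = [rng.gauss(0, 1) for _ in range(n_weeks)]
    best = 0.0
    for _ in range(n_terms):
        # Each candidate "search term" is independent noise as well.
        term_volume = [rng.gauss(0, 1) for _ in range(n_weeks)]
        best = max(best, abs(pearson(flu_cases, term_volume)))
    return best

strongest = best_spurious_correlation()
print(f"Strongest |correlation| among pure-noise terms: {strongest:.2f}")
```

Even though no series here is related to any other, screening enough candidates turns up a seemingly strong match, so a high correlation found this way is weak evidence on its own.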
The article urges caution about four claims about Big Data that are prevalent among businesses:
- Data analysis produces uncannily accurate results
- Every single data point can be captured, making old statistical sampling techniques obsolete
- It is passé to fret about what causes what, because statistical correlation tells us what we need to know
- Scientific and statistical models aren’t needed as “with enough data, the numbers speak for themselves”
A Big Data set is supposedly one where “N = All”: we have the whole population, so no sampling is required. This notion can be challenged, since it is virtually impossible to capture every data point.
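The gap between "a very large data set" and "N = All" can be illustrated with a toy simulation. The population, the inclusion rule and every number below are assumptions made up for the sketch; the point is only that a huge but selectively captured data set can estimate a population average worse than a small random sample.

```python
import math
import random

rng = random.Random(42)

# Hypothetical population: 100,000 people with some measured attribute.
population = [rng.gauss(50, 10) for _ in range(100_000)]
true_mean = sum(population) / len(population)

def captured(value):
    """Assumed capture rule: higher values are more likely to be recorded."""
    inclusion_probability = 1 / (1 + math.exp(-(value - 50) / 10))
    return rng.random() < inclusion_probability

# "Big" data: a large share of the population, but not a random share.
big_biased = [v for v in population if captured(v)]
# Classical alternative: a small simple random sample.
small_random = rng.sample(population, 1000)

biased_error = abs(sum(big_biased) / len(big_biased) - true_mean)
random_error = abs(sum(small_random) / len(small_random) - true_mean)

print(f"Biased data set: {len(big_biased)} points, error {biased_error:.2f}")
print(f"Random sample:   {len(small_random)} points, error {random_error:.2f}")
```

Adding more points to the biased data set does not shrink its error, because the error comes from who is missing rather than from how few were measured.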