Big Data ≧ Data + Questions + Algorithmic Knowledge + Commitment + Collaboration. Many big data activities fail to establish the necessary preconditions for success. In science, the main challenges are access to data and collaboration across disciplines. For companies, however, the main challenge in applying big data is to ask the right questions. A company may spend enormous resources creating an impressive big data infrastructure, only to discover that not a single pertinent question can be answered with it. This is rather common: depending on whom one talks to, many or even most big data projects simply fail.
There are several clear reasons for such failures. Sometimes things fail because there is simply no potential for successful big data applications, and this is discovered only after the fact. In many other cases, it is because the basic requirements for success are not fulfilled. Before one starts, good-quality data, useful questions, proper algorithmic knowledge, and the right tools must exist. A clear commitment to apply findings and the ability to collaborate in multidisciplinary teams are further prerequisites for any possible success. Companies can in most cases purchase the required data and the algorithmic knowledge to crunch it. However, they must be able to formulate questions independently, and to do so it is necessary first to understand the potential of applied data science in relation to the market and the company itself.
External consulting can help, but will more likely only stimulate the search for proper questions. Instruments and tools will be provided and – in the case of top consultants – care will be taken that good practices are considered and that the search for questions is not limited to those already asked by competitors in the market. However, companies cannot fully outsource the thinking part of big data. Relying on good practices is useful, but they must be adapted to the company’s market position and the capabilities available inside the company. It is also crucial that company leaders are open to creating innovations that, at best, cannot be copied by others.
More (Applied) Mathematics than Science
Taking the above into consideration, applied data science is no magic bullet. First, it primarily enables those who have a profound understanding of their own business to further develop that understanding. Second, any impact made with applied data science depends on the capability to implement change and innovation. Thus, applied data science is not likely to help those who do not know their business well, nor is it likely to innovate companies that are resistant to change.
Big data is more mathematics than science in that, like most standard scientific practices, it does not ensure validity ex ante; its findings must be validated ex post. Unfortunately, contrary to pure mathematics with its proofs, ex post validation in big data generates at best weak evidence, which – as with most applied mathematics – is to be “falsified” in the sense of Popper through implementation in real life. In practice, applied data science in business carries the risk that algorithmic results are not scrutinised sufficiently before they are turned into business innovations.
In Search of Data
The application of big data in science is both more straightforward and more challenging than in business. Science is very experienced in the creation of good questions, but in many cases it is too costly or even impossible to obtain the data needed to answer them. The situation is particularly painful in healthcare. Applied data science has the potential to dramatically improve prevention, diagnosis, therapy, and monitoring, as well as healthcare practices in general, the running of healthcare units, and the resource planning for the healthcare system as a whole. However, big data cannot be done without data, and such improvements will happen only if patients’ health data can be made available for research. Right now, this is often not the case.
Looking at the situation on a more general scale, in some areas the digitalisation of data has enabled significant progress, e.g. in economic history, while in many other areas the dramatic lack of accessible data blocks progress and innovation. If Switzerland is not able to solve the data access problems in several critical areas in the next few years, it will be faced with severe problems in research, in economic development, and in government.
In Search of Skills and Sharing
Having data and the proper questions is the beginning, and for many companies this is deemed good enough; for more advanced scientific research, however, it is not. Along with practical know-how and advanced tooling, in many situations even in business a deep algorithmic knowledge is necessary. Having the data and questions in hand is a start, but researchers and companies should at least understand how to work with data of mixed quality, how to eliminate bias from data, and how to focus on relevant dimensions in high-dimensional data spaces. They should understand which tools are available in which cases and under what conditions to help face these challenges. They should become aware of the advantages and disadvantages of the contemporary algorithmic machinery and consider them in each and every project – possibly with the aid of external contributors.
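To make the “data of mixed quality” point concrete, the following is a minimal, self-contained sketch of one common cleaning step: imputing missing or implausible values with the median of the plausible ones. All field names, records, and plausibility ranges here are invented for illustration; real projects would choose an imputation strategy to fit the data and the question at hand.

```python
import statistics

# Toy records of mixed quality: some fields missing (None), one implausible outlier.
records = [
    {"age": 34,   "income": 52000},
    {"age": None, "income": 61000},   # missing age
    {"age": 29,   "income": None},    # missing income
    {"age": 41,   "income": 58000},
    {"age": 999,  "income": 57000},   # implausible age, treated as bad data
]

def clean(records, field, lo, hi):
    """Replace missing or out-of-range values of `field` with the median
    of the plausible values -- one simple imputation strategy among many."""
    plausible = [r[field] for r in records
                 if r[field] is not None and lo <= r[field] <= hi]
    med = statistics.median(plausible)
    for r in records:
        if r[field] is None or not (lo <= r[field] <= hi):
            r[field] = med
    return records

clean(records, "age", 0, 120)          # plausible ages: 34, 29, 41 -> median 34
clean(records, "income", 0, 10_000_000)
```

Even this tiny example forces a judgment call – what counts as “plausible”, and whether imputing is better than discarding – which is exactly the kind of decision that cannot be outsourced to tooling.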
In practical settings, the application of data science often relies on a wide range of expertise and is preferably performed in multidisciplinary teams. However, such collaboration is very challenging. There is ample evidence that cross-disciplinary collaboration fails when there is too much reliance on interfaces, and that it has a better chance of succeeding when the flow of results among the disciplinary experts in a team is nurtured by the creation of domains of shared knowledge and the use of boundary objects – an art of its own.
Beyond the buzzwords and sales pitches, it can be said that big data is a big mess and that it needs a kind of artistic inspiration. This brings with it an awkward predicament – one might term it being “up shit creek without a paddle, in order to compose a piece of music”. Although this image might repel you, it serves as a metaphor for many big data ventures, and in some cases the metaphor makes perfect sense. When big data becomes a tool to save human lives, then it becomes clear, in the sense of Paul Feyerabend, that with big data “anything goes”.
Projects must fit their methods, and these must adapt during runtime. Scientifically speaking, this is forbidden in many disciplines, but it is exactly this “crime” against scientific rules that provides the true thrill of big data projects. On the one hand, it becomes possible to courageously leave established disciplinary grounds and discover new land; on the other, it becomes possible to depart far from known grounds and drown in a self-made sea of data-generated fictions.
There are many types of big data projects with very different tasks, ranging from the very trivial to the absolutely impossible. For the more challenging among them, the following inequality holds true: Big Data ≧ Data + Questions + Algorithmic Knowledge + Commitment + Collaboration. Here, the “≧” should be read as “is more than just”, the difference being the inclusion of a great idea, good luck, or further undefined ingredients such as a broad scientific education. However, great ideas and good luck cannot come to fruition unless the domains of operation are fully understood. So a proper mix of experience, expertise, talent, and a true commitment to collaboration remains the bottom line of big success with big data.