Q. How is big data reshaping the global economy today, and what does the future hold?
Karim R. Lakhani: Today, there's a shortage of people who can actually take the data, analyze it, and derive insights. Basically, there is a secular shift in the economy where more and more of any company's transactions and activities are being digitized and also happening online, which then generates more data. There is an explosion in the "four Vs": variety, velocity, veracity, and volume. What used to be limited to just the software business is now happening in a whole range of businesses across industries. It's critical for companies to recruit and build data analytics teams. The exposure to and understanding of analytics methods should range from experimentation through data analysis through machine learning and application of the methods.
John A. Deighton: For 150 years, we built the economy on cheap energy. We refer to that era as the Industrial Age. Now we're building an economy on data, and we refer to it as the Information Age. Data has special properties that make it unlike any other resource. It can, for example, be used without being used up. And consuming information does not exclude someone else from consuming it. Unlike physical products, information products can be replicated at low or no incremental cost. The result is a new science, data science, that is complementing the physical sciences. Being data literate will be the most critical skill of the Information Age.
Q. Predictive analytics is becoming a hot-button issue. How does a company address the ethics of transparency and at the same time improve its capabilities?
JD: Predictive analytics is a big part of this course. We explore not only technical questions, but also ethical questions in data collection: "I didn't tell you this, so how can you use it?" For example, analytics can predict which people in a firm are most likely to resign from their jobs. It involves looking at a whole set of predictors and inferring something that these employees might not even know yet: that they will get and accept an offer from a competitor. Just because we can does not mean we should, however.
KL: We provide participants with an analytical framework to get them thinking about how data can be useful to their business. Data can help you do things cheaper, faster, and better, and it can help you improve the ways in which you run your business. For example, an insurance company wants to be better at detecting fraud. They have existing methods, but they don't know how to use the available data to create and capture more value.
Q. What is "correlation versus causality," and how is it used across different industries?
JD: Distinguishing correlation from causality is a key issue in the big data world because so much data is observational. It's found data that just happens, so it's tempting to think that the correlational patterns are actually causal patterns. Here's a well-known example in my field: If you shop at both the physical and the online store and then buy from the store catalog, you spend more in total than someone who shops at only one channel. That gets interpreted as "We must get you to use more channels," treating a correlational finding as a causal finding. But there's another explanation: You like the store, so you shop at every possible outlet. If you made people who didn't like the store shop at more channels, they'd buy the same amount spread over different channels.
KL: This came up in banking, too. When ATMs were first launched, banks found the same pattern: People who used the teller and the ATM were better customers than people who just used the teller. So they tried to drive people to use the ATM, but that didn't lead to an increase in customers because only the loyal customers like to use multiple outlets. Now they're making the same mistake again with online banking. But it's even worse because when people go to online banking, they are more likely to defect. The causality is actually reversed.
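The multichannel and ATM examples above can be illustrated with a small simulation. This is only a sketch under invented assumptions, not data from any retailer or bank: a hypothetical hidden "affinity" score drives both how likely a customer is to use several channels and how much the customer spends, while channel use itself has no effect on spend. All names and numbers (`affinity`, `simulate`, the spend formula) are made up for illustration.

```python
import random
from statistics import mean


def simulate(n_customers=10_000, seed=0):
    """Generate hypothetical customers as (affinity, multi_channel, spend)."""
    rng = random.Random(seed)
    customers = []
    for _ in range(n_customers):
        # Assumption: hidden liking for the store, uniform between 0 and 1.
        affinity = rng.random()
        # Customers who like the store more are more likely to use several channels.
        multi_channel = rng.random() < affinity
        # Assumption: spend depends on affinity only, never on channel count.
        spend = 100 * affinity + rng.gauss(0, 5)
        customers.append((affinity, multi_channel, spend))
    return customers


def observed_gap(customers):
    """Average spend of multichannel customers minus single-channel customers."""
    multi = [s for _, m, s in customers if m]
    single = [s for _, m, s in customers if not m]
    return mean(multi) - mean(single)


def gap_within_affinity_band(customers, low, high):
    """Same comparison, restricted to customers with similar affinity."""
    multi = [s for a, m, s in customers if m and low <= a < high]
    single = [s for a, m, s in customers if not m and low <= a < high]
    return mean(multi) - mean(single)


if __name__ == "__main__":
    customers = simulate()
    print("Raw gap in spend:", round(observed_gap(customers), 1))
    print("Gap among similar customers:",
          round(gap_within_affinity_band(customers, 0.45, 0.55), 1))
```

In this toy setup the raw comparison shows multichannel customers spending far more, yet the gap nearly vanishes once customers with similar affinity are compared. That is the pattern JD describes: liking the store explains both behaviors, so pushing people into more channels would not, by itself, raise their spending.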
Q. You mentioned that "experimentation" is a key theme of the program. In what way?
JD: The Google Car case explores the difference between modeling and machine learning, which is an experimentation theme we play with in the program. When you model something, you work with historical data. But when you're trying to drive a self-driving car, historical data isn't useful; it's all about machine learning. You have to determine if that thing in the road is a paper bag or a child. Do you brake, or do you run over it?
KL: Yes, the Google Car is a good example of experimenting with new business models that have all this data and connectivity available to them. It's driven by the fact that you can actually do this kind of analysis and then offer new business models. Instacart is the same thing. It's a new business model in terms of how you actually deliver the groceries, and being able to predict and match online demand is very different.
Q. Could you share some examples of companies that are using consumer data to improve efficiency?
JD: In the Instacart case, we examine grocery buying as a personal shopper service. The issue is predicting the patterns of demand so that you can try to steer customers away from the times when you're short of personal shoppers and toward the times when you have excess capacity. Then there's the issue of defining the ethical boundaries of predicting information that people don't volunteer to tell you. It's basically an efficiency story.
KL: Exactly. Should Whole Foods think of Instacart as a competitor or a supplier? If Whole Foods gets consumers used to buying everything off Instacart, then they could integrate and have all this data. "We know every Friday there's a run on Rice Krispies. We don't need to even have it sitting in Whole Foods. We'll just buy it, stock it, and make it available." Those are the kinds of things you could imagine happening now that they have this very fine-grained level of consumer data. Because Whole Foods can't identify me yet, I can go in and shop. I'm completely anonymous.
For more information on Harvard Business School's big data analytics program, visit Competing on Business Analytics and Big Data.