Trustworthy Online Controlled Experiments 1st Edition by Ron Kohavi, ISBN-13: 978-1108724265

Original price: $50.00. Current price: $14.99.


Description


[PDF eBook eTextbook] – Available Instantly

  • Publisher: Cambridge University Press
  • Publication date: April 2, 2020
  • Edition: 1st
  • Language: English
  • Pages: 290
  • ISBN-10: 1108724264
  • ISBN-13: 978-1108724265

Getting numbers is easy; getting numbers you can trust is hard. This practical guide by experimentation leaders at Google, LinkedIn, and Microsoft will teach you how to accelerate innovation using trustworthy online controlled experiments, or A/B tests.

Based on practical experiences at companies that each run more than 20,000 controlled experiments a year, the authors share examples, pitfalls, and advice for students and industry professionals getting started with experiments, plus deeper dives into advanced topics for practitioners who want to improve the way they make data-driven decisions. Learn how to:

  • Use the scientific method to evaluate hypotheses using controlled experiments.
  • Define key metrics and ideally an Overall Evaluation Criterion.
  • Test for trustworthiness of the results and alert experimenters to violated assumptions.
  • Build a scalable platform that lowers the marginal cost of experiments close to zero.
  • Avoid pitfalls like carryover effects and Twyman’s law.
  • Understand how statistical issues play out in practice (a minimal code sketch follows this list).
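
To make two of these concrete, the sketch below (in Python, on simulated, hypothetical data; it is not code from the book) pairs a sample ratio mismatch (SRM) guardrail with a two-sample t-test: the guardrail checks that the user counts observed in each variant match the designed 50/50 split before the metric comparison is trusted.

```python
# A minimal, hypothetical sketch of two checks described above (simulated
# data; not code from the book): an SRM guardrail and a two-sample t-test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Simulated per-user metric values (e.g., revenue) for a 50/50 experiment.
control = rng.exponential(scale=10.0, size=10_000)
treatment = rng.exponential(scale=10.2, size=10_050)

# 1) Trustworthiness guardrail: chi-squared test that observed user counts
#    match the designed 50/50 split. A very small p-value flags an SRM.
observed = [len(control), len(treatment)]
expected = [sum(observed) / 2] * 2
_, srm_p = stats.chisquare(observed, f_exp=expected)
if srm_p < 0.001:
    print(f"SRM detected (p = {srm_p:.2g}): do not trust the other results.")

# 2) Hypothesis test: Welch's two-sample t-test on the metric means.
t_stat, p_value = stats.ttest_ind(treatment, control, equal_var=False)
delta = treatment.mean() - control.mean()
print(f"delta = {delta:.3f}, t = {t_stat:.2f}, p = {p_value:.3f}")
```

In practice an experimentation platform runs the SRM check automatically on every scorecard; Chapter 21 below covers SRM causes and debugging.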

Table of Contents:

Half-title

Reviews

Title page

Copyright information

Contents

Preface

Acknowledgments

Part I Introductory Topics for Everyone

1 Introduction and Motivation

Online Controlled Experiments Terminology

Why Experiment? Correlations, Causality, and Trustworthiness

Necessary Ingredients for Running Useful Controlled Experiments

Tenets

Tenet 1: The Organization Wants to Make Data-Driven Decisions and Has Formalized an OEC

Tenet 2: The Organization Is Willing to Invest in the Infrastructure and Tests to Run Controlled Experiments

Tenet 3: The Organization Recognizes That It Is Poor at Assessing the Value of Ideas

Improvements over Time

Google Ads Example

Bing Relevance Example

Bing Ads Example

Examples of Interesting Online Controlled Experiments

UI Example: 41 Shades of Blue

Making an Offer at the Right Time

Personalized Recommendations

Speed Matters a LOT

Malware Reduction

Backend Changes

Strategy, Tactics, and Their Relationship to Experiments

Scenario 1: You Have a Business Strategy and You Have a Product with Enough Users to Experiment

Scenario 2: You Have a Product, You Have a Strategy, but the Results Suggest That You Need to Consider a Pivot

Additional Reading

2 Running and Analyzing Experiments: An End-to-End Example

Setting up the Example

Hypothesis Testing: Establishing Statistical Significance

Designing the Experiment

Running the Experiment and Getting Data

Interpreting the Results

From Results to Decisions

3 Twyman’s Law and Experimentation Trustworthiness

Misinterpretation of the Statistical Results

Lack of Statistical Power

Misinterpreting p-values

Peeking at p-values

Multiple Hypothesis Tests

Confidence Intervals

Threats to Internal Validity

Violations of SUTVA

Survivorship Bias

Intention-to-Treat

Sample Ratio Mismatch (SRM)

Threats to External Validity

Primacy Effects

Novelty Effects

Detecting Primacy and Novelty Effects

Segment Differences

Segmented View of a Metric

Segmented View of the Treatment Effect (Heterogeneous Treatment Effect)

Analysis by Segments Impacted by Treatment Can Mislead

Simpson’s Paradox

Encourage Healthy Skepticism

4 Experimentation Platform and Culture

Experimentation Maturity Models

Leadership

Process

Build vs. Buy

Can an External Platform Provide the Functionality You Need?

What Would the Cost Be to Build Your Own?

What’s the Trajectory of Your Experimentation Needs?

Do You Need to Integrate into Your System’s Configuration and Deployment Methods?

Infrastructure and Tools

Experiment Definition, Set-up, and Management

Experiment Deployment

Experiment Instrumentation

Scaling Experimentation: Digging into Variant Assignment

Single-Layer Method

Concurrent Experiments

Experimentation Analytics

Part II Selected Topics for Everyone

5 Speed Matters: An End-to-End Case Study

Key Assumption: Local Linear Approximation

How to Measure Website Performance

The Slowdown Experiment Design

Impact of Different Page Elements Differs

Extreme Results

6 Organizational Metrics

Metrics Taxonomy

Formulating Metrics: Principles and Techniques

Evaluating Metrics

Evolving Metrics

Additional Resources

SIDEBAR: Guardrail Metrics

SIDEBAR: Gameability

7 Metrics for Experimentation and the Overall Evaluation Criterion

From Business Metrics to Metrics Appropriate for Experimentation

Combining Key Metrics into an OEC

Example: OEC for E-mail at Amazon

Example: OEC for Bing’s Search Engine

Goodhart’s Law, Campbell’s Law, and the Lucas Critique

8 Institutional Memory and Meta-Analysis

What Is Institutional Memory?

Why Is Institutional Memory Useful?

9 Ethics in Controlled Experiments

Background

Risk

Benefits

Provide Choices

Data Collection

Culture and Processes

SIDEBAR: User Identifiers

Part III Complementary and Alternative Techniques to Controlled Experiments

10 Complementary Techniques

The Space of Complementary Techniques

Logs-based Analysis

Human Evaluation

User Experience Research (UER)

Focus Groups

Surveys

External Data

Putting It All Together

11 Observational Causal Studies

When Controlled Experiments Are Not Possible

Designs for Observational Causal Studies

Interrupted Time Series

Interleaved Experiments

Regression Discontinuity Design

Instrumental Variables (IV) and Natural Experiments

Propensity Score Matching

Difference in Differences

Pitfalls

SIDEBAR: Refuted Observational Causal Studies

Part IV Advanced Topics for Building an Experimentation Platform

12 Client-Side Experiments

Differences between Server and Client Side

Difference #1: Release Process

Difference #2: Data Communication between Client and Server

Implications for Experiments

Implication #1: Anticipate Changes Early and Parameterize

Implication #2: Expect a Delayed Logging and Effective Starting Time

Implication #3: Create a Failsafe to Handle Offline or Startup Cases

Implication #4: Triggered Analysis May Need Client-Side Experiment Assignment Tracking

Implication #5: Track Important Guardrails on Device and App Level Health

Implication #6: Monitor Overall App Release through Quasi-experimental Methods

Implication #7: Watch Out for Multiple Devices/Platforms and Interactions between Them

Conclusions

13 Instrumentation

Client-Side vs. Server-Side Instrumentation

Processing Logs from Multiple Sources

Culture of Instrumentation

14 Choosing a Randomization Unit

Randomization Unit and Analysis Unit

User-level Randomization

15 Ramping Experiment Exposure: Trading Off Speed, Quality, and Risk

What Is Ramping?

SQR Ramping Framework

Four Ramp Phases

Ramp Phase One: Pre-MPR

Ramp Phase Two: MPR

Ramp Phase Three: Post-MPR

Ramp Phase Four: Long-Term Holdout or Replication

Post Final Ramp

16 Scaling Experiment Analyses

Data Processing

Data Computation

Results Summary and Visualization

Part V Advanced Topics for Analyzing Experiments

17 The Statistics behind Online Controlled Experiments

Two-Sample t-Test

p-Value and Confidence Interval

Normality Assumption

Type I/II Errors and Power

Bias

Multiple Testing

Fisher’s Meta-analysis

18 Variance Estimation and Improved Sensitivity: Pitfalls and Solutions

Common Pitfalls

Delta vs. Delta %

Ratio Metrics: When Analysis Unit Is Different from Experiment Unit

Outliers

Improving Sensitivity

Variance of Other Statistics

19 The A/A Test

Why A/A Tests?

Example 1: Analysis Unit Differs from Randomization Unit

Example 2: Optimizely Encouraged Stopping When Results Were Statistically Significant

Example 3: Browser Redirects

Example 4: Unequal Percentages

Example 5: Hardware Differences

How to Run A/A Tests

When the A/A Test Fails

20 Triggering for Improved Sensitivity

Examples of Triggering

Example 1: Intentional Partial Exposure

Example 2: Conditional Exposure

Example 3: Coverage Increase

Example 4: Coverage Change

Example 5: Counterfactual Triggering for Machine Learning Models

A Numerical Example (Kohavi, Longbotham et al. 2009)

Optimal and Conservative Triggering

Overall Treatment Effect

Example 1

Example 2

Trustworthy Triggering

Common Pitfalls

Pitfall 1: Experimenting on Tiny Segments That Are Hard to Generalize

Pitfall 2: A Triggered User Is Not Properly Triggered for the Remaining Experiment Duration

Pitfall 3: Performance Impact of Counterfactual Logging

Open Questions

Question 1: Triggering Unit

Question 2: Plotting Metrics over Time

21 Sample Ratio Mismatch and Other Trust-Related Guardrail Metrics

Sample Ratio Mismatch

Scenario 1

Scenario 2

SRM Causes

Debugging SRMs

Other Trust-Related Guardrail Metrics

22 Leakage and Interference between Variants

Examples

Direct Connections

Indirect Connections

Some Practical Solutions

Rule-of-Thumb: Ecosystem Value of an Action

Isolation

Edge-Level Analysis

Detecting and Monitoring Interference

23 Measuring Long-Term Treatment Effects

What Are Long-Term Effects?

Reasons the Treatment Effect May Differ between Short-Term and Long-Term

Why Measure Long-Term Effects?

Long-Running Experiments

Alternative Methods for Long-Running Experiments

Method #1: Cohort Analysis

Method #2: Post-Period Analysis

Method #3: Time-Staggered Treatments

Method #4: Holdback and Reverse Experiment

References

Index

Ron Kohavi is a VP and Technical Fellow at Airbnb. He was previously a Technical Fellow and Corporate VP at Microsoft. Prior to Microsoft, he was the director of data mining and personalization at Amazon.com. He has a PhD in Computer Science from Stanford University. His papers have over 40,000 citations, and three are among the 1,000 most-cited papers in Computer Science.

Diane Tang is a Google Fellow, with expertise in large-scale data analysis and infrastructure, online controlled experiments, and ads systems. She has an AB from Harvard and MS/PhD from Stanford, and has patents and publications in mobile networking, information visualization, experiment methodology, data infrastructure, and data mining / large data.

Ya Xu heads Data Science and Experimentation at LinkedIn. She has led LinkedIn to become one of the most well-regarded companies for A/B testing. Before LinkedIn, she worked at Microsoft and received a PhD in Statistics from Stanford University. She is widely regarded as one of the premier scientists, practitioners, and thought leaders in experimentation, with several patents and publications. She is also a frequent speaker at top conferences, universities, and companies across the country.

