How to Estimate the Probability of a No-Show using Binary Logistic Regression

In April 2017, overbooking of flight seats hit the headlines when a United Airlines customer was dragged off a flight. A TED talk by Nina Klietsch gives a good but simplistic explanation of why overbooking is so attractive to airlines.

Overbooking is not new to airlines. The US Civil Aeronautics Board officially sanctioned the practice in 1965, and since then complex statistical models have been researched and developed to set ticket pricing and overbooking strategies that deliver maximum revenue to the airlines.

In this post, I would like to look at one aspect of this: the probability of a no-show (a customer not turning up for a flight). In her talk, Klietsch assumes that this probability is identical for all customers. In reality it is not: factors such as time of day, price, time since booking, and whether a traveler is alone or in a group all affect the probability of a no-show.

By using this information about our customers, we can predict the probability of a no-show using binary logistic regression. This type of modeling is common to many services and industries. Some of the applications, in addition to predicting no-shows, include:

- Credit scores: What is the probability of default?
- Marketing offers: What are the chances you'll buy a product based on a specific offer?
- Quality: What is the probability of a part failing?
- Human resources: What is the probability that an employee will be absent due to sickness?

In all cases, the outcome (the event you are predicting) is binary: it falls into one of two groups, for example purchase/no purchase, pass/fail, or show/no-show. Using the characteristics of your customers or parts as predictors, you can use this modeling technique to predict the probability of that outcome.
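To make the idea concrete, the sketch below shows the form of model that binary logistic regression fits: the log of the odds of the event is a linear combination of the predictors, and the logistic function turns that log-odds into a probability between 0 and 1. The function names and coefficient values are hypothetical, chosen only for illustration.

```python
import math
from typing import Sequence


def log_odds(intercept: float, coefs: Sequence[float], x: Sequence[float]) -> float:
    """Linear predictor b0 + b1*x1 + ... + bk*xk, i.e. the log-odds of the event."""
    return intercept + sum(b * xi for b, xi in zip(coefs, x))


def event_probability(intercept: float, coefs: Sequence[float], x: Sequence[float]) -> float:
    """Convert the log-odds to a probability with the logistic function,
    using a form that avoids overflow for large negative or positive log-odds."""
    eta = log_odds(intercept, coefs, x)
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


# Hypothetical model: intercept -1.0 and two predictors with coefficients 0.5 and 2.0
p = event_probability(-1.0, [0.5, 2.0], [1.0, 0.0])
```

Whatever the predictors are (income, flight time, fare), the output is always a probability, which is what makes the same technique usable for purchases, defaults, failures, and no-shows.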

Let’s look at an example. I was unable to find any airline data, so I am illustrating this with one of our Minitab sample data sets, Cerealpurchase.mtw.

In this example, a food company surveys consumers to measure how effective its television ad is at getting viewers to buy its cereal. The Bought column has the value 1 if the respondent purchased the cereal and 0 if not. In addition to asking whether respondents have seen the ad, the survey gathers data on household income and the number of children, which the company believes might also influence the purchase of this cereal.

Using **Stat > Regression > Binary Logistic Regression**, I entered the response I wanted to predict, **Bought**, and the Response Event value (1) that indicates a purchase. I then entered the continuous predictor **Income** and the categorical predictors **Children** and **ViewAd**. My completed dialog box looks like this:

After I press OK, Minitab performs the analysis and displays the results in the Session window. The table at the top of the output shows that the researchers surveyed a sample of 71 customers, 22 of whom purchased the cereal.

With logistic regression, the output features a Deviance Table instead of an Analysis of Variance table. The calculations and test statistics used with this type of data are different, but we still use the P-value in the far-right column to determine which terms have a statistically significant effect on the response.
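As a rough illustration of the idea behind a deviance table (not Minitab's internal code), the sketch below computes the deviance of a binary model as -2 × log-likelihood, then turns the increase in deviance from removing a term into a chi-square statistic and p-value. We assume a likelihood-ratio test here; Minitab's documentation gives the exact definitions it uses. The function names are hypothetical, and the shortcut `erfc(sqrt(x/2))` is the chi-square tail probability only for 1 degree of freedom, which fits a term with a single coefficient (a continuous predictor like Income, or a two-level categorical predictor).

```python
import math
from typing import Sequence, Tuple


def deviance(y: Sequence[int], p: Sequence[float]) -> float:
    """Deviance of a model for 0/1 outcomes y with fitted event probabilities p:
    -2 * log-likelihood (for one-row-per-respondent data)."""
    eps = 1e-15  # keep log() finite if a fit is exactly 0 or 1
    loglik = 0.0
    for yi, pi in zip(y, p):
        pi = min(max(pi, eps), 1.0 - eps)
        loglik += yi * math.log(pi) + (1 - yi) * math.log(1.0 - pi)
    return -2.0 * loglik


def lr_test_one_df(dev_without_term: float, dev_with_term: float) -> Tuple[float, float]:
    """Likelihood-ratio test for a term with one coefficient: the chi-square
    statistic is the increase in deviance when the term is removed, and its
    p-value comes from a chi-square distribution with 1 degree of freedom."""
    chi2 = max(dev_without_term - dev_with_term, 0.0)
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # P(chi-square(1) > chi2)
    return chi2, p_value
```

A small increase in deviance when a term is dropped gives a large p-value, which is the signal that the term can be removed.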

As with other regression methods, we reduce the model by eliminating non-significant terms one at a time. In this case, as highlighted above, Income is not significant, so we can simply press Ctrl+E to recall the last dialog box, remove the Income term from the model, and rerun the analysis. Minitab returns the following results:

After removing Income, we can see that both Children and ViewAd are significant at the 0.05 significance level. This could be good news for the Marketing Department, as it suggests that viewing the ad did influence the decision to buy. However, this table does not show whether the effect is positive or negative.
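Minitab handles the refitting for you. Purely to make the "remove a term and refit" step concrete, here is a minimal pure-Python sketch that fits a logistic regression by Newton-Raphson (a standard way to obtain maximum-likelihood estimates; we do not claim it is Minitab's exact algorithm), then refits without one column and compares deviances. The function names are hypothetical, and the 1-df p-value assumes the dropped term has a single coefficient.

```python
import math
from typing import List, Tuple


def _sigmoid(eta: float) -> float:
    """Numerically stable logistic function."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_logistic(X: List[List[float]], y: List[int],
                 max_iter: int = 50, tol: float = 1e-10) -> Tuple[List[float], float]:
    """Maximum-likelihood logistic fit by Newton-Raphson.

    Each row of X must start with 1.0 (the intercept column).
    Returns (coefficients, deviance), where deviance = -2 * log-likelihood.
    """
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        p = [_sigmoid(sum(b * xj for b, xj in zip(beta, row))) for row in X]
        # Gradient of the log-likelihood: X^T (y - p)
        grad = [sum((yi - pi) * row[j] for row, yi, pi in zip(X, y, p)) for j in range(k)]
        # Information matrix: X^T W X with W = diag(p * (1 - p))
        info = [[sum(pi * (1.0 - pi) * row[i] * row[j] for row, pi in zip(X, p))
                 for j in range(k)] for i in range(k)]
        step = _solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    p = [_sigmoid(sum(b * xj for b, xj in zip(beta, row))) for row in X]
    eps = 1e-15  # avoid log(0)
    loglik = sum(yi * math.log(max(pi, eps)) + (1 - yi) * math.log(max(1.0 - pi, eps))
                 for yi, pi in zip(y, p))
    return beta, -2.0 * loglik


def drop_term_test(X: List[List[float]], y: List[int], col: int) -> Tuple[float, float]:
    """Refit without column `col`; return the likelihood-ratio chi-square
    (increase in deviance) and its 1-df p-value."""
    _, dev_full = fit_logistic(X, y)
    X_reduced = [row[:col] + row[col + 1:] for row in X]
    _, dev_reduced = fit_logistic(X_reduced, y)
    chi2 = max(dev_reduced - dev_full, 0.0)
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))
```

To mirror the analysis in this post, each row of `X` would hold 1.0 for the intercept, Income, and 0/1 indicators for Children and ViewAd, and `col` would be the index of the Income column.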

To understand this, we need to look at another part of the output. Binary logistic regression models the log of the odds of the event as a linear function of the predictors, so the natural way to interpret each effect is through its Odds Ratio. The Odds Ratio compares the odds of the event under two conditions by dividing the odds of success under condition A by the odds of success under condition B.
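The definition above translates directly into code. This minimal sketch (the helper names are hypothetical) shows how odds and an odds ratio are computed from two probabilities.

```python
def odds(p: float) -> float:
    """Odds of an event that has probability p."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return p / (1.0 - p)


def odds_ratio(p_a: float, p_b: float) -> float:
    """Odds of success under condition A divided by odds under condition B."""
    return odds(p_a) / odds(p_b)
```

For example, `odds_ratio(0.5, 0.2)` gives 4 (odds of 1 versus odds of 0.25), even though the first probability is only 2.5 times the second. An odds ratio is not a ratio of probabilities.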

In this example, the Odds Ratio for Children tells us that the odds of purchasing the cereal for respondents who reported having children are 5.1628 times the odds for those who did not report having children. The good news for the Marketing Department is that the odds of purchase for customers who viewed the ad were 3.0219 times the odds for those who did not. If the Odds Ratio were less than 1, we would conclude that seeing the ad reduces the likelihood of purchase!
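A little arithmetic shows what these odds ratios mean for probabilities. In a logistic model, the odds ratio for a 0/1 predictor is exp(coefficient), and applying it multiplies the odds, not the probability. The sketch below is illustrative only: the helper names are hypothetical, and it uses the rounded 6% purchase probability reported in this post for customers without children who have not seen the ad as a baseline, so the results are approximate.

```python
import math


def odds_ratio_from_coef(coef: float) -> float:
    """Odds ratio implied by a logistic-regression coefficient for a 0/1 predictor."""
    return math.exp(coef)


def apply_odds_ratio(p_baseline: float, odds_ratio: float) -> float:
    """Probability obtained by multiplying the baseline odds by an odds ratio."""
    new_odds = odds_ratio * p_baseline / (1.0 - p_baseline)
    return new_odds / (1.0 + new_odds)


# Illustration with rounded figures quoted in this post (approximate)
p_none = 0.06                               # no children, has not seen the ad
p_ad = apply_odds_ratio(p_none, 3.0219)     # same group, but has seen the ad
print(f"with ad: {p_ad:.3f}; probability ratio: {p_ad / p_none:.2f}")
```

The probability roughly goes from 6% to about 16%, a ratio noticeably smaller than 3.0219. Because the reduced model contains only Children and ViewAd with no interaction term, multiplying the baseline odds by both odds ratios (5.1628 × 3.0219) gives a probability of roughly 0.5, in line with the 51% shown in the table of probabilities in this post once rounding of the 6% baseline is allowed for.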

The other way to look at these results is to calculate the probability of purchase and analyse this.

It is easy to calculate the probability of a sale by clicking the **Storage** button in the **Binary Logistic Regression** dialog box and checking the box labeled **Fits (event probabilities)**. This stores each customer's probability of purchase in the worksheet.
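Conceptually, each stored fit is just the logistic function applied to that row's linear predictor. The sketch below mimics that step outside Minitab. The coefficients and worksheet rows are hypothetical (they are not Minitab's output or the CerealPurchase data), and Children and ViewAd are assumed to be coded 0/1 with 0 as the reference level.

```python
import math
from typing import Dict, List

# Hypothetical coefficients for illustration only (NOT Minitab's fitted values).
# Children and ViewAd are assumed to be coded 0/1, with 0 as the reference level.
HYPOTHETICAL_COEFS = {"Intercept": -2.7, "Children": 1.6, "ViewAd": 1.1}


def fitted_probability(row: Dict[str, float], coefs: Dict[str, float]) -> float:
    """Event probability for one row: logistic function of the linear predictor."""
    eta = coefs["Intercept"] + sum(
        value * row[name] for name, value in coefs.items() if name != "Intercept"
    )
    return 1.0 / (1.0 + math.exp(-eta))


def store_fits(rows: List[Dict[str, float]], coefs: Dict[str, float]) -> None:
    """Add a 'Fit' column (event probability) to every row, in place."""
    for row in rows:
        row["Fit"] = fitted_probability(row, coefs)


worksheet = [  # hypothetical rows, not the CerealPurchase data
    {"Children": 1, "ViewAd": 1},
    {"Children": 0, "ViewAd": 0},
]
store_fits(worksheet, HYPOTHETICAL_COEFS)
```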

Using the fits data, we can produce a table summarizing the Probability of Purchase for all the combinations of Children and ViewAd, as follows:

In the rows we have the Children indicator, and in the columns we have the ViewAd indicator. In each cell the top number is the probability of cereal purchase, and the bottom number is the count of customers observed in each of the groups.

Based on this table, customers with children who have seen the ad have a 51% chance of purchase, whereas customers without children who have not seen the ad have a 6% chance of purchase.
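A table like this can be reproduced from stored fits by grouping on the two indicators. The sketch below is one way to do it, assuming each row carries 0/1 `Children` and `ViewAd` values and a stored `Fit`; the function names and the example rows in the usage note are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Cell = Tuple[int, int]  # (Children, ViewAd), assumed coded 0/1


def summarize_fits(rows: List[dict]) -> Dict[Cell, Tuple[float, int]]:
    """Mean stored event probability and number of customers in each
    (Children, ViewAd) combination."""
    totals: Dict[Cell, float] = defaultdict(float)
    counts: Dict[Cell, int] = defaultdict(int)
    for row in rows:
        cell = (row["Children"], row["ViewAd"])
        totals[cell] += row["Fit"]
        counts[cell] += 1
    return {cell: (totals[cell] / counts[cell], counts[cell]) for cell in counts}


def print_table(summary: Dict[Cell, Tuple[float, int]]) -> None:
    """Rows = Children, columns = ViewAd; each cell shows probability and count."""
    print("Children\\ViewAd", "0", "1", sep="\t")
    for children in (0, 1):
        cells = []
        for view_ad in (0, 1):
            if (children, view_ad) in summary:
                prob, n = summary[(children, view_ad)]
                cells.append(f"{prob:.2f} (n={n})")
            else:
                cells.append("-")
        print(children, *cells, sep="\t")
```

Because the reduced model contains only Children and ViewAd, every customer in the same cell has the same fitted probability, so the mean in each cell is simply that shared value.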

Now let's bring this back to our airline example. Using information about its customers' demographics and flight preferences, an airline can use binary logistic regression to estimate the probability of a no-show for each passenger on a flight and then determine how many seats to overbook. Of course, no model is perfect, and as we saw with United, getting it wrong can have severe consequences.
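As a rough sketch of that last step, assume (as a simplification) that passengers show up independently of one another. The predicted no-show probabilities for individual bookings then define a Poisson-binomial distribution for how many passengers turn up. The code below computes that distribution and picks the largest number of bookings whose chance of more passengers than seats stays at or below a chosen risk level. The function names and this risk-threshold rule are assumptions for illustration; real airline revenue-management models also weigh fares and compensation costs, and group travel breaks the independence assumption.

```python
from typing import List


def show_up_distribution(no_show_probs: List[float]) -> List[float]:
    """dist[k] = probability that exactly k of the booked passengers show up,
    assuming passengers behave independently (a Poisson-binomial distribution)."""
    dist = [1.0]  # before any bookings, 0 passengers show with certainty
    for p_no_show in no_show_probs:
        new = [0.0] * (len(dist) + 1)
        for k, prob in enumerate(dist):
            new[k] += prob * p_no_show              # this passenger is a no-show
            new[k + 1] += prob * (1.0 - p_no_show)  # this passenger turns up
        dist = new
    return dist


def bumping_probability(no_show_probs: List[float], seats: int) -> float:
    """Probability that more passengers show up than there are seats."""
    return sum(show_up_distribution(no_show_probs)[seats + 1:])


def max_bookings(no_show_probs: List[float], seats: int, risk: float) -> int:
    """Largest number of bookings, taken in list order, whose bumping
    probability stays at or below `risk` (a hypothetical decision rule)."""
    best = min(seats, len(no_show_probs))
    for n in range(seats + 1, len(no_show_probs) + 1):
        if bumping_probability(no_show_probs[:n], seats) > risk:
            break  # adding passengers can only raise the risk, so stop here
        best = n
    return best
```

Here `no_show_probs` would be the fitted no-show probabilities from a logistic model like the one above, one per prospective booking.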

Original: http://blog.minitab.com/blog/using-data-and-statistics/how-to-estimate-the-probability-of-a-no-show-using-binary-logistic-regression

By: Gillian Groom

Posted: August 3, 2017, 1:57 pm
