Forecasting Project Delays with Contextual Task Information and Machine Learning
Xiaojia Guo (Robert H. Smith School of Business, UMD), Yael Grushka-Cockayne (Harvard Business School), Bert De Reyck (Lee Kong Chian School of Business, Singapore Management University)
Delays and cost overruns are inherent problems in most projects. For instance, Apple canceled its wireless charging pad, the AirPower, after numerous delays, when the company failed to achieve what it had hoped (CNBC 2019). The renovation of Rio’s Maracana stadium was completed four months behind schedule, creating a significant risk of having nowhere to hold the 2014 World Cup final (BBC News 2013). Another major construction project, the bullet train that would link San Francisco to San Diego, was canceled in 2019 due to massive delays, cost overruns and mismanagement (Los Angeles Times 2019).
In this paper, we propose a new approach to predict project durations or delays using machine learning and Natural Language Processing (NLP) techniques. Specifically, we first apply NLP techniques to extract characteristics from individual tasks' names and descriptions. Next, the characteristics extracted from the target task, together with those of the task’s predecessors, successors and parallel tasks, are used as predictors to forecast the distribution of the task’s duration. With the predicted task-duration distributions at hand, the distribution of the entire project's duration can be obtained by simulation.
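The final simulation step can be illustrated with a minimal Monte Carlo sketch. This is not the authors' implementation: the task names, precedence links and triangular (low, mode, high) durations below are hypothetical stand-ins for the task-level predictive distributions that the forecasting model would supply.

```python
import random
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical tasks: name -> (predecessors, (low, mode, high) duration in days).
# In the approach described above, these distributions would come from the
# task-level forecasting model; the triangular shape is only an assumption.
TASKS = {
    "excavate": ([], (4, 5, 9)),
    "foundation": (["excavate"], (8, 10, 16)),
    "framing": (["foundation"], (12, 15, 25)),
    "plumbing": (["foundation"], (5, 7, 12)),
    "inspection": (["framing", "plumbing"], (1, 2, 4)),
}


def simulate_project(tasks, n_runs=10_000, seed=0):
    """Return sorted simulated project durations, one per run."""
    rng = random.Random(seed)
    # Order the tasks so that every predecessor is processed before its successors.
    graph = {name: preds for name, (preds, _) in tasks.items()}
    order = list(TopologicalSorter(graph).static_order())
    totals = []
    for _ in range(n_runs):
        finish = {}
        for name in order:
            preds, (low, mode, high) = tasks[name]
            # A task starts once all of its predecessors have finished.
            start = max((finish[p] for p in preds), default=0.0)
            finish[name] = start + rng.triangular(low, high, mode)
        # The project ends when its last task finishes.
        totals.append(max(finish.values()))
    return sorted(totals)


def quantile(sorted_values, q):
    """Empirical quantile of an already sorted list, for 0 <= q <= 1."""
    index = min(len(sorted_values) - 1, int(q * len(sorted_values)))
    return sorted_values[index]


if __name__ == "__main__":
    durations = simulate_project(TASKS)
    print("median:", round(quantile(durations, 0.5), 1))
    print("90th percentile:", round(quantile(durations, 0.9), 1))
```

Reading quantiles off the simulated durations gives a distributional forecast for the whole project rather than a single point estimate.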
Since most predictive models can only handle numerical or categorical variables, text cannot be used directly as a predictor. In the operations and marketing literature, NLP techniques have been applied to measure the sentiment of consumer reviews (Archak et al. 2011, Ghose et al. 2012), online chats (Das and Chen 2007), and movie scripts (Eliashberg et al. 2007). Sentiment analysis is intuitive, and its results can easily be used as variables in a predictive model. However, since the names and descriptions of project tasks do not carry strong sentiment, other NLP techniques need to be explored.
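As an illustration of turning task text into numerical predictors without relying on sentiment, the sketch below builds simple bag-of-words count vectors. This is a generic technique chosen for illustration, not necessarily the one used in the paper; the task names, precedence links and planned durations are hypothetical.

```python
import re
from collections import Counter

# Hypothetical task data, for illustration only.
TASK_NAMES = {
    "T1": "Excavate site and pour foundation",
    "T2": "Install steel frame",
    "T3": "Install roof and inspect frame",
}
PREDECESSORS = {"T1": [], "T2": ["T1"], "T3": ["T2"]}
PLANNED_DAYS = {"T1": 10.0, "T2": 15.0, "T3": 7.0}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocabulary(texts):
    """Sorted list of the distinct tokens that appear in the texts."""
    return sorted({token for text in texts for token in tokenize(text)})


def vectorize(text, vocab):
    """Bag-of-words count vector for one text over a fixed vocabulary."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocab]


def feature_row(task_id, vocab):
    """Own text counts, summed predecessor text counts, and one numeric predictor."""
    own = vectorize(TASK_NAMES[task_id], vocab)
    pred = [0] * len(vocab)
    for p in PREDECESSORS[task_id]:
        for i, count in enumerate(vectorize(TASK_NAMES[p], vocab)):
            pred[i] += count
    return own + pred + [PLANNED_DAYS[task_id]]


VOCAB = build_vocabulary(TASK_NAMES.values())

if __name__ == "__main__":
    print(VOCAB)
    print(feature_row("T2", VOCAB))
```

Each row concatenates the target task's word counts, the summed word counts of its predecessors, and a non-text predictor, so any standard regression or classification model could consume it.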
There are two major challenges when using text information to forecast project delays. First, it is not trivial to use text information to model the structure of the project, that is, the dependencies between tasks. Second, in the existing literature, predictive models that use numeric representations of text as predictors often consider only the effect of the text, without incorporating other predictors. In this paper, we propose new models to address these two challenges. We build and test our new approach using a unique data set that contains over 250,000 schedules and actual timelines of construction projects in the UK and EU.
(More Details Coming Soon)
References:
CNBC. 2019. Apple cancels AirPower, the wireless charging pad it announced over a year ago. https://www.cnbc.com/2019/03/29/apple-cancels-airpower.html.
BBC News. 2013. ‘Problems’ as Maracana stadium reopens in Rio. https://www.bbc.co.uk/news/world-latin-america-22320825.
Los Angeles Times. 2019. Bullet train went from peak California innovation to the project from hell. https://www.latimes.com/local/lanow/la-me-bullet-train-california-problems-20190213-story.html.
Archak, N., Ghose, A., Ipeirotis, P.G., 2011. Deriving the pricing power of product features by mining consumer reviews. Management Science, 57(8), pp.1485–1509.
Ghose, A., Ipeirotis, P.G., Li, B., 2012. Designing ranking systems for hotels on travel search engines by mining user-generated and crowdsourced content. Marketing Science, 31(3), pp.493–520.
Das, S.R., Chen, M.Y., 2007. Yahoo! for Amazon: Sentiment extraction from small talk on the web. Management Science, 53(9), pp.1375–1388.
Eliashberg, J., Hui, S.K., Zhang, Z.J., 2007. From story line to box office: A new approach for green-lighting movie scripts. Management Science, 53(6), pp.881–893.