Understanding & Predicting Length of Stay (LOS) using Machine Learning

Length of Stay (LOS) is one of the most closely watched metrics in inpatient hospital settings. Wikipedia describes LOS as follows:
A common statistic associated with length of stay is the average length of stay (ALOS), a mean calculated by dividing the sum of inpatient days by the number of patients admissions with the same diagnosis-related group classification. Length of stay (LOS) is a term to describe the duration of a single episode of hospitalization. Inpatient days are calculated by subtracting day of admission from day of discharge.
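
The definition above can be sketched in a few lines of Python. This is a minimal illustration only: the stay records are made up, the DRG codes are placeholders, and the function names `length_of_stay` and `average_los_by_drg` are hypothetical helpers, not part of any Dexur tooling.

```python
from collections import defaultdict
from datetime import date


def length_of_stay(admitted: date, discharged: date) -> int:
    """Inpatient days: day of discharge minus day of admission."""
    return (discharged - admitted).days


def average_los_by_drg(stays):
    """Return ALOS per DRG from (drg_code, admitted, discharged) tuples."""
    totals = defaultdict(lambda: [0, 0])  # drg -> [inpatient days, admissions]
    for drg, admitted, discharged in stays:
        totals[drg][0] += length_of_stay(admitted, discharged)
        totals[drg][1] += 1
    # ALOS = sum of inpatient days / number of admissions in the same DRG
    return {drg: days / count for drg, (days, count) in totals.items()}


# Made-up example records (assumption: one tuple per hospitalization episode)
stays = [
    ("470", date(2023, 3, 1), date(2023, 3, 4)),    # 3 inpatient days
    ("470", date(2023, 3, 10), date(2023, 3, 12)),  # 2 inpatient days
    ("291", date(2023, 4, 2), date(2023, 4, 8)),    # 6 inpatient days
]
alos = average_los_by_drg(stays)
print(alos)
```

With these three records, the two stays in the first group average 2.5 days and the single stay in the second group has an ALOS of 6 days.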

LOS is a major focus area for insurance companies and hospitals. For example, Medicare, through its Bundled Payments for Care Improvement (BPCI) Initiative, aims to pay a flat fee for a given type of surgery, such as knee replacement. In that scenario, hospitals are strongly motivated to reduce the LOS for each surgery, since a shorter stay lowers the hospital's costs while the fee stays the same. Given that context, predictive capabilities around LOS are an important area for innovation.

Dexur analyzes large-scale medical claims data sets to identify LOS by discharge and diagnosis area for all hospitals. We have created a large, enriched LOS data set based on claims for all hospitals to aid in the development of machine learning models. If you are a healthcare researcher and want access to these data sets, please contact us and we can collaborate on a project. A simple chart based on the data set, showing the top Discharge Groups by LOS at Mayo Clinic at Rochester, is given below, and the details can be seen here.

In addition, we have shared five machine learning studies that try to predict and understand LOS in hospitals:

  1. Improved Prediction of Hospital Length of Stay for Severe Injury: There are limited beds in hospital trauma wards, and yet there is constant demand for these beds from the inflow of severely injured patients. Many patients are initially allocated to these beds when they could be better treated in another specialised ward. If patients with a hospital length of stay (LOS) of 2 days or less could be accurately distinguished from those who require longer stays, hospitals could make a more informed decision about whether to place them in another ward at admission, rather than wasting time and resources transferring them later. The study was conducted on two datasets: one with 2546 records from the Trauma Services Centre at the Royal Prince Alfred Hospital in Sydney, consisting of trauma patients admitted to the centre between 2007–11; the other from the Hospital das Forças Armadas in Portugal, with 17546 records collected from 2000–13 and covering a wide range of medical diagnoses. The authors investigate feature transformation and selection techniques in the construction of a LOS prediction model for trauma patients. They also apply and evaluate a comprehensive range of classification algorithms on data from the trauma domain as well as from a general hospital setting. In addition, the authors propose a new nearest neighbour (NN) algorithm, ranked NN, which takes into account the predictive relevance of features when computing the distance to the nearest neighbours.

  2. Length of Stay Prediction and Analysis through a Growing Neural Gas Model: This work presents a novel unsupervised LoS prediction model that the authors report performs better than other models commonly used for this kind of problem. The model autonomously detects the subset of non-class attributes to be considered in these classification tasks, and the structure of the trained self-organizing network can be analysed to extract the main factors that lead to exceeding the regional LoS threshold. The Growing Neural Gas (GNG) model is capable of identifying exactly the local dimension of the input space. The paper explains how the authors obtained more accurate predictions with GNG than with other algorithms commonly used for this kind of problem.

  3. Machine Learning Techniques for Predicting Hospital Length of Stay in Pennsylvania Federal and Specialty Hospitals: For inpatient care units, two variables play an important role in determining hospital resource utilization. The first is a patient’s hospital length of stay (LOS), and the second is readmissions [Kelly et al. (2013)]. Ideally, a hospital must minimize both variables to provide high-quality healthcare and improve resource utilization. Predicting hospital LOS allows a hospital to predict discharge dates for admitted patients, which in turn allows improved scheduling of elective admissions, leading to reduced variance in hospital bed occupancy. Predicting LOS also allows a hospital to scale its capacity during its long-term strategic planning. Using real-world data on 88 Pennsylvania Federal and Specialty hospitals, the authors compare the performance of three machine learning techniques for predicting LOS: Classification and Regression Tree (CART), Chi-Square Automatic Interaction Detection (CHAID) and Support Vector Regression (SVR). They find no significant difference in the performance of the three techniques. However, CART provides a decision tree that is easy to understand and interpret. The results from CART indicate that psychiatric care hospitals typically have higher LOS than non-psychiatric care hospitals. For non-psychiatric care hospitals, the LOS depends on hospital capacity (beds staffed): larger hospitals with more than 329 staffed beds have an average LOS of 13 weeks, versus about 3 weeks for smaller hospitals.

  4. A Comparison of Supervised Machine Learning Techniques for Predicting Short-Term In-Hospital Length of Stay Among Diabetic Patients: Due to the growing number of hospitalized diabetic patients, predicting the average length of stay (LOS) has become increasingly important for both resource planning and effective admission scheduling. LOS estimates are useful for planning future bed usage, determining specialists for patients with multiple diagnoses, determining health insurance schemes and reimbursement systems in the private sector, planning discharge dates for elderly patients, and allowing families to better plan for the return of their relatives. In this paper, the authors compare and discuss the performance of various supervised machine learning algorithms (i.e., multiple linear regression, support vector machines, multi-task learning, and random forests) for predicting long-term versus short-term length of stay of hospitalized diabetic patients.

  5. Real-time prediction of inpatient length of stay for discharge prioritization: Hospitals are challenged to provide timely patient care while maintaining high resource utilization. This has prompted hospital initiatives to increase patient flow and minimize non-value-added care time. Real-time demand capacity management (RTDC) is one such initiative, whereby clinicians convene each morning to predict which patients will be able to leave the same day and prioritize their remaining tasks for early discharge. The study's objective is to automate and improve these discharge predictions by applying supervised machine learning methods to readily available health information. The authors use these methods to predict patients’ likelihood of discharge by 2 p.m. and by midnight each day for an inpatient medical unit. Using data collected over 8000 patient stays and 20 000 patient days, the predictive performance of the model is compared to that of clinicians using sensitivity, specificity, Youden’s Index (i.e., sensitivity + specificity - 1), and aggregate accuracy measures. Compared to clinician predictions, the model demonstrated significantly higher sensitivity (P<.01), lower specificity (P<.01), and a comparable Youden Index (P>.10). Early discharges were less predictable than midnight discharges. The model was more accurate than clinicians in predicting the total number of daily discharges and was capable of ranking patients closest to future discharge. The authors conclude that there is potential to use readily available health information to predict daily patient discharges with accuracies comparable to clinician predictions. This approach may be used to automate and support daily RTDC predictions aimed at improving patient flow.
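
The evaluation measures named in the fifth study (sensitivity, specificity and Youden's Index) are straightforward to compute from binary predictions. The sketch below is only an illustration of those definitions: the labels are made-up toy data, not results from the paper, and `discharge_metrics` is a hypothetical helper name.

```python
def discharge_metrics(actual, predicted):
    """Return (sensitivity, specificity, Youden's Index) for 0/1 labels.

    1 = patient discharged that day, 0 = patient not discharged.
    """
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    sensitivity = tp / (tp + fn)  # share of actual discharges that were predicted
    specificity = tn / (tn + fp)  # share of non-discharges correctly ruled out
    youden = sensitivity + specificity - 1
    return sensitivity, specificity, youden


# Toy data (assumption): 4 patients discharged, 4 not discharged
actual = [1, 1, 1, 1, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0]
sensitivity, specificity, youden = discharge_metrics(actual, predicted)
print(sensitivity, specificity, youden)
```

A Youden's Index of 0 corresponds to predictions no better than chance and 1 to perfect predictions, which is why the study can report higher sensitivity and lower specificity for the model while the index stays comparable to that of clinicians.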