This project analyzes hospital inpatient operations, efficiency, and costs using the NY SPARCS De-Identified Inpatient Discharges (2021) dataset.
The goal is to evaluate whether hospitals are running smoothly and identify opportunities for improving:
- Staff planning
- Bed utilization
- Cost control
- Emergency workload management
The project follows a real-world data analyst workflow using Excel, SQL, Python, and Power BI, designed for executive, operations, and finance stakeholders.
- Assess hospital operational efficiency
- Identify high-cost and high length-of-stay (LOS) drivers
- Improve:
- Bed utilization
- Staff planning
- Cost control
- Emergency admissions management
Source: NY SPARCS Inpatient Discharges (De-Identified) – 2021
Original Dataset Size
- Rows: 2.13 million
- Columns: 33
Used in This Project
- Rows: 100,000 (sampled due to system limitations)
Key Fields Used
- Admission type
- Length of stay (LOS)
- Diagnosis (CCSR)
- Hospital / Facility
- Payment typology
- Emergency indicator
- Total charges
- Total costs
Note: Sampling was done only due to hardware constraints.
The analysis logic is fully scalable to the complete dataset.
Purpose: Initial data validation and understanding
Steps Performed
- Opened and explored raw data
- Checked date formats
- Handled missing values
- Removed duplicates
- Performed consistency checks
- Validated numeric columns (LOS, costs, charges)
Analysis
- Pivot tables for patient counts, LOS, and costs
- Quick charts for high-level insights
Purpose: Answer stakeholder-driven business questions
SQL Concepts Used
COUNT,SUM,AVGMIN,MAXGROUP BY,ORDER BYWHERE,HAVINGCASE WHENDISTINCT,TOPJOIN,LEFT JOIN
Key Insights Generated
- Top diagnoses by patient volume and cost
- Emergency vs non-emergency cost comparison
- Length of stay by hospital and age group
- Revenue vs cost by payment type
Purpose: Deeper exploratory analysis and validation
Steps Performed
- Data cleaning & preprocessing
- Exploratory Data Analysis (EDA)
- Visualization of:
- Cost drivers
- Length of stay patterns
- Age-group trends
Purpose: Communicate insights clearly to decision-makers
Audience: Hospital Directors / Senior Management
Audience: Hospital Operations & Admin
Audience: Finance & Medical Teams
- Emergency admissions contribute disproportionately higher costs
- Older age groups (50+) have longer average hospital stays
- Certain diagnoses drive both high costs and long LOS
- Government insurance dominates patient volume but shows margin pressure
- Private insurance demonstrates more stable revenue-to-cost balance