Restaurant Review Dataset Cleaning and Database Integration

ABOUT THE PROJECT

This project involved preparing a restaurant review dataset of more than 50,000 records that was plagued by missing values, inconsistent data types, duplicates, and multi-valued fields. The cleaned data was uploaded to a MySQL database, providing a solid foundation for data management and enabling data-driven business insights in the restaurant sector.
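The load step described above can be sketched roughly as follows. This is an illustrative outline, not the project's actual code: the sample rows, the `restaurants` table name, and the `name` and `rating` columns are hypothetical, and an in-memory SQLite connection stands in for MySQL so the example runs without a database server.

```python
import sqlite3

import pandas as pd

# Hypothetical cleaned table; the real project data is not reproduced here.
restaurants = pd.DataFrame(
    {"name": ["Spice Hub", "Cafe Uno"], "rating": [4.1, 3.8]}
)
restaurants.index.name = "restaurant_id"

# Assumption: in-memory SQLite stands in for MySQL so the sketch is runnable.
# For MySQL, a SQLAlchemy engine would be passed to to_sql instead.
conn = sqlite3.connect(":memory:")

# Write the DataFrame, keeping the index as the restaurant_id column.
restaurants.to_sql("restaurants", conn, index=True, if_exists="replace")

# Read the rows back to confirm the load.
loaded = pd.read_sql(
    "SELECT restaurant_id, name, rating FROM restaurants ORDER BY restaurant_id",
    conn,
)
print(loaded)
conn.close()
```

Note that `to_sql` creates plain columns only. Primary and foreign key constraints would have to be declared separately in the target database.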

PROJECT TOOLS, SKILLS AND ACTIVITIES

- Defined clear project goals and success criteria to ensure data quality and readiness for analysis
- Conducted a thorough review of the raw datasets to understand their structure, completeness, and consistency
- Standardized the dataset by normalizing column names, fixing formats, and enforcing consistent data types
- Handled missing data using targeted imputation where possible and excluded records that could not be reliably recovered
- Removed duplicate records and ensured entity consistency using rule-based matching and deduplication
- Identified and addressed outliers using statistical methods and domain knowledge to improve analysis accuracy
- Engineered new features and aggregated metrics to enhance data understanding and management
- Split and exploded the multi-value columns Dish_liked, Rest_type, and Cuisines into usable formats, preserving their relationships with the primary table by using the pandas index and newly created columns as primary keys and foreign keys
- Validated the final dataset through summary statistics and spot checks, and delivered comprehensive documentation for easy handoff
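As a minimal sketch of the deduplication and multi-value restructuring steps listed above, assuming comma-separated values: only the column names `Cuisines` and `Rest_type` come from the project description. The sample rows, the `name` column, the `restaurant_id` key, and the `explode_column` helper are hypothetical and shown for illustration only.

```python
import pandas as pd

# Hypothetical sample rows; the real dataset is not reproduced here.
restaurants = pd.DataFrame(
    {
        "name": ["Spice Hub", "Cafe Uno", "Spice Hub"],
        "Cuisines": [
            "North Indian, Chinese",
            "Cafe, Italian , Desserts",
            "North Indian, Chinese",
        ],
        "Rest_type": ["Casual Dining", "Cafe, Bakery", "Casual Dining"],
    }
)

# Drop exact duplicates, then use the fresh pandas index as a surrogate primary key.
restaurants = restaurants.drop_duplicates().reset_index(drop=True)
restaurants.index.name = "restaurant_id"


def explode_column(df: pd.DataFrame, column: str, sep: str = ",") -> pd.DataFrame:
    """Return a child table with one row per (restaurant_id, value) pair."""
    # Split each cell into a list, emit one row per element, and trim whitespace.
    values = df[column].dropna().str.split(sep).explode().str.strip()
    # Discard empty strings left by stray separators.
    values = values[values.ne("").to_numpy()]
    # The parent index becomes the foreign key column restaurant_id.
    child = values.reset_index()
    # Add a surrogate primary key for the child table.
    child.insert(0, f"{column.lower()}_id", list(range(1, len(child) + 1)))
    return child


cuisines = explode_column(restaurants, "Cuisines")
rest_types = explode_column(restaurants, "Rest_type")
print(cuisines)
print(rest_types)
```

Each child table carries `restaurant_id`, so it can be joined back to the primary table or loaded into the database as a separate table with a foreign key.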

PROJECT LINKS

GitHub Repository: View Project on GitHub
Cleaned Datasets: View Datasets
