Stage 0: Form Team
Lokananda Dhage
Mary Feng
Stage 1: Problem Definition
We are planning to analyze information about restaurants in the Madison, WI area. We obtained data from the Zomato API and the Yelp Dataset Challenge. Each Yelp review and each Zomato review will be one of our text documents for Stage 2.
Stage 2: Information Extraction
We performed information extraction on 300 randomly selected Yelp reviews.
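The random selection step can be sketched as follows. This is only an illustration: the report does not say how the 300 reviews were drawn, so the fixed seed, the function name sample_reviews, and the placeholder review list are all assumptions.

```python
import random

def sample_reviews(reviews, k=300, seed=0):
    """Return k reviews chosen uniformly at random, without replacement."""
    rng = random.Random(seed)  # fixed seed (an assumption) so the sample can be redrawn
    return rng.sample(reviews, k)

# Hypothetical stand-in for the full list of Yelp review texts.
all_reviews = [f"review {i}" for i in range(1000)]
selected = sample_reviews(all_reviews)
```

Seeding a private Random instance keeps the sample reproducible without touching the global random state.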
Stage 3: Entity Matching
Because each table in our Yelp/Zomato dataset had fewer than 3,000 tuples, we switched to a different dataset for this stage of the project. We performed entity matching between a Song table with 961,593 tuples and a Track table with 734,485 tuples.
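With tables of this size, comparing every Song tuple against every Track tuple is impractical, so entity matching is usually preceded by a blocking step. The sketch below shows one possible approach, token blocking followed by Jaccard similarity on titles. The report does not state which blocker or matcher was used; the title-only comparison, the 0.5 threshold, and the sample rows are assumptions made for illustration.

```python
from collections import defaultdict

def tokens(text):
    """Lowercased set of whitespace-separated words."""
    return set(text.lower().split())

def jaccard(a, b):
    """Jaccard similarity of two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def match(songs, tracks, threshold=0.5):
    """Return (song_id, track_id) pairs whose titles are similar enough."""
    # Blocking: index tracks by title token so each song is compared
    # only with tracks that share at least one word with it.
    index = defaultdict(set)
    for track_id, title in tracks.items():
        for tok in tokens(title):
            index[tok].add(track_id)

    matches = []
    for song_id, title in songs.items():
        song_tokens = tokens(title)
        candidates = set()
        for tok in song_tokens:
            candidates |= index.get(tok, set())
        for track_id in sorted(candidates):
            if jaccard(song_tokens, tokens(tracks[track_id])) >= threshold:
                matches.append((song_id, track_id))
    return matches

# Made-up rows, not taken from the Song or Track tables.
songs = {1: "Hey Jude", 2: "Let It Be"}
tracks = {"a": "hey jude remastered", "b": "Yesterday", "c": "let it be"}
result = match(songs, tracks)
```

A real matcher would likely also compare artist, year, or other attributes; this sketch uses titles only to keep the blocking idea visible.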
Stage 4: Data Merging
We returned to our Yelp/Zomato dataset and combined the Yelp and Zomato tables into a single dataset in CSV format.
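One way to combine a matched Yelp row and Zomato row into a single CSV record is sketched below. The merge rule (keep the Yelp value unless it is empty), the column names, and the sample restaurant are all hypothetical; the report does not describe the schema or the rules that were actually used.

```python
import csv
import io

def merge_records(yelp_row, zomato_row):
    """Combine two matched rows; keep the Yelp value unless it is empty."""
    merged = dict(zomato_row)
    for key, value in yelp_row.items():
        if value != "" or key not in merged:
            merged[key] = value
    return merged

# Made-up rows with hypothetical column names.
yelp = {"name": "Cafe Example", "city": "Madison", "yelp_stars": "4.0"}
zomato = {"name": "Cafe Example", "city": "", "zomato_rating": "3.9"}
row = merge_records(yelp, zomato)

# Write the merged record as CSV (to an in-memory buffer here).
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=sorted(row))
writer.writeheader()
writer.writerow(row)
csv_text = buffer.getvalue()
```

Writing to a file instead of a buffer only requires replacing io.StringIO() with open(path, "w", newline="").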
Stage 5: Data Analysis
We performed correlation discovery on our merged Yelp/Zomato file.
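As a minimal sketch of what correlation discovery between two numeric columns can look like, the code below computes the Pearson correlation coefficient. The report does not say which correlation measure or which columns were used; the column names and numbers here are invented for illustration and are not results from the merged file.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Invented values for two hypothetical rating columns.
yelp_stars = [1.0, 2.0, 3.0, 4.0]
zomato_rating = [2.0, 4.0, 6.0, 8.0]
r = pearson(yelp_stars, zomato_rating)
```

The function assumes neither column is constant; a constant column makes the denominator zero and the coefficient undefined.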