hsudarshan
Goal: Learn parallel processing of big data using Hadoop MapReduce, and build dashboard(s) for analysis and visualization of the results.
Context: Classroom scheduling for courses is a complex problem. It is all the more difficult in a department where enrollments, the number of courses, and class sizes are all increasing. Consider the case of this course (CSE4/587): I requested a larger room at the beginning of the semester. We have to send a formal request through a departmental secretary; the reply comes a week later, and it is always negative. For example, they could not give a room larger than NSC 215 (150 cap) for this course. I also requested a larger room for the midterm exam. The answer was negative. I found out that all the information about courses and classrooms is in a database and is publicly available through a web site: http://www.buffalo.edu/class-schedule?semester=spring, for example, gives the Spring semester’s courses. A web crawler can get this information by scraping the web site, which yields a very large body of unstructured text data. Any authorized person can also get this information directly from the database. That’s what we have. I will send you the link in a message to the class. DO NOT SHARE it with anybody beyond this course. Download this data and save it as a CSV file named CourseRoom.csv. This is the main data set you will work with. You can get other data sets based on the needs of your analysis.