CMSC498T Introduction to Data Science II: Exploring, modeling and communicating with data
This page and the main course page serve as the syllabus for this class.
Data science encapsulates the interdisciplinary activities required to create data-centric products and applications that address specific scientific, socio-political or business questions. It has drawn tremendous attention from both academia and industry and is making deep inroads in industry, government, health and journalism.
This course focuses on (i) exploratory and statistical data analysis, (ii) data and information visualization, and (iii) the presentation and communication of analysis results. It will be centered around case studies and projects drawing extensively from applications.
Héctor Corrada Bravo
Center for Bioinformatics and Computational Biology
Department of Computer Science
Office: 3114F Biomolecular Sciences Building
Phone Number: 301-405-2481
Lecture Meeting times
Tuesday and Thursday, 2:00pm-3:15pm
Room CSI 2120
Office Hours: Friday 1:00pm-2:00pm in AVW 3223, and by appointment
TA: Wikum Dinalankara
Office Hours: TBD
Evaluation (see section below on logistics for details)
- Data Projects (4) (40%)
- In-class work (15%)
- Written homework (15%)
- Presentations (10%)
- Class participation (5%)
- Final project (15%)
Textbook information: There is no required textbook. However, we will be drawing heavily from these two sources:
- N. Zumel and J. Mount. Practical Data Science with R. Manning Publications Co. 2014. Readings from this book are posted on Canvas: https://myelms.umd.edu/courses/1130499. Readings from this book are listed as Zumel & Mount.
- G. James, D. Witten, T. Hastie and R. Tibshirani. An Introduction to Statistical Learning with Applications in R. Springer 2013. A free PDF of this book can be downloaded at their site. Readings from this book are listed as ISL.
Additional readings will be posted in ELMS https://myelms.umd.edu/courses/1130499.
1) Students will be able to create specific requirements for a data-centric application used to address a specific problem or
2) Students will be able to identify and select appropriate tools (language, libraries and data resources) to meet specific requirements for a data-centric application.
3) Students will be able to build and disseminate a data-centric application from a set of specific requirements using existing tools, libraries, data resources and publishing mechanisms.
Tentative Course Organization
- What is Data Science? Who is a Data Scientist? The scope of Data Science: the open data movement, science, business, government, education, sport, and more
- Introduction to the R data analysis environment
- Formulating data-centric answers to scientific, business and social questions
- Best practices: organizing projects, managing collaborations and expectations
Exploratory Data Analysis
- Data scraping, cleaning and summarization
- Visualization I: visualizing to explore
- Exploration in scale: introduction to map reduce
Computational and Statistical Data Analysis
- From exploration to inference: quantifying variation and uncertainty
- Linear modeling: regression and prediction
- Going further in prediction: machine learning
- Visualization II: visualizing inferences and uncertainty
Mining massive datasets
- Regression and prediction at scale
- Mining time series and data streams
- Mining networks
- Large-scale clustering and other unsupervised methods
Communicating with data
- Writing with and about data: communicating the result of a data analysis
- Visualization III: information visualization, visualizing for an audience
- Putting it together: interactive data dissemination on the web (d3.js and related technologies)
Logistics
- There will be reading assignments. Students are expected to have read the material before class.
- Students are expected to attend lectures. Active participation is expected. There will be graded work done in class.
- Assignments are to be handed in electronically or in class, as instructed, on their due date. Late assignments will not be accepted.
- There will be graded work to be done in class. Students not in class that day, except for an excused absence, will not be able to complete that work outside class.
- Students may discuss homework assignments and projects in groups. However, each student must write and/or program solutions independently.
- Posting project solutions in a public online location without express consent and permission from the instructor is a violation of academic integrity policy.
- Cell phone use is prohibited during lecture. Laptop use is allowed to the extent that students demonstrably use it to follow along with an in-class analysis or demonstration.
- Using or referencing any materials from the web without proper citation is a violation of the honor code.
- In this course you are responsible for both the University’s Code of Academic Integrity and the University of Maryland Guidelines for Acceptable Use of Computing Resources. Any evidence of unacceptable use of computer accounts or unauthorized cooperation on tests, quizzes, or projects will be submitted to the Student Honor Council, which could result in an XF for the course, suspension, or expulsion from the University.
- Any student eligible for and requesting reasonable academic accommodations due to a disability is requested to provide, to the instructor in office hours, a letter of accommodation from the Office of Disability Support Services (DSS) within the first two weeks of the semester.
- Any student who must miss a class due to religious holidays should also notify the instructor during the first two weeks of class.
Policy on excused absences
- Any student who needs to be excused for an absence from a single lecture, recitation, or lab due to a medically necessitated absence shall:
a) Make a reasonable attempt to inform the instructor of his/her illness prior to the class.
b) Upon returning to the class, present their instructor with a self-signed note attesting to the date of their illness. Each note must contain an acknowledgment by the student that the information provided is true and correct. Providing false information to University officials is prohibited under Part 9(i) of the Code of Student Conduct (V-1.00(B) University of Maryland Code of Student Conduct) and may result in disciplinary action.
- Self-documentation: The self-documentation may not be used for the Major Scheduled Grading Events as defined below, and it may be used for only 1 class meeting (or more, if you choose) during the semester.
- Any student who needs to be excused for a prolonged absence (2 or more consecutive class meetings), or for a Major Scheduled Grading Event, must provide written documentation of the illness from the Health Center or from an outside health care provider. This documentation must verify dates of treatment and indicate the timeframe that the student was unable to meet academic responsibilities. In addition, it must contain the name and phone number of the medical service provider to be used if verification is needed. No diagnostic information will ever be requested.
Course evaluations are important, and the department and faculty take student feedback seriously. Students can go to www.courseevalum.umd.edu to complete their evaluations.