Capstone Project in Data Science

(Fall 2020, Winter 2021, Spring 2021)

The course will study data science from a systems engineering perspective, introduce and address a variety of ethical issues that arise in data science projects, and engage students in project-based learning through a series of carefully selected and curated data science studies. A major overarching goal is to prepare students to make a positive impact on the world with data-intensive methodologies. In line with this, we will study and discuss a number of case studies in “ethics in data science” that emphasize responsible data practice. Another major focus will be on correctly interpreting, explaining, and communicating the results of analyses. This component of the course will cover decision making under uncertainty, the distinction between correlation and causation, and common statistical traps and paradoxes that lead to erroneous conclusions.

The Fall course is lecture-based, with projects and papers. The capstone projects (pursued in Winter and Spring) will be interdisciplinary, will have outside customers, and will require students to apply skills or investigate issues across different subject areas or domains of knowledge. Students will work with leaders from industry and research labs. See the sponsoring institutions at . Examples of projects include quantifying insect-plant network interactions, risk prediction, energy efficiency, inferring health from personal fitness devices, call tracking/analytics, and modeling of COVID-19.

Upon completing the course sequence, students will be able to:

  • Understand the data science process and the structure and role of each of its constituent steps.
  • Engineer the appropriate data science process for a given data analysis problem.
  • Design and implement evaluation studies to compare the quality of data analyses.
  • Understand the technical trade-offs associated with working with “Big Data”.
  • Understand the ethical implications of data science work and apply ethical reasoning to specific data science projects.
  • Visualize the results of data analysis studies and convey them to customers.


  • Year-long capstone course:
    • Classroom instruction in Fall 2020: focus on the process of discovering knowledge from data, public policy, ethics, fairness, and statistical traps.
    • Followed by two quarters of faculty-mentored experiential project work. 
  • Synthesize course materials from individual machine learning, statistics, and data engineering courses, and place them in the context of concrete problems and datasets.
  • Culminates in an end-of-year showcase of projects to the local data science community
  • Develop skills such as
    • Oral communication and public speaking
    • Time management
    • Teamwork
    • Data analysis and informed decision making

Enrollment details: 4 units each quarter

  • Fall 2020: CMPSC 190DD, MW 5-6:30.
  • Winter 2021: CMPSC 190DE, times TBD.
  • Spring 2021: CMPSC 190DF, times TBD.

If you are interested, please fill out this course survey. 


Staff: Tim Robinson
Faculty: Ambuj Singh