Business characteristics classification models

Carpe Data

About the Project

A business owner usually needs to buy insurance to protect the business against property loss and general liability. Insurance carriers adjust the premium based on how much risk a business carries, where risk refers to both the likelihood of a loss and the severity of a potential loss.

Insurance carriers are particularly interested in identifying businesses in “risky” industries and in detecting specific risks within a business. We have defined three risk groups of interest: “explosive”, “entertainment”, and “traffic”. If a business has one or more of the characteristics listed in the table in the appendix, it is assigned to the corresponding risk group or groups.

Carpe Data hopes to improve its current methodology for flagging businesses with risky behavior by building a machine learning model that categorizes businesses into the risk groups defined above. The model will use business information such as name, industries, and social media data points (e.g. textual reviews and images) to help label businesses that fall into the “Entertainment” or “Traffic” risk groups. Note that these groups are NOT mutually exclusive: a single business may belong to more than one group.
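Because the risk groups are not mutually exclusive, the task is a multi-label one: each business maps to a set of groups instead of a single class. The sketch below shows one common way to represent such labels, as a multi-hot vector with one position per group. The function names `encode_labels` and `decode_labels` are hypothetical and chosen for illustration; the project's actual label format is not described here.

```python
# Risk groups named in the project description, in a fixed order.
RISK_GROUPS = ("explosive", "entertainment", "traffic")


def encode_labels(groups):
    """Turn a set of risk-group names into a multi-hot vector.

    Hypothetical helper: a business in several groups gets several 1s.
    """
    unknown = set(groups) - set(RISK_GROUPS)
    if unknown:
        raise ValueError(f"unknown risk groups: {sorted(unknown)}")
    return [1 if group in groups else 0 for group in RISK_GROUPS]


def decode_labels(vector):
    """Inverse of encode_labels: recover the set of group names."""
    return {group for group, flag in zip(RISK_GROUPS, vector) if flag}


# A business flagged for both entertainment and traffic risk.
both = encode_labels({"entertainment", "traffic"})
# A business with no flagged risk group.
none = encode_labels(set())
```

With this representation, a classifier can predict each position independently, so a business can be flagged for any combination of groups or for none.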

For this project, students will apply new NLP techniques and ML methods to label businesses according to risk groups. Students will validate their approach using manually labeled data and compare their approach with Carpe Data’s current classification method.
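As a rough illustration of the validation step, the sketch below scores two labeling methods against manually labeled data, one risk group at a time, using precision, recall, and F1. All data in it is made up for the example, and the names `per_label_scores`, `baseline`, and `model` are hypothetical; it does not reflect Carpe Data's actual method or results.

```python
def per_label_scores(true_sets, pred_sets, labels):
    """Compute precision, recall, and F1 for each label separately.

    true_sets / pred_sets: lists of sets of group names, one per business.
    """
    scores = {}
    for label in labels:
        pairs = list(zip(true_sets, pred_sets))
        tp = sum(1 for t, p in pairs if label in t and label in p)
        fp = sum(1 for t, p in pairs if label not in t and label in p)
        fn = sum(1 for t, p in pairs if label in t and label not in p)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = {"precision": precision, "recall": recall, "f1": f1}
    return scores


labels = ("entertainment", "traffic")

# Toy data (invented): manual labels for four businesses.
manual = [{"entertainment"}, {"traffic"}, {"entertainment", "traffic"}, set()]
# Toy predictions from a stand-in for the current method and for a new model.
baseline = [{"entertainment"}, set(), {"entertainment"}, {"traffic"}]
model = [{"entertainment"}, {"traffic"}, {"entertainment", "traffic"}, {"traffic"}]

baseline_scores = per_label_scores(manual, baseline, labels)
model_scores = per_label_scores(manual, model, labels)

for label in labels:
    print(label, baseline_scores[label]["f1"], model_scores[label]["f1"])
```

Scoring each group separately keeps a strong result on one group from hiding a weak result on another, which matters when a business can belong to several groups at once.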


  • Noa Rapoport
  • Haoming Deng
  • Lex Navarra
  • Dan Le
  • Yutong Wang


  • Andy Chen, Carpe Data
  • Amy Huynh, Carpe Data
  • Joshua Bang, UCSB


About Carpe Data

Providing insurance companies with next-generation data solutions, Carpe Data gathers and refines a range of emerging and alternative data sources that span social media, online content, and everything in between. The result? Insurers gain deeper insight into risks, enhancing all facets of the insurance lifecycle.