Multi-class claims activity classification based on HTML data

Carpe Data

About the Project

Our project focuses on building machine learning classification models in Python that 1) predict and flag whether or not a web page contains evidence relevant to a fraudulent claim, and 2) identify the specific types of activity that are present.

Insurance companies typically receive a large number of claims each year. By automatically flagging and classifying web pages that contain information potentially relevant to those claims, our project would significantly reduce the amount of manual inspection required for potential cases of insurance fraud.

Our dataset, provided by the Carpe Data team, contained 42,485 observations, each representing a web page that potentially contains claimant activity information. Our best-performing model was an SVM classifier, which achieved a weighted average precision of 85%.
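To make the reported metric concrete, here is a minimal sketch of how a weighted average precision can be computed for a multi-class problem: per-class precision is averaged with weights proportional to each class's share of the true labels. This follows the same support-weighted convention as scikit-learn's `precision_score(..., average="weighted")`, though the write-up does not state which implementation the team used. The activity labels (`"none"`, `"sports"`, `"travel"`) and the toy predictions are hypothetical and are not taken from the Carpe Data dataset.

```python
from collections import Counter


def weighted_precision(y_true, y_pred):
    """Support-weighted average of per-class precision.

    Precision for a class is (correct predictions of that class) /
    (all predictions of that class). Each class's precision is then
    weighted by how often the class appears in y_true.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one observation is required")

    support = Counter(y_true)      # true count per class
    predicted = Counter(y_pred)    # predicted count per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    total = len(y_true)
    score = 0.0
    for label, n_true in support.items():
        n_pred = predicted[label]
        # A class that is never predicted contributes a precision of 0.
        precision = correct[label] / n_pred if n_pred else 0.0
        score += precision * n_true / total
    return score


# Hypothetical labels for six web pages (illustration only).
y_true = ["none", "none", "none", "sports", "sports", "travel"]
y_pred = ["none", "none", "sports", "sports", "sports", "none"]

print(f"weighted precision: {weighted_precision(y_true, y_pred):.3f}")
```

Because the weights follow class frequency, a frequent class (such as pages with no relevant activity) influences the score more than a rare one, which is worth keeping in mind when reading a single summary figure for an imbalanced dataset.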


  • Tyler Chia
  • Anum Damani
  • Annie Huang
  • Alex Rudolph
  • Rithvik Vobbilisetty


  • Crystal Zhang, Sponsor
  • Kevin Neal, Sponsor
  • Joshua Bang, TA


About Carpe Data

Providing insurance companies with next-generation data solutions, Carpe Data gathers and refines a range of emerging and alternative data sources that spans social media, online content, and everything in between. The result? Insurers gain deeper insight into risks, enhancing all facets of the insurance lifecycle.