Multi-class claims activity classification based on HTML data

About the Project

Our project builds machine learning classification models in Python that 1) predict and flag whether a web page contains evidence related to a fraud claim, and 2) identify the specific types of activity present on that page.
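Since the models work from HTML data, the markup has to be reduced to text at some point before classification. The project's actual preprocessing is not described here, so the following is only a minimal sketch of one possible approach, using Python's standard-library html.parser. The names VisibleTextExtractor and html_to_text, and the sample page, are hypothetical.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>.

    Hypothetical helper for illustration; not the project's actual code.
    """

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped tag
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside skipped tags.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return " ".join(self._chunks)


def html_to_text(html):
    """Return the visible text of an HTML string as one space-joined line."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


# Made-up sample page, for illustration only.
sample_page = (
    "<html><head><style>p{}</style><script>var x=1;</script></head>"
    "<body><h1>Ski trip</h1><p>Went skiing <b>today</b></p></body></html>"
)
page_text = html_to_text(sample_page)
```

Text extracted this way could then be passed to whatever feature extraction and classifier the pipeline uses.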

Insurance companies typically receive a large number of claims each year. By automatically flagging and classifying web pages that may contain information relevant to those claims, our project would significantly reduce the manual inspection required for potential cases of insurance fraud.

Our dataset, provided by the Carpe Data team, contained 42,485 observations, each representing a web page potentially containing claimant activity information. Our best-performing model was an SVM classifier, with a weighted average precision of 85%.
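For readers unfamiliar with the metric, weighted average precision is the per-class precision averaged with each class weighted by its number of true instances. The sketch below computes it by hand in plain Python. The function name weighted_precision, the class labels, and the predictions are all made up for illustration and are not the project's data or results.

```python
from collections import Counter


def weighted_precision(y_true, y_pred):
    """Per-class precision, averaged with weights equal to class support.

    A class that is never predicted is given precision 0.0 (an assumption
    made here to avoid division by zero).
    """
    support = Counter(y_true)    # true instances per class
    predicted = Counter(y_pred)  # predictions per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    total = len(y_true)

    score = 0.0
    for label, n_true in support.items():
        if predicted[label]:
            precision = correct[label] / predicted[label]
        else:
            precision = 0.0
        score += precision * n_true / total
    return score


# Hypothetical activity labels, for illustration only.
y_true = ["sports", "sports", "travel", "none", "none", "none"]
y_pred = ["sports", "travel", "travel", "none", "none", "sports"]
result = weighted_precision(y_true, y_pred)
```

In this toy example the per-class precisions are 0.5 (sports), 0.5 (travel), and 1.0 (none), with supports of 2, 1, and 3, so the weighted average is 0.75. Weighting by support means frequent classes dominate the score, which matters when classes are imbalanced.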

Team

Tyler Chia
Anum Damani
Annie Huang
Alex Rudolph
Rithvik Vobbilisetty

Mentors

Crystal Zhang, Sponsor
Kevin Neal, Sponsor
Joshua Bang, TA

Presentation

Capstone Showcase Poster (582.9 KB)

About Carpe Data

Providing insurance companies with next-generation data solutions, Carpe Data gathers and refines a range of emerging and alternative data sources that span social media, online content, and everything in between. The result? Insurers gain deeper insight into risks, enhancing all facets of the insurance lifecycle.
