Organisational and business context (3 mins):
- New lending product in a “risky” segment
- Approach is to provide access, but price-for-risk
Analytical Problem (2 mins):
- Large amounts of customer data, from traditional and emerging sources
- Very young lending program: only 1000 disbursals, no “bads” as yet
- How to estimate risk even as the data is thickening?
Data Structure (5 mins):
- Specific data from the lending program
  - Nearly complete set of customer features at the time of booking: application form, credit bureau, “big-data” footprint through mobile phone, employment details, bank statements etc.
  - Partial outcome data: missed payments but no defaults.
- Surrogate data from the credit bureau
  - Nearly complete outcome data: actual defaults on all organized sector loans.
  - Restrictions on data access: data must be processed at the bureau.
  - Limited data on customer features: “big-data” footprint, bank statement insights, employment details etc. are not available at the bureau.
- Adding new data every month
  - Missed payment/default outcomes, and qualitative diagnostics provided by domain experts.
Analytic Approach #1 (6 mins):
- Build model on surrogate outcomes by merging our partner’s data with credit bureau data, to the extent data sharing was possible.
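
A minimal sketch of this approach, under stated assumptions: partner records and bureau outcomes share a customer ID, only the overlapping customers can be used, and a plain logistic regression stands in for whatever model was actually built. All names and values (`partner`, `bureau`, `income`) are hypothetical toy data, and in practice the merge would have to run inside the bureau's environment.

```python
import math

# Hypothetical partner-side feature (illustrative values only), keyed by customer ID.
partner = {
    "c1": {"income": 1.0}, "c2": {"income": 1.5}, "c3": {"income": 2.0},
    "c4": {"income": 2.5}, "c5": {"income": 3.0}, "c6": {"income": 3.5},
    "c7": {"income": 4.0},  # not shared with the bureau
}

# Hypothetical surrogate outcome at the bureau: 1 = default on an organized sector loan.
bureau = {"c1": 1, "c2": 1, "c3": 0, "c4": 1, "c5": 0, "c6": 0, "c8": 1}


def merge_on_id(partner, bureau):
    """Inner join: keep only customers present in both sources."""
    rows = []
    for cid in sorted(partner.keys() & bureau.keys()):
        x = [1.0, partner[cid]["income"]]  # intercept term + one feature
        rows.append((x, bureau[cid]))
    return rows


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, lr=0.1, epochs=2000):
    """Fit logistic regression weights by batch gradient descent."""
    w = [0.0] * len(rows[0][0])
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in rows:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            for j, xj in enumerate(x):
                grad[j] += (p - y) * xj
        w = [wi - lr * g / len(rows) for wi, g in zip(w, grad)]
    return w


def predict(w, income):
    """Estimated probability of the surrogate default outcome."""
    return sigmoid(w[0] + w[1] * income)


rows = merge_on_id(partner, bureau)
w = fit_logistic(rows)
print(len(rows), predict(w, 1.0), predict(w, 3.5))
```

The inner join makes the data-sharing limit explicit: customers missing from either side simply drop out of the training set.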
Analytic Approach #2 (6 mins):
- Build model on our own partial outcomes, using data sources from app form, credit bureau, “big-data” from mobile phone, summarized bank statements etc.
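
As a rough illustration of how a training row for this approach might be assembled, the sketch below flattens several per-source records into one feature dictionary and derives a partial-outcome label. The field names, the status strings, and the choice of "ever missed a payment" as the label are all assumptions made for illustration, not the program's actual definitions.

```python
def build_row(app_form, bureau, mobile, bank, payments):
    """Combine per-source features and derive a partial-outcome label.

    Each source is a dict of hypothetical feature names to values;
    `payments` is a list of hypothetical status strings, one per instalment.
    """
    features = {}
    sources = (("app", app_form), ("bureau", bureau),
               ("mobile", mobile), ("bank", bank))
    for prefix, source in sources:
        for name, value in source.items():
            # Prefix with the source so identically named fields cannot collide.
            features[f"{prefix}_{name}"] = value
    # Assumed label: 1 if any instalment so far was missed, else 0.
    label = int(any(status == "missed" for status in payments))
    return features, label


features, label = build_row(
    app_form={"age": 31},
    bureau={"score": 640},
    mobile={"apps_installed": 42},
    bank={"avg_balance": 1200.0},
    payments=["paid", "missed", "paid"],
)
print(features, label)
```

Rows built this way could then feed any standard classifier; the label definition is the part most likely to change as real defaults start to appear.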