Show HN: Analyzing PPP Loan Fraud with Advanced Python Data Analysis

github.com

1 points by eigenvalue a day ago

I recently built a fairly elaborate system for systematically flagging suspected fraudulent loans in a giant 8.4 GB CSV dump of PPP loan data, using lots of interesting Python data science techniques. The whole thing is open source, and you can easily replicate the findings, which are depressing.
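To give a rough idea of how a file that size can be scored without loading it all into memory, here is a minimal sketch. It is not the repo's actual code: the column names (`borrower`, `loan_amount`, `jobs_reported`), the three indicators, and their weights and thresholds are all made up for illustration, and the real system looks at many more signals.

```python
import csv
import heapq
import io


def suspicion_score(row):
    """Toy rule-based score; indicators and weights are hypothetical."""
    amount = float(row["loan_amount"] or 0)
    jobs = int(row["jobs_reported"] or 0)
    score = 0.0
    if jobs == 0:
        score += 1.5  # no jobs reported at all
    elif amount / jobs > 20000:
        score += 1.0  # unusually large loan per reported job
    if amount > 0 and amount % 1000 == 0:
        score += 0.5  # suspiciously round amount
    return score


def top_suspicious(lines, k=3):
    """Stream rows one at a time and keep only the k highest scores."""
    reader = csv.DictReader(lines)
    scored = ((suspicion_score(row), row["borrower"]) for row in reader)
    return heapq.nlargest(k, scored)


# Tiny in-memory stand-in for the real CSV (made-up rows).
sample = io.StringIO(
    "borrower,loan_amount,jobs_reported\n"
    "Acme Bakery,48213.50,6\n"
    "Quick Cash LLC,20000,0\n"
    "Main St Dental,150000,2\n"
)
result = top_suspicious(sample, k=2)
print(result)
```

With a real file you would pass an open file handle instead of the `io.StringIO` sample. Because rows are read lazily and `heapq.nlargest` only holds `k` items, memory use stays flat no matter how big the CSV is.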

If you just want to see the complete final output of the analysis, which lists the most suspicious-looking loans (after scoring them with a powerful model that weighs many indicators of fraud), you can find it here:

https://raw.githubusercontent.com/Dicklesworthstone/ppp_loan...

I did all of this work in the last couple of days, mostly using Grok3, which was a really great way to get familiar with this new and very powerful model. I was impressed with how well it worked, both for coming up with ideas for the system and for implementing them.

I also wrote a blog post about it with more details (although the README in the repo is probably more informative, if more technical):

https://fixmydocuments.com/blog/02_ppp_loan_fraud_analysis