Classification Practice & College Analysis

This two-fold project uses various supervised binary classification algorithms to predict whether colleges are for-profit or non-profit, and it examines different factors related to for-profit and predatory institutions in the higher education system. The data and explanatory variables were extracted and selected by Lauren from a government database.

Before performing any classification, I first examined the bivariate association between each explanatory variable and profit status using the SelectKBest method with the chi-square test. Reasonably, the lowest p-values corresponded to variables that were mostly financial, such as ‘faculty_salary’, ‘tuition_revenue_per_fte’, ‘pell_grant_debt’ and ‘instructional_expenditure_per_fte’, meaning that these variables are (statistically) significantly associated with profit status. However, it also seemed counterintuitive that factors including retention rate, whether students completed repayment within 7 years, and whether the school was accredited appeared only loosely related to profit status. I expected for-profit schools to have a greater financial impact on students by impeding their loan repayment and even making them unwilling to stay at the same school. I also expected accreditation to be granted more often to non-profit institutions so that students would not be encouraged to attend predatory colleges. This made me wonder what standards had been applied to determine the profit status of colleges and what factors accreditation at both the federal and state levels takes into account.
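The chi-square screening can be sketched with scikit-learn roughly as follows. This is a minimal sketch, not the original code: the data are synthetic, and the function name `rank_features_chi2` and the toy values for `faculty_salary` and `retention_rate` are hypothetical. Because `chi2` only accepts non-negative features, the sketch assumes every column is min-max scaled to [0, 1] before testing.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.preprocessing import MinMaxScaler


def rank_features_chi2(X: pd.DataFrame, y: pd.Series, k: int = 5):
    """Rank the columns of X by chi-square association with the binary label y."""
    # chi2 requires non-negative inputs, so rescale each column to [0, 1] first.
    X_scaled = MinMaxScaler().fit_transform(X)
    selector = SelectKBest(score_func=chi2, k=k).fit(X_scaled, y)
    scores = pd.DataFrame(
        {"feature": X.columns, "chi2": selector.scores_, "p_value": selector.pvalues_}
    ).sort_values("p_value")
    selected = list(X.columns[selector.get_support()])
    return scores, selected


# Synthetic data (hypothetical): salary depends on the label, retention does not.
rng = np.random.default_rng(0)
n = 300
y = pd.Series(rng.integers(0, 2, n), name="for_profit")
X = pd.DataFrame(
    {
        "faculty_salary": 5000 + 3000 * (1 - y) + rng.normal(0, 500, n),
        "retention_rate": rng.uniform(0.3, 0.9, n),
    }
)
scores, selected = rank_features_chi2(X, y, k=1)
print(scores)
print(selected)
```

A low p-value here signals association with the label, not a causal effect on it.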

In the classification part, I first did a 70/30 train/test split and separately normalized all quantitative columns in the explanatory variable matrices of the train and test sets to the same scale. After adding the categorical variables back to the matrices, I applied six different classification algorithms in turn, training each on the training set and evaluating it on the test set. Logistic regression achieved 92% test accuracy. The k-nearest neighbors classifier with an optimal k = 5 achieved 93% accuracy, and it also classified well in general, as indicated by the metrics in the classification report and the confusion matrix. The other four classifiers, Naive Bayes, gradient boosting, random forest, and SVM, were plotted together with these two on a comparative ROC curve. They all performed well, with AUC scores close to 1, which indicates that each of them separated for-profit from non-profit schools almost perfectly. The random forest classifier was then used to predict the profit status of a randomly chosen school in the test set. Finally, a 2D decision boundary of the logistic regression model for profit status was plotted in the ‘instructional_expenditure_per_fte’ and ‘5_year_declining_balance’ feature space.
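The workflow above (split, scale, fit six classifiers, compare accuracy and AUC) can be sketched with scikit-learn as below. This is an illustrative sketch on synthetic data from `make_classification`, not the project's code or data, and the model keys are hypothetical labels. One deliberate choice differs from a literal reading of the text: the scaler is fit on the training split only and then reused on the test split, a common precaution against test-set leakage. Choosing k by 5-fold cross-validation over odd values from 1 to 19 is likewise an assumption about how an "optimal k" might be found.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the college feature matrix (hypothetical data).
X, y = make_classification(n_samples=1000, n_features=8, n_informative=5, random_state=0)

# 70/30 train/test split, stratified to keep the class balance in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Fit the scaler on the training split only, then reuse it on the test split.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# Choose k for k-nearest neighbors by 5-fold cross-validation on the training split.
best_k = max(
    range(1, 21, 2),
    key=lambda k: cross_val_score(
        KNeighborsClassifier(n_neighbors=k), X_train_s, y_train, cv=5
    ).mean(),
)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "knn": KNeighborsClassifier(n_neighbors=best_k),
    "naive_bayes": GaussianNB(),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
    "random_forest": RandomForestClassifier(random_state=0),
    "svm": SVC(probability=True, random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train_s, y_train)
    # ROC/AUC needs a score per test school; use the predicted probability of class 1.
    proba = model.predict_proba(X_test_s)[:, 1]
    results[name] = {
        "accuracy": accuracy_score(y_test, model.predict(X_test_s)),
        "auc": roc_auc_score(y_test, proba),
    }

print(f"best k = {best_k}")
for name, r in results.items():
    print(f"{name:20s} accuracy={r['accuracy']:.3f} auc={r['auc']:.3f}")

# Predict the class of a single test-set school with the random forest.
prediction = models["random_forest"].predict(X_test_s[[0]])
print(prediction)
```

To draw the comparative ROC plot itself, the same `proba` arrays can be passed to `sklearn.metrics.roc_curve` and plotted one curve per model.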

In the next part, I devised an original ranking method for colleges that incorporates seven scaled variables. Of these, five were considered positively related to the “goodness” of a school, while the other two were considered negatively related. The score was simply the arithmetic average of the effects of all seven variables. Based on this method, the five worst schools were Mesabi Range College, Lincoln Technical Institute-Lowell, Success Schools, Citizens School of Nursing, and Bryant & Stratton College-Albany.
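A ranking of this kind can be sketched in pandas as follows. It is a sketch under stated assumptions, not the original scoring code: the function name `goodness_score`, the column names, and the three toy schools are hypothetical, and it uses only three variables instead of the project's seven. It assumes min-max scaling and reverses each negative variable as `1 - x`. Compared with subtracting the negative variables, this shifts every school's score by the same constant, so the ranking is unchanged.

```python
import pandas as pd


def goodness_score(df: pd.DataFrame, positive: list[str], negative: list[str]) -> pd.Series:
    """Mean of min-max scaled columns, with negative columns reversed so higher = better."""
    cols = positive + negative
    lo, hi = df[cols].min(), df[cols].max()
    span = (hi - lo).replace(0, 1)  # avoid dividing by zero for constant columns
    scaled = (df[cols] - lo) / span
    scaled[negative] = 1 - scaled[negative]  # a high default rate should lower the score
    return scaled.mean(axis=1)


# Toy data with hypothetical column names (not the seven variables from the project).
schools = pd.DataFrame(
    {
        "graduation_rate": [0.8, 0.4, 0.6],
        "median_earnings": [50_000, 30_000, 40_000],
        "default_rate": [0.05, 0.20, 0.10],
    },
    index=["A", "B", "C"],
)
scores = goodness_score(schools, ["graduation_rate", "median_earnings"], ["default_rate"])
worst_first = scores.sort_values()
print(worst_first)
```

On real data, `scores.sort_values().head(5)` would list the five lowest-scoring schools.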

The data analysis above made me think about the crucial factors I would pay attention to if I were a government organization overseeing accreditation and making sure that colleges are non-predatory. With some additional research, my list is the following:

-The net price of colleges: This is the amount a student pays per year at an academic institution after subtracting any scholarships or grants the student receives. Compared with tuition, it is a fairer metric for evaluating the actual cost of college, as described here.

-Whether colleges are truthful in their advertising: Predatory colleges use distorted information and deceptive tactics in their advertisements to lure prospective students into enrolling, leaving them with enormous debts. One example is the largely online, for-profit Ashford University. According to this article, “Ashford’s salespeople made a wide variety of false and misleading statements to prospective students to meet their enrollment growth targets, including how much financial aid students would get, how many prior academic credits would transfer into the school, and the school’s ability to prepare students for careers in fields like social work, nursing, medical billing, and teaching; For-profit Ashford misled investors and the public in its filings with the Securities and Exchange Commission by inflating the percentage of working alumni who reported that their Ashford degree prepared them for their current job.” Not only do its students owe billions in federal loans, but Ashford claims that they also owe hundreds of millions of dollars directly to the school. To collect that money, Ashford has engaged in aggressive and illegal practices such as threatening to impose, and imposing, unlawful debt collection fees. These predatory behaviors should be examined closely.

-Whether colleges are targeting students from minority groups: Predatory colleges often exploit minority students’ vulnerabilities and their desire for higher education in order to defraud them. In the same example, Ashford especially harms students of color because of the school’s intentional racial targeting of minority students. As a result, minority students face a greater risk of debt, default, and economic distress. Research also shows that black college enrollment has increased at nearly twice the rate of white enrollment in recent years, but a disproportionate number of those African-American students end up at for-profit schools. However, for-profit graduates fare little better on the job market than job seekers with only a high school diploma; their diplomas, that is, are a net loss, offering essentially the same grim job prospects as if they had never gone to college, plus a lifetime debt sentence.

-Whether schools are profiting from federal student aid: A major way predatory schools profit from and exploit students is by drawing money from federal student aid. For-profit schools recruit heavily in low-income communities, and most students finance their education with a mix of federal Pell grants and federal student loans. But government-backed student loans max out at $12,500 per school year, and tuition at for-profits can go much higher. For example, at schools like ITT Tech, tuition runs up to $25,000, and such schools can receive up to 90% of their revenue from government money. For the remaining 10 percent, they count on veterans (GI Bill money counts as outside funds), as well as scholarships and private loans. Such schools now enroll around 10% of America’s college students but take in more than a quarter of all federal financial aid, as much as $33 billion in a single year.

-Debt-to-earnings ratio: When graduates have to spend too much of their income repaying student loans, the schools they attended may be highly predatory. One example is the Harvard A.R.T. Institute. On average, its graduates earn about $36,000 per year, yet students borrow on average $78,000 to obtain their degrees. After accounting for basic living expenses, the average Harvard A.R.T. Institute graduate has to pay 44 percent of discretionary income just to make the minimum loan payment. Burdens this large should not be imposed on students seeking an education.
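Several of the factors in this list reduce to simple arithmetic, sketched below. All function names are hypothetical. The debt-to-earnings helper assumes a fixed-rate loan repaid in equal monthly installments over 10 years (the standard annuity formula) and defines discretionary income simply as earnings minus living expenses. That definition may differ from the one behind the 44 percent figure above, so the sketch is not meant to reproduce it. The revenue-share helper shows how close a school comes to the 90% government-revenue figure.

```python
def net_price(cost_of_attendance: float, grants_and_scholarships: float) -> float:
    """Yearly net price: cost of attendance minus grant and scholarship aid."""
    return cost_of_attendance - grants_and_scholarships


def annual_loan_payment(principal: float, annual_rate: float, years: int = 10) -> float:
    """Total yearly payment on a fixed-rate loan repaid in equal monthly installments."""
    if annual_rate == 0:
        return principal / years
    r = annual_rate / 12  # monthly interest rate
    n = 12 * years        # number of monthly payments
    monthly = principal * r / (1 - (1 + r) ** -n)
    return 12 * monthly


def share_of_discretionary_income(payment: float, earnings: float, living_expenses: float) -> float:
    """Fraction of income left after living expenses that goes to loan payments."""
    discretionary = earnings - living_expenses
    if discretionary <= 0:
        return float("inf")  # no discretionary income: any payment is unaffordable
    return payment / discretionary


def federal_revenue_share(federal_aid_revenue: float, total_revenue: float) -> float:
    """Fraction of a school's revenue that comes from federal student aid."""
    return federal_aid_revenue / total_revenue


# Illustrative inputs only; the 5% rate, 10-year term, and all dollar amounts are hypothetical.
payment = annual_loan_payment(30_000, annual_rate=0.05)
ratio = share_of_discretionary_income(payment, earnings=40_000, living_expenses=25_000)
print(f"annual payment ${payment:,.0f}, {ratio:.0%} of discretionary income")
```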

You can find my code here.

Written on November 7, 2019