Since many participants are new to machine learning, I have decided to follow up my earlier blog post on Getting Started Tips with the following Python script, which uses logistic regression to make predictions.
You should be able to get an AUC score of around 0.63, which would put you in approximately 8th position on the leaderboard at the time of writing.
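Because the competition is scored by AUC, it helps to estimate AUC yourself on a holdout slice of the training data before submitting. AUC equals the probability that a randomly chosen positive case gets a higher predicted score than a randomly chosen negative case, with ties counting as half. The minimal, dependency-free sketch below computes exactly that. It assumes `readmitted` is coded as 0/1, and the labels and scores in it are made-up numbers for illustration only.

```python
def auc_score(y_true, y_score):
    """AUC via the Mann-Whitney pairwise comparison: the fraction of
    (positive, negative) pairs where the positive scores higher,
    counting ties as half a win. Assumes 0/1 labels."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up holdout labels and predicted probabilities, for illustration only
y_valid = [0, 0, 1, 1, 0, 1]
p_valid = [0.10, 0.40, 0.35, 0.80, 0.20, 0.70]
print(round(auc_score(y_valid, p_valid), 4))  # 8 of 9 pairs ranked correctly -> 0.8889
```

The pairwise loop is quadratic, so it only suits small checks. On the full data you would normally use `sklearn.metrics.roc_auc_score`, which computes the same quantity efficiently.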
If you have any comments, please post them here or in the competition forum.
```python
__author__ = 'Teh Loo Hai'
__website__ = 'www.actuaries.com.my'

import pandas as pd
import numpy as np
from sklearn import linear_model

if __name__ == "__main__":
    train = pd.read_csv('../input/SAStraining.csv')
    test = pd.read_csv('../input/SAStest.csv')

    # select numeric features
    features = ['time_in_hospital', 'num_lab_procedures',
                'num_procedures', 'num_medications',
                'number_outpatient', 'number_emergency',
                'number_inpatient', 'number_diagnoses']

    # fill NaN with 0 in both train and test; assign the result back, because
    # train[features].fillna(0, inplace=True) only modifies a temporary copy
    train[features] = train[features].fillna(0)
    test[features] = test[features].fillna(0)

    # set random number seed
    np.random.seed(4321)

    # build logistic regression model using numeric features only
    # (max_iter raised so the default solver can converge on unscaled features)
    model = linear_model.LogisticRegression(max_iter=1000)
    model.fit(train[features], train['readmitted'])

    # make predictions on test data; column 1 is the probability of label 1
    preds = model.predict_proba(test[features])

    # create submission file
    submission = pd.DataFrame({'patientID': test.patientID,
                               'readmitted': preds[:, 1]})
    submission.to_csv('submission-logistic.csv', index=False)
```
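The script submits `preds[:, 1]`, the second column of `predict_proba`. For a binary target, scikit-learn returns one column per class in the order of `model.classes_`, so with 0/1 labels column 1 is the estimated probability of readmission. The sketch below is a hand-rolled illustration, not scikit-learn's own code. It shows how a fitted logistic regression turns one row of features into those two columns. The coefficients and intercept are hypothetical values chosen only for the example.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # avoids overflow in exp(-z) for large negative z
    return e / (1.0 + e)


def predict_proba_row(x, coef, intercept):
    """Return [P(y=0), P(y=1)] for one row, mirroring the column layout
    of a binary predict_proba whose classes_ are [0, 1]."""
    z = intercept + sum(w * v for w, v in zip(coef, x))
    p1 = sigmoid(z)
    return [1.0 - p1, p1]


# Hypothetical fitted parameters for two features, for illustration only
coef = [0.5, -0.25]
intercept = -1.0
row = [2.0, 4.0]  # linear score z = -1.0 + 0.5*2.0 - 0.25*4.0 = -1.0
print(predict_proba_row(row, coef, intercept))
```

Because the two columns always sum to 1, submitting column 1 alone loses no information. AUC depends only on how the predictions rank the patients, so any monotone transform of the linear score `z` would produce the same AUC.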