Python Starter Code for Asia Actuarial Analytics Challenge 2016 

AUC 0.63

Since most participants are new to machine learning, I am following up my earlier Getting Started Tips post with the Python script below, which uses logistic regression to make predictions.

You should be able to get an AUC score of around 0.63, which would put you in approximately 8th position at the time of writing.

Please post any comments here or in the competition forum.

__author__ = 'Teh Loo Hai'
__website__ = 'www.actuaries.com.my'

import pandas as pd
import numpy as np
from sklearn import linear_model

if __name__ == "__main__":
    train = pd.read_csv('../input/SAStraining.csv')
    test = pd.read_csv('../input/SAStest.csv')

    # select numeric features
    features = ['time_in_hospital', 'num_lab_procedures',
                'num_procedures', 'num_medications',
                'number_outpatient', 'number_emergency',
                'number_inpatient', 'number_diagnoses']

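    # fill nan with 0 in both train and test; assign the result back, because
    # fillna(inplace=True) on train[features] only modifies a temporary copy
    train[features] = train[features].fillna(0)
    test[features] = test[features].fillna(0)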

    # set random number seed
    np.random.seed(4321)

    # build logistic regression model using numeric features only
    model = linear_model.LogisticRegression()
    model.fit(train[features], train['readmitted'])

    # make predictions on test data
    preds = model.predict_proba(test[features])

    # create submission file
    submission = pd.DataFrame({'patientID': test.patientID,
                               'readmitted': preds[:, 1]})
    submission.to_csv('submission-logistic.csv', index=False)
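
Before submitting, it can help to estimate the AUC locally. The sketch below is one possible way to do that with 5-fold cross-validation; it is not part of the original script. The helper names `cv_auc` and `make_demo_frame` are made up for illustration, and the data is synthetic so the example runs without the competition files. To use it on the real data, pass the `train` frame and the `features` list from the script above.

```python
import numpy as np
import pandas as pd
from sklearn import linear_model
from sklearn.model_selection import cross_val_score


def cv_auc(frame, features, target="readmitted", folds=5):
    """Return the mean cross-validated AUC of a logistic regression."""
    X = frame[features].fillna(0)
    y = frame[target]
    model = linear_model.LogisticRegression(max_iter=1000)
    scores = cross_val_score(model, X, y, cv=folds, scoring="roc_auc")
    return scores.mean()


def make_demo_frame(n=2000, seed=4321):
    """Build a synthetic stand-in for SAStraining.csv (illustration only)."""
    rng = np.random.default_rng(seed)
    frame = pd.DataFrame({
        "time_in_hospital": rng.integers(1, 15, n),
        "number_inpatient": rng.integers(0, 5, n),
    })
    # Let the target depend loosely on one feature so there is signal to find.
    prob = 1 / (1 + np.exp(-(frame["number_inpatient"] - 2)))
    frame["readmitted"] = (rng.random(n) < prob).astype(int)
    return frame


demo = make_demo_frame()
demo_auc = cv_auc(demo, ["time_in_hospital", "number_inpatient"])
print("cross-validated AUC on synthetic data:", round(demo_auc, 3))
```

The number printed here says nothing about the competition data; only the pattern carries over. A local estimate like this lets you compare feature sets without spending leaderboard submissions.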

Posted by Loo Hai Tuesday, May 24, 2016 2:00:00 PM Categories: Machine Learning SAS

Asia Actuarial Analytics Challenge 2016 

Getting Started Tips

The Singapore Actuarial Society (SAS) recently launched the above competition to promote the development of data analytics talent in Asia. If you don't know how to get started, here are some tips:

  1. You need an invitation link before you can participate. You can find the invitation link in our April 2016 newsletter. If you are not sure whether you are eligible to participate, check the competition forum, and if you are still unsure, ask the admin.
  2. Make an all-zeros submission by downloading and submitting the sample submission file. You should achieve a score of 0.50000, on par with the benchmark.
  3. Not happy with your score? Use a random number generator to generate your predictions. Submit your predictions and you should get a score either higher or lower than 0.50000. If you get a score higher than 0.50000, congratulations, you have beaten the benchmark! If your score is lower than 0.50000, just replace each prediction with 1 minus its value and submit again. This works because the score is an AUC, which depends only on how the predictions are ranked, so reversing the ranking turns a score of s into 1 - s. Amazing, now you have outperformed the benchmark.
  4. Try something more actuarial. Fit a least squares regression line (e.g. using Excel) with "readmitted" as your y variable and, say, "time_in_hospital" as your x variable. Use your regression line to make predictions and submit them.
  5. Improve your model by trying multiple regression (can still use Excel).
  6. Do more advanced stuff like GLM.
  7. Sign up for a machine learning course like the one run by SAS.
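
Tips 2 to 4 can be tried offline before submitting anything. The following is a minimal sketch that assumes the competition score is an AUC. The `auc` helper and the synthetic `x` and `y` arrays (standing in for "time_in_hospital" and "readmitted") are made up for illustration and are not taken from the competition data.

```python
import numpy as np


def auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Tied pairs count as half. Fine for small samples; it builds an
    n_pos x n_neg matrix, so it is not meant for very large data.
    """
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size


rng = np.random.default_rng(2016)
n = 500
x = rng.integers(1, 15, n).astype(float)          # stand-in for time_in_hospital
y = (rng.random(n) < 0.2 + 0.04 * x).astype(int)  # stand-in for readmitted

# Tip 2: constant predictions tie every pair, so the AUC is one half.
auc_zeros = auc(y, np.zeros(n))

# Tip 3: random predictions land near 0.5; flipping them mirrors the score.
random_preds = rng.random(n)
auc_random = auc(y, random_preds)
auc_flipped = auc(y, 1 - random_preds)

# Tip 4: least squares line of y on x, then score its fitted values.
slope, intercept = np.polyfit(x, y, 1)
ls_preds = intercept + slope * x
auc_ls = auc(y, ls_preds)

print("all zeros:", auc_zeros)
print("random:", auc_random, "flipped:", auc_flipped)
print("least squares line:", auc_ls)
```

With a single x variable, the regression line only rescales x, so its AUC is the same as ranking patients by x directly (or by the reverse of x if the slope is negative). Gains beyond that have to come from adding more variables, as in tip 5.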

Good luck!

Posted by Loo Hai Friday, April 22, 2016 2:51:00 PM Categories: Machine Learning SAS