COMP0050 Assignment
Data
Download the file COMP0050CourseworkData.zip from Moodle.
This contains two datasets:
1- peerToPeerLoans.csv: The data come from George, N. (2018) (All
Lending Club loan data, version 6, February 2018,
www.kaggle.com/wordsforthewise/lending-club). This is a subset of the
datasets used in Turiel, J. D., & Aste, T. (2020). The variable of
interest is charged_off, which takes the value 1 if the debtor is not
repaying the loan (0 otherwise).
2- stockReturns.csv: This dataset contains 500 daily percentage stock
returns for each of 50 assets.
Tasks
There will be two tasks corresponding to the two datasets:
- Task 1: Build a model to predict whether a customer will default
on their loan. You should compare the performance of different
methods (e.g. logistic regression, classification trees/forests) in terms
of their ability to correctly predict loan defaults. You are free to focus
on a subset of the data (e.g. a reduced set of features, or a subset of
the loans) and to manipulate the data as you like, but you should
explain your rationale.
- Task 2: Focus on the global minimum variance portfolio. Compare the
portfolio variance obtained using two different regularizers. Use validation
methods to find the optimal values of the parameters.
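As an illustration of the kind of comparison the first task asks for, the following is a minimal sketch only, not a prescribed approach. It assumes scikit-learn is available and uses synthetic data as a stand-in for peerToPeerLoans.csv; the feature matrix, the roughly 80/20 class balance, the two models, and the choice of AUC as the metric are all assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the loan data (assumption: the real features differ).
# y plays the role of charged_off, with roughly 20% positives.
X, y = make_classification(n_samples=2000, n_features=10, n_informative=5,
                           weights=[0.8, 0.2], random_state=0)

# Hold out a stratified test set so both classes appear in each split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

models = {
    "logistic regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Fit each model and score it on the same held-out data.
auc_scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    prob_default = model.predict_proba(X_test)[:, 1]
    auc_scores[name] = roc_auc_score(y_test, prob_default)
    print(f"{name}: test AUC = {auc_scores[name]:.3f}")
```

With imbalanced classes, plain accuracy can look good for a model that never predicts a default, which is why this sketch reports AUC; other metrics (precision, recall) may suit your analysis better.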
For both tasks, justify any decision to focus only on subsamples of the
data. You are also free to explore other questions related to the data and
the tasks that you think are interesting, as long as your analysis includes
the development of predictive models of defaults for task 1 and
regularized portfolio optimization for task 2.
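For the second task, one possible starting point is sketched below, under stated assumptions. It uses a ridge (squared L2) penalty, for which the global minimum variance weights have a closed form, and picks the penalty strength by out-of-sample variance on a later block of days. The synthetic returns stand in for stockReturns.csv, and the function name gmv_weights, the 350/150 split, and the grid of penalty values are hypothetical choices for the example. A second regularizer such as an L1 penalty has no closed form and would need a numerical solver.

```python
import numpy as np

def gmv_weights(cov, lam=0.0):
    """Minimise w' cov w + lam * w'w subject to sum(w) = 1.

    The first-order conditions give w proportional to (cov + lam*I)^{-1} 1.
    """
    n = cov.shape[0]
    ones = np.ones(n)
    x = np.linalg.solve(cov + lam * np.eye(n), ones)
    return x / (ones @ x)

# Synthetic stand-in for the returns data: 500 days x 50 assets (assumption).
rng = np.random.default_rng(0)
common_factor = rng.normal(size=(500, 1))
returns = 0.8 * common_factor + rng.normal(size=(500, 50))

# Time-ordered split: estimate on the first block, validate on the second.
train, valid = returns[:350], returns[350:]
cov_train = np.cov(train, rowvar=False)

# Evaluate each candidate penalty by the realised variance on validation data.
lambdas = [0.0, 0.01, 0.1, 1.0, 10.0]
valid_var = {}
for lam in lambdas:
    w = gmv_weights(cov_train, lam)
    valid_var[lam] = np.var(valid @ w, ddof=1)
    print(f"lambda = {lam}: validation variance = {valid_var[lam]:.4f}")

best_lam = min(valid_var, key=valid_var.get)
print("selected lambda:", best_lam)
```

As the penalty grows, the ridge solution shrinks towards the equally weighted portfolio, which gives a useful sanity check on any implementation.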
Useful references for the above tasks are the following:
Turiel, J. D., & Aste, T. (2020). Peer-to-peer loan acceptance and default
prediction with artificial intelligence. Royal Society Open Science, 7(6),
191649.
Fastrich, B., Paterlini, S., & Winker, P. (2015). Constructing optimal sparse
portfolios using regularization methods. Computational Management
Science, 12(3), 417-434.
Brodie, J., Daubechies, I., De Mol, C., Giannone, D., & Loris, I. (2009).
Sparse and stable Markowitz portfolios. Proceedings of the National
Academy of Sciences, 106(30), 12267-12272.
Written report
A brief written report (maximum 8 pages, with a maximum of 4 pages for each
task) containing the justification of your approach, the results of your
analysis, and a discussion of those results should be submitted to Moodle
before the deadline of Wednesday 06/04/2022 at 16:00.
Marking
This assignment is worth 100% of the overall mark (50% for each task).
The marking will be based on the following criteria (with uniform weights):
1) Clarity of presentation and explanations
2) Validity of results
3) Critical interpretation of the results