After formulating an experimental plan, we needed datasets to run those experiments on. A good dataset for this task must meet three requirements: it must be non-trivial in size, it must include a protected attribute (such as gender or race), and it must have some true ranking. While it is possible to find a dataset that satisfies two of the three requirements, satisfying all three is difficult.
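The three requirements can be turned into a quick screening check for candidate datasets. The following is only a minimal sketch: the function name `meets_requirements`, the record layout (a list of dicts), and in particular the `min_rows` threshold of 1000 are assumptions made for illustration, since the report does not define what counts as "non-trivial" in size.

```python
def meets_requirements(rows, protected_attrs, ranking_key, min_rows=1000):
    """Screen a candidate dataset against the three requirements.

    rows            -- list of dicts, one per entity (assumed layout)
    protected_attrs -- candidate protected attribute names, e.g. ["gender"]
    ranking_key     -- column that could serve as the true ranking
    min_rows        -- assumed threshold for "non-trivial" size
    """
    rows = list(rows)

    # Requirement 1: the dataset is non-trivial in size.
    big_enough = len(rows) >= min_rows

    # Requirement 2: at least one protected attribute is present for every entity.
    has_protected = bool(rows) and any(
        all(row.get(attr) is not None for row in rows)
        for attr in protected_attrs
    )

    # Requirement 3: every entity has a value that a true ranking can be built from.
    has_ranking = bool(rows) and all(
        row.get(ranking_key) is not None for row in rows
    )

    return {
        "non_trivial_size": big_enough,
        "protected_attribute": has_protected,
        "true_ranking": has_ranking,
    }
```

A dataset that passes only two of the three checks would show up here as a dict with a single `False` entry, which makes it easy to see which requirement is the blocker.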
A list of FIFA 2018 players, a dataset from Rankit, was added as a possibility. It contains 17981 players, has protected attributes (age and nationality), and potentially has a true ranking. The true ranking could be based on goals scored or on money earned by the player. Some ranking data can also be found for this dataset, although it does not cover all of the players.
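One way a candidate true ranking could be derived from a single statistic such as goals scored is sketched below. This is an illustration under assumptions, not the method used in the project: the field names `name` and `goals` are hypothetical, ties are given the same rank (standard competition ranking), and players with no value for the statistic are left out, which mirrors the fact that the available ranking data does not cover every player.

```python
def rank_by(players, key):
    """Rank players by a statistic, highest value first.

    Ties share a rank (1, 1, 3, ...). Players whose value for `key`
    is missing are excluded from the ranking.
    """
    # Keep only players that actually have the statistic.
    scored = [p for p in players if p.get(key) is not None]
    scored.sort(key=lambda p: p[key], reverse=True)

    ranks = {}
    prev_value = None
    prev_rank = 0
    for position, player in enumerate(scored, start=1):
        # A new value starts a new rank; an equal value reuses the previous one.
        if player[key] != prev_value:
            prev_rank = position
            prev_value = player[key]
        ranks[player["name"]] = prev_rank
    return ranks


# Hypothetical records, for illustration only.
players = [
    {"name": "A", "goals": 10},
    {"name": "B", "goals": 7},
    {"name": "C", "goals": 10},
    {"name": "D", "goals": None},  # no data, so not ranked
    {"name": "E", "goals": 3},
]
ranking = rank_by(players, "goals")
```

The same function could be called with a different key, such as earnings, to compare how much the candidate true rankings disagree.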
Hospitals and doctors also have a significant amount of data, both attribute- and entity-wise. However, finding a true ranking might prove to be impossible.
Taking into account the feedback received from our mentors, we updated the section analyzing the outcome of the online user study. We also updated the machine learning section to include more references and added more charts throughout the paper.
The team also discussed the next steps and identified new features to be implemented.
Having finished my sections from last week, I was done writing. I spent most of the week grammar-checking the overall paper. Wanting to add more information on the machine learning part of the project, I also read some papers on different types of ranking.
“Learning to rank.” Learning to rank - RLScore 0.7 documentation, staff.cs.utu.fi/~aatapa/software/RLScore/tutorial_ranking.html#learning-to-rank.
Li, H. (2011). A short introduction to learning to rank. IEICE Trans. Inf. Syst., 94, 1854–1862.
Liu, Tie-Yan. "Learning to rank for information retrieval." Foundations and Trends® in Information Retrieval 3, no. 3 (2009): 225-331.