Regarding hypotheses and questions,
- What phenomena or properties are being investigated? Why are they of interest?
This paper investigates graph-based document similarity, an approach “exploiting explicit hierarchical and transversal relations”. The paper aims to show that exploiting these relations is efficient. The topic is of interest because word-distribution-based document representations are problematic when the language or vocabulary of the documents differs, while previous graph-based approaches are “infeasible in many applications”.
- Has the aim of the research been articulated? What are the specific hypotheses and
research questions? Are these elements convincingly connected to each other?
The specific hypotheses are that the new approach provides a “significantly higher correlation with human notions of document similarity”, that this approach “holds for short documents with few annotations”, and that the “document similarity can be calculated efficiently compared to other graph-traversal based approaches”.
- To what extent is the work innovative? Is this reflected in the claims?
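The main innovation is efficiency: the approach is intended to compute document similarity more efficiently than other graph-traversal-based approaches. This is reflected in the third hypothesis, that document similarity “can be calculated efficiently”.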
- What would disprove the hypothesis? Does it have any improbable consequences?
The hypotheses would be disproven if the approach were less efficient than other graph-traversal-based approaches, or if it did not produce a higher correlation with human notions of document similarity (in particular for short documents with few annotations).
- What are the underlying assumptions? Are they sensible?
- Has the work been critically questioned? Have you satisfied yourself that it is
Regarding evidence and measurement,
- What forms of evidence are to be used? If it is a model or a simulation, what
demonstrates that the results have practical validity?
The evidence comes from an experimental evaluation.
- How is the evidence to be measured? Are the chosen methods of measurement
objective, appropriate, and reasonable?
The evidence is measured with “Pearson and Spearman correlation plus their harmonic mean, as well as ranking quality using Normalized Discounted Cumulative Gain”. These measures are also used in related work, which makes the results comparable.
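To make these measures concrete, here is a minimal sketch in plain Python, assuming the system's similarity scores and the human ratings are given as parallel lists over the same document pairs. The function names and toy data are hypothetical, and the paper's exact nDCG variant (gain function, cutoff) is not stated here, so this uses the common linear-gain form with a log2 discount.

```python
import math

def pearson(x, y):
    # Pearson correlation; raises ZeroDivisionError if either list is constant.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    # 1-based ranks; tied values share the average of their positions.
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def spearman(x, y):
    # Spearman correlation = Pearson correlation of the ranks.
    return pearson(ranks(x), ranks(y))

def harmonic_mean(r, rho):
    # Assumes both correlations are positive.
    return 2 * r * rho / (r + rho)

def ndcg(ranked_relevances, k=None):
    # ranked_relevances: human relevance of each result, in the order the system ranked them.
    rels = ranked_relevances[:k] if k else ranked_relevances
    dcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(rels))
    ideal = sorted(ranked_relevances, reverse=True)[:len(rels)]
    idcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0

# Hypothetical toy data: human ratings and system scores for five document pairs.
human = [1.0, 2.0, 3.0, 4.0, 5.0]
system = [0.1, 0.3, 0.2, 0.8, 0.9]
r = pearson(system, human)
rho = spearman(system, human)
print(r, rho, harmonic_mean(r, rho), ndcg([3, 2, 3, 0, 1]))
```

Reporting the harmonic mean alongside both correlations penalizes a method that does well on one correlation but poorly on the other, since the harmonic mean is pulled toward the smaller value.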
- What are the qualitative aims, and what makes the quantitative measures you have
chosen appropriate to those aims?
- What compromises or simplifications are inherent in your choice of measure?
- Will the outcomes be predictive?
- What is the argument that will link the evidence to the hypothesis?
- To what extent will positive results persuasively confirm the hypothesis? Will
negative results disprove it?
Positive results will support the hypothesis, but only for the evaluated set of documents; they do not confirm it in general. Likewise, negative results will disprove the hypothesis only for this set of documents.
- What are the likely weaknesses of or limitations to your approach?
The approach requires at least one annotation per document.