Fighting for breath – Style analysis

The article was published in a radio listing/newspaper, so we cannot expect the same stylistic standards as in the scientific work we have discussed in the seminar. The article lacks references and has a more casual style overall. We would see these as flaws in scientific writing but can accept them in this type of literature.

The casual style is expressed in the use of contractions (“isn’t”, “wasn’t”, …) and exclamation marks (“near my home!”, “on the back of the truck!”). Furthermore, some figurative expressions, which might be unfamiliar to non-native speakers, are used to achieve a more colorful description (“thick like pea soup”, “political minefield”, “uphill struggle”). Naturally, the title follows this casual style as well, as it is more catchy than informative. Note that these points are acceptable when writing popular science but should be avoided in real scientific work.

Other guidelines, however, hold for both kinds of literature. In the following, I point out where the article adheres to or violates these more generally applicable guidelines. The article has a clear common thread which is easy to follow. This is achieved by using simple sentence structure and simple language. Furthermore, sentence and paragraph lengths are well balanced, so the text is neither monotonous nor disruptive to read. But the article also has some minor flaws. Sometimes it gives unnecessary information, like the numbers of cars, vans and trucks in traffic, which merely illustrate the huge number of vehicles producing exhaust gases. Moreover, the article makes extensive use of parentheses to give extra information, which sometimes breaks the reading flow. Lastly, many words are printed in italics for emphasis where none is needed.

To conclude, the article shows an overall good style for popular science because it provides a clear common thread and uses simple sentence structure and language, but it has minor flaws which can break the reading flow.

Zobel’s Checklist

Since I don’t have a master’s thesis topic yet, I used the following paper:

Paul, C., Efficient Graph-Based Document Similarity. In:
Proceedings of the ESWC 2016,

Regarding hypotheses and questions,

  • What phenomena or properties are being investigated? Why are they of interest?
    • The paper investigates automated measurement of document similarity. This is used, for example, in article recommendations on newspaper websites.
  • Has the aim of the research been articulated? What are the specific hypotheses and research questions? Are these elements convincingly connected to each other?
    • The aim is to present a graph-based algorithm to measure the semantic similarity of documents that:
      • (i) provides higher correlation with the human notion of similarity than similar approaches
      • (ii) maintains this higher correlation for small documents with few annotations
      • (iii) is more efficient than other graph-based approaches
  • To what extent is the work innovative? Is this reflected in the claims?
    • The algorithm is said to be more efficient than other graph-based algorithms. The similarity measure used is said to correlate better with the human notion of similarity than comparable ones.
  • What would disprove the hypothesis? Does it have any improbable consequences?
    • (i) and (ii): Similar approaches that provide equal or higher correlation with the human notion of similarity
    • (iii): An existing graph-based algorithm that is equally or more efficient with similar or better results
  • What are the underlying assumptions? Are they sensible?
    • Semantically annotated documents are available as input. This is sensible because it lets the work focus on comparing the documents rather than analyzing them.
  • Has the work been critically questioned? Have you satisfied yourself that it is sound science?
    • The paper shows that the proposed algorithm is better than some selected other ones on selected data. As I am not familiar with the topic of semantic document comparison, I cannot say whether the selected data and reference algorithms are representative. Additionally, the paper never states the limitations of the proposed algorithm. So the results do not appear very trustworthy to me.

Regarding evidence and measurement,

  • What forms of evidence are to be used? If it is a model or a simulation, what demonstrates that the results have practical validity?
    • An experiment on two datasets (one standard benchmark set and another with small documents)
  • How is the evidence to be measured? Are the chosen methods of measurement objective, appropriate, and reasonable?
    • The authors use standard metrics that were also used to evaluate the reference algorithms. So the three criteria (objective, appropriate, and reasonable) seem to be fulfilled.
  • What are the qualitative aims, and what makes the quantitative measures you have chosen appropriate to those aims?
    • The authors want to show that their proposed algorithm comes closer to the human notion of similarity and works more efficiently than the reference algorithms. Using the standard metrics is appropriate for that.
  • What compromises or simplifications are inherent in your choice of measure?
    • I am not familiar enough with the measures used to comment on this.
  • Will the outcomes be predictive?
    • Yes, the hypotheses predict higher scores on the similarity metrics and shorter execution time for the proposed algorithm in comparison to similar ones.
  • What is the argument that will link the evidence to the hypothesis?
    • The quantitative measures allow a direct comparison of the proposed algorithm's performance with that of the reference algorithms. (Is it faster? Is it closer to the human notion of similarity?)
  • To what extent will positive results persuasively confirm the hypothesis? Will negative results disprove it?
    • Since the authors do not state any constraints, they indirectly claim that their algorithm performs better than the reference ones under all circumstances. Therefore, a positive result in an experiment may strongly support their hypothesis but cannot fully confirm it. However, negative results will directly disprove their hypothesis.
  • What are the likely weaknesses of or limitations to your approach?
    • I could not find any statements by the authors regarding weaknesses or limitations of their proposed approach.
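
To make the link between evidence and hypotheses more concrete, here is a minimal sketch of how such a comparison could be computed. It is not the authors' evaluation code. I assume Pearson correlation as the "standard metric", which the notes above do not confirm, and the function names (pearson, evaluate, jaccard), the document pairs and the human ratings are all hypothetical. A simple word-overlap score stands in for a real similarity algorithm.

```python
import math
import time


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def evaluate(similarity_fn, document_pairs, human_ratings):
    """Return (correlation with human ratings, elapsed seconds)."""
    start = time.perf_counter()
    scores = [similarity_fn(a, b) for a, b in document_pairs]
    elapsed = time.perf_counter() - start
    return pearson(scores, human_ratings), elapsed


def jaccard(a, b):
    """Word-overlap similarity; a stand-in, not the paper's algorithm."""
    set_a, set_b = set(a.split()), set(b.split())
    return len(set_a & set_b) / len(set_a | set_b)


# Made-up toy data: three document pairs and hypothetical human judgments.
pairs = [("a b c", "a b c"), ("a b c", "a b d"), ("a b c", "d e f")]
ratings = [5.0, 3.0, 1.0]

correlation, seconds = evaluate(jaccard, pairs, ratings)
print(f"correlation with human ratings: {correlation:.3f}")
print(f"scoring time: {seconds:.6f} s")
```

Running evaluate once per algorithm on the same pairs would give two numbers per algorithm, one for hypotheses (i) and (ii) and one for hypothesis (iii). This also shows why a single positive result only supports the claims for the chosen datasets.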