Here is the final version of my paper:
Thank you for reading! 🙂
A first version of my paper about ‘Operating system level virtualization’ is now available: kuntke_oslv
Enjoy reading (and reviewing!)
Because mathematical formatting is difficult inside WordPress, I have also put the text into a PDF for a more professional look and feel: homework_describing_algorithm.pdf
Choose a simple algorithm and a standard description of it, such as linear search
in a sorted array. Rewrite the algorithm in prosecode. Repeat the exercise with a
more interesting algorithm, such as heapsort. Now choose an algorithm with an
asymptotic cost analysis. Rewrite the algorithm as literate code, incorporating
the important elements of the analysis into the algorithm’s description.
It therefore compares each element e of the list with the search key and counts all comparisons until either an element equals the key or the end of the list is reached.
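As a sketch of the prosecode exercise, here is a minimal linear search with comparison counting in Python (the function and variable names are my own choice, not taken from the exercise text):

```python
def linear_search(lst, key):
    """Return (index of key or None, number of comparisons made)."""
    comparisons = 0
    for i, e in enumerate(lst):
        comparisons += 1            # compare element e with the search key
        if e == key:
            return i, comparisons   # an element equals the key
    return None, comparisons        # the end of the list is reached
```

For example, `linear_search([3, 7, 9], 9)` returns `(2, 3)`: the key is found at index 2 after three comparisons.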
Heapsort(l) takes a list l and sorts all of its elements. In the first step, all elements are arranged in a heap: a binary tree with the property that every node is greater than or equal to each of its children. In the second step, a sorted list is created by repeatedly removing the biggest (root) node from the heap and appending it to the list.
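A minimal heapsort sketch in Python using the standard-library `heapq` module. Note one deviation from the description above: `heapq` maintains a min-heap, so the root is the smallest element and the output comes out in ascending order, whereas the prose describes a max-heap.

```python
import heapq

def heapsort(lst):
    """Step 1: arrange all elements in a heap; step 2: repeatedly
    remove the root node and append it to the output list."""
    heap = list(lst)
    heapq.heapify(heap)                  # build the heap in O(n)
    out = []
    while heap:
        out.append(heapq.heappop(heap))  # remove root, append to result
    return out
```

For example, `heapsort([5, 1, 4])` returns `[1, 4, 5]`.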
Quicksort(l) sorts a given list l using a divide-and-conquer approach. The data is partitioned recursively into sublists by comparing all elements with a so-called pivot element: lower and equal values go into one sublist and greater values into the other. A well-chosen pivot element is therefore needed for a balanced partitioning, meaning sublists of roughly equal size.
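The partitioning scheme just described can be sketched in a few lines of Python. This naive version simply takes the first element as pivot, which illustrates the structure but is exactly the kind of poorly chosen pivot the text warns about (already-sorted input degenerates to unbalanced partitions):

```python
def quicksort(l):
    """Divide and conquer: partition around a pivot, recurse on sublists."""
    if len(l) <= 1:
        return l                                  # base case: nothing to sort
    pivot, rest = l[0], l[1:]
    lower_eq = [e for e in rest if e <= pivot]    # lower and equal values
    greater = [e for e in rest if e > pivot]      # greater values
    return quicksort(lower_eq) + [pivot] + quicksort(greater)
```

For example, `quicksort([3, 1, 2, 3])` returns `[1, 2, 3, 3]`.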
The major steps of the algorithm are as follows:
We now examine these steps in detail, including a cost analysis.
Therefore we need as many comparisons as there are elements in the list:
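Assuming this refers to quicksort's partitioning step (each of the remaining elements is compared with the pivot once), the standard cost sketch is:

```latex
T(n) = T(k) + T(n-k-1) + (n-1)
```

where $k$ is the size of the lower-or-equal sublist. With balanced partitions ($k \approx n/2$) this resolves to $T(n) \in O(n \log n)$; with the worst-case pivot ($k = 0$ on every level) it degenerates to $O(n^2)$.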
The text “Fighting for breath” describes air pollution and its impact on our health.
My overall impression of the text’s style is good. Of course, there are some minor flaws that I would change, for example the following (long) term inside brackets:
… (ultraviolet light acts on traffic fumes to produce ozone – a vital filter in the stratosphere but a highly irritant substance when breathed at ground level).
This example severely breaks my reading flow. One option would be to remove the additional facts introduced by the dash. Another would be to turn that information into complete, non-bracketed sentences.
The other bracketed information is used in a better manner and is quite okay for a popular-science news magazine/radio(?) article. For a scientific report, some of the other brackets would also have to be shortened or removed completely.
Another remark about the text is that the use of exclamation marks should be avoided; at least in a scientific article this would not pass quality assurance.
Besides these two points (too heavy use of bracketed information and the use of exclamation marks), I like the text for its general style of writing.
2. Apply the Checklist from Zobel p.49 to the research plan of your (future) Master thesis.
Try to answer all questions. If you feel that a question is inappropriate, explain briefly why this is the case.
Since I do not have a definite master thesis topic yet, I did the task with the research question of the given paper.
• What phenomena or properties are being investigated? Why are they of interest?
The efficiency of graph-based similarity index methods is investigated. The research question is: how can a graph-based similarity index method be made much faster than traditional methods? This question is of interest because graph-based similarity index methods tend to produce higher quality in some areas (e.g. document comparison) than currently used word-distribution-based methods, but at lower speed.
• Has the aim of the research been articulated? What are the specific hypotheses and
research questions? Are these elements convincingly connected to each other?
The aim of the research is to show that:
i) the similarity measure provides a significantly higher correlation with human notions of document similarity than comparable measures
ii) this also holds for short documents with few annotations
iii) document similarity can be calculated efficiently compared to other graph-traversal based approaches
These hypotheses are strongly connected to each other, and therefore I think the connection is convincing.
• To what extent is the work innovative? Is this reflected in the claims?
The work is innovative because currently used similarity methods are not fast enough for productive use in many areas (e.g. web search tasks).
• What would disprove the hypothesis? Does it have any improbable consequences?
The result that
i) the similarity measure produces a lower correlation with human notions of document similarity than existing word-distribution-based methods,
ii) the error is much higher for short documents, or
iii) the developed similarity method is not more efficient than other graph-traversal-based approaches
… would disprove one (or two/three) of the hypotheses.
• What are the underlying assumptions? Are they sensible?
The assumption is that the developed similarity method is used in the area of calculating document similarity. I think this is a sensible assumption, because these methods are mainly designed (as already indicated by the name) for this field.
• Has the work been critically questioned? Have you satisfied yourself that it is sound science?
(This can only be answered by the author, but I will try.)
Of course the work has been critically questioned, because the outcome was not certain at the beginning of the research task. It is sound science, because research in this area and answering the research questions will provide an advance in science – especially in the domain of information retrieval.
• What forms of evidence are to be used? If it is a model or a simulation, what demonstrates that the results have practical validity?
Experiments and simulations will generate the evidence for the hypotheses.
• How is the evidence to be measured? Are the chosen methods of measurement objective, appropriate, and reasonable?
Measuring the efficiency of the algorithm/implementation will be objective. The correlation over all documents will also be objective, but the distinction of small documents is not objective; it should be made using an appropriate threshold value for the document size.
• What are the qualitative aims, and what makes the quantitative measures you have chosen appropriate to those aims?
The qualitative aim is to produce a higher correlation compared to currently used techniques. The quantitative measure is the higher efficiency compared to graph-based document models, possibly resulting in an efficiency as high as that of word-distribution-based methods.
• What compromises or simplifications are inherent in your choice of measure?
The simplification is that the documents used for the experiment/simulation are assumed to be representative.
• Will the outcomes be predictive?
Yes – if everything turns out as expected, the outcome should be a high correlation and high efficiency. The exact measurement numbers in both disciplines are only vaguely predictable.
• What is the argument that will link the evidence to the hypothesis?
The argument that the simulation/experiment will be very similar to everyday usage of the method will link the evidence to the hypothesis.
• To what extent will positive results persuasively confirm the hypothesis? Will negative results disprove it?
Positive results will show ‘how well’ the hypotheses are matched.
Negative results will disprove that the achieved quality and/or efficiency of the developed similarity method fits the criteria. But they will not exclude the general possibility that similar techniques can be good enough to fit all the criteria.
• What are the likely weaknesses of or limitations to your approach?
There are no apparent weaknesses in the targeted approach. One limitation is that the method can only compare text documents and is not applicable to multimedia retrieval (e.g. finding similar images).
Virtualization is a technology that allows sharing the resources of a physical computer between multiple operating systems (OSs). The concept of Virtual Machines (VMs) was introduced in the 1960s by IBM. One revolutionary step was taken by the FreeBSD project in 2000 with the introduction of FreeBSD Jails. With this work, the concept of operating system level virtualization – one of the two big categories of virtualization technologies – was boosted in the BSD community and later also became more and more popular in the GNU/Linux world.
Nowadays, companies in the field of cloud computing, like Amazon, use such operating system level virtualization approaches to share their server capacity with customers in a highly dynamic way.
First, we classify the main virtualization technologies into two categories and present a deeper introduction to operating system level virtualization. Secondly, we show the differences between the two virtualization categories by comparing two popular representatives of each. Finally, we present a summary of the comparison and outline the advantages and disadvantages of *operating system level virtualization*.
(My insertions are intended to be based on American English rules)
In the following you can see my five best references for my state-of-the-art paper: “Operating system level virtualization technologies”. I found only one interesting book on my topic, which was not interesting enough to be among my five best references, so this list contains only journal and conference papers.
 – Soltesz, Stephen, et al. “Container-based operating system virtualization: a scalable, high-performance alternative to hypervisors.” ACM SIGOPS Operating Systems Review. Vol. 41. No. 3. ACM, 2007.
This journal paper introduces the topic of container-based virtualization, which covers the subject of my work in large parts. Furthermore, it shows the differences between this next-gen approach and the previously almost universally used hypervisor virtualization.
Google Scholar shows a citation count of 441, which seems to be a high number in this area.
 – Kamp, Poul-Henning, and Robert NM Watson. “Jails: Confining the omnipotent root.” Proceedings of the 2nd International SANE Conference. Vol. 43. 2000.
This conference paper by the developers of “FreeBSD jails” is written with first-hand knowledge of the technology. “FreeBSD jails” itself was a technological game changer and the conceptual ancestor of currently used systems (e.g. lxc).
The paper was submitted to the SANE (System Administration and Network Engineering) conference in 2000. Google Scholar shows a citation count of 338, which is also a good number.
 – Felter, Wes, et al. “An updated performance comparison of virtual machines and Linux containers.” 2015 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS). IEEE, 2015.
This relatively new conference paper compares specific software from both approaches: container-based virtualization and hypervisor-based virtualization. The presented information can be utilized very well in my work.
 – Rodríguez-Haro, Fernando, et al. “A summary of virtualization techniques.” Procedia Technology 3 (2012): 267-272.
This journal paper provides a gentle introduction to virtualization techniques and an overview of the main concepts of virtualization. Because it is a good paper for classifying the different approaches to virtualization, I will use this work for retrieving basic information.
Google Scholar shows a citation count of 11.
 – Rathore, Muhammad Siraj, Markus Hidell, and Peter Sjödin. “KVM vs. LXC: comparing performance and isolation of hardware-assisted virtual routers.” American Journal of Networks and Communications 2.4 (2013): 88-96.
This journal paper provides a deeper comparison of hypervisor virtualization with the newer container-based virtualization by investigating representative software of each type.
Google Scholar shows a citation count of “only” 5, but because the results of the paper seem to be very interesting for my work, I chose it as part of my “best five references”.
What is the question of my bachelor thesis?
The question of my bachelor thesis is: how can a data structure and data management be designed to make it possible to render an arbitrarily sized multichannel volume dataset at a high rendering speed on casual consumer hardware?
I think I should add a short motivation here: medical and biological 3D datasets can be very, very huge (>100 GB), but it is kind of a computer scientist’s dream that they can be processed smoothly by ordinary computers.
The company I was researching for with my bachelor thesis already knew that the goal is achievable in parts – and therefore how the question should be answered. The reason is that another software company had already achieved this goal in parts. But unfortunately, they did not offer any details of their implementation or of the research needed to get this done.
What was the procedure to resolve the question?
At first I looked for related work in the specific domain of my bachelor thesis – medical and biological volume data visualization. Afterwards I incrementally widened the search range to find more interesting papers.
Some of the retrieved papers could answer specific questions about my implementation in more detail or brought up further ideas. Especially clever caching algorithms turned out to be interesting in the end; caching is needed in many areas of computer science.
In fact, the main part of my thesis was built on the statements of only a few papers (three or four in number), because they covered my task very well and had different details in their approaches.
While I was creating my theories about what a good data structure and data management should look like, I started writing some code to verify most of the hypotheses I had created during my research phase. Unfortunately, my time management was not good enough to produce an implementation that covers all of my theoretical work.
Sometimes I needed information I could not retrieve during my research phase, so I had to create this information myself by writing some benchmarking tools, e.g. for input/output performance depending on different operating systems, hardware, and file systems. These results were interesting for my experimental implementation, but creating them slowed down my progress.
What are the results?
My result is a detailed concept of a data structure and data management for the special case of volume data visualization, including an experimental implementation that is able to load very huge datasets and render them with standard rendering techniques.
1. Technical Overview: Urban Internet of Things
Alternative: Urban Internet of Things overview with evaluation of Padova Smart City project
2. Black-box string test case generation
Alternative: Generation of superior string test sets for real-world programs
Operating system level virtualization technologies
A state-of-the-art paper about technologies that allow keeping services in separate, restricted environments. The paper should include an in-depth view of lxc, docker, rocket, and BSD jails. A comparison with “traditional” virtualization technologies may also be part of the paper.