Lecture task: The process of my bachelor thesis

What is the question of my bachelor thesis?

The question of my bachelor thesis is: how can a data structure and data management be designed so that an arbitrarily sized multichannel volume dataset can be rendered at high speed on ordinary consumer hardware?

I think I should add a short motivation here: medical and biological 3D datasets can be extremely large (>100 GB), yet it is something of a computer scientist’s dream that ordinary computers can process them smoothly.

The company for which I did the research in my bachelor thesis already knew that the goal is partly achievable, and therefore roughly how the question should be answered. This was because another software company had already achieved this goal in part. Unfortunately, they did not offer any details of their implementation or of the research behind it.

What was the procedure to resolve the question?

First, I looked for related work in the specific domain of my bachelor thesis: medical and biological volume data visualization. Afterwards, I widened the search incrementally to find more relevant papers.
Some of the papers I found answered specific questions about my implementation in more detail or sparked further ideas. In the end, caching algorithms were especially interesting, since they are needed in many areas of computer science.
In fact, the main part of my thesis was built on only a few papers (three or four), because they covered my task very well and differed in the details of their approaches.
While developing my theories about what a good data structure and data management should look like, I started writing code to verify most of the hypotheses I had formed during my research phase. Unfortunately, my time management was not good enough to produce an implementation that covers all of my theoretical work.
Sometimes I needed information that I could not find during my research phase, so I had to produce it myself by writing benchmarking tools, for example for input/output performance depending on operating system, hardware and file system. These results were interesting for my experimental implementation, but producing them slowed down my progress.
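
To give an idea of what such a benchmarking tool can look like, here is a minimal sketch in Python. It is not the tool written for the thesis; the function names, the 1 MiB default block size and the temporary test file are assumptions made for illustration. It only measures sequential read throughput, and the operating system's page cache can make repeated runs look faster than the disk really is.

```python
import os
import tempfile
import time


def write_test_file(size_bytes):
    """Create a temporary file filled with random bytes and return its path."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(os.urandom(size_bytes))
    return path


def measure_read_throughput(path, block_size=1 << 20):
    """Read the file sequentially in blocks and return the throughput in MiB/s."""
    total_bytes = 0
    start = time.perf_counter()
    # buffering=0 disables Python's own buffer; the OS may still cache the file.
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            total_bytes += len(chunk)
    elapsed = time.perf_counter() - start
    # Guard against a zero duration on very small files.
    return total_bytes / (1024 * 1024) / max(elapsed, 1e-9)


if __name__ == "__main__":
    path = write_test_file(64 * 1024 * 1024)  # 64 MiB of test data
    try:
        for block_size in (4096, 65536, 1 << 20):
            rate = measure_read_throughput(path, block_size)
            print(f"block size {block_size:>8} bytes: {rate:.1f} MiB/s")
    finally:
        os.remove(path)
```

Running the same script on different operating systems, disks and file systems gives numbers that can be compared, which is the kind of information such a tool is meant to provide.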

What are the results of the question?

My result is a detailed concept of a data structure and data management for the special case of volume data visualization, including an experimental implementation that is able to load very large datasets and render them with standard rendering techniques.
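
The concept itself is not described in this post. As a rough illustration of the kind of caching mentioned above, the following Python sketch keeps only a limited number of volume blocks ("bricks") in memory and evicts the least recently used one. Splitting the volume into bricks, the class name `BrickCache` and the `load_brick` callback are all assumptions for this example, not the design from the thesis.

```python
from collections import OrderedDict


class BrickCache:
    """Keep at most `capacity` bricks in memory, evicting the least recently used."""

    def __init__(self, capacity, load_brick):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.load_brick = load_brick  # hypothetical callback: brick index -> data
        self._bricks = OrderedDict()

    def get(self, index):
        """Return the brick for `index`, loading it on a cache miss."""
        if index in self._bricks:
            self._bricks.move_to_end(index)  # mark as most recently used
            return self._bricks[index]
        data = self.load_brick(index)  # cache miss: fetch from slow storage
        self._bricks[index] = data
        if len(self._bricks) > self.capacity:
            self._bricks.popitem(last=False)  # drop the least recently used brick
        return data

    def __contains__(self, index):
        return index in self._bricks
```

A renderer would call `get((x, y, z))` for every brick it needs for the current view. With a capacity chosen to fit the available memory, the dataset on disk can be much larger than what is held in RAM at any time.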

Summary of (a part of) my bachelor thesis

This is just about a third of the topics covered in my bachelor thesis. Nevertheless, it is quite long, so I guess there are too many details. I admit that, inspired by Schlüko 3, my goal was actually to match the form given by the titles.

My bachelor thesis was written in the context of a simulation system for chemical, thermal and hydrological processes in solid materials.
In computer-based simulation, time series are used to model time-dependent effects on the simulation model. These effects originate from processes in systems that affect the modeled real-world system but are not modeled themselves.
Time series are generated, for example, by weather stations that regularly measure properties of the local climate such as temperature or humidity, or by gauging stations at rivers that regularly measure properties of the river such as its water level or flow velocity.

These devices, their sensors or their routines may fail. A station may be out of power, a sensor may be blocked or biased by a leaf, measurement techniques may not work under certain unintended circumstances such as extreme rainfall, or errors in routines may lead to erroneous representations of measurement results. Consequently, certain data in the generated time series can be missing, uninterpretable or less representative of the effect on the system that should be modeled. When such data is used for the model, the model will be less representative than a model that does not contain these errors, and the simulation results will be less reliable.
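
Two of these problem types, missing data and uninterpretable values, can be found mechanically. The following Python sketch shows one way to do that. It is only an illustration under stated assumptions: the input format (a time-sorted list of `(timestamp, value)` pairs with timestamps in seconds), the function name `find_problems` and the fixed expected interval are not taken from the thesis. Values that are merely less representative, such as those from a biased sensor, cannot be detected this simply.

```python
import math


def find_problems(samples, expected_interval):
    """Return (gaps, invalid) for (timestamp, value) pairs sorted by time.

    gaps:    (start, end) timestamp pairs where the spacing between two
             consecutive samples exceeds expected_interval.
    invalid: timestamps whose value cannot be read as a finite number.
    """
    gaps = []
    invalid = []
    previous_time = None
    for timestamp, value in samples:
        if previous_time is not None and timestamp - previous_time > expected_interval:
            gaps.append((previous_time, timestamp))
        previous_time = timestamp
        try:
            number = float(value)
        except (TypeError, ValueError):
            # None, or text that is not a number, such as "n/a"
            invalid.append(timestamp)
            continue
        if not math.isfinite(number):
            # NaN or infinity
            invalid.append(timestamp)
    return gaps, invalid
```

For a station that should report every 600 seconds, `find_problems(samples, 600)` returns the time ranges without measurements and the timestamps with unreadable values as two separate lists, so a visualization can mark each kind of problem differently.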

To support the identification of these problems, the goal of this work was to develop a visualization for time series (using HTML and a certain JavaScript library). It should process such data and present the different problems in different ways, so that the user is able to distinguish them.