DB II - Programming Project
This is the documentation of the programming project for the exercises of the lecture 'Databases Implementation Techniques' (DB2) in the summer term 2020. You can find general information about the lecture here.
The purpose of the programming tasks is to deepen your knowledge of selected aspects of the lecture. This year, we decided to focus on compression techniques in column-oriented database management systems. Furthermore, we chose C++ as the programming language because, apart from C, it is the most frequently used programming language for database management systems. Your task is to implement compression techniques in our framework. We provide a set of classes as a starting point, into which you integrate your implementation of a given interface. You can download the sources here. A set of unit tests will help you identify errors during development. The same unit tests will be used at the end of the term to validate your solution. A working implementation is a necessary prerequisite for participating in the exam!
You may choose between the following compression techniques (you may suggest other compression techniques as well):
- Run Length Encoding
- Delta Coding
- Bit-Vector Encoding
- Dictionary Encoding
- Frequency Partitioning
All compression techniques are explained in the lecture. You can find the slides here.
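To make two of the listed techniques concrete, here is a minimal sketch of delta coding and bit-vector encoding on plain std::vector data. It is independent of the framework: the function names (deltaEncode, deltaDecode, bitVectorEncode, bitVectorDecode) are hypothetical, and a real column implementation would additionally pack the deltas into fewer bits and store the bit vectors more compactly.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Delta coding (hypothetical helper): keep the first value as a delta from 0,
// then store only the difference to the predecessor. On sorted or slowly
// changing data the deltas are small and could be packed into fewer bits.
std::vector<std::int64_t> deltaEncode(const std::vector<std::int64_t>& values) {
    std::vector<std::int64_t> deltas;
    deltas.reserve(values.size());
    std::int64_t previous = 0;
    for (std::int64_t v : values) {
        deltas.push_back(v - previous);
        previous = v;
    }
    return deltas;
}

std::vector<std::int64_t> deltaDecode(const std::vector<std::int64_t>& deltas) {
    std::vector<std::int64_t> values;
    values.reserve(deltas.size());
    std::int64_t running = 0;
    for (std::int64_t d : deltas) {
        running += d;  // the prefix sum restores the original values
        values.push_back(running);
    }
    return values;
}

// Bit-vector encoding (hypothetical helper): one bit vector per distinct
// value; bit i is set iff row i holds that value. Pays off for columns
// with few distinct values.
std::map<int, std::vector<bool>> bitVectorEncode(const std::vector<int>& values) {
    std::map<int, std::vector<bool>> bitmaps;
    for (std::size_t row = 0; row < values.size(); ++row) {
        std::vector<bool>& bits = bitmaps[values[row]];
        bits.resize(values.size(), false);  // no-op once the vector is sized
        bits[row] = true;
    }
    return bitmaps;
}

std::vector<int> bitVectorDecode(const std::map<int, std::vector<bool>>& bitmaps,
                                 std::size_t rows) {
    std::vector<int> values(rows);
    for (const auto& entry : bitmaps)
        for (std::size_t row = 0; row < rows; ++row)
            if (entry.second[row]) values[row] = entry.first;
    return values;
}
```

For example, deltaEncode({100, 102, 103, 107, 110}) yields {100, 2, 1, 4, 3}, and bitVectorEncode({1, 0, 1, 2, 1}) produces three bit vectors of five bits each, one per distinct value.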
Students form teams of two.
Please register your team by 11.05.2020 via Moodle.
Solutions are to be submitted via Moodle.
The deadline is 06.07.2020 at 23:59.
Note that the deadline is strict; there will be no extension.
Each team will present and discuss its solution in the last exercise session.
Teams consisting of bachelor students have to implement two compression techniques and will receive 5 credit points when they pass the exam.
Teams consisting of master students have to implement three compression techniques, because they will receive 6 credit points when they pass the exam.
We will check the quality of your submitted solution. It has to pass the unit tests, implement the compression technique it claims to represent, and must not be a copy of a solution submitted by another team or of any third-party implementation. Teams whose solution fails to fulfill even one of these requirements will not be able to participate in the exam.
The framework runs on Linux and on Windows (Cygwin) with common C++ compilers (g++, clang). You need to install the Boost libraries (Serialization, Any), which are easy to install on both Linux and Windows (Cygwin).
Setup in Ubuntu
Open a terminal and type:
sudo apt-get install build-essential libboost-all-dev doxygen
Then change into the directory where you unpacked the source archive and run the following commands to build the program and its documentation, and to run the program:
Setup for Windows (Cygwin) - Unsupported
On Windows (Cygwin), you need to install the necessary packages using the GUI of the Cygwin setup program, which you can download from the official website. Install the latest version of the Boost libraries, the compiler you wish to use (e.g., g++ or clang), the make program, and a tool to unpack the source archive. The build steps are the same as for Ubuntu.
To implement your selected compression technique, you have to inherit from the base class CoGaDB::CompressedColumn and implement its pure virtual methods (similar to abstract methods in Java). You can test your class by creating an instance and passing a pointer to it to the unit test function. The project contains an example, CoGaDB::DictionaryCompressedColumn, which is stored in the file compression/dictionary_compressed_column.hpp.
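The sketch below illustrates this inheritance pattern with Run Length Encoding. It does not use the real framework: CompressedColumnSketch is a hypothetical, heavily reduced stand-in for CoGaDB::CompressedColumn, whose actual interface declares more pure virtual methods and may use different signatures, and roundTripTest is a hypothetical helper that only imitates the idea of passing a base-class pointer to a test function.

```cpp
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified stand-in for CoGaDB::CompressedColumn<T>.
// Check the framework sources for the real list of pure virtual methods.
template <typename T>
class CompressedColumnSketch {
public:
    virtual ~CompressedColumnSketch() {}
    virtual void insert(const T& value) = 0;
    virtual T get(std::size_t tid) const = 0;
    virtual std::size_t size() const = 0;            // number of logical rows
    virtual std::size_t getSizeInBytes() const = 0;  // compressed footprint
};

// Run Length Encoding: consecutive equal values are stored once,
// together with the length of the run.
template <typename T>
class RunLengthColumnSketch : public CompressedColumnSketch<T> {
public:
    void insert(const T& value) override {
        if (!runs_.empty() && runs_.back().first == value) {
            ++runs_.back().second;  // extend the current run
        } else {
            runs_.push_back(std::make_pair(value, std::size_t(1)));  // start a new run
        }
        ++rows_;
    }

    T get(std::size_t tid) const override {
        // Linear scan over the runs; prefix sums of the run lengths
        // would allow a binary search instead.
        std::size_t offset = 0;
        for (const auto& run : runs_) {
            if (tid < offset + run.second) return run.first;
            offset += run.second;
        }
        throw std::out_of_range("tid out of range");
    }

    std::size_t size() const override { return rows_; }

    std::size_t getSizeInBytes() const override {
        return runs_.size() * sizeof(std::pair<T, std::size_t>);
    }

    std::size_t numberOfRuns() const { return runs_.size(); }

private:
    std::vector<std::pair<T, std::size_t>> runs_;  // (value, run length)
    std::size_t rows_ = 0;
};

// Hypothetical test helper: it only sees the base-class pointer,
// so it works for any derived compressed column.
bool roundTripTest(CompressedColumnSketch<int>* column, const std::vector<int>& input) {
    for (int v : input) column->insert(v);
    if (column->size() != input.size()) return false;
    for (std::size_t i = 0; i < input.size(); ++i)
        if (column->get(i) != input[i]) return false;
    return true;
}
```

For instance, `RunLengthColumnSketch<int> col; roundTripTest(&col, {7, 7, 7, 3, 3, 7});` stores the six values as three runs: (7, 3), (3, 2) and (7, 1).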
You should familiarize yourself with the following features of the C++ language:
- pointers, references, and smart pointers
- creating objects on the heap with new
- call by value and call by reference
- public inheritance
- basic STL containers, such as std::vector and std::list
- basic templates and how to use them
You can find a lot of useful examples in the framework code, e.g., the unit tests.
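As a quick refresher, the following self-contained sketch touches several items from the list above: public inheritance from an abstract base class, call by value versus call by reference, a basic function template used with std::vector and std::list, and an object created with new compared to one owned by a smart pointer (std::shared_ptr from the standard library). All names are illustrative and not taken from the framework.

```cpp
#include <list>
#include <memory>
#include <vector>

// Public inheritance with a pure virtual method (an "abstract" base class).
struct Shape {
    virtual ~Shape() {}
    virtual double area() const = 0;
};

struct Square : public Shape {
    explicit Square(double side) : side_(side) {}
    double area() const override { return side_ * side_; }
    double side_;
};

// Call by value: the function works on a copy; the caller sees no change.
void incrementCopy(int x) { ++x; }

// Call by reference: the caller's variable is modified.
void incrementRef(int& x) { ++x; }

// Basic function template: works for any container with begin()/end()
// whose elements support operator+=.
template <typename Container>
typename Container::value_type sum(const Container& c) {
    typename Container::value_type total{};
    for (const auto& element : c) total += element;
    return total;
}

// An object created with new must be deleted manually; a smart pointer
// releases the object automatically when the last owner goes away.
double areaViaPointers() {
    Shape* raw = new Square(2.0);
    double a = raw->area();
    delete raw;  // forgetting this leaks memory

    std::shared_ptr<Shape> smart = std::make_shared<Square>(3.0);
    return a + smart->area();  // 4 + 9
}
```

Calling incrementCopy(n) leaves n unchanged, while incrementRef(n) increments it; sum works unchanged for both a std::vector<int> and a std::list<double>.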
Recommended (selected) sources of information about C++ are:
- Bjarne Stroustrup. The C++ Programming Language. Addison-Wesley, 4th edition, 2013
- Scott Meyers. Effective C++: 55 Specific Ways to Improve Your Programs and Designs. Addison-Wesley, 3rd edition, 2005