The research topic was initially defined only as follows: XML query processing using GPGPU. To introduce this topic clearly, terms and acronyms are defined first, then the core idea is explained.
The eXtensible Markup Language (XML) is a human- and machine-readable, tree-based language widely used in computing for the transmission and storage of data. This simplification of the SGML language [W-XML] was designed to ease and spread the use of an interoperable language over the internet. That aim has since been reached: XML is used by a huge number of applications, and most programming languages feature an XML parser. Strictly speaking, XML is a meta-markup language, used for creating other markup languages; XML tags describe the contents of a document.
The XML Path (XPath) language was created to address parts of XML documents; it is required by both the XPointer and XSL Transformations standards. As a consequence of the broad adoption of XML, the need to locate specific data inside XML documents grew, and it revealed a trade-off: keep data in the convenient XML format with slow querying, or convert it into a more efficient database format with faster querying?
The bigger the amount of XML data, the more time-consuming the conversion option becomes. My research is about finding a way to speed up XPath queries over big XML documents.
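As a minimal illustration of what "locating specific data" with XPath means, the following sketch queries a small invented XML document with Python's standard `xml.etree.ElementTree` module, which supports a limited subset of XPath (the document contents and tag names are made up for the example):

```python
import xml.etree.ElementTree as ET

# A tiny invented XML document: the tags describe the data they wrap.
doc = """
<library>
  <book year="2003"><title>XML in a Nutshell</title></book>
  <book year="2007"><title>GPU Gems 3</title></book>
</library>
"""

root = ET.fromstring(doc)

# An XPath expression with an attribute predicate: select the titles
# of the books published in 2007.
titles = [t.text for t in root.findall("./book[@year='2007']/title")]
print(titles)  # ['GPU Gems 3']
```

For large documents, evaluating such expressions with a sequential in-memory parser is exactly the slow path that motivates this research.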
The Graphics Processing Unit (GPU) is a processor architecture designed from the start for efficient, massively parallel execution. As graphics rendering grew more complex, these chips gained a more general-purpose design. The acronym GPGPU refers to these newer GPU architectures and stands for General-Purpose GPU.
The strategy of Professor Amagasa (who suggested this idea) is to execute on the GPU several instances of the same query processing algorithm, each addressing a different data partition. This is data parallelism, as opposed to task parallelism, where several different algorithms run at the same time. Although this hardware is called "general purpose", several limitations have to be overcome.
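The data-parallel strategy can be sketched as follows, using a CPU thread pool as a stand-in for the GPU (the partition size, the predicate, and the function names are invented for illustration; a real implementation would launch one GPU thread per partition):

```python
from concurrent.futures import ThreadPoolExecutor

def match_query(partition):
    """Stand-in for the query processing algorithm: count the
    elements of one partition that match a (here trivial) predicate."""
    return sum(1 for node in partition if node % 2 == 0)

# The input is split into fixed-size partitions; every worker runs
# the SAME algorithm on a DIFFERENT partition (data parallelism).
data = list(range(100))
partitions = [data[i:i + 25] for i in range(0, len(data), 25)]

with ThreadPoolExecutor(max_workers=4) as pool:
    partial_results = list(pool.map(match_query, partitions))

# Per-partition results are combined into the final answer.
print(sum(partial_results))  # 50 even numbers in 0..99
```

The combination step at the end is where the GPU's limitations (synchronization, communication between threads) become visible in practice.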
"Stream processing is a computer programming paradigm" [W-STRPROC]. This paradigm was invented to create programs that use a limited form of parallel execution. A stream is a set of data and the stream processing consists in applying one or more hardware instructions to each element of the stream. That kind of parallel execution is limited since there is no control on the way this process is done (synchronization, communication, ...). The series of operations is the same for each data therefore it cannot use any conditional branching which depends on the data value.
Branching on a data value (not allowed in stream processing; shown here only to illustrate the restriction):

| Stream S0 | 4 | 18 | 12 | 5 |
|---|---|---|---|---|

Operations:

1. m = 10
2. if S0[i] > m
3. &nbsp;&nbsp;&nbsp;&nbsp;S1[i] = S0[i] + 2
4. else
5. &nbsp;&nbsp;&nbsp;&nbsp;S1[i] = 0

| Stream S1 | 0 | 20 | 14 | 0 |
|---|---|---|---|---|

Branching on a value independent of the data (allowed, since every element follows the same path):

| Stream S0 | 4 | 18 | 12 | 5 |
|---|---|---|---|---|

Operations:

1. m = 10
2. if m > 5
3. &nbsp;&nbsp;&nbsp;&nbsp;S1[i] = S0[i] + 2
4. else
5. &nbsp;&nbsp;&nbsp;&nbsp;S1[i] = 0

| Stream S1 | 6 | 20 | 14 | 7 |
|---|---|---|---|---|
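The two examples above can be reproduced in a few lines of Python. The data-dependent branch of the first example can be emulated with predication: both outcomes are expressed arithmetically through a 0/1 mask, so every element executes the exact same instruction sequence (a common way to fit such conditions into a stream model; the variable names mirror the tables above):

```python
S0 = [4, 18, 12, 5]
m = 10

# First example, rewritten with predication: instead of branching on
# S0[i], compute a 0/1 mask from the condition and multiply by it.
S1 = [int(x > m) * (x + 2) for x in S0]
print(S1)  # [0, 20, 14, 0]

# Second example: the condition (m > 5) does not depend on the data,
# so it resolves to a single path before the stream is processed.
S2 = [x + 2 for x in S0] if m > 5 else [0] * len(S0)
print(S2)  # [6, 20, 14, 7]
```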
Vector processing is a computer programming paradigm similar to stream processing. The major difference between the two is the memory access pattern: each vector operation involves a memory read, the execution of one instruction over several data elements, and finally a memory write. This per-instruction memory access leads to high bandwidth usage.
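The difference in memory traffic can be sketched as follows (plain Python lists stand in for hardware vectors; the operation chain is invented for illustration):

```python
S0 = [4, 18, 12, 5]

# Vector style: EACH operation reads a whole array from memory and
# writes a whole array back -> one full read and write per instruction.
tmp = [x + 2 for x in S0]           # read S0, write tmp
out_vector = [x * 3 for x in tmp]   # read tmp, write out_vector

# Stream style: the operations are fused, so each element is read
# once, pushed through the whole operation chain, and written once.
out_stream = [(x + 2) * 3 for x in S0]

print(out_vector == out_stream)  # True: same result, less traffic
```

Both styles compute the same values; the vector version simply pays for an extra round trip to memory per operation.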
The practical target of this research is to evaluate the opportunity of using the GPU as an XML query coprocessor for programs dealing with large amounts of XML data.
Relative to their parallel processing power, GPUs are cheaper than most chips dedicated to parallel execution (such as DSPs).
Making good use of the GPU hardware is a great challenge, because it is still strongly oriented toward graphics processing.
What is the practical purpose of this research?
My long-term plan is to build a hybrid web server that makes use of both the CPU and the GPU. This solution might be a cheaper alternative to the multiprocessor configurations found today. Its main advantage would be scalability, since current motherboards can host from zero up to four discrete graphics cards (known as 4-way SLI for NVIDIA, 4-way CrossFire for AMD). At low request rates the CPU would handle the whole workload, and GPU devices would be enlisted progressively as the rate increases.
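The progressive-offloading idea could be captured by a simple threshold rule such as the one below. This is entirely hypothetical: the function, its name, and all capacity figures are invented placeholders for the purpose of illustration.

```python
def choose_gpu_count(requests_per_sec, gpu_count,
                     cpu_capacity=100, gpu_capacity=400):
    """Hypothetical dispatch rule for the hybrid server: the CPU
    absorbs low request rates; GPU devices are enlisted one by one
    once the rate exceeds what the active backends can handle.
    All capacity figures are invented placeholders."""
    if requests_per_sec <= cpu_capacity:
        return 0  # CPU alone is enough
    overflow = requests_per_sec - cpu_capacity
    needed = -(-overflow // gpu_capacity)  # ceiling division
    return min(needed, gpu_count)          # cannot exceed installed GPUs

print(choose_gpu_count(50, 4))    # 0 -> CPU only
print(choose_gpu_count(500, 4))   # 1 GPU activated
print(choose_gpu_count(2000, 4))  # all 4 GPUs activated
```

A real server would of course measure capacities empirically and account for the cost of moving data between host and device.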