LNCS Proceedings Papers
Petr Bauch, Vojtěch Havel, and Jiří Barnat,
LTL Model Checking of LLVM Bitcode with Symbolic Data
The correctness of parallel and reactive programs is often more easily specified using formulae of temporal logics. Yet verifying that a system satisfies such specifications is more difficult than verifying safety properties: the recurrence of a specific program state has to be detected. This paper reports on the development of a generic framework for the automatic verification of linear temporal logic specifications for programs in LLVM bitcode. Our method searches explicitly through all possible interleavings of parallel threads (control non-determinism) but represents variable evaluations symbolically (data non-determinism), guided by the specification in order to prove correctness. To evaluate the framework, we compare our method with state-of-the-art tools on a set of unmodified C programs.

Stephan Beyer and Markus Chimani,
Steiner Tree 1.39-Approximation in Practice
We consider the currently strongest Steiner tree approximation algorithm, recently published by Goemans, Olver, Rothvoß and Zenklusen (2012). It first solves a hypergraphic LP relaxation and then applies matroid theory to obtain an integral solution. The cost of the resulting Steiner tree is at most (1.39 + ε) times the cost of an optimal Steiner tree, where ε tends to zero as some parameter k tends to infinity. However, the degree of the polynomial running time depends on this constant k, so only small k are tractable in practice.
The algorithm has, to our knowledge, not been implemented and evaluated in practice before. We investigate different implementation aspects and parameter choices of the algorithm and compare tuned variants to an exact LP-based algorithm as well as to fast and simple 2-approximations.

Jan Fiedor, Zdeněk Letko, João Lourenço and Tomáš Vojnar,
On Monitoring C/C++ Transactional Memory Programs
Transactional memory (TM) is an increasingly popular technique for synchronising threads in multi-threaded programs. To address both correctness and performance-related issues of TM programs, one needs to monitor and analyse their execution. However, monitoring concurrent programs (including TM programs) can have a non-negligible impact on their behaviour, which may hamper the objectives of the intended analysis. In this paper, we propose several approaches for monitoring TM programs and study their impact on the behaviour of the monitored programs. The considered approaches range from specialised lightweight monitoring to generic heavyweight monitoring. The implemented monitoring tools are publicly available for further applications, and the implementation techniques used for lightweight monitoring can serve as inspiration for developing further specialised lightweight monitors.

Radek Hrbáček
Bent Functions Synthesis on Intel Xeon Phi Coprocessor
A new approach to synthesize bent Boolean functions by means of Cartesian Genetic Programming (CGP) has been proposed recently. Bent functions have important applications in cryptography due to their high nonlinearity. However, they are very rare and their discovery using conventional brute force methods is not efficient enough. In this paper, a new parallel implementation is proposed and the performance is evaluated on the Intel Xeon Phi Coprocessor.

Vojtěch Nikl and Jiří Jaroš,
Parallelisation of the 3D Fast Fourier Transform Using the Hybrid OpenMP/MPI Decomposition
The 3D fast Fourier transform (FFT) is at the heart of many simulation methods. Although the efficient parallelisation of the FFT has been studied in depth over the last few decades, most work has focused on either pure message passing (MPI) or pure shared memory (OpenMP) implementations. Unfortunately, pure MPI approaches cannot exploit the shared memory within a cluster node, while OpenMP cannot scale over multiple nodes.
This paper proposes a 2D hybrid decomposition of the 3D FFT in which the domain is decomposed over the first axis by means of MPI and over the second axis by means of OpenMP. The performance of the proposed method is thoroughly compared with state-of-the-art libraries (FFTW, PFFT, P3DFFT) on three supercomputer systems with up to 16k cores. The experimental results show that the hybrid implementation offers 10-20% higher performance and better scaling, especially for high core counts.
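The correctness of any such decomposition rests on the separability of the multidimensional DFT: a 3D FFT equals 1D FFTs applied along each axis in turn, so the axis-wise passes can be distributed over MPI ranks and OpenMP threads. A minimal NumPy sketch of this identity (illustrative only, not the paper's implementation):

```python
import numpy as np

# Separability of the 3D DFT: applying 1D FFTs along each axis in turn
# is equivalent to a single 3D FFT. Hybrid schemes exploit this by
# distributing the axis-wise transforms across processes and threads.
rng = np.random.default_rng(0)
data = rng.standard_normal((8, 8, 8)) + 1j * rng.standard_normal((8, 8, 8))

step = data
for axis in range(3):  # one pass of 1D FFTs per axis
    step = np.fft.fft(step, axis=axis)

assert np.allclose(step, np.fft.fftn(data))
```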

Juraj Nižnan, Radek Pelánek, and Jiří Řihák,
Mapping Problems to Skills Combining Expert Opinion and Student Data
Construction of a mapping between educational content and skills is an important part of the development of adaptive educational systems. This task is difficult, requires a domain expert, and any mistakes in the mapping may hinder the potential of an educational system. In this work we study techniques that use student data, particularly problem solving times, to improve a problem-skill mapping constructed by a domain expert. We describe and compare different techniques for the task: a multidimensional model of problem solving times and supervised classification techniques. In the evaluation we focus on identifying situations where the combination of expert opinion with student data is most useful.

Karel Štěpka and Martin Falk,
Image Analysis of Gene Locus Positions within Chromosome Territories in Human Lymphocytes
One of the important areas of current cellular research, with substantial impact on medicine, is the analysis of the spatial organization of genetic material within cell nuclei. Higher-order chromatin structure has been shown to play essential roles in regulating fundamental cellular processes such as DNA transcription, replication, and repair. In this paper, we present an image analysis method for localizing gene loci with regard to the chromosomal territories they occupy in 3D confocal microscopy images. We show that segmenting the territories to obtain a precise position of a gene relative to a hard territory boundary may bias the results; instead, we propose an approach based on evaluating the relative chromatin density at the site of the gene loci. This method yields softer, fuzzier "boundaries", characterized by progressively decreasing chromatin density. The method therefore focuses on the extent to which the signals are located inside the territories, rather than on a hard yes/no classification.

Vladimír Štill, Petr Ročkai, and Jiří Barnat,
Context-Switch-Directed Verification in DIVINE
In the model checking of real-life C and C++ programs, both search efficiency and counterexample readability are very important. In this paper, we suggest context-switch-directed exploration as a way to find a well-readable counterexample faster. Furthermore, we allow the number of context switches used in state-space exploration to be limited if desired. The new algorithm is implemented in the DIVINE model checker and enables both unbounded and bounded context-switch-directed exploration for models given in LLVM bitcode, which allows for efficient verification of multi-threaded C and C++ programs.

David Wehner,
A New Concept in Advice Complexity of Job Shop Scheduling
In online scheduling problems, we want to assign jobs to machines while optimizing some given objective function. In the class we study in this paper, we are given a number $m$ of machines and two jobs that both want to use each of the given machines exactly once in some predefined order. Each job consists of $m$ tasks and each task needs to be processed on one particular machine. The objective is to assign the tasks to the machines while minimizing the makespan, i.e., the processing time of the job that takes longer. In our model, the tasks arrive in consecutive time steps and an algorithm must assign a task to a machine without having full knowledge of the order in which the remaining tasks arrive.
We study the advice complexity of this problem, a tool to measure the amount of information necessary to achieve a certain output quality. A great deal of research has been carried out in this field; however, this paper studies the problem in a new setting in which the oracle no longer knows the exact future but only all possible future scenarios and their probabilities. This way, the additional information becomes more realistic. We prove that the problem is more difficult with this oracle than before. Moreover, for job shop scheduling, we provide a lower bound of $1+1/(6\sqrt{m})$ on the competitive ratio of any online algorithm with advice and prove an upper bound of $1+1/\sqrt{m}$ on the competitive ratio of an algorithm by Hromkovič et al.
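For intuition about the gap between these two bounds, a small sketch evaluating both expressions for a few machine counts (function names are ours, illustrative only):

```python
import math

def lower_bound(m):
    # Lower bound on the competitive ratio of any online algorithm
    # with advice, as stated in the abstract: 1 + 1/(6*sqrt(m)).
    return 1 + 1 / (6 * math.sqrt(m))

def upper_bound(m):
    # Upper bound on the competitive ratio of the algorithm by
    # Hromkovič et al.: 1 + 1/sqrt(m).
    return 1 + 1 / math.sqrt(m)

# Both ratios approach 1 as the number of machines m grows.
for m in (4, 100, 10000):
    print(m, round(lower_bound(m), 4), round(upper_bound(m), 4))
```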

Regular Papers
Renata Avros, Vendula Hrubá, Bohuslav Křena, Zdeněk Letko, Hana Pluháčková, Tomáš Vojnar, Zeev Volkovich, and Shmuel Ur
Boosted Decision Trees for Behaviour Mining of Concurrent Programs
Testing of concurrent programs is difficult since scheduling non-determinism requires one to test a huge number of different thread interleavings. Moreover, simple repetition of test executions will typically examine only similar interleavings. One popular way to deal with this problem is the noise injection approach, which is, however, parameterized by many parameters whose suitable values are difficult to find. In this paper, we propose a novel application of classification-based data mining for this purpose. Our approach can identify which test and noise parameters are the most influential for a given program and a given testing goal, and which values (or ranges of values) of these parameters are suitable for meeting that goal. We present experiments showing that our approach can indeed fully automatically improve noise-based testing of particular programs with a particular testing goal. At the same time, we use it to obtain new general insights into noise-based testing.

Peter Bezděk, Nikola Beneš, Jiří Barnat, and Ivana Černá
LTL Model Checking of Parametric Timed Automata
The parameter synthesis problem for timed automata is undecidable in general, even for very simple reachability properties. In this paper we introduce restrictions on parameter valuations under which the parameter synthesis problem is decidable for LTL properties. The resulting problem could be solved by an explicit enumeration of all possible parameter valuations; instead, we introduce a symbolic zone-based method for synthesising bounded integer parameters of parametric timed automata with an LTL specification. Our method extends the ideas of the standard automata-based approach to LTL model checking of timed automata. Our solution employs constrained parametric difference bound matrices and a suitable notion of extrapolation.

Tomáš Čejka, Lukáš Kekely, Pavel Benáček, Rudolf B. Blažek, and Hana Kubátová
FPGA Accelerated Change-Point Detection Method for 100 Gb/s Networks
The aim of this paper is a hardware realization of a statistical anomaly detection method as part of a high-speed monitoring probe for computer networks. The sequential Non-Parametric Cumulative Sum (NP-CUSUM) procedure is the detection method of our choice, and we use an FPGA-based accelerator card as the target platform. For rapid development of the detection algorithm, a high-level synthesis (HLS) approach is applied. Furthermore, we combine HLS with the Software Defined Monitoring (SDM) framework on the monitoring probe, which enables easy deployment of various hardware-accelerated monitoring applications in high-speed networks. Our implementation of the NP-CUSUM algorithm serves as a hardware plug-in for SDM and realizes the detection of network attacks and anomalies directly in the FPGA. Additionally, the parallel nature of FPGA technology allows us to run multiple different detections simultaneously without any loss in throughput. Our experimental results show the feasibility of the HLS and SDM combination for the effective realization of traffic analysis and anomaly detection in networks with speeds up to 100 Gb/s.
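For readers unfamiliar with CUSUM-style detection, the following is a generic one-sided CUSUM recursion in software: a simplified stand-in for the NP-CUSUM procedure the paper realizes in the FPGA (parameter names and values are ours):

```python
def cusum(samples, drift, threshold):
    """Generic one-sided CUSUM: accumulate deviations above `drift`;
    raise an alarm once the statistic exceeds `threshold`.
    Illustrative only; the paper's NP-CUSUM runs in FPGA hardware."""
    s = 0.0
    for i, x in enumerate(samples):
        s = max(0.0, s + x - drift)  # resets to zero while traffic is normal
        if s > threshold:
            return i                 # index of the first alarm
    return None

# Normal traffic stays below the drift; a sustained burst triggers the alarm.
normal = [1.0] * 20
attack = [5.0] * 10
print(cusum(normal + attack, drift=2.0, threshold=10.0))
```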

Milan Dvořák, Tomáš Závodník, and Jan Kořenek
Hardware Accelerated Book Handling with Unlimited Depth
Strong competition between market participants on electronic exchanges calls for a continuing reduction of the latency of trading systems. Recent efforts have focused on hardware acceleration using FPGA technology and on running trading strategies directly in hardware, eliminating the high latency of the system bus and of software processing. For any trading system, book handling is an important time-critical operation, which has not yet been accelerated using FPGA technology. We therefore propose a new hybrid hardware-software architecture that processes messages from the exchange and builds the book with the best buy and sell prices. Based on an analysis of transactions on the exchange, we propose to store only the most frequently accessed price levels in hardware and to keep the rest in host memory managed by software. This enables handling half of the whole stock universe (4 000 instruments) in a single FPGA. Updating price levels in hardware takes only 240 ns, which is two orders of magnitude faster than recent software implementations. The throughput of the hardware unit is 75 million messages per second, 140 times more than current peak rates of market data.
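The core book-handling operation the paper accelerates, maintaining aggregate quantities per price level and tracking the best price, can be sketched in a few lines of software (structure and names are ours, not the paper's hardware design):

```python
# Minimal price-level book for one side (bids), illustrating the update
# operation that the hybrid architecture splits between FPGA and host.
class BidBook:
    def __init__(self):
        self.levels = {}  # price -> aggregate quantity at that level

    def update(self, price, qty_delta):
        # Apply an add/modify/cancel message to one price level.
        qty = self.levels.get(price, 0) + qty_delta
        if qty <= 0:
            self.levels.pop(price, None)  # level emptied: remove it
        else:
            self.levels[price] = qty

    def best(self):
        # Best bid is the highest non-empty price level.
        return max(self.levels) if self.levels else None

book = BidBook()
book.update(100, 5)
book.update(101, 3)
book.update(101, -3)  # cancel empties the 101 level
print(book.best())
```

The hybrid design described in the abstract would keep only the most frequently accessed levels of such a structure in FPGA memory and spill the remainder to host software.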

Dušan Kolář and Peter Matula
Composite Data Type Recovery in a Retargetable Decompilation
Decompilation is a reverse engineering technique that transforms a platform-dependent binary file into a High Level Language (HLL) representation. Despite its complexity, several decompilers have been developed in recent years. They are not yet advanced enough to serve as standalone tools, but combined with traditional disassemblers they allow much faster understanding of the analysed machine code. To achieve the necessary quality, many advanced analyses must be performed. One of the toughest, but most rewarding, is data type reconstruction, which aims to assign each object a high-level type, preferably the same as in the original source code. This paper presents the composite data type analysis used by the retargetable decompiler developed within the Lissom project at FIT BUT. We design a new address expression (AE) creation algorithm, which is both retargetable and suitable for operating on code in Static Single Assignment (SSA) form. Moreover, we devise new AE aggregation rules that increase the quality of the recovered data types.

Vlastimil Košař and Jan Kořenek
Multi-Stride NFA-Split Architecture for Regular Expression Matching Using FPGA
Regular expression matching is a time-critical operation for any network security system. NFA-Split is an efficient hardware architecture for matching a large set of regular expressions at multigigabit speeds with efficient utilization of FPGA logic. Unfortunately, the matching speed is limited by processing only a single byte per clock cycle. We therefore propose a new multi-stride NFA-Split architecture, which increases the achievable throughput by processing multiple bytes per clock cycle. Moreover, we investigate the efficiency of mapping DUs to FPGA logic and propose new optimizations of mapping the NFA-Split architecture to the FPGA. These optimizations reduce FPGA LUT usage by up to 71.85% and BlockRAM usage by up to 94.18%.
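The multi-stride idea can be illustrated in software: a stride-2 automaton consumes two input symbols per step, with delta2(q, ab) defined as the union of delta(p, b) over all p in delta(q, a). A minimal sketch of this standard construction (our own, not the paper's FPGA mapping):

```python
def stride2(delta, states, alphabet):
    """Build the stride-2 transition relation of an NFA given as a dict
    mapping (state, symbol) -> set of successor states. The result maps
    (state, two-symbol string) -> set of successor states."""
    delta2 = {}
    for q in states:
        for a in alphabet:
            for b in alphabet:
                targets = set()
                for p in delta.get((q, a), set()):
                    targets |= delta.get((p, b), set())
                if targets:
                    delta2[(q, a + b)] = targets
    return delta2

# NFA accepting strings containing "ab": 0 -a-> 1 -b-> 2 (2 accepting).
delta = {
    (0, 'a'): {0, 1}, (0, 'b'): {0},
    (1, 'b'): {2},
    (2, 'a'): {2}, (2, 'b'): {2},
}
d2 = stride2(delta, {0, 1, 2}, 'ab')
print(d2[(0, 'ab')])
```

Each step of the stride-2 automaton matches two bytes, which is the source of the throughput gain; the price is a larger alphabet and denser transition relation, which is what the paper's FPGA mapping optimizations address.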

Alexander Meduna and Ondřej Soukup
Computational Completeness Resulting from Scattered Context Grammars Working Under Various Derivation Modes
This paper introduces and studies a variety of derivation modes in scattered context grammars. These grammars are conceptualized just like classical scattered context grammars, except that during the application of their rules, after erasing $n$ nonterminals, they can insert new substrings, possibly at positions different from the original occurrences of the erased nonterminals. The paper concentrates on investigating the generative power of scattered context grammars working under these derivation modes. It demonstrates that all of them are computationally complete, that is, they characterize the family of recursively enumerable languages.

Jana Pazúriková and Luděk Matyska
Convergence of Parareal Algorithm Applied on Molecular Dynamics Simulations
Parallel and distributed computations based on a spatial decomposition of the problem are beginning to fail to saturate large supercomputers due to their limited strong scalability. Molecular dynamics simulation, a typical high performance computing application, hits this limit especially in experiments with long simulated times. When parallelism in space does not suffice, parallelism in time can cut the wallclock time at the expense of additional computational resources. The parareal algorithm, which decomposes the temporal domain, has been extensively researched and already applied to molecular dynamics simulations, albeit with rather modest results. We propose a novel modification that uses a coarse function with a simpler physics model instead of a longer timestep, as in state-of-the-art methods. An evaluation with a prototype implementation indicates that our method provides a rather long time window for the parallel-in-time computation, reasonable convergence, and stable properties of the simulated system.

Šimon Tóth
A Case for a Multifaceted Fairness Model: An Overview of Fairness Methods for Job Queuing and Scheduling
Job scheduling for HPC and Grid-like systems, while a heavily studied subject, suffers from a particular disconnect between theoretical approaches and practical applications. Most production systems still rely on a small set of rather conservative scheduling policies. One area that tries to bridge the worlds of scientific research and practical application is the study of fairness. Fairness in a system has strong implications for customer satisfaction, with psychological studies suggesting that perceived fairness is generally even more important than the quality of service. This paper provides an overview of different approaches to handling fairness in a job scheduling/queuing system. We start with analytic approaches that rely on statistical modeling and try to provide a strong categorization and ordering of various scheduling policies according to their fairness. Following that, we provide an overview of recent advancements that rely on simulations and use high-resolution analysis to extract fairness information from realistic job traces. To conclude, we propose a new direction for research: a novel multifaceted fairness approach, i.e., a combination of different fairness models within a single system, which could better capture the heterogeneous fairness-related requirements of different users. It could serve as a solution to the shortcomings of some of the methods presented in this paper.