Publications by Frank Eichinger

Frank Eichinger, Jannik Kiesel, Matthias Dorner and Stefan Arnold. Estimations of Professional Experience with Panel Data to Improve Salary Predictions. International Conference on Artificial Intelligence (BCS AI). 2023. [ bib | DOI | .pdf ]

Predicting salaries is crucial in business. While prediction models can be trained on large and real salary datasets, they typically lack information regarding professional experience, an essential factor for salary. We investigate various regression techniques for the estimation of professional experience based on data from the Socio-Economic Panel (SOEP) to augment data sets. We further show how to integrate such models into applications and evaluate the usefulness for salary prediction on a large real payroll dataset.

Regression models: https://www.it-management.rw.fau.de/sgai/

Frank Eichinger and Moritz Mayer. Predicting Salaries with Random-Forest Regression. In Bader Alyoubi, Chiheb-Eddine Ben N'Cir, Ibraheem Alharbi and Anis Jarboui, eds., Machine Learning and Data Analytics for Solving Business Problems, vol. 11 of Unsupervised and Semi-Supervised Learning, chap. 1, pp. 1--21. Springer, 2022. [ bib | DOI | http ]

For companies it is essential to know the market price of the salaries of their current and prospective employees. Predicting such salaries is challenging, as many factors need to be considered, and large real datasets for learning are scarce. For this reason, research on salary predictions is comparatively rare and limited. In this study, we investigate whether and how an advanced machine-learning approach, namely ensembles of random-forest regression, can achieve high-quality salary predictions. We use a large real dataset of more than three million employees and more than 300 professions. Our approach learns a separate random-forest regression model for each profession to predict salaries. In our evaluation, we show that this approach, with a mean absolute percentage error (MAPE) of 17.1%, performs better than related work on salary prediction by machine-learning approaches. We identify reducing the number of possible values of categorical variables, training separate models and outlier handling as the key factors for the results achieved.
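The MAPE figure quoted in this abstract follows the standard definition of mean absolute percentage error. The following is a minimal sketch of that metric only, not code from the chapter; the function name `mape` and the sample salaries are hypothetical and chosen purely for illustration.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, returned in percent.

    Assumes all actual values are non-zero (true for salaries).
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical salaries (not data from the study).
actual_salaries = [40000.0, 50000.0, 80000.0]
predicted_salaries = [44000.0, 45000.0, 80000.0]
error = mape(actual_salaries, predicted_salaries)
print(f"MAPE: {error:.2f}%")
```

Because every error is expressed relative to the actual salary, the metric weights a deviation of 4,000 on a 40,000 salary the same as a deviation of 8,000 on an 80,000 salary.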

Frank Eichinger, Pavel Efros, Stamatis Karnouskos and Klemens Böhm. A Time-Series Compression Technique and its Application to the Smart Grid. The VLDB Journal, 24(2):193--218, 2015. [ bib | DOI | http | .pdf ]

Time-series data is increasingly collected in many domains. One example is the smart electricity infrastructure, which generates huge volumes of such data from sources such as smart electricity meters. Although today this data is used for visualization and billing in mostly 15-min resolution, its original temporal resolution frequently is more fine-grained, e.g., seconds. This is useful for various analytical applications such as short-term forecasting, disaggregation and visualization. However, transmitting and storing huge amounts of such fine-grained data is prohibitively expensive in terms of storage space in many cases. In this article, we present a compression technique based on piecewise regression and two methods which describe the performance of the compression. Although our technique is a general approach for time-series compression, smart grids serve as our running example and as our evaluation scenario. Depending on the data and the use-case scenario, the technique compresses data by ratios of up to factor 5,000 while maintaining its usefulness for analytics. The proposed technique has outperformed related work and has been applied to three real-world energy datasets in different scenarios. Finally, we show that the proposed compression technique can be implemented in a state-of-the-art database management system.

The final publication is available at Springer via https://dx.doi.org/10.1007/s00778-014-0368-8.
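To illustrate the general idea of compressing a time series by piecewise approximation, here is a minimal sketch under strong simplifying assumptions. It uses only constant pieces and a greedy scan with a user-chosen maximum deviation; it is not the algorithm from the article, and the names `compress`, `decompress` and `max_dev` are hypothetical.

```python
def compress(values, max_dev):
    """Greedy piecewise-constant approximation.

    Returns a list of (start_index, length, constant) segments such that
    every original value lies within max_dev of its segment's constant.
    """
    if not values:
        return []
    segments = []
    start = 0
    lo = hi = values[0]
    for i in range(1, len(values)):
        new_lo, new_hi = min(lo, values[i]), max(hi, values[i])
        if new_hi - new_lo > 2 * max_dev:
            # Current point does not fit: close the segment at the midpoint.
            segments.append((start, i - start, (lo + hi) / 2))
            start, lo, hi = i, values[i], values[i]
        else:
            lo, hi = new_lo, new_hi
    segments.append((start, len(values) - start, (lo + hi) / 2))
    return segments


def decompress(segments):
    """Rebuild an approximate series from the stored segments."""
    out = []
    for _, length, value in segments:
        out.extend([value] * length)
    return out


# Hypothetical meter readings (not data from the article).
readings = [1.0, 1.1, 0.9, 5.0, 5.2, 5.1]
segments = compress(readings, max_dev=0.2)
restored = decompress(segments)
```

Storing one constant per segment instead of one value per reading is what yields the compression; smoother data or a larger `max_dev` produces fewer, longer segments.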

Frank Eichinger and Immanuel Wietreich. RecoLeta: A Recommender System for Events for Personalised E-Mail Campaigns. Datenbanksysteme für Business, Technologie und Web (BTW). 2015. [ bib | http | .pdf ]

We demonstrate the RecoLeta system for event recommendations. It combines two different recommender approaches: one novel approach dedicated to music concert events and one state-of-the-art approach. We also present our big-data architecture for e-mail delivery and recommendation calculation in an in-memory database.

Frank Eichinger, Victor Pankratius and Klemens Böhm. Data Mining for Defects in Multicore Applications: An Entropy-Based Call-Graph Technique. Concurrency and Computation: Practice and Experience, 26(1):1--20, 2014. [ bib | DOI | http | .pdf ]

Multicore computers are ubiquitous. Expert developers as well as developers with little experience in parallelism are now asked to create multithreaded software in order to exploit parallelism in mainstream shared-memory hardware. However, finding and fixing parallel programming errors is a complex and arduous task. Programmers thus rely on tools such as race detectors that typically focus on reporting errors due to incorrect usage of synchronization constructs or due to missing synchronization. This arsenal of debugging techniques, however, is incomplete. This article presents a new perspective and addresses a largely unexplored direction of defect localization where a wrong usage of non-parallel programming constructs might cause wrong parallel application behavior. In particular, we make a contribution by showing how to use data-mining techniques to locate defects in multithreaded shared-memory programs. Our technique analyzes execution anomalies in a condensed representation of the dynamic call graphs of a multithreaded object-oriented application and identifies methods that contain a defect. Compared to race detectors that concentrate on finding incorrect synchronization, our method is able to reveal a wider range of defects that affect the control flow of a parallel program. Results from controlled experiments show that our data-mining approach not only finds race conditions in different types of multicore applications, but also other errors that cause incorrect parallel program behavior. Data-mining techniques offer a fruitful new ground for parallel program debugging, and we also discuss long-term directions for this interesting field.

Frank Eichinger, Daniel Pathmaperuma, Harald Vogt and Emmanuel Müller. Data Analysis Challenges in the Future Energy Domain. In Ting Yu, Nitesh Chawla and Simeon Simoff, eds., Computational Intelligent Data Analysis for Sustainable Development, Data Mining and Knowledge Discovery Series, chap. 7, pp. 181--242. Chapman and Hall/CRC, 2013. [ bib | DOI | http ]

Luigi Briguglio, Frank Eichinger, Massimiliano Nigrelli and Javier Lucio Ruiz-Andino. Marketplaces for Energy Demand-Side Management based on Future-Internet Technology. Computing Research Repository (CoRR) in arXiv, abs/1304.5346, 2013. [ bib | DOI | http | .pdf ]

Renewable energies are becoming more important, and they contribute to the EU’s goals for greenhouse-gas reduction. However, their fluctuating nature calls for demand-side-management techniques, which balance energy generation and consumption. Such techniques are currently not broadly deployed. This paper describes the latest results from the FINSENY project on how Future-Internet enablers and market mechanisms can be used to realise such systems.

Luigi Briguglio, Massimiliano Nigrelli, Frank Eichinger, Javier Lucio Ruiz-Andino and Valter Bella. A Marketplace-Based Approach to Demand-Side Management in the Smart Grid. ERCIM News, 2013(92):32--33, 2013. [ bib | http ]

Market mechanisms facilitated by Future Internet technologies will help manage energy demand in the smart grid. New scenarios, stakeholders and services will need to be considered in the development of strategies to manage energy generation, distribution and consumption.

Frank Eichinger. Data-Mining Techniques for Call-Graph-Based Software-Defect Localisation. Ph.D. thesis, Department of Informatics, Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany, 2011. [ bib | DOI | .pdf ]

Defect localisation is an important problem in software engineering. This dissertation investigates call-graph-mining-based software defect localisation, which supports software developers by providing hints where defects might be located. It extends the state of the art by proposing new graph representations and mining techniques for weighted graphs. This leads to a broader range of detectable defects, to an increased localisation precision and to enhanced scalability.

Frank Eichinger, Christopher Oßner and Klemens Böhm. Scalable Software-Defect Localisation by Hierarchical Mining of Dynamic Call Graphs. International Conference on Data Mining (SDM). 2011. [ bib | DOI | http | .pdf ]

The localisation of defects in computer programmes is essential in software engineering and is important in domain-specific data mining. Existing techniques which build on call-graph mining localise defects well, but do not scale for large software projects. This paper presents a hierarchical approach with good scalability characteristics. It makes use of novel call-graph representations, frequent subgraph mining and feature selection. It first analyses call graphs of a coarse granularity, before it zooms in on more fine-grained graphs. We evaluate our approach with defects in the Mozilla Rhino project: in our setup, it narrows down the code a developer has to examine to only about 6%.

Frank Eichinger and Klemens Böhm. Software-Bug Localization with Graph Mining. In Charu C. Aggarwal and Haixun Wang, eds., Managing and Mining Graph Data, vol. 40 of Advances in Database Systems, chap. 17, pp. 515--546. Springer, 2010. [ bib | DOI | http ]

In the recent past, a number of frequent subgraph mining algorithms have been proposed. They allow for analyses in domains where data is naturally graph-structured. However, owing to scalability problems when dealing with large graphs, the application of graph mining has been limited to only a few domains. In software engineering, debugging is an important issue. Localizing bugs automatically is particularly challenging, yet doing so manually is expensive. Several approaches have been investigated, some of which analyze traces of repeated program executions. These traces can be represented as call graphs. Such graphs describe the invocations of methods during an execution. This chapter is a survey of graph mining approaches for bug localization based on the analysis of dynamic call graphs. In particular, this chapter first introduces the subproblem of reducing the size of call graphs, before the different approaches to localize bugs based on such reduced graphs are discussed. Finally, we compare selected techniques experimentally and provide an outlook on future issues.

Frank Eichinger, Matthias Huber and Klemens Böhm. On the Usefulness of Weight-Based Constraints in Frequent Subgraph Mining. International Conference on Artificial Intelligence (BCS AI). 2010. [ bib | DOI ]

Frequent subgraph mining is an important data-mining technique. In this paper we look at weighted graphs, which are ubiquitous in the real world. The analysis of weights in combination with mining for substructures might yield more precise results. In particular, we study frequent subgraph mining in the presence of weight-based constraints and explain how to integrate them into mining algorithms. While such constraints only yield approximate mining results in most cases, we demonstrate that such results are useful nevertheless and explain this effect. To do so, we both assess the completeness of the approximate result sets, and we carry out application-oriented studies with real-world data-analysis problems: software-defect localization and explorative mining in transportation logistics. Our results show that the runtime can improve by a factor of up to 3.5 in defect localization and 7 in explorative mining. At the same time, defect-localization precision even increases slightly, and the explorative mining results are good.

Note that there is an extended version of this conference paper in a technical report available for download: https://publikationen.bibliothek.kit.edu/1000017769.

Experimental data: https://dbis.ipd.kit.edu/download/eichi/eichinger10on/

ParSeMiS Extensions: https://sdqweb.ipd.kit.edu/wiki/ParSeMiS-Extensions

Frank Eichinger, Matthias Huber and Klemens Böhm. On the Usefulness of Weight-Based Constraints in Frequent Subgraph Mining. Karlsruhe Reports in Informatics 2010,10, Department of Informatics, Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany, 2010. [ bib | DOI | http ]

Frequent subgraph mining is an important data-mining technique. In this paper we look at weighted graphs, which are ubiquitous in the real world. The analysis of weights in combination with mining for substructures might yield more precise results. In particular, we study frequent subgraph mining in the presence of weight-based constraints and explain how to integrate them into mining algorithms. While such constraints only yield approximate mining results in most cases, we demonstrate that such results are useful nevertheless and explain this effect. To do so, we both assess the completeness of the approximate result sets, and we carry out application-oriented studies with real-world data-analysis problems: software-defect localization, weighted graph classification and explorative mining in logistics. Our results show that the runtime can improve by a factor of up to 3.5 in defect localization and classification and 7 in explorative mining. At the same time, defect-localization precision even increases slightly, classification precision remains stable, and the explorative mining results are good.

Note that there is a shorter conference version of this technical report: https://dx.doi.org/10.1007/978-0-85729-130-1_5.

Experimental data: https://dbis.ipd.kit.edu/download/eichi/eichinger10on/

ParSeMiS Extensions: https://sdqweb.ipd.kit.edu/wiki/ParSeMiS-Extensions

Frank Eichinger, David Kramer, Klemens Böhm and Wolfgang Karl. From Source Code to Runtime Behaviour: Software Metrics Help to Select the Computer Architecture. Knowledge-Based Systems, 23(4):343--349, 2010. [ bib | DOI ]

The decision which hardware platform to use for a certain application is an important problem in computer architecture. This paper reports on a study where a data-mining approach is used for this decision. It relies purely on source-code characteristics, to avoid potentially expensive programme executions. One challenge in this context is that one cannot infer how often functions that are part of the application are typically executed. The main insight of this study is twofold: (a) Source-code characteristics are sufficient nevertheless. (b) Linking individual functions with the runtime behaviour of the programme as a whole yields good predictions. In other words, while individual data objects from the training set may be quite inaccurate, the resulting model is not.

Note that there is a conference version of this paper available for download: https://dx.doi.org/10.5445/IR/1000012935.

Frank Eichinger, Klaus Krogmann, Roland Klug and Klemens Böhm. Software-Defect Localisation by Mining Dataflow-Enabled Call Graphs. European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD). 2010. [ bib | DOI | http | .pdf ]

Defect localisation is essential in software engineering and is an important task in domain-specific data mining. Existing techniques building on call-graph mining can localise different kinds of defects. However, these techniques focus on defects that affect the control flow and are agnostic regarding the dataflow. In this paper, we introduce dataflow-enabled call graphs that incorporate abstractions of the dataflow. Building on these graphs, we present an approach for defect localisation. The creation of the graphs and the defect localisation are essentially data mining problems, making use of discretisation, frequent subgraph mining and feature selection. We demonstrate the defect-localisation qualities of our approach with a study on defects introduced into Weka. As a result, defect localisation now works much better, and a developer has to investigate on average only 1.5 out of 30 methods to fix a defect.

The original publication is available at https://link.springer.com/chapter/10.1007/978-3-642-15880-3_33.

Experimental data: https://dbis.ipd.kit.edu/download/eichi/eichinger10software-defect/

Frank Eichinger, Victor Pankratius, Philipp W. L. Große and Klemens Böhm. Localizing Defects in Multithreaded Programs by Mining Dynamic Call Graphs. Testing: Academic and Industrial Conference -- Practice and Research Techniques (TAIC PART). 2010. [ bib | DOI | http | .pdf ]

Writing multithreaded software for multicore computers confronts many developers with the difficulty of finding parallel programming errors. In the past, most parallel debugging techniques have concentrated on finding race conditions due to wrong usage of synchronization constructs. A largely unexplored issue, however, is that a wrong usage of non-parallel programming constructs may also cause wrong parallel application behavior. This paper presents a novel defect-localization technique for multithreaded shared-memory programs that is based on analyzing execution anomalies. Compared to race detectors that report just on wrong synchronization, this method can detect a wider range of defects affecting parallel execution. It works on a condensed representation of the call graphs of multithreaded applications and employs data-mining techniques to locate a method containing a defect. Our results from controlled application experiments show that we found race conditions, but also other programming errors leading to incorrect parallel program behavior. On average, our approach reduced the amount of code to be inspected in our benchmark to just 7.1% of all methods.

The original publication is available at https://link.springer.com/chapter/10.1007/978-3-642-15585-7_7.

Frank Eichinger and Klemens Böhm. Selecting Computer Architectures by Means of Control-Flow-Graph Mining. International Symposium on Intelligent Data Analysis (IDA). 2009. [ bib | DOI | http | .pdf ]

Deciding which computer architecture provides the best performance for a certain program is an important problem in hardware design and benchmarking. While previous approaches require expensive simulations or program executions, we propose an approach which solely relies on program analysis. We correlate substructures of the control-flow graphs representing the individual functions with the runtime on certain systems. This leads to a prediction framework based on graph mining, classification and classifier fusion. In our evaluation with the SPEC CPU 2000 and 2006 benchmarks, we predict the faster system out of two with high accuracy and achieve significant speedups in execution time.

The original publication is available at https://link.springer.com/chapter/10.1007/978-3-642-03915-7_27.

Frank Eichinger and Klemens Böhm. Towards Scalability of Graph-Mining Based Bug Localisation. International Workshop on Mining and Learning with Graphs (MLG). 2009. [ bib | DOI | .pdf ]

(Semi-)automated bug localisation is an important issue in software engineering. Recent techniques based on call graphs and graph mining can locate bugs in relatively small programs, but do not scale for real-world applications. In this paper we describe a bug-localisation approach based on graph mining that does scale, at least according to preliminary experiments. Our main contribution is the definition and analysis of class-level call graphs, with encouraging results.

Frank Eichinger, David Kramer, Klemens Böhm and Wolfgang Karl. From Source Code to Runtime Behaviour: Software Metrics Help to Select the Computer Architecture. International Conference on Artificial Intelligence (BCS AI). 2009. [ bib | DOI | http | .pdf ]

The decision which hardware platform to use for a certain application is an important problem in computer architecture. This paper reports on a study where a data-mining approach is used for this decision. It relies purely on source-code characteristics, to avoid potentially expensive program executions. One challenge in this context is that one cannot infer how often functions that are part of the application are typically executed. The main insight of this study is twofold: (a) Source-code characteristics are sufficient nevertheless. (b) Linking individual functions with the runtime behaviour of the program as a whole yields good predictions. In other words, while individual data objects from the training set may be quite inaccurate, the resulting model is not.

Note that there is a journal version of this paper: https://dx.doi.org/10.1016/j.knosys.2009.11.014.

Frank Eichinger and Klemens Böhm. Kombiniertes Mining von strukturellen und relationalen Daten. Workshop on Foundations of Databases (Grundlagen von Datenbanken, GvD). 2008. [ bib | DOI | .pdf ]

Data-mining techniques such as classification, regression and cluster analysis are in widespread use today. Suitable relational data is available in many application domains, and efficient data-mining algorithms are integrated into commercial tools as well as database management systems. In recent years, however, various structural data-mining techniques have also been developed, which work, e.g., with graph-based data. Such techniques open up new application areas, but also offer the potential to complement existing techniques. Combining them can often improve on previous results. In this contribution, we present work on predicting customer behaviour and on locating defects in software in which structural and relational data-mining techniques were combined successfully. Finally, we give an outlook on further application areas and future challenges.

Frank Eichinger, Klemens Böhm and Matthias Huber. Mining Edge-Weighted Call Graphs to Localise Software Bugs. European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD). 2008. [ bib | DOI | http | .pdf ]

An important problem in software engineering is the automated discovery of noncrashing occasional bugs. In this work we address this problem and show that mining of weighted call graphs of program executions is a promising technique. We mine weighted graphs with a combination of structural and numerical techniques. More specifically, we propose a novel reduction technique for call graphs which introduces edge weights. Then we present an analysis technique for such weighted call graphs based on graph mining and on traditional feature selection schemes. The technique generalises previous graph mining approaches as it allows for an analysis of weights. Our evaluation shows that our approach finds bugs which previous approaches cannot detect so far. Our technique also doubles the precision of finding bugs which existing techniques can already localise in principle.

The original publication is available at https://link.springer.com/chapter/10.1007/978-3-540-87479-9_40.

Experimental data: https://dbis.ipd.kit.edu/download/eichi/eichinger08mining/

Video: https://videolectures.net/ecmlpkdd08_eichinger_mewc/

Frank Eichinger, Klemens Böhm and Matthias Huber. Improved Software Fault Detection with Graph Mining. International Workshop on Mining and Learning with Graphs (MLG). 2008. [ bib | DOI | .pdf ]

This work addresses the problem of discovering bugs in software development. We investigate the utilisation of call graphs of program executions and graph mining algorithms to approach this problem. We propose a novel reduction technique for call graphs which introduces edge weights. Then, we present an analysis technique for such weighted call graphs based on graph mining and on traditional feature selection. Our new approach finds bugs which could not be detected so far. With regard to bugs which can already be localised, our technique also doubles the precision of finding them.

Video: https://videolectures.net/mlg08_eichinger_isfd/

Frank Eichinger, Detlef D. Nauck and Frank Klawonn. Sequence Mining for Customer Behaviour Predictions in Telecommunications. Workshop on Practical Data Mining: Applications, Experiences and Challenges. 2006. [ bib | DOI | .pdf ]

Predicting the behaviour of customers is challenging, but important for service-oriented businesses. Data mining techniques are used to make such predictions, typically using only recent static data. In this paper, a sequence mining approach is proposed, which allows taking historic data and temporal developments into account as well. In order to form a combined classifier, sequence mining is combined with decision tree analysis. In the area of sequence mining, a tree data structure is extended with hashing techniques and a variation of a classic algorithm is presented. The combined classifier is applied to real customer data and produces promising results.

