spreadsheet available for visualization at https://docs.google.com/spreadsheets/d/e/2PACX-1vRg5WNl3PYbJPemzM1AtHvBqhI-iufeTO7t0aDadNGxb0NiBqmwULZlUir4vpS3c7R9dfi9j3THl0k0/pubhtml?gid=516629492&single=true

What's the Distribution of Research Effort per Dimension? (Mapping Study)
LOG ENGINEERING = 24 (Practices = 8, Requirements = 5, Implementation = 11)
LOG INFRASTRUCTURE = 16 (Parsing = 13, Storage = 3)
LOG ANALYSIS = 68 (Anomaly Detection = 20, Security & Privacy = 9, Root Cause Analysis = 6, Failure Prediction = 6, Quality Assurance = 4, Model Inf. & Invariant Mining = 10, Reliability & Dependability = 6, Log Platforms = 7)

Column groups: Demographics (venue and paper metadata), Structured Abstract (WHAT / WHY / HOW / CONCLUSION), bibref, and one indicator column per dimension.

Venue,Acron.,Year,Type,Topic,Rank,Link,Authors,Paper title,Abstract,"WHAT (is this about, after all)?","WHY (are the authors motivated by the ""what"")?","HOW (do the authors support the ""what"" question)?",CONCLUSION (based on the experiments was...)?,bibref,Practices,Requirements,Implementation,Parsing,Storage,Anomaly Detection,Security & Privacy,Root Cause Analysis,Failure Prediction,Quality Assurance,Model Inf. & Invariant Mining,Reliability & Dependability,Log Platforms
International Symposium on Software Reliability Engineering,ISSRE,1992,research track paper,Computer Software,A,https://ieeexplore.ieee.org/document/285886/,"Dong Tang , R.K. Iyer",Analysis of the VAX/VMS error logs in multicomputer environments-a case study of software dependability,"An analysis is given of the software error logs produced by the VAX/VMS operating system from two VAXcluster multicomputer environments. Basic error characteristics are identified by statistical analysis. Correlations between software and hardware errors, and among software errors on different machines are investigated. Finally, reward analysis and reliability growth analysis are performed to evaluate software dependability. Results show that major software problems in the measured systems are from program flow control and I/O management. The network-related software is suspected to be a reliability bottleneck. 
It is shown that a multicomputer software 'time between error' distribution can be modeled by a 2-phase hyperexponential random variable: a lower error rate pattern which characterizes regular errors, and a higher error rate pattern which characterizes error bursts and concurrent errors on multiple machines.",An analysis is given of the software error logs produced by the VAX/VMS operating system from two VAXcluster multicomputer environments.,,"Basic error characteristics are identified by statistical analysis. Correlations between software and hardware errors, and among software errors on different machines are investigated. Finally, reward analysis and reliability growth analysis are performed to evaluate software dependability.","Results show that major software problems in the measured systems are from program flow control and I/O management. The network-related software is suspected to be a reliability bottleneck. It is shown that a multicomputer software 'time between error' distribution can be modeled by a 2-phase hyperexponential random variable: a lower error rate pattern which characterizes regular errors, and a higher error rate pattern which characterizes error bursts and concurrent errors on multiple machines.",Tang_1992,,,,,,1,,,,,,, International Conference on Intelligence in Communication Systems,,2004,16/1,,?,https://link.springer.com/chapter/10.1007/978-3-540-30179-0_27,Risto Vaarandi,A breadth-first algorithm for mining frequent patterns from event logs,,,,,,,,,,,,,,,,,,, International Conference on Automated Software Engineering,ASE,1998,research track paper,Computer Software,A,https://ieeexplore.ieee.org/document/732614/,J.H. Andrews,"Testing using log file analysis: tools, methods, and issues","Large software systems often keep log files of events. Such log files can be analyzed to check whether a run of a program reveals faults in the system. We discuss how such log files can be used in software testing. 
We present a framework for automatically analyzing log files, and describe a language for specifying analyzer programs and an implementation of that language. The language permits compositional, compact specifications of software, which act as test oracles; we discuss the use and efficacy of these oracles for unit- and system-level testing in various settings. We explore methodological issues such as efficiency and logging policies, and the scope and limitations of the framework. We conclude that testing using log file analysis constitutes a useful methodology for software verification, somewhere between current testing practice and formal verification methodologies.","We discuss how such log files can be used in software testing. We present a framework for automatically analyzing log files, and describe a language for specifying analyzer programs and an implementation of that language.",Large software systems often keep log files of events. Such log files can be analyzed to check whether a run of a program reveals faults in the system.,,"We conclude that testing using log file analysis constitutes a useful methodology for software verification, somewhere between current testing practice and formal verification methodologies.",Andrews_1998,,,,,,,,,,1,,,
International Conference on Computer Communication and Informatics,,2015,6/2,,?,https://ieeexplore.ieee.org/document/7218075/,"Amit Aeri , Shyam Tukadiya",A comparative study of network based system log management tools,,,,,,,,,,,,,,,,,,,
Global Trends in Computing and Communication Systems,,2012,10/1,,?,http://link.springer.com/chapter/10.1007/978-3-642-29219-4_47,"G. Sudhamathy, B. Sarojini Illango, C. Jothi Venkateswaran",A Comparative Study on Web Log Clustering Approaches,,,,,,,,,,,,,,,,,,,
International Journal of Computer Science and Applications,,2009,journal,,?,https://www.scopus.com/inward/record.url?eid=2-s2.0-67650473447&partnerID=40&md5=cb48baedb7508b176daba2beff1b3698,"Ra, I., Park, T.-K.",A forensic logging system based on a secure os,,,,,,,,,,,,,,,,,,,
IEEE/IFIP International Conference on Dependable Systems,DSN,2007,research track paper,Computer Software,A,https://ieeexplore.ieee.org/abstract/document/4273008/,"Adam Oliner , Jon Stearley",What supercomputers say: A study of five system logs,"If we hope to automatically detect and diagnose failures in large-scale computer systems, we must study real deployed systems and the data they generate. Progress has been hampered by the inaccessibility of empirical data. This paper addresses that dearth by examining system logs from five supercomputers, with the aim of providing useful insight and direction for future research into the use of such logs. We present details about the systems, methods of log collection, and how alerts were identified; propose a simpler and more effective filtering algorithm; and define operational context to encompass the crucial information that we found to be currently missing from most logs. The machines we consider (and the number of processors) are: Blue Gene/L (131072), Red Storm (10880), Thunderbird (9024), Spirit (1028), and Liberty (512). This is the first study of raw system logs from multiple supercomputers.","This paper addresses that dearth by examining system logs from five supercomputers, with the aim of providing useful insight and direction for future research into the use of such logs.","If we hope to automatically detect and diagnose failures in large-scale computer systems, we must study real deployed systems and the data they generate. 
Progress has been hampered by the inaccessibility of empirical data.","We present details about the systems, methods of log collection, and how alerts were identified; propose a simpler and more effective filtering algorithm; and define operational context to encompass the crucial information that we found to be currently missing from most logs. The machines we consider (and the number of processors) are: Blue Gene/L (131072), Red Storm (10880), Thunderbird (9024), Spirit (1028), and Liberty (512). This is the first study of raw system logs from multiple supercomputers.",,Oliner_2007,,,,,,1,,,,,,, Knowledge and Information Systems,,2001,journal,"Information Systems Business and Management",B,https://link.springer.com/article/10.1007/PL00011675,"Mehmet Sayal, Peter Scheuermann",Distributed web log mining using maximal large itemsets,,,,,,,,,,,,,,,,,,, International Conference on Information Visualisation,IV,2002,research track paper,"Artificial Intelligence and Image Processing Design Practice and Management",B,https://ieeexplore.ieee.org/document/1028831/,"T. Takada , H. Koike",Tudumi: information visualization system for monitoring and auditing computer logs,,,,,,,,,,,,,,,,,,, Annual Computer Security Applications Conference,ACSAC,2003,research track paper,Computer Software,A,https://ieeexplore.ieee.org/document/1254330/,"C. Abad , J. Taylor , C. Sengul , W. Yurcik , Y. Zhou , K. Rowe",Log correlation for intrusion detection: a proof of concept,"Intrusion detection is an important part of networked-systems security protection. Although commercial products exist, finding intrusions has proven to be a difficult task with limitations under current techniques. Therefore, improved techniques are needed. We argue the need for correlating data among different logs to improve intrusion detection systems accuracy. We show how different attacks are reflected in different logs and argue that some attacks are not evident when a single log is analyzed. 
We present experimental results using anomaly detection for the virus Yaha. Through the use of data mining tools (RIPPER) and correlation among logs we improve the effectiveness of an intrusion detection system while reducing false positives.",We argue the need for correlating data among different logs to improve intrusion detection systems accuracy.,"Intrusion detection is an important part of networked-systems security protection. Although commercial products exist, finding intrusions has proven to be a difficult task with limitations under current techniques. Therefore, improved techniques are needed.",We show how different attacks are reflected in different logs and argue that some attacks are not evident when a single log is analyzed. We present experimental results using anomaly detection for the virus Yaha.,Through the use of data mining tools (RIPPER) and correlation among logs we improve the effectiveness of an intrusion detection system while reducing false positives.,Abad_2003,,,,,,,1,,,,,, Hawaii International Conference on System Sciences,HICSS ,2003,research track paper,Information Systems,A,https://ieeexplore.ieee.org/document/1174915/,"A. Ulrich , H. Hallal , A. Petrenko , S. Boroday",Verifying trustworthiness requirements in distributed systems with formal log-file analysis,"The paper reports on an analysis technology based on the tracing approach to test trustworthy requirements of a distributed system. The system under test is instrumented such that it generates events at runtime to enable reasoning about the implementation of these requirements in a later step. Specifically, an event log collected during a system run is converted into a specification of the system. The (trustworthy) requirements of the system must be formally specified by an expert who has sufficient knowledge about the behaviour of the system. The reengineered model of the system and the requirement descriptions are then processed by an off-the-shelf model checker. 
The model checker generates scenarios that visualize fulfilments or violations of the requirements. A complex example of a concurrent system serves as a case study.",The paper reports on an analysis technology based on the tracing approach to test trustworthy requirements of a distributed system,,"The system under test is instrumented such that it generates events at runtime to enable reasoning about the implementation of these requirements in a later step. Specifically, an event log collected during a system run is converted into a specification of the system. The (trustworthy) requirements of the system must be formally specified by an expert who has sufficient knowledge about the behaviour of the system. The reengineered model of the system and the requirement descriptions are then processed by an off-the-shelf model checker. The model checker generates scenarios that visualize fulfilments or violations of the requirements. A complex example of a concurrent system serves as a case study.",,Ulrich_2003,,,,,,,,,,,1,,
International Conference on Dependability,,2009,research track paper,,?,https://ieeexplore.ieee.org/document/5211082/,"M. Cinque , D. Cotroneo , A. Pecchia",A Logging Approach for Effective Dependability Evaluation of Complex Systems,,,,,,,,,,,,,,,,,,,
Data Warehousing and Knowledge Discovery,DaWaK,2003,research track paper,Data Format,B,http://link.springer.com/chapter/10.1007/978-3-540-45228-7_36,"Kimmo Hätönen, Jean François Boulicaut, Mika Klemettinen, Markus Miettinen, Cyrille Masson",Comprehensive Log Compression with Frequent Patterns,,,,,,,,,,,,,,,,,,,
Intelligent Data Analysis,,2003,journal,"Artificial Intelligence and Image Processing Data Format Cognitive Science",B,https://content.iospress.com/articles/intelligent-data-analysis/ida00122,"Pabarskaite, Zidrina",Decision trees for web log mining,,,,,,,,,,,,,,,,,,,
IEEE Transactions on Software Engineering,TSE,2003,journal,"Computer Software Information Systems",A*,https://ieeexplore.ieee.org/document/1214327/,"J.H. Andrews , Yingjun Zhang",General test result checking with log file analysis,"We describe and apply a lightweight formal method for checking test results. The method assumes that the software under test writes a text log file; this log file is then analyzed by a program to see if it reveals failures. We suggest a state-machine-based formalism for specifying the log file analyzer programs and describe a language and implementation based on that formalism. We report on empirical studies of the application of log file analysis to random testing of units. We describe the results of experiments done to compare the performance and effectiveness of random unit testing with coverage checking and log file analysis to other unit testing procedures. The experiments suggest that writing a formal log file analyzer and using random testing is competitive with other formal and informal methods for unit testing.",We describe and apply a lightweight formal method for checking test results.,,"The method assumes that the software under test writes a text log file; this log file is then analyzed by a program to see if it reveals failures. 
We suggest a state-machine-based formalism for specifying the log file analyzer programs and describe a language and implementation based on that formalism. We report on empirical studies of the application of log file analysis to random testing of units. We describe the results of experiments done to compare the performance and effectiveness of random unit testing with coverage checking and log file analysis to other unit testing procedures.",The experiments suggest that writing a formal log file analyzer and using random testing is competitive with other formal and informal methods for unit testing.,Andrews_2003,,,,,,,,,,1,,, Journal of Computer Science,,2011,journal,,?,http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.676.671,"J. Vellingiri , S. Chenthur P",A Novel Technique for Web Log mining with Better Data Cleaning and Transaction Identification,,,,,,,,,,,,,,,,,,, Annual Computer Security Applications Conference,ACSAC,2004,research track paper,Computer Software,A,https://ieeexplore.ieee.org/document/1377226/,"E.L. Barse , E. Jonsson",Extracting attack manifestations to determine log data requirements for intrusion detection,"Log data adapted for intrusion detection is a little explored research issue despite its importance for successful and efficient detection of attacks and intrusions. This paper presents a starting point in the search for suitable log data by providing a framework for determining exactly which log data that can reveal a specific attack, i.e. the attack manifestations. An attack manifestation consists of the log entries added, changed or removed by the attack compared to normal behaviour. We demonstrate the use of the framework by studying attacks in different types of log data. This work provides a foundation for a fully automated attack analysis. It also provides some pointers for how to define a collection of log elements that are both sufficient and necessary for detection of a specific group of attacks. 
We believe that this leads to a log data source that is especially adapted for intrusion detection purposes.","This paper presents a starting point in the search for suitable log data by providing a framework for determining exactly which log data that can reveal a specific attack, i.e. the attack manifestations.",Log data adapted for intrusion detection is a little explored research issue despite its importance for successful and efficient detection of attacks and intrusions.,"An attack manifestation consists of the log entries added, changed or removed by the attack compared to normal behaviour. We demonstrate the use of the framework by studying attacks in different types of log data. This work provides a foundation for a fully automated attack analysis. It also provides some pointers for how to define a collection of log elements that are both sufficient and necessary for detection of a specific group of attacks.",We believe that this leads to a log data source that is especially adapted for intrusion detection purposes.,Barse_2004,,,,,,,1,,,,,,
"IEEE International Conference on Web Services",ICWS,2004,research track paper,Information Systems,A,https://ieeexplore.ieee.org/document/1314724/,"S.M.S. da Cruz , M.L.M. Campos , P.F. Pires , L.M. Campos",Monitoring e-business Web services usage through a log based architecture,"The emergence of Web services represents a significant advance in the continuing evolution of e-business. In order to fully explore business opportunities provided by this paradigm, it is important to track its utilization. This can be done through the use of logging facilities. However, current Web logging approaches do not contemplate Web services utilization. 
This paper presents a Web services logging architecture based on SOAP intermediaries that captures comprehensive services usage information, which can be explored to improve B2B and B2C transactions by providing feedback on customer electronic behavior.","This paper presents a Web services logging architecture based on SOAP intermediaries that captures comprehensive services usage information, which can be explored to improve B2B and B2C transactions by providing feedback on customer electronic behavior.","The emergence of Web services represents a significant advance in the continuing evolution of e-business. In order to fully explore business opportunities provided by this paradigm, it is important to track its utilization. This can be done through the use of logging facilities. However, current Web logging approaches do not contemplate Web services utilization.",,,Cruz_2004,,1,,,,,,,,,,, Asia Joint Conference on Information Security,,2017,7/2,,?,https://ieeexplore.ieee.org/document/8026037/,"Mamoru Mimura , Yuhei Otsubo , Hidehiko Tanaka , Hidema Tanaka",A Practical Experiment of the HTTP-Based RAT Detection Method in Proxy Server Logs,,,,,,,,,,,,,,,,,,, IEEE Transactions on Software Engineering,TSE,2004,journal,"Computer Software Information Systems",A*,https://ieeexplore.ieee.org/document/1359769/,"J. Tian , S. Rudraraju , Zhao Li",Evaluating Web software reliability based on workload and failure data extracted from server logs,"We characterize usage and problems for Web applications, evaluate their reliability, and examine the potential for reliability improvement. Based on the characteristics of Web applications and the overall Web environment, we classify Web problems and focus on the subset of source content problems. Using information about Web accesses, we derive various measurements that can characterize Web site workload at different levels of granularity and from different perspectives. 
These workload measurements, together with failure information extracted from recorded errors, are used to evaluate the operational reliability for source contents at a given Web site and the potential for reliability improvement. We applied this approach to the Web sites www.seas.smu.edu and www.kde.org. The results demonstrated the viability and effectiveness of our approach.","We characterize usage and problems for Web applications, evaluate their reliability, and examine the potential for reliability improvement.",,"Based on the characteristics of Web applications and the overall Web environment, we classify Web problems and focus on the subset of source content problems. Using information about Web accesses, we derive various measurements that can characterize Web site workload at different levels of granularity and from different perspectives. These workload measurements, together with failure information extracted from recorded errors, are used to evaluate the operational reliability for source contents at a given Web site and the potential for reliability improvement. 
We applied this approach to the Web sites www.seas.smu.edu and www.kde.org.",The results demonstrated the viability and effectiveness of our approach.,Tian_2004,,,,,,,,,,,,1, Large Installation System Administration Conference,LISA,1998,,,?,http://static.usenix.org/events/lisa98/full_papers/girardin/girardin.pdf,"Luc Girardin, Dominique Brodbeck",A Visual Approach for Monitoring Logs.,,,,,,,,,,,,,,,,,,, International Conference on Parallel and Distributed Processing Techniques and Applications,PDPTA,2005,7/2,Distributed Computing,B,https://www.scopus.com/inward/record.url?eid=2-s2.0-33645759565&partnerID=40&md5=851dd9cc3d4f5187b6efd70462234c38,"Paniagua, C., Xhafa, F., Caballé, S., Daradoumis, T.",A parallel grid-based implementation for real time processing of event log data in collaborative applications,,,,,,,,,,,,,,,,,,, "IEEE International Symposium on Cluster, Cloud and Grid Computing",CCGrid,2005,research track paper,Distributed Computing,A,https://ieeexplore.ieee.org/document/1558544/,J.E. Prewett,Incorporating information from a cluster batch scheduler and center management software into automated log file analysis,"Cluster style architectures are becoming increasingly important in the computing world. These machines have been the target of several recent attacks. In this paper, we examine how the unique characteristics of these machines, including how they are generally operated in the larger context of a computing center, can be leveraged to provide better security for these machines. 
We examine this by incorporating data from a batch scheduler and a user database to augment our automated log analysis using the freely available log analysis tool LoGS.","In this paper, we examine how the unique characteristics of these machines, including how they are generally operated in the larger context of a computing center, can be leveraged to provide better security for these machines.",Cluster style architectures are becoming increasingly important in the computing world. These machines have been the target of several recent attacks.,We examine this by incorporating data from a batch scheduler and a user database to augment our automated log analysis using the freely available log analysis tool LoGS.,,Prewett_2005,,,,,,,1,,,,,, International Workshop on Document Analysis Systems,DAS,2006,research track paper,Artificial Intelligence and Image Processing,B,http://link.springer.com/chapter/10.1007/11669487_26,"Michael Flaster, Bruce Hillyer, Tin Kam Ho",Exploratory Analysis System for Semi-structured Engineering Logs,,,,,,,,,,,,,,,,,,, International Conference on Information Systems Security and Privacy,,2018,12/2,,?,https://www.scopus.com/inward/record.url?eid=2-s2.0-85050163209&partnerID=40&md5=e2a1b801179ec3676a0f4440e22ebb2e,"Wurzenberger M., Skopik F., Settanni G., Fiedler R.",AECID: A Self-learning Anomaly Detection Approach based on Light-weight Log Parser Models,,,,,,,,,,,,,,,,,,, SecureComm and Workshops,SecureComm,2006,research track paper,Computer Software,B,https://ieeexplore.ieee.org/document/4198837/,"Jianqing Zhang , Nikita Borisov , William Yurcik",Outsourcing Security Analysis with Anonymized Logs,,,,,,,,,,,,,,,,,,, International Conference on Very Large Data Bases,VLDB,2006,industry track paper,"Data Format ",A*,https://dl.acm.org/citation.cfm?id=1164221,"Mirko Steinle, Karl Aberer, Sarunas Girdzijauskas, Christian Lovis",Mapping moving landscapes by mining mountains of logs: novel techniques for dependency model generation,"Problem 
diagnosis for distributed systems is usually difficult. Thus, an automated support is needed to identify root causes of encountered problems such as performance lags or inadequate functioning quickly. The many tools and techniques existing today that perform this task rely usually on some dependency model of the system. However, in complex and fast evolving environments it is practically unfeasible to keep such a model up-to-date manually and it has to be created in an automatic manner. For high level objects this is in itself a challenging and less studied task. In this paper, we propose three different approaches to discover dependencies by mining system logs. Our work is inspired by a recently developed data mining algorithm and techniques for collocation extraction from the natural language processing field. We evaluate the techniques in a case study for Geneva University Hospitals (HUG) and perform large-scale experiments on production data. Results show that all techniques are capable of finding useful dependency information with reasonable precision in a real-world environment.","In this paper, we propose three different approaches to discover dependencies by mining system logs.","Problem diagnosis for distributed systems is usually difficult. Thus, an automated support is needed to identify root causes of encountered problems such as performance lags or inadequate functioning quickly. The many tools and techniques existing today that perform this task rely usually on some dependency model of the system. However, in complex and fast evolving environments it is practically unfeasible to keep such a model up-to-date manually and it has to be created in an automatic manner. For high level objects this is in itself a challenging and less studied task.",Our work is inspired by a recently developed data mining algorithm and techniques for collocation extraction from the natural language processing field. 
We evaluate the techniques in a case study for Geneva University Hospitals (HUG) and perform large-scale experiments on production data.,Results show that all techniques are capable of finding useful dependency information with reasonable precision in a real-world environment.,Steinle_2006,,,,,,,,,,,1,,
Asian Internet Engineering Conference,,2017,8/2,,?,https://dl.acm.org/citation.cfm?id=3154973,"Kazuki Otomo, Satoru Kobayashi, Kensuke Fukuda, Hiroshi Esaki",An Analysis of Burstiness and Causality of System Logs,,,,,,,,,,,,,,,,,,,
"International Conference on Green Computing and Communications and International Conference on Cyber, Physical and Social Computing",,2010,8/2,,?,https://ieeexplore.ieee.org/document/5724812/,"Marcos Dias de Assuncao , Anne-Cecile Orgerie , Laurent Lefevre",An Analysis of Power Consumption Logs from a Monitored Grid Site,,,,,,,,,,,,,,,,,,,
International Conferences on Big Data and Cloud Computing,,2016,,,?,https://ieeexplore.ieee.org/document/7723745/,"Wei Peng , Yongjiang Li , Bing Li , Xiangyuan Zhu",An Analysis Platform of Road Traffic Management System Log Data Based on Distributed Storage and Parallel Computing Techniques,,,,,,,,,,,,,,,,,,,
ICIC Express Letters,,2018,8/1,,?,https://www.scopus.com/inward/record.url?eid=2-s2.0-85053260742&partnerID=40&md5=780576d4a61cc857c5cf084a17f8cf6d,"Wongthai W., van Moorsel A.",An approach to defining and identifying logging system patterns for infrastructure as a service cloud,,,,,,,,,,,,,,,,,,,
International Journal of Computer Applications,IJCA,2010,journal,,?,http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.206.4736&rep=rep1&type=pdf,"Nikhil Kumar Singh, Deepak Singh Tomar, Bhola Nath Roy",An approach to understand the end user behavior through log analysis,,,,,,,,,,,,,,,,,,,
International Journal on Artificial Intelligence Tools,,2006,,,C,https://www.scopus.com/inward/record.url?eid=2-s2.0-33750145053&partnerID=40&md5=31613bf8925628a272aef37b4926c4ff,"Adeva, J.J.G., Atxa, J.M.P.",Web misuse detection through text categorisation of application server logs,,,,,,,,,,,,,,,,,,,
"IEEE/WIC/ACM International Joint Conferences on Web Intelligence and Intelligent Agent Technology",WI-IAT,2007,research track paper,Artificial Intelligence and Image Processing,B,https://www.scopus.com/inward/record.url?eid=2-s2.0-38949091064&partnerID=40&md5=474fefcb92ec005d6d158486cb94e1fb,"Unruh, A., Bailey, J., Ramamohanarao, K.",A logging-based approach for building more robust multi-agent systems,,,,,,,,,,,,,,,,,,,
International Conference on Fuzzy Systems and Knowledge Discovery,FSKD,2007,7/2,Artificial Intelligence and Image Processing,C,https://ieeexplore.ieee.org/document/4406215/,"Sen Guo , Yongsheng Liang , Zhili Zhang , Wei Liu",Association Rule Retrieved from Web Log Based on Rough Set Theory,,,,,,,,,,,,,,,,,,,
IEEE International Parallel and Distributed Processing Symposium,IPDPS,2007,research track paper,Distributed Computing,A,https://ieeexplore.ieee.org/document/4228363/,"Yinglung Liang , Yanyong Zhang , Hui Xiong , Ramendra Sahoo",An Adaptive Semantic Filter for Blue Gene/L Failure Log Analysis,"Frequent failure occurrences are becoming a serious concern to the community of high-end computing, especially when the applications and the underlying systems rapidly grow in size and complexity. In order to better understand the failure behavior of such systems and further develop effective fault-tolerant strategies, we have collected detailed event logs from IBM Blue Gene/L, which has as many as 128K processors, and is currently the fastest supercomputer in the world. Due to the scale of such machines and the granularity of the logging mechanisms, the logs can get voluminous and usually contain records which may not all be distinct. Consequently, it is crucial to filter these logs towards isolating the specific failures, which can then be useful for subsequent analysis. 
However, existing filtering methods either require too much domain expertise, or produce erroneous results. This paper thus fills this crucial void by designing and developing an adaptive semantic filtering (ASF) method, which is accurate, light-weight, and more importantly, easy to automate. Specifically, ASF exploits the semantic correlation between two events, and dynamically adapts the correlation threshold based on the temporal gap between the events. We have validated the ASF method using the failure logs collected from Blue Gene/L over a period of 98 days. Our experimental results show that ASF can effectively remove redundant entries in the logs, and the filtering results can serve as a good base for future failure analysis studies.","This paper thus fills this crucial void by designing and developing an adaptive semantic filtering (ASF) method, which is accurate, light-weight, and more importantly, easy to automate.","Due to the scale of such machines and the granularity of the logging mechanisms, the logs can get voluminous and usually contain records which may not all be distinct. Consequently, it is crucial to filter these logs towards isolating the specific failures, which can then be useful for subsequent analysis. However, existing filtering methods either require too much domain expertise, or produce erroneous results.","Specifically, ASF exploits the semantic correlation between two events, and dynamically adapts the correlation threshold based on the temporal gap between the events. 
We have validated the ASF method using the failure logs collected from Blue Gene/L over a period of 98 days.","Our experimental results show that ASF can effectively remove redundant entries in the logs, and the filtering results can serve as a good base for future failure analysis studies.",Liang_2007,,,,1,,,,,,,,, International Conference on Computational Intelligence and Communication Networks,,2014,7/2,,?,https://ieeexplore.ieee.org/document/7065584/,"Sheetal Sahu , Praneet Saurabh , Sandeep Rai",An Enhancement in Clustering for Sequential Pattern Mining through Neural Algorithm Using Web Logs,,,,,,,,,,,,,,,,,,, Seventh International Conference on the Quantitative Evaluation of Systems,,2010,,,?,https://ieeexplore.ieee.org/document/5600405/,"Adetokunbo Makanju , A. Nur Zincir-Heywood , Evangelos E. Milios",An Evaluation of Entropy Based Approaches to Alert Detection in High Performance Cluster Logs,,,,,,,,,,,,,,,,,,, IEEE/ACM International Conference on Grid Computing,GRID,2007,research track paper,Distributed Computing,A,https://ieeexplore.ieee.org/document/4354137/,"Dan Gunter , Brian L. Tierney , Aaron Brown , Martin Swany , John Bresnahan , Jennifer M. Schopf",Log summarization and anomaly detection for troubleshooting distributed systems,"Today's system monitoring tools are capable of detecting system failures such as host failures, OS errors, and network partitions in near-real time. Unfortunately, the same cannot yet be said of the end-to-end distributed software stack. Any given action, for example, reliably transferring a directory of files, can involve a wide range of complex and interrelated actions across multiple pieces of software: checking user certificates and permissions, getting details for all files, performing third-party transfers, understanding re-try policy decisions, etc. 
We present an infrastructure for troubleshooting complex middleware, a general purpose technique for configurable log summarization, and an anomaly detection technique that works in near-real time on running Grid middleware. We present results gathered using this infrastructure from instrumented Grid middleware and applications running on the Emulab testbed. From these results, we analyze the effectiveness of several algorithms at accurately detecting a variety of performance anomalies.","We present an infrastructure for troubleshooting complex middleware, a general purpose technique for configurable log summarization, and an anomaly detection technique that works in near-real time on running Grid middleware.","Today's system monitoring tools are capable of detecting system failures such as host failures, OS errors, and network partitions in near-real time. Unfortunately, the same cannot yet be said of the end-to-end distributed software stack. Any given action, for example, reliably transferring a directory of files, can involve a wide range of complex and interrelated actions across multiple pieces of software: checking user certificates and permissions, getting details for all files, performing third-party transfers, understanding re-try policy decisions, etc.",We present results gathered using this infrastructure from instrumented Grid middleware and applications running on the Emulab testbed. ,"From these results, we analyze the effectiveness of several algorithms at accurately detecting a variety of performance anomalies.",Gunter_2007,,,,,,,,,,,,,1 International Conference on Secure Software Integration and Reliability Improvement,,2010,10/2,,?,https://ieeexplore.ieee.org/document/5502846/,"Zhen Ming Jiang , Alberto Avritzer , Emad Shihab , Ahmed E. 
Hassan , Parminder Flora",An Industrial Case Study on Speeding Up User Acceptance Testing by Mining Execution Logs,,,,,,,,,,,,,,,,,,, IEEE Access,,2018,journal,,?,https://ieeexplore.ieee.org/document/8371223/,"Zhaoli Liu , Tao Qin , Xiaohong Guan , Hezhi Jiang , Chenxu Wang",An Integrated Method for Anomaly Detection From Massive System Logs,"Logs are generated by systems to record the detailed runtime information about system operations, and log analysis plays an important role in anomaly detection at the host or network level. Most existing detection methods require a priori knowledge, which cannot be used to detect the new or unknown anomalies. Moreover, the growing volume of logs poses new challenges to anomaly detection. In this paper, we propose an integrated method using K-prototype clustering and k-NN classification algorithms, which uses a novel clustering-filtering-refinement framework to perform anomaly detection from massive logs. First, we analyze the characteristics of system logs and extract 10 features based on the session information to characterize user behaviors effectively. Second, based on these extracted features, the K-prototype clustering algorithm is applied to partition the data set into different clusters. Then, the obvious normal events which usually present as highly coherent clusters are filtered out, and the others are regarded as anomaly candidates for further analysis. Finally, we design two new distance-based features to measure the local and global anomaly degrees for these anomaly candidates. Based on these two new features, we apply the k-NN classifier to generate accurate detection results. To verify the integrated method, we constructed a log collection and anomaly detection platform in the campus network center of Xi'an Jiaotong University. 
The experimental results based on the data sets collected from the platform show our method has high detection accuracy and low computational complexity.",,,,,,,,,,,,,,,,,, Turkish Journal of Electrical Engineering and Computer Sciences,,2016,journal,,?,https://www.scopus.com/inward/record.url?eid=2-s2.0-84963876682&partnerID=40&md5=3f9440c7a1e2e5f91b4fba7c863d83a7,"Hajamydeen, A.I., Udzir, N.I., Mahmod, R., Abdul Ghani, A.A.",An unsupervised heterogeneous log-based framework for anomaly detection,,,,,,,,,,,,,,,,,,, International Conference on Software and Information Engineering,,2018,research track paper,,?,https://dl.acm.org/citation.cfm?id=3220269,"Marlina Abdul Latib, Saiful Adli Ismail, Othman Mohd Yusop, Pritheega Magalingam, Azri Azmi",Analysing Log Files For Web Intrusion Investigation Using Hadoop,,,,,,,,,,,,,,,,,,, Conference on Emerging Technologies and Factory Automation,,2010,8/2,,?,https://ieeexplore.ieee.org/document/5641202/,"Volodymyr Vasyutynskyy , Andre Gellrich , Klaus Kabitzsch , David Wustmann",Analysis of internal logistic systems based on event logs,,,,,,,,,,,,,,,,,,, American Control Conference,,2011,6/2,,?,https://ieeexplore.ieee.org/document/5989966/,"Changyan Zhou , Ratnesh Kumar , Shengbing Jiang",Analysis of runtime data-log for software fault localization,,,,,,,,,,,,,,,,,,, Information Sciences,,2007,journal,"Library and Information Studies Information Systems",B,https://www.sciencedirect.com/science/article/pii/S0020025508001886,"Pekka Kumpulainen, Kimmo Hätönen",Local anomaly detection for network system log monitoring,,,,,,,,,,,,,,,,,,, IEEE/IFIP International Conference on Dependable Systems,DSN,2008,industry track paper,Computer Software,A,https://ieeexplore.ieee.org/document/4630109/,"Chinghway Lim , Navjot Singh , Shalini Yajnik",A log mining approach to failure analysis of enterprise telephony systems,"Log monitoring techniques to characterize system and user behavior have gained significant popularity. 
Some common applications of study of systems logs are syslog mining to detect and predict system failure behavior, Web log mining to characterize Web usage patterns, and error/debug log analysis for detecting anomalies. In this paper, we discuss our experiences with applying log mining techniques to characterize the behavior of large enterprise telephony systems. We aim to detect, and in some cases, predict system anomalies. We describe the problems encountered in the study of such logs and propose some solutions. The key differentiator of our solutions is the use of individual message frequencies to characterize system behavior and the ability to incorporate domain-specific knowledge through user feedback. The techniques that we propose are general enough to be applicable to other systems logs and can easily be packaged into automated tools for log analysis.","In this paper, we discuss our experiences with applying log mining techniques to characterize the behavior of large enterprise telephony systems.","Log monitoring techniques to characterize system and user behavior have gained significant popularity. Some common applications of study of systems logs are syslog mining to detect and predict system failure behavior, Web log mining to characterize Web usage patterns, and error/debug log analysis for detecting anomalies",We describe the problems encountered in the study of such logs and propose some solutions. The key differentiator of our solutions is the use of individual message frequencies to characterize system behavior and the ability to incorporate domain-specific knowledge through user feedback.,The techniques that we propose are general enough to be applicable to other systems logs and can easily be packaged into automated tools for log analysis.,Lim_2008,,,,,,1,,,,,,, Large Installation System Administration Conference,LISA,2014,,,?,https://www.usenix.org/system/files/conference/lisa14/lisa14-paper-alspaugh.pdf,"S. 
Alspaugh, Beidi Chen, Jessica Lin, Archana Ganapathi, Marti A. Hearst, Randy Katz",Analyzing Log Analysis: An Empirical Study of User Log Mining.,,,,,,,,,,,,,,,,,,, "Transactions on Systems, Man, and Cybernetics, Part C",,2012,journal,,?,https://ieeexplore.ieee.org/document/6392466/,"Karen A. Garcia , Raúl Monroy , Luis A. Trejo , Carlos Mex-Perera , Eduardo Aguirre",Analyzing Log Files for Postmortem Intrusion Detection,,,,,,,,,,,,,,,,,,, International Conference on Advanced Information Networking and Applications,AINA,2007,research track paper,Distributed Computing,B,https://ieeexplore.ieee.org/document/4220975/,"Masahiro Nagao , Gen Kitagata , Takuo Suganuma , Norio Shiratori",Stepwise Log Summarizing Method for Real-Time Network Event Handling,,,,,,,,,,,,,,,,,,, Journal of Software: Evolution and Process,,2008,journal,,B,https://www.scopus.com/inward/record.url?eid=2-s2.0-49749129669&partnerID=40&md5=5b5cf07c8f977d63df4078a006187a89,"Jiang, Z.M., Hassan, A.E., Hamann, G., Flora, P.",An automated approach for abstracting execution logs to execution events,,,,,,,,,,,,,,,,,,, International Conference on Enterprise Information Systems,ICEIS,2008,8/2,Information Systems,C,https://www.scopus.com/inward/record.url?eid=2-s2.0-55349121225&partnerID=40&md5=6e8cc419a325b3fa61e0c53b826c16f9,"Bezerra, F., Wainer, J.",Anomaly detection algorithms in business process logs,,,,,,,,,,,,,,,,,,, International Workshop on Visualization for Computer Security,VizSec,2008,research track paper,Computer Software,C,http://link.springer.com/chapter/10.1007/978-3-540-85933-8_15,"Sergey Bratus, Axel Hansen, Fabio Pellacini, Anna Shubina","Backhoe, a Packet Trace and Log Browser",,,,,,,,,,,,,,,,,,, "International Conference on Availability, Reliability and Security",ARES,2008,research track paper,Computer Software,B,https://ieeexplore.ieee.org/document/4529398/,"Adrian Frei , Marc Rennhard",Histogram Matrix: Log File Visualization for Anomaly Detection,,,,,,,,,,,,,,,,,,, IEEE/IFIP 
International Conference on Dependable Systems,DSN,2010,research track paper,Computer Software,A,https://ieeexplore.ieee.org/document/5544279/,"Marcello Cinque , Domenico Cotroneo , Roberto Natella , Antonio Pecchia",Assessing and improving the effectiveness of logs for the analysis of software faults,"Event logs are the primary source of data to characterize the dependability behavior of a computing system during the operational phase. However, they are inadequate to provide evidence of software faults, which are nowadays among the main causes of system outages. This paper proposes an approach based on software fault injection to assess the effectiveness of logs to keep track of software faults triggered in the field. Injection results are used to provide guidelines to improve the ability of logging mechanisms to report the effects of software faults. The benefits of the approach are shown by means of experimental results on three widely used software systems.",This paper proposes an approach based on software fault injection to assess the effectiveness of logs to keep track of software faults triggered in the field.,"Event logs are the primary source of data to characterize the dependability behavior of a computing system during the operational phase. However, they are inadequate to provide evidence of software faults, which are nowadays among the main causes of system outages.",Injection results are used to provide guidelines to improve the ability of logging mechanisms to report the effects of software faults. The benefits of the approach are shown by means of experimental results on three widely used software systems.,,Cinque_2010,,1,,,,,,,,,,, "Annual Conference on Privacy, Security and Trust",PST,2008,research track paper,Data Format,C,https://ieeexplore.ieee.org/document/4641277/,"Adetokunbo Makanju , Stephen Brooks , A. Nur Zincir-Heywood , Evangelos E. 
Milios",LogView: Visualizing Event Log Clusters,,,,,,,,,,,,,,,,,,, International Symposium on Software Reliability Engineering,ISSRE,2008,research track paper,Computer Software,A,https://ieeexplore.ieee.org/document/4700316/,"Leonardo Mariani , Fabrizio Pastore",Automated Identification of Failure Causes in System Logs,"Log files are commonly inspected by system administrators and developers to detect suspicious behaviors and diagnose failure causes. Since size of log files grows fast, thus making manual analysis impractical, different automatic techniques have been proposed to analyze log files. Unfortunately, accuracy and effectiveness of these techniques are often limited by the unstructured nature of logged messages and the variety of data that can be logged. This paper presents a technique to automatically analyze log files and retrieve important information to identify failure causes. The technique automatically identifies dependencies between events and values in logs corresponding to legal executions, generates models of legal behaviors and compares log files collected during failing executions with the generated models to detect anomalous event sequences that are presented to users. Experimental results show the effectiveness of the technique in supporting developers and testers to identify failure causes.",This paper presents a technique to automatically analyze log files and retrieve important information to identify failure causes.,"Log files are commonly inspected by system administrators and developers to detect suspicious behaviors and diagnose failure causes. Since size of log files grows fast, thus making manual analysis impractical, different automatic techniques have been proposed to analyze log files. 
Unfortunately, accuracy and effectiveness of these techniques are often limited by the unstructured nature of logged messages and the variety of data that can be logged.",,,Mariani_2008,,,,,,,,,,,1,, ACM International Conference on Knowledge Discovery and Data Mining,KDD,2009,research track paper,Data Format,A*,https://dl.acm.org/citation.cfm?id=1557154,"Adetokunbo A.O. Makanju, A. Nur Zincir-Heywood, Evangelos E. Milios",Clustering event logs using iterative partitioning,"The importance of event logs, as a source of information in systems and network management cannot be overemphasized. With the ever increasing size and complexity of today's event logs, the task of analyzing event logs has become cumbersome to carry out manually. For this reason recent research has focused on the automatic analysis of these log files. In this paper we present IPLoM (Iterative Partitioning Log Mining), a novel algorithm for the mining of clusters from event logs. Through a 3-Step hierarchical partitioning process IPLoM partitions log data into its respective clusters. In its 4th and final stage IPLoM produces cluster descriptions or line formats for each of the clusters produced. Unlike other similar algorithms IPLoM is not based on the Apriori algorithm and it is able to find clusters in data whether or not its instances appear frequently. Evaluations show that IPLoM outperforms the other algorithms statistically significantly, and it is also able to achieve an average F-Measure performance 78% when the closest other algorithm achieves an F-Measure performance of 10%.","In this paper we present IPLoM (Iterative Partitioning Log Mining), a novel algorithm for the mining of clusters from event logs.","The importance of event logs, as a source of information in systems and network management cannot be overemphasized. With the ever increasing size and complexity of today's event logs, the task of analyzing event logs has become cumbersome to carry out manually. 
For this reason recent research has focused on the automatic analysis of these log files.",Through a 3-Step hierarchical partitioning process IPLoM partitions log data into its respective clusters. In its 4th and final stage IPLoM produces cluster descriptions or line formats for each of the clusters produced. Unlike other similar algorithms IPLoM is not based on the Apriori algorithm and it is able to find clusters in data whether or not its instances appear frequently.,"Evaluations show that IPLoM outperforms the other algorithms statistically significantly, and it is also able to achieve an average F-Measure performance 78% when the closest other algorithm achieves an F-Measure performance of 10%.",Makanju_2009,,,,1,,,,,,,,, ACM SIGOPS Symposium on Operating Systems Principles,SOSP,2009,research track paper,Computer Software,A*,https://www.scopus.com/inward/record.url?eid=2-s2.0-77956513188&partnerID=40&md5=f933498ff0de2c1f2a0538c2b8c77706,"Xu, W., Huang, L., Fox, A., Patterson, D., Jordan, M.I.",Detecting large-scale system problems by mining console logs,"Surprisingly, console logs rarely help operators detect problems in large-scale datacenter services, for they often consist of the voluminous intermixing of messages from many software components written by independent developers. We propose a general methodology to mine this rich source of information to automatically detect system runtime problems. We use a combination of program analysis and information retrieval techniques to transform free-text console logs into numerical features, which captures sequences of events in the system. We then analyze these features using machine learning to detect operational problems. We also show how to distill the results of our analysis to an operator-friendly one-page decision tree showing the critical messages associated with the detected problems. 
In addition, we extend our methods to online problem detection where the sequences of events are continuously generated as data streams.",We propose a general methodology to mine this rich source of information to automatically detect system runtime problems.,"Surprisingly, console logs rarely help operators detect problems in large-scale datacenter services, for they often consist of the voluminous intermixing of messages from many software components written by independent developers.","We use a combination of program analysis and information retrieval techniques to transform free-text console logs into numerical features, which captures sequences of events in the system. We then analyze these features using machine learning to detect operational problems. We also show how to distill the results of our analysis to an operator-friendly one-page decision tree showing the critical messages associated with the detected problems. In addition, we extend our methods to online problem detection where the sequences of events are continuously generated as data streams.",,Xu_2009,,,,,,1,,,,,,, International Conference on Advances in ICT for Emerging Regions,,2018,8/2,,?,https://www.scopus.com/inward/record.url?eid=2-s2.0-85049515670&partnerID=40&md5=7b07eabe3991a6fa1950398eee0a0b30,"Jayathilake P.W.D.C., Weeraddana N.R., Hettiarachchi H.K.E.P.",Automatic detection of multi-line templates in software log files,,,,,,,,,,,,,,,,,,, International IT Capacity and Performance Conference,,2015,,,?,https://www.scopus.com/inward/record.url?eid=2-s2.0-84959421143&partnerID=40&md5=7bfaa59b8c083760dc478544b6bac653,"Awad, M., Menascé, D.A.",Automatic workload characterization using system log analysis,,,,,,,,,,,,,,,,,,, Empirical Software Engineering,,2009,journal,Computer Software,A,http://link.springer.com/article/10.1007/s10664-008-9084-6,"Toan Huynh, James Miller",Another viewpoint on “evaluating web software reliability based on workload and failure data extracted from server 
logs”,"An approach of determining a website’s reliability is evaluated in this paper. This technique extracts workload measures and error codes from the server’s data logs. This information is then used to calculate the reliability for a particular website. This study follows on from a previous study, and hence, can be regarded as a “partial replication” (technically, as both studies are case studies not formal experiments, this description is inaccurate. Unfortunately, no corresponding definition exists for case studies, and hence the term is used to convey a general sense of purpose) of the original study. Although the method proposed by the original study is feasible, the effectiveness of just using a specific error type and a specific workload to estimate the reliability of websites is questionable. In this study, different error types and their usefulness for reliability analysis are examined and discussed. After a thorough investigation, we believe that reliability analysis for websites must be based on more specific error definitions as they can provide a superior reliability estimate for today’s highly dynamic websites.",An approach of determining a website’s reliability is evaluated in this paper.,"Although the method proposed by the original study is feasible, the effectiveness of just using a specific error type and a specific workload to estimate the reliability of websites is questionable.","In this study, different error types and their usefulness for reliability analysis are examined and discussed.","After a thorough investigation, we believe that reliability analysis for websites must be based on more specific error definitions as they can provide a superior reliability estimate for today’s highly dynamic websites.",Huynh_2009,,,,,,,,,,,,1, "International Conference on Availability, Reliability and Security",ARES,2009,research track paper,Computer Software,B,https://ieeexplore.ieee.org/document/5066470/,"Narate Taerat , Nichamon Naksinehaboon , 
Clayton Chandler , James Elliott , Chokchai Leangsuksun , George Ostrouchov , Stephen L. Scott , Christian Engelmann",Blue Gene/L Log Analysis and Time to Interrupt Estimation,,,,,,,,,,,,,,,,,,, Web Intelligence and Agent Systems: an international journal,,2009,journal,"Other Information and Computing Sciences Artificial Intelligence and Image Processing Cognitive Science",C,https://www.scopus.com/inward/record.url?eid=2-s2.0-62449224713&partnerID=40&md5=e396327f550ab0951d1f1a56bb8f2759,"Unruh, A., Bailey, J., Ramamohanarao, K.",Building more robust multi-agent systems using a log-based approach,,,,,,,,,,,,,,,,,,, Conference on Communications and Network Security,,2018,7/2,,?,https://ieeexplore.ieee.org/document/8433138/,"Tao Qin , Chao He , Hezhi Jiang , Ruoya Chen",Behavior Rhythm: An Effective Model for Massive Logs Characterizing and Security Monitoring in Cloud,,,,,,,,,,,,,,,,,,, "International Conference on Behavioral, Economic and Socio-cultural Computing",,2015,7/2,,?,https://ieeexplore.ieee.org/document/7365981/,"Sizhong Du , Jian Cao",Behavioral anomaly detection approach based on log monitoring,,,,,,,,,,,,,,,,,,, IEEE International Conference on Data Mining,ICDM,2009,research track paper,Data Format,A*,https://ieeexplore.ieee.org/document/5360240/,"Qiang Fu , Jian-Guang Lou , Yi Wang , Jiang Li",Execution Anomaly Detection in Distributed Systems through Unstructured Log Analysis,"Detection of execution anomalies is very important for the maintenance, development, and performance refinement of large scale distributed systems. Execution anomalies include both work flow errors and low performance problems. People often use system logs produced by distributed systems for troubleshooting and problem diagnosis. However, manually inspecting system logs to detect anomalies is unfeasible due to the increasing scale and complexity of distributed systems. Therefore, there is a great demand for automatic anomalies detection techniques based on log analysis. 
In this paper, we propose an unstructured log analysis technique for anomalies detection. In the technique, we propose a novel algorithm to convert free form text messages in log files to log keys without heavily relying on application specific knowledge. The log keys correspond to the log-print statements in the source code which can provide cues of system execution behavior. After converting log messages to log keys, we learn a Finite State Automaton (FSA) from training log sequences to present the normal work flow for each system component. At the same time, a performance measurement model is learned to characterize the normal execution performance based on the log messages' timing information. With these learned models, we can automatically detect anomalies in newly input log files. Experiments on Hadoop and SILK show that the technique can effectively detect running anomalies.","In this paper, we propose an unstructured log analysis technique for anomalies detection.","Detection of execution anomalies is very important for the maintenance, development, and performance refinement of large scale distributed systems. Execution anomalies include both work flow errors and low performance problems. People often use system logs produced by distributed systems for troubleshooting and problem diagnosis. However, manually inspecting system logs to detect anomalies is unfeasible due to the increasing scale and complexity of distributed systems. Therefore, there is a great demand for automatic anomalies detection techniques based on log analysis.","In the technique, we propose a novel algorithm to convert free form text messages in log files to log keys without heavily relying on application specific knowledge. The log keys correspond to the log-print statements in the source code which can provide cues of system execution behavior. 
After converting log messages to log keys, we learn a Finite State Automaton (FSA) from training log sequences to present the normal work flow for each system component. At the same time, a performance measurement model is learned to characterize the normal execution performance based on the log messages' timing information.","With these learned models, we can automatically detect anomalies in newly input log files. Experiments on Hadoop and SILK show that the technique can effectively detect running anomalies.",Fu_2009,,,,,,1,,,,,,, IEEE International Conference on Data Mining,ICDM,2009,research track paper,Data Format,A*,https://ieeexplore.ieee.org/document/5360285/,"Wei Xu , Ling Huang , Armando Fox , David Patterson , Michael Jordan",Online System Problem Detection by Mining Patterns of Console Logs,"We describe a novel application of using data mining and statistical learning methods to automatically monitor and detect abnormal execution traces from console logs in an online setting. Different from existing solutions, we use a two stage detection system. The first stage uses frequent pattern mining and distribution estimation techniques to capture the dominant patterns (both frequent sequences and time duration). The second stage use principal component analysis based anomaly detection technique to identify actual problems. Using real system data from a 203-node Hadoop cluster, we show that we can not only achieve highly accurate and fast problem detection, but also help operators better understand execution patterns in their system.",We describe a novel application of using data mining and statistical learning methods to automatically monitor and detect abnormal execution traces from console logs in an online setting.,,"Different from existing solutions, we use a two stage detection system. The first stage uses frequent pattern mining and distribution estimation techniques to capture the dominant patterns (both frequent sequences and time duration). 
The second stage use principal component analysis based anomaly detection technique to identify actual problems.","Using real system data from a 203-node Hadoop cluster, we show that we can not only achieve highly accurate and fast problem detection, but also help operators better understand execution patterns in their system.",Xu_2009a,,,,,,1,,,,,,, Machine Learning and Knowledge Discovery in Databases,ECML PKDD,2009,research track paper,Data Format,A,http://link.springer.com/chapter/10.1007/978-3-642-04180-8_32,"Michal Aharon, Gilad Barash, Ira Cohen, Eli Mordechai",One Graph Is Worth a Thousand Logs: Uncovering Hidden Structures in Massive System Event Logs,"In this paper we describe our work on pattern discovery in system event logs. For discovering the patterns we developed two novel algorithms. The first is a sequential and efficient text clustering algorithm which automatically discovers the templates generating the messages. The second, the PARIS algorithm (Principle Atom Recognition In Sets), is a novel algorithm which discovers patterns of messages that represent processes occurring in the system. We demonstrate the usefulness of our analysis, on real world logs from various systems, for debugging of complex systems, efficient search and visualization of logs and characterization of system behavior.",In this paper we describe our work on pattern discovery in system event logs,,"For discovering the patterns we developed two novel algorithms. The first is a sequential and efficient text clustering algorithm which automatically discovers the templates generating the messages. The second, the PARIS algorithm (Principle Atom Recognition In Sets), is a novel algorithm which discovers patterns of messages that represent processes occurring in the system. 
We demonstrate the usefulness of our analysis, on real world logs from various systems, for debugging of complex systems, efficient search and visualization of logs and characterization of system behavior.",,Aharon_2009,,,,1,,,,,,,,, Knowledge-Based and Intelligent Information and Engineering Systems,KES,2010,research track paper,Information Systems,B,http://link.springer.com/chapter/10.1007/978-3-642-15387-7_21,"Iago Porto-Díaz, Óscar Fontenla-Romero, Amparo Alonso-Betanzos",A Log Analyzer Agent for Intrusion Detection in a Multi-Agent System,,,,,,,,,,,,,,,,,,, "IEEE Conference on Systems, Man and Cybernetics",SMC,2010,,Information Systems,B,https://ieeexplore.ieee.org/document/5641988/,"Yi Hu , Alina Campan , James Walden , Irina Vorobyeva , Justin Shelton",An effective log mining approach for database intrusion detection,,,,,,,,,,,,,,,,,,, IEEE International Conference on Data Mining,ICDM,2010,research track paper,Data Format,A*,https://ieeexplore.ieee.org/document/5694003/,"Liang Tang , Tao Li",LogTree: A Framework for Generating System Events from Raw Textual Logs,"Modern computing systems are instrumented to generate huge amounts of system logs and these data can be utilized for understanding complex system behaviors. One main fundamental challenge in automated log analysis is the generation of system events from raw textual logs. Recent works apply clustering techniques to translate the raw log messages into system events using only the word/term information. In this paper, we first illustrate the drawbacks of existing techniques for event generation from system logs. We then propose Log Tree, a novel and algorithm-independent framework for events generation from raw system log messages. Log Tree utilizes the format and structural information of the raw logs in the clustering process to generate system events with better accuracy. In addition, an indexing data structure, Message Segment Table, is proposed in Log Tree to significantly improve the efficiency of events creation. Extensive experiments on real system logs demonstrate the effectiveness and efficiency of Log Tree.","In this paper, we first illustrate the drawbacks of existing techniques for event generation from system logs. We then propose Log Tree, a novel and algorithm-independent framework for events generation from raw system log messages.",Modern computing systems are instrumented to generate huge amounts of system logs and these data can be utilized for understanding complex system behaviors. One main fundamental challenge in automated log analysis is the generation of system events from raw textual logs. Recent works apply clustering techniques to translate the raw log messages into system events using only the word/term information.,"Log Tree utilizes the format and structural information of the raw logs in the clustering process to generate system events with better accuracy. 
In addition, an indexing data structure, Message Segment Table, is proposed in Log Tree to significantly improve the efficiency of events creation.",Extensive experiments on real system logs demonstrate the effectiveness and efficiency of Log Tree., Tang_2010,,,,1,,,,,,,,, "International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications",,2013,,,?,https://dl.acm.org/citation.cfm?id=2501228,"Farhana Zulkernine, Patrick Martin, Wendy Powley, Sima Soltani, Serge Mankovskii, Mark Addleman",CAPRI: a tool for mining complex line patterns in large log data,,,,,,,,,,,,,,,,,,, International Workshop on Large-Scale Network Security,,2016,workshop,,?,https://ieeexplore.ieee.org/document/7847169/,"Jain-Shing Wu , Yuh-Jye Lee , Te-En Wei , Chih-Hung Hsieh , Chia-Min Lai",ChainSpot: Mining Service Logs for Cyber Security Threat Detection,,,,,,,,,,,,,,,,,,, IEEE/IFIP International Conference on Dependable Systems,DSN,2012,research track paper,Computer Software,A,https://ieeexplore.ieee.org/document/6263946/,"Catello Di Martino , Marcello Cinque , Domenico Cotroneo",Assessing time coalescence techniques for the analysis of supercomputer logs,"This paper presents a novel approach to assess time coalescence techniques. These techniques are widely used to reconstruct the failure process of a system and to estimate dependability measurements from its event logs. The approach is based on the use of automatically generated logs, accompanied by the exact knowledge of the ground truth on the failure process. The assessment is conducted by comparing the presumed failure process, reconstructed via coalescence, with the ground truth. We focus on supercomputer logs, due to increasing importance of automatic event log analysis for these systems. Experimental results show how the approach allows to compare different time coalescence techniques and to identify their weaknesses with respect to given system settings. 
In addition, results revealed an interesting correlation between errors caused by the coalescence and errors in the estimation of dependability measurements.",This paper presents a novel approach to assess time coalescence techniques.,These techniques are widely used to reconstruct the failure process of a system and to estimate dependability measurements from its event logs.,"The approach is based on the use of automatically generated logs, accompanied by the exact knowledge of the ground truth on the failure process. The assessment is conducted by comparing the presumed failure process, reconstructed via coalescence, with the ground truth. We focus on supercomputer logs, due to increasing importance of automatic event log analysis for these systems. ","Experimental results show how the approach allows to compare different time coalescence techniques and to identify their weaknesses with respect to given system settings. In addition, results revealed an interesting correlation between errors caused by the coalescence and errors in the estimation of dependability measurements.",Martino_2012,,,,,,,,,,,1,, International Conference on Distributed Computing Systems,ICDCS,2010,research track paper,Distributed Computing,A,https://ieeexplore.ieee.org/document/5541622/,"Jiaqi Tan , Soila Kavulya , Rajeev Gandhi , Priya Narasimhan","Visual, Log-Based Causal Tracing for Performance Debugging of MapReduce Systems","The distributed nature and large scale of MapReduce programs and systems poses two challenges in using existing profiling and debugging tools to understand MapReduce programs. Existing tools produce too much information because of the large scale of MapReduce programs, and they do not expose program behaviors in terms of Maps and Reduces. 
We have developed a novel non-intrusive log-analysis technique which extracts state-machine views of the control- and data-flows in MapReduce behavior from the native logs of Hadoop MapReduce systems, and it synthesizes these views to create a unified, causal view of MapReduce program behavior. This technique enables us to visualize MapReduce programs in terms of MapReduce-specific behaviors, aiding operators in reasoning about and debugging performance problems in MapReduce systems. We validate our technique and visualizations using a real-world workload, showing how to understand the structure and performance behavior of MapReduce jobs, and diagnose injected performance problems reproduced from real-world problems.","We have developed a novel non-intrusive log-analysis technique which extracts state-machine views of the control- and data-flows in MapReduce behavior from the native logs of Hadoop MapReduce systems, and it synthesizes these views to create a unified, causal view of MapReduce program behavior.","The distributed nature and large scale of MapReduce programs and systems poses two challenges in using existing profiling and debugging tools to understand MapReduce programs. Existing tools produce too much information because of the large scale of MapReduce programs, and they do not expose program behaviors in terms of Maps and Reduces. 
","We validate our technique and visualizations using a real-world workload, showing how to understand the structure and performance behavior of MapReduce jobs, and diagnose injected performance problems reproduced from real-world problems.",, Tan_2010,,,,,,,,,,,1,, "IFIP International Conference on Network and Parallel Computing ",NPC,2010,research track paper,Distributed Computing,C,http://link.springer.com/chapter/10.1007/978-3-642-15672-4_23,"Wei Zhou, Jianfeng Zhan, Dan Meng, Zhihong Zhang",Online Event Correlations Analysis in System Logs of Large-Scale Cluster Systems,,,,,,,,,,,,,,,,,,, International Conference on Intelligent Systems and Knowledge Engineering,ISKE,2010,research track paper,Artificial Intelligence and Image Processing,B,https://ieeexplore.ieee.org/document/5680850/,"Jo Swinnen , Koen Vanhoof , Els Hannes",Querying event logs: Discovering non-events in event logs,,,,,,,,,,,,,,,,,,, Large Installation System Administration Conference,LISA,2010,,,?,http://static.usenix.org/events/lisa10/tech/full_papers/lisa10_proceedings.pdf#page=171,"Ariel Rabkin, Randy Katz",Chukwa: A system for reliable large-scale log collection,,,,,,,,,,,,,,,,,,, International Conference on Ubiquitous Information Management and Communication,,2017,,,?,https://dl.acm.org/citation.cfm?id=3022288,"BKSP Kumar Raju, Nikhil Bharadwaj Gosala, G Geethakumari",CLOSER: applying aggregation for effective event reconstruction of cloud service logs,,,,,,,,,,,,,,,,,,, International Symposium on Reliable Distributed Systems,SRDS,2010,research track paper,Distributed Computing,A,https://ieeexplore.ieee.org/document/5623389/,"Pin Zhou , Binny Gill , Wendy Belluomini , Avani Wildani",GAUL: Gestalt Analysis of Unstructured Logs for Diagnosing Recurring Problems in Large Enterprise Storage Systems,"We present GAUL, a system to automate the whole log comparison between a new problem and the ones diagnosed in the past to identify recurring problems. 
GAUL uses a fuzzy match algorithm based on the contextual overlap between log lines and efficiently implements this using scalable index/search. The accuracy and efficiency of the comparison is further improved by leveraging problem set information and noise tolerance techniques. We evaluate GAUL using 4339 customer problems that occurred in all field deployments of an enterprise storage system over the course of a year. Our results show that with human-filtered logs, GAUL can identify the correct problem set 66% of the time among the top-10 matches, which is 15% more accurate than the VSM system that uses cosine similarity and 19% more accurate than the ERRCMP system that uses error codes for log comparison. With unfiltered logs, the top-10 match accuracy of GAUL is 40%, which is 22% more accurate than VSM and 26% more accurate than ERRCMP.","We present GAUL, a system to automate the whole log comparison between a new problem and the ones diagnosed in the past to identify recurring problems.",,GAUL uses a fuzzy match algorithm based on the contextual overlap between log lines and efficiently implements this using scalable index/search. The accuracy and efficiency of the comparison is further improved by leveraging problem set information and noise tolerance techniques. We evaluate GAUL using 4339 customer problems that occurred in all field deployments of an enterprise storage system over the course of a year. ,"Our results show that with human-filtered logs, GAUL can identify the correct problem set 66% of the time among the top-10 matches, which is 15% more accurate than the VSM system that uses cosine similarity and 19% more accurate than the ERRCMP system that uses error codes for log comparison. 
With unfiltered logs, the top-10 match accuracy of GAUL is 40%, which is 22% more accurate than VSM and 26% more accurate than ERRCMP.",Zhou_2010,,,,1,,,,,,,,, International Symposium on Software Reliability Engineering,ISSRE,2010,research track paper,Computer Software,A,https://ieeexplore.ieee.org/document/5635046/,"Sean Banerjee , Hema Srikanth , Bojan Cukic",Log-Based Reliability Analysis of Software as a Service (SaaS),"Software as a Service (SaaS) has gained momentum in the past few years and businesses have been increasingly moving to SaaS model for their IT solutions. SaaS is a newer and transformed model where software is delivered to customers as a service over the web. With the SaaS model, there is a need for service providers to ensure that the services are available and reliable for end users at all times, which introduces significant pressure on the service provider to ensure right test processes and methodologies to minimize any impact to the provisions in Service Level Agreements (SLA). There is lack of research on the unique approaches to reliability analysis of SaaS suites. In this paper, we expand traditional approaches to reliability analysis of traditional web servers and propose methods tailored towards assessing the workload and reliability of SaaS applications. In addition we show the importance of data filtration when assessing SaaS reliability from log files. Finally, we discuss the suitability of reliability measures with respect to their relevance in the context of SLAs.","In this paper, we expand traditional approaches to reliability analysis of traditional web servers and propose methods tailored towards assessing the workload and reliability of SaaS applications.","Software as a Service (SaaS) has gained momentum in the past few years and businesses have been increasingly moving to SaaS model for their IT solutions. SaaS is a newer and transformed model where software is delivered to customers as a service over the web. 
With the SaaS model, there is a need for service providers to ensure that the services are available and reliable for end users at all times, which introduces significant pressure on the service provider to ensure right test processes and methodologies to minimize any impact to the provisions in Service Level Agreements (SLA). There is lack of research on the unique approaches to reliability analysis of SaaS suites.","In addition we show the importance of data filtration when assessing SaaS reliability from log files. Finally, we discuss the suitability of reliability measures with respect to their relevance in the context of SLAs.",, Banerjee_2010,,,,,,,,,,,,1, Rough Sets and Knowledge Technology,,2014,,,?,http://link.springer.com/chapter/10.1007/978-3-319-11740-9_23,"Xin Cheng, Ruizhi Wang",Communication Network Anomaly Detection Based on Log File Analysis,,,,,,,,,,,,,,,,,,, European Conference on Information Warfare and Security,,2013,10/1,,?,https://www.scopus.com/inward/record.url?eid=2-s2.0-84893487079&partnerID=40&md5=8c1f175084b8c0066a866e780fb98872,"Vaarandi, R., Niziński, P.",Comparative analysis of open-source log management solutions for security monitoring and network forensics,,,,,,,,,,,,,,,,,,, USENIX Annual Technical Conference,USENIX,2010,research track paper,Information and Computing Sciences,A,https://www.usenix.org/event/atc10/tech/full_papers/Lou.pdf,"Jian-Guang LOU, Qiang FU, Shengqi YANG, Ye XU, Jiang LI",Mining Invariants from Console Logs for System Problem Detection.,"Detecting execution anomalies is very important to the maintenance and monitoring of large-scale distributed systems. People often use console logs that are produced by distributed systems for troubleshooting and problem diagnosis. However, manually inspecting console logs for the detection of anomalies is unfeasible due to the increasing scale and complexity of distributed systems. 
Therefore, there is great demand for automatic anomaly detection techniques based on log analysis. In this paper, we propose an unstructured log analysis technique for anomaly detection, with a novel algorithm to automatically discover program invariants in logs. At first, a log parser is used to convert the unstructured logs to structured logs. Then, the structured log messages are further grouped to log message groups according to the relationship among log parameters. After that, the program invariants are automatically mined from the log message groups. The mined invariants can reveal the inherent linear characteristics of program work flows. With these learned invariants, our technique can automatically detect anomalies in logs. Experiments on Hadoop show that the technique can effectively detect execution anomalies. Compared with the state of art, our approach can not only detect numerous real problems with high accuracy but also provide intuitive insight into the problems.","In this paper, we propose an unstructured log analysis technique for anomaly detection, with a novel algorithm to automatically discover program invariants in logs.","Detecting execution anomalies is very important to the maintenance and monitoring of large-scale distributed systems. People often use console logs that are produced by distributed systems for troubleshooting and problem diagnosis. However, manually inspecting console logs for the detection of anomalies is unfeasible due to the increasing scale and complexity of distributed systems. Therefore, there is great demand for automatic anomaly detection techniques based on log analysis.","At first, a log parser is used to convert the unstructured logs to structured logs. Then, the structured log messages are further grouped to log message groups according to the relationship among log parameters. After that, the program invariants are automatically mined from the log message groups. 
The mined invariants can reveal the inherent linear characteristics of program work flows. With these learned invariants, our technique can automatically detect anomalies in logs. ","Experiments on Hadoop show that the technique can effectively detect execution anomalies. Compared with the state of art, our approach can not only detect numerous real problems with high accuracy but also provide intuitive insight into the problems.",Lou_2010,,,,,,,,,,,1,, International Conference on Ubiquitous Robots and Ambient Intelligence,,2014,,,?,https://ieeexplore.ieee.org/document/7057539/,"Stephan Puls , Daniel Lemcke , Heinz Worn",Context-sensitive natural language generation for human readable event logs based on situation awareness in human-robot cooperation,,,,,,,,,,,,,,,,,,, NSDI'10: Proceedings of the USENIX conference on Networked systems design and implementation,,2010,,,B,https://dl.acm.org/citation.cfm?id=1855735,"Vinh The Lam, Michael Mitzenmacher, George Varghese",Carousel: scalable logging for intrusion prevention systems,,,,,,,,,,,,,,,,,,, International Conference on Engineering Applications of Neural Networks,EANN,2011,research track paper,Artificial Intelligence and Image Processing,C,http://link.springer.com/chapter/10.1007/978-3-642-23957-1_20,"Tuomo Sipola, Antti Juvonen, Joel Lehtonen",Anomaly Detection from Network Logs Using Diffusion Maps,,,,,,,,,,,,,,,,,,, IBM Journal of Research and Development,,2011,journal,"Computer Software Information Systems",A,https://www.scopus.com/inward/record.url?eid=2-s2.0-81555212616&partnerID=40&md5=cbfb6657bc8f4db915b3d2da41a76af2,"Aharoni, E., Fine, S., Goldschmidt, Y., Lavi, O., Margalit, O., Rosen-Zvi, M., Shpigelman, L.",Smarter log analysis,"Modern computer systems generate an enormous number of logs. IBM Mining Effectively Large Output Data Yield (MELODY) is a unique and innovative solution for handling these logs and filtering out the anomalies and failures. 
MELODY can detect system errors early on and avoid subsequent crashes by identifying the root causes of such errors. By analyzing the logs leading up to a problem, MELODY can pinpoint when and where things went wrong and visually present them to the user, ensuring that corrections are accurately and effectively done. We present the MELODY solution and describe its architecture, algorithmic components, functions, and benefits. After being trained on a large portion of relevant data, MELODY provides alerts of abnormalities in newly arriving log files or in streams of logs. The solution is being used by IBM services groups that support IBM xSeries® servers on a regular basis. MELODY was recently tested with ten large IBM customers who use zSeries machines and was found to be extremely useful for the information technology experts in those companies. They found that the solution's ability to reduce extensively large log data to manageable sets of highlighted messages saved them time and helped them make better use of the data.",IBM Mining Effectively Large Output Data Yield (MELODY) is a unique and innovative solution for handling these logs and filtering out the anomalies and failures. MELODY can detect system errors early on and avoid subsequent crashes by identifying the root causes of such errors.,Modern computer systems generate an enormous number of logs.,"By analyzing the logs leading up to a problem, MELODY can pinpoint when and where things went wrong and visually present them to the user, ensuring that corrections are accurately and effectively done. We present the MELODY solution and describe its architecture, algorithmic components, functions, and benefits. After being trained on a large portion of relevant data, MELODY provides alerts of abnormalities in newly arriving log files or in streams of logs. The solution is being used by IBM services groups that support IBM xSeries® servers on a regular basis. 
MELODY was recently tested with ten large IBM customers who use zSeries machines and was found to be extremely useful for the information technology experts in those companies.",They found that the solution's ability to reduce extensively large log data to manageable sets of highlighted messages saved them time and helped them make better use of the data.,Aharoni_2011,,,,,,,,,,,,,1 International Conference on Extending Database Technology,,2002,18/1,,?,https://link.springer.com/10.1007/3-540-45876-X_8,"Wenwu Lou, Guimei Liu, Hongjun Lu, Qiang Yang",Cut-and-pick transactions for proxy log mining,,,,,,,,,,,,,,,,,,, "World Academy of Science, Engineering and Technology",,2008,,,?,https://www.researchgate.net/profile/Mohd_Helmy_Abd_Wahab/publication/303065182_Data_Pre-processing_on_Web_Server_Logs_for_Generalized_Association_Rules_Mining/links/5736956b08ae298602e0a958/Data-Pre-processing-on-Web-Server-Logs-for-Generalized-Association-Rules-Mining.pdf,"Mohd Helmy Abd Wahab, Mohd Norzali Haji Mohd, Hafizul Fahri Hanafi, Mohamad Farhan Mohamad Mohsin",Data pre-processing on web server logs for generalized association rules mining algorithm,,,,,,,,,,,,,,,,,,, "International Conference on Dependable, Autonomic and Secure Computin",DASC,2011,research track paper,Computer Software,C,https://ieeexplore.ieee.org/document/6118346/,"Edward Chuah , Gary Lee , William-Chandra Tjhi , Shyh-Hao Kuo , Terence Hung , John Hammond , Tommy Minyard , James C. 
Browne",Establishing Hypothesis for Recurrent System Failures from Cluster Log Files,,,,,,,,,,,,,,,,,,, IEEE International Parallel and Distributed Processing Symposium,IPDPS,2011,research track paper,Distributed Computing,A,https://ieeexplore.ieee.org/document/6012893/,"Ziming Zheng , Li Yu , Wei Tang , Zhiling Lan , Rinku Gupta , Narayan Desai , Susan Coghlan , Daniel Buettner",Co-analysis of RAS Log and Job Log on Blue Gene/P,"With the growth of system size and complexity, reliability has become of paramount importance for petascale systems. Reliability, Availability, and Serviceability (RAS) logs have been commonly used for failure analysis. However, analysis based on just the RAS logs has proved to be insufficient in understanding failures and system behaviors. To overcome the limitation of this existing methodologies, we analyze the Blue Gene/P RAS logs and the Blue Gene/P job logs in a cooperative manner. From our co-analysis effort, we have identified a dozen important observations about failure characteristics and job interruption characteristics on the Blue Gene/P systems. These observations can significantly facilitate the research in fault resilience of large-scale systems.","To overcome the limitation of this existing methodologies, we analyze the Blue Gene/P RAS logs and the Blue Gene/P job logs in a cooperative manner.","With the growth of system size and complexity, reliability has become of paramount importance for petascale systems. Reliability, Availability, and Serviceability (RAS) logs have been commonly used for failure analysis. However, analysis based on just the RAS logs has proved to be insufficient in understanding failures and system behaviors. ","From our co-analysis effort, we have identified a dozen important observations about failure characteristics and job interruption characteristics on the Blue Gene/P systems. 
",These observations can significantly facilitate the research in fault resilience of large-scale systems., Zheng_2011,,,,,,,,1,,,,, Expert Systems with Applications,,2011,journal,"Artificial Intelligence and Image Processing Information Systems Applied Mathematics",B,https://www.scopus.com/inward/record.url?eid=2-s2.0-79953720253&partnerID=40&md5=8324a58a50914499f24b3fe322299b04,"Kim, S., Cho, N.W., Kang, B., Kang, S.-H.",Fast outlier detection for very large log data,,,,,,,,,,,,,,,,,,, International Conference on Parallel Processing,Euro-Par,2011,research track paper,Distributed Computing,A,https://link.springer.com/chapter/10.1007/978-3-642-23400-2_6,"Ana Gainaru, Franck Cappello, Stefan Trausan-Matu, Bill Kramer",Event log mining tool for large scale HPC systems,"Event log files are the most common source of information for the characterization of events in large scale systems. However the large size of these files makes the task of manual analysing log messages to be difficult and error prone. This is the reason why recent research has been focusing on creating algorithms for automatically analysing these log files. In this paper we present a novel methodology for extracting templates that describe event formats from large datasets presenting an intuitive and user-friendly output to system administrators. Our algorithm is able to keep up with the rapidly changing environments by adapting the clusters to the incoming stream of events. For testing our tool, we have chosen 5 log files that have different formats and that challenge different aspects in the clustering task. 
The experiments show that our tool outperforms all other algorithms in all tested scenarios achieving an average precision and recall of 0.9, increasing the correct number of groups by a factor of 1.5 and decreasing the number of false positives and negatives by an average factor of 4.",In this paper we present a novel methodology for extracting templates that describe event formats from large datasets presenting an intuitive and user-friendly output to system administrators. ,Event log files are the most common source of information for the characterization of events in large scale systems. However the large size of these files makes the task of manual analysing log messages to be difficult and error prone. This is the reason why recent research has been focusing on creating algorithms for automatically analysing these log files.,"Our algorithm is able to keep up with the rapidly changing environments by adapting the clusters to the incoming stream of events. For testing our tool, we have chosen 5 log files that have different formats and that challenge different aspects in the clustering task. 
","The experiments show that our tool outperforms all other algorithms in all tested scenarios achieving an average precision and recall of 0.9, increasing the correct number of groups by a factor of 1.5 and decreasing the number of false positives and negatives by an average factor of 4.", Gainaru_2011,,,,1,,,,,,,,, Advances in Information and Computer Security,,2018,journal,,?,http://link.springer.com/chapter/10.1007/978-3-319-97916-8_10,"Katsutaka Ito, Hirokazu Hasegawa, Yukiko Yamaguchi, Hajime Shimada",Detecting Privacy Information Abuse by Android Apps from API Call Logs,,,,,,,,,,,,,,,,,,, International Symposium on Reliable Distributed Systems,SRDS,2011,research track paper,Distributed Computing,A,https://ieeexplore.ieee.org/document/6076757/,"Kamal Kc , Xiaohui Gu",ELT: Efficient Log-based Troubleshooting System for Cloud Computing Infrastructures,"We present an Efficient Log-based Troubleshooting(ELT) system for cloud computing infrastructures. ELT adopts a novel hybrid log mining approach that combines coarse-grained and fine-grained log features to achieve both high accuracy and low overhead. Moreover, ELT can automatically extract key log messages and perform invariant checking to greatly simplify the troubleshooting task for the system administrator. We have implemented a prototype of the ELT system and conducted an extensive experimental study using real management console logs of a production cloud system and a Hadoop cluster. Our experimental results show that ELT can achieve more efficient and powerful troubleshooting support than existing schemes. More importantly, ELT can find software bugs that cannot be detected by current cloud system management practice.",We present an Efficient Log-based Troubleshooting(ELT) system for cloud computing infrastructures. ,,"ELT adopts a novel hybrid log mining approach that combines coarse-grained and fine-grained log features to achieve both high accuracy and low overhead. 
Moreover, ELT can automatically extract key log messages and perform invariant checking to greatly simplify the troubleshooting task for the system administrator. We have implemented a prototype of the ELT system and conducted an extensive experimental study using real management console logs of a production cloud system and a Hadoop cluster.","Our experimental results show that ELT can achieve more efficient and powerful troubleshooting support than existing schemes. More importantly, ELT can find software bugs that cannot be detected by current cloud system management practice.", Kc_2011,,,,,,,,,,,1,, Australasian Information Security Conference,,2011,10/1,,Unranked,https://dl.acm.org/citation.cfm?id=2460421,"Malcolm Corney, George Mohay, Andrew Clark",Detection of anomalies from user profiles generated from system logs,,,,,,,,,,,,,,,,,,, Conference on Data and application security and privacy,,2011,12/2,,?,https://dl.acm.org/citation.cfm?id=1943524,"You Chen, Bradley Malin",Detection of anomalous insiders in collaborative environments via relational analysis of access logs,,,,,,,,,,,,,,,,,,, ACM Symposium on Applied Computing,SAC,2011,research track paper,Data Format,B,https://dl.acm.org/citation.cfm?id=1982298,"Adetokunbo Makanju, A. Nur Zincir-Heywood, Evangelos E. Milios",Storage and retrieval of system log events using a structured schema based on message type transformation,,,,,,,,,,,,,,,,,,, International Journal of Internet Technology and Secured Transactions,,2011,,,C,https://www.scopus.com/inward/record.url?eid=2-s2.0-84878807413&partnerID=40&md5=79b55c1dd1c9b6449016d5f38f291162,"Nehinbe, J.O.",Understanding the decision rules for partitioning logs of intrusion detection systems (IDS),,,,,,,,,,,,,,,,,,, World Congress on Computing and Communication Technologies,,2014,,,?,https://ieeexplore.ieee.org/document/6755103/,"M. Gomathy , V. Karthika Devi , D. 
Meenakshi",Developing an Error Logging Framework for Ruby on Rails Application Using AOP,,,,,,,,,,,,,,,,,,, International Systems and Storage Conference,,2017,,,?,https://dl.acm.org/citation.cfm?id=3078484,Rukma Talwadker,Dexter: faster troubleshooting of misconfiguration cases using system logs,,,,,,,,,,,,,,,,,,, International Journal of Information Security,,2012,journal,,C,https://link.springer.com/article/10.1007/s10207-012-0163-8,"Dina Hadžiosmanović, Damiano Bolzoni, Pieter H. Hartel",A log mining approach for process monitoring in SCADA,,,,,,,,,,,,,,,,,,, ACM Transactions on Computer Systems,TOCS,2012,journal,"Computer Software Information Systems",A*,https://dl.acm.org/citation.cfm?id=2110360,"Ding Yuan, Jing Zheng, Soyeon Park, Yuanyuan Zhou, Stefan Savage",Improving software diagnosability via log enhancement,"Diagnosing software failures in the field is notoriously difficult, in part due to the fundamental complexity of troubleshooting any complex software system, but further exacerbated by the paucity of information that is typically available in the production setting. Indeed, for reasons of both overhead and privacy, it is common that only the run-time log generated by a system (e.g., syslog) can be shared with the developers. Unfortunately, the ad-hoc nature of such reports are frequently insufficient for detailed failure diagnosis. This paper seeks to improve this situation within the rubric of existing practice. We describe a tool, LogEnhancer that automatically “enhances” existing logging code to aid in future post-failure debugging. 
We evaluate LogEnhancer on eight large, real-world applications and demonstrate that it can dramatically reduce the set of potential root failure causes that must be considered while imposing negligible overheads.","We describe a tool, LogEnhancer that automatically “enhances” existing logging code to aid in future post-failure debugging.","Diagnosing software failures in the field is notoriously difficult, in part due to the fundamental complexity of troubleshooting any complex software system, but further exacerbated by the paucity of information that is typically available in the production setting. Indeed, for reasons of both overhead and privacy, it is common that only the run-time log generated by a system (e.g., syslog) can be shared with the developers. Unfortunately, the ad-hoc nature of such reports are frequently insufficient for detailed failure diagnosis. This paper seeks to improve this situation within the rubric of existing practice.","We evaluate LogEnhancer on eight large, real-world applications ",[we] demonstrate that it can dramatically reduce the set of potential root failure causes that must be considered while imposing negligible overheads.,Yuan_2012a,,1,,,,,,,,,,, Annual Computer Software and Applications Conference,,2012,research track paper,,?,https://ieeexplore.ieee.org/document/6340147/,"Theodoros Kalamatianos , Kostas Kontogiannis , Peter Matthews",Domain Independent Event Analysis for Log Data Reduction,,,,,,,,,,,,,,,,,,, IEEE Transactions on Knowledge and Data Engineering,,2012,journal,Data Format,A,https://ieeexplore.ieee.org/document/5936060/,"Adetokunbo Makanju , A. Nur Zincir-Heywood , Evangelos E. Milios",A Lightweight Algorithm for Message Type Extraction in System Application Logs,"Message type or message cluster extraction is an important task in the analysis of system logs in computer networks. Defining these message types automatically facilitates the automatic analysis of system logs. 
When the message types that exist in a log file are represented explicitly, they can form the basis for carrying out other automatic application log analysis tasks. In this paper, we introduce a novel algorithm for carrying out message type extraction from event log files. IPLoM, which stands for Iterative Partitioning Log Mining, works through a 4-step process. The first three steps hierarchically partition the event log into groups of event log messages or event clusters. In its fourth and final stage, IPLoM produces a message type description or line format for each of the message clusters. IPLoM is able to find clusters in data irrespective of the frequency of its instances in the data, it scales gracefully in the case of long message type patterns and produces message type descriptions at a level of abstraction, which is preferred by a human observer. Evaluations show that IPLoM outperforms similar algorithms statistically significantly.","In this paper, we introduce a novel algorithm for carrying out message type extraction from event log files.","Message type or message cluster extraction is an important task in the analysis of system logs in computer networks. Defining these message types automatically facilitates the automatic analysis of system logs. When the message types that exist in a log file are represented explicitly, they can form the basis for carrying out other automatic application log analysis tasks.","IPLoM, which stands for Iterative Partitioning Log Mining, works through a 4-step process. The first three steps hierarchically partition the event log into groups of event log messages or event clusters. In its fourth and final stage, IPLoM produces a message type description or line format for each of the message clusters. 
IPLoM is able to find clusters in data irrespective of the frequency of its instances in the data, it scales gracefully in the case of long message type patterns and produces message type descriptions at a level of abstraction, which is preferred by a human observer.",Evaluations show that IPLoM outperforms similar algorithms statistically significantly.,Makanju_2012,,,,1,,,,,,,,, "International Conference on Sensing, Communication, and Networking",,2014,9/2,,?,https://ieeexplore.ieee.org/document/6990375/,"Wei Dong , Chao Huang , Jiliang Wang , Chun Chen , Jiajun Bu",Dynamic logging with Dylog for networked embedded systems,,,,,,,,,,,,,,,,,,, Science and Information Conference,,2015,,,?,https://ieeexplore.ieee.org/document/7237305/,"Aadil Al-Mahrouqi , Sameh Abdalla , Tahar Kechadi",Efficiency of network event logs as admissible digital evidence,,,,,,,,,,,,,,,,,,, International Journal of Computer Science and Information Security,,2009,journal,,?,https://arxiv.org/abs/0907.5433,"Ratnesh Kumar Jain, Dr. R. S. Kasana, Dr. Suresh Jain",Efficient web log mining using doubly linked tree,,,,,,,,,,,,,,,,,,, "IEEE Network Operations and Management Symposium ",NOMS,2012,research track paper,Computer Hardware,B,https://ieeexplore.ieee.org/document/6211882/,"Adetokunbo Makanju , A. Nur Zincir-Heywood , Evangelos E. 
Milios",Interactive learning of alert signatures in High Performance Cluster system logs,,,,,,,,,,,,,,,,,,, Symposium and Bootcamp on the Science of Security,,2015,,,?,https://dl.acm.org/citation.cfm?id=2746200,"Jason King, Rahul Pandita, Laurie Williams",Enabling forensics by proposing heuristics to identify mandatory log events,,,,,,,,,,,,,,,,,,, "International Conference on Computing, Networking and Communications",,2017,7/2,,?,https://ieeexplore.ieee.org/document/7876181/,"Umair Sajid Hashmi , Arsalan Darbandi , Ali Imran",Enabling proactive self-healing by data mining network failure logs,,,,,,,,,,,,,,,,,,, IEEE/IFIP International Conference on Dependable Systems,DSN,2013,research track paper,Computer Software,A,https://www.scopus.com/inward/record.url?eid=2-s2.0-84883367588&partnerID=40&md5=b0a28ae69f598f6e0d3661d192d1d878,"El-Sayed, N., Schroeder, B.",Reading between the lines of failure logs: Understanding how HPC systems fail,"As the component count in supercomputing installations continues to increase, system reliability is becoming one of the major issues in designing HPC systems. These issues will become more challenging in future Exascale systems, which are predicted to include millions of CPU cores. Even with relatively reliable individual components, the sheer number of components will increase failure rates to unprecedented levels. Efficiently running those systems will require a good understanding of how different factors impact system reliability. In this paper we use a decade worth of field data made available by Los Alamos National Lab to study the impact of a diverse set of factors on the reliability of HPC systems. 
We provide insights into the nature of correlations between failures, and investigate the impact of factors, such as the power quality, temperature, fan and chiller reliability, system usage and utilization, and external factors, such as cosmic radiation, on system reliability",In this paper we use a decade worth of field data made available by Los Alamos National Lab to study the impact of a diverse set of factors on the reliability of HPC systems.,"As the component count in supercomputing installations continues to increase, system reliability is becoming one of the major issues in designing HPC systems. These issues will become more challenging in future Exascale systems, which are predicted to include millions of CPU cores. Even with relatively reliable individual components, the sheer number of components will increase failure rates to unprecedented levels. Efficiently running those systems will require a good understanding of how different factors impact system reliability. ","We provide insights into the nature of correlations between failures, and investigate the impact of factors, such as the power quality, temperature, fan and chiller reliability, system usage and utilization, and external factors, such as cosmic radiation, on system reliability",,Sayed_2013,,,,,,,,,,,,1, IEEE/IFIP International Conference on Dependable Systems,DSN,2015,research track paper,Computer Software,A,https://ieeexplore.ieee.org/document/7266837/,"Alina Oprea , Zhou Li , Ting-Fang Yen , Sang H. Chin , Sumayah Alrwais",Detection of Early-Stage Enterprise Infection by Mining Large-Scale Log Data,"Recent years have seen the rise of sophisticated attacks including advanced persistent threats (APT) which pose severe risks to organizations and governments. Additionally, new malware strains appear at a higher rate than ever before. 
Since many of these malware evade existing security products, traditional defenses deployed by enterprises today often fail at detecting infections at an early stage. We address the problem of detecting early-stage APT infection by proposing a new framework based on belief propagation inspired from graph theory. We demonstrate that our techniques perform well on two large datasets. We achieve high accuracy on two months of DNS logs released by Los Alamos National Lab (LANL), which include APT infection attacks simulated by LANL domain experts. We also apply our algorithms to 38TB of web proxy logs collected at the border of a large enterprise and identify hundreds of malicious domains overlooked by state-of-the-art security products.",We address the problem of detecting early-stage APT infection by proposing a new framework based on belief propagation inspired from graph theory.,"Recent years have seen the rise of sophisticated attacks including advanced persistent threats (APT) which pose severe risks to organizations and governments. Additionally, new malware strains appear at a higher rate than ever before. Since many of these malware evade existing security products, traditional defenses deployed by enterprises today often fail at detecting infections at an early stage.","We achieve high accuracy on two months of DNS logs released by Los Alamos National Lab (LANL), which include APT infection attacks simulated by LANL domain experts. 
We also apply our algorithms to 38TB of web proxy logs collected at the border of a large enterprise and identify hundreds of malicious domains overlooked by state-of-the-art security products.",We demonstrate that our techniques perform well on two large datasets., Oprea_2015,,,,,,,1,,,,,, "IEEE Network Operations and Management Symposium ",NOMS,2012,research track paper,Computer Hardware,B,https://ieeexplore.ieee.org/document/6212005/,"Haibo Mi , Huaimin Wang , Gang Yin , Hua Cai , Qi Zhou , Tingtao Sun",Performance problems diagnosis in cloud computing systems by mining request trace logs,,,,,,,,,,,,,,,,,,, New Generation Computing: computing paradigms and computational intelligence,,2012,journal,,B,https://link.springer.com/article/10.1007/s00354-012-0105-z,"Hamid Saadatfar, Hamid Fadishei, Hossein Deldari",Predicting job failures in AuverGrid based on workload log analysis,,,,,,,,,,,,,,,,,,, CEUR Workshop Proceedings,,2015,15/1,,?,https://www.scopus.com/inward/record.url?eid=2-s2.0-84937703874&partnerID=40&md5=1d859c7b5b9c243f880241090a3355ea,"Mokhov, A., Carmona, J.",Event log visualisation with conditional partial order graphs: From control flow to data,,,,,,,,,,,,,,,,,,, International Symposium on Reliable Distributed Systems,SRDS,2012,research track paper,Distributed Computing,A,https://ieeexplore.ieee.org/document/6424841/,"Xiaoyu Fu , Rui Ren , Jianfeng Zhan , Wei Zhou , Zhen Jia , Gang Lu",LogMaster: Mining Event Correlations in Logs of Large-Scale Cluster Systems,"This paper presents a set of innovative algorithms and a system, named Log Master, for mining correlations of events that have multiple attributions, i.e., node ID, application ID, event type, and event severity, in logs of large-scale cloud and HPC systems. Different from traditional transactional data, e.g., supermarket purchases, system logs have their unique characteristics, and hence we propose several innovative approaches to mining their correlations. 
We parse logs into an n-ary sequence where each event is identified by an informative nine-tuple. We propose a set of enhanced apriori-like algorithms for improving sequence mining efficiency, we propose an innovative abstraction-event correlation graphs (ECGs) to represent event correlations, and present an ECGs-based algorithm for fast predicting events. The experimental results on three logs of production cloud and HPC systems, varying from 433490 entries to 4747963 entries, show that our method can predict failures with a high precision and an acceptable recall rates.","This paper presents a set of innovative algorithms and a system, named Log Master, for mining correlations of events that have multiple attributions, i.e., node ID, application ID, event type, and event severity, in logs of large-scale cloud and HPC systems. ","Different from traditional transactional data, e.g., supermarket purchases, system logs have their unique characteristics, and hence we propose several innovative approaches to mining their correlations.","We parse logs into an n-ary sequence where each event is identified by an informative nine-tuple. We propose a set of enhanced apriori-like algorithms for improving sequence mining efficiency, we propose an innovative abstraction-event correlation graphs (ECGs) to represent event correlations, and present an ECGs-based algorithm for fast predicting events.","The experimental results on three logs of production cloud and HPC systems, varying from 433490 entries to 4747963 entries, show that our method can predict failures with a high precision and an acceptable recall rates.", Fu_2012,,,,,,,,,1,,,, ACM Symposium on Applied Computing,SAC,2012,research track paper,Data Format,B,https://dl.acm.org/citation.cfm?id=2245395,"A. Makanju, A. Nur Zincir-Heywood, Evangelos E. 
Milios, Markus Latzel","Spatio-temporal decomposition, clustering and identification for alert detection in system logs",,,,,,,,,,,,,,,,,,, "Symposium on Networked Systems, Design and Implementation",NSDI,2012,research track paper,Distributed Computing,B,https://dl.acm.org/citation.cfm?id=2228334,"Karthik Nagara, Charles Killian, Jennifer Neville",Structured comparative analysis of systems logs to diagnose performance problems,,,,,,,,,,,,,,,,,,, International Symposium on Software Reliability Engineering,ISSRE,2012,research track paper,Computer Software,A,https://ieeexplore.ieee.org/document/6405402/,"Antonio Pecchia , Stefano Russo",Detection of Software Failures through Event Logs: An Experimental Study,"Software faults are recognized to be among the main responsible for system failures in many application domains. Event logs play a key role to support the analysis of failures occurring under real workload conditions. Nevertheless, field experience suggests that event logs may be inaccurate at reporting software failures or they fail to provide accurate support for understanding their causes. This paper analyzes the factors that determine accurate detection of software failures through event logs. The study is based on a data set of 17,387 experiments where failures have been induced by means of software fault injection into three systems. Analysis reveals that the reporting ability of logs collected during the experiments, is not influenced by the type of fault that is activated at runtime. More importantly, analysis demonstrates that, despite the considered systems adopt very similar detection mechanisms, the ability of logs at reporting a given type of failure changes significantly across the systems. 
A closer inspection of collected logs reveals that characteristics, such as system architecture, placement of the logging instructions and specific supports provided by the execution environment, significantly increase accuracy of logs at runtime.",This paper analyzes the factors that determine accurate detection of software failures through event logs.,"Software faults are recognized to be among the main responsible for system failures in many application domains. Event logs play a key role to support the analysis of failures occurring under real workload conditions. Nevertheless, field experience suggests that event logs may be inaccurate at reporting software failures or they fail to provide accurate support for understanding their causes. ","The study is based on a data set of 17,387 experiments where failures have been induced by means of software fault injection into three systems.","Analysis reveals that the reporting ability of logs collected during the experiments, is not influenced by the type of fault that is activated at runtime. More importantly, analysis demonstrates that, despite the considered systems adopt very similar detection mechanisms, the ability of logs at reporting a given type of failure changes significantly across the systems. 
A closer inspection of collected logs reveals that characteristics, such as system architecture, placement of the logging instructions and specific supports provided by the execution environment, significantly increase accuracy of logs at runtime.",Pecchia_2012,,1,,,,,,,,,,, "Research in Attacks, Intrusions, and Defenses",RAID,2012,research track paper,Data Format,A,http://link.springer.com/chapter/10.1007/978-3-642-33338-5_15,"Jie Chu, Zihui Ge, Richard Huber, Ping Ji, Jennifer Yates, Yung-Chao Yu",ALERT-ID: Analyze Logs of the Network Element in Real Time for Intrusion Detection,"The security of the networking infrastructure (e.g., routers and switches) in large scale enterprise or Internet service provider (ISP) networks is mainly achieved through mechanisms such as access control lists (ACLs) at the edge of the network and deployment of centralized AAA (authentication, authorization and accounting) systems governing all access to network devices. However, a misconfigured edge router or a compromised user account may put the entire network at risk. In this paper, we propose enhancing existing security measures with an intrusion detection system overseeing all network management activities. We analyze device access logs collected via the AAA system, particularly TACACS+, in a global tier-1 ISP network and extract features that can be used to distinguish normal operational activities from rogue/anomalous ones. Based on our analyses, we develop a real-time intrusion detection system that constructs normal behavior models with respect to device access patterns and the configuration and control activities of individual accounts from their long-term historical logs and alerts in real-time when usage deviates from the models. 
Our evaluation shows that this system effectively identifies potential intrusions and misuses with an acceptable level of overall alarm rate.","We analyze device access logs collected via the AAA system, particularly TACACS+, in a global tier-1 ISP network and extract features that can be used to distinguish normal operational activities from rogue/anomalous ones.","The security of the networking infrastructure (e.g., routers and switches) in large scale enterprise or Internet service provider (ISP) networks is mainly achieved through mechanisms such as access control lists (ACLs) at the edge of the network and deployment of centralized AAA (authentication, authorization and accounting) systems governing all access to network devices. However, a misconfigured edge router or a compromised user account may put the entire network at risk.","Based on our analyses, we develop a real-time intrusion detection system that constructs normal behavior models with respect to device access patterns and the configuration and control activities of individual accounts from their long-term historical logs and alerts in real-time when usage deviates from the models.",Our evaluation shows that this system effectively identifies potential intrusions and misuses with an acceptable level of overall alarm rate., Chu_2012,,,,,,,1,,,,,, Annual Computer Security Applications Conference,ACSAC,2013,research track paper,Computer Software,A,https://dl.acm.org/citation.cfm?id=2523670,"Ting-Fang Yen, Alina Oprea, Kaan Onarlioglu, Todd Leetham, William Robertson, Ari Juels, Engin Kirda",Beehive: large-scale log analysis for detecting suspicious activity in enterprise networks,"As more and more Internet-based attacks arise, organizations are responding by deploying an assortment of security products that generate situational intelligence in the form of logs. 
These logs often contain high volumes of interesting and useful information about activities in the network, and are among the first data sources that information security specialists consult when they suspect that an attack has taken place. However, security products often come from a patchwork of vendors, and are inconsistently installed and administered. They generate logs whose formats differ widely and that are often incomplete, mutually contradictory, and very large in volume. Hence, although this collected information is useful, it is often dirty. We present a novel system, Beehive, that attacks the problem of automatically mining and extracting knowledge from the dirty log data produced by a wide variety of security products in a large enterprise. We improve on signature-based approaches to detecting security incidents and instead identify suspicious host behaviors that Beehive reports as potential security incidents. These incidents can then be further analyzed by incident response teams to determine whether a policy violation or attack has occurred. We have evaluated Beehive on the log data collected in a large enterprise, EMC, over a period of two weeks. We compare the incidents identified by Beehive against enterprise Security Operations Center reports, antivirus software alerts, and feedback from enterprise security specialists. We show that Beehive is able to identify malicious events and policy violations which would otherwise go undetected.","We present a novel system, Beehive, that attacks the problem of automatically mining and extracting knowledge from the dirty log data produced by a wide variety of security products in a large enterprise. ","As more and more Internet-based attacks arise, organizations are responding by deploying an assortment of security products that generate situational intelligence in the form of logs. 
These logs often contain high volumes of interesting and useful information about activities in the network, and are among the first data sources that information security specialists consult when they suspect that an attack has taken place. However, security products often come from a patchwork of vendors, and are inconsistently installed and administered. They generate logs whose formats differ widely and that are often incomplete, mutually contradictory, and very large in volume. Hence, although this collected information is useful, it is often dirty.","We improve on signature-based approaches to detecting security incidents and instead identify suspicious host behaviors that Beehive reports as potential security incidents. These incidents can then be further analyzed by incident response teams to determine whether a policy violation or attack has occurred. We have evaluated Beehive on the log data collected in a large enterprise, EMC, over a period of two weeks. We compare the incidents identified by Beehive against enterprise Security Operations Center reports, antivirus software alerts, and feedback from enterprise security specialists. We show that Beehive is able to identify malicious events and policy violations which would otherwise go undetected.",, Yen_2013,,,,,,,1,,,,,, International Conference on Software Engineering,ICSE,2012,research track paper,Computer Software,A*,https://dl.acm.org/citation.cfm?id=2337236,"Ding Yuan, Soyeon Park, Yuanyuan Zhou",Characterizing logging practices in open-source software,"Software logging is a conventional programming practice. As its efficacy is often important for users and developers to understand what have happened in production run, yet software logging is often done in an arbitrarily manner. So far, there have been little study for understanding logging practices in real world software. 
This paper makes the first attempt (to the best of our knowledge) to provide quantitative characteristic study of the current log messages within four pieces of large open-source software. First, we quantitatively show that software logging is pervasive. By examining developers’ own modifications to logging code in revision history, we find that they often do not make the log messages right in their first attempts, and thus need to spend significant amount of efforts to modify the log messages as after-thoughts. Our study further provides several interesting findings on where developers spend most of their efforts in modifying the log messages, which can give insights for programmers, tool developers, and language and compiler designers to improve the current logging practice. To demonstrate the benefit of our study, we built a simple checker based on one of our findings and effectively detected 138 new problematic logging code from studied software (24 of them are already confirmed and fixed by developers). ",This paper makes the first attempt (to the best of our knowledge) to provide quantitative characteristic study of the current log messages within four pieces of large open-source software.,"Software logging is a conventional programming practice. As its efficacy is often important for users and developers to understand what have happened in production run, yet software logging is often done in an arbitrarily manner. So far, there have been little study for understanding logging practices in real world software.","First, we quantitatively show that software logging is pervasive. By examining developers’ own modifications to logging code in revision history, we find that they often do not make the log messages right in their first attempts, and thus need to spend significant amount of efforts to modify the log messages as after-thoughts. 
Our study further provides several interesting findings on where developers spend most of their efforts in modifying the log messages, which can give insights for programmers, tool developers, and language and compiler designers to improve the current logging practice.","To demonstrate the benefit of our study, we built a simple checker based on one of our findings and effectively detected 138 new problematic logging code from studied software (24 of them are already confirmed and fixed by developers).",Yuan_2012,1,,,,,,,,,,,, IEEE International Conference on Engineering of Complex Computer Systems,ICECCS,2013,research track paper,Communications Technologies,A,https://ieeexplore.ieee.org/document/6601828/,"Atef Shalan , Mohammad Zulkernine",Runtime Prediction of Failure Modes from System Error Logs,"Predicting potential failure occurrences during runtime is important to achieve system resilience and avoid hazardous consequences of failures. Existing failure prediction techniques in software systems involve forecasting failure counts, effects, and occurrences. Most of these techniques predict failures that may occur in future runtime intervals and only few techniques predict them at runtime. However, they do not estimate the failure modes and they require extensive instrumentation of source code. In this paper, we provide an approach for predicting failure occurrences and modes during system runtime. Our methodology utilizes system error log records to craft runtime error-spread signature. Using system error log history, we determine a predictive function (estimator) for each failure mode based on these signatures. This estimator can be used to predict a failure mode eventuality measure (a probability of failure mode occurrence) from system error log during system runtime. An experimental evaluation using PostgreSQL opensource database is provided. 
Our results show high accuracy of failure occurrence and mode predictions."," In this paper, we provide an approach for predicting failure occurrences and modes during system runtime.","Predicting potential failure occurrences during runtime is important to achieve system resilience and avoid hazardous consequences of failures. Existing failure prediction techniques in software systems involve forecasting failure counts, effects, and occurrences. Most of these techniques predict failures that may occur in future runtime intervals and only few techniques predict them at runtime. However, they do not estimate the failure modes and they require extensive instrumentation of source code.","Our methodology utilizes system error log records to craft runtime error-spread signature. Using system error log history, we determine a predictive function (estimator) for each failure mode based on these signatures. This estimator can be used to predict a failure mode eventuality measure (a probability of failure mode occurrence) from system error log during system runtime. 
An experimental evaluation using PostgreSQL opensource database is provided.",Our results show high accuracy of failure occurrence and mode predictions., Shalan_2013,,,,,,,,,1,,,, Formal Techniques for Safety-Critical Systems,,2015,workshop,,?,http://link.springer.com/chapter/10.1007/978-3-319-17581-2_1,"Klaus Havelund, Rajeev Joshi",Experience with Rule-Based Analysis of Spacecraft Logs,,,,,,,,,,,,,,,,,,, International Workshop on Systematic Approaches to Digital Forensic Engineering,,2010,,,?,https://ieeexplore.ieee.org/document/5491960/,"Sebastian Schmerl , Michael Vogel , René Rietz , Hartmut König",Explorative Visualization of Log Data to Support Forensic Analysis and Signature Development,,,,,,,,,,,,,,,,,,, IEEE Transactions on Software Engineering,TSE,2013,journal,"Computer Software Information Systems",A*,https://ieeexplore.ieee.org/document/6320555/,"Marcello Cinque , Domenico Cotroneo , Antonio Pecchia",Event Logs for the Analysis of Software Failures: A Rule-Based Approach,"Event logs have been widely used over the last three decades to analyze the failure behavior of a variety of systems. Nevertheless, the implementation of the logging mechanism lacks a systematic approach and collected logs are often inaccurate at reporting software failures: This is a threat to the validity of log-based failure analysis. This paper analyzes the limitations of current logging mechanisms and proposes a rule-based approach to make logs effective to analyze software failures. The approach leverages artifacts produced at system design time and puts forth a set of rules to formalize the placement of the logging instructions within the source code. 
The validity of the approach, with respect to traditional logging mechanisms, is shown by means of around 12,500 software fault injection experiments into real-world systems.",This paper analyzes the limitations of current logging mechanisms and proposes a rule-based approach to make logs effective to analyze software failures.,"Event logs have been widely used over the last three decades to analyze the failure behavior of a variety of systems. Nevertheless, the implementation of the logging mechanism lacks a systematic approach and collected logs are often inaccurate at reporting software failures: This is a threat to the validity of log-based failure analysis.","The approach leverages artifacts produced at system design time and puts forth a set of rules to formalize the placement of the logging instructions within the source code. The validity of the approach, with respect to traditional logging mechanisms, is shown by means of around 12,500 software fault injection experiments into real-world systems.",,Cinque_2013,,1,,,,,,,,,,, European Workshop on Dependable Computing,,2011,6/2,,?,https://dl.acm.org/citation.cfm?id=1978599,"Midori Sugaya, Ken Igarashi, Masaaki Goshima, Shinpei Nakata, Kimio Kuramitsu",Extensible online log analysis system,,,,,,,,,,,,,,,,,,, IEEE/IFIP International Conference on Dependable Systems,DSN,2016,industry track paper,Computer Software,A,https://ieeexplore.ieee.org/document/7579781/,"Pinjia He , Jieming Zhu , Shilin He , Jian Li , Michael R. Lyu",An Evaluation Study on Log Parsing and Its Use in Log Mining,"Logs, which record runtime information of modern systems, are widely utilized by developers (and operators) in system development and maintenance. Due to the ever-increasing size of logs, data mining models are often adopted to help developers extract system behavior information. However, before feeding logs into data mining models, logs need to be parsed by a log parser because of their unstructured format. 
Although log parsing has been widely studied in recent years, users are still unaware of the advantages of different log parsers nor the impact of them on subsequent log mining tasks. Thus they often re-implement or even re-design a new log parser, which would be time-consuming yet redundant. To address this issue, in this paper, we study four log parsers and package them into a toolkit to allow their reuse. In addition, we obtain six insightful findings by evaluating the performance of the log parsers on five datasets with over ten million raw log messages, while their effectiveness on a real-world log mining task has been thoroughly examined.","To address this issue, in this paper, we study four log parsers and package them into a toolkit to allow their reuse.","Although log parsing has been widely studied in recent years, users are still unaware of the advantages of different log parsers nor the impact of them on subsequent log mining tasks. Thus they often re-implement or even re-design a new log parser, which would be time-consuming yet redundant.",,"In addition, we obtain six insightful findings by evaluating the performance of the log parsers on five datasets with over ten million raw log messages, while their effectiveness on a real-world log mining task has been thoroughly examined.",He_2016,,,,1,,,,,,,,, International Parallel and Distributed Processing Symposium Workshop,,2016,WORKSHOP,,?,https://ieeexplore.ieee.org/document/7530062/,"Yining Zhao , Haili Xiao",Extracting Log Patterns from System Logs in LARGE,,,,,,,,,,,,,,,,,,, International Conference on Runtime Verification,RV,2013,research track paper,Computer Software,C,http://link.springer.com/chapter/10.1007/978-3-642-35632-2_17,"David Basin, Felix Klaedtke, Srdjan Marinovic, Eugen Zălinescu",Monitoring Compliance Policies over Incomplete and Disagreeing Logs,,,,,,,,,,,,,,,,,,, International Symposium on Reliable Distributed Systems,SRDS,2013,research track paper,Distributed 
Computing,A,https://ieeexplore.ieee.org/document/6656267/,"Edward Chuah , Arshad Jhumka , Sai Narasimhamurthy , John Hammond , James C. Browne , Bill Barth",Linking Resource Usage Anomalies with System Failures from Cluster Log Data,"Bursts of abnormally high use of resources are thought to be an indirect cause of failures in large cluster systems, but little work has systematically investigated the role of high resource usage on system failures, largely due to the lack of a comprehensive resource monitoring tool which resolves resource use by job and node. The recently developed TACC_Stats resource use monitor provides the required resource use data. This paper presents the ANCOR diagnostics system that applies TACC_Stats data to identify resource use anomalies and applies log analysis to link resource use anomalies with system failures. Application of ANCOR to first identify multiple sources of resource anomalies on the Ranger supercomputer, then correlate them with failures recorded in the message logs and diagnosing the cause of the failures, has identified four new causes of compute node soft lockups. ANCOR can be adapted to any system that uses a resource use monitor which resolves resource use by job.",This paper presents the ANCOR diagnostics system that applies TACC_Stats data to identify resource use anomalies and applies log analysis to link resource use anomalies with system failures. ,"Bursts of abnormally high use of resources are thought to be an indirect cause of failures in large cluster systems, but little work has systematically investigated the role of high resource usage on system failures, largely due to the lack of a comprehensive resource monitoring tool which resolves resource use by job and node. 
The recently developed TACC_Stats resource use monitor provides the required resource use data","Application of ANCOR to first identify multiple sources of resource anomalies on the Ranger supercomputer, then correlate them with failures recorded in the message logs and diagnosing the cause of the failures, has identified four new causes of compute node soft lockups. ANCOR can be adapted to any system that uses a resource use monitor which resolves resource use by job.",, Chuah_2013,,,,,,,,1,,,,, "International Conference on Collaborative Computing: Networking, Applications and Worksharing",,2013,8/2,,?,https://ieeexplore.ieee.org/document/6680025/,"John Dwyer , Traian Marius Truta",Finding anomalies in windows event logs using standard deviation,,,,,,,,,,,,,,,,,,, "International Conference on Knowledge Science, Engineering and Management",KSEM,2013,,,B,http://link.springer.com/chapter/10.1007/978-3-642-39787-5_44,"Jungang Xu, Hui Li",The Failure Prediction of Cluster Systems Based on System Logs,,,,,,,,,,,,,,,,,,, "Journal of Aerospace Computing, Information, and Communication",,2010,journal,,?,https://arc.aiaa.org/doi/abs/10.2514/1.49356,"Howard Barringer , Alex Groce , Klaus Havelund , Margaret Smith",Formal analysis of log files,,,,,,,,,,,,,,,,,,, Digital Forensics and Cyber Crime,,2012,16/1,,?,http://link.springer.com/chapter/10.1007/978-3-642-35515-8_13,"Sean Thorpe, Indrakshi Ray, Indrajit Ray, Tyrone Grandison, Abbie Barbir, Robert France",Formal Parameterization of Log Synchronization Events within a Distributed Forensic Compute Cloud Database Environment,,,,,,,,,,,,,,,,,,, "International Workshop on Visualization for Cyber Security ",VizSec,2013,research track paper,Computer Software,C,https://dl.acm.org/citation.cfm?id=2517958,"Mansour Alsaleh, Abdullah Alqahtani, Abdulrahman Alarifi, AbdulMalik Al-Salman",Visualizing PHPIDS log files for better understanding of web server attacks,,,,,,,,,,,,,,,,,,, "IEEE Network Operations and Management Symposium 
",NOMS,2014,research track paper,Computer Hardware,B,https://ieeexplore.ieee.org/document/6838292/,"Ashot N. Harutyunyan , Arnak V. Poghosyan , Naira M. Grigoryan , Mazda A. Marvasti",Abnormality analysis of streamed log data,,,,,,,,,,,,,,,,,,, Security and Communication Networks,,2014,journal,,Unranked,https://www.scopus.com/inward/record.url?eid=2-s2.0-84911896278&partnerID=40&md5=ee0efab69209672dfe66daeb33d7c778,"Xiao, Y., Yue, S., Fu, B., Ozdemir, S.",GlobalView: Building global view with log files in a distributed/networked system for accountability,,,,,,,,,,,,,,,,,,, ACM/IFIP/USENIX International Middleware Conference,Middleware,2014,research track paper,Computer Software,A,https://dl.acm.org/citation.cfm?id=2663319,"Saeed Ghanbari, Ali B. Hashemi, Cristiana Amza",Stage-aware anomaly detection through tracking log points,"We introduce Stage-aware Anomaly Detection (SAAD), a low-overhead real-time solution for detecting runtime anomalies in storage systems. Modern storage server architectures are multi-threaded and structured as a set of modules, which we call stages. SAAD leverages this to collect stage-level log summaries at runtime and to perform statistical analysis across stage instances. Stages that generate rare execution flows and/or register unusually high duration for regular flows at run-time indicate anomalies. SAAD makes two key contributions: i) limits the search space for root causes, by pinpointing specific anomalous code stages, and ii) reduces compute and storage requirements for log analysis, while preserving accuracy, through a novel technique based on log summarization. We evaluate SAAD on three distributed storage systems: HBase, Hadoop Distributed File System (HDFS), and Cassandra. 
We show that, with practically zero overhead, we uncover various anomalies in real-time.","We introduce Stage-aware Anomaly Detection (SAAD), a low-overhead real-time solution for detecting runtime anomalies in storage systems.",,"Modern storage server architectures are multi-threaded and structured as a set of modules, which we call stages. SAAD leverages this to collect stage-level log summaries at runtime and to perform statistical analysis across stage instances. Stages that generate rare execution flows and/or register unusually high duration for regular flows at run-time indicate anomalies.","SAAD makes two key contributions: i) limits the search space for root causes, by pinpointing specific anomalous code stages, and ii) reduces compute and storage requirements for log analysis, while preserving accuracy, through a novel technique based on log summarization. We evaluate SAAD on three distributed storage systems: HBase, Hadoop Distributed File System (HDFS), and Cassandra. We show that, with practically zero overhead, we uncover various anomalies in real-time.", Ghanbari_2014,,,,,,1,,,,,,, International Conference on Advances in Social Networks Analysis and Mining,,2014,8/2,,?,https://dl.acm.org/citation.cfm?id=3191960,"Omair Shafiq, Reda Alhajj, Jon G. Rokne",Handling incomplete data using semantic logging based social network analysis hexagon for effective application monitoring and management,,,,,,,,,,,,,,,,,,, International Conference on Future Internet Technologies,,2017,7/2,,?,https://dl.acm.org/citation.cfm?id=3095788,"Hiroshi Abe, Keiichi Shima, Yuji Sekiya, Daisuke Miyamoto, Tomohiro Ishihara, Kazuya Okada",Hayabusa: Simple and Fast Full-Text Search Engine for Massive System Log Data,,,,,,,,,,,,,,,,,,, IEEE International Conference on Cluster Computing,CLUSTER,2014,research track paper,Distributed Computing,A,https://ieeexplore.ieee.org/document/6968768/,"Xiaoyu Fu , Rui Ren , Sally A. 
McKee , Jianfeng Zhan , Ninghui Sun",Digging deeper into cluster system logs for failure prediction and root cause diagnosis,"As the sizes of supercomputers and data centers grow towards exascale, failures become normal. System logs play a critical role in the increasingly complex tasks of automatic failure prediction and diagnosis. Many methods for failure prediction are based on analyzing event logs for large scale systems, but there is still neither a widely used one to predict failures based on both non-fatal and fatal events, nor a precise one that uses fine-grained information (such as failure type, node location, related application, and time of occurrence). A deeper and more precise log analysis technique is needed. We propose a three-step approach to draw out event dependencies and to identify failure-event generating processes. First, we cluster frequent event sequences into event groups based on common events. Then we infer causal dependencies between events in each event group. Finally, we extract failure rules based on the observation that events of the same event types, on the same nodes or from the same applications have similar operational behaviors. We use this rich information to improve failure prediction. Our approach semi-automates diagnosing the root causes of failure events, making it a valuable tool for system administrators.",We propose a three-step approach to draw out event dependencies and to identify failure-event generating processes. ,"As the sizes of supercomputers and data centers grow towards exascale, failures become normal. System logs play a critical role in the increasingly complex tasks of automatic failure prediction and diagnosis. 
Many methods for failure prediction are based on analyzing event logs for large scale systems, but there is still neither a widely used one to predict failures based on both non-fatal and fatal events, nor a precise one that uses fine-grained information (such as failure type, node location, related application, and time of occurrence). A deeper and more precise log analysis technique is needed.","First, we cluster frequent event sequences into event groups based on common events. Then we infer causal dependencies between events in each event group. Finally, we extract failure rules based on the observation that events of the same event types, on the same nodes or from the same applications have similar operational behaviors. We use this rich information to improve failure prediction. Our approach semi-automates diagnosing the root causes of failure events, making it a valuable tool for system administrators.",, Fu_2014,,,,,,,,,1,,,, International Workshop on Software Engineering for Resilient Systems,,2017,,,?,http://link.springer.com/chapter/10.1007/978-3-319-65948-0_12,"Marcin Kubacki, Janusz Sosnowski",Holistic Processing and Exploring Event Logs,,,,,,,,,,,,,,,,,,, Symposium and Bootcamp on the Science of Security,,2014,,,?,https://dl.acm.org/citation.cfm?id=2600185,"Lucas Layman, Sylvain David Diffo, Nico Zazworka",Human factors in webserver log file analysis: a controlled experiment on investigating malicious activity,,,,,,,,,,,,,,,,,,, Data and Applications Security and Privacy,,2013,16/1,,?,http://link.springer.com/chapter/10.1007/978-3-642-39256-6_7,"Sean Thorpe, Indrajit Ray, Tyrone Grandison, Abbie Barbir, Robert France",Hypervisor Event Logs as a Source of Consistent Virtual Machine Evidence for Forensic Cloud Investigations,,,,,,,,,,,,,,,,,,, Workshop on Changing landscapes in HPC security,,2013,,,?,https://dl.acm.org/citation.cfm?id=2465812,"Orianna DeMasi, Taghrid Samak, David H. 
Bailey",Identifying HPC codes via performance logs and machine learning,,,,,,,,,,,,,,,,,,, Journal of Information Security and Applications,,2018,journal,,?,https://www.scopus.com/inward/record.url?eid=2-s2.0-85044103144&partnerID=40&md5=3c6d1babb3d09594d4fa8a7ee5f56941,"Parkinson S., Khan S.",Identifying irregularities in security event logs through an object-based Chi-squared test of independence,,,,,,,,,,,,,,,,,,, Communication and Networking,,2011,10/1,,?,http://link.springer.com/chapter/10.1007/978-3-642-27201-1_40,"Joon-Min Gil, Mihye Kim, Ui-Sung Song",Implementation of Log Analysis System for Desktop Grids and Its Application to Resource Group-Based Task Scheduling,,,,,,,,,,,,,,,,,,, International Parallel and Distributed Processing Symposium Workshop,,2018,WORKSHOP,,?,https://ieeexplore.ieee.org/document/8425454/,"Yining Zhao , Xiaodong Wang , Haili Xiao , Xuebin Chi",Improvement of the Log Pattern Extracting Algorithm Using Text Similarity,,,,,,,,,,,,,,,,,,, International Conference on Information Visualisation,IV,2014,research track paper,"Artificial Intelligence and Image Processing Design Practice and Management",B,https://ieeexplore.ieee.org/document/6902902/,"S. Devaux , F. Bouali , G. Venturini",DataTube4log: A Visual Tool for Mining Multi-threaded Software Logs,,,,,,,,,,,,,,,,,,, IEEE International Conference on Computer Communications,IEEE INFOCOM,2014,research track paper,Distributed Computing,A*,https://ieeexplore.ieee.org/document/6847986/,"Tatsuaki Kimura , Keisuke Ishibashi , Tatsuya Mori , Hiroshi Sawada , Tsuyoshi Toyono , Ken Nishimatsu , Akio Watanabe , Akihiro Shimoda , Kohei Shiomoto",Spatio-temporal factorization of log data for understanding network events,"Understanding the impacts and patterns of network events such as link flaps or hardware errors is crucial for diagnosing network anomalies. 
In large production networks, analyzing the log messages that record network events has become a challenging task due to the following two reasons. First, the log messages are composed of unstructured text messages generated by vendor-specific rules. Second, network equipment such as routers, switches, and RADIUS servers generate various log messages induced by network events that span across several geographical locations, network layers, protocols, and services. In this paper, we have tackled these obstacles by building two novel techniques: statistical template extraction (STE) and log tensor factorization (LTF). STE leverages a statistical clustering technique to automatically extract primary templates from unstructured log messages. LTF aims to build a statistical model that captures spatial-temporal patterns of log messages. Such spatial-temporal patterns provide useful insights into understanding the impacts and root cause of hidden network events. This paper first formulates our problem in a mathematical way. We then validate our techniques using massive amount of network log messages collected from a large operating network. We also demonstrate several case studies that validate the usefulness of our technique.","In this paper, we have tackled these obstacles by building two novel techniques: statistical template extraction (STE) and log tensor factorization (LTF).","Understanding the impacts and patterns of network events such as link flaps or hardware errors is crucial for diagnosing network anomalies. In large production networks, analyzing the log messages that record network events has become a challenging task due to the following two reasons. First, the log messages are composed of unstructured text messages generated by vendor-specific rules. 
Second, network equipment such as routers, switches, and RADIUS servers generate various log messages induced by network events that span across several geographical locations, network layers, protocols, and services.",STE leverages a statistical clustering technique to automatically extract primary templates from unstructured log messages. LTF aims to build a statistical model that captures spatial-temporal patterns of log messages. Such spatial-temporal patterns provide useful insights into understanding the impacts and root cause of hidden network events. This paper first formulates our problem in a mathematical way. We then validate our techniques using massive amount of network log messages collected from a large operating network. We also demonstrate several case studies that validate the usefulness of our technique.,, Kimura_2014,,,,,,,,1,,,,, Empirical Software Engineering,,2013,journal,Computer Software,A,https://www.scopus.com/inward/record.url?eid=2-s2.0-84922001381&partnerID=40&md5=700ec8f7a4205cecc5238a313e8b0e7c,,Studying the relationship between logging characteristics and the code quality of platform software,"Platform software plays an important role in speeding up the development of large scale applications. Such platforms provide functionalities and abstraction on which applications can be rapidly developed and easily deployed. Hadoop and JBoss are examples of popular open source platform software. Such platform software generate logs to assist operators in monitoring the applications that run on them. These logs capture the doubts, concerns, and needs of developers and operators of platform software. 
Our findings show that files with logging statements have higher post-release defect densities than those without logging statements in 7 out of 8 studied releases. Inspired by prior studies on code quality, we defined log-related product metrics, such as the number of log lines in a file, and log-related process metrics such as the number of changed log lines. We find that the correlations between our log-related metrics and post-release defects are as strong as their correlations with traditional process metrics, such as the number of pre-release defects, which is known to be one of the metrics with the strongest correlation with post-release defects. We also find that log-related metrics can complement traditional product and process metrics resulting in up to 40 % improvement in explanatory power of defect proneness. Our results show that logging characteristics provide strong indicators of defect-prone source code files. However, we note that removing logs is not the answer to better code quality. Instead, our results show that it might be the case that developers often relay their concerns about a piece of code through logs. Hence, code quality improvement efforts (e.g., testing and inspection) should focus more on the source code files with large amounts of logs or with large amounts of log churn.","In this paper, we sought to empirically study this [logs and code quality] relation through a case study on four releases of Hadoop and JBoss. ","Platform software plays an important role in speeding up the development of large scale applications. Such platforms provide functionalities and abstraction on which applications can be rapidly developed and easily deployed. Hadoop and JBoss are examples of popular open source platform software. Such platform software generate logs to assist operators in monitoring the applications that run on them. These logs capture the doubts, concerns, and needs of developers and operators of platform software. 
We believe that such logs can be used to better understand code quality. However, logging characteristics and their relation to quality has never been explored.",,"Our findings show that files with logging statements have higher post-release defect densities than those without logging statements in 7 out of 8 studied releases. Inspired by prior studies on code quality, we defined log-related product metrics, such as the number of log lines in a file, and log-related process metrics such as the number of changed log lines. We find that the correlations between our log-related metrics and post-release defects are as strong as their correlations with traditional process metrics, such as the number of pre-release defects, which is known to be one of the metrics with the strongest correlation with post-release defects. We also find that log-related metrics can complement traditional product and process metrics resulting in up to 40 % improvement in explanatory power of defect proneness. Our results show that logging characteristics provide strong indicators of defect-prone source code files. However, we note that removing logs is not the answer to better code quality. Instead, our results show that it might be the case that developers often relay their concerns about a piece of code through logs. Hence, code quality improvement efforts (e.g., testing and inspection) should focus more on the source code files with large amounts of logs or with large amounts of log churn.",Shang_2013,1,,,,,,,,,,,, International Workshop on Dynamic Analysis,,2007,,,?,https://dl.acm.org/citation.cfm?id=1270373,"Donald J. Yantzi, James H. 
Andrews",Industrial Evaluation of a Log File Analysis Methodology,,,,,,,,,,,,,,,,,,, "IEEE International Symposium on Cluster, Cloud and Grid Computing",CCGrid,2014,research track paper,Distributed Computing,A,https://ieeexplore.ieee.org/document/6846439/,"Eunjung Yoon , Anna Squicciarini",Toward Detecting Compromised MapReduce Workers through Log Analysis,"MapReduce is a framework for performing data intensive computations in parallel on commodity computers. When MapReduce is carried out in distributed settings, users maintain very little control over these computations, causing several security and privacy concerns. MapReduce activities may be subverted or compromised by malicious or cheating nodes. In this paper, we focus on the analysis and detection of attacks launched by malicious or misconfigured nodes, which may tamper with the ordinary functions of the MapReduce framework. Our goal is to investigate the extent to which integrity and correctness of computation in a MapReduce environment can be verified while introducing no modifications on the original MapReduce operations or introductions of extra operations, neither computational nor cryptographic. We identify a number of data and computation integrity checks against aggregated low-level system traces and Hadoop logs, correlated with one another to obtain insights on the operations being performed by nodes. This information is then matched against system and program invariants to effectively detect malicious activities, from lazy nodes to nodes changing input/output or completing different computations. ","In this paper, we focus on the analysis and detection of attacks launched by malicious or misconfigured nodes, which may tamper with the ordinary functions of the MapReduce framework.","MapReduce is a framework for performing data intensive computations in parallel on commodity computers. 
When MapReduce is carried out in distributed settings, users maintain very little control over these computations, causing several security and privacy concerns. MapReduce activities may be subverted or compromised by malicious or cheating nodes. ","Our goal is to investigate the extent to which integrity and correctness of computation in a MapReduce environment can be verified while introducing no modifications on the original MapReduce operations or introductions of extra operations, neither computational nor cryptographic. We identify a number of data and computation integrity checks against aggregated low-level system traces and Hadoop logs, correlated with one another to obtain insights on the operations being performed by nodes. This information is then matched against system and program invariants to effectively detect malicious activities, from lazy nodes to nodes changing input/output or completing different computations.",, Yoon_2014,,,,,,,1,,,,,, International Conference on Data and Software Engineering,,2014,,,?,https://ieeexplore.ieee.org/document/7062673/,"Jeremy Joseph Hanniel , Tricya E. Widagdo , Yudistira D. W. Asnar",Information system log visualization to monitor anomalous user activity based on time,,,,,,,,,,,,,,,,,,, Procedia Computer Science,,2015,,,?,https://www.sciencedirect.com/science/article/pii/S1877050915004184,"Amruta Ambre, Narendra Shekokar",Insider threat detection using log analysis and event correlation,,,,,,,,,,,,,,,,,,, IEEE International Conference on Computer and Information Technology,CIT,2014,research track paper,,C,https://ieeexplore.ieee.org/document/6984648/,"Janusz R. 
Getta , Marcin Zimniak , Wolfgang Benn",Mining Periodic Patterns from Nested Event Logs,,,,,,,,,,,,,,,,,,, Advances in Artificial Intelligence: From Theory to Practice,,2017,?,,?,http://link.springer.com/chapter/10.1007/978-3-319-60045-1_23,"Mohamed Cherif Dani, Henri Doreau, Samantha Alt",K-means Application for Anomaly Detection and Log Classification in HPC,,,,,,,,,,,,,,,,,,, Proceedings of the NetDB,,2011,,,?,http://pages.cs.wisc.edu/~akella/CS744/F17/838-CloudPapers/Kafka.pdf,"J Kreps, N Narkhede, J Rao",Kafka: A distributed messaging system for log processing,,,,,,,,,,,,,,,,,,, International Conference on Software Technologies,ICSoft,2014,research track paper,Computer Software,B,http://link.springer.com/chapter/10.1007/978-3-319-25579-8_8,"Fredrik Abbors, Dragos Truscan, Tanwir Ahmad",Mining Web Server Logs for Creating Workload Models,,,,,,,,,,,,,,,,,,, IEEE/IFIP International Conference on Dependable Systems,DSN,2018,research track paper,Computer Software,A,https://ieeexplore.ieee.org/document/8416513/,"Francisco Neves , Nuno Machado , Jose Pereira",Falcon: A Practical Log-Based Analysis Tool for Distributed Systems,"Programmers and support engineers typically rely on log data to narrow down the root cause of unexpected behaviors in dependable distributed systems. Unfortunately, the inherently distributed nature and complexity of such distributed executions often leads to multiple independent logs, scattered across different physical machines, with thousands or millions of entries poorly correlated in terms of event causality. This renders log-based debugging a tedious, time-consuming, and potentially inconclusive task. We present Falcon, a tool aimed at making log-based analysis of distributed systems practical and effective. Falcon's modular architecture, designed as an extensible pipeline, allows it to seamlessly combine several distinct logging sources and generate a coherent space-time diagram of distributed executions. 
To preserve event causality, even in the presence of logs collected from independent unsynchronized machines, Falcon introduces a novel happens-before symbolic formulation and relies on an off-the-shelf constraint solver to obtain a coherent event schedule. Our case study with the popular distributed coordination service Apache Zookeeper shows that Falcon eases the log-based analysis of complex distributed protocols and is helpful in bridging the gap between protocol design and implementation.","We present Falcon, a tool aimed at making log-based analysis of distributed systems practical and effective.","Programmers and support engineers typically rely on log data to narrow down the root cause of unexpected behaviors in dependable distributed systems. Unfortunately, the inherently distributed nature and complexity of such distributed executions often leads to multiple independent logs, scattered across different physical machines, with thousands or millions of entries poorly correlated in terms of event causality. This renders log-based debugging a tedious, time-consuming, and potentially inconclusive task.","Falcon's modular architecture, designed as an extensible pipeline, allows it to seamlessly combine several distinct logging sources and generate a coherent space-time diagram of distributed executions. 
To preserve event causality, even in the presence of logs collected from independent unsynchronized machines, Falcon introduces a novel happens-before symbolic formulation and relies on an off-the-shelf constraint solver to obtain a coherent event schedule.",Our case study with the popular distributed coordination service Apache Zookeeper shows that Falcon eases the log-based analysis of complex distributed protocols and is helpful in bridging the gap between protocol design and implementation., Neves_2018,,,,,,,,,,,,,1 "Computation World: Future Computing, Service Computation, Cognitive, Adaptive, Content, Patterns",,2009,8/2,,?,https://ieeexplore.ieee.org/document/5359624/,Nabil Hammoud,LEC: Log Event Correlation Architecture Based on Continuous Query,,,,,,,,,,,,,,,,,,, International Conference on Software Engineering,ICSE,2000,research track paper,Computer Software,A*,https://dl.acm.org/citation.cfm?id=337194,"James H. Andrews, Yingjun Zhang",Broad-spectrum studies of log file analysis,"This paper reports on research into applying the technique of log file analysis for checking test results to a broad range of testing and other tasks. The studies undertaken included applying log file analysis to both unit- and system-level testing and to requirements of both safety-critical and non-critical systems, and the use of log file analysis in combination with other testing methods. The paper also reports on the technique of using log file analyzers to simulate the software under test, both in order to validate the analyzers and to clarify requirements. 
It also discusses practical issues to do with the completeness of the approach, and includes comparisons to other recently-published approaches to log file analysis.",This paper reports on research into applying the technique of log file analysis for checking test results to a broad range of testing and other tasks.,,"The studies undertaken included applying log file analysis to both unit- and system-level testing and to requirements of both safety-critical and non-critical systems, and the use of log file analysis in combination with other testing methods. The paper also reports on the technique of using log file analyzers to simulate the software under test, both in order to validate the analyzers and to clarify requirements. It also discusses practical issues to do with the completeness of the approach, and includes comparisons to other recently-published approaches to log file analysis.",,Andrews_2000,,,,,,,,,,1,,, International Symposium on Formal Methods,FM,2014,research track paper,Computation Theory and Mathematics,A,https://link.springer.com/chapter/10.1007/978-3-319-06410-9_12,"Denis Butin, Daniel Le Métayer",Log analysis for data protection accountability,"Accountability is increasingly recognised as a cornerstone of data protection, notably in European regulation, but the term is frequently used in a vague sense. For accountability to bring tangible benefits, the expected properties of personal data handling logs (used as “accounts”) and the assumptions regarding the logging process must be defined with accuracy. In this paper, we provide a formal framework for accountability and show the correctness of the log analysis with respect to abstract traces used to specify privacy policies. 
We also show that compliance with respect to data protection policies can be checked based on logs free of personal data, and describe the integration of our formal framework in a global accountability process.","In this paper, we provide a formal framework for accountability (...)","Accountability is increasingly recognised as a cornerstone of data protection, notably in European regulation, but the term is frequently used in a vague sense. For accountability to bring tangible benefits, the expected properties of personal data handling logs (used as “accounts”) and the assumptions regarding the logging process must be defined with accuracy.","(...) show the correctness of the log analysis with respect to abstract traces used to specify privacy policies. We also show that compliance with respect to data protection policies can be checked based on logs free of personal data, and describe the integration of our formal framework in a global accountability process.",, Butin_2014,,,,,,,1,,,,,, Science China Information Sciences,,2012,,,?,https://www.scopus.com/inward/record.url?eid=2-s2.0-84871752053&partnerID=40&md5=9d921d9930bc6229563775f5d38ccbf1,"Mi, H.B., Wang, H.M., Zhou, Y.F., Lyu, M.R., Cai, H.",Localizing root causes of performance anomalies in cloud computing systems by analyzing request trace logs,,,,,,,,,,,,,,,,,,, Large Installation System Administration Conference,LISA,2010,,,?,https://www.usenix.org/legacy/events/lisa10/tech/full_papers/lisa10_proceedings.pdf#page=155,Paul Krizak,Log Analysis and Event Correlation Using Variable Temporal Event Correlator (VTEC).,,,,,,,,,,,,,,,,,,, "SIAM International Conference on Data Mining ",SDM,2014,research track paper,Data Format,A,https://www.scopus.com/inward/record.url?eid=2-s2.0-84959911106&partnerID=40&md5=37f2d607ae29ce605111ec6c60895378,"Gao, Y., Zhou, W., Zhang, Z., Han, J., Meng, D., Xu, Z.",Online anomaly detection by improved grammar compression of log sequences,"Nowadays, log sequences mining techniques 
are widely used in detecting anomalies for Internet services. The state-of-the-art anomaly detection methods either need significant computational costs, or require specific assumptions that the test logs are holding certain data distribution patterns in order to be effective. Therefore, it is very difficult to achieve real time responses and it greatly reduces the effectiveness of these mechanisms in reality. To address these issues, we propose an innovative anomaly detection strategy called CADM. In CADM, the relative entropy between test logs and normal logs is exploited to discover the anomalous levels. Instead of calculating the relative entropy based on certain predefined data distribution models, our solution inspects the relationship between relative entropy and compression size with an improved grammar-based compression method. No assumptions are needed. In addition, our mechanism has excellent scalability with only O(n) computational complexity. It can generate the detection results on the fly. Experimental analysis with both synthetic and real world logs proves that CADM is superior to the other methods. It can achieve very high anomaly detection accuracy with the minimal computational overhead. It is suitable for log mining tasks and can be applied on a broad variety of application fields.","To address these issues, we propose an innovative anomaly detection strategy called CADM.","Nowadays, log sequences mining techniques are widely used in detecting anomalies for Internet services. The state-of-the-art anomaly detection methods either need significant computational costs, or require specific assumptions that the test logs are holding certain data distribution patterns in order to be effective. Therefore, it is very difficult to achieve real time responses and it greatly reduces the effectiveness of these mechanisms in reality.","In CADM, the relative entropy between test logs and normal logs is exploited to discover the anomalous levels. 
Instead of calculating the relative entropy based on certain predefined data distribution models, our solution inspects the relationship between relative entropy and compression size with an improved grammar-based compression method. No assumptions are needed. In addition, our mechanism has excellent scalability with only O(n) computational complexity. It can generate the detection results on the fly.", Experimental analysis with both synthetic and real world logs proves that CADM is superior to the other methods. It can achieve very high anomaly detection accuracy with the minimal computational overhead. It is suitable for log mining tasks and can be applied on a broad variety of application fields.,Gao_2014,,,,,,1,,,,,,, Computer Networks,,2015,journal,Distributed Computing,A,https://www.sciencedirect.com/science/article/pii/S1389128615002650,"Antti Juvonen, Tuomo Sipola, Timo Hämäläinen",Online anomaly detection using dimensionality reduction techniques for HTTP log analysis,"Modern web services face an increasing number of new threats. Logs are collected from almost all web servers, and for this reason analyzing them is beneficial when trying to prevent intrusions. Intrusive behavior often differs from the normal web traffic. This paper proposes a framework to find abnormal behavior from these logs. We compare random projection, principal component analysis and diffusion map for anomaly detection. In addition, the framework has online capabilities. The first two methods have intuitive extensions while diffusion map uses the Nyström extension. This fast out-of-sample extension enables real-time analysis of web server traffic. The framework is demonstrated using real-world network log data. Actual abnormalities are found from the dataset and the capabilities of the system are evaluated and discussed. These results are useful when designing next generation intrusion detection systems. 
The presented approach finds intrusions from high-dimensional datasets in real time.",This paper proposes a framework to find abnormal behavior from these logs.,"Modern web services face an increasing number of new threats. Logs are collected from almost all web servers, and for this reason analyzing them is beneficial when trying to prevent intrusions. Intrusive behavior often differs from the normal web traffic.","We compare random projection, principal component analysis and diffusion map for anomaly detection. In addition, the framework has online capabilities. The first two methods have intuitive extensions while diffusion map uses the Nyström extension. This fast out-of-sample extension enables real-time analysis of web server traffic. The framework is demonstrated using real-world network log data. Actual abnormalities are found from the dataset and the capabilities of the system are evaluated and discussed.",These results are useful when designing next generation intrusion detection systems. The presented approach finds intrusions from high-dimensional datasets in real time., Juvonen_2015,,,,,,1,,,,,,, Empirical Software Engineering,,2015,journal,Computer Software,A,https://www.scopus.com/inward/record.url?eid=2-s2.0-84930485160&partnerID=40&md5=7174103f09c6fce1ef61ba0f17eef5b4,"Russo, B., Succi, G., Pedrycz, W.",Mining system logs to learn error predictors: a case study of a telemetry system,"Predicting system failures can be of great benefit to managers that get a better command over system performance. Data that systems generate in the form of logs is a valuable source of information to predict system reliability. As such, there is an increasing demand of tools to mine logs and provide accurate predictions. However, interpreting information in logs poses some challenges. This study discusses how to effectively mine sequences of logs and provide correct predictions. 
The approach integrates different machine learning techniques to control for data brittleness, provide accuracy of model selection and validation, and increase robustness of classification results. We apply the proposed approach to log sequences of 25 different applications of a software system for telemetry and performance of cars. On this system, we discuss the ability of three well-known support vector machines - multilayer perceptron, radial basis function and linear kernels - to fit and predict defective log sequences. Our results show that a good analysis strategy provides stable, accurate predictions. Such strategy must at least require high fitting ability of models used for prediction. We demonstrate that such models give excellent predictions both on individual applications - e.g., 1 % false positive rate, 94 % true positive rate, and 95 % precision - and across system applications - on average, 9 % false positive rate, 78 % true positive rate, and 95 % precision. We also show that these results are similarly achieved for different degree of sequence defectiveness. To show how good are our results, we compare them with recent studies in system log analysis. We finally provide some recommendations that we draw reflecting on our study.",This study discusses how to effectively mine sequences of logs and provide correct predictions.,"Predicting system failures can be of great benefit to managers that get a better command over system performance. Data that systems generate in the form of logs is a valuable source of information to predict system reliability. As such, there is an increasing demand of tools to mine logs and provide accurate predictions. However, interpreting information in logs poses some challenges."," The approach integrates different machine learning techniques to control for data brittleness, provide accuracy of model selection and validation, and increase robustness of classification results. 
We apply the proposed approach to log sequences of 25 different applications of a software system for telemetry and performance of cars. On this system, we discuss the ability of three well-known support vector machines - multilayer perceptron, radial basis function and linear kernels - to fit and predict defective log sequences. To show how good are our results, we compare them with recent studies in system log analysis. We finally provide some recommendations that we draw reflecting on our study.","Our results show that a good analysis strategy provides stable, accurate predictions. Such strategy must at least require high fitting ability of models used for prediction. We demonstrate that such models give excellent predictions both on individual applications - e.g., 1 % false positive rate, 94 % true positive rate, and 95 % precision - and across system applications - on average, 9 % false positive rate, 78 % true positive rate, and 95 % precision. We also show that these results are similarly achieved for different degree of sequence defectiveness.",Russo_2015,,,,,,,,,1,,,, International Workshop on Sustainable Ultrascale Computing Systems,,2015,,,?,https://e-archivo.uc3m.es/handle/10016/21995,"Mavridis Ilias, Karatza Eleni",Log file analysis in cloud with Apache Hadoop and Apache Spark,,,,,,,,,,,,,,,,,,, Advances in Digital Forensics IX,,2013,?,,?,http://link.springer.com/chapter/10.1007/978-3-642-41148-9_10,"Gregory Bosman, Stefan Gruner",Log File Analysis with Context-Free Grammars,,,,,,,,,,,,,,,,,,, Journal of Telecommunications and Information Technology,,2015,journal,,C,https://www.scopus.com/inward/record.url?eid=2-s2.0-84952882662&partnerID=40&md5=483c5ab9eeb4ef6ca185c6308b611958,"Malec, P., Piwowar, A., Kozakiewicz, A., Lasota, K.",Detecting security violations based on multilayered event log processing,,,,,,,,,,,,,,,,,,, Symposium and Bootcamp on the Science of Security,,2014,,,?,https://dl.acm.org/citation.cfm?id=2600183,"Jason King, Laurie 
Williams",Log your CRUD: design principles for software logging mechanisms,,,,,,,,,,,,,,,,,,, "IEEE International Conference on Trust, Security and Privacy in Computing and Communications",TrustCom,2015,research track paper,Computer Software,A,https://ieeexplore.ieee.org/document/7345288/,"Daniel Gonçalves , João Bota , Miguel Correia",Big Data Analytics for Detecting Host Misbehavior in Large Logs,"The management of complex network infrastructures continues to be a difficult endeavor today. These infrastructures can contain a huge number of devices that may misbehave in unpredictable ways. Many of these devices keep logs that contain valuable information about the infrastructures' security, reliability, and performance. However, extracting information from that data is far from trivial. The paper presents a novel approach to assess the security of such an infrastructure using its logs, inspired on data from a real telecommunications network. We use machine learning and data mining techniques to analyze the data and semi-automatically discover misbehaving hosts, without having to instruct the system about how hosts misbehave.","The paper presents a novel approach to assess the security of such an infrastructure using its logs, inspired on data from a real telecommunications network.","The management of complex network infrastructures continues to be a difficult endeavor today. These infrastructures can contain a huge number of devices that may misbehave in unpredictable ways. Many of these devices keep logs that contain valuable information about the infrastructures' security, reliability, and performance. 
However, extracting information from that data is far from trivial.","We use machine learning and data mining techniques to analyze the data and semi-automatically discover misbehaving hosts, without having to instruct the system about how hosts misbehave.",,Goncalves_2015,,,,,,,1,,,,,, Operating Systems Review,,2011,,,?,https://dl.acm.org/citation.cfm?id=1945034,"Shimin Chen, Phillip B. Gibbons, Michael Kozuch, Todd C. Mowry",Log-based architectures: using multicore to help software behave correctly,,,,,,,,,,,,,,,,,,, International Conference on Big Data Computing Service and Applications,,2018,6/2,,?,https://ieeexplore.ieee.org/document/8405724/,"Vaibhav Agrawal , Devanjal Kotia , Kamelia Moshirian , Mihui Kim",Log-Based Cloud Monitoring System for OpenStack,,,,,,,,,,,,,,,,,,, Innovative Technologies for Dependable OTS-Based Critical Systems,,2013,13/1,,?,http://link.springer.com/chapter/10.1007/978-88-470-2772-5_15,"Antonio Pecchia, Marcello Cinque",Log-Based Failure Analysis of Complex Systems: Methodology and Relevant Applications,,,,,,,,,,,,,,,,,,, "IEEE International Symposium on Cluster, Cloud and Grid Computing",CCGrid,2015,research track paper,Distributed Computing,A,https://ieeexplore.ieee.org/document/7152468/,"Hao Lin , Jingyu Zhou , Bin Yao , Minyi Guo , Jie Li",Cowic: A Column-Wise Independent Compression for Log Stream Analysis,"Nowadays massive log streams are generated from many Internet and cloud services. Storing log streams consumes a large amount of disk space and incurs high cost. Traditional compression methods can be applied to reduce storage cost, but are inefficient for log analysis, because fetching relevant log entries from compressed data often requires retrieval and decompression of large blocks of data. We propose a column-wise compression approach for well-formatted log streams, where each log entry can be independently compressed or decompressed for analysis. 
Specifically, we separate a log entry into several columns and compress each column with different models. We have implemented our approach as a library and integrated it into two applications, a log search system and a log joining system. Experimental results show that our compression scheme outperforms traditional compression methods for decompression times and has a competitive compression ratio. For log search, our approach achieves better query times than using traditional compression algorithms for both in-core and out-of-core cases. For joining log streams, our approach achieves the same join quality with only 30% memory of uncompressed streams.","We propose a column-wise compression approach for well-formatted log streams, where each log entry can be independently compressed or decompressed for analysis.","Nowadays massive log streams are generated from many Internet and cloud services. Storing log streams consumes a large amount of disk space and incurs high cost. Traditional compression methods can be applied to reduce storage cost, but are inefficient for log analysis, because fetching relevant log entries from compressed data often requires retrieval and decompression of large blocks of data.","Specifically, we separate a log entry into several columns and compress each column with different models. We have implemented our approach as a library and integrated it into two applications, a log search system and a log joining system.","Experimental results show that our compression scheme outperforms traditional compression methods for decompression times and has a competitive compression ratio. For log search, our approach achieves better query times than using traditional compression algorithms for both in-core and out-of-core cases. 
For joining log streams, our approach achieves the same join quality with only 30% memory of uncompressed streams.",Lin_2015,,,,,1,,,,,,,, SPEC International Conference on Performance Engineering,,2018,,,?,https://dl.acm.org/citation.cfm?id=3184416,"Kundi Yao, Guilherme B. de Pádua, Weiyi Shang, Steve Sporea, Andrei Toma, Sarah Sajedi",Log4Perf: Suggesting Logging Locations for Web-based Systems' Performance Monitoring,,,,,,,,,,,,,,,,,,, IEEE International Conference on Software Maintenance and Evolution,ICSME,2014,research track paper,Computer Software,A,https://ieeexplore.ieee.org/document/6976068/,"Weiyi Shang , Meiyappan Nagappan , Ahmed E. Hassan , Zhen Ming Jiang",Understanding Log Lines Using Development Knowledge,"Logs are generated by output statements that developers insert into the code. By recording the system behaviour during runtime, logs play an important role in the maintenance of large software systems. The rich nature of logs has introduced a new market of log management applications (e.g., Splunk, XpoLog and log stash) that assist in storing, querying and analyzing logs. Moreover, recent research has demonstrated the importance of logs in operating, understanding and improving software systems. Thus log maintenance is an important task for the developers. However, all too often practitioners (i.e., operators and administrators) are left without any support to help them unravel the meaning and impact of specific log lines. By spending over 100 human hours and manually examining all the email threads in the mailing list for three open source systems (Hadoop, Cassandra and Zookeeper) and performing web search on sampled logging statements, we found 15 email inquiries and 73 inquiries from web search about different log lines. We identified five types of development knowledge that are often sought from the logs by practitioners: meaning, cause, context, impact and solution. 
Due to the frequency and nature of log lines about which real customers inquire, documenting all the log lines or identifying which ones to document is not efficient. Hence in this paper we propose an on-demand approach, which associates the development knowledge present in various development repositories (e.g., code commits and issues reports) with the log lines. Our case studies show that the derived development knowledge can be used to resolve real-life inquiries about logs.","Hence in this paper we propose an on-demand approach, which associates the development knowledge present in various development repositories (e.g., code commits and issues reports) with the log lines.","Logs are generated by output statements that developers insert into the code. By recording the system behaviour during runtime, logs play an important role in the maintenance of large software systems. The rich nature of logs has introduced a new market of log management applications (e.g., Splunk, XpoLog and log stash) that assist in storing, querying and analyzing logs. Moreover, recent research has demonstrated the importance of logs in operating, understanding and improving software systems. Thus log maintenance is an important task for the developers. However, all too often practitioners (i.e., operators and administrators) are left without any support to help them unravel the meaning and impact of specific log lines.","By spending over 100 human hours and manually examining all the email threads in the mailing list for three open source systems (Hadoop, Cassandra and Zookeeper) and performing web search on sampled logging statements, we found 15 email inquiries and 73 inquiries from web search about different log lines. 
We identified five types of development knowledge that are often sought from the logs by practitioners: meaning, cause, context, impact and solution.",Our case studies show that the derived development knowledge can be used to resolve real-life inquiries about logs.,Shang_2014,1,,,,,,,,,,,, International Conference on Software Engineering,ICSE,2014,research track paper,Computer Software,A*,https://dl.acm.org/citation.cfm?id=2568246,"Ivan Beschastnikh, Yuriy Brun, Michael D. Ernst, Arvind Krishnamurthy",Inferring models of concurrent systems from logs of their behavior with CSight,"Concurrent systems are notoriously difficult to debug and understand. A common way of gaining insight into system behavior is to inspect execution logs and documentation. Unfortunately, manual inspection of logs is an arduous process, and documentation is often incomplete and out of sync with the implementation. To provide developers with more insight into concurrent systems, we developed CSight. CSight mines logs of a system's executions to infer a concise and accurate model of that system's behavior, in the form of a communicating finite state machine (CFSM). Engineers can use the inferred CFSM model to understand complex behavior, detect anomalies, debug, and increase confidence in the correctness of their implementations. CSight's only requirement is that the logged events have vector timestamps. We provide a tool that automatically adds vector timestamps to system logs. Our tool prototypes are available at http://synoptic.googlecode.com/. This paper presents algorithms for inferring CFSM models from traces of concurrent systems, proves them correct, provides an implementation, and evaluates the implementation in two ways: by running it on logs from three different networked systems and via a user study that focused on bug finding. Our evaluation finds that CSight infers accurate models that can help developers find bugs. 
","To provide developers with more insight into concurrent systems, we developed CSight.","Concurrent systems are notoriously difficult to debug and understand. A common way of gaining insight into system behavior is to inspect execution logs and documentation. Unfortunately, manual inspection of logs is an arduous process, and documentation is often incomplete and out of sync with the implementation.","CSight mines logs of a system's executions to infer a concise and accurate model of that system's behavior, in the form of a communicating finite state machine (CFSM). Engineers can use the inferred CFSM model to understand complex behavior, detect anomalies, debug, and increase confidence in the correctness of their implementations. CSight's only requirement is that the logged events have vector timestamps. We provide a tool that automatically adds vector timestamps to system logs. Our tool prototypes are available at http://synoptic.googlecode.com/. This paper presents algorithms for inferring CFSM models from traces of concurrent systems, proves them correct, provides an implementation, and evaluates the implementation in two ways: by running it on logs from three different networked systems and via a user study that focused on bug finding.",Our evaluation finds that CSight infers accurate models that can help developers find bugs., Beschastnikh_2014,,,,,,,,,,,1,, CEUR Workshop Proceedings,,2015,15/1,,?,https://www.scopus.com/inward/record.url?eid=2-s2.0-84962514621&partnerID=40&md5=957356863c396441f77e6823b37b485d,"Heikkinen, E., Hämäläinen, T.D.",LOGDIG log file analyzer for mining expected behavior from log files,,,,,,,,,,,,,,,,,,, Journal of Control Engineering and Applied Informatics,,2014,journal,,?,http://ceai.srait.ro/index.php?journal=ceai&page=article&op=view&path%5B%5D=2181&path%5B%5D=0,"Alecsandru Patrascu, Victor-Valeriu Patriciu",Logging system for cloud computing forensic environments,,,,,,,,,,,,,,,,,,, International Conference on Software 
Engineering,ICSE,2014,industry track paper,Computer Software,A*,https://dl.acm.org/citation.cfm?id=2591175,"Qiang Fu, Jieming Zhu, Wenlu Hu, Jian-Guang Lou, Rui Ding, Qingwei Lin, Dongmei Zhang, Tao Xie",Where do developers log? an empirical study on logging practices in industry,"System logs are widely used in various tasks of software system management. It is crucial to avoid logging too little or too much. To achieve so, developers need to make informed decisions on where to log and what to log in their logging practices during development. However, there exists no work on studying such logging practices in industry or helping developers make informed decisions. To fill this significant gap, in this paper, we systematically study the logging practices of developers in industry, with focus on where developers log. We obtain six valuable findings by conducting source code analysis on two large industrial systems (2.5M and 10.4M LOC, respectively) at Microsoft. We further validate these findings via a questionnaire survey with 54 experienced developers in Microsoft. In addition, our study demonstrates the high accuracy of up to 90% F-Score in predicting where to log. ","To fill this significant gap, in this paper, we systematically study the logging practices of developers in industry, with focus on where developers log.","System logs are widely used in various tasks of software system management. It is crucial to avoid logging too little or too much. To achieve so, developers need to make informed decisions on where to log and what to log in their logging practices during development. However, there exists no work on studying such logging practices in industry or helping developers make informed decisions.","We obtain six valuable findings by conducting source code analysis on two large industrial systems (2.5M and 10.4M LOC, respectively) at Microsoft. 
We further validate these findings via a questionnaire survey with 54 experienced developers in Microsoft.","In addition, our study demonstrates the high accuracy of up to 90% F-Score in predicting where to log. ",Fu_2014a,,,1,,,,,,,,,, International Symposium on Software Reliability Engineering,ISSRE,2015,industry track paper,Computer Software,A,https://www.scopus.com/inward/record.url?eid=2-s2.0-84964815259&partnerID=40&md5=ff61aeb461250ade4b672ec72445a5a3,"Farshchi, M., Schneider, J.-G., Weber, I., Grundy, J.",Experience report: Anomaly detection of cloud application operations using log and cloud metric correlation analysis,"Failure of application operations is one of the main causes of system-wide outages in cloud environments. This particularly applies to DevOps operations, such as backup, redeployment, upgrade, customized scaling, and migration that are exposed to frequent interference from other concurrent operations, configuration changes, and resources failure. However, current practices fail to provide a reliable assurance of correct execution of these kinds of operations. In this paper, we present an approach to address this problem that adopts a regression-based analysis technique to find the correlation between an operation's activity logs and the operation activity's effect on cloud resources. The correlation model is then used to derive assertion specifications, which can be used for runtime verification of running operations and their impact on resources. We evaluated our proposed approach on Amazon EC2 with 22 rounds of rolling upgrade operations while other types of operations were running and random faults were injected. 
Our experiment shows that our approach successfully managed to raise alarms for 115 random injected faults, with a precision of 92.3%.","In this paper, we present an approach to address this problem that adopts a regression-based analysis technique to find the correlation between an operation's activity logs and the operation activity's effect on cloud resources. ","Failure of application operations is one of the main causes of system-wide outages in cloud environments. This particularly applies to DevOps operations, such as backup, redeployment, upgrade, customized scaling, and migration that are exposed to frequent interference from other concurrent operations, configuration changes, and resources failure. However, current practices fail to provide a reliable assurance of correct execution of these kinds of operations.","The correlation model is then used to derive assertion specifications, which can be used for runtime verification of running operations and their impact on resources. 
We evaluated our proposed approach on Amazon EC2 with 22 rounds of rolling upgrade operations while other types of operations were running and random faults were injected.","Our experiment shows that our approach successfully managed to raise alarms for 115 random injected faults, with a precision of 92.3%.",Farshchi_2015,,,,,,1,,,,,,, IEEE International Symposium on Parallel and Distributed Processing with Applications,ISPA,2015,research track paper, Distributed Computing,B,https://ieeexplore.ieee.org/document/7345629/,"Nentawe Gurumdimma , Arshad Jhumka , Maria Liakata , Edward Chuah , James Browne",Towards Increasing the Error Handling Time Window in Large-Scale Distributed Systems Using Console and Resource Usage Logs,,,,,,,,,,,,,,,,,,, "ACM International Conference on Information and Knowledge Management ",CIKM,2016,research track paper,"Information Systems Artificial Intelligence and Image Processing Library and Information Studies",A,https://dl.acm.org/citation.cfm?id=2983358,"Hossein Hamooni, Biplob Debnath, Jianwu Xu, Hui Zhang, Guofei Jiang, Abdullah Mueen",LogMine: Fast Pattern Recognition for Log Analytics,"Modern engineering incorporates smart technologies in all aspects of our lives. Smart technologies are generating terabytes of log messages every day to report their status. It is crucial to analyze these log messages and present usable information (e.g. patterns) to administrators, so that they can manage and monitor these technologies. Patterns minimally represent large groups of log messages and enable the administrators to do further analysis, such as anomaly detection and event prediction. Although patterns exist commonly in automated log messages, recognizing them in massive set of log messages from heterogeneous sources without any prior information is a significant undertaking. We propose a method, named LogMine, that extracts high quality patterns for a given set of log messages. Our method is fast, memory efficient, accurate, and scalable. 
LogMine is implemented in map-reduce framework for distributed platforms to process millions of log messages in seconds. LogMine is a robust method that works for heterogeneous log messages generated in a wide variety of systems. Our method exploits algorithmic techniques to minimize the computational overhead based on the fact that log messages are always automatically generated. We evaluate the performance of LogMine on massive sets of log messages generated in industrial applications. LogMine has successfully generated patterns which are as good as the patterns generated by exact and unscalable method, while achieving a 500× speedup. Finally, we describe three applications of the patterns generated by LogMine in monitoring large scale industrial systems.","We propose a method, named LogMine, that extracts high quality patterns for a given set of log messages.","Although patterns exist commonly in automated log messages, recognizing them in massive set of log messages from heterogeneous sources without any prior information is a significant undertaking.",LogMine is implemented in map-reduce framework for distributed platforms to process millions of log messages in seconds. LogMine is a robust method that works for heterogeneous log messages generated in a wide variety of systems. Our method exploits algorithmic techniques to minimize the computational overhead based on the fact that log messages are always automatically generated. We evaluate the performance of LogMine on massive sets of log messages generated in industrial applications.,"LogMine has successfully generated patterns which are as good as the patterns generated by exact and unscalable method, while achieving a 500× speedup. 
Finally, we describe three applications of the patterns generated by LogMine in monitoring large scale industrial systems.", Hamooni_2016,,,,1,,,,,,,,, International Conference on Big Data,,2016,10/2,,?,https://ieeexplore.ieee.org/document/7840748/,"Ruoyu Wang , Daniel Sun , Guoqiang Li , Muhammad Atif , Surya Nepal",LogProv: Logging events as provenance of big data analytics pipelines with trustworthiness,,,,,,,,,,,,,,,,,,, ACM International Conference on Knowledge Discovery and Data Mining,KDD,2016,research track paper,Data Format,A*,https://dl.acm.org/citation.cfm?id=2939712,"Animesh Nandi, Atri Mandal, Shubham Atreja, Gargi B. Dasgupta, Subhrajit Bhattacharya",Anomaly Detection Using Program Control Flow Graph Mining From Execution Logs,"We focus on the problem of detecting anomalous run-time behavior of distributed applications from their execution logs. Specifically we mine templates and template sequences from logs to form a control flow graph (cfg) spanning distributed components. This cfg represents the baseline healthy system state and is used to flag deviations from the expected behavior of runtime logs. The novelty in our work stems from the new techniques employed to: (1) overcome the instrumentation requirements or application specific assumptions made in prior log mining approaches, (2) improve the accuracy of mined templates and the cfg in the presence of long parameters and high amount of interleaving respectively, and (3) improve by orders of magnitude the scalability of the cfg mining process in terms of volume of log data that can be processed per day. We evaluate our approach using (a) synthetic log traces and (b) multiple real-world log datasets collected at different layers of application stack. Results demonstrate that our template mining, cfg mining, and anomaly detection algorithms have high accuracy. 
The distributed implementation of our pipeline is highly scalable and has more than 500 GB/day of log data processing capability even on a 10 low-end VM based (Spark + Hadoop) cluster. We also demonstrate the efficacy of our end-to-end system using a case study with the Openstack VM provisioning system.",We focus on the problem of detecting anomalous run-time behavior of distributed applications from their execution logs.,,"Specifically we mine templates and template sequences from logs to form a control flow graph (cfg) spanning distributed components. This cfg represents the baseline healthy system state and is used to flag deviations from the expected behavior of runtime logs. The novelty in our work stems from the new techniques employed to: (1) overcome the instrumentation requirements or application specific assumptions made in prior log mining approaches, (2) improve the accuracy of mined templates and the cfg in the presence of long parameters and high amount of interleaving respectively, and (3) improve by orders of magnitude the scalability of the cfg mining process in terms of volume of log data that can be processed per day. We evaluate our approach using (a) synthetic log traces and (b) multiple real-world log datasets collected at different layers of application stack. (...) We also demonstrate the efficacy of our end-to-end system using a case study with the Openstack VM provisioning system.","Results demonstrate that our template mining, cfg mining, and anomaly detection algorithms have high accuracy. The distributed implementation of our pipeline is highly scalable and has more than 500 GB/day of log data processing capability even on a 10 low-end VM based (Spark + Hadoop) cluster. 
", Nandi_2016,,,,,,1,,,,,,, International Journal of Simulation and Process Modelling,,2016,journal,,C,https://www.scopus.com/inward/record.url?eid=2-s2.0-84989955714&partnerID=40&md5=853056c9ef5c43060ea0c33c5fd3eaed,"Chen, J., Zhu, L., Guo, Y., Huang, R., Cai, S., Zhao, X.",A mining approach for component abnormal information based on monitor log,,,,,,,,,,,,,,,,,,, ACM Transactions on Storage,,2016,journal,Data Format,B,https://dl.acm.org/citation.cfm?id=2846101,"Jayanta Basak, P. C. Nagesh",A User-Friendly Log Viewer for Storage Systems,,,,,,,,,,,,,,,,,,, Computing,,2016,journal,"Information and Computing Sciences Mathematical Sciences",A,https://www.scopus.com/inward/record.url?eid=2-s2.0-84952056410&partnerID=40&md5=67cd31322a861f6b2ed08b95ff021352,"Balliu, A., Olivetti, D., Babaoglu, O., Marzolla, M., Sîrbu, A.",A Big Data analyzer for large trace logs,"Current generation of Internet-based services are typically hosted on large data centers that take the form of warehouse-size structures housing tens of thousands of servers. Continued availability of a modern data center is the result of a complex orchestration among many internal and external actors including computing hardware, multiple layers of intricate software, networking and storage devices, electrical power and cooling plants. During the course of their operation, many of these components produce large amounts of data in the form of event and error logs that are essential not only for identifying and resolving problems but also for improving data center efficiency and management. Most of these activities would benefit significantly from data analytics techniques to exploit hidden statistical patterns and correlations that may be present in the data. The sheer volume of data to be analyzed makes uncovering these correlations and patterns a challenging task. 
This paper presents Big Data analyzer (BiDAl), a prototype Java tool for log-data analysis that incorporates several Big Data technologies in order to simplify the task of extracting information from data traces produced by large clusters and server farms. BiDAl provides the user with several analysis languages (SQL, R and Hadoop MapReduce) and storage backends (HDFS and SQLite) that can be freely mixed and matched so that a custom tool for a specific task can be easily constructed. BiDAl has a modular architecture so that it can be extended with other backends and analysis languages in the future. In this paper we present the design of BiDAl and describe our experience using it to analyze publicly-available traces from Google data clusters, with the goal of building a realistic model of a complex data center.","This paper presents Big Data analyzer (BiDAl), a prototype Java tool for log-data analysis that incorporates several Big Data technologies in order to simplify the task of extracting information from data traces produced by large clusters and server farms.","During the course of [large data center] operation, many of these components produce large amounts of data in the form of event and error logs that are essential not only for identifying and resolving problems but also for improving data center efficiency and management. Most of these activities would benefit significantly from data analytics techniques to exploit hidden statistical patterns and correlations that may be present in the data. 
The sheer volume of data to be analyzed makes uncovering these correlations and patterns a challenging task.",,,Balliu_2015,,,,,,,,,,,,,1 Workshop on Artificial Intelligence and Security,,2015,,,?,https://dl.acm.org/citation.cfm?id=2808773,"Konstantin Berlin, David Slater, Joshua Saxe",Malicious Behavior Detection using Windows Audit Logs,,,,,,,,,,,,,,,,,,, "International Conference on Software Analysis, Evolution, and Reengineering",SANER,2016,research track paper,Computer Software,C,https://ieeexplore.ieee.org/document/7476654/,"Suhas Kabinna , Weiyi Shang , Cor-Paul Bezemer , Ahmed E. Hassan",Examining the Stability of Logging Statements,"Logging statements produce logs that assist in understanding system behavior, monitoring choke-points and debugging. Prior research demonstrated the importance of logging statements in operating, understanding and improving software systems. The importance of logs has lead to a new market of log management and processing tools. However, logs are often unstable, i.e., the logging statements that generate logs are often changed without the consideration of other stakeholders, causing misleading results and failures of log processing tools. In order to proactively mitigate such issues that are caused by unstable logging statements, in this paper we empirically study the stability of logging statements in four open source applications namely:Liferay, ActiveMQ, Camel and Cloud Stack. We find that 20-45% of the logging statements in our studied applications change throughout their lifetime. The median number of days between the introduction of a logging statement and the first change to that statement is between 1 and 17 in our studied applications. These numbers show that in order to reduce maintenance effort, developers of log processing tools must be careful when selecting the logging statements on which they will let their tools depend. 
In this paper, we make an important first step towards assisting developers of log processing tools in determining whether a logging statement is likely to remain unchanged in the future. Using random forest classifiers, we examine which metrics are important for understanding whether a logging statement will change. We show that our classifiers achieve 83%-91% precision and 65%-85% recall in the four studied applications. We find that file ownership, developer experience, log density and SLOC are important metrics for determining whether a logging statement will change in the future. Developers can use this knowledge to build more robust log processing tools, by making those tools depend on logs that are generated by logging statements that are likely to remain unchanged.",,,,,,,,,,,,,,,,,, International Workshop on HPC User Support Tools,,2017,,,?,https://dl.acm.org/citation.cfm?id=3152559,"Abida Haque, Alexandra DeLucia, Elisabeth Baseman",Markov Chain Modeling for Anomaly Detection in High Performance Computing System Logs,,,,,,,,,,,,,,,,,,, Advances in Transdisciplinary Engineering,,2015,10/1,,?,https://www.scopus.com/inward/record.url?eid=2-s2.0-84959289596&partnerID=40&md5=42a1c5cea5cb5fea3f2ae272f2657633,"Shen, G., Luo, F., Hong, G.",Measuring and evaluating source code logs using static code analyzer,,,,,,,,,,,,,,,,,,, International Conference on Software Engineering,ICSE,2015,industry track paper,Computer Software,A*,https://dl.acm.org/citation.cfm?id=2819035,"Antonio Pecchia, Marcello Cinque, Gabriella Carrozza, Domenico Cotroneo",Industry practices and event logging: assessment of a critical software development process,"Practitioners widely recognize the importance of event logging for a variety of tasks, such as accounting, system measurements and troubleshooting. Nevertheless, in spite of the importance of the tasks based on the logs collected under real workload conditions, event logging lacks systematic design and implementation practices. 
The implementation of the logging mechanism strongly relies on the human expertise. This paper proposes a measurement study of event logging practices in a critical industrial domain. We assess a software development process at Selex ES, a leading Finmeccanica company in electronic and information solutions for critical systems. Our study combines source code analysis, inspection of around 2.3 millions log entries, and direct feedback from the development team to gain process-wide insights ranging from programming practices, logging objectives and issues impacting log analysis. The findings of our study were extremely valuable to prioritize event logging reengineering tasks at Selex ES. ",This paper proposes a measurement study of event logging practices in a critical industrial domain.,"Practitioners widely recognize the importance of event logging for a variety of tasks, such as accounting, system measurements and troubleshooting. Nevertheless, in spite of the importance of the tasks based on the logs collected under real workload conditions, event logging lacks systematic design and implementation practices. The implementation of the logging mechanism strongly relies on the human expertise.","We assess a software development process at Selex ES, a leading Finmeccanica company in electronic and information solutions for critical systems. Our study combines source code analysis, inspection of around 2.3 millions log entries, and direct feedback from the development team to gain process-wide insights ranging from programming practices, logging objectives and issues impacting log analysis.","The findings of our study were extremely valuable to prioritize event logging reengineering tasks at Selex ES. 
",Pecchia_2015,1,,,,,,,,,,,, Large Installation System Administration Conference,LISA,2002,,,?,https://www.usenix.org/event/lisa02/tech/full_papers/takada/takada_html/,"Tetsuji Takada, Hideki Koike",MieLog: A Highly Interactive Visual Log Browser Using Information Visualization and Statistical Analysis.,,,,,,,,,,,,,,,,,,, IEEE European Symposium on Security and Privacy,EuroS&P,2018,16/2,"Computer Software Computation Theory and Mathematics",New,https://ieeexplore.ieee.org/document/8406589/,"Carlos Cotrini , Thilo Weghorn , David Basin",Mining ABAC Rules from Sparse Logs,,,,,,,,,,,,,,,,,,, IEEE International Working Conference on Mining Software Repositories,MSR,2016,research track paper,"Data Format Computer Software",A,https://dl.acm.org/citation.cfm?id=2901769,"Suhas Kabinna, Cor-Paul Bezemer, Weiyi Shang, Ahmed E. Hassan",Logging library migrations: a case study for the apache software foundation projects,"Developers leverage logs for debugging, performance monitoring and load testing. The increased dependence on logs has lead to the development of numerous logging libraries which help developers in logging their code. As new libraries emerge and current ones evolve, projects often migrate from an older library to another one. In this paper we study logging library migrations within Apache Software Foundation (ASF) projects. From our manual analysis of JIRA issues, we find that 33 out of 223 (i.e., 14%) ASF projects have undergone at least one logging library migration. We find that the five main drivers for logging library migration are: 1) to increase flexibility (i.e., the ability to use different logging libraries within a project) 2) to improve performance, 3) to reduce effort spent on code maintenance, 4) to reduce dependence on other libraries and 5) to obtain specific features from the new logging library. We find that over 70% of the migrated projects encounter on average two post-migration bugs due to the new logging library. 
Furthermore, our findings suggest that performance (traditionally one of the primary drivers for migrations) is rarely improved after a migration.",In this paper we study logging library migrations within Apache Software Foundation (ASF) projects.,"Developers leverage logs for debugging, performance monitoring and load testing. The increased dependence on logs has lead to the development of numerous logging libraries which help developers in logging their code. As new libraries emerge and current ones evolve, projects often migrate from an older library to another one.",,"From our manual analysis of JIRA issues, we find that 33 out of 223 (i.e., 14%) ASF projects have undergone at least one logging library migration. We find that the five main drivers for logging library migration are: 1) to increase flexibility (i.e., the ability to use different logging libraries within a project) 2) to improve performance, 3) to reduce effort spent on code maintenance, 4) to reduce dependence on other libraries and 5) to obtain specific features from the new logging library. We find that over 70% of the migrated projects encounter on average two post-migration bugs due to the new logging library. 
Furthermore, our findings suggest that performance (traditionally one of the primary drivers for migrations) is rarely improved after a migration.",Kabinna_2016,1,,,,,,,,,,,, Transactions on Network and Service Management,,2018,journal,,?,https://ieeexplore.ieee.org/document/8122062/,"Satoru Kobayashi , Kazuki Otomo , Kensuke Fukuda , Hiroshi Esaki",Mining Causality of Network Events in Log Data,,,,,,,,,,,,,,,,,,, Operating Systems Review,,2010,,,?,https://dl.acm.org/citation.cfm?id=1740411,"Jian-Guang Lou, Qiang Fu, Yi Wang, Jiang Li",Mining dependency in distributed systems through unstructured logs analysis,,,,,,,,,,,,,,,,,,, Journal of Information Processing,,2016,journal,Information and Computing Sciences,C,https://www.scopus.com/inward/record.url?eid=2-s2.0-84978371359&partnerID=40&md5=95dd3e25fca1003eb9daceeb17c10bc8,"Ono, Y., Sakurai, K., Yamane, S.",LogChamber: Inferring source code locations corresponding to mobile applications run-time logs,,,,,,,,,,,,,,,,,,, International Conference on Architectural Support for Programming Languages and Operating Systems,ASPLOS,2016,research track paper,Computer Software,A*,https://dl.acm.org/citation.cfm?id=2872407,"Xiao Yu, Pallavi Joshi, Jianwu Xu, Guoliang Jin, Hui Zhang, Guofei Jiang",CloudSeer: Workflow Monitoring of Cloud Infrastructures via Interleaved Logs,"Cloud infrastructures provide a rich set of management tasks that operate computing, storage, and networking resources in the cloud. Monitoring the executions of these tasks is crucial for cloud providers to promptly find and understand problems that compromise cloud availability. However, such monitoring is challenging because there are multiple distributed service components involved in the executions. CloudSeer enables effective workflow monitoring. It takes a lightweight non-intrusive approach that purely works on interleaved logs widely existing in cloud infrastructures. 
CloudSeer first builds an automaton for the workflow of each management task based on normal executions, and then it checks log messages against a set of automata for workflow divergences in a streaming manner. Divergences found during the checking process indicate potential execution problems, which may or may not be accompanied by error log messages. For each potential problem, CloudSeer outputs necessary context information including the affected task automaton and related log messages hinting where the problem occurs to help further diagnosis. Our experiments on OpenStack, a popular open-source cloud infrastructure, show that CloudSeer's efficiency and problem-detection capability are suitable for online monitoring.",CloudSeer enables effective workflow monitoring.,"Cloud infrastructures provide a rich set of management tasks that operate computing, storage, and networking resources in the cloud. Monitoring the executions of these tasks is crucial for cloud providers to promptly find and understand problems that compromise cloud availability. However, such monitoring is challenging because there are multiple distributed service components involved in the executions.","It takes a lightweight non-intrusive approach that purely works on interleaved logs widely existing in cloud infrastructures. CloudSeer first builds an automaton for the workflow of each management task based on normal executions, and then it checks log messages against a set of automata for workflow divergences in a streaming manner. Divergences found during the checking process indicate potential execution problems, which may or may not be accompanied by error log messages. 
For each potential problem, CloudSeer outputs necessary context information including the affected task automaton and related log messages hinting where the problem occurs to help further diagnosis.","Our experiments on OpenStack, a popular open-source cloud infrastructure, show that CloudSeer's efficiency and problem-detection capability are suitable for online monitoring.",Yu_2016,,,,,,,,,,,,,1 European Dependable Computing Conference,EDCC,2014,8/2,"Other Information and Computing Sciences Information Systems",Unranked,https://ieeexplore.ieee.org/document/6821088/,"Santonu Sarkar , Rajeshwari Ganesan , Marcello Cinque , Flavio Frattini , Stefano Russo , Agostino Savignano",Mining Invariants from SaaS Application Logs (Practical Experience Report),,,,,,,,,,,,,,,,,,, Managing Large-scale Systems via the Analysis of System Logs and the Application of Machine Learning Techniques,SLAM,2011,workshop paper,,?,https://dl.acm.org/citation.cfm?id=2038638,"Stefan Weigert, Matti Hiltunen, Christof Fetzer",Mining large distributed log data in near real time,,,,,,,,,,,,,,,,,,, "SIGKDD Explorations Newsletter - Natural language processing and text mining: Volume 7 Issue 1, June",,2005,,,?,https://dl.acm.org/citation.cfm?id=1089822,"Wei Peng, Tao Li, Sheng Ma",Mining logs files for data-driven system management,,,,,,,,,,,,,,,,,,, "International Conference on Web-Age Information Management ",WAIM,2016,research track paper,Information Systems,C,http://link.springer.com/chapter/10.1007/978-3-319-39958-4_13,"Jinwei Guo, Chendong Zhang, Peng Cai, Minqi Zhou, Aoying Zhou",Low Overhead Log Replication for Main Memory Database System,,,,,,,,,,,,,,,,,,, IEEE International Symposium on Parallel and Distributed Processing with Applications,ISPA,2016,research track paper, Distributed Computing,B,https://www.scopus.com/inward/record.url?eid=2-s2.0-85015198781&partnerID=40&md5=b2f7f0fdb530c9d3a0df6056943bacb9,"Feng, B., Wu, C., Li, J.",MLC: An efficient multi-level log compression method for cloud backup systems,,,,,,,,,,,,,,,,,,, International Symposium on Integrated Network Management,,2011,,,?,https://ieeexplore.ieee.org/document/5990536/,"Thomas Reidemeister , Miao Jiang , Paul A.S. Ward",Mining unstructured log files for recurrent fault diagnosis,,,,,,,,,,,,,,,,,,, IEEE International Symposium on Parallel and Distributed Processing with Applications,ISPA,2016,research track paper, Distributed Computing,B,https://www.scopus.com/inward/record.url?eid=2-s2.0-85015203510&partnerID=40&md5=69ac802a9b0c75ba99c003daa0f83abc,"Xu, Y., Hou, Z.",NVM-assisted non-redundant logging for android systems,,,,,,,,,,,,,,,,,,, International Conference on Software Engineering,ICSE,2015,research track paper,Computer Software,A*,https://dl.acm.org/citation.cfm?id=2818807,"Jieming Zhu, Pinjia He, Qiang Fu, Hongyu Zhang, Michael R. Lyu, Dongmei Zhang",Learning to log: helping developers make informed logging decisions,"Logging is a common programming practice of practical importance to collect system runtime information for postmortem analysis. Strategic logging placement is desired to cover necessary runtime information without incurring unintended consequences (e.g., performance overhead, trivial logs). However, in current practice, there is a lack of rigorous specifications for developers to govern their logging behaviours. Logging has become an important yet tough decision which mostly depends on the domain knowledge of developers. To reduce the effort on making logging decisions, in this paper, we propose a ""learning to log"" framework, which aims to provide informative guidance on logging during development. As a proof of concept, we provide the design and implementation of a logging suggestion tool, LogAdvisor, which automatically learns the common logging practices on where to log from existing logging instances and further leverages them for actionable suggestions to developers. 
Specifically, we identify the important factors for determining where to log and extract them as structural features, textual features, and syntactic features. Then, by applying machine learning techniques (e.g., feature selection and classifier learning) and noise handling techniques, we achieve high accuracy of logging suggestions. We evaluate LogAdvisor on two industrial software systems from Microsoft and two open-source software systems from GitHub (totally 19.1M LOC and 100.6K logging statements). The encouraging experimental results, as well as a user study, demonstrate the feasibility and effectiveness of our logging suggestion tool. We believe our work can serve as an important first step towards the goal of ""learning to log"".","To reduce the effort on making logging decisions, in this paper, we propose a ""learning to log"" framework, which aims to provide informative guidance on logging during development.","Logging is a common programming practice of practical importance to collect system runtime information for postmortem analysis. Strategic logging placement is desired to cover necessary runtime information without incurring unintended consequences (e.g., performance overhead, trivial logs). However, in current practice, there is a lack of rigorous specifications for developers to govern their logging behaviours. Logging has become an important yet tough decision which mostly depends on the domain knowledge of developers.","As a proof of concept, we provide the design and implementation of a logging suggestion tool, LogAdvisor, which automatically learns the common logging practices on where to log from existing logging instances and further leverages them for actionable suggestions to developers. Specifically, we identify the important factors for determining where to log and extract them as structural features, textual features, and syntactic features. 
Then, by applying machine learning techniques (e.g., feature selection and classifier learning) and noise handling techniques, we achieve high accuracy of logging suggestions. We evaluate LogAdvisor on two industrial software systems from Microsoft and two open-source software systems from GitHub (totally 19.1M LOC and 100.6K logging statements).","The encouraging experimental results, as well as a user study, demonstrate the feasibility and effectiveness of our logging suggestion tool. We believe our work can serve as an important first step towards the goal of ""learning to log"".",Zhu_2015,,,1,,,,,,,,,, IEEE Software,,2016,journal,Computer Software,B,https://ieeexplore.ieee.org/document/7412635/,"Andriy Miranskyy , Abdelwahab Hamou-Lhadj , Enzo Cialini , Alf Larsson",Operational-Log Analysis for Big Data Systems: Challenges and Solutions,,,,,,,,,,,,,,,,,,, "International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems",MASCOTS,2016,research track paper,Computer Hardware,A,https://ieeexplore.ieee.org/document/7774577/,"Mahmoud Awad , Daniel A. Menascé",Performance Model Derivation of Operational Systems through Log Analysis,"Manually developing analytic performance models of operational systems can be challenging, time consuming, and costly. This paper describes a method that uses system logs and configuration files to automatically derive analytic performance models of operational systems. The method described here automatically determines: (1) the system software servers, (2) the system devices, (3) the deployment of software servers to devices, (4) the communication patterns between software servers for each external use-case, and (5) the probability at which interactions between servers occur. The method was implemented and validated on a multi-tier system. 
The results showed that the method is capable of deriving the workload model and system model by parsing the system configuration files and log files and inferring user-system interaction patterns and client-server interaction diagrams.",This paper describes a method that uses system logs and configuration files to automatically derive analytic performance models of operational systems.,"Manually developing analytic performance models of operational systems can be challenging, time consuming, and costly.",,The results showed that the method is capable of deriving the workload model and system model by parsing the system configuration files and log files and inferring user-system interaction patterns and client-server interaction diagrams., Awad_2016,,,,,,,,,,,1,, International Symposium on Reliable Distributed Systems,SRDS,2016,research track paper,Distributed Computing,A,https://ieeexplore.ieee.org/document/7794329/,"Nentawe Gurumdimma , Arshad Jhumka , Maria Liakata , Edward Chuah , James Browne",CRUDE: Combining Resource Usage Data and Error Logs for Accurate Error Detection in Large-Scale Distributed Systems,"The use of console logs for error detection in large scale distributed systems has proven to be useful to system administrators. However, such logs are typically redundant and incomplete, making accurate detection very difficult. In an attempt to increase this accuracy, we complement these incomplete console logs with resource usage data, which captures the resource utilisation of every job in the system. We then develop a novel error detection methodology, the CRUDE approach, that makes use of both the resource usage data and console logs. We thus make the following specific technical contributions: we develop (i) a clustering algorithm to group nodes with similar behaviour, (ii) an anomaly detection algorithm to identify jobs with anomalous resource usage, (iii) an algorithm that links jobs with anomalous resource usage with erroneous nodes. 
We then evaluate our approach using console logs and resource usage data from the Ranger Supercomputer. Our results are positive: (i) our approach detects errors with a true positive rate of about 80%, and (ii) when compared with the well-known Nodeinfo error detection algorithm, our algorithm provides an average improvement of around 85% over Nodeinfo, with a best-case improvement of 250%.","We then develop a novel error detection methodology, the CRUDE approach, that makes use of both the resource usage data and console logs.","The use of console logs for error detection in large scale distributed systems has proven to be useful to system administrators. However, such logs are typically redundant and incomplete, making accurate detection very difficult.","In an attempt to increase this accuracy, we complement these incomplete console logs with resource usage data, which captures the resource utilisation of every job in the system. We thus make the following specific technical contributions: we develop (i) a clustering algorithm to group nodes with similar behaviour, (ii) an anomaly detection algorithm to identify jobs with anomalous resource usage, (iii) an algorithm that links jobs with anomalous resource usage with erroneous nodes. We then evaluate our approach using console logs and resource usage data from the Ranger Supercomputer. "," Our results are positive: (i) our approach detects errors with a true positive rate of about 80%, and (ii) when compared with the well-known Nodeinfo error detection algorithm, our algorithm provides an average improvement of around 85% over Nodeinfo, with a best-case improvement of 250%.", Gurumdimma_2016,,,,,,,,1,,,,, International Symposium on Software Reliability Engineering,ISSRE,2016,industry track paper,Computer Software,A,https://ieeexplore.ieee.org/document/7774521/,"Shilin He , Jieming Zhu , Pinjia He , Michael R. Lyu",Experience Report: System Log Analysis for Anomaly Detection,"Anomaly detection plays an important role in management of modern large-scale distributed systems. Logs, which record system runtime information, are widely used for anomaly detection. Traditionally, developers (or operators) often inspect the logs manually with keyword search and rule matching. The increasing scale and complexity of modern systems, however, make the volume of logs explode, which renders the infeasibility of manual inspection. To reduce manual effort, many anomaly detection methods based on automated log analysis are proposed. However, developers may still have no idea which anomaly detection methods they should adopt, because there is a lack of a review and comparison among these anomaly detection methods. Moreover, even if developers decide to employ an anomaly detection method, re-implementation requires a nontrivial effort. To address these problems, we provide a detailed review and evaluation of six state-of-the-art log-based anomaly detection methods, including three supervised methods and three unsupervised methods, and also release an open-source toolkit allowing ease of reuse. These methods have been evaluated on two publicly-available production log datasets, with a total of 15,923,592 log messages and 365,298 anomaly instances. We believe that our work, with the evaluation results as well as the corresponding findings, can provide guidelines for adoption of these methods and provide references for future development.","To address these problems, we provide a detailed review and evaluation of six state-of-the-art log-based anomaly detection methods, including three supervised methods and three unsupervised methods, and also release an open-source toolkit allowing ease of reuse.","Anomaly detection plays an important role in management of modern large-scale distributed systems. Logs, which record system runtime information, are widely used for anomaly detection. 
Traditionally, developers (or operators) often inspect the logs manually with keyword search and rule matching. The increasing scale and complexity of modern systems, however, make the volume of logs explode, which renders the infeasibility of manual inspection. To reduce manual effort, many anomaly detection methods based on automated log analysis are proposed. However, developers may still have no idea which anomaly detection methods they should adopt, because there is a lack of a review and comparison among these anomaly detection methods. Moreover, even if developers decide to employ an anomaly detection method, re-implementation requires a nontrivial effort.","These methods have been evaluated on two publicly-available production log datasets, with a total of 15,923,592 log messages and 365,298 anomaly instances.","We believe that our work, with the evaluation results as well as the corresponding findings, can provide guidelines for adoption of these methods and provide references for future development.",He_2016a,,,,,,1,,,,,,, International Conference on Big Data Computing Service and Applications,,2018,7/2,,?,https://ieeexplore.ieee.org/document/8405707/,"Junior Dongo , Charif Mahmoudi , Fabrice Mourlin",NDN Log Analysis Using Big Data Techniques: NFD Performance Assessment,,,,,,,,,,,,,,,,,,, "Security, Privacy, and Anonymity in Computation, Communication, and Storage",,2017,,,?,http://link.springer.com/chapter/10.1007/978-3-319-72395-2_19,"Kai Ma, Rong Jiang, Mianxiong Dong, Yan Jia, Aiping Li",Neural Network Based Web Log Analysis for Web Intrusion Detection,,,,,,,,,,,,,,,,,,, Transactions on Emerging Topics in Computing,,2016,journal,,?,https://ieeexplore.ieee.org/document/7389388/,"Daniel Sun , Min Fu , Liming Zhu , Guoqiang Li , Qinghua Lu",Non-Intrusive Anomaly Detection With Streaming Performance Metrics and Logs for DevOps in Public Clouds: A Case Study in AWS,,,,,,,,,,,,,,,,,,, Journal of Information Processing,,2016,journal,Information and Computing Sciences,C,https://www.scopus.com/inward/record.url?eid=2-s2.0-84961194323&partnerID=40&md5=b6ee4d3f550a1d09c239b1059d51be68,"Saito, S., Maruhashi, K., Takenaka, M., Torii, S.",TOPASE: Detection and prevention of brute force attacks with disciplined IPs from IDS logs,,,,,,,,,,,,,,,,,,, Journal of Computer Science and Technology,,2016,journal,Information and Computing Sciences,B,https://link.springer.com/article/10.1007/s11390-016-1678-7,"De-Qing Zou, Hao Qin, Hai Jin",Uilog: Improving log-based fault diagnosis by log analysis,,,,,,,,,,,,,,,,,,, ACM Conference on Computer and Communications Security,CCS,2017,research track paper,"Computer Software Distributed Computing Communications Technologies",A*,https://dl.acm.org/citation.cfm?id=3134015,"Min Du, Feifei Li, Guineng Zheng, Vivek Srikumar",DeepLog: Anomaly Detection and Diagnosis from System Logs through Deep Learning,"Anomaly detection is a critical step towards building a secure and trustworthy system. The primary purpose of a system log is to record system states and significant events at various critical points to help debug system failures and perform root cause analysis. Such log data is universally available in nearly all computer systems. Log data is an important and valuable resource for understanding system status and performance issues; therefore, the various system logs are naturally excellent source of information for online monitoring and anomaly detection. We propose DeepLog, a deep neural network model utilizing Long Short-Term Memory (LSTM), to model a system log as a natural language sequence. This allows DeepLog to automatically learn log patterns from normal execution, and detect anomalies when log patterns deviate from the model trained from log data under normal execution. In addition, we demonstrate how to incrementally update the DeepLog model in an online fashion so that it can adapt to new log patterns over time. 
Furthermore, DeepLog constructs workflows from the underlying system log so that once an anomaly is detected, users can diagnose the detected anomaly and perform root cause analysis effectively. Extensive experimental evaluations over large log data have shown that DeepLog has outperformed other existing log-based anomaly detection methods based on traditional data mining methodologies.","We propose DeepLog, a deep neural network model utilizing Long Short-Term Memory (LSTM), to model a system log as a natural language sequence.","Anomaly detection is a critical step towards building a secure and trustworthy system. The primary purpose of a system log is to record system states and significant events at various critical points to help debug system failures and perform root cause analysis. Such log data is universally available in nearly all computer systems. Log data is an important and valuable resource for understanding system status and performance issues; therefore, the various system logs are naturally excellent source of information for online monitoring and anomaly detection.","This [model] allows DeepLog to automatically learn log patterns from normal execution, and detect anomalies when log patterns deviate from the model trained from log data under normal execution. In addition, we demonstrate how to incrementally update the DeepLog model in an online fashion so that it can adapt to new log patterns over time. 
Furthermore, DeepLog constructs workflows from the underlying system log so that once an anomaly is detected, users can diagnose the detected anomaly and perform root cause analysis effectively.",Extensive experimental evaluations over large log data have shown that DeepLog has outperformed other existing log-based anomaly detection methods based on traditional data mining methodologies., Du_2017,,,,,,1,,,,,,, Simulation Modelling Practice and Theory,,2017,journal,"Computation Theory and Mathematics Applied Mathematics",C,https://www.scopus.com/inward/record.url?eid=2-s2.0-85025601866&partnerID=40&md5=2dfc8f8ad9ec5a3f773153da38f3939a,"Qi G., Tsai W.-T., Li W., Zhu Z., Luo Y.",A cloud-based triage log analysis and recovery framework,,,,,,,,,,,,,,,,,,, "IEEE International Conference on High Performance Computing and Simulation ",IEEE HPCS,2017,8/2,,B,https://ieeexplore.ieee.org/document/8035155/,"Francesco Folino , Gianluigi Folino , Luigi Pontieri , Pietro Sabatino",A Peer-to-Peer Architecture for Detecting Attacks from Network Traffic and Log Data,,,,,,,,,,,,,,,,,,, ACM International Conference on Knowledge Discovery and Data Mining,KDD,2017,research track paper,Data Format,A*,https://dl.acm.org/citation.cfm?id=3098124,"Fei Wu, Pranay Anchuri, Zhenhui Li",Structural Event Detection from Log Messages,"A wide range of modern web applications are only possible because of the composable nature of the web services they are built upon. It is, therefore, often critical to ensure proper functioning of these web services. As often, the server-side of web services is not directly accessible, several log message based analysis have been developed to monitor the status of web services. Existing techniques focus on using clusters of messages (log patterns) to detect important system events. We argue that meaningful system events are often representable by groups of cohesive log messages and the relationships among these groups. 
We propose a novel method to mine structural events as directed workflow graphs (where nodes represent log patterns, and edges represent relations among patterns). The structural events are inclusive and correspond to interpretable episodes in the system. The problem is non-trivial due to the nature of log data: (i) Individual log messages contain limited information, and (ii) Log messages in a large scale web system are often interleaved even though the log messages from individual components are ordered. As a result, the patterns and relationships mined directly from the messages and their ordering can be erroneous and unreliable in practice. Our solution is based on the observation that meaningful log patterns and relations often form workflow structures that are connected. Our method directly models the overall quality of structural events. Through both qualitative and quantitative experiments on real world datasets, we demonstrate the effectiveness and the expressiveness of our event detection method.","We propose a novel method to mine structural events as directed workflow graphs (where nodes represent log patterns, and edges represent relations among patterns). The structural events are inclusive and correspond to interpretable episodes in the system.","A wide range of modern web applications are only possible because of the composable nature of the web services they are built upon. It is, therefore, often critical to ensure proper functioning of these web services. As often, the server-side of web services is not directly accessible, several log message based analysis have been developed to monitor the status of web services. Existing techniques focus on using clusters of messages (log patterns) to detect important system events. We argue that meaningful system events are often representable by groups of cohesive log messages and the relationships among these groups. 
The problem is non-trivial due to the nature of log data: (i) Individual log messages contain limited information, and (ii) Log messages in a large scale web system are often interleaved even though the log messages from individual components are ordered. As a result, the patterns and relationships mined directly from the messages and their ordering can be erroneous and unreliable in practice.",Our solution is based on the observation that meaningful log patterns and relations often form workflow structures that are connected. Our method directly models the overall quality of structural events.,"Through both qualitative and quantitative experiments on real world datasets, we demonstrate the effectiveness and the expressiveness of our event detection method.", Wu_2017,,,,,,,,,,,1,, ACM International Conference on Knowledge Discovery and Data Mining,KDD,2017,research track paper,Data Format,A*,https://dl.acm.org/citation.cfm?id=3098022,"Tao Li, Yexi Jiang, Chunqiu Zeng, Bin Xia, Zheng Liu, Wubai Zhou, Xiaolong Zhu, Wentao Wang, Liang Zhang, Jun Wu, Li Xue, Dewei Bao",FLAP: An End-to-End Event Log Analysis Platform for System Management,"Many systems, such as distributed operating systems, complex networks, and high throughput web-based applications, are continuously generating large volume of event logs. These logs contain useful information to help system administrators to understand the system running status and to pinpoint the system failures. Generally, due to the scale and complexity of modern systems, the generated logs are beyond the analytic power of human beings. Therefore, it is imperative to develop a comprehensive log analysis system to support effective system management. Although a number of log mining techniques have been proposed to address specific log analysis use cases, few research and industrial efforts have been paid on providing integrated systems with an end-to-end solution to facilitate the log analysis routines. 
In this paper, we design and implement an integrated system, called FIU Log Analysis Platform (a.k.a. FLAP), that aims to facilitate the data analytics for system event logs. FLAP provides an end-to-end solution that utilizes advanced data mining techniques to assist log analysts to conveniently, timely, and accurately conduct event log knowledge discovery, system status investigation, and system failure diagnosis. Specifically, in FLAP, state-of-the-art template learning techniques are used to extract useful information from unstructured raw logs; advanced data transformation techniques are proposed and leveraged for event transformation and storage; effective event pattern mining, event summarization, event querying, and failure prediction techniques are designed and integrated for log analytics; and user-friendly interfaces are utilized to present the informative analysis results intuitively and vividly. Since 2016, FLAP has been used by Huawei Technologies Co. Ltd for internal event log analysis, and has provided effective support in its system operation and workflow optimization.","In this paper, we design and implement an integrated system, called FIU Log Analysis Platform (a.k.a. FLAP), that aims to facilitate the data analytics for system event logs.","Many systems, such as distributed operating systems, complex networks, and high throughput web-based applications, are continuously generating large volume of event logs. These logs contain useful information to help system administrators to understand the system running status and to pinpoint the system failures. Generally, due to the scale and complexity of modern systems, the generated logs are beyond the analytic power of human beings. Therefore, it is imperative to develop a comprehensive log analysis system to support effective system management. 
Although a number of log mining techniques have been proposed to address specific log analysis use cases, few research and industrial efforts have been paid on providing integrated systems with an end-to-end solution to facilitate the log analysis routines.","FLAP provides an end-to-end solution that utilizes advanced data mining techniques to assist log analysts to conveniently, timely, and accurately conduct event log knowledge discovery, system status investigation, and system failure diagnosis. Specifically, in FLAP, state-of-the-art template learning techniques are used to extract useful information from unstructured raw logs; advanced data transformation techniques are proposed and leveraged for event transformation and storage; effective event pattern mining, event summarization, event querying, and failure prediction techniques are designed and integrated for log analytics; and user-friendly interfaces are utilized to present the informative analysis results intuitively and vividly.","Since 2016, FLAP has been used by Huawei Technologies Co. 
Ltd for internal event log analysis, and has provided effective support in its system operation and workflow optimization.",Li_2017,,,,,,,,,,,,,1 IEEE Global Telecommunications Conference,GLOBECOM,2017,research track paper,,B,https://www.scopus.com/inward/record.url?eid=2-s2.0-85046449321&partnerID=40&md5=e3757b75e190a28d5f4fb8d33daba481,"Yoshida N., Ata S., Nakayama H., Hayashi T.",Automation of Network Operations by Cooperation between Anomaly Detections and Operation Logs,,,,,,,,,,,,,,,,,,, Empirical Software Engineering,,2017,journal,Computer Software,A,https://www.scopus.com/inward/record.url?eid=2-s2.0-84966320690&partnerID=40&md5=ba592649195bc874879eec14897f38cf,"Chen, B., (Jack) Jiang, Z.M.",Characterizing logging practices in Java-based open source software projects – a replication study in Apache Software Foundation,"Log messages, which are generated by the debug statements that developers insert into the code at runtime, contain rich information about the runtime behavior of software systems. Log messages are used widely for system monitoring, problem diagnoses and legal compliances. Yuan et al. performed the first empirical study on the logging practices in open source software systems. They studied the development history of four C/C++ server-side projects and derived ten interesting findings. In this paper, we have performed a replication study in order to assess whether their findings would be applicable to Java projects in Apache Software Foundations. We examined 21 different Java-based open source projects from three different categories: server-side, client-side and supporting-component. Similar to the original study, our results show that all projects contain logging code, which is actively maintained. However, contrary to the original study, bug reports containing log messages take a longer time to resolve than bug reports without log messages. 
A significantly higher portion of log updates are for enhancing the quality of logs (e.g., formatting & style changes and spelling/grammar fixes) rather than co-changes with feature implementations (e.g., updating variable names).","In this paper, we have performed a replication study in order to assess whether their findings would be applicable to Java projects in Apache Software Foundations.","Log messages, which are generated by the debug statements that developers insert into the code at runtime, contain rich information about the runtime behavior of software systems. Log messages are used widely for system monitoring, problem diagnoses and legal compliances. Yuan et al. performed the first empirical study on the logging practices in open source software systems. They studied the development history of four C/C++ server-side projects and derived ten interesting findings.","We examined 21 different Java-based open source projects from three different categories: server-side, client-side and supporting-component.","Similar to the original study, our results show that all projects contain logging code, which is actively maintained. However, contrary to the original study, bug reports containing log messages take a longer time to resolve than bug reports without log messages. A significantly higher portion of log updates are for enhancing the quality of logs (e.g., formatting & style changes and spelling/grammar fixes) rather than co-changes with feature implementations (e.g., updating variable names).",Chen_2017a,,,1,,,,,,,,,, Empirical Software Engineering,,2017,journal,Computer Software,A,https://www.scopus.com/inward/record.url?eid=2-s2.0-84991112488&partnerID=40&md5=d04d6a1d7186bb0d50692743772878a6,"Li, H., Shang, W., Hassan, A.E.",Which log level should developers choose for a new logging statement?,"Logging statements are used to record valuable runtime information about applications. 
Each logging statement is assigned a log level such that users can disable some verbose log messages while allowing the printing of other important ones. However, prior research finds that developers often have difficulties when determining the appropriate level for their logging statements. In this paper, we propose an approach to help developers determine the appropriate log level when they add a new logging statement. We analyze the development history of four open source projects (Hadoop, Directory Server, Hama, and Qpid), and leverage ordinal regression models to automatically suggest the most appropriate level for each newly-added logging statement. First, we find that our ordinal regression model can accurately suggest the levels of logging statements with an AUC (area under the curve; the higher the better) of 0.75 to 0.81 and a Brier score (the lower the better) of 0.44 to 0.66, which is better than randomly guessing the appropriate log level (with an AUC of 0.50 and a Brier score of 0.80 to 0.83) or naively guessing the log level based on the proportional distribution of each log level (with an AUC of 0.50 and a Brier score of 0.65 to 0.76). Second, we find that the characteristics of the containing block of a newly-added logging statement, the existing logging statements in the containing source code file, and the content of the newly-added logging statement play important roles in determining the appropriate log level for that logging statement.","In this paper, we propose an approach to help developers determine the appropriate log level when they add a new logging statement.","Logging statements are used to record valuable runtime information about applications. Each logging statement is assigned a log level such that users can disable some verbose log messages while allowing the printing of other important ones. 
However, prior research finds that developers often have difficulties when determining the appropriate level for their logging statements.","We analyze the development history of four open source projects (Hadoop, Directory Server, Hama, and Qpid), and leverage ordinal regression models to automatically suggest the most appropriate level for each newly-added logging statement. ","First, we find that our ordinal regression model can accurately suggest the levels of logging statements with an AUC (area under the curve; the higher the better) of 0.75 to 0.81 and a Brier score (the lower the better) of 0.44 to 0.66, which is better than randomly guessing the appropriate log level (with an AUC of 0.50 and a Brier score of 0.80 to 0.83) or naively guessing the log level based on the proportional distribution of each log level (with an AUC of 0.50 and a Brier score of 0.65 to 0.76). Second, we find that the characteristics of the containing block of a newly-added logging statement, the existing logging statements in the containing source code file, and the content of the newly-added logging statement play important roles in determining the appropriate log level for that logging statement.",Li_2017a,,,1,,,,,,,,,, IBM Journal of Research and Development,,2017,journal,"Computer Software Information Systems",A,https://www.scopus.com/inward/record.url?eid=2-s2.0-85016441785&partnerID=40&md5=3c6dc05775f6b95112a50ef728681b5b,"Ramakrishna, V., Rajput, N., Mukherjea, S., Dey, K.",A platform for end-to-end mobile application infrastructure analytics using system log correlation,"Monitoring and analyzing the performance of mobile applications is challenging when the applications rely on remote services and functioning networks. Root causes of failures and performance anomalies of client-server applications are difficult to pinpoint because of their distributed nature. 
In this paper, we introduce the Mobile Infrastructure Analytics System (MIAS), which helps efficiently detect and debug application faults in a distributed environment, holistically analyzing application and network activity across client devices, application servers, database servers, etc. MIAS collects hypertext transfer protocol (HTTP) session data and system logs from servers and instrumented mobile applications, automatically correlates HTTP sessions with server activity, detects anomalous behavior using statistical techniques, and extracts a small and relevant set of log entries for manual inspection. We show how faults were detected and root causes pinpointed just by glancing at the evidence, for a real-world bookstore mobile application.","In this paper, we introduce the Mobile Infrastructure Analytics System (MIAS), which helps efficiently detect and debug application faults in a distributed environment, holistically analyzing application and network activity across client devices, application servers, database servers, etc.",Monitoring and analyzing the performance of mobile applications is challenging when the applications rely on remote services and functioning networks. Root causes of failures and performance anomalies of client-server applications are difficult to pinpoint because of their distributed nature. 
," MIAS collects hypertext transfer protocol (HTTP) session data and system logs from servers and instrumented mobile applications, automatically correlates HTTP sessions with server activity, detects anomalous behavior using statistical techniques, and extracts a small and relevant set of log entries for manual inspection."," We show how faults were detected and root causes pinpointed just by glancing at the evidence, for a real-world bookstore mobile application.",Ramakrishna_2017,,,,,,,,,,,,1, IBM Journal of Research and Development,,2017,journal,"Computer Software Information Systems",A,https://www.scopus.com/inward/record.url?eid=2-s2.0-85016417253&partnerID=40&md5=dfe1e7c902b4859e00a5892f46790f81,"Wang, J., Li, C., Han, S., Sarkar, S., Zhou, X.",Predictive maintenance based on event-log analysis: A case study,"Predictive maintenance techniques are designed to help anticipate equipment failures to allow for advance scheduling of corrective maintenance, thereby preventing unexpected equipment downtime, improving service quality for customers, and also reducing the additional cost caused by over-maintenance in preventative maintenance policies. Many types of equipment - e.g., automated teller machines (ATMs), information technology equipment, medical devices, etc. - track run-time status by generating system messages, error events, and log files, which can be used to predict impending failures. Aiming at these types of equipment, we present a general classification-based failure prediction method. In our parameterized model, we systematically defined four categories of features to try to cover all possibly useful features, and then used feature selection to identify the most important features for model construction. The general solution is sufficiently flexible and complex to address failure prediction for target equipment types. 
We chose ATMs as the example equipment and used real ATM run-time event logs and maintenance records as experimental data to evaluate our method on the feasibility and effectiveness. In this paper, we also share insights on how to optimize the model parameters, select the most effective features, and tune classifiers to build a high-performance prediction model.","Aiming at these types of equipment, we present a general classification-based failure prediction method.","Predictive maintenance techniques are designed to help anticipate equipment failures to allow for advance scheduling of corrective maintenance, thereby preventing unexpected equipment downtime, improving service quality for customers, and also reducing the additional cost caused by over-maintenance in preventative maintenance policies. Many types of equipment - e.g., automated teller machines (ATMs), information technology equipment, medical devices, etc. - track run-time status by generating system messages, error events, and log files, which can be used to predict impending failures. ","In our parameterized model, we systematically defined four categories of features to try to cover all possibly useful features, and then used feature selection to identify the most important features for model construction. We chose ATMs as the example equipment and used real ATM run-time event logs and maintenance records as experimental data to evaluate our method on the feasibility and effectiveness. 
In this paper, we also share insights on how to optimize the model parameters, select the most effective features, and tune classifiers to build a high-performance prediction model.", The general solution is sufficiently flexible and complex to address failure prediction for target equipment types.,Wang_2017,,,,,,,,,1,,,, International Symposium on Parallel and Distributed Processing Workshops and Phd Forum,,2011,workshop,,?,https://ieeexplore.ieee.org/document/6009015/,"Nithin Nakka , Ankit Agrawal , Alok Choudhary",Predicting Node Failure in High Performance Computing Systems from Failure and Usage Logs,,,,,,,,,,,,,,,,,,, IEEE International Conference on Cluster Computing,CLUSTER,2017,research track paper,Distributed Computing,A,https://ieeexplore.ieee.org/document/8049013/,"Byung H. Park , Saurabh Hukerikar , Ryan Adamson , Christian Engelmann",Big Data Meets HPC Log Analytics: Scalable Approach to Understanding Systems at Extreme Scale,"Today's high-performance computing (HPC) systems are heavily instrumented, generating logs containing information about abnormal events, such as critical conditions, faults, errors and failures, system resource utilization, and about the resource usage of user applications. These logs, once fully analyzed and correlated, can produce detailed information about the system health, root causes of failures, and analyze an application's interactions with the system, providing valuable insights to domain scientists and system administrators. However, processing HPC logs requires a deep understanding of hardware and software components at multiple layers of the system stack. Moreover, most log data is unstructured and voluminous, making it more difficult for system users and administrators to manually inspect the data. With rapid increases in the scale and complexity of HPC systems, log data processing is becoming a big data challenge. 
This paper introduces a HPC log data analytics framework that is based on a distributed NoSQL database technology, which provides scalability and high availability, and the Apache Spark framework for rapid in-memory processing of the log data. The analytics framework enables the extraction of a range of information about the system so that system administrators and end users alike can obtain necessary insights for their specific needs. We describe our experience with using this framework to glean insights from the log data about system behavior from the Titan supercomputer at the Oak Ridge National Laboratory.","This paper introduces a HPC log data analytics framework that is based on a distributed NoSQL database technology, which provides scalability and high availability, and the Apache Spark framework for rapid in-memory processing of the log data.","Today's high-performance computing (HPC) systems are heavily instrumented, generating logs containing information about abnormal events, such as critical conditions, faults, errors and failures, system resource utilization, and about the resource usage of user applications. These logs, once fully analyzed and correlated, can produce detailed information about the system health, root causes of failures, and analyze an application's interactions with the system, providing valuable insights to domain scientists and system administrators. However, processing HPC logs requires a deep understanding of hardware and software components at multiple layers of the system stack. Moreover, most log data is unstructured and voluminous, making it more difficult for system users and administrators to manually inspect the data. 
With rapid increases in the scale and complexity of HPC systems, log data processing is becoming a big data challenge.",The analytics framework enables the extraction of a range of information about the system so that system administrators and end users alike can obtain necessary insights for their specific needs. We describe our experience with using this framework to glean insights from the log data about system behavior from the Titan supercomputer at the Oak Ridge National Laboratory.,,Park_2017,,,,,,,,,,,,1, European Conference on Pattern Languages of Programs,EuroPLop,2017,research track paper,Computer Software,B,https://dl.acm.org/citation.cfm?id=3147720,"Tiago Boldt Sousa, Hugo Sereno Ferreira, Filipe Figueiredo Correia, Ademar Aguiar",Engineering Software for the Cloud: Messaging Systems and Logging,,,,,,,,,,,,,,,,,,, "IEEE International Conference on Web Services ",ICWS,2017,industry track paper,Information Systems,A,https://ieeexplore.ieee.org/document/8029786/,"Siyang Lu , BingBing Rao , Xiang Wei , Byungchul Tak , Long Wang , Liqiang Wang",Log-based Abnormal Task Detection and Root Cause Analysis for Spark,"Application delays caused by abnormal tasks are common problems in big data computing frameworks. An abnormal task in Spark, which may run slowly without error or warning logs, not only reduces its resident node's performance, but also affects other nodes' efficiency. Spark log files report neither root causes of abnormal tasks, nor where and when abnormal scenarios happen. Although Spark provides a “speculation” mechanism to detect straggler tasks, it can only detect tailed stragglers in each stage. Since the root causes of abnormal happening are complicated, there are no effective ways to detect root causes. This paper proposes an approach to detect abnormality and analyzes root causes using Spark log files. Unlike common online monitoring or analysis tools, our approach is a pure off-line method that can analyze abnormality accurately. 
Our approach consists of four steps. First, a parser preprocesses raw log files to generate structured log data. Second, in each stage of Spark application, we choose features related to execution time and data locality of each task, as well as memory usage and garbage collection of each node. Third, based on the selected features, we detect where and when abnormalities happen. Finally, we analyze the problems using weighted factors to decide the probability of root causes. In this paper, we consider four potential root causes of abnormalities, which include CPU, memory, network, and disk. The proposed method has been tested on real-world Spark benchmarks. To simulate various scenario of root causes, we conducted interference injections related to CPU, memory, network, and Disk. Our experimental results show that the proposed approach is accurate on detecting abnormal tasks as well as finding the root causes",This paper proposes an approach to detect abnormality and analyzes root causes using Spark log files.,"Application delays caused by abnormal tasks are common problems in big data computing frameworks. An abnormal task in Spark, which may run slowly without error or warning logs, not only reduces its resident node's performance, but also affects other nodes' efficiency. Spark log files report neither root causes of abnormal tasks, nor where and when abnormal scenarios happen. Although Spark provides a “speculation” mechanism to detect straggler tasks, it can only detect tailed stragglers in each stage. Since the root causes of abnormal happening are complicated, there are no effective ways to detect root causes.","Unlike common online monitoring or analysis tools, our approach is a pure off-line method that can analyze abnormality accurately. Our approach consists of four steps. First, a parser preprocesses raw log files to generate structured log data. 
Second, in each stage of Spark application, we choose features related to execution time and data locality of each task, as well as memory usage and garbage collection of each node. Third, based on the selected features, we detect where and when abnormalities happen. Finally, we analyze the problems using weighted factors to decide the probability of root causes. In this paper, we consider four potential root causes of abnormalities, which include CPU, memory, network, and disk. The proposed method has been tested on real-world Spark benchmarks. To simulate various scenario of root causes, we conducted interference injections related to CPU, memory, network, and Disk.",Our experimental results show that the proposed approach is accurate on detecting abnormal tasks as well as finding the root causes,Lu_2017,,,,,,1,,,,,,, "IEEE International Symposium on Cluster, Cloud and Grid Computing",CCGrid,2017,research track paper,Distributed Computing,A,https://dl.acm.org/citation.cfm?id=3101172,"Sheng Di, Rinku Gupta, Marc Snir, Eric Pershey, Franck Cappello",LogAider: A tool for mining potential correlations of HPC log events,"Today's large-scale supercomputers are producing a huge amount of log data. Exploring various potential correlations of fatal events is crucial for understanding their causality and improving the working efficiency for system administrators. To this end, we developed a toolkit, named LogAider, that can reveal three types of potential correlations: across-field, spatial, and temporal. Across-field correlation refers to the statistical correlation across fields within a log or across multiple logs based on probabilistic analysis. For analyzing the spatial correlation of events, we developed a generic, easy-to-use visualizer that can view any events queried by users on a system machine graph. LogAider can also mine spatial correlations by an optimized K-meaning clustering algorithm over a Torus network topology. 
It is also able to disclose the temporal correlations (or error propagations) over a certain period inside a log or across multiple logs, based on an effective similarity analysis strategy. We assessed LogAider using the one-year reliability-availability-serviceability (RAS) log of Mira system (one of the world's most powerful supercomputers), as well as its job log. We find that LogAider is very helpful for revealing the potential correlations of fatal system events and job events, with an accurate mining of across-field correlation with both precision and recall of 99.9-100%, as well as precise detection of temporal-correlation with a high similarity (up to 95%) to the ground-truth.","To this end, we developed a toolkit, named LogAider, that can reveal three types of potential correlations: across-field, spatial, and temporal.",Today's large-scale supercomputers are producing a huge amount of log data. Exploring various potential correlations of fatal events is crucial for understanding their causality and improving the working efficiency for system administrators. ,"Across-field correlation refers to the statistical correlation across fields within a log or across multiple logs based on probabilistic analysis. For analyzing the spatial correlation of events, we developed a generic, easy-to-use visualizer that can view any events queried by users on a system machine graph. LogAider can also mine spatial correlations by an optimized K-means clustering algorithm over a Torus network topology. It is also able to disclose the temporal correlations (or error propagations) over a certain period inside a log or across multiple logs, based on an effective similarity analysis strategy. We assessed LogAider using the one-year reliability-availability-serviceability (RAS) log of Mira system (one of the world's most powerful supercomputers), as well as its job log. 
","We find that LogAider is very helpful for revealing the potential correlations of fatal system events and job events, with an accurate mining of across-field correlation with both precision and recall of 99.9-100%, as well as precise detection of temporal-correlation with a high similarity (up to 95%) to the ground-truth.", Di_2017,,,,,,,,,,,,,1 International Journal of Computer Applications,IJCA,2014,journal,,?,https://www.researchgate.net/profile/Sumitra_Pundlik/publication/261700670_Real_Time_Generalized_Log_File_Management_and_Analysis_using_Pattern_Matching_and_Dynamic_Clustering/links/0deec5350cce89cacb000000/Real-Time-Generalized-Log-File-Management-and-Analysis-using-Pattern-Matching-and-Dynamic-Clustering.pdf,"Bhupendra Moharil, Chaitanya Gokhale, Vijayendra Ghadge, Pranav Tambvekar, Sumitra Pundlik, Gaurav Rai",Real time generalized log file management and analysis using pattern matching and dynamic clustering,,,,,,,,,,,,,,,,,,, Large Installation System Administration Conference,LISA,2004,,,?,https://www.usenix.org/legacy/event/lisa04/tech/full_papers/rouillard/rouillard.pdf,John P. 
Rouillard,Real-time Log File Analysis Using the Simple Event Correlator (SEC).,,,,,,,,,,,,,,,,,,, Workshop on Machine Learning for Computing Systems,,2018,,,?,https://dl.acm.org/citation.cfm?id=3217872,"Andy Brown, Aaron Tuor, Brian Hutchinson, Nicole Nichols",Recurrent Neural Network Attention Mechanisms for Interpretable System Log Anomaly Detection,,,,,,,,,,,,,,,,,,, Digital Investigation,,2017,journal,Other Information and Computing Sciences,C,https://www.scopus.com/inward/record.url?eid=2-s2.0-85019097613&partnerID=40&md5=a8adb9d5911ad2c7b951fe8e505c337a,"Studiawan, H., Payne, C., Sohel, F.",Graph clustering and anomaly detection of access control log for forensic purposes,,,,,,,,,,,,,,,,,,, International Conference on Big Data Analysis,,2016,7/2,,?,https://ieeexplore.ieee.org/document/7509792/,"Maocai Cheng , Kaiyong Xu , Xuerong Gong",Research on audit log association rule mining based on improved Apriori algorithm,,,,,,,,,,,,,,,,,,, "International Conference on Database and Expert Systems Applications ",DEXA,2017,research track paper,"Artificial Intelligence and Image Processing Data Format Information Systems",B,http://link.springer.com/chapter/10.1007/978-3-319-64471-4_22,"Hamza Labbaci, Brahim Medjahed, Youcef Aklouf",Learning Interactions from Web Service Logs,,,,,,,,,,,,,,,,,,, Computational Collective Intelligence. 
Technologies and Applications,,2014,10/1,,?,http://link.springer.com/chapter/10.1007/978-3-319-11289-3_67,"Grzegorz Kołaczek, Tomasz Kuzemko",Security Incident Detection Using Multidimensional Analysis of the Web Server Log Files,,,,,,,,,,,,,,,,,,, "International Conference for High Performance Computing, Networking, Storage and Analysis, SC",,2017,14/1,,?,https://www.scopus.com/inward/record.url?eid=2-s2.0-85017235475&partnerID=40&md5=72ccc74bc6eb284e5458c7bffb71881b,"Liu, Y., Gunasekaran, R., Ma, X., Vazhkudai, S.S.",Server-Side Log Data Analytics for I/O Workload Characterization and Coordination on Large Shared Storage Systems,,,,,,,,,,,,,,,,,,, Asia Conference on Computer and Communications Security,,2017,research track paper,,?,https://www.scopus.com/inward/record.url?eid=2-s2.0-85021926604&partnerID=40&md5=182a8eaacc2615c7958820b1d50330f1,"Karande, V., Bauman, E., Lin, Z., Khan, L.",SGX-Log: Securing system logs with SGX,,,,,,,,,,,,,,,,,,, Journal of Supercomputing,,2017,journal,Distributed Computing,B,https://www.scopus.com/inward/record.url?eid=2-s2.0-85014539558&partnerID=40&md5=c83bed2fb0aaf4af77fabfba35af2a41,"Vega C., Roquero P., Leira R., Gonzalez I., Aracil J.",Loginson: a transform and load system for very large-scale log analysis in large IT infrastructures,,,,,,,,,,,,,,,,,,, "International Conference on Circuit ,Power and Computing Technologies",,2017,8/2,,?,https://ieeexplore.ieee.org/document/8074209/,"Ezz El-Din Hemdan , D. H. 
Manjaiah",Spark-based log data analysis for reconstruction of cybercrime events in cloud environment,,,,,,,,,,,,,,,,,,, International Joint Conference on e-Business and Telecommunications,ICETE,2017,research track paper,,C,https://www.scopus.com/inward/record.url?eid=2-s2.0-85029453283&partnerID=40&md5=7ee7245833126969ecc8404e7d0af2be,"Tillem, G., Erkin, Z., Lagendijk, R.L.",Mining encrypted software logs using alpha algorithm,,,,,,,,,,,,,,,,,,, Journal of Systems Architecture,,2017,journal,Computer Software,B,https://www.scopus.com/inward/record.url?eid=2-s2.0-85032346005&partnerID=40&md5=8244a21301fbcc51fbbf0c72bbb8228c,"Li B., Lin Y., Zhang S.",Multi-Task Learning for Intrusion Detection on web logs,,,,,,,,,,,,,,,,,,, International Conference on Software Engineering,ICSE,2016,industry track paper,Computer Software,A*,https://dl.acm.org/citation.cfm?id=2889232,"Qingwei Lin, Hongyu Zhang, Jian-Guang Lou, Yu Zhang, Xuewei Chen",Log clustering based problem identification for online service systems,"Logs play an important role in the maintenance of large-scale online service systems. When an online service fails, engineers need to examine recorded logs to gain insights into the failure and identify the potential problems. Traditionally, engineers perform simple keyword search (such as ""error"" and ""exception"") of logs that may be associated with the failures. Such an approach is often time consuming and error prone. Through our collaboration with Microsoft service product teams, we propose LogCluster, an approach that clusters the logs to ease log-based problem identification. LogCluster also utilizes a knowledge base to check if the log sequences occurred before. Engineers only need to examine a small number of previously unseen, representative log sequences extracted from the clusters to identify a problem, thus significantly reducing the number of logs that should be examined, meanwhile improving the identification accuracy. 
Through experiments on two Hadoop-based applications and two large-scale Microsoft online service systems, we show that our approach is effective and outperforms the state-of-the-art work proposed by Shang et al. in ICSE 2013. We have successfully applied LogCluster to the maintenance of many actual Microsoft online service systems. In this paper, we also share our success stories and lessons learned.","Through our collaboration with Microsoft service product teams, we propose LogCluster, an approach that clusters the logs to ease log-based problem identification.","Logs play an important role in the maintenance of large-scale online service systems. When an online service fails, engineers need to examine recorded logs to gain insights into the failure and identify the potential problems. Traditionally, engineers perform simple keyword search (such as ""error"" and ""exception"") of logs that may be associated with the failures. Such an approach is often time consuming and error prone.","LogCluster also utilizes a knowledge base to check if the log sequences occurred before. Engineers only need to examine a small number of previously unseen, representative log sequences extracted from the clusters to identify a problem, thus significantly reducing the number of logs that should be examined, meanwhile improving the identification accuracy.","Through experiments on two Hadoop-based applications and two large-scale Microsoft online service systems, we show that our approach is effective and outperforms the state-of-the-art work proposed by Shang et al. in ICSE 2013. We have successfully applied LogCluster to the maintenance of many actual Microsoft online service systems. 
In this paper, we also share our success stories and lessons learned.", Lin_2016,,,,1,,,,,,,,, International Symposium on Software Reliability Engineering,ISSRE,2017,industry track paper,Computer Software,A,https://ieeexplore.ieee.org/document/8109100/,"Christophe Bertero , Matthieu Roy , Carla Sauvanaud , Gilles Tredan",Experience Report: Log Mining Using Natural Language Processing and Application to Anomaly Detection,"Event logging is a key source of information on a system state. Reading logs provides insights on its activity, assess its correct state and allows to diagnose problems. However, reading does not scale: with the number of machines increasingly rising, and the complexification of systems, the task of auditing systems' health based on logfiles is becoming overwhelming for system administrators. This observation led to many proposals automating the processing of logs. However, most of these proposals still require some human intervention, for instance by tagging logs, parsing the source files generating the logs, etc. In this work, we target minimal human intervention for logfile processing and propose a new approach that considers logs as regular text (as opposed to related works that seek to exploit at best the little structure imposed by log formatting). This approach allows to leverage modern techniques from natural language processing. More specifically, we first apply a word embedding technique based on Google's word2vec algorithm: logfiles' words are mapped to a high dimensional metric space, that we then exploit as a feature space using standard classifiers. The resulting pipeline is very generic, computationally efficient, and requires very little intervention. We validate our approach by seeking stress patterns on an experimental platform. 
Results show a strong predictive performance (≈ 90% accuracy) using three out-of-the-box classifiers.","In this work, we target minimal human intervention for logfile processing and propose a new approach that considers logs as regular text (as opposed to related works that seek to exploit at best the little structure imposed by log formatting).","Event logging is a key source of information on a system state. Reading logs provides insights on its activity, assess its correct state and allows to diagnose problems. However, reading does not scale: with the number of machines increasingly rising, and the complexification of systems, the task of auditing systems' health based on logfiles is becoming overwhelming for system administrators. This observation led to many proposals automating the processing of logs. However, most of these proposals still require some human intervention, for instance by tagging logs, parsing the source files generating the logs, etc.","This approach allows to leverage modern techniques from natural language processing. More specifically, we first apply a word embedding technique based on Google's word2vec algorithm: logfiles' words are mapped to a high dimensional metric space, that we then exploit as a feature space using standard classifiers. The resulting pipeline is very generic, computationally efficient, and requires very little intervention. 
We validate our approach by seeking stress patterns on an experimental platform.",Results show a strong predictive performance (≈ 90% accuracy) using three out-of-the-box classifiers., Bertero_2017,,,,,,1,,,,,,, Journal of Systems and Software,JSS,2017,journal,"Computer Software Information Systems",A,https://www.scopus.com/inward/record.url?eid=2-s2.0-85001086191&partnerID=40&md5=43f3205e098dd7e1e2dbb26d618bf80d,"Mavridis I., Karatza H.",Performance evaluation of cloud-based log file analysis with Apache Hadoop and Apache Spark,"Log files are generated in many different formats by a plethora of devices and software. The proper analysis of these files can lead to useful information about various aspects of each system. Cloud computing appears to be suitable for this type of analysis, as it is capable to manage the high production rate, the large size and the diversity of log files. In this paper we investigated log file analysis with the cloud computational frameworks Apache™Hadoop® and Apache Spark™. We developed realistic log file analysis applications in both frameworks and we performed SQL-type queries in real Apache Web Server log files. Various experiments were performed with different parameters in order to study and compare the performance of the two frameworks.",In this paper we investigated log file analysis with the cloud computational frameworks Apache™Hadoop® and Apache Spark™.,"Log files are generated in many different formats by a plethora of devices and software. The proper analysis of these files can lead to useful information about various aspects of each system. Cloud computing appears to be suitable for this type of analysis, as it is capable to manage the high production rate, the large size and the diversity of log files.",We developed realistic log file analysis applications in both frameworks and we performed SQL-type queries in real Apache Web Server log files. 
Various experiments were performed with different parameters in order to study and compare the performance of the two frameworks.,,Mavridis_2017,,,,,1,,,,,,,, Journal of Internet Technology,,2017,,,C,https://www.scopus.com/inward/record.url?eid=2-s2.0-85028454909&partnerID=40&md5=09c69b3716bca3b38b8cf84ff2085ab3,"Zou, J., Tao, D., Yu, J.",A hybrid intrusion detection model for web log-based attacks,,,,,,,,,,,,,,,,,,, International Conference on Information and Communications Security,ICICS,2018,research track paper,Computer Software,B,http://link.springer.com/chapter/10.1007/978-3-030-01950-1_42,"Mamoru Mimura, Hidema Tanaka",A Linguistic Approach Towards Intrusion Detection in Actual Proxy Logs,,,,,,,,,,,,,,,,,,, ACM International Symposium on High Performance Distributed Computing,HPDC,2018,research track paper,Distributed Computing,A,https://dl.acm.org/citation.cfm?id=3208044,"Aidi Pi, Wei Chen, Xiaobo Zhou, Mike Ji",Profiling distributed systems in lightweight virtualized environments with logs and resource metrics,"Understanding and troubleshooting distributed systems in the cloud is considered a very difficult problem because the execution of a single user request is distributed to multiple machines. Further, the multi-tenancy nature of cloud environments further introduces interference that causes performance issues. Most existing troubleshooting tools either focus on log analysis or intrusive tracing methods, leaving resource usage monitoring unexplored. We propose and implement LRTrace, a non-intrusive tracing and feedback control tool for distributed applications in lightweight virtualized environments. LRTrace profiles both log messages and actual resource consumptions of an application at runtime in a fine-grained manner, which is made possible by lightweight container-based virtualization. 
By correlating these two kinds of information, LRTrace provides users the ability to build the relationship between changes in resource consumption and application events. Furthermore, LRTrace allows users to define and implement their own feedback control plug-ins to manage the cluster in a semi-automatic manner. In system evaluation, we run Spark and MapReduce applications in a multi-tenant cluster and show that LRTrace can diagnose performance issues caused by either interference or bugs, or both. It also helps users to understand the workflows of data-parallel applications.","We propose and implement LRTrace, a non-intrusive tracing and feedback control tool for distributed applications in lightweight virtualized environments.","Understanding and troubleshooting distributed systems in the cloud is considered a very difficult problem because the execution of a single user request is distributed to multiple machines. Further, the multi-tenancy nature of cloud environments further introduces interference that causes performance issues. Most existing troubleshooting tools either focus on log analysis or intrusive tracing methods, leaving resource usage monitoring unexplored.","LRTrace profiles both log messages and actual resource consumptions of an application at runtime in a fine-grained manner, which is made possible by lightweight container-based virtualization. By correlating these two kinds of information, LRTrace provides users the ability to build the relationship between changes in resource consumption and application events. Furthermore, LRTrace allows users to define and implement their own feedback control plug-ins to manage the cluster in a semi-automatic manner.","In system evaluation, we run Spark and MapReduce applications in a multi-tenant cluster and show that LRTrace can diagnose performance issues caused by either interference or bugs, or both. 
It also helps users to understand the workflows of data-parallel applications.",Pi_2018,,,,,,,,1,,,,, Computers in Industry,,2018,journal,"Computer Software Distributed Computing Cognitive Science",B,https://www.scopus.com/inward/record.url?eid=2-s2.0-85041421248&partnerID=40&md5=fc6cbcdadd25e865b64adb900b73b71b,"Nagashree N., Tejasvi R., Swathi K.C.",An Early Risk Detection and Management System for the Cloud with Log Parser,,,,,,,,,,,,,,,,,,, IEEE International Conference on Program Comprehension,ICPC,2018,research track paper,Computer Software,C,https://dl.acm.org/citation.cfm?id=3196345,"Diego Castro, Marcelo Schots",Analysis of test log information through interactive visualizations,,,,,,,,,,,,,,,,,,, Neurocomputing,,2018,journal,"Artificial Intelligence and Image Processing Cognitive Science Biomedical Engineering ",B,https://www.scopus.com/inward/record.url?eid=2-s2.0-85042663069&partnerID=40&md5=adaf75b346004ad640b2ffcb1c1ca51e,"Tonnelier E., Baskiotis N., Guigue V., Gallinari P.",Anomaly detection in smart card logs and distant evaluation with Twitter: a robust framework,,,,,,,,,,,,,,,,,,, International Symposium on Parallel and Distributed Computing,ISPDC,2018,research track paper,Distributed Computing,C,https://ieeexplore.ieee.org/document/8452034/,"Siavash Ghiasvand , Florina M. 
Ciorba",Assessing Data Usefulness for Failure Analysis in Anonymized System Logs,,,,,,,,,,,,,,,,,,, Conference on Communications and Network Security,,2017,9/2,,?,https://www.scopus.com/inward/record.url?eid=2-s2.0-85016029704&partnerID=40&md5=b6f0cbe57b7a1d78e939b5a7f6b07903,"Suh-Lee, C., Jo, J.-Y., Kim, Y.",Text mining for security threat detection discovering hidden information in unstructured log messages,,,,,,,,,,,,,,,,,,, "International Conference on Database and Expert Systems Applications ",DEXA,2018,research track paper,"Artificial Intelligence and Image Processing Data Format Information Systems",B,http://link.springer.com/chapter/10.1007/978-3-319-98812-2_12,"Marietheres Dietz, Günther Pernul",Big Log Data Stream Processing: Adapting an Anomaly Detection Technique,,,,,,,,,,,,,,,,,,, Empirical Software Engineering,,2018,journal,Computer Software,A,http://link.springer.com/article/10.1007/s10664-018-9603-z,"Mehran Hassani, Weiyi Shang, Emad Shihab, Nikolaos Tsantalis",Studying and detecting log-related issues,"Logs capture valuable information throughout the execution of software systems. The rich knowledge conveyed in logs is highly leveraged by researchers and practitioners in performing various tasks, both in software development and its operation. Log-related issues, such as missing or having outdated information, may have a large impact on the users who depend on these logs. In this paper, we first perform an empirical study on log-related issues in two large-scale, open source software systems. We find that the files with log-related issues have undergone statistically significantly more frequent prior changes, and bug fixes. We also find that developers fixing these log-related issues are often not the ones who introduced the logging statement nor the owner of the method containing the logging statement. Maintaining logs is more challenging without clear experts. 
Finally, we find that most of the defective logging statements remain unreported for a long period (median 320 days). Once reported, the issues are fixed quickly (median five days). Our empirical findings suggest the need for automated tools that can detect log-related issues promptly. We conducted a manual study and identified seven root-causes of the log-related issues. Based on these root causes, we developed an automated tool that detects four evident types of log-related issues. Our tool can detect 75 existing inappropriate logging statements reported in 40 log-related issues. We also reported new issues found by our tool to developers and 38 previously unknown issues in the latest release of the subject systems were accepted by developers.",characterization of log-related issues + automated tool to detect log-related issues,"Logs capture valuable information throughout the execution of software systems. The rich knowledge conveyed in logs is highly leveraged by researchers and practitioners in performing various tasks, both in software development and its operation. Log-related issues, such as missing or having outdated information, may have a large impact on the users who depend on these logs. ","In this paper, we first perform an empirical study on log-related issues in two large-scale, open source software systems. We conducted a manual study and identified seven root-causes of the log-related issues.","We find that the files with log-related issues have undergone statistically significantly more frequent prior changes, and bug fixes. We also find that developers fixing these log-related issues are often not the ones who introduced the logging statement nor the owner of the method containing the logging statement. Maintaining logs is more challenging without clear experts. Finally, we find that most of the defective logging statements remain unreported for a long period (median 320 days). Once reported, the issues are fixed quickly (median five days). 
Our empirical findings suggest the need for automated tools that can detect log-related issues promptly.",Hassani_2018,,,1,,,,,,,,,, International Conference on Data Mining Workshops,,2012,workshop paper,,?,https://ieeexplore.ieee.org/document/6406404/,Wilhelmiina Hämäläinen,Thorough Analysis of Log Data with Dependency Rules: Practical Solutions and Theoretical Challenges,,,,,,,,,,,,,,,,,,, Empirical Software Engineering,,2018,journal,Computer Software,A,https://www.scopus.com/inward/record.url?eid=2-s2.0-85041236951&partnerID=40&md5=4f3d82a1856c65710f10c619e2becb5a,"Li H., Chen T.-H.P., Shang W., Hassan A.E.",Studying software logging using topic models,"Software developers insert logging statements in their source code to record important runtime information; such logged information is valuable for understanding system usage in production and debugging system failures. However, providing proper logging statements remains a manual and challenging task. Missing an important logging statement may increase the difficulty of debugging a system failure, while too much logging can increase system overhead and mask the truly important information. Intuitively, the actual functionality of a software component is one of the major drivers behind logging decisions. For instance, a method maintaining network communications is more likely to be logged than getters and setters. In this paper, we used automatically-computed topics of a code snippet to approximate the functionality of a code snippet. We studied the relationship between the topics of a code snippet and the likelihood of a code snippet being logged (i.e., to contain a logging statement). Our driving intuition is that certain topics in the source code are more likely to be logged than others. 
To validate our intuition, we conducted a case study on six open source systems, and we found that i) there exists a small number of “log-intensive” topics that are more likely to be logged than other topics; ii) each pair of the studied systems share 12% to 62% common topics, and the likelihood of logging such common topics has a statistically significant correlation of 0.35 to 0.62 among all the studied systems; and iii) our topic-based metrics help explain the likelihood of a code snippet being logged, providing an improvement of 3% to 13% on AUC and 6% to 16% on balanced accuracy over a set of baseline metrics that capture the structural information of a code snippet. Our findings highlight that topics contain valuable information that can help guide and drive developers’ logging decisions.","We studied the relationship between the topics of a code snippet and the likelihood of a code snippet being logged (i.e., to contain a logging statement).","Software developers insert logging statements in their source code to record important runtime information; such logged information is valuable for understanding system usage in production and debugging system failures. However, providing proper logging statements remains a manual and challenging task. Missing an important logging statement may increase the difficulty of debugging a system failure, while too much logging can increase system overhead and mask the truly important information. Intuitively, the actual functionality of a software component is one of the major drivers behind logging decisions. For instance, a method maintaining network communications is more likely to be logged than getters and setters.","In this paper, we used automatically-computed topics of a code snippet to approximate the functionality of a code snippet. (...) Our driving intuition is that certain topics in the source code are more likely to be logged than others. 
","we found that i) there exists a small number of “log-intensive” topics that are more likely to be logged than other topics; ii) each pair of the studied systems share 12% to 62% common topics, and the likelihood of logging such common topics has a statistically significant correlation of 0.35 to 0.62 among all the studied systems; and iii) our topic-based metrics help explain the likelihood of a code snippet being logged, providing an improvement of 3% to 13% on AUC and 6% to 16% on balanced accuracy over a set of baseline metrics that capture the structural information of a code snippet. Our findings highlight that topics contain valuable information that can help guide and drive developers’ logging decisions.",Li_2018,,,1,,,,,,,,,, "International Conference on Availability, Reliability and Security",ARES,2018,research track paper,Computer Software,B,https://dl.acm.org/citation.cfm?id=3230855,"Zongze Li, Matthew Davidson, Song Fu, Sean Blanchard, Michael Lang",Converting Unstructured System Logs into Structured Event List for Anomaly Detection,,,,,,,,,,,,,,,,,,, International Conference on Agents and Artificial Intelligence,ICAART,2018,research track paper,Artificial Intelligence and Image Processing,C,https://www.scopus.com/inward/record.url?eid=2-s2.0-85046626347&partnerID=40&md5=8247f312461a5e9eda8b103175cef9c0,"Dagnely P., Tsiporkova E., Tourwe T.",Data-driven relevancy estimation for event logs exploration and preprocessing,,,,,,,,,,,,,,,,,,, International Parallel and Distributed Processing Symposium Workshop,,2015,WORKSHOP,,?,https://ieeexplore.ieee.org/document/7284426/,"Nentawe Gurumdimma , Arshad Jhumka , Maria Liakata , Edward Chuah , James Browne",Towards Detecting Patterns in Failure Logs of Large-Scale Distributed Systems,,,,,,,,,,,,,,,,,,, IEEE International Conference on Computer Communications,IEEE INFOCOM,2018,research track paper,Distributed Computing,A*,https://ieeexplore.ieee.org/document/8486257/,"Subhendu Khatuya , Niloy Ganguly , Jayanta 
Basak , Madhumita Bharde , Bivas Mitra",ADELE: Anomaly Detection from Event Log Empiricism,"A large population of users gets affected by sudden slowdown or shutdown of an enterprise application. System administrators and analysts spend considerable amount of time dealing with functional and performance bugs. These problems are particularly hard to detect and diagnose in most computer systems, since there is a huge amount of system generated supportability data (counters, logs etc.) that need to be analyzed. Most often, there isn't a very clear or obvious root cause. Timely identification of significant change in application behavior is very important to prevent negative impact on the service. In this paper, we present ADELE, an empirical, data-driven methodology for early detection of anomalies in data storage systems. The key feature of our solution is diligent selection of features from system logs and development of effective machine learning techniques for anomaly prediction. ADELE learns from system's own history to establish the baseline of normal behavior and gives accurate indications of the time period when something is amiss for a system. Validation on more than 4800 actual support cases shows ~ 83% true positive rate and ~ 12% false positive rate in identifying periods when the machine is not performing normally. We also establish the existence of problem “signatures” which help map customer problems to already seen issues in the field. ADELE's capability to predict early paves way for online failure prediction for customer systems.","In this paper, we present ADELE, an empirical, data-driven methodology for early detection of anomalies in data storage systems.","A large population of users gets affected by sudden slowdown or shutdown of an enterprise application. System administrators and analysts spend considerable amount of time dealing with functional and performance bugs. 
These problems are particularly hard to detect and diagnose in most computer systems, since there is a huge amount of system generated supportability data (counters, logs etc.) that need to be analyzed. Most often, there isn't a very clear or obvious root cause. Timely identification of significant change in application behavior is very important to prevent negative impact on the service. ","The key feature of our solution is diligent selection of features from system logs and development of effective machine learning techniques for anomaly prediction. ADELE learns from system's own history to establish the baseline of normal behavior and gives accurate indications of the time period when something is amiss for a system. We also establish the existence of problem “signatures” which help map customer problems to already seen issues in the field.","Validation on more than 4800 actual support cases shows ~ 83% true positive rate and ~ 12% false positive rate in identifying periods when the machine is not performing normally. 
ADELE's capability to predict early paves way for online failure prediction for customer systems.", Khatuya_2018,,,,,,,,,1,,,, International Conference on Computer Science and Software Engineering,,2012,6/2,,?,https://ieeexplore.ieee.org/document/6261962/,Dileepa Jayathilake,Towards structured log analysis,,,,,,,,,,,,,,,,,,, Task Models and Diagrams for Users Interface Design,,2007,,,?,http://link.springer.com/chapter/10.1007/978-3-540-70816-2_3,"Ivo Malý, Pavel Slavík",Towards Visual Analysis of Usability Test Logs Using Task Models,,,,,,,,,,,,,,,,,,, Computers and Security,,2018,journal,"Computer Software Distributed Computing Information Systems",B,https://www.scopus.com/inward/record.url?eid=2-s2.0-85053399855&partnerID=40&md5=bc9fd672f5878bc394a6c8cb68cfd3b3,"Landauer M., Wurzenberger M., Skopik F., Settanni G., Filzmoser P.",Dynamic log file analysis: An unsupervised cluster evolution approach for anomaly detection,,,,,,,,,,,,,,,,,,, IEICE Transactions on Fundamentals of Electronics Communications and Computer Sciences,,2018,journal,"Computer Software Electrical and Electronic Engineering",C,https://www.scopus.com/inward/record.url?eid=2-s2.0-85053827869&partnerID=40&md5=6ae437b1b8787439a388d07be8dac0cc,"Sato T., Himura Y., Yasuda Y.",Evidence-based context-aware log data management for integrated monitoring system,,,,,,,,,,,,,,,,,,, IEEE Transactions on Dependable and Secure Computing,,2018,journal,"Computer Software Data Format Distributed Computing",A,https://ieeexplore.ieee.org/document/8067504/,"Pinjia He , Jieming Zhu , Shilin He , Jian Li , Michael R. Lyu",Towards Automated Log Parsing for Large-Scale Log Data Analysis,"Logs are widely used in system management for dependability assurance because they are often the only data available that record detailed system runtime behaviors in production. 
Because the size of logs is constantly increasing, developers (and operators) intend to automate their analysis by applying data mining methods, therefore structured input data (e.g., matrices) are required. This triggers a number of studies on log parsing that aims to transform free-text log messages into structured events. However, due to the lack of open-source implementations of these log parsers and benchmarks for performance comparison, developers are unlikely to be aware of the effectiveness of existing log parsers and their limitations when applying them into practice. They must often reimplement or redesign one, which is time-consuming and redundant. In this paper, we first present a characterization study of the current state of the art log parsers and evaluate their efficacy on five real-world datasets with over ten million log messages. We determine that, although the overall accuracy of these parsers is high, they are not robust across all datasets. When logs grow to a large scale (e.g., 200 million log messages), which is common in practice, these parsers are not efficient enough to handle such data on a single computer. To address the above limitations, we design and implement a parallel log parser (namely POP) on top of Spark, a large-scale data processing platform. Comprehensive experiments have been conducted to evaluate POP on both synthetic and real-world datasets. The evaluation results demonstrate the capability of POP in terms of accuracy, efficiency, and effectiveness on subsequent log mining tasks.","In this paper, we first present a characterization study of the current state of the art log parsers and evaluate their efficacy on five real-world datasets with over ten million log messages. 
we design and implement a parallel log parser (namely POP) on top of Spark, a large-scale data processing platform.","Logs are widely used in system management for dependability assurance because they are often the only data available that record detailed system runtime behaviors in production. Because the size of logs is constantly increasing, developers (and operators) intend to automate their analysis by applying data mining methods, therefore structured input data (e.g., matrices) are required. This triggers a number of studies on log parsing that aims to transform free-text log messages into structured events. However, due to the lack of open-source implementations of these log parsers and benchmarks for performance comparison, developers are unlikely to be aware of the effectiveness of existing log parsers and their limitations when applying them into practice. They must often reimplement or redesign one, which is time-consuming and redundant.",Comprehensive experiments have been conducted to evaluate POP on both synthetic and real-world datasets.,"We determine that, although the overall accuracy of these parsers is high, they are not robust across all datasets. When logs grow to a large scale (e.g., 200 million log messages), which is common in practice, these parsers are not efficient enough to handle such data on a single computer. 
The evaluation results demonstrate the capability of POP in terms of accuracy, efficiency, and effectiveness on subsequent log mining tasks.",He_2018,,,,1,,,,,,,,, Passive and Active Measurement,PAM,2018,research track paper,Information Systems,B,http://link.springer.com/chapter/10.1007/978-3-319-76481-8_13,"Oliver Gasser, Benjamin Hof, Max Helm, Maciej Korczynski, Ralph Holz, Georg Carle",In Log We Trust: Revealing Poor Security Practices with Certificate Transparency Logs and Internet Measurements,,,,,,,,,,,,,,,,,,, International Conference on Software Engineering,ICSE,2017,research track paper,Computer Software,A*,https://dl.acm.org/citation.cfm?id=3097378,"Boyuan Chen, Zhen Ming (Jack) Jiang",Characterizing and detecting anti-patterns in the logging code,"Snippets of logging code are output statements (e.g., LOG.info or System.out.println) that developers insert into a software system. Although more logging code can provide more execution context of the system's behavior during runtime, it is undesirable to instrument the system with too much logging code due to maintenance overhead. Furthermore, excessive logging may cause unexpected side-effects like performance slow-down or high disk I/O bandwidth. Recent studies show that there are no well-defined coding guidelines for performing effective logging. Previous research on the logging code mainly tackles the problems of where-to-log and what-to-log. There are very few works trying to address the problem of how-to-log (developing and maintaining high-quality logging code). In this paper, we study the problem of how-to-log by characterizing and detecting the anti-patterns in the logging code. As the majority of the logging code is evolved together with the feature code, the remaining set of logging code changes usually contains the fixes to the anti-patterns. 
We have manually examined 352 pairs of independently changed logging code snippets from three well-maintained open source systems: ActiveMQ, Hadoop and Maven. Our analysis has resulted in six different anti-patterns in the logging code. To demonstrate the value of our findings, we have encoded these anti-patterns into a static code analysis tool, LCAnalyzer. Case studies show that LCAnalyzer has an average recall of 95% and precision of 60% and can be used to automatically detect previously unknown anti-patterns in the source code. To gather feedback, we have filed 64 representative instances of the logging code anti-patterns from the most recent releases of ten open source software systems. Among them, 46 instances (72%) have already been accepted by their developers.","In this paper, we study the problem of how-to-log by characterizing and detecting the anti-patterns in the logging code.","Snippets of logging code are output statements (e.g., LOG.info or System.out.println) that developers insert into a software system. Although more logging code can provide more execution context of the system's behavior during runtime, it is undesirable to instrument the system with too much logging code due to maintenance overhead. Furthermore, excessive logging may cause unexpected side-effects like performance slow-down or high disk I/O bandwidth. Recent studies show that there are no well-defined coding guidelines for performing effective logging. Previous research on the logging code mainly tackles the problems of where-to-log and what-to-log. There are very few works trying to address the problem of how-to-log (developing and maintaining high-quality logging code).","As the majority of the logging code is evolved together with the feature code, the remaining set of logging code changes usually contains the fixes to the anti-patterns. 
We have manually examined 352 pairs of independently changed logging code snippets from three well-maintained open source systems: ActiveMQ, Hadoop and Maven.","Our analysis has resulted in six different anti-patterns in the logging code. To demonstrate the value of our findings, we have encoded these anti-patterns into a static code analysis tool, LCAnalyzer. Case studies show that LCAnalyzer has an average recall of 95% and precision of 60% and can be used to automatically detect previously unknown anti-patterns in the source code. To gather feedback, we have filed 64 representative instances of the logging code anti-patterns from the most recent releases of ten open source software systems. Among them, 46 instances (72%) have already been accepted by their developers.",Chen_2017,1,,,,,,,,,,,, Cluster Computing,,2017,journal,,?,http://link.springer.com/article/10.1007/s10586-017-1317-2,"Boyeon Song, Jangwon Choi, Sang-Soo Choi, Jungsuk Song",Visualization of security event logs across multiple networks and its application to a CSOC,,,,,,,,,,,,,,,,,,, IEEE International Conference on Program Comprehension,ICPC,2018,research track paper,Computer Software,C,https://dl.acm.org/citation.cfm?id=3196328,"Shanshan Li, Xu Niu, Zhouyang Jia, Ji Wang, Haochen He, Teng Wang",Logtracker: learning log revision behaviors proactively from software evolution history,,,,,,,,,,,,,,,,,,, Security and Communication Networks,,2015,journal,,Unranked,https://www.scopus.com/inward/record.url?eid=2-s2.0-84931010064&partnerID=40&md5=3792d59ea5012419a91fb5ad78d261a1,"Alsaleh, M., Alarifi, A., Alqahtani, A., Al-Salman, A.",Visualizing web server attacks: Patterns in PHPIDS logs,,,,,,,,,,,,,,,,,,, International Conference on Automated Software Engineering,ASE,2018,research track paper,Computer Software,A,https://dl.acm.org/citation.cfm?id=3238214,"Boyuan Chen, Jian Song, Peng Xu, Xing Hu, Zhen Ming (Jack) Jiang",An automated approach to estimating code coverage measures via execution 
logs,"Software testing is a widely used technique to ensure the quality of software systems. Code coverage measures are commonly used to evaluate and improve the existing test suites. Based on our industrial and open source studies, existing state-of-the-art code coverage tools are only used during unit and integration testing due to issues like engineering challenges, performance overhead, and incomplete results. To resolve these issues, in this paper we have proposed an automated approach, called LogCoCo, to estimating code coverage measures using the readily available execution logs. Using program analysis techniques, LogCoCo matches the execution logs with their corresponding code paths and estimates three different code coverage criteria: method coverage, statement coverage, and branch coverage. Case studies on one open source system (HBase) and five commercial systems from Baidu show that: (1) the results of LogCoCo are highly accurate (>96% in seven out of nine experiments) under a variety of testing activities (unit testing, integration testing, and benchmarking); and (2) the results of LogCoCo can be used to evaluate and improve the existing test suites. Our collaborators at Baidu are currently considering adopting LogCoCo and use it on a daily basis.","To resolve these issues, in this paper we have proposed an automated approach, called LogCoCo, to estimating code coverage measures using the readily available execution logs.","Software testing is a widely used technique to ensure the quality of software systems. Code coverage measures are commonly used to evaluate and improve the existing test suites. 
Based on our industrial and open source studies, existing state-of-the-art code coverage tools are only used during unit and integration testing due to issues like engineering challenges, performance overhead, and incomplete results.","Using program analysis techniques, LogCoCo matches the execution logs with their corresponding code paths and estimates three different code coverage criteria: method coverage, statement coverage, and branch coverage.","Case studies on one open source system (HBase) and five commercial systems from Baidu show that: (1) the results of LogCoCo are highly accurate (>96% in seven out of nine experiments) under a variety of testing activities (unit testing, integration testing, and benchmarking); and (2) the results of LogCoCo can be used to evaluate and improve the existing test suites. Our collaborators at Baidu are currently considering adopting LogCoCo and use it on a daily basis.", Chen_2018,,,,,,,,,,1,,, International Conference on Automated Software Engineering,ASE,2018,research track paper,Computer Software,A,https://dl.acm.org/citation.cfm?id=3238193,"Pinjia He, Zhuangbin Chen, Shilin He, Michael R. Lyu",Characterizing the natural language descriptions in software logging statements,"Logging is a common programming practice of great importance in modern software development, because software logs have been widely used in various software maintenance tasks. To provide high-quality logs, developers need to design the description text in logging statements carefully. Inappropriate descriptions will slow down or even mislead the maintenance process, such as postmortem analysis. However, there is currently a lack of rigorous guide and specifications on developer logging behaviors, which makes the construction of description text in logging statements a challenging problem. 
To fill this significant gap, in this paper, we systematically study what developers log, with focus on the usage of natural language descriptions in logging statements. We obtain 6 valuable findings by conducting source code analysis on 10 Java projects and 7 C# projects, which contain 28,532,975 LOC and 115,159 logging statements in total. Furthermore, our study demonstrates the potential of automated description text generation for logging statements by obtaining up to 49.04 BLEU-4 score and 62.1 ROUGE-L score using a simple information retrieval method. To facilitate future research in this field, the datasets have been publicly released.","To fill this significant gap, in this paper, we systematically study what developers log, with focus on the usage of natural language descriptions in logging statements.","Logging is a common programming practice of great importance in modern software development, because software logs have been widely used in various software maintenance tasks. To provide high-quality logs, developers need to design the description text in logging statements carefully. Inappropriate descriptions will slow down or even mislead the maintenance process, such as postmortem analysis. However, there is currently a lack of rigorous guide and specifications on developer logging behaviors, which makes the construction of description text in logging statements a challenging problem.","We obtain 6 valuable findings by conducting source code analysis on 10 Java projects and 7 C# projects, which contain 28,532,975 LOC and 115,159 logging statements in total.","Furthermore, our study demonstrates the potential of automated description text generation for logging statements by obtaining up to 49.04 BLEU-4 score and 62.1 ROUGE-L score using a simple information retrieval method. 
To facilitate future research in this field, the datasets have been publicly released.",He_2018a,,,1,,,,,,,,,, International Conference on Distributed Computing Systems,ICDCS,2018,industry track paper,Distributed Computing,A,https://ieeexplore.ieee.org/document/8416368/,"Biplob Debnath , Mohiuddin Solaimani , Muhammad Ali Gulzar Gulzar , Nipun Arora , Cristian Lumezanu , JianWu Xu , Bo Zong , Hui Zhang , Guofei Jiang , Latifur Khan",LogLens: A Real-Time Log Analysis System,"Administrators of most user-facing systems depend on periodic log data to get an idea of the health and status of production applications. Logs report information, which is crucial to diagnose the root cause of complex problems. In this paper, we present a real-time log analysis system called LogLens that automates the process of anomaly detection from logs with no (or minimal) target system knowledge and user specification. In LogLens , we employ unsupervised machine learning based techniques to discover patterns in application logs, and then leverage these patterns along with the real-time log parsing for designing advanced log analytics applications. Compared to the existing systems which are primarily limited to log indexing and search capabilities, LogLens presents an extensible system for supporting both stateless and stateful log analysis applications. Currently, LogLens is running at the core of a commercial log analysis solution handling millions of logs generated from the large-scale industrial environments and reported up to 12096x man-hours reduction in troubleshooting operational problems compared to the manual approach.","In this paper, we present a real-time log analysis system called LogLens that automates the process of anomaly detection from logs with no (or minimal) target system knowledge and user specification.","Administrators of most user-facing systems depend on periodic log data to get an idea of the health and status of production applications. 
Logs report information, which is crucial to diagnose the root cause of complex problems.","In LogLens , we employ unsupervised machine learning based techniques to discover patterns in application logs, and then leverage these patterns along with the real-time log parsing for designing advanced log analytics applications. Compared to the existing systems which are primarily limited to log indexing and search capabilities, LogLens presents an extensible system for supporting both stateless and stateful log analysis applications.","Currently, LogLens is running at the core of a commercial log analysis solution handling millions of logs generated from the large-scale industrial environments and reported up to 12096x man-hours reduction in troubleshooting operational problems compared to the manual approach.", Debnath_2018,,,,,,1,,,,,,, "International Conference on Software Analysis, Evolution, and Reengineering",SANER,2018,research track paper,Computer Software,C,https://ieeexplore.ieee.org/document/8330208/,"Maikel Leemans , Wil M. P. van der Aalst , Mark G. J. van den Brand",Recursion aware modeling and discovery for hierarchical software event log analysis,"This paper presents 1) a novel hierarchy and recursion extension to the process tree model; and 2) the first, recursion aware process model discovery technique that leverages hierarchical information in event logs, typically available for software systems. This technique allows us to analyze the operational processes of software systems under real-life conditions at multiple levels of granularity. The work can be positioned in-between reverse engineering and process mining. An implementation of the proposed approach is available as a ProM plugin. 
Experimental results based on real-life (software) event logs demonstrate the feasibility and usefulness of the approach and show the huge potential to speed up discovery by exploiting the available hierarchy.",,,,,,,,,,,,,,,,,, Journal of Systems and Software,JSS,2018,journal,"Computer Software Information Systems",A,https://www.scopus.com/inward/record.url?eid=2-s2.0-85047805628&partnerID=40&md5=64fc17972df185f35ce3b739216a7188,"Bao L., Li Q., Lu P., Lu J., Ruan T., Zhang K.",Execution anomaly detection in large-scale systems through console log analysis,"Execution anomaly detection is important for development, maintenance and performance tuning in large-scale systems. System console logs are the significant source of troubleshooting and problem diagnosis. However, manually inspecting logs to detect anomalies is unfeasible due to the increasing volume and complexity of log files. Therefore, there is a substantial demand for automatic anomaly detection based on log analysis. In this paper, we propose a general method to mine console logs to detect system problems. We first give some formal definitions of the problem, and then extract the set of log statements in the source code and generate the reachability graph to reveal the reachable relations of log statements. After that, we parse the log files to create log messages by combining information about log statements with information retrieval techniques. These messages are grouped into execution traces according to their execution units. We propose a novel anomaly detection algorithm that considers traces as sequence data and uses a probabilistic suffix tree based method to organize and differentiate significant statistical properties possessed by the sequences. 
Experiments on a CloudStack testbed and a Hadoop production system show that our method can effectively detect running anomalies in comparison with existing four detection algorithms.","In this paper, we propose a general method to mine console logs to detect system problems.","Execution anomaly detection is important for development, maintenance and performance tuning in large-scale systems. System console logs are the significant source of troubleshooting and problem diagnosis. However, manually inspecting logs to detect anomalies is unfeasible due to the increasing volume and complexity of log files. Therefore, there is a substantial demand for automatic anomaly detection based on log analysis.","We first give some formal definitions of the problem, and then extract the set of log statements in the source code and generate the reachability graph to reveal the reachable relations of log statements. After that, we parse the log files to create log messages by combining information about log statements with information retrieval techniques. These messages are grouped into execution traces according to their execution units. 
We propose a novel anomaly detection algorithm that considers traces as sequence data and uses a probabilistic suffix tree based method to organize and differentiate significant statistical properties possessed by the sequences.",Experiments on a CloudStack testbed and a Hadoop production system show that our method can effectively detect running anomalies in comparison with existing four detection algorithms.,Bao_2018,,,,,,1,,,,,,, Journal of Systems and Software,JSS,2018,journal,"Computer Software Information Systems",A,https://www.scopus.com/inward/record.url?eid=2-s2.0-85016586904&partnerID=40&md5=fc0d0575a6ab148fad0b060ffda87e8d,"Farshchi M., Schneider J.-G., Weber I., Grundy J.",Metric selection and anomaly detection for cloud operations using log and metric correlation analysis,"Cloud computing systems provide the facilities to make application services resilient against failures of individual computing resources. However, resiliency is typically limited by a cloud consumer's use and operation of cloud resources. In particular, system operations have been reported as one of the leading causes of system-wide outages. This applies specifically to DevOps operations, such as backup, redeployment, upgrade, customized scaling, and migration – which are executed at much higher frequencies now than a decade ago. We address this problem by proposing a novel approach to detect errors in the execution of these kinds of operations, in particular for rolling upgrade operations. Our regression-based approach leverages the correlation between operations’ activity logs and the effect of operation activities on cloud resources. First, we present a metric selection approach based on regression analysis. Second, the output of a regression model of selected metrics is used to derive assertion specifications, which can be used for runtime verification of running operations. 
We have conducted a set of experiments with different configurations of an upgrade operation on Amazon Web Services, with and without randomly injected faults to demonstrate the utility of our new approach.","We address this problem by proposing a novel approach to detect errors in the execution of these kinds of operations, in particular for rolling upgrade operations.","Cloud computing systems provide the facilities to make application services resilient against failures of individual computing resources. However, resiliency is typically limited by a cloud consumer's use and operation of cloud resources. In particular, system operations have been reported as one of the leading causes of system-wide outages. This applies specifically to DevOps operations, such as backup, redeployment, upgrade, customized scaling, and migration – which are executed at much higher frequencies now than a decade ago.","Our regression-based approach leverages the correlation between operations’ activity logs and the effect of operation activities on cloud resources. First, we present a metric selection approach based on regression analysis. Second, the output of a regression model of selected metrics is used to derive assertion specifications, which can be used for runtime verification of running operations. 
We have conducted a set of experiments with different configurations of an upgrade operation on Amazon Web Services, with and without randomly injected faults to demonstrate the utility of our new approach.",,Farshchi_2018,,,,,,1,,,,,,, Computer Engineering,,2006,,,?,http://en.cnki.com.cn/Article_en/CJFDTOTAL-JSJC200622023.htm,,Design and implementation of log audit system.,,,,,,,,,,,,,,,,,,, Computer Knowledge and Technology,,2010,,,?,http://en.cnki.com.cn/Article_en/CJFDTOTAL-DNZS201022018.htm,,Research on Hadoop-based network log analysis system [J],,,,,,,,,,,,,,,,,,, Computer Software,,2010,,,?,https://www.scopus.com/inward/record.url?eid=2-s2.0-78650823768&partnerID=40&md5=1b9a991f68322a60dae7cbdebd580550,"Goto, J., Takada, H., Honda, S., Nagao, T.",Visualization tool for trace log: TraceLogVisualizer (TLV),,,,,,,,,,,,,,,,,,, "European Conference on Information Warfare and Security , ECIW",,2005,,,?,https://www.scopus.com/inward/record.url?eid=2-s2.0-84873882664&partnerID=40&md5=61294af7c57dfb7a97a38c4ae0f339eb,"Mee, V., Sutherland, I.",Windows event logs and their forensic usefulness,,,,,,,,,,,,,,,,,,, ICIC Express Letters,,2017,,,?,https://www.scopus.com/inward/record.url?eid=2-s2.0-85007294777&partnerID=40&md5=6718599dc0d7279ee7163a61d516adc0,"Chan-In, P., Wongthai, W.",Performance improvement considerations of cloud logging systems,,,,,,,,,,,,,,,,,,, ICIC Express Letters,,2016,,,?,https://www.scopus.com/inward/record.url?eid=2-s2.0-84956990947&partnerID=40&md5=348a8f8ab17a1ea48a25dc8ccb56cb20,"Wongthai, W., Van Moorsel, A.",Performance measurement of logging systems in infrastructure as a service cloud,,,,,,,,,,,,,,,,,,, "ICIC Express Letters, Part B: Applications",,2014,,,?,https://www.scopus.com/inward/record.url?eid=2-s2.0-84893206393&partnerID=40&md5=f9729ef551c2f5014c687a4196841eda,"Hong, Y.R., Kim, D.",Implementation of an advanced method for auditing web application server logs using access record,,,,,,,,,,,,,,,,,,, Indonesian Journal of Electrical 
Engineering and Computer Science,,2018,,,?,https://www.scopus.com/inward/record.url?eid=2-s2.0-85041112640&partnerID=40&md5=990f7708ad595e8b659bb742d39ddc56,"Rastogi, R., Nahata, S., Ghuli, P., Pratiba, D., Shobha, G.",Anomaly detection in log records,,,,,,,,,,,,,,,,,,, International Conference on Parallel and Distributed Computing and Systems,,2008,,,?,https://www.scopus.com/inward/record.url?eid=2-s2.0-74549200861&partnerID=40&md5=29eda1c2e41c8bfefa967b292558d31f,"Solano-Quinde, L.D., Bode, B.M.",RAS and Job log data analysis for failure prediction for the IBM Blue Gene/L,,,,,,,,,,,,,,,,,,, "International Conference on Software Engineering and Data Engineering, SEDE",,2012,,,?,https://www.scopus.com/inward/record.url?eid=2-s2.0-84871983900&partnerID=40&md5=6903a73f1bc3331a5121fadda935fa44,"Rabidas, S., Kumar, J., Das, S., Ghosh, D., Debnath, N.",Towards forensic readiness and homogeneity of operating system logs,,,,,,,,,,,,,,,,,,, International Journal of Advancements in Computing Technology,,2012,,,?,https://www.scopus.com/inward/record.url?eid=2-s2.0-84859813028&partnerID=40&md5=b7908417df7f1dfe983b4e318686acc5,"Jia, Z., Gong, Z., Wei, Z., Zhang, J., Luo, S., Xin, Y.",A distributed method on web log sequential pattern mining,,,,,,,,,,,,,,,,,,, International Journal of Embedded Systems,,2018,,,?,https://www.scopus.com/inward/record.url?eid=2-s2.0-85051002788&partnerID=40&md5=dc830f4b9ee0dc6fe5efa84a19583bc2,"Benkhelifa, E., Thomas, B.E., Tawalbeh, L., Jararweh, Y.",A framework and a process for digital forensic analysis on smart phones with multiple data logs,,,,,,,,,,,,,,,,,,, Jisuanji Gongcheng (Computer Engineering),,2002,,,?,https://www.scopus.com/inward/record.url?eid=2-s2.0-0036122397&partnerID=40&md5=dc3ca470cdf45005e06f9269eaa81eaa,"Jiang, Y., Tian, S.",Research on data mining to system log audit information in IDS,,,,,,,,,,,,,,,,,,, 
Jisuanji Gongcheng (Computer Engineering),,2002,,,?,https://www.scopus.com/inward/record.url?eid=2-s2.0-0036619061&partnerID=40&md5=b21162adf72df5ba62912708d74dc835,"Li, C., Wang, W., Cheng, L., Wang, W., Li, J.",Study and implementation of network security audit system based on firewall log,,,,,,,,,,,,,,,,,,, Jisuanji Gongcheng (Computer Engineering),,2003,,,?,https://www.scopus.com/inward/record.url?eid=2-s2.0-0242657322&partnerID=40&md5=bada22384462244fc8602468a7c1cd88,"Lin, H., Dou, M.",Secure defense of system log,,,,,,,,,,,,,,,,,,, Jisuanji Xuebao (Chinese Journal of Computers),,2014,,,?,https://www.scopus.com/inward/record.url?eid=2-s2.0-84893571517&partnerID=40&md5=9675448fad7f15ac075cc159cd00b951,"Gao, Y., Zhou, W., Han, J.-Z., Meng, D.",An online log anomaly detection method based on grammar compression,,,,,,,,,,,,,,,,,,, Jisuanji Yanjiu yu Fazhan (Computer Research and Development),,2013,,,?,https://www.scopus.com/inward/record.url?eid=2-s2.0-84877297264&partnerID=40&md5=dd8d675d723abe3ccded8f674e16b08d,"Wang, N., Han, J., Fang, J.",A log anomaly detection algorithm for debugging based on grammar-based codes,,,,,,,,,,,,,,,,,,, Journal of Hebei University (Natural Science Edition),,2005,,,?,http://en.cnki.com.cn/Article_en/CJFDTOTAL-HBDD200502019.htm,,Data Preparation for Web Log Mining [J],,,,,,,,,,,,,,,,,,, ,,2005,,,?,http://en.cnki.com.cn/Article_en/CJFDTotal-BJGY200504007.htm,,Data preprocess in Web log mining [J],,,,,,,,,,,,,,,,,,, ,,2006,,,?,http://en.cnki.com.cn/Article_en/CJFDTOTAL-CQSF200604010.htm,,Research on Application of Web Log Mining Technology [J],,,,,,,,,,,,,,,,,,, ,,2002,,,?,http://en.cnki.com.cn/Article_en/CJFDTOTAL-JSJC200207108.htm,,Research and Application of Web Log Mining [J],,,,,,,,,,,,,,,,,,, ,,2004,,,?,http://en.cnki.com.cn/Article_en/CJFDTOTAL-JSJY200406019.htm,,An information mining method on web log,,,,,,,,,,,,,,,,,,, ,,2006,,,?,http://en.cnki.com.cn/Article_en/CJFDTOTAL-JSJZ200610028.htm,,A Distributed Web Log Mining System Based on Mobile Agent Technology 
[J],,,,,,,,,,,,,,,,,,, ,,2007,,,?,http://en.cnki.com.cn/Article_en/CJFDTOTAL-SJSJ200710034.htm,,Research on data preprocessing technology in web log mining,,,,,,,,,,,,,,,,,,, ,,2008,,,?,http://en.cnki.com.cn/Article_en/CJFDTotal-TYGY200802002.htm,,Research on Method for Session Identification in Web Log Mining,,,,,,,,,,,,,,,,,,, ,,2007,,,?,http://en.cnki.com.cn/Article_en/CJFDTOTAL-WJFZ200708005.htm,,Research on Data Preprocessing Technology in Web Log Mining [J],,,,,,,,,,,,,,,,,,, ,,2009,,,?,http://en.cnki.com.cn/Article_en/CJFDTOTAL-WJFZ200911038.htm,,Research and Design Web-Based Security Audit Log System [J],,,,,,,,,,,,,,,,,,, ,,2004,,,?,http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/abstract/abstract1414.shtml,,The Function of Log Analysis in Network Security,,,,,,,,,,,,,,,,,,, International Conference on Automated Software Engineering,ASE,2019,research track paper,,A,https://ieeexplore.ieee.org/abstract/document/8952375?,"Ren, Z., Liu, C., Xiao, X., Jiang, H., & Xie, T.",Root cause localization for unreproducible builds via causality analysis over system call tracing,"Localization of the root causes for unreproducible builds during software maintenance is an important yet challenging task, primarily due to limited runtime traces from build processes and high diversity of build environments. To address these challenges, in this paper, we propose RepTrace, a framework that leverages the uniform interfaces of system call tracing for monitoring executed build commands in diverse build environments and identifies the root causes for unreproducible builds by analyzing the system call traces of the executed build commands. 
Specifically, from the collected system call traces, RepTrace performs causality analysis to build a dependency graph starting from an inconsistent build artifact (across two builds) via two types of dependencies: read/write dependencies among processes and parent/child process dependencies, and searches the graph to find the processes that result in the inconsistencies. To address the challenges of massive noisy dependencies and uncertain parent/child dependencies, RepTrace includes two novel techniques: (1) using differential analysis on multiple builds to reduce the search space of read/write dependencies, and (2) computing similarity of the runtime values to filter out noisy parent/child process dependencies. The evaluation results of RepTrace over a set of real-world software packages show that RepTrace effectively finds not only the root cause commands responsible for the unreproducible builds, but also the files to patch for addressing the unreproducible issues. Among its Top-10 identified commands and files, RepTrace achieves high accuracy rate of 90.00% and 90.56% in identifying the root causes, respectively.","we propose RepTrace, a framework that leverages the uniform interfaces of system call tracing for monitoring executed build commands in diverse build environments and identifies the root causes for unreproducible builds by analyzing the system call traces of the executed build commands.","Localization of the root causes for unreproducible builds during software maintenance is an important yet challenging task, primarily due to limited runtime traces from build processes and high diversity of build environments.","Specifically, from the collected system call traces, RepTrace performs causality analysis to build a dependency graph starting from an inconsistent build artifact (across two builds) via two types of dependencies: read/write dependencies among processes and parent/child process dependencies, and searches the graph to find the processes that 
result in the inconsistencies. To address the challenges of massive noisy dependencies and uncertain parent/child dependencies, RepTrace includes two novel techniques: (1) using differential analysis on multiple builds to reduce the search space of read/write dependencies, and (2) computing similarity of the runtime values to filter out noisy parent/child process dependencies.","The evaluation results of RepTrace over a set of real-world software packages show that RepTrace effectively finds not only the root cause commands responsible for the unreproducible builds, but also the files to patch for addressing the unreproducible issues. Among its Top-10 identified commands and files, RepTrace achieves high accuracy rate of 90.00% and 90.56% in identifying the root causes, respectively.",ren_2019,,,,,,,,1,,,,, Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering,FSE,2019,research track paper,,A*,https://dl.acm.org/doi/abs/10.1145/3338906.3338931,"Zhang, X., Xu, Y., Lin, Q., Qiao, B., Zhang, H., Dang, Y., ... & Chen, J.",Robust log-based anomaly detection on unstable log data,"Logs are widely used by large and complex software-intensive systems for troubleshooting. There have been a lot of studies on log-based anomaly detection. To detect the anomalies, the existing methods mainly construct a detection model using log event data extracted from historical logs. However, we find that the existing methods do not work well in practice. These methods have the close-world assumption, which assumes that the log data is stable over time and the set of distinct log events is known. However, our empirical study shows that in practice, log data often contains previously unseen log events or log sequences. The instability of log data comes from two sources: 1) the evolution of logging statements, and 2) the processing noise in log data. 
In this paper, we propose a new log-based anomaly detection approach, called LogRobust. LogRobust extracts semantic information of log events and represents them as semantic vectors. It then detects anomalies by utilizing an attention-based Bi-LSTM model, which has the ability to capture the contextual information in the log sequences and automatically learn the importance of different log events. In this way, LogRobust is able to identify and handle unstable log events and sequences. We have evaluated LogRobust using logs collected from the Hadoop system and an actual online service system of Microsoft. The experimental results show that the proposed approach can well address the problem of log instability and achieve accurate and robust results on real-world, ever-changing log data.","In this paper, we propose a new log-based anomaly detection approach, called LogRobust.","Logs are widely used by large and complex software-intensive systems for troubleshooting. There have been a lot of studies on log-based anomaly detection. To detect the anomalies, the existing methods mainly construct a detection model using log event data extracted from historical logs. However, we find that the existing methods do not work well in practice. These methods have the close-world assumption, which assumes that the log data is stable over time and the set of distinct log events is known. However, our empirical study shows that in practice, log data often contains previously unseen log events or log sequences. The instability of log data comes from two sources: 1) the evolution of logging statements, and 2) the processing noise in log data. ","LogRobust extracts semantic information of log events and represents them as semantic vectors. It then detects anomalies by utilizing an attention-based Bi-LSTM model, which has the ability to capture the contextual information in the log sequences and automatically learn the importance of different log events. 
In this way, LogRobust is able to identify and handle unstable log events and sequences. We have evaluated LogRobust using logs collected from the Hadoop system and an actual online service system of Microsoft.","The experimental results show that the proposed approach can well address the problem of log instability and achieve accurate and robust results on real-world, ever-changing log data.",zhang_2019,,,,,,1,,,,,,, International Conference on Software Engineering,ICSE,2019,research track paper,,A*,https://ieeexplore.ieee.org/abstract/document/8811945,"Li, Z., Chen, T. H., Yang, J., & Shang, W.",DLFinder: Characterizing and detecting duplicate logging code smells,"Developers rely on software logs for a wide variety of tasks, such as debugging, testing, program comprehension, verification, and performance analysis. Despite the importance of logs, prior studies show that there is no industrial standard on how to write logging statements. Recent research on logs often only considers the appropriateness of a log as an individual item (e.g., one single logging statement); while logs are typically analyzed in tandem. In this paper, we focus on studying duplicate logging statements, which are logging statements that have the same static text message. Such duplications in the text message are potential indications of logging code smells, which may affect developers' understanding of the dynamic view of the system. We manually studied over 3K duplicate logging statements and their surrounding code in four large-scale open source systems: Hadoop, CloudStack, ElasticSearch, and Cassandra. We uncovered five patterns of duplicate logging code smells. For each instance of the code smell, we further manually identify the problematic (i.e., require fixes) and justifiable (i.e., do not require fixes) cases. Then, we contact developers in order to verify our manual study result. 
We integrated our manual study result and developers' feedback into our automated static analysis tool, DLFinder, which automatically detects problematic duplicate logging code smells. We evaluated DLFinder on the four manually studied systems and two additional systems: Camel and Wicket. In total, combining the results of DLFinder and our manual analysis, we reported 82 problematic code smell instances to developers and all of them have been fixed.","In this paper, we focus on studying duplicate logging statements, which are logging statements that have the same static text message. Such duplications in the text message are potential indications of logging code smells, which may affect developers' understanding of the dynamic view of the system.","Developers rely on software logs for a wide variety of tasks, such as debugging, testing, program comprehension, verification, and performance analysis. Despite the importance of logs, prior studies show that there is no industrial standard on how to write logging statements. Recent research on logs often only considers the appropriateness of a log as an individual item (e.g., one single logging statement); while logs are typically analyzed in tandem.","We manually studied over 3K duplicate logging statements and their surrounding code in four large-scale open source systems: Hadoop, CloudStack, ElasticSearch, and Cassandra. We uncovered five patterns of duplicate logging code smells. For each instance of the code smell, we further manually identify the problematic (i.e., require fixes) and justifiable (i.e., do not require fixes) cases. Then, we contact developers in order to verify our manual study result. We integrated our manual study result and developers' feedback into our automated static analysis tool, DLFinder, which automatically detects problematic duplicate logging code smells. 
We evaluated DLFinder on the four manually studied systems and two additional systems: Camel and Wicket.","In total, combining the results of DLFinder and our manual analysis, we reported 82 problematic code smell instances to developers and all of them have been fixed.",li_2019,,,1,,,,,,,,,, Empirical Software Engineering,,2019,journal,,A,https://link.springer.com/article/10.1007/s10664-019-09687-9,"Zeng, Y., Chen, J., Shang, W., & Chen, T. H. P.",Studying the characteristics of logging practices in mobile apps: a case study on f-droid,"Logging is a common practice in software engineering. Prior research has investigated the characteristics of logging practices in system software (e.g., web servers or databases) as well as desktop applications. However, despite the popularity of mobile apps, little is known about their logging practices. In this paper, we sought to study logging practices in mobile apps. In particular, we conduct a case study on 1,444 open source Android apps in the F-Droid repository. Through a quantitative study, we find that although mobile app logging is less pervasive than server and desktop applications, logging is leveraged in almost all studied apps. However, we find that there exist considerable differences between the logging practices of mobile apps and the logging practices in server and desktop applications observed by prior studies. In order to further understand such differences, we conduct a firehouse email interview and a qualitative annotation on the rationale of using logs in mobile app development. By comparing the logging level of each logging statement with developers’ rationale of using the logs, we find that all too often (35.4%), the chosen logging level and the rationale are inconsistent. Such inconsistency may prevent the useful runtime information to be recorded or may generate unnecessary logs that may cause performance overhead. 
Finally, to understand the magnitude of such performance overhead, we conduct a performance evaluation between generating all the logs and not generating any logs in eight mobile apps. In general, we observe a statistically significant performance overhead based on various performance metrics (response time, CPU and battery consumption). In addition, we find that if the performance overhead of logging is significantly observed in an app, disabling the unnecessary logs indeed provides a statistically significant performance improvement. Our results show the need for a systematic guidance and automated tool support to assist in mobile logging practices.","we conduct a case study on 1,444 open source Android apps in the F-Droid repository. ","Prior research has investigated the characteristics of logging practices in system software (e.g., web servers or databases) as well as desktop applications. However, despite the popularity of mobile apps, little is known about their logging practices.",,,zeng_2019,1,,,,,,,,,,,, International Conference on Software Maintenance and Evolution,ICSME,2019,research track paper,,A,https://ieeexplore.ieee.org/abstract/document/8919133,"Zhi, Chen, Jianwei Yin, Shuiguang Deng, Maoxin Ye, Min Fu, and Tao Xie",An exploratory study of logging configuration practice in Java,"Logging components are an integral element of software systems. These logging components receive the logging requests generated by the logging code and process these requests according to logging configurations. Logging configurations play an important role on the functionality, performance, and reliability of logging. Although recent research has been conducted to understand and improve current practice on logging code, no existing research focuses on logging configurations. To fill this gap, we conduct an exploratory study on logging configuration practice of 10 open-source projects and 10 industrial projects written in Java in various sizes and domains. 
We quantitatively show how logging configurations are used with respect to logging management, storage, and formatting. We categorize and analyze the change history (1,213 revisions) of logging configurations to understand how the logging configurations evolve. Based on these study results, we reveal 10 findings about current practice of logging configurations. As a proof of concept, we develop a simple detector based on some of our findings. We apply our detector on three popular open-source projects and identify three long-lived issues (more than two years). All these issues are confirmed and two of them have been fixed by the open-source developers.","To fill this gap, we conduct an exploratory study on logging configuration practice of 10 open-source projects and 10 industrial projects written in Java in various sizes and domains.","Although recent research has been conducted to understand and improve current practice on logging code, no existing research focuses on logging configurations. ",,,zhi_2019,1,,,,,,,,,,,, Empirical Software Engineering,,2019,journal,,A,https://link.springer.com/article/10.1007/s10664-019-09757-y,"Li, Shanshan, Xu Niu, Zhouyang Jia, Xiangke Liao, Ji Wang, and Tao Li",Guiding log revisions by learning from software evolution history,"Despite the importance of log statements in postmortem debugging, developers find it difficult to establish good logging practices. There are mainly two reasons. First, there are no rigorous specifications or systematic processes to instruct logging practices. Second, logging code evolves with bug fixes or feature updates. Without considering the impact of software evolution, previous works on log enhancement can partially relieve the first problem but are hard to solve the latter. To fill this gap, this paper proposes to guide log revisions by learning from evolution history. 
Motivated by code clones, we assume that logging code with similar context is pervasive and deserves similar modifications and conduct an empirical study on 12 open-source projects to validate our assumption. Upon this, we design and implement LogTracker, an automatic tool that learns log revision rules by mining the correlation between logging context and modifications and recommends candidate log revisions by applying these rules. With an enhanced modeling of logging context, LogTracker can instruct more intricate log revisions that cannot be covered by existing tools. Our experiments show that LogTracker can detect 369 instances of candidates when applied to the latest versions of software. So far, we have reported 79 of them, and 52 have been accepted.","To fill this gap, this paper proposes to guide log revisions by learning from evolution history. ","Despite the importance of log statements in postmortem debugging, developers find it difficult to establish good logging practices. There are mainly two reasons. First, there are no rigorous specifications or systematic processes to instruct logging practices. Second, logging code evolves with bug fixes or feature updates. Without considering the impact of software evolution, previous works on log enhancement can partially relieve the first problem but are hard to solve the latter. ","Motivated by code clones, we assume that logging code with similar context is pervasive and deserves similar modifications and conduct an empirical study on 12 open-source projects to validate our assumption. Upon this, we design and implement LogTracker, an automatic tool that learns log revision rules by mining the correlation between logging context and modifications and recommends candidate log revisions by applying these rules. With an enhanced modeling of logging context, LogTracker can instruct more intricate log revisions that cannot be covered by existing tools. 
","Our experiments show that LogTracker can detect 369 instances of candidates when applied to the latest versions of software. So far, we have reported 79 of them, and 52 have been accepted.",li_2019_emse,,,1,,,,,,,,,, IEEE Transactions on Software Engineering,,2019,journal,,A*,https://ieeexplore.ieee.org/abstract/document/8840982,"Liu, Zhongxin, Xin Xia, David Lo, Zhenchang Xing, Ahmed E. Hassan, and Shanping Li",Which variables should I log?,"Developers usually depend on inserting logging statements into the source code to collect system runtime information. Such logged information is valuable for software maintenance. A logging statement usually prints one or more variables to record vital system status. However, due to the lack of rigorous logging guidance and the requirement of domain-specific knowledge, it is not easy for developers to make proper decisions about which variables to log. To address this need, in this work, we propose an approach to recommend logging variables for developers during development by learning from existing logging statements. Different from other prediction tasks in software engineering, this task has two challenges: 1) Dynamic labels -- different logging statements have different sets of accessible variables, which means in this task, the set of possible labels of each sample is not the same. 2) Out-of-vocabulary words -- identifiers' names are not limited to natural language words and the test set usually contains a number of program tokens which are out of the vocabulary built from the training set and cannot be appropriately mapped to word embeddings. To deal with the first challenge, we convert this task into a representation learning problem instead of a multi-label classification problem. 
Given a code snippet which lacks a logging statement, our approach first leverages a neural network with an RNN (recurrent neural network) layer and a self-attention layer to learn the proper representation of each program token, and then predicts whether each token should be logged through a unified binary classifier based on the learned representation. To handle the second challenge, we propose a novel method to map program tokens into word embeddings by making use of the pre-trained word embeddings of natural language tokens. We evaluate our approach on 9 large and high-quality Java projects. Our evaluation results show that the average MAP of our approach is over 0.84, outperforming random guess and an information-retrieval-based method by large margins.",we propose an approach to recommend logging variables for developers during development by learning from existing logging statements.,"However, due to the lack of rigorous logging guidance and the requirement of domain-specific knowledge, it is not easy for developers to make proper decisions about which variables to log.","Different from other prediction tasks in software engineering, this task has two challenges: 1) Dynamic labels -- different logging statements have different sets of accessible variables, which means in this task, the set of possible labels of each sample is not the same. 2) Out-of-vocabulary words -- identifiers' names are not limited to natural language words and the test set usually contains a number of program tokens which are out of the vocabulary built from the training set and cannot be appropriately mapped to word embeddings. To deal with the first challenge, we convert this task into a representation learning problem instead of a multi-label classification problem. 
Given a code snippet which lacks a logging statement, our approach first leverages a neural network with an RNN (recurrent neural network) layer and a self-attention layer to learn the proper representation of each program token, and then predicts whether each token should be logged through a unified binary classifier based on the learned representation. To handle the second challenge, we propose a novel method to map program tokens into word embeddings by making use of the pre-trained word embeddings of natural language tokens. ","We evaluate our approach on 9 large and high-quality Java projects. Our evaluation results show that the average MAP of our approach is over 0.84, outperforming random guess and an information-retrieval-based method by large margins.",liu_2019,,,1,,,,,,,,,, International Conference on Software Maintenance and Evolution,ICSME,2019,research track paper,,A,https://ieeexplore.ieee.org/abstract/document/8919094,"Anu, Han, Jie Chen, Wenchang Shi, Jianwei Hou, Bin Liang, and Bo Qin",An Approach to Recommendation of Verbosity Log Levels Based on Logging Intention,"Verbosity levels of logs are designed to discriminate highly diverse runtime events, which facilitates system failure identification through simple keyword search (e.g., fatal, error). Verbosity levels should be properly assigned to logging statements, as inappropriate verbosity levels would confuse users and cause a lot of redundant maintenance effort. However, to achieve such a goal is not an easy task due to the lack of practical specifications and guidelines towards verbosity log level usages. The existing research has built a classification model on log related quantitative metrics such as log density to improve logging level practice. Though such quantitative metrics can reveal logging characteristics, their contributions on logging level decision are limited, since valuable logging intention information buried in logging code context can not be captured. 
In this paper, we propose an automatic approach to help developers determine the appropriate verbosity log levels. More specifically, our approach discriminates different verbosity log level usages based on code context features that contain underlying logging intention. To validate our approach, we implement a prototype tool, VerbosityLevelDirector, and perform a case study to measure its effectiveness on four well-known open source software projects. Evaluation results show that VerbosityLevelDirector achieves high performance on verbosity level discrimination and outperforms the baseline approaches on all those projects. Furthermore, through applying noise handling technique, our approach can detect previously unknown inappropriate verbosity level configurations in the code repository. We have reported 21 representative logging level errors with modification advice to issue tracking platforms of the examined software projects and received positive feedback from their developers. The above results confirm that our work can help developers make a better logging level decision in real-world engineering.",we propose an automatic approach to help developers determine the appropriate verbosity log levels.,"Verbosity levels should be properly assigned to logging statements, as inappropriate verbosity levels would confuse users and cause a lot of redundant maintenance effort. However, to achieve such a goal is not an easy task due to the lack of practical specifications and guidelines towards verbosity log level usages. The existing research has built a classification model on log related quantitative metrics such as log density to improve logging level practice. 
Though such quantitative metrics can reveal logging characteristics, their contributions on logging level decision are limited, since valuable logging intention information buried in logging code context cannot be captured.","More specifically, our approach discriminates different verbosity log level usages based on code context features that contain underlying logging intention.","Evaluation results show that VerbosityLevelDirector achieves high performance on verbosity level discrimination and outperforms the baseline approaches on all those projects. Furthermore, through applying noise handling technique, our approach can detect previously unknown inappropriate verbosity level configurations in the code repository. We have reported 21 representative logging level errors with modification advice to issue tracking platforms of the examined software projects and received positive feedback from their developers. The above results confirm that our work can help developers make a better logging level decision in real-world engineering.",anu_2019,,,1,,,,,,,,,, International Joint Conference on Artificial Intelligence,IJCAI,2019,research track paper,,A*,https://www.researchgate.net/publication/334843942_LogAnomaly_Unsupervised_Detection_of_Sequential_and_Quantitative_Anomalies_in_Unstructured_Logs,"Meng, Weibin, Ying Liu, Yichen Zhu, Shenglin Zhang, Dan Pei, Yuqing Liu, Yihao Chen et al",LogAnomaly: Unsupervised Detection of Sequential and Quantitative Anomalies in Unstructured Logs,"Recording runtime status via logs is common for almost every computer system, and detecting anomalies in logs is crucial for timely identifying malfunctions of systems. However, manually detecting anomalies for logs is time-consuming, error-prone, and infeasible. Existing automatic log anomaly detection approaches, using indexes rather than semantics of log templates, tend to cause false alarms. 
In this work, we propose LogAnomaly, a framework to model an unstructured log stream as a natural language sequence. Empowered by template2vec, a novel, simple yet effective method to extract the semantic information hidden in log templates, LogAnomaly can detect both sequential and quantitative log anomalies simultaneously, which were not done by any previous work. Moreover, LogAnomaly can avoid the false alarms caused by the newly appearing log templates between periodic model retrainings. Our evaluation on two public production log datasets show that LogAnomaly outperforms existing log-based anomaly detection methods.",,,,,meng_2019,,,,,,1,,,,,,, International Conference on Software Engineering,ICSE,2019,industry track paper,,A*,https://dl.acm.org/doi/10.1109/ICSE-SEIP.2019.00021,"Zhu, J., He, S., Liu, J., He, P., Xie, Q., Zheng, Z., & Lyu, M. R.",Tools and benchmarks for automated log parsing,"Logs are imperative in the development and maintenance process of many software systems. They record detailed runtime information that allows developers and support engineers to monitor their systems and dissect anomalous behaviors and errors. The increasing scale and complexity of modern software systems, however, make the volume of logs explodes. In many cases, the traditional way of manual log inspection becomes impractical. Many recent studies, as well as industrial tools, resort to powerful text search and machine learning-based analytics solutions. Due to the unstructured nature of logs, a first crucial step is to parse log messages into structured data for subsequent analysis. In recent years, automated log parsing has been widely studied in both academia and industry, producing a series of log parsers by different techniques. To better understand the characteristics of these log parsers, in this paper, we present a comprehensive evaluation study on automated log parsing and further release the tools and benchmarks for easy reuse. 
More specifically, we evaluate 13 log parsers on a total of 16 log datasets spanning distributed systems, supercomputers, operating systems, mobile systems, server applications, and standalone software. We report the benchmarking results in terms of accuracy, robustness, and efficiency, which are of practical importance when deploying automated log parsing in production. We also share the success stories and lessons learned in an industrial application at Huawei. We believe that our work could serve as the basis and provide valuable guidance to future research and deployment of automated log parsing.",,,,,zhu_2019,,,,1,,,,,,,,, International Conference on Data Engineering,ICDE,2019,research track paper,,A*,https://ieeexplore.ieee.org/abstract/document/8731527,"Agrawal, Amey, Rohit Karlupia, and Rajat Gupta",Logan: A distributed online log parser,"Logs serve as a critical tool for debugging and monitoring applications. However, gaining insights from unstructured logs is difficult. Hence, many log management and analysis applications first parse logs into structured templates. In this paper, we train a data-driven log parser on our new Apache Spark dataset, the largest application log dataset yet. We implement a distributed online algorithm to accommodate for the large volume of data. We also devise a new metric for evaluation of parsers when labeled data is unavailable. We show that our method generalizes over diverse datasets without any parameter tuning or domain-specific inputs from the user. When evaluated on publicly available HDFS dataset our method performs 13x faster than the previous state-of-the-art.",,,,,agrawal_2019,,,,1,,,,,,,,, International Conference on Automated Software Engineering,ASE,2019,research track paper,,A,https://ieeexplore.ieee.org/abstract/document/8952406,"Liu, J., Zhu, J., He, S., He, P., Zheng, Z., & Lyu, M. 
R.",Logzip: extracting hidden structures via iterative clustering for log compression,"System logs record detailed runtime information of software systems and are used as the main data source for many tasks around software engineering. As modern software systems are evolving into large scale and complex structures, logs have become one type of fast-growing big data in industry. In particular, such logs often need to be stored for a long time in practice (e.g., a year), in order to analyze recurrent problems or track security issues. However, archiving logs consumes a large amount of storage space and computing resources, which in turn incurs high operational cost. Data compression is essential to reduce the cost of log storage. Traditional compression tools (e.g., gzip) work well for general texts, but are not tailored for system logs. In this paper, we propose a novel and effective log compression method, namely logzip. Logzip is capable of extracting hidden structures from raw logs via fast iterative clustering and further generating coherent intermediate representations that allow for more effective compression. We evaluate logzip on five large log datasets of different system types, with a total of 63.6 GB in size. The results show that logzip can save about half of the storage space on average over traditional compression tools. Meanwhile, the design of logzip is highly parallel and only incurs negligible overhead. In addition, we share our industrial experience of applying logzip to Huawei's real products.",,,,,liu_2019_logzip,,,,,1,,,,,,,,