Sunnyvale, CA, United States

Saneifar H.,University of Tehran | Bonniol S.,Satin Technologies | Poncelet P.,Montpellier University | Roche M.,Montpellier University | Roche M.,IRSTEA
Intelligent Data Analysis | Year: 2015

With the development of new technologies, more and more information is stored in log files. Analyzing such logs can be very useful for the decision maker. Probably the best known example is Web log file analysis, where many efficient tools have been proposed to extract the top-k accessed pages, the best users, or even the patterns describing the behavior of users on a Web site. These tools take advantage of the well-formed structure of the data. Unfortunately, log files from the industrial world have very heterogeneous and complex structures (e.g., tables, lists, data blocks). For experts, analyzing logs to find messages that help to better understand the causes of a failure, to determine whether a problem has already occurred in the past, or to know the main consequences of a failure is a hard, tedious, time-consuming and error-prone task. There is thus a need for new tools that help experts easily recognize the appropriate parts of logs. Passage retrieval methods have proved to be very useful for extracting relevant parts of documents. In this paper we propose a new approach for automatically splitting log files into relevant segments based on their logical units. We characterize the complex logical units found in logs according to their syntactic characteristics. We also introduce the notion of generalized vs-grams, which is used to automatically extract the syntactic characteristics of special structures found in log files. Experiments conducted on real datasets from the industrial world demonstrate the efficiency of our proposal on the recognition of complex logical units. © 2015 - IOS Press and the authors. All rights reserved.
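
The generalized vs-grams used to capture these syntactic characteristics are defined in the paper itself and are not reproduced here. As a minimal sketch of the general idea, assuming a logical unit can be approximated by consecutive lines that share the same coarse character-class shape, a log could be segmented as follows (the signature rules and the merge threshold are illustrative assumptions, not the authors' method):

import re

def line_signature(line):
    # Reduce a line to a coarse syntactic signature: runs of letters -> "A",
    # runs of digits -> "9", whitespace collapsed; punctuation kept as-is.
    sig = re.sub(r"[A-Za-z]+", "A", line)
    sig = re.sub(r"\d+", "9", sig)
    return re.sub(r"\s+", " ", sig).strip()

def split_into_units(lines, min_block=2):
    # Group consecutive lines sharing the same signature into one logical unit.
    units, current, current_sig = [], [], None
    for line in lines:
        sig = line_signature(line)
        if current and sig != current_sig:
            units.append(current)
            current = []
        current.append(line)
        current_sig = sig
    if current:
        units.append(current)
    # Attach very short blocks (e.g. isolated header lines) to the unit
    # that precedes them so they stay with the structure they introduce.
    merged = []
    for block in units:
        if merged and len(block) < min_block:
            merged[-1].extend(block)
        else:
            merged.append(block)
    return merged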


Saneifar H.,Montpellier University | Saneifar H.,Satin Technologies | Bonniol S.,Satin Technologies | Laurent A.,Montpellier University | And 2 more authors.
Communications in Computer and Information Science | Year: 2011

In many application areas, systems report the events that occur in a kind of textual data usually called log files. Log files report the status of systems, products, or even the causes of problems that can occur. The information extracted from the log files of computing systems can be considered one of the important resources of information systems. Log files are considered a kind of "complex textual data", i.e. multi-source, heterogeneous, and multi-format data. In this paper, we aim in particular at exploring the lexical structure of these log files in order to extract the terms used in them. These terms will be used in building a domain ontology and also in enriching the features of the log file corpus. Given the features of such textual data, applying classical information extraction methods is not an easy task, particularly for terminology extraction. Here, we introduce a newly developed version of Exterlog, our approach to extracting terminology from log files, which is guided by the Web to evaluate the extracted terms. We score the extracted terms with a Web- and context-based measure. We favor the most relevant terms of the domain and improve precision by filtering terms based on their scores. The experiments show that Exterlog is a well-adapted approach for terminology extraction from log files. © 2011 Springer-Verlag.
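
As an illustrative sketch only: the ranking below scores two-word candidates with a plain Dice association measure computed from corpus counts, standing in for the Web- and context-based measure that Exterlog actually uses (which relies on search engine results and is defined in the paper):

from collections import Counter
from itertools import tee

def bigrams(tokens):
    # Consecutive token pairs, e.g. ["clock", "skew", "error"] ->
    # ("clock", "skew"), ("skew", "error").
    a, b = tee(tokens)
    next(b, None)
    return zip(a, b)

def rank_candidate_terms(corpus_lines):
    # Count unigrams and bigrams over the corpus, then score each bigram
    # with the Dice coefficient 2*f(w1,w2) / (f(w1) + f(w2)).
    unigram, bigram = Counter(), Counter()
    for line in corpus_lines:
        tokens = [t.lower() for t in line.split() if t.isalpha()]
        unigram.update(tokens)
        bigram.update(bigrams(tokens))
    scores = {pair: 2.0 * f12 / (unigram[pair[0]] + unigram[pair[1]])
              for pair, f12 in bigram.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)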


Saneifar H.,Montpellier University | Saneifar H.,Satin Technologies | Bonniol S.,Satin Technologies | Laurent A.,Montpellier University | And 2 more authors.
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | Year: 2010

Question answering systems are considered the next generation of search engines. This paper focuses on the first step of this process, which is to search for relevant passages containing answers. Passage retrieval can be difficult because of the complexity of the data, log files in our case. Our contribution is based on the enrichment of queries by using a learning method and a novel term weighting function. This original term weighting function, used within the enrichment process, aims to assign a weight to terms according to their relatedness to the context of answers. Experiments conducted on real data show that our protocol of primitive query enrichment makes it possible to retrieve relevant passages. © 2010 Springer-Verlag Berlin Heidelberg.
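
A minimal sketch of query enrichment by relevance feedback, assuming a toy overlap-based retriever and a generic tf-idf weighting of feedback terms; the learning method and the term weighting function proposed in the paper are not reproduced here:

import math
from collections import Counter

def retrieve(passages, query_terms, k=5):
    # Rank passages by raw term overlap with the (lower-cased) query terms;
    # a stand-in for a real retrieval engine.
    def overlap(p):
        tokens = p.lower().split()
        return sum(tokens.count(t) for t in query_terms)
    return sorted(passages, key=overlap, reverse=True)[:k]

def enrich_query(passages, query_terms, n_new=3):
    # Take the top-ranked passages as feedback and add the terms that
    # score highest under a simple tf-idf weight over those passages.
    top = retrieve(passages, query_terms)
    df = Counter()
    for p in passages:
        df.update(set(p.lower().split()))
    tf = Counter(t for p in top for t in p.lower().split())
    weight = {t: tf[t] * math.log(len(passages) / df[t])
              for t in tf if t not in query_terms}
    expansion = sorted(weight, key=weight.get, reverse=True)[:n_new]
    return list(query_terms) + expansion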


Saneifar H.,Montpellier University | Saneifar H.,Satin Technologies | Bonniol S.,Satin Technologies | Poncelet P.,Montpellier University | And 2 more authors.
Computers in Industry | Year: 2014

Passage retrieval is usually defined as the task of searching for passages which may contain the answer to a given query. While these approaches are very efficient when dealing with texts, applied to log files (i.e. semi-structured data containing both numerical and symbolic information) they usually provide irrelevant or useless results. Nevertheless, one appealing way to improve the results is to consider query expansion, which aims at automatically or semi-automatically adding information to the query in order to improve the reliability and accuracy of the returned results. In this paper, we present a new approach for enhancing the relevancy of queries during passage retrieval in log files. It is based on two relevance feedback steps. In the first one, we determine the explicit relevance feedback by identifying the context of the requested information within a learning process. The second step is a new kind of pseudo relevance feedback. Based on a novel term weighting measure, it assigns a weight to terms according to their relatedness to the query. This measure, called TRQ (Term Relatedness to Query), is used to identify the most relevant expansion terms. The main advantage of our approach is that it can be applied both to log files and to documents from general domains. Experiments conducted on real data from logs and documents show that our query expansion protocol enables the retrieval of relevant passages. © 2014 Elsevier B.V.
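
The exact TRQ formula is given in the paper; the sketch below only approximates its spirit by weighting each candidate term by how often it co-occurs with query terms in the feedback passages, normalised by its overall frequency:

from collections import Counter

def trq_like_weight(feedback_passages, query_terms):
    # Approximation, not the published TRQ measure: a candidate term gets
    # weight cooc(t) / freq(t), where cooc counts occurrences of t in
    # passages that also contain at least one query term.
    qset = {t.lower() for t in query_terms}
    cooc, freq = Counter(), Counter()
    for passage in feedback_passages:
        tokens = [t.lower() for t in passage.split()]
        has_query_term = any(t in qset for t in tokens)
        for t in tokens:
            if t in qset:
                continue
            freq[t] += 1
            if has_query_term:
                cooc[t] += 1
    return {t: cooc[t] / freq[t] for t in freq}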


Saneifar H.,Montpellier University | Saneifar H.,Satin Technologies | Bonniol S.,Satin Technologies | Laurent A.,Montpellier University | And 2 more authors.
CORIA 2010: Actes de la Conférence en Recherche d'Information et Applications - Proceedings of the Conference on Information Retrieval and Applications | Year: 2010

Question answering systems are considered the next generation of search engines. This paper focuses on the first step of this process, which is to search for relevant passages containing the answers. Such a task can be difficult because of the complexity of the data, log files in our case. Our contribution is based on the enrichment of queries using a learning method built on the notion of "lexical world" and a novel term weighting function. This original weighting function, implemented within the enrichment process, aims to assign a high weight to terms that are likely related to the context of the answer. Experiments conducted on real data show that our protocol of primitive query enrichment makes it possible to extract relevant passages.
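
A rough, hypothetical reading of the "lexical world" notion, assuming it can be approximated by the terms that most often appear within a small window around a keyword in passages known to contain answers (window size and counting scheme are assumptions, not the paper's definition):

from collections import Counter

def lexical_world(training_passages, keyword, window=5, top_n=10):
    # Collect the terms seen within +/- `window` tokens of `keyword`
    # and return the most frequent ones.
    keyword = keyword.lower()
    neighbours = Counter()
    for passage in training_passages:
        tokens = [t.lower() for t in passage.split()]
        for i, t in enumerate(tokens):
            if t == keyword:
                lo, hi = max(0, i - window), i + window + 1
                neighbours.update(w for w in tokens[lo:hi] if w != keyword)
    return [w for w, _ in neighbours.most_common(top_n)]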


Saneifar H.,Montpellier University | Saneifar H.,Satin Technologies | Bonniol S.,Satin Technologies | Poncelet P.,Montpellier University | And 2 more authors.
Journal of Universal Computer Science | Year: 2015

Log files generated by computational systems contain relevant and essential information. In some application areas, such as the design of integrated circuits, log files generated by design tools contain information which can be used in management information systems to evaluate the final products. However, the complexity of such textual data raises some challenges concerning the extraction of information from log files. Log files are usually multi-source, multi-format, and have a heterogeneous and evolving structure. Moreover, they usually do not respect natural language grammar and structures even though they are written in English. Classical methods of information extraction, such as terminology extraction methods, are therefore ill-suited to this context. In this paper, we introduce our approach Exterlog to extract terminology from log files. We detail how it deals with the specific features of such textual data. Performance is improved by favoring the most relevant terms of the domain based on a scoring function which uses a Web- and context-based measure. The experiments show that Exterlog is a well-adapted approach for terminology extraction from log files. © J.UCS.
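
One concrete difficulty alluded to here is that log lines mix the symbolic terms of interest with volatile tokens (hexadecimal addresses, timestamps, paths, counters). A minimal, assumed pre-processing step, not taken from the paper, is to mask such tokens before term extraction so that candidate terms are not fragmented by them:

import re

# Illustrative normalisation rules; Exterlog's actual pre-processing is
# described in the paper and may differ. Rule order matters (timestamps
# must be masked before plain numbers, for instance).
RULES = [
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<HEX>"),
    (re.compile(r"\b\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}\b"), "<TIMESTAMP>"),
    (re.compile(r"(/[\w.\-]+)+"), "<PATH>"),
    (re.compile(r"\b\d+(\.\d+)?\b"), "<NUM>"),
]

def normalise(line):
    # Replace volatile tokens with placeholders.
    for pattern, placeholder in RULES:
        line = pattern.sub(placeholder, line)
    return line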

