Instance Filtering is a preprocessing step for supervised classification-based learning systems for entity recognition.
The goal of Instance Filtering is to reduce both the skew of the class distribution and the size of the data set by eliminating negative instances while preserving as many positive ones as possible. Filtering is applied to both the training and the test set, which reduces learning and classification time while maintaining or improving prediction accuracy.
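As a rough illustration of the idea, the sketch below removes tokens found in a stop word list from a labelled token sequence and measures how the share of negative instances changes. It is not jInFil's API: the class `StopWordFilterSketch`, the `Instance` type, the method names, the hand-written stop word list and the sample sentence are all hypothetical, and how a real filter selects its stop words is outside the scope of this sketch.

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch; none of these names come from jInFil.
class StopWordFilterSketch {

    // One token-level instance: the word and its class label ("O" = negative).
    static final class Instance {
        final String token;
        final String label;

        Instance(String token, String label) {
            this.token = token;
            this.label = label;
        }
    }

    // Keep every instance whose token is not in the stop word list.
    // The same list would be applied to both the training and the test set.
    static List<Instance> filter(List<Instance> data, Set<String> stopWords) {
        return data.stream()
                .filter(i -> !stopWords.contains(i.token.toLowerCase(Locale.ROOT)))
                .collect(Collectors.toList());
    }

    // Fraction of instances that are negative (label "O").
    static double negativeRatio(List<Instance> data) {
        if (data.isEmpty()) {
            return 0.0;
        }
        long negatives = data.stream().filter(i -> i.label.equals("O")).count();
        return (double) negatives / data.size();
    }

    public static void main(String[] args) {
        List<Instance> sentence = List.of(
                new Instance("The", "O"),
                new Instance("meeting", "O"),
                new Instance("with", "O"),
                new Instance("John", "B-PER"),
                new Instance("Smith", "I-PER"),
                new Instance("is", "O"),
                new Instance("in", "O"),
                new Instance("Trento", "B-LOC"));
        // Assumed stop word list, written by hand for this example only.
        Set<String> stopWords = Set.of("the", "with", "is", "in");

        List<Instance> kept = filter(sentence, stopWords);
        System.out.println("before: " + sentence.size() + " instances, negative ratio "
                + negativeRatio(sentence));
        System.out.println("after:  " + kept.size() + " instances, negative ratio "
                + negativeRatio(kept));
    }
}
```

In this toy sentence, five of the eight instances are negative before filtering and one of the four remaining instances is negative afterwards, while all three positive instances are preserved.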
The core of jInFil was developed primarily by Claudio Giuliano and Raffaella Rinaldi.
jInFil is released as free software with full source code, provided under the terms of the Apache License, Version 2.0.
Some of jInFil's features include:
- Implements the class of Stop Word Filters
- Written in Java, so it runs on Mac OS X, OS/2, Unix, VMS and Windows
- Abstract filter and data representation interfaces
- Supports IOB and IOB2 data representation
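To make the two tagging schemes concrete, here is a minimal sketch that converts a tag sequence from IOB to IOB2. It assumes the common convention that in IOB a `B-` tag appears only when an entity immediately follows another entity of the same type, whereas in IOB2 every entity starts with a `B-` tag. The class `IobSketch` and the method `toIob2` are hypothetical names and do not come from jInFil's data representation interfaces.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not part of jInFil.
class IobSketch {

    // Convert IOB tags (e.g. "I-PER", "B-LOC", "O") to IOB2 tags.
    static List<String> toIob2(List<String> tags) {
        List<String> out = new ArrayList<>(tags.size());
        String prevType = null; // entity type of the previous token, null if it was "O"
        for (String tag : tags) {
            if (tag.equals("O")) {
                out.add("O");
                prevType = null;
                continue;
            }
            String type = tag.substring(2); // strip the "B-" or "I-" prefix
            // An entity starts at an explicit B- tag, or at an I- tag that
            // follows "O", the sequence start, or an entity of another type.
            boolean startsEntity = tag.startsWith("B-") || !type.equals(prevType);
            out.add((startsEntity ? "B-" : "I-") + type);
            prevType = type;
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> iob = List.of("I-PER", "I-PER", "O", "I-LOC", "B-LOC", "I-LOC");
        System.out.println(toIob2(iob));
    }
}
```

For the sample sequence, the conversion yields `B-PER`, `I-PER`, `O`, `B-LOC`, `B-LOC`, `I-LOC`: only the tags that open an entity change.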
The latest version of jInFil is 1.1, released 09 June 2006. To download, install and run it, contact Claudio Giuliano.
You are welcome to use the code under the terms of the Apache License, Version 2.0; however, please acknowledge its use with a citation:
- Alfio Massimiliano Gliozzo, Claudio Giuliano and Raffaella Rinaldi,
Instance Filtering for Entity Recognition,
SIGKDD Explorations, Special Issue: Text Mining and Natural Language Processing, June 2005.
- Alfio Massimiliano Gliozzo, Claudio Giuliano and Raffaella Rinaldi,
Instance Pruning by Filtering Uninformative Words: an Information
Extraction Case Study, in Proceedings of the Sixth International
Conference on Intelligent Text Processing and Computational Linguistics
(CICLing-2005), Mexico City, Mexico, 13-19 February 2005.