-- DESCRIPTION
Attached are a synthetic test dataset and the source code needed to run the tests.
The /Java folder contains the source code of the record linkage (RL) application (NetBeans projects).
The /MLP_TF folder contains the Python scripts used to create and evaluate the MLP model.
The /dataset folder contains the source tables and the dataset obtained from them. The data are provided both as .csv files and loaded into an SQLite database, which the application requires.
The /Results folder contains the tests already performed with the MLP classifier and the threshold classifiers (using weights).
Finally, the /Demo folder contains the precompiled .jar, the SQLite database and the MLP model. Running the jar starts the RL process and produces the results as output.

-- TABLES & DATASET
Two tables, TAB_A and TAB_B, were created with 1000 records each; between them they contain 250 MATCH pairs. In particular:
The column structure is similar to that of the original data.
The fiscal code has been replaced by a sequential number.
As in the original data, the personal records mainly concern geographically close places (in this case the Campania region). Names, surnames, provinces, municipalities and dates were drawn from non-uniform distributions (for example, giving more weight to widespread surnames or to municipalities with larger populations).
Noise was then added to both tables: errors, missing values, incorrect or replaced characters, missing words, and so on.
The dataset was obtained by applying the Indexing phase to these two tables, which yields about 3400 candidate record pairs (of which only 250 are MATCH).
The original tables and those produced by preprocessing and indexing are precomputed and stored in the SQLite database, ready for use.
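
The figures above can be sanity-checked with a little arithmetic. The sketch below is not part of the project; the class and method names are illustrative, and the candidate count is the approximate value quoted in the text. It computes how much of the full 1000 x 1000 cross product the indexing phase discards, and what share of the remaining candidate pairs are true matches.

```java
// Back-of-the-envelope check of the indexing figures quoted in this README.
// IndexingStats and its methods are illustrative names, not project code.
final class IndexingStats {

    /** Fraction of the full cross product that indexing discards. */
    static double reductionRatio(long recordsA, long recordsB, long candidatePairs) {
        long allPairs = recordsA * recordsB;
        return 1.0 - (double) candidatePairs / allPairs;
    }

    /** Fraction of candidate pairs that are true matches. */
    static double matchRate(long matches, long candidatePairs) {
        return (double) matches / candidatePairs;
    }

    public static void main(String[] args) {
        long candidatePairs = 3400; // "about 3400" in the text, so approximate
        System.out.println("reduction ratio: " + reductionRatio(1000, 1000, candidatePairs));
        System.out.println("match rate:      " + matchRate(250, candidatePairs));
    }
}
```

With these inputs the reduction ratio is roughly 0.9966 and the match rate roughly 0.0735, so the classification step works on a heavily imbalanced dataset in which only about 7% of the candidate pairs are matches.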


-- JAVA CODE
The Java source code is divided into two distinct NetBeans projects.
The first project, RecordLinkage, contains part of the original code of the RL application. The code of the Preprocessing and Indexing phases has been removed, along with all SADAS proprietary code and any references to or uses of their DBMS.
The second project, RecordLinkageTest, uses the first as a library and runs the RL process. The process configuration (choice of tables, columns, comparison functions, classifier, ...) is set in a class called AutoConfig.java, which can be modified. In particular, to change the classifier it is sufficient to uncomment the corresponding line in the following code:
    
    @Override
    public IClassifier classifier() {
        // Leave exactly one of the following lines uncommented to select the classifier.
        IClassifier cls = new MLPClassifier(this);
        //IClassifier cls = new ThresholdClassifier(this, 0.75, 0.75);
        //IClassifier cls = new OptimalThresholdClassifier(this, loadGoldenStandard()).setStep(0.01).setVerbose(false);
        return cls;
    }
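
For readers unfamiliar with threshold classification, the following is a minimal, self-contained sketch of the general idea. It assumes a single threshold applied to the weighted mean of the per-column similarities. It does not reproduce the project's ThresholdClassifier, whose two numeric constructor arguments are not documented here, and every name and value in it is hypothetical.

```java
// Hypothetical sketch of threshold classification on a weighted similarity vector.
// This is NOT the project's ThresholdClassifier; all names and values are illustrative.
final class WeightedThresholdSketch {

    /** Weighted mean of per-column similarities, each expected in [0, 1]. */
    static double score(double[] similarities, double[] weights) {
        if (similarities.length != weights.length) {
            throw new IllegalArgumentException("one weight per similarity is required");
        }
        double weightedSum = 0.0;
        double weightTotal = 0.0;
        for (int i = 0; i < similarities.length; i++) {
            weightedSum += similarities[i] * weights[i];
            weightTotal += weights[i];
        }
        if (weightTotal <= 0.0) {
            throw new IllegalArgumentException("weights must sum to a positive value");
        }
        return weightedSum / weightTotal;
    }

    /** A pair is classified as MATCH when its score reaches the threshold. */
    static boolean isMatch(double[] similarities, double[] weights, double threshold) {
        return score(similarities, weights) >= threshold;
    }

    public static void main(String[] args) {
        double[] similarities = {1.0, 0.5, 0.0}; // made-up similarities for three columns
        double[] weights = {2.0, 1.0, 1.0};      // made-up weights: the first column counts double
        System.out.println("score: " + score(similarities, weights));
        System.out.println("match at 0.75: " + isMatch(similarities, weights, 0.75));
        System.out.println("match at 0.60: " + isMatch(similarities, weights, 0.60));
    }
}
```

In this toy example the weighted score is 0.625, so the pair is rejected at a threshold of 0.75 and accepted at 0.60, which shows how sensitive the outcome is to the chosen threshold.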


-- RESULTS
When the application is run, an OUTPUT_TAB_A_TAB_B folder is created. It contains a log of the results (metrics and the lists of TP, FP, TN and FN) and a summary of the configuration used for that test.
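
As a reminder of how the usual metrics follow from the TP, FP and FN counts, here is a small sketch. Which metrics the application actually logs is not specified here, so precision, recall and F1 are an assumption; the class name is illustrative and the counts in main are toy values, not results from the Results folder.

```java
// Standard precision / recall / F1 definitions computed from confusion counts.
// LinkageMetrics is an illustrative name; the counts used in main are toy values.
final class LinkageMetrics {

    /** TP / (TP + FP), defined as 0 when nothing was predicted as a match. */
    static double precision(long tp, long fp) {
        return tp + fp == 0 ? 0.0 : (double) tp / (tp + fp);
    }

    /** TP / (TP + FN), defined as 0 when there are no true matches. */
    static double recall(long tp, long fn) {
        return tp + fn == 0 ? 0.0 : (double) tp / (tp + fn);
    }

    /** Harmonic mean of precision and recall, written directly in terms of the counts. */
    static double f1(long tp, long fp, long fn) {
        long denominator = 2 * tp + fp + fn;
        return denominator == 0 ? 0.0 : 2.0 * tp / denominator;
    }

    public static void main(String[] args) {
        long tp = 8, fp = 2, fn = 2; // toy counts for illustration only
        System.out.println("precision: " + precision(tp, fp));
        System.out.println("recall:    " + recall(tp, fn));
        System.out.println("F1:        " + f1(tp, fp, fn));
    }
}
```

TN does not enter any of these three metrics, which is convenient here because non-matches dominate the candidate pairs.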
The outputs of the various tests are already available in the Results folder:
MLP classifier: the one trained on the original municipal and banking data.
Threshold classifier with the threshold set to 0.6 and weighted similarity vectors.
'Optimal Threshold' classifier with weighted similarity vectors. This classifier uses the golden standard to determine the optimal threshold, which in this case is 0.41. It can therefore be seen as an upper bound on the performance achievable by a threshold classifier on that dataset.
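
The idea of searching for a threshold against the golden standard can be sketched as a simple sweep. The code below is an assumption-laden illustration, not the project's OptimalThresholdClassifier: it assumes the objective is F1, that candidate thresholds are multiples of the step in [0, 1], and that ties go to the lowest threshold. All names and the toy data are hypothetical.

```java
// Hypothetical sketch of a threshold sweep against known labels.
// NOT the project's OptimalThresholdClassifier; F1 as objective is an assumption.
final class ThresholdSweepSketch {

    /** F1 obtained when every pair with score >= threshold is predicted as a match. */
    static double f1At(double[] scores, boolean[] isMatch, double threshold) {
        int tp = 0, fp = 0, fn = 0;
        for (int i = 0; i < scores.length; i++) {
            boolean predicted = scores[i] >= threshold;
            if (predicted && isMatch[i]) {
                tp++;
            } else if (predicted) {
                fp++;
            } else if (isMatch[i]) {
                fn++;
            }
        }
        int denominator = 2 * tp + fp + fn;
        return denominator == 0 ? 0.0 : 2.0 * tp / denominator;
    }

    /** Tries thresholds 0, step, 2*step, ... up to 1 and returns the first one with the best F1. */
    static double bestThreshold(double[] scores, boolean[] isMatch, double step) {
        int steps = (int) Math.round(1.0 / step);
        double best = 0.0;
        double bestF1 = -1.0;
        for (int k = 0; k <= steps; k++) {
            double threshold = k * step;
            double f1 = f1At(scores, isMatch, threshold);
            if (f1 > bestF1) { // strict comparison keeps the lowest threshold on ties
                bestF1 = f1;
                best = threshold;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] scores = {0.92, 0.81, 0.455, 0.305, 0.205};  // toy similarity scores
        boolean[] isMatch = {true, true, true, false, false}; // toy golden-standard labels
        double best = bestThreshold(scores, isMatch, 0.01);
        System.out.println("best threshold: " + best);
        System.out.println("F1 at best:     " + f1At(scores, isMatch, best));
    }
}
```

Because the threshold is tuned on the same labels it is evaluated against, the resulting score is optimistic by construction, which is why it serves as an upper bound rather than a realistic estimate.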