Overview

Dataset statistics

Statistic | Cluster 1 | Cluster 2 | Cluster 3
Number of variables | 28 | 28 | 28
Number of observations | 857 | 511 | 168
Missing cells | 0 | 0 | 0
Missing cells (%) | 0.0% | 0.0% | 0.0%
Duplicate rows | 256 | 6912 | 0
Duplicate rows (%) | 29.9% | 1352.6% | 0.0%
Total size in memory | 183.4 KiB | 109.4 KiB | 36.1 KiB
Average record size in memory | 219.1 B | 219.2 B | 219.7 B
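
As a sanity check on the figures above, the duplicate percentages can be recomputed from the reported counts. The sketch below assumes (the report does not state this) that the percentage is simply duplicate rows divided by observations; the `reported` dictionary and the `duplicate_pct` helper are illustrative names, not part of pandas-profiling.

```python
# Counts copied from the Dataset statistics table above.
reported = {
    "Cluster 1": {"observations": 857, "duplicate_rows": 256},
    "Cluster 2": {"observations": 511, "duplicate_rows": 6912},
    "Cluster 3": {"observations": 168, "duplicate_rows": 0},
}


def duplicate_pct(stats):
    """Duplicate rows as a percentage of observations, rounded to one decimal."""
    return round(100 * stats["duplicate_rows"] / stats["observations"], 1)


percentages = {name: duplicate_pct(stats) for name, stats in reported.items()}

# A duplicate count larger than the number of observations cannot describe
# a plain subset of the rows, so flag it for manual review.
suspicious = [
    name for name, stats in reported.items()
    if stats["duplicate_rows"] > stats["observations"]
]

for name, pct in percentages.items():
    print(f"{name}: {pct}%")
print("Needs review:", suspicious)
```

Under that assumption the recomputed values agree with the report (29.9%, 1352.6%, 0.0%), and Cluster 2 is flagged because its 6912 duplicate rows exceed its 511 observations.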

Variable types

Type | Cluster 1 | Cluster 2 | Cluster 3
Categorical | 28 | 28 | 28

Alerts

Cluster 1 | Cluster 2 | Cluster 3 | Alert type
Cluster has constant value "0" | Cluster has constant value "1" | Cluster has constant value "2" | Constant
Dataset has 256 (29.9%) duplicate rows | Dataset has 6912 (1352.6%) duplicate rows | Alert not present | Duplicates
faixaEtaria is highly overall correlated with idade | faixaEtaria is highly overall correlated with idade | faixaEtaria is highly overall correlated with DIABETES and 1 other fields | High Correlation
FORMACLIN1 is highly overall correlated with classif | FORMACLIN1 is highly overall correlated with classif | FORMACLIN1 is highly overall correlated with classif | High Correlation
classif is highly overall correlated with FORMACLIN1 | classif is highly overall correlated with FORMACLIN1 | classif is highly overall correlated with FORMACLIN1 | High Correlation
hiv is highly overall correlated with aids | hiv is highly overall correlated with aids | hiv is highly overall correlated with aids | High Correlation
aids is highly overall correlated with hiv | aids is highly overall correlated with hiv | aids is highly overall correlated with hiv | High Correlation
idade is highly overall correlated with faixaEtaria | idade is highly overall correlated with faixaEtaria | idade is highly overall correlated with faixaEtaria | High Correlation
tipoCaso is highly imbalanced (82.4%) | tipoCaso is highly imbalanced (89.2%) | tipoCaso is highly imbalanced (73.9%) | Imbalance
FORMACLIN1 is highly imbalanced (88.7%) | FORMACLIN1 is highly imbalanced (72.5%) | FORMACLIN1 is highly imbalanced (60.2%) | Imbalance
classif is highly imbalanced (74.6%) | classif is highly imbalanced (55.3%) | Alert not present | Imbalance
BACOUTRO is highly imbalanced (76.1%) | BACOUTRO is highly imbalanced (73.3%) | Alert not present | Imbalance
cultEsc is highly imbalanced (55.8%) | Alert not present | Alert not present | Imbalance
NECROP is highly imbalanced (96.6%) | NECROP is highly imbalanced (96.3%) | NECROP is highly imbalanced (94.7%) | Imbalance
hiv is highly imbalanced (54.6%) | hiv is highly imbalanced (68.5%) | hiv is highly imbalanced (59.3%) | Imbalance
aids is highly imbalanced (67.9%) | aids is highly imbalanced (82.9%) | Alert not present | Imbalance
DIABETES is highly imbalanced (61.7%) | DIABETES is highly imbalanced (67.8%) | DIABETES is highly imbalanced (87.1%) | Imbalance
MENTAL is highly imbalanced (87.3%) | MENTAL is highly imbalanced (88.4%) | MENTAL is highly imbalanced (90.7%) | Imbalance
motMudEsquema is highly imbalanced (97.7%) | motMudEsquema is highly imbalanced (93.2%) | motMudEsquema is highly imbalanced (82.3%) | Imbalance
HISTOPATOL is highly imbalanced (84.5%) | HISTOPATOL is highly imbalanced (70.2%) | HISTOPATOL is highly imbalanced (53.7%) | Imbalance
Alert not present | sitAtual is highly imbalanced (59.7%) | Alert not present | Imbalance
Alert not present | ALCOOLISMO is highly imbalanced (51.3%) | Alert not present | Imbalance
Alert not present | DROGADICAO is highly imbalanced (60.4%) | Alert not present | Imbalance
Alert not present | TABAGISMO is highly imbalanced (55.1%) | TABAGISMO is highly imbalanced (58.6%) | Imbalance
Alert not present | Status_Resistencia is highly imbalanced (51.9%) | Alert not present | Imbalance
Alert not present | Alert not present | TIPOCUP is highly overall correlated with motMudEsquema | High Correlation
Alert not present | Alert not present | DIABETES is highly overall correlated with faixaEtaria | High Correlation
Alert not present | Alert not present | motMudEsquema is highly overall correlated with TIPOCUP | High Correlation
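
The imbalance percentages can be reproduced from category counts. The sketch below assumes the score is one minus the Shannon entropy normalised by the log of the number of categories, and that an alert fires above a threshold of 0.5; both are assumptions about pandas-profiling's behaviour rather than statements from this report, and `imbalance_score` is a hypothetical helper. The counts are the `sitAtual` values (Cura, Abandono) listed later in the report.

```python
import math


def imbalance_score(counts):
    """Return 1 - H/log(k): 0 for a uniform split, near 1 when one category dominates."""
    total = sum(counts)
    k = len(counts)
    if k < 2 or total == 0:
        return 0.0  # assumption: degenerate inputs get score 0 in this sketch
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log(k)


THRESHOLD = 0.5  # assumed alert threshold

# sitAtual counts per cluster: [Cura, Abandono]
sit_atual = {
    "Cluster 1": [717, 140],
    "Cluster 2": [470, 41],
    "Cluster 3": [121, 47],
}

scores = {name: imbalance_score(c) for name, c in sit_atual.items()}
for name, score in scores.items():
    flag = "alert" if score > THRESHOLD else "no alert"
    print(f"{name}: {score:.1%} ({flag})")
```

With these assumptions only Cluster 2 exceeds the threshold, at about 59.7%, which agrees with the `sitAtual` alert listed for that cluster alone.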

Reproduction

Item | Cluster 1 | Cluster 2 | Cluster 3
Analysis started | 2023-08-25 16:07:23.199784 | 2023-08-25 16:07:28.703041 | 2023-08-25 16:07:33.680331
Analysis finished | 2023-08-25 16:07:28.697762 | 2023-08-25 16:07:33.674186 | 2023-08-25 16:07:38.026250
Duration | 5.5 seconds | 4.97 seconds | 4.35 seconds
Software version | pandas-profiling v3.6.6 | pandas-profiling v3.6.6 | pandas-profiling v3.6.6
Download configuration | config.json | config.json | config.json
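
The reported durations follow from the start and finish timestamps. A minimal sketch using only the standard library, with the timestamps copied from the Reproduction section; the `runs` dictionary is just an illustrative container.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"

# (started, finished) per cluster, as printed in the Reproduction section.
runs = {
    "Cluster 1": ("2023-08-25 16:07:23.199784", "2023-08-25 16:07:28.697762"),
    "Cluster 2": ("2023-08-25 16:07:28.703041", "2023-08-25 16:07:33.674186"),
    "Cluster 3": ("2023-08-25 16:07:33.680331", "2023-08-25 16:07:38.026250"),
}

# Elapsed wall-clock time in seconds for each profiling run.
durations = {
    name: (datetime.strptime(end, FMT) - datetime.strptime(start, FMT)).total_seconds()
    for name, (start, end) in runs.items()
}

for name, seconds in durations.items():
    print(f"{name}: {seconds:.2f} s")
```

The differences come to roughly 5.50, 4.97 and 4.35 seconds, in line with the Duration row.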

Variables

racaCor
Categorical

Statistic | Cluster 1 | Cluster 2 | Cluster 3
Distinct | 5 | 5 | 4
Distinct (%) | 0.6% | 1.0% | 2.4%
Missing | 0 | 0 | 0
Missing (%) | 0.0% | 0.0% | 0.0%
Memory size | 13.4 KiB | 8.0 KiB | 2.6 KiB

Length

Statistic | Cluster 1 | Cluster 2 | Cluster 3
Max length | 8 | 8 | 7
Median length | 5 | 6 | 6
Mean length | 5.4247375 | 5.7358121 | 5.5654762
Min length | 5 | 5 | 5
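
The length statistics can be recomputed from the category counts. The sketch below does this for Cluster 1 of `racaCor`, using the counts from the Common Values table of this section; `counts`, `lengths` and `summary` are illustrative names.

```python
import statistics

# Cluster 1 counts for racaCor, copied from the Common Values table.
counts = {"Pardo": 371, "Branco": 326, "Preto": 145, "Indigena": 8, "Amarelo": 7}

# One length per observation: each label's length repeated by its count.
lengths = [len(value) for value, n in counts.items() for _ in range(n)]

total_characters = sum(lengths)
summary = {
    "max": max(lengths),
    "median": statistics.median(lengths),
    "mean": total_characters / len(lengths),
    "min": min(lengths),
}
print(total_characters, summary)
```

This yields 4649 total characters, a maximum of 8, a median and minimum of 5, and a mean of about 5.4247, matching the Cluster 1 column.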

Characters and Unicode

Statistic | Cluster 1 | Cluster 2 | Cluster 3
Total characters | 4649 | 2931 | 935
Distinct characters | 16 | 16 | 13
Distinct categories | 2 | 2 | 2
Distinct scripts | 1 | 1 | 1
Distinct blocks | 1 | 1 | 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
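
As an illustration of the character properties mentioned above, Python's standard `unicodedata` module exposes the general category of each code point. The sketch tallies categories for the Cluster 1 `racaCor` values, weighting each label by its count; this is not necessarily how pandas-profiling computes its figures.

```python
import unicodedata
from collections import Counter

# Cluster 1 counts for racaCor, copied from the Common Values table.
counts = {"Pardo": 371, "Branco": 326, "Preto": 145, "Indigena": 8, "Amarelo": 7}

category_totals = Counter()   # general category code -> occurrences
character_totals = Counter()  # character -> occurrences
for value, n in counts.items():
    for ch in value:
        category_totals[unicodedata.category(ch)] += n  # e.g. "Lu", "Ll"
        character_totals[ch] += n

print(category_totals)
print(len(character_totals), "distinct characters")
```

The tallies are 3792 lowercase letters (`Ll`), 857 uppercase letters (`Lu`) and 16 distinct characters, consistent with the Cluster 1 figures in this section.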

Unique

Statistic | Cluster 1 | Cluster 2 | Cluster 3
Unique | 0 | 0 | 1
Unique (%) | 0.0% | 0.0% | 0.6%

Sample

Row | Cluster 1 | Cluster 2 | Cluster 3
1st row | Branco | Branco | Pardo
2nd row | Pardo | Branco | Branco
3rd row | Branco | Branco | Branco
4th row | Pardo | Branco | Branco
5th row | Branco | Pardo | Preto

Common Values

Cluster 1

Value | Count | Frequency (%)
Pardo | 371 | 43.3%
Branco | 326 | 38.0%
Preto | 145 | 16.9%
Indigena | 8 | 0.9%
Amarelo | 7 | 0.8%

Cluster 2

Value | Count | Frequency (%)
Branco | 351 | 68.7%
Pardo | 118 | 23.1%
Preto | 32 | 6.3%
Indigena | 5 | 1.0%
Amarelo | 5 | 1.0%

Cluster 3

Value | Count | Frequency (%)
Branco | 93 | 55.4%
Pardo | 54 | 32.1%
Preto | 20 | 11.9%
Amarelo | 1 | 0.6%
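
The frequency column is each count divided by the number of observations in the cluster. A short sketch for Cluster 3, using the counts above; `frequency_table` is a hypothetical helper, not a pandas-profiling function.

```python
def frequency_table(counts):
    """Return (value, count, percent) rows sorted by descending count."""
    total = sum(counts.values())
    rows = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return [(value, n, round(100 * n / total, 1)) for value, n in rows]


# Cluster 3 counts for racaCor, copied from the Common Values table.
cluster_3 = {"Branco": 93, "Pardo": 54, "Preto": 20, "Amarelo": 1}

for value, n, pct in frequency_table(cluster_3):
    print(f"{value:<10}{n:>5}{pct:>7}%")
```

The computed percentages (55.4, 32.1, 11.9 and 0.6) match the Cluster 3 column of the report.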

Length

Histogram of lengths of the category

Common Values (Plot)

[Plots for Cluster 1, Cluster 2 and Cluster 3 not reproduced.]

Words

Cluster 1

Value | Count | Frequency (%)
pardo | 371 | 43.3%
branco | 326 | 38.0%
preto | 145 | 16.9%
indigena | 8 | 0.9%
amarelo | 7 | 0.8%

Cluster 2

Value | Count | Frequency (%)
branco | 351 | 68.7%
pardo | 118 | 23.1%
preto | 32 | 6.3%
indigena | 5 | 1.0%
amarelo | 5 | 1.0%

Cluster 3

Value | Count | Frequency (%)
branco | 93 | 55.4%
pardo | 54 | 32.1%
preto | 20 | 11.9%
amarelo | 1 | 0.6%

Most occurring characters

ValueCountFrequency (%)
r 849
18.3%
o 849
18.3%
a 712
15.3%
P 516
11.1%
d 379
8.2%
n 342
7.4%
B 326
 
7.0%
c 326
 
7.0%
e 160
 
3.4%
t 145
 
3.1%
Other values (6) 45
 
1.0%
ValueCountFrequency (%)
r 506
17.3%
o 506
17.3%
a 479
16.3%
n 361
12.3%
B 351
12.0%
c 351
12.0%
P 150
 
5.1%
d 123
 
4.2%
e 42
 
1.4%
t 32
 
1.1%
Other values (6) 30
 
1.0%
ValueCountFrequency (%)
r 168
18.0%
o 168
18.0%
a 148
15.8%
B 93
9.9%
n 93
9.9%
c 93
9.9%
P 74
7.9%
d 54
 
5.8%
e 21
 
2.2%
t 20
 
2.1%
Other values (3) 3
 
0.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 3792
81.6%
Uppercase Letter 857
 
18.4%
ValueCountFrequency (%)
Lowercase Letter 2420
82.6%
Uppercase Letter 511
 
17.4%
ValueCountFrequency (%)
Lowercase Letter 767
82.0%
Uppercase Letter 168
 
18.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
r 849
22.4%
o 849
22.4%
a 712
18.8%
d 379
10.0%
n 342
9.0%
c 326
 
8.6%
e 160
 
4.2%
t 145
 
3.8%
i 8
 
0.2%
g 8
 
0.2%
Other values (2) 14
 
0.4%
ValueCountFrequency (%)
r 506
20.9%
o 506
20.9%
a 479
19.8%
n 361
14.9%
c 351
14.5%
d 123
 
5.1%
e 42
 
1.7%
t 32
 
1.3%
i 5
 
0.2%
g 5
 
0.2%
Other values (2) 10
 
0.4%
ValueCountFrequency (%)
r 168
21.9%
o 168
21.9%
a 148
19.3%
n 93
12.1%
c 93
12.1%
d 54
 
7.0%
e 21
 
2.7%
t 20
 
2.6%
m 1
 
0.1%
l 1
 
0.1%
Uppercase Letter
ValueCountFrequency (%)
P 516
60.2%
B 326
38.0%
I 8
 
0.9%
A 7
 
0.8%
ValueCountFrequency (%)
B 351
68.7%
P 150
29.4%
I 5
 
1.0%
A 5
 
1.0%
ValueCountFrequency (%)
B 93
55.4%
P 74
44.0%
A 1
 
0.6%

Most occurring scripts

ValueCountFrequency (%)
Latin 4649
100.0%
ValueCountFrequency (%)
Latin 2931
100.0%
ValueCountFrequency (%)
Latin 935
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
r 849
18.3%
o 849
18.3%
a 712
15.3%
P 516
11.1%
d 379
8.2%
n 342
7.4%
B 326
 
7.0%
c 326
 
7.0%
e 160
 
3.4%
t 145
 
3.1%
Other values (6) 45
 
1.0%
ValueCountFrequency (%)
r 506
17.3%
o 506
17.3%
a 479
16.3%
n 361
12.3%
B 351
12.0%
c 351
12.0%
P 150
 
5.1%
d 123
 
4.2%
e 42
 
1.4%
t 32
 
1.1%
Other values (6) 30
 
1.0%
ValueCountFrequency (%)
r 168
18.0%
o 168
18.0%
a 148
15.8%
B 93
9.9%
n 93
9.9%
c 93
9.9%
P 74
7.9%
d 54
 
5.8%
e 21
 
2.2%
t 20
 
2.1%
Other values (3) 3
 
0.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 4649
100.0%
ValueCountFrequency (%)
ASCII 2931
100.0%
ValueCountFrequency (%)
ASCII 935
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
r 849
18.3%
o 849
18.3%
a 712
15.3%
P 516
11.1%
d 379
8.2%
n 342
7.4%
B 326
 
7.0%
c 326
 
7.0%
e 160
 
3.4%
t 145
 
3.1%
Other values (6) 45
 
1.0%
ValueCountFrequency (%)
r 506
17.3%
o 506
17.3%
a 479
16.3%
n 361
12.3%
B 351
12.0%
c 351
12.0%
P 150
 
5.1%
d 123
 
4.2%
e 42
 
1.4%
t 32
 
1.1%
Other values (6) 30
 
1.0%
ValueCountFrequency (%)
r 168
18.0%
o 168
18.0%
a 148
15.8%
B 93
9.9%
n 93
9.9%
c 93
9.9%
P 74
7.9%
d 54
 
5.8%
e 21
 
2.2%
t 20
 
2.1%
Other values (3) 3
 
0.3%

faixaEtaria
Categorical

Statistic | Cluster 1 | Cluster 2 | Cluster 3
Distinct | 10 | 12 | 9
Distinct (%) | 1.2% | 2.3% | 5.4%
Missing | 0 | 0 | 0
Missing (%) | 0.0% | 0.0% | 0.0%
Memory size | 13.4 KiB | 8.0 KiB | 2.6 KiB

Length

Statistic | Cluster 1 | Cluster 2 | Cluster 3
Max length | 16 | 16 | 16
Median length | 5 | 5 | 5
Mean length | 5.0641774 | 5.0821918 | 5.0654762
Min length | 5 | 5 | 5

Characters and Unicode

Statistic | Cluster 1 | Cluster 2 | Cluster 3
Total characters | 4340 | 2597 | 851
Distinct characters | 21 | 21 | 21
Distinct categories | 5 | 5 | 5
Distinct scripts | 2 | 2 | 2
Distinct blocks | 1 | 1 | 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Statistic | Cluster 1 | Cluster 2 | Cluster 3
Unique | 0 | 1 | 3
Unique (%) | 0.0% | 0.2% | 1.8%

Sample

Row | Cluster 1 | Cluster 2 | Cluster 3
1st row | 20_29 | 40_49 | 40_49
2nd row | 40_49 | 40_49 | 30_39
3rd row | 50_59 | 40_49 | 20_29
4th row | 20_29 | 50_59 | 20_29
5th row | 30_39 | 10_14 | 30_39

Common Values

Cluster 1

Value | Count | Frequency (%)
20_29 | 215 | 25.1%
30_39 | 203 | 23.7%
40_49 | 185 | 21.6%
50_59 | 114 | 13.3%
60_69 | 51 | 6.0%
15_19 | 48 | 5.6%
70_79 | 21 | 2.5%
10_14 | 11 | 1.3%
Maior de 80 anos | 5 | 0.6%
05_09 | 4 | 0.5%

Cluster 2

Value | Count | Frequency (%)
20_29 | 139 | 27.2%
30_39 | 101 | 19.8%
40_49 | 95 | 18.6%
50_59 | 63 | 12.3%
15_19 | 39 | 7.6%
60_69 | 34 | 6.7%
70_79 | 12 | 2.3%
10_14 | 11 | 2.2%
01_04 | 7 | 1.4%
05_09 | 6 | 1.2%
Other values (2) | 4 | 0.8%

Cluster 3

Value | Count | Frequency (%)
30_39 | 68 | 40.5%
40_49 | 43 | 25.6%
20_29 | 33 | 19.6%
50_59 | 14 | 8.3%
60_69 | 5 | 3.0%
70_79 | 2 | 1.2%
05_09 | 1 | 0.6%
Maior de 80 anos | 1 | 0.6%
01_04 | 1 | 0.6%

Length

Histogram of lengths of the category

Common Values (Plot)

[Plots for Cluster 1 and Cluster 3 not reproduced.]
Cluster 2: Number of variable categories passes threshold (config.plot.cat_freq.max_unique)

Words

Cluster 1

Value | Count | Frequency (%)
20_29 | 215 | 24.7%
30_39 | 203 | 23.3%
40_49 | 185 | 21.2%
50_59 | 114 | 13.1%
60_69 | 51 | 5.8%
15_19 | 48 | 5.5%
70_79 | 21 | 2.4%
10_14 | 11 | 1.3%
maior | 5 | 0.6%
de | 5 | 0.6%
Other values (3) | 14 | 1.6%

Cluster 2

Value | Count | Frequency (%)
20_29 | 139 | 26.6%
30_39 | 101 | 19.3%
40_49 | 95 | 18.2%
50_59 | 63 | 12.0%
15_19 | 39 | 7.5%
60_69 | 34 | 6.5%
70_79 | 12 | 2.3%
10_14 | 11 | 2.1%
01_04 | 7 | 1.3%
05_09 | 6 | 1.1%
Other values (7) | 16 | 3.1%

Cluster 3

Value | Count | Frequency (%)
30_39 | 68 | 39.8%
40_49 | 43 | 25.1%
20_29 | 33 | 19.3%
50_59 | 14 | 8.2%
60_69 | 5 | 2.9%
70_79 | 2 | 1.2%
05_09 | 1 | 0.6%
maior | 1 | 0.6%
de | 1 | 0.6%
80 | 1 | 0.6%
Other values (2) | 2 | 1.2%

Most occurring characters

ValueCountFrequency (%)
_ 852
19.6%
9 841
19.4%
0 813
18.7%
2 430
9.9%
3 406
9.4%
4 381
8.8%
5 280
 
6.5%
1 118
 
2.7%
6 102
 
2.4%
7 42
 
1.0%
Other values (11) 75
 
1.7%
ValueCountFrequency (%)
_ 507
19.5%
9 489
18.8%
0 484
18.6%
2 278
10.7%
4 208
8.0%
3 202
 
7.8%
5 171
 
6.6%
1 108
 
4.2%
6 68
 
2.6%
7 24
 
0.9%
Other values (11) 58
 
2.2%
ValueCountFrequency (%)
0 170
20.0%
_ 167
19.6%
9 166
19.5%
3 136
16.0%
4 87
10.2%
2 66
 
7.8%
5 29
 
3.4%
6 10
 
1.2%
7 4
 
0.5%
3
 
0.4%
Other values (11) 13
 
1.5%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 3418
78.8%
Connector Punctuation 852
 
19.6%
Lowercase Letter 50
 
1.2%
Space Separator 15
 
0.3%
Uppercase Letter 5
 
0.1%
ValueCountFrequency (%)
Decimal Number 2035
78.4%
Connector Punctuation 507
 
19.5%
Lowercase Letter 39
 
1.5%
Space Separator 12
 
0.5%
Uppercase Letter 4
 
0.2%
ValueCountFrequency (%)
Decimal Number 670
78.7%
Connector Punctuation 167
 
19.6%
Lowercase Letter 10
 
1.2%
Space Separator 3
 
0.4%
Uppercase Letter 1
 
0.1%

Most frequent character per category

Connector Punctuation
ValueCountFrequency (%)
_ 852
100.0%
ValueCountFrequency (%)
_ 507
100.0%
ValueCountFrequency (%)
_ 167
100.0%
Decimal Number
ValueCountFrequency (%)
9 841
24.6%
0 813
23.8%
2 430
12.6%
3 406
11.9%
4 381
11.1%
5 280
 
8.2%
1 118
 
3.5%
6 102
 
3.0%
7 42
 
1.2%
8 5
 
0.1%
ValueCountFrequency (%)
9 489
24.0%
0 484
23.8%
2 278
13.7%
4 208
10.2%
3 202
9.9%
5 171
 
8.4%
1 108
 
5.3%
6 68
 
3.3%
7 24
 
1.2%
8 3
 
0.1%
ValueCountFrequency (%)
0 170
25.4%
9 166
24.8%
3 136
20.3%
4 87
13.0%
2 66
 
9.9%
5 29
 
4.3%
6 10
 
1.5%
7 4
 
0.6%
8 1
 
0.1%
1 1
 
0.1%
Space Separator
ValueCountFrequency (%)
15
100.0%
ValueCountFrequency (%)
12
100.0%
ValueCountFrequency (%)
3
100.0%
Lowercase Letter
ValueCountFrequency (%)
a 10
20.0%
o 10
20.0%
e 5
10.0%
n 5
10.0%
d 5
10.0%
r 5
10.0%
i 5
10.0%
s 5
10.0%
ValueCountFrequency (%)
o 8
20.5%
a 7
17.9%
n 5
12.8%
e 5
12.8%
d 4
10.3%
r 4
10.3%
i 3
 
7.7%
s 3
 
7.7%
ValueCountFrequency (%)
o 2
20.0%
a 2
20.0%
i 1
10.0%
r 1
10.0%
d 1
10.0%
e 1
10.0%
n 1
10.0%
s 1
10.0%
Uppercase Letter
ValueCountFrequency (%)
M 5
100.0%
ValueCountFrequency (%)
M 4
100.0%
ValueCountFrequency (%)
M 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 4285
98.7%
Latin 55
 
1.3%
ValueCountFrequency (%)
Common 2554
98.3%
Latin 43
 
1.7%
ValueCountFrequency (%)
Common 840
98.7%
Latin 11
 
1.3%

Most frequent character per script

Common
ValueCountFrequency (%)
_ 852
19.9%
9 841
19.6%
0 813
19.0%
2 430
10.0%
3 406
9.5%
4 381
8.9%
5 280
 
6.5%
1 118
 
2.8%
6 102
 
2.4%
7 42
 
1.0%
Other values (2) 20
 
0.5%
ValueCountFrequency (%)
_ 507
19.9%
9 489
19.1%
0 484
19.0%
2 278
10.9%
4 208
8.1%
3 202
 
7.9%
5 171
 
6.7%
1 108
 
4.2%
6 68
 
2.7%
7 24
 
0.9%
Other values (2) 15
 
0.6%
ValueCountFrequency (%)
0 170
20.2%
_ 167
19.9%
9 166
19.8%
3 136
16.2%
4 87
10.4%
2 66
 
7.9%
5 29
 
3.5%
6 10
 
1.2%
7 4
 
0.5%
3
 
0.4%
Other values (2) 2
 
0.2%
Latin
ValueCountFrequency (%)
a 10
18.2%
o 10
18.2%
e 5
9.1%
n 5
9.1%
M 5
9.1%
d 5
9.1%
r 5
9.1%
i 5
9.1%
s 5
9.1%
ValueCountFrequency (%)
o 8
18.6%
a 7
16.3%
n 5
11.6%
e 5
11.6%
d 4
9.3%
M 4
9.3%
r 4
9.3%
i 3
 
7.0%
s 3
 
7.0%
ValueCountFrequency (%)
o 2
18.2%
a 2
18.2%
i 1
9.1%
r 1
9.1%
M 1
9.1%
d 1
9.1%
e 1
9.1%
n 1
9.1%
s 1
9.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 4340
100.0%
ValueCountFrequency (%)
ASCII 2597
100.0%
ValueCountFrequency (%)
ASCII 851
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
_ 852
19.6%
9 841
19.4%
0 813
18.7%
2 430
9.9%
3 406
9.4%
4 381
8.8%
5 280
 
6.5%
1 118
 
2.7%
6 102
 
2.4%
7 42
 
1.0%
Other values (11) 75
 
1.7%
ValueCountFrequency (%)
_ 507
19.5%
9 489
18.8%
0 484
18.6%
2 278
10.7%
4 208
8.0%
3 202
 
7.8%
5 171
 
6.6%
1 108
 
4.2%
6 68
 
2.6%
7 24
 
0.9%
Other values (11) 58
 
2.2%
ValueCountFrequency (%)
0 170
20.0%
_ 167
19.6%
9 166
19.5%
3 136
16.0%
4 87
10.2%
2 66
 
7.8%
5 29
 
3.4%
6 10
 
1.2%
7 4
 
0.5%
3
 
0.4%
Other values (11) 13
 
1.5%

sexo
Categorical

Statistic | Cluster 1 | Cluster 2 | Cluster 3
Distinct | 2 | 2 | 2
Distinct (%) | 0.2% | 0.4% | 1.2%
Missing | 0 | 0 | 0
Missing (%) | 0.0% | 0.0% | 0.0%
Memory size | 7.7 KiB | 4.6 KiB | 1.6 KiB

Length

Statistic | Cluster 1 | Cluster 2 | Cluster 3
Max length | 1 | 1 | 1
Median length | 1 | 1 | 1
Mean length | 1 | 1 | 1
Min length | 1 | 1 | 1

Characters and Unicode

Statistic | Cluster 1 | Cluster 2 | Cluster 3
Total characters | 857 | 511 | 168
Distinct characters | 2 | 2 | 2
Distinct categories | 1 | 1 | 1
Distinct scripts | 1 | 1 | 1
Distinct blocks | 1 | 1 | 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Statistic | Cluster 1 | Cluster 2 | Cluster 3
Unique | 0 | 0 | 0
Unique (%) | 0.0% | 0.0% | 0.0%

Sample

Row | Cluster 1 | Cluster 2 | Cluster 3
1st row | M | F | M
2nd row | M | M | M
3rd row | M | F | M
4th row | F | F | M
5th row | M | F | M

Common Values

Cluster 1

Value | Count | Frequency (%)
M | 720 | 84.0%
F | 137 | 16.0%

Cluster 2

Value | Count | Frequency (%)
F | 293 | 57.3%
M | 218 | 42.7%

Cluster 3

Value | Count | Frequency (%)
M | 127 | 75.6%
F | 41 | 24.4%

Length

Histogram of lengths of the category

Common Values (Plot)

[Plots for Cluster 1, Cluster 2 and Cluster 3 not reproduced.]

Words

Cluster 1

Value | Count | Frequency (%)
m | 720 | 84.0%
f | 137 | 16.0%

Cluster 2

Value | Count | Frequency (%)
f | 293 | 57.3%
m | 218 | 42.7%

Cluster 3

Value | Count | Frequency (%)
m | 127 | 75.6%
f | 41 | 24.4%

Most occurring characters

ValueCountFrequency (%)
M 720
84.0%
F 137
 
16.0%
ValueCountFrequency (%)
F 293
57.3%
M 218
42.7%
ValueCountFrequency (%)
M 127
75.6%
F 41
 
24.4%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 857
100.0%
ValueCountFrequency (%)
Uppercase Letter 511
100.0%
ValueCountFrequency (%)
Uppercase Letter 168
100.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
M 720
84.0%
F 137
 
16.0%
ValueCountFrequency (%)
F 293
57.3%
M 218
42.7%
ValueCountFrequency (%)
M 127
75.6%
F 41
 
24.4%

Most occurring scripts

ValueCountFrequency (%)
Latin 857
100.0%
ValueCountFrequency (%)
Latin 511
100.0%
ValueCountFrequency (%)
Latin 168
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
M 720
84.0%
F 137
 
16.0%
ValueCountFrequency (%)
F 293
57.3%
M 218
42.7%
ValueCountFrequency (%)
M 127
75.6%
F 41
 
24.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 857
100.0%
ValueCountFrequency (%)
ASCII 511
100.0%
ValueCountFrequency (%)
ASCII 168
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
M 720
84.0%
F 137
 
16.0%
ValueCountFrequency (%)
F 293
57.3%
M 218
42.7%
ValueCountFrequency (%)
M 127
75.6%
F 41
 
24.4%

ESCOLARID
Categorical

Statistic | Cluster 1 | Cluster 2 | Cluster 3
Distinct | 6 | 6 | 6
Distinct (%) | 0.7% | 1.2% | 3.6%
Missing | 0 | 0 | 0
Missing (%) | 0.0% | 0.0% | 0.0%
Memory size | 13.4 KiB | 8.0 KiB | 2.6 KiB

Length

Statistic | Cluster 1 | Cluster 2 | Cluster 3
Max length | 15 | 15 | 15
Median length | 13 | 14 | 14
Mean length | 13.263711 | 13.414873 | 13.52381
Min length | 7 | 7 | 7

Characters and Unicode

Statistic | Cluster 1 | Cluster 2 | Cluster 3
Total characters | 11367 | 6855 | 2272
Distinct characters | 19 | 19 | 19
Distinct categories | 4 | 4 | 4
Distinct scripts | 2 | 2 | 2
Distinct blocks | 1 | 1 | 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Statistic | Cluster 1 | Cluster 2 | Cluster 3
Unique | 0 | 0 | 0
Unique (%) | 0.0% | 0.0% | 0.0%

Sample

Row | Cluster 1 | Cluster 2 | Cluster 3
1st row | De 4 a 7 anos | 15 anos e mais | De 8 a 11 anos
2nd row | De 4 a 7 anos | De 1 a 3 anos | De 4 a 7 anos
3rd row | De 4 a 7 anos | De 8 a 11 anos | De 8 a 11 anos
4th row | De 8 a 11 anos | De 4 a 7 anos | De 1 a 3 anos
5th row | De 8 a 11 anos | De 4 a 7 anos | De 8 a 11 anos

Common Values

Cluster 1

Value | Count | Frequency (%)
De 4 a 7 anos | 322 | 37.6%
De 8 a 11 anos | 316 | 36.9%
De 1 a 3 anos | 116 | 13.5%
De 12 a 14 anos | 52 | 6.1%
Nenhuma | 35 | 4.1%
15 anos e mais | 16 | 1.9%

Cluster 2

Value | Count | Frequency (%)
De 8 a 11 anos | 198 | 38.7%
De 4 a 7 anos | 160 | 31.3%
De 1 a 3 anos | 56 | 11.0%
De 12 a 14 anos | 50 | 9.8%
15 anos e mais | 28 | 5.5%
Nenhuma | 19 | 3.7%

Cluster 3

Value | Count | Frequency (%)
De 8 a 11 anos | 81 | 48.2%
De 4 a 7 anos | 42 | 25.0%
De 12 a 14 anos | 17 | 10.1%
De 1 a 3 anos | 13 | 7.7%
15 anos e mais | 9 | 5.4%
Nenhuma | 6 | 3.6%

Length

Histogram of lengths of the category

Common Values (Plot)

[Plots for Cluster 1, Cluster 2 and Cluster 3 not reproduced.]

Words

Cluster 1

Value | Count | Frequency (%)
anos | 822 | 19.9%
de | 806 | 19.5%
a | 806 | 19.5%
4 | 322 | 7.8%
7 | 322 | 7.8%
8 | 316 | 7.7%
11 | 316 | 7.7%
1 | 116 | 2.8%
3 | 116 | 2.8%
12 | 52 | 1.3%
Other values (5) | 135 | 3.3%

Cluster 2

Value | Count | Frequency (%)
anos | 492 | 20.1%
de | 464 | 18.9%
a | 464 | 18.9%
8 | 198 | 8.1%
11 | 198 | 8.1%
4 | 160 | 6.5%
7 | 160 | 6.5%
1 | 56 | 2.3%
3 | 56 | 2.3%
12 | 50 | 2.0%
Other values (5) | 153 | 6.2%

Cluster 3

Value | Count | Frequency (%)
anos | 162 | 20.1%
de | 153 | 19.0%
a | 153 | 19.0%
8 | 81 | 10.0%
11 | 81 | 10.0%
4 | 42 | 5.2%
7 | 42 | 5.2%
12 | 17 | 2.1%
14 | 17 | 2.1%
1 | 13 | 1.6%
Other values (5) | 46 | 5.7%

Most occurring characters

ValueCountFrequency (%)
3272
28.8%
a 1679
14.8%
1 868
 
7.6%
n 857
 
7.5%
e 857
 
7.5%
s 838
 
7.4%
o 822
 
7.2%
D 806
 
7.1%
4 374
 
3.3%
7 322
 
2.8%
Other values (9) 672
 
5.9%
ValueCountFrequency (%)
1940
28.3%
a 1003
14.6%
1 580
 
8.5%
s 520
 
7.6%
e 511
 
7.5%
n 511
 
7.5%
o 492
 
7.2%
D 464
 
6.8%
4 210
 
3.1%
8 198
 
2.9%
Other values (9) 426
 
6.2%
ValueCountFrequency (%)
639
28.1%
a 330
14.5%
1 218
 
9.6%
s 171
 
7.5%
e 168
 
7.4%
n 168
 
7.4%
o 162
 
7.1%
D 153
 
6.7%
8 81
 
3.6%
4 59
 
2.6%
Other values (9) 123
 
5.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 5190
45.7%
Space Separator 3272
28.8%
Decimal Number 2064
 
18.2%
Uppercase Letter 841
 
7.4%
ValueCountFrequency (%)
Lowercase Letter 3150
46.0%
Space Separator 1940
28.3%
Decimal Number 1282
18.7%
Uppercase Letter 483
 
7.0%
ValueCountFrequency (%)
Lowercase Letter 1035
45.6%
Space Separator 639
28.1%
Decimal Number 439
19.3%
Uppercase Letter 159
 
7.0%

Most frequent character per category

Space Separator
ValueCountFrequency (%)
3272
100.0%
ValueCountFrequency (%)
1940
100.0%
ValueCountFrequency (%)
639
100.0%
Lowercase Letter
ValueCountFrequency (%)
a 1679
32.4%
n 857
16.5%
e 857
16.5%
s 838
16.1%
o 822
15.8%
m 51
 
1.0%
h 35
 
0.7%
u 35
 
0.7%
i 16
 
0.3%
ValueCountFrequency (%)
a 1003
31.8%
s 520
16.5%
e 511
16.2%
n 511
16.2%
o 492
15.6%
m 47
 
1.5%
i 28
 
0.9%
h 19
 
0.6%
u 19
 
0.6%
ValueCountFrequency (%)
a 330
31.9%
s 171
16.5%
e 168
16.2%
n 168
16.2%
o 162
15.7%
m 15
 
1.4%
i 9
 
0.9%
h 6
 
0.6%
u 6
 
0.6%
Decimal Number
ValueCountFrequency (%)
1 868
42.1%
4 374
18.1%
7 322
 
15.6%
8 316
 
15.3%
3 116
 
5.6%
2 52
 
2.5%
5 16
 
0.8%
ValueCountFrequency (%)
1 580
45.2%
4 210
 
16.4%
8 198
 
15.4%
7 160
 
12.5%
3 56
 
4.4%
2 50
 
3.9%
5 28
 
2.2%
ValueCountFrequency (%)
1 218
49.7%
8 81
 
18.5%
4 59
 
13.4%
7 42
 
9.6%
2 17
 
3.9%
3 13
 
3.0%
5 9
 
2.1%
Uppercase Letter
ValueCountFrequency (%)
D 806
95.8%
N 35
 
4.2%
ValueCountFrequency (%)
D 464
96.1%
N 19
 
3.9%
ValueCountFrequency (%)
D 153
96.2%
N 6
 
3.8%

Most occurring scripts

ValueCountFrequency (%)
Latin 6031
53.1%
Common 5336
46.9%
ValueCountFrequency (%)
Latin 3633
53.0%
Common 3222
47.0%
ValueCountFrequency (%)
Latin 1194
52.6%
Common 1078
47.4%

Most frequent character per script

Common
ValueCountFrequency (%)
3272
61.3%
1 868
 
16.3%
4 374
 
7.0%
7 322
 
6.0%
8 316
 
5.9%
3 116
 
2.2%
2 52
 
1.0%
5 16
 
0.3%
ValueCountFrequency (%)
1940
60.2%
1 580
 
18.0%
4 210
 
6.5%
8 198
 
6.1%
7 160
 
5.0%
3 56
 
1.7%
2 50
 
1.6%
5 28
 
0.9%
ValueCountFrequency (%)
639
59.3%
1 218
 
20.2%
8 81
 
7.5%
4 59
 
5.5%
7 42
 
3.9%
2 17
 
1.6%
3 13
 
1.2%
5 9
 
0.8%
Latin
ValueCountFrequency (%)
a 1679
27.8%
n 857
14.2%
e 857
14.2%
s 838
13.9%
o 822
13.6%
D 806
13.4%
m 51
 
0.8%
N 35
 
0.6%
h 35
 
0.6%
u 35
 
0.6%
ValueCountFrequency (%)
a 1003
27.6%
s 520
14.3%
e 511
14.1%
n 511
14.1%
o 492
13.5%
D 464
12.8%
m 47
 
1.3%
i 28
 
0.8%
N 19
 
0.5%
h 19
 
0.5%
ValueCountFrequency (%)
a 330
27.6%
s 171
14.3%
e 168
14.1%
n 168
14.1%
o 162
13.6%
D 153
12.8%
m 15
 
1.3%
i 9
 
0.8%
N 6
 
0.5%
h 6
 
0.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 11367
100.0%
ValueCountFrequency (%)
ASCII 6855
100.0%
ValueCountFrequency (%)
ASCII 2272
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
3272
28.8%
a 1679
14.8%
1 868
 
7.6%
n 857
 
7.5%
e 857
 
7.5%
s 838
 
7.4%
o 822
 
7.2%
D 806
 
7.1%
4 374
 
3.3%
7 322
 
2.8%
Other values (9) 672
 
5.9%
ValueCountFrequency (%)
1940
28.3%
a 1003
14.6%
1 580
 
8.5%
s 520
 
7.6%
e 511
 
7.5%
n 511
 
7.5%
o 492
 
7.2%
D 464
 
6.8%
4 210
 
3.1%
8 198
 
2.9%
Other values (9) 426
 
6.2%
ValueCountFrequency (%)
639
28.1%
a 330
14.5%
1 218
 
9.6%
s 171
 
7.5%
e 168
 
7.4%
n 168
 
7.4%
o 162
 
7.1%
D 153
 
6.7%
8 81
 
3.6%
4 59
 
2.6%
Other values (9) 123
 
5.4%

TIPOCUP
Categorical

Statistic | Cluster 1 | Cluster 2 | Cluster 3
Distinct | 5 | 5 | 5
Distinct (%) | 0.6% | 1.0% | 3.0%
Missing | 0 | 0 | 0
Missing (%) | 0.0% | 0.0% | 0.0%
Memory size | 13.4 KiB | 8.0 KiB | 2.6 KiB

Length

Statistic | Cluster 1 | Cluster 2 | Cluster 3
Max length | 21 | 21 | 21
Median length | 5 | 5 | 5
Mean length | 7.4924154 | 7.4579256 | 6.8690476
Min length | 5 | 5 | 5

Characters and Unicode

Statistic | Cluster 1 | Cluster 2 | Cluster 3
Total characters | 6421 | 3811 | 1154
Distinct characters | 22 | 22 | 22
Distinct categories | 3 | 3 | 3
Distinct scripts | 2 | 2 | 2
Distinct blocks | 1 | 1 | 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Statistic | Cluster 1 | Cluster 2 | Cluster 3
Unique | 0 | 0 | 1
Unique (%) | 0.0% | 0.0% | 0.6%

Sample

Row | Cluster 1 | Cluster 2 | Cluster 3
1st row | Outra | Outra | Outra
2nd row | Desempregado | Outra | Outra
3rd row | Outra | Dona de Casa | Desempregado
4th row | Desempregado | Outra | Outra
5th row | Outra | Outra | Outra

Common Values

Cluster 1

Value | Count | Frequency (%)
Outra | 547 | 63.8%
Desempregado | 222 | 25.9%
Aposentado | 44 | 5.1%
Dona de Casa | 38 | 4.4%
Profissional de Saude | 6 | 0.7%

Cluster 2

Value | Count | Frequency (%)
Outra | 337 | 65.9%
Dona de Casa | 59 | 11.5%
Desempregado | 57 | 11.2%
Aposentado | 44 | 8.6%
Profissional de Saude | 14 | 2.7%

Cluster 3

Value | Count | Frequency (%)
Outra | 123 | 73.2%
Desempregado | 28 | 16.7%
Dona de Casa | 11 | 6.5%
Aposentado | 5 | 3.0%
Profissional de Saude | 1 | 0.6%

Length

Histogram of lengths of the category

Common Values (Plot)

[Plots for Cluster 1, Cluster 2 and Cluster 3 not reproduced.]

Words

Cluster 1

Value | Count | Frequency (%)
outra | 547 | 57.9%
desempregado | 222 | 23.5%
aposentado | 44 | 4.7%
de | 44 | 4.7%
dona | 38 | 4.0%
casa | 38 | 4.0%
profissional | 6 | 0.6%
saude | 6 | 0.6%

Cluster 2

Value | Count | Frequency (%)
outra | 337 | 51.3%
de | 73 | 11.1%
dona | 59 | 9.0%
casa | 59 | 9.0%
desempregado | 57 | 8.7%
aposentado | 44 | 6.7%
profissional | 14 | 2.1%
saude | 14 | 2.1%

Cluster 3

Value | Count | Frequency (%)
outra | 123 | 64.1%
desempregado | 28 | 14.6%
de | 12 | 6.2%
dona | 11 | 5.7%
casa | 11 | 5.7%
aposentado | 5 | 2.6%
profissional | 1 | 0.5%
saude | 1 | 0.5%
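
The lowercase token counts above appear to come from lowercasing each value and splitting it on whitespace; that tokenisation is an assumption, not something the report states. Under it, the Cluster 1 token counts for `TIPOCUP` can be rebuilt from the category counts, as in this sketch (`word_counts` is an illustrative name).

```python
from collections import Counter

# Cluster 1 counts for TIPOCUP, copied from the Common Values table.
counts = {
    "Outra": 547,
    "Desempregado": 222,
    "Aposentado": 44,
    "Dona de Casa": 38,
    "Profissional de Saude": 6,
}

# Weight every lowercase token by the count of the value it came from.
word_counts = Counter()
for value, n in counts.items():
    for word in value.lower().split():
        word_counts[word] += n

total_words = sum(word_counts.values())
for word, n in word_counts.most_common():
    print(f"{word:<14}{n:>5}{100 * n / total_words:>7.1f}%")
```

Here "de" collects 38 occurrences from "Dona de Casa" and 6 from "Profissional de Saude", giving 44 out of 945 tokens, and "outra" accounts for about 57.9%, which agrees with the Cluster 1 figures.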

Most occurring characters

ValueCountFrequency (%)
a 939
14.6%
r 775
12.1%
e 760
11.8%
t 591
9.2%
u 553
8.6%
O 547
8.5%
o 360
 
5.6%
d 316
 
4.9%
s 316
 
4.9%
p 266
 
4.1%
Other values (12) 998
15.5%
ValueCountFrequency (%)
a 643
16.9%
r 408
10.7%
t 381
10.0%
u 351
9.2%
O 337
8.8%
e 302
7.9%
o 232
 
6.1%
d 188
 
4.9%
s 188
 
4.9%
146
 
3.8%
Other values (12) 635
16.7%
ValueCountFrequency (%)
a 191
16.6%
r 152
13.2%
t 128
11.1%
u 124
10.7%
O 123
10.7%
e 102
8.8%
o 51
 
4.4%
d 46
 
4.0%
s 46
 
4.0%
D 39
 
3.4%
Other values (12) 152
13.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 5432
84.6%
Uppercase Letter 901
 
14.0%
Space Separator 88
 
1.4%
ValueCountFrequency (%)
Lowercase Letter 3081
80.8%
Uppercase Letter 584
 
15.3%
Space Separator 146
 
3.8%
ValueCountFrequency (%)
Lowercase Letter 950
82.3%
Uppercase Letter 180
 
15.6%
Space Separator 24
 
2.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 939
17.3%
r 775
14.3%
e 760
14.0%
t 591
10.9%
u 553
10.2%
o 360
 
6.6%
d 316
 
5.8%
s 316
 
5.8%
p 266
 
4.9%
m 222
 
4.1%
Other values (5) 334
 
6.1%
ValueCountFrequency (%)
a 643
20.9%
r 408
13.2%
t 381
12.4%
u 351
11.4%
e 302
9.8%
o 232
 
7.5%
d 188
 
6.1%
s 188
 
6.1%
n 117
 
3.8%
p 101
 
3.3%
Other values (5) 170
 
5.5%
ValueCountFrequency (%)
a 191
20.1%
r 152
16.0%
t 128
13.5%
u 124
13.1%
e 102
10.7%
o 51
 
5.4%
d 46
 
4.8%
s 46
 
4.8%
p 33
 
3.5%
m 28
 
2.9%
Other values (5) 49
 
5.2%
Uppercase Letter
ValueCountFrequency (%)
O 547
60.7%
D 260
28.9%
A 44
 
4.9%
C 38
 
4.2%
P 6
 
0.7%
S 6
 
0.7%
ValueCountFrequency (%)
O 337
57.7%
D 116
 
19.9%
C 59
 
10.1%
A 44
 
7.5%
P 14
 
2.4%
S 14
 
2.4%
ValueCountFrequency (%)
O 123
68.3%
D 39
 
21.7%
C 11
 
6.1%
A 5
 
2.8%
P 1
 
0.6%
S 1
 
0.6%
Space Separator
ValueCountFrequency (%)
88
100.0%
ValueCountFrequency (%)
146
100.0%
ValueCountFrequency (%)
24
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 6333
98.6%
Common 88
 
1.4%
ValueCountFrequency (%)
Latin 3665
96.2%
Common 146
 
3.8%
ValueCountFrequency (%)
Latin 1130
97.9%
Common 24
 
2.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 939
14.8%
r 775
12.2%
e 760
12.0%
t 591
9.3%
u 553
8.7%
O 547
8.6%
o 360
 
5.7%
d 316
 
5.0%
s 316
 
5.0%
p 266
 
4.2%
Other values (11) 910
14.4%
ValueCountFrequency (%)
a 643
17.5%
r 408
11.1%
t 381
10.4%
u 351
9.6%
O 337
9.2%
e 302
8.2%
o 232
 
6.3%
d 188
 
5.1%
s 188
 
5.1%
n 117
 
3.2%
Other values (11) 518
14.1%
ValueCountFrequency (%)
a 191
16.9%
r 152
13.5%
t 128
11.3%
u 124
11.0%
O 123
10.9%
e 102
9.0%
o 51
 
4.5%
d 46
 
4.1%
s 46
 
4.1%
D 39
 
3.5%
Other values (11) 128
11.3%
Common
ValueCountFrequency (%)
88
100.0%
ValueCountFrequency (%)
146
100.0%
ValueCountFrequency (%)
24
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 6421
100.0%
ValueCountFrequency (%)
ASCII 3811
100.0%
ValueCountFrequency (%)
ASCII 1154
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 939
14.6%
r 775
12.1%
e 760
11.8%
t 591
9.2%
u 553
8.6%
O 547
8.5%
o 360
 
5.6%
d 316
 
4.9%
s 316
 
4.9%
p 266
 
4.1%
Other values (12) 998
15.5%
ValueCountFrequency (%)
a 643
16.9%
r 408
10.7%
t 381
10.0%
u 351
9.2%
O 337
8.8%
e 302
7.9%
o 232
 
6.1%
d 188
 
4.9%
s 188
 
4.9%
146
 
3.8%
Other values (12) 635
16.7%
ValueCountFrequency (%)
a 191
16.6%
r 152
13.2%
t 128
11.1%
u 124
10.7%
O 123
10.7%
e 102
8.8%
o 51
 
4.4%
d 46
 
4.0%
s 46
 
4.0%
D 39
 
3.4%
Other values (12) 152
13.2%

sitAtual
Categorical

Statistic | Cluster 1 | Cluster 2 | Cluster 3
Distinct | 2 | 2 | 2
Distinct (%) | 0.2% | 0.4% | 1.2%
Missing | 0 | 0 | 0
Missing (%) | 0.0% | 0.0% | 0.0%
Memory size | 13.4 KiB | 8.0 KiB | 2.6 KiB

Length

Statistic | Cluster 1 | Cluster 2 | Cluster 3
Max length | 8 | 8 | 8
Median length | 4 | 4 | 4
Mean length | 4.6534422 | 4.3209393 | 5.1190476
Min length | 4 | 4 | 4

Characters and Unicode

Statistic | Cluster 1 | Cluster 2 | Cluster 3
Total characters | 3988 | 2208 | 860
Distinct characters | 9 | 9 | 9
Distinct categories | 2 | 2 | 2
Distinct scripts | 1 | 1 | 1
Distinct blocks | 1 | 1 | 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Statistic | Cluster 1 | Cluster 2 | Cluster 3
Unique | 0 | 0 | 0
Unique (%) | 0.0% | 0.0% | 0.0%

Sample

Row | Cluster 1 | Cluster 2 | Cluster 3
1st row | Cura | Cura | Cura
2nd row | Cura | Cura | Cura
3rd row | Cura | Cura | Abandono
4th row | Cura | Cura | Cura
5th row | Cura | Cura | Cura

Common Values

Cluster 1
Value      Count   Frequency (%)
Cura       717     83.7%
Abandono   140     16.3%

Cluster 2
Value      Count   Frequency (%)
Cura       470     92.0%
Abandono   41      8.0%

Cluster 3
Value      Count   Frequency (%)
Cura       121     72.0%
Abandono   47      28.0%
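
The frequency percentages and the length statistics for a categorical variable can be recomputed from the value counts alone. The following is a minimal sketch under that assumption; `summarize` is a hypothetical helper name, and the input is the Cluster 1 count pair for sitAtual copied from this section.

```python
def summarize(value_counts):
    """Return per-value frequency (%), total characters, and mean value length."""
    total = sum(value_counts.values())
    # Share of observations per value, rounded to one decimal as in the report.
    frequencies = {v: round(100 * c / total, 1) for v, c in value_counts.items()}
    # Every occurrence of a value contributes len(value) characters.
    total_chars = sum(len(v) * c for v, c in value_counts.items())
    return frequencies, total_chars, total_chars / total

# Cluster 1 counts for sitAtual.
freqs, total_chars, mean_len = summarize({"Cura": 717, "Abandono": 140})
print(freqs, total_chars, mean_len)
```

The same arithmetic applies to the other clusters and variables: the mean length is the total character count divided by the number of observations.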

Length

[Figure: histogram of lengths of the category for sitAtual]

Common Values (Plot)

[Figures: common values of sitAtual for Cluster 1, Cluster 2, and Cluster 3]
ValueCountFrequency (%)
cura 717
83.7%
abandono 140
 
16.3%
ValueCountFrequency (%)
cura 470
92.0%
abandono 41
 
8.0%
ValueCountFrequency (%)
cura 121
72.0%
abandono 47
 
28.0%

Most occurring characters

ValueCountFrequency (%)
a 857
21.5%
C 717
18.0%
u 717
18.0%
r 717
18.0%
n 280
 
7.0%
o 280
 
7.0%
A 140
 
3.5%
b 140
 
3.5%
d 140
 
3.5%
ValueCountFrequency (%)
a 511
23.1%
C 470
21.3%
u 470
21.3%
r 470
21.3%
n 82
 
3.7%
o 82
 
3.7%
A 41
 
1.9%
b 41
 
1.9%
d 41
 
1.9%
ValueCountFrequency (%)
a 168
19.5%
C 121
14.1%
u 121
14.1%
r 121
14.1%
n 94
10.9%
o 94
10.9%
A 47
 
5.5%
b 47
 
5.5%
d 47
 
5.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 3131
78.5%
Uppercase Letter 857
 
21.5%
ValueCountFrequency (%)
Lowercase Letter 1697
76.9%
Uppercase Letter 511
 
23.1%
ValueCountFrequency (%)
Lowercase Letter 692
80.5%
Uppercase Letter 168
 
19.5%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 857
27.4%
u 717
22.9%
r 717
22.9%
n 280
 
8.9%
o 280
 
8.9%
b 140
 
4.5%
d 140
 
4.5%
ValueCountFrequency (%)
a 511
30.1%
u 470
27.7%
r 470
27.7%
n 82
 
4.8%
o 82
 
4.8%
b 41
 
2.4%
d 41
 
2.4%
ValueCountFrequency (%)
a 168
24.3%
u 121
17.5%
r 121
17.5%
n 94
13.6%
o 94
13.6%
b 47
 
6.8%
d 47
 
6.8%
Uppercase Letter
ValueCountFrequency (%)
C 717
83.7%
A 140
 
16.3%
ValueCountFrequency (%)
C 470
92.0%
A 41
 
8.0%
ValueCountFrequency (%)
C 121
72.0%
A 47
 
28.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 3988
100.0%
ValueCountFrequency (%)
Latin 2208
100.0%
ValueCountFrequency (%)
Latin 860
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 857
21.5%
C 717
18.0%
u 717
18.0%
r 717
18.0%
n 280
 
7.0%
o 280
 
7.0%
A 140
 
3.5%
b 140
 
3.5%
d 140
 
3.5%
ValueCountFrequency (%)
a 511
23.1%
C 470
21.3%
u 470
21.3%
r 470
21.3%
n 82
 
3.7%
o 82
 
3.7%
A 41
 
1.9%
b 41
 
1.9%
d 41
 
1.9%
ValueCountFrequency (%)
a 168
19.5%
C 121
14.1%
u 121
14.1%
r 121
14.1%
n 94
10.9%
o 94
10.9%
A 47
 
5.5%
b 47
 
5.5%
d 47
 
5.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 3988
100.0%
ValueCountFrequency (%)
ASCII 2208
100.0%
ValueCountFrequency (%)
ASCII 860
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 857
21.5%
C 717
18.0%
u 717
18.0%
r 717
18.0%
n 280
 
7.0%
o 280
 
7.0%
A 140
 
3.5%
b 140
 
3.5%
d 140
 
3.5%
ValueCountFrequency (%)
a 511
23.1%
C 470
21.3%
u 470
21.3%
r 470
21.3%
n 82
 
3.7%
o 82
 
3.7%
A 41
 
1.9%
b 41
 
1.9%
d 41
 
1.9%
ValueCountFrequency (%)
a 168
19.5%
C 121
14.1%
u 121
14.1%
r 121
14.1%
n 94
10.9%
o 94
10.9%
A 47
 
5.5%
b 47
 
5.5%
d 47
 
5.5%

tipoCaso
Categorical

               Cluster 1   Cluster 2   Cluster 3
Distinct       3           3           4
Distinct (%)   0.4%        0.6%        2.4%
Missing        0           0           0
Missing (%)    0.0%        0.0%        0.0%
Memory size    13.4 KiB    8.0 KiB     2.6 KiB

Top values, Cluster 1: Novo 823, Recidiva 21, Retr Aband 13
Top values, Cluster 2: Novo 500, Retr Aband 6, Recidiva 5
Top values, Cluster 3: Novo 154, Retr Aband 7, Recidiva 6, Retrat apos falencia/resistencia 1

Length

                Cluster 1   Cluster 2   Cluster 3
Max length      10          10          32
Median length   4           4           4
Mean length     4.1890315   4.109589    4.5595238
Min length      4           4           4

Characters and Unicode

                      Cluster 1   Cluster 2   Cluster 3
Total characters      3590        2100        766
Distinct characters   15          15          20
Distinct categories   3           3           4
Distinct scripts      2           2           2
Distinct blocks       1           1           1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

             Cluster 1   Cluster 2   Cluster 3
Unique       0           0           1
Unique (%)   0.0%        0.0%        0.6%

Sample

          Cluster 1   Cluster 2   Cluster 3
1st row   Novo        Novo        Novo
2nd row   Novo        Novo        Novo
3rd row   Novo        Novo        Novo
4th row   Novo        Novo        Novo
5th row   Novo        Novo        Novo

Common Values

Cluster 1
Value        Count   Frequency (%)
Novo         823     96.0%
Recidiva     21      2.5%
Retr Aband   13      1.5%

Cluster 2
Value        Count   Frequency (%)
Novo         500     97.8%
Retr Aband   6       1.2%
Recidiva     5       1.0%

Cluster 3
Value                              Count   Frequency (%)
Novo                               154     91.7%
Retr Aband                         7       4.2%
Recidiva                           6       3.6%
Retrat apos falencia/resistencia   1       0.6%

Length

[Figure: histogram of lengths of the category for tipoCaso]

Common Values (Plot)

[Figures: common values of tipoCaso for Cluster 1, Cluster 2, and Cluster 3]
ValueCountFrequency (%)
novo 823
94.6%
recidiva 21
 
2.4%
retr 13
 
1.5%
aband 13
 
1.5%
ValueCountFrequency (%)
novo 500
96.7%
retr 6
 
1.2%
aband 6
 
1.2%
recidiva 5
 
1.0%
ValueCountFrequency (%)
novo 154
87.0%
retr 7
 
4.0%
aband 7
 
4.0%
recidiva 6
 
3.4%
retrat 1
 
0.6%
apos 1
 
0.6%
falencia/resistencia 1
 
0.6%

Most occurring characters

ValueCountFrequency (%)
o 1646
45.8%
v 844
23.5%
N 823
22.9%
i 42
 
1.2%
R 34
 
0.9%
e 34
 
0.9%
d 34
 
0.9%
a 34
 
0.9%
c 21
 
0.6%
t 13
 
0.4%
Other values (5) 65
 
1.8%
ValueCountFrequency (%)
o 1000
47.6%
v 505
24.0%
N 500
23.8%
R 11
 
0.5%
e 11
 
0.5%
a 11
 
0.5%
d 11
 
0.5%
i 10
 
0.5%
t 6
 
0.3%
r 6
 
0.3%
Other values (5) 29
 
1.4%
ValueCountFrequency (%)
o 309
40.3%
v 160
20.9%
N 154
20.1%
a 18
 
2.3%
e 17
 
2.2%
i 15
 
2.0%
R 14
 
1.8%
d 13
 
1.7%
t 10
 
1.3%
r 9
 
1.2%
Other values (10) 47
 
6.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 2707
75.4%
Uppercase Letter 870
 
24.2%
Space Separator 13
 
0.4%
ValueCountFrequency (%)
Lowercase Letter 1577
75.1%
Uppercase Letter 517
 
24.6%
Space Separator 6
 
0.3%
ValueCountFrequency (%)
Lowercase Letter 581
75.8%
Uppercase Letter 175
 
22.8%
Space Separator 9
 
1.2%
Other Punctuation 1
 
0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o 1646
60.8%
v 844
31.2%
i 42
 
1.6%
e 34
 
1.3%
d 34
 
1.3%
a 34
 
1.3%
c 21
 
0.8%
t 13
 
0.5%
r 13
 
0.5%
b 13
 
0.5%
ValueCountFrequency (%)
o 1000
63.4%
v 505
32.0%
e 11
 
0.7%
a 11
 
0.7%
d 11
 
0.7%
i 10
 
0.6%
t 6
 
0.4%
r 6
 
0.4%
b 6
 
0.4%
n 6
 
0.4%
ValueCountFrequency (%)
o 309
53.2%
v 160
27.5%
a 18
 
3.1%
e 17
 
2.9%
i 15
 
2.6%
d 13
 
2.2%
t 10
 
1.7%
r 9
 
1.5%
n 9
 
1.5%
c 8
 
1.4%
Other values (5) 13
 
2.2%
Uppercase Letter
ValueCountFrequency (%)
N 823
94.6%
R 34
 
3.9%
A 13
 
1.5%
ValueCountFrequency (%)
N 500
96.7%
R 11
 
2.1%
A 6
 
1.2%
ValueCountFrequency (%)
N 154
88.0%
R 14
 
8.0%
A 7
 
4.0%
Space Separator
ValueCountFrequency (%)
13
100.0%
ValueCountFrequency (%)
6
100.0%
ValueCountFrequency (%)
9
100.0%
Other Punctuation
ValueCountFrequency (%)
/ 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 3577
99.6%
Common 13
 
0.4%
ValueCountFrequency (%)
Latin 2094
99.7%
Common 6
 
0.3%
ValueCountFrequency (%)
Latin 756
98.7%
Common 10
 
1.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
o 1646
46.0%
v 844
23.6%
N 823
23.0%
i 42
 
1.2%
R 34
 
1.0%
e 34
 
1.0%
d 34
 
1.0%
a 34
 
1.0%
c 21
 
0.6%
t 13
 
0.4%
Other values (4) 52
 
1.5%
ValueCountFrequency (%)
o 1000
47.8%
v 505
24.1%
N 500
23.9%
R 11
 
0.5%
e 11
 
0.5%
a 11
 
0.5%
d 11
 
0.5%
i 10
 
0.5%
t 6
 
0.3%
r 6
 
0.3%
Other values (4) 23
 
1.1%
ValueCountFrequency (%)
o 309
40.9%
v 160
21.2%
N 154
20.4%
a 18
 
2.4%
e 17
 
2.2%
i 15
 
2.0%
R 14
 
1.9%
d 13
 
1.7%
t 10
 
1.3%
r 9
 
1.2%
Other values (8) 37
 
4.9%
Common
ValueCountFrequency (%)
13
100.0%
ValueCountFrequency (%)
6
100.0%
ValueCountFrequency (%)
9
90.0%
/ 1
 
10.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 3590
100.0%
ValueCountFrequency (%)
ASCII 2100
100.0%
ValueCountFrequency (%)
ASCII 766
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
o 1646
45.8%
v 844
23.5%
N 823
22.9%
i 42
 
1.2%
R 34
 
0.9%
e 34
 
0.9%
d 34
 
0.9%
a 34
 
0.9%
c 21
 
0.6%
t 13
 
0.4%
Other values (5) 65
 
1.8%
ValueCountFrequency (%)
o 1000
47.6%
v 505
24.0%
N 500
23.8%
R 11
 
0.5%
e 11
 
0.5%
a 11
 
0.5%
d 11
 
0.5%
i 10
 
0.5%
t 6
 
0.3%
r 6
 
0.3%
Other values (5) 29
 
1.4%
ValueCountFrequency (%)
o 309
40.3%
v 160
20.9%
N 154
20.1%
a 18
 
2.3%
e 17
 
2.2%
i 15
 
2.0%
R 14
 
1.8%
d 13
 
1.7%
t 10
 
1.3%
r 9
 
1.2%
Other values (10) 47
 
6.1%

FORMACLIN1
Categorical

               Cluster 1   Cluster 2   Cluster 3
Distinct       8           10          8
Distinct (%)   0.9%        2.0%        4.8%
Missing        0           0           0
Missing (%)    0.0%        0.0%        0.0%
Memory size    13.4 KiB    8.0 KiB     2.6 KiB

Top values, Cluster 1: Pul 818, Pleural 27, Ganglionar Periferica 5, Meningea 3, Vias Urinarias 1, Other values (3) 3
Top values, Cluster 2: Pul 435, Pleural 43, Ganglionar Periferica 13, Oftalmica 6, Pele 5, Other values (5) 9
Top values, Cluster 3: Pul 134, Pleural 15, Ganglionar Periferica 6, Meningea 5, Miliar 3, Other values (3) 5

Length

                Cluster 1   Cluster 2   Cluster 3
Max length      21          21          21
Median length   3           3           3
Mean length     3.2823804   3.9412916   4.4285714
Min length      3           3           3

Characters and Unicode

                      Cluster 1   Cluster 2   Cluster 3
Total characters      2813        2014        744
Distinct characters   21          21          20
Distinct categories   3           3           3
Distinct scripts      2           2           2
Distinct blocks       1           1           1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

             Cluster 1   Cluster 2   Cluster 3
Unique       4           2           1
Unique (%)   0.5%        0.4%        0.6%

Sample

          Cluster 1   Cluster 2               Cluster 3
1st row   Pul         Ganglionar Periferica   Pul
2nd row   Pul         Pul                     Pul
3rd row   Pul         Pleural                 Pul
4th row   Pul         Pul                     Pleural
5th row   Pul         Pul                     Pul

Common Values

Cluster 1
Value                   Count   Frequency (%)
Pul                     818     95.4%
Pleural                 27      3.2%
Ganglionar Periferica   5       0.6%
Meningea                3       0.4%
Vias Urinarias          1       0.1%
Ossea                   1       0.1%
Outras                  1       0.1%
Multiplos Orgaos        1       0.1%

Cluster 2
Value                   Count   Frequency (%)
Pul                     435     85.1%
Pleural                 43      8.4%
Ganglionar Periferica   13      2.5%
Oftalmica               6       1.2%
Pele                    5       1.0%
Outras                  3       0.6%
Ossea                   2       0.4%
Miliar                  2       0.4%
Vias Urinarias          1       0.2%
Genital                 1       0.2%

Cluster 3
Value                   Count   Frequency (%)
Pul                     134     79.8%
Pleural                 15      8.9%
Ganglionar Periferica   6       3.6%
Meningea                5       3.0%
Miliar                  3       1.8%
Outras                  2       1.2%
Multiplos Orgaos        2       1.2%
Oftalmica               1       0.6%

Length

[Figure: histogram of lengths of the category for FORMACLIN1]

Common Values (Plot)

[Figures: common values of FORMACLIN1 for Cluster 1, Cluster 2, and Cluster 3]
ValueCountFrequency (%)
pul 818
94.7%
pleural 27
 
3.1%
ganglionar 5
 
0.6%
periferica 5
 
0.6%
meningea 3
 
0.3%
vias 1
 
0.1%
urinarias 1
 
0.1%
ossea 1
 
0.1%
outras 1
 
0.1%
multiplos 1
 
0.1%
ValueCountFrequency (%)
pul 435
82.9%
pleural 43
 
8.2%
ganglionar 13
 
2.5%
periferica 13
 
2.5%
oftalmica 6
 
1.1%
pele 5
 
1.0%
outras 3
 
0.6%
ossea 2
 
0.4%
miliar 2
 
0.4%
vias 1
 
0.2%
Other values (2) 2
 
0.4%
ValueCountFrequency (%)
pul 134
76.1%
pleural 15
 
8.5%
ganglionar 6
 
3.4%
periferica 6
 
3.4%
meningea 5
 
2.8%
miliar 3
 
1.7%
outras 2
 
1.1%
multiplos 2
 
1.1%
orgaos 2
 
1.1%
oftalmica 1
 
0.6%

Most occurring characters

ValueCountFrequency (%)
l 879
31.2%
P 850
30.2%
u 847
30.1%
a 51
 
1.8%
r 46
 
1.6%
e 44
 
1.6%
i 22
 
0.8%
n 17
 
0.6%
g 9
 
0.3%
7
 
0.2%
Other values (11) 41
 
1.5%
ValueCountFrequency (%)
l 548
27.2%
P 496
24.6%
u 481
23.9%
a 105
 
5.2%
r 89
 
4.4%
e 82
 
4.1%
i 53
 
2.6%
n 28
 
1.4%
f 19
 
0.9%
c 19
 
0.9%
Other values (11) 94
 
4.7%
ValueCountFrequency (%)
l 178
23.9%
P 155
20.8%
u 153
20.6%
a 47
 
6.3%
r 40
 
5.4%
e 37
 
5.0%
i 32
 
4.3%
n 22
 
3.0%
g 13
 
1.7%
M 10
 
1.3%
Other values (10) 57
 
7.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1942
69.0%
Uppercase Letter 864
30.7%
Space Separator 7
 
0.2%
ValueCountFrequency (%)
Lowercase Letter 1475
73.2%
Uppercase Letter 525
 
26.1%
Space Separator 14
 
0.7%
ValueCountFrequency (%)
Lowercase Letter 560
75.3%
Uppercase Letter 176
 
23.7%
Space Separator 8
 
1.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
l 879
45.3%
u 847
43.6%
a 51
 
2.6%
r 46
 
2.4%
e 44
 
2.3%
i 22
 
1.1%
n 17
 
0.9%
g 9
 
0.5%
s 7
 
0.4%
o 7
 
0.4%
Other values (4) 13
 
0.7%
ValueCountFrequency (%)
l 548
37.2%
u 481
32.6%
a 105
 
7.1%
r 89
 
6.0%
e 82
 
5.6%
i 53
 
3.6%
n 28
 
1.9%
f 19
 
1.3%
c 19
 
1.3%
o 13
 
0.9%
Other values (4) 38
 
2.6%
ValueCountFrequency (%)
l 178
31.8%
u 153
27.3%
a 47
 
8.4%
r 40
 
7.1%
e 37
 
6.6%
i 32
 
5.7%
n 22
 
3.9%
g 13
 
2.3%
o 10
 
1.8%
f 7
 
1.2%
Other values (5) 21
 
3.8%
Uppercase Letter
ValueCountFrequency (%)
P 850
98.4%
G 5
 
0.6%
M 4
 
0.5%
O 3
 
0.3%
V 1
 
0.1%
U 1
 
0.1%
ValueCountFrequency (%)
P 496
94.5%
G 14
 
2.7%
O 11
 
2.1%
M 2
 
0.4%
V 1
 
0.2%
U 1
 
0.2%
ValueCountFrequency (%)
P 155
88.1%
M 10
 
5.7%
G 6
 
3.4%
O 5
 
2.8%
Space Separator
ValueCountFrequency (%)
7
100.0%
ValueCountFrequency (%)
14
100.0%
ValueCountFrequency (%)
8
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 2806
99.8%
Common 7
 
0.2%
ValueCountFrequency (%)
Latin 2000
99.3%
Common 14
 
0.7%
ValueCountFrequency (%)
Latin 736
98.9%
Common 8
 
1.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
l 879
31.3%
P 850
30.3%
u 847
30.2%
a 51
 
1.8%
r 46
 
1.6%
e 44
 
1.6%
i 22
 
0.8%
n 17
 
0.6%
g 9
 
0.3%
s 7
 
0.2%
Other values (10) 34
 
1.2%
ValueCountFrequency (%)
l 548
27.4%
P 496
24.8%
u 481
24.1%
a 105
 
5.2%
r 89
 
4.5%
e 82
 
4.1%
i 53
 
2.6%
n 28
 
1.4%
f 19
 
0.9%
c 19
 
0.9%
Other values (10) 80
 
4.0%
ValueCountFrequency (%)
l 178
24.2%
P 155
21.1%
u 153
20.8%
a 47
 
6.4%
r 40
 
5.4%
e 37
 
5.0%
i 32
 
4.3%
n 22
 
3.0%
g 13
 
1.8%
M 10
 
1.4%
Other values (9) 49
 
6.7%
Common
ValueCountFrequency (%)
7
100.0%
ValueCountFrequency (%)
14
100.0%
ValueCountFrequency (%)
8
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2813
100.0%
ValueCountFrequency (%)
ASCII 2014
100.0%
ValueCountFrequency (%)
ASCII 744
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
l 879
31.2%
P 850
30.2%
u 847
30.1%
a 51
 
1.8%
r 46
 
1.6%
e 44
 
1.6%
i 22
 
0.8%
n 17
 
0.6%
g 9
 
0.3%
7
 
0.2%
Other values (11) 41
 
1.5%
ValueCountFrequency (%)
l 548
27.2%
P 496
24.6%
u 481
23.9%
a 105
 
5.2%
r 89
 
4.4%
e 82
 
4.1%
i 53
 
2.6%
n 28
 
1.4%
f 19
 
0.9%
c 19
 
0.9%
Other values (11) 94
 
4.7%
ValueCountFrequency (%)
l 178
23.9%
P 155
20.8%
u 153
20.6%
a 47
 
6.3%
r 40
 
5.4%
e 37
 
5.0%
i 32
 
4.3%
n 22
 
3.0%
g 13
 
1.7%
M 10
 
1.3%
Other values (10) 57
 
7.7%

classif
Categorical

               Cluster 1   Cluster 2   Cluster 3
Distinct       4           3           4
Distinct (%)   0.5%        0.6%        2.4%
Missing        0           0           0
Missing (%)    0.0%        0.0%        0.0%
Memory size    13.4 KiB    8.0 KiB     2.6 KiB

Top values, Cluster 1: Pul 785, Ext 38, P+E 33, Dissem 1
Top values, Cluster 2: Pul 428, Ext 76, P+E 7
Top values, Cluster 3: Pul 68, P+E 66, Ext 32, Dissem 2

Length

                Cluster 1   Cluster 2   Cluster 3
Max length      6           3           6
Median length   3           3           3
Mean length     3.0035006   3           3.0357143
Min length      3           3           3

Characters and Unicode

                      Cluster 1   Cluster 2   Cluster 3
Total characters      2574        1533        510
Distinct characters   12          7           12
Distinct categories   3           3           3
Distinct scripts      2           2           2
Distinct blocks       1           1           1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

             Cluster 1   Cluster 2   Cluster 3
Unique       1           0           0
Unique (%)   0.1%        0.0%        0.0%

Sample

          Cluster 1   Cluster 2   Cluster 3
1st row   Pul         Ext         P+E
2nd row   P+E         Pul         P+E
3rd row   P+E         Ext         Pul
4th row   Pul         Pul         Ext
5th row   Pul         Pul         P+E

Common Values

Cluster 1
Value    Count   Frequency (%)
Pul      785     91.6%
Ext      38      4.4%
P+E      33      3.9%
Dissem   1       0.1%

Cluster 2
Value    Count   Frequency (%)
Pul      428     83.8%
Ext      76      14.9%
P+E      7       1.4%

Cluster 3
Value    Count   Frequency (%)
Pul      68      40.5%
P+E      66      39.3%
Ext      32      19.0%
Dissem   2       1.2%

Length

[Figure: histogram of lengths of the category for classif]

Common Values (Plot)

[Figures: common values of classif for Cluster 1, Cluster 2, and Cluster 3]
ValueCountFrequency (%)
pul 785
91.6%
ext 38
 
4.4%
p+e 33
 
3.9%
dissem 1
 
0.1%
ValueCountFrequency (%)
pul 428
83.8%
ext 76
 
14.9%
p+e 7
 
1.4%
ValueCountFrequency (%)
pul 68
40.5%
p+e 66
39.3%
ext 32
19.0%
dissem 2
 
1.2%

Most occurring characters

ValueCountFrequency (%)
P 818
31.8%
u 785
30.5%
l 785
30.5%
E 71
 
2.8%
x 38
 
1.5%
t 38
 
1.5%
+ 33
 
1.3%
s 2
 
0.1%
D 1
 
< 0.1%
i 1
 
< 0.1%
Other values (2) 2
 
0.1%
ValueCountFrequency (%)
P 435
28.4%
u 428
27.9%
l 428
27.9%
E 83
 
5.4%
x 76
 
5.0%
t 76
 
5.0%
+ 7
 
0.5%
ValueCountFrequency (%)
P 134
26.3%
E 98
19.2%
u 68
13.3%
l 68
13.3%
+ 66
12.9%
x 32
 
6.3%
t 32
 
6.3%
s 4
 
0.8%
D 2
 
0.4%
i 2
 
0.4%
Other values (2) 4
 
0.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1651
64.1%
Uppercase Letter 890
34.6%
Math Symbol 33
 
1.3%
ValueCountFrequency (%)
Lowercase Letter 1008
65.8%
Uppercase Letter 518
33.8%
Math Symbol 7
 
0.5%
ValueCountFrequency (%)
Uppercase Letter 234
45.9%
Lowercase Letter 210
41.2%
Math Symbol 66
 
12.9%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
P 818
91.9%
E 71
 
8.0%
D 1
 
0.1%
ValueCountFrequency (%)
P 435
84.0%
E 83
 
16.0%
ValueCountFrequency (%)
P 134
57.3%
E 98
41.9%
D 2
 
0.9%
Lowercase Letter
ValueCountFrequency (%)
u 785
47.5%
l 785
47.5%
x 38
 
2.3%
t 38
 
2.3%
s 2
 
0.1%
i 1
 
0.1%
e 1
 
0.1%
m 1
 
0.1%
ValueCountFrequency (%)
u 428
42.5%
l 428
42.5%
x 76
 
7.5%
t 76
 
7.5%
ValueCountFrequency (%)
u 68
32.4%
l 68
32.4%
x 32
15.2%
t 32
15.2%
s 4
 
1.9%
i 2
 
1.0%
e 2
 
1.0%
m 2
 
1.0%
Math Symbol
ValueCountFrequency (%)
+ 33
100.0%
ValueCountFrequency (%)
+ 7
100.0%
ValueCountFrequency (%)
+ 66
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 2541
98.7%
Common 33
 
1.3%
ValueCountFrequency (%)
Latin 1526
99.5%
Common 7
 
0.5%
ValueCountFrequency (%)
Latin 444
87.1%
Common 66
 
12.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
P 818
32.2%
u 785
30.9%
l 785
30.9%
E 71
 
2.8%
x 38
 
1.5%
t 38
 
1.5%
s 2
 
0.1%
D 1
 
< 0.1%
i 1
 
< 0.1%
e 1
 
< 0.1%
ValueCountFrequency (%)
P 435
28.5%
u 428
28.0%
l 428
28.0%
E 83
 
5.4%
x 76
 
5.0%
t 76
 
5.0%
ValueCountFrequency (%)
P 134
30.2%
E 98
22.1%
u 68
15.3%
l 68
15.3%
x 32
 
7.2%
t 32
 
7.2%
s 4
 
0.9%
D 2
 
0.5%
i 2
 
0.5%
e 2
 
0.5%
Common
ValueCountFrequency (%)
+ 33
100.0%
ValueCountFrequency (%)
+ 7
100.0%
ValueCountFrequency (%)
+ 66
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2574
100.0%
ValueCountFrequency (%)
ASCII 1533
100.0%
ValueCountFrequency (%)
ASCII 510
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
P 818
31.8%
u 785
30.5%
l 785
30.5%
E 71
 
2.8%
x 38
 
1.5%
t 38
 
1.5%
+ 33
 
1.3%
s 2
 
0.1%
D 1
 
< 0.1%
i 1
 
< 0.1%
Other values (2) 2
 
0.1%
ValueCountFrequency (%)
P 435
28.4%
u 428
27.9%
l 428
27.9%
E 83
 
5.4%
x 76
 
5.0%
t 76
 
5.0%
+ 7
 
0.5%
ValueCountFrequency (%)
P 134
26.3%
E 98
19.2%
u 68
13.3%
l 68
13.3%
+ 66
12.9%
x 32
 
6.3%
t 32
 
6.3%
s 4
 
0.8%
D 2
 
0.4%
i 2
 
0.4%
Other values (2) 4
 
0.8%

descoberta
Categorical

               Cluster 1   Cluster 2   Cluster 3
Distinct       6           6           6
Distinct (%)   0.7%        1.2%        3.6%
Missing        0           0           0
Missing (%)    0.0%        0.0%        0.0%
Memory size    13.4 KiB    8.0 KiB     2.6 KiB

Top values, Cluster 1: Demanda Ambulatorial 340, Urgencia / Emergencia 267, Elucidacao Diagn. em Internacao 168, Busca Ativa na Comunidade 34, Investigacao de Contatos 24
Top values, Cluster 2: Demanda Ambulatorial 329, Urgencia / Emergencia 92, Elucidacao Diagn. em Internacao 61, Investigacao de Contatos 16, Busca Ativa na Comunidade 11
Top values, Cluster 3: Elucidacao Diagn. em Internacao 115, Demanda Ambulatorial 32, Urgencia / Emergencia 17, Busca Ativa em Instituicao 2, Busca Ativa na Comunidade 1

Length

                Cluster 1   Cluster 2   Cluster 3
Max length      31          31          31
Median length   26          20          31
Mean length     22.946324   21.749511   27.755952
Min length      20          20          20

Characters and Unicode

                      Cluster 1   Cluster 2   Cluster 3
Total characters      19665       11114       4663
Distinct characters   26          26          26
Distinct categories   4           4           4
Distinct scripts      2           2           2
Distinct blocks       1           1           1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

             Cluster 1   Cluster 2   Cluster 3
Unique       0           0           2
Unique (%)   0.0%        0.0%        1.2%

Sample

          Cluster 1                         Cluster 2              Cluster 3
1st row   Elucidacao Diagn. em Internacao   Demanda Ambulatorial   Elucidacao Diagn. em Internacao
2nd row   Demanda Ambulatorial              Demanda Ambulatorial   Urgencia / Emergencia
3rd row   Demanda Ambulatorial              Demanda Ambulatorial   Demanda Ambulatorial
4th row   Demanda Ambulatorial              Demanda Ambulatorial   Elucidacao Diagn. em Internacao
5th row   Demanda Ambulatorial              Demanda Ambulatorial   Elucidacao Diagn. em Internacao

Common Values

Cluster 1
Value                             Count   Frequency (%)
Demanda Ambulatorial              340     39.7%
Urgencia / Emergencia             267     31.2%
Elucidacao Diagn. em Internacao   168     19.6%
Busca Ativa na Comunidade         34      4.0%
Investigacao de Contatos          24      2.8%
Busca Ativa em Instituicao        24      2.8%

Cluster 2
Value                             Count   Frequency (%)
Demanda Ambulatorial              329     64.4%
Urgencia / Emergencia             92      18.0%
Elucidacao Diagn. em Internacao   61      11.9%
Investigacao de Contatos          16      3.1%
Busca Ativa na Comunidade         11      2.2%
Busca Ativa em Instituicao        2       0.4%

Cluster 3
Value                             Count   Frequency (%)
Elucidacao Diagn. em Internacao   115     68.5%
Demanda Ambulatorial              32      19.0%
Urgencia / Emergencia             17      10.1%
Busca Ativa em Instituicao        2       1.2%
Busca Ativa na Comunidade         1       0.6%
Investigacao de Contatos          1       0.6%

Length

[Figure: histogram of lengths of the category for descoberta]

Common Values (Plot)

[Figures: common values of descoberta for Cluster 1, Cluster 2, and Cluster 3]
ValueCountFrequency (%)
demanda 340
13.8%
ambulatorial 340
13.8%
urgencia 267
10.9%
267
10.9%
emergencia 267
10.9%
em 192
7.8%
internacao 168
6.8%
diagn 168
6.8%
elucidacao 168
6.8%
busca 58
 
2.4%
Other values (7) 222
9.0%
ValueCountFrequency (%)
demanda 329
25.7%
ambulatorial 329
25.7%
urgencia 92
 
7.2%
92
 
7.2%
emergencia 92
 
7.2%
em 63
 
4.9%
internacao 61
 
4.8%
diagn 61
 
4.8%
elucidacao 61
 
4.8%
investigacao 16
 
1.3%
Other values (7) 82
 
6.4%
ValueCountFrequency (%)
em 117
19.8%
elucidacao 115
19.5%
internacao 115
19.5%
diagn 115
19.5%
demanda 32
 
5.4%
ambulatorial 32
 
5.4%
17
 
2.9%
emergencia 17
 
2.9%
urgencia 17
 
2.9%
busca 3
 
0.5%
Other values (7) 10
 
1.7%
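
The word-level counts directly above appear consistent with lowercasing each value, splitting it on whitespace, and trimming punctuation from the ends of each token. This is an inference from the numbers, not a confirmed description of how the report was generated. The sketch below follows that assumed procedure; `word_counts` is a hypothetical helper name, and the input is the Cluster 3 value counts for descoberta from this section.

```python
import string
from collections import Counter

def word_counts(value_counts):
    """Count lowercase whitespace-separated tokens, weighted by value frequency."""
    strip_chars = string.punctuation + string.whitespace
    counts = Counter()
    for value, count in value_counts.items():
        for token in value.lower().split():
            # Assumed cleaning step: trim punctuation from both ends,
            # so "diagn." becomes "diagn" and a lone "/" becomes "".
            counts[token.strip(strip_chars)] += count
    return counts

# Cluster 3 value counts for descoberta.
cluster3 = {
    "Elucidacao Diagn. em Internacao": 115,
    "Demanda Ambulatorial": 32,
    "Urgencia / Emergencia": 17,
    "Busca Ativa em Instituicao": 2,
    "Busca Ativa na Comunidade": 1,
    "Investigacao de Contatos": 1,
}
words = word_counts(cluster3)
total_words = sum(words.values())
print(words.most_common(5), total_words)
```

Under this assumption, the row with an empty label in the word tables corresponds to the standalone "/" in "Urgencia / Emergencia", and the percentages are shares of all tokens rather than of observations.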

Most occurring characters

ValueCountFrequency (%)
a 3014
15.3%
1600
 
8.1%
e 1583
 
8.0%
n 1518
 
7.7%
i 1374
 
7.0%
m 1173
 
6.0%
c 1144
 
5.8%
r 1042
 
5.3%
l 848
 
4.3%
o 806
 
4.1%
Other values (16) 5563
28.3%
ValueCountFrequency (%)
a 1903
17.1%
m 824
 
7.4%
e 772
 
6.9%
767
 
6.9%
n 752
 
6.8%
l 719
 
6.5%
i 679
 
6.1%
r 574
 
5.2%
o 512
 
4.6%
t 455
 
4.1%
Other values (16) 3157
28.4%
ValueCountFrequency (%)
a 750
16.1%
422
 
9.0%
n 417
 
8.9%
c 385
 
8.3%
e 318
 
6.8%
i 305
 
6.5%
o 268
 
5.7%
m 199
 
4.3%
r 181
 
3.9%
l 179
 
3.8%
Other values (16) 1239
26.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 15690
79.8%
Uppercase Letter 1940
 
9.9%
Space Separator 1600
 
8.1%
Other Punctuation 435
 
2.2%
ValueCountFrequency (%)
Lowercase Letter 9098
81.9%
Uppercase Letter 1096
 
9.9%
Space Separator 767
 
6.9%
Other Punctuation 153
 
1.4%
ValueCountFrequency (%)
Lowercase Letter 3655
78.4%
Uppercase Letter 454
 
9.7%
Space Separator 422
 
9.0%
Other Punctuation 132
 
2.8%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 3014
19.2%
e 1583
10.1%
n 1518
9.7%
i 1374
8.8%
m 1173
 
7.5%
c 1144
 
7.3%
r 1042
 
6.6%
l 848
 
5.4%
o 806
 
5.1%
g 726
 
4.6%
Other values (6) 2462
15.7%
ValueCountFrequency (%)
a 1903
20.9%
m 824
9.1%
e 772
8.5%
n 752
 
8.3%
l 719
 
7.9%
i 679
 
7.5%
r 574
 
6.3%
o 512
 
5.6%
t 455
 
5.0%
d 428
 
4.7%
Other values (6) 1480
16.3%
ValueCountFrequency (%)
a 750
20.5%
n 417
11.4%
c 385
10.5%
e 318
8.7%
i 305
8.3%
o 268
 
7.3%
m 199
 
5.4%
r 181
 
5.0%
l 179
 
4.9%
t 157
 
4.3%
Other values (6) 496
13.6%
Space Separator
ValueCountFrequency (%)
1600
100.0%
ValueCountFrequency (%)
767
100.0%
ValueCountFrequency (%)
422
100.0%
Uppercase Letter
ValueCountFrequency (%)
D 508
26.2%
E 435
22.4%
A 398
20.5%
U 267
13.8%
I 216
11.1%
B 58
 
3.0%
C 58
 
3.0%
ValueCountFrequency (%)
D 390
35.6%
A 342
31.2%
E 153
 
14.0%
U 92
 
8.4%
I 79
 
7.2%
C 27
 
2.5%
B 13
 
1.2%
ValueCountFrequency (%)
D 147
32.4%
E 132
29.1%
I 118
26.0%
A 35
 
7.7%
U 17
 
3.7%
B 3
 
0.7%
C 2
 
0.4%
Other Punctuation
ValueCountFrequency (%)
/ 267
61.4%
. 168
38.6%
ValueCountFrequency (%)
/ 92
60.1%
. 61
39.9%
ValueCountFrequency (%)
. 115
87.1%
/ 17
 
12.9%

Most occurring scripts

ValueCountFrequency (%)
Latin 17630
89.7%
Common 2035
 
10.3%
ValueCountFrequency (%)
Latin 10194
91.7%
Common 920
 
8.3%
ValueCountFrequency (%)
Latin 4109
88.1%
Common 554
 
11.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 3014
17.1%
e 1583
 
9.0%
n 1518
 
8.6%
i 1374
 
7.8%
m 1173
 
6.7%
c 1144
 
6.5%
r 1042
 
5.9%
l 848
 
4.8%
o 806
 
4.6%
g 726
 
4.1%
Other values (13) 4402
25.0%
ValueCountFrequency (%)
a 1903
18.7%
m 824
 
8.1%
e 772
 
7.6%
n 752
 
7.4%
l 719
 
7.1%
i 679
 
6.7%
r 574
 
5.6%
o 512
 
5.0%
t 455
 
4.5%
d 428
 
4.2%
Other values (13) 2576
25.3%
ValueCountFrequency (%)
a 750
18.3%
n 417
10.1%
c 385
9.4%
e 318
 
7.7%
i 305
 
7.4%
o 268
 
6.5%
m 199
 
4.8%
r 181
 
4.4%
l 179
 
4.4%
t 157
 
3.8%
Other values (13) 950
23.1%
Common
ValueCountFrequency (%)
1600
78.6%
/ 267
 
13.1%
. 168
 
8.3%
ValueCountFrequency (%)
767
83.4%
/ 92
 
10.0%
. 61
 
6.6%
ValueCountFrequency (%)
422
76.2%
. 115
 
20.8%
/ 17
 
3.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 19665
100.0%
ValueCountFrequency (%)
ASCII 11114
100.0%
ValueCountFrequency (%)
ASCII 4663
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 3014
15.3%
1600
 
8.1%
e 1583
 
8.0%
n 1518
 
7.7%
i 1374
 
7.0%
m 1173
 
6.0%
c 1144
 
5.8%
r 1042
 
5.3%
l 848
 
4.3%
o 806
 
4.1%
Other values (16) 5563
28.3%
ValueCountFrequency (%)
a 1903
17.1%
m 824
 
7.4%
e 772
 
6.9%
767
 
6.9%
n 752
 
6.8%
l 719
 
6.5%
i 679
 
6.1%
r 574
 
5.2%
o 512
 
4.6%
t 455
 
4.1%
Other values (16) 3157
28.4%
ValueCountFrequency (%)
a 750
16.1%
422
 
9.0%
n 417
 
8.9%
c 385
 
8.3%
e 318
 
6.8%
i 305
 
6.5%
o 268
 
5.7%
m 199
 
4.3%
r 181
 
3.9%
l 179
 
3.8%
Other values (16) 1239
26.6%

bac
Categorical

               Cluster 1   Cluster 2   Cluster 3
Distinct       3           3           3
Distinct (%)   0.4%        0.6%        1.8%
Missing        0           0           0
Missing (%)    0.0%        0.0%        0.0%
Memory size    13.4 KiB    8.0 KiB     2.6 KiB

Top values, Cluster 1: Pos 580, Neg 150, N/realiz 127
Top values, Cluster 2: Pos 303, Neg 110, N/realiz 98
Top values, Cluster 3: Neg 92, Pos 42, N/realiz 34

Length

                Cluster 1   Cluster 2   Cluster 3
Max length      8           8           8
Median length   3           3           3
Mean length     3.7409568   3.9589041   4.0119048
Min length      3           3           3

Characters and Unicode

                      Cluster 1   Cluster 2   Cluster 3
Total characters      3206        2023        674
Distinct characters   12          12          12
Distinct categories   3           3           3
Distinct scripts      2           2           2
Distinct blocks       1           1           1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

             Cluster 1   Cluster 2   Cluster 3
Unique       0           0           0
Unique (%)   0.0%        0.0%        0.0%

Sample

          Cluster 1   Cluster 2   Cluster 3
1st row   Pos         Neg         N/realiz
2nd row   Pos         Pos         Pos
3rd row   Pos         N/realiz    Neg
4th row   Pos         Pos         Neg
5th row   Neg         Neg         N/realiz

Common Values

Cluster 1
Value      Count   Frequency (%)
Pos        580     67.7%
Neg        150     17.5%
N/realiz   127     14.8%

Cluster 2
Value      Count   Frequency (%)
Pos        303     59.3%
Neg        110     21.5%
N/realiz   98      19.2%

Cluster 3
Value      Count   Frequency (%)
Neg        92      54.8%
Pos        42      25.0%
N/realiz   34      20.2%

Length

[Figure: histogram of lengths of the category for bac]

Common Values (Plot)

[Figures: common values of bac for Cluster 1, Cluster 2, and Cluster 3]
ValueCountFrequency (%)
pos 580
67.7%
neg 150
 
17.5%
n/realiz 127
 
14.8%
ValueCountFrequency (%)
pos 303
59.3%
neg 110
 
21.5%
n/realiz 98
 
19.2%
ValueCountFrequency (%)
neg 92
54.8%
pos 42
25.0%
n/realiz 34
 
20.2%

Most occurring characters

ValueCountFrequency (%)
P 580
18.1%
o 580
18.1%
s 580
18.1%
N 277
8.6%
e 277
8.6%
g 150
 
4.7%
/ 127
 
4.0%
r 127
 
4.0%
a 127
 
4.0%
l 127
 
4.0%
Other values (2) 254
7.9%
ValueCountFrequency (%)
P 303
15.0%
o 303
15.0%
s 303
15.0%
N 208
10.3%
e 208
10.3%
g 110
 
5.4%
/ 98
 
4.8%
r 98
 
4.8%
a 98
 
4.8%
l 98
 
4.8%
Other values (2) 196
9.7%
ValueCountFrequency (%)
N 126
18.7%
e 126
18.7%
g 92
13.6%
P 42
 
6.2%
o 42
 
6.2%
s 42
 
6.2%
/ 34
 
5.0%
r 34
 
5.0%
a 34
 
5.0%
l 34
 
5.0%
Other values (2) 68
10.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 2222
69.3%
Uppercase Letter 857
 
26.7%
Other Punctuation 127
 
4.0%
ValueCountFrequency (%)
Lowercase Letter 1414
69.9%
Uppercase Letter 511
 
25.3%
Other Punctuation 98
 
4.8%
ValueCountFrequency (%)
Lowercase Letter 472
70.0%
Uppercase Letter 168
 
24.9%
Other Punctuation 34
 
5.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
P 580
67.7%
N 277
32.3%
ValueCountFrequency (%)
P 303
59.3%
N 208
40.7%
ValueCountFrequency (%)
N 126
75.0%
P 42
 
25.0%
Lowercase Letter
ValueCountFrequency (%)
o 580
26.1%
s 580
26.1%
e 277
12.5%
g 150
 
6.8%
r 127
 
5.7%
a 127
 
5.7%
l 127
 
5.7%
i 127
 
5.7%
z 127
 
5.7%
ValueCountFrequency (%)
o 303
21.4%
s 303
21.4%
e 208
14.7%
g 110
 
7.8%
r 98
 
6.9%
a 98
 
6.9%
l 98
 
6.9%
i 98
 
6.9%
z 98
 
6.9%
ValueCountFrequency (%)
e 126
26.7%
g 92
19.5%
o 42
 
8.9%
s 42
 
8.9%
r 34
 
7.2%
a 34
 
7.2%
l 34
 
7.2%
i 34
 
7.2%
z 34
 
7.2%
Other Punctuation
ValueCountFrequency (%)
/ 127
100.0%
ValueCountFrequency (%)
/ 98
100.0%
ValueCountFrequency (%)
/ 34
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 3079
96.0%
Common 127
 
4.0%
ValueCountFrequency (%)
Latin 1925
95.2%
Common 98
 
4.8%
ValueCountFrequency (%)
Latin 640
95.0%
Common 34
 
5.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
P 580
18.8%
o 580
18.8%
s 580
18.8%
N 277
9.0%
e 277
9.0%
g 150
 
4.9%
r 127
 
4.1%
a 127
 
4.1%
l 127
 
4.1%
i 127
 
4.1%
ValueCountFrequency (%)
P 303
15.7%
o 303
15.7%
s 303
15.7%
N 208
10.8%
e 208
10.8%
g 110
 
5.7%
r 98
 
5.1%
a 98
 
5.1%
l 98
 
5.1%
i 98
 
5.1%
ValueCountFrequency (%)
N 126
19.7%
e 126
19.7%
g 92
14.4%
P 42
 
6.6%
o 42
 
6.6%
s 42
 
6.6%
r 34
 
5.3%
a 34
 
5.3%
l 34
 
5.3%
i 34
 
5.3%
Common
ValueCountFrequency (%)
/ 127
100.0%
ValueCountFrequency (%)
/ 98
100.0%
ValueCountFrequency (%)
/ 34
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 3206
100.0%
ValueCountFrequency (%)
ASCII 2023
100.0%
ValueCountFrequency (%)
ASCII 674
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
P 580
18.1%
o 580
18.1%
s 580
18.1%
N 277
8.6%
e 277
8.6%
g 150
 
4.7%
/ 127
 
4.0%
r 127
 
4.0%
a 127
 
4.0%
l 127
 
4.0%
Other values (2) 254
7.9%
ValueCountFrequency (%)
P 303
15.0%
o 303
15.0%
s 303
15.0%
N 208
10.3%
e 208
10.3%
g 110
 
5.4%
/ 98
 
4.8%
r 98
 
4.8%
a 98
 
4.8%
l 98
 
4.8%
Other values (2) 196
9.7%
ValueCountFrequency (%)
N 126
18.7%
e 126
18.7%
g 92
13.6%
P 42
 
6.2%
o 42
 
6.2%
s 42
 
6.2%
/ 34
 
5.0%
r 34
 
5.0%
a 34
 
5.0%
l 34
 
5.0%
Other values (2) 68
10.1%

BACOUTRO
Categorical

               Cluster 1   Cluster 2   Cluster 3
Distinct       4           3           3
Distinct (%)   0.5%        0.6%        1.8%
Missing        0           0           0
Missing (%)    0.0%        0.0%        0.0%
Memory size    13.4 KiB    8.0 KiB     2.6 KiB

Top values, Cluster 1: N/realiz 790, Neg 40, Pos 26, And 1
Top values, Cluster 2: N/realiz 476, Neg 23, Pos 12
Top values, Cluster 3: N/realiz 99, Neg 40, Pos 29

Length

 Cluster 1Cluster 2Cluster 3
Max length888
Median length888
Mean length7.60910157.65753425.9464286
Min length333

Characters and Unicode

Statistic            Cluster 1  Cluster 2  Cluster 3
Total characters     6521       3913       999
Distinct characters  15         12         12
Distinct categories  3          3          3
Distinct scripts     2          2          2
Distinct blocks      1          1          1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
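As a rough illustration of the sentence above, the sketch below counts Unicode general categories for the Cluster 1 values of BACOUTRO using Python's standard `unicodedata` module. It is only a sketch: the list `values` is rebuilt from the counts reported for this variable, and the helper name `category_counts` is hypothetical. How the profiling tool itself derives these figures is not shown in the report.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count characters per Unicode general category (e.g. 'Lu', 'Ll', 'Po')."""
    counts = Counter()
    for value in values:
        for ch in value:
            counts[unicodedata.category(ch)] += 1
    return counts


# Cluster 1 values of BACOUTRO, rebuilt from the reported counts (assumption).
values = ["N/realiz"] * 790 + ["Neg"] * 40 + ["Pos"] * 26 + ["And"] * 1

counts = category_counts(values)
total_characters = sum(counts.values())

# 'Ll' = Lowercase Letter, 'Lu' = Uppercase Letter, 'Po' = Other Punctuation
for code, n in counts.most_common():
    print(code, n, f"{100 * n / total_characters:.1f}%")
```

The standard library exposes the general category but not the script or block of a character, so the script and block breakdowns in this report would need another data source.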

Unique

Statistic   Cluster 1  Cluster 2  Cluster 3
Unique      1          0          0
Unique (%)  0.1%       0.0%       0.0%

Sample

Row      Cluster 1  Cluster 2  Cluster 3
1st row  N/realiz   N/realiz   Pos
2nd row  N/realiz   N/realiz   N/realiz
3rd row  N/realiz   N/realiz   N/realiz
4th row  N/realiz   N/realiz   N/realiz
5th row  N/realiz   N/realiz   Pos

Common Values

Cluster 1
Value     Count  Frequency (%)
N/realiz  790    92.2%
Neg       40     4.7%
Pos       26     3.0%
And       1      0.1%

Cluster 2
Value     Count  Frequency (%)
N/realiz  476    93.2%
Neg       23     4.5%
Pos       12     2.3%

Cluster 3
Value     Count  Frequency (%)
N/realiz  99     58.9%
Neg       40     23.8%
Pos       29     17.3%

Length

Histogram of lengths of the category

Common Values (Plot)

Panels: Cluster 1, Cluster 2, Cluster 3
ValueCountFrequency (%)
n/realiz 790
92.2%
neg 40
 
4.7%
pos 26
 
3.0%
and 1
 
0.1%
ValueCountFrequency (%)
n/realiz 476
93.2%
neg 23
 
4.5%
pos 12
 
2.3%
ValueCountFrequency (%)
n/realiz 99
58.9%
neg 40
23.8%
pos 29
 
17.3%

Most occurring characters

ValueCountFrequency (%)
N 830
12.7%
e 830
12.7%
/ 790
12.1%
r 790
12.1%
a 790
12.1%
l 790
12.1%
i 790
12.1%
z 790
12.1%
g 40
 
0.6%
P 26
 
0.4%
Other values (5) 55
 
0.8%
ValueCountFrequency (%)
N 499
12.8%
e 499
12.8%
/ 476
12.2%
r 476
12.2%
a 476
12.2%
l 476
12.2%
i 476
12.2%
z 476
12.2%
g 23
 
0.6%
P 12
 
0.3%
Other values (2) 24
 
0.6%
ValueCountFrequency (%)
N 139
13.9%
e 139
13.9%
/ 99
9.9%
r 99
9.9%
a 99
9.9%
l 99
9.9%
i 99
9.9%
z 99
9.9%
g 40
 
4.0%
P 29
 
2.9%
Other values (2) 58
5.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 4874
74.7%
Uppercase Letter 857
 
13.1%
Other Punctuation 790
 
12.1%
ValueCountFrequency (%)
Lowercase Letter 2926
74.8%
Uppercase Letter 511
 
13.1%
Other Punctuation 476
 
12.2%
ValueCountFrequency (%)
Lowercase Letter 732
73.3%
Uppercase Letter 168
 
16.8%
Other Punctuation 99
 
9.9%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
N 830
96.8%
P 26
 
3.0%
A 1
 
0.1%
ValueCountFrequency (%)
N 499
97.7%
P 12
 
2.3%
ValueCountFrequency (%)
N 139
82.7%
P 29
 
17.3%
Lowercase Letter
ValueCountFrequency (%)
e 830
17.0%
r 790
16.2%
a 790
16.2%
l 790
16.2%
i 790
16.2%
z 790
16.2%
g 40
 
0.8%
o 26
 
0.5%
s 26
 
0.5%
n 1
 
< 0.1%
ValueCountFrequency (%)
e 499
17.1%
r 476
16.3%
a 476
16.3%
l 476
16.3%
i 476
16.3%
z 476
16.3%
g 23
 
0.8%
o 12
 
0.4%
s 12
 
0.4%
ValueCountFrequency (%)
e 139
19.0%
r 99
13.5%
a 99
13.5%
l 99
13.5%
i 99
13.5%
z 99
13.5%
g 40
 
5.5%
o 29
 
4.0%
s 29
 
4.0%
Other Punctuation
ValueCountFrequency (%)
/ 790
100.0%
ValueCountFrequency (%)
/ 476
100.0%
ValueCountFrequency (%)
/ 99
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 5731
87.9%
Common 790
 
12.1%
ValueCountFrequency (%)
Latin 3437
87.8%
Common 476
 
12.2%
ValueCountFrequency (%)
Latin 900
90.1%
Common 99
 
9.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
N 830
14.5%
e 830
14.5%
r 790
13.8%
a 790
13.8%
l 790
13.8%
i 790
13.8%
z 790
13.8%
g 40
 
0.7%
P 26
 
0.5%
o 26
 
0.5%
Other values (4) 29
 
0.5%
ValueCountFrequency (%)
N 499
14.5%
e 499
14.5%
r 476
13.8%
a 476
13.8%
l 476
13.8%
i 476
13.8%
z 476
13.8%
g 23
 
0.7%
P 12
 
0.3%
o 12
 
0.3%
ValueCountFrequency (%)
N 139
15.4%
e 139
15.4%
r 99
11.0%
a 99
11.0%
l 99
11.0%
i 99
11.0%
z 99
11.0%
g 40
 
4.4%
P 29
 
3.2%
o 29
 
3.2%
Common
ValueCountFrequency (%)
/ 790
100.0%
ValueCountFrequency (%)
/ 476
100.0%
ValueCountFrequency (%)
/ 99
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 6521
100.0%
ValueCountFrequency (%)
ASCII 3913
100.0%
ValueCountFrequency (%)
ASCII 999
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
N 830
12.7%
e 830
12.7%
/ 790
12.1%
r 790
12.1%
a 790
12.1%
l 790
12.1%
i 790
12.1%
z 790
12.1%
g 40
 
0.6%
P 26
 
0.4%
Other values (5) 55
 
0.8%
ValueCountFrequency (%)
N 499
12.8%
e 499
12.8%
/ 476
12.2%
r 476
12.2%
a 476
12.2%
l 476
12.2%
i 476
12.2%
z 476
12.2%
g 23
 
0.6%
P 12
 
0.3%
Other values (2) 24
 
0.6%
ValueCountFrequency (%)
N 139
13.9%
e 139
13.9%
/ 99
9.9%
r 99
9.9%
a 99
9.9%
l 99
9.9%
i 99
9.9%
z 99
9.9%
g 40
 
4.0%
P 29
 
2.9%
Other values (2) 58
5.8%

cultEsc
Categorical

Statistic     Cluster 1  Cluster 2  Cluster 3
Distinct      4          4          3
Distinct (%)  0.5%       0.8%       1.8%
Missing       0          0          0
Missing (%)   0.0%       0.0%       0.0%
Memory size   13.4 KiB   8.0 KiB    2.6 KiB
Category counts:
Cluster 1: Pos 697, N/realiz 97, Neg 62, And 1
Cluster 2: N/realiz 372, Pos 80, Neg 57, And 2
Cluster 3: Pos 81, N/realiz 60, Neg 27

Length

Statistic      Cluster 1  Cluster 2  Cluster 3
Max length     8          8          8
Median length  3          8          3
Mean length    3.5659277  6.6399217  4.7857143
Min length     3          3          3
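
The length statistics can be cross-checked against the category counts reported for this variable. The following is a minimal sketch for Cluster 1 of cultEsc, assuming that length means the number of characters in each value; the dictionary `counts` is copied from the reported counts, and the variable names are illustrative only.

```python
from statistics import mean, median

# Cluster 1 counts for cultEsc, copied from the counts reported for this variable.
counts = {"Pos": 697, "N/realiz": 97, "Neg": 62, "And": 1}

# Expand to one string length per observation (857 observations in total).
lengths = [len(value) for value, n in counts.items() for _ in range(n)]

total_characters = sum(lengths)
mean_length = mean(lengths)
median_length = median(lengths)

print("Total characters:", total_characters)
print("Mean length:", round(mean_length, 7))
print("Median length:", median_length)
print("Min / max length:", min(lengths), max(lengths))
```

Because 760 of the 857 values have three characters, the median is 3 even though the maximum is 8.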

Characters and Unicode

Statistic            Cluster 1  Cluster 2  Cluster 3
Total characters     3056       3393       804
Distinct characters  15         15         12
Distinct categories  3          3          3
Distinct scripts     2          2          2
Distinct blocks      1          1          1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Statistic   Cluster 1  Cluster 2  Cluster 3
Unique      1          0          0
Unique (%)  0.1%       0.0%       0.0%

Sample

Row      Cluster 1  Cluster 2  Cluster 3
1st row  Pos        N/realiz   N/realiz
2nd row  Pos        N/realiz   N/realiz
3rd row  Pos        N/realiz   Neg
4th row  Pos        Pos        Neg
5th row  Pos        Neg        N/realiz

Common Values

Cluster 1
Value     Count  Frequency (%)
Pos       697    81.3%
N/realiz  97     11.3%
Neg       62     7.2%
And       1      0.1%

Cluster 2
Value     Count  Frequency (%)
N/realiz  372    72.8%
Pos       80     15.7%
Neg       57     11.2%
And       2      0.4%

Cluster 3
Value     Count  Frequency (%)
Pos       81     48.2%
N/realiz  60     35.7%
Neg       27     16.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Panels: Cluster 1, Cluster 2, Cluster 3
ValueCountFrequency (%)
pos 697
81.3%
n/realiz 97
 
11.3%
neg 62
 
7.2%
and 1
 
0.1%
ValueCountFrequency (%)
n/realiz 372
72.8%
pos 80
 
15.7%
neg 57
 
11.2%
and 2
 
0.4%
ValueCountFrequency (%)
pos 81
48.2%
n/realiz 60
35.7%
neg 27
 
16.1%

Most occurring characters

ValueCountFrequency (%)
P 697
22.8%
o 697
22.8%
s 697
22.8%
N 159
 
5.2%
e 159
 
5.2%
/ 97
 
3.2%
r 97
 
3.2%
a 97
 
3.2%
l 97
 
3.2%
i 97
 
3.2%
Other values (5) 162
 
5.3%
ValueCountFrequency (%)
N 429
12.6%
e 429
12.6%
/ 372
11.0%
r 372
11.0%
a 372
11.0%
l 372
11.0%
i 372
11.0%
z 372
11.0%
P 80
 
2.4%
o 80
 
2.4%
Other values (5) 143
 
4.2%
ValueCountFrequency (%)
N 87
10.8%
e 87
10.8%
P 81
10.1%
o 81
10.1%
s 81
10.1%
/ 60
7.5%
r 60
7.5%
a 60
7.5%
l 60
7.5%
i 60
7.5%
Other values (2) 87
10.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 2102
68.8%
Uppercase Letter 857
28.0%
Other Punctuation 97
 
3.2%
ValueCountFrequency (%)
Lowercase Letter 2510
74.0%
Uppercase Letter 511
 
15.1%
Other Punctuation 372
 
11.0%
ValueCountFrequency (%)
Lowercase Letter 576
71.6%
Uppercase Letter 168
 
20.9%
Other Punctuation 60
 
7.5%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
P 697
81.3%
N 159
 
18.6%
A 1
 
0.1%
ValueCountFrequency (%)
N 429
84.0%
P 80
 
15.7%
A 2
 
0.4%
ValueCountFrequency (%)
N 87
51.8%
P 81
48.2%
Lowercase Letter
ValueCountFrequency (%)
o 697
33.2%
s 697
33.2%
e 159
 
7.6%
r 97
 
4.6%
a 97
 
4.6%
l 97
 
4.6%
i 97
 
4.6%
z 97
 
4.6%
g 62
 
2.9%
n 1
 
< 0.1%
ValueCountFrequency (%)
e 429
17.1%
r 372
14.8%
a 372
14.8%
l 372
14.8%
i 372
14.8%
z 372
14.8%
o 80
 
3.2%
s 80
 
3.2%
g 57
 
2.3%
n 2
 
0.1%
ValueCountFrequency (%)
e 87
15.1%
o 81
14.1%
s 81
14.1%
r 60
10.4%
a 60
10.4%
l 60
10.4%
i 60
10.4%
z 60
10.4%
g 27
 
4.7%
Other Punctuation
ValueCountFrequency (%)
/ 97
100.0%
ValueCountFrequency (%)
/ 372
100.0%
ValueCountFrequency (%)
/ 60
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 2959
96.8%
Common 97
 
3.2%
ValueCountFrequency (%)
Latin 3021
89.0%
Common 372
 
11.0%
ValueCountFrequency (%)
Latin 744
92.5%
Common 60
 
7.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
P 697
23.6%
o 697
23.6%
s 697
23.6%
N 159
 
5.4%
e 159
 
5.4%
r 97
 
3.3%
a 97
 
3.3%
l 97
 
3.3%
i 97
 
3.3%
z 97
 
3.3%
Other values (4) 65
 
2.2%
ValueCountFrequency (%)
N 429
14.2%
e 429
14.2%
r 372
12.3%
a 372
12.3%
l 372
12.3%
i 372
12.3%
z 372
12.3%
P 80
 
2.6%
o 80
 
2.6%
s 80
 
2.6%
Other values (4) 63
 
2.1%
ValueCountFrequency (%)
N 87
11.7%
e 87
11.7%
P 81
10.9%
o 81
10.9%
s 81
10.9%
r 60
8.1%
a 60
8.1%
l 60
8.1%
i 60
8.1%
z 60
8.1%
Common
ValueCountFrequency (%)
/ 97
100.0%
ValueCountFrequency (%)
/ 372
100.0%
ValueCountFrequency (%)
/ 60
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 3056
100.0%
ValueCountFrequency (%)
ASCII 3393
100.0%
ValueCountFrequency (%)
ASCII 804
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
P 697
22.8%
o 697
22.8%
s 697
22.8%
N 159
 
5.2%
e 159
 
5.2%
/ 97
 
3.2%
r 97
 
3.2%
a 97
 
3.2%
l 97
 
3.2%
i 97
 
3.2%
Other values (5) 162
 
5.3%
ValueCountFrequency (%)
N 429
12.6%
e 429
12.6%
/ 372
11.0%
r 372
11.0%
a 372
11.0%
l 372
11.0%
i 372
11.0%
z 372
11.0%
P 80
 
2.4%
o 80
 
2.4%
Other values (5) 143
 
4.2%
ValueCountFrequency (%)
N 87
10.8%
e 87
10.8%
P 81
10.1%
o 81
10.1%
s 81
10.1%
/ 60
7.5%
r 60
7.5%
a 60
7.5%
l 60
7.5%
i 60
7.5%
Other values (2) 87
10.8%

RX
Categorical

Statistic     Cluster 1  Cluster 2  Cluster 3
Distinct      5          5          5
Distinct (%)  0.6%       1.0%       3.0%
Missing       0          0          0
Missing (%)   0.0%       0.0%       0.0%
Memory size   13.4 KiB   8.0 KiB    2.6 KiB
Category counts:
Cluster 1: Susp TB 517, Susp c/cavid 179, N/realiz 125, Normal 30, Outra Patologia 6
Cluster 2: Susp TB 304, Susp c/cavid 109, N/realiz 62, Normal 29, Outra Patologia 7
Cluster 3: Susp TB 120, N/realiz 21, Normal 20, Outra Patologia 4, Susp c/cavid 3

Length

Statistic      Cluster 1  Cluster 2  Cluster 3
Max length     15         15         15
Median length  7          7          7
Mean length    8.2112019  8.2407045  7.2857143
Min length     6          6          6

Characters and Unicode

Statistic            Cluster 1  Cluster 2  Cluster 3
Total characters     7037       4211       1224
Distinct characters  24         24         24
Distinct categories  4          4          4
Distinct scripts     2          2          2
Distinct blocks      1          1          1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Statistic   Cluster 1  Cluster 2  Cluster 3
Unique      0          0          0
Unique (%)  0.0%       0.0%       0.0%

Sample

Row      Cluster 1     Cluster 2     Cluster 3
1st row  Susp c/cavid  Susp TB       Susp TB
2nd row  Susp TB       N/realiz      Susp TB
3rd row  Susp TB       N/realiz      Susp c/cavid
4th row  Susp TB       Susp TB       Susp TB
5th row  Susp TB       Susp c/cavid  Susp TB

Common Values

Cluster 1
Value            Count  Frequency (%)
Susp TB          517    60.3%
Susp c/cavid     179    20.9%
N/realiz         125    14.6%
Normal           30     3.5%
Outra Patologia  6      0.7%

Cluster 2
Value            Count  Frequency (%)
Susp TB          304    59.5%
Susp c/cavid     109    21.3%
N/realiz         62     12.1%
Normal           29     5.7%
Outra Patologia  7      1.4%

Cluster 3
Value            Count  Frequency (%)
Susp TB          120    71.4%
N/realiz         21     12.5%
Normal           20     11.9%
Outra Patologia  4      2.4%
Susp c/cavid     3      1.8%

Length

Histogram of lengths of the category

Common Values (Plot)

Panels: Cluster 1, Cluster 2, Cluster 3
ValueCountFrequency (%)
susp 696
44.6%
tb 517
33.2%
c/cavid 179
 
11.5%
n/realiz 125
 
8.0%
normal 30
 
1.9%
outra 6
 
0.4%
patologia 6
 
0.4%
ValueCountFrequency (%)
susp 413
44.4%
tb 304
32.7%
c/cavid 109
 
11.7%
n/realiz 62
 
6.7%
normal 29
 
3.1%
outra 7
 
0.8%
patologia 7
 
0.8%
ValueCountFrequency (%)
susp 123
41.7%
tb 120
40.7%
n/realiz 21
 
7.1%
normal 20
 
6.8%
outra 4
 
1.4%
patologia 4
 
1.4%
c/cavid 3
 
1.0%

Most occurring characters

ValueCountFrequency (%)
702
10.0%
u 702
10.0%
S 696
9.9%
s 696
9.9%
p 696
9.9%
T 517
 
7.3%
B 517
 
7.3%
c 358
 
5.1%
a 352
 
5.0%
i 310
 
4.4%
Other values (14) 1491
21.2%
ValueCountFrequency (%)
420
10.0%
u 420
10.0%
S 413
9.8%
s 413
9.8%
p 413
9.8%
T 304
 
7.2%
B 304
 
7.2%
a 221
 
5.2%
c 218
 
5.2%
i 178
 
4.2%
Other values (14) 907
21.5%
ValueCountFrequency (%)
127
10.4%
u 127
10.4%
S 123
10.0%
s 123
10.0%
p 123
10.0%
T 120
9.8%
B 120
9.8%
a 56
 
4.6%
r 45
 
3.7%
l 45
 
3.7%
Other values (14) 215
17.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 4134
58.7%
Uppercase Letter 1897
27.0%
Space Separator 702
 
10.0%
Other Punctuation 304
 
4.3%
ValueCountFrequency (%)
Lowercase Letter 2494
59.2%
Uppercase Letter 1126
26.7%
Space Separator 420
 
10.0%
Other Punctuation 171
 
4.1%
ValueCountFrequency (%)
Lowercase Letter 661
54.0%
Uppercase Letter 412
33.7%
Space Separator 127
 
10.4%
Other Punctuation 24
 
2.0%

Most frequent character per category

Space Separator
ValueCountFrequency (%)
702
100.0%
ValueCountFrequency (%)
420
100.0%
ValueCountFrequency (%)
127
100.0%
Lowercase Letter
ValueCountFrequency (%)
u 702
17.0%
s 696
16.8%
p 696
16.8%
c 358
8.7%
a 352
8.5%
i 310
7.5%
d 179
 
4.3%
v 179
 
4.3%
r 161
 
3.9%
l 161
 
3.9%
Other values (6) 340
8.2%
ValueCountFrequency (%)
u 420
16.8%
s 413
16.6%
p 413
16.6%
a 221
8.9%
c 218
8.7%
i 178
7.1%
d 109
 
4.4%
v 109
 
4.4%
r 98
 
3.9%
l 98
 
3.9%
Other values (6) 217
8.7%
ValueCountFrequency (%)
u 127
19.2%
s 123
18.6%
p 123
18.6%
a 56
8.5%
r 45
 
6.8%
l 45
 
6.8%
i 28
 
4.2%
o 28
 
4.2%
e 21
 
3.2%
z 21
 
3.2%
Other values (6) 44
 
6.7%
Uppercase Letter
ValueCountFrequency (%)
S 696
36.7%
T 517
27.3%
B 517
27.3%
N 155
 
8.2%
O 6
 
0.3%
P 6
 
0.3%
ValueCountFrequency (%)
S 413
36.7%
T 304
27.0%
B 304
27.0%
N 91
 
8.1%
O 7
 
0.6%
P 7
 
0.6%
ValueCountFrequency (%)
S 123
29.9%
T 120
29.1%
B 120
29.1%
N 41
 
10.0%
O 4
 
1.0%
P 4
 
1.0%
Other Punctuation
ValueCountFrequency (%)
/ 304
100.0%
ValueCountFrequency (%)
/ 171
100.0%
ValueCountFrequency (%)
/ 24
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 6031
85.7%
Common 1006
 
14.3%
ValueCountFrequency (%)
Latin 3620
86.0%
Common 591
 
14.0%
ValueCountFrequency (%)
Latin 1073
87.7%
Common 151
 
12.3%

Most frequent character per script

Common
ValueCountFrequency (%)
702
69.8%
/ 304
30.2%
ValueCountFrequency (%)
420
71.1%
/ 171
28.9%
ValueCountFrequency (%)
127
84.1%
/ 24
 
15.9%
Latin
ValueCountFrequency (%)
u 702
11.6%
S 696
11.5%
s 696
11.5%
p 696
11.5%
T 517
8.6%
B 517
8.6%
c 358
 
5.9%
a 352
 
5.8%
i 310
 
5.1%
d 179
 
3.0%
Other values (12) 1008
16.7%
ValueCountFrequency (%)
u 420
11.6%
S 413
11.4%
s 413
11.4%
p 413
11.4%
T 304
8.4%
B 304
8.4%
a 221
 
6.1%
c 218
 
6.0%
i 178
 
4.9%
d 109
 
3.0%
Other values (12) 627
17.3%
ValueCountFrequency (%)
u 127
11.8%
S 123
11.5%
s 123
11.5%
p 123
11.5%
T 120
11.2%
B 120
11.2%
a 56
 
5.2%
r 45
 
4.2%
l 45
 
4.2%
N 41
 
3.8%
Other values (12) 150
14.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 7037
100.0%
ValueCountFrequency (%)
ASCII 4211
100.0%
ValueCountFrequency (%)
ASCII 1224
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
702
10.0%
u 702
10.0%
S 696
9.9%
s 696
9.9%
p 696
9.9%
T 517
 
7.3%
B 517
 
7.3%
c 358
 
5.1%
a 352
 
5.0%
i 310
 
4.4%
Other values (14) 1491
21.2%
ValueCountFrequency (%)
420
10.0%
u 420
10.0%
S 413
9.8%
s 413
9.8%
p 413
9.8%
T 304
 
7.2%
B 304
 
7.2%
a 221
 
5.2%
c 218
 
5.2%
i 178
 
4.2%
Other values (14) 907
21.5%
ValueCountFrequency (%)
127
10.4%
u 127
10.4%
S 123
10.0%
s 123
10.0%
p 123
10.0%
T 120
9.8%
B 120
9.8%
a 56
 
4.6%
r 45
 
3.7%
l 45
 
3.7%
Other values (14) 215
17.6%

NECROP
Categorical

Statistic     Cluster 1  Cluster 2  Cluster 3
Distinct      2          2          2
Distinct (%)  0.2%       0.4%       1.2%
Missing       0          0          0
Missing (%)   0.0%       0.0%       0.0%
Memory size   13.4 KiB   8.0 KiB    2.6 KiB
Category counts:
Cluster 1: N/realiz 854, Sugestivo TB 3
Cluster 2: N/realiz 509, BAAR pos 2
Cluster 3: N/realiz 167, Sugestivo TB 1

Length

Statistic      Cluster 1  Cluster 2  Cluster 3
Max length     12         8          12
Median length  8          8          8
Mean length    8.0140023  8          8.0238095
Min length     8          8          8

Characters and Unicode

Statistic            Cluster 1  Cluster 2  Cluster 3
Total characters     6868       4088       1348
Distinct characters  18         15         18
Distinct categories  4          4          4
Distinct scripts     2          2          2
Distinct blocks      1          1          1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Statistic   Cluster 1  Cluster 2  Cluster 3
Unique      0          0          1
Unique (%)  0.0%       0.0%       0.6%
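
The Distinct and Unique figures for Cluster 3 of NECROP can be reproduced from the category counts with the short sketch below. It assumes that "Distinct" is the number of different values and "Unique" is the number of values that occur exactly once; these definitions are inferred from the numbers in this report, not taken from the tool's documentation, and the variable names are illustrative.

```python
from collections import Counter

# Cluster 3 values of NECROP, rebuilt from the reported counts (assumption).
values = ["N/realiz"] * 167 + ["Sugestivo TB"] * 1

freq = Counter(values)
n = len(values)

distinct = len(freq)                                # number of different values
unique = sum(1 for c in freq.values() if c == 1)    # values seen exactly once

distinct_pct = 100 * distinct / n
unique_pct = 100 * unique / n

print(f"Distinct {distinct} ({distinct_pct:.1f}%), Unique {unique} ({unique_pct:.1f}%)")
```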

Sample

Row      Cluster 1  Cluster 2  Cluster 3
1st row  N/realiz   N/realiz   N/realiz
2nd row  N/realiz   N/realiz   N/realiz
3rd row  N/realiz   N/realiz   N/realiz
4th row  N/realiz   N/realiz   N/realiz
5th row  N/realiz   N/realiz   N/realiz

Common Values

Cluster 1
Value         Count  Frequency (%)
N/realiz      854    99.6%
Sugestivo TB  3      0.4%

Cluster 2
Value         Count  Frequency (%)
N/realiz      509    99.6%
BAAR pos      2      0.4%

Cluster 3
Value         Count  Frequency (%)
N/realiz      167    99.4%
Sugestivo TB  1      0.6%

Length

Histogram of lengths of the category

Common Values (Plot)

Panels: Cluster 1, Cluster 2, Cluster 3
ValueCountFrequency (%)
n/realiz 854
99.3%
sugestivo 3
 
0.3%
tb 3
 
0.3%
ValueCountFrequency (%)
n/realiz 509
99.2%
baar 2
 
0.4%
pos 2
 
0.4%
ValueCountFrequency (%)
n/realiz 167
98.8%
sugestivo 1
 
0.6%
tb 1
 
0.6%

Most occurring characters

ValueCountFrequency (%)
e 857
12.5%
i 857
12.5%
N 854
12.4%
r 854
12.4%
a 854
12.4%
l 854
12.4%
z 854
12.4%
/ 854
12.4%
v 3
 
< 0.1%
T 3
 
< 0.1%
Other values (8) 24
 
0.3%
ValueCountFrequency (%)
N 509
12.5%
/ 509
12.5%
r 509
12.5%
e 509
12.5%
a 509
12.5%
l 509
12.5%
i 509
12.5%
z 509
12.5%
A 4
 
0.1%
B 2
 
< 0.1%
Other values (5) 10
 
0.2%
ValueCountFrequency (%)
e 168
12.5%
i 168
12.5%
N 167
12.4%
r 167
12.4%
a 167
12.4%
l 167
12.4%
z 167
12.4%
/ 167
12.4%
v 1
 
0.1%
T 1
 
0.1%
Other values (8) 8
 
0.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 5148
75.0%
Uppercase Letter 863
 
12.6%
Other Punctuation 854
 
12.4%
Space Separator 3
 
< 0.1%
ValueCountFrequency (%)
Lowercase Letter 3060
74.9%
Uppercase Letter 517
 
12.6%
Other Punctuation 509
 
12.5%
Space Separator 2
 
< 0.1%
ValueCountFrequency (%)
Lowercase Letter 1010
74.9%
Uppercase Letter 170
 
12.6%
Other Punctuation 167
 
12.4%
Space Separator 1
 
0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 857
16.6%
i 857
16.6%
r 854
16.6%
a 854
16.6%
l 854
16.6%
z 854
16.6%
v 3
 
0.1%
o 3
 
0.1%
u 3
 
0.1%
t 3
 
0.1%
Other values (2) 6
 
0.1%
ValueCountFrequency (%)
r 509
16.6%
e 509
16.6%
a 509
16.6%
l 509
16.6%
i 509
16.6%
z 509
16.6%
p 2
 
0.1%
o 2
 
0.1%
s 2
 
0.1%
ValueCountFrequency (%)
e 168
16.6%
i 168
16.6%
r 167
16.5%
a 167
16.5%
l 167
16.5%
z 167
16.5%
v 1
 
0.1%
o 1
 
0.1%
u 1
 
0.1%
t 1
 
0.1%
Other values (2) 2
 
0.2%
Uppercase Letter
ValueCountFrequency (%)
N 854
99.0%
T 3
 
0.3%
S 3
 
0.3%
B 3
 
0.3%
ValueCountFrequency (%)
N 509
98.5%
A 4
 
0.8%
B 2
 
0.4%
R 2
 
0.4%
ValueCountFrequency (%)
N 167
98.2%
T 1
 
0.6%
S 1
 
0.6%
B 1
 
0.6%
Other Punctuation
ValueCountFrequency (%)
/ 854
100.0%
ValueCountFrequency (%)
/ 509
100.0%
ValueCountFrequency (%)
/ 167
100.0%
Space Separator
ValueCountFrequency (%)
3
100.0%
ValueCountFrequency (%)
2
100.0%
ValueCountFrequency (%)
1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 6011
87.5%
Common 857
 
12.5%
ValueCountFrequency (%)
Latin 3577
87.5%
Common 511
 
12.5%
ValueCountFrequency (%)
Latin 1180
87.5%
Common 168
 
12.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 857
14.3%
i 857
14.3%
N 854
14.2%
r 854
14.2%
a 854
14.2%
l 854
14.2%
z 854
14.2%
v 3
 
< 0.1%
T 3
 
< 0.1%
o 3
 
< 0.1%
Other values (6) 18
 
0.3%
ValueCountFrequency (%)
N 509
14.2%
r 509
14.2%
e 509
14.2%
a 509
14.2%
l 509
14.2%
i 509
14.2%
z 509
14.2%
A 4
 
0.1%
B 2
 
0.1%
R 2
 
0.1%
Other values (3) 6
 
0.2%
ValueCountFrequency (%)
e 168
14.2%
i 168
14.2%
N 167
14.2%
r 167
14.2%
a 167
14.2%
l 167
14.2%
z 167
14.2%
v 1
 
0.1%
T 1
 
0.1%
o 1
 
0.1%
Other values (6) 6
 
0.5%
Common
ValueCountFrequency (%)
/ 854
99.6%
3
 
0.4%
ValueCountFrequency (%)
/ 509
99.6%
2
 
0.4%
ValueCountFrequency (%)
/ 167
99.4%
1
 
0.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII 6868
100.0%
ValueCountFrequency (%)
ASCII 4088
100.0%
ValueCountFrequency (%)
ASCII 1348
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 857
12.5%
i 857
12.5%
N 854
12.4%
r 854
12.4%
a 854
12.4%
l 854
12.4%
z 854
12.4%
/ 854
12.4%
v 3
 
< 0.1%
T 3
 
< 0.1%
Other values (8) 24
 
0.3%
ValueCountFrequency (%)
N 509
12.5%
/ 509
12.5%
r 509
12.5%
e 509
12.5%
a 509
12.5%
l 509
12.5%
i 509
12.5%
z 509
12.5%
A 4
 
0.1%
B 2
 
< 0.1%
Other values (5) 10
 
0.2%
ValueCountFrequency (%)
e 168
12.5%
i 168
12.5%
N 167
12.4%
r 167
12.4%
a 167
12.4%
l 167
12.4%
z 167
12.4%
/ 167
12.4%
v 1
 
0.1%
T 1
 
0.1%
Other values (8) 8
 
0.6%

hiv
Categorical

Statistic     Cluster 1  Cluster 2  Cluster 3
Distinct      3          4          3
Distinct (%)  0.4%       0.8%       1.8%
Missing       0          0          0
Missing (%)   0.0%       0.0%       0.0%
Memory size   13.4 KiB   8.0 KiB    2.6 KiB
Category counts:
Cluster 1: Neg 738, Pos 63, N/realiz 56
Cluster 2: Neg 451, N/realiz 44, Pos 15, And 1
Cluster 3: Pos 148, Neg 11, N/realiz 9

Length

Statistic      Cluster 1  Cluster 2  Cluster 3
Max length     8          8          8
Median length  3          3          3
Mean length    3.3267211  3.4305284  3.2678571
Min length     3          3          3

Characters and Unicode

Statistic            Cluster 1  Cluster 2  Cluster 3
Total characters     2851       1753       549
Distinct characters  12         15         12
Distinct categories  3          3          3
Distinct scripts     2          2          2
Distinct blocks      1          1          1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Statistic   Cluster 1  Cluster 2  Cluster 3
Unique      0          1          0
Unique (%)  0.0%       0.2%       0.0%

Sample

Row      Cluster 1  Cluster 2  Cluster 3
1st row  Neg        Neg        Pos
2nd row  Neg        Neg        Pos
3rd row  Pos        Neg        Pos
4th row  Pos        Neg        N/realiz
5th row  Neg        Neg        Pos

Common Values

Cluster 1
Value     Count  Frequency (%)
Neg       738    86.1%
Pos       63     7.4%
N/realiz  56     6.5%

Cluster 2
Value     Count  Frequency (%)
Neg       451    88.3%
N/realiz  44     8.6%
Pos       15     2.9%
And       1      0.2%

Cluster 3
Value     Count  Frequency (%)
Pos       148    88.1%
Neg       11     6.5%
N/realiz  9      5.4%

Length

Histogram of lengths of the category

Common Values (Plot)

Panels: Cluster 1, Cluster 2, Cluster 3
ValueCountFrequency (%)
neg 738
86.1%
pos 63
 
7.4%
n/realiz 56
 
6.5%
ValueCountFrequency (%)
neg 451
88.3%
n/realiz 44
 
8.6%
pos 15
 
2.9%
and 1
 
0.2%
ValueCountFrequency (%)
pos 148
88.1%
neg 11
 
6.5%
n/realiz 9
 
5.4%

Most occurring characters

ValueCountFrequency (%)
N 794
27.8%
e 794
27.8%
g 738
25.9%
P 63
 
2.2%
o 63
 
2.2%
s 63
 
2.2%
/ 56
 
2.0%
r 56
 
2.0%
a 56
 
2.0%
l 56
 
2.0%
Other values (2) 112
 
3.9%
ValueCountFrequency (%)
N 495
28.2%
e 495
28.2%
g 451
25.7%
/ 44
 
2.5%
r 44
 
2.5%
a 44
 
2.5%
l 44
 
2.5%
i 44
 
2.5%
z 44
 
2.5%
P 15
 
0.9%
Other values (5) 33
 
1.9%
ValueCountFrequency (%)
P 148
27.0%
o 148
27.0%
s 148
27.0%
N 20
 
3.6%
e 20
 
3.6%
g 11
 
2.0%
/ 9
 
1.6%
r 9
 
1.6%
a 9
 
1.6%
l 9
 
1.6%
Other values (2) 18
 
3.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1938
68.0%
Uppercase Letter 857
30.1%
Other Punctuation 56
 
2.0%
ValueCountFrequency (%)
Lowercase Letter 1198
68.3%
Uppercase Letter 511
29.2%
Other Punctuation 44
 
2.5%
ValueCountFrequency (%)
Lowercase Letter 372
67.8%
Uppercase Letter 168
30.6%
Other Punctuation 9
 
1.6%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
N 794
92.6%
P 63
 
7.4%
ValueCountFrequency (%)
N 495
96.9%
P 15
 
2.9%
A 1
 
0.2%
ValueCountFrequency (%)
P 148
88.1%
N 20
 
11.9%
Lowercase Letter
ValueCountFrequency (%)
e 794
41.0%
g 738
38.1%
o 63
 
3.3%
s 63
 
3.3%
r 56
 
2.9%
a 56
 
2.9%
l 56
 
2.9%
i 56
 
2.9%
z 56
 
2.9%
ValueCountFrequency (%)
e 495
41.3%
g 451
37.6%
r 44
 
3.7%
a 44
 
3.7%
l 44
 
3.7%
i 44
 
3.7%
z 44
 
3.7%
o 15
 
1.3%
s 15
 
1.3%
n 1
 
0.1%
ValueCountFrequency (%)
o 148
39.8%
s 148
39.8%
e 20
 
5.4%
g 11
 
3.0%
r 9
 
2.4%
a 9
 
2.4%
l 9
 
2.4%
i 9
 
2.4%
z 9
 
2.4%
Other Punctuation
ValueCountFrequency (%)
/ 56
100.0%
ValueCountFrequency (%)
/ 44
100.0%
ValueCountFrequency (%)
/ 9
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 2795
98.0%
Common 56
 
2.0%
ValueCountFrequency (%)
Latin 1709
97.5%
Common 44
 
2.5%
ValueCountFrequency (%)
Latin 540
98.4%
Common 9
 
1.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
N 794
28.4%
e 794
28.4%
g 738
26.4%
P 63
 
2.3%
o 63
 
2.3%
s 63
 
2.3%
r 56
 
2.0%
a 56
 
2.0%
l 56
 
2.0%
i 56
 
2.0%
ValueCountFrequency (%)
N 495
29.0%
e 495
29.0%
g 451
26.4%
r 44
 
2.6%
a 44
 
2.6%
l 44
 
2.6%
i 44
 
2.6%
z 44
 
2.6%
P 15
 
0.9%
o 15
 
0.9%
Other values (4) 18
 
1.1%
ValueCountFrequency (%)
P 148
27.4%
o 148
27.4%
s 148
27.4%
N 20
 
3.7%
e 20
 
3.7%
g 11
 
2.0%
r 9
 
1.7%
a 9
 
1.7%
l 9
 
1.7%
i 9
 
1.7%
Common
ValueCountFrequency (%)
/ 56
100.0%
ValueCountFrequency (%)
/ 44
100.0%
ValueCountFrequency (%)
/ 9
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2851
100.0%
ValueCountFrequency (%)
ASCII 1753
100.0%
ValueCountFrequency (%)
ASCII 549
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
N 794
27.8%
e 794
27.8%
g 738
25.9%
P 63
 
2.2%
o 63
 
2.2%
s 63
 
2.2%
/ 56
 
2.0%
r 56
 
2.0%
a 56
 
2.0%
l 56
 
2.0%
Other values (2) 112
 
3.9%
ValueCountFrequency (%)
N 495
28.2%
e 495
28.2%
g 451
25.7%
/ 44
 
2.5%
r 44
 
2.5%
a 44
 
2.5%
l 44
 
2.5%
i 44
 
2.5%
z 44
 
2.5%
P 15
 
0.9%
Other values (5) 33
 
1.9%
ValueCountFrequency (%)
P 148
27.0%
o 148
27.0%
s 148
27.0%
N 20
 
3.6%
e 20
 
3.6%
g 11
 
2.0%
/ 9
 
1.6%
r 9
 
1.6%
a 9
 
1.6%
l 9
 
1.6%
Other values (2) 18
 
3.3%

aids
Categorical

Statistic     Cluster 1  Cluster 2  Cluster 3
Distinct      2          2          2
Distinct (%)  0.2%       0.4%       1.2%
Missing       0          0          0
Missing (%)   0.0%       0.0%       0.0%
Memory size   13.4 KiB   8.0 KiB    2.6 KiB
Category counts:
Cluster 1: N 807, S 50
Cluster 2: N 498, S 13
Cluster 3: S 144, N 24

Length

Statistic      Cluster 1  Cluster 2  Cluster 3
Max length     1          1          1
Median length  1          1          1
Mean length    1          1          1
Min length     1          1          1

Characters and Unicode

Statistic            Cluster 1  Cluster 2  Cluster 3
Total characters     857        511        168
Distinct characters  2          2          2
Distinct categories  1          1          1
Distinct scripts     1          1          1
Distinct blocks      1          1          1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Statistic   Cluster 1  Cluster 2  Cluster 3
Unique      0          0          0
Unique (%)  0.0%       0.0%       0.0%

Sample

Row      Cluster 1  Cluster 2  Cluster 3
1st row  N          N          S
2nd row  N          N          S
3rd row  S          N          S
4th row  N          N          N
5th row  N          N          S

Common Values

Cluster 1
Value  Count  Frequency (%)
N      807    94.2%
S      50     5.8%

Cluster 2
Value  Count  Frequency (%)
N      498    97.5%
S      13     2.5%

Cluster 3
Value  Count  Frequency (%)
S      144    85.7%
N      24     14.3%

Length

Histogram of lengths of the category

Common Values (Plot)

Panels: Cluster 1, Cluster 2, Cluster 3
ValueCountFrequency (%)
n 807
94.2%
s 50
 
5.8%
ValueCountFrequency (%)
n 498
97.5%
s 13
 
2.5%
ValueCountFrequency (%)
s 144
85.7%
n 24
 
14.3%

Most occurring characters

ValueCountFrequency (%)
N 807
94.2%
S 50
 
5.8%
ValueCountFrequency (%)
N 498
97.5%
S 13
 
2.5%
ValueCountFrequency (%)
S 144
85.7%
N 24
 
14.3%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 857
100.0%
ValueCountFrequency (%)
Uppercase Letter 511
100.0%
ValueCountFrequency (%)
Uppercase Letter 168
100.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
N 807
94.2%
S 50
 
5.8%
ValueCountFrequency (%)
N 498
97.5%
S 13
 
2.5%
ValueCountFrequency (%)
S 144
85.7%
N 24
 
14.3%

Most occurring scripts

ValueCountFrequency (%)
Latin 857
100.0%
ValueCountFrequency (%)
Latin 511
100.0%
ValueCountFrequency (%)
Latin 168
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
N 807
94.2%
S 50
 
5.8%
ValueCountFrequency (%)
N 498
97.5%
S 13
 
2.5%
ValueCountFrequency (%)
S 144
85.7%
N 24
 
14.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 857
100.0%
ValueCountFrequency (%)
ASCII 511
100.0%
ValueCountFrequency (%)
ASCII 168
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
N 807
94.2%
S 50
 
5.8%
ValueCountFrequency (%)
N 498
97.5%
S 13
 
2.5%
ValueCountFrequency (%)
S 144
85.7%
N 24
 
14.3%

DIABETES
Categorical

Statistic     Cluster 1  Cluster 2  Cluster 3
Distinct      2          2          2
Distinct (%)  0.2%       0.4%       1.2%
Missing       0          0          0
Missing (%)   0.0%       0.0%       0.0%
Memory size   13.4 KiB   8.0 KiB    2.6 KiB
Category counts:
Cluster 1: N 793, S 64
Cluster 2: N 481, S 30
Cluster 3: N 165, S 3

Length

Statistic      Cluster 1  Cluster 2  Cluster 3
Max length     1          1          1
Median length  1          1          1
Mean length    1          1          1
Min length     1          1          1

Characters and Unicode

Statistic            Cluster 1  Cluster 2  Cluster 3
Total characters     857        511        168
Distinct characters  2          2          2
Distinct categories  1          1          1
Distinct scripts     1          1          1
Distinct blocks      1          1          1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Statistic   Cluster 1  Cluster 2  Cluster 3
Unique      0          0          0
Unique (%)  0.0%       0.0%       0.0%

Sample

Row      Cluster 1  Cluster 2  Cluster 3
1st row  N          N          N
2nd row  N          N          N
3rd row  N          N          N
4th row  N          N          N
5th row  N          N          N

Common Values

Cluster 1
Value  Count  Frequency (%)
N      793    92.5%
S      64     7.5%

Cluster 2
Value  Count  Frequency (%)
N      481    94.1%
S      30     5.9%

Cluster 3
Value  Count  Frequency (%)
N      165    98.2%
S      3      1.8%

Length

Histogram of lengths of the category

Common Values (Plot)

Panels: Cluster 1, Cluster 2, Cluster 3
ValueCountFrequency (%)
n 793
92.5%
s 64
 
7.5%
ValueCountFrequency (%)
n 481
94.1%
s 30
 
5.9%
ValueCountFrequency (%)
n 165
98.2%
s 3
 
1.8%

Most occurring characters

ValueCountFrequency (%)
N 793
92.5%
S 64
 
7.5%
ValueCountFrequency (%)
N 481
94.1%
S 30
 
5.9%
ValueCountFrequency (%)
N 165
98.2%
S 3
 
1.8%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 857
100.0%
ValueCountFrequency (%)
Uppercase Letter 511
100.0%
ValueCountFrequency (%)
Uppercase Letter 168
100.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
N 793
92.5%
S 64
 
7.5%
ValueCountFrequency (%)
N 481
94.1%
S 30
 
5.9%
ValueCountFrequency (%)
N 165
98.2%
S 3
 
1.8%

Most occurring scripts

ValueCountFrequency (%)
Latin 857
100.0%
ValueCountFrequency (%)
Latin 511
100.0%
ValueCountFrequency (%)
Latin 168
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
N 793
92.5%
S 64
 
7.5%
ValueCountFrequency (%)
N 481
94.1%
S 30
 
5.9%
ValueCountFrequency (%)
N 165
98.2%
S 3
 
1.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII 857
100.0%
ValueCountFrequency (%)
ASCII 511
100.0%
ValueCountFrequency (%)
ASCII 168
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
N 793
92.5%
S 64
 
7.5%
ValueCountFrequency (%)
N 481
94.1%
S 30
 
5.9%
ValueCountFrequency (%)
N 165
98.2%
S 3
 
1.8%

ALCOOLISMO
Categorical

Statistic     Cluster 1  Cluster 2  Cluster 3
Distinct      2          2          2
Distinct (%)  0.2%       0.4%       1.2%
Missing       0          0          0
Missing (%)   0.0%       0.0%       0.0%
Memory size   13.4 KiB   8.0 KiB    2.6 KiB
Category counts:
Cluster 1: N 597, S 260
Cluster 2: N 457, S 54
Cluster 3: N 144, S 24

Length

Statistic      Cluster 1  Cluster 2  Cluster 3
Max length     1          1          1
Median length  1          1          1
Mean length    1          1          1
Min length     1          1          1

Characters and Unicode

Statistic            Cluster 1  Cluster 2  Cluster 3
Total characters     857        511        168
Distinct characters  2          2          2
Distinct categories  1          1          1
Distinct scripts     1          1          1
Distinct blocks      1          1          1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Statistic   Cluster 1  Cluster 2  Cluster 3
Unique      0          0          0
Unique (%)  0.0%       0.0%       0.0%

Sample

Row      Cluster 1  Cluster 2  Cluster 3
1st row  N          N          N
2nd row  S          S          N
3rd row  N          N          S
4th row  N          N          N
5th row  N          N          N

Common Values

Cluster 1
Value  Count  Frequency (%)
N      597    69.7%
S      260    30.3%

Cluster 2
Value  Count  Frequency (%)
N      457    89.4%
S      54     10.6%

Cluster 3
Value  Count  Frequency (%)
N      144    85.7%
S      24     14.3%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figures: common values for Cluster 1, Cluster 2, and Cluster 3]
ValueCountFrequency (%)
n 597
69.7%
s 260
30.3%
ValueCountFrequency (%)
n 457
89.4%
s 54
 
10.6%
ValueCountFrequency (%)
n 144
85.7%
s 24
 
14.3%

Most occurring characters

ValueCountFrequency (%)
N 597
69.7%
S 260
30.3%
ValueCountFrequency (%)
N 457
89.4%
S 54
 
10.6%
ValueCountFrequency (%)
N 144
85.7%
S 24
 
14.3%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 857
100.0%
ValueCountFrequency (%)
Uppercase Letter 511
100.0%
ValueCountFrequency (%)
Uppercase Letter 168
100.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
N 597
69.7%
S 260
30.3%
ValueCountFrequency (%)
N 457
89.4%
S 54
 
10.6%
ValueCountFrequency (%)
N 144
85.7%
S 24
 
14.3%

Most occurring scripts

ValueCountFrequency (%)
Latin 857
100.0%
ValueCountFrequency (%)
Latin 511
100.0%
ValueCountFrequency (%)
Latin 168
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
N 597
69.7%
S 260
30.3%
ValueCountFrequency (%)
N 457
89.4%
S 54
 
10.6%
ValueCountFrequency (%)
N 144
85.7%
S 24
 
14.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 857
100.0%
ValueCountFrequency (%)
ASCII 511
100.0%
ValueCountFrequency (%)
ASCII 168
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
N 597
69.7%
S 260
30.3%
ValueCountFrequency (%)
N 457
89.4%
S 54
 
10.6%
ValueCountFrequency (%)
N 144
85.7%
S 24
 
14.3%

MENTAL
Categorical

              Cluster 1   Cluster 2   Cluster 3
Distinct      2           2           2
Distinct (%)  0.2%        0.4%        1.2%
Missing       0           0           0
Missing (%)   0.0%        0.0%        0.0%
Memory size   13.4 KiB    8.0 KiB     2.6 KiB

Cluster 1: N 842, S 15
Cluster 2: N 503, S 8
Cluster 3: N 166, S 2

Length

               Cluster 1   Cluster 2   Cluster 3
Max length     1           1           1
Median length  1           1           1
Mean length    1           1           1
Min length     1           1           1

Characters and Unicode

                     Cluster 1   Cluster 2   Cluster 3
Total characters     857         511         168
Distinct characters  2           2           2
Distinct categories  1           1           1
Distinct scripts     1           1           1
Distinct blocks      1           1           1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

            Cluster 1   Cluster 2   Cluster 3
Unique      0           0           0
Unique (%)  0.0%        0.0%        0.0%

Sample

         Cluster 1   Cluster 2   Cluster 3
1st row  N           N           N
2nd row  N           N           N
3rd row  N           N           N
4th row  N           N           N
5th row  N           N           N

Common Values

Cluster 1
Value  Count  Frequency (%)
N      842    98.2%
S      15     1.8%

Cluster 2
Value  Count  Frequency (%)
N      503    98.4%
S      8      1.6%

Cluster 3
Value  Count  Frequency (%)
N      166    98.8%
S      2      1.2%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figures: common values for Cluster 1, Cluster 2, and Cluster 3]
ValueCountFrequency (%)
n 842
98.2%
s 15
 
1.8%
ValueCountFrequency (%)
n 503
98.4%
s 8
 
1.6%
ValueCountFrequency (%)
n 166
98.8%
s 2
 
1.2%

Most occurring characters

ValueCountFrequency (%)
N 842
98.2%
S 15
 
1.8%
ValueCountFrequency (%)
N 503
98.4%
S 8
 
1.6%
ValueCountFrequency (%)
N 166
98.8%
S 2
 
1.2%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 857
100.0%
ValueCountFrequency (%)
Uppercase Letter 511
100.0%
ValueCountFrequency (%)
Uppercase Letter 168
100.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
N 842
98.2%
S 15
 
1.8%
ValueCountFrequency (%)
N 503
98.4%
S 8
 
1.6%
ValueCountFrequency (%)
N 166
98.8%
S 2
 
1.2%

Most occurring scripts

ValueCountFrequency (%)
Latin 857
100.0%
ValueCountFrequency (%)
Latin 511
100.0%
ValueCountFrequency (%)
Latin 168
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
N 842
98.2%
S 15
 
1.8%
ValueCountFrequency (%)
N 503
98.4%
S 8
 
1.6%
ValueCountFrequency (%)
N 166
98.8%
S 2
 
1.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 857
100.0%
ValueCountFrequency (%)
ASCII 511
100.0%
ValueCountFrequency (%)
ASCII 168
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
N 842
98.2%
S 15
 
1.8%
ValueCountFrequency (%)
N 503
98.4%
S 8
 
1.6%
ValueCountFrequency (%)
N 166
98.8%
S 2
 
1.2%

DROGADICAO
Categorical

              Cluster 1   Cluster 2   Cluster 3
Distinct      2           2           2
Distinct (%)  0.2%        0.4%        1.2%
Missing       0           0           0
Missing (%)   0.0%        0.0%        0.0%
Memory size   13.4 KiB    8.0 KiB     2.6 KiB

Cluster 1: N 634, S 223
Cluster 2: N 471, S 40
Cluster 3: N 136, S 32

Length

               Cluster 1   Cluster 2   Cluster 3
Max length     1           1           1
Median length  1           1           1
Mean length    1           1           1
Min length     1           1           1

Characters and Unicode

                     Cluster 1   Cluster 2   Cluster 3
Total characters     857         511         168
Distinct characters  2           2           2
Distinct categories  1           1           1
Distinct scripts     1           1           1
Distinct blocks      1           1           1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

            Cluster 1   Cluster 2   Cluster 3
Unique      0           0           0
Unique (%)  0.0%        0.0%        0.0%

Sample

         Cluster 1   Cluster 2   Cluster 3
1st row  N           N           N
2nd row  N           S           S
3rd row  N           N           S
4th row  S           N           N
5th row  N           N           N

Common Values

Cluster 1
Value  Count  Frequency (%)
N      634    74.0%
S      223    26.0%

Cluster 2
Value  Count  Frequency (%)
N      471    92.2%
S      40     7.8%

Cluster 3
Value  Count  Frequency (%)
N      136    81.0%
S      32     19.0%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figures: common values for Cluster 1, Cluster 2, and Cluster 3]
ValueCountFrequency (%)
n 634
74.0%
s 223
 
26.0%
ValueCountFrequency (%)
n 471
92.2%
s 40
 
7.8%
ValueCountFrequency (%)
n 136
81.0%
s 32
 
19.0%

Most occurring characters

ValueCountFrequency (%)
N 634
74.0%
S 223
 
26.0%
ValueCountFrequency (%)
N 471
92.2%
S 40
 
7.8%
ValueCountFrequency (%)
N 136
81.0%
S 32
 
19.0%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 857
100.0%
ValueCountFrequency (%)
Uppercase Letter 511
100.0%
ValueCountFrequency (%)
Uppercase Letter 168
100.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
N 634
74.0%
S 223
 
26.0%
ValueCountFrequency (%)
N 471
92.2%
S 40
 
7.8%
ValueCountFrequency (%)
N 136
81.0%
S 32
 
19.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 857
100.0%
ValueCountFrequency (%)
Latin 511
100.0%
ValueCountFrequency (%)
Latin 168
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
N 634
74.0%
S 223
 
26.0%
ValueCountFrequency (%)
N 471
92.2%
S 40
 
7.8%
ValueCountFrequency (%)
N 136
81.0%
S 32
 
19.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 857
100.0%
ValueCountFrequency (%)
ASCII 511
100.0%
ValueCountFrequency (%)
ASCII 168
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
N 634
74.0%
S 223
 
26.0%
ValueCountFrequency (%)
N 471
92.2%
S 40
 
7.8%
ValueCountFrequency (%)
N 136
81.0%
S 32
 
19.0%

TABAGISMO
Categorical

              Cluster 1   Cluster 2   Cluster 3
Distinct      2           2           2
Distinct (%)  0.2%        0.4%        1.2%
Missing       0           0           0
Missing (%)   0.0%        0.0%        0.0%
Memory size   13.4 KiB    8.0 KiB     2.6 KiB

Cluster 1: N 614, S 243
Cluster 2: N 463, S 48
Cluster 3: N 154, S 14

Length

               Cluster 1   Cluster 2   Cluster 3
Max length     1           1           1
Median length  1           1           1
Mean length    1           1           1
Min length     1           1           1

Characters and Unicode

                     Cluster 1   Cluster 2   Cluster 3
Total characters     857         511         168
Distinct characters  2           2           2
Distinct categories  1           1           1
Distinct scripts     1           1           1
Distinct blocks      1           1           1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

            Cluster 1   Cluster 2   Cluster 3
Unique      0           0           0
Unique (%)  0.0%        0.0%        0.0%

Sample

         Cluster 1   Cluster 2   Cluster 3
1st row  N           N           N
2nd row  S           S           N
3rd row  N           N           N
4th row  N           N           N
5th row  S           N           N

Common Values

Cluster 1
Value  Count  Frequency (%)
N      614    71.6%
S      243    28.4%

Cluster 2
Value  Count  Frequency (%)
N      463    90.6%
S      48     9.4%

Cluster 3
Value  Count  Frequency (%)
N      154    91.7%
S      14     8.3%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figures: common values for Cluster 1, Cluster 2, and Cluster 3]
ValueCountFrequency (%)
n 614
71.6%
s 243
 
28.4%
ValueCountFrequency (%)
n 463
90.6%
s 48
 
9.4%
ValueCountFrequency (%)
n 154
91.7%
s 14
 
8.3%

Most occurring characters

ValueCountFrequency (%)
N 614
71.6%
S 243
 
28.4%
ValueCountFrequency (%)
N 463
90.6%
S 48
 
9.4%
ValueCountFrequency (%)
N 154
91.7%
S 14
 
8.3%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 857
100.0%
ValueCountFrequency (%)
Uppercase Letter 511
100.0%
ValueCountFrequency (%)
Uppercase Letter 168
100.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
N 614
71.6%
S 243
 
28.4%
ValueCountFrequency (%)
N 463
90.6%
S 48
 
9.4%
ValueCountFrequency (%)
N 154
91.7%
S 14
 
8.3%

Most occurring scripts

ValueCountFrequency (%)
Latin 857
100.0%
ValueCountFrequency (%)
Latin 511
100.0%
ValueCountFrequency (%)
Latin 168
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
N 614
71.6%
S 243
 
28.4%
ValueCountFrequency (%)
N 463
90.6%
S 48
 
9.4%
ValueCountFrequency (%)
N 154
91.7%
S 14
 
8.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 857
100.0%
ValueCountFrequency (%)
ASCII 511
100.0%
ValueCountFrequency (%)
ASCII 168
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
N 614
71.6%
S 243
 
28.4%
ValueCountFrequency (%)
N 463
90.6%
S 48
 
9.4%
ValueCountFrequency (%)
N 154
91.7%
S 14
 
8.3%

motMudEsquema
Categorical

              Cluster 1   Cluster 2   Cluster 3
Distinct      3           4           4
Distinct (%)  0.4%        0.8%        2.4%
Missing       0           0           0
Missing (%)   0.0%        0.0%        0.0%
Memory size   13.4 KiB    8.0 KiB     2.6 KiB

Cluster 1: Nulo 854, Resistencia Medicamentosa 2, Intolerancia/Toxicidade 1
Cluster 2: Nulo 503, Intolerancia/Toxicidade 5, Resistencia Medicamentosa 2, Outro Motivo 1
Cluster 3: Nulo 159, Intolerancia/Toxicidade 7, Resistencia Medicamentosa 1, Outro Motivo 1

Length

               Cluster 1   Cluster 2   Cluster 3
Max length     25          25          25
Median length  4           4           4
Mean length    4.0711785   4.2837573   4.9642857
Min length     4           4           4

Characters and Unicode

                     Cluster 1   Cluster 2   Cluster 3
Total characters     3489        2189        834
Distinct characters  21          23          23
Distinct categories  4           4           4
Distinct scripts     2           2           2
Distinct blocks      1           1           1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

            Cluster 1   Cluster 2   Cluster 3
Unique      1           1           2
Unique (%)  0.1%        0.2%        1.2%

Sample

         Cluster 1   Cluster 2   Cluster 3
1st row  Nulo        Nulo        Nulo
2nd row  Nulo        Nulo        Nulo
3rd row  Nulo        Nulo        Nulo
4th row  Nulo        Nulo        Nulo
5th row  Nulo        Nulo        Nulo

Common Values

Cluster 1
Value                      Count  Frequency (%)
Nulo                       854    99.6%
Resistencia Medicamentosa  2      0.2%
Intolerancia/Toxicidade    1      0.1%

Cluster 2
Value                      Count  Frequency (%)
Nulo                       503    98.4%
Intolerancia/Toxicidade    5      1.0%
Resistencia Medicamentosa  2      0.4%
Outro Motivo               1      0.2%

Cluster 3
Value                      Count  Frequency (%)
Nulo                       159    94.6%
Intolerancia/Toxicidade    7      4.2%
Resistencia Medicamentosa  1      0.6%
Outro Motivo               1      0.6%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figures: common values for Cluster 1, Cluster 2, and Cluster 3]
ValueCountFrequency (%)
nulo 854
99.4%
resistencia 2
 
0.2%
medicamentosa 2
 
0.2%
intolerancia/toxicidade 1
 
0.1%
ValueCountFrequency (%)
nulo 503
97.9%
intolerancia/toxicidade 5
 
1.0%
resistencia 2
 
0.4%
medicamentosa 2
 
0.4%
outro 1
 
0.2%
motivo 1
 
0.2%
ValueCountFrequency (%)
nulo 159
93.5%
intolerancia/toxicidade 7
 
4.1%
resistencia 1
 
0.6%
medicamentosa 1
 
0.6%
outro 1
 
0.6%
motivo 1
 
0.6%
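
The lowercase token counts above are consistent with lowercasing each category value and splitting it on whitespace, then weighting each token by the value's count. The sketch below is only an illustration of that reading, not the profiling tool's own code; the helper name `word_counts` is hypothetical, and the input counts are the Cluster 3 figures from the Common Values table for motMudEsquema.

```python
from collections import Counter

# Cluster 3 value counts for motMudEsquema, copied from the Common Values table.
value_counts = {
    "Nulo": 159,
    "Intolerancia/Toxicidade": 7,
    "Resistencia Medicamentosa": 1,
    "Outro Motivo": 1,
}


def word_counts(counts):
    """Count lowercase whitespace-separated tokens, weighted by value count."""
    words = Counter()
    for value, n in counts.items():
        for token in value.lower().split():
            words[token] += n
    return words


words = word_counts(value_counts)
total = sum(words.values())
for token, n in words.most_common():
    # Frequency is relative to the total number of tokens, not rows.
    print(f"{token} {n} {100 * n / total:.1f}%")
```

Because multi-word values contribute several tokens, the denominator is the token total (170 here) rather than the 168 observations, which is why "nulo" shows 93.5% in this table but "Nulo" shows 94.6% under Common Values.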

Most occurring characters

ValueCountFrequency (%)
o 858
24.6%
l 855
24.5%
N 854
24.5%
u 854
24.5%
e 10
 
0.3%
a 9
 
0.3%
i 9
 
0.3%
n 6
 
0.2%
c 6
 
0.2%
s 6
 
0.2%
Other values (11) 22
 
0.6%
ValueCountFrequency (%)
o 518
23.7%
l 508
23.2%
u 504
23.0%
N 503
23.0%
i 22
 
1.0%
a 21
 
1.0%
e 18
 
0.8%
n 14
 
0.6%
c 14
 
0.6%
d 12
 
0.5%
Other values (13) 55
 
2.5%
ValueCountFrequency (%)
o 177
21.2%
l 166
19.9%
u 160
19.2%
N 159
19.1%
i 25
 
3.0%
a 24
 
2.9%
e 18
 
2.2%
n 16
 
1.9%
c 16
 
1.9%
d 15
 
1.8%
Other values (13) 58
 
7.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 2626
75.3%
Uppercase Letter 860
 
24.6%
Space Separator 2
 
0.1%
Other Punctuation 1
 
< 0.1%
ValueCountFrequency (%)
Lowercase Letter 1662
75.9%
Uppercase Letter 519
 
23.7%
Other Punctuation 5
 
0.2%
Space Separator 3
 
0.1%
ValueCountFrequency (%)
Lowercase Letter 648
77.7%
Uppercase Letter 177
 
21.2%
Other Punctuation 7
 
0.8%
Space Separator 2
 
0.2%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o 858
32.7%
l 855
32.6%
u 854
32.5%
e 10
 
0.4%
a 9
 
0.3%
i 9
 
0.3%
n 6
 
0.2%
c 6
 
0.2%
s 6
 
0.2%
t 5
 
0.2%
Other values (4) 8
 
0.3%
ValueCountFrequency (%)
o 518
31.2%
l 508
30.6%
u 504
30.3%
i 22
 
1.3%
a 21
 
1.3%
e 18
 
1.1%
n 14
 
0.8%
c 14
 
0.8%
d 12
 
0.7%
t 11
 
0.7%
Other values (5) 20
 
1.2%
ValueCountFrequency (%)
o 177
27.3%
l 166
25.6%
u 160
24.7%
i 25
 
3.9%
a 24
 
3.7%
e 18
 
2.8%
n 16
 
2.5%
c 16
 
2.5%
d 15
 
2.3%
t 11
 
1.7%
Other values (5) 20
 
3.1%
Uppercase Letter
ValueCountFrequency (%)
N 854
99.3%
R 2
 
0.2%
M 2
 
0.2%
I 1
 
0.1%
T 1
 
0.1%
ValueCountFrequency (%)
N 503
96.9%
T 5
 
1.0%
I 5
 
1.0%
M 3
 
0.6%
R 2
 
0.4%
O 1
 
0.2%
ValueCountFrequency (%)
N 159
89.8%
T 7
 
4.0%
I 7
 
4.0%
M 2
 
1.1%
R 1
 
0.6%
O 1
 
0.6%
Space Separator
ValueCountFrequency (%)
2
100.0%
ValueCountFrequency (%)
3
100.0%
ValueCountFrequency (%)
2
100.0%
Other Punctuation
ValueCountFrequency (%)
/ 1
100.0%
ValueCountFrequency (%)
/ 5
100.0%
ValueCountFrequency (%)
/ 7
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 3486
99.9%
Common 3
 
0.1%
ValueCountFrequency (%)
Latin 2181
99.6%
Common 8
 
0.4%
ValueCountFrequency (%)
Latin 825
98.9%
Common 9
 
1.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
o 858
24.6%
l 855
24.5%
N 854
24.5%
u 854
24.5%
e 10
 
0.3%
a 9
 
0.3%
i 9
 
0.3%
n 6
 
0.2%
c 6
 
0.2%
s 6
 
0.2%
Other values (9) 19
 
0.5%
ValueCountFrequency (%)
o 518
23.8%
l 508
23.3%
u 504
23.1%
N 503
23.1%
i 22
 
1.0%
a 21
 
1.0%
e 18
 
0.8%
n 14
 
0.6%
c 14
 
0.6%
d 12
 
0.6%
Other values (11) 47
 
2.2%
ValueCountFrequency (%)
o 177
21.5%
l 166
20.1%
u 160
19.4%
N 159
19.3%
i 25
 
3.0%
a 24
 
2.9%
e 18
 
2.2%
n 16
 
1.9%
c 16
 
1.9%
d 15
 
1.8%
Other values (11) 49
 
5.9%
Common
ValueCountFrequency (%)
2
66.7%
/ 1
33.3%
ValueCountFrequency (%)
/ 5
62.5%
3
37.5%
ValueCountFrequency (%)
/ 7
77.8%
2
 
22.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 3489
100.0%
ValueCountFrequency (%)
ASCII 2189
100.0%
ValueCountFrequency (%)
ASCII 834
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
o 858
24.6%
l 855
24.5%
N 854
24.5%
u 854
24.5%
e 10
 
0.3%
a 9
 
0.3%
i 9
 
0.3%
n 6
 
0.2%
c 6
 
0.2%
s 6
 
0.2%
Other values (11) 22
 
0.6%
ValueCountFrequency (%)
o 518
23.7%
l 508
23.2%
u 504
23.0%
N 503
23.0%
i 22
 
1.0%
a 21
 
1.0%
e 18
 
0.8%
n 14
 
0.6%
c 14
 
0.6%
d 12
 
0.5%
Other values (13) 55
 
2.5%
ValueCountFrequency (%)
o 177
21.2%
l 166
19.9%
u 160
19.2%
N 159
19.1%
i 25
 
3.0%
a 24
 
2.9%
e 18
 
2.2%
n 16
 
1.9%
c 16
 
1.9%
d 15
 
1.8%
Other values (13) 58
 
7.0%

tipoTrat
Categorical

              Cluster 1   Cluster 2   Cluster 3
Distinct      2           2           2
Distinct (%)  0.2%        0.4%        1.2%
Missing       0           0           0
Missing (%)   0.0%        0.0%        0.0%
Memory size   13.4 KiB    8.0 KiB     2.6 KiB

Cluster 1: Supervisionado 724, Auto-Administrado 133
Cluster 2: Supervisionado 363, Auto-Administrado 148
Cluster 3: Auto-Administrado 125, Supervisionado 43

Length

               Cluster 1   Cluster 2   Cluster 3
Max length     17          17          17
Median length  14          14          17
Mean length    14.465578   14.868885   16.232143
Min length     14          14          14

Characters and Unicode

                     Cluster 1   Cluster 2   Cluster 3
Total characters     12397       7598        2727
Distinct characters  16          16          16
Distinct categories  3           3           3
Distinct scripts     2           2           2
Distinct blocks      1           1           1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

            Cluster 1   Cluster 2   Cluster 3
Unique      0           0           0
Unique (%)  0.0%        0.0%        0.0%

Sample

         Cluster 1       Cluster 2          Cluster 3
1st row  Supervisionado  Supervisionado     Auto-Administrado
2nd row  Supervisionado  Supervisionado     Supervisionado
3rd row  Supervisionado  Supervisionado     Supervisionado
4th row  Supervisionado  Auto-Administrado  Auto-Administrado
5th row  Supervisionado  Supervisionado     Auto-Administrado

Common Values

Cluster 1
Value              Count  Frequency (%)
Supervisionado     724    84.5%
Auto-Administrado  133    15.5%

Cluster 2
Value              Count  Frequency (%)
Supervisionado     363    71.0%
Auto-Administrado  148    29.0%

Cluster 3
Value              Count  Frequency (%)
Auto-Administrado  125    74.4%
Supervisionado     43     25.6%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figures: common values for Cluster 1, Cluster 2, and Cluster 3]
ValueCountFrequency (%)
supervisionado 724
84.5%
auto-administrado 133
 
15.5%
ValueCountFrequency (%)
supervisionado 363
71.0%
auto-administrado 148
29.0%
ValueCountFrequency (%)
auto-administrado 125
74.4%
supervisionado 43
 
25.6%

Most occurring characters

ValueCountFrequency (%)
i 1714
13.8%
o 1714
13.8%
d 990
8.0%
u 857
 
6.9%
r 857
 
6.9%
s 857
 
6.9%
n 857
 
6.9%
a 857
 
6.9%
S 724
 
5.8%
p 724
 
5.8%
Other values (6) 2246
18.1%
ValueCountFrequency (%)
i 1022
13.5%
o 1022
13.5%
d 659
8.7%
u 511
 
6.7%
r 511
 
6.7%
s 511
 
6.7%
n 511
 
6.7%
a 511
 
6.7%
S 363
 
4.8%
p 363
 
4.8%
Other values (6) 1614
21.2%
ValueCountFrequency (%)
o 336
12.3%
i 336
12.3%
d 293
10.7%
A 250
9.2%
t 250
9.2%
u 168
 
6.2%
n 168
 
6.2%
s 168
 
6.2%
r 168
 
6.2%
a 168
 
6.2%
Other values (6) 422
15.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 11274
90.9%
Uppercase Letter 990
 
8.0%
Dash Punctuation 133
 
1.1%
ValueCountFrequency (%)
Lowercase Letter 6791
89.4%
Uppercase Letter 659
 
8.7%
Dash Punctuation 148
 
1.9%
ValueCountFrequency (%)
Lowercase Letter 2309
84.7%
Uppercase Letter 293
 
10.7%
Dash Punctuation 125
 
4.6%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
i 1714
15.2%
o 1714
15.2%
d 990
8.8%
u 857
7.6%
r 857
7.6%
s 857
7.6%
n 857
7.6%
a 857
7.6%
p 724
6.4%
e 724
6.4%
Other values (3) 1123
10.0%
ValueCountFrequency (%)
i 1022
15.0%
o 1022
15.0%
d 659
9.7%
u 511
7.5%
r 511
7.5%
s 511
7.5%
n 511
7.5%
a 511
7.5%
p 363
 
5.3%
e 363
 
5.3%
Other values (3) 807
11.9%
ValueCountFrequency (%)
o 336
14.6%
i 336
14.6%
d 293
12.7%
t 250
10.8%
u 168
7.3%
n 168
7.3%
s 168
7.3%
r 168
7.3%
a 168
7.3%
m 125
 
5.4%
Other values (3) 129
 
5.6%
Uppercase Letter
ValueCountFrequency (%)
S 724
73.1%
A 266
 
26.9%
ValueCountFrequency (%)
S 363
55.1%
A 296
44.9%
ValueCountFrequency (%)
A 250
85.3%
S 43
 
14.7%
Dash Punctuation
ValueCountFrequency (%)
- 133
100.0%
ValueCountFrequency (%)
- 148
100.0%
ValueCountFrequency (%)
- 125
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 12264
98.9%
Common 133
 
1.1%
ValueCountFrequency (%)
Latin 7450
98.1%
Common 148
 
1.9%
ValueCountFrequency (%)
Latin 2602
95.4%
Common 125
 
4.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
i 1714
14.0%
o 1714
14.0%
d 990
8.1%
u 857
7.0%
r 857
7.0%
s 857
7.0%
n 857
7.0%
a 857
7.0%
S 724
 
5.9%
p 724
 
5.9%
Other values (5) 2113
17.2%
ValueCountFrequency (%)
i 1022
13.7%
o 1022
13.7%
d 659
8.8%
u 511
 
6.9%
r 511
 
6.9%
s 511
 
6.9%
n 511
 
6.9%
a 511
 
6.9%
S 363
 
4.9%
p 363
 
4.9%
Other values (5) 1466
19.7%
ValueCountFrequency (%)
o 336
12.9%
i 336
12.9%
d 293
11.3%
A 250
9.6%
t 250
9.6%
u 168
6.5%
n 168
6.5%
s 168
6.5%
r 168
6.5%
a 168
6.5%
Other values (5) 297
11.4%
Common
ValueCountFrequency (%)
- 133
100.0%
ValueCountFrequency (%)
- 148
100.0%
ValueCountFrequency (%)
- 125
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 12397
100.0%
ValueCountFrequency (%)
ASCII 7598
100.0%
ValueCountFrequency (%)
ASCII 2727
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
i 1714
13.8%
o 1714
13.8%
d 990
8.0%
u 857
 
6.9%
r 857
 
6.9%
s 857
 
6.9%
n 857
 
6.9%
a 857
 
6.9%
S 724
 
5.8%
p 724
 
5.8%
Other values (6) 2246
18.1%
ValueCountFrequency (%)
i 1022
13.5%
o 1022
13.5%
d 659
8.7%
u 511
 
6.7%
r 511
 
6.7%
s 511
 
6.7%
n 511
 
6.7%
a 511
 
6.7%
S 363
 
4.8%
p 363
 
4.8%
Other values (6) 1614
21.2%
ValueCountFrequency (%)
o 336
12.3%
i 336
12.3%
d 293
10.7%
A 250
9.2%
t 250
9.2%
u 168
 
6.2%
n 168
 
6.2%
s 168
 
6.2%
r 168
 
6.2%
a 168
 
6.2%
Other values (6) 422
15.5%

idade
Categorical

              Cluster 1   Cluster 2   Cluster 3
Distinct      4           4           4
Distinct (%)  0.5%        0.8%        2.4%
Missing       0           0           0
Missing (%)   0.0%        0.0%        0.0%
Memory size   13.4 KiB    8.0 KiB     2.6 KiB

Cluster 1: 40_54 251, 23_39 223, 0_22 192, Mais de 54 191
Cluster 2: 0_22 144, 23_39 135, 40_54 120, Mais de 54 112
Cluster 3: 40_54 65, 23_39 62, Mais de 54 22, 0_22 19

Length

               Cluster 1   Cluster 2   Cluster 3
Max length     10          10          10
Median length  5           5           5
Mean length    5.8903151   5.81409     5.5416667
Min length     4           4           4

Characters and Unicode

                     Cluster 1   Cluster 2   Cluster 3
Total characters     5048        2971        931
Distinct characters  14          14          14
Distinct categories  5           5           5
Distinct scripts     2           2           2
Distinct blocks      1           1           1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
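
As a rough illustration of what "character properties" means here, the sketch below uses Python's standard `unicodedata` module to tally the Unicode general category of every character in the Cluster 3 `idade` values. It is an assumption-laden sketch, not the report generator's implementation: the helper `category_totals` and the `CATEGORY_NAMES` mapping are hypothetical, and the value counts are copied from the Common Values table for this variable. Scripts and blocks are not covered, since `unicodedata` does not expose them.

```python
import unicodedata
from collections import Counter

# Cluster 3 value counts for idade, copied from the Common Values table.
value_counts = {"40_54": 65, "23_39": 62, "Mais de 54": 22, "0_22": 19}

# Readable names for the general-category codes that occur in these values.
CATEGORY_NAMES = {
    "Nd": "Decimal Number",
    "Ll": "Lowercase Letter",
    "Lu": "Uppercase Letter",
    "Pc": "Connector Punctuation",
    "Zs": "Space Separator",
}


def category_totals(counts):
    """Sum characters per Unicode general category, weighted by value count."""
    totals = Counter()
    for value, n in counts.items():
        for ch in value:
            code = unicodedata.category(ch)  # e.g. "Nd" for a digit
            totals[CATEGORY_NAMES.get(code, code)] += n
    return totals


totals = category_totals(value_counts)
for name, n in totals.most_common():
    print(name, n)
```

With these inputs the tallies come to five distinct categories and 931 characters in total, which agrees with the "Distinct categories" and "Total characters" figures reported for Cluster 3.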

Unique

            Cluster 1   Cluster 2   Cluster 3
Unique      0           0           0
Unique (%)  0.0%        0.0%        0.0%

Sample

         Cluster 1   Cluster 2   Cluster 3
1st row  23_39       40_54       40_54
2nd row  40_54       40_54       23_39
3rd row  Mais de 54  40_54       23_39
4th row  0_22        Mais de 54  0_22
5th row  23_39       0_22        23_39

Common Values

Cluster 1
Value       Count  Frequency (%)
40_54       251    29.3%
23_39       223    26.0%
0_22        192    22.4%
Mais de 54  191    22.3%

Cluster 2
Value       Count  Frequency (%)
0_22        144    28.2%
23_39       135    26.4%
40_54       120    23.5%
Mais de 54  112    21.9%

Cluster 3
Value       Count  Frequency (%)
40_54       65     38.7%
23_39       62     36.9%
Mais de 54  22     13.1%
0_22        19     11.3%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figures: common values for Cluster 1, Cluster 2, and Cluster 3]
ValueCountFrequency (%)
40_54 251
20.3%
23_39 223
18.0%
0_22 192
15.5%
mais 191
15.4%
de 191
15.4%
54 191
15.4%
ValueCountFrequency (%)
0_22 144
19.6%
23_39 135
18.4%
40_54 120
16.3%
mais 112
15.2%
de 112
15.2%
54 112
15.2%
ValueCountFrequency (%)
40_54 65
30.7%
23_39 62
29.2%
mais 22
 
10.4%
de 22
 
10.4%
54 22
 
10.4%
0_22 19
 
9.0%

Most occurring characters

ValueCountFrequency (%)
4 693
13.7%
_ 666
13.2%
2 607
12.0%
3 446
8.8%
0 443
8.8%
5 442
8.8%
382
7.6%
9 223
 
4.4%
M 191
 
3.8%
a 191
 
3.8%
Other values (4) 764
15.1%
ValueCountFrequency (%)
2 423
14.2%
_ 399
13.4%
4 352
11.8%
3 270
9.1%
0 264
8.9%
5 232
7.8%
224
7.5%
9 135
 
4.5%
M 112
 
3.8%
a 112
 
3.8%
Other values (4) 448
15.1%
ValueCountFrequency (%)
4 152
16.3%
_ 146
15.7%
3 124
13.3%
2 100
10.7%
5 87
9.3%
0 84
9.0%
9 62
6.7%
44
 
4.7%
M 22
 
2.4%
a 22
 
2.4%
Other values (4) 88
9.5%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 2854
56.5%
Lowercase Letter 955
 
18.9%
Connector Punctuation 666
 
13.2%
Space Separator 382
 
7.6%
Uppercase Letter 191
 
3.8%
ValueCountFrequency (%)
Decimal Number 1676
56.4%
Lowercase Letter 560
 
18.8%
Connector Punctuation 399
 
13.4%
Space Separator 224
 
7.5%
Uppercase Letter 112
 
3.8%
ValueCountFrequency (%)
Decimal Number 609
65.4%
Connector Punctuation 146
 
15.7%
Lowercase Letter 110
 
11.8%
Space Separator 44
 
4.7%
Uppercase Letter 22
 
2.4%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
4 693
24.3%
2 607
21.3%
3 446
15.6%
0 443
15.5%
5 442
15.5%
9 223
 
7.8%
ValueCountFrequency (%)
2 423
25.2%
4 352
21.0%
3 270
16.1%
0 264
15.8%
5 232
13.8%
9 135
 
8.1%
ValueCountFrequency (%)
4 152
25.0%
3 124
20.4%
2 100
16.4%
5 87
14.3%
0 84
13.8%
9 62
10.2%
Connector Punctuation
ValueCountFrequency (%)
_ 666
100.0%
ValueCountFrequency (%)
_ 399
100.0%
ValueCountFrequency (%)
_ 146
100.0%
Space Separator
ValueCountFrequency (%)
382
100.0%
ValueCountFrequency (%)
224
100.0%
ValueCountFrequency (%)
44
100.0%
Uppercase Letter
ValueCountFrequency (%)
M 191
100.0%
ValueCountFrequency (%)
M 112
100.0%
ValueCountFrequency (%)
M 22
100.0%
Lowercase Letter
ValueCountFrequency (%)
a 191
20.0%
i 191
20.0%
s 191
20.0%
d 191
20.0%
e 191
20.0%
ValueCountFrequency (%)
a 112
20.0%
i 112
20.0%
s 112
20.0%
d 112
20.0%
e 112
20.0%
ValueCountFrequency (%)
a 22
20.0%
i 22
20.0%
s 22
20.0%
d 22
20.0%
e 22
20.0%

Most occurring scripts

ValueCountFrequency (%)
Common 3902
77.3%
Latin 1146
 
22.7%
ValueCountFrequency (%)
Common 2299
77.4%
Latin 672
 
22.6%
ValueCountFrequency (%)
Common 799
85.8%
Latin 132
 
14.2%

Most frequent character per script

Common
ValueCountFrequency (%)
4 693
17.8%
_ 666
17.1%
2 607
15.6%
3 446
11.4%
0 443
11.4%
5 442
11.3%
382
9.8%
9 223
 
5.7%
ValueCountFrequency (%)
2 423
18.4%
_ 399
17.4%
4 352
15.3%
3 270
11.7%
0 264
11.5%
5 232
10.1%
224
9.7%
9 135
 
5.9%
ValueCountFrequency (%)
4 152
19.0%
_ 146
18.3%
3 124
15.5%
2 100
12.5%
5 87
10.9%
0 84
10.5%
9 62
7.8%
44
 
5.5%
Latin
ValueCountFrequency (%)
M 191
16.7%
a 191
16.7%
i 191
16.7%
s 191
16.7%
d 191
16.7%
e 191
16.7%
ValueCountFrequency (%)
M 112
16.7%
a 112
16.7%
i 112
16.7%
s 112
16.7%
d 112
16.7%
e 112
16.7%
ValueCountFrequency (%)
M 22
16.7%
a 22
16.7%
i 22
16.7%
s 22
16.7%
d 22
16.7%
e 22
16.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII 5048
100.0%
ValueCountFrequency (%)
ASCII 2971
100.0%
ValueCountFrequency (%)
ASCII 931
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
4 693
13.7%
_ 666
13.2%
2 607
12.0%
3 446
8.8%
0 443
8.8%
5 442
8.8%
382
7.6%
9 223
 
4.4%
M 191
 
3.8%
a 191
 
3.8%
Other values (4) 764
15.1%
ValueCountFrequency (%)
2 423
14.2%
_ 399
13.4%
4 352
11.8%
3 270
9.1%
0 264
8.9%
5 232
7.8%
224
7.5%
9 135
 
4.5%
M 112
 
3.8%
a 112
 
3.8%
Other values (4) 448
15.1%
ValueCountFrequency (%)
4 152
16.3%
_ 146
15.7%
3 124
13.3%
2 100
10.7%
5 87
9.3%
0 84
9.0%
9 62
6.7%
44
 
4.7%
M 22
 
2.4%
a 22
 
2.4%
Other values (4) 88
9.5%

HISTOPATOL
Categorical

              Cluster 1   Cluster 2   Cluster 3
Distinct      3           3           3
Distinct (%)  0.4%        0.6%        1.8%
Missing       0           0           0
Missing (%)   0.0%        0.0%        0.0%
Memory size   13.4 KiB    8.0 KiB     2.6 KiB

Cluster 1: N/realiz 828, Sugestivo TB 18, BAAR pos 11
Cluster 2: N/realiz 470, Sugestivo TB 29, BAAR pos 12
Cluster 3: N/realiz 143, Sugestivo TB 18, BAAR pos 7

Length

               Cluster 1   Cluster 2   Cluster 3
Max length     12          12          12
Median length  8           8           8
Mean length    8.084014    8.2270059   8.4285714
Min length     8           8           8

Characters and Unicode

                     Cluster 1   Cluster 2   Cluster 3
Total characters     6928        4204        1416
Distinct characters  21          21          21
Distinct categories  4           4           4
Distinct scripts     2           2           2
Distinct blocks      1           1           1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

            Cluster 1   Cluster 2   Cluster 3
Unique      0           0           0
Unique (%)  0.0%        0.0%        0.0%

Sample

         Cluster 1   Cluster 2   Cluster 3
1st row  N/realiz    N/realiz    BAAR pos
2nd row  N/realiz    N/realiz    Sugestivo TB
3rd row  N/realiz    BAAR pos    N/realiz
4th row  N/realiz    N/realiz    Sugestivo TB
5th row  N/realiz    N/realiz    N/realiz

Common Values

Cluster 1
Value         Count  Frequency (%)
N/realiz      828    96.6%
Sugestivo TB  18     2.1%
BAAR pos      11     1.3%

Cluster 2
Value         Count  Frequency (%)
N/realiz      470    92.0%
Sugestivo TB  29     5.7%
BAAR pos      12     2.3%

Cluster 3
Value         Count  Frequency (%)
N/realiz      143    85.1%
Sugestivo TB  18     10.7%
BAAR pos      7      4.2%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figures: common values for Cluster 1, Cluster 2, and Cluster 3]
ValueCountFrequency (%)
n/realiz 828
93.5%
sugestivo 18
 
2.0%
tb 18
 
2.0%
baar 11
 
1.2%
pos 11
 
1.2%
ValueCountFrequency (%)
n/realiz 470
85.1%
sugestivo 29
 
5.3%
tb 29
 
5.3%
baar 12
 
2.2%
pos 12
 
2.2%
ValueCountFrequency (%)
n/realiz 143
74.1%
sugestivo 18
 
9.3%
tb 18
 
9.3%
baar 7
 
3.6%
pos 7
 
3.6%

Most occurring characters

ValueCountFrequency (%)
e 846
12.2%
i 846
12.2%
N 828
12.0%
r 828
12.0%
a 828
12.0%
l 828
12.0%
z 828
12.0%
/ 828
12.0%
o 29
 
0.4%
B 29
 
0.4%
Other values (11) 210
 
3.0%
ValueCountFrequency (%)
e 499
11.9%
i 499
11.9%
N 470
11.2%
r 470
11.2%
a 470
11.2%
l 470
11.2%
z 470
11.2%
/ 470
11.2%
o 41
 
1.0%
B 41
 
1.0%
Other values (11) 304
7.2%
ValueCountFrequency (%)
e 161
11.4%
i 161
11.4%
N 143
10.1%
r 143
10.1%
a 143
10.1%
l 143
10.1%
z 143
10.1%
/ 143
10.1%
o 25
 
1.8%
B 25
 
1.8%
Other values (11) 186
13.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 5145
74.3%
Uppercase Letter 926
 
13.4%
Other Punctuation 828
 
12.0%
Space Separator 29
 
0.4%
ValueCountFrequency (%)
Lowercase Letter 3088
73.5%
Uppercase Letter 605
 
14.4%
Other Punctuation 470
 
11.2%
Space Separator 41
 
1.0%
ValueCountFrequency (%)
Lowercase Letter 1023
72.2%
Uppercase Letter 225
 
15.9%
Other Punctuation 143
 
10.1%
Space Separator 25
 
1.8%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 846
16.4%
i 846
16.4%
r 828
16.1%
a 828
16.1%
l 828
16.1%
z 828
16.1%
o 29
 
0.6%
s 29
 
0.6%
g 18
 
0.3%
v 18
 
0.3%
Other values (3) 47
 
0.9%
ValueCountFrequency (%)
e 499
16.2%
i 499
16.2%
r 470
15.2%
a 470
15.2%
l 470
15.2%
z 470
15.2%
o 41
 
1.3%
s 41
 
1.3%
g 29
 
0.9%
v 29
 
0.9%
Other values (3) 70
 
2.3%
ValueCountFrequency (%)
e 161
15.7%
i 161
15.7%
r 143
14.0%
a 143
14.0%
l 143
14.0%
z 143
14.0%
o 25
 
2.4%
s 25
 
2.4%
g 18
 
1.8%
v 18
 
1.8%
Other values (3) 43
 
4.2%
Uppercase Letter
ValueCountFrequency (%)
N 828
89.4%
B 29
 
3.1%
A 22
 
2.4%
T 18
 
1.9%
S 18
 
1.9%
R 11
 
1.2%
ValueCountFrequency (%)
N 470
77.7%
B 41
 
6.8%
T 29
 
4.8%
S 29
 
4.8%
A 24
 
4.0%
R 12
 
2.0%
ValueCountFrequency (%)
N 143
63.6%
B 25
 
11.1%
T 18
 
8.0%
S 18
 
8.0%
A 14
 
6.2%
R 7
 
3.1%
Other Punctuation
ValueCountFrequency (%)
/ 828
100.0%
ValueCountFrequency (%)
/ 470
100.0%
ValueCountFrequency (%)
/ 143
100.0%
Space Separator
ValueCountFrequency (%)
29
100.0%
ValueCountFrequency (%)
41
100.0%
ValueCountFrequency (%)
25
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 6071
87.6%
Common 857
 
12.4%
ValueCountFrequency (%)
Latin 3693
87.8%
Common 511
 
12.2%
ValueCountFrequency (%)
Latin 1248
88.1%
Common 168
 
11.9%

Most frequent character per script

Latin
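Cluster | Value | Count | Frequency (%)
Cluster 1 | e | 846 | 13.9%
Cluster 1 | i | 846 | 13.9%
Cluster 1 | N | 828 | 13.6%
Cluster 1 | r | 828 | 13.6%
Cluster 1 | a | 828 | 13.6%
Cluster 1 | l | 828 | 13.6%
Cluster 1 | z | 828 | 13.6%
Cluster 1 | o | 29 | 0.5%
Cluster 1 | B | 29 | 0.5%
Cluster 1 | s | 29 | 0.5%
Cluster 1 | Other values (9) | 152 | 2.5%
Cluster 2 | e | 499 | 13.5%
Cluster 2 | i | 499 | 13.5%
Cluster 2 | N | 470 | 12.7%
Cluster 2 | r | 470 | 12.7%
Cluster 2 | a | 470 | 12.7%
Cluster 2 | l | 470 | 12.7%
Cluster 2 | z | 470 | 12.7%
Cluster 2 | o | 41 | 1.1%
Cluster 2 | B | 41 | 1.1%
Cluster 2 | s | 41 | 1.1%
Cluster 2 | Other values (9) | 222 | 6.0%
Cluster 3 | e | 161 | 12.9%
Cluster 3 | i | 161 | 12.9%
Cluster 3 | N | 143 | 11.5%
Cluster 3 | r | 143 | 11.5%
Cluster 3 | a | 143 | 11.5%
Cluster 3 | l | 143 | 11.5%
Cluster 3 | z | 143 | 11.5%
Cluster 3 | o | 25 | 2.0%
Cluster 3 | B | 25 | 2.0%
Cluster 3 | s | 25 | 2.0%
Cluster 3 | Other values (9) | 136 | 10.9%
Common
Cluster | Value | Count | Frequency (%)
Cluster 1 | / | 828 | 96.6%
Cluster 1 | (space) | 29 | 3.4%
Cluster 2 | / | 470 | 92.0%
Cluster 2 | (space) | 41 | 8.0%
Cluster 3 | / | 143 | 85.1%
Cluster 3 | (space) | 25 | 14.9%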

Most occurring blocks

Cluster | Value | Count | Frequency (%)
Cluster 1 | ASCII | 6928 | 100.0%
Cluster 2 | ASCII | 4204 | 100.0%
Cluster 3 | ASCII | 1416 | 100.0%

Most frequent character per block

ASCII
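Cluster | Value | Count | Frequency (%)
Cluster 1 | e | 846 | 12.2%
Cluster 1 | i | 846 | 12.2%
Cluster 1 | N | 828 | 12.0%
Cluster 1 | r | 828 | 12.0%
Cluster 1 | a | 828 | 12.0%
Cluster 1 | l | 828 | 12.0%
Cluster 1 | z | 828 | 12.0%
Cluster 1 | / | 828 | 12.0%
Cluster 1 | o | 29 | 0.4%
Cluster 1 | B | 29 | 0.4%
Cluster 1 | Other values (11) | 210 | 3.0%
Cluster 2 | e | 499 | 11.9%
Cluster 2 | i | 499 | 11.9%
Cluster 2 | N | 470 | 11.2%
Cluster 2 | r | 470 | 11.2%
Cluster 2 | a | 470 | 11.2%
Cluster 2 | l | 470 | 11.2%
Cluster 2 | z | 470 | 11.2%
Cluster 2 | / | 470 | 11.2%
Cluster 2 | o | 41 | 1.0%
Cluster 2 | B | 41 | 1.0%
Cluster 2 | Other values (11) | 304 | 7.2%
Cluster 3 | e | 161 | 11.4%
Cluster 3 | i | 161 | 11.4%
Cluster 3 | N | 143 | 10.1%
Cluster 3 | r | 143 | 10.1%
Cluster 3 | a | 143 | 10.1%
Cluster 3 | l | 143 | 10.1%
Cluster 3 | z | 143 | 10.1%
Cluster 3 | / | 143 | 10.1%
Cluster 3 | o | 25 | 1.8%
Cluster 3 | B | 25 | 1.8%
Cluster 3 | Other values (11) | 186 | 13.1%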
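Statistic | Cluster 1 | Cluster 2 | Cluster 3
Distinct | 2 | 2 | 2
Distinct (%) | 0.2% | 0.4% | 1.2%
Missing | 0 | 0 | 0
Missing (%) | 0.0% | 0.0% | 0.0%
Memory size | 13.4 KiB | 8.0 KiB | 2.6 KiB

Value counts:
Cluster 1: 1 (613), 0 (244)
Cluster 2: 0 (458), 1 (53)
Cluster 3: 1 (102), 0 (66)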

Length

Statistic | Cluster 1 | Cluster 2 | Cluster 3
Max length | 1 | 1 | 1
Median length | 1 | 1 | 1
Mean length | 1 | 1 | 1
Min length | 1 | 1 | 1

Characters and Unicode

Statistic | Cluster 1 | Cluster 2 | Cluster 3
Total characters | 857 | 511 | 168
Distinct characters | 2 | 2 | 2
Distinct categories | 1 | 1 | 1
Distinct scripts | 1 | 1 | 1
Distinct blocks | 1 | 1 | 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
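
As a minimal sketch of how the per-category character counts in this report could be reproduced, the snippet below tallies characters by Unicode general category with Python's standard `unicodedata` module. The function name `category_counts` is hypothetical and this is not necessarily how the report generator computes its figures. The two-letter codes map to the names used in the tables: `Ll` is Lowercase Letter, `Lu` is Uppercase Letter, `Nd` is Decimal Number, `Po` is Other Punctuation and `Zs` is Space Separator. Script and block properties are not exposed by `unicodedata`, so they are left out.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count characters by Unicode general category across all strings.

    `values` is any iterable of cell values; each is converted to str.
    (Hypothetical helper for illustration only.)
    """
    counts = Counter()
    for value in values:
        for ch in str(value):
            # unicodedata.category returns a two-letter code such as "Ll" or "Nd".
            counts[unicodedata.category(ch)] += 1
    return counts


# Example with two category labels that appear in this report's sample rows.
print(category_counts(["N/realiz", "BAAR pos"]))
```

A column holding only the digits `0` and `1` would therefore fall entirely into the `Nd` (Decimal Number) category.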

Unique

Statistic | Cluster 1 | Cluster 2 | Cluster 3
Unique | 0 | 0 | 0
Unique (%) | 0.0% | 0.0% | 0.0%

Sample

Row | Cluster 1 | Cluster 2 | Cluster 3
1st row | 1 | 0 | 1
2nd row | 1 | 0 | 0
3rd row | 1 | 0 | 0
4th row | 1 | 0 | 0
5th row | 1 | 0 | 1

Common Values

Cluster | Value | Count | Frequency (%)
Cluster 1 | 1 | 613 | 71.5%
Cluster 1 | 0 | 244 | 28.5%
Cluster 2 | 0 | 458 | 89.6%
Cluster 2 | 1 | 53 | 10.4%
Cluster 3 | 1 | 102 | 60.7%
Cluster 3 | 0 | 66 | 39.3%
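
The frequency column can be checked by hand: each percentage is the value's count divided by the number of observations in that cluster. A small sketch, assuming the column is available as a plain list of values (the helper name `value_frequencies` is hypothetical):

```python
from collections import Counter


def value_frequencies(values):
    """Return (value, count, percentage) tuples, most frequent first.

    Percentages are rounded to one decimal place, as in the tables above.
    (Hypothetical helper for illustration only.)
    """
    counts = Counter(values)
    total = sum(counts.values())
    return [
        (value, count, round(100 * count / total, 1))
        for value, count in counts.most_common()
    ]


# Rebuild the Cluster 1 column from its reported counts: 613 ones, 244 zeros.
cluster_1 = ["1"] * 613 + ["0"] * 244
print(value_frequencies(cluster_1))
```

With 857 observations, 613 occurrences correspond to 71.5% and 244 to 28.5%, matching the Cluster 1 rows.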

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figures: common-value plots for Cluster 1, Cluster 2 and Cluster 3]
Cluster | Value | Count | Frequency (%)
Cluster 1 | 1 | 613 | 71.5%
Cluster 1 | 0 | 244 | 28.5%
Cluster 2 | 0 | 458 | 89.6%
Cluster 2 | 1 | 53 | 10.4%
Cluster 3 | 1 | 102 | 60.7%
Cluster 3 | 0 | 66 | 39.3%

Most occurring characters

Cluster | Value | Count | Frequency (%)
Cluster 1 | 1 | 613 | 71.5%
Cluster 1 | 0 | 244 | 28.5%
Cluster 2 | 0 | 458 | 89.6%
Cluster 2 | 1 | 53 | 10.4%
Cluster 3 | 1 | 102 | 60.7%
Cluster 3 | 0 | 66 | 39.3%

Most occurring categories

Cluster | Value | Count | Frequency (%)
Cluster 1 | Decimal Number | 857 | 100.0%
Cluster 2 | Decimal Number | 511 | 100.0%
Cluster 3 | Decimal Number | 168 | 100.0%

Most frequent character per category

Decimal Number
Cluster | Value | Count | Frequency (%)
Cluster 1 | 1 | 613 | 71.5%
Cluster 1 | 0 | 244 | 28.5%
Cluster 2 | 0 | 458 | 89.6%
Cluster 2 | 1 | 53 | 10.4%
Cluster 3 | 1 | 102 | 60.7%
Cluster 3 | 0 | 66 | 39.3%

Most occurring scripts

Cluster | Value | Count | Frequency (%)
Cluster 1 | Common | 857 | 100.0%
Cluster 2 | Common | 511 | 100.0%
Cluster 3 | Common | 168 | 100.0%

Most frequent character per script

Common
Cluster | Value | Count | Frequency (%)
Cluster 1 | 1 | 613 | 71.5%
Cluster 1 | 0 | 244 | 28.5%
Cluster 2 | 0 | 458 | 89.6%
Cluster 2 | 1 | 53 | 10.4%
Cluster 3 | 1 | 102 | 60.7%
Cluster 3 | 0 | 66 | 39.3%

Most occurring blocks

Cluster | Value | Count | Frequency (%)
Cluster 1 | ASCII | 857 | 100.0%
Cluster 2 | ASCII | 511 | 100.0%
Cluster 3 | ASCII | 168 | 100.0%

Most frequent character per block

ASCII
Cluster | Value | Count | Frequency (%)
Cluster 1 | 1 | 613 | 71.5%
Cluster 1 | 0 | 244 | 28.5%
Cluster 2 | 0 | 458 | 89.6%
Cluster 2 | 1 | 53 | 10.4%
Cluster 3 | 1 | 102 | 60.7%
Cluster 3 | 0 | 66 | 39.3%

Cluster
Categorical

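Statistic | Cluster 1 | Cluster 2 | Cluster 3
Distinct | 1 | 1 | 1
Distinct (%) | 0.1% | 0.2% | 0.6%
Missing | 0 | 0 | 0
Missing (%) | 0.0% | 0.0% | 0.0%
Memory size | 13.4 KiB | 8.0 KiB | 2.6 KiB

Value counts:
Cluster 1: 0 (857)
Cluster 2: 1 (511)
Cluster 3: 2 (168)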

Length

Statistic | Cluster 1 | Cluster 2 | Cluster 3
Max length | 1 | 1 | 1
Median length | 1 | 1 | 1
Mean length | 1 | 1 | 1
Min length | 1 | 1 | 1

Characters and Unicode

Statistic | Cluster 1 | Cluster 2 | Cluster 3
Total characters | 857 | 511 | 168
Distinct characters | 1 | 1 | 1
Distinct categories | 1 | 1 | 1
Distinct scripts | 1 | 1 | 1
Distinct blocks | 1 | 1 | 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Statistic | Cluster 1 | Cluster 2 | Cluster 3
Unique | 0 | 0 | 0
Unique (%) | 0.0% | 0.0% | 0.0%

Sample

Row | Cluster 1 | Cluster 2 | Cluster 3
1st row | 0 | 1 | 2
2nd row | 0 | 1 | 2
3rd row | 0 | 1 | 2
4th row | 0 | 1 | 2
5th row | 0 | 1 | 2

Common Values

Cluster | Value | Count | Frequency (%)
Cluster 1 | 0 | 857 | 100.0%
Cluster 2 | 1 | 511 | 100.0%
Cluster 3 | 2 | 168 | 100.0%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figures: common-value plots for Cluster 1, Cluster 2 and Cluster 3]
Cluster | Value | Count | Frequency (%)
Cluster 1 | 0 | 857 | 100.0%
Cluster 2 | 1 | 511 | 100.0%
Cluster 3 | 2 | 168 | 100.0%

Most occurring characters

Cluster | Value | Count | Frequency (%)
Cluster 1 | 0 | 857 | 100.0%
Cluster 2 | 1 | 511 | 100.0%
Cluster 3 | 2 | 168 | 100.0%

Most occurring categories

Cluster | Value | Count | Frequency (%)
Cluster 1 | Decimal Number | 857 | 100.0%
Cluster 2 | Decimal Number | 511 | 100.0%
Cluster 3 | Decimal Number | 168 | 100.0%

Most frequent character per category

Decimal Number
Cluster | Value | Count | Frequency (%)
Cluster 1 | 0 | 857 | 100.0%
Cluster 2 | 1 | 511 | 100.0%
Cluster 3 | 2 | 168 | 100.0%

Most occurring scripts

Cluster | Value | Count | Frequency (%)
Cluster 1 | Common | 857 | 100.0%
Cluster 2 | Common | 511 | 100.0%
Cluster 3 | Common | 168 | 100.0%

Most frequent character per script

Common
Cluster | Value | Count | Frequency (%)
Cluster 1 | 0 | 857 | 100.0%
Cluster 2 | 1 | 511 | 100.0%
Cluster 3 | 2 | 168 | 100.0%

Most occurring blocks

Cluster | Value | Count | Frequency (%)
Cluster 1 | ASCII | 857 | 100.0%
Cluster 2 | ASCII | 511 | 100.0%
Cluster 3 | ASCII | 168 | 100.0%

Most frequent character per block

ASCII
Cluster | Value | Count | Frequency (%)
Cluster 1 | 0 | 857 | 100.0%
Cluster 2 | 1 | 511 | 100.0%
Cluster 3 | 2 | 168 | 100.0%

Correlations

[Figures: correlation plots for Cluster 1, Cluster 2 and Cluster 3]

Cluster 1

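Variable | racaCor | faixaEtaria | sexo | ESCOLARID | TIPOCUP | sitAtual | tipoCaso | FORMACLIN1 | classif | descoberta | bac | BACOUTRO | cultEsc | RX | NECROP | hiv | aids | DIABETES | ALCOOLISMO | MENTAL | DROGADICAO | TABAGISMO | motMudEsquema | tipoTrat | idade | HISTOPATOL | Status_Resistencia
racaCor | 1.000 | 0.090 | 0.174 | 0.063 | 0.081 | 0.106 | 0.053 | 0.000 | 0.000 | 0.013 | 0.036 | 0.000 | 0.082 | 0.065 | 0.000 | 0.015 | 0.046 | 0.000 | 0.042 | 0.000 | 0.000 | 0.058 | 0.000 | 0.000 | 0.056 | 0.000 | 0.128
faixaEtaria | 0.090 | 1.000 | 0.097 | 0.128 | 0.325 | 0.132 | 0.000 | 0.120 | 0.088 | 0.072 | 0.033 | 0.111 | 0.015 | 0.000 | 0.000 | 0.099 | 0.059 | 0.235 | 0.251 | 0.000 | 0.302 | 0.168 | 0.000 | 0.060 | 0.836 | 0.136 | 0.000
sexo | 0.174 | 0.097 | 1.000 | 0.000 | 0.472 | 0.000 | 0.025 | 0.030 | 0.086 | 0.100 | 0.000 | 0.000 | 0.184 | 0.000 | 0.000 | 0.069 | 0.051 | 0.000 | 0.149 | 0.094 | 0.000 | 0.016 | 0.000 | 0.000 | 0.084 | 0.024 | 0.220
ESCOLARID | 0.063 | 0.128 | 0.000 | 1.000 | 0.124 | 0.000 | 0.000 | 0.048 | 0.045 | 0.000 | 0.000 | 0.000 | 0.040 | 0.038 | 0.000 | 0.039 | 0.000 | 0.000 | 0.159 | 0.139 | 0.079 | 0.092 | 0.000 | 0.100 | 0.103 | 0.047 | 0.036
TIPOCUP | 0.081 | 0.325 | 0.472 | 0.124 | 1.000 | 0.141 | 0.028 | 0.070 | 0.013 | 0.000 | 0.039 | 0.078 | 0.076 | 0.000 | 0.000 | 0.060 | 0.116 | 0.104 | 0.182 | 0.122 | 0.251 | 0.046 | 0.000 | 0.080 | 0.221 | 0.078 | 0.138
sitAtual | 0.106 | 0.132 | 0.000 | 0.000 | 0.141 | 1.000 | 0.092 | 0.048 | 0.000 | 0.069 | 0.000 | 0.057 | 0.083 | 0.057 | 0.000 | 0.183 | 0.047 | 0.010 | 0.075 | 0.000 | 0.214 | 0.000 | 0.000 | 0.078 | 0.135 | 0.000 | 0.000
tipoCaso | 0.053 | 0.000 | 0.025 | 0.000 | 0.028 | 0.092 | 1.000 | 0.058 | 0.000 | 0.000 | 0.015 | 0.000 | 0.042 | 0.000 | 0.000 | 0.072 | 0.126 | 0.000 | 0.034 | 0.000 | 0.000 | 0.074 | 0.000 | 0.067 | 0.050 | 0.000 | 0.047
FORMACLIN1 | 0.000 | 0.120 | 0.030 | 0.048 | 0.070 | 0.048 | 0.058 | 1.000 | 0.813 | 0.022 | 0.233 | 0.212 | 0.211 | 0.228 | 0.000 | 0.106 | 0.000 | 0.086 | 0.000 | 0.000 | 0.060 | 0.000 | 0.000 | 0.095 | 0.052 | 0.199 | 0.116
classif | 0.000 | 0.088 | 0.086 | 0.045 | 0.013 | 0.000 | 0.000 | 0.813 | 1.000 | 0.104 | 0.234 | 0.213 | 0.211 | 0.170 | 0.000 | 0.080 | 0.046 | 0.000 | 0.000 | 0.000 | 0.129 | 0.066 | 0.000 | 0.000 | 0.000 | 0.249 | 0.116
descoberta | 0.013 | 0.072 | 0.100 | 0.000 | 0.000 | 0.069 | 0.000 | 0.022 | 0.104 | 1.000 | 0.103 | 0.134 | 0.139 | 0.089 | 0.067 | 0.013 | 0.035 | 0.000 | 0.086 | 0.056 | 0.041 | 0.123 | 0.000 | 0.104 | 0.042 | 0.093 | 0.082
bac | 0.036 | 0.033 | 0.000 | 0.000 | 0.039 | 0.000 | 0.015 | 0.233 | 0.234 | 0.103 | 1.000 | 0.175 | 0.169 | 0.055 | 0.000 | 0.062 | 0.097 | 0.037 | 0.000 | 0.000 | 0.117 | 0.059 | 0.000 | 0.000 | 0.000 | 0.097 | 0.162
BACOUTRO | 0.000 | 0.111 | 0.000 | 0.000 | 0.078 | 0.057 | 0.000 | 0.212 | 0.213 | 0.134 | 0.175 | 1.000 | 0.135 | 0.099 | 0.000 | 0.000 | 0.000 | 0.113 | 0.092 | 0.000 | 0.089 | 0.000 | 0.000 | 0.089 | 0.063 | 0.258 | 0.018
cultEsc | 0.082 | 0.015 | 0.184 | 0.040 | 0.076 | 0.083 | 0.042 | 0.211 | 0.211 | 0.139 | 0.169 | 0.135 | 1.000 | 0.000 | 0.000 | 0.032 | 0.055 | 0.000 | 0.099 | 0.046 | 0.120 | 0.124 | 0.037 | 0.098 | 0.000 | 0.063 | 0.432
RX | 0.065 | 0.000 | 0.000 | 0.038 | 0.000 | 0.057 | 0.000 | 0.228 | 0.170 | 0.089 | 0.055 | 0.099 | 0.000 | 1.000 | 0.000 | 0.080 | 0.093 | 0.000 | 0.009 | 0.011 | 0.000 | 0.063 | 0.000 | 0.000 | 0.000 | 0.071 | 0.047
NECROP | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.067 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.038 | 0.000 | 0.000
hiv | 0.015 | 0.099 | 0.069 | 0.039 | 0.060 | 0.183 | 0.072 | 0.106 | 0.080 | 0.013 | 0.062 | 0.000 | 0.032 | 0.080 | 0.000 | 1.000 | 0.883 | 0.064 | 0.018 | 0.000 | 0.065 | 0.083 | 0.037 | 0.168 | 0.080 | 0.000 | 0.043
aids | 0.046 | 0.059 | 0.051 | 0.000 | 0.116 | 0.047 | 0.126 | 0.000 | 0.046 | 0.035 | 0.097 | 0.000 | 0.055 | 0.093 | 0.000 | 0.883 | 1.000 | 0.051 | 0.000 | 0.000 | 0.000 | 0.022 | 0.000 | 0.048 | 0.086 | 0.000 | 0.040
DIABETES | 0.000 | 0.235 | 0.000 | 0.000 | 0.104 | 0.010 | 0.000 | 0.086 | 0.000 | 0.000 | 0.037 | 0.113 | 0.000 | 0.000 | 0.000 | 0.064 | 0.051 | 1.000 | 0.057 | 0.073 | 0.139 | 0.011 | 0.062 | 0.086 | 0.240 | 0.022 | 0.031
ALCOOLISMO | 0.042 | 0.251 | 0.149 | 0.159 | 0.182 | 0.075 | 0.034 | 0.000 | 0.000 | 0.086 | 0.000 | 0.092 | 0.099 | 0.009 | 0.000 | 0.018 | 0.000 | 0.057 | 1.000 | 0.020 | 0.281 | 0.324 | 0.028 | 0.000 | 0.243 | 0.000 | 0.093
MENTAL | 0.000 | 0.000 | 0.094 | 0.139 | 0.122 | 0.000 | 0.000 | 0.000 | 0.000 | 0.056 | 0.000 | 0.000 | 0.046 | 0.011 | 0.000 | 0.000 | 0.000 | 0.073 | 0.020 | 1.000 | 0.000 | 0.005 | 0.171 | 0.000 | 0.000 | 0.000 | 0.000
DROGADICAO | 0.000 | 0.302 | 0.000 | 0.079 | 0.251 | 0.214 | 0.000 | 0.060 | 0.129 | 0.041 | 0.117 | 0.089 | 0.120 | 0.000 | 0.000 | 0.065 | 0.000 | 0.139 | 0.281 | 0.000 | 1.000 | 0.187 | 0.000 | 0.082 | 0.281 | 0.024 | 0.113
TABAGISMO | 0.058 | 0.168 | 0.016 | 0.092 | 0.046 | 0.000 | 0.074 | 0.000 | 0.066 | 0.123 | 0.059 | 0.000 | 0.124 | 0.063 | 0.000 | 0.083 | 0.022 | 0.011 | 0.324 | 0.005 | 0.187 | 1.000 | 0.000 | 0.000 | 0.132 | 0.030 | 0.196
motMudEsquema | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.037 | 0.000 | 0.000 | 0.037 | 0.000 | 0.062 | 0.028 | 0.171 | 0.000 | 0.000 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000
tipoTrat | 0.000 | 0.060 | 0.000 | 0.100 | 0.080 | 0.078 | 0.067 | 0.095 | 0.000 | 0.104 | 0.000 | 0.089 | 0.098 | 0.000 | 0.000 | 0.168 | 0.048 | 0.086 | 0.000 | 0.000 | 0.082 | 0.000 | 0.000 | 1.000 | 0.000 | 0.020 | 0.000
idade | 0.056 | 0.836 | 0.084 | 0.103 | 0.221 | 0.135 | 0.050 | 0.052 | 0.000 | 0.042 | 0.000 | 0.063 | 0.000 | 0.000 | 0.038 | 0.080 | 0.086 | 0.240 | 0.243 | 0.000 | 0.281 | 0.132 | 0.000 | 0.000 | 1.000 | 0.000 | 0.000
HISTOPATOL | 0.000 | 0.136 | 0.024 | 0.047 | 0.078 | 0.000 | 0.000 | 0.199 | 0.249 | 0.093 | 0.097 | 0.258 | 0.063 | 0.071 | 0.000 | 0.000 | 0.000 | 0.022 | 0.000 | 0.000 | 0.024 | 0.030 | 0.000 | 0.020 | 0.000 | 1.000 | 0.000
Status_Resistencia | 0.128 | 0.000 | 0.220 | 0.036 | 0.138 | 0.000 | 0.047 | 0.116 | 0.116 | 0.082 | 0.162 | 0.018 | 0.432 | 0.047 | 0.000 | 0.043 | 0.040 | 0.031 | 0.093 | 0.000 | 0.113 | 0.196 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000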

Cluster 2

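Variable | racaCor | faixaEtaria | sexo | ESCOLARID | TIPOCUP | sitAtual | tipoCaso | FORMACLIN1 | classif | descoberta | bac | BACOUTRO | cultEsc | RX | NECROP | hiv | aids | DIABETES | ALCOOLISMO | MENTAL | DROGADICAO | TABAGISMO | motMudEsquema | tipoTrat | idade | HISTOPATOL | Status_Resistencia
racaCor | 1.000 | 0.048 | 0.110 | 0.072 | 0.039 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.126 | 0.000 | 0.061 | 0.000 | 0.000 | 0.038 | 0.144 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.057 | 0.123 | 0.029 | 0.051 | 0.120
faixaEtaria | 0.048 | 1.000 | 0.254 | 0.291 | 0.276 | 0.129 | 0.000 | 0.085 | 0.000 | 0.222 | 0.174 | 0.000 | 0.000 | 0.014 | 0.000 | 0.141 | 0.000 | 0.251 | 0.128 | 0.000 | 0.165 | 0.132 | 0.145 | 0.126 | 0.849 | 0.131 | 0.022
sexo | 0.110 | 0.254 | 1.000 | 0.171 | 0.332 | 0.124 | 0.000 | 0.040 | 0.000 | 0.063 | 0.000 | 0.072 | 0.389 | 0.015 | 0.000 | 0.000 | 0.026 | 0.000 | 0.234 | 0.000 | 0.025 | 0.052 | 0.000 | 0.033 | 0.223 | 0.000 | 0.205
ESCOLARID | 0.072 | 0.291 | 0.171 | 1.000 | 0.156 | 0.048 | 0.000 | 0.140 | 0.092 | 0.077 | 0.166 | 0.072 | 0.039 | 0.035 | 0.000 | 0.000 | 0.000 | 0.000 | 0.120 | 0.143 | 0.070 | 0.000 | 0.039 | 0.079 | 0.178 | 0.022 | 0.090
TIPOCUP | 0.039 | 0.276 | 0.332 | 0.156 | 1.000 | 0.147 | 0.134 | 0.000 | 0.000 | 0.000 | 0.094 | 0.079 | 0.069 | 0.000 | 0.000 | 0.021 | 0.117 | 0.267 | 0.111 | 0.122 | 0.209 | 0.028 | 0.000 | 0.140 | 0.289 | 0.054 | 0.065
sitAtual | 0.000 | 0.129 | 0.124 | 0.048 | 0.147 | 1.000 | 0.092 | 0.000 | 0.091 | 0.145 | 0.144 | 0.000 | 0.037 | 0.000 | 0.000 | 0.158 | 0.000 | 0.000 | 0.000 | 0.000 | 0.076 | 0.000 | 0.000 | 0.000 | 0.111 | 0.006 | 0.000
tipoCaso | 0.000 | 0.000 | 0.000 | 0.000 | 0.134 | 0.092 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.034 | 0.000 | 0.106 | 0.277 | 0.000 | 0.077 | 0.000 | 0.192 | 0.118 | 0.162 | 0.000 | 0.000 | 0.000 | 0.064 | 0.000 | 0.000
FORMACLIN1 | 0.000 | 0.085 | 0.040 | 0.140 | 0.000 | 0.000 | 0.000 | 1.000 | 0.696 | 0.000 | 0.414 | 0.292 | 0.000 | 0.329 | 0.000 | 0.000 | 0.000 | 0.093 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.119 | 0.041 | 0.361 | 0.119
classif | 0.000 | 0.000 | 0.000 | 0.092 | 0.000 | 0.091 | 0.000 | 0.696 | 1.000 | 0.054 | 0.383 | 0.227 | 0.123 | 0.255 | 0.000 | 0.000 | 0.034 | 0.021 | 0.037 | 0.000 | 0.019 | 0.000 | 0.000 | 0.036 | 0.027 | 0.286 | 0.097
descoberta | 0.000 | 0.222 | 0.063 | 0.077 | 0.000 | 0.145 | 0.000 | 0.000 | 0.054 | 1.000 | 0.080 | 0.000 | 0.112 | 0.075 | 0.000 | 0.000 | 0.000 | 0.000 | 0.024 | 0.024 | 0.000 | 0.000 | 0.000 | 0.030 | 0.098 | 0.047 | 0.134
bac | 0.126 | 0.174 | 0.000 | 0.166 | 0.094 | 0.144 | 0.000 | 0.414 | 0.383 | 0.080 | 1.000 | 0.214 | 0.201 | 0.208 | 0.000 | 0.073 | 0.000 | 0.035 | 0.016 | 0.000 | 0.047 | 0.000 | 0.000 | 0.149 | 0.077 | 0.200 | 0.047
BACOUTRO | 0.000 | 0.000 | 0.072 | 0.072 | 0.079 | 0.000 | 0.034 | 0.292 | 0.227 | 0.000 | 0.214 | 1.000 | 0.124 | 0.095 | 0.187 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.044 | 0.000 | 0.103 | 0.000 | 0.163 | 0.057
cultEsc | 0.061 | 0.000 | 0.389 | 0.039 | 0.069 | 0.037 | 0.000 | 0.000 | 0.123 | 0.112 | 0.201 | 0.124 | 1.000 | 0.003 | 0.000 | 0.043 | 0.000 | 0.105 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.073 | 0.466
RX | 0.000 | 0.014 | 0.015 | 0.035 | 0.000 | 0.000 | 0.106 | 0.329 | 0.255 | 0.075 | 0.208 | 0.095 | 0.003 | 1.000 | 0.000 | 0.000 | 0.048 | 0.000 | 0.010 | 0.000 | 0.000 | 0.057 | 0.217 | 0.044 | 0.008 | 0.118 | 0.056
NECROP | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.277 | 0.000 | 0.000 | 0.000 | 0.000 | 0.187 | 0.000 | 0.000 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000
hiv | 0.038 | 0.141 | 0.000 | 0.000 | 0.021 | 0.158 | 0.000 | 0.000 | 0.000 | 0.000 | 0.073 | 0.000 | 0.043 | 0.000 | 0.000 | 1.000 | 0.853 | 0.000 | 0.122 | 0.000 | 0.000 | 0.000 | 0.129 | 0.142 | 0.053 | 0.000 | 0.000
aids | 0.144 | 0.000 | 0.026 | 0.000 | 0.117 | 0.000 | 0.077 | 0.000 | 0.034 | 0.000 | 0.000 | 0.000 | 0.000 | 0.048 | 0.000 | 0.853 | 1.000 | 0.000 | 0.011 | 0.000 | 0.000 | 0.000 | 0.264 | 0.044 | 0.000 | 0.000 | 0.000
DIABETES | 0.000 | 0.251 | 0.000 | 0.000 | 0.267 | 0.000 | 0.000 | 0.093 | 0.021 | 0.000 | 0.035 | 0.000 | 0.105 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 0.000 | 0.053 | 0.000 | 0.000 | 0.093 | 0.117 | 0.257 | 0.000 | 0.000
ALCOOLISMO | 0.000 | 0.128 | 0.234 | 0.120 | 0.111 | 0.000 | 0.192 | 0.000 | 0.037 | 0.024 | 0.016 | 0.000 | 0.000 | 0.010 | 0.000 | 0.122 | 0.011 | 0.000 | 1.000 | 0.000 | 0.264 | 0.110 | 0.000 | 0.000 | 0.181 | 0.000 | 0.000
MENTAL | 0.000 | 0.000 | 0.000 | 0.143 | 0.122 | 0.000 | 0.118 | 0.000 | 0.000 | 0.024 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.053 | 0.000 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000
DROGADICAO | 0.000 | 0.165 | 0.025 | 0.070 | 0.209 | 0.076 | 0.162 | 0.000 | 0.019 | 0.000 | 0.047 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.264 | 0.000 | 1.000 | 0.110 | 0.000 | 0.000 | 0.203 | 0.000 | 0.000
TABAGISMO | 0.000 | 0.132 | 0.052 | 0.000 | 0.028 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.044 | 0.000 | 0.057 | 0.000 | 0.000 | 0.000 | 0.000 | 0.110 | 0.000 | 0.110 | 1.000 | 0.057 | 0.024 | 0.134 | 0.000 | 0.089
motMudEsquema | 0.057 | 0.145 | 0.000 | 0.039 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.217 | 0.000 | 0.129 | 0.264 | 0.093 | 0.000 | 0.000 | 0.000 | 0.057 | 1.000 | 0.000 | 0.065 | 0.000 | 0.000
tipoTrat | 0.123 | 0.126 | 0.033 | 0.079 | 0.140 | 0.000 | 0.000 | 0.119 | 0.036 | 0.030 | 0.149 | 0.103 | 0.000 | 0.044 | 0.000 | 0.142 | 0.044 | 0.117 | 0.000 | 0.000 | 0.000 | 0.024 | 0.000 | 1.000 | 0.106 | 0.030 | 0.000
idade | 0.029 | 0.849 | 0.223 | 0.178 | 0.289 | 0.111 | 0.064 | 0.041 | 0.027 | 0.098 | 0.077 | 0.000 | 0.000 | 0.008 | 0.000 | 0.053 | 0.000 | 0.257 | 0.181 | 0.000 | 0.203 | 0.134 | 0.065 | 0.106 | 1.000 | 0.059 | 0.000
HISTOPATOL | 0.051 | 0.131 | 0.000 | 0.022 | 0.054 | 0.006 | 0.000 | 0.361 | 0.286 | 0.047 | 0.200 | 0.163 | 0.073 | 0.118 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.030 | 0.059 | 1.000 | 0.000
Status_Resistencia | 0.120 | 0.022 | 0.205 | 0.090 | 0.065 | 0.000 | 0.000 | 0.119 | 0.097 | 0.134 | 0.047 | 0.057 | 0.466 | 0.056 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.089 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000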

Cluster 3

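Variable | racaCor | faixaEtaria | sexo | ESCOLARID | TIPOCUP | sitAtual | tipoCaso | FORMACLIN1 | classif | descoberta | bac | BACOUTRO | cultEsc | RX | NECROP | hiv | aids | DIABETES | ALCOOLISMO | MENTAL | DROGADICAO | TABAGISMO | motMudEsquema | tipoTrat | idade | HISTOPATOL | Status_Resistencia
racaCor | 1.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.098 | 0.000 | 0.052 | 0.101 | 0.000 | 0.000 | 0.200 | 0.174 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000
faixaEtaria | 0.000 | 1.000 | 0.000 | 0.203 | 0.367 | 0.222 | 0.000 | 0.000 | 0.000 | 0.000 | 0.111 | 0.000 | 0.081 | 0.000 | 0.000 | 0.398 | 0.301 | 0.554 | 0.000 | 0.000 | 0.030 | 0.000 | 0.238 | 0.000 | 0.786 | 0.123 | 0.080
sexo | 0.000 | 0.000 | 1.000 | 0.144 | 0.442 | 0.098 | 0.000 | 0.000 | 0.000 | 0.000 | 0.143 | 0.000 | 0.084 | 0.000 | 0.000 | 0.178 | 0.198 | 0.000 | 0.052 | 0.000 | 0.171 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000
ESCOLARID | 0.000 | 0.203 | 0.144 | 1.000 | 0.083 | 0.000 | 0.188 | 0.000 | 0.000 | 0.000 | 0.087 | 0.000 | 0.000 | 0.074 | 0.000 | 0.098 | 0.088 | 0.000 | 0.142 | 0.269 | 0.203 | 0.160 | 0.000 | 0.000 | 0.191 | 0.000 | 0.000
TIPOCUP | 0.000 | 0.367 | 0.442 | 0.083 | 1.000 | 0.042 | 0.000 | 0.000 | 0.138 | 0.000 | 0.102 | 0.041 | 0.000 | 0.122 | 0.000 | 0.000 | 0.000 | 0.202 | 0.000 | 0.000 | 0.215 | 0.078 | 0.565 | 0.088 | 0.117 | 0.000 | 0.000
sitAtual | 0.000 | 0.222 | 0.098 | 0.000 | 0.042 | 1.000 | 0.132 | 0.127 | 0.046 | 0.129 | 0.033 | 0.097 | 0.164 | 0.265 | 0.000 | 0.159 | 0.140 | 0.000 | 0.000 | 0.000 | 0.314 | 0.000 | 0.000 | 0.074 | 0.203 | 0.102 | 0.022
tipoCaso | 0.000 | 0.000 | 0.000 | 0.188 | 0.000 | 0.132 | 1.000 | 0.000 | 0.153 | 0.000 | 0.000 | 0.021 | 0.021 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.230 | 0.214 | 0.305 | 0.000 | 0.000 | 0.000 | 0.105 | 0.000 | 0.000
FORMACLIN1 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.127 | 0.000 | 1.000 | 0.798 | 0.000 | 0.244 | 0.292 | 0.286 | 0.276 | 0.000 | 0.319 | 0.357 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.066 | 0.207
classif | 0.000 | 0.000 | 0.000 | 0.000 | 0.138 | 0.046 | 0.153 | 0.798 | 1.000 | 0.090 | 0.237 | 0.316 | 0.287 | 0.176 | 0.000 | 0.222 | 0.282 | 0.094 | 0.094 | 0.000 | 0.077 | 0.066 | 0.054 | 0.000 | 0.085 | 0.245 | 0.330
descoberta | 0.098 | 0.000 | 0.000 | 0.000 | 0.000 | 0.129 | 0.000 | 0.000 | 0.090 | 1.000 | 0.000 | 0.088 | 0.114 | 0.127 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.223 | 0.000 | 0.148 | 0.015 | 0.000 | 0.150
bac | 0.000 | 0.111 | 0.143 | 0.087 | 0.102 | 0.033 | 0.000 | 0.244 | 0.237 | 0.000 | 1.000 | 0.121 | 0.339 | 0.172 | 0.078 | 0.134 | 0.217 | 0.077 | 0.131 | 0.000 | 0.073 | 0.028 | 0.038 | 0.228 | 0.054 | 0.000 | 0.113
BACOUTRO | 0.052 | 0.000 | 0.000 | 0.000 | 0.041 | 0.097 | 0.021 | 0.292 | 0.316 | 0.088 | 0.121 | 1.000 | 0.148 | 0.057 | 0.000 | 0.083 | 0.078 | 0.026 | 0.000 | 0.000 | 0.000 | 0.146 | 0.094 | 0.000 | 0.000 | 0.166 | 0.000
cultEsc | 0.101 | 0.081 | 0.084 | 0.000 | 0.000 | 0.164 | 0.021 | 0.286 | 0.287 | 0.114 | 0.339 | 0.148 | 1.000 | 0.000 | 0.000 | 0.295 | 0.304 | 0.071 | 0.000 | 0.099 | 0.000 | 0.089 | 0.117 | 0.000 | 0.000 | 0.115 | 0.451
RX | 0.000 | 0.000 | 0.000 | 0.074 | 0.122 | 0.265 | 0.000 | 0.276 | 0.176 | 0.127 | 0.172 | 0.057 | 0.000 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.228 | 0.000 | 0.000 | 0.041 | 0.104 | 0.092 | 0.095 | 0.000 | 0.033
NECROP | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.078 | 0.000 | 0.000 | 0.000 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000
hiv | 0.200 | 0.398 | 0.178 | 0.098 | 0.000 | 0.159 | 0.000 | 0.319 | 0.222 | 0.000 | 0.134 | 0.083 | 0.295 | 0.000 | 0.000 | 1.000 | 0.896 | 0.130 | 0.000 | 0.000 | 0.088 | 0.018 | 0.196 | 0.186 | 0.284 | 0.207 | 0.293
aids | 0.174 | 0.301 | 0.198 | 0.088 | 0.000 | 0.140 | 0.000 | 0.357 | 0.282 | 0.000 | 0.217 | 0.078 | 0.304 | 0.000 | 0.000 | 0.896 | 1.000 | 0.000 | 0.000 | 0.000 | 0.046 | 0.000 | 0.161 | 0.207 | 0.138 | 0.241 | 0.235
DIABETES | 0.000 | 0.554 | 0.000 | 0.000 | 0.202 | 0.000 | 0.000 | 0.000 | 0.094 | 0.000 | 0.077 | 0.026 | 0.071 | 0.000 | 0.000 | 0.130 | 0.000 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.013 | 0.000 | 0.094
ALCOOLISMO | 0.000 | 0.000 | 0.052 | 0.142 | 0.000 | 0.000 | 0.230 | 0.000 | 0.094 | 0.000 | 0.131 | 0.000 | 0.000 | 0.228 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 0.000 | 0.380 | 0.202 | 0.000 | 0.050 | 0.000 | 0.063 | 0.000
MENTAL | 0.000 | 0.000 | 0.000 | 0.269 | 0.000 | 0.000 | 0.214 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.099 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.034 | 0.298 | 0.000
DROGADICAO | 0.000 | 0.030 | 0.171 | 0.203 | 0.215 | 0.314 | 0.305 | 0.000 | 0.077 | 0.000 | 0.073 | 0.000 | 0.000 | 0.000 | 0.000 | 0.088 | 0.046 | 0.000 | 0.380 | 0.000 | 1.000 | 0.196 | 0.000 | 0.000 | 0.133 | 0.000 | 0.000
TABAGISMO | 0.000 | 0.000 | 0.000 | 0.160 | 0.078 | 0.000 | 0.000 | 0.000 | 0.066 | 0.223 | 0.028 | 0.146 | 0.089 | 0.041 | 0.000 | 0.018 | 0.000 | 0.000 | 0.202 | 0.000 | 0.196 | 1.000 | 0.000 | 0.000 | 0.000 | 0.018 | 0.000
motMudEsquema | 0.000 | 0.238 | 0.000 | 0.000 | 0.565 | 0.000 | 0.000 | 0.000 | 0.054 | 0.000 | 0.038 | 0.094 | 0.117 | 0.104 | 0.000 | 0.196 | 0.161 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 0.065 | 0.155 | 0.230 | 0.000
tipoTrat | 0.000 | 0.000 | 0.000 | 0.000 | 0.088 | 0.074 | 0.000 | 0.000 | 0.000 | 0.148 | 0.228 | 0.000 | 0.000 | 0.092 | 0.000 | 0.186 | 0.207 | 0.000 | 0.050 | 0.000 | 0.000 | 0.000 | 0.065 | 1.000 | 0.108 | 0.000 | 0.000
idade | 0.000 | 0.786 | 0.000 | 0.191 | 0.117 | 0.203 | 0.105 | 0.000 | 0.085 | 0.015 | 0.054 | 0.000 | 0.000 | 0.095 | 0.000 | 0.284 | 0.138 | 0.013 | 0.000 | 0.034 | 0.133 | 0.000 | 0.155 | 0.108 | 1.000 | 0.000 | 0.075
HISTOPATOL | 0.000 | 0.123 | 0.000 | 0.000 | 0.000 | 0.102 | 0.000 | 0.066 | 0.245 | 0.000 | 0.000 | 0.166 | 0.115 | 0.000 | 0.000 | 0.207 | 0.241 | 0.000 | 0.063 | 0.298 | 0.000 | 0.018 | 0.230 | 0.000 | 0.000 | 1.000 | 0.106
Status_Resistencia | 0.000 | 0.080 | 0.000 | 0.000 | 0.000 | 0.022 | 0.000 | 0.207 | 0.330 | 0.150 | 0.113 | 0.000 | 0.451 | 0.033 | 0.000 | 0.293 | 0.235 | 0.094 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.075 | 0.106 | 1.000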
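
The report does not say which association measure fills these matrices. Since every variable is categorical and all values lie between 0 and 1, Cramér's V is one plausible candidate; that is an assumption, not something stated in the source. The sketch below computes the plain (uncorrected) Cramér's V for two categorical columns using only the standard library. A bias-corrected variant, which some tools use, would give somewhat different numbers, so it should not be expected to reproduce the figures above exactly. The function name `cramers_v` is hypothetical.

```python
import math
from collections import Counter


def cramers_v(xs, ys):
    """Plain (uncorrected) Cramér's V between two equal-length categorical sequences.

    Returns a value in [0, 1]; 0.0 is returned when either variable is constant.
    (Illustrative helper; assumed, not taken from the report.)
    """
    n = len(xs)
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    joint = Counter(zip(xs, ys))
    k = min(len(row_totals), len(col_totals)) - 1
    if n == 0 or k == 0:
        return 0.0
    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for r, r_count in row_totals.items():
        for c, c_count in col_totals.items():
            expected = r_count * c_count / n
            observed = joint.get((r, c), 0)
            chi2 += (observed - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * k))


# Perfectly associated columns give 1.0; independent columns give 0.0.
print(cramers_v(["S", "S", "N", "N"], ["Pos", "Pos", "Neg", "Neg"]))
print(cramers_v(["S", "S", "N", "N"], ["Pos", "Neg", "Pos", "Neg"]))
```

Under this reading, the high `hiv`/`aids` and `FORMACLIN1`/`classif` entries would indicate that one variable is largely determined by the other within each cluster.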

Missing values

[Figures: nullity by column for Cluster 1, Cluster 2 and Cluster 3]
A simple visualization of nullity by column.

[Figures: nullity matrix for Cluster 1, Cluster 2 and Cluster 3]
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
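
For readers without the figures, the quantity behind the nullity-by-column view is simply the number of missing cells per column. A minimal sketch, assuming each record is a dictionary and that a missing cell is represented by `None` (both are assumptions for illustration; the helper name is hypothetical):

```python
def nullity_by_column(rows, columns):
    """Return {column: number of missing cells}, treating None or an absent key as missing."""
    return {
        column: sum(1 for row in rows if row.get(column) is None)
        for column in columns
    }


# Tiny illustrative records using two of the report's column names.
records = [
    {"sexo": "M", "hiv": "Neg"},
    {"sexo": "F", "hiv": None},
    {"sexo": "M"},
]
print(nullity_by_column(records, ["sexo", "hiv"]))
```

In this report every cluster has 0 missing cells, so each column's count would be zero.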

Sample

Cluster 1

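index | racaCor | faixaEtaria | sexo | ESCOLARID | TIPOCUP | sitAtual | tipoCaso | FORMACLIN1 | classif | descoberta | bac | BACOUTRO | cultEsc | RX | NECROP | hiv | aids | DIABETES | ALCOOLISMO | MENTAL | DROGADICAO | TABAGISMO | motMudEsquema | tipoTrat | idade | HISTOPATOL | Status_Resistencia | Cluster
0 | Branco | 20_29 | M | De 4 a 7 anos | Outra | Cura | Novo | Pul | Pul | Elucidacao Diagn. em Internacao | Pos | N/realiz | Pos | Susp c/cavid | N/realiz | Neg | N | N | N | N | N | N | Nulo | Supervisionado | 23_39 | N/realiz | 1 | 0
1 | Pardo | 40_49 | M | De 4 a 7 anos | Desempregado | Cura | Novo | Pul | P+E | Demanda Ambulatorial | Pos | N/realiz | Pos | Susp TB | N/realiz | Neg | N | N | S | N | N | S | Nulo | Supervisionado | 40_54 | N/realiz | 1 | 0
3 | Branco | 50_59 | M | De 4 a 7 anos | Outra | Cura | Novo | Pul | P+E | Demanda Ambulatorial | Pos | N/realiz | Pos | Susp TB | N/realiz | Pos | S | N | N | N | N | N | Nulo | Supervisionado | Mais de 54 | N/realiz | 1 | 0
5 | Pardo | 20_29 | F | De 8 a 11 anos | Desempregado | Cura | Novo | Pul | Pul | Demanda Ambulatorial | Pos | N/realiz | Pos | Susp TB | N/realiz | Pos | N | N | N | N | S | N | Nulo | Supervisionado | 0_22 | N/realiz | 1 | 0
6 | Branco | 30_39 | M | De 8 a 11 anos | Outra | Cura | Novo | Pul | Pul | Demanda Ambulatorial | Neg | N/realiz | Pos | Susp TB | N/realiz | Neg | N | N | N | N | N | S | Nulo | Supervisionado | 23_39 | N/realiz | 1 | 0
8 | Pardo | 60_69 | M | De 8 a 11 anos | Aposentado | Cura | Novo | Pul | Pul | Elucidacao Diagn. em Internacao | Pos | N/realiz | Pos | Susp c/cavid | N/realiz | Neg | N | N | N | N | N | N | Nulo | Supervisionado | Mais de 54 | N/realiz | 1 | 0
11 | Preto | 20_29 | M | De 8 a 11 anos | Outra | Cura | Novo | Pul | Pul | Demanda Ambulatorial | N/realiz | N/realiz | Neg | Susp TB | N/realiz | Neg | N | N | N | N | S | S | Nulo | Supervisionado | 0_22 | N/realiz | 0 | 0
14 | Preto | 20_29 | M | De 4 a 7 anos | Desempregado | Abandono | Novo | Pul | Pul | Urgencia / Emergencia | Pos | N/realiz | Pos | Susp c/cavid | N/realiz | Neg | N | N | N | N | N | S | Nulo | Supervisionado | 0_22 | N/realiz | 1 | 0
15 | Branco | 20_29 | F | De 4 a 7 anos | Outra | Cura | Novo | Pul | Pul | Urgencia / Emergencia | Pos | N/realiz | Pos | Susp TB | N/realiz | Neg | N | N | S | N | N | N | Nulo | Supervisionado | 23_39 | N/realiz | 1 | 0
17 | Preto | 40_49 | F | De 12 a 14 anos | Dona de Casa | Cura | Novo | Pul | Pul | Demanda Ambulatorial | Pos | N/realiz | Pos | Susp TB | N/realiz | Neg | N | N | N | N | N | N | Nulo | Supervisionado | 40_54 | N/realiz | 1 | 0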

Cluster 2

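index | racaCor | faixaEtaria | sexo | ESCOLARID | TIPOCUP | sitAtual | tipoCaso | FORMACLIN1 | classif | descoberta | bac | BACOUTRO | cultEsc | RX | NECROP | hiv | aids | DIABETES | ALCOOLISMO | MENTAL | DROGADICAO | TABAGISMO | motMudEsquema | tipoTrat | idade | HISTOPATOL | Status_Resistencia | Cluster
12 | Branco | 40_49 | F | 15 anos e mais | Outra | Cura | Novo | Ganglionar Periferica | Ext | Demanda Ambulatorial | Neg | N/realiz | N/realiz | Susp TB | N/realiz | Neg | N | N | N | N | N | N | Nulo | Supervisionado | 40_54 | N/realiz | 0 | 1
13 | Branco | 40_49 | M | De 1 a 3 anos | Outra | Cura | Novo | Pul | Pul | Demanda Ambulatorial | Pos | N/realiz | N/realiz | N/realiz | N/realiz | Neg | N | N | S | N | S | S | Nulo | Supervisionado | 40_54 | N/realiz | 0 | 1
16 | Branco | 40_49 | F | De 8 a 11 anos | Dona de Casa | Cura | Novo | Pleural | Ext | Demanda Ambulatorial | N/realiz | N/realiz | N/realiz | N/realiz | N/realiz | Neg | N | N | N | N | N | N | Nulo | Supervisionado | 40_54 | BAAR pos | 0 | 1
23 | Branco | 50_59 | F | De 4 a 7 anos | Outra | Cura | Novo | Pul | Pul | Demanda Ambulatorial | Pos | N/realiz | Pos | Susp TB | N/realiz | Neg | N | N | N | N | N | N | Nulo | Auto-Administrado | Mais de 54 | N/realiz | 0 | 1
25 | Pardo | 10_14 | F | De 4 a 7 anos | Outra | Cura | Novo | Pul | Pul | Demanda Ambulatorial | Neg | N/realiz | Neg | Susp c/cavid | N/realiz | Neg | N | N | N | N | N | N | Nulo | Supervisionado | 0_22 | N/realiz | 0 | 1
29 | Pardo | 50_59 | M | De 1 a 3 anos | Outra | Abandono | Novo | Pul | Pul | Demanda Ambulatorial | Pos | N/realiz | N/realiz | Susp TB | N/realiz | Neg | N | N | N | N | N | N | Nulo | Supervisionado | Mais de 54 | N/realiz | 0 | 1
33 | Branco | 15_19 | F | De 4 a 7 anos | Outra | Cura | Novo | Pul | Pul | Urgencia / Emergencia | Pos | N/realiz | Pos | Normal | N/realiz | Neg | N | N | S | N | S | N | Nulo | Supervisionado | 0_22 | Sugestivo TB | 0 | 1
36 | Branco | 15_19 | F | De 8 a 11 anos | Dona de Casa | Cura | Novo | Pul | Pul | Demanda Ambulatorial | Neg | N/realiz | Neg | Susp TB | N/realiz | Neg | N | N | N | N | N | N | Nulo | Supervisionado | 0_22 | N/realiz | 0 | 1
41 | Branco | 40_49 | M | De 4 a 7 anos | Outra | Cura | Novo | Pul | Pul | Demanda Ambulatorial | Pos | N/realiz | N/realiz | Susp TB | N/realiz | Neg | N | N | N | N | N | N | Nulo | Supervisionado | 40_54 | N/realiz | 0 | 1
42 | Branco | 40_49 | F | De 12 a 14 anos | Profissional de Saude | Cura | Recidiva | Pul | Pul | Demanda Ambulatorial | Pos | N/realiz | Pos | N/realiz | N/realiz | Neg | N | N | N | N | N | N | Nulo | Auto-Administrado | 40_54 | N/realiz | 0 | 1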

Cluster 3

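index | racaCor | faixaEtaria | sexo | ESCOLARID | TIPOCUP | sitAtual | tipoCaso | FORMACLIN1 | classif | descoberta | bac | BACOUTRO | cultEsc | RX | NECROP | hiv | aids | DIABETES | ALCOOLISMO | MENTAL | DROGADICAO | TABAGISMO | motMudEsquema | tipoTrat | idade | HISTOPATOL | Status_Resistencia | Cluster
2 | Pardo | 40_49 | M | De 8 a 11 anos | Outra | Cura | Novo | Pul | P+E | Elucidacao Diagn. em Internacao | N/realiz | Pos | N/realiz | Susp TB | N/realiz | Pos | S | N | N | N | N | N | Nulo | Auto-Administrado | 40_54 | BAAR pos | 1 | 2
4 | Branco | 30_39 | M | De 4 a 7 anos | Outra | Cura | Novo | Pul | P+E | Urgencia / Emergencia | Pos | N/realiz | N/realiz | Susp TB | N/realiz | Pos | S | N | N | N | S | N | Nulo | Supervisionado | 23_39 | Sugestivo TB | 0 | 2
7 | Branco | 20_29 | M | De 8 a 11 anos | Desempregado | Abandono | Novo | Pul | Pul | Demanda Ambulatorial | Neg | N/realiz | Neg | Susp c/cavid | N/realiz | Pos | S | N | S | N | S | N | Nulo | Supervisionado | 23_39 | N/realiz | 0 | 2
9 | Branco | 20_29 | M | De 1 a 3 anos | Outra | Cura | Novo | Pleural | Ext | Elucidacao Diagn. em Internacao | Neg | N/realiz | Neg | Susp TB | N/realiz | N/realiz | N | N | N | N | N | N | Nulo | Auto-Administrado | 0_22 | Sugestivo TB | 0 | 2
10 | Preto | 30_39 | M | De 8 a 11 anos | Outra | Cura | Novo | Pul | P+E | Elucidacao Diagn. em Internacao | N/realiz | Pos | N/realiz | Susp TB | N/realiz | Pos | S | N | N | N | N | N | Nulo | Auto-Administrado | 23_39 | N/realiz | 1 | 2
18 | Branco | 30_39 | M | 15 anos e mais | Outra | Cura | Novo | Pul | Pul | Demanda Ambulatorial | Neg | N/realiz | Pos | Susp TB | N/realiz | Pos | S | N | N | N | N | N | Nulo | Auto-Administrado | 40_54 | N/realiz | 0 | 2
21 | Preto | 30_39 | M | De 4 a 7 anos | Outra | Abandono | Novo | Pul | P+E | Elucidacao Diagn. em Internacao | Neg | N/realiz | Pos | Susp TB | N/realiz | Pos | S | N | N | N | N | N | Nulo | Auto-Administrado | 23_39 | N/realiz | 1 | 2
24 | Pardo | 50_59 | M | De 8 a 11 anos | Profissional de Saude | Cura | Novo | Pul | P+E | Elucidacao Diagn. em Internacao | Neg | N/realiz | Pos | N/realiz | N/realiz | Pos | S | N | N | N | N | N | Resistencia Medicamentosa | Supervisionado | Mais de 54 | N/realiz | 1 | 2
26 | Branco | 30_39 | M | De 4 a 7 anos | Outra | Abandono | Novo | Pul | P+E | Elucidacao Diagn. em Internacao | Pos | Pos | Pos | Susp TB | N/realiz | Pos | S | N | N | N | N | N | Nulo | Auto-Administrado | 23_39 | N/realiz | 1 | 2
28 | Branco | 30_39 | M | De 8 a 11 anos | Outra | Cura | Novo | Pleural | Ext | Elucidacao Diagn. em Internacao | Neg | N/realiz | Neg | Susp TB | N/realiz | Neg | N | N | N | N | N | N | Nulo | Auto-Administrado | 23_39 | N/realiz | 0 | 2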

Cluster 1

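index | racaCor | faixaEtaria | sexo | ESCOLARID | TIPOCUP | sitAtual | tipoCaso | FORMACLIN1 | classif | descoberta | bac | BACOUTRO | cultEsc | RX | NECROP | hiv | aids | DIABETES | ALCOOLISMO | MENTAL | DROGADICAO | TABAGISMO | motMudEsquema | tipoTrat | idade | HISTOPATOL | Status_Resistencia | Cluster
1512 | Pardo | 30_39 | M | De 8 a 11 anos | Desempregado | Cura | Novo | Pul | Pul | Demanda Ambulatorial | Neg | N/realiz | Pos | Susp TB | N/realiz | Neg | N | N | S | N | N | N | Nulo | Supervisionado | 23_39 | N/realiz | 1 | 0
1516 | Branco | 20_29 | M | De 8 a 11 anos | Outra | Cura | Recidiva | Pul | Pul | Demanda Ambulatorial | Pos | N/realiz | Pos | Susp c/cavid | N/realiz | Neg | N | N | N | N | N | N | Nulo | Supervisionado | 0_22 | N/realiz | 0 | 0
1517 | Preto | 60_69 | M | De 4 a 7 anos | Aposentado | Cura | Novo | Pul | Pul | Demanda Ambulatorial | Neg | N/realiz | Pos | Susp TB | N/realiz | Neg | N | N | S | N | N | S | Nulo | Supervisionado | Mais de 54 | N/realiz | 1 | 0
1518 | Preto | 20_29 | M | De 4 a 7 anos | Outra | Cura | Novo | Pul | Pul | Demanda Ambulatorial | Pos | N/realiz | Pos | Normal | N/realiz | Neg | N | N | N | N | N | N | Nulo | Supervisionado | 0_22 | N/realiz | 0 | 0
1521 | Pardo | 30_39 | M | De 4 a 7 anos | Outra | Cura | Novo | Pul | Pul | Demanda Ambulatorial | Neg | N/realiz | Pos | Susp TB | N/realiz | Neg | N | N | S | N | S | N | Nulo | Supervisionado | 23_39 | N/realiz | 0 | 0
1526 | Pardo | 20_29 | F | De 4 a 7 anos | Dona de Casa | Abandono | Novo | Pul | Pul | Urgencia / Emergencia | Pos | N/realiz | Pos | Susp TB | N/realiz | Neg | N | N | S | N | S | S | Nulo | Supervisionado | 0_22 | N/realiz | 1 | 0
1527 | Branco | 50_59 | M | De 4 a 7 anos | Outra | Abandono | Novo | Pul | Pul | Urgencia / Emergencia | Pos | N/realiz | Pos | Normal | N/realiz | N/realiz | N | S | N | N | N | N | Nulo | Supervisionado | Mais de 54 | N/realiz | 1 | 0
1531 | Indigena | 50_59 | M | De 4 a 7 anos | Outra | Cura | Novo | Pul | Pul | Demanda Ambulatorial | Pos | N/realiz | Pos | N/realiz | N/realiz | Neg | N | N | N | N | N | N | Nulo | Supervisionado | Mais de 54 | N/realiz | 1 | 0
1533 | Pardo | 30_39 | M | De 8 a 11 anos | Outra | Cura | Recidiva | Pul | Pul | Elucidacao Diagn. em Internacao | Pos | N/realiz | Pos | Susp TB | N/realiz | Neg | N | N | N | N | N | N | Nulo | Supervisionado | 23_39 | N/realiz | 0 | 0
1534 | Pardo | 30_39 | M | De 1 a 3 anos | Desempregado | Cura | Novo | Pul | Pul | Elucidacao Diagn. em Internacao | Pos | N/realiz | Pos | N/realiz | N/realiz | Pos | S | N | S | N | S | N | Nulo | Supervisionado | 23_39 | N/realiz | 1 | 0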

Cluster 2

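index | racaCor | faixaEtaria | sexo | ESCOLARID | TIPOCUP | sitAtual | tipoCaso | FORMACLIN1 | classif | descoberta | bac | BACOUTRO | cultEsc | RX | NECROP | hiv | aids | DIABETES | ALCOOLISMO | MENTAL | DROGADICAO | TABAGISMO | motMudEsquema | tipoTrat | idade | HISTOPATOL | Status_Resistencia | Cluster
1507 | Branco | 40_49 | F | 15 anos e mais | Profissional de Saude | Cura | Novo | Pul | Pul | Demanda Ambulatorial | Neg | N/realiz | Pos | Susp c/cavid | N/realiz | Neg | N | N | N | N | N | S | Nulo | Auto-Administrado | 40_54 | N/realiz | 1 | 1
1509 | Branco | 20_29 | M | De 4 a 7 anos | Outra | Cura | Novo | Pul | Pul | Demanda Ambulatorial | Pos | N/realiz | Neg | Susp TB | N/realiz | Neg | N | N | N | N | N | N | Nulo | Supervisionado | 23_39 | N/realiz | 0 | 1
1511 | Branco | 50_59 | F | De 4 a 7 anos | Outra | Cura | Novo | Pul | Pul | Demanda Ambulatorial | Pos | N/realiz | Pos | N/realiz | N/realiz | Neg | N | N | N | N | N | N | Nulo | Supervisionado | Mais de 54 | N/realiz | 1 | 1
1514 | Branco | 50_59 | M | De 8 a 11 anos | Outra | Cura | Novo | Ganglionar Periferica | Ext | Demanda Ambulatorial | Neg | N/realiz | N/realiz | Susp TB | N/realiz | Neg | N | N | N | N | N | N | Nulo | Supervisionado | Mais de 54 | N/realiz | 0 | 1
1520 | Branco | 20_29 | M | De 1 a 3 anos | Desempregado | Cura | Novo | Pul | Pul | Demanda Ambulatorial | Pos | N/realiz | N/realiz | Susp TB | N/realiz | Neg | N | N | N | N | N | N | Nulo | Supervisionado | 0_22 | N/realiz | 0 | 1
1522 | Pardo | 40_49 | M | De 4 a 7 anos | Outra | Cura | Novo | Pul | Pul | Demanda Ambulatorial | Pos | N/realiz | N/realiz | Susp TB | N/realiz | Neg | N | N | N | N | N | N | Nulo | Supervisionado | 40_54 | N/realiz | 0 | 1
1524 | Branco | 50_59 | M | De 4 a 7 anos | Outra | Cura | Novo | Pul | Pul | Elucidacao Diagn. em Internacao | Pos | N/realiz | N/realiz | Susp TB | N/realiz | Neg | N | N | S | N | N | N | Nulo | Supervisionado | Mais de 54 | N/realiz | 0 | 1
1529 | Branco | 40_49 | M | De 1 a 3 anos | Outra | Cura | Novo | Pul | Pul | Demanda Ambulatorial | Neg | N/realiz | Neg | Susp TB | N/realiz | Neg | N | N | S | N | N | N | Nulo | Supervisionado | 40_54 | N/realiz | 0 | 1
1530 | Branco | 70_79 | F | De 4 a 7 anos | Aposentado | Cura | Novo | Pul | Pul | Demanda Ambulatorial | Pos | N/realiz | Pos | Susp TB | N/realiz | Neg | N | N | S | N | N | N | Nulo | Supervisionado | Mais de 54 | N/realiz | 0 | 1
1535 | Branco | 40_49 | M | De 4 a 7 anos | Outra | Cura | Novo | Pul | Pul | Urgencia / Emergencia | Pos | N/realiz | N/realiz | Susp c/cavid | N/realiz | Neg | N | S | N | N | N | N | Nulo | Supervisionado | 40_54 | N/realiz | 0 | 1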

Cluster 3

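index | racaCor | faixaEtaria | sexo | ESCOLARID | TIPOCUP | sitAtual | tipoCaso | FORMACLIN1 | classif | descoberta | bac | BACOUTRO | cultEsc | RX | NECROP | hiv | aids | DIABETES | ALCOOLISMO | MENTAL | DROGADICAO | TABAGISMO | motMudEsquema | tipoTrat | idade | HISTOPATOL | Status_Resistencia | Cluster
1505 | Branco | 30_39 | M | De 1 a 3 anos | Outra | Cura | Novo | Ganglionar Periferica | Ext | Elucidacao Diagn. em Internacao | N/realiz | N/realiz | N/realiz | Normal | N/realiz | Neg | N | N | N | N | N | N | Nulo | Auto-Administrado | 40_54 | Sugestivo TB | 0 | 2
1506 | Branco | 30_39 | M | De 12 a 14 anos | Outra | Cura | Novo | Ganglionar Periferica | Ext | Demanda Ambulatorial | N/realiz | N/realiz | N/realiz | Normal | N/realiz | Pos | S | N | N | N | N | N | Nulo | Supervisionado | 23_39 | N/realiz | 0 | 2
1510 | Branco | 40_49 | M | De 4 a 7 anos | Outra | Cura | Novo | Pul | P+E | Demanda Ambulatorial | N/realiz | N/realiz | Pos | Susp TB | N/realiz | Pos | S | N | N | N | N | N | Nulo | Auto-Administrado | 40_54 | N/realiz | 1 | 2
1513 | Pardo | 20_29 | M | De 4 a 7 anos | Outra | Cura | Novo | Pul | P+E | Elucidacao Diagn. em Internacao | Neg | N/realiz | Pos | Susp TB | N/realiz | Pos | S | N | N | N | N | N | Nulo | Auto-Administrado | 0_22 | N/realiz | 1 | 2
1515 | Branco | 40_49 | F | De 4 a 7 anos | Outra | Cura | Novo | Pul | P+E | Demanda Ambulatorial | N/realiz | Pos | Pos | Susp TB | N/realiz | Pos | S | N | N | N | N | N | Nulo | Supervisionado | 40_54 | BAAR pos | 1 | 2
1519 | Preto | 30_39 | F | De 12 a 14 anos | Outra | Abandono | Novo | Pul | Pul | Elucidacao Diagn. em Internacao | Pos | N/realiz | Pos | Susp TB | N/realiz | Pos | S | N | S | N | N | N | Nulo | Auto-Administrado | 23_39 | N/realiz | 1 | 2
1523 | Preto | 40_49 | F | De 8 a 11 anos | Outra | Cura | Novo | Pul | Pul | Elucidacao Diagn. em Internacao | Pos | N/realiz | N/realiz | Susp TB | N/realiz | Pos | S | S | N | N | N | N | Nulo | Auto-Administrado | 40_54 | N/realiz | 0 | 2
1525 | Pardo | 30_39 | M | De 12 a 14 anos | Outra | Cura | Recidiva | Pul | P+E | Demanda Ambulatorial | Neg | N/realiz | Pos | Susp TB | N/realiz | Pos | S | N | N | N | N | N | Nulo | Auto-Administrado | 23_39 | N/realiz | 1 | 2
1528 | Pardo | 20_29 | F | De 4 a 7 anos | Outra | Abandono | Novo | Pul | P+E | Elucidacao Diagn. em Internacao | Neg | Neg | Pos | N/realiz | N/realiz | Pos | S | N | N | N | N | N | Nulo | Auto-Administrado | 0_22 | N/realiz | 1 | 2
1532 | Preto | 01_04 | M | Nenhuma | Outra | Cura | Novo | Meningea | Ext | Urgencia / Emergencia | Neg | Neg | Neg | Normal | N/realiz | N/realiz | N | N | N | N | N | N | Nulo | Auto-Administrado | 0_22 | Sugestivo TB | 0 | 2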

Duplicate rows

Cluster 1

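index | racaCor | faixaEtaria | sexo | ESCOLARID | TIPOCUP | sitAtual | tipoCaso | FORMACLIN1 | classif | descoberta | bac | BACOUTRO | cultEsc | RX | NECROP | hiv | aids | DIABETES | ALCOOLISMO | MENTAL | DROGADICAO | TABAGISMO | motMudEsquema | tipoTrat | idade | HISTOPATOL | Status_Resistencia | Cluster | # duplicates
112 | Branco | 20_29 | M | De 8 a 11 anos | Outra | Cura | Novo | Pul | Pul | Demanda Ambulatorial | Pos | N/realiz | Pos | Susp TB | N/realiz | Neg | N | N | N | N | N | N | Nulo | Supervisionado | 0_22 | N/realiz | 1 | 0 | 2
207 | Branco | 30_39 | M | De 4 a 7 anos | Desempregado | Cura | Novo | Pul | Pul | Elucidacao Diagn. em Internacao | Pos | N/realiz | Pos | Susp TB | N/realiz | Neg | N | N | S | N | S | N | Nulo | Supervisionado | 23_39 | N/realiz | 1 | 0 | 2
0 | Branco | 20_29 | F | De 4 a 7 anos | Desempregado | Cura | Novo | Pul | Pul | Demanda Ambulatorial | Pos | N/realiz | Pos | Susp TB | N/realiz | Neg | N | N | N | N | N | N | Nulo | Supervisionado | 0_22 | N/realiz | 1 | 0 | 0
1 | Branco | 20_29 | F | De 4 a 7 anos | Desempregado | Cura | Novo | Pul | Pul | Demanda Ambulatorial | Pos | N/realiz | Pos | Susp TB | N/realiz | Neg | N | N | N | N | N | N | Nulo | Supervisionado | 23_39 | N/realiz | 1 | 0 | 0
2 | Branco | 20_29 | F | De 4 a 7 anos | Desempregado | Cura | Novo | Pul | Pul | Demanda Ambulatorial | Pos | N/realiz | Pos | Susp TB | N/realiz | Neg | N | N | N | N | S | N | Nulo | Supervisionado | 0_22 | N/realiz | 1 | 0 | 0
3 | Branco | 20_29 | F | De 4 a 7 anos | Desempregado | Cura | Novo | Pul | Pul | Demanda Ambulatorial | Pos | N/realiz | Pos | Susp TB | N/realiz | Neg | N | N | N | N | S | N | Nulo | Supervisionado | 23_39 | N/realiz | 1 | 0 | 0
4 | Branco | 20_29 | F | De 4 a 7 anos | Desempregado | Cura | Novo | Pul | Pul | Demanda Ambulatorial | Pos | N/realiz | Pos | Susp TB | N/realiz | Neg | N | N | S | N | N | N | Nulo | Supervisionado | 0_22 | N/realiz | 1 | 0 | 0
5 | Branco | 20_29 | F | De 4 a 7 anos | Desempregado | Cura | Novo | Pul | Pul | Demanda Ambulatorial | Pos | N/realiz | Pos | Susp TB | N/realiz | Neg | N | N | S | N | N | N | Nulo | Supervisionado | 23_39 | N/realiz | 1 | 0 | 0
6 | Branco | 20_29 | F | De 4 a 7 anos | Desempregado | Cura | Novo | Pul | Pul | Demanda Ambulatorial | Pos | N/realiz | Pos | Susp TB | N/realiz | Neg | N | N | S | N | S | N | Nulo | Supervisionado | 0_22 | N/realiz | 1 | 0 | 0
7 | Branco | 20_29 | F | De 4 a 7 anos | Desempregado | Cura | Novo | Pul | Pul | Demanda Ambulatorial | Pos | N/realiz | Pos | Susp TB | N/realiz | Neg | N | N | S | N | S | N | Nulo | Supervisionado | 23_39 | N/realiz | 1 | 0 | 0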

Cluster 2

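index | racaCor | faixaEtaria | sexo | ESCOLARID | TIPOCUP | sitAtual | tipoCaso | FORMACLIN1 | classif | descoberta | bac | BACOUTRO | cultEsc | RX | NECROP | hiv | aids | DIABETES | ALCOOLISMO | MENTAL | DROGADICAO | TABAGISMO | motMudEsquema | tipoTrat | idade | HISTOPATOL | Status_Resistencia | Cluster | # duplicates
1272 | Branco | 20_29 | F | De 8 a 11 anos | Outra | Cura | Novo | Pul | Pul | Urgencia / Emergencia | Pos | N/realiz | N/realiz | Susp TB | N/realiz | Neg | N | N | N | N | N | N | Nulo | Auto-Administrado | 0_22 | N/realiz | 0 | 1 | 2
1623 | Branco | 20_29 | M | De 8 a 11 anos | Outra | Cura | Novo | Pul | Pul | Demanda Ambulatorial | Pos | N/realiz | N/realiz | N/realiz | N/realiz | Neg | N | N | N | N | N | N | Nulo | Supervisionado | 0_22 | N/realiz | 0 | 1 | 2
1671 | Branco | 20_29 | M | De 8 a 11 anos | Outra | Cura | Novo | Pul | Pul | Elucidacao Diagn. em Internacao | Pos | N/realiz | N/realiz | Susp TB | N/realiz | Neg | N | N | N | N | N | N | Nulo | Supervisionado | 0_22 | N/realiz | 0 | 1 | 2
2068 | Branco | 30_39 | F | De 8 a 11 anos | Outra | Cura | Novo | Pul | Pul | Demanda Ambulatorial | Pos | N/realiz | N/realiz | Susp TB | N/realiz | Neg | N | N | N | N | N | N | Nulo | Supervisionado | 23_39 | N/realiz | 0 | 1 | 2
4227 | Pardo | 15_19 | M | De 8 a 11 anos | Outra | Cura | Novo | Pul | Pul | Demanda Ambulatorial | Pos | N/realiz | N/realiz | Susp TB | N/realiz | Neg | N | N | N | N | N | N | Nulo | Supervisionado | 0_22 | N/realiz | 0 | 1 | 2
5404 | Pardo | 30_39 | F | De 8 a 11 anos | Dona de Casa | Cura | Novo | Pul | Pul | Demanda Ambulatorial | Pos | N/realiz | N/realiz | N/realiz | N/realiz | Neg | N | N | N | N | N | N | Nulo | Supervisionado | 23_39 | N/realiz | 0 | 1 | 2
5596 | Pardo | 30_39 | F | De 8 a 11 anos | Outra | Cura | Novo | Pul | Pul | Urgencia / Emergencia | Pos | N/realiz | N/realiz | Susp TB | N/realiz | Neg | N | N | N | N | N | N | Nulo | Supervisionado | 23_39 | N/realiz | 0 | 1 | 2
6623 | Pardo | 40_49 | M | De 4 a 7 anos | Outra | Cura | Novo | Pul | Pul | Demanda Ambulatorial | Pos | N/realiz | N/realiz | Susp c/cavid | N/realiz | Neg | N | N | S | N | N | N | Nulo | Supervisionado | 40_54 | N/realiz | 0 | 1 | 2
0 | Branco | 15_19 | F | De 4 a 7 anos | Dona de Casa | Cura | Novo | Pul | Pul | Demanda Ambulatorial | Pos | N/realiz | N/realiz | N/realiz | N/realiz | Neg | N | N | N | N | N | N | Nulo | Auto-Administrado | 0_22 | N/realiz | 0 | 1 | 0
1 | Branco | 15_19 | F | De 4 a 7 anos | Dona de Casa | Cura | Novo | Pul | Pul | Demanda Ambulatorial | Pos | N/realiz | N/realiz | N/realiz | N/realiz | Neg | N | N | N | N | N | N | Nulo | Auto-Administrado | 23_39 | N/realiz | 0 | 1 | 0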

Cluster 3

index | racaCor | faixaEtaria | sexo | ESCOLARID | TIPOCUP | sitAtual | tipoCaso | FORMACLIN1 | classif | descoberta | bac | BACOUTRO | cultEsc | RX | NECROP | hiv | aids | DIABETES | ALCOOLISMO | MENTAL | DROGADICAO | TABAGISMO | motMudEsquema | tipoTrat | idade | HISTOPATOL | Status_Resistencia | Cluster | # duplicates
Dataset does not contain duplicate rows.
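
A duplicate-row check of this kind can be sketched with the standard library alone. The example below assumes each record is a sequence of cell values and reports every row that occurs more than once together with its number of occurrences. Whether the report's "# duplicates" column counts occurrences in exactly this way is an assumption, and the helper name `duplicate_rows` is hypothetical.

```python
from collections import Counter


def duplicate_rows(rows):
    """Return {row_tuple: occurrences} for rows that appear more than once."""
    counts = Counter(tuple(row) for row in rows)
    return {row: n for row, n in counts.items() if n > 1}


# Three short illustrative records; the first and third are identical.
records = [
    ("Branco", "20_29", "M"),
    ("Pardo", "30_39", "F"),
    ("Branco", "20_29", "M"),
]
print(duplicate_rows(records))
```

An empty result corresponds to the message shown for Cluster 3, where no duplicate rows were found.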