Supplementary Information

Active learning framework with iterative clustering for bioimage classification

Natsumaro Kutsuna, Takumi Higaki, Sachihiro Matsunaga, Tomoshi Otsuki, Masayuki Yamaguchi, Hirofumi Fujii & Seiichiro Hasezawa

Supplementary Software 1 | Pseudocode of CARTA algorithm. Core routines of CARTA are shown in Lists 1–4.

global parameters
    N: number of input images
    P: population size (number of individuals) in genetic algorithm (GA)

List 1

function CARTA(images) do
    for i ← 1 to N do
        vectors[i] ← feature vector extracted from images[i] // Feature Extractor in Fig.1a
    end for
    // select features & annotated subset of images
    selector, annotatedVectors, annotatedLabels ← iterativeClustering(vectors, images) // List 2
    display selector to user
    // perform supervised learning and cross-validation
    classifierSub, accuracySub ← trainAndValidate(project(selector, annotatedVectors), annotatedLabels) // Lists 7 & 9
    classifierFull, accuracyFull ← trainAndValidate(annotatedVectors, annotatedLabels) // List 7
    // classify all images
    if accuracyFull > accuracySub then
        labels ← classify(classifierFull, vectors) // use full set of features, List 8
    else
        labels ← classify(classifierSub, project(selector, vectors)) // use selected features, Lists 8 & 9
    end if
    return labels
end function

List 2

function iterativeClustering(vectors, images) do
    // constant L: criterion to stop the iteration of GA
    generation ← 1
    annotatedVectors ← empty
    annotatedLabels ← empty
    peakGeneration ← 1
    peakFitness ← 0 // minimum possible fitness value
    makeFirstGeneration(population) // randomly initialize individuals, List 5
    peakSelector ← featureSelector of population[1]
    repeat do
        foreach individual ∊ population do
            evaluate(individual, vectors, annotatedVectors, annotatedLabels) // Feature Evaluator in Fig.1a, List 3
        end foreach
        bestIndividual ← individual assigned best fitness in population
        currentFitness ← fitness of bestIndividual
        display currentFitness & featureSelector of bestIndividual to user
        if currentFitness > peakFitness then // better solution found
            peakFitness ← currentFitness
            peakGeneration ← generation
            peakSelector ← featureSelector of bestIndividual
        else if (annotatedLabels ≠ empty) and (generation − peakGeneration > L) or (interrupted by user) then
            return peakSelector, annotatedVectors, annotatedLabels
        end if
        newAnnotatedImages, newAnnotatedLabels ← acceptAnnotation(peakSelector, vectors, images) // List 4
        if newAnnotatedImages ≠ empty then
            peakFitness ← 0 // minimum possible fitness value
            peakGeneration ← generation
            peakSelector ← featureSelector of bestIndividual
            for i ← 1 to N do
                if images[i] in newAnnotatedImages then
                    append vectors[i] to annotatedVectors
                end if
            end for
            append newAnnotatedLabels to annotatedLabels
        end if
        population ← makeOffsprings(population) // Feature Optimizer in Fig.1a, List 6
        generation ← generation + 1
    end repeat
end function

List 3

procedure evaluate(individual, vectors, annotatedVectors, annotatedLabels) do // Feature Evaluator in Fig.1a
    if annotatedLabels is empty then // unsupervised situation
        fitness ← 1
    else // semi-supervised situation
        vectorsInSubspace ← project(featureSelector of individual, vectors) // List 9
        som ← train self-organizing map (SOM) using vectorsInSubspace
        fitness ← 0
        foreach class ∊ classes of annotatedLabels do
            classVectorsInSubspace ← project(featureSelector of individual, vectors labeled as class in annotatedVectors) // List 9
            for i ← 1 to number of classVectorsInSubspace do
                classPoints[i] ← location of best matching unit (BMU) in som for classVectorsInSubspace[i]
                // location: f(x) in Q1×Q2 defined in equations (1, 2)
            end for
            classTree ← construct minimum spanning tree (MST) which connects all classPoints
            fitness ← fitness + $1 / (1 + \sum_{arc \in classTree} |arc|)$ // compact tree yields high fitness
        end foreach
    end if
    for i ← 1 to N do
        allLocations[i] ← location of BMU in som for vectorsInSubspace[i] // location: f(x) in Q1×Q2
    end for
    allTree ← construct MST which connects allLocations
    fitness ← $fitness / (1 + \sum_{arc \in allTree} |arc|)$ // adjust fitness by occupancy of SOM nodes
    assign fitness to individual
end procedure

List 4

function acceptAnnotation(featureSelector, vectors, images) do
    vectorsInSubspace ← project(featureSelector, vectors) // List 9
    som ← train SOM using vectorsInSubspace
    for i ← 1 to N do
        location ← location of BMU in som for vectorsInSubspace[i] // location: f(x) in Q1×Q2
        assign location to images[i]
    end for
    foreach node ∊ som do // display tiled images of SOM
        location ← location of node
        imagesAtXy ← images assigned to location
        display one of imagesAtXy as the tile at location
    end foreach
    if user input exists then
        return images annotated by user, labels annotated by user
    else
        return empty, empty
    end if
end function
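
The fitness arithmetic of List 3 can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that the BMU locations on the SOM grid have already been computed and are passed in as 2-D tuples, that |arc| is the Euclidean distance between two such locations, and that the MST is built with Prim's algorithm. The names mst_length and fitness_from_bmus are hypothetical and do not appear in CARTA.

```python
import math


def mst_length(points):
    """Total arc length of a Euclidean minimum spanning tree (Prim's algorithm).

    Assumption: |arc| in List 3 is taken to be the Euclidean distance.
    """
    points = list(points)
    n = len(points)
    if n < 2:
        return 0.0  # a tree over zero or one point has no arcs
    in_tree = [False] * n
    in_tree[0] = True
    # best[k]: shortest known distance from the growing tree to point k
    best = [math.dist(points[0], p) for p in points]
    total = 0.0
    for _ in range(n - 1):
        # attach the closest point that is not yet in the tree
        j = min((k for k in range(n) if not in_tree[k]), key=lambda k: best[k])
        total += best[j]
        in_tree[j] = True
        for k in range(n):
            if not in_tree[k]:
                best[k] = min(best[k], math.dist(points[j], points[k]))
    return total


def fitness_from_bmus(all_locations, annotated_locations, annotated_labels):
    """Fitness arithmetic of List 3, given precomputed BMU locations.

    all_locations: BMU location of every image (hypothetical input).
    annotated_locations, annotated_labels: BMU locations and class labels
    of the annotated images, in matching order.
    """
    if not annotated_labels:
        fitness = 1.0  # unsupervised situation
    else:
        fitness = 0.0  # semi-supervised situation
        for cls in sorted(set(annotated_labels)):
            class_points = [
                loc
                for loc, lab in zip(annotated_locations, annotated_labels)
                if lab == cls
            ]
            # a compact class tree contributes a value close to 1
            fitness += 1.0 / (1.0 + mst_length(class_points))
    # adjust by the length of the tree over all images
    return fitness / (1.0 + mst_length(all_locations))


locations = [(0, 0), (0, 1), (3, 0), (3, 1)]
compact = fitness_from_bmus(locations, locations, ["a", "a", "b", "b"])
mixed = fitness_from_bmus(locations, locations, ["a", "b", "a", "b"])
print(compact > mixed)  # True
```

In this toy layout, the labelling whose classes sit on neighbouring SOM nodes scores higher than the labelling whose classes are spread across the map, which matches the "compact tree yields high fitness" comment in List 3.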