Imagine you need to perform a sequence of operations on a very large dataset, one that cannot possibly fit in memory. Instead, the dataset has to be stored on disk. In fact, the result of each intermediate step needs to be persisted to disk and read back to perform the next step in the sequence. Of course, all of this reading, processing and writing should be parallelised to be as fast as possible. The data should be stored in shards, so that multiple shards can be read from or written to simultaneously, and disk IO and CPU processing should be done in separate threads. All in all, it gets fairly complicated…
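The threading pattern described above can be sketched in plain Java. This is not ojAlgo code: the class name `ShardPipeline`, the bounded `ArrayBlockingQueue` hand-off and the placeholder "processing" (summing the bytes of each shard) are illustrative assumptions. One thread does only disk IO, reading shard files, and hands their contents to a pool of worker threads that do only CPU work.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;

class ShardPipeline {

    // Sentinel telling a worker there are no more shards (compared by identity).
    private static final byte[] POISON = new byte[0];

    // One IO thread reads the shard files; 'workers' CPU threads process them.
    // The "processing" is a placeholder: summing the bytes as unsigned values.
    static long sumAllShards(List<Path> shards, int workers) throws Exception {
        // Bounded queue: the reader can only get a few shards ahead of the workers.
        BlockingQueue<byte[]> queue = new ArrayBlockingQueue<>(4);
        AtomicLong total = new AtomicLong();
        ExecutorService pool = Executors.newFixedThreadPool(workers + 1);
        try {
            Future<?> reader = pool.submit(() -> {
                try {
                    for (Path shard : shards) {
                        queue.put(Files.readAllBytes(shard)); // disk IO only
                    }
                } finally {
                    for (int i = 0; i < workers; i++) {
                        queue.put(POISON); // release the workers even if a read failed
                    }
                }
                return null;
            });
            List<Future<?>> cpu = new ArrayList<>();
            for (int w = 0; w < workers; w++) {
                cpu.add(pool.submit(() -> {
                    while (true) {
                        byte[] data = queue.take();
                        if (data == POISON) {
                            return null;
                        }
                        long sum = 0;
                        for (byte b : data) {
                            sum += b & 0xFF; // CPU work only
                        }
                        total.addAndGet(sum);
                    }
                }));
            }
            reader.get(); // rethrows any IO failure
            for (Future<?> f : cpu) {
                f.get();
            }
            return total.get();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("shards");
        List<Path> shards = new ArrayList<>();
        for (int s = 0; s < 3; s++) {
            Path p = dir.resolve("shard-" + s + ".bin");
            Files.write(p, new byte[] {1, 2, (byte) 200});
            shards.add(p);
        }
        System.out.println(sumAllShards(shards, 2)); // 3 * (1 + 2 + 200) = 609
    }
}
```

The bounded queue is the design choice that matters here: it lets IO and CPU work overlap while keeping only a handful of shards in memory at any one time.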

BatchNode encapsulates and simplifies much of the complexity in doing all that. Specifically, a BatchNode represents (temporary) state that needs to be persisted before it’s possible to continue with the next step. Most likely a series of such nodes will be created before the calculations are done. For each node, the key thing you need to specify is what to persist – how to (de)serialise or (un)marshal the data. There is also a set of configurations to tweak for best performance, but all of those have workable defaults.
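To make "what to persist" concrete, here is a minimal sketch of sharded, disk-backed state with a user-supplied serialiser. Everything in it (`Codec`, `ShardedStore`, round-robin assignment of items to shards, the shard file names) is hypothetical and is not ojAlgo's actual BatchNode API; it only illustrates the division of labour described above: the user says how to (de)serialise one item, and the node takes care of the files.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical: the part the user supplies, i.e. how one item is (de)serialised.
interface Codec<T> {
    void write(DataOutput out, T item) throws IOException;
    T read(DataInput in) throws IOException;
}

// Hypothetical sharded on-disk state; a sketch of the idea, NOT ojAlgo's BatchNode API.
class ShardedStore<T> {

    private final List<Path> shards = new ArrayList<>();
    private final Codec<T> codec;

    ShardedStore(Path directory, int shardCount, Codec<T> codec) {
        for (int s = 0; s < shardCount; s++) {
            shards.add(directory.resolve("shard-" + s + ".bin"));
        }
        this.codec = codec;
    }

    // Persists the items, distributing them round-robin over the shard files.
    void writeAll(List<T> items) throws IOException {
        List<DataOutputStream> outs = new ArrayList<>();
        try {
            for (Path shard : shards) {
                outs.add(new DataOutputStream(new BufferedOutputStream(Files.newOutputStream(shard))));
            }
            for (int i = 0; i < items.size(); i++) {
                codec.write(outs.get(i % outs.size()), items.get(i));
            }
        } finally {
            for (DataOutputStream out : outs) {
                out.close(); // also flushes the buffer
            }
        }
    }

    // Reads one shard back; different shards could be read by different threads.
    List<T> readShard(int index) throws IOException {
        List<T> items = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new BufferedInputStream(Files.newInputStream(shards.get(index))))) {
            while (true) {
                try {
                    items.add(codec.read(in));
                } catch (EOFException endOfShard) {
                    return items; // assumes each file ends on a record boundary
                }
            }
        }
    }
}

class ShardedStoreDemo {

    // Example codec for double[] vectors: the length first, then the values.
    static final Codec<double[]> VECTORS = new Codec<double[]>() {
        @Override
        public void write(DataOutput out, double[] v) throws IOException {
            out.writeInt(v.length);
            for (double d : v) {
                out.writeDouble(d);
            }
        }

        @Override
        public double[] read(DataInput in) throws IOException {
            double[] v = new double[in.readInt()];
            for (int i = 0; i < v.length; i++) {
                v[i] = in.readDouble();
            }
            return v;
        }
    };

    public static void main(String[] args) throws IOException {
        ShardedStore<double[]> store = new ShardedStore<>(Files.createTempDirectory("node"), 2, VECTORS);
        List<double[]> items = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            items.add(new double[] {i, i * 0.5});
        }
        store.writeAll(items);
        System.out.println(store.readShard(0).size() + " + " + store.readShard(1).size()); // 3 + 2
    }
}
```

Because each shard is a separate file, different shards can be written or read by different threads, which is where the parallelism in the first paragraph comes from.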

Probably best to just look at the example code below. In the example, BatchNode is used to preprocess data and train a neural network. Other examples have shown how to do this without BatchNode, and the dataset is certainly not big enough to warrant its use. This example uses it anyway, purely to demonstrate the basics of how to use it.

Example Code

Console Output

class FashionMNISTWithBatchNode
ojAlgo
2022-05-12

Parsed IDX training data files: 1.333656766s
Initial training data: 2.077556784s
Scaled training data: 2.890675091s
Duplicated training data: 78.529841326s
Randomised training data: 124.459885072s
There are 768000 T-shirt/top instances in the scaled/duplicated/randomised traing set.
There are 768000 Trouser instances in the scaled/duplicated/randomised traing set.
There are 768000 Pullover instances in the scaled/duplicated/randomised traing set.
There are 768000 Dress instances in the scaled/duplicated/randomised traing set.
There are 768000 Coat instances in the scaled/duplicated/randomised traing set.
There are 768000 Sandal instances in the scaled/duplicated/randomised traing set.
There are 768000 Shirt instances in the scaled/duplicated/randomised traing set.
There are 768000 Sneaker instances in the scaled/duplicated/randomised traing set.
There are 768000 Bag instances in the scaled/duplicated/randomised traing set.
There are 768000 Ankle boot instances in the scaled/duplicated/randomised traing set.
Sample set Size=10, Mean=768000.0, Var=0.0, StdDev=0.0, Min=768000.0, Max=768000.0
Training data verified: 147.036370166s
Done 1000000 training iterations: 248.466925026s
Done 2000000 training iterations: 350.38887831s
Done 3000000 training iterations: 457.541165275s
Done 4000000 training iterations: 569.745276533s
Done 5000000 training iterations: 692.703884745s
Done 6000000 training iterations: 811.113295485s
Done 7000000 training iterations: 929.132814585s
Training done: 1017.177961434s
Parsed IDX test data files: 1020.346941825s

Image 0: Ankle boot <=> Ankle boot
                            
                            
                            
                            
                            
                            
                            
                            
                         +  
                +++     ++  
                +++X++++++  
               ++++X++X+++  
              ++++++++++++  
             +++++++++X+++  
            +++++++++++++++ 
           +++++++++++++++X 
         ++++++++++X+++++++ 
      ++++++++++++++++++X+X 
  ++++++++++++++++++++++++X 
 +++++++++++++++++XXXXXXXXX 
 +XXXXX++++++XXXXXXXXXXXXXX 
     ++XXXXXXX+     XXXX+++ 
                            
                            
                            
                            
                            
                            

Image 1: Pullover <=> Pullover
                            
        +XXXXXXXXXXX        
      +XXXXXXXXXXXXXXX      
     +XXXXXXXXXXXXXXXXX     
     XXXXXXXXXXXXXXXXXX     
     XXXX+ XXXXXXXXXXXX     
     XXXX  XXX  XX +XXX+    
     XXXXX XX ++    XXX+    
     XXXXX X        XXXX    
     XXXXX X  XXX XXXXXX    
     XXXXX   +  X+ +XXXX    
    +XXXXX++XXXXXX XXXXX    
    +XXXXXXXXXXXXXXXXXXX    
    +XXXXXXXXXXXXXXXXXXX    
    +XXX+XXXXXXXXXXX+XXX    
    XXXX XXXXXXXXXXX XXX    
    XXXX+XXXXXXXXXXX XXX    
    XXX++XXXXXXXXXXX XXX+   
    XXX+XXXXXXXXXXXX XXX+   
    XXX+XXXXXXXXXXXX XXX+   
    XXX+XXXXXXXXXXXX+XXX+   
    XXX+XXXXXXXXXXXX++XX+   
    XXXXXXXXXXXXXXXXX+XX+   
    XXX XXXXXXXXXXXXX XX+   
    XXX XXXXXXXXXXXX+ XXX   
    +XX               XX+   
     XX               XX+   
     XX               XX    

Image 2: Trouser <=> Trouser
           X++++++X         
          XXXXXXXXX         
          XXXXXXXXX+        
          XXXXXXXXX+        
          XXXXXXXXX+        
          XXXXXXXXXX        
         +XXXXXXXXXX        
         +XXXXXXXXX+        
         XXXXX XXXX+        
         XXXX+ XXXX+        
         XXXX  XXXX+        
         XXXX  XXXX+        
         XXXX  +XXX+        
         XXXX  +XXX+        
         XXXX   XXX+        
         XXX+   XXX+        
         XXX+   XXX+        
         XXX+   XXX+        
         +XXX   XXX+        
         +XXX   XXX+        
          XXX   XXX+        
          XXX   XXX+        
          XXX   XXX+        
          +XX+  XXX         
           XXX  XXX         
           XXX  +XX+        
           XXX  +XX+        
           ++    X+         

Image 3: Trouser <=> Trouser
          ++++  ++          
         +XXXXXXXX++        
         +XXXXXXXX++        
         +XXXXXXXX+         
         +XXXXXXXX+         
         ++XXXXXXX+         
         ++++XXX+++         
          +  ++X+++         
          +++++X++          
          ++++ X+++         
          ++++ X++          
          ++++ X++          
          ++++ X+           
          ++++ X+           
          + ++ X+           
          + ++ X+           
          ++++ +X           
          + X  +X           
          XXX   X++         
          XXX   XX+         
          XXX   XX+         
          XXX   XX+         
          XXX   XX+         
          XXX   XX+         
          XXX   XX+         
          +XX   XX          
          +XX   XX          
           X+   ++          

Image 4: Shirt <=> Shirt
                            
          +X++XXXXX+        
       +++++++++++++X+      
      +++++++X+++++++++     
      ++++++ +X+++++++++    
      +++++++ XX++++++++    
     ++++++++ +++++++++X    
     ++++ ++++++ ++++X+X    
     ++++ ++++++ ++++X+X    
     ++++++++++++++++X+X+   
    ++++++++++++++++XX+++   
    +++++++++++++++++X++X   
    +++X+++++++++++++X++X   
    +++X +++++++++++ X++X   
    +++X +++++++++++ X++X   
    +++X +++++++++++ X++X   
    ++++ +++++++++++ X++X   
    ++++ ++++++++++X +++X   
    ++++ ++++++++++X ++++   
   +++X  ++++++++++X++++X+  
   +++X  ++++++++++X+ X+++  
        ++++++++++++X       
        ++++++++++++X       
        ++++++++++++++      
       +++++++++++++++      
       +++++ ++++++++X      
        +XX++X+++XXX+       
            ++++++          

Image 5: Trouser <=> Trouser
          XXXX++XXXX        
          XXXXXXXXXX        
          XXXXXXXXXX        
         +XXXXXXXXXX        
         XXXXXXXXXXX+       
         XXXXXXXXXXX+       
         XXXXXXXXXXX+       
         XXXXXXXXXXX+       
        +X+XXX XXXXX+       
        +XXXXX +XXXX+       
        +XXXX+  XXXX+       
        XXXXX   XXXX+       
        XXXXX   XX+X+       
        +XXX+   XX+X+       
        +XXX    XX+++       
        +XXX    +X+++       
        +XXX     X+++       
         XXX     X+++       
         XXX+    X+X+       
         XXXX    XXX+       
         +XXX    XXX+       
          XXX    XXX+       
          XXX+   XXX+       
          +XX+   XXX+       
           XXX   +XX+       
           XXX   +XX+       
           +XX+  +XX+       
            ++    ++        

Image 6: Coat <=> Coat
                            
          XXXXXXX+          
         ++XX+XXX           
          + XXXX            
         ++  XX             
       +                    
                            
        +    +              
      + +                   
        +                   
        ++                  
        ++                  
        X          +        
        X          X        
        X          X        
        X          X+       
        X          X+       
       +X          XX       
       +X          XX       
       +X          XX       
       +X          +X       
       XX          +X       
       X+          +X       
       X+          +X       
       X            X       
       X            X+      
       +            +       
                            

Image 7: Shirt <=> Shirt
            +++++           
            XXXXX           
             XXX+ +         
        +++  +++  ++        
      + ++  +++++  +++      
      X     +   +     +     
     + +X+++X+++X +++       
     +      +  ++ +++ +     
     X+     +   +    ++     
     +++++  + + +  + ++     
     +  ++ +X ++X +++++     
     X  ++  X +++ +++  +    
     X+X    +   +   X+X+    
     + +++++X ++X ++X +     
     + +++++X+++X +++ ++    
    +++X    +       X+++    
    +++X +  + +++   ++++    
       X +  + +++   ++ +    
    + ++ + +X +++ ++++ +    
    X+XX    +  ++    XXX+   
      ++    +        +      
    + + +X++X++XX++X++++    
    + +     + +++    +++    
    ++X     +   +    +++    
      + ++ ++++++  +  ++    
     +X ++ ++++++  +  ++    
     +X  + X++++X  + +X+    
      +      +              

Image 8: Sandal <=> Sandal
                            
                            
                            
                            
                            
                            
                            
                            
                            
                            
                            
          + +               
       XXXXXX++X            
      X++  ++XXXXX          
     X         +XXX++++++++ 
     +    +++++++  ++++++++ 
  ++++++++++                
                            
                            
                            
                            
                            
                            
                            
                            
                            
                            
                            

Image 9: Sneaker <=> Sneaker
                            
                            
                            
                            
                            
                            
                            
                            
                            
              +X+X       +X 
             X++++X     +XX 
           X++ +++XX+  +XXX 
         X+  ++++++XXXXXX+X 
       +X  +++++++++++XX++X 
 +++++++++++++++XXXXX++++XX 
+XXXX+++++++XXXXXXXX+XXXXXX 
 ++XXXXX+XXXXXX+++X         
        +                   
                    +       
                            
                            
                            
                            
                            
                            
                            
                            
                            
Done: 1022.108845606s or 17.035148605216666min

===========================================================
Error rate: 0.1089
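The console output above records the preprocessing pipeline (parse, scale, duplicate, randomise) and then a check that every class is equally represented. 10 × 768,000 = 7,680,000 training instances, which is consistent with each of the 6,000 Fashion-MNIST training images per class appearing 128 times (768,000 / 6,000 = 128), although the log does not state the duplication factor. Here is a small in-memory sketch, scaled down 1,000×, of the duplicate, randomise and verify steps. It is plain Java, not the example's code; the class and method names are made up, and the use of the n − 1 sample variance is an assumption (with identical counts every definition gives 0).

```java
import java.util.Arrays;
import java.util.Random;

class DuplicateShuffleVerify {

    // Repeats the whole label array 'copies' times (in memory, for illustration only).
    static int[] duplicate(int[] labels, int copies) {
        int[] result = new int[labels.length * copies];
        for (int c = 0; c < copies; c++) {
            System.arraycopy(labels, 0, result, c * labels.length, labels.length);
        }
        return result;
    }

    // Fisher-Yates shuffle, in place.
    static void shuffle(int[] values, Random random) {
        for (int i = values.length - 1; i > 0; i--) {
            int j = random.nextInt(i + 1);
            int tmp = values[i];
            values[i] = values[j];
            values[j] = tmp;
        }
    }

    // How many times each label 0..classes-1 occurs.
    static long[] countLabels(int[] labels, int classes) {
        long[] counts = new long[classes];
        for (int label : labels) {
            counts[label]++;
        }
        return counts;
    }

    // {size, mean, sample variance (n - 1), std dev, min, max} of the counts.
    static double[] summarise(long[] counts) {
        int n = counts.length;
        double mean = Arrays.stream(counts).average().orElse(Double.NaN);
        double sumOfSquares = 0.0;
        for (long count : counts) {
            sumOfSquares += (count - mean) * (count - mean);
        }
        double variance = n > 1 ? sumOfSquares / (n - 1) : 0.0;
        long min = Arrays.stream(counts).min().orElse(0L);
        long max = Arrays.stream(counts).max().orElse(0L);
        return new double[] {n, mean, variance, Math.sqrt(variance), min, max};
    }

    public static void main(String[] args) {
        // Toy set: 6 samples per class (the real training set has 6,000 per class).
        int[] labels = new int[60];
        for (int i = 0; i < labels.length; i++) {
            labels[i] = i % 10;
        }
        int[] training = duplicate(labels, 128);
        shuffle(training, new Random(42));
        System.out.println(Arrays.toString(summarise(countLabels(training, 10))));
        // prints [10.0, 768.0, 0.0, 0.0, 768.0, 768.0]
    }
}
```

Shuffling changes the order but not the counts, so the verification step catches bugs in duplication or sharding rather than in randomisation.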

The Fashion MNIST dataset

https://github.com/zalandoresearch/fashion-mnist

The Fashion MNIST dataset is a drop-in replacement for the original MNIST dataset. The idea is to make it possible to test a model developed for MNIST on something harder. The images are still 28×28 pixel grayscale images, but instead of handwritten digits they show “photographs” of clothing and accessories. Just as before, there are 10 categories, here of “fashion”. Here are some example images:
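The console output starts with "Parsed IDX training data files". Fashion-MNIST is distributed in the same IDX format as the original MNIST: a header with two zero bytes, a type code (0x08 for unsigned bytes), the number of dimensions, and then one big-endian 32-bit size per dimension. A minimal header-parsing sketch, with the label order 0–9 used by the dataset (the class name `FashionIdx` is made up, and the 0..255 to [0, 1] pixel scaling is only an assumption about what the "Scaled training data" step does):

```java
import java.nio.ByteBuffer;

class FashionIdx {

    // Label value -> category name, in the Fashion-MNIST label order 0..9.
    static final String[] CATEGORIES = {
        "T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
        "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"
    };

    // Reads the dimension sizes from an IDX header: two zero bytes, a type code
    // (0x08 = unsigned byte), the number of dimensions, then one big-endian int per dimension.
    static int[] dimensions(byte[] header) {
        ByteBuffer buffer = ByteBuffer.wrap(header); // big-endian by default
        if (buffer.get() != 0 || buffer.get() != 0) {
            throw new IllegalArgumentException("Not an IDX header");
        }
        byte type = buffer.get();
        if (type != 0x08) {
            throw new IllegalArgumentException("Expected unsigned bytes, got type " + type);
        }
        int[] dims = new int[buffer.get()];
        for (int d = 0; d < dims.length; d++) {
            dims[d] = buffer.getInt();
        }
        return dims;
    }

    // Assumed scaling: a grayscale pixel 0..255 mapped to [0, 1].
    static double scale(byte pixel) {
        return (pixel & 0xFF) / 255.0;
    }

    public static void main(String[] args) {
        // Header of a training-image file: 60,000 images of 28 x 28 pixels.
        byte[] header = ByteBuffer.allocate(16)
                .put((byte) 0).put((byte) 0).put((byte) 0x08).put((byte) 3)
                .putInt(60_000).putInt(28).putInt(28)
                .array();
        int[] dims = dimensions(header);
        System.out.println(dims[0] + " images of " + dims[1] + "x" + dims[2]); // 60000 images of 28x28
        System.out.println(CATEGORIES[9] + ", " + scale((byte) 255)); // Ankle boot, 1.0
    }
}
```

A label file has the same layout with a single dimension (magic number 2049, that is 0x00000801), so the same parser works for both file types.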


The Fashion-MNIST GitHub page publishes benchmark results comparing 129 classifiers on both the original MNIST and the Fashion-MNIST datasets. The best result is 97.8% on the original dataset and 89.7% on the fashion dataset. The ojAlgo neural network got 98.1% and 89.1% respectively. (Results are not exactly the same each time you train the network, but these are the numbers published in the blog posts.)
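The 89.1% quoted here and the "Error rate: 0.1089" in the console output are two views of the same number: accuracy = 1 − error rate. Assuming the full 10,000-image Fashion-MNIST test set is used, an error rate of 0.1089 means 1,089 misclassified images. A minimal sketch of the computation (the class name is made up, and the predictions are synthetic, constructed only to reproduce that count):

```java
import java.util.Locale;

class ErrorRate {

    // Fraction of predictions that differ from the expected labels.
    static double errorRate(int[] expected, int[] predicted) {
        if (expected.length != predicted.length || expected.length == 0) {
            throw new IllegalArgumentException("Need two non-empty arrays of equal length");
        }
        int wrong = 0;
        for (int i = 0; i < expected.length; i++) {
            if (expected[i] != predicted[i]) {
                wrong++;
            }
        }
        return (double) wrong / expected.length;
    }

    public static void main(String[] args) {
        // Synthetic predictions: 10,000 test images, the first 1,089 misclassified.
        int[] expected = new int[10_000];
        int[] predicted = new int[10_000];
        for (int i = 0; i < expected.length; i++) {
            expected[i] = i % 10;
            predicted[i] = i < 1_089 ? (i + 1) % 10 : i % 10;
        }
        double error = errorRate(expected, predicted);
        System.out.println("Error rate: " + error); // Error rate: 0.1089
        System.out.printf(Locale.ROOT, "Accuracy: %.1f%%%n", 100.0 * (1.0 - error)); // Accuracy: 89.1%
    }
}
```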

The neural networks used in the ojAlgo examples are very simple – just 1 hidden layer with 200 nodes – and in this case the entire run, including data preparation, training and testing, took only about 17 minutes (on my laptop).
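To put "very simple" in numbers: with 28 × 28 = 784 inputs, 200 hidden nodes and 10 outputs, the network has 784·200 + 200 + 200·10 + 10 = 159,010 weights and biases. The sketch below is an untrained forward pass of such a network in plain Java. It is not ojAlgo's implementation: the sigmoid hidden activation, the softmax output, the random initialisation and the class name `OneHiddenLayerNet` are all assumptions made for illustration.

```java
import java.util.Random;

class OneHiddenLayerNet {

    static final int INPUTS = 28 * 28; // one input per pixel
    static final int HIDDEN = 200;
    static final int OUTPUTS = 10;     // one output per category

    final double[][] w1 = new double[HIDDEN][INPUTS];
    final double[] b1 = new double[HIDDEN];
    final double[][] w2 = new double[OUTPUTS][HIDDEN];
    final double[] b2 = new double[OUTPUTS];

    // Small random weights, zero biases (assumed initialisation).
    OneHiddenLayerNet(long seed) {
        Random random = new Random(seed);
        for (double[] row : w1) {
            for (int j = 0; j < row.length; j++) {
                row[j] = random.nextGaussian() * 0.05;
            }
        }
        for (double[] row : w2) {
            for (int j = 0; j < row.length; j++) {
                row[j] = random.nextGaussian() * 0.05;
            }
        }
    }

    // Weights plus biases: 784*200 + 200 + 200*10 + 10 = 159,010.
    static int parameterCount() {
        return INPUTS * HIDDEN + HIDDEN + HIDDEN * OUTPUTS + OUTPUTS;
    }

    // Forward pass; the activation functions (sigmoid, then softmax) are assumptions.
    double[] predict(double[] pixels) {
        double[] hidden = new double[HIDDEN];
        for (int i = 0; i < HIDDEN; i++) {
            double z = b1[i];
            for (int j = 0; j < INPUTS; j++) {
                z += w1[i][j] * pixels[j];
            }
            hidden[i] = 1.0 / (1.0 + Math.exp(-z));
        }
        double[] out = new double[OUTPUTS];
        double max = Double.NEGATIVE_INFINITY;
        for (int k = 0; k < OUTPUTS; k++) {
            double z = b2[k];
            for (int i = 0; i < HIDDEN; i++) {
                z += w2[k][i] * hidden[i];
            }
            out[k] = z;
            max = Math.max(max, z);
        }
        double sum = 0.0;
        for (int k = 0; k < OUTPUTS; k++) {
            out[k] = Math.exp(out[k] - max); // subtract the max for numerical stability
            sum += out[k];
        }
        for (int k = 0; k < OUTPUTS; k++) {
            out[k] /= sum; // probabilities over the 10 categories
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(parameterCount()); // 159010
        double[] probabilities = new OneHiddenLayerNet(1L).predict(new double[INPUTS]);
        System.out.println(probabilities.length); // 10
    }
}
```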