MNIST ML Pipeline Assignment Report

Introduction

Goal: design and implement a reliable machine learning pipeline for the MNIST dataset that integrates model storage, test cases, and an API interface. This report describes the structure, components, and functionality of the developed pipeline.

Project Structure

Root directory: Lunit_assignment
1. nets:
   1. nn.py
2. utils:
   1. data_loader.py
   2. util.py
3. Core implementation files:
   1. trainer
   2. evaluater
   3. Server.py
   4. Client.py

nets: ==> nn.py

• nn.py: Defines the architecture of the neural network used for MNIST digit classification.
• Convolutional layers:
  • conv1: The first convolutional layer takes a single-channel (grayscale) input and produces 32 output channels using a 3x3 kernel.
  • conv2: The second convolutional layer takes the 32 channels produced by conv1 as input and produces 64 output channels using a 3x3 kernel.
• Dropout layers:
  • dropout1: Regularizes the model by randomly setting 25% of its input units to 0 during training, which helps prevent overfitting.
  • dropout2: A more aggressive dropout layer that sets 50% of its input units to 0 during training.
• Fully connected (linear) layers:
  • fc1: The first linear layer has an input feature size of 9216 and an output size of 128.
  • fc2: The second linear layer reduces the feature size from 128 to 10, corresponding to the ten digit classes (0-9) of the MNIST dataset.

utils: ==> data_loader.py && util.py

• These modules handle data loading and configuration handling for the MNIST classification task.
• Transform: converts images to PyTorch tensors (transforms.ToTensor()).

Function: loads the MNIST data.
• Arguments:
  • use_cuda: A flag indicating whether CUDA should be used.
  • batch_size: The size of the batches in which data is loaded.
• Logic:
  • Based on the use_cuda flag, sets the loader parameters (num_workers, pin_memory, shuffle) to optimize data loading when using a GPU.
  • Applies the transformation returned by get_transform() to preprocess the data.
  • Loads both the training and test datasets from the ./dataset directory.
  • Uses DataLoader to prepare batches of data for both datasets.

trainer.py:

• Implements the model training process for MNIST classification and integrates with the MLflow platform for experiment management.

Key Components:
• Initialization:
  • Determines the computation device.
  • Loads the training and test datasets.
  • Initializes the neural network and optimizer.
• Training loop:
  • Processes data in batches, computes the loss, and updates the model weights.
  • Provides real-time feedback using a progress bar.
  • Evaluates model performance after each epoch.
• Model logging:
  • The model is saved as "mnist_model.pt".
  • Model weights and training artifacts are logged to MLflow for tracking.
• To train the model, run main.py.

evaluater.py:

Functionality:
• Evaluation mode:
  • The model is set to evaluation mode using model.eval(), so that dropout is disabled and any batch normalization layers use their running statistics instead of per-batch statistics.
• Loss and accuracy computation:
  • Processes the test data and computes the loss and predictions without gradients (torch.no_grad()).
  • The loss is aggregated, and accuracy is determined by comparing predictions to the actual targets.
• Results display:
  • Outputs the average loss and the accuracy percentage for the test set.
• MLflow logging:
  • Logs the test loss and accuracy metrics to MLflow.

server.py:

• server.py implements a FastAPI server that provides an API interface for training and prediction tasks using the MNIST model.

Key Components:
• Initialization:
  • load_model(model_path): Loads the MNIST model from the specified path and sets it to evaluation mode.
  • On server startup, the model is pre-loaded so that it is ready for inference.
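
The loading step described under Initialization could look roughly like the sketch below. It is illustrative only: TinyNet is a hypothetical stand-in for the network in nets/nn.py, and the sketch assumes the checkpoint stores a state_dict, which the report does not confirm.

```python
import os
import tempfile

import torch
from torch import nn


class TinyNet(nn.Module):
    """Hypothetical stand-in for the network defined in nets/nn.py."""

    def __init__(self):
        super().__init__()
        self.dropout = nn.Dropout(0.5)
        self.fc = nn.Linear(28 * 28, 10)

    def forward(self, x):
        x = torch.flatten(x, 1)          # (N, 1, 28, 28) -> (N, 784)
        return self.fc(self.dropout(x))  # (N, 10) class scores


def load_model(model_path):
    """Rebuild the network, restore saved weights, and switch to eval mode.

    Assumption: the file at model_path holds a state_dict.
    """
    model = TinyNet()
    state = torch.load(model_path, map_location="cpu")
    model.load_state_dict(state)
    model.eval()  # disables dropout for inference
    return model


# Round trip: save freshly initialized weights, then load them back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "mnist_model.pt")
    torch.save(TinyNet().state_dict(), path)
    model = load_model(path)

# One blank 28x28 grayscale image, scored without tracking gradients.
with torch.no_grad():
    sample = torch.zeros(1, 1, 28, 28)
    logits = model(sample)
```

Because the model is in evaluation mode, repeated calls on the same input return the same scores. A server would typically call such a function once at startup and reuse the returned model for every request.
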
• Endpoints:
  • Training (/train/): Accepts an uploaded configuration file to initiate model training and saves the configuration for reproducibility. Returns a success message upon completion.
  • Prediction (/predict/): Receives an image file, preprocesses it, and returns the model's digit prediction.
• Server launch: When executed directly, the script uses uvicorn to run the server on all interfaces at port 8000.

client.py:

• client.py serves as a client interface to the FastAPI backend for MNIST model operations, specifically training and inference.

Key Components:
• Inference request, request_inference():
  • Data preparation: Loads a test image from the MNIST dataset.
  • API interaction: Sends the image to the /predict endpoint of the server to obtain a digit prediction.
  • Response handling: Displays the model's prediction and any associated metadata received from the server.
• Training request, request_train(args):
  • Configuration file handling: Reads the configuration file provided through command-line arguments.
  • API interaction: Sends the configuration file to the /train endpoint of the server to initiate model training.
  • Response handling: Outputs the server's response, typically an acknowledgment that training has completed.
• Command-line interface:
  • Uses the argparse library to let users specify whether to send a training or a testing request. Users can also provide a specific configuration file for training.
• Error handling and execution mechanism

tests:

• The tests folder contains checks that ensure parts of the MNIST project work as intended.

Key tests (functions):
• Test_config_loading.py: Config loading
  • Checks that the configuration file loads correctly.
  • Confirms that essential settings such as max_epochs, lr, and batch_size are present.
• Test_data_loader.py: Data loading
  • Makes sure data is loaded correctly for training.
  • Verifies the shape and size of the loaded data batches.
• test_model.py: Model verification
  • Initializes the neural network model.
  • Ensures the model has the expected number of layers or modules.
• test_tester.py: Evaluation test
  • Validates the model's testing process using sample data.
  • Uses a basic model and a sample dataset to ensure that evaluation runs smoothly.

To run the tests, run pytest from the terminal in the project directory (Lunit_assignment).

Result (Train with new config file && Inference)

Training requires a new configuration file (.yaml), supplied either as the default argument in the code or from the terminal. The train flag defaults to True, and it can also be passed from the terminal:

    python client.py --train --config initial_experiment2.yaml

Note: For training with a new configuration:
• Use the command above or, alternatively, set the desired configuration file as the default argument within the code.
• All configuration files are stored in the configs folder in the project's main directory.
• They are organized by name for easy reference.

Executing inference:

    python client.py --test
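
The two commands above imply a small argparse interface in client.py. The following is a minimal sketch of how such an interface might be wired, not the project's actual code: the function names, the help strings, and the default configuration file name are all hypothetical, and treating --train and --test as mutually exclusive is an assumption.

```python
import argparse


def build_parser():
    """Build a parser with the flags used in the commands above."""
    parser = argparse.ArgumentParser(description="MNIST client (illustrative sketch)")
    # Assumption: a request is either a training or a testing request, never both.
    mode = parser.add_mutually_exclusive_group()
    mode.add_argument("--train", action="store_true", help="send a training request")
    mode.add_argument("--test", action="store_true", help="send an inference request")
    # Hypothetical default; the report only says a default is set in the code.
    parser.add_argument("--config", default="initial_experiment.yaml",
                        help="configuration file (.yaml) used for training")
    return parser


def parse_client_args(argv=None):
    """Parse argv (a list of strings); None falls back to sys.argv[1:]."""
    return build_parser().parse_args(argv)


def select_mode(args):
    """Return 'test' only when --test is given; training is the default."""
    return "test" if args.test else "train"


# Equivalent to: python client.py --train --config initial_experiment2.yaml
args = parse_client_args(["--train", "--config", "initial_experiment2.yaml"])
print(select_mode(args), args.config)
```

Making training the fallback mirrors the report's statement that the train option defaults to True, so running the client with no flags would still send a training request.
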
