Fabric Composition Identification using Fine-Tuned Vision Transformers

With retail e-commerce growing among both producers and consumers, fabric composition identification has attracted recent interest. Existing solutions rely on NIR sensors or 3D tactile sensors, which makes them impractical and complex to design and operate. This research proposes a simple and effective method that requires no specialized equipment and can be tested on photographs taken with a smartphone. A fine-tuned ViT model extracts a feature vector from each image. The features are processed with PCA-LDA and then fed into an SVM for training and classification. Finally, the resulting probability estimates are calibrated with DGG and used to make predictions. For pure fabrics, the model achieved an average F-score of 0.87, a log-loss of 0.44, and a mean squared error of 0.20. These findings are then extended to multi-label classification and composition identification.

Keywords: Fabric Composition, Vision Transformers, Transfer Learning, PCA-LDA, Probability Calibration, SHAP
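
As an illustration of the classification stage described in the abstract (features, then PCA-LDA, then SVM, then probability calibration), the following is a minimal sketch using scikit-learn. It is not the authors' implementation. Several assumptions are made: the ViT feature vectors are replaced by synthetic 768-dimensional Gaussian clusters, the number of classes and the PCA dimensionality are arbitrary, and sigmoid (Platt) calibration via CalibratedClassifierCV is used as a stand-in because the DGG calibration step is not reproduced here. The helper names make_fake_embeddings and build_classifier are hypothetical.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import f1_score, log_loss
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC


def make_fake_embeddings(n_per_class=60, n_classes=4, dim=768, seed=0):
    """Synthetic stand-in for ViT feature vectors (assumption, not real data)."""
    rng = np.random.default_rng(seed)
    centers = rng.normal(scale=1.0, size=(n_classes, dim))
    X = np.vstack(
        [c + rng.normal(scale=2.0, size=(n_per_class, dim)) for c in centers]
    )
    y = np.repeat(np.arange(n_classes), n_per_class)
    return X, y


def build_classifier(n_pca=50, n_classes=4):
    """PCA -> LDA -> SVM, wrapped in a probability calibrator."""
    base = make_pipeline(
        PCA(n_components=n_pca),                                 # compress features
        LinearDiscriminantAnalysis(n_components=n_classes - 1),  # supervised projection
        SVC(kernel="rbf"),                                       # margin classifier
    )
    # Sigmoid calibration is an assumed substitute for the DGG step.
    return CalibratedClassifierCV(base, method="sigmoid", cv=3)


if __name__ == "__main__":
    X, y = make_fake_embeddings()
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=0
    )
    clf = build_classifier().fit(X_tr, y_tr)
    probs = clf.predict_proba(X_te)  # calibrated class probabilities
    preds = clf.classes_[probs.argmax(axis=1)]
    print("macro F-score:", f1_score(y_te, preds, average="macro"))
    print("log-loss:", log_loss(y_te, probs, labels=clf.classes_))
```

With real data, X would hold the feature vectors produced by the fine-tuned ViT and y the fabric labels; the metrics printed for the synthetic clusters say nothing about the results reported above.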