Transfer Learning for Crop Classification by Inter-Regional Crop Spectral Differences

Hengbin Wang1, Yu Yao1, Zijing Ye1, Wanqiu Chang1, Junyi Liu1, Yuanyuan Zhao1,2*, Shaoming Li1,2, Zhe Liu1,2, Xiaodong Zhang1,2

1 College of Land Science and Technology, China Agricultural University, Beijing, China
2 Key Laboratory of Remote Sensing for Agri-Hazards, Ministry of Agriculture and Rural Affairs, Beijing, China
* Correspondence: zhaoyuanyuan@cau.edu.cn

Abstract

Transferring a model trained in a region with sufficient samples to a region with sparse samples can solve the classification problem in sample-sparse regions, but the effect of the transfer depends on the difference in geographical environment between the regions. Therefore, this study proposes a transfer learning strategy that uses Inter-Regional Differences of Crop Spectra (IRDCS) to adapt to inter-regional differences in geographic environments and thereby achieve large-scale transfer of classification models. In addition, a new crop classification model named Symmetric convolutional neural network with Position encoding (PoSyNet) is proposed. This model uses the temporal information of the critical period of crop growth to discriminate between different crops within the same time period. A simple and effective method for quantifying IRDCS is designed, and the IRDCS obtained by this method is representative and generalizable. To test the proposed approaches, the northwest region of China, with a sufficient number of training samples, is used as the source domain, and the northeast region of China, with fewer samples, is used as the transfer target domain. PoSyNet is compared with Transformer, Random Forest (RF), and Convolution+Transformer (CT).
In the source domain crop classification experiments, PoSyNet achieved an Overall Accuracy (OA) of 93.14%, an improvement of 3.58%, 4.8%, and 1.82%, respectively, over the other three methods, and PoSyNet still achieved the best classification results in the target domain. The results showed that PoSyNet can learn multi-level semantic features and generalizes spatially. We found that IRDCS can adapt to the differences in geographic environments between regions, and adding IRDCS to model transfer can substantially improve classification accuracy. In the cross-regional transfer experiments, all four methods improved the accuracy of model transfer across years by more than 25% after adding IRDCS. In the scenario without current-year samples, the accuracy improvement of the four methods was still close to 25% using only the IRDCS of samples from historical years. The results indicated the generality of the proposed transfer strategy. Overall, this study provides a new idea for large-scale crop classification and cross-regional model transfer.

Keywords: transfer learning; IRDCS; PoSyNet; quantifying differences; crop classification

1. Introduction

Accurate, timely, and reliable crop-type distribution maps provide important support for crop growth monitoring, yield prediction, and food security (Franch et al 2015, Gallego et al 2008). In recent years, remote sensing technology has developed rapidly, and its wide observation range and long observation duration have made it widely used in crop monitoring. The high-resolution images provided by remote sensing greatly facilitate large-scale crop classification (Chen et al 2015, Gong et al 2013, Yu et al 2013), and crop classification using the unique spectra of each crop provided by remote sensing has become the most common method (Bargiel 2017, Skakun et al 2017, Zhang et al 2014, Zhong et al 2019a).
Ground-labeled samples are an important basis for achieving high accuracy in crop classification, and a rich labeled sample set can further improve the classification accuracy (Ghazaryan et al 2018). Field surveys and sampling of multiple areas are required for large-scale crop classification to meet the demand for classification samples, but such large-scale sample acquisition is labor-intensive and very expensive. Some studies acquired manually labeled samples by visual interpretation of Google Earth high-resolution images (Dong et al 2015, Zhang et al 2015). These methods are feasible for acquiring labeled samples of bulk crops, but they require experts with extensive knowledge and are not suitable for acquiring a large volume of labeled samples. The Cropland Data Layer (CDL) released by the USDA is an effective way to obtain ground-based crop types (Boryan et al 2011), but acquiring crop type maps from such a large number of field-labeled samples is not easy to achieve in other countries (Song et al 2017).

Classification models with high robustness and generalization are another basis for crop classification. Several studies have shown that Deep Learning (DL) models are more robust and more generalizable than traditional machine learning methods for crop classification (Xu et al 2020, Yuan et al 2022, Yuan et al 2023, Zhong et al 2019a). In a few cases, machine learning methods such as Random Forests and Multilayer Perceptron (MLP) remain in common use (Lin et al 2022, Skakun et al 2016, Wang et al 2019). Among DL models, Recurrent Neural Networks (RNN) (Minh et al 2018, Sharma et al 2018), Transformer (Yuan & Lin 2021) and Convolutional Neural Networks (CNN) (Lei et al 2021, Wang et al 2022) perform particularly well in pixel-level crop classification because of their ability to model high-dimensional series data.
Further, fusing two deep learning models has also received extensive attention (Wang et al 2023). Existing DL models in crop classification are mostly derived from other fields, including Computer Vision (CV) and Natural Language Processing (NLP). Although these models can effectively solve problems in crop classification, they are not well matched to the task (Güldenring & Nalpantidis 2021). In addition, DL depends on sufficient samples and does not perform well in regions with sparse samples.

Existing approaches to crop classification in sample-sparse regions can be divided into two categories based on the source of training samples (Aneece & Thenkabail 2021, Hao et al 2020, Hao et al 2016, Huang et al 2020).
(1) Sample generation: generate a large number of training samples in regions with sparse samples, and then use the generated samples to train the model in the target domain.
(2) Transfer learning: train the model in sample-rich regions, and then use the trained model for classification in sample-sparse regions.
Specifically, while sample generation produces a large number of samples, the generated samples depend on a small number of samples from the target domain, so the two sets are extremely similar and the trained model has obvious drawbacks (Belgiu et al 2021b). In addition, some sample generation methods need to find key screening points in the spectral profile (Belgiu et al 2021a, Lin et al 2022, Malambo & Heatwole 2020), which does not suit most crops. Transfer learning is the most direct way to solve the crop classification problem in sample-sparse regions, and models trained on the rich samples of the source domain are robust (Nowakowski et al 2021a). There are limits, however: model transfer on a small scale is advantageous, whereas larger-scale transfer is influenced by climatic conditions and the growing environment (Wang et al 2019).
Several recent studies have revealed changes in the consistency and stability of crop spectra over time, both between years and between regions (Lin et al 2022, Wang et al 2019, Zhong et al 2014). The impact of these changes in crop spectra on transfer learning is significant, especially under different weather and management practices (Johnson & Mueller 2021). In fact, the most important factor affecting transfer learning is the difference in the spectra of the same crop between regions. Because the spectra of the same crop are similar within a region, it is feasible to calculate the differences in crop spectra between the sample-sparse and sample-rich regions. However, how to quantify and adapt to this difference remains the biggest challenge for cross-regional model transfer.

Direct model transfer cannot obtain reliable classification results, and the phenology of the same crop differs between geographical environments. In this paper, we therefore proposed a transfer learning strategy incorporating IRDCS and verified the generality of the transfer strategy using four classification methods in a study area geographically separated by more than 3000 km, with the sample-sufficient Northwest China region as the source domain and the sparsely sampled Northeast China region as the target domain. Our novel contributions were threefold: 1) we constructed a variable dimensional convolution module to extract different forms of semantic features, and designed a symmetric structured network model to ensure no loss of spectral information; 2) we proposed a computational method to quantify IRDCS and incorporated IRDCS into transfer learning to solve the problem of unsatisfactory performance when transferring models across regions; 3) we evaluated the cross-regional and cross-year generality of IRDCS, highlighting the superiority of variable dimensional convolutional modules and symmetric network structures.

2. Related work

2.1 Deep learning structures in crop classification

In recent years, DL has become a cutting-edge technology for crop classification, thanks to its excellent performance in processing series data. RNN, Transformer and CNN are three commonly used DL structures in crop classification, which are good at extracting temporal features, global features and local features, respectively. For example, Xu et al (2020) used a variant of RNN, Long Short Term Memory (LSTM), to obtain temporal features from multi-temporal remote sensing data and demonstrated that LSTM outperformed RF, MLP and Transformer, which are not good at extracting temporal information. Further, Yuan et al (2022) used the Transformer encoding module to extract global and temporal features using temporal-spatial domain spectra as input. Wang et al (2023) added a local feature extraction module to this for classification. The results show that the more complete the features, the better the classification performance. However, in another work, Rawat et al (2021) found that a one-dimensional CNN performed better than a hybrid one-dimensional CNN-LSTM. This indicates that although fusion structures can theoretically lead to better results, the compatibility between structures needs to be considered.

Although significant progress has been made in DL architectures in crop classification, especially RNN and Transformer, the application of CNN in crop classification has rarely been considered, mainly because CNN is not good at processing in the temporal domain (Xu et al 2020). In fact, RNN cannot be computed in parallel, and Transformer requires a large number of training samples, which limits their further development. Theoretically, the Convolutional Layer of a CNN can extract local features and the Fully Connected Layer can fuse all local features to form global features; if temporal features are introduced on this basis, complete features can be formed.
Therefore, it is worth considering how to further develop CNN in crop classification.

2.2 Transfer learning in crop classification

Transfer learning is an important solution to the problem of crop classification in regions with sparse samples. The source domain for transfer learning can be another field; for example, Nowakowski et al (2021b) transferred a classification model from CV to crop classification and achieved accuracy improvements. Similar work has been reported elsewhere (Wang et al 2021, Yuan & Lin 2021). However, the upper performance bound of model transfer across domains is often limited by the difference between the source and target domains.

The source domain for transfer learning can also be a historical year of the sample-sparse region. Such studies have used historical samples from several years to train the model, or to generate current-year samples for training, and thus complete the current-year crop classification (Hao et al 2016, Konduri et al 2020, Yaramasu et al 2020). These efforts are based on the premise that crop phenology is consistent between years; however, inter-annual variation in phenology arises over time (Lin et al 2022).

More often, the source domain of transfer learning is another region. For example, several studies have trained models in sample-rich regions to achieve cross-state model transfer in the US (Wang et al 2019, Xu et al 2020). On a larger scale, Hao et al (2020) used high-confidence CDL data as training samples, and the trained model achieved satisfactory results in China, achieving cross-country transfer. The essence of transfer learning is the assumption that phenology and growth patterns are similar for the same crop (Hao et al 2020), but this assumption holds only for a limited number of regions or years because of differences in regional cropping environments or yearly cropping management. However, existing studies have focused on the transfer models themselves.
So far, this difference has not been investigated in transfer learning.

3. Materials

3.1 Study area

Fig. 1. Geography of three study areas. The upper left study area is the Ili River Valley (Study area Ⅰ), the upper right study area is Western Heilongjiang (Study area Ⅱ) and the lower right study area is Eastern Heilongjiang (Study area Ⅲ).

Three study areas were selected for this study, as shown in Fig. 1. The first study area is located in the Ili River valley in northwest China, which has a continental temperate climate with high precipitation, a large diurnal temperature difference, abundant light and heat resources, and annual sunshine hours between 2699 and 3158 h. The main crops in this study area are maize, rice, and wheat, which account for more than 90% of the total cultivated area, while other crops such as soybean and sunflower are grown in smaller proportions. The second and third study areas are located in the northeast region of China, with unique black land cultivation conditions and a cold-temperate continental monsoon climate with abundant rainfall and sunshine hours between 2560 and 2700 h. The main crops in these study areas are soybean, rice, maize, and wheat, which account for more than 95% of the total cultivated area; this paper focuses on the transfer effects for soybean, rice, and maize. The three study areas are geographically extremely far apart, but their main cropping structures are similar, which makes them suitable for verifying the performance of transfer learning. In addition, because the study areas belong to different climates and differ in landscape conditions and land resources, the phenology of the same crop can differ, which is an important reason for selecting these two regions as the study areas.

3.2 Satellite imagery and sample dataset

We used remote sensing images from the GaoFen-1 satellite with a temporal resolution of 4 days and a spatial resolution of 16 m.
The remote sensing images contain four bands: blue, green, red and near-infrared. The remote sensing images from March 1 to November 1 were selected for the time series analysis according to the crop cultivation rules and the climatic conditions of the two regions. After 10% cloud screening, the number of image acquisition dates differed between regions, which led to inconsistent time series lengths, so we linearly interpolated the time series to an interval of 10 days. Study area Ⅰ has rich field samples, while study areas Ⅱ and Ⅲ have fewer. Study area Ⅰ used samples from Northwest China in 2017, and the other two study areas used samples from two consecutive years, 2017 and 2018. The sample categories common to the three study areas were rice, maize, and soybean, while other crops, as well as land cover types, were combined into an "other" category. The number of samples in each category is shown in Table Ⅰ.

Table Ⅰ Sample size of the three study areas

Crop      2017                                        2018
          Study area Ⅰ  Study area Ⅱ  Study area Ⅲ   Study area Ⅱ  Study area Ⅲ
Rice      30771         39            197            139           70
Maize     171126        440           515            896           722
Soybean   9457          1207          571            896           223
Other     25584         364           197            277           84
Total     236938        2050          1480           2208          1099

4. Methods

Our framework is designed to address the problem of crop classification and mapping in regions with sparse samples. As shown in Fig. 2, the proposed framework is divided into three parts: data processing, crop classification and validation, and crop mapping. In the first part, the data in the source and target domains are divided into training samples, validation samples and samples for calculating IRDCS. Of these, the proportion of the first two is 80% and the proportion of the latter is 20%. The IRDCS is applied to the validation samples to form new validation samples for the target domain.
In the second part, rich training samples from the source domain are input to VdPyNet, the trained model is transferred to the target domain, and three transfer results are obtained. In the third part, the optimal weights are first taken from the second part. Then the IRDCS of the three crops obtained in the first part are introduced into the data to be mapped to form the new data to be mapped. Finally, the new data to be mapped are fed to the trained VdPyNet for prediction, and the prediction result after classification probability filtering is the final mapping result.

Fig. 2. Illustration of the proposed framework. Panels: Data Processing; Crop Classification and Validation; Crop Mapping.

4.1 Transfer learning using spectral differences

Stage 1. Dividing samples. To prevent pixels of the same plot from being divided into both training and validation sets, we devised an iterative sample division method. First, all pixels are divided 8:2 to obtain n_{train} and n_{difference}, where n_{train} in the source and target domains represents the training and validation sets, respectively. Then all plots are divided 8:2 into N_{train} and N_{difference}, where the sets of pixels in N_{train} and N_{difference} are \hat{n}_{train} and \hat{n}_{difference}, respectively. Finally, the discrepancy between the plot-level pixel sets (\hat{n}_{train}, \hat{n}_{difference}) and the pixel-level sets (n_{train}, n_{difference}) is used as the optimization function, and the division for which it is smallest is taken as the final division result.

Stage 2. Calculating spectral differences between individual samples. The motivation for IRDCS comes from image denoising, where all the differences between an image with noise and an image without noise are reflected in the noise, and more specifically in the differences in each pixel. Thus, the crop spectral reflectance in the source domain can be seen as pure, while the crop spectral reflectance in the target domain is noisy, as reflected in the spectral reflectance on each Day of Year (DOY) of the crop growth period. The relationship between the two satisfies equation (1):

y(t) = x(t) + g(t)    (1)

where y(t) and x(t) denote the crop spectral reflectance of the target and source domains, respectively, g(t) denotes the noise between them, and t denotes the DOY. However, when x(t) is 0, the noise remains, which defies common sense. Therefore, in equation (1), we add a causal factor x(t) to the noise, i.e.

y(t) = x(t) + x(t) g(t)    (2)

Stage 3. Calculating IRDCS. Since there is no correspondence between the source and target domain samples, a fuzzy correspondence is used for the calculation of IRDCS. Taking each source domain sample x_j as a unit, L samples y_i are randomly selected from the target domain as its fuzzy set. The spectral difference between x_j in the source domain and each y_i is calculated, and the mean value of the result is used as the IRDCS. The IRDCS is computed as follows:

G = \frac{1}{n} \sum_{j=1}^{n} \frac{1}{L} \sum_{i=1}^{L} \frac{y_i - x_j}{x_j}    (3)

where G denotes the IRDCS, L \in [0, c] is a random number, and c is the maximum number of samples of each class used to calculate the spectral difference in the target domain. n denotes the number of samples in the source domain. Further, G_C denotes the IRDCS of different crops and C denotes the crop category.

Stage 4. Crop mapping using IRDCS. That G_C is unique to each crop is a prerequisite for crop mapping using IRDCS. The data to be mapped, X, is combined with the IRDCS to form the new data to be mapped, \hat{X}.
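Stages 2 to 4 can be illustrated with a short sketch. This is not the authors' implementation: the function names `compute_irdcs` and `to_source_like` are hypothetical, the fuzzy-set size is drawn from 1..c rather than [0, c] so that the set is never empty, the difference is averaged per time step for a single band, and all source reflectances are assumed to be non-zero. The mapping step follows from Eq. (2): if y = x (1 + g), dividing target-domain data by 1 + G_C moves it toward the source domain.

```python
import random


def compute_irdcs(source, target, max_fuzzy=5, seed=0):
    """Estimate G(t) for one crop as a mean relative spectral difference.

    source, target: lists of time series (lists of reflectance values of
    equal length). For every source sample x_j a random "fuzzy set" of
    target samples is drawn, and (mean(y_i(t)) - x_j(t)) / x_j(t) is
    averaged over all source samples.
    """
    rng = random.Random(seed)
    steps = len(source[0])
    totals = [0.0] * steps
    for x in source:
        # Assumption: at least one target sample per fuzzy set.
        size = rng.randint(1, min(max_fuzzy, len(target)))
        fuzzy = rng.sample(target, size)
        for t in range(steps):
            mean_y = sum(y[t] for y in fuzzy) / size
            totals[t] += (mean_y - x[t]) / x[t]  # assumes x[t] != 0
    return [v / len(source) for v in totals]


def to_source_like(series, g):
    """Apply X_hat(t) = X(t) / (1 + G(t)) to one target-domain time series."""
    return [v / (1.0 + gt) for v, gt in zip(series, g)]


# Toy data: the target crop is 10% and 25% brighter at the first two dates.
source_samples = [[0.2, 0.4, 0.6], [0.2, 0.4, 0.6]]
target_samples = [[0.22, 0.5, 0.6]] * 4
g_crop = compute_irdcs(source_samples, target_samples, max_fuzzy=3, seed=1)
mapped = to_source_like([0.22, 0.5, 0.6], g_crop)
```

In this toy case `g_crop` recovers the relative offsets between the two regions, and `mapped` is close to the source spectrum, which is the form of input the source-trained model expects.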
The new data, \hat{X} = X / (1 + G_C), is fed into the trained model for prediction. It is worth noting that we did not use the source domain spectra minus the IRDCS to generate the target domain spectra. Converting the rich and diverse samples in the source domain into samples that are very similar to the sparse target domain spectra is very uneconomical, and the resulting trained model has poor generalization ability.

4.2 Variable-dimension Position Symmetric Net

The time at which crop spectral reflectance is acquired is an important feature, and combining spectral reflectance with time as the input of the classification model can improve the model's understanding of crop growth patterns. By setting different depths of convolutional layers, features of different layers can be extracted to learn the spectral variation of crops in a diversified way. Based on this, we designed the Variable-dimension Position Symmetric Net (VPSNet) model for crop classification, which takes the crop spectra and acquisition times as input, a symmetric network architecture as the feature extractor, and a single Fully Connected Layer as the classifier. VPSNet consists of several blocks, each with a different depth of hidden layers, to extract the features of different layers. The specific network architecture is shown in Fig. 3.

Input module: the time series se_1, se_2, ..., se_T of each pixel is encoded as a d-dimensional spectral feature vector sf_i by equation (4), where se_t denotes the spectral reflectance of the four bands, T denotes the length of the time series, and i \in [0, T]. The acquisition times doy_1, doy_2, ..., doy_T of the time series are encoded as a d-dimensional temporal feature vector tf_i by equation (5). The spectral and temporal features are concatenated into x_i by equation (6) as the input to the model.

sf_i = f(se_i)    (4)

tf_i(p) = \sin(doy_i / 1000^{2k/d}) if p = 2k;  tf_i(p) = \cos(doy_i / 1000^{2k/d}) if p = 2k + 1    (5)

x_i = Concat(tf_i, sf_i)    (6)

where k \in [0, d], d is the dimensionality of the feature vector, and i indexes the time steps as above.
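The input module of Eqs. (4) to (6) can be sketched as follows. This is a minimal illustration under stated assumptions rather than the paper's code: the helper names are hypothetical, the mapping f in Eq. (4) is not specified in the text and is assumed here to be a linear projection with a d x 4 weight matrix, and k is taken as the integer part of p / 2.

```python
import math


def encode_doy(doy, d, base=1000.0):
    """Temporal feature tf_i of Eq. (5): sine on even positions, cosine on odd."""
    vec = []
    for p in range(d):
        k = p // 2  # assumption: p = 2k or p = 2k + 1
        angle = doy / (base ** (2 * k / d))
        vec.append(math.sin(angle) if p % 2 == 0 else math.cos(angle))
    return vec


def project_spectra(bands, weights):
    """Spectral feature sf_i of Eq. (4); f is assumed to be linear here."""
    return [sum(w * b for w, b in zip(row, bands)) for row in weights]


def build_input(bands, doy, weights):
    """Model input x_i of Eq. (6): Concat(tf_i, sf_i)."""
    d = len(weights)
    return encode_doy(doy, d) + project_spectra(bands, weights)


# Toy example: four band reflectances observed on DOY 120, with d = 4.
toy_weights = [
    [0.5, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 2.0],
]
x_i = build_input([0.1, 0.2, 0.3, 0.4], 120, toy_weights)
```

Because the temporal vector depends only on the DOY, two crops observed on the same date receive the same temporal code, so the network can compare their spectra date by date.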
VPSNet Module: In this study, each Block in VPSNet has a different hidden layer depth and extracts features from the input vector at a different level. Although the depth of the middle hidden layer of each Block changes with the number of Blocks, the input layer and the output layer are of equal depth, which ensures that the information of the input feature vector is not lost. Compared with images, the redundancy of crop time series is low, so a CNN structure similar to those used for image classification cannot be used. We designed the Block with a variable depth of the middle hidden layer to ensure no information loss and to give the network a diverse hierarchy. In addition, to prevent the gradient from vanishing during training, we adopted a residual structure in each Block. The hidden layer vectors h, the input vector input and the output vector output of each Block can be expressed as

input^l = output^{l-1} + input^{l-1},  h^l_{l_1} = LayerNorm(w^l_{l_1} * input^l + b^l_{l_1})    (7)

h^l_{l_2} = act(w^l_{l_2} h^l_{l_1} + b^l_{l_2}),  output^l = w^l_{l_3} h^l_{l_2} + b^l_{l_3}    (8)

where l denotes the lth Block, l_i denotes the ith hidden layer in the lth Block, w and b denote the learnable weights and biases, act is the GELU activation function, and * denotes the convolution operation. LayerNorm is applied after the first convolution layer; it is more effective than BatchNorm for sequence processing, as has been demonstrated in previous studies (Liu et al 2022). The first hidden layer is a one-dimensional convolution whose kernel size is chosen from the candidate values 3, 5, 7, 9, 11. The last two hidden layers are Linear Layers, which extract features in multiple dimensions. Compared with the traditional CNN structure, VPSNet uses fewer activation layers: previous network structures used a fixed pairing of convolution + activation, whereas we use only one activation function in a Block.
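A single Block can be sketched in plain Python to make the data flow concrete: convolution over time, LayerNorm, a linear expansion with GELU, a linear projection back to the input width, and a residual sum. This is an assumption-laden sketch, not the published model: the function names and `init_params` are hypothetical, the convolution uses zero padding, LayerNorm has no learnable scale or shift, and dropout is omitted.

```python
import math
import random


def gelu(v):
    return 0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0)))


def layer_norm(vec, eps=1e-5):
    mean = sum(vec) / len(vec)
    var = sum((v - mean) ** 2 for v in vec) / len(vec)
    return [(v - mean) / math.sqrt(var + eps) for v in vec]


def linear(vec, weight, bias):
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weight, bias)]


def conv1d_same(seq, kernel, bias):
    """1-D convolution over time; seq is T x d, kernel is d_out x d_in x k."""
    steps, half = len(seq), len(kernel[0][0]) // 2
    out = []
    for t in range(steps):
        row = []
        for o, filt in enumerate(kernel):
            acc = bias[o]
            for c, taps in enumerate(filt):
                for j, w in enumerate(taps):
                    s = t + j - half
                    if 0 <= s < steps:  # zero padding outside the series
                        acc += w * seq[s][c]
            row.append(acc)
        out.append(row)
    return out


def block_forward(seq, params):
    """Return output + input, i.e. the input handed to the next Block."""
    h1 = [layer_norm(v) for v in conv1d_same(seq, params["w1"], params["b1"])]
    h2 = [[gelu(v) for v in linear(h, params["w2"], params["b2"])] for h in h1]
    out = [linear(h, params["w3"], params["b3"]) for h in h2]
    return [[a + b for a, b in zip(x, o)] for x, o in zip(seq, out)]


def init_params(d, hidden, k, seed=0):
    """Small random weights; hidden > d gives the widened middle layer."""
    rng = random.Random(seed)
    return {
        "w1": [[[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(d)] for _ in range(d)],
        "b1": [0.0] * d,
        "w2": [[rng.uniform(-0.1, 0.1) for _ in range(d)] for _ in range(hidden)],
        "b2": [0.0] * hidden,
        "w3": [[rng.uniform(-0.1, 0.1) for _ in range(hidden)] for _ in range(d)],
        "b3": [0.0] * d,
    }


toy_params = init_params(d=4, hidden=8, k=3, seed=0)
toy_seq = [[0.1 * (t + c) for c in range(4)] for t in range(5)]
toy_out = block_forward(toy_seq, toy_params)
```

The output keeps the 5 x 4 shape of the input, which is the equal input and output depth the text relies on to avoid losing spectral information; only the middle layer width changes.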
The inverse bottleneck structure allows VPSNet to reduce the number of parameters while maintaining competitive performance and improving the efficiency of the overall model.

Output module: This module consists of a Global Maximum Pooling Layer and a Linear Layer. Softmax is applied to the Linear Layer output vector to obtain the classification labels and classification probabilities. The training process uses the cross-entropy function as the loss function and the Adam optimizer for backpropagation. To prevent the model from overfitting, Dropout is used, which randomly drops units in each Block; the dropout rate is set to 0.1.

4.3 Experimental setup

4.3.1 Different forms of network architectures

To evaluate the performance of the proposed variable-dimension Block, we compared five different forms of network architecture (Fig. X). To ensure a fair comparison, all network structures have the same number of Blocks and all include a position encoding module. A brief description of the five structures is as follows:
(1) Invariant dimensional (ID): In this architecture, all blocks have the same dimensions and the inputs and outputs of each block are of the same dimension. The overall structure is symmetric.
(2) Up-sampling (US): In this architecture, the dimensionality of blocks is variable, the dimensionality of each block is incremental, and the input and output of each block are equally dimensional. The overall structure is asymmetric.
(3) Down-sampling (DS): In this architecture, the dimensionality of blocks is variable, the dimensionality of each block is decreasing, and the input and output of each block are equally dimensional. The overall structure is asymmetric.
(4) Input-output non-reciprocal (IONR): In this architecture, the dimensionality of the blocks is variable, and the inputs and outputs of each block are non-identical in dimensionality. The overall structure is symmetric.
(5) No position encoding (NoPE): In this architecture, the dimensionality of blocks is variable, and the input and output of each block are equally dimensional. The overall structure is symmetric, but the inputs are not position encoded.

4.3.2 Different classification methods

For comparison, we compared VPSNet with three classification methods in the three study areas. RF is an effective traditional classification method known for its avoidance of overfitting and low complexity. RF is often built as a baseline model, and it performs very well in crop classification (Lin et al 2022, Wang et al 2019). Transformer consists of multiple self-attention layers that can extract dependencies between long time sequences and has been successfully applied in several crop classification works (Yuan & Lin 2021, Yuan et al 2022). Adding CNN to the Transformer has also become a mainstream solution in crop classification, where features can be extracted from both global and local perspectives (Li et al 2020, Wang et al 2023). VPSNet can be compared with RF to highlight the importance of temporal-domain learning, with Transformer to highlight the importance of local feature learning, and with CNN+Transformer to highlight the advantages of the VPSNet structure.

For the hyperparameter settings, RF has two parameters to be set, n_estimator and max_features, which are 500 and 4, respectively, in this study. The Transformer hyperparameters are: the self-attention hidden layer dimension is 256, the number of heads of multi-headed attention is 8, the number of Transformer Block layers is 3, and the fully connected layer dimension is 1024. In CNN+Transformer, the hyperparameters of the Transformer part remain unchanged. In the CNN part, the convolutional layer dimension is 256, the convolutional kernel is 3 × 3, and the stride is 1. The other settings of Transformer and CNN+Transformer, such as the optimizer and loss function, are the same as for VPSNet.
4.3.3 Different ways of IRDCS utilization

In Section 4.1 we introduced one way of utilizing IRDCS. For comparison, we devised another: after obtaining the IRDCS, we applied it to the source domain to generate training samples for the target domain, and trained the model using the newly generated samples. In contrast to the approach in Section 4.1, this approach generates a large number of samples in the target domain to solve the crop classification problem in regions with sparse samples. The purpose is to emphasize the difference between generated samples and model transfer.

4.3.4 Experimental setup and accuracy assessment

The number of blocks in VPSNet is set to 9, 11, or 13, and the dimension of the initial hidden layer is 256. Study area Ⅰ is trained with epoch=20 and study areas Ⅱ and Ⅲ with epoch=1000; the batch size is set to 256, the learning rate to 1e-5, and the dropout rate to 0.1. The optimizer is Adam and the loss function is the cross-entropy function. The entire experiment was run on a Windows platform configured with an i7-11700K @ 3.60 GHz, 32 GB RAM, and an NVIDIA GeForce RTX 3080 GPU (10 GB RAM), and all programs were written in Python.

In this study, several evaluation metrics were used to evaluate the proposed transfer learning strategy and classification model. Overall Accuracy (OA) is used to evaluate the classification accuracy of the classification model and the overall performance of the transfer learning strategy. Intersection over Union (IoU) measures the overlap of the prediction results of two crop types and is used to evaluate the mapping effectiveness of the transfer learning strategy. Equations (9) and (10) were used to calculate OA and IoU, and Equation (11) the per-class F1 score.
OA = \frac{\sum_{i=1}^{N} n_i^{corr}}{\sum_{i=1}^{N} n_i^{all}}    (9)

IoU_{i,j} = \frac{n_i^{pred} \cap n_j^{pred}}{n_i^{pred} \cup n_j^{pred}}    (10)

F1_i = \frac{2 \cdot UA_i \cdot PA_i}{UA_i + PA_i}    (11)

where N represents the number of categories, i represents the ith category, n_i^{corr} represents the number of samples correctly classified in the ith category, and n_i^{all} denotes the number of samples used for validation in the ith category. n^{pred} represents the number of samples predicted, \cap represents the number of pixels that overlap in the prediction of two categories of crops, and \cup represents the sum of all pixels in the prediction of two categories of crops. UA and PA represent user accuracy and producer accuracy.

5. Results and analysis

5.1 Comparison of different network architectures

Table 2 reports the accuracy results for different network architectures. Our method (VPSNet-B) performs best in three of the five scenarios and is within 0.3 percentage points of the best architecture in the other two, owing to its variable dimensional symmetric architecture. It can also be observed that the six architectures perform better in study area Ⅰ (SA Ⅰ) than in study area Ⅱ (SA Ⅱ) and study area Ⅲ (SA Ⅲ), which shows that the model achieves higher accuracy when samples are sufficient. Among all architectures, the US architecture performs the worst overall, which shows that asymmetrically increasing the dimensions of the hidden layers adds redundancy that harms the results. The IONR architecture performs second worst, and its inconsistent input and output dimensions corrupt the information of the original sequence. It is worth noting that the ID architecture achieves second-best results with fewer parameters, but performs very poorly in study area Ⅰ. The DS and NoPE architectures also perform satisfactorily, which demonstrates the importance of keeping the output and input dimensions consistent in crop classification.

Table 3 reports the F1 scores for each class obtained for the different network architectures in the three study areas for two years. We noticed that VPSNet-B obtained the best accuracy in 7 out of 15 scenarios.
For some scenarios, such as 2017_SA Ⅱ soybean and 2018_SA Ⅱ soybean, the proposed method obtained the second-best accuracy. For a few network architectures, however, the results vary widely across scenarios; for example, the US architecture has an F1 score of 0% for rice and maize in 2017_SA Ⅱ but a much higher score (76.64%) for soybean. For some scenarios, such as 2017_SA Ⅲ maize and 2018_SA Ⅲ rice, VPSNet scores lower than the DS or ID architecture, possibly because variable dimensionality (e.g., upsampling) adds redundant information that is retained in the classification features during subsequent processing; this behaviour occurs mostly in regions with sparse samples.

Table 2 Accuracy assessment (OA, %) of different network architectures

Network architecture   2017_SA Ⅰ   2017_SA Ⅱ   2017_SA Ⅲ   2018_SA Ⅱ   2018_SA Ⅲ   Parameter count
ID                     79.15       80.27       65.50       80.98       72.34       98k
US                     79.02       64.23       47.66       66.59       62.13       24M
DS                     95.03       76.89       66.67       80.73       68.94       24M
IONR                   80.84       76.89       61.99       77.80       57.87       17M
NoPE                   91.38       71.53       65.20       73.66       71.49       16M
VPSNet-B               95.53       81.02       66.37       80.73       76.17       16M

VPSNet-B: VPSNet Base Model, configured with 11 blocks. Parameter count: number of trainable parameters without/with auxiliary classifiers.

Table 3 F1-score per scenario (%) of different network architectures

Crop      Network architecture   2017_SA Ⅰ   2017_SA Ⅱ   2017_SA Ⅲ   2018_SA Ⅱ   2018_SA Ⅲ
Rice      ID                     99.01       75.00       82.80       88.89       80.77
Rice      US                     99.02       0.00        59.51       25.00       0.00
Rice      DS                     98.41       61.54       79.29       89.66       68.00
Rice      IONR                   98.81       76.89       78.37       96.25       33.33
Rice      NoPE                   89.80       54.55       82.42       75.00       79.31
Rice      VPSNet-B               98.92       80.00       79.10       96.30       73.68
Maize     ID                     85.63       75.15       54.54       81.70       81.02
Maize     US                     85.52       0.00        44.25       68.57       76.39
Maize     DS                     97.29       70.00       62.24       82.65       80.00
Maize     IONR                   87.53       67.47       54.36       78.02       71.35
Maize     NoPE                   94.96       47.95       57.14       74.01       79.61
Maize     VPSNet-B               97.69       76.83       56.99       80.38       84.32
Soybean   ID                     81.45       88.38       65.92       87.12       43.37
Soybean   US                     80.43       76.64       46.67       75.29       0.00
Soybean   DS                     82.33       84.46       65.85       85.48       8.00
Soybean   IONR                   80.84       84.97       62.73       85.71       21.33
Soybean   NoPE                   77.77       80.59       64.34       80.31       39.51
Soybean   VPSNet-B               80.97       87.65       68.50       86.81       58.95

5.2 Evaluation of VPSNet

In this set of experiments, we verified the effectiveness of the proposed VPSNet. On the basis of VPSNet-B, we added VPSNet-S and VPSNet-L, which are the small-scale and large-scale versions of VPSNet, respectively. As shown in Table 4, the results of VPSNet-B are generally better than those of the other three methods. Specifically, VPSNet-B improved the average OA by 3.14%, 1.7%, and 1.755% over the other methods in the three study areas, respectively. VPSNet-S also performed well, with the best results among the comparison methods in three of the five scenarios and a significant advantage in its number of parameters. VPSNet-L outperformed all the comparison methods in every scenario. It is noteworthy that all methods achieve very good results in SA Ⅰ, which is rich in samples, while the results in SA Ⅱ and SA Ⅲ, which are sparse in samples, are relatively low. This demonstrates that classification methods alone cannot solve the crop classification problem in sample-sparse areas.

Table 5 summarizes the F1 scores for each scenario for the different methods. VPSNet achieves the best performance in the vast majority of scenarios (12 out of 15), for example with an improvement of more than 6% for 2018_SA Ⅲ rice. The proposed method also outperforms CT, which likewise has the ability to extract complete features, in nearly all scenarios. This may be because, although CT is able to extract complete features, its feature fusion is imperfect owing to the mutual exclusivity between its individual network architectures.
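The OA and per-class F1 values reported in these tables can be derived from a confusion matrix. The following is a minimal sketch of Equations (9) and (11), assuming rows hold the reference labels and columns hold the predicted labels; the function names and the example matrix are hypothetical and are not data from this study.

```python
from typing import List

Matrix = List[List[int]]


def overall_accuracy(cm: Matrix) -> float:
    """OA: correctly classified samples divided by all validation samples."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


def f1_per_class(cm: Matrix) -> List[float]:
    """F1_i = 2 * UA_i * PA_i / (UA_i + PA_i) for every class i."""
    n = len(cm)
    scores = []
    for i in range(n):
        hits = cm[i][i]
        predicted_i = sum(cm[r][i] for r in range(n))  # column sum: UA denominator
        reference_i = sum(cm[i])                       # row sum: PA denominator
        ua = hits / predicted_i if predicted_i else 0.0
        pa = hits / reference_i if reference_i else 0.0
        scores.append(2 * ua * pa / (ua + pa) if (ua + pa) else 0.0)
    return scores


# Hypothetical 3-class confusion matrix (rice, maize, soybean).
example_cm = [
    [8, 1, 1],
    [0, 9, 1],
    [2, 0, 8],
]
example_oa = overall_accuracy(example_cm)
example_f1 = f1_per_class(example_cm)

# A class with no correct prediction gets F1 = 0, whatever the OA is.
missed_cm = [
    [0, 5],
    [0, 5],
]
missed_f1 = f1_per_class(missed_cm)
```

The second matrix illustrates how an F1 score of 0.00 can coexist with a non-zero OA: if no sample of a class is predicted correctly, both UA and PA for that class are zero.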
Table 4 Accuracy assessment (OA, %) of different methods

Method        2017_SA Ⅰ   2017_SA Ⅱ   2017_SA Ⅲ   2018_SA Ⅱ   2018_SA Ⅲ   Parameter count
RF            91.32       78.59       67.54       79.76       71.49       <1k
Transformer   77.89       76.64       62.28       78.29       69.79       24M
CT            92.39       80.05       61.70       80.71       67.23       28M
VPSNet-S      92.94       80.05       61.70       79.51       72.34       7.6M
VPSNet-B      95.53       81.02       66.37       80.73       76.17       16M
VPSNet-L      96.06       81.27       67.87       82.54       78.06       31M

VPSNet-S: VPSNet Small Model, configured with 9 blocks. VPSNet-L: VPSNet Large Model, configured with 13 blocks.

Table 5 F1-score per scenario (%) of different methods

Crop      Method        2017_SA Ⅰ   2017_SA Ⅱ   2017_SA Ⅲ   2018_SA Ⅱ   2018_SA Ⅲ
Rice      RF            98.68       0.00        79.01       92.31       69.57
Rice      Transformer   97.79       82.35       76.47       76.47       71.64
Rice      CT            88.44       80.05       76.39       82.76       69.57
Rice      VPSNet-S      98.25       66.67       76.54       88.00       71.70
Rice      VPSNet-B      98.92       80.00       79.10       96.30       73.68
Rice      VPSNet-L      98.20       71.43       80.21       98.89       80.00
Maize     RF            94.99       70.73       63.16       80.59       81.11
Maize     Transformer   84.80       67.05       52.91       63.53       48.57
Maize     CT            95.65       75.27       59.59       80.65       78.48
Maize     VPSNet-S      96.03       72.05       46.86       77.99       81.22
Maize     VPSNet-B      97.69       76.83       56.99       80.38       84.32
Maize     VPSNet-L      98.38       75.43       57.48       80.38       85.19
Soybean   RF            86.30       86.05       67.67       85.64       32.43
Soybean   Transformer   83.94       84.71       63.53       85.41       51.49
Soybean   CT            83.18       76.68       60.98       86.60       33.73
Soybean   VPSNet-S      80.27       87.48       63.80       86.61       53.06
Soybean   VPSNet-B      80.97       87.65       68.50       86.81       58.95
Soybean   VPSNet-L      81.42       87.90       67.88       87.33       63.68

5.3 Evaluation of IRCSD in Transfer Learning

In this set of experiments, we compared three transfer learning methods: (1) direct transfer (TL 1); (2) adding IRCSD to the source domain (TL 2); and (3) adding IRCSD to the target domain (TL 3). As shown in Table 6, TL 3 has significant advantages over the other two transfer learning methods. Specifically, TL 3 has a minimum improvement of more than 25% and a maximum improvement of more than 70% in OA over TL 1, and a minimum improvement of more than 9% and a maximum improvement of more than 45% over TL 2. The improvement is clear for all four methods, which demonstrates the generality of IRCSD.
The introduction of IRCSD significantly improved the transfer learning results, which demonstrates its effectiveness. It is worth noting that TL 2, although it improves on direct transfer, does not exceed the local results, while TL 3 obtains better results than the local ones. This shows the superiority of using IRCSD in TL 3 over TL 2. Although TL 2 generates samples in the target domain, the model trained on them is less generalizable.

Table 7 provides the F1 scores for each scenario for the different transfer learning methods using VPSNet-B. The introduction of IRCSD resulted in a significant increase in F1 scores in almost all scenarios, which demonstrates the effectiveness of the proposed IRCSD. For some scenarios, such as 2017_SA Ⅲ maize, 2018_SA Ⅲ maize, and 2018_SA Ⅲ rice, the TL 2 results after introducing IRCSD are worse than the TL 1 results, which means that the generated samples contain noise and false information. It can also be noticed that the transfer results of TL 1 are almost always unsatisfactory. Differences in geographic location and climatic conditions may be responsible for this result, and these differences are reflected in inter-regional crops as differences between crop spectra, as confirmed in Section 6.1.
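The three transfer settings differ only in where the spectral difference is applied. The toy sketch below illustrates this with a nearest-centroid classifier and a purely additive, class-independent difference; both are simplifying assumptions made for illustration and are not the classifier or the difference equation used in this study. All names and values are hypothetical.

```python
from statistics import mean
from typing import Dict, List

Series = List[float]


def centroid(samples: List[Series]) -> Series:
    """Mean value per time step over a list of samples."""
    return [mean(step) for step in zip(*samples)]


def fit(samples_by_class: Dict[str, List[Series]]) -> Dict[str, Series]:
    """'Train' a nearest-centroid model: one centroid per crop."""
    return {crop: centroid(s) for crop, s in samples_by_class.items()}


def predict(model: Dict[str, Series], x: Series) -> str:
    """Return the crop whose centroid is closest to x (squared distance)."""
    def dist(c: Series) -> float:
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(model, key=lambda crop: dist(model[crop]))


def shift(x: Series, diff: Series, sign: float) -> Series:
    """Add (sign=+1) or subtract (sign=-1) the difference from a series."""
    return [a + sign * d for a, d in zip(x, diff)]


# Hypothetical source-domain samples (two time steps per sample).
source = {
    "maize": [[0.80, 0.90], [0.82, 0.88]],
    "soybean": [[0.50, 0.60], [0.52, 0.58]],
}
# ASSUMPTION: difference = source minus target, identical for all crops.
diff = [0.30, 0.30]
# A target-domain maize sample; it looks like source-domain soybean.
target_sample = [0.52, 0.60]

source_model = fit(source)

# TL 1: apply the source model directly to the target sample.
tl1 = predict(source_model, target_sample)

# TL 2: move source samples into the target domain, then train on them.
generated = {c: [shift(s, diff, -1.0) for s in v] for c, v in source.items()}
tl2 = predict(fit(generated), target_sample)

# TL 3: move the target sample into the source domain, reuse the source model.
tl3 = predict(source_model, shift(target_sample, diff, +1.0))
```

In this idealized additive case TL 2 and TL 3 give the same answer; the gap between them reported in Table 6 is an empirical result that this toy example does not reproduce.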
Table 6 Accuracy assessment (OA, %) of different transfer learning methods

Method        Transfer learning   2017_SA Ⅱ   2017_SA Ⅲ   2018_SA Ⅱ   2018_SA Ⅲ
RF            TL 1                7.40        31.13       26.97       68.81
RF            TL 2                53.85       46.36       61.58       42.66
RF            TL 3                68.34       81.79       81.07       92.66
Transformer   TL 1                6.80        29.80       19.77       62.11
Transformer   TL 2                64.50       48.68       68.08       43.12
Transformer   TL 3                83.79       83.71       77.01       91.58
CT            TL 1                6.73        28.48       23.45       64.22
CT            TL 2                71.01       51.66       73.73       62.39
CT            TL 3                82.54       83.74       81.36       89.91
VPSNet-B      TL 1                7.14        30.53       19.21       64.86
VPSNet-B      TL 2                64.79       50.00       67.23       47.25
VPSNet-B      TL 3                82.84       82.45       86.72       93.12
Localmean                         79.08       64.47       79.87       71.17
Localmax                          81.02       67.54       80.73       76.17

Localmean: mean of the local results of the four methods. Localmax: maximum of the local results of the four methods.

Table 7 F1-score per scenario (%) of different transfer learning methods using VPSNet-B

Crop      Transfer learning   2017_SA Ⅱ   2017_SA Ⅲ   2018_SA Ⅱ   2018_SA Ⅲ
Rice      TL 1                4.82        32.40       11.67       49.41
Rice      TL 2                26.09       58.56       53.33       40.00
Rice      TL 3                73.68       77.92       46.15       84.21
Maize     TL 1                11.76       45.98       15.13       66.15
Maize     TL 2                9.43        41.18       63.95       52.94
Maize     TL 3                75.89       82.73       93.09       96.30
Soybean   TL 1                3.33        2.75        31.30       13.45
Soybean   TL 2                79.39       48.11       72.09       40.68
Soybean   TL 3                86.84       85.57       86.69       87.80

Table 8 Accuracy assessment of different transfer learning methods using historical IRCSD

Study Area Transfer Learning RF Transformer CT VPSNet-B 19.21 TL 1 26.97 19.77 23.45 TL 2 25.71 31.07 44.63 2018_SA TL 3 75.71 83.03 76.55 Ⅱ TL 2max 73.73 TL 3max 86.72 TL 1 68.81 62.11 64.22 2018_SA TL 2 26.61 50.46 59.17 Ⅲ TL 3 90.83 92.87 94.95 TL 2max 62.39 TL 3max 93.12 87.87 Localmax 80.73 64.86 94.95 76.17

5.4 Crop Mapping

To evaluate the mapping effectiveness of the proposed VPSNet and transfer learning strategy, we selected two 10 km × 10 km areas in each of the study areas Ⅱ and Ⅲ for crop mapping; the crop types mapped were rice, maize, and soybean. We mapped each region three times, retaining only the corresponding crop for each mapping (this was determined by the crop spectral differences between the regions used).
To ensure the accuracy of the mapping, only the 'correct pixels' with high classification probability (>0.99) were retained, yielding distribution layers for the three crops, and the three layers were superimposed to obtain the final crop mapping result. Because each area is mapped three times, some pixels are unavoidably assigned more than one crop category. We call such pixels uncertain pixels and use IoU to quantify their share in each mapping result: a smaller IoU indicates fewer uncertain pixels and less crop mixing. In addition, we removed pixels with very few March-November images, because they greatly interfered with the mapping. Crop mapping was performed with local samples (Local), with direct model transfer (TL 1), and with the IRCSD introduced (TL 3); the results are shown in Fig. 11.

[Fig. 11 panels: (a) TL3, IoU = 10.9%; (b) Local; (c) TL; (d) TL3, IoU = 8.4%; (e) Local; (f) TL; (g) TL3, IoU = 12.17%; (h) Local; (i) TL; (j) TL3, IoU = 5.9%; (k) Local; (l) TL]

Fig. 11. Comparison of mapping results of different transfer learning methods. (a-f) are the 2017 mapping results and (g-l) are the 2018 mapping results. (a-c) and (g-i) are study area Ⅲ mapping results; (d-f) and (j-l) are study area Ⅱ mapping results.

Fig. 11 shows that the mapping results obtained by direct model transfer were very different from the local mapping results, while the mapping results after adding IRCSD were very similar to the local results, which indicates the effectiveness of the proposed transfer strategy and crop mapping strategy. Taken together, the two years of mapping results show that the two study areas practiced a soybean-maize rotation, which is consistent with reality and demonstrates the completeness and validity of the VPSNet crop classification predictions.
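The layer filtering and the uncertain-pixel measure described in this section can be sketched as follows. This is a minimal illustration, assuming each crop-specific mapping pass yields one probability per pixel; the 0.99 threshold comes from the text, while the function names and the probabilities are hypothetical.

```python
from typing import List, Set

THRESHOLD = 0.99  # probability cut-off stated in the text


def crop_layer(probabilities: List[float], threshold: float = THRESHOLD) -> Set[int]:
    """Indices of pixels whose probability for one crop exceeds the threshold."""
    return {idx for idx, p in enumerate(probabilities) if p > threshold}


def iou(layer_a: Set[int], layer_b: Set[int]) -> float:
    """Share of pixels claimed by both layers among pixels claimed by either."""
    union = layer_a | layer_b
    return len(layer_a & layer_b) / len(union) if union else 0.0


# Hypothetical probabilities for six pixels from two crop-specific passes.
maize_prob = [0.999, 0.995, 0.40, 0.10, 0.999, 0.20]
soybean_prob = [0.100, 0.998, 0.999, 0.997, 0.30, 0.50]

maize_pixels = crop_layer(maize_prob)
soybean_pixels = crop_layer(soybean_prob)

# Pixels retained in both layers are the "uncertain pixels".
uncertain_pixels = maize_pixels & soybean_pixels
mixing = iou(maize_pixels, soybean_pixels)
```

In this example pixel 5 passes neither threshold and stays unmapped, while pixel 1 is retained by both passes and counts as uncertain.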
The IoU values of the four mapping results using IRCSD were 10.9%, 8.4%, 12.17%, and 5.9%, respectively, which is an acceptable range. In fact, soybean and maize are intercropped in this study area, that is, planted in the same plot, because this practice can increase the yield of both crops; uncertain pixels are therefore an appropriate classification for this planting scenario. We also mapped the transfer using the IRCSD of historical years (TL3H), following the same mapping procedure as before; the results are shown in Fig. 12.

[Fig. 12 panels: (a) TL3H, IoU = 18.89%; (b) TL3, IoU = 12.17%; (c) Local; (d) TL3H, IoU = 5.98%; (e) TL3, IoU = 5.9%; (f) Local]

Fig. 12. Comparison of mapping results of different transfer learning methods. (a-c) are study area Ⅲ mapping results; (d-f) are study area Ⅱ mapping results.

Fig. 12 shows that the mapping results using the IRCSD of historical years were not as good as those using the IRCSD of the current year, with a higher IoU (18.89% versus 12.17% in study area Ⅲ and 5.98% versus 5.9% in study area Ⅱ), but the mapping results were acceptable in some areas. Using historical inter-regional crop differences is a solution for crop mapping without current-year field samples, and the mapping results provided in this study are an acceptable option in this scenario.

6. Discussion

6.1 Analysis of inter-regional crop time series difference

Fig. 4. Crop temporal profiles for the blue band and NIR-red band of rice, maize and soybean in the three study areas in 2017. The blue, green and red lines indicate the means of the time series for the different regions. The buffers indicate one standard deviation from the mean.

Fig. 4 shows that the temporal profiles of the two bands of the three crops in study areas Ⅱ and Ⅲ are similar, while the temporal profiles of the two bands of rice and maize differ significantly between study area Ⅰ and study areas Ⅱ and Ⅲ, especially during the growing season. The differences for soybean are relatively small and are concentrated before crop maturity. For regions that are geographically close and have similar climates, differences in crop temporal profiles are less likely, whereas for regions that are geographically distant and climatically different, the temporal profiles of the same crop differ significantly. This is consistent with previous studies. Such inter-regional differences in crop spectra pose a barrier to the direct transfer of classification models, and eliminating or simulating these spectral differences is key for models to achieve cross-regional transfer.

Using a small number of samples from the source and target domains, we applied the difference calculation equation to simulate the inter-regional crop spectral differences and fused the spectral differences with the target-domain spectra to form a new target-domain spectrum close to the source-domain spectrum. We calculated the crop spectral differences between study area Ⅰ and study areas Ⅱ and Ⅲ using equation (2), and the resulting new target-domain crop spectra were plotted against the source-domain crop spectra, as shown in Fig. 5.

Fig. 5. 2017 crop temporal profiles of rice, maize and soybean in the blue and NIR-red bands from the three study areas with IRCSD added.

Fig. 5 shows that, after adding IRCSD, the temporal profiles of the target domain were very similar to those of the source domain, which is consistent with our assumptions.
Among them, the temporal profiles of rice and soybean in the target domain were very close to those of the source domain after adding IRCSD, but the buffers of the two bands in the source and target domains overlap less, and the buffers in the target domain were wider than those in the source domain, especially at the end of growth. In contrast, the buffers of the maize temporal profiles were relatively wider in the source domain, but the mean temporal profiles differed somewhat more than for the other two crops, mainly at the mid-growth stage. This is related to the number of crop samples in the source domain: rice and soybean samples were relatively few and geographically concentrated, which made the sample diversity relatively weak, whereas maize samples in the source domain were extremely abundant and widely distributed geographically, which created sample diversity. The richer the sample diversity of the source domain, the more significant the obtained inter-regional spectral differences, which is the key to the successful implementation of the proposed transfer learning strategy.

Since only the model trained in the source domain is used in this study, IRCSD also has to be introduced for the unlabeled remote sensing data in the target domain that are used for crop mapping. Fig. 10 shows that the IRCSD differs between crops, which is convenient for crop mapping because the IRCSD of the crop of interest can be selected. We added the crop spectral differences to the unlabeled target-domain remote sensing data i times, where i is determined by the number of crop classes in the target domain.

Fig. 10. Comparison of IRCSD for different crops.
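The per-crop adjustment of unlabeled target-domain data described above can be sketched as follows. This is a minimal sketch that assumes a purely additive difference between mean temporal profiles (source minus target); it is a stand-in for illustration and is not equation (2), which according to the text also involves a multiplicative relationship with the target-domain reflectance. All names and values are hypothetical.

```python
from statistics import mean
from typing import Dict, List

Series = List[float]


def mean_profile(samples: List[Series]) -> Series:
    """Mean reflectance per time step over a small set of samples."""
    return [mean(step) for step in zip(*samples)]


def estimate_difference(source: List[Series], target: List[Series]) -> Series:
    """ASSUMPTION: additive difference of mean profiles, source minus target."""
    return [s - t for s, t in zip(mean_profile(source), mean_profile(target))]


def adjusted_copies(pixel: Series, differences: Dict[str, Series]) -> Dict[str, Series]:
    """One adjusted copy of an unlabeled target pixel per crop-specific difference."""
    return {
        crop: [v + d for v, d in zip(pixel, diff)]
        for crop, diff in differences.items()
    }


# Hypothetical samples: a few labeled series per crop and domain (3 time steps).
source_maize = [[0.30, 0.60, 0.40], [0.34, 0.64, 0.44]]
target_maize = [[0.20, 0.50, 0.45], [0.24, 0.54, 0.49]]
source_soy = [[0.25, 0.55, 0.35], [0.27, 0.57, 0.37]]
target_soy = [[0.22, 0.52, 0.36], [0.24, 0.54, 0.38]]

differences = {
    "maize": estimate_difference(source_maize, target_maize),
    "soybean": estimate_difference(source_soy, target_soy),
}

# Each unlabeled pixel is adjusted once per crop; each copy would then be
# classified by the source-domain model, keeping only the matching crop.
unlabeled_pixel = [0.21, 0.51, 0.46]
copies = adjusted_copies(unlabeled_pixel, differences)
```

With two crop classes the pixel is adjusted twice, which mirrors the statement that the number of passes follows the number of crop classes in the target domain.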
Transferring a model trained in a region with sufficient samples to a region with sparse samples can effectively solve the problem that crop classification cannot be performed in a region with sparse samples. The similarity of the phenology of the same crop in the two regions is a prerequisite for model transfer (Hao et al 2020). However, differences in geographical environment between regions cause the phenology of the same crop to differ, and the effect of transfer decreases as this difference increases (Wang et al 2019). Crop phenology is described by the spectral reflectance of remotely sensed images (Zhang et al 2014). Therefore, this study proposed the use of IRCSD to adapt to inter-regional geographic environmental differences in transfer learning. IRCSD can reduce the differences in phenology of the same crop in different regions, thus improving the transfer effect of the model, and can enable the inter-annual transfer of historical classification models. The key to the success of the transfer strategy is that equation (2) considers not only the difference relationship between the spectral reflectance of the regions, but also the multiplicative relationship between the spectral reflectance of the target domain and the IRCSD, which makes the IRCSD significantly correlated with the spectral reflectance of the target domain.

In crop classification, the classification method is one of the factors that affect the classification accuracy. Each classification method has unique advantages for crop classification using the spectral reflectance of remotely sensed images (Wang et al 2022, Zhong et al 2019b). Designing an effective crop classification model therefore has a significant impact on the classification results. PoSyNet achieved optimal classification accuracy by extracting multiform semantic features using a multilevel network mechanism.
This was due to both the design of the network architecture and the input of the temporal and spectral dimensions, resulting in accurate classification of different crops. The combination of a generic transfer strategy (IRCSD) and an effective classification model (PoSyNet) can enable the transfer of models between regions with large differences in geographic environments.

7. Conclusions

In this study, a transfer learning strategy for the transfer of classification models across regions was proposed, which can solve the problem that classification models cannot obtain satisfactory classification accuracy when transferred across long distances. The strategy is based on the fact that the phenology of the same crop can vary between regions due to factors such as inter-regional geography, and that such differences can be characterized using crop spectra. The results showed that the temporal profiles of the same crop in geographically similar regions were very similar, while the temporal profiles of the same crop in geographically different regions differed. The designed method can calculate the IRCSD using a small number of samples, and adding the IRCSD improved transfer learning accuracy by more than 25% in all transfer experiments, which demonstrates the generality of the proposed method. A multi-layer DL structure (PoSyNet) was constructed, in which different forms of features with semantic information are extracted by convolutional layers of different dimensions. Compared with RF, Transformer, and CT, PoSyNet had advantages in classification results. Adding a position encoding structure to the network improved the richness of the model inputs; compared with RF, which has no position encoding, the methods with position encoding had significant advantages.
The transfer learning strategy proposed in this study can effectively solve the problem of model transfer, but it still requires a small number of samples as support. In future work, the possibility of using factors such as geography to generate IRCSD will be further explored, which in turn will further reduce the need for samples.

Acknowledgements

This work was supported in part by the National Natural Youth Science Foundation of China under Grant 42001352; in part by the National Key Research and Development Program of China under Grant 2021YFE0205100.

References

Aneece I, Thenkabail PS. 2021. Classifying Crop Types Using Two Generations of Hyperspectral Sensors (Hyperion and DESIS) with Machine Learning on the Cloud. Remote Sens-Basel 13: 4704
Bargiel D. 2017. A new method for crop classification combining time series of radar images and crop phenology information. Remote Sens Environ 198: 369-83
Belgiu M, Bijker W, Csillik O, Stein A. 2021a. Phenology-based sample generation for supervised crop type classification. Int J Appl Earth Obs 95
Belgiu M, Bijker W, Csillik O, Stein A. 2021b. Phenology-based sample generation for supervised crop type classification. Int J Appl Earth Obs 95: 102264
Boryan C, Yang Z, Mueller R, Craig M. 2011. Monitoring US agriculture: the US department of agriculture, national agricultural statistics service, cropland data layer program. Geocarto International 26: 341-58
Chen J, Chen J, Liao A, Cao X, Chen L, et al. 2015. Global land cover mapping at 30 m resolution: A POK-based operational approach. Isprs J Photogramm 103: 7-27
Dong J, Xiao X, Kou W, Qin Y, Zhang G, et al. 2015. Tracking the dynamics of paddy rice planting area in 1986–2010 through time series Landsat images and phenology-based algorithms. Remote Sens Environ 160: 99-113
Franch B, Vermote E, Becker-Reshef I, Claverie M, Huang J, et al. 2015. Improving the timeliness of winter wheat production forecast in the United States of America, Ukraine and China using MODIS data and NCAR Growing Degree Day information. Remote Sens Environ 161: 131-48
Gallego J, Craig M, Michaelsen J, Bossyns B, Fritz S. 2008. Best practices for crop area estimation with remote sensing. Ispra: Joint Research Center
Ghazaryan G, Dubovyk O, Löw F, Lavreniuk M, Kolotii A, et al. 2018. A rule-based approach for crop identification using multi-temporal and multi-sensor phenological metrics. European Journal of Remote Sensing 51: 511-24
Gong P, Wang J, Yu L, Zhao Y, Zhao Y, et al. 2013. Finer resolution observation and monitoring of global land cover: First mapping results with Landsat TM and ETM+ data. Int J Remote Sens 34: 2607-54
Güldenring R, Nalpantidis L. 2021. Self-supervised contrastive learning on agricultural images. Comput Electron Agr 191: 106510
Hao PY, Di LP, Zhang C, Guo LY. 2020. Transfer Learning for Crop classification with Cropland Data Layer data (CDL) as training samples. Sci Total Environ 733
Hao PY, Wang L, Zhan YL, Wang CY, Niu Z, Wu MQ. 2016. Crop classification using crop knowledge of the previous-year: Case study in Southwest Kansas, USA. European Journal of Remote Sensing 49: 1061-77
Huang H, Wang J, Liu C, Liang L, Li C, Gong P. 2020. The migration of training samples towards dynamic global land cover mapping. Isprs J Photogramm 161: 27-36
Johnson DM, Mueller R. 2021. Pre-and within-season crop type classification trained with archival land cover information. Remote Sens Environ 264: 112576
Konduri VS, Kumar J, Hargrove WW, Hoffman FM, Ganguly AR. 2020. Mapping crops within the growing season across the United States. Remote Sens Environ 251: 112048
Lei L, Wang XY, Zhong YF, Zhao HW, Hu X, Luo C. 2021. DOCC: Deep one-class crop classification via positive and unlabeled learning for multi-modal satellite imagery. Int J Appl Earth Obs 105
Li ZT, Chen GK, Zhang TX. 2020. A CNN-Transformer Hybrid Approach for Crop Classification Using Multitemporal Multisensor Images. Ieee J-Stars 13: 847-58
Lin C, Zhong L, Song X-P, Dong J, Lobell DB, Jin Z. 2022. Early-and in-season crop type mapping without current-year ground truth: Generating labels from historical information via a topology-based approach. Remote Sens Environ 274: 112994
Liu Z, Mao H, Wu C-Y, Feichtenhofer C, Darrell T, Xie S. 2022. A ConvNet for the 2020s. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition: 11976-86
Malambo L, Heatwole CD. 2020. Automated training sample definition for seasonal burned area mapping. Isprs J Photogramm 160: 107-23
Minh DHT, Ienco D, Gaetano R, Lalande N, Ndikumana E, et al. 2018. Deep recurrent neural networks for winter vegetation quality mapping via multitemporal SAR Sentinel-1. Ieee Geosci Remote S 15: 464-68
Nowakowski A, Mrziglod J, Spiller D, Bonifacio R, Ferrari I, et al. 2021a. Crop type mapping by using transfer learning. Int J Appl Earth Obs 98: 102313
Nowakowski A, Mrziglod J, Spiller D, Bonifacio R, Ferrari I, et al. 2021b. Crop type mapping by using transfer learning. Int J Appl Earth Obs 98
Rawat A, Kumar A, Upadhyay P, Kumar S. 2021. Deep learning-based models for temporal satellite data processing: Classification of paddy transplanted fields. Ecol Inform 61
Sharma A, Liu X, Yang X. 2018. Land cover classification from multi-temporal, multi-spectral remotely sensed imagery using patch-based recurrent neural networks. Neural Networks 105: 346-55
Skakun S, Franch B, Vermote E, Roger J-C, Becker-Reshef I, et al. 2017. Early season large-area winter crop mapping using MODIS NDVI data, growing degree days information and a Gaussian mixture model. Remote Sens Environ 195: 244-58
Skakun S, Kussul N, Shelestov AY, Lavreniuk M, Kussul O. 2016. Efficiency Assessment of Multitemporal C-Band Radarsat-2 Intensity and Landsat-8 Surface Reflectance Satellite Imagery for Crop Classification in Ukraine. Ieee J-Stars 9: 3712-19
Song X-P, Potapov PV, Krylov A, King L, Di Bella CM, et al. 2017. National-scale soybean mapping and area estimation in the United States using medium resolution satellite imagery and field survey. Remote Sens Environ 190: 383-95
Wang H, Chang W, Yao Y, Liu D, Zhao Y, et al. 2022. CC-SSL: A Self-Supervised Learning Framework for Crop Classification With Few Labeled Samples. Ieee J-Stars 15: 8704-18
Wang H, Chang W, Yao Y, Yao Z, Zhao Y, et al. 2023. Cropformer: A new generalized deep learning classification approach for multi-scenario crop classification. Frontiers in Plant Science 14
Wang S, Azzari G, Lobell DB. 2019. Crop type mapping without field-level labels: Random forest transfer and unsupervised clustering techniques. Remote Sens Environ 222: 303-17
Wang Y, Zhang Z, Feng L, Ma Y, Du Q. 2021. A new attention-based CNN approach for crop mapping using time series Sentinel-2 images. Comput Electron Agr 184: 106090
Xu JF, Zhu Y, Zhong RH, Lin ZX, Xu JL, et al. 2020. DeepCropMapping: A multi-temporal deep learning approach with improved spatial generalizability for dynamic corn and soybean mapping. Remote Sens Environ 247
Yaramasu R, Bandaru V, Pnvr K. 2020. Pre-season crop type mapping using deep neural networks. Comput Electron Agr 176: 105664
Yu L, Wang J, Clinton N, Xin Q, Zhong L, et al. 2013. FROM-GC: 30 m global cropland extent derived through multisource data integration. International Journal of Digital Earth 6: 521-33
Yuan Y, Lin L. 2021. Self-Supervised Pretraining of Transformers for Satellite Image Time Series Classification. Ieee J-Stars 14: 474-87
Yuan Y, Lin L, Liu QS, Hang RL, Zhou ZG. 2022. SITS-Former: A pre-trained spatio-spectral-temporal representation model for Sentinel-2 time series classification. Int J Appl Earth Obs 106
Yuan Y, Lin L, Zhou Z-G, Jiang H, Liu Q. 2023. Bridging optical and SAR satellite image time series via contrastive feature extraction for crop classification. Isprs J Photogramm 195: 222-32
Zhang G, Xiao X, Dong J, Kou W, Jin C, et al. 2015. Mapping paddy rice planting areas through time series analysis of MODIS land surface temperature and vegetation index data. Isprs J Photogramm 106: 157-71
Zhang J, Feng L, Yao F. 2014. Improved maize cultivated area estimation over a large scale combining MODIS–EVI time series data and crop phenological information. Isprs J Photogramm 94: 102-13
Zhong L, Gong P, Biging GS. 2014. Efficient corn and soybean mapping with temporal extendability: A multi-year experiment using Landsat imagery. Remote Sens Environ 140: 1-13
Zhong L, Hu L, Zhou H. 2019a. Deep learning based multi-temporal crop classification. Remote Sens Environ 221: 430-43
Zhong LH, Hu LN, Zhou H. 2019b. Deep learning based multi-temporal crop classification. Remote Sens Environ 221: 430-43