Uploaded by blaaa blooo

Data Preparation + Data input Pre Pre Processing(Merging JSON files to make a data frame out of them)

6/28/22, 10:05 PM
Data Preparation + Data input Pre Pre Processing(Merging JSON files to make a data frame out of them) - Jupyter Notebook
In [1]: import json
import pandas as pd
import numpy as np
Loading the CSV will report errors because, in the "productDisplayName" column, some
items take 11 cells while the file has only 10 columns. That is fine here, since
we only need the IDs of the products.
In [2]: data = pd.read_csv('styles.csv', error_bad_lines=False)
b'Skipping line 6044: expected 10 fields, saw 11\nSkipping line 6569: expected
10 fields, saw 11\nSkipping line 7399: expected 10 fields, saw 11\nSkipping lin
e 7939: expected 10 fields, saw 11\nSkipping line 9026: expected 10 fields, saw
11\nSkipping line 10264: expected 10 fields, saw 11\nSkipping line 10427: expec
ted 10 fields, saw 11\nSkipping line 10905: expected 10 fields, saw 11\nSkippin
g line 11373: expected 10 fields, saw 11\nSkipping line 11945: expected 10 fiel
ds, saw 11\nSkipping line 14112: expected 10 fields, saw 11\nSkipping line 1453
2: expected 10 fields, saw 11\nSkipping line 15076: expected 10 fields, saw 12
\nSkipping line 29906: expected 10 fields, saw 11\nSkipping line 31625: expecte
d 10 fields, saw 11\nSkipping line 33020: expected 10 fields, saw 11\nSkipping
line 35748: expected 10 fields, saw 11\nSkipping line 35962: expected 10 field
s, saw 11\nSkipping line 37770: expected 10 fields, saw 11\nSkipping line 3810
5: expected 10 fields, saw 11\nSkipping line 38275: expected 10 fields, saw 11
\nSkipping line 38404: expected 10 fields, saw 12\n'
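Skipping the malformed lines also drops the ids of the 22 products listed in the warning above (and `error_bad_lines` was deprecated in pandas 1.3 in favour of `on_bad_lines`). Since only the ids are needed, one possible alternative is to read just the first field with the standard `csv` module, which does not care how many fields follow. This is a minimal sketch: `read_ids` is a hypothetical helper, and the two sample rows are made up to stand in for `styles.csv`.

```python
import csv
import io


def read_ids(lines):
    """Return the integer ids from the 'id' column, ignoring any extra fields."""
    reader = csv.reader(lines)
    header = next(reader)
    id_pos = header.index("id")
    return [int(row[id_pos]) for row in reader if row]


# In-memory stand-in for styles.csv (hypothetical rows). The second row has an
# unquoted comma in the product name, so it splits into 11 fields instead of 10.
sample = io.StringIO(
    "id,gender,masterCategory,subCategory,articleType,baseColour,season,year,usage,productDisplayName\n"
    "15970,Men,Apparel,Topwear,Shirts,Navy Blue,Fall,2011,Casual,Example Navy Shirt\n"
    "99999,Men,Apparel,Topwear,Shirts,Blue,Fall,2011,Casual,Example Shirt, Pack of 2\n"
)
ids = read_ids(sample)
print(ids)
```

With the real file, the same function could be called on `open('styles.csv', newline='', encoding='utf-8')`, assuming the file is UTF-8 encoded.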
In [3]: data.head()
Out[3]:
      id gender masterCategory subCategory  articleType baseColour  season    year   usage
0  15970    Men        Apparel     Topwear       Shirts  Navy Blue    Fall  2011.0  Casual
1  39386    Men        Apparel  Bottomwear        Jeans       Blue  Summer  2012.0  Casual
2  59263  Women    Accessories     Watches      Watches     Silver  Winter  2016.0  Casual
3  21379    Men        Apparel  Bottomwear  Track Pants      Black    Fall  2011.0  Casual
4  53759    Men        Apparel     Topwear      Tshirts       Grey  Summer  2012.0  Casual
localhost:8888/notebooks/E Commerce Project/Data Preparation %2B Data input Pre Pre Processing(Merging JSON files to make a data frame out o… 1/33
In [4]: data
Out[4]:
          id gender masterCategory subCategory            articleType baseColour  season    year  us...
0      15970    Men        Apparel     Topwear                 Shirts  Navy Blue    Fall  2011.0  Cas...
1      39386    Men        Apparel  Bottomwear                  Jeans       Blue  Summer  2012.0  Cas...
2      59263  Women    Accessories     Watches                Watches     Silver  Winter  2016.0  Cas...
3      21379    Men        Apparel  Bottomwear            Track Pants      Black    Fall  2011.0  Cas...
4      53759    Men        Apparel     Topwear                Tshirts       Grey  Summer  2012.0  Cas...
...      ...    ...            ...         ...                    ...        ...     ...     ...  ...
44419  17036    Men       Footwear       Shoes           Casual Shoes      White  Summer  2013.0  Cas...
44420   6461    Men       Footwear  Flip Flops             Flip Flops        Red  Summer  2011.0  Cas...
44421  18842    Men        Apparel     Topwear                Tshirts       Blue    Fall  2011.0  Cas...
44422  46694  Women  Personal Care   Fragrance  Perfume and Body Mist       Blue  Spring  2017.0  Cas...
44423  51623  Women    Accessories     Watches                Watches       Pink  Winter  2016.0  Cas...
44424 rows × 10 columns
In [5]: data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 44424 entries, 0 to 44423
Data columns (total 10 columns):
 #   Column              Non-Null Count  Dtype
---  ------              --------------  -----
 0   id                  44424 non-null  int64
 1   gender              44424 non-null  object
 2   masterCategory      44424 non-null  object
 3   subCategory         44424 non-null  object
 4   articleType         44424 non-null  object
 5   baseColour          44409 non-null  object
 6   season              44403 non-null  object
 7   year                44423 non-null  float64
 8   usage               44107 non-null  object
 9   productDisplayName  44417 non-null  object
dtypes: float64(1), int64(1), object(8)
memory usage: 3.4+ MB
In [6]: data.describe()
Out[6]:
                 id          year
count  44424.000000  44423.000000
mean   29696.334301   2012.806497
std    17049.490518      2.126480
min     1163.000000   2007.000000
25%    14768.750000   2011.000000
50%    28618.500000   2012.000000
75%    44683.250000   2015.000000
max    60000.000000   2019.000000
In [7]: JSON_files=data['id'].values
In [8]: JSON_files
Out[8]: array([15970, 39386, 59263, ..., 18842, 46694, 51623], dtype=int64)
In [9]: JSON_files.sort()
In [10]: JSON_files
Out[10]: array([ 1163,  1164,  1165, ..., 59998, 59999, 60000], dtype=int64)
In [11]: JSON_files=list(JSON_files)
In [12]: JSON_files
Out[12]: [1163,
1164,
1165,
1525,
1526,
1528,
1529,
1530,
1531,
1532,
1533,
1534,
1535,
1536,
1537,
1538,
1539,
1540,
1541,
1542
In [13]: JSON_files_Names=[]
Convert the ids to strings so they can be used as file-name references.
In [14]: for i in JSON_files:
    JSON_files_Names.append(str(i))
In [15]: JSON_files_Names
Out[15]: ['1163',
'1164',
'1165',
'1525',
'1526',
'1528',
'1529',
'1530',
'1531',
'1532',
'1533',
'1534',
'1535',
'1536',
'1537',
'1538',
'1539',
'1540',
'1541',
'1542'
Let's test opening one of the files.
In [53]: df=pd.read_json("1163.json")
In [54]: df.info()
<class 'pandas.core.frame.DataFrame'>
Index: 43 entries, code to styleOptions
Data columns (total 3 columns):
 #   Column        Non-Null Count  Dtype
---  ------        --------------  -----
 0   notification  0 non-null      float64
 1   meta          2 non-null      object
 2   data          41 non-null     object
dtypes: float64(1), object(2)
memory usage: 1.3+ KB
In [55]: df
Out[55]:
                    notification  meta                                  data
code                NaN           200                                   NaN
requestId           NaN           4cf4c56d-2941-4012-b1d3-10f7c762a126  NaN
id                  NaN           NaN                                   1163
price               NaN           NaN                                   895
discountedPrice     NaN           NaN                                   895
styleType           NaN           NaN                                   DEL
productTypeId       NaN           NaN                                   219
articleNumber       NaN           NaN                                   409962-480-895
visualTag           NaN           NaN
productDisplayName  NaN           NaN                                   Nike Sahara Team India Fanwear Round Neck Jersey
variantName         NaN           NaN                                   Roundneck Jersey
myntraRating        NaN           NaN                                   1
catalogAddDate      NaN           NaN                                   1461658417
brandName           NaN           NaN                                   Nike
ageGroup            NaN           NaN                                   Adults-Men
gender              NaN           NaN                                   Men
baseColour          NaN           NaN                                   Blue
colour1             NaN           NaN                                   NA
colour2             NaN           NaN                                   NA
fashionType         NaN           NaN                                   Fashion
season              NaN           NaN                                   Summer
year                NaN           NaN                                   2011
usage               NaN           NaN                                   Sports
vat                 NaN           NaN                                   5.5
displayCategories   NaN           NaN                                   Sports Wear,Sale
weight              NaN           NaN                                   0
navigationId        NaN           NaN                                   0
landingPageUrl      NaN           NaN                                   Tshirts/Nike/Nike-Sahara-Team-India-Fanwear-Ro...
articleAttributes   NaN           NaN                                   {'Fit': 'Regular Fit', 'Fabric 3': 'NA', 'Body...
crossLinks          NaN           NaN                                   [{'key': 'More Tshirts by Nike', 'value': 'tsh...
brandUserProfile    NaN           NaN                                   {'uidx': '6d415071.a389.472b.b5b6.c93d864afbea...
codEnabled          NaN           NaN                                   True
styleImages         NaN           NaN                                   {'default': {'imageURL': 'http://assets.myntas...
lookGoodAlbum       NaN           NaN                                   {}
style360Images      NaN           NaN                                   {}
masterCategory      NaN           NaN                                   {'id': 9, 'typeName': 'Apparel', 'active': Tru...
subCategory         NaN           NaN                                   {'id': 31, 'typeName': 'Topwear', 'active': Tr...
articleType         NaN           NaN                                   {'id': 90, 'typeName': 'Tshirts', 'active': Tr...
isEMIEnabled        NaN           NaN                                   True
otherFlags          NaN           NaN                                   [{'dataType': 'BOOLEAN', 'name': 'isFragile', ...
articleDisplayAttr  NaN           NaN                                   {'id': 90, 'core': {'order': '0', 'display': '...
productDescriptors  NaN           NaN                                   {'materials_care_desc': {'descriptorType': 'ma...
styleOptions        NaN           NaN                                   [{'id': 8289, 'name': 'Size', 'value': 'XXS', ...
In [56]: df.info()
<class 'pandas.core.frame.DataFrame'>
Index: 43 entries, code to styleOptions
Data columns (total 3 columns):
 #   Column        Non-Null Count  Dtype
---  ------        --------------  -----
 0   notification  0 non-null      float64
 1   meta          2 non-null      object
 2   data          41 non-null     object
dtypes: float64(1), object(2)
memory usage: 1.3+ KB
In [57]: df1=df.drop(columns=['notification','meta'])
df1
Out[57]:
                    data
code                NaN
requestId           NaN
id                  1163
price               895
discountedPrice     895
styleType           DEL
productTypeId       219
articleNumber       409962-480-895
visualTag
productDisplayName  Nike Sahara Team India Fanwear Round Neck Jersey
variantName         Roundneck Jersey
myntraRating        1
catalogAddDate      1461658417
brandName           Nike
ageGroup            Adults-Men
gender              Men
baseColour          Blue
colour1             NA
colour2             NA
fashionType         Fashion
season              Summer
year                2011
usage               Sports
vat                 5.5
displayCategories   Sports Wear,Sale
weight              0
navigationId        0
landingPageUrl      Tshirts/Nike/Nike-Sahara-Team-India-Fanwear-Ro...
articleAttributes   {'Fit': 'Regular Fit', 'Fabric 3': 'NA', 'Body...
crossLinks          [{'key': 'More Tshirts by Nike', 'value': 'tsh...
brandUserProfile    {'uidx': '6d415071.a389.472b.b5b6.c93d864afbea...
codEnabled          True
styleImages         {'default': {'imageURL': 'http://assets.myntas...
lookGoodAlbum       {}
style360Images      {}
masterCategory      {'id': 9, 'typeName': 'Apparel', 'active': Tru...
subCategory         {'id': 31, 'typeName': 'Topwear', 'active': Tr...
articleType         {'id': 90, 'typeName': 'Tshirts', 'active': Tr...
isEMIEnabled        True
otherFlags          [{'dataType': 'BOOLEAN', 'name': 'isFragile', ...
articleDisplayAttr  {'id': 90, 'core': {'order': '0', 'display': '...
productDescriptors  {'materials_care_desc': {'descriptorType': 'ma...
styleOptions        [{'id': 8289, 'name': 'Size', 'value': 'XXS', ...
In [58]: df1=df1.T
df1.info()
<class 'pandas.core.frame.DataFrame'>
Index: 1 entries, data to data
Data columns (total 43 columns):
 #   Column              Non-Null Count  Dtype
---  ------              --------------  -----
 0   code                0 non-null      object
 1   requestId           0 non-null      object
 2   id                  1 non-null      object
 3   price               1 non-null      object
 4   discountedPrice     1 non-null      object
 5   styleType           1 non-null      object
 6   productTypeId       1 non-null      object
 7   articleNumber       1 non-null      object
 8   visualTag           1 non-null      object
 9   productDisplayName  1 non-null      object
 10  variantName         1 non-null      object
 11  myntraRating        1 non-null      object
 12  catalogAddDate      1 non-null      object
 13  brandName           1 non-null      object
 14  ageGroup            1 non-null      object
 15  gender              1 non-null      object
 16  baseColour          1 non-null      object
 17  colour1             1 non-null      object
 18  colour2             1 non-null      object
 19  fashionType         1 non-null      object
 20  season              1 non-null      object
 21  year                1 non-null      object
 22  usage               1 non-null      object
 23  vat                 1 non-null      object
 24  displayCategories   1 non-null      object
 25  weight              1 non-null      object
 26  navigationId        1 non-null      object
 27  landingPageUrl      1 non-null      object
 28  articleAttributes   1 non-null      object
 29  crossLinks          1 non-null      object
 30  brandUserProfile    1 non-null      object
 31  codEnabled          1 non-null      object
 32  styleImages         1 non-null      object
 33  lookGoodAlbum       1 non-null      object
 34  style360Images      1 non-null      object
 35  masterCategory      1 non-null      object
 36  subCategory         1 non-null      object
 37  articleType         1 non-null      object
 38  isEMIEnabled        1 non-null      object
 39  otherFlags          1 non-null      object
 40  articleDisplayAttr  1 non-null      object
 41  productDescriptors  1 non-null      object
 42  styleOptions        1 non-null      object
dtypes: object(43)
memory usage: 460.0+ bytes
In [59]: df1
Out[59]:
     code requestId    id price discountedPrice styleType productTypeId   articleNumber visua...
data  NaN       NaN  1163   895             895       DEL           219  409962-480-895
1 rows × 43 columns
In [60]: df1 = df1.dropna(axis=1)
df1
Out[60]:
        id price discountedPrice styleType productTypeId   articleNumber visualTag productDisp...
data  1163   895             895       DEL           219  409962-480-895           Nike Sah...
1 rows × 41 columns
In [61]: df1.info()
<class 'pandas.core.frame.DataFrame'>
Index: 1 entries, data to data
Data columns (total 41 columns):
 #   Column              Non-Null Count  Dtype
---  ------              --------------  -----
 0   id                  1 non-null      object
 1   price               1 non-null      object
 2   discountedPrice     1 non-null      object
 3   styleType           1 non-null      object
 4   productTypeId       1 non-null      object
 5   articleNumber       1 non-null      object
 6   visualTag           1 non-null      object
 7   productDisplayName  1 non-null      object
 8   variantName         1 non-null      object
 9   myntraRating        1 non-null      object
 10  catalogAddDate      1 non-null      object
 11  brandName           1 non-null      object
 12  ageGroup            1 non-null      object
 13  gender              1 non-null      object
 14  baseColour          1 non-null      object
 15  colour1             1 non-null      object
 16  colour2             1 non-null      object
 17  fashionType         1 non-null      object
 18  season              1 non-null      object
 19  year                1 non-null      object
 20  usage               1 non-null      object
 21  vat                 1 non-null      object
 22  displayCategories   1 non-null      object
 23  weight              1 non-null      object
 24  navigationId        1 non-null      object
 25  landingPageUrl      1 non-null      object
 26  articleAttributes   1 non-null      object
 27  crossLinks          1 non-null      object
 28  brandUserProfile    1 non-null      object
 29  codEnabled          1 non-null      object
 30  styleImages         1 non-null      object
 31  lookGoodAlbum       1 non-null      object
 32  style360Images      1 non-null      object
 33  masterCategory      1 non-null      object
 34  subCategory         1 non-null      object
 35  articleType         1 non-null      object
 36  isEMIEnabled        1 non-null      object
 37  otherFlags          1 non-null      object
 38  articleDisplayAttr  1 non-null      object
 39  productDescriptors  1 non-null      object
 40  styleOptions        1 non-null      object
dtypes: object(41)
memory usage: 444.0+ bytes
In [62]: pd.set_option('display.max_columns', None)
In [63]: df1
Out[63]:
        id price discountedPrice styleType productTypeId   articleNumber visualTag productDisp...
data  1163   895             895       DEL           219  409962-480-895           Nike Sah...
In [75]: df1.articleAttributes.values
Out[75]: array([{'Fit': 'Regular Fit', 'Fabric 3': 'NA', 'Body or Garment Size': 'Garment Measurements in', 'Occasion': 'Sports'}],
      dtype=object)
In [66]: TBDF_articleAttributes=df1.articleAttributes.values.tolist()
DF_articleAttributes=pd.DataFrame(TBDF_articleAttributes)
DF_articleAttributes
Out[66]:
           Fit Fabric 3     Body or Garment Size Occasion
0  Regular Fit       NA  Garment Measurements in   Sports
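The same unpacking idea applies to the other nested fields seen earlier, such as `masterCategory`, `subCategory` and `articleType`, which each hold a dict with a `typeName` key. As a rough sketch (the helper name `category_names` is hypothetical, and the record below only reproduces the keys visible in the truncated output above), the readable names could be pulled out like this:

```python
def category_names(record, fields=("masterCategory", "subCategory", "articleType")):
    """Return {field: typeName} for nested category dicts, or None when absent."""
    names = {}
    for field in fields:
        value = record.get(field)
        # Only dict values carry a typeName; anything else maps to None.
        names[field] = value.get("typeName") if isinstance(value, dict) else None
    return names


# Cut-down version of the "data" object of 1163.json, limited to keys shown above.
record = {
    "id": 1163,
    "masterCategory": {"id": 9, "typeName": "Apparel"},
    "subCategory": {"id": 31, "typeName": "Topwear"},
    "articleType": {"id": 90, "typeName": "Tshirts"},
}
names = category_names(record)
print(names)
```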
In [28]: data = {'id': [1],
        'name': ['Nike'],
        'logo':["httpsgadga"],
        'top':[0],
        'slug':[0],
        'meta_title':[0],
        'meta_description':[0],
        'created_at':[0],
        'updated_at':[0],
        }
df = pd.DataFrame(data)

print (df)
   id  name        logo  top  slug  meta_title  meta_description  created_at  \
0   1  Nike  httpsgadga    0     0           0                 0           0

   updated_at
0           0
In [29]: df
Out[29]:
   id  name        logo  top  slug  meta_title  meta_description  created_at  updated_at
0   1  Nike  httpsgadga    0     0           0                 0           0           0
In [30]: print(df1['brandName'].unique())
['Nike']
In [31]: df=pd.read_json("./styles/1163.json")
df1=df.drop(columns=['notification','meta'])
df1=df1.T
df1 =df1.dropna(axis=1)
df1
Out[31]:
        id price discountedPrice styleType productTypeId   articleNumber visualTag productDisp...
data  1163   895             895       DEL           219  409962-480-895           Nike Sah...
In [32]: JSON_files_Names
Out[32]: ['1163',
'1164',
'1165',
'1525',
'1526',
'1528',
'1529',
'1530',
'1531',
'1532',
'1533',
'1534',
'1535',
'1536',
'1537',
'1538',
'1539',
'1540',
'1541',
'1542'
In [33]: len(JSON_files_Names)
Out[33]: 44424
In [34]: Semi_proccessed_Complete_Jdata=df1
Semi_proccessed_Complete_Jdata
Out[34]:
        id price discountedPrice styleType productTypeId   articleNumber visualTag productDisp...
data  1163   895             895       DEL           219  409962-480-895           Nike Sah...
In [35]: for i in range(1,len(JSON_files_Names)):
    df=pd.read_json("./styles/"+JSON_files_Names[i]+".json")
    df1=df.drop(columns=['notification','meta'])
    df1=df1.T
    df1 =df1.dropna(axis=1)
    # DataFrame.append was removed in pandas 2.0; pd.concat gives the same result
    Semi_proccessed_Complete_Jdata=pd.concat([Semi_proccessed_Complete_Jdata, df1])
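Growing a DataFrame inside the loop copies all accumulated rows on every iteration, which gets slow over 44424 files. One possible alternative, sketched below, is to collect the `data` object of each file in a plain list and build the frame once at the end. The helper `load_style_records` is hypothetical, the demo file is a made-up stand-in for `./styles/1163.json`, and unlike the loop above this sketch does not drop null fields per file.

```python
import json
import tempfile
from pathlib import Path


def load_style_records(folder, ids):
    """Read <folder>/<id>.json for each id and return (records, missing_ids)."""
    records, missing = [], []
    for style_id in ids:
        path = Path(folder) / f"{style_id}.json"
        if not path.exists():
            # Remember ids without a file instead of aborting the whole run.
            missing.append(style_id)
            continue
        with path.open(encoding="utf-8") as fh:
            payload = json.load(fh)
        # Each file has top-level keys notification, meta and data; keep data only.
        records.append(payload["data"])
    return records, missing


# Demo with one small made-up file in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    sample = {"notification": {}, "meta": {"code": 200}, "data": {"id": 1163, "price": 895}}
    (Path(tmp) / "1163.json").write_text(json.dumps(sample), encoding="utf-8")
    records, missing = load_style_records(tmp, [1163, 1164])

print(records, missing)
```

The resulting list of dicts can be passed directly to `pd.DataFrame(records)`, which yields one row per product.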
​
In [36]: Semi_proccessed_Complete_Jdata
Out[36]:
         id price discountedPrice styleType productTypeId         articleNumber     visualTag produ...
data   1163   895             895       DEL           219        409962-480-895                ...
data   1164  1595            1595         P           289    Nike Sahara Jersey  eoss:PREMIUM  ...
data   1165  2495            2495         D           219           Nike Jersey                ...
data   1525  1299            1299         P           597               6818802                ...
data   1526  1299            1299         P           294               6814201                ...
...     ...   ...             ...       ...           ...                   ...           ...  ...
data  59995  4300            2150         P           379           SR370-Black                ...
data  59996  3400            1700         P           379  SR394PURPLE MIX59996                ...
data  59998  1395            1395         P           445               9861EMT                ...
data  59999  1595            1595         P           445                9874BX                ...
data  60000   499             499       DEL           364           AJ0056 Blue                ...
44424 rows × 47 columns
In [85]:
Semi_proccessed_Complete_Jdata.head()
Out[85]:
  Unnamed: 0    id   price  discountedPrice styleType  productTypeId       articleNumber  visualTa...
0       data  1163   895.0            895.0       DEL            219      409962-480-895  Na...
1       data  1164  1595.0           1595.0         P            289  Nike Sahara Jersey  eoss:PREMIU...
2       data  1165  2495.0           2495.0         D            219         Nike Jersey  Na...
3       data  1525  1299.0           1299.0         P            597             6818802  Na...
4       data  1526  1299.0           1299.0         P            294             6814201  Na...
5 rows × 48 columns
In [70]:
Semi_proccessed_Complete_Jdata.to_csv('Semi_proccessed_Complete_Jdata.csv')
In [4]: Semi_proccessed_Complete_Jdata=pd.read_csv('Semi_proccessed_Complete_Jdata.csv',l
In [5]: Semi_proccessed_Complete_Jdata.tail()
Out[5]:
      Unnamed: 0     id   price  discountedPrice styleType  productTypeId         articleNumber  visualT...
44419       data  59995  4300.0           2150.0         P            379           SR370-Black  Na...
44420       data  59996  3400.0           1700.0         P            379  SR394PURPLE MIX59996  Na...
44421       data  59998  1395.0           1395.0         P            445               9861EMT  Na...
44422       data  59999  1595.0           1595.0         P            445                9874BX  Na...
44423       data  60000   499.0            499.0       DEL            364           AJ0056 Blue  Na...
5 rows × 48 columns
In [6]: BrandName=Semi_proccessed_Complete_Jdata['brandName'].unique()
BrandName
Out[6]: array(['Nike', 'Puma', 'Quechua', 'Artengo', 'Kalenji', 'Kipsta',
'Inesis', 'Domyos', 'Decathlon', 'Nabaiji', 'Newfeel', 'Geonaute',
'Reebok', 'Lotto', 'Inkfruit', 'FIFA', 'ADIDAS', 'Lee', 'Adidas',
'Basics', 'Probase', 'Red Tape', 'Numero Uno', 'Carlton London',
'Murcia', 'Disney', 'Classic Polo', 'Ediots', 'ID', 'Lee Cooper',
'Mr. Men Little Miss', 'Mr. Men', 'Catwalk', 'Do u speak green',
'Tantra', 'Guerrilla', 'ASICS', 'Myntra', 'Skechers', 'Converse',
'FILA', 'Status Quo', 'Crocs', 'Wrangler', 'Ed Hardy',
'Urban Yoga', 'Jealous 21', 'Spalding', 'Rockport', 'Mickey',
'Superman', 'Batman', 'DC Comics', 'Free Authority', 'Beatles',
'Pink Floyd', 'Smashing Pumpkins', 'Aerosmith', 'Marvel',
'Jimi Hendrix', 'Billy Idol', 'Rolling Stone', 'Nirvana',
'John Lenon', 'Wildcraft', 'Sher Singh', 'Levis Kids', 'Palm Tree',
'Gini and Jony', 'Being Human', 'DARK KNIGHT', 'LINKIN PARK',
'Queen', 'Ant', 'United Colors of Benetton', 'LOCOMOTIVE',
'HIGHLANDER', 'SPYKAR', 'Timberland', 'Forever New', 's.Oliver',
'SCULLERS FOR HER', 'SCULLERS', 'MEGADETH', 'W', 'Proline',
'Indigo Nation', 'Provogue', 'Fastrack', 'New Balance', 'Doodle',
'Mark Taylor', 'Regent Polo Club', 'John Miller', 'Buckaroo',
'Indian Terrain', 'Hush Puppies', 'Scholl', 'Little Miss',
'ESPRIT', 'Carrera', 'Clarks', 'Flying Machine', 'Vishudh',
'Playboy', 'Franco Leone', 'Ganuchi', 'AURELIA', 'Genesis',
'Reid & Taylor', 'Xoxo', 'Roadster', 'test', 'Mother Earth',
'Inc 5', 'Rocia', 'Chimp', 'Hanes', 'Belmonte', 'AND',
'Enroute Women', 'Vans', 'Arrow', 'New Hide', 'ADIDAS Originals',
'Arrow Sport', 'Hidekraft', 'Chhota Bheem', 'Belkin',
'Black coffee', 'Facit', 'Warner Bros', 'Tom & Jerry', 'Turtle',
'Mayhem', 'MTV', 'Tokyo Talkies', 'Enroute Men', 'Levis',
'Peter England', 'Spice Art', 'I DEE', 'Police', 'Image', 'GAS',
'Lino Perros', 'U.S. Polo Assn.', 'Pepe Jeans', 'CASIO', 'Beyouty',
'Speedo', 'CAT', 'Crusoe', 'DENI YO', 'Manchester United', 'Aneri',
'Wills Lifestyle', 'Undercolors of Benetton', 'VITAL Gear',
'GUESS', 'Nautica', 'DAVID BECKHAM', 'C Vox', 'Arrow Woman',
'Campbell', 'Quiksilver', 'ice watch', 'DIVA', 'Baggit', 'Tabac',
'4711', 'FOOTLOOSE', 'Skybags', 'Allen Solly', 'Celine Dion',
'Louis Philippe', 'Pal Zileri', 'Van Heusen', 'roxy', 'KIARA',
'Enamor', 'Fossil', 'Biba', 'John Players', 'Global Desi',
'Woodland', '2go ACTIVE GEAR USA', 'maxima', 'Satya Paul',
'Hugo Boss', 'aramis', 'DKNY', 'dunhill', 'Nautilus',
'Baldessarini', 'BOSS', 'Nike Fragrances', 'Music', 'Ray-Ban',
'OAKLEY', 'LA-EMOTIO', 'Folklore', 'Pacific Gold', 'Mumbai Slang',
'AMERICAN TOURISTER', 'Femella', 'J. DEL POZO', 'JAGUAR',
'Paris Hilton', 'TOUS', 'Slazenger', 'Formula 1', 'PERRY ELLIS',
'Calvin Klein', 'DAVIDOFF', 'yelloe', 'Miss-T', 'JOVAN',
'pierre cardin', 'MISS SIXTY', 'Kylie Minogue', 'Jockey',
'CHE GUEVARA', 'BULCHEE', 'YARDLEY', 'Secret Temptation',
'Park Avenue', 'Wild stone', 'Fogg', '18+', 'Gatsby', 'Old Spice',
'Denizen', 'CABARELLI', 'vogue', 'GIORDANO', 'ASPEN',
'Kenneth Cole', 'SKAGEN', 'Lovable', 'OPIUM', 'Fabindia', 'Azzaro',
'Cartier', 'Dolce & Gabbana', 'Ferrari', 'Issey Miyake', 'Versace',
'Arrow New York', 'Titan', 'Heart 2 Heart', 'Q&Q',
'Tonino Lamborghini', 'ONLY', 'Levitate', 'iPanema', 'Grendha',
'Allen Solly Woman', 'Van Heusen Woman', 'Angry Birds',
'U.S. Polo Assn. Denim Co.', 'Sepia', 'Jack & Jones', 'Homme',
'Cobblerz', 'Timex', 'Pieces', 'Vero Moda', 'Citizen', 'Tonga',
'Allen Solly Kids', 'Be For Bag', 'Envirosax', 'Windsor',
'Coolers', 'Fortune', 'Suunto', 'Senorita', 'Enroute Teens',
'Stens by Enroute', 'Bata', 'Gliders', 'Force 10', 'Kelme',
'Jungle Book', 'Peperone', 'Salomon', 'Ben 10', 'OTLS',
'SDL by Sweet Dreams', 'Strapless', 'Spinn', 'Paridhan',
'Nina Ricci', 'Yves Saint Laurent', 'Burberry',
'Salvatore Ferragamo', 'Bulgari', 'Mont Blanc', 'Estee Lauder',
'Valentino Perfumes', 'Giorgio Armani', 'Ralph Lauren',
'Carolina Herrera', 'U.S. Polo Assn. Kids', 'Madagascar 3',
'Happy Socks', '24', 'RNC', 'Mineral', 'Miami Blues', 'Polaroid',
'Kids Ville', 'Hannah Montana', 'Joker', 'Latin Quarters',
'Red Chief', 'Helix', 'Bwitch', 'Tortoise', 'Span', 'SWAYAM',
'Remanika', 'Estd. 1977', 'Prafful', 'Estelle', 'Alma',
'Little Miss Intimates', 'French Connection', 'Amante',
'FCUK Underwear', 'Calvin Klein Innerwear',
'Calvin Klein Underwear', 'Swiss Army', 'Wilson', 'Royal Diadem',
'HUGO', 'Carlos Moya', 'Spinz', 'Footfun', 'Globalite',
'Kama Sutra', 'Casio Baby-G', 'Lomani', 'Adrika', 'Fusion Beats',
'109F', 'Jacques M', 'Rasasi', 'Giorgio Beverly Hills',
'Paco Rabanne', 'Euroluxe', 'Rising Wave', 'Love Passport',
'Saint James', 'Barbie', 'F5', 'Umbro', 'York', 'Shree', 'Tiptopp',
'Portia', 'Avengers', 'The Amazing Spiderman', 'Pitaraa', 'Revv',
'Lucera', 'Miki Pearl', 'Deborah', 'Calzini', 'Parx', 'Stoln',
'Chromozome', 'Raymond', 'Hop Scotch', 'Kraus Jeans', 'Hidedge',
'Nyk', 'Inaya', 'Just Natural', 'Lencia', 'ToniQ', 'Red Rose',
'Mod-acc', 'Morellato', 'Just Cavalli', 'Fiorelli', 'JAG', 'FNF',
'Smugglerz', 'Ayaany', 'Garfield', 'Avon', 'F Sports', 'Sushilas',
'Ivory Tag', 'Lakme', 'Ponds', 'Smartoe', 'Revlon', 'Colorbar',
'Biara', 'Tommy Hilfiger', 'Streetwear', 'Olay', 'Horsefly', 'HM',
'Elle', 'Lotus Herbals', 'Biotique', 'Fruit of the loom',
'Rreverie', 'Rocky S', 'Alayna', 'FCUK', 'Hakashi',
'Taylor of London', 'Denim', 'Colour me', 'Cavallini', 'BRUT',
'Peri Peri', 'Avirate', 'Valley of Flowers'], dtype=object)
In [7]: idt=[]
In [8]: for i in range(1,425):
    idt.append(i)
In [9]: idt
Out[9]: [1,
2,
3,
4,
5,
6,
7,
8,
9,
10,
11,
12,
13,
14,
15,
16,
17,
18,
19,
20
In [10]: len(idt)
Out[10]: 424
In [11]: data={'id':idt,
'name':BrandName}
In [12]: dftest = pd.DataFrame(data)
In [13]: dftest
Out[13]:
      id               name
0      1               Nike
1      2               Puma
2      3            Quechua
3      4            Artengo
4      5            Kalenji
..   ...                ...
419  420          Cavallini
420  421               BRUT
421  422          Peri Peri
422  423            Avirate
423  424  Valley of Flowers
424 rows × 2 columns
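The upper bound 425 in the id loop is hard-coded to match the 424 unique brand names, so it would silently go out of step if the brand list changed. A small sketch of deriving the ids from the list itself follows; `brand_names` is a three-item stand-in for `BrandName`, and `brand_rows` is a hypothetical name.

```python
# Stand-in for the BrandName array (first three entries from the output above).
brand_names = ["Nike", "Puma", "Quechua"]

# enumerate(..., start=1) numbers the brands 1..N whatever N turns out to be.
brand_rows = [{"id": i, "name": name} for i, name in enumerate(brand_names, start=1)]
print(brand_rows)
```

`pd.DataFrame(brand_rows)` would then give the same two-column layout as `dftest`.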
from translate import Translator
translator= Translator(to_lang="Arabic")
translation = translator.translate("Nike")
print(translation)

from googletrans import Translator
translator = Translator()
Arabic_BrandName=[]
for i in BrandName:
    translation = translator.translate(i)
    Arabic_BrandName.append(translation)
Arabic_BrandName
In [14]: result = translator.translate("Newfeel",src='en', dest='arabic')
result.text
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-14-e6322cfe5fca> in <module>
----> 1 result = translator.translate("Newfeel",src='en', dest='arabic')
      2 result.text
NameError: name 'translator' is not defined
In [15]: result = translator.translate("Artengo",src='english', dest='arabic')
result.text
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-15-c6ef4e8326fa> in <module>
----> 1 result = translator.translate("Artengo",src='english', dest='arabic')
      2 result.text
NameError: name 'translator' is not defined
In [16]: print(result.pronunciation)
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-16-9243e0d409fd> in <module>
----> 1 print(result.pronunciation)
NameError: name 'result' is not defined
In [17]: from deep_translator import GoogleTranslator
​
In [18]: Arabic_BrandName=[]
for i in BrandName:
    if i.isnumeric():
        Arabic_BrandName.append(i)
    else:
        result =GoogleTranslator(source='en', target='ar').translate(i)
        Arabic_BrandName.append(result)
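The loop above makes one network call per brand and stops at the first failed request. A hedged sketch of a more forgiving variant is shown below: it takes the translation function as a parameter, keeps the original name when translation fails or returns nothing, and reuses earlier results for repeated names. `translate_names` and `fake_translate` are hypothetical names, and the fake translator only exists so the sketch runs offline.

```python
def translate_names(names, translate):
    """Translate each name; fall back to the original name when that is not possible."""
    cache = {}
    translated = []
    for name in names:
        if name not in cache:
            if name.isnumeric():
                # Purely numeric brand names such as '24' are kept unchanged.
                cache[name] = name
            else:
                try:
                    cache[name] = translate(name) or name
                except Exception:
                    # Broad on purpose: any translator error keeps the original name.
                    cache[name] = name
        translated.append(cache[name])
    return translated


# Offline stand-in for a real translator: knows a single brand, returns None otherwise.
fake_translate = {"Nike": "نايك"}.get
arabic = translate_names(["Nike", "24", "Unknown", "Nike"], fake_translate)
print(arabic)
```

With deep_translator, the second argument could be `GoogleTranslator(source='en', target='ar').translate`, the same call used in the cell above.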
​
In [19]: Arabic_BrandName
Out[19]: ['نايك',
 'بوما',
 'الكيتشوا',
 'أرتينجو',
 'كالينجي',
 'كيبستا',
 'إينيسيس',
 'دوميوس',
 'ديكاتلون',
 'نابيجي',
 'شعور جيد',
 'جيونوت',
 'ريبوك',
 'لوتو',
 'إنكفروت',
 'اتحاد كرة القدم',
 'شركة اديداس',
 'لي',
 'شركة اديداس',
 ...
In [20]: result =GoogleTranslator(source='en', target='ar').translate("b24")
In [21]: result
Out[21]: 'ب 24'
In [22]: import requests
In [23]: scrapeKeywords=[]
for i in BrandName:
    scrapeKeywords.append(i.replace(" ","+"))
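Replacing spaces with `+` is enough for most brands, but names such as 'Tom & Jerry' or 'Reid & Taylor' contain `&`, which a URL treats as a parameter separator and which would cut the search term short. A minimal sketch using the standard library's `quote_plus` is shown below; `logo_query` is a hypothetical helper name.

```python
from urllib.parse import quote_plus


def logo_query(brand):
    """Build a URL-safe search term of the form '<brand> logo'."""
    # quote_plus turns spaces into '+' and percent-encodes reserved characters like '&'.
    return quote_plus(f"{brand} logo")


print(logo_query("Nike"))
print(logo_query("Tom & Jerry"))
```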
​
In [24]: scrapeKeywords
Out[24]: ['Nike',
'Puma',
'Quechua',
'Artengo',
'Kalenji',
'Kipsta',
'Inesis',
'Domyos',
'Decathlon',
'Nabaiji',
'Newfeel',
'Geonaute',
'Reebok',
'Lotto',
'Inkfruit',
'FIFA',
'ADIDAS',
'Lee',
'Adidas',
'Basics'
In [25]: response=requests.get("https://www.google.com/search?q=taylor+of+london+logo&tbm=
In [26]: data
Out[26]: {'id': [1,
2,
3,
4,
5,
6,
7,
8,
9,
10,
11,
12,
13,
14,
15,
16,
17,
18,
19,
20
In [27]: from bs4 import BeautifulSoup
In [28]: soup= BeautifulSoup(response.content,"html.parser")
soup
Out[28]: <!DOCTYPE html PUBLIC "-//WAPFORUM//DTD XHTML Mobile 1.0//EN" "http://www.wap
forum.org/DTD/xhtml-mobile10.dtd">
<html dir="rtl" lang="ar" xmlns="http://www.w3.org/1999/xhtml"><head><meta co
ntent="application/xhtml+xml; charset=utf-8" http-equiv="Content-Type"/><meta
content="no-cache" name="Cache-Control"/><title dir="ltr">taylor of london lo
go - ‫ بحث‬Google</title><style>a{text-decoration:none;color:inherit}a:hover{te
xt-decoration:underline}a img{border:0}body{font-family:Roboto,Helvetica,Aria
l,sans-serif;padding:8px;margin:0 auto;max-width:700px;min-width:240px;}.FbhR
zb{border-right:thin solid #dadce0;border-left:thin solid #dadce0;border-top:
thin solid #dadce0;height:40px;overflow:hidden}.n692Zd{margin-bottom:10px}.cv
ifge{height:40px;border-spacing:0}.QvGUP{height:40px;padding:0 8px 0 8px;vert
ical-align:top}.O4cRJf{height:40px;width:100%;padding:0;padding-left:16px}.O1
ePr{height:40px;padding:0;vertical-align:top}.kgJEQe{height:36px;width:98px;v
ertical-align:top;margin-top:4px}.lXLRf{vertical-align:top}.MhzMZd{border:0;v
ertical-align:middle;font-size:14px;height:40px;padding:0;width:100%;padding-right:16px}.xB0fq{height:40px;border:none;font-size:14px;background-color:#42
85f4;color:#fff;padding:0 16px;margin:0;vertical-align:top;cursor:pointer}.xB
0fq:focus{border:1px solid #000}.M7pB2{border:thin solid #dadce0;margin:0 0 3
px 0;font-size:13px;font-weight:500;height:40px}.euZec{width:100%;height:40p
...
In [29]: soup.select('img[src^="https://encrypted-tbn0.gstatic.com/images"]')[0]['src']
Out[29]: 'https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcRj4QKatyLljpw3eCK0la48ov
NCQfbvQbiZEGe_oKd0ofUbfK3PB-MFTEO2RQ&s'
In [30]: BrandName_Logos=[]
for i in scrapeKeywords:
    response=requests.get("https://www.google.com/search?q="+i+"+logo&tbm=isch&ve
    soup= BeautifulSoup(response.content,"html.parser")
    logoSRC=soup.select('img[src^="https://encrypted-tbn0.gstatic.com/images"]')[
    BrandName_Logos.append(logoSRC)
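Indexing the result of `select(...)` with `[0]`, as In [29] does, raises an `IndexError` as soon as one results page contains no matching thumbnail, and that would abort the whole loop. The sketch below shows the guard in isolation. It is not the notebook's code: to stay runnable offline and without extra packages it uses the standard library's `html.parser` in place of BeautifulSoup, and `first_thumbnail`, `THUMB_PREFIX` and the sample page are hypothetical.

```python
from html.parser import HTMLParser

# Prefix used by the notebook's CSS selector.
THUMB_PREFIX = "https://encrypted-tbn0.gstatic.com/images"


class _ThumbFinder(HTMLParser):
    """Remember the first <img> whose src starts with THUMB_PREFIX."""

    def __init__(self):
        super().__init__()
        self.src = None

    def handle_starttag(self, tag, attrs):
        if tag == "img" and self.src is None:
            src = dict(attrs).get("src") or ""
            if src.startswith(THUMB_PREFIX):
                self.src = src


def first_thumbnail(html):
    """Return the first matching thumbnail URL, or None if there is none."""
    finder = _ThumbFinder()
    finder.feed(html)
    return finder.src


# Hypothetical page: one unrelated image, then one thumbnail.
page = (
    '<html><body><img src="/logo.png">'
    '<img src="https://encrypted-tbn0.gstatic.com/images?q=tbn:abc">'
    "</body></html>"
)
print(first_thumbnail(page))
print(first_thumbnail("<p>no images here</p>"))
```

Appending the returned value even when it is `None` keeps the logo list the same length as `BrandName`, which the later DataFrame construction depends on. With BeautifulSoup the same guard would be a check that the list returned by `select` is non-empty before indexing it.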
In [31]: len(BrandName_Logos)
Out[31]: 424
In [32]: BrandName_Logos
Out[32]: ['https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcQi_iLEOGN8vEqDnH3lObH
HlYM0uwme1L7LmejHPDOqk2wi50G-itdJUlSmS54&s',
'https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcTHg2wTDFgZrmOTk0eRkgg
bBbajhWNOoVSTcyiPy0yUluiNdASfBBBDQX6wug&s',
'https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcS9az2e0v7sgAQU1ObkG4x
nBdXdwGITW0xERtkMpb2RpzUeUKz9hFzknh2SZw&s',
'https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcRgQSAZ8MiWdlh-lIA6kA9
lnpAJHI79KWE4YdeSV3TK5Dns8RX967OSAEASyao&s',
'https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcRfRJKgaf74x6DiXbYvsPK
pQqMqkrGOlEpAAOQghpSbmD2K8MzzF_mpvYFDL7g&s',
'https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcQFpuWbkk2be4cq0UH7l0D
sgcmIQpktodh_JeOquBAc_HNOLL6OPKjRpMjW5g&s',
'https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcT1cX3hOc8WYdfJIsRqy6t
Sxb_EOugkRewTOEEwjyPoECR8qjraz50zmslx3g&s',
'https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcQ0PS36o8zEDaT4CPoN_jD
cM1V37UVJbRihorFfFC5rt7wCRQ-_NtBc4unfZpI&s',
'https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcQcfd-UV8vJ8ujvf3-jfi9
yLDVwKETrH8ZsYNXQhd6Hk92_4y2P-gUL_FPTtQ&s',
'https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcSYxx-Tn_R-kVqg2niic1U
...
In [33]: top=[]
top[:]=np.zeros
top
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-33-dd047583b9da> in <module>
      1 top=[]
----> 2 top[:]=np.zeros
      3 top

TypeError: can only assign an iterable
In [34]: top = []
for i in range(424):
    top.append(0)
top
Out[34]: [0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0
In [35]: len(top)
Out[35]: 424
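The assignment `top[:]=np.zeros` in In [33] failed because `np.zeros` is the function object itself, not an array, and slice assignment needs an iterable. The loop in In [34] works; a shorter equivalent, sketched here with a hypothetical `n_brands` variable in place of the hard-coded 424, uses list repetition.

```python
n_brands = 424  # matches len(BrandName_Logos) reported in Out[31]

# List repetition builds a constant column in one step.
# This is safe here because ints and strings are immutable.
top = [0] * n_brands
slug = [""] * n_brands

print(len(top), top[:3])
```

With NumPy, `np.zeros(n_brands, dtype=int).tolist()` produces the same list of zeros.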
In [36]: slug=[]
for i in range(424):
    slug.append("")
In [37]: slug
Out[37]: ['',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
''
In [38]: meta_title=[]
for i in range(424):
    meta_title.append("")
In [39]: meta_title
Out[39]: ['',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
''
In [40]: meta_description=meta_title
In [41]: meta_description
Out[41]: ['',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
''
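Note that `meta_description=meta_title` does not create a second list. Both names refer to the same object, so a later in-place change to one list would also show up under the other name. That is harmless while every entry stays an empty string, but a small sketch with shortened, illustrative lists shows the difference between aliasing and copying.

```python
meta_title = ["", "", ""]

alias = meta_title              # same list object under a second name
independent = list(meta_title)  # separate shallow copy

meta_title[0] = "Nike shoes"    # illustrative value, not from the dataset

print(alias[0])        # the alias sees the change
print(independent[0])  # the copy does not
```

If the two columns are ever meant to diverge, building `meta_description` from a copy avoids the shared-state surprise.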
In [42]: created_at=[]
updated_at=[]
for i in range(424):
    created_at.append("")
    updated_at.append("")
In [43]: created_at
Out[43]: ['',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
''
In [44]: updated_at
Out[44]: ['',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
''
In [45]: brands={'id':idt,
'name':BrandName,
'logo':BrandName_Logos,
'top':top,
'slug':slug,
'meta_title':meta_title,
'meta_description':meta_description,
'created_at':created_at,
'updated_at':updated_at}
In [46]: brands
Out[46]: {'id': [1,
2,
3,
4,
5,
6,
7,
8,
9,
10,
11,
12,
13,
14,
15,
16,
17,
18,
19,
20
In [47]: brands = pd.DataFrame(brands)
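`pd.DataFrame` raises a `ValueError` when the lists in the dictionary differ in length, which is easy to hit when each column is built in a separate cell. A quick check on the dictionary, sketched here with a hypothetical helper `column_lengths` and toy data rather than the real 424-row columns, makes the offending column obvious.

```python
def column_lengths(columns):
    """Map each column name to the length of its list."""
    return {name: len(values) for name, values in columns.items()}


# Toy stand-in for the notebook's brands dictionary.
toy_brands = {
    "id": [1, 2, 3],
    "name": ["Nike", "Puma", "Quechua"],
    "logo": ["url-1", "url-2"],  # one entry short on purpose
}

lengths = column_lengths(toy_brands)
# Compare every column against the id column.
mismatched = {name: n for name, n in lengths.items() if n != lengths["id"]}
print(mismatched)
```

An empty `mismatched` dictionary means the columns line up and the frame can be built.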
In [48]: brands
Out[48]:
     id   name               logo                                               top  slug  meta_title  meta_description  created_at  updated_at
0    1    Nike               https://encrypted-tbn0.gstatic.com/images?q=tb...  0
1    2    Puma               https://encrypted-tbn0.gstatic.com/images?q=tb...  0
2    3    Quechua            https://encrypted-tbn0.gstatic.com/images?q=tb...  0
3    4    Artengo            https://encrypted-tbn0.gstatic.com/images?q=tb...  0
4    5    Kalenji            https://encrypted-tbn0.gstatic.com/images?q=tb...  0
...  ...  ...                ...                                                ...  ...   ...         ...               ...         ...
419  420  Cavallini          https://encrypted-tbn0.gstatic.com/images?q=tb...  0
420  421  BRUT               https://encrypted-tbn0.gstatic.com/images?q=tb...  0
421  422  Peri Peri          https://encrypted-tbn0.gstatic.com/images?q=tb...  0
422  423  Avirate            https://encrypted-tbn0.gstatic.com/images?q=tb...  0
423  424  Valley of Flowers  https://encrypted-tbn0.gstatic.com/images?q=tb...  0

424 rows × 9 columns
In [49]: brands.to_csv('brands.csv',index=False)
In [50]: langarabic=[]
for i in range(424):
    langarabic.append("Arabic")
In [51]: brands_translations={'id':idt,
'brand_id':idt,
'name':Arabic_BrandName,
'lang':langarabic,
'created_at':created_at,
'updated_at':updated_at}
In [52]: brands_translations = pd.DataFrame(brands_translations)
In [53]: brands_translations
Out[53]:
     id   brand_id  name         lang    created_at  updated_at
0    1    1         نايك         Arabic
1    2    2         بوما         Arabic
2    3    3         الكيتشوا     Arabic
3    4    4         أرتينجو      Arabic
4    5    5         كالينجي      Arabic
...  ...  ...       ...          ...     ...         ...
419  420  420       كافاليني     Arabic
420  421  421       BRUT         Arabic
421  422  422       بيري بيري    Arabic
422  423  423       أفيرات       Arabic
423  424  424       وادي الزهور  Arabic

424 rows × 6 columns
In [54]: brands_translations.to_csv('brand_translations.csv',index=False)
In [55]: Semi_proccessed_Complete_Jdata.iloc[[0]].articleAttributes.values.tolist()
Out[55]: ["{'Fit': 'Regular Fit', 'Fabric 3': 'NA', 'Body or Garment Size': 'Garment Mea
surements in', 'Occasion': 'Sports'}"]
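The value in `articleAttributes` is a string that holds a Python dict literal, not a dict, and its single quotes mean `json.loads` would reject it. A minimal sketch, assuming every row follows the same literal format as the one shown in Out[55], parses it with the standard library's `ast.literal_eval`; the variable names `raw` and `attributes` are illustrative.

```python
import ast

# The cell value from Out[55], joined back into one line.
raw = (
    "{'Fit': 'Regular Fit', 'Fabric 3': 'NA', "
    "'Body or Garment Size': 'Garment Measurements in', 'Occasion': 'Sports'}"
)

# literal_eval only evaluates Python literals, so it is safer than eval.
attributes = ast.literal_eval(raw)
print(attributes["Occasion"])
```

Once parsed, the keys can be turned into separate columns if the attributes are needed downstream.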
In [2]: dfs=pd.read_csv("brand_translations.csv")
In [3]: dfs
Out[3]:
     id   brand_id  name         lang    created_at  updated_at
0    1    1         نايك         Arabic  NaN         NaN
1    2    2         بوما         Arabic  NaN         NaN
2    3    3         الكيتشوا     Arabic  NaN         NaN
3    4    4         أرتينجو      Arabic  NaN         NaN
4    5    5         كالينجي      Arabic  NaN         NaN
...  ...  ...       ...          ...     ...         ...
419  420  420       كافاليني     Arabic  NaN         NaN
420  421  421       BRUT         Arabic  NaN         NaN
421  422  422       بيري بيري    Arabic  NaN         NaN
422  423  423       أفيرات       Arabic  NaN         NaN
423  424  424       وادي الزهور  Arabic  NaN         NaN

424 rows × 6 columns
In [11]: dfs.to_csv("brand_translations_.csv", encoding="utf-8",index=False)
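The `NaN` values in `created_at` and `updated_at` above are the empty strings written out in In [54]: `read_csv` treats empty fields as missing by default. A minimal sketch, using an in-memory CSV with two illustrative rows instead of the real file, shows how `keep_default_na=False` keeps them as empty strings.

```python
import io

import pandas as pd

# In-memory stand-in for brand_translations.csv (two illustrative rows).
csv_text = (
    "id,brand_id,name,lang,created_at,updated_at\n"
    "1,1,نايك,Arabic,,\n"
    "2,2,بوما,Arabic,,\n"
)

default = pd.read_csv(io.StringIO(csv_text))
kept = pd.read_csv(io.StringIO(csv_text), keep_default_na=False)

print(default["created_at"].isna().all())  # empty fields parsed as missing
print(kept["created_at"].tolist())         # empty fields kept as ''
```

`to_csv` already writes UTF-8 by default, so the `encoding="utf-8"` argument in In [11] makes the choice explicit rather than changing the output.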
In [10]: pd.read_csv("brand_translations.csv")
Out[10]:
     id   brand_id  name         lang    created_at  updated_at
0    1    1         نايك         Arabic  NaN         NaN
1    2    2         بوما         Arabic  NaN         NaN
2    3    3         الكيتشوا     Arabic  NaN         NaN
3    4    4         أرتينجو      Arabic  NaN         NaN
4    5    5         كالينجي      Arabic  NaN         NaN
...  ...  ...       ...          ...     ...         ...
419  420  420       كافاليني     Arabic  NaN         NaN
420  421  421       BRUT         Arabic  NaN         NaN
421  422  422       بيري بيري    Arabic  NaN         NaN
422  423  423       أفيرات       Arabic  NaN         NaN
423  424  424       وادي الزهور  Arabic  NaN         NaN

424 rows × 6 columns