IOAI 2024: ML problem - Colab
10/08/2024, 10:45 PM
Lost in a Hyperspace: ML Regression challenge
https://colab.research.google.com/drive/181h83ZYyL3twpaciwuS53ISqpKJZGapK#scrollTo=4b8elP3OA4IG
Story Background
Congratulations on your promotion to Principal Engineering Detective! Your impressive work
on the previous task has earned you this exciting new challenge. Now, you are entrusted with
the ancient and mesmerizing Glowing Hypercubes, which share some intriguing
characteristics with the "Pulse of the Machine" widgets from your last mission (refer to
Important Tips for details). Your mission is to unravel the mysteries of these Glowing
Hypercubes by predicting three vital properties using the provided data.
Objective and Limitations
• Your ultimate goal is to effectively predict three properties of the Glowing Hypercubes
• Every Glowing Hypercube is represented by a (5 x 5 x 5 x 6) array with lots of
symmetries and unique properties (see the Important Tips section for details)
• You need to engineer a small number of features from the Glowing Hypercube data,
since efficient factory procedures allow you to use only Linear Regression as a model,
with no hyperparameter changes allowed. You are also limited to 300 features for each
task.
• Your success will be measured by the Root Mean Square Error metric for each property
independently, which is translated into the score on the leaderboard.
• Note that different properties have different weights in the final score. See the
SCALING_WEIGHTS variable for details. After scaling, to make a single score number, we
will average the normalized RMSEs for each property.
• Your solution for each task should not exceed 5 minutes for feature generation, training,
and inference on the standard Colab non-GPU instance.
• Share the ml_feature_0.txt, ml_feature_1.txt, and ml_feature_2.txt files with
us, and don't forget to supply your Google Colab as well
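The scoring rule described in the list above can be sketched in a few lines. This is only an illustration: `rmse` and `leaderboard_score` are hypothetical helper names, the raw RMSE values are made up, and the weights are copied from the SCALING_WEIGHTS variable defined in this notebook.

```python
import numpy as np

# Weights as defined in the notebook, one per property.
SCALING_WEIGHTS = [100/15, 100/8, 100/100]

def rmse(y_true, y_pred):
    """Root Mean Square Error between two 1-D arrays."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def leaderboard_score(raw_rmses, weights=SCALING_WEIGHTS):
    """Scale each property's RMSE by its weight, then average (hypothetical helper)."""
    scaled = [r * w for r, w in zip(raw_rmses, weights)]
    return sum(scaled) / len(scaled)

# Made-up raw RMSEs for properties 0, 1 and 2 (illustration only).
example_rmses = [1.5, 0.8, 10.0]
score = leaderboard_score(example_rmses)
print(f"{score:.6f}")
```

With these made-up numbers each scaled RMSE is about 10, so the average is about 10 as well. The weights mean that a raw error of 0.8 on property 1 costs roughly as much as a raw error of 10 on property 2.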
Important Tips
[Figure: Glowing Hypercube, shape = (5, 5, 5, 6)]
- Last dimension corresponds to the threads in "Pulse of the Machine" from the home task, as shown on the left
- First three axes are interchangeable (the answer does not change if axes are swapped by numpy.swapaxes)
- Data contains many symmetries and is redundant
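One way to exploit the interchangeable axes, sketched here under assumptions, is to average each hypercube over all six orderings of its first three axes and keep a single value per unordered index triple. The helper names `symmetrize` and `compact_features` are hypothetical, the input is random data rather than the contest data, and nothing here is claimed to be the intended solution. Averaging also throws away anything that depends on axis order, so it may or may not help on the real targets.

```python
import itertools
import numpy as np

def symmetrize(X):
    """Average hypercubes over all 6 orderings of the three cube axes.

    X has shape (n_samples, 5, 5, 5, 6). Axis 0 (samples) and axis 4
    (threads) stay in place; only axes 1, 2, 3 are permuted.
    """
    perms = list(itertools.permutations((1, 2, 3)))
    total = np.zeros(X.shape, dtype=float)
    for p in perms:
        total += np.transpose(X, (0, *p, 4))
    return total / len(perms)

def compact_features(X):
    """Keep one entry per unordered index triple (i <= j <= k) and thread."""
    sym = symmetrize(X)
    triples = list(itertools.combinations_with_replacement(range(5), 3))
    i_idx, j_idx, k_idx = (list(t) for t in zip(*triples))
    picked = sym[:, i_idx, j_idx, k_idx, :]   # (n_samples, 35, 6)
    return picked.reshape(X.shape[0], -1)     # (n_samples, 210)

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(4, 5, 5, 5, 6))  # random stand-in, NOT the contest data

feats = compact_features(X_demo)
# In a batch, the cube axes are 1, 2, 3 because axis 0 indexes samples.
feats_swapped = compact_features(np.swapaxes(X_demo, 1, 3))
print(feats.shape)
print(np.allclose(feats, feats_swapped))
```

There are 35 unordered triples of indices in range(5), so this yields 35 x 6 = 210 features per hypercube, which fits under the 300-feature limit. The final check confirms that the features do not change when two cube axes of the input are swapped.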
• Linear Regression documentation
o https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html
• Handy Numpy functions:
o https://numpy.org/doc/stable/reference/generated/numpy.swapaxes.html
o https://numpy.org/doc/stable/reference/generated/numpy.ravel.html
o https://numpy.org/doc/stable/reference/generated/numpy.reshape.html
• Root Mean Square Error
o https://en.wikipedia.org/wiki/Root_mean_square_deviation
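As a quick orientation, the three NumPy functions linked above behave as follows on an array shaped like a single hypercube. The array here is a toy one filled with consecutive integers, used only so that individual elements are easy to track.

```python
import numpy as np

# Toy stand-in for one hypercube: consecutive integers in shape (5, 5, 5, 6).
cube = np.arange(5 * 5 * 5 * 6).reshape(5, 5, 5, 6)

swapped = np.swapaxes(cube, 0, 2)   # exchange the first and third axes
flat = np.ravel(cube)               # all 750 values as a 1-D array
per_thread = cube.reshape(-1, 6)    # 125 cells x 6 threads

print(swapped.shape, flat.shape, per_thread.shape)

# Element (i, j, k, q) of the original sits at (k, j, i, q) after the swap.
print(swapped[3, 1, 0, 2] == cube[0, 1, 3, 2])
```

The shapes printed are (5, 5, 5, 6), (750,) and (125, 6), and the final comparison prints True.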
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
SCALING_WEIGHTS = [100/15, 100/8, 100/100]
!gdown lfej2iwlU2k5ugiRdUGJulz_b_qOWydN
data = pd.read_pickle('ml_data_onsite_.pickle')
for key in data.keys():
    print(key)
for key in data['X'].keys():
    print(key)
for key in data['y'].keys():
    print(key)
X_train = data['X']['train']
y_train = data['y']['train']
X_val = data['X']['val']
y_val = data['y']['val']
X_test = data['X']['test']
X_train.shape, y_train.shape, X_val.shape, y_val.shape, X_test.shape
def vis(arr):
    # Draw one hypercube as a 5 x 6 grid of 5 x 5 images:
    # one row per index z of the third axis, one column per thread q.
    plt.figure(figsize=(8, 8))
    cnt = 1
    for z in range(5):
        for q in range(6):
            plt.subplot(5, 6, cnt)
            plt.imshow(arr[:, :, z, q], vmin=-40, vmax=40, cmap='hsv')
            plt.grid()
            plt.axis('off')
            cnt += 1
    plt.tight_layout()

vis(X_train[0])
Functions for result evaluation / writing predictions
Do not change it!
def test_solution(X_train, y_train, X_val, y_val, feature_num=0):
    assert X_train.shape[-1] <= 300, "Too many features! Should be less than 300"
    assert X_val.shape[-1] <= 300, "Too many features! Should be less than 300"
    model = LinearRegression().fit(
        X_train,
        y_train[:, feature_num]
    )
    predictions = model.predict(X_val)
    rmse = mean_squared_error(
        predictions,
        y_val[:, feature_num]
    )**.5
    normalized_rmse = rmse * SCALING_WEIGHTS[feature_num]
    print(f"Property #{feature_num}: raw RMSE={rmse:.6f}")
    print(f"Property #{feature_num}: scaled RMSE={normalized_rmse:.6f}")
    return normalized_rmse.round(6)
def write_predictions(X_train, y_train, X_test, feature_num=0):
    assert X_train.shape[-1] <= 300
    assert X_val.shape[-1] <= 300  # X_val is the notebook-level variable here, not an argument
    model = LinearRegression().fit(
        X_train,
        y_train[:, feature_num]
    )
    predictions = model.predict(X_test)
    with open(f'predictions_feature_{feature_num}.txt', 'w') as f:
        f.write('\n'.join([str(x.round(6))
                           for x in predictions]))
Let's try a baseline solution
def dummy_feature_extractor(X):
    X_new = X.reshape((X.shape[0], -1))  # ravel
    X_new = X_new[:, :300]  # pick first 300 features
    return X_new

dummy_feature_extractor(X_train).shape
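It may help to see which part of each hypercube the baseline actually keeps. With NumPy's default row-major layout, one slice along the first cube axis holds 5 x 5 x 6 = 150 values, so the first 300 flattened values are exactly the first two of the five slices along that axis. The check below uses random data as a stand-in for the real arrays.

```python
import numpy as np

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(3, 5, 5, 5, 6))  # random stand-in, NOT the contest data

# What the baseline extractor computes: flatten, then keep the first 300 values.
first_300 = X_demo.reshape((X_demo.shape[0], -1))[:, :300]

# The same values, selected explicitly as slices 0 and 1 of the first cube axis.
two_slices = X_demo[:, :2].reshape((X_demo.shape[0], -1))

print(first_300.shape)
print(np.array_equal(first_300, two_slices))
```

This prints (3, 300) and True. The baseline therefore never looks at the remaining three slices, which is 450 of the 750 values in each hypercube.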
%%time
total_score = 0
for feature_number in range(3):
    total_score += test_solution(
        dummy_feature_extractor(X_train),
        y_train,
        dummy_feature_extractor(X_val),
        y_val,
        feature_num=feature_number
    )
    print()
total_score /= 3
print('='*16)
print(f"Total score = {total_score:.6f}")
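Any replacement for the baseline extractor has to respect the 300-feature limit and the 5-minute budget. The sketch below checks both for a simple candidate that summarizes each thread with four statistics taken over the three cube axes. The name `stats_feature_extractor` is hypothetical, the data is synthetic, and the choice of statistics is an arbitrary illustration rather than a recommended feature set. Only feature generation is timed here; training and inference would need to fit in the same budget.

```python
import time
import numpy as np

def stats_feature_extractor(X):
    """Per-thread mean, std, min and max over all 125 cells (illustrative only).

    Each statistic ignores the order of the cells, so the output does not
    depend on how the first three cube axes are ordered.
    """
    cells = X.reshape(X.shape[0], -1, X.shape[-1])   # (n_samples, 125, 6)
    feats = [cells.mean(axis=1), cells.std(axis=1),
             cells.min(axis=1), cells.max(axis=1)]
    return np.concatenate(feats, axis=1)             # (n_samples, 24)

rng = np.random.default_rng(0)
X_fake = rng.normal(size=(200, 5, 5, 5, 6))  # synthetic stand-in for X_train

start = time.perf_counter()
F = stats_feature_extractor(X_fake)
elapsed = time.perf_counter() - start

assert F.shape[-1] <= 300, "Too many features!"
assert elapsed < 5 * 60, "Feature generation alone exceeds the time budget"
print(F.shape, f"{elapsed:.3f}s")
```

An extractor with this signature can be passed to test_solution in place of dummy_feature_extractor.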
How to prepare the answer files
for feature_number in range(3):
    write_predictions(
        dummy_feature_extractor(X_train),
        y_train,
        dummy_feature_extractor(X_test),
        feature_num=feature_number
    )
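Before sharing the answer files, it can be worth reading one back to confirm it has one finite number per line and the expected number of rows. The sketch below writes a file in the same one-value-per-line format used by write_predictions and then checks it. The helper `check_prediction_file` is hypothetical, the predictions are made up, and the file goes to a temporary directory so nothing real is overwritten.

```python
import os
import tempfile
import numpy as np

def check_prediction_file(path, expected_rows):
    """Load a one-value-per-line file and run basic sanity checks (hypothetical helper)."""
    values = np.loadtxt(path, ndmin=1)
    assert values.shape == (expected_rows,), f"unexpected shape {values.shape}"
    assert np.all(np.isfinite(values)), "file contains NaN or inf"
    return values

preds = np.array([0.1234567, -2.5, 3.0])  # made-up predictions

# Same text format as write_predictions: rounded values joined by newlines.
path = os.path.join(tempfile.mkdtemp(), "predictions_feature_0.txt")
with open(path, "w") as f:
    f.write("\n".join([str(x.round(6)) for x in preds]))

loaded = check_prediction_file(path, expected_rows=len(preds))
print(loaded)
```

For a real submission, expected_rows would be the number of samples in X_test.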