I encountered a piece of PyTorch code where the trained model is saved with a .pk extension. I often see PyTorch models saved as .pth or .pt. What is the .pk format, and how does it differ from .pth or .pt? By the way, the following parameters and weights are saved in the .pk file:

```python
save_dict = {
    "encoderOpts": encoderOpts,
    "classifierOpts": classifierOpts,
    "dataOpts": dataOpts,
    "encoderState": encoder_state_dict,
    "classifierState": classifier_state_dict,
}
```

Many thanks in advance!
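For context, the extension is only a filename convention: `torch.save` serializes with Python's pickle protocol, so a `.pk`, `.pt`, or `.pth` file contains the same kind of byte stream. A minimal stdlib sketch (the dict contents here are made-up stand-ins for the opts/state entries above):

```python
import pickle

# Hypothetical stand-ins for the opts/state dicts in the question.
save_dict = {
    "encoderOpts": {"hidden_size": 128},
    "encoderState": {"weight": [0.1, 0.2]},
}

# The filename's extension is arbitrary: pickle (which torch.save wraps)
# writes the same bytes whether the file is called .pk, .pt, or .pth.
with open("model.pk", "wb") as f:
    pickle.dump(save_dict, f)

with open("model.pk", "rb") as f:
    restored = pickle.load(f)

print(restored == save_dict)  # -> True
```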
I built a fastText classification model to do sentiment analysis of Facebook comments (using PySpark 2.4.1 on Windows). When I use the model's predict function on a sentence, the result is a tuple of the form below:

```
[('__label__positif', '__label__négatif', '__label__neutre', 0.8947999477386475, 0.08174632489681244, 0.023483742028474808)]
```

But when I tried to apply it to the column "text", I did this:

```python
from pyspark.sql.types import *
from pyspark.sql.functions import udf, col
import fasttext

schema = StructType([
    StructField("pos", StringType(), …
```
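As an aside, the flat tuple above can be paired into labels and probabilities before building the UDF's return value; a plain-Python sketch using the sample prediction from the question:

```python
# fastText returns all labels followed by all probabilities in one flat
# tuple; splitting it in half and zipping gives a label -> score mapping
# that is easier to turn into a typed (StructType) UDF result.
pred = ('__label__positif', '__label__négatif', '__label__neutre',
        0.8947999477386475, 0.08174632489681244, 0.023483742028474808)

n = len(pred) // 2
labels, probs = pred[:n], pred[n:]
scores = dict(zip(labels, probs))

print(scores['__label__positif'])  # -> 0.8947999477386475
```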
I prototyped an ML model consisting of preprocessing plus multiple stacked regressors. I would like a colleague of mine to develop an API that queries the model. Is there any way to query the model (an sklearn Pipeline) without having to install all the dependencies (XGBoost, LGBM, CatBoost, ...)? I tried serializing it with joblib, but deserializing it on another machine still requires the dependencies to be installed. The goal is really to transform the sklearn pipeline to …
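For background on why the dependencies are needed: pickle (which joblib builds on) stores only a reference to each object's class, not the class's code. A small stdlib sketch with a made-up stand-in class:

```python
import pickle

class StackedRegressor:
    """Hypothetical stand-in for an XGBoost/LGBM estimator."""
    def predict(self, x):
        return x

blob = pickle.dumps(StackedRegressor())

# The stream records only "module.ClassName", not the implementation,
# so unpickling re-imports that module -- which is why a joblib/pickle
# artifact still needs XGBoost etc. installed on the loading machine.
print(b"StackedRegressor" in blob)  # -> True
```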
I am not entirely sure if this is on-topic here, so please let me know if it is not. I keep seeing the idea of YAML files pop up while reading machine learning literature. My question is, what exactly is a YAML file, and how does it relate to machine learning and data science projects?
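For illustration, YAML is a human-readable data-serialization format; in ML and data science projects it typically appears as a configuration file holding hyperparameters, paths, and pipeline settings. A hypothetical `params.yaml` might look like:

```yaml
# Hypothetical training configuration -- all names are made up.
model:
  type: random_forest
  n_estimators: 200
data:
  train_path: data/train.csv
  test_size: 0.2
training:
  seed: 42
```

Tools such as DVC and many training frameworks read files like this so that experiment settings live outside the code.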
I have about 100 MB of CSV data, cleaned and used for training in Keras, stored as a pandas DataFrame. What is a good (simple) way of saving it for fast reads? I don't need to query it or load only part of it. Some options appear to be:

- HDFS
- HDF5
- HDFS3
- PyArrow
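For a frame that is always read back whole, any binary format avoids re-parsing CSV on every run. A minimal sketch with a toy stand-in frame, using pandas' built-in pickle round-trip (Parquet/Feather via PyArrow are portable alternatives):

```python
import pandas as pd

# Toy stand-in for the ~100 MB training frame.
df = pd.DataFrame({"feature": [1.0, 2.0, 3.0], "label": [0, 1, 0]})

# to_pickle is the simplest built-in binary option for whole-file
# reads; df.to_parquet / df.to_feather (PyArrow-backed) behave
# similarly and are readable from other languages too.
df.to_pickle("train.pkl")
restored = pd.read_pickle("train.pkl")

print(restored.equals(df))  # -> True
```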