model serialization - what is ".pk" format?

I encountered a piece of PyTorch code where the trained model is saved with a .pk extension. I often see PyTorch models saved as .pth or .pt. What is the .pk format and how does it differ from .pth or .pt? By the way, the following parameters and weights are saved in the .pk file:

    save_dict = {
        "encoderOpts": encoderOpts,
        "classifierOpts": classifierOpts,
        "dataOpts": dataOpts,
        "encoderState": encoder_state_dict,
        "classifierState": classifier_state_dict,
    }

Many thanks in advance!
Category: Data Science
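A short sketch of why the extension is interchangeable: torch.save writes a pickle-based stream, and neither pickle nor torch.load cares what the file is called, so .pk, .pt, and .pth hold the same kind of payload. The example below uses plain pickle as a stand-in for torch.save/torch.load; the dictionary keys mirror the question's save_dict, but the values are placeholders invented for illustration.

```python
import os
import pickle
import tempfile

# Placeholder values standing in for the real options and state_dicts;
# in the original code these come from the trained encoder/classifier.
save_dict = {
    "encoderOpts": {"hidden_size": 128},
    "classifierOpts": {"num_classes": 3},
    "dataOpts": {"sample_rate": 44100},
    "encoderState": {},      # encoder.state_dict() in the real code
    "classifierState": {},   # classifier.state_dict() in the real code
}

# The ".pk" suffix is purely conventional; pickle ignores it.
path = os.path.join(tempfile.mkdtemp(), "model.pk")

with open(path, "wb") as f:
    pickle.dump(save_dict, f)    # analogous to torch.save(save_dict, path)

with open(path, "rb") as f:
    restored = pickle.load(f)    # analogous to torch.load(path)
```

Renaming the file to model.pth would change nothing about its contents or how it is loaded.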

PicklingError: Could not serialize object: TypeError: can't pickle fasttext_pybind.fasttext objects

I built a fastText classification model to do sentiment analysis of Facebook comments (using PySpark 2.4.1 on Windows). When I use the model's prediction function on a sentence, the result is a tuple of the form below:

    [('__label__positif', '__label__négatif', '__label__neutre',
      0.8947999477386475, 0.08174632489681244, 0.023483742028474808)]

but when I tried to apply it to the column "text", I did this:

    from pyspark.sql.types import *
    from pyspark.sql.functions import udf, col
    import fasttext

    schema = StructType([
        StructField("pos", StringType(), …
Category: Data Science
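The error arises because Spark pickles the UDF's closure to ship it to executors, and the fasttext_pybind.fasttext handle is a native C++ object that pickle cannot serialize. A common workaround is to make the wrapper carry only the model *path* and load the model lazily on each worker. The sketch below is a minimal, hypothetical illustration of that pattern: load_model-style calls are replaced with placeholders, since the real fasttext.load_model and predict calls would run on the executors.

```python
import pickle


class SentimentPredictor:
    """Picklable wrapper around an unpicklable native model handle.

    Only ``model_path`` survives serialization; the handle itself is
    dropped in __getstate__ and re-created lazily on first use.
    """

    def __init__(self, model_path):
        self.model_path = model_path
        self._model = None  # native handle, loaded lazily per worker

    def __getstate__(self):
        # Drop the unpicklable handle before Spark serializes the closure.
        state = self.__dict__.copy()
        state["_model"] = None
        return state

    def _load(self):
        if self._model is None:
            # Real code: self._model = fasttext.load_model(self.model_path)
            self._model = object()  # placeholder for the native handle
        return self._model

    def __call__(self, text):
        self._load()
        # Real code: return self._model.predict(text)
        return ("__label__positif", 0.89)  # placeholder prediction


predictor = SentimentPredictor("sentiment_model.bin")
# Round-tripping through pickle now succeeds, which is exactly what
# Spark does when distributing a udf(predictor, schema) to executors.
clone = pickle.loads(pickle.dumps(predictor))
```

With this shape, `udf(predictor, schema)` can be applied to the "text" column because the closure no longer contains the raw fasttext object.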

Use serialized model without installing dependencies

I prototyped an ML model consisting of preprocessing plus multiple stacked regressors. I would like a colleague of mine to develop an API that queries the model. Is there any way to query the model (an sklearn pipeline) without having to install all the dependencies (XGBoost, LGBM, CatBoost, ...)? I tried serializing it with joblib, but when we deserialize it on another machine, the dependencies still have to be installed. The goal is really to transform the sklearn pipeline to …
Category: Data Science
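The underlying reason joblib cannot help here: pickle (which joblib builds on) stores objects as references to their classes by module path, not as code, so unpickling must import those classes from the same installed libraries. Exporting to a dependency-free runtime format (e.g. ONNX via skl2onnx, so the API only needs onnxruntime) is the usual way around this. The stdlib-only sketch below demonstrates the reference-not-code behavior with a hypothetical stand-in class:

```python
import pickle


class StackedModel:
    """Hypothetical stand-in for a pipeline step from a third-party
    package (e.g. an XGBoost regressor inside the sklearn pipeline)."""

    def __init__(self, coef):
        self.coef = coef


# The pickle blob records "__main__.StackedModel" as a reference,
# not the class's code.
blob = pickle.dumps(StackedModel(0.5))

# Unpickling works while the class definition is importable ...
restored = pickle.loads(blob)

# ... but fails once it is not, which is why a joblib artifact needs
# XGBoost/LGBM/CatBoost installed on the target machine.
del StackedModel
try:
    pickle.loads(blob)
    missing_class_error = False
except AttributeError:
    missing_class_error = True
```

Deleting the class plays the role of the colleague's machine that never installed the library: the blob is intact, but there is nothing to reconstruct the object from.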

About

Geeks Mental is a community that publishes articles and tutorials about Web, Android, Data Science, new techniques and Linux security.