{"metadata":{"kernelspec":{"language":"python","display_name":"Python 3","name":"python3"},"language_info":{"pygments_lexer":"ipython3","nbconvert_exporter":"python","version":"3.6.4","file_extension":".py","codemirror_mode":{"name":"ipython","version":3},"name":"python","mimetype":"text/x-python"}},"nbformat_minor":4,"nbformat":4,"cells":[{"cell_type":"code","source":"# Introduction","metadata":{"papermill":{"duration":0.043352,"end_time":"2021-04-27T17:26:59.673447","exception":false,"start_time":"2021-04-27T17:26:59.630095","status":"completed"},"tags":[],"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":"#### ADVERTISING SYSTEM OVERVIEW\nThe overall scenario of the display advertising system is illustrated below. \n![](https://media.arxiv-vanity.com/render-output/2954884/images/omni/sys4.png)\nWhen a user visits the e-commerce advertising system, it \n\ni) Checks user historical behavior data. \n\nii) Generates candidate ads by matching module. \n\niii) Predicts the click probability of each ad and selects appropriate ads which can attract attention (click) by ranking module. \n\niv) Logs the user reactions given the displayed ads. \n\nThis turns to be a closed-loop consumption and generation of user behavior data. \n\nTo fetch user's interest by utilising and excavating the rich historical behavior data is very crucial for building the click-through rate (CTR) prediction model in the online advertising system in e-commerce industry. \n\nThere are two key observations on user behavior data: \n\n**i) Diversity:** Users are interested in different kinds of goods when visiting e-commerce site. For example, a young mother may be interested in T-shits, leather handbag, shoes, earrings, children’s coat, etc at the same time. \n\n**ii) Local Activation:** Whether users click or not click an item depends only on part of their related historical behavior.For example, a swimmer will click a recommended goggle mostly due to the fact her recent purchase of bathing suit while not the books in her last week’s shopping list.\n\nBefore we dive deep into this subject let us understand some common terminologies.\n\n**CPC(Cost-Per-Click):** In CPC advertising systems like the one in Alibaba, advertisements are ranked based on **eCPM(effective Cost Per Mille)** which is a product of bid price and **CTR( Click-Through-Rate)**.\n\nOverall if we look at a performance of CTR prediction model it has a direct impact on the overall revenue and plays a crucial role in advertising systems.\n\nMost traditional CTR models lack capturing the structures of behavioral data.\n\nDeep learning methods because of its success rate are extensively used in CTR prediction models.They usually first employ embedding layer on the input, mapping original large scale sparse id features to the distributed representations, then add fully connected layers i.e. **MLP (Multi Layer Perceptrons)** to automatically learn the nonlinear relations among features.MLP reduce a lot of feature engineering jobs, which are time and effort consuming in industry applications and have become a popular model structure on CTR prediction problem. 
However, in fields with rich internet-scale user behavior data, such as online advertising and recommendation systems in the e-commerce industry, these MLP models often lack a deeper understanding and exploitation of the specific structure of behavior data, leaving room for further improvement.\n\nIn this notebook, a model called **Deep Interest Network (DIN)** is introduced and implemented. This model was developed and deployed in the display advertising system at Alibaba. \n\nInspired by the attention mechanism used in machine translation models, **DIN** represents users' diverse interests with an interest distribution and designs an attention-like network structure to locally activate the related interests according to the candidate ad: behaviors with higher relevance to the candidate ad get higher attention scores and dominate the prediction. This design has proven effective, and experiments on Alibaba's production CTR prediction datasets show that DIN significantly outperforms MLPs under the **GAUC (Group weighted AUC)** metric. Let us understand the GAUC metric in detail.\n\nThe area under the ROC curve (AUC) is a commonly used metric in the CTR prediction area. In practice, a new metric named GAUC, a generalization of AUC, is used: it is the weighted average of the AUCs calculated on the subsets of samples grouped by user. The weight can be impressions or clicks. An impression-based GAUC is calculated as follows:\n\nGAUC = Sigma(wi * AUCi) / Sigma(wi), where i = 1 to n ranges over users and wi is the impression (or click) count of user i\n\nGAUC has proven to be more indicative in display advertising settings, where the CTR model is used to rank candidate ads for each user and model performance is mainly measured by how good each user's ranking list is, that is, by a user-specific AUC. Hence, this metric removes the impact of user bias and measures the performance of the model over all users more accurately. After years of use in production systems, the GAUC metric has been verified to be more stable and reliable than AUC.\n\nOverfitting is easy to encounter when training such industrial deep networks with large-scale sparse inputs: the model easily falls into the overfitting trap and its performance drops rapidly. This is handled with an efficient adaptive regularization technique.\n\nLet us explore the Deep Interest Network model in more detail by looking at its architecture.\n\n### DIN MODEL ARCHITECTURE\n![](https://media.arxiv-vanity.com/render-output/2954884/images/omni/model_arch.png)\n\n#### BASE MODEL\nThe base model is composed of two steps: \n\ni) Transfer each sparse id feature into an embedding vector space. \n\nii) Apply MLPs to fit the output. \n\nNote that the input contains user behavior sequence ids, whose length can vary. Thus we add a pooling layer (e.g. a sum operation) to summarize the sequence and get a fixed-size vector, as sketched below. 
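\n\nA rough sketch of that pooling step (illustration only; it assumes id 0 is the padding value and behavior sequences have shape [batch, max_len]):\n\n```python\nimport tensorflow as tf\n\ndef sum_pool_behaviors(behavior_ids, embedding_layer):\n    # behavior_ids: [batch, max_len] integer tensor of padded, variable-length behavior sequences\n    emb = embedding_layer(behavior_ids)                          # [batch, max_len, dim]\n    mask = tf.cast(tf.not_equal(behavior_ids, 0), emb.dtype)     # 1 for real behaviors, 0 for padding\n    return tf.reduce_sum(emb * mask[:, :, tf.newaxis], axis=1)   # fixed-size [batch, dim] vector\n\n# example with made-up sizes:\n# emb_layer = tf.keras.layers.Embedding(10000, 8)\n# pooled = sum_pool_behaviors(tf.constant([[3, 7, 0, 0]]), emb_layer)\n```\n\n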
As illustrated in the left part of the model architecture, the base model works well in practice and now serves the main traffic of the online display advertising system.\n\nHowever, looking closely at the pooling operation, we find that much information is lost: it destroys the inner structure of the user behavior data. This observation inspires the design of a better model.\n\n#### DEEP INTEREST NETWORK (DIN) DESIGN\nIn the display advertising scenario, we want the model to truly reveal the relationship between the candidate ad and the user's interests based on their historical behaviors.\n\nAs discussed above, behavior data contains two structures: diversity and local activation. \n\nThe diversity of behavior data reflects users' various interests, and a user's click on an ad often originates from just part of those interests. In the NMT (neural machine translation) task, it is assumed that the importance of each word in a sentence differs at each decoding step. The attention network can be viewed as a specially designed pooling layer that learns to assign attention scores to the words in a sentence, which in other words follows the diversity structure of the data.\n\nNote: it is unsuitable to directly apply such an attention layer in our application, because the embedding vector of user interest should vary with different candidate ads, i.e. it should also follow the local activation structure. Let us check what happens if the local activation structure is not followed.\n\nSuppose we have distributed representations of users (Vu) and ads (Va). \nFor the same user, Vu is a fixed point in the embedding space, and the same holds for the ad embedding Va. \n\nLet us assume that we use the inner product to calculate the relevance between a user and an ad: \n\nF(U,A) = Vu ∙ Va. \n\nIf both F(U,A) and F(U,B) are high, meaning user U is relevant to both ads \"A\" and \"B\", then under this way of calculating relevance, any point on the line segment between the vectors Va and Vb will also get a high relevance score. \n\nThis brings a hard constraint to the learning of the distributed representation vectors for both users and ads. One may increase the dimensionality of the embedding space to satisfy the constraint, which might work, but causes a huge increase in model parameters.\n\nTo overcome this problem without a huge increase in model parameters, DIN is designed around the two structures of the data, as illustrated on the right side of the model architecture diagram above. \n\nMathematically, the embedding vector Vu of user U becomes a function of the embedding vector Va of ad A, i.e.\n\nVu = f(Va) = Sigma(wi * Vi) = Sigma(g(Vi, Va) * Vi), where i = 1 to N\n\nwhere\n\nVi = embedding of behavior id i, such as good_id, shop_id, etc.\n\nVu = weighted sum of the embeddings of all the behavior ids. \n\nwi = the attention score with which behavior id i contributes to the overall user interest embedding vector Vu with respect to the candidate ad A.\n\ng = the activation unit, with wi = g(Vi, Va); in this implementation wi is the output of the activation unit (denoted by the function g) with inputs Vi and Va. PReLU is a commonly used activation function to start with. However, with large-scale sparse input ids, training such an industrial-scale network still faces a lot of challenges, so to further improve the convergence rate and performance of the model, a novel data-dependent activation function named \"Dice\" is used.\n\nIn all, DIN designs the activation unit to follow the local activation structure and weighted-sum pooling to follow the diversity structure, as sketched below. 
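\n\nA simplified, hypothetical sketch of this attention-style weighted-sum pooling (a small MLP scorer stands in for the paper's activation unit with Dice, the padding mask is omitted for brevity, and shapes and layer sizes are assumptions):\n\n```python\nimport tensorflow as tf\n\ndef din_attention_pool(behavior_emb, ad_emb, hidden_units=36):\n    # behavior_emb: [batch, T, dim] embeddings V_i of the historical behavior ids\n    # ad_emb:       [batch, dim]    embedding V_a of the candidate ad\n    ad_tiled = tf.broadcast_to(ad_emb[:, tf.newaxis, :], tf.shape(behavior_emb))   # [batch, T, dim]\n    x = tf.concat([behavior_emb, ad_tiled, behavior_emb * ad_tiled], axis=-1)      # activation-unit input\n    h = tf.keras.layers.Dense(hidden_units, activation='relu')(x)\n    w = tf.keras.layers.Dense(1)(h)                          # w_i = g(V_i, V_a), shape [batch, T, 1]\n    return tf.reduce_sum(w * behavior_emb, axis=1)           # V_u = sum_i w_i * V_i, shape [batch, dim]\n```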
\n\nDIN is implemented on a multi-GPU distributed training platform named **X-Deep Learning (XDL)**, which supports both model parallelism and data parallelism. Due to the high performance and flexibility of the XDL platform, the training process is accelerated by about 10x and hyperparameters are optimized automatically with high tuning efficiency.\n![](https://media.arxiv-vanity.com/render-output/2954884/images/omni/XDL.png)\nXDL is designed to solve the challenges of training industrial-scale deep learning networks with large-scale sparse inputs and tens of billions of parameters. Most of the deep networks published so far are constructed in two steps, namely: \n\ni) Employ the embedding technique to cast the original sparse input into low-dimensional dense vectors. \nii) Bridge with networks like MLPs, RNNs, CNNs, etc. Most of the parameters are concentrated in the first embedding step, which needs to be distributed over multiple machines, while the second network step can be handled within a single machine. Under these circumstances, the XDL platform is architected in a bridge manner, as shown above, and is composed of three main kinds of components:\n\n**a. Distributed Embedding Layer:** A model-parallelism module; the parameters of the embedding layer are distributed over multiple GPUs. The embedding layer works as a predefined network unit which provides forward and backward modes.\n\n**b. Local Backend:** A standalone module which handles the local network training. Here the open-sourced deep learning frameworks, such as TensorFlow, are reused. With a unified data exchange interface and abstraction, it is easy to integrate and switch between different frameworks.\n\n**c. Communication Component:** The base module, which helps to parallelize both the embedding layer and the backend.\n\nBelow is a visualization of the embeddings of goods in the DIN model. The shape of a good represents its category, and its color corresponds to the CTR prediction value.\n![](https://media.arxiv-vanity.com/render-output/2954884/images/omni/TDdiagram.png)\n\nThe illustration below shows the local activation property of the DIN model: behaviors with high relevance to the candidate ad get high attention intensity.\n![](https://media.arxiv-vanity.com/render-output/2954884/images/omni/attention2.png)\n\nThat is enough theory. Let us jump into a real-world implementation of the DIN model for the ad business by building a click prediction model with the DeepCTR library. \n\nLet's install the library and move forward with the implementation. I have chosen a public Ad Display/Click dataset from **Taobao.com**, available at https://tianchi.aliyun.com/dataset/dataDetail?dataId=56&userId=1\n\n### Dataset details:\n\n**raw_sample.csv**\n\nWe randomly sampled 1,140,000 users from the Taobao website and took 8 days of ad display/click logs (26 million records) to form the original sample skeleton. Field description is as follows:\n\n(1) user: user ID (int);\n\n(2) time_stamp: time stamp (Bigint, 1494032110 stands for 2017-05-06 08:55:10);\n\n(3) adgroup_id: adgroup ID (int);\n\n(4) pid: scenario;\n\n(5) nonclk: 1 for no click, 0 for click;\n\n(6) clk: 1 for click, 0 for no click;\n\nWe used 7 days' samples as training samples (20170506-20170512) and the last day's samples as test samples (20170513).\n\n**ad_feature.csv**\n\nThis data set covers the basic information of all ads in raw_sample. 
Field description is as follows:\n\n(1) adgroup_id: Ad ID (int);\n\n(2) cate_id: category ID;\n\n(3) campaign_id: campaign ID;\n\n(4) brand: brand ID;\n\n(5) customer_id: advertiser ID;\n\nEach ad ID corresponds to one item; an item belongs to a category and to a brand.\n\n**user_profile.csv**\n\nThis data set covers the basic information of 1,060,000 users in raw_sample. Field description is as follows:\n\n(1) userid: user ID;\n\n(2) cms_segid: micro group ID;\n\n(3) cms_group_id: CMS group ID;\n\n(4) final_gender_code: gender, 1 for male, 2 for female;\n\n(5) age_level: age level;\n\n(6) pvalue_level: consumption grade, 1: low, 2: mid, 3: high;\n\n(7) shopping_level: shopping depth, 1: shallow user, 2: moderate user, 3: deep user;\n\n(8) occupation: is the user a college student? 1: yes, 0: no;\n\n(9) new_user_class_level: city level","metadata":{"papermill":{"duration":0.029392,"end_time":"2021-04-27T17:26:59.733101","exception":false,"start_time":"2021-04-27T17:26:59.703709","status":"completed"},"tags":[]}},{"cell_type":"code","source":"!pip install --no-warn-conflicts -q deepctr==0.7.4","metadata":{"papermill":{"duration":4.556475,"end_time":"2021-04-27T17:27:04.319737","exception":false,"start_time":"2021-04-27T17:26:59.763262","status":"completed"},"tags":[],"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"code","source":"from sklearn.metrics import log_loss, roc_auc_score\nfrom sklearn.model_selection import StratifiedKFold, train_test_split\nfrom sklearn.preprocessing import OrdinalEncoder, LabelEncoder, OneHotEncoder, MinMaxScaler\nfrom tensorflow.keras.models import Model, load_model\nfrom deepctr.models import DIN,DeepFM,DIEN,DSIN,xDeepFM\nfrom deepctr.inputs import SparseFeat,VarLenSparseFeat,DenseFeat,get_feature_names\nfrom tensorflow.keras.callbacks import ModelCheckpoint, LearningRateScheduler, Callback\nfrom tensorflow.keras.utils import get_custom_objects\nfrom tensorflow.keras.optimizers import Adam,RMSprop\nfrom tensorflow.keras.layers import Activation\nfrom tensorflow.keras.losses import binary_crossentropy\nfrom tensorflow.keras import backend as K\nfrom tensorflow.keras import callbacks\nfrom tensorflow.keras import utils\nimport tensorflow.keras as keras\nimport tensorflow as tf\nimport pandas as pd\nimport numpy as np\nimport pandas_profiling\nimport warnings\nwarnings.simplefilter('ignore')","metadata":{"_cell_guid":"b1076dfc-b9ad-4769-8c92-a6c4dae69d19","_uuid":"8f2839f25d086af736a60e9eeb907d3b93b6e0e5","papermill":{"duration":8.2329,"end_time":"2021-04-27T17:27:12.583027","exception":false,"start_time":"2021-04-27T17:27:04.350127","status":"completed"},"tags":[],"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":"### Load Taobao Dataset","metadata":{"papermill":{"duration":0.057567,"end_time":"2021-04-27T17:27:12.70128","exception":false,"start_time":"2021-04-27T17:27:12.643713","status":"completed"},"tags":[]}},{"cell_type":"code","source":"raw_sample_df = pd.read_csv('../input/ad-displayclick-data-on-taobaocom/raw_sample.csv')\nad_feature_df = 
pd.read_csv('../input/ad-displayclick-data-on-taobaocom/ad_feature.csv')\nuser_profile_df=pd.read_csv('../input/ad-displayclick-data-on-taobaocom/user_profile.csv')","metadata":{"_cell_guid":"79c7e3d0-c299-4dcb-8224-4455121ee9b0","_uuid":"d629ff2d2480ee46fbb7e2d37f6b5fab8052498a","papermill":{"duration":28.303579,"end_time":"2021-04-27T17:27:41.062423","exception":false,"start_time":"2021-04-27T17:27:12.758844","status":"completed"},"tags":[],"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":"### Optimise dataset:\nDue to the size of the dataset it is observed that while processing the data CPU and RAM utilisation reached optimum levels leading to failure of the notebook and restarting again. To avoid this problem I have come up with a way to optimise the memory utilisation by more than 75 % reduction in RAM usage as shown below","metadata":{"papermill":{"duration":0.032123,"end_time":"2021-04-27T17:27:41.127018","exception":false,"start_time":"2021-04-27T17:27:41.094895","status":"completed"},"tags":[]}},{"cell_type":"code","source":"test_size_mb = raw_sample_df.memory_usage().sum() / 1024 / 1024\ntest_size_mb1 = ad_feature_df.memory_usage().sum() / 1024 / 1024\ntest_size_mb2 = user_profile_df.memory_usage().sum() / 1024 / 1024\nprint(\"raw_sample_df memory size: %.2f MB\" % test_size_mb)\nprint(\"ad_feature_df memory size: %.2f MB\" % test_size_mb1)\nprint(\"user_profile_df memory size: %.2f MB\" % test_size_mb2)","metadata":{"papermill":{"duration":0.058413,"end_time":"2021-04-27T17:27:41.217027","exception":false,"start_time":"2021-04-27T17:27:41.158614","status":"completed"},"tags":[],"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":"#### We're going to be calculating memory usage a lot,so we'll create a function namely mem_usage()to save us some time!","metadata":{"papermill":{"duration":0.031067,"end_time":"2021-04-27T17:27:41.281353","exception":false,"start_time":"2021-04-27T17:27:41.250286","status":"completed"},"tags":[]}},{"cell_type":"code","source":"def mem_usage(pandas_obj):\n if isinstance(pandas_obj,pd.DataFrame):\n usage_b = pandas_obj.memory_usage(deep=True).sum()\n else: # we assume if not a df it's a series\n usage_b = pandas_obj.memory_usage(deep=True)\n usage_mb = usage_b / 1024 ** 2 # convert bytes to megabytes\n return \"{:03.2f} MB\".format(usage_mb)","metadata":{"papermill":{"duration":0.043094,"end_time":"2021-04-27T17:27:41.35723","exception":false,"start_time":"2021-04-27T17:27:41.314136","status":"completed"},"tags":[],"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":"Let us consider first the raw_sample_df dataframe and its current memory utilisation and look at each column type how much memory it is consuming and provide optimisation of those columns as shown below","metadata":{"papermill":{"duration":0.032053,"end_time":"2021-04-27T17:27:41.421458","exception":false,"start_time":"2021-04-27T17:27:41.389405","status":"completed"},"tags":[]}},{"cell_type":"code","source":"raw_sample_df.info(memory_usage='deep')","metadata":{"papermill":{"duration":5.103002,"end_time":"2021-04-27T17:27:46.556525","exception":false,"start_time":"2021-04-27T17:27:41.453523","status":"completed"},"tags":[],"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"code","source":"optimized_gl = raw_sample_df.copy()\n\ngl_int = raw_sample_df.select_dtypes(include=['int'])\nconverted_int = 
gl_int.apply(pd.to_numeric,downcast='unsigned')\noptimized_gl[converted_int.columns] = converted_int\n\n\ngl_obj = raw_sample_df.select_dtypes(include=['object']).copy()\nconverted_obj = pd.DataFrame()\nfor col in gl_obj.columns:\n num_unique_values = len(gl_obj[col].unique())\n num_total_values = len(gl_obj[col])\n if num_unique_values / num_total_values < 0.5:\n converted_obj.loc[:,col] = gl_obj[col].astype('category')\n else:\n converted_obj.loc[:,col] = gl_obj[col]\noptimized_gl[converted_obj.columns] = converted_obj\nprint(\"Original Ad Feature dataframe:{0}\".format(mem_usage(raw_sample_df)))\nprint(\"Memory Optimised Ad Feature dataframe:{0}\".format(mem_usage(optimized_gl)))","metadata":{"papermill":{"duration":22.511394,"end_time":"2021-04-27T17:28:09.101279","exception":false,"start_time":"2021-04-27T17:27:46.589885","status":"completed"},"tags":[],"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"code","source":"raw_sample_df = optimized_gl.copy()\nraw_sample_df_new = raw_sample_df.rename(columns = {\"user\": \"userid\"})","metadata":{"papermill":{"duration":0.44638,"end_time":"2021-04-27T17:28:09.580923","exception":false,"start_time":"2021-04-27T17:28:09.134543","status":"completed"},"tags":[],"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"code","source":"ad_feature_df.info(memory_usage='deep')","metadata":{"papermill":{"duration":0.060244,"end_time":"2021-04-27T17:28:09.67385","exception":false,"start_time":"2021-04-27T17:28:09.613606","status":"completed"},"tags":[],"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"code","source":"optimized_g2 = ad_feature_df.copy()\n\ng2_int = ad_feature_df.select_dtypes(include=['int'])\nconverted_int = g2_int.apply(pd.to_numeric,downcast='unsigned')\noptimized_g2[converted_int.columns] = converted_int\n\ng2_float = ad_feature_df.select_dtypes(include=['float'])\nconverted_float = g2_float.apply(pd.to_numeric,downcast='float')\noptimized_g2[converted_float.columns] = converted_float\n\nprint(\"Original Ad Feature dataframe:{0}\".format(mem_usage(ad_feature_df)))\nprint(\"Memory Optimised Ad Feature dataframe:{0}\".format(mem_usage(optimized_g2)))","metadata":{"papermill":{"duration":0.193349,"end_time":"2021-04-27T17:28:09.915377","exception":false,"start_time":"2021-04-27T17:28:09.722028","status":"completed"},"tags":[],"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"code","source":"user_profile_df.info(memory_usage='deep')","metadata":{"papermill":{"duration":0.074202,"end_time":"2021-04-27T17:28:10.022984","exception":false,"start_time":"2021-04-27T17:28:09.948782","status":"completed"},"tags":[],"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"code","source":"optimized_g3 = user_profile_df.copy()\n\ng3_int = user_profile_df.select_dtypes(include=['int'])\nconverted_int = g3_int.apply(pd.to_numeric,downcast='unsigned')\noptimized_g3[converted_int.columns] = converted_int\n\ng3_float = user_profile_df.select_dtypes(include=['float'])\nconverted_float = g3_float.apply(pd.to_numeric,downcast='float')\noptimized_g3[converted_float.columns] = converted_float\n\nprint(\"Original User Feature dataframe:{0}\".format(mem_usage(user_profile_df)))\nprint(\"Memory Optimised User Feature 
dataframe:{0}\".format(mem_usage(optimized_g3)))","metadata":{"papermill":{"duration":0.288134,"end_time":"2021-04-27T17:28:10.344749","exception":false,"start_time":"2021-04-27T17:28:10.056615","status":"completed"},"tags":[],"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":"Now that we optimised all the dataframes it is time to converge all into a single final dataset for our model prediction implementation","metadata":{"papermill":{"duration":0.033407,"end_time":"2021-04-27T17:28:10.413522","exception":false,"start_time":"2021-04-27T17:28:10.380115","status":"completed"},"tags":[]}},{"cell_type":"code","source":"df1 = raw_sample_df_new.merge(optimized_g3, on=\"userid\")\nfinal_df = df1.merge(optimized_g2, on=\"adgroup_id\")\nfinal_df.head()","metadata":{"papermill":{"duration":22.474214,"end_time":"2021-04-27T17:28:32.921629","exception":false,"start_time":"2021-04-27T17:28:10.447415","status":"completed"},"tags":[],"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":"Ideally the dataset should contain historical columns for our model implementation .To overcome this problem I have replicated the two columns as historical columns for calculating the historical behavior of the users.","metadata":{"papermill":{"duration":0.034079,"end_time":"2021-04-27T17:28:32.99067","exception":false,"start_time":"2021-04-27T17:28:32.956591","status":"completed"},"tags":[]}},{"cell_type":"code","source":"final_df['hist_cate_id'] = final_df['cate_id']\nfinal_df['hist_adgroup_id'] = final_df['adgroup_id']","metadata":{"papermill":{"duration":0.105256,"end_time":"2021-04-27T17:28:33.130571","exception":false,"start_time":"2021-04-27T17:28:33.025315","status":"completed"},"tags":[],"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":"Now let us choose the sparse,dense and sequence features required for DIN model as shown below","metadata":{"papermill":{"duration":0.034112,"end_time":"2021-04-27T17:28:33.19921","exception":false,"start_time":"2021-04-27T17:28:33.165098","status":"completed"},"tags":[]}},{"cell_type":"code","source":"sparse_features = [feat for feat in final_df.columns if feat not in ['time_stamp','pid', 'nonclk','brand',\n 'cms_segid', 'cms_group_id', 'age_level',\n 'pvalue_level', 'shopping_level', 'occupation', 'new_user_class_level ',\n 'campaign_id', 'customer', 'price', 'hist_cate_id','hist_adgroup_id','clk']]\nsparse_features","metadata":{"papermill":{"duration":0.046643,"end_time":"2021-04-27T17:28:33.281767","exception":false,"start_time":"2021-04-27T17:28:33.235124","status":"completed"},"tags":[],"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"code","source":"dense_features = [feat for feat in final_df.columns if feat not in ['userid', 'time_stamp', 'adgroup_id', 'pid', 'nonclk', 'clk',\n 'cms_segid', 'cms_group_id', 'final_gender_code', 'occupation', 'new_user_class_level ',\n 'cate_id', 'campaign_id', 'shopping_level','customer', 'brand','hist_cate_id','hist_adgroup_id']]\ndense_features","metadata":{"papermill":{"duration":0.047061,"end_time":"2021-04-27T17:28:33.363637","exception":false,"start_time":"2021-04-27T17:28:33.316576","status":"completed"},"tags":[],"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"code","source":"sequence_features = [feat for feat in final_df.columns if feat not in ['userid', 'time_stamp', 'adgroup_id', 'pid', 'nonclk', 'clk',\n 'cms_segid', 'cms_group_id', 'final_gender_code', 'age_level',\n 'pvalue_level', 
'shopping_level', 'occupation', 'new_user_class_level ',\n 'cate_id', 'campaign_id', 'customer', 'brand', 'shopping_level','price']]\nsequence_features","metadata":{"papermill":{"duration":0.047921,"end_time":"2021-04-27T17:28:33.448481","exception":false,"start_time":"2021-04-27T17:28:33.40056","status":"completed"},"tags":[],"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"code","source":"behavior_feature_list = [feat for feat in final_df.columns if feat in ['adgroup_id', 'cate_id']]","metadata":{"papermill":{"duration":0.045131,"end_time":"2021-04-27T17:28:33.531064","exception":false,"start_time":"2021-04-27T17:28:33.485933","status":"completed"},"tags":[],"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"code","source":"final_df[sparse_features] = final_df[sparse_features].fillna('-1', )\nfinal_df[sequence_features] = final_df[sequence_features].fillna('-1', )\nfinal_df[dense_features] = final_df[dense_features].fillna(0, )\ntarget = ['clk']","metadata":{"papermill":{"duration":2.398405,"end_time":"2021-04-27T17:28:35.966395","exception":false,"start_time":"2021-04-27T17:28:33.56799","status":"completed"},"tags":[],"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":" #### 1. Perform simple transformation on dense features","metadata":{"papermill":{"duration":0.03603,"end_time":"2021-04-27T17:28:36.039125","exception":false,"start_time":"2021-04-27T17:28:36.003095","status":"completed"},"tags":[]}},{"cell_type":"code","source":"mms = MinMaxScaler(feature_range=(0, 1))\nfinal_df[dense_features] = mms.fit_transform(final_df[dense_features])","metadata":{"papermill":{"duration":22.083117,"end_time":"2021-04-27T17:28:58.159013","exception":false,"start_time":"2021-04-27T17:28:36.075896","status":"completed"},"tags":[],"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":"#### 2. 
Set hashing space for each sparse field,and record dense feature field name","metadata":{"papermill":{"duration":0.035887,"end_time":"2021-04-27T17:28:58.231628","exception":false,"start_time":"2021-04-27T17:28:58.195741","status":"completed"},"tags":[]}},{"cell_type":"code","source":"fixlen_feature_columns = [SparseFeat(feat, vocabulary_size=2000000,embedding_dim=8) for feat in sparse_features] + [DenseFeat(feat, 1, )for feat in dense_features] + [VarLenSparseFeat(SparseFeat(feat, vocabulary_size=2000000,embedding_dim=8), maxlen=1) for feat in sequence_features] \nlinear_feature_columns = fixlen_feature_columns\ndnn_feature_columns = fixlen_feature_columns\nfeature_names = get_feature_names(linear_feature_columns + dnn_feature_columns, )\n","metadata":{"papermill":{"duration":0.077545,"end_time":"2021-04-27T17:28:58.345837","exception":false,"start_time":"2021-04-27T17:28:58.268292","status":"completed"},"tags":[],"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":"#### 3.Generate input data for model","metadata":{"papermill":{"duration":0.036396,"end_time":"2021-04-27T17:28:58.41938","exception":false,"start_time":"2021-04-27T17:28:58.382984","status":"completed"},"tags":[]}},{"cell_type":"code","source":"train, test = train_test_split(final_df, test_size=0.2)\ntrain_model_input = {name:train[name] for name in feature_names if name != 'clk'}\ntest_model_input = {name:test[name] for name in feature_names if name != 'clk'}","metadata":{"papermill":{"duration":13.125771,"end_time":"2021-04-27T17:29:11.581498","exception":false,"start_time":"2021-04-27T17:28:58.455727","status":"completed"},"tags":[],"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":" ##### 4. Define Model,Train,Predict and Evaluate","metadata":{"papermill":{"duration":0.036331,"end_time":"2021-04-27T17:29:11.6553","exception":false,"start_time":"2021-04-27T17:29:11.618969","status":"completed"},"tags":[]}},{"cell_type":"code","source":"tf.compat.v1.disable_eager_execution()\n# model = DIN(linear_feature_columns,behavior_feature_list, task='binary')\nmodel = xDeepFM(linear_feature_columns,dnn_feature_columns,)\n# model = DSTN(linear_feature_columns,behavior_feature_list,)\n# model = DIEN(linear_feature_columns, behavior_feature_list)\n# DIEN()","metadata":{"papermill":{"duration":2.240199,"end_time":"2021-04-27T17:29:13.931793","exception":false,"start_time":"2021-04-27T17:29:11.691594","status":"completed"},"tags":[],"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"code","source":"import seaborn","metadata":{"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"code","source":"final_df.columns","metadata":{"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"code","source":"\nimport matplotlib.pyplot as plt\nplt.hist(user_profile_df['age_level'])\nplt.title('Age Distribution')","metadata":{"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"code","source":"user_profile_df['age_level'].head(5000).unique()","metadata":{"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"code","source":"behavior_feature_list","metadata":{"papermill":{"duration":0.049756,"end_time":"2021-04-27T17:29:14.035005","exception":false,"start_time":"2021-04-27T17:29:13.985249","status":"completed"},"tags":[],"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"code","source":"def Mixed_loss(a = 1.0):\n \"\"\"\n \"\"\"\n# a = tf.constant(a, dtype=tf.float32)\n\n def 
binary_focal_loss_fixed(y_true, y_pred):\n alpha = tf.constant(0.8, dtype=tf.float32)\n gamma = tf.constant(0.6, dtype=tf.float32)\n \"\"\"\n y_true shape need be (None,1)\n y_pred need be compute after sigmoid\n \"\"\"\n y_true = tf.cast(y_true, tf.float32)\n alpha_t = y_true*alpha + (K.ones_like(y_true)-y_true)*(1-alpha)\n \n p_t = y_true*y_pred + (K.ones_like(y_true)-y_true)*(K.ones_like(y_true)-y_pred) + K.epsilon()\n focal_loss = - alpha_t * K.pow((K.ones_like(y_true)-p_t),gamma) * K.log(p_t)\n# return K.mean(focal_loss)\n return focal_loss\n \n def mixed_loss(y_true, y_pred):\n a = tf.constant(0, dtype=tf.float32)\n# NOTE: with a = 0 the focal term is disabled, so plain binary cross-entropy is returned\n# return a*binary_focal_loss_fixed(y_true, y_pred)+(1-a)*binary_crossentropy(y_true,y_pred)\n return binary_crossentropy(y_true,y_pred)\n return mixed_loss\n","metadata":{"papermill":{"duration":0.050238,"end_time":"2021-04-27T17:29:14.121652","exception":false,"start_time":"2021-04-27T17:29:14.071414","status":"completed"},"tags":[],"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"code","source":"from sklearn.metrics import precision_score","metadata":{"papermill":{"duration":0.036646,"end_time":"2021-04-27T17:29:14.195574","exception":false,"start_time":"2021-04-27T17:29:14.158928","status":"completed"},"tags":[],"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"code","source":"from keras.optimizers import Adam\nmodel.compile(\"adam\",binary_crossentropy,metrics=['AUC','accuracy'])\nhistory = model.fit(train_model_input, train[target].values,batch_size=4096, epochs=1, verbose=1, validation_split=0.2,shuffle=True )","metadata":{"papermill":{"duration":1348.436716,"end_time":"2021-04-27T17:51:42.66997","exception":false,"start_time":"2021-04-27T17:29:14.233254","status":"completed"},"tags":[],"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"code","source":"pred_ans = model.predict(test_model_input, batch_size=256)\n\nprint(\"test LogLoss\", round(log_loss(test[target].values, pred_ans), 2))\nprint(\"test AUC\", roc_auc_score(test[target].values, pred_ans))","metadata":{"papermill":{"duration":101.055833,"end_time":"2021-04-27T17:53:30.376651","exception":false,"start_time":"2021-04-27T17:51:49.320818","status":"completed"},"tags":[],"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":"## Conclusion:\nIn this notebook, we focused on the CTR prediction task for display advertising in the e-commerce industry, using the Taobao dataset, which contains internet-scale user behavior data (7 days of training samples plus 1 day of test samples). We reviewed and summarized the two key structures of the data, i.e. diversity and local activation, and described a novel model named DIN (Deep Interest Network) that better exploits these structures. The experiments reported for DIN show that it brings more interpretability and achieves better GAUC (Group weighted AUC) performance compared with the popular MLP models. Besides, the overfitting problem in training such industrial deep networks is addressed with an adaptive regularization technique, alongside the data-dependent \"Dice\" activation, which together reduced overfitting greatly in that scenario.\n\nUpcoming versions of this notebook will bring more insights into the DIN model implementation.\n\n### I hope this gave you a good overview of the DIN model implementation. I would greatly appreciate your comments, and if you liked this kernel, do encourage it with an upvote. 
Thank you :)\n","metadata":{"papermill":{"duration":5.707721,"end_time":"2021-04-27T17:53:41.57454","exception":false,"start_time":"2021-04-27T17:53:35.866819","status":"completed"},"tags":[]}},{"cell_type":"code","source":"","metadata":{"papermill":{"duration":5.715433,"end_time":"2021-04-27T17:54:04.251886","exception":false,"start_time":"2021-04-27T17:53:58.536453","status":"completed"},"tags":[]},"execution_count":null,"outputs":[]}]}