A Hybrid Travel Recommender Model Based on Deep Level Autoencoder And Machine Learning Algorithms

This research investigates the application of autoencoders to travelogues written in Malayalam and posted on Facebook. The main objective is to use an autoencoder to learn a compressed representation of the input data and then use that representation to train various machine learning models with improved accuracy and efficiency. The major challenge was the absence of a benchmark Malayalam dataset for the travel domain. It was overcome by applying NLP techniques to the unstructured, lengthy, and imbalanced travelogues, adding further filtering methods, and creating a dedicated Part of Travelogue Tagger (POTT) together with lookup dictionaries. This pioneering work focuses on Malayalam travel reviews posted on social media, so the model can be extended to other low-resourced Indian languages. The study follows a two-step approach. First, an autoencoder neural network encodes the travelogues into a lower-dimensional latent-space representation. The encoder captures the salient features and patterns in the data, and the decoder reconstructs the input from this compressed representation. Second, the encoded representation is used to train diverse machine learning models, including logistic regression, a decision tree classifier, a support vector machine (SVM), a random forest classifier (RFC), K-nearest neighbours (KNN), stochastic gradient descent (SGD), and a multilayer perceptron (MLP).


Introduction
Travel and tourism play a pivotal role in promoting cultural exchange, economic growth, and personal enrichment. With the rapid growth of social media platforms and online travel communities, individuals now have unprecedented access to travel-related information and experiences shared by fellow travellers. Among these platforms, Facebook travel groups have become active hubs where travellers document their journeys, share travelogues, and exchange insights and recommendations. In this paper, we use Facebook travel groups as a data source to develop a personalized travel recommender model. Our goal is a system that combines autoencoders with machine learning algorithms to provide travel recommendations tailored to each user's preferences and interests. To this end, we scraped a corpus of 12,500 travelogues written in Malayalam from various Facebook travel groups. Each travelogue contains 40 sentences on average.
Malayalam, the official language of Kerala (a southern state of India) and of the Lakshadweep Islands, poses unique challenges for text and speech processing because of its intricate morphology, agglutinative nature, and rich inflectional structure. As a low-resourced language, it also lacks uniformity in spelling and sentence structure across sources within the state. Moreover, the absence of benchmark datasets is a significant hurdle to developing robust text and speech processing models for Malayalam. Together, these factors make it difficult for researchers and developers to apply natural language processing and speech technologies to the language. To ensure the quality and reliability of our dataset, we applied rigorous data cleaning. The result is a structured dataset whose key features are Travel Type (TT), Travel Mode (TM), Location Type (LT), Location Climate (LC), Users (U), and the specific destination (L) mentioned in each travelogue. These features form the foundation of our personalized travel recommender model.
The centrepiece of our approach is an autoencoder neural network. With this deep learning technique, we learn a reduced representation of the travelogue data that captures the essential patterns and features of each travel experience. This encoded representation is then used to train various machine learning models, so that they learn from a compact and informative representation of the travel data. In this paper, we present the architecture of the autoencoder, the data cleaning and preprocessing procedures, and the results of training the machine learning models on the encoded features. The validation accuracy achieved is 95.84%. By offering travel recommendations that align with individual preferences, the study aims to make travel experiences more enriching and fulfilling.

The key contributions are:
1. Autoencoder model for Malayalam travelogues: we construct an autoencoder specifically designed for processing travelogues in Malayalam. The autoencoder serves as a tool for unsupervised learning, capturing the underlying patterns and features in the travel data.
2. Analysis with and without data compression using the autoencoder architecture.
3. Evaluation of compressed data: we evaluate the performance of the autoencoder model when it is applied to compress travel data.
4. Enhanced travel recommendation accuracy: machine learning models are trained for travel recommendation using the encoded representations obtained from the autoencoder.
5. Comparative analysis of machine learning models: we compare logistic regression, a decision tree classifier, SVM, random forest, KNN, SGD, and MLP, all trained on the encoded travel representations.

Background and Related work
Advances in NLP enable machines to read and analyse human language with remarkable precision, transforming text understanding [1]. Mary Priya S conducted experiments with Statistical Machine Translation (SMT) for Malayalam. The work explored the impact of adapting SMT methods designed for foreign languages to a Dravidian language such as Malayalam [2]. In [3], R.K. Thandil explains the challenges of processing Malayalam that arise from its complex agglutinative nature and morphological richness.
A systematic study of recommender system development by D. Roy [4] describes various information-filtering strategies, along with their types, characteristics, and challenges. Personalized travel recommendation systems have attracted significant attention in recent years owing to the growing demand for tailored travel experiences [5]. F. Lu and W. Zhang in paper [6] [8]. In [9], Ding introduced a privacy-preserving approach that adapts an autoencoder for federated collaborative filtering, preserving data privacy while achieving superior model performance. Guo et al. [10] introduced a hybrid recommendation system called AutoLFA that combines LFA with an autoencoder. AutoLFA uses two separate recommenders, each operating in its own metric representation space to exploit its respective advantages. It then combines them through a tailored self-adaptive weighting mechanism to harness the strengths of both. Y. Bougteb et al. [11] investigated a hybrid recommender system that pairs a deep autoencoder with SVD++ decomposition. The autoencoder learns user interests and reconstructs missing ratings, while SVD++ captures correlations between different feature factors. In [12], Kamble studied how data mining techniques can improve recommender performance, using an SVM-based recommender system and conducting experiments on various product datasets. In this paper, we develop and compare a hybrid recommender system that combines a deep learning model (an autoencoder) with various machine learning models [13]. These models are SVM, KNN, logistic regression, decision tree, random forest, and multilayer perceptron.

Methodology
The methodology for developing a travel recommendation system with an autoencoder proceeds in several connected stages. It begins with data collection from community sites and online travel groups. Data preprocessing then transforms the unstructured, noisy data into a consistent format. Feature extraction distils the essential attributes from the text, producing the dataset. Next, an autoencoder is constructed and trained to learn a compressed representation of the travelogue data that captures its essential patterns and features. The self-supervised learning framework (the autoencoder learns by reconstructing its own input, so no manual labels are needed) improves the model's efficiency and allows further refinement. A machine learning recommender system is then designed to work with both the original data and the encoded representations from the autoencoder, combining traditional ML techniques with deep learning. Finally, the experimental results are analysed and the models compared, giving insight into the effectiveness of the travel recommendation system and possible improvements. Together, these stages form a methodology that tailors travel recommendations to individual preferences and experiences. The phases of the methodology are shown in Figure 1.
Fig. 1 Step-by-step methodology of the proposed work.

Data collection
In this study, data were collected from diverse community sites. The primary source was the largest Malayalam travel group on Facebook, 'Sanchari', supplemented by various travel blogs containing reviews and travelogues. Using web scraping, we gathered online write-ups contributed by random users. The process posed significant challenges, including client-side rendering, memory leaks, and bot-detection mechanisms, which had to be handled carefully. The data obtained from these sources were highly unstructured, noisy, and inconsistent. Figure 2 shows the structure of the list of posts (travelogues in Malayalam). Each post appears with its reactions, user name and profile picture, total view count, and posting date and time. These posts were exported to an Excel sheet, which forms the basis of data collection and dataset preparation.
Fig. 2 The list of travelogues and associated details as observed in a Facebook travel group.
To keep the data organized, each write-up was stored in a separate file named after the respective traveller, ensuring easy retrieval and management. The system also accepts input from new users, capturing their travel-related text and storing it in separate files so that new data can be incorporated seamlessly. After extensive preprocessing, the 12,500 collected unstructured travelogues were transformed into a structured tabular format. Figure 3 shows a sample scraped travelogue. It contains the message (the travelogue itself), posting time, post URL, profile link, total reactions, likes, comments, and shares.
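The storage scheme above can be sketched in Python: one file per write-up, named after the traveller, with new users' text added the same way. This is a minimal sketch under assumptions. The function names `safe_filename` and `save_travelogue`, the `.txt` extension, and the numeric suffix used when a traveller has several write-ups are hypothetical and do not come from the authors' implementation.

```python
import re
import tempfile
from pathlib import Path

# Characters kept in a file name: word characters, '-', and the whole
# Malayalam block (U+0D00-U+0D7F), so Malayalam user names survive intact.
_UNSAFE = re.compile(r"[^\w\-\u0D00-\u0D7F]+")


def safe_filename(traveller):
    """Turn a traveller's display name into a filesystem-safe stem."""
    stem = _UNSAFE.sub("_", traveller.strip()).strip("_")
    return stem or "unknown"


def save_travelogue(root, traveller, text):
    """Write one write-up to its own file named after the traveller.

    Repeated write-ups by the same traveller get a numeric suffix
    (<name>_1.txt, <name>_2.txt, ...) instead of overwriting each other.
    """
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)
    stem = safe_filename(traveller)
    n = 1
    while (root / f"{stem}_{n}.txt").exists():
        n += 1
    path = root / f"{stem}_{n}.txt"
    path.write_text(text, encoding="utf-8")
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(save_travelogue(tmp, "Anu K", "sample travelogue text").name)
```

Sanitizing the name also prevents a display name such as `../x` from writing outside the storage directory.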

Travelogue preprocessing
Text preprocessing is a crucial step in preparing the unstructured travelogues for the personalized travel recommender model. The first step is tokenization, which divides the text into individual tokens (words) so that we can work with discrete units of information. The tokens are then cleaned to remove irrelevant characters, symbols, and special characters. Next, stopwords (common words that carry little meaning) are removed to reduce noise in the dataset. Table 1 lists some of the Malayalam stopwords used. To ensure linguistic consistency, stemming or lemmatization reduces words to their root or base forms so that variants of a word are grouped together, further normalizing the text. Finally, certain words are mapped through a predefined dictionary that resolves synonyms and related terms to a standardized form, reducing ambiguity and improving model performance.
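The preprocessing steps just described can be illustrated with a minimal Python sketch. It assumes lemmatization by dictionary lookup, which is one of the two options the text allows alongside stemming. A handful of hypothetical stopwords and lookup entries stand in for the paper's resources, and `preprocess`, `STOPWORDS`, `LEMMA_LOOKUP`, and `SYNONYMS` are illustrative names.

```python
import re

# Illustrative resources only: the paper's stopword list (Table 1) and lookup
# dictionaries are not reproduced here; these few entries are hypothetical.
STOPWORDS = {"ഒരു", "ഈ", "ആ", "ഞങ്ങൾ"}            # "a/one", "this", "that", "we"
LEMMA_LOOKUP = {"മൂന്നാറിൽ": "മൂന്നാർ",            # "in Munnar" -> "Munnar"
                "ബസ്സിൽ": "ബസ്"}                   # "in the bus" -> "bus"
SYNONYMS = {"munnar": "മൂന്നാർ"}                    # code-mixed spelling -> standard form

# Tokens are runs of Malayalam-block characters (U+0D00-U+0D7F, which includes
# vowel signs and the virama), ZWNJ/ZWJ, or ASCII letters/digits. Python's \w
# is not used because it does not match Malayalam combining vowel signs and
# would split words apart.
TOKEN_RE = re.compile(r"[\u0D00-\u0D7F\u200C\u200DA-Za-z0-9]+")


def preprocess(text):
    """Tokenize, clean, drop stopwords, lemmatize by lookup, normalize synonyms."""
    result = []
    for tok in TOKEN_RE.findall(text):   # punctuation and symbols are dropped here
        tok = tok.lower()                # affects only Latin-script tokens
        if tok in STOPWORDS:
            continue
        tok = LEMMA_LOOKUP.get(tok, tok)
        tok = SYNONYMS.get(tok, tok)
        result.append(tok)
    return result


print(preprocess("ഞങ്ങൾ ഒരു യാത്ര Munnar, മൂന്നാറിൽ!"))
```

A character-range tokenizer is used because Malayalam words contain combining vowel signs and the virama. A naive `\w+` pattern in Python's `re` would treat these as word boundaries.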
To incorporate domain-specific knowledge, we use a Part of Travelogue Tagger (POTT) to label travel-related entities such as locations, activities, and transportation modes. The tagged data is then organized and stored in CSV or Excel files for easy retrieval and analysis.
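A lookup-based tagger in the spirit of POTT, followed by CSV export, might look like the sketch below. Several details are assumptions for illustration: the lexicon entries, the 'O' label for non-travel tokens, the first-match rule, and all function names. The paper does not publish the tagger's rules.

```python
import csv
import io

# Hypothetical POTT lexicon: normalized token -> (feature category, value).
# The categories follow the paper (TT, TM, LT, LC, L); the entries are
# illustrative, not the tagger's actual dictionary.
POTT_LEXICON = {
    "മൂന്നാർ": ("L", "Munnar"),
    "ബസ്": ("TM", "Bus"),
    "ട്രെയിൻ": ("TM", "Train"),
    "കുടുംബം": ("TT", "Family"),
    "മല": ("LT", "Hill"),
    "തണുപ്പ്": ("LC", "Cold"),
}
FEATURES = ["TT", "TM", "LT", "LC", "L"]


def pott_tag(tokens):
    """Label each token with its feature category, or 'O' if not travel-related."""
    return [(tok, POTT_LEXICON.get(tok, ("O", None))[0]) for tok in tokens]


def to_row(tokens):
    """Collapse one tagged travelogue into a single structured record.

    Keeping the first value seen per category is an assumption; the text
    does not say how repeated mentions are resolved.
    """
    row = dict.fromkeys(FEATURES, "")
    for tok, cat in pott_tag(tokens):
        if cat != "O" and not row[cat]:
            row[cat] = POTT_LEXICON[tok][1]
    return row


def write_csv(rows, fh):
    """Write structured records as CSV with one column per feature."""
    writer = csv.DictWriter(fh, fieldnames=FEATURES)
    writer.writeheader()
    writer.writerows(rows)


buf = io.StringIO()
write_csv([to_row(["യാത്ര", "മൂന്നാർ", "ബസ്", "തണുപ്പ്"])], buf)
print(buf.getvalue())
```

Each travelogue becomes one row of the structured dataset, with empty cells for categories the text never mentions.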

Feature Extraction
The feature extraction process transforms the lengthy, noisy, code-mixed, unstructured travelogues into a structured and informative dataset. Through an extensive preprocessing pipeline, each token in the travelogues is annotated using our custom Part of Travelogue Tagger (POTT), a tool designed specifically for this work. The tagger assigns each token to one of the feature categories: Travel Type (TT), Travel Mode (TM), Location Climate (LC), Location Type (LT), and specific destination (L). Annotating each token with its feature category yields a structured dataset that captures key aspects of travel experiences. Table 2 shows the skeleton of this dataset with its essential features. This dataset is the foundation for training the personalized travel recommender model. Because POTT was built for Malayalam travelogues, the extracted features fit the context of that text, helping the model handle the nuances of travel-related content. Overall, feature extraction turns unstructured travelogues into an organized dataset from which the model can generate personalized travel recommendations.

A custom autoencoder, comprising an encoder and a decoder, was constructed and trained on the preprocessed Malayalam travelogue data. The encoder transforms the input features into a compact, lower-dimensional latent representation, and the decoder reconstructs the original data from this compressed representation. The architecture, depicted in Figure 5, is designed specifically for compressing Malayalam textual data and is trained on the preprocessed dataset rather than on synthetic data. The encoder and the decoder each consist of two dense layers with leaky ReLU activation and batch normalization. The bottleneck layer has half as many units as there are input features, producing a condensed representation of the data. The model is compiled with the Adam optimizer and mean squared error (MSE) loss, with accuracy as a reported metric. The autoencoder aims to learn a compressed representation of the textual data that retains its significant characteristics. These include context, tone, semantics, and the intricacies specific to Malayalam. The input data is scaled with MinMaxScaler to improve training efficiency and to handle variations in the input.
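The architecture described above can be written as a forward pass and training objective. The following parts come from the text: the min-max scaling, the two dense layers per side with batch normalization and leaky ReLU, the bottleneck of half the input width, and the MSE loss. The order of batch normalization and activation, the linear bottleneck and output layers, and the symbols used are assumptions. Here $d$ is the number of input features, $N$ the number of training samples, and $\phi$ is applied element-wise.

```latex
% Min-max scaling of each input feature j = 1, ..., d (fitted on training data)
\tilde{x}_j = \frac{x_j - m_j}{M_j - m_j}, \qquad
m_j = \min_i x^{(i)}_j, \quad M_j = \max_i x^{(i)}_j

% Encoder: two dense layers with batch normalization and leaky ReLU, then the bottleneck
h_1 = \phi\big(\mathrm{BN}(W_1 \tilde{x} + b_1)\big), \qquad
h_2 = \phi\big(\mathrm{BN}(W_2 h_1 + b_2)\big), \qquad
z = W_z h_2 + b_z \in \mathbb{R}^{\lfloor d/2 \rfloor}

% Decoder: mirror image, ending in a linear layer back to d dimensions
g_1 = \phi\big(\mathrm{BN}(V_1 z + c_1)\big), \qquad
g_2 = \phi\big(\mathrm{BN}(V_2 g_1 + c_2)\big), \qquad
\hat{x} = V_o g_2 + c_o \in \mathbb{R}^{d}

% Leaky ReLU with slope 0 < \alpha < 1 for negative inputs
\phi(u) = \max(\alpha u,\, u)

% Training objective: mean squared reconstruction error over N samples
\mathcal{L} = \frac{1}{N d} \sum_{i=1}^{N} \big\lVert \hat{x}^{(i)} - \tilde{x}^{(i)} \big\rVert_2^2
```

In the fusion step, the bottleneck vector $z$ is the encoded representation handed to the classifiers.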

Fusion of autoencoder with various machine learning approaches
The next phase evaluates how well various machine learning algorithms predict the destination from the prepared dataset. The output of the autoencoder's encoder (the encoded representation) is given as the input features to these algorithms. Figure 6 shows a diagram of this fusion.
Fig. 6 Combination of Autoencoder with various ML algorithms.
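The fusion in Figure 6 has three steps: scale the features, pass them through the encoder, and train a classifier on the encoded vectors to predict the destination. The sketch below shows that data flow in plain Python, with simplifying assumptions. `encode` is a hypothetical fixed linear map standing in for the trained encoder. KNN serves as the example classifier. The toy features and destination labels are invented for illustration.

```python
import math


def fit_minmax(rows):
    """Learn per-feature (min, max) on the training rows only."""
    return [(min(col), max(col)) for col in zip(*rows)]


def apply_minmax(rows, bounds):
    """Scale each feature to [0, 1]; constant features map to 0.0."""
    return [[(v - lo) / (hi - lo) if hi > lo else 0.0
             for v, (lo, hi) in zip(row, bounds)]
            for row in rows]


def encode(rows, weights):
    """Hypothetical stand-in for the trained encoder: z = W x.

    In the described system this role is played by the encoder half of the
    trained autoencoder; a fixed linear map keeps the sketch self-contained.
    """
    return [[sum(w * v for w, v in zip(w_row, row)) for w_row in weights]
            for row in rows]


def knn_predict(train_z, train_y, query_z, k=1):
    """Predict by majority vote among the k nearest encoded training points."""
    preds = []
    for q in query_z:
        nearest = sorted(range(len(train_z)),
                         key=lambda i: math.dist(q, train_z[i]))[:k]
        votes = [train_y[i] for i in nearest]
        preds.append(max(set(votes), key=votes.count))  # ties broken arbitrarily
    return preds


def fit_predict(X_train, y_train, X_test, weights, k=1):
    """Scale -> encode -> classify, with the scaler fitted on training data only."""
    bounds = fit_minmax(X_train)
    z_train = encode(apply_minmax(X_train, bounds), weights)
    z_test = encode(apply_minmax(X_test, bounds), weights)
    return knn_predict(z_train, y_train, z_test, k)


# Toy data: four numerically encoded features per travelogue (e.g. TT, TM, LT, LC)
X_train = [[0, 1, 0, 0], [0, 1, 1, 0], [1, 0, 0, 1], [1, 0, 1, 1]]
y_train = ["Munnar", "Munnar", "Varkala", "Varkala"]
W = [[1, 0, 0, 1], [0, 1, 1, 0]]  # maps 4 features to a 2-D "bottleneck" (d/2)
print(fit_predict(X_train, y_train, [[1, 0, 0, 0], [0, 1, 1, 1]], W))
```

The scaler is fitted on the training split only, and its bounds are reused for the test split. This avoids leaking test information into preprocessing.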

Evaluation and Performance Metrics
The ML algorithms considered in the experiment are logistic regression, support vector machine, decision tree, random forest, K-nearest neighbours, SGD, and MLP. Each has its own strengths; their general characteristics are as follows. Logistic regression models class probabilities with the logistic function. It is inherently a binary classifier but extends to multi-class problems, such as destination prediction, through one-vs-rest or multinomial (softmax) formulations. A support vector machine finds the optimal hyperplane separating different classes. Decision trees split the dataset into subsets based on attribute values. Random forest is an ensemble learning method that combines multiple decision trees. K-nearest neighbours (KNN) classifies a sample by the majority class among its nearest neighbouring data points. Stochastic gradient descent (SGD) minimizes the cost function by iteratively updating the coefficients using one sample, or a small batch, at a time. A multilayer perceptron (MLP) is a feedforward artificial neural network consisting of multiple layers of nodes.
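As an illustration of two entries in this list working together, the sketch below fits a binary logistic regression by stochastic gradient descent. It is a teaching sketch, not the configuration used in the experiments, which the text does not give. The learning rate, epoch count, and function names are assumptions. For the multi-class destination task, such a binary model would be applied one-vs-rest. In practice, library implementations such as scikit-learn's `LogisticRegression` and `SGDClassifier` would normally be used instead.

```python
import math
import random


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def sgd_logistic(X, y, lr=0.1, epochs=200, seed=0):
    """Fit binary logistic regression (labels 0/1) by per-sample SGD on log loss.

    For each sample: err = sigmoid(w.x + b) - y, then w -= lr * err * x and
    b -= lr * err, since err is the gradient of the log loss w.r.t. the logit.
    """
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)  # "stochastic": visit samples in random order
        for i in order:
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, X[i])) + b) - y[i]
            w = [wj - lr * err * xj for wj, xj in zip(w, X[i])]
            b -= lr * err
    return w, b


def predict(w, b, X):
    """Class 1 when the predicted probability is at least 0.5."""
    return [int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5)
            for x in X]


X = [[-2.0], [-1.0], [1.0], [2.0]]
y = [0, 0, 1, 1]
w, b = sgd_logistic(X, y)
print(w, b, predict(w, b, X))
```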
The observed results of the experiment are as follows. In terms of accuracy, all models perform well, with scores from 83% to 90%. Logistic regression and MLP achieve the highest accuracy (90%), while SVM has the lowest (83%). The differences are not substantial, suggesting that all models make correct predictions reasonably well. For precision, logistic regression and MLP again score highest (90%), while the decision tree and SVM are slightly lower at 84% to 85%. Precision, TP/(TP+FP), measures how many of a model's positive predictions are correct, reflecting its trade-off between true and false positives. For F1 score, MLP stands out with the highest value (87%), indicating a good balance between precision and recall, while the other models range from 83% to 86%. Since F1 combines precision and recall, it accounts for both false positives and false negatives. For recall, KNN achieves the highest score (88%), meaning it correctly identifies the largest proportion of positive instances. SVM has the lowest recall (83%), so it misses more positive instances than the other models. Table 3 gives the detailed results, and Figure 7 presents them graphically.
Fig. 7 Representation of performance evaluation of ML algorithms.
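The four metrics discussed above can be computed from predicted and true destination labels as in the following sketch. The text does not state how per-class scores were averaged. Macro averaging (an unweighted mean over classes) is assumed here, and `classification_metrics` is a hypothetical helper. scikit-learn's `precision_recall_fscore_support` with `average="macro"` or `average="weighted"` computes the corresponding library values.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall and F1.

    Macro averaging (unweighted mean over classes) is an assumption; the
    text does not state which averaging produced the reported scores.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted p, but it was wrong
            fn[t] += 1  # true class t was missed

    def ratio(a, b):
        return a / b if b else 0.0

    precisions, recalls, f1s = [], [], []
    for c in labels:
        prec = ratio(tp[c], tp[c] + fp[c])
        rec = ratio(tp[c], tp[c] + fn[c])
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(ratio(2 * prec * rec, prec + rec))
    n = len(labels)
    return {
        "accuracy": ratio(sum(tp.values()), len(y_true)),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


print(classification_metrics([0, 0, 1, 1, 2], [0, 1, 1, 1, 2]))
```

Macro F1 is the mean of per-class F1 scores, so it need not equal the harmonic mean of macro precision and macro recall.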

Discussion and Comparative analysis
The autoencoder was evaluated under two configurations: with and without compression of the data and features. The two configurations give different results. Without compression, the model achieves an accuracy of 95.84%. In this configuration, the autoencoder learns a representation of the input data without further dimensionality reduction. The encoder captures essential features and patterns while preserving most of the original information. With compression, the model achieves a higher accuracy of 96.96%. Here the autoencoder further reduces the dimensionality of the input, creating a more compact representation in the bottleneck layer. This compressed representation retains the crucial information while discarding less essential details, yielding a more efficient representation of the data. Table 4 reports the accuracy and loss of the autoencoder with and without compression, and Figure 8 shows these results graphically.
Fig. 8 Performance of autoencoder model with compression and without compression.

Conclusion and Future Work
This research paper presents a novel personalized travel recommender model catering specifically to the Malayalam-speaking community.By combining an autoencoder with diverse machine learning algorithms, the model efficiently processes unstructured travelogues and extracts crucial features, including Travel Type, Travel Mode, Location Climate, Location Type, and specific destinations.The autoencoder effectively learns a compressed representation of the travel data, reducing dimensionality while retaining vital characteristics.Experimental results reveal substantial accuracy improvements compared to conventional methods.A comprehensive comparative analysis identifies Logistic Regression and MLP as the most effective models, attaining the highest accuracy and precision scores.This underscores their potential for delivering accurate and reliable travel recommendations to users.The research contributes significantly to advancing personalized travel recommendation systems, enhancing user satisfaction and engagement in travel planning.By intelligently learning from unstructured travelogues, the model provides personalized suggestions that resonate with individual preferences, fostering a deeper connection between travellers and their desired destinations.
Future work will focus on refining the model by integrating user feedback and real-time data to continuously improve recommendation accuracy and adaptability. The study lays the foundation for more sophisticated, user-centric travel recommendation systems that enrich the travel planning experience of the Malayalam-speaking community.

Table 1. Sample stop words, tokens, and lemmas in Malayalam, with their English translations.

Table 2. Structure of the dataset prepared from the unstructured travelogues.

Table 3. Evaluation of the performance of the autoencoder and ML algorithms.

Table 4. Autoencoder performance with and without compression.