Data

Classification of predictor and response variable

This study analyzes the effect of algal biomass properties and pyrolysis experimental conditions on bio-oil production. To characterize bio-oil production both quantitatively and qualitatively, two response variables were chosen: bio-oil yield and the higher heating value (HHV) of the bio-oil. Bio-oil yield is the amount of bio-oil produced during the pyrolysis process and serves as the quantitative measure of bio-oil production, while HHV is used to evaluate the quality of the bio-oil produced. In our dataset, biomass type (microalgae or macroalgae), the pigment present in the algal biomass (green, brown), and the growth habitat of the algal biomass (freshwater, marine water) were considered as predictor variables. Similarly, the carbon, hydrogen, oxygen, nitrogen, and sulfur contents of the algal biomass were considered as predictor variables to analyze the influence of its elemental composition on bio-oil production. To understand the impact of the proximate analysis of the algal biomass on bio-oil production, its ash, fixed carbon, volatile compound, and moisture contents were included as predictor variables. Moreover, the carbohydrate, protein, and lipid contents of the algal biomass were considered as predictor variables to analyze the effect of biochemical composition on the bio-oil produced. To analyze the pyrolysis process, the predictor variables selected were the highest heating temperature, the pyrolysis heating rate, and the residence time. A sample of the collected datasets containing all the predictor and response variables is presented in Table 1.

Table 1: Representation of datasets collected for the analysis, containing all the predictor and response variables

Note: HABITAT: Algal growth habitat; C: Carbon content of algae; H: Hydrogen content of algae; O: Oxygen content of algae; N: Nitrogen content of algae; S: Sulfur content of algae; ASH: Ash content of algae; FC: Fixed carbon content of algae; VC: Volatile compound content of algae; MOISTURE: Moisture content of algae; CARB: Carbohydrate content of algae; PROT: Protein content of algae; LIPID: Lipid content of algae; HHV: Higher heating value of algae; RT: Pyrolysis residence time; HR: Pyrolysis heating rate; TEMP: Pyrolysis heating temperature; YIELD: Bio-oil yield; HHV_OIL: Higher heating value of the bio-oil

Data cleaning and organization

After the data were collected, the distribution of each response and predictor variable was examined by computing its quantile values and was visualized with boxplots and histograms. Most variables contained outliers, some of them extreme enough to bias the analysis results. The extreme outliers associated with each variable were therefore identified and removed (Figure 2). Not all outliers were removed from the datasets: only data points identified as extreme outliers were excluded, to keep the analysis streamlined and specific to the algal properties and the pyrolysis process. The predictor variables in our datasets are a mix of categorical and continuous variables. The algal biomass type and the growth habitat of the algal biomass are categorical, and the rest of the predictor variables are continuous.
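
The quantile-based screening described above can be illustrated with a minimal sketch. The text does not state the criterion used to label an outlier as extreme, so the example assumes the common Tukey convention of fences at 3 × IQR beyond the quartiles (1.5 × IQR would flag mild outliers as well). The function names and the sample records are hypothetical and are not taken from the collected datasets; only the column labels S and HR follow Table 1.

```python
from statistics import quantiles


def extreme_outlier_fences(values, k=3.0):
    """Return (lower, upper) fences at Q1 - k*IQR and Q3 + k*IQR.

    k=3.0 is an assumed threshold for "extreme" outliers (Tukey convention).
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def drop_extreme_outliers(rows, column, k=3.0):
    """Keep rows whose value in `column` is missing or lies within the fences."""
    observed = [r[column] for r in rows if r.get(column) is not None]
    lower, upper = extreme_outlier_fences(observed, k)
    return [
        r for r in rows
        if r.get(column) is None or lower <= r[column] <= upper
    ]


# Hypothetical records: S = sulfur content, HR = pyrolysis heating rate.
rows = [
    {"S": 0.4, "HR": 10},
    {"S": 0.6, "HR": 10},
    {"S": 0.5, "HR": 20},
    {"S": 0.7, "HR": 20},
    {"S": 0.9, "HR": 25},
    {"S": 1.1, "HR": 30},
    {"S": 0.8, "HR": 40},
    {"S": 1.0, "HR": 80},   # mild outlier: beyond 1.5*IQR but within 3*IQR
    {"S": 0.6, "HR": 600},  # extreme outlier: beyond 3*IQR
]

cleaned = drop_extreme_outliers(rows, "HR")
print(len(rows), "->", len(cleaned), "rows after screening HR")
```

With these sample values, the row with a heating rate of 80 is retained and only the row with a heating rate of 600 is dropped, which mirrors the choice to remove extreme outliers while keeping the remaining ones.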

Figure 2: Data cleaning by removal of extreme outliers from the datasets, illustrated with the examples of sulfur content and pyrolysis heating rate