Association Rule Mining (ARM) is a data mining technique for discovering strong associations among items in large sets of transactional data. It is typically used in market basket analysis, where the goal is to find patterns like "if a customer buys item A, they are likely to buy item B." Three measures are used to evaluate a rule A → B:
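Support: The fraction of transactions that contain a given itemset. For a rule A → B, this is the fraction of transactions that contain both A and B.
Confidence: The fraction of transactions containing item A that also contain item B, i.e. support(A and B) / support(A).
Lift: How much more likely item B is to appear in transactions that contain item A than would be expected if A and B were independent, i.e. confidence(A → B) / support(B).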
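To make the three definitions concrete, here is a minimal sketch that computes them by hand. The five transactions are hypothetical toy data chosen only for illustration; they are not taken from the PatentsView results, and the function names are ours.

```python
# Toy transactions (hypothetical, for illustration only).
transactions = [
    {"utility", "B2", "US"},
    {"utility", "B2", "JP"},
    {"utility", "B1", "US"},
    {"design", "S1", "US"},
    {"utility", "B2", "US"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """support(A and B) / support(A): how often B appears when A does."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

def lift(antecedent, consequent, transactions):
    """confidence(A -> B) / support(B): 1.0 means A and B look independent."""
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)

antecedent, consequent = {"B2"}, {"utility"}
print(support(antecedent | consequent, transactions))   # 0.6
print(confidence(antecedent, consequent, transactions)) # 1.0
print(lift(antecedent, consequent, transactions))       # 1.25
```

Note that the confidence is 1.0 while the lift is only 1.25: because "utility" already appears in 80% of the toy transactions, a perfect confidence adds little beyond the baseline.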
The Apriori algorithm is a popular algorithm used in data mining to identify frequent itemsets and generate association rules from a large dataset. It is particularly useful in the context of market basket analysis, where the goal is to find relationships between items that are frequently purchased together.
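The core idea of Apriori is that every subset of a frequent itemset must itself be frequent, so candidates of size k only need to be built from frequent itemsets of size k-1. The following is a simplified pure-Python sketch of that level-wise search, written for illustration; it is not the implementation used in this analysis (libraries such as mlxtend provide an optimized `apriori` function), and the toy transactions and the `min_support` value are assumptions.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def keep_frequent(candidates):
        # Count each candidate and keep those meeting the support threshold.
        out = {}
        for c in candidates:
            s = sum(c <= t for t in transactions) / n
            if s >= min_support:
                out[c] = s
        return out

    # Level 1: single items.
    current = keep_frequent({frozenset([i]) for t in transactions for i in t})
    result = dict(current)
    k = 2
    while current:
        prev = set(current)
        # Join step: merge frequent (k-1)-itemsets into k-itemset candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: drop candidates that have an infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        current = keep_frequent(candidates)
        result.update(current)
        k += 1
    return result

# Hypothetical toy data, for illustration only.
toy = [
    {"utility", "B2", "US"},
    {"utility", "B2", "JP"},
    {"utility", "B1", "US"},
    {"design", "S1", "US"},
    {"utility", "B2", "US"},
]
frequent = apriori(toy, min_support=0.6)
for itemset, s in sorted(frequent.items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), s)
```

With this toy data the three-item candidate {utility, B2, US} is never counted, because its subset {B2, US} falls below the threshold; that pruning is what keeps Apriori tractable on larger datasets.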
Because the PatentsView dataset has no single column that holds several categorical items per record, we create a new column that combines several categorical columns into one itemset per patent.
The PatentsView dataset has a few main categorical columns: 'patent_kind', 'patent_type', 'patent_year', and 'patent_firstnamed_assignee_country'. They are good choices for an Apriori analysis because together they give each patent a meaningful combination of categorical values that can reveal patterns and associations between different patent characteristics.
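As a rough sketch of how the combined column could be built, the snippet below turns each record into a list of items drawn from the four columns named above. The three rows are hypothetical placeholders rather than actual PatentsView records, and `to_transaction` is an illustrative helper name, not something from the original analysis.

```python
COLUMNS = [
    "patent_kind",
    "patent_type",
    "patent_year",
    "patent_firstnamed_assignee_country",
]

# Hypothetical rows; the real data would come from the PatentsView export.
rows = [
    {"patent_kind": "B2", "patent_type": "utility", "patent_year": 2023,
     "patent_firstnamed_assignee_country": "US"},
    {"patent_kind": "B1", "patent_type": "utility", "patent_year": 2019,
     "patent_firstnamed_assignee_country": "US"},
    {"patent_kind": "S1", "patent_type": "design", "patent_year": 2021,
     "patent_firstnamed_assignee_country": None},
]

def to_transaction(row, columns=COLUMNS):
    """Collect one patent's non-missing categorical values as string items."""
    return [str(row[c]) for c in columns if row.get(c) is not None]

transactions = [to_transaction(r) for r in rows]
print(transactions[0])  # ['B2', 'utility', '2023', 'US']
print(transactions[2])  # ['S1', 'design', '2021']
```

Casting every value to a string keeps the year on the same footing as the other categories, and skipping missing values avoids creating a spurious "None" item.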
The rule (utility) → (B2) has a high support of 0.812982, meaning that about 81.3% of the transactions contain both "utility" and "B2". High support values indicate common patterns, such as the frequent pairing of utility patents with the B2 kind code, or the common occurrence of utility patents in specific years like 2023.
(2019) → (utility) has a confidence of 1.0, meaning that every transaction containing "2019" also contains "utility". In terms of confidence, this is the strongest possible rule.
High confidence values indicate that the consequent almost always accompanies the antecedent. Here the antecedents are mostly years or countries: utility patents are highly prevalent in several years (2019, 2020, 2021, etc.) and in countries such as CA (Canada), DE (Germany), and JP (Japan).
Lift measures how much more likely the consequent is to appear given the antecedent, compared with its overall rate of occurrence. A lift greater than 1 means the antecedent and consequent co-occur more often than would be expected if they were independent, and higher values indicate stronger, potentially non-random associations. The rules linking the US, the kind codes B1 and B2, and the years 2019 and 2020 have higher lift, which makes them more meaningful and less likely to be random.
This visual shows that utility is strongly connected to [B2] and [B2, US]. We can conclude that B2 patents are mostly utility patents.
In this visual, we can see that in 2019 a large number of utility patents with kind code B1 were associated with the US.
The top 15 rules by confidence all involve utility. This is good to know but not very useful: utility patents dominate the dataset, so almost any antecedent predicts them with high confidence.
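One way to get past rules that only restate the dominant category is to drop rules whose consequent is just that category and rank the rest by lift. The sketch below illustrates the idea on a hand-written list of rule records; the confidence and lift numbers are made up for illustration and are not results from this analysis, and `informative_rules` is a hypothetical helper.

```python
# Hypothetical rule records; the numbers are illustrative, NOT actual results.
rules = [
    {"antecedents": {"2019"}, "consequents": {"utility"}, "confidence": 1.00, "lift": 1.05},
    {"antecedents": {"JP"}, "consequents": {"utility"}, "confidence": 0.99, "lift": 1.04},
    {"antecedents": {"2019", "US"}, "consequents": {"B1"}, "confidence": 0.40, "lift": 2.10},
    {"antecedents": {"B2"}, "consequents": {"US"}, "confidence": 0.55, "lift": 1.20},
]

def informative_rules(rules, dominant_item, min_lift=1.0):
    """Drop rules that only predict the dominant item, then sort by lift."""
    kept = [
        r for r in rules
        if r["consequents"] != {dominant_item} and r["lift"] > min_lift
    ]
    return sorted(kept, key=lambda r: r["lift"], reverse=True)

for r in informative_rules(rules, dominant_item="utility"):
    print(sorted(r["antecedents"]), "->", sorted(r["consequents"]), r["lift"])
```

Ranking by lift rather than confidence favors rules where the antecedent changes the odds of the consequent, which is what the high-confidence utility rules fail to do.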
Utility patents and the B2 kind code frequently co-occur. This is expected rather than surprising: under the USPTO kind-code scheme, B2 marks a granted utility patent whose application was previously published, so its high support makes sense in this dataset.
The year 2023 is frequently associated with utility patents, which may reflect a recent spike in utility patents for that year. This could be due to advancements in specific technologies or industries that have driven patenting activity.
The combination of B2 and US points to a high volume of B2 patents associated with the US. The frequent association indicates that many US utility patents fall under the B2 kind code.
The association between 2019, US, and B1 has a high lift, indicating that this combination occurs more frequently than expected by chance. This suggests that a notable share of the US patents from 2019 carry the B1 kind code, which marks a utility patent granted without a previously published application (reissue patents use kind code E, not B1).