Efficient-Apriori
=================

An efficient pure Python implementation of the Apriori algorithm.

Overview
--------

The Apriori algorithm uncovers hidden structures in categorical data.
The classical example is a database containing purchases from a supermarket.
Every purchase has a number of items associated with it.
We would like to uncover association rules such as `{bread, eggs} -> {bacon}` from the data.
This is the goal of association rule learning, and the Apriori algorithm is arguably the most famous algorithm for this problem.
This project contains an efficient, well-tested implementation of the Apriori algorithm as described in the original paper by Agrawal et al., published in 1994.

Installation
------------

The package is distributed on PyPI.
From your terminal, run the following command to install the package.

::

    $ pip install efficient-apriori

Notice that the name of the package is ``efficient-apriori`` on PyPI, while it's imported as ``import efficient_apriori``.

A minimal working example
-------------------------

.. py:currentmodule:: efficient_apriori

Here's a minimal working example.
Notice that in every transaction with `eggs` present, `bacon` is present too.
Therefore, the rule `{eggs} -> {bacon}` is returned with 100 % confidence.

.. code-block:: python

    from efficient_apriori import apriori

    transactions = [('eggs', 'bacon', 'soup'),
                    ('eggs', 'bacon', 'apple'),
                    ('soup', 'bacon', 'banana')]
    itemsets, rules = apriori(transactions, min_support=0.5, min_confidence=1)
    print(rules)  # [{eggs} -> {bacon}, {soup} -> {bacon}]

See the API documentation for the full signature of :func:`~efficient_apriori.apriori`.
More examples are included below.

More examples
-------------

Filtering and sorting association rules
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

It's possible to filter and sort the returned list of association rules.

.. code-block:: python

    from efficient_apriori import apriori

    transactions = [('eggs', 'bacon', 'soup'),
                    ('eggs', 'bacon', 'apple'),
                    ('soup', 'bacon', 'banana')]
    itemsets, rules = apriori(transactions, min_support=0.2, min_confidence=1)

    # Print out every rule with 2 items on the left hand side,
    # 1 item on the right hand side, sorted by lift
    rules_rhs = filter(lambda rule: len(rule.lhs) == 2 and len(rule.rhs) == 1, rules)
    for rule in sorted(rules_rhs, key=lambda rule: rule.lift):
        print(rule)  # Prints the rule and its confidence, support, lift, ...

Contributing
------------

You are very welcome to scrutinize the code and make pull requests if you have suggestions for improvements.
Your submitted code must be PEP8 compliant, and all tests must pass.
See tommyod/Efficient-Apriori on GitHub for more information.

API documentation
-----------------

Although the Apriori algorithm uses many sub-functions, only three functions are likely of interest to the reader.
The :func:`~efficient_apriori.apriori` function returns both the itemsets and the association rules, which are obtained by calling :func:`~efficient_apriori.itemsets_from_transactions` and :func:`~efficient_apriori.generate_rules_apriori`, respectively.
The rules are returned as instances of the :class:`~efficient_apriori.Rule` class, so reading up on its basic methods might be useful.

Apriori function
~~~~~~~~~~~~~~~~

.. autofunction:: apriori

Itemsets function
~~~~~~~~~~~~~~~~~

.. autofunction:: itemsets_from_transactions

Association rules function
~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autofunction:: generate_rules_apriori

Rule class
~~~~~~~~~~

.. autoclass:: Rule
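
To make the quantities attached to a rule concrete, the following sketch computes support, confidence and lift by hand for the rule `{eggs} -> {bacon}`, using the transactions from the minimal working example.
It does not call the library: the helpers ``support`` and ``rule_metrics`` are hypothetical names introduced for illustration, and the formulas are the usual textbook definitions, assumed here rather than taken from the package source.

.. code-block:: python

    from fractions import Fraction


    def support(itemset, transactions):
        """Fraction of transactions containing every item in `itemset`."""
        itemset = set(itemset)
        hits = sum(1 for transaction in transactions if itemset.issubset(transaction))
        return Fraction(hits, len(transactions))


    def rule_metrics(lhs, rhs, transactions):
        """Support, confidence and lift of the rule lhs -> rhs (illustrative helper)."""
        supp_both = support(set(lhs) | set(rhs), transactions)
        # Confidence: how often the rhs appears in transactions containing the lhs.
        # Assumes the lhs occurs at least once, otherwise this divides by zero.
        confidence = supp_both / support(lhs, transactions)
        # Lift: confidence relative to how common the rhs is overall.
        lift = confidence / support(rhs, transactions)
        return supp_both, confidence, lift


    transactions = [('eggs', 'bacon', 'soup'),
                    ('eggs', 'bacon', 'apple'),
                    ('soup', 'bacon', 'banana')]
    supp, conf, lift = rule_metrics({'eggs'}, {'bacon'}, transactions)
    print(supp, conf, lift)  # 2/3 1 1

The confidence of 1 matches the 100 % confidence stated for `{eggs} -> {bacon}` in the minimal working example.
The lift of 1 reflects that `bacon` is present in every transaction, so knowing that `eggs` was bought does not make `bacon` any more likely than it already is.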