Understanding Modules and Packages in Python

    Kabaki Antony

    In this article, we’ll look at some of the concepts involved when structuring our Python code using modules and packages. We’ll learn how to create our own modules, how to define functions and classes, and how we can use them in other modules or packages. We’ll also look at how to create packages, by organizing related modules in a directory, and how to import modules from packages. Finally, we’ll explore some of Python’s built-in modules and packages.

    By the end of this tutorial, we’ll have a solid understanding of how to structure our code using modules and packages, greatly enhancing our ability to write maintainable, reusable, and readable code.

    Table of Contents
    1. Introducing Modules and Packages
    2. Working with Modules
    3. Introducing Packages
    4. The __all__ attribute
    5. The Python Standard Library and Popular Third-party Packages
    6. Packaging and Distribution
    7. Conclusion

    Introducing Modules and Packages

    A module in Python is a single file that contains Python code in the form of functions, executable statements, variables, and classes. A module acts as a self-contained unit of code that can be imported and used in other programs or modules.

    A package, on the other hand, is a collection of modules organized in a directory. Packages allow us to group multiple related modules together under a common namespace, making it easier to organize and structure our code base.

    Breaking code down into modules and packages offers immense benefits:

    • Maintainability. Because each module deals with only one part of the application, we can change that part independently without affecting the rest of the application.

    • Reusability. This is a key part of software development: we write code once and use it in as many parts of an application as we need. This helps us keep our code clean and DRY (Don’t Repeat Yourself).

    • Collaboration. Modular code enhances and enables collaboration. Different teams can work on different parts of the same application at the same time without interfering with each other’s work.

    • Readability. Breaking down code into modules and packages enhances code readability, because a file’s name tells us what’s in it. For example, a file named database_connection.py clearly deals with database connections.

    Working with Modules

    Modules can be imported and used in other programs, modules, and packages. They’re very beneficial in an application, since they break the application’s functionality down into smaller, manageable, and logical units.

    For instance, say we want to create a web application: the application is going to need code for connecting to a database, code for creating database models, code that’s going to be executed when a user visits a certain route, and so on.

    We can put all the code in one file, but then the code very quickly becomes unmaintainable and unreadable. By using modules, we can break down the code into units that are more manageable. We’ll put all the code needed to connect to the database in one module, the code for database models in another, and the code for the routes in a third. Breaking the code down into those modules promotes organization, reusability, and maintainability.

    Creating a simple module

    It’s quite straightforward to create a module in Python. Say we have a number of related functions, variables, and classes: we could put them in one module and give the module any name we want, but it’s advisable to give our modules descriptive names, just as we do with functions, variables, and classes.

    To create a module in Python, open up an IDE or text editor, create a file, and give it a descriptive name and a .py extension. For this example, let’s call it sample.py and enter in the following code:

    # sample.py
    
    # A variable in the module
    sample_variable = "This is a string variable in the sample.py module"
    
    # A function in the module
    def say_hello(name):
        return f"Hello, {name} welcome to this simple module."
    
    # Another function in the module
    def add(a, b):
        return f"The sum of {a} + {b} is = {a + b}"
    
    print(sample_variable)
    print(say_hello("kabaki"))
    print(add(2, 3))
    

    The code above defines a module named sample (stored in the file sample.py). It contains a variable named sample_variable whose value is the string "This is a string variable in the sample.py module". This module also contains two function definitions. The say_hello() function takes a name parameter and returns a welcome message that includes it. The add() function returns a message containing the sum of the two numbers passed to it.

    While modules are meant to be used in other parts of the program or an application, we can run them independently. To run this module, we need to have Python installed in our development environment. We can run it on the terminal using the following command:

    python sample.py 
    

    Or, on systems where the Python 3 interpreter is installed as python3, we can use the following command:

    python3 sample.py
    

    This will print the following output:

    This is a string variable in the sample.py module
    Hello, kabaki welcome to this simple module.
    The sum of 2 + 3 is = 5
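
    One side effect is worth noting: the three print() calls sit at the top level of sample.py, so they also run whenever another file imports the module. A common way to avoid that, sketched here as a variant of the module rather than as the original listing, is to place the demo calls behind an if __name__ == "__main__": guard:

```python
# sample.py (variant with a main guard; the guard is an addition for illustration)

sample_variable = "This is a string variable in the sample.py module"


def say_hello(name):
    return f"Hello, {name} welcome to this simple module."


def add(a, b):
    return f"The sum of {a} + {b} is = {a + b}"


# __name__ equals "__main__" only when the file is run directly,
# so these calls are skipped when the module is imported elsewhere.
if __name__ == "__main__":
    print(sample_variable)
    print(say_hello("kabaki"))
    print(add(2, 3))
```

    Running python sample.py still prints the same three lines, while import sample stays silent.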
    

    For one-off module usage, we can run it as a standalone, but most modules are made to be used in other modules or other parts of a Python program. So to use variables, functions, and classes from one module in another module we have to import the module. There are different ways of importing modules, so let’s look at them.

    Using the import statement

    We can use the import statement to make the contents of one module available for use in another module. Consider our sample.py from above: to use its contents in another module, we just import it:

    # another_module.py
    
    import sample
    
    print(sample.sample_variable)
    print(sample.say_hello("John"))
    print(sample.add(2, 3))
    

    The code above imports the sample module, making its variable and functions available in another_module.py through the sample. prefix. Because sample.py calls print() at its top level, importing it also prints the three lines we saw earlier. Note that, when we import a module, we don’t include the .py extension; Python looks the module up by name and finds the file itself.
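
    The same dotted access works for any module. As a small self-contained illustration, the sketch below uses the standard library’s math module instead of sample.py, so that it runs without any extra files:

```python
import math
import sys

# The import statement binds the name "math" to a module object.
print(type(math).__name__)   # module

# Everything the module defines is reached through that name.
print(math.sqrt(16))         # 4.0
print(math.floor(math.pi))   # 3

# Imported modules are cached in sys.modules, so a second import
# reuses the same module object instead of running the file again.
import math as math_again
print(math_again is sys.modules["math"])  # True
```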

    Using the from keyword

    We can also use the from keyword to import specific functions or variables. Say a module has a large number of functions and variables defined in it and we don’t want to use all of them. We can specify the functions or variables we want to use, using the from keyword:

    # another_module.py
    
    from sample import add
    
    print(add(10, 4))
    

    The code above shows that we’ve specifically imported the add() function from the sample module.

    Another benefit of using the from keyword is that we can call the imported function without prefixing it with the name of its module, exactly as if we had defined it in the current file. This can lead to more concise code.
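
    There is a trade-off to keep in mind: a name imported with from lands directly in the current namespace, so a later definition with the same name replaces it. The sketch below shows this with the standard library’s math module (chosen only so the example runs on its own):

```python
import math
from math import sqrt

# The imported name refers to the very same function object.
print(sqrt is math.sqrt)   # True
print(sqrt(25))            # 5.0


# Caveat: a local definition with the same name silently replaces the import.
def sqrt(value):
    return "not the math version"


print(sqrt(25))            # not the math version
print(math.sqrt(25))       # 5.0 (the prefixed form is unaffected)
```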

    Using as

    At times, module names are quite long or awkward to type. Python lets us give an imported module an alternate name, or alias, which we can then use to refer to it in the importing module. To do this, we use the as keyword:

    # another_module.py
    
    import sample as sp
    
    result = sp.add(5, 5)
    print(result)
    print(sp.say_hello("Jason"))
    

    This code shows an import of the sample module, where the module is being given an alternate name sp. So using sp is just the same as calling sample. Therefore, using the alias, we have access to the variables and functions, in the same way we could if we were using the original name.
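
    Aliases work the same way for dotted module names and for individual imported names. A short sketch using the standard library’s urllib.parse module (the aliases up and split_url are arbitrary names picked for this example):

```python
import urllib.parse as up
from urllib.parse import urlsplit as split_url

# The alias is just another name for the same module.
print(up.__name__)                          # urllib.parse
print(up.quote("modules and packages"))     # modules%20and%20packages

# A single imported name can be aliased too.
print(split_url("https://example.com/docs").netloc)   # example.com
```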

    Using those three methods, we’re able to use the variables or functions from one module in another module, so we don’t need to put all of our code in one file, which keeps our application readable.

    While naming our modules, it’s good practice to use lowercase letters and separate words with underscores. For instance, if we have a module for handling database connections, we might name it database_connection.py. To avoid naming conflicts, try to choose descriptive and unique names for modules. If a module name might clash with a Python keyword, a standard library module, or a module from a third-party library, consider using a different name or adding a prefix that’s relevant to the project. Also, remember that names are case-sensitive in Python, so make sure to use the correct module name when importing.
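
    These naming rules can be checked mechanically. The helper below is a hypothetical sketch (check_module_name is not a standard function) that flags keywords, uppercase letters, and names that would shadow a module Python can already import:

```python
import importlib.util
import keyword


def check_module_name(name):
    """Return a list of reasons why ``name`` may be a poor module name."""
    problems = []
    if not name.isidentifier():
        problems.append("not a valid identifier")
    if keyword.iskeyword(name):
        problems.append("is a Python keyword")
    if name != name.lower():
        problems.append("contains uppercase letters")
    # find_spec returns None when no importable top-level module has this name.
    if name.isidentifier() and importlib.util.find_spec(name) is not None:
        problems.append("shadows an importable module")
    return problems


print(check_module_name("json"))                  # clashes with the standard library
print(check_module_name("class"))                 # a keyword cannot be imported by name
print(check_module_name("databaseConnection"))    # mixed case
print(check_module_name("database_connection"))   # no problems expected
```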

    Overall, using modules lets us create and organize our code in a readable and maintainable way. And this is very useful — whether we’re working on a small script or a large application. Later, we’ll look at some common Python standard library modules.

    Introducing Packages

    A package in Python is a way of organizing related modules into a directory. This provides a better way of organizing code, enabling us to group modules that serve a common purpose or are part of the same component.

    Packages are particularly beneficial when structuring larger projects or libraries. For instance, consider the case of a web application where we have code for different database models, views, and utilities.

    It would make a lot of sense if we created a models package with different modules for the different models in an application. Say our web app is a blogging application: possible models could be a users model and a posts model; we would then create a module for user management, and a module for posts management, and then put them in the models package.

    It’s important to reiterate at this point that modules are individual files containing Python code: they help put related functions, classes, and variables within a single file. In contrast, packages are directories that contain multiple modules or subpackages. They provide a higher level of organization for our code, by grouping related modules and enabling us to create more structured and maintainable projects.

    Building and managing packages

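    While packages organize related code modules in one directory, just putting the modules in a directory doesn’t make it a regular package. For Python to identify a directory as a regular package or subpackage, the directory must contain a special file named __init__.py. (Since Python 3.3, a directory without this file can still be imported as a namespace package, but including __init__.py remains the conventional, explicit choice.)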

    This file notifies Python that the directory containing it should be treated as a package or a subpackage. This file could be empty, and most of the time it is, but it can also contain initialization code, and it plays a vital role in Python’s package structure and import mechanisms. So using __init__.py tells Python that we are intentionally creating a package, thereby helping it differentiate between a package and an ordinary directory.

    Packages can have a hierarchical structure, meaning we can create subpackages within our packages to further organize our code. This enables finer and more controlled separation of components and functionality. Consider the following example:

    my_package/
    ├── __init__.py
    ├── module1.py
    └── subpackage/
      ├── __init__.py
      ├── submodule1.py
      └── submodule2.py
    

    This diagram shows my_package is the main package, and subpackage is a subpackage within it. Both directories have an __init__.py file. Using this kind of structure helps us organize our code into a meaningful hierarchy.
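
    To see how the hierarchy maps onto dotted module names, the sketch below recreates part of this layout in a temporary directory and imports from it. The file contents (a single NAME variable) are placeholders invented for the demonstration:

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    subpackage = root / "my_package" / "subpackage"
    subpackage.mkdir(parents=True)

    # Empty __init__.py files mark both directories as regular packages.
    (root / "my_package" / "__init__.py").write_text("")
    (subpackage / "__init__.py").write_text("")
    # Placeholder module contents, invented for this example.
    (root / "my_package" / "module1.py").write_text("NAME = 'module1'\n")
    (subpackage / "submodule1.py").write_text("NAME = 'submodule1'\n")

    sys.path.insert(0, tmp)          # make the temporary directory importable
    importlib.invalidate_caches()
    try:
        module1 = importlib.import_module("my_package.module1")
        submodule1 = importlib.import_module("my_package.subpackage.submodule1")
        print(module1.__name__)          # my_package.module1
        print(submodule1.__name__)       # my_package.subpackage.submodule1
        print(submodule1.__package__)    # my_package.subpackage
    finally:
        sys.path.remove(tmp)
```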

    Creating packages and subpackages

    To create a package, we first create a directory that’s going to contain our modules. Then we create an __init__.py file inside it. Finally, we create our modules in the directory, along with any subpackages.

    Say we’re building a calculator application: let’s create a package for various calculations, so create a directory in our terminal or our IDE and name it calculator.

    In the directory, create the __init__.py file, then create some modules. Let’s create three modules, add.py, subtract.py, and multiply.py. In the end, we’ll have a directory structure similar to this:

    calculator/
    ├── __init__.py
    ├── add.py
    ├── subtract.py
    └── multiply.py
    

    Let’s put some samples in those files. Open the add.py module and put in the following code:

    # add.py
    
    def add(a, b):
        """
        Adds two numbers and returns the result.
    
        :param a: First number.
        :param b: Second number.
        :return: Sum of a and b.
        """
        return a + b
    

    This creates a module for addition, separating it from other calculations. Let’s create one more module for subtraction. Open the subtract.py file and put the following code in it:

    # subtract.py
    
    def subtract(a, b):
        """
        Subtracts two numbers and returns the result.
    
        :param a: First number.
        :param b: Second number.
        :return: Difference of a and b.
        """
        return a - b
    

    So in our application, if we wish to take advantage of the calculator modules, we’ll just import the package. There are different ways to import from a package, so let’s look at them in the next section.

    Importing from packages

    To import modules from packages or subpackages, there are two main ways. We can either use a relative import or an absolute import.

    Absolute imports

    Absolute imports are used to directly import modules or subpackages from the top-level package, where we specify the full path to the module or package we want to import.

    Here’s an example of importing the add module from the calculator package:

    # calculate.py
    
    from calculator.add import add
    
    result = add(5, 9)
    
    print(result)
    

    The above example shows an external module — calculate.py — that imports the add() function from the add module using an absolute import by specifying the absolute path to the function.
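
    Absolute imports can take a few different forms. The sketch below builds a throwaway copy of the calculator package in a temporary directory (so it runs without any files on disk) and shows three equivalent ways of reaching its functions:

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    package_dir = Path(tmp) / "calculator"
    package_dir.mkdir()
    (package_dir / "__init__.py").write_text("")
    (package_dir / "add.py").write_text("def add(a, b):\n    return a + b\n")
    (package_dir / "subtract.py").write_text("def subtract(a, b):\n    return a - b\n")

    sys.path.insert(0, tmp)          # make the temporary directory importable
    importlib.invalidate_caches()
    try:
        # 1. Import one name from a module, giving the full dotted path.
        from calculator.add import add
        # 2. Import a module from the package and use it with a prefix.
        from calculator import subtract
        # 3. Import by full dotted path and use the fully qualified name.
        import calculator.add

        print(add(5, 9))                   # 14
        print(subtract.subtract(9, 5))     # 4
        print(calculator.add.add(1, 2))    # 3
    finally:
        sys.path.remove(tmp)
```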

    Relative imports

    Relative imports are used to import modules or packages relative to the current module’s position in the package hierarchy. Relative imports are specified using dots (.) to indicate the level of relative positioning.

    In order to demonstrate relative imports, let’s create a subpackage in the calculator package, call the subpackage multiply, then move the multiply.py module into that subpackage, so that we’ll have an updated package structure like this:

    calculator/
    ├── __init__.py
    ├── add.py
    ├── subtract.py
    └── multiply/
      ├── __init__.py
      └── multiply.py
    

    With this setup, we can now use relative imports to access the multiply module from other modules within the calculator package or its subpackages. For instance, assuming multiply.py defines a multiply() function, a module sitting directly inside the calculator package could import it with the code below. The first multiply after the dot is the subpackage, and the second is the module inside it:

    from .multiply.multiply import multiply
    
    result = multiply(5, 9)
    print(result)
    

    Overall, relative imports are particularly useful for imports within a package and subpackage structure.
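
    The sketch below tries relative imports end to end by writing a small calculator package to a temporary directory. The modules operations.py and double.py, and the body of multiply.py, are hypothetical additions made up for this demonstration:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical file contents, keyed by path relative to the temporary directory.
FILES = {
    "calculator/__init__.py": "",
    "calculator/add.py": "def add(a, b):\n    return a + b\n",
    "calculator/multiply/__init__.py": "",
    "calculator/multiply/multiply.py": "def multiply(a, b):\n    return a * b\n",
    "calculator/operations.py": (
        "from .add import add                     # one dot: this package\n"
        "from .multiply.multiply import multiply  # subpackage, then module\n"
        "\n"
        "def sum_of_products(a, b, c, d):\n"
        "    return add(multiply(a, b), multiply(c, d))\n"
    ),
    "calculator/multiply/double.py": (
        "from ..add import add   # two dots: the parent package\n"
        "\n"
        "def double(a):\n"
        "    return add(a, a)\n"
    ),
}

with tempfile.TemporaryDirectory() as tmp:
    for relative_path, source in FILES.items():
        target = Path(tmp) / relative_path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(source)

    sys.path.insert(0, tmp)          # make the temporary directory importable
    importlib.invalidate_caches()
    try:
        operations = importlib.import_module("calculator.operations")
        double = importlib.import_module("calculator.multiply.double")
        print(operations.sum_of_products(2, 3, 4, 5))   # 26
        print(double.double(21))                        # 42
    finally:
        sys.path.remove(tmp)
```

    Relative imports only work when a module is imported as part of its package. Running such a file directly (for example, python operations.py) fails with an ImportError, because Python then has no parent package to resolve the dots against.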

    The __all__ attribute

    There are times when we want to use all modules from a package or subpackage, or all functions and variables from a module, and typing out every name becomes cumbersome. In those cases, we want a way to say that we’re importing everything a module or package has to offer.

    To set up what can be imported when a user wants to import all offerings from a module or a package, Python has the __all__ attribute, which is a special attribute that’s used in modules or packages to control what gets imported when a user uses the from module import * statement. This attribute allows us to specify a list of names that will be considered “public” and will be imported when the wildcard (*) import is used.

    Using the __all__ attribute in modules

    In a module, we can define the __all__ attribute to explicitly specify which names should be imported when the from module import * statement is used. This helps prevent unintended imports of internal names, providing a clear way of showing the functions that can be imported publicly and those that are meant for use only in the module.

    Here’s an example:

    # my_module.py
    
    __all__ = ['public_function', 'public_variable']
    
    def public_function():
        return "This is a public function."
    
    def _internal_function():
        return "This is an internal function."
    
    public_variable = "This is a public variable."
    _internal_variable = "This is an internal variable."
    

    The code above defines a module named my_module.py, and with the __all__ attribute being set, only the public_function and the public_variable will be imported when the from my_module import * is used. The function and variable names starting with an underscore won’t be imported.

    It’s important to note a few things. We can still import the names starting with an underscore by naming them explicitly (for example, from my_module import _internal_function). However, that goes against the convention of encapsulation, since the leading underscore (_) marks them as internal to the module and indicates that they shouldn’t be used outside it. So it’s good practice to follow Python programming conventions even if Python doesn’t enforce strict encapsulation.

    Using the __all__ attribute in packages

    The __all__ attribute can also be used in __init__.py files within a package or subpackage to control the default behavior of wildcard imports for submodules or subpackages. This can help ensure that only specific modules are imported when using wildcard imports on packages:

    # my_package/__init__.py
    
    __all__ = ['submodule1', 'subpackage']
    
    from . import submodule1
    from . import subpackage
    

    This example shows an __init__.py file specifying that only submodule1 and subpackage will be imported when using from my_package import *. Other submodules or subpackages won’t be imported by default.

    As in the case of modules, we can still import modules not listed in __all__ by importing them explicitly by name. So the __all__ attribute acts as a convention rather than as a strict rule. It’s meant to communicate what can be used publicly from a module or a package. It is, however, recommended that explicit imports (such as import module_name or from module_name import name) be used instead of wildcard imports (from module_name import *).

    The Python Standard Library and Popular Third-party Packages

    The Python Standard Library is a collection of modules and packages that come included with the Python interpreter installation. These modules provide a wide range of functionalities — from working with data types and performing file operations to handling network communication and implementing various algorithms.

    Some of the commonly used modules in the Python standard library include:

    • os: gives us an API for interacting with the host operating system
    • math: provides a wide range of mathematical functions and constants (useful when performing various mathematical operations in our code)
    • datetime: enables us to work with dates and time in our code
    • json: enables us to handle JSON data in our code
    • argparse: enables us to create command line interfaces
    • csv: enables us to read and write CSV files

    The standard library contains a lot more modules than these few examples, each with its own area of application, reinforcing the benefits of breaking code down into modules. To learn more about the modules on offer, visit the official Python documentation.
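
    As a quick tour, the sketch below makes one small call into each of the modules listed above. The sample values are arbitrary, and io.StringIO is used only so that the CSV example doesn’t need a real file:

```python
import argparse
import csv
import io
import json
import math
import os
from datetime import date, timedelta

# os: work with file system paths in a portable way
print(os.path.basename("/tmp/reports/sales.csv"))    # sales.csv

# math: mathematical functions and constants
print(math.hypot(3, 4))                              # 5.0

# datetime: date arithmetic
print(date(2024, 1, 31) + timedelta(days=1))         # 2024-02-01

# json: convert between Python objects and JSON text
text = json.dumps({"name": "kabaki", "languages": ["python"]})
print(json.loads(text)["languages"])                 # ['python']

# csv: write rows to any file-like object
buffer = io.StringIO()
csv.writer(buffer).writerow(["module", "package"])
print(repr(buffer.getvalue()))                       # 'module,package\r\n'

# argparse: parse command line style arguments
parser = argparse.ArgumentParser(prog="demo")
parser.add_argument("--count", type=int, default=1)
args = parser.parse_args(["--count", "3"])
print(args.count)                                    # 3
```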

    The Python Package Index and third-party packages

    The Python Package Index (PyPI) is a repository of third-party Python packages that extend the functionality of the Python Standard Library. These packages cover a wide range of domains and provide solutions to various programming challenges. These packages are created by the open-source community. We can also create our own package and publish it to the repository.

    To manage third-party packages, Python uses a tool called pip, the package installer for Python. pip allows us to easily install, upgrade, and manage packages from PyPI.

    We can install any third-party library using pip:

    pip install package_name
    

    For instance, to install the Django package (which is used for web development) we can run this:

    pip install django
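
    After installing, we may want to confirm from Python which version ended up in the environment. A minimal sketch using the standard library’s importlib.metadata module (the helper name installed_version is made up for this example, and the output depends on what’s installed locally):

```python
from importlib import metadata


def installed_version(distribution_name):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(distribution_name)
    except metadata.PackageNotFoundError:
        return None


for name in ("django", "requests", "pip"):
    version = installed_version(name)
    print(f"{name}: {version or 'not installed'}")
```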
    

    Here are examples of some popular third-party packages:

    • NumPy: a powerful library for numerical computing in Python. It provides support for large, multi-dimensional arrays and matrices, along with a variety of mathematical functions to operate on these arrays.

    • Pandas: a library for data manipulation and analysis. It provides data structures like DataFrames for efficiently handling and analyzing tabular data.

    • Matplotlib: a widely-used library for creating static, animated, and interactive visualizations in Python. It offers a MATLAB-like interface for plotting various types of graphs and charts.

    • SciPy: built on top of NumPy, SciPy provides additional functions for optimization, integration, linear algebra, signal processing, and more.

    • Django: a high-level web framework for building web applications. It follows the Model-Template-View (MTV) pattern, Django’s variant of Model-View-Controller (MVC), and offers features for handling databases, URLs, templates, and more.

    • Flask: another web framework, Flask is more lightweight and minimal compared to Django. It’s ideal for building smaller web applications or APIs.

    • Requests: a package for making HTTP requests and handling responses. It simplifies working with web APIs and fetching data from the Internet.

    The packages listed above are just a few examples of the vast ecosystem of third-party packages available on PyPI. Packages like these can save us a lot of time and effort.

    Packaging and Distribution

    Packaging and distributing our Python projects allows others to easily install and use our code. This is especially important when we want to share our libraries or applications with a wider audience. Here’s a brief overview of how to package and distribute our Python projects.

    setuptools for packaging

    setuptools is a package that provides building and packaging capabilities for our Python projects. It simplifies the process of creating distribution packages, including source distributions (sdist) and binary distributions (bdist). To use setuptools, we typically create a setup.py script in our project’s root directory.

    Here’s a simple example of a setup.py script:

    from setuptools import setup, find_packages
    
    setup(
        name="my_project",
        version="0.1",
        packages=find_packages(),
        install_requires=[
            "requests",
            # other dependencies
        ],
        entry_points={
            "console_scripts": [
                "my_script = my_project.my_module:main",
            ],
        },
    )
    

    In the script above, we specify the project’s name, version, packages, dependencies, and any entry points using the setup() function.
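
    The console_scripts entry point my_script = my_project.my_module:main means that an installed my_script command calls the main() function in my_project/my_module.py. The module’s contents aren’t shown in this article, so the following is only a hypothetical sketch of what such a function could look like:

```python
# my_project/my_module.py (hypothetical contents, invented for illustration)
import sys


def main(argv=None):
    """Entry point referenced as my_project.my_module:main in setup.py."""
    # Fall back to the real command line when no arguments are passed in.
    args = sys.argv[1:] if argv is None else argv
    name = args[0] if args else "world"
    print(f"Hello, {name}!")
    return 0  # the generated command uses this as the process exit status


# Calling the function directly, as a test might do:
exit_code = main(["kabaki"])   # prints: Hello, kabaki!
```

    Accepting an optional argv list keeps the function easy to call from tests, while the installed command simply calls main() with no arguments.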

    twine for publishing

    Once our project is properly packaged using setuptools, we can use twine to upload our package to PyPI for distribution. twine is a tool that helps us securely upload packages to PyPI.

    To use twine, we need to install it:

    pip install twine
    

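    We then go to our project’s root directory, build the distribution files into the dist/ directory (for example with python -m build, which assumes the build package is installed), and use the following command to upload them: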

    twine upload dist/*
    

    Keep in mind that distributing packages on PyPI requires creating an account and following certain guidelines. It’s recommended that we read the official PyPI documentation for detailed instructions on packaging and distribution.

    Some of the guidelines:

    • Versioning. Properly version packages to indicate changes and updates. This helps users understand what’s new and ensures compatibility.

    • Documentation. Include clear documentation for the code, describing how to install and use our package. Use tools like Sphinx to generate documentation.

    • Licensing. Clearly specify the license under which the package is distributed to ensure users understand how they can use it.

    • Testing. Implement testing to ensure the package functions as expected. Tools like pytest can be helpful for writing and running tests.
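
    As an illustration of the testing guideline, here is a sketch of a test module that pytest could collect. The file name test_calculator.py is hypothetical, and add() and subtract() are redefined as stand-ins so the example runs without the calculator package installed:

```python
# test_calculator.py (hypothetical; pytest collects functions named test_*)


def add(a, b):
    # Stand-in for calculator.add.add, so the example is self-contained.
    return a + b


def subtract(a, b):
    # Stand-in for calculator.subtract.subtract.
    return a - b


def test_add_returns_sum():
    assert add(5, 9) == 14


def test_subtract_returns_difference():
    assert subtract(9, 5) == 4


def test_subtract_undoes_add():
    assert subtract(add(7, 3), 3) == 7
```

    In a real project, the two stand-in functions would be replaced by imports such as from calculator.add import add.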

    By properly packaging and distributing our Python projects, we make it easier for others to access and use our code, contributing to a more collaborative and open-source development environment.

    Conclusion

    In this tutorial, we’ve explored the concepts of modules and packages in Python and their significance in writing well-organized, maintainable, and reusable code.

    Modules are individual files containing Python code that encapsulate functions, classes, and variables. They promote code organization within a single script and facilitate code reuse across multiple scripts.

    Packages take the concept of modularity to the next level by allowing us to organize related modules into directory hierarchies. This hierarchical structure enhances code organization in larger projects and fosters a clear separation of concerns.

    As we continue our Python journey, mastering the art of modular programming with modules and packages will undoubtedly contribute to us becoming more proficient and efficient developers. By leveraging these concepts, we’ll be better equipped to tackle complex projects and collaborate effectively with other developers.

    FAQs About Modules and Packages in Python

    What is a module in Python?

    A module in Python is a file containing Python code, functions, classes, or variables. Modules allow you to organize and reuse code by separating it into individual files.

    How do I create and use a module in Python?

    To create a module, you simply create a .py file with Python code and save it in the same directory as your Python script. You can then import and use functions, classes, or variables from the module using the import statement.

    What is a package in Python?

    A package in Python is a way to organize related modules into directories and subdirectories. It helps manage and structure larger Python projects by grouping related functionality.

    How do I create and use a package in Python?

    To create a package, you create a directory and place one or more module files inside it. You also include a special __init__.py file (which can be empty) to indicate that the directory is a package. You can then import modules from the package using dot notation.

    How does Python find and load modules and packages?

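    Python looks for modules and packages in the directories listed in the sys.path variable. This list typically starts with the directory of the script being run (or the current directory in an interactive session), followed by any directories in the PYTHONPATH environment variable, and then the standard library and installed-package paths. You can also add custom paths to sys.path to make your modules or packages accessible.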
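
    The search path can be inspected directly. In the short sketch below, the printed paths depend on the local installation, and no_such_module_hopefully is a made-up name that is assumed not to exist:

```python
import importlib.util
import sys

# sys.path is an ordinary list of locations, searched in order.
for entry in sys.path[:3]:
    print(repr(entry))

# find_spec reports where a module would be loaded from.
spec = importlib.util.find_spec("json")
print(spec.name)     # json
print(spec.origin)   # location of the json package's __init__.py

# A top-level name that cannot be found yields None instead of raising.
missing = importlib.util.find_spec("no_such_module_hopefully")
print(missing)       # None
```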

    What is a namespace collision, and how can I avoid it when using modules and packages?

    A namespace collision occurs when two modules or packages have the same name. To avoid collisions, choose unique module and package names, or use aliasing with the as keyword when importing to create shorter, distinct names for the imported entities.
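
    The same problem can occur with individual names. For example, both os.path and shlex in the standard library define a function called join. A small sketch of resolving the clash with aliases (path_join and shell_join are arbitrary names chosen here):

```python
from os.path import join as path_join
from shlex import join as shell_join

# Without the aliases, the second import would replace the first "join".
print(path_join("reports", "2024"))           # path separator depends on the OS
print(shell_join(["echo", "hello world"]))    # echo 'hello world'
```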