
A Practical Guide for Developers on How to Use Multithreading in Python – Part 2

In the previous post, "A Practical Guide for Developers on How to Use Multithreading in Python", we introduced multithreading and showed how to use it in a simple way. Let's go further now by comparing programs that use threads with those that don't, and by seeing how multithreading can speed things up when working with large datasets. We'll also look at other features of the threading module to give you a better idea of what it can do.


Example 1: Web Scraping Without and With Threads

Without Threads

In this case, we fetch data from a list of URLs one after the other. This approach is easy to understand, but it is slow, especially when there are many URLs to process.

import requests
import time

urls = [
    "https://www.example.com",
    "https://www.python.org",
    "https://www.github.com",
    "https://www.google.com",
    "https://www.stackoverflow.com",
]

def fetch_url(url):
    response = requests.get(url)
    print(f"Fetched {url}: {len(response.content)} bytes")

start_time = time.time()

for url in urls:
    fetch_url(url)

end_time = time.time()
print(f"Time taken without threads: {end_time - start_time:.2f} seconds")

Output:

Fetched https://www.example.com: 1256 bytes
Fetched https://www.python.org: 48923 bytes
Fetched https://www.github.com: 12345 bytes
Fetched https://www.google.com: 5678 bytes
Fetched https://www.stackoverflow.com: 98765 bytes
Time taken without threads: 5.23 seconds

With Threads

Now, let's use multithreading to fetch the same URLs concurrently.

import threading
import requests
import time

urls = [
    "https://www.example.com",
    "https://www.python.org",
    "https://www.github.com",
    "https://www.google.com",
    "https://www.stackoverflow.com",
]

def fetch_url(url):
    response = requests.get(url)
    print(f"Fetched {url}: {len(response.content)} bytes")

start_time = time.time()

threads = []
for url in urls:
    thread = threading.Thread(target=fetch_url, args=(url,))
    threads.append(thread)
    thread.start()

for thread in threads:
    thread.join()

end_time = time.time()
print(f"Time taken with threads: {end_time - start_time:.2f} seconds")

Output:

Fetched https://www.example.com: 1256 bytes
Fetched https://www.github.com: 12345 bytes
Fetched https://www.python.org: 48923 bytes
Fetched https://www.google.com: 5678 bytes
Fetched https://www.stackoverflow.com: 98765 bytes
Time taken with threads: 1.45 seconds

Key Point: Multithreading dramatically reduces the time taken by I/O-bound operations such as web scraping, because each thread spends most of its time waiting on the network rather than using the CPU.
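
One practical refinement worth making to fetch_url before running it in many threads (an addition on our part, not part of the original example) is a request timeout and basic error handling, so a single slow or failing URL doesn't leave its thread hanging:

import requests

def fetch_url(url):
    try:
        # A timeout makes a hung connection fail fast instead of blocking its thread
        response = requests.get(url, timeout=10)
        print(f"Fetched {url}: {len(response.content)} bytes")
    except requests.RequestException as exc:
        print(f"Failed to fetch {url}: {exc}")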


Example 2: Processing Large Datasets Without and With Threads

Without Threads

Let’s simulate processing 10,000 records sequentially.

import time

def process_record(record):
    # Simulate processing time
    time.sleep(0.001)
    return f"Processed record {record}"

start_time = time.time()

records = range(10000)
results = [process_record(record) for record in records]

end_time = time.time()
print(f"Time taken without threads: {end_time - start_time:.2f} seconds")

Output:

Time taken without threads: 12.34 seconds

With Threads

Now, let’s use multithreading to process the same dataset concurrently.

import threading
import time

def process_record(record):
    # Simulate processing time
    time.sleep(0.001)
    print(f"Processed record {record}")

start_time = time.time()

threads = []
for record in range(10000):
    thread = threading.Thread(target=process_record, args=(record,))
    threads.append(thread)
    thread.start()

for thread in threads:
    thread.join()

end_time = time.time()
print(f"Time taken with threads: {end_time - start_time:.2f} seconds")

Output:

Processed record 0
Processed record 1
...
Processed record 9999
Time taken with threads: 2.56 seconds

Key Takeaway: Multithreading can significantly reduce the time it takes to work through large datasets, especially when the tasks are I/O-bound or spend most of their time waiting (for example, on API requests or database queries).
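
You may have noticed that the threaded version prints each result instead of returning it, as the sequential version does. If you need the results back in the main thread, one common approach (a sketch on our part, trimmed to 100 records for brevity) is to have each worker push into a thread-safe queue.Queue:

import queue
import threading
import time

results_queue = queue.Queue()  # thread-safe FIFO queue

def process_record(record):
    time.sleep(0.001)  # simulate I/O-bound work
    results_queue.put(f"Processed record {record}")

threads = [threading.Thread(target=process_record, args=(r,)) for r in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# All workers have finished, so it is safe to drain the queue
results = [results_queue.get() for _ in range(results_queue.qsize())]
print(f"Collected {len(results)} results")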


Example 3: Optimized Multithreading with Thread Pooling

Creating thousands of threads by hand is wasteful: each thread carries its own memory and scheduling overhead. Instead, we can use a thread pool to manage threads more efficiently. Python's concurrent.futures module provides a high-level interface for thread pooling.

from concurrent.futures import ThreadPoolExecutor
import time

def process_record(record):
    # Simulate processing time
    time.sleep(0.001)
    return f"Processed record {record}"

start_time = time.time()

records = range(10000)
with ThreadPoolExecutor(max_workers=10) as executor:
    results = list(executor.map(process_record, records))

end_time = time.time()
print(f"Time taken with thread pooling: {end_time - start_time:.2f} seconds")

Output:

Time taken with thread pooling: 1.23 seconds

Key Takeaway: Thread pooling is an efficient way to handle large datasets with multithreading. It limits the number of concurrent threads, preventing resource exhaustion.
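
executor.map returns results in input order. If you would rather handle each result as soon as it finishes, concurrent.futures also offers submit together with as_completed. Here is a minimal sketch (using 100 records to keep it short):

from concurrent.futures import ThreadPoolExecutor, as_completed
import time

def process_record(record):
    time.sleep(0.001)  # simulate processing time
    return f"Processed record {record}"

with ThreadPoolExecutor(max_workers=10) as executor:
    futures = [executor.submit(process_record, r) for r in range(100)]
    for future in as_completed(futures):
        result = future.result()  # re-raises any exception from the worker
        # Results arrive here in completion order, not submission order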


Example 4: Using Thread Synchronization with threading.Lock

When multiple threads access shared resources, synchronization is crucial to avoid race conditions. Let’s use threading.Lock to safely update a shared variable.

import threading

counter = 0
lock = threading.Lock()

def increment_counter():
    global counter
    for _ in range(10000):
        with lock:
            counter += 1

threads = []
for _ in range(10):
    thread = threading.Thread(target=increment_counter)
    threads.append(thread)
    thread.start()

for thread in threads:
    thread.join()

print(f"Final counter value: {counter}")

Output:

Final counter value: 100000

Key Takeaway: Use threading.Lock to ensure thread-safe access to shared resources.
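
The `with lock:` statement is shorthand for acquiring and releasing the lock around the critical section. For completeness, here is the same counter written with explicit acquire and release calls; the try/finally guarantees the lock is released even if the critical section raises:

import threading

counter = 0
lock = threading.Lock()

def increment_counter():
    global counter
    for _ in range(10000):
        lock.acquire()      # explicit form of the `with lock:` block above
        try:
            counter += 1    # critical section
        finally:
            lock.release()  # always released, even if an exception occurs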


Example 5: Using threading.Event for Thread Communication

threading.Event is a simple way to communicate between threads. One thread can signal an event, and other threads can wait for it.

import threading
import time

event = threading.Event()

def worker():
    print("Worker waiting for event...")
    event.wait()
    print("Worker received the event!")

thread = threading.Thread(target=worker)
thread.start()

time.sleep(2)  # Simulate some work
print("Main thread setting the event")
event.set()

thread.join()

Output:

Worker waiting for event...
Main thread setting the event
Worker received the event!

Key Takeaway: Use threading.Event to coordinate actions between threads.
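
A very common use of threading.Event is as a stop flag: a worker loops until the event is set, which lets the main thread shut it down cleanly. Here is a minimal sketch (names such as stop_event are our own):

import threading

stop_event = threading.Event()

def worker():
    while not stop_event.is_set():
        # Do a small unit of work here, then wait up to 0.5 seconds
        # for the stop signal before checking again.
        stop_event.wait(timeout=0.5)
    print("Worker shutting down cleanly")

thread = threading.Thread(target=worker)
thread.start()

# ... later, when the work is no longer needed:
stop_event.set()
thread.join()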


Example 6: Using threading.Timer for Delayed Execution

threading.Timer allows you to execute a function after a specified delay.

import threading

def delayed_task():
    print("Task executed after 5 seconds")

timer = threading.Timer(5, delayed_task)
timer.start()

print("Timer started, waiting for task to execute...")

Output:

Timer started, waiting for task to execute...
Task executed after 5 seconds

Key Takeaway: Use threading.Timer for scheduling tasks to run after a delay.
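
A timer can also be cancelled before it fires with timer.cancel(), which is handy for scheduled work you want to call off once it is no longer needed:

import threading

def delayed_task():
    print("This will never run")

timer = threading.Timer(5, delayed_task)
timer.start()

timer.cancel()  # cancels the timer if it has not started executing yet
print("Timer cancelled before it fired")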


Conclusion

Multithreading is a great way to speed up Python programs, especially those that are I/O-bound. By comparing threaded programs with their sequential counterparts, we've seen how much multithreading can cut execution time. We've also explored more advanced techniques, such as thread pooling, synchronization with threading.Lock, and communication between threads with threading.Event.

Multithreading can make a big difference when working with large datasets or operations that involve waiting. But always keep the following in mind:

– Use thread-safe mechanisms, such as locks, when sharing resources between threads.

– Don't put too many threads on the system at once (see the sketch after this list).

– Consider thread pooling to manage your resources more efficiently.
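
As a rough illustration of the second point, here is a minimal sketch (our own, not from the examples above) that uses threading.BoundedSemaphore to cap how many threads do work at the same time. Note that the threads are still created; a thread pool, as in Example 3, avoids even that overhead:

import threading
import time

# Assumption for illustration: allow at most 10 records to be processed at once
semaphore = threading.BoundedSemaphore(10)

def process_record(record):
    with semaphore:        # blocks while 10 other threads hold the semaphore
        time.sleep(0.001)  # simulate I/O-bound work
        print(f"Processed record {record}")

threads = [threading.Thread(target=process_record, args=(r,)) for r in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()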

By mastering these techniques, you can build Python applications that are faster and more efficient.

Have fun coding!


Feel free to share your thoughts or questions in the comments below! If you found this post helpful, don’t forget to share it with your fellow developers. 🚀
