Running in Parallel Python for Data Science

Admin
Last updated: 2021/12/06 at 3:10 AM

Most computers today are multi-core (two or more processors in one package), and some have multiple physical CPUs. One of Python's major limitations is that it uses a single core by default. (It was created at a time when single-core machines were the norm.)
Data science projects require a lot of computation. In particular, part of the scientific aspect of data science relies on repeated tests and experiments on different data matrices. Working with huge amounts of data also means that the most time-consuming transformations repeat observation after observation (e.g., identical, unrelated operations on different parts of a matrix).
Using more CPU cores speeds up a computation by a factor that, at best, matches the number of cores. For example, having four cores means working at best four times faster. You don't get the full fourfold increase because starting a parallel process carries overhead: new Python instances must be set up with the correct information in memory and launched. The improvement is therefore smaller than the theoretical maximum, but it is still significant.
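The effect of that overhead can be sketched with a simple model. This is an illustration under stated assumptions, not a measurement: it assumes the work splits evenly across cores and that starting the workers adds a fixed cost. The function name and the sample numbers are hypothetical.

```python
def estimated_speedup(serial_seconds, n_cores, overhead_seconds):
    """Estimate speedup when work divides evenly and startup cost is fixed.

    Assumed model (not taken from the article):
    parallel time = serial_seconds / n_cores + overhead_seconds
    """
    if serial_seconds <= 0 or n_cores < 1 or overhead_seconds < 0:
        raise ValueError("need positive work, at least one core, "
                         "and a non-negative overhead")
    parallel_seconds = serial_seconds / n_cores + overhead_seconds
    return serial_seconds / parallel_seconds


if __name__ == "__main__":
    # Hypothetical job: 20 s of serial work, 2 s to start the workers.
    for cores in (2, 4, 8):
        print(cores, "cores:", round(estimated_speedup(20.0, cores, 2.0), 2))
```

Under this model the speedup only reaches the core count when the overhead is zero, and the gap widens as the job gets shorter relative to the startup cost.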
So knowing how to use more than one CPU is an advanced but incredibly useful skill for increasing the number of analyses you complete, and for speeding up your operations both when setting up and when using your data products.
Multiprocessing works by replicating the same code and memory content in several new Python instances (the workers), computing the result in each of them, and returning the pooled results to the original main console. If the original instance already occupies much of the available RAM, there won't be room to create new instances, and your machine may run out of memory.
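A minimal sketch of that worker pattern, using the standard library's multiprocessing.Pool, might look like the following. The matrix, the per-row function, and the worker count are hypothetical choices for illustration; the article itself relies on Scikit-Learn and Joblib to manage workers rather than on this module directly.

```python
from multiprocessing import Pool


def row_norm_squared(row):
    """Independent per-row work: the sum of squared values in one row."""
    return sum(value * value for value in row)


def serial_result(matrix):
    """Reference computation in a single process."""
    return [row_norm_squared(row) for row in matrix]


def parallel_result(matrix, n_workers=2):
    """Send rows to worker processes and collect results in input order."""
    with Pool(processes=n_workers) as pool:
        return pool.map(row_norm_squared, matrix)


if __name__ == "__main__":
    # Small hypothetical matrix; real gains need much heavier per-row work.
    matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
    combined = parallel_result(matrix, n_workers=2)
    assert combined == serial_result(matrix)
    print(combined)
```

Each row is serialized and sent to a worker, so the data handed to the pool is effectively copied. That copying is one reason memory use grows with the number of workers.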
 
Multi-core parallel performance
To perform multi-core parallelism with Python, you can combine the Scikit-Learn package with the Joblib package for time-consuming operations, such as replicating models to validate results or searching for the best hyperparameters. In particular, Scikit-Learn allows multiprocessing when performing the following tasks:
  • Cross-validation: testing the results of a machine learning hypothesis using different training and testing data
  • Grid search: systematically changing the hyperparameters of a machine learning hypothesis and testing the resulting performance
  • Multi-label prediction: running an algorithm multiple times against multiple targets when there are many different target outcomes to predict at the same time
  • Ensemble machine learning methods: modeling a large set of classifiers, each independent of the others, as when using RandomForest-based modeling
You don't have to do anything special to take advantage of parallel computations: you activate parallelism by setting the n_jobs parameter to a number of cores greater than 1, or to -1, which means you want to use all available CPU cores.
If you are not running your code from the console or from an IPython Notebook, it is extremely important to separate your code from any package import or global variable assignment in your script by placing the if __name__ == '__main__': statement at the beginning of any code that executes multi-core parallelism. The if statement checks whether the program is being run directly or is being imported by an already running Python process, which avoids confusion or errors in the parallel run (such as recursively launching the parallel work).
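As a sketch of how n_jobs is passed in practice, here is a small grid search over an SVC. The parameter grid and the fold count are illustrative assumptions, not tuned values from the article, and search_svc is a hypothetical helper name.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC


def search_svc(n_jobs=1):
    """Grid-search two SVC hyperparameters using n_jobs parallel workers."""
    X, y = load_digits(return_X_y=True)
    # Illustrative grid (an assumption, not values from the article).
    param_grid = {"C": [1.0, 10.0], "gamma": ["scale", 0.001]}
    search = GridSearchCV(SVC(), param_grid, cv=3, n_jobs=n_jobs)
    search.fit(X, y)
    return search


if __name__ == "__main__":
    # n_jobs=-1 asks for every available core.
    search = search_svc(n_jobs=-1)
    print(search.best_params_, round(search.best_score_, 3))
```

Only the n_jobs argument changes between a single-core and a multi-core run; the rest of the call stays the same.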
 
Demonstrating multiprocessing
It's a good idea to use IPython when you run a demonstration of how multiprocessing can really save you time during data science projects. IPython offers the %timeit magic command for timing execution. You start by loading a multi-class dataset and a complex machine learning algorithm (the Support Vector Classifier, or SVC), and then run a cross-validation procedure to obtain reliable performance estimates.
The most important thing to know is that the workload becomes quite large: the digits dataset has 10 classes, so SVC builds several binary models for every fit, and cross-validation repeats that fit once per fold (20 times with cv=20 in the code that follows).
from sklearn.datasets import load_digits
from sklearn.svm import SVC
# cross_val_score lives in sklearn.model_selection; the older
# sklearn.cross_validation module has been removed.
from sklearn.model_selection import cross_val_score
digits = load_digits()
X, y = digits.data, digits.target
%timeit single_core_learning = cross_val_score(SVC(), X, y, cv=20, n_jobs=1)
Out [1] : 1 loops, best of 3: 17.9 s per loop
After this test, you need to activate the multicore parallelism and time the results using the following commands:
%timeit multi_core_learning = cross_val_score(SVC(), X, y, cv=20, n_jobs=-1)
Out [2] : 1 loops, best of 3: 11.7 s per loop
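The two timings reported above can be turned into a speedup figure with a few lines of arithmetic. The helper names are hypothetical, and because the article does not state how many cores the test machine had, the core count used for the efficiency figure is only an assumed example.

```python
def speedup(single_core_seconds, multi_core_seconds):
    """How many times faster the multi-core run was."""
    return single_core_seconds / multi_core_seconds


def time_saved_fraction(single_core_seconds, multi_core_seconds):
    """Share of the single-core running time that was saved."""
    return 1.0 - multi_core_seconds / single_core_seconds


def parallel_efficiency(speedup_value, n_cores):
    """Speedup divided by the core count (1.0 would be a perfect split)."""
    return speedup_value / n_cores


if __name__ == "__main__":
    single, multi = 17.9, 11.7  # seconds, from the %timeit outputs above
    gain = speedup(single, multi)
    print(f"speedup: {gain:.2f}x")
    print(f"time saved: {time_saved_fraction(single, multi):.1%}")
    # Assumed core count, for illustration only.
    print(f"efficiency on 4 cores: {parallel_efficiency(gain, 4):.2f}")
```

The measured speedup is about 1.5x, well below any plausible core count, which is consistent with startup overhead dominating such a short job.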
The example machine demonstrates a clear advantage from multicore processing, despite using a small dataset where Python spends much of its time starting new instances and running a part of the code in each one. This overhead, a few seconds, is still significant given that the total execution lasts only a handful of seconds. Just imagine what would happen if you worked with larger sets of data: your execution time could easily be cut by a factor of two or three.
Although the code works fine with IPython, putting it in a script and asking Python to run it in a console or from an IDE may cause errors because of the internal operations of a multicore task. The solution is to put all the code under an if statement, which checks whether the program was started directly rather than imported by another process. Here's an example script:
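from sklearn.datasets import load_digits
from sklearn.svm import SVC
# sklearn.model_selection replaces the removed sklearn.cross_validation module.
from sklearn.model_selection import cross_val_score

if __name__ == '__main__':
    # Everything that triggers parallel work stays under the guard.
    digits = load_digits()
    X, y = digits.data, digits.target
    multi_core_learning = cross_val_score(SVC(), X, y, cv=20, n_jobs=-1)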
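Because %timeit is only available inside IPython, a script needs another way to time the two runs. The sketch below uses time.perf_counter for a single measurement of each setting. It is a rough stand-in for %timeit, which repeats runs and reports the best one, and timed_cross_val is a hypothetical helper name.

```python
import time

from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC


def timed_cross_val(X, y, n_jobs, cv=20):
    """Return (scores, elapsed_seconds) for one cross-validation run."""
    start = time.perf_counter()
    scores = cross_val_score(SVC(), X, y, cv=cv, n_jobs=n_jobs)
    return scores, time.perf_counter() - start


if __name__ == "__main__":
    X, y = load_digits(return_X_y=True)
    for n_jobs in (1, -1):
        scores, elapsed = timed_cross_val(X, y, n_jobs)
        print(f"n_jobs={n_jobs}: {elapsed:.1f} s, "
              f"mean accuracy {scores.mean():.3f}")
```

Timings from a single run vary with machine load, so repeat the measurement a few times before drawing conclusions.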
