Popular tools used in data science

  • Data pre-processing and analysis: Python, R, Microsoft Excel, SAS, SPSS
  • Data exploration and visualization: Tableau, QlikView, Microsoft Excel
  • Parallel and distributed computing in the case of big data: Apache Spark, Apache Hadoop

Evolution of Python

  • Python was developed by Guido van Rossum in the late 1980s at the National Research Institute for Mathematics and Computer Science (CWI) in the Netherlands
  • Python editions (major versions): Python 1.0, Python 2.0, Python 3.0

 

Python as a programming language

  • Supports multiple programming paradigms: functional, structured, object-oriented, etc.
  • Dynamic typing: type safety is checked at runtime
  • Reference counting: objects that are no longer referenced are deallocated
  • Late binding: methods are looked up by name at runtime
  • Python’s design is guided by the aphorisms in the Zen of Python by Tim Peters (20 in principle, 19 of which are written down)
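The language traits listed above can be observed directly in a short session. The following is a minimal sketch, assuming the standard CPython interpreter (reference counts are a CPython detail); the names `Greeter`, `value` and `data` are made up for illustration.

```python
import sys

# Dynamic typing: the same name can be rebound to objects of different types.
value = 42
first_type = type(value).__name__
value = "forty-two"
second_type = type(value).__name__

# Runtime type safety: mixing incompatible types raises TypeError only when
# the offending line actually executes.
try:
    result = "age: " + 30
except TypeError as exc:
    type_error_message = str(exc)

# Reference counting (CPython-specific): sys.getrefcount reports the number
# of references to an object, including one temporary reference for the call.
data = []
count_before = sys.getrefcount(data)
alias = data                      # a second reference to the same list
count_after = sys.getrefcount(data)
del alias                         # dropping the reference lowers the count

# Late binding: the method is looked up by name at the moment of the call.
class Greeter:
    def greet(self):
        return "hello"

g = Greeter()
before = g.greet()
Greeter.greet = lambda self: "hi"   # rebind the method on the class
after = g.greet()                   # the existing instance sees the new method
```

Because the lookup happens at call time, the instance `g` picks up the replaced method without being recreated.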

  • Standard CPython interpreter is managed by “Python Software Foundation”
  • Other interpreters include Jython, formerly JPython (Java), IronPython (C#), Stackless Python (C, used for concurrency) and PyPy (written in Python, with JIT compilation)
  • Much of the standard library is written in Python itself
  • High standards of readability
  • Cross-platform (Windows, Linux, Mac)
  • Highly supported by a large community group
  • Better error handling
  • Comparison to Java (Python vs Java):

Java is statically typed, i.e. type safety is checked during compilation (static compilation).

Thus, developing code in Java tends to take more time.

Python, being dynamically typed, avoids the separate compilation step that Java requires.

Dynamically typed code tends to be less verbose, which can make it more readable.
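As a small illustration of the verbosity point, the sketch below defines one function with no type declarations and reuses it for integers, floats and strings. The helper name `total` is hypothetical; a statically typed language would typically need explicit type declarations, overloads or generics for the same reuse.

```python
def total(items):
    """Add up any non-empty sequence whose elements support '+'."""
    result = items[0]
    for item in items[1:]:
        result = result + item
    return result

# The same function body works for several element types.
int_total = total([1, 2, 3])
float_total = total([0.5, 1.5])
text_total = total(["data", " ", "science"])
```

The trade-off is that a type mismatch, such as mixing strings and numbers in one list, is only reported when the line runs.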

 

Advantages of using python

  • Python has several features that make it well suited for data science
  • Open source and community developed
  • Distributed under an Open Source Initiative (OSI) approved license, making it free to use and distribute, even commercially
  • Syntax is simple to understand and write
  • Libraries designed for specific data science tasks
  • Combines well with the majority of cloud platform service providers

Coding environment

  • A software program can be written using a terminal, a command prompt (cmd), a text editor or an Integrated Development Environment (IDE)
  • The program needs to be saved in a file with an appropriate extension (.py for Python, .m for MATLAB, etc.) and can be executed in the corresponding environment (Python, MATLAB, etc.)
  • An Integrated Development Environment (IDE) is a software product developed specifically to support software development in one or more programming languages
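The save-then-execute workflow described above can be sketched from within Python itself. This example assumes a Python 3.7+ interpreter; the file name `hello.py` and its contents are made up for illustration.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical script contents; any valid Python source would do.
source = 'print("Hello from a .py file")\n'

with tempfile.TemporaryDirectory() as folder:
    # Save the program in a file with the .py extension.
    script = Path(folder) / "hello.py"
    script.write_text(source)

    # Roughly equivalent to typing `python hello.py` in a terminal.
    completed = subprocess.run(
        [sys.executable, str(script)],
        capture_output=True,
        text=True,
        check=True,
    )

output = completed.stdout.strip()
print(output)
```

Using `sys.executable` runs the script with the same interpreter that is running the example, which avoids guessing whether the command is called `python` or `python3` on a given machine.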

 

  1. Python 2.x was supported until 2020. Python 3.x is an enhanced version of 2.x, and after 2020 only 3.x releases (3.6.x onwards) are maintained
  2. Install a basic Python version or use the online Python console, as at https://www.python.org/
  3. Execute commands covering the following topics and view the outputs in a terminal or command prompt:
       • Basic print statement
       • Naming conventions for variables and functions, operators
       • Conditional operations, looping statements (nested)
       • Function declaration and calling
       • Installing modules
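One possible starter script for the exercises listed above is sketched here. All names (`sample_count`, `classify`, `table`) are invented for illustration, and the `numpy` package in the comment is only an example of a module one might install.

```python
import importlib.util

# Basic print statement
print("Hello, data science")

# Naming convention (PEP 8): lowercase_with_underscores for variables and functions
sample_count = 7

# Operators
mean_score = 10 / 4      # true division
floor_score = 10 // 4    # floor division
remainder = 10 % 4       # modulo
squared = 3 ** 2         # exponentiation

# Function declaration with a conditional
def classify(number):
    if number < 0:
        return "negative"
    elif number == 0:
        return "zero"
    else:
        return "positive"

# Nested loops: build a small multiplication table
table = []
for row in range(1, 4):
    line = []
    for col in range(1, 4):
        line.append(row * col)
    table.append(line)

# Function call
label = classify(sample_count)

# Installing modules is done from the terminal, for example:
#     python -m pip install numpy
# From Python, find_spec can check whether a module is importable.
json_available = importlib.util.find_spec("json") is not None
```

Running the file from a terminal prints only the greeting; the other values can be inspected by adding further `print` calls.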

Integrated development environment (IDE)

  • Software application consisting of a cohesive unit of tools required for development
  • Designed to simplify software development
  • Utilities provided by IDEs include tools for managing, compiling, deploying and debugging software

 

Coding environment- IDE

  1. An IDE usually comprises a source code editor, a compiler and a debugger
  2. Additional features include syntax and error highlighting and code completion
  3. Offers support for building and executing the program, along with debugging the code, from within the environment
  4. The best IDEs provide version control features
  5. Eclipse+PyDev, Sublime Text, Atom, GNU Emacs, Vi/Vim, Visual Studio and Visual Studio Code are general-purpose IDEs and editors with Python support
  6. Apart from these, Python-specific editors include PyCharm, Jupyter, Spyder and Thonny

 

Spyder

  • Supported across Linux, Mac OS X and Windows platforms
  • Available as open source software
  • Can be installed separately or through Anaconda distribution
  • Developed for Python and specifically data science

Features include

  • Code editor with robust syntax and error highlighting
  • Code completion and navigation
  • Debugger
  • Integrated documentation
  • Interface similar to MATLAB and RStudio

 

PyCharm

  • Supported across Linux, Mac OS X and Windows platforms
  • Available in community (free, open source) and professional (paid) versions
  • Supports only Python
  • Can be installed separately or through Anaconda distribution

Features include

  • Code editor with syntax and error highlighting
  • Code completion and navigation
  • Unit testing
  • Debugger
  • Version control
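To make the unit testing feature concrete, here is a minimal sketch using the standard library `unittest` module, which IDE test runners can typically discover and run. The function `normalize` and the test class are hypothetical examples, not part of any IDE.

```python
import unittest

def normalize(values):
    """Scale numbers to the 0..1 range (hypothetical function under test)."""
    low, high = min(values), max(values)
    if low == high:
        # Avoid division by zero when all values are equal.
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]

class NormalizeTest(unittest.TestCase):
    def test_range(self):
        self.assertEqual(normalize([2, 4, 6]), [0.0, 0.5, 1.0])

    def test_constant_input(self):
        self.assertEqual(normalize([3, 3]), [0.0, 0.0])

# Run the tests programmatically; an IDE test runner does the equivalent
# when the tests are launched from its interface.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(NormalizeTest)
outcome = unittest.TextTestRunner(verbosity=0).run(suite)
```

In an IDE the same test class would normally live in its own file and be run from the editor, with failures linked back to the failing line.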

Jupyter Notebook

  • Web application that allows creation and manipulation of documents called ‘notebooks’
  • Supported across Linux, Mac OS X and Windows platforms
  • Available as open source software
  • Bundled with the Anaconda distribution, or can be installed separately
  • Supports Julia, Python, R and Scala
  • Consists of an ordered collection of input and output cells that contain code, text, plots, etc.
  • Allows sharing of code and narrative text through output formats like PDF, HTML, etc.
  • Education and presentation tool
  • Lacks most of the features of a good IDE
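The idea of a notebook as an ordered collection of cells can be sketched with plain JSON. The example below assumes the version 4 notebook layout (a top-level `cells` list plus `nbformat` fields) as commonly documented; the helper `make_cell` and the cell contents are made up, and the result is not validated against the official schema here.

```python
import json

def make_cell(cell_type, text):
    """Build one notebook cell as a plain dict (hypothetical helper)."""
    cell = {
        "cell_type": cell_type,
        "metadata": {},
        "source": text.splitlines(keepends=True),
    }
    if cell_type == "code":
        # Code cells also carry an execution counter and their outputs.
        cell["execution_count"] = None
        cell["outputs"] = []
    return cell

# An ordered collection of cells: narrative text first, then code.
notebook = {
    "cells": [
        make_cell("markdown", "# Demo notebook\nNarrative text goes here."),
        make_cell("code", "total = sum(range(5))\nprint(total)"),
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 4,
}

notebook_json = json.dumps(notebook, indent=1)
cell_types = [cell["cell_type"] for cell in json.loads(notebook_json)["cells"]]
```

Writing `notebook_json` to a file with the `.ipynb` extension is how such a document would normally be stored; the outputs list of each code cell is filled in when the cell is executed.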

How to choose the best IDE?

  • The choice depends on your requirements
  • Working with different IDEs helps us understand our own requirements


Copyright © 2022 itfreesource.com