Friday, November 30, 2018

My machine learning course history (updating)

1. Udacity ud120
2. Andrew Ng YouTube videos
3. Kaggle learning (including Python, SQL, pandas, deep learning, and data visualization)
4. Google AI education
5. The pyimagesearch website, which has loads of OpenCV introductions, object detection demo code, and detailed application sample code, including YOLOv3. Worth a look if you are interested in computer vision.

Sunday, July 29, 2018

Pandas index and selection

pandas is a very handy module for data scientists using Python. Some tricks are worth noting here for reference. Skipping the basics of the DataFrame and Series classes, I'll focus on data selection. The most commonly used indexing methods are loc and iloc.

reviews.loc[[0,1,10,100],['country','province','region_1','region_2']]
The code above selects the columns 'country', 'province', 'region_1', and 'region_2' of rows 0, 1, 10, and 100 from the reviews DataFrame. loc selects by label: column names and index labels.

reviews.iloc[[1,2,3,5,8],:]
This line selects rows by integer position: the rows at positions 1, 2, 3, 5, and 8 of the reviews DataFrame, with all columns.

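reviews.loc[[x for x in range(100)],['country','variety']]
A more complex usage: select the first 100 rows (labels 0 to 99, assuming the default integer index) of the columns 'country' and 'variety'. The label slice reviews.loc[:99,['country','variety']] gives the same result, because loc slices include the end label.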
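The difference between label-based loc and position-based iloc is easiest to see when the index is not the default 0, 1, 2, and so on. The sketch below uses a small made-up DataFrame as a stand-in for reviews (the column values and index labels are assumptions for illustration, not the real data).

```python
import pandas as pd

# Hypothetical stand-in for the reviews DataFrame; the real data is not shown here.
reviews = pd.DataFrame(
    {
        "country": ["Italy", "France", "US", "Italy"],
        "points": [92, 88, 95, 85],
    },
    index=[10, 20, 30, 40],  # non-default labels, to separate label from position
)

# loc selects by label: the rows whose index labels are 10 and 30.
by_label = reviews.loc[[10, 30], ["country"]]

# iloc selects by position: the first and third rows, whatever their labels.
by_position = reviews.iloc[[0, 2], :]

# loc slices include the end label; iloc slices exclude the end position.
label_slice = reviews.loc[10:30]    # labels 10, 20, 30
position_slice = reviews.iloc[0:2]  # positions 0 and 1

print(by_label)
print(by_position)
print(len(label_slice), len(position_slice))
```

With the default integer index, labels and positions coincide, which is why the two methods often appear interchangeable.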

reviews.country == 'Italy'
This line produces a boolean Series that can be used for conditional selection. For example:

reviews[reviews.country == 'Italy']
This line selects the reviews whose country equals 'Italy'.

reviews.region_2.notnull()
notnull and isnull each produce a boolean Series marking whether each value in the column is not NaN or is NaN. Logical operations on such Series can also be used for DataFrame selection.

ds3 = reviews[reviews.country.isin(['Italy','France']) & (reviews.points >=90)].country
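The isin method is the counterpart of Python's 'in' operator, and '&' plays the role of 'and'. Why does pandas need a separate operator for indexing? Python's 'and' cannot be overloaded: it asks each operand for a single truth value, which is ambiguous for a whole Series, so pandas raises a ValueError. '&' can be overloaded, and pandas defines it element-wise. Because '&' binds more tightly than comparisons such as '>=', each comparison needs its own parentheses.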
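The difference between '&' and 'and' can be modeled with a toy class. This is only a sketch of the mechanism: Mask is a hypothetical stand-in written for this note, not how pandas implements Series.

```python
class Mask:
    """Toy stand-in for a boolean pandas Series (not the real implementation)."""

    def __init__(self, values):
        self.values = list(values)

    def __and__(self, other):
        # `a & b` calls a.__and__(b), so a class can make it element-wise.
        return Mask(x and y for x, y in zip(self.values, other.values))

    def __bool__(self):
        # `a and b` first asks for bool(a); there is no hook to make it element-wise.
        raise ValueError("truth value of a Mask is ambiguous")


is_italy = Mask([True, False, True])
high_points = Mask([True, True, False])

combined = is_italy & high_points
print(combined.values)

try:
    is_italy and high_points
except ValueError as error:
    print("and failed:", error)
```

The same reasoning applies to '|' versus 'or' and '~' versus 'not'.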

Friday, June 15, 2018

Feature optimization for machine learning: One-Hot Encoder

I recently began studying MLCC (Machine Learning Crash Course) from Google.
Besides the complex mathematics underneath all kinds of optimizers, over 80% of the working time is spent on collecting, processing, and cleaning data and on defining useful features to feed to the optimizer.
A linear regressor requires numeric features. For data columns that contain strings (categorical data), we can use the so-called "one-hot encoding" method to convert them. scikit-learn offers two classes that make this easy:
LabelEncoder
OneHotEncoder


from numpy import array
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import OneHotEncoder
input_data = ['happy','sad','cry','happy','blank','blank','sad','cry','happy','sad']
# convert the list to a NumPy array
values = array(input_data)
# integer encode: each distinct label gets an integer, assigned in sorted order
label_encoder = LabelEncoder()
encoded_output_list = label_encoder.fit_transform(values)
print(encoded_output_list)

Output:
[2 3 1 2 0 0 3 1 2 3]

The output is the integer list transformed from the input list, but it is not yet a one-hot list.
You still need to use OneHotEncoder to encode the integer list into a one-hot formatted list with the code below.


# binary encode
# sparse_output=False returns a dense array; scikit-learn versions before 1.2 call this parameter sparse
onehot_encoder = OneHotEncoder(sparse_output=False)
# reshape the integer array from shape (n,) to (n, 1): one feature column with n samples,
# which is the 2D layout the encoder expects
integer_encoded = encoded_output_list.reshape(len(encoded_output_list), 1)
onehot_encoded = onehot_encoder.fit_transform(integer_encoded)
print(onehot_encoded)

Output:
[[0. 0. 1. 0.]
 [0. 0. 0. 1.]
 [0. 1. 0. 0.]
 [0. 0. 1. 0.]
 [1. 0. 0. 0.]
 [1. 0. 0. 0.]
 [0. 0. 0. 1.]
 [0. 1. 0. 0.]
 [0. 0. 1. 0.]
 [0. 0. 0. 1.]]

This transformation from discrete features to numeric features is used frequently in ML, so I note it here.
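To see what the two encoders do together, the same transformation can be sketched in plain Python. The one_hot helper below is a hypothetical function written for this note, not part of scikit-learn; it assumes, like LabelEncoder, that classes are numbered in sorted order.

```python
def one_hot(labels):
    """Return (classes, rows): sorted unique labels and one 0/1 row per input label."""
    classes = sorted(set(labels))
    index_of = {label: i for i, label in enumerate(classes)}
    rows = []
    for label in labels:
        # Start from all zeros, then set the single position for this label.
        row = [0.0] * len(classes)
        row[index_of[label]] = 1.0
        rows.append(row)
    return classes, rows


input_data = ['happy', 'sad', 'cry', 'happy', 'blank',
              'blank', 'sad', 'cry', 'happy', 'sad']
classes, rows = one_hot(input_data)
print(classes)
for row in rows:
    print(row)
```

Each row contains exactly one 1.0, in the column of its class, which matches the matrix printed by OneHotEncoder above.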

Sunday, April 29, 2018

How to set up cgi module for apache2

At the moment, I am studying CGI (Common Gateway Interface) scripts with Python, and I need to set up a temporary local website for testing. Here are some notes.

Environment
1. Ubuntu 14 server
2. apache2 installed

I am not sure whether a separate copy of Python needs to be installed.

First, edit /etc/apache2/conf-enabled/serve-cgi-bin.conf

<IfModule mod_alias.c>
 <IfModule mod_cgi.c>
  Define ENABLE_USR_LIB_CGI_BIN
 </IfModule>
 
 <IfModule mod_cgid.c>
  Define ENABLE_USR_LIB_CGI_BIN
 </IfModule>
 
 <IfDefine ENABLE_USR_LIB_CGI_BIN>
  ScriptAlias /cgi-bin/ /usr/lib/cgi-bin/
  <Directory "/usr/lib/cgi-bin">
   AllowOverride None
   Options +ExecCGI -MultiViews +SymLinksIfOwnerMatch
   Require all granted
  </Directory>
 </IfDefine>
</IfModule>
 
# vim: syntax=apache ts=4 sw=4 sts=4 sr noet
Replace the directory after "ScriptAlias" with the directory where you want to put your CGI script files.
Note: this directory must be accessible to apache2. You can edit apache2.conf under /etc/apache2/ to add your directory:
<Directory "/var/www/cgi-bin">
   AllowOverride None
   Options ExecCGI
   Order allow,deny
   Allow from all
</Directory>

<Directory "/var/www/cgi-bin">
   Options All
</Directory>
The cgi module is not enabled by default when apache2 is installed. You have to enable it manually by creating a link in the /etc/apache2/mods-enabled directory with the commands below.

$ cd /etc/apache2/mods-enabled
$ sudo ln -s ../mods-available/cgi.load
Then reload apache2:

$ sudo service apache2 reload
It's all set. You can now put your first CGI script in your cgi-bin directory and test it in your favorite browser at http://localhost/cgi-bin/xxx.py
One more thing: remember to make the script file executable.
$ sudo chmod +x /usr/lib/cgi-bin/xxx.py
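For a first test, a script along the following lines should be enough. It is a minimal sketch, assuming python3 is installed and reachable through /usr/bin/env (the shebang line tells apache2 which interpreter to run, so some Python installation is required); the build_response name and the page content are made up for illustration.

```python
#!/usr/bin/env python3
"""Minimal CGI script; a sketch, assuming python3 is on the server's PATH."""
import datetime


def build_response():
    # A CGI response is the headers, then a blank line, then the body.
    body = "<html><body><h1>Hello from CGI</h1><p>{}</p></body></html>".format(
        datetime.datetime.now().isoformat()
    )
    return "Content-Type: text/html\n\n" + body


if __name__ == "__main__":
    print(build_response())
```

If the browser shows the source text or a 500 error instead of the page, the usual suspects are a missing execute permission, a wrong shebang path, or a missing blank line after the header.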

Tuesday, April 3, 2018

Installing Python modules

Besides the standard library, there are many modules developed by independent contributors and by academic or scientific groups. You can download and install these modules with the built-in pip installer.

python -m pip install --user modulename 

Here are some modules, listed by category.

1. web framework

  • Django
  • Pyramid
  • Web2py
  • flask
2. Graphic processing
  • PIL
  • Pillow
3. Science and Math
  • numpy
  • Matplotlib
  • pandas
  • scikit-learn
4. command line operation
  • fabric
  • paramiko
5. Natural language processing
  • nltk
  • textblob
  • jieba
6. network client
  • requests
  • pycurl
7. database protocol
  • mysql-python
  • pymongo
  • psycopg2
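After installing, it can be handy to check from Python which modules are actually available. The sketch below uses importlib from the standard library; is_installed is a hypothetical helper name, and the module list is only a sample. Note that the import name can differ from the pip package name (scikit-learn is imported as sklearn, Pillow as PIL).

```python
import importlib.util


def is_installed(import_name):
    """Return True if a top-level module with this import name can be found."""
    return importlib.util.find_spec(import_name) is not None


# These are import names, which can differ from the pip package names.
for name in ["numpy", "pandas", "sklearn", "PIL", "requests"]:
    status = "installed" if is_installed(name) else "missing"
    print(name, status)
```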

Python Study Note (1)

Besides reading the ordinary Python manual, I would like to put some notes here to refresh my bad memory. There are many keywords I need to keep in mind when learning a new programming language, especially one so different from what I use for a living (C, assembly).

We'll skip the interpreter part, which I see little chance of using.

A typical Python module file (extension .py) has the format below.

If you have multiple versions of Python installed, you can specify which version runs your program with a shebang line, which must be the first line of the file. With the Windows py launcher, check py.ini for the default version that "python" maps to.

#! python
# -*- coding: utf-8 -*-
# The line above declares the encoding of this source file; it must be on the
# first or second line. UTF-8 is the default in Python 3.

# Then import the modules you reference.
import os
import sys

import numpy


def fun1(a):
    pass


def main():
    pass


# Run main() when this file is executed as a script rather than imported;
# mainly for testing.
if __name__ == '__main__':
    main()

Here are some coding style conventions that are nice to follow.
  • Use 4-space indentation, and no tabs.
    4 spaces are a good compromise between small indentation (allows greater nesting depth) and large indentation (easier to read). Tabs introduce confusion, and are best left out.
  • Wrap lines so that they don’t exceed 79 characters.
    This helps users with small displays and makes it possible to have several code files side-by-side on larger displays.
  • Use blank lines to separate functions and classes, and larger blocks of code inside functions.
  • When possible, put comments on a line of their own.
  • Use docstrings.
  • Use spaces around operators and after commas, but not directly inside bracketing constructs: a = f(1, 2) + g(3, 4).
  • Name your classes and functions consistently; the convention is to use CamelCase for classes and lower_case_with_underscores for functions and methods. Always use self as the name for the first method argument.
  • Don’t use fancy encodings if your code is meant to be used in international environments. Python’s default, UTF-8, or even plain ASCII work best in any case.
  • Likewise, don’t use non-ASCII characters in identifiers if there is only the slightest chance people speaking a different language will read or maintain the code.
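As a small illustration of several of these conventions together (4-space indentation, blank lines between definitions, docstrings, CamelCase for classes, lower_case_with_underscores for functions, self as the first method argument), here is a sketch. The TemperatureLog class and to_fahrenheit function are made-up names for this note.

```python
class TemperatureLog:
    """Collect temperature readings and report simple statistics."""

    def __init__(self):
        self.readings = []

    def add_reading(self, celsius):
        """Store one reading in degrees Celsius."""
        self.readings.append(celsius)

    def average(self):
        """Return the mean of the stored readings, or None if there are none."""
        if not self.readings:
            return None
        return sum(self.readings) / len(self.readings)


def to_fahrenheit(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


log = TemperatureLog()
log.add_reading(20.0)
log.add_reading(30.0)
print(to_fahrenheit(log.average()))
```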

Wednesday, March 7, 2018

AI study note

AI has been getting more and more real-life applications over the past couple of years. Realizing this, I think it is time to learn about it in more detail. There are more resources on the web than before.
The most famous online course in this field is offered by Andrew Ng on Coursera. He is an icon of the current AI industry.
Andrew Ng Stanford University

There's also a simple introduction to AI learning by Shiva Gupta.
Shiva Gupta blog

Although no programming language is a prerequisite, I decided to follow Shiva's path with Python and Google's TensorFlow.