安德魯讀書筆記: 2019

2019年3月26日星期二

有買房有投資對比沒買房只投資，到底退休時會差多少呢?

我們的參數是基於

早點買房跟晚點買房的差異？

透過跑程式，我們得知，如果你在很年輕就買房，到最後退休時，可能資金會差距很大，以下我們比較，30歲,40歲,50歲買房，退休會差距多少錢

房貸究竟付快一點好，還是慢一點好呢？透過以下的比較，我們可以瞭解到，能付越少，越好，最後的錢會越多

假如買房的話，房屋貸款年限究竟該長該短？透過以下的圖，我們可以知道，貸款年數，越長越好!!

每個月多省下2000元拿來投資（開銷少一點），有很大的差別嗎?透過下圖，我們可以得知，如果你每月省2000元，退休時，你會多一百多萬，對於整體而言，大概是多3%，這個我覺得開心就好，畢竟人生只有一次，我們應該理財，也應該享受

2019年3月25日星期一

股價淨值比 < 1 ，但要分散風險，因為即使它的股價淨值比低，但是也有可能是因為公司表現不好，而股價迅速下滑

KD值

最近三個月的月營收的平均 > 該家公司12個月的月營收平均

中型股波動較大型股波動大，但我們除了獲利，我們還要安全，所以

自由現金流量(公司能自由運用的錢) = 營業現金收入(賣東西賺的錢) - 投資現金流出 (更新設備擴建廠房等)

自由現金流量我們希望可以大於0，代表公司財務狀況比較安全

營業利益成長率 = (當期的營業利益 - 去年同期的營業利益) x 100% / 去年同期的營業利益

我們希望營業利益成長率是正的，代表公司比去年更賺錢

但即使這樣，我們想要公司跟公司比，於是我們看

股東權益報酬率(ROE) = 本期淨利/股東權利

ROE高 -> 小資本賺很多
ROE低 -> 大資本但不會賺錢

但即使roe高，股價太誇張了呢，於是我們再看

市值營收比(PSR - PRICE TO SALE RATIO) = 市值/月營收

PSR 高 - 市值(股價)太高，營收太少
PSR低 - 市值(股價)低，營收高

在配合有(上漲趨勢，可能有大戶)RSV 下去投資

2019年3月19日星期二

python os#path #write #close time#sleep

import time
import os

#if no directory called andrew_course9
if 'andrew_course9' not in os.listdir():
#create andrew_course9 directory
os.mkdir('andrew_course9')
#stock_id
sid = ['1101','2330']

for s in sid:
res = requests.get('website')
res.encoding = 'big5'
#set up save address , ex:andrew_course9\1101.html
path = os.path.join('andrew_course9',s+'.html')
#open the file ,write in ,then close
f = open(path,'w',encoding='utf-8')
f.write(res.text)
f.close()

print(s)
#pause 20 sec , to prevent website does not let you catch data
time.sleep(20)

********************************************************************************

dfs = []
#for every stock
for s in sid:
#get the file from directory
path = os.path.join('andrew_course9', s + '.html')

dfs.append(pd.read_html(path,encoding = 'utf-8'))

dfs[1][1].head(10)

*********************************************************************************

python dataframe#read row and column#.iloc #.loc #dropna #set_index #how to check pandas version #how to save CSV and read #how to read and save sqlit3

import requests
from io import StringIO
import pandas as pd

url = 'website'
r = requests.get(url)
#let it can read chinese
r.encoding = 'big5'

dfs = pd.read_html(StringIO(r.text))
#get the first table of html
df = dfs[0]

# in [:,:10],before comma it indicate to row id，after comma,it indicates columns id
df = df.iloc[:,:10]
# we can get the row which first column is '公司代號'
column_name = df[df[0]=='公司代號']
# select any row of column_name as column 's name' （because in here,all the row is the same）
df.columns = column_name.iloc[0]
df['當月營收'] = pd.to_numeric(df['當月營收'],errors = 'coerce')
# delect row under 當月營收 which shows NaN by using .dropna
df = df.dropna(subset=['當月營收'])
# 刪除「公司代號」中出現「合計」的行數，其中「～」是否定的意思
#delect df['公司代號'] which shows 合計 ,and ~ means "not"
df = df.loc[~(df['公司代號']=='合計')]

df = df.set_index(['公司代號','公司名稱'])
df = df.apply(pd.to_numeric)
#df.head()

**********************************************************************

#check pandas version

pd.__version__

**********************************************************************

df.to_csv('test.csv',encoding='utf_8_sig')

df = pd.read_csv('test.csv',index_col=['公司代號','公司名稱'])

**********************************************************************

import sqlite3

conn = sqlite3.connect('test.sqlite3')

df.to_sql('monthly_report', conn, if_exists='replace')

df = pd.read_sql('select * from monthly_report', conn, index_col=['公司代號','公司名稱'])

df.head()

**********************************************************************

2019年3月17日星期日

Python #crawling爬蟲 #save as a file as csv or html 存檔 #sqllite3 #dataframe.astype(str) #rename #remove column which is NaN# set index #lambda無名氏函數 no name function #pandas type to number轉數字 #date.strftime #list(filter)

import requests #think it as a browser
import pandas as pd

res = requests.get("website address") #inside the () you can put the website link

res.text # it will show the word you can not understand

res.encoding = 'utf-8' #utf-8 ,so we can read chinese word

res.text
# save the file
f = open('andrew_file.html','w',encoding = 'utf-8')
f.write(res.text)
f.close()

#we need to use pandas to read html as dataframe

df = pd.read_html('andrew_file.html')

print(df[1])

***************************************************************************
import requests #think it as a browser
import pandas as pd

res = requests.get("website address") #inside the () you can put the website link

res.text # it will show the word you can not understand

res.encoding = 'utf-8' #utf-8 ,so we can read chinese word

res.text
# save the file
f = open('andrew_file.html','w',encoding = 'utf-8')
f.write(res.text)
f.close()

#we need to use pandas to read html as dataframe

df = pd.read_html('andrew_file.html')

print(df[1])
*********************************************************************
#We also can directly let it be as dataframe , and no need to save
from io import StringIO
#StringIo make res.text like a file
dfs = pd.read_html(StringIO(res.text))
dfs[3]

*********************************************************************
import requests
import pandas as pd
from io import StringIO

response = requests.get('website address')

#df1 = pd.read_csv(StringIO(response.text))
#want to save as df , but fail , it show ParserError: Error tokenizing data. C error: Expected 6 fields in line 159, saw 17
#different row on the file

lines = response.text.split('\n')
lines[159]

newlines = []
#use for loop to organize the data
#we use ", to seperate ,the reason is we are afraid we meet the string like this ' "你", ",好,", "嗎",'
for line in lines:
if len(line.split('",')) == 17:
newlines.append(line)

print("original line count = ")
print(len(lines))
print("after process line count = " )
print(len(newlines))

j = '\n'
s = j.join(newlines)
s.replace('=','')

df = pd.read_csv(StringIO(s))
#only show first 5 row
df.head()

#but now ,they looks like number , but they are string
#and we dont want ","
#we dont want Unnamed16

#change all the content into string
df = df.astype(str)
#lambda is like a function ,but it make definition simple
#applymap is to excute funtion
df = df.applymap(lambda s:s.replace(',',''))

df = df.set_index('證券代號')
#change sting into number
df = df.apply(lambda s: pd.to_numeric(s,errors = 'coerce' ))

# to delect no use columns
# axis=1 is to check every columns
# how='all' means if all is nan ,then it will act
# （this also works） df = df[df.columns[df.isnull().sum() != len(df)]]
df.dropna(axis = 1 ,how = 'all',inplace = True)

df.head()

*********************************************************************

#how to write red bar of stock

#length of red bar,1 means no up no down

#<1 means stock closing price is lower(stock price fall)

#>1 means stock closing price is larger（stock price rise）

close_open = df['收盤價']/df['開盤價']

close_open.head(5)

*********************************************************************

#select stock which closing price is larger 5% than opening price

df[close_open > 1.05]

*********************************************************************

#save as csv file , so that we can open by excel

df.to_csv('daily_price.csv',encoding = "utf_8_sig")

#read file ,and assign "證卷代號" as index

df = pd.read_csv('daily_price.csv',index_col = ['證券代號'])

df.head()

*********************************************************************

#save to sqllite3

import sqlite3

conn = sqlite3.connect('test.sqlite3')

df.to_sql('daily_price',conn,if_exists='replace')

#read the file

df = pd.read_sql('Select * from daily_price',conn,index_col = ["證券代號"])

df.head()

*********************************************************************

*********************************************************************
import pandas as pd
import requests
from io import StringIO

def crawl_price(date):
#change date into string
datestr = date.strftime('%y%m%d')

r = requests.post('website part1' + datestr + 'website part2')

content = r.text.replace('=','')

lines = content.split('\n')
#delect row which columns <=10
lines = list(filter(lambda l:len(l.split('",')) > 10, lines))
#connect all the lines with \n
content = "\n".join(lines)

if content == '':
return None

df = pd.read_csv(StringIO(content))

df = df.astype(str)
df = df.apply(lambda s:s.str.replace(',',''))

df['date'] = pd.to_datetime(date)
#change the name of columns
df = df.rename(columns={'證券代號':'stock_id'})
df = df.set_index(['stock_id','date'])
#df = df.set_index('stock_id')
# turn all the content of table into number , coerce means the content will be NaN if it is fail to translate
df = df.apply(lambda s:pd.to_numeric(s,errors = 'coerce'))
#delect the columns which is no need
df = df[df.columns[df.isnull().all()==False]]

return df

import datetime
crawl_price(datetime.datetime(2019,3,18))

*********************************************************************

)

2019年3月15日星期五

Python pandas 使用快速查閱

Pandas 可以用來分析time series
主要有兩種資料結構

Series : 一條時間序列
DataFrame : 很多的時間序列

Pandas文檔

把pandas 叫進來，命名為pd

import pandas as pd

給它一串list

s = pd.Series([1,2,3,4,5,6,7,8])

結果為

0~7為index

1~8為對應到的值

*************************************

date = pd.date_range('20190309',periods=6)

s = pd.Series([1,2,3,4,5,6], date)

****************************************************

2019-03-09    1
2019-03-10    2
2019-03-11    3
2019-03-12    4
2019-03-13    5
2019-03-14    6
Freq: D, dtype: int64

********************************************************

# 建立一個 time series
import pandas as pd
date = pd.date_range('20190309', periods=6)
s = pd.Series([1,2,3,4,5,6], index=date)

# 查找
s.loc['20190312']

***********************************************

******************************************************

s.loc['20190310':'20190312']

**********************************

2019-03-10    2
2019-03-11    3
2019-03-12    4
Freq: D, dtype: int64

*********************************

s.iloc[1]

****************************

**********************************

s.iloc[1:4]

*********************************

2019-03-10    2
2019-03-11    3
2019-03-12    4
Freq: D, dtype: int64

***********************************

print(s)
print(s.max())
print(s.min())
print(s.mean())
print(s.std())

***********************************

2019-03-09    1
2019-03-10    2
2019-03-11    3
2019-03-12    4
2019-03-13    5
2019-03-14    6
Freq: D, dtype: int64
6
1
3.5
1.8708286933869707

*********************************************

print(s)
print(s.cumsum())
print(s.cumprod())

*********************************************

2019-03-09    1
2019-03-10    2
2019-03-11    3
2019-03-12    4
2019-03-13    5
2019-03-14    6
Freq: D, dtype: int64
2019-03-09     1
2019-03-10     3
2019-03-11     6
2019-03-12    10
2019-03-13    15
2019-03-14    21
Freq: D, dtype: int64
2019-03-09      1
2019-03-10      2
2019-03-11      6
2019-03-12     24
2019-03-13    120
2019-03-14    720
Freq: D, dtype: int64

*************************************************

print(s)
#將當下跟上一個相加，1+nan 等於nan
print(s.rolling(2).sum())
print(s.rolling(2).max())
print(s.rolling(2).min())
print(s.rolling(2).mean())
print(s.rolling(2).std())

*********************************************

2019-03-09    1
2019-03-10    2
2019-03-11    3
2019-03-12    4
2019-03-13    5
2019-03-14    6
Freq: D, dtype: int64
2019-03-09     NaN
2019-03-10     3.0
2019-03-11     5.0
2019-03-12     7.0
2019-03-13     9.0
2019-03-14    11.0
Freq: D, dtype: float64
2019-03-09    NaN
2019-03-10    2.0
2019-03-11    3.0
2019-03-12    4.0
2019-03-13    5.0
2019-03-14    6.0
Freq: D, dtype: float64
2019-03-09    NaN
2019-03-10    1.0
2019-03-11    2.0
2019-03-12    3.0
2019-03-13    4.0
2019-03-14    5.0
Freq: D, dtype: float64
2019-03-09    NaN
2019-03-10    1.5
2019-03-11    2.5
2019-03-12    3.5
2019-03-13    4.5
2019-03-14    5.5
Freq: D, dtype: float64
2019-03-09         NaN
2019-03-10    0.707107
2019-03-11    0.707107
2019-03-12    0.707107
2019-03-13    0.707107
2019-03-14    0.707107
Freq: D, dtype: float64

*************************************************

print(s)
s < 3



2018-01-01    1
2018-01-02    2
2018-01-03    3
2018-01-04    4
2018-01-05    5
2018-01-06    6
Freq: D, dtype: int64










Out[12]:

2018-01-01     True
2018-01-02     True
2018-01-03    False
2018-01-04    False
2018-01-05    False
2018-01-06    False
Freq: D, dtype: bool
*****************************************************
#在瀏覽器畫圖
%matplotlib inline
s.plot()



*****************************************************************

larger_than_3 = s > 3
print (larger_than_3)
s.loc[larger_than_3]
*****************************************************************


2019-03-09    False
2019-03-10    False
2019-03-11    False
2019-03-12     True
2019-03-13     True
2019-03-14     True
Freq: D, dtype: bool







Out[35]:

2019-03-12    4
2019-03-13    5
2019-03-14    6
Freq: D, dtype: int64

*****************************************************************

s.loc[larger_than_3] = s.loc[larger_than_3] + 1
s

*****************************************************************

2019-03-09    1
2019-03-10    2
2019-03-11    3
2019-03-12    5
2019-03-13    6
2019-03-14    7
Freq: D, dtype: int64

*****************************************************************

s1 = pd.Series([1,2,3,4,5,6], index=date)
s2 = pd.Series([5,6,7,8,9,10], index=date)
s3 = pd.Series([11,12,5,7,8,2], index=date)

dictionary = {
    'C1':s1,
    'C2':s2,
    'C3':s3,
}
df = pd.DataFrame(dictionary)
df

*****************************************************************

*****************************************************************

%matplotlib inline
df.plot()

*****************************************************************

*****************************************************************

df.loc['2019-03-11']

*****************************************************************

C1    3
C2    7
C3    5

*****************************************************************

df.iloc[1]

*****************************************************************

C1     2
C2     6
C3    12

*****************************************************************

print(df)
df.loc['2019-03-09':'2019-03-11', ['C1','C2']]

*****************************************************************

            C1  C2  C3
2019-03-09   1   5  11
2019-03-10   2   6  12
2019-03-11   3   7   5
2019-03-12   4   8   7
2019-03-13   5   9   8
2019-03-14   6  10   2


C1 C2

2019-03-09 1 5
2019-03-10 2 6
2019-03-11 3 7

	C1	C2
2019-03-09	1	5
2019-03-10	2	6
2019-03-11	3	7

*****************************************************************

df.iloc[1:4, [0, 1]]

*****************************************************************


C1 C2

2019-03-10 2 6
2019-03-11 3 7
2019-03-12 4 8

	C1	C2
2019-03-10	2	6
2019-03-11	3	7
2019-03-12	4	8

*****************************************************************

df.cumsum()

*****************************************************************


C1 C2 C3

2019-03-09 1 5 11
2019-03-10 3 11 23
2019-03-11 6 18 28
2019-03-12 10 26 35
2019-03-13 15 35 43
2019-03-14 21 45 45

C1	C2	C3
2019-03-09	1	5	11
2019-03-10	3	11	23
2019-03-11	6	18	28
2019-03-12	10	26	35
2019-03-13	15	35	43
2019-03-14	21	45	45

*****************************************************************

df.cumprod()

*****************************************************************


C1 C2 C3

2019-03-09 1 5 11
2019-03-10 2 30 132
2019-03-11 6 210 660
2019-03-12 24 1680 4620
2019-03-13 120 15120 36960
2019-03-14 720 151200 73920

	C1	C2	C3
2019-03-09	1	5	11
2019-03-10	2	30	132
2019-03-11	6	210	660
2019-03-12	24	1680	4620
2019-03-13	120	15120	36960
2019-03-14	720	151200	73920

*****************************************************************

df.rolling(2).mean()

*****************************************************************


C1 C2 C3

2019-03-09 NaN NaN NaN
2019-03-10 1.5 5.5 11.5
2019-03-11 2.5 6.5 8.5
2019-03-12 3.5 7.5 6.0
2019-03-13 4.5 8.5 7.5
2019-03-14 5.5 9.5 5.0

	C1	C2	C3
2019-03-09	NaN	NaN	NaN
2019-03-10	1.5	5.5	11.5
2019-03-11	2.5	6.5	8.5
2019-03-12	3.5	7.5	6.0
2019-03-13	4.5	8.5	7.5
2019-03-14	5.5	9.5	5.0

*****************************************************************

print(df)
df.cumsum(axis=1)

*****************************************************************



      C1  C2  C3
2019-03-09   1   5  11
2019-03-10   2   6  12
2019-03-11   3   7   5
2019-03-12   4   8   7
2019-03-13   5   9   8
2019-03-14   6  10   2





 


C1 C2 C3

2019-03-09 1 6 17
2019-03-10 2 8 20
2019-03-11 3 10 15
2019-03-12 4 12 19
2019-03-13 5 14 22
2019-03-14 6 16 18

	C1	C2	C3
2019-03-09	1	6	17
2019-03-10	2	8	20
2019-03-11	3	10	15
2019-03-12	4	12	19
2019-03-13	5	14	22
2019-03-14	6	16	18

**********************************************************************

import pandas as pd 



def create_list (start_value,end_value):
    listz = []
    for v in range(end_value):
        if v+1 >= start_value:
            listz.append(v+1);
    return listz

list1 = create_list(1,10)
list2 = create_list(11,20)
list3 = create_list(21,30)

list1_series = pd.Series(list1,range(1,11,1))
list2_series = pd.Series(list2,range(1,11,1))
list3_series = pd.Series(list3,range(1,11,1))



dictionary = {
    'C1':list1_series,
    'C2':list2_series,
    'C3':list3_series
}

df = pd.DataFrame(dictionary)

print(df)
print("************************")
print(df.loc[2])
print("************************")
print(df.iloc[0])
print("************************")
print(df.loc[2,["C2"]])
print("************************")
print(df.iloc[1,1])
print("************************")
print(df)
print("************************")
print(df.cumsum())
print("************************")
print(df.cumsum(axis = 1))
print("************************")
df_larger_than_3 = df > 3 
print(df_larger_than_3)
print("************************")

%matplotlib inline
df.plot()

*******************************************************************

    C1  C2  C3
1    1  11  21
2    2  12  22
3    3  13  23
4    4  14  24
5    5  15  25
6    6  16  26
7    7  17  27
8    8  18  28
9    9  19  29
10  10  20  30
************************
C1     2
C2    12
C3    22
Name: 2, dtype: int64
************************
C1     1
C2    11
C3    21
Name: 1, dtype: int64
************************
C2    12
Name: 2, dtype: int64
************************
12
************************
    C1  C2  C3
1    1  11  21
2    2  12  22
3    3  13  23
4    4  14  24
5    5  15  25
6    6  16  26
7    7  17  27
8    8  18  28
9    9  19  29
10  10  20  30
************************
    C1   C2   C3
1    1   11   21
2    3   23   43
3    6   36   66
4   10   50   90
5   15   65  115
6   21   81  141
7   28   98  168
8   36  116  196
9   45  135  225
10  55  155  255
************************
    C1  C2  C3
1    1  12  33
2    2  14  36
3    3  16  39
4    4  18  42
5    5  20  45
6    6  22  48
7    7  24  51
8    8  26  54
9    9  28  57
10  10  30  60
************************
       C1    C2    C3
1   False  True  True
2   False  True  True
3   False  True  True
4    True  True  True
5    True  True  True
6    True  True  True
7    True  True  True
8    True  True  True
9    True  True  True
10   True  True  True
************************

python #string字串 #float浮點數 #function函數 #datetime時間日期 #if elif else #for loop 迴圈#while #dictionary 字典快速查閱

String字串

將float轉成字串
str(float)

將字串內的文字_ 改變成you
test_string.replace('_','you')

Float

只留float小數點後兩位
format(float value,'.2f')

簡單計算功能

總和
sum(salary)
陣列長度
len(salary)

基本運算

a**b 等於 a的b次方

定義函數

def substract(a,b):
z = a - b
return z

answer = substract(99999,2343)
print(answer)

顯示時間，現在幾點

import datetime

datetime.datetime.now()

if , elif , else 用法

def check_profit_or_not(buy_price,sell_price):
if buy_price > sell_price :
print("虧錢")
elif sell_price > buy_price :
print("賺錢")
else :
print("沒賺沒賠")

check_profit_or_not(10000,12000)

FOR loop迴圈

for v in range(90):
print(v)

print("*********************************************")
for x in [10,20,30,40]:
print(x)
print("*********************************************")
for x in 學生名單:
print(x)
print("*********************************************")
for x in 學生名單.values():
print(x)
print("*********************************************")

print(學生名單.get("小白"))

While迴圈

i = 0
while i < 5 :
print(i)

i+=1

python 學習資源

2019年3月8日星期五

python 學習筆記 - 環境架設

Python的函式庫
畫圖表:Matplotlib
架網站:Django
機器學習:TensorFlow

函式庫Package常常更新，版本常不相容
解決辦法:一個專案用同一個環境Enviroment

Anaconda整合了:
Package安裝
Enviroment建設
寫程式的環境

可以來這邊安裝anaconda
https://anaconda.org/

訂閱：文章 (Atom)

安德魯讀書筆記

2019年3月26日星期二

有買房有投資對比沒買房只投資，到底退休時會差多少呢?

2019年3月25日星期一

Python 程式抓取股票策略

2019年3月19日星期二

python os#path #write #close time#sleep

python dataframe#read row and column#.iloc #.loc #dropna #set_index #how to check pandas version #how to save CSV and read #how to read and save sqlit3

2019年3月17日星期日

Python #crawling爬蟲 #save as a file as csv or html 存檔 #sqllite3 #dataframe.astype(str) #rename #remove column which is NaN# set index #lambda無名氏函數 no name function #pandas type to number轉數字 #date.strftime #list(filter)

2019年3月15日星期五

Python pandas 使用快速查閱

python #string字串 #float浮點數 #function函數 #datetime時間日期 #if elif else #for loop 迴圈#while #dictionary 字典快速查閱

String字串

Float

簡單計算功能

基本運算

定義函數

顯示時間，現在幾點

if , elif , else 用法

FOR loop迴圈

While迴圈

python 學習資源

2019年3月8日星期五

python 學習筆記 - 環境架設

網誌存檔

About

2019年3月26日 星期二

2019年3月25日 星期一

2019年3月19日 星期二

2019年3月17日 星期日

2019年3月15日 星期五

String字串

Float

簡單計算功能

基本運算

定義函數

顯示時間，現在幾點

if , elif , else 用法

FOR loop迴圈

While迴圈

2019年3月8日 星期五

2019年3月26日星期二

2019年3月25日星期一

2019年3月19日星期二

2019年3月17日星期日

2019年3月15日星期五

2019年3月8日星期五