2019年3月26日 星期二

有買房有投資 對比 沒買房只投資,到底退休時會差多少呢?


我們的參數是基於



60.00


26

8.00

2.20

0.70

55

0.70

7.00

1000

300

40

2.50

20



  • 早點買房跟晚點買房的差異?

  • 透過跑程式,我們得知,如果你在很年輕就買房,到最後退休時,可能資金會差距很大,以下我們比較,30歲,40歲,50歲買房,退休會差距多少錢

    img
    img
    房貸究竟付快一點好,還是慢一點好呢?透過以下的比較,我們可以瞭解到,能付越少,越好,最後的錢會越多
    img
    img
    • 假如買房的話,房屋貸款年限究竟該長該短?透過以下的圖,我們可以知道,貸款年數,越長越好!!
    img
    img
    • 每個月多省下2000元拿來投資(開銷少一點),有很大的差別嗎?透過下圖,我們可以得知,如果你每月省2000元,退休時,你會多一百多萬,對於整體而言,大概是多3%,這個我覺得開心就好,畢竟人生只有一次,我們應該理財,也應該享受
    img
    img


    2019年3月25日 星期一

    Python 程式抓取 股票 策略

    股價淨值比 < 1 ,但要分散風險,因為即使它的股價淨值比低,但是也有可能是因為公司表現不好,而股價迅速下滑

    KD值

    最近三個月的月營收的平均 > 該家公司12個月的月營收平均

    中型股波動 較 大型股波動 大,但我們除了獲利,我們還要安全,所以

    自由現金流量(公司能自由運用的錢)  = 營業現金收入(賣東西賺的錢) - 投資現金流出 (更新設備擴建廠房等)

    自由現金流量我們希望可以大於0,代表公司財務狀況比較安全

    營業利益成長率 = (當期的營業利益 - 去年同期的營業利益) x 100% / 去年同期的營業利益

    我們希望營業利益成長率 是 正的,代表公司比去年更賺錢

    但即使這樣,我們想要公司跟公司比,於是我們看

    股東權益報酬率(ROE) = 本期淨利/股東權利

    ROE高 -> 小資本 賺很多
    ROE低 -> 大資本 但不會賺錢

    但即使roe高,股價太誇張了呢 ,於是我們再看

    市值營收比(PSR - PRICE TO SALE RATIO) = 市值/月營收

    PSR 高 - 市值(股價)太高,營收太少
    PSR低  - 市值(股價)低,營收高

    在配合有(上漲趨勢,可能有大戶)RSV 下去投資

    2019年3月19日 星期二

    python os#path #write #close time#sleep

    import time
    import os

    #if no directory called andrew_course9
    if 'andrew_course9' not in os.listdir():
        #create andrew_course9 directory
        os.mkdir('andrew_course9')
    #stock_id   
    sid = ['1101','2330']

    for s in sid:
        res = requests.get('website')
        res.encoding = 'big5'
        #set up save address , ex:andrew_course9\1101.html
        path = os.path.join('andrew_course9',s+'.html')
        #open the file ,write in ,then close
        f = open(path,'w',encoding='utf-8')
        f.write(res.text)
        f.close()
       
        print(s)
        #pause 20 sec , to prevent website does not let you catch data
        time.sleep(20)

    ********************************************************************************

    dfs = []
    #for every stock
    for s in sid:
        #get the file from directory
        path = os.path.join('andrew_course9', s + '.html')
       
        dfs.append(pd.read_html(path,encoding = 'utf-8'))
       

    dfs[1][1].head(10)

    *********************************************************************************

    python dataframe#read row and column#.iloc #.loc #dropna #set_index #how to check pandas version #how to save CSV and read #how to read and save sqlit3

    import requests
    from io import StringIO
    import pandas as pd

    url = 'website'
    r = requests.get(url)
    #let it can read chinese
    r.encoding = 'big5'

    dfs = pd.read_html(StringIO(r.text))
    #get the first table of html
    df = dfs[0]

    # in [:,:10],before comma it indicate to row id,after comma,it indicates columns id
    df = df.iloc[:,:10]
    # we can get the row which first column is '公司代號'
    column_name = df[df[0]=='公司代號']
    # select any row of column_name as column 's name' (because in here,all the row is the same)
    df.columns = column_name.iloc[0]
    df['當月營收'] = pd.to_numeric(df['當月營收'],errors = 'coerce')
    # delect row under 當月營收 which shows NaN by using   .dropna 
    df = df.dropna(subset=['當月營收'])
    # 刪除「公司代號」中出現「合計」的行數,其中「~」是否定的意思
    #delect df['公司代號'] which shows 合計 ,and ~ means "not"
    df = df.loc[~(df['公司代號']=='合計')]

    df = df.set_index(['公司代號','公司名稱'])
    df = df.apply(pd.to_numeric)
    #df.head()


    **********************************************************************
    #check pandas version
    pd.__version__

    **********************************************************************
    df.to_csv('test.csv',encoding='utf_8_sig')

    df = pd.read_csv('test.csv',index_col=['公司代號','公司名稱'])

    **********************************************************************
    import sqlite3

    conn = sqlite3.connect('test.sqlite3')
    df.to_sql('monthly_report', conn, if_exists='replace')


    df = pd.read_sql('select * from monthly_report', conn, index_col=['公司代號','公司名稱'])
    df.head()
    ********************************************************************** 

    2019年3月17日 星期日

    Python #crawling爬蟲 #save as a file as csv or html 存檔 #sqllite3 #dataframe.astype(str) #rename #remove column which is NaN# set index #lambda無名氏函數 no name function #pandas type to number轉數字 #date.strftime #list(filter)

    import requests #think it as a browser
    import pandas as pd

    res = requests.get("website address") #inside the () you can put the website link

    res.text # it will show the word you can not understand

    res.encoding = 'utf-8' #utf-8 ,so we can read chinese word

    res.text
    # save the file
    f = open('andrew_file.html','w',encoding = 'utf-8')
    f.write(res.text)
    f.close()

    #we need to use pandas to read html as dataframe

    df = pd.read_html('andrew_file.html')

    print(df[1])

    ***************************************************************************
    import requests #think it as a browser
    import pandas as pd

    res = requests.get("website address") #inside the () you can put the website link

    res.text # it will show the word you can not understand

    res.encoding = 'utf-8' #utf-8 ,so we can read chinese word

    res.text
    # save the file
    f = open('andrew_file.html','w',encoding = 'utf-8')
    f.write(res.text)
    f.close()

    #we need to use pandas to read html as dataframe

    df = pd.read_html('andrew_file.html')

    print(df[1])
    *********************************************************************
    #We also can directly let it be as dataframe , and no need to save
    from io import StringIO
    #StringIo make res.text like a file
    dfs = pd.read_html(StringIO(res.text))
    dfs[3]

    *********************************************************************
    import requests
    import pandas as pd
    from io import StringIO

    response = requests.get('website address')

    #df1 = pd.read_csv(StringIO(response.text))
    #want to save as df , but fail , it show ParserError: Error tokenizing data. C error: Expected 6 fields in line 159, saw 17
    #different row on the file


    lines = response.text.split('\n')
    lines[159]

    newlines = []
    #use for loop to organize the data
    #we use ", to seperate ,the reason is we are afraid we meet the string like this '  "你",   ",好,",   "嗎",'
    for line in lines:
        if len(line.split('",')) == 17:
            newlines.append(line)
         
    print("original line count = ")
    print(len(lines))
    print("after process line count = " )
    print(len(newlines))

    j = '\n'
    s = j.join(newlines)
    s.replace('=','')

    df = pd.read_csv(StringIO(s))
    #only show first 5 row
    df.head()

    #but now ,they looks like number , but they are string
    #and we dont want ","
    #we dont want Unnamed16

    #change all the content into string
    df = df.astype(str)
    #lambda is like a function ,but it make definition simple
    #applymap is to excute funtion
    df = df.applymap(lambda s:s.replace(',',''))

    df = df.set_index('證券代號')
    #change sting into number
    df = df.apply(lambda s: pd.to_numeric(s,errors = 'coerce' ))

    # to delect no use columns
    # axis=1 is to check every columns
    # how='all'  means if all is nan ,then it will act
    # (this also works) df = df[df.columns[df.isnull().sum() != len(df)]]
    df.dropna(axis = 1 ,how = 'all',inplace = True)

    df.head()

    *********************************************************************
    #how to write red bar of stock
    #length of red bar,1 means no up no down
    #<1 means stock closing price is lower(stock price fall)
    #>1 means stock closing price is larger(stock price rise)
    close_open = df['收盤價']/df['開盤價']
    close_open.head(5)
    *********************************************************************
    #select stock which closing price is larger 5% than opening price
    df[close_open > 1.05]
    *********************************************************************
    #save as csv file , so that we can open by excel
    df.to_csv('daily_price.csv',encoding = "utf_8_sig")

    #read file ,and assign "證卷代號" as index
    df = pd.read_csv('daily_price.csv',index_col = ['證券代號'])

    df.head()

    *********************************************************************
    #save to sqllite3

    import sqlite3
    conn = sqlite3.connect('test.sqlite3')
    df.to_sql('daily_price',conn,if_exists='replace')

    #read the file

    df = pd.read_sql('Select * from daily_price',conn,index_col = ["證券代號"])

    df.head()

    *********************************************************************

    *********************************************************************
    import pandas as pd
    import requests
    from io import StringIO

    def crawl_price(date):
        #change date into string
        datestr = date.strftime('%y%m%d')
       
        r = requests.post('website part1' + datestr + 'website part2')
       
        content = r.text.replace('=','')
       
        lines = content.split('\n')
        #delect row which columns <=10
        lines = list(filter(lambda l:len(l.split('",')) > 10, lines))
        #connect all the lines with \n
        content = "\n".join(lines)
       
        if content == '':
            return None
       
        df = pd.read_csv(StringIO(content))
       
        df = df.astype(str)
        df = df.apply(lambda s:s.str.replace(',',''))
       
        df['date'] = pd.to_datetime(date)
        #change the name of columns
        df = df.rename(columns={'證券代號':'stock_id'})
        df = df.set_index(['stock_id','date'])
        #df = df.set_index('stock_id')
        # turn all the content of table into number , coerce means the content will be NaN if it is fail to translate
        df = df.apply(lambda s:pd.to_numeric(s,errors = 'coerce'))
        #delect the columns which is no need
        df = df[df.columns[df.isnull().all()==False]]
       
        return df
       
       
    import datetime
    crawl_price(datetime.datetime(2019,3,18))


    *********************************************************************

    )

    2019年3月15日 星期五

    Python pandas 使用快速查閱

    Pandas 可以用來分析time series
    主要有兩種資料結構

    1. Series : 一條時間序列
    2. DataFrame : 很多的時間序列


    把pandas 叫進來,命名為pd
    import pandas as pd  
    給它一串list
    s = pd.Series([1,2,3,4,5,6,7,8])
    結果為

    0    1
    1    2
    2    3
    3    4
    4    5
    5    6
    6    7
    7    8
    
    
    
    0~7為index
    1~8為對應到的值
    
    
    
    
    
    
    *************************************
    date = pd.date_range('20190309',periods=6)
    
    s = pd.Series([1,2,3,4,5,6], date)
    s
    ****************************************************
    2019-03-09    1
    2019-03-10    2
    2019-03-11    3
    2019-03-12    4
    2019-03-13    5
    2019-03-14    6
    Freq: D, dtype: int64
    ********************************************************
    
    
    # 建立一個 time series
    import pandas as pd
    date = pd.date_range('20190309', periods=6)
    s = pd.Series([1,2,3,4,5,6], index=date)
    
    # 查找
    s.loc['20190312']
    
    
    ***********************************************
    4
    ******************************************************
    s.loc['20190310':'20190312']
    **********************************
    2019-03-10    2
    2019-03-11    3
    2019-03-12    4
    Freq: D, dtype: int64
    *********************************
    s.iloc[1]
    ****************************
    2
    **********************************
    s.iloc[1:4]
    *********************************
    2019-03-10    2
    2019-03-11    3
    2019-03-12    4
    Freq: D, dtype: int64
    ***********************************
    print(s)
    print(s.max())
    print(s.min())
    print(s.mean())
    print(s.std())
    ***********************************
    2019-03-09    1
    2019-03-10    2
    2019-03-11    3
    2019-03-12    4
    2019-03-13    5
    2019-03-14    6
    Freq: D, dtype: int64
    6
    1
    3.5
    1.8708286933869707
    
    *********************************************
    print(s)
    print(s.cumsum())
    print(s.cumprod())
    *********************************************
    2019-03-09    1
    2019-03-10    2
    2019-03-11    3
    2019-03-12    4
    2019-03-13    5
    2019-03-14    6
    Freq: D, dtype: int64
    2019-03-09     1
    2019-03-10     3
    2019-03-11     6
    2019-03-12    10
    2019-03-13    15
    2019-03-14    21
    Freq: D, dtype: int64
    2019-03-09      1
    2019-03-10      2
    2019-03-11      6
    2019-03-12     24
    2019-03-13    120
    2019-03-14    720
    Freq: D, dtype: int64
    
    *************************************************
    print(s)
    #將當下跟上一個相加,1+nan 等於nan
    print(s.rolling(2).sum())
    print(s.rolling(2).max())
    print(s.rolling(2).min())
    print(s.rolling(2).mean())
    print(s.rolling(2).std())
    
    
    *********************************************
    2019-03-09    1
    2019-03-10    2
    2019-03-11    3
    2019-03-12    4
    2019-03-13    5
    2019-03-14    6
    Freq: D, dtype: int64
    2019-03-09     NaN
    2019-03-10     3.0
    2019-03-11     5.0
    2019-03-12     7.0
    2019-03-13     9.0
    2019-03-14    11.0
    Freq: D, dtype: float64
    2019-03-09    NaN
    2019-03-10    2.0
    2019-03-11    3.0
    2019-03-12    4.0
    2019-03-13    5.0
    2019-03-14    6.0
    Freq: D, dtype: float64
    2019-03-09    NaN
    2019-03-10    1.0
    2019-03-11    2.0
    2019-03-12    3.0
    2019-03-13    4.0
    2019-03-14    5.0
    Freq: D, dtype: float64
    2019-03-09    NaN
    2019-03-10    1.5
    2019-03-11    2.5
    2019-03-12    3.5
    2019-03-13    4.5
    2019-03-14    5.5
    Freq: D, dtype: float64
    2019-03-09         NaN
    2019-03-10    0.707107
    2019-03-11    0.707107
    2019-03-12    0.707107
    2019-03-13    0.707107
    2019-03-14    0.707107
    Freq: D, dtype: float64
    *************************************************
    print(s)
    s < 3
    
    
    2018-01-01    1
    2018-01-02    2
    2018-01-03    3
    2018-01-04    4
    2018-01-05    5
    2018-01-06    6
    Freq: D, dtype: int64
    
    Out[12]:
    2018-01-01     True
    2018-01-02     True
    2018-01-03    False
    2018-01-04    False
    2018-01-05    False
    2018-01-06    False
    Freq: D, dtype: bool
    *****************************************************
    #在瀏覽器畫圖
    %matplotlib inline
    s.plot()
    
    
    *****************************************************************
    
    
    larger_than_3 = s > 3
    print (larger_than_3)
    s.loc[larger_than_3]
    *****************************************************************
    2019-03-09    False
    2019-03-10    False
    2019-03-11    False
    2019-03-12     True
    2019-03-13     True
    2019-03-14     True
    Freq: D, dtype: bool
    
    Out[35]:
    2019-03-12    4
    2019-03-13    5
    2019-03-14    6
    Freq: D, dtype: int64
    
    
    *****************************************************************
    
    
    s.loc[larger_than_3] = s.loc[larger_than_3] + 1
    s
    *****************************************************************
    2019-03-09    1
    2019-03-10    2
    2019-03-11    3
    2019-03-12    5
    2019-03-13    6
    2019-03-14    7
    Freq: D, dtype: int64
    *****************************************************************
    s1 = pd.Series([1,2,3,4,5,6], index=date)
    s2 = pd.Series([5,6,7,8,9,10], index=date)
    s3 = pd.Series([11,12,5,7,8,2], index=date)
    
    dictionary = {
        'C1':s1,
        'C2':s2,
        'C3':s3,
    }
    df = pd.DataFrame(dictionary)
    df
    *****************************************************************
    
    
    *****************************************************************
    
    
    %matplotlib inline
    df.plot()
    *****************************************************************
    
    
    
    
    *****************************************************************
    df.loc['2019-03-11']
    *****************************************************************
    C1    3
    C2    7
    C3    5
    
    *****************************************************************
    df.iloc[1]
    *****************************************************************
    C1     2
    C2     6
    C3    12
    
    *****************************************************************
    print(df)
    df.loc['2019-03-09':'2019-03-11', ['C1','C2']]
    *****************************************************************
                C1  C2  C3
    2019-03-09   1   5  11
    2019-03-10   2   6  12
    2019-03-11   3   7   5
    2019-03-12   4   8   7
    2019-03-13   5   9   8
    2019-03-14   6  10   2
    
    
    
    C1C2
    2019-03-0915
    2019-03-1026
    2019-03-1137
    *****************************************************************
    df.iloc[1:4, [0, 1]]
    *****************************************************************
    C1C2
    2019-03-1026
    2019-03-1137
    2019-03-1248
    *****************************************************************
    df.cumsum()
    *****************************************************************
    C1C2C3
    2019-03-091511
    2019-03-1031123
    2019-03-1161828
    2019-03-12102635
    2019-03-13153543
    2019-03-14214545
    *****************************************************************
    df.cumprod()
    *****************************************************************
    C1C2C3
    2019-03-091511
    2019-03-10230132
    2019-03-116210660
    2019-03-122416804620
    2019-03-131201512036960
    2019-03-1472015120073920
    *****************************************************************
    df.rolling(2).mean()
    *****************************************************************
    C1C2C3
    2019-03-09NaNNaNNaN
    2019-03-101.55.511.5
    2019-03-112.56.58.5
    2019-03-123.57.56.0
    2019-03-134.58.57.5
    2019-03-145.59.55.0
    
    
    *****************************************************************
    print(df)
    df.cumsum(axis=1)
    *****************************************************************
          C1  C2  C3
    2019-03-09   1   5  11
    2019-03-10   2   6  12
    2019-03-11   3   7   5
    2019-03-12   4   8   7
    2019-03-13   5   9   8
    2019-03-14   6  10   2

    C1C2C3
    2019-03-091617
    2019-03-102820
    2019-03-1131015
    2019-03-1241219
    2019-03-1351422
    2019-03-1461618
    
    
    **********************************************************************
    import pandas as pd 
    
    
    
    def create_list (start_value,end_value):
        listz = []
        for v in range(end_value):
            if v+1 >= start_value:
                listz.append(v+1);
        return listz
    
    list1 = create_list(1,10)
    list2 = create_list(11,20)
    list3 = create_list(21,30)
    
    list1_series = pd.Series(list1,range(1,11,1))
    list2_series = pd.Series(list2,range(1,11,1))
    list3_series = pd.Series(list3,range(1,11,1))
    
    
    
    dictionary = {
        'C1':list1_series,
        'C2':list2_series,
        'C3':list3_series
    }
    
    df = pd.DataFrame(dictionary)
    
    print(df)
    print("************************")
    print(df.loc[2])
    print("************************")
    print(df.iloc[0])
    print("************************")
    print(df.loc[2,["C2"]])
    print("************************")
    print(df.iloc[1,1])
    print("************************")
    print(df)
    print("************************")
    print(df.cumsum())
    print("************************")
    print(df.cumsum(axis = 1))
    print("************************")
    df_larger_than_3 = df > 3 
    print(df_larger_than_3)
    print("************************")
    
    
    %matplotlib inline
    df.plot()
    *******************************************************************
    
    
    
    
        C1  C2  C3
    1    1  11  21
    2    2  12  22
    3    3  13  23
    4    4  14  24
    5    5  15  25
    6    6  16  26
    7    7  17  27
    8    8  18  28
    9    9  19  29
    10  10  20  30
    ************************
    C1     2
    C2    12
    C3    22
    Name: 2, dtype: int64
    ************************
    C1     1
    C2    11
    C3    21
    Name: 1, dtype: int64
    ************************
    C2    12
    Name: 2, dtype: int64
    ************************
    12
    ************************
        C1  C2  C3
    1    1  11  21
    2    2  12  22
    3    3  13  23
    4    4  14  24
    5    5  15  25
    6    6  16  26
    7    7  17  27
    8    8  18  28
    9    9  19  29
    10  10  20  30
    ************************
        C1   C2   C3
    1    1   11   21
    2    3   23   43
    3    6   36   66
    4   10   50   90
    5   15   65  115
    6   21   81  141
    7   28   98  168
    8   36  116  196
    9   45  135  225
    10  55  155  255
    ************************
        C1  C2  C3
    1    1  12  33
    2    2  14  36
    3    3  16  39
    4    4  18  42
    5    5  20  45
    6    6  22  48
    7    7  24  51
    8    8  26  54
    9    9  28  57
    10  10  30  60
    ************************
           C1    C2    C3
    1   False  True  True
    2   False  True  True
    3   False  True  True
    4    True  True  True
    5    True  True  True
    6    True  True  True
    7    True  True  True
    8    True  True  True
    9    True  True  True
    10   True  True  True
    ************************
    
    
    
    
    

    python #string字串 #float浮點數 #function函數 #datetime時間日期 #if elif else #for loop 迴圈#while #dictionary 字典 快速查閱

    String字串


    將float轉成字串
    str(float)

    將字串內的文字_ 改變成you
    test_string.replace('_','you')


    Float

    只留float小數點後兩位
    format(float value,'.2f')


    簡單計算功能

    總和
    sum(salary)
    陣列長度
    len(salary)


    基本運算

    a**b 等於     a的b次方


    定義函數

    def substract(a,b):
        z = a - b 
        return z 

    answer = substract(99999,2343)
    print(answer)


    顯示時間,現在幾點


    import datetime


    datetime.datetime.now()


    if , elif , else  用法


    def check_profit_or_not(buy_price,sell_price):
        if buy_price > sell_price :
            print("虧錢")
        elif sell_price > buy_price :
            print("賺錢")
        else :
            print("沒賺沒賠")

            

    check_profit_or_not(10000,12000)


    FOR loop迴圈


    for v in range(90):
        print(v)

    print("*********************************************")
    for x in [10,20,30,40]:
        print(x)
    print("*********************************************")
    for x in 學生名單:
        print(x)
    print("*********************************************")
    for x in 學生名單.values():
        print(x)
    print("*********************************************")


    print(學生名單.get("小白"))



    While迴圈



    i = 0
    while i < 5 :
        print(i)

        i+=1

    python 學習資源

    2019年3月8日 星期五

    python 學習筆記 - 環境架設

    Python的函式庫
    畫圖表:Matplotlib
    架網站:Django
    機器學習:TensorFlow


    函式庫Package常常更新,版本常不相容
    解決辦法:一個專案用同一個環境Enviroment

    Anaconda整合了:
    Package安裝
    Enviroment建設
    寫程式的環境

    可以來這邊安裝anaconda
    https://anaconda.org/