Life has no boundaries: 2009

Wednesday, December 30, 2009

Traveling gives you more than you expect

I had the chance to visit Chiang Khan before everything there turns into another Pai. Here is what I brought back.

To enjoy the trip to Chiang Khan, take the Lopburi-Phetchabun-Loei-Chiang Khan route. You will find out how much fun driving along the mountain ridges can be.
A good tent must stand up to heavy dew and be big enough for 2 people to sleep in without touching the tent fabric when they move; otherwise you get wet.
A good sleeping bag should be lined with fabric, not plastic.
If the night is very cold, park the car you came in right next to the tent. If the cold becomes unbearable, go and sleep in the car (a tip from the locals).
The Isan people of Loei are truly kind. They even gave us dinner; they call it "ma sum kan" (มาซุมกัน).
If you have to choose between a factory-made blanket, produced by a process that imitates handwork, and a real handmade one, buy the handmade one. It has more charm and keeps you warmer, because you will think of the seller and the maker so much that you won't dare to haggle.
If you are trying to be good at everything, change your thinking: being "extremely good at one thing" is better, because there are only a few such people in the world (and people will come looking for you). People who are good at many things are already plentiful. It is like the blankets sold at the roadside: the shop cannot tell you how they were made or how much care went into them. But ask the grandmother at the Niyom Thai shop and you are nearly moved to tears. That is why the blankets here feel warm all the time, summer or winter.
Sathu!!! May the Chiang Khan of the future not become what Pai is today, so that Thailand gains one more place worth visiting.
Finally... I miss that place.

Using PostgreSQL for different workloads: OLTP, DW, WEB

Last time I pasted some material here (PostgreSQL and interesting configuration values for WEB, OLTP, DW) and planned to explain it in place, but on reflection it is better to explain it in a separate post.

When PostgreSQL is installed from a prebuilt package, the package naturally ships with settings meant for general-purpose applications. In practice, once the database is put to use, we have to tune those settings to fit the workload at hand.

For now, let us divide the workloads into 3 types:

WEB - Web Transactional
OLTP - Online Transactional Processing
DW - Data Warehouse

These 3 types of work differ, so the database settings differ too. Think of driving to different kinds of places: which car would you choose for each trip?

It is the same with databases. Many people today use a single database configuration for every job, perhaps for financial reasons, but after running it for a while they come to understand why the settings should differ.

First we need to understand the 3 types of work.

"WEB": most requests are for display. The database is queried, but the system is mostly designed to serve data and rarely performs transactions.

"OLTP": data is read to support day-to-day operations, and transaction data is written back to the database. There are daily reports on the transactions.

"DW" (data warehouse): a distinct workload whose purpose is to analyze stored data for decision making, and sometimes to send the results back to OLTP to shape how operations run. Most reports are designed as decision-support information.

Now that we know the workloads, let us start configuring. This part focuses on the database only; the application side depends on the type of work.

CPU
In my experience so far, 4-8 CPU cores are enough for database work, and each core is already quite fast.

MEMORY
The simple rule: if you want speed, install a lot. What I commonly see today is 16GB.

HARDDISK
"many spindles are better" means that when you build a RAID, you should use many hard disks in one volume. This has a big effect on database performance, because for most workloads memory alone is not enough to load the data and process it.

NETWORK
Many people buy servers with lots of network interfaces but use only one of them (because they have never faced heavy traffic the way I have).

Now let us look at tuning the database itself.

max_connections is the number of connections that can be in use at peak load. Each connection uses shared_buffers memory as well as additional non-shared memory, so a careless setting will end in "out of memory" errors.

  max_connections = 200  # small server
# max_connections = 700  # web application database
# max_connections = 40   # data warehousing database

shared_buffers is the amount of memory PostgreSQL dedicates to its own processing before it has to go to disk; roughly, it determines how many active operations can be held in memory.

# shared_buffers = ( Available RAM / 4 )
# shared_buffers = 512MB   # basic 2GB web server
# shared_buffers = 8GB     # 64-bit server with 32GB RAM

work_mem is the amount of memory each query operation (such as a sort or a hash) may use. If your queries are complex, this is the value to increase.

# Most web applications should use the formula below, because their 
# queries often require no work_mem. 
# work_mem = ( AvRAM / max_connections ) ROUND DOWN to 2^x
# work_mem = 2MB  # for 2GB server with 700 connections  

# Formula for most BI/DW applications, or others running many complex
# queries:
# work_mem = ( AvRAM / ( 2 * max_connections ) ) ROUND DOWN to 2^x
# work_mem = 128MB   # DW server with 32GB RAM and 40 connections 
Personally, I had to go through many rounds of trial before this value settled. The diagrams in the reference below may make it easier to understand.
ref: http://momjian.us/main/presentations.html
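
To make the formulas above concrete, here is a minimal Python sketch that computes shared_buffers and work_mem from the available RAM. The function names are my own, not part of PostgreSQL, and the results are only starting points to be tested under real load.

```python
GB = 1024 ** 3
MB = 1024 ** 2


def round_down_pow2(n: int) -> int:
    """Return the largest power of two that is <= n (n must be >= 1)."""
    return 1 << (n.bit_length() - 1)


def shared_buffers(av_ram: int) -> int:
    """shared_buffers = AvRAM / 4, in bytes."""
    return av_ram // 4


def work_mem(av_ram: int, max_connections: int, complex_queries: bool = False) -> int:
    """work_mem per the two formulas above, rounded down to 2^x, in bytes.

    complex_queries=True uses the BI/DW variant (divide by 2 * max_connections).
    """
    divisor = max_connections * (2 if complex_queries else 1)
    return round_down_pow2(av_ram // divisor)


# Basic 2GB web server with 700 connections
print(shared_buffers(2 * GB) // MB, "MB")   # 512 MB
print(work_mem(2 * GB, 700) // MB, "MB")    # 2 MB

# DW server with 32GB RAM and 40 connections
print(shared_buffers(32 * GB) // GB, "GB")  # 8 GB
print(work_mem(32 * GB, 40, complex_queries=True) // MB, "MB")  # 256 MB
```

Note that applying the DW formula literally to 32GB and 40 connections gives 256MB, not the 128MB shown in the example line.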

Wednesday, December 16, 2009

How commercial open source makes money

Today I had to sign up to read a feed, and I came across this: "Canonical: take my money". It made me understand what open source users ultimately want.

You claim that there is little money in the desktop software business and more in services. Well here is something I would pay money for:
Take the top selling business laptops from Dell, Acer, HP, Lenovo and offer customized distributions for them.
I would pay for that in an instant. All too often people confuse open source with free of charge. I’m perfectly capable of making that distinction. In fact, I use my machines for my work and don’t want to spend days configuring all the devices on them. As such, I would pay something like 50 USD for a customized (K)Ubuntu or perhaps 150-200 USD if it came with some sort of (e-mail) support contract for a year.
I don’t use Linux / Ubuntu because it costs less, I use it because I prefer it over Windows to do my job. I would pay that kind of money because I would save time and money in the long run.
Until the major hardware vendors offer decent (worldwide) support for Linux on their machines (out of the box that is), I think this is an idea with potential and I hope at least someone picks it up. Go ahead, let me spend money on it!

Services count as the main source of revenue, but what customers want next is to save time.

Life has no boundaries.
Wattana

How to delete log files older than 90 days, every day

So this is why GNU is so cool

Reference: http://www.faqs.org/qa/qa-10326.html

I currently have CRON set up to collect log files, creating a new folder whenever a new day starts. After the system had been running for a while, I found that the volume of log files just kept growing.

How do I keep only the last 90 days (they figure that is enough to be safe)? The thought flashed into my mind: "just have it delete the log files that are more than 90 days old."

That is a job for CRON: "how can I delete the log files that pass the 90-day mark, one day at a time?" Reading the documentation, I found that date in UNIX helps a lot. The amazing part is that GNU date can even do this:

$ date -d "90 day ago"
พฤ. 17 ก.ย. 2552 05:19:56 ICT

See, it really can.
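
The same cutoff can be computed in Python, which is the language I reach for most. This is a minimal sketch, not the script I actually run: it assumes the daily folders are named like 2009-09-17 (the real naming is not shown here), and the function names are made up for illustration.

```python
import shutil
from datetime import date, datetime, timedelta
from pathlib import Path


def expired_folders(names, today, keep_days=90, fmt="%Y-%m-%d"):
    """Return the folder names dated more than keep_days before today."""
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in names:
        try:
            folder_date = datetime.strptime(name, fmt).date()
        except ValueError:
            continue  # not a dated log folder; leave it alone
        if folder_date < cutoff:
            expired.append(name)
    return expired


def purge(log_root, today=None, keep_days=90):
    """Delete expired dated folders under log_root and return their names."""
    root = Path(log_root)
    today = today or date.today()
    names = sorted(p.name for p in root.iterdir() if p.is_dir())
    doomed = expired_folders(names, today, keep_days)
    for name in doomed:
        shutil.rmtree(root / name)
    return doomed


# Same arithmetic as `date -d "90 day ago"` run on 16 December 2009
print(date(2009, 12, 16) - timedelta(days=90))  # 2009-09-17
```

Called once a day from CRON, purge() removes only the folders that have just crossed the 90-day line, and anything that is not a dated folder is left untouched.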

Friday, November 27, 2009

Experience using PostgreSQL

Lately I have had to examine how our database behaves in order to squeeze out as much performance as possible. I thought 80GB was big, but there are people running databases bigger than ours.

ref: http://osdir.com/ml/db.postgresql.sql/2002-04/msg00232.html

This is a collection of many performance tips that we've gathered together at Affymetrix, and I thought it would be useful to post them to the PostgreSQL news group.

The single most helpful trick has been the "Partial index trick" at the bottom and the use of temp tables. Most of these tricks came from either this news group, or from my colleagues in the bioinformatics department, so I'd like to thank and acknowledge both groups.

I'd like to thank Tom Lane, who clearly has been working very hard on the Optimizer, and all the other people who have worked on Postgres. Your efforts have been invaluable to us. Keep up the good work!

We are currently working on a Transcriptome project, which is a follow-on to the human genome project, in which we systematically look across all parts of the genome to see what is expressed in the form of RNA. It is publicly funded by the National Cancer Institute and the data is made publicly available at: http://www.netaffx.com/transcriptome/

We currently have about 100GB of data and will soon grow to a multi-terabyte system. We have tables of up to 1 billion rows and have been able to get ~1 million row queries to run in about 5 min. We've been very pleased with postgres. After a couple of major outages in our server room, it came back up flawlessly each time. So it has been an invaluable asset for this project. We run 7.2 on Red Hat on a 2-processor machine with SAN, and we have a 128-node linux cluster which will make analysis runs against the database.

Our main request is continued enhancement of the optimizer for these heavy types of queries. Improved use of indexes, ability to control execution plans explicitly, ability to use indexes for data retrieval without touching the table in certain cases, and other such features would be very useful. I'm also curious to hear about whether there is any good clustering system for making a parallel postgres installation, and if others have experience with creating such large databases.

We've been very happy and impressed with the constant improvements to the system. Thank You!

This page is a long detailed list of performance tips for doing heavy duty queries.

Indexes 1. Indexes are critical. Create exactly the combined (multi-field) indexes that are being joined in a particular join. The order of fields in the index and in the join must match exactly.
Indexes 2. Multi-Field Indexes. Having indexes on individual columns as well as combinations of 2, 3, and 4 columns can help. Sometimes it uses the 3-column index, and sometimes it uses one 2-column and one singlet index. This can be helpful, especially when seq scan is turned off and you are using limit.
Indexes 3. Remember that multiple-field indexes must have the fields in the correct order as they are accessed in the query. An index can only be used to the extent allowed by the keys. An index over (A B C) can be used to find (A B), but not (B C).
Vacuum. Always vacuum analyze the table(s) after creating indices (or loading/deleting data).
Limit and Order by. May have to use order by and/or limit to use the indexes. May need to use order by with limit. Sometimes order by increases speed by causing use of an index. Sometimes it decreases speed because a Sort step is required. A where condition that is sufficiently restrictive may also cause an index to be used.
Join Order. Order of fields, joins, and order by fields has a big impact.
Casting 1. May have to explicitly cast things. For instance where x=3 must become (where x=cast(3 as smallint)). This can make a huge difference.
Casting 2. Simply adding abs(destype)=cast(111 as smallint) to my query and turning seq scans off seems to change the query execution plan. Writing this as (destype=111 or destype=-111) makes the cost over 7 times higher!!
Seq Scans 1. Can you disable seq scans? Yes, you can type "set enable_seqscan=no;" at the psql prompt and disable it. Do not be surprised if this does not work though. You can also disable merges, joins, nested loops, and sorts. Try this and attempt to enable the correct combination that you want it to use.
Seq Scans 2. In general you would like it to use an index, but don't be afraid to try the seq scans if cost is say < 150,000 and see if it finishes in a few minutes. For large joins with no where clause, Postgres always uses seq scans. Try to add a where clause, even a non-restrictive one, and use an index. However, remember that postgres must go get the table data too, so this can be more costly. Postgres cannot read data solely from an index (some commercial databases can).
Seq Scans 3. Sometimes it is true that seq scans are faster. It tries to use the analyzed statistics to decide which is better. But don't always trust it, try it both ways. This is why analyzing your table will produce different execution plans after analysis -- the analysis step will update the stats of the table. The change in estimated costs might cause a different plan to be chosen.
Explain Output. Reading the Explain output can be confusing. In general, the numbers are a range. If you are trying to just get some rows back, you'd like the left most number to be 0. This means that the right-most number will probably not happen, because you will not really have to search the entire table. The right-most number is an upper bound. The numbers sum as you go up. What you don't want is a large number for both the min and max. Sometimes a cost of about 100,000 takes about 3 minutes. Sometimes this is not accurate. Sometimes I was able to see a lower seq scan cost, but when I disabled seq scans and used indexes, the actual performance was faster. In general the cost is in milliseconds. Use Explain Analyze which will run through the query and produce actual times.
SQL tricks. Remember the standard SQL tricks which I will not cover here (get a good thick SQL book). For example using Like, etc. can be slow. Remember that if there is no data in your table for a given where clause, it must scan the entire result just to tell you "no results found" so know your data in advance.
Nested loops are probably the most expensive operation.
Having several merges and sorts can be way better than having a single nestloop in your query.
Explicit Joins. For more than 2 joined tables, consider using explicit joins (see:http://www.ca.postgresql.org/users-lounge/docs/7.1/postgres/explicit-joins.html)
News Groups. Try the postgres news groups: http://www.us.postgresql.org/users-lounge/index.html
Hardware/Configuration changes. I won't go into a lot of detail here as this page is more about the query optimizer, but you can look at how much your CPU and memory is being taxed, and try running postmaster with various flags to increase speed and memory. However, if your query plan is not coming out right this will have little impact.
Identities. You can try typing "and a.id=a.id" and this will actually help encourage the query planner to use an index. In one example, select with x=x and y=y order by x worked best (order by y too made it worse!).
Temp tables. You may want to explicitly control the query by breaking it into several steps, with intermediate tables being created along the way. You can make these true temp tables, which will go away when you log out, or you may want to keep them around. You might want to create a procedure or script that automates/hides this process.
Views. Views sometimes say that they are adding a step to the query planner, but it does not seem to impact query speed. But if you add more clauses to the view this may change the query plan in a bad way, which is confusing to the user.
Stored Procedures. Try writing a stored procedure to more explicitly control the query execution. If you do this break out SQL into many small cursors instead of 1 large cursor, otherwise you will run up against the same problems.
External programs. As above, breaking out a query into a series of small, explicit nested loops in a C, Perl, or other client program, may actually improve performance (especially if you want a subset of results/tables).
Monitor Query Progress. Alan Williams provided a good trick to monitor the progress of a long running query. If you add to the query a sequence (select nextval('sq_test'),...) then you can use select currval('sq_test') to see how far the query has progressed.
Partial Indices. You can use this feature to force use of an index!!! (it is also useful as a true partial index). Assume table1 below has no rows where field1=0. By doing the actions below, it stores the clause field1<>0 in pg_index and when it sees that predicate, it always uses the partial index. In this case we are using it as a full index to trick it. Example:

create index i on table1(field1) where field1 <> 0;

select * from table1 where field1<>0;

Shane Brubaker

BioInformatics Engineer

Affymetrix, Inc.

PostgreSQL and interesting configuration values for WEB, OLTP, DW

Well, that doesn't help unless we either provide a .conf generation tool (something I favor) or docs somewhere which explain which are the variables to be the most concerned with instead of making users read through all 218 of them.

Attached is the postgresql.conf.simple I used in my presentation. It has an egregious math error in it (see if you can find it) but should give you the general idea.

--Josh

# ----------------------------------------
# Simple PostgreSQL Configuration File
# ----------------------------------------

# This file provides a simple configuration with the most common options
# which most users need to modify for running PostgreSQL in production, 
# including extensive notes on how to set each of these.  If your configuration
# needs are more specific, then use the standard postgresql.conf, or add 
# additional configuration options to the bottom of this file.
#
# This file is re-read when you send a SIGHUP to the server, or on a full
# restart.  Note that on a SIGHUP simply recommenting the settings is not
# enough to reset to default value; the last explicit value you set will
# still be in effect.
#
# AvRAM:  Several of the formulas below ask for "AvRAM", which is short for
# "Available RAM".  This refers to the amount of memory which is available for
# running PostgreSQL.  On a dedicated PostgreSQL server, you can use the total
# system RAM, but on shared servers you need to estimate what portion of RAM
# is usually available for PostgreSQL.
#
# Each setting below lists one recommended starting setting, followed by
# several alternate settings which are commented out.  If multiple settings
# are uncommented, the *last* one will take effect.

# listen_addresses
# ------------------------
# listen_addresses takes a list of network interfaces the Postmaster will
# listen on.  The setting below, '*', listens on all interfaces, and is only
# appropriate for development servers and initial setup.  Otherwise, it 
# should be restrictively set to only specific addresses. Note that most
# PostgreSQL access control settings are in the pg_hba.conf file.

  listen_addresses = '*' # all interfaces 
# listen_addresses = 'localhost'  # unix sockets and loopback only
# listen_addresses = 'localhost,192.168.1.1' # local and one external interface

# max_connections
# ------------------------
# An integer setting a limit on the number of new connection processes which 
# PostgreSQL will create.  Should be set to the maximum number of connections 
# which you expect to need at peak load.  Note that each connection uses
# shared_buffer memory, as well as additional non-shared memory, so be careful
# not to run the system out of memory.  In general, if you need more than 1000
# connections, you should probably be making more use of connection pooling.
# 
# Note that by default 3 connections are reserved for autovacuum and 
# administration.

  max_connections = 200  # small server
# max_connections = 700  # web application database
# max_connections = 40   # data warehousing database

# shared_buffers
# ------------------------
# A memory quantity defining PostgreSQL's "dedicated" RAM, which is used
# for connection control, active operations, and more.  However, since
# PostgreSQL also needs free RAM for file system buffers, sorts and 
# maintenance operations, it is not advisable to set shared_buffers to a
# majority of RAM.  
#
# Note that increasing shared_buffers often requires you to increase some 
# system kernel parameters, most notably SHMMAX and SHMALL.  See 
# Operating System Environment: Managing Kernel Resources in the PostgreSQL
# documentation for more details.  Also note that shared_buffers over 2GB is 
# only supported on 64-bit systems.
#
# The setting below is a formula.  Calculate the resulting value, then
# uncomment it.  Values should be expressed in kB, MB or GB.

# shared_buffers = ( AvRAM / 4 )
# shared_buffers = 512MB   # basic 2GB web server
# shared_buffers = 8GB     # 64-bit server with 32GB RAM

# work_mem
# ------------------------
# This memory quantity sets the limit for the amount of non-shared RAM 
# available for each query operation, including sorts and hashes.  This limit
# acts as a primitive resource control, preventing the server from going
# into swap due to overallocation.  Note that this is non-shared RAM per
# *operation*, which means large complex queries can use multiple times
# this amount.  Also, work_mem is allocated by powers of two, so round
# to the nearest binary step.

# The setting below is a formula.  Calculate the resulting value, then
# uncomment it.  Values should be expressed in kB, MB or GB.  Maximum
# is currently 2GB.

# Most web applications should use the formula below, because their 
# queries often require no work_mem. 
# work_mem = ( AvRAM / max_connections ) ROUND DOWN to 2^x
# work_mem = 2MB  # for 2GB server with 700 connections  

# Formula for most BI/DW applications, or others running many complex
# queries:
# work_mem = ( AvRAM / ( 2 * max_connections ) ) ROUND DOWN to 2^x
# work_mem = 128MB   # DW server with 32GB RAM and 40 connections 

# maintenance_work_mem
# -------------------------
# This memory value sets the limit for the amount that autovacuum, 
# manual vacuum, bulk index build and other maintenance routines are 
# permitted to use.  Setting it to a moderately high value will increase
# the efficiency of vacuum and other operations.

# The setting below is a formula.  Calculate the resulting value, then
# uncomment it.  Values should be expressed in kB, MB or GB.
# Maximum is currently 2GB.

# Formula for most databases
# maintenance_work_mem = ( AvRAM / 8 ) ROUND DOWN to 2^x
# maintenance_work_mem = 256MB  #webserver with 2GB RAM
# maintenance_work_mem = 2GB  #DW server with 32GB RAM

# max_fsm_pages
# --------------------------
# An integer which sets the maximum number of data pages with free space 
# which the Postmaster will track.  Setting this too low can lead to 
# table bloat and need for VACUUM FULL.  Should be set to the maximum number
# of data pages you expect to be updated between vacuums. 
#
# Increasing this setting requires dedicated RAM and like shared_buffers
# may require you to increase system kernel parameters.  Additionally, the
# recommended setting below is based on the default autovacuum settings;
# if you change the autovacuum parameters, then you may need to adjust
# this setting to match.

# The setting below is a formula.  Calculate the resulting value, then
# uncomment it.  DBsize is your estimate of the maximum size of the database;
# if the database is already loaded, you can get this from pg_database_size().
# For large data warehouses, use the volume of data which changes between 
# batch loads as your "DBSize"

# For small databases ( less than 10GB )
# max_fsm_pages = ( ( DBsize / 8kB ) / 8 )
# max_fsm_pages = 100000  #6GB web database 

# For larger databases ( Many GB to a few TB )
# max_fsm_pages = ( ( DBsize / 8kB ) / 16 )
# max_fsm_pages = 800000  #100GB OLTP database
# max_fsm_pages = 4000000  #DW loading 0.5TB data daily

# synchronous_commit
# -------------------------
# This boolean setting controls whether or not all of your transactions
# are guaranteed to be written to disk when they commit.  If you are
# willing to lose up to 0.4 seconds of data in the event of an unexpected 
# shutdown (as many web applications are), then you can gain substantial
# performance benefits by turning off synchronous commit.  For most
# applications, however, this setting is better used on a per-session 
# basis.

  synchronous_commit = on   #most applications
# synchronous_commit = off  #if speed is more important than data

# wal_buffers
# -------------------------
# this memory setting defines how much buffer space is available for 
# the Write Ahead Log.  Set too low, it can become a bottleneck on 
# inserts and updates; there is no benefit to setting it high, however.
# As with some of the other settings above, may require increasing
# some kernel parameters.

wal_buffers = 8MB
 
# checkpoint_segments
# -------------------------
# This integer defines the maximum number of 8MB transaction log segments
# PostgreSQL will create before forcing a checkpoint.  For most
# high-volume OLTP databases and DW you will want to increase this
# setting significantly.  Alternately, just wait for checkpoint 
# warnings in the log before increasing this.
#
# Increasing this setting can make recovery in the event of unexpected 
# shutdown take longer.
#
# Maximum disk space required is (checkpoint_segments * 2 + 1) * 16MB, 
# so make sure you have that much available before setting it.

checkpoint_segments = 16    #normal small-medium database
# checkpoint_segments = 64  #high-volume OLTP database
# checkpoint_segments = 128 #heavy-ETL large database

# autovacuum
# ---------------------------
# autovacuum turns on a maintenance daemon which runs in the background, 
# periodically cleaning up your tables and indexes.  The only reason to turn
# autovacuum off is for large batch loads (ETL).

  autovacuum = on   #most databases
# autovacuum = off  #large DW

# effective_cache_size
# --------------------------
# This memory setting tells the PostgreSQL query planner how much RAM
# is estimated to be available for caching data, in both shared_buffers and
# in the filesystem cache. This setting just helps the planner make good
# cost estimates; it does not actually allocate the memory.

# The setting below is a formula.  Calculate the resulting value, then
# uncomment it.

# effective_cache_size = ( AvRAM * 0.75 )

# default_statistics_target
# --------------------------
# This integer setting determines the histogram sample size for the 
# data about table contents kept by the query planner.  The default
# is fine for most databases, but often users need to increase it 
# either because they're running data warehouses or because they have
# a lot of poorly planned queries.

default_statistics_target = 10
# default_statistics_target = 200  #have had some bad plans
# default_statistics_target = 400  #data warehouse

# constraint_exclusion
# --------------------------
# This boolean setting should be turned "on" if you plan to use table 
# partitioning.  Otherwise, it should be "off".

  constraint_exclusion = off #in general
# constraint_exclusion = on  #if you plan to use partitioning

# log_destination & logging settings
# --------------------------
# This ENUM value determines where PostgreSQL's logs are sent.  What
# setting to use really depends on your server room setup and the 
# production status and OS of your server.
#
# Note that there are several dozen settings on what and how often
# to log; these will not be covered in detail in this quick 
# configuration file.  Instead, several common combinations are
# given.

# Syslog setup for centralized monitoring
# log_destination = 'syslog'
# syslog_facility = 'LOCAL0'  #local syslog
# syslog_facility = 'log_server_name'  #remote syslog

# Windows
# log_destination = 'eventlog'

# Private PostgreSQL Log
# log_destination = 'stderr'
# logging_collector = on
# log_directory = '/path/to/log/dir'

# CSV logging for collecting performance statistics.
# Warning: this much logging will generate many log
# files and affect performance.
# log_destination = 'csvlog'
# logging_collector = on
# log_directory = '/path/to/log/dir'
# log_duration = on
# log_temp_files = 256kB
# log_statement = 'all'

I will come back and explain this later. Please bear with me.
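
Until then, here is a rough Python sketch of the remaining sizing formulas from the file above (maintenance_work_mem, max_fsm_pages, effective_cache_size, and the disk space needed for checkpoint_segments). The function names are my own, and the numbers are only the formulas applied literally, under the assumptions written in the file's comments.

```python
KB = 1024
MB = 1024 ** 2
GB = 1024 ** 3


def round_down_pow2(n: int) -> int:
    """Return the largest power of two that is <= n (n must be >= 1)."""
    return 1 << (n.bit_length() - 1)


def maintenance_work_mem(av_ram: int, cap: int = 2 * GB) -> int:
    """( AvRAM / 8 ) rounded down to 2^x, limited to the 2GB maximum."""
    return min(round_down_pow2(av_ram // 8), cap)


def max_fsm_pages(db_size: int, large: bool = False) -> int:
    """( DBsize / 8kB ) / 8 for small databases, / 16 for larger ones."""
    pages = db_size // (8 * KB)
    return pages // (16 if large else 8)


def effective_cache_size(av_ram: int) -> int:
    """AvRAM * 0.75, in bytes."""
    return av_ram * 3 // 4


def checkpoint_disk_space(checkpoint_segments: int, segment_size: int = 16 * MB) -> int:
    """(checkpoint_segments * 2 + 1) * 16MB, in bytes."""
    return (checkpoint_segments * 2 + 1) * segment_size


print(maintenance_work_mem(2 * GB) // MB, "MB")     # 256 MB
print(maintenance_work_mem(32 * GB) // GB, "GB")    # 2 GB (capped)
print(max_fsm_pages(6 * GB))                        # 98304
print(max_fsm_pages(100 * GB, large=True))          # 819200
print(checkpoint_disk_space(16) // MB, "MB")        # 528 MB
```

The results land close to the rounded example values in the file (100000 and 800000 for max_fsm_pages), which suggests those examples were simply rounded by hand.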

Tuesday, November 17, 2009

Don't speak Business to Me.

Maybe I was in the wrong place and the wrong situation!

I hate business talk because I'm an engineer. Business people always speak in terms of finance and tricks. That's why I don't like it.

Please give me direct words, because it's complicated!

But if it has to be that way, please tell me first!

Monday, November 16, 2009

ท่านคึกฤทธิ์ ว่าไว้

ม.ร.ว.คึกฤทธิ์ ปราโมช
หนังสือพิมพ์สยามรัฐสัปดาหวิจารณ์
18 ตุลาคม 2502

          สัปดาห์นี้มีเรื่องความเมืองใหญ่         ไทยถูกฟ้องขับไล่ขึ้นโรงศาล

          เคยเป็นเรื่องโต้เถียงกันมานาน        ที่ยอดเขาพระวิหารรู้ทั่วกัน

          กะลาครอบมานานโบราณว่า             พอแลเห็นท้องฟ้าก็หุนหัน

          คิดว่าตนนั้นใหญ่ใครไม่ทัน              ทำกำเริบเสิบสันทุกอย่างไป

          อันคนไทยนั้นสุภาพไม่หยาบหยาม    เห็นใครหย่อนอ่อนความก็ยกให้

          ถึงล่วงเกินพลาดพลั้งยังอภัย             ด้วยเห็นใจว่ายังเยาว์เบาความคิด

          เขียนบทความด่าตะบึงถึงหัวหู           ไทยก็ยังนิ่งอยู่ไม่ถือผิด

          สั่งถอนทูตเอิกเกริกเลิกเป็นมิตร         แล้วกลับติดตามต่อขอคืนดี

          ไทยก็ยอมตามใจไม่ดึงดื้อ               เพราะไทยถือเขมรผองเหมือนน้องพี่

          คิดตกลงปลงกันได้ด้วยไมตรี            ถึงคราวนี้ใจเขมรแลเห็นกัน

          หากไทยจำล้ำเลิกบ้างอ้างขอบเขต    เมืองเขมรทั้งประเทศของใครนั่น ?

          ใครเล่าตั้งวงศ์กษัตริย์ปัจจุบัน            องค์ด้วงนั้นคือใครที่ไหนมา ?

          เป็นเพียงเจ้าไม่มีศาลซมซานวิ่ง         ได้แอบอิงอำนาจไทยจึงใหญ่กล้า

          ทัพไทยช่วยปราบศัตรูกู้พารา            สถาปนาจัดระบอบให้ครอบครอง

          ได้เดชไทยไปคุ้มกะลาหัว                 จึงตั้งตัวขึ้นมาอย่างจองหอง

          เป็นข้าขัณฑสีมาฝ่าละออง               ส่งดอกไม้เงินทองตลอดมา

          ไม่เหลียวดูโภไคไอศวรรย์                ทั้งเครื่องราชกกุธภัณฑ์เป็นหนักหนา

          ฝีมือไทยแน่นักประจักษ์ตา               เพราะทรงพระกรุณาประทานไป

          มีพระคุณจุนเจือเหลือประมาณ          ถึงลูกหลานกลับเนรคุณได้

          สมกับคำโบราณท่านว่าไว้               อย่าไว้ใจเขมรเห็นจริงเอย

Friday, October 23, 2009

Took Lae Dee (ถูกและดี, "cheap and good")

My life is pretty closely tied to Foodland (www.foodland.co.th). Let's just say I don't much feel like shopping at any other supermarket.

Today was another day of relying on Foodland's Took Lae Dee (http://foodland.co.th/restaurant.htm) for a meal. The restaurant's name is plain and easy to understand. Most restaurants hide the kitchen at the back, but not at Foodland: they put it on display so you can see how the food is prepared and how hectic it gets.

I don't know where the ingredients come from, but my guess is the Foodland supermarket itself. It is nice for a restaurant to have its source so close by.

Go and try it when you have time.

Sunday, October 11, 2009

Service Level Agreement

These days a purchase usually comes with after-sales service. For example, when we buy an Apple MacBook, it comes with a 1-year warranty covering product and service. After that the customer can choose whether to keep buying after-sales service, such as AppleCare.

What factors make customers decide to buy these services?

Price of the service
Service received

The question is whether the service is worth what it costs.

Software purchases look similar. The service fee is typically 10-15% of the sale price.

What about Software as a Service? How is it priced?

Using software this way means buying a service from the very start, so no product is ever purchased. How, then, should it be priced?

We used to tell customers they had to pay 10-15%. Now we may have to ask how much the customer is willing to pay, and the provider then serves as far as that allows. But the customer needs a starting point to consider, so we split the offer into 3 simple levels, for example:

Platinum Service
Silver Service
Bronze Service

Each level offers different services, for example:

Services	Bronze Service	Silver Service	Platinum Service
Response Time	1 hr	30 min	15 min
Resolution Time	3 hr	2 hr	1 hr
Service Engineer	N/A	1	2
Business Application Engineer	N/A	N/A	1
Phone Support	2 hr/week	3 hr/week	5 hr/week
Email Support	5 emails/week	10 emails/week	Unlimited
Time of Service	Mon-Fri 8:00-17:00	Mon-Fri 8:00-17:00	All Time
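
As a rough illustration, the response and resolution targets in the table could be encoded so that a support ticket can be checked against its level. This is only a sketch: the `ServiceLevel` class, the `LEVELS` dictionary and the `meets_sla` function are hypothetical names, and only the time targets come from the table.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class ServiceLevel:
    name: str
    response_time: timedelta    # time allowed before the first response
    resolution_time: timedelta  # time allowed before the issue is resolved


# Targets copied from the table; the structure and names are illustrative.
LEVELS = {
    "bronze": ServiceLevel("Bronze Service", timedelta(hours=1), timedelta(hours=3)),
    "silver": ServiceLevel("Silver Service", timedelta(minutes=30), timedelta(hours=2)),
    "platinum": ServiceLevel("Platinum Service", timedelta(minutes=15), timedelta(hours=1)),
}


def meets_sla(level: str, responded_after: timedelta, resolved_after: timedelta) -> bool:
    """Return True when both the response and resolution targets were met."""
    target = LEVELS[level]
    return (responded_after <= target.response_time
            and resolved_after <= target.resolution_time)


# A 20-minute response is fine for Silver but too slow for Platinum.
print(meets_sla("silver", timedelta(minutes=20), timedelta(hours=2)))
print(meets_sla("platinum", timedelta(minutes=20), timedelta(minutes=50)))
```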

Finally, it comes down to a price the provider can accept.

The hard part then falls on the provider, who has to keep delivering what was agreed.

Friday, October 9, 2009

Python's Rules of Balance

I'm a Python devotee, because in Java I only ever got as far as a project-style Hello World, whereas in Python I got as far as driving IE as a robot.

Anyway, let's look at how simple Python is.

The Zen of Python

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

Monday, October 5, 2009

Decision Support Systems Beyond Finance

Examples of DSS use mostly focus on finance, or else revolve mainly around ERP systems. But information systems are now used widely, so DSS is being applied to analysis in many new areas.

Organizations that focus only on finance miss one point: "the financial system is a downstream system." That is, it is the result of other activities, so it cannot forecast the future very accurately.

DSS work still mostly stays with financial systems for these reasons:

Finance is the unit that has data to analyze
Data in other areas, such as marketing, changes quickly and is not very orderly, which makes stable patterns hard to find
The people who build the system may have little experience with other systems

Some examples of what are called upstream systems:

Analysis for sending mail or printed materials to customers interested in a product, e.g. questionnaire analysis and registration data
Analyzing the causes of car spare-part replacements in each area
Finding suitable traffic-signal timings at intersections by time of day

Filling these analysis gaps takes people who really understand how the data is used, and such people are currently in short supply.

Let's look at a workflow as it is done in practice.

The problem
When a healthcare organization has employees, how does it know when their licenses expire? There are two ways to handle this: have the staffing agency supply only employees whose licenses are still valid, or prevent employees with expired licenses from doing risky work until they renew.
Business goal
Increase patient safety by reducing the number of employees with expired licenses, and reduce the agency cost of covering employees whose licenses are about to expire.
Current process
Each manager checks the license expiry dates of the employees in their own area.
Solution
Schedule reports from the central database for each department, listing employees and their license expiry dates, and send them to the people concerned.
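
The scheduled report described in this workflow might look something like the sketch below. The record layout (`name`, `department`, `license_expires`), the 30-day warning window and the function name are all assumptions made for illustration. A real system would read from the central database instead of a list.

```python
from datetime import date, timedelta


def expiring_licenses(employees, today, within_days=30):
    """Group employees whose license has expired or expires soon, by department."""
    cutoff = today + timedelta(days=within_days)
    report = {}
    for emp in employees:
        if emp["license_expires"] <= cutoff:
            report.setdefault(emp["department"], []).append(emp)
    # Most urgent first within each department.
    for rows in report.values():
        rows.sort(key=lambda e: e["license_expires"])
    return report


# Hypothetical sample data.
employees = [
    {"name": "A", "department": "ICU", "license_expires": date(2009, 10, 20)},
    {"name": "B", "department": "ICU", "license_expires": date(2010, 6, 1)},
    {"name": "C", "department": "ER", "license_expires": date(2009, 9, 30)},
]
report = expiring_licenses(employees, today=date(2009, 10, 5))
for department, rows in sorted(report.items()):
    for emp in rows:
        print(department, emp["name"], emp["license_expires"].isoformat())
```

Each department's list can then be sent to the manager concerned on whatever schedule suits the organization.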

Friday, October 2, 2009

Simple Cube Store

Now that we have the dimensions, let's see how to build a cube.

Looking at a cube means looking at the fact table, which has a foreign key to each dimension we built.

One thing that looks a little new is the "Store Type" dimension, which is a degenerate dimension. Store Type has only a few members, so creating a separate table for that dimension would only make queries do more work.

Joining Tables in a Dimension

Normally a dimension has just one table, which is called a star schema. Sometimes a dimension joins more than one table. That is called a "snowflake schema". The dimension is written like this (it looks much like an ordinary table join).

Once the tables are joined, each Level can refer to members of either table.

Time Dimension

A time dimension builds a hierarchy of time, usually in this form:

Year

Quarter

Month

Week

Day in month

Which levels you use depends on what you want to analyze.
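
To make the hierarchy concrete, here is a small sketch that derives those levels for each calendar day, the way a time dimension table is often populated. The column names are illustrative, and the ISO week number is an assumption, since other week-numbering rules exist.

```python
from datetime import date, timedelta


def time_dimension_row(d):
    """Derive the hierarchy levels for one calendar date."""
    return {
        "date": d.isoformat(),
        "year": d.year,
        "quarter": f"Q{(d.month - 1) // 3 + 1}",
        "month": d.month,
        "week": d.isocalendar()[1],  # ISO week number (an assumption)
        "day_in_month": d.day,
    }


def build_time_dimension(start, end):
    """Return one row per day from start to end, inclusive."""
    days = (end - start).days
    return [time_dimension_row(start + timedelta(days=i)) for i in range(days + 1)]


rows = build_time_dimension(date(2009, 10, 1), date(2009, 10, 3))
for row in rows:
    print(row)
```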

The reason there are two hierarchies is that, in the database, the data lives in the same table, and we can sometimes use it for different purposes.

Let's Learn FoodMart

Now that Pentaho is installed, it's time to get to know it piece by piece. Today's first part is "Understanding Mondrian through FoodMart".

FoodMart is the sample data that ships with Mondrian. For how to install FoodMart, see the earlier posts.

The first dimension we'll discuss is Store. Its structure is as follows:

What I like about XML is that the structure explains its own meaning, so I don't have to spend time explaining it in the code.

From this we get a Store dimension that describes our stores' characteristics completely. What I like most are the size attributes in this dimension:

Store Sqft = total floor area of the store, in square feet
Grocery Sqft = floor area available for selling goods, in square feet
Frozen Sqft = frozen-food area
Meat Sqft = meat area
Has coffee bar = whether the store has a coffee bar (how did they think of that?)

As you can see, this dimension works well for analyzing store types.

So what are Properties for?
A Property lets you reach a member's attributes by the property's name, as in this example:

Saturday, September 26, 2009

Let's Learn Japanese

When I work at home I usually have the TV or radio on, because the work I do at home doesn't need much concentration. Today I flipped through the channels, found nothing worth watching, and ended up on nhkworld from www.nhk.co.jp.

It's a news channel, but the programs between news bulletins are very watchable. Hmm... I haven't seen any ads yet, though. I wonder whether it's like our Thai PBS.

Today's program was about energy use in Japan. Japanese technology is very modern, and it is also adapted to the culture. Here is what I took from today's program:

Japan uses heat from magma to generate electricity. Back home, we would have to use natural materials from farming to build houses
Houses that need no air conditioning: just redirect the wind and mix in a few principles from nature
Ways to reduce heat in homes and buildings in Tokyo by planting, e.g. growing (edible) plants on high-rise rooftops, growing vegetables as a heat curtain at schools, and planting trees to shade houses with glass windows on the ground floor
Thatching roofs with grass, and the cycle for re-thatching

All of it still preserves Japanese identity well. Even the house that reduces heat and lights itself is still Japanese in style. Back home, many people would probably call that old-fashioned.

I've written a lot, but "Let's learn Japanese" today really refers to this. Now this is what I call a real use of technology.

Here it is: http://www.nhk.or.jp/lesson/thai/

Thursday, September 24, 2009

Comparing Columns in Analysis Service

When we have data for several time periods and want to compare them and show the difference, the simplest way is to write the MDX ourselves.

Like this:

with member [Measures].[Weight] as '([Measures].[Sales] / ([Order Status].Parent, [Measures].[Sales]))', FORMAT_STRING = "#,###.#%"
 member [Measures].[Val] as '[Measures].[Sales]', FORMAT_STRING = IIf(([Measures].[Sales] > ([Time].PrevMember, [Measures].[Sales])), "|#|style=green", "|#|style=red")
 member [Measures].[Qtd] as '[Measures].[Quantity]', FORMAT_STRING = IIf(([Measures].[Quantity] > ([Time].PrevMember, [Measures].[Quantity])), "|#|style=green", "|#|style=red")
select NON EMPTY (Union({[Time].[All Years]}, [Time].Children) * {[Measures].[Qtd], [Measures].[Val], [Measures].[Weight]}) ON COLUMNS,
 NON EMPTY Order([Order Status].Children, [Measures].Value, DESC) ON ROWS
from [SteelWheelsSales]
where ([Product].[All Products], [Markets].[All Markets])

I won't explain it. The code speaks for itself.
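
For readers new to MDX, the two ideas in the query can be restated in plain Python: `[Weight]` is each member's sales divided by its parent's total, and `[Val]`/`[Qtd]` are styled green when the value beats the previous time member and red otherwise. The function names and sample numbers below are made up for illustration. How the first period is styled is a choice made here, and the MDX engine's handling of a missing `PrevMember` may differ.

```python
def share_of_total(sales_by_status):
    """Weight of each member: its sales divided by the parent (all-status) total."""
    total = sum(sales_by_status.values())
    return {status: value / total for status, value in sales_by_status.items()}


def trend_styles(values_by_period):
    """Label each period green if it beat the previous period, otherwise red."""
    styles = {}
    previous = None
    for period, value in values_by_period:
        # The first period has no previous member; it is treated as red here.
        beat_previous = previous is not None and value > previous
        styles[period] = "green" if beat_previous else "red"
        previous = value
    return styles


# Hypothetical figures, not taken from SteelWheelsSales.
weights = share_of_total({"Shipped": 80.0, "Cancelled": 20.0})
styles = trend_styles([("2003", 100), ("2004", 150), ("2005", 120)])
print(weights)
print(styles)
```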

Wednesday, September 23, 2009

Kitchenware in Data Integration

If you have used Pentaho Data Integration, you'll know it consists of three main parts:

1. Spoon
2. Kitchen
3. Pan

Data Integration itself is called Kettle, so it's all kitchen-themed. But many people, once they try it, run into three terms:

1. Developing
2. Transformation
3. Job

In ##pentaho someone explained the relationship:

Spoon is GUI for developing. Kitchen runs jobs non-interactively, pan runs transformations non-interactively

A Problem Saving an Analysis on Pentaho

The other day I drilled down to data containing Thai text and wanted to save the view.
After saving, the saved analysis could no longer be opened.

In the end I had to turn to irc.freenode.net ##pentaho

And a helper from Portugal said:

rGoncalves: I've had a problem with protuguese chars and solved it starting the JVM adding -Dfile.encoding=UTF-8 to the catalina start up options. Maybe you can find an encoding that works for you

Then we restart the JVM.

It worked....

The nice part is that there were people to help.

export CATALINA_OPTS="-Dfile.encoding=UTF-8 -Xms256m -Xmx768m -XX:MaxPermSize=256m -Dsun.rmi.dgc.client.gcInterval=3600000 -Dsun.rmi.dgc.server.gcInterval=3600000"
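
Why a file-encoding setting matters can be shown with a tiny experiment. This is only an illustration in Python, not what the JVM does internally, and the post does not say what the JVM's default encoding actually was. Latin-1 is used here purely as an example of a default that cannot hold Thai text.

```python
text = "ยอดขาย"  # Thai for "sales", standing in for a member name in a saved view

# UTF-8 can represent Thai characters, so the text survives a round trip.
round_trip = text.encode("utf-8").decode("utf-8")

# A Latin-1 default cannot represent them at all.
try:
    text.encode("iso-8859-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

print(round_trip == text, latin1_ok)
```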

Sunday, September 20, 2009

Using Combination lookup/update to Build a Cube

I sat waiting for a cube to build from 540,000 records. Even after de-duplication, the numbers of products and customers were large, especially in the Customer dimension table. On the first run, building the dimensions took more than 2 hours.

But after reordering the dimension-building steps, it ran much faster.

In summary:

1. Always build the largest dimensions first, e.g. Customer, Product
2. Create the primary keys needed for aggregation up front, so you don't lose time later

Update

As for building dimensions over large volumes of data: from what I've tried, I recommend building them yourself rather than with the tool.
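
A hand-built dimension load can be as simple as "look the natural key up, insert it if missing, remember the surrogate id". The sketch below shows that idea with an in-memory SQLite database standing in for the warehouse. It is not how Kettle's Combination lookup/update step is implemented, and the table and column names are made up.

```python
import sqlite3


def build_dimension(conn, table, natural_keys):
    """Insert unseen natural keys into a dimension table; return key -> surrogate id."""
    # The table name is trusted input here; never build SQL this way from user data.
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {table} ("
        "id INTEGER PRIMARY KEY, natural_key TEXT UNIQUE)"
    )
    # Load existing members once, so each source row costs a dict lookup, not a query.
    cache = dict(conn.execute(f"SELECT natural_key, id FROM {table}"))
    for key in natural_keys:
        if key not in cache:
            cur = conn.execute(f"INSERT INTO {table} (natural_key) VALUES (?)", (key,))
            cache[key] = cur.lastrowid
    conn.commit()
    return cache


conn = sqlite3.connect(":memory:")
source_rows = [("C001", "P1"), ("C002", "P1"), ("C001", "P2")]  # (customer, product)
customer_ids = build_dimension(conn, "dim_customer", [c for c, _ in source_rows])
product_ids = build_dimension(conn, "dim_product", [p for _, p in source_rows])

# Fact rows carry only the surrogate keys.
fact_rows = [(customer_ids[c], product_ids[p]) for c, p in source_rows]
print(fact_rows)
```

Keeping the whole key map in memory is what makes this fast for a few hundred thousand rows. Much larger dimensions would need a different strategy.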

Friday, September 18, 2009

When the Mac Mighty Mouse Won't Scroll Properly

Today I got annoyed with the scroll wheel on my Mighty Mouse. Many websites say to take the mouse apart, but as far as I can see this mouse has nowhere to pry it open.

Finally I read that someone had called Apple Support, and they said to do this:

1. Turn the mouse upside down
2. Dampen a cloth with warm water or alcohol
3. Rub the wheel with it for about 30 seconds. You'll notice dirt coming off onto the cloth
4. Then rub the wheel with a dry cloth for about 30 seconds, until you're sure the wheel is dry
5. Try it out

Mine works again now.

Monday, September 14, 2009

How to Load the FoodMart Database into PostgreSQL

Read the installation manual first (http://mondrian.pentaho.org/documentation/installation.php#PostgreSQL_on_Windows_example), then make a few small changes to load FoodMart into PostgreSQL.

The easy way to import the FoodMart data is to use the Pentaho BI server installation.

1. Create the database.

server# createdb -U postgres -E UTF8 foodmart

2. Locate the biserver-ce folder.

3. Go to the library folder.

server# cd biserver-ce/tomcat/webapps/pentaho/WEB-INF/

4. Locate the data file FoodMartCreateData.sql (in my case /mondrian/demo/FoodMartCreateData.sql) and run the loader.


server:biserver-ce/tomcat/webapps/pentaho/WEB-INF/$ java -cp \
 "mondrian-3.1.1.jar:lib/log4j-1.2.8.jar:lib/eigenbase-resgen-1.3.0.11873.jar:lib/eigenbase-properties-1.1.0.10924.jar:lib/eigenbase-xom-1.3.0.11999.jar:lib/commons-logging-1.1.jar:lib/postgresql-8.2-504.jdbc3.jar"  \
  mondrian.test.loader.MondrianFoodMartLoader \
  -verbose -tables -data -indexes \
  -jdbcDrivers="org.postgresql.Driver,sun.jdbc.odbc.JdbcOdbcDriver" \
  -inputFile=/mondrian/demo/FoodMartCreateData.sql \
  -outputJdbcURL="jdbc:postgresql://localhost/foodmart" \
  -outputJdbcUser=postgres \
  -outputJdbcPassword=password

You need to adjust the classpath to match where these JARs live on your system before running this.

Good Luck!

Ask me if you need this postgres dump file!

Wednesday, August 19, 2009

The Learning Framework for Children in Finland

Lately there have been many stories about Thailand as seen by outsiders. One of them comes from a Thai living in France who describes how Finland handles education. It's quite interesting.

"Finland places great importance on children, and even dares to say that Finnish education ranks first in the world." Do you know why?

Finland doesn't claim that the very best students study there, or that it has lots of top students. It says that most people have a good basic education, and that this holds for the great majority.

The reason is that Finland helps everyone become capable together, gradually. One interesting point: "children there receive no academic instruction until the age of 7." Nobody forces a child to study what the parents dictate. Children are left to practice learning and to find what they like.

Next-level...

Thinking about it, it's true. When we decide that children must study this or that, or do whatever we want them to become, we are framing their thinking. For example, parents who want their child to be an engineer make the child study science and mathematics, because they think it leads to a job and a good career.

Or the parents are teachers. The children see them do this work every day, never get to know other professions, and end up as teachers too. A family of teachers, just like that.

So the range of new careers children can dream of keeps getting narrower.

And you: since when have you had a frame?

Friday, August 14, 2009

Knowledge Management, Social Network Edition

Today I studied the knowledge management process with Dr. วรา วราวิทย์. He said that knowledge, and the ways we acquire it, are changing in today's world. We have new teachers such as:

Teacher Google
Teacher Wikipedia
Teacher Messenger

There are two things to know today:

What you know deeply, understand, and have mastered
What your friends know deeply, understand, and have mastered

Of course, you need to build a network of friends and know which friend to ask about what, rather than doggedly studying everything yourself.

Friday, July 10, 2009

Mirroring a Site Without wget or HTTrack

Normally, when I need to mirror all or part of a website, I use wget. When the site uses session or cookie authentication we can still use wget, but sometimes it doesn't work well.

My first thought was to write my own Python script for it. But luckily we have Firefox!

Why Firefox?

It's free and makes this easy. You just need to download an add-on. I recommend ScrapBook, my favorite.

Wednesday, July 8, 2009

Thailand University

The story is that someone who came back from America told me about this idea, and I'd like to record it here because it makes good sense.

In America, if I mention MIT, Stanford, Harvard, or Cornell, you all know what each of these universities is famous for.

If Thailand were to set up a national university, here is what I'd like and think would be fitting:

Southeast Asia University of Agriculture, with departments such as:
Highland and mountain agriculture
Agriculture on the Chao Phraya river plain

What do you think? Because if I had to fly to America to get a degree in agriculture, that would be rather odd.

Monday, June 29, 2009

Postfix: Sending Email Through Multiple Relays

   +------------------------+
   | relay ISP              |
   | 202.25.25.25:25        |
   +------------------------+
             ^
             |
+-------------------------+
| 202.25.25.24            |
| postfix                 |
| 10.0.0.1                |
+-------------------------+
             |
             v
   +-------------------------+
   | relay LOCAL             |
   | 10.0.0.25:25            |
   +-------------------------+

Normally, when you need to send large volumes of email through a shared gateway, the usual approach is to have the gateway relay everything to the ISP's SMTP server. But what happens when a large share of the recipients are on the local network? Then it makes more sense to send local mail through the local relay.

Try this:

Create a hash transport map:

server# touch /etc/postfix/transport

Add this line to /etc/postfix/transport:

mydomain.com relay:10.0.0.25:25

Build the map:

server# postmap /etc/postfix/transport

Point Postfix at the transport map in /etc/postfix/main.cf:

server# postconf -e "transport_maps = hash:/etc/postfix/transport"

Then reload Postfix:

server# /etc/init.d/postfix reload
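
The routing decision this sets up can be modelled in a few lines. This is a deliberately simplified sketch, not Postfix's own lookup algorithm: the real transport table also supports full-address and subdomain patterns. The default value below stands for the ISP relay from the diagram, and the function names are made up.

```python
def parse_transport(text):
    """Parse 'pattern transport:nexthop' lines, skipping blanks and comments."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        pattern, result = line.split(None, 1)
        table[pattern.lower()] = result.strip()
    return table


def next_hop(recipient, table, default="relay:202.25.25.25:25"):
    """Pick the route for a recipient: exact domain match, else the default relay."""
    domain = recipient.rsplit("@", 1)[-1].lower()
    return table.get(domain, default)


transport = parse_transport("mydomain.com relay:10.0.0.25:25\n")
print(next_hop("alice@mydomain.com", transport))  # local relay
print(next_hop("bob@example.org", transport))     # ISP relay
```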

Monday, June 22, 2009

Problems We Never Quite Fix

Today I heard on the news that some Middle Eastern countries that own wells of black gold (oil) are trying to buy or lease land here to grow rice and ship it back home.

I can't understand how we could allow that. Why don't we grow it and sell it ourselves?

We Thais are heading for trouble, because our education can't keep up with theirs, so we will inevitably come second in many areas.

Good development should start with education, but we are several steps behind, or even walking backwards. Today we are looking at the backs of South Korea, Malaysia, and Singapore, and soon Vietnam. We've been so busy developing other things that we haven't thought about our children and grandchildren.

These days, if you travel upcountry and quietly ask what a plot of land costs, you'll certainly be quoted a handsome price. Many people will say the land is perfect for building this or that, and most villagers want a lump sum anyway. Why bother with rice fields and orchards and wear yourself out, when you could just take the money? ...

The fault lies with those who don't offer knowledge to villagers and farmers. We should tell them that this land is worth more than money: it is a national heritage and the world's real gold mine.

Thailand sits on land rich in every natural resource, with wealth in the soil and in the water, and with oil that will never run out as long as there is land.

I've gone on long enough. What I want to say to those with power or the means to act is:

Fix the flooding that happens every year, instead of showing up to campaign each time the season comes around. There's already a good example: "Kaem Ling" (monkey cheek). Without him we would have been in trouble long ago
Fuel for agriculture, "natural-power diesel": again he showed the way, but how many people have carried it forward? People only say that turning crops into fuel will cause food shortages (that's a Western argument). They may run short, but we will always have enough. We should use what we need and sell the surplus, not sell it all and then buy back what we sold
Thai rice travels the world, but believe it or not, "some people don't know Sao Hai or Hom Mali rice." I agree with the idea that we should first have enough to eat and use, and only then think of selling to others, or trade our surplus for theirs. And you know whose idea that is
We have many students who win at the Olympiads, but compared with the number of students in the country they are like a single ant. Other countries may have fewer winners, but go and look: half the country is at the level of those who compete
We are too infatuated with foreigners. Did you know we're number one at simple things like fried eggs and omelettes (my Western friends don't even know them)? Yet we insist on competing with them at making sandwiches
People say "kids today don't love Thai culture", but I think it's because adults don't instill it. It's the adults who bring in foreign culture, such as Korean dramas, dancing, and singing. Go and ask the kids whether they've ever heard "มัสมั่นแกงแก้วตา หอมยี่หร่ารสร้อนแรง"

Friday, June 19, 2009

The Future I Dream Of

Have you ever had a future you dreamed of?

Mine: a main job as an orchard farmer, with side work in IT security and national-scale data analysis. Though in the end, making films would be fun too.

And you?

Monday, June 15, 2009

Activities Inside a Data Warehouse

Besides the Extract-Transform-Load that feeds a data warehouse, there are many other activities to consider, for example periodic analysis:

"The data warehouse is analyzed periodically by a program that examines relevant characteristics and criteria. The analysis then creates a small files in the online environment that contains succinct information about the business of the enterprise."

A Regular Expression Cartoon

I was searching for data mining material and stumbled on something funny instead.

Pretty funny.

Friday, June 12, 2009

Java and I Really Don't Get Along

I'm installing Pentaho and can't even get past the first hurdle: Tomcat just won't start.

Updated -- 9 July 2009

After wrestling with it for a long time, I found the cause was JAVA_HOME.

Thursday, June 11, 2009

Pentaho - Introduction

Data Warehouse, Data Mart, Data Mining, Migrating Data, Exporting Data, Loading Data, Data Cleansing, Integrating Application, Business Intelligence

If these terms are familiar to you, you need a tool, and the tool I'm recommending today is Pentaho.

Pentaho Data Integration, also known as Kettle, is a tool for ETL (Extract-Transform-Load). You work by drag and drop, defining the input and output of each step. It works with both files and database systems, and importantly it is open source, so we can modify and extend its parts.

ETL Extract-Transform-Load

To analyze data on the order of 10,000 records systematically, with views that are easy to change, I still think loading it into a database and querying it is the best way. But

now, going on nine years, there are tools to help us.

I used to do ETL by hand, being a technical guy. The steps were:

If the file arrives as Excel (it usually does), save it as CSV and import it into a database
Analyze the data to decide what we want to learn from it
Turn the results into charts suited to each task
Then go back to step 1 whenever the data changes
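
The manual steps above can be sketched in a few lines of Python, with the standard-library `csv` and `sqlite3` modules standing in for "save as CSV" and "import into a database". The CSV content, table name and columns are invented for illustration. The charting step is left out.

```python
import csv
import io
import sqlite3

# Stand-in for a spreadsheet saved as CSV; the columns are made up.
CSV_TEXT = """region,product,amount
North,Rice,120
North,Corn,80
South,Rice,200
"""


def load_csv(conn, text):
    """Extract rows from CSV text and load them into a sales table."""
    conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
    reader = csv.DictReader(io.StringIO(text))
    rows = [(r["region"], r["product"], float(r["amount"])) for r in reader]
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
loaded = load_csv(conn, CSV_TEXT)

# The "analysis" step: ask the database a question about the loaded data.
totals = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(loaded, totals)
```

Every time the spreadsheet changes, the whole thing has to be run again, which is exactly the repetition an ETL tool takes over.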

Looking at these steps, it's clearly tiring manual work. Of course, we've heard the term "Data Warehouse" for a long time, and it solves this problem quite well. Almost everything I used to do by hand has now been replaced by an application. Go and read about it:

http://community.pentaho.com

I guarantee anyone can learn to use this application, but many will get stuck on what exactly it is they are getting out of it:

data
information
knowledge

Technical people get through data/information fine but get stuck at knowledge, because knowledge is understanding the data you have and then putting it to good use.

The next posts will go deeper into Pentaho, one part at a time.

Monday, June 8, 2009

The Cheer-Practice Tradition: What Is It Really For?

At the start of the month I went back to my university, and it happened to be the first day of term. What saddened me was the seniors' freshman initiation tradition. To be clear, I hate this kind of initiation, even though it's a tradition handed down from class to class.

Even though I dislike the tradition, I took part in it myself and never missed a single practice.

Have you ever asked a senior, a classmate, or the person next to you why they do cheer practice? Here are the answers:

My friend at the time said, "We have to get our own back!" (we were second-years then)
Slightly older seniors say, "So the juniors will bond"
Older alumni say, "It makes this class bond and become one"
Veteran alumni say, "That's how it's always been done"
The teachers' generation says, "They're still doing this?", the sort who only ever count 3 and 4

But the reason I think is closest to the truth is: "so that juniors entering the same institution become closer and more united."

For me it's not a question of good or bad. I just think university students ought to have more in mind than practicing cheers until the schedule is done and then going their separate ways: those who can keep up carry on, those who can't go and find somewhere else to study.

"There should be more to it than that, don't you think!"

For example:

Community or campus development camps
University activities that promote academics, with juniors contributing the ideas and doing the work
Inter-faculty sports (but not inter-faculty cheering)
A campus tour (believe it or not, I graduated without knowing where the basketball court was, or that there was a shortcut to a place behind the campus with better food than inside)

It's strange that later classes haven't been able to change these things, and have instead pushed them outside the box in worse ways.
If a small-scale tradition can't be set right, how will national-scale traditions such as politics improve, when the young still have no say?

Friday, May 29, 2009

A Public Key Problem on Etch

I've been hitting this problem a lot lately, so let me note down the fix.

=========== Source: aptitude update ==============

Hit http://ftp.debianclub.org etch Release
Err http://ftp.debianclub.org etch Release

Get:2 http://ftp.debianclub.org etch Release [67.8kB]
Ign http://ftp.debianclub.org etch Release
Ign http://ftp.debianclub.org etch/main Packages/DiffIndex
Ign http://ftp.debianclub.org etch/main Sources/DiffIndex
Hit http://ftp.debianclub.org etch/main Packages
Hit http://ftp.debianclub.org etch/main Sources
Get:3 http://security.debian.org etch/updates Release.gpg [1032B]
Hit http://security.debian.org etch/updates Release
Ign http://security.debian.org etch/updates/main Packages/DiffIndex
Ign http://security.debian.org etch/updates/main Sources/DiffIndex
Hit http://security.debian.org etch/updates/main Packages
Hit http://security.debian.org etch/updates/main Sources
Fetched 68.8kB in 2s (23.2kB/s)
Reading package lists... Done
W: GPG error: http://ftp.debianclub.org etch Release: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 9AA38DCD55BE302B
W: There is no public key available for the following key IDs:
9AA38DCD55BE302B
W: You may want to run apt-get update to correct these problems
==========================

When you see this, you need to fetch the public key:

# gpg --keyserver wwwkeys.eu.pgp.net --recv-keys  9AA38DCD55BE302B
# gpg --armor --export 9AA38DCD55BE302B | apt-key add -
# aptitude update

That's all there is to it.

Data integration Case Study

Using Data Integration across several kinds of work

Traffic

At the Traffic Centre of the Flemish government, they have an application called the Traffic Control Centre that they use to monitor the state of the road network in Flanders. Every minute data is entering the system from more than 570 locations on the highways. However, because of the high volume of data this represents a problem towards the analyses of traffic situations. For that reason, a decision was made to go for the creation of a data warehouse. This data warehouse puts data in a multi-dimensional data model to allow the combination of different types of data against common dimensions. For example, it becomes possible to look at the speed measurements on a certain point on a road together with the weather conditions in that region. Long term storage is also a part of the objective. Because a data warehouse is by definition a historical database, this part would be easy if not for the large data volumes involved in this case. The largest fact table of the Traffic Centre's data warehouse contains above 1 billion rows for 13 months worth of historical data. Kettle is used in this project to handle the data acquisition on the Traffic Control Centre application, as well as the update of the dimensions and fact tables in the data warehouse. Furthermore, detailed logging is used to check for errors in the jobs that are launched every 15 minutes. (96 times a day!) So far the project is running for almost a year without a problem.

Finance
Financial institutions have to deal more and more with increasing regulations and obligations. One of the new accounting rule-sets that are coming their way is called "Basel II". One interesting part of "Basel II" says that banks need to keep a certain standard reserve for bad loans. However, if the bank in question can prove that they have a better than average loan portfolio, they can lower the percentage of money that they have to keep in reserve. This is called the Internal Rating Based (IRB) calculation. So, instead of letting money sit idle in the reserve, this money can be used to generate new income from other banking operations. However, how do you prove that your customers are better than average? Well, you can use a data warehouse to do this. The data warehouse needs to acquire information from different parts of the lending process, from the request for a loan, over the acceptance to the processing of the monthly payments. A lot of information is also uploaded in the warehouse about the customer's info and the different kinds of products he has. For example if a customer has a lot of savings, this would lower the risk of a bad loan. A data warehouse can then excel in the creation of reports that combine all this info, providing a relatively easy solution to this complex problem. The use of a data warehouse is also encouraged in this case because the IRB system has to have historical data for the past 6 years in order to be valid.

Marketing

This is one area where a business intelligence system can really make a difference. That is because a direct marketing manager can make a lot of use of a data warehouse to report on his customers. For example if he has a budget to send a mailing to his 1000 best customers, he needs to select these 1000 customers. To determine what his best customers are, the manager wants to use Recency, Frequency and Monetary value (RFM) as parameters. These parameters are determined in the data warehouse by first gathering all the sales data in a sales fact table. (sales per customer per product). Every month, the data warehouse looks at all the customers and looks in the sales fact table and determines

how long it's been since the customer ordered? (Recency)
how many times he ordered in the last year? (Frequency)
the total amount of products ordered in that year? (Monetary value)

Based on this new fact table containing the RFM data, the manager can then make a better choice of customers based on the marketing principles of RFM. In short RFM means: a customer that orders frequently and has ordered not long ago and for a lot of money is more likely to do so than others in the future. As a consequence, if you select the 1000 customers based on RFM, you are likely to get a better return on investment than by selecting 1000 random customers. Departing from the sales and RFM fact tables, you can segment your customers into types like 'new customer', 'very good customer', '...' that eases the selection even more.
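
As a minimal sketch of the RFM calculation described above: the sales records, function names and ranking rule below are assumptions for illustration, not part of the case study. It assumes every record passed in already falls inside the one-year window, and it ranks customers by recency first, then frequency, then spend, which is only one of several possible scoring schemes.

```python
from datetime import date


def rfm(sales, today):
    """Compute recency (days), frequency (orders) and monetary value per customer."""
    summary = {}
    for customer, order_date, amount in sales:
        entry = summary.setdefault(
            customer, {"last": order_date, "frequency": 0, "monetary": 0.0}
        )
        entry["last"] = max(entry["last"], order_date)
        entry["frequency"] += 1
        entry["monetary"] += amount
    return {
        customer: {
            "recency": (today - e["last"]).days,
            "frequency": e["frequency"],
            "monetary": e["monetary"],
        }
        for customer, e in summary.items()
    }


def best_customers(scores, n):
    """Rank by most recent order, then most orders, then highest spend."""
    def sort_key(customer):
        s = scores[customer]
        return (s["recency"], -s["frequency"], -s["monetary"])
    return sorted(scores, key=sort_key)[:n]


# Hypothetical rows from a sales fact table: (customer, order date, amount).
sales = [
    ("alice", date(2009, 5, 1), 100.0),
    ("alice", date(2009, 5, 20), 50.0),
    ("bob", date(2009, 1, 10), 500.0),
]
scores = rfm(sales, today=date(2009, 5, 29))
print(scores)
print(best_customers(scores, 1))
```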

So that's what it's good for ...

Tuesday, May 26, 2009

Data warehouse - Chapter 1 - evolution of decision support system

The evolution of decision support systems

"If we know how to pour concrete, how to drill, and how to tighten bolts, then when we build a bridge we can build it without having to think about the shape or use of the bridge."

Certainly the details of the data are important in a data warehouse, but the warehouse must be built from a single architecture that starts from the big picture and is then broken down part by part. The details are looked at again only after the big picture is in view.

A data warehouse is formed by bringing together application data, known as operational data, possibly integrated from several systems. This process is complex and time-consuming.

Developing a data warehouse is entirely different from developing an application system. Applications are developed following the SDLC (Software Development Life Cycle), whereas a data warehouse is developed with a spiral development approach.

Data warehouse users differ from users of ordinary systems, because they think by exploring first rather than by defining requirements up front. This sentence describes such users well: "Give me what I say I want, and then I can tell you what I really want."

Monday, May 25, 2009

The Next Book

http://www.tpabookcentre.com/catalog/images/products/smT0413.jpg

Author: Mint (A Society for the Study of Management and Information Technology), Japan
Translator: ดร.สมชาย กิตติชัยกุลกิจ
ISBN: 9744431555
BARCODE: 9789744431554
Year: 2005
Edition: 1
Size: B5
Pages: 316

Content: unlike most books, it doesn't dwell on theory (that never gets used in practice). It tells things simply while covering the knowledge and techniques needed for real work, including
- Software engineering
- Development models
- Analysis and design
- Structured and object-oriented techniques
- Programming environments
- Software quality
- Testing processes and techniques
- Trends in software development

I don't have any big reason for wanting to read it. It's simply that it's Japanese.

Thursday, May 21, 2009

Data Warehouse, Chapter 0

After a long search for a reference book in Thai, today I bought a book called "Building the Data Warehouse", and of course it's fairly thick.

According to the introduction, the book says "Father of Data Warehouse is Inmon"

Tuesday, May 5, 2009

Do You Remember the Result, or How to Find It?

It struck me that when I try to recall something from long ago, I recall how to get to it, not what it actually is. How about you?

The Benefit of Keeping Some Older Friends

I have many friends who are my seniors. What I've only just realized is that what you gain from having older friends or advisers is experience.

You can't get that from books or other media, which can't answer back. So this channel counts as a valuable body of knowledge.

Sunday, March 29, 2009

Beginnings Never End: The Knowledge I Have Should Be Passed On

I'm working on it

Starting

to pass on what I know

Friday, March 27, 2009

ดูไบ กับ ประเทศไทยขณะนี้

เมื่อคืนดูรายการ ชีพจรโลก ของคุณสุทธิชัย หยุ่น เค้าหยิบเรื่อง ดูไบ มาเสนอ

เวลาพูดถึงดูไบ ผมจะนึกถึงสายการบินอาหรับเอมิเรต แต่นึกไม่ออกว่า ดูไบ เป็นประเทศอย่างไร พอดูรายการนี้จึงทำให้เราเข้าใจประเทศนี้มากขึ้น

Oil exports (which I had misunderstood all along) make up only 6% of Dubai's GDP. What Dubai is pursuing right now is real estate. Incredibly, they are reclaiming land from the sea to build islands: islands shaped like a palm tree, a world map, or even a map of the universe.

The vision set by Dubai's leader is: "Dubai cannot rely on exporting oil alone, because its oil fields are small. So Dubai must always look for new things to prepare for the future." And that new thing is becoming the hub of the Middle East. Travel time from Europe to the Middle East is 6 hours, and from the Middle East to Asia about 6 hours.

Naturally, being the hub of the Middle East means they need infrastructure to serve visitors. Hotels have sprung up in the middle of the sea, on the many islands that have been built. Five stars are probably not enough to rate the hotels here.

What Dubai's leader said was: "This is only 10% of what is to come." A hundred years from now, Dubai will be far more than this.

I'm amazed!!! Aren't you?

I used to have no idea why travelling abroad mattered. But after watching many programmes that take you to see this and that, I have come to realize that people need to travel: the more you travel, the more you know, as the programme Princess Diary put it.

I'm not sure whether that also applies to the study tours that various agencies go on, civil servants especially, or to trips taken purely for leisure that bring back little to help develop anything.

Turning back to Thailand today, or the truth today, I look at the country and feel sorry for its children and grandchildren, who will have to bear the consequences of what is being done now. I don't know what people are thinking, but the outcome is certainly not going to be good. It's as if "these days we are not destroying ourselves; we are destroying the future and using up our descendants' resources."

Oh, and how would you feel in my place: turning on ชีพจรโลก to watch Dubai's development, and at the same time seeing a news documentary report that "Naga tracks were found today at such-and-such temple"?

I still can't picture what Thailand will be like 100 years from now. Dubai assumes it will one day have no oil left to sell. What about us: have we thought about what we will do once our resources and fertile land are gone?

I love Thailand, and I keep doing what I can for the country. How about you?

Sunday, March 15, 2009

Taking on Software Development work as a developer

Beyond delivering a working system, there is a part of the job that developers tend to forget: documentation. It is the documentation that eats up a lot of time, sometimes even more than developing the system itself.

Salespeople often overlook this and treat it as room for discounting the job. If this has already been agreed with the customer, fine. If not, I can tell you it will take a long, long, long time.

Let's see what documents there are.

Documents before the sale is agreed

Quotation
System introduction documents, i.e. all those presentations
System feature list (Features list)
System requirements (System Requirement)
Preliminary conditions before starting (Preliminary)
Request for proposal
Comparison documents, if the system is to be compared with others (this should probably be the customer's job, don't you agree?)

Documents during the project

System development schedule
System testing schedule
Training schedule
System go-live schedule
Accounting documents, such as invoices and instalment collection

Documents after completion

System acceptance document
System warranty document

That is all I can think of, but even this much should show that documentation is fairly heavy work.

So if you want to avoid confusion and chaos, I recommend getting it done early. Better still, make it online documentation and let customers study some of it themselves; don't do everything for them. All too often, when a system is built by Thai developers, customers won't do their own homework, yet when the system comes from abroad they manage to study it on their own.

Please keep this in mind.