<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
    <title>en.credativ blog: Category PostgreSQL</title>
    <link rel="alternate" type="text/html" href="http://blog.credativ.com/en/" />
    <link rel="self" type="application/atom+xml" href="http://blog.credativ.com/en/atom.xml" />
    <id>tag:blog.credativ.com,2010-03-05:/en//2</id>
    <updated>2010-09-20T13:56:42Z</updated>
    <subtitle>All about Linux and Open Source</subtitle>
    <generator uri="http://www.sixapart.com/movabletype/">Movable Type 4.34-en</generator>

<entry>
    <title>Open Source lives - PostgreSQL developers at credativ</title>
    <link rel="alternate" type="text/html" href="http://blog.credativ.com/en/2010/09/open-source-lives---postgresql-developers-at-credativ.html" />
    <id>tag:blog.credativ.com,2010:/en//2.175</id>

    <published>2010-09-20T14:00:34Z</published>
    <updated>2010-09-20T13:56:42Z</updated>

    <summary> Earlier this year, blogger and PostgreSQL Committer Andrew Dunstan drew up a list of individual Committers to the PostgreSQL Project. We are proud to say that this list featured some of our employees. In May, PostgreSQL&apos;s Andrew Dunstan published...</summary>
    <author>
        <name>Irenie White</name>
        <uri>http://www.credativ.co.uk</uri>
    </author>
    
        <category term="Open Source" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="PostgreSQL" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="credativ" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://blog.credativ.com/en/">
        <![CDATA[<p><img alt="postgreslogo.png" src="/de/static/postgreslogo.png" width="97" height="100" class="mt-image-right" style="float: right; margin: 0 0 20px 20px;" /><br />
<em>Earlier this year, blogger and PostgreSQL Committer Andrew Dunstan drew up a list of individual Committers to the PostgreSQL Project. We are proud to say that this list featured some of our employees.</em><br />
<br/><br />
In May, PostgreSQL's Andrew Dunstan published some data about the productivity of PostgreSQL Committers in <a href="http://people.planetpostgresql.org/andrew/index.php?/archives/79-30,000-commits-and-still-going-strong.html/">30,000 commits and still going strong</a>, detailing the number of commits made by developers with commit rights. Incidentally, becoming a Committer is no mean feat: although there is no set procedure for acquiring the right to commit, it generally follows a candidate having sent numerous good patches over a long period of time. Existing Committers or the core team will then propose and approve assigning Committer rights to the candidate.</p>

<p>credativ can claim involvement with many other Open Source Projects in addition to PostgreSQL. Community involvement is taken seriously at credativ, as is evident from Andrew Dunstan's statistics. A few of the Committers mentioned work at various international credativ offices: Michael Meskes, Joe Conway and Dave Cramer. What is not clear from Dunstan's list is the number of credativ employees who contribute large amounts of code but are not actually Committers. Take Bernd Helmle, for example: readers of this blog will know him from his <a href="http://blog.credativ.com/en/postgresql/">PostgreSQL articles</a> not only as an author but also as a developer, yet he does not feature in Andrew's statistics.</p>

<p>Nevertheless credativ's presence on this list is indicative of our achievements as well as our employees' connections with Open Source; if you would like to know more about our Open Source involvement simply leave us a comment here... and if you are interested in <a href="http://www.credativ.co.uk/services/support/">Open Source Support</a>, please <a href="http://www.credativ.co.uk/contact/">contact us</a>.<br />
</p>]]>
        
    </content>
</entry>

<entry>
    <title>[Tip] PostgreSQL Tip of the Day - which configs require restart?</title>
    <link rel="alternate" type="text/html" href="http://blog.credativ.com/en/2010/09/tip-postgresql-tip-of-the-day---which-configs-require-restart.html" />
    <id>tag:blog.credativ.com,2010:/en//2.190</id>

    <published>2010-09-11T01:12:19Z</published>
    <updated>2010-09-11T01:32:34Z</updated>

    <summary> I&apos;ve been asked on at least three separate occasions lately how to know if changing a particular postgresql.conf item requires a restart, or a reload, of PostgreSQL. Here is my quick and dirty favorite way to answer this question:...</summary>
    <author>
        <name>Joe Conway</name>
        <uri>http://www.credativ.us</uri>
    </author>
    
        <category term="PostgreSQL" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Tip" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="credativ" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="plpgsql" label="plpgsql" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="postgresql" label="PostgreSQL" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="sequences" label="sequences" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://blog.credativ.com/en/">
        <![CDATA[<p><img alt="postgreslogo.png" src="http://blog.credativ.com/en/static/postgreslogo.png" width="97" height="100" class="mt-image-right" style="float: right; margin: 0 0 20px 20px;" /><br />
I've been asked on at least three separate occasions lately how to know if changing a particular postgresql.conf item requires a restart, or a reload, of PostgreSQL. Here is my quick and dirty favorite way to answer this question:<br />
<br/></p>
<pre class='brush: sql'>
-- configs requiring postgresql restart
select name, setting, context
  from pg_settings where context = 'postmaster';

-- configs requiring postgresql reload
select name, setting, context
 from pg_settings where context = 'sighup';
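
-- Two follow-ups that are not part of the original tip (a sketch; the
-- setting name below is only an example).
-- Look up the context of a single setting:
select name, context
  from pg_settings where name = 'shared_buffers';

-- After editing a 'sighup' setting in postgresql.conf, a superuser can
-- ask the server to re-read its configuration without a restart:
select pg_reload_conf();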
</pre>]]>
        
    </content>
</entry>

<entry>
    <title>PostgreSQL topic of the Day - PL/R performance improvements</title>
    <link rel="alternate" type="text/html" href="http://blog.credativ.com/en/2010/07/postgresql-topic-of-the-day---plr-performance-improvements.html" />
    <id>tag:blog.credativ.com,2010:/en//2.184</id>

    <published>2010-07-24T18:31:12Z</published>
    <updated>2010-07-24T20:30:16Z</updated>

    <summary>When you pass large amounts of data to and from PL/R, quite a lot of time is needed for converting. A change is being tested which treats arrays of 4 byte integers and 8 byte floating point values as a...</summary>
    <author>
        <name>Joe Conway</name>
        <uri>http://www.credativ.us</uri>
    </author>
    
        <category term="Open Source" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="PostgreSQL" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="credativ" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="analytics" label="analytics" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="plr" label="PL/R" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="postgresql" label="PostgreSQL" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="r" label="R" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://blog.credativ.com/en/">
        <![CDATA[<p><img alt="postgreslogo.png" src="/de/static/postgreslogo.png" width="97" height="100" class="mt-image-right" style="float: right; margin: 0 0 20px 20px;" /><img alt="Rlogo.jpg" src="http://blog.credativ.com/en/Rlogo.jpg" width="100" height="76" class="mt-image-right" style="float: right; margin: 0 0 20px 20px;" /><em>When you pass large amounts of data to and from PL/R, quite a lot of time is needed for converting. A change is being tested which treats arrays of 4 byte integers and 8 byte floating point values as a special case, resulting in a dramatic performance improvement.</em></p>

<p>In a recent post, I discussed PL/R performance related to seismic timeseries data stored as an array of floats that are all recorded during some seismic event at a constant sampling rate. The problem was that when dealing with, say, 14000 arrays of floats, each having on the order of 16000 elements, passing the data to and from PL/R proved slower than hoped.</p>

<p>My ultimate solution was to show how a significant performance improvement could be achieved by importing the arrays into Postgres tables directly as raw R objects, and then operating on those objects later using PL/R. The problem with this approach is that in some, if not most, cases, you may want to access that same data from other procedural languages or hand off the arrays to some client other than R. In this case the raw R object does not meet your needs.</p>

<p>So I thought about it a bit and researched the source code on the Postgres and R sides of PL/R, and concluded that for certain special cases it was possible to dramatically improve speed by skipping the one-at-a-time element conversion as arrays are processed going between PostgreSQL and R. Specifically, the in-memory storage of the array data is binary compatible in the following circumstances:<br />
<ol><li>pgsql -&gt; R<ul><li>Argument is integer or double precision array</li><li>Element data type is pass-by-value for given Postgres version and architecture</li><li>No NULL elements</li><li>Array is one dimensional</li></ul></li><li>R -&gt; pgsql<ul><li>Integer vector returned with integer array return type</li><li>Real vector returned with double precision array return type</li><li>No NA elements</li><li>One dimensional vector</li></ul></li></ol></p>
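
<p>Whether these conditions hold for a given installation and column can be checked from SQL. The following is only a sketch and not part of the PL/R change itself; <tt>test_ts</tt> and its <tt>data</tt> column are the test table used in the timing run in this post, and <tt>array_ndims()</tt> and <tt>unnest()</tt> assume PostgreSQL 8.4 or later:</p>
<pre class='brush: sql'>
-- is the element type pass-by-value on this build?
select typname, typlen, typbyval
  from pg_type where typname in ('int4', 'float8');

-- rows whose array would NOT qualify for the fast path:
-- more than one dimension, or at least one NULL element
select dataid
  from test_ts
 where array_ndims(data) &lt;&gt; 1
    or exists (select 1 from unnest(data) as e where e is null);
</pre>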

<p>Pass-by-value is most likely true for double precision (float8) if PostgreSQL is at least version 8.4 and was built with a 64 bit system architecture. If these conditions are met, PL/R now simply copies en masse the in-memory array data from the PostgreSQL array data structure to the R vector data structure. This avoids all the overhead associated with iterating over the array element by element. Although I am not a fan of special case code such as this, the use case is important (if you are crunching numbers, they are likely stored as double precision elements), and the performance benefit is huge. Here is the timing difference with the patched PL/R versus the unpatched PL/R:<br />
</p>
<pre class='brush: sql'>
DROP TABLE IF EXISTS test_ts;
CREATE TABLE test_ts
(
  dataid bigint NOT NULL,
  data double precision[],
  CONSTRAINT pk_data PRIMARY KEY (dataid)
);

CREATE OR REPLACE FUNCTION load_test(int) RETURNS text AS $$
 DECLARE
  i    int;
 BEGIN
  FOR i IN 1..$1 LOOP
   --16879 double precision elements in the data array
   INSERT INTO test_ts (dataid, data) VALUES (i, '{-0.0205086770285039,...}');
  END LOOP;
  RETURN 'OK';
 END;
$$ LANGUAGE plpgsql;

CREATE OR REPLACE FUNCTION
filt_r_nothing(ts double precision[])
RETURNS double precision[] AS $$
  return(ts);
$$ LANGUAGE 'plr' IMMUTABLE;

CREATE OR REPLACE FUNCTION
filt_r_avg(ts double precision[])
RETURNS double precision AS $$
  return(mean(ts));
$$ LANGUAGE 'plr' IMMUTABLE;

-- INSERT 14000 rows of 16879 element arrays
SELECT load_test(14000);

-- unpatched code
UPDATE test_ts SET data = filt_r_nothing(data);
UPDATE 14000
Time: 1224087.064 ms

-- patched code
UPDATE test_ts SET data = filt_r_nothing(data);
UPDATE 14000
Time: 225591.429 ms

-- unpatched code
contrib_regression=# select filt_r_avg(data) from test_ts;
    filt_r_avg     
-------------------
 0.656530643017027
 0.656530643017027
[...]
(14000 rows)
Time: 441573.619 ms

-- patched code
contrib_regression=# select filt_r_avg(data) from test_ts;
    filt_r_avg     
-------------------
 0.656530643017027
 0.656530643017027
[...]
(14000 rows)
Time: 6541.039 ms

-- unpatched code
select array_upper(filt_r_nothing(data),1) from test_ts;
 array_upper 
-------------
       16879
       16879
[...]
(14000 rows)
Time: 1108651.349 ms

-- patched code
select array_upper(filt_r_nothing(data),1) from test_ts;
 array_upper 
-------------
       16879
       16879
[...]
(14000 rows)
Time: 23101.602 ms
</pre><p><br />
So to summarize:<table><tr color="gray"><td>Test</td><td>Case</td><td>Time (ms)</td><td>Improvement</td></tr><tr><td>UPDATE NOOP</td><td>Unpatched</td><td>1224087.064</td><td>--</td></tr><tr><td>UPDATE NOOP</td><td>Patched</td><td>225591.429</td><td>82%</td></tr><tr><td>SELECT AVG</td><td>Unpatched</td><td>441573.619</td><td>--</td></tr><tr><td>SELECT AVG</td><td>Patched</td><td>6541.039</td><td>98%</td></tr><tr><td>SELECT NOOP</td><td>Unpatched</td><td>1108651.349</td><td>--</td></tr><tr><td>SELECT NOOP</td><td>Patched</td><td>23101.602</td><td>98%</td></tr></table></p>
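
<p>The "Improvement" column is the share of the unpatched time that was saved. As a quick check, the same figures can be recomputed to one decimal place from the timings listed in the table (a sketch, not part of the original test run):</p>
<pre class='brush: sql'>
-- improvement in percent = 100 * (1 - patched / unpatched)
select round(100 * (1 - 225591.429 / 1224087.064), 1) as update_noop,
       round(100 * (1 - 6541.039 / 441573.619), 1)    as select_avg,
       round(100 * (1 - 23101.602 / 1108651.349), 1)  as select_noop;
</pre>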

<p>That is a pretty substantial improvement in these particular, but I think common, use cases. The UPDATE test sees less overall benefit because the time to write out the changes is significant and is the same regardless of array handling in PL/R. The difference between SELECT NOOP and SELECT AVG is driven by the fact that the latter returns a scalar result, while the former returns the entire array. The reason SELECT NOOP calls array_upper() on the returned array is that otherwise all that array data (something like 4 GB) gets materialized in memory by psql, which of course greatly slows things further and is not what we are trying to test.</p>

<p>Please give the changes a try and provide feedback before I release another PL/R version. You can grab the new code from <a href="http://github.com/jconway/plr">github</a> and sign up for the <a href="http://pgfoundry.org/mail/?group_id=1000247">PL/R mailing list</a> to post your results or report any questions/problems. And of course visit the <a href="http://www.joeconway.com/plr/">PL/R homepage</a> and <a href="http://www.joeconway.com/web/guest/pl/r">PL/R wiki</a> for more general information about PL/R -- particularly to watch for these changes in the next official release. Finally, don't hesitate to <a href="http://www.credativ.us/contact/">contact me</a> directly if the other choices don't suit you for some reason.</p>]]>
        
    </content>
</entry>

<entry>
    <title>PostgreSQL topic of the Day - advanced analytics</title>
    <link rel="alternate" type="text/html" href="http://blog.credativ.com/en/2010/07/postgresql-topic-of-the-day---advanced-analytics.html" />
    <id>tag:blog.credativ.com,2010:/en//2.183</id>

    <published>2010-07-12T02:58:22Z</published>
    <updated>2010-12-07T12:43:00Z</updated>

    <summary>When you pass large amounts of data to and from PL/R, quite a lot of time is needed for converting. It&apos;s better to directly store the data as R objects. I had been planning to continue with timeseries aggregation, but...</summary>
    <author>
        <name>Joe Conway</name>
        <uri>http://www.credativ.us</uri>
    </author>
    
        <category term="Open Source" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="PostgreSQL" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="credativ" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="analytics" label="analytics" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="plr" label="PL/R" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="postgresql" label="PostgreSQL" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="r" label="R" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://blog.credativ.com/en/">
        <![CDATA[<p><img alt="postgreslogo.png" src="/de/static/postgreslogo.png" width="97" height="100" class="mt-image-right" style="float: right; margin: 0 0 20px 20px;" /><img alt="Rlogo.jpg" src="http://blog.credativ.com/en/Rlogo.jpg" width="100" height="76" class="mt-image-right" style="float: right; margin: 0 0 20px 20px;" /><em>When you pass large amounts of data to and from PL/R, quite a lot of time is needed for converting. It's better to directly store the data as R objects.</em></p>

<p>I had been planning to continue with timeseries aggregation, but decided to take a side-road based on a recent question on the <a href="http://www.joeconway.com/plr">PL/R</a> mailing list.</p>

<p>The question was related to seismic data, which is in fact timeseries data. However, I guess the data is normally stored as an array of floats that are all recorded during some seismic event at a constant sampling rate. The arrays are available from online sources in an individual file for each event being analyzed. The problem was that when dealing with, say, 14000 arrays of floats, each having on the order of 16000 elements, passing the data to and from PL/R proved slower than hoped.</p>

<p>So we start with loading of sample data for a performance test:</p>
<pre class='brush: sql'>DROP TABLE IF EXISTS test_ts;
CREATE TABLE test_ts
(
  dataid bigint NOT NULL,
  data double precision[],
  CONSTRAINT pk_data PRIMARY KEY (dataid)
);

CREATE OR REPLACE FUNCTION filt_r_nothing(ts double precision[])
RETURNS double precision[] AS $$
 return(ts);
$$ LANGUAGE 'plr' IMMUTABLE;

CREATE OR REPLACE FUNCTION load_test(int) RETURNS text AS $$
  DECLARE
   i    int;
  BEGIN
    FOR i IN 1..$1 LOOP
      INSERT INTO test_ts(dataid,data) VALUES (i,'{-0.0205086770285039, ...}');
    END LOOP;
    RETURN 'OK';
  END;
$$ LANGUAGE plpgsql;

SELECT load_test(14000);
 load_test 
-----------
 OK
(1 row)

Time: 123861.362 ms
</pre><p></p>

<p>The array in the VALUES clause of that function actually contains 16879 float8 elements. You can see that it takes over two minutes on my development machine to load the table with 14000 rows of this array. Note that on my development machine I have done no tuning of PostgreSQL configs, and I built with --enable-debug, --enable-cassert, and CFLAGS='-O0 -g3'.</p>

<p>Next, we update the data column with filt_r_nothing() which does nothing other than returning the same array it was passed.<br />
</p>
<pre class='brush: sql'>UPDATE test_ts SET data = filt_r_nothing(data);
UPDATE 14000
Time: 1224087.064 ms
</pre><p></p>

<p>Not pretty. Over 20 minutes. I did some profiling of PL/R and concluded most of the time was being spent converting 16879 PostgreSQL array elements from float8 datums to R vector elements one at a time while processing the function argument, and then repeating the process in reverse while creating the returned result. Perhaps there are optimizations that can be made to that process, but since PostgreSQL and R each have their own binary representation of this data, there is no avoiding the conversion overhead.</p>

<p>However, what is the point of the proposed performance test? The comparison was being made to another procedural language, which apparently does not convert the array elements if they are not used. A real function is presumably going to do some calculation over the array elements, requiring that they be individually accessed.</p>

<p>I decided to see how PL/pgSQL performs if forced to modify and return the passed array. The difference between this test and the PL/R one will give some insight on the time spent converting elements from PostgreSQL to R native form.<br />
</p>
<pre class='brush: sql'>CREATE OR REPLACE FUNCTION
filt_plpgsql_nothing(ts double precision[])
RETURNS double precision[] AS $$
 BEGIN
  RETURN ts || 3.14159::float8;
 END
$$ LANGUAGE 'plpgsql' IMMUTABLE;

UPDATE test_ts SET data = filt_plpgsql_nothing(data);
UPDATE 14000
Time: 239054.580 ms
</pre><p></p>

<p>About 4 minutes. Much better. But let's see what happens if we do some more meaningful, if simple, calculations on the array elements.<br />
</p>
<pre class='brush: sql'>CREATE OR REPLACE FUNCTION
filt_plpgsql_avg(ts double precision[])
RETURNS double precision AS $$
 DECLARE
  i int;
  numts int = array_upper(ts,1);
  ts_sum float8 = 0.0;
 BEGIN
  FOR i IN 1..numts LOOP
    ts_sum := ts_sum + ts[i];
  END LOOP;
  RETURN (ts_sum/numts::float8);
 END
$$ LANGUAGE 'plpgsql' IMMUTABLE;

select filt_plpgsql_avg(data) from  test_ts;
--killed after &gt; 1 hour

CREATE OR REPLACE FUNCTION filt_r_avg(ts double precision[])
RETURNS double precision AS $$
 return(mean(ts));
$$ LANGUAGE 'plr' IMMUTABLE;

contrib_regression=# select filt_r_avg(data) from test_ts;
    filt_r_avg     
-------------------
 0.656530643017027
 0.656530643017027
[...]
(14000 rows)
Time: 441573.619 ms
</pre><p></p>

<p>Although the PL/R function still took over 7 minutes to process 14000 rows with 16879 elements, PL/pgSQL took long enough that I killed it out of impatience.</p>
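
<p>As an aside, the same average can be written without a procedural language at all by expanding the array with <tt>unnest()</tt>. This is only a sketch, assuming PostgreSQL 8.4 or later; it was not part of the timings in this post, so no claim is made here about how it compares:</p>
<pre class='brush: sql'>
-- average of each row's array elements in plain SQL
select dataid,
       (select avg(e) from unnest(data) as e) as data_avg
  from test_ts;
</pre><p></p>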

<p>It occurred to me that a feature I added to PL/R within the past year or so might come in handy about now. Namely, it is possible to directly store R objects in PostgreSQL tables. This means that when the datum is passed to a PL/R function, it is all ready to go -- no conversion needed. Let's take a look at that scenario.<br />
</p>
<pre class='brush: sql'>DROP TABLE IF EXISTS test_ts_obj;
CREATE TABLE test_ts_obj
(
  dataid serial PRIMARY KEY,
  data bytea
);

CREATE OR REPLACE FUNCTION make_r_object(fname text)
RETURNS bytea AS $$
 myvar&lt;-scan(fname,sep=&quot;,&quot;)
 return(myvar);
$$ LANGUAGE 'plr' IMMUTABLE;

INSERT INTO test_ts_obj (data) SELECT make_r_object('array-data.csv') from generate_series(1,14000);
INSERT 0 14000
Time: 44182.598 ms

CREATE OR REPLACE FUNCTION filt_r_avg(ts bytea)
RETURNS double precision AS $$
 return(mean(ts));
$$ LANGUAGE 'plr' IMMUTABLE;

select filt_r_avg(data) from  test_ts_obj;
    filt_r_avg     
-------------------
 0.656530643017027
 0.656530643017027
 [...]
 0.656530643017027
(14000 rows)

Time: 12828.331 ms
</pre><p></p>

<p>This results in 44 seconds to load the same 14000 rows of array data as before, but directly as R objects. Compare that to the 2 minutes to load as PostgreSQL arrays as seen at the beginning of this article. And now it only takes 13 seconds to operate on the 14000 R objects compared to 442 seconds. That's a nice improvement!</p>

<p>But PL/R gives you access to the full power of the R environment for statistical computing and graphics. Just for fun, here is a PL/R function that calculates the "Power Spectrum" of the seismic data, and returns the result as a JPEG of the plot.<br />
</p>
<pre class='brush: sql'>CREATE OR REPLACE FUNCTION
filt_r_ps(ts bytea)
RETURNS bytea AS $$
  library(quantmod)
  library(cairoDevice)
  library(RGtk2)

  fourier&lt;-fft(ts)
  magnitude&lt;-Mod(fourier)
  y2 &lt;- magnitude[1:(length(magnitude)/10)]
  x2 &lt;- 1:length(y2)/length(magnitude)
  mydf &lt;- data.frame(x2,y2)

  pixmap &lt;- gdkPixmapNew(w=500, h=500, depth=24)
  asCairoDevice(pixmap)

  plot(mydf,type=&quot;l&quot;)
  plot_pixbuf &lt;- gdkPixbufGetFromDrawable(NULL, pixmap,
                                                        pixmap$getColormap(),
                                                        0, 0, 0, 0, 500, 500)
  buffer &lt;- gdkPixbufSaveToBufferv(plot_pixbuf,
                                                       &quot;jpeg&quot;,
                                                        character(0),
                                                        character(0))$buffer
  return(buffer)
$$ LANGUAGE 'plr' IMMUTABLE;
</pre><p></p>

<p>This is now not about performance so much as it is about analytical power. About half of the lines in this function are setting up to capture the output graph. The "meat" of the function can be contained in these few lines:<br />
</p>
<pre class='brush: sql'>fourier&lt;-fft(ts)
magnitude&lt;-Mod(fourier)
y2 &lt;- magnitude[1:(length(magnitude)/10)]
plot(x=1:length(y2)/length(magnitude),
       y=y2,
       type=&quot;l&quot;)
</pre><p></p>

<p>Complement that PL/R function with a bit of PHP code...<br />
</p>
<pre class='brush: sql'>&lt;?php
// named pg_hex2bin because PHP 5.4 and later ship a built-in hex2bin()
function pg_hex2bin($data)
{
	$data = ltrim($data, &quot;\x&quot;);
	$len = strlen($data);
	return pack(&quot;H&quot; . $len, $data);
}

$dbconn = pg_connect(&quot;dbname=contrib_regression&quot;);
$rs = pg_query( $dbconn, &quot;select plr_get_raw(filt_r_ps(data))
                                    from test_ts_obj where dataid = 42&quot;);
$hexpic = pg_fetch_array($rs);
$cleandata = pg_hex2bin($hexpic[0]);

header(&quot;Content-Type: image/jpeg&quot;);
header(&quot;Last-Modified: &quot; .
date(&quot;r&quot;, filectime($_SERVER['SCRIPT_FILENAME'])));
header(&quot;Content-Length: &quot; . strlen($cleandata));
echo $cleandata;
?&gt;
</pre><p></p>

<p>...and the output looks like:<img alt="plr-blog.jpg" src="/en/jco/plr-blog.jpg" width="500" height="500" class="mt-image-center" style="text-align: center; display: block; margin: 0 auto 20px;" /></p>

<p>Fairly sophisticated output for relatively little effort! For more information or assistance with respect to PostgreSQL, PL/R, and/or advanced analytics, <a href="http://www.credativ.us/contact/">don't hesitate to contact us</a>.</p>]]>
        
    </content>
</entry>

<entry>
    <title>PostgreSQL topic of the Day - aggregating timeseries data</title>
    <link rel="alternate" type="text/html" href="http://blog.credativ.com/en/2010/07/tip-postgresql-tip-of-the-day---aggregating-timeseries-data.html" />
    <id>tag:blog.credativ.com,2010:/en//2.182</id>

    <published>2010-07-09T00:21:04Z</published>
    <updated>2010-12-07T12:44:45Z</updated>

    <summary>Frequently when dealing with parametric data, you need to &quot;roll up&quot; the data in summary fashion as it ages in order to reduce the volume kept on hand, or maybe because the summary statistics are what really interests you. There...</summary>
    <author>
        <name>Joe Conway</name>
        <uri>http://www.credativ.us</uri>
    </author>
    
        <category term="Open Source" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="PostgreSQL" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="credativ" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="postgresql" label="PostgreSQL" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://blog.credativ.com/en/">
        <![CDATA[<p><img alt="postgreslogo.png" src="/de/static/postgreslogo.png" width="97" height="100" class="mt-image-right" style="float: right; margin: 0 0 20px 20px;" /><em>Frequently when dealing with parametric data, you need to "roll up" the data in summary fashion as it ages in order to reduce the volume kept on hand, or maybe because the summary statistics are what really interests you. There are several ways to do that, and this post highlights four different approaches.</em></p>

<p>I was reminded of this kind of "roll up" today by a question on the pgsql-novice list. This is actually quite a large topic, so this tip will likely just scratch the surface. The question was related to storing min, max, and avg summaries on an hourly, daily, and weekly basis. The basic idea, for example, is that you can keep raw data for maybe a week, hourly summaries for 6 months, daily summaries for 3 years, and weekly summaries forever. As I mentioned in my reply, I have done this kind of thing over the years using at least 4 approaches:</p>

<ol>
	<li>Aggregate on demand</li>
	<li>Batch aggregate on a periodic basis -- e.g. run your aggregate query with a cron job which truncates and rebuilds a table (i.e. a  "materialized view")</li>
	<li>Write a C based trigger that does "continuous aggregation" to a materialized table</li>
	<li>Write a C based bulk loader that aggregates as it bulk loads the raw  data into a materialized table</li>
</ol>

<p>The first approach is simply to run an aggregate query whenever you need the summarized data. Obviously this does not really satisfy the stated desire to discard aged raw data, but I mention it for completeness. In some cases you have sufficient storage given your data volume, and performance of the aggregate is "good enough".</p>
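
<p>As a minimal sketch of this first approach, assume a hypothetical table <tt>raw_data(sensor_id int, ts timestamptz, val double precision)</tt>; the table and column names are illustrative and not taken from the mailing list thread. An hourly summary on demand might then look like:</p>
<pre class='brush: sql'>
-- min, max, and avg per sensor and hour, computed at query time
SELECT sensor_id,
       date_trunc('hour', ts) AS bucket,
       min(val) AS min_val,
       max(val) AS max_val,
       avg(val) AS avg_val
  FROM raw_data
 GROUP BY sensor_id, date_trunc('hour', ts)
 ORDER BY sensor_id, bucket;
</pre>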

<p>The second is the rough equivalent of a materialized view. In other words, run a batch job via cron or something similar that <tt>TRUNCATE</tt>s and then repopulates a table used for storage of the aggregate result. Particularly for daily or weekly summary data, when the consumers of the data are 9-5 folk, this approach works pretty well. This also fits in nicely with common partitioning schemes.</p>
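
<p>A sketch of such a batch job, with hypothetical names: <tt>raw_data(sensor_id, ts, val)</tt> holds the raw rows and <tt>daily_summary(sensor_id, bucket, min_val, max_val, avg_val)</tt> is the summary table. It assumes the raw data still covers every day that should appear in the summary. Because <tt>TRUNCATE</tt> is transactional in PostgreSQL, wrapping the rebuild in one transaction means that a failed rebuild rolls back to the previous contents:</p>
<pre class='brush: sql'>
-- run periodically, e.g. from cron via psql
BEGIN;
TRUNCATE daily_summary;
INSERT INTO daily_summary (sensor_id, bucket, min_val, max_val, avg_val)
SELECT sensor_id,
       date_trunc('day', ts),
       min(val), max(val), avg(val)
  FROM raw_data
 GROUP BY sensor_id, date_trunc('day', ts);
COMMIT;
</pre>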

<p>The third is one where you want summary statistics to be updated live. In this case you actually want the summary data for the current hour/day/week to all be constantly updated as new raw data comes in. Otherwise you are stuck always looking at last hour's, or yesterday's, or last week's, data. The way to do this is through a trigger. A while back I implemented a continuous aggregation trigger in C that used prepared queries to update my aggregate table for every <tt>INSERT</tt>/<tt>UPDATE</tt>/<tt>DELETE</tt> occurring on the target table. However, even with the trigger written in C and using prepared queries, the performance impact of the trigger firing for every DML event was significant.</p>
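
<p>To illustrate the idea only, here is a much simpler PL/pgSQL sketch; it is not the C trigger described above. All table, column, and function names are hypothetical, it handles <tt>INSERT</tt> only, and it ignores the race between two sessions creating the same summary row at the same time. The average is derived at query time as <tt>sum_val / n</tt>:</p>
<pre class='brush: sql'>
-- hypothetical tables, for illustration only
CREATE TABLE raw_data (
  sensor_id int,
  ts        timestamptz,
  val       double precision
);
CREATE TABLE hourly_summary (
  sensor_id int,
  bucket    timestamptz,
  n         bigint,
  sum_val   double precision,
  min_val   double precision,
  max_val   double precision,
  PRIMARY KEY (sensor_id, bucket)
);

CREATE OR REPLACE FUNCTION raw_data_rollup() RETURNS trigger AS $$
BEGIN
  -- fold the new value into the existing row for this hour, if any
  UPDATE hourly_summary
     SET n       = n + 1,
         sum_val = sum_val + NEW.val,
         min_val = least(min_val, NEW.val),
         max_val = greatest(max_val, NEW.val)
   WHERE sensor_id = NEW.sensor_id
     AND bucket = date_trunc('hour', NEW.ts);
  -- first value of the hour: start a new summary row
  IF NOT FOUND THEN
    INSERT INTO hourly_summary (sensor_id, bucket, n, sum_val, min_val, max_val)
    VALUES (NEW.sensor_id, date_trunc('hour', NEW.ts), 1, NEW.val, NEW.val, NEW.val);
  END IF;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER raw_data_rollup_trg
  AFTER INSERT ON raw_data
  FOR EACH ROW EXECUTE PROCEDURE raw_data_rollup();
</pre>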

<p>Finally, the fourth method can be used when your reporting needs are such that the raw data can be collected for some period before storing in your database. Let's say the summary reports are never run against the current hour. What you can do is build up a file in suitable format for bulk loading via <tt>COPY</tt>. Then process the data as it is bulk loaded to calculate and insert the summary at the same time. Again, I had done that in the past using a C program that read in the stored files, generated the summary data while building a string buffer, and finally used libpq's <tt>PQputCopyData()</tt> to populate the tables.</p>

<p>More than likely some combination of the above is what you really want. Perhaps use method 2 to maintain your weekly and daily aggregate materialized views, and use method 4 to update your hourly aggregate data.</p>

<p>This post was mostly discussion -- perhaps tomorrow I will continue with some more concrete examples.</p>]]>
        
    </content>
</entry>

<entry>
    <title>[Tip] PostgreSQL Tip of the Day - mass modification of sequences</title>
    <link rel="alternate" type="text/html" href="http://blog.credativ.com/en/2010/07/postgresql-tip-of-the-day---mass-modification-of-sequences.html" />
    <id>tag:blog.credativ.com,2010:/en//2.180</id>

    <published>2010-07-07T21:34:07Z</published>
    <updated>2010-12-07T12:45:11Z</updated>

    <summary>Someone posted a dilemma to the pgsql-sql list today that involved many if not all of his sequences getting out of sync with their respective &quot;serial&quot; columns. In other words, something like &quot;SELECT max(id) FROM sometable&quot; yields 42, but the...</summary>
    <author>
        <name>Joe Conway</name>
        <uri>http://www.credativ.us</uri>
    </author>
    
        <category term="Open Source" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="PostgreSQL" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Tip" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="credativ" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="plpgsql" label="plpgsql" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="postgresql" label="PostgreSQL" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="sequences" label="sequences" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://blog.credativ.com/en/">
        <![CDATA[<p><img alt="postgreslogo.png" src="/de/static/postgreslogo.png" width="97" height="100" class="mt-image-right" style="float: right; margin: 0 0 20px 20px;" />Someone posted a dilemma to the pgsql-sql list today that involved many if not all of his sequences getting out of sync with their respective "serial" columns. In other words, something like "SELECT max(id) FROM sometable" yields 42, but the sequence nextval for sometable.id is currently set to 36. This is obviously bad (for reasons left as an exercise for the reader). So besides trying to figure out how the database ended up in this state, he needed a script to reset all of his sequences to the correct next value.</p>
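
<p>For a single table the fix can be done by hand. A sketch, using the <tt>sometable.id</tt> example from above and assuming that <tt>id</tt> is a serial column and that the table is not empty:</p>
<pre class='brush: sql'>
-- point the sequence behind sometable.id at the current maximum,
-- so that the next nextval() returns max(id) + 1
SELECT setval(pg_get_serial_sequence('sometable', 'id'),
              (SELECT max(id) FROM sometable));
</pre>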

<p>I had run into a similar need not too long ago. Namely, when setting up multi-master replication with Bucardo you need your sequences to draw different values on either master so as not to conflict. One solution is to set up all your sequences to jump by 2, and use even numbers on one master and odd numbers on the other. Again, a script makes this easier to deal with, and I had developed one for this situation. So I modified it for the problem mentioned above.</p>

<p>Both versions follow:<br />
</p>
<pre class='brush: sql'>-- create &quot;odd&quot; and &quot;even&quot; sequences in multi-master scenario
CREATE OR REPLACE FUNCTION adjust_seqs(namespace text, starteven bool)
  RETURNS text AS $$
DECLARE
  rec      record;
  startval bigint;
  sql      text;
  fqname   text;
BEGIN
  -- visit every sequence (relkind 'S') in the given schema
  FOR rec IN EXECUTE 'select relname from pg_class where relkind = ''S''
                      and relnamespace = (select oid from pg_namespace
                      where nspname = ' || quote_literal(namespace) || ')' LOOP
    -- quote identifiers so that mixed-case names survive
    fqname := quote_ident(namespace) || '.' || quote_ident(rec.relname);
    IF starteven THEN
      -- next even number above last_value
      EXECUTE 'SELECT ((last_value / 2) * 2) + 2 FROM ' || fqname INTO startval;
    ELSE
      -- next odd number above last_value (never last_value itself)
      EXECUTE 'SELECT (((last_value + 1) / 2) * 2) + 1 FROM ' || fqname INTO startval;
    END IF;
    sql := 'ALTER SEQUENCE ' || fqname || ' INCREMENT BY 2 RESTART WITH ' || startval;
    EXECUTE sql;
    RAISE NOTICE '%', sql;
  END LOOP;
  RETURN 'OK';
END;
$$ LANGUAGE plpgsql STRICT;
SELECT adjust_seqs('public', true);  -- in master1 (even)
SELECT adjust_seqs('public', false); -- in master2 (odd)
</pre><p><br />
</p>
<pre class='brush: sql'>-- update sequences that have gotten out-of-sync with the
-- PK field for which they normally provide the default
CREATE OR REPLACE FUNCTION adjust_seqs(namespace text)
  RETURNS text AS $$
DECLARE
  rec      record;
  startval bigint;
  sql      text;
  seqname  text;
  fqtable  text;
BEGIN
  FOR rec IN EXECUTE 'select table_name, column_name, column_default
                      from information_schema.columns
                      where table_schema = ' || quote_literal(namespace) || '
                      and column_default like ''nextval%''' LOOP

    -- schema-qualify and quote the table so the lookup does not depend on search_path
    fqtable := quote_ident(namespace) || '.' || quote_ident(rec.table_name::text);
    seqname := pg_get_serial_sequence(fqtable, rec.column_name::text);
    -- skip columns whose default uses a sequence they do not own
    CONTINUE WHEN seqname IS NULL;
    sql := 'select max(' || quote_ident(rec.column_name::text) || ') + 1 from ' || fqtable;
    EXECUTE sql INTO startval;
    -- startval is NULL for empty tables; leave those sequences alone
    IF startval IS NOT NULL THEN
      sql := 'ALTER SEQUENCE ' || seqname || ' RESTART WITH ' || startval;
      EXECUTE sql;
      RAISE NOTICE '%', sql;
    END IF;
  END LOOP;
  RETURN 'OK';
END;
$$ LANGUAGE plpgsql STRICT;
select adjust_seqs('public');
</pre><p></p>
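<p>To check or repair a single table by hand, something like the following can be used. This is only a minimal sketch: <tt>sometable</tt> and <tt>id</tt> are the example names from the introduction, and <tt>sometable_id_seq</tt> assumes the default name PostgreSQL gives to the sequence of a serial column.</p>
<pre class='brush: sql'>-- compare the highest key in use with the current sequence position
SELECT (SELECT max(id) FROM sometable)                  AS max_id,
       (SELECT last_value FROM sometable_id_seq)        AS seq_last_value;

-- move the sequence so that the next nextval() returns max(id) + 1
-- (returns NULL and changes nothing if the table is empty)
SELECT setval(pg_get_serial_sequence('sometable', 'id'),
              (SELECT max(id) FROM sometable));
</pre><p></p>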

<p>Neither of these is heavily tested, and both make certain assumptions, so please test and modify to suit your own needs. Caveat emptor!</p>]]>
        
    </content>
</entry>

<entry>
    <title>[Tip] PostgreSQL Tip of the Day - loading a PostGIS database dump</title>
    <link rel="alternate" type="text/html" href="http://blog.credativ.com/en/2010/07/postgresql-tip-of-the-day---loading-a-postgis-database-dump.html" />
    <id>tag:blog.credativ.com,2010:/en//2.179</id>

    <published>2010-07-07T01:10:01Z</published>
    <updated>2010-12-07T12:44:00Z</updated>

    <summary>I was given a Postgres database dump to analyze today created by &quot;pg_dump -Fc&quot;. The source database included PostGIS 1.3.x extensions. I&apos;m not sure if this is standard with PostGIS, but the related database objects were all dumped with a...</summary>
    <author>
        <name>Joe Conway</name>
        <uri>http://www.credativ.us</uri>
    </author>
    
        <category term="Open Source" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="PostgreSQL" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Tip" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="credativ" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="postgis" label="PostGIS" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="postgresql" label="PostgreSQL" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://blog.credativ.com/en/">
        <![CDATA[<p>I was given a Postgres database dump to analyze today created by "pg_dump -Fc". The source database included PostGIS 1.3.x extensions. I'm not sure if this is standard with PostGIS, but the related database objects were all dumped with a hard-coded library path, specifically <tt>/usr/lib/postgresql/8.3/lib</tt>. On my machine, I have many PostgreSQL clusters (essentially at least one for every supported branch dating back to 7.3.x), but they are not located under <tt>/usr/lib/postgresql</tt>.</p>

<p>As such, I needed a quick fix. To wit:<br />
</p>
<pre class='brush: sql'>pg_restore database.with.postgis.tgz &gt; db.w.postgis.dmp
sed 's|/usr/lib/postgresql/8.3/lib|$libdir|g' &lt; db.w.postgis.dmp &gt; db.w.postgis.dmp.new
</pre><p></p>

<p>The first line converts the dump file from the compressed "custom" format into a human-readable SQL text file. The second line replaces the hard-coded library path with the special PostgreSQL $libdir placeholder, which every PostgreSQL installation expands to its own package library directory. You can always discover where this is by running:<br />
<pre>pg_config --pkglibdir</pre></p>]]>
        
    </content>
</entry>

<entry>
    <title>PostgreSQL 9.0 is now in Beta Phase</title>
    <link rel="alternate" type="text/html" href="http://blog.credativ.com/en/2010/05/postgresql-90-is-now-in-betaphase.html" />
    <id>tag:blog.credativ.com,2010:/en//2.162</id>

    <published>2010-05-25T10:29:00Z</published>
    <updated>2010-05-25T10:16:44Z</updated>

    <summary> The PostgreSQL developers&apos; community recently published the first Beta version of the new 9.0 release. Over 200 new functions and improvements feature in this new version. With this new release, PostgreSQL now amongst other features claims an inbuilt replication...</summary>
    <author>
        <name>Bernd Helmle</name>
        
    </author>
    
        <category term="News" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="PostgreSQL" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="credativ" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="postgresql" label="PostgreSQL" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://blog.credativ.com/en/">
        <![CDATA[<p><img alt="postgreslogo.png" src="/de/static/postgreslogo.png" width="97" height="100" class="mt-image-right" style="float: right; margin: 0 0 20px 20px;" /><br />
<em>The PostgreSQL developers' community recently published the first Beta version of the new 9.0 release. Over 200 new functions and improvements feature in this new version.</em></p>

<p>With this <a href="http://www.postgresql.org/about/news.1198">new release</a>, PostgreSQL gains, amongst other features, a built-in replication solution (Streaming Replication) as well as the ability to run read-only queries on standby nodes that are continuously updated by <a href="http://www.postgresql.org/docs/8.4/static/warm-standby.html">log shipping</a> (Hot Standby). Streaming replication sends transaction logs directly to one or more standby nodes, which considerably reduces how far a standby lags behind compared with the more common, file-based log shipping. Combining these two features makes for an extremely efficient solution for high-availability or load-balanced systems.</p>

<p>The all new PostgreSQL version also offers the following innovations:</p>

<ul>
	<li>Memory-based <strong>LISTEN/NOTIFY</strong>: this replaces the previous table-based implementation and is much faster.</li>
	<li><strong>Exclusion Constraints</strong>: these generalise unique constraints to arbitrary operators, so that, for example, overlapping values of complex data types can be rejected.</li>
	<li>Procedural code such as PL/pgSQL, PL/Perl and PL/Python can now be executed inline with the <strong>DO</strong> command. This means there is no longer any need to define a function with <strong>CREATE FUNCTION</strong> first.</li>
	<li>Triggers on columns</li>
	<li>Triggers can now be tied to conditions</li>
	<li>Named argument lists for procedures</li>
	<li>Parameters can now be flexibly linked to roles/databases</li>
</ul>
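
<p>Some of these features can be tried with a handful of statements. The following is only a minimal sketch against a 9.0 server; the table <tt>circles</tt>, the role <tt>report_user</tt> and the database <tt>sales</tt> are made-up names for illustration.</p>
<pre class='brush: sql'>-- DO: run procedural code inline, without CREATE FUNCTION
DO $$
BEGIN
  RAISE NOTICE 'hello from an anonymous code block';
END;
$$;

-- Exclusion constraint: no two rows may hold overlapping circles
CREATE TABLE circles (
  c circle,
  EXCLUDE USING gist (c WITH &amp;&amp;)
);

-- Parameter tied to a role/database combination (hypothetical names)
ALTER ROLE report_user IN DATABASE sales SET work_mem = '64MB';
</pre><p></p>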

<p>As always, anyone interested is invited to share their test results with the developers. Information on how to test and how to file bug reports can be found in the <a href="http://wiki.postgresql.org/wiki/HowToBetaTest">Wiki</a>.</p>

<p>All blog articles which fall into the <a href="/en/postgresql/">PostgreSQL category</a> are grouped in their own feed, and if you find you need <a href="http://www.credativ.co.uk/services/support/projects/databases/postgresql">support and services for PostgreSQL</a>, you've come to the right place at credativ.</p>]]>
        
    </content>
</entry>

<entry>
    <title>[Howto] PostgreSQL and Linux Memory Management</title>
    <link rel="alternate" type="text/html" href="http://blog.credativ.com/en/2010/03/postgresql-and-linux-memory-management.html" />
    <id>tag:blog.credativ.com,2010:/en//2.151</id>

    <published>2010-03-26T13:57:26Z</published>
    <updated>2010-03-26T14:53:42Z</updated>

    <summary>The OOM-Killer can cause nasty surprises on machines with a heavy memory load; processes are cancelled or terminated without warning. Fortunately, this behaviour can be adjusted with some clever kernel tweaks. Administrators of Linux machines with a very high RAM-Usage...</summary>
    <author>
        <name>Bernd Helmle</name>
        
    </author>
    
        <category term="Howto" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Linux" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="PostgreSQL" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="credativ" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://blog.credativ.com/en/">
        <![CDATA[<p><img alt="postgreslogo.png" src="/de/static/postgreslogo.png" width="97" height="100" class="mt-image-right" style="float: right; margin: 0 0 20px 20px;" /><em>The OOM-Killer can cause nasty surprises on machines with a heavy memory load; processes are cancelled or terminated without warning. Fortunately, this behaviour can be adjusted with some clever kernel tweaks.</em></p>

<p>Administrators of Linux machines with very high RAM usage are sometimes faced with a terrifying scenario: the Linux <a href="http://linux-mm.org/OOM_Killer">OOM-Killer</a> (OOM = Out Of Memory). After a PostgreSQL instance has crashed, for example, an entry like the following can typically be found in the server's system log:<br />
</p>
<pre class='brush: text'>
Out of Memory: Killed process PID (process name)
</pre><p></p>

<p>Why is this?</p>

<h3>Virtual Memory and Overcommit</h3>

<p>Linux can allocate virtual memory in a number of ways: malloc(), mmap(), swap and shared memory, to mention some examples. It is possible to overcommit virtual memory by allocating more than is actually available in the system. If processes then actually use more memory than the system can provide, a so-called "OOM-Condition" occurs; that is, your system no longer has any available space in the virtual memory area and cannot allocate any more. This is when the OOM-Killer is activated and does what its name suggests: it kills processes which meet certain conditions in order to free memory.</p>

<p>If you have an environment where servers are running PostgreSQL in parallel with other memory-intensive processes on the same machine, it's likely that the OOM-Killer will kill certain PostgreSQL processes. Due to the amount of allocated shared memory and the memory usage of each backend, the OOM-Killer will target PostgreSQL by preference, because it adds the complete shared memory area addressed by <strong>all</strong> backends to the total.</p>

<p>The amount of committed memory of your system at a given time can be examined via the <tt>/proc</tt> filesystem:<br />
</p>
<pre class='brush: text'>
$ grep Commit /proc/meminfo 
CommitLimit:    376176 kB
Committed_AS:   265476 kB
</pre><p></p>

<p>This example shows the current amount of committed memory at <tt>265476 kB</tt> (<tt>Committed_AS</tt>). If this value is equal to or even larger than <tt>CommitLimit</tt>, the OOM-Killer is likely to be woken up.</p>
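
<p>The two values can also be put in relation with a short one-liner. This is only a sketch, assuming <tt>awk</tt> is installed; the wording of the output line is made up for illustration.</p>
<pre class='brush: text'>
$ awk '/^CommitLimit:/ {limit=$2} /^Committed_AS:/ {used=$2} END {printf &quot;%.1f%% of CommitLimit committed\n&quot;, used * 100 / limit}' /proc/meminfo
</pre><p></p>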

<p>However, the kernel provides some interfaces to adjust the behaviour of the OOM-Killer and Overcommit with regard to PostgreSQL installations.</p>

<h3>Turn off Overcommit</h3>

<p>A radical method is to turn overcommit off entirely, although this is only recommended on systems dedicated to PostgreSQL. The overcommit behaviour is selected with the following kernel parameter:<br />
</p>
<pre class='brush: text'>
vm.overcommit_memory = 0
</pre><p></p>

<p>The parameter accepts three values:</p>

<ul>
       <li><strong>0</strong>: Allow a careful strategy of overcommitting memory: small and reasonable amounts of overcommitting allocations are allowed, but heavy and wild allocations will be denied. In this mode, root can allocate more space than unprivileged users. This is also the kernel default setting.</li>
        <li><strong>1</strong>: Allow overcommit without any constraints</li>
        <li><strong>2</strong>: Turn off overcommit. The effective allocatable memory space cannot be larger than <tt>swap</tt> + a configurable percentage of physical RAM.</li>
</ul> 

<p>The percentage of physical RAM used with the value <tt>2</tt> is defined by the parameter:<br />
</p>
<pre class='brush: text'>
vm.overcommit_ratio = 50
</pre><p></p>

<p>While <tt>vm.overcommit_memory=1</tt> is useful when tuning certain applications, the values <tt>0</tt> or <tt>2</tt> are the best ones to use most of the time. If you turn off overcommit with <tt>vm.overcommit_memory=2</tt>, a process will get an "out of memory" error when it tries to allocate memory beyond the limit determined by <tt>vm.overcommit_ratio</tt>. We recommend that you save those settings in the configuration file <tt>/etc/sysctl.conf</tt> (its location may depend on the distribution you are using) to ensure that they are activated on server reboot.<br />
</p>
<pre class='brush: text'>
$ echo &quot;vm.overcommit_memory=2&quot; &gt;&gt; /etc/sysctl.conf
$ echo &quot;vm.overcommit_ratio=60&quot; &gt;&gt; /etc/sysctl.conf
$ sysctl -p /etc/sysctl.conf
</pre><p></p>

<p>Once loaded with <tt>sysctl -p</tt>, changes to those parameters take effect immediately. You can verify this by consulting <tt>/proc/meminfo</tt>: <br />
</p>
<pre class='brush: text'>
$ grep Commit /proc/meminfo 
CommitLimit:    401440 kB
Committed_AS:   266456 kB
</pre><p></p>

<p>The machine has <tt>249848 kB</tt> of swap and <tt>252656 kB</tt> physical RAM. <br />
According to the formula <tt>swap + RAM * vm.overcommit_ratio / 100</tt> this results in a <tt>CommitLimit</tt> of <tt>401440 kB</tt>.</p>
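
<p>The arithmetic can be retraced in the shell. This sketch assumes a page size of 4 kB and that the kernel computes the limit in whole pages, which reproduces the value shown above exactly:</p>
<pre class='brush: text'>
$ swap_kb=249848; ram_kb=252656; ratio=60; page_kb=4
$ echo $(( (swap_kb / page_kb + ram_kb / page_kb * ratio / 100) * page_kb ))
401440
</pre><p></p>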

<h3>Configure OOM-Killer per process</h3>

<p>Where PostgreSQL is running without dedicated server hardware and in parallel with memory-intensive middleware (e.g. JBoss or Tomcat installations), most admins would prefer to control the OOM-Killer on a per-process basis and still allow overcommitting of memory allocations. Since kernel 2.6.11, Linux has provided an interface for tuning the OOM-Score of a process, which in turn increases or decreases the likelihood of that process being killed in an OOM-Situation. This interface allows a very flexible configuration of processes in such environments according to their memory requirements. The interface is exposed via the <tt>/proc</tt> filesystem, for example here on a PostgreSQL installation on Debian:<br />
</p>
<pre class='brush: text'>
$ cat /proc/$(cat /var/run/postgresql/8.4-main.pid)/oom_adj
0
</pre><p></p>

<p>Allowed values range from -17 to +15: a negative value decreases, while a positive value increases, the likelihood of being killed by the OOM-Killer. -17 is a special value that exempts the process from being killed in an OOM-Situation altogether.<br />
The setting is inherited from parent to child processes; for PostgreSQL you therefore have to set it on the PostgreSQL master process:<br />
</p>
<pre class='brush: text'>
$ echo -17 &gt; /proc/$(cat /var/run/postgresql/8.4-main.pid)/oom_adj
$ psql -q test
test=# SELECT pg_backend_pid();
 pg_backend_pid 
----------------
           3429
(1 row)

test=# 
[1]+  Stopped                 psql -q test
$ cat /proc/3429/oom_adj
-17
</pre><p></p>

<p>The disadvantage of this method is that <strong>all</strong> child processes will now be excluded from the OOM-Killer, which is not generally what DBAs prefer. For example, you may want to protect the PostgreSQL system processes (like <tt>background writer</tt> or <tt>autovacuum</tt>) from being killed by the OOM-Killer, but still have ordinary database connections killed when running out of memory.</p>

<p>Lowering the OOM-Score requires a privileged user (typically root), so the best way to implement this setting is to put it into your PostgreSQL start script.</p>
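
<p>A start script fragment along these lines might look as follows. This is only a sketch: the user name, binary path and data directory are placeholders for a source-built installation and need adjusting, and the script has to run as root.</p>
<pre class='brush: text'>
#!/bin/sh
# Hypothetical locations; adjust to your installation.
PGUSER=postgres
PGBIN=/usr/local/pgsql/bin
PGDATA=/usr/local/pgsql/data

# Exempt this shell from the OOM-Killer; every process started
# from here on, including the PostgreSQL master process, inherits the value.
echo -17 &gt; /proc/self/oom_adj

# Start PostgreSQL as the unprivileged database user.
su - &quot;$PGUSER&quot; -c &quot;$PGBIN/pg_ctl -D '$PGDATA' start&quot;
</pre><p></p>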

<h3>Enhancements in PostgreSQL 9.0</h3>

<p><a href="/de/2010/02/postgresql-agenda-2010.html">PostgreSQL 9.0</a> will have additional <a href="http://archives.postgresql.org/pgsql-committers/2010-01/msg00169.php">support</a> for the <tt>/proc</tt> interface described above. On the one hand, PostgreSQL 9.0 will come with a new <a href="http://git.postgresql.org/gitweb?p=postgresql.git;a=blob_plain;f=contrib/start-scripts/linux;hb=HEAD">Linux start script</a>, which supports setting the <tt>oom_adj</tt> value before starting up PostgreSQL; on the other hand, it is possible to build PostgreSQL with the special C macro <tt>LINUX_OOM_ADJ</tt> defined. Child processes of the master process then reset their own <tt>oom_adj</tt> to the given value, which allows DBAs to keep backend child processes from inheriting the OOM-Score, as shown in this example:<br />
</p>
<pre class='brush: text'>
$ ./configure CC=&quot;ccache gcc&quot; CFLAGS=&quot;-DLINUX_OOM_ADJ=0&quot;
</pre><p></p>

<p>This method protects the PostgreSQL master process but still allows the OOM-Killer to kill database backend processes that run amok.</p>

<h3>Alternatives</h3>

<p>An alternative solution is available via an <a href="http://www.cybertec.at/en/linux-kernel-patch">additional kernel patch</a>. This extends the existing <tt>/proc</tt> filesystem with a list of process names which should be excluded from the OOM-Killer. However, this patch is an unofficial extension to the Linux kernel and you may have to maintain your own builds of Linux kernels. In addition, it is not nearly as flexible as adjusting the OOM-Score, and process names are not useful for uniquely identifying processes (e.g. Java- or Perl-based processes).</p>

<h3>Summary</h3>

<p>The Linux kernel provides a comprehensive interface to adjust how processes are treated with regard to their memory usage and the OOM-Killer. The most flexible method is the <tt>oom_adj</tt> interface in the <tt>/proc</tt> filesystem introduced above. PostgreSQL 9.0 will have additional support for this interface. Dedicated PostgreSQL systems can be configured to avoid overcommit altogether, but this requires a deeper understanding of the amount of memory the database system demands and of the requirements of the kernel's VM subsystem.</p>]]>
        
    </content>
</entry>

<entry>
    <title>PostgreSQL 9.0alpha4 released</title>
    <link rel="alternate" type="text/html" href="http://blog.credativ.com/en/2010/02/postgresql-90alpha4-released.html" />
    <id>tag:blog.credativ.com,2010:/en//2.120</id>

    <published>2010-02-25T12:02:50Z</published>
    <updated>2010-03-08T14:22:33Z</updated>

    <summary>The PostgreSQL project just released the Alpha 4 of its upcoming PostgreSQL 9.0. The Alpha4 version of the upcoming PostgreSQL 9.0 release is ready for download. It is planned that Alpha4 will be the last Alpha version before the Beta...</summary>
    <author>
        <name>Bernd Helmle</name>
        
    </author>
    
        <category term="News" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Open Source" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="PostgreSQL" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://blog.credativ.com/en/">
        <![CDATA[<p><img alt="postgreslogo.png" src="/en/static/postgreslogo.png" width="97" height="100" class="mt-image-right" style="float: right; margin: 0 0 20px 20px;" /><em>The PostgreSQL project has just released Alpha 4 of its upcoming PostgreSQL 9.0.</em></p>

<p>The Alpha4 version of the upcoming PostgreSQL 9.0 release is ready for <a href="http://www.postgresql.org/developer/alpha">download</a>. It is planned that Alpha4 will be the last Alpha version before the Beta release cycle for PostgreSQL 9.0. Some highlights of this release are:
<ul>
	<li>Reworked LISTEN/NOTIFY infrastructure: performance has improved massively compared to the old table-based implementation, thanks to a pure main-memory solution. Additionally, the new solution supports so-called <em>"Payloads"</em>, which make it possible to transport messages.</li>
	<li>Streaming Replication: an integrated solution for replication which has noticeably lower latency than the usual WAL-shipping-based solutions.</li>
	<li>Procedural code in plpgsql and plperl can now be executed with the DO statement without the need to issue a CREATE FUNCTION first.</li>
</ul></p>
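
<p>The payload support can be tried directly in <tt>psql</tt>. A minimal sketch using the syntax of the released 9.0; the channel name <tt>jobs</tt> and the message texts are made up for illustration:</p>
<pre class='brush: sql'>-- session 1: subscribe to a channel
LISTEN jobs;

-- session 2: send a notification with a payload string
NOTIFY jobs, 'order 42 ready';

-- the same from within a query or function
SELECT pg_notify('jobs', 'order 43 ready');
</pre><p></p>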

<p>You are very welcome to <a href="http://www.postgresql.org/developer/alpha">download</a> the Alpha version, test it and play with it. The developers are interested in bug reports and test results; the procedure for submitting these is <a href="http://wiki.postgresql.org/wiki/HowToBetaTest">outlined</a> in their Wiki.</p>]]>
        
    </content>
</entry>

<entry>
    <title>PostgreSQL Optimizer Bits: Semi and Anti Joins</title>
    <link rel="alternate" type="text/html" href="http://blog.credativ.com/en/2010/02/postgresql-optimizer-bits-semi-and-anti-joins.html" />
    <id>tag:platon.credativ.com,2010:/en//2.119</id>

    <published>2010-02-25T11:57:10Z</published>
    <updated>2010-06-23T10:42:54Z</updated>

    <summary>The series &quot;PostgreSQL Optimiser Bits&quot; will introduce the strategies and highlights of the PostgreSQL optimiser. We start today with a new feature of PostgreSQL 8.4: Semi and Anti Joins. Since version 8.4, PostgreSQL has been offering a new optimisation strategy...</summary>
    <author>
        <name>Bernd Helmle</name>
        
    </author>
    
        <category term="Debian" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Open Source" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="PostgreSQL" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://blog.credativ.com/en/">
        <![CDATA[<img alt="postgreslogo.png" src="/en/static/postgreslogo.png" width="97" height="100" class="mt-image-right" style="float: right; margin: 0 0 20px 20px;" /><em>The series "PostgreSQL Optimiser Bits" will introduce the strategies and highlights of the PostgreSQL optimiser.  We start today with a new feature of PostgreSQL 8.4: Semi and Anti Joins.</em>
<br />
<br />
Since version 8.4, PostgreSQL has offered a new strategy for optimising certain queries: Semi and Anti Joins.
<br />
<br />
A <strong>Semi Join</strong> is a specific form of join which only takes a key of relation <tt>a</tt> into account if it is also present in the associated table <tt>b</tt>; no columns of <tt>b</tt> are returned, and a row of <tt>a</tt> appears at most once however many matches exist. An <strong>Anti Join</strong> is the negative form of a Semi Join: that is, a key picked in table <tt>a</tt> will be taken into account only if it is not present in table <tt>b</tt>.
<br />
<br />
To summarize, Semi and Anti Joins are specific forms of a join which only take certain keys on the left side into account. They suit queries that want to make sure certain keys exist (or do not exist), but are not concerned with the content of the matching rows. This pattern is already widely known from Object-Relational Mappers (ORMs), which formulate such queries using EXISTS() or NOT EXISTS().
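<br />
<br />
As a sketch of the two query shapes, the following uses the tables <tt>a</tt> and <tt>b</tt> and the columns <tt>id</tt> and <tt>id2</tt> from the example further down; the table definitions themselves are an assumption, since they are not shown in this article.
<pre class='brush: sql'>
-- assumed minimal table definitions
CREATE TABLE b (id integer);
CREATE TABLE a (id integer, id2 integer);

-- candidate for a Semi Join: rows of a that have a partner in b
SELECT id FROM a WHERE EXISTS (SELECT 1 FROM b WHERE a.id2 = b.id);

-- candidate for an Anti Join: rows of a that have no partner in b
SELECT id FROM a WHERE NOT EXISTS (SELECT 1 FROM b WHERE a.id2 = b.id);
</pre>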
<br />
<br />
In PostgreSQL 8.4, the same query gets a much simpler and more efficient query plan than in PostgreSQL 8.3. The following simple example shows this improvement: take two tables, <tt>a</tt> and <tt>b</tt>, and an EXISTS() query. We are looking for the rows of <tt>a</tt> with a given <tt>id</tt> that have a counterpart in <tt>b</tt> with <tt>a.id2 = b.id</tt>. Of course, this can also be accomplished with a plain join; however, the example shows how the optimizer has improved at solving this kind of query.
<pre class='brush: sql'>
EXPLAIN SELECT id FROM a WHERE a.id = 200 AND EXISTS(SELECT id FROM b WHERE a.id2 = b.id);
</pre>

The optimiser in PostgreSQL 8.3 determines the following plan for this example. Keep in mind that the tables <tt>a</tt> and <tt>b</tt> have indexes on the columns <tt>id</tt> and <tt>id2</tt>.
<pre class='brush: sql'>
                                QUERY PLAN
--------------------------------------------------------------------------
 Index Scan using a_id_idx on a  (cost=0.00..8355.27 rows=503 width=4)
   Index Cond: (id = 200)
   Filter: (subplan)
   SubPlan
     -&gt;  Index Scan using b_id_idx on b  (cost=0.00..8.27 rows=1 width=4)
           Index Cond: ($0 = id)
</pre>
In contrast, in PostgreSQL 8.4 the optimizer can use a hash Semi Join:
<pre class='brush: sql'>
                                QUERY PLAN
---------------------------------------------------------------------------
 Hash Semi Join  (cost=27.52..78.16 rows=969 width=4)
   Hash Cond: (a.id2 = b.id)
   -&gt;  Index Scan using a_id_idx on a  (cost=0.00..37.32 rows=969 width=8)
         Index Cond: (id = 200)
   -&gt;  Hash  (cost=15.01..15.01 rows=1001 width=4)
         -&gt;  Seq Scan on b  (cost=0.00..15.01 rows=1001 width=4)
</pre>
The estimated cost of this query plan is dramatically lower (78.16 instead of 8355.27), and lower costs generally mean fewer I/O accesses. A closer analysis of such queries is therefore well worth a look in future.]]>
        
    </content>
</entry>

<entry>
    <title>PostgreSQL Agenda 2010</title>
    <link rel="alternate" type="text/html" href="http://blog.credativ.com/en/2010/02/postgresql-agenda-2010-2.html" />
    <id>tag:platon.credativ.com,2010:/en//2.104</id>

    <published>2010-02-11T11:36:02Z</published>
    <updated>2010-03-05T11:02:58Z</updated>

    <summary>PostgreSQL is taking some big steps forward this year. The publishing of version 9.0 is just around the corner, while some of the older versions are coming to the end of their lifetime. PostgreSQL 9.0 2010 will see PostgreSQL release...</summary>
    <author>
        <name>Bernd Helmle</name>
        
    </author>
    
        <category term="News" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Open Source" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="PostgreSQL" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Support" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="postgresql" label="PostgreSQL" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://blog.credativ.com/en/">
        <![CDATA[<img alt="postgreslogo.png" src="/en/static/postgreslogo.png" width="97" height="100" class="mt-image-right" style="float: right; margin: 0 0 20px 20px;" /><em>PostgreSQL is taking some big steps forward this year. The release of version 9.0 is just around the corner, while some of the older versions are coming to the end of their lifetime.</em>

<h3>PostgreSQL 9.0</h3>

2010 will see PostgreSQL release its first major new version for a long time: version 9.0. The release of version 9.0 is an important milestone in the evolution of PostgreSQL. Integral to this release are new features such as the operation of standby servers in read-only mode (hot standby) and an integrated replication solution.

<h4>Hot Standby</h4>

Hot standby will allow a PostgreSQL instance to receive read requests on so-called standby nodes. The basic principle is the same as that included since version 8.0 under the name PITR (Point In Time Recovery) or WAL-Shipping. A copy of the database is created, and the transaction logs (known as the Write Ahead Log or WAL) are then transferred at regular intervals, so that the standby nodes can be kept up to date with changes in the master database. In practice, this means incrementally applying all changes that were made on the master database from the point when the standby node was created. This was implemented as warm standby in previous versions, i.e. the database contained within a standby node could not be used by applications. With hot standby, however, it is possible to execute transactions on the node as long as they do not contain write operations. This is especially useful for high availability systems or for analyses that can be run on separate nodes.

<h4>Streaming Replication - inbuilt asynchronous replication</h4>

For a long time in the PostgreSQL community, it was widely thought amongst developers that the infrastructure of an integrated replication system was difficult to maintain due to the complex requirements and variety of deployment scenarios. Therefore the flexibility and security expected of such solutions has been implemented in various specialised external projects. In recent years however, extensive communication with users has led to a large proportion of the desired functionality being implemented within PostgreSQL, mostly in the area of high availability. Thanks to this, an integrated solution is no longer just a dream, even for systems containing hundreds of gigabytes of data. Furthermore, the availability of an integrated replication solution is a critical factor for many data centres when choosing a database management system.

Streaming replication means that PostgreSQL can now offer an integrated solution for asynchronous replication of a primary database server (read- and writeable) to multiple additional secondary servers (read only). This functionality, based in part on the infrastructure implemented for WAL-Shipping, has made possible the replication of transactions in much smaller intervals. (Data is sent directly from the primary to the secondary server, hence the name "streaming"). Moreover, streaming replication permits the simple implementation of PostgreSQL replication clusters with multiple nodes. Whilst this is already possible with the existing hot-standby solution, it is much more complicated. Since the replicating data is based upon information from the WAL, this solution is extremely robust. Deployment scenarios such as partially replicated databases or modified database schemas are not currently possible on each replicated node, although these requirements are still achievable through the use of solutions such as <a href="http://www.slony.info/">Slony-I</a>, <a href="https://developer.skype.com/SkypeGarage/DbProjects/SkyTools">Londiste</a> or <a href="http://bucardo.org/wiki/Bucardo">Bucardo</a>.
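
The settings involved can be sketched as follows. This reflects the parameter names of the released PostgreSQL 9.0, which were still subject to change during the alpha phase; the host name and user name are placeholders.
<pre class='brush: text'>
# postgresql.conf on the primary
wal_level = hot_standby      # write enough WAL information for read-only standbys
max_wal_senders = 3          # allow up to three streaming connections

# postgresql.conf on the standby
hot_standby = on             # accept read-only queries during recovery

# recovery.conf on the standby
standby_mode = 'on'
primary_conninfo = 'host=primary.example.com user=replication_user'
</pre>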

<h3>Farewell to PostgreSQL 7.4, 8.0 and 8.1</h3>

2010 will herald the end of support for some versions of PostgreSQL. For the first time, three main versions are due to be phased out in the same year:
<ul>
	<li>PostgreSQL 7.4, July 2010</li>
	<li>PostgreSQL 8.0, July 2010</li>
	<li>PostgreSQL 8.1, November 2010</li>
</ul>
Support for PostgreSQL 8.0 and 8.1 on Windows was discontinued with the release of PostgreSQL 8.3 in February 2008. PostgreSQL 8.0 was the first release that could run natively on Windows, with many bugs being patched during development that could no longer be backported to older versions. So for quite some time now, Windows users have had to use at least PostgreSQL 8.2. We are now officially coming to the end of support for all other platforms, and also the last of the 7 series releases; PostgreSQL 7.4 is finally being phased out after 7 years. "Phased out" in PostgreSQL terms means that, primarily, no further binary packages or releases will be made and no further complex fixes will be ported, although the source code will continue to be available. As a rule, the PostgreSQL development team limit the lifetime of a main release to five years. However, the Windows variants of PostgreSQL 8.0 and 8.1 are proof that the lifetime of releases for single platforms can be shortened. The Release Policy can be found in the <a href="http://wiki.postgresql.org/wiki/PostgreSQL_Release_Support_Policy">developer wiki</a> on the PostgreSQL project site.

<h3>Outlook</h3>

Although PostgreSQL 9.0 is not yet finished, hot standby <a href="http://www.postgresql.org/developer/alpha">can be tested</a> with version 8.5alpha3. Incidentally, the current alpha version is still named after the development branch 8.5, as it was called before the decision was made to move to version 9.0. Version 9.0alpha4 can be expected by late February, and should also include streaming replication. For those interested in testing, there is a guide with the title <a href="http://wiki.postgresql.org/wiki/HowToBetaTest">"How To Beta Test"</a>, which provides some guidelines for testing and feedback.]]>
        
    </content>
</entry>

</feed>
