<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
    <title>en.credativ blog: Category credativ</title>
    <link rel="alternate" type="text/html" href="http://blog.credativ.com/en/" />
    <link rel="self" type="application/atom+xml" href="http://blog.credativ.com/en/atom.xml" />
    <id>tag:blog.credativ.com,2010-03-05:/en//2</id>
    <updated>2011-06-07T14:33:32Z</updated>
    <subtitle>All about Linux and Open Source</subtitle>
    <generator uri="http://www.sixapart.com/movabletype/">Movable Type 4.34-en</generator>

<entry>
    <title>credativ and OpenERP Partner to Take on Proprietary ERP Giants</title>
    <link rel="alternate" type="text/html" href="http://blog.credativ.com/en/2011/06/-rugby-uk---6.html" />
    <id>tag:blog.credativ.com,2011:/en//2.199</id>

    <published>2011-06-07T14:12:03Z</published>
    <updated>2011-06-07T14:33:32Z</updated>

    <summary> Rugby, UK - 6 June 2011 credativ Ltd, the UK branch of the largest independent provider of Open Source consultancy in Europe, today announced that it is partnering with OpenERP in a move aimed at increasing OpenERP&#8217;s share of...</summary>
    <author>
        <name>Irenie White</name>
        <uri>http://www.credativ.co.uk</uri>
    </author>
    
        <category term="News" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Open Source" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="credativ" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="news" label="News" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="openerp" label="OpenERP" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="opensource" label="Open Source" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="partnership" label="Partnership" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://blog.credativ.com/en/">
        <![CDATA[<p><img alt="OpenERP_138.png" src="http://blog.credativ.com/en/OpenERP_138.png" width="190" height="46" class="mt-image-right" style="float: right; margin: 0 0 20px 20px;" /></p>

<p><strong>Rugby, UK - 6 June 2011</strong></p>

<p><a href="http://www.credativ.co.uk/">credativ Ltd</a>, the UK branch of the largest independent provider of Open Source consultancy in Europe, today announced that it is partnering with OpenERP in a move aimed at increasing OpenERP&#8217;s share of the UK enterprise resource planning (ERP) market.</p>

<p>Chris Halls, MD, credativ UK, comments on the partnership: &#8220;OpenERP provides a flexible, robust and cost-effective alternative to proprietary systems such as SAP, JD Edwards EnterpriseOne and Sage, and is especially attractive to SMEs that may have previously found the cost of ERP systems prohibitive.&#8221;</p>

<p>&#8220;credativ has already introduced OpenERP to UK SMEs and enterprises in the manufacturing, ecommerce and logistics industries. credativ&#8217;s customers using OpenERP have already realised business benefits including cost savings, streamlined processes, improved visibility and simplified reporting.&#8221;</p>

<p>OpenERP's comprehensive suite of modular applications caters for all major business processes, including CRM, project management, warehouse management, manufacturing, financial management and human resources.</p>

<p>credativ has been providing open source training and consultancy to public and private sector clients since 1999. The credativ team has extensive experience of working with OpenERP; recent implementation work includes delivering customisations for warehousing, accounting, VAT, reporting and Magento e-commerce integration.</p>

<p>Halls continues: &#8220;Our partnership with OpenERP underlines our commitment to improving the system&#8217;s functionality. We want to highlight open source ERP as an alternative to less flexible proprietary platforms, and believe that this new partnership will bring our experience, size and range of services to organisations who are considering OpenERP.</p>

<p>&#8220;OpenERP&#8217;s modular design allows organisations to introduce or replace existing ERP systems at their own pace without the burden of ongoing licensing costs. We see our partnership with OpenERP as an opportunity to encourage more organisations to make the move to open source.</p>

<p>&#8220;credativ's unique support offering is available from operating systems to business applications - at scale. Our international <a href="http://www.credativ.co.uk/services/support">OSSC</a> (Open Source Support Centre) provides support and consultancy not only for OpenERP but for all major open source applications and distributions.&#8221;</p>

<p>Committed to actively participating in the Open Source community, members of credativ&#8217;s 40+ developer team regularly contribute to projects; recent input includes OpenERP bug fixes, banking functionality and VAT reporting modules.</p>

<p><strong>About credativ:</strong></p>

<p>Founded in 1999, credativ is an independent consulting and services company which operates from Germany, the U.K., Canada, and the U.S. With a large team of experts in open source software, credativ offers a vast knowledge base that can be tapped into at any time by its clients. The company focuses on the service and support of open source software with a comprehensive range of services, including open source consulting, architectural and technical advice, open source software development, open source training, and personalised support. credativ is &#8220;Your One-Stop Shop for Open Source Support<small>TM</small>&#8221;.</p>]]>
        
    </content>
</entry>

<entry>
    <title>credativ and Black Duck announce International Partnership</title>
    <link rel="alternate" type="text/html" href="http://blog.credativ.com/en/2010/12/credativ-black-duck-partnership.html" />
    <id>tag:blog.credativ.com,2010:/en//2.195</id>

    <published>2010-12-06T15:13:54Z</published>
    <updated>2010-12-07T10:11:59Z</updated>

    <summary> Rugby, 6 December 2010 - credativ Ltd and Black Duck Software Inc. have announced an international partnership to help further the deployment and integration of Open Source Software. The OSSC (Open Source Support Centre) run by credativ in the...</summary>
    <author>
        <name>Irenie White</name>
        <uri>http://www.credativ.co.uk</uri>
    </author>
    
        <category term="News" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Open Source" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Support" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="credativ" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="blackduck" label="black duck" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="news" label="news" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="opensourcesupport" label="open source support" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="partnership" label="partnership" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://blog.credativ.com/en/">
        <![CDATA[<p><img alt="black duck image.jpeg" src="http://blog.credativ.com/en/2010/12/06/black%20duck%20image.jpeg" width="70" height="70" class="mt-image-right" style="float: right; margin: 0 0 20px 20px;" /></p>

<p><em><strong>Rugby, 6 December 2010</strong> - credativ Ltd and Black Duck Software Inc. have announced an international partnership to help further the deployment and integration of Open Source Software.</em></p>

<p>The <a href="http://www.credativ.co.uk/services/support/">OSSC</a> (Open Source Support Centre) run by credativ in the UK, US, Germany and Canada will now also be providing support for customers of Black Duck Software Inc. </p><p><a href="http://www.blackducksoftware.com/news/releases/2010-11-30/">Black Duck Software</a> is a worldwide provider of &#8220;managed software component reuse&#8221; solutions; they support businesses and organisations who use Open Source and third party source code in adhering to relevant licensing obligations, thereby reducing the associated business risks.</p><p>Through this partnership with credativ, Black Duck can now also offer comprehensive technical support for the many free software projects which are developed through extensive developer communities rather than through an organisation. This service guarantees Black Duck customers additional security for complex <a href="http://credativ.co.uk/services/">Open Source services</a> and provides an alternative which is comparable to the manufacturer's support available with proprietary software.</p><p>Mr. Chris Halls, Managing Director of credativ Ltd in the UK, explains: </p><p>&#8220;We are delighted about the partnership with Black Duck. We hope that combining our competencies will enable us to cover all the requirements for safe operation of Open Source software. Our partnership is a good basis for further international expansion - our Open Source Support Centres will be enhancing Black Duck's service offering, not only for the US but also the European market.&#8221;</p>

<p>If you would like to know more about our Open Source involvement simply leave us a comment here... alternatively please <a href="http://www.credativ.co.uk/contact/">contact us</a> directly.</p>

<p style="margin-bottom: 0cm; widows: 0; orphans: 0;" class="western"><strong>About credativ</strong></p><p style="margin-bottom: 0cm; widows: 0; orphans: 0;" class="western"><span>Founded
in 1999, credativ is an independent consulting and services company
which operates from Germany, the U.K., Canada, and the U.S. With a
large team of experts in open source software, credativ offers a vast
knowledge base that can be tapped into by its clients. The company
focuses on the service and support of open source software with a
comprehensive range of services, including open source consulting,
architectural and technical advice, open source software development,
open source training, and personalized support. credativ is &#8220;Your
One-Stop Shop for Open Source Support&#8221; </span><sup><span>TM</span></sup><span>.</span></p><p>The Open Source Support Centre (OSSC) offers support for the following:</p><p><em>Debian,
Kubuntu, Ubuntu, Xandros, SUSE, Red Hat, Fedora, CentOS, Linspire,
Mandriva, Slackware, OpenBSD, GNOME, KDE, MySQL, PostgreSQL, PostGIS,
Slony, Zarafa, eGroupware, Kolab Groupware, Scalix, SugarCRM, vtiger,
CITADEL, Mozilla-Firefox, Mozilla-Suite, OpenOffice, Thunderbird, Wine,
Apache, Asterisk, OpenSER, FreePBX, OpenPBX, CallWeaver, SpamAssassin,
ClamAV, OpenLDAP, OTRS, RT, Samba, Cyrus, Dovecot, Exim, Postfix,
sendmail, Amanda, Bacula, DRBD, Heartbeat, Keepalived, Nagios, Open
Security Filter, Ferm, FAI, Squid, XEN, VirtualBox.</em></p><p>For further information please contact: </p><p>
credativ Ltd,<br />
36 Regent Street,<br />
Rugby,<br />
Warwickshire,<br />
CV21 2PS</p><h4>Press contact</h4><p>
Simon Bowring
</p><p>
Tel: +44 (0) 1788 298150<br />
Fax: +44 (0) 1788 298159<br />
Email: <a href="mailto:simon.bowring@credativ.co.uk">simon.bowring@credativ.co.uk</a></p><p class="western" style="margin-bottom: 0cm; font-style: normal; font-weight: normal; text-decoration: none;" lang="en-GB"><span><strong>About Black Duck Software Inc</strong></span></p><p style="margin-bottom: 0cm; widows: 0; orphans: 0;" class="western">Black
Duck Software is the leading provider of products and services for
automating the management, governance and secure use of open source
software, at enterprise scale, in a multi-source development process.
Black Duck™ enables companies to shorten time-to-solution and
reduce development costs while mitigating the management, compliance
and security challenges associated with open source software.&#160; Black
Duck Software powers Koders.com, the industry&#8217;s leading code search
engine for open source, and is among the 500 largest software
companies in the world, according to Softwaremag.com. The company is
headquartered near Boston and has offices in San Mateo, California,
London, Paris, Frankfurt, Hong Kong, Tokyo and Beijing.</p><p style="margin-bottom: 0cm; widows: 0; orphans: 0;" class="western">For
more information, visit <a href="http://www.blackducksoftware.com/">www.blackducksoftware.com</a>.&#160;</p><p style="margin-bottom: 0cm; widows: 0; orphans: 0;" class="western"><em>Black
Duck, Know Your Code and the Black Duck logo are registered
trademarks of Black Duck Software, Inc. in the United States and
other jurisdictions. Koders is a trademark of Black Duck Software,
Inc. All other trademarks are the property of their respective
holders.</em></p><h4>Press contacts</h4><p class="western" style="margin-bottom: 0cm; font-style: normal; text-decoration: none;" lang="en-GB"><strong><strong>Peter
Vescuso</strong></strong><br />Black Duck
Software<br /><a href="mailto:press@blackducksoftware.com">press@blackducksoftware.com
</a><br />+1 781-891-5100</p><p class="western" style="margin-bottom: 0cm; font-style: normal; text-decoration: none;" lang="en-GB"><strong><strong>Ann
Dalrymple</strong></strong><br />TopazPartners</p>]]>
        
    </content>
</entry>

<entry>
    <title>Open Source lives - PostgreSQL developers at credativ</title>
    <link rel="alternate" type="text/html" href="http://blog.credativ.com/en/2010/09/open-source-lives---postgresql-developers-at-credativ.html" />
    <id>tag:blog.credativ.com,2010:/en//2.175</id>

    <published>2010-09-20T14:00:34Z</published>
    <updated>2010-09-20T13:56:42Z</updated>

    <summary> Earlier this year, blogger and PostgreSQL Committer Andrew Dunstan drew up a list of individual Committers to the PostgreSQL Project. We are proud to say that this list featured some of our employees. In May, PostgreSQL&apos;s Andrew Dunstan published...</summary>
    <author>
        <name>Irenie White</name>
        <uri>http://www.credativ.co.uk</uri>
    </author>
    
        <category term="Open Source" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="PostgreSQL" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="credativ" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://blog.credativ.com/en/">
        <![CDATA[<p><img alt="postgreslogo.png" src="/de/static/postgreslogo.png" width="97" height="100" class="mt-image-right" style="float: right; margin: 0 0 20px 20px;" /><br />
<em>Earlier this year, blogger and PostgreSQL Committer Andrew Dunstan drew up a list of individual Committers to the PostgreSQL Project. We are proud to say that this list featured some of our employees.</em><br />
<br/><br />
In May, PostgreSQL's Andrew Dunstan published some data about the productivity of PostgreSQL Committers in <a href="http://people.planetpostgresql.org/andrew/index.php?/archives/79-30,000-commits-and-still-going-strong.html/">30,000 commits and still going strong</a>, detailing the number of commits made by developers with commit rights. Incidentally, becoming a Committer is no mean feat; although there is no set procedure for acquiring the right to commit, it generally follows a candidate having submitted numerous good patches over a long period of time. Existing Committers or the core team will then propose and approve granting commit rights to the candidate. </p>

<p>credativ can claim involvement with many other Open Source Projects in addition to PostgreSQL. Community involvement is taken seriously at credativ, as is evident from Andrew Dunstan's statistics. A few of the Committers mentioned work at various international credativ offices: Michael Meskes, Joe Conway and Dave Cramer. What is not clear from Dunstan's list is the number of credativ employees who contribute large amounts of code but are not actually Committers. Take Bernd Helmle, for example: readers of this blog will know him from his <a href="http://blog.credativ.com/en/postgresql/">PostgreSQL articles</a> as both an author and a developer, yet he does not feature in Andrew's statistics.</p>

<p>Nevertheless credativ's presence on this list is indicative of our achievements as well as our employees' connections with Open Source; if you would like to know more about our Open Source involvement simply leave us a comment here... and if you are interested in <a href="http://www.credativ.co.uk/services/support/">Open Source Support</a>, please <a href="http://www.credativ.co.uk/contact/">contact us</a>.<br />
</p>]]>
        
    </content>
</entry>

<entry>
    <title>[Tip] PostgreSQL Tip of the Day - which configs require restart?</title>
    <link rel="alternate" type="text/html" href="http://blog.credativ.com/en/2010/09/tip-postgresql-tip-of-the-day---which-configs-require-restart.html" />
    <id>tag:blog.credativ.com,2010:/en//2.190</id>

    <published>2010-09-11T01:12:19Z</published>
    <updated>2010-09-11T01:32:34Z</updated>

    <summary> I&apos;ve been asked on at least three separate occasions lately how to know if changing a particular postgresql.conf item requires a restart, or a reload, of PostgreSQL. Here is my quick and dirty favorite way to answer this question:...</summary>
    <author>
        <name>Joe Conway</name>
        <uri>http://www.credativ.us</uri>
    </author>
    
        <category term="PostgreSQL" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Tip" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="credativ" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="plpgsql" label="plpgsql" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="postgresql" label="PostgreSQL" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="sequences" label="sequences" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://blog.credativ.com/en/">
        <![CDATA[<p><img alt="postgreslogo.png" src="http://blog.credativ.com/en/static/postgreslogo.png" width="97" height="100" class="mt-image-right" style="float: right; margin: 0 0 20px 20px;" /><br />
I've been asked on at least three separate occasions lately how to know if changing a particular postgresql.conf item requires a restart, or a reload, of PostgreSQL. Here is my quick and dirty favorite way to answer this question:<br />
<br/></p>
<pre class='brush: sql'>
-- configs requiring postgresql restart
select name, setting, context
  from pg_settings where context = 'postmaster';

-- configs requiring postgresql reload
select name, setting, context
 from pg_settings where context = 'sighup';
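
-- a further sketch, assuming you only care about one setting:
-- look it up by name ('shared_buffers' is just an example name, not part
-- of the original tip) and read its context column. The remaining context
-- values are described in the pg_settings documentation.
select name, setting, context
  from pg_settings where name = 'shared_buffers';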
</pre>]]>
        
    </content>
</entry>

<entry>
    <title>CIOZone Interview</title>
    <link rel="alternate" type="text/html" href="http://blog.credativ.com/en/2010/08/ciozone-interview.html" />
    <id>tag:blog.credativ.com,2010:/en//2.186</id>

    <published>2010-08-11T14:55:23Z</published>
    <updated>2010-08-11T14:55:29Z</updated>

    <summary>CIOZone, a social network for CIOs, recently interviewed our Founder and CEO Dr. Michael Meskes. CIOZone is a central place where CIOs can network. In this video, Roger Green takes the time to drop in to our office in Moenchengladbach,...</summary>
    <author>
        <name>Lukas Gärtner</name>
        <uri>http://blog.credativ.de</uri>
    </author>
    
        <category term="News" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Open Source" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="credativ" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://blog.credativ.com/en/">
        <![CDATA[<p><a href="http://blog.credativ.com/de/assets_c/2010/08/mme-ciozone-79.html" onclick="window.open('http://blog.credativ.com/de/assets_c/2010/08/mme-ciozone-79.html','popup','width=443,height=324,scrollbars=no,resizable=no,toolbar=no,directories=no,location=no,menubar=no,status=no,left=0,top=0'); return false"><img src="http://blog.credativ.com/de/assets_c/2010/08/mme-ciozone-thumb-100x73-79.png" width="100" height="73" alt="mme-ciozone.png" class="mt-image-right" style="float: right; margin: 0 0 20px 20px;" /></a><em>CIOZone, a social network for CIOs, recently interviewed our Founder and CEO Dr. Michael Meskes.</em><br/><br />
CIOZone is a central place where CIOs can network. In this video, Roger Green takes the time to drop in to our office in Moenchengladbach, Germany, to interview Michael Meskes, the founder of <a href="http://www.credativ.com">credativ</a>, about the history of the company, how Open Source has developed and how the business is different today from what it was 10 years ago.</p>

<p>This discussion is followed by an analysis of current developments and future challenges: the difference between Open Source vendors and proprietary global players, virtualisation and cloud computing in relation to Open Source, and what to keep in mind when migrating to Open Source software.</p>

<p>Read on or watch the video at <a href="http://www.ciozone.com/index.php/Open-Source-Video/Interview-with-Dr.-Michael-Meskes-Founder/CEO-Credativ-GMBH-Germany.html">ciozone.com</a>.</p>

<p>If you're looking for support, services and <a href="http://www.credativ.co.uk/services/training">training</a> for Open Source software, you've come to the right place at credativ!</p>]]>
        
    </content>
</entry>

<entry>
    <title>PostgreSQL topic of the Day - PL/R performance improvements</title>
    <link rel="alternate" type="text/html" href="http://blog.credativ.com/en/2010/07/postgresql-topic-of-the-day---plr-performance-improvements.html" />
    <id>tag:blog.credativ.com,2010:/en//2.184</id>

    <published>2010-07-24T18:31:12Z</published>
    <updated>2010-07-24T20:30:16Z</updated>

    <summary>When you pass large amounts of data to and from PL/R, quite a lot of time is needed for converting. A change is being tested which treats arrays of 4 byte integers and 8 byte floating point values as a...</summary>
    <author>
        <name>Joe Conway</name>
        <uri>http://www.credativ.us</uri>
    </author>
    
        <category term="Open Source" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="PostgreSQL" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="credativ" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="analytics" label="analytics" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="plr" label="PL/R" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="postgresql" label="PostgreSQL" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="r" label="R" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://blog.credativ.com/en/">
        <![CDATA[<p><img alt="postgreslogo.png" src="/de/static/postgreslogo.png" width="97" height="100" class="mt-image-right" style="float: right; margin: 0 0 20px 20px;" /><img alt="Rlogo.jpg" src="http://blog.credativ.com/en/Rlogo.jpg" width="100" height="76" class="mt-image-right" style="float: right; margin: 0 0 20px 20px;" /><em>When you pass large amounts of data to and from PL/R, quite a lot of time is needed for converting. A change is being tested which treats arrays of 4 byte integers and 8 byte floating point values as a special case, resulting in a dramatic performance improvement.</em></p>

<p>In a recent post, I discussed PL/R performance related to seismic timeseries data stored as an array of floats that are all recorded during some seismic event at a constant sampling rate. The problem was that when dealing with, say, 14000 arrays of floats, each having on the order of 16000 elements, passing the data to and from PL/R proved slower than hoped.</p>

<p>My ultimate solution was to show how a significant performance improvement could be achieved by importing the arrays into Postgres tables directly as raw R objects, and then operating on those objects later using PL/R. The problem with this approach is that in some, if not most, cases, you may want to access that same data from other procedural languages or hand off the arrays to some client other than R. In this case the raw R object does not meet your needs.</p>

<p>So I thought about it a bit and researched the source code on the Postgres and R sides of PL/R, and concluded that for certain special cases it was possible to dramatically improve speed by skipping the one-at-a-time element conversion as arrays are processed going between PostgreSQL and R. Specifically, the in-memory storage of the array data is binary compatible in the following circumstances:<br />
<ol><li>pgsql -> R<ul><li>Argument is integer or double precision array</li><li>Element data type is pass-by-value for given Postgres version and architecture</li><li>No NULL elements</li><li>Array is one dimensional</li></ul></li><li>R -> pgsql<ul><li>Integer vector returned with integer array return type</li><li>Real vector returned with double precision array return type</li><li>No NA elements</li><li>One dimensional vector</li></ul></li></ol></p>
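
<p>A rough way to check the pgsql -> R conditions from SQL might look like the sketch below. It is not taken from the PL/R source; it assumes PostgreSQL 8.4 or later (for <tt>array_ndims</tt> and <tt>unnest</tt>) and uses the <tt>test_ts</tt> table and its <tt>data</tt> column from the timing test in this post purely as an example:</p>
<pre class='brush: sql'>
-- is the element type pass-by-value on this build?
select typname, typlen, typbyval
  from pg_type where typname in ('int4', 'float8');

-- rows whose array is not one dimensional or contains NULL elements
-- (these would not meet the conditions listed above)
select dataid
  from test_ts
 where array_ndims(data) is distinct from 1
    or exists (select 1 from unnest(data) as u(elem) where elem is null);
</pre>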

<p>Pass-by-value is most likely true for double precision (float8) if PostgreSQL is at least version 8.4 and was built for a 64-bit architecture. If these conditions are met, PL/R now simply copies en masse the in-memory array data from the PostgreSQL array data structure to the R vector data structure. This avoids all the overhead associated with iterating over the array element by element. Although I am not a fan of special case code such as this, the use case is important (if you are crunching numbers, they are likely stored as double precision elements), and the performance benefit is huge. Here is the timing difference with the patched PL/R versus the unpatched PL/R:<br />
</p>
<pre class='brush: sql'>
DROP TABLE IF EXISTS test_ts;
CREATE TABLE test_ts
(
  dataid bigint NOT NULL,
  data double precision[],
  CONSTRAINT pk_data PRIMARY KEY (dataid)
);

CREATE OR REPLACE FUNCTION load_test(int) RETURNS text AS $$
 DECLARE
  i    int;
 BEGIN
  FOR i IN 1..$1 LOOP
   -- 16879 double precision elements in the data array
   INSERT INTO test_ts (dataid, data) VALUES (i, '{-0.0205086770285039,...}');
  END LOOP;
  RETURN 'OK';
 END;
$$ LANGUAGE plpgsql;

CREATE OR REPLACE FUNCTION
filt_r_nothing(ts double precision[])
RETURNS double precision[] AS $$
  return(ts);
$$ LANGUAGE 'plr' IMMUTABLE;

CREATE OR REPLACE FUNCTION
filt_r_avg(ts double precision[])
RETURNS double precision AS $$
  return(mean(ts));
$$ LANGUAGE 'plr' IMMUTABLE;

-- INSERT 14000 rows of 16879 element arrays
SELECT load_test(14000);

-- unpatched code
UPDATE test_ts SET data = filt_r_nothing(data);
UPDATE 14000
Time: 1224087.064 ms

-- patched code
UPDATE test_ts SET data = filt_r_nothing(data);
UPDATE 14000
Time: 225591.429 ms

-- unpatched code
contrib_regression=# select filt_r_avg(data) from test_ts;
    filt_r_avg     
-------------------
 0.656530643017027
 0.656530643017027
[...]
(14000 rows)
Time: 441573.619 ms

-- patched code
contrib_regression=# select filt_r_avg(data) from test_ts;
    filt_r_avg     
-------------------
 0.656530643017027
 0.656530643017027
[...]
(14000 rows)
Time: 6541.039 ms

-- unpatched code
select array_upper(filt_r_nothing(data),1) from test_ts;
 array_upper 
-------------
       16879
       16879
[...]
(14000 rows)
Time: 1108651.349 ms

-- patched code
select array_upper(filt_r_nothing(data),1) from test_ts;
 array_upper 
-------------
       16879
       16879
[...]
(14000 rows)
Time: 23101.602 ms
</pre><p><br />
So to summarize:<table><tr color="gray"><td>Test</td><td>Case</td><td>Time (ms)</td><td>Improvement</td></tr><tr><td>UPDATE NOOP</td><td>Unpatched</td><td>1224087.064</td><td>--</td></tr><tr><td>UPDATE NOOP</td><td>Patched</td><td>225591.429</td><td>82%</td></tr><tr><td>SELECT AVG</td><td>Unpatched</td><td>441573.619</td><td>--</td></tr><tr><td>SELECT AVG</td><td>Patched</td><td>6541.039</td><td>98%</td></tr><tr><td>SELECT NOOP</td><td>Unpatched</td><td>1108651.349</td><td>--</td></tr><tr><td>SELECT NOOP</td><td>Patched</td><td>23101.602</td><td>98%</td></tr></table></p>
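
<p>The Improvement column is simply the reduction in elapsed time relative to the unpatched run, 1 - patched / unpatched. As a quick sketch, the figures can be recomputed from the timings above (the column and alias names here are made up for illustration); to one decimal place this should give roughly 81.6, 98.5 and 97.9, in line with the whole percentages in the table to within about half a point:</p>
<pre class='brush: sql'>
-- recompute the percentage improvement from the measured times
select test,
       round(100 * (1 - patched_ms / unpatched_ms), 1) as improvement_pct
  from (values ('UPDATE NOOP', 1224087.064, 225591.429),
               ('SELECT AVG',   441573.619,   6541.039),
               ('SELECT NOOP', 1108651.349,  23101.602))
       as t(test, unpatched_ms, patched_ms);
</pre>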

<p>Pretty substantial improvement in these particular, but I think common, use cases. The UPDATE test sees less overall benefit because the time to write out the changes would be significant and the same regardless of array handling in PL/R. The difference between SELECT NOOP and SELECT AVG is driven by the fact that the latter returns a scalar result, while the former returns the entire array. The reason SELECT NOOP does array_upper() on the returned array is that otherwise all that array data (something like 4 GB) gets materialized in memory by psql, which of course greatly slows things further and is not what we are trying to test.</p>

<p>Please give the changes a try and provide feedback before I release another PL/R version. You can grab the new code from <a href="http://github.com/jconway/plr">github</a> and sign up for the <a href="http://pgfoundry.org/mail/?group_id=1000247">PL/R mailing list</a> to post your results or report any questions/problems. And of course visit the <a href="http://www.joeconway.com/plr/">PL/R homepage</a> and <a href="http://www.joeconway.com/web/guest/pl/r">PL/R wiki</a> for more general information about PL/R -- particularly to watch for these changes in the next official release. Finally, don't hesitate to <a href="http://www.credativ.us/contact/">contact me</a> directly if the other choices don't suit you for some reason.</p>]]>
        
    </content>
</entry>

<entry>
    <title>[Howto] Debian preseed with Netboot</title>
    <link rel="alternate" type="text/html" href="http://blog.credativ.com/en/2010/07/howto-debian-preseed-with-netboot.html" />
    <id>tag:blog.credativ.com,2010:/en//2.172</id>

    <published>2010-07-23T11:00:00Z</published>
    <updated>2010-07-23T11:09:59Z</updated>

    <summary>The vast majority of Debian installations are simplified with the use of Preseeding and Netboot. Friedrich Weber, a school student on a work experience placement with us at our German office has observed the process and captured it in a...</summary>
    <author>
        <name>Irenie White</name>
        <uri>http://www.credativ.co.uk</uri>
    </author>
    
        <category term="Debian" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Howto" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Linux" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="credativ" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="credativ" label="credativ" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="debian" label="Debian" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="howto" label="howto" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="linux" label="Linux" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="preseed" label="preseed" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://blog.credativ.com/en/">
        <![CDATA[<p><img alt="debianlogo.png" src="http://blog.credativ.com/de/static/debianlogo.png" width="60" height="73" class="mt-image-right" style="float: right; margin: 0 0 20px 20px;" /><em>The vast majority of Debian installations are simplified with the use of Preseeding and Netboot. Friedrich Weber, a school student on a work experience placement with us at our German office, has observed the process and captured it in a Howto here:</em></p>

<p>Imagine the following situation: you find yourself with ten to twenty brand new notebooks and the opportunity to install Debian on them and customise it to your own taste. Performing the Debian installation and configuration manually on each notebook would hardly be great fun, though. This is where <a href="http://d-i.alioth.debian.org/manual/en.i386/apb.html">Debian Preseed</a> comes into play.</p>

<p>The concept is simple and self-explanatory: normally, whoever performs the installation is asked a number of questions along the way (e.g. language, partitioning, packages, bootloader, etc.). With Preseed, all of these questions can be answered in advance. The Debian installer then asks only those questions that Preseed does not already answer. Ideally these come up right at the start of the installation and cover only settings that differ from one target system to the next, which the administrator must enter manually. Once they have been dealt with, the installation can be left to run unattended.</p>

<p>Preseed works from a simple configuration file: <tt>preseed.cfg</tt>. It contains, as described above, the answers to the questions asked during installation, in <a href="http://en.wikipedia.org/wiki/Debconf_(software_package)">debconf</a> format. Such a file consists of several lines, each of which defines a debconf configuration option (the answer to one question), for example:<br />
 </p>
<pre class='brush: text'>
    d-i debian-installer/locale string de_DE.UTF-8
</pre><p></p>

<p>The first element of such a line is the name of the package being configured (<tt>d-i</tt> is an abbreviation of debian-installer), the second is the name of the option being set, the third is the type of the option (here a string) and the rest of the line is the value of the option. In this example, we set the locale to German with UTF-8 encoding.</p>

<p>You can put lines like this together yourself, but it is even simpler with the tool <tt>debconf-get-selections</tt>, which lists the debconf options that are set on the local system. From that output you can pick the settings you want, adjust them if necessary and copy them into <tt>preseed.cfg</tt>.</p>
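
<p>To make the four-part line format concrete, here is a minimal Python sketch that splits such a line into package, option name, type and value. It is only an illustration: <tt>parse_preseed_line</tt> is a made-up helper and not part of debconf or the Debian installer.</p>
<pre class='brush: python'>def parse_preseed_line(line):
    """Return (owner, question, type, value) for one debconf line."""
    # The first three fields are separated by whitespace; everything
    # after the type belongs to the value, which may contain spaces.
    parts = line.strip().split(None, 3)
    if len(parts) == 3:
        # An option with an empty value, e.g. "d-i mirror/http/proxy string"
        parts.append('')
    if len(parts) != 4:
        raise ValueError('not a debconf line: %r' % line)
    return tuple(parts)


owner, question, qtype, value = parse_preseed_line(
    'd-i debian-installer/locale string de_DE.UTF-8')
print(owner, question, qtype, value)
</pre><p></p>

<p>Only the first three fields are split on whitespace, so a value such as <tt>standard, desktop</tt> stays in one piece.</p>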

<p>Here is an example of <tt>preseed.cfg</tt>:<br />
</p>
<pre class='brush: text'>
    d-i debian-installer/locale string de_DE.UTF-8
    d-i debian-installer/keymap select de-latin1
    d-i console-keymaps-at/keymap select de
    d-i languagechooser/language-name-fb select German
    d-i countrychooser/country-name select Germany
    d-i console-setup/layoutcode string de_DE

    d-i clock-setup/utc boolean true
    d-i time/zone string Europe/Berlin
    d-i clock-setup/ntp boolean true
    d-i clock-setup/ntp-server string ntp1

    tasksel tasksel/first multiselect standard, desktop, gnome-desktop, laptop
    d-i pkgsel/include string openssh-client vim less rsync
</pre><p></p>

<p>In addition to language and timezone settings, these options also select tasks and packages. This is not yet enough for a completely unattended installation, but it makes a good start.</p>

<p>Now onto the question of where Preseed gets its data from. It is in fact possible to use Preseed with CD and DVD images or USB sticks, but it is generally more convenient to use a Debian netboot image, essentially an installer which is started across the network and which can fetch its Preseed configuration from there. This boot across the network is implemented with <a href="http://wikipedia.org/wiki/Preboot_Execution_Environment">PXE</a> and requires a system that can boot from a network card.</p>

<p>When the system boots from the network card, it first requests an IP address from a DHCP server via broadcast. The DHCP server hands out not only a suitable IP address, but also the IP address of a so-called boot server. A boot server is a <a href="http://wikipedia.org/wiki/Trivial_File_Transfer_Protocol">TFTP server</a> that provides a bootloader, which in turn starts the Debian installer chosen by the administrator. At the same time, boot options can tell the Debian installer that it should use Preseed and where it can find the Preseed configuration. Here is a snippet of the PXELINUX configuration file <tt>pxelinux.cfg/default</tt>:<br />
</p>
<pre class='brush: text'>
    label i386
        kernel debian-installer/i386/linux
        append vga=normal initrd=debian-installer/i386/initrd.gz netcfg/choose_interface=eth0 domain=example.com locale=de_DE debian-installer/country=DE debian-installer/language=de debian-installer/keymap=de-latin1-nodeadkeys console-keymaps-at/keymap=de-latin1-nodeadkeys auto-install/enable=false preseed/url=http://$server/preseed.cfg DEBCONF_DEBUG=5 -- quiet 
</pre><p></p>

<p>When the user types <tt>i386</tt>, the <tt>debian-installer/i386/linux</tt> kernel (found on the TFTP server) is downloaded and run, together with a whole series of boot options. The Debian installer allows debconf options to be passed as boot parameters. Naturally, the installer has to be told where on the network to find the Preseed configuration (<tt>preseed/url</tt>). In order to download this Preseed configuration, the network must of course already be configured.</p>

<p>The options for that are passed as boot parameters as well (the hostname option is deliberately omitted here, as every target system has its own hostname). <tt>auto-install/enable</tt> would delay the language setup until after the network configuration, so that these settings could be read from <tt>preseed.cfg</tt>. That is not necessary here, because the language settings are also passed as kernel options, so that the network configuration already runs in German.</p>
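
<p>To see which of the boot parameters in the <tt>pxelinux.cfg/default</tt> snippet above are debconf options, the following Python sketch picks them out of a shortened copy of the <tt>append</tt> line. It is a rough illustration under one assumption: anything whose name contains a slash is treated as a debconf question. <tt>debconf_boot_options</tt> is a made-up helper, not something the installer provides.</p>
<pre class='brush: python'>APPEND = ('vga=normal initrd=debian-installer/i386/initrd.gz '
          'netcfg/choose_interface=eth0 domain=example.com locale=de_DE '
          'auto-install/enable=false '
          'preseed/url=http://$server/preseed.cfg DEBCONF_DEBUG=5 -- quiet')


def debconf_boot_options(append):
    """Return the package/question=value parameters as a dict."""
    options = {}
    for token in append.split():
        if token == '--':
            # This sketch only looks at the parameters before "--".
            break
        name, sep, value = token.partition('=')
        # Full debconf question names contain a slash (e.g. preseed/url);
        # plain kernel parameters and short forms are skipped here.
        if sep and '/' in name:
            options[name] = value
    return options


for name, value in sorted(debconf_boot_options(APPEND).items()):
    print(name, '=', value)
</pre><p></p>

<p>For the shortened line this leaves <tt>auto-install/enable</tt>, <tt>netcfg/choose_interface</tt> and <tt>preseed/url</tt>.</p>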

<p>The examples and configuration excerpts mentioned here are obviously summarised and shortened. Even so, this blog post should have given you a glimpse into the concept of Preseed in connection with netboot. Finally, here is a complete version of <tt>preseed.cfg</tt>:<br />
</p>
<pre class='brush: text'>
    d-i debian-installer/locale string de_DE.UTF-8
    d-i debian-installer/keymap select de-latin1
    d-i console-keymaps-at/keymap select de
    d-i languagechooser/language-name-fb select German
    d-i countrychooser/country-name select Germany
    d-i console-setup/layoutcode string de_DE

    # Network
    d-i netcfg/choose_interface select auto
    d-i netcfg/get_hostname string debian
    d-i netcfg/get_domain string example.com

    # Package mirror
    d-i mirror/protocol string http
    d-i mirror/country string manual
    d-i mirror/http/hostname string debian.example.com
    d-i mirror/http/directory string /debian
    d-i mirror/http/proxy string
    d-i mirror/suite string lenny

    # Timezone
    d-i clock-setup/utc boolean true
    d-i time/zone string Europe/Berlin
    d-i clock-setup/ntp boolean true
    d-i clock-setup/ntp-server string ntp.example.com

    # Root-Account
    d-i passwd/make-user boolean false
    d-i passwd/root-password password secretpassword
    d-i passwd/root-password-again password secretpassword

    # Further APT-Options
    d-i apt-setup/non-free boolean false
    d-i apt-setup/contrib boolean false
    d-i apt-setup/security-updates boolean true

    d-i apt-setup/local0/source boolean false
    d-i apt-setup/local1/source boolean false
    d-i apt-setup/local2/source boolean false

    # Tasks
    tasksel tasksel/first multiselect standard, desktop
    d-i pkgsel/include string openssh-client vim less rsync
    d-i pkgsel/upgrade select safe-upgrade

    # Popularity-Contest
    popularity-contest popularity-contest/participate boolean true

    # Command to be run after the installation. `in-target` means that the
    # following command is run in the installed environment, rather than in
    # the installation environment.
    # Here http://$server/skript.sh is downloaded to /tmp, made executable
    # and run.
    d-i preseed/late_command string in-target wget -P /tmp/ http://$server/skript.sh; \
      in-target chmod +x /tmp/skript.sh; in-target /tmp/skript.sh
</pre><p></p>

<p>All Howtos of this blog are grouped together in the <a href="/en/howto/">Howto category</a> - and if you happen to be looking for <a href="http://www.credativ.co.uk/services/support/projects/linux-distributions/debian-gnulinux/">Support and Services for Debian</a> you've come to the right place at credativ.</p>]]>
        
    </content>
</entry>

<entry>
    <title>PostgreSQL topic of the Day - advanced analytics</title>
    <link rel="alternate" type="text/html" href="http://blog.credativ.com/en/2010/07/postgresql-topic-of-the-day---advanced-analytics.html" />
    <id>tag:blog.credativ.com,2010:/en//2.183</id>

    <published>2010-07-12T02:58:22Z</published>
    <updated>2010-12-07T12:43:00Z</updated>

    <summary>When you pass large amounts of data to and from PL/R, quite a lot of time is needed for converting. It&apos;s better to directly store the data as R objects. I had been planning to continue with timeseries aggregation, but...</summary>
    <author>
        <name>Joe Conway</name>
        <uri>http://www.credativ.us</uri>
    </author>
    
        <category term="Open Source" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="PostgreSQL" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="credativ" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="analytics" label="analytics" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="plr" label="PL/R" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="postgresql" label="PostgreSQL" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="r" label="R" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://blog.credativ.com/en/">
        <![CDATA[<p><img alt="postgreslogo.png" src="/de/static/postgreslogo.png" width="97" height="100" class="mt-image-right" style="float: right; margin: 0 0 20px 20px;" /><img alt="Rlogo.jpg" src="http://blog.credativ.com/en/Rlogo.jpg" width="100" height="76" class="mt-image-right" style="float: right; margin: 0 0 20px 20px;" /><em>When you pass large amounts of data to and from PL/R, quite a lot of time is needed for converting. It's better to directly store the data as R objects.</em></p>

<p>I had been planning to continue with timeseries aggregation, but decided to take a side-road based on a recent question on the <a href="http://www.joeconway.com/plr">PL/R</a> mailing list.</p>

<p>The question was related to seismic data, which is in fact timeseries data. However, I guess the data is normally stored as an array of floats that are all recorded during some seismic event at a constant sampling rate. The arrays are available from online sources in an individual file for each event being analyzed. The problem was that when dealing with, say, 14000 arrays of floats, each having on the order of 16000 elements, passing the data to and from PL/R proved slower than hoped.</p>

<p>So we start with loading of sample data for a performance test:</p>
<pre class='brush: sql'>DROP TABLE IF EXISTS test_ts;
CREATE TABLE test_ts
(
  dataid bigint NOT NULL,
  data double precision[],
  CONSTRAINT pk_data PRIMARY KEY (dataid)
);

CREATE OR REPLACE FUNCTION filt_r_nothing(ts double precision[])
RETURNS double precision[] AS $$
 return(ts);
$$ LANGUAGE 'plr' IMMUTABLE;

CREATE OR REPLACE FUNCTION load_test(int) RETURNS text AS $$
  DECLARE
   i    int;
  BEGIN
    FOR i IN 1..$1 LOOP
      INSERT INTO test_ts(dataid,data) VALUES (i,'{-0.0205086770285039, ...}');
    END LOOP;
    RETURN 'OK';
  END;
$$ LANGUAGE plpgsql;

SELECT load_test(14000);
 load_test 
-----------
 OK
(1 row)

Time: 123861.362 ms
</pre><p></p>

<p>The array in the VALUES clause of that function actually contains 16879 float8 elements. You can see that it takes over two minutes on my development machine to load the table with 14000 rows of this array. Note that on my development machine I have done no tuning of PostgreSQL configs, and I built with --enable-debug, --enable-cassert, and CFLAGS='-O0 -g3'.</p>

<p>Next, we update the data column with filt_r_nothing() which does nothing other than returning the same array it was passed.<br />
</p>
<pre class='brush: sql'>UPDATE test_ts SET data = filt_r_nothing(data);
UPDATE 14000
Time: 1224087.064 ms
</pre><p></p>

<p>Not pretty. Over 20 minutes. I did some profiling of PL/R and concluded most of the time was being spent converting 16879 PostgreSQL array elements from float8 datums to R vector elements one at a time while processing the function argument, and then repeating the process in reverse while creating the returned result. Perhaps there are optimizations that can be made to that process, but since PostgreSQL and R each have their own binary representation of this data, there is no avoiding the conversion overhead.</p>

<p>However, what is the point of the proposed performance test? The comparison was being made to another procedural language, which apparently does not convert the array elements if they are not used. A real function is presumably going to do some calculation over the array elements, requiring that they be individually accessed.</p>

<p>I decided to see how PL/pgSQL performs if forced to modify and return the passed array. The difference between this test and the PL/R one will give some insight on the time spent converting elements from PostgreSQL to R native form.<br />
</p>
<pre class='brush: sql'>CREATE OR REPLACE FUNCTION
filt_plpgsql_nothing(ts double precision[])
RETURNS double precision[] AS $$
 BEGIN
  RETURN ts || 3.14159::float8;
 END
$$ LANGUAGE 'plpgsql' IMMUTABLE;

UPDATE test_ts SET data = filt_plpgsql_nothing(data);
UPDATE 14000
Time: 239054.580 ms
</pre><p></p>

<p>About 4 minutes. Much better. But let's see what happens if we do some more meaningful, if simple, calculations on the array elements.<br />
</p>
<pre class='brush: sql'>CREATE OR REPLACE FUNCTION
filt_plpgsql_avg(ts double precision[])
RETURNS double precision AS $$
 DECLARE
  i int;
  numts int = array_upper(ts,1);
  ts_sum float8 = 0.0;
 BEGIN
  FOR i IN 1..numts LOOP
    ts_sum := ts_sum + ts[i];
  END LOOP;
  RETURN (ts_sum/numts::float8);
 END
$$ LANGUAGE 'plpgsql' IMMUTABLE;

select filt_plpgsql_avg(data) from  test_ts;
--killed after &gt; 1 hour

CREATE OR REPLACE FUNCTION filt_r_avg(ts double precision[])
RETURNS double precision AS $$
 return(mean(ts));
$$ LANGUAGE 'plr' IMMUTABLE;

contrib_regression=# select filt_r_avg(data) from test_ts;
    filt_r_avg     
-------------------
 0.656530643017027
 0.656530643017027
[...]
(14000 rows)
Time: 441573.619 ms
</pre><p></p>

<p>Although the PL/R function still took over 7 minutes to process 14000 rows with 16879 elements, PL/pgSQL took long enough that I killed it out of impatience.</p>

<p>It occurred to me that a feature I added to PL/R within the past year or so might come in handy about now. Namely, it is possible to directly store R objects in PostgreSQL tables. This means that when the datum is passed to a PL/R function, it is all ready to go -- no conversion needed. Let's take a look at that scenario.<br />
</p>
<pre class='brush: sql'>DROP TABLE IF EXISTS test_ts_obj;
CREATE TABLE test_ts_obj
(
  dataid serial PRIMARY KEY,
  data bytea
);

CREATE OR REPLACE FUNCTION make_r_object(fname text)
RETURNS bytea AS $$
 myvar&lt;-scan(fname,sep=&quot;,&quot;)
 return(myvar);
$$ LANGUAGE 'plr' IMMUTABLE;

INSERT INTO test_ts_obj (data) SELECT make_r_object('array-data.csv') from generate_series(1,14000);
INSERT 0 14000
Time: 44182.598 ms

CREATE OR REPLACE FUNCTION filt_r_avg(ts bytea)
RETURNS double precision AS $$
 return(mean(ts));
$$ LANGUAGE 'plr' IMMUTABLE;

select filt_r_avg(data) from  test_ts_obj;
    filt_r_avg     
-------------------
 0.656530643017027
 0.656530643017027
 [...]
 0.656530643017027
(14000 rows)

Time: 12828.331 ms
</pre><p></p>

<p>This results in 44 seconds to load the same 14000 rows of array data as before, but directly as R objects. Compare that to the 2 minutes to load as PostgreSQL arrays as seen at the beginning of this article. And now it only takes 13 seconds to operate on the 14000 R objects compared to 442 seconds. That's a nice improvement!</p>
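
<p>As a quick arithmetic check of those numbers, this small Python sketch divides the timings reported in the psql output above. The variable names are mine, and the values are simply copied from the listings.</p>
<pre class='brush: python'># Timings in milliseconds, copied from the psql output above.
load_arrays_ms = 123861.362   # load_test(14000) into double precision[]
load_objects_ms = 44182.598   # INSERT ... make_r_object() into bytea
avg_arrays_ms = 441573.619    # filt_r_avg() over double precision[]
avg_objects_ms = 12828.331    # filt_r_avg() over stored R objects


def speedup(before_ms, after_ms):
    """How many times faster the second timing is than the first."""
    return before_ms / after_ms


print('load: %.1fx faster' % speedup(load_arrays_ms, load_objects_ms))
print('mean: %.1fx faster' % speedup(avg_arrays_ms, avg_objects_ms))
</pre><p></p>

<p>That works out to roughly 2.8 times faster for loading and roughly 34 times faster for computing the mean.</p>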

<p>But PL/R gives you access to the full power of the R environment for statistical computing and graphics. Just for fun, here is a PL/R function that calculates the "Power Spectrum" of the seismic data, and returns the result as a JPEG of the plot.<br />
</p>
<pre class='brush: sql'>CREATE OR REPLACE FUNCTION
filt_r_ps(ts bytea)
RETURNS bytea AS $$
  library(quantmod)
  library(cairoDevice)
  library(RGtk2)

  fourier&lt;-fft(ts)
  magnitude&lt;-Mod(fourier)
  y2 &lt;- magnitude[1:(length(magnitude)/10)]
  x2 &lt;- 1:length(y2)/length(magnitude)
  mydf &lt;- data.frame(x2,y2)

  pixmap &lt;- gdkPixmapNew(w=500, h=500, depth=24)
  asCairoDevice(pixmap)

  plot(mydf,type=&quot;l&quot;)
  plot_pixbuf &lt;- gdkPixbufGetFromDrawable(NULL, pixmap,
                                                        pixmap$getColormap(),
                                                        0, 0, 0, 0, 500, 500)
  buffer &lt;- gdkPixbufSaveToBufferv(plot_pixbuf,
                                                       &quot;jpeg&quot;,
                                                        character(0),
                                                        character(0))$buffer
  return(buffer)
$$ LANGUAGE 'plr' IMMUTABLE;
</pre><p></p>

<p>This is now not about performance so much as it is about analytical power. About half of the lines in this function are setting up to capture the output graph. The "meat" of the function can be contained in these few lines:<br />
</p>
<pre class='brush: sql'>fourier&lt;-fft(ts)
magnitude&lt;-Mod(fourier)
y2 &lt;- magnitude[1:(length(magnitude)/10)]
plot(x=1:length(y2)/length(magnitude),
       y=y2,
       type=&quot;l&quot;)
</pre><p></p>
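
<p>For readers without R at hand, here is a rough Python sketch of the same calculation: take the magnitude of the Fourier transform and keep the first tenth of it. It is not the PL/R code. A naive DFT stands in for R's <tt>fft()</tt>, and the test signal is invented for the example.</p>
<pre class='brush: python'>import math
import cmath


def magnitude_spectrum(samples):
    """Magnitude of the discrete Fourier transform (naive O(n^2) DFT)."""
    n = len(samples)
    spectrum = []
    for k in range(n):
        total = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))
        spectrum.append(abs(total))      # abs() plays the role of R's Mod()
    return spectrum


# Hypothetical test signal: 64 samples containing 4 full sine cycles.
n = 64
signal = [math.sin(2 * math.pi * 4 * t / n) for t in range(n)]

magnitude = magnitude_spectrum(signal)
low = magnitude[:n // 10]                # first tenth, as in the R code
peak = max(range(len(low)), key=lambda k: low[k])
print('strongest component at bin', peak)
</pre><p></p>

<p>The peak lands on bin 4, matching the four cycles in the test signal. For arrays of 16879 elements a real FFT implementation would be needed; the naive loop is only meant to show what is being computed.</p>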

<p>Complement that PL/R function with a bit of PHP code...<br />
</p>
<pre class='brush: sql'>&lt;?php
function hex2bin($data)
{
	$data = ltrim($data, &quot;\x&quot;);
	$len = strlen($data);
	return pack(&quot;H&quot; . $len, $data);
} 

$dbconn = pg_connect(&quot;dbname=contrib_regression&quot;);
$rs = pg_query( $dbconn, &quot;select plr_get_raw(filt_r_ps(data))
                                    from test_ts_obj where dataid = 42&quot;);
$hexpic = pg_fetch_array($rs);
$cleandata = hex2bin($hexpic[0]);

header(&quot;Content-Type: image/jpeg&quot;);
header(&quot;Last-Modified: &quot; .
date(&quot;r&quot;, filectime($_SERVER['SCRIPT_FILENAME'])));
header(&quot;Content-Length: &quot; . strlen($cleandata));
echo $cleandata;
?&gt;
</pre><p></p>

<p>...and the output looks like:<img alt="plr-blog.jpg" src="/en/jco/plr-blog.jpg" width="500" height="500" class="mt-image-center" style="text-align: center; display: block; margin: 0 auto 20px;" /></p>

<p>Fairly sophisticated output for relatively little effort! For more information or assistance with respect to PostgreSQL, PL/R, and/or advanced analytics, <a href="http://www.credativ.us/contact/">don't hesitate to contact us</a>.</p>]]>
        
    </content>
</entry>

<entry>
    <title>PostgreSQL topic of the Day - aggregating timeseries data</title>
    <link rel="alternate" type="text/html" href="http://blog.credativ.com/en/2010/07/tip-postgresql-tip-of-the-day---aggregating-timeseries-data.html" />
    <id>tag:blog.credativ.com,2010:/en//2.182</id>

    <published>2010-07-09T00:21:04Z</published>
    <updated>2010-12-07T12:44:45Z</updated>

    <summary>Frequently when dealing with parametric data, you need to &quot;roll up&quot; the data in summary fashion as it ages in order to reduce the volume kept on hand, or maybe because the summary statistics are what really interests you. There...</summary>
    <author>
        <name>Joe Conway</name>
        <uri>http://www.credativ.us</uri>
    </author>
    
        <category term="Open Source" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="PostgreSQL" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="credativ" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="postgresql" label="PostgreSQL" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://blog.credativ.com/en/">
        <![CDATA[<p><img alt="postgreslogo.png" src="/de/static/postgreslogo.png" width="97" height="100" class="mt-image-right" style="float: right; margin: 0 0 20px 20px;" /><em>Frequently when dealing with parametric data, you need to "roll up" the data in summary fashion as it ages in order to reduce the volume kept on hand, or maybe because the summary statistics are what really interests you. There are several ways to do that, and this post highlights four different approaches.</em></p>

<p>I was reminded of this kind of "roll up" today by a question on the pgsql-novice list. This is actually quite a large topic, so this tip will likely just scratch the surface. The question was related to storing min, max, and avg summaries on an hourly, daily, and weekly basis. The basic idea, for example, is that you can keep raw data for maybe a week, hourly summaries for 6 months, daily summaries for 3 years, and weekly summaries forever. As I mentioned in my reply, I have done this kind of thing over the years using at least 4 approaches:</p>

<ol>
	<li>Aggregate on demand</li>
	<li>Batch aggregate on a periodic basis -- e.g. run your aggregate query with a cron job which truncates and rebuilds a table (i.e. a  "materialized view")</li>
	<li>Write a C based trigger that does "continuous aggregation" to a materialized table</li>
	<li>Write a C based bulk loader that aggregates as it bulk loads the raw  data into a materialized table</li>
</ol>

<p>The first approach is simply to run an aggregate query whenever you need the summarized data. Obviously this does not really satisfy the stated desire to discard aged raw data, but I mention it for completeness. In some cases you have sufficient storage given your data volume, and performance of the aggregate is "good enough".</p>

<p>The second is the rough equivalent of a materialized view. In other words, run a batch job via cron or something similar that <tt>TRUNCATE</tt>s and then repopulates a table used for storage of the aggregate result. Particularly for daily or weekly summary data, when the consumers of the data are 9-5 folk, this approach works pretty well. This also fits in nicely with common partitioning schemes.</p>
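
<p>What such a batch job computes can be sketched in a few lines of Python. This is only an illustration of the idea, not code from any of my implementations: the readings are invented, and <tt>rollup_hourly</tt> is a hypothetical helper that rebuilds the whole summary from the raw rows each time it runs.</p>
<pre class='brush: python'>from collections import defaultdict
from datetime import datetime

# Hypothetical raw readings: (timestamp, value) pairs.
raw = [
    (datetime(2010, 7, 9, 10, 5), 2.0),
    (datetime(2010, 7, 9, 10, 40), 4.0),
    (datetime(2010, 7, 9, 11, 15), 9.0),
]


def rollup_hourly(rows):
    """Rebuild the whole summary from scratch, like TRUNCATE + repopulate."""
    buckets = defaultdict(list)
    for ts, value in rows:
        # Truncate the timestamp to the start of its hour.
        hour = ts.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(value)
    return {hour: (min(vals), max(vals), sum(vals) / len(vals))
            for hour, vals in buckets.items()}


for hour, (lo, hi, avg) in sorted(rollup_hourly(raw).items()):
    print(hour, lo, hi, avg)
</pre><p></p>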

<p>The third is one where you want summary statistics to be updated live. In this case you actually want the summary data for the current hour/day/week to all be constantly updated as new raw data comes in. Otherwise you are stuck always looking at last hour's, or yesterday's, or last week's data. The way to do this is through a trigger. A while back I implemented a continuous aggregation trigger in C that used prepared queries to update my aggregate table for every <tt>INSERT</tt>/<tt>UPDATE</tt>/<tt>DELETE</tt> occurring on the target table. However even with the trigger written in C and using prepared queries, the performance impact of the trigger firing for every DML event was significant.</p>
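
<p>The bookkeeping behind continuous aggregation can be sketched as follows, for the <tt>INSERT</tt> case only. This is a simplified Python illustration, not the C trigger: the <tt>summary</tt> dict stands in for the aggregate table, and <tt>on_insert</tt> is a hypothetical name for what the trigger would do per row.</p>
<pre class='brush: python'>from datetime import datetime

# hour -> [min, max, sum, count]; stands in for the aggregate table.
summary = {}


def on_insert(ts, value):
    """Update the running summary for one newly inserted raw row."""
    hour = ts.replace(minute=0, second=0, microsecond=0)
    row = summary.get(hour)
    if row is None:
        summary[hour] = [value, value, value, 1]
    else:
        row[0] = min(row[0], value)
        row[1] = max(row[1], value)
        row[2] += value
        row[3] += 1


def average(hour):
    """The average is derived from the stored sum and count."""
    row = summary[hour]
    return row[2] / row[3]


on_insert(datetime(2010, 7, 9, 10, 5), 2.0)
on_insert(datetime(2010, 7, 9, 10, 40), 4.0)
print(summary[datetime(2010, 7, 9, 10)], average(datetime(2010, 7, 9, 10)))
</pre><p></p>

<p>Storing the sum and count rather than the average keeps each update cheap. <tt>UPDATE</tt> and <tt>DELETE</tt> are harder: sum and count can be adjusted, but once the current minimum or maximum is removed, the new one can only be found by looking at the remaining raw rows.</p>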

<p>Finally, the fourth method can be used when your reporting needs are such that the raw data can be collected for some period before storing in your database. Let's say the summary reports are never run against the current hour. What you can do is build up a file in a suitable format for bulk loading via <tt>COPY</tt>. Then process the data as it is bulk loaded to calculate and insert the summary at the same time. Again, I had done that in the past using a C program that read in the stored files, generated the summary data while building a string buffer, and finally used libpq's <tt>PQputCopyData()</tt> to populate the tables.</p>

<p>More than likely some combination of the above is what you really want. Perhaps use method 2 to maintain your weekly and daily aggregate materialized views, and use method 4 to update your hourly aggregate data.</p>

<p>This post was a lot of discussion and no code -- perhaps tomorrow I will continue with some more concrete examples.</p>]]>
        
    </content>
</entry>

<entry>
    <title>[Tip] PostgreSQL Tip of the Day - mass modification of sequences</title>
    <link rel="alternate" type="text/html" href="http://blog.credativ.com/en/2010/07/postgresql-tip-of-the-day---mass-modification-of-sequences.html" />
    <id>tag:blog.credativ.com,2010:/en//2.180</id>

    <published>2010-07-07T21:34:07Z</published>
    <updated>2010-12-07T12:45:11Z</updated>

    <summary>Someone posted a dilemma to the pgsql-sql list today that involved many if not all of his sequences getting out of sync with their respective &quot;serial&quot; columns. In other words, something like &quot;SELECT max(id) FROM sometable&quot; yields 42, but the...</summary>
    <author>
        <name>Joe Conway</name>
        <uri>http://www.credativ.us</uri>
    </author>
    
        <category term="Open Source" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="PostgreSQL" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Tip" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="credativ" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="plpgsql" label="plpgsql" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="postgresql" label="PostgreSQL" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="sequences" label="sequences" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://blog.credativ.com/en/">
        <![CDATA[<p><img alt="postgreslogo.png" src="/de/static/postgreslogo.png" width="97" height="100" class="mt-image-right" style="float: right; margin: 0 0 20px 20px;" />Someone posted a dilemma to the pgsql-sql list today that involved many if not all of his sequences getting out of sync with their respective "serial" columns. In other words, something like "SELECT max(id) FROM sometable" yields 42, but the sequence nextval for sometable.id is currently set to 36. This is obviously bad (for reasons left as an exercise for the reader). So besides trying to figure out how the database ended up in this state, he needed a script to reset all of his sequences to the correct next value.</p>

<p>I had run into a similar need not too long ago. Namely, when setting up multi-master replication with Bucardo you need your sequences to draw different values on either master so as not to conflict. One solution is to set up all your sequences to jump by 2, and use even numbers on one master and odd numbers on the other. Again, a script makes this easier to deal with, and I had developed one for this situation. So I modified it for the problem mentioned above.</p>

<p>Both versions follow:<br />
</p>
<pre class='brush: sql'>-- create &quot;odd&quot; and &quot;even&quot; sequences in multi-master scenario
CREATE OR REPLACE FUNCTION adjust_seqs(namespace text, starteven bool)
  RETURNS text AS $$
DECLARE
  rec         record;
  startval   bigint;
  sql          text;
  fqname  text;
BEGIN
  FOR rec in EXECUTE 'select relname from pg_class where relkind = ''S''
                      and relnamespace = (select oid from pg_namespace
                      where nspname=''' || namespace || ''')' LOOP
    fqname :=  namespace || '.' ||  rec.relname;
    IF starteven THEN
      EXECUTE 'SELECT ((last_value / 2) * 2) + 2 from ' || fqname INTO startval;
    ELSE
      EXECUTE 'SELECT ((last_value / 2) * 2) + 1 from ' || fqname INTO startval;
    END IF;
    sql := 'ALTER SEQUENCE ' || fqname || ' INCREMENT BY 2 RESTART WITH ' || startval;
    EXECUTE sql;
    RAISE NOTICE '%', sql;
  END LOOP;
  RETURN 'OK';
END;
$$ LANGUAGE plpgsql STRICT;
SELECT adjust_seqs('public', true);  -- in master1 (even)
SELECT adjust_seqs('public', false); -- in master2 (odd)
</pre><p><br />
</p>
<pre class='brush: sql'>-- update sequences that have gotten out-of-sync with the
-- PK field for which they normally provide the default
CREATE OR REPLACE FUNCTION adjust_seqs(namespace text)
  RETURNS text AS $$
DECLARE
  rec           record;
  startval     bigint;
  sql            text;
  seqname  text;
BEGIN
  FOR rec in EXECUTE 'select table_name, column_name, column_default
                      from information_schema.columns
                      where table_schema = ''' || namespace || '''
                      and column_default like ''nextval%''' LOOP

    seqname := pg_get_serial_sequence(rec.table_name, rec.column_name);
    sql := 'select max(' || rec.column_name || ') + 1 from ' || rec.table_name;
    EXECUTE sql INTO startval;
    IF startval IS NOT NULL THEN
      sql := 'ALTER SEQUENCE ' || seqname || ' RESTART WITH ' || startval;
      EXECUTE sql;
      RAISE NOTICE '%', sql;
    END IF;
  END LOOP;
  RETURN 'OK';
END;
$$ LANGUAGE plpgsql STRICT;
select adjust_seqs('public');
</pre><p></p>
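
<p>The start-value arithmetic in the first function is easy to try out away from the database. The Python sketch below mirrors the two expressions, assuming non-negative sequence values; <tt>start_value</tt> is just a name I made up for the example.</p>
<pre class='brush: python'>def start_value(last_value, even):
    """Mirror of ((last_value / 2) * 2) + 2 (even) or + 1 (odd).

    PostgreSQL's bigint division truncates; for the non-negative values
    assumed here that is the same as Python's floor division (//).
    """
    base = (last_value // 2) * 2          # round down to an even number
    return base + 2 if even else base + 1


for last_value in (36, 41):
    print(last_value, start_value(last_value, True),
          start_value(last_value, False))
</pre><p></p>

<p>Note that for an odd <tt>last_value</tt> the odd start is <tt>last_value</tt> itself (41 stays 41). If that value has already been handed out, the odd master would draw it again, so this is one of the assumptions worth checking before relying on the function.</p>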

<p>Neither of these is heavily tested, and both make certain assumptions, so please test and modify to suit your own needs. Caveat emptor!</p>]]>
        
    </content>
</entry>

<entry>
    <title>[Tip] PostgreSQL Tip of the Day - loading a PostGIS database dump</title>
    <link rel="alternate" type="text/html" href="http://blog.credativ.com/en/2010/07/postgresql-tip-of-the-day---loading-a-postgis-database-dump.html" />
    <id>tag:blog.credativ.com,2010:/en//2.179</id>

    <published>2010-07-07T01:10:01Z</published>
    <updated>2010-12-07T12:44:00Z</updated>

    <summary>I was given a Postgres database dump to analyze today created by &quot;pg_dump -Fc&quot;. The source database included PostGIS 1.3.x extensions. I&apos;m not sure if this is standard with PostGIS, but the related database objects were all dumped with a...</summary>
    <author>
        <name>Joe Conway</name>
        <uri>http://www.credativ.us</uri>
    </author>
    
        <category term="Open Source" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="PostgreSQL" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Tip" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="credativ" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="postgis" label="PostGIS" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="postgresql" label="PostgreSQL" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://blog.credativ.com/en/">
        <![CDATA[<p>I was given a Postgres database dump to analyze today created by "pg_dump -Fc". The source database included PostGIS 1.3.x extensions. I'm not sure if this is standard with PostGIS, but the related database objects were all dumped with a hard-coded library path, specifically <tt>/usr/lib/postgresql/8.3/lib</tt>. On my machine, I have many PostgreSQL clusters (essentially at least one for every supported branch dating back to 7.3.x), but they are not located under <tt>/usr/lib/postgresql</tt>.</p>

<p>As such, I needed a quick fix. To wit:<br />
</p>
<pre class='brush: sql'>pg_restore database.with.postgis.tgz &gt; db.w.postgis.dmp
sed 's|/usr/lib/postgresql/8.3/lib|$libdir|g' &lt; db.w.postgis.dmp &gt; db.w.postgis.dmp.new
</pre><p></p>

<p>The first line extracts the dump file from the compressed "custom" format into a human readable text SQL file. The second line replaces the hard-coded library path with the special PostgreSQL <tt>$libdir</tt> variable. This will always point to the correct location for any given PostgreSQL cluster. You can always discover where this is by running:</p>
<pre>pg_config --pkglibdir</pre>]]>
        
    </content>
</entry>

<entry>
    <title>PostgreSQL 9.0 is now in Beta Phase</title>
    <link rel="alternate" type="text/html" href="http://blog.credativ.com/en/2010/05/postgresql-90-is-now-in-betaphase.html" />
    <id>tag:blog.credativ.com,2010:/en//2.162</id>

    <published>2010-05-25T10:29:00Z</published>
    <updated>2010-05-25T10:16:44Z</updated>

    <summary> The PostgreSQL developers&apos; community recently published the first Beta version of the new 9.0 release. Over 200 new functions and improvements feature in this new version. With this new release, PostgreSQL now amongst other features claims an inbuilt replication...</summary>
    <author>
        <name>Bernd Helmle</name>
        
    </author>
    
        <category term="News" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="PostgreSQL" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="credativ" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="postgresql" label="PostgreSQL" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://blog.credativ.com/en/">
        <![CDATA[<p><img alt="postgreslogo.png" src="/de/static/postgreslogo.png" width="97" height="100" class="mt-image-right" style="float: right; margin: 0 0 20px 20px;" /><br />
<em>The PostgreSQL developers' community recently published the first Beta version of the new 9.0 release. Over 200 new functions and improvements feature in this new version.</em></p>

<p>With this <a href="http://www.postgresql.org/about/news.1198">new release</a>, PostgreSQL gains, among other features, a built-in replication solution (Streaming Replication) as well as the ability to run read-only queries on standby nodes that are continuously updated by <a href="http://www.postgresql.org/docs/8.4/static/warm-standby.html">log shipping</a> (Hot Standby). Streaming replication sends transaction logs directly to one or more standby nodes, which considerably reduces the delay compared with the more common file-based log shipping. Combining these two features makes for an extremely efficient solution for high-availability or load-balanced systems.</p>

<p>The all new PostgreSQL version also offers the following innovations:</p>

<ul>
	<li>Memory-based <strong>LISTEN/NOTIFY</strong>: this replaces the previous table-based implementation and is much faster.</li>
	<li><strong>Exclusion Constraints</strong>: a generalization of unique constraints that can enforce rules such as "no two rows may overlap" on complex data types.</li>
	<li>Procedural code in languages such as PL/pgSQL, PL/Perl and PL/Python can now be executed inline with the <strong>DO</strong> command, so there is no longer any need to define a function with <strong>CREATE FUNCTION</strong> first.</li>
	<li>Triggers on columns</li>
	<li>Triggers can now be tied to conditions</li>
	<li>Named argument lists for procedures</li>
	<li>Parameters can now be flexibly linked to roles/databases</li>
</ul>

<p>As always, anyone interested is invited to share their test results with the developers.  Information on the procedure for testing and filing of error messages can be found in the <a href="http://wiki.postgresql.org/wiki/HowToBetaTest">Wiki</a>.</p>

<p>All blog articles which fall into the <a href="/en/postgresql/">PostgreSQL category</a> are grouped in their own feed, and if you find you need <a href="http://www.credativ.co.uk/services/support/projects/databases/postgresql">support and services for PostgreSQL</a>, you've come to the right place at credativ.</p>]]>
        
    </content>
</entry>

<entry>
    <title>[Howto] RHCS: install on Debian</title>
    <link rel="alternate" type="text/html" href="http://blog.credativ.com/en/2010/05/howto-rhcs-install-on-debian.html" />
    <id>tag:blog.credativ.com,2010:/en//2.148</id>

    <published>2010-05-20T10:40:00Z</published>
    <updated>2010-05-25T10:10:26Z</updated>

    <summary>Following our earlier introduction to RHCS we now present a real world example: the installation of RHCS with Debian to provide certain virtual machines as services. Our RHCS overview already explained the basics of RHCS. This time we will take...</summary>
    <author>
        <name>Roland Wolters</name>
        <uri>http://www.credativ.de</uri>
    </author>
    
        <category term="Debian" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Howto" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Linux" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Open Source" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="RHEL/CentOS" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="credativ" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://blog.credativ.com/en/">
        <![CDATA[<p><img alt="tux.jpg" src="/de/static/tux.jpg" width="86" height="102" class="mt-image-right" style="float: right; margin: 0 0 20px 20px;" /><em>Following our earlier introduction to RHCS we now present a real world example: the installation of RHCS with Debian to provide certain virtual machines as services.</em></p>

<p>Our <a href="/en/2010/03/rhcs-an-introduction.html">RHCS overview</a> already explained the basics of RHCS. This time we will take two hosts with shared storage and provide KVM guests as services.</p>

<h3>Installation of the nodes</h3>
In this setup the nodes are the machines which are running KVM. Each running KVM guest is a service managed by RHCS. While installing the KVM hosts you should make sure you comply with the following suggestions:
<ul><li><tt>/tmp/</tt> and <tt>/var/</tt> should be on separate partitions; this improves performance.</li>
<li>Activate Debian backports, especially for the kernel.</li>
<li>Make sure all IP addresses can be resolved in both directions; in the worst case, entries in <tt>/etc/hosts</tt> help here.</li>
<li>The host name must not resolve to <tt>127.0.0.1</tt>! Otherwise you will run into problems with the cluster manager CMAN.</li>
<li><tt>/etc/hosts</tt> and <tt>/etc/resolv.conf</tt> should be the same on all nodes.</li>
<li>Create passwordless SSH keys for all nodes and distribute them.</li>
<li>For best performance, install the latest Debian Linux kernel. In our example we used <tt>linux-image-2.6.32-bpo.2-amd64</tt>, which crashes guests running kernels &gt;= 2.6.30. However, a patch is available, see <a href="http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=573071">bug #573071</a>.</li>
<li>The network devices should be named in a way that makes sense, for example: <tt>rhcs-backbone</tt> and <tt>external</tt> instead of <tt>eth0</tt> and <tt>eth1</tt>.</li></ul>
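<p>The host name rule from the list above can be checked on each node before CMAN is installed. The following is only a sketch: the function name <tt>is_loopback</tt> is made up for this example, and it assumes that <tt>getent</tt> is available, as it is on a standard Debian installation. It treats the whole <tt>127.*</tt> range as loopback, which also covers the <tt>127.0.1.1</tt> entry that the Debian installer often writes to <tt>/etc/hosts</tt>.</p>

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given address is a loopback address.
is_loopback() {
    case "$1" in
        127.*|::1) return 0 ;;
        *) return 1 ;;
    esac
}

# Resolve this node's host name the way the system resolver would
# (getent consults /etc/hosts and DNS according to nsswitch.conf).
node_name=$(hostname)
node_addr=$(getent hosts "$node_name" | awk '{print $1; exit}')

if [ -z "$node_addr" ]; then
    echo "WARNING: $node_name does not resolve at all"
elif is_loopback "$node_addr"; then
    echo "WARNING: $node_name resolves to $node_addr, CMAN needs a real address"
else
    echo "OK: $node_name resolves to $node_addr"
fi
```

<p>Running the same check on every node (for example via the ssh keys distributed earlier) shows quickly whether the <tt>/etc/hosts</tt> files really agree.</p>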

<h3>Configuring the shared storage</h3>
As with almost any HA solution, a key element of RHCS is the shared storage which is accessed by all the nodes. In this example we take a "private" machine and install an iSCSI target on it:
<pre class='brush: plain'>
apt-get install iscsitarget iscsitarget-source 
echo 'ISCSITARGET_ENABLE=true' &gt; /etc/default/iscsitarget
m-a a-i iscsitarget
</pre><p><br />
Keep in mind that the iSCSI target must build properly, see <a href="http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=566740">bug #566740</a>. The configuration of the shared storage is done via <tt>/etc/ietd.conf</tt>:</p>
<pre class='brush: plain'>
IncomingUser discovery_in YourSecurePwd1
OutgoingUser discovery_out YourSecurePwd2
Target iqn.2010-03.YOURMACHINE:clvm1
       IncomingUser node_in YourSecurePwd1
       OutgoingUser node_out YourSecurePwd2
       Lun 0 Path=/dev/sdx1,Type=blockio
</pre><p><br />
On the nodes the same target must be accessed, so make sure <tt>/etc/iscsi/iscsid.conf</tt> is correct:</p>
<pre class='brush: plain'>
discovery.sendtargets.auth.authmethod = CHAP
discovery.sendtargets.auth.username = discovery_in
discovery.sendtargets.auth.password = YourSecurePwd1
discovery.sendtargets.auth.username_in = discovery_out
discovery.sendtargets.auth.password_in = YourSecurePwd2
node.startup = automatic
node.session.auth.authmethod = CHAP
node.session.auth.username = node_in
node.session.auth.password = YourSecurePwd1
node.session.auth.username_in = node_out
node.session.auth.password_in = YourSecurePwd2
</pre><p><br />
The service is started with <tt>/etc/init.d/open-iscsi start</tt>. Existing targets can be searched, deleted or added by the following commands:</p>
<pre class='brush: plain'>
# discovering the targets
iscsiadm -m discovery -t st -p YOURMACHINE -P 1
# deleting target on wrong interface
iscsiadm -m node -p 192.168.0.100:3260,1 -o delete
# opening the portal
iscsiadm -m node --targetname &quot;iqn.2010-03.YOURMACHINE:clvm1&quot; --portal &quot;YOURMACHINE:3260&quot; --</pre><p></p>

<h3>VM setup</h3>
The virtual machines are provided by KVM. Thus the appropriate KVM software must be installed first:
<pre class='brush: plain'>
apt-get install linux-image-2.6.32-bpo.2-amd64 kvm libvirt-bin virtinst -t lenny-backports
</pre><p><br />
When configuring the bridge, make sure that the bridge name is the same on all nodes. Also the libvirt configuration must be the same on all hosts, so it makes sense to use <a href="/en/2010/03/howto-introduction-to-puppet.html">puppet</a> or similar techniques.<br />
Afterwards, bring up the guests with:</p>
<pre class='brush: plain'>
virt-install -n &lt;NAME&gt; -r 256 --vcpus=1 --disk path=/dev/vg_cluster#/&lt;LV&gt; \
  -c /root/debian-&lt;VERSION&gt;-amd64-netinst.iso --vnc --noautoconsole --os-type linux \
  --os-variant debianLenny --accelerate --network=bridge:bridge0 --hvm -k de
</pre><p><br />
To monitor the process use <tt>virt-viewer -c qemu+ssh://&lt;node&gt;:&lt;port&gt;/system &lt;NAME&gt;</tt>.</p>

<h3>RHCS setup</h3>
The next step is the setup of RHCS itself. Again, first things first, the software: <tt>apt-get install redhat-cluster-suite</tt>. This pulls in quite a number of services which are not needed in our example and can be stopped and disabled:
<pre class='brush: plain'>
invoke-rc.d nfs-kernel-server stop
invoke-rc.d nfs-common stop
invoke-rc.d portmap stop
update-rc.d -f nfs-kernel-server remove
update-rc.d -f nfs-common remove
update-rc.d -f portmap remove
</pre><p><br />
By the way, <tt>system-config-cluster</tt> is not available for Lenny, but our Philipp Hübner has created a backport:</p>
<pre class='brush: plain'>
wget --no-check-certificate https://www.credativ.com/~phu/lenny-backports/system-config-cluster/system-config-cluster_1.0.53-1_all.deb
dpkg -i system-config-cluster_1.0.53-1_all.deb
apt-get -f install
apt-get install xauth
</pre><p><br />
In order to have locking on the LVM cluster, you now need to modify <tt>/etc/lvm/lvm.conf</tt>: check for the <tt>global</tt> part.</p>
<pre class='brush: plain'>
 locking_type = 3
</pre><p><br />
With the newer kernels the module <tt>lock_dlm</tt> also vanished, so the CMAN init script must be modified: comment out the line <tt>modprobe lock_dlm 2&gt;&amp;1 || return 1</tt>. Additionally, RHCS 2 only supports Xen, so for libvirt you need to download the resource handler <tt>vm.sh</tt>:</p>
<pre class='brush: plain'>
wget --no-check-certificate https://www.credativ.com/~phu/vm.sh -O /usr/share/cluster/vm.sh
chmod +x /usr/share/cluster/vm.sh
</pre><p></p>
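<p>The manual edit of the CMAN init script mentioned above can also be scripted, which helps when several nodes have to be kept identical. This is a sketch only: the function name <tt>disable_lock_dlm</tt> is invented for this example, and the demonstration works on a scratch file rather than on the real init script (assumed to be <tt>/etc/init.d/cman</tt>).</p>

```shell
#!/bin/sh
# Hypothetical helper: comment out the "modprobe lock_dlm" line in the
# given init script, keeping a backup copy with the suffix .orig.
disable_lock_dlm() {
    script=$1
    cp "$script" "$script.orig" || return 1
    # Keep the original indentation (\1) and prefix the command (\2) with "# ".
    sed 's|^\([[:space:]]*\)\(modprobe lock_dlm\)|\1# \2|' "$script.orig" > "$script"
}

# Demonstration on a scratch file that mimics the relevant part of the script.
demo=$(mktemp)
printf '%s\n' 'start() {' '    modprobe lock_dlm 2>&1 || return 1' '}' > "$demo"
disable_lock_dlm "$demo"
cat "$demo"
```

<p>Because the result is written back into the existing file, its permissions are preserved, and the <tt>.orig</tt> copy makes it easy to compare or revert the change.</p>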

<p>RHCS itself is started with:</p>
<pre class='brush: plain'>
/etc/init.d/cman start
/etc/init.d/clvm start
/etc/init.d/rgmanager start
</pre><p></p>

<h3>Fencing</h3>
Fencing describes the automagical neutralization of nodes which cease to function properly. In our example we use a power plug which can be controlled via network, NETIO-230A. Currently there is no real fence agent available for the device, but the Python library <a href="http://github.com/pklaus/netio230a">netio230a</a> offers the necessary background to quickly write one.

<h3>Closing words</h3>
This howto has shown the setup of RHCS on Debian in easy steps - but of course, the correct steps depend very much on the targeted services, so this is just an example. If you need help, just ask - <a href="http://www.credativ.co.uk/services/support/projects/high-availability-clustering/">Open Source HA solutions</a> are our speciality, and <a href="http://www.credativ.co.uk/services/support/projects/virtualisation/kvm/">we offer services and support for KVM virtualization</a> as part of our day-to-day business.]]>
        
    </content>
</entry>

<entry>
    <title>credativ Training at Munich Open Source School</title>
    <link rel="alternate" type="text/html" href="http://blog.credativ.com/en/2010/05/credativ-training-at-munich-open-source-school.html" />
    <id>tag:blog.credativ.com,2010:/en//2.160</id>

    <published>2010-05-05T14:00:12Z</published>
    <updated>2010-07-08T11:27:22Z</updated>

    <summary>In May, Consultants from credativ GmbH will be holding a 3 day advanced system and network administration workshop at the Open Source School in Munich. Training specifics (subject to modifications!): Kerberos: This training covers the Kerberos authentication protocol, which can...</summary>
    <author>
        <name>Michael Banck</name>
        
    </author>
    
        <category term="News" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Open Source" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Security" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="credativ" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://blog.credativ.com/en/">
        <![CDATA[<p><em>In May, Consultants from credativ GmbH will be holding a 3 day advanced system and network administration workshop at the <a href="http://www.opensourceschool.de/">Open Source School</a> in Munich.</em></p><br />
Training specifics (subject to modifications!):<br />
<ul>
<li><b><a href="http://www.opensourceschool.de/kurse/muenchen/schulung/kerberos/">Kerberos:</a></b> This training covers the Kerberos authentication protocol, which can handle a range of services and operating systems transparently. The use of tickets makes single sign-on possible, so a user can access all services with a single login. The training is aimed at network and system administrators who wish to roll out Kerberos in their business or administrative network; it covers the installation and management of Kerberos as well as the integration of services and client programs.<br />
When: <a href="http://www.opensourceschool.de/kurstermine/muenchen/schulung/kerberos-5-2010/">03-05/05/2010</a> and <a href="http://www.opensourceschool.de/kurstermine/muenchen/schulung/kerberos-09-2010/">13-15/09/2010</a></li>
<li><b><a href="http://www.opensourceschool.de/kurse/muenchen/schulung/spam-und-virenabwehr/">Spam and Virus Defense:</a></b> This training explains the integration and fine-tuning of the open-source services Postfix, Amavis and SpamAssassin, which protect a network from unnecessary strain due to spam or malware. It is geared towards administrators who wish to secure their company's email systems against spam and viruses.<br />
When: <a href="http://www.opensourceschool.de/kurstermine/muenchen/schulung/spam-und-virenabwehr-05-2010/">26-28/05/2010</a> and <a href="http://www.opensourceschool.de/kurstermine/muenchen/schulung/spam-und-virenabwehr-10-2010/">18-20/10/2010</a></li>
<li><b><a href="http://www.opensourceschool.de/kurse/muenchen/schulung/samba-in-heterogenen-netzen/">Samba in heterogeneous networks:</a></b> This training covers Samba as a replacement for Windows servers, allowing smooth integration of both Windows clients into Unix-based networks and Linux servers into Windows-based networks. The training is directed at administrators who want to migrate a Windows network completely or partly to Linux with the help of Samba. Its goal is the management and administration of LDAP-based primary/backup domain controller setups.<br />
When: <a href="http://www.opensourceschool.de/kurstermine/muenchen/schulung/samba-06-2010/">30/06-02/07/2010</a></li>
</ul>

<p>The training will take place at the Open Source School in Munich city centre, <a href="http://www.opensourceschool.de/ort-anreise/">Amalienstrasse 77</a>. Applications can be made via the Open Source School website or by <a href="http://www.opensourceschool.de/fileadmin/oss_website/downloads/oss_anmeldung.pdf">faxing this form</a>. For further information contact <a href="mailto:michael.banck@credativ.de">Michael Banck</a>.
</p>
<p>Further dates for your diary: 21-23 April - <a href="http://www.linuxhotel.de/kurs/postgresql/">PostgreSQL training</a> will be carried out by credativ experts at the <a href="http://www.linuxhotel.de">Linuxhotel</a> in Essen.
</p>]]>
        
    </content>
</entry>

<entry>
    <title>[Howto] PostgreSQL and Linux Memory Management</title>
    <link rel="alternate" type="text/html" href="http://blog.credativ.com/en/2010/03/postgresql-and-linux-memory-management.html" />
    <id>tag:blog.credativ.com,2010:/en//2.151</id>

    <published>2010-03-26T13:57:26Z</published>
    <updated>2010-03-26T14:53:42Z</updated>

    <summary>The OOM-Killer can cause nasty surprises on machines with a heavy memory load; processes are cancelled or terminated without warning. Fortunately, this behaviour can be adjusted with some clever kernel tweaks. Administrators of Linux machines with a very high RAM-Usage...</summary>
    <author>
        <name>Bernd Helmle</name>
        
    </author>
    
        <category term="Howto" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Linux" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="PostgreSQL" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="credativ" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://blog.credativ.com/en/">
        <![CDATA[<p><img alt="postgreslogo.png" src="/de/static/postgreslogo.png" width="97" height="100" class="mt-image-right" style="float: right; margin: 0 0 20px 20px;" /><em>The OOM-Killer can cause nasty surprises on machines with a heavy memory load; processes are cancelled or terminated without warning. Fortunately, this behaviour can be adjusted with some clever kernel tweaks.</em></p>

<p>Administrators of Linux machines with very high RAM usage are sometimes faced with a terrifying scenario: the Linux <a href="http://linux-mm.org/OOM_Killer">OOM-Killer</a> (OOM = Out Of Memory). In situations such as a crashed PostgreSQL instance, the following entry can typically be found in the server log:<br />
</p>
<pre class='brush: text'>
Out of Memory: Killed process PID (process name)
</pre><p></p>

<p>Why is this?</p>

<h3>Virtual Memory and Overcommit</h3>

<p>Virtual Memory used by Linux can be allocated in a number of ways: malloc(), mmap(), Swap, Shared Memory, to mention some examples. It is possible to overcommit virtual memory by allocating more than is actually available in the system. If this happens, a so-called "OOM-Condition" occurs; that is, your system no longer has any available space in the virtual memory area and cannot allocate any more. This is when the OOM-Killer is activated - and does what its name suggests: kills any processes which meet certain conditions in order to free memory.</p>

<p>If you have an environment where PostgreSQL runs in parallel with other memory-intensive processes on the same machine, it's likely that the OOM-Killer will kill certain PostgreSQL processes. Because of the amount of allocated shared memory and the memory usage of each backend, the OOM-Killer targets PostgreSQL by preference: it adds up the complete shared memory area addressed by <strong>all</strong> backends.</p>

<p>The amount of committed memory of your system at a given time can be examined via the <tt>/proc</tt> filesystem:<br />
</p>
<pre class='brush: text'>
$ grep Commit /proc/meminfo 
CommitLimit:    376176 kB
Committed_AS:   265476 kB
</pre><p></p>

<p>This example shows the current amount of committed memory at <tt>265476 kB</tt> (<tt>Committed_AS</tt>). If this is equal to or even larger than <tt>CommitLimit</tt>, the OOM-Killer is likely to be woken up.</p>
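<p>To keep an eye on how close a machine is to that point, the two values can be subtracted from each other. The following is a minimal sketch; the function name <tt>commit_headroom</tt> is made up for this example, and it simply reads text in the <tt>/proc/meminfo</tt> format from standard input.</p>

```shell
#!/bin/sh
# Hypothetical helper: read meminfo-formatted text on stdin and print
# CommitLimit minus Committed_AS in kB (a negative result means the
# committed memory already exceeds the limit).
commit_headroom() {
    awk '$1 == "CommitLimit:"  { limit = $2 }
         $1 == "Committed_AS:" { used  = $2 }
         END { print limit - used }'
}

# Values from the example above.
printf 'CommitLimit:    376176 kB\nCommitted_AS:   265476 kB\n' | commit_headroom

# On a live Linux system, read the real values.
if [ -r /proc/meminfo ]; then
    commit_headroom < /proc/meminfo
fi
```

<p>For the sample values this prints <tt>110700</tt>, i.e. roughly 108 MB of headroom.</p>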

<p>However, the kernel provides some interfaces to adjust the behaviour of the OOM-Killer and Overcommit with regard to PostgreSQL installations.</p>

<h3>Turn off Overcommit</h3>

<p>A radical method is to turn overcommit off entirely, although this is only recommended on systems dedicated to PostgreSQL. The overcommit behaviour is selected with the following kernel parameter:<br />
</p>
<pre class='brush: text'>
vm.overcommit_memory = 0
</pre><p></p>

<p>The parameter accepts three values:</p>

<ul>
        <li><strong>0</strong>: Heuristic overcommit handling: small and reasonable overcommitting allocations are allowed, but obviously excessive allocations are denied. In this mode, root can allocate slightly more memory than unprivileged users. This is also the kernel default setting.</li>
        <li><strong>1</strong>: Allow overcommit without any constraints.</li>
        <li><strong>2</strong>: Turn off overcommit. The effective allocatable memory space cannot be larger than <tt>swap</tt> + a configurable percentage of physical RAM.</li>
</ul> 
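<p>Which of these values is currently active can be read directly from <tt>/proc/sys/vm/overcommit_memory</tt>. The small sketch below translates the number into words; the function name <tt>describe_overcommit_mode</tt> and the wording of the descriptions are our own and not part of any kernel tool.</p>

```shell
#!/bin/sh
# Hypothetical helper: translate a vm.overcommit_memory value into words.
describe_overcommit_mode() {
    case "$1" in
        0) echo "heuristic overcommit (kernel default)" ;;
        1) echo "always overcommit, no checks" ;;
        2) echo "strict accounting, overcommit disabled" ;;
        *) echo "unknown mode: $1"; return 1 ;;
    esac
}

# Read the live setting straight from /proc (Linux only).
if [ -r /proc/sys/vm/overcommit_memory ]; then
    describe_overcommit_mode "$(cat /proc/sys/vm/overcommit_memory)"
fi
```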

<p>The percentage of physical RAM used in mode <tt>2</tt> is defined by the parameter:<br />
</p>
<pre class='brush: text'>
vm.overcommit_ratio = 50
</pre><p></p>

<p>While <tt>vm.overcommit_memory=1</tt> is useful when tuning certain applications, the modes <tt>0</tt> or <tt>2</tt> are the best ones to use most of the time. If you turn off overcommit with <tt>vm.overcommit_memory=2</tt>, a process receives an "out of memory" error when it tries to allocate memory beyond the limit set by <tt>vm.overcommit_ratio</tt>. We recommend that you save these settings in the configuration file <tt>/etc/sysctl.conf</tt> (the exact location may depend on your distribution) to ensure that they are activated on server reboot.<br />
</p>
<pre class='brush: text'>
$ echo &quot;vm.overcommit_memory=2&quot; &gt;&gt; /etc/sysctl.conf
$ echo &quot;vm.overcommit_ratio=60&quot; &gt;&gt; /etc/sysctl.conf
$ sysctl -p /etc/sysctl.conf
</pre><p></p>

<p>Changes to those parameters take effect immediately. You can verify this by consulting <tt>/proc/meminfo</tt>:<br />
</p>
<pre class='brush: text'>
$ grep Commit /proc/meminfo 
CommitLimit:    401440 kB
Committed_AS:   266456 kB
</pre><p></p>

<p>The machine has <tt>249848 kB</tt> of swap and <tt>252656 kB</tt> of physical RAM.<br />
According to the formula <tt>swap + (vm.overcommit_ratio / 100) * RAM</tt>, with <tt>vm.overcommit_ratio = 60</tt>, this results in a <tt>CommitLimit</tt> of about <tt>401440 kB</tt>.</p>
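<p>The arithmetic can be reproduced in the shell. Taken literally, 249848 kB of swap plus 60 percent of 252656 kB of RAM gives 401441.6 kB, slightly more than the reported value. One plausible explanation, assumed in this sketch, is that the kernel calculates in whole pages of 4 kB using integer arithmetic. The function name <tt>commit_limit</tt> is made up for this example.</p>

```shell
#!/bin/sh
# Hypothetical helper: commit_limit SWAP_KB RAM_KB RATIO [PAGE_KB]
# Computes the limit in whole pages with integer arithmetic (assumed to
# mirror the kernel's rounding), then converts the result back to kB.
commit_limit() {
    swap_kb=$1 ram_kb=$2 ratio=$3 page_kb=${4:-4}
    echo $(( (ram_kb / page_kb * ratio / 100 + swap_kb / page_kb) * page_kb ))
}

# Swap, RAM and ratio from the example machine above.
commit_limit 249848 252656 60
```

<p>With these inputs the function prints <tt>401440</tt>, matching the <tt>CommitLimit</tt> shown in <tt>/proc/meminfo</tt>.</p>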

<h3>Configure OOM-Killer per process</h3>

<p>Where PostgreSQL is running without dedicated server hardware and in parallel with memory-intensive middleware (e.g. JBoss or Tomcat installations), most admins would prefer to control the OOM-Killer on a per-process basis and still allow overcommitting of memory allocations. Since kernel 2.6.11, Linux has provided an interface for tuning the OOM score of a process, which in turn increases or decreases the likelihood of the process being killed in an OOM situation. This interface allows a very flexible configuration of processes in such environments according to their memory requirements. The interface is exposed through the <tt>/proc</tt> filesystem, for example here on a PostgreSQL installation on Debian:<br />
</p>
<pre class='brush: text'>
$ cat /proc/$(cat /var/run/postgresql/8.4-main.pid)/oom_adj
0
</pre><p></p>

<p>Allowed values range from -17 to +15; a negative value decreases and a positive value increases the likelihood of being killed by the OOM-Killer. -17 is a special value that exempts the process from the OOM-Killer entirely.<br />
The setting is inherited from parent to child processes; for PostgreSQL you have to set it on the PostgreSQL master process:<br />
</p>
<pre class='brush: text'>
$ echo -17 &gt; /proc/$(cat /var/run/postgresql/8.4-main.pid)/oom_adj
$ psql -q test
test=# SELECT pg_backend_pid();
 pg_backend_pid 
----------------
           3429
(1 row)

test=# 
[1]+  Stopped                 psql -q test
$ cat /proc/3429/oom_adj
-17
</pre><p></p>

<p>The disadvantage of this method is that <strong>all</strong> child processes are now excluded from the OOM-Killer, which is generally not what DBAs want. For example, you may want to protect the PostgreSQL system processes (like <tt>background writer</tt> or <tt>autovacuum</tt>) from being killed by the OOM-Killer, but still allow ordinary database connections to be killed when the machine runs out of memory.</p>

<p>To set the OOM-Score you need to have a privileged user, so the best way to implement this setting is to put it into your PostgreSQL start script.</p>
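<p>In a start script it is worth checking the value before writing it, since a typo would otherwise only show up as an obscure write error. The following is a sketch under our own assumptions: the function names <tt>valid_oom_adj</tt> and <tt>set_oom_adj</tt> are invented for this example and are not part of the PostgreSQL start script, and the PID file is expected to contain the PID on its first line.</p>

```shell
#!/bin/sh
# Hypothetical helper: accept only integers in the documented range -17..15.
valid_oom_adj() {
    case "$1" in
        # Reject empty input, non-numeric characters, a misplaced or lone "-".
        ''|*[!0-9-]*|?*-*|-) return 1 ;;
    esac
    [ "$1" -ge -17 ] && [ "$1" -le 15 ]
}

# Hypothetical helper: write VALUE to the oom_adj file of the process
# whose PID is stored on the first line of PIDFILE. Needs root.
set_oom_adj() {
    pidfile=$1 value=$2
    valid_oom_adj "$value" || { echo "invalid oom_adj: $value" >&2; return 1; }
    pid=$(head -n 1 "$pidfile") || return 1
    echo "$value" > "/proc/$pid/oom_adj"
}

# Example call for a start script, using the PID file path shown above:
# set_oom_adj /var/run/postgresql/8.4-main.pid -17
```

<p>The call itself is left commented out because it only makes sense as root and after the PostgreSQL master process has been started.</p>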

<h3>Enhancements in PostgreSQL 9.0</h3>

<p><a href="/de/2010/02/postgresql-agenda-2010.html">PostgreSQL 9.0</a> will have additional <a href="http://archives.postgresql.org/pgsql-committers/2010-01/msg00169.php">support</a> for the <tt>/proc</tt> interface described above. On the one hand, PostgreSQL 9.0 will come with a new <a href="http://git.postgresql.org/gitweb?p=postgresql.git;a=blob_plain;f=contrib/start-scripts/linux;hb=HEAD">Linux start script</a>, which supports setting the <tt>oom_adj</tt> value before starting up PostgreSQL; on the other hand, it is possible to build PostgreSQL with the special C macro <tt>LINUX_OOM_ADJ</tt> defined, which makes backend child processes reset their <tt>oom_adj</tt> to the given value instead of inheriting the setting of the master process, as shown in this example:<br />
</p>
<pre class='brush: text'>
$ ./configure CC=&quot;ccache gcc&quot; CFLAGS=&quot;-DLINUX_OOM_ADJ=0&quot;
</pre><p></p>

<p>This method protects the PostgreSQL master process but still allows the OOM-Killer to kill database backend processes running amok.</p>

<h3>Alternatives</h3>

<p>An alternative solution is available through an <a href="http://www.cybertec.at/en/linux-kernel-patch">additional kernel patch</a>. This extends the existing <tt>/proc</tt> filesystem with a list of process names which should be excluded from the OOM-Killer. However, this patch is an unofficial extension to the Linux kernel, and you may have to maintain your own builds of Linux kernels. In addition, it is not nearly as flexible as adjusting the OOM score, and process names are not useful for uniquely identifying processes (e.g. Java- or Perl-based processes).</p>

<h3>Summary</h3>

<p>The Linux kernel provides a comprehensive interface for adjusting how processes are treated with regard to memory usage and the OOM-Killer. The most flexible method is the <tt>oom_adj</tt> interface in the <tt>/proc</tt> filesystem introduced above. PostgreSQL 9.0 will have additional support for this interface. Dedicated PostgreSQL systems can be configured to avoid overcommit entirely, but this requires a deeper understanding of the amount of memory the database system demands and of the kernel's virtual memory management.</p>]]>
        
    </content>
</entry>

</feed>

