[cvs] [Wiki] created: ChucksHorde4Thoughts

Chuck Hagenbuch chuck at horde.org
Tue Jul 8 02:33:02 UTC 2008


chuck  Mon, 07 Jul 2008 22:33:02 -0400

Created page: http://wiki.horde.org/ChucksHorde4Thoughts

[[toc]]

+ Chuck's Horde 4 Thoughts

++ Unsorted

horde debugging:
http://www.sitepoint.com/blogs/2008/05/13/useful-in-browser-development-tools-for-php/

chuckhagenbuch: we could use standard ways of gathering debug output...

chuckhagenbuch: we could have a 'debug' driver...

chuckhagenbuch: specify an extra parameter like 'debug_driver' to say what real
subclass to use...

chuckhagenbuch: it would be able to intercept all calls and return values.
that'd be better than nothing...

AlkernF: could it be generic enough to work with
everything? and how do you intercept calls?

chuckhagenbuch: it would probably have to be written for each driver type

chuckhagenbuch: and by intercept, i mean that you would have it instantiate a
real driver, keep it as a variable, and then on every method call, you can see
the parameters, then you pass them to the real driver, look at the return
value, then pass the return value back...

chuckhagenbuch: (by driver type i mean Connection, Prefs, etc... - API, i guess)
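
A minimal sketch of the intercepting 'debug' driver described in the chat above. The class names DebugDriver and ExamplePrefsDriver are hypothetical stand-ins, not existing Horde classes, and the 'debug_driver' key is taken from the chat as an assumed config parameter. It relies only on PHP's __call() magic method to forward every call to the wrapped driver.

```php
<?php
// Hypothetical stand-in for a real driver (e.g. a Prefs backend).
class ExamplePrefsDriver
{
    private $values = array();

    public function setValue($name, $value)
    {
        $this->values[$name] = $value;
        return true;
    }

    public function getValue($name)
    {
        return isset($this->values[$name]) ? $this->values[$name] : null;
    }
}

// Hypothetical debug driver: wraps a real driver and records every call.
class DebugDriver
{
    private $real;
    public $log = array();

    public function __construct($real)
    {
        $this->real = $real;
    }

    public function __call($method, $args)
    {
        // Pass the parameters to the real driver, record the parameters and
        // the return value, then pass the return value back to the caller.
        $result = call_user_func_array(array($this->real, $method), $args);
        $this->log[] = array('method' => $method, 'args' => $args, 'result' => $result);
        return $result;
    }
}

// 'debug_driver' names the real subclass to instantiate (assumed config key).
$params = array('debug_driver' => 'ExamplePrefsDriver');
$class = $params['debug_driver'];
$driver = new DebugDriver(new $class());

$driver->setValue('theme', 'silver');
$theme = $driver->getValue('theme');
```

As the chat notes, a generic proxy like this only sees public method calls; a per-API debug driver (Connection, Prefs, etc.) could format its log in a way that makes sense for that API.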


     // Return a 304 if the file hasn't been modified since the If-Modified-Since
     // date; no point in resending all the data if the browser already has it cached.
     if (function_exists('apache_request_headers')) {
         $headers = apache_request_headers();

         // The header is optional, so check for it before reading it.
         if (!empty($headers['If-Modified-Since'])) {
             $ims = strtotime($headers['If-Modified-Since']);
             // strtotime() returns false for an unparseable date.
             if ($ims !== false && $ims >= $serve_data['modified_time']) {
                 header('HTTP/1.0 304 Not Modified');
                 exit(0);
             }
         }
     }


horde apps - "instance" of a horde app == installed horde app + a  
group of horde_policies that configure it

let those policies be named

instead of using shipping/foo api calls, use $instance->foo()

$Horde->api->method() (chaining)?



I just went through my first signup process that required an
SMS-capable device for confirmation. It also didn't make me pick my
credit card type, and instead used my country code (+1) to decide on a
card detection algorithm.

update_client.pl /modules/future_contribution /modules/future_signup


I think I have now found the right mysql-server settings, with which the
performance is quite OK. Increasing the sort_buffer_size was one of
the changes that helped.

skip-external-locking
skip-thread-priority
key_buffer = 64M
max_connections = 1024
max_connect_errors = 1000
max_allowed_packet = 8M
table_cache = 512
sort_buffer_size = 8M
read_buffer_size = 1M
read_rnd_buffer_size = 2M
myisam_sort_buffer_size = 64M
thread_cache_size = 50
query_cache_size = 128M
tmp_table_size = 1024M
thread_concurrency = 12
wait_timeout = 60
interactive_timeout = 60
log_slow_queries



add dynamic finders (find_by_name, find_by_id, etc.) to Rdo Mappers or  
Horde_Db_Model or whatever
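
A rough sketch of how dynamic finders could work, assuming a hypothetical ExampleMapper class that keeps its rows in an array so the example runs without a database. This is not the Rdo or Horde_Db API; it only shows the __call() technique that find_by_name-style methods usually rest on.

```php
<?php
// Hypothetical mapper; not the real Rdo or Horde_Db API. Rows live in an
// array here so the sketch runs without a database.
class ExampleMapper
{
    private $rows;

    public function __construct(array $rows)
    {
        $this->rows = $rows;
    }

    // Turn find_by_<field>($value) into a lookup on that field.
    public function __call($method, $args)
    {
        if (strpos($method, 'find_by_') !== 0 || count($args) !== 1) {
            throw new BadMethodCallException("Unknown method: $method");
        }
        $field = substr($method, strlen('find_by_'));
        $matches = array();
        foreach ($this->rows as $row) {
            if (array_key_exists($field, $row) && $row[$field] === $args[0]) {
                $matches[] = $row;
            }
        }
        return $matches;
    }
}

$mapper = new ExampleMapper(array(
    array('id' => 1, 'name' => 'chuck'),
    array('id' => 2, 'name' => 'jan'),
));
$found = $mapper->find_by_name('jan');
```

A SQL-backed mapper would presumably translate the field name into a WHERE clause instead of looping, after checking it against the known column list.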


Controller classes/objects vs. Action classes/objects vs. Resources vs. API



how to develop? give up central config?

http://www.w3.org/Provider/Style/URI

index.php - global dispatcher
how to do themes/custom templates? chain local -> app -> horde?

a horde 4 installation:
   config/
   lib/
   apps/
   public/ <- with app/ subdirs containing images, etc.
everything routable goes in apps/

apps/
   login/
   help/
   prefs/
   admin/
   etc...

... auto-install web files to a writable dir, either in web ui or in  
cli? keep apps self-contained that way?

app name is the first part of the route > /login
subdomain support
route aliases

an app should uncompress over a horde/ dir - /config/app/*.php ->  
config dir is compiled/cached

horde is not rails. it is designed as a container for multiple,  
collaborating apps

horde apps are configured by Horde_Policy objects

need Horde_Db, whatever implements DML, DDL, and SQL - Mad? MDB2?
  - prefer PHP over XML
merge Rdo and Mad into Horde_Db

allow for overriding the mappers so that non-SQL can be used, but,  
default to SQL/sqlite and leverage it

framework repository/module
   Horde/lib/...
   Rampage/lib/...
      use subpackages or multiple *.xml for packages to avoid silliness?

apps should be installable into a horde container. shouldn't be tied  
to the app name - keep imp, krono, etc, but install as mail, cal,  
events (should be able to install two versions of krono w/ different  
permissions - see HordeSpaces)

installing gives a slug, that slug manages config, templates, themes,  
perms, etc.

----

figure out how to merge luxor into Chora

----

for now, build Horde_Content_* based on Rdo, then move to Horde_Db

Horde_Db provides Horde_Db_Mapper which creates Horde_Model_Base objects

apps have a config/ dir, but that's just defaults and defining base
routes, policies, etc. user settings are stored in the db or a global
directory.



should have parallel web and cli configuration and installation/update  
tools; web requires webserver to have write access to a config/ dir  
and to public/; cli tools do not (if run as another user)



Horde 4 app - a Horde 3.x app updated for PHP 5 and to use the latest  
libraries

Rampage app - "RAD" (rapid application development) MVC app that uses Horde 4

/horde/page/ -> dispatcher for Rampage modules w/ views (overridable),  
routes, controllers, etc.?
have generic views for rampage_login, rampage_admin_*, etc.

configuration:
config/routes.php
config/routes_local.php -> do this for all config files

Horde_Content_Index -> horde-wide search


Random Horde Ideas

mini-cms for building your own sidebar/menu/etc?
- shortcuts to any bit of horde

labels labels labels
keywords also or just labels? probably just flexible labels
"smart folders"

Getting Things Done support? (other apps that do it - Tracks, Kinkless  
GTD, Midnight Inbox)

make mnemo into more of a snippet keeper? sort of like a personal cms  
- or wiki. carry the encryption feature through to other kinds of  
content

create an outliner!

tags/labels for mail

rename virtual folders to smart folders? too apple?

freetext boolean mail searches:
apples & oranges
apples | oranges
apples ! oranges (apples but not oranges)
apples & (oranges | lemons)


security of redirects:

http://www.xssed.com/mirror/39494/

This is sort of an interesting one.  For the actual attack he merely
figured out that we are base64 encoding the successurl and reflecting
back whatever is there.  The interesting thing is that merely
filtering the unencoded data is not going to save us here.  The string
was javascript:alert(/XSS.By.Mityo/) and was being loaded into the URL
field of a meta redirect, so our max filter of strip_tags is useless.
It just illustrates the rationale for Phase 2 of the security build-out,
where we have to be careful when we are dealing with redirects.

In this particular case, we need to make sure we are getting a valid
URL format.  That will prevent javascript insertions.  But we also
want to make sure the URL is not redirecting outside the intended
domain for some phishing scam.  In this case I will fix the problem by
validating the URL in cons/login.inc.php, where the data is coming
in, but will also try doing it in the generic show_redirect_message()
call if I think I can do so without breaking other pages.
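
The two checks described above (valid URL format, and no redirect outside the intended domain) might look roughly like this. The function name validate_redirect_url() and the host whitelist are assumptions made for illustration; this is not the code from cons/login.inc.php or show_redirect_message().

```php
<?php
// Hypothetical helper: returns the decoded URL if it is safe to redirect to,
// or null otherwise. $allowedHosts is an assumed whitelist of our own hosts.
function validate_redirect_url($encoded, array $allowedHosts)
{
    // Strict mode rejects input containing characters outside the base64 alphabet.
    $url = base64_decode($encoded, true);
    if ($url === false) {
        return null;
    }
    $parts = parse_url($url);
    if ($parts === false || empty($parts['scheme']) || empty($parts['host'])) {
        return null;
    }
    // Only http(s) URLs: this rejects javascript:, data:, and similar schemes.
    if (!in_array(strtolower($parts['scheme']), array('http', 'https'), true)) {
        return null;
    }
    // Stay inside the intended domains to rule out phishing redirects.
    if (!in_array(strtolower($parts['host']), $allowedHosts, true)) {
        return null;
    }
    return $url;
}

$hosts = array('www.example.com');
$good  = validate_redirect_url(base64_encode('https://www.example.com/account'), $hosts);
$xss   = validate_redirect_url(base64_encode('javascript:alert(/XSS/)'), $hosts);
$phish = validate_redirect_url(base64_encode('http://evil.example.net/login'), $hosts);
```

parse_url() is a parser rather than a validator, so the accepted URL should still be escaped with htmlspecialchars() before it is written into the meta redirect.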






Event-driven apps:
"Understanding and implementing this event model can free your
application from the constraints of defined elements. For example,
instead of applying an event listener for each link in a menu, you can
assign a single listener to the menu item itself and retrieve the
event target. That way you don't need to change your script when the
menu gets larger or when links get removed from it."
http://yuiblog.com/blog/2007/01/17/event-plan/



tagging/instant hierarchies as specialized permission-based search
RBAC


what is horde?

groupware?
horde data services?
horde data access?
ui layers

be the php dojo framework? or the php yui framework?
see http://tigermouse.epsi.pl/ ?
or, don't do desktop-like widgets? see UI design bookmarks

move away from gettext, at least as a default? midgard i18n notes:
http://www.midgard-project.org/discussion/developer-forum/midgard-s-multilang-support/

try to rely only on thread-safe extensions?
reduce dependency tree

avoid globals and non horde-namespaced functions/methods in framework  
and core app code
class-based registry apis


against edge cases: http://www.bakesalehq.com/contents/show/12/

features from Prado? http://www.urdalen.com/blog/?p=198

use functions where appropriate for shortcuts/helpers, like Mike's  
t("translated string") function? but would be horde_t? would call  
configured translation system


helper sets for dojo, protaculous, yui - simple functions like  
dojo_editor(), dojo_pane(), yui_map(), etc. Load with something like  
Horde/Layout/Helpers/YUI.php, etc. See http://www.ngcoders.com/projax/

Horde as a set of apps and methodology needs to pick a js lib, pick a
template methodology, etc. - this is Rampage. Horde as a framework can
allow for flexibility.


To make it even better, separate the control logic from the  
presentation. That way, back could be reverse, etc. I do this in all  
my forms since application logic and presentation "word play" are two  
distinct things to me. This is what I use:

<form method="post" action="form.php">
<input name="submit[back]" value="reverse" type="submit" />
&nbsp;
<input name="submit[next]" value="speed ahead" type="submit" />
&nbsp;
<input name="submit[home]" value="no place like home" type="submit" />
</form>

Then, you can have a simple routine that captures submit actions
regardless of the presentation value. You check that the submit array
has a count of 1 and whitelist its key against the acceptable values. A
multi-row table can expand upon the theme by using this: submit[edit_3],
submit[delete_3], submit[edit_5], etc.
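
The capture routine described above could be sketched as follows. The function names get_submit_action() and parse_row_action() are made up for this example; the form field names come from the HTML above.

```php
<?php
// Hypothetical helper: pull the single action name out of the submit[] array
// and check it against a whitelist, ignoring the button's display value.
function get_submit_action(array $post, array $allowed)
{
    if (!isset($post['submit']) || !is_array($post['submit'])
        || count($post['submit']) !== 1) {
        return null;
    }
    $keys = array_keys($post['submit']);
    $action = (string) $keys[0];
    return in_array($action, $allowed, true) ? $action : null;
}

// For multi-row tables: split "edit_3" into array('edit', 3).
function parse_row_action($action)
{
    if (preg_match('/^(edit|delete)_(\d+)$/', $action, $m)) {
        return array($m[1], (int) $m[2]);
    }
    return null;
}

// Simulated POST data for the "reverse" button in the form above.
$post = array('submit' => array('back' => 'reverse'));
$action = get_submit_action($post, array('back', 'next', 'home'));
```

Because only the array key is inspected, the button labels can be reworded or translated without touching the control logic.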


caching
make sure Rdo and other services allow dropping in caching rules


http://sebastian-bergmann.de/pages/talks.html
phpunit - @test markup in methods
   phpunit + selenium
   cruise control?


really hope google will integrate any product of theirs with any other
products of theirs? receive an email, transform it to document, add
spreadsheet, add notes, add bookmarks saved from search history and a
link to an event in calendar anyone?


From nyphp-talk:
     The other day I had to get an application started in a hurry.  It's
doing something useful at < 700 lines,  but I'm considering options that
could grow it out to about 10 times that.  It depends on a "core
library" that's < 500 lines.  This library deals with common issues in
string handling,  parameter handling,  and HTML form generation.

     About 10% of the application,  or 70 lines,  is a microframework
that's loosely built on Struts.  About 20 of those lines are in 2
functions which would be generally useful for microframeworks (such as
file_exists_in_include_path()).  Like Struts,  the microframework
chooses an "action" based on form parameters:  the action then chooses a
"view" -- a "view" is basically a template that a designer can edit
which can be supplemented by an optional "query" which pulls stuff out
of the database.  Like Ruby-on-Rails,  the microframework uses
convention instead of configuration:  the dispatcher computes an "action
name" based on query parameters,  and uses that to compute a
filename...  It checks that the file exists and executes it with the
"require method".

     The microframework uses no object-oriented techniques.  That's not
because I have any antipathy to OO,  but because I didn't need it,  and
I like writing my actions,  queries,  and views in a style that "feels
like PHP".

     Yes, my microframework is nowhere near as powerful as CakePHP or
Symfony.  Yet,  it's more flexible,  because I can codesign it with my
application.  Because it's so simple,  I can easily adapt it to do what
I want.  If I decide I really hate it,  I can write a new one in an
hour.  I'm an expert on it,  because I developed it,  and I wouldn't
have to take on the technical,  social and emotional burdens of
"forking" an open-source codebase if I wanted to make a change in direction.

     I'm moving towards a vision of web app architecture where we move
towards shared vocabulary and standardized interfaces.  Rather than
working with a "comprehensive framework" that does everything,  I'd like
to have a "framework construction set" that contains a number of
elements that I can take or leave.
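
A sketch of the convention-over-configuration dispatch step the post describes: compute an action name from the query parameters, turn it into a filename, check that the file exists, and require it. The function name resolve_action_file(), the 'action' parameter, and the directory layout are assumptions; the post does not show the author's actual code.

```php
<?php
// Hypothetical convention: ?action=show_item maps to <actionDir>/show_item.php.
// Returns the file to require, or null if the name is invalid or missing.
function resolve_action_file(array $query, $actionDir, $default = 'index')
{
    $name = isset($query['action']) ? $query['action'] : $default;
    // Restrict names to a safe alphabet so a request cannot escape $actionDir.
    if (!is_string($name) || !preg_match('/^[a-z][a-z0-9_]*$/', $name)) {
        return null;
    }
    $file = $actionDir . '/' . $name . '.php';
    return is_file($file) ? $file : null;
}

// Usage in a front controller (the directory name is an assumption):
//   $file = resolve_action_file($_GET, __DIR__ . '/actions');
//   if ($file === null) { header('HTTP/1.0 404 Not Found'); exit; }
//   require $file;
```

The name check matters more than the existence check: without it, a parameter such as ../../etc/passwd would be handed straight to require.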




Resources:
http://www.ryandaigle.com/articles/2006/06/30/whats-new-in-edge-rails-activeresource-is-here


mixins: http://www.symfony-project.com/book/trunk/17-Extending-Symfony

split db ideas: http://pear.php.net/pepr/pepr-proposal-show.php?id=359

http://dataspill.org/pages/projects/ruby-activeldap


More php features to look in to:

__toString works everywhere

SPL features: Regex Iterators, SplFileObject CSV support, Caching Iterator
Data: stream support

DateTime and DateTimeZone classes

set date.timezone ini setting automatically based on user?


Search engine sitemap stuff - of use at all? maybe support in rampage cms
http://p7.hostingprod.com/@www.ysearchblog.com/archives/000437.html



5. I want a registration info tab like Inbox.lv where they can change  
their personal stuff they put on file with us on the signup forms.

14. We may need  Windows address book synchronization(this is a  
feature that fastmail is adding, and hotmail already has, so I guess  
we will have to also?) It is not a must in my books.

17. I want to add a new feature next to the attach button that is like
send message after attached, so if they are uploading a big file they
can leave and it will be sent automatically.

19. We MUST have an easy user interface. Fastmail has lots of features
and they try to make it where you can do everything in 2 clicks or
less. We need to try to do this. Fastmail is all bunched up and looks
like shit though. We need to make ours packed with features
like fastmail, but spread out like AOL has or fastmail. This will
attract all the old people and beginners of the internet who have just
gotten off of AOL and moved to DSL. Fastmail looks like it is only made
for advanced users and is hard to get used to. We need to

Have a main navigation bar which is on every page, which has all
the mail icons that people use the most, like compose, inbox,
addressbook, options, and the main navigation bar should be on every
page at the top. Then we will have a subnavigation bar for each other
page; for example, if you were to hit the calendar icon on the main
navigation bar that is on the top of EVERY page, then it would take
you to the calendar page and show you the calendar, and the
subnavigation bar would have all the calendar icons like add events
etc. I was thinking, in IMP we could have the logo at the top left
corner of the page, then on the top right we could have all the main
navigation icons. Both the logo and the main navigation would be on
every single page in IMP, so it would be easy to get around. Then the
subnavigation bars could go where the main navigation bar is now in
IMP, understand?

2. Make a bounce button like fastmail.fm. This is how fastmail  
explains their bounce button:
"'Bounce' takes the currently selected emails and sends back an email
to the addresses the email(s) came from saying basically that 'the
email address does not exist' in a standard internet email protocol
way. Some more organised spammers remove these from their lists. After
sending the bounce response, the messages are deleted."

* If accessed with a browser, public folder is also a personal  
web-site, accessible at http://username.fastmail.fm
* Provide tool allowing synchronization of Outlook Express etc address  
book with FastMail contacts, possibly using LDAP

* Use JavaScript for browsers that support it to speed up many  
actions, such as searching through the address book

* A general notification system, so you can send a pager message, SMS  
message, instant message, or short email


eGroupWare over Horde reasons

Linking:  There is the "infolog" for linking items.  An infolog item  
can be a to-do, call, or note.  It can link to the addressbook,  
projects, calendar, or another infolog item.  That is very flexible.

Access Control:  Under Preferences, there is a "Grant Access" link for  
the calendar, addressbook, infolog, and projects.  It allows you to  
select Read, Add, Edit, Delete, and Private access for each group and  
each user.  Again, very flexible.

Categories: Multiple category selection is allowed in the addressbook,  
projects, calendar and infolog.

Custom Fields:  I can create custom fields.




PHP_SELF

Executive summary: PHP_SELF intentionally includes extra URL garbage (or
valuable URL variables, take your pick) tacked on by the user.  Don't use
it without knowing what it does.

Here's what you get when you hit the URL:

http://example.com/info.php/testing1?testing2 :

_SERVER["REQUEST_URI"]         /info.php/testing1?testing2
_SERVER["PHP_SELF"]    /info.php/testing1
_SERVER["SCRIPT_NAME"]         /info.php

Get it?  If you don't want that extra stuff tacked on by the user, use the
correct _SERVER variable.  If you use REQUEST_URI or PHP_SELF, be aware the
user can affect the contents of that variable.  99% of the time, you want
SCRIPT_NAME, not PHP_SELF.

By the way, here's another test:

http://example.com/info.php/testing<script>?testing :

_SERVER["REQUEST_URI"]         /info.php/testing%3Cscript%3E?testing
_SERVER["PHP_SELF"]    /info.php/testing<script>
_SERVER["SCRIPT_NAME"]         /info.php

Note that the REQUEST_URI variable, which comes from Apache, is encoded,
while the PHP_SELF variable, which comes from PHP, is not.  So PHP 5.2.0
still makes it possible to shoot yourself in the foot, and as I've pointed
out below, well-known PHP authorities actually recommend that you do so.
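
To make the difference concrete, here is a small sketch of a self-referencing form built from SCRIPT_NAME instead of PHP_SELF. The helper name self_action() is made up for this example, and the $server array simulates the $_SERVER values shown in the second test above.

```php
<?php
// Hypothetical helper: build a self-referencing form action from SCRIPT_NAME
// (not PHP_SELF), escaped for use inside an HTML attribute.
function self_action(array $server)
{
    $script = isset($server['SCRIPT_NAME']) ? $server['SCRIPT_NAME'] : '';
    return htmlspecialchars($script, ENT_QUOTES, 'UTF-8');
}

// Values reported for http://example.com/info.php/testing<script>?testing
$server = array(
    'REQUEST_URI' => '/info.php/testing%3Cscript%3E?testing',
    'PHP_SELF'    => '/info.php/testing<script>',
    'SCRIPT_NAME' => '/info.php',
);

$form = '<form method="post" action="' . self_action($server) . '">';
```

In a real page the call would be self_action($_SERVER). Escaping is kept even though SCRIPT_NAME is not user-controlled in this scenario, since anything echoed into an attribute should be escaped anyway.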

Here's the email that I sent in July 2005:


Subject: Re: [nyphp-talk] $_SERVER['PHP_SELF'} not working?
Date: Friday 22 July 2005 12:05 pm
From: Michael Sims <jellicle at gmail.com>
To: NYPHP Talk <talk at lists.nyphp.org>

On Thursday 21 July 2005 17:16, Dan Cech wrote:
You could put:

$_SERVER['PHP_SELF'] = $_SERVER['SCRIPT_NAME'];

into one of your common include files.

Yes.  I'm afraid I don't understand this entire thread.  Apparently
because of the numerous PHP developer articles recommending it, and
because of the php.net page which for whatever reason lists it first on
the list of predefined variables, people are using PHP_SELF when they
really want SCRIPT_NAME.  SCRIPT_NAME solves all the problems mentioned
in this thread - it's just the script name, without any extra garbage
that might be tacked on by the user.  PHP_SELF explicitly includes that
extra garbage, so solutions in this thread that involve stripping the
garbage off of PHP_SELF to make it safe are really, really missing the
point - just use SCRIPT_NAME instead.  Please don't use FORM ACTION="";
according to the spec, what the browser does with that is undefined, so
even if it works in current browsers, it might not work in future ones.

People can be forgiven for making this mistake -- I'm here holding my
copy of _Learning PHP 5_, and it recommends on page 8 and again on page
86 the use of PHP_SELF for self-referencing forms, ahem -- but it's time
to put it to bed: PHP_SELF is unsafe for any usage where it is echoed
back to the page.


SESSIONS:

   I'll try to reply to this and some other people who replied to my previous
message.
    I'll start with my background.  I've often been the person who the buck
stops with -- somebody else develops an application that almost works (perhaps
even puts it in production) and then I have to clean up the mess.  The app
might be written in PHP,  Java,  Cold Fusion,  Perl,  you name it.  I've
learned to see session variables as a "bad smell".

    When I develop my own applications,  I use cookies for  
personalization and caching.  I
use the authentication system described in

http://cookies.lcs.mit.edu/pubs/webauth:sec10-slides.ps.gz

    This mechanism can carry a "session id",  which in turn can be used as a
key against application state stored in a relational database.  I think
through the boundary cases,  and find that my greenfield apps behave
predictably -- my only woe is that browsers have a lot of undocumented
behavior connected with cookies,  form handling,  and caching.  These are all
problems that you still need to fight with if you use sessions;  see the
comments for

http://www.php.net/manual/en/function.session-cache-limiter.php

----

    The context of this is that the average web application is poor in  
the areas of
usability and security:  recent studies show that 80% of web  
applications have serious
security problems

http://www.whitehatsec.com/home/resources/presentations/files/wh_security_stats_webinar.pdf

    Jakob Nielsen's website has been chronicling the sorry state of web
application usability:

http://www.useit.com/

    Perhaps the top 20% of programmers can write applications with  
$_SESSION that don't
have serious security and usability problems,  but what about the other 80%?

----

(1)  Session variables are treacherous.  Odd things can happen in  
boundary cases,  such
as when sessions expire,  or when you are targeted by session fixation  
attacks.

http://shiflett.org/articles/security-corner-feb2004

    I've looked at many apps that use sessions that seem to be working...
Until you walk away for two hours,  come back,  and discover that you're
logged in as somebody else.  I suppose I could have spent hours or days
tracking down an intermittent problem,  which involved some confluence of
browser oddness (IE was fine,  Firefox was screwy),  the behavior of the
session system,  and crooked logic in the application.  Or I could use
cryptographically signed cookies to implement an authentication system which
won't give me surprises in the future.
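
A minimal sketch of a cryptographically signed authentication cookie, assuming an HMAC-SHA256 signature over a user id and an expiry time. The function names, the '|' field separator (user ids must not contain it), and the secret are all assumptions for illustration; this is in the spirit of the scheme referenced earlier, not a transcription of it.

```php
<?php
// Hypothetical helpers; the secret, field layout, and lifetime are assumptions.
function make_auth_cookie($userId, $expires, $secret)
{
    $payload = $userId . '|' . $expires;
    return $payload . '|' . hash_hmac('sha256', $payload, $secret);
}

// Returns the user id if the cookie is authentic and unexpired, else null.
function check_auth_cookie($cookie, $now, $secret)
{
    $parts = explode('|', $cookie);
    if (count($parts) !== 3) {
        return null;
    }
    list($userId, $expires, $mac) = $parts;
    $expected = hash_hmac('sha256', $userId . '|' . $expires, $secret);
    // hash_equals() compares in constant time to avoid timing leaks.
    if (!hash_equals($expected, $mac)) {
        return null;
    }
    // The expiry is covered by the signature, so the client cannot extend it.
    if (!ctype_digit($expires) || (int) $expires < $now) {
        return null;
    }
    return $userId;
}

$secret = 'example-secret-change-me';
$cookie = make_auth_cookie('202', 1700003600, $secret);
$user   = check_auth_cookie($cookie, 1700000000, $secret);
```

The server keeps no per-user state for authentication here: expiry is carried in the cookie itself, so there is no server-side session to time out unexpectedly.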

Anybody can write applications that work 95% of the time with  
$_SESSION.  Getting the
other 5% right requires a deep understanding of state and  
statelessness on the web...
Which is what (many) people are trying to avoid when they use  
$_SESSION variables.

    There are more than twenty configuration variables that affect the  
way sessions work
under PHP.  Incorrect configuration of any of these can cause  
applications to fail,
often in intermittent ways.  The use of a custom session handler can  
have unpredictable
effects on security,  reliability and performance.

    Other languages are a lot worse than PHP -- the use of the "scope"  
concept in
languages such as Cold Fusion and Tango makes it easy to use a session  
variable without
realizing it...  Resulting in an application that "works" sometimes,   
but fails in
mysterious ways.

(2) Session variables are bound to a particular language.  In the real  
world,  I work
with legacy systems that might be written in other languages.  I might  
have some old
pages in Cold Fusion that work just fine,  and I won't rework them in  
PHP until I've got
a good reason.  If users can set a customization parameter,  such as  
the background of a
page,  it's easy to write a cookie that all languages can read.   
Applications stuck in
the session variable roach motel aren't as maintainable and portable.

(3) PHPSESSID.  Do I need to say more?  I consider the client that wants user
tracking and can't accept cookies,  so all the pages on their site look like

http://www.example.com/about_us.php?PHPSESSID=**pseudo-random blob**

    Three months later they come back and wonder why their site isn't being
indexed in Google.  Yes,  there's a saner way to use this feature,  but this
"cure" to privacy violation is worse than the cookie "disease",  since session
ids will leak out through referrers,  bookmarks,  links that people
cut-and-paste...

(4) The back button.  When somebody asks a question about sessions on  
a forum,  they'll
usually ask another question a few days or weeks later:  "How do I  
disable the back
button?"

    The underlying problem is a deep aspect of the structure of the  
web.  There is certain
state information that's particular to a request (GET and POST  
variables) and certain
state information that has a more persistent scope (cookies,  session  
information,  a
relational database.)  The back button makes it possible for these two  
things to get out
of sync.

    Ultimately,  we need a systematic strategy to deal with this.  One pattern
is to put the complete state of the application in form variables.
Applications that use this pattern always work perfectly with the back button.
This pattern doesn't always work (hitting the back button shouldn't cancel
your order on an e-commerce site),  but it works often...  For instance,  you
can use hidden variables to hold onto form variables for complicated forms
that spread over several pages.

(5) Multiple windows.  I think it's a human right to be able to have  
more than one window
open on a web site.  If I'm shopping,  for instance, I'd like to be  
able to look at two
products simultaneously.  An application that keeps state in form  
variables doesn't care
how many you have open.  If you're looking for jobs at an organization  
that uses
taleo.net's software,  you'll find that it uses trickery to prevent  
you from having more
than one window open...  So you can't look at two jobs at once,  or  
look at the job
description while you're filling out the application.  I suspect that  
they did this
because they don't want to spend forever debugging "race conditions"  
that could be caused
by a user acting in two windows simultaneously.

    Session variables introduce problems of locking.  PHP gets an  
exclusive lock on the
session for each page displayed.  This hurts the performance of pages that use
dynamically generated images and Javascript,  and can mysteriously  
deadlock AJAX
applications.

(6) Scalability,  Reliability,  and all that.  This is a tricky one,   
because it depends
on particulars.  Sessions can be lightning-fast in systems that keep  
them in RAM,  such
as Java and Cold Fusion.  The default session handler in PHP uses  
files,  and is probably
faster than a relational database in a direct comparison:  however,   
the session handler
will load all of the data into RAM,  whereas a relational  
implementation may only need to
load information when it's needed.  Keeping information in POST  
variables or cookies also
involves a tradeoff -- this is as scalable as it gets so far as server  
resources,  but
requires that the state be passed back and forth between the browser  
and server.  This is
no big deal if the state is 500 bytes.  It's unacceptable if the state  
is 500 megabytes.
In most cases,  it starts looking expensive when we're passing an  
extra 10k-100k around.

I've recently been working on a legacy app that contains a query  
(select a subset of
items) and reporting (display user-selected fields of those items)  
function.  The
interface between those modules is simple:  the query system passes a  
comma-separated
list of item identifiers to the reporting system.  I like this,   
because it meant that
one system could be changed without affecting the other.  I had to  
update the app so it
would work with a changed database schema,  so both sides needed some work.

I discovered that the app was passing the item list as a session  
variable.  This worked:
unless I was using the application in two windows at a time.  In that  
case,  a query in
one window would change the report delivered in another window.  I  
thought about it,  and
realized that in this case,  result sets would always be under about  
10k,  and usually be
around 1k.  Therefore,  it made sense to pass this as a hidden  
variable in the form and
ditch the session variable.

This shows the kind of problems that regularly turn up in the  
applications that
developers "throw over the wall" to testers and clients.  Choose a  
session variable,  and
your application behaves mysteriously for a user who didn't respect  
the "one window at a
time" assumption you made.  Passing hidden variables in forms,  on the  
other hand,  might
work OK when you're testing with a small data set over a LAN,  but  
could rapidly become a
performance nightmare for dialup users using a production database.

Performance can be improved in a number of ways:  for instance,  by  
delta-sigma
compressing the item list,  or creating a "form scope" variable that's  
keyed against a
unique identifier in the form.  Either way,  quality web applications  
take quality
thought.

(7) Lack of engineered application state:  Engineered Application State is the
gem of database-backed web applications.

If you keep the state of your application in a relational database,  you need
to ~design~ the state of your application.  You need to ~think~ every time you
add or change a table in your relational database.  By contrast,  you can add
a new variable to your application as easily as typing '$'.

Desktop apps keep the application state in a tangle of pointers.  C  
and C++ applications
tend to contain 5 or more defects per thousand lines of code.  Errors  
show up in data
structures over time,  just as mutations occur in your cells.  Memory  
leaks,  application
hangs,  and crashes are cancers caused by these mutations.

PHP apps die at the end of each request,  and are reborn for the next  
request.  They
don't accumulate errors over time.  Web application environments such  
as Java and Cold
Fusion that involve a long-running process regularly hang or crash and  
require restarts.
When is the last time you've had to restart PHP?

A database protects you from errors in multiple ways.  Transactions,   
for instance,
protect against data corruption caused by crashing scripts.  It's easy  
to write

$_SESSION["logged_in"]=true;

in one place and

$_SESSION["logged-in"]=false;

in another,  introducing unpredictable behavior and security holes.  A  
relational
database will give you an error if you try something like that.

-------------

Can users of $_SESSION avoid the seven deadly sins?

Yes.

In practice they don't.


Paul,
That looks like a lot of info to digest without specific examples. Is  
there a book or
other resource on session management that you recommend that deals  
with these issues in
more detail?
Thanks.
-Leo
   I'm not aware of one,  but I wish there was.  I think the question  
isn't so much "session management" but about how to manage state in a  
stateless protocol -- sessions
are one abstraction for doing that,  but other abstractions exist too.

    I think the best approach here is the "Pattern Vocabulary"  
approach.  There are
certain practices,  that when applied to an application,  have certain  
results.

    For instance,  there's the pattern of "Stateless Server" -- the  
complete state of the
application (or subsystem thereof) is kept in hidden POST and GET  
variables.  You accept
some limits,  but get some real benefits:  infinite scalability,  no  
headaches with the
back button,  no need for cookies...

    You might try the above and then notice that you're passing 100K  
around in your hidden
form variables...  People are complaining that your app is slow.  Now  
you can generate a
unique id each time you draw a form ("Generated Form Scope",  for lack  
of a better term.)
  You can stuff your "hidden" variables into the database under this  
key,  and restore
them when the key comes back...  If your code is organized right (does  
something like
$vars=$_POST,  and only looks at $vars afterwards),  you can do this  
transparently to the
rest of your app.

    The same kind of thinking can protect you against certain kinds of back
button woes -- you can at least stop people from submitting the same form
more than once, by checking to see if a form with that unique id has been
submitted before.
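
The duplicate-submission check can be sketched in a few lines.  The function
name claim_form_submission() and the array standing in for a table are
assumptions; with a real database, an INSERT against a unique index on the
form id would do the check and the write in one step.

```php
<?php
// Stand-in for a table with a unique index on form_id (assumption).
$submitted_forms = array();

// Returns true the first time a form id is seen, false on any repeat.
function claim_form_submission(array &$seen, $form_id)
{
    if (isset($seen[$form_id])) {
        return false;
    }
    $seen[$form_id] = time();
    return true;
}

$first  = claim_form_submission($submitted_forms, 'a1b2c3');
$second = claim_form_submission($submitted_forms, 'a1b2c3'); // back button, resubmit
```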

    "Shopping Cart" is another pattern.  People often use session  
variables to handle
shopping carts,  but that's really not ideal from a user interface  
perspective...
Ideally, each instance of a shopping cart has its own unique id...  Imagine
we want to make an e-commerce site that behaves like amazon.com:

(1) User visits e-commerce site from a home computer -- a long-term tracking
    cookie gets stuck on their browser.
(2) User adds item A to their shopping cart...  A new shopping cart is
    created with id #101, associated with the tracking cookie.
(3) User adds items B, C, D, and E to their shopping cart in the course of
    30 minutes of browsing.  Each time an item is added, we add a row to a
    table in the database that links the item id to the shopping cart id.
(4) 4-year old hits reset button.
(5) User comes back to e-commerce site...  He's happy to find his cart is
    still there.  User creates account #202 to check out.  Shopping cart
    #101 is associated with account #202.
(6) User checks out shopping cart.
(7) User comes back a week later, wants to buy a few more items.  The site
    recognizes who he is.  He adds two of item A and an item F to a newly
    created shopping cart with id #102, associated with user account #202.
(8) User goes to work, logs in...  The system sees that he has shopping cart
    #102 open.  He adds item G, and then checks out.
(9) User learns that he can trust this site to work correctly and becomes a
    loyal customer.

    It's nice that we've got a historical record of the shopping cart after
the fact, but there's a more important point -- we could have lost the
customer's dollar at many points in the above transaction if we were using a
$_SESSION based cart.  The session wouldn't have survived step 4, for
instance.  A good user interface isn't academic here...  It puts money in
our pocket.

    The above scenario is complex, and it might not be fair to expect that a
first-generation shopping cart has those features.  A $_SESSION-based
shopping cart would need to be completely reworked to add the features
above.  A cart that uses a unique "cart id" and a relational back end will
be a lot more maintainable...  You could even start out using $_SESSION to
keep track of the "cart id", then keep it in a cookie, then associate it
with a user name, add the facility to promote an anonymous cart to an
authenticated cart and so on.  Starting with a good design, we can provide
the interface that we ~want~ to provide, not the one that our abstraction
layer ~forces~ us to provide.
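
To make the "cart id" design concrete, here is a small sketch covering steps
2 through 5 of the scenario.  The CartStore class and its method names are
hypothetical, and plain arrays stand in for the two relational tables (one
row per cart, one row per item added), so this shows the shape of the data
rather than a finished implementation.

```php
<?php
// In-memory stand-in for two relational tables (assumption: a real site
// would use SQL tables such as "cart" and "cart_item" with these columns).
class CartStore
{
    private $carts = array();   // cart_id => array(cookie, account_id, open)
    private $items = array();   // rows of array(cart_id, item_id, quantity)
    private $nextId = 101;

    // Create an anonymous cart tied to a long-term tracking cookie.
    public function createCart($trackingCookie)
    {
        $id = $this->nextId++;
        $this->carts[$id] = array('cookie' => $trackingCookie, 'account_id' => null, 'open' => true);
        return $id;
    }

    // One row per item added, linking the item id to the cart id.
    public function addItem($cartId, $itemId, $quantity = 1)
    {
        $this->items[] = array('cart_id' => $cartId, 'item_id' => $itemId, 'quantity' => $quantity);
    }

    // Find the open cart for a returning browser, if there is one.
    public function findOpenCartByCookie($trackingCookie)
    {
        foreach ($this->carts as $id => $cart) {
            if ($cart['open'] && $cart['cookie'] === $trackingCookie) {
                return $id;
            }
        }
        return null;
    }

    // Promote an anonymous cart to an authenticated one.
    public function attachToAccount($cartId, $accountId)
    {
        $this->carts[$cartId]['account_id'] = $accountId;
    }

    public function accountFor($cartId)
    {
        return $this->carts[$cartId]['account_id'];
    }

    public function itemsIn($cartId)
    {
        $found = array();
        foreach ($this->items as $row) {
            if ($row['cart_id'] === $cartId) {
                $found[] = $row['item_id'];
            }
        }
        return $found;
    }
}

$carts = new CartStore();
$cartId = $carts->createCart('cookie-abc');           // step 2
foreach (array('A', 'B', 'C', 'D', 'E') as $item) {   // steps 2 and 3
    $carts->addItem($cartId, $item);
}
// Step 4: the browser session is gone, but the cookie and the rows survive.
$recovered = $carts->findOpenCartByCookie('cookie-abc'); // step 5
$carts->attachToAccount($recovered, 202);
```

Nothing in the cart itself depends on $_SESSION, which is why the cart can
be found again by cookie and later attached to an account.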



In regards to slides 29 and 30, can you elaborate and give a more detailed
example of what they are trying to say?  Are they saying that the session
key should contain a hash of the data? Or does the hash become the "salt" in
crypting the data? Finally, how does doing that make it easier to prevent
circumvention and forgery?
   Let's take it a step at a time...  Imagine we've got a token of the
following format...
$token="$user_id:$session_id";

    The session_id doesn't have to be unpredictable -- it could come from an
auto_increment column in a database table...  With the caveat that people
could estimate the usage of your site by looking at the session ids.

    You could put this in a cookie, and it would work quite well, as long as
you didn't have users who knew how to look at or change the cookies.  An
attacker who understands cookies can easily change the user id or
session_id.

    To protect the cookies from tampering, we could do something like

$hash=sha1($token);
$signed_token="$hash:$token";

    We could check the integrity of the token by recomputing the hash and
seeing if it matches the one in the signed token.  This protects against
accidental damage, or very simple attacks.  Still, it's quite possible that
an attacker could guess what you're doing:  it wouldn't be safe at all in an
open source system.

    That's where the salt comes in...  For a particular web site, we create a
random "salt" that, effectively, gives us a unique hash function for our web
site.

$salt="... a random salt defined in a per-site configuration file ...";
// Site-specific hash: the salt is mixed in before hashing.
function private_hash($token) {
    global $salt;
    return sha1("$salt:$token");
}
$private_hash=private_hash($token);
$signed_token="$private_hash:$token";

    Now,  nobody can alter your tokens unless they know your salt.
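
For comparison, here is a sketch of the same idea built on PHP's hash_hmac()
rather than on sha1() over "$salt:$token".  This swaps in a different
technique from the one described above: HMAC is the standard construction
for keyed hashes, whereas prefixing a secret to the message and hashing it
with SHA-1 is open to length-extension attacks.  The names sign_token(),
verify_token() and $site_secret are made up for the example, and
hash_equals() assumes PHP 5.6 or later.

```php
<?php
// Hypothetical per-site secret; in practice load it from configuration.
$site_secret = 'replace-with-a-long-random-per-site-value';

// Prefix the token with a keyed hash so it cannot be altered without the secret.
function sign_token($token, $secret)
{
    return hash_hmac('sha256', $token, $secret) . ':' . $token;
}

// Return the original token if the signature checks out, or null otherwise.
function verify_token($signed_token, $secret)
{
    $parts = explode(':', $signed_token, 2);
    if (count($parts) !== 2) {
        return null;
    }
    list($mac, $token) = $parts;
    $expected = hash_hmac('sha256', $token, $secret);
    // hash_equals() compares the two strings in constant time.
    return hash_equals($expected, $mac) ? $token : null;
}

$signed   = sign_token('202:101', $site_secret);      // "$user_id:$session_id"
$tampered = str_replace(':202:', ':203:', $signed);   // attacker edits the user id

$good = verify_token($signed, $site_secret);
$bad  = verify_token($tampered, $site_secret);
```

The layout of the signed token is the same as before (hash, colon, token), so
the rest of the scheme described here carries over unchanged.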

    Because the tokens are cryptographically signed, the token itself is a
proof that somebody has logged in -- you don't need to look at the database
or keep ~any~ server side state.  This makes it a highly scalable system...
This basic approach is used on some of the biggest sites in the world, such
as yahoo.com.

    Except for one little detail:  replay attacks.

    Nothing stops a person from saving his token and presenting it later --
after his account may have been deactivated, or after associated session
information has been purged (an error condition).  An attacker that gets the
person's cookie jar, or who intercepts network traffic, can also steal the
token.

    It's not possible to completely protect against sophisticated attacks
where a hostile party controls your network without installing complex
software on both ends, and solving some intrinsically difficult problems
having to do with mutual authentication.  Let's just say that the developers
of SSL have solved these problems, and that you should use SSL for
applications with the strongest security needs.

    We can, however, make replay attacks a lot harder by adding a
timestamp...  Now the token looks like

$timestamp:$user_id:$session_id

    Now we're keeping a table on the server that looks like

create table session (
    session_id    ... session id ... primary key,
    user_id       ... user id ...,
    last_updated  ... timestamp ...,
    begin_time    ... timestamp ...,
    end_time      ... timestamp ...
);

    Now we've got two constants:

REFRESH_TIME: how old a timestamp is before we issue a token with a new
timestamp and write the timestamp to the last_updated column.
EXPIRE_TIME: how old a timestamp is before we eliminate the session.
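
How the two constants might be applied to an incoming token is sketched
below.  The concrete values (5 minutes and 1 hour) and the function name
check_token_age() are assumptions; the text above does not give numbers.

```php
<?php
// Assumed values, in seconds; the original text does not specify them.
const REFRESH_TIME = 300;   // reissue the token after 5 minutes
const EXPIRE_TIME  = 3600;  // eliminate the session after 1 hour

// Classify a token by the age of the timestamp it carries.
function check_token_age($token_timestamp, $now)
{
    $age = $now - $token_timestamp;
    if ($age >= EXPIRE_TIME) {
        return 'expired';   // eliminate the session, force a new login
    }
    if ($age >= REFRESH_TIME) {
        return 'refresh';   // issue a new token, write last_updated
    }
    return 'ok';            // token is recent enough to accept as-is
}

$now   = 1215484382;
$fresh = check_token_age($now - 60, $now);
$stale = check_token_age($now - 600, $now);
$dead  = check_token_age($now - 7200, $now);
```

With this split, a stolen token is only useful until its timestamp passes
EXPIRE_TIME, and the database is written at most once per REFRESH_TIME for
each active session.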

    You might think you could put the client ip address in the token, and
lock the session to an ip address to make it harder to steal tokens.  I
tried this, but found out that some of the largest ISPs (such as aol) have a
proxy server that makes users seem to "jump around".  You can do it if you
know people are logging in from a sane ISP, but you can't do it in general.

    ---

    This system can be improved in numerous ways, such as adding anonymous
sessions, operating in a split http/https mode, and caching authorization
information in the token.

    If you're worried about information leakage (you don't want someone to
know that he got session 88427 yesterday and 99105 today), you can encrypt
the token.  But be careful...  It's easy to use cryptography the wrong way:
don't rely on encryption to protect token integrity against tampering --
most of the obvious schemes don't really work.






cookie usage:
20 per domain, 4094 characters (bytes) in the value

Horde_Model -> Horde_Rdo_Model extends it
Horde_Type

Page/Block object
- how to return block from driver, inherit Block methods, but also  
inherit Rdo_Base?
Mapper! _Mappers are the drivers_

Nag - tasks are a model
different models for different sources of tasks
so maybe horde_rdo_model isn't extension but delegate?

types are string, etc.
types can be used by rdo as well as by forms (models)

form helpers go into horde_view helper pack

Horde_Model:
validation:

validatesPresenceOf
validatesUniquenessOf
validatesAcceptanceOf
validatesConfirmationOf


one database, one real filesystem space

no globals

webroot has:
index.php
.htaccess
assets/ (css, images, js)
mod_rewrite rules
everything else pear-installable
make assets pear installable somehow

viewbuilder/pagebuilder - custom views
command line and web service actions (still api/method/params)

catalyst::message() - replaces logmessage - fatal, notification,  
observer - has a return value (?)

session object management

cms for rampage based on (replacing) ulaform + wicked + giapeto


horde_form
  - db and xml descriptions instead of just php building


reconcile driver architecture with Rdo Models

apps provide models instead of forms?
apps provide route bundles? (if frontcontroller)
forms are models!

reconcile models and mappers
what do routes point to (models? mappers? views?) -> controllers
controllers handle mappers vs. models?
composite mapper? (turba, etc.)




After reading that theserverside.com entry, it seems like we've been doing
this in Solar (framework for PHP5) for a little while now.  Essentially,
after processing a form, you call
$this->_redirectNoCache('controller/action') and you shouldn't get any
re-POST troubles.

Boring code from the page-controller follows.

   <http://solarphp.com/svn/trunk/Solar/Controller/Page.php>

     /**
      *
      * Redirects to another page and action after disabling HTTP caching.
      *
      * The _redirect() method is often called after a successful POST
      * operation, to show a "success" or "edit" page. In such cases,
      * clicking "back" or "reload" will generate a warning in the
      * browser allowing for a possible re-POST if the user clicks OK.
      * Typically this is not what you want.
      *
      * In those cases, use _redirectNoCache() to turn off HTTP caching, so
      * that the re-POST warning does not occur.
      *
      * This method sends the following headers before setting Location:
      *
      * {{code: php
      *     header("Cache-Control: no-store, no-cache, must-revalidate");
      *     header("Cache-Control: post-check=0, pre-check=0", false);
      *     header("Pragma: no-cache");
      * }}
      *
      * @param Solar_Uri_Action|string $spec The URI to redirect to.
      *
      * @param int|string $code The HTTP status code to redirect with; default
      * is '303 See Other'.
      *
      * @return void
      *
      */
     protected function _redirectNoCache($spec, $code = 303)
     {
         // reset cache-control
         $this->_response->setHeader(
             'Cache-Control',
             'no-store, no-cache, must-revalidate'
         );

         // append cache-control
         $this->_response->setHeader(
             'Cache-Control',
             'post-check=0, pre-check=0',
             false
         );

         // reset pragma header
         $this->_response->setHeader('Pragma', 'no-cache');

         // continue with redirection
         return $this->_redirect($spec, $code);
     }


