If you want an insight into the critical design issues and programming techniques required for a web-oriented framework in PHP5, this book will be invaluable. Whether you want to build your own CMS style framework, want to understand how such frameworks are created, or simply want to review advanced PHP5 software development techniques, this book is for you.
As a former development team leader on the renowned Mambo open source content management system, author Martin Brampton offers unique insight and practical guidance into the problem of building an architecture for a web-oriented framework or content management system, using the latest versions of the popular web scripting language PHP.
The scene-setting first chapter describes the evolution of PHP frameworks designed to support websites by acting as content management systems. It reviews the critical and desirable features of such systems, followed by an overview of the technology and a review of the technical environment.
The following chapters look at particular topics, with:
- A concise statement of the problem
- Discussion of the important design issues and problems faced
- Creation of the framework solution
At every point, there is an emphasis on effectiveness, efficiency, and security, all vital attributes for sound web systems. By and large these are achieved through thoughtful design and careful implementation.
Early chapters look at the best ways to handle some fundamental issues such as the automatic loading of code modules and interfaces to database systems. Digging deeper into the problems that are driven by web requirements, the following chapters cover session handling, caches, and access control in depth.
New for this edition is a chapter discussing the transformation of URLs to turn ugly query strings into readable strings that are believed to be more "search engine friendly" and are certainly more user friendly. This topic is then extended into a review of ways to handle "friendly" URLs without going through query strings, and how to build RESTful interfaces.
The final chapter discusses the key issues that affect a wide range of specific content handlers and explores a practical example in detail.
Chapter 1, CMS Architecture: This chapter introduces the reasons why CMS frameworks have become such a widely used platform for websites and defines the critical features. The technical environment is considered, in particular the benefits of using PHP5 for a CMS. Some general questions about MVC, XHTML generation, and security are reviewed.
Chapter 2, Organizing Code: Before we go further with CMS development, let's look at a problem that can be neatly solved using PHP5. Substantial systems do not consist of a single file of code. Whatever our exact design, a large system should be broken down into smaller elements, and it makes sense to keep them in separate files, if the language supports it. Code is more manageable this way, and systems can be made more efficient.
As we are considering only PHP implementations, the source code files are used at runtime. PHP is an interpreted language and, at least in principle, runs the actual source code. So we need a good technique for handling many source files at runtime.
This creates issues; a paramount one is security. Another is ease of coding, where it is tedious and cumbersome to have to repeatedly include code to load other files. Yet another is efficiency, as we do not want to load code that is not needed for a particular request.
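The three concerns above (security, ease of coding, and efficiency) can be illustrated with a minimal sketch of a class-map autoloader. This is not the book's own code: the class name `DemoGreeter`, the temporary directory, and the map are hypothetical and exist only to make the example self-contained. It relies on PHP's standard `spl_autoload_register` function.

```php
<?php
// Hypothetical sketch: a class-map autoloader. The class name and file
// layout are illustrative assumptions, not the book's own code.

// Create a throwaway class file so the example is self-contained.
$dir = sys_get_temp_dir() . '/autoload_demo_' . uniqid();
mkdir($dir);
file_put_contents(
    $dir . '/DemoGreeter.php',
    '<?php class DemoGreeter { public function greet() { return "hello"; } }'
);

// Only classes listed in this map can ever be loaded, so a class name
// derived from request data can never select an arbitrary file.
$classMap = array(
    'DemoGreeter' => $dir . '/DemoGreeter.php',
);

spl_autoload_register(function ($class) use ($classMap) {
    if (isset($classMap[$class])) {
        require $classMap[$class];
    }
});

// No explicit include is needed; the file is read only at first use.
$greeter = new DemoGreeter();
$greeting = $greeter->greet();
echo $greeting, "\n"; // prints "hello"
```

Because the file is read only when `DemoGreeter` is first referenced, a request that never uses the class never pays the cost of loading it.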
Chapter 3, Database and Data Objects: It is in the nature of a content management system that the database is at its heart. Before we get into the more CMS-specific questions about handling different kinds of users, it is worth considering how best to handle storage of data in a database. Applications for the web often follow similar patterns of data access, so we will develop the database aspect of the framework to offer methods that handle them easily. A relational database holds not just data, but also information about data. This is often underutilized. Our aim is to take advantage of it to make easier the inevitable changes in evolving systems, and to create simple but powerful data objects. Ancillary considerations such as security, efficiency, and standards compliance are never far away.
Chapter 4, Administrators, Users, and Guests: With some general ideas about a CMS framework established, it is time to dive into specifics. First, we will look at handling the different people who will use the CMS, creating a basis for ensuring that each individual is able to do appropriate things. Although we might talk generally of users, mostly the discussion of "users" means those people who have identified themselves to the system, while those who have not are deemed "guests". A special subset of users contains people who are given access to the special administrator interface provided by the system.
Questions arise concerning how to store data about users securely and efficiently. If the mechanisms are to work at all, the ability to authenticate people coming to the website is vital. Someone will have to look after the permanent records, so most sites will need the CMS to support basic administrative functions. And the nature of user management implies that customization is quite likely.
Not all of these potentially complex mechanisms will be fully described in this chapter, but looking at what is needed will reveal the need for other services. They will be described in detail in later chapters. For the time being, please accept that they are all available, to help solve the current set of issues. In this chapter, we are solely concerned with the general questions about user identification and authentication. Later chapters will consider the technical issues of sessions and the question of who can do what, otherwise known as access control.
Chapter 5, Sessions and Users: Here we get into the detailed questions involved in providing continuity for people using our websites. Almost any framework to support web content needs to handle this issue robustly and efficiently. In this chapter, we will look at the need for sessions, and the PHP mechanism that makes them work. There are security issues to be handled, as sessions are a well-known source of vulnerabilities. Search engine bots can take an alarmingly large portion of your site bandwidth, and special techniques can be used to minimize their impact on session handling. Actual mechanisms for handling sessions are provided. Session data has to be stored somewhere, and I argue that it is better to take charge of this task rather than leave it to PHP. A simple but fully effective session data handler is developed using database storage.
Chapter 6, Caches and Handlers: Running PHP has quite a high cost, but in return we gain the benefit of a very powerful and flexible language. The combination of power and high cost suggests that for any code that will be executed frequently, we should use the power of PHP to aid efficiency. The greatest efficiency is gained by streamlined design. After all, not doing things at all is always the best way to achieve efficiency. Designing with a broad canvas, so as to solve a number of problems with a single mechanism, also helps. And one particular device, the cache, provides a way to store data that has been partly or wholly processed and can be used again. This obviates doing the processing over again, which can lead to great efficiency gains.
The discussion here is entirely about server-side caching. In general, a CMS is serving dynamic pages that may change without warning. It is usually undesirable for proxies between the server and the client to hold copies of pages and there are severe limits on the feasibility of allowing the browser to cache pages. Individual elements such as images, CSS, or JavaScript have much more potential, but this is often better handled by careful configuration of the web server than by adding PHP code. But there are large gains to be had by building an efficient server-side caching mechanism.
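As a rough illustration of server-side caching, the sketch below stores a processed result in a file and reuses it until it expires. The class `SimpleFileCache`, its `get` method, the key `menu:main`, and the 60-second lifetime are all assumptions made for this example; the book develops its own mechanism, which may differ considerably.

```php
<?php
// Hypothetical sketch of a server-side cache; SimpleFileCache and its
// methods are illustrative names, not the book's API.
class SimpleFileCache
{
    private $dir;
    private $lifetime;

    public function __construct($dir, $lifetime)
    {
        $this->dir = $dir;
        $this->lifetime = $lifetime; // seconds before an entry is stale
    }

    // Return the cached value for $key, or compute, store, and return it.
    public function get($key, $producer)
    {
        $file = $this->dir . '/' . md5($key) . '.cache';
        if (is_file($file) && (time() - filemtime($file)) < $this->lifetime) {
            // Only files written by this class are ever unserialized.
            return unserialize(file_get_contents($file));
        }
        $value = call_user_func($producer);
        file_put_contents($file, serialize($value), LOCK_EX);
        return $value;
    }
}

$cacheDir = sys_get_temp_dir() . '/cache_demo_' . uniqid();
mkdir($cacheDir);
$cache = new SimpleFileCache($cacheDir, 60);

$calls = 0;
$expensive = function () use (&$calls) {
    $calls++; // count how often the real work is done
    return array('title' => 'Home', 'items' => array(1, 2, 3));
};

$first = $cache->get('menu:main', $expensive);
$second = $cache->get('menu:main', $expensive);
echo $calls, "\n"; // prints 1: the second call was served from the cache
```

The producer callback keeps the expensive work out of the cache class, so the same mechanism can serve menus, rendered fragments, or query results.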
Chapter 7, Access Control: With ideas about users and database established, we quickly run into another requirement. Many websites will want to control who has access to what. Once embarked on this route, it turns out there are many situations where access control is appropriate, and they can easily become very complex. So in this chapter we look at the most highly regarded model, role-based access control (RBAC), and find ways to implement it. The aim is to achieve a flexible and efficient implementation that can be exploited by increasingly sophisticated software. To show what is going on, the example of a file repository extension is used.
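The core idea of role-based access control is that permissions are granted to roles, and users acquire permissions only through the roles they hold. The following minimal sketch shows that idea for a file repository. The class `SimpleRbac`, the role names, the actions, and the in-memory arrays are assumptions for illustration; a real implementation would normally keep this data in database tables.

```php
<?php
// Hypothetical sketch of role-based access control (RBAC). Role names,
// actions, and the in-memory arrays are assumptions for illustration.
class SimpleRbac
{
    private $userRoles = array();   // user ID => list of role names
    private $permissions = array(); // "role|action|subject" => true

    public function assignRole($userId, $role)
    {
        $this->userRoles[$userId][] = $role;
    }

    public function permit($role, $action, $subject)
    {
        $this->permissions[$role . '|' . $action . '|' . $subject] = true;
    }

    // A user may act if any one of the user's roles holds the permission.
    public function allowed($userId, $action, $subject)
    {
        $roles = isset($this->userRoles[$userId]) ? $this->userRoles[$userId] : array();
        foreach ($roles as $role) {
            if (isset($this->permissions[$role . '|' . $action . '|' . $subject])) {
                return true;
            }
        }
        return false;
    }
}

$rbac = new SimpleRbac();
$rbac->permit('Registered', 'download', 'repository');
$rbac->permit('Editor', 'upload', 'repository');
$rbac->assignRole(42, 'Registered');

var_dump($rbac->allowed(42, 'download', 'repository')); // bool(true)
var_dump($rbac->allowed(42, 'upload', 'repository'));   // bool(false)
var_dump($rbac->allowed(7, 'download', 'repository'));  // bool(false): no roles
```

Because code asks only "may this user do this action to this subject?", roles can be reorganized later without touching the extensions that make the check.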
Chapter 8, Handling Extensions: Now we have reached a critical point in our book. In the previous chapters a core framework was created, but it did not actually make a significant website. Content is so varied that it makes good sense to follow the approach of creating a minimal framework to support user-facing functions. But now we need to make the big step of adding real functionality. If we take this step to be a question of extending the minimal framework, it's logical to call our additions extensions. Flexibility in implementing our CMS suggests that it should be easy to install extensions into the basic framework.
This means two things. One is an issue of principle: a sound architecture is needed for building extensions. The other is a practical one: a simple and effective mechanism is needed for installing extensions, preferably using a web interface.
Extensions will be divided into four types, which represent the different ways in which they operate, and their individual purposes. The justification for this breakdown will be explained shortly, followed by consideration of how they fit together, and how they should be implemented.
Chapter 9, Menus: Most websites use menus, although great inventiveness goes into forms of presentation. A menu is simply a named list of possible destinations, which may be inside the site or elsewhere. The list may contain subsidiary lists within it, which obviously form submenus. It is a matter for presentation whether the sublists are always visible, or only become visible when the parent item is selected.
The site administrator needs a mechanism for maintaining these lists, with the ability to give each item an appropriate name. That implies some basic functionality. A subsidiary requirement is that it is often desirable to keep track of which menu item is relevant to the user's current activities. Menu entries that refer into the site can also be used to define page content.
Despite the huge variety in menu styling, the concept is standard, and there is no reason why a good CMS framework should not provide all the fundamental mechanisms for menu handling. It is important that these are provided in a way that does not constrain presentation.
Chapter 10, Languages: In the early days of computing, languages did not figure prominently. Much of the development and commercialization took place in English-speaking countries. The "standard" character sets were ASCII and EBCDIC. At best, schemes were employed so that a computer could operate with one particular non-English language.
The world has changed a great deal since then. Especially with the rise of the internet, computer systems need to deal with more than one language. In fact, they need to be capable of dealing with a huge variety of languages, many of which require different alphabets. Information has to be stored in alternative versions for different languages, especially while computer translation remains a joke. So while some people may be able to do without it, many builders of a CMS will require language support.
Chapter 11, Presentation Services: Despite, or maybe because of, the huge amount of work that has been devoted to techniques for creating presentation output for websites, thorny issues continue to be disputed. To some extent, these can be regarded as turf wars between software developers and web designers. The story probably has a long way still to go. With honorable exceptions, the question of how to present the output from computer programs was rarely the subject of serious design effort prior to the advent of the World Wide Web. Now, good design is vital to website creation, and both software architects and creative designers have to find a way to cope with the unaccustomed situation of working together.
Chapter 12, Other Services: This chapter could be described as a rag bag of miscellaneous services, but they are all significant in the construction of a CMS. Adding services to the framework in a standard way considerably eases the development of specific systems. Dealing with XML, handling configurations for extensions and manipulating sets of parameters are all loosely related services that have obvious uses, especially given that XML provides a simple, robust, and widely applicable technique for handling information.
File and directory handling is best treated as a service rather than being implemented in an ad hoc fashion using PHP functions, partly because of the complex permissions issues that can easily arise. Also, common operations are repeatedly needed, such as finding all the files in a directory that match a certain pattern.
Most systems need WYSIWYG editing in order to satisfy user expectations, and the sending of e-mail is often a requirement.
The most complex section of this chapter deals with the emerging possibilities for building standard logic for managing database tables. This is likely to evolve further with growing experience, but enough is given here to indicate some suggested directions.
Chapter 13, SEF and RESTful Services: Resources on the Web are accessed by the use of the Uniform Resource Identifier, the URI. Although technology can lead to complicated formats for the URI, people prefer them to be readable. It is often thought that search engines also prefer a readable URI, and so making them look appealing has been a major part of efforts to make a CMS "search engine friendly". There are actually many other factors, including the handling of metadata and particularly titles.
A loosely related development is the rise of RESTful services. This is a move to adopt a style of interaction between websites that aims to naturally exploit the characteristics of the HTTP protocol, including the URI. The aim is to move away from protocols such as XML-RPC that wrap up all the information being passed to and fro, instead making more of it visible through standard features of web access. This includes the building of families of meaningful URIs.
Although the various applications added to a framework will have to do some of the work, there are important steps that can be taken within the framework to provide the tools that are needed. It is those we shall concentrate on in this chapter.
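To make the idea of turning a query string into a readable URI concrete, here is a minimal sketch of a two-way transformation. The function names `sefEncode` and `sefDecode` and the parameter names `option`, `task`, and `id` are assumptions for this example only. The mapping is purely positional, so it assumes the parameters are present in that order with none skipped; a production scheme would need to be considerably more robust.

```php
<?php
// Hypothetical sketch: map a query-string URI to a readable path and back.
// The parameter names (option, task, id) are assumptions for illustration.
function sefEncode($query)
{
    parse_str($query, $params);
    $parts = array();
    foreach (array('option', 'task', 'id') as $name) {
        if (isset($params[$name])) {
            $parts[] = rawurlencode($params[$name]);
        }
    }
    return '/' . implode('/', $parts);
}

function sefDecode($path)
{
    $names = array('option', 'task', 'id');
    // Drop the empty segments produced by leading or trailing slashes.
    $segments = array_values(array_filter(explode('/', $path), 'strlen'));
    $params = array();
    foreach ($segments as $i => $segment) {
        if (isset($names[$i])) {
            $params[$names[$i]] = rawurldecode($segment);
        }
    }
    return http_build_query($params, '', '&');
}

$friendly = sefEncode('option=articles&task=view&id=12');
$original = sefDecode($friendly);
echo $friendly, "\n"; // prints /articles/view/12
echo $original, "\n"; // prints option=articles&task=view&id=12
```

Keeping both directions in one place means the rest of the system can continue to work with ordinary request parameters while visitors see only the readable form.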
Chapter 14, Error Handling: In an ideal world, software would never experience errors, but we don't live in an ideal world! So we need to consider what to do when errors arise. One option is to simply leave PHP5 to do its best, but when the issues are considered, that doesn't look like a good choice.
What are our concerns over errors? Perhaps the overriding issue here has to be that in the case of an error we need the software to degrade gracefully and not damage the system. Another consideration for web software is that errors should not provide information or opportunities that will aid crackers any more than can be helped.
Errors create problems for developers. One is that in the nature of the Web, errors are often not reported. People simply give up and do something else. Web software is often written quickly, and it is surprising how many errors exist in released software. Other factors for developers are that error handling can be a big overhead; also it is often unclear what counts as a good way to deal with errors.
Given this range of issues, it is clear that it will be helpful if the CMS framework can contribute useful functionality for error handling. Also included here for convenience is the special processing that takes place when a URI does not correspond to any page in our site, thus demanding a "404 error"; likewise handling of situations where a user has attempted something not permitted, making a "403" error appropriate.
Chapter 15, Real Content: Here we are at the last chapter, and our CMS framework still has no content! The reason for this state of affairs is that the provision of a CMS has a lot of common features, but most of them operate at a basic level below the provision of specific services. This is illustrated by looking at a popular off-the-shelf CMS and observing that of all the available extensions, the largest single category is simply described as "content management". So, however much the standard package provides, it seems that there is still enormous scope for additions.
In this chapter, I aim to describe a number of specific application areas, discussing the particular issues that arise with implementations. Looking at our framework solution, I will concentrate on one sample extension. It is a very simple text handling mechanism that can be explained in detail. Also, the ways in which the simple text system could be extended will be described.
Appendix A, Packaging Extensions: This appendix provides information for those who want to build an installer following similar design principles to those described in this book, or for people who intend to use Aliro itself.
Appendix B, Packaging XML Example: This appendix shows the packaging XML for the Aliro login component, which includes user management.