Standard scope
This standard applies to:
- pennmedicine.org
- All Penn Medicine websites
- Penn Medicine mobile applications
- All Penn Medicine digital products
Overview
Site architecture improves discoverability of content, strengthens keyword/topic relevance through internal linking, and supports a positive user experience that encourages engagement. Site hygiene helps to maintain crawlability, indexability, and trustworthiness of a website over time, so search engines reward it with higher visibility.
Important site-level elements include:
- Information architecture
- Crawl depth
- URL structures
- Noindex tagging
- Follow/nofollow attribution
- Redirects and vanity URLs
- Updating or deleting old content
Information architecture
Information architecture (IA) is critical for SEO because it shapes how content is organized, connected, and discovered. A well-structured IA helps search engines crawl and index a site efficiently while guiding users to the information they need.
By grouping related content, using logical hierarchies, and implementing internal linking, IA ensures that important pages are prioritized, page relationships are established, and keyword relevance is reinforced. This improves site visibility, ranking potential, and user experience.
In 2024, the information architecture of the pennmedicine.org website was updated and aligned to a new design system.
Crawl depth
Be mindful of crawl depth and minimize folder levels where possible. The deeper a page sits in the folder structure and hierarchy, the harder it is for users and crawlers to find. Depth can also signal that a page is less important, which can affect the page’s indexability.
URL folders should flow from the root domain through the parents and to the child in a structure that is intuitive for the topic of the child page.
A good rule of thumb is to ensure that all important pages are no more than 3-4 clicks away from the homepage.
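One way to check the rule of thumb above is a breadth-first search over the site's internal links, which yields the minimum number of clicks from the homepage to each page. The sketch below assumes the link graph is already available as a dictionary (for example, exported from a crawler); the `SITE_LINKS` data and function names are hypothetical, not part of any existing tooling.

```python
from collections import deque


def click_depths(links, home):
    """Return the minimum number of clicks from `home` to each reachable page."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:  # first visit is the shortest path in BFS
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


def pages_too_deep(links, home, max_clicks=4):
    """List pages that need more than `max_clicks` clicks to reach from `home`."""
    depths = click_depths(links, home)
    return sorted(page for page, depth in depths.items() if depth > max_clicks)


# Hypothetical link graph: each page maps to the pages it links to.
SITE_LINKS = {
    "/": ["/conditions", "/locations"],
    "/conditions": ["/conditions/uterine-endometrial-cancer"],
    "/conditions/uterine-endometrial-cancer": [
        "/conditions/uterine-endometrial-cancer/diagnosis"
    ],
}

print(click_depths(SITE_LINKS, "/"))
print(pages_too_deep(SITE_LINKS, "/", max_clicks=4))
```

Pages that never appear in the result are not reachable from the homepage through internal links at all, which is usually worth reviewing separately.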
URL structures
URL structures should reflect the information architecture of a site for both SEO and usability. That said, SEO doesn’t require URLs to perfectly mirror every IA level. The key is consistent, logical, and descriptive URLs that balance simplicity with context.
Refer to this checklist for URL naming for all pages:
- Paths should be clear, concise and descriptive of the destination.
- Align the URL with the H1/topic of the page (see Page-level guidance for more details about H1 titles).
- Use lowercase to avoid duplicate content/URL issues and indexing problems. URLs should be enforced as case-insensitive; where they are not, search engines may treat URLs that differ only by case as distinct pages.
- Use the non-trailing-slash format of the URL (e.g., /conditions rather than /conditions/).
- Use hyphens to separate individual words.
- Shorten the URL slug (final path) to reflect the specificity of the page content in relation to the page’s title. For instance, instead of .../conditions/uterine-endometrial-cancer/uterine-endometrial-cancer-diagnosis, shorten the slug to .../conditions/uterine-endometrial-cancer/diagnosis, as there is no need to repeat the broader topic from the higher level of the path.
- Do not include stop words or filler words, unless crucial to the topic.
- Do not use conjunctions.
- Do not use special characters, numerals, or non-ASCII characters (such as those used in international languages).
- Do not use dates, years, or any other non-evergreen identifiers in the folder structure or final subdirectory of the URL.
- URL language should be inclusive of any child page that may appear under it (L2), be common and universally understood, and clearly indicate a page with multiple child pages beneath it. For example, using /locations/ instead of /location/, or /specialties/ instead of /specialty/, applies plural "group" language to indicate that a user is entering a larger area of the site with more than one page for that topic.
- All URLs and final subdirectories should be unique.
- Avoid changing a URL after a page is published, as this can prevent crawlers from finding and indexing the changed URL, send users to 404 pages, add redirect maintenance, and make data collection and analysis for the page unreliable due to conflicting URLs.
- Only use dynamic URLs where needed. For example, Find-a-Doctor and Find-a-Location searches can reach the same destination from multiple access points, creating large numbers of URLs for that destination and encouraging duplication.
- Use country-specific domains if multi-regional domains will exist.
- All URLs should be secure, using the HTTPS protocol with a valid SSL/TLS certificate.
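Several items in the checklist above are mechanical enough to check automatically. The following is a minimal sketch of such a check, not an existing Penn Medicine or Sitecore tool; the function name, the stop-word list, and the message wording are all assumptions made for illustration. It covers lowercase, trailing slashes, hyphen-separated ASCII letters (which also rejects numerals, years, and special characters), stop words, and a final slug that repeats its parent folder.

```python
import re

# Illustrative stop-word list (an assumption; the standard does not define one).
STOP_WORDS = {"a", "an", "and", "but", "for", "in", "of", "or", "the", "to"}

# A segment is lowercase ASCII letters, with single hyphens between words.
SEGMENT_PATTERN = re.compile(r"[a-z]+(?:-[a-z]+)*")


def check_url_path(path):
    """Return a list of checklist problems found in a URL path."""
    problems = []
    if path != path.lower():
        problems.append("use lowercase")
    if len(path) > 1 and path.endswith("/"):
        problems.append("remove trailing slash")

    # Case is already reported above, so compare segments in lowercase.
    segments = [s.lower() for s in path.split("/") if s]
    for segment in segments:
        if not SEGMENT_PATTERN.fullmatch(segment):
            problems.append(f"{segment}: use only letters and single hyphens")
        stop_words = sorted(STOP_WORDS.intersection(segment.split("-")))
        if stop_words:
            # Advisory only: stop words are allowed when crucial to the topic.
            problems.append(f"{segment}: review stop words {stop_words}")

    # The final slug should not repeat the broader topic of its parent folder.
    if len(segments) >= 2 and segments[-1].startswith(segments[-2] + "-"):
        problems.append("final slug repeats its parent folder")
    return problems


print(check_url_path("/conditions/uterine-endometrial-cancer/diagnosis"))  # prints []
print(check_url_path("/Conditions/"))
```

Judgment-based items, such as whether a path is descriptive or aligned with the H1, still need a human review.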
Additional guidance for pennmedicine.org pages
- The first element of the URL path should align to page type for specific types of content. For instance, content about conditions we treat at Penn Medicine is found under the path https://www.pennmedicine.org/conditions.
- Check new URLs on pennmedicine.org against all of the above guidance, as Sitecore may not automatically check for all of these standards.
Noindex tagging
A noindex tag tells search engines not to include a specific page in their index. It doesn’t block crawling, but it prevents the page from appearing in search results. The search engine bot will still visit the page and read the tag, but it will then drop the page from its index; this means the page will not show up for any search query. The noindex tag is important because it gives site owners control over what should and shouldn’t appear in the search engine results page (SERP).
Applying noindex is a possible, though not guaranteed, solution for pages that must exist for business reasons but that you do not want competing in the SERP with your other internal pages. However, applying the noindex tag to a page is not recommended as a way to avoid deleting duplicate content.
For pennmedicine.org, our standard setting is to index pages. This is not a default setting in Sitecore, so it must be set when the page is created.
Follow/nofollow attribution
A follow/nofollow attribute is important in SEO because it gives site owners and publishers control over how search engines treat certain links. A “follow” tag signals to search engines that they should crawl the linked page and pass SEO authority (or "link equity") to it. A “nofollow” tag signals to search engines to ignore the links on a page and, therefore, give no SEO authority or link equity to the pages linked to from this webpage.
Telling a search engine to follow links helps it crawl a website and recognize related pages, while placing nofollow on unimportant pages helps save crawl budget for higher-priority pages.
For pennmedicine.org, our standard setting is to follow. There is no default setting in Sitecore, so it must be set when the page is created.
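Page-level index and follow settings are commonly expressed as a single robots meta tag in the page's head. The sketch below shows what that tag looks like for each combination and how a page's HTML could be checked for it. How Sitecore actually renders these settings is not covered by this standard, so the function and class names here are hypothetical and the output is illustrative only.

```python
from html.parser import HTMLParser


def robots_meta_tag(index=True, follow=True):
    """Build a robots meta tag; the defaults match the 'index, follow' standard."""
    directives = ["index" if index else "noindex", "follow" if follow else "nofollow"]
    return '<meta name="robots" content="' + ", ".join(directives) + '">'


class RobotsMetaParser(HTMLParser):
    """Collect the directives from any robots meta tag found in a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            for directive in content.split(","):
                if directive.strip():
                    self.directives.add(directive.strip().lower())


def robots_directives(html_text):
    """Return the set of robots directives declared in `html_text`."""
    parser = RobotsMetaParser()
    parser.feed(html_text)
    return parser.directives


print(robots_meta_tag())             # <meta name="robots" content="index, follow">
print(robots_meta_tag(index=False))  # <meta name="robots" content="noindex, follow">
print("noindex" in robots_directives(robots_meta_tag(index=False)))  # True
```

A check like `robots_directives` could be run against newly published pages to confirm that neither noindex nor nofollow was applied unintentionally.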
Redirects and vanity URLs
Redirects should preserve SEO equity and user experience by being clean, direct, and permanent when appropriate. Vanity URLs should be simple, user-friendly entry points that redirect with SEO best practices in mind.
For redirects, follow this guidance:
- Use 301 redirects for permanent moves: This passes the majority of link equity (ranking power) to the new URL. This is the standard redirect code used when pages are deleted or URLs are changed.
- Use 302 redirects only for temporary changes: Search engines treat these as short-term and may not transfer full ranking signals.
- Redirect one-to-one, not chains: Avoid redirect chains (URL A → URL B → URL C). Instead, point URL A directly to the final destination.
- Update internal links: Don’t rely on redirects for navigation; update menus, sitemaps, and content to point to the new URLs directly.
- Avoid redirect loops: Loops (A → B → A or A → A) confuse crawlers and harm user experience.
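The chain and loop rules above can be applied to a redirect map before it is deployed. The following is a minimal sketch, assuming redirects are kept as a simple source-to-target dictionary; the function name and example paths are hypothetical. It rewrites each source to point straight at its final destination and raises an error if a loop is found.

```python
def resolve_redirects(redirects):
    """Map every source URL directly to its final destination.

    Raises ValueError if following the redirects ever returns to a URL
    that was already visited (a redirect loop).
    """
    resolved = {}
    for source in redirects:
        seen = [source]
        target = redirects[source]
        # Keep following while the target is itself redirected elsewhere.
        while target in redirects:
            if target in seen:
                raise ValueError("Redirect loop: " + " -> ".join(seen + [target]))
            seen.append(target)
            target = redirects[target]
        resolved[source] = target
    return resolved


# Hypothetical chain: /a -> /b -> /c becomes /a -> /c and /b -> /c.
print(resolve_redirects({"/a": "/b", "/b": "/c"}))
```

Each entry in the flattened map can then be served as a single 301 redirect, so neither users nor crawlers pass through intermediate hops.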
For vanity URLs, follow this guidance:
- Keep them short and memorable: Easy for users to type, share, and remember (e.g., example.com/gala).
- Match them to campaigns or offline promotions: They work best for marketing, print, or events.
- Redirect them to the full destination page: Use a 301 redirect so traffic and SEO signals consolidate on the primary URL.
- Avoid keyword stuffing: Descriptive is good, but overly long vanity URLs look spammy and don’t add SEO value.
- Use consistent formatting: Stick to lowercase, hyphenated words for readability and best SEO practice.
For additional guidance, reference Vanity URLs in the technical standards.
Updating or deleting old content
Updating old content keeps your site competitive and trustworthy, while deleting or consolidating outdated content helps search engines and users focus on your most valuable resources. Together, these practices improve rankings, authority, and user experience.
There are several reasons websites should be continuously decluttered or updated:
- Indexing and crawling: The more pages a website contains, the more pages Google and other search engines must crawl to keep your site indexed with the most updated version. Google sets a crawl budget for each site, which is the amount of time and resources Googlebot devotes to crawling and updating it. While crawl budget is not a major concern for most small to medium-sized websites, it becomes a significant factor for very large sites with thousands or millions of URLs. Keeping your website clean helps Google avoid wasting crawl budget on underperforming or outdated content and reduces the total number of pages that search engines need to crawl to display the latest version of your site.
- User experience: A website with too many disorganized webpages can be overwhelming and confusing for users to navigate. A cleaner site architecture with well-defined topics and fewer, more comprehensive pages can lead to a better user experience, higher engagement, and lower bounce rates.
- Content management: Too much content makes it difficult to maintain quality and avoid internal competition between webpages, which may ultimately lead to diluting the authority and visibility potential of your best content.
- Search engine prioritization: Google's algorithms tend to favor content that is both high-quality and relevant to a user's intent. While it's not just about having the most recent publication date, updating older content with fresh information, new data, and a new publishing date can signal to search engines that the page is still relevant and valuable, giving it a potential boost. However, search engines now better understand that some evergreen content doesn't need frequent updates to be valuable.
That said, always check the performance (traffic, visibility/rankings, backlinks, etc.) of an old webpage before updating or deleting it. Let performance guide your decisions and do not fix what isn't broken. When deleting content, always set up a 301 redirect to a relevant, existing page to preserve any link equity and provide a good user experience.
Related resources
- pennmedicine.org URL Naming Conventions and Policies (VPN access required)
- Google: Managing your crawl budget
Contact
For questions, contact web-standards@pennmedicine.upenn.edu.