net.htmlparser.jericho
Class Attribute

java.lang.Object
  extended by Segment
      extended by Attribute
All Implemented Interfaces:
java.lang.CharSequence, java.lang.Comparable<Segment>

public final class Attribute
extends Segment

Represents a single attribute name/value segment within a StartTag.

An instance of this class is a representation of a single attribute in the source document and is not modifiable. The OutputDocument.replace(Attributes, Map) and OutputDocument.replace(Attributes, boolean convertNamesToLowerCase) methods provide the means to add, delete or modify attributes and their values in an OutputDocument.

Obtained using the Attributes.get(String key) method.

See also the XML 1.0 specification for attributes.

See Also:
Attributes

Method Summary
 java.lang.String getDebugInfo()
          Returns a string representation of this object useful for debugging purposes.
 java.lang.String getKey()
          Returns the name of this attribute in lower case.
 java.lang.String getName()
          Returns the name of this attribute in original case.
 Segment getNameSegment()
          Returns the segment spanning the name of this attribute.
 char getQuoteChar()
          Returns the character used to quote the value.
 StartTag getStartTag()
          Returns the start tag to which this attribute belongs.
 java.lang.String getValue()
          Returns the decoded value of this attribute, or null if it has no value.
 Segment getValueSegment()
          Returns the segment spanning the value of this attribute, or null if it has no value.
 Segment getValueSegmentIncludingQuotes()
          Returns the segment spanning the value of this attribute, including quotation marks if any, or null if it has no value.
 boolean hasValue()
          Indicates whether this attribute has a value.
 
Methods inherited from class Segment
charAt, compareTo, encloses, encloses, equals, getAllCharacterReferences, getAllElements, getAllElements, getAllElements, getAllElements, getAllElements, getAllElementsByClass, getAllStartTags, getAllStartTags, getAllStartTags, getAllStartTags, getAllStartTags, getAllStartTagsByClass, getAllTags, getAllTags, getBegin, getChildElements, getEnd, getFirstElement, getFirstElement, getFirstElement, getFirstElement, getFirstElementByClass, getFirstStartTag, getFirstStartTag, getFirstStartTag, getFirstStartTag, getFirstStartTag, getFirstStartTagByClass, getFormControls, getFormFields, getMaxDepthIndicator, getNodeIterator, getRenderer, getRowColumnVector, getSource, getStyleURISegments, getTextExtractor, getURIAttributes, hashCode, ignoreWhenParsing, isWhiteSpace, isWhiteSpace, length, parseAttributes, subSequence, toString
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Method Detail

getKey

public java.lang.String getKey()
Returns the name of this attribute in lower case.

This package treats all attribute names as case insensitive, which is consistent with HTML but not with XHTML.
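
The following stand-alone sketch illustrates the idea of a case-insensitive key. It does not call this library; the class AttributeKeySketch and the method toKey are hypothetical names, and the use of Locale.ROOT is an assumption made so that the lower-casing does not depend on the default locale (in a Turkish locale, for example, "I" would otherwise lower-case to a dotless "ı").

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration only; not part of net.htmlparser.jericho.
final class AttributeKeySketch {
    // Derives a lookup key from an attribute name as written in the source.
    // Locale.ROOT is an assumption that keeps the result locale-independent.
    public static String toKey(String name) {
        return name.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        Map<String, String> valuesByKey = new LinkedHashMap<>();
        // The name is stored under its lower-case key ...
        valuesByKey.put(toKey("HREF"), "index.html");
        // ... so a lookup with any other casing finds the same entry.
        System.out.println(valuesByKey.get(toKey("Href")));
    }
}
```

With this scheme, names that differ only in case share one key, which matches HTML but would merge attributes that XHTML treats as distinct.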

Returns:
the name of this attribute in lower case.
See Also:
getName()

getName

public java.lang.String getName()
Returns the name of this attribute in original case.

This is exactly equivalent to getNameSegment().toString().

Returns:
the name of this attribute in original case.
See Also:
getKey()

getNameSegment

public Segment getNameSegment()
Returns the segment spanning the name of this attribute.

Returns:
the segment spanning the name of this attribute.
See Also:
getName()

hasValue

public boolean hasValue()
Indicates whether this attribute has a value.

This method also returns true if this attribute has been assigned a zero-length value.

It returns false only if this attribute appears in minimized form, that is, as a name with no value.
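
The distinction between a zero-length value and a minimized attribute can be sketched without the library. The code below is a simplified model, not this class's parser: MinimizedAttributeSketch, rawValue and hasValue are hypothetical names, and the input is assumed to be the raw text of one attribute with no surrounding whitespace.

```java
// Hypothetical illustration only; not part of net.htmlparser.jericho.
final class MinimizedAttributeSketch {
    // Returns the text after '=' with one pair of matching quotes stripped,
    // or null when the attribute is in minimized form (no '=' at all).
    public static String rawValue(String rawAttribute) {
        int eq = rawAttribute.indexOf('=');
        if (eq < 0) {
            return null; // minimized form, e.g. "selected"
        }
        String value = rawAttribute.substring(eq + 1);
        if (value.length() >= 2) {
            char first = value.charAt(0);
            boolean quoted = (first == '"' || first == '\'')
                    && value.charAt(value.length() - 1) == first;
            if (quoted) {
                return value.substring(1, value.length() - 1);
            }
        }
        return value;
    }

    // True for any attribute written with '=', even if the value is empty.
    public static boolean hasValue(String rawAttribute) {
        return rawValue(rawAttribute) != null;
    }

    public static void main(String[] args) {
        System.out.println(hasValue("selected"));   // minimized form
        System.out.println(hasValue("value=\"\"")); // zero-length value
        System.out.println(hasValue("value=x"));    // unquoted value
    }
}
```

In this model an empty string and null are different results, so callers that only test for null will treat value="" as present.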

Returns:
true if this attribute has a value, otherwise false.

getValue

public java.lang.String getValue()
Returns the decoded value of this attribute, or null if it has no value.

This is equivalent to CharacterReference.decode(getValueSegment(),true).

Note that before version 1.4.1 this method returned the raw value of the attribute as it appears in the source document, without decoding.

To obtain the raw value without decoding, use getValueSegment().toString().

Special attention should be given to attributes that contain URLs, such as the href attribute. When such an attribute contains a URL with parameters (as described in the form-urlencoded media type), the ampersand (&) characters used to separate the parameters should be encoded to prevent the parameter names from being unintentionally interpreted as character entity references. This requirement is explicitly stated in the HTML 4.01 specification section 5.3.2.

For example, take the following element in the source document:

<a href="Report.jsp?chapt=2&sect=3">next</a>

By default, calling getAttributes().getValue("href") on this element returns the string "Report.jsp?chapt=2§=3", since the text "&sect" is interpreted as the rarely used character entity reference &sect; (U+00A7), despite the fact that it is missing the terminating semicolon (;).

Most browsers recognise unterminated character entity references in attribute values representing a codepoint of U+00FF or below, but ignore those representing codepoints above this value. One relatively popular browser only recognises those representing a codepoint of U+003E or below, meaning it would have interpreted the URL in the above example differently to most other browsers. Most browsers also use different rules depending on whether the unterminated character reference is inside or outside of an attribute value, with both of these possibilities further split into different rules for character entity references, decimal character references, and hexadecimal character references.

The behaviour of this library is determined by the current compatibility mode, which is set through the static Config.CurrentCompatibilityMode property.
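
The effect described for the href example can be reproduced with a small stand-alone sketch. It is not this library's decoder and it ignores compatibility modes: UnterminatedEntitySketch and decodeLenient are hypothetical names, and the four-entry entity table is an assumption chosen to keep the example short. The sketch accepts entity names with or without the terminating semicolon, which is enough to show why an unencoded ampersand changes the URL and why writing it as &amp; does not.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not part of net.htmlparser.jericho.
final class UnterminatedEntitySketch {
    // Deliberately tiny entity table (an assumption for this sketch).
    private static final Map<String, Character> ENTITIES = new LinkedHashMap<>();
    static {
        ENTITIES.put("amp", '&');
        ENTITIES.put("sect", '\u00A7');
        ENTITIES.put("lt", '<');
        ENTITIES.put("gt", '>');
    }

    // Decodes known entity names, tolerating a missing terminating semicolon.
    public static String decodeLenient(String raw) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < raw.length()) {
            char c = raw.charAt(i);
            if (c == '&') {
                String match = null;
                for (String name : ENTITIES.keySet()) {
                    if (raw.startsWith(name, i + 1)) {
                        match = name;
                        break;
                    }
                }
                if (match != null) {
                    out.append(ENTITIES.get(match));
                    i += 1 + match.length();
                    // Consume the semicolon if present; its absence is tolerated.
                    if (i < raw.length() && raw.charAt(i) == ';') {
                        i++;
                    }
                    continue;
                }
            }
            out.append(c);
            i++;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Unencoded ampersand: "&sect" is decoded to U+00A7.
        System.out.println(decodeLenient("Report.jsp?chapt=2&sect=3"));
        // Encoded ampersand: "&amp;" becomes "&" and "sect" is left alone.
        System.out.println(decodeLenient("Report.jsp?chapt=2&amp;sect=3"));
    }
}
```

Under these assumptions, encoding the separator as &amp; in the source document is what keeps the decoded value equal to the intended URL.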

Returns:
the decoded value of this attribute, or null if it has no value.

getValueSegment

public Segment getValueSegment()
Returns the segment spanning the value of this attribute, or null if it has no value.

Returns:
the segment spanning the value of this attribute, or null if it has no value.
See Also:
getValue()

getValueSegmentIncludingQuotes

public Segment getValueSegmentIncludingQuotes()
Returns the segment spanning the value of this attribute, including quotation marks if any, or null if it has no value.

If the value is not enclosed by quotation marks, this is the same as the value segment.

Returns:
the segment spanning the value of this attribute, including quotation marks if any, or null if it has no value.

getQuoteChar

public char getQuoteChar()
Returns the character used to quote the value.

The return value is either a double-quote ("), a single-quote ('), or a space.

Returns:
the character used to quote the value, or a space if the value is not quoted or this attribute has no value.
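
The three possible results can be modelled with a short stand-alone sketch. It does not use this library: QuoteCharSketch and quoteChar are hypothetical names, and the input is assumed to be the raw text of a single attribute with no whitespace around the equals sign.

```java
// Hypothetical illustration only; not part of net.htmlparser.jericho.
final class QuoteCharSketch {
    // Returns '"' or '\'' when the value starts with that quote character,
    // and a space when the value is unquoted or there is no value at all.
    public static char quoteChar(String rawAttribute) {
        int eq = rawAttribute.indexOf('=');
        if (eq < 0 || eq + 1 >= rawAttribute.length()) {
            return ' '; // no value, or nothing after '='
        }
        char first = rawAttribute.charAt(eq + 1);
        return (first == '"' || first == '\'') ? first : ' ';
    }

    public static void main(String[] args) {
        System.out.println("[" + quoteChar("href=\"a.html\"") + "]"); // double-quoted
        System.out.println("[" + quoteChar("href='a.html'") + "]");   // single-quoted
        System.out.println("[" + quoteChar("href=a.html") + "]");     // unquoted
        System.out.println("[" + quoteChar("selected") + "]");        // no value
    }
}
```

Because a space stands for "no quote character", code that rebuilds an attribute from this result should check for the space before wrapping a value in it.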

getStartTag

public StartTag getStartTag()
Returns the start tag to which this attribute belongs.

Returns:
the start tag to which this attribute belongs, or null if it is not within a start tag.

getDebugInfo

public java.lang.String getDebugInfo()
Returns a string representation of this object useful for debugging purposes.

Overrides:
getDebugInfo in class Segment
Returns:
a string representation of this object useful for debugging purposes.